
An AI model developed by researchers at Seoul National University Bundang Hospital (SNUBH) can now detect stress with up to 70 percent accuracy—based entirely on a person’s voice.
Trained on samples from 115 Korean full-time employees, the deep learning system flags stress by analyzing subtle non-verbal cues like tone, pitch, and breath rhythm. The results, published in Psychiatry Investigation, represent one of the first biosignal-validated voice-based stress models built specifically for a Korean population.
The research team was led by Professor Kim Jeong-hyun of SNUBH’s Department of Public Health Medical Services and worked with SNU’s Department of Electrical and Computer Engineering and Institute of New Media and Communications. The team used ECAPA-TDNN, a neural network architecture originally designed for speaker recognition.
Participants recorded their voices before and after undergoing a standardized stress-inducing protocol: the Socially Evaluated Cold Pressor Test, which involves hand immersion in ice water while being observed.

Professor Kim Jeong-hyun of Seoul National University Bundang Hospital's (SNUBH) Department of Public Health Medical Services (Courtesy of SNUBH)
To confirm whether stress had been successfully induced, the study used biological and self-reported markers—salivary cortisol levels and distress thermometer readings. Only data from participants who showed measurable stress responses were used to train and validate the model.
Compared with other deep learning architectures such as convolutional neural networks and conformers, ECAPA-TDNN consistently delivered higher performance, especially when analyzing free-form speech. The model was trained and validated on 95 samples and tested on a separate group of 20, identifying stress in 70 percent of test subjects.
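To make the reported figures concrete, here is a small Python sketch of the arithmetic. It assumes one prediction per test subject, and the labels and predictions are hypothetical placeholders rather than data from the study; the function name `accuracy` is likewise illustrative.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the reference labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Split sizes reported in the article.
n_total = 115
n_train_val = 95
n_test = n_total - n_train_val  # 20 held-out subjects

# Hypothetical outcome (NOT study data): 14 of 20 subjects classified correctly.
labels = [1] * n_test
predictions = [1] * 14 + [0] * 6

test_accuracy = accuracy(predictions, labels)
print(f"{n_test} test subjects, accuracy = {test_accuracy:.0%}")
```

Under that assumption, 70 percent accuracy on a 20-person test group corresponds to 14 correct classifications, which also shows how much a single subject (5 percentage points) moves the figure.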
Instead of focusing on what people said, the model zeroed in on how they said it, capturing stress-related shifts in vocal tension, rhythm, and tempo. Because it relies only on non-linguistic features, the researchers noted, the system avoids common sources of bias tied to language fluency, education level, or cultural background. They added that all processing took place locally on the device, keeping privacy risks low.
The study was supported by SK Telecom and conducted at both SNUBH and Boramae Medical Center. Participants read a neutral essay and responded to casual prompts about their daily lives. Audio recordings were segmented into overlapping four-second chunks and converted into Mel spectrograms, a common feature representation in voice-based AI.
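The segmentation and feature-extraction step described above can be sketched in Python with NumPy. This is a minimal illustration, not the study's pipeline: the article specifies only overlapping four-second chunks and Mel spectrograms, so the 16 kHz sample rate, two-second hop (50 percent overlap), 512-point FFT, 10 ms frame step, and 40 Mel bands are all assumptions, and the function names are hypothetical.

```python
import numpy as np


def segment_audio(signal, sample_rate, chunk_seconds=4.0, hop_seconds=2.0):
    """Split a 1-D signal into overlapping fixed-length chunks.

    hop_seconds=2.0 (50% overlap) is an assumption; the article gives no value.
    """
    chunk_len = int(round(chunk_seconds * sample_rate))
    hop_len = int(round(hop_seconds * sample_rate))
    if len(signal) < chunk_len:
        return np.empty((0, chunk_len), dtype=signal.dtype)
    starts = range(0, len(signal) - chunk_len + 1, hop_len)
    return np.stack([signal[s:s + chunk_len] for s in starts])


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(sample_rate, n_fft, n_mels):
    """Triangular filters spaced evenly on the Mel scale, shape (n_mels, n_fft//2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def mel_spectrogram(chunk, sample_rate, n_fft=512, hop=160, n_mels=40):
    """Log-power Mel spectrogram of one chunk, shape (n_mels, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(chunk) - n_fft) // hop
    frames = np.stack([chunk[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2          # (n_frames, n_fft//2 + 1)
    mel_power = power @ mel_filterbank(sample_rate, n_fft, n_mels).T
    return 10.0 * np.log10(mel_power + 1e-10).T               # small offset avoids log(0)


# Usage: ten seconds of a synthetic 220 Hz tone stands in for a voice recording.
sample_rate = 16000  # assumed sample rate
tone = np.sin(2 * np.pi * 220 * np.arange(10 * sample_rate) / sample_rate)
chunks = segment_audio(tone, sample_rate)
spectrogram = mel_spectrogram(chunks[0], sample_rate)
print(chunks.shape)       # (4, 64000)
print(spectrogram.shape)  # (40, 397)
```

Each chunk becomes a two-dimensional array of Mel band by time frame, which is the kind of input a network such as ECAPA-TDNN typically consumes. In practice a library such as librosa or torchaudio would usually replace the hand-written filterbank.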
While not yet commercialized, the team said they believe the technology could eventually power real-time stress monitoring in consumer devices. Future iterations may integrate additional biometric inputs—such as heart rate variability or skin conductance—to further boost accuracy, Professor Kim said in a statement.
Source: https://www.koreabiomed.com/news/articleView.html?idxno=27219