Affective Computing Is Advancing Fast. Standards Are Not.
Affective computing, the field of building systems that detect, interpret, or simulate human emotion, has moved from research labs into real products. Voice interfaces, call centers, and interview platforms now claim to understand how people feel, not just what they say.
But there is a problem. The foundation is still fragmented.
There is no single global standard for emotional voice recognition.
Instead, the field is stitched together from partial frameworks. The W3C (World Wide Web Consortium) offers EmotionML, a markup language for representing emotions in data.
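As a rough sketch of what an EmotionML-style annotation can look like, here is a small Python snippet that builds one. The element and attribute names follow the W3C EmotionML 1.0 specification as I understand it; a fully valid document would also declare its emotion vocabularies (for example via a category-set attribute), which this sketch omits for brevity.

```python
import xml.etree.ElementTree as ET

# Build a minimal EmotionML-style document describing one detected moment.
# Namespace and element names follow W3C EmotionML 1.0; vocabulary
# declarations required by the spec are omitted to keep the sketch short.
EMOTIONML_NS = "http://www.w3.org/2009/10/emotionml"
ET.register_namespace("", EMOTIONML_NS)

root = ET.Element(f"{{{EMOTIONML_NS}}}emotionml")
emotion = ET.SubElement(root, f"{{{EMOTIONML_NS}}}emotion")

# A categorical label for the moment...
category = ET.SubElement(emotion, f"{{{EMOTIONML_NS}}}category")
category.set("name", "frustration")

# ...and dimensional scores for the same moment.
for name, value in [("valence", "0.2"), ("arousal", "0.8")]:
    dim = ET.SubElement(emotion, f"{{{EMOTIONML_NS}}}dimension")
    dim.set("name", name)
    dim.set("value", value)

print(ET.tostring(root, encoding="unicode"))
```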
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) jointly publish the ISO/IEC 30150 series, which defines how affective computing systems should be structured and described.
The European Telecommunications Standards Institute (ETSI) provides testing guidance for emotion detection in telecom scenarios.
Emotional modeling terms
To illustrate the problem, here are some of the terms in common use at the moment (a brief code sketch of how they fit together follows the list):
- Valence: How positive or negative an emotion is (happy = positive, angry = negative)
- Arousal: The intensity or energy level of the emotion (calm vs excited)
- Categorical model: Emotions are discrete labels (angry, sad, happy)
- Dimensional model: Emotions are coordinates on axes (valence + arousal, sometimes dominance)
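To make the distinction between the two models concrete, here is a minimal, hypothetical Python sketch. The quadrant boundaries and label names are illustrative assumptions, not drawn from any published standard:

```python
from dataclasses import dataclass

@dataclass
class DimensionalEmotion:
    valence: float  # -1.0 (very negative) .. +1.0 (very positive)
    arousal: float  #  0.0 (calm) .. 1.0 (highly activated)

def to_category(e: DimensionalEmotion) -> str:
    """Map a point in valence/arousal space to a coarse categorical label.

    The thresholds and labels here are illustrative only; no standard
    fixes this mapping, which is part of the problem.
    """
    if e.arousal >= 0.5:
        return "excited" if e.valence >= 0.0 else "angry"
    return "content" if e.valence >= 0.0 else "sad"

# The same spoken moment, expressed in both models.
point = DimensionalEmotion(valence=-0.4, arousal=0.8)
print(to_category(point))  # -> "angry"
```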
These are useful, but they do not solve the core problem.
The existing frameworks do not agree on which emotions should be measured. They do not define consistent intensity scales. They do not establish reliable accuracy benchmarks across real-world conversations. And they do not solve the hardest issue: turning noisy human speech into consistent, decision-grade emotional signals.
As a result, vendors produce outputs that are difficult to compare. One system labels "frustration". Another reports "negative sentiment". A third outputs low "valence" and high "arousal". They may all be observing the same moment, but they describe it differently, with different reliability.
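To see why this hurts anyone consuming these outputs, imagine trying to normalize three such payloads into one shape. The vendor field names, the target schema, and the mapping rules below are all invented for this sketch; there is no standard that defines them, which is exactly the point:

```python
# Hypothetical payloads from three vendors describing the same moment.
vendor_a = {"label": "frustration", "confidence": 0.72}
vendor_b = {"sentiment": "negative", "score": 0.65}
vendor_c = {"valence": 0.18, "arousal": 0.83}   # note: 0..1 scale, unlike the earlier sketch

def normalize(payload: dict) -> dict:
    """Coerce heterogeneous vendor outputs into one illustrative schema.

    The target fields (label, intensity, confidence) and the crude mapping
    rules are assumptions for this sketch, not an existing standard.
    """
    if "label" in payload:            # vendor A: categorical label + confidence
        return {"label": payload["label"],
                "intensity": None,
                "confidence": payload.get("confidence")}
    if "sentiment" in payload:        # vendor B: polarity only
        label = "negative affect" if payload["sentiment"] == "negative" else "positive affect"
        return {"label": label, "intensity": payload.get("score"), "confidence": None}
    if "valence" in payload:          # vendor C: dimensional scores
        label = "distress" if payload["valence"] < 0.5 and payload["arousal"] > 0.5 else "neutral"
        return {"label": label, "intensity": payload["arousal"], "confidence": None}
    return {"label": "unknown", "intensity": None, "confidence": None}

for p in (vendor_a, vendor_b, vendor_c):
    print(normalize(p))
```

Even after this crude mapping, the three results disagree on label, intensity, and confidence. That comparability gap is the problem.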
This lack of standardization is holding the category back.
Businesses do not need emotion detection alone. They need signals they can trust, signals that drive decisions such as adjusting messaging, escalating risk, or prioritizing follow-up.
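In code terms, "decision grade" means a signal you can safely put behind a rule like the one below. The thresholds, labels, and actions are purely illustrative assumptions, building on the hypothetical normalized schema from the previous sketch:

```python
def next_action(signal: dict) -> str:
    """Turn a normalized emotional signal into a business decision.

    Labels, thresholds, and actions are illustrative assumptions only.
    """
    label = signal.get("label")
    intensity = signal.get("intensity") or 0.0
    confidence = signal.get("confidence") or 0.0
    if label in {"distress", "frustration"} and intensity >= 0.7:
        return "escalate to a human"
    if label in {"distress", "frustration", "negative affect"} and confidence >= 0.6:
        return "prioritize follow-up"
    return "continue as planned"

print(next_action({"label": "frustration", "intensity": 0.8, "confidence": 0.72}))
```

A rule like this is only defensible if the signal feeding it is consistent and comparable across systems, which today it is not.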
Until global standards catch up, the real opportunity is not just building better models. It is defining a consistent, evidence-backed layer that makes emotional signals comparable, explainable, and actionable.
In affective computing, the next breakthrough will not be smarter AI.
It will be trustworthy interpretation.
Written by
Stu Sjouwerman
Know what your customers feel. Not just what they say.
ReadingMinds conducts AI voice interviews that classify emotion type and intensity. Try a 3-minute Live Test Drive with Emma.