Our Methodology
How ReadingMinds detects, measures, and scores emotional signals from voice interviews
Speech-to-Speech Architecture
ReadingMinds uses a second-generation speech-to-speech model that processes audio voice-to-voice rather than voice-to-text-to-voice, as traditional ASR (Automatic Speech Recognition) pipelines do. This preserves emotional data that text-based systems lose.
First-generation platforms transcribe speech to text, run NLP on the text, and then synthesize a response. Each conversion strips away prosodic signals: the rhythm, tone, and energy that carry emotional meaning. Our speech-to-speech pipeline keeps these signals intact throughout the entire interview, enabling real-time emotional awareness and adaptive follow-up questions.
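To make the contrast concrete, here is a minimal Python sketch of the two pipelines. Every function and type below is a hypothetical stub for illustration; it is not ReadingMinds' actual code.

```python
# Hypothetical stubs contrasting the two architectures; not production code.
from dataclasses import dataclass

@dataclass
class Audio:
    samples: list[float]      # raw waveform; pitch, pace, and energy live here
    sample_rate: int = 16_000

def transcribe(audio: Audio) -> str:
    return "that's interesting"              # prosody is discarded at this hop

def run_nlp(text: str) -> str:
    return f"Tell me more about that: {text!r}"

def synthesize(text: str) -> Audio:
    return Audio(samples=[0.0])              # TTS re-adds generic prosody

def speech_to_speech_model(audio: Audio) -> Audio:
    return Audio(samples=audio.samples)      # audio in, audio out

def first_generation(audio: Audio) -> Audio:
    # voice -> text -> voice: emotional signal is stripped at each conversion
    return synthesize(run_nlp(transcribe(audio)))

def second_generation(audio: Audio) -> Audio:
    # voice -> voice: prosodic signal flows end to end
    return speech_to_speech_model(audio)
```

In the first-generation path, everything downstream of `transcribe()` sees only words; in the speech-to-speech path, the waveform, and the prosody it carries, reaches the model intact.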
Emotion Detection
ReadingMinds analyzes multiple layers of vocal expression to detect emotional states in real time:
Prosody analysis
Rhythm, intonation, melody, and pace are continuously measured. Changes in pitch contour or speaking rate often signal shifts in engagement, uncertainty, or enthusiasm.
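As a rough illustration of what prosody measurement involves, the sketch below extracts a pitch contour and a speaking-rate proxy using the open-source librosa library. It is a simplified stand-in, not ReadingMinds' production feature extractor.

```python
# Simplified prosody sketch using librosa; illustrative only.
import librosa
import numpy as np

def prosody_features(path: str) -> dict:
    y, sr = librosa.load(path, sr=16_000)

    # Pitch contour via probabilistic YIN; f0 is NaN on unvoiced frames.
    f0, voiced, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
    )
    pitch = f0[~np.isnan(f0)]

    # Onset density as a crude proxy for speaking rate (events per second).
    onsets = librosa.onset.onset_detect(y=y, sr=sr, units="time")
    duration = len(y) / sr

    return {
        "pitch_mean_hz": float(np.mean(pitch)) if pitch.size else 0.0,
        "pitch_range_hz": float(np.ptp(pitch)) if pitch.size else 0.0,
        "speaking_rate": len(onsets) / duration if duration else 0.0,
    }
```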
Energy levels
Vocal intensity and volume patterns reveal emotional activation. Higher energy typically correlates with excitement or frustration, while low energy may indicate disengagement or resignation.
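Vocal intensity can be approximated with short-time RMS energy, as in this hypothetical sketch (again using librosa, not ReadingMinds' actual code):

```python
# Frame-level loudness as a proxy for vocal activation; illustrative only.
import librosa
import numpy as np

def energy_profile(path: str) -> dict:
    y, sr = librosa.load(path, sr=16_000)
    rms = librosa.feature.rms(y=y)[0]       # one RMS value per short frame
    return {
        "energy_mean": float(rms.mean()),
        "energy_std": float(rms.std()),     # large swings suggest activation shifts
        "low_energy_ratio": float((rms < 0.5 * rms.mean()).mean()),
    }
```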
Hesitation markers
Pauses, filler words (um, uh), and speech restart patterns are tracked. Frequent hesitation often signals uncertainty, discomfort, or cognitive load around a topic.
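A toy version of hesitation tracking over a word-level transcript with timestamps might look like the following; the filler list, pause threshold, and restart heuristic are illustrative assumptions:

```python
# Toy hesitation scorer; markers and thresholds are invented for illustration.
FILLERS = {"um", "uh", "er", "hmm", "like"}

def hesitation_score(words: list[tuple[str, float, float]],
                     pause_threshold: float = 0.8) -> dict:
    """words: ordered (token, start_sec, end_sec) tuples."""
    fillers = sum(1 for w, _, _ in words if w.lower().strip(",.") in FILLERS)
    pauses = sum(                           # silent gaps between adjacent words
        1 for (_, _, end), (_, start, _) in zip(words, words[1:])
        if start - end > pause_threshold
    )
    restarts = sum(                         # crude repeated-word restart check
        1 for (a, _, _), (b, _, _) in zip(words, words[1:])
        if a.lower() == b.lower()
    )
    total = max(len(words), 1)
    return {"fillers": fillers, "pauses": pauses, "restarts": restarts,
            "rate": (fillers + pauses + restarts) / total}
```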
Affective computing
Prosody, energy, and hesitation signals are combined by affective computing models, which assign emotion tags to the respondent's speech throughout the interview.
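As a schematic of this fusion step, the rule-based sketch below maps the feature dictionaries from the earlier sketches to coarse emotion tags. Real affective computing models are learned from data; every threshold here is invented for illustration:

```python
# Hypothetical rule-based fusion; real affective models are learned, not rules.
def tag_emotions(prosody: dict, energy: dict, hesitation: dict) -> list[str]:
    tags = []
    if energy["energy_mean"] > 0.05 and prosody["pitch_range_hz"] > 80:
        tags.append("excited")              # high activation + lively pitch
    if hesitation["rate"] > 0.15:
        tags.append("uncertain")            # frequent fillers, pauses, restarts
    if energy["low_energy_ratio"] > 0.6:
        tags.append("disengaged")           # mostly quiet, flat delivery
    return tags or ["neutral"]
```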
Sentiment Scoring
ReadingMinds goes beyond simple positive/neutral/negative classification to deliver granular, context-aware sentiment analysis:
Granular sentiment
Instead of three buckets, sentiment is scored on a continuous scale, capturing subtle differences between mild approval and enthusiastic endorsement.
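For intuition, here is a hypothetical mapping from a continuous valence score in [-1, 1] to finer-grained labels; the band boundaries are illustrative, not ReadingMinds' actual scale:

```python
# Illustrative bands only; not ReadingMinds' actual sentiment scale.
def granular_label(valence: float) -> str:
    if valence >= 0.6:
        return "enthusiastic endorsement"
    if valence >= 0.2:
        return "mild approval"
    if valence >= -0.2:
        return "neutral"
    if valence >= -0.6:
        return "mild reservation"
    return "strong rejection"
```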
Contextual sentiment
The same words can carry different emotional weight depending on context. “That’s interesting” spoken with rising intonation and high energy signals genuine curiosity; spoken flatly, it may signal polite dismissal. ReadingMinds uses vocal context to disambiguate.
Word-voice disagreements
When the literal word content says one thing but the vocal delivery suggests another (e.g., saying “I love it” in a flat, disengaged tone), ReadingMinds flags the discrepancy for review.
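A minimal sketch of that flagging logic, assuming a text-polarity score and a vocally derived valence are both available on a -1 to 1 scale (the gap threshold is an invented example):

```python
# Word-voice disagreement sketch; inputs and threshold are assumptions.
from dataclasses import dataclass

@dataclass
class Utterance:
    text_polarity: float   # -1..1 from a text sentiment model
    vocal_valence: float   # -1..1 inferred from prosody and energy

def flag_disagreement(u: Utterance, gap: float = 0.8) -> bool:
    # "I love it" (+0.9) delivered flat and disengaged (-0.3) -> gap 1.2 -> flag
    return abs(u.text_polarity - u.vocal_valence) > gap

print(flag_disagreement(Utterance(text_polarity=0.9, vocal_valence=-0.3)))  # True
```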
PEP Score (Purchase Enthusiasm Potential)
The PEP Score is a proprietary composite metric that measures emotional energy combined with buying readiness:
What it measures
Emotional energy and buying readiness inferred from voice signals during the interview.
How it’s calculated
A composite of enthusiasm signals, hesitation markers, and engagement patterns. The algorithm weights vocal energy, consistency of positive emotion, absence of hesitation on key purchase-related questions, and overall engagement trajectory (a simplified sketch appears at the end of this section).
High PEP (> 0.7)
Strong purchase intent. The respondent shows genuine enthusiasm, consistent engagement, and minimal hesitation around buying-related topics.
Medium PEP (0.4 – 0.7)
Interested but uncertain. The respondent shows engagement but with notable hesitation, mixed signals, or unresolved concerns.
Low PEP (< 0.4)
Unconvinced or negative. The respondent shows low energy, frequent hesitation, or negative emotional signals around purchase intent.
Use cases
Renewal prediction, concept validation, pricing sensitivity testing, and identifying at-risk accounts before churn occurs.
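As referenced above, here is a simplified, hypothetical PEP-style composite. The actual formula and weights are proprietary; this sketch only mirrors the inputs and score bands described in this section:

```python
# Hypothetical PEP-style composite; weights are invented for illustration.
def pep_score(enthusiasm: float, hesitation: float, engagement: float,
              key_question_hesitation: float) -> tuple[float, str]:
    """All inputs normalized to 0..1; hesitation inputs act as penalties."""
    score = (0.35 * enthusiasm
             + 0.25 * engagement
             + 0.25 * (1.0 - key_question_hesitation)  # purchase questions weigh more
             + 0.15 * (1.0 - hesitation))
    if score > 0.7:
        band = "High: strong purchase intent"
    elif score >= 0.4:
        band = "Medium: interested but uncertain"
    else:
        band = "Low: unconvinced or negative"
    return round(score, 2), band

print(pep_score(enthusiasm=0.8, hesitation=0.2,
                engagement=0.9, key_question_hesitation=0.1))
# (0.85, 'High: strong purchase intent')
```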
Thematic Coding
ReadingMinds automatically identifies and organizes recurring themes across your interview data:
Automatic theme detection
As interviews complete, the system identifies recurring topics, concerns, and patterns across all respondents without manual coding.
Clustering and naming
Related mentions are clustered into themes and assigned descriptive names. The system surfaces both expected themes and emergent ones you may not have anticipated.
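One common way to implement this kind of clustering with open-source tools is embedding plus k-means, sketched below with sentence-transformers and scikit-learn. ReadingMinds' clustering and theme-naming models are proprietary; this shows the general shape only:

```python
# Embedding + k-means theme clustering sketch using open-source libraries.
from collections import defaultdict
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

def cluster_mentions(mentions: list[str], n_themes: int = 5) -> dict[int, list[str]]:
    model = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = model.encode(mentions)          # one vector per mention
    labels = KMeans(n_clusters=n_themes, n_init=10).fit_predict(embeddings)
    themes = defaultdict(list)
    for mention, label in zip(mentions, labels):
        themes[label].append(mention)
    return dict(themes)  # a separate naming step would label each cluster
```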
Quality Controls
Data quality is fundamental to trustworthy insights. ReadingMinds employs multiple layers of verification:
Anti-bot verification and liveness detection
The system verifies that a real human is speaking, filtering out automated or scripted responses.
Voice watermarking
Responses from Emma, the AI interviewer, carry an audio watermark in her synthesized voice, making it possible to distinguish the interviewer’s voice from the respondent’s.
Metadata and digital fingerprinting
Device metadata, session fingerprinting, and response pattern analysis identify duplicate or fraudulent participants.
Flagged and excluded interviews
Interviews that fail quality checks are automatically flagged. You can review flagged interviews and choose to exclude them from analysis, ensuring only high-quality data informs your insights.
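Schematically, the quality gate behaves like the sketch below, where each check is a stub standing in for the proprietary liveness, watermark, and fingerprinting systems described above:

```python
# Illustrative quality gate; each boolean stands in for a proprietary check.
from dataclasses import dataclass, field

@dataclass
class Interview:
    id: str
    passed_liveness: bool
    watermark_in_respondent_audio: bool   # AI voice where a human should be
    duplicate_fingerprint: bool
    flags: list[str] = field(default_factory=list)

def run_quality_checks(interview: Interview) -> Interview:
    if not interview.passed_liveness:
        interview.flags.append("failed liveness check / possible bot")
    if interview.watermark_in_respondent_audio:
        interview.flags.append("synthetic voice in respondent channel")
    if interview.duplicate_fingerprint:
        interview.flags.append("duplicate device/session fingerprint")
    return interview  # flagged interviews surface for review, not auto-deletion
```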
See it in action
Experience how ReadingMinds captures emotional signals with a free Live Test Drive.
Start 3‑Minute Live Test Drive