Our Methodology
How ReadingMinds detects, measures, and scores emotional signals from voice interviews
Speech-to-Speech Architecture
ReadingMinds uses a second-generation speech-to-speech model that processes audio voice-to-voice rather than voice-to-text-to-voice, as traditional ASR (Automatic Speech Recognition) pipelines do. This preserves emotional data that text-based systems lose.
First-generation platforms transcribe speech to text, run NLP on the text, and then synthesize a response. Each conversion strips away prosodic signals: the rhythm, tone, and energy that carry emotional meaning. Our speech-to-speech pipeline keeps these signals intact throughout the entire interview, enabling real-time emotional awareness and adaptive follow-up questions.
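To make the contrast concrete, here is a minimal Python sketch of the two pipelines. Every function and type below is a hypothetical stub for illustration; it is not ReadingMinds' actual code.

```python
# Hypothetical stubs contrasting the two architectures; not production code.
from dataclasses import dataclass

@dataclass
class Audio:
    samples: list[float]      # raw waveform; pitch, pace, and energy live here
    sample_rate: int = 16_000

def transcribe(audio: Audio) -> str:
    return "that's interesting"              # prosody is discarded at this hop

def run_nlp(text: str) -> str:
    return f"Tell me more about that: {text!r}"

def synthesize(text: str) -> Audio:
    return Audio(samples=[0.0])              # TTS re-adds generic prosody

def speech_to_speech_model(audio: Audio) -> Audio:
    return Audio(samples=audio.samples)      # audio in, audio out

def first_generation(audio: Audio) -> Audio:
    # voice -> text -> voice: emotional signal is stripped at each conversion
    return synthesize(run_nlp(transcribe(audio)))

def second_generation(audio: Audio) -> Audio:
    # voice -> voice: prosodic signal flows end to end
    return speech_to_speech_model(audio)
```

In the first-generation path, everything downstream of `transcribe()` sees only words; in the speech-to-speech path, the waveform, and the prosody it carries, reaches the model intact.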
Emotion Detection
ReadingMinds analyzes multiple layers of vocal expression to detect emotional states in real time:
Prosody analysis
Rhythm, intonation, melody, and pace are continuously measured. Changes in pitch contour or speaking rate often signal shifts in engagement, uncertainty, or enthusiasm.
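As a rough illustration of what prosody measurement involves, the sketch below extracts a pitch contour and a speaking-rate proxy using the open-source librosa library. It is a simplified stand-in, not ReadingMinds' production feature extractor.

```python
# Simplified prosody sketch using librosa; illustrative only.
import librosa
import numpy as np

def prosody_features(path: str) -> dict:
    y, sr = librosa.load(path, sr=16_000)

    # Pitch contour via probabilistic YIN; f0 is NaN on unvoiced frames.
    f0, voiced, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
    )
    pitch = f0[~np.isnan(f0)]

    # Onset density as a crude proxy for speaking rate (events per second).
    onsets = librosa.onset.onset_detect(y=y, sr=sr, units="time")
    duration = len(y) / sr

    return {
        "pitch_mean_hz": float(np.mean(pitch)) if pitch.size else 0.0,
        "pitch_range_hz": float(np.ptp(pitch)) if pitch.size else 0.0,
        "speaking_rate": len(onsets) / duration if duration else 0.0,
    }
```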
Energy levels
Vocal intensity and volume patterns reveal emotional activation. Higher energy typically correlates with excitement or frustration, while low energy may indicate disengagement or resignation.
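Vocal intensity can be approximated with short-time RMS energy, as in this hypothetical sketch (again using librosa, not ReadingMinds' actual code):

```python
# Frame-level loudness as a proxy for vocal activation; illustrative only.
import librosa
import numpy as np

def energy_profile(path: str) -> dict:
    y, sr = librosa.load(path, sr=16_000)
    rms = librosa.feature.rms(y=y)[0]       # one RMS value per short frame
    return {
        "energy_mean": float(rms.mean()),
        "energy_std": float(rms.std()),     # large swings suggest activation shifts
        "low_energy_ratio": float((rms < 0.5 * rms.mean()).mean()),
    }
```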
Hesitation markers
Pauses, filler words (um, uh), and speech restart patterns are tracked. Frequent hesitation often signals uncertainty, discomfort, or cognitive load around a topic.
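A toy version of hesitation tracking over a word-level transcript with timestamps might look like the following; the filler list, pause threshold, and restart heuristic are illustrative assumptions:

```python
# Toy hesitation scorer; markers and thresholds are invented for illustration.
FILLERS = {"um", "uh", "er", "hmm", "like"}

def hesitation_score(words: list[tuple[str, float, float]],
                     pause_threshold: float = 0.8) -> dict:
    """words: ordered (token, start_sec, end_sec) tuples."""
    fillers = sum(1 for w, _, _ in words if w.lower().strip(",.") in FILLERS)
    pauses = sum(                           # silent gaps between adjacent words
        1 for (_, _, end), (_, start, _) in zip(words, words[1:])
        if start - end > pause_threshold
    )
    restarts = sum(                         # crude repeated-word restart check
        1 for (a, _, _), (b, _, _) in zip(words, words[1:])
        if a.lower() == b.lower()
    )
    total = max(len(words), 1)
    return {"fillers": fillers, "pauses": pauses, "restarts": restarts,
            "rate": (fillers + pauses + restarts) / total}
```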
Affective computing
Prosody, energy, and hesitation signals are combined by affective computing models, which assign emotion tags to the respondent's speech throughout the interview.
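As a schematic of this fusion step, the rule-based sketch below maps the feature dictionaries from the earlier sketches to coarse emotion tags. Real affective computing models are learned from data; every threshold here is invented for illustration:

```python
# Hypothetical rule-based fusion; real affective models are learned, not rules.
def tag_emotions(prosody: dict, energy: dict, hesitation: dict) -> list[str]:
    tags = []
    if energy["energy_mean"] > 0.05 and prosody["pitch_range_hz"] > 80:
        tags.append("excited")              # high activation + lively pitch
    if hesitation["rate"] > 0.15:
        tags.append("uncertain")            # frequent fillers, pauses, restarts
    if energy["low_energy_ratio"] > 0.6:
        tags.append("disengaged")           # mostly quiet, flat delivery
    return tags or ["neutral"]
```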
Sentiment Scoring
ReadingMinds goes beyond simple positive/neutral/negative classification to deliver granular, context-aware sentiment analysis:
Granular sentiment
Instead of three buckets, sentiment is scored on a continuous scale, capturing subtle differences between mild approval and enthusiastic endorsement.
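For intuition, here is a hypothetical mapping from a continuous valence score in [-1, 1] to finer-grained labels; the band boundaries are illustrative, not ReadingMinds' actual scale:

```python
# Illustrative bands only; not ReadingMinds' actual sentiment scale.
def granular_label(valence: float) -> str:
    if valence >= 0.6:
        return "enthusiastic endorsement"
    if valence >= 0.2:
        return "mild approval"
    if valence >= -0.2:
        return "neutral"
    if valence >= -0.6:
        return "mild reservation"
    return "strong rejection"
```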
Contextual sentiment
The same words can carry different emotional weight depending on context. “That’s interesting” spoken with rising intonation and high energy signals genuine curiosity; spoken flatly, it may signal polite dismissal. ReadingMinds uses vocal context to disambiguate.
Word-voice disagreements
When the literal word content says one thing but the vocal delivery suggests another (e.g., saying “I love it” in a flat, disengaged tone), ReadingMinds flags the discrepancy for review.
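A minimal sketch of that flagging logic, assuming a text-polarity score and a vocally derived valence are both available on a -1 to 1 scale (the gap threshold is an invented example):

```python
# Word-voice disagreement sketch; inputs and threshold are assumptions.
from dataclasses import dataclass

@dataclass
class Utterance:
    text_polarity: float   # -1..1 from a text sentiment model
    vocal_valence: float   # -1..1 inferred from prosody and energy

def flag_disagreement(u: Utterance, gap: float = 0.8) -> bool:
    # "I love it" (+0.9) delivered flat and disengaged (-0.3) -> gap 1.2 -> flag
    return abs(u.text_polarity - u.vocal_valence) > gap

print(flag_disagreement(Utterance(text_polarity=0.9, vocal_valence=-0.3)))  # True
```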
PEP Score (Purchase Enthusiasm Potential)
The PEP Score is a proprietary composite metric that measures emotional energy combined with buying readiness:
What it measures
Emotional energy and buying readiness inferred from voice signals during the interview.
How it’s calculated
A composite of enthusiasm signals, hesitation markers, and engagement patterns. The algorithm weights vocal energy, consistency of positive emotion, absence of hesitation on key purchase-related questions, and overall engagement trajectory (a simplified sketch appears at the end of this section).
High PEP (> 0.7)
Strong purchase intent. The respondent shows genuine enthusiasm, consistent engagement, and minimal hesitation around buying-related topics.
Medium PEP (0.4 – 0.7)
Interested but uncertain. The respondent shows engagement but with notable hesitation, mixed signals, or unresolved concerns.
Low PEP (< 0.4)
Unconvinced or negative. The respondent shows low energy, frequent hesitation, or negative emotional signals around purchase intent.
Use cases
Renewal prediction, concept validation, pricing sensitivity testing, and identifying at-risk accounts before churn occurs.
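As referenced above, here is a simplified, hypothetical PEP-style composite. The actual formula and weights are proprietary; this sketch only mirrors the inputs and score bands described in this section:

```python
# Hypothetical PEP-style composite; weights are invented for illustration.
def pep_score(enthusiasm: float, hesitation: float, engagement: float,
              key_question_hesitation: float) -> tuple[float, str]:
    """All inputs normalized to 0..1; hesitation inputs act as penalties."""
    score = (0.35 * enthusiasm
             + 0.25 * engagement
             + 0.25 * (1.0 - key_question_hesitation)  # purchase questions weigh more
             + 0.15 * (1.0 - hesitation))
    if score > 0.7:
        band = "High: strong purchase intent"
    elif score >= 0.4:
        band = "Medium: interested but uncertain"
    else:
        band = "Low: unconvinced or negative"
    return round(score, 2), band

print(pep_score(enthusiasm=0.8, hesitation=0.2,
                engagement=0.9, key_question_hesitation=0.1))
# (0.85, 'High: strong purchase intent')
```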
Thematic Coding
ReadingMinds automatically identifies and organizes recurring themes across your interview data:
Automatic theme detection
As interviews complete, the system identifies recurring topics, concerns, and patterns across all respondents without manual coding.
Clustering and naming
Related mentions are clustered into themes and assigned descriptive names. The system surfaces both expected themes and emergent ones you may not have anticipated.
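One common way to implement this kind of clustering with open-source tools is embedding plus k-means, sketched below with sentence-transformers and scikit-learn. ReadingMinds' clustering and theme-naming models are proprietary; this shows the general shape only:

```python
# Embedding + k-means theme clustering sketch using open-source libraries.
from collections import defaultdict
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

def cluster_mentions(mentions: list[str], n_themes: int = 5) -> dict[int, list[str]]:
    model = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = model.encode(mentions)          # one vector per mention
    labels = KMeans(n_clusters=n_themes, n_init=10).fit_predict(embeddings)
    themes = defaultdict(list)
    for mention, label in zip(mentions, labels):
        themes[label].append(mention)
    return dict(themes)  # a separate naming step would label each cluster
```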
Quality Controls
Data quality is fundamental to trustworthy insights. ReadingMinds employs multiple layers of verification:
Anti-bot verification and liveness detection
The system verifies that a real human is speaking, filtering out automated or scripted responses.
Voice watermarking
Responses from Emma, the AI interviewer, carry an audio watermark in her synthesized voice, making it possible to distinguish the interviewer’s voice from the respondent’s.
Metadata and digital fingerprinting
Device metadata, session fingerprinting, and response pattern analysis identify duplicate or fraudulent participants.
Flagged and excluded interviews
Interviews that fail quality checks are automatically flagged. You can review flagged interviews and choose to exclude them from analysis, ensuring only high-quality data informs your insights.
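Schematically, the quality gate behaves like the sketch below, where each check is a stub standing in for the proprietary liveness, watermark, and fingerprinting systems described above:

```python
# Illustrative quality gate; each boolean stands in for a proprietary check.
from dataclasses import dataclass, field

@dataclass
class Interview:
    id: str
    passed_liveness: bool
    watermark_in_respondent_audio: bool   # AI voice where a human should be
    duplicate_fingerprint: bool
    flags: list[str] = field(default_factory=list)

def run_quality_checks(interview: Interview) -> Interview:
    if not interview.passed_liveness:
        interview.flags.append("failed liveness check / possible bot")
    if interview.watermark_in_respondent_audio:
        interview.flags.append("synthetic voice in respondent channel")
    if interview.duplicate_fingerprint:
        interview.flags.append("duplicate device/session fingerprint")
    return interview  # flagged interviews surface for review, not auto-deletion
```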
See it in action
Experience how ReadingMinds captures emotional signals with a free Live Test Drive.
Start 3‑Minute Live Test Drive