Teaching AI to perceive human state

A simple explanation of the technology that gives AI systems the ability to understand how people actually feel — not just what they type.

Today's AI is blind

Imagine talking to a friend through text-only messages — no voice, no video, no facial expressions. They write "I'm fine", but they're actually crying. You'd never know.

That's how all AI works today. ChatGPT, Claude, every chatbot — they see only text. They don't know if you're happy, exhausted, or on the edge of burnout.

The result: AI can say something tone-deaf or even harmful, because it has no idea how you're actually feeling.
Without state awareness:
User: "I'm fine, just tired"
AI: "Great! Here are 5 more tasks for today."
What the AI doesn't see: focus declining, fatigue rising, a session running 4+ hours.

Why AI needs to understand people

Healthcare assistant

Without perception

Patient says "it doesn't hurt," but their face is tense with pain. The system takes them at their word.

With state awareness

System detects facial tension and elevated stress signals: "I can see you're uncomfortable. Let's adjust your care plan."

Adaptive tutor

Without perception

Student nods along, but their eyes are blank. The system keeps going while comprehension drops to zero.

With state awareness

System notices confusion in the interaction pattern: "Let me explain this differently, with a visual example."

AI companion

Without perception

User writes "I'm having fun," but their typing is slow and fragmented. The system misses the disconnect.

With state awareness

System detects low energy and hesitation: "It seems like you're having a rough time. I'm here if you want to talk."

Factory floor

A worker is fatigued after a long shift — reactions are slower, attention drifts. A state-aware robot adjusts its speed near the worker to prevent accidents.

Driver monitoring

A driver starts to doze off — gaze drops, micro-pauses lengthen. The system detects the drift and triggers an alert before it becomes dangerous.

Three signal channels

We teach systems to perceive human state through the same signals a perceptive colleague would notice — just faster and more consistently.

Visual signals

Facial expression tracking — brow tension, gaze direction, blink rate, micro-expressions — analyzed locally on device.

Voice signals

How you speak matters more than what you say — pitch, pace, pauses, tremor, and volume shifts reveal stress, fatigue, and engagement.

Behavioral signals

Typing rhythm, mouse dynamics, error rate, micro-pauses, context switching — continuous streams of interaction data that encode cognitive state.
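To make this concrete, here is a minimal sketch of what one few-second window of these three channels might look like as data. The field names, units, and structure are illustrative assumptions, not Synstate's actual schema; in practice each channel would be computed from the raw camera, microphone, and input streams.

```python
# A minimal sketch (assumed field names and units, not the actual schema):
# one few-second window of signals from the three channels.
from dataclasses import dataclass

@dataclass
class VisualSignals:
    brow_tension: float            # 0..1, estimated from facial landmarks
    gaze_offset_deg: float         # gaze direction relative to screen center
    blink_rate_hz: float
    micro_expression_score: float

@dataclass
class VoiceSignals:
    pitch_hz: float
    speech_rate_wpm: float
    pause_ratio: float             # fraction of the window spent silent
    tremor_index: float
    volume_db: float

@dataclass
class BehaviorSignals:
    typing_speed_wpm: float
    error_rate: float              # corrections per 100 keystrokes
    micro_pause_count: int
    context_switches: int          # window/app switches during the window

@dataclass
class SignalWindow:
    """All three channels summarized over the same few-second window."""
    visual: VisualSignals
    voice: VoiceSignals
    behavior: BehaviorSignals
```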

Everyone is different

A smile in Japan and a smile in Italy can mean different things. Your baseline is different from your colleague's. The system accounts for this through three layers.

Layer 1: Universal

Some patterns are the same for everyone: fear widens eyes, sadness drops lip corners, frustration tightens the jaw. These work across cultures.

Fear, sadness, joy: recognized everywhere

Layer 2: Cultural

Different cultures express emotions differently. Some smile when embarrassed, some gesticulate intensely, some speak quietly even when happy.

The system knows: "restraint is normal in this context"

Layer 3: Individual

Every person is unique. Some always type fast, some always speak softly. The system learns your normal behavior and detects deviations from it.

"Usually types 60 wpm. Today: 30 wpm. Something changed."

The system learns over time

Day 1

Only universal rules: general patterns of stress, fatigue, engagement.

Week 1

Learning your rhythm: when you get tired, how you type when focused.

Month 1

Knows you personally: your stress patterns, fatigue curves, peak hours.
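One common way such a baseline can be built up over days of use is with running averages that slowly absorb each new observation. The sketch below uses exponentially weighted means and variances; the constants and the "ready" heuristic are made-up assumptions, shown only to illustrate the idea of gradual personalization.

```python
# Sketch of a personal baseline accumulating over time, using exponentially
# weighted running statistics. Constants are illustrative assumptions.
class PersonalBaseline:
    def __init__(self, alpha: float = 0.05):
        self.alpha = alpha                     # small alpha = slow, stable learning
        self.mean: dict[str, float] = {}
        self.var: dict[str, float] = {}
        self.samples = 0

    def update(self, features: dict[str, float]) -> None:
        """Fold one observation window into the running per-feature baseline."""
        self.samples += 1
        for name, value in features.items():
            if name not in self.mean:
                self.mean[name], self.var[name] = value, 0.0
                continue
            delta = value - self.mean[name]
            self.mean[name] += self.alpha * delta
            self.var[name] = (1 - self.alpha) * (self.var[name] + self.alpha * delta ** 2)

    @property
    def ready(self) -> bool:
        # Roughly a week of regular use before deviations are trusted.
        return self.samples > 2000
```

With a small alpha, one unusual day barely moves the averages, which is why it takes time before deviations from the baseline become meaningful.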

How the layers work together

  • Person smiles. Universal: usually happiness. Cultural: in Japan, may signal embarrassment. Individual: this user smiles even when sad.
  • Speaks quietly. Universal: may signal sadness. Cultural: in Finland, normal. Individual: this user is always quiet; that's their style.
  • Types slowly. Universal: may signal fatigue. Individual: 20 wpm is normal for one person but a red flag for another.
  • Long pauses in speech. Universal: may signal deep thought. Cultural: in Japan, a sign of respect. Individual: this user pauses when frustrated.
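To show how the layers could interact in code, here is a toy version of the "speaks quietly" row. The function, thresholds, and numbers are invented for illustration; the real system learns this behavior rather than following hand-written rules.

```python
# Toy illustration of the three layers; thresholds and numbers are invented.
def interpret(signal: str, universal_score: float, cultural_prior: float,
              personal_mean: float, personal_std: float, observed: float) -> str:
    # Layer 1: universal reading, e.g. "a quiet voice may signal sadness".
    # Layer 2: discount it if the behavior is ordinary in this cultural context.
    adjusted = universal_score * cultural_prior
    # Layer 3: how unusual is this value for this particular person?
    deviation = abs(observed - personal_mean) / max(personal_std, 1e-6)
    if deviation < 1.0:
        return f"{signal}: within this person's normal range, no flag"
    if adjusted < 0.3:
        return f"{signal}: unusual for them, but expected in this context"
    return f"{signal}: unusual for them and meaningful here, flag it"

# A user who is always quiet speaks at their usual volume: nothing is flagged,
# even though a quiet voice could universally signal sadness.
print(interpret("quiet voice", universal_score=0.7, cultural_prior=0.4,
                personal_mean=45.0, personal_std=5.0, observed=44.0))
```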

Five steps from signal to understanding

Step 1: Collect interaction signals

Like a detective gathering clues, the system continuously collects signals about your state from multiple channels:

  • Face: 68 facial landmarks tracked 30 times per second
  • Voice: pitch, speed, pauses, tremor
  • Behavior: typing speed, pauses, error rate
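A toy capture loop, assuming the rates above: the face sampled about 30 times per second, with voice and behavior summarized once per second. The read_* arguments are placeholders for whatever actually reads the camera, microphone, and keyboard.

```python
# Toy capture loop; rates are taken from the text, sensor functions are
# placeholders supplied by the caller.
import time

def capture_window(read_face, read_voice, read_behavior, seconds: float = 5.0):
    """Collect raw per-channel readings for one short window."""
    face_frames, voice_stats, behavior_stats = [], [], []
    start = time.monotonic()
    next_face, next_slow = start, start
    while (now := time.monotonic()) - start < seconds:
        if now >= next_face:                     # ~30 Hz
            face_frames.append(read_face())      # e.g. 68 (x, y) landmarks
            next_face += 1 / 30
        if now >= next_slow:                     # 1 Hz summaries
            voice_stats.append(read_voice())
            behavior_stats.append(read_behavior())
            next_slow += 1.0
        time.sleep(0.005)
    return face_frames, voice_stats, behavior_stats
```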
Step 2: Convert to numerical representations

Computers don't understand images or sounds directly. We convert every signal into compact numerical vectors — like describing a painting with numbers.

Example: smile + rapid typing + steady voice → [0.82, 0.15, 0.03, ...], a 512-dimensional vector.
Analogy: You see "red," the computer sees #FF0000. Same thing, different language.
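A minimal sketch of that conversion: a handful of named measurements flattened into one fixed-length vector. The feature names and the 512-dimension size are used only as illustration; a real system would use a learned encoder rather than simple flattening.

```python
# Sketch: named measurements flattened into a fixed-length vector.
# Feature names and the 512 dimensions are illustrative assumptions.
import numpy as np

def to_vector(window: dict[str, float], dim: int = 512) -> np.ndarray:
    """Turn one signal window into a fixed-size vector of floats."""
    keys = sorted(window)                          # stable feature order
    vec = np.zeros(dim, dtype=np.float32)
    vec[: len(keys)] = [window[k] for k in keys]
    return vec

window = {"smile_intensity": 0.82, "typing_speed_norm": 0.15, "voice_tremor": 0.03}
print(to_vector(window)[:5])    # [0.82 0.15 0.03 0.   0.  ]
```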
Step 3: Fuse signals together

Any single signal can be misleading. But combined, they paint an accurate picture.

Example: smiling + voice shaking + erratic typing → hidden stress. The smile is masking tension.
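Here is a deliberately simple fusion sketch in which each channel contributes a score and disagreement between channels is itself informative. The thresholds and labels are invented; real fusion would be a learned model over the combined vectors.

```python
# Simple late fusion: each channel gives a 0..1 score, and disagreement
# between channels is itself a signal. Thresholds and labels are invented.
def fuse(face_calm: float, voice_stress: float, typing_stress: float) -> str:
    stress_votes = sum(score > 0.6 for score in (voice_stress, typing_stress))
    if face_calm > 0.6 and stress_votes == 2:
        # Channels disagree: a calm face on top of a stressed voice and typing.
        return "hidden stress (the smile is masking tension)"
    if stress_votes == 2:
        return "overt stress"
    return "no stress detected"

print(fuse(face_calm=0.8, voice_stress=0.7, typing_stress=0.75))
# hidden stress (the smile is masking tension)
```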
Step 4: Compare against your personal baseline

The system remembers how you normally behave and notices when something changes.

What the system remembers

  • Your usual typing speed
  • How often you take pauses
  • Your typical expression during work
  • Your normal voice pace and tone
  • What time of day you get fatigued

What the system detects

  • "Typing 40% slower than usual today"
  • "3x more errors than baseline"
  • "Voice quieter than normal"
  • "More tense expression than yesterday"
  • "Working later than usual pattern"
Analogy: Like a close friend who knows you're not okay even when you say "I'm fine" — because they know what your normal looks like.
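A small sketch of the comparison itself: today's readings turned into deviations from the stored baseline and flagged when they drift far enough. The feature names, numbers, and threshold are illustrative only.

```python
# Sketch of baseline comparison: today's readings expressed as deviations
# from the stored baseline. Names, numbers, and threshold are illustrative.
def deviations(today: dict[str, float], base_mean: dict[str, float],
               base_std: dict[str, float], threshold: float = 2.0) -> list[str]:
    flags = []
    for name, value in today.items():
        mean, std = base_mean[name], max(base_std[name], 1e-6)
        z = (value - mean) / std
        if abs(z) >= threshold:
            pct = 100 * (value - mean) / mean if mean else 0.0
            flags.append(f"{name}: {pct:+.0f}% vs. usual (z = {z:+.1f})")
    return flags

print(deviations({"typing_wpm": 36.0, "errors_per_100": 6.0},
                 {"typing_wpm": 60.0, "errors_per_100": 2.0},
                 {"typing_wpm": 8.0, "errors_per_100": 1.0}))
# ['typing_wpm: -40% vs. usual (z = -3.0)', 'errors_per_100: +200% vs. usual (z = +4.0)']
```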
Step 5: AI adapts its response

Now AI sees not just what you typed, but how you're actually doing — and can respond accordingly.

Blind AI (today):
User: "Everything's fine"
AI: "Great! What can I help you with?"
What the AI doesn't see: fatigue rising, focus declining.

State-aware AI (Synstate):
User: "Everything's fine"
AI: "You've been going for a while and your pattern suggests fatigue. Want me to simplify things or suggest a break?"
What the AI sees: face tense, voice quieter than normal, typing slower than baseline.
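Finally, a sketch of how the detected state could steer the reply. In this toy version the state is just a dictionary of scores and the responses are canned strings; in a real assistant the state might instead be passed as extra context to the model. The labels and thresholds are invented.

```python
# Toy adaptation step: detected state (a dict of 0..1 scores) changes how
# the assistant replies. Labels, thresholds, and replies are illustrative.
def adapt_response(user_message: str, state: dict[str, float]) -> str:
    if state.get("fatigue", 0.0) > 0.7:
        return ("You've been going for a while and your pattern suggests "
                "fatigue. Want me to simplify things or suggest a break?")
    if state.get("stress", 0.0) > 0.7:
        return "This seems stressful. Want to tackle the most urgent piece first?"
    return "Great! What can I help you with?"

print(adapt_response("Everything's fine", {"fatigue": 0.85, "stress": 0.40}))
```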

Key technical facts

80%+ accuracy

The system correctly identifies state 8 out of 10 times — better than an average stranger (60-70%), approaching the accuracy of a close colleague.

Under 100ms latency

Analysis happens in real time — fast enough that the response feels instant and natural, not delayed.

Fully on-device

No video or audio leaves your device. All processing runs locally. Your personal baseline stays on your machine.

1 week to personalize

After one week the system knows your patterns well enough to detect deviations. After a month, it understands you like a close colleague.

Why this matters

Safety

When AI detects that someone is in distress — depression, extreme stress, burnout trajectory — it can offer support instead of continuing business as usual.

Understanding

AI becomes more human. It can offer support when you're struggling and match your energy when things are going well.

Burnout prevention

The system detects fatigue before the person themselves is aware of it, and can suggest breaks or reduce cognitive load.

Safer human-machine interaction

Robots in hospitals, classrooms, and factories can respond to people's real state — not just their words.

We give AI eyes to read faces, ears to hear voice, awareness to sense behavior, and memory to know your baseline — so it understands people like a close colleague, not a machine.