Teaching AI to perceive human state

A simple explanation of the technology that gives AI systems the ability to understand how people actually feel — not just what they type.

Today's AI is blind

Imagine talking to a friend through text-only messages — no voice, no video, no facial expressions. They write "I'm fine", but they're actually crying. You'd never know.

That's how all AI works today. ChatGPT, Claude, every chatbot — they see only text. They don't know if you're happy, exhausted, or on the edge of burnout.

The result: AI can say something tone-deaf or even harmful, because it has no idea how you're actually feeling.
Without state awareness:
User: "I'm fine, just tired"
AI: "Great! Here are 5 more tasks for today."
What the AI doesn't see: focus declining, fatigue rising, a session running 4+ hours.

Why AI needs to understand people

Healthcare assistant

Without perception

Patient says "it doesn't hurt," but their face is tense with pain. The system takes them at their word.

With state awareness

System detects facial tension and elevated stress signals: "I can see you're uncomfortable. Let's adjust your care plan."

Adaptive tutor

Without perception

Student nods along, but their eyes are blank. The system keeps going while comprehension drops to zero.

With state awareness

System notices confusion in the interaction pattern: "Let me explain this differently, with a visual example."

AI companion

Without perception

User writes "I'm having fun," but their typing is slow and fragmented. The system misses the disconnect.

With state awareness

System detects low energy and hesitation: "It seems like you're having a rough time. I'm here if you want to talk."

Factory floor

A worker is fatigued after a long shift — reactions are slower, attention drifts. A state-aware robot adjusts its speed near the worker to prevent accidents.

Driver monitoring

A driver starts to doze off — gaze drops, micro-pauses lengthen. The system detects the drift and triggers an alert before it becomes dangerous.

Three signal channels

We teach systems to perceive human state through the same signals a perceptive colleague would notice — just faster and more consistently.

Visual signals

Facial expression tracking — brow tension, gaze direction, blink rate, micro-expressions — analyzed locally on device.

Voice signals

How you speak matters more than what you say — pitch, pace, pauses, tremor, and volume shifts reveal stress, fatigue, and engagement.

Behavioral signals

Typing rhythm, mouse dynamics, error rate, micro-pauses, context switching — continuous streams of interaction data that encode cognitive state.
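To make this concrete, here is a minimal sketch of what one few-second window of these three channels might look like as data. The field names, units, and structure are illustrative assumptions, not Synstate's actual schema; in practice each channel would be computed from the raw camera, microphone, and input streams.

```python
# A minimal sketch (assumed field names and units, not the actual schema):
# one few-second window of signals from the three channels.
from dataclasses import dataclass

@dataclass
class VisualSignals:
    brow_tension: float            # 0..1, estimated from facial landmarks
    gaze_offset_deg: float         # gaze direction relative to screen center
    blink_rate_hz: float
    micro_expression_score: float

@dataclass
class VoiceSignals:
    pitch_hz: float
    speech_rate_wpm: float
    pause_ratio: float             # fraction of the window spent silent
    tremor_index: float
    volume_db: float

@dataclass
class BehaviorSignals:
    typing_speed_wpm: float
    error_rate: float              # corrections per 100 keystrokes
    micro_pause_count: int
    context_switches: int          # window/app switches during the window

@dataclass
class SignalWindow:
    """All three channels summarized over the same few-second window."""
    visual: VisualSignals
    voice: VoiceSignals
    behavior: BehaviorSignals
```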

Everyone is different

A smile in Japan and a smile in Italy can mean different things. Your baseline is different from your colleague's. The system accounts for this through three layers.

Layer 1: Universal

Some patterns are the same for everyone: fear widens eyes, sadness drops lip corners, frustration tightens the jaw. These work across cultures.

Fear, sadness, joy: recognized everywhere

Layer 2: Cultural

Different cultures express emotions differently. Some smile when embarrassed, some gesticulate intensely, some speak quietly even when happy.

The system knows: "restraint is normal in this context"

Layer 3: Individual

Every person is unique. Some always type fast, some always speak softly. The system learns your normal behavior and detects deviations from it.

"Usually types 60 wpm. Today: 30 wpm. Something changed."

The system learns over time

Day 1

Only universal rules: general patterns of stress, fatigue, engagement.

Week 1

Learning your rhythm: when you get tired, how you type when focused.

Month 1

Knows you personally: your stress patterns, fatigue curves, peak hours.
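One common way such a baseline can be built up over days of use is with running averages that slowly absorb each new observation. The sketch below uses exponentially weighted means and variances; the constants and the "ready" heuristic are made-up assumptions, shown only to illustrate the idea of gradual personalization.

```python
# Sketch of a personal baseline accumulating over time, using exponentially
# weighted running statistics. Constants are illustrative assumptions.
class PersonalBaseline:
    def __init__(self, alpha: float = 0.05):
        self.alpha = alpha                     # small alpha = slow, stable learning
        self.mean: dict[str, float] = {}
        self.var: dict[str, float] = {}
        self.samples = 0

    def update(self, features: dict[str, float]) -> None:
        """Fold one observation window into the running per-feature baseline."""
        self.samples += 1
        for name, value in features.items():
            if name not in self.mean:
                self.mean[name], self.var[name] = value, 0.0
                continue
            delta = value - self.mean[name]
            self.mean[name] += self.alpha * delta
            self.var[name] = (1 - self.alpha) * (self.var[name] + self.alpha * delta ** 2)

    @property
    def ready(self) -> bool:
        # Roughly a week of regular use before deviations are trusted.
        return self.samples > 2000
```

With a small alpha, one unusual day barely moves the averages, which is why it takes time before deviations from the baseline become meaningful.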

How the layers work together

  • Person smiles. Universal: usually happiness. Cultural: in Japan, may signal embarrassment. Individual: this user smiles even when sad.
  • Speaks quietly. Universal: may signal sadness. Cultural: in Finland, normal. Individual: this user is always quiet; that's their style.
  • Types slowly. Universal: may signal fatigue. Individual: 20 wpm is normal for one person but a red flag for another.
  • Long pauses in speech. Universal: may signal deep thought. Cultural: in Japan, a sign of respect. Individual: this user pauses when frustrated.
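To show how the layers could interact in code, here is a toy version of the "speaks quietly" row. The function, thresholds, and numbers are invented for illustration; the real system learns this behavior rather than following hand-written rules.

```python
# Toy illustration of the three layers; thresholds and numbers are invented.
def interpret(signal: str, universal_score: float, cultural_prior: float,
              personal_mean: float, personal_std: float, observed: float) -> str:
    # Layer 1: universal reading, e.g. "a quiet voice may signal sadness".
    # Layer 2: discount it if the behavior is ordinary in this cultural context.
    adjusted = universal_score * cultural_prior
    # Layer 3: how unusual is this value for this particular person?
    deviation = abs(observed - personal_mean) / max(personal_std, 1e-6)
    if deviation < 1.0:
        return f"{signal}: within this person's normal range, no flag"
    if adjusted < 0.3:
        return f"{signal}: unusual for them, but expected in this context"
    return f"{signal}: unusual for them and meaningful here, flag it"

# A user who is always quiet speaks at their usual volume: nothing is flagged,
# even though a quiet voice could universally signal sadness.
print(interpret("quiet voice", universal_score=0.7, cultural_prior=0.4,
                personal_mean=45.0, personal_std=5.0, observed=44.0))
```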

Five steps from signal to understanding

Step 1: Collect interaction signals

Like a detective gathering clues, the system continuously collects signals about your state from multiple channels:

  • Face: 68 facial landmarks tracked 30 times per second
  • Voice: pitch, speed, pauses, tremor
  • Behavior: typing speed, pauses, error rate
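A toy capture loop, assuming the rates above: the face sampled about 30 times per second, with voice and behavior summarized once per second. The read_* arguments are placeholders for whatever actually reads the camera, microphone, and keyboard.

```python
# Toy capture loop; rates are taken from the text, sensor functions are
# placeholders supplied by the caller.
import time

def capture_window(read_face, read_voice, read_behavior, seconds: float = 5.0):
    """Collect raw per-channel readings for one short window."""
    face_frames, voice_stats, behavior_stats = [], [], []
    start = time.monotonic()
    next_face, next_slow = start, start
    while (now := time.monotonic()) - start < seconds:
        if now >= next_face:                     # ~30 Hz
            face_frames.append(read_face())      # e.g. 68 (x, y) landmarks
            next_face += 1 / 30
        if now >= next_slow:                     # 1 Hz summaries
            voice_stats.append(read_voice())
            behavior_stats.append(read_behavior())
            next_slow += 1.0
        time.sleep(0.005)
    return face_frames, voice_stats, behavior_stats
```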
Step 2: Convert to numerical representations

Computers don't understand images or sounds directly. We convert every signal into compact numerical vectors — like describing a painting with numbers.

Example: smile + rapid typing + steady voice → [0.82, 0.15, 0.03, ...], a 512-dimensional vector.
Analogy: You see "red," the computer sees #FF0000. Same thing, different language.
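A minimal sketch of that conversion: a handful of named measurements flattened into one fixed-length vector. The feature names and the 512-dimension size are used only as illustration; a real system would use a learned encoder rather than simple flattening.

```python
# Sketch: named measurements flattened into a fixed-length vector.
# Feature names and the 512 dimensions are illustrative assumptions.
import numpy as np

def to_vector(window: dict[str, float], dim: int = 512) -> np.ndarray:
    """Turn one signal window into a fixed-size vector of floats."""
    keys = sorted(window)                          # stable feature order
    vec = np.zeros(dim, dtype=np.float32)
    vec[: len(keys)] = [window[k] for k in keys]
    return vec

window = {"smile_intensity": 0.82, "typing_speed_norm": 0.15, "voice_tremor": 0.03}
print(to_vector(window)[:5])    # [0.82 0.15 0.03 0.   0.  ]
```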
Step 3: Fuse signals together

Any single signal can be misleading. But combined, they paint an accurate picture.

Example: smiling + voice shaking + erratic typing → hidden stress. The smile is masking tension.
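Here is a deliberately simple fusion sketch in which each channel contributes a score and disagreement between channels is itself informative. The thresholds and labels are invented; real fusion would be a learned model over the combined vectors.

```python
# Simple late fusion: each channel gives a 0..1 score, and disagreement
# between channels is itself a signal. Thresholds and labels are invented.
def fuse(face_calm: float, voice_stress: float, typing_stress: float) -> str:
    stress_votes = sum(score > 0.6 for score in (voice_stress, typing_stress))
    if face_calm > 0.6 and stress_votes == 2:
        # Channels disagree: a calm face on top of a stressed voice and typing.
        return "hidden stress (the smile is masking tension)"
    if stress_votes == 2:
        return "overt stress"
    return "no stress detected"

print(fuse(face_calm=0.8, voice_stress=0.7, typing_stress=0.75))
# hidden stress (the smile is masking tension)
```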
Step 4: Compare against your personal baseline

The system remembers how you normally behave and notices when something changes.

What the system remembers

  • Your usual typing speed
  • How often you take pauses
  • Your typical expression during work
  • Your normal voice pace and tone
  • What time of day you get fatigued

What the system detects

  • "Typing 40% slower than usual today"
  • "3x more errors than baseline"
  • "Voice quieter than normal"
  • "More tense expression than yesterday"
  • "Working later than usual pattern"
Analogy: Like a close friend who knows you're not okay even when you say "I'm fine" — because they know what your normal looks like.
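A small sketch of the comparison itself: today's readings turned into deviations from the stored baseline and flagged when they drift far enough. The feature names, numbers, and threshold are illustrative only.

```python
# Sketch of baseline comparison: today's readings expressed as deviations
# from the stored baseline. Names, numbers, and threshold are illustrative.
def deviations(today: dict[str, float], base_mean: dict[str, float],
               base_std: dict[str, float], threshold: float = 2.0) -> list[str]:
    flags = []
    for name, value in today.items():
        mean, std = base_mean[name], max(base_std[name], 1e-6)
        z = (value - mean) / std
        if abs(z) >= threshold:
            pct = 100 * (value - mean) / mean if mean else 0.0
            flags.append(f"{name}: {pct:+.0f}% vs. usual (z = {z:+.1f})")
    return flags

print(deviations({"typing_wpm": 36.0, "errors_per_100": 6.0},
                 {"typing_wpm": 60.0, "errors_per_100": 2.0},
                 {"typing_wpm": 8.0, "errors_per_100": 1.0}))
# ['typing_wpm: -40% vs. usual (z = -3.0)', 'errors_per_100: +200% vs. usual (z = +4.0)']
```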
Step 5: AI adapts its response

Now AI sees not just what you typed, but how you're actually doing — and can respond accordingly.

Blind AI (today):
User: "Everything's fine"
AI: "Great! What can I help you with?"
What the AI doesn't see: fatigue rising, focus declining.

State-aware AI (Synstate):
User: "Everything's fine"
AI: "You've been going for a while and your pattern suggests fatigue. Want me to simplify things or suggest a break?"
What the AI sees: face tense, voice quieter than normal, typing slower than baseline.
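Finally, a sketch of how the detected state could steer the reply. In this toy version the state is just a dictionary of scores and the responses are canned strings; in a real assistant the state might instead be passed as extra context to the model. The labels and thresholds are invented.

```python
# Toy adaptation step: detected state (a dict of 0..1 scores) changes how
# the assistant replies. Labels, thresholds, and replies are illustrative.
def adapt_response(user_message: str, state: dict[str, float]) -> str:
    if state.get("fatigue", 0.0) > 0.7:
        return ("You've been going for a while and your pattern suggests "
                "fatigue. Want me to simplify things or suggest a break?")
    if state.get("stress", 0.0) > 0.7:
        return "This seems stressful. Want to tackle the most urgent piece first?"
    return "Great! What can I help you with?"

print(adapt_response("Everything's fine", {"fatigue": 0.85, "stress": 0.40}))
```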

Key technical facts

80%+ accuracy

The system correctly identifies state 8 out of 10 times — better than an average stranger (60-70%), approaching the accuracy of a close colleague.

Under 100ms latency

Analysis happens in real time — fast enough that the response feels instant and natural, not delayed.

Fully on-device

No video or audio leaves your device. All processing runs locally. Your personal baseline stays on your machine.

1 week to personalize

After one week the system knows your patterns well enough to detect deviations. After a month, it understands you like a close colleague.

Why this matters

Safety

When AI detects that someone is in distress — depression, extreme stress, burnout trajectory — it can offer support instead of continuing business as usual.

Understanding

AI becomes more human. It can offer support when you're struggling and match your energy when things are going well.

Burnout prevention

The system detects fatigue before the person themselves is aware of it, and can suggest breaks or reduce cognitive load.

Safer human-machine interaction

Robots in hospitals, classrooms, and factories can respond to people's real state — not just their words.

We give AI eyes to read faces, ears to hear voice, awareness to sense behavior, and memory to know your baseline — so it understands people like a close colleague, not a machine.