Qualia-1 for agents · research preview

Your agents are brilliant strangers.

Qualia-1 turns them into colleagues who've worked with you for years. A behavioral context engine learns how you and your organization actually work — and feeds that context to any agent. Same model, same prompt. The generic mistakes stop.

See it think ↓ Join the waitlist Closed beta · selected teams
ch: preferences · org-knowledge · work-state · history injected per turn · bidirectional · live
01 / The problem

Generic agents waste your time — measurably.

Frontier models reason well. What they don't have is you: your preferences, your org, your history. Published research puts numbers on what that costs.

<10%

Preference-following accuracy of frontier LLMs by turn 10 of a conversation. They hear your preference — then lose it within ~3k tokens.

PrefEval · Zhao et al. · ICLR 2025published research
41.4%

Best frontier-model accuracy on MultiChallenge multi-turn instruction following. Every leading model scores below 50%.

Scale AI · arXiv:2501.17399published research
34%

Best frontier-model success on τ²-bench Telecom — the hardest dual-control domain, where agent and user share control. Most failures are simply not knowing what the user has already done.

τ²-bench · Telecom · Sierra, arXiv:2506.07982published research
02 / How it works

Watch. Learn. Inject.

Qualia-1 runs on-device next to your work. It observes how work unfolds, distills it into structured facts with provenance, and injects exactly the right ones into any agent — per turn, in real time.

sessions data focus · window switches session structure docs · threads · code org & CRM sync privacy-preserving tokenize · embed QUALIA-1 structured context `preferences` `work-state` `org-facts` `history` `agent-policy` ANY AGENT raw signals are tokenized on-device · the model never sees raw data
signal exchangephase-lock · forming context
[ 01 ]

Watch

Live behavioral signals — focus, window switches, session structure — plus the artifacts of work: code, docs, threads, CRM. Raw content never leaves the device.

[ 02 ]

Learn

Patterns become facts with provenance and confidence — “prefers pnpm · 134 sessions · conf 0.98” — kept fresh, decayed when stale, inspectable and deletable by you.

`focused``debugging``reviewing``blocked``fragmented`
[ 03 ]

Inject

Each agent turn receives the few facts that change the outcome — alongside its existing prompt, memory and tools. The agent decides; Qualia-1 supplies who it's deciding for.

`respond``wait``ask``act``shorten`
03 / Live cases

Same prompt. Same model. Different agent.

Four agents, replayed twice: once generic, once with Qualia-1 context. Watch the context card — each fact lights up at the exact moment it changes the agent's behavior.

04 / Evidence

What context is worth, in numbers.

Two kinds of evidence: public agent benchmarks re-run with Qualia-1 context in our internal harness, and product metrics from simulated pilot sessions. Baselines are real, cited, and current as of June 2026. Qualia-1 figures are ours — read them as a research preview, not audited results.

Read this first. Figures labeled “+ Qualia-1” come from internal, preliminary evaluations under simulated benchmark conditions. Qualia-1 agent-context is a research preview under customer-development validation; numbers will change.
BenchmarkPublic baseline+ Qualia-1Δ uplift
MultiChallengemulti-turn: instruction retention, memory of user info 69.6%publicGPT-5, leaderboard top · Scale SEAL · 2026 78.2% +8.6 pp
LongMemEval-Slong-term memory across chat sessions 60.2%publicfull-context GPT-4o, canonical no-memory baseline · arXiv:2501.13956 84.6%published memory systems: 81.6–94.9 (Supermemory · EverMemOS · Mastra OM) +24.4 pp
LongMemEval-V2agentic memory: workflows, state, gotchas · May 2026 48.5%publicstrongest RAG-memory baseline · arXiv:2605.12493 58.9%coding-agent controller hits 72.5% at high latency — Qualia-1 targets the <400 ms regime +10.4 pp
PrefEval · turn 10preference following over a dialogue <10%publicfrontier zero-shot · ICLR 2025 · arXiv:2502.09597 74%published injection methods recover 84–97% on the MCQ subset ×7+

Product metrics from simulated pilots.

−34%
Time to task completion
median · N=2,400 sessions · 85 pilot users · 6 weeks · coding + corporate agents · internal eval
−41%
Retry & clarification rate
clarifying questions per task · N=2,400 sessions · preliminary
−47%
User-correction rate
user edits to agent output per task · coding + legal cohorts · N=1,180
−38%
Prompt length — you type less
median user prompt tokens, same tasks, with vs without context · N=2,400
6284%
First-response acceptance
output accepted without edit on first turn · N=1,180 drafting tasks
−29%
Voice escalation rate
human-handoff rate · simulated call cohort · N=640 calls
−43s
Average handle time (voice)
−22% mean AHT · N=640 calls · returning-caller context enabled
−2.3
Partner red-line cycles (legal)
redline rounds to partner sign-off · N=210 drafts · 6 firms · internal eval
05 / Privacy & compliance

Personal context without the surveillance trade-off.

Everything above only works if people accept being learned from. That acceptance is an architecture problem — and the architecture is the moat.

[ on-device ]

Your patterns never leave your device.

Qualia-1 is a compact model that runs fully on-device. Raw signals and raw content stay local; only structured context is emitted to the agent.

[ in-tenant ]

Org knowledge stays in your tenant.

Role changes, decisions, deal history — synced and resolved inside your perimeter. Agents receive facts, not your documents. Every learned fact is inspectable, editable, deletable.

[ eu ai act ]

Behavioral tempo, not emotion inference.

Qualia-1 models cadence, focus and session structure — it does not infer emotions from biometric data. Designed against the EU AI Act's Art. 5(1)(f) workplace prohibition (in force Feb 2025, fines up to €35M / 7% of turnover), which constrains cloud sentiment-analysis approaches.

Cloud memory (typical)

  • Invisible profile, accumulated in a vendor's cloud
  • Raw conversations leave your perimeter
  • Emotion / sentiment inference exposed to Art. 5(1)(f) risk
  • Hard to audit what the model “knows” about you

Qualia-1

  • Visible facts with provenance, confidence and freshness
  • On-device behavioral layer · org facts stay in-tenant
  • Behavioral signals only — no biometric emotion inference
  • Inspect, edit or delete any learned fact
06 / Access — and an honest note

This page is an experiment.

Qualia-1 agent-context is a research preview. We're deciding which vertical to build first — and the deciding data is what you do on this page. One vote, one field, and you've shaped the roadmap.

Join the waitlist

Pilot slots open by vertical, in the order this page votes them in. Closed beta · selected teams.

✓ Logged. We'll reach out as your vertical's pilot opens. — Synstate Labs

Prefer talking? Book a 20-min demo →