supabrain-x · v1 research state · project start: May 2026 · deliberately reduced page

Context allocation across time.

The problem is not longer context. The problem is deciding what deserves context, when, and in what form.

supabrain-x is a benchmark-driven research project for testing retrieval, reranking and context-allocation policies on long-term memory tasks. It is not a claim that memory is solved.

01 · Problem

Memory is not storage alone. It is an allocation policy over time.

LLMs operate under bounded and asymmetrically costly context. Passing more text forward increases latency, cost and noise.

The central question is therefore not whether information exists somewhere. The question is whether the right information enters the working context at the right time, with the right level of detail.

02 · Evaluation Protocol

Benchmark

LongMemEval-S cleaned.

Full tasks: 500
Full messages: 246,738
Dev tasks: 100
Budgets: 50 / 100

Metrics

  • R@1 and R@3 for ranked evidence.
  • evidence_hit_rate for whether labelled evidence reached context.
  • mean_tokens and runtime for cost discipline.

Evidence-hit is not final answer quality. It is a controlled intermediate measurement.
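The metrics above can be sketched roughly as follows. This is a minimal illustration, not the project's scoring code: the function names, the use of string message ids, and the exact definitions (R@k as the fraction of labelled evidence found in the top k, evidence hit as "at least one labelled evidence message reached context") are assumptions.

```python
from typing import Iterable, Sequence, Set


def recall_at_k(ranked_ids: Sequence[str], evidence_ids: Set[str], k: int) -> float:
    """Assumed definition: share of labelled evidence ids found in the top-k ranking."""
    if not evidence_ids:
        return 0.0
    top_k = set(ranked_ids[:k])
    return len(top_k & evidence_ids) / len(evidence_ids)


def evidence_hit(context_ids: Iterable[str], evidence_ids: Set[str]) -> bool:
    """Assumed definition: True if any labelled evidence id reached the packed context."""
    return any(cid in evidence_ids for cid in context_ids)


def mean(values: Sequence[float]) -> float:
    """Plain average, used for rates such as evidence_hit_rate and for mean_tokens."""
    return sum(values) / len(values) if values else 0.0


# Toy example with made-up ids (not benchmark data).
ranked = ["m4", "m9", "m2", "m7"]
evidence = {"m9", "m7"}
r_at_1 = recall_at_k(ranked, evidence, 1)
r_at_3 = recall_at_k(ranked, evidence, 3)
hit = evidence_hit(["m4", "m7"], evidence)
```

Keeping the ranked metric and the context-level metric separate makes it possible for an allocation change to move evidence_hit_rate while R@3 stays fixed.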

03 · Current Result

Current best supported method

Routed Context Policy

A deterministic policy selects the context form after reranking. Multi-evidence queries receive top-7 aggregation. Other queries receive neighbour sidecar context.

It is not a new retriever. It is a context-allocation layer.
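A routing step of this kind might look like the sketch below. Only the top-7 aggregation and the two context forms come from the description above. The cue pattern used to detect multi-evidence queries, the names `route` and `ContextPlan`, the integer message positions, and the one-message neighbour window are all hypothetical stand-ins.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical cue list; how the project detects multi-evidence queries is not specified.
MULTI_EVIDENCE_CUES = re.compile(
    r"\b(how many|how often|total|compare|both|each)\b", re.IGNORECASE
)


@dataclass
class ContextPlan:
    form: str               # "aggregate", "sidecar" or "empty"
    message_ids: List[int]  # message positions in conversation order (assumed)


def route(query: str, reranked_ids: List[int], top_n: int = 7, window: int = 1) -> ContextPlan:
    """Deterministically choose a context form from the query and the reranked list."""
    if not reranked_ids:
        return ContextPlan("empty", [])
    if MULTI_EVIDENCE_CUES.search(query):
        # Multi-evidence query: aggregate the top-n reranked messages.
        return ContextPlan("aggregate", list(reranked_ids[:top_n]))
    # Otherwise: best message plus its immediate neighbours (assumed sidecar form).
    best = reranked_ids[0]
    neighbours = [
        i for i in range(best - window, best + window + 1) if i >= 0 and i != best
    ]
    return ContextPlan("sidecar", [best] + neighbours)


plan_a = route("How many times did I mention the gym?", [12, 3, 40, 7, 8, 9, 10, 11])
plan_b = route("Where did I park the car?", [5, 2])
```

Because the rule depends only on the query text and the reranked order, the same input always yields the same plan, which keeps runs reproducible.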

BM25 top-20: candidate retrieval
MiniLM-L-6 Cross-Encoder: reranking
Routed Context Policy: allocation
Multi-Resolution packing: budget control
Routed hit: 0.876
vs Cross-Encoder: +4.2 pp
Ranked R@3: 0.6604
Mean tokens: 96.02
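To make the budget-control stage concrete, here is one way a multi-resolution packer could work under a fixed token budget. It is a sketch under stated assumptions, not the project's packer: each candidate is assumed to carry a full form and a shorter form, tokens are approximated by whitespace splitting, and `pack` and `count_tokens` are hypothetical names.

```python
from typing import List, Tuple


def count_tokens(text: str) -> int:
    """Crude token estimate (assumption): whitespace-separated words."""
    return len(text.split())


def pack(candidates: List[Tuple[str, str]], budget: int) -> List[str]:
    """Greedy packing in rank order: use the full form if it fits, else the short form."""
    packed: List[str] = []
    remaining = budget
    for full, short in candidates:
        for form in (full, short):
            cost = count_tokens(form)
            if cost <= remaining:
                packed.append(form)
                remaining -= cost
                break  # at most one form per candidate
    return packed


# Toy candidates as (full form, short form), already in reranked order.
candidates = [
    ("a b c d e f", "a b"),
    ("g h i j k", "g h"),
    ("l m n", "l"),
]
context = pack(candidates, budget=9)
used = sum(count_tokens(c) for c in context)
```

With a budget of 9 the first candidate goes in at full resolution and the other two fall back to their short forms, so the budget is never exceeded.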

04 · Baselines

Method                   | Role                             | R@3    | Evidence hit
Lexical overlap          | weak baseline                    | 0.3773 | 0.522
BM25 + Multi-Resolution  | fast baseline                    | 0.5573 | 0.736
Hybrid 0.5 / 0.5         | superseded                       | 0.5838 | 0.774
MiniLM-L-6 Cross-Encoder | quality retrieval baseline       | 0.6604 | 0.834
Routed Context Policy    | best supported allocation method | 0.6604 | 0.876
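The percentage-point gaps relative to the Cross-Encoder baseline can be recomputed directly from the table. The snippet below only restates the reported numbers. The helper name `pp_delta` is an illustrative choice.

```python
# (R@3, evidence hit) as reported in the baseline table.
results = {
    "Lexical overlap": (0.3773, 0.522),
    "BM25 + Multi-Resolution": (0.5573, 0.736),
    "Hybrid 0.5 / 0.5": (0.5838, 0.774),
    "MiniLM-L-6 Cross-Encoder": (0.6604, 0.834),
    "Routed Context Policy": (0.6604, 0.876),
}


def pp_delta(value: float, reference: float) -> float:
    """Difference between two rates, in percentage points, rounded to one decimal."""
    return round((value - reference) * 100, 1)


ref_r3, ref_hit = results["MiniLM-L-6 Cross-Encoder"]
deltas = {
    name: (pp_delta(r3, ref_r3), pp_delta(hit, ref_hit))
    for name, (r3, hit) in results.items()
}
```

For the routed policy this gives 0.0 pp on R@3 and +4.2 pp on evidence hit: the ranking is unchanged and the gain comes from how context is allocated after reranking.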

05 · What Worked / What Did Not

Supported

  • Cross-Encoder reranking: strongest retrieval baseline.
  • Neighbour Sidecar: improves evidence coverage at budget 100.
  • Multi-Evidence Aggregation: helps on questions detected as counting, comparison or multi-fact.
  • Routed Context Policy: combines both allocation behaviours and is the current best method.

Not Supported / Parked

  • Embedding-only retrieval as a standalone method.
  • Simple temporal and lifecycle keyword heuristics.
  • Naive neighbour expansion that damages ranked R@3.
  • Bounded heuristic depth policy.
  • Stronger rerankers as global defaults: BGE helped a subset but hurt global R@3 and runtime.

06 · Limitations

Evidence, not answers

The current protocol measures whether labelled evidence reaches context. It does not prove that final generated answers are correct.

One benchmark

LongMemEval-S is useful, but not a complete proxy for all memory workloads or real product usage.

Budget sensitivity

Several allocation gains appear at budget 100. Budget 50 remains tight and less forgiving.

07 · Next Work

Do not add another architecture before the current findings are consolidated.

Recommended next steps

  1. Prepare an external README or short paper.
  2. Run answer-level evaluation on a small controlled subset.
  3. Investigate candidate recall without simply paying for top-50 everywhere.
  4. Keep future claims tied to the frozen protocol.
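For step 3, one inexpensive diagnostic is to measure how often any labelled evidence appears among the first-stage candidates at several depths, before deciding where deeper retrieval is worth its cost. The sketch below assumes that each task is represented as a ranked candidate list plus a set of labelled evidence ids. The function name and the toy data are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

Task = Tuple[Sequence[str], Set[str]]  # (ranked candidate ids, labelled evidence ids)


def candidate_recall_by_depth(tasks: List[Task], depths: Sequence[int]) -> Dict[int, float]:
    """Share of tasks with at least one labelled evidence id in the top-k candidates."""
    out: Dict[int, float] = {}
    for k in depths:
        hits = sum(1 for ranked, evidence in tasks if evidence & set(ranked[:k]))
        out[k] = hits / len(tasks) if tasks else 0.0
    return out


# Toy tasks with made-up ids (not benchmark data).
tasks = [
    (["a", "b", "c"], {"c"}),
    (["x", "y", "z"], {"q"}),
]
sweep = candidate_recall_by_depth(tasks, depths=[2, 3])
```

Comparing the curve across query types would show which subsets actually benefit from deeper candidate lists, instead of raising the depth for every query.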