Garry Tan System Map

GBrain Data Model and Runtime Map

The persistent memory layer as schema, runtime, and trust boundary.

gstack = method · gbrain = continuity · gbrain-evals = proof · YC = network

Product Frame

gbrain is persistent memory for AI agents. Its job is to let an agent search, cite, update, and reuse knowledge across sessions, repos, machines, and clients.

The public README positions it as "the memory your agent actually keeps between sessions." The codebase backs that with a Postgres/pgvector schema, source-scoped tenancy, page/chunk storage, embeddings, code-symbol metadata, soft delete, effective dates, OAuth-scoped HTTP MCP, and stdio MCP tools.

Runtime Loop

  1. Source is registered: local repo, wiki, media archive, YC/media corpus, or default brain.
  2. Content is ingested into pages with source_id, slug, type, title, compiled_truth, timeline, and frontmatter.
  3. Content is chunked into content_chunks with text, embeddings, token counts, modality, language, symbols, and line ranges.
  4. Search combines vector, text, graph/code edges, salience, source scope, and recency filters.
  5. MCP/CLI exposes search, get_page, put_page, sync, code-def, code-refs, code-callers, code-callees, and admin operations.
  6. Skills run read-enrich-write loops: search the brain, synthesize, update pages, maintain citations/backlinks.
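
The ingest half of this loop (steps 2 and 3) can be sketched as follows; the Page and Chunk shapes and the chunkPage helper are illustrative, not gbrain's actual API:

```typescript
// Illustrative sketch of ingest (steps 2-3); names are hypothetical,
// not gbrain's actual API.
interface Page {
  sourceId: string;
  slug: string;
  type: string;
  title: string;
  compiledTruth: string;
}

interface Chunk {
  pageSlug: string;
  text: string;
  tokenCount: number;
  embedding?: number[]; // absent until an embedding provider runs
}

// A page is ingested, then split into retrieval units.
function chunkPage(page: Page, maxChars = 80): Chunk[] {
  const chunks: Chunk[] = [];
  for (let i = 0; i < page.compiledTruth.length; i += maxChars) {
    const text = page.compiledTruth.slice(i, i + maxChars);
    chunks.push({
      pageSlug: page.slug,
      text,
      tokenCount: Math.ceil(text.length / 4), // rough token estimate
    });
  }
  return chunks;
}

const page: Page = {
  sourceId: "default",
  slug: "gbrain-overview",
  type: "concept",
  title: "gbrain overview",
  compiledTruth: "gbrain is persistent memory for AI agents. ".repeat(4),
};

const chunks = chunkPage(page);
```

Search (step 4) then runs over these chunks rather than whole pages, which is why per-chunk metadata like token count and modality matters.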

Core Entities

| Entity | Product Meaning | Important Fields / Behavior |
|---|---|---|
| sources | A logical brain inside the DB | id, name, local_path, federation config, chunker_version, archived flags |
| pages | Canonical knowledge objects | source_id, slug, type, page_kind, title, compiled_truth, timeline, frontmatter |
| content_chunks | Retrieval units | chunk_text, embedding, model, token_count, language, symbol metadata, modality |
| code_edges_chunk | Resolved code graph | from_chunk_id, to_chunk_id, edge_type, symbol identities |
| code_edges_symbol | Unresolved code refs | target symbol known before the source definition is imported |
| files | Binary/file storage references | used for images, uploads, and filesystem-backed memory |
| ingest logs/jobs | Operational trace | tells operators what was imported, embedded, transformed, or failed |
| OAuth clients/tokens | Remote trust boundary | scopes read/write/admin and source-specific access |

Page Types

The TypeScript PageType union includes person, company, deal, yc, civic, project, concept, source, media, writing, analysis, guide, hardware, architecture, meeting, note, email, slack, calendar-event, code, image, and synthesis.

This matters because gbrain is not just note search. It encodes the major objects of founder work: people, companies, deals, projects, meetings, emails, and code.
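
Reconstructed from that enumeration, the union would read roughly as follows (the exact declaration in gbrain/src/core/types.ts may differ):

```typescript
// Reconstructed from the enumeration above; the exact declaration in
// gbrain/src/core/types.ts may differ.
type PageType =
  | "person" | "company" | "deal" | "yc" | "civic"
  | "project" | "concept" | "source" | "media" | "writing"
  | "analysis" | "guide" | "hardware" | "architecture"
  | "meeting" | "note" | "email" | "slack" | "calendar-event"
  | "code" | "image" | "synthesis";

// The founder-work objects called out in the text, as a typed subset.
const founderWorkObjects: PageType[] = ["person", "company", "deal", "meeting", "code"];
```

A closed union like this means agents and tools can switch exhaustively on page type instead of treating every memory object as an untyped note.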

Source Scoping

sources is the tenancy layer. Each page belongs to a source. A source can be federated or non-federated: federated sources join default unqualified search, while non-federated sources must be selected explicitly.

Product implication: gbrain can support a personal brain, team brain, client brain, repo brain, and public knowledge corpus without mixing all writes into one undifferentiated pile.
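
Under that scoping rule, unqualified search reduces to a filter over federated sources; the Source shape and resolveSearchSources helper below are hypothetical, with only the federated flag semantics taken from the schema description:

```typescript
// Hypothetical sketch of federation scoping; the federated flag mirrors
// sources.config.federated, but the helper is illustrative.
interface Source {
  id: string;
  name: string;
  federated: boolean; // sources.config.federated in the schema
}

// An unqualified search sees only federated sources; a non-federated
// source (e.g. a client brain) must be selected explicitly.
function resolveSearchSources(all: Source[], explicit?: string): Source[] {
  if (explicit) return all.filter((s) => s.id === explicit);
  return all.filter((s) => s.federated);
}

const sources: Source[] = [
  { id: "wiki", name: "Team wiki", federated: true },
  { id: "client-a", name: "Client A brain", federated: false },
];
```

The design choice: isolation is the default for sensitive sources, and sharing is opt-in per source rather than per query.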

Retrieval Model

The schema points to four retrieval modes: full-text keyword search, vector similarity over embeddings, code-graph traversal, and timeline/recency lookup.

The "Cathedral II" direction in comments is important. It moves gbrain from markdown memory into code-aware retrieval: chunks carry symbol metadata, language, and line ranges, and the code-edge tables store resolved and unresolved symbol references.

That lets coding agents ask "where is this defined?", "who calls it?", and "what context should I inspect next?" without falling back to broad grep.
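
A minimal sketch of those graph queries, assuming simplified edge shapes modeled on code_edges_chunk (resolved) and code_edges_symbol (unresolved); the lookup logic is illustrative:

```typescript
// Illustrative two-table code graph. code_edges_chunk holds edges whose
// target definition is already imported; code_edges_symbol holds edges
// whose target is known only by name.
interface ResolvedEdge {
  fromChunkId: string;
  toChunkId: string;
  edgeType: "calls" | "references";
}
interface SymbolEdge {
  fromChunkId: string;
  targetSymbol: string; // definition not yet imported
}

const resolved: ResolvedEdge[] = [
  { fromChunkId: "chunk:app", toChunkId: "chunk:searchFn", edgeType: "calls" },
];
const unresolved: SymbolEdge[] = [
  { fromChunkId: "chunk:app", targetSymbol: "externalLib.embed" },
];

// "who calls it?" = incoming resolved call edges on a definition chunk
function callers(defChunkId: string): string[] {
  return resolved
    .filter((e) => e.toChunkId === defChunkId && e.edgeType === "calls")
    .map((e) => e.fromChunkId);
}
```

When the missing definition is later imported, unresolved symbol edges can be promoted into resolved chunk edges, which is what makes a two-pass import order workable.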

Trust and Deployment Model

gbrain supports multiple deployment shapes: a local stdio MCP server, an OAuth-scoped HTTP MCP server with an admin dashboard, and databases ranging from embedded PGLite to Postgres or Supabase.

The product spec hiding here: persistent memory only works if users can trust the boundary. gbrain's trust boundary is source-aware, client-aware, and operation-aware.

Product Requirements

For gbrain to feel like real agent memory, it must satisfy the requirements ranked in the deep-dive further down this document.

Open Questions

Deep Runtime Evidence Map

This pass goes below the README and treats gbrain as a runtime product. The source basis is gbrain/README.md, gbrain/src/schema.sql, gbrain/src/core/types.ts, gbrain/src/core/operations.ts, gbrain/src/core/search/*, and the MCP/security docs.

Expanded Entity Map

| Entity | Runtime Role | Evidence | Product Meaning |
|---|---|---|---|
| sources | In-DB tenancy namespace. Every page/file/ingest row belongs to a logical source. | gbrain/src/schema.sql | Lets one brain hold wiki, code repos, media corpora, team knowledge, or isolated project memory without global slug collision. |
| pages | Canonical knowledge object. | gbrain/src/schema.sql, gbrain/src/core/types.ts | The memory page: title, compiled truth, timeline, frontmatter, type, soft-delete state, emotional weight, effective date. |
| content_chunks | Retrieval unit for text/code/image. | gbrain/src/schema.sql | Converts pages into searchable chunks with embeddings, FTS vector, symbol metadata, modality, and token count. |
| links | Page graph, backlinks, entity relationships. | gbrain/src/schema.sql, gbrain/src/core/types.ts | The brain wires itself through typed relationships from markdown, frontmatter, and manual edges. |
| timeline_entries | Structured temporal facts. | gbrain/src/schema.sql, gbrain/src/core/types.ts | Lets the brain answer when/what-changed questions beyond vector similarity. |
| code_edges_chunk / code_edges_symbol | Resolved and unresolved code graph edges. | gbrain/src/schema.sql | Powers code-def, code-refs, callers/callees, and two-pass code retrieval. |
| files | Binary/file sidecar index. | gbrain/src/schema.sql | Stores references for images/uploads without stuffing bytes into core page rows. |
| oauth_clients, oauth_tokens, oauth_codes | Remote MCP identity and authorization. | gbrain/src/schema.sql | Defines client identity, grants, scopes, write source, and federated-read set. |
| mcp_request_log | Remote tool-call audit log. | gbrain/src/schema.sql, gbrain/SECURITY.md | Makes remote brain access observable without retaining raw payloads by default. |
| minion_jobs and subagent tables | Durable agent runtime. | gbrain/src/schema.sql | Lets background jobs, subagent loops, tool executions, and rate leases persist. |
| eval_candidates | Real retrieval eval capture. | gbrain/README.md, gbrain/src/schema.sql | Turns real query/search calls into replayable BrainBench-Real examples. |

Source / Tenant Model

gbrain has two axes:

| Axis | Meaning | Boundary Rule |
|---|---|---|
| Brain | One database: PGLite, Postgres, or Supabase. | Data owner / access-control boundary. |
| Source | Named content repo inside a brain. | Repo, topic, team, client, or workstream boundary inside one DB. |

Key mechanics:

| Mechanic | Runtime Behavior |
|---|---|
| Per-source slug namespace | pages enforces unique (source_id, slug), so different sources can safely contain the same slug. |
| Federation | sources.config.federated=true joins default unqualified search; false requires explicit source selection. |
| Source resolution | Precedence flows through explicit source flag / env / project files / registered local path / default source. |
| Agent citation | Multi-source citations need source-qualified slugs such as [source-id:slug]. |
| OAuth source model | Remote clients get write authority through source_id and separate read authority through federated-read configuration. |

Interpretation: source scoping is not metadata decoration. It is the anti-leak primitive for shared brains. Several code comments in operations/search paths treat missing source propagation as a P0 leak class.
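
The source-qualified citation format can be handled with a small parser; only the [source-id:slug] format comes from the table above, and the parseCitation helper itself is hypothetical:

```typescript
// Sketch of parsing a source-qualified citation like [source-id:slug];
// the format is from the document, the parser is illustrative.
function parseCitation(ref: string): { sourceId: string | null; slug: string } {
  // Optional "source-id:" prefix, then the slug, inside square brackets.
  const m = ref.match(/^\[(?:([^:\]]+):)?([^\]]+)\]$/);
  if (!m) throw new Error(`not a citation: ${ref}`);
  return { sourceId: m[1] ?? null, slug: m[2] };
}

const qualified = parseCitation("[wiki:deal-memo]"); // sourceId "wiki", slug "deal-memo"
```

An unqualified citation like [deal-memo] parses with a null sourceId, which is exactly the case that must then resolve through the source-precedence rules rather than silently defaulting.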

Retrieval Pipeline

| Stage | What Happens | Product Reason |
|---|---|---|
| Mode resolution | conservative, balanced, and tokenmax set defaults for cache, intent weighting, token budget, expansion, limit, and reranker use. | Lets operators trade cost, speed, and depth. |
| Intent classification | Query intent influences detail, salience, recency, and RRF weights. | Memory search should adapt to task shape. |
| Keyword path | Always runs first and works without embeddings. | Day-one installs and offline paths still work. |
| Vector path | If an embedding provider exists, query variants are embedded and searched. | Semantic recall covers fuzzy questions. |
| Fusion | Keyword and vector lists merge via weighted reciprocal-rank fusion, then score adjustments. | Combines exact and semantic evidence. |
| Boosts | Backlinks, salience, recency, and exact match affect rank. | Operator memory needs relationships and time, not only similarity. |
| Structural expansion | Optional graph walk via nearSymbol / walkDepth, capped. | Coding agents need symbol adjacency. |
| Dedup | Composite (source_id, slug), text similarity, type diversity, per-page cap, compiled-truth guarantee. | Avoids noisy repeated chunks. |
| Rerank / budget | Optional reranker and token budget enforcement. | Keeps output useful inside model context. |
| Eval capture | search and query can capture retrieved slugs/chunks when enabled. | Converts real usage into benchmark fuel. |

Product read: gbrain is not "vector DB with markdown." It is layered retrieval: lexical, vector, graph, temporal, salience, source tenancy, and code-structure expansion.
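
The fusion stage names weighted reciprocal-rank fusion; a minimal sketch follows, with constants and list shapes that are illustrative rather than gbrain's actual weights:

```typescript
// Minimal weighted reciprocal-rank fusion sketch; constants and shapes
// are illustrative, not gbrain's actual configuration.
function rrfFuse(
  keyword: string[], // chunk ids ranked by the keyword path
  vector: string[],  // chunk ids ranked by the vector path
  wKeyword = 1.0,
  wVector = 1.0,
  k = 60,            // conventional RRF damping constant
): string[] {
  const score = new Map<string, number>();
  const add = (ids: string[], w: number) =>
    ids.forEach((id, rank) =>
      score.set(id, (score.get(id) ?? 0) + w / (k + rank + 1)));
  add(keyword, wKeyword);
  add(vector, wVector);
  return [...score.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}

// "b" appears in both lists, so it outranks items found by one path only.
const fused = rrfFuse(["a", "b", "c"], ["b", "d"]);
```

RRF's appeal here is that it fuses rankings without needing the keyword and vector scores to share a scale; the weights then let intent classification tilt the blend.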

MCP / OAuth / Trust Boundary

| Boundary | Mechanism | Requirement Implied |
|---|---|---|
| Local stdio MCP | gbrain serve exposes tools over stdio. | Local agents get structured brain tools without HTTP setup. |
| HTTP MCP | gbrain serve --http exposes OAuth-backed MCP and admin dashboard. | Remote clients need scoped auth, logs, discovery, and client management. |
| Operation contracts | Tool definitions derive from shared operations. | No hand-maintained schema drift between CLI/MCP/HTTP. |
| Shared dispatch | Stdio and HTTP use the same validation/context/result path. | Transport parity is a correctness and security feature. |
| Remote flag | OperationContext.remote is required for remote/untrusted callers. | Filesystem/tool operations fail closed for remote agents. |
| Source read scope | Read helpers prefer OAuth allowed sources, then context source ID. | Every read path must thread source scope into filters. |
| Scopes | Operations are tagged read/write/admin/local-only. | Remote clients get least-privilege tool access. |
| Local-only ops | Sync/file operations are rejected over HTTP regardless of scope. | Remote agents cannot touch local filesystem surfaces. |
| Logging redaction | MCP params are logged as redacted shape by default. | Admin observability must not become a private-data leak. |
| Network hardening | Loopback default, CORS deny by default, rate limits, proxy warnings. | Personal brains should not become accidentally exposed. |
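
The scope and remote-flag rules reduce to a small fail-closed check; the Operation and Context shapes and the authorize helper are illustrative, with only the scope names and local-only behavior taken from the document:

```typescript
// Illustrative fail-closed authorization gate; scope names and the
// local-only rule follow the document, the helper itself is a sketch.
type Scope = "read" | "write" | "admin" | "local-only";

interface Operation { name: string; scope: Scope; }
interface Context { remote: boolean; grantedScopes: Scope[]; }

function authorize(op: Operation, ctx: Context): boolean {
  if (op.scope === "local-only") return !ctx.remote; // never over HTTP
  if (!ctx.remote) return true;                      // local stdio is trusted
  return ctx.grantedScopes.includes(op.scope);       // least privilege remotely
}

const sync: Operation = { name: "sync", scope: "local-only" };
const search: Operation = { name: "search", scope: "read" };
const remoteCtx: Context = { remote: true, grantedScopes: ["read"] };
```

The key property is that a remote caller with every OAuth scope still cannot reach local-only operations: transport, not just token, is part of the decision.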

Product Requirements, Ranked

| Priority | Requirement | Why It Exists | Acceptance Signal |
|---|---|---|---|
| P0 | Source isolation must be enforced on every read/write path. | Multi-source brains otherwise leak client/team/repo context. | Same-slug pages across sources remain distinct; every read handler honors source filters. |
| P0 | Remote MCP must be scoped, logged, and local-file-safe. | Remote clients operate outside the owner's OS trust boundary. | OAuth scopes honored, local-only ops rejected, params redacted, rate limits active. |
| P0 | Search must degrade gracefully without embeddings. | Day-one installs may lack provider keys. | search works; query falls back instead of failing. |
| P1 | Retrieval must combine lexical, semantic, graph, recency, salience, and code structure. | Founder/operator memory needs factual, temporal, relationship, and code recall. | Hybrid path returns ranked, deduped, source-aware chunks with metadata. |
| P1 | Every page needs provenance and citation discipline. | Memory must be auditable, not merely plausible. | Compiled truth and timeline carry source citations; conflicts are explicit. |
| P1 | Agent tool schemas must be generated from one operation contract. | MCP/CLI drift breaks clients and causes strict-schema failures. | Tool definitions derive from operations; dispatch is shared. |
| P1 | Code memory must be symbol-aware. | Coding agents need where-defined/who-calls/near-symbol more than broad prose recall. | Code metadata and edge tables populate; two-pass retrieval respects source scope. |
| P2 | Runtime should expose health, eval capture, and replay. | Memory quality needs proof. | eval_candidates captures real calls when explicitly enabled. |
| P2 | Search modes should make cost/quality tunable. | Haiku loops and Opus/tokenmax workflows have different budgets. | Search modes resolve deterministic knobs and cache keys. |

Bottom line: gbrain's product is a source-scoped, citation-aware memory runtime exposed through CLI/MCP. The defensible moat is the combination of typed memory entities, source tenancy, hybrid retrieval, agent-safe operation contracts, and eval-backed quality loops.