ntxt.ai — Original Research

Claude Chat Search vs MCP Memory

When you ask Claude about your own product, where should it look — your conversation history or a structured knowledge graph? We ran 10 real questions against both. Here's what happened.

10 questions · 3 categories · Real ntxt.ai product data · April 2026

The setup: Over several weeks of building ntxt.ai, every significant product decision was logged to an MCP-connected knowledge graph (the ntxt graph) — pricing changes, messaging pivots, homepage copy, onboarding design, scraping strategy. Simultaneously, those same decisions were discussed in Claude conversation threads, which are now searchable via Claude's chat search tool.

The test: 10 identical questions were asked against both sources. Each answer was scored on three axes: Accuracy (correct answer), Recency (most up-to-date version), and Completeness (full picture including rationale).
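The rubric reduces to a simple tally: one point per axis, higher total wins, equal totals tie. A minimal sketch of that scoring logic (the class and function names are illustrative, not from an actual evaluation harness):

```python
from dataclasses import dataclass

@dataclass
class Score:
    """Per-source score on the three axes, one point each."""
    accuracy: bool      # correct answer
    recency: bool       # most up-to-date version
    completeness: bool  # full picture including rationale

    def total(self) -> int:
        return int(self.accuracy) + int(self.recency) + int(self.completeness)

def winner(mcp: Score, chat: Score) -> str:
    """Declare a per-question winner, or a tie on equal totals."""
    if mcp.total() > chat.total():
        return "MCP"
    if chat.total() > mcp.total():
        return "Chat"
    return "Tie"

# Q1 as scored below: MCP hit all three axes; chat search missed recency.
print(winner(Score(True, True, True), Score(True, False, True)))  # MCP
```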

Overall: 7 MCP wins · 1 Chat win · 2 Ties

Single clear answer

Q1 What is the current pricing structure for ntxt?
Factual MCP wins
"What is the current pricing structure for ntxt?"
MCP / ntxt graph
Retrieved the most recent committed node (Mar 31): Free (50 nodes), Solo $9/mo, Pro $29/mo. Team plan explicitly noted as dropped from launch. Answered in one retrieval call.
Chat Search
Found the correct tiers but also surfaced an older node ($9/$29/$59 with Team) without flagging the supersession. Required reading 3 conversation snippets to reconcile which was current.
Verdict: MCP returned the latest committed state instantly. Chat search found the right answer but mixed in a superseded version — the Team plan was visible even though it was dropped. For "what's current?", structured > conversational.
Q2 Which subreddits are we currently scraping?
Factual MCP wins
"Which subreddits are we currently scraping, and which were dropped?"
MCP / ntxt graph
Returned v2 node (Apr 1): 6 active subs (r/mcp, r/ClaudeAI, r/cursor, r/AI_Agents, r/ChatGPTCoding, r/AIMemory) + full list of dropped subs with reasons. Posts-only, daily cadence, cost drop $18→$2.50.
Chat Search
Returned the v1 scraping plan from a Mar 28 conversation — 10+ subreddits including r/GeminiAI and r/LocalLLaMA that were later dropped. Missed the v2 revision entirely.
Verdict: This is where chat search most clearly failed. The v2 decision was made in a different conversation thread. Chat search returned the louder, earlier discussion — not the quiet commit that superseded it.
Q3 What is the hero headline on the homepage?
Factual Tie
"What is the current hero headline on the ntxt homepage?"
MCP / ntxt graph
"Your AI forgets everything. ntxt doesn't." — committed Mar 31. Subline: "Shared memory for Claude, Cursor, and ChatGPT." Clean, single node.
Chat Search
Also found the correct hero headline — but surfaced something more: the SEO H1 rewrite ("Stop Re-Explaining Yourself to AI. ntxt Remembers.") and the full H2 structure. This content was never logged as a graph node. It existed only in conversation.
Verdict: Scored as a tie — both returned the hero headline correctly. But the underlying dynamic is more interesting than the score suggests.

The SEO copy session was extensive: H1 variants, H2 structure, SEO framing decisions. None of it was committed to the graph because the logging habit kicked in for product decisions, not for copy iteration. Chat search covered that blind spot automatically.

This is the structural limitation of on-demand logging: the graph only knows what you chose to tell it. If you're working fast and don't stop to log, that context lives only in conversation. Chat search acts as a passive net for everything that slipped through — which is genuinely valuable, but also means you're relying on keyword luck to surface it later. The tie flatters MCP slightly. Chat search deserved the edge here.

Decisions requiring context

Q4 Why did we drop the Team plan from launch?
Rationale Chat wins
"Why did we decide not to launch the Team plan?"
MCP / ntxt graph
Node summary: "Team plan ($59) is dropped from the initial launch to keep things simple." Accurate but thin on reasoning — "keep things simple" is the whole explanation. No revenue projection context captured.
Chat Search
Found the conversation where the decision happened: the full revenue projection discussion, the trade-off between showing Team on the pricing page vs. shipping faster, and the explicit statement: "For now, we are launching the landing page without the team plan."
Verdict: Chat wins clearly — and this exposes the most honest weakness of structured memory: a thin node summary is not the same as a preserved decision. "Keep things simple" is a conclusion, not a rationale. The revenue projection discussion, the trade-off debate, the moment the decision crystallised — none of that made it into the graph because the node was written as an outcome rather than as an explanation.

The fix isn't to abandon structured memory. It's to write better nodes — or better yet, to have a dedicated rationale field that forces the why to be captured alongside the what. That's already on the ntxt roadmap. Q4 is exactly the test case that justifies it.
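One way a rationale field could be enforced at commit time — a sketch only; ntxt's actual node schema and validation rules are not documented here, and the word-count threshold is an arbitrary stand-in:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class DecisionNode:
    """A committed decision: the outcome AND the reasoning behind it."""
    summary: str    # the "what": e.g. "Team plan dropped from launch"
    rationale: str  # the "why": a required field, not an afterthought
    committed: date

    def __post_init__(self):
        # Refuse thin commits: a conclusion is not a rationale.
        if len(self.rationale.split()) < 5:
            raise ValueError("rationale too thin -- capture the why, not just the what")

# A thin commit like "keep things simple" is rejected at write time.
try:
    DecisionNode("Team plan dropped", "keep things simple", date(2026, 3, 31))
except ValueError as e:
    print(e)  # rationale too thin -- capture the why, not just the what
```

Forcing the why into the schema turns the Q4 failure from a writing-discipline problem into a mechanical check.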
Q5 Why did we stop using the phrase "context graph"?
Rationale MCP wins
"Why did we decide to stop using the term 'context graph' in our messaging?"
MCP / ntxt graph
Node had full detail: stop "persistent context graph / context engineering / knowledge graph", start "memory / remember / forget / your AI knows you". Lead with pain not architecture. Included A/B headline test options. Rationale fully preserved.
Chat Search
Also found relevant conversations — the positioning strategy session had extensive messaging discussion. But the signal was buried in a long competitor analysis thread, requiring more effort to extract the specific decision.
Verdict: This is where a well-formed node shines. The decision node was written with full context at commit time — the "what" and "why" together. Chat search found the same information but required reading through more context to isolate it. Structured commits pay off when the node summary is rich.
Q6 Why Telegram over email for the morning digest?
Rationale MCP wins
"Why did we choose Telegram over email for the daily engagement digest?"
MCP / ntxt graph
Node: "Fits existing mobile-first brainstorming workflow. Posts land on phone while walking; can swipe through and engage directly without switching to desktop." Crisp, complete reasoning in one node.
Chat Search
Chat search failed to surface this conversation in top results. The Telegram discussion was in a tool-planning session that didn't rank well by keyword. The decision existed only in the graph.
Verdict: MCP wins by absence. This is a critical finding: not all decisions generate memorable, searchable conversation threads. A quiet preference stated mid-session gets captured in the graph but may be invisible to keyword-based chat search. The graph acts as a safety net for low-signal decisions.

Decisions that changed over time

Q7 What is the Team plan pricing and is it available?
Evolved MCP wins
"What's the Team plan pricing, and is it currently available?"
MCP / ntxt graph
Returned two nodes in chronological order: Team plan at $59 (Mar 27) → Team plan dropped from launch (Mar 31). The graph surfaced the conflict and resolution in a single retrieval, current state clear.
Chat Search
Returned the Mar 27 pricing conversation prominently (more content, more context, ranked higher). The Mar 31 drop decision was in a different thread. Without reading both, the answer would be: Team plan is $59 and available.
Verdict: This is the most dangerous failure mode for chat search — the loudest conversation wins. The original pricing discussion was richer and ranked higher than the brief decision to drop the Team plan. You'd walk away with confidently wrong information.
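The supersession logic that saved MCP here is mechanically trivial — the latest committed node wins — which is exactly what keyword ranking cannot guarantee. A sketch under assumed node shapes (the dict structure is illustrative):

```python
from datetime import date

# Two committed nodes on the same topic, as the graph stores them.
nodes = [
    {"committed": date(2026, 3, 27), "state": "Team plan priced at $59/mo"},
    {"committed": date(2026, 3, 31), "state": "Team plan dropped from launch"},
]

def current_state(nodes: list[dict]) -> str:
    """Resolve an evolved decision to its latest committed state."""
    return max(nodes, key=lambda n: n["committed"])["state"]

print(current_state(nodes))  # Team plan dropped from launch
```

Chat search has no equivalent of `max(key=committed)`: it ranks by relevance and richness, so the Mar 27 discussion outranks the Mar 31 reversal.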
Q8 What was the original free tier model and why did it change?
Evolved MCP wins
"How did the free tier model evolve — what changed and why?"
MCP / ntxt graph
Surfaced the evolution chain: 7-day trial → usage-based free tier (50 nodes, Mar 27). Node captured the full rationale: "usage-based limits correlate with actual value received — a user who has 50 nodes has genuinely used the product."
Chat Search
Found the transition conversation and returned the rationale. But also returned older adjacent conversations about the 7-day trial — no clear signal about which model is current without reading timestamps carefully.
Verdict: MCP wins on clarity of evolution. The graph's timestamped nodes made the change sequence unambiguous. Chat search required timestamp-reading to understand direction of change — it finds the history but doesn't surface the resolution cleanly.
Q9 What are the 5 things we explicitly decided NOT to do?
Evolved Tie
"What are the anti-patterns or things we explicitly decided not to do with ntxt?"
MCP / ntxt graph
Retrieved the "What NOT to do" node directly: 5 clean anti-patterns. Don't compete on benchmarks, don't go enterprise, no passive ingestion, no competitor names on homepage, don't split focus to Track B yet.
Chat Search
Also surfaced the same content — the competitor analysis session was well-indexed and clearly labeled. Same quality answer, slightly longer path to extract the 5 points from surrounding strategy discussion.
Verdict: Tie. Both sources had this. The graph node was purpose-built for retrieval; the chat conversation had the same content in a readable format. Neither had a clear advantage — this type of structured "list of constraints" travels well in both formats.
Q10 What is the onboarding flow and how does step 2 completion work?
Technical MCP wins
"What's the onboarding flow and how does the system detect MCP connection in step 2?"
MCP / ntxt graph
Two linked nodes: 3-step wizard overview + S2 completion signal. Precise technical answer: detect first tools/list MCP call, expose via GET /api/mcp/connection-status, frontend polls every 3s.
Chat Search
Found the onboarding conversation thread — but it returned a large HTML implementation file as context, making it hard to extract the specific architectural decision without reading through code comments.
Verdict: MCP wins for technical decisions. Implementation conversations contain too much noise — code, CSS, HTML scaffolding — that buries the architectural decision inside it. The graph distilled the decision from the implementation cleanly.
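The step-2 completion check the graph described reduces to a polling loop against the status endpoint. A minimal sketch — `fetch_status` here is a stand-in for a real GET to /api/mcp/connection-status, and the flag name `connected` is assumed:

```python
import time
from typing import Callable

def poll_until_connected(
    fetch_status: Callable[[], dict],
    interval: float = 3.0,   # the 3s cadence from the node
    max_attempts: int = 20,
) -> bool:
    """Poll the connection-status endpoint until the backend reports
    that the first tools/list MCP call has been observed."""
    for _ in range(max_attempts):
        if fetch_status().get("connected"):
            return True
        time.sleep(interval)
    return False

# Stand-in responses: the endpoint flips to connected on the third poll.
responses = iter([{"connected": False}, {"connected": False}, {"connected": True}])
print(poll_until_connected(lambda: next(responses), interval=0.0))  # True
```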

Results by question

Q    Question                           MCP Score   Chat Score   Winner
Q1   Current pricing structure          3/3         2/3          MCP
Q2   Active subreddits                  3/3         1/3          MCP
Q3   Hero headline                      3/3         3/3          Tie
Q4   Why Team plan dropped              2/3         3/3          Chat
Q5   Why drop "context graph"           3/3         2/3          MCP
Q6   Why Telegram over email            3/3         1/3          MCP
Q7   Team plan availability (evolved)   3/3         1/3          MCP
Q8   Free tier evolution                3/3         2/3          MCP
Q9   Anti-patterns / what NOT to do     3/3         3/3          Tie
Q10  Onboarding technical spec          3/3         2/3          MCP

Where each approach wins

MCP / Structured Graph
  • Current state for any evolved decision — no noise from superseded versions
  • Low-signal decisions that never generated rich conversation threads
  • Technical specs buried inside implementation work (code, HTML)
  • Decisions where "what" and "why" were both written into the node
  • Multi-hop questions crossing several related decisions
  • ⚠ Only as good as the node summary — thin commits produce thin answers
Chat Search
  • Full deliberation context — the debate that led to a decision
  • Decisions where emotion, trade-off, and nuance matter
  • Retrieving adjacent ideas that were discussed but never committed
  • When you know roughly what was said but not the exact decision
  • Rich context around "why we didn't choose X" alternatives
  • Passive coverage of everything you forgot to log — the copy session, the quick pivot, the idea you iterated in conversation and moved on from

The core finding: Chat search optimizes for richness. MCP optimizes for correctness. But there's a third dynamic Q3 exposed: chat search covers your logging blind spots. Every decision you made in conversation and didn't stop to commit — the copy iteration, the quick pivot, the idea you moved on from — lives only in chat history. The graph knows what you told it. Chat search knows everything you said.

The dangerous failure mode of chat search is returning the loudest conversation rather than the latest decision. The dangerous failure mode of MCP is a false sense of completeness — assuming the graph holds everything, when it only holds what you chose to log.

Use MCP for state. Use chat search for story.

If you're asking "what is X right now?" — use MCP. The graph holds committed, versioned decisions. It won't return the old pricing tier, the dropped subreddit list, or the headline you iterated past.

If you're asking "why did we decide X?" or "what were we thinking when we chose X over Y?" — use chat search. The conversation holds the reasoning, the alternatives considered, and the context that never made it into a node summary.

The practical workflow: commit decisions to the graph with rich summaries that capture the rationale, not just the outcome. A node that says "Team plan dropped — to keep things simple" scores 2/3. A node that says "Team plan dropped — revenue projections showed insufficient uplift to justify the added launch complexity; revisit Q3" scores 3/3 and makes chat search redundant for that question.

The deeper implication: the gap between MCP and chat search isn't a tool problem — it's a writing problem. The graph is only as smart as what you put into it. And now that Claude has both retrieval paths available simultaneously, the interesting next question is whether it can learn to route between them intelligently — reaching for the graph when state matters, and for conversation history when story matters.