Grok in the Timeline · Visual Spec v1.0

Grok shouldn't be a tab. It should be a layer.

Six surgical insertions of Grok into the existing X surface, designed to cut cognitive load by 30% without breaking a single user habit. Each frame is independently shippable. The system stays monochrome.

David T Phung
For the xAI Product Designer (IC) role

Companion to spec.html
Grounded in X Brand Quick Guide v1.0

2026
00 / Frame

The shift

Today, Grok is a destination users go to. Tomorrow, Grok is a layer that meets users where they already are. Everything that follows is in service of that move.

Before

Grok as a tab

  • Lives in a separate top-level surface
  • Every question is a context switch from the timeline
  • Users copy-paste posts into Grok to ask about them
  • No memory of what the user was just reading
  • Competes with the feed for attention and engagement
  • Treated as a chatbot product, not a system capability

After

Grok as a layer

  • Lives inside the timeline, post detail, and compose
  • One tap to invoke, scroll-away to dismiss
  • Always anchored to the post or thread that prompted it
  • Cites sources by default — context, never verdict
  • Steerable: users can adjust the feed in plain language
  • Treated as infrastructure that lifts every flow
01 / System

Visual primitives

The X brand is rigorously monochrome — black, white, and one muted gray for secondary text. Grok's presence in the system has to earn its pixels. No purple. No gradients. No "AI shimmer." Three primitives. That's the whole language.

The Grok Mark

A 12px outlined diagonal with a center node. Visually consonant with the X mark, distinct enough to signal "generated." Used inline on Grok-touched surfaces and nowhere else.

The Grok Surface

Grok
The chart is normalized by population. The 12% figure refers to growth rate, not absolute count.

1px inner border at rgba(255,255,255,0.16), 12px radius, near-black fill at 2% white. The entire Grok visual language — applied to every layer.
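
As a rough illustration, the surface values above could live in one token object that every layer reads from. This is a sketch only: the names `SurfaceTokens`, `grokSurface`, and `toCss` are hypothetical, and the fill is written as white at 2% alpha on the assumption that it is composited over a black background.

```typescript
// Hypothetical token object for the Grok Surface. Names are illustrative;
// the numeric values are the ones stated in the spec text.
interface SurfaceTokens {
  borderWidthPx: number;
  borderColor: string;
  radiusPx: number;
  fill: string;
}

const grokSurface: SurfaceTokens = {
  borderWidthPx: 1,
  borderColor: "rgba(255,255,255,0.16)",
  radiusPx: 12,
  // Assumption: "near-black fill at 2% white" means white at 2% alpha over black.
  fill: "rgba(255,255,255,0.02)",
};

// Render the tokens as CSS declarations. An inset box-shadow is one way to
// draw an inner border without changing the element's layout box.
function toCss(t: SurfaceTokens): string {
  return [
    `background: ${t.fill};`,
    `border-radius: ${t.radiusPx}px;`,
    `box-shadow: inset 0 0 0 ${t.borderWidthPx}px ${t.borderColor};`,
  ].join("\n");
}
```

Keeping the three values in one place is what makes "no purple, no gradients" enforceable in review: any Grok surface that needs a fourth value is a visible change to this object.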

The Source Chip

[@nateliason · post] [Reuters · Mar 14] [+2 sources]

Every Grok answer carries source chips. Tap to jump to the post or open the link. If sources aren't strong, Grok says so via the confidence indicator — never silently.

02 / Insertion 1

Inline Ask-Grok

A subtle affordance on posts where Grok has something to add. One tap → Grok inserts a short "what's this about" answer directly below the post, with source chips. The cell expands; the user never loses their place.

Frame 1.1 · Resting state
For you | Following
Naval @naval · 2h
The compound interest of writing in public is the largest unexploited arbitrage of the internet.
412 1.2K 14.6K 1.1M
Ask Grok about this
Patrick Collison @patrickc · 3h
The Hubble tension is now ~5σ. Cosmology is in the same place particle physics was in 1900.
89 412 3.2K 220K
The affordance only appears on posts where Grok has something to add. Posts like "lol" or "gm" don't render it. The default state is quiet.
Frame 1.2 · Expanded
For you | Following
Patrick Collison @patrickc · 3h
The Hubble tension is now ~5σ. Cosmology is in the same place particle physics was in 1900.
89 412 3.2K 220K
Grok
The Hubble tension is the disagreement between two ways of measuring how fast the universe is expanding — early-universe data (CMB) and late-universe data (supernovae) — now off by roughly 5 standard deviations. Patrick is comparing it to the late-1800s "ultraviolet catastrophe," which preceded quantum mechanics.
[Nature · 2024] [@SeanCarroll thread] [arXiv:2401.18965]
High confidence · grounded in 3 sources
Ask a follow-up →
Answer is anchored to the post that prompted it. Sources are chips, not footnotes — tappable, named, dated. Confidence is shown explicitly so users can calibrate trust.
  1. Why no sidebar. A sidebar would force users to track two columns of state simultaneously. Inline expansion preserves single-column attention — the thing the timeline does best.
  2. Why "Ask Grok about this" wording. Active verb, demonstrative object. Tells the user exactly what they're getting and that they're driving.
  3. Why confidence is visible by default. Hidden uncertainty is dishonest. Showing it builds the calibration loop users need to trust the system.
03 / Insertion 2

Thread Synthesizer

When a thread has ≥8 replies, Grok summarizes the shape of the conversation above the original post. Consensus, disagreement, notable replies. The user can read four lines instead of four hundred — or expand into the full thread.

Frame 2.1 · Thread synthesizer banner
Post
Grok · synthesizing 247 replies
The thread is split: ~60% are pushing back on the methodology of the chart (normalization, sample size). The author posted a clarification at reply #34. A separate sub-thread (@karpathy, 41 replies) reframes the question as "is this even the right metric."
Consensus

The underlying data is real and recent. Source is credible.

Disagreement

Whether absolute or per-capita numbers are the honest comparison.

Worth reading

@karpathy #41 reframes the metric. @primalpoly #12 has the cleanest counter-chart.

Updated 2 min ago · 12% sample · Hide synthesizer ›
Tyler Cowen @tylercowen · 6h
The chart making the rounds is real but normalized in a way that flatters the conclusion. Here's the per-capita version.
247 1.4K 8.9K 642K
The synthesizer is visibly generated: small Grok mark, sample size, timestamp. Users know it's a model's read, not editorial. It is collapsible, and whether it starts collapsed follows the user's preference.
  1. The "12% sample" disclosure. Most replies aren't read by Grok in real time — it samples. Showing the sample size is non-negotiable. Pretending Grok read all 247 replies is dishonest design.
  2. Why three columns. Consensus / Disagreement / Worth reading maps to the three things users actually want from a thread: "what do most people think," "where's the fight," "what's worth my time."
  3. Why this lives above the OP. Above is the only place that lets users decide before investing scroll. Below the OP is too late.
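
The trigger and the sample disclosure described above can be sketched as one small function. This is a hypothetical helper, not a defined API: `synthesizerBanner` and `BannerInfo` are illustrative names, and the sampled-reply count is assumed to be supplied by whatever produces the summary.

```typescript
// Threshold from the spec: the synthesizer appears on threads with >= 8 replies.
const MIN_REPLIES = 8;

interface BannerInfo {
  heading: string;
  sampleLabel: string;
}

// Hypothetical helper: returns null when no banner should render, otherwise
// the heading plus an honest sample-size disclosure.
function synthesizerBanner(replyCount: number, sampledCount: number): BannerInfo | null {
  if (replyCount < MIN_REPLIES) return null; // short threads get no banner
  // Clamp so bad input can never claim more than 100% coverage.
  const sampled = Math.min(Math.max(sampledCount, 0), replyCount);
  const percent = Math.round((sampled / replyCount) * 100);
  return {
    heading: `Grok · synthesizing ${replyCount} replies`,
    sampleLabel: `${percent}% sample`,
  };
}
```

For example, if 30 of 247 replies were read (30 is an illustrative figure, not one stated in this spec), the label comes out as "12% sample". Deriving the label from the real counts, rather than hard-coding it, is what keeps the disclosure honest.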
04 / Insertion 3

Claim-Context Chip

When a post contains a checkable factual claim, a small chip appears: "Context available." Tap to see what Grok found, with sources and explicit confidence. Grok surfaces context — never a verdict. That's Community Notes' job.

Frame 3.1 · Chip resting
For you | Following
Anonymous Builder @buildernyc · 1h
The US imports 90% of its rare earths from China. We have zero domestic refining capacity.
22 184 1.1K 48K
Context available · 2 claims checked
The chip is quiet. It doesn't shout "wrong" or "right" — it offers context. Users are adults; they can read the underlying data.
Frame 3.2 · Chip expanded
For you | Following
Anonymous Builder @buildernyc · 1h
The US imports 90% of its rare earths from China. We have zero domestic refining capacity.
22 184 1.1K 48K
Context · 2 claims checked
"90% from China" — USGS 2024 data shows 72% of US rare earth imports came from China; ~95% of global refining is in China.

"Zero domestic refining" — MP Materials reopened Mountain Pass refining in 2023; capacity is limited but not zero.
[USGS · 2024] [MP Materials 10-K] [Reuters · Mar 2024]
High confidence · 3 primary sources
This is context, not a verdict. Report inaccuracy · Open in Community Notes
Context, not verdict. The user gets the underlying data and decides. Reporting and Community Notes are one tap away — Grok is the first-pass research layer, not the arbiter.
05 / Insertion 4

Compose Co-Writer

The #1 reason lurkers don't post is fear of being wrong. The Co-Writer gives users a private review before they hit Post. Sharpen, steelman, or fact-check. Original draft is never overwritten without consent. Lurker-to-poster conversion is the metric this targets.

Frame 4.1 · Draft state
Cancel Post
Everyone can reply
AI is going to eat 90% of customer support jobs in the next 2 years. The unit economics finally work.
📷🎬📊📍 Co-Writer
Co-Writer affordance appears only when the draft is over 40 chars — below that, the user knows what they're doing. Above that, there's enough content for Grok to actually help.
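
A minimal sketch of that visibility rule, assuming the threshold is measured on the trimmed draft and counted in JavaScript string length (UTF-16 code units). Both of those are assumptions, and `showCoWriter` is a hypothetical name.

```typescript
// Threshold from the spec: the affordance appears when the draft is over 40 chars.
const MIN_DRAFT_CHARS = 40;

// Hypothetical helper. Trimming first is an assumption, so that padding with
// whitespace alone does not trigger the affordance.
function showCoWriter(draft: string): boolean {
  return draft.trim().length > MIN_DRAFT_CHARS;
}
```

A production version would likely count characters the same way the composer's own character counter does, so the two never disagree.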
Frame 4.2 · Co-Writer engaged
Cancel Post
AI is going to eat 90% of customer support jobs in the next 2 years. The unit economics finally work.
Co-Writer
Sharpen

Tighter, same voice

Steelman

Strengthen weak claims

Check

2 flags found

"90% in 2 years" is a strong claim — current industry forecasts (Gartner, McKinsey) cluster around 30–50% automation by 2027. You'll likely get pushback.

Suggestion: "AI is going to take a major bite out of customer support — Gartner sees 30%+ automation by 2027, and unit economics are finally working."
The Co-Writer is private. Nothing is sent to the network. The user posts what they decide to post. Grok's job is to be the friend who reads your draft before you tweet it.
  1. Why "Check" is selected by default. Of the three actions, fact-flagging has the highest expected utility for the most users. Sharpen/Steelman are voice-dependent; Check is universally useful.
  2. Why the user's original draft is shown above unchanged. Grok is not the author. The user is. Anything Grok generates is a suggestion in a separate frame.
  3. Why "Keep mine" is a button. Equal weight to "Use suggestion." We're not nudging.
06 / Insertion 5

Steerable For You

The For You algorithm is currently a black box users resent. Letting them steer it in plain language is the most honest interaction X could ship — and only possible with a model in the loop. This is the moat.

Frame 5.1 · Tune your feed
Done
Tune your feed
What's shaping your feed right now
NBA & basketball · 28%
AI / ML research · 22%
Startup / VC · 18%
Politics (US) · 14%
Design & typography · 6%
Other · 12%
Less NBA. Less politics. More design and AI papers.
Grok
I'll lower NBA from 28% → 10% and US politics from 14% → 4%. Design will grow from 6% → 22% and AI research from 22% → 34%. Here's the diff:
NBA · 28% → 10%
Politics · 14% → 4%
Design · 6% → 22%
AI research · 22% → 34%
I can't filter individual accounts here — use mute/block for that.
The diff is the trust mechanism. Users see exactly what Grok will change before they apply it. Grok also names what it can't do — honest about the algorithm's boundaries.
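
Because the diff is what users are asked to trust, one check worth running before it renders is that the proposed mix still totals 100%. The sketch below assumes a simple topic-to-percent map with the same topics before and after. `FeedMix`, `buildDiff`, and `isValidMix` are hypothetical names, and the sample numbers in the usage note are illustrative rather than taken from the frame.

```typescript
// Hypothetical shape: topic name -> share of the feed, in whole percent.
type FeedMix = Record<string, number>;

interface DiffRow {
  topic: string;
  before: number;
  after: number;
}

// Build the rows shown to the user: only topics whose share actually changes.
// Assumes `after` contains every topic present in `before`.
function buildDiff(before: FeedMix, after: FeedMix): DiffRow[] {
  return Object.keys(before)
    .filter((topic) => before[topic] !== after[topic])
    .map((topic) => ({ topic, before: before[topic], after: after[topic] }));
}

// A proposal is presentable only if every share is in range and the total is 100.
function isValidMix(mix: FeedMix): boolean {
  const shares = Object.keys(mix).map((topic) => mix[topic]);
  const total = shares.reduce((sum, share) => sum + share, 0);
  return shares.every((share) => share >= 0 && share <= 100) && total === 100;
}
```

With an illustrative mix of sports 40, science 35, other 25 and a proposal of sports 20, science 55, other 25, `buildDiff` yields two rows and `isValidMix` accepts the proposal. A proposal that totals 101 would be rejected before the user ever sees it.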
Frame 5.2 · Entry point
For you | Following
Your For You feed · Tune
Frank Chimero @fchimero · 4h
A grid is a contract with the reader. Breaking it should cost the designer something — or it doesn't mean anything.
32 184 2.1K 78K
Andrej Karpathy @karpathy · 5h
New paper: scaling reasoning at inference is not just "use more tokens" — there's a clean separation between search depth and search width that nobody is exploiting yet.
412 2.1K 18.4K 980K
The "Tune" chip is the only persistent Grok affordance in the timeline header. One tap from any feed view. The post-steer feed reflects the user's stated taste — design and AI research, exactly as ordered.
07 / Insertion 6

Conversational Time-Scrub

X is the largest real-time archive of human reaction in existence and it's almost impossible to query. Grok unlocks it. Ask in plain language; get a timeline of that moment.

Frame 6.1 · Scrub query + result
"what was X saying the day GPT-4 launched"
Scrubbed to Mar 14, 2023 · 10am – 11pm PT · filtered to your interest graph
Sam Altman @sama · Mar 14, 2023
here is GPT-4, our most capable and aligned model yet. it is available today in our API (with a waitlist) and in ChatGPT+.
8.4K 32K 280K 24M
Andrej Karpathy @karpathy · Mar 14, 2023
congrats to OpenAI team on GPT-4! a few early observations: handles much longer contexts, dramatically better at multi-step reasoning, and the image input is a real step change.
1.2K 4.8K 41K 3.2M
Emad @EMostaque · Mar 14, 2023
GPT-4 is impressive but also expensive — the real story of 2023 will be open-source models closing the gap faster than anyone expected.
412 2.1K 12K 480K
The result is a timeline, not a chatbot summary. Grok resolves the date and filter; X renders posts the way it always renders posts. Continue scrolling to load more from that window. The interaction model stays familiar.
  1. Why a real timeline, not a summary. Users trust posts more than summaries. Grok's job is to find the right posts, not to paraphrase them.
  2. Why "filtered to your interest graph." Without this filter, scrubbing March 14, 2023 returns mostly noise. With it, the user gets what they would have seen.
  3. Why this is build cost #6. Heaviest backend work — historical retrieval + temporal ranking. Ship insertions 1–4 first, validate the lift, then invest here.
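
The split described above (Grok resolves the date and filter, X renders ordinary posts) can be sketched as a resolved window applied to a list of posts. Everything here is hypothetical: the spec defines no data model, `Post`, `ScrubWindow`, and `scrub` are illustrative names, and a flat list of topic strings stands in for the interest graph.

```typescript
// Hypothetical types; the spec does not define a data model for scrub queries.
interface Post {
  id: string;
  createdAt: string; // ISO 8601 timestamp
  topics: string[];
}

interface ScrubWindow {
  start: string; // inclusive, ISO 8601 with UTC offset
  end: string; // exclusive, ISO 8601 with UTC offset
  interests: string[]; // stand-in for the user's interest graph
}

// Keep posts inside the window that touch at least one interest, oldest first,
// so the result can be rendered as a normal timeline.
function scrub(posts: Post[], w: ScrubWindow): Post[] {
  const start = Date.parse(w.start);
  const end = Date.parse(w.end);
  return posts
    .filter((p) => {
      const t = Date.parse(p.createdAt);
      const inWindow = t >= start && t < end;
      const relevant = p.topics.some((topic) => w.interests.indexOf(topic) !== -1);
      return inWindow && relevant;
    })
    .sort((a, b) => Date.parse(a.createdAt) - Date.parse(b.createdAt));
}
```

For the frame above, the window would resolve to `2023-03-14T10:00:00-07:00` through `2023-03-14T23:00:00-07:00`, since Pacific time was on daylight saving (UTC-7) on that date. Real historical retrieval and temporal ranking would sit behind this interface and be far heavier than an in-memory filter.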
08 / Honest

What kills this

A design that doesn't name its failure modes isn't a design; it's a poster. The six ways this dies, and how the system is structured to survive them.

Risk 1

Hallucination on a real post

Confidence labels visible by default. Source chips on every answer. One-tap report routes to the training pipeline. Context chip is tuned for high precision, lower recall — we'd rather say nothing than say something wrong.

Risk 2

Latency kills the feature

400ms first-token, 1.5s full answer — non-negotiable. Pre-fetch on viewport entry for top-of-feed posts. If we can't hit budget, we don't ship that surface.
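
One way to make "if we can't hit budget, we don't ship" checkable is a gate over measured timings. The two budget numbers below come from this spec. Judging them at the 95th percentile is an assumption, and `AnswerTiming`, `percentile`, and `meetsBudget` are hypothetical names.

```typescript
// Budget values from the spec; everything else here is a hypothetical sketch.
const BUDGET_MS = { firstToken: 400, fullAnswer: 1500 };

interface AnswerTiming {
  firstTokenMs: number;
  fullAnswerMs: number;
}

// Nearest-rank percentile. Callers must pass a non-empty array.
function percentile(values: number[], p: number): number {
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.min(sorted.length, Math.max(1, Math.ceil(p * sorted.length)));
  return sorted[rank - 1];
}

// Assumption: a surface "hits budget" when both timings are within budget at
// the chosen percentile (95th by default).
function meetsBudget(samples: AnswerTiming[], p: number = 0.95): boolean {
  if (samples.length === 0) return false; // no data is not evidence of speed
  const firstToken = percentile(samples.map((s) => s.firstTokenMs), p);
  const fullAnswer = percentile(samples.map((s) => s.fullAnswerMs), p);
  return firstToken <= BUDGET_MS.firstToken && fullAnswer <= BUDGET_MS.fullAnswer;
}
```

Gating on a high percentile rather than the average matters because the users who notice latency are the ones in the slow tail.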

Risk 3

Users feel watched

Every Grok action is opt-in per-tap. The only auto-trigger is the Thread Synthesizer on ≥8-reply threads, and it's collapsible globally in settings. Permanent dismiss is one tap.

Risk 4

Eng builds it as a sidebar

Every artifact in this spec rejects the sidebar. The principle "Grok is a layer, not a destination" is the load-bearing constraint and the first thing in the spec. Design reviews exist to defend it.

Risk 5

Verdict creep

Pressure will mount to make Grok rule on truth claims. We hold the line: context, never verdict. Community Notes does fact-checking with strong governance — that's the right tool for that job.

Risk 6

Visual creep

The temptation to give Grok a brand color (purple, gradient, shimmer) will be constant. The X brand is monochrome by design. The Grok mark and the 1px inner border are the entire visual language — and the restraint is the point.

09 / Details

The screens that prove it ships

A design that only shows the happy path isn't designed yet. These are the states an IC owns: loading, errors, the user who wants nothing to do with Grok, and the desktop user with twice the canvas.

Frame 9.1 · Streaming answer
For you | Following
Patrick Collison @patrickc · 3h
The Hubble tension is now ~5σ. Cosmology is in the same place particle physics was in 1900.
89 412 3.2K 220K
Grok · thinking
The Hubble tension is the disagreement between two ways of measuring how fast the universe is expanding — early-universe data (CMB) and late-universe data (supernovae) — now off by roughly 5 standard
Loading sources…
400ms first token. The label changes from "Grok" to "Grok · thinking" so the user knows the system is alive. Sources fade in as they ground; confidence renders only once the answer is complete.
Frame 9.2 · Report inaccuracy
Report this answer
Grok's answer (under review)
"USGS 2024 data shows 72% of US rare earth imports came from China…"
What's wrong?

Reports route to the training pipeline. Repeated reports on the same answer pause it for all users while reviewed.

The trust loop. Users need a fast, dignified way to flag Grok being wrong. The report categories aren't generic — they match the actual failure modes the model has.
Frame 9.3 · Settings · Grok preferences
Grok in the timeline
Grok appears inline when it has useful context to add to a post. You're always in control.
Show "Ask Grok" on posts
Inline affordance below the action row
Thread Synthesizer
On threads with ≥8 replies
Claim-Context Chip
Context for factual claims
Compose Co-Writer
Private draft review before posting

You can still use Grok directly from the Grok tab. This setting only affects the timeline.

Granular by default, total opt-out always one tap away. No dark patterns. The "off" button is styled with the destructive color so it's findable — not buried.
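
The settings model in this frame can be sketched as four per-surface toggles plus one master switch. The type and field names are hypothetical, and the spec does not state which toggles start on, so no defaults are assumed here.

```typescript
// The four timeline surfaces listed in the settings frame. Identifier names are hypothetical.
type GrokSurface = "askGrok" | "threadSynthesizer" | "claimContext" | "coWriter";

interface GrokPrefs {
  timelineOff: boolean; // the one-tap total opt-out
  surfaces: Record<GrokSurface, boolean>; // granular toggles
}

// A surface renders only if the master switch is on AND its own toggle is on.
function isSurfaceEnabled(prefs: GrokPrefs, surface: GrokSurface): boolean {
  return !prefs.timelineOff && prefs.surfaces[surface];
}
```

Keeping the master switch separate from the granular toggles means a user who opts out and later opts back in gets their previous per-surface choices back, instead of a reset.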
Frame 9.4 · Desktop · same layer, more canvas
Home
Tune
Andrej Karpathy @karpathy · 5h
New paper: scaling reasoning at inference is not just "use more tokens" — there's a clean separation between search depth and search width that nobody is exploiting yet.
412 2.1K 18.4K 980K
Grok
Andrej is referring to the distinction between deeper chains-of-thought (depth) and exploring more candidate solutions in parallel (width). Recent work from DeepMind and Anthropic suggests width scaling is currently under-allocated relative to its returns.
[DeepMind · 2025] [arXiv:2501.04123] [@karpathy thread]
High confidence
The layer behaves the same on desktop. Wider canvas, same single-column attention model, same inline anchoring. No sidebar appears on desktop, even though we have the pixels — because the principle is "Grok is a layer," and the principle is platform-agnostic.
Frame 9.5 · When Grok has nothing to add
For you | Following
friend @friend · 12m
lol
2 14 240
designer @designer · 22m
gm
8 41 580
Casey Newton @CaseyNewton · 1h
The EU's new digital services regulation drops next week and most US tech doesn't have a compliance plan yet.
84 220 1.4K 62K
Ask Grok about this
The most important design decision is the absence of one. "lol" and "gm" don't get a Grok affordance. The system has to know when to stay quiet, or it becomes noise. Affordance density is itself a design surface.
Frame 9.6 · First-time consent
For you | Following

Grok can help with this post.

When you tap "Ask Grok," we send the post and your question to Grok and show you the answer below the post. Sources cited. Nothing is shared with the original author.

Sent to Grok: Post content + your question
Not sent: Your identity, other posts
Visible to author: No
Consent is structural, not legal. The user sees what's sent, what isn't, and whether the post author can see anything. No dark pattern between Yes and No. Either button is acceptable.
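
The "sent / not sent" table can be enforced structurally by building the request from an allowlist instead of stripping fields from a larger object. The sketch below is hypothetical throughout: `AskContext`, `GrokRequest`, and `buildGrokRequest` are illustrative names, and the fields on `AskContext` are assumptions about what the client has on hand.

```typescript
// Hypothetical: everything the client knows at the moment the user taps "Ask Grok".
interface AskContext {
  postText: string;
  question: string;
  userId: string;
  userHandle: string;
  otherPostIds: string[];
}

// Hypothetical request shape: only what the consent screen says is sent.
interface GrokRequest {
  postText: string;
  question: string;
}

// Copy the two allowed fields explicitly. Anything added to AskContext later
// stays out of the request unless someone adds it here on purpose.
function buildGrokRequest(ctx: AskContext): GrokRequest {
  return { postText: ctx.postText, question: ctx.question };
}
```

An allowlist fails safe: a new field on the context object is private by default, which matches the promise the consent screen makes to the user.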
  1. Why these screens matter. Anyone can mock the happy path. The IC's job is to design the trust loop: streaming → uncertainty → reporting → adjustment. These six frames are the trust loop.
  2. Why the "lol / gm" frame is in here. A senior designer ships the absence. Knowing where the affordance doesn't render is the same skill as knowing where it does.