Grok in the Timeline · Visual Spec v1.0

Grok shouldn't be a tab. It should be a layer.

Six surgical insertions of Grok into the existing X surface, designed to take cognitive load off the user (target: a 30% reduction) without breaking a single user habit. Each frame is independently shippable. The system stays monochrome.

Built by David T Phung
For the xAI Product Designer (IC) role

Companion to spec.html
Grounded in X Brand Quick Guide v1.0

2026

00 / Frame

The shift

Today, Grok is a destination users go to. Tomorrow, Grok is a layer that meets users where they already are. Everything that follows is in service of that move.

Before

Grok as a tab

Lives in a separate top-level surface
Every question is a context switch from the timeline
Users copy-paste posts into Grok to ask about them
No memory of what the user was just reading
Competes with the feed for attention and engagement
Treated as a chatbot product, not a system capability

After

Grok as a layer

Lives inside the timeline, post detail, and compose
One tap to invoke, scroll-away to dismiss
Always anchored to the post or thread that prompted it
Cites sources by default - context, never verdict
Steerable: users can adjust the feed in plain language
Treated as infrastructure that lifts every flow

01 / System

Visual primitives

The X brand is rigorously monochrome - black, white, and one muted gray for secondary text. Grok's presence in the system has to earn its pixels. No purple. No gradients. No "AI shimmer." Three primitives. That's the whole language.

The Grok Mark

A 12px outlined diagonal with a center node. Visually consonant with the X mark, distinct enough to signal "generated." Used inline on Grok-touched surfaces and nowhere else.

The Grok Surface

Grok

The chart is normalized by population. The 12% figure refers to growth rate, not absolute count.

1px inner border at rgba(255,255,255,0.16), 12px radius, near-black fill at 2% white. One surface treatment, applied to every layer.
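As a sketch only, the surface treatment can be pinned down as one immutable token set that emits CSS, so every Grok layer pulls identical values. The `GrokSurface` name, the `to_css` helper, and drawing the 1px inner border with an inset `box-shadow` (so the border never changes the box's size) are assumptions for illustration, not X's actual token pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GrokSurface:
    """Hypothetical token set for the Grok Surface primitive."""
    border_width_px: int = 1
    border_alpha: float = 0.16   # rgba(255,255,255,0.16)
    radius_px: int = 12
    fill_alpha: float = 0.02     # 2% white over the near-black page

    def to_css(self) -> str:
        # An inset box-shadow draws the border inside the box, so the
        # surface does not grow by 1px when Grok content is inserted.
        return "\n".join([
            f"border-radius: {self.radius_px}px;",
            f"background: rgba(255, 255, 255, {self.fill_alpha});",
            f"box-shadow: inset 0 0 0 {self.border_width_px}px "
            f"rgba(255, 255, 255, {self.border_alpha});",
        ])


if __name__ == "__main__":
    print(GrokSurface().to_css())
```

Freezing the dataclass means no individual surface can drift from the shared values at runtime.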

The Source Chip

@nateliason · post Reuters · Mar 14 +2 sources

Every Grok answer carries source chips. Tap to jump to the post or open the link. If sources aren't strong, Grok says so via the confidence indicator - never silently.

02 / Insertion 1

Inline Ask-Grok

A subtle affordance on every post. One tap, and Grok inserts a single-sentence "what's this about" answer directly below the post, with source chips. The cell expands; the user never loses their place.

Frame 1.1 · Resting state

For you · Following

Naval ✓ @naval · 2h

The compound interest of writing in public is the largest unexploited arbitrage of the internet.

412 1.2K 14.6K 1.1M

Ask Grok about this

Patrick Collison ✓ @patrickc · 3h

The Hubble tension is now ~5σ. Cosmology is in the same place particle physics was in 1900.

89 412 3.2K 220K

The affordance only appears on posts where Grok has something to add. Posts like "lol" or "gm" don't render it. The default state is quiet.
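One way to keep the default state quiet is a cheap pre-filter that runs before any model call. This is a minimal sketch: the `LOW_SIGNAL` set, the five-word floor, and `should_show_ask_grok` are hypothetical, and a production gate would presumably also use a model-side score for "has something to add."

```python
import re

# Hypothetical low-signal posts that never get the affordance.
LOW_SIGNAL = {"lol", "gm", "gn", "lmao"}
MIN_WORDS = 5  # assumption: shorter posts rarely carry an idea worth explaining


def should_show_ask_grok(text: str) -> bool:
    """Return True if 'Ask Grok about this' should render under the post."""
    # Strip punctuation and case so "gm!" and "GM" are treated like "gm".
    normalized = re.sub(r"[^\w\s]", "", text).strip().lower()
    if not normalized or normalized in LOW_SIGNAL:
        return False
    return len(normalized.split()) >= MIN_WORDS
```

The filter only decides whether the affordance renders; it never generates anything, so it costs nothing on the hot path.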

Frame 1.2 · Expanded

For you · Following

Patrick Collison ✓ @patrickc · 3h

The Hubble tension is now ~5σ. Cosmology is in the same place particle physics was in 1900.

89 412 3.2K 220K

Grok

The Hubble tension is the disagreement between two ways of measuring how fast the universe is expanding - early-universe data (CMB) and late-universe data (supernovae) - now off by roughly 5 standard deviations. Patrick is comparing it to the late-1800s "ultraviolet catastrophe," which preceded quantum mechanics.

Nature · 2024 @SeanCarroll thread arXiv:2401.18965

High confidence · grounded in 3 sources

Ask a follow-up

Answer is anchored to the post that prompted it. Sources are chips, not footnotes - tappable, named, dated. Confidence is shown explicitly so users can calibrate trust.

Why no sidebar. A sidebar would force users to track two columns of state simultaneously. Inline expansion preserves single-column attention - the thing the timeline does best.
Why "Ask Grok about this" wording. Active verb, specific object. Tells the user exactly what they're getting and that they're driving.
Why confidence is visible by default. Hidden uncertainty is dishonest. Showing it builds the calibration loop users need to trust the system.
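A minimal sketch of the answer object these rules imply, assuming a simple three-level confidence label. `GrokAnswer`, `SourceChip`, and the rule that an answer with no sources is always labeled Low are illustrative assumptions; the point is that the post anchor, the chips, and the confidence line are required parts of the data, not optional decoration.

```python
from dataclasses import dataclass, field
from typing import Literal

Confidence = Literal["High", "Medium", "Low"]


@dataclass(frozen=True)
class SourceChip:
    label: str  # e.g. "Nature · 2024"
    url: str    # tap target: the post or the external link


@dataclass
class GrokAnswer:
    post_id: str  # every answer is anchored to exactly one post
    text: str
    sources: list[SourceChip] = field(default_factory=list)
    confidence: Confidence = "Low"

    def footer(self) -> str:
        n = len(self.sources)
        if n == 0:
            # Weak grounding is stated, never hidden: no sources forces Low.
            return "Low confidence · no sources found"
        noun = "source" if n == 1 else "sources"
        return f"{self.confidence} confidence · grounded in {n} {noun}"
```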

03 / Insertion 2

Thread Synthesizer

When a thread has ≥8 replies, Grok summarizes the shape of the conversation above the original post. Consensus, disagreement, notable replies. The user can read four lines instead of four hundred - or expand into the full thread.

Frame 2.1 · Thread synthesizer banner

Post

Grok · synthesizing a sample of 247 replies

The thread is split: ~60% are pushing back on the methodology of the chart (normalization, sample size). The author posted a clarification at reply #34. A separate sub-thread (@karpathy, 41 replies) reframes the question as "is this even the right metric."

Consensus

The underlying data is real and recent. Source is credible.

Disagreement

Whether absolute or per-capita numbers are the honest comparison.

Worth reading

@karpathy's sub-thread reframes the metric. @primalpoly (reply #12) has the cleanest counter-chart.

Updated 2 min ago · 12% sample · Hide synthesizer

Tyler Cowen ✓ @tylercowen · 6h

The chart making the rounds is real but normalized in a way that flatters the conclusion. Here's the per-capita version.

247 1.4K 8.9K 642K

The synthesizer is visibly generated - small Grok mark, sample size, timestamp. Users know it's a model's read, not editorial. It is collapsible, with the default state set by user preference.

The "12% sample" disclosure. Most replies aren't read by Grok in real time - it samples. Showing the sample size is non-negotiable. Pretending Grok read all 247 replies is dishonest design.
Why three columns. Consensus / Disagreement / Notable maps to the three things users actually want from a thread: "what do most people think," "where's the fight," "what's worth my time."
Why this lives above the OP. Above is the only place that lets users decide before investing scroll. Below the OP is too late.
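The trigger and the sample disclosure reduce to a few lines. This sketch assumes a uniform random sample and a configurable rate; the spec only fixes the ≥8-reply threshold and the requirement to show the sample size. The function names are hypothetical.

```python
import random

MIN_REPLIES = 8  # from the spec: synthesize only threads with >= 8 replies


def should_synthesize(reply_count: int) -> bool:
    return reply_count >= MIN_REPLIES


def sample_replies(replies: list[str], rate: float, seed: int = 0) -> list[str]:
    """Uniform random sample; the real sampling/ranking policy is unspecified."""
    k = max(1, round(len(replies) * rate))
    return random.Random(seed).sample(replies, k)


def banner_meta(total: int, sampled: int, minutes_ago: int) -> str:
    # The disclosed percentage is computed from what was actually read.
    pct = round(100 * sampled / total)
    return f"Updated {minutes_ago} min ago · {pct}% sample"
```

At a 12% rate, a 247-reply thread yields a 30-reply sample, and `banner_meta(247, 30, 2)` returns "Updated 2 min ago · 12% sample".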

04 / Insertion 3

Claim-Context Chip

When a post contains a checkable factual claim, a small chip appears: "Context available." Tap to see what Grok found, with sources and explicit confidence. Grok surfaces context - never a verdict. That's Community Notes' job.

Frame 3.1 · Chip resting

For you · Following

Anonymous Builder @buildernyc · 1h

The US imports 90% of its rare earths from China. We have zero domestic refining capacity.

22 184 1.1K 48K

Context available · 2 claims checked

The chip is quiet. It doesn't shout "wrong" or "right" - it offers context. Users are adults; they can read the underlying data.

Frame 3.2 · Chip expanded

For you · Following

Anonymous Builder @buildernyc · 1h

The US imports 90% of its rare earths from China. We have zero domestic refining capacity.

22 184 1.1K 48K

Context · 2 claims checked

"90% from China" - USGS 2024 data shows 72% of US rare earth imports came from China; ~95% of global refining is in China.

"Zero domestic refining" - MP Materials reopened Mountain Pass refining in 2023; capacity is limited but not zero.

USGS · 2024 MP Materials 10-K Reuters · Mar 2024

High confidence · 3 primary sources

This is context, not a verdict.

Report inaccuracy · Open in Community Notes

Context, not verdict. The user gets the underlying data and decides. Reporting and Community Notes are one tap away - Grok is the first-pass research layer, not the arbiter.
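Context, not verdict, can be enforced in the data model itself: if the record has no field for true or false, no surface can render one. A minimal sketch, assuming a calibrated 0–1 confidence score and a hypothetical `MIN_CONFIDENCE` of 0.9 that trades recall for precision; `ClaimContext` and `chip_label` are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional

MIN_CONFIDENCE = 0.9  # hypothetical: favor precision over recall


@dataclass(frozen=True)
class ClaimContext:
    claim: str                # quoted span from the post
    finding: str              # what the sources say, in plain language
    sources: tuple[str, ...]  # chip labels, e.g. "USGS · 2024"
    confidence: float         # calibrated confidence, 0..1
    # Deliberately no `verdict` / `is_true` field: context, never verdict.


def chip_label(contexts: list[ClaimContext]) -> Optional[str]:
    """Return the resting-chip text, or None to render nothing at all."""
    shown = [c for c in contexts if c.sources and c.confidence >= MIN_CONFIDENCE]
    if not shown:
        return None  # rather say nothing than say something wrong
    noun = "claim" if len(shown) == 1 else "claims"
    return f"Context available · {len(shown)} {noun} checked"
```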

05 / Insertion 4

Compose Co-Writer

The #1 reason lurkers don't post is fear of being wrong. The Co-Writer gives users a private review before they hit Post: sharpen, steelman, or fact-check. The original draft is never overwritten without consent. Lurker-to-poster conversion is the metric this insertion moves.

Frame 4.1 · Draft state

Cancel

Post

Everyone can reply

AI is going to eat 90% of customer support jobs in the next 2 years. The unit economics finally work.

📷 🎬 📊 📍 · Co-Writer

The Co-Writer affordance appears only when the draft is over 40 characters - below that, the user knows what they're doing. Above that, there's enough content for Grok to actually help.

Frame 4.2 · Co-Writer engaged

Cancel

Post

AI is going to eat 90% of customer support jobs in the next 2 years. The unit economics finally work.

Co-Writer

Sharpen

Tighter, same voice

Steelman

Strengthen weak claims

Check

2 flags found

"90% in 2 years" is a strong claim - current industry forecasts (Gartner, McKinsey) cluster around 30–50% automation by 2027. You'll likely get pushback.

Suggestion: "AI is going to take a major bite out of customer support - Gartner sees 30%+ automation by 2027, and unit economics are finally working."

The Co-Writer is private. Nothing is sent to the network. The user posts what they decide to post. Grok's job is to be the friend who reads your draft before you tweet it.

Why "Check" is selected by default. Of the three actions, fact-flagging has the highest expected utility for the most users. Sharpen/Steelman are voice-dependent; Check is universally useful.
Why the user's original draft is shown above unchanged. Grok is not the author. The user is. Anything Grok generates is a suggestion in a separate frame.
Why "Keep mine" is a full button. It carries equal weight to "Use suggestion." We're not nudging.
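The Co-Writer rules above fit in a small state object. A sketch under stated assumptions: `CoWriterSession`, `co_writer_visible`, and the action names are hypothetical. The original draft is held separately from any suggestion, and "Keep mine" and "Use suggestion" map to the same call with a single boolean, so neither path is privileged.

```python
from dataclasses import dataclass
from typing import Optional

MIN_DRAFT_CHARS = 40  # spec: affordance appears only when the draft is over 40 chars
ACTIONS = ("sharpen", "steelman", "check")


def co_writer_visible(draft: str) -> bool:
    return len(draft) > MIN_DRAFT_CHARS


@dataclass
class CoWriterSession:
    original: str                     # the user's draft; Grok never edits it
    active_action: str = "check"      # spec: Check is selected by default
    suggestion: Optional[str] = None  # rendered in a separate frame

    def __post_init__(self) -> None:
        if self.active_action not in ACTIONS:
            raise ValueError(f"unknown action: {self.active_action}")

    def resolve(self, use_suggestion: bool) -> str:
        """'Keep mine' and 'Use suggestion' are symmetric, explicit choices."""
        if use_suggestion and self.suggestion is not None:
            return self.suggestion
        return self.original
```

Nothing in this object touches the network, which matches the spec's "the Co-Writer is private."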

06 / Insertion 5

Steerable For You

The For You algorithm is currently a black box users resent. Letting them steer it in plain language is the most honest interaction X could ship - and only possible with a model in the loop. This is the moat.

Frame 5.1 · Tune your feed

Done

Tune your feed

What's shaping your feed right now

NBA & basketball · 28%

AI / ML research · 22%

Startup / VC · 18%

Politics (US) · 14%

Design & typography · 6%

Other · 12%

Less NBA. Less politics. More design and AI papers.

Grok

I'll lower NBA from 28% to 10% and US politics from 14% to 4%. Design will grow from 6% to 22% and AI research from 22% to 34%. Here's the diff:

NBA · 28% → 10%

Politics · 14% → 4%

Design · 6% → 22%

AI research · 22% → 34%

I can't filter individual accounts here - use mute/block for that.

The diff is the trust mechanism. Users see exactly what Grok will change before they apply it. Grok also names what it can't do - honest about the algorithm's boundaries.
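Because the shares describe one feed, any proposed diff should still total 100% before Grok offers to apply it. A minimal sketch, assuming integer percentage shares keyed by topic; `apply_steer` and `render_diff` are hypothetical helpers, and the targets in the usage lines are illustrative.

```python
def apply_steer(shares: dict[str, int], targets: dict[str, int]) -> dict[str, int]:
    """Apply proposed targets and refuse any mix that no longer sums to 100%."""
    unknown = set(targets) - set(shares)
    if unknown:
        raise ValueError(f"unknown topics: {sorted(unknown)}")
    new = {**shares, **targets}  # keeps the original topic order
    total = sum(new.values())
    if total != 100:
        raise ValueError(f"shares sum to {total}%, not 100%")
    return new


def render_diff(old: dict[str, int], new: dict[str, int]) -> list[str]:
    """One line per changed topic, in the order the user already sees them."""
    return [f"{t}: {old[t]}% → {new[t]}%" for t in old if old[t] != new[t]]


current = {"NBA": 28, "AI research": 22, "Startup / VC": 18,
           "Politics": 14, "Design": 6, "Other": 12}
steered = apply_steer(current, {"NBA": 10, "Politics": 4,
                                "Design": 22, "AI research": 34})
```

Lowering NBA to 10% and politics to 4% frees 28 points. Spending them as design 22% and AI research 34% keeps the mix at 100%; raising AI research to 35% alongside the other three changes totals 101% and is rejected.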

Frame 5.2 · Entry point

For you · Following

Your For You feed · Tune

Frank Chimero ✓ @fchimero · 4h

A grid is a contract with the reader. Breaking it should cost the designer something - or it doesn't mean anything.

32 184 2.1K 78K

Andrej Karpathy ✓ @karpathy · 5h

New paper: scaling reasoning at inference is not just "use more tokens" - there's a clean separation between search depth and search width that nobody is exploiting yet.

412 2.1K 18.4K 980K

The "Tune" chip is the only persistent Grok affordance in the timeline header. One tap from any feed view. The post-steer feed reflects the user's stated taste - design and AI research, exactly as ordered.

07 / Insertion 6

Conversational Time-Scrub

X is the largest real-time archive of human reaction in existence, and it's almost impossible to query. Grok unlocks it. Ask in plain language; get a timeline of that moment.

Frame 6.1 · Scrub query + result

"what was X saying the day GPT-4 launched"

Scrubbed to Mar 14, 2023 · 10am – 11pm PT · filtered to your interest graph

Sam Altman ✓ @sama · Mar 14, 2023

here is GPT-4, our most capable and aligned model yet. it is available today in our API (with a waitlist) and in ChatGPT+.

8.4K 32K 280K 24M

Andrej Karpathy ✓ @karpathy · Mar 14, 2023

congrats to OpenAI team on GPT-4! a few early observations: handles much longer contexts, dramatically better at multi-step reasoning, and the image input is a real step change.

1.2K 4.8K 41K 3.2M

Emad ✓ @EMostaque · Mar 14, 2023

GPT-4 is impressive but also expensive - the real story of 2023 will be open-source models closing the gap faster than anyone expected.

412 2.1K 12K 480K

The result is a timeline, not a chatbot summary. Grok resolves the date and filter; X renders posts the way it always renders posts. Continue scrolling to load more from that window. The interaction model stays familiar.

Why a real timeline, not a summary. Users trust posts more than summaries. Grok's job is to find the right posts, not to paraphrase them.
Why "filtered to your interest graph." Without this filter, scrubbing March 14, 2023 returns mostly noise. With it, the user gets what they would have seen.
Why this ships last. It carries the heaviest backend work - historical retrieval plus temporal ranking. Ship insertions 1–4 first, validate the lift, then invest here.
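Once Grok has resolved the query into a time window, the retrieval step is a filter plus an ordering. This sketch assumes the interest graph is a plain set of handles and orders results chronologically; real temporal ranking would be richer. `Post`, `scrub`, the fixed PDT offset (UTC-7, since daylight time began on Mar 12, 2023), and the timestamps in the usage are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

PT = timezone(timedelta(hours=-7))  # fixed PDT offset for this example window


@dataclass(frozen=True)
class Post:
    author: str
    created_at: datetime
    text: str


def scrub(posts: list[Post], start: datetime, end: datetime,
          interest_graph: set[str]) -> list[Post]:
    """Keep posts in [start, end) from accounts in the user's graph, oldest first."""
    hits = [p for p in posts
            if start <= p.created_at < end and p.author in interest_graph]
    return sorted(hits, key=lambda p: p.created_at)


# Window Grok resolved from "the day GPT-4 launched": 10am – 11pm PT.
start = datetime(2023, 3, 14, 10, tzinfo=PT)
end = datetime(2023, 3, 14, 23, tzinfo=PT)
```

The output is a list of ordinary posts, so the client renders it with the existing timeline cell rather than a new summary component.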

08 / Honest

What kills this

A design that doesn't name its failure modes isn't a design - it's a poster. The six ways this dies, and how the system is structured to survive them.

Risk 1

Hallucination on a real post

Confidence labels visible by default. Source chips on every answer. One-tap report routes to the training pipeline. Context chip is tuned for high precision, lower recall - we'd rather say nothing than say something wrong.

Risk 2

Latency kills the feature

400ms to first token, 1.5s to full answer - non-negotiable. Pre-fetch on viewport entry for top-of-feed posts. If we can't hit the budget, we don't ship that surface.

Risk 3

Users feel watched

Every Grok action is opt-in per-tap. The only auto-trigger is the Thread Synthesizer on ≥8-reply threads, and it's collapsible globally in settings. Permanent dismiss is one tap.

Risk 4

Eng builds it as a sidebar

Every artifact in this spec rejects the sidebar. The principle "Grok is a layer, not a destination" is the load-bearing constraint and the first thing in the spec. Design reviews exist to defend it.

Risk 5

Verdict creep

Pressure will mount to make Grok rule on truth claims. We hold the line: context, never verdict. Community Notes does fact-checking with strong governance - that's the right tool for that job.

Risk 6

Visual creep

The temptation to give Grok a brand color (purple, gradient, shimmer) will be constant. The X brand is monochrome by design. The Grok mark and the 1px inner border are the entire visual language - and the restraint is the point.
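Risk 2's ship-or-don't rule can be made mechanical. This sketch assumes the budgets are enforced at the 95th percentile of measured latencies, which the spec does not state; `surface_can_ship` and `p95` are hypothetical names. It uses `statistics.quantiles` from the Python standard library.

```python
import statistics

FIRST_TOKEN_BUDGET_MS = 400   # from the spec
FULL_ANSWER_BUDGET_MS = 1500  # from the spec


def p95(samples_ms: list[float]) -> float:
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    return statistics.quantiles(samples_ms, n=20)[-1]


def surface_can_ship(first_token_ms: list[float], full_answer_ms: list[float]) -> bool:
    """A surface ships only if both latency budgets hold (assumed: at p95)."""
    return (p95(first_token_ms) <= FIRST_TOKEN_BUDGET_MS
            and p95(full_answer_ms) <= FULL_ANSWER_BUDGET_MS)
```

Gating on a tail percentile rather than the mean means a surface that is fast on average but slow for one user in ten still fails the check.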

09 / Details

The screens that prove it ships

A design that only shows the happy path isn't designed yet. These are the states an IC owns: loading, errors, the user who wants nothing to do with Grok, and the desktop user with twice the canvas.

Frame 9.1 · Streaming answer

For you · Following

Patrick Collison ✓ @patrickc · 3h

The Hubble tension is now ~5σ. Cosmology is in the same place particle physics was in 1900.

89 412 3.2K 220K

Grok · thinking

The Hubble tension is the disagreement between two ways of measuring how fast the universe is expanding - early-universe data (CMB) and late-universe data (supernovae) - now off by roughly 5 standard

Loading sources…

400ms to first token. The label changes from "Grok" to "Grok · thinking" so the user knows the system is alive. Sources fade in as they ground; confidence renders only once the answer is complete.
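The streaming rules above (thinking label while tokens arrive, sources accumulating, confidence withheld until the end) fit a small state object. A sketch only; `StreamingAnswer` and its methods are hypothetical, and it assumes the header returns to plain "Grok" once the answer completes, as in Frame 1.2.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StreamingAnswer:
    tokens: list[str] = field(default_factory=list)
    sources: list[str] = field(default_factory=list)
    done: bool = False
    confidence: Optional[str] = None

    def header(self) -> str:
        # "Grok · thinking" while streaming signals the system is alive.
        return "Grok" if self.done else "Grok · thinking"

    def on_token(self, token: str) -> None:
        self.tokens.append(token)

    def on_source(self, label: str) -> None:
        self.sources.append(label)  # each chip fades in as it grounds

    def finish(self, confidence: str) -> None:
        self.done = True
        self.confidence = confidence

    def confidence_label(self) -> Optional[str]:
        # Never show confidence for a partial answer.
        return f"{self.confidence} confidence" if self.done else None
```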

Frame 9.2 · Report inaccuracy

Report this answer

Grok's answer (under review)

"USGS 2024 data shows 72% of US rare earth imports came from China…"

What's wrong?

Factually incorrect
Missing important context
Source is wrong or outdated
Bias or framing problem

Reports route to the training pipeline. Repeated reports on the same answer pause it for all users while it is reviewed.

The trust loop. Users need a fast, dignified way to flag Grok being wrong. The report categories aren't generic - they match the actual failure modes the model has.
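One way to implement "repeated reports pause the answer" is to count distinct reporters, so a single user can't pause an answer alone. A minimal sketch: the threshold of 3, the category keys, and `ReportQueue` are hypothetical, and a plain list stands in for the training pipeline.

```python
from collections import defaultdict

PAUSE_THRESHOLD = 3  # hypothetical: distinct reporters before an answer pauses

CATEGORIES = {"factually_incorrect", "missing_context",
              "bad_source", "bias_or_framing"}


class ReportQueue:
    def __init__(self) -> None:
        self._reporters: dict[str, set[str]] = defaultdict(set)  # answer -> users
        self.paused: set[str] = set()
        self.training_queue: list[tuple[str, str]] = []  # pipeline stand-in

    def report(self, answer_id: str, user_id: str, category: str) -> None:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        # Every report reaches the pipeline, even repeats from the same user.
        self.training_queue.append((answer_id, category))
        self._reporters[answer_id].add(user_id)
        if len(self._reporters[answer_id]) >= PAUSE_THRESHOLD:
            self.paused.add(answer_id)  # hidden for all users during review

    def is_visible(self, answer_id: str) -> bool:
        return answer_id not in self.paused
```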

Frame 9.3 · Settings · Grok preferences

Grok in the timeline

Grok appears inline when it has useful context to add to a post. You're always in control.

Show "Ask Grok" on posts

Inline affordance below the action row

Thread Synthesizer

On threads with ≥8 replies

Claim-Context Chip

Context for factual claims

Compose Co-Writer

Private draft review before posting

You can still use Grok directly from the Grok tab. This setting only affects the timeline.

Granular by default, total opt-out always one tap away. No dark patterns. The "off" button is styled with the destructive color so it's findable - not buried.
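Granular toggles with a one-step total opt-out map cleanly onto a flat preferences record. A sketch under assumptions: the field names and the all-on defaults are hypothetical, since the spec doesn't state defaults.

```python
from dataclasses import dataclass, fields


@dataclass
class GrokTimelinePrefs:
    """Hypothetical per-surface toggles; the Grok tab is not covered here."""
    ask_grok_on_posts: bool = True
    thread_synthesizer: bool = True
    claim_context_chip: bool = True
    compose_co_writer: bool = True

    def turn_all_off(self) -> None:
        # Total opt-out in one action, implemented over every toggle field
        # so a newly added surface can't be missed.
        for f in fields(self):
            setattr(self, f.name, False)

    def any_on(self) -> bool:
        return any(getattr(self, f.name) for f in fields(self))
```

Iterating over `fields` is the design choice that matters: the one-tap opt-out stays total even as surfaces are added.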

Frame 9.4 · Desktop · same layer, more canvas

Home

Tune

Andrej Karpathy ✓ @karpathy · 5h

New paper: scaling reasoning at inference is not just "use more tokens" - there's a clean separation between search depth and search width that nobody is exploiting yet.

412 2.1K 18.4K 980K

Grok

Andrej is referring to the distinction between deeper chains-of-thought (depth) and exploring more candidate solutions in parallel (width). Recent work from DeepMind and Anthropic suggests width scaling is currently under-allocated relative to its returns.

DeepMind · 2025 arXiv:2501.04123 @karpathy thread

High confidence

The layer behaves the same on desktop. Wider canvas, same single-column attention model, same inline anchoring. No sidebar appears on desktop, even though we have the pixels - because the principle is "Grok is a layer," and the principle is platform-agnostic.

Frame 9.5 · When Grok has nothing to add

For you · Following

friend @friend · 12m

lol

2 14 240

designer ✓ @designer · 22m

8 41 580

Casey Newton ✓ @CaseyNewton · 1h

The EU's new digital services regulation drops next week and most US tech doesn't have a compliance plan yet.

84 220 1.4K 62K

Ask Grok about this

The most important design decision is the absence of one. "lol" and "gm" don't get a Grok affordance. The system has to know when to stay quiet, or it becomes noise. Affordance density is itself a design surface.

Frame 9.6 · First-time consent

For you · Following

Grok can help with this post.

When you tap "Ask Grok," we send the post and your question to Grok and show you the answer below the post. Sources cited. Nothing is shared with the original author.

Sent to Grok: post content + your question

Not sent: your identity, other posts

Visible to author: No

Consent is structural, not legal. The user sees what's sent, what isn't, and whether the post author can see anything. No dark pattern between Yes and No. Either button is acceptable.
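The consent sheet is easiest to keep honest if the request type can only carry what the sheet says is sent. A minimal sketch, assuming "post content" means the post's text; `AskGrokRequest` and `build_request` are hypothetical names.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AskGrokRequest:
    """Everything sent to Grok for one 'Ask Grok' tap, and nothing else."""
    post_text: str
    question: str


def build_request(post: dict, question: str) -> AskGrokRequest:
    # Only the post's text crosses the boundary; the asker's identity and
    # other timeline posts have no field to land in.
    return AskGrokRequest(post_text=post["text"], question=question)
```

Because the type has exactly two fields, the "Not sent" row is enforced by the schema rather than by reviewer discipline.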

Why these screens matter. Anyone can mock the happy path. The IC's job is to design the trust loop: streaming → uncertainty → reporting → adjustment. These six frames are the trust loop.
Why the "lol / gm" frame is in here. A senior designer ships the absence. Knowing where the affordance doesn't render is the same skill as knowing where it does.