TL;DR: xAI’s Grok 4 Fast is a new 2M‑context multimodal model that lands #1 on LMArena’s Search Arena and top‑10 on the Text Arena, while launching at $0.20 / 1M input and $0.50 / 1M output tokens. It’s free for a limited time on OpenRouter and Vercel AI Gateway, and early signals point to reinforcement‑learning (RL) infrastructure, plus a lot of compute, behind the jump. (xAI; LMArena; Vercel)
Links referenced in this post:
Lech Mazur on Grok’s Connections benchmark results (NYT Connections “Extended”) → X (formerly Twitter)
John Boccio (xAI RL Infrastructure) on the new agent framework used in Grok 4 Fast’s training run → X (formerly Twitter)
xAI introduced Grok 4 Fast, a unified model exposed as two API SKUs:
grok-4-fast-reasoning and grok-4-fast-non-reasoning, both with a 2,000,000‑token context window. The “reasoning” vs. “non‑reasoning” behavior is steered by prompts but uses the same weights, so you don’t juggle separate models. (xAI)
Pricing (xAI API): $0.20 / 1M input, $0.50 / 1M output (cached input: $0.05 / 1M). Higher rates apply only to requests exceeding 128K context. Live search is billed at $25 / 1K sources. (xAI Docs)
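Those rates make per‑request budgeting a one‑liner. A minimal sketch with the launch rates above hard‑coded (the >128K‑context tier and live‑search fees are not modeled):

```python
def request_cost_usd(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimated cost of one Grok 4 Fast call at standard (<=128K-context) launch rates."""
    INPUT_RATE = 0.20 / 1_000_000   # $ per fresh input token
    CACHED_RATE = 0.05 / 1_000_000  # $ per cached input token
    OUTPUT_RATE = 0.50 / 1_000_000  # $ per output token
    fresh = input_tokens - cached_tokens
    return fresh * INPUT_RATE + cached_tokens * CACHED_RATE + output_tokens * OUTPUT_RATE

# A 100K-token prompt (half of it cache-hit) producing a 5K-token answer:
print(f"${request_cost_usd(100_000, 5_000, cached_tokens=50_000):.4f}")  # → $0.0150
```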
Availability: in Grok on web/iOS/Android, plus, for a limited time, free via OpenRouter and Vercel AI Gateway. (xAI; OpenRouter)
xAI’s announcement emphasizes large‑scale reinforcement learning and “tool‑use RL” (e.g., deciding when to browse or code) to maximize intelligence per token (“intelligence density”), claiming ~40% fewer thinking tokens than Grok 4 at comparable accuracy and a ~98% reduction in price to match Grok 4’s frontier results. (xAI)
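Those two claims compose. Assuming Grok 4’s list price of $15 / 1M output tokens (its rate at launch; my assumption, not stated in this announcement), 40% fewer thinking tokens at the $0.50 rate works out to roughly the claimed ~98% reduction:

```python
# Back-of-envelope check on the "~98% cheaper" claim. Assumes Grok 4's
# launch price of $15 / 1M output tokens (not stated in this post).
grok4_out = 15.00     # $ per 1M output (thinking) tokens, Grok 4
grok4fast_out = 0.50  # $ per 1M output tokens, Grok 4 Fast
token_ratio = 0.60    # ~40% fewer thinking tokens at comparable accuracy
cost_ratio = (grok4fast_out * token_ratio) / grok4_out
print(f"{1 - cost_ratio:.0%} cheaper to reach the same result")  # → 98% cheaper ...
```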
How good is it (so far)?
LMArena: #1 in Search, top‑10 in Text
Search Arena: grok-4-fast-search is #1 with a preliminary Elo of 1163, edging out o3‑search, gpt‑5‑search, and gemini‑2.5‑pro‑grounding. As always, these are blind head‑to‑head votes in which humans compare anonymous model outputs. Early, but notable. (LMArena)
Text Arena: grok-4-fast currently sits 8th in the overall Text leaderboard snapshot, impressive for a “fast”/cost‑efficient model tier. (Positions shift as new votes roll in.) (LMArena)
xAI’s post highlights search/browsing evals (BrowseComp, SimpleQA, etc.), where Grok 4 Fast claims SOTA‑level agentic search behavior; that aligns with the early LMArena Search result above. (xAI)
Benchmarks evolve and ratings can move as votes accumulate. Treat the Search #1 and Text top‑10 as very promising but provisional snapshots.
Why the jump? (Likely) RL at scale + infrastructure
Two tea leaves:
RL agent framework: John Boccio (xAI RL Infra) says a new agent framework underpinned the Grok 4 Fast training run and will power future RL training, hinting at process and scaling wins in RL post‑training. (X)
Talent & compute: Dustin Tran (8 years at Google Brain/DeepMind; RL/evals/data) announced he has joined xAI; his thread underscores a deep focus on RL/evals and, implicitly, a lot of chips. Meanwhile, Colossus (xAI’s Memphis supercomputer program) is publicly positioned as a record‑scale cluster built and scaled at unusual speed, with reporting around hundreds of thousands of GPUs. It’s reasonable to infer the RL budget is substantial. (X; xAI)
Put together: process + people + (a lot of) compute makes Grok 4 Fast’s “fast/cheap yet very strong” landing less mysterious.
Pricing & availability (developer quick facts)
xAI API model IDs: grok-4-fast-reasoning and grok-4-fast-non-reasoning (2M context). (xAI Docs)
Free access (limited time): OpenRouter lists x-ai/grok-4-fast:free; Vercel AI Gateway also carries Grok 4 Fast in its model library and playground. (OpenRouter; Vercel)
Rollout in apps: Grok on web/iOS/Android uses Grok 4 Fast in Fast/Auto modes for searchy, information‑seeking queries. (xAI)
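To poke at the free tier, OpenRouter exposes an OpenAI‑compatible chat‑completions endpoint. A stdlib‑only sketch (the helper names are mine, and an OPENROUTER_API_KEY environment variable is assumed):

```python
import json
import os
import urllib.request

def build_request(prompt: str) -> dict:
    """Chat-completions payload targeting the free-period model ID."""
    return {
        "model": "x-ai/grok-4-fast:free",
        "messages": [{"role": "user", "content": prompt}],
    }

def ask(prompt: str) -> str:
    """POST to OpenRouter's OpenAI-compatible endpoint; return the reply text."""
    req = urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=json.dumps(build_request(prompt)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```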
A visual that matters: price ↔ “intelligence” tradeoff
xAI points to an independent Artificial Analysis view showing Grok 4 Fast with a state‑of‑the‑art price‑to‑intelligence ratio (they even plot an “Intelligence vs. Price” curve). Whether or not you love that composite index, it’s another datapoint: frontier‑adjacent quality at a far lower run cost. (xAI; Artificial Analysis)
The Connections thing (and why people noticed)
Lech Mazur reports that Grok 4 Fast (Reasoning) set a new high of 92.1 on his Extended NYT Connections benchmark. That tracks with the broader narrative: RL‑hardened reasoning and agentic behaviors improving practical problem‑solving, not just static Q&A. (Benchmarks like this are community‑run; still a useful directional signal.) (X)
Why this release matters
Search is where assistants earn their keep. If Grok holds #1 in LMArena’s Search Arena as votes climb, that’s a material shift for research/productivity use cases that rely on multi‑hop browsing, citation, and source fusion. (LMArena)
The “fast tier” got upgraded. Grok 4 Fast lands near frontier models in text quality while undercutting many on price, reshaping the “cheap‑and‑quick” segment. (LMArena)
RL at scale may be the story of 2025. Boccio’s and Tran’s notes line up with a broader industry trend: post‑training RL (and agent training) becoming the dominant slab of compute, and a key differentiator. (X)
xAI API: use grok-4-fast-reasoning when you need deep chains of thought and grok-4-fast-non-reasoning for snappy responses under the same 2M‑context ceiling. Start at $0.20/$0.50 per 1M tokens, with prompt caching to cut costs. (xAI Docs)
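Because both SKUs share weights and the 2M context, switching per request is just a model‑ID swap. A trivial helper (the helper itself is hypothetical; the IDs are the documented ones):

```python
def pick_model(deep_reasoning: bool) -> str:
    """Choose a documented Grok 4 Fast SKU per request; same weights either way."""
    return "grok-4-fast-reasoning" if deep_reasoning else "grok-4-fast-non-reasoning"

# Long analysis gets chains of thought; chat turns stay snappy.
print(pick_model(True))   # → grok-4-fast-reasoning
print(pick_model(False))  # → grok-4-fast-non-reasoning
```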
One more product note: “Read Aloud”
xAI added a Read Aloud mode to Grok, announced around the Grok 4 Fast window, which lets you hear responses in a natural voice. Handy for drive time or multitasking. (Announcement coverage linked below.) (LatestLY)
The human angle
Dustin Tran (RL/evals lead work across the Gemini lines) is now at xAI; his thread reflecting on his DeepMind years and this move drew wide attention. (X)
John Boccio says a new RL agent framework powered Grok 4 Fast’s training run and will anchor future RL runs, implying sustained investment in the approach that lifted this model. (X)
What to watch next
LMArena stability: will grok-4-fast-search hold #1 as votes and confidence accumulate? Keep an eye on the Search and Text tabs. (LMArena)
API economics: at these prices, 2M‑context projects (e.g., RAG over large codebases or document sets) become newly practical; watch for developer case studies. (xAI Docs)
RL scaling curve: if xAI keeps iterating on the RL agent framework, with Colossus‑scale compute behind it, expect more “fast but frontier‑ish” releases. (X; xAI)
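For a feel of those long‑context economics, a rough estimator at the standard ≤128K launch rates (requests beyond 128K context use the higher tier, which isn’t modeled; the helper and figures are illustrative only):

```python
def corpus_qa_cost(doc_tokens: int, queries: int,
                   query_tokens: int = 2_000, answer_tokens: int = 1_000) -> float:
    """USD to answer `queries` questions with a whole corpus in-context,
    assuming the corpus prefix is cache-hit after the first request."""
    IN, CACHED, OUT = 0.20e-6, 0.05e-6, 0.50e-6   # launch $/token rates
    per_query_tail = query_tokens * IN + answer_tokens * OUT
    first = doc_tokens * IN + per_query_tail
    rest = (queries - 1) * (doc_tokens * CACHED + per_query_tail)
    return first + rest

# A 120K-token document set, 20 questions against it:
print(f"${corpus_qa_cost(120_000, 20):.2f}")  # → $0.16
```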
Sources & further reading
xAI news post: “Grok 4 Fast” (features, LMArena placement, tool‑use RL, free period on OpenRouter/Vercel, 2M context). xAI
xAI docs (pricing/specs for both SKUs): Grok 4 Fast Reasoning / Non‑Reasoning. xAI Docs
LMArena leaderboards: Search (Grok 4 Fast #1) and Text (Grok 4 Fast top‑10). Method: anonymous, pairwise votes. LMArena
OpenRouter (free‑period model page): x-ai/grok-4-fast:free. OpenRouter
Vercel AI Gateway (model library / playground): Grok 4 Fast listing. Vercel
Lech Mazur on the Connections benchmark: Grok 4 Fast (Reasoning), 92.1 → X (formerly Twitter)
John Boccio (xAI RL Infrastructure) on the new agent framework for Grok 4 Fast training → X (formerly Twitter)