Anthropic at the Rubicon: The Maduro Raid, the Pentagon Ultimatum, and the Death of 'Safety First'
Claude was used in the Venezuela raid. The Pentagon issued a Friday ultimatum. Anthropic gutted its safety pledge. A complete timeline of the crisis.
Google relaunches its AI creative studio Flow with new features and integrations
Google is turning its AI creative studio Flow into an all-in-one tool for images and videos, with free image generators and new editing features. The article Google relaunches its AI creative studio Flow with new features and integrations appeared first on The Decoder.
Perplexity launches Perplexity Computer, a new multi-model system that can solve tasks end-to-end, details below
**Perplexity AI:** Introducing Perplexity Computer. Computer **unifies** every current AI capability into one system. It can research, design, code, deploy and manage any project end-to-end. Perplexity Computer is massively multi-model. Computer orchestrates models to **run agents** in parallel, le
AI Intelligence Briefing: The SaaSpocalypse, METR's Terrifying Graph, DeepSeek's Blackwell Scandal, and More
Anthropic Banned OpenClaw: The OAuth Lockdown That Fractured the Claude Developer Community
Anthropic officially banned the use of Claude subscription OAuth tokens in all third-party tools including OpenClaw. Full timeline, economics, and what it means.
Gemini 3.1 Pro Is Here: Google's Reasoning Just Jumped 2.5× in Three Months
Google ships Gemini 3.1 Pro with a verified 77.1% on ARC-AGI-2 — more than double Gemini 3 Pro. It's now the second-best reasoning model behind Deep Think, and it's available to everyone today.
Ask HN: Have top AI research institutions just given up on the idea of safety?
Anthropic Studied Millions of AI Agent Sessions. The Biggest Finding: Humans Are the Bottleneck.
Anthropic analyzed millions of human-AI interactions and found a massive 'deployment overhang' — AI can handle 5-hour tasks, but users cap them at 42 minutes.
DeepMind's Aletheia Solved 4 Problems No Human Could — And Got 68.5% of Everything Else Wrong
Google DeepMind's Aletheia is an AI research agent that wrote a publishable paper solo and cracked unsolved math problems. But its 6.5% useful answer rate tells the real story.
OpenAI's AI Can Now Hack 72% of Smart Contract Vulnerabilities — And That's the Good News
OpenAI and Paradigm launch EVMbench, a benchmark showing AI agents can exploit most known smart contract bugs. The offense-defense gap is the real story.
Claude Sonnet 4.6: The Mid-Range Model That Keeps Embarrassing Flagships
Anthropic's new Sonnet 4.6 nearly matches Opus on coding and reasoning, crushes computer use benchmarks, and somehow beats every model at office work and finance. Full benchmark breakdown inside.
Anthropic vs The Pentagon: When Your AI Company Gets Treated Like a Foreign Adversary
The Pentagon is threatening to designate Anthropic as a 'supply chain risk' over Claude's military usage guardrails. Here's what's actually happening.
Grok 4.20: xAI's 4-Agent AI System Goes Live — Benchmarks, Architecture, and Pliny's Jailbreak
xAI launches Grok 4.20, a multi-agent system where four specialized AI agents debate in real-time. Here's how it works, where it ranks, and the system prompt Pliny the Liberator already extracted.
OpenAI defeats xAI’s trade secrets lawsuit
OpenAI won a victory Tuesday in one of its legal battles with xAI, which involved allegations of poaching and theft of trade secrets. The former company's motion to dismiss the lawsuit was granted on Tuesday with leave to amend, meaning xAI has the option to refile with modified claims. In the ruli
Andrej Karpathy: Programming Changed More in the Last 2 Months Than in Years
Karpathy says coding agents crossed a reliability threshold in December and can now handle long, multi-step tasks autonomously. He describes this as a major shift from writing code manually to orchestrating AI agents. **Source:** Andrej [Tweet](https://x.com/i/status/2026731645169185220)
Qwen3.5 Model Comparison: 27B vs 35B on RTX 4090
I wanted to check qwen3.5 35B-A3B models that can be run on my GPU. So I compared 3 GGUF options. **Hardware:** RTX 4090 (24GB VRAM) **Test:** Multi-agent Tetris development (Planner → Developer → QA) # Models Under Test |Model|Preset|Quant|Port|VRAM|Parallel| |:-|:-|:-|:-|:-|:-| |Qwen3.5-27B|`q