My personal, honest ranking of the major AI models. I use most of these daily - these opinions are earned, not borrowed.
Last updated: June 2026
Opus 4.7 · Sonnet 4.6 · Haiku 4.5
The better engineer and architect. When a project is a blank page - the initial framing, the data model, the "understand the whole system before you touch it" first pass - Claude Code is who I hand it to, and it is not close. Opus 4.7 still holds context across multi-file, multi-day work better than anything I run, and it is excellent at cleaning up Codex's UX mistakes. Codex edged ahead on raw coding this month, but the trust tiebreaker - no ads, paid-only Claude Code and Cowork, $47B annualized and climbing - keeps Claude level at the very top. This month I stopped picking one: I run both.
claude.ai
GPT-5.5 · Codex 5.5 · GPT-5.5 Pro
Codex pulled back ahead of Claude Code for me this month - just barely - and the numbers agree: 5M weekly users at the end of May, overtaking Claude Code in search interest, enterprise revenue reportedly up 50% week over week. Codex 5.5 is my better designer, UX hand, and relentless bug-checker; it spins up its own environment and comes back with the problem instead of a guess. OpenAI is folding Codex, ChatGPT, and Atlas into one "superapp" shipping soon. The Pentagon deal and ads for free users still dent trust - which is why this is a tie, not a flip.
chatgpt.com
Gemini 3.5 Flash · Gemini 3.1 Pro · Gemini Omni
My pick for brainstorming, image, and now video generation. Google I/O 2026 shipped Gemini 3.5 Flash GA - frontier intelligence at Flash speed that beats 3.1 Pro on several coding and agentic benchmarks - plus Gemini Omni (video out) and an upgraded Antigravity agent platform; 3.5 Pro is now coming out of testing. I auditioned 3.5 Flash for real systems work during my sprint - the closest Gemini has come, but Claude still finished the long, stateful tasks. The Apple Siri integration keeps slipping; September's iOS 27 is the only date left to watch.
gemini.google.com
Llama 4 Maverick · Llama 4 Scout · Behemoth (training)
The open-source champion. Llama 4 went mixture-of-experts and natively multimodal - Scout fits on a single H100, Maverick beats GPT-4o on most benchmarks. Behemoth (2T params) is still training. Essential for the ecosystem.
llama.meta.com
Mistral Small 4 · Mistral Large 3 · Devstral 2 · Codestral
Europe's answer to the AI race. Small 4 just dropped - 119B MoE unifying reasoning, multimodal, and coding under Apache 2.0, 40% faster than Small 3. Forge lets enterprises build custom models on their own data. ASML, Ericsson, and ESA are already on board. Lean and efficient.
mistral.ai
Perplexity Computer · Sonar Pro · Deep Research · Comet Browser
Not a traditional model, but the best AI-powered search experience. Perplexity Computer runs 19 models with subagents for complex workflows. Comet browser launched free on iOS. Deep Research now runs on Opus 4.6. Dropped all ads. My go-to for research - and the gap is growing.
perplexity.ai
Grok 4.20 Beta · Grok 4.1 · Grok 3
The deepfake and CSAM litigation against xAI is now consolidating in federal court, with an initial case management conference set for June 18 in the Northern District of California. The underlying allegations - 3M+ sexualized images in 10 days, ~23K involving minors - have not gotten less ugly with time, and parallel investigations remain open across the US, EU, UK, France, Ireland, and Australia. The model improves on the technical axis; the platform has not earned back a cent of trust.
x.ai
DeepSeek-V3.2 · DeepSeek-R1 · V3.2-Speciale
V4 - the 1T-parameter model that was supposed to reshape the landscape - has missed every announced window. A mystery model on a developer platform turned out to be Xiaomi, not DeepSeek. CCP censorship remains baked in at the architecture level. The distillation accusations from both OpenAI and Anthropic still stand. The promise keeps growing but the delivery keeps slipping.
chat.deepseek.com
The AI models above don't just live in chatbots. They're quietly powering the products you already use every day. Here's who's running what under the hood - and one very notable absence.
The deepest OpenAI integration. GPT-5.5 powers M365 Copilot with native computer-use capabilities. GPT-5.5 mini and nano handle high-volume workloads. Routes between models per task.
Claude handles the heavy thinking; Amazon's Nova models take the simpler tasks. Routed via Amazon Bedrock - "we pick the model that's right for the job." Amazon's $8B investment in Anthropic at work.
Gemini powers voice commands and cross-app actions. Samsung's in-house Gauss models handle on-device processing. Their TVs add Copilot and Perplexity into the mix too.
Meta eating their own cooking. Llama powers the AI assistant across all Meta platforms - 3+ billion potential users. The largest real-world deployment of an open-source model.
The old Google Assistant is being phased out in favor of Gemini across Android and Pixel devices. Gemini Live handles real-time voice conversations natively.
Still the elephant in the room. The Gemini-powered Siri was supposed to ship with iOS 26.4, slipped to the iOS 26.5 (May) window, and that came and went without it. Full conversational AI is now pinned to iOS 27 (September) - the only date left worth watching. Apple is paying Google ~$1B/year while developing its own "Ferret-3" models. We'll see - again.
One thing most people don't realize: every major AI model has a measurable political lean. Multiple peer-reviewed studies have mapped these models on political compass-style charts. Here's what the research says.
| Model | Political Lean | What the Research Found |
|---|---|---|
| ChatGPT | Left-Leaning | Consistently the furthest left across multiple studies. OpenAI's own evaluation found emotionally charged liberal prompts exert the largest pull on objectivity. GPT-5 shows improvement over GPT-4o. |
| Claude | Most Centrist | Earlier studies found it liberal-leaning; by 2025, Promptfoo measured it as the most centrist model at 0.646 (0.5 = true center). Anthropic actively publishes its political even-handedness methodology. |
| Gemini | Moderate Left | Stanford study found users perceived it as the least slanted overall. Measured further left than Claude but more moderate than ChatGPT. Generally centrist on social issues. |
| Llama | Right-Leaning (Relative) | The 2023 ACL award-winning paper found it was the most right-wing authoritarian of the 14 models tested. An outlier in the open-source space. |
| Perplexity | Libertarian-Right | The IEEE study found it exhibited a "libertarian capitalistic stance" - more conservative than its peers. An interesting position for a search-focused product. |
| Grok | Chaotic | Despite xAI's "less woke" marketing, studies found the highest extremism rate at 67.9% - wild swings between far-left and far-right. Promptfoo called it "designed to be contrarian rather than ideological." Even Pew's quiz placed it as an "establishment liberal." |
| DeepSeek | CCP-Aligned | Not left or right on a Western spectrum - state-aligned. 1,156 documented censored topics including Taiwan, Tiananmen, and Xi Jinping. Responses shift by language: Chinese queries get Party-line answers, English queries get more nuanced takes. Censorship is embedded at the model level, not just the app layer. |
All major AI models lean left on economics (wealth taxes, minimum wage). No study has found a consistently conservative AI among industry leaders.
These rankings are entirely my own opinion based on daily use. Your mileage may vary. I have no financial relationship with any of these companies.
Well, except that Claude literally built this page. Make of that what you will.