Last month's ranking post traveled further than I expected. The pattern is clear at this point: people who use these tools for real want a take from somebody who also uses them for real. Not a leaderboard. Not a press cycle. An opinion earned by shipping with them.
April was quieter on the courtroom front than March, and louder on the model front. Both S Tier players shipped a generation. Opus 4.7 is here. GPT-5.5 is here. The conversation at the top has shifted again — and the further down the list you go, the less the technical news actually matters.
Same format. Eight models. Four tiers. Here is where things stand.
The S Tier: Claude and ChatGPT
Claude is still my number one and the gap is not shrinking.
Opus 4.7 dropped this month and it is the most noticeable generation jump since the move from 4.5 to 4.6. The reasoning depth on long, multi-file refactors is the part I feel first. Plans hold together across longer sessions. The model holds context across a project the way Opus 4.6 held context across a conversation. Sonnet 4.6 is still the daily driver for routine work — fast, cheap, and good enough for 80% of what I need — but anything that touches architecture, anything that needs the model to actually understand the system before it edits it, goes to Opus 4.7 now. This entire site, every post on it, is still built with Claude Code. That has not changed in a year and I do not see it changing.
The product side held its own too. Memory is now mature enough that I notice when it is missing, which is the real bar for any feature. The Anthropic team kept their no-ads promise through another quarter. After last month's App Store moment, daily active usage has settled into a steady-state lead in the segments I track most closely — developers, knowledge workers, and anybody who got burned by ads in their assistant. ChatGPT is still bigger in raw numbers. Claude is bigger where it counts.
ChatGPT remains a strong number two and GPT-5.5 is the reason it is not number three.
OpenAI shipped GPT-5.5 this month and it is a real improvement over 5.4 — better at long-horizon agentic work, noticeably better at tool use, and its computer-use mode actually works on tasks I would not have trusted 5.4 with. For overflow work, for anything I want a second opinion on, for handing a task to a different model and seeing what comes back, GPT-5.5 is excellent. The 900-million-weekly-user ecosystem is still the largest in AI, and Microsoft's Copilot stack is still the deepest enterprise integration anyone has shipped.
But the trust gap from March did not close in April. The Pentagon contract is still in force. The ad rollout to free users is still in force. GPT-5.5 is a better model than GPT-5.4 and that does not move the needle on either of those questions. If you are choosing between Claude and ChatGPT today and capability is a wash, the tiebreaker is whose long-term incentives you trust. That tiebreaker keeps landing in the same place.
The A Tier: Gemini and Llama
Gemini still owns the brainstorming seat and the benchmark lead. Gemini 3.1 Pro continues to top most third-party leaderboards and the pricing is still the most aggressive at the frontier. The Google Workspace integration — pulling context from email, calendar, and Drive without a copy-paste — is a real advantage and it gets more useful the more of your life lives inside Google.
The Apple Siri story keeps getting pushed. The iOS 26.5 timeline that was supposed to land Gemini-powered Siri in May looks less certain than it did a month ago. Apple is still paying Google for the integration and still developing internal models in parallel. If the May window slips again, the September iOS 27 target becomes the only one that matters, and we are right back where we were at the start of the year — Apple being Apple about AI.
I keep testing Gemini for coding and I keep landing in the same place. The benchmarks say it should be a peer of Claude and GPT for software work. In practice it is not, at least not for the kind of work I do. I will keep testing. Brainstorming, image generation, deep Workspace context — A Tier. Building real systems — not yet.
Llama holds. Behemoth still has not shipped publicly. Meta keeps deploying Llama across every surface it owns — three billion people interacting with the same open-source model family — but the public-facing story has not changed since February. Scout and Maverick are still doing solid work in the open-source ecosystem. The story for Llama in April is the same as it was in March: enormous deployment footprint, frontier flagship still not in the room.
The B Tier: Mistral and Perplexity
Mistral keeps being the answer to a very specific question: what do you use when you care about cost, open-source licensing, and European data sovereignty? Mistral Small 4 is still the best open-source value proposition in the market, and Forge — letting enterprises build custom models on their own data — is starting to show up in real deployments. The list of named customers is growing. If you are building products for a European market, Mistral is the default and the gap to second place is not small.
Perplexity had another good month. Perplexity Computer is now stable enough that I have stopped thinking of it as a beta and started thinking of it as a tool. Comet is a legitimately interesting browser. Deep Research on Opus is the best research surface I have used. They held their no-ads line through another quarter. Perplexity is still my number one for research and the gap is still growing.
Both of these shops are running narrower playbooks than the S Tier and running them well. That is a sustainable position in a market this large.
The Nope Tier: Grok and DeepSeek
Grok did not get any better in April.
The federal lawsuit filed in March is now in active discovery. The simultaneous investigations in the U.S., EU, U.K., France, Ireland, and Australia are still simultaneous. xAI's response continues to be more capital, more product velocity, and less engagement with the underlying questions. Grok the model is improving on the technical axis. Grok the platform has not earned back any of the trust it spent. I am not running it for anything that matters and I do not recommend that anybody else run it either.
DeepSeek's V4 release is somehow still pending. We are now several missed windows deep. The CCP censorship issues, the distillation accusations from both OpenAI and Anthropic, the security-vulnerability injection findings — all unresolved, all unaddressed. The technical promise is real. The trust math has not changed. If you are building anything that handles sensitive data or serves users who care about censorship, DeepSeek is the wrong answer regardless of what V4 looks like when it finally ships.
What Changed This Month
Two things, in order of importance.
First: the S Tier got a real refresh. Opus 4.7 and GPT-5.5 landing in the same month is the kind of upgrade cadence that used to take a full year. The frontier is moving faster than the consumer-facing narratives can keep up, and the people who notice the difference are the people already using these models for work. If you only touch them in a chat window, this month felt like a normal month. If you build with them, April was a generation jump.
Second: nothing structural changed in the bottom half of the table. Grok is still under legal siege. DeepSeek is still missing windows. Llama is still waiting on Behemoth. Apple is still being Apple about Siri. The technical story keeps moving at the top while the structural story stays exactly where it was.
That is the pattern I would watch for the rest of 2026. The S Tier separates further on capability, the A Tier holds on ecosystem, the B Tier wins narrow categories, and the Nope Tier keeps making the news for reasons that have nothing to do with the model.
The Updated Rankings
The full rankings — with the "Who Powers What" ecosystem map, the political bias table, and links to every model — live at johncderrick.com/ai-models. I update them monthly.
If you want to stop reading about these models and start building real systems with them, the Prompt Library has ten copy-paste prompts for AI assistants, email triage, knowledge bases, CRMs, and more. The prompts work across Claude, ChatGPT, and Gemini.
The Protocol: I use these models every day, all of them. The rankings are not theoretical — they are operational. Claude builds my software. ChatGPT handles my overflow. Gemini brainstorms with me. Perplexity does my research. The full rankings live at johncderrick.com/ai-models and the prompts to put them to work live at johncderrick.com/prompts. Check the dates. If they are current, so are the opinions.