Which AI Model Should I Use For Coding?


Most of the big players (Anthropic, Google, et al.) provide all-in-one solutions for writing code. If that’s what you’re using, that’s fine, but you may be missing out.

For one thing, they can be pretty expensive (unless your company is footing the bill). Anthropic’s best model, Opus, can get really expensive, really fast. Admittedly, it still costs less than paying for an overseas team, but if it’s coming out of your own wallet, you’re bound to notice.

Considering how quickly everything is moving, I’m sure this post will age like milk.

Thinking outside the big-box store

If you’re using an editor or tool that can connect to any OpenAI-compatible endpoint, you’ve got options. This is a big deal because it means you aren’t locked into one provider’s ecosystem and pricing. You can shop around.
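To make that concrete, here’s a minimal sketch of what “OpenAI-compatible” means in practice: the same official OpenAI Python SDK, pointed at a different provider just by swapping the base URL and API key. The URL, environment variable, and model slug below are placeholders; each provider documents its own.

```python
import os
from openai import OpenAI

# Same SDK, different provider: only the base URL and API key change.
# The URL and env var are placeholders; substitute your provider's values.
client = OpenAI(
    base_url="https://api.example-provider.com/v1",
    api_key=os.environ["PROVIDER_API_KEY"],
)

response = client.chat.completions.create(
    model="example/some-model",  # whatever model slug the provider offers
    messages=[{"role": "user", "content": "Write a function that reverses a string."}],
)

print(response.choices[0].message.content)
```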

A few third-party options have sprung up to fill this need:

  • OpenRouter is what I use. It’s a pay-as-you-go service that acts as a gateway to dozens of different models, from the cheapest to the most expensive. You fund an account and your balance gets debited as you use it. Simple.
  • Together AI is another popular choice. They focus on providing fast inference for a lot of open-source models, often at very competitive prices.
  • Portkey is geared more towards businesses or power users. It offers advanced features like setting budgets, fallback models (if one fails, it tries another), and other guardrails to control costs and improve reliability.

For those who want to run things themselves, there’s LiteLLM. It’s an open-source project that gives you a unified API to talk to over 100 different LLMs. You can host it yourself, giving you full control over how you route requests, whether it’s for a personal project or a full-blown business application. Think of it as a universal remote for LLMs.
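If you go the LiteLLM route, the appeal is that one call signature covers many backends. Here’s a rough sketch using its Python SDK, assuming the package is installed and the relevant provider keys are already in your environment; the model names are examples only.

```python
# pip install litellm  (check the project's docs for current setup)
from litellm import completion

# The same call works across providers; the model string picks the backend.
# These model names are examples; use ones you actually have keys for.
for model in ["openai/gpt-4o-mini", "anthropic/claude-3-5-sonnet-20240620"]:
    response = completion(
        model=model,
        messages=[{"role": "user", "content": "Explain what a context window is in one sentence."}],
    )
    print(f"{model}: {response.choices[0].message.content}")
```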

Trading cost for capability

Every model has what you could call a “complexity ceiling.” It’s the point where a task is just too hard for it. The context window is too small to hold all the relevant information, or the reasoning required is a step too far. Your choice of model determines how high that ceiling is, and what you’ll pay to get there.

The cheap models

It’s tempting to just use the cheapest model available. Tokens are cheap, so why not? The problem is that while the per-token cost is low, the cost in your time and effort can be high.

Using a weak model is like micromanaging a new intern. You have to break everything down into tiny, explicit steps.

You might end up with a workflow like this:

  1. First, you tell it to write the basic code for a function. You can’t ask for tests at the same time because it’s not “smart” enough to handle both requests at once. It’ll just get confused or do a poor job.
  2. Once you get the code, you start a new conversation. “Okay, now write a unit test for the function you just wrote.”
  3. The test it writes will probably only cover the happy path. So, you have to ask again: “Now write tests for the edge cases.” This might take a few more tries.
  4. Finally, after all that, you have to tell it: “Now, write the documentation for the function.”

Each step costs time and money. What looked cheap on paper becomes a slow, frustrating process.
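For the sake of illustration, here’s roughly what that grind looks like as code, reusing the generic client sketch from earlier with a made-up budget model slug. The thing to notice is that every “new conversation” means re-sending the previous output as context, so the calls, and the tokens, keep piling up.

```python
import os
from openai import OpenAI

# Hypothetical setup, as in the earlier sketch; the values are placeholders.
client = OpenAI(
    base_url="https://api.example-provider.com/v1",
    api_key=os.environ["PROVIDER_API_KEY"],
)

def ask(prompt: str, model: str = "example/budget-model") -> str:
    """One fresh, single-turn conversation per request."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Step 1: just the function; asking for tests too would overload a weak model.
code = ask("Write a Python function that parses ISO 8601 date strings.")

# Step 2: a new conversation, so the code has to be sent back in.
tests = ask(f"Write a unit test for this function:\n\n{code}")

# Step 3: the first tests only cover the happy path; ask again with everything so far.
edge_tests = ask(f"Add edge-case tests for this function and test file:\n\n{code}\n\n{tests}")

# Step 4: one more round trip for the documentation.
docs = ask(f"Write documentation for this function:\n\n{code}")
```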

The expensive models

On the other end, you have the top-tier models like GPT-4 Turbo and Anthropic’s Opus. You can throw a complex problem at them, and they often get it right on the first or second try. They write the code, the tests, and the documentation all in one go.

The work gets done much, much faster. The trade-off is obvious: you have to bring your wallet. Opus, in particular, can chew through a budget so quickly you’ll wonder what happened. It’s very good, but you have to ask yourself if the speed is worth the significant cost. For some projects, it is. For many, it’s overkill.

The middle of the road

This brings us to models like Anthropic’s Sonnet or Google’s Gemini Pro. They’re… well… they’re middle of the road. They’re more capable than the cheap models, so you won’t have to do as much hand-holding. They’re less expensive than the top-tier ones, so your wallet won’t scream in pain.

It’s a compromise. You get decent performance for a reasonable price, but you’ll still run into that complexity ceiling sooner than you would with an Opus or GPT-4.

It’s your call

Ultimately, the choice depends on how you work and what you’re working on. Are you doing something simple where a cheap model is good enough? Are you on a tight deadline where the speed of an expensive model justifies the cost? Or are you somewhere in between?

No matter which model you choose, you’ll eventually hit a wall. AI is a tool, and a powerful one, but it doesn’t remove the need for a developer to be the architect, the project manager, and the quality checker. You still have to manage the process and step in when the AI gets it wrong.