how to evaluate model, infrastructure, and product companies in AI
lessons from my job search (part 3, final.)
The AI ecosystem resembles an inverted pyramid: a few model providers at the base, more infrastructure tools in the middle, and many AI applications on top.
Each category will have a few big winners. These are some of the factors I considered to determine whether a company is set up to win in the product, infrastructure, and model categories.
section 1. AI products
AI product companies are bringing Reid Hoffman's hypothesis to life: an AI copilot for every profession by 2028. Think: Harvey for law, Cursor/Codium for software engineering, Mercor for recruiting, Salient for automotive financing.
This is the most crowded category given low barriers to entry. Even though these companies were initially dismissed as "GPT-wrappers", a handful have been rolling in cash.
When evaluating these companies, I considered two things in addition to whether they’re growing quickly:
can they build a moat before the incumbents catch up?
Most AI product companies compete directly with established software companies. AI-native companies have speed, while incumbents have distribution. This raises the question: can the startup build a technical moat before the incumbent catches up?
For example, legal-tech firms have been slow to match Harvey, but VS Code and Replit have rapidly evolved to compete with Cursor and Codium. If VS Code is able to stay at parity with Cursor/Codium, I wonder whether these new-age AI IDEs will be able to sustain their growth trajectory. (i hope they can fwiw).
will model providers build competing products?
Lately, model providers have been building their own products as well. There are technical advantages to building a product on top of a model you own. You can:
Bake skills directly into the model. This enables the model to excel at tasks that are critical for the product.
Create data flywheels that improve future iterations of the model.
Take Deep Research as an example. It’s a fine-tuned version of o3 trained end-to-end on hard browsing tasks[1]. I imagine it will put pressure on companies like Hebbia and Glean, which are building search on top of less specialized models.
The question then becomes: in which niches will model providers create better products, and in which will they not?
hypothesis on product companies
My hypothesis is that the most resilient AI product companies will:
Augment the base model in some capacity, either by fine-tuning it on proprietary data (ex: SK Telecom, Harvey) or by building some models in-house (ex: Codium[2], Replit).
Build in verticals with TAMs large enough to sustain a business but too narrow for model providers to prioritize (ex: Salient building for automotive financing, EliseAI in leasing management).
section 2. infrastructure companies
AI infrastructure companies build tools to help AI product companies effectively use LLMs. For example:
Inference platforms like Modal, Fireworks, Together AI simplify hosting and fine-tuning open-source models
Vector databases like Pinecone, ChromaDB, Weaviate store and query high-dimensional embeddings efficiently
Eval and observability tools like Braintrust, Arize AI, Galileo AI improve LLM reliability
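To make the vector-database bullet concrete, here's a toy brute-force nearest-neighbor search over embeddings. The vectors and document names are made up, and real systems replace the linear scan with approximate nearest-neighbor indexes so it scales to billions of vectors:

```python
import math

def cosine_sim(a, b):
    # cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest(query, corpus):
    # brute-force nearest neighbor: the operation a vector DB accelerates
    return max(corpus, key=lambda item: cosine_sim(query, item[1]))

# hypothetical 3-dimensional embeddings (real ones have hundreds of dims)
docs = [
    ("contracts", [0.9, 0.1, 0.0]),
    ("recruiting", [0.1, 0.8, 0.2]),
    ("code review", [0.0, 0.2, 0.9]),
]
name, _ = nearest([0.85, 0.15, 0.05], docs)  # -> "contracts"
```

The linear scan is O(corpus size) per query, which is exactly why dedicated infrastructure exists for this.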
I was drawn to this category mainly because these companies work on hard, interesting, cutting-edge engineering problems. When evaluating the viability of their businesses, I considered:
are their margins high enough?
Inference platforms have high hardware costs driven by the need for high-end GPUs and advanced infrastructure to keep reliability high. Thus, it’s worth looking into the company’s margins. For context, great software businesses have ~70% gross margins.
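To make the margin math concrete, here's a toy gross-margin calculation; the revenue and cost figures are hypothetical:

```python
def gross_margin(revenue, cogs):
    # gross margin = (revenue - cost of goods sold) / revenue
    return (revenue - cogs) / revenue

# hypothetical unit economics on $100 of revenue
saas_margin = gross_margin(100, 25)       # classic software: ~75%
inference_margin = gross_margin(100, 60)  # GPU-heavy inference: ~40%
```

The gap between those two numbers is the question to probe: how much of every dollar an inference platform earns goes straight back out the door for GPUs?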
how easy is the infrastructure to build in house?
Selling to engineers is tough. If the infrastructure isn't great or costs too much, they'll quickly switch tools or build their own. Ideally, the software should be complex enough that building it in-house isn't practical, and specialized enough that bigger infrastructure companies can't easily add it to their product suite.
how much of the end-to-end pipeline can they own?
Every additional piece of infrastructure that a product company buys introduces another point of failure. Thus, I hypothesize that product companies will eventually gravitate towards infrastructure companies that have a suite of tools across the AI development stack – analogous to AWS and GCP’s cloud computing offerings. So it’s worth considering if the company plans to expand beyond its specific part of the development stack.
section 3. model providers
Model providers build the core intelligence powering the AI revolution. There are billions of dollars to be made[3] – that is, if the company can survive. When assessing model companies, I considered:
if they’re training frontier models, can they afford to keep training them?
Training frontier models is expensive (GPT-4 reportedly cost about $78M[4]). If scaling laws hold, each generation will be more expensive to train than the last. And even if scaling doesn't continue, the trend toward inference-time computation (or "thinking") makes hosting these models increasingly GPU-intensive.
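A common back-of-the-envelope for training cost uses the ~6·N·D FLOPs rule of thumb for dense transformers. Every number below (model size, token count, GPU throughput, utilization, hourly price) is hypothetical, but the sketch shows why cost climbs so fast with scale:

```python
def train_cost_usd(n_params, n_tokens, peak_flops=3.12e14, mfu=0.40,
                   usd_per_gpu_hour=2.0):
    # rule of thumb: ~6 FLOPs per parameter per training token
    total_flops = 6 * n_params * n_tokens
    # effective throughput = peak GPU FLOP/s * model FLOPs utilization (MFU)
    gpu_seconds = total_flops / (peak_flops * mfu)
    return gpu_seconds / 3600 * usd_per_gpu_hour

# hypothetical 70B-parameter model trained on ~1.4T tokens (a ~20:1 ratio)
cost = train_cost_usd(n_params=70e9, n_tokens=1.4e12)  # a few million dollars
```

Note that cost grows with the product of parameters and tokens, so a 10x bigger model trained on 10x more data is roughly 100x more expensive – and this excludes failed runs, researchers, and data.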
If a company hasn't raised billions (OpenAI/Anthropic/xAI) or doesn't have its own cash machine (Meta/Google), it’s hard to compete. And if you’re thinking about DeepSeek, the claim that they spent $6M to train R1 has been debunked[5] – it’s estimated that they spent closer to $500M.
are they staying ahead of open source alternatives?
Closed-source providers can only charge premium prices if they’re first to market with state-of-the-art models. AI product companies will pay a premium to ensure they have the best models to build on top of (for now, at least). That’s partly why most AI product companies today still buy intelligence from closed-source providers like Anthropic/OpenAI instead of using the Llama models.
But if the closed-source provider isn’t better than the open-source[6] alternatives, premium pricing becomes hard (ex: Mistral).
how do their models compare to competitors?
I agree with the take that intelligence will eventually get commoditized. Customers will move to the cheapest, highest-quality option – similar to how we choose our electricity and wifi providers today. This means it’s important to have the best, cheapest model.
It can be hard to get a sense of whether a model is the best, especially if the company specializes in a specific modality like voice (Cartesia, ElevenLabs), video (Runway, Pika), or images (Midjourney). To get around this, leaderboards like Chatbot Arena (LMSYS) and Artificial Analysis help objectively assess performance within domains.
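Chatbot Arena ranks models from pairwise human votes. A simplified Elo-style update captures the intuition (my understanding is the site's published rankings now come from Bradley-Terry fits over all battles, but the pairwise idea is the same):

```python
def elo_update(rating_a, rating_b, a_wins, k=32):
    # expected score of model A under the Elo model
    expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
    delta = k * ((1.0 if a_wins else 0.0) - expected_a)
    return rating_a + delta, rating_b - delta

# two models start equal; model A wins one head-to-head vote
r_a, r_b = elo_update(1000.0, 1000.0, a_wins=True)  # -> (1016.0, 984.0)
```

Beating an equally rated model moves you up a little; beating a much higher-rated model moves you up a lot, which is what makes upsets informative.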
how frequently are they releasing models?
The model landscape is moving quickly. If a company isn’t releasing new models frequently, it’s worth asking why (e.g., Magic.dev, H, Runway).
conclusion
Aaand that’s a wrap on this 3-part series about looking for jobs in the AI space!
These were some of the key things I considered when evaluating AI companies[7]. Joining a company that doesn’t ultimately “win” can still be incredibly rewarding, as long as you're tackling interesting problems alongside great people. The best-case scenario, however, is when all three align: challenging work, smart and kind colleagues, and being at a hypergrowth company positioned to win.
If this was helpful, I also wrote about other lessons from my job search: how to spot a rocketship startup in AI, and vanity metrics to ignore + why hypergrowth matters.
2. https://codeium.com/blog/our-model-strategy
6. “Open source" isn't free. Either you host the model yourself – not practical for most AI product companies that have a bunch of other things to focus on – or you pay an inference provider like Azure, Amazon, or Fireworks to host the weights for you.
7. Given the number of AI companies out there, I found it helpful to narrow down to the layer of the AI stack I was most excited about. For example, since I’d already been on the product side, I primarily explored infra and model companies, which helped make the job search feel more focused.