🌲 Mark Kim

GPU Unit Economics

The unit economics of GPU clouds are surprisingly good.

The Information reports that Together is sitting at around a 45% gross margin. The company is likely far from net-income positive, but that margin shows a path to profitability given financial discipline.

I've been a skeptic myself, but after having seen enough players with different approaches to pricing and financing, I see a potential path for sustainable growth.

Per-token pricing, while it raises churn risk, vastly increases profitability. Bare-metal instances and managed services provide stickiness, although their unit economics are typically weaker.

Per-hour B200 rates run between $2.88 and $3.25 for multi-year compute contracts. On-demand prices are higher, sitting closer to $6.00/GPU/hour. Per-token pricing makes the unit economics more attractive. Nvidia claims that 45,000 GB200 GPUs in NVL72 rack systems can generate 12B tokens/second. That is roughly 266k tokens/second per GPU, or about 960M tokens per GPU-hour. Assuming $0.02 per million tokens, that comes out to roughly $19.20/GPU/hour, or about 3.2x the on-demand price.
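The arithmetic behind the $19.20 figure can be checked with a quick back-of-envelope sketch. All inputs are the numbers quoted above; the variable and function names are just illustrative, and the calculation assumes every generated token is billed (100% utilization).

```python
# Back-of-envelope check of the per-token revenue figure.
# Inputs are the numbers quoted in the post; names are illustrative.
SYSTEM_TOKENS_PER_SECOND = 12e9   # Nvidia's claimed system throughput
GPUS_IN_SYSTEM = 45_000           # GPU count quoted for that system
PRICE_PER_MILLION_TOKENS = 0.02   # USD, the post's pricing assumption
ON_DEMAND_PER_GPU_HOUR = 6.00     # USD, approximate on-demand rate


def revenue_per_gpu_hour(tokens_per_second_per_gpu, price_per_million_tokens):
    """USD earned per GPU-hour if every generated token is billed."""
    tokens_per_hour = tokens_per_second_per_gpu * 3600
    return tokens_per_hour / 1e6 * price_per_million_tokens


tokens_per_gpu = SYSTEM_TOKENS_PER_SECOND / GPUS_IN_SYSTEM
token_revenue = revenue_per_gpu_hour(tokens_per_gpu, PRICE_PER_MILLION_TOKENS)
multiple = token_revenue / ON_DEMAND_PER_GPU_HOUR

print(f"{tokens_per_gpu:,.0f} tokens/s per GPU")
print(f"${token_revenue:.2f} per GPU-hour, {multiple:.1f}x on-demand")
# 266,667 tokens/s per GPU
# $19.20 per GPU-hour, 3.2x on-demand
```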

There are nuances to this pricing. Utilization rates might be much lower, and system-level GPU performance is different from chip-level performance. The calculation above assumes roughly 266k tokens/second per GPU within a 45,000-GPU system, but the real number is likely between 11k and 60k tokens/second. Nonetheless, the point is that the different ways of slicing and dicing your compute product could help drive better unit economics.
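To see how sensitive the per-token figure is to these nuances, here is a minimal sketch that recomputes revenue per GPU-hour at the lower throughput range and shows the token price needed to match the on-demand rate. The $0.02 per million tokens and $6.00/GPU/hour inputs come from the post; the `utilization` parameter and the function names are assumptions added for illustration, since the post gives no utilization figure.

```python
# Sensitivity sketch: revenue per GPU-hour at different throughputs.
# Prices come from the post; utilization is a hypothetical knob.
ON_DEMAND_PER_GPU_HOUR = 6.00     # USD
PRICE_PER_MILLION_TOKENS = 0.02   # USD


def revenue_per_gpu_hour(tokens_per_second, price_per_million, utilization=1.0):
    """USD per GPU-hour at a given throughput, token price, and utilization."""
    billed_tokens_per_hour = tokens_per_second * 3600 * utilization
    return billed_tokens_per_hour / 1e6 * price_per_million


def breakeven_price_per_million(tokens_per_second, target_per_gpu_hour, utilization=1.0):
    """Token price (USD per million) needed to earn target_per_gpu_hour."""
    billed_tokens_per_hour = tokens_per_second * 3600 * utilization
    return target_per_gpu_hour / (billed_tokens_per_hour / 1e6)


scenarios = {}
for tps in (11_000, 60_000):
    revenue = revenue_per_gpu_hour(tps, PRICE_PER_MILLION_TOKENS)
    breakeven = breakeven_price_per_million(tps, ON_DEMAND_PER_GPU_HOUR)
    scenarios[tps] = (revenue, breakeven)
    print(f"{tps:,} tokens/s: ${revenue:.2f}/GPU-hour at $0.02/M, "
          f"${breakeven:.4f}/M to match on-demand")
```

Passing a `utilization` below 1.0 scales revenue down proportionally, so the token price and the realized throughput are the two levers that decide whether per-token pricing beats the hourly rate.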

Owning your GPUs further improves this. Equity is an expensive source of financing for existing shareholders, but it vastly lowers interest-related expenses and helps improve free cash flow.