Cost / M billable
$1.582/M
$1.582/M blended breakeven
Jason's Chips
Translate GPU-hour goodput expense and benchmark throughput into cost per million tokens, token revenue, gross margin, breakeven pricing, and service-level sensitivity. The calculator separates token mix, throughput, utilization, goodput, and endpoint overhead.
Output cost / M
$3.163/M
50% output-token mix
Gross margin
20.9%
$2.000/M realized blended price
Billable tokens / hr
10.4M
7.01M weighted tokens / hr
Requests / hr
5.2K
5.19M output tokens / hr
Concurrent users
14
100 tok/s/user target
Cost bridge
GPU goodput expense
$15.20/hr
8 GPUs at $1.900/hr each
Effective throughput
7.01M weighted tokens / hr
75% utilization × 99.5% goodput
Weighted token cost
$2.343/M
1,350 weighted tokens / request
Billable token cost
$1.582/M
10.4M input + output tokens / hr
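The cost bridge can be reproduced with a few lines of arithmetic. The sketch below is one reconstruction under stated assumptions, not the calculator's own code: it assumes network, storage, and control-plane overhead is 8% of GPU expense ($1.216 / $15.20), that the 326 tok/s/GPU benchmark counts weighted tokens, and a hypothetical request shape of 1,000 input plus 1,000 output tokens with input weight 0.35 and output weight 1.0, which yields the 1,350 weighted tokens per request shown above.

```python
# Hedged reconstruction of the cost bridge. ASSUMPTIONS (not from the calculator):
# overhead = 8% of GPU expense, benchmark tok/s counts weighted tokens, and a
# hypothetical 1,000-in / 1,000-out request with weights 0.35 (input) and 1.0 (output).

GPUS = 8
PRICE_PER_GPU_HR = 1.90      # $/GPU-hr
BENCH_TOK_S_PER_GPU = 326    # weighted tok/s/GPU before derates (assumed unit)
UTILIZATION = 0.75
GOODPUT = 0.995
OVERHEAD_FRACTION = 0.08     # assumed network/storage/control-plane share
INPUT_PER_REQ, OUTPUT_PER_REQ = 1_000, 1_000   # hypothetical request shape
INPUT_WEIGHT, OUTPUT_WEIGHT = 0.35, 1.0        # hypothetical cost weights


def cost_bridge() -> dict[str, float]:
    gpu_cost_hr = GPUS * PRICE_PER_GPU_HR
    overhead_hr = gpu_cost_hr * OVERHEAD_FRACTION
    total_cost_hr = gpu_cost_hr + overhead_hr

    # Weighted tokens actually served after utilization and goodput derates.
    weighted_tok_hr = BENCH_TOK_S_PER_GPU * GPUS * 3600 * UTILIZATION * GOODPUT

    # Convert weighted tokens back into billable (input + output) tokens.
    weighted_per_req = INPUT_PER_REQ * INPUT_WEIGHT + OUTPUT_PER_REQ * OUTPUT_WEIGHT
    billable_tok_hr = weighted_tok_hr / weighted_per_req * (INPUT_PER_REQ + OUTPUT_PER_REQ)

    return {
        "gpu_cost_hr": gpu_cost_hr,
        "overhead_hr": overhead_hr,
        "weighted_tok_hr": weighted_tok_hr,
        "weighted_cost_per_m": total_cost_hr / weighted_tok_hr * 1e6,
        "billable_tok_hr": billable_tok_hr,
        "billable_cost_per_m": total_cost_hr / billable_tok_hr * 1e6,
    }


if __name__ == "__main__":
    for name, value in cost_bridge().items():
        print(f"{name:>20}: {value:,.3f}")
```

Under these assumptions the sketch lands on $2.343/M weighted and $1.582/M billable, so both cost lines in the bridge carry the overhead, not just the GPU expense.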
Throughput accounting
Benchmark tok/s/GPU
326
Lost tokens / hr
2.38M
Revenue / request
$0.0040
Weighted tokens let prefill (input) and decode (output) tokens carry different cost weights. Output tokens usually dominate latency and serving cost, while long prompts can still consume substantial KV-cache and prefill capacity.
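The throughput accounting follows from the same inputs. This sketch assumes 326 tok/s/GPU is the pre-derate benchmark rate in weighted tokens (that reading reproduces both 7.01M weighted tokens/hr and 2.38M lost tokens/hr), reuses the hypothetical 1,000-in / 1,000-out request with weights 0.35 and 1.0, and assumes concurrency is aggregate output tok/s divided by the 100 tok/s/user target.

```python
# Hedged sketch of throughput accounting. ASSUMPTIONS: 326 is the pre-derate
# weighted benchmark rate; hypothetical request shape and weights as labeled.

GPUS = 8
BENCH_TOK_S_PER_GPU = 326
UTILIZATION = 0.75
GOODPUT = 0.995
INPUT_PER_REQ, OUTPUT_PER_REQ = 1_000, 1_000   # hypothetical request shape
INPUT_WEIGHT, OUTPUT_WEIGHT = 0.35, 1.0        # hypothetical cost weights
TARGET_TOK_S_PER_USER = 100                    # per-user decode speed target


def throughput_accounting() -> dict[str, float]:
    nominal_tok_hr = BENCH_TOK_S_PER_GPU * GPUS * 3600
    effective_tok_hr = nominal_tok_hr * UTILIZATION * GOODPUT
    weighted_per_req = INPUT_PER_REQ * INPUT_WEIGHT + OUTPUT_PER_REQ * OUTPUT_WEIGHT
    requests_hr = effective_tok_hr / weighted_per_req
    # Each user streams output at the target rate, so concurrency is
    # aggregate output tok/s divided by per-user tok/s.
    output_tok_s = requests_hr * OUTPUT_PER_REQ / 3600
    return {
        "effective_tok_s_per_gpu": effective_tok_hr / (GPUS * 3600),
        "lost_tok_hr": nominal_tok_hr - effective_tok_hr,
        "weighted_per_req": weighted_per_req,
        "requests_hr": requests_hr,
        "concurrent_users": output_tok_s / TARGET_TOK_S_PER_USER,
    }


if __name__ == "__main__":
    for name, value in throughput_accounting().items():
        print(f"{name:>24}: {value:,.2f}")
```

With these inputs the sketch gives roughly 243 effective tok/s/GPU after derates, 2.38M lost tokens/hr, about 5.2K requests/hr, and about 14 concurrent users.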
Revenue stack
| Line item | Tokens / hr | Price | Revenue / cost |
|---|---|---|---|
| Input token revenue | 5.19M | $1.000/M | $5.190/hr |
| Output token revenue | 5.19M | $3.000/M | $15.57/hr |
| GPU goodput expense | - | - | ($15.20/hr) |
| Network, storage, control plane | - | - | ($1.216/hr) |
| Gross profit | 10.4M | 20.9% GM | $4.344/hr |
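The revenue stack reduces to a short margin calculation. A minimal sketch, fed with the rounded hourly figures from the table; the function name `revenue_stack` is illustrative, and gross margin is taken as gross profit divided by revenue.

```python
# Minimal sketch of the revenue stack; inputs are the rounded table values.

def revenue_stack(input_tok_hr: float, output_tok_hr: float,
                  input_price_m: float, output_price_m: float,
                  gpu_cost_hr: float, overhead_hr: float) -> dict[str, float]:
    billable_tok_hr = input_tok_hr + output_tok_hr
    revenue_hr = (input_tok_hr * input_price_m + output_tok_hr * output_price_m) / 1e6
    cost_hr = gpu_cost_hr + overhead_hr
    gross_profit_hr = revenue_hr - cost_hr
    return {
        "revenue_hr": revenue_hr,
        "gross_profit_hr": gross_profit_hr,
        "gross_margin": gross_profit_hr / revenue_hr,
        "blended_price_m": revenue_hr / billable_tok_hr * 1e6,    # realized $/M
        "breakeven_price_m": cost_hr / billable_tok_hr * 1e6,     # cost $/M
    }


if __name__ == "__main__":
    s = revenue_stack(5.19e6, 5.19e6, 1.00, 3.00, 15.20, 1.216)
    print(f"gross margin {s['gross_margin']:.1%}, profit ${s['gross_profit_hr']:.3f}/hr")
```

Revenue of $20.76/hr against $16.416/hr of cost gives the $4.344/hr gross profit, a 20.9% margin, a $2.000/M realized blended price, and a $1.582/M breakeven.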
Sensitivity analysis
| Tok/s/GPU \ $/GPU-hr | $1.520/hr | $1.710/hr | $1.900/hr base | $2.090/hr | $2.280/hr |
|---|---|---|---|---|---|
| 163 | $2.530/M | $2.847/M | $3.163/M | $3.479/M | $3.796/M |
| 245 | $1.687/M | $1.898/M | $2.109/M | $2.320/M | $2.530/M |
| 326 base | $1.265/M | $1.423/M | $1.582/M | $1.740/M | $1.898/M |
| 408 | $1.012/M | $1.139/M | $1.265/M | $1.392/M | $1.518/M |
| 489 | $0.843/M | $0.949/M | $1.054/M | $1.160/M | $1.265/M |
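Each cell is the cost per million billable tokens. The table holds token mix, pricing, utilization, goodput, and overhead fixed while varying the two levers that dominate endpoint economics: aggregate throughput per GPU (rows, 50% to 150% of the 326 tok/s base) and hourly GPU expense (columns, 80% to 120% of the $1.900/hr base).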
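Because cost per billable token scales linearly with $/GPU-hr and inversely with tok/s/GPU, the grid can be regenerated from one function. This sketch assumes overhead stays at 8% of GPU cost and uses a weighted-to-billable ratio of 1,350 / 2,000 (the hypothetical 1,000-in / 1,000-out request). It also assumes the row labels 245 and 408 are rounded from exact multiples (244.5 and 407.5), since the exact multiples reproduce the published cells.

```python
# Hedged sketch of the sensitivity grid. ASSUMPTIONS: overhead = 8% of GPU cost;
# weighted/billable ratio = 1350/2000; rows/columns are exact multiples of the base.

BASE_TOK_S_PER_GPU = 326
BASE_PRICE_PER_GPU_HR = 1.90
ROW_SCALES = [0.50, 0.75, 1.00, 1.25, 1.50]   # throughput multipliers
COL_SCALES = [0.80, 0.90, 1.00, 1.10, 1.20]   # $/GPU-hr multipliers


def billable_cost_per_m(tok_s_per_gpu: float, price_per_gpu_hr: float,
                        util: float = 0.75, goodput: float = 0.995,
                        overhead_fraction: float = 0.08,
                        weighted_per_billable: float = 1350 / 2000) -> float:
    # Per-GPU view: the GPU count cancels out of cost per token.
    cost_hr = price_per_gpu_hr * (1 + overhead_fraction)
    weighted_tok_hr = tok_s_per_gpu * 3600 * util * goodput
    billable_tok_hr = weighted_tok_hr / weighted_per_billable
    return cost_hr / billable_tok_hr * 1e6


def sensitivity_grid() -> list[list[float]]:
    return [
        [billable_cost_per_m(BASE_TOK_S_PER_GPU * r, BASE_PRICE_PER_GPU_HR * c)
         for c in COL_SCALES]
        for r in ROW_SCALES
    ]


if __name__ == "__main__":
    for r, row in zip(ROW_SCALES, sensitivity_grid()):
        cells = " | ".join(f"${v:.3f}/M" for v in row)
        print(f"{BASE_TOK_S_PER_GPU * r:>6.1f} tok/s | {cells}")
```

Halving throughput doubles cost per token, while a 10% change in GPU-hour price moves it by exactly 10%, which is why the rows spread much wider than the columns.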
Preset comparison
| Preset | Model | Tok/s/GPU | Tok/s/user | Cost / M billable | Gross margin |
|---|---|---|---|---|---|
| H200 R1 100 TPS | DeepSeek R1 | 326 | 100 | $1.582/M | 20.9% |
| B200 R1 efficient | DeepSeek R1 | 1,980 | 50 | $0.312/M | 79.2% |
| B200 R1 fast | DeepSeek R1 | 278 | 125 | $2.220/M | 11.2% |
| GB200 R1 NVL72 | DeepSeek R1 | 2,608 | 30 | $0.129/M | 86.5% |
| B200 GPT-OSS | GPT-OSS 120B | 5,824 | 90 | $0.193/M | 91.4% |
| GB300 R1 MTP | DeepSeek R1 | 13,100 | 150 | $0.030/M | 89.4% |