Jason's Chips

Calculator 05: Inference unit economics

Inference Token Economics Calculator

Translate GPU-hour goodput expense and benchmark throughput into cost per million tokens, token revenue, gross margin, breakeven pricing, and service-level sensitivity. The calculator treats token mix, throughput, utilization, goodput, and endpoint overhead as separate inputs, so each can be varied on its own.

Cost / M billable

$1.582/M

$1.582/M blended breakeven

Output cost / M

$3.163/M

50% output-token mix

Gross margin

20.9%

$2.000/M realized blended price

Billable tokens / hr

10.4M

7.01M weighted tokens / hr

Requests / hr

5.2K

5.19M output tokens / hr

Concurrent users

100 tok/s/user target

Cost bridge

GPU-hour expense to token unit cost

GPU goodput

$15.20/hr

8 GPUs at $1.900/hr each

Effective throughput

7.01M / hr

75% util. x 99.5% goodput

Weighted token cost

$2.343/M

1,350 weighted tokens / request

Billable token cost

$1.582/M

10.4M input + output tokens / hr
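The bridge above can be reproduced with a few lines of arithmetic. The sketch below is a minimal reconstruction, not the calculator's actual code. The function name `cost_bridge` and its parameter names are hypothetical. The 8% overhead rate is inferred from $1.216/hr against $15.20/hr, and the 2,000 billable tokens per request is inferred from 10.4M billable tokens over roughly 5.2K requests per hour.

```python
def cost_bridge(
    gpus: int = 8,
    price_per_gpu_hr: float = 1.90,
    tok_s_per_gpu: float = 326.0,
    utilization: float = 0.75,
    goodput: float = 0.995,
    overhead_rate: float = 0.08,            # assumption: $1.216 / $15.20
    weighted_per_request: float = 1350.0,
    billable_per_request: float = 2000.0,   # assumption: 1,000 in + 1,000 out
) -> dict:
    """Walk from GPU-hour expense to cost per million tokens."""
    gpu_expense_hr = gpus * price_per_gpu_hr
    total_cost_hr = gpu_expense_hr * (1 + overhead_rate)

    # Benchmark throughput, derated by utilization and goodput.
    raw_tokens_hr = tok_s_per_gpu * gpus * 3600
    weighted_tokens_hr = raw_tokens_hr * utilization * goodput

    # Convert weighted capacity into requests, then into billable tokens.
    requests_hr = weighted_tokens_hr / weighted_per_request
    billable_tokens_hr = requests_hr * billable_per_request

    return {
        "gpu_expense_hr": gpu_expense_hr,
        "total_cost_hr": total_cost_hr,
        "weighted_tokens_hr": weighted_tokens_hr,
        "billable_tokens_hr": billable_tokens_hr,
        "weighted_cost_per_m": total_cost_hr / (weighted_tokens_hr / 1e6),
        "billable_cost_per_m": total_cost_hr / (billable_tokens_hr / 1e6),
    }


bridge = cost_bridge()
for name, value in bridge.items():
    print(f"{name}: {value:,.3f}")
```

Under these assumptions the defaults land on the figures shown in the bridge: about 7.01M weighted tokens per hour, $2.343/M weighted, and $1.582/M billable.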

Throughput accounting

Effective tok/s/GPU

326

Lost tokens / hr

2.38M

Revenue / request

$0.0040

Weighted tokens let prefill and decode use different cost weights. Output tokens usually dominate latency and serving cost, while long prompts can still consume material cache and prefill capacity.
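As a rough illustration of the weighting, the sketch below assumes an input weight of 0.35 and an output weight of 1.0. These weights are hypothetical: they are chosen only because they reproduce the 1,350 weighted tokens shown for a request with 1,000 input and 1,000 output tokens, a request shape that is itself inferred from the 50% mix and the hourly totals. The calculator's actual weights are not shown on this page.

```python
def weighted_tokens(input_tokens: float, output_tokens: float,
                    w_in: float = 0.35, w_out: float = 1.0) -> float:
    """Capacity consumed by one request, in weighted tokens.

    w_in and w_out are illustrative assumptions, not the calculator's values.
    """
    return input_tokens * w_in + output_tokens * w_out


RAW_TOK_S_PER_GPU = 326   # benchmark throughput per GPU
GPUS = 8
UTILIZATION = 0.75
GOODPUT = 0.995

# Throughput accounting: raw capacity, what survives derating, and what is lost.
raw_tokens_hr = RAW_TOK_S_PER_GPU * GPUS * 3600
effective_tokens_hr = raw_tokens_hr * UTILIZATION * GOODPUT
lost_tokens_hr = raw_tokens_hr - effective_tokens_hr

# Assumed request shape: 1,000 input tokens and 1,000 output tokens.
per_request = weighted_tokens(1000, 1000)
requests_hr = effective_tokens_hr / per_request
output_tokens_hr = requests_hr * 1000

print(f"weighted tokens / request: {per_request:,.0f}")
print(f"lost tokens / hr: {lost_tokens_hr / 1e6:.2f}M")
print(f"requests / hr: {requests_hr:,.0f}")
print(f"output tokens / hr: {output_tokens_hr / 1e6:.2f}M")
```

Raising `w_in` models longer or more expensive prefill: each request then consumes more weighted capacity, so requests per hour fall even though the billable token count per request is unchanged.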

Daily run-rate

Billable tokens	249.1M
Revenue	$498.23
Cost	$393.98
Gross profit	$104.25

Revenue stack

Token revenue and endpoint cost

Line item	Tokens / hr	Price	Revenue / cost
Input token revenue	5.19M	$1.000/M	$5.190/hr
Output token revenue	5.19M	$3.000/M	$15.57/hr
GPU goodput expense	-	-	($15.20/hr)
Network, storage, control plane	-	-	($1.216/hr)
Gross profit	10.4M	20.9% GM	$4.344/hr
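The revenue stack reduces to simple arithmetic on the table rows. The sketch below takes the hourly token counts, prices, and costs directly from the table; the variable names are illustrative only, and the inputs are the rounded display values, so results can differ from the page in the last digit.

```python
M = 1_000_000

# Inputs copied from the revenue stack table (rounded display values).
input_tokens_hr = 5.19 * M
output_tokens_hr = 5.19 * M
input_price_per_m = 1.00
output_price_per_m = 3.00
gpu_expense_hr = 15.20
overhead_hr = 1.216          # network, storage, control plane

# Revenue side.
input_revenue_hr = input_tokens_hr / M * input_price_per_m
output_revenue_hr = output_tokens_hr / M * output_price_per_m
revenue_hr = input_revenue_hr + output_revenue_hr

# Cost side and margin.
cost_hr = gpu_expense_hr + overhead_hr
gross_profit_hr = revenue_hr - cost_hr
gross_margin = gross_profit_hr / revenue_hr

# Per-million figures on billable (input + output) tokens.
billable_m_hr = (input_tokens_hr + output_tokens_hr) / M
blended_price = revenue_hr / billable_m_hr
breakeven_price = cost_hr / billable_m_hr
output_cost_per_m = cost_hr / (output_tokens_hr / M)

print(f"revenue: ${revenue_hr:.2f}/hr, cost: ${cost_hr:.3f}/hr")
print(f"gross profit: ${gross_profit_hr:.3f}/hr ({gross_margin:.1%} GM)")
print(f"blended price: ${blended_price:.3f}/M, breakeven: ${breakeven_price:.3f}/M")
print(f"output cost: ${output_cost_per_m:.3f}/M")
```

The blended breakeven is total hourly cost spread over all billable tokens, while the output cost per million loads the same cost onto output tokens alone, which is why it is twice the blended figure at a 50% output mix.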

Sensitivity analysis

Throughput per GPU x GPU-hour expense

Tok/s/GPU \ $/GPU-hr	$1.520/hr	$1.710/hr	$1.900/hr base	$2.090/hr	$2.280/hr
163	$2.530/M	$2.847/M	$3.163/M	$3.479/M	$3.796/M
245	$1.687/M	$1.898/M	$2.109/M	$2.320/M	$2.530/M
326 base	$1.265/M	$1.423/M	$1.582/M	$1.740/M	$1.898/M
408	$1.012/M	$1.139/M	$1.265/M	$1.392/M	$1.518/M
489	$0.843/M	$0.949/M	$1.054/M	$1.160/M	$1.265/M

The table keeps token mix, pricing, utilization, goodput, and overhead fixed while changing the two levers that dominate endpoint economics: aggregate throughput per GPU and hourly goodput expense.
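A grid like this can be regenerated by sweeping the two levers around the base case. The sketch below is a reconstruction under stated assumptions, not the calculator's code: the function `cost_per_m_billable` is hypothetical, the 8% overhead rate and the 1,350 weighted / 2,000 billable tokens per request are inferred from the base-case figures, and the row labels are treated as rounded 0.5x to 1.5x multiples of the 326 tok/s/GPU base (for example, 245 as 0.75 x 326 = 244.5).

```python
def cost_per_m_billable(
    tok_s_per_gpu: float,
    price_per_gpu_hr: float,
    utilization: float = 0.75,
    goodput: float = 0.995,
    overhead_rate: float = 0.08,           # assumption: inferred from base case
    weighted_per_request: float = 1350.0,
    billable_per_request: float = 2000.0,  # assumption: 1,000 in + 1,000 out
) -> float:
    """Cost per million billable tokens for one GPU (GPU count cancels out)."""
    cost_hr = price_per_gpu_hr * (1 + overhead_rate)
    weighted_tokens_hr = tok_s_per_gpu * 3600 * utilization * goodput
    billable_tokens_hr = weighted_tokens_hr / weighted_per_request * billable_per_request
    return cost_hr / (billable_tokens_hr / 1e6)


BASE_TOK_S = 326.0
BASE_PRICE = 1.90
THROUGHPUT_MULTS = (0.5, 0.75, 1.0, 1.25, 1.5)
PRICE_MULTS = (0.8, 0.9, 1.0, 1.1, 1.2)

# grid[throughput multiplier][price multiplier] -> $/M billable
grid = {
    t: {p: cost_per_m_billable(BASE_TOK_S * t, BASE_PRICE * p) for p in PRICE_MULTS}
    for t in THROUGHPUT_MULTS
}

header = "tok/s/GPU".ljust(10) + "".join(f"${BASE_PRICE * p:>7.3f}" for p in PRICE_MULTS)
print(header)
for t in THROUGHPUT_MULTS:
    row = "".join(f"${grid[t][p]:>7.3f}" for p in PRICE_MULTS)
    print(f"{BASE_TOK_S * t:<10.0f}{row}")
```

Because cost scales linearly with GPU price and inversely with throughput, every cell equals the base cost multiplied by the price multiplier and divided by the throughput multiplier.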

Preset comparison

Benchmark cases as editable endpoints

Preset	Model	Tok/s/GPU	Tok/s/user	Cost / M billable	Gross margin
H200 R1 100 TPS	DeepSeek R1	326	100	$1.582/M	20.9%
B200 R1 efficient	DeepSeek R1	1,980	50	$0.312/M	79.2%
B200 R1 fast	DeepSeek R1	278	125	$2.220/M	11.2%
GB200 R1 NVL72	DeepSeek R1	2,608	30	$0.129/M	86.5%
B200 GPT-OSS	GPT-OSS 120B	5,824	90	$0.193/M	91.4%
GB300 R1 MTP	DeepSeek R1	13,100	150	$0.030/M	89.4%
