Atlasresearch

Jason's Chips

Calculator 05: Inference unit economics

Inference Token Economics Calculator

Translate GPU-hour goodput expense and benchmark throughput into cost per million tokens, token revenue, gross margin, breakeven pricing, and service-level sensitivity. The calculator separates token mix, throughput, utilization, goodput, and endpoint overhead.

| Metric | Value | Note |
|---|---|---|
| Cost / M billable | $1.582/M | $1.582/M blended breakeven |
| Output cost / M | $3.163/M | 50% output-token mix |
| Gross margin | 20.9% | $2.000/M realized blended price |
| Billable tokens / hr | 10.4M | 7.01M weighted tokens / hr |
| Requests / hr | 5.2K | 5.19M output tokens / hr |
| Concurrent users | 14 | 100 tok/s/user target |

Cost bridge

GPU-hour expense to token unit cost

| Step | Value | Basis |
|---|---|---|
| GPU goodput expense | $15.20/hr | 8 GPUs at $1.900/hr each |
| Effective throughput | 7.01M weighted tokens / hr | 75% utilization × 99.5% goodput |
| Weighted token cost | $2.343/M | 1,350 weighted tokens / request |
| Billable token cost | $1.582/M | 10.4M input + output tokens / hr |
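The bridge above can be reproduced with a few lines of arithmetic. The sketch below is an illustration, not the calculator's actual code. Two inputs are inferred rather than shown on the page: the 8% overhead rate (from $1.216/hr of network, storage, and control-plane cost against $15.20/hr of GPU expense) and the 2,000 billable tokens per request (from 10.4M billable tokens across roughly 5.2K requests per hour). All names are hypothetical.

```python
# Hypothetical reconstruction of the cost bridge for the H200 R1 100 TPS preset.
GPUS = 8
GPU_HOURLY_USD = 1.90        # goodput expense per GPU-hour
BENCH_TOK_S_PER_GPU = 326    # benchmark throughput per GPU
UTILIZATION = 0.75
GOODPUT = 0.995
OVERHEAD_RATE = 0.08         # assumption: $1.216/hr is 8% of $15.20/hr
WEIGHTED_PER_REQUEST = 1350
BILLABLE_PER_REQUEST = 2000  # assumption: 10.4M billable tokens / ~5.2K requests


def cost_bridge() -> dict:
    """Walk from GPU-hour expense to cost per million tokens."""
    gpu_expense = GPUS * GPU_HOURLY_USD
    hourly_cost = gpu_expense * (1 + OVERHEAD_RATE)
    # Benchmark throughput, derated by utilization and goodput.
    weighted_per_hr = GPUS * BENCH_TOK_S_PER_GPU * 3600 * UTILIZATION * GOODPUT
    requests_per_hr = weighted_per_hr / WEIGHTED_PER_REQUEST
    billable_per_hr = requests_per_hr * BILLABLE_PER_REQUEST
    return {
        "gpu_expense_per_hr": gpu_expense,
        "hourly_cost": hourly_cost,
        "weighted_tokens_per_hr": weighted_per_hr,
        "billable_tokens_per_hr": billable_per_hr,
        "weighted_cost_per_m": hourly_cost / weighted_per_hr * 1e6,
        "billable_cost_per_m": hourly_cost / billable_per_hr * 1e6,
    }


bridge = cost_bridge()
for name, value in bridge.items():
    print(f"{name}: {value:,.3f}")
```

Under these assumptions the two unit costs land within rounding of the displayed $2.343/M weighted and $1.582/M billable figures, which suggests the overhead line is included in both.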

Throughput accounting

| Metric | Value |
|---|---|
| Effective tok/s/GPU | 326 |
| Lost tokens / hr | 2.38M |
| Revenue / request | $0.0040 |

Weighted tokens let prefill and decode use different cost weights. Output tokens usually dominate latency and serving cost, while long prompts can still consume material cache and prefill capacity.
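As a rough illustration of the weighting idea, the sketch below assumes a request shape of 1,000 input and 1,000 output tokens with cost weights of 0.35 and 1.0. Those numbers are not shown on the page; they are one combination consistent with the displayed 1,350 weighted tokens per request and the 50% output-token mix. The function names are hypothetical.

```python
def weighted_tokens(input_tokens: int, output_tokens: int,
                    input_weight: float = 0.35,
                    output_weight: float = 1.0) -> float:
    """Cost-weighted token count for one request (weights are assumptions)."""
    return input_tokens * input_weight + output_tokens * output_weight


def throughput_accounting(bench_tok_s_per_gpu: float, gpus: int,
                          utilization: float, goodput: float):
    """Return (effective, lost) weighted tokens per hour."""
    peak = bench_tok_s_per_gpu * gpus * 3600
    effective = peak * utilization * goodput
    return effective, peak - effective


per_request = weighted_tokens(1000, 1000)
effective, lost = throughput_accounting(326, 8, 0.75, 0.995)
requests_per_hr = effective / per_request

# Assumed 1,000 output tokens per request, served at the 100 tok/s/user target.
output_tok_per_s = requests_per_hr * 1000 / 3600
concurrent_users = output_tok_per_s / 100

print(f"weighted tokens / request: {per_request:,.0f}")
print(f"effective tokens / hr:     {effective:,.0f}")
print(f"lost tokens / hr:          {lost:,.0f}")
print(f"requests / hr:             {requests_per_hr:,.0f}")
print(f"concurrent users:          {concurrent_users:.1f}")
```

With a lower input weight, long prompts cost less per token than generated output, but they still add to the weighted total that is divided into the hourly throughput budget.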

Daily run-rate

| Item | Per day |
|---|---|
| Billable tokens | 249.1M |
| Revenue | $498.23 |
| Cost | $393.98 |
| Gross profit | $104.25 |

Revenue stack

Token revenue and endpoint cost

| Line item | Tokens / hr | Price | Revenue / cost |
|---|---|---|---|
| Input token revenue | 5.19M | $1.000/M | $5.190/hr |
| Output token revenue | 5.19M | $3.000/M | $15.57/hr |
| GPU goodput expense | -- | -- | ($15.20/hr) |
| Network, storage, control plane | -- | -- | ($1.216/hr) |
| Gross profit | 10.4M | 20.9% GM | $4.344/hr |
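The revenue stack reduces to revenue minus cost, divided by revenue. A minimal sketch using the hourly figures as displayed (already rounded on the page, so results can differ in the last digit); the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RevenueStack:
    input_tokens_per_hr: float
    output_tokens_per_hr: float
    input_price_per_m: float    # USD per million input tokens
    output_price_per_m: float   # USD per million output tokens
    gpu_expense_per_hr: float
    overhead_per_hr: float      # network, storage, control plane

    @property
    def billable_tokens_per_hr(self) -> float:
        return self.input_tokens_per_hr + self.output_tokens_per_hr

    @property
    def revenue_per_hr(self) -> float:
        return (self.input_tokens_per_hr * self.input_price_per_m
                + self.output_tokens_per_hr * self.output_price_per_m) / 1e6

    @property
    def cost_per_hr(self) -> float:
        return self.gpu_expense_per_hr + self.overhead_per_hr

    @property
    def gross_margin(self) -> float:
        return (self.revenue_per_hr - self.cost_per_hr) / self.revenue_per_hr

    @property
    def blended_price_per_m(self) -> float:
        return self.revenue_per_hr / self.billable_tokens_per_hr * 1e6

    @property
    def breakeven_price_per_m(self) -> float:
        # Blended price at which gross profit is zero.
        return self.cost_per_hr / self.billable_tokens_per_hr * 1e6


stack = RevenueStack(5.19e6, 5.19e6, 1.00, 3.00, 15.20, 1.216)
print(f"revenue/hr:       ${stack.revenue_per_hr:.2f}")
print(f"gross margin:     {stack.gross_margin:.1%}")
print(f"blended price:    ${stack.blended_price_per_m:.3f}/M")
print(f"breakeven price:  ${stack.breakeven_price_per_m:.3f}/M")
```

The breakeven price is the same number as the cost per million billable tokens, which is why the summary shows $1.582/M under both labels.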

Sensitivity analysis

Throughput per GPU x GPU-hour expense

| Tok/s/GPU \ $/GPU-hr | $1.520/hr | $1.710/hr | $1.900/hr (base) | $2.090/hr | $2.280/hr |
|---|---|---|---|---|---|
| 163 | $2.530/M | $2.847/M | $3.163/M | $3.479/M | $3.796/M |
| 245 | $1.687/M | $1.898/M | $2.109/M | $2.320/M | $2.530/M |
| 326 (base) | $1.265/M | $1.423/M | $1.582/M | $1.740/M | $1.898/M |
| 408 | $1.012/M | $1.139/M | $1.265/M | $1.392/M | $1.518/M |
| 489 | $0.843/M | $0.949/M | $1.054/M | $1.160/M | $1.265/M |

The table keeps token mix, pricing, utilization, goodput, and overhead fixed while changing the two levers that dominate endpoint economics: aggregate throughput per GPU and hourly goodput expense.
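A grid like this can be generated by sweeping the two levers around the base case. The sketch below assumes the rows are throughput multipliers of 0.50x to 1.50x and the columns are price multipliers of 0.80x to 1.20x, which is inferred from the row and column labels rather than stated on the page. The 8% overhead rate and 2,000 billable tokens per request are likewise inferred, and the function name is hypothetical.

```python
def cost_per_m_billable(tok_s_per_gpu: float, gpu_hourly_usd: float,
                        gpus: int = 8, utilization: float = 0.75,
                        goodput: float = 0.995, overhead_rate: float = 0.08,
                        weighted_per_request: float = 1350,
                        billable_per_request: float = 2000) -> float:
    """USD per million billable tokens; defaults are inferred assumptions."""
    hourly_cost = gpus * gpu_hourly_usd * (1 + overhead_rate)
    weighted_per_hr = gpus * tok_s_per_gpu * 3600 * utilization * goodput
    billable_per_hr = weighted_per_hr / weighted_per_request * billable_per_request
    return hourly_cost / billable_per_hr * 1e6


BASE_TOK_S, BASE_PRICE = 326, 1.90
THROUGHPUT_MULTS = (0.50, 0.75, 1.00, 1.25, 1.50)   # assumed row steps
PRICE_MULTS = (0.80, 0.90, 1.00, 1.10, 1.20)        # assumed column steps

grid = [
    [cost_per_m_billable(BASE_TOK_S * tm, BASE_PRICE * pm) for pm in PRICE_MULTS]
    for tm in THROUGHPUT_MULTS
]

for tm, row in zip(THROUGHPUT_MULTS, grid):
    cells = "  ".join(f"${c:.3f}/M" for c in row)
    print(f"{BASE_TOK_S * tm:6.1f} tok/s/GPU  {cells}")
```

Because unit cost is proportional to hourly expense and inversely proportional to throughput, halving throughput doubles cost per million tokens, while a 20% cut in GPU-hour expense lowers it by only 20%.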

Preset comparison

Benchmark cases as editable endpoints

| Preset | Model | Tok/s/GPU | Tok/s/user | Cost / M billable | Gross margin |
|---|---|---|---|---|---|
| H200 R1 100 TPS | DeepSeek R1 | 326 | 100 | $1.582/M | 20.9% |
| B200 R1 efficient | DeepSeek R1 | 1,980 | 50 | $0.312/M | 79.2% |
| B200 R1 fast | DeepSeek R1 | 278 | 125 | $2.220/M | 11.2% |
| GB200 R1 NVL72 | DeepSeek R1 | 2,608 | 30 | $0.129/M | 86.5% |
| B200 GPT-OSS | GPT-OSS 120B | 5,824 | 90 | $0.193/M | 91.4% |
| GB300 R1 MTP | DeepSeek R1 | 13,100 | 150 | $0.030/M | 89.4% |

Assumptions

H200 R1 100 TPS

| Field | Value |
|---|---|
| Platform | NVIDIA H200 HGX |
| Model | DeepSeek R1 |
| Precision | FP8 |
| Serving engine | InferenceX endpoint curve |
| Scenario | Interactive reasoning endpoint |
| Preset note | H200 TCO case: 326 tok/s/GPU, 75% effective utilization, and roughly $1.90-$2.10 goodput expense per GPU-hour. |
| Annual billable tokens | 90.93B |
| Annual revenue | $181.9K |
| Annual cost | $143.8K |
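The daily and annual figures appear to be straight run-rates of the hourly numbers. A minimal sketch, assuming 24 hours per day and 365 days per year with no downtime adjustment (the page does not state its annualization basis); the hourly inputs are the rounded values from the revenue stack, so the last digit can differ from the displayed totals.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365  # assumption: no leap-year or downtime adjustment


def run_rate(hourly: float):
    """Scale an hourly figure to (daily, annual) run-rates."""
    daily = hourly * HOURS_PER_DAY
    return daily, daily * DAYS_PER_YEAR


# Hourly inputs: 5.19M input + 5.19M output tokens, $20.76 revenue, $16.416 cost.
hourly = {"billable_tokens": 10.38e6, "revenue_usd": 20.76, "cost_usd": 16.416}

for name, value in hourly.items():
    daily, annual = run_rate(value)
    print(f"{name}: daily={daily:,.2f} annual={annual:,.0f}")
```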

Saved scenarios

| Scenario | Gross margin | Tokens / hr | Cost / M billable |
|---|---|---|---|
| H200 R1 100 TPS | 21% | 10.4M | $1.582/M |
| B200 R1 efficient | 79% | 84.7M | $0.312/M |
| B200 R1 fast | 11% | 11.9M | $2.220/M |
| GB200 R1 NVL72 | 86% | 2.05B | $0.129/M |