Pricing Overview

Transparent pricing for scalable infrastructure.

No hidden costs. Pay for what you actually use.

THE AI LIFECYCLE PLATFORM

Deploy Pad

Transparent, Workload-Based Pricing

Pricing is automatically calculated based on your model choice and daily token workload.

Intelligent GPU Cost Efficiency

We select the most cost-efficient GPU configuration for your deployment to minimize your operational expense.

Pricing Includes Full Service

Pricing covers full optimization, orchestration, and monitoring by our expert MLOps engineers.

Get Your Instant Quote

Log in, choose a model, enter workload, and get an immediate, transparent quote.

Pricing

Dedicated GPU Inference

On-demand dedicated GPUs optimized for running LLM inference.

| Model | GPU | TPS / User | Concurrent Users | Monthly Tokens | Hourly Price (USD) |
|---|---|---|---|---|---|
| Qwen3 Coder Next | RTX Pro 6000 96GB | 95 | 32 | 7.8 billion | 1.60 |
| GLM 4.7 Flash | RTX Pro 6000 96GB | 105 | 32 | 8.7 billion | 1.60 |
| MiniMax M2.5 | RTX Pro 6000 96GB x 2 | 55 | 32 | 4.6 billion | 1.60 |
| Step 3.5 Flash | RTX Pro 6000 96GB x 2 | 80 | 32 | 6.4 billion | 3.20 |
| DeepSeek v3.2 | H200 141GB x 4 | Coming soon | Coming soon | Coming soon | 10.00 |
| Trinity Mini | RTX Pro 6000 96GB | 80 | 32 | 6.4 billion | 1.60 |
| Trinity Mini | H100 80GB | 101 | 32 | 8.3 billion | 2.10 |
| Trinity Mini | H200 141GB | 120 | 32 | 9.9 billion | 2.50 |
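The figures in the table are internally consistent: each Monthly Tokens value equals per-user TPS times concurrent users over a 30-day month. A minimal sketch of that relationship, plus the effective per-million-token cost it implies (an assumed derivation for illustration, not an official calculator; full utilization is assumed):

```python
def monthly_tokens(tps_per_user: float, concurrent_users: int, days: int = 30) -> float:
    """Maximum monthly throughput: per-user tokens/sec x concurrent users
    x seconds in the billing period (assumed to be 30 days)."""
    return tps_per_user * concurrent_users * days * 24 * 3600

def effective_price_per_million(hourly_usd: float, tokens_per_month: float,
                                days: int = 30) -> float:
    """Effective USD per million tokens, assuming the deployment runs at
    full utilization for the whole period."""
    return hourly_usd * days * 24 / (tokens_per_month / 1e6)

# GLM 4.7 Flash on one RTX Pro 6000: 105 TPS/user, 32 concurrent users, 1.60 USD/hour
tokens = monthly_tokens(105, 32)
print(f"{tokens / 1e9:.1f} billion tokens")  # matches the 8.7 billion in the table
print(f"{effective_price_per_million(1.6, tokens):.3f} USD per 1M tokens")
```

At full utilization this works out to roughly 0.13 USD per million tokens; real workloads with idle time will land higher.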

Shared Inference

On-demand shared GPU inference optimized for throughput and cost efficiency.
Built for production workloads and available directly or through OpenRouter.

| Model | Context Size | Input Price (USD) | Output Price (USD) |
|---|---|---|---|
| MiroThinker v1.5 30B | 256k | 0.40 | 0.40 |
| IQuest Coder v1 40B Instruct | 128k | 0.45 | 0.50 |
| Solar Open 100B | 128k | 0.80 | 1.00 |

Input and output prices are charged per million tokens.
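Per-million-token billing can be sketched as follows (the usage numbers are hypothetical; the prices come from the shared inference table above):

```python
def shared_inference_cost(input_tokens: int, output_tokens: int,
                          input_price_usd: float, output_price_usd: float) -> float:
    """Total cost in USD; both prices are quoted per million tokens."""
    return (input_tokens * input_price_usd + output_tokens * output_price_usd) / 1e6

# IQuest Coder v1 40B Instruct: 0.45 USD input / 0.50 USD output per 1M tokens
# Hypothetical monthly usage: 10M input tokens, 2M output tokens
cost = shared_inference_cost(10_000_000, 2_000_000, 0.45, 0.5)
print(f"{cost:.2f} USD")  # 5.50 USD
```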