What is a Token Plan Key?

The Token Plan Key is the key used for both Token Plan quotas and Credits. Each user has a dedicated Token Plan Key for each Team they belong to. The key can exist before any Token Plan seat or Credits are available. In that state, it has no usable paid resources. Once a Token Plan seat is assigned, or Credits access is available, the same key can use those resources. Token Plan Keys are separate from standard pay-as-you-go API Keys.

Can I use Credits without a Token Plan subscription?

Yes. Credits can be purchased and used without an assigned Token Plan subscription seat. Credits still use the Token Plan Key and have the same resource coverage as Token Plan. If you have no Token Plan seat but have Credits access, usage within that coverage is charged to Credits. If you have both Token Plan quota and Credits, Token Plan quota is used first, then Credits automatically cover overflow within Token Plan resource coverage. Use a pay-as-you-go API Key for resources outside Token Plan coverage. For details, see Credits.
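The charging order described above can be sketched as a small decision function. This is only an illustration of the rules in this answer: the `Account` class, its fields, and the returned labels are hypothetical names, not part of any MiniMax API.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Hypothetical snapshot of what a Token Plan Key can draw on."""
    plan_quota_remaining: int  # 0 if no seat is assigned or the quota is used up
    credits_available: bool


def billing_source(account: Account, within_coverage: bool) -> str:
    """Return which resource a request would be charged to."""
    if not within_coverage:
        # Outside Token Plan coverage: a pay-as-you-go API Key is needed instead.
        return "pay-as-you-go"
    if account.plan_quota_remaining > 0:
        return "token-plan-quota"  # subscription quota is used first
    if account.credits_available:
        return "credits"  # Credits cover overflow automatically
    return "unavailable"  # the key exists but has no usable paid resources


# No seat (or exhausted quota) plus Credits access: usage goes to Credits.
print(billing_source(Account(plan_quota_remaining=0, credits_available=True), True))
```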

What is the Default Team?

Every user gets a Default Team when creating an account. It is the user’s personal Team:
  • It has one member only.
  • Other users cannot be invited to it.
  • The user is the sole Owner.
  • The Owner can buy individual Token Plan subscriptions and Credits for personal use.
For regular Teams, Token Plan seats are assigned to members and Credits can be shared from a Team Credits pool. See Team Access.

Which models does Token Plan support? How do I switch models?

Token Plan supports MiniMax models across all modalities: text, speech, video, image, and music. Text models (M2.7) are available on all plans, while access levels and daily quotas for non-text models depend on your plan tier. The High-Speed subscription also supports the MiniMax-M2.7-highspeed and MiniMax-M2.5-highspeed models.
Non-text models included in Token Plan:
  • TTS HD (speech-2.8-hd / speech-2.6-hd / speech-02-hd)
  • Hailuo-2.3-Fast (768P 6s video)
  • Hailuo-2.3 (768P 6s video)
  • Music-2.6 (up to 5-minute music)
  • image-01 (image generation)
To call these non-text models in your AI agent, see the MiniMax CLI guide.
Different plans include different models. Models marked as “—” are not available on that plan. See the pricing page for details.
To switch models, modify the model parameter in your API calls:
import anthropic

# The SDK reads ANTHROPIC_API_KEY (and ANTHROPIC_BASE_URL, if set) from the environment.
client = anthropic.Anthropic()

message = client.messages.create(
    model="MiniMax-M2.7",  # Switch to other models like MiniMax-M2.5
    max_tokens=1000,
    system="You are a helpful assistant.",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Hi, how are you?"
                }
            ]
        }
    ]
)
Text models share a request quota (5-hour rolling window), while non-text models each have their own independent daily quotas.

What is the High-Speed subscription? How does it differ from the Standard plan?

The High-Speed subscription is a new Token Plan offering that provides dedicated support for the MiniMax-M2.7-highspeed model.
Differences between MiniMax-M2.7-highspeed and MiniMax-M2.7:
  • Same performance: MiniMax-M2.7-highspeed delivers the same model capability and output quality as MiniMax-M2.7
  • Significantly faster: MiniMax-M2.7-highspeed offers considerably higher inference output speed than MiniMax-M2.7
If response speed in your coding tools is a priority, the High-Speed subscription is recommended.

Can I upgrade my subscription plan?

Yes. Token Plan supports upgrading your plan at any time during your subscription period, including upgrading from a Standard plan to a High-Speed plan, or upgrading to a higher tier within the same plan type. You only need to pay the price difference, and the new plan takes effect immediately.

How do I check Token Plan usage?

You can check your Token Plan usage in two ways:
Method 1: Visit the Subscription Management Page
Visit the Billing > Token Plan page to view your usage.
Method 2: Use the API Endpoint
curl --location 'https://www.minimax.io/v1/token_plan/remains' \
--header 'Authorization: Bearer <API Key>' \
--header 'Content-Type: application/json'
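For scripts, the same request can be sketched in Python with only the standard library. The URL and headers are taken from the curl command above; the function names are illustrative. The response schema is not described here, so this sketch returns the raw body as text rather than guessing at field names.

```python
import urllib.request

REMAINS_URL = "https://www.minimax.io/v1/token_plan/remains"


def build_remains_request(api_key: str) -> urllib.request.Request:
    """Build the same GET request as the curl command, without sending it."""
    return urllib.request.Request(
        REMAINS_URL,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="GET",
    )


def fetch_remains(api_key: str, timeout: float = 10.0) -> str:
    """Send the request and return the raw response body as text."""
    request = build_remains_request(api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_remains("<API Key>")` with a real key performs the network request; `build_remains_request` can be inspected offline.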

How is usage reset?

Token Plan has two reset mechanisms:
  • M2.7: Uses a 5-hour rolling window. The system calculates your total request usage within the past 5 hours, and any usage from more than 5 hours ago is automatically released.
  • Other models (TTS HD, video, music, image): Use daily quotas that reset automatically each day.
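The 5-hour rolling window can be approximated on the client side with a queue of request timestamps. This is a minimal sketch of the idea, not the platform's actual accounting: the class name and the `limit` value are hypothetical, and the server remains the source of truth for your usage.

```python
from collections import deque

WINDOW_SECONDS = 5 * 60 * 60  # 5-hour rolling window


class RollingWindowCounter:
    """Client-side estimate of requests used within the past 5 hours."""

    def __init__(self, limit: int) -> None:
        self.limit = limit  # hypothetical per-plan request cap
        self._stamps = deque()  # request timestamps in seconds, oldest first

    def _release_old(self, now: float) -> None:
        # Usage from more than 5 hours ago no longer counts.
        while self._stamps and now - self._stamps[0] > WINDOW_SECONDS:
            self._stamps.popleft()

    def used(self, now: float) -> int:
        """Number of requests still counted at time `now`."""
        self._release_old(now)
        return len(self._stamps)

    def try_record(self, now: float) -> bool:
        """Record a request if quota remains; return False if the cap is hit."""
        if self.used(now) >= self.limit:
            return False
        self._stamps.append(now)
        return True
```

In real use, `now` would come from `time.time()`; passing it explicitly keeps the sketch easy to test.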

What happens when I reach the usage limit?

When you reach the 5-hour request limit for M2.7:
  • Use Credits
    If Credits are available, usage within Token Plan resource coverage can be automatically covered by Credits.
  • Upgrade your subscription
    You can visit the Token Plan page to upgrade to a higher-tier plan for more request quota. Token Plan supports upgrading at any time, and upgrades take effect immediately.
  • Switch to pay-as-you-go
    If you wish to continue without rate limits, you can replace your Token Plan Key with your standard MiniMax Open Platform API Key from the account management system. This will switch the tool to a pay-as-you-go model based on actual token usage, which will consume your Open Platform account balance.
  • Wait for the reset
    The text model limit is based on a dynamic 5-hour window. You can pause usage, wait for the window to roll over, and your quota will automatically recover.
When you reach the daily quota limit for non-text models:
  • Use Credits, if Credits are available for usage within Token Plan resource coverage.
  • Upgrade your subscription to get higher daily quotas.
  • Switch to pay-as-you-go, using your standard API Key to continue calling the corresponding model.
  • Wait for the next day’s automatic reset.

Can the Token Plan Key and the standard Open Platform API Key be used interchangeably?

No, they cannot.
  • Token Plan Key: Used for Token Plan quotas and Credits. Text models are measured by request count (5-hour rolling limit), while non-text models use daily quotas. Credits have the same resource coverage as Token Plan and can cover overflow beyond subscription quota.
  • Other Open Platform API Keys: Used for pay-as-you-go access to standard Open Platform API endpoints. Billing is based on actual token consumption and is deducted from your account balance.

How does API-vlm work with Token Plan?

API-vlm supports multimodal understanding for image, video, and audio inputs. Its output is text. When called with Token Plan, each API-vlm request deducts 3 M2.7 requests. For High-Speed plans, each request deducts 3 M2.7-highspeed requests. If the available Token Plan quota is exhausted and Credits are available, additional API-vlm usage can be automatically covered by Credits.
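The deduction rule is simple arithmetic, sketched below. The factor of 3 comes from the text above; the function names are illustrative, and the sketch assumes that an ordinary text request deducts 1 request, which is not stated explicitly here.

```python
VLM_REQUEST_COST = 3  # each API-vlm request deducts 3 text-model requests


def quota_after_calls(remaining: int, text_calls: int, vlm_calls: int) -> int:
    """Return the request quota left after a mix of text and API-vlm calls.

    Assumes one text call deducts 1 request. A negative result means the
    overflow would need to be covered by Credits, if Credits are available.
    """
    return remaining - text_calls - VLM_REQUEST_COST * vlm_calls


def max_vlm_calls(remaining: int) -> int:
    """How many API-vlm requests fit in the remaining request quota."""
    return max(remaining, 0) // VLM_REQUEST_COST


# 100 requests left, then 10 text calls and 20 API-vlm calls: 100 - 10 - 60
print(quota_after_calls(100, 10, 20))  # 30
```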

How is TPS (Tokens Per Second) calculated for text models?

TPS measures the number of tokens generated per second, and is used to evaluate the inference output speed of a model. The formula is:
$$\text{TPS} = \frac{\text{Number of output tokens}}{\text{Time of last token} - \text{Time of first token}}$$
In other words, timing starts when the model outputs the first token and ends when the last token is generated. The total number of tokens produced is then divided by that elapsed time (in seconds).
TPS may fluctuate during actual usage. The TPS values indicated on each model page are reference values.
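As a worked example, the TPS definition translates directly into a small function. The function and parameter names are illustrative; timestamps are assumed to be in seconds on the same clock.

```python
def tokens_per_second(
    output_tokens: int,
    first_token_time: float,
    last_token_time: float,
) -> float:
    """Compute TPS from the output token count and first/last token timestamps."""
    elapsed = last_token_time - first_token_time
    if elapsed <= 0:
        raise ValueError("the last token must arrive after the first token")
    return output_tokens / elapsed


# 500 output tokens, first token at t = 1.0 s, last token at t = 6.0 s
print(tokens_per_second(500, 1.0, 6.0))  # 100.0
```

Note that the time spent waiting for the first token is excluded, so TPS describes generation speed rather than end-to-end latency.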

What are the limits of the Token Plan? Is it suitable for production?

The Token Plan is designed for individual, interactive developer use, with higher-tier plans offering increased quotas. For production use, pay-as-you-go is recommended.
Key limits include:
  • Rate limits (RPM / TPM): Requests may be throttled when exceeded; typically reset within ~1 minute and may tighten during peak traffic
  • Text model quota: Request caps per 5-hour rolling window, with automatic recovery as the window rolls over
  • Non-text model daily quotas: Daily caps that reset automatically each day
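Because RPM / TPM throttling typically clears within about a minute, a client can retry with a capped backoff. The sketch below is one possible approach, not an official recommendation: `ThrottledError` is a hypothetical stand-in for whatever rate-limit error your client library raises, and the delay values are assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for the rate-limit error raised by your client."""


def call_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 5.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on ThrottledError, doubling the wait up to `max_delay`."""
    for attempt in range(max_attempts):
        try:
            return call()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(min(base_delay * 2 ** attempt, max_delay))
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so the retry logic can be exercised without actually waiting.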

What are the platform traffic rules?

Due to the unexpected popularity of the MiniMax-M2.7 model, traffic has grown rapidly. To ensure service stability and availability for all users, the MiniMax platform will implement dynamic rate limiting during peak hours.
We have observed that some requests come from ultra-high-concurrency automated batch tasks or multi-user sharing patterns. To prevent a small amount of abnormal traffic from occupying shared computing resources, and to keep the experience stable for the majority of users, we will apply rate control based on account-level usage so that computing resources are distributed fairly.
Platform Rate Limiting Rules
Consistent with industry practices, MiniMax will implement dynamic rate limiting during peak hours:
  • Peak Traffic Hours: Dynamically adjusted based on cluster load, typically occurring on weekdays from 15:00–17:30
    • Starter / Plus: Supports approximately 1 Agent making continuous calls
    • Max: Supports approximately 2 Agents making continuous calls
    • Ultra: Supports approximately 4 Agents making continuous calls
  • Weekly Usage Quota: The current weekly usage quota is 10 times the “5-hour quota” (the common industry range is 5 to 8 times)
    • Users who purchased before 2026-03-22 23:59:59: Not subject to weekly quota limits
    • Users who purchase from 2026-03-23 00:00:00 onwards: Subject to weekly quota limits
At the same time, we are continuously advancing computing capacity expansion and system optimization to provide you with more stable and reliable services. Thank you for your understanding and support!
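The weekly quota rule above can be illustrated with a short calculation. The multiplier and cutoff date come from the rules listed; the function name is hypothetical, the sample quota of 100 is made up, and the time zone of the cutoff is not stated here, so the sketch uses naive datetimes.

```python
from datetime import datetime
from typing import Optional

WEEKLY_MULTIPLIER = 10  # weekly quota = 10 x the 5-hour quota
# Purchases from this moment onwards are subject to the weekly quota.
# The time zone is not specified in the rules, so this is a naive datetime.
WEEKLY_QUOTA_START = datetime(2026, 3, 23, 0, 0, 0)


def weekly_quota(five_hour_quota: int, purchased_at: datetime) -> Optional[int]:
    """Return the weekly request cap, or None if the purchase is exempt."""
    if purchased_at < WEEKLY_QUOTA_START:
        return None  # purchased before the cutoff: no weekly quota limit
    return WEEKLY_MULTIPLIER * five_hour_quota


# Hypothetical 5-hour quota of 100 requests, purchased after the cutoff
print(weekly_quota(100, datetime(2026, 4, 1)))  # 1000
```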