
Models Overview

Text

Models          Description          Features

MiniMax-M2
• Context Length: 200k tokens
• Maximum Output: 128k tokens (including CoT)
• Agentic capabilities
• Function calling
• Advanced reasoning
• Real-time streaming

Audio

Models          Description          Features

speech-2.6-hd
• Ultimate Similarity
• Ultra-High Quality
• 40 languages supported
• 7 emotions supported
• Specified languages and dialects supported

speech-2.6-turbo
• Ultimate Value
• Low latency
• 40 languages supported
• 7 emotions supported
• Specified languages and dialects supported

speech-02-hd
• Stronger replication similarity
• High quality voice generation
• 24 languages supported
• 7 emotions supported
• Specified languages and dialects supported

speech-02-turbo
• Superior rhythm and stability
• Low latency
• 24 languages supported
• 7 emotions supported
• Specified languages and dialects supported

Video

Models          Description          Res. & Dur.          FPS

MiniMax Hailuo 2.3
• Text to Video & Image to Video
• SOTA instruction following
• Extreme physics mastery
• 1080p 6s
• 768p 6s, 10s
• 24 fps

MiniMax Hailuo 2.3 Fast
• Image to Video
• Extreme physics mastery
• Value and Efficiency
• 1080p 6s
• 768p 6s, 10s
• 24 fps

MiniMax Hailuo 02
• Text to Video & Image to Video
• SOTA instruction following
• Extreme physics mastery
• 1080p 6s
• 768p 6s, 10s
• 512p 6s, 10s
• 24 fps
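
The resolution and duration combinations listed for the video models can be encoded as a small lookup table so that a request is checked locally before it is sent. The following is a minimal sketch, not an official client: the keys `hailuo-2.3`, `hailuo-2.3-fast`, and `hailuo-02`, the mode names `text` and `image`, and the function `is_supported` are hypothetical names chosen for illustration and are not confirmed API identifiers. It also assumes every listed resolution and duration applies to each input mode a model supports, which the table does not state explicitly.

```python
# Supported durations (seconds) per resolution, transcribed from the Video table.
# Keys are hypothetical local labels, NOT confirmed API model identifiers.
SUPPORTED_DURATIONS = {
    "hailuo-2.3": {"1080p": {6}, "768p": {6, 10}},
    "hailuo-2.3-fast": {"1080p": {6}, "768p": {6, 10}},
    "hailuo-02": {"1080p": {6}, "768p": {6, 10}, "512p": {6, 10}},
}

# Input modes per model, from the Description column.
# "text" = Text to Video, "image" = Image to Video (labels are assumptions).
INPUT_MODES = {
    "hailuo-2.3": {"text", "image"},
    "hailuo-2.3-fast": {"image"},
    "hailuo-02": {"text", "image"},
}

FPS = 24  # every model in the table is listed at 24 fps


def is_supported(model: str, resolution: str, duration_s: int, mode: str = "image") -> bool:
    """Return True if the table lists this model/mode/resolution/duration combination."""
    if mode not in INPUT_MODES.get(model, set()):
        return False
    durations = SUPPORTED_DURATIONS.get(model, {}).get(resolution, set())
    return duration_s in durations


def frame_count(duration_s: int) -> int:
    """Nominal number of frames for a clip of the given length at 24 fps."""
    return duration_s * FPS
```

For example, `is_supported("hailuo-2.3", "1080p", 10, "text")` returns `False` because 1080p is listed only at 6s, while `is_supported("hailuo-02", "512p", 10)` returns `True`.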

Music

Models          Description          Features

Music-2.0
• Text to Music
• Enhanced musicality
• Natural vocals and smooth melodies
• Human-like performance
• Rich emotional expression
• Enhanced tone control
• Realistic, expressive vocals