What this page answers
MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability - visual grounding, multi
- Xiaomi · xiaomi/mimo-v2-omni
- text+image+audio+video->text · China model route
- 262,144 context · $0.40 input
Before connecting
Do not stop at the model name. Before integration, verify base URL, protocol, visible models, parameters, and limits together.
- supports frequency_penalty
- supports include_reasoning
- supports max_tokens
- supports presence_penalty
- supports reasoning
Next action
The goal is to catch search demand, then move users into model profiles, provider profiles, and key checking.
- Check whether the model fits the use case
- Then verify key permission and callable models