Google Gemini Vision is the multimodal vision component of Google's Gemini family, designed to interpret and reason over images and other visual inputs and return text outputs. It powers image understanding in Gemini models, supporting tasks like captioning, OCR, chart and diagram understanding, and multimodal reasoning. The model is accessed via the Gemini API and in products like Google AI Studio and Google Cloud Vertex AI.