$ cat gemini-multimodal-analyzer.md
Gemini Multimodal Analyzer
Extract insights from images, documents, audio, and video using Gemini's multimodal capabilities.
Best: gemini-2.5-pro
Good: gpt-4o, claude-sonnet-4
Limited: gemini-2.5-flash, gpt-4o-mini
Updated: 2026-05-22
workflow
You are a multimodal analysis expert powered by Gemini. Analyze the provided media with comprehensive attention to detail.
Media Type: {{mediaType}}
Analysis Goal: {{analysisGoal}}
Focus Area: {{focusArea || "All visible content"}}
Output Format: {{outputFormat || "Structured report"}}
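The placeholder syntax above can be filled programmatically before the prompt is sent to a model. The following is a minimal sketch, assuming that `{{name}}` marks a required variable and `{{name || "default"}}` marks an optional one with an inline fallback; the `render` helper and the sample values are hypothetical and not part of the workflow itself.

```python
import re

# Matches {{name}} or {{name || "default"}} as written in the prompt header.
PLACEHOLDER = re.compile(r'\{\{\s*(\w+)\s*(?:\|\|\s*"([^"]*)")?\s*\}\}')


def render(template: str, values: dict[str, str]) -> str:
    """Fill placeholders, using the inline default when a value is missing or empty."""

    def substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        value = values.get(name)
        if value:
            return value
        if default is not None:
            return default
        # No value and no inline default: treat the variable as required.
        raise KeyError(f"missing required variable: {name}")

    return PLACEHOLDER.sub(substitute, template)


header = (
    'Media Type: {{mediaType}}\n'
    'Analysis Goal: {{analysisGoal}}\n'
    'Focus Area: {{focusArea || "All visible content"}}\n'
    'Output Format: {{outputFormat || "Structured report"}}'
)

# Illustrative inputs only; focusArea and outputFormat fall back to their defaults.
filled = render(header, {"mediaType": "image", "analysisGoal": "Extract chart data"})
print(filled)
```

Raising on a missing required variable, rather than leaving the raw placeholder in place, keeps an unfilled `{{mediaType}}` from reaching the model unnoticed.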
Analysis Framework
1. Content Inventory
Catalog everything visible in the media:
- Text: All readable text, labels, annotations
- Visual Elements: Charts, graphs, diagrams, icons
- Layout: Structure, hierarchy, color coding
- Metadata: File type, resolution, annotations
2. Structured Extraction
For each identified element, extract:
Element: [name]
Type: [text/visual/structural]
Content: [extracted data]
Confidence: [high/medium/low]
Notes: [ambiguities, uncertainties]
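If the model follows the five-field record layout above, each record can be parsed into a typed object for downstream use. This is a rough sketch under that assumption; the `ExtractedElement` class, the `parse_record` helper, and the sample record are illustrative and do not come from any real model output.

```python
from dataclasses import dataclass


@dataclass
class ExtractedElement:
    element: str
    type: str        # text / visual / structural
    content: str
    confidence: str  # high / medium / low
    notes: str = ""


def parse_record(block: str) -> ExtractedElement:
    """Parse one 'Key: value' record; only the first colon on a line splits key from value."""
    fields = {}
    for line in block.strip().splitlines():
        key, _, value = line.partition(":")
        fields[key.strip().lower()] = value.strip()
    return ExtractedElement(
        element=fields["element"],
        type=fields["type"],
        content=fields["content"],
        confidence=fields["confidence"].lower(),
        notes=fields.get("notes", ""),
    )


# Hypothetical record, written by hand for illustration.
sample = """
Element: Revenue chart title
Type: text
Content: Q3 Revenue by Region
Confidence: High
Notes: Subtitle partially cropped
"""

record = parse_record(sample)
print(record)
```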
3. Pattern Recognition
Identify cross-element patterns:
- Trends: Repeating themes or data trajectories
- Anomalies: Outliers or unexpected elements
- Relationships: How elements connect or contradict
- Missing: What should be present but isn't
4. Output Generation
Format findings according to {{outputFormat}}:
- Structured Report: Sections with headings, subheadings, and bullet points
- JSON: Machine-readable key-value pairs
- Summary: Concise 3-5 paragraph overview
- Comparison: Side-by-side analysis if multiple media
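When the JSON output format is requested, it can help to validate the reply before passing it on. The sketch below assumes the model returns a JSON array of objects whose keys mirror the extraction fields; that schema is an assumption made for illustration, since the workflow only asks for "machine-readable key-value pairs".

```python
import json

# Assumed schema: one object per finding, keyed like the extraction record.
REQUIRED_KEYS = {"element", "type", "content", "confidence"}


def validate_findings(raw: str) -> list[dict]:
    """Parse a JSON reply and check that every finding carries the required keys."""
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of findings")
    for index, item in enumerate(data):
        missing = REQUIRED_KEYS - set(item)
        if missing:
            raise ValueError(f"finding {index} is missing keys: {sorted(missing)}")
    return data


# Hypothetical reply, written by hand for illustration.
raw_reply = '[{"element": "Axis label", "type": "text", "content": "Month", "confidence": "high"}]'
findings = validate_findings(raw_reply)
print(len(findings), "finding(s) validated")
```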
5. Confidence Scoring
Rate each finding:
| Confidence | Meaning |
|---|---|
| High | Clearly visible, unambiguous |
| Medium | Reasonable interpretation, some uncertainty |
| Low | Best guess, needs human verification |
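The confidence levels in the table can drive a simple triage step after analysis. As a minimal sketch, assuming each finding is a dictionary with a `confidence` key (a hypothetical shape, not something the workflow prescribes), low-confidence findings are set aside for human verification:

```python
CONFIDENCE_RANK = {"high": 0, "medium": 1, "low": 2}


def triage(findings: list[dict]) -> tuple[list[dict], list[dict]]:
    """Return (accepted, needs_review): high and medium findings in rank order, then low ones."""
    ordered = sorted(findings, key=lambda f: CONFIDENCE_RANK[f["confidence"].lower()])
    accepted = [f for f in ordered if f["confidence"].lower() != "low"]
    needs_review = [f for f in ordered if f["confidence"].lower() == "low"]
    return accepted, needs_review


# Hypothetical findings, written by hand for illustration.
sample_findings = [
    {"finding": "Handwritten margin note", "confidence": "Low"},
    {"finding": "Chart title", "confidence": "High"},
    {"finding": "Trend direction", "confidence": "Medium"},
]

accepted, needs_review = triage(sample_findings)
print([f["finding"] for f in accepted])
print([f["finding"] for f in needs_review])
```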
6. Limitations Acknowledgment
Note any analysis limitations:
- Blurred or low-resolution areas
- Text in unsupported languages
- Domain-specific jargon needing context
- Partial visibility or cropped content
Begin each section with a bold header. Use code formatting for extracted data points. End with a summary of the three most important findings.
guide
how to use
- Open the Gemini Multimodal Analyzer workflow in your AI chat interface.
- Replace the variables in {{double braces}} (mediaType, analysisGoal, focusArea, outputFormat) with your specific inputs.
- For best results, use gemini-2.5-pro as the target model.
- Review the generated output and iterate by refining your inputs.
- Save your final result and share it with your team.
best use cases
- Quickly generate Gemini-specific content with structured prompts.
- Standardize Gemini workflows across your team using a shared template.
- Onboard new team members with a repeatable Gemini process.
- Automate multimodal and vision tasks with AI-powered Gemini workflows.
examples
- Use Gemini Multimodal Analyzer to create a Gemini project from scratch.
- Adapt Gemini Multimodal Analyzer for a different Gemini domain with custom variables.
- Combine Gemini Multimodal Analyzer with other workflows in the Gemini category for a complete pipeline.
- Run Gemini Multimodal Analyzer with multiple AI models to compare output quality.
- Schedule Gemini Multimodal Analyzer as a recurring Gemini task.
variations
- Simplified version: remove optional variables for faster results.
- Advanced version: add custom validation steps after generation.
- Batch version: run Gemini Multimodal Analyzer on multiple inputs sequentially.
- Gemini-focused variant: emphasize Gemini best practices in the prompt.
- Multimodal-focused variant: emphasize multimodal best practices in the prompt.
common mistakes
- Skipping variable customization: always replace the {{double-brace}} placeholders.
- Using the wrong AI model tier for complex outputs.
- Not iterating on the first result; refinement improves quality significantly.
- Ignoring Gemini best practices when customizing the prompt.
- Using gemini-2.5-pro outside its optimal use case for this workflow.