ai-multimodal

Description

Process and generate multimedia content using the Google Gemini API. Capabilities include analyzing audio files (transcription with timestamps, summarization, speech understanding, and music/sound analysis, for audio up to 9.5 hours long), understanding images (captioning, object detection, OCR, visual Q&A, segmentation), processing videos (scene detection, Q&A, temporal analysis, and YouTube URLs, for video up to 6 hours long), extracting data from documents (PDF tables, forms, charts, diagrams, multi-page files), and generating images (text-to-image, editing, composition, refinement). Use this skill when working with audio or video files, analyzing images or screenshots, processing PDF documents, extracting structured data from media, creating images from text prompts, or implementing multimodal AI features. Supports multiple models (Gemini 2.5 and 2.0) with context windows of up to 2M tokens.
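A minimal pre-flight sketch of the workflow described above, assuming the `google-genai` Python SDK (`pip install google-genai`) and a `GEMINI_API_KEY` environment variable. It routes a file to one of the capability groups by MIME type, checks the duration caps stated in the description (9.5 hours of audio, 6 hours of video), and then sends the file to Gemini. The helper names `capability_for`, `within_duration_limit`, and `analyze`, and the default model `gemini-2.5-flash`, are illustrative assumptions and are not taken from this skill's scripts.

```python
import mimetypes
import time

# Duration caps quoted in the skill description, in seconds.
MAX_AUDIO_SECONDS = 9.5 * 3600
MAX_VIDEO_SECONDS = 6 * 3600


def capability_for(path: str) -> str:
    """Hypothetical helper: map a file to 'audio', 'image', 'video', or 'document'."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        raise ValueError(f"cannot guess MIME type for {path!r}")
    if mime == "application/pdf":
        return "document"
    major = mime.split("/", 1)[0]
    if major in ("audio", "image", "video"):
        return major
    raise ValueError(f"unsupported MIME type {mime!r} for {path!r}")


def within_duration_limit(kind: str, seconds: float) -> bool:
    """Hypothetical helper: check a media duration against the stated caps."""
    if kind == "audio":
        return seconds <= MAX_AUDIO_SECONDS
    if kind == "video":
        return seconds <= MAX_VIDEO_SECONDS
    return True  # images and PDFs have no duration limit here


def analyze(path: str, prompt: str, model: str = "gemini-2.5-flash") -> str:
    """Hypothetical helper: upload a file via the Files API and ask Gemini about it."""
    # Imported lazily so the pure helpers above work without the SDK installed.
    from google import genai

    client = genai.Client()  # reads the API key from the environment
    uploaded = client.files.upload(file=path)
    # Video uploads are processed asynchronously; poll until processing ends.
    while uploaded.state is not None and uploaded.state.name == "PROCESSING":
        time.sleep(5)
        uploaded = client.files.get(name=uploaded.name)
    response = client.models.generate_content(model=model, contents=[uploaded, prompt])
    return response.text
```

In use, a caller would run `capability_for("meeting.mp3")` and `within_duration_limit("audio", duration_seconds)` before calling `analyze("meeting.mp3", "Transcribe with timestamps.")`, so oversized media is rejected locally instead of failing at the API.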



Tool Access

This skill is limited to using the following tools:

Bash, Read, Write, Edit

Supporting Assets

references/audio-processing.md

references/image-generation.md

references/video-analysis.md

references/vision-understanding.md

scripts/document_converter.py

scripts/gemini_batch_process.py

scripts/media_optimizer.py

scripts/requirements.txt

scripts/tests/requirements.txt

scripts/tests/test_document_converter.py

scripts/tests/test_gemini_batch_process.py

scripts/tests/test_media_optimizer.py

GitHub Stats

0 forks

Updated 1 month ago
