Coherence Studio Docs
Everything you need to know about screen recording, AI editing, and AI video creation.
Getting Started
Installation
Download Coherence Studio from the GitHub releases page. Builds are available for macOS (Apple Silicon and Intel, signed and notarized), Windows 10+ (signed with Azure Trusted Signing), and Linux (AppImage).
First Launch
On launch you'll see the project browser where you can start a new recording, open an existing project, or create an AI video (Pro). Recent projects are listed for quick access with metadata previews showing model, aesthetic, and scene count.
Settings
AI Provider & Model
Configure your AI provider and model via the settings gear icon. Supported providers:
- OpenAI — GPT-5.4, GPT-5.4-mini (default)
- Anthropic — Claude Sonnet 4.6, Claude Haiku
- MiniMax — MiniMax-M2.7 (primary for video generation)
Each provider requires an API key entered in Settings. The selected model is used for scene planning, composition, and Director chat. Vision-capable models (OpenAI, Anthropic) support image/screenshot uploads for generation.
Music Provider
Choose between MiniMax (default) and ElevenLabs for AI music generation. Both support custom prompts and mood presets.
Narration Voice
Select a narration voice or let the system auto-match based on your brief and aesthetic. Voice selection persists per project.
Screen Recording
Starting a Recording
Click "New Recording" to open the source picker. Select a screen or window to capture. You can also record your webcam as a picture-in-picture overlay.
Multi-Window
Open multiple editor windows with Ctrl+Shift+T (or Cmd+Shift+T on macOS). Each window has its own project, recording session, and save state. You can record one Studio window from another using File → Open Window for Recording.
Audio Capture
Capture system audio and/or microphone input. Audio tracks appear in the editor timeline for independent control.
AI Editing (Free)
AI Auto-Captions
Captions are generated by on-device Whisper, so nothing is uploaded to the cloud, and subtitles are timed to the spoken audio. Four model sizes are available (tiny, base, small, medium); larger models are more accurate but slower.
Smart Trimming
AI analyzes your recording and suggests trims for dead air, loading screens, and silence. Accept or reject each suggestion individually.
Auto-Zoom on Cursor
Cursor telemetry tracks mouse position, clicks, and movement patterns. The editor suggests zoom keyframes that keep viewers focused on the action. Click a suggestion to accept it, or adjust it manually.
AI Narration
Generate a voiceover for your recording. The AI creates natural-sounding narration based on the on-screen content and any captions.
One-Click Polish
Applies zoom, captions, background, and speed ramps in one operation. Takes a raw recording to professional quality in seconds.
AI Video Creation (Pro)
Overview
Paste a website URL or describe your idea. A 13-agent production team generates a complete motion graphics video with custom scenes, transitions, narration, and music.
Input Methods
- URL mode — paste a website URL. The system crawls the site, extracts brand voice + design references, captures screenshots, and generates a video showcasing the product. Scope directives like "only focus on the homepage" are respected.
- Research mode — describe any topic. The AI researches it and creates an explainer/documentary-style video without needing a URL.
- File upload — attach a screenshot or image directly. The AI extracts product details via vision and generates a video from the image content, avoiding website crawling entirely. Requires a vision-capable model (OpenAI or Anthropic).
Video Types
Choose from preset video types that control duration, pacing, and narrative arc:
- Product Teaser — 30-60s high-energy overview
- Explainer — 60-90s deeper walkthrough
- Social Clip — 15-30s for social media
- Documentary — 60-120s editorial style
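As a rough illustration, the presets above can be thought of as a lookup table of duration ranges. The identifiers below (`VideoType`, `VIDEO_TYPE_DURATIONS`, `clampDuration`) are hypothetical, and clamping an out-of-range request is an assumption; the product may handle such requests differently.

```typescript
// Duration ranges come from the preset list; all names here are illustrative.
type VideoType = "product-teaser" | "explainer" | "social-clip" | "documentary";

interface DurationRange {
  minSeconds: number;
  maxSeconds: number;
}

const VIDEO_TYPE_DURATIONS: Record<VideoType, DurationRange> = {
  "product-teaser": { minSeconds: 30, maxSeconds: 60 },
  "explainer": { minSeconds: 60, maxSeconds: 90 },
  "social-clip": { minSeconds: 15, maxSeconds: 30 },
  "documentary": { minSeconds: 60, maxSeconds: 120 },
};

// Assumption: a requested duration outside the preset range is clamped into it.
function clampDuration(type: VideoType, requestedSeconds: number): number {
  const { minSeconds, maxSeconds } = VIDEO_TYPE_DURATIONS[type];
  return Math.min(maxSeconds, Math.max(minSeconds, requestedSeconds));
}
```

For example, `clampDuration("social-clip", 45)` yields 30 under these assumptions.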
Custom Briefs
Type a free-form description in the message input alongside any video type selection. The brief drives aesthetic synthesis, voice selection, and narrative tone. Example: "Create a matrix style background, monospace font, and Morpheus as the narrator."
Production Team (Pro)
13 specialized AI agents run sequentially, each shaping a different aspect of the video. Every agent receives the cumulative output of the agents before it.
| Agent | Role |
|---|---|
| Script Doctor | Punches up headlines and subtitles. Tests: "would I screenshot this and tweet it?" |
| Hook Specialist | Reviews scenes 0-1 (the cold open). Replaces or tweaks if the hook doesn't earn a stop-scroll. |
| Story Writer | Drafts per-scene voice-over narration text aligned with the visual narrative. |
| Stylist | Applies the video aesthetic to every scene — backgrounds, fonts, effects, decorations. |
| Test Audience | Simulates a target persona watching the video. Returns a score, arc analysis, and one actionable fix. |
| Director | Generates motion briefs for hero scenes — metaphor, visual style, motion family, energy level, color palette. |
| Composer | Writes custom Remotion TSX code for each hero scene based on the Director's brief. |
| Motion Director | Reviews each scene against 12 animation principles. Scores craft 1-10 and lists specific violations. |
| Animator | Refines scenes that scored below 9/10 — fixes linear easing, missing anticipation, instant stops, etc. |
| Continuity Editor | Reviews the full sequence for rhythm, variety, and unity. Adjusts transitions and durations. |
| Scene Checker | Lints, dry-renders, and AI-repairs each scene. Catches undefined refs, hook violations, broken JSX. |
| Sound Designer | Places SFX cues (whoosh, impact, shimmer, etc.) at scene transitions and key moments. |
| Narrator | Synthesizes TTS audio for each scene using brief-matched or user-selected MiniMax voices. |
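The sequential hand-off between agents can be sketched as a loop over an accumulating context. This is a minimal sketch, not the actual implementation: the `Agent` and `AgentContext` shapes are hypothetical, and the agents are kept synchronous for brevity, whereas real agents that call a model would presumably be asynchronous.

```typescript
// Hypothetical shapes: the real agent interfaces are not documented here.
interface AgentContext {
  brief: string;
  outputs: Record<string, unknown>; // keyed by agent name, filled as the run progresses
}

interface Agent {
  name: string;
  run(context: AgentContext): unknown;
}

function runProductionTeam(agents: Agent[], brief: string): AgentContext {
  let context: AgentContext = { brief, outputs: {} };
  for (const agent of agents) {
    // Each agent sees everything produced by the agents before it.
    const output = agent.run(context);
    context = { ...context, outputs: { ...context.outputs, [agent.name]: output } };
  }
  return context;
}
```

Building a new context object on each step keeps earlier outputs immutable, so a later agent cannot silently overwrite what an earlier one produced.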
Scene Library
179+ pre-built scene components organized by category. The Composer uses these as building blocks, and you can swap types in the Scene Editor.
Categories
- Themed Systems (33) — complete aesthetic treatments: Cyberpunk, Neon, Holographic, Glassmorphism, Art Deco, Bauhaus, Memphis, Japanese, and more.
- Cinematic Presets (10) — genre-specific: Epic, Sci-Fi, Noir, Anime, Documentary, Action, Horror, Romance, Vintage, Minimal End.
- Background Layers (10) — Aurora, Bokeh, Flowing Gradient, Geometric, Grid, Mesh Gradient, Perspective Grid, Radial, Waves, Noise Texture.
- Text Animations (12) — Neon, Kinetic, Explode, 3D Flip, Scramble, Gradient, Wave, Counter, Glitch, Mask Reveal, Split, Typewriter.
- Logo Reveals (10) — 3D Rotate, Stroke, Neon Sign, Particles, Glitch, Mask Reveal, Light Trail, Morph, Split Screen, Stamp.
- Data Visualizations (8) — Bar Chart, Pie, Line, Donut, Progress, Stat Card, Gauge, Sparkline.
- Liquid Effects (10) — Blob, Ink Splash, Fluid Wave, Swirl, Morph Blob, Calligraphy Ink, Oil Spill, Paint Drip, Splatter, Water Drop.
- Shape Animations (10) — Ripples, Hex Grid, Spinning Rings, Morphing, Circular Progress, Explosion, Helix, 3D Cube, Mandala, Particle Field.
- Layouts (11) — Asymmetric, Diagonal, Frame in Frame, Fullscreen Type, Giant Number, Grid Break, Layered, Multi-Column, Off Grid, Split Contrast, Whitespace.
- Particle Effects (13) — Mist, Light Rays, Stars, Fireflies, Sparks, Embers, Bubbles, Snow, Sakura, Confetti, Fireworks, Lightning, Smoke.
- Visual Effects (10) — Film Grain, VHS, Chromatic Aberration, Glow, Light Leak, Duotone, Kaleidoscope, Depth of Field, Matrix, Noise.
Scene Types (37)
High-level scene templates that the plan generator selects from: hero-text, ghost-hook, data-flow-network, metrics-dashboard, stacked-hierarchy, scrolling-list, echo-hero, outline-hero, cinematic-title, radial-vortex, gradient-mesh-hero, contrast-pairs, word-slot-machine, app-icon-cloud, chat-narrative, countdown, impact-word, typewriter-prompt, and more. Each supports variants (e.g. data-flow-network has circles, timeline-arrows, hex-grid, isometric-blocks, orbital-rings).
Scene Editor
Overview
After AI generation, the Scene Editor lets you fine-tune every aspect. The left panel shows the live Remotion preview; the right panel has tabs for Scenes, Tools, Director, Code, Music, and Narration.
Scenes Tab
Each scene card has collapsible sections for:
- Style — scene type picker (searchable, with best-fit suggestions), variant selector, duration (frames)
- Background — color swatches, custom hex, animated background effects with intensity slider
- Pacing — layer gap timing
- AI Video Clip — attach or generate a MiniMax video clip for this scene
- Layers — per-layer editing with type selector (Text, Button, Card, Carousel, Counter, Progress, Icons, Divider, Lottie, Image, Video, Shape) and timeline controls
Scenes can be reordered, added, deleted, and collapsed. Transitions between scenes are configurable: choose from 23 transition types and override color and duration.
Video Layers
Add video files to any scene as a layer. Select a file via the native picker; WebM recordings are automatically remuxed to fast-start MP4 for smooth playback. Video layers support adjustable start/end frames, and the preview plays the video inline within the scene.
Director Tab
Chat with the Creative Director to refine the video plan. For AI-composed scenes, the Director sends visual instructions to the Composer agent (e.g. "make the headline larger but keep the browser mockup") rather than replacing the entire composition. A status message in the chat shows when the Composer is regenerating visuals.
You can ask for changes to the narrative arc, swap scene types, adjust pacing, add or remove scenes, or attach reference images for visual direction.
Code Tab
View and edit the compiled Remotion composition code directly. Changes are applied to the preview when you click "Apply Changes". AI-composed scenes appear as CustomScene_N components with full access to Remotion, GSAP, and the 179+ components in the Scene Library.
Music Tab
Manage background music: preview the current track, adjust volume, snap scene durations to detected beats, generate new tracks from mood presets or custom prompts, and browse your music library.
Narration Tab
Narration clips are an independent audio track, decoupled from scenes. Deleting or reordering scenes preserves the narration. Each clip shows its text, source scene, and controls for:
- Play preview
- Adjustable start frame position
- Per-clip volume slider
- Delete individual clips
- "Re-sync timing to scenes" button after edits
The narration track also appears as a purple row in the multi-track timeline at the bottom of the editor. Clicking a narration clip in the timeline switches to the Narration tab and scrolls to that clip.
Timeline
The multi-track timeline shows scene blocks, layer rows, SFX cues, narration clips, and music as separate horizontal tracks. Click anywhere to seek, use Ctrl+Scroll to zoom, and middle-click to pan. Clicking a layer or narration clip scrolls the right panel to the corresponding editor.
Aesthetics
Preset Aesthetics
12+ curated visual systems are available in the picker:
- Vox Documentary — paper backgrounds, hand-drawn annotations, red callouts
- Premium SaaS Dark — glassmorphism, Inter font, accent gradients
- Cinematic Noir — deep blacks, film grain, dramatic motion
- Retro CRT Terminal — green/amber on black, scanlines, monospace
- Editorial Magazine — cream paper, oversized serif, generous whitespace
- Neon Cyberpunk — cyan/magenta, glitch effects, perspective grids
- Hand-drawn Explainer — whiteboard, sketchy fonts, draw-on animations
- Pop Art — halftone dots, bold outlines, primary colors
- 80s Synthwave — sunset gradients, chrome text, perspective grids
- Anime — speed lines, dramatic zooms, manga-style emphasis
- Notebook Sketch — grid paper, confident ink lines, torn-edge labels
- Crayola Kids — wax crayon textures, wobbly shapes, primary colors
- Wes Anderson — centered symmetry, pastel palette, Futura type
- Risograph — offset print grain, halftone dots, duotone overlaps
- Memphis Design — bold geometric shapes, squiggles, terrazzo patterns
- Bauhaus — primary colors, geometric forms, grid-locked layouts
Bespoke Synthesis
When no preset is selected, describe any visual style in the message brief. The AI synthesizes a complete VideoAesthetic object: name, colors (bg, text, accent, secondary), font stacks, PixiJS texture preset, motion feel, decorations, and a strict systemNote that every downstream agent must follow. The brief is the source of truth; the style is not matched against the preset catalog.
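To make the shape of a synthesized aesthetic concrete, here is a hedged sketch of what such an object might look like. Only `VideoAesthetic`, `systemNote`, and the four color keys are named in this documentation; every other field name, the sample values, and the `withAestheticNote` helper are assumptions made for illustration.

```typescript
// Field names other than `systemNote` and the color keys are guesses, not the real schema.
interface VideoAesthetic {
  name: string;
  colors: { bg: string; text: string; accent: string; secondary: string };
  fontStacks: string[];
  texturePreset: string; // PixiJS texture preset identifier (value below is invented)
  motionFeel: string;
  decorations: string[];
  systemNote: string; // strict instruction every downstream agent must follow
}

// Illustrative values for the "matrix style, monospace font" brief.
const matrixAesthetic: VideoAesthetic = {
  name: "Matrix Terminal",
  colors: { bg: "#000000", text: "#00ff41", accent: "#008f11", secondary: "#003b00" },
  fontStacks: ["'Courier New', monospace"],
  texturePreset: "scanlines",
  motionFeel: "digital rain, abrupt cuts",
  decorations: ["falling glyph columns"],
  systemNote: "Use monospace type on a black background in every scene.",
};

// Hypothetical helper: put the systemNote first so an agent prompt always carries it.
function withAestheticNote(aesthetic: VideoAesthetic, agentPrompt: string): string {
  return `${aesthetic.systemNote}\n\n${agentPrompt}`;
}
```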
Narration & Music
Narration Voices
8 MiniMax TTS voices, each suited to different content:
| Voice ID | Character |
|---|---|
| English_expressive_narrator | British male, documentary narrator — Vox / 60 Minutes authority |
| English_WiseScholar | British male, conversational scholar — thoughtful TED talk |
| English_Trustworth_Man | American male, sincere and warm — founder pitch, SaaS explainer |
| English_CaptivatingStoryteller | American senior male, cold and detached — noir, Morpheus-like |
| English_PassionateWarrior | American male, energetic and intense — launch hype |
| English_Graceful_Lady | British female, refined and sophisticated — editorial, premium |
| English_CalmWoman | American female, soothing — wellness, meditation |
| English_Upbeat_Woman | American female, energetic — playful, friendly |
Voice selection order: explicit picker → brief-matched AI pick → aesthetic default map → English_expressive_narrator.
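The selection order reads naturally as a fallback chain. The sketch below assumes hypothetical input names (`pickerVoice`, `briefMatchedVoice`, `aestheticName`) and takes the aesthetic default map as a parameter, since its contents are not listed here.

```typescript
const FALLBACK_VOICE = "English_expressive_narrator";

// Hypothetical input shape for illustration.
interface VoiceInputs {
  pickerVoice?: string;       // voice chosen explicitly in the picker
  briefMatchedVoice?: string; // voice the AI matched from the brief
  aestheticName?: string;     // key into the aesthetic default map
}

function resolveVoice(
  inputs: VoiceInputs,
  aestheticDefaults: Record<string, string | undefined>,
): string {
  const aestheticVoice =
    inputs.aestheticName !== undefined ? aestheticDefaults[inputs.aestheticName] : undefined;
  // First defined value wins: picker, then brief match, then aesthetic default, then fallback.
  return inputs.pickerVoice ?? inputs.briefMatchedVoice ?? aestheticVoice ?? FALLBACK_VOICE;
}
```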
Independent Narration Track
Narration clips are stored as an independent audio track, separate from scenes. This means:
- Deleting a scene preserves its narration clips
- Reordering scenes auto-shifts narration timing
- Changing scene duration shifts subsequent narration
- Individual clips can be repositioned, volume-adjusted, or deleted
The Narration tab provides per-clip controls, and the timeline shows narration as a separate purple track row.
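The duration rule above (changing a scene's duration shifts subsequent narration) can be sketched as follows. The `NarrationClip` shape and the choice to shift every clip that starts at or after the edited scene's original end frame are assumptions, not the documented behavior.

```typescript
// Hypothetical clip shape for illustration.
interface NarrationClip {
  id: string;
  startFrame: number;
  text: string;
}

// Shift every clip that starts at or after the edited scene's original end frame.
function shiftNarrationAfter(
  clips: NarrationClip[],
  sceneEndFrame: number,
  deltaFrames: number,
): NarrationClip[] {
  return clips.map((clip) =>
    clip.startFrame >= sceneEndFrame
      ? { ...clip, startFrame: Math.max(0, clip.startFrame + deltaFrames) }
      : clip,
  );
}
```

Clips that start before the edited scene ends are returned unchanged, so narration inside earlier scenes keeps its position.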
Background Music
Background music is AI-generated via MiniMax or ElevenLabs (configurable in Settings). Several generation modes are available:
- Mood presets — Classic SaaS, Startup Energy, Cinematic, Viral, and more
- Custom Mix — select genre, BPM, instruments, and reference style
- Custom prompt — describe any sound in natural language
- Vocals — instrumental, auto-lyrics, or custom lyrics modes
Music is generated in parallel with scene compilation. Beat detection can snap scene durations to the beats detected in the track. Music volume is ducked during narration at export time.
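As a simplified illustration of snapping a scene duration to the music, the sketch below assumes a constant tempo and rounds to the nearest whole beat. The real feature works from detected beats, which may not be evenly spaced; the function name and parameters are hypothetical.

```typescript
// Snap a scene duration (in frames) to the nearest whole number of beats.
// Assumes constant tempo; the product's beat detection may behave differently.
function snapDurationToBeat(durationFrames: number, bpm: number, fps: number): number {
  const framesPerBeat = (60 / bpm) * fps;
  // Never snap to zero beats, so a scene always keeps a positive length.
  const beats = Math.max(1, Math.round(durationFrames / framesPerBeat));
  return Math.round(beats * framesPerBeat);
}
```

At 120 BPM and 30 fps a beat lasts 15 frames, so a 100-frame scene snaps to 105 frames (7 beats).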
Sound Effects
The Sound Designer agent places AI-generated SFX cues at scene transitions and key moments (whoosh, impact, shimmer, click). SFX are generated via the ElevenLabs Sound Effects API and cached by prompt hash. Each cue has adjustable volume and timing in the timeline.
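Caching by prompt hash might look roughly like the sketch below. The hash algorithm (SHA-256), the prompt normalization, the in-memory `Map`, and the `SfxCache` class itself are all assumptions; the `generate` callback stands in for the real API call and is kept synchronous for brevity.

```typescript
import { createHash } from "node:crypto";

// Assumption: prompts are trimmed and lowercased before hashing.
function sfxCacheKey(prompt: string): string {
  const normalized = prompt.trim().toLowerCase();
  return createHash("sha256").update(normalized).digest("hex");
}

// Hypothetical in-memory cache; the real cache may live on disk.
class SfxCache {
  private readonly entries = new Map<string, Uint8Array>();

  getOrGenerate(prompt: string, generate: (prompt: string) => Uint8Array): Uint8Array {
    const key = sfxCacheKey(prompt);
    const cached = this.entries.get(key);
    if (cached !== undefined) return cached; // cache hit: skip generation
    const audio = generate(prompt);
    this.entries.set(key, audio);
    return audio;
  }
}
```

Keying on a hash of the prompt means two cues with the same description reuse one generated sound rather than triggering a second generation request.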
Export
MP4 Export
Export to MP4 (H.264) with quality presets: 720p, 1080p, or source resolution. The export pipeline uses Remotion's server-side rendering for frame-accurate output. Music and narration are mixed into the final audio track with automatic ducking.
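Automatic ducking can be pictured as a per-frame volume decision. This is a minimal sketch under stated assumptions: the `AudioSpan` shape, the volume levels, and the hard switch between them are illustrative, and a real mixer would likely fade between levels rather than jump.

```typescript
// Hypothetical span shape: where a narration clip sits on the timeline.
interface AudioSpan {
  startFrame: number;
  durationFrames: number;
}

// Return the music volume for one frame: lowered while any narration clip is playing.
function musicVolumeAt(
  frame: number,
  narration: AudioSpan[],
  baseVolume = 0.8,   // assumed default, not a documented value
  duckedVolume = 0.25, // assumed default, not a documented value
): number {
  const narrating = narration.some(
    (clip) => frame >= clip.startFrame && frame < clip.startFrame + clip.durationFrames,
  );
  return narrating ? duckedVolume : baseVolume;
}
```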
GIF Export
Export to GIF with configurable options:
- Frame rate — 15, 20, 25, or 30 FPS
- Size presets — Medium, Large, or Original resolution
- Loop toggle — infinite loop or single play
Aspect Ratios
5 output formats: 16:9 (landscape), 9:16 (portrait/reels), 1:1 (square), 4:5 (Instagram), 4:3 (classic). The aspect ratio is applied to both the preview and the export.
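To show how an aspect ratio might translate into pixel dimensions, the sketch below scales each ratio so its shorter side matches a target such as 1080. Treating "1080p" as the short side, rounding to even numbers, and the names used are all assumptions rather than the documented export behavior.

```typescript
// The five ratios listed above, as [width, height] pairs.
const ASPECT_RATIOS = {
  "16:9": [16, 9],
  "9:16": [9, 16],
  "1:1": [1, 1],
  "4:5": [4, 5],
  "4:3": [4, 3],
} as const;

type AspectRatio = keyof typeof ASPECT_RATIOS;

// Compute pixel dimensions whose shorter side equals `shortSide`.
function dimensionsFor(ratio: AspectRatio, shortSide: number): { width: number; height: number } {
  const [w, h] = ASPECT_RATIOS[ratio];
  const scale = shortSide / Math.min(w, h);
  // H.264 with 4:2:0 chroma subsampling needs even dimensions, so round to the nearest even number.
  const toEven = (n: number): number => Math.round(n / 2) * 2;
  return { width: toEven(w * scale), height: toEven(h * scale) };
}
```

Under these assumptions, `dimensionsFor("9:16", 1080)` gives 1080 by 1920 and `dimensionsFor("4:5", 1080)` gives 1080 by 1350.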
External Assets
During export, audio and video files referenced via the studio:// protocol are automatically resolved to local file paths. The export is fully self-contained with no external dependencies.
Keyboard Shortcuts
Editor Shortcuts
| Action | Default |
|---|---|
| Add Zoom | Z |
| Add Trim | T |
| Add Speed | S |
| Add Annotation | A |
| Add Keyframe | F |
| Delete Selected | Ctrl+D |
| Play / Pause | Space |
| Toggle Play | K |
| Export | Ctrl+Shift+E |
| Toggle Captions | Ctrl+C |
| Toggle Cursor | Ctrl+U |
| Zoom In | Ctrl+= |
| Zoom Out | Ctrl+- |
Fixed Shortcuts
| Action | Key |
|---|---|
| Undo | Ctrl+Z |
| Redo | Ctrl+Shift+Z / Ctrl+Y |
| Cycle Annotations Forward | Tab |
| Cycle Annotations Backward | Shift+Tab |
| Delete Selected (alt) | Del / Backspace |
| Pan Timeline | Ctrl+Shift+Scroll |
| Zoom Timeline | Ctrl+Scroll |
The shortcuts in the Editor Shortcuts table are defaults and can be reconfigured in Settings; the fixed shortcuts cannot be changed. On macOS, Ctrl maps to Cmd.
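The Ctrl-to-Cmd mapping can be expressed as a small conversion over the shortcut strings used in these tables. The function name is hypothetical, and the `"darwin"` check follows Node's `process.platform` value for macOS; how the app itself detects the platform is not documented here.

```typescript
// Convert a documented shortcut such as "Ctrl+Shift+E" to its macOS form.
function toPlatformShortcut(shortcut: string, platform: string): string {
  if (platform !== "darwin") return shortcut;
  return shortcut
    .split("+")
    .map((key) => (key === "Ctrl" ? "Cmd" : key))
    .join("+");
}
```

For example, `toPlatformShortcut("Ctrl+Shift+E", "darwin")` returns `"Cmd+Shift+E"`, while the same call with `"win32"` returns the shortcut unchanged.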
Window Shortcuts
| Action | Key |
|---|---|
| New Window | Ctrl+Shift+T |
| Save Project | Ctrl+S |
| Save As | Ctrl+Shift+S |
| Open Project | Ctrl+O |
Project Files
.studio Format
Projects are saved as .studio files (JSON). They contain the full editing state: recording path, timeline edits, annotations, zoom keyframes, speed ramps, captions, scene plan data, AI composition code, narration clips, and generation metadata. The video file itself is referenced by path, not embedded. Legacy .lucid files are still supported.
Auto-Save
Projects auto-save after generation and on significant edits. Auto-saved files are stored in ~/AppData/Roaming/coherence-studio/recordings/ (Windows) or the equivalent app data directory on macOS/Linux.
Recent Projects
The project browser tracks recent files. The recent-files list is written atomically (temp file + rename) to prevent corruption from concurrent saves.
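The temp file + rename pattern can be sketched with Node's `fs` module as shown below. The file name `recent-projects.json`, the temp-file naming scheme, and the `writeJsonAtomic` helper are hypothetical; only the general technique comes from the description above.

```typescript
import { mkdtempSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Write JSON to a temp file in the same directory, then rename it over the target.
function writeJsonAtomic(targetPath: string, data: unknown): void {
  const tempPath = join(dirname(targetPath), `.${process.pid}-${Date.now()}.tmp`);
  writeFileSync(tempPath, JSON.stringify(data, null, 2), "utf8");
  // rename is atomic on POSIX filesystems when both paths share a filesystem.
  renameSync(tempPath, targetPath);
}

// Usage: write a recent-files list into a scratch directory and read it back.
const scratchDir = mkdtempSync(join(tmpdir(), "recent-"));
const recentPath = join(scratchDir, "recent-projects.json");
writeJsonAtomic(recentPath, [{ path: "demo.studio" }]);
const roundTrip = JSON.parse(readFileSync(recentPath, "utf8"));
```

Because readers only ever see the old file or the fully written new one, a crash or a second save in the middle of a write cannot leave a half-written list behind.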