Walkthrough · 10 minute read
How Melted Crayons TV Works
From a single photo to a fully narrated story — comic, graphic novel, or storyboard. Here's the path most creators take, in the order it makes sense to learn it.
Section 01
Characters & Voices
Every story starts with characters. Upload a photo once, and the AI uses it as a visual reference across every panel — so the hero in panel 1 looks like the hero in panel 6, regardless of art style.
Uploading a reference photo
Go to Studio → Characters → + New Character. Choose a clear photo of yourself, your kid, your pet, anyone you want to star in your stories.
- Best results: a front-facing or three-quarter photo with the face clearly visible
- Avoid: sunglasses, heavy shadows, multiple people in one shot, low-resolution images
- Photos are automatically resized before upload, so high-res phone photos work fine
- The AI will extract the subject onto a transparent background — you'll see a preview before saving
No photo? Generate one from a description
In the character creator, switch the source toggle to ✨ Generate from Description. Type what you want — “a young girl with curly red hair, freckles, green eyes, wearing a yellow raincoat” — and the AI produces a clean reference portrait you can use as the character's identity anchor.
- Best for: fictional or anonymous heroes — dragons, robots, space pirates, cartoon kids, made-up characters of any kind
- Honest tradeoff: real photos preserve identity slightly better across many panels. Generated characters are still very consistent — just not pixel-perfect for real-world resemblance
- Click ✨ Try Again to regenerate if the first result isn't quite right. Each generation uses 1 credit — same as creating a panel
- The generated image flows through the same background-removal step as uploaded photos. Once extracted, the character behaves identically downstream
Background removal happens automatically
The moment you upload a photo or generate one from a description, the AI starts extracting the subject onto a transparent background — no extra button to click. You'll see “Removing background” under the Save button while it runs. The extracted cutout shows next to the source when it's done; the Save button activates as soon as both are ready.
- Runs in your browser — your photo never leaves your device for the extraction step. Privacy bonus, and zero per-extraction cost.
- The first time you do this, the AI model downloads (~30 MB, cached after that). Subsequent extractions take 1–3 seconds.
- If the result looks off, click Retry next to the preview to re-run the extraction against the same source.
Naming and panel defaults
Give your character a name. For people-style characters you can also set a default pose (standing, action, sitting, running, flying) and default expression (neutral, happy, angry, etc.). When you add this character to a new panel, those defaults pre-fill the panel's pose and expression dropdowns — override per panel if you want a different action.
Voice selection also lives on the character (see below) so the same character speaks with the same voice across every story they appear in.
Voice selection
Every character automatically gets a distinct speaking voice — matched to their gender when you set it — so they talk in their own voice the moment they're cast, with no setup. Want a specific voice? Pick one and it overrides the auto-choice everywhere that character speaks.
- Filter by gender — quick toggle row above the dropdown (All / Female / Male / Any-Neutral) so you can narrow to voices that match your character's gender
- Voices are also grouped by vibe — narrator, young, mature, hero, villain, character, announcer
- Each option shows the gender label inline (e.g. [Male] Earnest Young Man) so it's scannable even without the filter
- Click the play icon next to a selected voice to preview it before committing
- Auto-assigned, not silent: a character with no hand-picked voice still speaks — we give them a distinct, gender-fitting voice (and keep different characters sounding different). Set the character's Gender on its design page to guide the match.
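For the curious, the auto-assignment rule described above could be sketched like this. It is only an illustration under stated assumptions: the function name, the tuple shapes, and every voice id are made up, and the real matching logic may weigh other things (such as vibe). The sketch only shows the three stated behaviors: a hand-picked voice wins, the auto-choice fits the character's gender, and different characters get different voices while unused ones remain.

```python
from typing import Optional


def assign_voices(
    characters: list[tuple[str, Optional[str], Optional[str]]],
    catalog: list[tuple[str, str]],
) -> dict[str, str]:
    """Map character name -> voice id.

    characters: (name, gender or None, hand-picked voice id or None)
    catalog:    (voice id, gender label) pairs; all ids are hypothetical
    """
    assigned: dict[str, str] = {}
    # Hand-picked voices always win, so reserve them before auto-assigning.
    taken = {picked for _, _, picked in characters if picked}
    for name, gender, picked in characters:
        if picked:
            assigned[name] = picked
            continue
        # With no gender set, every voice in the catalog is a candidate.
        matching = [v for v, g in catalog if gender is None or g == gender]
        # Prefer an unused gender-fitting voice; reuse one only if none is left.
        choice = next((v for v in matching if v not in taken), None)
        if choice is None:
            choice = matching[0] if matching else catalog[0][0]
        assigned[name] = choice
        taken.add(choice)
    return assigned
```

With a small catalog, characters only start sharing a voice once every fitting voice is already in use.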
Products & brands as characters
Characters aren't limited to people. You can also create Products (like a coffee cup or a sneaker) or Brands (a logo or mascot). The AI preserves their exact appearance the same way it preserves a face. Useful for product storytelling, ads, branded narratives, or storyboarding a commercial before you shoot it.
Section 02
Style, Layout & Aspect
Before you create panels, you set the look of your story. These choices apply across every panel for visual consistency, but you can override any of them per panel later.
Choosing an art style
Melted Crayons TV ships with 30+ art styles — comic book, anime, watercolor, LEGO, Ghibli-inspired, claymation, pixel art, noir, pop art, cyberpunk, and more. Each style has its own visual language, and the AI is trained to render in it consistently. The style choice often defines the format too: ink and halftone reads as a graphic novel, photoreal as a storyboard, comic-book as a Saturday-morning strip.
- Storybook pick: “storybook” for bedtime stories or “comic” for classic sequential art
- Show-stopping pick: “LEGO” or “Ghibli” — these styles are instantly recognizable and feel premium
- Brand / pre-viz pick: “cinematic” or “photorealistic” for product marketing or storyboarding a shoot before you film it
- You can change the style mid-story, but the AI works hardest to stay consistent when you keep one style throughout
Aspect ratio (panel shape)
Pick the shape of each panel. Eight ratios, grouped by orientation:
- Landscape — 16:9, 3:2, 4:3 — widescreen cinematic, photo framing, and classic comic-book proportions
- Square — 1:1 — ideal for Instagram feed, Facebook, and X
- Portrait — 4:5, 2:3, 3:4 — vertical fill for Instagram feed, Pinterest / manga, and tablet / PDF reading
- Tall — 9:16 — full vertical for TikTok, Reels, Shorts, and IG Stories
You can mix aspect ratios within a single story — a wide establishing shot, then a tight portrait close-up, then a square action panel. Each one is independent.
Resolution and quality
Panels render at 2K, sharp enough for screen and most prints. Your plan determines whether exports carry a watermark and which commercial-use license applies. Video export quality also steps up with your tier (1080p → 1440p → 4K); more on that in the Save as Video section below.
Section 03
Panel Creation
This is the core loop. Every panel goes through four moves: write the scene, cast your characters, add the text, and refine until it's right. (Or skip ahead with the AI Story Writer, below.)
The AI Story Writer — generate a whole story
Don't want to build panel-by-panel? Open the Story Writer and the AI drafts the entire sequence for you — scene descriptions, character casting, poses, captions, and dialogue — then drops them in as real, editable panels you can refine like any other. There are two modes:
- Interpret a premise — describe your story in plain language (“a knight befriends the dragon he was sent to slay”) and the AI writes an original story from it, inventing natural dialogue and a full arc.
- Keep my exact words — paste a script or song lyrics and the AI preserves your text verbatim as the captions and dialogue, only generating the visuals around it. Great for turning a screenplay or a song into a comic without the AI rewriting your words.
- It creates up to 15 panels per run — for a longer story, run it again to continue.
- The draft is a starting point: every scene, line, and bubble is fully editable afterward, and you can regenerate any panel.
- After it runs, a Generate All Panels button appears so you can render every panel's art in one go.
The Scene — describing what's happening
The scene prompt is a short description of what the panel shows. Write it like you'd describe a movie shot to a friend.
“A dragon flies low over a glowing forest at dusk. Mist curls between the trees. A castle rises in the distance.”
- Be specific about lighting and mood: “dusk,” “rain-soaked,” “cozy interior,” “harsh noon sun”
- Describe the camera: “low-angle shot,” “close-up,” “wide establishing view”
- Don't describe the characters here — the AI handles them via your reference photos. Just describe the world they're in.
- You can also pick a scene preset from the dropdown (forest, castle, city, etc.) instead of writing your own
Cinematography (optional but powerful)
For more cinematic shots, pick from the four cinematography fields:
- Shot: close-up, medium, wide, establishing, etc.
- Camera angle: low-angle (heroic), high-angle (vulnerable), dutch angle (off-balance), eye-level (neutral)
- Lighting: golden hour, neon noir, soft daylight, moody shadows
- Mood: joyful, tense, melancholic, epic
These add cinematic language to your prompt without you having to write it manually. Especially useful when you want a panel to feel like a movie still, not a generic illustration.
Casting — adding characters to the scene
Click + Add Character to drop one of your created characters into the panel. You can:
- Add multiple characters per panel (up to your panel limit)
- Drag each character on the canvas to set their position
- Set per-panel pose (running, sitting, fighting) and expression (smiling, surprised, angry)
- If a character has a default pose set, it pre-fills here — you can override
The AI uses each character's reference photo for identity preservation, so they'll look like the same person across every panel even though the AI is generating fresh images.
Text — captions and dialogue
Two kinds of text overlay:
- Captions: narration boxes — usually placed at the top or bottom of a panel. Style options include narration (default), thought, action, and location stamps.
- Dialogue: speech bubbles tied to a character. Choose between speech (default), thought (cloud-style), shout (loud emphasis), and whisper (italic).
For both, you can dial:
- Position: drag anywhere on the panel
- Width: 10–90% of panel width — controls wrapping
- Font size, family, and color
- Corner radius: sharp rectangle, soft round, or full pill — match the comic-book style you're going for
- Tail direction (dialogue only): point to who's speaking
Click the speaker icon on a panel to hear the captions and dialogue read aloud using each character's voice.
Sound FX stickers — BOOM, POW, ZAP
Add classic comic sound-effects from the Sound FX section. Pick a style, type your word, and it drops onto the panel — then drag to place it, and tune the size, angle, and colors.
- Six styles: Impact, Electric, Whoosh, Soft, Kapow, and Heavy — each with its own lettering, burst shape, and palette.
- Your word, your colors: type any text and recolor the fill and burst per sticker (or keep the preset palette).
- Place it precisely: drag to position, then slide for size and rotation — angled FX look great breaking across the action.
- Free: FX stickers don't use a generation credit, and they burn into every export (PDF, social, and video) exactly as you place them.
Animate a panel — bring it to life
On any rendered panel, hit Animate (next to Recast / Refine) and the AI turns the still into a short, silent moving clip — gentle parallax, characters breathing, the scene coming alive. Your narration plays over it, so per-character voices and translations are preserved.
- Preview + pick your take: the clip plays right in the studio. Hit Re-animate for a different take — they're kept in a short history, and you use ◀ ▶ to choose which one to keep.
- Download it: grab the clip as an MP4 from the preview card.
- Paid feature: animation is a premium AI action — it costs several generation credits per clip (video generation is far heavier than a still) and is available on paid plans.
- Mix freely: animate just your best panels — the rest stay as stills. The reader shows a ▶ Animated badge on panels that have a clip.
Lip-sync — make a character actually speak
When a panel has dialogue from one character, hit Lip-sync (next to Animate) and that character's mouth, head, and expression move in time with their lines — performed in their own assigned voice. Animate brings a scene to life with silent motion; Lip-sync makes the character talk.
- When to use which: a single speaker talking → Lip-sync. Scenery, action, or a panel with several speakers → Animate (every voice still narrates over the motion).
- One speaker per panel, for now: Lip-sync targets a single speaking character. Panels with two or more speakers use Animate instead — all their voices are still heard as narration. (True multi-character lip-sync arrives once the models support it.)
- Captions come first: if the panel also has a narrator caption, in Watch Mode the narrator reads it, then the character speaks.
- Take history + preview: review the clip in the studio and re-run for a different performance, just like Animate.
- Paid feature: like animation, lip-sync is a premium AI action billed in generation credits, with a short per-panel dialogue length cap.
Regeneration — making it right
The first generation rarely lands perfectly. You have three tools to iterate, plus undo and redo for stepping between versions:
- Generate — full regeneration with the same prompt. Good if a panel just landed weird and you want a fresh attempt.
- Refine — type a text edit instruction (e.g. “make the dragon larger” or “change the lighting to morning”) and the AI edits the panel without redoing it from scratch.
- Recast (the gradient button with the refresh-halo icon) — keeps the scene, background, and lighting exactly as they are, but generates fresh character poses and expressions. Different take, same set. Each click rolls a different angle, gesture, and energy so successive recasts feel distinct.
- Undo / Redo — every regeneration is saved. Step back through previous versions or step forward again with the arrows.
Visual cues built into the editor
A few small affordances in the studio that help you see what you've filled in and what each control does:
- Tab progress dots — small green dots appear next to Scene, Characters, and Text when those tabs have content. Quick way to see what's done at a glance.
- Helper captions under each action button — Generate (“Build a fresh panel”), Recast (“Try new poses”), Refine (“Edit with text”) — so the three regen options are easy to tell apart.
- Disabled-Generate hint — when the Generate button is greyed out, a small message points to the editor where you need to type your scene description.
- Story Settings auto-collapse — settings (engine, aspect, narrator, art style, layout) hide behind a gear button so they don't crowd the workspace once you've set them.
Render words inside the image (in-scene text)
Below the canvas there's a toggle labeled “Render words inside the image”. This is for text that should appear inside the artwork — signs, posters, billboards, sound effects baked into the scene.
- Use this for: a street sign saying “BEWARE,” a billboard reading “GOTHAM TIMES,” a t-shirt graphic, a chalkboard, a book cover in the scene
- Don't confuse with captions, dialogue, or FX stickers — those are all overlays drawn on top of the panel (and stay crisp/editable). This toggle is for words rendered by the AI inside the artwork itself. For a punchy “BOOM” you can move and recolor, use an FX sticker; use this toggle only when the text should be part of the painted scene (a sign, a poster).
- Modern image engines render text fairly cleanly — works best with short phrases (1–4 words)
Section 04
AI Storytelling
A few things happen behind the scenes that are worth knowing — they explain why the app behaves the way it does and how to get the most out of it.
Cross-panel style consistency
When you generate panel 2 and beyond, the AI looks at panel 1 to understand your story's exact visual language — palette, lighting, brushwork, mood — and matches it. This is why your story feels like a coherent piece, not a collection of random AI images.
The technical name is style anchoring. It works automatically. Your job is to make sure your first panel looks the way you want the rest of the story to look.
- Regenerate the source panel and the anchor refreshes automatically — if you redo panel 1 to a new look, later panels you generate will pick up the new style anchor instead of staying locked to the old one.
- Existing panels stay in their original style until you regenerate them. Style changes don't retroactively re-render finished panels.
- Changing the story's default art style (in story settings) clears the cached anchor and warns you — your existing panels keep their look, new panels follow the new style.
Character identity preservation
Your character reference photos are passed to the AI on every single panel where that character appears. The AI is instructed to reproduce their face, features, hair, and skin tone precisely. This is what lets the same hero appear across many panels and styles without slowly morphing into someone else.
The better your reference photo (clear, well-lit, front-facing), the better the consistency. A blurry side profile gives the AI less to work with.
The AI engines
Melted Crayons TV uses two AI image engines under the hood:
- Gemini Nano Banana 2 — Google's latest. Excellent character consistency and fine-detail rendering.
- SeedDream 5.0 — ByteDance's engine. Strong stylistic range and fast generation.
You can switch between them in the story settings. Both are high-quality; sometimes one nails a specific style or character better than the other. If a panel feels off, try the other engine.
Narration and voice
Narration is text-to-speech with natural prosody. When you click the speaker on a panel, the caption text and dialogue text are voiced in order, each line using its character's assigned voice (or your story's default narrator for captions and any dialogue without a per-character voice).
- Cartesia voices — fast, natural, available on every plan including free.
- ElevenLabs voices — premium voice acting with richer emotional range, unlocked on Creator and Studio plans. Marked in the picker with a 🔒 badge for free / Storyteller users so you can see what's available.
- Filter by gender — quick toggle row above the voice dropdown narrows to female / male / neutral so you can match a character without scrolling every voice.
- Conversational pauses — the player adds natural beats between clips: a longer breath when a different character takes their turn, a shorter pause when the same speaker continues. Feels like reading aloud, not reading a script.
- Voice leveling — every clip runs through a gentle compressor before playback so a quiet narrator and a shouty villain sit at the same loudness. No more reaching for the volume knob between speakers.
- Lead-in beat — playback waits about eight-tenths of a second before the first word so the start doesn't feel rushed.
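The pacing rules above (a lead-in beat, a longer pause on a speaker change, a shorter one when the same speaker continues) can be sketched as a small scheduling function. This is a rough illustration, not the player's actual code: only the 0.8-second lead-in comes from the description above, and the two gap lengths are placeholder values chosen for the example.

```python
LEAD_IN_S = 0.8           # "about eight-tenths of a second", per the text above
SAME_SPEAKER_GAP_S = 0.3  # placeholder value, not from the product
NEW_SPEAKER_GAP_S = 0.6   # placeholder value, not from the product


def clip_start_times(clips: list[tuple[str, float]]) -> list[float]:
    """Given (speaker, duration_seconds) clips in playback order, return start times."""
    starts: list[float] = []
    cursor = LEAD_IN_S
    previous_speaker = None
    for speaker, duration in clips:
        if previous_speaker is not None:
            # Longer breath when a different character takes their turn.
            same = speaker == previous_speaker
            cursor += SAME_SPEAKER_GAP_S if same else NEW_SPEAKER_GAP_S
        starts.append(cursor)
        cursor += duration
        previous_speaker = speaker
    return starts
```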
Listen Mode — narrated read-along (published stories)
Beyond reading the static comic, published stories can be enjoyed in two more ways: Listen and Watch. Hit Listen to read the comic top to bottom with full narration: the player walks through panels in order, voicing every caption and line of dialogue with the same per-character voices from the studio. Listen works on every narrated story.
- Page-hold timer: pick how long the player lingers on each panel in addition to its audio (1, 2, 3, or 4 seconds). Half the time sits before the audio starts and half after it finishes, so the listener has a breath to take in the art on both ends.
- Page-flip pause — a longer beat between panels than within a panel, so panel transitions feel deliberate.
- Same voice leveling as the studio preview — a balanced listen across an entire story.
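As a worked example of the page-hold split, here is a minimal sketch of the arithmetic. The function name and return shape are assumptions made for illustration; the allowed hold values and the half-before, half-after split come from the description above.

```python
ALLOWED_HOLDS_S = (1, 2, 3, 4)


def panel_timing(audio_duration_s: float, hold_s: int) -> tuple[float, float]:
    """Return (audio start offset, total time on the panel), both in seconds."""
    if hold_s not in ALLOWED_HOLDS_S:
        raise ValueError(f"hold must be one of {ALLOWED_HOLDS_S}")
    half = hold_s / 2
    # Half the hold sits before the audio, the other half after it.
    return half, half + audio_duration_s + half
```

For a panel with 6 seconds of narration and a 2-second hold, the audio starts 1 second in and the panel stays on screen for 8 seconds in total.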
Watch Mode — the animated episode (Melted Crayons TV)
If a story has animated panels, a Watch button appears alongside Listen. Watch Mode plays the story full-screen as a narrated animated episode — each animated panel plays its motion clip while the narration carries the audio, auto-advancing panel to panel. Panels you haven't animated show their still (held for the narration), so a partly-animated story still plays start to finish.
- Read, Listen, or Watch. The static comic is always there to read; Listen narrates it; Watch brings the animated panels to life. They're three ways into the same story.
- Translated narration carries over — switch language and Watch Mode speaks it in that language too.
- Two front doors: /stories is the Listen feed — narrated comics for ears-up readers. /tv (Melted Crayons TV) is the Watch feed — animated episodes only. Same library, two different vibes; readers pick the experience they're in the mood for.
Translate your story for a global audience
On your own published story, open the language manager to generate a full translation — every caption, every line of dialogue, the title, and the description — into any supported language. Readers then get a language switcher on the story page and can read it in their own tongue.
- The art is shared, only the words change — so translating is fast and doesn't re-generate any panels or cost a generation credit.
- 10 languages today — Spanish, Portuguese (Brazil), French, German, Italian, Japanese, Korean, Chinese (Simplified), Hindi, and Arabic.
- Translated narration — after translating the text, hit Narrate next to a language and the AI generates a full audiobook in that language, using your story's same voices. Readers who switch language hear it spoken in their tongue, not just read it. (Until you generate it, the player falls back to the English audio.)
- Translation flows into video export — when a translation's narration is ready, the Save as Video dialog grows a Narration language picker. Choose Spanish and the exported MP4 ships with Spanish voices baked in — perfect for posting one video per region to YouTube or TikTok.
- Better discovery — each translation gets its own search-engine language tag, so the right version can surface for readers in each region.
Reading order: position controls playback
The audio reads your captions and dialogue in the order they're positioned on the panel — top to bottom, then left to right within a row. There's no separate sequence to manage; just drop each bubble where you want a reader's eye to land first.
- Numbered badges on every non-empty bubble show the read order at a glance — cyan for captions, magenta for dialogue.
- Drag to reorder — moving a bubble higher, lower, left, or right updates its number instantly.
- Two bubbles within ~10% of the panel height read as a row — a slight vertical drift between them won't flip the order, matching how the eye actually groups them.
- Empty bubbles (no text yet) don't take a number and aren't spoken.
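The ordering rule above can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the `Bubble` type, its fields (positions as fractions of the panel size), and the exact way rows are grouped are assumptions. Only the top-to-bottom then left-to-right rule, the roughly 10% row tolerance, and the skipping of empty bubbles come from the description above.

```python
from dataclasses import dataclass

ROW_TOLERANCE = 0.10  # ~10% of panel height, per the rule above


@dataclass
class Bubble:
    text: str
    x: float  # horizontal position, 0.0 (left) to 1.0 (right); hypothetical field
    y: float  # vertical position, 0.0 (top) to 1.0 (bottom); hypothetical field


def reading_order(bubbles: list[Bubble]) -> list[Bubble]:
    """Return non-empty bubbles top to bottom, then left to right within a row."""
    spoken = [b for b in bubbles if b.text.strip()]
    spoken.sort(key=lambda b: b.y)

    rows: list[list[Bubble]] = []
    for bubble in spoken:
        # Assumption: a bubble joins the current row when it sits within the
        # tolerance of that row's topmost bubble.
        if rows and bubble.y - rows[-1][0].y <= ROW_TOLERANCE:
            rows[-1].append(bubble)
        else:
            rows.append([bubble])

    return [b for row in rows for b in sorted(row, key=lambda b: b.x)]
```

The numbered badge for each bubble would then simply be its index in the returned list plus one.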
When audio re-renders vs. plays from cache
Audio is generated on demand and cached. Re-listens are free — audio streams straight from Cloudinary. The audio re-renders automatically when:
- You edit the text of any caption or dialogue
- You drag a bubble to a new position (re-orders playback)
- You change a character's voice
- You change the story's default narrator voice
Re-renders count against your monthly generation limit, same as a panel re-generation.
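One common way to get this kind of behavior is to derive the cache key from everything that affects the rendered audio, so any change produces a new key and a fresh render. The sketch below is an assumption about how that could look, not a description of the app's real caching: the function name and the `text` / `voice` field names are hypothetical.

```python
import hashlib
import json


def narration_cache_key(lines: list[dict], narrator_voice: str) -> str:
    """Hash everything that affects a panel's rendered audio.

    `lines` holds the panel's captions and dialogue in playback order; each
    entry has "text" and an optional "voice" (hypothetical field names).
    Lines without their own voice fall back to the story's narrator.
    """
    payload = [
        {"text": line["text"], "voice": line.get("voice") or narrator_voice}
        for line in lines
    ]
    encoded = json.dumps(payload, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()
```

In this sketch an edit to the text, a reorder, a new character voice, or a new narrator voice all change the key, while a re-listen with nothing changed hits the same key. Dragging a bubble only changes the key here when it changes the playback order.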
Why it sometimes takes a moment
Generating a single panel involves resizing your reference photos, sending them to the AI, prompt construction, image generation (~10–30 seconds), and post-processing. Most panels complete within 30 seconds. If a panel takes longer, the system shows a slow-generation banner and the option to cancel and retry.
If a generation fails, you don't pay for it — only successful panels count toward your monthly limit.
What counts as a generation
Most paid actions in the studio share the same generation counter — your monthly plan cap covers all of them collectively:
- Generating a panel — 1 generation
- Recast (try new poses on a panel) — 1 generation per click
- Refine (text-edit a panel) — 1 generation
- Generate-from-description in the character creator — 1 generation per Try Again
- Panel narration (TTS) — 1 generation per panel that gets voiced
- Uploading a character photo — free (uses cheaper background-removal, not full image generation)
When you hit your monthly cap, prepaid credit packs cover the overflow. When credits run out too, the upgrade prompt opens.
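The fallback order described above (monthly cap first, then prepaid credits, then the upgrade prompt) can be sketched as follows. This is an illustration only: the `Account` type and `charge` function are hypothetical, and the sketch assumes an action is paid entirely from one balance rather than split across both. Since failed generations are not billed, `charge` would be called only after an action succeeds.

```python
from dataclasses import dataclass


@dataclass
class Account:
    monthly_cap: int
    used_this_month: int = 0
    prepaid_credits: int = 0


def charge(account: Account, cost: int = 1) -> str:
    """Charge one successful action and return which balance paid for it."""
    if account.used_this_month + cost <= account.monthly_cap:
        account.used_this_month += cost
        return "plan"
    if account.prepaid_credits >= cost:
        account.prepaid_credits -= cost
        return "credits"
    # Neither balance can cover the action: nothing is deducted.
    return "upgrade_prompt"
```

The `cost` parameter is there because some actions, such as animating a panel, use more than one generation.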
Exporting your story
Once your panels are done, click Export to download your story:
- PDF — full storybook with cover page, all panels, and dialogue baked in
- PNG — individual panels at high resolution
- Instagram, TikTok, Twitter, Facebook — pre-formatted for each platform with captions and dialogue burned in
- Save as Video (MP4) — a real narrated story video, ready for YouTube, TikTok, Reels, or Shorts. Two flavors: Story Video (the comic as a narrated slideshow) and Animated Video (your animated panels in motion + lip-sync, the same thing Watch Mode plays). Details below.
Save as Video — Story or Animated MP4 export
From any published story, choose Export → Save as Video. Your browser builds a real MP4 file on the spot — H.264 video, AAC audio, plays in every share target including iOS Safari, no server-side render farm needed. You get two options: Save as Story Video (your comic as a narrated slideshow — every panel a still) and, if any of your panels are animated, Save as Animated Video (the same animated episode Watch Mode plays, with motion and lip-sync baked into the file). Both ship with full narration; pick the one that fits the platform you're posting to.
- Matches your story's shape — the output frame uses your panels' aspect ratio automatically. 16:9 stays widescreen, 9:16 exports vertical for TikTok / Reels / Shorts, 1:1 for Instagram feed, 4:3 for classic comic-book.
- Resolution scales with your plan — Storyteller exports at 1080p (Full HD), Creator steps up to 1440p (2K), and Studio renders full 4K. Higher tiers get a higher bitrate too, so text and bubble edges stay crisp.
- Captions and dialogue burned in — the rendered video uses the same styling as the public reader, so what listeners see matches what they'd see on the story page.
- Voice leveling included — the same compressor the audiobook player uses bakes into the MP4, so loudness is balanced across speakers in the final file.
- Encodes faster than real time — a 90-second story exports in a few seconds rather than 90 seconds. The whole pipeline runs offline in the browser, so timing stays locked: audio and panel transitions never drift.
- Browser support: Chrome, Edge, Arc, and modern Chromium on desktop. The button hides on browsers that don't support the underlying WebCodecs APIs (older Safari, older Firefox).
- Animated export ships your animations: pick Save as Animated Video and the MP4 bakes in motion clips and lip-sync exactly like Watch Mode — no second tool needed. Animated render takes longer than the Story Video (each animated panel is seeked frame-by-frame), but it stays browser-only and offline. Story Video stays the right choice when you want a quick, lightweight, every-panel-a-still narrated MP4.
- Pick the narration language: if your story has translated audio tracks, the export dialog shows a Narration language picker. Whatever you choose applies to both Story and Animated, so you can ship one MP4 per region (the filename gets a -es, -fr, etc. suffix so they stay distinguishable in your Downloads folder).
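To make the export settings above concrete, here is a small sketch of how the frame size and filename could be derived. It rests on assumptions: it treats the plan's label (1080p, 1440p, 4K) as the length of the frame's shorter side, rounds to even pixel counts (a common requirement for H.264 encoders), and uses made-up function names and a made-up base filename. The actual exporter may choose dimensions differently.

```python
from typing import Optional

# Assumption: the plan's label is the frame's shorter side, in pixels.
SHORT_SIDE_BY_PLAN = {"storyteller": 1080, "creator": 1440, "studio": 2160}


def export_frame_size(aspect: str, plan: str) -> tuple[int, int]:
    """Return (width, height) for an aspect ratio like "16:9" and a plan name."""
    w_ratio, h_ratio = (int(part) for part in aspect.split(":"))
    short = SHORT_SIDE_BY_PLAN[plan]
    if w_ratio >= h_ratio:
        width, height = short * w_ratio / h_ratio, float(short)
    else:
        width, height = float(short), short * h_ratio / w_ratio
    # Round each side to the nearest even number of pixels.
    return round(width / 2) * 2, round(height / 2) * 2


def export_filename(base: str, language: Optional[str] = None) -> str:
    """Append a language suffix such as -es or -fr when a translation is chosen."""
    suffix = f"-{language}" if language else ""
    return f"{base}{suffix}.mp4"
```

Under these assumptions a 16:9 story on Storyteller exports at 1920×1080, and a 9:16 story on Studio at 2160×3840.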
What's coming next
A few features are on the roadmap based on early-tester feedback:
- Collaborative editing — invite friends, family, or your team to a shared story so multiple people can build it together, like a shared Google Doc for stories. Click Share → Invite Collaborators to get notified when it ships.
- Intros, transitions, and outros for the video export — title cards, panel transitions, and an end card you can brand. The MP4 export already ships today; this layer adds polish on top.
- Custom voices — design your own narrator voice from a short audio sample (Creator tier perk).
Ready to start?
The fastest way to learn is by making one.