Seedance 2.0 Image-to-Video Tutorial (2026)
Seedance 2.0’s image-to-video capability is one of its most powerful features. Instead of describing a scene from scratch with text, you upload a still image and tell the AI exactly how to bring it to life — what moves, how the camera behaves, and what style to apply. The result is a 4-15 second video at up to 2K resolution that preserves the composition, colors, and details of your original image while adding natural, cinematic motion.
This tutorial walks you through the entire image-to-video workflow, from preparing your source images to writing effective motion prompts. You will find copy-paste prompt examples, advanced techniques like first-frame locking and character animation, and solutions to the most common problems creators encounter.
Why Image-to-Video (Instead of Text-to-Video)
Text-to-video is powerful, but it gives you less control over the starting composition. When you already have a specific look, character, product shot, or scene in mind, image-to-video is the better workflow for three reasons:
Visual precision. Your image locks the composition, color palette, lighting, and subject appearance. The AI does not have to guess what your character looks like or how the scene is framed — it starts from your exact visual.
Character consistency. One of the hardest problems in AI video is keeping characters looking the same across frames. When you provide a reference image and use the @mention system, Seedance 2.0 treats that image as a ground-truth anchor, maintaining facial features, clothing, and body proportions throughout the clip.
Faster iteration. Instead of rewriting complex visual descriptions, you swap one image for another. Your prompt stays focused on motion and camera work, which is easier to adjust.
Image-to-video is ideal for product demos, character animation from concept art, social media content from existing photos, storyboard-to-video conversion, and any workflow where the visual identity is already established.
What You Need Before You Start
Before opening Dreamina, prepare the following:
A Dreamina Account
Seedance 2.0 runs on ByteDance’s Dreamina platform at dreamina.capcut.com. Sign up with a CapCut or ByteDance account. New accounts receive free trial credits — enough for several test generations.
High-Quality Source Images
Image quality directly determines output quality. Follow these guidelines:
- Resolution: 2K (2048x1152) or higher. Seedance 2.0 outputs up to 2K, so feeding it a 720p source image means the AI has to upscale and guess at details.
- Format: JPG, PNG, or WebP. PNG is preferred for images with transparency or fine detail.
- Subject clarity: The main subject should be sharply in focus with good lighting. Avoid heavy compression artifacts, motion blur, or low-contrast scenes.
- Composition space: Leave visual room for the motion you plan to add. If you want a character to walk forward, do not crop them at the edge of the frame.
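The resolution and format guidelines above can be sketched as a small pre-flight check. This is an illustrative helper, not part of Dreamina or any official tool; the 2K threshold and allowed formats come from the guidelines in this tutorial.

```python
# Hypothetical pre-flight check for a source image, based on the
# guidelines above. Thresholds (2K minimum, JPG/PNG/WebP) come from
# this tutorial, not from any official Dreamina API.

ALLOWED_FORMATS = {"jpg", "jpeg", "png", "webp"}
MIN_LONG_EDGE = 2048   # long edge of the 2K guideline (2048x1152)
MIN_SHORT_EDGE = 1152  # short edge of the 2K guideline

def check_source_image(width: int, height: int, fmt: str) -> list[str]:
    """Return a list of warnings; an empty list means the image passes."""
    warnings = []
    if fmt.lower() not in ALLOWED_FORMATS:
        warnings.append(f"unsupported format: {fmt} (use JPG, PNG, or WebP)")
    long_edge, short_edge = max(width, height), min(width, height)
    if long_edge < MIN_LONG_EDGE or short_edge < MIN_SHORT_EDGE:
        warnings.append(
            f"{width}x{height} is below the 2K guideline; "
            "consider AI-upscaling before upload"
        )
    return warnings

print(check_source_image(2048, 1152, "png"))  # [] -- passes
print(check_source_image(1280, 720, "jpg"))   # resolution warning
```

Using `max`/`min` on the edges keeps the check orientation-agnostic, so a 9:16 portrait at 1152x2048 passes the same 2K bar as a 16:9 landscape.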
A Clear Motion Plan
Before writing your prompt, decide three things:
- What moves? — The subject, the background, or both?
- How does the camera move? — Pan, orbit, dolly, tilt, static, or handheld?
- What is the mood/pace? — Slow and cinematic, or fast and energetic?
Having these answers before you start prevents vague, unfocused prompts that produce generic results.
Step-by-Step: Image to Video in Seedance 2.0
Step 1: Open Dreamina and Select the Model
- Go to dreamina.capcut.com
- Log in to your account
- Click AI Video in the main navigation
- From the model selector dropdown, choose Seedance 2.0
Step 2: Upload Your Reference Images
Click the Reference Panel (the upload area on the left side of the interface) and upload your images. You can drag and drop or click to browse.
Each uploaded file is automatically assigned an @tag:
- First image: @Image1
- Second image: @Image2
- And so on, up to @Image9
You can upload up to 9 images, 3 video clips (15 seconds total), and 3 audio files (15 seconds total) — 12 files maximum per generation.
Tip: For a basic image-to-video conversion, one image is enough. Use multiple images when you need character consistency across different shots or want to define both a subject and a style reference separately.
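The upload limits and @tag numbering above can be expressed as a short sketch. The function names are assumptions for illustration only; the limits themselves (9 images, 3 video clips, 3 audio files, 12 files total) are the ones documented in this tutorial.

```python
# Illustrative check of the per-generation upload limits described
# above. This mirrors the documented limits; it is not Dreamina code.

MAX_IMAGES, MAX_VIDEOS, MAX_AUDIO, MAX_TOTAL = 9, 3, 3, 12

def validate_uploads(n_images: int, n_videos: int, n_audio: int) -> list[str]:
    """Return a list of limit violations; empty means the batch is valid."""
    errors = []
    if n_images > MAX_IMAGES:
        errors.append(f"too many images: {n_images} > {MAX_IMAGES}")
    if n_videos > MAX_VIDEOS:
        errors.append(f"too many video clips: {n_videos} > {MAX_VIDEOS}")
    if n_audio > MAX_AUDIO:
        errors.append(f"too many audio files: {n_audio} > {MAX_AUDIO}")
    total = n_images + n_videos + n_audio
    if total > MAX_TOTAL:
        errors.append(f"too many files overall: {total} > {MAX_TOTAL}")
    return errors

def image_tag(index: int) -> str:
    """Tags are assigned in upload order: @Image1, @Image2, ..."""
    return f"@Image{index}"

print(validate_uploads(9, 2, 1))  # [] -- 12 files total, at the cap
print(validate_uploads(9, 3, 3))  # over the 12-file total limit
```

Note that the per-type maximums sum to 15, so the 12-file overall cap is the binding constraint when you max out every category.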
Step 3: Write Your Motion Prompt
This is where image-to-video differs from text-to-video. Since your image already establishes the visual scene, your prompt should focus on motion and camera work, not scene description.
Use your @tags explicitly:
@Image1 as the first frame. The woman's hair blows gently
in the wind. She slowly turns her head to the right and
smiles. Camera holds steady in a medium close-up.
Soft natural lighting, shallow depth of field.
We cover the full prompt formula and more examples in the next section.
Step 4: Configure Output Settings
Set the following parameters:
| Setting | Options | Recommendation |
|---|---|---|
| Aspect Ratio | 16:9, 9:16, 4:3, 3:4, 1:1 | Match your source image ratio |
| Duration | 4-15 seconds | Start with 5s for testing |
| Resolution | Up to 2K (2048x1152) | Use 2K for final output |
Aspect ratio matching matters. If your source image is 16:9 and you set the output to 9:16, the AI will crop or reshape the composition, often losing important details. Always match the aspect ratio of your source image to the output setting.
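To confirm a match before generating, you can reduce the image dimensions to a ratio string and compare it against the output setting. This is a sketch with hypothetical helper names; the supported ratio list comes from the settings table above.

```python
from math import gcd

# Illustrative helper: reduce an image's dimensions to a ratio string
# and check it against the output options listed above. Function names
# are assumptions for this sketch, not a Dreamina API.

SUPPORTED_RATIOS = {"16:9", "9:16", "4:3", "3:4", "1:1"}

def image_ratio(width: int, height: int) -> str:
    """Reduce width x height to its simplest ratio, e.g. 2048x1152 -> '16:9'."""
    d = gcd(width, height)
    return f"{width // d}:{height // d}"

def ratio_matches(width: int, height: int, output_ratio: str) -> bool:
    return image_ratio(width, height) == output_ratio

print(image_ratio(2048, 1152))            # "16:9"
print(ratio_matches(2048, 1152, "9:16"))  # False -- would crop/reshape
```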
Step 5: Generate
Click Generate and wait. A 5-second clip at 2K resolution typically takes about 60 seconds.
Step 6: Review and Iterate
Watch the result carefully. Check for:
- Motion quality: Is the movement smooth and natural?
- Character consistency: Does the subject maintain their appearance throughout?
- Camera behavior: Does the camera follow your instructions?
- Artifacts: Look for flickering, warping, or unnatural distortions, especially around hands and faces.
If something is off, adjust one element at a time in your prompt. Changing multiple things simultaneously makes it impossible to know what improved (or worsened) the result. Generate 2-4 variations per prompt to compare outcomes.
The Image-to-Video Prompt Formula
For image-to-video, your image handles the visual composition while your prompt handles the motion. Use this formula:
Subject + Motion, Background + Motion, Camera + Motion
Break it down into three layers:
Layer 1: Subject Motion
Describe what the main subject does. Use specific action verbs:
- “The knight raises his sword slowly above his head”
- “The cat stretches and yawns”
- “The woman turns to face the camera”
- “The product rotates 180 degrees on the table”
Avoid vague instructions like “the subject moves” — the more specific the action verb, the better the result.
Layer 2: Background/Environment Motion
Describe what happens in the scene around the subject:
- “Leaves fall gently in the background”
- “Rain streaks across the window”
- “City lights pulse and flicker in the distance”
- “Clouds drift slowly across the sky”
If you want the background to stay static, say so explicitly: “The background remains still.”
Layer 3: Camera Motion
Specify exactly one camera movement per clip. Combining multiple camera moves in a short clip often produces unstable results.
| Camera Direction | What It Does |
|---|---|
| Slow pan left/right | Horizontal sweep across the scene |
| Dolly in/out | Camera moves toward or away from subject |
| Orbit left/right | Camera circles around the subject |
| Tilt up/down | Vertical camera rotation |
| Tracking shot | Camera follows subject movement |
| Static shot | Camera holds position, no movement |
| Handheld | Subtle natural shake for documentary feel |
Putting It Together
Here is the formula applied to a portrait photo:
@Image1 as the first frame. The woman slowly lifts her chin
and looks directly into the camera [subject motion]. A gentle
breeze moves the curtains behind her [background motion].
Camera slowly dollies in from a medium shot to a close-up
[camera motion]. Warm golden-hour lighting, cinematic color
grading, shallow depth of field [style].
Style and Constraint Tags
Add style keywords at the end of your prompt to control the visual treatment:
- Cinematic: “cinematic lighting, shallow depth of field, film grain, 24fps”
- Commercial: “clean studio lighting, product photography, crisp focus”
- Dramatic: “high contrast, dramatic shadows, moody atmosphere”
- Smooth motion: “smooth continuous motion, no jump cuts”
- Slow motion: “slow-motion movement, 120fps look”
7 Copy-Paste Prompt Examples
These prompts are designed for image-to-video generation. Upload your image, paste the prompt (adjusting the described actions to fit your actual subject), and generate.
Example 1: Portrait Animation
@Image1 as the first frame. The person blinks naturally and
turns their head slightly to the left. A faint smile appears.
Hair moves gently as if caught by a light breeze. Camera
holds steady in a medium close-up. Soft natural lighting,
cinematic color grading, shallow depth of field.
Best for: Headshots, profile photos, character portraits.
Example 2: Product Showcase Rotation
@Image1 as the first frame. The product slowly rotates 180
degrees on a reflective surface. Soft highlights glide across
the surface as it turns. Camera holds static at eye level.
Clean studio lighting, commercial product photography style,
crisp focus throughout.
Best for: E-commerce product shots, marketing materials.
Example 3: Landscape Come-to-Life
@Image1 as the first frame. Clouds drift slowly from left to
right across the sky. Water ripples gently in the foreground.
Grass sways in a light breeze. Camera executes a slow dolly
forward into the scene. Golden-hour lighting, nature
documentary style, wide dynamic range.
Best for: Travel content, real estate, nature photography.
Example 4: Character Action Scene
@Image1 is a warrior in full armor. The warrior raises their
sword overhead with both hands, then brings it down in a
powerful swing. Cape billows with the motion. Camera orbits
slowly to the right during the swing. Dramatic side lighting,
cinematic atmosphere, epic fantasy style.
Best for: Concept art animation, game marketing, fantasy content.
Example 5: Fashion and Style Video
@Image1 as the first frame. The model takes two confident
steps forward on the runway. Fabric of the outfit flows and
catches the light with each step. Camera tracks backward,
keeping the model centered. Bright fashion show lighting,
high-contrast, editorial photography style.
Best for: Fashion lookbooks, social media reels, brand content.
Example 6: Food and Beverage
@Image1 as the first frame. Steam rises gently from the
surface of the coffee cup. A hand slowly reaches in from the
right side and lifts the cup. Liquid shifts naturally inside
the cup. Camera remains static, medium close-up. Warm
cafe lighting, cozy atmosphere, food photography style
with rich warm tones.
Best for: Restaurant marketing, food blog content, beverage ads.
Example 7: Architectural Visualization
@Image1 as the first frame. Sunlight slowly shifts across the
building facade, casting moving shadows. People walk past in
the foreground as small blurred silhouettes. Trees sway
gently. Camera slowly pans right along the building exterior.
Clean architectural photography style, natural daylight,
realistic atmosphere.
Best for: Real estate, architecture portfolios, urban content.
Advanced Techniques
Once you are comfortable with basic image-to-video, these techniques will help you produce more sophisticated results.
First-Frame Locking
The most reliable way to use image-to-video is to lock your image as the first frame of the generated video. This ensures the video starts exactly as your image looks and the AI animates forward from that point.
Use this phrase in your prompt:
@Image1 as the first frame.
This tells Seedance 2.0 to treat your image as the literal starting frame, not just a style or character reference. The composition, colors, subject position, and overall layout of your image will be preserved in frame one, and motion will build from there.
Last-Frame Targeting
You can also define an endpoint by uploading two images — one for the start and one for the end:
@Image1 as the first frame, @Image2 as the last frame.
The character smoothly transitions from the sitting position
to standing. Camera holds steady. Continuous smooth motion.
Seedance 2.0 will generate a video that transitions naturally from the composition in @Image1 to the composition in @Image2. This is powerful for:
- Before/after transformations
- Character pose transitions
- Scene transitions (day to night, empty to populated)
- Product reveal sequences
Tip: Keep both images at the same aspect ratio and roughly the same framing for the smoothest transition. Dramatic composition changes between first and last frame can produce unstable results.
Multi-Image Character Consistency
When building multi-shot content (like a short film or ad campaign), use the same character reference image across all generations:
Shot 1:
@Image1 is the main character. She walks through a busy
market street. Camera tracks alongside her. Daytime,
natural lighting.
Shot 2:
@Image1 is the main character. She stops at a fruit stand and
picks up an apple. Camera holds static, medium shot.
Same daytime lighting as previous scene.
By using the same @Image1 reference in both shots, the character’s face, clothing, and body proportions remain consistent across cuts.
Style Transfer from a Second Image
Upload one image as your subject and a second image as your style reference:
@Image1 is the main subject. Apply the visual style, color
palette, and lighting from @Image2. The subject walks forward
slowly. Camera dollies in. Match the mood and atmosphere
of @Image2 exactly.
This is useful when you want a photograph to look like a painting, a sketch to look like a 3D render, or any cross-style transformation while maintaining motion.
Combining Image and Video References
For maximum control, combine an image reference (for appearance) with a video reference (for motion):
@Image1 is the character. Replicate the exact camera movement
and action choreography from @Video1. Maintain the character's
appearance from @Image1 throughout. Cinematic lighting.
This separates appearance control from motion control — your image defines what things look like, and your video reference defines how things move.
Seed Locking for Iterative Refinement
If the Dreamina interface provides a seed value, note the seed of a generation you partially like. Re-run with the same seed and slightly adjusted prompt to refine specific elements without changing the overall composition. This is especially useful when the motion is good but the style needs adjustment, or vice versa.
Image Preparation Best Practices
The quality of your output is directly tied to the quality of your input. Follow these rules for best results.
Resolution Matters
| Input Resolution | Expected Output Quality |
|---|---|
| Below 720p | Poor — visible artifacts, soft details |
| 1080p (1920x1080) | Good — acceptable for social media |
| 2K (2048x1152) | Excellent — matches native output resolution |
| 4K (3840x2160) | Excellent — gives the AI maximum detail to work with |
Always aim for 2K or higher. If your source image is below 1080p, consider upscaling it with an AI upscaler before using it in Seedance 2.0.
Aspect Ratio Alignment
Match your source image’s aspect ratio to your desired output ratio:
| Platform | Recommended Ratio | Image Size Example |
|---|---|---|
| YouTube / Vimeo | 16:9 | 2048 x 1152 |
| TikTok / Reels / Shorts | 9:16 | 1152 x 2048 |
| Instagram Feed | 1:1 | 1440 x 1440 |
| Instagram Portrait | 3:4 (closest supported option to Instagram's 4:5) | 1152 x 1536 |
Mismatched ratios force the AI to crop or pad your image, which introduces unintended framing changes.
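If you are unsure which output setting fits an existing image, a nearest-ratio lookup over the table above settles it. The helper name and structure are assumptions for this sketch; the ratio values are the ones in the platform table.

```python
# Illustrative lookup: given image dimensions, find which platform
# ratio from the table above is the closest fit. Not a real API.

PLATFORM_RATIOS = {
    "16:9": 16 / 9,   # YouTube / Vimeo
    "9:16": 9 / 16,   # TikTok / Reels / Shorts
    "1:1": 1.0,       # Instagram Feed
    "3:4": 3 / 4,     # Instagram Portrait (closest to 4:5)
}

def closest_ratio(width: int, height: int) -> str:
    """Return the platform ratio whose value is nearest to width/height."""
    actual = width / height
    return min(PLATFORM_RATIOS, key=lambda r: abs(PLATFORM_RATIOS[r] - actual))

print(closest_ratio(2048, 1152))  # "16:9"
print(closest_ratio(1152, 1536))  # "3:4"
```

When the closest ratio is still not an exact match, crop the image yourself before upload; that way you, not the model, decide what gets trimmed.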
Subject Positioning
- Place your subject where they will stay throughout the clip. If the character is centered, the AI will attempt to keep them centered.
- Leave headroom and negative space in the direction of intended motion. A character about to walk right needs space on the right side of the frame.
- Avoid subjects cut off at the edges of the frame unless you intentionally want partial framing.
Lighting Consistency
The AI will attempt to maintain the lighting from your source image. If your image has flat, even lighting, the video will have flat, even lighting. For more dynamic results:
- Use images with directional lighting (side-lit or backlit subjects create more visual depth).
- Avoid mixed lighting temperatures unless that is the effect you want.
- Specify lighting in your prompt if you want to override or enhance what is in the image: “warm golden-hour lighting” or “dramatic rim lighting.”
What to Avoid
- Heavy text or watermarks: The AI will try to animate them, creating distorted text.
- Extreme close-ups of faces: Can produce uncanny valley effects in the generated motion.
- Collages or multi-panel images: The AI may struggle to determine which element is the subject.
- Very dark or very bright images: Low-contrast images give the AI less information to work with.
Troubleshooting Common Issues
Character Drift (Subject Changes Appearance)
Symptom: The character’s face, clothing, or body shape shifts noticeably during the clip.
Fix:
- Simplify your prompt to a single subject with one primary action.
- Remove any instructions that imply scene changes or new characters entering.
- Use “as the first frame” locking for maximum consistency.
- Ensure your reference image has a clear, well-lit face.
Motion Blur or Jittery Movement
Symptom: The video looks shaky or has unnatural motion blur.
Fix:
- Replace fast-action verbs with smoother alternatives. Use “slowly walks” instead of “runs.”
- Add smoothness constraints: “smooth continuous motion, no sudden movements.”
- Reduce the clip duration. A 5-second clip with one motion is smoother than a 15-second clip with multiple actions.
- Avoid combining multiple camera movements in one clip.
Wrong Framing or Cropped Subject
Symptom: The output crops your subject or frames the scene differently than your source image.
Fix:
- Set the output aspect ratio to exactly match your source image ratio.
- Explicitly state the framing: “medium close-up” or “wide shot” or “full body shot.”
- Use “as the first frame” to lock the composition.
Static Output (Nothing Moves)
Symptom: The generated video looks like a still image with minimal or no motion.
Fix:
- Be more specific about what moves. Instead of “the scene comes to life,” describe exact actions: “hair blows in the wind, leaves fall in the background, clouds drift across the sky.”
- Add a camera movement to create at least some visual dynamism.
- Increase the duration to give the AI more frames to work with.
Lighting Shifts Mid-Clip
Symptom: The lighting or color temperature changes noticeably during the video.
Fix:
- Explicitly state lighting consistency: “maintain consistent warm lighting throughout.”
- Avoid prompts that imply time-of-day changes unless that is your intent.
- Use shorter clip durations — lighting is more stable in 4-5 second clips than in 15-second clips.
Unnatural Hand or Face Movements
Symptom: Hands deform, extra fingers appear, or facial expressions look uncanny.
Fix:
- Avoid prompting for close-up hand gestures or extreme facial expressions.
- Keep the camera at medium shot or wider distance from the subject.
- Use simpler hand actions: “holds the cup” works better than “picks up the cup while gesturing.”
- If hands are not critical to the scene, keep them out of focus or out of frame.
FAQ
What image formats does Seedance 2.0 accept for image-to-video?
Seedance 2.0 accepts JPG, PNG, and WebP images. For best results, use images at 2K resolution (2048x1152) or higher with clear subjects and good lighting.
How many reference images can I upload at once?
You can upload up to 9 reference images per generation, alongside up to 3 video clips and 3 audio files, for a maximum of 12 files total.
Can I control which frame my image appears in?
Yes. Use the first-frame technique by writing “@Image1 as the first frame” in your prompt. This locks your image as the opening frame and lets the AI animate forward from it.
Why does my character look different in the generated video?
Character drift usually happens when your prompt describes too many actions or scene changes. Simplify to a single subject and one primary motion. Also ensure your reference image is high-resolution and well-lit.
How long does image-to-video generation take?
A typical 5-second clip at 2K resolution takes approximately 60 seconds to generate. Longer durations and more complex reference setups may take proportionally more time.
Can I use Seedance 2.0 image-to-video for commercial projects?
Yes. Content generated with a paid Dreamina subscription can be used commercially, subject to ByteDance’s terms of service. Check the latest terms for your specific use case.
Related Content
- Seedance 2.0: The Complete Guide — Full feature breakdown, pricing, comparisons, and 20+ prompt examples for every Seedance 2.0 capability.
- 50+ Seedance 2.0 Prompts — Ready-to-use prompt library organized by category, including dedicated image-to-video prompts.
- Seedance 2.0 Review — Honest, independent review covering strengths, limitations, and how Seedance 2.0 compares to Sora 2, Kling 3.0, and Veo 3.1.
SeedanceTips is an independent resource and is not affiliated with, endorsed by, or officially connected to ByteDance or the Seedance development team. All product names, logos, and trademarks are the property of their respective owners. The information on this site is provided for educational and informational purposes based on publicly available data.