Turn Any Image into Video — Complete Image-to-Video Guide
Image-to-video is one of the most powerful capabilities in AI media generation. Take any still photograph — a product shot, a portrait, a landscape — and bring it to life with realistic motion, camera movement, and environmental effects. This guide covers everything you need to know: which models support it, how to write prompts that produce great motion, and how to build a two-step pipeline that generates both the source image and the video from scratch.
Which models support image-to-video?
Not every video model accepts an input image. Of the four video models available in the agent-media CLI, two support image-to-video generation:
| Model | Image Input | Credits (5s) | Best For |
|---|---|---|---|
| Kling 3.0 Pro | Yes | 187 | General animation, product shots, scenes |
| Seedance 1.0 Pro | Yes | 104 | Human subjects, dance, expressions |
| Sora 2 Pro | No | 187 | Text-to-video only |
| Veo 3.1 | No | 395 | Text-to-video with audio |
Step 1: Prepare your image
The quality of your input image directly affects the quality of the output video. Here are practical guidelines for getting the best results:
Resolution
Use at least 1080p. Higher resolution gives the model more detail to work with. Both Kling and Seedance will scale your image internally, but starting with a high-quality source avoids artifacts from upscaling.
Aspect ratio
Match your image aspect ratio to your target output. For YouTube Shorts and TikTok, use 9:16 portrait images. For standard video, use 16:9 landscape. Mismatched ratios result in cropping or letterboxing, which wastes pixels and can cut off important content.
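Before generating, it is worth confirming that your image's dimensions actually match the target ratio. The bash helper below is a minimal sketch: it classifies a width/height pair by cross-multiplying, so there is no floating-point comparison. The function name is illustrative, not a CLI feature; in practice you could feed it dimensions from ImageMagick's `identify -format "%w %h" image.jpg`.

```bash
# Classify an image's aspect ratio by cross-multiplying width and height.
# Dimensions can come from e.g. ImageMagick: identify -format "%w %h" image.jpg
classify_ratio() {
  local w=$1 h=$2
  if (( w * 9 == h * 16 )); then
    echo "16:9 (landscape)"
  elif (( w * 16 == h * 9 )); then
    echo "9:16 (portrait)"
  elif (( w == h )); then
    echo "1:1 (square)"
  else
    echo "other"
  fi
}
```

For example, `classify_ratio 1080 1920` reports `9:16 (portrait)`, the right shape for Shorts and TikTok.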
Composition
Leave space for motion. If your subject is tightly cropped against the frame edge, the model has no room to animate movement. A portrait with some headroom allows for natural head turns and gestures. A product shot with surrounding space enables smooth camera orbits.
Lighting
Even, well-exposed lighting produces the most consistent animations. Harsh shadows or extreme contrast can confuse the model about depth and surface boundaries, leading to unnatural motion in shadow areas.
Step 2: Upload and generate
Use the --input flag to pass your image file to the generate command. The CLI uploads the image, submits the generation job, and returns the result.
$ agent-media generate kling3 -p "Slow camera orbit around the product, soft reflections on surface, studio lighting" --input product-shot.jpg --sync
Uploading product-shot.jpg...
Submitting to Kling 3.0 Pro (image-to-video)...
Completed in 2m 12s
https://ppwvarkmpffljlqxkjux.supabase.co/storage/v1/.../output.mp4
The same workflow works with Seedance — just swap the model slug:
$ agent-media generate seedance1 -p "Person smiles and turns head slightly, natural movement, warm expression" --input portrait.jpg --sync
Step 3: Write prompts for motion, not content
This is the most important concept in image-to-video generation. The model already sees your image — it knows what the scene looks like. Your prompt should describe how things move, not what they are. Describing the image content again can confuse the model and produce inconsistent results.
Bad prompt (describes the image): "A woman in a modern office, blue blazer, soft window light." This restates what the model can already see and gives it no motion information.
Good prompt (describes the motion): "She looks up, smiles, and turns slightly toward the camera." This tells the model exactly what should change over the course of the clip.
Here are the key categories of motion you can describe in your prompts:
- **Camera movement:** pan left, dolly forward, slow orbit, crane up, rack focus, zoom in gradually
- **Subject actions:** turns head, smiles, blinks, reaches forward, takes a step, gestures with hands
- **Environmental effects:** wind blows through hair, water ripples, clouds drift, smoke rises, light shifts across surfaces
- **Temporal qualities:** slow motion, time-lapse, gentle movement, sudden burst of energy
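The categories above combine naturally into a single prompt string. The helper below is a hypothetical bash sketch, not a CLI feature: it joins whichever motion phrases you supply and skips the ones you leave empty.

```bash
# Assemble a motion-focused prompt from the categories above:
# camera movement, subject action, environmental effect, temporal quality.
# Empty arguments are skipped; phrases are joined with ", ".
build_motion_prompt() {
  local out="" part
  for part in "$@"; do
    [ -n "$part" ] || continue
    if [ -z "$out" ]; then out="$part"; else out="$out, $part"; fi
  done
  echo "$out"
}
```

You can then pass the result straight to the generate command, e.g. `agent-media generate kling3 -p "$(build_motion_prompt "slow dolly forward" "subject turns head" "" "gentle movement")" --input shot.jpg --sync`.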
Model comparison for image-to-video
Both Kling and Seedance handle image-to-video well, but they excel at different content types. Choosing the right model for your source image makes a significant difference in output quality.
Kling 3.0 Pro (187 credits / 5s)
- Best for product shots and object animation
- Native 4K output preserves source image detail
- 60fps for smooth camera movements
- Excellent at environment and scene animation
Seedance 1.0 Pro (104 credits / 5s)
- Best for human subjects and portraits
- Natural facial expressions and body movement
- Lowest cost for image-to-video
- Flexible duration from 1.2 to 12 seconds
A simple rule of thumb: if your image has a person as the main subject, start with Seedance. For everything else — products, landscapes, architecture, food — start with Kling.
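If you script your generations, that rule of thumb reduces to one case statement. A sketch in bash; the subject labels and the function itself are illustrative, not part of the CLI:

```bash
# Rule of thumb from above: people go to Seedance; products, landscapes,
# architecture, and food go to Kling.
choose_model() {
  case "$1" in
    person|portrait|dancer) echo "seedance1" ;;
    *)                      echo "kling3" ;;
  esac
}
```

Usage would look like `agent-media generate "$(choose_model portrait)" -p "..." --input portrait.jpg --sync`.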
Advanced: The two-step pipeline
What if you do not have a source image? You can generate one. Flux 2 Pro creates photorealistic images at just 5 credits each, making it the perfect first step in a fully AI-generated pipeline. Generate the still image, then animate it into video — all from your terminal.
$ agent-media generate flux2-pro -p "Professional headshot of a young woman in a modern office, soft window light, warm tones, portrait orientation" --sync
Completed in 4s
https://ppwvarkmpffljlqxkjux.supabase.co/.../output.jpg
$ curl -sL "https://ppwvarkmpffljlqxkjux.supabase.co/.../output.jpg" -o headshot.jpg
$ agent-media generate seedance1 -p "Woman smiles warmly, slight head nod, natural blink, professional demeanor" --input headshot.jpg --sync
Completed in 58s
https://ppwvarkmpffljlqxkjux.supabase.co/.../output.mp4
Total cost: 109 credits (5 for the image + 104 for the video). That is roughly $2.07 for a fully AI-generated video from scratch. You can script this entire pipeline in bash for batch production — see our automation guide for details.
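The two commands above can be wrapped into a reusable script. The sketch below assumes, as in the transcripts, that the CLI prints the result URL as its last line of output; adjust the parsing if yours differs. The `run` wrapper and its `DRY_RUN` switch are script-local conveniences for previewing commands, not CLI features.

```bash
# Print commands instead of executing them when DRY_RUN=1 (script-local flag).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Two-step pipeline: Flux 2 Pro still (5 credits) -> Seedance video (104 credits).
animate_from_prompt() {
  local image_prompt=$1 motion_prompt=$2 img_url
  # Step 1: text-to-image; capture the result URL from the last output line.
  img_url=$(run agent-media generate flux2-pro -p "$image_prompt" --sync | tail -n 1)
  # Step 2: download the still, then animate it.
  run curl -sL "$img_url" -o headshot.jpg
  run agent-media generate seedance1 -p "$motion_prompt" --input headshot.jpg --sync
}
```

Run with `DRY_RUN=1` first to confirm the commands look right, then loop over a file of prompts for batch production.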
Tips for best results
Start with subtle motion
The most convincing image-to-video results come from gentle, natural motion. A slow camera push, a slight head turn, a breeze through leaves. Requesting dramatic action from a still image often produces artifacts because the model has to hallucinate significant changes from limited information. Start subtle and increase motion complexity as you learn what each model handles well.
Match prompt intensity to duration
A 5-second clip supports one or two motions. A 10-second clip can handle a sequence. Do not pack a 5-second generation with five different actions — the model will try to rush through all of them and none will look natural. One clear action per 5 seconds is a good baseline.
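When scripting batches, that baseline is simple to encode. A hypothetical helper using ceiling division over whole-second durations:

```bash
# Baseline from above: one clear action per 5 seconds of clip.
# Ceiling division, so a 5s clip allows 1 action and a 10s clip allows 2.
max_actions() {
  echo $(( ($1 + 4) / 5 ))
}
```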
Use consistent lighting between image and prompt
If your source image has warm golden-hour lighting, do not prompt for cold blue moonlight. The model will try to reconcile conflicting signals and produce inconsistent color shifts. Your prompt should extend the mood of the image, not fight it.
Test with Seedance first, upgrade to Kling for finals
At 104 credits per generation, Seedance is nearly half the cost of Kling. Use it to iterate on your motion prompt until you get the movement right. Once you are happy with the motion direction, switch to Kling for the final 4K render. This workflow saves credits during the creative phase where you are likely to run multiple attempts.
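The savings are easy to quantify with the per-clip credit costs quoted above. A quick sketch (function names are illustrative):

```bash
# N Seedance drafts (104 credits each) plus one Kling final (187 credits),
# versus running every attempt, drafts included, on Kling.
mixed_cost()     { echo $(( $1 * 104 + 187 )); }
all_kling_cost() { echo $(( ($1 + 1) * 187 )); }
```

With three draft iterations, the mixed workflow costs 499 credits versus 748 for all-Kling, a saving of 249 credits.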
Cost breakdown
Here is what different image-to-video workflows cost in credits:
| Workflow | Credits | Approx. Cost |
|---|---|---|
| Your image + Seedance (5s) | 104 | $1.98 |
| Your image + Kling (5s) | 187 | $3.55 |
| Flux 2 Pro image + Seedance (5s) | 109 | $2.07 |
| Flux 2 Pro image + Kling (5s) | 192 | $3.65 |
| Grok Imagine image + Seedance (5s) | 121 | $2.30 |
Costs estimated at $0.019/credit (Starter plan rate: $19/1,000 credits).
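To price out your own workflows, the conversion is a single multiplication. A small helper (the function name is illustrative) using the Starter rate of $19 per 1,000 credits:

```bash
# Convert credits to approximate dollars at $0.019/credit
# (Starter plan: $19 per 1,000 credits).
credits_to_usd() {
  awk -v c="$1" 'BEGIN { printf "%.2f", c * 19 / 1000 }'
}
```

For example, `credits_to_usd 109` reproduces the $2.07 two-step pipeline figure above.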
Bring your images to life
Any image, any model, one command. Plans start at $19/mo with 1,000 credits.