
Turn Any Image into Video — Complete Image-to-Video Guide

Image-to-video is one of the most powerful capabilities in AI media generation. Take any still photograph — a product shot, a portrait, a landscape — and bring it to life with realistic motion, camera movement, and environmental effects. This guide covers everything you need to know: which models support it, how to write prompts that produce great motion, and how to build a two-step pipeline that generates both the source image and the video from scratch.

Which models support image-to-video?

Not every video model accepts an input image. Of the four video models available in agent-media CLI, two support image-to-video generation:

Model             Image Input   Credits (5s)   Best For
Kling 3.0 Pro     Yes           187            General animation, product shots, scenes
Seedance 1.0 Pro  Yes           104            Human subjects, dance, expressions
Sora 2 Pro        No            187            Text-to-video only
Veo 3.1           No            395            Text-to-video with audio

Step 1: Prepare your image

The quality of your input image directly affects the quality of the output video. Here are practical guidelines for getting the best results:

Resolution

Use at least 1080p. Higher resolution gives the model more detail to work with. Both Kling and Seedance will scale your image internally, but starting with a high-quality source avoids artifacts from upscaling.

Aspect ratio

Match your image aspect ratio to your target output. For YouTube Shorts and TikTok, use 9:16 portrait images. For standard video, use 16:9 landscape. Mismatched ratios result in cropping or letterboxing, which wastes pixels and can cut off important content.
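If you want to verify the ratio before uploading, a small shell helper can classify the dimensions. This is an illustrative sketch, not part of the CLI: the `aspect` function is hypothetical, and the `identify` call assumes ImageMagick is installed and that a `product-shot.jpg` exists.

```shell
# Classify a width/height pair as 16:9, 9:16, or other (illustrative helper).
aspect() {
  awk -v w="$1" -v h="$2" 'BEGIN {
    r = w / h
    if      (r > 1.76 && r < 1.79) print "16:9"
    else if (r > 0.55 && r < 0.57) print "9:16"
    else                           print "other"
  }'
}

# With ImageMagick installed, feed it real image dimensions:
if command -v identify >/dev/null 2>&1; then
  dims=$(identify -format "%w %h" product-shot.jpg 2>/dev/null) && aspect $dims
fi
```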

Composition

Leave space for motion. If your subject is tightly cropped against the frame edge, the model has no room to animate movement. A portrait with some headroom allows for natural head turns and gestures. A product shot with surrounding space enables smooth camera orbits.

Lighting

Even, well-exposed lighting produces the most consistent animations. Harsh shadows or extreme contrast can confuse the model about depth and surface boundaries, leading to unnatural motion in shadow areas.

Step 2: Upload and generate

Use the --input flag to pass your image file to the generate command. The CLI uploads the image, submits the generation job, and returns the result.

$ agent-media generate kling3 -p "Slow camera orbit around the product, soft reflections on surface, studio lighting" --input product-shot.jpg --sync

Uploading product-shot.jpg...

Submitting to Kling 3.0 Pro (image-to-video)...

Completed in 2m 12s

https://ppwvarkmpffljlqxkjux.supabase.co/storage/v1/.../output.mp4

The same workflow works with Seedance — just swap the model slug:

$ agent-media generate seedance1 -p "Person smiles and turns head slightly, natural movement, warm expression" --input portrait.jpg --sync

Step 3: Write prompts for motion, not content

This is the most important concept in image-to-video generation. The model already sees your image — it knows what the scene looks like. Your prompt should describe how things move, not what they are. Describing the image content again can confuse the model and produce inconsistent results.

Bad prompt (describes image)

A red sports car parked in front of a modern building

Good prompt (describes motion)

Camera slowly pulls back, headlights flicker on, subtle reflections move across the hood, leaves drift across the foreground

Here are the key categories of motion you can describe in your prompts:

  • Camera movement: pan left, dolly forward, slow orbit, crane up, rack focus, zoom in gradually
  • Subject actions: turns head, smiles, blinks, reaches forward, takes a step, gestures with hands
  • Environmental effects: wind blows through hair, water ripples, clouds drift, smoke rises, light shifts across surfaces
  • Temporal qualities: slow motion, time-lapse, gentle movement, sudden burst of energy
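One practical way to draft prompts is to pick a phrase from each category and join them. The sketch below is illustrative; the phrases are examples from the lists above, not a fixed vocabulary the models require.

```shell
# Assemble a motion prompt from one phrase per category (illustrative).
camera="slow camera orbit"
subject="subject turns head slightly"
environment="light shifts across surfaces"
prompt="$camera, $subject, $environment"
echo "$prompt"

# Then pass it to the generate command, e.g.:
# agent-media generate kling3 -p "$prompt" --input photo.jpg --sync
```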

Model comparison for image-to-video

Both Kling and Seedance handle image-to-video well, but they excel at different content types. Choosing the right model for your source image makes a significant difference in output quality.

Kling 3.0 Pro

187 credits / 5s

  • + Best for product shots and object animation
  • + Native 4K output preserves source image detail
  • + 60fps for smooth camera movements
  • + Excellent at environment and scene animation
$ agent-media generate kling3 -p "Camera orbits slowly, soft bokeh background shifts" --input hero.jpg --sync

Seedance 1.0 Pro

104 credits / 5s

  • + Best for human subjects and portraits
  • + Natural facial expressions and body movement
  • + Lowest cost for image-to-video
  • + Flexible duration from 1.2 to 12 seconds
$ agent-media generate seedance1 -p "Person smiles, head tilts, hair moves gently" --input selfie.jpg --sync

A simple rule of thumb: if your image has a person as the main subject, start with Seedance. For everything else — products, landscapes, architecture, food — start with Kling.
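That rule of thumb is easy to encode if you are scripting batches. The `pick_model` function below is a hypothetical helper for your own scripts, not a feature of the CLI:

```shell
# Rule-of-thumb model picker (illustrative helper, not part of the CLI).
pick_model() {
  case "$1" in
    person|portrait|face) echo "seedance1" ;;  # human subjects -> Seedance
    *)                    echo "kling3"    ;;  # products, scenes, etc. -> Kling
  esac
}

pick_model portrait   # prints: seedance1
pick_model product    # prints: kling3
```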

Advanced: The two-step pipeline

What if you do not have a source image? You can generate one. Flux 2 Pro creates photorealistic images at just 5 credits each, making it the perfect first step in a fully AI-generated pipeline. Generate the still image, then animate it into video — all from your terminal.

Step 1: Generate the source image (5 credits)

$ agent-media generate flux2-pro -p "Professional headshot of a young woman in a modern office, soft window light, warm tones, portrait orientation" --sync

Completed in 4s

https://ppwvarkmpffljlqxkjux.supabase.co/.../output.jpg

Step 2: Download the image

$ curl -sL "https://ppwvarkmpffljlqxkjux.supabase.co/.../output.jpg" -o headshot.jpg

Step 3: Animate it into video (104 credits)

$ agent-media generate seedance1 -p "Woman smiles warmly, slight head nod, natural blink, professional demeanor" --input headshot.jpg --sync

Completed in 58s

https://ppwvarkmpffljlqxkjux.supabase.co/.../output.mp4

Total cost: 109 credits (5 for the image + 104 for the video). That is roughly $2.07 for a fully AI-generated video from scratch. You can script this entire pipeline in bash for batch production — see our automation guide for details.
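The three steps above can be sketched as one script. This assumes the `--sync` transcript prints the asset URL as its last non-empty line, as in the runs shown above; the `command -v` guard simply makes the sketch a no-op on machines where the CLI is not installed.

```shell
#!/usr/bin/env bash
# Two-step pipeline sketch: generate a still with Flux 2 Pro, download it,
# then animate it with Seedance.

# Extract the final non-empty line of a transcript (assumed to be the URL).
last_url() { awk 'NF { line = $0 } END { print line }'; }

if command -v agent-media >/dev/null 2>&1; then
  image_url=$(agent-media generate flux2-pro \
    -p "Professional headshot of a young woman in a modern office, soft window light, warm tones, portrait orientation" \
    --sync | last_url)

  curl -sL "$image_url" -o headshot.jpg

  agent-media generate seedance1 \
    -p "Woman smiles warmly, slight head nod, natural blink, professional demeanor" \
    --input headshot.jpg --sync
fi
```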

Tips for best results

Start with subtle motion

The most convincing image-to-video results come from gentle, natural motion. A slow camera push, a slight head turn, a breeze through leaves. Requesting dramatic action from a still image often produces artifacts because the model has to hallucinate significant changes from limited information. Start subtle and increase motion complexity as you learn what each model handles well.

Match prompt intensity to duration

A 5-second clip supports one or two motions. A 10-second clip can handle a sequence. Do not pack a 5-second generation with five different actions — the model will try to rush through all of them and none will look natural. One clear action per 5 seconds is a good baseline.

Use consistent lighting between image and prompt

If your source image has warm golden-hour lighting, do not prompt for cold blue moonlight. The model will try to reconcile conflicting signals and produce inconsistent color shifts. Your prompt should extend the mood of the image, not fight it.

Test with Seedance first, upgrade to Kling for finals

At 104 credits per generation, Seedance is nearly half the cost of Kling. Use it to iterate on your motion prompt until you get the movement right. Once you are happy with the motion direction, switch to Kling for the final 4K render. This workflow saves credits during the creative phase where you are likely to run multiple attempts.
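The savings compound with every draft. As a rough worked example (the four draft iterations are an assumption for illustration):

```shell
# Credit math for the draft-on-Seedance, finalize-on-Kling workflow.
SEEDANCE=104; KLING=187
drafts=4
mixed=$(( drafts * SEEDANCE + KLING ))   # 4 Seedance drafts + 1 Kling final
all_kling=$(( (drafts + 1) * KLING ))    # all five attempts on Kling
echo "mixed: $mixed, all-Kling: $all_kling, saved: $(( all_kling - mixed ))"
```

Five attempts cost 603 credits this way versus 935 on Kling alone.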

Cost breakdown

Here is what different image-to-video workflows cost in credits:

Workflow                             Credits   Approx. Cost
Your image + Seedance (5s)           104       $1.98
Your image + Kling (5s)              187       $3.55
Flux 2 Pro image + Seedance (5s)     109       $2.07
Flux 2 Pro image + Kling (5s)        192       $3.65
Grok Imagine image + Seedance (5s)   121       $2.30
Costs estimated at $0.019/credit (Starter plan rate: $19/1,000 credits).
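To price any other workflow, the conversion is just credits times the per-credit rate. The `usd` function is an illustrative one-liner, not a CLI command:

```shell
# Convert credits to dollars at the Starter rate: $19 per 1,000 credits.
usd() { awk -v c="$1" 'BEGIN { printf "%.2f", c * 19 / 1000 }'; }

usd 104   # Seedance 5s -> 1.98
usd 187   # Kling 5s    -> 3.55
```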

Bring your images to life

Any image, any model, one command. Plans start at $19/mo with 1,000 credits.

$ agent-media generate kling3 -p "..." --input photo.jpg --sync