How to Make a Viral AI Baseball Fan Video

Step-by-step guide to making an AI baseball fan video that looks like a real broadcast catch. Photo, prompt, model, and posting tips.

Starrd Team | May 10, 2026 | 12 min read

What You're Trying to Make

You've probably seen them on your timeline by now: a 5-to-12-second clip from what looks like a televised baseball game, with the camera holding on a single fan in the stands. The shot is unremarkable. The fan is just sitting there. Then the caption reveals the fan is AI-generated, and the comments lose it.

This guide walks through how to make one yourself. We'll cover what kind of photo to use, exactly what prompt to write, which AI video model to use, and how to post it so it actually gets views. By the end you'll have a generation-ready workflow you can repeat.

What the finished result looks like — generated from a single photo

Prompt used

Single continuous live sports broadcast shot, 12s, 16:9. Telephoto broadcast lens, 135mm equivalent, locked off from upper press box. Subject sits in packed stadium stands during a night game. Micro-actions only: blinks, weight shifts, sips drink. No eye contact with camera. Pure live TV capture aesthetic.

The Fastest Way — Use the Fan Cam Template on Starrd

The Fan Cam template is live in the Starrd library. It packages every step in this guide — the telephoto prompt, the time-segmented micro-actions, the audio direction, the lens spec, the identity lock — into a single upload.

  1. Pick a clear face photo. One person, front or three-quarter view, eyes open, neutral expression, decent lighting. A regular phone selfie beats a retouched headshot.
  2. Open the Fan Cam template in the Starrd app or web library.
  3. Upload the photo and tap generate. The template personalizes the broadcast prompt to your face, runs the character sheet for identity lock, then generates a 12-second broadcast catch on Seedance 2.0.

One credit, about a minute. The output is the 16:9 broadcast catch shown above — no prompt writing, no model picking, no lens math.

The rest of this guide is for people who want to roll their own — pick a different stadium aesthetic, write a custom prompt, or run it on a model other than Seedance.

Or, Build It Yourself — What You Need

If you're going the DIY route, you need three things:

  1. A clear face photo of the subject. Front or three-quarter view, eyes open, neutral expression, decent lighting. One person only.
  2. Access to an AI video model that accepts a reference image. Kling, Seedance 2.0, Runway Gen-4, or any wrapper built on them.
  3. A platform to post on. TikTok, Reels, and Twitter/X are the venues this format goes viral on.

That's it. You don't need video editing software, you don't need a stock footage subscription, you don't need a Korean baseball jersey in real life.

Step 1 — Pick Your Reference Photo

The photo you feed the model determines the face that ends up in the stands. Choosing well here saves you three or four wasted generations later.

Use:

  • A clear, well-lit photo of one person
  • Front-facing or three-quarter angle
  • Neutral expression — no big smiles, no peace signs
  • Natural lighting, not heavy filters

Avoid:

  • Group photos (the model gets confused)
  • Heavy makeup or glamour shots (pushes the output toward "magazine" instead of "stands")
  • Sunglasses or anything covering the face
  • Low resolution or motion-blurred shots
  • AI-generated reference images (compounding artifacts)

A regular phone selfie taken in good light is better than a professionally retouched photo. The trend depends on the subject looking like a real fan, not a model.

Step 2 — Pick a Stadium Aesthetic

This is a creative decision that affects everything downstream. You have three main options:

Korean (KBO) Stadium

The original viral aesthetic. Bright stadium floodlights, packed crowd in team colors waving cheering sticks (응원봉), iced Americano in hand, K-pop-adjacent fashion. Maximally on-trend right now. Korean text on signage and jerseys signals authenticity to viewers.

American (MLB) Stadium

More familiar to a US audience. Dimmer stadium lighting, beer in hand, foam finger or cap, ballpark organ in the background audio, classic stadium signage. Less novel, but it lands harder in the US market because viewers recognize it immediately.

European Football Stadium

Different sport, same broadcast grammar. Premier League and La Liga shots use the same telephoto-to-the-stands move. Scarf instead of jersey, terrace crowd packed tight, evening floodlights.

Pick one before you start writing the prompt. Mixing aesthetics ("Korean stadium with American jersey") confuses the model and produces unconvincing hybrids.

Step 3 — Write the Prompt

This is the part most people get wrong. The instinct is to make the prompt cinematic — sweeping camera moves, dramatic lighting, slow motion. Resist all of that. The whole point of this format is that nothing performative happens. Boring is the goal.

Here's the prompt structure that works. Copy this, swap in your specifics:

The Baseline Prompt (12-second version)

Single continuous live sports broadcast shot, 12s, 16:9, no cuts.

Subject sits in the upper stands of a packed [STADIUM TYPE] baseball stadium during a [day game / night game]. [Outfit description — team jersey, cap, etc]. [Drink or prop in hand]. Crowd densely packed behind, compressed by long lens.

Telephoto broadcast lens, 135mm equivalent, locked off from upper press box position. Slight handheld breath only. Shallow depth of field — subject in focus, crowd softly blurred. [Lighting note appropriate to time of day].

[0-3s] Watches the field intently, blinks twice. Small head tilt. [3-6s] Brief glance down at drink, takes a slow sip, returns gaze to field. [6-9s] Subtle weight shift, small hand reposition. [9-12s] Eyes follow something across the field. One small unconscious smile. Holds.

Unstaged, candid, real broadcast moment. No eye contact with camera. No cinematic drama. No music. Ambient stadium audio only — distant crowd murmur, faint announcer through PA. Pure live TV capture aesthetic, slight broadcast video grain, 1080p 30fps interlaced feel.

The non-negotiable elements:

  • "Single continuous live sports broadcast shot" — tells the model this is one camera, one take, no editing
  • "Telephoto broadcast lens, 135mm equivalent" — the most important sentence in the entire prompt; everything visual flows from this
  • Time-segmented micro-actions — break the 12 seconds into 3-second beats, one tiny action per beat
  • "No eye contact with camera" — single biggest "tell" if you skip this
  • "No music" — the moment a music bed plays, the broadcast illusion dies
  • "1080p 30fps interlaced feel" — pushes the model away from the slick 24fps cinematic default
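Why the lens line carries so much weight comes down to angle of view. The sketch below uses the standard rectilinear-lens formula for a full-frame (36mm-wide) sensor to compare focal lengths. It is only an illustration of real-camera optics: the function name is ours, and an AI video model does not literally simulate a lens, it just imitates footage shot this way.

```python
import math

FULL_FRAME_WIDTH_MM = 36.0  # sensor width that "35mm equivalent" refers to


def horizontal_fov_degrees(focal_length_mm, sensor_width_mm=FULL_FRAME_WIDTH_MM):
    """Horizontal angle of view of a rectilinear lens focused at infinity."""
    if focal_length_mm <= 0:
        raise ValueError("focal length must be positive")
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_length_mm)))


# Compare a phone-like wide lens, a short telephoto, and the broadcast lens.
for focal_length in (24, 85, 135):
    print(f"{focal_length}mm: {horizontal_fov_degrees(focal_length):.1f} degrees")
```

At 135mm the frame covers roughly 15 degrees horizontally, against roughly 74 degrees at 24mm. That narrow slice is what stacks the crowd up behind the subject and reads as a broadcast camera rather than a phone.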

Want a deeper teardown of why every line matters? We wrote the whole KBO Fan Cam prompt breakdown for that.
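If you plan to generate several variations, the baseline prompt's placeholders and 3-second beats can be filled in programmatically. This is a minimal sketch, not how the Fan Cam template works internally: `Beat`, `build_beats`, `render_beats`, and `build_prompt` are hypothetical helper names, and the output is plain text you would paste into whichever model you use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beat:
    start: int   # seconds from the start of the clip
    end: int
    action: str  # the micro-action for this beat


def build_beats(actions, total_seconds=12):
    """Split the clip evenly so each action gets one beat."""
    if not actions:
        raise ValueError("need at least one action")
    if total_seconds % len(actions) != 0:
        raise ValueError("total_seconds must divide evenly across the actions")
    step = total_seconds // len(actions)
    return [Beat(i * step, (i + 1) * step, a) for i, a in enumerate(actions)]


def render_beats(beats):
    """Format beats as '[0-3s] ... [3-6s] ...' like the baseline prompt."""
    return " ".join(f"[{b.start}-{b.end}s] {b.action}" for b in beats)


def build_prompt(stadium, game_time, outfit, prop, lighting, actions, total_seconds=12):
    """Assemble the baseline prompt paragraphs around the caller's specifics."""
    paragraphs = [
        f"Single continuous live sports broadcast shot, {total_seconds}s, 16:9, no cuts.",
        f"Subject sits in the upper stands of a packed {stadium} baseball stadium "
        f"during a {game_time}. {outfit}. {prop}. "
        "Crowd densely packed behind, compressed by long lens.",
        "Telephoto broadcast lens, 135mm equivalent, locked off from upper press box "
        "position. Slight handheld breath only. Shallow depth of field: subject in "
        f"focus, crowd softly blurred. {lighting}.",
        render_beats(build_beats(actions, total_seconds)),
        "Unstaged, candid, real broadcast moment. No eye contact with camera. "
        "No cinematic drama. No music. Ambient stadium audio only: distant crowd "
        "murmur, faint announcer through PA. Pure live TV capture aesthetic, "
        "slight broadcast video grain, 1080p 30fps interlaced feel.",
    ]
    return "\n\n".join(paragraphs)


prompt = build_prompt(
    stadium="KBO",
    game_time="night game",
    outfit="Team jersey and cap",
    prop="Iced Americano in hand",
    lighting="Bright stadium floodlights",
    actions=[
        "Watches the field intently, blinks twice. Small head tilt.",
        "Brief glance down at drink, takes a slow sip, returns gaze to field.",
        "Subtle weight shift, small hand reposition.",
        "Eyes follow something across the field. One small unconscious smile. Holds.",
    ],
)
print(prompt)
```

Keeping the beats as data makes it easy to swap a single action between runs without touching the lens or audio lines.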

Step 4 — Pick a Model

The format works on multiple AI video models. Each has trade-offs:

  • Kling 1.6 / 2.0 — what the original viral clips used. Strong identity preservation, handles the candid micro-action style well. Subscription required for serious use.
  • Seedance 2.0 — the model Starrd runs on. Best-in-class text adherence, generates 12s natively, strong audio generation. Higher per-generation cost but fewer wasted runs.
  • Runway Gen-4 — solid all-rounder, good identity lock with reference images. Slightly more "cinematic" default that you have to actively suppress in the prompt.
  • Veo 3 — capable but tends toward overly polished outputs. Requires extra prompt work to make it look "broadcast" instead of "film."

If you don't have a preference, Seedance 2.0 is the safest bet — its prompt adherence is the highest, which matters when the entire format depends on the model not doing the dramatic thing it wants to do. (Read our Seedance vs Kling vs Veo comparison if you want the full breakdown.)

Step 5 — Generate and Iterate

First generations rarely nail it. Common failures and fixes:

The subject looks at the camera. Add stronger negative direction: "Subject NEVER looks at camera. Gaze stays on the field at all times."

The crowd behind looks wrong (repeating faces, screensaver motion). Reframe tighter — closer crop on the subject so less crowd is visible. Or generate at higher quality if the model supports it.

The output looks too cinematic / slick. Add more "broadcast" language: "1080i 60fps broadcast feed, slight video compression artifacts, broadcast color science (not film), zero cinematic grading."

The face doesn't look like the reference photo. Try a different reference photo with a clearer front-facing view. Some models also let you weight the reference image more heavily.

Weird mouth animation on the sip. Replace "takes a sip" with "adjusts grip on drink, brings it slightly toward face, lowers it" — fewer mouth frames, same beat.

Budget 3-5 generations before you get a keeper. If you're 8+ deep without a usable one, something structural is off — reread your prompt and check whether you've drifted toward "cinematic" anywhere.
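The prompt-level fixes above can be kept as a small lookup so each retry changes one thing at a time. This is a sketch under our own naming: the failure keys, `apply_fix`, and `should_rethink` are hypothetical, and the fix strings are copied from the list above. Failures that need a new photo or a tighter reframe are left out because they are not simple prompt edits.

```python
import re

# Fixes that work by appending stronger direction to the end of the prompt.
APPEND_FIXES = {
    "looks_at_camera": (
        "Subject NEVER looks at camera. Gaze stays on the field at all times."
    ),
    "too_cinematic": (
        "1080i 60fps broadcast feed, slight video compression artifacts, "
        "broadcast color science (not film), zero cinematic grading."
    ),
}

# The sip fix rewrites the action instead of appending to the prompt.
SIP_PATTERN = re.compile(r"takes a (?:slow )?sip")
SIP_REWRITE = "adjusts grip on drink, brings it slightly toward face, lowers it"


def apply_fix(prompt, failure):
    """Return a new prompt with the fix for one observed failure applied."""
    if failure == "weird_sip":
        return SIP_PATTERN.sub(SIP_REWRITE, prompt)
    if failure in APPEND_FIXES:
        return f"{prompt.rstrip()} {APPEND_FIXES[failure]}"
    raise KeyError(f"no prompt-level fix recorded for {failure!r}")


def should_rethink(generations_so_far, threshold=8):
    """True once you are deep enough that the prompt itself is suspect."""
    return generations_so_far >= threshold
```

For example, `apply_fix(prompt, "weird_sip")` leaves the rest of the beat intact and only swaps the sip wording.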

Step 6 — Post It

The generation is half the work. How you post it determines whether it gets 200 views or 2 million.

Caption framing. The viral originals didn't say "made with AI." They captioned like a real broadcast moment — "KBO last night 😭" or "Lotte fan was unbothered the whole game." This is how the algorithm classifies your post as sports content rather than AI demo content. (Platforms are increasingly requiring AI disclosure labels — comply, but the body of your caption still matters within those rules.)

Vertical reframe for TikTok/Reels. Generate at 16:9 first, then crop to 9:16 centered. The telephoto compression only works in 16:9 natively — vertical generations break the lens illusion.
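The centered crop is simple arithmetic, sketched here so you can see how much of the frame survives. The function names are ours. The only external convention used is the `crop=w:h:x:y` form of FFmpeg's crop filter, and rounding the width down to an even number is an assumption made because many encoders expect even dimensions.

```python
def centered_vertical_crop(src_width, src_height, ratio_w=9, ratio_h=16):
    """Return (width, height, x, y) of a centered 9:16 crop from a landscape frame."""
    crop_height = src_height
    crop_width = src_height * ratio_w // ratio_h
    crop_width -= crop_width % 2  # round down to an even width
    if crop_width > src_width:
        raise ValueError("source frame is too narrow for this crop")
    x_offset = (src_width - crop_width) // 2
    return crop_width, crop_height, x_offset, 0


def ffmpeg_crop_filter(src_width, src_height):
    """Build the crop filter string for a centered vertical crop."""
    width, height, x, y = centered_vertical_crop(src_width, src_height)
    return f"crop={width}:{height}:{x}:{y}"


print(ffmpeg_crop_filter(1920, 1080))  # crop=606:1080:657:0
```

A 1920x1080 frame keeps only 606 of its 1920 columns, under a third of the width, so the subject has to sit near the center of the 16:9 generation. If you have FFmpeg installed, the filter string can be passed to `-vf`, for example `ffmpeg -i fancam.mp4 -vf "crop=606:1080:657:0" -c:a copy fancam_vertical.mp4` (file names are placeholders).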

Timing. Post during the US evening or the Korean afternoon, depending on which market you're targeting. Sports content peaks during real game windows.

Engagement bait in comments. Drop one comment 20-30 minutes after posting — "wait is this AI?" The controversy signal drives reach. Sounds gross, works anyway.

Don't over-edit. No text overlays, no music, no zoom-in effects. The format only works if the clip looks unaltered. Adding TikTok edits gives away that it's a clip you're trying to make go viral.

Common Mistakes That Tank Your Video

After watching hundreds of attempts at this format, these are the recurring failure modes:

  1. Going too cinematic. Adding "dramatic," "epic," or "stunning" anywhere in the prompt pushes the model toward AI-video defaults. Cut all of it.
  2. Camera movement. Pans, zooms, and tracking shots immediately read as "produced." The broadcast catch is locked off.
  3. Performance beats. Smiling at the camera, waving, striking a pose. Real fans don't do this when they don't know they're on camera.
  4. Wrong lens. Wide angles look like phones. Anything under 85mm equivalent kills the trend.
  5. Music. Adding a music bed under the clip is the fastest way to ruin it.
  6. Too long. Real broadcast holds on fans are 3-6 seconds. 12 seconds is already long. 30+ second versions don't work.
  7. Recognizable celebrity face. If your reference photo is someone famous, the model will fight identity preservation and the viewer will instantly know what they're looking at.
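Several of these mistakes can be caught before you spend a generation. The sketch below is a rough prompt check, assuming the rules stated in this guide: the banned words from mistake 1, the 85mm floor from mistake 4, and the "No eye contact" and "No music" lines from the baseline prompt. `lint_prompt` and the constants are hypothetical names, and simple text matching like this will miss paraphrases.

```python
import re

BANNED_WORDS = ("dramatic", "epic", "stunning")
REQUIRED_PHRASES = ("No eye contact with camera", "No music")
MIN_FOCAL_LENGTH_MM = 85


def lint_prompt(prompt):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    lowered = prompt.lower()

    # Mistake 1: cinematic hype words anywhere in the prompt.
    for word in BANNED_WORDS:
        if re.search(rf"\b{word}\b", lowered):
            problems.append(f"remove '{word}'")

    # Missing guard phrases from the baseline prompt.
    for phrase in REQUIRED_PHRASES:
        if phrase.lower() not in lowered:
            problems.append(f"add '{phrase}'")

    # Mistake 4: the first focal length mentioned must be telephoto.
    lens = re.search(r"(\d+)\s*mm", lowered)
    if lens is None:
        problems.append("state a focal length")
    elif int(lens.group(1)) < MIN_FOCAL_LENGTH_MM:
        problems.append(
            f"lens is {lens.group(1)}mm; use {MIN_FOCAL_LENGTH_MM}mm or longer"
        )
    return problems
```

Note that the check matches whole words, so the baseline prompt's "No cinematic drama" passes while "dramatic lighting" is flagged.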

Window of Opportunity

This trend has maybe a month of life left before saturation kills it. If a viral baseball video is the goal and the prompt engineering above sounds like a chore, the Fan Cam template at the top of this guide is the same workflow in one tap.

Frequently Asked Questions

How do I make an AI baseball fan video?

Pick a clear face photo, choose a stadium aesthetic (KBO, MLB, or European football), write a telephoto broadcast prompt with time-segmented micro-actions, generate on an AI video model that supports reference images (Kling, Seedance 2.0, Runway, or Veo), and post without "made with AI" framing. Or use the Starrd Fan Cam template to skip all the prompt work.

What AI model is best for baseball fan videos?

Seedance 2.0 has the best prompt adherence, which matters because the format depends on the model NOT doing the dramatic thing it wants to do. Kling 1.6/2.0 was used for the original viral clips. Runway Gen-4 and Veo 3 also work but require more "broadcast-style" prompt language to suppress their cinematic defaults.

Why does my AI video look fake / get clocked as AI?

Five common tells: the subject looks at the camera (real fans don't), the camera moves or zooms (broadcast catches are locked off), the prompt asks for "cinematic" or "dramatic" (kills the broadcast feel), the lens is wide-angle (telephoto compression is what reads as TV), or you added a music bed (the format only works with ambient stadium audio).

How long should the video be?

3-12 seconds. Real broadcast holds on fans are 3-6 seconds; 12 seconds is the safe upper bound on Seedance 2.0. Anything longer than 12 seconds without major action starts feeling unnatural and gives the AI generation away.

What photo should I use as a reference?

A clear, well-lit photo of one person, front or three-quarter view, neutral expression, no heavy filters or makeup. Avoid group photos, sunglasses, low resolution, or AI-generated reference images. A regular phone selfie in good light beats a professionally retouched headshot for this trend.

Do I need to disclose that the video is AI-generated?

Yes. Most platforms (TikTok, Instagram, YouTube) now require AI-generated content to be labeled. Comply with the platform rules. The framing of your caption still matters within those rules: describe it like a real broadcast moment ("KBO last night 😭") rather than as an AI demo.

Can I make this without writing the prompt myself?

Yes. The Starrd Fan Cam template handles the prompt, the lens settings, the time-segmented micro-actions, the audio direction, and the personalization to your face. One upload, one tap, broadcast-grade catch, with no prompt engineering required.
