Retold – Bring Stories Back To Life.

Retold AI transforms full movies, long-form videos, or raw .mp4 files into AI-generated cinematic trailers and human-style story retellings.

Overview

Retold AI is an experimental multimodal system that transforms full-length movies and long-form videos into cinematic AI-generated trailers and human-style narrative summaries. Built on top of OpenAI's multimodal models, Retold analyzes an entire film, identifies its emotional and narrative peaks, and retells the story in concise prose that reads like the work of a film critic or storyteller.

The system also generates a short video edit using selected scenes, paired narration, and optional AI-generated music. The output includes a narrative text summary and a cinematic trailer suitable for archives, short-form content, and creative research.

What Retold Does

  • Analyzes entire movies scene by scene.
  • Detects emotional peaks, tension arcs, and narrative transitions.
  • Generates a human-style narrative summary with cinematic tone.
  • Creates a short AI-edited trailer that reflects the film’s major beats.
  • Outputs a text retelling and a video edit ready for sharing or archiving.

Installation


        # Windows only (run in PowerShell)
        git clone https://github.com/dfordp/retold-ai.git
        cd retold-ai
        python3 -m pip install -r requirements.txt

        # Remove placeholder files
        Get-ChildItem -Recurse -Filter ".placeholder" | Remove-Item
        

API Setup

Retold uses OpenAI for multimodal story comprehension and ElevenLabs for optional voice narration. Users must provide their own API keys.


        {
            "openai_api_key": "YOUR_API_KEY",
            "elevenlabs_api_key": "YOUR_ELEVENLABS_KEY"
        }
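
A minimal sketch of how these keys might be read at startup. The file name `config.json` and the `load_api_keys` helper are assumptions for illustration; the source does not say where Retold stores this JSON. The ElevenLabs key is treated as optional because narration is optional.

```python
import json
from pathlib import Path

# Key names taken from the JSON shown above.
REQUIRED_KEYS = ("openai_api_key",)
OPTIONAL_KEYS = ("elevenlabs_api_key",)  # voice narration is optional


def load_api_keys(path="config.json"):
    """Read API keys from a JSON file (hypothetical helper, assumed file name)."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if not data.get(key)]
    if missing:
        raise ValueError(f"Missing required keys: {', '.join(missing)}")
    # Return only the known keys; absent optional keys become None.
    return {key: data.get(key) for key in REQUIRED_KEYS + OPTIONAL_KEYS}
```

Failing early on a missing OpenAI key avoids starting a long analysis run that cannot finish.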
        

How Retold Works

Retold follows a four-stage pipeline inspired by video understanding research: ingest, extract, retell, and render.

  1. Ingest: Users place .mp4 movie files into the /movies directory.
  2. Extract: Retold loads or requests subtitles/scripts for narrative parsing.
  3. Retell: GPT-OSS identifies story structure, emotional peaks, and tension arcs, producing a cinematic prose retelling.
  4. Render: Selected scenes, narration, and timing cues are combined into a short trailer produced using MoviePy and FFmpeg.

Each run writes two files to the output directory:

        output/
        ├── MovieName_summary.txt
        └── MovieName_trailer.mp4
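
The four stages and the output naming above can be sketched as follows. This is an illustration under assumptions, not Retold's actual code: the function names are hypothetical, and the extract, retell, and render stages are passed in as plain callables so the skeleton stays independent of any specific model or renderer.

```python
from pathlib import Path


def find_movies(movies_dir="movies"):
    """Ingest: list the .mp4 files placed in the movies directory."""
    return sorted(Path(movies_dir).glob("*.mp4"))


def output_paths(movie_path, output_dir="output"):
    """Derive <name>_summary.txt and <name>_trailer.mp4 from the movie file name."""
    stem = Path(movie_path).stem
    out = Path(output_dir)
    return out / f"{stem}_summary.txt", out / f"{stem}_trailer.mp4"


def run_pipeline(movie_path, extract, retell, render, output_dir="output"):
    """Chain the stages; extract, retell and render are caller-supplied callables."""
    summary_path, trailer_path = output_paths(movie_path, output_dir)
    summary_path.parent.mkdir(parents=True, exist_ok=True)
    transcript = extract(movie_path)             # stage 2: subtitles or script
    retelling = retell(transcript)               # stage 3: prose retelling
    summary_path.write_text(retelling, encoding="utf-8")
    render(movie_path, retelling, trailer_path)  # stage 4: trailer video
    return summary_path, trailer_path
```

Keeping the stages as injected callables makes it possible to test the file handling with stubs before wiring in a real model or FFmpeg.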
        

Running Retold

        python3 main.py

A small GUI appears, allowing users to select movies, start generation, and view outputs in the /output directory.

Optional YouTube API Setup

Users may configure the YouTube API to upload generated trailers automatically. Only API projects that have passed YouTube's audit can publish public videos; uploads from unaudited projects are restricted to private.
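
As a rough sketch, the metadata for such an upload could be assembled like this. The `build_upload_body` helper is hypothetical and not taken from Retold; the `snippet` and `status.privacyStatus` fields follow the YouTube Data API v3 video resource. Authentication and the upload call itself are left out.

```python
VALID_PRIVACY = {"private", "unlisted", "public"}


def build_upload_body(title, description, privacy="private"):
    """Build the metadata body for a videos.insert request (hypothetical helper)."""
    if privacy not in VALID_PRIVACY:
        raise ValueError(f"privacy must be one of {sorted(VALID_PRIVACY)}")
    return {
        "snippet": {"title": title, "description": description},
        "status": {"privacyStatus": privacy},
    }
```

Defaulting to private matches what an unaudited project is limited to anyway. With the google-api-python-client library, a body like this would typically be passed to `videos().insert(part="snippet,status", ...)` together with the trailer file.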

Example Output

Each Retold run outputs a narrative summary and a one-minute cinematic edit based on selected scenes. These outputs support storytelling research, film metadata, content creation, and automated summarization workflows.
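
One simple way to fit selected scenes into a one-minute edit is a greedy pick by score under a time budget. The sketch below is an assumption for illustration only: the `Scene` type, its `score` field, and the greedy heuristic are hypothetical, and the source does not describe how Retold actually chooses scenes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scene:
    start: float  # seconds from the start of the film
    end: float
    score: float  # hypothetical emotional-intensity score, higher is stronger

    @property
    def duration(self):
        return self.end - self.start


def select_scenes(scenes, budget_seconds=60.0):
    """Greedily keep the highest-scoring scenes that fit the time budget."""
    chosen, used = [], 0.0
    for scene in sorted(scenes, key=lambda s: s.score, reverse=True):
        if used + scene.duration <= budget_seconds:
            chosen.append(scene)
            used += scene.duration
    # Play the picks in story order, not score order.
    return sorted(chosen, key=lambda s: s.start)
```

Sorting the final picks by start time keeps the trailer in the film's chronological order even though the scenes were chosen by score.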

Research Background

Retold emerged from a weekend experiment exploring whether short-form cinematic explainers could be automated using modern multimodal models. The project draws on research in video segmentation, emotion detection, and narrative extraction.

  • Scene Segmentation: Tencent SCRL – representation learning for video scenes.
  • Emotion Detection: Imentiv for multimodal emotional cues.
  • Tone Alignment: EMSYNC for soundtrack emotion alignment.
  • Story Generation: GPT-OSS for long-form narrative rewriting.
  • Voice: ElevenLabs for multi-character narration.
  • Rendering: MoviePy and FFmpeg for automated edit generation.
  • Music: Mubert for emotion-aware background audio.

Why I Built Retold

Retold was built as a creative exploration into whether AI can capture the emotional rhythm of a story, not just summarize plot points. The system analyzes entire films, identifies narrative beats, rewrites them in human-like prose, and then pairs the text with relevant scenes to form a cohesive short trailer. The result is an early experiment in automated cinematic storytelling.

Future Development

  • Automated music scoring aligned to emotional tone.
  • More advanced scene selection heuristics.
  • Shot-level semantic clustering.
  • Full multimodal input (video + audio + script) for better coherence.
  • Alternative narration styles with custom voice models.