AI Agent for Educational Science Video Generation (Script → Scene → Video Pipeline)

We are building an AI-powered system to help teachers create high-quality educational videos that explain complex scientific concepts through visual storytelling.

Teachers already have strong:

• explanations
• analogies
• teaching intuition

However, they lack the production tools to translate these into clear, visual, animated content.

Our goal is to build an AI agent pipeline that converts structured teaching inputs into:

• scene-based visual narratives
• animation-ready outputs
• final educational videos

This is not a simple prompt-engineering task. We are looking for someone who can design and implement a robust, multi-step AI workflow.

Project Goal

Build an AI agent (or system) that can reliably generate accurate, structured, and editable scene outputs, and convert them into educational videos.

Focus areas:

• Scene generation accuracy (critical)
• Human-in-the-loop validation
• Video generation pipeline integration

Current Workflow (Target System)

We are aiming for the following pipeline:

1. Teacher generates a script (via GPT / Gemini)
2. AI generates structured scene breakdowns:
   ◦ visuals
   ◦ setting
   ◦ concept mapping
   ◦ ~6–10 scenes per minute of video
3. Teacher reviews and corrects the scene outputs (critical step)
4. AI generates video from the validated scenes
5. Text-to-speech narration, synced to the visuals
6. Final teacher revision
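The target workflow above can be sketched as a small orchestration skeleton. This is a minimal sketch, assuming Python; every function name here (`scene_budget`, `split_into_scenes`, `run_pipeline`) is hypothetical, and the LLM, video, and text-to-speech calls are replaced by placeholders. It illustrates two ideas from the workflow: checking the scene count against the ~6–10 scenes per minute target, and making teacher review (step 3) a required stage before anything is rendered.

```python
import math
from typing import Callable

# Target scene density from the workflow: roughly 6-10 scenes per minute of video.
SCENES_PER_MINUTE = (6, 10)


def scene_budget(duration_s: float) -> tuple[int, int]:
    """Return the (min, max) scene count expected for a video of duration_s seconds."""
    minutes = duration_s / 60
    return (math.floor(minutes * SCENES_PER_MINUTE[0]),
            math.ceil(minutes * SCENES_PER_MINUTE[1]))


def split_into_scenes(script: str) -> list[str]:
    # Hypothetical placeholder for step 2: one scene per blank-line-separated paragraph.
    # A real system would request structured scene output from an LLM instead.
    return [p.strip() for p in script.split("\n\n") if p.strip()]


def run_pipeline(script: str, duration_s: float,
                 review: Callable[[list[str]], list[str]]) -> list[str]:
    scenes = split_into_scenes(script)
    low, high = scene_budget(duration_s)
    if not low <= len(scenes) <= high:
        # Flag rather than fail: the teacher decides whether the pacing works.
        print(f"warning: {len(scenes)} scenes, expected {low}-{high} for {duration_s:.0f}s")
    approved = review(scenes)  # step 3: teacher edits and approves the scenes
    # Placeholder for steps 4-5: a real system would call video and TTS APIs here.
    return [f"<clip: {scene}>" for scene in approved]
```

For a 90-second video, `scene_budget(90)` returns `(9, 15)`. Passing the review step in as a callback keeps the skeleton tool-agnostic: the same function could be backed by a web form, a spreadsheet, or an approval step in an n8n workflow.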

Current Challenges

We have tested tools like OpenArt and Gemini, but:

• Scene outputs lack scientific accuracy
• No reliable way to structure or control scene generation
• Poor interoperability between tools (Gemini → OpenArt gap)
• Weak support for a human review loop

Example attempt:

https://openart.ai/story/share/km3fvMjJhPSlZYxYCenH

What We Need Help With

We are looking for someone to design and/or build:

Core Focus (Priority)

• Step 2: Structured scene generation system
• Step 3: Human-in-the-loop validation workflow
• Step 4: Video generation pipeline integration
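For the human-in-the-loop validation workflow (step 3), one option is to store every teacher correction as a field-level edit instead of silently overwriting the scene. The sketch below assumes Python; `ReviewedScene` and its methods are hypothetical names, not part of any existing tool. Any real change clears the approval flag, so a scene cannot reach video generation with unreviewed edits, and the edit log can be undone or later analyzed to see which kinds of scene errors the model makes most often.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewedScene:
    # Hypothetical structure: current scene field values plus a log of teacher edits.
    values: dict[str, str]
    history: list[tuple[str, str, str]] = field(default_factory=list)  # (field, old, new)
    approved: bool = False

    def edit(self, name: str, new_value: str) -> None:
        """Apply a teacher correction to one field and record it in the history."""
        if name not in self.values:
            raise KeyError(f"unknown scene field: {name}")
        old = self.values[name]
        if old != new_value:
            self.history.append((name, old, new_value))
            self.values[name] = new_value
            self.approved = False  # any real change requires a fresh approval

    def undo(self) -> None:
        """Revert the most recent edit, if there is one."""
        if self.history:
            name, old, _ = self.history.pop()
            self.values[name] = old
            self.approved = False

    def approve(self) -> None:
        self.approved = True
```

An edit that leaves a field unchanged is a no-op, so re-saving a form without changes does not force a second approval.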

Expected Output

A working system or prototype that can:

• Convert script → structured scene JSON (or similar format)
• Allow teachers to review/edit scenes easily
• Generate consistent visual outputs from scenes
• Produce video outputs (using existing APIs or tools)
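The "script → structured scene JSON" output can be pinned down with an explicit schema, so that every scene carries the same fields and can be checked automatically before it reaches a teacher. A minimal sketch in Python using only the standard library, assuming the field names below; they are hypothetical, chosen to mirror the visuals / setting / concept mapping breakdown in the workflow.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Scene:
    # Hypothetical fields mirroring the scene breakdown: visuals, setting, concept mapping.
    scene_id: int
    narration: str
    visuals: str
    setting: str
    concept: str
    duration_s: float
    approved: bool = False  # set to True only by a teacher during review


def validate_scene(scene: Scene) -> list[str]:
    """Return human-readable problems; an empty list means the scene passes."""
    problems = []
    for name in ("narration", "visuals", "setting", "concept"):
        if not getattr(scene, name).strip():
            problems.append(f"scene {scene.scene_id}: '{name}' is empty")
    if scene.duration_s <= 0:
        problems.append(f"scene {scene.scene_id}: duration must be positive")
    return problems


def scenes_to_json(scenes: list[Scene]) -> str:
    return json.dumps({"scenes": [asdict(s) for s in scenes]}, indent=2)


def scenes_from_json(text: str) -> list[Scene]:
    # Scene(**item) raises TypeError on unknown keys or missing required keys,
    # which catches malformed LLM output before it reaches review.
    return [Scene(**item) for item in json.loads(text)["scenes"]]
```

These checks only catch structural problems; scientific accuracy still depends on the teacher review step. A production version might swap the hand-written checks for a library such as Pydantic or a JSON Schema validator, which also enforce field types.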

Preferred Technical Approaches

We are open, but examples include:

• LLM orchestration (GPT, Gemini, Claude)
• Agent frameworks (LangChain, CrewAI, etc.)
• Image/video generation APIs (Runway, Pika, Stable Diffusion, etc.)
• Node-based workflows (n8n is a plus)
• Custom pipeline design (Python / JS)

Ideal Candidate

• Experience building AI pipelines / agents (not just prompts)
• Strong understanding of multimodal systems (text → image → video)
• Ability to structure outputs (JSON / schema design)
• Experience with human-in-the-loop systems
• Bonus:
   ◦ EdTech experience
   ◦ Scientific/technical content familiarity
   ◦ Experience with video generation tools