
AI Agent for Educational Science Video Generation (Script → Scene → Video Pipeline)

We are building an AI-powered system to help teachers create high-quality educational videos that explain complex scientific concepts through visual storytelling.
Teachers already have strong explanations, analogies, and teaching intuition. However, they lack the production tools to translate these into clear, visual, animated content.
Our goal is to build an AI agent pipeline that converts structured teaching inputs into:
scene-based visual narratives
animation-ready outputs
final educational videos
This is not a simple prompt engineering task. We are looking for someone who can design and implement a robust multi-step AI workflow.

Project Goal

Build an AI agent (or system) that can reliably generate accurate, structured, and editable scene outputs, and convert them into educational videos.
Focus areas:
Scene generation accuracy (critical)
Human-in-the-loop validation
Video generation pipeline integration

Current Workflow (Target System)

We are aiming for the following pipeline:
1. Teacher generates script (via GPT / Gemini)
2. AI generates structured scene breakdowns (visuals, setting, concept mapping) at ~6–10 scenes per minute
3. Teacher reviews & corrects scene outputs (critical step)
4. AI generates video from validated scenes
5. Text-to-speech + sync
6. Final teacher revision
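
To make the ordering and the review gate concrete, the six steps above could be modeled roughly as follows. This is only a sketch: the names Stage, next_stage, and scene_budget are hypothetical and not part of any existing tool, and the rule that the pipeline stays in review until scenes are validated is our reading of "validated scenes" in step 4.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The six pipeline steps, in the order listed above (names are illustrative)."""
    SCRIPT = 1           # teacher generates script
    SCENE_BREAKDOWN = 2  # AI generates structured scenes
    TEACHER_REVIEW = 3   # teacher reviews and corrects scenes
    VIDEO = 4            # AI generates video from validated scenes
    TTS_SYNC = 5         # text-to-speech + sync
    FINAL_REVISION = 6   # final teacher revision


def next_stage(current: Stage, scenes_validated: bool) -> Stage:
    """Advance one step, but stay in review until the scenes are validated."""
    if current is Stage.FINAL_REVISION:
        raise ValueError("pipeline already finished")
    if current is Stage.TEACHER_REVIEW and not scenes_validated:
        return Stage.TEACHER_REVIEW
    return Stage(current + 1)


def scene_budget(script_minutes: float) -> tuple[int, int]:
    """Scene count range implied by the ~6-10 scenes per minute target."""
    if script_minutes <= 0:
        raise ValueError("script_minutes must be positive")
    return round(6 * script_minutes), round(10 * script_minutes)
```

Under these assumptions, a 3-minute script implies a budget of 18 to 30 scenes, and next_stage(Stage.TEACHER_REVIEW, False) keeps the pipeline in the review step.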

Current Challenges

We have tested tools like OpenArt and Gemini, but:
Scene outputs lack scientific accuracy
No reliable way to structure or control scene generation
Poor interoperability between tools (Gemini → OpenArt gap)
Weak support for a human review loop
Example attempt:

What We Need Help With

We are looking for someone to design and/or build:

Core Focus (Priority)

Step 2: Structured scene generation system
Step 3: Human-in-the-loop validation workflow
Step 4: Video generation pipeline integration
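
As a rough illustration of how Step 3 could gate Step 4, the sketch below tracks a review status per scene and only allows video generation once every scene is approved. All names here (ReviewStatus, ReviewedScene, ready_for_video) are hypothetical, and the choice to reset a scene to pending after an edit is an assumption, not a stated requirement.

```python
from dataclasses import dataclass, field
from enum import Enum


class ReviewStatus(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class ReviewedScene:
    scene_id: int
    description: str
    status: ReviewStatus = ReviewStatus.PENDING
    history: list[str] = field(default_factory=list)  # earlier descriptions

    def edit(self, new_description: str) -> None:
        """Teacher correction: keep the old text and require a fresh approval."""
        self.history.append(self.description)
        self.description = new_description
        self.status = ReviewStatus.PENDING

    def approve(self) -> None:
        self.status = ReviewStatus.APPROVED

    def reject(self) -> None:
        self.status = ReviewStatus.REJECTED


def ready_for_video(scenes: list[ReviewedScene]) -> bool:
    """True only if there is at least one scene and every scene is approved."""
    return bool(scenes) and all(s.status is ReviewStatus.APPROVED for s in scenes)
```

Keeping the edit history is one simple way to let a teacher see what the AI originally proposed; a real system might store richer diffs or reviewer comments instead.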

Expected Output

A working system or prototype that can:
Convert script → structured scene JSON (or similar format)
Allow teachers to review/edit scenes easily
Generate consistent visual outputs from scenes
Produce video outputs (using existing APIs or tools)
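
For the script → structured scene JSON step, one possible shape is sketched below. The field names (scene_id, visuals, setting, concept) are assumptions derived from the scene breakdown items listed earlier, not a fixed schema, and parse_scenes is a hypothetical helper that rejects malformed LLM output instead of passing it downstream.

```python
import json
from dataclasses import dataclass

# Assumed field names; the real schema is still to be designed.
REQUIRED_FIELDS = ("scene_id", "visuals", "setting", "concept")


@dataclass(frozen=True)
class Scene:
    scene_id: int
    visuals: str  # what is shown on screen
    setting: str  # where the scene takes place
    concept: str  # the scientific concept this scene maps to


def parse_scenes(raw_json: str) -> list[Scene]:
    """Parse LLM output into Scene objects, failing loudly on malformed data."""
    data = json.loads(raw_json)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of scenes")
    scenes = []
    for index, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"scene {index}: expected an object")
        missing = [name for name in REQUIRED_FIELDS if name not in item]
        if missing:
            raise ValueError(f"scene {index}: missing fields {missing}")
        scenes.append(Scene(
            scene_id=int(item["scene_id"]),
            visuals=str(item["visuals"]),
            setting=str(item["setting"]),
            concept=str(item["concept"]),
        ))
    return scenes
```

Extra keys in the model output are ignored in this sketch. A stricter version could reject them, or use a schema validation library, so that unexpected fields are surfaced to the teacher rather than silently dropped.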

Preferred Technical Approaches

We are open to other approaches, but examples include:
LLM orchestration (GPT, Gemini, Claude)
Agent frameworks (LangChain, CrewAI, etc.)
Image/video generation APIs (Runway, Pika, Stable Diffusion, etc.)
Node-based workflows (n8n is a plus)
Custom pipeline design (Python / JS)

Ideal Candidate

Experience building AI pipelines / agents (not just prompts)
Strong understanding of multimodal systems (text → image → video)
Ability to structure outputs (JSON / schema design)
Experience with human-in-the-loop systems
Bonus:
EdTech experience
Scientific/technical content familiarity
Experience with video generation tools