Summary
We are looking for a technically strong video editor or automation specialist to help us automate multi-angle interview video editing using AI-assisted tools.
Our interviews are recorded on Zoom and typically produce two synchronized video files:
• Gallery View (all participants)
• Speaker View (active speaker close-up)
e.g. interviewer close-up, speaker close-up, gallery
Note: sometimes the interview is recorded in a studio rather than on Zoom.
The goal is to automatically or semi-automatically cut between views based on who is speaking and the context — similar to a professionally edited interview.
We have already attempted this using Descript and other AI tools, but the results were inconsistent. We are now looking for someone with deeper expertise to either fix this workflow or design a better one.
Project Goal
Design a reliable workflow (automation or semi-automation) that:
• Takes multiple Zoom video files (Gallery View + Speaker View)
• Keeps them perfectly time-synced
• Detects who is speaking (interviewer vs interviewee)
• Automatically switches camera angles based on speaker or logic rules
• Outputs a clean, interview-style edited video
• Allows manual overrides or fine-tuning if needed
This does not need to be 100% fully automated — accuracy and editorial quality are more important than full automation.
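To illustrate the kind of rule-based switching we have in mind, here is a minimal Python sketch. It is not a required implementation. It assumes speaker turns have already been detected by some other tool and are available as timed segments. The names `Segment`, `Cut`, `plan_cuts`, and `min_closeup`, and the two-second threshold, are all hypothetical. The rule shown is deliberately simple: long turns use the Speaker View file, short back-and-forth exchanges fall back to the Gallery View file, and an editor can override any individual turn.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Segment:
    """One detected speaker turn (times in seconds on the shared timeline)."""
    start: float
    end: float
    speaker: str  # e.g. "interviewer" or "interviewee" (hypothetical labels)


@dataclass
class Cut:
    """One shot in the edited output, taken from a single source file."""
    start: float
    end: float
    angle: str  # "speaker" (Speaker View file) or "gallery" (Gallery View file)


def plan_cuts(
    segments: List[Segment],
    min_closeup: float = 2.0,
    overrides: Optional[Dict[int, str]] = None,
) -> List[Cut]:
    """Turn speaker segments into a contiguous list of cuts.

    Rule (an assumption, not a fixed requirement): a turn lasting at least
    `min_closeup` seconds uses the speaker close-up; shorter turns use the
    gallery. `overrides` maps a segment index (in start-time order) to a
    manually chosen angle.
    """
    overrides = overrides or {}
    cuts: List[Cut] = []
    for index, seg in enumerate(sorted(segments, key=lambda s: s.start)):
        duration = seg.end - seg.start
        angle = "speaker" if duration >= min_closeup else "gallery"
        angle = overrides.get(index, angle)  # manual choice wins

        if cuts and cuts[-1].angle == angle:
            # Same angle as the previous shot: extend it instead of cutting.
            cuts[-1].end = max(cuts[-1].end, seg.end)
        else:
            # Start where the previous shot ended so the timeline has no gaps.
            start = cuts[-1].end if cuts else seg.start
            cuts.append(Cut(start, seg.end, angle))
    return cuts


if __name__ == "__main__":
    turns = [
        Segment(0.0, 10.0, "interviewer"),
        Segment(10.0, 11.0, "interviewee"),
        Segment(11.0, 12.0, "interviewer"),
        Segment(12.0, 30.0, "interviewee"),
    ]
    for cut in plan_cuts(turns):
        print(f"{cut.start:6.1f} -> {cut.end:6.1f}  {cut.angle}")
```

A cut list like this could then be exported to whichever editor is used (for example as markers or an edit decision list), which keeps the switching rules separate from the tool and makes them repeatable across recordings.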
Finally, the workflow should allow adding a template background of our choice.
What We Already Tried
• Descript (Combine into sequence, Speaker Detection)
• Auto speaker-based cuts
• Multi-file synchronization
Issues encountered:
• Speaker view not switching reliably
• Poor control over when to use gallery vs close-up
• Inconsistent results across different recordings
We are open to:
• Improving the Descript workflow
• Using a different AI video editing tool
• Combining AI tools with Premiere Pro / After Effects
• Building a repeatable semi-automated pipeline
What We’re Looking For
You should have experience with one or more of the following:
• AI-assisted video editing workflows
• Multi-camera / multi-angle interview editing
• Zoom recording formats (gallery view, speaker view)
• Speaker detection / audio-driven edits
• Tools such as Descript, Premiere Pro, After Effects, or similar
• Designing repeatable workflows, not just one-off edits
Strong English communication is important, as the final videos are in English.





