Voice Cloning Plugin - Native DAW Plugin for Movie Dubbing

Audio · AI/ML · C++ · JUCE · RVC · Plugin · Film Dubbing
INTRODUCTION
RVC (Retrieval-based Voice Conversion) makes high-quality voice cloning possible, but the open-source ecosystem ships it as a Python web tool — fine for hobbyists, useless for film post-production where dubbing artists work entirely inside DAWs and timing has to be frame-accurate. For a movie dubbing project, the dubbing team needed voice conversion as a native step in their existing pipeline, not a context switch to a browser.

I wrapped RVC inside a native VST3/AU plugin so the entire flow happens inside the DAW: drop the plugin on a track, pick a trained voice checkpoint, render straight to the timeline. I trained custom checkpoints per voice and packaged them with the plugin so the dubbing team could load any voice like a personal instrument.
MY ROLE
Designer & Developer — model training (RVC), plugin architecture (JUCE C++), checkpoint distribution, packaging for cross-DAW compatibility (VST3 / AU).
timeline
Late 2024
situation
Open-source RVC ships as a Python web app. For a film dubbing pipeline that's a workflow break: stop the DAW session, export audio, open a browser, upload, generate, download, re-import, re-sync. Frame accuracy, automation, undo history - all lost in the round-trip. Native plugins are how serious audio tools live - inside the session, on the track, rendered to the timeline. The gap between "powerful AI model" and "tool a film audio team can actually use" was a plugin.
task
  • Wrap RVC inference inside a native plugin format (VST3 / AU) that any DAW can load
  • Train custom voice checkpoints per voice cast member and distribute them with the plugin
  • Make the inference run inline — dubbing input on track, converted audio renders straight to timeline, frame-accurate
  • Cross-platform packaging — works in Logic, Ableton, FL Studio, anywhere with VST3 / AU support
  • Deliver a workflow the dubbing engineer can use without thinking about Python, models, or checkpoints — just track, plugin, render
action
  • Plugin shell - JUCE (C++). JUCE handles cross-DAW abstraction, audio I/O, parameter automation, GUI rendering. Plugin presents a clean track-loaded interface — pick checkpoint, set conversion parameters, hit render.
  • Inference layer - RVC. RVC's voice-conversion model is integrated for inline audio processing: input audio buffer → conversion → output buffer, back into the DAW pipeline.
  • Checkpoint distribution. Trained models packaged alongside the plugin so the dubbing team got each voice as an instrument the moment they installed. Per-voice checkpoint approach turned the plugin into a personalized voice library rather than a generic tool.
  • Multi-DAW packaging. Plugin compiled for both VST3 and AU, covering the major DAWs on macOS (AU and VST3) and Windows (VST3).
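The inline inference layer above hides a buffering problem: the DAW delivers audio in arbitrary block sizes, while a model like RVC wants fixed-size chunks. A minimal sketch of that chunking layer in plain C++, with a stand-in conversion callback where the model would run (`ChunkedConverter` is an illustrative name, not the plugin's real API):

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <vector>

// Hypothetical sketch: accumulate DAW input until a full model-sized chunk is
// ready, run the conversion, and queue converted samples for output. Output
// lags input by up to one chunk — the latency a real plugin would report to
// the host so the rendered audio stays frame-accurate on the timeline.
class ChunkedConverter {
public:
    using ConvertFn = std::function<std::vector<float>(const std::vector<float>&)>;

    ChunkedConverter(std::size_t chunkSize, ConvertFn convert)
        : chunkSize_(chunkSize), convert_(std::move(convert)) {}

    // Called once per DAW block: push input, pull whatever output is ready.
    void process(const float* in, float* out, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            pending_.push_back(in[i]);
            if (pending_.size() == chunkSize_) {           // full chunk: run inference
                for (float s : convert_(pending_)) ready_.push_back(s);
                pending_.clear();
            }
        }
        for (std::size_t i = 0; i < n; ++i) {              // emit converted audio,
            if (!ready_.empty()) { out[i] = ready_.front(); ready_.pop_front(); }
            else out[i] = 0.0f;                            // or silence while priming
        }
    }

private:
    std::size_t chunkSize_;
    ConvertFn convert_;
    std::vector<float> pending_;   // samples waiting for a full chunk
    std::deque<float> ready_;      // converted samples awaiting output
};
```

In the real plugin the callback body is the RVC inference call and the one-chunk lag is declared to the host as plugin latency, which the DAW compensates for automatically.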
result
  • Cross-DAW native plugin (VST3 + AU)
  • Custom voice checkpoints trained and shipped per voice in the dubbing cast
  • Output renders directly to the DAW timeline — no export/import cycle, frame-accurate
  • Dubbing engineers worked inside their session without touching Python, the model, or the checkpoint files directly
  • [need from you — film name (if shareable) or generic "feature film", number of voices trained, any quote from dubbing supervisor]