So I wanted some voice conversion to make Fam’s speech sound less robotic, but here’s what I found:
- PyWorld: male and female voices alike came out sounding like a young boy
- SoftVC VITS Singing Voice Conversion: outdated
- Amphion: bundles a lot into one package, which makes it huge (14.6 GB just for Vevo)!
Most of these options lean heavily on machine learning and therefore need expensive GPUs; otherwise you’re stuck with something basic like PyWorld.
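For context on what “basic” means here: PyWorld (a Python wrapper for the WORLD vocoder) splits audio into F0 (pitch), a spectral envelope, and aperiodicity, and you change a voice by editing those before resynthesizing. The sketch below is a minimal numpy-only illustration under that assumption. `shift_pitch`, `warp_envelope` and the factor values are hypothetical helpers for illustration, not part of PyWorld. Raising pitch and formants with simple fixed factors may affect perceived speaker size and age as much as gender, which could be part of why results drift toward a young boy.

```python
import numpy as np


def shift_pitch(f0: np.ndarray, ratio: float) -> np.ndarray:
    """Scale voiced F0 values by `ratio`; unvoiced frames (f0 == 0) stay 0."""
    return np.where(f0 > 0, f0 * ratio, 0.0)


def warp_envelope(sp: np.ndarray, alpha: float) -> np.ndarray:
    """Shift formants by resampling each frame's spectral envelope along frequency.

    `sp` has shape (frames, bins). alpha > 1 moves formants up,
    alpha < 1 moves them down.
    """
    n_bins = sp.shape[1]
    bins = np.arange(n_bins)
    # Output bin k takes the value the input envelope had at bin k / alpha.
    src = bins / alpha
    warped = np.empty_like(sp)
    for i, frame in enumerate(sp):
        # np.interp clamps positions past the last bin to the edge value.
        warped[i] = np.interp(src, bins, frame)
    return warped


# Assumed PyWorld workflow (not run here; x must be a float64 array):
#   import pyworld
#   f0, sp, ap = pyworld.wav2world(x, fs)
#   y = pyworld.synthesize(shift_pitch(f0, 1.5), warp_envelope(sp, 1.15), ap, fs)
```

Tuning pitch and formant factors separately like this is about as far as the signal-processing route goes; anything that actually changes timbre convincingly is where the ML-based tools come in.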
I did find an interesting GUI, along with a video showcasing it, which helps quite a bit but also shows its limitations.