So I wanted some voice conversion to make Fam’s speech sound less robotic, but my search turned up:

Most of these lean heavily on machine learning and therefore require expensive GPUs; otherwise, you’re stuck with something basic like PyWorld.

I found an interesting GUI and a video showcasing it, which helps quite a bit but also shows its limitations.