Wow. Amazing. I have so many questions.
Did you get your 330 samples from multiple acapellas? Is that alright? I thought that if someone sings in different styles, those recordings shouldn’t be in the same dataset, right? I also find myself skipping lots of parts of a song that have background singing.
Did you clean up the headphone bleed on the acapellas before training? Or were the acapellas already perfectly isolated vocals?
Approximately how many seconds per sample do you recommend?