Uberduck

For some reason my voice I'm doing now can't do anything exept regurgitate what I taught it

I've been told it's because I don't have enough audio so i just added more

might've fine tuned it too much as well, make sure you have a holdout set of wave files you use for a validation file

basically that's data which is never shown to the network for training but periodically compared against your network during training to see a) that its learning the patterns and b) not just learning to mimic the exact training data

Do I just do the audio in the zip but seperate the transcriptions between two files?

if you have no validation data that's most likely the issue, note: validation.txt and training.txt should have no overlaps

I'll do this next model since im already training it. If there's issues I'll go back and take a third of it or so out

basically yes, and make sure that you point to the validation script in your config files (or CLI inputs)

What do I do though if some of it is singing and some of it is talking?

Wouldn't it pick up on inconsistant things?

Since one of them is natural, one is on beat

I haven't worked with singing actually which is why I was asking here

<@608232523230871552>  actually I was looking at your offline talknet singing repo, i think its forked from the offline model I used to do general talknet training but can I use either of those repos to make a multipurpose voice and do I need to do anything specific (like make regular audio and singing into seperate speakers) or I can train it all as once. If I want to train both at once do I use this script https://github.com/justinjohn0306/ControllableTalkNet-Singer  or the general offline talknet its forked from

How do you train something for multiple speakers? What's that mean in techincal terms.

I'm computer savvy just new to voice AI stuff

you know how the dataset has fileloc|text spoken|id format

Welll id format isn't a thing in talknet as far as I can tell

that id can correspond to different people for talknet2 models, and then you can train a multiple speaker dataset