Uberduck #tacotron-2-support

Amizade | Pony's voice creator

12/11/2022, 1:28 PM

https://colab.research.google.com/drive/1oKikTdvD_vVUwnD7JNIN7JRGGaZuYCEi#scrollTo=UP1trdpN_jV6

Amizade | Pony's voice creator

12/11/2022, 1:29 PM

What's wrong?

hecko

12/11/2022, 1:35 PM

i can't access that because it's your private copy

hecko

12/11/2022, 1:36 PM

please send me the dataset

Amizade | Pony's voice creator

12/11/2022, 1:36 PM

wait a minute

hecko

12/11/2022, 1:37 PM

the notebook won't help me

hecko

12/11/2022, 1:37 PM

i need the dataset

Amizade | Pony's voice creator

12/11/2022, 1:37 PM

dataset?

hecko

12/11/2022, 1:37 PM

yes

hecko

12/11/2022, 1:37 PM

the audio

Amizade | Pony's voice creator

12/11/2022, 1:37 PM

oh ok

Amizade | Pony's voice creator

12/11/2022, 1:37 PM

I'll send the folder

Amizade | Pony's voice creator

12/11/2022, 1:39 PM

@hecko

hecko

12/11/2022, 1:42 PM

so i see you have a 30.mp3 file

hecko

12/11/2022, 1:42 PM

which might be the issue

hecko

12/11/2022, 1:42 PM

but you also have 8.wav which is stereo instead of mono

hecko

12/11/2022, 1:42 PM

and 30.wav is stereo too

hecko

12/11/2022, 1:43 PM

but 30-2.wav is mono and good

Amizade | Pony's voice creator

12/11/2022, 1:43 PM

Ah yes, it's just that I forgot to convert them when I isolated the voice and removed the background music

Amizade | Pony's voice creator

12/11/2022, 1:43 PM

convert to wav file

Amizade | Pony's voice creator

12/11/2022, 1:51 PM

okay, thank you
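
The checks hecko describes above (a stray .mp3 file, and .wav files that are stereo instead of mono) can be sketched as a short script. This is only a minimal sketch using Python's standard `wave` module; the function name `find_dataset_problems` and the folder layout are assumptions for illustration, not part of the Uberduck notebook.

```python
import wave
from pathlib import Path


def find_dataset_problems(folder):
    """Return (filename, problem) pairs for files that are not mono WAV.

    Hypothetical helper: it only checks the file extension and the
    channel count, the two issues raised in the conversation.
    """
    problems = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        # Anything that is not a .wav file (for example 30.mp3) is flagged.
        if path.suffix.lower() != ".wav":
            problems.append((path.name, "not a .wav file"))
            continue
        try:
            with wave.open(str(path), "rb") as wav:
                channels = wav.getnchannels()
        except (wave.Error, EOFError):
            # The wave module only reads uncompressed PCM WAV files.
            problems.append((path.name, "unreadable as PCM WAV"))
            continue
        # Stereo files report 2 channels; the dataset should be mono.
        if channels != 1:
            problems.append((path.name, f"{channels} channels, expected mono"))
    return problems
```

Calling `find_dataset_problems("wavs")` on a folder (the name `wavs` is a placeholder) lists each file that needs converting before training. It does not convert anything itself.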

AhmadGT

12/12/2022, 8:55 PM

which one should i keep my eye on? this or the other?

AhmadGT

12/12/2022, 8:56 PM

(the other)

AhmadGT

12/12/2022, 8:56 PM

i think they are the same :)

tangynacho

12/12/2022, 9:13 PM

Hey guys! Pretty new to this stuff. I've been training on a pretty sizeable dataset (approx 750 files, avg time maybe 4-5 seconds each) for quite a few hours now. I don't really know what the graphs are supposed to look like. But I noticed there's an audio tab on TensorBoard so I refresh and check that every once in a while. It's about 75 epochs in and the audio still sounds extremely demonic. The one that says "AudioSample/train" also sounds really demonic. Is this normal? Intuitively the train file would be the original file and so it should sound normal, is that not the case? And if this is all normal, do I just need to let it train for a while longer? Thanks in advance!

Gosmokeless28

12/12/2022, 9:44 PM

Yes it's normal, that's just the audio quality of the preview clips.

tangynacho

12/12/2022, 9:45 PM

Ok cool, thanks. So, what's a good way to know when it's trained enough?

Gosmokeless28

12/12/2022, 9:53 PM

Personally, I use what I call "one size fits all" parameters when it comes to training Uberduck Pipeline Tacotron 2 models. Here are my recommendations:
Epochs: 200
Batch Size: 8
Learning Rate Decay: ☑️
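
To put those numbers in context for a dataset of roughly 750 files, the arithmetic below estimates how many training iterations 200 epochs at batch size 8 would take. It is a rough sketch: it assumes all 750 files are used for training (no validation split), and whether the trainer drops the final partial batch is unknown, so `drop_last` is a hypothetical switch covering both cases.

```python
import math


def training_steps(num_files, batch_size, epochs, drop_last=False):
    """Return (iterations per epoch, total iterations) for a training run."""
    if drop_last:
        # The final incomplete batch is discarded each epoch.
        per_epoch = num_files // batch_size
    else:
        # The final incomplete batch still counts as one iteration.
        per_epoch = math.ceil(num_files / batch_size)
    return per_epoch, per_epoch * epochs


per_epoch, total = training_steps(750, 8, 200)
print(per_epoch, total)  # 94 18800
```

With the partial batch dropped, the same call gives 93 iterations per epoch and 18600 in total.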

tangynacho

12/12/2022, 10:03 PM

Ok so you think I should let it run until it reaches 200 epochs?