Uberduck #tacotron-2-support

JunKingchinaman

04/17/2023, 11:26 PM

@JunKingchinaman

JunKingchinaman

04/17/2023, 11:27 PM

I need to clone a face now. How much does that cost?

TheOnlyZac

04/18/2023, 2:00 AM

Does anyone know if capitalization in the training set matters?

TheOnlyZac

04/18/2023, 2:01 AM

in the transcripts

YTR76

04/18/2023, 2:08 AM

It could matter

Raw

04/18/2023, 2:15 AM

Is this the thread for questions?

Gosmokeless28

04/18/2023, 6:54 AM

Lol, no.

Gosmokeless28

04/18/2023, 6:54 AM

No.

Gosmokeless28

04/18/2023, 6:54 AM

Yes.

Lady Espeon

04/20/2023, 6:58 AM

I feel utterly defeated. I have everything ready. The multiple wav files, the transcript... yet I can't successfully put it in Tacotron, no matter what I do. I have spent the last 4 days on this to get everything ready 😞 (also, I'm new to this)

Lady Espeon

04/20/2023, 7:47 AM

https://cdn.discordapp.com/attachments/994486394049282058/1098515307330539580/Screenshot_20230420_034249_Google.jpg

Gosmokeless28

04/20/2023, 7:50 AM

That doesn't look like the notebook that's linked in #841437191073955920

Lady Espeon

04/20/2023, 7:28 PM

Sorry, I'm still not familiar. I did do the pipeline one but it wouldn't really work either. I gave up on it 😞

wiry-church

04/21/2023, 1:34 AM

what does

RuntimeError: shape '[1, 1, 147674]' is invalid for input of size 295348

mean?

The Watts and the Waves

04/21/2023, 2:24 AM

The wavs aren't formatted correctly, if I recall.

gforonda

04/21/2023, 11:36 AM

This is a sample of my upcoming Lammy (Parappa the Rapper) model at 180 epochs: https://cdn.discordapp.com/attachments/994486394049282058/1098935225230307408/paradd.wav

https://cdn.discordapp.com/attachments/994486394049282058/1098935225557467206/Lammy_Worried.png

hecko

04/21/2023, 5:47 PM

It means you didn't set the sample rate to 22050 Hz.
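A note on the error quoted above: 295348 is exactly 2 × 147674, which is what a two-channel (stereo) file would produce if the loader expected a single channel, so the channel count may be worth checking alongside the sample rate. The following is a minimal sketch using Python's standard `wave` module; the function name `wav_problems` and the mono requirement are assumptions for illustration, not something taken from the training notebook.

```python
import wave

TARGET_RATE = 22050   # sample rate mentioned in the thread
TARGET_CHANNELS = 1   # assumption: the training code expects mono audio


def wav_problems(path):
    """Return a list of format problems found in one wav file (empty if none)."""
    try:
        with wave.open(path, "rb") as wav:
            channels = wav.getnchannels()
            rate = wav.getframerate()
    except wave.Error as exc:
        # The stdlib reader only handles uncompressed PCM wav files.
        return [f"not readable as PCM wav: {exc}"]

    problems = []
    if channels != TARGET_CHANNELS:
        problems.append(f"{channels} channels, expected {TARGET_CHANNELS}")
    if rate != TARGET_RATE:
        problems.append(f"{rate} Hz, expected {TARGET_RATE} Hz")
    return problems
```

Running this over every file in the dataset folder before training would show which clips need to be converted in an audio editor.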

みんなで語る未解決事件

04/21/2023, 9:25 PM

Has anyone built code to batch-convert Whisper transcriptions into the Tacotron2 dataset format? For example: wavs/1-1.wav|How are you. and wavs/1-2.wav|I want car. A non-programmer would lose a day of leisure time just to add three minutes' worth of data 🥺. It would be great if it worked in Colab and wrote the transcriptions of the audio files in a specified folder to a txt file. However, I am not able to write the code myself.

hecko

04/21/2023, 11:29 PM

is this what you want? https://colab.research.google.com/drive/1ipCilWgbrBcECU29OH80F0MyPL8KdLUh it's in #841437191073955920
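For readers who want to see the shape of the task, here is a rough sketch of turning a folder of wav files into `wavs/name.wav|transcript` lines. It is not the linked notebook's code. The helper names `build_filelist` and `write_filelist` are hypothetical, and the transcriber is passed in as a plain function so the sketch does not depend on any particular speech-to-text library.

```python
from pathlib import Path


def build_filelist(wav_dir, transcribe, prefix="wavs"):
    """Build 'prefix/file.wav|transcript' lines for every wav in wav_dir.

    `transcribe` is any function that takes a file path (str) and returns text.
    """
    lines = []
    for wav_path in sorted(Path(wav_dir).glob("*.wav")):
        text = transcribe(str(wav_path))
        # '|' is the field separator, so remove it from the transcript,
        # then collapse runs of whitespace into single spaces.
        text = " ".join(text.replace("|", " ").split())
        lines.append(f"{prefix}/{wav_path.name}|{text}")
    return lines


def write_filelist(lines, out_path):
    """Write the lines to a UTF-8 text file, one entry per line."""
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")


# Possible usage with the openai-whisper package (assumed to be installed):
#   import whisper
#   model = whisper.load_model("base")
#   lines = build_filelist("wavs", lambda p: model.transcribe(p)["text"])
#   write_filelist(lines, "list.txt")
```

Automatic transcripts usually still need a manual read-through, since misheard words end up in the training data.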

みんなで語る未解決事件

04/22/2023, 12:52 PM

Oh my god, thanks. I see the Colab code now.

tanooki426

04/22/2023, 2:14 PM

General question: does a lower batch size require a lower learning rate? Asking because my Derek Stiles impersonator has 98 voice clips which resulted in a batch size of 3 (6 minutes total worth of voice clips) but I kept the learning rate at the default level

Sonic2022_mario

04/22/2023, 5:18 PM

Darn It, My GPU Limit Has Ended, I Can't Train Models

https://cdn.discordapp.com/attachments/994486394049282058/1099383779417673758/Screenshot_2023-04-22_131824.jpg

Minecraftian47 (make x from y)

04/22/2023, 7:22 PM

That's what alt accounts are for.

Minecraftian47 (make x from y)

04/22/2023, 7:22 PM

Just dump all your models in one big shared folder.

Gosmokeless28

04/22/2023, 8:07 PM

> does a lower batch size require a lower learning rate?

Yes in some cases

tanooki426

04/22/2023, 8:11 PM

So if I have a batch size of 3 but 10 minutes worth of voice clips, what should the learning rate be?

tanooki426

04/22/2023, 8:12 PM

Also, should each voice clip in a dataset be under 10 seconds?

Gosmokeless28

04/22/2023, 8:15 PM

Well, I'm not an expert on learning rate, but if I had to guess, I would say 5e-4 with learning rate decay disabled.
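As background to the batch size question, two common rules of thumb scale the learning rate with the batch size: linearly, or by the square root of the ratio. The sketch below only illustrates that arithmetic. The function name and the reference values (a learning rate of 1e-3 tuned at batch size 64) are assumptions for the example, not settings confirmed for the notebook discussed here, and neither rule is guaranteed to suit Tacotron 2.

```python
import math


def scaled_learning_rate(base_lr, base_batch_size, batch_size, rule="linear"):
    """Scale a reference learning rate to a different batch size.

    rule="linear": lr grows or shrinks in proportion to the batch size.
    rule="sqrt":   lr changes with the square root of the batch size ratio.
    """
    ratio = batch_size / base_batch_size
    if rule == "linear":
        return base_lr * ratio
    if rule == "sqrt":
        return base_lr * math.sqrt(ratio)
    raise ValueError(f"unknown rule: {rule!r}")


# Hypothetical reference point: lr 1e-3 tuned for batch size 64.
for rule in ("linear", "sqrt"):
    print(rule, scaled_learning_rate(1e-3, 64, 3, rule))
```

Both heuristics give a starting point only; watching the loss curve for divergence or stalling is still the practical check.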

Gosmokeless28

04/22/2023, 8:16 PM

Each voice clip in a dataset should be under 15 seconds long, if I recall correctly.
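To find which clips exceed a length limit, the durations can be read straight from the wav headers. This is a small sketch with the standard `wave` module; the 15-second default comes from the recollection above and is not a verified limit, and the function names are hypothetical.

```python
import wave
from pathlib import Path

MAX_SECONDS = 15.0  # limit recalled in the thread, not a verified value


def clip_seconds(path):
    """Return the duration of one PCM wav file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def clips_over_limit(wav_dir, max_seconds=MAX_SECONDS):
    """List (file name, duration) for every wav longer than max_seconds."""
    over = []
    for wav_path in sorted(Path(wav_dir).glob("*.wav")):
        seconds = clip_seconds(wav_path)
        if seconds > max_seconds:
            over.append((wav_path.name, round(seconds, 2)))
    return over
```

Any clip it reports can then be split at a pause in an audio editor, with the transcript line split to match.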

tanooki426

04/22/2023, 8:16 PM

My current dataset has 98 voice clips totaling 6 minutes, but I think I'm going to add more, split some of the 10-second voice clips into shorter intervals, and see what happens.