Uberduck #tacotron-2-support

PixPrucer

09/12/2022, 5:10 PM

the whole path should look something like this

/content/drive/MyDrive/konryuu_talk.zip

PixPrucer

09/12/2022, 5:10 PM

just put

/content/drive/

in front of the 1st path and you should be fine

PixPrucer

09/12/2022, 5:11 PM

(and delete /patryck at the end)
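
A minimal sketch of the prefixing step described above, assuming Google Drive is mounted at /content/drive (the usual Colab mount point). The helper name `to_drive_path` is hypothetical and only for illustration:

```python
import posixpath

# Assumption: Drive is mounted at the usual Colab location.
DRIVE_MOUNT = "/content/drive"


def to_drive_path(relative_path: str, mount: str = DRIVE_MOUNT) -> str:
    """Prefix a Drive-relative path with the mount point."""
    # Strip leading slashes so the join does not discard the mount prefix.
    return posixpath.join(mount, relative_path.lstrip("/"))


print(to_drive_path("MyDrive/konryuu_talk.zip"))
```

In a notebook, checking `os.path.exists(...)` on the result before running the unzip cell is a cheap way to catch a path that still doesn't match.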

postmates!!

09/12/2022, 5:23 PM

alr

postmates!!

09/12/2022, 5:23 PM

thanks man

postmates!!

09/12/2022, 5:23 PM

couldntve done it without you

PixPrucer

09/12/2022, 5:25 PM

mepc36

09/12/2022, 5:52 PM

anyone know the format of the file that's supposed to be uploaded into

transcript.txt

in the notebook for finetuning a hifigan vocoder? https://colab.research.google.com/drive/1bLDMo8HprblyZEy4SuyzOG1xJFVF7ocA#scrollTo=oc-p_9o5eBdF

postmates!!

09/12/2022, 8:37 PM

dammit

PixPrucer

09/12/2022, 8:38 PM

Seems like the paths still don't match

PixPrucer

09/12/2022, 8:38 PM

I'm leaving you with solving that yourself tho because I need to sleep soon

postmates!!

09/12/2022, 8:40 PM

alr

mepc36

09/12/2022, 10:34 PM

I'm trying to use a hifigan vocoder I finetuned using the notebook in #841437191073955920, but when I use RADTTS (https://github.com/NVIDIA/radtts) to try to infer some .wavs, it ends up sounding like a super sped-up version of water flowing down a big tunnel: https://drive.google.com/drive/folders/1nKMwBepCMzZM-02qcS_JTbrBS5SlmxM2?usp=sharing

Any ideas?

- Did I not use enough .wavs to fine-tune it? (I used 150 of them)
- Do I have to wait longer for the model to finish? (I'm on step 2538225, epoch)
- Is there a config in that RADTTS github repo that I'm not using properly?

Thanks to anyone who can help

Cris140

09/13/2022, 12:47 AM

You're using a tacotron 2 model with radtts?

Cris140

09/13/2022, 12:47 AM

Or is it just a radtts model?

mega b

09/13/2022, 1:24 AM

Hi, what was your final loss?

Swiper

09/13/2022, 7:35 AM

if you're getting a tacotron output that sounds like this, does this mean I messed up somewhere in training, or does it mean it just needs to train longer?

PUMPKINEATER

09/13/2022, 8:11 AM

Train longer

Swiper

09/13/2022, 8:13 AM

mepc36

09/13/2022, 11:03 AM

Thanks for your help man...

TL;DR:

- I'm using a radtts model

When I try to synthesize Lupe's voice using radtts, I'm passing in radtts' pretrained model that they provide on their repo page (https://github.com/NVIDIA/radtts#radtts-pre-trained-models) as the feature prediction model, alongside the hifigan vocoder trained on Lupe Fiasco's voice by this Colab notebook I got from #841437191073955920: https://colab.research.google.com/drive/1SKu2xRJy5q1wzuP5CSO8dJ-Nf-UIKz0K

This is what the radtts command looks like when it successfully outputs an audible, synthesized .wav using the hifigan vocoder and pretrained radtts model (for feature prediction) that radtts supplies:

python3 ./inference.py \
  --radtts_path ./radtts_pretrained_dap_model.pt \
  --config ./config_ljs_dap.json \
  --vocoder_path ./hifigan_vocoder_from_radtts.pt \
  --config_vocoder ./hifigan_vocoder_config_from_radtts.json \
  --text_path ./tts-input-text.txt \
  --speaker ljs \
  --speaker_attributes ljs \
  --speaker_text ljs \
  --output_dir ~/4-tts-outputs/

But that same command outputs that garbled watery noise when I switch out the HG vocoder provided by radtts for the one I got using that Colab notebook I linked to above.

Do I have to replace their pretrained model with a new radtts model that I've trained, so that I'm using both a Lupe radtts model for feature prediction and a hifigan Lupe vocoder for audio synthesis as inputs to radtts' inference.py command? Or what am I doing wrong here?
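
One check that may be worth doing when swapping vocoders is comparing the JSON config the new vocoder was finetuned with against the one passed to `--config_vocoder`. Nothing in this thread confirms that a config mismatch is the cause of the noise. The sketch below is a generic dictionary diff, and the key names and values in the example are hypothetical placeholders rather than values read from either config:

```python
import json


def load_json(path: str) -> dict:
    """Read a JSON config file into a dict."""
    with open(path) as f:
        return json.load(f)


def diff_configs(a: dict, b: dict) -> dict:
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ.

    A key missing from one side shows up with None on that side.
    """
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Hypothetical example values, for illustration only.
provided = {"sampling_rate": 22050, "hop_size": 256}
finetuned = {"sampling_rate": 44100, "hop_size": 256}
print(diff_configs(provided, finetuned))
```

With real files, the two dicts would come from `load_json(...)` on the config shipped with radtts and on whichever config the finetuning notebook used (that second path is not given in the thread).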

mepc36

09/13/2022, 11:04 AM

Sorry man, I'm not sure how to check the final loss. I'm guessing that info is in the logs? If so, I lost it because I'm redoing training now. Apparently Colab Pro does disconnect the runtime, unlike what I thought... eek haha

mepc36

09/13/2022, 11:24 AM

here's my last full log that I got just now:

Current learning rate: 1.3413051059805782e-05
Epoch: 197
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
Steps : 2526015, Gen Loss Total : 40.273, Mel-Spec. Error : 0.539, s/b : 0.637
Steps : 2526020, Gen Loss Total : 43.103, Mel-Spec. Error : 0.566, s/b : 0.641
Steps : 2526025, Gen Loss Total : 44.086, Mel-Spec. Error : 0.606, s/b : 0.639
Steps : 2526030, Gen Loss Total : 41.951, Mel-Spec. Error : 0.562, s/b : 0.636
Time taken for epoch 197 is 13 sec
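
For the "final loss" question, a log like the one above can be scanned for the last `Steps : ...` line. This is a minimal sketch that assumes the log lines keep exactly the format shown in this message. The function name `parse_losses` is hypothetical:

```python
import re

# Matches lines such as:
# Steps : 2526015, Gen Loss Total : 40.273, Mel-Spec. Error : 0.539, s/b : 0.637
LOG_LINE = re.compile(
    r"Steps\s*:\s*(\d+),\s*Gen Loss Total\s*:\s*([\d.]+),"
    r"\s*Mel-Spec\. Error\s*:\s*([\d.]+)"
)


def parse_losses(log_text: str) -> list:
    """Return (step, gen_loss_total, mel_spec_error) tuples in log order."""
    return [
        (int(step), float(gen), float(mel))
        for step, gen, mel in LOG_LINE.findall(log_text)
    ]


sample = """
Steps : 2526015, Gen Loss Total : 40.273, Mel-Spec. Error : 0.539, s/b : 0.637
Steps : 2526020, Gen Loss Total : 43.103, Mel-Spec. Error : 0.566, s/b : 0.641
Steps : 2526025, Gen Loss Total : 44.086, Mel-Spec. Error : 0.606, s/b : 0.639
Steps : 2526030, Gen Loss Total : 41.951, Mel-Spec. Error : 0.562, s/b : 0.636
"""

rows = parse_losses(sample)
last_step, last_gen_loss, last_mel_error = rows[-1]
print(last_step, last_gen_loss, last_mel_error)
```

Saving the notebook's training output to a file on Drive would keep these numbers around after a runtime disconnect.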

Gamma Prime

09/13/2022, 2:06 PM

https://drive.google.com/file/d/1pHqkJTRgoJ2egecIbWtvzRS1080DEQOd/view?usp=sharing I don't know what's wrong with this. I thought I was careful enough with transcription, which I always do manually. The graphs looked promising in training, but I ended up with a model that can't say anything.

Radak

09/14/2022, 12:51 AM

I just submitted an AI to uberduck for Coco Quinn, for the singer category on text to speech. It should be trained off of 99 audios. Will it sound good?

Radak

09/14/2022, 12:53 AM

Do I use the API key for the bot command?

Dyno

09/14/2022, 12:53 AM

Read #answers, @Radak.

PixPrucer

09/14/2022, 5:49 PM

The Legacy Tacotron2 synthesis notebook seems to be outdated

PixPrucer

09/14/2022, 5:49 PM

Just 2 days ago it was fine, but now every time it runs into any

!wget

command it throws a fit

PixPrucer

09/14/2022, 6:21 PM

Oh it might actually be a case of a broken model, will have to inspect further

{K EY1} (Kei)

09/15/2022, 1:13 AM

does this error mean the model is broken?