Uberduck #machine-learning

mepc36

12/16/2022, 8:31 PM

has anyone come across a tool that automatically removes unsuitable audio from a tts training dataset?

mepc36

12/16/2022, 8:32 PM

Here are the classes of unsuitable audio I'd like it to detect:
1. The audio has too much noise in it.
2. The speech in the audio does not match the transcription.
3. The speaker talks too quickly to be understood.
4. The audio captures a different speaker than the labeled speaker.
5. The speech contains out-of-vocabulary words.
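
[Editor's sketch] Two of these checks (transcript mismatch and speaking rate) can be roughed out in plain Python, assuming each clip already has an ASR transcript from some speech recognizer. The `Clip` fields, the function names, and both thresholds are hypothetical choices for illustration, not part of any existing tool; noise and speaker checks would need audio models and are left out.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    path: str
    transcript: str    # the label text shipped with the dataset
    asr_text: str      # assumed: output of whatever ASR system you run
    duration_s: float  # clip length in seconds


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,                          # deletion
                cur[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word)  # substitution or match
            ))
        prev = cur
    return prev[-1] / max(len(ref), 1)


def rejection_reasons(clip: Clip, max_wer: float = 0.2,
                      max_words_per_s: float = 4.0) -> list:
    """Return the reasons a clip should be dropped (empty list = keep)."""
    reasons = []
    if word_error_rate(clip.transcript, clip.asr_text) > max_wer:
        reasons.append("transcript_mismatch")
    n_words = len(clip.transcript.split())
    if clip.duration_s <= 0 or n_words / clip.duration_s > max_words_per_s:
        reasons.append("too_fast")
    return reasons
```

Both thresholds are guesses and would need tuning against clips you have already judged by ear.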

mepc36

12/16/2022, 8:33 PM

I'm about to build a solution to do this so if anyone could save me a month of work by telling me that'd be great, haha

(Dawn) Will Draw Fictional Women

12/16/2022, 9:11 PM

>out of vocabulary words

(Dawn) Will Draw Fictional Women

12/16/2022, 9:11 PM

would that be an issue???

haru0l

12/17/2022, 4:54 PM

@Gosmokeless28 apologies for the ping ™️ but is it fine if i used your spongebob dataset to train on diff-svc?

Gosmokeless28

12/17/2022, 7:46 PM

That depends: Which SpongeBob dataset did you use?

Gosmokeless28

12/17/2022, 7:47 PM

Cuz one of them was originally made by Speaking of AI, not me

MegaKeith

12/18/2022, 1:36 AM

Hi, I want to train a voice clone of Steve Jobs and I got an RTX 4090... I just wonder, is this GPU good enough for training?

Gosmokeless28

12/18/2022, 1:38 AM

I assume so, but why do you need your own GPU? Are you going to train the model locally or something?

MegaKeith

12/18/2022, 1:41 AM

Oh gotcha I can use Colab!

hecko

12/18/2022, 1:42 AM

4090 is probably like 5x better than colab

hecko

12/18/2022, 1:42 AM

that being said it does take some effort and knowledge to set up local training

hecko

12/18/2022, 1:43 AM

i tried to make the pipeline notebook not depend on colab but it hasn't been tested outside of it

hecko

12/18/2022, 1:44 AM

and it does still depend on linux, specifically debian/ubuntu/etc

MegaKeith

12/18/2022, 1:46 AM

oops, I do not have linux though 😢

hecko

12/18/2022, 1:49 AM

though come to think of it, the parts that depend on linux are mostly just the dataset loader, which you probably won't need

hecko

12/18/2022, 1:50 AM

i know @Justin trains talknet on windows, idk about tacotron though

haru0l

12/18/2022, 2:37 AM

that would be the recent one in #datasets

Cris140

12/18/2022, 12:20 PM

It's easy to set up with Anaconda

Gosmokeless28

12/18/2022, 6:41 PM

But there are two recent ones in datasets. Can you specify which one?

GaryThisSide

12/19/2022, 8:19 AM

so hi

GaryThisSide

12/19/2022, 8:19 AM

when i was trying to fit data in the model

GaryThisSide

12/19/2022, 8:19 AM

random forest regressor

GaryThisSide

12/19/2022, 8:19 AM

im getting this error

GaryThisSide

12/19/2022, 8:19 AM

how i can deal with it

mepc36

12/19/2022, 2:30 PM

I would think it was, why wouldn't it be? I always thought that transcription engines like Kaldi check audio against a pre-defined dictionary. At the very least I've gotten OOV errors when using forced word alignment tools like Gentle: https://github.com/lowerquality/gentle Is your experience different? I'd love to know if so, thanks @(Dawn) Will Draw Fictional Women
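
[Editor's sketch] The OOV check described here can be approximated before running an aligner at all, by comparing each transcript against the word list the aligner uses. This is only a sketch: `find_oov_words` is a hypothetical helper, and `vocabulary` is assumed to be a plain set of lowercase words that you have loaded yourself. It does not read Gentle's or Kaldi's own dictionary files.

```python
import re


def find_oov_words(transcript: str, vocabulary: set) -> list:
    """Return the transcript's words that are missing from `vocabulary`,
    lowercased, in order of first appearance, without duplicates."""
    words = re.findall(r"[a-z']+", transcript.lower())
    seen = set()
    oov = []
    for word in words:
        if word not in vocabulary and word not in seen:
            seen.add(word)
            oov.append(word)
    return oov


# Toy word list standing in for a real pronunciation dictionary.
vocabulary = {"the", "duck", "quacks"}
print(find_oov_words("The Uberduck quacks!", vocabulary))
```

Clips with a non-empty result could be dropped or sent for manual review, depending on how strict the dataset needs to be.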

mepc36

12/19/2022, 2:32 PM

On this topic, is there any solution for easily turning Colab notebooks into web-accessible servers with RESTful APIs? I had to spin up a JS server on a GPU from AWS' EC2 and call python3 as a child process in order to do some audio synthesis for the prod env of my app. It was super manual, ugh
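
[Editor's sketch] One way to avoid the JS server plus child-process setup is to serve the Python synthesis function over HTTP directly. The sketch uses only the standard library; `synthesize`, the `/synthesize` path, and the JSON request shape are hypothetical placeholders, and the placeholder returns the text's bytes instead of real audio.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def synthesize(text: str) -> bytes:
    """Hypothetical placeholder: replace with the real model call.
    Here it just echoes the text as bytes."""
    return text.encode("utf-8")


class SynthesisHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/synthesize":
            self.send_error(404, "unknown endpoint")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            text = json.loads(self.rfile.read(length))["text"]
            if not isinstance(text, str):
                raise TypeError("text must be a string")
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "expected a JSON body with a 'text' string")
            return
        audio = synthesize(text)
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(audio)))
        self.end_headers()
        self.wfile.write(audio)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; remove to get request logs


def make_server(port: int = 8000) -> ThreadingHTTPServer:
    """Build (but do not start) a server bound to localhost."""
    return ThreadingHTTPServer(("127.0.0.1", port), SynthesisHandler)
```

Calling `make_server(8000).serve_forever()` would start it. A production setup would still need authentication, request size limits, and a process manager, none of which are shown here.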

hecko

12/19/2022, 2:49 PM

i've seen some notebooks use ngrok and cloudflare and such, but for production i highly recommend against this: you'd have to manually restart it every day, even with pro+. you should be able to synthesize without a gpu though, last i heard uberduck synthesizes on cpu

mepc36

12/19/2022, 2:50 PM

oh thats awesome if thats true, ill look into it! thanks