Uberduck #machine-learning

ZachB13

11/18/2022, 5:06 PM

Not sure if this is the right place to post this, but I'm honestly curious why Uberduck is still using Tacotron2 when there are many other, and frankly better, open-source models out there now, such as Glow-TTS and VITS.

hecko

11/18/2022, 5:18 PM

the main reason as far as i can tell is that code takes effort

hecko

11/18/2022, 5:18 PM

there are plans to add vits as well as a variant of tacotron but they seem to have been dormant for months

hecko

11/18/2022, 5:18 PM

i will note though that fastpitch is implemented already, currently used for a whopping 3 voices

{K EY1} (Kei)

11/18/2022, 5:41 PM

Wait since when has fastpitch been implemented

{K EY1} (Kei)

11/18/2022, 5:41 PM

Is there a training notebook for it?

hecko

11/18/2022, 6:54 PM

since a while ago

hecko

11/18/2022, 6:54 PM

no training nb, the only model on the site was trained by zwf

hecko

11/18/2022, 6:54 PM

i've messed with coqui's fastpitch but i don't think it'd work

ZachB13

11/18/2022, 7:46 PM

Yeah, Coqui doesn't seem to want to work with anything that isn't their own server or command-line tool.

ZachB13

11/18/2022, 7:46 PM

I question what the point of it even is, if they won't add support for any of the speech APIs on macOS and Windows.

zwf

11/18/2022, 7:52 PM

we've also struggled to get as good performance on glowTTS and vits with our datasets

zwf

11/18/2022, 7:52 PM

doesn't mean it's impossible

zwf

11/18/2022, 7:52 PM

but more work to do, neither is a drop-in replacement, particularly with the very small / finicky data people work with here

hecko

11/18/2022, 7:54 PM

i was gonna say that finetuning might be difficult but wasn't sure, thanks for the confirmation

hecko

11/18/2022, 7:55 PM

doesn't help that vits is comparatively giant

ZachB13

11/18/2022, 8:00 PM

oh I can definitely see how that could be of concern. I'm more interested in doing like actual text to speech with professionally recorded audio rather than character voices lol.

hecko

11/18/2022, 8:13 PM

why not both

hecko

11/18/2022, 8:13 PM

e.g. league of legends voice acting is professionally recorded, and it's available as clean high-quality files. main issue is there's not hours of it per character

Reclezon

11/18/2022, 8:41 PM

Now I wonder - how large of a dataset is needed for it? Or is there some other system that works better with smaller ones?

Reclezon

11/18/2022, 8:44 PM

A lot of voices, especially from some smaller series I know, won't have hours of data for one voice

hecko

11/18/2022, 10:06 PM

one can always try multispeaker

Glow

11/19/2022, 10:27 AM

I am new here. I have been an ML engineer for 6 years. What can I do in this server?

{K EY1} (Kei)

11/19/2022, 2:17 PM

Mostly it's just people asking for help with making models, and people meming

zwf

11/19/2022, 3:25 PM

what kind of stuff do you work on?

Reclezon

11/19/2022, 7:13 PM

Didn't know there was multispeaker already for it. Neat.

hecko

11/19/2022, 10:30 PM

coqui has multispeaker for everything that it supports

hecko

11/19/2022, 10:30 PM

so vits and glow-tts are covered

hecko

11/19/2022, 10:31 PM

nvidia's official rad-tts repo has two layers of multispeaker

hecko

11/19/2022, 10:31 PM

with the second one supposed to be for emotion but i'm reusing it for noise amount
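
For readers unfamiliar with the idea, the "two layers of multispeaker" mentioned above can be pictured as two separate lookup tables whose vectors are appended to every encoder frame. The sketch below is only an illustration under that assumption. It is not the RAD-TTS code, and every name in it (`make_table`, `condition`, the table sizes, the second table standing in for emotion or noise amount) is hypothetical.

```python
import random


def make_table(num_entries, dim, seed):
    """Build a toy embedding table: one random vector per ID (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_entries)]


def condition(frames, speaker_table, speaker_id, aux_table, aux_id):
    """Append the speaker vector and the second-level vector to every frame."""
    cond = speaker_table[speaker_id] + aux_table[aux_id]  # list concatenation
    return [list(frame) + cond for frame in frames]


# Assumed sizes, chosen only for the example.
speaker_table = make_table(num_entries=4, dim=3, seed=0)  # first level: who is speaking
aux_table = make_table(num_entries=2, dim=2, seed=1)      # second level: e.g. emotion or noise amount

frames = [[0.0] * 5 for _ in range(6)]  # stand-in for 6 encoder frames of width 5
out = condition(frames, speaker_table, 2, aux_table, 1)
print(len(out), len(out[0]))
```

In a real model both tables would be learned during training; the point here is only that the second ID is an independent axis, which is why it can be repurposed for something other than emotion.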