# machine-learning
  • z

    ZachB13

    11/18/2022, 5:06 PM
    Not necessarily sure if this is the right place to post this, but I'm honestly curious as to why Uberduck is still using Tacotron2 when there are many other, and frankly better, open source models out there now such as glow TTS and VITS.
  • h

    hecko

    11/18/2022, 5:18 PM
    the main reason as far as i can tell is that code takes effort
  • h

    hecko

    11/18/2022, 5:18 PM
    there are plans to add vits as well as a variant of tacotron but they seem to have been dormant for months
  • h

    hecko

    11/18/2022, 5:18 PM
    i will note though that fastpitch is implemented already, currently used for a whopping 3 voices
  • u

    {K EY1} (Kei)

    11/18/2022, 5:41 PM
    Wait since when has fastpitch been implemented
  • u

    {K EY1} (Kei)

    11/18/2022, 5:41 PM
    Is there a training notebook for it?
  • h

    hecko

    11/18/2022, 6:54 PM
    since a while ago
  • h

    hecko

    11/18/2022, 6:54 PM
    no training nb, the only model on the site was trained by zwf
  • h

    hecko

    11/18/2022, 6:54 PM
    i've messed with coqui's fastpitch but i don't think it'd work
  • z

    ZachB13

    11/18/2022, 7:46 PM
    Yeah Coqui doesn't seem to want to work with anything that isn't their own server or commandline tool.
  • z

    ZachB13

    11/18/2022, 7:46 PM
    I question what the point of it even is, if they won't add support for any of the speech APIs on mac os and windows.
  • z

    zwf

    11/18/2022, 7:52 PM
    we've also struggled to get as good performance on glowTTS and vits with our datasets
  • z

    zwf

    11/18/2022, 7:52 PM
    doesn't mean it's impossible
  • z

    zwf

    11/18/2022, 7:52 PM
    but more work to do, neither is a drop in replacement particularly with the very small / finicky data people work with here
  • h

    hecko

    11/18/2022, 7:54 PM
    i was gonna say that finetuning might be difficult but wasn't sure, thanks for the confirmation
  • h

    hecko

    11/18/2022, 7:55 PM
    doesn't help that vits is comparatively giant
  • z

    ZachB13

    11/18/2022, 8:00 PM
    oh I can definitely see how that could be of concern. I'm more interested in doing like actual text to speech with professionally recorded audio rather than character voices lol.
  • h

    hecko

    11/18/2022, 8:13 PM
    why not both
  • h

    hecko

    11/18/2022, 8:13 PM
    e.g. league of legends voice acting is professionally recorded, and it's available as clean high-quality files. main issue is there's not hours of it per character
  • r

    Reclezon

    11/18/2022, 8:41 PM
    Now I wonder - how large of a dataset is needed for it? Or is there some other system that works better with smaller ones?
  • r

    Reclezon

    11/18/2022, 8:44 PM
    A lot of voices, especially from some smaller series ik, won't have hours of data on one voice
  • h

    hecko

    11/18/2022, 10:06 PM
    one can always try multispeaker
  • g

    Glow

    11/19/2022, 10:27 AM
    I am new here. I have been an ML engineer for 6 years. What can I do in this server?
  • u

    {K EY1} (Kei)

    11/19/2022, 2:17 PM
    Mostly it's just people asking for help with making models. And people meming
  • z

    zwf

    11/19/2022, 3:25 PM
    what kind of stuff do you work on?
  • r

    Reclezon

    11/19/2022, 7:13 PM
    Didn't know there was multispeaker already for it. Neat.
  • h

    hecko

    11/19/2022, 10:30 PM
    coqui has multispeaker for everything that it supports
  • h

    hecko

    11/19/2022, 10:30 PM
    so vits and glow-tts are covered
  • h

    hecko

    11/19/2022, 10:31 PM
    nvidia's official rad-tts repo has two layers of multispeaker
  • h

    hecko

    11/19/2022, 10:31 PM
    with the second one supposed to be for emotion but i'm reusing it for noise amount