# machine-learning
  • u

    {K EY1} (Kei)

    09/28/2022, 5:08 PM
    Non-Attentive Tacotron will use ref audio. It's in production rn
  • h

    HolyArapaima

    09/28/2022, 5:08 PM
    I heard about that, I am really excited for it
  • p

    PixPrucer

    09/28/2022, 5:10 PM
    I believe I never posted any of my other NNSVS synthesis examples here
  • p

    PixPrucer

    09/28/2022, 5:11 PM
    I'm mainly working on making Polish support for it, as well as some extra resources to make it easier to train a model for this language
  • f

    fatherallah

    09/28/2022, 5:15 PM
    Wow. Amazing. I have so many questions. Did you get your 330 samples from multiple acapellas? Is that alright because if someone is singing in different styles they shouldn’t be in the same dataset, right? I also find myself skipping lots of parts of a song that have background singing. Did you clean up the headphone bleed on the acapellas before training? Or were the acapellas already perfectly isolated vocals? Approximately how many seconds per sample do you recommend?
  • u

    {K EY1} (Kei)

    09/28/2022, 5:15 PM
    This was them actually voicing this bank
  • f

    fatherallah

    09/28/2022, 5:16 PM
    I’ll look into this. Does nnsvs have the option for reference audio?
  • u

    {K EY1} (Kei)

    09/28/2022, 5:16 PM
    No
  • u

    {K EY1} (Kei)

    09/28/2022, 5:16 PM
    You'll need to tune and make a midi
  • h

    hecko

    09/28/2022, 5:16 PM
    i think you could patch it together with like
  • h

    hecko

    09/28/2022, 5:16 PM
    there was a vocalistener-like plugin for utau
  • h

    hecko

    09/28/2022, 5:16 PM
    but mostly yeah midi is the way to go
  • u

    {K EY1} (Kei)

    09/28/2022, 5:16 PM
    Oh yeah, something like that would be sorta similar.
  • f

    fatherallah

    09/28/2022, 5:17 PM
    Interesting OK
  • h

    hecko

    09/28/2022, 5:17 PM
    btw how long does it take to label 1 minute of audio
  • u

    {K EY1} (Kei)

    09/28/2022, 5:18 PM
    Oh that highly depends on the person
  • h

    hecko

    09/28/2022, 5:18 PM
    how long does it take for you then
  • u

    {K EY1} (Kei)

    09/28/2022, 5:18 PM
    I've never timed it, I can later
  • p

    postmates!!

    09/28/2022, 5:18 PM
    for me id take like idk 3 mins?
  • p

    postmates!!

    09/28/2022, 5:18 PM
    if im transcribing too
  • p

    postmates!!

    09/28/2022, 5:18 PM
    then maybe
  • p

    postmates!!

    09/28/2022, 5:18 PM
    10 mins
  • h

    HolyArapaima

    09/28/2022, 5:18 PM
    It's been a while since I've done much with talknet, I basically just sang a bunch of songs I knew the best by heart and chopped them up where I felt it was appropriate. The samples were recorded in my studio on my RE20 so they were pretty damn clean but I slightly gated the hum of my fridge. For me I just dumped everything in the same dataset but I recommend using things that stay relatively consistent.
  • p

    postmates!!

    09/28/2022, 5:18 PM
    or something
  • u

    {K EY1} (Kei)

    09/28/2022, 5:19 PM
    I'm probs gonna make 3 versions of my talknet bank: soft, normal, and power. I'll use the data I'll use for eng nnsvs but segment it
  • u

    {K EY1} (Kei)

    09/28/2022, 5:20 PM
    Cuz for nnsvs i'm gonna do soft/normal/power and flag the appends
  • h

    HolyArapaima

    09/28/2022, 5:21 PM
    That sounds fun. I haven't decided how I am gonna segment my next recording sesh because I am unsure how singing all these songs that were given to me is gonna go 😂
  • p

    PixPrucer

    09/28/2022, 5:21 PM
    Love how everyone just ignored that
  • u

    {K EY1} (Kei)

    09/28/2022, 5:21 PM
    I didnt ignore it in my head
  • u

    {K EY1} (Kei)

    09/28/2022, 5:22 PM
    Didnt reply tho, sorry