Uberduck #machine-learning

{K EY1} (Kei)

09/28/2022, 5:08 PM

Non-attentive Tacotron will use ref audio. It's in production rn.

HolyArapaima

09/28/2022, 5:08 PM

I heard about that. I am really excited for it.

PixPrucer

09/28/2022, 5:10 PM

I believe I never posted any of my other NNSVS synthesis examples here.

PixPrucer

09/28/2022, 5:11 PM

I'm mainly working on making Polish support for it, as well as some extra resources to make it easier to train a model for this language

fatherallah

09/28/2022, 5:15 PM

Wow. Amazing. I have so many questions. Did you get your 330 samples from multiple acapellas? Is that alright? If someone is singing in different styles, those recordings shouldn’t be in the same dataset, right? I also find myself skipping lots of parts of songs that have background singing. Did you clean up the headphone bleed on the acapellas before training, or were the acapellas already perfectly isolated vocals? Approximately how many seconds per sample do you recommend?

{K EY1} (Kei)

09/28/2022, 5:15 PM

This was them actually voicing this bank

fatherallah

09/28/2022, 5:16 PM

I’ll look into this. Does NNSVS have the option for reference audio?

{K EY1} (Kei)

09/28/2022, 5:16 PM

You'll need to tune and make a MIDI.

hecko

09/28/2022, 5:16 PM

i think you could patch it together with like
there was a VocaListener-like plugin for UTAU
but mostly yeah MIDI is the way to go

{K EY1} (Kei)

09/28/2022, 5:16 PM

Oh yeah, something like that would be sorta similar.

fatherallah

09/28/2022, 5:17 PM

Interesting OK

hecko

09/28/2022, 5:17 PM

btw how long does it take to label 1 minute of audio?

{K EY1} (Kei)

09/28/2022, 5:18 PM

Oh that highly depends on the person

hecko

09/28/2022, 5:18 PM

how long does it take for you then?

{K EY1} (Kei)

09/28/2022, 5:18 PM

I've never timed it. I can later.

postmates!!

09/28/2022, 5:18 PM

for me i'd take like idk 3 mins?
if i'm transcribing too
then maybe
10 mins

HolyArapaima

09/28/2022, 5:18 PM

It's been a while since I've done much with TalkNet. I basically just sang a bunch of songs I knew best by heart and chopped them up where I felt it was appropriate. The samples were recorded in my studio on my RE20, so they were pretty damn clean, but I slightly gated the hum of my fridge. Personally, I just dumped everything into the same dataset, but I recommend using material that stays relatively consistent.

postmates!!

09/28/2022, 5:18 PM

or something

{K EY1} (Kei)

09/28/2022, 5:19 PM

I'm probs gonna make 3 versions of my TalkNet bank: soft, normal, and power. I'll use the same data I'm using for English NNSVS, but segment it.

{K EY1} (Kei)

09/28/2022, 5:20 PM

Cuz for NNSVS I'm gonna do soft/normal/power and flag the appends.

HolyArapaima

09/28/2022, 5:21 PM

That sounds fun! I haven't decided how I'm gonna segment my next recording sesh, because I'm unsure how singing all these songs that were given to me is gonna go 😂

PixPrucer

09/28/2022, 5:21 PM

Love how everyone just ignored that

{K EY1} (Kei)

09/28/2022, 5:21 PM

I didn't ignore it in my head

{K EY1} (Kei)

09/28/2022, 5:22 PM

Didn't reply tho, sorry