Uberduck #machine-learning

Reclezon

11/05/2022, 2:44 AM

3 wavs? I'd aim for a loss value below 0.10 to start

Reclezon

11/05/2022, 2:45 AM

This is obviously an improvement over sounding like LJSpeech, but it's not right on the mark.

Amizade | Pony's voice creator

11/05/2022, 2:46 AM

Okay, but where did I go wrong? Was it the epochs?

Reclezon

11/05/2022, 2:48 AM

Nothing wrong with the epochs, it's mostly the loss you need to watch. Some models may need more epochs to train, some may not
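
A minimal sketch of "watch the loss, not the epoch count": instead of picking a fixed number of epochs, check when the logged loss first drops under a target. The 0.10 target is the figure suggested earlier in this conversation; the function name and the loss values are made up for illustration and do not come from any Uberduck notebook.

```python
def first_epoch_below(losses, target=0.10):
    """Return the 1-based epoch at which the loss first drops below target, or None."""
    for epoch, loss in enumerate(losses, start=1):
        if loss < target:
            return epoch
    return None

# Hypothetical per-epoch loss values, not real training logs.
run_a = [0.90, 0.40, 0.12, 0.09, 0.08]
run_b = [0.90, 0.50, 0.30]

print(first_epoch_below(run_a))  # this run crosses the target
print(first_epoch_below(run_b))  # this run needs more epochs
```

Two runs with the same epoch count can end up in very different places, which is why the loss is the more useful number to track.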

Amizade | Pony's voice creator

11/05/2022, 2:48 AM

Oh, I understand

Amizade | Pony's voice creator

11/05/2022, 2:49 AM

but I had to wait almost 1 hour for the model to be ready

Reclezon

11/05/2022, 3:00 AM

I've only gotten accidentally lucky twice on each nb so far.. 😅 Consistent testing definitely avoids the risk and the further pain of trying to correct models

Reclezon

11/05/2022, 3:02 AM

The lucky ones weren't as great as they probably could've really been either imo

Amizade | Pony's voice creator

11/05/2022, 3:03 AM

Later I will look into how epochs work

Reclezon

11/05/2022, 3:18 AM

Speaking of that, here is a sample from one of those lucky 1-wav attempts.

Reclezon

11/05/2022, 3:19 AM

Can't dl audio from the first model

Amizade | Pony's voice creator

11/05/2022, 3:20 AM

The voice sounds so cool

Amizade | Pony's voice creator

11/05/2022, 3:20 AM

not robotic and not repetitive

Reclezon

11/05/2022, 3:24 AM

I did start training it with a very low LR and kinda fell asleep halfway through, so I'm actually surprised it turned out good. Sample using the earliest model I can grab audio from, which does use more wavs (~~even more if I can get my shit together and do it :)~~)

Amizade | Pony's voice creator

11/05/2022, 3:27 AM

that's nice. you guys from uberduck create a lot of perfect voices

WeegeeFan1

11/10/2022, 4:40 PM

What does HiFi-GAN actually do?

WeegeeFan1

11/10/2022, 4:41 PM

I know what the other 2 steps do, but not HiFi-GAN.

WeegeeFan1

11/10/2022, 4:41 PM

I'm talking in relation to Talknet2 model training

hecko

11/10/2022, 5:04 PM

basically, talknet doesn't output raw audio because that's too hard to measure the quality of. instead it outputs something called a spectrogram, which is a 2d image of what frequencies play where, and then it's the job of hifi-gan to turn that into actual audio
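
To make the "2d image of what frequencies play where" idea concrete, here is a rough NumPy sketch that turns a test tone into a magnitude spectrogram. It is a plain linear-frequency STFT written for illustration, not TalkNet's or HiFi-GAN's actual preprocessing (vocoders like HiFi-GAN typically work from mel spectrograms), and the frame sizes, sample rate, and function name are assumptions.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram: rows are frequency bins, columns are time frames."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the audio into overlapping, windowed frames.
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # FFT each frame and keep only the magnitude (phase is thrown away).
    return np.abs(np.fft.rfft(frames, axis=1)).T

sr = 8000                             # assumed sample rate in Hz
t = np.arange(sr) / sr                # one second of audio
tone = np.sin(2 * np.pi * 1000 * t)   # 1000 Hz test tone

spec = spectrogram(tone)
peak_bin = int(spec.mean(axis=1).argmax())
peak_hz = peak_bin * sr / 256         # convert bin index back to Hz

print(spec.shape, peak_hz)
```

Because only magnitudes are kept, the phase information is gone, and rebuilding a natural-sounding waveform from that image is the part a vocoder has to learn.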

WeegeeFan1

11/10/2022, 5:07 PM

Ahh

WeegeeFan1

11/10/2022, 5:07 PM

So why does running it longer make it sound better?

WeegeeFan1

11/10/2022, 5:12 PM

Also, I have a singer who has a somewhat rigid range of singing. Because I have not found any data of him singing certain notes, the AI singing will yell the sounds of a seizure in place of those notes. But once I give it a note I've actually given it data for, it will do well. Is there a way to artificially generate these spaces in the vocal range? Is there a bit of the training I might be able to train longer to generate this?

hecko

11/10/2022, 5:32 PM

because you're training it to get good at specifically converting the current model's output into stuff that sounds like the training data. the base model is already pretty good, granted

hecko

11/10/2022, 5:33 PM

you could try making pitch-shifted copies of the data using something like melodyne or newtone or vocalshifter
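
A very rough sketch of what a pitch-shifted copy means numerically. This is naive resampling, which changes the duration along with the pitch; it is not how Melodyne, Newtone, or VocalShifter work internally (those keep the timing intact), and the function names here are made up for illustration.

```python
import numpy as np

def naive_pitch_shift(signal, semitones):
    """Shift pitch by resampling. Duration shrinks or grows by the same factor."""
    factor = 2.0 ** (semitones / 12.0)        # frequency ratio for n half-steps
    n_out = int(round(len(signal) / factor))
    src_pos = np.arange(n_out) * factor       # where each output sample reads from
    return np.interp(src_pos, np.arange(len(signal)), signal)

def dominant_hz(signal, sr):
    """Frequency of the strongest FFT bin."""
    return np.abs(np.fft.rfft(signal)).argmax() * sr / len(signal)

sr = 8000                                     # assumed sample rate in Hz
t = np.arange(sr) / sr
note = np.sin(2 * np.pi * 220 * t)            # A3 test tone

up_octave = naive_pitch_shift(note, 12)       # 12 half-steps up
print(len(up_octave), dominant_hz(up_octave, sr))
```

For real augmentation a dedicated tool is the better choice, since small shifts of a half-step or two tend to sound more natural than large ones and the timing stays untouched.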

WeegeeFan1

11/10/2022, 5:33 PM

Would that be cheating or do you just need to do that sometimes?

hecko

11/10/2022, 5:33 PM

why would it be cheating

hecko

11/10/2022, 5:34 PM

why would anything be cheating really

WeegeeFan1

11/10/2022, 5:34 PM

I'll start keeping a note of every half-step I have musical data of

hecko

11/10/2022, 5:34 PM

if it gives a better result then by all means do it
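
For the idea above of noting every half-step that has data, a small sketch: map each observed pitch to its nearest MIDI note number and list the gaps inside the singer's range. The observed frequencies are invented for the example, and the helper names are hypothetical; in practice the pitches would come from whatever pitch tracker you already use.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def hz_to_midi(freq_hz):
    """Nearest MIDI note number, using A4 = 440 Hz = MIDI 69."""
    return int(round(69 + 12 * math.log2(freq_hz / 440.0)))

def midi_to_name(midi):
    """MIDI note number to a name like 'C#4' (MIDI 60 is C4)."""
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"

def missing_half_steps(observed_hz):
    """Half-steps between the lowest and highest observed note with no data."""
    covered = {hz_to_midi(f) for f in observed_hz}
    return [midi_to_name(m)
            for m in range(min(covered), max(covered) + 1)
            if m not in covered]

# Made-up pitches in Hz, standing in for notes found in the dataset.
observed = [220.0, 233.1, 261.6, 293.7, 329.6]
gaps = missing_half_steps(observed)
print(gaps)
```

The gaps list is then the set of notes worth filling with pitch-shifted copies.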