Uberduck

How do you guys decide how long to train a model for?

Would it work to just train HiFi-GAN on 44 kHz data?

The alignment graph will look diagonal, with bright yellow pixels along the line.
Loss should be below 0.10 or 0.15.
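If you can get the alignment (attention) matrix out as an array, "diagonal with bright yellow pixels" can be roughly checked with numbers instead of by eye. This is a minimal sketch, assuming a hypothetical matrix shaped (decoder_frames, text_tokens) with each row a distribution over input tokens. The function name `alignment_stats` and its three metrics are illustrative, not part of Uberduck's notebooks or RADTTS.

```python
from typing import List, Sequence, Tuple


def alignment_stats(attn: Sequence[Sequence[float]]) -> Tuple[float, float, float]:
    """Rough numeric stand-ins for "diagonal with bright yellow pixels".

    `attn` is assumed to be shaped (decoder_frames, text_tokens): one row per
    output frame, each row a distribution over input tokens.

    Returns:
        monotonic: fraction of steps where the peak token index does not move backwards
        coverage:  fraction of text tokens that are the peak for at least one frame
        sharpness: mean of each row's peak weight (how "bright" the line is)
    """
    n_tokens = len(attn[0])
    # Column index of the brightest pixel in each row.
    peaks: List[int] = [max(range(n_tokens), key=row.__getitem__) for row in attn]
    steps = list(zip(peaks, peaks[1:]))
    monotonic = sum(cur >= prev for prev, cur in steps) / len(steps) if steps else 1.0
    # Monotonic alone can't catch an alignment stuck on one token, hence coverage.
    coverage = len(set(peaks)) / n_tokens
    sharpness = sum(row[p] for row, p in zip(attn, peaks)) / len(attn)
    return monotonic, coverage, sharpness
```

Higher is better on all three. There is no established cutoff for these made-up metrics, so compare them across checkpoints rather than against a fixed number.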

<@424693503918866457> thanks for your answer, man. How should I check this if I don't have the graphs, since I'm not using one of Uberduck's notebooks?

Below are the loss logs I have. Which of these are you saying should be between 0.10 and 0.15, the `loss_mel` one? Sorry if they don't make sense; I'm using RADTTS:

```
iter: 6366  (2.50 s)  |  lr: 0.001  |  loss_mel: -1.425  |  loss_prior_mel: 0.508  |  loss_ctc: 3.031  |  loss_duration: 0.214  |  loss_f0: 0.002  |  loss_energy: 0.003  |  loss_vpred: 0.963  |  binarization_loss: 0.450
```
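Without the notebook graphs, one option is to parse the training log yourself and watch whether a loss is still going down. This is a minimal sketch, assuming every log line has the same `name: value` layout as the one above; `parse_log_line` and `has_plateaued` are hypothetical helpers, and the `window` and `min_improvement` defaults are arbitrary. Note that `loss_mel` in this log is already negative (-1.425), so a "below 0.10 to 0.15" rule would pass trivially here; tracking whether it is still decreasing is the more useful check.

```python
import re
from typing import Dict, List

# Matches "name: number" pairs such as "loss_mel: -1.425", "loss_f0: 0.002" or "lr: 0.001".
_PAIR = re.compile(r"([A-Za-z_][A-Za-z0-9_]*):\s*(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def parse_log_line(line: str) -> Dict[str, float]:
    """Turn one training-log line into {metric_name: value}."""
    return {name: float(value) for name, value in _PAIR.findall(line)}


def has_plateaued(values: List[float], window: int = 5, min_improvement: float = 0.01) -> bool:
    """True if the mean of the last `window` values dropped by less than
    `min_improvement` compared with the mean of the `window` values before them.
    Assumes lower is better, which holds even when the loss is negative."""
    if len(values) < 2 * window:
        return False  # not enough history to judge yet
    previous = sum(values[-2 * window:-window]) / window
    recent = sum(values[-window:]) / window
    return previous - recent < min_improvement
```

You would collect `parse_log_line(line)["loss_mel"]` every few hundred iterations (from validation output, if your run logs it) and stop considering more training once `has_plateaued` keeps returning `True`.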

RADTTS repo: https://github.com/NVIDIA/radtts

I think `mel` refers to the mel spectrograms it generates from a sample inference on the dataset?

Is there any documentation on training it? I haven't found any.

I'm no researcher, so I can't tell you that.

Should I keep punctuation in the transcripts before training?

It can help if you want the voice to distinguish the intonation tied to each punctuation mark.
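If you do keep punctuation, the main thing is that your transcript cleaner doesn't strip it. This is a minimal sketch of a hypothetical cleaner that lowercases text, collapses whitespace, and keeps letters, digits, and a chosen set of punctuation marks. The `KEEP` set and the `clean_transcript` name are assumptions; Tacotron2-style repos usually ship their own text cleaners, which you would normally configure instead.

```python
import re

# Hypothetical choice: punctuation marks whose intonation you want the model to learn.
KEEP = ".,?!;:'-"

# Anything that is not a word character, whitespace, or a kept mark gets dropped.
_DROP = re.compile(r"[^\w\s" + re.escape(KEEP) + r"]")
_SPACES = re.compile(r"\s+")


def clean_transcript(text: str, keep_punctuation: bool = True) -> str:
    """Lowercase and normalize whitespace, optionally keeping punctuation marks."""
    text = text.lower()
    if keep_punctuation:
        text = _DROP.sub("", text)
    else:
        # Drop all punctuation, including the marks in KEEP.
        text = re.sub(r"[^\w\s]", "", text)
    return _SPACES.sub(" ", text).strip()
```

For example, `clean_transcript('Wait... "really?!"  Yes.')` keeps the ellipsis and the `?!`, while passing `keep_punctuation=False` reduces the same line to plain words.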

Thanks. Is there a benefit to training the model as multispeaker, such as better synthesis quality for each speaker, or being able to use it as a base model?

I haven't tried it yet, but I don't think treating punctuation marks as separate speakers would help at all.

No, I mean training five voices in one model instead of just one.

I've heard it ends up making the voices sound worse than if they were trained separately.

Makes sense, considering the network has to split its capacity across all 5 voices instead of focusing on one.

GPT-3 learned half the internet, and it's better at it than it would be if it had learned just one topic.

But I guess overfitting is good for TTS, because everypony loves doing it.

In theory it should help the others that are underperforming,

but if, like, 4 of the 5 speakers are underperforming, then the whole model is screwed,

even the datasets that were perfectly fine on their own.

Man 😭 not to hate on 15.ai, but his voices are starting to sound like heavy smokers.

I can't really tell what they're saying anymore.

I just hope they sound less raspy now, like damn.