# machine-learning
  • m

    mepc36

    09/15/2022, 5:14 PM
    How do you guys decide how long to train a model for?
  • c

    Cris140

    09/15/2022, 5:16 PM
    Would it work with just training hifigan with 44khz data?
  • t

    tylerdurdenceketi

    09/15/2022, 5:16 PM
    The graph will look diagonal and it will have bright yellow pixels. Loss should be less than 0.10 or 0.15.
  • m

    mepc36

    09/15/2022, 5:20 PM
    @tylerdurdenceketi thanks for your answer man. How should I do it if I don't have the graphs because I'm not using one of uberduck's notebooks? Below are the logs describing loss that I have. Which one are you saying should be between 0.10 and 0.15, the loss_mel one? Sorry if they don't make sense, I'm using radtts:
    iter: 6366  (2.50 s)  |  lr: 0.001  |  loss_mel: -1.425  |  loss_prior_mel: 0.508  |  loss_ctc: 3.031  |  loss_duration: 0.214  |  loss_f0: 0.002  |  loss_energy: 0.003  |  loss_vpred: 0.963  |  binarization_loss: 0.450
    RADTTS repo: https://github.com/NVIDIA/radtts
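Without the notebook graphs, one way to follow the loss over time is to parse lines like the one above and inspect the values directly. The following is only a rough sketch: `parse_log_line` and `SAMPLE` are hypothetical names, not part of RADTTS, and it assumes every log line keeps the same `key: value | key: value` layout as the pasted line.

```python
import re

# Sample line copied from the log pasted above.
SAMPLE = (
    "iter: 6366  (2.50 s)  |  lr: 0.001  |  loss_mel: -1.425  |  "
    "loss_prior_mel: 0.508  |  loss_ctc: 3.031  |  loss_duration: 0.214  |  "
    "loss_f0: 0.002  |  loss_energy: 0.003  |  loss_vpred: 0.963  |  "
    "binarization_loss: 0.450"
)

# One "name: number" pair at the start of a '|'-separated field.
FIELD = re.compile(r"\s*(\w+):\s*(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def parse_log_line(line):
    """Hypothetical helper: turn one log line into a dict of numbers."""
    fields = {}
    for part in line.split("|"):
        match = FIELD.match(part)
        if match:
            key, value = match.groups()
            # The iteration counter is an integer; everything else is a float.
            fields[key] = int(value) if key == "iter" else float(value)
    return fields


if __name__ == "__main__":
    parsed = parse_log_line(SAMPLE)
    print(parsed["iter"], parsed["loss_mel"], parsed["loss_ctc"])
```

Collecting these dicts for every line of a saved log gives a per-iteration series for each loss term, which can then be plotted or averaged to see whether training is still improving.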
  • h

    haru0l

    09/15/2022, 5:58 PM
    mel refers to the mel spectrograms it's generating from either a sample inference of the dataset i think?
  • c

    cadeotsg

    09/15/2022, 8:30 PM
    sec
  • o

    OccultMC

    09/17/2022, 12:52 AM
    Is there any documentation on training it? I have not found any
  • p

    PixPrucer

    09/17/2022, 6:24 AM
    I'm no researcher so I can't tell you that
  • c

    cadeotsg

    09/17/2022, 6:42 AM
    sec
  • t

    tylerdurdenceketi

    09/17/2022, 8:51 AM
    Should I preserve punctuations before training?
  • p

    PixPrucer

    09/17/2022, 8:53 AM
    It can be helpful if you want a given voice to differentiate between the intonations tied to each punctuation mark
  • t

    tylerdurdenceketi

    09/17/2022, 9:19 AM
    Thanks. Is there a benefit if you train the model as multispeaker? Such as better synthesizing quality for each speaker? Or using it as a base model?
  • p

    PixPrucer

    09/17/2022, 9:24 AM
    I didn't try yet, but I don't think separating punctuations as separate speakers benefits in any way
  • t

    tylerdurdenceketi

    09/17/2022, 9:29 AM
    No i mean training five voices instead of one in one model
  • p

    PixPrucer

    09/17/2022, 9:32 AM
    Oh, this way
  • h

    hecko

    09/17/2022, 9:43 AM
    i've heard it ends up making the voices sound worse than if they were trained separately
  • p

    PixPrucer

    09/17/2022, 9:50 AM
    Makes sense considering the network must be divided to learn all 5 voices instead of one specific voice
  • h

    hecko

    09/17/2022, 9:59 AM
    does it make sense though
  • h

    hecko

    09/17/2022, 10:00 AM
    gpt-3 learned half the internet and rocks at it more than it would if it just learned one topic
  • h

    hecko

    09/17/2022, 10:00 AM
    but i guess overfitting is good for tts because everypony loves doing it
  • h

    haru0l

    09/17/2022, 10:08 AM
    it should in theory help the others that are underperforming
  • h

    haru0l

    09/17/2022, 10:09 AM
    but if like 4/5 speakers are underperforming then the whole model is screwed
  • h

    haru0l

    09/17/2022, 10:09 AM
    even the datasets that were perfectly fine on their own
  • m

    mega b

    09/17/2022, 3:35 PM
    lolo
  • m

    mega b

    09/17/2022, 3:37 PM
    man 😭 not to hate on 15.ai but his voices are starting to sound like heavy smokers
  • m

    mega b

    09/17/2022, 3:37 PM
    i cant really tell what they are saying anymore
  • m

    mega b

    09/17/2022, 3:38 PM
    aw 15 is down
  • u

    (Dawn) Will Draw Fictional Women

    09/17/2022, 3:53 PM
    bro embrace it we all hatin on 15
  • j

    Justin

    09/17/2022, 4:10 PM
    he said he was gonna update them models
  • i

    IBob012

    09/17/2022, 4:27 PM
    i just hope they sound less raspy now like damn