# tacotron-2-support
  • j

    JunKingchinaman

    04/17/2023, 11:26 PM
    @JunKingchinaman
  • j

    JunKingchinaman

    04/17/2023, 11:27 PM
    I need a face clone now. How much does it cost?
  • t

    TheOnlyZac

    04/18/2023, 2:00 AM
    Does anyone know if capitalization in the training set matters?
  • t

    TheOnlyZac

    04/18/2023, 2:01 AM
    in the transcripts
  • y

    YTR76

    04/18/2023, 2:08 AM
    It could matter
  • r

    Raw

    04/18/2023, 2:15 AM
    Is this the thread for questions ?
  • g

    Gosmokeless28

    04/18/2023, 6:54 AM
    Lol, no.
  • g

    Gosmokeless28

    04/18/2023, 6:54 AM
    No.
  • g

    Gosmokeless28

    04/18/2023, 6:54 AM
    Yes.
  • l

    Lady Espeon

    04/20/2023, 6:58 AM
    I feel utterly defeated. I have everything ready: the multiple WAV files, the transcript... yet I can't successfully load it into Tacotron, no matter what I do. I have now spent 4 days getting everything ready 😞 (also, I'm new to this)
  • l

    Lady Espeon

    04/20/2023, 7:47 AM

    https://cdn.discordapp.com/attachments/994486394049282058/1098515307330539580/Screenshot_20230420_034249_Google.jpg

  • g

    Gosmokeless28

    04/20/2023, 7:50 AM
    That doesn't look like the notebook that's linked in #841437191073955920
  • l

    Lady Espeon

    04/20/2023, 7:28 PM
    Sorry, I'm still not familiar. I did do the pipeline one but it wouldn't really work either. I gave up on it 😞
  • w

    wiry-church

    04/21/2023, 1:34 AM
    what does
    RuntimeError: shape '[1, 1, 147674]' is invalid for input of size 295348
    mean?
  • t

    The Watts and the Waves

    04/21/2023, 2:24 AM
    Wavs aren't formatted correctly if I recall
  • g

    gforonda

    04/21/2023, 11:36 AM
    This is a sample from my upcoming Lammy (PaRappa the Rapper) model at 180 epochs: https://cdn.discordapp.com/attachments/994486394049282058/1098935225230307408/paradd.wav

    https://cdn.discordapp.com/attachments/994486394049282058/1098935225557467206/Lammy_Worried.png

  • h

    hecko

    04/21/2023, 5:47 PM
    means you didn't set the sample rate to 22050 Hz
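A side note on the error quoted above: 295348 is exactly 2 × 147674, which is what a two-channel (stereo) file would produce if the code expected mono, so it may be worth checking the channel count as well as the sample rate. Below is a minimal sketch using only Python's standard `wave` module. The function name `check_wav` and the mono and 16-bit targets are assumptions for illustration; only the 22050 Hz figure comes from the thread.

```python
import wave

TARGET_RATE = 22050    # sample rate mentioned in the thread
TARGET_CHANNELS = 1    # assumption: the training code expects mono
TARGET_WIDTH = 2       # assumption: 16-bit PCM (2 bytes per sample)


def check_wav(path):
    """Return a list of human-readable problems found in a WAV file's header."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        width = wav.getsampwidth()
    if rate != TARGET_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {TARGET_RATE} Hz")
    if channels != TARGET_CHANNELS:
        problems.append(f"{channels} channels, expected {TARGET_CHANNELS}")
    if width != TARGET_WIDTH:
        problems.append(f"{width * 8}-bit samples, expected {TARGET_WIDTH * 8}-bit")
    return problems
```

Calling `check_wav("wavs/1-1.wav")` on each clip before training returns an empty list for files that match the targets, and a list of mismatches otherwise. It only reads the header; it does not convert anything.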
  • u

    みんなで語る未解決事件

    04/21/2023, 9:25 PM
    Has anyone built code to batch-transcribe audio with Whisper and output it in the Tacotron 2 dataset format? Example: (wavs/1-1.wav|How are you.) (wavs/1-2.wav|I want car.) A non-programmer would lose a day of leisure time just to add three minutes' worth of data 🥺. It would be great if it worked in Colab and wrote the transcriptions of the audio files in a specified folder to a txt file. However, I am not able to write the code myself.
  • h

    hecko

    04/21/2023, 11:29 PM
    is this what you want? https://colab.research.google.com/drive/1ipCilWgbrBcECU29OH80F0MyPL8KdLUh it's in #841437191073955920
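For readers who want to see the shape of such a script, here is a minimal sketch of the `path|transcript` filelist format described above. It is not the linked Colab notebook. The helper names `build_filelist` and `write_filelist` are hypothetical, and the transcription step is passed in as a callable so the sketch runs without any speech model installed.

```python
from pathlib import Path


def build_filelist(wav_dir, transcribe, prefix="wavs"):
    """Return lines like 'wavs/1-1.wav|How are you.' for every .wav in wav_dir.

    `transcribe` is any callable that takes a file path (str) and returns text.
    """
    lines = []
    for wav_path in sorted(Path(wav_dir).glob("*.wav")):
        # Collapse newlines and repeated spaces so each clip stays on one line.
        text = " ".join(transcribe(str(wav_path)).split())
        lines.append(f"{prefix}/{wav_path.name}|{text}")
    return lines


def write_filelist(lines, out_path):
    """Write the filelist as UTF-8 text, one clip per line."""
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Assuming the `openai-whisper` package is installed, the callable could be `lambda p: model.transcribe(p)["text"]` with `model = whisper.load_model("base")`. Automatic transcripts usually still need a manual pass before training.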
  • u

    みんなで語る未解決事件

    04/22/2023, 12:52 PM
    Oh my god, thanks. I see the Colab code now.
  • t

    tanooki426

    04/22/2023, 2:14 PM
    General question: does a lower batch size require a lower learning rate? Asking because my Derek Stiles impersonator has 98 voice clips which resulted in a batch size of 3 (6 minutes total worth of voice clips) but I kept the learning rate at the default level
  • s

    Sonic2022_mario

    04/22/2023, 5:18 PM
    Darn It, My GPU Limit Has Ended, I Can't Train Models

    https://cdn.discordapp.com/attachments/994486394049282058/1099383779417673758/Screenshot_2023-04-22_131824.jpg

  • m

    Minecraftian47 (make x from y)

    04/22/2023, 7:22 PM
    That's what alt accounts are for.
  • m

    Minecraftian47 (make x from y)

    04/22/2023, 7:22 PM
    Just dump all your models in one big shared folder.
  • g

    Gosmokeless28

    04/22/2023, 8:07 PM
    > does a lower batch size require a lower learning rate?
    Yes, in some cases.
  • t

    tanooki426

    04/22/2023, 8:11 PM
    So if I have a batch size of 3 but 10 minutes worth of voice clips, what should the learning rate be?
  • t

    tanooki426

    04/22/2023, 8:12 PM
    Also, should each voice clip in a dataset be under 10 seconds?
  • g

    Gosmokeless28

    04/22/2023, 8:15 PM
    Well, I'm not an expert on learning rate, but if I had to guess, I would say 5e-4 with learning rate decay disabled.
  • g

    Gosmokeless28

    04/22/2023, 8:16 PM
    Each voice clip in a dataset should be under 15 seconds long, if I recall correctly.
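A quick way to find clips over that length is to read each file's frame count and sample rate. The sketch below uses only the standard library. The 15-second default comes from the message above (stated there from memory), and the function names are hypothetical.

```python
import wave
from pathlib import Path


def clip_seconds(path):
    """Duration of a WAV file in seconds, computed from its header."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def clips_over_limit(wav_dir, limit_seconds=15.0):
    """Return (filename, seconds) pairs for clips longer than limit_seconds."""
    too_long = []
    for wav_path in sorted(Path(wav_dir).glob("*.wav")):
        seconds = clip_seconds(wav_path)
        if seconds > limit_seconds:
            too_long.append((wav_path.name, seconds))
    return too_long
```

Running `clips_over_limit("wavs")` lists the files that would need to be split or trimmed; an empty list means every clip is within the limit.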
  • t

    tanooki426

    04/22/2023, 8:16 PM
    My current dataset has 98 voice clips totaling 6 minutes, but I think I'm going to add more and also split up some of the 10 second voice clips into intervals and see what happens
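To put the numbers from this exchange in perspective, here is the arithmetic as a short sketch. It uses only the figures given in the thread (98 clips, about 6 minutes in total, batch size 3). The step count assumes the last partial batch is kept, which is an assumption about the training loop rather than something the thread states.

```python
import math

clips = 98              # number of voice clips in the dataset
total_seconds = 6 * 60  # about 6 minutes of audio in total
batch_size = 3          # batch size reported in the thread

# Average clip length across the dataset.
avg_clip_seconds = total_seconds / clips

# Optimizer steps per epoch, assuming the final partial batch is kept.
steps_per_epoch = math.ceil(clips / batch_size)

print(f"average clip length: {avg_clip_seconds:.2f} s")  # 3.67 s
print(f"steps per epoch: {steps_per_epoch}")             # 33
```

With clips averaging under 4 seconds, splitting the few 10-second clips raises the clip count only slightly, so the steps per epoch stay in the same range unless more audio is added.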