https://uberduck.ai/
# machine-learning
  • t

    tylerdurdenceketi

    09/15/2022, 8:16 AM
    Hello, is there a way to train at lower sample rates without issues?
  • p

    PixPrucer

    09/15/2022, 8:19 AM
    Why would you want to train on anything lower than 22.05kHz though?
  • p

    PixPrucer

    09/15/2022, 8:19 AM
    I can't imagine any reason other than reducing the training time or matching the parameters to the WAV clips
  • t

    tylerdurdenceketi

    09/15/2022, 8:27 AM
    Both, currently. I'm trying to train on a Turkish dataset that has WAVs at 16kHz; I tried 22kHz with the International Phonetic Alphabet, but it didn't work
  • p

    PixPrucer

    09/15/2022, 8:28 AM
    You can upsample the wavs to be 22kHz easily
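A minimal, standard-library-only sketch of that upsampling step, assuming mono 16-bit PCM WAVs on a little-endian machine. `resample_linear` and `upsample_wav` are hypothetical helpers; linear interpolation is a crude stand-in for the band-limited resamplers in tools like sox or ffmpeg, and upsampling cannot restore content above the 8kHz Nyquist limit of the original 16kHz audio.

```python
import array
import wave


def resample_linear(samples, src_rate, dst_rate):
    """Resample integer PCM samples from src_rate to dst_rate by linear interpolation."""
    if src_rate == dst_rate or len(samples) == 0:
        return list(samples)
    n_out = round(len(samples) * dst_rate / src_rate)
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate  # fractional position in the input signal
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        out.append(round(samples[left] * (1 - frac) + samples[right] * frac))
    return out


def upsample_wav(src_path, dst_path, dst_rate=22050):
    """Write a copy of a mono 16-bit PCM WAV resampled to dst_rate."""
    with wave.open(src_path, "rb") as r:
        if r.getnchannels() != 1 or r.getsampwidth() != 2:
            raise ValueError("this sketch only handles mono 16-bit PCM")
        src_rate = r.getframerate()
        # array("h") uses native byte order; WAV data is little-endian,
        # so this assumes a little-endian machine (x86, most ARM).
        samples = array.array("h", r.readframes(r.getnframes()))
    out = resample_linear(samples, src_rate, dst_rate)
    with wave.open(dst_path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(dst_rate)
        w.writeframes(array.array("h", out).tobytes())
```

For example, `upsample_wav("clip_16k.wav", "clip_22k.wav")` (placeholder file names); for a whole dataset you would loop over the folder, or use sox/ffmpeg for better quality.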
  • t

    tylerdurdenceketi

    09/15/2022, 8:28 AM
    yeah but it increases the training time 😦
  • p

    PixPrucer

    09/15/2022, 8:28 AM
    Not by much
  • p

    PixPrucer

    09/15/2022, 8:29 AM
    You'll be fine
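How much longer training gets can be estimated from mel frame counts. A sketch, assuming a Tacotron 2-style front end whose `hop_length` stays at 256 samples (the Tacotron 2 default; whether uberduck's config uses the same value is an assumption): the number of mel frames, and so decoder steps, per second of audio scales linearly with the sample rate.

```python
def mel_frames(duration_s: float, sample_rate: int, hop_length: int = 256) -> float:
    """Approximate mel frames (one per decoder step) for duration_s seconds of audio.

    Assumes hop_length stays fixed when the sample rate changes.
    """
    return duration_s * sample_rate / hop_length


if __name__ == "__main__":
    for rate in (16000, 22050):
        print(f"{rate} Hz: {mel_frames(1.0, rate):.2f} frames per second")
    ratio = mel_frames(1.0, 22050) / mel_frames(1.0, 16000)
    print(f"22.05kHz needs {ratio:.3f}x the decoder steps of 16kHz")
```

Under these assumptions one second of audio is 62.5 frames at 16kHz and about 86.1 frames at 22.05kHz, so sequences get roughly 38% longer; whether that counts as "much" depends on the GPU and dataset size.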
  • p

    PixPrucer

    09/15/2022, 8:29 AM
    The real pain is training on full-resolution 44.1kHz WAVs; I don't think anyone here has attempted that yet
  • t

    tylerdurdenceketi

    09/15/2022, 8:30 AM
    that's overkill
  • p

    PixPrucer

    09/15/2022, 8:31 AM
    I actually accidentally trained a 44kHz model once and it was fine, but at 0.5 tempo 🧑‍🦲
  • p

    PixPrucer

    09/15/2022, 8:31 AM
    Oh and the pitch was very much broken, metal growl kind of vibe
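Half tempo with a much lower pitch is what plain sample-rate arithmetic predicts: if samples recorded at one rate are treated as if they were at another, both speed and pitch scale by the ratio of the two rates. A hypothetical helper, assuming the mismatch is the only thing going on (the thread doesn't confirm the exact cause):

```python
import math


def rate_mismatch_effect(true_rate, assumed_rate):
    """Return (speed_factor, pitch_shift_semitones) for audio sampled at true_rate
    but interpreted or played back as if it were assumed_rate."""
    ratio = assumed_rate / true_rate
    speed_factor = ratio               # < 1.0 means slower, e.g. 0.5 = half tempo
    semitones = 12 * math.log2(ratio)  # negative means lower pitch
    return speed_factor, semitones


print(rate_mismatch_effect(44100, 22050))  # (0.5, -12.0): half tempo, one octave down
```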
  • t

    tylerdurdenceketi

    09/15/2022, 8:32 AM
    resampling and Nyquist shit probably
  • t

    tylerdurdenceketi

    09/15/2022, 8:33 AM
    What would you suggest for the alphabet? Should I use IPA or the Turkish alphabet (basic cleaner)?
  • p

    PixPrucer

    09/15/2022, 8:33 AM
    I've heard IPA produces more accurate results than plain-text transcripts
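If you go with the Turkish alphabet instead of IPA, one pitfall for a "basic" cleaner is lowercasing: Python's `str.lower()` is locale-independent, so `"I"` becomes `"i"` rather than dotless `"ı"`, and `"İ"` becomes `"i"` plus a combining dot (U+0307). A minimal sketch of a Turkish-aware cleaner; `turkish_basic_cleaner` is a hypothetical name, not one of uberduck's built-in cleaners, and the model's symbol set would still need to include ç, ğ, ı, ö, ş and ü.

```python
import re

_WHITESPACE_RE = re.compile(r"\s+")


def turkish_lower(text):
    """Lowercase with Turkish dotted/dotless i rules applied before str.lower()."""
    return text.replace("İ", "i").replace("I", "ı").lower()


def turkish_basic_cleaner(text):
    """Hypothetical basic cleaner: Turkish-aware lowercasing plus whitespace collapsing."""
    return _WHITESPACE_RE.sub(" ", turkish_lower(text)).strip()


print(turkish_basic_cleaner("  İSTANBUL'da   IŞIK  "))  # istanbul'da ışık
```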
  • t

    tylerdurdenceketi

    09/15/2022, 8:35 AM
    How many WAVs should I have? I got a dataset with transcriptions from the internet and cleaned up the transcriptions and such; in the end I got 251 WAV files
  • p

    PixPrucer

    09/15/2022, 8:37 AM
    I'd count by minutes, because a 250-WAV dataset can hold 7 minutes of data or 25
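Counting minutes rather than files only takes the standard library. A sketch, assuming the clips are PCM WAVs in one folder; `dataset_stats` is a hypothetical helper and `"wavs"` is a placeholder folder name.

```python
import pathlib
import wave


def wav_seconds(path):
    """Duration of one PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()


def dataset_stats(folder):
    """Return (number_of_wavs, total_minutes) for all *.wav files in folder."""
    paths = sorted(pathlib.Path(folder).glob("*.wav"))
    total_seconds = sum(wav_seconds(p) for p in paths)
    return len(paths), total_seconds / 60


if __name__ == "__main__":
    count, minutes = dataset_stats("wavs")  # placeholder folder name
    print(f"{count} clips, {minutes:.1f} minutes")
```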
  • t

    tylerdurdenceketi

    09/15/2022, 8:39 AM
    That's not enough, I suppose. I tried IPA with 10 speakers and the speech was unrecognizable. I'll try to make a synthetic dataset using a TTS reader or something
  • p

    PixPrucer

    09/15/2022, 8:45 AM
    That will do pretty well as a base model
  • h

    hecko

    09/15/2022, 8:52 AM
    I vaguely tried, but it seems I'd at least have to make a new base model
  • h

    hecko

    09/15/2022, 8:52 AM
    which takes weeks
  • h

    hecko

    09/15/2022, 8:52 AM
    especially on a T4
  • h

    hecko

    09/15/2022, 8:52 AM
    IPA won't work on uberduck without extra coding
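A quick way to see whether transcripts even fit a model's text front end is to list the characters that fall outside its symbol set; for an English-only set, IPA characters such as ʃ or Turkish letters such as ı and ş would show up. A sketch; `SYMBOLS` below is a hypothetical English-letters-and-punctuation set, not uberduck's actual one.

```python
from collections import Counter

# Hypothetical symbol set: lowercase English letters, space and basic punctuation.
SYMBOLS = "abcdefghijklmnopqrstuvwxyz !'(),-.:;?"


def unsupported_characters(transcripts, symbols=SYMBOLS):
    """Count characters in the transcripts that are missing from the symbol set."""
    allowed = set(symbols)
    missing = Counter()
    for line in transcripts:
        missing.update(ch for ch in line if ch not in allowed)
    return dict(missing)


print(unsupported_characters(["merhaba dünya", "ışık"]))
```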
  • p

    PixPrucer

    09/15/2022, 9:31 AM
    Ah r.i.p
  • u

    {K EY1} (Kei)

    09/15/2022, 1:42 PM
    That's why a bunch of my models aren't on uberduck 🤭
  • c

    Cris140

    09/15/2022, 3:03 PM
    The way it's been implemented, it crashes after some time because of RAM. I had to change some stuff to get it working, but now it's working perfectly
  • z

    zwf

    09/15/2022, 3:04 PM
    you're the man 💯
  • h

    HolyArapaima

    09/15/2022, 3:04 PM
    I trained a model before this that accidentally had a different sample rate from everything else, and it came out half demon. When I fixed it, training somehow came out worse, so I'm going to do some investigating today and see what's up.
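One thing worth checking in an investigation like that is whether every clip actually has the sample rate the config expects. A sketch using the standard-library `wave` module (PCM WAVs only); `find_rate_mismatches` is a hypothetical helper, `"wavs"` is a placeholder folder, and 22050 is just the target rate discussed in this thread.

```python
import pathlib
import wave


def find_rate_mismatches(folder, expected_rate=22050):
    """Map file name -> sample rate for every *.wav in folder whose rate differs from expected_rate."""
    mismatches = {}
    for path in sorted(pathlib.Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as w:
            rate = w.getframerate()
        if rate != expected_rate:
            mismatches[path.name] = rate
    return mismatches


if __name__ == "__main__":
    for name, rate in find_rate_mismatches("wavs").items():  # placeholder folder
        print(f"{name}: {rate} Hz")
```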
  • h

    hecko

    09/15/2022, 3:24 PM
    IIRC, setting the sample rate to 44100 while the WAVs are 44100 makes the audio sound chipmunky
  • h

    hecko

    09/15/2022, 3:25 PM
    and to get the proper pitch you have to use 22050 * √2, and even then it's slower than it should be
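The thread doesn't explain why that value helps, but the arithmetic behind it is simple: 22050·√2 is the geometric mean of 22050 and 44100, so it sits exactly half an octave (6 semitones) from each.

$$
22050\sqrt{2} = \sqrt{22050 \cdot 44100} \approx 31183\ \text{Hz},
\qquad
12\log_2\frac{22050\sqrt{2}}{22050} = 12\log_2\frac{44100}{22050\sqrt{2}} = 12\cdot\tfrac{1}{2} = 6\ \text{semitones}.
$$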