# machine-learning
  • z

    zwf

    03/25/2021, 7:20 PM
    nice, good luck. it's a lot of fun
  • z

    zwf

    03/27/2021, 7:09 PM
    @User Here's the Glow-TTS repo. They have a notebook ("Inference.ipynb") that you can use
  • z

    zwf

    03/27/2021, 7:09 PM
    I've never tried it myself
  • u

    user

    03/27/2021, 7:09 PM
    How to run inference.ipynb
  • u

    user

    03/27/2021, 7:10 PM
    Can we use pretrained model?
  • z

    zwf

    03/27/2021, 7:11 PM
    yes, they link a pretrained model in the repo README.md
  • z

    zwf

    03/27/2021, 7:11 PM
    Oops, I forgot the link
  • z

    zwf

    03/27/2021, 7:11 PM
    https://github.com/jaywalnut310/glow-tts
  • u

    user

    03/27/2021, 7:11 PM
    Can we use the michael rosen datasets?
  • u

    user

    03/27/2021, 7:11 PM
    You're asking ZWF to hand over his datasets for you?
  • u

    user

    03/27/2021, 7:12 PM
    i made a michael R dataset already
  • z

    zwf

    03/27/2021, 7:12 PM
    you put together the audio, but you didn't transcribe it, which is the time-consuming part
  • u

    user

    03/27/2021, 7:12 PM
    how to transcribe it?
  • s

    SidPlays_144p

    03/27/2021, 7:13 PM
    you can use Descript or you can try to do it yourself
  • z

    zwf

    03/27/2021, 7:13 PM
    I've found Descript to be really useful
  • u

    user

    03/27/2021, 7:13 PM
    what is this?
  • z

    zwf

    03/27/2021, 7:13 PM
    Although you still need to go back through and correct the transcriptions
  • z

    zwf

    03/27/2021, 7:14 PM
    It's a program that lets you edit audio like text https://www.descript.com/
  • z

    zwf

    03/27/2021, 7:15 PM
    So yeah, basically the input to the model is a text file that looks like:
    path/to/wav/1.wav|Transcription of the first file.
    path/to/wav/2.wav|Transcription of the second file.
  • u

    user

    03/27/2021, 7:15 PM
    oh
  • z

    zwf

    03/27/2021, 7:15 PM
    where each individual wav is between 1 and 10 seconds
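
The filelist format described above (one `path|transcription` pair per line, each wav between 1 and 10 seconds) can be checked before training. This is a minimal sketch, not part of the Glow-TTS repo: the helper names `parse_filelist_line`, `wav_duration_seconds`, and `check_filelist` are hypothetical, and it assumes plain PCM wav files readable by Python's standard `wave` module.

```python
import wave
from pathlib import Path

# Duration bounds taken from the chat: each clip should be 1 to 10 seconds.
MIN_SECONDS = 1.0
MAX_SECONDS = 10.0


def parse_filelist_line(line):
    """Split one 'path|transcription' line into (path, text)."""
    path, sep, text = line.rstrip("\n").partition("|")
    if not sep or not path.strip() or not text.strip():
        raise ValueError(f"expected 'path|transcription', got: {line!r}")
    return path.strip(), text.strip()


def wav_duration_seconds(path):
    """Return the length of a PCM wav file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_filelist(filelist_path):
    """Return a list of (line_number, problem) tuples for a filelist."""
    problems = []
    lines = Path(filelist_path).read_text(encoding="utf-8").splitlines()
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            wav_path, _text = parse_filelist_line(line)
        except ValueError as error:
            problems.append((number, str(error)))
            continue
        if not Path(wav_path).is_file():
            problems.append((number, f"missing file: {wav_path}"))
            continue
        duration = wav_duration_seconds(wav_path)
        if not MIN_SECONDS <= duration <= MAX_SECONDS:
            problems.append((number, f"duration {duration:.2f}s out of range"))
    return problems
```

Calling `check_filelist("filelist.txt")` (a hypothetical filename) returns an empty list when every line parses and every clip falls inside the 1 to 10 second window. It does not check whether the transcriptions are accurate, so those still need to be corrected by hand.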
  • u

    user

    03/27/2021, 7:15 PM
    also we can write
  • z

    zwf

    03/27/2021, 7:16 PM
    I make my datasets using Descript, so if you create a Descript project where each paragraph contains 1 to 10 seconds of audio then I can easily make the training set
  • u

    user

    03/27/2021, 7:17 PM
    michael rosen needs to be updated on vo.codes
  • z

    zwf

    03/27/2021, 7:18 PM
    yeah, sounds like high-fidelity models are coming on vo.codes though
  • z

    zwf

    03/27/2021, 7:18 PM
    cuz they have more funding now
  • m

    Monero

    03/29/2021, 11:46 PM
    @zwf have you used Nvidia Nemo?
  • z

    zwf

    03/29/2021, 11:46 PM
    I've seen it, but never used it
  • m

    Monero

    03/30/2021, 12:01 AM
    Going to try it out, been downloading the docker container for the last hour 😅
  • z

    zwf

    03/30/2021, 12:06 AM
    nice, let us know how it goes!