# machine-learning
  • r

    Reclezon

    12/29/2022, 11:46 PM
    I don't know what the large corps do with it, or if they even support it at all
  • m

    mepc36

    12/30/2022, 2:34 PM
    Anyone know what TTS uberduck uses under the hood to power their AI generated raps? I'm trying to do something similar but am having problems making the tempo of the rap line up with the tempo of the music behind the rap. Here's their AI-generated raps: https://app.uberduck.ai/rap
  • m

    mepc36

    12/30/2022, 2:35 PM
    It looks like they send a `bpm` parameter as part of the payload. Makes me wonder if there's a TTS package that accepts a `bpm` argument and then outputs a rap at that bpm
  • m

    mepc36

    12/30/2022, 4:51 PM
    Okay so I think it's the mellotron package: https://github.com/NVIDIA/mellotron/blob/master/mellotron_utils.py#L240
  • m

    mepc36

    12/30/2022, 4:51 PM
    Does anyone have a link to a mellotron inference notebook? I found an outdated one that relied on tensorflow v1.15, but Colab removed support for tensorflow v1: https://colab.research.google.com/github/yhgon/mellotron/blob/master/inference_colab.ipynb
  • h

    hecko

    12/30/2022, 4:52 PM
    i thought we abandoned mellotron like a year ago
  • h

    hecko

    12/30/2022, 4:53 PM
    my guess is either fastpitch or a custom version of tacotron that does rhythm
  • m

    mepc36

    12/30/2022, 5:23 PM
    oooo good idea
  • m

    mepc36

    12/30/2022, 5:23 PM
    I'll play around and report back with what I find, thanks
  • m

    mepc36

    12/30/2022, 5:24 PM
    I've found other sites that do similar TTS/music alignment (like https://melobytes.com/en/app/rap) so I'd be a little surprised if it's completely custom/proprietary code
  • h

    hecko

    12/30/2022, 5:32 PM
    oh alignment is idk
  • h

    hecko

    12/30/2022, 5:32 PM
    i know ditty and tiktok did it but i'm not aware of any open-source code for it
  • m

    mepc36

    12/30/2022, 5:34 PM
    yeah sorry, alignment is a vague term - but basically the ability to take the output of a rapping TTS and then add music behind it whose tempo "aligns" with the tempo of the rapping TTS output
  • m

    mepc36

    12/30/2022, 5:34 PM
    is what I'm after
  • h

    hecko

    12/30/2022, 5:34 PM
    yeah i know what you mean
  • h

    hecko

    12/30/2022, 5:35 PM
    i haven't tried the demo so i just assume that they generate the alignment first and then synthesize using it
  • h

    hecko

    12/30/2022, 5:35 PM
    but a stretch approach could be a thing too
  • h

    hecko

    12/30/2022, 5:36 PM
    carykh tried that but settled on stretching on the verse level and hoping the rest still feels aligned-ish

    https://youtu.be/a0EyfdQ0QTQ
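For the verse-level stretch idea above, the arithmetic is small enough to sketch. This is only an illustration, not what carykh or uberduck actually run: the function name `verse_stretch_rate` and the example numbers are made up, and it assumes you already know how long the synthesized verse is and how many beats it should fill.

```python
def verse_stretch_rate(verse_seconds: float, bpm: float, beats: int) -> float:
    """Return the playback-rate factor that makes a verse last exactly `beats` beats.

    A rate above 1 means the audio must be sped up, below 1 slowed down.
    (Hypothetical helper, not part of any TTS package.)
    """
    if verse_seconds <= 0 or bpm <= 0 or beats <= 0:
        raise ValueError("verse_seconds, bpm and beats must all be positive")
    target_seconds = beats * 60.0 / bpm  # how long the verse should last
    return verse_seconds / target_seconds


# Example: a 10.5 s verse that should fill 4 bars of 4/4 (16 beats) at 90 bpm.
rate = verse_stretch_rate(10.5, bpm=90, beats=16)
```

The resulting factor can then be handed to a time-stretcher, for example `librosa.effects.time_stretch(y, rate=rate)`, where a rate above 1 speeds the signal up. Stretching a whole verse keeps the TTS rhythm intact but only guarantees the verse boundaries land on the beat, which matches the "aligned-ish" caveat above.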

  • h

    hecko

    12/30/2022, 5:36 PM
    alternatively since tacotron outputs an alignment graph you could use it to detect the start of each syllable and quantize them
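A rough sketch of the quantize step, assuming the syllable start times (in seconds) have already been pulled out of the alignment somehow; that extraction is not shown and depends on the model. The name `quantize_onsets` and the sixteenth-note grid are assumptions for illustration only.

```python
def quantize_onsets(onsets_sec, bpm, steps_per_beat=4):
    """Snap each onset time (seconds) to the nearest step of a beat grid.

    steps_per_beat=4 gives a sixteenth-note grid in 4/4. If two onsets would
    land on the same step, the later one is pushed to the next free step so
    the syllable order is preserved.
    """
    if bpm <= 0 or steps_per_beat <= 0:
        raise ValueError("bpm and steps_per_beat must be positive")
    step = 60.0 / bpm / steps_per_beat  # grid spacing in seconds
    quantized = []
    prev_index = -1
    for t in sorted(onsets_sec):
        index = max(round(t / step), prev_index + 1)
        quantized.append(index * step)
        prev_index = index
    return quantized


# Example: slightly off-grid onsets at 120 bpm (grid step = 0.125 s).
snapped = quantize_onsets([0.02, 0.27, 0.31, 0.52], bpm=120)
```

The snapped times only say where each syllable should start; you'd still have to move the audio there, e.g. by stretching or padding each syllable segment, which is where artifacts tend to creep in.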
  • m

    mepc36

    12/30/2022, 5:39 PM
    > i haven't tried the demo so i just assume that they generate the alignment first and then synthesize using it
    Damn that's clever, hadn't thought of it this way... what would be the inputs and outputs for this order of events? You input the song's lyrics, and get back a list of timestamps describing where each word occurs as an output? Or were you thinking of something else?
  • h

    hecko

    12/30/2022, 5:41 PM
    each syllable at least, each phoneme at best
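To make the input/output question concrete, here's a minimal sketch of the "alignment first" step at the syllable level. It assumes the lyrics are already split into syllables and puts exactly one syllable on each grid step, which is a big simplification (real flows have rests and held notes). `plan_syllable_times` is a hypothetical name, not from any library.

```python
def plan_syllable_times(syllables, bpm, steps_per_beat=2):
    """Assign each syllable a (start, end) slot on a fixed beat grid.

    Input: a list of syllable strings and a tempo.
    Output: a list of (syllable, start_sec, end_sec) tuples, one per syllable.
    steps_per_beat=2 means one syllable per eighth note.
    """
    if bpm <= 0 or steps_per_beat <= 0:
        raise ValueError("bpm and steps_per_beat must be positive")
    step = 60.0 / bpm / steps_per_beat  # seconds per syllable slot
    return [(syl, i * step, (i + 1) * step) for i, syl in enumerate(syllables)]


# Example: four syllables on an eighth-note grid at 120 bpm.
plan = plan_syllable_times(["rap", "ping", "du", "cky"], bpm=120)
```

A plan like this is the kind of timing map a synthesizer would then have to follow; going down to phonemes would just mean a finer list with the same shape.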
  • m

    mepc36

    12/30/2022, 5:44 PM
    How could you synthesize using a map like that though? Like, is there an argument in the CLI command where you can pass it in or something to tacotron?
  • m

    mepc36

    12/30/2022, 5:46 PM
    oh shit that video references my website haha
  • m

    mepc36

    12/30/2022, 5:50 PM
    wow i love how he added ad libs at the end of every stanza, that's super smart
  • h

    hecko

    12/30/2022, 5:59 PM
    with tacotron there's support for inputting an attention graph, basically which phoneme of input the ai should pay attention to at any given time
  • h

    hecko

    12/30/2022, 6:00 PM
    talknet is similar except instead of being blurry it's discrete (one phoneme at 100% and everything else at 0%)
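To picture the discrete case, here's a small sketch that turns per-phoneme durations (in output frames) into a frames x phonemes matrix with a single 1.0 per row. This is not TalkNet's or Tacotron's actual code or API, just the data shape being described; `hard_alignment` and the example durations are made up. A "blurry" attention graph would have the same shape with each row's weight spread over neighboring phonemes instead.

```python
def hard_alignment(durations):
    """Build a frames x phonemes matrix with exactly one 1.0 per row.

    durations[i] is the number of output frames assigned to phoneme i.
    Row f tells the model which phoneme frame f attends to.
    """
    if any(d < 0 for d in durations):
        raise ValueError("durations must be non-negative")
    n_phonemes = len(durations)
    rows = []
    for phoneme_index, n_frames in enumerate(durations):
        for _ in range(n_frames):
            row = [0.0] * n_phonemes
            row[phoneme_index] = 1.0  # 100% on this phoneme, 0% elsewhere
            rows.append(row)
    return rows


# Example: three phonemes lasting 2, 1 and 3 frames.
alignment = hard_alignment([2, 1, 3])
```

Changing the durations is what changes the rhythm: lengthen or shorten a phoneme's run of frames and its syllable moves relative to the beat.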
  • m

    mepc36

    12/30/2022, 6:12 PM
    Couldn't find this in the docs, you mind dropping a link to this docs section (or to an example of this attention graph), please?
  • m

    mepc36

    12/30/2022, 6:12 PM
    all my searches are just turning up alignment graphs
  • h

    hecko

    12/30/2022, 6:13 PM
    it's work-in-progress and not documented yet
  • m

    mepc36

    12/30/2022, 6:13 PM
    ah gotcha