Uberduck #machine-learning

Reclezon

12/29/2022, 11:46 PM

I don't know what the large corps do with it, or if they even support it at all

mepc36

12/30/2022, 2:34 PM

Does anyone know which TTS Uberduck uses under the hood to power their AI-generated raps? I'm trying to do something similar, but I'm having trouble getting the tempo of the rap to line up with the tempo of the music behind it. Here are their AI-generated raps: https://app.uberduck.ai/rap

mepc36

12/30/2022, 2:35 PM

It looks like they send a `bpm` parameter as part of the payload. Makes me wonder if there's a TTS package that accepts a `bpm` argument and then outputs a rap at that bpm

mepc36

12/30/2022, 4:51 PM

Okay, so I think it's the mellotron package: https://github.com/NVIDIA/mellotron/blob/master/mellotron_utils.py#L240

mepc36

12/30/2022, 4:51 PM

Does anyone have a link to a mellotron inference notebook? I found an outdated one that relies on TensorFlow 1.15, but Colab removed support for TensorFlow 1.x: https://colab.research.google.com/github/yhgon/mellotron/blob/master/inference_colab.ipynb

hecko

12/30/2022, 4:52 PM

i thought we abandoned mellotron like a year ago

hecko

12/30/2022, 4:53 PM

my guess is either fastpitch or a custom version of tacotron that does rhythm

mepc36

12/30/2022, 5:23 PM

oooo good idea

mepc36

12/30/2022, 5:23 PM

I'll play around and report back with what I find, thanks

mepc36

12/30/2022, 5:24 PM

I've found other sites that do similar TTS/music alignment (like https://melobytes.com/en/app/rap) so I'd be a little surprised if it's completely custom/proprietary code

hecko

12/30/2022, 5:32 PM

oh alignment is idk

hecko

12/30/2022, 5:32 PM

i know ditty and tiktok did it but i'm not aware of any open-source code for it

mepc36

12/30/2022, 5:34 PM

yeah sorry, alignment is a vague term - but basically the ability to take the output of a rapping TTS and then add music behind it whose tempo "aligns" with the tempo of the rapping TTS output

mepc36

12/30/2022, 5:34 PM

is what I'm after

hecko

12/30/2022, 5:34 PM

yeah i know what you mean

hecko

12/30/2022, 5:35 PM

i haven't tried the demo so i just assume that they generate the alignment first and then synthesize using it

hecko

12/30/2022, 5:35 PM

but a stretch approach could be a thing too

hecko

12/30/2022, 5:36 PM

carykh tried that but settled on stretching on the verse level and hoping the rest still feels aligned-ish

https://youtu.be/a0EyfdQ0QTQ
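
A minimal sketch of the verse-level stretch idea, assuming you already know the length of the synthesized verse and the beat's bpm. The function name `verse_stretch_rate` and the 4/4 default are made up for illustration; this is not what carykh or Uberduck actually run. It only picks the stretch factor that makes a verse fill a whole number of bars.

```python
def verse_stretch_rate(verse_seconds, bpm, beats_per_bar=4):
    """Return (rate, target_seconds) that fits a verse to a whole number of bars.

    rate > 1 means the verse must be sped up, rate < 1 means slowed down.
    """
    bar_seconds = beats_per_bar * 60.0 / bpm
    # Snap to the nearest whole bar count, but never to zero bars.
    n_bars = max(1, round(verse_seconds / bar_seconds))
    target_seconds = n_bars * bar_seconds
    return verse_seconds / target_seconds, target_seconds


# Hypothetical example: a 9.5 s TTS verse over a 120 bpm beat in 4/4.
rate, target = verse_stretch_rate(9.5, 120)
# The rate could then be handed to a time-stretcher, for example
# librosa.effects.time_stretch(y, rate=rate).
```

This only guarantees that the verse starts and ends on a bar line; syllables in between can still drift off the beat, which matches the "aligned-ish" caveat above.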

hecko

12/30/2022, 5:36 PM

alternatively since tacotron outputs an alignment graph you could use it to detect the start of each syllable and quantize them
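
A rough sketch of that quantization idea, assuming the alignment is available as a plain list of rows (one row per decoder frame, one attention weight per input token). The helper names, the list-of-lists layout, and the sixteenth-note grid are all assumptions for illustration; tokens here are whatever the model was fed (characters or phonemes), and grouping them into syllables is left out.

```python
def token_onsets(alignment, frame_seconds):
    """Map each input token to the time of the first frame that attends to it most.

    alignment: list of rows, one per decoder frame, each a list of
    attention weights over the input tokens.
    """
    onsets = {}
    for frame_idx, row in enumerate(alignment):
        token = max(range(len(row)), key=row.__getitem__)
        # Keep only the first frame where this token wins.
        onsets.setdefault(token, frame_idx * frame_seconds)
    return onsets


def quantize(onset_seconds, bpm, steps_per_beat=4):
    """Snap a time to the nearest grid step (sixteenth notes by default)."""
    step = 60.0 / bpm / steps_per_beat
    return round(onset_seconds / step) * step


# Toy alignment: 5 decoder frames over 3 input tokens, 0.1 s per frame.
toy_alignment = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.3, 0.6, 0.1],
    [0.1, 0.7, 0.2],
    [0.0, 0.2, 0.8],
]
onsets = token_onsets(toy_alignment, frame_seconds=0.1)
snapped = {tok: quantize(t, bpm=120) for tok, t in onsets.items()}
```

In practice `frame_seconds` would be the vocoder hop length divided by the sample rate, and the snapped onsets would then drive a per-segment stretch or a re-synthesis.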

mepc36

12/30/2022, 5:39 PM

> i haven't tried the demo so i just assume that they generate the alignment first and then synthesize using it

Damn, that's clever, I hadn't thought of it this way. What would the inputs and outputs be for that order of events? You input the song's lyrics and get back a list of timestamps describing where each word occurs? Or were you thinking of something else?

hecko

12/30/2022, 5:41 PM

each syllable at least, each phoneme at best
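
To make the "lyrics in, per-syllable timestamps out" step concrete, here is a deliberately naive sketch. It assumes the lyrics are already split into syllables and simply places them on a uniform grid; the function name and the two-syllables-per-beat default are hypothetical, and a real system would presumably predict a more varied rhythm.

```python
def syllable_grid(syllables, bpm, syllables_per_beat=2, start_seconds=0.0):
    """Assign each syllable a start time on an evenly spaced grid."""
    step = 60.0 / bpm / syllables_per_beat
    return [(syl, start_seconds + i * step) for i, syl in enumerate(syllables)]


# Hypothetical line, pre-split into syllables, at 120 bpm (eighth-note grid).
timeline = syllable_grid(["un", "der", "the", "hood"], bpm=120)
```

The resulting list of (syllable, start time) pairs is the kind of map a synthesizer would need to be conditioned on.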

mepc36

12/30/2022, 5:44 PM

How could you synthesize using a map like that though? Like, is there an argument in the tacotron CLI command where you can pass it in or something?

mepc36

12/30/2022, 5:46 PM

oh shit that video references my website haha

mepc36

12/30/2022, 5:50 PM

wow i love how he added ad libs at the end of every stanza, that's super smart

hecko

12/30/2022, 5:59 PM

with tacotron there's support for inputting an attention graph, basically which phoneme of input the ai should pay attention to at any given time

hecko

12/30/2022, 6:00 PM

talknet is similar except instead of being blurry it's discrete (one phoneme at 100% and everything else at 0%)
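
To illustrate the blurry-versus-discrete distinction, here is a small sketch that builds a discrete alignment from per-phoneme durations measured in frames. The function name and the list-of-lists layout are made up for illustration and are not TalkNet's or Tacotron's actual API.

```python
def hard_alignment(durations):
    """Build a one-hot alignment matrix from per-phoneme frame counts.

    Returns one row per output frame; each row has 1.0 for the phoneme
    being spoken in that frame and 0.0 everywhere else.
    """
    n_tokens = len(durations)
    rows = []
    for token, n_frames in enumerate(durations):
        for _ in range(n_frames):
            row = [0.0] * n_tokens
            row[token] = 1.0
            rows.append(row)
    return rows


# Three phonemes lasting 2, 1, and 3 frames respectively.
matrix = hard_alignment([2, 1, 3])
```

A Tacotron-style attention graph would have the same shape, but with each row spreading its weight over a few neighbouring phonemes instead of a single 1.0.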

mepc36

12/30/2022, 6:12 PM

Couldn't find this in the docs. Would you mind dropping a link to that docs section (or to an example of this attention graph), please?

mepc36

12/30/2022, 6:12 PM

all my searches are just turning up alignment graphs

hecko

12/30/2022, 6:13 PM

it's work-in-progress and not documented yet

mepc36

12/30/2022, 6:13 PM

ah gotcha