Uberduck #machine-learning

mepc36

01/06/2023, 3:03 PM

What's d-id?

mepc36

01/06/2023, 3:03 PM

This, I'm guessing? https://www.d-id.com/speaking-portrait/

zwf

01/06/2023, 3:05 PM

Yep

mepc36

01/06/2023, 3:08 PM

Does Uberduck have any speaking-portrait products, even in dev? I'd be really interested in using one of your guys' solutions, but the landing page doesn't list any

hecko

01/06/2023, 3:08 PM

open-source software

hecko

01/06/2023, 3:09 PM

though perhaps the more relevant term would be "self-hosted"

mepc36

01/06/2023, 3:09 PM

thank you sir, I got makeittalk working yesterday but it takes too long (30 seconds, which is not bad overall, but for an end user is a lifetime)

mepc36

01/06/2023, 3:09 PM

Sorry, didn't do a direct reply: Does Uberduck have any speaking-portrait products, even in dev? I'd be really interested in using one of your guys' solutions, but the landing page doesn't list any

hecko

01/06/2023, 3:10 PM

the pricing page does advertise "clone your face or voice" but i haven't heard what tech it uses or if it's even active

zwf

01/06/2023, 3:12 PM

We don't, sorry.

mepc36

01/06/2023, 3:24 PM

All good thank you sir!

Justin

01/06/2023, 3:25 PM

You can set it up locally as well

Heath

01/06/2023, 9:12 PM

What is the state of the art for swapping a face in a video? Is including the original hair possible? Is it possible from a photo?

PixPrucer

01/07/2023, 11:14 PM

https://youtu.be/7mUr-8h60kM

PixPrucer

01/07/2023, 11:14 PM

So there's this singing AI code library called NNSVS

{K EY1} (Kei)

01/08/2023, 12:05 AM

Nnsvs my beloved

Reclezon

01/10/2023, 1:56 AM

https://arstechnica.com/information-technology/2023/01/microsofts-new-ai-can-simulate-anyones-voice-with-3-seconds-of-audio/

(Dawn) Will Draw Fictional Women

01/10/2023, 2:16 AM

https://github.com/microsoft/unilm

(Dawn) Will Draw Fictional Women

01/10/2023, 2:16 AM

quick digging brings this up

PeaNutsAreGood

01/10/2023, 2:36 AM

is vall-e kind of like tortoise architecturally?

hecko

01/10/2023, 11:04 AM

at a glance the answer seems to be no: tortoise used diffusion and clip and stuff, whereas vall-e is like a language model but trained on encoded audio. so it's like stable diffusion vs gpt/vqgan

hecko

01/10/2023, 11:05 AM

or you could call it reverse whisper maybe

(Dawn) Will Draw Fictional Women

01/10/2023, 11:25 AM

yell

Justin

01/10/2023, 1:29 PM

WeegeeFan1

01/11/2023, 10:51 AM

@hecko this sounds like almost exactly what you're trying to put together

WeegeeFan1

01/11/2023, 10:52 AM

https://www.euronews.com/next/2023/01/10/after-chatgpt-and-dalle-meet-vall-e-the-text-to-speech-ai-that-mimics-anyones-voice

hecko

01/11/2023, 1:24 PM

pretty much, but they probably used regular human speech and i want character voices

hecko

01/11/2023, 1:26 PM

there's actually this thing called tortoise tts that was made 8 months ago, trained on 50k hours, and it's probably the best regular human speech tts i've ever heard

hecko

01/11/2023, 1:26 PM

plus it accepts a speaker embedding so i can mix voices together, whereas vall-e takes in sample audio directly
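
A minimal sketch of what "mixing voices" through speaker embeddings could look like, assuming each voice is represented as a fixed-length vector and that a weighted average of two such vectors is a usable blend. This is not the tortoise tts API: `mix_embeddings` and the toy vectors are hypothetical names and values made up for illustration.

```python
def mix_embeddings(embeddings, weights):
    """Return the weighted average of equal-length embedding vectors.

    Hypothetical helper for illustration only, not part of any TTS library.
    """
    if not embeddings or len(embeddings) != len(weights):
        raise ValueError("need one weight per embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalize by the weight total so the result stays on the same scale.
    return [
        sum(w * e[i] for e, w in zip(embeddings, weights)) / total
        for i in range(dim)
    ]


# Toy 3-dimensional "voices"; real speaker embeddings are much larger.
voice_a = [1.0, 0.0, 2.0]
voice_b = [0.0, 1.0, 0.0]

# Blend that leans 75% toward voice_a and 25% toward voice_b.
mixed = mix_embeddings([voice_a, voice_b], [0.75, 0.25])
```

A model that takes sample audio directly, as described for vall-e, has no such vector exposed to average, which is the contrast being drawn here.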

WeegeeFan1

01/12/2023, 2:23 AM

That's interesting i'll look at that