# machine-learning
  • m

    mepc36

    01/06/2023, 3:03 PM
    What's d-id?
  • m

    mepc36

    01/06/2023, 3:03 PM
    This, I'm guessing? https://www.d-id.com/speaking-portrait/
  • z

    zwf

    01/06/2023, 3:05 PM
    Yep
  • m

    mepc36

    01/06/2023, 3:08 PM
    Does uberduck have any speaking-portrait products, even in dev? I'd be really interested in using one of your solutions; the landing page doesn't list any, though
  • h

    hecko

    01/06/2023, 3:08 PM
    open-source software
  • h

    hecko

    01/06/2023, 3:09 PM
    though perhaps the more relevant term would be "self-hosted"
  • m

    mepc36

    01/06/2023, 3:09 PM
    thank you sir, I got makeittalk working yesterday but it takes too long (30 seconds, which is not bad overall, but for an end user is a lifetime)
  • m

    mepc36

    01/06/2023, 3:09 PM
    Sorry, didn't do direct reply: Does uberduck have any speaking-portrait products, even in dev? I'd be really interested in using one of your solutions; the landing page doesn't list any, though
  • h

    hecko

    01/06/2023, 3:10 PM
    the pricing page does advertise "clone your face or voice" but i haven't heard what tech it uses or if it's even active
  • z

    zwf

    01/06/2023, 3:12 PM
    We don't, sorry.
  • m

    mepc36

    01/06/2023, 3:24 PM
    All good thank you sir!
  • j

    Justin

    01/06/2023, 3:25 PM
    You can set it up locally as well
  • h

    Heath

    01/06/2023, 9:12 PM
    What is the state of the art for swapping a face in a video, keeping the original hair if possible? Is it possible from a photo?
  • p

    PixPrucer

    01/07/2023, 11:14 PM

    https://youtu.be/7mUr-8h60kM

  • p

    PixPrucer

    01/07/2023, 11:14 PM
    So there's this singing AI code library called NNSVS
  • u

    {K EY1} (Kei)

    01/08/2023, 12:05 AM
    Nnsvs my beloved
  • r

    Reclezon

    01/10/2023, 1:56 AM
    https://arstechnica.com/information-technology/2023/01/microsofts-new-ai-can-simulate-anyones-voice-with-3-seconds-of-audio/
  • u

    (Dawn) Will Draw Fictional Women

    01/10/2023, 2:16 AM
    https://github.com/microsoft/unilm
  • u

    (Dawn) Will Draw Fictional Women

    01/10/2023, 2:16 AM
    quick digging brings this up
  • p

    PeaNutsAreGood

    01/10/2023, 2:36 AM
    is vall-e kind of like tortoise architecturally?
  • h

    hecko

    01/10/2023, 11:04 AM
    at a glance the answer seems to be no: tortoise used diffusion and clip and stuff, whereas vall-e is like a language model but trained on encoded audio, so it's like stable diffusion vs gpt/vqgan
  • h

    hecko

    01/10/2023, 11:05 AM
    or you could call it reverse whisper maybe
  • u

    (Dawn) Will Draw Fictional Women

    01/10/2023, 11:25 AM
    yell
  • j

    Justin

    01/10/2023, 1:29 PM
    ow
  • w

    WeegeeFan1

    01/11/2023, 10:51 AM
    @hecko this sounds like almost exactly what you're trying to put together
  • w

    WeegeeFan1

    01/11/2023, 10:52 AM
    https://www.euronews.com/next/2023/01/10/after-chatgpt-and-dalle-meet-vall-e-the-text-to-speech-ai-that-mimics-anyones-voice
  • h

    hecko

    01/11/2023, 1:24 PM
    pretty much, but they probably used regular human speech and i want character voices
  • h

    hecko

    01/11/2023, 1:26 PM
    there's actually this thing called tortoise tts that was made 8 months ago, trained on 50k hours, and it's probably the best regular human speech tts i've ever heard
  • h

    hecko

    01/11/2023, 1:26 PM
    plus it accepts a speaker embedding so i can mix voices together, whereas vall-e takes in sample audio directly
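
A rough sketch of what "mixing voices" through a speaker embedding could look like, assuming each embedding is just a fixed-length vector of floats. The `mix_embeddings` helper and the tiny 4-value vectors are hypothetical, made up for illustration; this is not tortoise's actual API or its real embedding size.

```python
from typing import List, Sequence


def mix_embeddings(
    embeddings: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Return the weighted average of same-length speaker embedding vectors."""
    if not embeddings or len(embeddings) != len(weights):
        raise ValueError("need one weight per embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Average each dimension separately, normalising by the total weight.
    return [
        sum(w * e[i] for e, w in zip(embeddings, weights)) / total
        for i in range(dim)
    ]


# Hypothetical toy embeddings; real ones would have many more dimensions.
voice_a = [1.0, 0.0, 2.0, 4.0]
voice_b = [3.0, 2.0, 0.0, 0.0]

# An even blend of the two voices.
blend = mix_embeddings([voice_a, voice_b], [0.5, 0.5])
```

The blended vector would then be handed to the model wherever a single speaker's embedding normally goes. A model that takes sample audio directly has no equivalent vector exposed to average in this way.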
  • w

    WeegeeFan1

    01/12/2023, 2:23 AM
    That's interesting i'll look at that