# machine-learning
  • h

    hecko

    09/23/2022, 8:54 AM
    of course translating to english works better
    [00:00.000 --> 00:02.740]  Natural Examination from the English language
    [00:02.740 --> 00:04.720]  The basic level
    [00:04.720 --> 00:08.640]  You will hear two times the texts from the first to the third
    [00:08.640 --> 00:11.720]  Before listening to each text, you will hear the sound
    [00:11.720 --> 00:13.320]  The sound
    [00:13.320 --> 00:16.800]  In the recording, there are interruptions to get acquainted with the commands
    [00:16.800 --> 00:19.600]  and the content of the task is signalized by the sound
    [00:19.600 --> 00:21.040]  The sound
    [00:21.040 --> 00:24.160]  Unlock the individual tasks during listening to the recording
    [00:24.160 --> 00:28.160]  and during the break after listening to them
    [00:28.160 --> 00:30.600]  The first task
    [00:30.600 --> 01:00.560]  Read the command and get acquainted with the content of the task
  • h

    hecko

    09/23/2022, 8:55 AM
    this also illustrates another quirk, that being how sometimes whisper just decides to not do punctuation and sticks with it
  • h

    hecko

    09/23/2022, 8:59 AM
    i think that could be helped with the `prompt` option
  • h

    hecko

    09/23/2022, 9:00 AM
    give it a sample of what we want the style to be, e.g.
    Doctor number five is gonna like, fix the, the thing.
  • h

    hecko

    09/23/2022, 9:01 AM
    prompt engineering in speech recognition what has the world come to
  • h

    hecko

    09/23/2022, 9:10 AM
    ok so `prompt` doesn't do anything
  • h

    hecko

    09/23/2022, 9:11 AM
    `prefix` though makes it go from
    15. Burger King Foot Lettuce The last thing you'd want in your Burger King burger is someone's foot fungus, but as it turns out, that might be what you get.
    to
    Number 15, Burger King foot lettuce. The last thing you'd want in your Burger King burger is someone's foot fungus, but as it turns out, that might be what you get.
  • h

    hecko

    09/23/2022, 9:11 AM
    which i think is a decent improvement
  • h

    hecko

    09/23/2022, 9:11 AM
    @mega b prompt engineering real
  • h

    hecko

    09/23/2022, 9:13 AM
    i will note that `prefix` seems to be for when the text is in fact represented in the audio, since my more targeted prompt
    Number twelve. Number thirteen. Number fourteen.
    made it skip over number fifteen entirely
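A minimal sketch of how the two options discussed above might be passed from Python. It assumes the `openai-whisper` package, where `transcribe()` accepts `initial_prompt` and forwards extra keyword arguments such as `prefix` to its decoding options; those names may differ from the build used in this conversation, and the helper names `build_transcribe_kwargs` and `transcribe_with_style` are hypothetical.

```python
from typing import Any, Dict, Optional


def build_transcribe_kwargs(
    style_prompt: Optional[str] = None,
    prefix: Optional[str] = None,
    translate: bool = False,
) -> Dict[str, Any]:
    """Collect keyword arguments for whisper's transcribe() (assumed names)."""
    kwargs: Dict[str, Any] = {"task": "translate" if translate else "transcribe"}
    if style_prompt:
        # Assumption: treated as preceding context, i.e. a style sample
        # that is NOT expected to be spoken in the audio.
        kwargs["initial_prompt"] = style_prompt
    if prefix:
        # Assumption: forced start of the decoded text, so it should match
        # words that really are spoken at the start of the audio.
        kwargs["prefix"] = prefix
    return kwargs


def transcribe_with_style(audio_path: str, model_name: str = "base", **options: Any) -> str:
    """Run whisper with the options above; requires the openai-whisper package."""
    import whisper  # imported lazily so the helper above works without it

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, **build_transcribe_kwargs(**options))
    return result["text"]
```

For example, `transcribe_with_style("clip.wav", style_prompt="uh, um, like, well,")` would try the filler-word sample as context (`clip.wav` is a placeholder). Options forwarded this way are likely applied to every 30-second window, so the sketch is best suited to short clips.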
  • h

    hecko

    09/23/2022, 12:38 PM
    update: can be transcribed with the stutters by adding `uh, um, like, well,` but it adds a ghost `this is,` at the start
  • h

    hecko

    09/23/2022, 12:40 PM
    never thought i'd enjoy an autoregressive model
  • m

    mega b

    09/23/2022, 3:25 PM
    Cool, but what sucks about prompting is that it costs some tokens. It's obviously still useful tho, so it's probably worth working out the most efficient prompts.
  • h

    hecko

    09/23/2022, 3:27 PM
    there's a token cost?
  • h

    hecko

    09/23/2022, 3:27 PM
    or do you mean because of attention
  • h

    hecko

    09/23/2022, 3:27 PM
    or
  • h

    hecko

    09/23/2022, 3:27 PM
    bhh words
  • h

    hecko

    09/23/2022, 3:28 PM
    --- i wonder if we could do a softprompt
  • h

    hecko

    09/23/2022, 3:34 PM
    barring that here's what a human-engineered prompt should include:
    - an unabbreviated title, e.g. `Doctor`, `Miss` etc
    - a number word
    - punctuation (maybe even ungrammatical-but-matching-intonation)
    - filler words
    - a swear word (there's at least one reported instance of it doing censorship but it seems to be rare)
    - be unlikely to be the start of a sentence because attention could mess things up
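The checklist above can be turned into a rough sanity check for a candidate style prompt. This is only a sketch: the word lists and the function name `check_style_prompt` are illustrative assumptions, and the swear-word and "not the start of a sentence" items are left out because they are hard to test mechanically.

```python
import re
from typing import Dict

# Illustrative word lists (assumptions, not exhaustive).
TITLES = {"doctor", "miss", "mister", "professor"}
NUMBER_WORDS = {
    "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
}
FILLERS = {"uh", "um", "like", "well"}


def check_style_prompt(prompt: str) -> Dict[str, bool]:
    """Report which checklist items a candidate prompt appears to cover."""
    words = re.findall(r"[a-z']+", prompt.lower())
    return {
        "unabbreviated_title": any(w in TITLES for w in words),
        "number_word": any(w in NUMBER_WORDS for w in words),
        "punctuation": bool(re.search(r"[,.!?]", prompt)),
        "filler_word": any(w in FILLERS for w in words),
    }


if __name__ == "__main__":
    sample = "Doctor number five is gonna like, fix the, the thing."
    for item, covered in check_style_prompt(sample).items():
        print(f"{item}: {covered}")
```

The sample sentence from earlier in the thread covers all four of the checked items.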
  • m

    mega b

    09/23/2022, 3:35 PM
    I could be wrong but that might be why they cut to 30 seconds audio automatically
  • m

    mega b

    09/23/2022, 3:36 PM
    I haven't seen censoring, I've seen the f word in rap God and "motherfucker" a ton
  • m

    mega b

    09/23/2022, 3:36 PM
    I'm sure YouTube censors tho
  • h

    hecko

    09/23/2022, 3:36 PM
    yeah i'm just going off of one reddit comment
  • m

    mega b

    09/23/2022, 3:36 PM
    Maybe they scraped a tiny bit of yt
  • h

    hecko

    09/23/2022, 3:36 PM
    to which the immediate reply was "haha closedai puritans"
  • m

    mega b

    09/23/2022, 3:36 PM
    😔
  • h

    hecko

    09/23/2022, 3:42 PM
    pff i accidentally put an instrumental through it and the hallucination is real
  • m

    mega b

    09/23/2022, 4:07 PM
    it should've outputted musical notes
  • h

    hecko

    09/23/2022, 4:10 PM
    if only they trained it on the subtitles for tom scott plus
  • h

    hecko

    09/23/2022, 4:15 PM
    actually hm a softprompt would be better anyway because it could affect things like how often it splits the audio (for link2model 2)