Uberduck #tacotron-2-support

k24789304

03/20/2023, 9:10 PM

lol yea a single 2 hour long wav

Gabherelol

03/20/2023, 9:11 PM

I don’t think that’s how it works

k24789304

03/20/2023, 9:11 PM

yea i dont know how things work

Minecraftian47 (make x from y)

03/20/2023, 9:20 PM

It must be split up into clips of various lengths.
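
A minimal sketch of that splitting step, using only Python's standard `wave` module. The function name `split_wav` and the 10-second default are assumptions made for illustration, not something from the Uberduck notebooks. It cuts blindly at fixed intervals, so clips can end mid-word; a real dataset is usually cut at pauses instead.

```python
import os
import wave


def split_wav(src_path, out_dir, max_seconds=10.0):
    """Cut one long PCM wav into consecutive clips of at most max_seconds.

    Hypothetical helper: clips are written as 1.wav, 2.wav, ... in out_dir.
    """
    os.makedirs(out_dir, exist_ok=True)
    out_paths = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        # Number of audio frames that fit in one clip.
        frames_per_clip = int(src.getframerate() * max_seconds)
        index = 1
        while True:
            frames = src.readframes(frames_per_clip)
            if not frames:
                break  # reached the end of the source file
            out_path = os.path.join(out_dir, f"{index}.wav")
            with wave.open(out_path, "wb") as dst:
                # Same channels, sample width and rate as the source;
                # the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            out_paths.append(out_path)
            index += 1
    return out_paths
```

Calling `split_wav("long.wav", "clips")` would return the list of clip paths it wrote.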

Cris140

03/20/2023, 9:21 PM

Follow the tutorial, then start going into the notebooks

Gosmokeless28

03/21/2023, 12:11 AM

I don't blame you. There aren't any tutorials linked in plain sight on uberduck.ai.

k24789304

03/21/2023, 6:27 AM

i remember looking at uberduck's website for the tutorial but it was outdated hence im here

k24789304

03/21/2023, 6:27 AM

maybe there is a tutorial link here on discord that i havent noticed yet?

Gosmokeless28

03/21/2023, 8:51 AM

You mean like the one that's linked in #841422801965416538?

k24789304

03/21/2023, 12:51 PM

What might be causing this error? I don't think it's due to an incorrect sampling rate. Do I need to decompress the wav files? The error is from the transcribing notebook.

Error                                     Traceback (most recent call last)
/usr/local/lib/python3.9/dist-packages/whisper/audio.py in load_audio(file, sr)
     41         out, _ = (
---> 42             ffmpeg.input(file, threads=0)
     43             .output("-", format="s16le", acodec="pcm_s16le", ac=1, ar=sr)

4 frames
Error: ffmpeg error (see stderr output for detail)

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
/usr/local/lib/python3.9/dist-packages/whisper/audio.py in load_audio(file, sr)
     45         )
     46     except ffmpeg.Error as e:
---> 47         raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}")
     48 
     49     return np.frombuffer(out, np.int16).flatten().astype(np.float32) / 32768.0

RuntimeError: Failed to load audio: ffmpeg version 4.2.7-0ubuntu0.1 Copyright (c) 2000-2022 the 
wavs/wavs.zip: Invalid data found when processing input
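
The last line of the traceback shows ffmpeg being handed `wavs/wavs.zip`, so the archive itself apparently ended up among the files passed to Whisper. One way to rule that out is to unpack the archive and keep only the `.wav` members before transcribing. The sketch below is an assumption about how that could be done with the standard `zipfile` module; `extract_wavs` is a hypothetical helper, not part of the notebook.

```python
import zipfile
from pathlib import Path


def extract_wavs(zip_path, out_dir):
    """Extract only the .wav members of zip_path into out_dir, flattened.

    Returns (extracted_paths, skipped_member_names). Anything that is not
    a .wav file (for example a nested zip) is reported instead of extracted.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted, skipped = [], []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Drop any folder prefix so every clip lands directly in out_dir.
            name = Path(info.filename).name
            if name.lower().endswith(".wav"):
                target = out_dir / name
                target.write_bytes(zf.read(info))
                extracted.append(target)
            else:
                skipped.append(info.filename)
    return extracted, skipped
```

If `skipped` comes back non-empty, those entries are the ones ffmpeg would have failed on.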

Cris140

03/21/2023, 3:13 PM

First, are you using one big file or cut pieces of audio to transcribe?

k24789304

03/21/2023, 3:24 PM

just a bunch of 19-second wav files

k24789304

03/21/2023, 3:25 PM

roughly 400 files

k24789304

03/21/2023, 3:25 PM

inside the zip

Cris140

03/21/2023, 3:36 PM

You will need 15 seconds or less in each file

Cris140

03/21/2023, 3:36 PM

otherwise you won't be able to train
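
A quick way to find clips over the limit before uploading, sketched with the standard `wave` module. `clips_over_limit` is a hypothetical helper, and the 15-second default simply mirrors the figure quoted above. It assumes plain uncompressed PCM wav files, which is what the `wave` module reads.

```python
import wave
from pathlib import Path


def clips_over_limit(wav_dir, max_seconds=15.0):
    """Return (file name, duration in seconds) for every clip longer than max_seconds."""
    too_long = []
    for path in sorted(Path(wav_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as w:
            # Duration is the frame count divided by frames per second.
            seconds = w.getnframes() / w.getframerate()
        if seconds > max_seconds:
            too_long.append((path.name, seconds))
    return too_long
```

An empty list means every clip in the folder is within the limit.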

k24789304

03/21/2023, 4:01 PM

swag

k24789304

03/21/2023, 4:03 PM

What is the limit on how many files the free version of Google Colab can handle?

k24789304

03/21/2023, 4:36 PM

should these files be named in a specific way

Reclezon

03/21/2023, 4:43 PM

No. It's easier to use number labels like 1.wav, 2.wav, etc.

Reclezon

03/21/2023, 4:43 PM

So most people do that
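
The numbering convention could be applied to an existing folder roughly like this. `number_wavs` is a hypothetical helper, and ordering the clips by their current name is an assumption. It renames in two passes so that a file already called `2.wav` is not overwritten halfway through.

```python
from pathlib import Path


def number_wavs(wav_dir):
    """Rename every .wav in wav_dir to 1.wav, 2.wav, ... in sorted-name order.

    Returns a dict mapping each old file name to its new file name.
    """
    wav_dir = Path(wav_dir)
    originals = sorted(wav_dir.glob("*.wav"))

    # Pass 1: move everything to temporary names to avoid collisions.
    temps = []
    for i, path in enumerate(originals, start=1):
        temp = path.with_name(f"__tmp_{i}.wav")
        path.rename(temp)
        temps.append(temp)

    # Pass 2: move the temporary names to their final numbered names.
    mapping = {}
    for i, (orig, temp) in enumerate(zip(originals, temps), start=1):
        final = wav_dir / f"{i}.wav"
        temp.rename(final)
        mapping[orig.name] = final.name
    return mapping
```

Keeping the returned mapping around makes it possible to trace a numbered clip back to its original name.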

k24789304

03/21/2023, 4:55 PM

i should upload the files as a zip right? i shouldnt decompress all the wav files inside?

Reclezon

03/21/2023, 5:08 PM

The whole dataset should be just 1 .zip

Reclezon

03/21/2023, 5:09 PM

I don't think it expects nested zips?

k24789304

03/21/2023, 8:06 PM

dont have any nested zips or folders within, just wav files
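
For reference, a flat archive of that shape (wav files only, no folders, no nested zips) can be built with the standard `zipfile` module. This is a sketch; `zip_wavs_flat` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path


def zip_wavs_flat(wav_dir, zip_path):
    """Write every .wav in wav_dir into zip_path at the top level of the archive.

    Returns the list of member names that were written.
    """
    wavs = sorted(Path(wav_dir).glob("*.wav"))
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in wavs:
            # arcname=path.name stores the file without any folder prefix.
            zf.write(path, arcname=path.name)
    return [path.name for path in wavs]
```

Opening the result with `zipfile.ZipFile(zip_path).namelist()` should then show only names like `1.wav`, `2.wav`.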

Gosmokeless28

03/21/2023, 8:28 PM

12*

Gosmokeless28

03/21/2023, 8:29 PM

That's the cutoff duration

Cris140

03/21/2023, 8:29 PM

T4 can handle 15 seconds

Cris140

03/21/2023, 8:29 PM

If the batch size is 18 or lower