# talknet-support
  • r

    Reclezon

    12/27/2022, 3:34 AM
    Hold the fuck up
  • r

    Reclezon

    12/27/2022, 3:34 AM
    There's a web app for audacity?
  • r

    Reclezon

    12/27/2022, 3:35 AM
  • p

    PixPrucer

    12/28/2022, 6:48 AM
    Not really TalkNet, but HiFi-GAN fine-tuning. I'm currently trying to run the notebook, but I'm getting stuck at the "Compose required files" cell.
  • p

    PixPrucer

    12/28/2022, 6:48 AM
    No matter where I put the Tacotron model, it refuses to load it (I even tried putting it into the /content/hifi-gan/ folder and the same error occurs).
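
Not the notebook's own code, but a quick hedged way to rule out a path problem before the "Compose required files" cell: check that the checkpoint is where you think it is and that it loads at all (the path below is a placeholder, not a known-correct location).

```python
import os
import torch

# Placeholder path -- substitute wherever you actually uploaded the Tacotron checkpoint
ckpt_path = "/content/hifi-gan/tacotron2_statedict.pt"

# Confirm the runtime can actually see the file
print(os.path.exists(ckpt_path), ckpt_path)

# If the file exists but still fails, loading it directly shows the real error
state = torch.load(ckpt_path, map_location="cpu")
print(type(state), list(state.keys())[:5] if isinstance(state, dict) else state)
```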
  • x

    xomnow

    12/30/2022, 12:20 AM
    Question on your recommended settings if I may... I've got the notebook running locally on my GPU and I'm able to run larger batch sizes than the defaults. Is there any benefit to this, or do I run the risk of overfitting my models if I tinker with batch size?
  • g

    Gosmokeless28

    12/30/2022, 2:34 AM
    The only benefit of using a larger batch size is that it makes the training happen faster—but at the risk of underfitting the model.
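
Not advice from the notebook itself, but one widely used rule of thumb when raising the batch size is to scale the learning rate up roughly in proportion, so the larger batches don't quietly undertrain the model. A minimal sketch with placeholder base values (not the notebook's actual defaults):

```python
# Hedged sketch of the linear-scaling rule of thumb; base values are placeholders.
base_batch_size = 32
base_lr = 1e-3

def scaled_lr(batch_size: int) -> float:
    """Scale the learning rate linearly with batch size (one common heuristic)."""
    return base_lr * batch_size / base_batch_size

print(scaled_lr(64))   # 0.002 -- doubling the batch roughly doubles the learning rate
print(scaled_lr(128))  # 0.004
```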
  • x

    xomnow

    12/30/2022, 2:36 AM
    thanks for that - one other question if I may - should the learning rates for the other training steps remain default, or is there benefit in changing to the 1e-4/3e-7 on those as well?
  • g

    Gosmokeless28

    12/30/2022, 2:37 AM
    I've never tinkered with those, actually. In my opinion, it's alright to leave them unchanged.
  • g

    Gosmokeless28

    12/30/2022, 2:37 AM
    I only change the Spectrogram Generator's parameters
  • x

    xomnow

    12/30/2022, 2:39 AM
    ty kindly for the answers - I have a suspicion I've been over-training things since I worked out completely local training. I have solid datasets and fully checked transcriptions but the results haven't been nearly as good as I expected
  • g

    Gosmokeless28

    12/30/2022, 2:40 AM
    It took me months to realize that I had actually been lowering the learning rates instead of raising them, lol.
  • g

    Gosmokeless28

    12/30/2022, 2:41 AM
    I thought 1e-4 & 3e-7 were higher than 1e-3 & 3e-6. It wasn't until recently that I learned that those are negative numbers.
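
For anyone else tripped up by the notation: the digit after the e is a power of ten, so a more negative exponent means a smaller number, not a larger one. A quick check in Python:

```python
# Scientific notation: 1e-4 means 1 * 10**-4
print(1e-3)          # 0.001
print(1e-4)          # 0.0001
print(1e-4 < 1e-3)   # True -- switching from 1e-3 to 1e-4 lowers the learning rate
print(3e-7 < 3e-6)   # True
```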
  • x

    xomnow

    12/30/2022, 2:46 AM
    is there any other "special sauce" advice you think particularly relevant? all my data is from audiobooks with generally clear recordings, but I can't seem to get past the slight metallic tinge when I synthesize
  • g

    Gosmokeless28

    12/30/2022, 2:49 AM
    You should train the TalkNet model's HiFi-GAN vocoder for 3,100 epochs (Not to be confused with 3,100 steps).
  • x

    xomnow

    12/30/2022, 2:50 AM
    I want to make sure I have that right... 3100 epochs? this is contrary to where you said 5k steps (though, to be sure, you mention not to worry about overfitting)
  • g

    Gosmokeless28

    12/30/2022, 2:51 AM
    For HiFi-GAN, training for as many epochs as you can actually causes the vocoder to perform better for the TalkNet model it belongs to.
  • x

    xomnow

    12/30/2022, 2:52 AM
    this is a bit of a weird question - roughly how many steps are in an epoch? I can't for the life of me get my local running copy to output steps/epochs as the colab one does. I ginned up a way to parse the log to see steps, but I don't see epochs
  • g

    Gosmokeless28

    12/30/2022, 2:55 AM
    Good question. That depends on the amount of data you're training the model with. If you're training HiFi-GAN with a large amount of data, there are many steps per epoch.
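
In the usual PyTorch-style training loop, an epoch is one full pass over the dataset and each batch is one optimizer step, so steps per epoch follow from dataset size and batch size. A back-of-the-envelope sketch with placeholder numbers (these are not the notebook's defaults):

```python
import math

# Placeholder numbers -- substitute your own dataset size and batch size
num_clips = 5200      # audio clips in the training set
batch_size = 8

steps_per_epoch = math.ceil(num_clips / batch_size)
print(steps_per_epoch)  # 650 -- one step per batch, one epoch per full pass over the data
```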
  • x

    xomnow

    12/30/2022, 2:56 AM
    I'm trying to look back on old runs on colab - looks like ~650/epoch, something in that neighborhood
  • x

    xomnow

    12/30/2022, 2:57 AM
    wow, so. something akin to ~2mil steps
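
Putting those two numbers together, roughly 650 steps per epoch for 3,100 epochs works out to about two million steps:

```python
steps_per_epoch = 650    # xomnow's estimate from old Colab runs
epochs = 3100            # Gosmokeless28's recommended epoch count for the HiFi-GAN vocoder

total_steps = steps_per_epoch * epochs
print(f"{total_steps:,}")  # 2,015,000 -- about 2 million steps
```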
  • h

    hudmaceachern

    12/31/2022, 2:07 PM
    for some reason now this happens:
    ---------------------------------------------------------------------------
    MessageError                              Traceback (most recent call last)
    in
          1 #@markdown Step 2: Mount Google Drive.
          2 from google.colab import drive
    ----> 3 drive.mount('drive')

    3 frames
    /usr/local/lib/python3.8/dist-packages/google/colab/_message.py in read_reply_from_input(message_id, timeout_sec)
        100             reply.get('colab_msg_id') == message_id):
        101           if 'error' in reply:
    --> 102             raise MessageError(reply['error'])
        103           return reply.get('data', None)
        104
    MessageError: Error: credential propagation was unsuccessful
  • r

    Reclezon

    12/31/2022, 2:54 PM
    It didn't sign into GDrive
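
Re-running the mount cell and completing the Google sign-in pop-up is usually what clears the "credential propagation was unsuccessful" error. The standard Colab mount call, for reference ('/content/drive' is the conventional mount point, and force_remount is optional):

```python
from google.colab import drive

# Re-run this and complete the Google account pop-up when it appears;
# force_remount=True retries even if a previous mount attempt half-succeeded.
drive.mount('/content/drive', force_remount=True)
```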
  • f

    Felipixel

    12/31/2022, 8:29 PM
    Hello, I have a question. I'm training a singing TalkNet model with 228 wavs; what's the best number of steps for it? I trained one model before with 150 wavs and, if I'm not wrong, I had around 4,000 steps. This new 228-wav model was trained for around 5,300 steps and it sounded horrible compared to the 150-wav one. Then I tried to test and train my old 150-wav model again, this time for 10k steps, and it ended up sounding as bad as the 228-wav one. TL;DR: what's the best number of steps when training a 228-wav model?
  • a

    Alexius08

    01/02/2023, 6:40 AM
    The reference audio I'm using is still messy after using vocal isolators on it. Is the "debug pitch" button revealing the version of the reference audio being used to generate the resulting audio? It's something I might be able to clean up.
  • u

    (Dawn) Will Draw Fictional Women

    01/02/2023, 7:03 AM
    no, it's
  • u

    (Dawn) Will Draw Fictional Women

    01/02/2023, 7:03 AM
    debugging the pitch detection
  • u

    (Dawn) Will Draw Fictional Women

    01/02/2023, 7:04 AM
    it takes the clip as a whole because it has to take into account both pitch and duration
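
For anyone who wants to eyeball a reference clip's pitch outside the notebook, here is a rough sketch of whole-clip f0 extraction using librosa's pyin. This is not TalkNet's own pitch-detection code, just an approximation for spotting noisy references (the file path is a placeholder):

```python
import librosa

# Placeholder path -- point this at your reference clip
y, sr = librosa.load("reference.wav", sr=22050)

# Estimate f0 over the whole clip; unvoiced frames come back as NaN
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)
print(f0[:20])  # frame-by-frame pitch in Hz, NaN where no voicing was detected
```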
  • w

    WeegeeFan1

    01/03/2023, 1:14 AM
    The first is SUPPOSED to be the regurgitation of the second. I trained the model on the dataset that includes the second clip. Something isn't normal
  • w

    WeegeeFan1

    01/03/2023, 1:15 AM
    I'm really hoping Diff SVC can take in some of TalkNet and the way it should function