Uberduck

https://app.uberduck.ai/voice-model/36031100-5842-476e-95c4-a77bfebbb3fc

though it seems to depend on the sample rate, even though it said it would normalize it smh
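If the encoder does assume a fixed input rate, the safe move is to resample every clip to that rate before calling `encode_batch`. If the `classifier` is a SpeechBrain pretrained model, `Pretrained.load_audio` is meant to do that normalization (resample + downmix to the model's input format). A raw tensor from `torchaudio.load` goes in at whatever rate the file has, which might be the mismatch here (a guess, not confirmed). Below is a rough numpy sketch assuming a 16 kHz target, which is what the VoxCeleb speaker models are trained on. It is plain linear interpolation with no anti-aliasing filter, so for real use `torchaudio.functional.resample` is the better tool. `resample_linear` and `TARGET_SR` are hypothetical names:

```py
import numpy as np

TARGET_SR = 16_000  # assumption: the speaker encoder expects 16 kHz input

def resample_linear(wave, sr, target_sr=TARGET_SR):
    """Crude resampler for a 1-D mono signal: linear interpolation, no anti-aliasing."""
    if sr == target_sr:
        return wave
    n_out = int(round(len(wave) * target_sr / sr))
    t_in = np.arange(len(wave)) / sr       # input sample times, in seconds
    t_out = np.arange(n_out) / target_sr   # output sample times, in seconds
    return np.interp(t_out, t_in, wave)    # values past the last input are clamped

# one second of a 440 Hz tone at 44.1 kHz -> 16000 samples at 16 kHz
sr = 44_100
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
print(len(resample_linear(tone, sr)))  # 16000
```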

the cruise seals clip finds the right voice in 4th place, not bad

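```py
# classifier and data_path are defined above
import numpy
import torchaudio
from numpy.linalg import norm
from tinydb import TinyDB
from tinydb.middlewares import CachingMiddleware
from tinydb.storages import JSONStorage

# load the test clip, downmixed to mono so encode_batch sees a single item
waveform, sample_rate = torchaudio.load("untitled.wav")
test_embedding = numpy.array(classifier.encode_batch(waveform.mean(dim=0, keepdim=True)).squeeze().tolist())

with TinyDB(data_path, storage=CachingMiddleware(JSONStorage)) as db:
    # entries with no sample audio have a non-list avg_embedding, so skip them
    voices = [{**i, "distance": norm(numpy.array(i["avg_embedding"]) - test_embedding)}
              for i in db.all() if isinstance(i["avg_embedding"], list)]

# print the 10 closest voices by Euclidean distance
print("\n".join(f"{i['name']} - {i['distance']}" for i in sorted(voices, key=lambda x: x["distance"])[:10]))
```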
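For what it's worth, the same lookup can be done in one vectorized pass (stack all the averages into a matrix, take one norm per row, argsort), which should scale better as the database grows. A sketch using the same `name`/`avg_embedding` keys as the snippet above; `top_k_voices` is a hypothetical helper, and the toy `records` list stands in for `db.all()`:

```py
import numpy as np

def top_k_voices(records, query, k=10):
    """Return (name, distance) for the k records closest to query by Euclidean distance."""
    # skip records without a usable vector (e.g. a null or NaN avg_embedding)
    usable = [r for r in records if isinstance(r.get("avg_embedding"), list)]
    if not usable:
        return []
    matrix = np.array([r["avg_embedding"] for r in usable], dtype=float)     # (n_voices, dim)
    dists = np.linalg.norm(matrix - np.asarray(query, dtype=float), axis=1)  # one per voice
    order = np.argsort(dists)[:k]  # indices of the k smallest distances
    return [(usable[i]["name"], float(dists[i])) for i in order]

# toy 2-dim records standing in for db.all()
records = [
    {"name": "a", "avg_embedding": [0.0, 0.0]},
    {"name": "b", "avg_embedding": [3.0, 4.0]},
    {"name": "c", "avg_embedding": float("nan")},
]
for name, dist in top_k_voices(records, [0.0, 1.0], k=2):
    print(f"{name} - {dist}")
```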

could share the embeddings file, but it's 150 MB and, as shown above, it was computed at the wrong sample rate, so beh
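Part of why a file like that gets big is that JSON writes every float as decimal text, roughly 20 bytes per number instead of 4 for raw float32. A quick estimate, assuming 192-dim embeddings (the size of SpeechBrain's ECAPA speaker embeddings, an assumption about which `classifier` was used), 5 vectors per voice (4 per-sample + 1 average), and random values as stand-ins for real data. `json_bytes_per_voice` is a hypothetical helper:

```py
import json
import random

DIM = 192       # assumption: ECAPA-style embedding size
PER_VOICE = 5   # 4 per-sample embeddings + 1 average

def json_bytes_per_voice(dim=DIM, per_voice=PER_VOICE, seed=0):
    """Estimate the bytes one voice's embeddings take when dumped as JSON text."""
    rng = random.Random(seed)
    embeddings = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(per_voice)]
    return len(json.dumps(embeddings).encode("utf-8"))

size = json_bytes_per_voice()
raw = DIM * PER_VOICE * 4  # the same numbers as raw float32
print(f"JSON: {size} bytes, float32: {raw} bytes, ratio ~{size / raw:.1f}x")
```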

if i were brave i'd turn this into a hosted service

wait, what does it locate from: the bot cmd name, the model name, or the

it has all the metadata stored just in case

it's much smaller than the embeddings anyway, because there are 5 embeddings per voice (1 per sample audio + the average) and each is hundreds of floats written out as text
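If the average is the plain element-wise mean of the per-sample vectors (an assumption; the thread doesn't show how it was computed), it boils down to this. `average_embedding` is a hypothetical name:

```py
import numpy as np

def average_embedding(embeddings):
    """Element-wise mean of one voice's per-sample embeddings."""
    stacked = np.array(embeddings, dtype=float)  # shape (n_samples, dim)
    return stacked.mean(axis=0).tolist()         # plain list, so it can go back into JSON

# toy 2-dim vectors standing in for the real per-sample embeddings
print(average_embedding([[1, 2], [3, 4], [5, 6], [7, 8]]))  # [4.0, 5.0]
```

Averaging gives one centroid per voice, so a search costs one distance per voice; comparing against every per-sample vector and keeping the minimum is the obvious alternative, at about 4x the work.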

but hey it has successfully shown that there's potential

So I was sort of interested in RAD-TTS.
Has anyone here tested it much? I want to know how well the current pretrained model works with singing datasets and non-American accents.
Because if it doesn't work very well, I'll train the TalkNet pretrained model I'm making on RAD-TTS too.

I think you should.
It would be super interesting for people; I don't think they'd mind if there were bugs.

i mean, the main issue is that my VPS is running CentOS 6 because i thought it'd be a good idea

does it seem to accurately reflect voices that are perceptually similar?

here, all the voices i recognize are feminine

y'know what, here's the JSON: https://drive.google.com/file/d/1-2Beep8jM3Jyq59sKd1dacxj5DjcL5aK/view

note that `embeddings` has 4 elements (one per sample audio), while `avg_embedding` is what i actually use for the distance calculation

and one of the entries has `null` for `embeddings` and `nan` for `avg_embedding`, because that voice had no sample audio
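Side note if anyone loads the file outside TinyDB: Python's `json` module writes a NaN float as the bare token `NaN`, which strict JSON parsers reject but `json.loads` accepts by default. A sketch of skipping those placeholder entries, assuming TinyDB's default on-disk layout of `{table: {doc_id: record}}`; the `raw` string is made-up toy data and `usable_records` is a hypothetical helper:

```py
import json
import math

raw = '''{"_default": {
  "1": {"name": "voice a", "embeddings": [[0.1, 0.2]], "avg_embedding": [0.1, 0.2]},
  "2": {"name": "voice b", "embeddings": null, "avg_embedding": NaN}
}}'''

def usable_records(text):
    """Yield records whose avg_embedding is a real vector, skipping null/NaN placeholders."""
    data = json.loads(text)          # Python's json accepts NaN by default
    for table in data.values():      # assumed layout: {table: {doc_id: record}}
        for record in table.values():
            avg = record.get("avg_embedding")
            if isinstance(avg, list) and not any(math.isnan(x) for x in avg):
                yield record

print([r["name"] for r in usable_records(raw)])
```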

monkaS
also google got rid of that old wacky top bar from like 2012 on this page

https://google.github.io/df-conformer/wavefit/