# machine-learning
it's much smaller than the embeddings anyway, because there are 5 of those per voice (1 per sample audio + the average) and each one is hundreds of floats stored as strings
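
rough sketch of that size math, in case it helps. The dimension of 256 and the comma-separated text format are assumptions on my part (the message only says "hundreds of floats stored as strings"); the count of 5 per voice is from the message.

```python
import random

DIM = 256                 # assumption: the message only says "hundreds of floats"
EMBEDDINGS_PER_VOICE = 5  # from the message: 1 per sample audio + the average

# Stand-in embedding with random values; real embeddings would come from the model.
rng = random.Random(0)
embedding = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]

# Stored as strings: every float becomes decimal text (comma-separated here,
# which is a hypothetical format).
as_text = ",".join(repr(x) for x in embedding)
text_bytes = len(as_text.encode("utf-8"))

# For comparison, the same values packed as float32 take 4 bytes each.
binary_bytes = DIM * 4

per_voice_text = text_bytes * EMBEDDINGS_PER_VOICE
per_voice_binary = binary_bytes * EMBEDDINGS_PER_VOICE

print(f"one embedding as text:    {text_bytes} bytes")
print(f"one embedding as float32: {binary_bytes} bytes")
print(f"per voice as text:        {per_voice_text} bytes")
print(f"per voice as float32:     {per_voice_binary} bytes")
```

the text form comes out several times larger than packed floats, which is why anything small sitting next to the embeddings barely registers.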