Problems adding a new speaker #200

JoanisTriandafilidi · 2023-12-18T16:08:37Z

Hello! Thanks for the great work!
I actively use VITS in various small personal projects, and I have an idea for adding new speakers to a multi-speaker model.

The essence of my idea is this:

1. I trained a good multi-speaker model on 200 speakers.
2. I obtained an embedding of the right size for a new speaker using Speakernet.
3. I want to add the new speaker to the existing multi-speaker model by appending the new embedding. That is, emb_g.shape was [200, 192] and becomes [201, 192]. I append the new embedding inside the utils.load_checkpoint function.
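
A minimal sketch of the append step described above, for illustration only. It assumes the speaker table is a PyTorch `nn.Embedding` stored in the checkpoint's state dict under the key `emb_g.weight`; the helper name `append_speaker` is hypothetical and is not part of the VITS code.

```python
import torch


def append_speaker(state_dict, new_embedding, key="emb_g.weight"):
    """Return a copy of state_dict with one extra row in the speaker table.

    `key` is an assumption about how the table is named in the checkpoint.
    """
    old = state_dict[key]  # expected shape: [n_speakers, emb_dim]
    # Match dtype/device of the stored table and make the vector a single row.
    row = new_embedding.detach().to(device=old.device, dtype=old.dtype).reshape(1, -1)
    if row.shape[1] != old.shape[1]:
        raise ValueError(
            f"embedding size {row.shape[1]} does not match table size {old.shape[1]}"
        )
    updated = dict(state_dict)  # shallow copy; other tensors are shared
    updated[key] = torch.cat([old, row], dim=0)  # [n_speakers + 1, emb_dim]
    return updated


# Usage with placeholder tensors standing in for the real checkpoint data.
state = {"emb_g.weight": torch.zeros(200, 192)}
new_vec = torch.ones(192)
state = append_speaker(state, new_vec)
print(state["emb_g.weight"].shape)
```

The model also has to be constructed with 201 speakers so that the enlarged table fits when the state dict is loaded; the new speaker then has ID 200.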

The model loads without problems. However, at inference, instead of the expected new (!) voice, I get one of the 200 already-trained voices. Moreover, if I feed in some other embedding, I get a different voice from those 200. So I conclude that the model can potentially generate voices for artificially added speakers, but I can't get the voice to match the target.

Could you please tell me how I can solve this problem? Why, when the model sees a new embedding, does it generate a different voice?

