Hello! Thanks for the great work!
I use VITS in several small personal projects, and I had an idea about adding new speakers to a multi-speaker model.
The essence of my idea is this:
1. I trained a good multi-speaker model on 200 speakers.
2. I extracted an embedding for a new speaker with SpeakerNet, in a compatible format (a 192-dimensional vector).
3. I want to add the new speaker to the existing multi-speaker model by appending this embedding, so that emb_g.shape goes from [200, 192] to [201, 192]. I append the new embedding inside the utils.load_checkpoint function.
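A minimal PyTorch sketch of the row-append step from item 3, assuming the generator's state dict stores the speaker table under the key `emb_g.weight` with shape [n_speakers, gin_channels]. The helper name `append_speaker_row` is hypothetical and not part of VITS. The model the result is loaded into must also be constructed with one more speaker (201 instead of 200); otherwise `load_state_dict` raises a size-mismatch error. With rows indexed from 0, the new speaker's id is 200.

```python
import torch


def append_speaker_row(state_dict, new_embedding, key="emb_g.weight"):
    """Return a copy of state_dict whose speaker table has one extra row.

    Assumption: the table lives under `key` with shape [n_speakers, dim],
    e.g. [200, 192] -> [201, 192] after this call.
    """
    table = state_dict[key]
    # Match dtype/device of the existing table and make the vector a single row.
    new_row = new_embedding.to(dtype=table.dtype, device=table.device).reshape(1, -1)
    if new_row.shape[1] != table.shape[1]:
        raise ValueError(
            f"embedding size {new_row.shape[1]} does not match table width {table.shape[1]}"
        )
    updated = dict(state_dict)  # shallow copy; the original dict is left untouched
    updated[key] = torch.cat([table, new_row], dim=0)
    return updated


if __name__ == "__main__":
    sd = {"emb_g.weight": torch.randn(200, 192)}
    new_emb = torch.randn(192)  # stand-in for the SpeakerNet vector
    sd = append_speaker_row(sd, new_emb)
    print(sd["emb_g.weight"].shape)  # torch.Size([201, 192])
```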
The model loads without problems. However, at inference time, instead of the expected new (!) voice, I get one of the 200 already-trained voices. Moreover, if I feed in a different embedding, I get a different one of those 200 voices. So I conclude that the model can, in principle, generate voices for artificially added speakers, but I can't get the voice to match the target.
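To check which trained voice the added row ends up resembling, one option is to rank the 200 trained rows by similarity to the new one. This is a sketch only: it assumes the new speaker is the last row of the `emb_g` weight matrix, and cosine similarity is just one convenient distance chosen for illustration, not something VITS itself computes. The function name `nearest_trained_speakers` is hypothetical.

```python
import torch
import torch.nn.functional as F


def nearest_trained_speakers(table, k=3):
    """Rank trained speakers by cosine similarity to the last (newly added) row.

    Assumption: rows 0..N-2 are trained speakers and row N-1 is the new one.
    Returns (speaker_ids, similarities), most similar first.
    """
    trained, new_row = table[:-1], table[-1:]  # shapes [N-1, dim] and [1, dim]
    sims = F.cosine_similarity(new_row, trained, dim=1)  # broadcasts to [N-1]
    top = torch.topk(sims, k)
    return top.indices.tolist(), top.values.tolist()


if __name__ == "__main__":
    torch.manual_seed(0)
    table = torch.randn(201, 192)
    # Build a "new" row that points in almost the same direction as speaker 5.
    table[-1] = 2.0 * table[5] + 0.01 * torch.randn(192)
    ids, sims = nearest_trained_speakers(table)
    print(ids[0])  # 5
```

If the new row is consistently closest to the voice heard at inference, that suggests the model treats it as a point near that trained speaker rather than as a new identity.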
Could you please tell me how I can solve this problem? Why does the model generate a different voice when it sees a new embedding?