JPWO2022115727A5

Info

Publication number: JPWO2022115727A5
Application number: JP2023532750A
Authority: JP
Publication date: 2024-06-20

Claims

1. A method comprising:
receiving, by a chatbot system, an utterance generated by a user interacting with the chatbot system, the utterance including text data converted from a voice input of the user;
inputting, by the chatbot system, the utterance into a machine learning model including a series of network layers, a final network layer of the series of network layers including a logit function that converts a first probability for a resolvable class into a first real number representing a first logit value and a second probability for an unresolvable class into a second real number representing a second logit value;
determining, by the machine learning model, the first probability for the resolvable class and the second probability for the unresolvable class;
mapping, by the machine learning model using the logit function, the first probability for the resolvable class to the first logit value, the logit function for mapping the first probability being a logarithm of odds corresponding to the first probability for the resolvable class, the logarithm of odds being weighted by a centroid of a distribution associated with the resolvable class;
mapping, by the machine learning model, the second probability for the unresolvable class to an enhanced logit value, the enhanced logit value being a third real number determined independently from the logit function used to map the first probability, the enhanced logit value comprising: (i) a statistical value determined based on a set of logit values generated from a training dataset; (ii) a bounded value selected from a range of values defined by a logarithm of first odds corresponding to the second probability for the unresolvable class, the logarithm of the first odds being constrained to the range of values by a bounding function and weighted by a centroid of a distribution associated with the unresolvable class; (iii) a weighted value generated by a logarithm of second odds corresponding to the second probability for the unresolvable class, the logarithm of the second odds being constrained to the range of values by the bounding function, scaled by a scaling factor, and weighted by the centroid of the distribution associated with the unresolvable class; (iv) a hyperparameter-optimized value generated based on hyperparameter tuning of the machine learning model; or (v) a learned value adjusted during training of the machine learning model; and
classifying, by the chatbot system, the utterance into the resolvable class or the unresolvable class based on the first logit value and the enhanced logit value.
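
The classification step recited above can be pictured with a minimal sketch. It is an illustration under stated assumptions, not the claimed implementation: the centroid weighting is reduced to a scalar `centroid_weight`, the decision rule is a plain comparison of the two logits, and the names `log_odds` and `classify` are hypothetical.

```python
import math


def log_odds(probability: float) -> float:
    """Logarithm of odds for a probability strictly between 0 and 1."""
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    return math.log(probability / (1.0 - probability))


def classify(p_resolvable: float, enhanced_logit: float,
             centroid_weight: float = 1.0) -> str:
    """Compare the resolvable-class logit with a fixed enhanced logit.

    centroid_weight is a scalar stand-in (an assumption) for the centroid
    weighting recited in the claim.
    """
    first_logit = centroid_weight * log_odds(p_resolvable)
    # The enhanced logit is supplied from outside; it is not computed
    # from the unresolvable-class probability by the logit function.
    return "resolvable" if first_logit > enhanced_logit else "unresolvable"
```

In this sketch a confident utterance (`p_resolvable = 0.9`) clears an enhanced logit of 0.5, while a borderline one (`p_resolvable = 0.55`) does not and falls into the unresolvable class.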

2. The method of claim 1, further comprising responding, by the chatbot system, to the user based on the classification of the utterance into the resolvable class or the unresolvable class.

3. The method of claim 1 or claim 2, wherein the resolvable class is an in-domain and in-scope skill or intent, and the unresolvable class is an out-of-domain or out-of-scope skill or intent.

4. The method of any one of claims 1 to 3, wherein the enhanced logit value is the statistical value determined based on the set of logit values generated from the training dataset, and determining the statistical value comprises:
accessing a subset of the training dataset, the subset of the training dataset comprising a subset of utterances, each utterance of the subset of utterances being associated with the unresolvable class;
generating a set of training logit values, each training logit value of the set of training logit values being generated by applying the machine learning model to a respective utterance of the subset of utterances;
determining the statistical value, the statistical value representing the set of training logit values; and
setting the statistical value as the enhanced logit value.

5. The method of claim 4, wherein the statistical value is the median of the set of training logit values.

6. The method of claim 4, wherein the statistical value is the average of the set of training logit values.
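
The statistical-value option recited above (median or average of the training logit values) can be sketched as follows. This is an illustrative sketch only: `model_logit` is a hypothetical callable standing in for applying the machine learning model to one utterance, and the function name is an assumption.

```python
from statistics import mean, median
from typing import Callable, Sequence


def enhanced_logit_from_statistic(
    unresolvable_utterances: Sequence[str],
    model_logit: Callable[[str], float],
    statistic: str = "median",
) -> float:
    """Summarize training logits of unresolvable utterances as one value."""
    if not unresolvable_utterances:
        raise ValueError("at least one unresolvable utterance is required")
    # One training logit per utterance labeled with the unresolvable class.
    training_logits = [model_logit(u) for u in unresolvable_utterances]
    if statistic == "median":
        return median(training_logits)
    if statistic == "mean":
        return mean(training_logits)
    raise ValueError(f"unsupported statistic: {statistic}")
```

The returned value would then be set as the enhanced logit value; the median is less sensitive than the average to a few extreme training logits.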

7. The method of any one of claims 1 to 3, wherein the enhanced logit value is the bounded value, and the logit function is constrained to the range of values by the bounding function.

8. The method of any one of claims 1 to 3, wherein the enhanced logit value is the weighted value, and the logit function is constrained to the range of values by the bounding function and scaled by the scaling factor.

9. The method of claim 8, wherein a first value is assigned to the scaling factor of the logit function and a second value is assigned to the scaling factor of the logarithm of the second odds corresponding to the second probability for the unresolvable class, the second value being greater than the first value.
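
The bounded-value and weighted-value options recited above can be illustrated with a small sketch. The claims do not name a particular bounding function, so simple clipping to `[lower, upper]` is used here as an assumption; the scalar `centroid_weight` and all function names are likewise hypothetical.

```python
import math


def bounded_log_odds(probability: float, lower: float = -5.0,
                     upper: float = 5.0) -> float:
    """Log-odds constrained to [lower, upper] by clipping (an assumption)."""
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    raw = math.log(probability / (1.0 - probability))
    return max(lower, min(upper, raw))


def weighted_value(probability: float, scaling_factor: float,
                   centroid_weight: float, lower: float = -5.0,
                   upper: float = 5.0) -> float:
    """Bounded log-odds, scaled by a factor and weighted by a centroid term."""
    return scaling_factor * centroid_weight * bounded_log_odds(
        probability, lower, upper)
```

Assigning a larger scaling factor to the unresolvable class than to the resolvable class, as recited, raises the unresolvable logit relative to the resolvable one for the same positive bounded log-odds.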

10. The method, wherein the enhanced logit value is the hyperparameter-optimized value, and determining the hyperparameter-optimized value comprises:
accessing a subset of the training dataset, the subset of the training dataset comprising a subset of utterances, each utterance of the subset of utterances being associated with the unresolvable class;
generating a set of training logit values, each training logit value in the set of training logit values being generated by applying the machine learning model to a respective utterance in the subset of utterances;
determining the statistical value, the statistical value representing the set of training logit values;
tuning one or more hyperparameters of the machine learning model to generate an optimized statistical value; and
setting the optimized statistical value as the enhanced logit value.
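
One way to picture the hyperparameter-tuning option is a plain grid search, sketched below. The claim does not specify a search strategy, a statistic, or a scoring criterion, so the grid, the median, and the hypothetical callables `logits_for` (training logits of unresolvable utterances under a given hyperparameter setting) and `score_for` (a validation score, higher is better) are all assumptions.

```python
from statistics import median
from typing import Callable, Iterable, Sequence, Tuple


def tune_enhanced_logit(
    candidates: Iterable[float],
    logits_for: Callable[[float], Sequence[float]],
    score_for: Callable[[float, float], float],
) -> Tuple[float, float]:
    """Return (best hyperparameter, optimized statistical value)."""
    best = None
    for hyperparameter in candidates:
        # Statistical value representing the training logits for this setting.
        statistic = median(logits_for(hyperparameter))
        score = score_for(hyperparameter, statistic)
        if best is None or score > best[0]:
            best = (score, hyperparameter, statistic)
    if best is None:
        raise ValueError("no candidate hyperparameters were supplied")
    return best[1], best[2]
```

The second element of the returned pair is the optimized statistical value that would be set as the enhanced logit value.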

11. A system comprising:
one or more data processors; and
a non-transitory computer-readable storage medium containing instructions that, when executed on the one or more data processors, cause the one or more data processors to perform operations including:
receiving an utterance generated by a user interacting with the system, the utterance including text data converted from a voice input of the user;
inputting the utterance into a machine learning model including a series of network layers, a final network layer of the series of network layers including a logit function that converts a first probability for a resolvable class into a first real number representing a first logit value and a second probability for an unresolvable class into a second real number representing a second logit value;
determining, by the machine learning model, the first probability for the resolvable class and the second probability for the unresolvable class;
mapping, by the machine learning model using the logit function, the first probability for the resolvable class to the first logit value, the logit function for mapping the first probability being a logarithm of odds corresponding to the first probability for the resolvable class, the logarithm of odds being weighted by a centroid of a distribution associated with the resolvable class;
mapping, by the machine learning model, the second probability for the unresolvable class to an enhanced logit value, the enhanced logit value being a third real number determined independently from the logit function used to map the first probability, the enhanced logit value comprising: (i) a statistical value determined based on a set of logit values generated from a training dataset; (ii) a bounded value selected from a range of values defined by a logarithm of first odds corresponding to the second probability for the unresolvable class, the logarithm of the first odds being constrained to the range of values by a bounding function and weighted by a centroid of a distribution associated with the unresolvable class; (iii) a weighted value generated by a logarithm of second odds corresponding to the second probability for the unresolvable class, the logarithm of the second odds being constrained to the range of values by the bounding function, scaled by a scaling factor, and weighted by the centroid of the distribution associated with the unresolvable class; (iv) a hyperparameter-optimized value generated based on hyperparameter tuning of the machine learning model; or (v) a learned value adjusted during training of the machine learning model; and
classifying the utterance into the resolvable class or the unresolvable class based on the first logit value and the enhanced logit value.

12. The system of claim 11, wherein the instructions further cause the one or more data processors to perform operations including responding to the user based on the classification of the utterance into the resolvable class or the unresolvable class.

13. The system of claim 11 or 12, wherein the resolvable class is an in-domain and in-scope skill or intent, and the unresolvable class is an out-of-domain or out-of-scope skill or intent.

14. The system of any one of claims 11 to 13, wherein the enhanced logit value is the statistical value determined based on the set of logit values generated from the training dataset, and determining the statistical value comprises:
accessing a subset of the training dataset, the subset of the training dataset comprising a subset of utterances, each utterance of the subset of utterances being associated with the unresolvable class;
generating a set of training logit values, each training logit value of the set of training logit values being generated by applying the machine learning model to a respective utterance of the subset of utterances;
determining the statistical value, the statistical value representing the set of training logit values; and
setting the statistical value as the enhanced logit value.

15. The system of claim 14, wherein the statistical value is the median of the set of training logit values.

16. The system of claim 14, wherein the statistical value is the average of the set of training logit values.

17. The system of any one of claims 11 to 13, wherein the enhanced logit value is the bounded value, and the logit function is constrained to the range of values by the bounding function.

18. The system of any one of claims 11 to 13, wherein the enhanced logit value is the weighted value, and the logit function is constrained to the range of values by the bounding function and scaled by the scaling factor.

19. The system of claim 18, wherein a first value is assigned to the scaling factor of the logit function and a second value is assigned to the scaling factor of the logarithm of the second odds corresponding to the second probability for the unresolvable class, the second value being greater than the first value.

20. The system, wherein the enhanced logit value is the hyperparameter-optimized value, and determining the hyperparameter-optimized value comprises:
accessing a subset of the training dataset, the subset of the training dataset comprising a subset of utterances, each utterance of the subset of utterances being associated with the unresolvable class;
generating a set of training logit values, each training logit value in the set of training logit values being generated by applying the machine learning model to a respective utterance in the subset of utterances;
determining the statistical value, the statistical value representing the set of training logit values;
tuning one or more hyperparameters of the machine learning model to generate an optimized statistical value; and
setting the optimized statistical value as the enhanced logit value.

21. A program for causing a computer to execute the method according to any one of claims 1 to 10.

22. A method comprising:
receiving, by a training subsystem, a training dataset, the training dataset including a plurality of utterances generated by a user interacting with a chatbot system, at least one utterance of the plurality of utterances including text data converted from a voice input of the user;
accessing, by the training subsystem, a machine learning model including a series of network layers, a final network layer of the series of network layers including a logit function that converts a first probability for a resolvable class into a first real number representing a first logit value and a second probability for an unresolvable class into a second real number representing a second logit value;
training, by the training subsystem, the machine learning model with the training dataset such that the machine learning model:
determines the first probability for the resolvable class and the second probability for the unresolvable class; and
uses the logit function to map the first probability for the resolvable class to the first logit value, the logit function for mapping the first probability being a logarithm of the odds corresponding to the first probability for the resolvable class, the logarithm of the odds being weighted by a centroid of a distribution associated with the resolvable class;
replacing, by the training subsystem, the logit function with an enhanced logit value such that the second probability for the unresolvable class is mapped to the enhanced logit value,
wherein the enhanced logit value is a third real number determined independently from the logit function used to map the first probability,
the enhanced logit value comprising: (i) a statistical value determined based on a set of logit values generated from the training dataset; (ii) a bounded value selected from a range of values defined by the logarithm of first odds corresponding to the second probability for the unresolvable class, the logarithm of the first odds being constrained to the range of values by a bounding function and weighted by a centroid of a distribution associated with the unresolvable class; (iii) a weighted value generated by the logarithm of second odds corresponding to the second probability for the unresolvable class, the logarithm of the second odds being constrained to the range of values by the bounding function, scaled by a scaling factor, and weighted by the centroid of the distribution associated with the unresolvable class; (iv) a hyperparameter-optimized value generated based on hyperparameter tuning of the machine learning model; or (v) a learned value adjusted during training of the machine learning model; and
deploying, by the training subsystem, the trained machine learning model with the enhanced logit value.
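
Option (v), a learned value adjusted during training, can be illustrated with a toy sketch. It assumes a two-class softmax in which the resolvable logit `z` is already fixed and only a scalar enhanced logit `e` is learned by gradient descent on cross-entropy loss; the claim does not prescribe this loss, optimizer, or the name `learn_enhanced_logit`.

```python
import math
from typing import Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def learn_enhanced_logit(
    examples: Sequence[Tuple[float, int]],
    learning_rate: float = 0.5,
    epochs: int = 500,
    initial_value: float = 0.0,
) -> float:
    """Learn a scalar enhanced logit from (resolvable_logit, label) pairs.

    label is 1 when the utterance is unresolvable and 0 otherwise.
    """
    if not examples:
        raise ValueError("at least one training example is required")
    e = initial_value
    for _ in range(epochs):
        gradient = 0.0
        for z, label in examples:
            # Two-class softmax over (z, e) gives P(unresolvable) = sigmoid(e - z).
            p_unresolvable = sigmoid(e - z)
            # Derivative of the cross-entropy loss with respect to e.
            gradient += p_unresolvable - label
        e -= learning_rate * gradient / len(examples)
    return e
```

With resolvable logits of -1.0 and 0.0 labeled unresolvable and 3.0 and 4.0 labeled resolvable, the learned value settles near 1.5, between the two groups.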

23. The method of claim 22, further comprising:
generating an augmented training dataset from the training dataset, wherein generating the augmented training dataset includes transforming one or more copies of a particular utterance of the plurality of utterances, the particular utterance being associated with a training label that identifies the particular utterance as being associated with the unresolvable class; and
training the machine learning model using the augmented training dataset.

24. The method of claim 23, wherein transforming the one or more copies of the particular utterance comprises performing one or more of the following: (i) a reverse transformation of the one or more copies of the particular utterance; (ii) a synonym substitution of one or more tokens of the one or more copies of the particular utterance; (iii) a random insertion of a token into the one or more copies of the particular utterance; (iv) a swap between two tokens of the one or more copies of the particular utterance; or (v) a random deletion of one or more tokens of the one or more copies of the particular utterance.
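
Transformations (ii) to (v) listed above can be sketched with whitespace tokenization. This is a minimal illustration under assumptions: the synonym table and insertion vocabulary are hypothetical inputs supplied by the caller, all function names are made up for the example, and transformation (i) is omitted because it would need an external resource.

```python
import random
from typing import Dict, List


def substitute_synonyms(tokens: List[str], synonyms: Dict[str, str]) -> List[str]:
    """Replace each token that has an entry in the synonym table."""
    return [synonyms.get(token, token) for token in tokens]


def insert_token(tokens: List[str], vocabulary: List[str],
                 rng: random.Random) -> List[str]:
    """Insert one token from a non-empty vocabulary at a random position."""
    position = rng.randrange(len(tokens) + 1)
    return tokens[:position] + [rng.choice(vocabulary)] + tokens[position:]


def swap_tokens(tokens: List[str], rng: random.Random) -> List[str]:
    """Swap two randomly chosen tokens; short inputs are copied unchanged."""
    result = list(tokens)
    if len(result) >= 2:
        i, j = rng.sample(range(len(result)), 2)
        result[i], result[j] = result[j], result[i]
    return result


def delete_token(tokens: List[str], rng: random.Random) -> List[str]:
    """Delete one random token, keeping at least one token."""
    if len(tokens) < 2:
        return list(tokens)
    position = rng.randrange(len(tokens))
    return tokens[:position] + tokens[position + 1:]


def augment(utterance: str, copies: int, synonyms: Dict[str, str],
            vocabulary: List[str], seed: int = 0) -> List[str]:
    """Produce transformed copies of one unresolvable-class utterance."""
    rng = random.Random(seed)
    tokens = utterance.split()
    transforms = [
        lambda t: substitute_synonyms(t, synonyms),
        lambda t: insert_token(t, vocabulary, rng),
        lambda t: swap_tokens(t, rng),
        lambda t: delete_token(t, rng),
    ]
    # Cycle through the transformations, one per copy.
    return [" ".join(transforms[n % len(transforms)](tokens))
            for n in range(copies)]
```

Each transformed copy would keep the training label of the original utterance, so the augmented dataset contains more examples of the unresolvable class.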

25. The method of any one of claims 22 to 24, wherein the enhanced logit value is the statistical value determined based on the set of logit values generated from the training dataset, and training the machine learning model further comprises:
accessing a subset of the training dataset, the subset of the training dataset comprising a subset of the plurality of utterances, each utterance of the subset of utterances being associated with the unresolvable class;
generating a set of training logit values, each training logit value in the set of training logit values being generated by applying the machine learning model to a respective utterance in the subset of utterances;
determining the statistical value, the statistical value representing the set of training logit values; and
setting the statistical value as the enhanced logit value.

26. The method of claim 25, wherein the statistical value is the median of the set of training logit values.

27. The method of claim 25, wherein the statistical value is the average of the set of training logit values.

28. The method of any one of claims 22 to 24, wherein the enhanced logit value is the bounded value, and the logit function is constrained to the range of values by the bounding function.

29. The method of claim 22, wherein the enhanced logit value is the weighted value, and the logit function is constrained to the range of values by the bounding function and scaled by the scaling factor.

30. The method of claim 29, wherein a first value is assigned to the scaling factor of the logit function and a second value is assigned to the scaling factor of the logarithm of the second odds corresponding to the second probability for the unresolvable class, the second value being greater than the first value.

31. The method of claim 29, wherein training the machine learning model further comprises adjusting the scaling factor for the unresolvable class.

32. The method, wherein the enhanced logit value is the hyperparameter-optimized value, and training the machine learning model further comprises:
accessing a subset of the training dataset, the subset of the training dataset comprising a subset of utterances, each utterance of the subset of utterances being associated with the unresolvable class;
generating a set of training logit values, each training logit value in the set of training logit values being generated by applying the machine learning model to a respective utterance in the subset of utterances;
determining the statistical value, the statistical value representing the set of training logit values;
tuning one or more hyperparameters of the machine learning model to generate an optimized statistical value; and
setting the optimized statistical value as the enhanced logit value.

33. A system comprising:
one or more data processors; and
a non-transitory computer-readable storage medium containing instructions that, when executed on the one or more data processors, cause the one or more data processors to perform operations including:
receiving a training dataset, the training dataset including a plurality of utterances generated by a user interacting with a chatbot system, at least one utterance of the plurality of utterances including text data converted from a voice input of the user;
accessing a machine learning model including a series of network layers, a final network layer of the series of network layers including a logit function that converts a first probability for a resolvable class into a first real number representing a first logit value and a second probability for an unresolvable class into a second real number representing a second logit value;
training the machine learning model with the training dataset such that the machine learning model:
determines the first probability for the resolvable class and the second probability for the unresolvable class; and
uses the logit function to map the first probability for the resolvable class to the first logit value, the logit function for mapping the first probability being a logarithm of the odds corresponding to the first probability for the resolvable class, the logarithm of the odds being weighted by a centroid of a distribution associated with the resolvable class;
replacing the logit function with an enhanced logit value such that the second probability for the unresolvable class is mapped to the enhanced logit value,
wherein the enhanced logit value is a third real number determined independently from the logit function used to map the first probability,
the enhanced logit value comprising: (i) a statistical value determined based on a set of logit values generated from the training dataset; (ii) a bounded value selected from a range of values defined by the logarithm of first odds corresponding to the second probability for the unresolvable class, the logarithm of the first odds being constrained to the range of values by a bounding function and weighted by a centroid of a distribution associated with the unresolvable class; (iii) a weighted value generated by the logarithm of second odds corresponding to the second probability for the unresolvable class, the logarithm of the second odds being constrained to the range of values by the bounding function, scaled by a scaling factor, and weighted by the centroid of the distribution associated with the unresolvable class; (iv) a hyperparameter-optimized value generated based on hyperparameter tuning of the machine learning model; or (v) a learned value adjusted during training of the machine learning model; and
deploying the trained machine learning model with the enhanced logit value.

34. The system of claim 33, wherein the instructions further cause the one or more data processors to perform operations including:
generating an augmented training dataset from the training dataset, wherein generating the augmented training dataset includes transforming one or more copies of a particular utterance of the plurality of utterances, the particular utterance being associated with a training label that identifies the particular utterance as being associated with the unresolvable class; and
training the machine learning model using the augmented training dataset.

35. The system of claim 34, wherein transforming the one or more copies of the particular utterance includes performing one or more of the following: (i) a reverse transformation of the one or more copies of the particular utterance; (ii) a synonym substitution of one or more tokens of the one or more copies of the particular utterance; (iii) a random insertion of a token into the one or more copies of the particular utterance; (iv) a swap between two tokens of the one or more copies of the particular utterance; or (v) a random deletion of one or more tokens of the one or more copies of the particular utterance.

36. The system of any one of claims 33 to 35, wherein the enhanced logit value is the statistical value determined based on the set of logit values generated from the training dataset, and training the machine learning model further comprises:
accessing a subset of the training dataset, the subset of the training dataset comprising a subset of the plurality of utterances, each utterance of the subset of utterances being associated with the unresolvable class;
generating a set of training logit values, each training logit value in the set of training logit values being generated by applying the machine learning model to a respective utterance in the subset of utterances;
determining the statistical value, the statistical value representing the set of training logit values; and
setting the statistical value as the enhanced logit value.

37. The system of claim 36, wherein the statistical value is the median of the set of training logit values.

38. The system of claim 36, wherein the statistical value is the average of the set of training logit values.

39. The system of any one of claims 33 to 35, wherein the enhanced logit value is the bounded value, and the logit function is constrained to the range of values by the bounding function.

40. The system of claim 33, wherein the enhanced logit value is the weighted value, and the logit function is constrained to the range of values by the bounding function and scaled by the scaling factor.

41. The system of claim 40, wherein a first value is assigned to the scaling factor of the logit function and a second value is assigned to the scaling factor of the logarithm of the second odds corresponding to the second probability for the unresolvable class, the second value being greater than the first value.

42. The system of claim 40, wherein training the machine learning model further comprises adjusting the scaling factor for the unresolvable class.

43. The system, wherein the enhanced logit value is the hyperparameter-optimized value, and training the machine learning model further comprises:
accessing a subset of the training dataset, the subset of the training dataset comprising a subset of utterances, each utterance of the subset of utterances being associated with the unresolvable class;
generating a set of training logit values, each training logit value in the set of training logit values being generated by applying the machine learning model to a respective utterance in the subset of utterances;
determining the statistical value, the statistical value representing the set of training logit values;
tuning one or more hyperparameters of the machine learning model to generate an optimized statistical value; and
setting the optimized statistical value as the enhanced logit value.

44. A program for causing a computer to execute the method according to any one of claims 22 to 32.