
CN116933049A - Feature selection method, device, electronic equipment and storage medium - Google Patents

Feature selection method, device, electronic equipment and storage medium

Info

Publication number
CN116933049A
Authority
CN
China
Prior art keywords
feature
features
weight
result
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210332002.5A
Other languages
Chinese (zh)
Inventor
赵翔宇
王叶晶
徐童
吴贤
王巨宏
张猛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202210332002.5A priority Critical patent/CN116933049A/en
Publication of CN116933049A publication Critical patent/CN116933049A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application provides a feature selection method and device, an electronic device, a computer-readable storage medium and a computer program product, which can be applied to in-vehicle scenarios. The method includes: calling a feature selection model to read, from a controller model, weights corresponding to a plurality of features of a training sample, and correspondingly fusing the weights into each feature; based on each feature fused with its weight, calling a feature information mining model to perform a test task to obtain a test result; performing back-propagation processing based on the difference between the test result and the labeled result of the training sample, so as to update the feature information mining model and the controller model; and calling the feature selection model to read the updated weights from the controller model, and screening out, based on the updated weights, some of the plurality of features for combination to obtain a combined feature. According to the present application, combined features for a model to perform prediction tasks can be accurately and efficiently screened out of a large-scale data set.

Description

Feature selection method, device, electronic device and storage medium

Technical Field

The present application relates to the field of artificial intelligence technology, and in particular to a feature selection method, device, electronic device and computer-readable storage medium.

Background Art

Artificial Intelligence (AI) is a comprehensive technology of computer science: by studying the design principles and implementation methods of various intelligent machines, it endows machines with the functions of perception, reasoning and decision-making. AI is a broad discipline spanning many fields, such as natural language processing and machine learning/deep learning. As the technology develops, AI will be applied in ever more fields and play an increasingly important role.

When training machine learning models, the feature selection schemes provided by the related art tend to select, for the training samples, features that are only suitable for small-scale data sets; that is, for large-scale data sets, neither the efficiency of feature selection nor the prediction accuracy of the model can be guaranteed.

Summary of the Invention

The embodiments of the present application provide a feature selection method, device, electronic device, computer-readable storage medium and computer program product, which can accurately and efficiently screen out, from a large-scale data set, combined features for a model to perform prediction tasks.

The technical solutions of the embodiments of the present application are implemented as follows:

An embodiment of the present application provides a feature selection method, including:

calling a feature selection model to read, from a controller model, weights respectively corresponding to a plurality of features of a training sample, and correspondingly fusing the weights into each of the features;

based on each of the features fused with its weight, calling a feature information mining model to perform a test task to obtain a test result;

performing back-propagation processing based on the difference between the test result and the labeled result of the training sample, so as to update the feature information mining model and the controller model;

calling the feature selection model to read the updated weights from the controller model, and screening out, based on the updated weights, some of the plurality of features for combination to obtain a combined feature;

wherein the combined feature is used by the updated feature information mining model to perform a prediction task.
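By way of a non-limiting illustration only (the following sketch is not part of the disclosure; the linear models, learning rate, number of iterations and all variable names are assumptions introduced for clarity), the claimed procedure of fusing controller weights into the features, back-propagating the test error through both models, and then screening the highest-weight features can be written as:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, top_k = 8, 3
X = rng.normal(size=(64, n_features))            # one feature vector per training sample
y = 1.5 * X[:, 0] - 2.0 * X[:, 3]                # labeled result depends only on features 0 and 3

controller_w = np.ones(n_features)               # per-feature weights held by the controller model
mining_w = np.zeros(n_features)                  # parameters of the feature information mining model
lr = 0.05

for _ in range(300):
    fused = X * controller_w                     # fuse each weight into its corresponding feature
    err = fused @ mining_w - y                   # test result vs. labeled result
    # back-propagation updates BOTH the mining model and the controller model
    grad_mining = fused.T @ err / len(X)
    grad_controller = (X * mining_w).T @ err / len(X)
    mining_w -= lr * grad_mining
    controller_w -= lr * grad_controller

# read the updated weights and keep the top_k highest-magnitude features for combination
selected = sorted(np.argsort(-np.abs(controller_w))[:top_k].tolist())
print(selected)
```

In this sketch, the features the label actually depends on (indices 0 and 3) end up with the largest controller weights, so they survive the screening step.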

An embodiment of the present application provides a feature selection device, including:

a calling module, configured to call a feature selection model to read, from a controller model, weights respectively corresponding to a plurality of features of a training sample;

a fusion module, configured to correspondingly fuse the weights into each of the features;

the calling module being further configured to, based on each of the features fused with its weight, call a feature information mining model to perform a test task to obtain a test result;

an updating module, configured to perform back-propagation processing based on the difference between the test result and the labeled result of the training sample, so as to update the feature information mining model and the controller model;

the calling module being further configured to call the feature selection model to read the updated weights from the controller model;

a combination module, configured to screen out, based on the updated weights, some of the plurality of features for combination to obtain a combined feature;

wherein the combined feature is used by the updated feature information mining model to perform a prediction task.

An embodiment of the present application provides an electronic device, including:

a memory, configured to store executable instructions;

a processor, configured to implement the feature selection method provided in the embodiments of the present application when executing the executable instructions stored in the memory.

An embodiment of the present application provides a computer-readable storage medium storing executable instructions which, when executed by a processor, implement the feature selection method provided in the embodiments of the present application.

An embodiment of the present application provides a computer program product, including a computer program or instructions which, when executed by a processor, implement the feature selection method provided in the embodiments of the present application.

The embodiments of the present application have the following beneficial effects:

Taking the weights of the plurality of features of the training sample held in the controller model as guidance, the feature selection model first determines the features fused with their weights. The feature information mining model then performs the test task based on these weighted features, which in turn optimizes the weights in the controller model; finally, a subset of optimal features is screened out and combined for the feature information mining model to perform subsequent prediction tasks. Compared with the existing technology, this avoids a heavy demand for expert knowledge and manpower, adapts to training sets of different scales, and, while reducing computing requirements, screens out the optimal combined features for the updated feature information mining model to perform prediction tasks, thereby ensuring prediction accuracy.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram of the architecture of a feature selection system 100 provided in an embodiment of the present application;

FIG. 2 is a schematic diagram of the structure of a server 200 provided in an embodiment of the present application;

FIG. 3 is a schematic diagram of the principle of a feature selection method provided in an embodiment of the present application;

FIG. 4 is a schematic flowchart of a feature selection method provided in an embodiment of the present application;

FIG. 5A is a schematic flowchart of a feature selection method provided in an embodiment of the present application;

FIG. 5B is a schematic flowchart of a feature selection method provided in an embodiment of the present application;

FIG. 6 is a schematic diagram of the principle of a feature selection method provided in an embodiment of the present application;

FIG. 7A is a schematic diagram of the controller network in the initialization phase provided in an embodiment of the present application;

FIG. 7B is a schematic diagram of the controller network in the search phase provided in an embodiment of the present application;

FIG. 7C is a schematic diagram of the controller network in the retraining phase provided in an embodiment of the present application;

FIG. 8 is a schematic diagram of the structure of a feature information mining model provided in an embodiment of the present application.

DETAILED DESCRIPTION

In order to make the purpose, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. The described embodiments should not be regarded as limiting the present application; all other embodiments obtained by a person of ordinary skill in the art without creative work fall within the scope of protection of the present application.

In the following description, reference is made to "some embodiments", which describe a subset of all possible embodiments. It will be understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other where there is no conflict.

It is understandable that the embodiments of the present application involve data related to the user's gender, age, preferences and the like. When the embodiments of the present application are applied to specific products or technologies, the user's permission or consent is required, and the collection, use and processing of the relevant data must comply with the relevant laws, regulations and standards of the relevant countries and regions.

In the following description, the terms "first\second\..." are merely used to distinguish similar objects and do not represent a specific ordering of the objects. It can be understood that, where permitted, "first\second\..." may be interchanged in a specific order or sequence, so that the embodiments of the present application described herein can be implemented in an order other than that illustrated or described herein.

In the following description, the term "plurality" means at least two.

Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the art to which this application belongs. The terms used herein are only for the purpose of describing the embodiments of this application and are not intended to limit this application.

Before the embodiments of the present application are described in further detail, the nouns and terms involved in the embodiments of the present application are explained; they are subject to the following interpretations.

1) Features: include numerical features and categorical features. Numerical data has actual measurement meaning, such as a person's height, weight or blood pressure, or is a count, such as the number of times a website is visited or the number of times an item is purchased. Categorical data can represent quantities such as a person's gender, marital status, hometown, or the types of movies they like. The values of categorical data may be numeric (for example, "1" represents male and "0" represents female), but the numbers have no mathematical meaning and cannot participate in mathematical operations. Categorical features can be extracted directly from the raw data, or obtained by discretizing numerical features.
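As a concrete illustration (not part of the disclosure; the sample values, bin edges and NumPy calls are assumptions chosen by the editor), discretizing a numerical feature into a categorical one, and one-hot encoding categorical codes so they are never used arithmetically, might look like:

```python
import numpy as np

ages = np.array([15, 22, 37, 64])                  # numerical feature with measurement meaning
genders = np.array([1, 0, 1, 1])                   # categorical codes: 1 = male, 0 = female

# Discretizing a numerical feature yields a categorical feature (age bands)
age_band = np.digitize(ages, bins=[18, 30, 50])    # 0: <18, 1: 18-29, 2: 30-49, 3: 50+

# Categorical codes carry no mathematical meaning, so one-hot encode them
gender_onehot = np.eye(2)[genders]                 # shape (4, 2), one row per sample

print(age_band.tolist())
print(gender_onehot.tolist())
```

The one-hot rows can then be concatenated with (scaled) numerical features to form a model input.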

2) Feature selection: also known as attribute selection or variable selection, refers to selecting only some effective features from all available features and excluding invalid or harmful ones. Feature selection is very important for machine learning applications: it can simplify the model, improve performance and generality, and reduce the risk of overfitting.

3) Combined features: synthetic features formed by combining two or more individual features (for example, by multiplying them or taking their Cartesian product), such as a combination of user and category features (for example, on an e-commerce platform, the number of clicks a user makes under different categories), a combination of user age bands and category features (for example, on a news platform, teenage users click more on entertainment news, while young-adult users click more on social news), and a combination of user identity and time features (for example, on weekends, students behave more actively in a certain game client).
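A minimal sketch of a Cartesian-product feature cross (illustrative only; the age bands and category names are invented for this example and do not come from the disclosure):

```python
from itertools import product

age_bands = ["teen", "young_adult"]                # one categorical feature
categories = ["entertainment", "society"]          # another categorical feature

# The Cartesian product of the two value sets yields the combined feature's vocabulary
crossed = [f"{a}_x_{c}" for a, c in product(age_bands, categories)]
print(crossed)
```

Each crossed value (e.g. `teen_x_entertainment`) then acts as one category of the new combined feature.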

4) Multi-Layer Perceptron (MLP): also known as an artificial neural network; in addition to the input layer and the output layer, it may contain multiple hidden layers in between, and is used to map multiple input data sets to a single output data set.
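A minimal forward pass of such an MLP can be sketched as follows (illustrative only; the layer sizes, ReLU activation and random weights are assumptions, not part of the disclosure):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)   # input layer -> hidden layer
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # hidden layer -> single output

def mlp(x):
    # maps a batch of 4-dimensional inputs to a single output each
    return relu(x @ W1 + b1) @ W2 + b2

out = mlp(np.ones((3, 4)))
print(out.shape)
```

Here a batch of three 4-dimensional inputs is mapped to three scalar outputs, matching the "multiple inputs to a single output" description above.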

Feature selection is an important task in the training phase of machine learning models applied to various scenarios. Taking the information recommendation scenario as an example, the quality of features can significantly affect the performance of a recommendation system. In the related art, manually selecting features for combination using expert knowledge is in fact a trial-and-error method: in a recommendation system, even experts cannot predict whether the combined features obtained from the selected features will achieve a good model effect. Once the result falls short of expectations, the user needs to invest a large amount of manpower again to analyze the causes and make a new selection. This method inevitably requires expert knowledge and a large amount of labor, which is difficult for many users of recommendation systems to afford.

In addition, the search-algorithm-based solutions to the feature selection problem provided by the related art also consume a large amount of computing resources. Even with a carefully designed search algorithm, each search still requires a complete training of the model in order to evaluate how the combined features obtained from the selected features perform in the recommendation system, which brings a large computing demand.

Furthermore, the feature selection methods designed according to statistical theory in the related art, whether wrapper, filter or embedded methods, are not designed for large-scale data sets. Simple methods, such as filtering by chi-square score, can be used on large-scale data sets, but their accuracy is affected by the data scale and the limitations of the algorithm itself, so they cannot give good feature selection results. Complex methods, such as the Fast Correlation-Based Filter (FCBF) and Minimum Redundancy Maximum Relevance (MRMR), cannot give feature selection results within a limited time when applied to large-scale data sets, due to the complexity of the algorithms.
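For concreteness, the chi-square score mentioned above can be computed from a contingency table as in the following sketch (illustrative only; the data is invented, and a real filter method would also handle multi-valued features and zero expected counts):

```python
import numpy as np

def chi2_score(feature, label):
    """Chi-square statistic of a binary categorical feature against a binary label."""
    table = np.zeros((2, 2))
    for f, l in zip(feature, label):
        table[f, l] += 1                                   # observed counts
    expected = np.outer(table.sum(1), table.sum(0)) / table.sum()
    return ((table - expected) ** 2 / expected).sum()

# feature_a tracks the label perfectly; feature_b is independent of it
label     = np.array([0, 0, 1, 1, 0, 0, 1, 1])
feature_a = np.array([0, 0, 1, 1, 0, 0, 1, 1])
feature_b = np.array([0, 1, 0, 1, 0, 1, 0, 1])

print(chi2_score(feature_a, label), chi2_score(feature_b, label))
```

A filter method would rank features by this score and keep the top-scoring ones; the limitation noted above is that such a ranking ignores interactions between features.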

Finally, the reinforcement-learning-based feature selection methods provided by the related art have similar problems. For example, in order to compute the "reward", the model inevitably has to be trained to convergence; moreover, these reinforcement learning feature selection methods are designed for small-scale data sets, and when facing large-scale data sets, both the efficiency of feature selection and the prediction accuracy of the model are greatly reduced.

In summary, the feature selection schemes of the related art are not suited to large-scale data sets and have low feature selection efficiency, resulting in a large consumption of computing resources; at the same time, the selected feature combinations are not accurate enough, which affects the prediction accuracy of the model.

In view of this, the embodiments of the present application provide a feature selection method, device, electronic device, computer-readable storage medium and computer program product, which can accurately and efficiently determine, from a large-scale data set, combined features for a model to perform prediction tasks. Exemplary applications of the electronic device provided in the embodiments of the present application are described below; the electronic device may be implemented as a server, or implemented jointly by a server and a terminal device.

By way of example, see FIG. 1, which is a schematic diagram of the architecture of a feature selection system 100 provided in an embodiment of the present application. To support an application that can accurately and efficiently screen out, from a large-scale data set, combined features for a model to perform prediction tasks, a terminal device 400 is connected to a server 200 via a network 300, where the network 300 may be a wide area network, a local area network, or a combination of the two.

The models involved in the embodiments of the present application include a controller model, a feature selection model and a feature information mining model. These models may all be run by the server 200, or run jointly by the server 200 and the terminal device 400, as described in detail below.

In some embodiments, the feature selection method provided in the embodiments of the present application may be implemented by the server alone, that is, the server runs all of the above models. For example, the server 200 may call the feature selection model to read, from the controller model, the weights respectively corresponding to the plurality of features of a training sample, and determine each feature fused with its weight, where the training sample may be obtained by the server 200 from a database (not shown in FIG. 1). The server 200 may then, based on each feature fused with its weight, call the feature information mining model to perform the test task and obtain a test result. Subsequently, the server 200 may perform back-propagation processing based on the difference between the test result and the labeled result of the training sample, so as to update the feature information mining model and the controller model. Finally, the server 200 may call the feature selection model to read the updated weights from the controller model, and screen out, based on the updated weights, some of the plurality of features for combination to obtain a combined feature, where the combined feature may be used by the feature information mining model to perform a prediction task. For example, when the prediction task is a recommendation task, the server 200 may, based on the prediction results output by the feature information mining model (for example, the user's click-through rate for different pieces of information), send the information whose click-through rate is greater than a click-through-rate threshold to the terminal device 400 for display in the human-computer interaction interface of the client 410 running on the terminal device 400.

In other embodiments, the feature selection method provided in the embodiments of the present application may also be implemented jointly by the terminal device and the server, that is, the server runs the controller model, the feature selection model and the feature information mining model, while the terminal device runs the feature information mining model (that is, the server sends the trained feature information mining model to the terminal device). For example, after the server 200 calls the feature selection method provided in the embodiments of the present application to screen out some of the plurality of features of a training sample for combination, obtains a combined feature, and retrains the feature information mining model based on the combined feature, it may send the trained feature information mining model to the terminal device 400, so that when the terminal device 400 receives a new sample, it can call the trained feature information mining model sent by the server 200 to perform the prediction task based on the combined feature corresponding to the new sample and obtain a prediction result.

Application scenarios of the feature information mining model provided in the embodiments of the present application are described below.

The feature information mining model provided in the embodiments of the present application can be applied in various fields, for example, in information recommendation scenarios (such as news recommendation and advertisement recommendation) and in classification scenarios (such as text classification and image classification).

In one implementation scenario, the feature information mining model may be a recommendation model. To improve the accuracy of content recommendation (for example, news recommendation), the feature selection method provided in the embodiments of the present application may be called to screen a subset of optimal features (for example, the user's gender and the content producer) out of the plurality of features of a training sample (for example, a sample user's historical behavior data). These features may include object features, content features and context features, where object features include but are not limited to the user's gender, age, registration time, delivery address and frequently used regions; content features include but are not limited to commodities, content title segmentation, content source and content producer; and context features abstract the user's current spatio-temporal state and recent behavior, such as the user's current coordinates, recently browsed content and recently purchased commodities. The selected features are combined to obtain a combined feature (for example, a synthetic feature formed by combining the user's gender and the content producer). The recommendation model may then perform a prediction task based on the obtained combined feature (for example, predicting the user's click-through rate for other content published by the same content producer, and finally recommending to the user the content whose click-through rate is greater than a click-through-rate threshold), thereby improving the accuracy of content recommendation.

In another implementation scenario, the feature information mining model may be a text classification model. To improve the accuracy of text classification (for example, spam identification), the feature selection method provided in the embodiments of the present application may be called to screen a subset of optimal features (for example, the email name and the sending time) out of the plurality of features of a training sample (for example, multiple historical emails, with features including the email name, sender, recipient, sending time, email size, and so on), and combine them to obtain a combined feature (for example, a synthetic feature formed by combining the email name and the sending time). The text classification model may then perform a prediction task based on the obtained combined feature (for example, upon receiving a new email, the text classification model may perform a classification task based on the combined feature formed from the new email's name and sending time, so as to determine whether the newly received email is spam), thereby improving the accuracy of spam prediction.

In some embodiments, the embodiments of the present application may be implemented by means of cloud technology. Cloud technology refers to a hosting technology that unifies a series of resources such as hardware, software and networks within a wide area network or a local area network to realize the computation, storage, processing and sharing of data.

Cloud technology is a general term for the network technology, information technology, integration technology, management platform technology, application technology and the like applied on the basis of the cloud computing business model; these resources can form a resource pool and be used on demand, flexibly and conveniently. Cloud computing technology will become an important support, since the background services of technical network systems require a large amount of computing and storage resources.

示例的,图1中示出的服务器200可以是独立的物理服务器,也可以是多个物理服务器构成的服务器集群或者分布式系统,还可以是提供云服务、云数据库、云计算、云函数、云存储、网络服务、云通信、中间件服务、域名服务、安全服务、内容分发网络(CDN,Content Delivery Network)、以及大数据和人工智能平台等基础云计算服务的云服务器,其中,云服务可以是特征选择服务,供终端设备400进行调用。终端设备400可以是智能手机、平板电脑、笔记本电脑、台式计算机、智能电视、智能手表、车载终端等,但并不局限于此。终端设备400以及服务器200可以通过有线或无线通信方式进行直接或间接地连接,本申请实施例中不做限制。For example, the server 200 shown in FIG1 can be an independent physical server, or a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content distribution networks (CDN, Content Delivery Network), and big data and artificial intelligence platforms, wherein the cloud service can be a feature selection service for the terminal device 400 to call. The terminal device 400 can be a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart TV, a smart watch, a car terminal, etc., but is not limited thereto. The terminal device 400 and the server 200 can be directly or indirectly connected by wired or wireless communication, which is not limited in the embodiments of the present application.

下面对图1中示出的服务器200的结构进行说明。参见图2,图2是本申请实施例提供的服务器200的结构示意图,图2所示的服务器200包括:至少一个处理器210、存储器240、至少一个网络接口220。服务器200中的各个组件通过总线系统230耦合在一起。可理解,总线系统230用于实现这些组件之间的连接通信。总线系统230除包括数据总线之外,还包括电源总线、控制总线和状态信号总线。但是为了清楚说明起见,在图2中将各种总线都标为总线系统230。The structure of the server 200 shown in Figure 1 is described below. Referring to Figure 2, Figure 2 is a schematic diagram of the structure of the server 200 provided in an embodiment of the present application. The server 200 shown in Figure 2 includes: at least one processor 210, a memory 240, and at least one network interface 220. The various components in the server 200 are coupled together through a bus system 230. It can be understood that the bus system 230 is used to achieve connection and communication between these components. In addition to the data bus, the bus system 230 also includes a power bus, a control bus, and a status signal bus. However, for the sake of clarity, various buses are labeled as bus systems 230 in Figure 2.

处理器210可以是一种集成电路芯片,具有信号的处理能力,例如通用处理器、数字信号处理器(DSP,Digital Signal Processor),或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等,其中,通用处理器可以是微处理器或者任何常规的处理器等。The processor 210 can be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor, a digital signal processor (DSP), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc., where the general-purpose processor can be a microprocessor or any conventional processor, etc.

存储器240可以是可移除的,不可移除的或其组合。示例性的硬件设备包括固态存储器,硬盘驱动器,光盘驱动器等。存储器240可选地包括在物理位置上远离处理器210的一个或多个存储设备。The memory 240 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid-state memory, hard disk drives, optical disk drives, etc. The memory 240 may optionally include one or more storage devices that are physically remote from the processor 210.

存储器240包括易失性存储器或非易失性存储器,也可包括易失性和非易失性存储器两者。非易失性存储器可以是只读存储器(ROM,Read Only Memory),易失性存储器可以是随机存取存储器(RAM,Random Access Memory)。本申请实施例描述的存储器240旨在包括任意适合类型的存储器。The memory 240 includes a volatile memory or a non-volatile memory, and may also include both volatile and non-volatile memories. The non-volatile memory may be a read-only memory (ROM), and the volatile memory may be a random access memory (RAM). The memory 240 described in the embodiments of the present application is intended to include any suitable type of memory.

在一些实施例中,存储器240能够存储数据以支持各种操作,这些数据的示例包括程序、模块和数据结构或者其子集或超集,下面示例性说明。In some embodiments, memory 240 can store data to support various operations, examples of which include programs, modules, and data structures, or a subset or superset thereof, as exemplarily described below.

操作系统241,包括用于处理各种基本系统服务和执行硬件相关任务的系统程序,例如框架层、核心库层、驱动层等,用于实现各种基础业务以及处理基于硬件的任务;Operating system 241, including system programs for processing various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks;

网络通信模块242,用于经由一个或多个(有线或无线)网络接口220到达其他电子设备,示例性的网络接口220包括:蓝牙、无线相容性认证(WiFi)、和通用串行总线(USB,Universal Serial Bus)等;A network communication module 242, used to reach other electronic devices via one or more (wired or wireless) network interfaces 220, exemplary network interfaces 220 include: Bluetooth, Wireless Compatibility Certification (WiFi), and Universal Serial Bus (USB), etc.;

在一些实施例中,本申请实施例提供的特征选择装置可以采用软件方式实现,图2示出了存储在存储器240中的特征选择装置243,其可以是程序和插件等形式的软件,包括以下软件模块:调用模块2431、融合模块2432、更新模块2433、组合模块2434、确定模块2435、划分模块2436和调整模块2437,这些模块是逻辑上的,因此根据所实现的功能可以进行任意的组合或进一步拆分。需要指出的是,在图2中为了表述方便,一次性示出了上述所有模块,在实际应用中,不排除特征选择装置243中仅包括调用模块2431、融合模块2432、更新模块2433、和组合模块2434的实施,将在下文中说明各个模块的功能。In some embodiments, the feature selection device provided in the embodiments of the present application can be implemented in software. FIG. 2 shows a feature selection device 243 stored in a memory 240, which can be software in the form of a program and a plug-in, including the following software modules: a calling module 2431, a fusion module 2432, an updating module 2433, a combining module 2434, a determining module 2435, a dividing module 2436, and an adjusting module 2437. These modules are logical, and therefore can be arbitrarily combined or further split according to the functions implemented. It should be pointed out that in FIG. 2, for the sake of convenience of expression, all the above modules are shown at once. In actual applications, it is not excluded that the feature selection device 243 only includes the calling module 2431, the fusion module 2432, the updating module 2433, and the combining module 2434. The functions of each module will be explained below.

下面将结合本申请实施例上文提供的电子设备的示例性应用和实施,说明本申请实施例提供的特征选择方法,如前所述,下文所述的特征选择方法可以由服务器单独实现,或由终端设备和服务器协同实现,不再重复说明。The feature selection method provided in the embodiment of the present application will be described below in combination with the exemplary application and implementation of the electronic device provided above in the embodiment of the present application. As mentioned above, the feature selection method described below can be implemented by the server alone, or by the terminal device and the server in collaboration, and will not be repeated.

参见图3,图3是本申请实施例提供的特征选择方法的原理示意图,如图3所示,本申请实施例提供的特征选择方法由两个阶段组成:搜索阶段和重训练阶段。在搜索阶段,首先初始化控制器模型的参数(即初始化多个特征分别对应的权重),在此之后,将所有待选择的特征(例如训练样本的多个经过嵌入处理后的特征)输入到特征选择模型,以使特征选择模型从控制器模型读取每个特征分别对应的权重(例如对于特征e1,特征选择模型从控制器模型读取对应的权重为(α₁¹, α₁⁰),其中,α₁⁰表示丢弃特征e1的概率,α₁¹表示选择特征e1的概率,当控制器模型初始化时,α₁¹=α₁⁰,即选择特征e1的概率和丢弃特征e1的概率相同),并确定融合有权重的每个特征。接着,可以将融合有权重的每个特征输入到特征信息挖掘模型,以使特征信息挖掘模型执行测试任务(例如特征信息挖掘模型可以利用这些带权重的特征来预测用户偏好),得到测试结果。随后,可以基于测试结果(即预测值)与训练样本的标记结果(即真实值)的差异进行反向传播处理,以更新特征信息挖掘模型和控制器模型。在这个阶段结束时,控制器模型的参数被充分训练(即控制器模型中多个特征分别对应的权重被更新,例如对于特征e1,当控制器模型更新后,α₁¹的值被更新为0.9,α₁⁰的值被更新为0.2,即选择特征e1的概率大于丢弃特征e1的概率)。See FIG3 , which is a schematic diagram of the principle of the feature selection method provided by the embodiment of the present application. As shown in FIG3 , the feature selection method provided by the embodiment of the present application consists of two stages: a search stage and a retraining stage. In the search stage, the parameters of the controller model are first initialized (i.e., the weights corresponding to multiple features are initialized). After that, all the features to be selected (e.g., multiple features of the training sample after embedding processing) are input into the feature selection model, so that the feature selection model reads the weight corresponding to each feature from the controller model (e.g., for feature e1, the feature selection model reads the corresponding weight (α₁¹, α₁⁰) from the controller model, where α₁⁰ represents the probability of discarding feature e1 and α₁¹ represents the probability of selecting feature e1; when the controller model is initialized, α₁¹=α₁⁰, that is, the probability of selecting feature e1 is the same as the probability of discarding feature e1), and determines each feature fused with weights. Next, each feature fused with weights can be input into the feature information mining model so that the feature information mining model performs the test task (for example, the feature information mining model can use these weighted features to predict user preferences) and obtains the test results. Subsequently, back propagation processing can be performed based on the difference between the test results (that is, the predicted values) and the labeled results of the training samples (that is, the true values) to update the feature information mining model and the controller model. At the end of this stage, the parameters of the controller model are fully trained (that is, the weights corresponding to multiple features in the controller model are updated; for example, for feature e1, when the controller model is updated, the value of α₁¹ is updated to 0.9 and the value of α₁⁰ is updated to 0.2, that is, the probability of selecting feature e1 is greater than the probability of discarding feature e1).

继续参见图3,搜索阶段结束后进入重训练阶段,在重训练阶段开始时,特征选择模型可以从控制器模型中读取更新的权重,并基于更新的权重从多个特征中筛选出部分特征进行组合(例如特征选择模型基于从控制器模型读取的更新的权重,从特征{e1、e2、e3、e4}中筛选出特征e1和e4),得到组合特征;其中,组合特征可以用于供特征信息挖掘模型执行预测任务。Continuing to refer to Figure 3, after the search phase ends, the retraining phase begins. At the beginning of the retraining phase, the feature selection model can read the updated weights from the controller model, and select some features from multiple features based on the updated weights for combination (for example, the feature selection model selects features e1 and e4 from features { e1 , e2, e3 , e4 } based on the updated weights read from the controller model) to obtain combined features; wherein the combined features can be used for the feature information mining model to perform prediction tasks .
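作为补充说明,下述代码为一个示意性草图(其中的特征名与权重取值均为假设,并非本申请实施例的实现),用于演示重训练阶段基于更新后的权重筛选特征的过程:As a supplementary illustration, the following code is an illustrative sketch (the feature names and weight values are hypothetical, not the implementation of the embodiments) of how the retraining stage filters features based on the updated weights:

```python
# 示意性草图(非本申请的实现;特征名与权重取值均为假设)
# Illustrative sketch (hypothetical feature names and weight values).

def select_features(features, weights):
    # 重训练阶段:仅保留“选择概率”大于“丢弃概率”的特征
    # Retraining stage: keep only features whose "select" probability
    # exceeds their "drop" probability.
    return [f for f, w in zip(features, weights) if w["select"] > w["drop"]]

features = ["e1", "e2", "e3", "e4"]
weights = [
    {"select": 0.9, "drop": 0.2},  # e1:选择概率大于丢弃概率,被选中
    {"select": 0.3, "drop": 0.8},  # e2:被丢弃
    {"select": 0.1, "drop": 0.7},  # e3:被丢弃
    {"select": 0.8, "drop": 0.4},  # e4:被选中
]
combined = select_features(features, weights)  # ["e1", "e4"]
```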

下面将结合图3中示出的控制器模型、特征选择模型和特征信息挖掘模型,对本申请实施例提供的特征选择方法进行具体说明。The feature selection method provided in the embodiment of the present application will be specifically described below in combination with the controller model, feature selection model and feature information mining model shown in Figure 3.

示例的,参见图4,图4是本申请实施例提供的特征选择方法的流程示意图,将结合图4示出的步骤进行说明。For example, see Figure 4, which is a flowchart of the feature selection method provided in an embodiment of the present application, which will be described in conjunction with the steps shown in Figure 4.

在步骤101中,调用特征选择模型从控制器模型读取训练样本的多个特征分别对应的权重,并将权重对应融合至每个特征。In step 101, the feature selection model is called to read the weights corresponding to multiple features of the training sample from the controller model, and the weights are correspondingly integrated into each feature.

在一些实施例中,如图7A所示,控制器模型的特征搜索空间可以是有向完全图,其中,有向完全图包括:与多个特征一一对应的多个输入节点(例如与特征1、特征2和特征3分别对应的3个输入节点)、以及表征特征选择结果的输出节点;每个输入节点与输出节点之间存在两条边,分别表征输入节点对应的特征的权重,其中,权重包括用于表征特征选择模型选择该特征的概率的第一权重、以及用于表征特征选择模型丢弃该特征的概率的第二权重。In some embodiments, as shown in Figure 7A, the feature search space of the controller model can be a directed complete graph, wherein the directed complete graph includes: multiple input nodes corresponding one-to-one to multiple features (for example, three input nodes corresponding to feature 1, feature 2, and feature 3, respectively), and an output node representing the feature selection results; there are two edges between each input node and the output node, which respectively represent the weights of the features corresponding to the input nodes, wherein the weights include a first weight for representing the probability that the feature selection model selects the feature, and a second weight for representing the probability that the feature selection model discards the feature.

示例的,假设训练样本一共有N个不同的特征(例如包括对象特征、内容特征、以及上下文特征等,其中,对象特征包括但不限于用户的性别、年龄、注册时间、收货地址、常用区域等,内容特征包括但不限于商品、内容的标题分词、内容来源、内容生产者等,上下文特征是代表用户当前时空状态、最近一段时间的行为抽象的特征,例如可以是用户当前的坐标、最近浏览的内容、最近购买的商品等),则所有可能的特征组合共有2^N个。如果直接利用编码的方法将所有的特征组合表示为特征搜索空间,那么随着待选择特征数量的增加,特征搜索空间将趋于无穷大。鉴于此,本申请实施例将特征搜索空间定义为一个有向完全图,将每一个特征作为图中的一个输入节点,并将特征选择的结果作为图中的输出节点。每个输入节点与输出节点之间存在两条边,分别代表选择或者丢弃该节点对应的特征(例如可以用实线表示选择,虚线表示丢弃;当然,也可以使用不同的颜色进行区分,例如用红色表示选择,绿色表示丢弃)。在这样的设计下,针对N个不同的特征,本申请实施例只需要决定2N条输入节点和输出节点之间的边的状态,可以显著减小特征搜索空间(即从2^N减小至2N)。For example, assuming that the training sample has a total of N different features (for example, including object features, content features, and context features, etc., wherein the object features include but are not limited to the user's gender, age, registration time, delivery address, common areas, etc., the content features include but are not limited to the goods, the title segmentation of the content, the content source, the content producer, etc., the context feature is a feature representing the user's current spatiotemporal state and the behavior abstraction of the recent period of time, such as the user's current coordinates, the most recently browsed content, the most recently purchased goods, etc.), then all possible feature combinations have a total of 2^N. If all feature combinations are directly represented as feature search spaces using the encoding method, then as the number of features to be selected increases, the feature search space will tend to infinity. In view of this, the embodiment of the present application defines the feature search space as a directed complete graph, takes each feature as an input node in the graph, and takes the result of feature selection as the output node in the graph. 
There are two edges between each input node and the output node, representing the selection or discarding of the feature corresponding to the node (for example, a solid line can be used to represent selection, and a dotted line can represent discarding; of course, different colors can also be used to distinguish, such as red for selection and green for discarding). Under such a design, for N different features, the embodiment of the present application only needs to determine the status of the edges between 2N input nodes and output nodes, which can significantly reduce the feature search space (i.e., from 2^N to 2N).
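上述搜索空间的缩减可以用一个示意性草图说明(其中N的取值为假设,并非本申请实施例的实现):The reduction of the search space described above can be illustrated by a sketch (the value of N is hypothetical, not the implementation of the embodiments):

```python
# 示意性草图:有向完全图搜索空间只需为 N 个特征维护 2N 条边的权重,
# 而直接枚举特征组合需要 2^N 个候选。
# Illustrative sketch: the directed complete graph needs only 2N edge
# weights, versus 2^N enumerated feature subsets.
N = 4  # 待选择特征的数量(假设值)/ number of candidate features (hypothetical)

# 每个特征对应两条边:选择(第一权重)与丢弃(第二权重),初始化时取值相同
controller_weights = [{"select": 0.5, "drop": 0.5} for _ in range(N)]

num_edges = 2 * N      # 需要决定状态的边数 / edges whose state must be decided
num_subsets = 2 ** N   # 直接枚举的特征组合数 / enumerated feature subsets
```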

在一些实施例中,每个特征对应的权重包括第一权重和第二权重,其中,第一权重用于表征特征选择模型选择该特征的概率,第二权重用于表征特征选择模型丢弃该特征的概率,则可以通过以下方式将权重对应融合至每个特征:针对每个特征执行以下处理:对初始格式为离散格式的特征进行二元化处理(例如进行独热编码处理),得到二元化后的特征,对二元化后的特征进行嵌入处理,得到嵌入后的特征,其中,嵌入后的特征的特征空间的维度,小于二元化后的特征的特征空间的维度;确定嵌入后的特征对应的第一权重与第一系数(例如设置为1)的第一相乘结果(例如αₙ¹×1,其中,αₙ¹表示第n个特征对应的第一权重)、以及嵌入后的特征对应的第二权重与第二系数(例如设置为0)的第二相乘结果(例如αₙ⁰×0,其中,αₙ⁰表示第n个特征对应的第二权重),并对第一相乘结果和第二相乘结果进行求和处理,得到第一求和结果(例如αₙ¹×1+αₙ⁰×0);将第一求和结果和嵌入后的特征(例如第n个嵌入后的特征en)的相乘结果(例如(αₙ¹×1+αₙ⁰×0)×en),确定为融合有权重的特征。In some embodiments, the weight corresponding to each feature includes a first weight and a second weight, wherein the first weight is used to characterize the probability that the feature selection model selects the feature, and the second weight is used to characterize the probability that the feature selection model discards the feature. The weights can be fused to each feature in the following manner: the following processing is performed for each feature: binarization processing (for example, one-hot encoding processing) is performed on the feature whose initial format is a discrete format to obtain a binarized feature, and the binarized feature is embedded to obtain an embedded feature, wherein the dimension of the feature space of the embedded feature is smaller than the dimension of the feature space of the binarized feature; a first multiplication result (e.g., αₙ¹×1, where αₙ¹ represents the first weight corresponding to the nth feature) of the first weight corresponding to the embedded feature and the first coefficient (e.g., set to 1) is determined, and a second multiplication result (e.g., αₙ⁰×0, where αₙ⁰ represents the second weight corresponding to the nth feature) of the second weight corresponding to the embedded feature and the second coefficient (e.g., set to 0) is determined, and the first multiplication result and the second multiplication result are summed to obtain a first summation result (e.g., αₙ¹×1+αₙ⁰×0); the multiplication result (e.g., (αₙ¹×1+αₙ⁰×0)×en) of the first summation result and the embedded feature (e.g., the nth embedded feature en) is determined as the feature fused with weights.

示例的,以特征“性别”为例,输入为“男”或者“女”,经过二元化处理之后,可以用(1,0)表示“男”,用(0,1)表示“女”。假设训练样本的多个特征为N个特征,则二元化处理后的N个特征可以表示为:For example, take the feature "gender" as an example. The input is "male" or "female". After binary processing, (1, 0) can be used to represent "male" and (0, 1) can be used to represent "female". Assuming that the multiple features of the training sample are N features, the N features after binary processing can be expressed as:

x = [x1, x2, …, xN]

其中,xn表示第n个特征对应的二元化处理后的结果。在得到二元化处理后的结果之后,还可以将二元化后的特征投影到低维连续空间中:en=Anxn,最终得到嵌入后的N个特征:Among them, xn represents the result of binary processing corresponding to the nth feature. After obtaining the result of binary processing, the binary features can also be projected into a low-dimensional continuous space: en = A n x n , and finally the embedded N features are obtained:

E = [e1, e2, …, eN]

其中,en表示第n个特征对应的嵌入处理后的结果,Aₙ是待学习的参数(一个维度为d×|xₙ|的投影矩阵,|xₙ|为二元化后的特征xₙ的维度),d由使用者确定(不同的特征可以使用相同的值或者不同的值)。如此,通过将二元化后的特征进行进一步的嵌入处理,压缩了特征的空间维度,方便了后续的特征使用。Among them, en represents the result of embedding processing corresponding to the nth feature, Aₙ is the parameter to be learned (a projection matrix of dimension d×|xₙ|, where |xₙ| is the dimension of the binarized feature xₙ), and d is determined by the user (different features can use the same value or different values). In this way, by further embedding the binarized features, the spatial dimension of the features is compressed, which facilitates the subsequent use of the features.
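上述二元化与嵌入处理可以用下述示意性草图说明(词表、嵌入维度d与矩阵的随机初始化均为假设,并非本申请实施例的实现):The binarization and embedding processing described above can be illustrated by the following sketch (the vocabulary, the embedding dimension d, and the random initialization of the matrix are hypothetical):

```python
# 示意性草图:对离散特征“性别”进行二元化(独热)处理,
# 再通过 e_n = A_n · x_n 投影到 d 维连续空间。
import random

def one_hot(value, vocabulary):
    # 二元化处理,例如“男”->(1,0),“女”->(0,1)
    return [1.0 if v == value else 0.0 for v in vocabulary]

def embed(x_n, A_n):
    # e_n = A_n · x_n,将二元化向量投影到低维连续空间
    return [sum(a * x for a, x in zip(row, x_n)) for row in A_n]

vocabulary = ["男", "女"]
x_male = one_hot("男", vocabulary)

d = 3  # 嵌入维度,由使用者确定 / embedding dimension chosen by the user
random.seed(0)
A_n = [[random.uniform(-0.1, 0.1) for _ in vocabulary] for _ in range(d)]
e_n = embed(x_male, A_n)  # 嵌入后的特征,维度为 d
```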

在得到嵌入后的N个特征之后,可以从控制器模型中读取每个特征分别对应的权重,例如以嵌入后的特征e2为例,特征选择模型从控制器模型中读取的与特征e2对应的权重为(α₂¹, α₂⁰),其中,第一权重α₂¹表示选择特征e2的概率,第二权重α₂⁰表示丢弃特征e2的概率。随后,可以将第一权重α₂¹与第一系数(假设为1)的第一相乘结果、以及第二权重α₂⁰与第二系数(假设为0)的第二相乘结果进行求和处理,并将求和结果与嵌入后的特征e2的相乘结果,确定为融合有权重的特征e′2,即e′2=(α₂¹×1+α₂⁰×0)×e2。After obtaining the embedded N features, the weight corresponding to each feature can be read from the controller model. For example, taking the embedded feature e2 as an example, the weight corresponding to feature e2 read by the feature selection model from the controller model is (α₂¹, α₂⁰), where the first weight α₂¹ represents the probability of selecting feature e2 and the second weight α₂⁰ represents the probability of discarding feature e2. Subsequently, the first multiplication result of the first weight α₂¹ and the first coefficient (assumed to be 1), and the second multiplication result of the second weight α₂⁰ and the second coefficient (assumed to be 0) can be summed, and the multiplication result of the summation result and the embedded feature e2 is determined as the weighted feature e′2, that is, e′2=(α₂¹×1+α₂⁰×0)×e2.
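上述权重融合过程可以用下述示意性草图说明(权重与特征取值均为假设,并非本申请实施例的实现):The weight fusion process described above can be illustrated by the following sketch (the weight and feature values are hypothetical):

```python
# 示意性草图:将权重融合至嵌入后的特征,
# 即 e'_n = (α_n^1 × 1 + α_n^0 × 0) × e_n。
def fuse(e_n, alpha_select, alpha_drop):
    # 第一求和结果:第一权重×1 + 第二权重×0
    scale = alpha_select * 1.0 + alpha_drop * 0.0
    return [scale * v for v in e_n]

e2 = [0.2, -0.4, 0.6]  # 嵌入后的特征(假设值)
# 初始化时两个权重相同(均为0.5)
fused = fuse(e2, alpha_select=0.5, alpha_drop=0.5)
```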

在另一些实施例中,每个特征对应的权重包括第一权重和第二权重,其中,第一权重用于表征特征选择模型选择该特征的概率,第二权重用于表征特征选择模型丢弃该特征的概率,则可以通过以下方式将权重对应融合至每个特征:针对每个特征执行以下处理:对初始格式为离散格式的特征进行二元化处理,得到二元化后的特征,对二元化后的特征进行嵌入处理,得到嵌入后的特征,其中,嵌入后的特征的特征空间的维度,小于二元化后的特征的特征空间的维度;基于嵌入后的特征对应的第一权重和第二权重,确定第一中间参数(例如pₙ¹)和第二中间参数(例如pₙ⁰,其中,pₙ¹和pₙ⁰可以根据第n个嵌入后的特征en对应的第一权重αₙ¹和第二权重αₙ⁰得到,将在下文进行具体说明);确定第一中间参数与第一系数(例如设置为1)的第三相乘结果(例如pₙ¹×1)、以及第二中间参数与第二系数(例如设置为0)的第四相乘结果(例如pₙ⁰×0),并对第三相乘结果和第四相乘结果进行求和处理,得到第二求和结果(例如pₙ¹×1+pₙ⁰×0);将第二求和结果和嵌入后的特征的相乘结果(例如(pₙ¹×1+pₙ⁰×0)×en),确定为融合有权重的特征。In other embodiments, the weight corresponding to each feature includes a first weight and a second weight, wherein the first weight is used to characterize the probability that the feature selection model selects the feature, and the second weight is used to characterize the probability that the feature selection model discards the feature. The weights can be fused to each feature in the following manner: performing the following processing on each feature: binarizing the feature in an initial discrete format to obtain a binarized feature, embedding the binarized feature to obtain an embedded feature, wherein the dimension of the feature space of the embedded feature is smaller than the dimension of the feature space of the binarized feature; determining a first intermediate parameter (e.g., pₙ¹) and a second intermediate parameter (e.g., pₙ⁰, where pₙ¹ and pₙ⁰ can be obtained from the first weight αₙ¹ and the second weight αₙ⁰ corresponding to the nth embedded feature en, which will be described in detail below) based on the first weight and the second weight corresponding to the embedded feature; determining a third multiplication result (e.g., pₙ¹×1) of the first intermediate parameter and the first coefficient (e.g., set to 1), and a fourth multiplication result (e.g., pₙ⁰×0) of the second intermediate parameter and the second coefficient (e.g., set to 0), and summing the third multiplication result and the fourth multiplication result to obtain a second summation result (e.g., pₙ¹×1+pₙ⁰×0); determining the multiplication result (e.g., (pₙ¹×1+pₙ⁰×0)×en) of the second summation result and the embedded feature as the feature fused with weights.

示例的,可以通过以下方式实现上述的基于嵌入后的特征对应的第一权重和第二权重,确定第一中间参数和第二中间参数:基于嵌入后的特征对应的第一权重和第二权重,调用归一化指数函数,确定第一中间参数和第二中间参数;其中,归一化指数函数满足以下条件:使得第一中间参数和第二中间参数的取值边缘化(即一个接近0,另一个接近1);保持控制器模型的可微分性。By way of example, the above-mentioned determination of the first intermediate parameter and the second intermediate parameter based on the first weight and the second weight corresponding to the embedded features can be implemented in the following manner: based on the first weight and the second weight corresponding to the embedded features, a normalized exponential function is called to determine the first intermediate parameter and the second intermediate parameter; wherein the normalized exponential function satisfies the following conditions: the values of the first intermediate parameter and the second intermediate parameter are marginalized (i.e., one is close to 0 and the other is close to 1); and the differentiability of the controller model is maintained.

举例来说,以归一化指数函数为Gumbel-Softmax函数为例,可以通过以下方式实现上述的基于嵌入后的特征对应的第一权重和第二权重,调用归一化指数函数,确定第一中间参数和第二中间参数:基于嵌入后的特征对应的第一权重和第二权重调用归一化指数函数执行以下处理:对第一权重的取对数结果(例如log αₙ¹)与第一噪声系数(例如g1)的求和结果,与退火参数(例如τ)进行相除处理,得到第一相除结果(例如(log αₙ¹+g1)/τ);对第二权重的取对数结果(例如log αₙ⁰)与第二噪声系数(例如g0)的求和结果,与退火参数进行相除处理,得到第二相除结果(例如(log αₙ⁰+g0)/τ);对第一相除结果的取指数结果与第二相除结果的取指数结果进行求和处理,得到第三求和结果(例如exp((log αₙ¹+g1)/τ)+exp((log αₙ⁰+g0)/τ));将第一相除结果的取指数结果(例如exp((log αₙ¹+g1)/τ))与第三求和结果的相除结果,确定为第一中间参数(例如pₙ¹,其中,pₙ¹表示新的选择嵌入后的特征en的概率);将第二相除结果的取指数结果(例如exp((log αₙ⁰+g0)/τ))与第三求和结果的相除结果,确定为第二中间参数(例如pₙ⁰,其中,pₙ⁰表示新的丢弃嵌入后的特征en的概率)。For example, taking the normalized exponential function as the Gumbel-Softmax function, the above determination of the first intermediate parameter and the second intermediate parameter can be implemented in the following manner: calling the normalized exponential function based on the first weight and the second weight corresponding to the embedded feature to perform the following processing: dividing the summation result of the logarithm of the first weight (e.g., log αₙ¹) and the first noise coefficient (e.g., g1) by the annealing parameter (e.g., τ) to obtain a first division result (e.g., (log αₙ¹+g1)/τ); dividing the summation result of the logarithm of the second weight (e.g., log αₙ⁰) and the second noise coefficient (e.g., g0) by the annealing parameter to obtain a second division result (e.g., (log αₙ⁰+g0)/τ); summing the exponential of the first division result and the exponential of the second division result to obtain a third summation result (e.g., exp((log αₙ¹+g1)/τ)+exp((log αₙ⁰+g0)/τ)); determining the division result of the exponential of the first division result (e.g., exp((log αₙ¹+g1)/τ)) and the third summation result as the first intermediate parameter (e.g., pₙ¹, where pₙ¹ represents the new probability of selecting the embedded feature en); determining the division result of the exponential of the second division result (e.g., exp((log αₙ⁰+g0)/τ)) and the third summation result as the second intermediate parameter (e.g., pₙ⁰, where pₙ⁰ represents the new probability of discarding the embedded feature en).
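上述Gumbel-Softmax计算过程可以用下述示意性草图说明(权重、噪声系数与退火参数的取值均为假设,并非本申请实施例的实现;为便于观察,此处将噪声系数固定为0,实际中g1、g0通常采样自Gumbel分布):The Gumbel-Softmax computation described above can be illustrated by the following sketch (the weight, noise, and annealing parameter values are hypothetical; the noise coefficients are fixed to 0 here for clarity, whereas in practice g1 and g0 are typically sampled from a Gumbel distribution):

```python
# 示意性草图:基于 Gumbel-Softmax 由第一/第二权重计算第一/第二中间参数。
import math

def gumbel_softmax_pair(alpha_select, alpha_drop, g1, g0, tau):
    z1 = (math.log(alpha_select) + g1) / tau   # 第一相除结果
    z0 = (math.log(alpha_drop) + g0) / tau     # 第二相除结果
    denom = math.exp(z1) + math.exp(z0)        # 第三求和结果
    p1 = math.exp(z1) / denom                  # 第一中间参数(新的选择概率)
    p0 = math.exp(z0) / denom                  # 第二中间参数(新的丢弃概率)
    return p1, p0

p1, p0 = gumbel_softmax_pair(0.9, 0.2, g1=0.0, g0=0.0, tau=0.5)
```

退火参数τ越小,第一中间参数与第二中间参数的取值越边缘化(一个接近1,另一个接近0),同时整个计算保持可微分。The smaller the annealing parameter τ, the more marginalized the two intermediate parameters become (one close to 1, the other close to 0), while the whole computation remains differentiable.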

在步骤102中,基于融合有权重的每个特征,调用特征信息挖掘模型执行测试任务,得到测试结果。In step 102, based on each feature fused with weights, a feature information mining model is called to execute the test task and obtain the test result.

作为示例,测试任务可以是推荐任务或者分类任务,当测试任务为推荐任务时,测试结果可以是用户的偏好(例如用户针对某个物品的点击率);当测试任务为分类任务时,测试结果可以是特征信息挖掘模型基于融合有权重的每个特征输出的分类结果(例如判断某个邮件是否为垃圾邮件)。As an example, the test task can be a recommendation task or a classification task. When the test task is a recommendation task, the test result can be the user's preference (such as the user's click rate on a certain item); when the test task is a classification task, the test result can be the classification result output by the feature information mining model based on each feature with a fusion weight (such as determining whether a certain email is spam).

在一些实施例中,特征信息挖掘模型可以包括嵌入单元和感知机单元,则可以通过以下方式实现上述的步骤102:基于融合有权重的每个嵌入后的特征,调用感知机单元执行测试任务,得到测试结果;其中,嵌入后的特征是调用嵌入单元针对初始格式为离散格式的特征执行以下处理得到的:对初始格式为离散格式的特征进行二元化处理,得到二元化后的特征,对二元化后的特征进行嵌入处理,得到嵌入后的特征,其中,嵌入后的特征的特征空间的维度,小于二元化后的特征的特征空间的维度。In some embodiments, the feature information mining model may include an embedding unit and a perceptron unit, and the above-mentioned step 102 may be implemented in the following manner: based on each embedded feature fused with a weight, calling the perceptron unit to execute a test task to obtain a test result; wherein the embedded feature is obtained by calling the embedding unit to perform the following processing on the feature whose initial format is a discrete format: binarizing the feature whose initial format is a discrete format to obtain a binarized feature, embedding the binarized feature to obtain an embedded feature, wherein the dimension of the feature space of the embedded feature is smaller than the dimension of the feature space of the binarized feature.

示例的,如图8所示,感知机单元可以是多层感知机,多层感知机包括多个层(例如4层),每个层包括对应的权重矩阵和偏置向量;则可以通过以下方式实现上述的基于融合有权重的每个嵌入后的特征,调用感知机单元执行测试任务,得到测试结果:将融合有权重的每个嵌入后的特征作为多层感知机的第1层(即输入层)的输入,并迭代m执行以下处理:对第m层的权重矩阵与第m层的输入的相乘结果,与第m层的偏置向量进行求和处理,得到第四求和结果;对第四求和结果进行线性变换处理(例如采用ReLU函数进行线性变换处理),得到第m层的输出,并将第m层的输出作为第m+1层的输入(即前一层的输出会作为后一层的输入,例如将第1层的输出输入到第2层中,将第2层的输出输入到第3层中,以此类推);其中,m的取值逐步递增且满足1≤m≤M-1,M为多个层的总数;对第M层(即输出层)的权重矩阵与第M层的输入的相乘结果,与第M层的偏置向量进行求和处理,得到第五求和结果;确定与测试任务对应的非线性激活函数(例如Sigmoid函数);基于非线性激活函数对第五求和结果进行非线性激活处理,得到测试结果。For example, as shown in FIG8 , the perceptron unit may be a multilayer perceptron, which includes multiple layers (e.g., 4 layers), each layer including a corresponding weight matrix and a bias vector; the above-mentioned method based on each embedded feature fused with weights may be implemented in the following manner, calling the perceptron unit to perform the test task and obtain the test result: taking each embedded feature fused with weights as the input of the first layer (i.e., the input layer) of the multilayer perceptron, and iterating m to perform the following processing: summing the multiplication result of the weight matrix of the mth layer and the input of the mth layer with the bias vector of the mth layer to obtain a fourth summation result; performing a linear transformation on the fourth summation result (e.g., using a ReLU function for linear transformation); The output of the mth layer is obtained by performing a conversion process, and the output of the mth layer is used as the input of the m+1th layer (that is, the output of the previous layer will be used as the input of the next layer, for example, the output of the 1st layer is input into the 2nd layer, the output of the 2nd layer is input into the 3rd layer, and so on); wherein, the value of m increases gradually and satisfies 1≤m≤M-1, and M is the total number of multiple layers; the multiplication result of the weight matrix of the Mth layer (that is, the output layer) and the input of the Mth layer is summed with the bias 
vector of the Mth layer to obtain a fifth summation result; a nonlinear activation function (such as a Sigmoid function) corresponding to the test task is determined; a nonlinear activation process is performed on the fifth summation result based on the nonlinear activation function to obtain a test result.
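上述多层感知机的前向计算可以用下述示意性草图说明(层数、权重矩阵与输入均为假设,并非本申请实施例的实现):The forward computation of the multilayer perceptron described above can be illustrated by the following sketch (the number of layers, weight matrices, and input are hypothetical):

```python
# 示意性草图:多层感知机前向计算,前 M-1 层使用 ReLU 变换,
# 第 M 层(输出层)使用 Sigmoid 非线性激活。
import math

def relu(v):
    return [max(0.0, x) for x in v]

def affine(W, b, x):
    # 权重矩阵与输入的相乘结果,与偏置向量求和
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def mlp_forward(layers, x):
    for W, b in layers[:-1]:
        x = relu(affine(W, b, x))   # 第m层输出作为第m+1层输入
    W, b = layers[-1]
    z = affine(W, b, x)             # 第五求和结果
    return [1.0 / (1.0 + math.exp(-zi)) for zi in z]  # Sigmoid 激活

layers = [
    ([[0.5, -0.2], [0.1, 0.3]], [0.0, 0.1]),  # 隐藏层(第1层)
    ([[1.0, -1.0]], [0.0]),                    # 输出层(第M层)
]
y_hat = mlp_forward(layers, [0.4, 0.6])
```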

在步骤103中,基于测试结果与训练样本的标记结果的差异进行反向传播处理,以更新特征信息挖掘模型和控制器模型。In step 103, back propagation processing is performed based on the difference between the test result and the labeled result of the training sample to update the feature information mining model and the controller model.

在一些实施例中,可以通过以下方式实现步骤103:以特征信息挖掘模型和控制器模型的损失为因子,基于不同的优化目标,分别构建第一目标函数和第二目标函数,其中,第一目标函数的优化目标可以是直接最小化损失函数,第二目标函数的优化目标可以是通过使用对函数求集合的函数(例如arg函数)的方式最小化损失函数(即第一目标函数的优化目标和第二目标函数的优化目标是不同的)。随后,在每轮训练过程中,可以将测试结果与训练样本的标记结果的差异作为误差信号,代入第一目标函数和第二目标函数中的任意一个,并结合梯度下降算法依次在特征信息挖掘模型和控制器模型中进行梯度反向传播,从而依次更新特征信息挖掘模型的参数和控制器模型的权重。In some embodiments, step 103 can be implemented in the following manner: taking the loss of the feature information mining model and the controller model as factors, and based on different optimization objectives, respectively constructing the first objective function and the second objective function, wherein the optimization objective of the first objective function can be to directly minimize the loss function, and the optimization objective of the second objective function can be to minimize the loss function by using a function that seeks a set of functions (e.g., an arg function) (i.e., the optimization objective of the first objective function and the optimization objective of the second objective function are different). Subsequently, in each round of training, the difference between the test result and the labeled result of the training sample can be used as an error signal, substituted into any one of the first objective function and the second objective function, and combined with the gradient descent algorithm, gradient back propagation is performed in the feature information mining model and the controller model in turn, thereby updating the parameters of the feature information mining model and the weights of the controller model in turn.

在另一些实施例中,参见图5A,图5A是本申请实施例提供的特征选择方法的流程示意图,如图5A所示,图4示出的步骤103可以通过图5A示出的步骤1031至步骤1032实现,将结合图5A示出的步骤进行说明。In other embodiments, referring to FIG. 5A , FIG. 5A is a flow chart of a feature selection method provided in an embodiment of the present application. As shown in FIG. 5A , step 103 shown in FIG. 4 can be implemented through steps 1031 to 1032 shown in FIG. 5A , which will be described in conjunction with the steps shown in FIG. 5A .

在步骤1031中,将多个不同的训练样本划分为训练集和验证集。In step 1031, a plurality of different training samples are divided into a training set and a validation set.

在一些实施例中,可以按照设定的比例(例如8:2)将多个不同的训练样本划分为训练集和验证集,其中,训练集用于更新特征信息挖掘模型的参数,验证集用于更新控制器模型的参数(即权重),如此,通过利用不同批的数据来分别训练特征信息挖掘模型和控制器模型,能够避免过拟合问题的发生。In some embodiments, multiple different training samples can be divided into training sets and validation sets according to a set ratio (for example, 8:2), wherein the training set is used to update the parameters of the feature information mining model, and the validation set is used to update the parameters (i.e., weights) of the controller model. In this way, by using different batches of data to respectively train the feature information mining model and the controller model, the occurrence of overfitting problems can be avoided.
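上述训练集与验证集的划分可以用下述示意性草图说明(样本数量与比例均为假设,并非本申请实施例的实现):The split of the training set and the validation set described above can be illustrated by the following sketch (the number of samples and the ratio are hypothetical):

```python
# 示意性草图:按设定比例(例如8:2)将多个不同的训练样本
# 划分为训练集和验证集。
def split_samples(samples, ratio=0.8):
    cut = int(len(samples) * ratio)
    return samples[:cut], samples[cut:]

samples = list(range(10))  # 假设的10个训练样本
train_set, valid_set = split_samples(samples)  # 8个用于训练,2个用于验证
```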

在步骤1032中,将训练集对应的测试结果与训练集的标记结果代入第一目标函数,以确定对应的第一差异,根据第一差异确定特征信息挖掘模型的第一梯度,并根据第一梯度更新特征信息挖掘模型的参数。In step 1032, the test results corresponding to the training set and the labeled results of the training set are substituted into the first objective function to determine the corresponding first difference, determine the first gradient of the feature information mining model according to the first difference, and update the parameters of the feature information mining model according to the first gradient.

在一些实施例中,第一目标函数使用的损失函数可以是二分类交叉熵(BCE,Binary Cross Entropy)损失函数,则可以通过以下方式实现上述的将训练集对应的测试结果与训练集的标记结果代入第一目标函数,以确定对应的第一差异:通过二分类交叉熵损失函数执行以下处理:对训练集的标记结果(即真实值,例如y1)与训练集对应的测试结果(即特征信息挖掘模型输出的预测值,例如ŷ1)的取对数结果进行相乘处理,得到第五相乘结果(例如y1×log ŷ1);确定1与训练集的标记结果之间的第一差值(例如(1-y1)),以及1与训练集对应的测试结果之间的第二差值的取对数结果(例如log(1-ŷ1)),并对第一差值与第二差值的取对数结果进行相乘处理,得到第六相乘结果(例如(1-y1)×log(1-ŷ1));将第五相乘结果与第六相乘结果的求和结果,确定为第一差异。In some embodiments, the loss function used by the first objective function may be a binary cross entropy (BCE) loss function. The above-mentioned step of substituting the test result corresponding to the training set and the labeled result of the training set into the first objective function to determine the corresponding first difference may be implemented in the following manner: the following processing is performed by the binary cross entropy loss function: the labeled result of the training set (i.e., the true value, e.g., y1) and the logarithm of the test result corresponding to the training set (i.e., the predicted value output by the feature information mining model, e.g., ŷ1) are multiplied to obtain a fifth multiplication result (e.g., y1×log ŷ1); a first difference between 1 and the labeled result of the training set (e.g., (1-y1)) is determined, as well as the logarithm of a second difference between 1 and the test result corresponding to the training set (e.g., log(1-ŷ1)), and the first difference is multiplied by the logarithm of the second difference to obtain a sixth multiplication result (e.g., (1-y1)×log(1-ŷ1)); the sum of the fifth multiplication result and the sixth multiplication result is determined as the first difference.
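上述二分类交叉熵损失的计算可以用下述示意性草图说明(标记值与预测值均为假设,并非本申请实施例的实现;此处按惯例取负号以使损失为正值):The computation of the binary cross entropy loss described above can be illustrated by the following sketch (the label and predicted values are hypothetical; a negative sign is applied by convention so that the loss is positive):

```python
# 示意性草图:二分类交叉熵(BCE)损失
# loss = -(y × log(ŷ) + (1-y) × log(1-ŷ))
import math

def bce_loss(y, y_hat, eps=1e-12):
    y_hat = min(max(y_hat, eps), 1.0 - eps)  # 数值保护,避免 log(0)
    return -(y * math.log(y_hat) + (1.0 - y) * math.log(1.0 - y_hat))

loss_good = bce_loss(1.0, 0.9)  # 预测值接近真实值,损失较小
loss_bad = bce_loss(1.0, 0.1)   # 预测值偏离真实值,损失较大
```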

需要说明的是,由于本申请实施例主要针对的是二分类问题,因此在此使用BCE损失函数;可以理解的是,对于其他类别的推荐任务,也可以使用其他损失函数,例如合页损失函数、均方误差损失函数等,本申请实施例对此不作具体限定。It should be noted that since the embodiment of the present application is mainly aimed at the binary classification problem, the BCE loss function is used here; it can be understood that other loss functions can also be used for other categories of recommendation tasks, such as hinge loss function, mean square error loss function, etc., and the embodiment of the present application does not make specific limitations on this.
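As a concrete illustration of the BCE loss described above, the following Python sketch computes the loss for a single sample (the function name and the sample values are illustrative only and do not appear in the embodiment):

```python
import math

def bce_loss(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy for one sample: -(y*log(p) + (1-y)*log(1-p))."""
    p = min(max(y_pred, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y_true * math.log(p) + (1.0 - y_true) * math.log(1.0 - p))

# A confident correct prediction gives a small loss; a confident wrong one, a large loss.
low = bce_loss(1.0, 0.9)
high = bce_loss(1.0, 0.1)
```

With y = 1, `bce_loss(1.0, 0.5)` equals log 2, the loss of a maximally uncertain prediction.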

在步骤1033中,将验证集对应的测试结果与验证集的标记结果代入第二目标函数,以确定对应的第二差异,根据第二差异确定控制器模型的第二梯度,并根据第二梯度更新控制器模型的权重。In step 1033, the test results corresponding to the validation set and the labeled results of the validation set are substituted into the second objective function to determine the corresponding second difference, the second gradient of the controller model is determined according to the second difference, and the weight of the controller model is updated according to the second gradient.

In some embodiments, the weight corresponding to each feature includes a first weight and a second weight: the first weight is used to characterize the probability that the feature selection model selects the feature, and the second weight is used to characterize the probability that the feature selection model discards the feature. When the controller model is initialized, the first weight and the second weight corresponding to each feature take the same value. For example, taking the embedded feature e2: when the controller model is initialized, the first weight α_2^1 and the second weight α_2^0 that the feature selection model reads from the controller model for e2 are both 0.5, i.e., the probability of selecting e2 equals the probability of discarding e2. After the controller model is updated, if the feature is a valid feature, its first weight is greater than its second weight. Still taking e2 as an example, when the controller model has been updated and e2 is a valid feature, the value of α_2^1 becomes greater than that of α_2^0; for instance, after training in the search stage, α_2^1 may be updated from 0.5 to 0.9 and α_2^0 from 0.5 to 0.2. That is, for a valid feature, once the controller model is updated, the probability of selecting the feature exceeds the probability of discarding it, so in the retraining stage the feature selection model can select that feature from the multiple features according to the updated weights. Conversely, if the feature is an invalid feature, its first weight is less than its second weight. Again taking e2 as an example, when the controller model has been updated and e2 is an invalid feature, the value of α_2^1 becomes less than that of α_2^0; for instance, α_2^1 may be updated from 0.5 to 0.3 and α_2^0 from 0.5 to 0.8. That is, for an invalid feature, once the controller model is updated, the probability of selecting the feature is less than the probability of discarding it, so in the retraining stage the feature selection model can discard that feature from the multiple features according to the updated weights.

In some other embodiments, the loss function used by the second objective function may also be the binary cross-entropy loss function. Substituting the test results corresponding to the validation set and the labeled results of the validation set into the second objective function to determine the corresponding second difference may then be implemented as follows. The binary cross-entropy loss function performs the following processing: multiply the labeled result of the validation set (i.e., the ground-truth value, e.g., y2) by the logarithm of the test result corresponding to the validation set (i.e., the predicted value output by the feature information mining model, e.g., ŷ2) to obtain a seventh multiplication result (e.g., y2·log(ŷ2)); determine a third difference value between 1 and the labeled result of the validation set (e.g., (1-y2)), as well as the logarithm of a fourth difference value between 1 and the test result corresponding to the validation set (e.g., log(1-ŷ2)); multiply the third difference value by the logarithm of the fourth difference value to obtain an eighth multiplication result (e.g., (1-y2)·log(1-ŷ2)); and determine the sum of the seventh multiplication result and the eighth multiplication result as the second difference.

需要说明的是,由于本申请实施例主要针对的是二分类问题,因此使用BCE损失函数,可以理解的是,针对其他类型的推荐任务或者分类任务,也可以使用其他类型的损失函数,例如均方误差损失函数、合页损失函数、交叉熵损失函数等,本申请实施例对此不作具体限定。It should be noted that since the embodiment of the present application is mainly aimed at the binary classification problem, the BCE loss function is used. It can be understood that other types of loss functions can also be used for other types of recommendation tasks or classification tasks, such as mean square error loss function, hinge loss function, cross entropy loss function, etc. The embodiment of the present application does not make specific limitations on this.

在步骤104中,调用特征选择模型从控制器模型读取更新的权重,基于更新的权重从多个特征中筛选出部分特征进行组合,得到组合特征。In step 104, the feature selection model is called to read the updated weights from the controller model, and some features are selected from multiple features based on the updated weights to combine and obtain combined features.

这里,组合特征用于供更新后的特征信息挖掘模型执行预测任务,其中,预测任务可以是推荐任务,例如新闻、电影、音乐等信息的推荐任务;也可以是分类任务,例如文本分类任务(例如垃圾邮件识别)、图片分类任务(例如人脸识别)等。Here, the combined features are used for the updated feature information mining model to perform prediction tasks, where the prediction tasks can be recommendation tasks, such as recommendation tasks for news, movies, music and other information; they can also be classification tasks, such as text classification tasks (such as spam identification), image classification tasks (such as face recognition), etc.

需要说明的是,本申请实施例中涉及的测试任务和预测任务是针对特征信息挖掘模型的训练阶段/应用阶段而言的,但测试任务和预测任务的目标是相同的(例如都是预测用户针对某个物品的点击率、或者判断某个邮件是否为垃圾邮件)。以特征信息挖掘模型为分类模型为例,特征信息挖掘模型在训练阶段执行的分类任务称为测试任务,特征信息挖掘模型训练完成后在应用阶段执行的分类任务称为预测任务。也就是说,在这2个阶段中,特征信息挖掘模型在执行分类任务时所使用的特征是不一样的,在训练阶段特征信息挖掘模型是基于融合有权重的每个特征执行分类任务的;而在应用阶段特征信息挖掘模型是基于从多个特征中确定出的组合特征(即多个特征的子集)执行分类任务的。It should be noted that the test tasks and prediction tasks involved in the embodiments of the present application are for the training phase/application phase of the feature information mining model, but the goals of the test tasks and prediction tasks are the same (for example, both are to predict the click rate of users for a certain item, or to determine whether a certain email is spam). Taking the feature information mining model as a classification model as an example, the classification task performed by the feature information mining model in the training phase is called a test task, and the classification task performed in the application phase after the feature information mining model is trained is called a prediction task. In other words, in these two phases, the features used by the feature information mining model when performing the classification task are different. In the training phase, the feature information mining model performs the classification task based on each feature fused with a weight; and in the application phase, the feature information mining model performs the classification task based on the combined features (i.e., a subset of multiple features) determined from multiple features.

在一些实施例中,参见图5B,图5B是本申请实施例提供的特征选择方法的流程示意图,如图5B所示,图4示出的步骤104可以通过图5B示出的步骤1041至步骤1042实现,将结合图5B示出的步骤进行说明。In some embodiments, referring to FIG. 5B , FIG. 5B is a flow chart of a feature selection method provided in an embodiment of the present application. As shown in FIG. 5B , step 104 shown in FIG. 4 can be implemented through steps 1041 to 1042 shown in FIG. 5B , which will be described in conjunction with the steps shown in FIG. 5B .

在步骤1041中,调用特征选择模型从控制器模型读取更新的权重,并基于更新的权重,确定多个特征分别对应的得分。In step 1041, the feature selection model is called to read the updated weights from the controller model, and based on the updated weights, the scores corresponding to the plurality of features are determined.

In some embodiments, the updated weight corresponding to each feature includes an updated first weight and an updated second weight, where the updated first weight is used to characterize the probability that the feature selection model selects the feature and the updated second weight is used to characterize the probability that the feature selection model discards the feature. Determining the scores corresponding to the multiple features based on the updated weights may then be implemented by performing the following processing for each feature: sum the exponential of the updated first weight corresponding to the feature (e.g., exp(α_n^1), where α_n^1 denotes the updated first weight corresponding to the n-th embedded feature e_n) and the exponential of the updated second weight corresponding to the feature (e.g., exp(α_n^0)) to obtain a sixth summation result (e.g., exp(α_n^1) + exp(α_n^0)); then determine the quotient of the exponential of the updated first weight (e.g., exp(α_n^1)) divided by the sixth summation result as the score corresponding to the feature (e.g., π_n, i.e., the score corresponding to the n-th embedded feature e_n), that is, π_n = exp(α_n^1) / (exp(α_n^1) + exp(α_n^0)).
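The score computation described here is a two-way softmax over each feature's controller weights, and can be sketched as follows (the weight values are hypothetical, not from the embodiment):

```python
import math

def selection_scores(alpha):
    """alpha: list of (a1, a0) pairs, a1 = updated 'select' weight, a0 = 'drop' weight.
    Score pi_n = exp(a1) / (exp(a1) + exp(a0)), i.e., a two-way softmax."""
    return [math.exp(a1) / (math.exp(a1) + math.exp(a0)) for a1, a0 in alpha]

# Hypothetical updated weights: feature 0 looks valid (0.9 vs 0.2),
# feature 1 looks invalid (0.3 vs 0.8).
scores = selection_scores([(0.9, 0.2), (0.3, 0.8)])
```

A score above 0.5 means the select weight dominates the drop weight for that feature.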

在步骤1042中,从多个特征中筛选出得分大于得分阈值的至少部分特征进行组合,得到组合特征。In step 1042, at least some of the features having scores greater than a score threshold are selected from the multiple features and combined to obtain a combined feature.

在一些实施例中,以多个特征为N个特征为例,在得到N个特征分别对应的得分之后,可以按照得分从高到低的顺序对N个特征进行降序排序,并将降序排序结果中排在前K位的K个特征筛选出来进行组合,得到组合特征,其中,K、N均为正整数,且K<N。In some embodiments, taking the multiple features as N features as an example, after obtaining the scores corresponding to the N features respectively, the N features can be sorted in descending order from high to low according to the scores, and the K features ranked in the top K positions in the descending sorting results are screened out and combined to obtain combined features, where K and N are both positive integers, and K<N.

示例的,以训练样本的多个特征为4个特征为例,分别为{e1、e2、e3、e4},同时假设特征e1对应的得分为80,特征e2对应的得分为60,e3对应的得分为90,e4对应的得分为70,则按照得分从高到低的顺序对这4个特征进行降序排序的排序结果为:e3、e1、e4、e2,接着可以从降序排序结果中筛选出排在前2位的2个特征(即特征e3和e1)进行组合,得到组合特征。For example, take the multiple features of the training sample as 4 features, namely {e 1 , e 2 , e 3 , e 4 }. At the same time, assuming that the score corresponding to feature e 1 is 80, the score corresponding to feature e 2 is 60, the score corresponding to e 3 is 90, and the score corresponding to e 4 is 70, then the sorting result of sorting these 4 features in descending order from high to low is: e 3 , e 1 , e 4 , e 2. Then, the top 2 features (i.e., features e 3 and e 1 ) can be selected from the descending sorting result and combined to obtain the combined feature.
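The top-K screening in this example can be sketched as follows (the helper name is illustrative):

```python
def top_k_features(features, scores, k):
    """Keep the k features with the highest scores, in descending score order."""
    ranked = sorted(zip(features, scores), key=lambda pair: pair[1], reverse=True)
    return [f for f, _ in ranked[:k]]

# The example from the text: scores 80/60/90/70 for e1..e4, keep the top 2.
combo = top_k_features(["e1", "e2", "e3", "e4"], [80, 60, 90, 70], 2)
```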

在另一些实施例中,组合特征包括的特征的数量小于多个特征的数量,特征信息挖掘模型是基于多个特征构建的;则在基于更新的权重从多个特征中筛选出部分特征进行组合,得到组合特征之后,还可以执行以下处理:将特征信息挖掘模型(例如基于差异更新后的特征信息挖掘模型)的输入层的尺寸,从与多个特征的数量对应的尺寸调整至与组合特征包括的特征的数量对应的尺寸,得到新的特征信息挖掘模型。In other embodiments, the number of features included in the combined feature is less than the number of multiple features, and the feature information mining model is constructed based on multiple features; then, after some features are screened out from the multiple features based on the updated weights for combination to obtain the combined feature, the following processing can also be performed: the size of the input layer of the feature information mining model (for example, the feature information mining model after the difference update) is adjusted from the size corresponding to the number of multiple features to the size corresponding to the number of features included in the combined feature to obtain a new feature information mining model.

示例的,以多个特征为N个特征为例,即特征信息挖掘模型是基于N个特征构建的;则在基于更新的权重从N个特征中筛选出至少部分特征(例如从N个特征中筛选出K个特征,其中,K小于N)进行组合,得到组合特征之后,还可以执行以下处理:将特征信息挖掘模型的输入层的尺寸,从与N个特征的数量对应的尺寸调整至组合特征包括的特征的数量(即K)对应的尺寸(例如将特征信息挖掘模型包括的第一个MLP层的输入尺寸从N×d调整为K×d,其中,d为特征的维度),得到新的特征信息挖掘模型。For example, take the case where the multiple features are N features, that is, the feature information mining model is constructed based on N features; then after at least some features are screened out from the N features based on the updated weights (for example, K features are screened out from the N features, where K is less than N) and combined to obtain the combined features, the following processing can also be performed: the size of the input layer of the feature information mining model is adjusted from the size corresponding to the number of N features to the size corresponding to the number of features included in the combined features (that is, K) (for example, the input size of the first MLP layer included in the feature information mining model is adjusted from N×d to K×d, where d is the dimension of the feature), to obtain a new feature information mining model.

本申请实施例提供的特征选择方法,特征选择模型首先根据控制器模型中针对训练样本的多个特征的权重为教导,确定出融合有权重的特征,接着由特征信息挖掘模型基于融合有权重的特征执行测试任务,来优化控制器模型中的权重,进而筛选出部分最优的特征进行组合供更新后的特征信息挖掘模型执行后续的预测任务,与已有技术相比,避免了对专家知识、以及人力的大量需求,能够自适应不同规模的训练集,减少计算需求的同时能够筛选出最优的组合特征供特征信息挖掘模型执行预测任务,保证了预测精度。The feature selection method provided in the embodiment of the present application is that the feature selection model first determines the weighted features based on the weights of multiple features of the training samples in the controller model, and then the feature information mining model performs the test task based on the weighted features to optimize the weights in the controller model, and then screens out some of the best features for combination for the updated feature information mining model to perform subsequent prediction tasks. Compared with the existing technology, it avoids the large demand for expert knowledge and manpower, can adapt to training sets of different sizes, reduce computing requirements, and can screen out the best combination of features for the feature information mining model to perform prediction tasks, thereby ensuring prediction accuracy.

下面,以内容推荐场景为例说明本申请实施例在推荐场景中的示例性应用。Below, the exemplary application of the embodiments of the present application in the recommendation scenario is explained by taking the content recommendation scenario as an example.

特征的质量能够显著地影响推荐系统所应用的特征信息挖掘模型的表现。因此,在深度学习技术基础上设计的特征信息挖掘模型中,特征选择是非常重要的一个模块。本申请实施例提供了一个可以自动且适应性地为特征信息挖掘模型选择关键特征的自动化机器学习框架。具体来说,本申请实施例利用神经网络搜索的相关技术,设计了一个可微分的控制器网络(对应于上述的控制器模型),结合本申请实施例提供的另外两个模型,分别为特征选择模型和特征信息挖掘模型,能够实现自动调整特定特征被选用的概率;在这之后,本申请实施例还给出了根据控制器网络所给的概率选择特征的具体方法;最后,只有被选中的特征被输入到对应的特征信息挖掘模型中进行重新训练。同时,本申请实施例提供的特征选择方法具有很好的泛化性,可以应用到不同的推荐系统中。The quality of features can significantly affect the performance of the feature information mining model used by the recommendation system. Therefore, in the feature information mining model designed based on deep learning technology, feature selection is a very important module. The embodiment of the present application provides an automated machine learning framework that can automatically and adaptively select key features for the feature information mining model. Specifically, the embodiment of the present application uses the relevant technology of neural network search to design a differentiable controller network (corresponding to the above-mentioned controller model), combined with the other two models provided by the embodiment of the present application, namely the feature selection model and the feature information mining model, which can automatically adjust the probability of a specific feature being selected; after that, the embodiment of the present application also provides a specific method for selecting features according to the probability given by the controller network; finally, only the selected features are input into the corresponding feature information mining model for retraining. At the same time, the feature selection method provided in the embodiment of the present application has good generalization and can be applied to different recommendation systems.

示例的,本申请实施例提供的特征选择方法可以应用于各种各样的推荐系统中,更一般地,也可以应用于分类问题。具体来说,只需要将需要使用的数据处理成表格状形式,即可作为本申请实施例提供的特征选择方法的输入,从而可以给出被选出的特征进行组合。最后只需将被选中的特征进行组合得到的组合特征,输入到需要使用的推荐系统或者分类系统中,即可得到良好的推荐效果或者分类结果。For example, the feature selection method provided in the embodiment of the present application can be applied to various recommendation systems, and more generally, it can also be applied to classification problems. Specifically, it is only necessary to process the data to be used into a tabular form, which can be used as the input of the feature selection method provided in the embodiment of the present application, so that the selected features can be given for combination. Finally, it is only necessary to input the combined features obtained by combining the selected features into the recommendation system or classification system to be used, and a good recommendation effect or classification result can be obtained.

下面对本申请实施例提供的特征选择方法进行具体说明。The feature selection method provided in the embodiment of the present application is described in detail below.

For example, see Figure 6, which is a schematic diagram of the principle of the feature selection method provided by an embodiment of the present application. As shown in Figure 6, the feature selection method provided by the embodiment consists of two stages: a search stage (Search Stage) and a retraining stage (Retraining Stage). To find the optimal features for combination, the embodiment updates the parameters of the controller network (Controller), that is, the weight corresponding to each feature, in the search stage; then, in the retraining stage, based on the parameters finally obtained by the controller network in the previous stage, the embodiment selects the optimal features for combination and inputs the combined features obtained by combining the selected features into the recommendation system to be used. As can be seen from Figure 6, both stages are composed of three modules: the controller network, the feature selection model (Feature Selection Model), and the feature information mining model (Feature Utilizing Model).

在搜索阶段,本申请实施例首先初始化控制器网络的参数(例如将所有特征被选中的概率均设置为0.5)。在此之后,将所有待选择的特征(Input Fields)都输入到特征选择模型,以使特征选择模型根据控制器网络的参数,给每个特征一个权重,所有特征和对应权重相乘后输入到特征信息挖掘模型中。特征信息挖掘模型则利用这些带权重的特征来预测用户偏好(对应于上述的测试任务,例如预测用户针对某个物品的点击率)。本申请实施例通过预测训练集中用户的偏好,更新特征信息挖掘模型的参数;同时,根据对验证集中用户的预测结果更新控制器网络的参数。在这个阶段结束时,控制器网络中的参数被充分训练。In the search phase, the embodiment of the present application first initializes the parameters of the controller network (for example, the probability of all features being selected is set to 0.5). After that, all the features to be selected (Input Fields) are input into the feature selection model, so that the feature selection model gives each feature a weight according to the parameters of the controller network, and all features are multiplied by the corresponding weights and input into the feature information mining model. The feature information mining model uses these weighted features to predict user preferences (corresponding to the above-mentioned test tasks, such as predicting the user's click rate for a certain item). The embodiment of the present application updates the parameters of the feature information mining model by predicting the preferences of users in the training set; at the same time, the parameters of the controller network are updated according to the prediction results of the users in the verification set. At the end of this stage, the parameters in the controller network are fully trained.

After the search stage ends, the retraining stage begins. Specifically, at the beginning of the retraining stage, the feature selection model selects the optimal features for combination from all candidate features according to the parameters of the controller network fully trained in the search stage; for example, it selects the optimal features e1 and e4 from all candidate features {e1, e2, e3, e4}. It should be noted that the role of the feature selection model differs between the two stages: in the search stage, the feature selection model gives each feature an initial weight, and this selection does not actually filter out any feature, whereas in the retraining stage, the feature selection model filters out poor features from the candidates according to the updated weights. Subsequently, the feature information mining model is adapted and optimized based on the selected feature combination (e.g., the selected features e1 and e4).

下面首先对图6中示出的控制器网络进行说明。First, the controller network shown in FIG. 6 will be described below.

For example, suppose there are N different features in total; then there are 2^N possible feature combinations. If an encoding-based method were used to directly represent all feature combinations as the search space, the search space would tend to infinity as the number of candidate features increases. In view of this, as shown in Figure 7A, the embodiment of the present application defines the search space as a directed complete graph: each feature is a node in the graph, and the result of feature selection is the output node; between each node and the output node there are two edges, representing respectively selecting and discarding the feature represented by that node (for example, a solid line represents selection and a dotted line represents discarding). Under this design, with N different features in total, the embodiment only needs to decide the states of the 2N edges between the nodes and the output node, which significantly reduces the search space.

In the search stage, each edge is first initialized with a uniform weight. For example, for feature 1, its corresponding weights α_1^0 and α_1^1 are both 0.5, where α_1^0 denotes the probability of discarding feature 1 and α_1^1 denotes the probability of selecting feature 1. These weights are input into the feature selection model to determine its selection behavior. As shown in Figure 7B, as the search proceeds, the uniform initial weights gradually become unbalanced (for example, the thickness of a line can represent the magnitude of the corresponding weight: the thicker the line, the larger the weight); the weights of edges representing the selection of valid features gradually increase, while the weights of edges representing the selection of invalid features gradually decrease. Finally, as shown in Figure 7C, in the retraining stage only the thicker of each node's two edges is retained, thereby completing the feature selection task.

下面继续对图6中示出的特征信息挖掘模型进行说明。The feature information mining model shown in FIG. 6 will be described below.

For example, see Figure 8, which is a schematic structural diagram of the feature information mining model provided by an embodiment of the present application. As shown in Figure 8, the feature information mining model consists of two basic units: an embedding unit (Embedding Layer) and a perceptron unit (MLP Layers). The embedding unit embeds discrete inputs into a low-dimensional continuous space. First, the input raw features are binarized (Binarization), i.e., represented using only 0s and 1s. Specifically, suppose a feature has M distinct values; then each possible value can be represented by one of M vectors of length M, each containing a single "1" with the remaining entries "0" (i.e., the feature is one-hot encoded, so that each feature value corresponds to one dimension, and one-hot encoding yields a sparse feature matrix). Taking the feature "gender" as an example, suppose the input is "male" or "female"; after binarization, (1, 0) can represent "male" and (0, 1) can represent "female". Suppose there are N features in total; the N binarized features can then be expressed as:

x = [x_1, x_2, …, x_N]

where x_n is the binarized result corresponding to the n-th feature, with x_n ∈ {0, 1}^{D_n}, where D_n is the number of distinct values of that feature (i.e., M above). Because such features are very sparse, and some features take so many values that the dimensionality becomes very large, which is unfavorable for subsequent use, the binarized features further need to be projected (Projection) into a low-dimensional continuous space:

e_n = A_n x_n

where e_n is the result of projecting the n-th binarized feature (i.e., the n-th embedded feature), with e_n ∈ ℝ^d, and A_n ∈ ℝ^{d×D_n} is a parameter to be learned; d is determined by the user (different features may use the same value). Finally, the N embedded features are obtained:

E = [e_1, e_2, …, e_N]

其中,en为第n个嵌入后的特征。在得到嵌入后的N个特征之后,还可以将嵌入后的N个特征进行拼接处理(Concatenate),并将拼接处理后得到的拼接特征输入到感知机单元中。Wherein, en is the nth embedded feature. After obtaining the embedded N features, the embedded N features can also be concatenated (Concatenate), and the concatenated features obtained after the concatenation are input into the perceptron unit.
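The binarization and projection steps above can be sketched as follows; the random projection matrix stands in for the learnable parameter A_n, which in practice is trained rather than random:

```python
import numpy as np

rng = np.random.default_rng(0)

def one_hot(value, vocab):
    """Binarize a raw feature value into a length-len(vocab) 0/1 vector."""
    x = np.zeros(len(vocab))
    x[vocab.index(value)] = 1.0
    return x

# "gender" with two values: (1, 0) for "male", (0, 1) for "female".
vocab = ["male", "female"]
x = one_hot("female", vocab)          # x_n, shape (D_n,)
A = rng.normal(size=(4, len(vocab)))  # stand-in for learnable A_n, shape (d, D_n), d = 4
e = A @ x                             # embedded feature e_n = A_n x_n, shape (d,)
```

Because x_n is one-hot, the projection simply picks out one column of A_n, which is why embedding lookups are implemented as table lookups in practice.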

在一些实施例中,感知机单元可以是一个多层感知机(MLP,MultiLayerPerceptron),由多层的线性变换(Linear Transformation)以及非线性的激活函数(例如Sigmoid)组成,感知机单元用于挖掘拼接特征中的信息,给出预测结果,具体公式如下:In some embodiments, the perceptron unit may be a multi-layer perceptron (MLP), which is composed of multiple layers of linear transformation (Linear Transformation) and nonlinear activation functions (such as Sigmoid). The perceptron unit is used to mine information in the splicing features and give a prediction result. The specific formula is as follows:

h_1 = E,

h_{m+1} = ReLU(W_m h_m + b_m), 1 ≤ m ≤ M−1,

ŷ = σ(W_M h_M + b_M),

where E is the output of the embedding unit mentioned above; W_m, b_m, and h_m denote the linear transformation matrix (also called the weight matrix), the bias vector, and the input of the m-th layer, respectively; h_{m+1} denotes the output of the m-th layer (serving as the input of the (m+1)-th layer); and ReLU denotes the rectified linear activation function, e.g., the function taking the maximum of its input and 0. M is the number of layers of the multilayer perceptron; W_M and b_M denote the linear transformation matrix and bias vector of the output layer (i.e., the M-th layer); and σ(·) denotes the nonlinear activation function of the output layer (e.g., the Sigmoid function), whose type depends on the recommendation task.
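A minimal sketch of the perceptron unit's forward pass, with toy random weights standing in for the trained W_m and b_m (the function name and sizes are illustrative):

```python
import numpy as np

def mlp_forward(E, weights, biases):
    """Forward pass of the perceptron unit: ReLU hidden layers, sigmoid output.
    weights/biases hold W_1..W_M and b_1..b_M; E is the concatenated embedding."""
    h = E
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, W @ h + b)   # h_{m+1} = ReLU(W_m h_m + b_m)
    z = weights[-1] @ h + biases[-1]     # output layer, W_M h_M + b_M
    return 1.0 / (1.0 + np.exp(-z))      # sigmoid, giving the prediction y_hat

rng = np.random.default_rng(1)
E = rng.normal(size=8)                   # toy concatenated embedding, N*d = 8
Ws = [rng.normal(size=(4, 8)), rng.normal(size=(1, 4))]
bs = [np.zeros(4), np.zeros(1)]
y_hat = mlp_forward(E, Ws, bs)
```

The sigmoid output lies strictly between 0 and 1, matching the binary-classification setting of the embodiment.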

下面继续对图6中示出的特征选择模型进行说明。The feature selection model shown in FIG6 is further described below.

在一些实施例中,特征选择模型的作用是将控制器网络给出的权重作用在嵌入后的特征E上:In some embodiments, the function of the feature selection model is to apply the weights given by the controller network to the embedded features E:

E′ = [e′_1, e′_2, …, e′_N]

where e_n is the embedded result of the n-th feature mentioned above, and e′_n is the corresponding feature after the feature selection model is applied, obtained according to the weights α_n = (α_n^0, α_n^1) given by the controller network.

For example, the feature selection model can obtain (p_n^0, p_n^1) from the weights (α_n^0, α_n^1) given by the controller network. To make the values of p_n^0 and p_n^1 more polarized (i.e., one close to 0 and the other close to 1) while keeping the network structure end-to-end differentiable, the embodiment of the present application can use a normalized exponential function (e.g., the Gumbel-Softmax function) to obtain p_n^j, as follows:

p_n^j = exp((log(α_n^j) + g_j) / τ) / Σ_{j′∈{0,1}} exp((log(α_n^{j′}) + g_{j′}) / τ)

g_j = -log(-log(u_j))

u_j ~ Uniform(0, 1)

where p_n^j denotes the new probability of selecting (j = 1) or discarding (j = 0) the n-th feature, g_j denotes independent and identically distributed Gumbel noise, j takes the value 0 or 1, τ denotes the annealing parameter (also called the temperature parameter), and Uniform(0, 1) denotes drawing a real number between 0 and 1 from the uniform distribution.
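A scalar sketch of the Gumbel-Softmax computation above for a single feature (the function name and weight values are illustrative; a lower τ pushes the probabilities toward 0 and 1):

```python
import math
import random

def gumbel_softmax_probs(alpha0, alpha1, tau=1.0, rng=None):
    """Relaxed (drop, select) probabilities for one feature:
    p_j proportional to exp((log(alpha_j) + g_j) / tau), g_j = -log(-log(u_j))."""
    rng = rng or random.Random(0)
    logits = []
    for a in (alpha0, alpha1):
        u = rng.random()
        g = -math.log(-math.log(u))           # Gumbel(0, 1) noise
        logits.append((math.log(a) + g) / tau)
    m = max(logits)                           # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return exps[0] / s, exps[1] / s

p_drop, p_select = gumbel_softmax_probs(0.2, 0.8, tau=0.1)
```

The two outputs always sum to 1; the Gumbel noise makes the choice stochastic during the search while the softmax keeps it differentiable.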

下面对本申请实施例提供的优化算法进行说明。The optimization algorithm provided in the embodiments of the present application is described below.

在一些实施例中,可以采用牛顿梯度下降法来交替更新特征信息挖掘模型的参数和控制器网络的参数。In some embodiments, the Newton gradient descent method may be used to alternately update the parameters of the feature information mining model and the parameters of the controller network.

示例的,在搜索阶段,需要优化的参数有控制器网络的参数α以及特征信息挖掘模型的参数W。由于这两部分的参数是高度相关的,如果利用一批数据同时训练这两部分的参数,将导致严重的过拟合问题。为此,本申请实施例通过设计以下两个目标函数来解决过拟合的问题:For example, in the search phase, the parameters to be optimized are the parameters α of the controller network and the parameters W of the feature information mining model. Since these two sets of parameters are highly correlated, training both with the same batch of data would cause a severe overfitting problem. To this end, the embodiments of the present application solve the overfitting problem by designing the following two objective functions:

α* = argmin_α L_val(W*(α), α)

s.t. W*(α) = argmin_W L_train(W, α)

其中,W*(α) = argmin_W L_train(W, α)对应于上述的第一目标函数,α* = argmin_α L_val(W*(α), α)对应于上述的第二目标函数;W表示特征信息挖掘模型的参数,α为控制器网络的参数(即权重),W*和α*分别表示对应限制条件下最优的参数,L_train和L_val分别表示在训练集以及验证集上的损失,且训练集和验证集可以在算法运行前随机划分。两个目标函数使用的损失函数可以均为二分类交叉熵(BCE,Binary Cross Entropy)损失函数。具体来说,损失函数可以定义为:Here, W*(α) = argmin_W L_train(W, α) corresponds to the first objective function described above, and α* = argmin_α L_val(W*(α), α) corresponds to the second objective function described above; W denotes the parameters of the feature information mining model, α denotes the parameters (i.e., weights) of the controller network, W* and α* denote the optimal parameters under the corresponding constraints, and L_train and L_val denote the losses on the training set and the validation set, respectively; the training set and the validation set can be randomly split before the algorithm runs. The loss function used by both objective functions can be the binary cross entropy (BCE) loss function. Specifically, the loss function can be defined as:

L(y, ŷ) = −(y·log ŷ + (1 − y)·log(1 − ŷ))

其中,y为真实值,ŷ为模型预测值。Here, y is the ground-truth value and ŷ is the model's predicted value.
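A minimal sketch of this BCE loss follows; the function name and the eps clamp for numerical stability are assumptions added for illustration, not part of the original text.

```python
import math

def bce_loss(y, y_hat, eps=1e-12):
    # L = -(y * log(y_hat) + (1 - y) * log(1 - y_hat))
    y_hat = min(max(y_hat, eps), 1.0 - eps)  # keep log() finite at exactly 0 or 1
    return -(y * math.log(y_hat) + (1.0 - y) * math.log(1.0 - y_hat))
```

For example, bce_loss(1.0, 0.5) equals log 2, and the loss grows as the prediction moves away from the true label.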

然而,在每一步优化时,为了得到上式中的W*(α),不可避免地需要将特征信息挖掘模型训练至收敛,这种情况下需要的计算量是无法满足的。因此,本申请实施例中可以采用一阶近似,减小计算量:However, at each optimization step, obtaining W*(α) in the above formula inevitably requires training the feature information mining model to convergence, and the amount of computation required in that case is prohibitive. Therefore, in the embodiments of the present application, a first-order approximation can be used to reduce the amount of computation:

W*(α) ≈ W,即 ∇_α L_val(W*(α), α) ≈ ∇_α L_val(W, α)。That is, W*(α) is approximated by the current W, so that ∇_α L_val(W*(α), α) ≈ ∇_α L_val(W, α).

因此,本申请实施例中最终所使用的优化算法可以为:Therefore, the optimization algorithm finally used in the embodiments of the present application can be:

W ← W − δ_W·∇_W L_train(W, α)

α ← α − δ_α·∇_α L_val(W, α)

其中,δ_W、δ_α为两部分参数分别对应的学习率,且这两部分参数的更新以一定的更新频率交替进行。Here, δ_W and δ_α are the learning rates corresponding to the two sets of parameters, and the updates of the two sets of parameters are performed alternately at a certain update frequency.
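The alternating scheme described above — a gradient step on W using the training loss, interleaved at a lower frequency with a gradient step on α using the validation loss — can be sketched with stand-in convex losses. This is a toy illustration only; the gradient functions, learning rates, and update frequency are assumptions, not values from the original text.

```python
def alternating_search(W, alpha, grad_train_W, grad_val_alpha,
                       lr_w=0.1, lr_a=0.05, steps=300, alpha_every=5):
    """Alternate W <- W - lr_w * dL_train/dW with alpha <- alpha - lr_a * dL_val/dalpha.
    Under the first-order approximation, dL_val/dalpha is evaluated at the current W."""
    for t in range(steps):
        W = W - lr_w * grad_train_W(W, alpha)
        if t % alpha_every == 0:            # controller weights updated less frequently
            alpha = alpha - lr_a * grad_val_alpha(W, alpha)
    return W, alpha

# Stand-in convex losses: L_train = (W - 1)^2 and L_val = (alpha - 0.5)^2 (illustration only).
grad_w = lambda W, a: 2.0 * (W - 1.0)
grad_a = lambda W, a: 2.0 * (a - 0.5)
W_opt, alpha_opt = alternating_search(5.0, 3.0, grad_w, grad_a)
```

With these toy losses, both parameters converge toward their respective minimizers, mirroring how the model weights and controller weights are optimized against different data splits.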

下面继续对重训练的过程进行说明。The following is a description of the retraining process.

在一些实施例中,经过搜索阶段的训练后,控制器网络中的参数α已经经过良好的训练,可以据此给出准确的特征选择结果。In some embodiments, after the training of the search phase, the parameters α of the controller network have been well trained, and accurate feature selection results can be given accordingly.

示例的,在重训练阶段中,特征选择模型可以首先根据控制器网络训练后的参数α,从所有待选择的特征中选择出最优的部分特征(例如K个特征)。在该阶段中,特征选择模型可以利用Softmax算法计算每个特征相应的得分:For example, in the retraining phase, the feature selection model can first select the optimal subset of features (e.g., K features) from all candidate features according to the trained parameters α of the controller network. In this phase, the feature selection model can use the Softmax algorithm to compute the score of each feature:

s_n = exp(α_n^1) / (exp(α_n^1) + exp(α_n^0))

在得到每个特征对应的得分s_n之后,可以选择得分最高的K个特征作为最终的特征组合。After obtaining the score s_n of each feature, the K features with the highest scores can be selected as the final feature combination.
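The scoring and top-K selection described above can be sketched as follows; the function names and the example weight values are assumptions for illustration.

```python
import math

def keep_score(alpha_keep, alpha_drop):
    # Softmax over the two architecture weights of one feature:
    # score = exp(alpha^1) / (exp(alpha^1) + exp(alpha^0))
    return math.exp(alpha_keep) / (math.exp(alpha_keep) + math.exp(alpha_drop))

def select_top_k(alphas, k):
    """alphas: one (keep, drop) weight pair per feature; returns indices of the K best."""
    scores = [keep_score(a_keep, a_drop) for (a_keep, a_drop) in alphas]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

alphas = [(2.0, -1.0), (0.1, 0.2), (-0.5, 1.5), (1.0, 0.0)]
chosen = select_top_k(alphas, k=2)  # features 0 and 3 have the highest keep scores
```

A feature whose keep weight exceeds its drop weight scores above 0.5, matching the intuition that such features are the "valid" ones worth retaining.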

此外,在选择好特征之后,本申请实施例还可以根据选择出的特征的个数K,重新建立新的特征信息挖掘模型(因为原来的特征信息挖掘模型是为N个特征设计的),例如可以将特征信息挖掘模型包括的第1层的输入尺寸从N×d调整为K×d,从而得到新的特征信息挖掘模型。In addition, after selecting the features, the embodiment of the present application can also re-establish a new feature information mining model based on the number K of selected features (because the original feature information mining model is designed for N features). For example, the input size of the first layer included in the feature information mining model can be adjusted from N×d to K×d, thereby obtaining a new feature information mining model.
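Rebuilding the model for the reduced feature count amounts to resizing the first layer from N×d inputs to K×d inputs, which can be sketched as follows (the shapes, initialization, and helper name are assumptions):

```python
import numpy as np

def build_first_layer(num_features, d, hidden, seed=0):
    """First MLP layer sized for `num_features` embeddings of dimension d each."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=(hidden, num_features * d))

N, K, d, hidden = 10, 4, 8, 16
W1_old = build_first_layer(N, d, hidden)  # original model: input size N x d
W1_new = build_first_layer(K, d, hidden)  # rebuilt model: input size K x d
```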

需要说明的是,除了使用上文提及的感知机单元之外,本申请实施例还可以将此模块替换为其他结构的推荐系统,例如因子分解机(FM,Factorization Machine)。也就是说,本申请实施例给出的特征选择结果具有良好的泛化性。It should be noted that, in addition to using the perceptron unit mentioned above, the embodiment of the present application can also replace this module with a recommendation system of other structures, such as a factorization machine (FM). In other words, the feature selection result given by the embodiment of the present application has good generalization.

下面将结合实验数据对本申请实施例提供的特征选择方法具有的有益效果进行进一步的说明。The beneficial effects of the feature selection method provided in the embodiment of the present application will be further described below in combination with experimental data.

为了评估本申请实施例提供的特征选择方法的有效性,本申请实施例进行了广泛的基于不同特征选择方法的对比实验,其中,参与对比的特征选择方法包括:主成分分析(PCA,Principal Component Analysis)、回归模型(LASSO,Least Absolute Shrinkage and Selection Operator)、梯度提升回归(GBR,Gradient Boosting Regression)、梯度提升决策树(GBDT,Gradient Boosting Decision Tree)和IDARTS,其中,本申请实施例提供的特征选择方法从Avazu数据集中丢弃了11个特征、以及从Criteo数据集中丢弃了6个特征。为了保障实验的公平性,其他特征选择方法也从两个数据集中丢弃了相同数量或者更少的无效特征,具体结果可以参见表1。In order to evaluate the effectiveness of the feature selection method provided in the embodiments of the present application, extensive comparative experiments based on different feature selection methods were conducted. The feature selection methods involved in the comparison include: Principal Component Analysis (PCA), LASSO regression (Least Absolute Shrinkage and Selection Operator), Gradient Boosting Regression (GBR), Gradient Boosting Decision Tree (GBDT), and IDARTS. The feature selection method provided in the embodiments of the present application discarded 11 features from the Avazu dataset and 6 features from the Criteo dataset. To ensure the fairness of the experiments, the other feature selection methods also discarded the same number or fewer invalid features from the two datasets. The specific results can be seen in Table 1.

从表1可以看出,本申请实施例提供的特征选择方法在丢弃了较多的特征的同时(例如在Avazu数据集中丢弃了50%的特征),还能够获得较好的表现,例如在Avazu数据集中,本申请实施例提供的特征选择方法对应的AUC(即受试者工作特征曲线(ROC,Receiver Operating Characteristic Curve)下的面积大小,能够量化地反映基于ROC曲线衡量出的模型性能,AUC越大,表明模型的效果越好)为0.7773,大于其他特征选择方法对应的AUC。此外,本申请实施例提供的特征选择方法对应的Logloss(用于评估预测是否准确的指标,其值越小越好)也小于其他特征选择方法对应的Logloss。It can be seen from Table 1 that the feature selection method provided in the embodiments of the present application achieves good performance while discarding more features (e.g., 50% of the features in the Avazu dataset). For example, on the Avazu dataset, the AUC (the area under the Receiver Operating Characteristic (ROC) curve, which quantitatively reflects model performance as measured by the ROC curve; a larger AUC indicates a better model) of the feature selection method provided in the embodiments of the present application is 0.7773, which is greater than the AUC of the other feature selection methods. In addition, the Logloss (a metric for evaluating whether predictions are accurate; the smaller the value, the better) of the feature selection method provided in the embodiments of the present application is also smaller than that of the other feature selection methods.

表1整体表现对比表Table 1 Overall performance comparison table

此外,为了进一步研究本申请实施例给出的特征选择结果的可迁移性,本申请实施例还将被选中的特征应用于六个高级深度推荐模型中,包括Avazu数据集上的因子分解机(FM,Factorization Machine)、DeepFM(从FM基础上衍生的算法,将Deep与FM相结合,用FM做特征间低阶组合,用Deep NN部分做特征间高阶组合,通过并行的方式组合两种方法)、xDeepFM(通过Deep和Cross延伸得到,其继承了Deep和Cross可以控制高阶特征交互的特点)、IPNN、Wide&Deep(WD)和DeepCrossNet(DCN),具体结果参见表2。在表2中,每个模型的第一行是所有待选择的特征对应的性能(包括AUC、Logloss、平均推理时间(Infer)等),而第二行是采用本申请实施例提供的特征选择方法选中的特征对应的性能。In addition, in order to further study the transferability of the feature selection results given in the embodiments of the present application, the embodiments of the present application also apply the selected features to six advanced deep recommendation models, including the Factorization Machine (FM) on the Avazu dataset, DeepFM (an algorithm derived from FM, combining Deep with FM, using FM for low-order combinations between features, and using the Deep NN part for high-order combinations between features, combining the two methods in parallel), xDeepFM (extended by Deep and Cross, which inherits the characteristics of Deep and Cross that can control the interaction of high-order features), IPNN, Wide&Deep (WD) and DeepCrossNet (DCN). For specific results, see Table 2. In Table 2, the first row of each model is the performance corresponding to all the features to be selected (including AUC, Logloss, average inference time (Infer)), and the second row is the performance corresponding to the features selected by the feature selection method provided in the embodiments of the present application.

从表2可以看出,对于上述所有模型,采用本申请实施例给出的特征选择结果不仅能够改善模型的表现,而且能够减少推理的时间,从而提高在线推理的效率。也就是说,本申请实施例提供的特征选择方法具有很好的泛化性,可以应用于各种各样的推荐系统。As can be seen from Table 2, for all the above models, the feature selection results given in the embodiments of the present application can not only improve the performance of the model, but also reduce the time of reasoning, thereby improving the efficiency of online reasoning. In other words, the feature selection method provided in the embodiments of the present application has good generalization and can be applied to various recommendation systems.

表2在Avazu数据集上的可迁移性分析表Table 2 Transferability analysis on the Avazu dataset

下面继续说明本申请实施例提供的特征选择装置243的实施为软件模块的示例性结构,在一些实施例中,如图2所示,存储在存储器240的特征选择装置243中的软件模块可以包括:调用模块2431、融合模块2432、更新模块2433和组合模块2434。The following continues to describe an exemplary structure of the feature selection device 243 provided in an embodiment of the present application implemented as a software module. In some embodiments, as shown in FIG. 2 , the software module stored in the feature selection device 243 in the memory 240 may include: a calling module 2431, a fusion module 2432, an updating module 2433 and a combining module 2434.

调用模块2431,用于调用特征选择模型从控制器模型读取训练样本的多个特征分别对应的权重;融合模块2432,用于将权重对应融合至每个特征;调用模块2431,还用于基于融合有权重的每个特征,调用特征信息挖掘模型执行测试任务,得到测试结果;更新模块2433,用于基于测试结果与训练样本的标记结果的差异进行反向传播处理,以更新特征信息挖掘模型和控制器模型;调用模块2431,还用于调用特征选择模型从控制器模型读取更新的权重;组合模块2434,用于基于更新的权重从多个特征中筛选出部分特征进行组合,得到组合特征;其中,组合特征用于供更新后的特征信息挖掘模型执行预测任务。The calling module 2431 is used to call the feature selection model to read the weights corresponding to multiple features of the training sample from the controller model; the fusion module 2432 is used to fuse the weights to each feature; the calling module 2431 is also used to call the feature information mining model to perform the test task and obtain the test result based on each feature fused with the weight; the updating module 2433 is used to perform back propagation processing based on the difference between the test result and the marking result of the training sample to update the feature information mining model and the controller model; the calling module 2431 is also used to call the feature selection model to read the updated weights from the controller model; the combining module 2434 is used to select some features from multiple features based on the updated weights for combination to obtain the combined feature; wherein the combined feature is used for the updated feature information mining model to perform the prediction task.

在一些实施例中,控制器模型的特征搜索空间是有向完全图,有向完全图包括:与多个特征一一对应的多个输入节点、以及表征特征选择结果的输出节点;其中,每个输入节点与输出节点之间存在两条边,分别表征输入节点对应的特征的权重,其中,权重包括用于表征特征选择模型选择特征的概率的第一权重、以及用于表征特征选择模型丢弃特征的概率的第二权重。In some embodiments, the feature search space of the controller model is a directed complete graph, and the directed complete graph includes: multiple input nodes corresponding one-to-one to multiple features, and an output node representing the feature selection results; wherein there are two edges between each input node and the output node, respectively representing the weights of the features corresponding to the input nodes, wherein the weights include a first weight for representing the probability of the feature selection model selecting the feature, and a second weight for representing the probability of the feature selection model discarding the feature.

在一些实施例中,每个特征对应的权重包括第一权重和第二权重,第一权重用于表征特征选择模型选择特征的概率,第二权重用于表征特征选择模型丢弃特征的概率;特征选择装置243还包括确定模块2435,还用于针对每个特征执行以下处理:对初始格式为离散格式的特征进行二元化处理,得到二元化后的特征,对二元化后的特征进行嵌入处理,得到嵌入后的特征,其中,嵌入后的特征的特征空间的维度,小于二元化后的特征的特征空间的维度;确定嵌入后的特征对应的第一权重与第一系数的第一相乘结果、以及嵌入后的特征对应的第二权重与第二系数的第二相乘结果,并对第一相乘结果和第二相乘结果进行求和处理,得到第一求和结果;将第一求和结果和嵌入后的特征的相乘结果,确定为融合有权重的特征。In some embodiments, the weight corresponding to each feature includes a first weight and a second weight, the first weight is used to characterize the probability of the feature selection model selecting the feature, and the second weight is used to characterize the probability of the feature selection model discarding the feature; the feature selection device 243 also includes a determination module 2435, which is also used to perform the following processing for each feature: binarizing the features whose initial format is a discrete format to obtain binarized features, embedding the binarized features to obtain embedded features, wherein the dimension of the feature space of the embedded features is smaller than the dimension of the feature space of the binarized features; determining a first multiplication result of the first weight corresponding to the embedded feature and the first coefficient, and a second multiplication result of the second weight corresponding to the embedded feature and the second coefficient, and summing the first multiplication result and the second multiplication result to obtain a first summation result; determining the first summation result and the multiplication result of the embedded feature as a feature fused with weights.

在一些实施例中,每个特征对应的权重包括第一权重和第二权重,第一权重用于表征特征选择模型选择特征的概率,第二权重用于表征特征选择模型丢弃特征的概率;确定模块2435,还用于针对每个特征执行以下处理:对初始格式为离散格式的特征进行二元化处理,得到二元化后的特征,对二元化后的特征进行嵌入处理,得到嵌入后的特征,其中,嵌入后的特征的特征空间的维度,小于二元化后的特征的特征空间的维度;基于嵌入后的特征对应的第一权重和第二权重,确定第一中间参数和第二中间参数;确定第一中间参数与第一系数的第三相乘结果、以及第二中间参数与第二系数的第四相乘结果,并对第三相乘结果和第四相乘结果进行求和处理,得到第二求和结果;将第二求和结果和嵌入后的特征的相乘结果,确定为融合有权重的特征。In some embodiments, the weight corresponding to each feature includes a first weight and a second weight, the first weight is used to characterize the probability of the feature selection model selecting the feature, and the second weight is used to characterize the probability of the feature selection model discarding the feature; the determination module 2435 is also used to perform the following processing for each feature: binarize the features whose initial format is a discrete format to obtain binarized features, embed the binarized features to obtain embedded features, wherein the dimension of the feature space of the embedded features is smaller than the dimension of the feature space of the binarized features; determine the first intermediate parameter and the second intermediate parameter based on the first weight and the second weight corresponding to the embedded features; determine the third multiplication result of the first intermediate parameter and the first coefficient, and the fourth multiplication result of the second intermediate parameter and the second coefficient, and sum the third multiplication result and the fourth multiplication result to obtain a second summation result; determine the second summation result and the multiplication result of the embedded features as a weighted feature.

在一些实施例中,确定模块2435,还用于基于嵌入后的特征对应的第一权重和第二权重调用归一化指数函数进行处理,得到第一中间参数和第二中间参数;其中,归一化指数函数满足以下条件:使得第一中间参数和第二中间参数的取值边缘化;保持控制器模型的可微分性。In some embodiments, the determination module 2435 is also used to call a normalized exponential function based on the first weight and the second weight corresponding to the embedded features to obtain a first intermediate parameter and a second intermediate parameter; wherein the normalized exponential function satisfies the following conditions: marginalizing the values of the first intermediate parameter and the second intermediate parameter; and maintaining the differentiability of the controller model.

在一些实施例中,确定模块2435,还用于基于嵌入后的特征对应的第一权重和第二权重调用归一化指数函数执行以下处理:将第一权重的取对数结果与第一噪声系数的求和结果与退火参数进行相除处理,得到第一相除结果;将第二权重的取对数结果与第二噪声系数的求和结果与退火参数进行相除处理,得到第二相除结果;对第一相除结果的取指数结果与第二相除结果的取指数结果进行求和处理,得到第三求和结果;将第一相除结果的取指数结果与第三求和结果的相除结果,确定为第一中间参数;将第二相除结果的取指数结果与第三求和结果的相除结果,确定为第二中间参数。In some embodiments, the determination module 2435 is also used to call the normalized exponential function to perform the following processing based on the first weight and the second weight corresponding to the embedded feature: divide the sum of the logarithm result of the first weight and the first noise coefficient by the annealing parameter to obtain a first division result; divide the sum of the logarithm result of the second weight and the second noise coefficient by the annealing parameter to obtain a second division result; sum the exponential result of the first division result and the exponential result of the second division result to obtain a third summation result; determine the division result of the exponential result of the first division result and the third summation result as the first intermediate parameter; determine the division result of the exponential result of the second division result and the third summation result as the second intermediate parameter.

在一些实施例中,特征信息挖掘模型包括嵌入单元和感知机单元;调用模块2431,还用于基于融合有权重的每个嵌入后的特征,调用感知机单元执行测试任务,得到测试结果;其中,嵌入后的特征是调用嵌入单元针对初始格式为离散格式的特征执行以下处理得到的:对初始格式为离散格式的特征进行二元化处理,得到二元化后的特征,对二元化后的特征进行嵌入处理,得到嵌入后的特征,其中,嵌入后的特征的特征空间的维度,小于二元化后的特征的特征空间的维度。In some embodiments, the feature information mining model includes an embedding unit and a perceptron unit; the calling module 2431 is also used to call the perceptron unit to perform a test task based on each embedded feature fused with a weight, and obtain a test result; wherein the embedded feature is obtained by calling the embedding unit to perform the following processing on the feature whose initial format is a discrete format: binarizing the feature whose initial format is a discrete format to obtain the binarized feature, and embedding the binarized feature to obtain the embedded feature, wherein the dimension of the feature space of the embedded feature is smaller than the dimension of the feature space of the binarized feature.

在一些实施例中,感知机单元是多层感知机,多层感知机包括多个层,每个层包括对应的权重矩阵和偏置向量;调用模块2431,还用于将融合有权重的每个嵌入后的特征作为多层感知机的第1层的输入,并迭代m执行以下处理:对第m层的权重矩阵与第m层的输入的相乘结果,与第m层的偏置向量进行求和处理,得到第四求和结果;对第四求和结果进行线性变换处理,得到第m层的输出,并将第m层的输出作为第m+1层的输入;其中,m的取值逐步递增且满足1≤m≤M-1,M为多个层的总数;对第M层的权重矩阵与第M层的输入的相乘结果,与第M层的偏置向量进行求和处理,得到第五求和结果;确定与测试任务对应的非线性激活函数;基于非线性激活函数对第五求和结果进行非线性激活处理,得到测试结果。In some embodiments, the perceptron unit is a multilayer perceptron, which includes multiple layers, each layer including a corresponding weight matrix and a bias vector; calling module 2431 is also used to use each embedded feature fused with weights as the input of the first layer of the multilayer perceptron, and iterate m to perform the following processing: summing the multiplication result of the weight matrix of the mth layer and the input of the mth layer with the bias vector of the mth layer to obtain a fourth summation result; performing linear transformation processing on the fourth summation result to obtain the output of the mth layer, and using the output of the mth layer as the input of the m+1th layer; wherein the value of m gradually increases and satisfies 1≤m≤M-1, and M is the total number of multiple layers; summing the multiplication result of the weight matrix of the Mth layer and the input of the Mth layer with the bias vector of the Mth layer to obtain a fifth summation result; determining a nonlinear activation function corresponding to the test task; performing nonlinear activation processing on the fifth summation result based on the nonlinear activation function to obtain a test result.

在一些实施例中,特征选择装置243还包括划分模块2436,用于将多个不同的训练样本划分为训练集和验证集;更新模块2433,还用于将训练集对应的测试结果与训练集的标记结果代入第一目标函数,以确定对应的第一差异,根据第一差异确定特征信息挖掘模型的第一梯度,并根据第一梯度更新特征信息挖掘模型的参数;以及用于将验证集对应的测试结果与验证集的标记结果代入第二目标函数,以确定对应的第二差异,根据第二差异确定控制器模型的第二梯度,并根据第二梯度更新控制器模型的权重;其中,第一目标函数以控制器模型和特征信息挖掘模型的损失为因子,第二目标函数以控制器模型和特征信息挖掘模型的损失为因子,且第一目标函数的优化目标与第二目标函数的优化目标不同。In some embodiments, the feature selection device 243 also includes a division module 2436, which is used to divide multiple different training samples into a training set and a validation set; the update module 2433 is also used to substitute the test results corresponding to the training set and the labeled results of the training set into the first objective function to determine the corresponding first difference, determine the first gradient of the feature information mining model according to the first difference, and update the parameters of the feature information mining model according to the first gradient; and to substitute the test results corresponding to the validation set and the labeled results of the validation set into the second objective function to determine the corresponding second difference, determine the second gradient of the controller model according to the second difference, and update the weight of the controller model according to the second gradient; wherein the first objective function uses the loss of the controller model and the feature information mining model as a factor, the second objective function uses the loss of the controller model and the feature information mining model as a factor, and the optimization target of the first objective function is different from the optimization target of the second objective function.

在一些实施例中,每个特征对应的权重包括第一权重和第二权重,第一权重用于表征特征选择模型选择特征的概率,第二权重用于表征特征选择模型丢弃特征的概率;其中,当控制器模型初始化时,特征对应的第一权重和第二权重的取值相同;当控制器模型更新后,当特征为有效特征时,第一权重大于第二权重;当特征为无效特征时,第一权重小于第二权重。In some embodiments, the weight corresponding to each feature includes a first weight and a second weight, the first weight is used to characterize the probability of the feature selection model selecting the feature, and the second weight is used to characterize the probability of the feature selection model discarding the feature; wherein, when the controller model is initialized, the first weight and the second weight corresponding to the feature have the same value; when the controller model is updated, when the feature is a valid feature, the first weight is greater than the second weight; when the feature is an invalid feature, the first weight is less than the second weight.

在一些实施例中,第一目标函数和第二目标函数使用的损失函数均为二分类交叉熵损失函数;确定模块2435,还用于通过二分类交叉熵损失函数执行以下处理:对训练集的标记结果与训练集对应的测试结果的取对数结果进行相乘处理,得到第五相乘结果;确定1与训练集的标记结果之间的第一差值,以及1与训练集对应的测试结果之间的第二差值的取对数结果,并对第一差值与第二差值的取对数结果进行相乘处理,得到第六相乘结果;将第五相乘结果与第六相乘结果的求和结果,确定为第一差异;以及用于通过二分类交叉熵损失函数执行以下处理:对验证集的标记结果与验证集对应的测试结果的取对数结果进行相乘处理,得到第七相乘结果;确定1与验证集的标记结果之间的第三差值,以及1与验证集对应的测试结果之间的第四差值的取对数结果,并对第三差值与第四差值的取对数结果进行相乘处理,得到第八相乘结果;将第七相乘结果与第八相乘结果的求和结果,确定为第二差异。In some embodiments, the loss functions used by the first objective function and the second objective function are both binary cross-entropy loss functions; the determination module 2435 is further configured to perform the following processing through the binary cross-entropy loss function: multiplying the labeling result of the training set by the logarithm of the test result corresponding to the training set to obtain a fifth multiplication result; determining a first difference value between 1 and the labeling result of the training set, and the logarithm of a second difference value between 1 and the test result corresponding to the training set, and multiplying the first difference value by the logarithm of the second difference value to obtain a sixth multiplication result; determining the sum of the fifth multiplication result and the sixth multiplication result as the first difference; and to perform the following processing through the binary cross-entropy loss function: multiplying the labeling result of the validation set by the logarithm of the test result corresponding to the validation set to obtain a seventh multiplication result; determining a third difference value between 1 and the labeling result of the validation set, and the logarithm of a fourth difference value between 1 and the test result corresponding to the validation set, and multiplying the third difference value by the logarithm of the fourth difference value to obtain an eighth multiplication result; and determining the sum of the seventh multiplication result and the eighth multiplication result as the second difference.

在一些实施例中,调用模块2431,还用于调用特征选择模型从控制器模型读取更新的权重;确定模块2435,还用于基于更新的权重,确定多个特征分别对应的得分;组合模块2434,还用于从多个特征中筛选出得分大于得分阈值的至少部分特征进行组合,得到组合特征。In some embodiments, the calling module 2431 is also used to call the feature selection model to read the updated weights from the controller model; the determining module 2435 is also used to determine the scores corresponding to multiple features based on the updated weights; the combining module 2434 is also used to filter out at least some features whose scores are greater than the score threshold from the multiple features for combination to obtain combined features.

在一些实施例中,每个特征对应的更新的权重包括更新的第一权重和更新的第二权重,其中,更新的第一权重用于表征特征选择模型选择特征的概率,更新的第二权重用于表征特征选择模型丢弃特征的概率;确定模块2435,还用于针对每个特征执行以下处理:对特征对应的更新的第一权重的取指数结果,与特征对应的更新的第二权重的取指数结果进行求和处理,得到第六求和结果;将特征对应的更新的第一权重的取指数结果与第六求和结果的相除结果,确定为特征对应的得分。In some embodiments, the updated weight corresponding to each feature includes an updated first weight and an updated second weight, wherein the updated first weight is used to characterize the probability of the feature selection model selecting the feature, and the updated second weight is used to characterize the probability of the feature selection model discarding the feature; the determination module 2435 is also used to perform the following processing for each feature: summing the exponential result of the updated first weight corresponding to the feature with the exponential result of the updated second weight corresponding to the feature to obtain a sixth summation result; and determining the result of dividing the exponential result of the updated first weight corresponding to the feature by the sixth summation result as the score corresponding to the feature.

在一些实施例中,组合特征包括的特征的数量小于多个特征的数量,特征信息挖掘模型是基于多个特征构建的;特征选择装置243还包括调整模块2437,用于将特征信息挖掘模型的输入层的尺寸,从与多个特征的数量对应的尺寸调整至与组合特征包括的特征的数量对应的尺寸,得到新的特征信息挖掘模型。In some embodiments, the number of features included in the combined feature is less than the number of multiple features, and the feature information mining model is constructed based on multiple features; the feature selection device 243 also includes an adjustment module 2437, which is used to adjust the size of the input layer of the feature information mining model from the size corresponding to the number of multiple features to the size corresponding to the number of features included in the combined feature, so as to obtain a new feature information mining model.

需要说明的是,本申请实施例装置的描述,与上述方法实施例的描述是类似的,具有同方法实施例相似的有益效果,因此不做赘述。对于本申请实施例提供的特征选择装置中未尽的技术细节,可以根据图4、图5A、或图5B任一附图的说明而理解。It should be noted that the description of the device of the embodiment of the present application is similar to the description of the above-mentioned method embodiment, and has similar beneficial effects as the method embodiment, so it is not repeated. The unfinished technical details of the feature selection device provided in the embodiment of the present application can be understood according to the description of any of Figures 4, 5A, or 5B.

本申请实施例提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机指令(即可执行指令),该计算机指令存储在计算机可读存储介质中。电子设备的处理器从计算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该电子设备执行本申请实施例上述的特征选择方法。The embodiment of the present application provides a computer program product or a computer program, which includes computer instructions (i.e., executable instructions), which are stored in a computer-readable storage medium. A processor of an electronic device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the electronic device performs the feature selection method described in the embodiment of the present application.

本申请实施例提供一种存储有可执行指令的计算机可读存储介质,其中存储有可执行指令,当可执行指令被处理器执行时,将引起处理器执行本申请实施例提供的方法,例如,如图4、图5A、或图5B任一附图示出的特征选择方法。An embodiment of the present application provides a computer-readable storage medium storing executable instructions, wherein executable instructions are stored. When the executable instructions are executed by a processor, the processor will execute the method provided by the embodiment of the present application, for example, the feature selection method shown in any of Figures 4, 5A, or 5B.

在一些实施例中,计算机可读存储介质可以是FRAM、ROM、PROM、EPROM、EEPROM、闪存、磁表面存储器、光盘、或CD-ROM等存储器;也可以是包括上述存储器之一或任意组合的各种设备。In some embodiments, the computer-readable storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface storage, optical disk, or CD-ROM; or it may be various devices including one or any combination of the above memories.

在一些实施例中,可执行指令可以采用程序、软件、软件模块、脚本或代码的形式,按任意形式的编程语言(包括编译或解释语言,或者声明性或过程性语言)来编写,并且其可按任意形式部署,包括被部署为独立的程序或者被部署为模块、组件、子例程或者适合在计算环境中使用的其它单元。In some embodiments, executable instructions may be in the form of a program, software, software module, script or code, written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine or other unit suitable for use in a computing environment.

作为示例,可执行指令可以但不一定对应于文件系统中的文件,可以可被存储在保存其它程序或数据的文件的一部分,例如,存储在超文本标记语言(HTML,Hyper TextMarkup Language)文档中的一个或多个脚本中,存储在专用于所讨论的程序的单个文件中,或者,存储在多个协同文件(例如,存储一个或多个模块、子程序或代码部分的文件)中。As an example, executable instructions may, but need not, correspond to a file in a file system, may be stored as part of a file that stores other programs or data, such as in one or more scripts in a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files storing one or more modules, subroutines, or code portions).

作为示例,可执行指令可被部署为在一个电子设备上执行,或者在位于一个地点的多个电子设备上执行,又或者,在分布在多个地点且通过通信网络互连的多个电子设备上执行。As an example, the executable instructions may be deployed to be executed on one electronic device, or on multiple electronic devices located at one site, or on multiple electronic devices distributed at multiple sites and interconnected by a communication network.

以上所述,仅为本申请的实施例而已,并非用于限定本申请的保护范围。凡在本申请的精神和范围之内所作的任何修改、等同替换和改进等,均包含在本申请的保护范围之内。The above is only an embodiment of the present application and is not intended to limit the protection scope of the present application. Any modifications, equivalent substitutions and improvements made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims (18)

1. A feature selection method, characterized in that the method comprises:
calling a feature selection model to read, from a controller model, weights respectively corresponding to a plurality of features of a training sample, and fusing each weight into the corresponding feature;
calling a feature information mining model to perform a test task based on each feature fused with its weight, to obtain a test result;
performing backpropagation based on a difference between the test result and a labeled result of the training sample, to update the feature information mining model and the controller model; and
calling the feature selection model to read the updated weights from the controller model, and screening out, based on the updated weights, some of the plurality of features for combination, to obtain a combined feature;
wherein the combined feature is used by the updated feature information mining model to perform a prediction task.

2. The method according to claim 1, characterized in that
a feature search space of the controller model is a directed complete graph comprising: a plurality of input nodes in one-to-one correspondence with the plurality of features, and an output node representing a feature selection result; wherein two edges exist between each input node and the output node, respectively representing the weights of the feature corresponding to the input node, the weights comprising a first weight representing a probability that the feature selection model selects the feature, and a second weight representing a probability that the feature selection model discards the feature.

3. The method according to claim 1, characterized in that
the weights corresponding to each feature comprise a first weight and a second weight, the first weight representing a probability that the feature selection model selects the feature, and the second weight representing a probability that the feature selection model discards the feature;
the fusing each weight into the corresponding feature comprises:
performing the following processing for each feature:
binarizing the feature, whose initial format is a discrete format, to obtain a binarized feature, and embedding the binarized feature to obtain an embedded feature, wherein a dimension of the feature space of the embedded feature is smaller than a dimension of the feature space of the binarized feature;
determining a first multiplication result of the first weight corresponding to the embedded feature and a first coefficient, and a second multiplication result of the second weight corresponding to the embedded feature and a second coefficient, and summing the first multiplication result and the second multiplication result to obtain a first summation result; and
determining a multiplication result of the first summation result and the embedded feature as the feature fused with the weight.

4. The method according to claim 1, characterized in that
the weights corresponding to each feature comprise a first weight and a second weight, the first weight representing a probability that the feature selection model selects the feature, and the second weight representing a probability that the feature selection model discards the feature;
the fusing each weight into the corresponding feature comprises:
performing the following processing for each feature:
binarizing the feature, whose initial format is a discrete format, to obtain a binarized feature, and embedding the binarized feature to obtain an embedded feature, wherein a dimension of the feature space of the embedded feature is smaller than a dimension of the feature space of the binarized feature;
determining a first intermediate parameter and a second intermediate parameter based on the first weight and the second weight corresponding to the embedded feature;
determining a third multiplication result of the first intermediate parameter and a first coefficient, and a fourth multiplication result of the second intermediate parameter and a second coefficient, and summing the third multiplication result and the fourth multiplication result to obtain a second summation result; and
determining a multiplication result of the second summation result and the embedded feature as the feature fused with the weight.

5. The method according to claim 4, characterized in that the determining a first intermediate parameter and a second intermediate parameter based on the first weight and the second weight corresponding to the embedded feature comprises:
invoking a normalized exponential function to process the first weight and the second weight corresponding to the embedded feature, to obtain the first intermediate parameter and the second intermediate parameter;
wherein the normalized exponential function satisfies the following conditions: marginalizing the values of the first intermediate parameter and the second intermediate parameter; and maintaining the differentiability of the controller model.

6. The method according to claim 5, characterized in that the invoking a normalized exponential function to process the first weight and the second weight corresponding to the embedded feature, to obtain the first intermediate parameter and the second intermediate parameter, comprises:
invoking the normalized exponential function to perform the following processing based on the first weight and the second weight corresponding to the embedded feature:
dividing a summation result of a logarithm of the first weight and a first noise coefficient by an annealing parameter, to obtain a first division result;
dividing a summation result of a logarithm of the second weight and a second noise coefficient by the annealing parameter, to obtain a second division result;
summing an exponential of the first division result and an exponential of the second division result, to obtain a third summation result;
determining a division result of the exponential of the first division result and the third summation result as the first intermediate parameter; and
determining a division result of the exponential of the second division result and the third summation result as the second intermediate parameter.

7. The method according to claim 1, characterized in that
the feature information mining model comprises an embedding unit and a perceptron unit;
the calling a feature information mining model to perform a test task based on each feature fused with its weight, to obtain a test result, comprises:
calling the perceptron unit to perform the test task based on each embedded feature fused with its weight, to obtain the test result;
wherein the embedded feature is obtained by calling the embedding unit to perform the following processing on the feature whose initial format is a discrete format:
binarizing the feature to obtain a binarized feature, and embedding the binarized feature to obtain the embedded feature, wherein a dimension of the feature space of the embedded feature is smaller than a dimension of the feature space of the binarized feature.

8. The method according to claim 7, characterized in that
the perceptron unit is a multi-layer perceptron comprising a plurality of layers, each layer comprising a corresponding weight matrix and bias vector;
the calling the perceptron unit to perform the test task based on each embedded feature fused with its weight, to obtain the test result, comprises:
taking each embedded feature fused with its weight as an input of a first layer of the multi-layer perceptron, and iteratively performing, over m, the following processing:
summing a multiplication result of the weight matrix of an m-th layer and an input of the m-th layer with the bias vector of the m-th layer, to obtain a fourth summation result;
performing a linear transformation on the fourth summation result to obtain an output of the m-th layer, and taking the output of the m-th layer as an input of an (m+1)-th layer;
wherein the value of m increases stepwise and satisfies 1≤m≤M-1, M being the total number of the plurality of layers;
summing a multiplication result of the weight matrix of an M-th layer and an input of the M-th layer with the bias vector of the M-th layer, to obtain a fifth summation result;
determining a nonlinear activation function corresponding to the test task; and
performing nonlinear activation on the fifth summation result based on the nonlinear activation function, to obtain the test result.

9. The method according to claim 1, characterized in that the performing backpropagation based on a difference between the test result and a labeled result of the training sample, to update the feature information mining model and the controller model, comprises:
dividing a plurality of different training samples into a training set and a validation set;
substituting test results corresponding to the training set and labeled results of the training set into a first objective function to determine a corresponding first difference, determining a first gradient of the feature information mining model according to the first difference, and updating parameters of the feature information mining model according to the first gradient; and
substituting test results corresponding to the validation set and labeled results of the validation set into a second objective function to determine a corresponding second difference, determining a second gradient of the controller model according to the second difference, and updating the weights of the controller model according to the second gradient;
wherein the first objective function takes losses of the controller model and the feature information mining model as factors, the second objective function takes losses of the controller model and the feature information mining model as factors, and an optimization objective of the first objective function differs from that of the second objective function.

10. The method according to claim 9, characterized in that
the weights corresponding to each feature comprise a first weight and a second weight, the first weight representing a probability that the feature selection model selects the feature, and the second weight representing a probability that the feature selection model discards the feature;
wherein, when the controller model is initialized, the first weight and the second weight corresponding to the feature take the same value; after the controller model is updated, the first weight is greater than the second weight when the feature is a valid feature, and the first weight is smaller than the second weight when the feature is an invalid feature.

11. The method according to claim 9, characterized in that the loss functions used by the first objective function and the second objective function are both binary cross-entropy loss functions;
the substituting the test results corresponding to the training set and the labeled results of the training set into the first objective function to determine the corresponding first difference comprises:
performing the following processing through the binary cross-entropy loss function:
multiplying the labeled result of the training set by a logarithm of the test result corresponding to the training set, to obtain a fifth multiplication result;
determining a first difference value between 1 and the labeled result of the training set, and a logarithm of a second difference value between 1 and the test result corresponding to the training set, and multiplying the first difference value by the logarithm of the second difference value, to obtain a sixth multiplication result; and
determining a summation result of the fifth multiplication result and the sixth multiplication result as the first difference;
the substituting the test results corresponding to the validation set and the labeled results of the validation set into the second objective function to determine the corresponding second difference comprises:
performing the following processing through the binary cross-entropy loss function:
multiplying the labeled result of the validation set by a logarithm of the test result corresponding to the validation set, to obtain a seventh multiplication result;
determining a third difference value between 1 and the labeled result of the validation set, and a logarithm of a fourth difference value between 1 and the test result corresponding to the validation set, and multiplying the third difference value by the logarithm of the fourth difference value, to obtain an eighth multiplication result; and
determining a summation result of the seventh multiplication result and the eighth multiplication result as the second difference.

12. The method according to claim 1, characterized in that the calling the feature selection model to read the updated weights from the controller model, and screening out, based on the updated weights, some of the plurality of features for combination, to obtain a combined feature, comprises:
calling the feature selection model to read the updated weights from the controller model, and determining, based on the updated weights, scores respectively corresponding to the plurality of features; and
screening out, from the plurality of features, at least some features whose scores are greater than a score threshold, for combination, to obtain the combined feature.

13. The method according to claim 12, characterized in that
the updated weights corresponding to each feature comprise an updated first weight and an updated second weight, the updated first weight representing a probability that the feature selection model selects the feature, and the updated second weight representing a probability that the feature selection model discards the feature;
the determining, based on the updated weights, scores respectively corresponding to the plurality of features comprises:
performing the following processing for each feature:
summing an exponential of the updated first weight corresponding to the feature and an exponential of the updated second weight corresponding to the feature, to obtain a sixth summation result; and
determining a division result of the exponential of the updated first weight corresponding to the feature and the sixth summation result as the score corresponding to the feature.

14. The method according to claim 1, characterized in that
the number of features included in the combined feature is smaller than the number of the plurality of features, and the feature information mining model is constructed based on the plurality of features;
after the screening out, based on the updated weights, some of the plurality of features for combination, to obtain a combined feature, the method further comprises:
adjusting a size of an input layer of the feature information mining model from a size corresponding to the number of the plurality of features to a size corresponding to the number of features included in the combined feature, to obtain a new feature information mining model.

15. A feature selection apparatus, characterized in that the apparatus comprises:
a calling module, configured to call a feature selection model to read, from a controller model, weights respectively corresponding to a plurality of features of a training sample;
a fusion module, configured to fuse each weight into the corresponding feature;
the calling module being further configured to call a feature information mining model to perform a test task based on each feature fused with its weight, to obtain a test result;
an update module, configured to perform backpropagation based on a difference between the test result and a labeled result of the training sample, to update the feature information mining model and the controller model;
the calling module being further configured to call the feature selection model to read the updated weights from the controller model; and
a combination module, configured to screen out, based on the updated weights, some of the plurality of features for combination, to obtain a combined feature;
wherein the combined feature is used by the updated feature information mining model to perform a prediction task.

16. An electronic device, characterized in that the electronic device comprises:
a memory, configured to store executable instructions; and
a processor, configured to implement the feature selection method according to any one of claims 1 to 14 when executing the executable instructions stored in the memory.

17. A computer-readable storage medium, characterized in that it stores executable instructions which, when executed by a processor, implement the feature selection method according to any one of claims 1 to 14.

18. A computer program product, comprising a computer program or instructions, characterized in that the computer program or instructions, when executed by a processor, implement the feature selection method according to any one of claims 1 to 14.
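The scoring and screening steps of claims 12 and 13 amount to a per-feature softmax over the two updated weights followed by thresholding. The following is a minimal Python sketch under assumed names; the threshold value and the dictionary layout are illustrative, not specified by the application.

```python
import math

def feature_scores(updated_weights):
    """Per-feature score: exp(w_select) / (exp(w_select) + exp(w_drop)).

    `updated_weights` maps feature name -> (first_weight, second_weight),
    i.e. the updated select/drop weights read from the controller model.
    """
    scores = {}
    for name, (w_select, w_drop) in updated_weights.items():
        e_select, e_drop = math.exp(w_select), math.exp(w_drop)
        # The denominator plays the role of the "sixth summation result".
        scores[name] = e_select / (e_select + e_drop)
    return scores

def select_combined_feature(updated_weights, score_threshold=0.5):
    # Screen out the features whose score exceeds the threshold; their
    # combination serves as input to the pruned prediction model.
    scores = feature_scores(updated_weights)
    return sorted(name for name, s in scores.items() if s > score_threshold)
```

For example, a feature whose select-weight exceeds its drop-weight scores above 0.5 and is kept, while a feature dominated by its drop-weight scores below 0.5 and is discarded.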
CN202210332002.5A 2022-03-30 2022-03-30 Feature selection method, device, electronic equipment and storage medium Pending CN116933049A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210332002.5A CN116933049A (en) 2022-03-30 2022-03-30 Feature selection method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210332002.5A CN116933049A (en) 2022-03-30 2022-03-30 Feature selection method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116933049A (en) 2023-10-24

Family

ID=88374099

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210332002.5A Pending CN116933049A (en) 2022-03-30 2022-03-30 Feature selection method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116933049A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119226584A (en) * 2024-11-29 2024-12-31 苏州元脑智能科技有限公司 A data screening method, device, computer program product, equipment and medium

Similar Documents

Publication Publication Date Title
US12210577B2 (en) Method and apparatus for training search recommendation model, and method and apparatus for sorting search results
US11531867B2 (en) User behavior prediction method and apparatus, and behavior prediction model training method and apparatus
CN112085565B (en) Deep learning-based information recommendation method, device, equipment and storage medium
CN110909182B (en) Multimedia resource searching method, device, computer equipment and storage medium
WO2019047790A1 (en) Method and system for generating combined features of machine learning samples
CN111080360B (en) Behavior prediction method, model training method, device, server and storage medium
CN112085615B (en) Training method and device for graphic neural network
CN111797320B (en) Data processing method, device, equipment and storage medium
CN112990486A (en) Method and system for generating combined features of machine learning samples
CN111611488B (en) Information recommendation method and device based on artificial intelligence and electronic equipment
US20220261591A1 (en) Data processing method and apparatus
WO2024041483A1 (en) Recommendation method and related device
CN111783893A (en) Method and system for generating combined features of machine learning samples
US20240211991A1 (en) Data processing method, apparatus, and computer-readable storage medium
CN109690581A (en) User guided system and method
WO2025001486A1 (en) Recommendation method and apparatus
CN115438755A (en) Incremental training method and device of classification model and computer equipment
CN116933049A (en) Feature selection method, device, electronic equipment and storage medium
CN114897607A (en) Data processing method and device for product resources, electronic equipment and storage medium
EP4398128A1 (en) Recommendation method and related device
CN116226540A (en) End-to-end federation personalized recommendation method and system based on user interest domain
CN116308640A (en) Recommendation method and related device
CN115795156A (en) Material recall and neural network training method, device, equipment and storage medium
CN112580916B (en) Data evaluation method, device, computer equipment and storage medium
CN110362774B (en) Method and system for establishing click rate estimation model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination