CN117973872B - Supply chain risk identification method and device, electronic equipment and storage medium

Info

Publication number: CN117973872B
Application number: CN202410177740.6A
Authority: CN
Inventors: 张梦雨
Original assignee: Beijing Panto Data Technology Co ltd
Current assignee: Beijing Panto Data Technology Co ltd
Priority date: 2024-02-08
Filing date: 2024-02-08
Publication date: 2024-09-20
Anticipated expiration: 2044-02-08
Also published as: CN117973872A

Abstract

The application provides a supply chain risk identification method, a supply chain risk identification device, electronic equipment and a storage medium. The method comprises the following steps: preliminarily identifying risk categories and determining risk characteristics of the risk categories; searching for information by using initial feature words to obtain an initial information sample; identifying related events in the initial information sample, expanding a new information sample by using the events, and generating an information sample set; carrying out syntactic analysis and feature word recognition on the sample sentences to obtain core feature words and constructing a feature word library; selecting a plurality of core feature words according to the hit probability of the core feature words, and forming a feature word list from the selected core feature words; and training a machine learning model by using the information sample set and the feature word list as training data, and identifying supply chain risk in public text by using the trained machine learning model. The application improves the accuracy and efficiency of supply chain risk identification, and can identify possible supply chain risk points in a timely and accurate manner.

Description

Supply chain risk identification method and device, electronic equipment and storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a supply chain risk identification method, apparatus, electronic device, and storage medium.

Background

The complexity and information asymmetry of the global supply chain environment is a major challenge facing current supply chain management. This complexity and information asymmetry results in the supply chain risk being continually amplified during the transfer process, thereby having a significant impact on enterprise operation and efficiency. To address these challenges, businesses need to be able to timely and efficiently identify and evaluate potential risk points in the supply chain.

The demand for effective supply chain risk management is becoming increasingly pressing, especially in global and networked business environments. Meeting it requires extracting key information from data sources such as public news and social media by using big data acquisition technology and semantic modeling, and performing real-time monitoring and feature modeling of supplier behaviors such as capacity changes, delayed delivery, legal disputes, and labor relations.

Current supply chain risk identification techniques are primarily based on static models for multi-factor analysis. These methods typically rely on structured data such as internal data (e.g., sales data, procurement records, financial data) and external data (e.g., weather records, natural disaster records) of the supply chain. The analysis methods adopted include the Delphi method, scenario analysis, fault tree analysis, the flow chart method, and other traditional methods.

Therefore, the prior art has the problems of low accuracy and low efficiency in terms of supply chain risk identification. These methods lack real-time monitoring of enterprise activities, behaviors, and states, and cannot efficiently identify vendor risk information in time. Furthermore, the prior art relies on static data analysis models, which are difficult to adapt to fast changing market environments and complex supply chain structures.

Disclosure of Invention

In view of the above, the embodiments of the present application provide a supply chain risk identification method, apparatus, electronic device, and storage medium, so as to solve the problems in the prior art that the accuracy and efficiency of supply chain risk identification are low, and possible supply chain risk points cannot be identified timely and accurately.

In a first aspect of an embodiment of the present application, there is provided a supply chain risk identification method, including: performing preliminary identification on risk categories in a supply chain, and determining risk characteristics corresponding to the risk categories; determining initial feature words related to supply chain risks based on the risk features, and searching in a plurality of preset data sources by utilizing the initial feature words to obtain initial information samples; identifying an event related to the initial feature word in the initial information sample, expanding the initial feature word by using the event to obtain a new information sample, and generating an information sample set by using the initial information sample and the new information sample; carrying out syntactic analysis on sample sentences in the information sample set, and carrying out feature word recognition on the sample sentences subjected to the syntactic analysis to obtain core feature words related to supply chain risks, and constructing a feature word library by utilizing the core feature words; determining hit probability corresponding to the core feature words, selecting a plurality of core feature words from the feature word library according to the hit probability, and forming a feature word list by the selected plurality of core feature words; and training a predetermined machine learning model by using the information sample set and the feature word list as training data to obtain a trained machine learning model, and identifying the supply chain risk in the public text by using the trained machine learning model.

In a second aspect of an embodiment of the present application, there is provided a supply chain risk identification apparatus, including: a determining module configured to preliminarily identify risk categories in the supply chain and determine risk characteristics corresponding to the risk categories; a searching module configured to determine initial feature words related to supply chain risk based on the risk characteristics, and to search a plurality of preset data sources by using the initial feature words to obtain initial information samples; an expansion module configured to identify an event related to the initial feature word in the initial information sample, expand the initial feature word by using the event to obtain a new information sample, and generate an information sample set by using the initial information sample and the new information sample; a recognition module configured to carry out syntactic analysis on sample sentences in the information sample set, carry out feature word recognition on the syntactically analyzed sample sentences to obtain core feature words related to supply chain risk, and construct a feature word library by using the core feature words; a selection module configured to determine the hit probability corresponding to the core feature words, select a plurality of core feature words from the feature word library according to the hit probability, and form a feature word list from the selected core feature words; and a training module configured to train a predetermined machine learning model by using the information sample set and the feature word list as training data to obtain a trained machine learning model, and to identify supply chain risk in public text by using the trained machine learning model.

In a third aspect of the embodiments of the present application, there is provided an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the above method when executing the computer program.

In a fourth aspect of the embodiments of the present application, there is provided a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above method.

The above at least one technical scheme adopted by the embodiment of the application can achieve the following beneficial effects:

Preliminary identification is carried out on risk categories in a supply chain, and risk characteristics corresponding to the risk categories are determined; determining initial feature words related to supply chain risks based on the risk features, and searching in a plurality of preset data sources by utilizing the initial feature words to obtain initial information samples; identifying an event related to the initial feature word in the initial information sample, expanding the initial feature word by using the event to obtain a new information sample, and generating an information sample set by using the initial information sample and the new information sample; carrying out syntactic analysis on sample sentences in the information sample set, and carrying out feature word recognition on the sample sentences subjected to the syntactic analysis to obtain core feature words related to supply chain risks, and constructing a feature word library by utilizing the core feature words; determining hit probability corresponding to the core feature words, selecting a plurality of core feature words from the feature word library according to the hit probability, and forming a feature word list by the selected plurality of core feature words; and training a predetermined machine learning model by using the information sample set and the feature word list as training data to obtain a trained machine learning model, and identifying the supply chain risk in the public text by using the trained machine learning model. The application improves the accuracy and efficiency of supply chain risk identification, and can identify possible supply chain risk points in a timely and accurate manner.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flowchart of a supply chain risk identification method according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a supply chain risk identification device according to an embodiment of the present application;

FIG. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.

It should be understood that the various steps recited in the method embodiments of the present application may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the application is not limited in this respect.

The term "including" and variations thereof as used herein are intended to be open-ended, i.e., "including, but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Related definitions of other terms will be given in the description below. It should be noted that references to "one" and "a plurality" in this disclosure are intended to be illustrative rather than limiting, and those skilled in the art will appreciate that they should be construed as "one or more" unless the context clearly indicates otherwise.

The complexity and information asymmetry of the global supply chain environment cause supply chain risk to be continually amplified as it propagates. By using big data acquisition technology and semantic modeling, data collected from public news and social media can be used to model supplier behaviors such as capacity changes, delayed delivery, legal disputes, and labor relations, so that specific enterprise behaviors can be discovered in a timely and efficient manner, possible supply chain risk points can be identified, and data support is provided for subsequent supply chain risk assessment.

At present, supply chain risk identification commonly relies on multi-factor analysis performed with static models. Big-data-based supply chain risk identification collects structured data, such as internal data (e.g., sales data, purchasing records, and financial data) and external data (e.g., weather records and natural disaster records) of the supply chain, and performs risk identification with traditional analysis methods such as the Delphi method, scenario analysis, fault tree analysis, and the flow chart method. As a result, timely monitoring of enterprise activities, behaviors, and states is lacking, and supplier risk information cannot be identified in a timely and efficient manner.

Thus, the prior art has obvious limitations in terms of supply chain risk identification, and mainly shows insufficient adaptability to dynamic market environments and complex supply chain structures, and low accuracy and efficiency of risk identification. Therefore, there is an urgent need to develop a new supply chain risk identification method that can make full use of modern big data technology and semantic analysis to improve the accuracy and efficiency of risk identification.

It should be noted that, the supply chain in the embodiment of the present application may be considered as the whole process from the purchase of raw materials to the manufacture of finished products, and then to the delivery of the final products to consumers. This process involves a number of links including suppliers, manufacturers, warehousing, transportation, distributors, retailers, and the like. The purpose of supply chain management is to effectively coordinate these links to meet market demand at minimum cost, maximum efficiency and optimal quality. In this process, the circulation of information and the management of logistics are of vital importance to ensure smooth operation of the whole supply chain.

Supply chain risk may be considered any uncertainty or potential negative events that may be encountered during operation of the supply chain, which may lead to supply chain interruptions, reduced efficiency, increased costs, or quality problems. Supply chain risks may come from a variety of sources including, but not limited to, vendor risks (e.g., vendor bankruptcy or delayed delivery), market risks (e.g., sudden changes in demand), environmental risks (e.g., natural disasters), policy and legal risks (e.g., policy changes or legal disputes), and the like. Supply chain risk management refers to the identification, assessment and control of these risks to ensure stable and efficient operation of the supply chain.

The following describes a supply chain risk identification method and apparatus according to the present application in detail with reference to the accompanying drawings and specific embodiments.

FIG. 1 is a flowchart of a supply chain risk identification method according to an embodiment of the present application. As shown in FIG. 1, the supply chain risk identification method may specifically include:

S101, preliminarily identifying risk categories in a supply chain, and determining risk characteristics corresponding to the risk categories;

S102, determining initial feature words related to supply chain risks based on risk features, and searching in a plurality of preset data sources by utilizing the initial feature words to obtain initial information samples;

S103, identifying an event related to the initial feature word in the initial information sample, expanding the initial feature word by using the event to obtain a new information sample, and generating an information sample set by using the initial information sample and the new information sample;

S104, carrying out syntactic analysis on sample sentences in the information sample set, and carrying out feature word recognition on the sample sentences subjected to the syntactic analysis to obtain core feature words related to supply chain risks, and constructing a feature word library by utilizing the core feature words;

S105, determining hit probability corresponding to the core feature words, selecting a plurality of core feature words from a feature word library according to the hit probability, and forming a feature word list by the selected plurality of core feature words;

S106, training a predetermined machine learning model by using the information sample set and the feature word list as training data to obtain a trained machine learning model, and identifying the supply chain risk in the public text by using the trained machine learning model.

In some embodiments, preliminarily identifying risk categories in a supply chain and determining risk characteristics corresponding to the risk categories includes: preliminarily identifying the risk categories existing in the supply chain, analyzing the application scenarios, feature meanings, and interpretive definitions corresponding to each risk category, and defining the risk features and manifestations corresponding to each risk category according to the analysis results.

Specifically, first, various types of data about the supply chain are collected, including but not limited to public news stories, social media posts, industry analysis reports, business announcements, and the like. The collected data is then analyzed using natural language processing techniques to preliminarily identify supply chain risk categories that may exist. For example, risk categories such as labor relations problems, supplier financial distress, and natural disasters may be preliminarily identified by analyzing social media discussions about a particular supplier.

Further, in-depth feature analysis is performed on each risk category. This operation relies on expert experience and knowledge, combined with industry standards and historical cases, to define in detail the specific manifestations and features of each risk category. For example, for labor relations problems, possible manifestations are analyzed, such as strikes, labor negotiations, layoffs, unpaid wages, employee turnover, staff shortages, and resignations.

In one example, for supplier financial distress, risk features may include sustained losses, a rising debt-to-asset ratio, and a falling credit rating. For natural disaster risk, the risk features may include geographic location, historical disaster records, and the vulnerability of local infrastructure.

Further, the meaning of each risk feature is explained and defined to provide clear guidance for subsequent analysis and application. For example, for a strike, the definition may include the duration of the work stoppage, the number of production lines affected, and potential delivery delays.

In practical applications, this approach may be applied to a variety of scenarios. For example, a multinational company may use this method to assess the risk of its suppliers in different countries. By analyzing the collected data, the company can identify in a timely manner the labor relations problems or financial distress that particular suppliers may face, and thus take steps in advance to reduce the potential risk of supply chain interruption.

By combining data analysis with expert knowledge, the method of this embodiment provides a more systematic and comprehensive solution for supply chain risk management. It can effectively capture the semantic meaning and application scenarios of supply chain risks, which not only helps to identify and predict risks more accurately but also provides a basis for targeted risk management measures. The method therefore improves the accuracy of risk identification and enhances both the adaptability of enterprises to supply chain risks and the effectiveness of their coping strategies.
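As an illustration only, the risk categories, manifestations, and definitions described above could be recorded in a simple data structure. The class name `RiskCategory`, its field names, and the helper `initial_feature_words` are hypothetical and do not come from the application; the scenario string is an invented placeholder, and the other values reuse the strike example from the text.

```python
from dataclasses import dataclass, field


@dataclass
class RiskCategory:
    """One supply chain risk category with its expert-defined description."""
    name: str
    scenario: str  # application scenario (illustrative free text)
    manifestations: list[str] = field(default_factory=list)
    # Maps a manifestation to the attributes that define it.
    definitions: dict[str, list[str]] = field(default_factory=dict)


def initial_feature_words(category: RiskCategory) -> list[str]:
    """Use the manifestations as a starting set of feature words (an assumption)."""
    return sorted(set(category.manifestations))


labor_relations = RiskCategory(
    name="labor relations",
    scenario="workforce disputes at a supplier",  # hypothetical wording
    manifestations=["strike", "employee shortages"],
    definitions={
        "strike": [
            "duration of the work stoppage",
            "number of production lines affected",
            "potential delivery delays",
        ],
    },
)

print(initial_feature_words(labor_relations))
```

Keeping the definitions next to each manifestation is one way to give later steps the clear guidance the text calls for; other layouts would serve equally well.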

In some embodiments, identifying an event related to the initial feature word in the initial information sample and expanding the initial feature word by using the event to obtain a new information sample includes: analyzing the initial information sample to identify events related to the initial feature words; performing a secondary search on the data sources by using keywords corresponding to the events to obtain new information samples; and identifying events in the new information samples, continuing to expand the new information samples, and repeating the expansion operation until all new information samples are obtained; wherein the initial information sample comprises text sentences whose semantic expressions are related to the initial feature words.

Specifically, the key feature words associated with each type of supply chain risk are determined. For example, for labor relations risks, possible keywords include "strike", "layoffs", "labor negotiations", and so on. The sources of these keywords are analyzed and selected to ensure that they accurately reflect the specific risks in the supply chain.

Further, searches are conducted in a variety of data sources using the determined keywords, including news websites, social media platforms, business announcements, industry reports, and the like. Text sentences containing these feature words are collected, ensuring that they are related to supply chain risk and have related semantic expressions.

Further, the preliminarily collected information samples are analyzed to identify specific events or contexts related to the initial feature words. A secondary search is then performed on the data sources by using the keywords corresponding to the identified events, so as to acquire new information samples.

By continually repeating the expansion operations described above, more detailed and deeper information samples related to supply chain risk are iteratively mined and collected. In practical applications, this process can be automated with machine learning algorithms and natural language processing techniques, which improves collection efficiency and sample relevance.

Furthermore, the embodiment of the application can also control the quality of the collected information samples by evaluating their relevance and accuracy. A data cleaning operation can be added to remove noise such as information unrelated to risk or false reports. High-quality information samples can also be screened from the collected samples for subsequent risk analysis and model training.
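The iterative expansion described above can be sketched as a breadth-first loop over keywords. This is a minimal sketch under stated assumptions: `expand_samples`, the `search` and `extract_event_keywords` callables, and the `max_rounds` safeguard are hypothetical, and the toy corpus and event terms stand in for real data sources and real event recognition.

```python
from collections import deque


def expand_samples(initial_words, search, extract_event_keywords, max_rounds=3):
    """Collect samples by repeatedly searching with keywords of newly found events.

    Stops when no unseen keyword remains; max_rounds caps the expansion depth
    (an added safeguard, not part of the described method).
    """
    seen_keywords = set(initial_words)
    frontier = deque((word, 0) for word in initial_words)
    samples, seen_samples = [], set()
    while frontier:
        keyword, depth = frontier.popleft()
        for sentence in search(keyword):
            if sentence in seen_samples:
                continue  # skip samples that were already collected
            seen_samples.add(sentence)
            samples.append(sentence)
            if depth >= max_rounds:
                continue  # keep the sample but do not expand further
            for new_keyword in extract_event_keywords(sentence):
                if new_keyword not in seen_keywords:
                    seen_keywords.add(new_keyword)
                    frontier.append((new_keyword, depth + 1))
    return samples


# Toy stand-ins for the data sources and the event recognizer.
CORPUS = [
    "Workers at supplier A began a strike over unpaid wages.",
    "Unpaid wages at supplier A led to a factory shutdown.",
    "The factory shutdown delayed delivery of components.",
    "Supplier B opened a new office.",
]
EVENT_TERMS = ["unpaid wages", "factory shutdown", "delayed delivery"]


def toy_search(keyword):
    return [s for s in CORPUS if keyword.lower() in s.lower()]


def toy_extract(sentence):
    return [t for t in EVENT_TERMS if t in sentence.lower()]


samples = expand_samples(["strike"], toy_search, toy_extract)
print(len(samples))
```

In this toy run the single initial word "strike" reaches the three related sentences through two rounds of secondary search, while the unrelated sentence is never collected.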

By the method provided by the embodiment of the application, the information samples related to the risk of the supply chain can be systematically collected and expanded, and a rich data basis is provided for subsequent risk analysis and model training. By continually expanding and deepening information sample collection, businesses are enabled to more accurately and quickly identify and respond to potential risks in the supply chain.

In some embodiments, performing syntactic analysis on sample sentences in the information sample set and performing feature word recognition on the syntactically analyzed sample sentences to obtain core feature words related to supply chain risks includes: preprocessing the information sample set, and performing syntactic analysis on the sample sentences in the preprocessed information sample set by using natural language processing techniques; and identifying and labeling core words or phrases that characterize supply chain risk in the syntactically analyzed sample sentences, and taking these core words or phrases as the core feature words; wherein each feature word library corresponds to one supply chain risk category.

Specifically, the information sample set is first preprocessed, for example by data cleansing, removal of extraneous characters, and normalization of text formats, which ensures that all sample data have a consistent format suited to natural language processing. Then, each sample sentence is parsed by using natural language processing techniques, for example by identifying syntactic elements in the sentence such as subject-predicate structures, attributive clauses, and adverbial clauses. Through syntactic analysis, the structure and meaning of each sample sentence are understood in depth, so that core words or phrases representing specific risks can be identified more accurately.

Further, based on the syntactic analysis, core words or phrases that characterize supply chain risk are identified and labeled; these words are directly related to a particular supply chain risk. For example, for labor relations risks, possible core phrases include "worker strike", "payroll", and "large-scale layoffs". As another example, if risks associated with supply chain interruption are analyzed, phrases such as "delayed delivery", "production line shutdown", and "logistics bottleneck" may be labeled as key features. In practical applications, the core words or phrases can be labeled accurately by automated tools or by manual review.

Further, the identified and labeled core feature words are aggregated to construct a feature word library for each supply chain risk category. The structure of each feature word library should be clearly defined to facilitate subsequent risk analysis and model training.

Optionally, the embodiment of the application can continuously improve the accuracy of feature word recognition, and continuously optimize and update the feature word library, according to feedback from model training and practical application and according to data analysis results. In addition, the feature word library can be updated periodically to adapt to changes in the supply chain environment and to newly emerging risk types.
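The preprocessing step can be sketched with the Python standard library. This sketch covers only cleaning and sentence splitting; the syntactic analysis itself would need a parser and is not shown. The function names `preprocess` and `split_sentences` and the specific cleaning rules (HTML remnants, URLs, whitespace) are assumptions chosen for illustration.

```python
import html
import re
import unicodedata


def preprocess(sample: str) -> str:
    """Clean one raw information sample into normalized plain text."""
    text = html.unescape(sample)                # decode entities such as &nbsp;
    text = unicodedata.normalize("NFKC", text)  # unify full-width and compatibility forms
    text = re.sub(r"<[^>]+>", " ", text)        # drop leftover HTML tags
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    return re.sub(r"\s+", " ", text).strip()    # collapse whitespace


def split_sentences(text: str) -> list[str]:
    """Split after sentence-ending punctuation (a naive rule, not a real tokenizer)."""
    parts = re.split(r"(?<=[.!?。！？])\s*", text)
    return [part for part in parts if part]


raw = "<p>Workers went on strike.&nbsp; Delivery was delayed!</p>"
cleaned = preprocess(raw)
print(split_sentences(cleaned))
```

The splitting rule is deliberately simple and would mis-split abbreviations or decimal numbers; a production pipeline would more likely rely on the sentence segmenter of whichever parser performs the syntactic analysis.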
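One possible way to aggregate labeled phrases into one feature word library per risk category is sketched below. The function name `build_feature_libraries`, the `(category, phrase)` input format, and the category names are hypothetical; the normalization (lowercasing and whitespace collapsing) is an assumption.

```python
from collections import defaultdict


def build_feature_libraries(labeled_phrases):
    """Group labeled core phrases into one deduplicated library per risk category.

    labeled_phrases is an iterable of (risk_category, phrase) pairs.
    """
    libraries = defaultdict(set)
    for category, phrase in labeled_phrases:
        normalized = " ".join(phrase.lower().split())  # lowercase, collapse spaces
        if normalized:
            libraries[category].add(normalized)
    # Sorted lists give each library a stable, well-defined structure.
    return {category: sorted(words) for category, words in libraries.items()}


libraries = build_feature_libraries([
    ("labor relations", "Worker  strike"),
    ("labor relations", "worker strike"),
    ("supply interruption", "delayed delivery"),
])
print(libraries)
```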

It should be noted that, when performing the above steps, various natural language processing algorithms, such as a language model based on deep learning, may be used to improve the accuracy of the syntactic analysis and the recognition efficiency of the feature words. In addition, expert knowledge and an industry-specific vocabulary library can be combined to further improve the relevance and accuracy of feature word recognition.

By the method of the embodiment of the application, core feature words or phrases that are directly related to specific supply chain risks can be effectively extracted from a large amount of text data. These provide key data for risk identification and assessment, which is important for establishing an accurate risk prediction model and formulating an effective risk management strategy. In this way, an enterprise can more effectively identify and address potential supply chain risks, thereby maintaining the stability and efficiency of its operations.

In some embodiments, determining the hit probability corresponding to the core feature words includes: counting the number of occurrences of each core feature word in the information sample set, calculating the ratio of the number of occurrences of each core feature word to the total number of samples in the information sample set, and taking the ratio as the hit probability of that core feature word.

Specifically, all labeled core feature words are first extracted from the information sample set. The number of occurrences of each core feature word across all information samples is then counted.

Further, for each core feature word, the hit probability is calculated as the ratio of the number of occurrences of the core feature word to the total number of samples in the information sample set. Expressed as a formula: hit probability = (number of occurrences of the core feature word / total number of samples) × 100%. For example, if the phrase "delayed delivery" occurs 100 times in 1000 samples, then its hit probability is 10%.
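The hit probability calculation can be sketched as follows. The function name `hit_probabilities` is hypothetical. The sketch counts each sample at most once per feature word, which is one reading of "number of occurrences" that keeps the ratio between 0 and 1; counting every occurrence within a sample would be an equally plausible reading of the text.

```python
from collections import Counter


def hit_probabilities(samples, core_words):
    """Return, for each core feature word, the share of samples that contain it."""
    total = len(samples)
    if total == 0:
        return {word: 0.0 for word in core_words}
    counts = Counter()
    for sample in samples:
        lowered = sample.lower()
        for word in core_words:
            if word.lower() in lowered:  # a sample counts at most once per word
                counts[word] += 1
    return {word: counts[word] / total for word in core_words}


# Mirrors the example in the text: 100 hits in 1000 samples.
demo_samples = ["delayed delivery reported"] * 100 + ["no issue"] * 900
probabilities = hit_probabilities(demo_samples, ["delayed delivery", "strike"])
print(probabilities["delayed delivery"])  # 0.1, i.e. 10%
```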

It should be noted that in performing the above steps, various data processing and analysis tools, such as database management system, data analysis software, etc., may be combined. In addition, machine learning algorithms can be utilized to optimize the feature word selection and screening process, ensuring the accuracy and comprehensiveness of the selected feature word in representing supply chain risk.

In some embodiments, selecting a plurality of core feature words from the feature word library according to the hit probability includes: sorting the core feature words from high to low by hit probability, and selecting core feature words in sequence according to the sorted order until the proportion of the total samples covered by the selected core feature words exceeds a preset proportion threshold.

Specifically, all core feature words are ranked according to the corresponding hit probability. Core feature words are sequentially selected from the ordered list until the coverage (i.e., cumulative hit probability) of the selected core feature word exceeds a set threshold (e.g., 80%). That is, core feature words are sequentially selected from high to low according to the hit probability, and the selection of the core feature words is stopped until the total sample covered by the selected core feature words exceeds 80%. The finally selected core feature words can represent most risk related information.
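The selection rule can be sketched as a greedy pass over the ranked words. The function name `select_core_words` and the example probabilities are hypothetical. Following the text, coverage is taken to be the cumulative hit probability; because one sample may contain several feature words, this sum can overstate the true share of samples covered.

```python
def select_core_words(hit_prob, threshold=0.8):
    """Pick words in descending hit probability until cumulative coverage exceeds threshold."""
    ranked = sorted(hit_prob.items(), key=lambda item: item[1], reverse=True)
    selected, cumulative = [], 0.0
    for word, probability in ranked:
        if cumulative > threshold:
            break  # coverage already exceeds the threshold, stop selecting
        selected.append(word)
        cumulative += probability
    return selected, cumulative


demo_probabilities = {
    "production halt": 0.125,
    "strike": 0.5,
    "logistics bottleneck": 0.0625,
    "delayed delivery": 0.25,
}
word_list, coverage = select_core_words(demo_probabilities, threshold=0.8)
print(word_list, coverage)
```

With these illustrative values the first three words reach a cumulative coverage of 0.875, which exceeds the 0.8 threshold, so the fourth word is left out of the feature word list.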

Further, the selected core feature words are assembled into a feature word list, which serves as the basis for subsequent semantic recognition and model training.

Optionally, the embodiment of the application can also evaluate and update the feature word list periodically according to the feedback in model training and practical application so as to ensure the accuracy and the comprehensiveness of representing the specific risk. In addition, the coverage rate threshold value can be adjusted timely according to actual requirements and performance evaluation results so as to optimize the selection of feature words.

In practical application, the selection process of the core feature words can be performed through automation technology, so that efficiency and accuracy are improved. In addition, the hit probability and coverage rate of the core feature words can be visually displayed, so that analysis and interpretation are facilitated.

In this way, by accurately calculating and analyzing the hit probability of the core feature words, the embodiment of the application provides important data support for establishing an efficient supply chain risk early warning system. The resulting feature word list has high representativeness and coverage, effectively captures and represents key information related to supply chain risk, and provides strong data support for accurate risk identification and evaluation.

In some embodiments, after obtaining the trained machine learning model, the method further comprises: inputting a preset verification set into the trained machine learning model to obtain a recognition result output by the trained machine learning model, calculating recall rate and error rate according to the recognition result and a real label corresponding to a test sample in the verification set, performing performance evaluation on the trained machine learning model according to the calculation result of the recall rate and the error rate, and performing model adjustment and optimization according to the result of the performance evaluation; wherein, the machine learning model adopts a natural language processing model.

Specifically, the embodiment of the application uses the previously collected information sample set and feature vocabulary as training data. The training data contains key information and semantic expressions related to supply chain risk. Depending on the requirements of supply chain risk identification, a suitable machine learning or deep learning model is selected, for example a decision tree, a random forest, or a neural network. Given the text-processing nature of the present scheme, a model capable of processing natural language data, such as an NLP (natural language processing) model, may be preferred.

Further, the training data is used for carrying out preliminary training on the pre-configured machine learning model. During the preliminary training phase, the machine learning model will learn how to identify and predict supply chain risk based on the vocabulary in the feature vocabulary. Finally, a plurality of rounds of iterative training are carried out to obtain a trained machine learning model.

Further, a validation set or test set is used to evaluate the performance of the model. The validation set is selected or constructed from the existing data; it is distinct from the training set but reflects the same types of supply chain risk scenarios. A true label is determined for each sample in the validation set, indicating whether the sample correctly characterizes a supply chain risk.

Further, the verification set is input into the trained machine learning model, and the recognition result of the model for each verification sample is recorded. Based on the recognition results, the recall rate of the model is calculated, i.e., the proportion of positive samples correctly identified by the model among all actual positive samples. The error rate of the model is also calculated, i.e., the proportion of negative samples incorrectly identified as positive among all actual negative samples.
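The two metrics defined above can be computed as in the following minimal Python sketch. It assumes binary labels, with 1 marking a risk (positive) sample and 0 a non-risk (negative) sample; the function name `recall_and_error_rate` is illustrative and not part of the application.

```python
from typing import Sequence, Tuple


def recall_and_error_rate(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (recall, error_rate) for binary labels, 1 = risk, 0 = no risk."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # risks found
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # risks missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correct rejections

    # Recall: correctly identified positives over all actual positives.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Error rate as defined in the text: negatives wrongly identified as
    # positive over all actual negatives (the false positive rate).
    error_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return recall, error_rate
```

Returning 0.0 when a class is absent from the verification set is a convenience chosen for this sketch; in practice such a verification set would not support a meaningful evaluation.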

Further, the performance of the model is analyzed according to the calculated recall rate and error rate, and the model is adjusted and optimized according to the result of the performance evaluation. Possible adjustments include, but are not limited to: tuning model parameters, adding more training data, or altering the model structure.

In one example, settings of the model, such as the loss function, learning rate, number of layers, and number of neurons, are adjusted to improve model performance. The adjusted model can be re-evaluated using the validation set to confirm the improvement. If necessary, the steps of performance analysis and parameter adjustment are repeated until the model performance meets the preset requirements.

Further, the iterative training process is continued until the recall rate and the error rate of the model reach preset requirements, which can be set based on the accuracy and reliability required in practical applications. Once the model performance meets the preset requirements, the model is confirmed as the final model. The aim of the iterative training process is to raise the recall rate and lower the error rate of the model, so that the model accurately identifies real risks without generating excessive false alarms.
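The stopping condition of this iteration can be sketched as follows in Python. The sketch assumes that each candidate configuration is trained and evaluated by a caller-supplied function returning a (recall, error rate) pair; `tune_until_acceptable`, `train_and_evaluate`, and the parameter dictionaries are hypothetical names introduced only for illustration.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Params = Dict[str, float]


def tune_until_acceptable(
    candidates: Iterable[Params],
    train_and_evaluate: Callable[[Params], Tuple[float, float]],
    min_recall: float,
    max_error_rate: float,
) -> Optional[Params]:
    """Try configurations in order and stop at the first acceptable one.

    train_and_evaluate is assumed to train a model with the given
    parameters and return (recall, error_rate) on the verification set.
    """
    for params in candidates:
        recall, error_rate = train_and_evaluate(params)
        # Both preset requirements must hold before the model is accepted.
        if recall >= min_recall and error_rate <= max_error_rate:
            return params
    return None  # no candidate met the preset requirements
```

A result of `None` signals that further adjustment is needed, for example adding training data or changing the model structure.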

In practical application, once the model reaches a satisfactory performance level, it can be deployed into an actual supply chain risk monitoring and early warning system. The trained model is then used to perform supply chain risk identification in real scenarios, and early warnings are issued according to the identification results.

It should be noted that, in performing the above steps, various machine learning frameworks and natural language processing tools, such as TensorFlow and PyTorch, may be used to construct and optimize models. In addition, techniques such as cross-validation can be considered during model adjustment to avoid over-fitting and to ensure that the model generalizes well.
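As an illustration of the cross-validation technique mentioned above, the following Python sketch generates k-fold train/validation index splits using only the standard library. It is a simplified example, not the application's procedure: the function name `k_fold_indices` is hypothetical, and the samples are not shuffled, which a real pipeline would usually do first.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(
    n_samples: int, k: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")

    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]

    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size
```

Each sample appears in exactly one validation fold, so averaging the recall rate and error rate over the folds gives an estimate that depends less on any single split.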

Through the method of the embodiment of the application, an efficient and accurate semantic model can be constructed for identifying and predicting potential risks in a supply chain, and an efficient way is provided to evaluate and optimize machine learning models for supply chain risk identification. The method helps ensure that the model is accurate and efficient in practical application, thereby helping enterprises to better manage and cope with various risks in the supply chain. It also enables the scheme to cope with complex and changeable supply chain environments and to provide enterprises with timely risk early-warning information.

According to the technical scheme provided by the embodiment of the application, the embodiment of the application has at least the following advantages:

1) High quality machine learning model training data

The data sample collected and marked by the technical scheme is directly suitable for a machine learning system, no additional data processing or conversion is needed, and the practicability and application range of the data are greatly improved. Accurate data labeling ensures the efficiency of machine learning model training so that the model can learn and adapt to specific supply chain risk scenarios more quickly. The high quality training data provides a solid basis for constructing efficient and accurate predictive models, thereby achieving higher accuracy and reliability in supply chain risk identification.

2) Improving modeling quality and efficiency

Statistics on expression patterns provide strong guidance for building the learning corpus, so that the corpus better fits actual requirements and application scenarios. By evaluating sample sparsity and data sparsity, the technical scheme can effectively identify and make up for gaps in the data set, improving its comprehensiveness and representativeness. Together, these advantages act on the entire modeling process, improving the overall quality of the model and ensuring its effectiveness and accuracy in practical application.

3) Timely discovery of risk points that induce supply risk

Through real-time monitoring of the behavior, activity and state of suppliers, the technical scheme can capture potential risk signals, such as emergencies and unstable operation, in a timely manner. Early identification of risk points allows enterprises to take precautionary measures or countermeasures in time, thereby reducing the impact of potential risks on the supply chain and maintaining its stability and continuity. Decision makers are provided with more, and more accurate, information to support better-informed decisions, which improves the capability of enterprises to cope with supply chain risks.

In summary, the technical scheme of the application not only makes remarkable progress in improving the efficiency and quality of machine learning model training, but also provides strong technical support for enterprises to discover and cope with supply chain risks in time by effectively monitoring and analyzing the behaviors, activities and states of suppliers.

The following are examples of the apparatus of the present application that may be used to perform the method embodiments of the present application. For details not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the method of the present application.

Fig. 2 is a schematic structural diagram of a supply chain risk identification device according to an embodiment of the present application. As shown in fig. 2, the supply chain risk recognition apparatus includes:

A determining module 201 configured to primarily identify risk categories in the supply chain and determine risk features corresponding to the risk categories;

The searching module 202 is configured to determine initial feature words related to the risk of the supply chain based on the risk features, and search in a plurality of preset data sources by utilizing the initial feature words to obtain initial information samples;

The expansion module 203 is configured to identify an event related to the initial feature word in the initial information sample, expand the initial feature word by using the event to obtain a new information sample, and generate an information sample set by using the initial information sample and the new information sample;

The recognition module 204 is configured to perform syntactic analysis on the sample sentences in the information sample set, and perform feature word recognition on the sample sentences subjected to the syntactic analysis to obtain core feature words related to the risk of the supply chain, and construct a feature word library by using the core feature words;

The selecting module 205 is configured to determine a hit probability corresponding to the core feature words, select a plurality of core feature words from the feature word stock according to the hit probability, and form a feature word list from the selected plurality of core feature words;

The training module 206 is configured to use the information sample set and the feature vocabulary as training data, train a preset machine learning model with the training data to obtain a trained machine learning model, and identify supply chain risk in public text by using the trained machine learning model.

In some embodiments, the determining module 201 of fig. 2 primarily identifies risk categories existing in the supply chain, analyzes application scenarios, feature meanings and interpretation definitions corresponding to each risk category, and defines risk features and manifestations corresponding to each risk category according to analysis results.

In some embodiments, the expansion module 203 of fig. 2 analyzes the initial information sample to identify events related to the initial feature words; performs a secondary search on the data sources using keywords corresponding to the events to obtain new information samples; identifies events in the new information samples to continue expanding them, and repeats the expansion operation until all new information samples are obtained; wherein the initial information sample comprises text sentences whose semantic expressions are related to the initial feature words.
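The repeated expansion performed by this module can be sketched in Python as a breadth-first search over keywords. The sketch assumes two caller-supplied functions, `search` (keyword to matching texts) and `extract_event_keywords` (text to event keywords), and a `max_rounds` safety limit; all three are hypothetical and are not specified by the application.

```python
from typing import Callable, Iterable, List, Set


def expand_samples(
    initial_words: Iterable[str],
    search: Callable[[str], List[str]],
    extract_event_keywords: Callable[[str], List[str]],
    max_rounds: int = 5,
) -> List[str]:
    """Collect samples by searching, extracting events, and searching again."""
    samples: List[str] = []
    seen_samples: Set[str] = set()
    seen_keywords: Set[str] = set()
    frontier = list(dict.fromkeys(initial_words))  # de-duplicate, keep order

    for _ in range(max_rounds):
        if not frontier:
            break  # no new event keywords: expansion is complete
        next_frontier: List[str] = []
        for keyword in frontier:
            if keyword in seen_keywords:
                continue
            seen_keywords.add(keyword)
            for text in search(keyword):
                if text in seen_samples:
                    continue
                seen_samples.add(text)
                samples.append(text)
                # Event keywords found in a new sample drive the next search.
                for event_kw in extract_event_keywords(text):
                    if event_kw not in seen_keywords:
                        next_frontier.append(event_kw)
        frontier = next_frontier
    return samples
```

Tracking already-seen keywords and samples keeps the loop from revisiting the same events, so it terminates once no new information samples are found or the round limit is reached.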

In some embodiments, the recognition module 204 of fig. 2 preprocesses the information sample set and performs syntactic analysis on the sample sentences in the preprocessed information sample set using natural language processing techniques; identifies and marks core words or phrases that represent supply chain risk in the syntactically analyzed sample sentences, and takes these core words or phrases as core feature words; wherein each feature word library corresponds to one supply chain risk category.

In some embodiments, the selection module 205 of fig. 2 counts the number of occurrences of each core feature word in the information sample set, and calculates a ratio between the number of occurrences of each core feature word and the total number of samples in the information sample set according to the number of occurrences of each core feature word, where the ratio is taken as the hit probability of the core feature word.
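The hit probability computed by this module can be sketched in Python as follows. The sketch assumes that a word is counted at most once per sample, so that the ratio stays between 0 and 1, and that matching is a plain substring test; both are simplifying assumptions, and the function name `hit_probabilities` is illustrative only.

```python
from typing import Dict, Iterable, List


def hit_probabilities(
    samples: List[str], core_words: Iterable[str]
) -> Dict[str, float]:
    """Map each core feature word to the share of samples that contain it."""
    if not samples:
        raise ValueError("the information sample set must not be empty")

    total = len(samples)
    return {
        # Count the samples in which the word occurs, then divide by the
        # total number of samples in the information sample set.
        word: sum(1 for text in samples if word in text) / total
        for word in core_words
    }
```

For example, a word found in 2 of 4 samples receives a hit probability of 0.5, and a word found in none receives 0.0.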

In some embodiments, the selection module 205 of fig. 2 sorts the core feature words according to the hit probability from high to low, and sequentially selects the core feature words according to the sorting result until the ratio of the total coverage samples of the selected core feature words exceeds the preset ratio threshold.

In some embodiments, after obtaining the trained machine learning model, the evaluation module 207 of fig. 2 inputs a preset verification set into the trained machine learning model to obtain a recognition result output by the trained machine learning model, calculates a recall rate and an error rate according to the recognition result and a real label corresponding to a test sample in the verification set, performs performance evaluation on the trained machine learning model according to the calculation result of the recall rate and the error rate, and performs model adjustment and optimization according to the result of the performance evaluation; wherein, the machine learning model adopts a natural language processing model.

It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.

Fig. 3 is a schematic structural diagram of an electronic device 3 according to an embodiment of the present application. As shown in fig. 3, the electronic device 3 of this embodiment includes: a processor 301, a memory 302 and a computer program 303 stored in the memory 302 and executable on the processor 301. The steps of the various method embodiments described above are implemented when the processor 301 executes the computer program 303. Alternatively, the processor 301, when executing the computer program 303, performs the functions of the modules/units in the above-described device embodiments.

Illustratively, the computer program 303 may be partitioned into one or more modules/units, which are stored in the memory 302 and executed by the processor 301 to complete the present application. One or more of the modules/units may be a series of computer program instruction segments capable of performing a specific function for describing the execution of the computer program 303 in the electronic device 3.

The electronic device 3 may be an electronic device such as a desktop computer, a notebook computer, a palm computer, or a cloud server. The electronic device 3 may include, but is not limited to, a processor 301 and a memory 302. It will be appreciated by those skilled in the art that fig. 3 is merely an example of the electronic device 3 and does not constitute a limitation of the electronic device 3, and may include more or fewer components than shown, or may combine certain components, or different components, e.g., the electronic device may also include an input-output device, a network access device, a bus, etc.

The processor 301 may be a central processing unit (Central Processing Unit, CPU) or another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.

The memory 302 may be an internal storage unit of the electronic device 3, for example, a hard disk or a memory of the electronic device 3. The memory 302 may also be an external storage device of the electronic device 3, for example, a plug-in hard disk provided on the electronic device 3, a smart media card (Smart Media Card, SMC), a Secure Digital (SD) card, a flash card (Flash Card), or the like. Further, the memory 302 may also include both an internal storage unit and an external storage device of the electronic device 3. The memory 302 is used to store computer programs and other programs and data required by the electronic device. The memory 302 may also be used to temporarily store data that has been output or is to be output.

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.

In the foregoing embodiments, each embodiment is described with its own emphasis; for parts that are not described or illustrated in detail in a particular embodiment, reference may be made to the related descriptions of other embodiments.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

In the embodiments provided by the present application, it should be understood that the disclosed apparatus/computer device and method may be implemented in other manners. For example, the apparatus/computer device embodiments described above are merely illustrative, e.g., the division of modules or elements is merely a logical functional division, and there may be additional divisions of actual implementations, multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.

The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present application may implement all or part of the flow of the method of the above embodiment, or may be implemented by a computer program to instruct related hardware, and the computer program may be stored in a computer readable storage medium, where the computer program, when executed by a processor, may implement the steps of each of the method embodiments described above. The computer program may comprise computer program code, which may be in source code form, object code form, executable file or in some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth.

The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims

1. A supply chain risk identification method, comprising:

primarily identifying risk categories in a supply chain, and determining risk characteristics corresponding to the risk categories;

Determining initial feature words related to supply chain risks based on the risk features, and searching in a plurality of preset data sources by utilizing the initial feature words to obtain initial information samples;

identifying an event related to the initial feature word in the initial information sample, expanding the initial feature word by using the event to obtain a new information sample, and generating an information sample set by using the initial information sample and the new information sample;

Carrying out syntactic analysis on sample sentences in the information sample set, and carrying out feature word recognition on the sample sentences subjected to the syntactic analysis to obtain core feature words related to supply chain risks, and constructing a feature word library by utilizing the core feature words;

Determining hit probability corresponding to the core feature words, selecting a plurality of core feature words from the feature word library according to the hit probability, and forming a feature word list by the selected plurality of core feature words;

Training a preset machine learning model by using the information sample set and the feature word list as training data to obtain a trained machine learning model, and identifying supply chain risks in a public text by using the trained machine learning model;

The preliminary identification of the risk category in the supply chain and the determination of the risk characteristic corresponding to the risk category comprise:

initially identifying risk categories existing in the supply chain, analyzing application scenes, feature meanings and interpretation definitions corresponding to each risk category, and defining risk features and expression forms corresponding to each risk category according to analysis results;

The identifying the event related to the initial feature word in the initial information sample, and expanding the initial feature word by using the event to obtain a new information sample comprises the following steps:

Analyzing the initial information sample to identify events related to the initial feature words; performing secondary search on the data source by utilizing keywords corresponding to the event to obtain a new information sample; identifying an event in the new information sample so as to expand the new information sample, and repeating the expanding operation; the initial information sample comprises text sentences with related semantic expressions with the initial feature words;

The determining the hit probability corresponding to the core feature word comprises the following steps:

counting the occurrence times of each core feature word in the information sample set, calculating the ratio between the occurrence times of each core feature word and the total sample amount in the information sample set according to the occurrence times corresponding to each core feature word, and taking the ratio as the hit probability of the core feature word.

2. The method of claim 1, wherein the syntactic analyzing the sample sentences in the information sample set and performing feature word recognition on the sample sentences after the syntactic analyzing to obtain core feature words related to supply chain risk comprises:

Preprocessing the information sample set, and carrying out syntactic analysis on sample sentences in the preprocessed information sample set by utilizing a natural language processing technology; identifying and marking core words or phrases used for representing the risk of the supply chain in sample sentences analyzed by the method, and taking the core words or phrases used for representing the risk of the supply chain as core feature words; wherein each feature word stock corresponds to a supply chain risk category.

3. The method of claim 1, wherein the selecting core feature words from the feature word stock according to the hit probability comprises:

and sequencing the core feature words from high to low according to the hit probability, and sequentially selecting the core feature words according to the sequencing result until the proportion of the core feature words which are selected to cover the total sample amount exceeds a preset proportion threshold.

4. The method of claim 1, wherein after the trained machine learning model is obtained, the method further comprises:

Inputting a preset verification set into the trained machine learning model to obtain a recognition result output by the trained machine learning model, calculating recall rate and error rate according to the recognition result and a real label corresponding to a test sample in the verification set, performing performance evaluation on the trained machine learning model according to the calculation result of the recall rate and the error rate, and performing model adjustment and optimization according to the result of the performance evaluation; wherein the machine learning model adopts a natural language processing model.

5. A supply chain risk identification device, comprising:

the determining module is configured to primarily identify risk categories in a supply chain and determine risk characteristics corresponding to the risk categories;

The searching module is configured to determine initial feature words related to supply chain risks based on the risk features, and search in a plurality of preset data sources by utilizing the initial feature words to obtain initial information samples;

The expansion module is configured to identify an event related to the initial feature word in the initial information sample, expand the initial feature word by utilizing the event so as to obtain a new information sample, and generate an information sample set by utilizing the initial information sample and the new information sample;

The recognition module is configured to carry out syntactic analysis on the sample sentences in the information sample set, and carry out feature word recognition on the sample sentences subjected to the syntactic analysis so as to obtain core feature words related to the risk of the supply chain, and the core feature words are utilized to construct a feature word library;

The selection module is configured to determine hit probability corresponding to the core feature words, select a plurality of core feature words from the feature word stock according to the hit probability, and form a feature word list by the selected plurality of core feature words;

The training module is configured to train a preset machine learning model by using the information sample set and the feature word list as training data to obtain a trained machine learning model, and identify supply chain risks in a public text by using the trained machine learning model;

the determining module is used for primarily identifying risk categories existing in the supply chain, analyzing application scenes, feature meanings and interpretation definitions corresponding to each risk category, and defining risk features and expression forms corresponding to each risk category according to analysis results;

The expansion module is used for analyzing the initial information sample to identify an event related to the initial feature word; performing secondary search on the data source by utilizing keywords corresponding to the event to obtain a new information sample; identifying an event in the new information sample so as to expand the new information sample, and repeating the expanding operation; the initial information sample comprises text sentences with related semantic expressions with the initial feature words;

The selection module is used for counting the occurrence times of each core feature word in the information sample set, calculating the proportion between the occurrence times of each core feature word and the total sample amount in the information sample set according to the occurrence times corresponding to each core feature word, and taking the proportion as the hit probability of the core feature word.

6. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any one of claims 1 to 4 when executing the computer program.

7. A computer readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method according to any one of claims 1 to 4.