CN117421550A - Policy-based data analysis method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN117421550A
Authority
CN
China
Prior art keywords
policy
strategy
data
determining
objects
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311287862.2A
Other languages
Chinese (zh)
Inventor
朱志华
蔡政
李成龙
胡博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shanghai Co Ltd
Original Assignee
Tencent Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shanghai Co Ltd
Priority to CN202311287862.2A
Publication of CN117421550A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a policy-based data analysis method and device, electronic equipment and a storage medium. The method comprises: obtaining an original object set corresponding to a plurality of preset policies, the original object set comprising a first object set and a second object set, wherein each object in the first object set is an object subjected to one of the plurality of policies; sorting the objects in the original object set based on the analysis data corresponding to each object to obtain an object sequence, each object in the object sequence carrying rank information; and performing policy analysis based on the rank information carried by the objects in the experimental set and the control set corresponding to each policy to obtain a data analysis result corresponding to each policy, wherein the data analysis result characterizes whether implementing the policy makes a difference to the identification information. According to the embodiments of the application, all objects are sorted only once, and a rank-sum test can then be performed for a plurality of experiments run on the online experiment platform, which saves the computing resources and time that would otherwise be consumed by sorting for each experiment separately.

Description

Policy-based data analysis method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a policy-based data analysis method, apparatus, electronic device, and storage medium.
Background
A large experiment platform needs to run hundreds or thousands of experiments. Because the specific experimental conditions differ, the objects involved may differ from one experiment to another, and each experiment therefore needs to be run independently.
However, running each experiment separately consumes a great deal of platform resources and experimenter time. How to reduce the platform's resource consumption and the experimenter's time during experiments has therefore become an urgent problem to be solved.
Disclosure of Invention
In order to solve the problems in the prior art, embodiments of the invention provide a policy-based data analysis method and device, electronic equipment and a storage medium. The technical solution is as follows:
In one aspect, a policy-based data analysis method is provided, the method comprising:
acquiring an original object set corresponding to a plurality of preset policies; each object in the original object set carries identification information; the identification information comprises an indication identifier and analysis data corresponding to the indication identifier; the indication identifier corresponding to each object in the original object set is the same; the original object set comprises a first object set and a second object set; each object in the first object set is an object that is subjected to one of the plurality of policies;
sorting the objects in the original object set based on the analysis data corresponding to each object in the original object set to obtain an object sequence; each object in the object sequence carries rank information; the rank information characterizes the position of the object in the object sequence;
determining an experimental set corresponding to each policy from the first object set, and determining a control set corresponding to each policy from the second object set;
performing policy analysis based on the rank information carried by the objects in the experimental set and the control set corresponding to each policy to obtain a data analysis result corresponding to each policy; the data analysis result characterizes whether implementing the policy makes a difference to the identification information.
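The sort-once idea in the steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Obj` record, the `"control"` label, and the simplification that every policy shares the whole second object set as its control set are all assumptions made for the example, and tied values are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    """Hypothetical object record (names are illustrative, not from the source)."""
    obj_id: str
    group: str    # policy name for a treated object, "control" otherwise
    value: float  # analysis data for the shared indication identifier


def sort_once(objects):
    """Sort all objects a single time and attach a 1-based rank to each."""
    ordered = sorted(objects, key=lambda o: o.value)
    return list(enumerate(ordered, start=1))  # [(rank, obj), ...]


def split_by_policy(ranked, policies, control_label="control"):
    """Reuse the global ranking: collect the ranks of each experimental set.

    Assumption: all policies share the same control set.
    """
    control = [rank for rank, obj in ranked if obj.group == control_label]
    return {
        policy: ([rank for rank, obj in ranked if obj.group == policy], control)
        for policy in policies
    }
```

Because the ranked sequence is built once, adding another policy only costs one more linear pass over it rather than another sort.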
In another aspect, there is provided a policy-based data analysis device, the device comprising:
the acquisition module is used for acquiring an original object set corresponding to a plurality of preset policies; each object in the original object set carries identification information; the identification information comprises an indication identifier and analysis data corresponding to the indication identifier; the indication identifier corresponding to each object in the original object set is the same; the original object set comprises a first object set and a second object set; each object in the first object set is an object that is subjected to one of the plurality of policies;
the sorting module is used for sorting the objects in the original object set based on the analysis data corresponding to each object in the original object set to obtain an object sequence; each object in the object sequence carries rank information; the rank information characterizes the position of the object in the object sequence;
the determining module is used for determining an experimental set corresponding to each policy from the first object set and determining a control set corresponding to each policy from the second object set;
the analysis module is used for performing policy analysis based on the rank information carried by the objects in the experimental set and the control set corresponding to each policy to obtain a data analysis result corresponding to each policy; the data analysis result characterizes whether implementing the policy makes a difference to the identification information.
In some possible embodiments, the first set of objects includes a plurality of sets of sub-objects that are in one-to-one correspondence with a plurality of policies;
the acquisition module is configured to:
determining a policy verification requirement for each of a plurality of policies;
according to the policy checking requirement of each policy, implementing the corresponding policy on the objects in the sub-object set corresponding to each policy;
and under the condition that implementation of the plurality of strategies is completed for the plurality of sub-object sets, acquiring the original object sets corresponding to the plurality of strategies.
In some possible embodiments, the acquisition module is configured to:
determining a policy application scope and a policy application period of each policy;
and implementing the corresponding strategy for the objects in the sub-object set corresponding to each strategy according to the strategy application range and the strategy application period of each strategy.
In some possible embodiments, the acquisition module is configured to:
determining a unified application scope based on the policy application scope of each policy in the case that the implementation of the plurality of policies is completed for the plurality of sub-object sets;
determining a unified application period based on the policy application period for each policy;
and collecting the original object sets corresponding to the policies according to the unified application range and the unified application period.
In some possible embodiments, the acquisition module is configured to:
determining an application scope intersection based on the policy application scope of each policy;
determining a unified application range according to the application range intersection;
determining an application period intersection based on the policy application period for each policy;
a unified application period is determined from the application period intersection.
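The two intersection steps above can be illustrated with a short sketch. It assumes, purely for illustration, that an application scope is modelled as a set of labels and an application period as a closed `(start, end)` date interval; the source does not fix either representation, and the function names are hypothetical.

```python
from datetime import date


def unified_scope(scopes):
    """Intersect the policy application scopes (modelled here as sets of labels)."""
    return set.intersection(*map(set, scopes))


def unified_period(periods):
    """Intersect closed (start, end) date intervals; return None if they do not overlap."""
    start = max(s for s, _ in periods)
    end = min(e for _, e in periods)
    return (start, end) if start <= end else None
```

For example, two policies applied on 1 to 20 January and 5 to 31 January would share the unified period 5 to 20 January.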
In some possible embodiments, the acquisition module is configured to:
if the application period intersection contains an abnormal period corresponding to abnormal data, determining a difference period by removing the abnormal period from the application period intersection;
and determining the unified application period according to the difference period.
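A minimal sketch of the difference-period step, assuming day-granularity closed date intervals and a single abnormal period (both assumptions made for the example; the function name is hypothetical). Removing an abnormal period from the middle of the intersection leaves two pieces, so the result is returned as a list.

```python
from datetime import date, timedelta


def difference_period(period, abnormal):
    """Remove a closed abnormal interval from a closed period (day granularity).

    Returns the remaining sub-intervals as a list; the list is empty when the
    abnormal interval covers the whole period.
    """
    start, end = period
    a_start, a_end = abnormal
    if a_end < start or a_start > end:
        return [period]  # no overlap: nothing to remove
    one_day = timedelta(days=1)
    pieces = []
    if a_start > start:
        pieces.append((start, a_start - one_day))
    if a_end < end:
        pieces.append((a_end + one_day, end))
    return pieces
```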
In some possible embodiments, the acquisition module is configured to:
collecting the original object set corresponding to the plurality of policies according to the unified application range and the unified application period, using a unified data return criterion;
the unified data return criterion is either a data request time criterion or a data arrival time criterion.
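One way to read the unified criterion is that every record is assigned to the collection period by the same timestamp, either when the data was requested or when it arrived. The sketch below illustrates that reading only; the dictionary keys `request_time` and `arrival_time` and the function name `collect` are hypothetical.

```python
from datetime import datetime


def collect(records, period, basis="request_time"):
    """Keep the records whose chosen timestamp falls inside the unified period.

    `basis` selects the single timestamp used for every record, so that all
    policies are measured on the same footing.
    """
    if basis not in ("request_time", "arrival_time"):
        raise ValueError(f"unknown basis: {basis}")
    start, end = period
    return [r for r in records if start <= r[basis] <= end]
```

A record requested just before the period ends but delivered after it is counted under the request-time criterion and excluded under the arrival-time criterion, which is why mixing the two across experiments would make the sets incomparable.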
In some possible embodiments, the determining module is configured to:
and determining an experiment set corresponding to each strategy from the sub-object set corresponding to each strategy.
In some possible embodiments, the analysis module is configured to:
performing policy analysis on the rank information carried by the objects in the experimental set and the control set corresponding to each policy by using the Mann-Whitney rank-sum test to obtain first statistical data corresponding to each policy;
and determining a data analysis result corresponding to each policy according to the first statistical data corresponding to each policy and preset critical data.
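As an illustration of how a Mann-Whitney U statistic could be derived from ranks produced by a single global sort, the sketch below merges the two already-ordered rank lists to obtain ranks within the pooled pair of sets, then applies the standard formula U1 = R1 - n1(n1 + 1)/2. The re-ranking step and the assumption of no tied values are choices made for this example and are not stated in the source; the critical value is taken as an input rather than looked up.

```python
import heapq


def mann_whitney_u(exp_ranks, ctl_ranks):
    """Compute U from ascending global ranks of the two sets (no ties assumed)."""
    # Merging two sorted lists is linear, so no further sorting is needed.
    tagged = heapq.merge(
        ((r, True) for r in exp_ranks),
        ((r, False) for r in ctl_ranks),
    )
    # Rank within the pooled sample is simply the position in the merged order.
    rank_sum_exp = sum(
        local for local, (_, is_exp) in enumerate(tagged, start=1) if is_exp
    )
    n1, n2 = len(exp_ranks), len(ctl_ranks)
    u1 = rank_sum_exp - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return min(u1, u2)


def differs(u, critical_value):
    """Conventional decision rule: a difference is indicated when U <= critical value."""
    return u <= critical_value
```

The critical value passed to `differs` would come from a table such as the one referenced in FIG. 5, indexed by the two sample sizes and the chosen significance level.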
In some possible embodiments, the analysis module is configured to:
performing policy analysis on the rank information carried by the objects in the experimental set and the control set corresponding to each policy by using a non-tightly ordered rank sum test to obtain second statistical data corresponding to each policy; the second statistical data obeys a normal distribution; the second statistical data comprises the difference between the experimental set and the control set and the occurrence probability of that difference;
and determining a data analysis result corresponding to each policy according to the difference corresponding to each policy and the occurrence probability.
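For larger samples, a rank-sum statistic is commonly standardized against a normal distribution. The sketch below shows that textbook normal approximation; mapping the source's "difference" to a z-score and its "occurrence probability" to a two-sided p-value is an interpretation made here, and no tie or continuity correction is applied.

```python
import math


def rank_sum_normal_approx(u1, n1, n2):
    """Return (z, two-sided p-value) for a U statistic under the normal approximation."""
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean) / sd
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p
```

A policy would then be reported as making a difference when the p-value falls below a significance level chosen by the experimenter, for example 0.05.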
In another aspect, an electronic device is provided that includes a processor and a memory having at least one instruction or at least one program stored therein, the at least one instruction or at least one program loaded and executed by the processor to implement the policy-based data analysis method described above.
In another aspect, a computer readable storage medium having at least one instruction or at least one program stored therein is provided, the at least one instruction or the at least one program loaded and executed by a processor to implement a policy-based data analysis method as described above.
In another aspect, a computer program product or computer program is provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium and executes the computer instructions to cause the computer device to perform the policy-based data analysis method described above.
According to the embodiments of the invention, an original object set corresponding to a plurality of preset policies is obtained. Each object in the original object set carries identification information, which comprises an indication identifier and analysis data corresponding to the indication identifier; the indication identifier is the same for every object in the original object set. The original object set comprises a first object set and a second object set, and each object in the first object set is an object subjected to one of the plurality of policies. The objects in the original object set are sorted based on the analysis data corresponding to each object to obtain an object sequence, in which each object carries rank information characterizing its position in the sequence. An experimental set corresponding to each policy is determined from the first object set, and a control set corresponding to each policy is determined from the second object set. Policy analysis is then performed based on the rank information carried by the objects in the experimental set and the control set corresponding to each policy to obtain a data analysis result for each policy, which characterizes whether implementing the policy makes a difference to the identification information. In this way, all objects are sorted only once, and a rank-sum test can then be performed for a plurality of experiments run on the online experiment platform, which saves the computing resources and time that would otherwise be consumed by sorting for each experiment separately.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic illustration of an implementation environment provided by an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a policy-based data analysis method according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of determining an original object set according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart of determining an original object set under a unified data condition according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a critical value table for the Mann-Whitney U test provided by an embodiment of the present invention;
FIG. 6 is a block diagram of a policy-based data analysis device according to an embodiment of the present invention;
FIG. 7 is a block diagram of a hardware structure of an electronic device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It will be appreciated that in the specific embodiments of the present application, related data such as user information is referred to, and when the above embodiments of the present application are applied to specific products or technologies, user permissions or consents need to be obtained, and the collection, use and processing of related data need to comply with related laws and regulations and standards of related countries and regions.
In order to facilitate understanding of the technical solutions described in the embodiments of the present disclosure and the technical effects thereof, the terms involved in the embodiments of the present disclosure are briefly described:
Artificial Intelligence (AI) is the theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Computer Vision (CV) is the science of how to make machines "see". More specifically, it uses cameras and computers in place of human eyes to recognize and measure targets, and further performs graphics processing so that the computer produces images more suitable for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision technologies typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric recognition technologies such as face recognition and fingerprint recognition.
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies the theories and methods that enable effective communication between people and computers in natural language. Natural language processing is a science that integrates linguistics, computer science and mathematics. Research in this field therefore involves natural language, that is, the language people use every day, and is closely related to the study of linguistics. Natural language processing technologies typically include text processing, semantic understanding, machine translation, question answering by robots, knowledge graphs, and the like.
Machine Learning (ML) is a multi-field interdiscipline that involves probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how a computer can simulate or implement human learning behavior to acquire new knowledge or skills, and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to give computers intelligence; it is applied throughout the various areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
The automatic driving technology generally comprises high-precision map, environment perception, behavior decision, path planning, motion control and other technologies, and has wide application prospect.
With the research and advancement of artificial intelligence technology, artificial intelligence is being studied and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical care, and smart customer service. It is believed that, as technology develops, artificial intelligence will be applied in more fields and will become increasingly valuable.
The embodiment of the invention can be applied to various scenes, including but not limited to cloud technology, artificial intelligence, intelligent transportation, auxiliary driving and the like.
Cloud technology refers to a hosting technology that unifies a series of resources such as hardware, software and networks within a wide area network or a local area network to realize the computation, storage, processing and sharing of data. Cloud technology is also a general term for the network technology, information technology, integration technology, management platform technology, application technology and the like that are applied on the basis of the cloud computing business model; these can form a resource pool that is used on demand, flexibly and conveniently. Cloud computing technology will become an important support. The background services of technical network systems, such as video websites, picture websites and portal websites, require a large amount of computing and storage resources. With the rapid development and application of the internet industry, each item may in the future have its own identification mark, which needs to be transmitted to a background system for logical processing; data of different levels will be processed separately, and all kinds of industry data need strong backend system support, which can only be realized through cloud computing.
Cloud storage (cloud storage) is a new concept that extends and develops in the concept of cloud computing, and a distributed cloud storage system (hereinafter referred to as a storage system for short) refers to a storage system that integrates a large number of storage devices (storage devices are also referred to as storage nodes) of various types in a network to work cooperatively through application software or application interfaces through functions such as cluster application, grid technology, and a distributed storage file system, so as to provide data storage and service access functions for the outside. At present, the storage method of the storage system is as follows: when creating logical volumes, each logical volume is allocated a physical storage space, which may be a disk composition of a certain storage device or of several storage devices. The client stores data on a certain logical volume, that is, the data is stored on a file system, the file system divides the data into a plurality of parts, each part is an object, the object not only contains the data but also contains additional information such as a data Identification (ID) and the like, the file system writes each object into a physical storage space of the logical volume, and the file system records storage position information of each object, so that when the client requests to access the data, the file system can enable the client to access the data according to the storage position information of each object. 
The process by which the storage system allocates physical storage space for a logical volume is specifically as follows: the physical storage space is divided into stripes in advance according to the estimated capacity of the objects to be stored on the logical volume (the estimate usually leaves a large margin relative to the capacity of the objects actually stored) and the Redundant Array of Independent Disks (RAID) configuration; a logical volume can be understood as a stripe, and physical storage space is thereby allocated to the logical volume.
The Database (Database), which can be considered as an electronic filing cabinet, is a place for storing electronic files, and users can perform operations such as adding, inquiring, updating, deleting and the like on the data in the files. A "database" is a collection of data stored together in a manner that can be shared with multiple users, with as little redundancy as possible, independent of the application.
Blockchains are novel application modes of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, encryption algorithms, and the like. The Blockchain (Blockchain), which is essentially a decentralised database, is a string of data blocks that are generated by cryptographic means in association, each data block containing a batch of information of network transactions for verifying the validity of the information (anti-counterfeiting) and generating the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, and an application services layer.
The blockchain underlying platform may include processing modules for user management, basic services, smart contracts, and operation detection. The user management module is responsible for managing the identity information of all blockchain participants, including maintaining public and private key generation (account management), key management, and maintaining the correspondence between a user's real identity and blockchain address (authority management); where authorized, it also supervises and audits the transactions of certain real identities and provides rule configuration for risk control (risk-control audit). The basic service module is deployed on all blockchain node devices and is used to verify the validity of service requests and to record valid requests to storage once consensus on them is reached; for a new service request, the basic service first performs interface adaptation, parsing and authentication (interface adaptation), then encrypts the service information through a consensus algorithm (consensus management), transmits the encrypted information completely and consistently to the shared ledger (network communication), and records and stores it. The smart contract module is responsible for contract registration and issuance, contract triggering and contract execution; a developer can define contract logic in a programming language and publish it to the blockchain (contract registration), and the contract logic is executed when invoked by a key or triggered by another event according to the logic of the contract terms; the module also provides functions for upgrading and cancelling contracts. The operation detection module is mainly responsible for deployment during product release, modification of configuration, contract settings, cloud adaptation, and visual output of the real-time status of the running product, for example: alarms, detecting network conditions, detecting the health status of node devices, and so on.
The platform product service layer provides basic capabilities and implementation frameworks of typical applications, and developers can complete the blockchain implementation of business logic based on the basic capabilities and the characteristics of the superposition business. The application service layer provides the application service based on the block chain scheme to the business participants for use.
Referring to fig. 1, a schematic diagram of an implementation environment provided by an embodiment of the present invention is shown, where the implementation environment may include a client 110, a server 120, and a database 130.
The client 110 and the server 120, and the server 120 and the database 130, may communicate with each other over a network.
The client 110 includes, but is not limited to, a mobile phone, a computer, an intelligent voice interaction device, a smart home appliance, a vehicle-mounted terminal, an aircraft, and the like. An application program with a human-computer interaction function may run on the client 110, and the application program may launch virtual-item distribution activities for different business scenarios, such as flash-sale activities, lottery activities, and complete-a-task-to-get-a-reward activities.
The server 120 may obtain an original object set corresponding to a plurality of preset policies. Each object in the original object set carries identification information, which includes an indication identifier and analysis data corresponding to the indication identifier; the indication identifier is the same for every object in the original object set. The original object set includes a first object set and a second object set, and each object in the first object set is an object to which one of the plurality of policies has been applied. The server sorts the objects in the original object set based on the analysis data corresponding to each object to obtain an object sequence; each object in the object sequence carries rank information, which characterizes the position of the object in the object sequence. The server then determines an experimental set corresponding to each policy from the first object set and a control set corresponding to each policy from the second object set, and performs policy analysis based on the rank information carried by the objects in the experimental set and the control set corresponding to each policy to obtain a data analysis result corresponding to each policy. The data analysis result characterizes whether implementing the policy makes a difference to the identification information. In this way, with all objects sorted only once, a rank sum test can be performed for the many experiments run on an online experiment platform, which saves the computing resources and time that would otherwise be spent sorting for each experiment separately.
The database 130 may include an in-memory database and a relational database. It should be noted that, in the embodiments of the present invention, the server, the database, the nodes, and the like may be independent physical servers, a server cluster or distributed system formed by a plurality of physical servers, or cloud servers that provide basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.
In an exemplary embodiment, the client 110, the server 120, and the database 130 may be node devices in a blockchain system, and may share acquired and generated information to other node devices in the blockchain system, so as to implement information sharing between multiple node devices. The plurality of node devices in the blockchain system can be configured with the same blockchain, the blockchain consists of a plurality of blocks, and the blocks adjacent to each other in front and back have an association relationship, so that the data in any block can be detected through the next block when being tampered, thereby avoiding the data in the blockchain from being tampered, and ensuring the safety and reliability of the data in the blockchain.
Referring to fig. 2, a flow chart of a policy-based data analysis method according to an embodiment of the present invention is shown, and the method may be applied to the implementation environment shown in fig. 1, where an execution subject of the method may be a server that performs the determination of the clustering result in fig. 1, or may be a client or other server nodes that performs the determination of the clustering result. It is noted that the present specification provides method operational steps as described in the examples or flowcharts, but may include more or fewer operational steps based on conventional or non-inventive labor. The order of steps recited in the embodiments is merely one way of performing the order of steps and does not represent a unique order of execution. In actual system or product execution, the methods illustrated in the embodiments or figures may be performed sequentially or in parallel (e.g., in a parallel processor or multi-threaded processing environment). As shown in fig. 2, the method may include:
S201, acquiring an original object set corresponding to a plurality of preset policies; each object in the original object set carries identification information; the identification information comprises an indication identifier and analysis data corresponding to the indication identifier; the indication identifier corresponding to each object in the original object set is the same; the original object set comprises a first object set and a second object set; each object in the first object set is an object that is subject to one of the plurality of policies.
In this embodiment, each of the plurality of policies may be preset by a technician, and each policy may include a corresponding experimental group and a control group. The control group is an object set which does not implement the strategy, and the experimental group is an object set which implements the strategy.
As described above, hundreds or thousands of experiments need to be run in a large-scale scenario such as an online experiment platform. Because their specific experimental conditions differ, the experiments may differ from one another, and each experiment would therefore need to be run separately. However, running each experiment separately consumes a great deal of platform resources and experimenters' time.
In embodiments of the present application, experiments may generally include parametric tests and non-parametric tests.
A parametric test assumes that the data follow a certain distribution (typically a normal distribution) and tests the population parameter (μ) using an estimate (x̄ ± s) derived from the sample, as in the t-test, the u-test, and analysis of variance (ANOVA). A non-parametric test examines the distribution of the data directly without assuming a form for the population distribution; it is called non-parametric because no parameters of the population distribution are involved. Examples include the chi-square test and the rank sum test.
The differences between parametric and non-parametric tests are:
a parametric test measures central tendency by the mean, whereas a non-parametric test measures it by the median;
a parametric test requires information about the population distribution, whereas a non-parametric test does not;
a parametric test applies to variables, whereas a non-parametric test applies to both variables and attributes;
if the sample data can be assumed to come from a population with a specific distribution, a parametric test is used; if the necessary assumptions cannot be made about the data set, a non-parametric test is used.
In the present embodiment, the rank sum test (Mann-Whitney U test) is a statistical method that does not require the data to be normally distributed and is suitable for various types of data. Its basic idea is to merge the two groups of data, sort all of the data, and compute the sum of the ranks of each group. By comparing the two rank sums, it can be judged whether the locations of the population distributions from which the two groups of data are drawn differ significantly. If the two rank sums are nearly equal, the distributions of the two groups of data are similar; otherwise, they differ.
In this embodiment of the present application, because the data of the measured indicators of some experiments do not follow a normal distribution, the experiments on a large-scale experiment platform also include rank sum tests, which are applicable to data that do not follow a normal distribution. How to save platform resources and experimenters' time when performing rank sum tests is therefore also an urgent problem to be solved.
In some possible embodiments, the original object set may include a first object set to which policies are applied and a second object set to which no policy is applied. Optionally, the original object set may be divided into the first object set and the second object set before the policies are implemented. Optionally, the division may instead be performed after the policies are implemented, according to whether each object in the original object set was subjected to a policy. The embodiments of the present application do not limit the point in time at which, or the order of operations by which, the first object set and the second object set are distinguished.
To ensure independence of the tests, and thus accuracy of the tests, each object in the first set of objects can only be subjected to one of a preset plurality of policies.
In some possible embodiments, the first object set includes a plurality of sub-object sets in one-to-one correspondence with the plurality of policies; that is, the number of policies equals the number of sub-object sets, and each policy has exactly one corresponding sub-object set. Optionally, the first object set may be divided into the sub-object sets before the policies are implemented. Optionally, the division may instead be performed after the policies are implemented, by distinguishing the objects in the first object set according to the indication identifier in the identification information. The embodiments of the present application do not limit the point in time at which, or the order of operations by which, the sub-object sets corresponding one-to-one to the policies are determined.
In an alternative embodiment, please refer to fig. 3, which illustrates a flowchart of determining an original object set according to an embodiment of the present invention, wherein S201 above includes:
S2011, determining a policy check requirement of each of the plurality of policies.
In the embodiment of the application, the policy check requirement of each of the plurality of policies can be determined; the policy check requirement is the condition that must be satisfied when the corresponding policy is implemented.
Optionally, the policy check requirement of each of the plurality of policies may include at least one of a policy application scope and a policy application period. As such, when determining the policy check requirement of each policy, the policy application scope of each policy may be determined, or the policy application period of each policy may be determined, or both the policy application scope and the policy application period of each policy may be determined.
In this embodiment of the present application, the plurality of policies includes at least two policies, such as one hundred policies or ten policies. The embodiments of the present application will be specifically described below by way of a specific example in conjunction with the above description.
In one example, the plurality of policies includes policy A, policy B, and policy C, where the content of policy A is "perform operation A on a first platform during a first period", the content of policy B is "perform operation B on a second platform during a second period", and the content of policy C is "perform operation C on the first platform during a third period"; the goal is to determine whether these policies have an effect on a certain index.
Alternatively, the certain index is the indication identifier in the above identification information, such as online time length, resource expense, attention increase amount, and the like.
In the embodiment of the present application, policy A, policy B, and policy C are all implemented with respect to the same index; that is, the policies correspond to the same indication identifier, so the indication identifier corresponding to each object in the original object set is the same.
In the embodiment of the application, the content of each policy can be identified by using an identification model in the artificial intelligence, so as to obtain a policy application range and/or a policy application period corresponding to each policy. And keyword matching can be performed on the content of each policy by using a keyword matching method, so as to obtain a policy application range and/or a policy application period corresponding to each policy.
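As a minimal sketch of the keyword-matching option, the following Python snippet extracts a scope and a period from policy content strings like those in the example above. The keyword patterns, the `extract_requirement` helper, and the dictionary layout are hypothetical illustrations and are not part of the described embodiment.

```python
import re

# Hypothetical keyword patterns; a real system would use its own vocabulary.
SCOPE_PATTERN = re.compile(r"\b(first|second|third) platform\b")
PERIOD_PATTERN = re.compile(r"\b(first|second|third) period\b")

def extract_requirement(content):
    """Return the application scope and period found in `content` (None if absent)."""
    scope = SCOPE_PATTERN.search(content)
    period = PERIOD_PATTERN.search(content)
    return {
        "scope": scope.group(0) if scope else None,
        "period": period.group(0) if period else None,
    }

# Policy contents paraphrased from the example; the wording is illustrative.
policies = {
    "A": "perform operation A on the first platform during the first period",
    "B": "perform operation B on the second platform during the second period",
    "C": "perform operation C on the first platform during the third period",
}
requirements = {name: extract_requirement(text) for name, text in policies.items()}
```

A policy whose content mentions neither keyword yields `None` for that field, which leaves room for the case where only a scope or only a period is specified.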
S2013, implementing corresponding strategies for the objects in the sub-object set corresponding to each strategy according to the strategy checking requirement of each strategy.
Alternatively, according to the policy application scope of each policy, the corresponding policy may be implemented on the objects in the sub-object set corresponding to each policy.
Alternatively, according to the policy application period of each policy, a corresponding policy may be implemented for the objects in the sub-object set corresponding to each policy.
Alternatively, according to the policy application scope and the policy application period of each policy, the corresponding policy may be implemented on the objects in the sub-object set corresponding to each policy.
Continuing the above example, operation A may be implemented on the objects in the sub-object set corresponding to policy A according to the policy application scope (i.e., the first platform) and the policy application period (i.e., the first period) of policy A. Operation B is implemented on the objects in the sub-object set corresponding to policy B according to the policy application scope (i.e., the second platform) and the policy application period (i.e., the second period) of policy B, and operation C is implemented on the objects in the sub-object set corresponding to policy C according to the policy application scope (i.e., the first platform) and the policy application period (i.e., the third period) of policy C.
Alternatively, the policy application scope of different policies may be the same or may be different. The policy application period of different policies may be the same or may be different. Generally, the operation of the different strategies is different.
S2015, under the condition that implementation of the plurality of strategies is completed for the plurality of sub-object sets, acquiring an original object set corresponding to the plurality of strategies.
Optionally, once the plurality of policies have been implemented for the plurality of sub-object sets, the original object set corresponding to the plurality of policies may be acquired. Each object in the original object set carries identification information, and the identification information comprises an indication identifier and analysis data corresponding to the indication identifier. Implementing a policy may affect the analysis data corresponding to the indication identifier, such as the duration value corresponding to the online duration.
Alternatively, the point in time when the original object set is acquired is not limited, and may be at the moment when all policies are completed, or after a period of time after all policies are completed.
In the embodiment of the present application, in order to save platform resources and experimenters' time, step S203 sorts the objects in the original object set based on the analysis data corresponding to each object to obtain the object sequence. In an optional embodiment, the precondition for this sorting is that the data are unified.
Based on the above, when the implementation of the plurality of policies on the plurality of sub-object sets is completed, the data may be unified before the original object sets corresponding to the plurality of policies are acquired. Referring to fig. 4, a flow chart of determining an original object set under a unified data condition according to an embodiment of the present invention is shown, including:
S401, determining a unified application range based on the policy application range of each policy in the case that implementation of the plurality of policies is completed for the plurality of sub-object sets.
In the embodiment of the application, in the case that implementation of a plurality of policies for a plurality of sub-object sets is completed, an application scope intersection may be determined based on the policy application scope of each policy, and a unified application scope may be determined according to the application scope intersection.
Specifically, the policy application range corresponding to policy A is the first platform, that corresponding to policy B is the second platform, and that corresponding to policy C is the first platform. The intersection of the first platform and the second platform may be determined as the application range intersection and then used as the unified application range. If the intersection of the first platform and the second platform is the full platform, the full platform can be used as the unified application range; that is, the policy application range of each policy becomes the full platform.
Treating the application range intersection as the policy application range of every policy does affect the absolute values of the analysis data, because data from multiple platforms are added together, but it is reasonable for the rank sum test, which is a non-parametric test. An experiment performed only on the first platform can be understood as also being in effect on the full platform, except that the value of its analysis data on platforms other than the first platform is 0, i.e., it has no effect there. Including analysis data with an effect of 0 therefore does not change the direction of the experimental result. Moreover, if some cross-platform spillover exists, for example a policy operated only on the first platform that affects both the first platform and the second platform, the result obtained from full-platform statistics is more accurate.
Based on the above analysis, the policy application scope of each policy can be redefined as a unified application scope obtained by intersection.
S403, determining a unified application period based on the policy application period of each policy.
In the embodiment of the application, the application period intersection can be determined based on the policy application period of each policy, and the unified application period is determined according to the application period intersection.
Specifically, the policy application period corresponding to policy A is a first period, that corresponding to policy B is a second period, and that corresponding to policy C is a third period. It may be determined that the intersection of the first period, the second period, and the third period is a fourth period, and the fourth period may then be used as the unified application period. If the fourth period is discontinuous, for example comprising 6:00-10:00 and 12:00-14:00, a continuous fifth period covering the fourth period, such as 6:00-14:00, may be used as the unified application period.
In some possible embodiments, although a policy is executed during the first period, the second period, or the third period, it cannot be ruled out that responses to the policy continue after these periods. Based on this, the application period intersection together with a preset period following it may be taken as the unified application period. For example, assuming that the application period intersection is 6:00-14:00, 6:00-20:00 may be taken as the unified application period corresponding to each policy. Alternatively, the whole day, the two whole days, the whole week, or the like in which the application period intersection falls may be used as the unified application period.
The reason why the application period intersection, or a larger period containing it, can be used as the policy application period of each policy is as follows: taking policy A as an example, policy A is implemented in the first period; in the part of the unified application period outside the first period, policy A has no effect on the objects in its corresponding sub-object set, and therefore does not affect the subsequent sorting based on analysis data.
In some possible embodiments, if an abnormal period in which abnormal data occur exists within the application period intersection, or within the larger period containing it, a difference period may be determined from the application period intersection and the abnormal period, and the unified application period may be determined from the difference period. For example, assuming that the application period intersection is 0:00-24:00 of one whole day and the abnormal period is 0:00-3:00, the difference period is 3:00-24:00, and the difference period can be directly determined as the unified application period. Or, assuming that the application period intersection is 0:00-24:00 of one whole day and the abnormal period is also 0:00-24:00, the difference period is empty, and the whole day following that day can be determined as the unified application period.
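The period arithmetic in the examples above can be sketched as follows. This is an illustrative Python snippet, not the embodiment's implementation: periods are represented as hypothetical `(start_hour, end_hour)` tuples within a single day, and the helper names `covering_span`, `extend`, and `difference` are assumptions introduced here.

```python
def covering_span(intervals):
    """Smallest continuous period that contains every interval in `intervals`."""
    return (min(start for start, _ in intervals), max(end for _, end in intervals))

def extend(period, extra_hours, day_end=24):
    """Append a preset response window after the period, capped at the end of the day."""
    start, end = period
    return (start, min(end + extra_hours, day_end))

def difference(period, abnormal):
    """Parts of `period` that do not overlap the `abnormal` period."""
    start, end = period
    a_start, a_end = abnormal
    remaining = []
    if a_start > start:
        remaining.append((start, min(a_start, end)))
    if a_end < end:
        remaining.append((max(a_end, start), end))
    return remaining

# Discontinuous period 6-10 and 12-14 covered by one continuous period.
fifth_period = covering_span([(6, 10), (12, 14)])
# Add a 6-hour response window after the period.
with_response = extend(fifth_period, 6)
# Remove an abnormal period from a whole day.
usable = difference((0, 24), (0, 3))
# An abnormal period covering the whole day leaves nothing; the caller would
# then fall back to the following whole day.
nothing_left = difference((0, 24), (0, 24))
```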
S405, collecting original object sets corresponding to a plurality of strategies according to the unified application range and the unified application period.
To further ensure that the data are unified, and thereby improve the accuracy of the data analysis result, the original object set corresponding to the plurality of policies can be collected with a unified data reflow caliber according to the unified application range and the unified application period, where the unified data reflow caliber is either a data request time caliber or a data arrival time caliber.
Optionally, unifying on the data request time caliber means that the relevant data of every object in the original object set are collected on the basis of the data request time. Optionally, unifying on the data arrival time caliber means that the relevant data of every object in the original object set are collected on the basis of the data arrival time. Which one is chosen in practice may depend on the difficulty of the engineering implementation, and the present application does not limit this.
In this way, the embodiment of the application may collect the original object set corresponding to the plurality of policies, that is, the first object set subjected to the policies and the second object set not subjected to any policy; in other words, the sub-object sets corresponding one-to-one to the policies, together with the second object set.
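A rough sketch of collecting data under a single time caliber is given below. The `Record` fields, the `collect` helper, and the use of hours as timestamps are all assumptions made for illustration; the text does not specify a record layout.

```python
from dataclasses import dataclass

@dataclass
class Record:
    object_id: str
    platform: str
    request_hour: float   # hour at which the data was requested (hypothetical field)
    arrival_hour: float   # hour at which the data arrived (hypothetical field)
    value: float          # analysis data, e.g. online duration

def collect(records, scope, period, caliber="request"):
    """Keep records inside the unified scope and period, judged by one time caliber."""
    if caliber not in ("request", "arrival"):
        raise ValueError("caliber must be 'request' or 'arrival'")
    start, end = period
    selected = []
    for r in records:
        t = r.request_hour if caliber == "request" else r.arrival_hour
        if r.platform in scope and start <= t < end:
            selected.append(r)
    return selected

records = [
    Record("u1", "first", 13.5, 14.2, 30.0),   # requested inside, arrived outside 6-14
    Record("u2", "second", 7.0, 7.1, 12.0),
]
by_request = collect(records, {"first", "second"}, (6, 14), caliber="request")
by_arrival = collect(records, {"first", "second"}, (6, 14), caliber="arrival")
```

The record `u1` shows why a single caliber matters: it falls inside the period by request time but outside it by arrival time, so mixing the two calibers would make the collected set inconsistent.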
S203, sorting the objects in the original object set based on the analysis data corresponding to each object in the original object set to obtain an object sequence; each object in the sequence of objects carries rank information; the rank information characterizes the position of the object in the sequence of objects.
In this embodiment of the present application, the objects in the original object set may be sorted in ascending order based on the analysis data (such as the duration value of the online duration) corresponding to each object; that is, the smaller the duration value, the earlier the object is placed, and the larger the duration value, the later it is placed.
Optionally, a value may be assigned to each object according to its position in the object sequence. For example, if object a is at the 50th position in the object sequence, the value 50 is assigned to object a and used as its rank information, so that each object in the original object set obtains its corresponding rank information.
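The sorting and rank assignment described in step S203 can be sketched as below. The object identifiers and duration values are made up for illustration, and the `assign_ranks` helper is hypothetical. Tie handling is not specified in the text; this sketch simply keeps the input order for equal values.

```python
def assign_ranks(analysis_data):
    """Sort objects by analysis data in ascending order and return {object_id: rank}."""
    ordered = sorted(analysis_data, key=analysis_data.get)
    return {object_id: position for position, object_id in enumerate(ordered, start=1)}

# Illustrative online-duration values keyed by object identifier.
durations = {"u1": 19, "u2": 11, "u3": 29, "u4": 16}
ranks = assign_ranks(durations)
```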
S205, determining an experiment set corresponding to each strategy from the first object set, and determining a comparison set corresponding to each strategy from the second object set.
In the embodiment of the application, the experimental set corresponding to each strategy can be determined from the sub-object set corresponding to each strategy, and the comparison set corresponding to each strategy is determined from the second object set.
For example, the experimental set corresponding to policy A may be determined from the sub-object set corresponding to policy A, and the control set corresponding to policy A may be determined from the second object set. The experimental set corresponding to policy B may be determined from the sub-object set corresponding to policy B, and the control set corresponding to policy B may be determined from the second object set. The experimental set corresponding to policy C may be determined from the sub-object set corresponding to policy C, and the control set corresponding to policy C may be determined from the second object set.
Optionally, all objects in the sub-object set corresponding to policy A may constitute the experimental set corresponding to policy A, or only some of the objects in that sub-object set may constitute it. The experimental sets corresponding to policy B and policy C are determined in the same way and are not described again.
Optionally, the control sets corresponding to policy A, policy B, and policy C may share no objects at all, or may contain objects in common.
Therefore, the experimental set and the control set corresponding to each policy can be obtained through a single global sort. Compared with performing a separate sort for each of the plurality of policies, this saves platform resources and experimenters' time, and each object in the experimental set and the control set corresponding to each policy carries its own rank information.
S207, performing policy analysis based on the rank information carried by the objects in the experimental set and the control set corresponding to each policy to obtain a data analysis result corresponding to each policy; the data analysis result characterizes whether implementing the policy makes a difference to the identification information.
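To illustrate how one global sort serves every policy, the sketch below looks up precomputed ranks for each policy's experimental and control objects instead of sorting again. The rank values, the grouping of objects, and the choice to use the entire second object set as every policy's control set are assumptions made for this example.

```python
# Rank information from one global sort over the original object set (illustrative values).
global_ranks = {"u1": 5, "u2": 1, "u3": 9, "u4": 3, "u5": 7,
                "u6": 2, "u7": 8, "u8": 4, "u9": 6}

# Hypothetical grouping: one sub-object set per policy plus the shared second object set.
sub_object_sets = {"A": ["u1", "u3"], "B": ["u4", "u5"], "C": ["u7"]}
second_object_set = ["u2", "u6", "u8", "u9"]

def ranks_for_policy(policy):
    """Look up the precomputed ranks of a policy's experimental and control objects."""
    experimental = [global_ranks[o] for o in sub_object_sets[policy]]
    control = [global_ranks[o] for o in second_object_set]
    return experimental, control

# No further sorting is needed: every policy reuses the same rank table.
per_policy = {p: ranks_for_policy(p) for p in sub_object_sets}
```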
In an optional embodiment, the rank information carried by the objects in the experimental set and the control set corresponding to each policy may be analyzed by using the Mann-Whitney rank sum test to obtain first statistical data corresponding to each policy, and the data analysis result corresponding to each policy may be determined according to the first statistical data corresponding to each policy and preset critical data.
Specifically, using the Mann-Whitney rank sum test, policy analysis can be performed on the rank information carried by the objects in the experimental set and the control set corresponding to each policy, together with the number of objects in each set and the analysis data of the objects in each set, to obtain the first statistical data corresponding to each policy.
The procedure of the Mann-Whitney rank sum test is illustrated by a simplified example comprising the following 7 steps:
(1) Assume that:
the analysis data of the experimental set are: 19, 22, 16, 29, 24;
the analysis data of the control set are: 20, 11, 17, 12;
(2) Sort the objects of the experimental set and the control set together by the value of their analysis data, from small to large, to obtain the following sequence:
11, 12, 16, 17, 19, 20, 22, 24, 29;
(3) Assign rank information to each value in the sequence to obtain the following rank sequence:
1, 2, 3, 4, 5, 6, 7, 8, 9;
(4) Obtain the rank information of the objects in the experimental set and the control set:
the rank information of the experimental set is: 5, 7, 3, 9, 8;
the rank information of the control set is: 6, 1, 4, 2;
(5) Count the number of objects and the sum of the rank information in the experimental set and in the control set:
the number of objects in the experimental set is n1 = 5, and the sum of its rank information is m1 = 32;
the number of objects in the control set is n2 = 4, and the sum of its rank information is m2 = 13;
(6) Determine the rank sum statistic of the experimental set and of the control set:
rank sum statistic of the experimental set: U1 = m1 - n1 × (n1 + 1)/2 = 17;
rank sum statistic of the control set: U2 = m2 - n2 × (n2 + 1)/2 = 3;
(7) According to the Mann-Whitney rank sum test, the smaller of the two rank sum statistics is taken as the final rank sum statistic, i.e. 3.
Referring to FIG. 5, a schematic diagram of a critical value table of the Mann-Whitney U test according to an embodiment of the invention is shown. When the number of objects in the experimental set is n1 = 5 and the number of objects in the control set is n2 = 4, the critical value of U obtained by looking up the table is 1. Since 3 is greater than 1, the actual value is greater than the critical value and the null hypothesis cannot be rejected, i.e., there is no statistically significant difference between the experimental set and the control set.
After step S205 is completed, the experimental set and the control set corresponding to each policy are available, and each object in them carries its own rank information. When step S207 is executed, only steps (5) to (7) of the above 7 steps need to be performed; since the first 4 steps for every policy have already been completed together through the shared sort, platform resources and experimenters' time are saved.
When the first statistical data is smaller than the preset critical data, the policy is effective; otherwise, the policy is ineffective.
However, although the data analysis result corresponding to each policy can be obtained through the Mann-Whitney rank sum test, one of its steps relies on the critical value table of the Mann-Whitney U test. When the number of objects in the experimental group and the control group of a policy is large, for example on the order of tens of thousands or hundreds of millions, the critical value table is inconvenient to apply because of the large data volume. Based on this, the present application may apply other schemes to determine the data analysis result.
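The seven steps above can be reproduced with a short Python sketch. It assumes there are no tied values, as in the example, and the `mann_whitney_u` helper is a name introduced here for illustration. The critical value 1 is the one quoted from the table for n1 = 5 and n2 = 4.

```python
def mann_whitney_u(experimental, control):
    """Return (U1, U2, U) for two samples without ties, following the seven steps."""
    combined = sorted(experimental + control)                      # step (2)
    rank_of = {value: rank for rank, value in enumerate(combined, start=1)}  # step (3)
    n1, n2 = len(experimental), len(control)
    m1 = sum(rank_of[v] for v in experimental)   # rank sum of the experimental set
    m2 = sum(rank_of[v] for v in control)        # rank sum of the control set
    u1 = m1 - n1 * (n1 + 1) // 2                 # step (6)
    u2 = m2 - n2 * (n2 + 1) // 2
    return u1, u2, min(u1, u2)                   # step (7)

u1, u2, u = mann_whitney_u([19, 22, 16, 29, 24], [20, 11, 17, 12])
critical_value = 1                 # read from the critical table for n1 = 5, n2 = 4
# Following the text: the difference is significant only if U is below the critical value.
significant = u < critical_value
```

Here `u` equals 3, which is not below the critical value, so `significant` is `False`, matching the conclusion of the worked example.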
In another optional embodiment, the rank information carried by the objects in the experimental set and the control set corresponding to each policy may be analyzed by using the non-tightly ordered rank sum test to obtain second statistical data corresponding to each policy. The second statistical data obeys a normal distribution, and includes the difference between the experimental set and the control set and the occurrence probability of this difference. The data analysis result corresponding to each strategy is then determined according to the difference and the occurrence probability corresponding to that strategy. For example, if the difference between the experimental set and the control set is large and the probability of such a difference is small, for example when it falls in the two tails of the normal distribution, this indicates that the experimental set and the control set differ significantly, i.e., the strategy is valid.
The formula for the non-tightly ordered rank sum test comprises:
where t is the second statistical data, which obeys a normal distribution; n is the number of objects in the experimental set; m is the number of objects in the control set; $\sigma^2$ is the sample variance calculated from the rank information of the n+m objects; and $R_i$ is the i-th rank information. The formula also uses the average value of the rank information corresponding to one strategy, the average value of the rank information corresponding to the experimental set of that strategy, and the average value of the rank information corresponding to the control set of that strategy.
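The formula itself is not reproduced in this text. As a hedged illustration only, the sketch below assumes one form that is consistent with the definitions above: the difference between the average rank of the experimental set and the average rank of the control set, divided by the square root of $\sigma^2(1/n + 1/m)$, with the occurrence probability taken as a two-sided tail probability of the standard normal distribution. The function name `rank_z_test`, the example ranks, and this exact form are assumptions, not the formula of the embodiment.

```python
import math
from statistics import mean, variance


def rank_z_test(exp_ranks, ctrl_ranks):
    """Return (t, p) for globally assigned ranks of one strategy.

    The ranks come from a single shared ordering, so they need not be the
    consecutive integers 1..n+m. Assumed form (not from the source):
        t = (mean_exp - mean_ctrl) / sqrt(sigma2 * (1/n + 1/m))
    """
    n, m = len(exp_ranks), len(ctrl_ranks)
    # Sample variance of all n+m ranks (denominator n+m-1).
    sigma2 = variance(list(exp_ranks) + list(ctrl_ranks))
    diff = mean(exp_ranks) - mean(ctrl_ranks)
    t = diff / math.sqrt(sigma2 * (1 / n + 1 / m))
    # Two-sided tail probability under the standard normal distribution.
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p


# Hypothetical global ranks for one strategy.
t, p = rank_z_test([10, 40, 50], [20, 30])
print(round(t, 4), round(p, 4))
```

Because the statistic is compared with a normal distribution rather than a lookup table, the same function applies unchanged to sets with very many objects.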
That t obeys a normal distribution is demonstrated as follows:
The null hypothesis of the rank sum test is H0: the experimental group T and the control group C are identically distributed. This null hypothesis is now examined. In the standard, noise-free procedure, the sequences of the experimental group and the control group are merged, and the resulting ranks take exactly the values 1 to m+n:
$1, 2, \ldots, m+n$: $R_1, R_2, \ldots, R_m, \ldots, R_{m+n}$
Here this property does not hold, because the rank information comes from the shared ordering. The m+n rank values are therefore arranged from small to large as $S_1, \ldots, S_{m+n}$.
In this case, $R_1, R_2, \ldots, R_m, \ldots, R_{m+n}$ is a permutation of $S_1, \ldots, S_{m+n}$; each S is a constant, and each R is a random variable taking values in S.
When H0 holds, the random vector $(R_1, R_2, \ldots, R_m, \ldots, R_{m+n})$ takes each of the $(n+m)!$ possible permutations with equal probability. Then, for any two samples $R_i$, $R_j$, the following three propositions can be demonstrated:
(1)
(2)
(3)
Wherein,
they are demonstrated below:
(1) is easily proved.
(2)
(3)
The original proposition is now demonstrated.
At this point the statistic can be considered to obey a normal distribution (by the central limit theorem: the values range from 1 to 1 billion and are relatively uniform, so no long-tail condition occurs), so only its mean and variance need to be calculated. Its mean is clearly 0, so only the variance needs to be calculated.
Therefore, t obeys a normal distribution.
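The intermediate formulas of this proof are not reproduced in the text. The derivation below is a sketch under assumptions rather than the proof of the embodiment. It relies on standard results for drawing without replacement from the constants $S_1, \ldots, S_N$ with $N = n + m$, and it introduces the symbols $\bar S$, $\bar R_T$ and $\bar R_C$ (the averages of all rank values, of the n experimental ranks and of the m control ranks) as hypothetical notation. It shows that a difference of average ranks has mean 0 and variance $\sigma^2(1/n + 1/m)$ under H0, so dividing by the square root of that variance gives a statistic with unit variance.

```latex
% Notation (assumed): N = n + m, \bar S = \frac{1}{N}\sum_k S_k,
% \sigma^2 = \frac{1}{N-1}\sum_k (S_k - \bar S)^2.
\begin{align*}
E[R_i] &= \bar S, \\
\operatorname{Var}(R_i) &= \frac{1}{N}\sum_{k=1}^{N}(S_k-\bar S)^2
  = \frac{N-1}{N}\,\sigma^2, \\
\operatorname{Cov}(R_i,R_j) &= -\frac{\sigma^2}{N} \quad (i \neq j), \\
\operatorname{Var}(\bar R_T) &= \frac{1}{n^2}\Bigl[n\,\frac{N-1}{N}\,\sigma^2
  - n(n-1)\,\frac{\sigma^2}{N}\Bigr] = \frac{m}{nN}\,\sigma^2, \\
\bar R_T - \bar R_C &= \frac{N}{m}\,\bigl(\bar R_T - \bar S\bigr),
  \qquad E[\bar R_T - \bar R_C] = 0, \\
\operatorname{Var}(\bar R_T - \bar R_C) &= \frac{N^2}{m^2}\cdot\frac{m}{nN}\,\sigma^2
  = \sigma^2\Bigl(\frac{1}{n}+\frac{1}{m}\Bigr).
\end{align*}
```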
In summary, with this method all objects are ordered only once, and the rank sum test can then be performed for all experiments run on the online experiment platform, which saves the computing resources and time that would otherwise be consumed by ordering each experiment separately. The method can be widely used in non-parametric tests that use the ranks of data as statistics, and can be applied to large-scale scenarios such as online experiment platforms.
Referring to fig. 6, a schematic structural diagram of a policy-based data analysis device according to an embodiment of the present invention is shown, where the device has a function of implementing the policy-based data analysis method in the foregoing method embodiment, and the function may be implemented by hardware or implemented by executing corresponding software by hardware. As shown in fig. 6, the policy-based data analysis device 600 may include:
The acquiring module 601 is configured to acquire an original object set corresponding to a plurality of preset policies; each object in the original object set carries identification information; the identification information comprises an indication identifier and analysis data corresponding to the indication identifier; the indication identifier corresponding to each object in the original object set is the same; the original object set comprises a first object set and a second object set; each object in the first set of objects is an object that is subject to one of a plurality of policies;
the sorting module 602 is configured to sort the objects in the original object set based on the analysis data corresponding to each object in the original object set, so as to obtain an object sequence; each object in the sequence of objects carries rank information; the rank information characterizes the position of the object in the object sequence;
a determining module 603, configured to determine an experimental set corresponding to each policy from the first object set, and determine a control set corresponding to each policy from the second object set;
the analysis module 604 is configured to perform policy analysis based on the rank information carried by the objects in the experimental set and the control set corresponding to each policy, so as to obtain a data analysis result corresponding to each policy; the data analysis result characterizes whether implementing the policy makes a difference to the identification information.
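The cooperation of the sorting module and the determining module can be sketched as follows. This is a minimal Python illustration, not the implementation of the device: the tuple layout, the policy names, and the function name `rank_once_and_split` are hypothetical, ties are not handled, and the sketch assumes that every policy uses the whole second object set as its control set, which the description leaves open.

```python
from collections import defaultdict


def rank_once_and_split(objects):
    """Sort all objects once and split their ranks per policy.

    `objects` is a list of (object_id, policy, value) tuples, where `policy`
    is None for objects of the second object set (no policy applied).
    Returns {policy: (experimental_ranks, control_ranks)}.
    """
    # Sorting module: a single shared ordering; rank = 1-based position.
    ordered = sorted(objects, key=lambda o: o[2])
    experimental = defaultdict(list)
    control = []
    for rank, (_obj_id, policy, _value) in enumerate(ordered, start=1):
        if policy is None:
            control.append(rank)
        else:
            experimental[policy].append(rank)
    # Determining module. Assumption: every policy shares the whole second
    # object set as its control set.
    return {p: (ranks, list(control)) for p, ranks in experimental.items()}


# Hypothetical objects: (id, policy, analysis data).
objects = [
    ("a", "policy_1", 3.2), ("b", "policy_2", 1.5), ("c", None, 2.7),
    ("d", "policy_1", 0.9), ("e", None, 4.1), ("f", "policy_2", 3.8),
]
for policy, (exp, ctrl) in rank_once_and_split(objects).items():
    print(policy, exp, ctrl)
```

The sort runs once regardless of the number of policies; each policy then only reads the ranks it needs for its own analysis.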
In some possible embodiments, the first object set includes a plurality of sub-object sets that are in one-to-one correspondence with the plurality of policies;
the acquiring module is configured to:
determining a policy verification requirement for each of the plurality of policies;
according to the policy verification requirement of each policy, implementing the corresponding policy on the objects in the sub-object set corresponding to each policy;
and under the condition that implementation of the plurality of policies is completed for the plurality of sub-object sets, acquiring the original object set corresponding to the plurality of policies.
In some possible embodiments, the acquiring module is configured to:
determining a policy application scope and a policy application period of each policy;
and implementing the corresponding policy on the objects in the sub-object set corresponding to each policy according to the policy application scope and the policy application period of each policy.
In some possible embodiments, the acquiring module is configured to:
determining a unified application scope based on the policy application scope of each policy in the case that the implementation of the plurality of policies is completed for the plurality of sub-object sets;
determining a unified application period based on the policy application period of each policy;
and collecting the original object set corresponding to the plurality of policies according to the unified application scope and the unified application period.
In some possible embodiments, the acquiring module is configured to:
determining an application scope intersection based on the policy application scope of each policy;
determining the unified application scope according to the application scope intersection;
determining an application period intersection based on the policy application period of each policy;
and determining the unified application period according to the application period intersection.
In some possible embodiments, the acquiring module is configured to:
if the application period intersection contains an abnormal period corresponding to abnormal data, determining a difference period according to the application period intersection and the abnormal period;
and determining the unified application period according to the difference period.
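The intersection and difference operations described above can be sketched in Python as follows. This is only an illustration under assumptions: scopes are modeled as sets of labels, periods as half-open `(start, end)` pairs of comparable values such as day numbers, and the difference period is used directly as the unified application period. The function names and the example values are hypothetical.

```python
def unified_scope(scopes):
    """Intersection of the application scopes (sets) of all policies."""
    return set.intersection(*map(set, scopes))


def unified_period(periods, abnormal=None):
    """Intersect (start, end) periods, then cut out an abnormal period.

    Returns a list of (start, end) pairs, since removing an abnormal period
    from the middle of the intersection leaves two pieces.
    """
    start = max(p[0] for p in periods)
    end = min(p[1] for p in periods)
    if start >= end:
        return []                      # the periods do not overlap
    if abnormal is None:
        return [(start, end)]
    a_start, a_end = abnormal
    pieces = [(start, min(end, a_start)), (max(start, a_end), end)]
    return [(s, e) for s, e in pieces if s < e]


# Hypothetical scopes and periods (day numbers) of three policies.
scopes = [{"cn", "us", "jp"}, {"cn", "jp"}, {"jp", "cn", "de"}]
periods = [(1, 14), (3, 10), (2, 12)]
print(sorted(unified_scope(scopes)))
print(unified_period(periods, abnormal=(5, 6)))
```

Returning a list of pieces rather than a single pair is a design choice of this sketch; it keeps the case where the abnormal period splits the intersection explicit.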
In some possible embodiments, the acquiring module is configured to:
collecting, according to the unified application scope and the unified application period, the original object set corresponding to the plurality of policies with a unified data return caliber;
the unified data return caliber includes a data request time caliber or a data arrival time caliber.
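Why a single caliber matters can be seen in a small Python sketch. It assumes, hypothetically, that each returned record carries both a `request_time` and an `arrival_time` field and that the unified application period is a half-open `(start, end)` pair; the function name `collect` and the field names are illustrative and not part of the embodiment.

```python
def collect(records, period, caliber="request"):
    """Keep records whose chosen timestamp falls inside `period`.

    `caliber` selects one timestamp for every policy: "request" uses the
    data request time, "arrival" uses the data arrival time.
    """
    field = {"request": "request_time", "arrival": "arrival_time"}[caliber]
    start, end = period
    return [r for r in records if start <= r[field] < end]


# Hypothetical records: the second one is requested inside the period but
# arrives after it ends.
records = [
    {"id": 1, "request_time": 4, "arrival_time": 5},
    {"id": 2, "request_time": 9, "arrival_time": 11},
]
print([r["id"] for r in collect(records, (0, 10), caliber="request")])
print([r["id"] for r in collect(records, (0, 10), caliber="arrival")])
```

The two calibers select different records near the period boundary, so applying the same caliber to all policies keeps their experimental and control sets comparable.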
In some possible embodiments, the determining module is configured to:
and determining an experiment set corresponding to each strategy from the sub-object set corresponding to each strategy.
In some possible embodiments, the analysis module is configured to:
performing strategy analysis on the rank information carried by the objects in the experimental set and the control set corresponding to each strategy by utilizing the Mann-Whitney rank sum test to obtain first statistical data corresponding to each strategy;
and determining a data analysis result corresponding to each strategy according to the first statistical data corresponding to each strategy and preset critical data.
In some possible embodiments, the analysis module is configured to:
performing strategy analysis on rank information carried by objects in an experimental set and a control set corresponding to each strategy by utilizing non-tightly ordered rank sum test to obtain second statistical data corresponding to each strategy; the second statistical data obeys normal distribution; the second statistical data comprises differences between the experimental set and the control set and occurrence probability;
and determining a data analysis result corresponding to each strategy according to the difference corresponding to each strategy and the occurrence probability.
It should be noted that, in the apparatus provided in the foregoing embodiment, when implementing the functions thereof, only the division of the foregoing functional modules is used as an example, in practical application, the foregoing functional allocation may be implemented by different functional modules, that is, the internal structure of the device is divided into different functional modules, so as to implement all or part of the functions described above. In addition, the apparatus and the method embodiments provided in the foregoing embodiments belong to the same concept, and specific implementation processes of the apparatus and the method embodiments are detailed in the method embodiments and are not repeated herein.
An embodiment of the present invention provides an electronic device, where the electronic device includes a processor and a memory, where at least one instruction or at least one section of program is stored in the memory, where the at least one instruction or the at least one section of program is loaded and executed by the processor to implement a policy-based data analysis method as provided in the above method embodiment.
The memory may be used to store software programs and modules that the processor executes to perform various functional applications and cluster result determinations by running the software programs and modules stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, application programs required for functions, and the like; the storage data area may store data created according to the use of the device, etc. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, the memory may also include a memory controller to provide access to the memory by the processor.
The method embodiments provided by the embodiments of the present invention may be executed in a computer terminal, a server, or a similar computing device. Taking running on a server as an example, FIG. 7 is a block diagram of the hardware structure of a server running a policy-based data analysis method according to an embodiment of the present invention. As shown in FIG. 7, the server 2000 may vary considerably with configuration or performance, and may include one or more central processing units (Central Processing Units, CPU) 2010 (the processor 2010 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA), a memory 2030 for storing data, and one or more storage media 2020 (e.g., one or more mass storage devices) storing the application 2023 or the data 2022. The memory 2030 and the storage medium 2020 may be transitory or persistent storage. The program stored on the storage medium 2020 may include one or more modules, each of which may include a series of instruction operations on the server. Still further, the central processor 2010 may be configured to communicate with the storage medium 2020 and to execute, on the server 2000, the series of instruction operations in the storage medium 2020. The server 2000 may also include one or more power supplies 2060, one or more wired or wireless network interfaces 2050, one or more input/output interfaces 2040, and/or one or more operating systems 2021, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
The input-output interface 2040 may be used to receive or transmit data via a network. The specific example of the network described above may include a wireless network provided by a communication provider of the server 2000. In one example, the input-output interface 2040 includes a network adapter (Network Interface Controller, NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the input/output interface 2040 may be a Radio Frequency (RF) module for communicating with the internet wirelessly.
It will be appreciated by those of ordinary skill in the art that the configuration shown in fig. 7 is merely illustrative and is not intended to limit the configuration of the electronic device described above. For example, the server 2000 may also include more or fewer components than shown in fig. 7, or have a different configuration than shown in fig. 7.
Embodiments of the present invention also provide a computer readable storage medium that may be disposed in an electronic device to hold at least one instruction or at least one program related to implementing a policy-based data analysis method, the at least one instruction or the at least one program being loaded and executed by the processor to implement the policy-based data analysis method provided by the above-described method embodiments.
Optionally, in the present embodiment, the storage medium may include, but is not limited to: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other various media capable of storing program code.
It should be noted that: the sequence of the embodiments of the present invention is only for description, and does not represent the advantages and disadvantages of the embodiments. And the foregoing description has been directed to specific embodiments of this specification. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
Embodiments of the present invention also provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium and executes the computer instructions to cause the computer device to perform the policy-based data analysis method described above.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments in part.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of the invention.

Claims (13)

1. A policy-based data analysis method, the method comprising:
acquiring an original object set corresponding to a plurality of preset strategies; each object in the original object set carries identification information; the identification information comprises an indication identifier and analysis data corresponding to the indication identifier; the indication identifier corresponding to each object in the original object set is the same; the original object set comprises a first object set and a second object set; each object in the first set of objects is an object that is subject to one of the plurality of policies;
Sorting the objects in the original object set based on the analysis data corresponding to each object in the original object set to obtain an object sequence; each object in the object sequence carries rank information; the rank information characterizes the position of an object in the sequence of objects;
determining an experimental set corresponding to each strategy from the first object set, and determining a control set corresponding to each strategy from the second object set;
performing strategy analysis based on the rank information carried by the objects in the experimental set and the control set corresponding to each strategy to obtain a data analysis result corresponding to each strategy; the data analysis result characterizes whether implementing the strategy makes a difference to the identification information.
2. The policy-based data analysis method according to claim 1, wherein said first set of objects comprises a plurality of sub-sets of objects corresponding one-to-one to said plurality of policies;
said acquiring an original object set corresponding to a plurality of preset policies comprises:
determining a policy verification requirement for each of the plurality of policies;
according to the policy verification requirement of each policy, implementing the corresponding policy on the objects in the sub-object set corresponding to each policy;
and under the condition that implementation of the plurality of policies is completed for the plurality of sub-object sets, acquiring the original object set corresponding to the plurality of policies.
3. The policy-based data analysis method according to claim 2, wherein said determining a policy verification requirement for each of said plurality of policies, and implementing, according to said policy verification requirement of each policy, the corresponding policy on the objects in the sub-object set corresponding to each policy, comprises:
determining a policy application scope and a policy application period of each policy;
and implementing the corresponding strategy for the objects in the sub-object set corresponding to each strategy according to the strategy application range and the strategy application period of each strategy.
4. The policy-based data analysis method according to claim 3, wherein said acquiring the original object set corresponding to the plurality of policies under the condition that implementation of the plurality of policies is completed for the plurality of sub-object sets comprises:
determining a unified application scope based on the policy application scope of each policy under the condition that the implementation of the plurality of policies is completed for the plurality of sub-object sets;
determining a unified application period based on the policy application period of each policy;
and collecting the original object set corresponding to the plurality of policies according to the unified application scope and the unified application period.
5. The policy-based data analysis method according to claim 4, wherein said determining a unified application scope based on the policy application scope of each policy, and determining a unified application period based on the policy application period of each policy, comprises:
determining an application scope intersection based on the policy application scope of each policy;
determining the unified application scope according to the application scope intersection;
determining an application period intersection based on the policy application period of each policy;
and determining the unified application period according to the application period intersection.
6. The policy-based data analysis method according to claim 5, wherein said determining said uniform application period from said application period intersection comprises:
if the application period intersection contains an abnormal period corresponding to abnormal data, determining a difference period according to the application period intersection and the abnormal period;
and determining the unified application period according to the difference period.
7. The policy-based data analysis method according to any one of claims 4 to 6, wherein said collecting the original object set corresponding to the plurality of policies according to the unified application scope and the unified application period comprises:
collecting, according to the unified application scope and the unified application period, the original object set corresponding to the plurality of policies with a unified data return caliber;
the unified data return caliber comprises a data request time caliber or a data arrival time caliber.
8. The policy-based data analysis method according to claim 2, wherein said determining an experimental set corresponding to each policy from said first object set comprises:
determining the experimental set corresponding to each policy from the sub-object set corresponding to each policy.
9. The policy-based data analysis method according to any one of claims 1-6 and 8, wherein said performing policy analysis based on the rank information carried by the objects in the experimental set and the control set corresponding to each policy to obtain a data analysis result corresponding to each policy comprises:
performing policy analysis on the rank information carried by the objects in the experimental set and the control set corresponding to each policy by utilizing the Mann-Whitney rank sum test to obtain first statistical data corresponding to each policy;
and determining the data analysis result corresponding to each policy according to the first statistical data corresponding to each policy and preset critical data.
10. The policy-based data analysis method according to any one of claims 1-6 and 8, wherein said performing policy analysis based on the rank information carried by the objects in the experimental set and the control set corresponding to each policy to obtain a data analysis result corresponding to each policy comprises:
performing policy analysis on the rank information carried by the objects in the experimental set and the control set corresponding to each policy by using the non-tightly ordered rank sum test to obtain second statistical data corresponding to each policy; the second statistical data obeys a normal distribution; the second statistical data comprises the difference between the experimental set and the control set and the occurrence probability of the difference;
and determining the data analysis result corresponding to each policy according to the difference and the occurrence probability corresponding to each policy.
11. A policy-based data analysis device, the device comprising:
the acquisition module is used for acquiring an original object set corresponding to a plurality of preset strategies; each object in the original object set carries identification information; the identification information comprises an indication identifier and analysis data corresponding to the indication identifier; the indication identifier corresponding to each object in the original object set is the same; the original object set comprises a first object set and a second object set; each object in the first set of objects is an object that is subject to one of the plurality of policies;
The sorting module is used for sorting the objects in the original object set based on the analysis data corresponding to each object in the original object set to obtain an object sequence; each object in the object sequence carries rank information; the rank information characterizes the position of an object in the sequence of objects;
the determining module is used for determining an experimental set corresponding to each strategy from the first object set and determining a control set corresponding to each strategy from the second object set;
the analysis module is used for carrying out strategy analysis based on the rank information carried by the objects in the experimental set and the control set corresponding to each strategy to obtain a data analysis result corresponding to each strategy; the data analysis result characterizes whether implementing the strategy makes a difference to the identification information.
12. An electronic device comprising a processor and a memory, wherein the memory has stored therein at least one instruction or at least one program that is loaded and executed by the processor to implement the policy-based data analysis method of any of claims 1-10.
13. A computer readable storage medium having stored therein at least one instruction or at least one program loaded and executed by a processor to implement the policy-based data analysis method of any of claims 1 to 10.
CN202311287862.2A 2023-10-07 2023-10-07 Policy-based data analysis method and device, electronic equipment and storage medium Pending CN117421550A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311287862.2A CN117421550A (en) 2023-10-07 2023-10-07 Policy-based data analysis method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311287862.2A CN117421550A (en) 2023-10-07 2023-10-07 Policy-based data analysis method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117421550A true CN117421550A (en) 2024-01-19

Family

ID=89525528

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311287862.2A Pending CN117421550A (en) 2023-10-07 2023-10-07 Policy-based data analysis method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117421550A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118505237A (en) * 2024-05-27 2024-08-16 江苏思行达信息技术股份有限公司 Intelligent customer service system of electric power business hall based on domestic large model and use method

Legal Events

Date Code Title Description
PB01 Publication