CN114297235A - Risk address identification method and system and electronic equipment - Google Patents
- Publication number
- CN114297235A
- Authority
- CN
- China
- Prior art keywords
- address
- risk
- order
- address data
- preset
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a risk address identification method, a system and electronic equipment. The method comprises: acquiring order address data; acquiring target risk address data in a preset risk address library; calculating an address similarity and a distance between the order address and the risk address based on the order address data and the risk address data; and identifying whether the order address is at risk based on the address similarity, the distance between the order address and the risk address, and a preset identification rule. By calculating the similarity and the distance between the order address data and the target risk address data in an existing risk address library, the method determines whether the order address is a slight modification of a known risk address, and hence whether the order address is a risk address, so that suspected fraudulent orders are identified efficiently and with high accuracy.
Description
Technical Field
The invention relates to the field of computer data analysis, in particular to a risk address identification method, a risk address identification system and electronic equipment.
Background
Address data is an important data asset for e-commerce and financial enterprises and is also important risk-control information; it should deliver its due value in internet risk control and fraud prevention. In current online transactions, statistics show that most cash-out and fraudulent orders bypass the existing risk-control rules by slightly modifying the order address. For example, the addresses of the existing and fraudulent cases are ". about.3. about.1. about.2. about.4. about.5. about.ALS". about.3. about.1. about.201. about..
At present, whether order addresses are highly similar is generally judged by manual observation. Manual observation is inefficient and lacks a systematic judgment method, so address similarity is difficult to measure accurately. A computer-executable method that can efficiently identify risk addresses and measure their similarity is therefore needed.
Disclosure of Invention
The invention aims to: a risk address identification method, a risk address identification system and electronic equipment are provided.
The technical scheme of the invention is as follows: in a first aspect, the present invention provides a risk address identification method, including:
acquiring order address data;
acquiring target risk address data in a preset risk address library;
calculating address similarity and a distance between the order address and the risk address based on the order address data and the risk address data;
and identifying whether the order address has risk or not based on the address similarity, the distance between the order address and the risk address and a preset identification rule.
In a preferred embodiment, the obtaining order address data includes:
acquiring an order address;
and standardizing the order address to obtain standard address data, wherein the standard address data at least comprises hierarchical address data corresponding to each preset hierarchy.
In a preferred embodiment, the normalizing the order address to obtain the standard address data includes:
performing Chinese parsing on the order address to obtain hierarchical address data corresponding to each preset hierarchy;
and judging whether the hierarchical address data corresponding to the preset hierarchy is empty one by one, and if so, completing the hierarchical address data.
In a preferred embodiment, completing the hierarchical address data comprises:
querying a preset address library to complete the hierarchical address data based on a preset rule and a pre-trained conditional random field model.
In a preferred embodiment, performing Chinese parsing on the order address to obtain hierarchical address data corresponding to each preset hierarchy includes:
performing Chinese parsing on the order address based on a pre-trained conditional random field model to obtain standardized order address data.
In a preferred embodiment, the acquiring target risk address data in the preset risk address library includes:
acquiring a preset risk address library in a preset range based on the order address data;
and acquiring target risk address data in the preset risk address base, wherein the target risk address data is risk address data which is in the preset risk address base and has the shortest distance with the order address data.
In a preferred embodiment, the calculating the address similarity and the distance between the order address and the risk address based on the order address data and the risk address data includes:
calculating the similarity between the order address data and the target risk address data based on a SimHash algorithm to obtain address similarity;
and calculating the distance between the order address data and the target risk address data based on a GeoHash algorithm to obtain the distance between the order address and the risk address.
In a preferred embodiment, the identifying whether the order address has the risk based on the address similarity, the distance between the order address and the risk address, and the preset identification rule includes:
scoring the order address data based on a preset scoring rule, the address similarity and the distance between the order address and a risk address to obtain a target score;
and identifying whether the order address has risk or not based on a preset identification rule and the target score.
In a second aspect, the present invention provides a risk address identification system, the system comprising:
the first acquisition module is used for acquiring order address data;
the second acquisition module is used for acquiring target risk address data in a preset risk address library;
the calculation module is used for calculating address similarity and the distance between the order address and the risk address based on the order address data and the risk address data;
and the identification module is used for identifying whether the order address has the risk or not based on the address similarity, the distance between the order address and the risk address and a preset identification rule.
In a third aspect the present invention provides an electronic device comprising:
one or more processors; and
a memory associated with the one or more processors for storing program instructions that, when read and executed by the one or more processors, perform the method of any of the first aspects.
The invention has the following advantages. A risk address identification method, a system and electronic equipment are provided, wherein the method comprises: acquiring order address data; acquiring target risk address data in a preset risk address library; calculating an address similarity and a distance between the order address and the risk address based on the order address data and the risk address data; and identifying whether the order address is at risk based on the address similarity, the distance between the order address and the risk address, and a preset identification rule. By calculating the similarity and the distance between the order address data and the target risk address data in an existing risk address library, the method determines whether the order address is a slight modification of a known risk address, and hence whether the order address is a risk address, so that suspected fraudulent orders are identified efficiently and with high accuracy.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a risk address identification method according to an embodiment of the present invention;
FIG. 2 is a diagram of a linear-chain conditional random field model in the risk address identification method according to an embodiment of the present invention;
FIG. 3 is a flowchart of CRF-based address parsing in the risk address identification method according to an embodiment of the present invention;
FIG. 4 is a block diagram of a risk address identification system according to an embodiment of the present invention;
FIG. 5 is an architecture diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As described in the background, statistics show that many cash-out and fraudulent orders bypass existing risk-control rules by slightly modifying the order address; that is, the addresses of cash-out and fraudulent orders differ from one another but are very similar. Thus, if the address in an order closely resembles the address in a historical order already determined and recorded as cash-out or fraud, the order carries a very high risk of cash-out or fraud. At present, risk addresses are identified by manually observing whether an order address is highly similar to the addresses in historical risk orders (historical orders determined and recorded as cash-out or fraudulent), but manual observation is inefficient, its judgments are not accurate enough, and the degree of address similarity is difficult to measure accurately.
To solve these problems, the invention provides a risk address identification method, a risk address identification system and electronic equipment. Similarity analysis and distance calculation are performed between the order address and the risk addresses in the existing risk address library, and a risk score is produced to help evaluate whether the transaction is at risk, so that fraudulent orders are intercepted on the basis of address information.
The risk address identification method, system and electronic device provided by the present invention are further described below with reference to specific embodiments.
Example one: the present embodiment provides a risk address identification method. As shown in FIG. 1, the method includes:
S110, acquiring order address data.
Preferably, the present step comprises:
S111, acquiring an order address.
S112, standardizing the order address to obtain standard address data, wherein the standard address data at least comprises hierarchical address data corresponding to each preset hierarchy.
Specifically, in this embodiment, the preset hierarchies are divided according to administrative level and specifically include: province or municipality level, town or street level, road or village level, parcel or residential community level, building number level, and floor and household number level.
In one embodiment, S112 includes:
S112-1, performing Chinese parsing on the order address to obtain hierarchical address data corresponding to each preset hierarchy.
More preferably, Chinese parsing is performed on the order address based on a pre-trained conditional random field model to obtain standardized order address data.
Specifically, address parsing is an application of natural language processing to the address domain. Its object is a textual address, a composite of geographical named entities such as place names and other components. Existing place-name recognition research based on conditional random fields uses the same general-purpose corpus as Chinese word segmentation. However, place-name recognition and address parsing differ in object and granularity, so those methods and their recognition results cannot be used directly for address coding. Chinese address parsing is in essence a process of segmenting and labeling address text. The difficulty of address segmentation lies in identifying geographical named entities such as place names and organization names, which carry geographical position attributes and are numerous and varied in expression. Address labeling needs to obtain the occurrence probability of each component from the address context and determine the type of the address component accordingly, which is a statistical problem.
Among statistical models, the conditional random field model is a machine learning model based on probability statistics and is commonly used for tasks such as word segmentation, part-of-speech tagging and named entity recognition in natural language processing. It performs well on sequence labeling problems, can express long-distance context dependence, and has strong generalization and learning capability, which meets the requirements of batch address parsing and rapid address coding.
The training samples are obtained by batch labeling data with the output of the online word segmentation model and then adjusting the labels manually. The output of the online model is standardized data, whereas actual online addresses (the addresses to be segmented) are non-standard. For example, a source address may spell a number as a word ("four ...") where the labeled address uses a digit ("4 ..."); such features need to be added to the training samples. A source address may also omit certain levels or level suffixes: for example, "Changsha City, Hunan Province ……" may be written as "Changsha, Hunan ……" or simply "Changsha ……". Some levels or level suffixes can therefore be removed from a certain proportion of the samples. Further features need to be found by testing with the pre-trained model and checking the results manually.
The CRF model segments the user-input address A into words and separates the address into its levels.
The CRF-based address parsing process is shown in FIG. 3. First, an address element classification and labeling system is defined, and address element labeling and format conversion are performed on the original corpus to obtain a standardized labeled corpus; the labeled corpus is obtained by an iterative labeling method that mixes self-training with manual labeling. Next, a feature template is formulated and the labeled corpus is used for model training: a large number of feature functions are generated from the template and their weights are calculated to obtain the trained model. In other words, the model learns the segmentation rules of the corpus, including the occurrence frequency of each address component and the context constraints. Finally, the trained model is used for prediction to obtain the optimal labeled address sequence, thereby parsing unknown addresses. Generally, the observation sequence X = (X1, X2, X3, …, Xn) is defined as the address string to be labeled, and the state sequence Y = (Y1, Y2, Y3, …, Yn) as the corresponding labels.
S112-2, judging whether the hierarchical address data corresponding to the preset hierarchy is empty one by one, and if so, completing the hierarchical address data.
In one embodiment, completing the hierarchical address data comprises:
querying a preset address library to complete the hierarchical address data based on a preset rule and a pre-trained conditional random field model.
Specifically, the address standardization process manually labels some original addresses on the basis of preset rules plus a CRF (conditional random field), maps them to the 10-level standard address system, and then computes the 10-level addresses in batches. Finally, the standardized address is completed by a probability statistics method.
The conditional random field model can be viewed as an undirected graph model; it is a discriminative probabilistic model that computes the conditional probability of specified output node values given the input nodes. Referring to FIG. 2, this embodiment mainly uses a special conditional random field for sequence labeling, namely the linear-chain conditional random field: X and Y are sequences of random variables with a chain structure of length n, and, given X, the conditional probability distribution P(Y | X) of Y satisfies the Markov property:
P(Yi | X, Y1, …, Yi-1, Yi+1, …, Yn) = P(Yi | X, Yi-1, Yi+1)
Then P(Y | X) is called a linear-chain conditional random field. Given that the input variable X takes the value x, the conditional probability that the output variable Y takes the value y is as follows:
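A standard parameterized form of this conditional probability, consistent with the weights λk and μl named in this description, can be written as below. The symbols t_k (transition feature functions), s_l (state feature functions) and Z(x) (normalization factor) are the usual textbook names and are assumptions here, not taken from the source.

```latex
P(y \mid x) = \frac{1}{Z(x)} \exp\Bigl( \sum_{i,k} \lambda_k \, t_k(y_{i-1}, y_i, x, i)
              + \sum_{i,l} \mu_l \, s_l(y_i, x, i) \Bigr),
\qquad
Z(x) = \sum_{y} \exp\Bigl( \sum_{i,k} \lambda_k \, t_k(y_{i-1}, y_i, x, i)
              + \sum_{i,l} \mu_l \, s_l(y_i, x, i) \Bigr)
```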
The linear-chain model considers only the feature functions of adjacent nodes. Several features are constructed for each item in the sequence, each feature is expressed as a feature function with a condition, and the feature functions are combined by weighted summation during modeling. A CRF is determined by its feature functions and the corresponding weights λk and μl; the values of λk and μl are the parameters to be learned.
For the observation sequence X, given a conditional random field P(Y | X) with known model parameters, the state sequence Y* with the maximum conditional probability is output, i.e. the address sequence is labeled:
Y* = argmax_Y P(Y | X)
When calculating probabilities, the CRF model usually computes all of them recursively with the forward-backward algorithm and then obtains the expected values of the features. The learning strategy is to estimate the model parameters by maximum likelihood to obtain the optimal weights. The feature functions in sequence labeling problems have very high dimensionality, so an optimization algorithm such as improved iterative scaling (IIS), gradient descent or a quasi-Newton method is required.
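The decoding step Y* = argmax P(Y | X) is commonly solved with the Viterbi algorithm, which the source does not name; the following is a minimal sketch under that assumption. The label set, the token-level emission scores (`emit`) and the transition scores (`trans`) are hypothetical toy values standing in for the weighted feature sums of a trained model.

```python
from typing import Dict, List


def viterbi(obs: List[str], labels: List[str],
            emit: Dict[str, Dict[str, float]],
            trans: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the label sequence with the highest total (log-linear) score."""
    if not obs:
        return []
    # best[i][y]: best score of any label path that ends in label y at position i
    best = [{y: emit[y].get(obs[0], 0.0) for y in labels}]
    back = []  # back[i][y]: best previous label for label y at position i + 1
    for token in obs[1:]:
        scores, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + trans[p].get(y, 0.0))
            scores[y] = (best[-1][prev] + trans[prev].get(y, 0.0)
                         + emit[y].get(token, 0.0))
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final label to recover the whole path.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical toy model: three address levels and three tokens.
labels = ["PROV", "CITY", "ROAD"]
emit = {"PROV": {"Hunan": 2.0}, "CITY": {"Changsha": 2.0}, "ROAD": {"Furong Road": 2.0}}
trans = {"PROV": {"CITY": 1.0}, "CITY": {"ROAD": 1.0}, "ROAD": {}}

print(viterbi(["Hunan", "Changsha", "Furong Road"], labels, emit, trans))
```

In a real CRF the scores would come from the learned weights applied to the feature functions, and decoding would typically run per character rather than per pre-split token.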
The address is parsed by the address standardization algorithm, which returns it in the 11-level standard address format.
S120, acquiring target risk address data in a preset risk address library.
Preferably, the present step comprises:
S121, acquiring a preset risk address library within a preset range based on the order address data.
Specifically, the preset risk address library stores known addresses that carry risk and is divided by region. Based on the order address, the preset risk address library whose distance from the order address is within the preset range is queried.
S122, acquiring target risk address data in the preset risk address library, wherein the target risk address data is the risk address data in the preset risk address library that has the shortest distance to the order address data.
Illustratively, based on the longitude and latitude in the order address data, the preset risk address library is searched within a specified number of kilometers of that position, and the closest risk address data found is the target risk address data.
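The radius search in S121 and S122 can be sketched as follows. The source does not say how the distance in kilometers is computed; the haversine great-circle formula is used here as an assumption, and the library entries, coordinates and radius are hypothetical.

```python
import math
from typing import List, Optional, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two latitude/longitude points in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_risk_address(order: Tuple[float, float],
                         risk_library: List[Tuple[str, float, float]],
                         radius_km: float) -> Optional[Tuple[float, str]]:
    """Return (distance_km, address_id) of the closest risk address within radius_km."""
    in_range = []
    for address_id, lat, lon in risk_library:
        d = haversine_km(order[0], order[1], lat, lon)
        if d <= radius_km:
            in_range.append((d, address_id))
    return min(in_range) if in_range else None


# Hypothetical risk library: (address id, latitude, longitude).
risk_library = [("A", 31.001, 121.0), ("B", 31.05, 121.0), ("C", 40.0, 116.0)]
print(nearest_risk_address((31.0, 121.0), risk_library, radius_km=10.0))
```

A linear scan is enough for a sketch; a production library divided by region would more plausibly use a spatial index or a GeoHash prefix lookup to limit the candidates.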
S130, calculating the address similarity and the distance between the order address and the risk address based on the order address data and the risk address data.
In one embodiment, this step includes:
S131, calculating the similarity between the order address data and the target risk address data based on a SimHash algorithm to obtain the address similarity.
Specifically, common solutions to text deduplication include cosine similarity, Euclidean distance, Jaccard similarity, the longest common substring and the like, but these methods cannot process massive data efficiently. For example, in a search engine many keywords are similar: users want similar content but search with different keywords, such as "good hot pot in Beijing" and "which hot pot in Beijing is good". The two are equivalent keywords, yet an ordinary hash function produces two hash strings that are far apart. The hash strings produced by SimHash are very close, so the similarity of two texts can be judged from them.
In information coding, the number of positions at which the corresponding bits of two valid codewords differ is called the code distance, also known as the Hamming distance. For example, 10101 and 00110 differ in the first, fourth and fifth bits, so their Hamming distance is 3. One way to compute the Hamming distance is to XOR the two bit strings and count the number of 1s in the result.
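The XOR-and-count method can be sketched in a few lines; the function name is illustrative only.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    # XOR leaves a 1 exactly where the two inputs disagree.
    return bin(a ^ b).count("1")


print(hamming_distance(0b10101, 0b00110))  # 3
```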
The SimHash calculation comprises the following steps:
(1) Word segmentation
Given a sentence, word segmentation is performed to obtain the effective feature vectors, and each feature vector is assigned one of 5 levels of weight (if a text is given, the feature vectors may be the words in the text, and the weight may be the number of times the word appears). For example, the sentence "Shanghai City, Minhang District, Yanchang Road, No. 1588" is segmented into "Shanghai City / Minhang District / Yanchang Road / No. 1588", and each feature vector is then assigned a weight: Shanghai City (4), Minhang District (5), Yanchang Road (3), No. 1588 (1), where the number in parentheses represents how important the word is in the whole sentence; the larger the number, the more important the word.
(2) Hash
The hash value of each feature vector is calculated with a hash function; the hash value is an n-bit signature consisting of the binary digits 0 and 1. For example, the hash value of "Shanghai City" is "100101" and the hash value of "Minhang District" is "101011". In this way, each string becomes a series of numbers.
(3) Weighting
On the basis of the hash values, all feature vectors are weighted, i.e. W = Hash × weight: where a bit is 1, the weight is taken as positive, and where a bit is 0, the weight is taken as negative. For example, weighting the hash value "100101" of "Shanghai City" (weight 4) yields W(Shanghai City) = "4 -4 -4 4 -4 4", and weighting the hash value "101011" of "Minhang District" (weight 5) yields W(Minhang District) = "5 -5 5 -5 5 5". The remaining feature vectors are processed in the same way.
(4) Merging
The weighted results of the feature vectors are accumulated position by position to form a single sequence. Taking the first two feature vectors as an example, "4 -4 -4 4 -4 4" for "Shanghai City" and "5 -5 5 -5 5 5" for "Minhang District" are summed as "4+5, -4-5, -4+5, 4-5, -4+5, 4+5", giving "9 -9 1 -1 1 9".
(5) Dimension reduction
For each position of the accumulated n-bit result, the bit is set to 1 if the value is greater than 0 and to 0 otherwise, which gives the SimHash value of the sentence; the similarity of different sentences is then judged from the Hamming distance between their SimHash values. For example, reducing the "9 -9 1 -1 1 9" calculated above gives the 0/1 string "101011", which is the SimHash signature.
(6) Calculating the Hamming distance
After each document obtains its SimHash signature, the Hamming distance between two signatures is calculated. According to empirical values, for 64-bit SimHash values the two texts are considered highly similar when the Hamming distance is within 3.
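Steps (2) to (6) can be sketched as follows. To reproduce the worked example, the 6-bit hash values are supplied as a lookup table rather than computed; a real implementation would apply an actual hash function (for example a 64-bit one) to each token. The function names are illustrative.

```python
from typing import Dict, List, Tuple


def simhash(features: List[Tuple[str, int]], hashes: Dict[str, str]) -> str:
    """Weight, merge and reduce per-token bit strings into one SimHash signature."""
    n = len(next(iter(hashes.values())))  # signature length in bits
    totals = [0] * n
    for token, weight in features:
        for i, bit in enumerate(hashes[token]):
            # Step (3): +weight for a 1 bit, -weight for a 0 bit.
            # Step (4): accumulate position by position.
            totals[i] += weight if bit == "1" else -weight
    # Step (5): positive totals become 1, all others become 0.
    return "".join("1" if t > 0 else "0" for t in totals)


def hamming(sig_a: str, sig_b: str) -> int:
    """Step (6): number of positions where two equal-length signatures differ."""
    return sum(a != b for a, b in zip(sig_a, sig_b))


# Hash values and weights taken from the worked example in the text.
hashes = {"Shanghai City": "100101", "Minhang District": "101011"}
features = [("Shanghai City", 4), ("Minhang District", 5)]

signature = simhash(features, hashes)
print(signature)                     # 101011
print(hamming(signature, "101111"))  # 1
```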
S132, calculating the distance between the order address data and the target risk address data based on the GeoHash algorithm to obtain the distance between the order address and the risk address.
Specifically, GeoHash is a method for encoding geographical coordinates that maps two-dimensional coordinates to a character string. Each string represents a particular rectangle, and all coordinates within the rectangle share that string. The longer the string, the higher the precision and the smaller the corresponding rectangle. To encode a geographical coordinate, the initial intervals are latitude [-90, 90] and longitude [-180, 180]; each interval is bisected, and the target latitude or longitude falls into either the left half or the right half. If it falls into the left half the bit is 0, and if it falls into the right half the bit is 1. The half-interval obtained in the previous step is then bisected again in the same way to obtain the next bit. Once the code length meets the precision required by the service, the longitude bits and latitude bits are interleaved according to the rule "longitude at even positions, latitude at odd positions" to obtain a new binary string. Finally, the binary string is translated into a character string according to the Base32 table to obtain the GeoHash string for the geographical coordinate. The proximity of two addresses is then judged from the similarity of their GeoHash strings.
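The encoding procedure described above can be sketched as follows, using the standard GeoHash Base32 alphabet. Judging proximity by the length of the shared prefix is one common reading of "similarity of the character strings" and is an assumption here; the function names are illustrative.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard GeoHash alphabet


def geohash_encode(lat: float, lon: float, length: int = 8) -> str:
    """Encode a latitude/longitude pair as a GeoHash string of the given length."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    bits = []
    use_lon = True  # position 0 (even) is a longitude bit, position 1 (odd) a latitude bit
    while len(bits) < length * 5:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits.append("1")  # right half: keep [mid, high]
            rng[0] = mid
        else:
            bits.append("0")  # left half: keep [low, mid]
            rng[1] = mid
        use_lon = not use_lon
    # Every 5 bits map to one Base32 character.
    return "".join(BASE32[int("".join(bits[i:i + 5]), 2)]
                   for i in range(0, len(bits), 5))


def common_prefix_len(a: str, b: str) -> int:
    """Length of the shared leading substring of two GeoHash strings."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n


print(geohash_encode(57.64911, 10.40744, length=4))  # u4pr
```

A longer shared prefix means the two points fall in the same, smaller rectangle. Two points on opposite sides of a cell boundary can be close yet share only a short prefix, so prefix matching is usually combined with a check of neighboring cells or an exact distance calculation.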
S140, identifying whether the order address is at risk based on the address similarity, the distance between the order address and the risk address, and a preset identification rule.
In one embodiment, the method comprises the steps of:
S141, scoring the order address data based on a preset scoring rule, the address similarity and the distance between the order address and the risk address to obtain a target score.
S142, identifying whether the order address is at risk based on the preset identification rule and the target score.
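The source does not disclose the preset scoring rule or the identification rule, so the following is only a hypothetical sketch of how S141 and S142 might combine the two signals. The weights (0.6 and 0.4), the 5 km radius and the threshold of 80 are invented for illustration.

```python
def risk_score(hamming: int, distance_km: float,
               bits: int = 64, radius_km: float = 5.0) -> float:
    """Hypothetical scoring rule: 0 (no risk signal) to 100 (identical and co-located)."""
    similarity = 1.0 - hamming / bits                      # 1.0 when signatures match
    proximity = max(0.0, 1.0 - distance_km / radius_km)    # 0.0 at or beyond the radius
    return 100.0 * (0.6 * similarity + 0.4 * proximity)


def is_risky(score: float, threshold: float = 80.0) -> bool:
    """Hypothetical identification rule: flag the order when the score reaches the threshold."""
    return score >= threshold


score = risk_score(hamming=3, distance_km=0.1)
print(round(score, 1), is_risky(score))
```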
Illustratively, the address risk is scored at the transaction stage where cash-out may occur, so that orders whose addresses carry a fraud risk are intercepted and then reviewed.
The address risk-control service interface is invoked via an RPC interface or a REST interface.
The risk address identification method provided by this embodiment evaluates the fraud risk of a transaction through similarity calculation between the order address and the addresses in the preset risk address library. Address standardization and parsing, address completion, similarity calculation and address distance calculation are all completed in real time, so the address anti-fraud risk assessment of a transaction can be completed within minutes. This greatly reduces the labor cost of risk address judgment, reduces the company's fraud losses, improves transaction efficiency and further improves the user's shopping experience.
Example two: the present embodiment provides a risk address identification system. As shown in FIG. 4, the system includes:
a first obtaining module 410, configured to obtain order address data;
a second obtaining module 420, configured to obtain target risk address data in a preset risk address library;
a calculating module 430, configured to calculate an address similarity and a distance between an order address and a risk address based on the order address data and the risk address data;
and the identifying module 440 is configured to identify whether the order address has a risk or not based on the address similarity, the distance between the order address and the risk address, and a preset identification rule.
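The cooperation of the four modules can be sketched as a simple pipeline. This is only one possible arrangement under stated assumptions: the class name `RiskAddressSystem`, the use of plain callables for the modules and the dictionary-based address records are hypothetical and are not prescribed by the source.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class RiskAddressSystem:
    get_order_address: Callable[[str], dict]                # first obtaining module 410
    get_target_risk_address: Callable[[dict], dict]         # second obtaining module 420
    calculate: Callable[[dict, dict], Tuple[float, float]]  # calculating module 430
    identify: Callable[[float, float], bool]                # identifying module 440

    def run(self, order_id: str) -> bool:
        """Return True when the order address is identified as risky."""
        order = self.get_order_address(order_id)
        risk = self.get_target_risk_address(order)
        similarity, distance = self.calculate(order, risk)
        return self.identify(similarity, distance)
```

Injecting the modules as callables keeps each one replaceable, which matches the later remark that the function allocation may be completed by different functional modules as needed.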
Preferably, the first obtaining module 410 includes:
a first obtaining unit 411 for obtaining an order address;
a standard processing unit 412, configured to perform standardization processing on the order address to obtain standard address data, where the standard address data at least includes hierarchical address data corresponding to each preset hierarchy.
More preferably, the standard processing unit 412 includes:
a parsing subunit 4121, configured to perform Chinese parsing on the order address to obtain hierarchical address data corresponding to each preset hierarchy;
a judging subunit 4122, configured to judge, one by one, whether the hierarchical address data corresponding to each preset hierarchy is empty;
a complementing subunit 4123, configured to complement the hierarchical address data when the judging subunit 4122 determines that the hierarchical address data corresponding to the preset hierarchy is empty.
More preferably, the complementing subunit 4123 is specifically configured to query a preset address library to complement the hierarchical address data based on preset rules and a pre-trained conditional random field model.
More preferably, the parsing subunit 4121 is specifically configured to perform Chinese parsing of the order address based on a pre-trained conditional random field model to obtain standardized order address data.
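The judging and complementing subunits can be illustrated with a minimal sketch that covers only the lookup in a preset address library; the conditional random field model and the preset rules mentioned above are not reproduced. The four level names, the library layout and the sample entries are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# Assumed preset hierarchies, ordered from coarsest to finest.
LEVELS = ("province", "city", "district", "street")

# Hypothetical preset address library: (level, name) -> (parent level, parent name).
ADDRESS_LIBRARY: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("city", "Nanjing"): ("province", "Jiangsu"),
    ("district", "Xuanwu"): ("city", "Nanjing"),
}


def complete_levels(parsed: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Fill empty hierarchy levels by looking up the parent of each known level."""
    completed = dict(parsed)
    # Walk from the finest level upward so a newly filled level can fill its own parent.
    for level in reversed(LEVELS):
        value = completed.get(level)
        if not value:
            continue  # nothing known at this level, so nothing to look up
        parent = ADDRESS_LIBRARY.get((level, value))
        if parent and not completed.get(parent[0]):
            completed[parent[0]] = parent[1]
    return completed
```

For example, an address parsed with only the district "Xuanwu" and a street would have its city and province filled from the library, while levels that cannot be resolved stay empty.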
Preferably, the second obtaining module 420 includes:
a second obtaining unit 421, configured to obtain a preset risk address library within a preset range based on the order address data;
a third obtaining unit 422, configured to obtain target risk address data in the preset risk address library, where the target risk address data is risk address data in the preset risk address library that has a shortest distance from the order address data.
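One way to picture units 421 and 422 is to bucket risk addresses by geohash cell and then pick the closest candidate. The source does not state how the preset range or the shortest distance is computed at this step, so the use of a shared geohash prefix as the "preset range" and of the haversine great-circle formula for distance are assumptions; the function names are hypothetical. Note that a prefix match alone ignores neighbouring cells, so a practical system would also need to check adjacent cells.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Standard geohash: interleave longitude/latitude bisection bits, 5 bits per char."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    bits, bit_count, even, chars = 0, 0, True, []
    while len(chars) < precision:
        if even:  # even-numbered bits refine longitude
            mid = (lon_lo + lon_hi) / 2
            bits = (bits << 1) | (1 if lon >= mid else 0)
            lon_lo, lon_hi = (mid, lon_hi) if lon >= mid else (lon_lo, mid)
        else:     # odd-numbered bits refine latitude
            mid = (lat_lo + lat_hi) / 2
            bits = (bits << 1) | (1 if lat >= mid else 0)
            lat_lo, lat_hi = (mid, lat_hi) if lat >= mid else (lat_lo, mid)
        even = not even
        bit_count += 1
        if bit_count == 5:
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance in metres on a sphere of radius 6371 km."""
    p1, p2 = math.radians(a[0]), math.radians(b[0])
    dlmb = math.radians(b[1] - a[1])
    h = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371000.0 * math.asin(math.sqrt(h))


def nearest_risk_address(order: Point, library: List[Point],
                         prefix_len: int = 5) -> Optional[Tuple[Point, float]]:
    """Unit 421: keep risk points in the order's geohash cell; unit 422: pick the closest."""
    cell = geohash_encode(order[0], order[1], prefix_len)
    candidates = [p for p in library if geohash_encode(p[0], p[1], prefix_len) == cell]
    if not candidates:
        return None
    best = min(candidates, key=lambda p: haversine_m(order, p))
    return best, haversine_m(order, best)
```

A shorter prefix widens the preset range and a longer one narrows it, which trades recall against the number of distance computations.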
More preferably, the calculation module 430 includes:
a first calculating unit 431, configured to calculate, based on a SimHash algorithm, a similarity between the order address data and the target risk address data to obtain an address similarity;
a second calculating unit 432, configured to calculate a distance between the order address data and the target risk address data based on a GEOhash algorithm to obtain a distance between the order address and the risk address.
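The similarity calculation of the first calculating unit 431 can be sketched with a basic SimHash. The source names the algorithm but not its parameters, so the 64-bit fingerprint, the MD5-based token hash, the unweighted character bigrams used as tokens (a stand-in for proper Chinese word segmentation) and the conversion of Hamming distance to a 0..1 similarity are all assumptions.

```python
import hashlib
from typing import List

BITS = 64  # assumed fingerprint width


def _tokens(text: str, n: int = 2) -> List[str]:
    """Character n-grams with whitespace removed; a stand-in for word segmentation."""
    text = "".join(text.split())
    if len(text) < n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def simhash(text: str) -> int:
    """Compute a 64-bit SimHash fingerprint with every token weighted equally."""
    weights = [0] * BITS
    for tok in _tokens(text):
        # Take the first 8 bytes of the MD5 digest as a 64-bit token hash.
        h = int.from_bytes(hashlib.md5(tok.encode("utf-8")).digest()[:8], "big")
        for i in range(BITS):
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def simhash_similarity(a: str, b: str) -> float:
    """Map the Hamming distance between two fingerprints to a similarity in 0..1."""
    distance = bin(simhash(a) ^ simhash(b)).count("1")
    return 1.0 - distance / BITS
```

Identical address strings always yield a similarity of 1.0; for differing strings the value depends on how many bigrams they share, so any decision threshold would have to be tuned on real address data.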
More preferably, the identification module 440 includes:
the scoring unit 441 is used for scoring the order address data based on a preset scoring rule, the address similarity and the distance between the order address and a risk address to obtain a target score;
an identifying unit 442, configured to identify whether the order address is at risk based on a preset identification rule and the target score.
The risk address identification system provided in this embodiment is used to execute the risk address identification method provided in the first embodiment, and the beneficial effects of the risk address identification system are the same as those of the risk address identification method, which are not described herein again.
It should be noted that: the risk address identification system provided in the above embodiment is only illustrated by the division of the functional modules when the risk address identification service is triggered, and in practical application, the function allocation may be completed by different functional modules according to needs, that is, the internal structure of the system is divided into different functional modules to complete all or part of the functions described above. In addition, the risk address identification system provided in the above embodiment and the risk address identification method provided in the first embodiment belong to the same concept, that is, the system is based on the method, and the specific implementation process thereof is described in the method embodiment, and is not described herein again.
Example three: the present embodiment provides an electronic device, which is shown in fig. 5 and includes: one or more processors; and
a memory associated with the one or more processors for storing program instructions that, when read and executed by the one or more processors, perform the risk address identification method disclosed in the above embodiments.
Fig. 5 illustrates an architecture of a computer system, which may specifically include a processor 510, a video display adapter 511, a disk drive 512, an input/output interface 513, a network interface 514, and a memory 520. The processor 510, the video display adapter 511, the disk drive 512, the input/output interface 513, the network interface 514, and the memory 520 may be communicatively connected by a communication bus 530.
The processor 510 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs to implement the technical solution provided in the present application.
The memory 520 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 520 may store an operating system 521 for controlling the operation of the electronic device 500, and a Basic Input Output System (BIOS) 522 for controlling low-level operations of the electronic device 500. In addition, a web browser 523, a data storage management system 524, an icon font processing system 525, and the like may also be stored. The icon font processing system 525 may be an application program that implements the operations of the foregoing steps in this embodiment of the application. In summary, when the technical solution provided in the present application is implemented by software or firmware, the relevant program codes are stored in the memory 520 and called to be executed by the processor 510.
The input/output interface 513 is used for connecting an input/output module to realize information input and output. The input/output module may be configured as a component in the device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.
The network interface 514 is used for connecting a communication module (not shown in the figure) to realize communication interaction between the device and other devices. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, Bluetooth and the like).
In addition, the electronic device 500 may also obtain information of specific pickup conditions from a virtual resource object pickup condition information database for performing condition judgment, and the like.
It should be noted that although the above-mentioned devices only show the processor 510, the video display adapter 511, the disk drive 512, the input/output interface 513, the network interface 514, the memory 520, the bus 530, etc., in a specific implementation, the device may also include other components necessary for normal operation. Furthermore, it will be understood by those skilled in the art that the apparatus described above may also include only the components necessary to implement the solution of the present application, and not necessarily all of the components shown in the figures.
In particular, according to embodiments of the application, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means, or installed from the memory, or installed from the ROM. The computer program, when executed by a processor, performs the above-described functions defined in the methods of embodiments of the present application.
It should be noted that the computer readable medium of the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In embodiments of the present application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (Radio Frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the server; or may exist separately and not be assembled into the server. The computer readable medium carries one or more programs which, when executed by the server, cause the server to: when the peripheral mode of the terminal is detected to be not activated, acquiring a frame rate of an application on the terminal; when the frame rate meets the screen information condition, judging whether a user is acquiring the screen information of the terminal; and controlling the screen to enter an immediate dimming mode in response to the judgment result that the user does not acquire the screen information of the terminal.
Computer program code for carrying out operations of embodiments of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The embodiments in the present specification are described in a progressive manner. The same or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiments are substantially similar to the method embodiments and are therefore described relatively briefly; for related points, reference may be made to the corresponding descriptions of the method embodiments. The system embodiments described above are only illustrative: the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The risk address identification method, system and electronic device provided by the present application have been described in detail above. Specific examples are used in the description to explain the principle and implementation of the application, and the description of the embodiments is only intended to help in understanding the method and core idea of the application. Meanwhile, a person skilled in the art may, according to the idea of the present application, make changes to the specific embodiments and the scope of application. In view of the above, the description should not be taken as limiting the application.
It is to be understood that the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature.
All the above-mentioned optional technical solutions can be combined arbitrarily to form the optional embodiments of the present invention, and are not described herein again. The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (10)
1. A method for risk address identification, the method comprising:
acquiring order address data;
acquiring target risk address data in a preset risk address library;
calculating address similarity and a distance between the order address and the risk address based on the order address data and the risk address data;
and identifying whether the order address has risk or not based on the address similarity, the distance between the order address and the risk address and a preset identification rule.
2. The risk address identification method of claim 1, wherein the obtaining order address data comprises:
acquiring an order address;
and standardizing the order address to obtain standard address data, wherein the standard address data at least comprises hierarchical address data corresponding to each preset hierarchy.
3. The risk address identification method of claim 2, wherein the standardizing the order address to obtain standard address data comprises:
performing Chinese parsing on the order address to obtain hierarchical address data corresponding to each preset hierarchy;
and judging, one by one, whether the hierarchical address data corresponding to each preset hierarchy is empty, and if so, complementing the hierarchical address data.
4. The risk address identification method of claim 3, wherein the complementing the hierarchical address data comprises:
and querying a preset address library to complement the hierarchical address data based on a preset rule and a pre-trained conditional random field model.
5. The method for identifying a risk address according to claim 3, wherein the step of performing Chinese parsing on the order address to obtain hierarchical address data corresponding to each preset hierarchy comprises:
and performing Chinese parsing on the order address based on a pre-trained conditional random field model to obtain standardized order address data.
6. The method for identifying a risk address according to claim 1, wherein the obtaining target risk address data in a preset risk address library comprises:
acquiring a preset risk address library in a preset range based on the order address data;
and acquiring target risk address data in the preset risk address library, wherein the target risk address data is the risk address data in the preset risk address library that has the shortest distance from the order address data.
7. The risk address identification method of claim 6, wherein the calculating of the address similarity and the distance between the order address and the risk address based on the order address data and the risk address data comprises:
calculating the similarity between the order address data and the target risk address data based on a SimHash algorithm to obtain address similarity;
and calculating the distance between the order address data and the target risk address data based on a GEOhash algorithm to obtain the distance between the order address and the risk address.
8. The risk address identification method according to claim 7, wherein the identifying whether the order address has the risk based on the address similarity, the distance between the order address and the risk address, and a preset identification rule comprises:
scoring the order address data based on a preset scoring rule, the address similarity and the distance between the order address and a risk address to obtain a target score;
and identifying whether the order address has risk or not based on a preset identification rule and the target score.
9. A risk address identification system, the system comprising:
the first acquisition module is used for acquiring order address data;
the second acquisition module is used for acquiring target risk address data in a preset risk address library;
the calculation module is used for calculating address similarity and the distance between the order address and the risk address based on the order address data and the risk address data;
and the identification module is used for identifying whether the order address has the risk or not based on the address similarity, the distance between the order address and the risk address and a preset identification rule.
10. An electronic device, comprising:
one or more processors; and
a memory associated with the one or more processors for storing program instructions that, when read and executed by the one or more processors, perform the method of any of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111439580.0A CN114297235A (en) | 2021-11-30 | 2021-11-30 | Risk address identification method and system and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111439580.0A CN114297235A (en) | 2021-11-30 | 2021-11-30 | Risk address identification method and system and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114297235A true CN114297235A (en) | 2022-04-08 |
Family
ID=80965015
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111439580.0A Pending CN114297235A (en) | 2021-11-30 | 2021-11-30 | Risk address identification method and system and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114297235A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115687870A (en) * | 2023-01-03 | 2023-02-03 | 四川易利数字城市科技有限公司 | Place name matching method based on matrix operation |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020194119A1 (en) * | 2001-05-30 | 2002-12-19 | William Wright | Method and apparatus for evaluating fraud risk in an electronic commerce transaction |
CN108564448A (en) * | 2018-04-23 | 2018-09-21 | 广东奥园奥买家电子商务有限公司 | A kind of implementation method of the anti-brush of order |
CN110335115A (en) * | 2019-07-01 | 2019-10-15 | 阿里巴巴集团控股有限公司 | A kind of service order processing method and processing device |
CN110598791A (en) * | 2019-09-12 | 2019-12-20 | 深圳前海微众银行股份有限公司 | Address similarity evaluation method, device, equipment and medium |
CN112581252A (en) * | 2020-12-03 | 2021-03-30 | 信用生活(广州)智能科技有限公司 | Address fuzzy matching method and system fusing multidimensional similarity and rule set |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||