
CN104123178B - Parallelism constraint detection method based on GPUs - Google Patents

Parallelism constraint detection method based on GPUs Download PDF

Info

Publication number
CN104123178B
CN104123178B · CN201410358441.9A
Authority
CN
China
Prior art keywords
node
thread
pointer
result
constraint
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410358441.9A
Other languages
Chinese (zh)
Other versions
CN104123178A (en)
Inventor
许畅
马晓星
吕建
眭骏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CVIC Software Engineering Co Ltd
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN201410358441.9A priority Critical patent/CN104123178B/en
Publication of CN104123178A publication Critical patent/CN104123178A/en
Application granted granted Critical
Publication of CN104123178B publication Critical patent/CN104123178B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Landscapes

  • Multi Processors (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The invention provides a parallel constraint detection method based on GPUs. The method comprises the following steps. 1. A constraint is divided into processing units, with quantifier nodes as the division points; by dispatching these processing units, recursion in the detection process is eliminated and the degree of parallelism is maximized. 2. According to the current processing unit and the information sets, a corresponding number of GPU threads is generated; each GPU thread computes its variable assignment from its thread id and processes the processing unit under that assignment. A processing unit paired with one assignment is called a parallel computing unit; parallel computing units are the smallest units that the GPU can process in parallel. 3. A two-level storage strategy of an index and a result pool is adopted: the variable-length results produced by the nodes of all parallel computing units are stored in the result pool, while the start address and length of each node's result in the result pool are stored in the index. Space is allocated serially but results are written in parallel, which achieves a higher writing speed.

Description

Parallelized constraint detection method based on graphics processing units
Technical field
The present invention relates to a parallelized constraint detection method based on graphics processing units (GPUs).
Background art
Constraint detection is a common method for verifying the validity of information. A constraint expresses a relation that one or more pieces of information should satisfy. In general, a constraint is formed by connecting several kinds of node: "universal quantifier" nodes, "existential quantifier" nodes, AND nodes, OR nodes, "implication" nodes, "negation" nodes and "function" nodes. Each kind of node describes a specific relation. Detecting a constraint means checking the acquired information against the predefined constraint; a piece or group of information that violates the constraint is invalid. Constraint detection is typically embedded in other applications.
Existing constraint detection approaches fall into two classes: incremental detection and parallel detection. Both, however, depend entirely on the central processing unit (CPU) and therefore consume computing resources that should largely remain available to other applications. The computation in this method no longer depends on the CPU; instead, it relies primarily on the graphics processing unit (GPU). The method therefore improves the speed of constraint detection while still guaranteeing that sufficient computing resources remain available for other applications.
Content of the invention
To address the deficiencies of the prior art, namely that existing constraint detection is too time-consuming and consumes too many resources, the invention proposes a GPU-based constraint detection method. The core of the method consists of three parts: constraint preprocessing, a parallelization strategy, and a storage strategy.
The technical scheme of the invention is a parallelized constraint detection method based on graphics processing units, which comprises:
Constraint preprocessing, a quantifier-based constraint splitting method. Specifically:
Step 1: designate the head node of the constraint as the current node and start splitting from the current node.
Step 2: if the current node is a "universal quantifier" or "existential quantifier" node, split this part into two sub-parts; one sub-part ends with the quantifier node, and the other starts from the child node of the quantifier node, which is designated as the new current node and splitting continues.
Step 3: if the current node is an AND, OR or "implication" node, designate its left child node as the current node and continue splitting; after the left child node has been processed, designate its right child node as the current node and continue splitting.
Step 4: if the current node is a "negation" node, designate its child node as the current node and continue splitting.
Step 5: if the current node is a "function" node, stop the recursion for the current branch.
After splitting, a constraint is converted into several processing units; the processing units are pairwise disjoint and together constitute the constraint.
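The five splitting steps above can be sketched as follows. The `Node` class and the kind names (`'forall'`, `'implies'`, etc.) are illustrative assumptions, since the patent does not give a concrete data structure, and a processing unit is represented here only by its terminating quantifier node:

```python
# Hypothetical sketch of the quantifier-based constraint splitting
# (steps 1-5 above). Node kinds and tree layout are assumptions.

class Node:
    def __init__(self, kind, children=()):
        self.kind = kind          # 'forall', 'exists', 'and', 'or',
                                  # 'implies', 'not', 'function'
        self.children = list(children)

def split(node, units=None):
    """Split a constraint tree into processing units at quantifier nodes."""
    if units is None:
        units = []
    if node.kind in ('forall', 'exists'):
        # Step 2: one unit ends at the quantifier; splitting
        # continues from the quantifier's child.
        units.append(node)
        for child in node.children:
            split(child, units)
    elif node.kind in ('and', 'or', 'implies'):
        split(node.children[0], units)   # step 3: left subtree first,
        split(node.children[1], units)   # then the right subtree
    elif node.kind == 'not':
        split(node.children[0], units)   # step 4
    elif node.kind == 'function':
        pass                             # step 5: leaf, stop this branch
    return units
```

In this sketch the number of produced units equals the number of quantifier nodes in the constraint, consistent with the three-way split of the two-quantifier example constraint discussed in the embodiment.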
Parallelization strategy, a parallel processing method based on processing units. Specifically:
Step 1: compute the required thread count N. If the variables on the path from the parent node of the current processing unit to the head node of the constraint are <v1,v2,...vn>, the corresponding context information sets are <S1,S2,...Sn>, and the numbers of information items in the context information sets are <I1,I2,...In>, then N=I1×I2×...×In. If the current processing unit contains no variable on the path to the head node, or the current processing unit contains the head node itself, then N=1.
Step 2: generate N GPU threads with thread ids from 0 to N-1 (ids are assigned automatically by the GPU). Each thread independently computes its corresponding assignment from its own id: the integer value Mi=j means that variable vi takes the j-th information item of its corresponding set Si (0≤Mi<Ii). The values Mi are derived by the following steps:
i: let size=1, cur=n;
ii: if cur≥1, go to iii; otherwise terminate;
iii: size=size×Icur; cur=cur-1; go to ii.
Step 3: each thread maps the assignment it has computed onto the processing unit, producing the parallel computing unit that the thread needs to process; each thread processes its parallel computing unit independently.
All of the GPU threads execute concurrently and have no dependences on one another.
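Steps 1 and 2 can be sketched as a mixed-radix decoding of the thread id. The function names are illustrative, and the exact decoding formula is an assumption consistent with the worked example in the embodiment (thread id 3 yields M1=1, M2=1 when I1=I2=2):

```python
# Sketch (illustrative names) of thread-count computation and
# per-thread decoding of a thread id into a variable assignment.

def thread_count(info_counts):
    """Step 1: N = I1 * I2 * ... * In."""
    n = 1
    for count in info_counts:
        n *= count
    return n

def assignment(thread_id, info_counts):
    """Step 2: return [M1, ..., Mn], where Mi selects the information
    item that variable vi takes from its context set Si."""
    size = thread_count(info_counts)
    m = []
    for count in info_counts:
        size //= count                  # product of the remaining I's
        m.append((thread_id // size) % count)
    return m
```

Because every thread derives its assignment purely from its own id and the sizes I1..In, no thread depends on any other, which is what allows all N threads to run concurrently.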
Storage strategy, a two-level index / result-pool storage method, comprising three parts: 1) an index array, containing two fields, the start position pos and the length len of a result; 2) a result array; 3) a result-array position pointer Pointer (the "position pointer" for short), which can only be written with mutual exclusion. Let the result lengths produced by the n threads be l1,l2...li...ln. The two-level index / result-pool storage method is specifically:
Step 1: each thread computes, from its assignment, the storage location of its current node in the index array.
Step 2: each thread acquires the result-array position pointer with mutual exclusion; if the i-th thread obtains the pointer, it sets the start position pos of its node to the current value of Pointer, and then proceeds to step 3.
Step 3: the thread updates the value of the result-array position pointer: Pointernew=Pointerold+li, where Pointerold is the previous value of the position pointer and li is the length of the result produced by the i-th thread.
Step 4: after the update, the thread releases the result-array position pointer for other threads to use, writes its result into the result array, and writes the result length into the len field of its node's index entry.
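A minimal sketch of steps 1-4, with a CPU lock standing in for the mutually exclusive position pointer (on a real GPU this would typically be an atomic fetch-and-add); the class and method names are illustrative, not from the patent:

```python
# Sketch of the two-level index / result-pool store: space is
# reserved serially under the pointer lock, results are written
# in parallel outside it.

import threading

class ResultPool:
    def __init__(self, capacity):
        self.index = {}                # node id -> (pos, len)
        self.results = [None] * capacity
        self.pointer = 0               # result-array position pointer
        self._lock = threading.Lock()

    def store(self, node_id, result):
        # Steps 2-3: reserve space serially under the pointer lock.
        with self._lock:
            pos = self.pointer
            self.pointer += len(result)
        # Step 4: write the result outside the lock; distinct threads
        # write disjoint extents, so no conflict is possible.
        self.results[pos:pos + len(result)] = result
        self.index[node_id] = (pos, len(result))
```

The design point is that only the tiny pointer update is serialized; the potentially long result writes proceed concurrently, which is where the higher writing speed comes from.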
Beneficial effects of the invention: the invention uses the GPU to perform constraint detection efficiently. Constraint splitting eliminates the recursion in constraint processing and adapts it to the working style of the GPU. The processing-unit-based parallelization strategy lets every thread locate and process its data independently, which improves the degree of parallelism of the method. The concurrent storage strategy markedly improves storage efficiency. While improving the efficiency of constraint detection, the invention greatly reduces the dependence on CPU resources, so that more CPU resources can serve other applications. Furthermore, since the GPU and the CPU can execute simultaneously without waiting on each other, the method can also gain additional efficiency from this characteristic.
Brief description of the drawings
Fig. 1: an example of constraint processing according to the invention.
Fig. 2: context mapping and the parallelization strategy in the computation process of the invention.
Fig. 3: the two-level storage strategy of the invention.
Specific embodiment
The invention is described in further detail below in conjunction with the drawings and a specific embodiment.
The GPU-based constraint detection method of this embodiment has three core parts: constraint preprocessing, the parallelization strategy, and the storage strategy. Specifically:
1. Constraint preprocessing. The invention proposes a quantifier-based constraint splitting method comprising the following steps:
a) designate the head node of the constraint as the current node and start splitting from the current node;
b) if the current node is a "universal quantifier" or "existential quantifier" node, split this part into two sub-parts; one sub-part ends with the quantifier node, and the other starts from the child node of the quantifier node, which is designated as the new current node and splitting continues;
c) if the current node is an AND, OR or "implication" node, designate its left child node as the current node and continue splitting; after the left child node has been processed, designate its right child node as the current node and continue splitting;
d) if the current node is a "negation" node, designate its child node as the current node and continue splitting;
e) if the current node is a "function" node, stop the recursion for the current branch. After splitting, a constraint is converted into several processing units; the processing units are pairwise disjoint and together constitute the constraint. Fig. 1 illustrates a constraint whose meaning is: for any taxi in city A, the distance it travels within a period of time can only lie in a reasonable range. According to the above algorithm this constraint is split into three parts, as indicated by the dashed lines.
2. Parallelization strategy. The parallel processing method based on processing units comprises the following steps:
a) compute the required thread count N. If the variables on the path from the parent node of the current processing unit to the head node of the constraint are <v1,v2,...vn>, the corresponding context information sets are <S1,S2,...Sn>, and the numbers of information items in the context information sets are <I1,I2,...In>, then N=I1×I2×...×In; if the current processing unit contains no variable on the path to the head node, or the current processing unit contains the head node itself, then N=1.
b) generate N GPU threads with thread ids from 0 to N-1 (ids are assigned automatically by the GPU). Each thread independently computes its corresponding assignment from its own id. The integer value Mi=j means that variable vi takes the j-th information item of its corresponding set Si (0≤Mi<Ii); the values Mi are derived by the following steps:
i. let size=1, cur=n;
ii. if cur≥1, go to iii; otherwise terminate;
iii. size=size×Icur; cur=cur-1; go to ii.
c) each thread maps the assignment it has computed onto the processing unit, producing the parallel computing unit that the thread needs to process; each thread processes its parallel computing unit independently.
All of the GPU threads execute concurrently and have no dependences on one another. Take the constraint shown in Fig. 1 as an example, and suppose the information for two taxis in city A has been received: taxi 1 and taxi 2. When computing processing unit 1, by step a) above the path from the processing unit to the root node has 2 variables (a and b), and each variable can take two values (taxi 1 and taxi 2), so N=4. By step b), 4 threads are generated (with ids 0, 1, 2 and 3), and each thread independently computes the values of its variables. Taking the thread with id 3 as an example, the assignment of its two variables is computed as follows:
size=1, cur=2;
size=1×Icur=1×2=2, cur=cur-1=1;
Since cur≥1, the process continues and yields M1=1; finally M1=1 and M2=1. Note that information items are numbered from 0, so M1=1 and M2=1 mean that both the first variable and the second variable take the second item of their respective information sets, i.e. (a=taxi 2, b=taxi 2). Mapping these values onto the processing unit yields the parallel computing unit. Parallel computing unit group 1 in Fig. 2 shows the 4 generated parallel computing units.
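The computation for the thread with id 3 can be replayed with a small illustrative sketch (the function and variable names are assumptions, not from the patent):

```python
# Replaying the worked example: two variables a and b, each ranging
# over the two received taxi information items.

def decode(thread_id, counts):
    """Mixed-radix decode of a thread id into per-variable indices."""
    size = 1
    for c in counts:
        size *= c                      # total thread count N
    indices = []
    for c in counts:
        size //= c
        indices.append((thread_id // size) % c)
    return indices

taxis = ['taxi 1', 'taxi 2']           # the two information items
m = decode(3, [2, 2])                  # thread id 3, I1 = I2 = 2
a, b = taxis[m[0]], taxis[m[1]]
# Thread 3 therefore processes the pair (a = taxi 2, b = taxi 2).
```

Running the same decode for ids 0 through 3 enumerates all four (a, b) pairs, which is exactly the parallel computing unit group 1 shown in Fig. 2.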
3. Storage strategy. The two-level index / result-pool storage method comprises three parts: 1) an index array, containing two fields, the start position pos and the length len of a result; 2) a result array; 3) a result-array position pointer Pointer (the "position pointer" for short), which can only be written with mutual exclusion. Let the result lengths produced by the n threads be l1,l2...li...ln; the storing process of the method is then as follows:
a) each thread computes, from its assignment, the storage location of its current node in the index array;
b) each thread acquires the position pointer with mutual exclusion; if the i-th thread obtains the position pointer, it sets the start position pos of its node to the current value of Pointer, and then proceeds to step c);
c) the thread updates the value of the position pointer: Pointernew=Pointerold+li, where Pointerold is the previous value of the position pointer and li is the length of the result produced by the i-th thread;
d) after the update, the thread releases the position pointer for other threads to use, writes its result into the result array, and writes the result length into the len field of its node's index entry.
Fig. 3 illustrates the storage strategy. Suppose three threads are processing three nodes simultaneously, and all of them need to write their results to memory. If the results of the three nodes need 1, 3 and 2 storage slots respectively, the threads will simultaneously request access to the position pointer (whose current value is 0). Assume thread t1 obtains the access right to the position pointer. It sets the start position of its own result to the current value of Pointer (i.e. 0), updates the value of Pointer to Pointernew=Pointerold+1=1, and then releases the pointer for threads t2 and t3 to access. After updating the position pointer, t1 writes its result into the result array and updates its own length (i.e. 1) in the index array. The first element of the index array thus records the storage start position (0) and length (1) of the result produced by t1. If t2 obtains the access right to Pointer before t3, then, because of t1's update, the current value of Pointer is 1; therefore the start position of t2's result is 1, and t2 updates Pointer to Pointernew=Pointerold+3=4, releases Pointer, writes its result into the result array, and updates its own length (i.e. 3) in the index array. Under this strategy space is allocated serially, but multiple threads can write their results simultaneously, in parallel and without conflicts.
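The Fig. 3 scenario can be replayed in a sketch where three workers reserve 1, 3 and 2 slots under a lock and then write concurrently. Which thread reserves first is nondeterministic, so the check is on the invariants (total space used, non-overlapping extents) rather than on a fixed order; all names here are illustrative:

```python
# Three threads writing variable-length results through a shared
# position pointer: serial space allocation, parallel writes.

import threading

index = {}                  # node id -> (pos, len)
results = [None] * 6        # result array with room for 1 + 3 + 2 slots
pointer = [0]               # result-array position pointer
lock = threading.Lock()     # stands in for the mutually exclusive pointer

def worker(node_id, payload):
    with lock:                                   # serial allocation
        pos = pointer[0]
        pointer[0] += len(payload)
    results[pos:pos + len(payload)] = payload    # parallel write
    index[node_id] = (pos, len(payload))

threads = [threading.Thread(target=worker, args=(i, ['x'] * n))
           for i, n in enumerate([1, 3, 2])]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Afterwards all 6 slots are filled and the three extents never overlap.
```

Whatever order the threads win the lock in, every result lands in its own disjoint extent and the index records where each one went, which is the conflict-freedom the strategy claims.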
Parts 1 and 2 above describe the method of detecting constraints in parallel with the GPU; part 3 describes the method of efficiently writing results concurrently, as suited to the GPU.
The above embodiment describes only part of the functionality of the invention, but the embodiment and the drawings are not intended to limit the invention. Any equivalent change or modification made without departing from the spirit and scope of the invention also falls within the protection scope of the invention. The protection scope of the invention is therefore defined by the claims of this application.

Claims (4)

1. A parallelized constraint detection method based on graphics processing units, characterised in that it comprises:
a quantifier-based constraint splitting step;
a parallel processing step based on processing units;
a storage strategy step;
the quantifier-based constraint splitting step being specifically:
step 1: designate the head node of the constraint as the current node and start splitting from the current node;
step 2: if the current node is a "universal quantifier" or "existential quantifier" node, split the node's part into two sub-parts; one sub-part ends with the quantifier node, and the other starts from the child node of the quantifier node, which is designated as the new current node and splitting continues;
step 3: if the current node is an AND, OR or "implication" node, designate its left child node as the current node and continue splitting; after the left child node has been processed, designate its right child node as the current node and continue splitting;
step 4: if the current node is a "negation" node, designate its child node as the current node and continue splitting;
step 5: if the current node is a "function" node, stop the recursion for the current branch;
after splitting, a constraint is converted into several processing units; the processing units are pairwise disjoint and together constitute the constraint.
2. The parallelized constraint detection method according to claim 1, characterised in that the parallel processing step based on processing units is specifically:
step 1: compute the required thread count N; if the variables on the path from the parent node of the current processing unit to the head node of the constraint are <v1,v2,…vn>, the corresponding context information sets are <S1,S2,…Sn>, and the numbers of information items in the context information sets are <I1,I2,…In>, then N=I1×I2×…×In; if the current processing unit contains no variable on the path to the head node, or the current processing unit contains the head node itself, then N=1;
step 2: generate N GPU threads with thread ids from 0 to N-1 (ids are assigned automatically by the GPU); each thread independently computes its corresponding assignment from its own id, where the integer value Mi=j means that variable vi takes the j-th information item of its corresponding set Si (0≤Mi<Ii);
step 3: each thread maps the assignment it has computed onto the processing unit, producing the parallel computing unit that the thread needs to process; each thread processes its parallel computing unit independently;
all of the GPU threads execute concurrently and have no dependences on one another.
3. The parallelized constraint detection method according to claim 2, characterised in that the values Mi in step 2 are derived by the following steps:
sub-step i: let size=1, cur=n;
sub-step ii: if cur≥1, go to sub-step iii; otherwise terminate;
sub-step iii: size=size×Icur; cur=cur-1; go to sub-step ii.
4. The parallelized constraint detection method according to claim 1, characterised in that the storage strategy step is specifically a two-level index / result-pool storage method comprising three parts: 1) an index array, containing two fields, the start position pos and the length len of a result; 2) a result array; 3) a result-array position pointer Pointer, which can only be written with mutual exclusion;
let the result lengths produced by the n threads be l1,l2…li…ln; the two-level index / result-pool storage method is then specifically:
step 1: each thread computes, from its assignment, the storage location of its current node in the index array;
step 2: each thread acquires the result-array position pointer with mutual exclusion; if the i-th thread obtains the result-array position pointer, it sets the start position pos of its node to the current value of Pointer, and then proceeds to step 3;
step 3: the thread updates the value of the result-array position pointer: Pointernew=Pointerold+li, where Pointerold is the previous value of the position pointer and li is the length of the result produced by the i-th thread;
step 4: after the update, the thread releases the result-array position pointer for other threads to use, writes its result into the result array, and writes the result length into the len field of its node's index entry.
CN201410358441.9A 2014-07-25 2014-07-25 Parallelism constraint detection method based on GPUs Active CN104123178B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410358441.9A CN104123178B (en) 2014-07-25 2014-07-25 Parallelism constraint detection method based on GPUs

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410358441.9A CN104123178B (en) 2014-07-25 2014-07-25 Parallelism constraint detection method based on GPUs

Publications (2)

Publication Number Publication Date
CN104123178A CN104123178A (en) 2014-10-29
CN104123178B true CN104123178B (en) 2017-05-17

Family

ID=51768602

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410358441.9A Active CN104123178B (en) 2014-07-25 2014-07-25 Parallelism constraint detection method based on GPUs

Country Status (1)

Country Link
CN (1) CN104123178B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2985643A1 (en) 2015-05-15 2016-11-24 Cox Automotive, Inc. Parallel processing for solution space partitions
EP3360050A4 (en) * 2015-10-05 2019-03-20 Cox Automotive, Inc. Parallel processing for solution space partitions

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7120903B2 (en) * 2001-09-26 2006-10-10 Nec Corporation Data processing apparatus and method for generating the data of an object program for a parallel operation apparatus
CN101593129A (en) * 2008-05-28 2009-12-02 国际商业机器公司 Triggering has the method and apparatus of execution of a plurality of incidents of restriction relation
CN103201764A (en) * 2010-11-12 2013-07-10 高通股份有限公司 Parallel image processing using multiple processors
CN103294775A (en) * 2013-05-10 2013-09-11 苏州祥益网络科技有限公司 Police service cloud image recognition vehicle management and control system based on geographic space-time constraint

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7120903B2 (en) * 2001-09-26 2006-10-10 Nec Corporation Data processing apparatus and method for generating the data of an object program for a parallel operation apparatus
CN101593129A (en) * 2008-05-28 2009-12-02 国际商业机器公司 Triggering has the method and apparatus of execution of a plurality of incidents of restriction relation
CN103201764A (en) * 2010-11-12 2013-07-10 高通股份有限公司 Parallel image processing using multiple processors
CN103294775A (en) * 2013-05-10 2013-09-11 苏州祥益网络科技有限公司 Police service cloud image recognition vehicle management and control system based on geographic space-time constraint

Also Published As

Publication number Publication date
CN104123178A (en) 2014-10-29

Similar Documents

Publication Publication Date Title
CN108829610B (en) Memory management method and device in neural network forward computing process
JP5425541B2 (en) Method and apparatus for partitioning and sorting data sets on a multiprocessor system
CN107609098B (en) Searching method and device
US6505283B1 (en) Efficient memory allocator utilizing a dual free-list structure
US20120036134A1 (en) Performing concurrent rehashing of a hash table for multithreaded applications
CN107918642A (en) Data query method, server and computer-readable recording medium
CN102538801B (en) A kind of disposal route of road network data in navigation map and device
CN106055401B (en) Magnanimity calculates the parallel automatic start-stop and calculating task dynamic allocation method of coarse granule
CN104361415B (en) A kind of choosing method and device for showing information
CN105045863B (en) A kind of method and system for Entities Matching
CN107491525A (en) Distributed address comparison method and device
CN104123178B (en) Parallelism constraint detection method based on GPUs
CN106649391A (en) Graph data processing method and apparatus
CN112786215A (en) Method and system for generating DIP comprehensive disease category catalog based on big data clustering
CN106709503A (en) Large spatial data clustering algorithm K-DBSCAN based on density
CN105095515A (en) Bucket dividing method, device and equipment supporting fast query of Map-Reduce output result
CN107402905A (en) Computational methods and device based on neutral net
CN109033365A (en) A kind of data processing method and relevant device
CN113031930A (en) Source code confusion generation method and device for control flow flattening
CN106777065A (en) The method and system that a kind of Frequent tree mining is excavated
CN104850646B (en) A kind of Frequent tree mining method for digging for single uncertain figure
Schram et al. Simulation of ring polymer melts with GPU acceleration
US8805891B2 (en) B-tree ordinal approximation
CN105573717B (en) A kind of procedure division method and device of multi-core processor oriented
Luria et al. A sharp threshold for spanning 2‐spheres in random 2‐complexes

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220126

Address after: 250014 No. 41-1 Qianfo Shandong Road, Jinan City, Shandong Province

Patentee after: SHANDONG CVIC SOFTWARE ENGINEERING Co.,Ltd.

Address before: 210093 No. 22, Hankou Road, Gulou District, Jiangsu, Nanjing

Patentee before: NANJING University

TR01 Transfer of patent right