CN104123178B - Parallelism constraint detection method based on GPUs - Google Patents
- Publication number
- CN104123178B CN104123178B CN201410358441.9A CN201410358441A CN104123178B CN 104123178 B CN104123178 B CN 104123178B CN 201410358441 A CN201410358441 A CN 201410358441A CN 104123178 B CN104123178 B CN 104123178B
- Authority
- CN
- China
- Prior art keywords
- node
- thread
- pointer
- result
- constraint
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Multi Processors (AREA)
- Devices For Executing Special Programs (AREA)
Abstract
The invention provides a parallelized constraint detection method based on GPUs. The method comprises the following steps. 1. A constraint is divided into processing units, with quantifier nodes as the division points; by scheduling these processing units, recursion is eliminated from the detection process and the degree of parallelism is maximized. 2. According to the current processing unit and the information sets, a corresponding number of GPU threads are generated; each GPU thread computes its variable assignment from its thread id and processes the processing unit under that assignment. A processing unit combined with one assignment is called a parallel computing unit, the smallest unit that the GPU can process in parallel. 3. A two-level storage strategy of an index and a result pool is adopted: the variable-length results produced by the nodes of all parallel computing units are stored in the result pool, while the starting address and length of each node's result are stored in the index. Space is allocated serially and results are written in parallel, yielding a higher writing speed.
Description
Technical field
The present invention relates to a parallelized constraint detection method based on graphics processing units (GPUs).
Background technology
Constraint detection is a common method for verifying the validity of information. A constraint expresses a relation that one or more pieces of information should satisfy. In general, a constraint is formed by connecting several kinds of nodes: universal-quantifier nodes, existential-quantifier nodes, AND nodes, OR nodes, implication nodes, NOT nodes, and function nodes. Each kind of node describes a specific relation. Detecting a constraint means checking the acquired information against the predefined constraint; any piece or group of information that violates the constraint is invalid. Constraint detection is typically embedded in other applications.
Current constraint detection mainly falls into two classes: incremental detection and parallel detection. Both, however, depend entirely on the central processing unit (CPU) and therefore consume computing resources that should largely serve other applications. The present method no longer depends on the CPU for its computation; instead, it relies primarily on the graphics processing unit (GPU). It thus improves the speed of constraint detection while ensuring that sufficient computing resources remain available to other applications.
The content of the invention
To address the deficiencies of the prior art, namely that current constraint detection is too time-consuming and occupies too many resources, the invention proposes a GPU-based constraint detection method. Its core consists of three parts: constraint preprocessing, a parallel strategy, and a storage strategy.
The technical scheme is as follows. A parallelized constraint detection method based on graphics processing units comprises:
Constraint preprocessing, a quantifier-based constraint splitting method, specifically:
Step 1: designate the head node of the constraint as the current node and start splitting from it;
Step 2: if the current node is a universal-quantifier or existential-quantifier node, divide the tree at this node into two sub-parts: one sub-part ends with the quantifier node, and the other continues splitting from the child of the quantifier node, which becomes the new current node;
Step 3: if the current node is an AND, OR, or implication node, designate its left child as the current node and continue splitting; after the left child has been processed, designate its right child as the current node and continue splitting;
Step 4: if the current node is a NOT node, designate its child as the current node and continue splitting;
Step 5: if the current node is a function node, stop the recursion of the current branch.
After splitting, a constraint is converted into several processing units; the processing units are pairwise disjoint and together form the constraint.
The parallel strategy, a processing-unit-based parallel processing method, specifically:
Step 1: compute the required thread count N. If, starting from the parent node of the current processing unit, the variables on the path to the head node of the constraint are <v1, v2, ..., vn>, their corresponding context information sets are <S1, S2, ..., Sn>, and the numbers of pieces of information in these sets are <I1, I2, ..., In>, then N = I1 × I2 × ... × In. If the path from the current processing unit to the head node contains no variable, or the current processing unit contains the head node, then N = 1.
Step 2: generate N GPU threads with thread ids 0 to N−1 (ids are assigned automatically by the GPU). Each thread independently computes its corresponding assignment from its own id. Let the integer Mi = j denote that variable vi takes the j-th piece of information in its corresponding set Si (0 ≤ Mi < Ii); then Mi is derived by the following steps:
i: let size = 1 and cur = n;
ii: if cur ≥ 1, go to iii; otherwise terminate;
iii: size = size × Icur; cur = cur − 1; go to ii.
Step 3: each thread maps the assignment it computed onto the processing unit, producing the parallel computing unit that the thread is to process; each thread independently processes its own parallel computing unit.
All of the GPU threads execute concurrently, with no dependencies among them.
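The loop in steps i–iii only accumulates the running radix product size; the step that actually extracts each Mi from the thread id is not written out above. The modular extraction in the sketch below is therefore an inference from the worked example given later in the detailed description (thread id 3 over two two-element sets yields M1 = M2 = 1), not a quotation of the patent; the function names are illustrative.

```python
def thread_count(I):
    """N = I1 * I2 * ... * In; N = 1 for an empty variable list."""
    N = 1
    for count in I:
        N *= count
    return N

def decode(tid, I):
    """Map a thread id in [0, N) to indices [M1, ..., Mn], Mi in [0, Ii).

    Follows steps i-iii (size accumulates the radix product, cur walks
    from the innermost variable outward); the modular extraction of Mi
    is inferred from the worked example, not stated in the patent.
    """
    n = len(I)
    M = [0] * n
    size = 1
    cur = n
    while cur >= 1:                          # step ii
        M[cur - 1] = (tid // size) % I[cur - 1]  # inferred extraction step
        size *= I[cur - 1]                   # step iii
        cur -= 1
    return M
```

With two variables that each range over two taxis, the four thread ids decode to the four distinct assignments, and id 3 gives [1, 1], i.e. (a = taxi 2, b = taxi 2) as in the document's example.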
Storage strategy.Two level storage methods of index-outcome pool, mainly include three parts:1) array of indexes, comprising
Two domains:The original position pos and length len of result;2) result array;3) result array position indicator pointer Pointer (abbreviation positions
Put pointer), it can only mutually exclusive be write.If the result length of n thread generation is respectively l1,l2...li...ln, the rope
Draw-two level storage methods of outcome pool are specially:
Step 1, each thread calculate storage location of the respective present node in array of indexes according to its assignment;
The acquisition result array position indicator pointer of step 2, each Line Procedure Mutually-exclusive, if ith thread gets the result array
Position indicator pointer, then it the original position pos of the node is set to Pointer currencys, afterwards,
Step 3, the value for updating the result array position indicator pointer:Pointernew=Pointerold+li, wherein,
PointeroldIt is position indicator pointer initial value, liFor the result length that ith thread is produced;
The result array position indicator pointer is discharged after step 4, renewal to be used for other threads, and result is inserted into number of results
Group, during as a result length inserts the result length len of the node.
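Steps 1–4 can be mimicked on the CPU, with a lock standing in for the mutually exclusive position pointer (on an actual GPU the reservation would more likely be an atomic add, but the patent only specifies mutual exclusion). A minimal sketch with illustrative names (ResultPool, store):

```python
import threading

class ResultPool:
    """Two-level index / result-pool storage (illustrative names)."""

    def __init__(self, capacity):
        self.results = [None] * capacity   # result array (the pool)
        self.index = {}                    # node -> (pos, len): the index array
        self.pointer = 0                   # result-array position pointer
        self.lock = threading.Lock()       # enforces mutually exclusive updates

    def store(self, node, result):
        # Steps 2-3: acquire the pointer, reserve space, update it serially.
        with self.lock:
            pos = self.pointer
            self.pointer += len(result)    # Pointer_new = Pointer_old + l_i
        # Step 4: the pointer is already released here; the write itself is
        # conflict-free and proceeds in parallel with other threads' writes.
        for k, item in enumerate(result):
            self.results[pos + k] = item
        self.index[node] = (pos, len(result))
        return pos
```

Three threads storing results of lengths 1, 3, and 2 finish with the pointer at 6 and three disjoint regions in the pool, whatever order the lock grants access, matching the Fig. 3 example in the detailed description.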
Beneficial effects of the invention: the invention can efficiently perform constraint detection on the GPU. Constraint splitting eliminates recursion from constraint processing and makes it fit the GPU's way of working; the processing-unit-based parallel strategy lets each thread locate and process its data independently, improving the parallelism of the method; the concurrent storage strategy markedly improves storage efficiency. While improving the efficiency of constraint detection, the invention greatly reduces the dependence on CPU resources, so that the CPU can better serve other applications. Furthermore, since the GPU and CPU can execute simultaneously without waiting on each other, the method can obtain additional efficiency gains from this property.
Brief description of the drawings
Fig. 1 is a use-case diagram of constraint processing according to the invention.
Fig. 2 shows the context mapping and the parallel strategy in the computing process of the invention.
Fig. 3 shows the two-level storage strategy of the invention.
Specific embodiment
The invention is described in further detail below with reference to the drawings and specific embodiments.
The GPU-based constraint detection method of this embodiment has three core parts: constraint preprocessing, the parallel strategy, and the storage strategy. Specifically:
1. Constraint preprocessing. The invention proposes a quantifier-based constraint splitting method comprising the following steps:
a) designate the head node of the constraint as the current node and start splitting from it;
b) if the current node is a universal-quantifier or existential-quantifier node, divide the tree at this node into two sub-parts: one sub-part ends with the quantifier node, and the other continues splitting from the child of the quantifier node, which becomes the new current node;
c) if the current node is an AND, OR, or implication node, designate its left child as the current node and continue splitting; after the left child has been processed, designate its right child as the current node and continue splitting;
d) if the current node is a NOT node, designate its child as the current node and continue splitting;
e) if the current node is a function node, stop the recursion of the current branch.
After splitting, a constraint is converted into several processing units; the processing units are pairwise disjoint and together form the constraint. Fig. 1 illustrates a constraint whose meaning is: for any taxi in city A, the distance it travels within a period of time can only lie in a reasonable range. According to the above algorithm, this constraint is divided into three parts, as indicated by the dashed lines.
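As a sketch of splitting steps a)–e): the walk below collects the quantifier cut points of a constraint tree, where k cut points yield k + 1 disjoint processing units, consistent with the Fig. 1 constraint (two quantified variables, three parts). The Node class and node-kind strings are illustrative assumptions; the patent does not fix a data representation.

```python
QUANTIFIERS = {"forall", "exists"}

class Node:
    """Illustrative constraint-tree node; kind is one of:
    forall, exists, and, or, implies, not, func."""
    def __init__(self, kind, *children):
        self.kind = kind
        self.children = children

def cut_points(node):
    """Return the quantifier nodes at which the constraint is split.

    Each quantifier ends one processing unit and its child starts the
    next, so k cut points correspond to k + 1 disjoint units.
    """
    cuts = []
    if node.kind in QUANTIFIERS:                    # step b): cut here
        cuts.append(node)
        cuts.extend(cut_points(node.children[0]))
    elif node.kind in {"and", "or", "implies"}:     # step c)
        cuts.extend(cut_points(node.children[0]))   # left child first
        cuts.extend(cut_points(node.children[1]))   # then right child
    elif node.kind == "not":                        # step d)
        cuts.extend(cut_points(node.children[0]))
    # step e): "func" stops recursion on this branch
    return cuts
```

For a tree of the Fig. 1 shape, e.g. `Node("forall", Node("forall", Node("func")))`, there are two cut points and hence three processing units.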
2. Parallel strategy. The processing-unit-based parallel processing method comprises the following steps:
a) compute the required thread count N. If, starting from the parent node of the current processing unit, the variables on the path to the head node of the constraint are <v1, v2, ..., vn>, their corresponding context information sets are <S1, S2, ..., Sn>, and the numbers of pieces of information in these sets are <I1, I2, ..., In>, then N = I1 × I2 × ... × In; if the path contains no variable, or the current processing unit contains the head node, then N = 1;
b) generate N GPU threads with thread ids 0 to N−1 (ids are assigned automatically by the GPU). Each thread independently computes its corresponding assignment from its own id. Let the integer Mi = j denote that variable vi takes the j-th piece of information in its corresponding set Si (0 ≤ Mi < Ii); then Mi is derived by the following steps:
i. let size = 1 and cur = n;
ii. if cur ≥ 1, go to iii; otherwise terminate;
iii. size = size × Icur; cur = cur − 1; go to ii.
c) each thread maps the assignment it computed onto the processing unit, producing the parallel computing unit that the thread is to process; each thread independently processes its own parallel computing unit.
All of the GPU threads execute concurrently, with no dependencies among them. Take the constraint shown in Fig. 1 as an example. Suppose information about two taxis in city A has been received: taxi 1 and taxi 2. When computing processing unit 1, by step a) above the path from the processing unit to the root node has 2 variables (a and b), and each variable can take two values (taxi 1 and taxi 2), so N = 4; according to step b), 4 threads are generated (with ids 0, 1, 2, and 3). Each thread independently computes the values of its variables. Taking the thread with id 3 as an example, the assignment of its two variables is computed as:
size = 1, cur = 2;
size = 1 × I2 = 1 × 2 = 2, cur = cur − 1 = 1;
since cur ≥ 1, the process continues and yields M1 = 1; finally M1 = 1 and M2 = 1. Note that information is numbered from 0, so M1 = 1 and M2 = 1 mean that both variables take the second piece of information in their respective information sets, i.e. (a = taxi 2, b = taxi 2). Mapping these values into the processing unit yields the parallel computing unit. Parallel computing unit group 1 in Fig. 2 shows the 4 generated parallel computing units.
3. Storage strategy. The two-level index/result-pool storage method mainly comprises three parts: 1) an index array with two fields, the starting position pos and the length len of a result; 2) a result array; 3) a result-array position pointer, Pointer (position pointer for short), which can only be written under mutual exclusion. Let the result lengths produced by n threads be l1, l2, ..., li, ..., ln. The storing process of the method is as follows:
a) each thread computes, from its assignment, the storage location of its current node in the index array;
b) each thread acquires the position pointer under mutual exclusion; when the i-th thread obtains the pointer, it sets the starting position pos of its node to the current value of Pointer; afterwards,
c) it updates the value of the position pointer: Pointer_new = Pointer_old + li, where Pointer_old is the previous value of the pointer and li is the result length produced by the i-th thread;
d) after the update, it releases the position pointer for use by other threads, writes the result into the result array, and writes the result length into the len field of the node.
Fig. 3 illustrates the storage strategy. Suppose three threads are processing three nodes simultaneously, and all of them need to write results to memory. If the results of the three nodes occupy 1, 3, and 2 storage units respectively, the threads apply for access to the position pointer (current value 0) at the same time. Suppose thread t1 obtains access to the pointer; it sets the starting position of its result to the current value of Pointer (i.e. 0), updates Pointer as Pointer_new = Pointer_old + 1 = 1, and then releases the pointer for threads t2 and t3 to access. After updating the pointer, t1 writes its result into the result array and records its length (i.e. 1) in the index array; the first element of the index array thus records the starting position (0) and length (1) of the result produced by t1. If t2 obtains the pointer before t3, then, owing to t1's update, the current value of Pointer is 1; so the starting position of t2's result is 1, and t2 updates Pointer as Pointer_new = Pointer_old + 3 = 4, releases Pointer, writes its result into the result array, and records its length (i.e. 3) in the index array. With this strategy, space is allocated serially, but multiple threads can write their results simultaneously, in parallel and without conflicts.
Sections 1 and 2 above describe the method of detecting constraints in parallel on the GPU, and section 3 describes the method of writing results concurrently and efficiently in a way suited to the GPU.
The above embodiment describes only part of the functionality of the invention; the embodiment and drawings are not intended to limit the invention. Any equivalent change or modification made without departing from the spirit and scope of the invention also falls within the protection scope of the invention. The protection scope of the invention is therefore defined by the claims of this application.
Claims (4)
1. A parallelized constraint detection method based on graphics processing units, characterized in that it comprises:
a quantifier-based constraint splitting step;
a processing-unit-based parallel processing step;
a storage strategy step;
wherein the quantifier-based constraint splitting step is specifically:
step 1: designate the head node of the constraint as the current node and start splitting from it;
step 2: if the current node is a universal-quantifier or existential-quantifier node, divide the tree at this node into two sub-parts: one sub-part ends with the quantifier node, and the other continues splitting from the child of the quantifier node, which becomes the new current node;
step 3: if the current node is an AND, OR, or implication node, designate its left child as the current node and continue splitting; after the left child has been processed, designate its right child as the current node and continue splitting;
step 4: if the current node is a NOT node, designate its child as the current node and continue splitting;
step 5: if the current node is a function node, stop the recursion of the current branch;
after splitting, a constraint is converted into several processing units; the processing units are pairwise disjoint and together form the constraint.
2. The parallelized constraint detection method according to claim 1, characterized in that the processing-unit-based parallel processing step is specifically:
step 1: compute the required thread count N; if, starting from the parent node of the current processing unit, the variables on the path to the head node of the constraint are <v1, v2, ..., vn>, their corresponding context information sets are <S1, S2, ..., Sn>, and the numbers of pieces of information in these sets are <I1, I2, ..., In>, then N = I1 × I2 × ... × In; if the path contains no variable, or the current processing unit contains the head node, then N = 1;
step 2: generate N GPU threads with thread ids 0 to N−1 (ids are assigned automatically by the GPU); each thread independently computes its corresponding assignment from its own id, where the integer Mi = j denotes that variable vi takes the j-th piece of information in its corresponding set Si (0 ≤ Mi < Ii);
step 3: each thread maps the assignment it computed onto the processing unit, producing the parallel computing unit that the thread is to process; each thread independently processes its own parallel computing unit;
all of the GPU threads execute concurrently, with no dependencies among them.
3. The parallelized constraint detection method according to claim 2, characterized in that Mi in step 2 is derived by the following steps:
sub-step i: let size = 1 and cur = n;
sub-step ii: if cur ≥ 1, go to sub-step iii; otherwise terminate;
sub-step iii: size = size × Icur; cur = cur − 1; go to sub-step ii.
4. The parallelized constraint detection method according to claim 1, characterized in that the storage strategy step is specifically a two-level index/result-pool storage method comprising three parts: 1) an index array with two fields, the starting position pos and the length len of a result; 2) a result array; 3) a result-array position pointer, Pointer, which can only be written under mutual exclusion;
letting the result lengths produced by n threads be l1, l2, ..., li, ..., ln, the two-level index/result-pool storage method is specifically:
step 1: each thread computes, from its assignment, the storage location of its current node in the index array;
step 2: each thread acquires the result-array position pointer under mutual exclusion; when the i-th thread obtains the pointer, it sets the starting position pos of its node to the current value of Pointer; afterwards,
step 3: it updates the value of the result-array position pointer: Pointer_new = Pointer_old + li, where Pointer_old is the previous value of the pointer and li is the result length produced by the i-th thread;
step 4: after the update, it releases the result-array position pointer for use by other threads, writes the result into the result array, and writes the result length into the len field of the node.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410358441.9A CN104123178B (en) | 2014-07-25 | 2014-07-25 | Parallelism constraint detection method based on GPUs |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104123178A CN104123178A (en) | 2014-10-29 |
CN104123178B true CN104123178B (en) | 2017-05-17 |
Family
ID=51768602
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410358441.9A Active CN104123178B (en) | 2014-07-25 | 2014-07-25 | Parallelism constraint detection method based on GPUs |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104123178B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2985643A1 (en) | 2015-05-15 | 2016-11-24 | Cox Automotive, Inc. | Parallel processing for solution space partitions |
EP3360050A4 (en) * | 2015-10-05 | 2019-03-20 | Cox Automotive, Inc. | Parallel processing for solution space partitions |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7120903B2 (en) * | 2001-09-26 | 2006-10-10 | Nec Corporation | Data processing apparatus and method for generating the data of an object program for a parallel operation apparatus |
CN101593129A (en) * | 2008-05-28 | 2009-12-02 | 国际商业机器公司 | Triggering has the method and apparatus of execution of a plurality of incidents of restriction relation |
CN103201764A (en) * | 2010-11-12 | 2013-07-10 | 高通股份有限公司 | Parallel image processing using multiple processors |
CN103294775A (en) * | 2013-05-10 | 2013-09-11 | 苏州祥益网络科技有限公司 | Police service cloud image recognition vehicle management and control system based on geographic space-time constraint |
Legal Events
Date | Code | Title | Description
---|---|---|---
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
2022-01-26 | TR01 | Transfer of patent right | Address after: 250014 No. 41-1 Qianfo Shandong Road, Jinan City, Shandong Province; Patentee after: SHANDONG CVIC SOFTWARE ENGINEERING Co.,Ltd.; Address before: 210093 No. 22, Hankou Road, Gulou District, Jiangsu, Nanjing; Patentee before: NANJING University