CN104123178B - Parallelism constraint detection method based on GPUs - Google Patents
- Publication number
- CN104123178B CN104123178B CN201410358441.9A CN201410358441A CN104123178B CN 104123178 B CN104123178 B CN 104123178B CN 201410358441 A CN201410358441 A CN 201410358441A CN 104123178 B CN104123178 B CN 104123178B
- Authority
- CN
- China
- Prior art keywords
- node
- thread
- pointer
- result
- constraint
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Multi Processors (AREA)
- Devices For Executing Special Programs (AREA)
Abstract
The invention provides a parallelized constraint detection method based on GPUs. The method comprises the following steps. 1. A constraint is divided into processing units, with quantifier nodes as the division points; by scheduling these processing units, recursion is eliminated from the detection process and the degree of parallelism is maximized. 2. According to the current processing unit and the information sets, a corresponding number of GPU threads are generated; each GPU thread computes its variable assignment from its thread id and processes the processing unit under that assignment. A processing unit combined with one assignment is called a parallel computing unit, the smallest unit that the GPU can process in parallel. 3. A two-level storage strategy of an index and a result pool is adopted: the variable-length results produced by the nodes of all parallel computing units are stored in the result pool, while the starting address and length of each node's result are stored in the index. Space is allocated serially and results are written in parallel, yielding a higher writing speed.
Description
Technical field
The present invention relates to a parallelized constraint detection method based on graphics processing units (GPUs).
Background technology
Constraint detection is a common method for verifying the validity of information. A constraint expresses a relation that one or more pieces of information should satisfy. In general, a constraint is formed by connecting several kinds of nodes: universal-quantifier nodes, existential-quantifier nodes, AND nodes, OR nodes, implication nodes, NOT nodes, and function nodes. Each kind of node describes a specific relation. Detecting a constraint means checking the acquired information against the predefined constraint; any piece or group of information that violates the constraint is invalid. Constraint detection is typically embedded in other applications.
Current constraint detection mainly falls into two classes: incremental detection and parallel detection. Both, however, depend entirely on the central processing unit (CPU) and therefore consume computing resources that should largely serve other applications. The present method no longer depends on the CPU for its computation; instead, it relies primarily on the graphics processing unit (GPU). It thus improves the speed of constraint detection while ensuring that sufficient computing resources remain available to other applications.
The content of the invention
To address the deficiencies of the prior art, namely that current constraint detection is too time-consuming and occupies too many resources, the invention proposes a GPU-based constraint detection method. Its core consists of three parts: constraint preprocessing, a parallel strategy, and a storage strategy.
The technical scheme is as follows. A parallelized constraint detection method based on graphics processing units comprises:
Constraint preprocessing, a quantifier-based constraint splitting method, specifically:
Step 1: designate the head node of the constraint as the current node and start splitting from it;
Step 2: if the current node is a universal-quantifier or existential-quantifier node, divide the tree at this node into two sub-parts: one sub-part ends with the quantifier node, and the other continues splitting from the child of the quantifier node, which becomes the new current node;
Step 3: if the current node is an AND, OR, or implication node, designate its left child as the current node and continue splitting; after the left child has been processed, designate its right child as the current node and continue splitting;
Step 4: if the current node is a NOT node, designate its child as the current node and continue splitting;
Step 5: if the current node is a function node, stop the recursion of the current branch.
After splitting, a constraint is converted into several processing units; the processing units are pairwise disjoint and together form the constraint.
The parallel strategy, a processing-unit-based parallel processing method, specifically:
Step 1: compute the required thread count N. If, starting from the parent node of the current processing unit, the variables on the path to the head node of the constraint are <v1, v2, ..., vn>, their corresponding context information sets are <S1, S2, ..., Sn>, and the numbers of pieces of information in these sets are <I1, I2, ..., In>, then N = I1 × I2 × ... × In. If the path from the current processing unit to the head node contains no variable, or the current processing unit contains the head node, then N = 1.
Step 2: generate N GPU threads with thread ids 0 to N−1 (ids are assigned automatically by the GPU). Each thread independently computes its corresponding assignment from its own id. Let the integer Mi = j denote that variable vi takes the j-th piece of information in its corresponding set Si (0 ≤ Mi < Ii); then Mi is derived by the following steps:
i: let size = 1 and cur = n;
ii: if cur ≥ 1, go to iii; otherwise terminate;
iii: size = size × Icur; cur = cur − 1; go to ii.
Step 3: each thread maps the assignment it computed onto the processing unit, producing the parallel computing unit that the thread is to process; each thread independently processes its own parallel computing unit.
All of the GPU threads execute concurrently, with no dependencies among them.
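The loop in steps i–iii only accumulates the running radix product size; the step that actually extracts each Mi from the thread id is not written out above. The modular extraction in the sketch below is therefore an inference from the worked example given later in the detailed description (thread id 3 over two two-element sets yields M1 = M2 = 1), not a quotation of the patent; the function names are illustrative.

```python
def thread_count(I):
    """N = I1 * I2 * ... * In; N = 1 for an empty variable list."""
    N = 1
    for count in I:
        N *= count
    return N

def decode(tid, I):
    """Map a thread id in [0, N) to indices [M1, ..., Mn], Mi in [0, Ii).

    Follows steps i-iii (size accumulates the radix product, cur walks
    from the innermost variable outward); the modular extraction of Mi
    is inferred from the worked example, not stated in the patent.
    """
    n = len(I)
    M = [0] * n
    size = 1
    cur = n
    while cur >= 1:                          # step ii
        M[cur - 1] = (tid // size) % I[cur - 1]  # inferred extraction step
        size *= I[cur - 1]                   # step iii
        cur -= 1
    return M
```

With two variables that each range over two taxis, the four thread ids decode to the four distinct assignments, and id 3 gives [1, 1], i.e. (a = taxi 2, b = taxi 2) as in the document's example.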
Storage strategy.Two level storage methods of index-outcome pool, mainly include three parts:1) array of indexes, comprising
Two domains:The original position pos and length len of result;2) result array;3) result array position indicator pointer Pointer (abbreviation positions
Put pointer), it can only mutually exclusive be write.If the result length of n thread generation is respectively l1,l2...li...ln, the rope
Draw-two level storage methods of outcome pool are specially:
Step 1, each thread calculate storage location of the respective present node in array of indexes according to its assignment;
The acquisition result array position indicator pointer of step 2, each Line Procedure Mutually-exclusive, if ith thread gets the result array
Position indicator pointer, then it the original position pos of the node is set to Pointer currencys, afterwards,
Step 3, the value for updating the result array position indicator pointer:Pointernew=Pointerold+li, wherein,
PointeroldIt is position indicator pointer initial value, liFor the result length that ith thread is produced;
The result array position indicator pointer is discharged after step 4, renewal to be used for other threads, and result is inserted into number of results
Group, during as a result length inserts the result length len of the node.
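Steps 1–4 can be mimicked on the CPU, with a lock standing in for the mutually exclusive position pointer (on an actual GPU the reservation would more likely be an atomic add, but the patent only specifies mutual exclusion). A minimal sketch with illustrative names (ResultPool, store):

```python
import threading

class ResultPool:
    """Two-level index / result-pool storage (illustrative names)."""

    def __init__(self, capacity):
        self.results = [None] * capacity   # result array (the pool)
        self.index = {}                    # node -> (pos, len): the index array
        self.pointer = 0                   # result-array position pointer
        self.lock = threading.Lock()       # enforces mutually exclusive updates

    def store(self, node, result):
        # Steps 2-3: acquire the pointer, reserve space, update it serially.
        with self.lock:
            pos = self.pointer
            self.pointer += len(result)    # Pointer_new = Pointer_old + l_i
        # Step 4: the pointer is already released here; the write itself is
        # conflict-free and proceeds in parallel with other threads' writes.
        for k, item in enumerate(result):
            self.results[pos + k] = item
        self.index[node] = (pos, len(result))
        return pos
```

Three threads storing results of lengths 1, 3, and 2 finish with the pointer at 6 and three disjoint regions in the pool, whatever order the lock grants access, matching the Fig. 3 example in the detailed description.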
Beneficial effects of the invention: the invention can efficiently perform constraint detection on the GPU. Constraint splitting eliminates recursion from constraint processing and makes it fit the GPU's way of working; the processing-unit-based parallel strategy lets each thread locate and process its data independently, improving the parallelism of the method; the concurrent storage strategy markedly improves storage efficiency. While improving the efficiency of constraint detection, the invention greatly reduces the dependence on CPU resources, so that the CPU can better serve other applications. Furthermore, since the GPU and CPU can execute simultaneously without waiting on each other, the method can obtain additional efficiency gains from this property.
Brief description of the drawings
Fig. 1 is a use-case diagram of constraint processing according to the invention.
Fig. 2 shows the context mapping and the parallel strategy in the computing process of the invention.
Fig. 3 shows the two-level storage strategy of the invention.
Specific embodiment
The invention is described in further detail below with reference to the drawings and specific embodiments.
The GPU-based constraint detection method of this embodiment has three core parts: constraint preprocessing, the parallel strategy, and the storage strategy. Specifically:
1. Constraint preprocessing. The invention proposes a quantifier-based constraint splitting method comprising the following steps:
a) designate the head node of the constraint as the current node and start splitting from it;
b) if the current node is a universal-quantifier or existential-quantifier node, divide the tree at this node into two sub-parts: one sub-part ends with the quantifier node, and the other continues splitting from the child of the quantifier node, which becomes the new current node;
c) if the current node is an AND, OR, or implication node, designate its left child as the current node and continue splitting; after the left child has been processed, designate its right child as the current node and continue splitting;
d) if the current node is a NOT node, designate its child as the current node and continue splitting;
e) if the current node is a function node, stop the recursion of the current branch.
After splitting, a constraint is converted into several processing units; the processing units are pairwise disjoint and together form the constraint. Fig. 1 illustrates a constraint whose meaning is: for any taxi in city A, the distance it travels within a period of time can only lie in a reasonable range. According to the above algorithm, this constraint is divided into three parts, as indicated by the dashed lines.
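As a sketch of splitting steps a)–e): the walk below collects the quantifier cut points of a constraint tree, where k cut points yield k + 1 disjoint processing units, consistent with the Fig. 1 constraint (two quantified variables, three parts). The Node class and node-kind strings are illustrative assumptions; the patent does not fix a data representation.

```python
QUANTIFIERS = {"forall", "exists"}

class Node:
    """Illustrative constraint-tree node; kind is one of:
    forall, exists, and, or, implies, not, func."""
    def __init__(self, kind, *children):
        self.kind = kind
        self.children = children

def cut_points(node):
    """Return the quantifier nodes at which the constraint is split.

    Each quantifier ends one processing unit and its child starts the
    next, so k cut points correspond to k + 1 disjoint units.
    """
    cuts = []
    if node.kind in QUANTIFIERS:                    # step b): cut here
        cuts.append(node)
        cuts.extend(cut_points(node.children[0]))
    elif node.kind in {"and", "or", "implies"}:     # step c)
        cuts.extend(cut_points(node.children[0]))   # left child first
        cuts.extend(cut_points(node.children[1]))   # then right child
    elif node.kind == "not":                        # step d)
        cuts.extend(cut_points(node.children[0]))
    # step e): "func" stops recursion on this branch
    return cuts
```

For a tree of the Fig. 1 shape, e.g. `Node("forall", Node("forall", Node("func")))`, there are two cut points and hence three processing units.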
2. Parallel strategy. The processing-unit-based parallel processing method comprises the following steps:
a) compute the required thread count N. If, starting from the parent node of the current processing unit, the variables on the path to the head node of the constraint are <v1, v2, ..., vn>, their corresponding context information sets are <S1, S2, ..., Sn>, and the numbers of pieces of information in these sets are <I1, I2, ..., In>, then N = I1 × I2 × ... × In; if the path contains no variable, or the current processing unit contains the head node, then N = 1;
b) generate N GPU threads with thread ids 0 to N−1 (ids are assigned automatically by the GPU). Each thread independently computes its corresponding assignment from its own id. Let the integer Mi = j denote that variable vi takes the j-th piece of information in its corresponding set Si (0 ≤ Mi < Ii); then Mi is derived by the following steps:
i. let size = 1 and cur = n;
ii. if cur ≥ 1, go to iii; otherwise terminate;
iii. size = size × Icur; cur = cur − 1; go to ii.
c) each thread maps the assignment it computed onto the processing unit, producing the parallel computing unit that the thread is to process; each thread independently processes its own parallel computing unit.
All of the GPU threads execute concurrently, with no dependencies among them. Take the constraint shown in Fig. 1 as an example. Suppose information about two taxis in city A has been received: taxi 1 and taxi 2. When computing processing unit 1, by step a) above the path from the processing unit to the root node has 2 variables (a and b), and each variable can take two values (taxi 1 and taxi 2), so N = 4; according to step b), 4 threads are generated (with ids 0, 1, 2, and 3). Each thread independently computes the values of its variables. Taking the thread with id 3 as an example, the assignment of its two variables is computed as:
size = 1, cur = 2;
size = 1 × I2 = 1 × 2 = 2, cur = cur − 1 = 1;
since cur ≥ 1, the process continues and yields M1 = 1; finally M1 = 1 and M2 = 1. Note that information is numbered from 0, so M1 = 1 and M2 = 1 mean that both variables take the second piece of information in their respective information sets, i.e. (a = taxi 2, b = taxi 2). Mapping these values into the processing unit yields the parallel computing unit. Parallel computing unit group 1 in Fig. 2 shows the 4 generated parallel computing units.
3. Storage strategy. The two-level index/result-pool storage method mainly comprises three parts: 1) an index array with two fields, the starting position pos and the length len of a result; 2) a result array; 3) a result-array position pointer, Pointer (position pointer for short), which can only be written under mutual exclusion. Let the result lengths produced by n threads be l1, l2, ..., li, ..., ln. The storing process of the method is as follows:
a) each thread computes, from its assignment, the storage location of its current node in the index array;
b) each thread acquires the position pointer under mutual exclusion; when the i-th thread obtains the pointer, it sets the starting position pos of its node to the current value of Pointer; afterwards,
c) it updates the value of the position pointer: Pointer_new = Pointer_old + li, where Pointer_old is the previous value of the pointer and li is the result length produced by the i-th thread;
d) after the update, it releases the position pointer for use by other threads, writes the result into the result array, and writes the result length into the len field of the node.
Fig. 3 illustrates the storage strategy. Suppose three threads are processing three nodes simultaneously, and all of them need to write results to memory. If the results of the three nodes occupy 1, 3, and 2 storage units respectively, the threads apply for access to the position pointer (current value 0) at the same time. Suppose thread t1 obtains access to the pointer; it sets the starting position of its result to the current value of Pointer (i.e. 0), updates Pointer as Pointer_new = Pointer_old + 1 = 1, and then releases the pointer for threads t2 and t3 to access. After updating the pointer, t1 writes its result into the result array and records its length (i.e. 1) in the index array; the first element of the index array thus records the starting position (0) and length (1) of the result produced by t1. If t2 obtains the pointer before t3, then, owing to t1's update, the current value of Pointer is 1; so the starting position of t2's result is 1, and t2 updates Pointer as Pointer_new = Pointer_old + 3 = 4, releases Pointer, writes its result into the result array, and records its length (i.e. 3) in the index array. With this strategy, space is allocated serially, but multiple threads can write their results simultaneously, in parallel and without conflicts.
Sections 1 and 2 above describe the method of detecting constraints in parallel on the GPU, and section 3 describes the method of writing results concurrently and efficiently in a way suited to the GPU.
The above embodiment describes only part of the functionality of the invention; the embodiment and drawings are not intended to limit the invention. Any equivalent change or modification made without departing from the spirit and scope of the invention also falls within the protection scope of the invention. The protection scope of the invention is therefore defined by the claims of this application.
Claims (4)
1. A parallelized constraint detection method based on graphics processing units, characterized in that it comprises:
a quantifier-based constraint splitting step;
a processing-unit-based parallel processing step;
a storage strategy step;
wherein the quantifier-based constraint splitting step is specifically:
step 1: designate the head node of the constraint as the current node and start splitting from it;
step 2: if the current node is a universal-quantifier or existential-quantifier node, divide the tree at this node into two sub-parts: one sub-part ends with the quantifier node, and the other continues splitting from the child of the quantifier node, which becomes the new current node;
step 3: if the current node is an AND, OR, or implication node, designate its left child as the current node and continue splitting; after the left child has been processed, designate its right child as the current node and continue splitting;
step 4: if the current node is a NOT node, designate its child as the current node and continue splitting;
step 5: if the current node is a function node, stop the recursion of the current branch;
after splitting, a constraint is converted into several processing units; the processing units are pairwise disjoint and together form the constraint.
2. The parallelized constraint detection method according to claim 1, characterized in that the processing-unit-based parallel processing step is specifically:
step 1: compute the required thread count N; if, starting from the parent node of the current processing unit, the variables on the path to the head node of the constraint are <v1, v2, ..., vn>, their corresponding context information sets are <S1, S2, ..., Sn>, and the numbers of pieces of information in these sets are <I1, I2, ..., In>, then N = I1 × I2 × ... × In; if the path contains no variable, or the current processing unit contains the head node, then N = 1;
step 2: generate N GPU threads with thread ids 0 to N−1 (ids are assigned automatically by the GPU); each thread independently computes its corresponding assignment from its own id, where the integer Mi = j denotes that variable vi takes the j-th piece of information in its corresponding set Si (0 ≤ Mi < Ii);
step 3: each thread maps the assignment it computed onto the processing unit, producing the parallel computing unit that the thread is to process; each thread independently processes its own parallel computing unit;
all of the GPU threads execute concurrently, with no dependencies among them.
3. The parallelized constraint detection method according to claim 2, characterized in that Mi in step 2 is derived by the following steps:
sub-step i: let size = 1 and cur = n;
sub-step ii: if cur ≥ 1, go to sub-step iii; otherwise terminate;
sub-step iii: size = size × Icur; cur = cur − 1; go to sub-step ii.
4. The parallelized constraint detection method according to claim 1, characterized in that the storage strategy step is specifically a two-level index/result-pool storage method comprising three parts: 1) an index array with two fields, the starting position pos and the length len of a result; 2) a result array; 3) a result-array position pointer, Pointer, which can only be written under mutual exclusion;
letting the result lengths produced by n threads be l1, l2, ..., li, ..., ln, the two-level index/result-pool storage method is specifically:
step 1: each thread computes, from its assignment, the storage location of its current node in the index array;
step 2: each thread acquires the result-array position pointer under mutual exclusion; when the i-th thread obtains the pointer, it sets the starting position pos of its node to the current value of Pointer; afterwards,
step 3: it updates the value of the result-array position pointer: Pointer_new = Pointer_old + li, where Pointer_old is the previous value of the pointer and li is the result length produced by the i-th thread;
step 4: after the update, it releases the result-array position pointer for use by other threads, writes the result into the result array, and writes the result length into the len field of the node.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410358441.9A CN104123178B (en) | 2014-07-25 | 2014-07-25 | Parallelism constraint detection method based on GPUs |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104123178A CN104123178A (en) | 2014-10-29 |
CN104123178B true CN104123178B (en) | 2017-05-17 |
Family
ID=51768602
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410358441.9A Active CN104123178B (en) | 2014-07-25 | 2014-07-25 | Parallelism constraint detection method based on GPUs |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104123178B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2985643A1 (en) | 2015-05-15 | 2016-11-24 | Cox Automotive, Inc. | Parallel processing for solution space partitions |
EP3360050A4 (en) * | 2015-10-05 | 2019-03-20 | Cox Automotive, Inc. | Parallel processing for solution space partitions |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7120903B2 (en) * | 2001-09-26 | 2006-10-10 | Nec Corporation | Data processing apparatus and method for generating the data of an object program for a parallel operation apparatus |
CN101593129A (en) * | 2008-05-28 | 2009-12-02 | 国际商业机器公司 | Triggering has the method and apparatus of execution of a plurality of incidents of restriction relation |
CN103201764A (en) * | 2010-11-12 | 2013-07-10 | 高通股份有限公司 | Parallel image processing using multiple processors |
CN103294775A (en) * | 2013-05-10 | 2013-09-11 | 苏州祥益网络科技有限公司 | Police service cloud image recognition vehicle management and control system based on geographic space-time constraint |
Legal Events
Date | Code | Title | Description
---|---|---|---
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
2022-01-26 | TR01 | Transfer of patent right | Address after: 250014 No. 41-1 Qianfo Shandong Road, Jinan City, Shandong Province; Patentee after: SHANDONG CVIC SOFTWARE ENGINEERING Co.,Ltd.; Address before: 210093 No. 22, Hankou Road, Gulou District, Jiangsu, Nanjing; Patentee before: NANJING University