CN110111005A - Intelligent order-allocation method and apparatus, computer-readable medium, and logistics system - Google Patents
Intelligent order-allocation method and apparatus, computer-readable medium, and logistics system Download PDF Info
- Publication number
- CN110111005A (application CN201910382830.8A)
- Authority
- CN
- China
- Prior art keywords
- state
- ratio
- intelligence
- order
- learning model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
- G06Q10/06311—Scheduling, planning or task assignment for a person or group
- G06Q10/08—Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
- G06Q10/083—Shipping
Abstract
The present invention provides an intelligent order-allocation method and apparatus, a computer-readable medium, and a logistics system. The method comprises: A: calculating state transition probabilities based on historical logistics orders; B: taking the cumulative order-allocation ratio of one transportation mode at the current time as the state, taking the allocation proportion assigned to that transportation mode at the current time as the action, taking the cumulative ratio to be achieved at the end of a preset time period as the termination state ratio, building a complete state-transition control policy table from the state transition probabilities, and building a reinforcement learning model; C: training the reinforcement learning model until the output policy converges; D: testing the output policy on the historical logistics order data and judging whether the order-allocation ratio over the preset time period meets the termination state ratio; if it does, using the output policy as the intelligent allocation scheme; if not, repeating steps C-D.
Description
Technical field
The present invention relates to logistics systems, and more particularly to an intelligent order-allocation method and apparatus, a computer-readable medium, and a logistics system.
Background art
Intelligent logistics transportation lies at the intersection of artificial intelligence and logistics. It aims to replace manual methods with intelligent algorithms to solve common problems in logistics, such as intelligent order allocation, path planning, road-junction planning, vehicle scheduling, and warehouse siting. Intelligent order allocation itself can be subdivided into route-optimizing allocation aimed at optimal routes, time-saving allocation aimed at efficiency, and ratio-regulating allocation aimed at a future cumulative ratio.
For ratio-regulating allocation aimed at a future cumulative ratio, the allocation-ratio problem in intelligent logistics transportation still lacks a good solution, because the exact future daily order volume cannot be known. The common practice today is for staff to allocate orders ad hoc based on their own experience, and the result at the end of the month rarely reaches the preset ratio.
Summary of the invention
The present invention solves the problem of intelligent ratio-regulating order allocation aimed at a future cumulative ratio.
The present invention provides an intelligent order-allocation method, the method comprising:
A: calculating state transition probabilities based on historical logistics orders;
B: taking the cumulative order-allocation ratio of one transportation mode at the current time as the state, taking the allocation proportion assigned to that transportation mode at the current time as the action, and taking the cumulative ratio to be achieved at the end of a preset time period as the termination state ratio; building a complete state-transition control policy table from the state transition probabilities; and building a reinforcement learning model;
C: training the reinforcement learning model until the output policy converges;
D: testing the output policy on the historical logistics order data, and judging whether the order-allocation ratio over the preset time period meets the termination state ratio; if it does, using the output policy as the intelligent allocation scheme; if not, repeating steps C-D.
In one embodiment, step A comprises:
performing quantity statistics on the historical logistics orders to obtain the distribution interval of historical order quantities; and
calculating the state transition probabilities based on the distribution interval of historical order quantities.
In one embodiment, the step of performing quantity statistics on the historical logistics orders to obtain the distribution interval of historical order quantities comprises:
plotting the historical logistics order data as a two-dimensional chart, finding the most densely populated region, and taking it as the distribution interval of order quantities.
In one embodiment, the reinforcement learning model is a Q-learning model, and the output policy is a Q-table.
In one embodiment, building the reinforcement learning model comprises:
representing the reinforcement learning model as Q(S_t, A_t), where Q(S_t, A_t) denotes the value of executing action A_t in state S_t at time t;
creating an empty Q-table and randomly initializing its Q values;
initializing the state S_t and setting the termination state ratio, where the algorithm reaching the termination state ratio means one episode has ended;
for each episode, choosing the action to execute from the initialized S_t by the ε-greedy method: when the random number exceeds ε, choosing the action with the largest current Q value, and when the random number is below ε, choosing a random action, thereby obtaining the next state S_{t+1} and the return value R;
updating the reinforcement learning model Q(S_t, A_t); and
iterating over many episodes until the Q-table converges to a fixed value.
In one embodiment, the update formula of the reinforcement learning model is:
Q(S_t, A_t) ← Q(S_t, A_t) + α[R_{t+1} + γ·max_a Q(S_{t+1}, a) − Q(S_t, A_t)]
where Q(S_t, A_t) denotes the value of executing action A_t in state S_t at time t, α is the learning rate, R_{t+1} is the return at the next time step, γ is the discount factor on the next step's value function, and max_a Q(S_{t+1}, a) is the value obtained by jumping to the next state and executing the highest-valued of all available actions.
In one embodiment, in the state-transition control policy table, the states are divided as: 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1; and the actions are divided as: 0%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%.
In one embodiment, the termination state ratio is 50%.
The present invention also provides an intelligent order-allocation apparatus, the apparatus comprising:
a state transition probability determination module, configured to calculate state transition probabilities based on historical logistics orders;
a reinforcement learning model building module, configured to take the cumulative order-allocation ratio of one transportation mode at the current time as the state, take the allocation proportion assigned to that transportation mode at the current time as the action, take the cumulative ratio to be achieved at the end of a preset time period as the termination state ratio, build a complete state-transition control policy table from the state transition probabilities, and build a reinforcement learning model;
a reinforcement learning model training module, configured to train the reinforcement learning model until the output policy converges; and
a testing and judging module, configured to test the output policy on the historical logistics order data and judge whether the order-allocation ratio over the preset time period meets the termination state ratio; if it does, the output policy is used as the intelligent allocation scheme; if not, the reinforcement learning model training module is rerun until the order-allocation ratio over the preset time period meets the termination state ratio.
In one embodiment, the state transition probability determination module is further configured to perform quantity statistics on the historical logistics orders to obtain the distribution interval of historical order quantities, and to calculate the state transition probabilities based on that distribution interval.
In one embodiment, the reinforcement learning model is a Q-learning model, and the output policy is a Q-table.
In one embodiment, the reinforcement learning model building module builds the reinforcement learning model as follows:
the reinforcement learning model is denoted Q(S_t, A_t), where Q(S_t, A_t) is the value of executing action A_t in state S_t at time t;
an empty Q-table is created and its Q values are randomly initialized;
the state S_t is initialized and the termination state ratio is set, where the algorithm reaching the termination state ratio means one episode has ended;
for each episode, the action to execute from the initialized S_t is chosen by the ε-greedy method: when the random number exceeds ε, the action with the largest current Q value is chosen, and when the random number is below ε, a random action is chosen, yielding the next state S_{t+1} and the return value R;
the reinforcement learning model Q(S_t, A_t) is updated; and
the iteration repeats over many episodes until the Q-table converges to a fixed value.
In one embodiment, the update formula of the reinforcement learning model is:
Q(S_t, A_t) ← Q(S_t, A_t) + α[R_{t+1} + γ·max_a Q(S_{t+1}, a) − Q(S_t, A_t)]
where Q(S_t, A_t) denotes the value of executing action A_t in state S_t at time t, α is the learning rate, R_{t+1} is the return at the next time step, γ is the discount factor on the next step's value function, and max_a Q(S_{t+1}, a) is the value obtained by jumping to the next state and executing the highest-valued of all available actions.
In one embodiment, in the state-transition control policy table, the states are divided as: 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1; and the actions are divided as: 0%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%.
In one embodiment, the termination state ratio is 50%.
The present invention also provides a computer-readable medium having computer instructions stored thereon, the computer instructions performing the intelligent order-allocation method described above when run.
The present invention also provides a logistics system comprising a memory and a processor, the memory storing computer instructions runnable on the processor, the processor performing the intelligent order-allocation method described above when running the computer instructions.
The present invention solves the real-time order-assignment problem in intelligent dispatching systems for smart logistics. Based on the ratio accumulated up to the current moment, orders are allocated using the policy learned by reinforcement learning, so that the order volume can be steered to the preset ratio by the end of the month. One advantage is that even if the allocation of orders is manually adjusted partway through, the ratio is automatically corrected so that the requirement is still met at the end. Another advantage of this scheme is that a suitable allocation can be given without knowing future order quantities, while guaranteeing that the expected ratio is reached at the end of the month.
Description of the drawings
The above summary of the invention and the following detailed description are better understood when read together with the drawings. It should be noted that the drawings serve only as examples of the claimed invention. In the drawings, identical reference numerals denote identical or similar elements.
Fig. 1 shows a flowchart of an intelligent order-allocation method according to an embodiment of the invention;
Fig. 2 shows a histogram of order quantities over five historical months;
Fig. 3 shows a two-dimensional plot of historical order quantities;
Fig. 4 shows a state-transition control policy table according to an embodiment of the invention;
Fig. 5 shows the values of the Q-table after convergence;
Fig. 6 shows the cumulative allocation ratio obtained by the intelligent order-allocation method according to an embodiment of the invention.
Specific embodiments
The detailed features and advantages of the invention are described in the specific embodiments below, in sufficient detail to enable any person skilled in the art to understand and implement the technical content of the invention; from the disclosure of this specification, the claims, and the drawings, a person skilled in the art will readily understand the related objects and advantages of the invention.
The present invention relates to machine learning and intelligent logistics transportation, and applies to the two transportation modes of highway and waterway: daily orders are monitored in real time so that the allocation proportion between the two, tallied at the end of the month, reaches a fixed target ratio.
Specifically, the invention uses the Q-learning algorithm from reinforcement learning: by analyzing the historical order-allocation statistics and combining them with actual allocation operations, a reinforcement-learning-based mathematical model of intelligent order allocation is built. The computer learns by repeated trial and eventually converges to an optimal policy; during later allocation operations, the learned policy is executed to complete the whole allocation process, so that the end-of-month highway/waterway ratio reaches the preset value.
The invention first analyzes the daily order volume in the historical data and computes its statistics, obtaining a distribution interval for daily order volume. From this interval, given the cumulative ratio that the waterway (or highway) currently occupies in the allocated historical orders, we can infer how the cumulative waterway (or highway) ratio would change if today's orders were allocated to the waterway (or highway) in different proportions.
Combining this with the Q-learning method of reinforcement learning, the invention builds a reinforcement learning model: the cumulative waterway (or highway) order-allocation ratio at the current time is taken as the state, and the percentage assigned to the waterway (or highway) at the current time is taken as the action. Using the states and actions together with the state transition probabilities estimated from the earlier historical statistics, a complete state-transition control policy table can be built. The ratio to be reached at the end of the month then serves as the terminal point of the state transitions, denoted end, and the machine learns Q(S, A) on its own to control the allocation. The update formula of the learning algorithm is:
Q(S_t, A_t) ← Q(S_t, A_t) + α[R_{t+1} + γ·max_a Q(S_{t+1}, a) − Q(S_t, A_t)]
where Q(S_t, A_t) denotes the value of executing action A_t in state S_t at time t, α is the learning rate, R_{t+1} is the return at the next time step, γ is the discount factor on the next step's value function, and max_a Q(S_{t+1}, a) is the value obtained by jumping to the next state and executing the highest-valued of all available actions (denoted a here).
The reinforcement learning algorithm of the invention is explained in detail below with reference to the drawings. It comprises: statistically analyzing the historical data to obtain state transition probabilities; modeling by discretizing the cumulative ratio and the allocation proportion; training the model with the reinforcement learning method to obtain an allocation policy; and handing the allocation over to that policy.
Fig. 1 shows the flow chart that a kind of intelligence based on intensified learning according to an embodiment of the invention divides folk prescription method.
Step 101: perform quantity statistics on the historical logistics orders. In one embodiment, the historical order allocation is counted, the order volume is plotted as a two-dimensional chart, the most densely populated region is found, and the distribution interval of order quantities is obtained and tallied.
For example, see Fig. 2 and Fig. 3, where Fig. 2 is a histogram of order quantities over the past five months, with time (unit: days) on the abscissa and order quantity (unit: orders) on the ordinate; Fig. 3 is the two-dimensional plot of historical order quantities with the distribution interval cut out.
Step 102: calculate the state transition probabilities based on the distribution interval. For example, from the distribution interval obtained in step 101 and the cumulative ratio that the waterway (or highway) currently occupies in the allocated historical orders, one can infer how the cumulative waterway (or highway) ratio changes when today's orders are allocated to the waterway (or highway) in different proportions, yielding the state transition probabilities.
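The inference in step 102 can be sketched as follows. The synthetic daily-order history, the starting totals, and the function names are all stand-in assumptions; the patent derives the actual distribution interval from its own five months of data:

```python
import random
from statistics import mean

# Sketch of step 102 under assumed numbers: given an empirical interval of
# daily order volumes, estimate where a given allocation action moves the
# cumulative waterway ratio. The history here is synthetic.
random.seed(0)
daily_orders = [random.randint(80, 119) for _ in range(150)]  # stand-in distribution interval

def next_ratio(total_so_far, water_so_far, today, action):
    """Cumulative waterway ratio after sending `action` of today's orders there."""
    return (water_so_far + action * today) / (total_so_far + today)

# From a 40% cumulative ratio after 1000 orders, allocating 70% today:
samples = [next_ratio(1000, 400, n, 0.7) for n in daily_orders]
print(round(mean(samples), 3))  # estimated next-step ratio for this action
```

Repeating this over all (state, action) pairs, and bucketing the resulting next-step ratios, gives the empirical transition probabilities that fill the policy table.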
Step 103: based on the reinforcement learning (Q-learning) model, take the cumulative waterway (or highway) order-allocation ratio at the current time as the state, take the percentage assigned to the waterway (or highway) at the current time as the action, and build the complete state-transition control policy table (i.e., the model's policy table) using the state transition probabilities estimated from the earlier historical statistics; then, with the cumulative ratio to be reached at the end of the month, denoted end, as the terminal point of the state transitions, build the reinforcement learning (Q-learning) model so that the learned Q(S, A) controls the allocation.
As an example, Fig. 4 shows a state-transition control policy table according to the invention, in which the cumulative ratio is the state, divided as: 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1; the allocation ratio is the action, divided as: 0%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%; and the target is a final cumulative ratio of 50%.
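The Fig. 4 discretization can be sketched in a few lines. The helper name `to_state` is ours; the patent only specifies the eleven state buckets and eleven action percentages:

```python
# Sketch of the Fig. 4 discretization: eleven cumulative-ratio states and
# eleven allocation-percentage actions. The snapping helper is illustrative.
STATES = [round(i / 10, 1) for i in range(11)]   # 0.0, 0.1, ..., 1.0
ACTIONS = [i / 10 for i in range(11)]            # 0%, 10%, ..., 100%

def to_state(ratio):
    """Snap a continuous cumulative ratio onto the nearest state bucket."""
    return min(STATES, key=lambda s: abs(s - ratio))

print(to_state(0.43))  # 0.4
```

Each observed cumulative ratio is mapped to a bucket before indexing the Q-table, which keeps the table at a manageable 11 × 11 size.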
Step 104: train the model of the reinforcement learning (Q-learning) algorithm.
Step 105: continue training the Q-learning model until the output policy Q-table converges. In one embodiment, Fig. 5 shows the values of the Q-table after convergence.
Step 106: test the output policy Q-table on the historical data.
Step 107: judge whether the monthly order-allocation ratio meets the preset termination state ratio; if it does, execute step 108; if not, continue from step 104 until it does.
Step 108: use the resulting policy Q-table as the execution policy for daily allocation thereafter: based on the current order ratio, choose the allocation proportion with the largest Q-table value as the allocation scheme.
Those skilled in the art should appreciate that although the examples of the invention involve the two transportation modes of waterway and highway, the invention is not limited to these two modes and, where applicable, also applies to other types of transportation mode.
Nitrification enhancement model (Q-learning) of the invention uses Q (St,At) indicate, establishment process is as follows:
Step 1: establishing empty Q-table, and the value of the Q (S, A) in random initializtion table.
Step 2: one state S of initializationt, and establish the mark of an end.The ratio for controlling needs in this algorithm
As end state.If algorithm reaches this state and means that one bout terminates.
Step 3: for each bout, carrying out the S to initialization in the method for ε-greedytChoose the movement executed.When with
It is current to choose the maximum movement of Q value when machine probability>ε, and when random chance<ε, random move also is taken in movement
Make.It can be obtained by the S of next step in this wayt+1And return value R.
Step 4: to existing intensified learning model Q (St,At) be updated, the formula of update is:
Q(St,At)=Q (St,At)+α[Rt+1+γ·maxaQ(St+1,a)-Q(St,At)]
Wherein, Q (St,At) indicate the state S under t momenttExecution acts AtValue, α indicate renewal rate, Rt+1Table
Show the return value of subsequent time, γ is the discount factor to the cost function of subsequent time, maxaQ(St+1, a) indicate that state is jumped
Valence when going to subsequent time and executing that action (being indicated here with a) of Maximum Value inside all action states
Value.
Step 5: iterate multi-round, until Q-table converges on fixed value.
It should be noted that the ε-greedy method is known in the art and is not described further here.
The pseudocode of the Q-learning algorithm is as follows:
Initialize Q(S, A); terminal state S_terminal = the configured control ratio
FOR episode = 1 .. 10000:
    Initialize S_t
    FOR each step of the episode:
        Choose the action A to execute for S_t by the ε-greedy method
        Take action A, receive reward R and next state S_{t+1}
        Update Q(S_t, A_t):
            Q(S_t, A_t) ← Q(S_t, A_t) + α[R_{t+1} + γ·max_a Q(S_{t+1}, a) − Q(S_t, A_t)]
        S_t ← S_{t+1}
    UNTIL S_{t+1} = S_terminal
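The pseudocode can be exercised end to end on a toy version of the ratio-control task. The transition and reward functions below are simplified stand-ins (the patent derives its transitions empirically from historical order statistics), so this is a sketch of the training loop, not the patented model:

```python
import random

# Runnable sketch of the pseudocode on a toy ratio-control task: states are
# cumulative-ratio buckets, actions are allocation fractions, and an episode
# ends when the bucket hits the 0.5 target. The transition and reward are
# simplified stand-ins for the patent's empirically estimated ones.
random.seed(0)
STATES = [round(i / 10, 1) for i in range(11)]
ACTIONS = [i / 10 for i in range(11)]
TARGET = 0.5

def step(state, action):
    """Toy dynamics: the cumulative ratio drifts toward the action taken."""
    nxt = round(min(1.0, max(0.0, 0.8 * state + 0.2 * action)), 1)
    reward = 1.0 if nxt == TARGET else -abs(nxt - TARGET)
    return nxt, reward

Q = {s: {a: 0.0 for a in ACTIONS} for s in STATES}
alpha, gamma, eps = 0.1, 0.9, 0.1
for _ in range(10000):                        # episodes
    s = random.choice(STATES)
    for _ in range(50):                       # cap episode length
        if s == TARGET:                       # terminal state reached
            break
        if random.random() < eps:             # explore with probability eps
            a = random.choice(ACTIONS)
        else:                                 # otherwise act greedily
            a = max(Q[s], key=Q[s].get)
        s2, r = step(s, a)
        Q[s][a] += alpha * (r + gamma * max(Q[s2].values()) - Q[s][a])
        s = s2

print(max(Q[0.2], key=Q[0.2].get))  # learned allocation from a 20% ratio
```

Under these toy dynamics, states below the target learn to allocate heavily toward the controlled mode and states above it learn to allocate away from it, which mirrors the corrective behavior the patent describes.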
The invention is further explained below with a specific example: Fig. 2 is the statistics of order quantities over the past five months, and Fig. 3 is the two-dimensional plot of historical order quantities with the distribution interval cut out. Taking the cumulative ratio as the state and the allocation ratio as the action, the transition probabilities are computed from the distribution interval of historical order quantities, producing the state-transition policy network of Fig. 4 (with a final cumulative ratio of 50% as the target). A reinforcement learning (Q-learning) algorithm model is built on this policy network, and training yields the output Q-table of Fig. 5. A test allocating one historical month of orders based on this Q-table gives the result of Fig. 6, where it can be seen that the cumulative ratio at the end of the month reaches 50%.
The present invention also provides a reinforcement-learning-based intelligent order-allocation apparatus, the apparatus comprising the following modules:
a state transition probability determination module, configured to calculate state transition probabilities based on historical logistics orders;
a reinforcement learning model building module, configured to take the cumulative order-allocation ratio of one transportation mode at the current time as the state, take the allocation proportion assigned to that transportation mode at the current time as the action, take the cumulative ratio to be achieved at the end of a preset time period as the termination state ratio, build a complete state-transition control policy table from the state transition probabilities, and build a reinforcement learning model;
a reinforcement learning model training module, configured to train the reinforcement learning model until the output policy converges; and
a testing and judging module, configured to test the output policy on the historical logistics order data and judge whether the order-allocation ratio over the preset time period meets the termination state ratio; if it does, the output policy is used as the intelligent allocation scheme; if not, the reinforcement learning model training module is rerun until the order-allocation ratio over the preset time period meets the termination state ratio.
In one embodiment, the state transition probability determination module is further configured to perform quantity statistics on the historical logistics orders to obtain the distribution interval of historical order quantities, and to calculate the state transition probabilities based on that distribution interval.
In one embodiment, the reinforcement learning model is a Q-learning model, and the output policy is a Q-table.
In one embodiment, the reinforcement learning model building module builds the reinforcement learning model as follows:
the reinforcement learning model is denoted Q(S_t, A_t), where Q(S_t, A_t) is the value of executing action A_t in state S_t at time t;
an empty Q-table is created and its Q values are randomly initialized;
the state S_t is initialized and the termination state ratio is set, where the algorithm reaching the termination state ratio means one episode has ended;
for each episode, the action to execute from the initialized S_t is chosen by the ε-greedy method: when the random number exceeds ε, the action with the largest current Q value is chosen, and when the random number is below ε, a random action is chosen, yielding the next state S_{t+1} and the return value R;
the reinforcement learning model Q(S_t, A_t) is updated; and
the iteration repeats over many episodes until the Q-table converges to a fixed value.
In one embodiment, the update formula of the reinforcement learning model is:

Q(S_t, A_t) = Q(S_t, A_t) + α[R_{t+1} + γ·max_a Q(S_{t+1}, a) − Q(S_t, A_t)]

where Q(S_t, A_t) is the value of executing action A_t in state S_t at time t, α is the update rate, R_{t+1} is the return value at the next moment, γ is the discount factor on the value function of the next moment, and max_a Q(S_{t+1}, a) is the value obtained when the state transitions to the next moment and the action with the largest value among all actions is executed.
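The training procedure and update formula above can be combined into a compact tabular Q-learning sketch. The `step` environment callback, episode count, and hyperparameter values are assumptions (the patent fixes only the loop structure and the update rule); in the patent's setting, `step` would be driven by the state transition control strategy table and would signal `done` when the state termination ratio is reached:

```python
import random

def train_q_learning(step, n_states=11, n_actions=11,
                     episodes=500, alpha=0.1, gamma=0.9, eps=0.1):
    """Tabular Q-learning with an epsilon-greedy policy.

    `step(s, a) -> (next_state, reward, done)` is a placeholder
    environment supplied by the caller.
    """
    # Empty Q-table with randomly initialized Q values (one row per state).
    Q = [[random.random() for _ in range(n_actions)] for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False  # initialize state S_t; an episode ends at the termination ratio
        while not done:
            if random.random() < eps:
                a = random.randrange(n_actions)                   # explore: random action
            else:
                a = max(range(n_actions), key=lambda i: Q[s][i])  # exploit: argmax_a Q(s, a)
            s2, r, done = step(s, a)                              # environment transition
            # Q(S_t,A_t) += alpha * (R_{t+1} + gamma * max_a Q(S_{t+1},a) - Q(S_t,A_t))
            bootstrap = 0.0 if done else gamma * max(Q[s2])       # no bootstrap past episode end
            Q[s][a] += alpha * (r + bootstrap - Q[s][a])
            s = s2
    return Q
```

Zeroing the bootstrap term at episode end is a standard refinement rather than something the patent states explicitly; otherwise the loop follows the described procedure term by term.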
In one embodiment, in the state transition control strategy table, the states are divided as: 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1; and the actions are divided as: 0%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%.
In one embodiment, the state termination ratio is 50%.
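With the state division and the 50% termination ratio above, the discretization can be illustrated as follows (the helper name `to_state_index` and the constants are hypothetical; the patent gives only the grids themselves):

```python
STATES = [i / 10 for i in range(11)]   # states 0, 0.1, ..., 1 (accumulated allocation ratio)
ACTIONS = [i / 10 for i in range(11)]  # actions 0%, 10%, ..., 100% of orders allocated
TERMINATION_RATIO = 0.5                # the 50% state termination ratio

def to_state_index(accumulated_ratio):
    """Map an accumulated order-allocation ratio in [0, 1] to the
    nearest discrete state index 0..10 of the Q-table."""
    idx = round(accumulated_ratio * 10)
    return max(0, min(10, idx))
```

The index returned by `to_state_index` would select the Q-table row for the current accumulated ratio, and an episode would end once that ratio reaches `TERMINATION_RATIO`.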
The present invention also provides a computer-readable storage medium on which computer instructions are stored; when the computer instructions run, the reinforcement-learning-based intelligent order-allocation method described above is executed.
The present invention also provides a logistics system comprising a memory and a processor, the memory storing computer instructions runnable on the processor; when running the computer instructions, the processor executes the reinforcement-learning-based intelligent order-allocation method described above.
In summary, the present invention addresses the real-time order assignment problem in intelligent dispatching systems in the smart logistics field. Based on the ratio accumulated before the current moment, orders are allocated using the strategy learned by reinforcement learning, so that the order volume can be controlled to reach the preset ratio before the end of the month. One advantage is that even if order allocation is adjusted manually partway through the period, the system can automatically adjust the ratio and still reach the final requirement. Another advantage of the scheme is that it can give a suitable allocation without knowing the future order volume, while guaranteeing that the expected ratio is reached at the end of the month.
The terms and expressions used here are for description only, and the present invention should not be limited to them. Use of these terms and expressions does not exclude any equivalents of the features shown and described (or parts thereof), and it should be recognized that various possible modifications should also be included in the scope of the claims. Other modifications, variations, and alternatives may also exist. Accordingly, the claims should be regarded as covering all such equivalents.
Likewise, it should be pointed out that although the present invention has been described with reference to specific embodiments, those of ordinary skill in the art should understand that the above embodiments are merely illustrative; various equivalent changes or substitutions may be made without departing from the spirit of the invention, and therefore changes and modifications of the above embodiments within the spirit of the invention shall all fall within the scope of the following claims.
Claims (10)
1. An intelligent order-allocation method, characterized in that the method comprises:
A: calculating a state transition probability based on historical logistics orders;
B: taking the order-allocation ratio accumulated for a transport mode up to the current moment as the state, taking the allocation ratio assigned to the transport mode at the current moment as the action, and taking the accumulated ratio to be reached at the end of a preset time period as the state termination ratio; building a complete state transition control strategy table using the state transition probability, and building a reinforcement learning model;
C: training the reinforcement learning model until the output strategy converges;
D: testing the output strategy using the historical logistics order data, and judging whether the order-allocation ratio over the preset time period meets the state termination ratio; if it does, using the output strategy as the intelligent order-allocation scheme; if not, repeating steps C-D.
2. The intelligent order-allocation method of claim 1, characterized in that step A comprises:
performing quantity statistics on the historical logistics orders to obtain the distribution interval of historical order quantities;
calculating the state transition probability based on the distribution interval of the historical order quantities.
3. The intelligent order-allocation method of claim 2, characterized in that the step of performing quantity statistics on the historical logistics orders to obtain the distribution interval of historical order quantities comprises:
producing a two-dimensional plot of the historical logistics order data, finding the most densely distributed region, and obtaining the distribution interval of the order quantities.
4. The intelligent order-allocation method of claim 1, characterized in that the reinforcement learning model is a Q-learning model and the output strategy is a Q-table.
5. The intelligent order-allocation method of claim 4, characterized in that building the reinforcement learning model comprises:
denoting the reinforcement learning model Q(S_t, A_t), where Q(S_t, A_t) is the value of executing action A_t in state S_t at time t;
creating an empty Q-table and randomly initializing the Q values in it;
initializing the state S_t and setting the state termination ratio, one episode ending when the algorithm reaches the state termination ratio;
for each episode, choosing the action to execute from the initialized S_t using the ε-greedy method: when a random number is greater than ε, choosing the action with the largest Q value, and when the random number is less than ε, taking a random action instead, so as to obtain the next state S_{t+1} and the return value R;
updating the reinforcement learning model Q(S_t, A_t);
iterating over many episodes until the Q-table converges to a fixed value.
6. The intelligent order-allocation method of claim 5, characterized in that the update formula of the reinforcement learning model is:

Q(S_t, A_t) = Q(S_t, A_t) + α[R_{t+1} + γ·max_a Q(S_{t+1}, a) − Q(S_t, A_t)]

where Q(S_t, A_t) is the value of executing action A_t in state S_t at time t, α is the update rate, R_{t+1} is the return value at the next moment, γ is the discount factor on the value function of the next moment, and max_a Q(S_{t+1}, a) is the value obtained when the state transitions to the next moment and the action with the largest value among all actions is executed.
7. The intelligent order-allocation method of claim 1, characterized in that, in the state transition control strategy table, the states are divided as: 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1; the actions are divided as: 0%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%; and the state termination ratio is 50%.
8. An intelligent order-allocation device, characterized in that the device comprises:
a state transition probability determination module, configured to calculate a state transition probability based on historical logistics orders;
a reinforcement learning model building module, configured to take the order-allocation ratio accumulated for a transport mode up to the current moment as the state, take the allocation ratio assigned to the transport mode at the current moment as the action, and take the accumulated ratio to be reached at the end of a preset time period as the state termination ratio, to build a complete state transition control strategy table using the state transition probability, and to build a reinforcement learning model;
a reinforcement learning model training module, configured to train the reinforcement learning model until the output strategy converges; and
a test judgment module, configured to test the output strategy using the historical logistics order data and to judge whether the order-allocation ratio over the preset time period meets the state termination ratio; if it does, the output strategy is used as the intelligent order-allocation scheme; if not, the reinforcement learning model training module is rerun until the order-allocation ratio over the preset time period meets the state termination ratio.
9. A computer-readable medium on which computer instructions are stored, the intelligent order-allocation method of any one of claims 1-7 being executed when the computer instructions run.
10. A logistics system comprising a memory and a processor, the memory storing computer instructions runnable on the processor, the processor executing the intelligent order-allocation method of any one of claims 1-7 when running the computer instructions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910382830.8A CN110111005A (en) | 2019-05-09 | 2019-05-09 | The single method and apparatus of intelligence point, computer-readable medium and logistics system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110111005A true CN110111005A (en) | 2019-08-09 |
Family
ID=67488935
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910382830.8A Pending CN110111005A (en) | 2019-05-09 | 2019-05-09 | The single method and apparatus of intelligence point, computer-readable medium and logistics system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110111005A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109214731A (en) * | 2017-06-29 | 2019-01-15 | 菜鸟智能物流控股有限公司 | Method and device for distributing logistics orders and computer system |
WO2019041000A1 (en) * | 2017-09-01 | 2019-03-07 | Go People Pty Ltd | An intelligent demand predictive pre-emptive pre-sorting e-commerce order fulfilment, sorting and dispatch system for dispatch routing optimisation |
CN109669452A (en) * | 2018-11-02 | 2019-04-23 | 北京物资学院 | A kind of cloud robot task dispatching method and system based on parallel intensified learning |
CN109725988A (en) * | 2017-10-30 | 2019-05-07 | 北京京东尚科信息技术有限公司 | A kind of method for scheduling task and device |
Non-Patent Citations (2)
Title |
---|
WU WANGUO (吴万国) ET AL.: "Research on an ADP algorithm for the multi-vehicle-type backhaul vehicle scheduling problem", Application Research of Computers (《计算机应用研究》) * |
彼岸花杀是条狗: "Q-learning", HTTPS://WWW.CNBLOGS.COM/YIFDU25/P/8169226.HTML * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112685518A (en) * | 2019-10-18 | 2021-04-20 | 菜鸟智能物流控股有限公司 | Service providing object distribution method, order distribution method and device |
CN112685518B (en) * | 2019-10-18 | 2023-10-20 | 菜鸟智能物流控股有限公司 | Service providing object distribution method, order distribution method and device |
CN111080408A (en) * | 2019-12-06 | 2020-04-28 | 广东工业大学 | Order information processing method based on deep reinforcement learning |
CN111080408B (en) * | 2019-12-06 | 2020-07-21 | 广东工业大学 | Order information processing method based on deep reinforcement learning |
CN113110493A (en) * | 2021-05-07 | 2021-07-13 | 北京邮电大学 | Path planning equipment and path planning method based on photonic neural network |
CN113110493B (en) * | 2021-05-07 | 2022-09-30 | 北京邮电大学 | Path planning equipment and path planning method based on photonic neural network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110111005A (en) | The single method and apparatus of intelligence point, computer-readable medium and logistics system | |
CN101206801B (en) | Self-adaption traffic control method | |
Maciejewski et al. | Large-scale microscopic simulation of taxi services | |
CN106776005A (en) | A kind of resource management system and method towards containerization application | |
CN107831685B (en) | Group robot control method and system | |
CN109636213A (en) | Order distribution and evaluation method and device, electronic equipment and storage medium | |
CN108449286A (en) | Network bandwidth resources distribution method and device | |
CN109492774A (en) | A kind of cloud resource dispatching method based on deep learning | |
CN109508839A (en) | Order allocation method and device | |
CN105913209A (en) | Warehouse management system, warehouse management method and cargo distribution method | |
CN107506845A (en) | A kind of electricity sales amount Forecasting Methodology and its system based on multi-model fusion | |
Alshamsi et al. | Multiagent self-organization for a taxi dispatch system | |
CN109993377A (en) | A kind of Intelligent worker assigning method | |
CN114841476B (en) | Urban rainwater resource utilization space-time dynamic allocation and transaction method and system | |
CN109343945A (en) | A kind of multitask dynamic allocation method based on contract net algorithm | |
KR102042413B1 (en) | Network optimization system and nethod of public transportation | |
CN110247795A (en) | A kind of cloud net resource service chain method of combination and system based on intention | |
CN112332404A (en) | Intelligent management system and method for heating service | |
CN109345296A (en) | Common people's Travel Demand Forecasting method, apparatus and terminal | |
Etkin et al. | Stochastic programming for improved multiuse reservoir operation in Burkina Faso, West Africa | |
CN117974356B (en) | Water supply allocation method for water supply plant | |
CN115062868A (en) | Pre-polymerization type vehicle distribution path planning method and device | |
KR20140120498A (en) | Smart water production&management system | |
Hapke et al. | A DSS for Ressource—Constrained Project Scheduling under Uncertainty | |
Paolucci et al. | Allocating crude oil supply to port and refinery tanks: a simulation-based decision support system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |

Application publication date: 20190809 |