
CN110111005A - Intelligent order-allocation method and apparatus, computer-readable medium, and logistics system - Google Patents

Intelligent order-allocation method and apparatus, computer-readable medium, and logistics system

Info

Publication number
CN110111005A
CN110111005A (application CN201910382830.8A)
Authority
CN
China
Prior art keywords
state
ratio
intelligence
order
learning model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910382830.8A
Other languages
Chinese (zh)
Inventor
金忠孝
袁彪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SAIC Anji Logistics Co Ltd
Original Assignee
SAIC Anji Logistics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SAIC Anji Logistics Co Ltd
Priority application: CN201910382830.8A
Publication: CN110111005A
Legal status: Pending


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00: Administration; Management
    • G06Q 10/06: Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q 10/063: Operations research, analysis or management
    • G06Q 10/0631: Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q 10/06311: Scheduling, planning or task assignment for a person or group
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00: Administration; Management
    • G06Q 10/08: Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q 10/083: Shipping

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Game Theory and Decision Science (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention provides an intelligent order-allocation method and apparatus, a computer-readable medium and a logistics system. The method comprises: A: calculating state transition probabilities based on historical logistics orders; B: taking the accumulated order-allocation ratio of one transport mode at the current time as the state, taking the allocation proportion assigned to that transport mode at the current time as the action, and taking the accumulated ratio that must be reached at the end of a preset time period as the state termination ratio; establishing a complete state-transition policy table using the state transition probabilities, and establishing a reinforcement-learning model; C: training the reinforcement-learning model until the output policy converges; D: testing the output policy on the historical logistics order data, and judging whether the order-allocation ratio over the preset time period satisfies the state termination ratio; if it does, using the output policy as the intelligent allocation scheme; if it does not, repeating steps C-D.

Description

Intelligent order-allocation method and apparatus, computer-readable medium, and logistics system
Technical field
The present invention relates to logistics systems, and more particularly to an intelligent order-allocation method and apparatus, a computer-readable medium, and a logistics system.
Background art
Intelligent logistics transport lies at the intersection of artificial intelligence and logistics. It aims to replace manual methods with intelligent algorithms from artificial intelligence so as to solve common problems in the logistics field, such as intelligent order allocation, route planning, road-junction planning, vehicle scheduling and warehouse location. Intelligent order allocation can in turn be divided into: intelligent route-optimizing allocation, whose goal is to optimize routes; intelligent time-saving allocation, whose goal is efficiency; intelligent ratio-regulating allocation, whose goal is a future accumulated ratio; and so on.
For intelligent ratio-regulating order allocation aimed at a future accumulated ratio, the intelligent logistics transport field still lacks a good solution to the allocation-ratio problem. Because the exact number of future daily orders cannot be known, the common practice today is for staff to allocate orders by their own experience, essentially at random, and the result at the end of the month rarely reaches the ratio set in advance.
Summary of the invention
The present invention addresses the problem of intelligent ratio-regulating order allocation aimed at a future accumulated ratio.
The present invention provides an intelligent order-allocation method, the method comprising:
A: calculating state transition probabilities based on historical logistics orders;
B: taking the accumulated order-allocation ratio of one transport mode at the current time as the state, taking the allocation proportion assigned to that transport mode at the current time as the action, and taking the accumulated ratio that must be reached at the end of a preset time period as the state termination ratio; establishing a complete state-transition policy table using the state transition probabilities, and establishing a reinforcement-learning model;
C: training the reinforcement-learning model until the output policy converges;
D: testing the output policy on the historical logistics order data, and judging whether the order-allocation ratio over the preset time period satisfies the state termination ratio; if it does, using the output policy as the intelligent allocation scheme; if it does not, repeating steps C-D.
In one embodiment, step A comprises:
performing quantity statistics on the historical logistics orders to obtain the distribution interval of historical order quantities;
calculating the state transition probabilities based on the distribution interval of historical order quantities.
In one embodiment, the step of performing quantity statistics on the historical logistics orders to obtain the distribution interval of historical order quantities comprises:
producing a two-dimensional plot of the historical logistics order data, finding the most densely distributed region, and taking it as the distribution interval of order quantities.
In one embodiment, the reinforcement-learning model is a Q-learning model, and the output policy is a Q-table.
In one embodiment, establishing the reinforcement-learning model comprises:
representing the reinforcement-learning model as Q(St, At), where Q(St, At) denotes the value of executing action At in state St at time t;
establishing an empty Q-table and randomly initializing the Q-values in the Q-table;
initializing a state St and establishing the state termination ratio, where reaching the state termination ratio means that one episode ends;
for each episode, choosing the action to execute from the initialized St by the ε-greedy method: when a random number is greater than ε, choosing the action with the largest current Q-value, and when the random number is less than ε, choosing a random action, thereby obtaining the next state S(t+1) and a return value R;
updating the reinforcement-learning model Q(St, At);
iterating over many episodes until the Q-table converges to fixed values.
In one embodiment, the update formula of the reinforcement-learning model is:
Q(St, At) = Q(St, At) + α·[R(t+1) + γ·max_a Q(S(t+1), a) - Q(St, At)]
where Q(St, At) denotes the value of executing action At in state St at time t, α denotes the learning rate, R(t+1) denotes the return value at the next time step, γ is the discount factor applied to the value function of the next time step, and max_a Q(S(t+1), a) denotes the value obtained when the state jumps to the next time step and the action a of greatest value among all actions is executed.
In one embodiment, in the state-transition policy table, the states are divided as: 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1; and the actions are divided as: 0%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%.
In one embodiment, the state termination ratio is 50%.
The present invention also provides an intelligent order-allocation device, the device comprising:
a state-transition-probability determination module, configured to calculate state transition probabilities based on historical logistics orders;
a reinforcement-learning model building module, configured to take the accumulated order-allocation ratio of one transport mode at the current time as the state, take the allocation proportion assigned to that transport mode at the current time as the action, take the accumulated ratio that must be reached at the end of a preset time period as the state termination ratio, establish a complete state-transition policy table using the state transition probabilities, and establish a reinforcement-learning model;
a reinforcement-learning model training module, configured to train the reinforcement-learning model until the output policy converges; and
a testing and judging module, configured to test the output policy on the historical logistics order data and judge whether the order-allocation ratio over the preset time period satisfies the state termination ratio; if it does, the output policy is used as the intelligent allocation scheme; if it does not, the reinforcement-learning model training module is rerun until the order-allocation ratio over the preset time period satisfies the state termination ratio.
In one embodiment, the state-transition-probability determination module is further configured to perform quantity statistics on the historical logistics orders to obtain the distribution interval of historical order quantities, and to calculate the state transition probabilities based on that distribution interval.
In one embodiment, the reinforcement-learning model is a Q-learning model, and the output policy is a Q-table.
In one embodiment, the reinforcement-learning model building module establishes the reinforcement-learning model as follows:
representing the reinforcement-learning model as Q(St, At), where Q(St, At) denotes the value of executing action At in state St at time t;
establishing an empty Q-table and randomly initializing the Q-values in the Q-table;
initializing a state St and establishing the state termination ratio, where reaching the state termination ratio means that one episode ends;
for each episode, choosing the action to execute from the initialized St by the ε-greedy method: when a random number is greater than ε, choosing the action with the largest current Q-value, and when the random number is less than ε, choosing a random action, thereby obtaining the next state S(t+1) and a return value R;
updating the reinforcement-learning model Q(St, At);
iterating over many episodes until the Q-table converges to fixed values.
In one embodiment, the update formula of the reinforcement-learning model is:
Q(St, At) = Q(St, At) + α·[R(t+1) + γ·max_a Q(S(t+1), a) - Q(St, At)]
where Q(St, At) denotes the value of executing action At in state St at time t, α denotes the learning rate, R(t+1) denotes the return value at the next time step, γ is the discount factor applied to the value function of the next time step, and max_a Q(S(t+1), a) denotes the value obtained when the state jumps to the next time step and the action a of greatest value among all actions is executed.
In one embodiment, in the state-transition policy table, the states are divided as: 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1; and the actions are divided as: 0%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%.
In one embodiment, the state termination ratio is 50%.
The present invention also provides a computer-readable medium storing computer instructions that, when run, execute the intelligent order-allocation method described above.
The present invention also provides a logistics system comprising a memory and a processor, the memory storing computer instructions runnable on the processor, wherein the processor, when running the computer instructions, executes the intelligent order-allocation method described above.
The present invention solves the real-time order-assignment problem in intelligent dispatching systems in the intelligent logistics field. Based on the ratio accumulated up to the current time, orders are allocated using the policy learned by reinforcement learning, so that the order volume can be controlled to reach, at the end of the month, the ratio set in advance. An advantage is that even if the allocation is manually adjusted mid-way, the ratio is automatically readjusted so that the final requirement is still met. Another advantage of the scheme is that a suitable allocation can be given without knowing future order quantities, while still guaranteeing that the expected ratio is reached at the end of the month.
Brief description of the drawings
The above summary of the invention and the following detailed description are better understood when read in conjunction with the drawings. It should be noted that the drawings serve only as examples of the claimed invention. In the drawings, identical reference numerals represent identical or similar elements.
Fig. 1 shows a flowchart of an intelligent order-allocation method according to an embodiment of the invention;
Fig. 2 shows a histogram of order quantities over five months of history;
Fig. 3 shows a two-dimensional plot of historical order quantities;
Fig. 4 shows a state-transition policy table according to an embodiment of the invention;
Fig. 5 shows the values of the Q-table after convergence;
Fig. 6 shows the accumulated allocation ratio obtained by the intelligent order-allocation method according to an embodiment of the invention.
Detailed description of the embodiments
The detailed features and advantages of the invention are described in the embodiments below, in sufficient detail for any person skilled in the art to understand the technical content of the invention and implement it accordingly; from the present specification, the claims and the drawings, a skilled person will readily understand the relevant objects and advantages of the invention.
Intelligent logistics transport lies at the intersection of artificial intelligence and logistics. It aims to replace manual methods with intelligent algorithms from artificial intelligence so as to solve common problems in the logistics field, such as intelligent order allocation, route planning, road-junction planning, vehicle scheduling and warehouse location. Intelligent order allocation can in turn be divided into: intelligent route-optimizing allocation, whose goal is to optimize routes; intelligent time-saving allocation, whose goal is efficiency; intelligent ratio-regulating allocation, whose goal is a future accumulated ratio; and so on.
For intelligent ratio-regulating order allocation aimed at a future accumulated ratio, the intelligent logistics transport field still lacks a good solution to the allocation-ratio problem. Because the exact number of future daily orders cannot be known, the common practice today is for staff to allocate orders by their own experience, essentially at random, and the result at the end of the month rarely reaches the ratio set in advance.
The present invention relates to machine learning and intelligent logistics transport, and applies to the two transport modes of highway and waterway: daily orders are monitored in real time so that the allocation proportion between the two, as counted at the end of the month, reaches a fixed required ratio.
Specifically, the invention uses the Q-learning algorithm of reinforcement learning. By collecting statistics on the historical order distribution and combining them with actual allocation operations, a mathematical model of reinforcement-learning-based intelligent order allocation is established; the computer learns by repeated feedback and finally converges to an optimal policy. In subsequent concrete allocation operations, the machine executes the policy it has learned and completes the whole allocation process, so that the highway and waterway ratios at the end of the month reach the preset ratio.
The invention first analyzes the daily order volume in historical data and computes statistics on it to find a distribution interval of daily order volume. From this interval, given the accumulated ratio that the waterway (or highway) occupies in the allocated historical orders at the current time, we can infer how the accumulated waterway (or highway) ratio changes when today's orders are allocated to the waterway (or highway) in different proportions.
The invention establishes a reinforcement-learning model using the Q-learning method. The accumulated order-allocation ratio of the waterway (or highway) at the current time is taken as the state, and the percentage allocated to the waterway (or highway) at the current time is taken as the action. From the states and actions, and the state transition probabilities estimated from the earlier historical statistics, a complete state-transition policy table can be established. The ratio that must be reached at the end of the month is then taken as the end point of state jumps, denoted end, and the machine learns Q(S, A) on its own to control the allocation. The update formula of the learning algorithm is:
Q(St, At) = Q(St, At) + α·[R(t+1) + γ·max_a Q(S(t+1), a) - Q(St, At)]
where Q(St, At) denotes the value of executing action At in state St at time t, α denotes the learning rate, R(t+1) denotes the return value at the next time step, γ is the discount factor applied to the value function of the next time step, and max_a Q(S(t+1), a) denotes the value obtained when the state jumps to the next time step and the action a of greatest value among all actions is executed.
The reinforcement-learning algorithm of the invention is explained in detail below with reference to the drawings. It comprises: statistically analyzing historical data to obtain state transition probabilities; modeling by discretizing the accumulated ratio and the allocation proportion; training the model with the reinforcement-learning method to obtain an allocation policy; and handing the allocation over to that policy.
Fig. 1 shows a flowchart of a reinforcement-learning-based intelligent order-allocation method according to an embodiment of the invention.
Step 101: perform quantity statistics on historical logistics orders. In one embodiment, statistics are collected on the historical order distribution, the order volume is plotted in two dimensions, the most densely distributed region is found, and the distribution interval of order quantities is obtained and counted.
For example, see Fig. 2 and Fig. 3: Fig. 2 is a histogram of order quantities over the past five months of history, with time (unit: days) on the abscissa and the number of orders on the ordinate; Fig. 3 is the two-dimensional plot of historical order quantities from which the distribution interval is cut.
Step 102: calculate the state transition probabilities based on the distribution interval. For example, from the distribution interval obtained in step 101 and the accumulated ratio that the waterway (or highway) occupies in the allocated historical orders at the current time, one can infer how the accumulated waterway (or highway) ratio changes when today's orders are allocated to the waterway (or highway) in different proportions, and thereby obtain the state transition probabilities.
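Steps 101 and 102 can be sketched as follows. This is only an illustration, not the patented implementation: the Poisson-distributed `daily_orders`, the 10th-90th percentile band used as the distribution interval, and the assumed `total_so_far` of already-allocated orders all stand in for the real five-month history, which the patent does not disclose.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for five months of daily order counts.
daily_orders = rng.poisson(lam=200, size=150)

# Step 101: take the most densely populated band of the distribution,
# here the 10th-90th percentile interval.
low, high = np.percentile(daily_orders, [10, 90])

# Discretized accumulated-ratio states, 0.0 .. 1.0 in steps of 0.1.
states = np.round(np.arange(0.0, 1.01, 0.1), 1)

def transition_probs(state, action, total_so_far=2000.0, n_samples=10_000):
    """Step 102: empirical distribution of tomorrow's accumulated ratio when
    `action` (an allocation proportion) is applied to one day's orders drawn
    from the distribution interval."""
    samples = rng.uniform(low, high, size=n_samples)
    new_ratio = (state * total_so_far + action * samples) / (total_so_far + samples)
    idx = np.clip(np.round(new_ratio * 10).astype(int), 0, 10)
    return np.bincount(idx, minlength=len(states)) / n_samples

p = transition_probs(state=0.4, action=0.8)  # one row of the transition table
```

Evaluating `transition_probs` over every (state, action) pair yields the transition estimates from which the state-transition policy table of step 103 is built.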
Step 103: based on a reinforcement-learning (Q-learning) model, take the accumulated order-allocation ratio of the waterway (or highway) at the current time as the state, take the percentage allocated to the waterway (or highway) at the current time as the action, and establish a complete state-transition policy table (i.e., a model policy table) using the state transition probabilities estimated from the earlier historical statistics; then denote the accumulated ratio that must be reached at the end of the month as end, the end point of state jumps, and establish the reinforcement-learning (Q-learning) model so that the learned Q(S, A) controls the allocation.
As an example, Fig. 4 shows a state-transition policy table according to the invention, with the accumulated ratio as the state, divided as: 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1; the allocation proportion as the action, divided as: 0%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%; and a final accumulated ratio of 50% as the target.
Step 104: train the model of the reinforcement-learning (Q-learning) algorithm.
Step 105: continue training the model of the reinforcement-learning (Q-learning) algorithm until the output policy Q-table converges. In one embodiment, Fig. 5 shows the values of the Q-table after convergence.
Step 106: test the output policy Q-table on historical data.
Step 107: judge whether the monthly order-allocation ratio satisfies the preset state termination ratio; if it does, proceed to step 108; if it does not, continue from step 104 until it is satisfied.
Step 108: use the resulting policy Q-table as the daily execution policy for subsequent allocation. Based on the current order ratio, choose the allocation proportion with the largest Q-table value as the allocation scheme.
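Step 108 amounts to a greedy read-out of the converged Q-table. A minimal sketch under assumed names: the randomly filled `q_table` here merely stands in for the trained table of Fig. 5, and the 11-by-11 indexing follows the state and action grids of Fig. 4.

```python
import numpy as np

actions = np.round(np.arange(0.0, 1.01, 0.1), 1)   # allocation proportions of Fig. 4

# Randomly filled stand-in for the converged Q-table of Fig. 5
# (rows: accumulated-ratio states, columns: allocation actions).
rng = np.random.default_rng(1)
q_table = rng.normal(size=(11, len(actions)))

def allocation_for(current_ratio):
    """Step 108: choose the action with the largest Q-value for today's state."""
    s = int(round(current_ratio * 10))   # index on the 0.1-wide state grid
    return float(actions[int(np.argmax(q_table[s]))])

today = allocation_for(0.4)   # proportion to allocate when the accumulated ratio is 0.4
```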
Those skilled in the art will appreciate that although the examples of the invention involve the two transport modes of waterway and highway, the invention is not limited to these two modes and, where applicable, also applies to other kinds of transport modes.
Nitrification enhancement model (Q-learning) of the invention uses Q (St,At) indicate, establishment process is as follows:
Step 1: establishing empty Q-table, and the value of the Q (S, A) in random initializtion table.
Step 2: one state S of initializationt, and establish the mark of an end.The ratio for controlling needs in this algorithm As end state.If algorithm reaches this state and means that one bout terminates.
Step 3: for each bout, carrying out the S to initialization in the method for ε-greedytChoose the movement executed.When with It is current to choose the maximum movement of Q value when machine probability>ε, and when random chance<ε, random move also is taken in movement Make.It can be obtained by the S of next step in this wayt+1And return value R.
Step 4: to existing intensified learning model Q (St,At) be updated, the formula of update is:
Q(St,At)=Q (St,At)+α[Rt+1+γ·maxaQ(St+1,a)-Q(St,At)]
Wherein, Q (St,At) indicate the state S under t momenttExecution acts AtValue, α indicate renewal rate, Rt+1Table Show the return value of subsequent time, γ is the discount factor to the cost function of subsequent time, maxaQ(St+1, a) indicate that state is jumped Valence when going to subsequent time and executing that action (being indicated here with a) of Maximum Value inside all action states Value.
Step 5: iterate multi-round, until Q-table converges on fixed value.
It should be pointed out that above-mentioned ε-greedy method is it is known in the art that details are not described herein.
The pseudocode of the Q-learning algorithm is as follows:
Initialize Q(S, A); terminal state S_terminal = the preset control ratio
FOR episode = 1 .. 10000:
    Initialize St
    REPEAT for each step of the episode:
        Choose the action A to execute from St by the ε-greedy method
        Take action A; receive the return R and the next state S(t+1)
        Update Q(St, At):
            Q(St, At) = Q(St, At) + α·[R(t+1) + γ·max_a Q(S(t+1), a) - Q(St, At)]
        Take S(t+1) as the state of the next step
    UNTIL S(t+1) = S_terminal
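The pseudocode above can be turned into a runnable sketch. Everything about the environment below is illustrative rather than taken from the patent: the way `step` blends one day's orders into the accumulated ratio, the reward of +10 at the target and -1 per step, and the constants `ALPHA`, `GAMMA`, `EPS`, `total` and `daily` are all assumed values.

```python
import random

random.seed(0)
STATES  = [round(0.1 * i, 1) for i in range(11)]   # accumulated ratios
ACTIONS = [round(0.1 * i, 1) for i in range(11)]   # allocation proportions
TARGET  = 0.5                  # state termination ratio (50% in the example)
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

def step(state, action, total=1000.0, daily=100.0):
    """Toy transition: blend one day's allocation into the accumulated ratio."""
    nxt = round((state * total + action * daily) / (total + daily), 1)
    reward = 10.0 if nxt == TARGET else -1.0       # reward reaching the target
    return nxt, reward

Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}

for episode in range(10_000):
    s = random.choice(STATES)                      # initialize St
    for _ in range(50):                            # cap the episode length
        if random.random() < EPS:                  # ε-greedy: explore ...
            a = random.choice(ACTIONS)
        else:                                      # ... or exploit
            a = max(ACTIONS, key=lambda x: Q[(s, x)])
        s_next, r = step(s, a)
        best_next = max(Q[(s_next, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s_next
        if s == TARGET:                            # episode ends at S_terminal
            break

policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in STATES}
```

In this toy environment the only action that moves the 0.4 state onto the 0.5 target in a single step is allocating 100% of the day's orders, and the learned greedy policy selects exactly that action.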
The invention is further explained below with reference to a specific example:
Fig. 2 is the statistics of order quantities over the past five months of history, and Fig. 3 is the two-dimensional plot of historical order quantities from which the distribution interval is cut. Taking the accumulated ratio as the state and the allocation proportion as the action, the jump probabilities are calculated from the distribution interval of historical order quantities, giving the state-transition policy network of Fig. 4 (with a final accumulated ratio of 50% as the target). A reinforcement-learning (Q-learning) model is established on this policy network, and training yields the output Q-table of Fig. 5. Allocation of one month of historical orders is then tested against the Q-table; the test result is shown in Fig. 6, from which it can be seen that the accumulated ratio reaches 50% at the end of the month.
The present invention also provides a reinforcement-learning-based intelligent order-allocation device, the device comprising the following modules:
a state-transition-probability determination module, configured to calculate state transition probabilities based on historical logistics orders;
a reinforcement-learning model building module, configured to take the accumulated order-allocation ratio of one transport mode at the current time as the state, take the allocation proportion assigned to that transport mode at the current time as the action, take the accumulated ratio that must be reached at the end of a preset time period as the state termination ratio, establish a complete state-transition policy table using the state transition probabilities, and establish a reinforcement-learning model;
a reinforcement-learning model training module, configured to train the reinforcement-learning model until the output policy converges; and
a testing and judging module, configured to test the output policy on the historical logistics order data and judge whether the order-allocation ratio over the preset time period satisfies the state termination ratio; if it does, the output policy is used as the intelligent allocation scheme; if it does not, the reinforcement-learning model training module is rerun until the order-allocation ratio over the preset time period satisfies the state termination ratio.
In one embodiment, the state-transition-probability determination module is further configured to perform quantity statistics on the historical logistics orders to obtain the distribution interval of historical order quantities, and to calculate the state transition probabilities based on that distribution interval.
In one embodiment, the reinforcement-learning model is a Q-learning model, and the output policy is a Q-table.
In one embodiment, the reinforcement-learning model building module establishes the reinforcement-learning model as follows:
representing the reinforcement-learning model as Q(St, At), where Q(St, At) denotes the value of executing action At in state St at time t;
establishing an empty Q-table and randomly initializing the Q-values in the Q-table;
initializing a state St and establishing the state termination ratio, where reaching the state termination ratio means that one episode ends;
for each episode, choosing the action to execute from the initialized St by the ε-greedy method: when a random number is greater than ε, choosing the action with the largest current Q-value, and when the random number is less than ε, choosing a random action, thereby obtaining the next state S(t+1) and a return value R;
updating the reinforcement-learning model Q(St, At);
iterating over many episodes until the Q-table converges to fixed values.
In one embodiment, the update formula of the reinforcement-learning model is:
Q(St, At) = Q(St, At) + α·[R(t+1) + γ·max_a Q(S(t+1), a) - Q(St, At)]
where Q(St, At) denotes the value of executing action At in state St at time t, α denotes the learning rate, R(t+1) denotes the return value at the next time step, γ is the discount factor applied to the value function of the next time step, and max_a Q(S(t+1), a) denotes the value obtained when the state jumps to the next time step and the action a of greatest value among all actions is executed.
In one embodiment, in the state-transition policy table, the states are divided as: 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1; and the actions are divided as: 0%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%.
In one embodiment, the state termination ratio is 50%.
The present invention also provides a computer-readable storage medium storing computer instructions that, when run, execute the reinforcement-learning-based intelligent order-allocation method described above.
The present invention also provides a logistics system comprising a memory and a processor, the memory storing computer instructions runnable on the processor, wherein the processor, when running the computer instructions, executes the reinforcement-learning-based intelligent order-allocation method described above.
In summary, the present invention solves the real-time order allocation problem in intelligent dispatching systems in the intelligent logistics field. Based on the ratio accumulated before the current moment, orders are allocated using the strategy learned by reinforcement learning, so that the proportion of the order volume can be controlled to reach the preset ratio before the end of the month. One advantage is that even if orders are manually reallocated in the interim, the method automatically adjusts the ratio so that the final requirement is still met. Another advantage of this scheme is that it can give a suitable allocation without knowing the future order quantity, while guaranteeing that the expected ratio is reached at the end of the month.
The terms and expressions used here are for description only, and the present invention should not be limited to these terms and expressions. The use of these terms and expressions is not meant to exclude equivalent features of anything shown and described (or parts thereof), and it should be recognized that various possible modifications should also be included within the scope of the claims. Other modifications, variations and alternatives are also possible. Accordingly, the claims should be regarded as covering all such equivalents.
Likewise, it should be pointed out that although the present invention has been described with reference to the present specific embodiments, those of ordinary skill in the art should understand that the above embodiments are only intended to illustrate the present invention, and that various equivalent changes or substitutions may be made without departing from the spirit of the invention; therefore, changes to and modifications of the above embodiments within the spirit of the invention shall all fall within the scope of the following claims.

Claims (10)

1. An intelligent order-splitting method, characterized in that the method comprises:
A: calculating state transition probabilities based on historical logistics orders;
B: taking the order allocation ratio accumulated for a transport mode up to the current moment as the state, taking the allocation proportion assigned to the transport mode decided at the current moment as the action, and taking the accumulated ratio to be achieved at the end of a preset time period as the state termination ratio; establishing a complete state transfer control strategy table using the state transition probabilities, and establishing a reinforcement learning model;
C: training the reinforcement learning model until the output strategy converges;
D: testing the output strategy using the data of the historical logistics orders, and judging whether the order allocation ratio of the preset time period meets the state termination ratio; if so, taking the output strategy as the intelligent order-splitting scheme; if not, continuing to repeat steps C-D.
2. The intelligent order-splitting method of claim 1, characterized in that step A comprises:
performing quantity statistics on the historical logistics orders to obtain the distribution interval of historical order quantities;
calculating the state transition probabilities based on the distribution interval of the historical order quantities.
3. The intelligent order-splitting method of claim 2, characterized in that the step of performing quantity statistics on the historical logistics orders to obtain the distribution interval of historical order quantities comprises:
producing a two-dimensional plot of the historical logistics order data, finding the most densely distributed region, and thereby obtaining the distribution interval of the order quantities.
4. The intelligent order-splitting method of claim 1, characterized in that the reinforcement learning model is a Q-learning model, and the output strategy is a Q-table.
5. The intelligent order-splitting method of claim 4, characterized in that establishing the reinforcement learning model comprises:
denoting the reinforcement learning model Q(St, At), where Q(St, At) represents the value of executing action At in state St at time t;
creating an empty Q-table and randomly initializing the Q-values in the Q-table;
initializing the state St and establishing the state termination ratio, wherein reaching the state termination ratio means that one episode ends;
for each episode, selecting the action to execute for the initialized St by the ε-greedy method: when a random probability is greater than ε, choosing the action with the largest Q-value, and when the random probability is less than ε, taking a random action instead, thereby obtaining the next state St+1 and the reward R;
updating the reinforcement learning model Q(St, At);
iterating over many episodes until the Q-table converges to a fixed value.
6. The intelligent order-splitting method of claim 5, characterized in that the update formula of the reinforcement learning model is:
Q(St, At) = Q(St, At) + α[Rt+1 + γ·maxa Q(St+1, a) − Q(St, At)]
where Q(St, At) represents the value of executing action At in state St at time t, α is the update rate, Rt+1 is the reward at the next time step, γ is the discount factor applied to the value function of the next time step, and maxa Q(St+1, a) denotes the value obtained when the state transitions to the next time step and the action with the largest value among all actions is executed.
7. The intelligent order-splitting method of claim 1, characterized in that in the state transfer control strategy table, the states are divided as: 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1; the actions are divided as: 0%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%; and the state termination ratio is 50%.
8. An intelligent order-splitting device, characterized in that the device comprises:
a state transition probability determination module configured to calculate state transition probabilities based on historical logistics orders;
a reinforcement learning model building module configured to take the order allocation ratio accumulated for a transport mode up to the current moment as the state, take the allocation proportion assigned to the transport mode decided at the current moment as the action, take the accumulated ratio to be achieved at the end of a preset time period as the state termination ratio, establish a complete state transfer control strategy table using the state transition probabilities, and establish a reinforcement learning model;
a reinforcement learning model training module configured to train the reinforcement learning model until the output strategy converges; and
a testing and judgment module configured to test the output strategy using the data of the historical logistics orders and judge whether the order allocation ratio of the preset time period meets the state termination ratio; if so, the output strategy is taken as the intelligent order-splitting scheme; if not, the reinforcement learning model training module is rerun until the order allocation ratio of the preset time period meets the state termination ratio.
9. A computer-readable medium having computer instructions stored thereon, wherein when the computer instructions run, the intelligent order-splitting method of any one of claims 1-7 is executed.
10. A logistics system comprising a memory and a processor, the memory storing computer instructions executable on the processor, wherein the processor, when running the computer instructions, executes the intelligent order-splitting method of any one of claims 1-7.
CN201910382830.8A 2019-05-09 2019-05-09 Intelligent order-splitting method and device, computer-readable medium and logistics system Pending CN110111005A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910382830.8A CN110111005A (en) 2019-05-09 2019-05-09 Intelligent order-splitting method and device, computer-readable medium and logistics system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910382830.8A CN110111005A (en) 2019-05-09 2019-05-09 Intelligent order-splitting method and device, computer-readable medium and logistics system

Publications (1)

Publication Number Publication Date
CN110111005A true CN110111005A (en) 2019-08-09

Family

ID=67488935

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910382830.8A Pending CN110111005A (en) Intelligent order-splitting method and device, computer-readable medium and logistics system

Country Status (1)

Country Link
CN (1) CN110111005A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111080408A (en) * 2019-12-06 2020-04-28 广东工业大学 Order information processing method based on deep reinforcement learning
CN112685518A (en) * 2019-10-18 2021-04-20 菜鸟智能物流控股有限公司 Service providing object distribution method, order distribution method and device
CN113110493A (en) * 2021-05-07 2021-07-13 北京邮电大学 Path planning equipment and path planning method based on photonic neural network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109214731A (en) * 2017-06-29 2019-01-15 菜鸟智能物流控股有限公司 Method and device for distributing logistics orders and computer system
WO2019041000A1 (en) * 2017-09-01 2019-03-07 Go People Pty Ltd An intelligent demand predictive pre-emptive pre-sorting e-commerce order fulfilment, sorting and dispatch system for dispatch routing optimisation
CN109669452A (en) * 2018-11-02 2019-04-23 北京物资学院 A kind of cloud robot task dispatching method and system based on parallel intensified learning
CN109725988A (en) * 2017-10-30 2019-05-07 北京京东尚科信息技术有限公司 A kind of method for scheduling task and device


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WU, WANGUO et al.: "Research on an ADP algorithm for the multi-vehicle-type backhaul vehicle scheduling problem", Application Research of Computers *
彼岸花杀是条狗: "Q-learning", https://www.cnblogs.com/yifdu25/p/8169226.html *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112685518A (en) * 2019-10-18 2021-04-20 菜鸟智能物流控股有限公司 Service providing object distribution method, order distribution method and device
CN112685518B (en) * 2019-10-18 2023-10-20 菜鸟智能物流控股有限公司 Service providing object distribution method, order distribution method and device
CN111080408A (en) * 2019-12-06 2020-04-28 广东工业大学 Order information processing method based on deep reinforcement learning
CN111080408B (en) * 2019-12-06 2020-07-21 广东工业大学 Order information processing method based on deep reinforcement learning
CN113110493A (en) * 2021-05-07 2021-07-13 北京邮电大学 Path planning equipment and path planning method based on photonic neural network
CN113110493B (en) * 2021-05-07 2022-09-30 北京邮电大学 Path planning equipment and path planning method based on photonic neural network

Similar Documents

Publication Publication Date Title
CN110111005A (en) Intelligent order-splitting method and device, computer-readable medium and logistics system
CN101206801B (en) Self-adaption traffic control method
Maciejewski et al. Large-scale microscopic simulation of taxi services
CN106776005A (en) A kind of resource management system and method towards containerization application
CN107831685B (en) Group robot control method and system
CN109636213A (en) Order distribution and evaluation method and device, electronic equipment and storage medium
CN108449286A (en) Network bandwidth resources distribution method and device
CN109492774A (en) A kind of cloud resource dispatching method based on deep learning
CN109508839A (en) Order allocation method and device
CN105913209A (en) Warehouse management system, warehouse management method and cargo distribution method
CN107506845A (en) A kind of electricity sales amount Forecasting Methodology and its system based on multi-model fusion
Alshamsi et al. Multiagent self-organization for a taxi dispatch system
CN109993377A (en) A kind of Intelligent worker assigning method
CN114841476B (en) Urban rainwater resource utilization space-time dynamic allocation and transaction method and system
CN109343945A (en) A kind of multitask dynamic allocation method based on contract net algorithm
KR102042413B1 (en) Network optimization system and nethod of public transportation
CN110247795A (en) A kind of cloud net resource service chain method of combination and system based on intention
CN112332404A (en) Intelligent management system and method for heating service
CN109345296A (en) Common people's Travel Demand Forecasting method, apparatus and terminal
Etkin et al. Stochastic programming for improved multiuse reservoir operation in Burkina Faso, West Africa
CN117974356B (en) Water supply allocation method for water supply plant
CN115062868A (en) Pre-polymerization type vehicle distribution path planning method and device
KR20140120498A (en) Smart water production & management system
Hapke et al. A DSS for Resource-Constrained Project Scheduling under Uncertainty
Paolucci et al. Allocating crude oil supply to port and refinery tanks: a simulation-based decision support system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190809
