Specific embodiments
The present application is described in further detail below with reference to the accompanying drawings.
In a typical configuration of the present application, the terminal and the devices of the service network each include one or more processors (CPUs), an input/output interface, a network interface, and a memory.
The memory may include non-permanent memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
An embodiment of the present application provides a method for testing an application program. The method uses reinforcement learning, so that a continuously updated reward value table influences the subsequent selection of operation path information and reinforces the probability of choosing operation paths that reach the target page. Operation paths that reach the target page can therefore be explored automatically and more efficiently.
In practice, the executing subject of the method may be a user device, a network device, or a device formed by integrating a user device and a network device over a network; it may also be a program running on such devices. The user device includes, but is not limited to, terminal devices such as computers, mobile phones, and tablet computers; the network device includes, but is not limited to, implementations such as a network host, a single network server, a set of multiple network servers, or a cloud-computing-based set of computers. Here, the cloud is composed of a large number of hosts or network servers based on cloud computing (Cloud Computing), where cloud computing is a form of distributed computing: a virtual computer consisting of a set of loosely coupled computers.
In the application testing method provided by the embodiments of the present application, at least one training process may be executed, and a test result is obtained from the operation path information required to reach the target page in the training process. The test result is the operation path from the initial page to the target page in the training process, that is, the jump relationships between the pages and the actions performed on specific elements of each page to realize those jumps.
Assume the target page is PAGE4 and the initial page is PAGE1. One operation path from PAGE1 to PAGE4 is: PAGE1 (next step) → PAGE2 (allow) → PAGE3 (login). Here, PAGE1 (next step) is one piece of operation path information, meaning that clicking "next step" on PAGE1 jumps to PAGE2: PAGE1 is a page, clicking is an action, and "next step" is an element on PAGE1. Then, clicking "allow" on PAGE2 jumps to PAGE3, and clicking "login" on PAGE3 jumps to the target page PAGE4. After the actually required target page is set, the operation path from a given initial page to the target page can be determined based on the operation path information explored when the target page is reached.
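For illustration only, such an operation path can be thought of as an ordered list of (page, element, action) steps. The following sketch shows one possible representation in Python; the names are purely illustrative and are not part of the scheme itself.

```python
# A minimal, hypothetical representation of the operation path
# PAGE1 (next step) -> PAGE2 (allow) -> PAGE3 (login) -> PAGE4:
# each step records the page, the element operated on, and the action performed.
operation_path = [
    {"page": "PAGE1", "element": "next step", "action": "click"},
    {"page": "PAGE2", "element": "allow", "action": "click"},
    {"page": "PAGE3", "element": "login", "action": "click"},
]
target_page = "PAGE4"  # the page reached after the final step
```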
Fig. 1 shows one training process in an embodiment of the present application. Each training process includes the following processing steps:
Step S101: based on the reward value table, choose the operation path information on the current page.
The reward value table records the reward value corresponding to performing an action on an element of a page. In the following, the reward value table is denoted Q_table, a page is denoted state, an element on a page is denoted element, an action is denoted action, and a reward value is denoted r.
An element may be any content on a page that can be operated by the user, usually a label or button containing text or an image. Fig. 2 shows a schematic diagram of a page in an application program; its elements include the content corresponding to boxes 201 to 205, namely the five elements Language, TM, appXXXX, account login, and new user registration. In some embodiments of the present application, OCR (Optical Character Recognition) technology can be used to detect the current page, so that the text content recognized on the page serves as the elements of that page. An action refers to the action corresponding to a type of user operation, such as click, slide, long press, or back.
In practice, each distinct page can usually be identified by the elements it contains. For example, after the elements of the current page are recognized by OCR, the set of all elements on the current page, (element1, element2, element3, ...), is obtained, and the recognition results of the elements are then spliced into the identification information of the page. Taking the page shown in Fig. 2 as an example, the recognized set of elements is (Language, TM, appXXXX, account login, new user registration), which can be spliced into "Language_TM_appXXXX_account login_new user registration" and used as the identification information of the state, such as its name or index.
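A minimal sketch of this splicing step is given below; the function name is illustrative and not taken from the original scheme.

```python
def page_identifier(element_texts):
    """Splice the OCR-recognized element texts into the identification information of the page (state)."""
    return "_".join(element_texts)

# Elements recognized on the page shown in Fig. 2:
elements = ["Language", "TM", "appXXXX", "account login", "new user registration"]
state = page_identifier(elements)
# state == "Language_TM_appXXXX_account login_new user registration"
```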
In some embodiments of the present application, Q_table can be represented by a three-dimensional data structure, that is, r is determined by the three parameters state, element, and action. In the embodiments of the present application, Q_table may take the form of a hash table whose key is the state and whose value is the two-dimensional matrix formed by element and action. For example, in one Q_table of an embodiment of the present application, the two-dimensional matrix of element and action under the state shown in Fig. 2 is as shown in Table 1:
                                 | Action1 (click) | Action2 (slide) | Action3 (back) | ...
element1 (Language)              | r[0,0]          | r[0,1]          | r[0,2]         | ...
element2 (appXXXX)               | r[1,0]          | r[1,1]          | r[1,2]         | ...
element3 (new user registration) | r[2,0]          | r[2,1]          | r[2,2]         | ...
...                              | ...             | ...             | ...            | ...
Table 1: Q_table
Thus, under a given state, the reward value r corresponding to executing a given action on a given element can be denoted Q_table[state].loc(element, action).
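The following is a minimal sketch of this data structure in Python. pandas is used here only as one convenient way to obtain the Q_table[state].loc(element, action) style lookup; the choice of library and the concrete values are assumptions of the sketch, not requirements of the scheme.

```python
import pandas as pd

actions = ["click", "slide", "back"]
elements = ["Language", "appXXXX", "new user registration"]

# Q_table: a hash table (dict) keyed by state, whose value is the
# two-dimensional matrix of reward values r indexed by element and action.
Q_table = {}
state = "Language_TM_appXXXX_account login_new user registration"
Q_table[state] = pd.DataFrame(0.0, index=elements, columns=actions)  # r initialized to 0

# Reward value r for executing an action on an element under this state:
r = Q_table[state].loc["new user registration", "click"]
```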
The operation path information can be denoted Action and indicates the target action to be performed on the target element of the page. In some embodiments of the present application, it can likewise be represented by a three-dimensional data structure whose three dimensions are state, element, and action. For example, Table 2 shows the two-dimensional matrix formed by element and action under the state shown in Fig. 2:
                                 | Action1 (click) | Action2 (slide) | Action3 (back) | ...
element1 (Language)              | Action[0,0]     | Action[0,1]     | Action[0,2]    | ...
element2 (appXXXX)               | Action[1,0]     | Action[1,1]     | Action[1,2]    | ...
element3 (new user registration) | Action[2,0]     | Action[2,1]     | Action[2,2]    | ...
...                              | ...             | ...             | ...            | ...
Table 2: Action
When the operation path information on the current page is chosen based on the reward value table, the probability of choosing a piece of operation path information can be set to be positively correlated with the reward value of the target action that this operation path information performs on the target element of the current page. For example, the larger the reward value r, the higher the probability that the corresponding operation path information Action is chosen. For the state corresponding to Tables 1 and 2, if Q_table[state].loc(element3 (new user registration), Action1 (click)), that is, the value of r[2,0], is the largest, then the probability that the corresponding operation path information Action[2,0] is chosen is higher; in other words, when a decision is made on the page shown in Fig. 2, clicking the element "new user registration" has the highest probability.
In some embodiments of the present application, the operation path information can be chosen as follows. First, a random number is generated within a preset numerical range, for example a random number random_number between 0 and 1. This random number is then compared with a preset greediness degree greedy, and the operation path information is chosen in different ways according to the comparison result. The greediness degree is a preset value whose range is consistent with the range of the generated random number, for example between 0 and 1 in this embodiment. When the random number is greater than or equal to greedy, the largest first reward value of the current page is chosen from the reward value table, and the corresponding operation path information is determined from this first reward value; when the random number is less than greedy, a second reward value of the current page is chosen at random from the reward value table, and the corresponding operation path information is determined from this second reward value.
For example, if the greediness degree is set to 0.5 and the generated random number is 0.7, then random_number > greedy, so the largest first reward value of the current page is chosen from the reward value table. Taking Table 2 as an example, if the largest first reward value is r[2,0], the corresponding operation path information Action[2,0] is chosen, that is, the element "new user registration" on the page shown in Fig. 2 is clicked. In practice, if the reward value table does not contain a unique largest reward value under a given state, one of the several largest reward values can be chosen at random; moreover, in the initial state the reward values in Q_table may all be initialized to 0, in which case no unique largest reward value exists and random selection can likewise be used. If the greediness degree is set to 0.5 and the generated random number is 0.3, then random_number < greedy, so a second reward value of the current page is chosen at random from the reward value table. Taking Table 2 as an example, if the randomly chosen second reward value is r[0,0], the corresponding operation path information Action[0,0] is chosen, that is, the element "Language" on the page shown in Fig. 2 is clicked.
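A minimal sketch of this selection rule follows, assuming the per-state reward matrix is a pandas DataFrame as in the earlier sketch; the function and variable names are illustrative.

```python
import random
import pandas as pd

def choose_action(q_state: pd.DataFrame, greedy: float = 0.5):
    """Choose (element, action) for the current state: if the generated random number is
    greater than or equal to the greediness degree, take the cell with the largest reward
    value (ties, e.g. the all-zero initial state, are broken at random); otherwise take a
    random cell."""
    random_number = random.random()  # random number between 0 and 1
    if random_number >= greedy:
        max_r = q_state.values.max()
        candidates = [(e, a) for e in q_state.index for a in q_state.columns
                      if q_state.loc[e, a] == max_r]
    else:
        candidates = [(e, a) for e in q_state.index for a in q_state.columns]
    return random.choice(candidates)
```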
Step S102: perform the target action on the target element of the current page, recognize the page after execution, and obtain the page change state. In practice, after the target action is performed on the target element of the current page, the page may change, for example jump to another page or exit the application, or it may remain unchanged. Different cases correspond to different page change states.
To determine the page change state, the page after execution needs to be recognized and then compared with the page before execution. Both pages can use the elements they contain as their identification information; whether the page has changed is judged by comparing whether the contained elements are identical, and the page change state is determined accordingly.
In some embodiments of the present application, OCR can likewise be used to recognize the page after execution, and the recognized elements are spliced into the identification information of that page for comparison. For example, denote the page before execution state and the page after execution state_. If the identification information of state is "Language_TM_appXXXX_account login_new user registration" and the identification information of state_ is also "Language_TM_appXXXX_account login_new user registration", the page can be considered unchanged; conversely, if the identification information differs, the page can be considered changed.
In practice, two different pages of an application may have identical text content but different layouts (for example, different text positions, sizes, or colors). Therefore, to further improve accuracy, in other embodiments of the present application, additional information such as the position, size, and color of the text can be recognized in addition to the text content when recognizing the elements, and spliced into the identification information of the state as supplementary features of the elements, thereby improving the accuracy of judging the page change state.
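A minimal sketch of building the identification information with these supplementary features and comparing pages is given below; the field names position, size, and color are illustrative and depend on what the OCR step actually returns.

```python
def page_identifier_with_features(ocr_elements):
    """Splice each element's text together with optional supplementary features
    (position, size, color) into the page's identification information."""
    parts = []
    for e in ocr_elements:
        extras = [str(e[key]) for key in ("position", "size", "color") if key in e]
        parts.append("|".join([e["text"], *extras]))
    return "_".join(parts)

def page_changed(state_id: str, state_new_id: str) -> bool:
    """The page is considered changed when the identification information differs."""
    return state_id != state_new_id
```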
Step S103: update the reward value corresponding to the operation path information in the reward value table according to the page change state.
When the reward value corresponding to the operation path information in the reward value table is updated, an award update value can first be determined from the page change state, and the reward value corresponding to the operation path information is then updated according to the award update value. Since the award update value affects the subsequent reward value table, and the reward value table is the decision basis for choosing operation path information, the selection is guided by the reward values. Based on the principle of reinforcement learning, the correct Action (operation path information) that reaches the target page needs to be reinforced; therefore, in the embodiments of the present application, the page change state "the page changes and the target page is reached" can be given the highest award update value, thereby reinforcing the correct Action.
In some embodiments of the present application, besides the aforementioned "the page changes and the target page is not reached", the page change state may also include "the page changes and the target page is reached", "the application is exited", and "the page does not change". The corresponding award update values, ordered from high to low, are: the page changes and the target page is reached > the page changes and the target page is not reached > the application is exited > the page does not change.
For example, the award update value R in this embodiment is set as follows: the page changes and the target page is reached, i.e. state_ = target page, then R = 5; the page changes and the target page is not reached, i.e. state_ != state and state_ ≠ target page, then R = 0; the application is exited, i.e. state_ = an interface outside the app, then R = -5; the page does not change, i.e. state_ = state, then R = -100.
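A minimal sketch of this mapping from the page change state to the award update value R follows, using the example values of this embodiment; the helper name and the in_app flag are illustrative.

```python
def reward_update_value(state_id, state_new_id, target_id, in_app: bool) -> int:
    """Map the page change state to the award update value R (5 / 0 / -5 / -100
    are the example values used in this embodiment)."""
    if in_app and state_new_id == target_id:
        return 5       # the page changed and the target page was reached
    if not in_app:
        return -5      # the application was exited
    if state_new_id == state_id:
        return -100    # the page did not change
    return 0           # the page changed but the target page was not reached
```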
So that the update of the reward value table reflects the reinforcement of correct behavior more accurately, in the embodiments of the present application, when the reward value corresponding to the operation path information is updated according to the award update value, a change value can first be determined from the award update value, the reward value corresponding to the operation path information, and the largest reward value of the page recognized after the operation path information is executed. The reward value corresponding to the operation path information is denoted q_predict, that is, the r value in Q_table determined by the three dimensions of the page state before the operation path information was executed, the target element, and the target action. The largest reward value of the page recognized after the operation path information is executed is denoted q_target, that is, the largest r value in Q_table under the page state_ after the operation path information was executed. The change value delta can be calculated from R, q_predict, and q_target by the following formula:
delta = R + gamma · (q_target - q_predict)
where gamma is a softening degree, a preset value between 0 and 1.
After the change value is calculated, the reward value corresponding to the operation path information in the reward value table can be updated based on the change value. When r is updated, the following calculation can be used:
Q_table[state].loc(element, action) = Q_table[state].loc(element, action) + lr · delta
where lr is the learning rate, which can be determined according to the required test speed: the higher the learning rate, the faster the reward value table is updated.
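A minimal sketch of this update follows, assuming Q_table is the dict-of-DataFrames structure from the earlier sketch; the gamma and lr values are only examples.

```python
def update_reward_value(Q_table, state, element, action, R, q_target,
                        gamma: float = 0.3, lr: float = 0.5):
    """Update r for (state, element, action): q_predict is the currently stored reward
    value, q_target the largest reward value of the page recognized after execution."""
    q_predict = Q_table[state].loc[element, action]
    delta = R + gamma * (q_target - q_predict)
    Q_table[state].loc[element, action] = q_predict + lr * delta
    return Q_table[state].loc[element, action]
```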
Step S104: judge whether the target page has been reached or the application has been exited; if not, repeat the above processing steps; if so, end this training process.
Since the page at this point is already the page obtained after the operation path information was executed, the judgment is made according to the state of that page. If the judgment result is no, i.e. neither the target page has been reached nor the application exited, processing continues with the next round from step S101. If the judgment result is yes, i.e. the target page has been reached or the application has been exited, this training process ends, and according to the preset number of training iterations either the next training process is started or the test is ended.
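Putting steps S101 to S104 together, one training process could look like the following sketch. observe_page, choose_action, and execute stand in for the OCR-based page recognition, the reward-table-based selection, and the action execution; they and reward_update_value (from the earlier sketch) are assumptions of this sketch, not a definitive implementation.

```python
def run_training_process(Q_table, observe_page, choose_action, execute, target_id,
                         gamma: float = 0.3, lr: float = 0.5):
    """One training process (epoch): select, execute, update the reward value table,
    and stop once the target page is reached or the application is exited."""
    state, _ = observe_page()
    while True:
        element, action = choose_action(Q_table[state])        # step S101
        execute(element, action)                               # step S102: act on the page
        state_new, in_app = observe_page()                     # step S102: recognize the new page
        R = reward_update_value(state, state_new, target_id, in_app)
        if state_new not in Q_table:
            # a newly seen page gets its own element x action matrix, initialized to 0
            # (for brevity this sketch reuses the shape of the previous page's matrix)
            Q_table[state_new] = Q_table[state] * 0.0
        q_target = Q_table[state_new].values.max()
        q_predict = Q_table[state].loc[element, action]
        delta = R + gamma * (q_target - q_predict)             # step S103
        Q_table[state].loc[element, action] = q_predict + lr * delta
        if state_new == target_id or not in_app:               # step S104
            break
        state = state_new
```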
Fig. 3 shows the updated content of Q_table after 10 training processes (epochs) are executed using the scheme provided by the embodiments of the present application, where action includes click and back, the text segments represent the elements on each page, axis indicates the coordinate position of an element, the 8 pages are PAGE1 to PAGE8, and the target page is set to PAGE8. From the operation path information required to reach the target page in the training processes, one operation path from the initial page PAGE1 to the target page PAGE8 can be determined as: PAGE1 (next step) → PAGE2 (allow) → PAGE3 (allow) → PAGE3 (refuse) → PAGE7 (account login). When a new epoch is started, it may include the processing flow shown in Fig. 4:
Step S401: obtain the current state of the application by OCR. For example, when the application under test starts, the initial page is PAGE1, on which the following elements can be recognized: "storage space", "this setting will not touch or read this", "phone permission, to determine my phone number", "appXXXX will not dial other phones", "the above permissions will not read contact messages", "next step". By splicing these elements, the identification information of state (PAGE1) is obtained as: storage space_this setting will not touch or read this_phone permission, to determine my phone number_appXXXX will not dial other phones_the above permissions will not read contact messages_next step.
Step S402: generate a random number and choose the operation path information according to the greediness degree. In this embodiment, the generated random number is 0.8 and greedy is set to 0.5, so the largest reward value under state (PAGE1) is determined, namely r = 0.038801 corresponding to element (next step) and action (click); the chosen operation path information is therefore: click "next step" on page PAGE1. If instead the random number were less than 0.5, the operation path information would be chosen at random, so that new operation paths that may reach the target page can be explored.
Step S403: after executing according to the operation path information, recognize state_ after execution by OCR and determine the page change state. At this point state_ should be page PAGE2, so the following elements can be recognized: "refuse", "allow"; by splicing these elements, the identification information of state_ is obtained as: refuse_allow. It can thus be determined that state_ != state and that the target page PAGE8 has not been reached.
Step S404: update Q_table according to the determined page change state. The award update value R in this embodiment is set as follows: the page changes and the target page is reached, i.e. state_ = target page, then R = 5; the page changes and the target page is not reached, i.e. state_ != state and state_ ≠ target page, then R = 0; the application is exited, i.e. state_ = an interface outside the app, then R = -5; the page does not change, i.e. state_ = state, then R = -100. Here, the page change state gives R = 0. From Q_table, q_target = 0.048501 and q_predict = 0.038801; with the softening degree gamma = 0.3, the change value is calculated as:
delta = R + gamma · (q_target - q_predict) = 0 + 0.3 × (0.048501 - 0.038801) = 0.00291
With the learning rate set to 0.5, the reward value r corresponding to element (next step) and action (click) under state (PAGE1) can be calculated as:
r = Q_table[state].loc(element, action) + lr · delta = 0.038801 + 0.5 × 0.00291 = 0.041711
Step S405: judge whether the target page has been reached or the application has been exited. Since the current page is PAGE2 rather than the target page PAGE8, the judgment result is False; the current state is then updated to state_, and processing returns to step S402.
Based on the same inventive concept, an embodiment of the present application further provides a device for testing an application program. The method corresponding to this device is the application testing method of the preceding embodiments, and its principle of solving the problem is similar to that method.
The application testing device provided by the embodiments of the present application can use reinforcement learning, so that a continuously updated reward value table influences the subsequent selection of operation path information and reinforces the probability of choosing operation paths that reach the target page. Operation paths that reach the target page can therefore be explored automatically and more efficiently.
In practice, the device may be a user device, a network device, or a device formed by integrating a user device and a network device over a network; it may also be a program running on such devices. The user device includes, but is not limited to, terminal devices such as computers, mobile phones, and tablet computers; the network device includes, but is not limited to, implementations such as a network host, a single network server, a set of multiple network servers, or a cloud-computing-based set of computers. Here, the cloud is composed of a large number of hosts or network servers based on cloud computing (Cloud Computing), where cloud computing is a form of distributed computing: a virtual computer consisting of a set of loosely coupled computers.
The application testing device provided by the embodiments of the present application includes a training module. The training module is configured to execute at least one training process and to obtain a test result from the operation path information required to reach the target page in the training process. The test result is the operation path from the initial page to the target page in the training process, that is, the jump relationships between the pages and the actions performed on specific elements of each page to realize those jumps.
Assume the target page is PAGE4 and the initial page is PAGE1. One operation path from PAGE1 to PAGE4 is: PAGE1 (next step) → PAGE2 (allow) → PAGE3 (login). Here, PAGE1 (next step) is one piece of operation path information, meaning that clicking "next step" on PAGE1 jumps to PAGE2: PAGE1 is a page, clicking is an action, and "next step" is an element on PAGE1. Then, clicking "allow" on PAGE2 jumps to PAGE3, and clicking "login" on PAGE3 jumps to the target page PAGE4. After the actually required target page is set, the operation path from a given initial page to the target page can be determined based on the operation path information explored when the target page is reached.
The training module includes at least a recognition unit, a decision unit, an execution processing unit, and a judging unit, which are respectively used to execute the processing steps shown in Fig. 1 when a training process is carried out:
Step S101: the decision unit chooses, based on the reward value table, the operation path information on the current page.
The reward value table records the reward value corresponding to performing an action on an element of a page. As before, the reward value table is denoted Q_table, a page is denoted state, an element on a page is denoted element, an action is denoted action, and a reward value is denoted r.
An element may be any content on a page that can be operated by the user, usually a label or button containing text or an image. Fig. 2 shows a schematic diagram of a page in an application program; its elements include the content corresponding to boxes 201 to 205, namely the five elements Language, TM, appXXXX, account login, and new user registration. In some embodiments of the present application, the recognition unit can use OCR (Optical Character Recognition) technology to detect the current page, so that the text content recognized on the page serves as the elements of that page. An action refers to the action corresponding to a type of user operation, such as click, slide, long press, or back.
In practice, each distinct page can usually be identified by the elements it contains. For example, after the elements of the current page are recognized by OCR, the set of all elements on the current page, (element1, element2, element3, ...), is obtained, and the recognition results of the elements are then spliced into the identification information of the page. Taking the page shown in Fig. 2 as an example, the recognized set of elements is (Language, TM, appXXXX, account login, new user registration), which can be spliced into "Language_TM_appXXXX_account login_new user registration" and used as the identification information of the state, such as its name or index.
In some embodiments of the present application, Q_table can be represented by a three-dimensional data structure, that is, r is determined by the three parameters state, element, and action. In the embodiments of the present application, Q_table may take the form of a hash table whose key is the state and whose value is the two-dimensional matrix formed by element and action. For example, in one Q_table of an embodiment of the present application, the two-dimensional matrix of element and action under the state shown in Fig. 2 is as shown in Table 1:
                                 | Action1 (click) | Action2 (slide) | Action3 (back) | ...
element1 (Language)              | r[0,0]          | r[0,1]          | r[0,2]         | ...
element2 (appXXXX)               | r[1,0]          | r[1,1]          | r[1,2]         | ...
element3 (new user registration) | r[2,0]          | r[2,1]          | r[2,2]         | ...
...                              | ...             | ...             | ...            | ...
Table 1: Q_table
Thus, under a given state, the reward value r corresponding to executing a given action on a given element can be denoted Q_table[state].loc(element, action).
The operation path information can be denoted Action and indicates the target action to be performed on the target element of the page. In some embodiments of the present application, it can likewise be represented by a three-dimensional data structure whose three dimensions are state, element, and action. For example, Table 2 shows the two-dimensional matrix formed by element and action under the state shown in Fig. 2:
                                 | Action1 (click) | Action2 (slide) | Action3 (back) | ...
element1 (Language)              | Action[0,0]     | Action[0,1]     | Action[0,2]    | ...
element2 (appXXXX)               | Action[1,0]     | Action[1,1]     | Action[1,2]    | ...
element3 (new user registration) | Action[2,0]     | Action[2,1]     | Action[2,2]    | ...
...                              | ...             | ...             | ...            | ...
Table 2: Action
When the operation path information on the current page is chosen based on the reward value table, the probability of choosing a piece of operation path information can be set to be positively correlated with the reward value of the target action that this operation path information performs on the target element of the current page. For example, the larger the reward value r, the higher the probability that the corresponding operation path information Action is chosen. For the state corresponding to Tables 1 and 2, if Q_table[state].loc(element3 (new user registration), Action1 (click)), that is, the value of r[2,0], is the largest, then the probability that the corresponding operation path information Action[2,0] is chosen is higher; in other words, when a decision is made on the page shown in Fig. 2, clicking the element "new user registration" has the highest probability.
In some embodiments of the present application, the decision unit can choose the operation path information as follows. First, a random number is generated within a preset numerical range, for example a random number random_number between 0 and 1. This random number is then compared with a preset greediness degree greedy, and the operation path information is chosen in different ways according to the comparison result. The greediness degree is a preset value whose range is consistent with the range of the generated random number, for example between 0 and 1 in this embodiment. When random_number is greater than or equal to greedy, the largest first reward value of the current page is chosen from the reward value table, and the corresponding operation path information is determined from this first reward value; when random_number is less than greedy, a second reward value of the current page is chosen at random from the reward value table, and the corresponding operation path information is determined from this second reward value.
For example, if the greediness degree is set to 0.5 and the generated random number is 0.7, then random_number > greedy, so the largest first reward value of the current page is chosen from the reward value table. Taking Table 2 as an example, if the largest first reward value is r[2,0], the corresponding operation path information Action[2,0] is chosen, that is, the element "new user registration" on the page shown in Fig. 2 is clicked. In practice, if the reward value table does not contain a unique largest reward value under a given state, one of the several largest reward values can be chosen at random; moreover, in the initial state the reward values in Q_table may all be initialized to 0, in which case no unique largest reward value exists and random selection can likewise be used. If the greediness degree is set to 0.5 and the generated random number is 0.3, then random_number < greedy, so a second reward value of the current page is chosen at random from the reward value table. Taking Table 2 as an example, if the randomly chosen second reward value is r[0,0], the corresponding operation path information Action[0,0] is chosen, that is, the element "Language" on the page shown in Fig. 2 is clicked.
Step S102: the execution processing unit performs the target action on the target element of the current page, and the recognition unit recognizes the page after execution and obtains the page change state. In practice, after the target action is performed on the target element of the current page, the page may change, for example jump to another page or exit the application, or it may remain unchanged. Different cases correspond to different page change states.
To determine the page change state, the page after execution needs to be recognized and then compared with the page before execution. Both pages can use the elements they contain as their identification information; whether the page has changed is judged by comparing whether the contained elements are identical, and the page change state is determined accordingly.
In some embodiments of the present application, the recognition unit can likewise use OCR to recognize the page after execution, and the recognized elements are spliced into the identification information of that page for comparison. For example, denote the page before execution state and the page after execution state_. If the identification information of state is "Language_TM_appXXXX_account login_new user registration" and the identification information of state_ is also "Language_TM_appXXXX_account login_new user registration", the page can be considered unchanged; conversely, if the identification information differs, the page can be considered changed.
In practice, two different pages of an application may have identical text content but different layouts (for example, different text positions, sizes, or colors). Therefore, to further improve accuracy, in other embodiments of the present application, additional information such as the position, size, and color of the text can be recognized in addition to the text content when recognizing the elements, and spliced into the identification information of the state as supplementary features of the elements, thereby improving the accuracy of judging the page change state.
Step S103: the execution processing unit updates the reward value corresponding to the operation path information in the reward value table according to the page change state.
When the reward value corresponding to the operation path information in the reward value table is updated, the execution processing unit can first determine an award update value from the page change state and then update the reward value corresponding to the operation path information according to the award update value. Since the award update value affects the subsequent reward value table, and the reward value table is the decision basis for choosing operation path information, the selection is guided by the reward values. Based on the principle of reinforcement learning, the correct Action (operation path information) that reaches the target page needs to be reinforced; therefore, in the embodiments of the present application, the page change state "the page changes and the target page is reached" can be given the highest award update value, thereby reinforcing the correct Action.
In some embodiments of the present application, besides the aforementioned "the page changes and the target page is not reached", the page change state may also include "the page changes and the target page is reached", "the application is exited", and "the page does not change". The corresponding award update values, ordered from high to low, are: the page changes and the target page is reached > the page changes and the target page is not reached > the application is exited > the page does not change.
For example, the award update value R in this embodiment is set as follows: the page changes and the target page is reached, i.e. state_ = target page, then R = 5; the page changes and the target page is not reached, i.e. state_ != state and state_ ≠ target page, then R = 0; the application is exited, i.e. state_ = an interface outside the app, then R = -5; the page does not change, i.e. state_ = state, then R = -100.
So that the update of the reward value table reflects the reinforcement of correct behavior more accurately, in the embodiments of the present application, when the reward value corresponding to the operation path information is updated according to the award update value, a change value can first be determined from the award update value, the reward value corresponding to the operation path information, and the largest reward value of the page recognized after the operation path information is executed. The reward value corresponding to the operation path information is denoted q_predict, that is, the r value in Q_table determined by the three dimensions of the page state before the operation path information was executed, the target element, and the target action. The largest reward value of the page recognized after the operation path information is executed is denoted q_target, that is, the largest r value in Q_table under the page state_ after the operation path information was executed. The change value delta can be calculated from R, q_predict, and q_target by the following formula:
delta = R + gamma · (q_target - q_predict)
where gamma is a softening degree, a preset value between 0 and 1.
After the change value is calculated, the reward value corresponding to the operation path information in the reward value table can be updated based on the change value. When r is updated, the following calculation can be used:
Q_table[state].loc(element, action) = Q_table[state].loc(element, action) + lr · delta
where lr is the learning rate, which can be determined according to the required test speed: the higher the learning rate, the faster the reward value table is updated.
Step S104: the judging unit judges whether the target page has been reached or the application has been exited; if not, the above processing steps are repeated; if so, this training process ends.
Since the page at this point is already the page obtained after the operation path information was executed, the judgment is made according to the state of that page. If the judgment result is no, i.e. neither the target page has been reached nor the application exited, processing continues with the next round from step S101. If the judgment result is yes, i.e. the target page has been reached or the application has been exited, this training process ends, and according to the preset number of training iterations either the next training process is started or the test is ended.
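As an illustrative sketch only, the division of labour between these units could be organized as follows; all class and method names are hypothetical, and the concrete unit implementations are assumed to exist elsewhere.

```python
class TrainingModule:
    """Training module composed of a recognition unit, a decision unit,
    an execution processing unit and a judging unit (steps S101-S104)."""

    def __init__(self, recognition_unit, decision_unit, execution_unit, judging_unit):
        self.recognition_unit = recognition_unit   # OCR-based page recognition
        self.decision_unit = decision_unit         # selection based on the reward value table (S101)
        self.execution_unit = execution_unit       # action execution and reward value update (S102/S103)
        self.judging_unit = judging_unit           # target-page / exit check (S104)

    def run_training_process(self, Q_table, target_id):
        state = self.recognition_unit.current_page()
        while True:
            element, action = self.decision_unit.choose(Q_table, state)        # S101
            self.execution_unit.perform(element, action)                       # S102
            state_new = self.recognition_unit.current_page()
            self.execution_unit.update_reward(Q_table, state, element, action,
                                              state_new, target_id)            # S103
            if self.judging_unit.done(state_new, target_id):                   # S104
                break
            state = state_new
```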
In summary, in the scheme provided by the embodiments of the present application, at least one training process is executed and a test result is obtained from the operation path information required to reach the target page in the training process, so that the operation paths between the pages of the application can be obtained. In each training process, reinforcement learning is used: the operation path information on the current page is first chosen based on the reward value table; the target action is then performed according to the operation path information, the page after execution is recognized, and the page change state is obtained; the reward value corresponding to the operation path information in the reward value table is updated according to the page change state; and these processing steps are repeated until the target page is reached or the application is exited, completing one training process. The updated reward value table can thus influence the subsequent selection of operation path information and reinforce the probability of choosing operation paths that reach the target page, so operation paths that reach the target page can be explored automatically and more efficiently.
In addition, part of the present application may be implemented as a computer program product, for example computer program instructions which, when executed by a computer, can invoke or provide the methods and/or technical solutions of the present application through the operation of that computer. The program instructions that invoke the methods of the present application may be stored in a fixed or removable recording medium, and/or transmitted via broadcast or a data stream in another signal-bearing medium, and/or stored in the working memory of a computer device that runs according to the program instructions. Here, some embodiments of the present application include a computing device as shown in Fig. 5, which comprises one or more memories 510 storing computer-readable instructions and a processor 520 for executing the computer-readable instructions, wherein, when the computer-readable instructions are executed by the processor, the device carries out the methods and/or technical solutions of the foregoing embodiments of the present application.
In addition, some embodiments of the present application further provide a computer-readable medium on which computer program instructions are stored, the computer-readable instructions being executable by a processor to implement the methods and/or technical solutions of the foregoing embodiments of the present application.
It should be noted that the present application may be implemented in software and/or a combination of software and hardware, for example using an application-specific integrated circuit (ASIC), a general-purpose computer, or any other similar hardware device. In some embodiments, the software program of the present application may be executed by a processor to realize the steps or functions described above. Likewise, the software program of the present application (including related data structures) may be stored in a computer-readable recording medium, for example a RAM memory, a magnetic or optical drive, a floppy disk, or a similar device. In addition, some steps or functions of the present application may be implemented in hardware, for example as a circuit that cooperates with a processor to execute the individual steps or functions.
It is obvious to those skilled in the art that the present application is not limited to the details of the above exemplary embodiments, and that the present application can be realized in other specific forms without departing from its spirit or essential characteristics. Therefore, from whichever point of view, the embodiments should be regarded as exemplary and non-restrictive, and the scope of the present application is defined by the appended claims rather than by the above description; it is therefore intended that all changes falling within the meaning and range of equivalents of the claims be embraced in the present application. No reference sign in the claims should be construed as limiting the claim concerned. Furthermore, it is clear that the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices stated in a device claim may also be implemented by a single unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.