Specific embodiments
The present application is described in further detail below with reference to the accompanying drawings.
In a typical configuration of the present application, the terminal and the devices of the service network each include one or more processors (CPUs), an input/output interface, a network interface, and a memory.
The memory may include non-permanent memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
An embodiment of the present application provides a method for testing an application program. The method uses reinforcement learning, so that a continuously updated reward value table influences the subsequent selection of operation path information and reinforces the probability of choosing operation paths that reach the target page. Operation paths that reach the target page can therefore be explored automatically and more efficiently.
In practice, the executing subject of the method may be a user device, a network device, or a device formed by integrating a user device and a network device over a network; it may also be a program running on such devices. The user device includes, but is not limited to, terminal devices such as computers, mobile phones, and tablet computers; the network device includes, but is not limited to, implementations such as a network host, a single network server, a set of multiple network servers, or a cloud-computing-based set of computers. Here, the cloud is composed of a large number of hosts or network servers based on cloud computing (Cloud Computing), where cloud computing is a form of distributed computing: a virtual computer consisting of a set of loosely coupled computers.
In the application testing method provided by the embodiments of the present application, at least one training process may be executed, and a test result is obtained from the operation path information required to reach the target page in the training process. The test result is the operation path from the initial page to the target page in the training process, that is, the jump relationships between the pages and the actions performed on specific elements of each page to realize those jumps.
Assume the target page is PAGE4 and the initial page is PAGE1. One operation path from PAGE1 to PAGE4 is: PAGE1 (next step) → PAGE2 (allow) → PAGE3 (login). Here, PAGE1 (next step) is one piece of operation path information, meaning that clicking "next step" on PAGE1 jumps to PAGE2: PAGE1 is a page, clicking is an action, and "next step" is an element on PAGE1. Then, clicking "allow" on PAGE2 jumps to PAGE3, and clicking "login" on PAGE3 jumps to the target page PAGE4. After the actually required target page is set, the operation path from a given initial page to the target page can be determined based on the operation path information explored when the target page is reached.
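For illustration only, such an operation path can be thought of as an ordered list of (page, element, action) steps. The following sketch shows one possible representation in Python; the names are purely illustrative and are not part of the scheme itself.

```python
# A minimal, hypothetical representation of the operation path
# PAGE1 (next step) -> PAGE2 (allow) -> PAGE3 (login) -> PAGE4:
# each step records the page, the element operated on, and the action performed.
operation_path = [
    {"page": "PAGE1", "element": "next step", "action": "click"},
    {"page": "PAGE2", "element": "allow", "action": "click"},
    {"page": "PAGE3", "element": "login", "action": "click"},
]
target_page = "PAGE4"  # the page reached after the final step
```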
Fig. 1 shows one training process in an embodiment of the present application. Each training process includes the following processing steps:
Step S101: based on the reward value table, choose the operation path information on the current page.
The reward value table records the reward value corresponding to performing an action on an element of a page. In the following, the reward value table is denoted Q_table, a page is denoted state, an element on a page is denoted element, an action is denoted action, and a reward value is denoted r.
An element may be any content on a page that can be operated by the user, usually a label or button containing text or an image. Fig. 2 shows a schematic diagram of a page in an application program; its elements include the content corresponding to boxes 201 to 205, namely the five elements Language, TM, appXXXX, account login, and new user registration. In some embodiments of the present application, OCR (Optical Character Recognition) technology can be used to detect the current page, so that the text content recognized on the page serves as the elements of that page. An action refers to the action corresponding to a type of user operation, such as click, slide, long press, or back.
In practice, each distinct page can usually be identified by the elements it contains. For example, after the elements of the current page are recognized by OCR, the set of all elements on the current page, (element1, element2, element3, ...), is obtained, and the recognition results of the elements are then spliced into the identification information of the page. Taking the page shown in Fig. 2 as an example, the recognized set of elements is (Language, TM, appXXXX, account login, new user registration), which can be spliced into "Language_TM_appXXXX_account login_new user registration" and used as the identification information of the state, such as its name or index.
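A minimal sketch of this splicing step is given below; the function name is illustrative and not taken from the original scheme.

```python
def page_identifier(element_texts):
    """Splice the OCR-recognized element texts into the identification information of the page (state)."""
    return "_".join(element_texts)

# Elements recognized on the page shown in Fig. 2:
elements = ["Language", "TM", "appXXXX", "account login", "new user registration"]
state = page_identifier(elements)
# state == "Language_TM_appXXXX_account login_new user registration"
```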
In some embodiments of the present application, Q_table can be represented by a three-dimensional data structure, that is, r is determined by the three parameters state, element, and action. In the embodiments of the present application, Q_table may take the form of a hash table whose key is the state and whose value is the two-dimensional matrix formed by element and action. For example, in one Q_table of an embodiment of the present application, the two-dimensional matrix of element and action under the state shown in Fig. 2 is as shown in Table 1:
                                 | Action1 (click) | Action2 (slide) | Action3 (back) | ...
element1 (Language)              | r[0,0]          | r[0,1]          | r[0,2]         | ...
element2 (appXXXX)               | r[1,0]          | r[1,1]          | r[1,2]         | ...
element3 (new user registration) | r[2,0]          | r[2,1]          | r[2,2]         | ...
...                              | ...             | ...             | ...            | ...
Table 1: Q_table
Thus, under a given state, the reward value r corresponding to executing a given action on a given element can be denoted Q_table[state].loc(element, action).
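The following is a minimal sketch of this data structure in Python. pandas is used here only as one convenient way to obtain the Q_table[state].loc(element, action) style lookup; the choice of library and the concrete values are assumptions of the sketch, not requirements of the scheme.

```python
import pandas as pd

actions = ["click", "slide", "back"]
elements = ["Language", "appXXXX", "new user registration"]

# Q_table: a hash table (dict) keyed by state, whose value is the
# two-dimensional matrix of reward values r indexed by element and action.
Q_table = {}
state = "Language_TM_appXXXX_account login_new user registration"
Q_table[state] = pd.DataFrame(0.0, index=elements, columns=actions)  # r initialized to 0

# Reward value r for executing an action on an element under this state:
r = Q_table[state].loc["new user registration", "click"]
```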
The operation path information can be denoted Action and indicates the target action to be performed on the target element of the page. In some embodiments of the present application, it can likewise be represented by a three-dimensional data structure whose three dimensions are state, element, and action. For example, Table 2 shows the two-dimensional matrix formed by element and action under the state shown in Fig. 2:
                                 | Action1 (click) | Action2 (slide) | Action3 (back) | ...
element1 (Language)              | Action[0,0]     | Action[0,1]     | Action[0,2]    | ...
element2 (appXXXX)               | Action[1,0]     | Action[1,1]     | Action[1,2]    | ...
element3 (new user registration) | Action[2,0]     | Action[2,1]     | Action[2,2]    | ...
...                              | ...             | ...             | ...            | ...
Table 2: Action
When the operation path information on the current page is chosen based on the reward value table, the probability of choosing a piece of operation path information can be set to be positively correlated with the reward value of the target action that this operation path information performs on the target element of the current page. For example, the larger the reward value r, the higher the probability that the corresponding operation path information Action is chosen. For the state corresponding to Tables 1 and 2, if Q_table[state].loc(element3 (new user registration), Action1 (click)), that is, the value of r[2,0], is the largest, then the probability that the corresponding operation path information Action[2,0] is chosen is higher; in other words, when a decision is made on the page shown in Fig. 2, clicking the element "new user registration" has the highest probability.
In some embodiments of the present application, the operation path information can be chosen as follows. First, a random number is generated within a preset numerical range, for example a random number random_number between 0 and 1. This random number is then compared with a preset greediness degree greedy, and the operation path information is chosen in different ways according to the comparison result. The greediness degree is a preset value whose range is consistent with the range of the generated random number, for example between 0 and 1 in this embodiment. When the random number is greater than or equal to greedy, the largest first reward value of the current page is chosen from the reward value table, and the corresponding operation path information is determined from this first reward value; when the random number is less than greedy, a second reward value of the current page is chosen at random from the reward value table, and the corresponding operation path information is determined from this second reward value.
For example, if the greediness degree is set to 0.5 and the generated random number is 0.7, then random_number > greedy, so the largest first reward value of the current page is chosen from the reward value table. Taking Table 2 as an example, if the largest first reward value is r[2,0], the corresponding operation path information Action[2,0] is chosen, that is, the element "new user registration" on the page shown in Fig. 2 is clicked. In practice, if the reward value table does not contain a unique largest reward value under a given state, one of the several largest reward values can be chosen at random; moreover, in the initial state the reward values in Q_table may all be initialized to 0, in which case no unique largest reward value exists and random selection can likewise be used. If the greediness degree is set to 0.5 and the generated random number is 0.3, then random_number < greedy, so a second reward value of the current page is chosen at random from the reward value table. Taking Table 2 as an example, if the randomly chosen second reward value is r[0,0], the corresponding operation path information Action[0,0] is chosen, that is, the element "Language" on the page shown in Fig. 2 is clicked.
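A minimal sketch of this selection rule follows, assuming the per-state reward matrix is a pandas DataFrame as in the earlier sketch; the function and variable names are illustrative.

```python
import random
import pandas as pd

def choose_action(q_state: pd.DataFrame, greedy: float = 0.5):
    """Choose (element, action) for the current state: if the generated random number is
    greater than or equal to the greediness degree, take the cell with the largest reward
    value (ties, e.g. the all-zero initial state, are broken at random); otherwise take a
    random cell."""
    random_number = random.random()  # random number between 0 and 1
    if random_number >= greedy:
        max_r = q_state.values.max()
        candidates = [(e, a) for e in q_state.index for a in q_state.columns
                      if q_state.loc[e, a] == max_r]
    else:
        candidates = [(e, a) for e in q_state.index for a in q_state.columns]
    return random.choice(candidates)
```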
Step S102: perform the target action on the target element of the current page, recognize the page after execution, and obtain the page change state. In practice, after the target action is performed on the target element of the current page, the page may change, for example jump to another page or exit the application, or it may remain unchanged. Different cases correspond to different page change states.
To determine the page change state, the page after execution needs to be recognized and then compared with the page before execution. Both pages can use the elements they contain as their identification information; whether the page has changed is judged by comparing whether the contained elements are identical, and the page change state is determined accordingly.
In some embodiments of the present application, OCR can likewise be used to recognize the page after execution, and the recognized elements are spliced into the identification information of that page for comparison. For example, denote the page before execution state and the page after execution state_. If the identification information of state is "Language_TM_appXXXX_account login_new user registration" and the identification information of state_ is also "Language_TM_appXXXX_account login_new user registration", the page can be considered unchanged; conversely, if the identification information differs, the page can be considered changed.
In practice, two different pages of an application may have identical text content but different layouts (for example, different text positions, sizes, or colors). Therefore, to further improve accuracy, in other embodiments of the present application, additional information such as the position, size, and color of the text can be recognized in addition to the text content when recognizing the elements, and spliced into the identification information of the state as supplementary features of the elements, thereby improving the accuracy of judging the page change state.
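A minimal sketch of building the identification information with these supplementary features and comparing pages is given below; the field names position, size, and color are illustrative and depend on what the OCR step actually returns.

```python
def page_identifier_with_features(ocr_elements):
    """Splice each element's text together with optional supplementary features
    (position, size, color) into the page's identification information."""
    parts = []
    for e in ocr_elements:
        extras = [str(e[key]) for key in ("position", "size", "color") if key in e]
        parts.append("|".join([e["text"], *extras]))
    return "_".join(parts)

def page_changed(state_id: str, state_new_id: str) -> bool:
    """The page is considered changed when the identification information differs."""
    return state_id != state_new_id
```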
Step S103: update the reward value corresponding to the operation path information in the reward value table according to the page change state.
When the reward value corresponding to the operation path information in the reward value table is updated, an award update value can first be determined from the page change state, and the reward value corresponding to the operation path information is then updated according to the award update value. Since the award update value affects the subsequent reward value table, and the reward value table is the decision basis for choosing operation path information, the selection is guided by the reward values. Based on the principle of reinforcement learning, the correct Action (operation path information) that reaches the target page needs to be reinforced; therefore, in the embodiments of the present application, the page change state "the page changes and the target page is reached" can be given the highest award update value, thereby reinforcing the correct Action.
In some embodiments of the present application, besides the aforementioned "the page changes and the target page is not reached", the page change state may also include "the page changes and the target page is reached", "the application is exited", and "the page does not change". The corresponding award update values, ordered from high to low, are: the page changes and the target page is reached > the page changes and the target page is not reached > the application is exited > the page does not change.
For example, the award update value R in this embodiment is set as follows: the page changes and the target page is reached, i.e. state_ = target page, then R = 5; the page changes and the target page is not reached, i.e. state_ != state and state_ ≠ target page, then R = 0; the application is exited, i.e. state_ = an interface outside the app, then R = -5; the page does not change, i.e. state_ = state, then R = -100.
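A minimal sketch of this mapping from the page change state to the award update value R follows, using the example values of this embodiment; the helper name and the in_app flag are illustrative.

```python
def reward_update_value(state_id, state_new_id, target_id, in_app: bool) -> int:
    """Map the page change state to the award update value R (5 / 0 / -5 / -100
    are the example values used in this embodiment)."""
    if in_app and state_new_id == target_id:
        return 5       # the page changed and the target page was reached
    if not in_app:
        return -5      # the application was exited
    if state_new_id == state_id:
        return -100    # the page did not change
    return 0           # the page changed but the target page was not reached
```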
So that the update of the reward value table reflects the reinforcement of correct behavior more accurately, in the embodiments of the present application, when the reward value corresponding to the operation path information is updated according to the award update value, a change value can first be determined from the award update value, the reward value corresponding to the operation path information, and the largest reward value of the page recognized after the operation path information is executed. The reward value corresponding to the operation path information is denoted q_predict, that is, the r value in Q_table determined by the three dimensions of the page state before the operation path information was executed, the target element, and the target action. The largest reward value of the page recognized after the operation path information is executed is denoted q_target, that is, the largest r value in Q_table under the page state_ after the operation path information was executed. The change value delta can be calculated from R, q_predict, and q_target by the following formula:
delta = R + gamma · (q_target - q_predict)
where gamma is a softening degree, a preset value between 0 and 1.
After the change value is calculated, the reward value corresponding to the operation path information in the reward value table can be updated based on the change value. When r is updated, the following calculation can be used:
Q_table[state].loc(element, action) = Q_table[state].loc(element, action) + lr · delta
where lr is the learning rate, which can be determined according to the required test speed: the higher the learning rate, the faster the reward value table is updated.
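A minimal sketch of this update follows, assuming Q_table is the dict-of-DataFrames structure from the earlier sketch; the gamma and lr values are only examples.

```python
def update_reward_value(Q_table, state, element, action, R, q_target,
                        gamma: float = 0.3, lr: float = 0.5):
    """Update r for (state, element, action): q_predict is the currently stored reward
    value, q_target the largest reward value of the page recognized after execution."""
    q_predict = Q_table[state].loc[element, action]
    delta = R + gamma * (q_target - q_predict)
    Q_table[state].loc[element, action] = q_predict + lr * delta
    return Q_table[state].loc[element, action]
```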
Step S104: judge whether the target page has been reached or the application has been exited; if not, repeat the above processing steps; if so, end this training process.
Since the page at this point is already the page obtained after the operation path information was executed, the judgment is made according to the state of that page. If the judgment result is no, i.e. neither the target page has been reached nor the application exited, processing continues with the next round from step S101. If the judgment result is yes, i.e. the target page has been reached or the application has been exited, this training process ends, and according to the preset number of training iterations either the next training process is started or the test is ended.
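Putting steps S101 to S104 together, one training process could look like the following sketch. observe_page, choose_action, and execute stand in for the OCR-based page recognition, the reward-table-based selection, and the action execution; they and reward_update_value (from the earlier sketch) are assumptions of this sketch, not a definitive implementation.

```python
def run_training_process(Q_table, observe_page, choose_action, execute, target_id,
                         gamma: float = 0.3, lr: float = 0.5):
    """One training process (epoch): select, execute, update the reward value table,
    and stop once the target page is reached or the application is exited."""
    state, _ = observe_page()
    while True:
        element, action = choose_action(Q_table[state])        # step S101
        execute(element, action)                               # step S102: act on the page
        state_new, in_app = observe_page()                     # step S102: recognize the new page
        R = reward_update_value(state, state_new, target_id, in_app)
        if state_new not in Q_table:
            # a newly seen page gets its own element x action matrix, initialized to 0
            # (for brevity this sketch reuses the shape of the previous page's matrix)
            Q_table[state_new] = Q_table[state] * 0.0
        q_target = Q_table[state_new].values.max()
        q_predict = Q_table[state].loc[element, action]
        delta = R + gamma * (q_target - q_predict)             # step S103
        Q_table[state].loc[element, action] = q_predict + lr * delta
        if state_new == target_id or not in_app:               # step S104
            break
        state = state_new
```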
Fig. 3 shows the updated content of Q_table after 10 training processes (epochs) are executed using the scheme provided by the embodiments of the present application, where action includes click and back, the text segments represent the elements on each page, axis indicates the coordinate position of an element, the 8 pages are PAGE1 to PAGE8, and the target page is set to PAGE8. From the operation path information required to reach the target page in the training processes, one operation path from the initial page PAGE1 to the target page PAGE8 can be determined as: PAGE1 (next step) → PAGE2 (allow) → PAGE3 (allow) → PAGE3 (refuse) → PAGE7 (account login). When a new epoch is started, it may include the processing flow shown in Fig. 4:
Step S401: obtain the current state of the application by OCR. For example, when the application under test starts, the initial page is PAGE1, on which the following elements can be recognized: "storage space", "this setting will not touch or read this", "phone permission, to determine my phone number", "appXXXX will not dial other phones", "the above permissions will not read contact messages", "next step". By splicing these elements, the identification information of state (PAGE1) is obtained as: storage space_this setting will not touch or read this_phone permission, to determine my phone number_appXXXX will not dial other phones_the above permissions will not read contact messages_next step.
Step S402: generate a random number and choose the operation path information according to the greediness degree. In this embodiment, the generated random number is 0.8 and greedy is set to 0.5, so the largest reward value under state (PAGE1) is determined, namely r = 0.038801 corresponding to element (next step) and action (click); the chosen operation path information is therefore: click "next step" on page PAGE1. If instead the random number were less than 0.5, the operation path information would be chosen at random, so that new operation paths that may reach the target page can be explored.
Step S403: after executing according to the operation path information, recognize state_ after execution by OCR and determine the page change state. At this point state_ should be page PAGE2, so the following elements can be recognized: "refuse", "allow"; by splicing these elements, the identification information of state_ is obtained as: refuse_allow. It can thus be determined that state_ != state and that the target page PAGE8 has not been reached.
Step S404: update Q_table according to the determined page change state. The award update value R in this embodiment is set as follows: the page changes and the target page is reached, i.e. state_ = target page, then R = 5; the page changes and the target page is not reached, i.e. state_ != state and state_ ≠ target page, then R = 0; the application is exited, i.e. state_ = an interface outside the app, then R = -5; the page does not change, i.e. state_ = state, then R = -100. Here, the page change state gives R = 0. From Q_table, q_target = 0.048501 and q_predict = 0.038801; with the softening degree gamma = 0.3, the change value is calculated as:
delta = R + gamma · (q_target - q_predict) = 0 + 0.3 × (0.048501 - 0.038801) = 0.00291
With the learning rate set to 0.5, the reward value r corresponding to element (next step) and action (click) under state (PAGE1) can be calculated as:
r = Q_table[state].loc(element, action) + lr · delta = 0.038801 + 0.5 × 0.00291 = 0.041711
Step S405: judge whether the target page has been reached or the application has been exited. Since the current page is PAGE2 rather than the target page PAGE8, the judgment result is False; the current state is then updated to state_, and processing returns to step S402.
Based on the same inventive concept, an embodiment of the present application further provides a device for testing an application program. The method corresponding to this device is the application testing method of the preceding embodiments, and its principle of solving the problem is similar to that method.
The application testing device provided by the embodiments of the present application can use reinforcement learning, so that a continuously updated reward value table influences the subsequent selection of operation path information and reinforces the probability of choosing operation paths that reach the target page. Operation paths that reach the target page can therefore be explored automatically and more efficiently.
In practice, the device may be a user device, a network device, or a device formed by integrating a user device and a network device over a network; it may also be a program running on such devices. The user device includes, but is not limited to, terminal devices such as computers, mobile phones, and tablet computers; the network device includes, but is not limited to, implementations such as a network host, a single network server, a set of multiple network servers, or a cloud-computing-based set of computers. Here, the cloud is composed of a large number of hosts or network servers based on cloud computing (Cloud Computing), where cloud computing is a form of distributed computing: a virtual computer consisting of a set of loosely coupled computers.
The application testing device provided by the embodiments of the present application includes a training module. The training module is configured to execute at least one training process and to obtain a test result from the operation path information required to reach the target page in the training process. The test result is the operation path from the initial page to the target page in the training process, that is, the jump relationships between the pages and the actions performed on specific elements of each page to realize those jumps.
Assume the target page is PAGE4 and the initial page is PAGE1. One operation path from PAGE1 to PAGE4 is: PAGE1 (next step) → PAGE2 (allow) → PAGE3 (login). Here, PAGE1 (next step) is one piece of operation path information, meaning that clicking "next step" on PAGE1 jumps to PAGE2: PAGE1 is a page, clicking is an action, and "next step" is an element on PAGE1. Then, clicking "allow" on PAGE2 jumps to PAGE3, and clicking "login" on PAGE3 jumps to the target page PAGE4. After the actually required target page is set, the operation path from a given initial page to the target page can be determined based on the operation path information explored when the target page is reached.
The training module includes at least a recognition unit, a decision unit, an execution processing unit, and a judging unit, which are respectively used to execute the processing steps shown in Fig. 1 when a training process is carried out:
Step S101: the decision unit chooses, based on the reward value table, the operation path information on the current page.
The reward value table records the reward value corresponding to performing an action on an element of a page. As before, the reward value table is denoted Q_table, a page is denoted state, an element on a page is denoted element, an action is denoted action, and a reward value is denoted r.
An element may be any content on a page that can be operated by the user, usually a label or button containing text or an image. Fig. 2 shows a schematic diagram of a page in an application program; its elements include the content corresponding to boxes 201 to 205, namely the five elements Language, TM, appXXXX, account login, and new user registration. In some embodiments of the present application, the recognition unit can use OCR (Optical Character Recognition) technology to detect the current page, so that the text content recognized on the page serves as the elements of that page. An action refers to the action corresponding to a type of user operation, such as click, slide, long press, or back.
In practice, each distinct page can usually be identified by the elements it contains. For example, after the elements of the current page are recognized by OCR, the set of all elements on the current page, (element1, element2, element3, ...), is obtained, and the recognition results of the elements are then spliced into the identification information of the page. Taking the page shown in Fig. 2 as an example, the recognized set of elements is (Language, TM, appXXXX, account login, new user registration), which can be spliced into "Language_TM_appXXXX_account login_new user registration" and used as the identification information of the state, such as its name or index.
In some embodiments of the present application, Q_table can be represented by a three-dimensional data structure, that is, r is determined by the three parameters state, element, and action. In the embodiments of the present application, Q_table may take the form of a hash table whose key is the state and whose value is the two-dimensional matrix formed by element and action. For example, in one Q_table of an embodiment of the present application, the two-dimensional matrix of element and action under the state shown in Fig. 2 is as shown in Table 1:
                                 | Action1 (click) | Action2 (slide) | Action3 (back) | ...
element1 (Language)              | r[0,0]          | r[0,1]          | r[0,2]         | ...
element2 (appXXXX)               | r[1,0]          | r[1,1]          | r[1,2]         | ...
element3 (new user registration) | r[2,0]          | r[2,1]          | r[2,2]         | ...
...                              | ...             | ...             | ...            | ...
Table 1: Q_table
Thus, under a given state, the reward value r corresponding to executing a given action on a given element can be denoted Q_table[state].loc(element, action).
The operation path information can be denoted Action and indicates the target action to be performed on the target element of the page. In some embodiments of the present application, it can likewise be represented by a three-dimensional data structure whose three dimensions are state, element, and action. For example, Table 2 shows the two-dimensional matrix formed by element and action under the state shown in Fig. 2:
                                 | Action1 (click) | Action2 (slide) | Action3 (back) | ...
element1 (Language)              | Action[0,0]     | Action[0,1]     | Action[0,2]    | ...
element2 (appXXXX)               | Action[1,0]     | Action[1,1]     | Action[1,2]    | ...
element3 (new user registration) | Action[2,0]     | Action[2,1]     | Action[2,2]    | ...
...                              | ...             | ...             | ...            | ...
Table 2: Action
When the operation path information on the current page is chosen based on the reward value table, the probability of choosing a piece of operation path information can be set to be positively correlated with the reward value of the target action that this operation path information performs on the target element of the current page. For example, the larger the reward value r, the higher the probability that the corresponding operation path information Action is chosen. For the state corresponding to Tables 1 and 2, if Q_table[state].loc(element3 (new user registration), Action1 (click)), that is, the value of r[2,0], is the largest, then the probability that the corresponding operation path information Action[2,0] is chosen is higher; in other words, when a decision is made on the page shown in Fig. 2, clicking the element "new user registration" has the highest probability.
In some embodiments of the present application, the decision unit can choose the operation path information as follows. First, a random number is generated within a preset numerical range, for example a random number random_number between 0 and 1. This random number is then compared with a preset greediness degree greedy, and the operation path information is chosen in different ways according to the comparison result. The greediness degree is a preset value whose range is consistent with the range of the generated random number, for example between 0 and 1 in this embodiment. When random_number is greater than or equal to greedy, the largest first reward value of the current page is chosen from the reward value table, and the corresponding operation path information is determined from this first reward value; when random_number is less than greedy, a second reward value of the current page is chosen at random from the reward value table, and the corresponding operation path information is determined from this second reward value.
For example, if the greediness degree is set to 0.5 and the generated random number is 0.7, then random_number > greedy, so the largest first reward value of the current page is chosen from the reward value table. Taking Table 2 as an example, if the largest first reward value is r[2,0], the corresponding operation path information Action[2,0] is chosen, that is, the element "new user registration" on the page shown in Fig. 2 is clicked. In practice, if the reward value table does not contain a unique largest reward value under a given state, one of the several largest reward values can be chosen at random; moreover, in the initial state the reward values in Q_table may all be initialized to 0, in which case no unique largest reward value exists and random selection can likewise be used. If the greediness degree is set to 0.5 and the generated random number is 0.3, then random_number < greedy, so a second reward value of the current page is chosen at random from the reward value table. Taking Table 2 as an example, if the randomly chosen second reward value is r[0,0], the corresponding operation path information Action[0,0] is chosen, that is, the element "Language" on the page shown in Fig. 2 is clicked.
Step S102: the execution processing unit performs the target action on the target element of the current page, and the recognition unit recognizes the page after execution and obtains the page change state. In practice, after the target action is performed on the target element of the current page, the page may change, for example jump to another page or exit the application, or it may remain unchanged. Different cases correspond to different page change states.
To determine the page change state, the page after execution needs to be recognized and then compared with the page before execution. Both pages can use the elements they contain as their identification information; whether the page has changed is judged by comparing whether the contained elements are identical, and the page change state is determined accordingly.
In some embodiments of the present application, the recognition unit can likewise use OCR to recognize the page after execution, and the recognized elements are spliced into the identification information of that page for comparison. For example, denote the page before execution state and the page after execution state_. If the identification information of state is "Language_TM_appXXXX_account login_new user registration" and the identification information of state_ is also "Language_TM_appXXXX_account login_new user registration", the page can be considered unchanged; conversely, if the identification information differs, the page can be considered changed.
In practice, two different pages of an application may have identical text content but different layouts (for example, different text positions, sizes, or colors). Therefore, to further improve accuracy, in other embodiments of the present application, additional information such as the position, size, and color of the text can be recognized in addition to the text content when recognizing the elements, and spliced into the identification information of the state as supplementary features of the elements, thereby improving the accuracy of judging the page change state.
Step S103: the execution processing unit updates the reward value corresponding to the operation path information in the reward value table according to the page change state.
When the reward value corresponding to the operation path information in the reward value table is updated, the execution processing unit can first determine an award update value from the page change state and then update the reward value corresponding to the operation path information according to the award update value. Since the award update value affects the subsequent reward value table, and the reward value table is the decision basis for choosing operation path information, the selection is guided by the reward values. Based on the principle of reinforcement learning, the correct Action (operation path information) that reaches the target page needs to be reinforced; therefore, in the embodiments of the present application, the page change state "the page changes and the target page is reached" can be given the highest award update value, thereby reinforcing the correct Action.
In some embodiments of the present application, besides the aforementioned "the page changes and the target page is not reached", the page change state may also include "the page changes and the target page is reached", "the application is exited", and "the page does not change". The corresponding award update values, ordered from high to low, are: the page changes and the target page is reached > the page changes and the target page is not reached > the application is exited > the page does not change.
For example, the award update value R in this embodiment is set as follows: the page changes and the target page is reached, i.e. state_ = target page, then R = 5; the page changes and the target page is not reached, i.e. state_ != state and state_ ≠ target page, then R = 0; the application is exited, i.e. state_ = an interface outside the app, then R = -5; the page does not change, i.e. state_ = state, then R = -100.
So that the update of the reward value table reflects the reinforcement of correct behavior more accurately, in the embodiments of the present application, when the reward value corresponding to the operation path information is updated according to the award update value, a change value can first be determined from the award update value, the reward value corresponding to the operation path information, and the largest reward value of the page recognized after the operation path information is executed. The reward value corresponding to the operation path information is denoted q_predict, that is, the r value in Q_table determined by the three dimensions of the page state before the operation path information was executed, the target element, and the target action. The largest reward value of the page recognized after the operation path information is executed is denoted q_target, that is, the largest r value in Q_table under the page state_ after the operation path information was executed. The change value delta can be calculated from R, q_predict, and q_target by the following formula:
delta = R + gamma · (q_target - q_predict)
where gamma is a softening degree, a preset value between 0 and 1.
After the change value is calculated, the reward value corresponding to the operation path information in the reward value table can be updated based on the change value. When r is updated, the following calculation can be used:
Q_table[state].loc(element, action) = Q_table[state].loc(element, action) + lr · delta
where lr is the learning rate, which can be determined according to the required test speed: the higher the learning rate, the faster the reward value table is updated.
Step S104: the judging unit judges whether the target page has been reached or the application has been exited; if not, the above processing steps are repeated; if so, this training process ends.
Since the page at this point is already the page obtained after the operation path information was executed, the judgment is made according to the state of that page. If the judgment result is no, i.e. neither the target page has been reached nor the application exited, processing continues with the next round from step S101. If the judgment result is yes, i.e. the target page has been reached or the application has been exited, this training process ends, and according to the preset number of training iterations either the next training process is started or the test is ended.
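As an illustrative sketch only, the division of labour between these units could be organized as follows; all class and method names are hypothetical, and the concrete unit implementations are assumed to exist elsewhere.

```python
class TrainingModule:
    """Training module composed of a recognition unit, a decision unit,
    an execution processing unit and a judging unit (steps S101-S104)."""

    def __init__(self, recognition_unit, decision_unit, execution_unit, judging_unit):
        self.recognition_unit = recognition_unit   # OCR-based page recognition
        self.decision_unit = decision_unit         # selection based on the reward value table (S101)
        self.execution_unit = execution_unit       # action execution and reward value update (S102/S103)
        self.judging_unit = judging_unit           # target-page / exit check (S104)

    def run_training_process(self, Q_table, target_id):
        state = self.recognition_unit.current_page()
        while True:
            element, action = self.decision_unit.choose(Q_table, state)        # S101
            self.execution_unit.perform(element, action)                       # S102
            state_new = self.recognition_unit.current_page()
            self.execution_unit.update_reward(Q_table, state, element, action,
                                              state_new, target_id)            # S103
            if self.judging_unit.done(state_new, target_id):                   # S104
                break
            state = state_new
```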
In summary, in the scheme provided by the embodiments of the present application, at least one training process is executed and a test result is obtained from the operation path information required to reach the target page in the training process, so that the operation paths between the pages of the application can be obtained. In each training process, reinforcement learning is used: the operation path information on the current page is first chosen based on the reward value table; the target action is then performed according to the operation path information, the page after execution is recognized, and the page change state is obtained; the reward value corresponding to the operation path information in the reward value table is updated according to the page change state; and these processing steps are repeated until the target page is reached or the application is exited, completing one training process. The updated reward value table can thus influence the subsequent selection of operation path information and reinforce the probability of choosing operation paths that reach the target page, so operation paths that reach the target page can be explored automatically and more efficiently.
In addition, part of the present application may be implemented as a computer program product, for example computer program instructions which, when executed by a computer, can invoke or provide the methods and/or technical solutions of the present application through the operation of that computer. The program instructions that invoke the methods of the present application may be stored in a fixed or removable recording medium, and/or transmitted via broadcast or a data stream in another signal-bearing medium, and/or stored in the working memory of a computer device that runs according to the program instructions. Here, some embodiments of the present application include a computing device as shown in Fig. 5, which comprises one or more memories 510 storing computer-readable instructions and a processor 520 for executing the computer-readable instructions, wherein, when the computer-readable instructions are executed by the processor, the device carries out the methods and/or technical solutions of the foregoing embodiments of the present application.
In addition, some embodiments of the present application further provide a computer-readable medium on which computer program instructions are stored, the computer-readable instructions being executable by a processor to implement the methods and/or technical solutions of the foregoing embodiments of the present application.
It should be noted that the present application may be implemented in software and/or a combination of software and hardware, for example using an application-specific integrated circuit (ASIC), a general-purpose computer, or any other similar hardware device. In some embodiments, the software program of the present application may be executed by a processor to realize the steps or functions described above. Likewise, the software program of the present application (including related data structures) may be stored in a computer-readable recording medium, for example a RAM memory, a magnetic or optical drive, a floppy disk, or a similar device. In addition, some steps or functions of the present application may be implemented in hardware, for example as a circuit that cooperates with a processor to execute the individual steps or functions.
It is obvious to those skilled in the art that the present application is not limited to the details of the above exemplary embodiments, and that the present application can be realized in other specific forms without departing from its spirit or essential characteristics. Therefore, from whichever point of view, the embodiments should be regarded as exemplary and non-restrictive, and the scope of the present application is defined by the appended claims rather than by the above description; it is therefore intended that all changes falling within the meaning and range of equivalents of the claims be embraced in the present application. No reference sign in the claims should be construed as limiting the claim concerned. Furthermore, it is clear that the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices stated in a device claim may also be implemented by a single unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.