CN103885935A - Book section abstract generating method based on book reading behaviors - Google Patents
- Publication number
- CN103885935A (application CN201410090143.6A)
- Authority
- CN
- China
- Prior art keywords
- sentence
- page
- books
- user
- book
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a book section abstract generating method based on book reading behaviors. The technique is essentially a document summary generation technique in which the user's reading behavior is incorporated into document summary generation and applied to engineering, science and education book resources. The method calculates the weight of each page of a book section with a quantified page-level reading behavior scoring mechanism, splits the section into sentences, computes the similarity between sentences from their distances, propagates the existing sentence weights along the manifold structure of the data, and finally, based on the idea of data reconstruction, selects the sentences that best represent the section content as the section summary. The users' reading behavior is collected and used to evaluate the importance of pages, and the corresponding section summary is obtained based on the idea of data reconstruction, helping users quickly grasp the content of a section and improving book reading efficiency.
Description
Technical field
The present invention relates to document summary generation methods, and in particular to a book section abstract generating method based on book reading behavior.
Background technology
With the continuous development of digital libraries, users hope to understand the content of a book section quickly and accurately before reading it, and therefore urgently expect digital libraries to provide a book section summary service.
Book section summary generation is in essence a document summary generation method based on reading behavior: user reading behavior is modeled, and the reading factors derived from the behavior model are incorporated into the document summary generation algorithm, so that the resulting summary is influenced by how users actually read. If a traditional document summary generation method were used directly, the section summary might not express the section content accurately from the readers' point of view and thus would not meet users' needs.
In traditional reading, the object being read is simply a fixed sequence of linguistic symbols. From the beginning to the end of reading, the reader acquires knowledge only from the textual content, detached from any social interaction. With the emergence of social network reading, the whole process from starting to finishing a text becomes partly or entirely connected to a social network. In such an interconnected community, readers' reading behavior itself becomes an object worth observing and studying.
Social reading is a new reading mode that takes content as its core, takes social relationships as its ties, and emphasizes sharing, interaction and dissemination. While reading, users can interact with users who share the same interests; after reading, they can connect with the crowd that has read the same content and even form communities around common topics, so that sharing, interaction and dissemination run through the whole reading process. These interactions produce a large amount of new and valuable content, such as comments, summaries, notes, and association or cross-reference information.
The basic summarization algorithm used for book section summary generation is the document summarization algorithm based on data reconstruction (DSDR). DSDR is an extractive method built on one principle: a good summary should reconstruct the original document as faithfully as possible, so that the selected sentences cover as much of the content of the whole document as possible.
On top of the data-reconstruction summarization algorithm, the various actions users perform during social reading are taken into account, such as reading time and the sentences users underline as important; underlined sentences are generally considered more representative and are given a higher weight than sentences that have not been underlined.
Summary of the invention
The object of the present invention is to provide a book section abstract generating method based on book reading behavior, which produces section summaries that help users quickly understand the content of a book section.
The technical solution adopted by the present invention to solve its technical problem is as follows:
The steps of the book section abstract generating method based on book reading behavior are as follows:
1) Build a quantified page-level reading behavior score: user reading behavior is divided, from shallow to deep, into four levels of reading depth, namely the browsing level, the bookmarking level, the shallow reading level and the deep reading level, and a page score based on user reading behavior is obtained from these four levels;
2) Sentence weight propagation: the page score based on user reading behavior from step 1) gives the quantified page scores; the book section is split into sentences, each sentence is given an initial weight derived from the quantified score of its page, and, based on the distances between sentences, a ranking algorithm on the data manifold structure is used to propagate the sentence weights;
3) Book section summary generation: after the sentence weights have been propagated, they are incorporated into the document summarization algorithm based on data reconstruction, and the important sentences selected from the book section form the section summary.
Said step 1) is:
2.1 The behavior of a user reading a given page is divided into four levels, namely the browsing level, the bookmarking level, the shallow reading level and the deep reading level; different levels make different score contributions to the page;
2.2 Retention rate, churn rate and exponential score decay are used to measure how difficult it is for reading behavior to reach a given level, and the page is scored accordingly. The page user retention rate is, for a given page, the ratio of the number of users retained at the bookmarking, shallow reading and deep reading levels to the number of users at the browsing level; the page churn rate is the ratio of the number of users lost at a given level to the number of users retained at the previous level.
The scoring formula based on user reading behavior is established as:
V_i = [(p_i + q_i)/p_i] · exp(1 - p_i),  i = 1, 2, 3, 4
Page user retention rate formula:
p_i = U_i/U_1,  i = 1, 2, 3, 4
Page churn rate formula:
q_i = (U_{i-1} - U_i)/U_{i-1},  i = 2, 3, 4
where V_i is the score contribution of step i of the whole user group's reading behavior to the page; p_i is the retention rate of step i relative to the browsing level; q_i is the churn rate of step i relative to step i-1; and U_i is the number of users who proceed to step i;
2.3 The times at which users access a page are ordered: a user who accesses and scores a page earlier contributes more to that page. The importance of a page can be calculated from the scores of its key behavior nodes; the comprehensive importance score s_j of page j is computed from the quantities defined below:
W_uj is the contribution weight of user u to page j; T_j is the sum of the times at which page j was accessed; t_uj is the time at which user u first accessed page j; t_j is the time at which page j was first accessed; S_uj is the sum of the scores of the key behavior steps reached by user u on page j; V_ij is the score of the i-th key behavior step reached by user u on page j; and L is the reading depth, i.e. the number of key behavior steps, reached by user u on page j;
2.4 The above scoring method yields a quantified importance score for every page of the book. Because reading groups differ between books, and to avoid a page receiving an inflated score simply because only a few users visited it, the number of visiting users and the score are normalized in the actual page evaluation, giving the final comprehensive page score PageScore_j, where u_j is the number of users who browsed page j and s_j is the score of page j. By comparing both quantities against their mean values, the comprehensive score is high only when the number of users browsing the page and the readers' scores for the page are both high. According to the characteristics of user reading behavior in book reading, a page importance evaluation system based on user reading behavior is thus established: user behavior is quantified through the four reading levels of a page, the score contributions of the four levels describe how difficult it is for a user to progress from the browsing level to the deep reading level, and the reading behavior of the whole user group on a page finally quantifies the importance of that page.
Said step 2) is:
3.1 Step 1) provides the score PageScore_j of page j, which reflects the importance of page j within the book; at the same time, an underlined sentence has a relative importance within its page, so the importance of a sentence is related to the score of its page. In the formulas below, w_i denotes the current weight of sentence v_i. Let the set of sentences of the document be V = {v_1, …, v_n}, where v_i is the i-th sentence of V. The sentences underlined with a straight line by users are placed at the front of the set; assuming the first k sentences are the ones users have underlined, the weights of the remaining sentences are obtained from their relation to the first k sentences;
3.2 Let dis: V × V → R denote a distance metric on the set V, so that the distance dis(v_i, v_j) between every pair of sentences v_i and v_j can be obtained. Let the mapping f: V → R be the ranking function that assigns a weight f_i to each sentence v_i, and write the vectors f = [f_1, …, f_n]^T and w = [w_1, …, w_n]^T, where w_i ≠ 0 if sentence v_i has been underlined and w_i = 0 otherwise; w_i is the initial weight of each sentence;
3.3 The weight propagation algorithm on the data manifold structure is expressed as follows:
Step 1: compute the pairwise distances dis(v_i, v_j) between sentence vectors and sort them in ascending order; following this ascending list, connect an edge between the corresponding pairs of sentence nodes until a connected graph is obtained;
Step 2: define the affinity matrix W such that W_ij = exp[-dis^2(v_i, v_j)/(2σ^2)] if there is an edge between the points corresponding to sentence vectors v_i and v_j, W_ij = 0 if there is no such edge, and W_ii = 0;
Step 3: symmetrically normalize W to obtain the matrix S = D^{-1/2} W D^{-1/2}, where D is the diagonal matrix whose diagonal element D_ii is the sum of the i-th row of W;
Step 4: iterate f(t+1) = αSf(t) + (1 - α)w until convergence, where α is a parameter with value in [0, 1);
Step 5: let f_i* denote the limit of the sequence {f_i(t)}; the limit of the sentence weights is the sentence weight vector f* = [f_1*, …, f_n*]^T.
3.4 In Step 4, the parameter α balances the weight contribution of a node's neighbours against its initial weight. Because the matrix S is symmetric, the weight propagation process is symmetric. The limit of the sequence {f(t)} can be computed directly as f* = (I - αS)^{-1}w. After the propagation, every sentence of the book section has a reasonable weight.
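A minimal Python sketch of the weight propagation of steps 3.3 and 3.4 is given below (an illustration only; it uses a fully connected affinity graph instead of the edge-by-edge construction of Step 1, which is a simplifying assumption, and the function and parameter names are not from the patent):

```python
import numpy as np

def propagate_sentence_weights(X, w, alpha=0.85, sigma=1.0):
    """Propagate initial sentence weights w over the sentence graph.

    X     : (n, d) array of sentence vectors.
    w     : (n,) initial weights (nonzero for underlined sentences, 0 otherwise).
    alpha : parameter in [0, 1) trading off neighbours' weights and initial weights.
    sigma : bandwidth of the Gaussian affinity W_ij = exp(-dis^2 / (2 sigma^2)).
    """
    diff = X[:, None, :] - X[None, :, :]
    dist2 = (diff ** 2).sum(-1)                    # squared distances dis^2(v_i, v_j)

    W = np.exp(-dist2 / (2.0 * sigma ** 2))        # affinity matrix
    np.fill_diagonal(W, 0.0)                       # W_ii = 0

    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    S = D_inv_sqrt @ W @ D_inv_sqrt                # S = D^{-1/2} W D^{-1/2}

    # Closed-form limit of f(t+1) = alpha*S*f(t) + (1-alpha)*w, as in step 3.4:
    # f* = (I - alpha*S)^{-1} w  (a constant (1-alpha) factor does not change
    # the ranking of sentences).
    return np.linalg.solve(np.eye(len(w)) - alpha * S, w)

# Example: 5 sentence vectors in a 3-dimensional term space; sentence 0 underlined.
X = np.random.rand(5, 3)
w = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
print(propagate_sentence_weights(X, w))
```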
Said step 3) is:
4.1 The weight f_i* of each sentence v_i of the book section has been obtained; f_i* reflects the importance of sentence v_i within the section. The n weights f_i* are arranged as the diagonal elements of a matrix F, i.e. F_ii = f_i*, giving the diagonal matrix F, which is then incorporated into the document summarization algorithm based on data reconstruction;
4.2 In the summary generation process, the objective function of the linear nonnegative data reconstruction algorithm is redefined as
J(A, β) = Tr[F(V - AV)(V - AV)^T + diag(β)^{-1}A^T A] + γ||β||_1
s.t. β_j ≥ 0, a_ij ≥ 0, and a_i ∈ R^n,
where V is the matrix whose rows are the sentence vectors and A = [a_ij] is the reconstruction coefficient matrix. In this way the selection of every sentence takes the weight f_i* of sentence v_i into account; a_ij ≥ 0 means that the method only allows additive combinations of sentences in the candidate space and no subtraction; β = [β_1, β_2, …, β_n]^T is an auxiliary variable; if β_j = 0 then a_1j, …, a_nj are all 0, meaning that the candidate sentence of column j is not selected; and γ is the regularization parameter;
4.3 The objective function of the data-reconstruction summarization algorithm is a convex optimization problem, so a globally optimal solution is guaranteed. With a_i fixed, setting the derivative of J with respect to β to 0 gives the minimizing solution β_j = ||a_{·j}||_2/√γ, where a_{·j} denotes the j-th column of A. Once the minimizing β has been obtained, the minimization problem under the nonnegativity constraints can be solved with the Lagrangian method;
4.4 Let α_ij be the Lagrange multiplier for the constraint a_ij ≥ 0, with A = [a_ij]; the Lagrangian L is
L = J + Tr[αA^T] = Tr[F(V - AV)(V - AV)^T + diag(β)^{-1}A^T A] + γ||β||_1 + Tr[αA^T],  α = [α_ij],
where F is the diagonal matrix of step 4.1, whose diagonal elements are f_1*, …, f_n*, and diag(β) is the diagonal matrix whose diagonal elements are β_1, …, β_n;
4.5 Differentiating the Lagrangian L with respect to A gives
∂L/∂A = -2FVV^T + 2FAVV^T + 2A diag(β)^{-1} + α;
setting this derivative to 0 yields
α = 2FVV^T - 2FAVV^T - 2A diag(β)^{-1}.
By the Karush-Kuhn-Tucker condition α_ij a_ij = 0, multiplying each term of the above expression by a_ij gives the equation
(FVV^T)_ij a_ij - (FAVV^T)_ij a_ij - (A diag(β)^{-1})_ij a_ij = 0,
from which the following update formula is obtained:
a_ij ← a_ij · (FVV^T)_ij / [(FAVV^T)_ij + (A diag(β)^{-1})_ij].
The update formula is iterated until convergence, and the summary sentences of the book section are finally obtained.
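The multiplicative update of step 4.5 can be sketched in Python as follows (an illustration only: the closed-form β_j and the sentence-selection rule are reconstructions under the assumptions stated in the comments, and the function and parameter names are not from the patent):

```python
import numpy as np

def weighted_dsdr_summary(V, f_star, gamma=0.1, n_iters=200, n_summary=3, eps=1e-12):
    """Weighted data-reconstruction summarization (sketch of step 3).

    V         : (n, d) nonnegative sentence-vector matrix (rows are sentences).
    f_star    : (n,) propagated sentence weights, used as the diagonal of F.
    gamma     : regularization parameter of the beta term.
    n_summary : number of summary sentences to return.

    Assumptions: beta_j = ||A_{.j}||_2 / sqrt(gamma) from setting dJ/dbeta = 0,
    and the summary sentences are the columns of A with the largest norm, i.e.
    the candidates that contribute most to reconstructing the whole section.
    """
    n = V.shape[0]
    F = np.diag(f_star)
    VVt = V @ V.T
    A = np.full((n, n), 1.0 / n)                   # nonnegative coefficients a_ij

    for _ in range(n_iters):
        beta = np.linalg.norm(A, axis=0) / np.sqrt(gamma) + eps
        numer = F @ VVt                            # (F V V^T)_ij
        denom = F @ A @ VVt + A @ np.diag(1.0 / beta) + eps
        A *= numer / denom                         # a_ij <- a_ij * numer / denom

    scores = np.linalg.norm(A, axis=0)
    return np.argsort(-scores)[:n_summary]

# Example: 6 sentences in a 4-dimensional nonnegative term space, uniform weights.
V = np.abs(np.random.rand(6, 4))
print(weighted_dsdr_summary(V, np.ones(6), n_summary=2))
```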
Compared with the prior art, the method of the invention has the following beneficial effects:
1. The method combines user reading behavior modeling with document summary generation and applies the data-reconstruction summarization algorithm to book section summary generation, obtaining summary information for book sections;
2. The method analyzes and models user reading behavior based on the idea of reading depth, divides reading behavior into levels, and finally provides a comprehensive page scoring system in which the score represents the importance of a page;
3. The method takes the sentences of a book section as units, propagates weights on the data manifold space according to the existing sentence weights, and finally obtains a reasonable weight for each sentence, so that user behavior is reflected more accurately.
Brief description of the drawings
Fig. 1 is a system architecture diagram of the book section abstract generating method based on book reading behavior;
Fig. 2 is a block diagram of the sentence weight propagation method of the present invention;
Fig. 3 is the book catalogue of the embodiment of the present invention;
Fig. 4 is a schematic diagram of the first section in the embodiment of the present invention;
Fig. 5 shows the section summary generation result of the embodiment of the present invention.
Embodiment
As shown in Fig. 1 and Fig. 2, the steps of the book section abstract generating method based on book reading behavior are as follows:
1) Build a quantified page-level reading behavior score: user reading behavior is divided, from shallow to deep, into four levels of reading depth, namely the browsing level, the bookmarking level, the shallow reading level and the deep reading level, and a page score based on user reading behavior is obtained from these four levels;
2) Sentence weight propagation: the page score based on user reading behavior from step 1) gives the quantified page scores; the book section is split into sentences, each sentence is given an initial weight derived from the quantified score of its page, and, based on the distances between sentences, a ranking algorithm on the data manifold structure is used to propagate the sentence weights;
3) Book section summary generation: after the sentence weights have been propagated, they are incorporated into the document summarization algorithm based on data reconstruction, and the important sentences selected from the book section form the section summary.
Said step 1) is:
2.1 The behavior of a user reading a given page is divided into four levels, namely the browsing level, the bookmarking level, the shallow reading level and the deep reading level; different levels make different score contributions to the page;
2.2 Retention rate, churn rate and exponential score decay are used to measure how difficult it is for reading behavior to reach a given level, and the page is scored accordingly. The score decays exponentially with the retention rate: the score of a given step depends on the churn rate of the previous step and also on the retention rate of the initial stage. The page user retention rate and churn rate are defined first: the page user retention rate is, for a given page, the ratio of the number of users retained at the bookmarking, shallow reading and deep reading levels to the number of users at the browsing level; the page churn rate is the ratio of the number of users lost at a given level to the number of users retained at the previous level.
The scoring formula based on user reading behavior is established as:
V_i = [(p_i + q_i)/p_i] · exp(1 - p_i),  i = 1, 2, 3, 4
Page user retention rate formula:
p_i = U_i/U_1,  i = 1, 2, 3, 4
Page churn rate formula:
q_i = (U_{i-1} - U_i)/U_{i-1},  i = 2, 3, 4
where V_i is the score contribution of step i of the whole user group's reading behavior to the page; p_i is the retention rate of step i relative to the browsing level; q_i is the churn rate of step i relative to step i-1; and U_i is the number of users who proceed to step i;
2.3 The times at which users access a page are ordered: a user who accesses and scores a page earlier contributes more to that page; for example, if the first visiting user reads a page in depth, that page is relatively more important. The importance of a page can be calculated from the scores of its key behavior nodes; the comprehensive importance score s_j of page j is computed from the quantities defined below:
W_uj is the contribution weight of user u to page j; T_j is the sum of the times at which page j was accessed; t_uj is the time at which user u first accessed page j; t_j is the time at which page j was first accessed; S_uj is the sum of the scores of the key behavior steps reached by user u on page j; V_ij is the score of the i-th key behavior step reached by user u on page j; and L is the reading depth, i.e. the number of key behavior steps, reached by user u on page j;
2.4 The above scoring method yields a quantified importance score for every page of the book. Because reading groups differ between books, and to avoid a page receiving an inflated score simply because only a few users visited it, the number of visiting users and the score are normalized in the actual page evaluation, giving the final comprehensive page score PageScore_j, where u_j is the number of users who browsed page j and s_j is the score of page j. By comparing both quantities against their mean values, the comprehensive score is high only when the number of users browsing the page and the readers' scores for the page are both high. According to the characteristics of user reading behavior in book reading, a page importance evaluation system based on user reading behavior is thus established: user behavior is quantified through the four reading levels of a page, the score contributions of the four levels describe how difficult it is for a user to progress from the browsing level to the deep reading level, and the reading behavior of the whole user group on a page finally quantifies the importance of that page.
Said step 2) is:
3.1 Step 1) provides the score PageScore_j of page j, which reflects the importance of page j within the book; at the same time, an underlined sentence has a relative importance within its page, so the importance of a sentence is related to the score of its page. In the formulas below, w_i denotes the current weight of sentence v_i. Let the set of sentences of the document be V = {v_1, …, v_n}, where v_i is the i-th sentence of V. The sentences underlined with a straight line by users are placed at the front of the set; assuming the first k sentences are the ones users have underlined, the weights of the remaining sentences are obtained from their relation to the first k sentences;
3.2 Let dis: V × V → R denote a distance metric on the set V, so that the distance dis(v_i, v_j) between every pair of sentences v_i and v_j can be obtained. Let the mapping f: V → R be the ranking function that assigns a weight f_i to each sentence v_i, and write the vectors f = [f_1, …, f_n]^T and w = [w_1, …, w_n]^T, where w_i ≠ 0 if sentence v_i has been underlined and w_i = 0 otherwise; w_i is the initial weight of each sentence;
3.3 The weight propagation algorithm on the data manifold structure is expressed as follows:
Step 1: compute the pairwise distances dis(v_i, v_j) between sentence vectors and sort them in ascending order; following this ascending list, connect an edge between the corresponding pairs of sentence nodes until a connected graph is obtained;
Step 2: define the affinity matrix W such that W_ij = exp[-dis^2(v_i, v_j)/(2σ^2)] if there is an edge between the points corresponding to sentence vectors v_i and v_j, W_ij = 0 if there is no such edge, and W_ii = 0;
Step 3: symmetrically normalize W to obtain the matrix S = D^{-1/2} W D^{-1/2}, where D is the diagonal matrix whose diagonal element D_ii is the sum of the i-th row of W;
Step 4: iterate f(t+1) = αSf(t) + (1 - α)w until convergence, where α is a parameter with value in [0, 1);
Step 5: let f_i* denote the limit of the sequence {f_i(t)}; the limit of the sentence weights is the sentence weight vector f* = [f_1*, …, f_n*]^T.
3.4 In Step 4, the parameter α balances the weight contribution of a node's neighbours against its initial weight. Because the matrix S is symmetric, the weight propagation process is symmetric. The limit of the sequence {f(t)} can be computed directly as f* = (I - αS)^{-1}w. After the propagation, every sentence of the book section has a reasonable weight.
Said step 3) is:
4.1 The weight f_i* of each sentence v_i of the book section has been obtained; f_i* reflects the importance of sentence v_i within the section. The n weights f_i* are arranged as the diagonal elements of a matrix F, i.e. F_ii = f_i*, giving the diagonal matrix F, which is then incorporated into the document summarization algorithm based on data reconstruction;
4.2 In the summary generation process, the objective function of the linear nonnegative data reconstruction algorithm is redefined as
J(A, β) = Tr[F(V - AV)(V - AV)^T + diag(β)^{-1}A^T A] + γ||β||_1
s.t. β_j ≥ 0, a_ij ≥ 0, and a_i ∈ R^n,
where V is the matrix whose rows are the sentence vectors and A = [a_ij] is the reconstruction coefficient matrix. In this way the selection of every sentence takes the weight f_i* of sentence v_i into account; a_ij ≥ 0 means that the method only allows additive combinations of sentences in the candidate space and no subtraction; β = [β_1, β_2, …, β_n]^T is an auxiliary variable; if β_j = 0 then a_1j, …, a_nj are all 0, meaning that the candidate sentence of column j is not selected; and γ is the regularization parameter;
4.3 The objective function of the data-reconstruction summarization algorithm is a convex optimization problem, so a globally optimal solution is guaranteed. With a_i fixed, setting the derivative of J with respect to β to 0 gives the minimizing solution β_j = ||a_{·j}||_2/√γ, where a_{·j} denotes the j-th column of A. Once the minimizing β has been obtained, the minimization problem under the nonnegativity constraints can be solved with the Lagrangian method;
4.4 Let α_ij be the Lagrange multiplier for the constraint a_ij ≥ 0, with A = [a_ij]; the Lagrangian L is
L = J + Tr[αA^T] = Tr[F(V - AV)(V - AV)^T + diag(β)^{-1}A^T A] + γ||β||_1 + Tr[αA^T],  α = [α_ij],
where F is the diagonal matrix of step 4.1, whose diagonal elements are f_1*, …, f_n*, and diag(β) is the diagonal matrix whose diagonal elements are β_1, …, β_n;
4.5 Differentiating the Lagrangian L with respect to A and setting the derivative to 0 yields
α = 2FVV^T - 2FAVV^T - 2A diag(β)^{-1}.
By the Karush-Kuhn-Tucker condition α_ij a_ij = 0, multiplying each term of the above expression by a_ij gives the equation
(FVV^T)_ij a_ij - (FAVV^T)_ij a_ij - (A diag(β)^{-1})_ij a_ij = 0,
from which the following update formula is obtained:
a_ij ← a_ij · (FVV^T)_ij / [(FAVV^T)_ij + (A diag(β)^{-1})_ij].
The update formula is iterated until convergence, and the summary sentences of the book section are finally obtained.
Embodiment
As shown in Figs. 3 to 5, an application example of the book section abstract generating method is given. The concrete steps of this example are described in detail below:
(1) The system preprocesses all book sections to obtain the text content of each section. Suppose a user is reading the first section, "Definition", of Chapter 1, "Introduction to Distributed Computing", of the book "Principles and Applications of Distributed Computing" and wants to see the summary of this section. The user clicks the catalogue button and double-clicks the corresponding section, and the system first obtains data such as the text of the section and the users' reading behavior.
(2) The type and level of the users' reading of this section are analyzed from the reading behavior data, and the quantified importance score of each page is obtained with the comprehensive page scoring formula.
(3) The text of the section is split into sentences, and the users' underlining behavior is combined with the quantified page scores to obtain the initial weights of the underlined sentences.
(4) Each sentence is segmented into words and stop words are removed; each sentence is then represented as a vector in a high-dimensional space, and the pairwise similarity between sentences is obtained from the distances between the vectors (see the sketch after this list).
(5) The initial sentence weights are propagated with the ranking method on the data manifold space, and a reasonable weight is finally obtained for each sentence.
(6) The sentence weight matrix F is added to the document summarization algorithm based on data reconstruction, and the algorithm is executed until convergence, selecting a number of sentences (depending on the section length) from the section as its summary, which is finally returned to the user.
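A minimal Python sketch of steps (3) and (4) above is given below (an illustration only; the regular-expression tokenizer and the tiny stop-word list are placeholder assumptions, not the preprocessing actually used in the embodiment). The resulting vectors and distance matrix then feed the weight propagation of step (5) and the reconstruction of step (6):

```python
import re
import numpy as np

def sentence_vectors(section_text, stop_words=frozenset({"the", "a", "of", "and", "is"})):
    """Split a section into sentences, tokenize, drop stop words, build
    term-frequency vectors, and compute pairwise sentence distances."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", section_text) if s.strip()]
    tokenized = [[t for t in re.findall(r"\w+", s.lower()) if t not in stop_words]
                 for s in sentences]

    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: i for i, t in enumerate(vocab)}

    X = np.zeros((len(sentences), len(vocab)))
    for row, toks in enumerate(tokenized):
        for t in toks:
            X[row, index[t]] += 1.0                # term frequency

    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))            # dis(v_i, v_j)
    return sentences, X, dist

text = ("Distributed computing splits a task across machines. "
        "Each machine works on a part of the task. "
        "Results are combined into a final answer.")
sentences, X, dist = sentence_vectors(text)
print(len(sentences), X.shape, dist.shape)
```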
The running results of this example are shown in Figs. 3 to 5: while reading a book, the user can view the summary of the corresponding section through the catalogue, which helps the user understand the section content faster and in more detail. The book section abstract generating method therefore has good practical value and application prospects.
Claims (4)
1. A book section abstract generating method based on book reading behavior, characterized in that its steps are as follows:
1) building a quantified page-level reading behavior score: user reading behavior is divided, from shallow to deep, into four levels of reading depth, namely a browsing level, a bookmarking level, a shallow reading level and a deep reading level, and a page score based on user reading behavior is obtained from these four levels;
2) sentence weight propagation: the page score based on user reading behavior from step 1) gives the quantified page scores; the book section is split into sentences, each sentence is given an initial weight derived from the quantified score of its page, and, based on the distances between sentences, a ranking algorithm on the data manifold structure is used to propagate the sentence weights;
3) book section summary generation: after the sentence weights have been propagated, they are incorporated into the document summarization algorithm based on data reconstruction, and the important sentences selected from the book section form the section summary.
2. The book section abstract generating method based on book reading behavior according to claim 1, characterized in that said step 1) is:
2.1 the behavior of a user reading a given page is divided into four levels, namely the browsing level, the bookmarking level, the shallow reading level and the deep reading level, and different levels make different score contributions to the page;
2.2 retention rate, churn rate and exponential score decay are used to measure how difficult it is for reading behavior to reach a given level, and the page is scored accordingly; the page user retention rate is, for a given page, the ratio of the number of users retained at the bookmarking, shallow reading and deep reading levels to the number of users at the browsing level, and the page churn rate is the ratio of the number of users lost at a given level to the number of users retained at the previous level;
the scoring formula based on user reading behavior is established as:
V_i = [(p_i + q_i)/p_i] · exp(1 - p_i),  i = 1, 2, 3, 4
page user retention rate formula:
p_i = U_i/U_1,  i = 1, 2, 3, 4
page churn rate formula:
q_i = (U_{i-1} - U_i)/U_{i-1},  i = 2, 3, 4
where V_i is the score contribution of step i of the whole user group's reading behavior to the page, p_i is the retention rate of step i relative to the browsing level, q_i is the churn rate of step i relative to step i-1, and U_i is the number of users who proceed to step i;
2.3 the times at which users access a page are ordered, and a user who accesses and scores a page earlier contributes more to that page; the importance of a page is calculated from the scores of its key behavior nodes, and the comprehensive importance score s_j of page j is computed from the quantities defined below:
W_uj is the contribution weight of user u to page j; T_j is the sum of the times at which page j was accessed; t_uj is the time at which user u first accessed page j; t_j is the time at which page j was first accessed; S_uj is the sum of the scores of the key behavior steps reached by user u on page j; V_ij is the score of the i-th key behavior step reached by user u on page j; and L is the reading depth, i.e. the number of key behavior steps, reached by user u on page j;
2.4 the above scoring method yields a quantified importance score for every page of the book; because reading groups differ between books, and to avoid a page receiving an inflated score simply because only a few users visited it, the number of visiting users and the score are normalized in the actual page evaluation, giving the final comprehensive page score PageScore_j, where u_j is the number of users who browsed page j and s_j is the score of page j; by comparison against mean values, the comprehensive score is high only when the number of users browsing the page and the readers' scores for the page are both high; according to the characteristics of user reading behavior in book reading, a page importance evaluation system based on user reading behavior is established: user behavior is quantified through the four reading levels of a page, the score contributions of the four levels describe how difficult it is for a user to progress from the browsing level to the deep reading level, and the reading behavior of the whole user group on a page finally quantifies the importance of that page.
3. The book section abstract generating method based on book reading behavior according to claim 1, characterized in that said step 2) is:
3.1 step 1) provides the score PageScore_j of page j, which reflects the importance of page j within the book; at the same time, an underlined sentence has a relative importance within its page, so the importance of a sentence is related to the score of its page; w_i denotes the current weight of sentence v_i; the set of sentences of the document is V = {v_1, …, v_n}, where v_i is the i-th sentence of V; the sentences underlined with a straight line by users are placed at the front of the set, and, assuming the first k sentences are the ones users have underlined, the weights of the remaining sentences are obtained from their relation to the first k sentences;
3.2 let dis: V × V → R denote a distance metric on the set V, so that the distance dis(v_i, v_j) between every pair of sentences v_i and v_j can be obtained; let the mapping f: V → R be the ranking function that assigns a weight f_i to each sentence v_i, and write the vectors f = [f_1, …, f_n]^T and w = [w_1, …, w_n]^T, where w_i ≠ 0 if sentence v_i has been underlined and w_i = 0 otherwise; w_i is the initial weight of each sentence;
3.3 the weight propagation algorithm on the data manifold structure is expressed as follows:
Step 1: compute the pairwise distances dis(v_i, v_j) between sentence vectors and sort them in ascending order; following this ascending list, connect an edge between the corresponding pairs of sentence nodes until a connected graph is obtained;
Step 2: define the affinity matrix W such that W_ij = exp[-dis^2(v_i, v_j)/(2σ^2)] if there is an edge between the points corresponding to sentence vectors v_i and v_j, W_ij = 0 if there is no such edge, and W_ii = 0;
Step 3: symmetrically normalize W to obtain the matrix S = D^{-1/2} W D^{-1/2}, where D is the diagonal matrix whose diagonal element D_ii is the sum of the i-th row of W;
Step 4: iterate f(t+1) = αSf(t) + (1 - α)w until convergence, where α is a parameter with value in [0, 1);
Step 5: let f_i* denote the limit of the sequence {f_i(t)}; the limit of the sentence weights is the sentence weight vector f* = [f_1*, …, f_n*]^T;
3.4 in Step 4, the parameter α balances the weight contribution of a node's neighbours against its initial weight; because the matrix S is symmetric, the weight propagation process is symmetric; the limit of the sequence {f(t)} is computed as f* = (I - αS)^{-1}w; after the propagation, every sentence of the book section has a reasonable weight.
4. The book section abstract generating method based on book reading behavior according to claim 1, characterized in that said step 3) is:
4.1 the weight f_i* of each sentence v_i of the book section is obtained; f_i* reflects the importance of sentence v_i within the section; the n weights f_i* are arranged as the diagonal elements of a matrix F, i.e. F_ii = f_i*, giving the diagonal matrix F, which is incorporated into the document summarization algorithm based on data reconstruction;
4.2 in the summary generation process, the objective function of the linear nonnegative data reconstruction algorithm is redefined as
J(A, β) = Tr[F(V - AV)(V - AV)^T + diag(β)^{-1}A^T A] + γ||β||_1
s.t. β_j ≥ 0, a_ij ≥ 0, and a_i ∈ R^n,
where V is the matrix whose rows are the sentence vectors and A = [a_ij] is the reconstruction coefficient matrix, so that the selection of every sentence takes the weight f_i* of sentence v_i into account; a_ij ≥ 0 means that the method only allows additive combinations of sentences in the candidate space and no subtraction; β = [β_1, β_2, …, β_n]^T is an auxiliary variable; if β_j = 0 then a_1j, …, a_nj are all 0, meaning that the candidate sentence of column j is not selected; and γ is the regularization parameter;
4.3 the objective function of the data-reconstruction summarization algorithm is a convex optimization problem, so a globally optimal solution is guaranteed; with a_i fixed, setting the derivative of J with respect to β to 0 gives the minimizing solution β_j = ||a_{·j}||_2/√γ, where a_{·j} denotes the j-th column of A; once the minimizing β has been obtained, the minimization problem under the nonnegativity constraints is solved with the Lagrangian method;
4.4 let α_ij be the Lagrange multiplier for the constraint a_ij ≥ 0, with A = [a_ij]; the Lagrangian L is
L = J + Tr[αA^T] = Tr[F(V - AV)(V - AV)^T + diag(β)^{-1}A^T A] + γ||β||_1 + Tr[αA^T],  α = [α_ij],
where F is the diagonal matrix of step 4.1, whose diagonal elements are f_1*, …, f_n*, and diag(β) is the diagonal matrix whose diagonal elements are β_1, …, β_n;
4.5 differentiating the Lagrangian L with respect to A and setting the derivative to 0 yields
α = 2FVV^T - 2FAVV^T - 2A diag(β)^{-1};
by the Karush-Kuhn-Tucker condition α_ij a_ij = 0, multiplying each term of the above expression by a_ij gives the equation
(FVV^T)_ij a_ij - (FAVV^T)_ij a_ij - (A diag(β)^{-1})_ij a_ij = 0,
from which the following update formula is obtained:
a_ij ← a_ij · (FVV^T)_ij / [(FAVV^T)_ij + (A diag(β)^{-1})_ij];
the update formula is iterated until convergence, and the summary sentences of the book section are finally obtained.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410090143.6A CN103885935B (en) | 2014-03-12 | 2014-03-12 | Book section abstract generating method based on book reading behavior |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410090143.6A CN103885935B (en) | 2014-03-12 | 2014-03-12 | Book section abstract generating method based on book reading behavior |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103885935A true CN103885935A (en) | 2014-06-25 |
CN103885935B CN103885935B (en) | 2016-06-29 |
Family
ID=50954830
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410090143.6A Active CN103885935B (en) | 2014-03-12 | 2014-03-12 | Book section abstract generating method based on book reading behavior |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103885935B (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020138528A1 (en) * | 2000-12-12 | 2002-09-26 | Yihong Gong | Text summarization using relevance measures and latent semantic analysis |
CN1614585A (en) * | 2003-11-07 | 2005-05-11 | 摩托罗拉公司 | Context Generality |
CN102841940A (en) * | 2012-08-17 | 2012-12-26 | 浙江大学 | Document summary extracting method based on data reconstruction |
Non-Patent Citations (3)
Title |
---|
HE, Zhanying et al.: "Document Summarization Based on Data Reconstruction", Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence * |
ZHANG, Zhiming et al.: "Web-Age Information Management", 16 June 2013, Verlag Berlin Heidelberg * |
QIAO, Shaojie et al.: "A comprehensive web page scoring method based on centrality and PageRank", Journal of Southwest Jiaotong University * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI549003B (en) * | 2014-08-18 | 2016-09-11 | 葆光資訊有限公司 | Method for automatic sections division |
CN106469176A (en) * | 2015-08-20 | 2017-03-01 | 百度在线网络技术(北京)有限公司 | A kind of method and apparatus for extracting text snippet |
CN106469176B (en) * | 2015-08-20 | 2019-08-16 | 百度在线网络技术(北京)有限公司 | It is a kind of for extracting the method and apparatus of text snippet |
US10929452B2 (en) | 2017-05-23 | 2021-02-23 | Huawei Technologies Co., Ltd. | Multi-document summary generation method and apparatus, and terminal |
CN107608972A (en) * | 2017-10-24 | 2018-01-19 | 河海大学 | A kind of more text quick abstract methods |
CN108231064A (en) * | 2018-01-02 | 2018-06-29 | 联想(北京)有限公司 | A kind of data processing method and system |
CN109241863A (en) * | 2018-08-14 | 2019-01-18 | 北京万维之道信息技术有限公司 | For splitting the data processing method and device of reading content |
CN111199151A (en) * | 2019-12-31 | 2020-05-26 | 联想(北京)有限公司 | Data processing method and data processing device |
CN115048507A (en) * | 2022-05-24 | 2022-09-13 | 维沃移动通信有限公司 | Abstract generation method and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN103885935B (en) | 2016-06-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |