CN103885935A - Book section abstract generating method based on book reading behaviors - Google Patents
- Publication number
- CN103885935A (application CN201410090143.6A)
- Authority
- CN
- China
- Prior art keywords
- sentence
- page
- books
- user
- book
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a book section abstract generating method based on book reading behaviors. The technique is essentially a document summary generation technique in which the user's reading behavior is incorporated into document summary generation and applied to engineering, science and education book resources. The method calculates the weight of each page of a book section with a quantified page-level reading behavior scoring mechanism, splits the section into sentences, computes the similarity between sentences from their distances, propagates the existing sentence weights along the manifold structure of the data, and finally, based on the idea of data reconstruction, selects the sentences that best represent the section content as the section summary. The users' reading behavior is collected and used to evaluate the importance of pages, and the corresponding section summary is obtained based on the idea of data reconstruction, helping users quickly grasp the content of a section and improving book reading efficiency.
Description
Technical field
The present invention relates to document summary generation methods, and in particular to a book section abstract generating method based on book reading behavior.
Background technology
With the continuous development of digital libraries, users hope to understand the content of a book section quickly and accurately before reading it, and therefore urgently expect digital libraries to provide a book section summary service.
Book section summary generation is in essence a document summary generation method based on reading behavior: user reading behavior is modeled, and the reading factors derived from the behavior model are incorporated into the document summary generation algorithm, so that the resulting summary is influenced by how users actually read. If a traditional document summary generation method were used directly, the section summary might not express the section content accurately from the readers' point of view and thus would not meet users' needs.
In traditional reading, the object being read is simply a fixed sequence of linguistic symbols. From the beginning to the end of reading, the reader acquires knowledge only from the textual content, detached from any social interaction. With the emergence of social network reading, the whole process from starting to finishing a text becomes partly or entirely connected to a social network. In such an interconnected community, readers' reading behavior itself becomes an object worth observing and studying.
Social reading is a new reading mode that takes content as its core, takes social relationships as its ties, and emphasizes sharing, interaction and dissemination. While reading, users can interact with users who share the same interests; after reading, they can connect with the crowd that has read the same content and even form communities around common topics, so that sharing, interaction and dissemination run through the whole reading process. These interactions produce a large amount of new and valuable content, such as comments, summaries, notes, and association or cross-reference information.
The basic summarization algorithm used for book section summary generation is the document summarization algorithm based on data reconstruction (DSDR). DSDR is an extractive method built on one principle: a good summary should reconstruct the original document as faithfully as possible, so that the selected sentences cover as much of the content of the whole document as possible.
On top of the data-reconstruction summarization algorithm, the various actions users perform during social reading are taken into account, such as reading time and the sentences users underline as important; underlined sentences are generally considered more representative and are given a higher weight than sentences that have not been underlined.
Summary of the invention
The object of the present invention is to provide a book section abstract generating method based on book reading behavior, which produces section summaries that help users quickly understand the content of a book section.
The technical solution adopted by the present invention to solve its technical problem is as follows:
The steps of the book section abstract generating method based on book reading behavior are as follows:
1) Build a quantified page-level reading behavior score: user reading behavior is divided, from shallow to deep, into four levels of reading depth, namely the browsing level, the bookmarking level, the shallow reading level and the deep reading level, and a page score based on user reading behavior is obtained from these four levels;
2) Sentence weight propagation: the page score based on user reading behavior from step 1) gives the quantified page scores; the book section is split into sentences, each sentence is given an initial weight derived from the quantified score of its page, and, based on the distances between sentences, a ranking algorithm on the data manifold structure is used to propagate the sentence weights;
3) Book section summary generation: after the sentence weights have been propagated, they are incorporated into the document summarization algorithm based on data reconstruction, and the important sentences selected from the book section form the section summary.
Said step 1) is:
2.1 The behavior of a user reading a given page is divided into four levels, namely the browsing level, the bookmarking level, the shallow reading level and the deep reading level; different levels make different score contributions to the page;
2.2 Retention rate, churn rate and exponential score decay are used to measure how difficult it is for reading behavior to reach a given level, and the page is scored accordingly. The page user retention rate is, for a given page, the ratio of the number of users retained at the bookmarking, shallow reading and deep reading levels to the number of users at the browsing level; the page churn rate is the ratio of the number of users lost at a given level to the number of users retained at the previous level.
The scoring formula based on user reading behavior is established as:
V_i = [(p_i + q_i)/p_i] · exp(1 - p_i),  i = 1, 2, 3, 4
Page user retention rate formula:
p_i = U_i/U_1,  i = 1, 2, 3, 4
Page churn rate formula:
q_i = (U_{i-1} - U_i)/U_{i-1},  i = 2, 3, 4
where V_i is the score contribution of step i of the whole user group's reading behavior to the page; p_i is the retention rate of step i relative to the browsing level; q_i is the churn rate of step i relative to step i-1; and U_i is the number of users who proceed to step i;
2.3 The times at which users access a page are ordered: a user who accesses and scores a page earlier contributes more to that page. The importance of a page can be calculated from the scores of its key behavior nodes; the comprehensive importance score s_j of page j is computed from the quantities defined below:
W_uj is the contribution weight of user u to page j; T_j is the sum of the times at which page j was accessed; t_uj is the time at which user u first accessed page j; t_j is the time at which page j was first accessed; S_uj is the sum of the scores of the key behavior steps reached by user u on page j; V_ij is the score of the i-th key behavior step reached by user u on page j; and L is the reading depth, i.e. the number of key behavior steps, reached by user u on page j;
2.4 The above scoring method yields a quantified importance score for every page of the book. Because reading groups differ between books, and to avoid a page receiving an inflated score simply because only a few users visited it, the number of visiting users and the score are normalized in the actual page evaluation, giving the final comprehensive page score PageScore_j, where u_j is the number of users who browsed page j and s_j is the score of page j. By comparing both quantities against their mean values, the comprehensive score is high only when the number of users browsing the page and the readers' scores for the page are both high. According to the characteristics of user reading behavior in book reading, a page importance evaluation system based on user reading behavior is thus established: user behavior is quantified through the four reading levels of a page, the score contributions of the four levels describe how difficult it is for a user to progress from the browsing level to the deep reading level, and the reading behavior of the whole user group on a page finally quantifies the importance of that page.
Said step 2) is:
3.1 Step 1) provides the score PageScore_j of page j, which reflects the importance of page j within the book; at the same time, an underlined sentence has a relative importance within its page, so the importance of a sentence is related to the score of its page. In the formulas below, w_i denotes the current weight of sentence v_i. Let the set of sentences of the document be V = {v_1, …, v_n}, where v_i is the i-th sentence of V. The sentences underlined with a straight line by users are placed at the front of the set; assuming the first k sentences are the ones users have underlined, the weights of the remaining sentences are obtained from their relation to the first k sentences;
3.2 Let dis: V × V → R denote a distance metric on the set V, so that the distance dis(v_i, v_j) between every pair of sentences v_i and v_j can be obtained. Let the mapping f: V → R be the ranking function that assigns a weight f_i to each sentence v_i, and write the vectors f = [f_1, …, f_n]^T and w = [w_1, …, w_n]^T, where w_i ≠ 0 if sentence v_i has been underlined and w_i = 0 otherwise; w_i is the initial weight of each sentence;
3.3 The weight propagation algorithm on the data manifold structure is expressed as follows:
Step 1: compute the pairwise distances dis(v_i, v_j) between sentence vectors and sort them in ascending order; following this ascending list, connect an edge between the corresponding pairs of sentence nodes until a connected graph is obtained;
Step 2: define the affinity matrix W such that W_ij = exp[-dis^2(v_i, v_j)/(2σ^2)] if there is an edge between the points corresponding to sentence vectors v_i and v_j, W_ij = 0 if there is no such edge, and W_ii = 0;
Step 3: symmetrically normalize W to obtain the matrix S = D^{-1/2} W D^{-1/2}, where D is the diagonal matrix whose diagonal element D_ii is the sum of the i-th row of W;
Step 4: iterate f(t+1) = αSf(t) + (1 - α)w until convergence, where α is a parameter with value in [0, 1);
Step 5: let f_i* denote the limit of the sequence {f_i(t)}; the limit of the sentence weights is the sentence weight vector f* = [f_1*, …, f_n*]^T.
3.4 In Step 4, the parameter α balances the weight contribution of a node's neighbours against its initial weight. Because the matrix S is symmetric, the weight propagation process is symmetric. The limit of the sequence {f(t)} can be computed directly as f* = (I - αS)^{-1}w. After the propagation, every sentence of the book section has a reasonable weight.
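A minimal Python sketch of the weight propagation of steps 3.3 and 3.4 is given below (an illustration only; it uses a fully connected affinity graph instead of the edge-by-edge construction of Step 1, which is a simplifying assumption, and the function and parameter names are not from the patent):

```python
import numpy as np

def propagate_sentence_weights(X, w, alpha=0.85, sigma=1.0):
    """Propagate initial sentence weights w over the sentence graph.

    X     : (n, d) array of sentence vectors.
    w     : (n,) initial weights (nonzero for underlined sentences, 0 otherwise).
    alpha : parameter in [0, 1) trading off neighbours' weights and initial weights.
    sigma : bandwidth of the Gaussian affinity W_ij = exp(-dis^2 / (2 sigma^2)).
    """
    diff = X[:, None, :] - X[None, :, :]
    dist2 = (diff ** 2).sum(-1)                    # squared distances dis^2(v_i, v_j)

    W = np.exp(-dist2 / (2.0 * sigma ** 2))        # affinity matrix
    np.fill_diagonal(W, 0.0)                       # W_ii = 0

    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    S = D_inv_sqrt @ W @ D_inv_sqrt                # S = D^{-1/2} W D^{-1/2}

    # Closed-form limit of f(t+1) = alpha*S*f(t) + (1-alpha)*w, as in step 3.4:
    # f* = (I - alpha*S)^{-1} w  (a constant (1-alpha) factor does not change
    # the ranking of sentences).
    return np.linalg.solve(np.eye(len(w)) - alpha * S, w)

# Example: 5 sentence vectors in a 3-dimensional term space; sentence 0 underlined.
X = np.random.rand(5, 3)
w = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
print(propagate_sentence_weights(X, w))
```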
Said step 3) is:
4.1 The weight f_i* of each sentence v_i of the book section has been obtained; f_i* reflects the importance of sentence v_i within the section. The n weights f_i* are arranged as the diagonal elements of a matrix F, i.e. F_ii = f_i*, giving the diagonal matrix F, which is then incorporated into the document summarization algorithm based on data reconstruction;
4.2 In the summary generation process, the objective function of the linear nonnegative data reconstruction algorithm is redefined as
J(A, β) = Tr[F(V - AV)(V - AV)^T + diag(β)^{-1}A^T A] + γ||β||_1
s.t. β_j ≥ 0, a_ij ≥ 0, and a_i ∈ R^n,
where V is the matrix whose rows are the sentence vectors and A = [a_ij] is the reconstruction coefficient matrix. In this way the selection of every sentence takes the weight f_i* of sentence v_i into account; a_ij ≥ 0 means that the method only allows additive combinations of sentences in the candidate space and no subtraction; β = [β_1, β_2, …, β_n]^T is an auxiliary variable; if β_j = 0 then a_1j, …, a_nj are all 0, meaning that the candidate sentence of column j is not selected; and γ is the regularization parameter;
4.3 The objective function of the data-reconstruction summarization algorithm is a convex optimization problem, so a globally optimal solution is guaranteed. With a_i fixed, setting the derivative of J with respect to β to 0 gives the minimizing solution β_j = ||a_{·j}||_2/√γ, where a_{·j} denotes the j-th column of A. Once the minimizing β has been obtained, the minimization problem under the nonnegativity constraints can be solved with the Lagrangian method;
4.4 Let α_ij be the Lagrange multiplier for the constraint a_ij ≥ 0, with A = [a_ij]; the Lagrangian L is
L = J + Tr[αA^T] = Tr[F(V - AV)(V - AV)^T + diag(β)^{-1}A^T A] + γ||β||_1 + Tr[αA^T],  α = [α_ij],
where F is the diagonal matrix of step 4.1, whose diagonal elements are f_1*, …, f_n*, and diag(β) is the diagonal matrix whose diagonal elements are β_1, …, β_n;
4.5 Differentiating the Lagrangian L with respect to A gives
∂L/∂A = -2FVV^T + 2FAVV^T + 2A diag(β)^{-1} + α;
setting this derivative to 0 yields
α = 2FVV^T - 2FAVV^T - 2A diag(β)^{-1}.
By the Karush-Kuhn-Tucker condition α_ij a_ij = 0, multiplying each term of the above expression by a_ij gives the equation
(FVV^T)_ij a_ij - (FAVV^T)_ij a_ij - (A diag(β)^{-1})_ij a_ij = 0,
from which the following update formula is obtained:
a_ij ← a_ij · (FVV^T)_ij / [(FAVV^T)_ij + (A diag(β)^{-1})_ij].
The update formula is iterated until convergence, and the summary sentences of the book section are finally obtained.
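The multiplicative update of step 4.5 can be sketched in Python as follows (an illustration only: the closed-form β_j and the sentence-selection rule are reconstructions under the assumptions stated in the comments, and the function and parameter names are not from the patent):

```python
import numpy as np

def weighted_dsdr_summary(V, f_star, gamma=0.1, n_iters=200, n_summary=3, eps=1e-12):
    """Weighted data-reconstruction summarization (sketch of step 3).

    V         : (n, d) nonnegative sentence-vector matrix (rows are sentences).
    f_star    : (n,) propagated sentence weights, used as the diagonal of F.
    gamma     : regularization parameter of the beta term.
    n_summary : number of summary sentences to return.

    Assumptions: beta_j = ||A_{.j}||_2 / sqrt(gamma) from setting dJ/dbeta = 0,
    and the summary sentences are the columns of A with the largest norm, i.e.
    the candidates that contribute most to reconstructing the whole section.
    """
    n = V.shape[0]
    F = np.diag(f_star)
    VVt = V @ V.T
    A = np.full((n, n), 1.0 / n)                   # nonnegative coefficients a_ij

    for _ in range(n_iters):
        beta = np.linalg.norm(A, axis=0) / np.sqrt(gamma) + eps
        numer = F @ VVt                            # (F V V^T)_ij
        denom = F @ A @ VVt + A @ np.diag(1.0 / beta) + eps
        A *= numer / denom                         # a_ij <- a_ij * numer / denom

    scores = np.linalg.norm(A, axis=0)
    return np.argsort(-scores)[:n_summary]

# Example: 6 sentences in a 4-dimensional nonnegative term space, uniform weights.
V = np.abs(np.random.rand(6, 4))
print(weighted_dsdr_summary(V, np.ones(6), n_summary=2))
```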
Compared with the prior art, the method of the invention has the following beneficial effects:
1. The method combines user reading behavior modeling with document summary generation and applies the data-reconstruction summarization algorithm to book section summary generation, obtaining summary information for book sections;
2. The method analyzes and models user reading behavior based on the idea of reading depth, divides reading behavior into levels, and finally provides a comprehensive page scoring system in which the score represents the importance of a page;
3. The method takes the sentences of a book section as units, propagates weights on the data manifold space according to the existing sentence weights, and finally obtains a reasonable weight for each sentence, so that user behavior is reflected more accurately.
Brief description of the drawings
Fig. 1 is a system architecture diagram of the book section abstract generating method based on book reading behavior;
Fig. 2 is a block diagram of the sentence weight propagation method of the present invention;
Fig. 3 is the book catalogue of the embodiment of the present invention;
Fig. 4 is a schematic diagram of the first section in the embodiment of the present invention;
Fig. 5 shows the section summary generation result of the embodiment of the present invention.
Embodiment
As shown in Fig. 1 and Fig. 2, the steps of the book section abstract generating method based on book reading behavior are as follows:
1) Build a quantified page-level reading behavior score: user reading behavior is divided, from shallow to deep, into four levels of reading depth, namely the browsing level, the bookmarking level, the shallow reading level and the deep reading level, and a page score based on user reading behavior is obtained from these four levels;
2) Sentence weight propagation: the page score based on user reading behavior from step 1) gives the quantified page scores; the book section is split into sentences, each sentence is given an initial weight derived from the quantified score of its page, and, based on the distances between sentences, a ranking algorithm on the data manifold structure is used to propagate the sentence weights;
3) Book section summary generation: after the sentence weights have been propagated, they are incorporated into the document summarization algorithm based on data reconstruction, and the important sentences selected from the book section form the section summary.
Said step 1) is:
2.1 The behavior of a user reading a given page is divided into four levels, namely the browsing level, the bookmarking level, the shallow reading level and the deep reading level; different levels make different score contributions to the page;
2.2 Retention rate, churn rate and exponential score decay are used to measure how difficult it is for reading behavior to reach a given level, and the page is scored accordingly. The score decays exponentially with the retention rate: the score of a given step depends on the churn rate of the previous step and also on the retention rate of the initial stage. The page user retention rate and churn rate are defined first: the page user retention rate is, for a given page, the ratio of the number of users retained at the bookmarking, shallow reading and deep reading levels to the number of users at the browsing level; the page churn rate is the ratio of the number of users lost at a given level to the number of users retained at the previous level.
The scoring formula based on user reading behavior is established as:
V_i = [(p_i + q_i)/p_i] · exp(1 - p_i),  i = 1, 2, 3, 4
Page user retention rate formula:
p_i = U_i/U_1,  i = 1, 2, 3, 4
Page churn rate formula:
q_i = (U_{i-1} - U_i)/U_{i-1},  i = 2, 3, 4
where V_i is the score contribution of step i of the whole user group's reading behavior to the page; p_i is the retention rate of step i relative to the browsing level; q_i is the churn rate of step i relative to step i-1; and U_i is the number of users who proceed to step i;
2.3 The times at which users access a page are ordered: a user who accesses and scores a page earlier contributes more to that page; for example, if the first visiting user reads a page in depth, that page is relatively more important. The importance of a page can be calculated from the scores of its key behavior nodes; the comprehensive importance score s_j of page j is computed from the quantities defined below:
W_uj is the contribution weight of user u to page j; T_j is the sum of the times at which page j was accessed; t_uj is the time at which user u first accessed page j; t_j is the time at which page j was first accessed; S_uj is the sum of the scores of the key behavior steps reached by user u on page j; V_ij is the score of the i-th key behavior step reached by user u on page j; and L is the reading depth, i.e. the number of key behavior steps, reached by user u on page j;
2.4 The above scoring method yields a quantified importance score for every page of the book. Because reading groups differ between books, and to avoid a page receiving an inflated score simply because only a few users visited it, the number of visiting users and the score are normalized in the actual page evaluation, giving the final comprehensive page score PageScore_j, where u_j is the number of users who browsed page j and s_j is the score of page j. By comparing both quantities against their mean values, the comprehensive score is high only when the number of users browsing the page and the readers' scores for the page are both high. According to the characteristics of user reading behavior in book reading, a page importance evaluation system based on user reading behavior is thus established: user behavior is quantified through the four reading levels of a page, the score contributions of the four levels describe how difficult it is for a user to progress from the browsing level to the deep reading level, and the reading behavior of the whole user group on a page finally quantifies the importance of that page.
Said step 2) is:
3.1 Step 1) provides the score PageScore_j of page j, which reflects the importance of page j within the book; at the same time, an underlined sentence has a relative importance within its page, so the importance of a sentence is related to the score of its page. In the formulas below, w_i denotes the current weight of sentence v_i. Let the set of sentences of the document be V = {v_1, …, v_n}, where v_i is the i-th sentence of V. The sentences underlined with a straight line by users are placed at the front of the set; assuming the first k sentences are the ones users have underlined, the weights of the remaining sentences are obtained from their relation to the first k sentences;
3.2 Let dis: V × V → R denote a distance metric on the set V, so that the distance dis(v_i, v_j) between every pair of sentences v_i and v_j can be obtained. Let the mapping f: V → R be the ranking function that assigns a weight f_i to each sentence v_i, and write the vectors f = [f_1, …, f_n]^T and w = [w_1, …, w_n]^T, where w_i ≠ 0 if sentence v_i has been underlined and w_i = 0 otherwise; w_i is the initial weight of each sentence;
3.3 The weight propagation algorithm on the data manifold structure is expressed as follows:
Step 1: compute the pairwise distances dis(v_i, v_j) between sentence vectors and sort them in ascending order; following this ascending list, connect an edge between the corresponding pairs of sentence nodes until a connected graph is obtained;
Step 2: define the affinity matrix W such that W_ij = exp[-dis^2(v_i, v_j)/(2σ^2)] if there is an edge between the points corresponding to sentence vectors v_i and v_j, W_ij = 0 if there is no such edge, and W_ii = 0;
Step 3: symmetrically normalize W to obtain the matrix S = D^{-1/2} W D^{-1/2}, where D is the diagonal matrix whose diagonal element D_ii is the sum of the i-th row of W;
Step 4: iterate f(t+1) = αSf(t) + (1 - α)w until convergence, where α is a parameter with value in [0, 1);
Step 5: let f_i* denote the limit of the sequence {f_i(t)}; the limit of the sentence weights is the sentence weight vector f* = [f_1*, …, f_n*]^T.
3.4 In Step 4, the parameter α balances the weight contribution of a node's neighbours against its initial weight. Because the matrix S is symmetric, the weight propagation process is symmetric. The limit of the sequence {f(t)} can be computed directly as f* = (I - αS)^{-1}w. After the propagation, every sentence of the book section has a reasonable weight.
Said step 3) is:
4.1 The weight f_i* of each sentence v_i of the book section has been obtained; f_i* reflects the importance of sentence v_i within the section. The n weights f_i* are arranged as the diagonal elements of a matrix F, i.e. F_ii = f_i*, giving the diagonal matrix F, which is then incorporated into the document summarization algorithm based on data reconstruction;
4.2 In the summary generation process, the objective function of the linear nonnegative data reconstruction algorithm is redefined as
J(A, β) = Tr[F(V - AV)(V - AV)^T + diag(β)^{-1}A^T A] + γ||β||_1
s.t. β_j ≥ 0, a_ij ≥ 0, and a_i ∈ R^n,
where V is the matrix whose rows are the sentence vectors and A = [a_ij] is the reconstruction coefficient matrix. In this way the selection of every sentence takes the weight f_i* of sentence v_i into account; a_ij ≥ 0 means that the method only allows additive combinations of sentences in the candidate space and no subtraction; β = [β_1, β_2, …, β_n]^T is an auxiliary variable; if β_j = 0 then a_1j, …, a_nj are all 0, meaning that the candidate sentence of column j is not selected; and γ is the regularization parameter;
4.3 The objective function of the data-reconstruction summarization algorithm is a convex optimization problem, so a globally optimal solution is guaranteed. With a_i fixed, setting the derivative of J with respect to β to 0 gives the minimizing solution β_j = ||a_{·j}||_2/√γ, where a_{·j} denotes the j-th column of A. Once the minimizing β has been obtained, the minimization problem under the nonnegativity constraints can be solved with the Lagrangian method;
4.4 Let α_ij be the Lagrange multiplier for the constraint a_ij ≥ 0, with A = [a_ij]; the Lagrangian L is
L = J + Tr[αA^T] = Tr[F(V - AV)(V - AV)^T + diag(β)^{-1}A^T A] + γ||β||_1 + Tr[αA^T],  α = [α_ij],
where F is the diagonal matrix of step 4.1, whose diagonal elements are f_1*, …, f_n*, and diag(β) is the diagonal matrix whose diagonal elements are β_1, …, β_n;
4.5 Differentiating the Lagrangian L with respect to A and setting the derivative to 0 yields
α = 2FVV^T - 2FAVV^T - 2A diag(β)^{-1}.
By the Karush-Kuhn-Tucker condition α_ij a_ij = 0, multiplying each term of the above expression by a_ij gives the equation
(FVV^T)_ij a_ij - (FAVV^T)_ij a_ij - (A diag(β)^{-1})_ij a_ij = 0,
from which the following update formula is obtained:
a_ij ← a_ij · (FVV^T)_ij / [(FAVV^T)_ij + (A diag(β)^{-1})_ij].
The update formula is iterated until convergence, and the summary sentences of the book section are finally obtained.
Embodiment
As shown in Figs. 3 to 5, an application example of the book section abstract generating method is given. The concrete steps of this example are described in detail below:
(1) The system preprocesses all book sections to obtain the text content of each section. Suppose a user is reading the first section, "Definition", of Chapter 1, "Introduction to Distributed Computing", of the book "Principles and Applications of Distributed Computing" and wants to see the summary of this section. The user clicks the catalogue button and double-clicks the corresponding section, and the system first obtains data such as the text of the section and the users' reading behavior.
(2) The type and level of the users' reading of this section are analyzed from the reading behavior data, and the quantified importance score of each page is obtained with the comprehensive page scoring formula.
(3) The text of the section is split into sentences, and the users' underlining behavior is combined with the quantified page scores to obtain the initial weights of the underlined sentences.
(4) Each sentence is segmented into words and stop words are removed; each sentence is then represented as a vector in a high-dimensional space, and the pairwise similarity between sentences is obtained from the distances between the vectors (see the sketch after this list).
(5) The initial sentence weights are propagated with the ranking method on the data manifold space, and a reasonable weight is finally obtained for each sentence.
(6) The sentence weight matrix F is added to the document summarization algorithm based on data reconstruction, and the algorithm is executed until convergence, selecting a number of sentences (depending on the section length) from the section as its summary, which is finally returned to the user.
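A minimal Python sketch of steps (3) and (4) above is given below (an illustration only; the regular-expression tokenizer and the tiny stop-word list are placeholder assumptions, not the preprocessing actually used in the embodiment). The resulting vectors and distance matrix then feed the weight propagation of step (5) and the reconstruction of step (6):

```python
import re
import numpy as np

def sentence_vectors(section_text, stop_words=frozenset({"the", "a", "of", "and", "is"})):
    """Split a section into sentences, tokenize, drop stop words, build
    term-frequency vectors, and compute pairwise sentence distances."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", section_text) if s.strip()]
    tokenized = [[t for t in re.findall(r"\w+", s.lower()) if t not in stop_words]
                 for s in sentences]

    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: i for i, t in enumerate(vocab)}

    X = np.zeros((len(sentences), len(vocab)))
    for row, toks in enumerate(tokenized):
        for t in toks:
            X[row, index[t]] += 1.0                # term frequency

    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))            # dis(v_i, v_j)
    return sentences, X, dist

text = ("Distributed computing splits a task across machines. "
        "Each machine works on a part of the task. "
        "Results are combined into a final answer.")
sentences, X, dist = sentence_vectors(text)
print(len(sentences), X.shape, dist.shape)
```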
The running results of this example are shown in Figs. 3 to 5: while reading a book, the user can view the summary of the corresponding section through the catalogue, which helps the user understand the section content faster and in more detail. The book section abstract generating method therefore has good practical value and application prospects.
Claims (4)
1. A book section abstract generating method based on book reading behavior, characterized in that its steps are as follows:
1) building a quantified page-level reading behavior score: user reading behavior is divided, from shallow to deep, into four levels of reading depth, namely a browsing level, a bookmarking level, a shallow reading level and a deep reading level, and a page score based on user reading behavior is obtained from these four levels;
2) sentence weight propagation: the page score based on user reading behavior from step 1) gives the quantified page scores; the book section is split into sentences, each sentence is given an initial weight derived from the quantified score of its page, and, based on the distances between sentences, a ranking algorithm on the data manifold structure is used to propagate the sentence weights;
3) book section summary generation: after the sentence weights have been propagated, they are incorporated into the document summarization algorithm based on data reconstruction, and the important sentences selected from the book section form the section summary.
2. The book section abstract generating method based on book reading behavior according to claim 1, characterized in that said step 1) is:
2.1 the behavior of a user reading a given page is divided into four levels, namely the browsing level, the bookmarking level, the shallow reading level and the deep reading level, and different levels make different score contributions to the page;
2.2 retention rate, churn rate and exponential score decay are used to measure how difficult it is for reading behavior to reach a given level, and the page is scored accordingly; the page user retention rate is, for a given page, the ratio of the number of users retained at the bookmarking, shallow reading and deep reading levels to the number of users at the browsing level, and the page churn rate is the ratio of the number of users lost at a given level to the number of users retained at the previous level;
the scoring formula based on user reading behavior is established as:
V_i = [(p_i + q_i)/p_i] · exp(1 - p_i),  i = 1, 2, 3, 4
page user retention rate formula:
p_i = U_i/U_1,  i = 1, 2, 3, 4
page churn rate formula:
q_i = (U_{i-1} - U_i)/U_{i-1},  i = 2, 3, 4
where V_i is the score contribution of step i of the whole user group's reading behavior to the page, p_i is the retention rate of step i relative to the browsing level, q_i is the churn rate of step i relative to step i-1, and U_i is the number of users who proceed to step i;
2.3 the times at which users access a page are ordered, and a user who accesses and scores a page earlier contributes more to that page; the importance of a page is calculated from the scores of its key behavior nodes, and the comprehensive importance score s_j of page j is computed from the quantities defined below:
W_uj is the contribution weight of user u to page j; T_j is the sum of the times at which page j was accessed; t_uj is the time at which user u first accessed page j; t_j is the time at which page j was first accessed; S_uj is the sum of the scores of the key behavior steps reached by user u on page j; V_ij is the score of the i-th key behavior step reached by user u on page j; and L is the reading depth, i.e. the number of key behavior steps, reached by user u on page j;
2.4 the above scoring method yields a quantified importance score for every page of the book; because reading groups differ between books, and to avoid a page receiving an inflated score simply because only a few users visited it, the number of visiting users and the score are normalized in the actual page evaluation, giving the final comprehensive page score PageScore_j, where u_j is the number of users who browsed page j and s_j is the score of page j; by comparison against mean values, the comprehensive score is high only when the number of users browsing the page and the readers' scores for the page are both high; according to the characteristics of user reading behavior in book reading, a page importance evaluation system based on user reading behavior is established: user behavior is quantified through the four reading levels of a page, the score contributions of the four levels describe how difficult it is for a user to progress from the browsing level to the deep reading level, and the reading behavior of the whole user group on a page finally quantifies the importance of that page.
3. The book section abstract generating method based on book reading behavior according to claim 1, characterized in that said step 2) is:
3.1 step 1) provides the score PageScore_j of page j, which reflects the importance of page j within the book; at the same time, an underlined sentence has a relative importance within its page, so the importance of a sentence is related to the score of its page; w_i denotes the current weight of sentence v_i; the set of sentences of the document is V = {v_1, …, v_n}, where v_i is the i-th sentence of V; the sentences underlined with a straight line by users are placed at the front of the set, and, assuming the first k sentences are the ones users have underlined, the weights of the remaining sentences are obtained from their relation to the first k sentences;
3.2 let dis: V × V → R denote a distance metric on the set V, so that the distance dis(v_i, v_j) between every pair of sentences v_i and v_j can be obtained; let the mapping f: V → R be the ranking function that assigns a weight f_i to each sentence v_i, and write the vectors f = [f_1, …, f_n]^T and w = [w_1, …, w_n]^T, where w_i ≠ 0 if sentence v_i has been underlined and w_i = 0 otherwise; w_i is the initial weight of each sentence;
3.3 the weight propagation algorithm on the data manifold structure is expressed as follows:
Step 1: compute the pairwise distances dis(v_i, v_j) between sentence vectors and sort them in ascending order; following this ascending list, connect an edge between the corresponding pairs of sentence nodes until a connected graph is obtained;
Step 2: define the affinity matrix W such that W_ij = exp[-dis^2(v_i, v_j)/(2σ^2)] if there is an edge between the points corresponding to sentence vectors v_i and v_j, W_ij = 0 if there is no such edge, and W_ii = 0;
Step 3: symmetrically normalize W to obtain the matrix S = D^{-1/2} W D^{-1/2}, where D is the diagonal matrix whose diagonal element D_ii is the sum of the i-th row of W;
Step 4: iterate f(t+1) = αSf(t) + (1 - α)w until convergence, where α is a parameter with value in [0, 1);
Step 5: let f_i* denote the limit of the sequence {f_i(t)}; the limit of the sentence weights is the sentence weight vector f* = [f_1*, …, f_n*]^T;
3.4 in Step 4, the parameter α balances the weight contribution of a node's neighbours against its initial weight; because the matrix S is symmetric, the weight propagation process is symmetric; the limit of the sequence {f(t)} is computed as f* = (I - αS)^{-1}w; after the propagation, every sentence of the book section has a reasonable weight.
4. The book section abstract generating method based on book reading behavior according to claim 1, characterized in that said step 3) is:
4.1 the weight f_i* of each sentence v_i of the book section is obtained; f_i* reflects the importance of sentence v_i within the section; the n weights f_i* are arranged as the diagonal elements of a matrix F, i.e. F_ii = f_i*, giving the diagonal matrix F, which is incorporated into the document summarization algorithm based on data reconstruction;
4.2 in the summary generation process, the objective function of the linear nonnegative data reconstruction algorithm is redefined as
J(A, β) = Tr[F(V - AV)(V - AV)^T + diag(β)^{-1}A^T A] + γ||β||_1
s.t. β_j ≥ 0, a_ij ≥ 0, and a_i ∈ R^n,
where V is the matrix whose rows are the sentence vectors and A = [a_ij] is the reconstruction coefficient matrix, so that the selection of every sentence takes the weight f_i* of sentence v_i into account; a_ij ≥ 0 means that the method only allows additive combinations of sentences in the candidate space and no subtraction; β = [β_1, β_2, …, β_n]^T is an auxiliary variable; if β_j = 0 then a_1j, …, a_nj are all 0, meaning that the candidate sentence of column j is not selected; and γ is the regularization parameter;
4.3 the objective function of the data-reconstruction summarization algorithm is a convex optimization problem, so a globally optimal solution is guaranteed; with a_i fixed, setting the derivative of J with respect to β to 0 gives the minimizing solution β_j = ||a_{·j}||_2/√γ, where a_{·j} denotes the j-th column of A; once the minimizing β has been obtained, the minimization problem under the nonnegativity constraints is solved with the Lagrangian method;
4.4 let α_ij be the Lagrange multiplier for the constraint a_ij ≥ 0, with A = [a_ij]; the Lagrangian L is
L = J + Tr[αA^T] = Tr[F(V - AV)(V - AV)^T + diag(β)^{-1}A^T A] + γ||β||_1 + Tr[αA^T],  α = [α_ij],
where F is the diagonal matrix of step 4.1, whose diagonal elements are f_1*, …, f_n*, and diag(β) is the diagonal matrix whose diagonal elements are β_1, …, β_n;
4.5 differentiating the Lagrangian L with respect to A and setting the derivative to 0 yields
α = 2FVV^T - 2FAVV^T - 2A diag(β)^{-1};
by the Karush-Kuhn-Tucker condition α_ij a_ij = 0, multiplying each term of the above expression by a_ij gives the equation
(FVV^T)_ij a_ij - (FAVV^T)_ij a_ij - (A diag(β)^{-1})_ij a_ij = 0,
from which the following update formula is obtained:
a_ij ← a_ij · (FVV^T)_ij / [(FAVV^T)_ij + (A diag(β)^{-1})_ij];
the update formula is iterated until convergence, and the summary sentences of the book section are finally obtained.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410090143.6A CN103885935B (en) | 2014-03-12 | 2014-03-12 | Book section abstract generating method based on book reading behavior |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410090143.6A CN103885935B (en) | 2014-03-12 | 2014-03-12 | Book section abstract generating method based on book reading behavior |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103885935A true CN103885935A (en) | 2014-06-25 |
CN103885935B CN103885935B (en) | 2016-06-29 |
Family
ID=50954830
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410090143.6A Active CN103885935B (en) | 2014-03-12 | 2014-03-12 | Book section abstract generating method based on book reading behavior |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103885935B (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020138528A1 (en) * | 2000-12-12 | 2002-09-26 | Yihong Gong | Text summarization using relevance measures and latent semantic analysis |
CN1614585A (en) * | 2003-11-07 | 2005-05-11 | 摩托罗拉公司 | Context Generality |
CN102841940A (en) * | 2012-08-17 | 2012-12-26 | 浙江大学 | Document summary extracting method based on data reconstruction |
Non-Patent Citations (3)
Title |
---|
HE, Zhanying et al.: "Document Summarization Based on Data Reconstruction", Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence * |
ZHANG, Zhiming et al.: "Web-Age Information Management", 16 June 2013, Verlag Berlin Heidelberg * |
QIAO, Shaojie et al.: "A comprehensive web page scoring method based on centrality and PageRank", Journal of Southwest Jiaotong University * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI549003B (en) * | 2014-08-18 | 2016-09-11 | 葆光資訊有限公司 | Method for automatic sections division |
CN106469176A (en) * | 2015-08-20 | 2017-03-01 | 百度在线网络技术(北京)有限公司 | A kind of method and apparatus for extracting text snippet |
CN106469176B (en) * | 2015-08-20 | 2019-08-16 | 百度在线网络技术(北京)有限公司 | It is a kind of for extracting the method and apparatus of text snippet |
US10929452B2 (en) | 2017-05-23 | 2021-02-23 | Huawei Technologies Co., Ltd. | Multi-document summary generation method and apparatus, and terminal |
CN107608972A (en) * | 2017-10-24 | 2018-01-19 | 河海大学 | A kind of more text quick abstract methods |
CN108231064A (en) * | 2018-01-02 | 2018-06-29 | 联想(北京)有限公司 | A kind of data processing method and system |
CN109241863A (en) * | 2018-08-14 | 2019-01-18 | 北京万维之道信息技术有限公司 | For splitting the data processing method and device of reading content |
CN111199151A (en) * | 2019-12-31 | 2020-05-26 | 联想(北京)有限公司 | Data processing method and data processing device |
CN115048507A (en) * | 2022-05-24 | 2022-09-13 | 维沃移动通信有限公司 | Abstract generation method and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN103885935B (en) | 2016-06-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |