CN102169507B

CN102169507B - Implementation method of distributed real-time search engine

Info

Publication number: CN102169507B
Application number: CN 201110137785
Authority: CN
Inventors: 程行荣; 季刚; 陈青溪; 时宜
Original assignee: Xiamen Yaxon Networks Co Ltd
Current assignee: Xiamen Yaxun Zhilian Technology Co ltd
Priority date: 2011-05-26
Filing date: 2011-05-26
Publication date: 2013-03-20
Anticipated expiration: 2031-05-26
Also published as: CN102169507A

Abstract

The invention relates to the technical field of search engines, and specifically to a distributed real-time search engine. The method for constructing and operating the search engine system comprises at least the following steps: A, designing the functional structure of the system; B, designing the data index structure of the system; C, creating an index; D, updating the index; and E, searching the index. The distributed real-time search engine builds an updating index and a combining index simultaneously in system memory, and accesses both when searching the index. When the number of documents in the updating index reaches a threshold, the updating index is submitted to the disk index and becomes the combining index, and the original combining index becomes the new updating index. Newly updated data can therefore be searched, which improves the real-time performance of retrieval.

Description

Implementation method of a distributed real-time search engine

Technical field

The present invention relates to the technical field of search engines, and in particular to an implementation method of a distributed real-time search engine.

Background technology

With the arrival of the knowledge economy, information on the Internet is growing explosively. What people face at this stage is not a lack of information but an unchecked flood of it that is hard to filter. How to obtain the required information accurately, quickly, and in a timely manner is therefore the problem a search engine needs to solve.

A search engine is a system that gathers information from a particular network, such as the Internet, using specific computer programs and according to a certain strategy, organizes and processes that information, provides a retrieval service for users, and displays the information relevant to a user's search.

Traditional search engines such as Google, Baidu, and Yahoo process huge volumes of data, reaching the TB level, but their data comes mainly from conventional websites such as portals, forums, and e-government sites. Such sites are updated infrequently and each update is small, so their information processing places no high real-time requirement on the search engine.

With the rise of social media such as microblogs and social networking sites, "micro-information" created by Internet users is emerging in large quantities, producing massive real-time data. In addition, with the rapid development of enterprise mobile applications such as mobile CRM systems and handheld terminals, users place higher demands on query speed and on the timeliness of information, and traditional search engines cannot meet the demands of processing massive real-time data and of real-time search. Massive real-time data is characterized by a high update frequency, a large volume per update, and a large accumulated volume, usually reaching hundreds of GB and even the TB or PB level. A real-time search engine has very high requirements on the timeliness of mass data processing and on query response. When the data volume reaches the TB level, there is a sharp conflict between the frequency of data updates and the speed of query response: when both the accumulated data and the updated data are large, building and maintaining the index takes so long that real-time behavior cannot be guaranteed. That is, when an existing search engine scheme adopts an incremental index mechanism, index construction and retrieval are carried out separately: the index construction logic submits a new segment to the index shard, making it available to the index retrieval logic, only after the number of documents accumulated in the new segment reaches a threshold (such as 10000) or the elapsed interval reaches a threshold (such as 5 minutes). Consequently, there is a delay between the submission of a document and the moment it can be retrieved, usually in the range of a few minutes to tens of minutes, and such a long delay is intolerable in real-time retrieval.

Summary of the invention

In view of the deficiencies of the prior art, the present invention proposes a distributed real-time search engine that overcomes the conflict between the incremental index mechanism and index timeliness through the cooperation of an updating index and a combining index in system memory with a disk index.

The technical solution used in the present invention is as follows:

An implementation method of a distributed real-time search engine, whose system construction and operation comprise at least the following steps:

A. Designing the functional structure of the system. The functional structure is created in a cluster system based on Master/Slave and comprises the following functional nodes: a central control node, index data storage nodes, and external service nodes. The central control node is created on the Master system, and the index data storage nodes and external service nodes are created on the Slave systems. The central control node stores and maintains the attribute information of the indexes in the data index structure and the attribute information of the index data storage nodes. The index data storage nodes create, update, and retrieve the index shards of the data index structure. The external service nodes receive index creation, update, and retrieval requests and forward them to the central control node for processing;

B. Designing the data index structure of the system. The index structure is a tree whose levels are, from top to bottom: index, index shard, segment, document, and field. A system may contain multiple indexes. An index shard is one of the data blocks into which an index is divided, and the index shards belonging to the same index are stored on the index data storage nodes. An index shard consists of one or more segments, and a segment consists of one or more documents; the documents in a segment may be of different data object types. Each document has a key value that uniquely identifies it across the whole system, and the structure of a document comprises fields that describe the document type;
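The five-level hierarchy of step B can be pictured with a few plain data classes. This is only a minimal sketch: the class and attribute names (`Index`, `Shard`, `Segment`, `Document`, `fields`, `deleted`) are illustrative assumptions, not names taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Document:
    key: str                 # key value, unique across the whole system
    fields: Dict[str, str]   # field name -> value (attributes of the document)
    deleted: bool = False    # deletion is a mark, not a physical removal


@dataclass
class Segment:
    documents: List[Document] = field(default_factory=list)


@dataclass
class Shard:
    shard_id: int
    segments: List[Segment] = field(default_factory=list)


@dataclass
class Index:
    name: str
    shards: List[Shard] = field(default_factory=list)


def build_index(name: str, shard_count: int) -> Index:
    """Create an empty index divided into `shard_count` shards."""
    return Index(name, [Shard(i) for i in range(shard_count)])


idx = build_index("demo", 2)
idx.shards[0].segments.append(
    Segment([Document("doc-1", {"type": "text", "title": "hello"})])
)
```

Keeping the document type as an ordinary field is one way to let a single segment hold documents of different data object types.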

C. Creating an index, comprising the following steps:

C1. After receiving an index creation request, the external service node forwards it to the central control node. The central control node parses the request, extracts the attribute information of the index to be created, and verifies that this attribute information is complete and valid. If it is, step C2 is carried out; if it is incomplete or invalid, a failure response is sent to the external service node;

C2. The central control node divides the index to be created into a number of shards according to the shard count in the attribute information extracted in step C1. At the same time, according to the attribute information of the index data storage nodes held in the central control node, it assesses the state and load of each index data storage node, decides on that basis on which node each index shard is to be created and stored, and then sends the attribute information of the index to be created to each corresponding index data storage node. Each index data storage node builds, according to the attribute information received, the index shard assigned to it by the central control node. If an index data storage node fails to create the shard, the central control node assigns the creation of that shard to another index data storage node that is in good condition and relatively lightly loaded. Once all index shards of the index to be created have been created on index data storage nodes, or creation has failed, step C3 is carried out;

C3. If all index shards of the index to be created were created on the index data storage nodes in step C2, the central control node updates the index data storage node attribute information it holds and sends a response to the external service node indicating that the index shards were created successfully. If creation of the index shards failed in step C2, it sends the external service node a response indicating that index creation failed;
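The control flow of steps C1 to C3 can be sketched as follows. The attribute layout (`name`, `shard_count`), the node table, and the `try_create` callback that stands in for the request sent to a storage node are all assumptions made for illustration.

```python
def create_index(attrs, nodes, try_create):
    """Sketch of steps C1-C3. Returns a shard -> node placement, or None on failure.

    attrs:      {"name": str, "shard_count": int}  (hypothetical attribute layout)
    nodes:      node_id -> {"alive": bool, "load": int}
    try_create: callable(node_id, shard_id) -> bool, standing in for the
                request sent to an index data storage node
    """
    # C1: verify that the attribute information is complete and valid.
    if not attrs.get("name") or attrs.get("shard_count", 0) < 1:
        return None

    placement = {}
    for shard_id in range(attrs["shard_count"]):
        # C2: try healthy nodes from the lightest load upwards.
        candidates = sorted(
            (n for n, info in nodes.items() if info["alive"]),
            key=lambda n: (nodes[n]["load"], n),
        )
        for node_id in candidates:
            if try_create(node_id, shard_id):
                placement[shard_id] = node_id
                nodes[node_id]["load"] += 1
                break
        else:
            return None  # C3: no node could create this shard
    return placement  # C3: every shard was created
```

A caller would translate `None` into the failure response and a placement dictionary into the success response sent back to the external service node.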

D. Updating the index, comprising the following steps:

D1. After receiving an index update request, the external service node forwards it to the central control node. According to the index attribute information and index data storage node attribute information it holds, the central control node sends the update request to the index data storage node that holds the relevant index shard of the index;

D2. According to the index update request received, the index data storage node stores the updated document in a new segment of the index shard of the index to be updated. If the updated document is stored successfully, the old document corresponding to it is marked as deleted in the new segment and an update-success message is returned to the central control node; if storing the updated document fails, an update-failure message is returned to the central control node. Finally, the central control node sends the update success or failure message to the external service node;

The index update of step D also comprises a document deletion step: when the index update request is only a delete-document command, the document is marked as deleted in a new segment on the index shard of the index data storage node that holds the document to be deleted;

The index update of step D also comprises a step of building a real-time index: an updating index and a combining index are built simultaneously in system memory, and index retrieval accesses the updating index and the combining index at the same time. During an index update, the index being updated is the updating index. When the number of documents in the updating index reaches a threshold, or the update time of the updating index reaches a threshold, the system submits the updating index to the disk index; the updating index then becomes the combining index and, at the same time, the previous combining index becomes the updating index;
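The role swap between the two in-memory indexes can be illustrated with a small single-process sketch. It assumes plain dictionaries in place of real inverted indexes, a synchronous write to a stand-in `disk` dictionary, substring matching as the query, and `None` as the deletion mark; only the document-count threshold is modeled, not the time threshold.

```python
class RealTimeShard:
    """Toy model of one shard with an updating, a combining, and a disk index."""

    def __init__(self, doc_threshold=3):
        self.updating = {}    # in-memory index that receives new writes
        self.combining = {}   # in-memory index most recently submitted to disk
        self.disk = {}        # stand-in for the on-disk index
        self.doc_threshold = doc_threshold

    def update(self, key, text):
        self._write(key, text)

    def delete(self, key):
        self._write(key, None)   # None marks the document as deleted

    def _write(self, key, value):
        self.updating[key] = value
        if len(self.updating) >= self.doc_threshold:
            self._commit()

    def _commit(self):
        self.disk.update(self.updating)   # submit the updating index to disk
        self.combining.clear()            # its old content is already on disk
        # Swap roles: the full index becomes the combining index, and the
        # emptied former combining index becomes the new updating index.
        self.updating, self.combining = self.combining, self.updating

    def search(self, term):
        # Newest data wins: disk < combining < updating.
        merged = {**self.disk, **self.combining, **self.updating}
        return sorted(k for k, text in merged.items()
                      if text is not None and term in text)
```

Because `search` consults the updating index directly, a document is visible as soon as `update` returns, without waiting for the threshold to be reached.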

E. Retrieving from the index, comprising the following steps:

E1. After receiving an index retrieval request, the external service node sends it to the central control node. The central control node parses the request and determines its target index, then, according to the index data storage node attribute information and the attribute information of the target index, looks up all index shards of the target index and dispatches the retrieval request to the index data storage nodes that store each shard;

E2. Each index data storage node retrieves the relevant documents from the corresponding index shard it stores according to the retrieval request received, sorts the retrieval results, and sends them to the external service node;

E3. The external service node consolidates and sorts the retrieval results received from the index data storage nodes and sends them to the client.
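Steps E2 and E3 amount to a scatter-gather query: each shard returns an already sorted hit list, and the external service node merges those lists. The sketch below assumes a toy score (the number of times the term occurs in the text) and hypothetical function names; the patent does not specify a ranking function.

```python
import heapq
from itertools import islice


def search_shard(docs, term):
    """E2: search one shard; docs is a list of (key, text) pairs.

    Returns (score, key) hits sorted by descending score, then key.
    """
    hits = [(text.count(term), key) for key, text in docs if term in text]
    return sorted(hits, key=lambda h: (-h[0], h[1]))


def merge_results(per_shard_hits, limit):
    """E3: merge the sorted per-shard hit lists and keep the best `limit` keys."""
    merged = heapq.merge(*per_shard_hits, key=lambda h: (-h[0], h[1]))
    return [key for _, key in islice(merged, limit)]


shard0 = [("a", "fox fox"), ("b", "dog")]
shard1 = [("c", "fox"), ("d", "fox fox fox")]
top = merge_results([search_shard(s, "fox") for s in (shard0, shard1)], limit=2)
```

Since every shard sorts with the same key, `heapq.merge` can combine the lists lazily without re-sorting all hits.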

Further, the functional structure of the system in step A also comprises a standby central control node. The central control node synchronously backs up the data it stores to the standby central control node in real time. When the central control node fails, the standby central control node becomes the central control node; when the former central control node recovers from the fault, it becomes the new standby central control node.
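The active/standby arrangement can be illustrated with a very small in-process model. The names (`ControlPair`, `master-a`, `master-b`) are hypothetical, and a real deployment would replicate over the network and need failure detection, which this sketch leaves out.

```python
class ControlPair:
    """Toy model of a central control node and its standby."""

    def __init__(self):
        self.active = {"name": "master-a", "metadata": {}}
        self.standby = {"name": "master-b", "metadata": {}}

    def write(self, key, value):
        # Every metadata write is mirrored to the standby synchronously.
        self.active["metadata"][key] = value
        self.standby["metadata"][key] = value

    def fail_over(self):
        # The standby takes over; the former active node becomes the
        # new standby once it has recovered.
        self.active, self.standby = self.standby, self.active


pair = ControlPair()
pair.write("demo.shard_count", 2)
pair.fail_over()
```

Because writes are mirrored before `fail_over` is called, the new active node already holds the metadata it needs.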

Further, the index data storage nodes and external service nodes periodically send the central control node heartbeat signals that carry their status information. If the central control node does not receive a heartbeat signal within a preset time, it marks the index data storage node or external service node as dead. At the same time, for every index shard stored on an index data storage node marked as dead, the central control node copies a replica of that shard from another index data storage node to any other index data storage node that does not yet hold a replica of it, so that the number of replicas of each index shard remains unchanged and the index shard is available at all times.

Further, the heartbeat signal that an index data storage node sends to the central control node includes the load information of that node. During index creation, the central control node assigns index shards, as far as possible, to lightly loaded index data storage nodes. Likewise, during index retrieval, the central control node submits the retrieval request, as far as possible, to the lightly loaded index data storage node that holds the index shard or a replica of it.
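Heartbeat tracking with load reporting can be sketched as below. The class name, the explicit `now` argument (used instead of a real clock so the behavior is easy to follow), and the numeric load value are assumptions for illustration.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat and reported load of each node."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}   # node_id -> time of last heartbeat
        self.load = {}        # node_id -> load reported in that heartbeat

    def beat(self, node_id, load, now):
        self.last_seen[node_id] = now
        self.load[node_id] = load

    def dead_nodes(self, now):
        # A node is considered dead once no heartbeat arrived within `timeout`.
        return sorted(n for n, seen in self.last_seen.items()
                      if now - seen > self.timeout)

    def least_loaded(self, now):
        dead = set(self.dead_nodes(now))
        alive = [n for n in self.last_seen if n not in dead]
        return min(alive, key=lambda n: (self.load[n], n)) if alive else None


monitor = HeartbeatMonitor(timeout=10)
monitor.beat("n1", load=5, now=0)
monitor.beat("n2", load=1, now=0)
monitor.beat("n1", load=5, now=8)
```

The same `least_loaded` lookup could serve both shard assignment during index creation and the choice of node for a retrieval request.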

Further, the index data storage node attribute information comprises: the node ID, node name, node type, node state, node load, and node location. The index attribute information comprises: the index name, the structure definition of the documents in the index, the shard count of the index, the replica count of each index shard, and the IDs of the nodes storing the index shards and index shard replicas.

Further, in the data index structure of the system in step B, each index shard also has multiple index shard replicas. A replica is created during the index creation of step C, is updated asynchronously after the original index shard has been updated during the index update of step D, and is stored on a different index data storage node from the original index shard. The index data storage node holding the original index shard handles the update requests for that shard; once the original shard has been updated, that node asynchronously sends the update request to the index data storage nodes holding the corresponding replicas so that the replicas are updated. Both the replicas and the corresponding original index shard support index retrieval: according to the load of the index data storage nodes holding the original shard and its replicas, the central control node submits each retrieval request to the lightly loaded node holding the shard or a replica.

Further, the central control node regularly checks the number of index shard replicas of every index. When the number of replicas of an index shard falls below the preset number, the system automatically copies a replica of that shard to another data node. When the index data storage node holding an original index shard fails, the system selects one of the corresponding replicas to take over the update work of the original shard; that replica becomes the new original index shard, and a new replica is then generated on another index data storage node so that the replica count of the shard remains unchanged. When an index data storage node holding an index shard replica fails, the system generates a replica identical to the original index shard on another index data storage node so that the replica count of the shard remains unchanged.
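The two failure cases above (loss of the node holding the original shard, loss of a node holding a replica) can be sketched as one function. The dictionary layout and function name are hypothetical, and the actual data copy to the new replica node is not modeled.

```python
def handle_node_failure(shard, failed_node, healthy_nodes):
    """Repair one shard's placement after `failed_node` goes down.

    shard: {"primary": node_id, "replicas": [node_id, ...]}; mutated in place.
    """
    if shard["primary"] == failed_node:
        if not shard["replicas"]:
            raise RuntimeError("no replica left to promote")
        shard["primary"] = shard["replicas"].pop(0)   # promote a replica
    elif failed_node in shard["replicas"]:
        shard["replicas"].remove(failed_node)
    else:
        return shard                                  # this shard is unaffected

    # Restore the replica count on a node that holds no copy of the shard yet.
    holders = {shard["primary"], *shard["replicas"]}
    for node in healthy_nodes:
        if node not in holders and node != failed_node:
            shard["replicas"].append(node)
            break
    return shard
```

For example, with the original shard on `n1` and a replica on `n2`, losing `n1` promotes `n2` and places a fresh replica on the next healthy node.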

Further, the index shards and index shard replicas of the same index are created and stored on the index data storage nodes according to the following strategy: based on the node load information in the index data storage node attribute information, the central control node assigns index shards and replicas to the most lightly loaded index data storage nodes. When the number of available index data storage nodes is less than the number of index shards, the central control node assigns several index shards to the same index data storage node and does not allocate replicas. When the number of available index data storage nodes is greater than the number of index shards, the central control node assigns some or all of the index shard replicas to the remaining index data storage nodes.
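One possible reading of this placement strategy is sketched below. It is an interpretation rather than the patented algorithm: load is approximated by the number of copies already assigned to a node, replicas are allocated whenever there are at least as many nodes as shards, and a replica is never placed on a node that already holds a copy of the same shard.

```python
def place(shard_count, replicas_per_shard, node_ids):
    """Assign shards, then replicas, to the most lightly loaded nodes."""
    load = {n: 0 for n in node_ids}   # copies assigned so far, used as load
    layout = {}

    def lightest(exclude=()):
        choices = [n for n in node_ids if n not in exclude]
        return min(choices, key=lambda n: (load[n], n)) if choices else None

    for s in range(shard_count):
        node = lightest()
        load[node] += 1
        layout[s] = {"primary": node, "replicas": []}

    # With fewer nodes than shards, no replicas are allocated.
    if len(node_ids) >= shard_count:
        for s in range(shard_count):
            for _ in range(replicas_per_shard):
                used = [layout[s]["primary"], *layout[s]["replicas"]]
                node = lightest(exclude=used)
                if node is None:
                    break             # no node left without a copy of shard s
                load[node] += 1
                layout[s]["replicas"].append(node)
    return layout
```

With 2 shards, 1 replica each, and 4 nodes, this sketch puts every shard and replica on its own node; with a single node, both shards share it and no replica is created.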

Further, the index update of step D also comprises a segment merging step: when the number of segments in an index shard of the index being updated reaches a threshold, or the time since the last index merge reaches a threshold, the index data storage node holding that shard reads the documents in several of the smaller segments, stores them in one new segment, and then physically deletes those smaller segments.
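Segment merging can be sketched as follows, assuming segments are plain lists of `(key, text, deleted)` tuples and that only the segment-count trigger is modeled. The parameter names `max_segments` and `merge_count` are hypothetical. Dropping documents marked as deleted during the merge follows the description of this step in the embodiment.

```python
def merge_small_segments(segments, max_segments, merge_count):
    """Merge the `merge_count` smallest segments once there are too many.

    segments: list of segments; each segment is a list of
              (key, text, deleted) tuples.
    Returns a new segment list; the input list is not modified.
    """
    if len(segments) <= max_segments:
        return segments

    # Pick the indexes of the smallest segments.
    by_size = sorted(range(len(segments)), key=lambda i: len(segments[i]))
    small = set(by_size[:merge_count])

    # Copy live documents into one new segment, skipping deleted ones.
    merged = [doc for i in sorted(small) for doc in segments[i] if not doc[2]]

    # The small segments are left out of the result, i.e. physically removed.
    kept = [seg for i, seg in enumerate(segments) if i not in small]
    return kept + [merged]
```

Fewer, larger segments mean that a query has fewer structures to consult per shard.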

Further, in step D the index shard on which an updated document is stored is determined by computing the hash value of the document's key value, taking that hash value modulo the shard count of the document's index, and assigning the document to the index shard whose number equals the result.
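The routing rule is simply hash(key) mod shard count. The sketch below uses MD5 from Python's `hashlib` as the hash function, which is an assumption: the patent does not name one. A stable hash is needed because Python's built-in `hash()` for strings is randomized per process by default and would route the same key differently after a restart.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Return the number of the shard that stores the document with this key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then take the modulus.
    return int.from_bytes(digest[:8], "big") % shard_count
```

Every update or deletion for a given key lands on the same shard, which is what allows the old version of a document to be marked as deleted locally.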

Further, the data object types of the documents in step B comprise: text data objects, image data objects, audio data objects, video data objects, and executable program data objects. The attribute information of each data object type is stored in the field structure of the document.

By adopting the above technical solution, the present invention has the following beneficial effects:

1. An updating index and a combining index are built simultaneously in system memory, and retrieval accesses both at the same time. When the number of documents in the updating index reaches the threshold, the updating index is submitted to the disk index and becomes the combining index, while the original combining index becomes the new updating index. This ensures that updated data can also be retrieved and improves the real-time retrievability of data in the search engine;

2. The central control node, standby central control node, external service nodes, and index data storage nodes of the system are created in a cluster system based on Master/Slave, which gives high fault tolerance, suits deployment on inexpensive machines, and can provide high-throughput data access;

3. Creating index shard replicas for the index shards stored on the index data storage nodes strengthens the fault tolerance of the system.

Description of drawings

Fig. 1 is a schematic diagram of the functional structure of one embodiment of the present invention.

Fig. 2 is a schematic diagram of the data index structure of the present invention.

Fig. 3 is a schematic diagram of an embodiment of the storage strategy for index shards and index shard replicas of the present invention.

Embodiment

The present invention is now further described with reference to the accompanying drawings and embodiments.

An implementation method of a distributed real-time search engine, whose system construction and operation consist of the following steps:

Step A: Designing the functional structure of the system. As shown in Fig. 1, the functional structure is created in a cluster system based on Master/Slave and comprises the following functional nodes: a central control node, index data storage nodes, and external service nodes. The central control node is created on the Master system, and the index data storage nodes and external service nodes are created on the Slave systems. The central control node is the master node of the system; it stores and maintains the attribute information of the indexes in the data index structure and the attribute information of the index data storage nodes. The index data storage nodes are the data nodes of the system; they create, update, and retrieve at the index shard level of the data index structure. The external service nodes are the client nodes of the system; they receive index creation, update, and retrieval requests and forward them to the central control node for processing;

Step B: Designing the data index structure of the system. As shown in Fig. 2, the index structure is a tree whose levels are, from top to bottom: index, index shard, segment, document, and field. A system may contain multiple indexes. An index shard is one of the data blocks into which an index is divided, and the index shards belonging to the same index are stored on the index data storage nodes. An index shard consists of one or more segments, and a segment consists of one or more documents; the documents in a segment may be of different data object types. Each document has a key value that uniquely identifies it across the whole system, and the structure of a document comprises fields that describe the document's various attributes. An index is a collection of data objects of several kinds for which retrieval is supported, and its index shards are stored dispersed across the index data storage nodes of the system, which improves the retrieval efficiency of the system;

Step C: Creating an index, consisting of the following steps:

C2. The central control node divides the index to be created into a number of shards according to the shard count in the attribute information extracted in step C1. At the same time, according to the attribute information of the index data storage nodes held in the central control node, it assesses the state and load of each index data storage node, decides on that basis on which node each index shard is to be created and stored, and then sends the attribute information of the index to be created to each corresponding index data storage node. Each index data storage node builds, according to the attribute information received, the index shard assigned to it by the central control node. If an index data storage node fails to create the shard, the central control node assigns the creation of that shard to another index data storage node that is in good condition and relatively lightly loaded. Once all index shards of the index to be created have been created on index data storage nodes, or creation has failed, step C3 is carried out;

Step D: Updating the index, consisting of the following steps:

Step E: Retrieving from the index, consisting of the following steps:

As a preferred embodiment, the functional structure of the system in step A also comprises a standby central control node. The central control node synchronously backs up the data it stores to the standby central control node in real time. When the central control node fails, the standby central control node becomes the central control node; when the former central control node recovers from the fault, it becomes the new standby central control node. Because the central control node is the master node of the system, a failure of it would paralyze the whole system; adding a standby central control node therefore enables failover of the central control node and improves the fault tolerance of the system.

As a preferred embodiment, the index data storage nodes and external service nodes periodically send the central control node heartbeat signals that carry their status information. If the central control node does not receive a heartbeat signal within a preset time, it marks the index data storage node or external service node as dead. At the same time, for every index shard stored on an index data storage node marked as dead, the central control node copies a replica of that shard from another index data storage node to any other index data storage node that does not yet hold a replica of it, so that the number of replicas of each index shard remains unchanged and the index shard is available at all times.

As a preferred embodiment, the heartbeat signal that an index data storage node sends to the central control node includes the load information of that node. During index creation, the central control node assigns index shards, as far as possible, to lightly loaded index data storage nodes. Likewise, during index retrieval, the central control node submits the retrieval request, as far as possible, to the lightly loaded index data storage node that holds the index shard or a replica of it.

As a preferred embodiment, the index data storage node attribute information comprises: the node ID, node name, node type, node state, node load, and node location. The index attribute information comprises: the index name, the structure definition of the documents in the index, the shard count of the index, the replica count of each index shard, and the IDs of the nodes storing the index shards and index shard replicas. The index data storage node attribute information and index attribute information are the metadata of the system. This metadata is stored on the central control node, and from it the central control node, index data storage nodes, and external service nodes can derive the location of each index shard in the cluster.

As a preferred embodiment, in the data index structure of the system in step B, each index shard also has multiple index shard replicas. A replica is created during the index creation of step C, is updated asynchronously after the original index shard has been updated during the index update of step D, and is stored on a different index data storage node from the original index shard. The index data storage node holding the original index shard handles the update requests for that shard; once the original shard has been updated, that node asynchronously sends the update request to the index data storage nodes holding the corresponding replicas so that the replicas are updated. Both the replicas and the corresponding original index shard support index retrieval: according to the load of the index data storage nodes holding the original shard and its replicas, the central control node submits each retrieval request to the lightly loaded node holding the shard or a replica.

Further, the central control node regularly checks the number of index shard replicas of every index. When the number of replicas of an index shard falls below the preset number, the system automatically copies a replica of that shard to another data node. When the index data storage node holding an original index shard fails, the system selects one of the corresponding replicas to take over the update work of the original shard; that replica becomes the new original index shard, and a new replica is then generated on another index data storage node so that the replica count of the shard remains unchanged. When an index data storage node holding an index shard replica fails, the system generates a replica identical to the original index shard on another index data storage node so that the replica count of the shard remains unchanged.

Further, the index shards and index shard replicas of the same index are created and stored on the index data storage nodes according to the following strategy: based on the node load information in the index data storage node attribute information, the central control node assigns index shards and replicas to the most lightly loaded index data storage nodes. When the number of available index data storage nodes is less than the number of index shards, the central control node assigns several index shards to the same index data storage node and does not allocate replicas. When the number of available index data storage nodes is greater than the number of index shards, the central control node assigns some or all of the index shard replicas to the remaining index data storage nodes. Fig. 3 illustrates this strategy for an index with a shard count of 2 and 1 replica per shard. When the system has 1 index data storage node, index shard 1 and index shard 2 are both stored on index data storage node 1 and neither shard has a replica, because a replica contributes to the availability and reliability of the system only when it is stored on a different node from the original shard. When the system has 2 index data storage nodes, index shard 1 and index shard 2, stored on index data storage node 1, have replicas 1' and 2' stored on index data storage node 2; node 2 can provide the same service as node 1, so adding index data storage nodes scales the service capacity of the system. When the system has 4 index data storage nodes, index shard 1, index shard 2, replica 1', and replica 2' are each stored on a separate one of the 4 index data storage nodes.

As a preferred embodiment, the index update of step D also comprises a segment merging step: when the number of segments in an index shard of the index being updated reaches a threshold, or the time since the last index merge reaches a threshold, the index data storage node holding that shard reads the documents in several of the smaller segments, stores them in one new segment, and then physically deletes those smaller segments. During index construction, new segments are produced continually, and too many segments in an index shard reduce the efficiency of the index retrieval logic. This step therefore merges several small segments into one large segment and discards data marked as deleted, which optimizes the storage space of the index, reduces the number of index segments the retrieval logic must handle at once, and thus improves retrieval efficiency.

As a preferred embodiment, in step D an updated document is stored on an index shard as follows: the hash value of the key of the updated document is computed, the hash value is taken modulo the number of index shards of the index to which the document belongs, and the document is then assigned to and stored on the index shard whose number corresponds to the result of the modulo operation.
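The routing rule above can be illustrated with a short Python sketch. The patent does not name a hash function; MD5 from the standard `hashlib` module is used here only because it gives the same value on every node and every run, and the function name `shard_for` is hypothetical.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a document key to a shard number in the range [0, shard_count)."""
    # Assumption: any stable hash works; MD5 is used only for determinism.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    return hash_value % shard_count
```

Because the result depends only on the key and the shard count, repeated updates to the same document always reach the same shard, provided the shard count of the index does not change.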

As a preferred embodiment, the data object types of the document described in step B are: text data objects, image data objects, audio data objects, video data objects, and executable program data objects. The attribute information of each data object type is stored in the field structure of the document, which is used to store the attribute information of the document. For example, a text document may include the following information: file name, keywords, author, file size, category, file description, etc.; an audio document may include the following information: file name, bit rate (bps), file size, duration, author or artist name, song title, genre, album name, etc.
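One possible way to model such a document is sketched below in Python. The class names, the `key` attribute, and the sample values are hypothetical illustrations; the patent specifies only the data object types and the kinds of attributes stored in the fields.

```python
from dataclasses import dataclass, field
from enum import Enum


class DataObjectType(Enum):
    TEXT = "text"
    IMAGE = "image"
    AUDIO = "audio"
    VIDEO = "video"
    EXECUTABLE = "executable"


@dataclass
class Document:
    key: str                      # hypothetical document key
    object_type: DataObjectType   # one of the five data object types
    fields: dict = field(default_factory=dict)  # attribute name -> value


# Made-up sample values for an audio document.
song = Document(
    key="doc-1",
    object_type=DataObjectType.AUDIO,
    fields={
        "file_name": "track01.mp3",
        "bit_rate_bps": 128000,
        "duration_s": 215,
        "album_name": "Example Album",
    },
)
```

Keeping the attributes in a per-document field map lets each data object type carry its own set of attributes without changing the document structure.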

Although the present invention has been specifically shown and described in conjunction with preferred embodiments, those skilled in the art should understand that various changes in form and detail may be made to the present invention without departing from the spirit and scope of the invention as defined by the appended claims, and that all such changes fall within the protection scope of the present invention.

Claims

1. A method for implementing a distributed real-time search engine, in which the construction and operation of the system comprise at least the following steps:

C. creating an index, comprising the following steps:

C2. the central control node divides the index to be created into a number of shards according to the index shard count in the attribute information of the index to be created generated in step C1; at the same time, according to the attribute information of the index data storage nodes stored in the central control node, it determines the state and load of each index data storage node, decides on that basis on which index data storage node each index shard is to be stored and created, and then sends the attribute information of the index to be created to each corresponding index data storage node; each index data storage node, according to the attribute information of the index to be created that it receives, builds on itself the index shard of the index to be created that the central control node assigned to it; if an index data storage node fails to create the index shard, the central control node reassigns creation of that index shard to another index data storage node that is in good condition and relatively lightly loaded; once all index shards of the index to be created have either been created on index data storage nodes or have failed to be created, step C3 is performed;

D. updating the index, comprising the following steps:

E. searching the index, comprising the following steps:

2. The method for implementing a distributed real-time search engine as claimed in claim 1, characterized in that: the functional structure of the system described in step A further comprises a standby central control node; the central control node synchronously backs up its stored data to the standby central control node in real time; when the central control node fails, the standby central control node becomes the central control node; and when the former central control node recovers from the failure, it becomes the new standby central control node.

3. The method for implementing a distributed real-time search engine as claimed in claim 1, characterized in that: the index data storage nodes and the external service nodes periodically send heartbeat signals characterizing their status information to the central control node; if the central control node does not receive a heartbeat signal within a preset time, it marks that index data storage node or external service node as dead; at the same time, for every index shard stored on an index data storage node marked as dead, the central control node copies one replica of that index shard, taken from the replicas stored on other index data storage nodes, to any other index data storage node that does not already store a replica of that index shard, so that the number of replicas of the index shard remains unchanged and the index shard is available at all times; the heartbeat signal that an index data storage node sends to the central control node includes the load information of that index data storage node; during index creation, the central control node assigns index shards to lightly loaded index data storage nodes for storage wherever possible; likewise, during index search, the central control node submits retrieval requests, wherever possible, to the lightly loaded index data storage node hosting the index shard or a replica of that shard for processing.

4. The method for implementing a distributed real-time search engine as claimed in claim 1, characterized in that: the attribute information of an index data storage node comprises: the ID of the node, the name of the node, the type of the node, the state of the node, the load of the node, and the location of the node; the attribute information of an index comprises: the name of the index, the structure definition of the documents in the index, the number of shards of the index, the number of replicas of each index shard, and the IDs of the storage nodes of the index shards and shard replicas.

5. The method for implementing a distributed real-time search engine as claimed in claim 1, characterized in that: in the data index structure of the system described in step B, each index shard further has a plurality of shard replicas; a shard replica is created when the index is created in step C, is updated asynchronously after the original index shard has been updated when the index is updated in step D, and is stored on a different index data storage node from the original index shard; the index data storage node hosting the original index shard is responsible for processing update requests for that index shard, and after the original index shard has finished updating, that node is responsible for asynchronously sending the update request to the index data storage nodes hosting the corresponding shard replicas so that the shard replicas are updated; shard replicas and their corresponding original index shards all support index search, and the central control node, according to the load of the index data storage nodes hosting the original index shard and its shard replicas, submits an index search request to the lightly loaded index data storage node hosting the index shard or a shard replica for processing.

6. The method for implementing a distributed real-time search engine as claimed in claim 5, characterized in that: the central control node periodically checks the number of shard replicas of every index; when the number of replicas of an index shard falls below the preset number, the system automatically copies a replica of that index shard to another index data storage node; when the index data storage node storing an original index shard fails, the system selects one of the corresponding shard replicas to take over the index update work of the original index shard, that shard replica becomes the new original index shard, and a shard replica is then generated on another index data storage node to ensure that the number of replicas of the index shard remains unchanged; when an index data storage node storing a shard replica fails, the system generates a replica identical to the original index shard on another index data storage node to ensure that the number of replicas of the index shard remains unchanged.

7. The method for implementing a distributed real-time search engine as claimed in claim 5, characterized in that: the index shards and shard replicas of the same index are created and stored on the index data storage nodes according to the following strategy: the central control node, based on the node load information in the attribute information of the index data storage nodes, assigns the index shards and shard replicas to the most lightly loaded index data storage nodes; when the number of available index data storage nodes is less than the number of index shards, the central control node assigns multiple index shards to the same index data storage node and does not allocate replicas of the index shards; when the number of available index data storage nodes is greater than the number of index shards, the central control node assigns replicas of some or all of the index shards to the remaining index data storage nodes.

8. The method for implementing a distributed search engine as claimed in claim 1, characterized in that: the index update of step D further comprises a segment merging step: when the number of segments in an index shard of the index being updated reaches a threshold, or the time elapsed since the last index merge reaches a threshold, the index data storage node hosting that index shard reads the documents in several smaller segments, stores them in a new segment, and then physically deletes those smaller segments.

9. The method for implementing a distributed real-time search engine as claimed in claim 1, characterized in that: in step D an updated document is stored on an index shard as follows: the hash value of the key of the updated document is computed, the hash value is taken modulo the number of index shards of the index to which the document belongs, and the document is then assigned to and stored on the index shard whose number corresponds to the result of the modulo operation.

10. The method for implementing a distributed real-time search engine as claimed in claim 1, characterized in that: the data object types of the document described in step B comprise: text data objects, image data objects, audio data objects, video data objects, and executable program data objects; the attribute information of each data object type is stored in the field structure of the document.