KR101662527B1 - Apparatus for managing documents using a metadata library related to a plurality of drawings, a method therefor, and a computer-readable recording medium storing the method
- Publication number
- KR101662527B1 (application number KR1020150090254A)
- Authority
- KR
- South Korea
- Prior art keywords
- word
- document
- basic
- tag
- search
Classifications
- G06F17/218
- G06F17/277
- G06F17/2795
- G06F17/30967
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to an apparatus for document management using a metadata library, a method therefor, and a computer-readable recording medium on which the method is recorded. The apparatus comprises: a storage unit that stores a metadata library including a basic word table in which a plurality of basic words consisting of keywords used in the construction industry are organized in a classification system with multiple hierarchical levels, a compound word table of compound words each composed of at least one of the basic words, a derivative word table holding derivative words of the basic words, and a similarity table holding similar words of the basic words; and a document registration module that extracts a plurality of keywords from a document, extracts the basic words matching the keywords from the basic word table, extracts the derivative words of the extracted basic words from the derivative word table, extracts the similar words of the extracted basic words from the similarity table, selects the extracted basic words, derivative words, and similar words as candidate tags, selects at least one of the candidate tags as a tag, assigns the selected tag to the document, and stores it in the storage unit.
Description
The present invention relates to a document management technique, and more particularly, to a device capable of managing a document using a metadata library, a method therefor, and a computer-readable recording medium having the method recorded thereon.
The documents generated in the construction process embody construction technology and knowledge and therefore have high practical value. However, because of the specialized nature and diversity of their contents, construction documents are difficult to share and reuse efficiently among project participants. Conventional methods for managing such construction-domain documents rely on classification systems such as Uniclass in Europe, MasterFormat in North America, and the domestic construction CALS. A classification scheme can separate diverse document types by consistent criteria, but problems arise when a classification is ambiguous or when a document belongs to more than one classification item.
An object of the present invention is to provide a device for managing a document used in a construction field by using a construction metadata library, a method therefor, and a computer readable recording medium on which the method is recorded.
According to an aspect of the present invention, there is provided an apparatus for managing a document, the apparatus comprising: a storage unit for storing a metadata library including a basic word table in which a plurality of basic words consisting of keywords used in the construction industry are organized in a classification system with multiple hierarchical levels, a compound word table of compound words each composed of at least one of the basic words, a derivative word table holding derivative words of the basic words, and a similarity table holding similar words of the basic words; and a document registration module that performs registration by extracting a plurality of keywords from a document, extracting the basic words matching the keywords from the basic word table, extracting the derivative words of the extracted basic words from the derivative word table, extracting the similar words of the extracted basic words from the similarity table, selecting the extracted basic words, derivative words, and similar words as candidate tags, selecting at least one of the candidate tags as a tag, assigning the selected tag to the document, and storing it in the storage unit.
The document registration module derives, from the document, keywords that have a predetermined appearance frequency and are not basic words, similar words, or derivative words, assigns at least one of the derived keywords to the document as a tag, and stores it in the storage unit.
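The candidate-tag step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the table contents, the function names `candidate_tags` and `frequent_extra_keywords`, and the reading of "predetermined appearance frequency" as a minimum count are all hypothetical and are not taken from the patent.

```python
from collections import Counter

# Hypothetical miniature metadata library; all entries are illustrative only.
BASIC_WORDS = {"supervision", "construction", "report"}
DERIVATIVE_WORDS = {"supervision": ["supervisor"], "construction": ["constructor"]}
SIMILAR_WORDS = {"supervision": ["inspection"], "report": ["statement"]}


def candidate_tags(keywords):
    """Collect the basic words matching the keywords, plus their derivative and similar words."""
    candidates = []
    for word in keywords:
        if word in BASIC_WORDS and word not in candidates:
            candidates.append(word)
            for related in DERIVATIVE_WORDS.get(word, []) + SIMILAR_WORDS.get(word, []):
                if related not in candidates:
                    candidates.append(related)
    return candidates


def frequent_extra_keywords(keywords, min_count):
    """Keywords outside the library that appear at least min_count times (assumed threshold rule)."""
    known = set(BASIC_WORDS)
    for words in list(DERIVATIVE_WORDS.values()) + list(SIMILAR_WORDS.values()):
        known.update(words)
    counts = Counter(keywords)
    return sorted(w for w, c in counts.items() if c >= min_count and w not in known)
```

In this sketch the user would then pick the final tags from the two returned lists; the selection step itself is left out because it depends on user input.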
When a preset time or a preset period arrives, the document registration module extracts the documents that have not yet been registered from among a plurality of documents stored in a predetermined folder and performs the registration on the extracted documents.
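A minimal sketch of this scheduled registration, assuming the folder contents and the already registered documents are available as plain collections of file names. The helper names `registration_due` and `select_unregistered` are hypothetical, and how the apparatus actually tracks time is not specified in the text.

```python
from datetime import datetime, timedelta


def registration_due(now, last_run, period):
    """True when at least one preset period has elapsed since the last registration run."""
    return now - last_run >= period


def select_unregistered(folder_documents, registered):
    """Keep the documents of the watched folder that have not been registered yet."""
    return [name for name in folder_documents if name not in registered]
```

A caller would check `registration_due` on each tick and, when it returns true, pass every name from `select_unregistered` to the registration procedure.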
An apparatus for managing a document according to an embodiment of the present invention further comprises a metadata module that extracts a plurality of keywords from the texts of a plurality of documents used in the construction industry, from search words used for document search in the construction industry, and from keywords assigned to documents when they were registered in the construction industry; matches the extracted keywords against one another to select the matching keywords; derives basic words from the selected keywords; and generates the basic word table having the derived basic words.
The metadata module generates a compound word table composed of at least one of the basic words, a derivative word table having a derivative word of the basic word, and a similarity table having a similar word of the basic word.
When extracting the plurality of keywords, the metadata module extracts words of at least two syllables from the text, the search words, and the assigned keywords; segments the extracted words into the smallest word units that still carry meaning; removes duplicate words from the segmented words; and excludes general terms and terms not used in the construction industry, as specified in a previously stored erasure term table.
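The extraction steps above can be illustrated with the following Python sketch. It assumes Korean input in which each precomposed Hangul character is one syllable, and it substitutes a greedy longest-match lookup against a small word list for the segmentation step, whose actual method the text does not specify. The word lists and function names are hypothetical.

```python
import re

# Hypothetical tables; glosses: 감리 = supervision, 보고서 = report,
# 시공 = construction, 계획 = plan, 관련 = related, 사항 = matter.
ERASURE_TERMS = {"관련", "사항"}  # general terms to be excluded
UNIT_WORDS = {"감리", "보고서", "시공", "계획", "관련", "사항"}


def segment(word):
    """Greedy longest-match split into known units; a word that cannot be fully split is kept whole."""
    units, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in UNIT_WORDS:
                units.append(word[i:j])
                i = j
                break
        else:
            return [word]
    return units


def extract_keywords(text):
    """Two-syllable-or-longer words, segmented, de-duplicated, with erasure terms removed."""
    words = re.findall(r"[가-힣]{2,}", text)  # runs of at least two Hangul syllables
    seen, keywords = set(), []
    for word in words:
        for unit in segment(word):
            if unit not in seen and unit not in ERASURE_TERMS:
                seen.add(unit)
                keywords.append(unit)
    return keywords
```

A production system would more plausibly use a morphological analyzer for the segmentation step; the dictionary lookup here only keeps the sketch self-contained.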
According to another aspect of the present invention, there is provided an apparatus for managing a document, the apparatus comprising: a storage unit that stores a metadata library including a basic word table in which a plurality of basic words consisting of keywords used in the construction industry are organized in a classification system with multiple hierarchical levels, a compound word table of compound words each composed of at least one of the basic words, a derivative word table holding derivative words of the basic words, and a similarity table holding similar words of the basic words, and that stores documents to which at least one of a basic word, a derivative word, and a similar word of the metadata library is assigned as a tag; an input unit for receiving a search word from a user; and a document retrieval module that, when the search word is input, searches the metadata library and, if the search word is a basic word, searches the storage unit for documents tagged with that basic word or with a derivative word or similar word of that basic word, and provides the search results.
If the search word is a derivative word or a similar word, the document retrieval module searches the metadata library to convert it into its basic word, searches for documents tagged with that basic word or with a derivative word or similar word of that basic word, and provides the search results.
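The search-word conversion can be sketched as below. The tables, document names, and the functions `to_basic_word` and `search` are hypothetical, and falling back to the raw search word when no table matches is an assumption of this sketch rather than something the text above states.

```python
# Hypothetical tables and tagged documents, for illustration only.
DERIVATIVE_WORDS = {"supervision": ["supervisor"]}
SIMILAR_WORDS = {"supervision": ["inspection"]}
DOCUMENT_TAGS = {
    "doc_a.pdf": {"supervision", "report"},
    "doc_b.dwg": {"inspection"},
    "doc_c.hwp": {"schedule"},
}


def to_basic_word(term):
    """Map a derivative or similar word to its basic word; other terms are returned unchanged."""
    for table in (DERIVATIVE_WORDS, SIMILAR_WORDS):
        for basic, variants in table.items():
            if term in variants:
                return basic
    return term


def search(term):
    """Find documents tagged with the basic word or any of its derivative or similar words."""
    basic = to_basic_word(term)
    wanted = {basic, *DERIVATIVE_WORDS.get(basic, []), *SIMILAR_WORDS.get(basic, [])}
    return sorted(name for name, tags in DOCUMENT_TAGS.items() if tags & wanted)
```

Searching for 'inspection' in this sketch is therefore equivalent to searching for 'supervision', so documents tagged with either word are returned.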
The document retrieval module derives, from the basic word table, the upper basic word of the basic word used in the search, derives from the compound word table all compound words that include the derived upper basic word, and provides them as related search words.
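One possible reading of this related-search-word derivation is sketched below, assuming each basic word carries a code and a parent code, and each compound word lists the codes of its constituent basic words. The codes, words, and the function `related_search_terms` are hypothetical.

```python
# Hypothetical basic word table: code -> (word, parent code or None).
BASIC_WORDS = {
    "A": ("construction", None),
    "A1": ("supervision", "A"),
    "A2": ("safety", "A"),
    "B": ("design", None),
}
# Hypothetical compound word table: compound word -> codes of its basic words.
COMPOUND_WORDS = {
    "construction supervision": ["A", "A1"],
    "construction safety plan": ["A", "A2"],
    "design review": ["B"],
}


def related_search_terms(basic_code):
    """Compound words containing the upper (parent) basic word of the searched basic word."""
    parent = BASIC_WORDS[basic_code][1]
    if parent is None:
        return []  # a top-level basic word has no upper basic word
    return sorted(word for word, codes in COMPOUND_WORDS.items() if parent in codes)
```

For a search on 'supervision' (code A1), the sketch looks up the parent 'construction' (code A) and returns every compound word whose attribute codes contain A.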
The document retrieval module sorts and provides the search results in descending order of tag match score, which expresses the appearance frequency of the tag serving as the search word relative to the appearance frequencies of the other tags in the retrieved document.
The document retrieval module calculates the tag match score through

$$T_i = \frac{f_i}{\sum_{j} f_j}$$

where $T_i$ is the tag match score of tag $i$, $f_i$ is the appearance frequency of that tag, and the sum runs over all tags $j$ of the document.
The document retrieval module may assign a weight to each of a tag match score, which expresses the appearance frequency of the tag serving as the search word relative to the appearance frequencies of the other tags in the retrieved document, a file name match score, which indicates whether the file name or block name of the retrieved document includes the search word, and a document rating, which is a score given to the document by users, and may sort and provide the search results in descending order of the sum of the weighted tag match score, file name match score, and document rating.
The document retrieval module calculates the summed score through

$$S = w_T \hat{T} + w_F \hat{F} + w_R \hat{R}$$

where $S$ is the document ranking score, that is, the summed score, $\hat{T}$ is the normalized tag match score, $\hat{F}$ is the normalized file name match score, $\hat{R}$ is the normalized document rating, $w_T$ is the weight for the tag match score, $w_F$ is the weight for the file name match score, and $w_R$ is the weight for the document rating.
According to another aspect of the present invention, there is provided an apparatus for managing a document, the apparatus comprising: a storage unit that stores a metadata library including a basic word table in which a plurality of basic words consisting of keywords used in the construction industry are organized in a classification system with multiple hierarchical levels, a compound word table of compound words each composed of at least one of the basic words, a derivative word table holding derivative words of the basic words, and a similarity table holding similar words of the basic words, that stores documents to which at least one of a basic word, a derivative word, and a similar word is assigned as a tag, and that stores a classification system specifying, according to preset criteria, the classification to which each tag belongs; and a document retrieval module that, when a classification is input, extracts the tags belonging to the classification, searches for documents to which the extracted tags are assigned, and provides the search results.
According to another aspect of the present invention, there is provided a method of managing a document, the method comprising: storing a metadata library including a basic word table in which a plurality of basic words consisting of keywords used in the construction industry are organized in a classification system with multiple hierarchical levels, a compound word table of compound words each composed of at least one of the basic words, a derivative word table holding derivative words of the basic words, and a similarity table holding similar words of the basic words; extracting a plurality of keywords from a document; extracting the basic words matching the keywords from the basic word table, extracting the derivative words of the extracted basic words from the derivative word table, and extracting the similar words of the extracted basic words from the similarity table; selecting the extracted basic words, derivative words, and similar words as candidate tags; selecting at least one of the candidate tags as a tag; and performing registration by assigning the selected tag to the document and storing it.
The step of selecting by the tag may further include deriving a keyword having a predetermined appearance frequency among the keywords other than the basic word, the similar word, and the derivation word from the document, and selecting at least one of the derived keywords as the tag.
The method further comprises, before the storing step: extracting a plurality of keywords from the texts of a plurality of documents used in the construction industry, from search words used for document search in the construction industry, and from keywords assigned to documents when they were registered in the construction industry; matching the extracted keywords against one another to select the matching keywords; deriving basic words from the selected keywords; and generating the basic word table having the derived basic words.
The method further comprises, before the storing step, generating the compound word table of compound words each composed of at least one of the basic words, the derivative word table holding derivative words of the basic words, and the similarity table holding similar words of the basic words.
The step of extracting the plurality of keywords comprises: extracting words of at least two syllables from the text, the search words, and the assigned keywords; segmenting the extracted words into the smallest word units that still carry meaning; removing duplicate words from the segmented words; and extracting the plurality of keywords by excluding general terms and terms not used in the construction industry, as specified in a previously stored erasure term table.
According to another aspect of the present invention, there is provided a method of managing a document, the method comprising: storing a metadata library including a basic word table in which a plurality of basic words consisting of keywords used in the construction industry are organized in a classification system with multiple hierarchical levels, a compound word table of compound words each composed of at least one of the basic words, a derivative word table holding derivative words of the basic words, and a similarity table holding similar words of the basic words, and storing documents to which at least one of a basic word, a derivative word, and a similar word is assigned as a tag; when a search word is input, searching the metadata library and, if the search word is a basic word, searching for documents tagged with that basic word or with a derivative word or similar word of that basic word; and providing the search results.
The method further comprises, if the search word is a derivative word or a similar word, searching the metadata library to convert it into its basic word, searching for documents tagged with the converted basic word or with a derivative word or similar word of that basic word, and providing the search results.
Further, a method for managing a document according to an embodiment of the present invention includes deriving, from the basic word table, the upper basic word of the basic word used in the search, deriving from the compound word table all compound words that include the derived upper basic word, and providing them as related search words.
The step of providing the search results may include sorting and providing the search results in descending order of tag match score, which expresses the appearance frequency of the tag serving as the search word relative to the appearance frequencies of the other tags in the retrieved document.
In the step of providing the search results, the tag match score is calculated through

$$T_i = \frac{f_i}{\sum_{j} f_j}$$

where $T_i$ is the tag match score of tag $i$, $f_i$ is the appearance frequency of that tag, and the sum runs over all tags $j$ of the document.
The step of providing the search results may include assigning a weight to each of a tag match score, which expresses the appearance frequency of the tag serving as the search word relative to the appearance frequencies of the other tags in the retrieved document, a file name match score, which indicates whether the file name or block name of the retrieved document includes the search word, and a document rating, which is a score given to the document by users, and sorting and providing the search results in descending order of the sum of the weighted tag match score, file name match score, and document rating.
In the step of providing the search results, the summed score is calculated through

$$S = w_T \hat{T} + w_F \hat{F} + w_R \hat{R}$$

where $S$ is the document ranking score, that is, the summed score, $\hat{T}$ is the normalized tag match score, $\hat{F}$ is the normalized file name match score, $\hat{R}$ is the normalized document rating, $w_T$ is the weight for the tag match score, $w_F$ is the weight for the file name match score, and $w_R$ is the weight for the document rating.
According to another aspect of the present invention, there is provided a method of managing a document, the method comprising: storing a metadata library including a basic word table in which a plurality of basic words consisting of keywords used in the construction industry are organized in a classification system with multiple hierarchical levels, a compound word table of compound words each composed of at least one of the basic words, a derivative word table holding derivative words of the basic words, and a similarity table holding similar words of the basic words, storing documents to which at least one of a basic word, a derivative word, and a similar word is assigned as a tag, and storing a classification system specifying, according to preset criteria, the classification to which each tag belongs; when a classification is input, extracting the tags belonging to the classification; searching for documents to which the extracted tags are assigned; and providing the search results.
According to another aspect of the present invention, there is provided a computer-readable recording medium having recorded thereon a program for executing a method for managing a document according to an embodiment of the present invention described above through a computer.
According to the present invention as described above, a tag can be assigned to a drawing using a metadata library specialized for the construction field, and documents can be searched through the tag, which makes precise and accurate search possible.
FIG. 1 is a diagram for explaining a system for document management using a metadata library according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating a configuration of a document management apparatus according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating a method of generating a metadata library according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating a method of registering a document using a metadata library according to an embodiment of the present invention.
FIG. 5 is a view illustrating a method of designating a part of a document as a registration area according to an embodiment of the present invention.
FIG. 6 is a flowchart illustrating a method of registering a document using a metadata library according to another embodiment of the present invention.
FIG. 7 is a flowchart illustrating a method of searching for a document using a metadata library according to an embodiment of the present invention.
FIG. 8 is a view showing a result of searching a document using a metadata library according to an embodiment of the present invention.
FIG. 9 is a flowchart for explaining a method of retrieving a document according to another embodiment of the present invention.
Prior to the detailed description of the present invention, note that the terms and words used in this specification and the claims should not be construed as limited to their ordinary or dictionary meanings. They should be construed in accordance with the technical idea of the present invention, based on the principle that an inventor may appropriately define the concept of a term in order to describe his or her own invention in the best way. Therefore, the embodiments described in this specification and the configurations shown in the drawings are merely the most preferred embodiments of the present invention and do not represent all of its technical ideas, and it should be understood that various equivalents and variations may exist.
Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. Note that, in the drawings, the same components are denoted by the same reference symbols wherever possible. Detailed descriptions of known functions and configurations that may obscure the gist of the present invention are omitted. For the same reason, some elements in the accompanying drawings are exaggerated, omitted, or shown schematically, and the size of each element does not entirely reflect its actual size.
First, a system for document management using a metadata library according to an embodiment of the present invention will be described. FIG. 1 is a diagram for explaining a system for document management using a metadata library according to an embodiment of the present invention.
A system for document management according to an embodiment of the present invention comprises a plurality of
Document registration and document retrieval, which will be described below, will be described as if the
Hereinafter, the
Referring to FIG. 2, the
The
The
The
The
The
The
The present invention manages drawings using a metadata library. A method of generating such a metadata library will be described. FIG. 3 is a flowchart illustrating a method of generating a metadata library according to an embodiment of the present invention.
The
Here, the
Common keywords used in construction / other fields
Next, in step S120, the
Next, in step S130, the
Next, in step S140, the
In step S160, the
Next, in step S170, the
In step S180, the
As shown, the compound word table of Table 4 includes a plurality of compound words, each of which consists of a plurality of basic words. Accordingly, each compound word has the codes of its basic words as attributes (attributes 1, 2, 3). In addition, the derivative word table of Table 5 has a basic word and at least one derivative word for that basic word, and the similarity table of Table 6 has a basic word and at least one similar word for that basic word.
The metadata library according to the embodiment of the present invention can be constructed by the method described above. The metadata library includes a basic word table having a plurality of basic words consisting of keywords used in the construction industry, a compound word table derived from the basic words, a derivative word table derived from the basic words, and a similarity table derived from the basic words. In particular, the metadata library can update the basic word table, compound word table, derivative word table, and similarity table by continuously reflecting the data (keywords) accumulated in practice.
Next, a registration procedure for deriving a tag for a document using the metadata library according to an embodiment of the present invention and assigning the tag to the document will be described. FIG. 4 is a flowchart illustrating a method of registering a document using a metadata library according to an embodiment of the present invention. FIG. 5 is a view illustrating a method of designating a part of a document as a registration area according to an embodiment of the present invention.
4, the
These documents can have various file formats. For example, the file format may be HWP, DOC, PDF, XLS, PPT, DWG, and the like. The user can select and register some or all of these documents. Accordingly, the user can make an input specifying some or all of the document. According to the embodiment of the present invention, when a user selects a part of a document, a part of the document can be selected as a registration area according to a method corresponding to a file format. The
In this manner, when the user makes an input for specifying an area to be registered according to the file format, the
Subsequently, in step S240, the
In step S260, the
Next, in step S270, the
On the other hand, if the user does not have a desired tag among the displayed candidate tags, or if the user determines that an additional tag is needed, the user can input a keyword to be used as a tag. Then, in step S280, the
The user can select at least one of a plurality of candidate tags displayed on the
On the other hand, as described above, when only a part of the document is registered without registering the entire document, only the registration area designated separately from the original document can be stored as a file. At this time, the file name of the file of the registration area to be stored separately is referred to as a block name. Such a block name may be designated by a name input by the user, or may be specified in a format in which an additional serial number is appended to the file name of the original document, or may be designated as one of the tags of the corresponding registration area.
Next, a method of tagging a document using the metadata library according to another embodiment of the present invention will be described. FIG. 6 is a flowchart illustrating a method of registering a document using a metadata library according to another embodiment of the present invention.
Referring to FIG. 6, another embodiment of the present invention starts from a time when the
That is, the
Next, in step S370, the
In step S380, the
When the tag is determined, the
As described above, the tag can be derived using the metadata library, and the tag can be given to the document and registered. Hereinafter, a method of retrieving a document using a metadata library and a tag will be described. FIG. 7 is a flowchart illustrating a method of searching for a document using a metadata library according to an embodiment of the present invention. FIG. 8 is a view showing a result of searching a document using a metadata library according to an embodiment of the present invention.
Referring to FIG. 7, when the search process starts according to the user's selection, the
At this time, it is determined whether there is a basic word matching the search term in step S440. That is, if there is a basic word matching the search term based on the basic word table as shown in Table 4, the
In step S450, the
On the other hand, when the search word is not matched to any of the basic word, derivation word, and similar word, the
As described above, after the search is completed, the
A method of deriving a related search term will be described in more detail with reference to Tables 3 and 4. Referring to Tables 3 and 4, each compound word has codes of basic words that are mapped to basic words as attributes. Through this, the
Finally, the
Meanwhile, the documents retrieved in the embodiment of the present invention can be sorted as follows. The tag that appears most frequently when a document is registered is the one that best describes the document. Accordingly, an embodiment of the present invention uses a tag match score, which expresses the appearance frequency of the corresponding tag (the search term) relative to the appearance frequencies of the other tags in the document. According to one embodiment, the tag match score is calculated by Equation (1):

$$T_i = \frac{f_i}{\sum_{j} f_j} \qquad (1)$$

$T_i$: Tag match score of tag $i$
$f_i$: Appearance frequency of tag $i$ (the sum runs over all tags $j$ of the document)
For example, suppose that a search with the keyword 'supervision' retrieves document A and document B, both of which carry the tag 'supervision'. The tags of document A are 'supervision (frequency: 8)', 'scheme (frequency: 3)', and 'construction (frequency: 7)', and the tags of document B are 'supervision (frequency: 6)', 'construction (frequency: 2)', and 'report (frequency: 3)'. The tag match score of document A for the keyword 'supervision' is 8 / (8 + 3 + 7) = 8/18 ≈ 0.444, and the tag match score of document B is 6 / (6 + 2 + 3) = 6/11 ≈ 0.545. Accordingly, document B, which has the higher tag match score, is placed ahead of document A in the search results.
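The arithmetic of this example can be checked with a short Python sketch. The function name `tag_match_score` and the dictionary layout are assumptions made for illustration; the frequencies are the ones given for documents A and B.

```python
def tag_match_score(tag_frequencies, search_tag):
    """Frequency of the searched tag divided by the summed frequency of all tags of the document."""
    total = sum(tag_frequencies.values())
    if total == 0 or search_tag not in tag_frequencies:
        return 0.0
    return tag_frequencies[search_tag] / total


# Tag frequencies of the two example documents.
doc_a = {"supervision": 8, "scheme": 3, "construction": 7}
doc_b = {"supervision": 6, "construction": 2, "report": 3}

scores = {
    "A": tag_match_score(doc_a, "supervision"),  # 8 / 18
    "B": tag_match_score(doc_b, "supervision"),  # 6 / 11
}
# Sort document names in descending order of tag match score.
ranking = sorted(scores, key=scores.get, reverse=True)
```

Although 'supervision' appears more often in document A in absolute terms, it makes up a larger share of document B's tags, so B ranks first.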
Meanwhile, according to another embodiment of the present invention, the file name match score and the document rating can be used as sorting parameters in addition to the tag match score. The file name match score expresses how well the search word matches the file name (or block name): it is 1 if the search word is included in the words of the file name or block name, and 0 otherwise. The document rating is a score of the document's usefulness that users assign after viewing the document.
For example, it is assumed that the search term is 'Supervision' and the
It is also assumed that the tag and the frequency of the tag when registering
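According to another embodiment of the present invention, the document ranking score is calculated as the weighted sum of Equation (2):

$$S = w_T \hat{T} + w_F \hat{F} + w_R \hat{R} \qquad (2)$$

$S$: Document ranking score
$\hat{T}$: Normalized tag match score
$\hat{F}$: Normalized file name match score
$\hat{R}$: Normalized document rating
$w_T$: Weight for tag match
$w_F$: Weight for file name match
$w_R$: Weight for document ratings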
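A weighted sum of this kind can be sketched in Python as follows. The weight values, the case-insensitive substring test for the file name match, and the assumption that all three inputs are already normalized to the range 0 to 1 are illustrative choices, not values or methods stated in the patent.

```python
def file_name_match(search_term, name):
    """1 if the search term occurs in the file name or block name, else 0 (case-insensitive here)."""
    return 1 if search_term.lower() in name.lower() else 0


def document_ranking_score(tag_score, file_score, rating_score, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of normalized tag match, file name match, and document rating scores."""
    w_tag, w_file, w_rating = weights  # hypothetical weights
    return w_tag * tag_score + w_file * file_score + w_rating * rating_score
```

For instance, a document with a normalized tag match score of 0.5, a matching file name, and a normalized rating of 0.8 would receive 0.5 × 0.5 + 0.3 × 1 + 0.2 × 0.8 = 0.71 under these assumed weights.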
Referring to Equation (2), the document ranking score of
First, the
Also, referring to Equation (2), the document ranking score of
In the same way as
As described above, according to another embodiment of the present invention, when the search result is sorted and provided, the
Next, a method of retrieving a document according to another embodiment of the present invention will be described. FIG. 9 is a flowchart for explaining a method of retrieving a document according to another embodiment of the present invention. According to this embodiment, classification systems exist (a project classification system, a year classification system, a work classification system, and so on), and every tag belongs to one of them. That is, a classification system specifies, according to preset criteria, the classification to which each tag belongs. The classification system is stored in advance in the storage unit.
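A classification-based search of this kind could look like the following Python sketch. The classification entries, tags, document names, and the function `search_by_classification` are hypothetical and serve only to show the lookup from a classification to its tags and from those tags to documents.

```python
# Hypothetical classification system: (system, classification) -> tags belonging to it.
CLASSIFICATION_TAGS = {
    ("year", "2014"): {"2014"},
    ("work", "supervision"): {"supervision", "inspection"},
}
# Hypothetical registered documents and their tags.
DOCUMENT_TAGS = {
    "doc_a.pdf": {"supervision", "2014"},
    "doc_b.dwg": {"inspection"},
    "doc_c.hwp": {"schedule", "2015"},
}


def search_by_classification(system, classification):
    """Extract the tags of the chosen classification, then find the documents carrying any of them."""
    tags = CLASSIFICATION_TAGS.get((system, classification), set())
    return sorted(name for name, doc_tags in DOCUMENT_TAGS.items() if doc_tags & tags)
```

Selecting the 'supervision' classification of the work classification system returns every document tagged 'supervision' or 'inspection', without the user typing a search word.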
Referring to FIG. 9, when the search process starts according to the user's selection, the
Then, the
Meanwhile, a method for managing a document using a metadata library, including the document registration method and the document retrieval method according to an embodiment of the present invention, can be implemented in the form of a program readable by various computer means and recorded on a computer-readable recording medium. Here, the recording medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the recording medium may be specially designed and constructed for the present invention, or may be known and available to those skilled in the art of computer software. For example, the recording medium may be a magnetic medium such as a hard disk, a floppy disk, or a magnetic tape, an optical medium such as a CD-ROM or a DVD, a magneto-optical medium such as a floptical disk, or a hardware device specially configured to store and execute program instructions, such as ROM, RAM, or flash memory. Examples of program instructions include machine language code such as that produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like. Such a hardware device may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.
While the present invention has been described with reference to several preferred embodiments, these embodiments are illustrative and not restrictive. It will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit of the invention and the scope of the appended claims.
100: Document management device
110:
120: Input unit
130:
140:
150:
151: Metadata module
153: Document registration module
155: Document Search Module
Claims (28)
An apparatus for managing a document, comprising: a storage unit for storing a metadata library including a basic word table in which a plurality of basic words consisting of keywords used in the construction industry are organized in a classification system with multiple hierarchical levels, a compound word table of compound words each composed of a combination of at least one of the basic words, a derivative word table having derivative words of the basic words, and a similarity table having similar words of the basic words; and
a document registration module that extracts a plurality of keywords from a document, extracts the basic words matching the keywords from the basic word table, extracts the derivative words of the extracted basic words from the derivative word table, extracts the similar words of the extracted basic words from the similarity table, selects the extracted basic words, derivative words, and similar words as candidate tags, selects at least one of the candidate tags as a tag, assigns the selected tag to the document, and stores it in the storage unit.
The document registration module derives, from the document, keywords that have a predetermined appearance frequency and are not basic words, similar words, or derivative words, assigns at least one of the derived keywords to the document as a tag, and stores it in the storage unit.
The document registration module
extracts, when a predetermined time or a preset period arrives, documents that have not yet been registered from among a plurality of documents stored in a predetermined folder, and performs the registration on the extracted documents.
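The folder scan behind this periodic registration might look like the sketch below. It assumes registered documents are tracked by file name in a set and omits the scheduling itself; `unregistered_documents` and the sample file names are hypothetical.

```python
import tempfile
from pathlib import Path


def unregistered_documents(folder: Path, registered: set[str]) -> list[Path]:
    """List files in the folder whose names are not yet recorded as registered."""
    return sorted(
        path for path in folder.iterdir()
        if path.is_file() and path.name not in registered
    )


# Illustrative run against a temporary folder containing two empty files.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "plan.dwg").write_text("")
    (folder / "spec.pdf").write_text("")
    pending = [p.name for p in unregistered_documents(folder, registered={"plan.dwg"})]
```

A scheduler (for example a cron job or a timer thread) would call such a function at the predetermined time and pass each returned path to the registration step.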
Further comprising a metadata module that extracts a plurality of keywords from the texts of a plurality of documents used in the construction industry field, from search words used for document searches in the construction industry field, and from keywords,
matches the extracted keywords against one another to select matched keywords, and derives basic words from the selected keywords to generate the basic word table having the derived basic words.
The metadata module
generates the compound word table composed of combinations of at least one of the basic words, the derived word table having derived words of the basic words, and the similarity table having similar words of the basic words.
The metadata module,
when extracting the plurality of keywords,
extracts words composed of two or more syllables from the text, the search word, and the keyword, segments the extracted words into minimum word units whose meaning can be recognized, removes redundant words from the segmented words, and extracts the plurality of keywords by excluding general terms and terms that are not construction industry terms, as specified in a previously stored erasure term table.
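The four extraction steps above can be sketched as a small pipeline. This is an illustration under stated assumptions: one Hangul character is counted as one syllable, words are separated by whitespace, and the `segment` callable stands in for a real morphological analyzer (by default it returns the word unchanged). All names and sample data are hypothetical.

```python
from typing import Callable, Iterable


def extract_keywords(
    sources: Iterable[str],
    erasure_terms: set[str],
    segment: Callable[[str], list[str]] = lambda word: [word],
) -> list[str]:
    """Return de-duplicated keywords in first-seen order."""
    keywords: list[str] = []
    seen: set[str] = set()
    for source in sources:
        # Step 1: keep words of two or more syllables (one character = one syllable here).
        words = [w for w in source.split() if len(w) >= 2]
        for word in words:
            # Step 2: split each word into minimum meaningful units.
            for unit in segment(word):
                # Step 3: drop duplicates. Step 4: drop erasure-table terms.
                if unit in seen or unit in erasure_terms:
                    continue
                seen.add(unit)
                keywords.append(unit)
    return keywords


# Sample text: 철근 (rebar), 콘크리트 (concrete), 보 (beam, one syllable), 도면 (drawing).
result = extract_keywords(["철근 콘크리트 보 철근 도면"], erasure_terms={"도면"})
```

Here the one-syllable word is dropped in step 1, the repeated word is removed as a duplicate, and the general term listed in the erasure table is excluded.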
An apparatus for managing documents, comprising:
A storage unit for storing a metadata library including a basic word table in which a plurality of basic words, which are keywords used in the construction industry field, are arranged in a classification system having a plurality of hierarchical levels, a compound word table composed of combinations of at least one of the basic words, a derived word table having derived words of the basic words, and a similarity table having similar words of the basic words, and for storing documents to which at least one of a basic word, a derived word, and a similar word of the metadata library is assigned as a tag;
An input unit for receiving a search word from a user; and
A document search module that, when the search word is input, searches the metadata library and, if the search word is a basic word, retrieves from the storage unit documents to which the basic word that is the search word, a derived word of that basic word, or a similar word of that basic word is assigned as a tag, and provides the search results.
The document search module,
if the search word is a derived word or a similar word, searches the metadata library to convert the derived word or similar word into a basic word, retrieves from the storage unit documents to which the converted basic word, a derived word of the converted basic word, or a similar word of the converted basic word is assigned as the tag, and provides the search results.
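The search behaviour described in the two claims above (direct lookup for a basic word, conversion to a basic word for a derived or similar word) can be sketched as follows. The dictionary-based tables, function names, and sample documents are hypothetical; the patent does not prescribe a data layout.

```python
from typing import Optional


def to_basic_word(
    term: str,
    basic_words: set[str],
    derived_of: dict[str, set[str]],
    similar_of: dict[str, set[str]],
) -> Optional[str]:
    """Map a search word to its basic word, or None if it is not in the library."""
    if term in basic_words:
        return term
    for table in (derived_of, similar_of):
        for base, variants in table.items():
            if term in variants:
                return base
    return None


def search(
    term: str,
    basic_words: set[str],
    derived_of: dict[str, set[str]],
    similar_of: dict[str, set[str]],
    document_tags: dict[str, set[str]],
) -> list[str]:
    """Return documents tagged with the basic word, its derived words, or its similar words."""
    base = to_basic_word(term, basic_words, derived_of, similar_of)
    if base is None:
        return []
    accepted = {base} | derived_of.get(base, set()) | similar_of.get(base, set())
    return sorted(doc for doc, tags in document_tags.items() if tags & accepted)


# Illustrative data (assumed).
basic = {"beam"}
derived = {"beam": {"beams"}}
similar = {"beam": {"girder"}}
docs = {"a.dwg": {"girder"}, "b.dwg": {"column"}, "c.dwg": {"beam"}}
hits = search("girder", basic, derived, similar, docs)
```

Searching for the similar word "girder" therefore also finds the document tagged only with the basic word "beam".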
The document search module
derives, from the basic word table, an upper-level basic word of the basic word used in the search, derives from the compound word table all compound words that include the derived upper-level basic word, and provides them as related search words.
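A minimal sketch of the related-search-word derivation follows. It assumes the hierarchy of the basic word table is available as a child-to-parent mapping and that each compound word is stored with its component basic words; both structures and all sample entries are hypothetical.

```python
def related_search_words(
    basic_word: str,
    parent_of: dict[str, str],
    compound_table: dict[str, tuple[str, ...]],
) -> list[str]:
    """Return compound words that contain the upper-level basic word of the given word."""
    upper = parent_of.get(basic_word)
    if upper is None:
        return []  # top-level word: no upper-level basic word to expand from
    return sorted(
        compound for compound, parts in compound_table.items() if upper in parts
    )


# Illustrative data (assumed): "H-beam" sits one level below "beam".
parent_of = {"H-beam": "beam"}
compound_table = {
    "beam schedule": ("beam", "schedule"),
    "column schedule": ("column", "schedule"),
    "steel beam": ("steel", "beam"),
}
related = related_search_words("H-beam", parent_of, compound_table)
```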
The document search module
sorts and provides the search results in descending order of a tag match score, which indicates the appearance frequency of the tag that is the search word relative to the appearance frequencies of the other tags in the retrieved document.
The document search module
calculates the tag match score using the following equation,
[Equation]
where the symbols denote, in order, the tag match score and the appearance frequency of the tag.
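The equation itself is not reproduced in this text, so the following Python sketch shows only one plausible reading, stated here as an assumption: the score is the appearance frequency of the searched tag divided by the total appearance frequency of all tags in the document. The function name and sample counts are hypothetical, and the patent's actual formula may differ.

```python
def tag_match_score(tag: str, tag_frequencies: dict[str, int]) -> float:
    """Assumed form: frequency of the searched tag over the total frequency of all tags."""
    total = sum(tag_frequencies.values())
    if total == 0:
        return 0.0
    return tag_frequencies.get(tag, 0) / total


# Illustrative tag frequencies per document (assumed).
documents = {
    "a.dwg": {"beam": 1, "slab": 3},
    "b.dwg": {"beam": 6, "column": 2},
}
ranked = sorted(
    documents, key=lambda doc: tag_match_score("beam", documents[doc]), reverse=True
)
```

Under this assumed ratio, a document dominated by the searched tag is listed ahead of one where the tag appears only incidentally.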
The document search module
normalizes and weights a tag match score, which indicates the appearance frequency of the tag that is the search word relative to the appearance frequencies of the other tags in the retrieved document, a file name match score, which indicates whether the file name or block name of the retrieved document includes the search word, and a document rating, and sorts and provides the search results in descending order of the sum of the weighted tag match score, file name match score, and document rating.
The document search module
calculates the summed score using the following equation,
[Equation]
where the symbols denote, in order:
the document ranking score, which is the summed score;
the normalized tag match score;
the normalized file name match score;
the normalized document rating;
the weight for the tag match score;
the weight for the file name match score; and
the weight for the document rating.
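Based on the symbol definitions above, the summed score reads as a weighted sum of three normalized components. The sketch below assumes that form and assumes each component has already been normalized to the range 0 to 1; the `Weights` class, the weight values, and the sample scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    """Hypothetical weights for the three normalized score components."""
    tag: float
    file_name: float
    rating: float


def ranking_score(
    tag_match: float, file_name_match: float, rating: float, weights: Weights
) -> float:
    """Weighted sum of normalized tag match, file name match, and document rating."""
    return (
        weights.tag * tag_match
        + weights.file_name * file_name_match
        + weights.rating * rating
    )


# Illustrative values (assumed): each tuple is (tag match, file name match, rating).
weights = Weights(tag=0.5, file_name=0.25, rating=0.25)
scores = {
    "a.dwg": ranking_score(0.5, 1.0, 0.0, weights),
    "b.dwg": ranking_score(1.0, 0.0, 0.5, weights),
}
ranked = sorted(scores, key=scores.get, reverse=True)
```

Adjusting the weights shifts the ordering, for example toward documents whose file name contains the search word.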
An apparatus for managing documents, comprising:
A storage unit for storing a metadata library including a basic word table in which a plurality of basic words, which are keywords used in the construction industry field, are arranged in a classification system having a plurality of hierarchical levels, a compound word table composed of combinations of at least one of the basic words, a derived word table having derived words of the basic words, and a similarity table having similar words of the basic words, for storing documents to which at least one of a basic word, a derived word, and a similar word of the metadata library is assigned as a tag, and for storing a classification system in which the tags belonging to each classification are assigned;
An input unit for receiving at least one classification of the classification system from a user; and
A document search module that, when the classification is input, extracts the tags belonging to the classification, retrieves documents to which the extracted tags are assigned, and provides the search results.
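Classification-based search can be sketched as a two-step lookup: classification to tags, then tags to documents. The mapping structures, names, and sample data below are hypothetical illustrations.

```python
def search_by_classification(
    classification: str,
    tags_by_classification: dict[str, set[str]],
    document_tags: dict[str, set[str]],
) -> list[str]:
    """Return documents carrying at least one tag that belongs to the classification."""
    tags = tags_by_classification.get(classification, set())
    return sorted(doc for doc, assigned in document_tags.items() if assigned & tags)


# Illustrative data (assumed).
tags_by_classification = {"structure": {"beam", "column"}, "finish": {"paint"}}
document_tags = {
    "a.dwg": {"beam"},
    "b.dwg": {"paint"},
    "c.dwg": {"column", "paint"},
}
structure_docs = search_by_classification(
    "structure", tags_by_classification, document_tags
)
```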
A method for managing documents, comprising:
Storing a metadata library including a basic word table in which a plurality of basic words, which are keywords used in the construction industry field, are arranged in a classification system having a plurality of hierarchical levels, a compound word table composed of combinations of at least one of the basic words, a derived word table having derived words of the basic words, and a similarity table having similar words of the basic words;
Extracting a plurality of keywords from a document, extracting basic words matching the keywords from the basic word table, extracting derived words of the extracted basic words from the derived word table, and extracting similar words of the extracted basic words from the similarity table;
Selecting the extracted basic words, derived words, and similar words as candidate tags, and selecting at least one of the candidate tags as a tag; and
Performing registration by assigning the selected tag to the document and storing it.
The step of selecting as the tag
further comprises deriving, from the document, keywords having a predetermined appearance frequency among the keywords that are not basic words, similar words, or derived words, and selecting at least one of the derived keywords as the tag.
Before the storing step, further comprising:
Extracting a plurality of keywords from the texts of a plurality of documents used in the construction industry field, from search words used for document searches in the construction industry field, and from keywords; and
Matching the extracted keywords against one another to select matched keywords, and deriving basic words from the selected keywords to generate the basic word table having the derived basic words.
After the step of generating the basic word table,
further comprising generating the compound word table composed of combinations of at least one of the basic words, the derived word table having derived words of the basic words, and the similarity table having similar words of the basic words.
The step of extracting the plurality of keywords comprises:
Extracting words composed of two or more syllables from the text, the search word, and the keyword;
Segmenting the extracted words into minimum word units whose meaning can be recognized;
Removing redundant words from the segmented words; and
Extracting the plurality of keywords by excluding general terms and terms that are not construction industry terms, as specified in a previously stored erasure term table.
A method for managing documents, comprising:
Storing a metadata library including a basic word table in which a plurality of basic words, which are keywords used in the construction industry field, are arranged in a classification system having a plurality of hierarchical levels, a compound word table composed of combinations of at least one of the basic words, a derived word table having derived words of the basic words, and a similarity table having similar words of the basic words, and storing documents to which at least one of a basic word, a derived word, and a similar word of the metadata library is assigned as a tag; and
When a search word is input, searching the metadata library and, if the search word is a basic word, retrieving documents to which the basic word that is the search word, a derived word of that basic word, or a similar word of that basic word is assigned as a tag, and providing the search results.
Further comprising, if the search word is a derived word or a similar word, converting the derived word or similar word into a basic word, retrieving documents to which the converted basic word, a derived word of the converted basic word, or a similar word of the converted basic word is assigned as the tag, and providing the search results.
Further comprising deriving, from the basic word table, an upper-level basic word of the basic word used in the search, deriving from the compound word table all compound words that include the derived upper-level basic word, and providing them as related search words.
The step of providing the search results
comprises sorting the search results in descending order of a tag match score, which indicates the appearance frequency of the tag that is the search word relative to the appearance frequencies of the other tags in the retrieved document.
The step of providing the search results
comprises calculating the tag match score using the following equation,
[Equation]
where the symbols denote, in order, the tag match score and the appearance frequency of the tag.
The step of providing the search results
comprises normalizing and weighting a tag match score, which indicates the appearance frequency of the tag that is the search word relative to the appearance frequencies of the other tags in the retrieved document, a file name match score, which indicates whether the file name or block name of the retrieved document includes the search word, and a document rating, and sorting the search results in descending order of the sum of the weighted tag match score, file name match score, and document rating.
The step of providing the search results
comprises calculating the summed score using the following equation,
[Equation]
where the symbols denote, in order:
the document ranking score, which is the summed score;
the normalized tag match score;
the normalized file name match score;
the normalized document rating;
the weight for the tag match score;
the weight for the file name match score; and
the weight for the document rating.
A method for managing documents, comprising:
Storing a metadata library including a basic word table in which a plurality of basic words, which are keywords used in the construction industry field, are arranged in a classification system having a plurality of hierarchical levels, a compound word table composed of combinations of at least one of the basic words, a derived word table having derived words of the basic words, and a similarity table having similar words of the basic words, storing documents to which at least one of a basic word, a derived word, and a similar word of the metadata library is assigned as a tag, and storing a classification system in which the tags belonging to each classification are assigned;
Extracting, when a classification is input, the tags belonging to the classification; and
Retrieving documents to which the extracted tags are assigned and providing the search results.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020150090254A KR101662527B1 (en) | 2015-06-25 | 2015-06-25 | An apparatus for managing document using meta-data library, related a plurality of drawings, a method thereof, and a computer recordable medium storing the method |
Publications (1)
Publication Number | Publication Date |
---|---|
KR101662527B1 (en) | 2016-10-14 |
Family
ID=57157323
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020150090254A KR101662527B1 (en) | 2015-06-25 | 2015-06-25 | An apparatus for managing document using meta-data library, related a plurality of drawings, a method thereof, and a computer recordable medium storing the method |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR101662527B1 (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20060130893A (en) | 2005-06-09 | 2006-12-20 | 이상호 | Project product auto converting module for cals/ec |
WO2007052285A2 (en) * | 2005-07-22 | 2007-05-10 | Yogesh Chunilal Rathod | Universal knowledge management and desktop search system |
2015-06-25 KR KR1020150090254A patent/KR101662527B1/en active IP Right Grant
Legal Events
Date | Code | Title | Description |
---|---|---|---|
E701 | Decision to grant or registration of patent right | ||
GRNT | Written decision to grant | ||
FPAY | Annual fee payment | Payment date: 20190731; Year of fee payment: 4 |