
KR101662527B1 - An apparatus for managing document using meta-data library, related a plurality of drawings, a method thereof, and a computer recordable medium storing the method - Google Patents



Publication number
KR101662527B1
Authority
KR
South Korea
Prior art keywords
word
document
basic
tag
search
Prior art date
Application number
KR1020150090254A
Other languages
Korean (ko)
Inventor
구교진
박형진
조동현
박상헌
Original Assignee
서울시립대학교 산학협력단
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 서울시립대학교 산학협력단
Priority to KR1020150090254A
Application granted
Publication of KR101662527B1

Classifications

    • G06F17/218
    • G06F17/277
    • G06F17/2795
    • G06F17/30967

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to an apparatus for managing documents using a metadata library, a method therefor, and a computer-readable recording medium on which the method is recorded. The apparatus includes a storage unit and a document registration module. The storage unit stores a metadata library including a basic word table in which a plurality of basic words, made up of keywords used in the construction industry, form a classification system with a plurality of hierarchical levels, a compound word table composed of combinations of one or more of the basic words, a derivative word table having derivative words of the basic words, and a similarity word table having similar words of the basic words. The document registration module extracts a plurality of keywords from a document, extracts basic words matching the keywords from the basic word table, extracts derivative words of the extracted basic words from the derivative word table, extracts similar words of the extracted basic words from the similarity word table, selects the extracted basic words, derivative words, and similar words as candidate tags, selects at least one of the candidate tags as a tag, assigns the selected tag to the document, and stores the document in the storage unit.

Description

Apparatus for managing a document using a metadata library, method therefor, and computer-readable recording medium on which the method is recorded

The present invention relates to a document management technique, and more particularly, to a device capable of managing a document using a metadata library, a method therefor, and a computer-readable recording medium having the method recorded thereon.

The documents generated in the construction process embody construction technology and knowledge and therefore have high practical value. However, because the physical contents of construction documents are specialized and diverse, they are difficult to share and reuse efficiently among project participants. Conventional methods for managing such construction documents rely on classification systems such as Uniclass in Europe, MasterFormat in North America, and the domestic construction CALS. A classification system has the advantage of sorting many types of items by consistent criteria, but it has the problem that an item may be ambiguous to classify or may belong to more than one classification item.

Korean Laid-Open Patent No. 2006-0130893, December 20, 2006 (Title: Project performance auto generation module of CALS/EC standard system)

An object of the present invention is to provide a device for managing a document used in a construction field by using a construction metadata library, a method therefor, and a computer readable recording medium on which the method is recorded.

According to an aspect of the present invention, there is provided an apparatus for managing a document, the apparatus including a storage unit and a document registration module. The storage unit stores a metadata library including a basic word table having a plurality of basic words made up of keywords used in the construction industry, a compound word table composed of combinations of one or more of the basic words, a derivative word table having derivative words of the basic words, and a similarity word table having similar words of the basic words. The document registration module extracts a plurality of keywords from the document, extracts basic words matching the keywords from the basic word table, extracts derivative words of the extracted basic words from the derivative word table, extracts similar words of the extracted basic words from the similarity word table, selects the extracted basic words, derivative words, and similar words as candidate tags, selects at least one of the candidate tags as a tag, and performs registration by assigning the selected tag to the document and storing the document in the storage unit.

The document registration module derives, from the document, keywords that have a predetermined appearance frequency and that are not basic words, similar words, or derivative words, and assigns at least one of the derived keywords to the document as a tag.
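
The candidate-tag selection described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the tiny in-memory tables, the function names `candidate_tags` and `frequent_extra_keywords`, and the frequency threshold `min_count` are hypothetical and do not come from the patent.

```python
from collections import Counter

# Hypothetical miniature metadata library; the entries are illustrative only.
BASIC_WORDS = {"concrete", "excavation"}
DERIVED_WORDS = {"concrete": ["concreting"], "excavation": ["excavate"]}
SIMILAR_WORDS = {"concrete": ["cement concrete"], "excavation": ["digging"]}


def candidate_tags(keywords):
    """Collect basic words that match the keywords, plus their derivative and similar words."""
    candidates = []
    for word in keywords:
        if word in BASIC_WORDS and word not in candidates:
            candidates.append(word)
            candidates.extend(DERIVED_WORDS.get(word, []))
            candidates.extend(SIMILAR_WORDS.get(word, []))
    return candidates


def frequent_extra_keywords(keywords, known, min_count=2):
    """Return keywords outside the library that appear at least min_count times (assumed threshold)."""
    counts = Counter(k for k in keywords if k not in known)
    return [word for word, n in counts.items() if n >= min_count]


doc_keywords = ["concrete", "formwork", "formwork", "excavation", "crane"]
tags = candidate_tags(doc_keywords)
extras = frequent_extra_keywords(doc_keywords, set(tags))
```

In this sketch a user would then pick at least one entry from `tags` (and optionally from `extras`) to attach to the document.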

When a preset time or a predetermined period arrives, the document registration module extracts the documents that have not been registered from among a plurality of documents stored in a predetermined folder, and performs the registration on the extracted documents.

The apparatus for managing a document according to an embodiment of the present invention further includes a metadata module that extracts a plurality of keywords from the text of a plurality of documents used in the construction industry, extracts a plurality of keywords from search terms used for document searches in the construction industry, extracts a plurality of keywords from the key words assigned to documents when the documents were registered in the construction industry, matches the extracted keywords against one another to select matching keywords, derives basic words from the selected keywords, and generates the basic word table.

The metadata module generates the compound word table composed of combinations of one or more of the basic words, the derivative word table having derivative words of the basic words, and the similarity word table having similar words of the basic words.

When extracting the plurality of keywords, the metadata module extracts words composed of at least two syllables from the text, the search terms, and the key words, segments the extracted words into minimum word units whose meaning can be recognized, and extracts the plurality of keywords by excluding the terms not used in the construction industry and the general terms that are specified in a previously stored erasing term table.

According to another aspect of the present invention, there is provided an apparatus for managing a document, the apparatus including a storage unit, an input unit, and a document retrieval module. The storage unit stores a metadata library including a basic word table having a plurality of basic words made up of keywords used in the construction industry, a compound word table composed of combinations of one or more of the basic words, a derivative word table having derivative words of the basic words, and a similarity word table having similar words of the basic words, and stores documents to which at least one of a basic word, a derivative word, and a similar word is assigned as a tag. The input unit receives a search term from a user. When the search term is input, the document retrieval module searches the metadata library for the corresponding basic word, retrieves from the storage unit the documents to which the basic word or a derivative or similar word of the basic word is assigned as a tag, and provides the search results.

If the search term is a derivative word or a similar word, the document retrieval module searches the metadata library to convert the derivative or similar word into its basic word, retrieves the documents to which the basic word is assigned as a tag, and provides the search results.

The document retrieval module derives, from the basic word table, the upper-level basic word of the basic word used in the search, derives from the compound word table all compound words that include the derived upper-level basic word, and provides them as related search terms.
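
As a rough illustration of the search-term conversion and related-search-term steps above, the following Python sketch assumes that basic words carry hierarchical codes in which each level adds two characters, and that a compound word "includes" a basic word when the basic word appears in it as a substring. The tables, words, and function names are hypothetical.

```python
# Hypothetical tables; the words and codes are illustrative, not taken from the patent.
BASIC_CODES = {"work": "W", "temporary work": "W01", "scaffold": "W0101"}
TO_BASIC = {"scaffolding": "scaffold", "staging": "scaffold"}  # derivative/similar -> basic
COMPOUND_WORDS = ["temporary work plan", "temporary work cost", "scaffold inspection"]


def normalize(term):
    """Convert a derivative or similar word to its basic word; other terms pass through."""
    return TO_BASIC.get(term, term)


def upper_basic_word(basic):
    """Return the basic word one level up, or None for top-level or unknown words."""
    code = BASIC_CODES.get(basic)
    if code is None or len(code) <= 1:
        return None
    parent_code = code[:-2]  # assumed: each level appends two characters to the code
    for word, candidate in BASIC_CODES.items():
        if candidate == parent_code:
            return word
    return None


def related_search_terms(term):
    """List every compound word that contains the upper-level basic word of the search term."""
    parent = upper_basic_word(normalize(term))
    if parent is None:
        return []
    return [compound for compound in COMPOUND_WORDS if parent in compound]
```

For example, under these assumed tables `related_search_terms("staging")` first maps "staging" to "scaffold", moves up to "temporary work", and returns the two compound words built on it.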

The document retrieval module sorts the search results in descending order of tag match score and provides them, where the tag match score indicates the appearance frequency of the tag that is the search term relative to the appearance frequencies of the other tags in the retrieved document.

The document retrieval module calculates the tag match score through the equation

Figure 112015061533558-pat00001
where
Figure 112015061533558-pat00002
is the tag match score, and
Figure 112015061533558-pat00003
is the appearance frequency of the tag.
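
The equation itself survives only as an image reference here, so the following Python sketch is an assumption rather than the patent's formula: it takes the tag match score to be the appearance frequency of the searched tag divided by the total appearance frequency of all tags in the document, which is one plausible reading of the surrounding text. The function name and sample data are hypothetical.

```python
def tag_match_score(tag_frequencies, query_tag):
    """Assumed form: frequency of the query tag relative to all tag frequencies in the document."""
    total = sum(tag_frequencies.values())
    if total == 0:
        return 0.0
    return tag_frequencies.get(query_tag, 0) / total


# Hypothetical documents with per-tag appearance frequencies.
docs = {
    "a.dwg": {"pile": 6, "excavation": 2},
    "b.dwg": {"pile": 1, "blasting": 3},
}

# Sort search results in descending order of tag match score for the search term "pile".
ranked = sorted(docs, key=lambda name: tag_match_score(docs[name], "pile"), reverse=True)
```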

The document retrieval module assigns a weight to each of three scores: a tag match score, indicating the appearance frequency of the tag that is the search term relative to the appearance frequencies of the other tags in the retrieved document; a file name match score, indicating whether the file name or block name of the retrieved document includes the search term; and a document rating, which is a score given to the document by users. It then sorts the search results in descending order of the sum of the weighted tag match score, file name match score, and document rating, and provides them.

The document retrieval module calculates the summed score through the equation

Figure 112015061533558-pat00004

where
Figure 112015061533558-pat00005
is the document ranking score, that is, the summed score,
Figure 112015061533558-pat00006
is the normalized tag match score,
Figure 112015061533558-pat00007
is the normalized file name match score,
Figure 112015061533558-pat00008
is the normalized document rating,
Figure 112015061533558-pat00009
is the weight for the tag match score,
Figure 112015061533558-pat00010
is the weight for the file name match score, and
Figure 112015061533558-pat00011
is the weight for the document rating.
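
A weighted sum of the three normalized scores, as the definitions above suggest, might be sketched in Python as follows. The weight values, the function name `ranking_score`, and the sample scores are assumptions chosen only for illustration; the inputs are taken to be already normalized to the range 0 to 1, since the normalization method is not stated in the text.

```python
def ranking_score(tag_score, file_name_score, rating_score, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of normalized scores; the default weights are hypothetical."""
    w_tag, w_name, w_rating = weights
    return w_tag * tag_score + w_name * file_name_score + w_rating * rating_score


# Hypothetical normalized (tag match, file name match, document rating) per document.
results = {
    "a.dwg": (0.75, 0.0, 0.8),
    "b.dwg": (0.25, 1.0, 0.6),
}

# Sort search results in descending order of the summed score.
ordered = sorted(results, key=lambda name: ranking_score(*results[name]), reverse=True)
```

With these assumed weights, a document whose file name contains the search term can outrank one with a higher tag match score, which is the kind of trade-off the weights are meant to control.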

According to another aspect of the present invention, there is provided an apparatus for managing a document, the apparatus including a storage unit and a document retrieval module. The storage unit stores a metadata library including a basic word table having a plurality of basic words made up of keywords used in the construction industry, a compound word table composed of combinations of one or more of the basic words, a derivative word table having derivative words of the basic words, and a similarity word table having similar words of the basic words; stores documents to which at least one of a basic word, a derivative word, and a similar word is assigned as a tag; and stores a classification system in which the classification to which each tag belongs is set according to a predetermined criterion. When a classification is input, the document retrieval module extracts the tags belonging to the classification, retrieves the documents to which the extracted tags are assigned, and provides the search results.

According to another aspect of the present invention, there is provided a method of managing a document, the method including: storing a metadata library including a basic word table having a plurality of basic words made up of keywords used in the construction industry, a compound word table composed of combinations of one or more of the basic words, a derivative word table having derivative words of the basic words, and a similarity word table having similar words of the basic words; extracting a plurality of keywords from a document; extracting basic words matching the keywords from the basic word table, extracting derivative words of the extracted basic words from the derivative word table, and extracting similar words of the extracted basic words from the similarity word table; selecting the extracted basic words, derivative words, and similar words as candidate tags; selecting at least one of the candidate tags as a tag; and performing registration by assigning the selected tag to the document and storing the document.

The step of selecting the tag may further include deriving, from the document, keywords that have a predetermined appearance frequency and that are not basic words, similar words, or derivative words, and selecting at least one of the derived keywords as the tag.

The method further includes: extracting a plurality of keywords from the text of a plurality of documents used in the construction industry, extracting a plurality of keywords from search terms used for document searches in the construction industry, and extracting a plurality of keywords from the key words assigned to documents when the documents were registered in the construction industry; matching the extracted keywords against one another to select matching keywords; and deriving basic words from the selected keywords to generate the basic word table.

The method further includes, before the storing step, generating the compound word table composed of combinations of one or more of the basic words, the derivative word table having derivative words of the basic words, and the similarity word table having similar words of the basic words.

The step of extracting the plurality of keywords includes: extracting words composed of at least two syllables from the text, the search terms, and the key words; segmenting the extracted words into minimum word units whose meaning can be recognized; removing duplicate words from the segmented words; and extracting the plurality of keywords by excluding the terms not used in the construction industry and the general terms that are specified in a previously stored erasing term table.

According to another aspect of the present invention, there is provided a method of managing a document, the method including: storing a metadata library including a basic word table having a plurality of basic words made up of keywords used in the construction industry, a compound word table composed of combinations of one or more of the basic words, a derivative word table having derivative words of the basic words, and a similarity word table having similar words of the basic words, and storing documents to which at least one of a basic word, a derivative word, and a similar word is assigned as a tag; and, when a search term is input, searching the metadata library for the corresponding basic word, retrieving the documents to which the basic word or a derivative or similar word of the basic word is assigned as a tag, and providing the search results.

The step of providing the search results includes, if the search term is a derivative word or a similar word, searching the metadata library to convert the derivative or similar word into its basic word, retrieving the documents to which the basic word is assigned as a tag, and providing the search results.

The method for managing a document according to an embodiment of the present invention further includes deriving, from the basic word table, the upper-level basic word of the basic word used in the search, deriving from the compound word table all compound words that include the derived upper-level basic word, and providing them as related search terms.

The step of providing the search results may include sorting the search results in descending order of tag match score and providing them, where the tag match score indicates the appearance frequency of the tag that is the search term relative to the appearance frequencies of the other tags in the retrieved document.

The step of providing the search results calculates the tag match score through the equation

Figure 112015061533558-pat00012
where
Figure 112015061533558-pat00013
is the tag match score, and
Figure 112015061533558-pat00014
is the appearance frequency of the tag.

The step of providing the search results may further include assigning a weight to each of a tag match score, indicating the appearance frequency of the tag that is the search term relative to the appearance frequencies of the other tags in the retrieved document, a file name match score, indicating whether the file name or block name of the retrieved document includes the search term, and a document rating, which is a score given to the document by users, and then sorting the search results in descending order of the sum of the weighted tag match score, file name match score, and document rating and providing them.

The step of providing the search results calculates the summed score through the equation

Figure 112015061533558-pat00015
where
Figure 112015061533558-pat00016
is the document ranking score, that is, the summed score,
Figure 112015061533558-pat00017
is the normalized tag match score,
Figure 112015061533558-pat00018
is the normalized file name match score,
Figure 112015061533558-pat00019
is the normalized document rating,
Figure 112015061533558-pat00020
is the weight for the tag match score,
Figure 112015061533558-pat00021
is the weight for the file name match score, and
Figure 112015061533558-pat00022
is the weight for the document rating.

According to another aspect of the present invention, there is provided a method of managing a document, the method including: storing a metadata library including a basic word table having a plurality of basic words made up of keywords used in the construction industry, a compound word table composed of combinations of one or more of the basic words, a derivative word table having derivative words of the basic words, and a similarity word table having similar words of the basic words, storing documents to which at least one of a basic word, a derivative word, and a similar word is assigned as a tag, and storing a classification system in which the classification to which each tag belongs is set according to a predetermined criterion; when a classification is input, extracting the tags belonging to the classification; retrieving the documents to which the extracted tags are assigned; and providing the search results.

According to another aspect of the present invention, there is provided a computer-readable recording medium having recorded thereon a program for executing a method for managing a document according to an embodiment of the present invention described above through a computer.

According to the present invention as described above, a tag can be assigned to a drawing using a metadata library specialized for the construction field, and documents can be searched through the tag, so that a precise and accurate search can be provided.

FIG. 1 is a diagram for explaining a system for document management using a metadata library according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating a configuration of a document management apparatus according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating a method of generating a metadata library according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating a method of registering a document using a metadata library according to an embodiment of the present invention.
FIG. 5 is a view illustrating a method of designating a part of a document as a registration area according to an embodiment of the present invention.
FIG. 6 is a flowchart illustrating a method of registering a document using a metadata library according to another embodiment of the present invention.
FIG. 7 is a flowchart illustrating a method of searching for a document using a metadata library according to an embodiment of the present invention.
FIG. 8 is a view showing a result of searching for a document using a metadata library according to an embodiment of the present invention.
FIG. 9 is a flowchart for explaining a method of retrieving a document according to another embodiment of the present invention.

Prior to the detailed description of the present invention, the terms and words used in this specification and the claims should not be construed as limited to their ordinary or dictionary meanings. They should be construed with meanings and concepts consistent with the technical idea of the present invention, based on the principle that an inventor may appropriately define the concepts of terms in order to describe his or her own invention in the best way. Therefore, the embodiments described in this specification and the configurations shown in the drawings are merely the most preferred embodiments of the present invention and do not represent all of its technical ideas, and it should be understood that various equivalents and modifications may exist.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. Note that, in the drawings, the same components are denoted by the same reference symbols wherever possible. Detailed descriptions of known functions and configurations that may obscure the gist of the present invention are omitted. For the same reason, some of the elements in the accompanying drawings are exaggerated, omitted, or schematically shown, and the size of each element does not entirely reflect its actual size.

First, a system for document management using a metadata library according to an embodiment of the present invention will be described. FIG. 1 is a diagram for explaining a system for document management using a metadata library according to an embodiment of the present invention.

A system for document management according to an embodiment of the present invention comprises a plurality of document management apparatuses 100. The document management apparatus 100 is a computing apparatus. It therefore basically includes a processor for computing operations and storage media such as memory and storage. The document management apparatus 100 further includes an I/O device for input and output, a display for displaying a screen, and a communication device for communicating with another document management apparatus 100 or a network. A typical example of such a document management apparatus 100 is a computer, but any other apparatus capable of computing operations can serve as a document management apparatus 100 according to an embodiment of the present invention. Some of the document management apparatuses 100 may operate as servers and others may operate as clients. As shown in the figure, each of a plurality of companies, for example companies A, B, and C, has a plurality of document management apparatuses 100, and each of these document management apparatuses 100 may operate as a server or as a client. In addition, a single document management apparatus 100 may serve all of the document management apparatuses 100 of the plurality of companies as its clients. The document management apparatuses 100 may be interconnected through an intranet or the Internet. Accordingly, the document management apparatuses 100 can be set up to share folders with one another.

Document registration and document retrieval, which are described below, are explained as if the document management apparatus 100 were a stand-alone apparatus and both operations were performed within a single document management apparatus 100. However, the present invention is not limited thereto. For example, in a client/server or cloud computing arrangement, the data necessary for document registration may be provided by a client, and the computing operation for registering a document using that data may be performed on the server. Likewise, a search term for a document search may be entered on the client, the computing operation for the search may be performed on the server, and the search results may be displayed on the client.

Hereinafter, the document management apparatus 100 according to the embodiment of the present invention will be described in more detail. FIG. 2 is a block diagram illustrating a configuration of a document management apparatus according to an embodiment of the present invention.

Referring to FIG. 2, the document management apparatus 100 includes an interface unit 110, an input unit 120, a display unit 130, a storage unit 140, and a control unit 150.

The interface unit 110 is an interface unit connected to the outside through a network connection or a connection with a peripheral device. That is, the interface unit 110 is an interface between the document management apparatus 100 and another document management apparatus 100, and is a communication means. The interface unit 110 may include a modem, an interface card, a wired / wireless LAN card, a USB port, a serial port, a parallel port, and a data bus. In particular, the interface unit 110 may be any unit capable of transmitting and receiving data with the outside of the document management apparatus 100 by various data transmission / reception means.

The input unit 120 receives a user's key operation for controlling various functions and operations of the document management apparatus 100, generates an input signal, and transmits the input signal to the control unit 150. The input unit 120 may be a keyboard, a mouse, or the like. The input unit 120 may include at least one of a power key for power on/off, a character key, a number key, and a direction key. When the display unit 130 is implemented as a touch screen, the function of the input unit 120 may be performed by the display unit 130, and the input unit 120 may be omitted if the display unit 130 alone can perform all of its functions.

The display unit 130 may receive data for screen display from the control unit 150 and display the received data on a screen. Also, the display unit 130 can visually provide menus, data, function setting information, and various other information of the document management apparatus 100 to the user. When the display unit 130 is formed by a touch screen, some or all of the functions of the input unit 120 may be performed instead. The display unit 130 may include a liquid crystal display (LCD), an organic light emitting diode (OLED), and an active matrix organic light emitting diode (AMOLED).

The storage unit 140 stores the various kinds of data and applications required for the operation of the document management apparatus 100, as well as the various kinds of data generated by its operation. The storage unit 140 may include a program area and a data area. The program area may store an operating system (OS) for booting and operating the document management apparatus 100, an application that executes the method for managing drawings according to an embodiment of the present invention, and the like. The data area can store various kinds of data for document management. Each kind of data stored in the storage unit 140 can be deleted, changed, or added according to a user's operation.

The control unit 150 controls the overall operation of the document management apparatus 100 and the signal flow between its internal blocks, and performs a data processing function. The control unit 150 may be a central processing unit (CPU), an application processor, a graphics processing unit (GPU), or the like.

The control unit 150 includes a metadata module 151, a document registration module 153, and a document retrieval module 155. In the embodiment of the present invention, the metadata module 151, the document registration module 153, and the document retrieval module 155 are described as being implemented in hardware, but they are not limited thereto; they may instead be implemented as applications that are stored in the storage unit 140 and then executed by the control unit 150. The metadata module 151 generates a metadata library according to an embodiment of the present invention. The document registration module 153 assigns a tag to a document using the metadata library and stores the tagged document in the storage unit 140. The document retrieval module 155 searches for and provides documents according to a search term input by a user. The operation of the control unit 150, including the metadata module 151, the document registration module 153, and the document retrieval module 155, will be described in more detail below.

The present invention manages drawings using a metadata library. A method of generating such a metadata library will now be described. FIG. 3 is a flowchart illustrating a method of generating a metadata library according to an embodiment of the present invention.

In step S110, the metadata module 151 of the control unit 150 derives a plurality of keywords from the text of a plurality of documents used in the construction industry. Here, the documents may be input through the interface unit 110 or may be documents stored in the storage unit 140. In step S110, the metadata module 151 ① extracts words consisting of two or more syllables from the text, ② segments the extracted words into minimum word units whose meaning can be recognized, ③ removes duplicate words from the segmented words, and ④ excludes terms that do not belong to the construction industry as well as general terms.

Here, the metadata module 151 can select terms of the construction industry field registered in advance in the registration term table according to the embodiment of the present invention. In particular, the metadata module 151 excludes terminology and general terms that are not in the field of construction industry registered in advance in the erasing term table according to the embodiment of the present invention. Table 1 below shows an example of the erasing term table.

Division | Erase term
Keywords not in construction sector |
Common keywords used in construction / other fields | Exchange work, national competition, mechanization, functional workforce
Noun (pronoun) or proper noun: Noun | Maple, ventilator, dump truck, sound system, etc.
Noun (pronoun) or proper noun: Proper noun | Country name, city name, river bridge name, etc.
Keywords used regardless of field | education, training
Other | Keywords that do not correspond to management item, document type, material, facility / area, or type of work
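
The four-step keyword derivation (extract words, segment, remove duplicates, exclude erase terms) can be sketched in Python under several simplifying assumptions. Real segmentation into minimum meaningful units would need a morphological analyzer; here a hypothetical stand-in splits only on hyphens and slashes. The "two or more syllables" rule is approximated by a minimum character count, which matches Hangul (one character block per syllable) but not English. The erase terms are a small sample taken from Table 1, and all function names are hypothetical.

```python
import re

# Small sample of an erasing term table; entries borrowed from Table 1 for illustration.
ERASE_TERMS = {"education", "training", "ventilator"}


def segment(word):
    """Stand-in for morphological segmentation: split on hyphens and slashes only."""
    return [part for part in re.split(r"[-/]", word) if part]


def extract_keywords(text, min_length=2):
    """Apply the four steps: extract, segment, de-duplicate, exclude erase terms."""
    words = re.findall(r"[^\s,.;:()]+", text.lower())
    words = [w for w in words if len(w) >= min_length]   # step 1: drop too-short words
    units = [u for w in words for u in segment(w)]        # step 2: segment into units
    unique = list(dict.fromkeys(units))                   # step 3: remove duplicates, keep order
    return [u for u in unique if u not in ERASE_TERMS]    # step 4: exclude erase terms
```

For instance, `extract_keywords("Steel-pipe pile, pile training a ventilator")` yields `["steel", "pipe", "pile"]` under these assumptions.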

Next, in step S120, the metadata module 151 derives a plurality of keywords from the search terms used for document searches in the construction industry. These search terms are the terms used when searches are performed through the plurality of document management apparatuses 100 and may be stored cumulatively. They may be shared by the plurality of document management apparatuses 100. In step S120, the metadata module 151 ① extracts words composed of at least two syllables from the search terms, ② segments the extracted words into minimum word units whose meaning can be recognized, ③ removes duplicate words from the segmented words, and ④ excludes terms that do not belong to the construction industry as well as general terms.

Next, in step S130, the metadata module 151 derives a plurality of keywords from the key words assigned to documents when the documents were registered in the construction industry. These key words are terms input by users when documents are registered through the plurality of document management apparatuses 100, and they are stored cumulatively at the time of document registration. They may be shared by the plurality of document management apparatuses 100 or may be provided by a management server 200. In step S130, the metadata module 151 ① extracts words composed of at least two syllables from the key words, ② segments the extracted words into minimum word units whose meaning can be recognized, ③ removes duplicate words from the segmented words, and ④ excludes terms that do not belong to the construction industry as well as general terms.

Next, in step S140, the metadata module 151 matches the keywords derived in steps S110, S120, and S130, and selects matching keywords in step S150.

In step S160, the metadata module 151 divides the previously selected keyword according to a predetermined classification. For example, the metadata module 151 can classify the selected keywords into six categories including management items, document types, materials, facility areas, types of work, and others as shown in Table 2 below.

Management topics Document type material Facilities ㅇ Area Work Other fair Review Steel pipe file runway theory Review Items Process Management contract Steel plate stairs Construction work conclusion Business expense Plan High strength concrete factory gas Corporation Manage expenses Management ledger aggregate Bridge metal Set destination region design standard Column concrete Play facilities machine Alternative Assessment Design management Statement basic concrete road Mechanical equipment construction bankruptcy Design stage method Foamed concrete balcony Waterproof Plan to use Design change Law Exposed concrete Shower booth Electricity Duration ... ... ... ... ... ...

Next, in step S170, the metadata module 151 derives basic words from the selected keywords and generates a basic word table in which the basic words form a classification system having a plurality of hierarchical levels. For example, the metadata module 151 derives basic words from the selected keywords and generates a basic word table that arranges the derived basic words in a three-level hierarchical classification system, as shown in Table 3 below.

Level 1 | Code | Level 2 | Code | Level 3 | Code
Work | W | Hypothesis (temporary works) | W01 | Scaffold | W0101
 | | | | A fine craft | W0102
 | | Reinforced concrete construction | W02 | Mold | W0201
 | | | | Remicon | W0202
 | | | | Root | W0203
 | | | | East | W0204
 | | Earthwork | W03 | Excavation | W0301
 | | | | Pile | W0302
 | | | | Blasting | W0303
 | | | | Tearer | W0304
... | | | | |
Cost | C | Overhead | C01 | |
 | | Direct cost | C02 | |
 | | Estimate | C03 | History | C0301
... | | | | |

In step S180, the metadata module 151 derives compound words, derivative words, and similar words from the derived basic words, and generates a compound word table, a derivative word table, and a similarity word table that associate each compound word, derivative word, and similar word with its basic word. This association can be set via codes. Table 4 is an example of a compound word table, Table 5 is an example of a derivative word table, and Table 6 is an example of a similarity word table.

Compound word | Code | Attribute 1 | Attribute 2 | Attribute 3
Hypothetical | W010015 | W01 | W0102 |
Hypothetical | W010016 | W01 | W0102 | N07
Hypothetical suit tool | W010017 | W01 | W0102 | N01
History | W010018 | W01 | W0102 | C0301
Hypothetical scaffold | W010018 | W01 | W0101 |

Code | Basic word | Derivative 1 | Derivative 2 | ...
EW0001 | History | Statement | | ...
EW0002 | Plan | Plan | | ...
EW0003 | Unit price | Price tag | Unit price house | ...
EW0004 | Mold | Formwork | | ...

Code | Basic word | Synonym 1 | Synonym 2 | ...
SW0001 | Dismantling | Remove | | ...
SW0002 | Inspection | Inspection | Exam | ...
SW0003 | Soundproofing material | Soundproofing | Sound absorbing material | ...
SW0004 | Stainless | Stainless | Stainless | ...

As shown, the compound word table of Table 4 includes a plurality of compound words, and each compound word consists of a plurality of basic words. Accordingly, each compound word has the codes of its basic words as attributes (attributes 1, 2, 3). In addition, the derivative word table of Table 5 has a basic word and at least one derivative word for the basic word, and the similarity word table of Table 6 has a basic word and at least one similar word for the basic word.
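One way to picture the four tables together is as a small in-memory structure. The sketch below is only an assumption about how such a library might be held in Python; the class name `MetadataLibrary`, its field names, and the method `compounds_with` are hypothetical, and the sample entries are taken from Tables 3 to 6.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataLibrary:
    # code -> basic word; the code itself encodes the hierarchy (W -> W01 -> W0101)
    basic_words: dict = field(default_factory=dict)
    # compound word -> codes of the basic words it is built from (its attributes)
    compound_words: dict = field(default_factory=dict)
    # basic word -> derivative words
    derivatives: dict = field(default_factory=dict)
    # basic word -> similar words
    similar_words: dict = field(default_factory=dict)

    def compounds_with(self, code):
        """Return compound words that have the given basic-word code as an attribute."""
        return [w for w, attrs in self.compound_words.items() if code in attrs]


library = MetadataLibrary(
    basic_words={"W": "work", "W0101": "scaffold", "C0301": "history"},
    compound_words={"hypothetical scaffold": ["W01", "W0101"]},
    derivatives={"mold": ["formwork"]},
    similar_words={
        "soundproofing material": ["soundproofing", "sound absorbing material"]
    },
)
```

Keeping the basic-word codes as compound-word attributes is what lets a lookup such as `library.compounds_with("W01")` cross from a basic word to every compound word built on it.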

The metadata library according to the embodiment of the present invention can be constructed according to the above-described method. The metadata library includes a basic word table having a plurality of basic words consisting of keywords used in the field of construction industry, a compound word table derived from the plurality of basic words, a derivative word table derived from the plurality of basic words, and a similarity word table derived from the plurality of basic words. In particular, the basic word table, compound word table, derivative word table, and similarity word table of the metadata library can be updated by continuously reflecting data (keywords) accumulated in practice.

Next, a registration procedure for deriving a tag for a document using the metadata library according to an embodiment of the present invention and assigning the tag to the document will be described. FIG. 4 is a flowchart illustrating a method of registering a document using a metadata library according to an embodiment of the present invention. FIG. 5 is a view illustrating a method of designating a part of a document as a registration area according to an embodiment of the present invention.

Referring to FIG. 4, in step S210 the document registration module 153 of the control unit 150 loads the document to be registered according to the input of the user through the input unit 120 and displays the document through the display unit 130.

These documents can have various file formats, for example HWP, DOC, PDF, XLS, PPT, and DWG. The user can register all or part of a document, and accordingly makes an input specifying the whole document or a part of it. According to the embodiment of the present invention, when the user selects a part of a document, that part can be selected as a registration area by a method corresponding to the file format. The document registration module 153 takes the area-designation function of the application that uses the file format and provides it to the user. For example, as shown in FIG. 5, an application that handles HWP files can specify a block using the Shift key with the direction keys, a mouse drag, the Ctrl key with the F3 key, and the like. Accordingly, the document registration module 153 takes the block-designation function from that application and provides it to the user for specifying a registration area. Applications for DOC files provide a similar function, so the document registration module 153 can likewise take the block-designation function from the corresponding application so that the user can apply it to designate a registration area. In addition, the cell-selection function provided for XLS files, the page-number selection function for PPT files, and the like can be provided so that the user can use them to designate a registration area. The user can likewise designate an area to be registered in various other formats.

In this manner, when the user makes an input for specifying an area to be registered according to the file format, the document registration module 153 detects that input through the input unit 120 in step S220 and specifies the registration area. Once the registration area is designated, the document registration module 153 extracts all the keywords in the designated area of the document in step S230.

Subsequently, in step S240, the document registration module 153 extracts, from the basic word table, basic words matching the keywords extracted in step S230. Then, in step S250, the document registration module 153 extracts compound words and derivative words associated with the basic words extracted in step S240 from the compound word table and the derivative word table. For example, if the basic word 'bidding' matching an extracted keyword is extracted from the basic word table in step S240, then 'bid guarantee', 'bidder', 'bid system', and the like, which are compound words and derivative words associated with that basic word, can be extracted in step S250.

In step S260, the document registration module 153 displays the extracted basic word, compound word, and derivation word through the display unit 130 as a candidate tag. At this time, it is desirable that the candidate tags obtained from the metadata library are displayed according to the appearance frequency in the document and displayed together with the appearance frequency.

Next, in step S270, the document registration module 153 displays, through the display unit 130 as candidate tags, frequently occurring keywords that do not match the metadata library but appear more than a predetermined number of times in the document. At this time, it is preferable that the candidate tags based on these frequently occurring keywords are also sorted according to their appearance frequency in the document and displayed together with the appearance frequency.
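Steps S240 to S270 can be read as producing two frequency-sorted lists of candidate tags. The following is a rough sketch of that reading, not the embodiment's code: the function name `candidate_tags`, the `related` mapping (basic word to its associated compound and derivative words), and the threshold `min_count` are all hypothetical.

```python
from collections import Counter


def candidate_tags(keywords, basic_words, related, min_count=3):
    """Return (library_candidates, frequent_candidates), each sorted by frequency.

    keywords:    every keyword occurrence extracted from the registration area
    basic_words: set of basic words in the metadata library
    related:     basic word -> associated compound and derivative words
    min_count:   assumed threshold for keywords that are not in the library
    """
    counts = Counter(keywords)
    library_terms = set()
    for word in counts:
        if word in basic_words:
            # S240-S250: a matched basic word brings its related words along.
            library_terms.add(word)
            library_terms.update(related.get(word, []))

    def by_frequency(item):
        return (-item[1], item[0])

    # S260: library-based candidates with their appearance frequency.
    from_library = sorted(((t, counts[t]) for t in library_terms), key=by_frequency)
    # S270: unmatched keywords that still occur often enough.
    frequent = sorted(
        ((w, n) for w, n in counts.items()
         if w not in library_terms and n >= min_count),
        key=by_frequency,
    )
    return from_library, frequent
```

A related word that never appears in the document is still offered as a candidate, with a frequency of 0, so that the user can see it when choosing tags.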

On the other hand, if the user does not have a desired tag among the displayed candidate tags, or if the user determines that an additional tag is needed, the user can input a keyword to be used as a tag. Then, in step S280, the document registration module 153 receives the keyword input by the user through the input unit 120 and displays the input keyword on the display unit 130 as a candidate tag.

The user can select at least one of the plurality of candidate tags displayed on the display unit 130. Accordingly, when the user selects at least one of the plurality of candidate tags through the input unit 120, the document registration module 153 detects the selection in step S290 and determines the selected keyword as the tag of the document. When the tag is determined, the document registration module 153 performs registration by storing the tag and the document in the storage unit 140 in step S300.

On the other hand, as described above, when only a part of the document is registered without registering the entire document, only the registration area designated separately from the original document can be stored as a file. At this time, the file name of the file of the registration area to be stored separately is referred to as a block name. Such a block name may be designated by a name input by the user, or may be specified in a format in which an additional serial number is appended to the file name of the original document, or may be designated as one of the tags of the corresponding registration area.

Next, a method of tagging a document using the metadata library according to another embodiment of the present invention will be described. FIG. 6 is a flowchart illustrating a method of registering a document using a metadata library according to another embodiment of the present invention.

Referring to FIG. 6, another embodiment of the present invention starts when the document registration module 153 of the controller 150 detects that a preset period or a preset time has arrived. When this is detected, the document registration module 153 scans the designated folder in step S320 and extracts the documents that have not yet been registered (unregistered documents) among the documents stored in the folder. Here, the folder may be a folder of the document management apparatus 100 itself or a folder of another document management apparatus 100 connected through the network. Then, the following steps S330 through S390 are performed for all unregistered documents in the specified folder.

That is, the document registration module 153 extracts keywords from the document in step S330. Subsequently, in step S340, the document registration module 153 extracts, from the basic word table, basic words matching the keywords extracted in step S330. In step S350, the document registration module 153 extracts compound words and derivative words associated with the basic words extracted in step S340 from the compound word table and the derivative word table, and in step S360 it selects the extracted basic words, compound words, and derivative words as candidate tags.

Next, in step S370, the document registration module 153 selects, as candidate tags, frequently occurring keywords that do not match the metadata library but occur more than a predetermined number of times in the document.

In step S380, the document registration module 153 determines at least one of the plurality of candidate tags as a tag to be used for the corresponding document according to a predetermined rule. For example, the document registration module 153 may determine the candidate tags that appear a predetermined number of times or more as the tags to be used for the document. As another example, the document registration module 153 may determine a predetermined number of candidate tags, taken in descending order of the number of occurrences, as the tags to be used for the document.
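The two example rules for step S380 (a minimum occurrence count, or a fixed number of the most frequent candidates) can be sketched as follows. The function name `select_tags` and its parameters are hypothetical, and the sample frequencies are made up for illustration.

```python
def select_tags(candidates, min_count=None, top_n=None):
    """Pick tags from a {candidate tag: occurrence count} mapping.

    min_count: keep candidates that occur at least this many times
    top_n:     keep this many candidates in descending order of occurrence
    """
    # Sort by descending count; ties are broken alphabetically for determinism.
    ranked = sorted(candidates.items(), key=lambda item: (-item[1], item[0]))
    if min_count is not None:
        ranked = [(tag, n) for tag, n in ranked if n >= min_count]
    if top_n is not None:
        ranked = ranked[:top_n]
    return [tag for tag, _ in ranked]


sample = {"supervision": 8, "scheme": 3, "construction": 7}
threshold_tags = select_tags(sample, min_count=5)
top_tags = select_tags(sample, top_n=1)
```

Either rule can be used alone, and passing both applies the threshold first and then the cut-off.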

When the tag is determined, the document registration module 153 performs registration to store the tag and the document in the storage unit 140 in step S390.

As described above, a tag can be derived using the metadata library, assigned to a document, and registered. Hereinafter, a method of retrieving a document using the metadata library and tags will be described. FIG. 7 is a flowchart illustrating a method of searching for a document using a metadata library according to an embodiment of the present invention. FIG. 8 is a view showing a result of searching for a document using a metadata library according to an embodiment of the present invention.

Referring to FIG. 7, when the search process starts according to the user's selection, the document search module 155 of the control unit 150 displays a search screen in step S410. The search screen includes a dialog box, which is a UI for inputting a search word. Accordingly, the user can input a search word through the input unit 120. In step S420, the document search module 155 receives the search word through the input unit 120, and in step S430 it matches the search word against each table of the metadata library.

At this time, it is determined in step S440 whether there is a basic word matching the search word. If a basic word matching the search word exists in the basic word table as shown in Table 3, the document search module 155 proceeds to step S470 and searches file names, block names, and tags using the matched basic word. That is, at this step the document search module 155 refers to the metadata library and retrieves from the storage unit 140 documents that have the matched basic word, or a derivative word or similar word of that basic word, as a file name or as a block name, or to which such a word is assigned as a tag. On the other hand, if there is no basic word matching the search word, the document search module 155 proceeds to step S450.

In step S450, the document search module 155 determines whether there is a similar word or a derivative word matching the search word. That is, it determines whether a derivative word or a similar word matching the search word exists in the derivative word table as shown in Table 5 or the similarity word table as shown in Table 6. If such a derivative word or similar word exists, the document search module 155 proceeds to step S460; otherwise, it proceeds to step S480. In step S460, the document search module 155 converts the derivative word or similar word into its basic word. Then, in step S470, the document search module 155 searches file names, block names, and tags using the converted basic word. That is, the document search module 155 refers to the metadata library and retrieves from the storage unit 140 documents that have the converted basic word, or a derivative word or similar word of that basic word, as a file name or as a block name, or to which such a word is assigned as a tag. For example, referring to Table 6, when 'sound absorbing material' is input as a search word, it is converted into the basic word 'soundproofing material', and the document search module 155 retrieves documents whose file name, block name, or tag is 'soundproofing material' or one of its similar words, 'soundproofing' or 'sound absorbing material'.

On the other hand, when the search word does not match any basic word, derivative word, or similar word, the document search module 155 searches file names, block names, and tags in step S480 using the search word as entered. That is, the document search module 155 searches the storage unit 140 for a document having the search word as a file name, as a block name, or as a tag.
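The branching across steps S440 to S480 amounts to expanding the search word into a set of terms before looking at file names, block names, and tags. The sketch below illustrates that idea only; `resolve_search_terms` and `find_documents` are hypothetical names, documents are modeled as plain dictionaries, and substring matching on file and block names is an assumption.

```python
def resolve_search_terms(query, basic_words, derivatives, similar_words):
    """Return the set of terms to look up in file names, block names, and tags."""
    def expand(basic):
        # A basic word is searched together with its derivative and similar words.
        return {basic, *derivatives.get(basic, []), *similar_words.get(basic, [])}

    # S440: the query itself is a basic word.
    if query in basic_words:
        return expand(query)
    # S450-S460: the query is a derivative or similar word; convert it to its basic word.
    for table in (derivatives, similar_words):
        for basic, variants in table.items():
            if query in variants:
                return expand(basic)
    # S480: nothing in the library matches; search with the query as entered.
    return {query}


def find_documents(documents, terms):
    """documents: list of dicts with 'file_name', 'block_name', and 'tags' keys."""
    return [
        doc for doc in documents
        if any(t in doc["file_name"] or t in doc["block_name"] for t in terms)
        or terms & set(doc["tags"])
    ]
```

With the SW0003 row of Table 6 loaded, the query 'sound absorbing material' resolves to the basic word 'soundproofing material' plus both of its similar words, which matches the example in the text.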

After the search is completed as described above, the document search module 155 derives, in step S490, compound words for the basic word or search word, if any, as related search terms. At this time, the document search module 155 first derives, from the basic word table, the upper basic word of the basic word used in the search, and then derives from the compound word table all the compound words that have the derived upper basic word as an attribute. That is, the document search module 155 may derive all the compound words in the compound word table that include the derived upper basic word as related search terms.

A method of deriving a related search term will be described in more detail with reference to Tables 3 and 4. Each compound word has, as attributes, the codes of the basic words from which it is composed. Using these, the document search module 155 derives compound words that include a given basic word among their attributes as related search terms. For example, assuming that the user inputs 'scaffold' (code: W0101) as a search word, the upper basic word 'hypothesis' (code: W01) is derived, and the compound words 'hypothetical badge', 'hypothetical badge plan', 'hypothetical suit tool', and 'hypothetical badge detail', which have W01 as an attribute, can be extracted as related search terms. Because the present invention uses hierarchical basic words as attributes in this way, it can present 'hypothetical badge detail', which has 'history' (C0301) from a different level 1 field as an attribute, as a related search term, which is not possible with existing classification systems.
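The derivation of related search terms can be sketched with the codes alone. The following is an illustration under two assumptions that the text does not state outright: that each level of the hierarchy appends two digits to the code (W, W01, W0101, as in Table 3), and that the four compound words named in the example pair with the attribute sets of Table 4 as shown. The names `parent_code`, `related_search_terms`, and `COMPOUND_WORDS` are hypothetical.

```python
def parent_code(code):
    """Upper-level code, assuming each level appends two digits (W0101 -> W01 -> W)."""
    return code[:-2] if len(code) > 1 else None


def related_search_terms(code, compound_words):
    """Compound words that have the upper basic word of `code` as an attribute."""
    upper = parent_code(code)
    if upper is None:
        return []
    return [word for word, attrs in compound_words.items() if upper in attrs]


# Illustrative pairing of the compound words in the example with Table 4 attributes.
COMPOUND_WORDS = {
    "hypothetical badge": ["W01", "W0102"],
    "hypothetical badge plan": ["W01", "W0102", "N07"],
    "hypothetical suit tool": ["W01", "W0102", "N01"],
    "hypothetical badge detail": ["W01", "W0102", "C0301"],
}

related = related_search_terms("W0101", COMPOUND_WORDS)
```

Because the lookup goes through the shared upper code W01, a compound word that also carries an attribute from another level 1 field (C0301 under cost) is still returned.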

Finally, the document search module 155 displays the search result on the display unit 130 in step S500. The search result basically includes the results of step S470 and step S480. In other words, documents having the basic word or search word as a file name or as a block name, or to which the basic word or search word is assigned as a tag, are displayed. An example of such a screen is shown in FIG. 8. As indicated by reference numeral 81, the retrieved documents (document files) for the search term 'supervision' are sorted and displayed. In addition, the search result may further include the related search terms derived in step S490. Reference numeral 83 denotes the derived related search terms.

On the other hand, the documents retrieved in the embodiment of the present invention can be sorted using the following sorting method. When a document is registered, the tag that appears most frequently is the one that best describes the document. Accordingly, according to an embodiment of the present invention, a tag match score is used. The tag match degree indicates the frequency of occurrence of the tag corresponding to the search term relative to the frequency of appearance of all tags in the document, and the tag match score expresses this degree as a score. According to one embodiment, the document search module 155 obtains the tag match score according to Equation (1) below for all the retrieved documents, arranges the documents in descending order of tag match score, and displays the sorted documents through the display unit 130.

[Equation 1]
$$S_{tag} = \frac{f_{q}}{\sum_{i} f_{i}}$$

$S_{tag}$: Tag match score

$f$: Frequency of tag appearance ($f_{q}$ for the tag corresponding to the search term, $f_{i}$ for each tag $i$ of the document)

For example, suppose that a search with the keyword 'supervision' retrieves document A and document B, both of which carry the tag 'supervision'. The tags of document A are 'supervision' (frequency: 8), 'scheme' (frequency: 3), and 'construction' (frequency: 7), and the tags of document B are 'supervision' (frequency: 6), 'construction' (frequency: 2), and 'report' (frequency: 3). The tag match score of document A for the keyword 'supervision' is then 8 / (8 + 3 + 7) = 8/18 ≈ 0.444, and the tag match score of document B is 6 / (6 + 2 + 3) = 6/11 ≈ 0.545. Accordingly, the document search module 155 places document B above document A in the search results for the search term 'supervision'.
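The worked example can be reproduced with a few lines of code. This is only a sketch of the calculation described above; the function name `tag_match_score` and the dictionary representation of tag frequencies are assumptions.

```python
def tag_match_score(tag_frequencies, search_term):
    """Frequency of the searched tag divided by the total frequency of all tags."""
    total = sum(tag_frequencies.values())
    if total == 0:
        return 0.0
    return tag_frequencies.get(search_term, 0) / total


doc_a = {"supervision": 8, "scheme": 3, "construction": 7}
doc_b = {"supervision": 6, "construction": 2, "report": 3}

# Sort the retrieved documents in descending order of tag match score.
ranked = sorted(
    [("A", doc_a), ("B", doc_b)],
    key=lambda doc: tag_match_score(doc[1], "supervision"),
    reverse=True,
)
```

Document B ranks first even though 'supervision' occurs fewer times in it, because the score is relative to the total tag frequency of each document.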

Meanwhile, according to another embodiment of the present invention, not only the tag match degree but also a file name match degree and a document rating can be used as parameters for sorting. Here, the file name match degree is the degree of matching between the search word and the file name (block name). The file name match score is 1 if the search term is included among the words in the file name or block name, and 0 otherwise. The document rating is a score of the usefulness of the document, given by users after viewing the document.

For example, it is assumed that the search term is 'Supervision' and the document 1 and document 2 are searched for the search result. Here, it is assumed that the file name of the document 1 is 'review of the actual design construction workability', and the block name is the same as the file name 'review of the actual design construction workability'. In addition, the file name of Document 2 is 'maintenance guideline plan', and the block name is 'safety and maintenance plan'.

It is also assumed that the tag and the frequency of the tag when registering Document 1 are as shown in Table 7 below, and that the tag assigned at the time of document 2 registration and the frequency of the tag are as shown in Table 8 below.

Tag | Frequency
Facility | 12
Erection | 5
Report | 3
Supervision | 3
Land | 1
Construct | 1
Total | 25

Tag | Frequency
Facility | 5
Construct | 4
Report | 4
Building | 2
Supervision | 1
Total | 16

According to another embodiment of the present invention, the document search module 155 assigns weights to the scores of three parameters, namely the file name match degree, the tag match degree, and the document rating, as shown in Equation 2 below, derives a document ranking score by summing the weighted scores, and displays the retrieved documents in descending order of document ranking score.

[Equation 2]
$$S_{doc} = w_{tag}\,\hat{S}_{tag} + w_{file}\,\hat{S}_{file} + w_{rate}\,\hat{S}_{rate}$$

$S_{doc}$: Document ranking score

$\hat{S}_{tag}$: Normalized tag match score

$\hat{S}_{file}$: Normalized file name match score

$\hat{S}_{rate}$: Normalized document rating

$w_{tag}$: Weight for tag match

$w_{file}$: Weight for file name match

$w_{rate}$: Weight for document rating

Referring to Equation (2), the document ranking score of Document 1 can be derived as shown in Table 9 below.

Document 1 | Basic score (A) | Normalized, max = 1 (B) | Weight (C) | Document ranking score (D = B × C)
File name (block name) match | 0 | 0 | 0.5 | 0
Tag match | 0.12 | 0.12 | 0.4 | 0.048
Document rating | 10 | 1 | 0.1 | 0.1
Total | | | | 0.148

First, the document search module 155 obtains the scores of the file name (block name) match degree, the tag match degree, and the document rating. The basic score of the file name match degree of Document 1 is 0, because neither its file name, 'review of the actual design construction workability', nor its identical block name includes the search term 'Supervision'. The tag match score is 3/25 = 0.12 according to Table 7. The document rating (average score) given by a plurality of users is 10 points. The document search module 155 then normalizes the scores of the file name (block name) match degree, the tag match degree, and the document rating to the range 0 to 1. The normalized scores of the file name (block name) match degree, the tag match degree, and the document rating are 0, 0.12, and 1, respectively. Subsequently, the document search module 155 assigns weights to the file name (block name) match degree, the tag match degree, and the document rating. According to the embodiment of the present invention, the weights can be set according to whether the file name search, the tag search based on the document content, or the document rating given by users who have viewed the document is considered important. Here, it is assumed that 0.5, 0.4, and 0.1 are set as the weights for the file name (block name) match degree, the tag match degree, and the document rating, respectively. Then, the document search module 155 obtains the document ranking score according to Equation (2) using the given weights. The document ranking score obtained for Document 1 is 0.148.

Also, referring to Equation (2), the document ranking score of Document 2 can be obtained as shown in Table 10 below.

Document 2 | Basic score (A) | Normalized, max = 1 (B) | Weight (C) | Document ranking score (D = B × C)
File name (block name) match | 0 | 0 | 0.5 | 0
Tag match | 0.063 | 0.063 | 0.4 | 0.025
Document rating | 10 | 1 | 0.1 | 0.1
Total | | | | 0.125

In the same way as Document 1, the document ranking score of Document 2 is 0.125. In this manner, when the document ranking score is obtained, the document retrieval module 155 arranges the documents in order of the document ranking score for all the retrieved documents, and displays the sorted documents through the display unit 130.
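The calculations in Tables 9 and 10 can be checked with a short sketch of Equation (2). The function name `document_ranking_score` is hypothetical, and normalizing the rating by dividing by a maximum of 10 is an assumption that is consistent with the tables (a rating of 10 normalizes to 1).

```python
def document_ranking_score(file_match, tag_match, rating,
                           weights=(0.5, 0.4, 0.1), max_rating=10):
    """Weighted sum of the normalized file name match, tag match, and rating."""
    w_file, w_tag, w_rating = weights
    # Assumes ratings range from 0 to max_rating; match scores are already in [0, 1].
    normalized_rating = rating / max_rating
    return w_file * file_match + w_tag * tag_match + w_rating * normalized_rating


# Document 1: no file name match, tag match 3/25 (Table 7), rating 10.
score_doc1 = document_ranking_score(0, 3 / 25, 10)
# Document 2: no file name match, tag match 1/16 (Table 8), rating 10.
score_doc2 = document_ranking_score(0, 1 / 16, 10)
```

With these inputs Document 1 scores about 0.148 and Document 2 about 0.125, so Document 1 is listed first.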

As described above, according to another embodiment of the present invention, when the search results are sorted and provided, the document search module 155 can use a tag match degree indicating the relative frequency of occurrence of the tag corresponding to the search word, a file name match degree indicating whether the file name or block name of the retrieved document includes the search word, and a document rating, which is a score given by users to the retrieved document. In particular, the document search module 155 may weight each of the tag match score, the file name match score, and the document rating, and sort and provide the search results in descending order of the total score, that is, the document ranking score.

Next, a method of retrieving a document according to another embodiment of the present invention will be described. FIG. 9 is a flowchart for explaining a method of retrieving a document according to another embodiment of the present invention. According to another embodiment of the present invention, there is a classification system (a project classification system, a year classification system, a work classification system, etc.), and every tag belongs to one of its classifications. That is, the classification system specifies, according to a preset criterion, the classification to which each tag belongs. The classification system is stored in advance in the storage unit 140.

Referring to FIG. 9, when the search process starts according to the user's selection, the document search module 155 of the control unit 150 displays a search screen in step S510. The search screen can provide a UI for selecting a classification scheme. Accordingly, the user can input a selection through the input unit 120 to select one of the classification schemes. Then, the document retrieval module 155 receives the category selected by the user through the input unit 120 in step S520.

Then, in step S530, the document search module 155 refers to the classification system and extracts all the tags mapped to the selected category. Subsequently, in step S540, the document search module 155 retrieves the documents to which the extracted tags are assigned. In step S550, the document search module 155 displays the retrieved documents through the display unit 130.
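Steps S530 and S540 can be sketched as a two-stage lookup, from category to tags and from tags to documents. The data layout here is an assumption made for illustration: `classification` maps each tag to its category, `documents` maps a file name to its tags, and the function name `search_by_category` and the sample entries are hypothetical.

```python
def search_by_category(category, classification, documents):
    """classification: tag -> category; documents: file name -> list of tags."""
    # S530: collect every tag mapped to the selected category.
    tags = {tag for tag, cat in classification.items() if cat == category}
    # S540: retrieve the documents that carry at least one of those tags.
    return [name for name, doc_tags in documents.items() if tags & set(doc_tags)]


classification = {"supervision": "work", "formwork": "work", "2015": "year"}
documents = {
    "site report": ["supervision", "2015"],
    "budget memo": ["2015"],
}
work_documents = search_by_category("work", classification, documents)
```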

Meanwhile, the method for managing a document using a metadata library, including the document registration method and the document retrieval method according to an embodiment of the present invention, can be implemented in the form of a program readable by various computer means and recorded on a computer-readable recording medium. Here, the recording medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the recording medium may be those specially designed and constructed for the present invention or may be those known and available to those skilled in the art of computer software. For example, the recording medium may be a magnetic medium such as a hard disk, a floppy disk, or a magnetic tape, an optical medium such as a CD-ROM or a DVD, a magneto-optical medium such as a floptical disk, or a hardware device specially configured to store and execute program instructions, such as ROM, RAM, or flash memory. Examples of program instructions include machine code such as that produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like. Such a hardware device may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

While the present invention has been described with reference to several preferred embodiments, these embodiments are illustrative and not restrictive. It will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit of the invention and the scope of the appended claims.

100: Document management device
110:
120: Input unit
130: Display unit
140: Storage unit
150: Control unit
151: Metadata module
153: Document registration module
155: Document Search Module

Claims (28)

An apparatus for managing a document, the apparatus comprising:
A storage unit for storing a metadata library including a basic word table in which a plurality of basic words, which are keywords used in the field of construction industry, form a classification system having a plurality of hierarchical levels, a compound word table composed of combinations of at least one of the basic words, a derivative word table having derivative words of the basic words, and a similarity table having similar words of the basic words; and
A document registration module for extracting a plurality of keywords from a document, extracting a basic word matching the keywords from the basic word table, extracting a derivative word of the extracted basic word from the derivative word table, extracting a similar word of the extracted basic word from the similarity table, selecting the extracted basic word, derivative word, and similar word as candidate tags, selecting at least one of the candidate tags as a tag, assigning the selected tag to the document, and performing registration by storing the tag and the document in the storage unit.
The apparatus according to claim 1,
Wherein the document registration module
Derives, from the document, keywords having a predetermined appearance frequency among the keywords that are not basic words, similar words, or derivative words, assigns at least one of the derived keywords to the document as a tag, and stores it in the storage unit.
The apparatus according to claim 1,
Wherein the document registration module
Extracts, when a predetermined time or a preset period arrives, a document that has not been registered from among a plurality of documents stored in a predetermined folder, and performs the registration on the extracted document.
The apparatus according to claim 1,
Further comprising a metadata module for extracting a plurality of keywords from the texts of a plurality of documents used in the field of construction industry, extracting a plurality of keywords from search words used for document search in the field of construction industry, and extracting a plurality of keywords from keywords assigned to documents at registration,
Matching the extracted keywords to select matched keywords, and deriving basic words from the selected keywords to generate the basic word table having the derived basic words.
The apparatus of claim 4,
wherein the metadata module
generates the compound word table composed of combinations of at least one of the basic words, the derivative word table having derivative words of the basic words, and the similarity table having similar words of the basic words.
The apparatus of claim 4,
wherein the metadata module,
when extracting the plurality of keywords,
extracts words composed of two or more syllables from the text, the search word, and the keyword, segments the extracted words into minimum word units whose meaning can be recognized, removes duplicate words from the segmented words, and extracts the plurality of keywords by excluding general terms and terms that are not construction industry terms, as specified in an erasure term table.
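The extraction steps above can be sketched as a small pipeline. Several simplifications are assumptions of this sketch and not part of the claim: character count stands in for syllable count, and the default segmenter merely splits on separators, whereas a real system would presumably use a morphological analyzer.

```python
import re
from typing import Callable, Iterable, List, Set


def split_on_separators(word: str) -> List[str]:
    """Placeholder segmenter (assumption): split on hyphens, slashes, and underscores."""
    return [unit for unit in re.split(r"[-_/]", word) if unit]


def extract_keywords(
    sources: Iterable[str],
    erasure_terms: Set[str],
    segment: Callable[[str], List[str]] = split_on_separators,
) -> List[str]:
    """Filter short words, segment, de-duplicate, then drop erasure terms."""
    units: List[str] = []
    for source in sources:
        for word in source.split():
            if len(word) < 2:  # character count stands in for syllable count
                continue
            for unit in segment(word):
                if unit not in units:  # remove duplicate units
                    units.append(unit)
    # Exclude general terms and out-of-domain terms listed in the erasure term table.
    return [unit for unit in units if unit not in erasure_terms]
```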
An apparatus for managing a document, the apparatus comprising:
A storage unit configured to store a metadata library including a compound word table composed of combinations of at least one of the basic words, a derivative word table having derivative words of the basic words, and a similarity table having similar words of the basic words, and to store documents to which at least one of a basic word, a derivative word, and a similar word of the metadata library is assigned as a tag;
An input unit for receiving a search word from a user; and
A document search module configured to search the metadata library when the search word is input and, if the search word is a basic word, to search the storage unit for documents to which the basic word that is the search word, a derivative word of that basic word, or a similar word of that basic word is assigned as a tag, and to provide the search result.
The apparatus of claim 7,
wherein the document search module,
if the search word is a derivative word or a similar word, searches the metadata library to convert the derivative word or the similar word into its basic word, searches the storage unit for documents to which the converted basic word, a derivative word of the converted basic word, or a similar word of the converted basic word is assigned as a tag, and provides the search result.
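The search behavior of the two preceding claims can be sketched as follows: the search word is first normalized to its basic word, and documents tagged with that basic word or any of its derivative or similar words are returned. All table contents, file names, and function names are hypothetical.

```python
from typing import Dict, List, Optional, Set

# Hypothetical metadata library and tagged documents (illustrative assumptions).
BASIC_WORDS: Set[str] = {"beam", "column"}
DERIVATIVES: Dict[str, List[str]] = {"beam": ["beams"]}
SIMILARS: Dict[str, List[str]] = {"beam": ["girder"]}
DOCUMENT_TAGS: Dict[str, Set[str]] = {
    "plan_a.dwg": {"girder"},
    "plan_b.dwg": {"column"},
    "plan_c.dwg": {"beams"},
}


def to_basic_word(term: str) -> Optional[str]:
    """Map a basic, derivative, or similar word to its basic word."""
    if term in BASIC_WORDS:
        return term
    for table in (DERIVATIVES, SIMILARS):
        for basic, related in table.items():
            if term in related:
                return basic
    return None  # the term is not in the metadata library


def search(term: str) -> List[str]:
    """Return documents tagged with the basic word or any related word."""
    basic = to_basic_word(term)
    if basic is None:
        return []
    wanted = {basic, *DERIVATIVES.get(basic, []), *SIMILARS.get(basic, [])}
    return sorted(name for name, tags in DOCUMENT_TAGS.items() if tags & wanted)
```

With these tables, searching for "girder" and searching for "beam" return the same documents, which is the point of converting to the basic word first.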
The apparatus according to claim 7 or 8,
wherein the document search module
derives, from the basic word table, an upper basic word of the basic word used in the search, and derives, from the compound word table, all compound words that include the derived upper basic word as related search words.
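A minimal sketch of deriving related search words is shown below. It assumes the hierarchy of the basic word table can be represented as a child-to-parent mapping and that compound words are stored as tuples of basic words; both representations and all entries are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical hierarchy: basic word -> upper (parent) basic word.
UPPER_BASIC_WORD: Dict[str, str] = {"H-beam": "beam", "beam": "structural member"}
# Hypothetical compound word table: each compound is a tuple of basic words.
COMPOUND_WORDS: List[Tuple[str, ...]] = [
    ("beam", "schedule"),
    ("beam", "reinforcement"),
    ("column", "schedule"),
]


def related_search_words(basic_word: str) -> List[str]:
    """Return every compound word that contains the upper basic word."""
    upper = UPPER_BASIC_WORD.get(basic_word)
    if upper is None:
        return []  # top-level word: no upper basic word to expand from
    return [" ".join(parts) for parts in COMPOUND_WORDS if upper in parts]
```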
The apparatus of claim 7,
wherein the document search module
sorts the search results in descending order of a tag match score, which indicates the appearance frequency of the tag that is the search word relative to the appearance frequencies of the other tags in the retrieved document, and provides the sorted results.
The apparatus of claim 10,
wherein the document search module
calculates the tag match score using the equation
Figure 112015061533558-pat00034
where
Figure 112015061533558-pat00035
is the tag match score, and
Figure 112015061533558-pat00036
is the appearance frequency of the tag.
The apparatus of claim 7,
wherein the document search module
applies weights to a tag match score indicating the appearance frequency of the tag that is the search word relative to the appearance frequencies of the other tags in the retrieved document, a file name match score indicating whether the file name or block name of the retrieved document includes the search word, and a document rating, sums the weighted scores, and sorts and provides the search results in descending order of the summed score.
The apparatus of claim 12,
wherein the document search module
calculates the summed score using the equation
Figure 112015061533558-pat00037
where
Figure 112015061533558-pat00038
is the document ranking score, which is the summed score,
Figure 112015061533558-pat00039
is the normalized tag match score,
Figure 112015061533558-pat00040
is the normalized file name match score,
Figure 112015061533558-pat00041
is the normalized document rating,
Figure 112015061533558-pat00042
is the weight for the tag match score,
Figure 112015061533558-pat00043
is the weight for the file name match score, and
Figure 112015061533558-pat00044
is the weight for the document rating.
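The equation itself is only available as an image in this text, so the following sketch is an assumption based on the symbol definitions: a weighted linear sum of the three normalized scores, with results sorted in descending order of that sum. The class, function, and parameter names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Scores:
    """Normalized scores for one retrieved document (each assumed to lie in [0, 1])."""
    tag_match: float
    file_name_match: float
    rating: float


def ranking_score(s: Scores, w_tag: float, w_file: float, w_rating: float) -> float:
    # Assumed form: weighted linear sum of the three normalized scores.
    return w_tag * s.tag_match + w_file * s.file_name_match + w_rating * s.rating


def rank(results: Dict[str, Scores], weights: Tuple[float, float, float]) -> List[str]:
    """Sort document names in descending order of the summed score."""
    return sorted(
        results, key=lambda name: ranking_score(results[name], *weights), reverse=True
    )
```

Changing the weights changes the order: a document with a weaker tag match can still rank first when the file name match is weighted heavily enough.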
An apparatus for managing a document, the apparatus comprising:
A storage unit configured to store a metadata library including a compound word table composed of combinations of at least one of the basic words, a derivative word table having derivative words of the basic words, and a similarity table having similar words of the basic words, documents to which at least one of a basic word, a derivative word, and a similar word of the metadata library is assigned as a tag, and a classification system to which the tags are assigned;
An input unit for receiving at least one classification of the classification system from a user; and
A document search module configured, when the classification is input, to extract tags belonging to the classification, search for documents to which the extracted tags are assigned, and provide the search result.
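Classification-based search can be sketched as a two-step lookup: classification to tags, then tags to documents. The classification names, tags, and file names below are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical classification system: classification -> tags belonging to it.
CLASSIFICATION_TAGS: Dict[str, Set[str]] = {
    "structure": {"beam", "column"},
    "finishing": {"tile"},
}
# Hypothetical tagged documents.
TAGGED_DOCUMENTS: Dict[str, Set[str]] = {
    "plan_a.dwg": {"beam"},
    "plan_b.dwg": {"tile"},
    "plan_c.dwg": {"column", "tile"},
}


def search_by_classification(classification: str) -> List[str]:
    """Return documents carrying at least one tag of the given classification."""
    tags = CLASSIFICATION_TAGS.get(classification, set())
    return sorted(name for name, doc_tags in TAGGED_DOCUMENTS.items() if doc_tags & tags)
```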
A method for managing a document, the method comprising:
Storing a metadata library including a compound word table composed of combinations of at least one of the basic words, a derivative word table having derivative words of the basic words, and a similarity table having similar words of the basic words;
Extracting a plurality of keywords from a document, extracting basic words matching the keywords from the basic word table, extracting derivative words of the extracted basic words from the derivative word table, and extracting similar words of the extracted basic words from the similarity table;
Selecting the extracted basic words, derivative words, and similar words as candidate tags, and selecting at least one of the candidate tags as a tag; and
Performing registration by assigning the selected tag to the document and storing it.
The method of claim 15,
wherein the selecting as the tag
further comprises deriving, from the document, keywords that have a predetermined appearance frequency and that are not basic words, similar words, or derivative words, and selecting at least one of the derived keywords as the tag.
The method of claim 15, further comprising,
before the storing step:
Extracting a plurality of keywords from texts of a plurality of documents used in the construction industry field, from search words used for document searches in the construction industry field, and from keywords; and
Matching the extracted keywords with one another to select matched keywords, deriving basic words from the selected keywords, and generating the basic word table having the derived basic words.
The method of claim 17, further comprising,
after the step of generating the basic word table,
generating the compound word table composed of combinations of at least one of the basic words, the derivative word table having derivative words of the basic words, and the similarity table having similar words of the basic words.
The method of claim 17,
wherein the step of extracting the plurality of keywords comprises:
Extracting words composed of two or more syllables from the text, the search word, and the keyword;
Segmenting the extracted words into minimum word units whose meaning can be recognized;
Removing duplicate words from the segmented words; and
Extracting the plurality of keywords by excluding general terms and terms that are not construction industry terms, as specified in the previously stored erasure term table.
A method for managing a document, the method comprising:
Storing a metadata library including a compound word table composed of combinations of at least one of the basic words, a derivative word table having derivative words of the basic words, and a similarity table having similar words of the basic words, and storing documents to which at least one of a basic word, a derivative word, and a similar word of the metadata library is assigned as a tag; and
When a search word is input, searching the metadata library and, if the search word is a basic word, searching for documents to which the basic word that is the search word, a derivative word of that basic word, or a similar word of that basic word is assigned as a tag, and providing the search result.
The method of claim 20, further comprising:
If the search word is a derivative word or a similar word, converting the derivative word or the similar word into its basic word, searching for documents to which the converted basic word, a derivative word of the converted basic word, or a similar word of the converted basic word is assigned as the tag, and providing the search result.
The method of claim 21, further comprising:
Deriving, from the basic word table, an upper basic word of the basic word used in the search, and deriving, from the compound word table, all compound words that include the derived upper basic word as related search words.
The method of claim 20,
wherein the step of providing the search result comprises
sorting the search results in descending order of a tag match score, which indicates the appearance frequency of the tag that is the search word relative to the appearance frequencies of the other tags in the retrieved document, and providing the sorted results.
The method of claim 23,
wherein the step of providing the search result comprises
calculating the tag match score using the equation
Figure 112016046045235-pat00045
where
Figure 112016046045235-pat00046
is the tag match score, and
Figure 112016046045235-pat00047
is the appearance frequency of the tag.
The method of claim 20,
wherein the step of providing the search result comprises
applying weights to a tag match score indicating the appearance frequency of the tag that is the search word relative to the appearance frequencies of the other tags in the retrieved document, a file name match score indicating whether the file name or block name of the retrieved document includes the search word, and a document rating, summing the weighted scores, and sorting and providing the search results in descending order of the summed score.
The method of claim 25,
wherein the step of providing the search result comprises
calculating the summed score using the equation
Figure 112015061533558-pat00048
where
Figure 112015061533558-pat00049
is the document ranking score, which is the summed score,
Figure 112015061533558-pat00050
is the normalized tag match score,
Figure 112015061533558-pat00051
is the normalized file name match score,
Figure 112015061533558-pat00052
is the normalized document rating,
Figure 112015061533558-pat00053
is the weight for the tag match score,
Figure 112015061533558-pat00054
is the weight for the file name match score, and
Figure 112015061533558-pat00055
is the weight for the document rating.
A method for managing a document, the method comprising:
Storing a metadata library including a compound word table composed of combinations of at least one of the basic words, a derivative word table having derivative words of the basic words, and a similarity table having similar words of the basic words, documents to which at least one of a basic word, a derivative word, and a similar word of the metadata library is assigned as a tag, and a classification system to which the tags are assigned;
Extracting tags belonging to a classification when the classification is input; and
Searching for documents to which the extracted tags are assigned and providing a search result.
A computer-readable recording medium having recorded thereon a program for executing, on a computer, the method for managing a document according to any one of claims 20 to 27.
KR1020150090254A 2015-06-25 2015-06-25 An apparatus for managing document using meta-data library, related a plurality of drawings, a method thereof, and a computer recordable medium storing the method KR101662527B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020150090254A KR101662527B1 (en) 2015-06-25 2015-06-25 An apparatus for managing document using meta-data library, related a plurality of drawings, a method thereof, and a computer recordable medium storing the method

Publications (1)

Publication Number Publication Date
KR101662527B1 (en) 2016-10-14

Family

ID=57157323

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020150090254A KR101662527B1 (en) 2015-06-25 2015-06-25 An apparatus for managing document using meta-data library, related a plurality of drawings, a method thereof, and a computer recordable medium storing the method

Country Status (1)

Country Link
KR (1) KR101662527B1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20060130893A (en) 2005-06-09 2006-12-20 이상호 Project product auto converting module for cals/ec
WO2007052285A2 (en) * 2005-07-22 2007-05-10 Yogesh Chunilal Rathod Universal knowledge management and desktop search system

Legal Events

Date Code Title Description
E701 Decision to grant or registration of patent right
GRNT Written decision to grant
FPAY Annual fee payment

Payment date: 20190731

Year of fee payment: 4