TWI647576B - Intelligent article association dictionary system and method - Google Patents
- Publication number
- TWI647576B (application number TW106133887A)
- Authority
- TW
- Taiwan
- Prior art keywords
- industry
- target
- server unit
- data
- word
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
An intelligent article association dictionary system. An association dictionary server unit performs word segmentation on a plurality of original articles to generate vocabulary data containing a plurality of words. The association dictionary server unit determines whether each target word of the vocabulary data and each target company record among a plurality of company records satisfy an industry association condition and, when they do, generates an industry association record that contains the target word, serving as a keyword, and the target company record. The association dictionary server unit further determines whether each target word of the vocabulary data and each target industry association record satisfy a word association condition and, when they do, generates a word association record used in the subsequent analysis of an article to be analyzed. The word association record contains the target word, serving as an associated word, and the target industry association record.
Description
The present invention relates to an intelligent system, and in particular to an intelligent article association dictionary system for analyzing articles.
A large volume of financial news and research reports is available on the Internet for investors to consult when investing. However, investors with busy lives, heavy workloads, and limited free time often lack the time to read these online financial articles carefully, or to sort out and summarize the key points afterwards. The inventor therefore considered that an intelligent article analysis system that helps investors quickly identify the topic of an article, provides associated vocabulary, and facilitates the corresponding investment allocation would be of considerable help to investors.
Accordingly, an object of the present invention is to provide an intelligent article association dictionary system.
Another object of the present invention is to provide an intelligent article associated-vocabulary analysis method.
Accordingly, the intelligent article association dictionary system of the present invention includes an association dictionary server unit and a terminal electronic device. The association dictionary server unit stores a plurality of dictionaries, a plurality of original articles, and a plurality of company records, each of the company records including a company name and an industry category. The terminal electronic device is able to communicate with the association dictionary server unit.
The association dictionary server unit performs word segmentation on the original articles according to the dictionaries to generate vocabulary data, the vocabulary data containing a plurality of words.
The association dictionary server unit determines, for each word of the vocabulary data taken as a target word and each of the company records taken as a target company record, whether the two satisfy an industry association condition. The industry association condition includes that the ratio of the number of original articles in which the company name and the industry category of the target company record and the target word all appear together to the number of original articles in which the target word appears falls within a first predetermined ratio range.
When the association dictionary server unit determines that the target word and the target company record satisfy the industry association condition, the association dictionary server unit generates an industry association record, the industry association record containing the target word, serving as a keyword, and the target company record.
The association dictionary server unit determines, for each word of the vocabulary data taken as the target word and each of the industry association records taken as a target industry association record, whether the two satisfy a word association condition. The word association condition includes two requirements. First, the ratio of the number of original articles in which the keyword of the target industry association record, the industry category of its company record, and the target word all appear together to the number of original articles in which the target word appears falls within a second predetermined ratio range. Second, the ratio of the number of original articles in which the keyword, the industry category, and the target word all appear together to the number of original articles in which the keyword and the industry category appear together falls within a third predetermined ratio range.
When the association dictionary server unit determines that the target word and the target industry association record satisfy the word association condition, the association dictionary server unit generates a word association record, the word association record containing the target word, serving as an associated word, and the target industry association record.
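The three ratio tests described above can be written compactly. The notation here is introduced only for illustration and does not appear in the patent: $N(\cdot)$ is the number of original articles in which all of the listed words appear, $t$ is the target word, $c$ and $g$ are the company name and industry category of a company record, $k$ is the keyword of an industry association record, and $R_1$, $R_2$, $R_3$ are the first, second, and third predetermined ratio ranges.

```latex
\[
r_1 = \frac{N(c,\, g,\, t)}{N(t)} \in R_1, \qquad
r_2 = \frac{N(k,\, g,\, t)}{N(t)} \in R_2, \qquad
r_3 = \frac{N(k,\, g,\, t)}{N(k,\, g)} \in R_3 .
\]
```

The industry association condition uses $r_1$ alone; the word association condition requires both $r_2$ and $r_3$ to fall within their ranges.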
The terminal electronic device transmits an analysis request to the association dictionary server unit, the analysis request including an article to be analyzed.
When the association dictionary server unit receives the analysis request, the association dictionary server unit selects a representative word from the article to be analyzed.
The association dictionary server unit generates an analysis result according to the representative word and the word association records and transmits the analysis result to the terminal electronic device. The analysis result includes those word association records whose keyword is identical to the representative word.
When the terminal electronic device receives the analysis result, the terminal electronic device displays the analysis result.
In some embodiments, the terminal electronic device displays the analysis result as a sunburst chart.
In some embodiments, when the association dictionary server unit receives the analysis request, the association dictionary server unit first performs word segmentation on a title of the article to be analyzed to generate a title vocabulary containing a plurality of candidate words, and then selects, from the candidate words of the title vocabulary, the one with the highest word frequency as the representative word.
In some embodiments, the industry association condition further includes that the target word is not a preposition.
The intelligent article associated-vocabulary analysis method of the present invention is implemented by an intelligent article association dictionary system. The intelligent article association dictionary system includes an association dictionary server unit and a terminal electronic device. The association dictionary server unit stores a plurality of dictionaries, a plurality of original articles, and a plurality of company records, each of the company records including a company name and an industry category, and the terminal electronic device is able to communicate with the association dictionary server unit. The method includes: the association dictionary server unit performing word segmentation on the original articles according to the dictionaries to generate vocabulary data containing a plurality of words; the association dictionary server unit determining, for each word of the vocabulary data taken as a target word and each of the company records taken as a target company record, whether the two satisfy an industry association condition, the industry association condition including that the ratio of the number of original articles in which the company name and the industry category of the target company record and the target word all appear together to the number of original articles in which the target word appears falls within a first predetermined ratio range; when the association dictionary server unit determines that the target word and the target company record satisfy the industry association condition, the association dictionary server unit generating an industry association record containing the target word, serving as a keyword, and the target company record; the association dictionary server unit determining, for each word of the vocabulary data taken as the target word and each of the industry association records taken as a target industry association record, whether the two satisfy a word association condition, the word association condition including that the ratio of the number of original articles in which the keyword of the target industry association record, the industry category of its company record, and the target word all appear together to the number of original articles in which the target word appears falls within a second predetermined ratio range, and that the ratio of the number of original articles in which the keyword, the industry category, and the target word all appear together to the number of original articles in which the keyword and the industry category appear together falls within a third predetermined ratio range; when the association dictionary server unit determines that the target word and the target industry association record satisfy the word association condition, the association dictionary server unit generating a word association record containing the target word, serving as an associated word, and the target industry association record; the terminal electronic device transmitting an analysis request to the association dictionary server unit, the analysis request including an article to be analyzed; when the association dictionary server unit receives the analysis request, the association dictionary server unit selecting a representative word from the article to be analyzed; the association dictionary server unit generating an analysis result according to the representative word and the word association records and transmitting the analysis result to the terminal electronic device, the analysis result including those word association records whose keyword is identical to the representative word; and when the terminal electronic device receives the analysis result, the terminal electronic device displaying the analysis result.
The effect of the present invention is as follows. The industry association records are generated by means of the industry association condition, the word association records are generated by means of the word association condition, and the analysis result is generated according to the representative word and the word association records. This helps an investor (the user operating the terminal electronic device) quickly identify the topic of the article to be analyzed (the representative word) and provides the investor with associated vocabulary (the associated words, the company names, and the industry categories), which facilitates the corresponding investment allocation (for example, buying or selling financial products corresponding to the company names).
100‧‧‧intelligent article association dictionary system
1‧‧‧association dictionary server unit
D1‧‧‧dictionary
D2‧‧‧original article
D3‧‧‧company record
2‧‧‧terminal electronic device
200‧‧‧sunburst chart
S01~S03‧‧‧steps
S11~S15‧‧‧steps
Other features and effects of the present invention will be clearly presented in the embodiment described with reference to the drawings, in which: FIG. 1 is a schematic diagram of the hardware connections of an embodiment of the intelligent article association dictionary system of the present invention; FIG. 2 is a flowchart of the embodiment, illustrating the steps of a data analysis procedure; FIG. 3 is another flowchart of the embodiment, illustrating the steps of a data feedback procedure; and FIG. 4 is a sunburst chart of the embodiment.
Before the present invention is described in detail, it should be noted that in the following description similar elements are denoted by the same reference numerals.
Referring to FIG. 1, an embodiment of the intelligent article association dictionary system 100 of the present invention includes an association dictionary server unit 1 and a terminal electronic device 2.
The terminal electronic device 2 is able to communicate with the association dictionary server unit 1 via a communication network. The terminal electronic device 2 is, for example, a smartphone, a tablet computer, a desktop computer, or a laptop computer, but is not limited thereto.
The association dictionary server unit 1 stores a plurality of dictionaries D1, a plurality of original articles D2, and a plurality of company records D3. The dictionaries D1 are, for example, a general vocabulary dictionary and a specialized vocabulary dictionary. The original articles D2 are, for example, investment advisory research reports and financial news, but are not limited thereto. Each of the company records D3 includes a company name and an industry category. For example, the company name and the industry category are "Pegatron" (和碩) and "Computer Peripherals" (電腦週邊), respectively, or "Nanya Technology" (南亞科) and "Semiconductors" (半導體), respectively.
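As a minimal sketch of the stored company records D3, the two example records could be represented as follows. The class name `CompanyRecord` and its field names are assumptions made for illustration; the patent only states that each record holds a company name and an industry category.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompanyRecord:
    """One company record D3 (hypothetical layout): name plus industry category."""
    company_name: str
    industry_category: str


# The two example records given in the description.
COMPANY_RECORDS = [
    CompanyRecord("和碩", "電腦週邊"),  # Pegatron, Computer Peripherals
    CompanyRecord("南亞科", "半導體"),  # Nanya Technology, Semiconductors
]

for record in COMPANY_RECORDS:
    print(record.company_name, record.industry_category)
```

Making the record immutable (`frozen=True`) lets it be shared between industry association records without accidental modification.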
The steps of a data analysis procedure executed by the association dictionary server unit 1 of the intelligent article association dictionary system 100 are described below with reference to FIG. 1 and FIG. 2. First, as shown in step S01, the association dictionary server unit 1 performs word segmentation on the original articles D2 according to the dictionaries D1 to generate vocabulary data, the vocabulary data containing a plurality of words.
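The patent does not say which segmentation algorithm step S01 uses. Purely as an illustration of dictionary-based word segmentation, the sketch below applies forward maximum matching, a common baseline; the function name `segment`, the `max_len` parameter, and the toy dictionary and sentence are all assumptions.

```python
def segment(text: str, dictionary: set, max_len: int = 4) -> list:
    """Forward maximum matching: at each position take the longest dictionary word.

    Characters that start no dictionary word are emitted as single-character words.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink until a dictionary hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


# Toy dictionary standing in for the dictionaries D1 (illustrative only).
toy_dictionary = {"英業達", "電腦週邊", "營收", "成長"}
segmented = segment("英業達電腦週邊營收成長", toy_dictionary)
print(segmented)  # ['英業達', '電腦週邊', '營收', '成長']
```

A production system would more likely rely on an established segmentation library with user dictionaries; the point here is only that the dictionaries D1 decide where word boundaries fall.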
Next, as shown in step S02, the association dictionary server unit 1 determines, for each word of the vocabulary data taken as a target word and each of the company records D3 taken as a target company record, whether the two satisfy an industry association condition. The industry association condition includes that the target word is not a preposition, and that the ratio of the number of original articles D2 in which the company name and the industry category of the target company record D3 and the target word all appear together to the number of original articles D2 in which the target word appears falls within a first predetermined ratio range.
When the association dictionary server unit 1 determines that the target word and the target company record D3 satisfy the industry association condition, the association dictionary server unit 1 generates an industry association record, the industry association record containing the target word, serving as a keyword, and the target company record D3.
For example, to determine whether the word "Inventec" (英業達) in the vocabulary data and the company record D3 whose company name is "Pegatron" (和碩) and whose industry category is "Computer Peripherals" (電腦週邊) satisfy the industry association condition, the association dictionary server unit 1 first confirms that "Inventec" is not a preposition. It then divides the number of original articles D2 in which "Inventec", "Pegatron", and "Computer Peripherals" all appear together (for example, 40) by the number of original articles D2 in which "Inventec" appears (for example, 100), and determines whether the resulting ratio (for example, 0.4) falls within the first predetermined ratio range (for example, 0.25 to 1).
When the association dictionary server unit 1 determines that "Inventec" and "Pegatron", "Computer Peripherals" satisfy the industry association condition (for example, 0.4 falls within 0.25 to 1), the association dictionary server unit 1 generates the industry association record, which contains "Inventec" as the keyword together with "Pegatron" and "Computer Peripherals".
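The check in step S02 can be sketched as below, reproducing the 40-out-of-100 example. This is an illustrative reading, not the patented implementation: each article is modeled as the set of words it contains, the function names are hypothetical, the preposition list is a placeholder, and the range bounds are treated as inclusive, which the patent does not specify.

```python
def industry_ratio(articles, target_word, company_name, industry_category):
    """Share of the articles containing target_word that also contain
    both the company name and the industry category."""
    with_target = [a for a in articles if target_word in a]
    if not with_target:
        return None  # the target word never appears, so no ratio is defined
    together = [a for a in with_target
                if company_name in a and industry_category in a]
    return len(together) / len(with_target)


def meets_industry_condition(articles, target_word, company_name,
                             industry_category, prepositions,
                             low=0.25, high=1.0):
    """True when the target word is not a preposition and the ratio is in range."""
    if target_word in prepositions:
        return False
    ratio = industry_ratio(articles, target_word, company_name, industry_category)
    return ratio is not None and low <= ratio <= high


# Toy corpus matching the example: 100 articles mention 英業達 (Inventec),
# 40 of which also mention 和碩 (Pegatron) and 電腦週邊 (Computer Peripherals).
corpus = [{"英業達", "和碩", "電腦週邊"}] * 40 + [{"英業達"}] * 60
prepositions = {"在", "於"}  # placeholder list, an assumption

ratio = industry_ratio(corpus, "英業達", "和碩", "電腦週邊")
ok = meets_industry_condition(corpus, "英業達", "和碩", "電腦週邊", prepositions)
print(ratio, ok)  # 0.4 True
```

When `ok` is true, the server would store an industry association record with 英業達 as the keyword and the matching company record attached.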
Next, as shown in step S03, the association dictionary server unit 1 determines, for each word of the vocabulary data taken as the target word and each of the industry association records taken as a target industry association record, whether the two satisfy a word association condition. The word association condition includes two requirements. First, the ratio of the number of original articles D2 in which the keyword of the target industry association record, the industry category of its company record D3, and the target word all appear together to the number of original articles D2 in which the target word appears falls within a second predetermined ratio range. Second, the ratio of the number of original articles D2 in which the keyword, the industry category, and the target word all appear together to the number of original articles D2 in which the keyword and the industry category appear together falls within a third predetermined ratio range.
When the association dictionary server unit 1 determines that the target word and the target industry association record satisfy the word association condition, the association dictionary server unit 1 generates a word association record, the word association record containing the target word, serving as an associated word, and the target industry association record. The set of word association records generated in step S03 serves as an industry association dictionary.
For example, to determine whether the word "Quanta" (廣達) in the vocabulary data and the industry association record containing "Inventec", "Pegatron", and "Computer Peripherals" satisfy the word association condition, the association dictionary server unit 1 divides the number of original articles D2 in which "Inventec", "Computer Peripherals", and "Quanta" all appear together (for example, 30) by the number of original articles D2 in which "Quanta" appears (for example, 100), and determines whether the resulting ratio (for example, 0.3) falls within the second predetermined ratio range (for example, 0.25 to 1). It also divides the number of original articles D2 in which "Inventec", "Computer Peripherals", and "Quanta" all appear together (for example, 30) by the number of original articles D2 in which "Inventec" and "Computer Peripherals" appear together (for example, 90), and determines whether the resulting ratio (for example, 0.33) falls within the third predetermined ratio range (for example, 0.25 to 1).
When the association dictionary server unit 1 determines that "Quanta" and "Inventec", "Computer Peripherals" satisfy the word association condition (for example, 0.3 falls within 0.25 to 1, and 0.33 falls within 0.25 to 1), the association dictionary server unit 1 generates the word association record, which contains "Quanta" as the associated word together with "Inventec" (the keyword), "Pegatron", and "Computer Peripherals".
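The two ratios of step S03 can be sketched in the same style, reproducing the 30/100 and 30/90 example. As before, articles are modeled as word sets, the helper names are hypothetical, and inclusive range bounds are an assumption.

```python
def count_with(articles, *words):
    """Number of articles that contain every one of the given words."""
    return sum(1 for article in articles if all(w in article for w in words))


def word_association_ratios(articles, target_word, keyword, industry_category):
    """Return (second ratio, third ratio), or None when a denominator is zero."""
    all_three = count_with(articles, keyword, industry_category, target_word)
    n_target = count_with(articles, target_word)
    n_pair = count_with(articles, keyword, industry_category)
    if n_target == 0 or n_pair == 0:
        return None
    return all_three / n_target, all_three / n_pair


def in_range(value, low=0.25, high=1.0):
    return low <= value <= high


# Toy corpus matching the example: 100 articles mention 廣達 (Quanta),
# 90 mention both 英業達 (Inventec) and 電腦週邊 (Computer Peripherals),
# and 30 mention all three.
corpus = ([{"廣達", "英業達", "電腦週邊"}] * 30
          + [{"廣達"}] * 70
          + [{"英業達", "電腦週邊"}] * 60)

r2, r3 = word_association_ratios(corpus, "廣達", "英業達", "電腦週邊")
is_associated = in_range(r2) and in_range(r3)
print(round(r2, 2), round(r3, 2), is_associated)  # 0.3 0.33 True
```

The second ratio asks how much of the target word's coverage involves the keyword and industry, while the third asks the reverse, so a very common word that merely co-occurs by chance tends to fail the first of these tests.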
The intelligent article association dictionary system 100 may execute the data analysis procedure at predetermined intervals to update the results of the analysis.
The steps of a data feedback procedure executed by the intelligent article association dictionary system 100 are described below with reference to FIG. 1 and FIG. 3. First, as shown in step S11, the terminal electronic device 2 transmits an analysis request to the association dictionary server unit 1, the analysis request including an article to be analyzed.
Next, as shown in step S12, when the association dictionary server unit 1 receives the analysis request, the association dictionary server unit 1 selects a representative word from the article to be analyzed. In this embodiment, the association dictionary server unit 1 first performs word segmentation on a title of the article to be analyzed to generate a title vocabulary containing a plurality of candidate words, and then selects, from the candidate words of the title vocabulary, the one with the highest word frequency as the representative word (for example, "Inventec").
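A minimal sketch of the selection in step S12 follows. The patent does not state where the word frequency is measured; this sketch assumes it is counted in the segmented body of the article to be analyzed. The function name and the sample words are hypothetical.

```python
from collections import Counter


def pick_representative(title_words, body_words):
    """Return the title candidate that occurs most often in the article body.

    max() keeps the first maximal element, so ties go to the earlier title word.
    """
    counts = Counter(body_words)
    return max(title_words, key=lambda word: counts[word])


# Already-segmented title and body (illustrative data only).
title_words = ["英業達", "營收", "成長"]
body_words = ["英業達", "營收", "英業達", "伺服器", "英業達"]

representative = pick_representative(title_words, body_words)
print(representative)  # 英業達
```

Restricting the candidates to title words keeps the representative word close to the topic the article announces, while the frequency count breaks the choice among them.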
Next, as shown in step S13, the association dictionary server unit 1 generates an analysis result according to the representative word. The analysis result includes those word association records whose keyword is identical to the representative word. For example, when the representative word is "Inventec", the analysis result includes the word association records whose keyword is "Inventec".
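Step S13 amounts to a filter over the word association records. The sketch below assumes a flat record layout (`WordAssociation`, a hypothetical name) holding the associated word together with the keyword, company name, and industry category of the underlying industry association record; the second sample record is invented for contrast.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordAssociation:
    """One word association record (hypothetical flat layout)."""
    associated_word: str
    keyword: str
    company_name: str
    industry_category: str


def analyze(representative_word, records):
    """Keep the records whose keyword equals the representative word."""
    return [r for r in records if r.keyword == representative_word]


records = [
    WordAssociation("廣達", "英業達", "和碩", "電腦週邊"),    # example from the text
    WordAssociation("記憶體", "南亞科", "南亞科", "半導體"),  # invented for contrast
]

result = analyze("英業達", records)
print([r.associated_word for r in result])  # ['廣達']
```

For a large dictionary, indexing the records by keyword in advance would avoid scanning the whole list on every request.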
Next, as shown in step S14, the association dictionary server unit 1 transmits the analysis result to the terminal electronic device 2.
Finally, as shown in step S15, when the terminal electronic device 2 receives the analysis result, the terminal electronic device 2 displays the analysis result. As shown in FIG. 4, in this embodiment the terminal electronic device 2 displays the analysis result as a sunburst chart 200. The center of the sunburst chart 200 indicates the representative word, the first ring outward from the center indicates the associated words, the second ring indicates the industry categories, and the third ring indicates the company names.
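The ring order of the sunburst chart 200 suggests a nested structure: representative word, then associated words, then industry categories, then company names. The sketch below builds such a hierarchy as plain dictionaries that a charting library could consume. The function name, the output layout, and the second sample row are assumptions.

```python
def build_sunburst(representative_word, rows):
    """Nest (associated_word, industry_category, company_name) rows by ring.

    Ring 1 holds associated words, ring 2 industry categories,
    ring 3 company names; duplicate rows are collapsed.
    """
    rings = {}
    for associated_word, industry_category, company_name in rows:
        companies = rings.setdefault(associated_word, {}).setdefault(
            industry_category, [])
        if company_name not in companies:
            companies.append(company_name)
    return {"center": representative_word, "rings": rings}


rows = [
    ("廣達", "電腦週邊", "和碩"),
    ("廣達", "電腦週邊", "和碩"),  # duplicate, collapsed in the output
    ("廣達", "半導體", "南亞科"),  # invented row, only to show a second branch
]

chart = build_sunburst("英業達", rows)
print(chart)
```

Each nesting level corresponds to one ring, so the depth of the structure matches the three rings described for FIG. 4.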
It should be added that when the intelligent article association dictionary system 100 executes the data feedback procedure, it also executes a dictionary update procedure. In the dictionary update procedure, the intelligent article association dictionary system 100 performs word segmentation on the article to be analyzed, then performs, on the words produced by the segmentation, steps equivalent to steps S02 and S03 of the data analysis procedure to generate word association records, and updates the industry association dictionary with the word association records so generated. In this way, each time the intelligent article association dictionary system 100 executes the data feedback procedure, it also executes the dictionary update procedure to update the industry association dictionary.
In summary, the embodiment of the intelligent article association dictionary system 100 of the present invention generates the industry association records by means of the industry association condition, generates the word association records by means of the word association condition, and generates the analysis result according to the representative word and the word association records. This helps an investor (the user operating the terminal electronic device 2) quickly identify the topic of the article to be analyzed (the representative word) and provides the investor with associated vocabulary (the associated words, the company names, and the industry categories), which facilitates the corresponding investment allocation (for example, buying or selling financial products corresponding to the company names). The object of the present invention is therefore achieved.
The foregoing is merely an embodiment of the present invention and does not limit the scope of its implementation. All simple equivalent changes and modifications made in accordance with the claims and the specification of the present invention remain within the scope covered by the patent.
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW106133887A TWI647576B (en) | 2017-09-30 | 2017-09-30 | Intelligent article association dictionary system and method |
Publications (2)
Publication Number | Publication Date |
---|---|
TWI647576B true TWI647576B (en) | 2019-01-11 |
TW201915773A TW201915773A (en) | 2019-04-16 |
Family
ID=65803746
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW106133887A TWI647576B (en) | 2017-09-30 | 2017-09-30 | Intelligent article association dictionary system and method |
Country Status (1)
Country | Link |
---|---|
TW (1) | TWI647576B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6006221A (en) * | 1995-08-16 | 1999-12-21 | Syracuse University | Multilingual document retrieval system and method using semantic vector matching |
US20020198909A1 (en) * | 2000-06-06 | 2002-12-26 | Microsoft Corporation | Method and system for semantically labeling data and providing actions based on semantically labeled data |
US6658377B1 (en) * | 2000-06-13 | 2003-12-02 | Perspectus, Inc. | Method and system for text analysis based on the tagging, processing, and/or reformatting of the input text |
US20080097748A1 (en) * | 2004-11-12 | 2008-04-24 | Haley Systems, Inc. | System for Enterprise Knowledge Management and Automation |
TWM556876U (en) * | 2017-09-30 | 2018-03-11 | Capital Securities Corp | Intelligent article connective dictionary system |
- 2017-09-30: TW application TW106133887A (patent TWI647576B), status: not active, IP right cessation
Also Published As
Publication number | Publication date |
---|---|
TW201915773A (en) | 2019-04-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
MM4A | Annulment or lapse of patent due to non-payment of fees |