CN109982105A - Content retrieval system and method for broadcast platform - Google Patents
Content retrieval system and method for broadcast platform
- Publication number
- CN109982105A
- Authority
- CN
- China
- Prior art keywords
- content
- information
- metadata
- server
- distributed storage
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
- G06F16/43—Querying
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/232—Content retrieval operation locally within server, e.g. reading video streams from disk arrays
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/25—Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
- H04N21/254—Management at additional data server, e.g. shopping server, rights management server
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/435—Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/85—Assembly of content; Generation of multimedia applications
- H04N21/854—Content authoring
- H04N21/8543—Content authoring using a description language, e.g. Multimedia and Hypermedia information coding Expert Group [MHEG], eXtensible Markup Language [XML]
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Databases & Information Systems (AREA)
- Signal Processing (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Security & Cryptography (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a content retrieval system and method for a broadcast platform. The system includes a full-text search server, a web server, a database, and a distributed data storage layer. The database is connected to both the full-text search server and the distributed data storage layer, and the full-text search server and the distributed data storage layer are accessed through the web server. The distributed data storage layer holds message-layer metadata and content-layer metadata; the content-layer metadata further comprises content object basic information, content object feature information, and content entity information. The database parses the XML information of the content-layer metadata and stores it. The database submits the 3D content metadata description file to the full-text search service system, which builds an index over the full text of the 3D metadata. The invention lets the retrieval system adapt to the characteristics of stereoscopic television and thereby improves retrieval efficiency.
Description
Technical field
The present invention relates to broadcast platform systems and methods, and more specifically to a content retrieval system and method for a broadcast platform.
Background art
The biggest differences between stereoscopic (3D) television and 3D film lie in production cost and broadcast-time requirements. A 3D film concentrates on program content, 3D scenes, and the like; producing one takes a large budget and a long production cycle, and the result runs for about 120 minutes. Television as a medium, by contrast, must broadcast different program content continuously. It cannot match film in production cost and investment, and depending on the program it must also support live broadcasting. The same holds for 3D television: high-cost, high-investment program production is not feasible, and live 3D broadcasting is the more urgently needed form of service. Under these conditions, how to guarantee the stereoscopic visual effect, how to keep 3D television terminals compatible with ordinary televisions, and how to deliver 3D television services over the existing television network are all problems that research on 3D interactive television system architecture must address and solve.
For a broadcast platform that plays 3D television, storing, calling up, and playing 3D content requires a content retrieval system suited to 3D television, so that 3D content can be retrieved quickly. The retrieval systems of current broadcast platforms, however, are built for ordinary television. Applied directly to 3D television content, such a system cannot reflect what distinguishes that content. Moreover, because 3D content files are large, using an existing retrieval system inevitably lengthens retrieval time and lowers retrieval efficiency.
Summary of the invention
In view of the above problems in the prior art, the object of the present invention is to provide a content retrieval system and method for a broadcast platform.
To achieve the above object, the present invention adopts the following technical solution:
A content retrieval system for a broadcast platform includes a full-text search server, a web server, a database, and a distributed data storage layer. The database is connected to both the full-text search server and the distributed data storage layer, and the full-text search server and the distributed data storage layer are accessed through the web server. The distributed data storage layer holds message-layer metadata and content-layer metadata; the content-layer metadata further comprises content object basic information, content object feature information, and content entity information. The database parses the XML information of the content-layer metadata and stores it. The database submits the 3D content metadata description file to the full-text search service system, which builds an index over the full text of the 3D metadata.
Further, the content object feature information includes audio feature information, video feature information, and streaming media feature information.
Further, the content entity information includes basic information, audio information, video information, picture information, and streaming media information.
Further, the full-text search server includes a master index server and an incremental index server; the incremental index server receives data updates, and the data updates are synchronized to the distributed data storage layer.
To achieve the above object, the present invention also adopts the following technical solution:
A content retrieval method for a broadcast platform comprises: building a full-text search server, a web server, a database, and a distributed data storage layer, where the database is connected to both the full-text search server and the distributed data storage layer, and the full-text search server and the distributed data storage layer are accessed through the web server; building the distributed data storage layer to hold message-layer metadata and content-layer metadata, where the content-layer metadata further comprises content object basic information, content object feature information, and content entity information; parsing the XML information of the content-layer metadata and storing it in the database; and submitting the 3D content metadata description file to the full-text search service system, which builds an index over the full text of the 3D metadata.
Further, the content object feature information includes audio feature information, video feature information, and streaming media feature information.
Further, the content entity information includes basic information, audio information, video information, picture information, and streaming media information.
Further, a master index server and an incremental index server are built in the full-text search server; data is updated onto the incremental index server, and the data update is synchronized to the distributed data storage layer.
With the above technical solutions, the content retrieval system and method for a broadcast platform of the present invention let the retrieval system adapt to the characteristics of stereoscopic television and thereby improve retrieval efficiency.
Brief description of the drawings
Fig. 1 is a hierarchy diagram of the metadata;
Fig. 2 is an architecture diagram of the retrieval system;
Fig. 3 is a flow chart of the retrieval method.
Detailed description of embodiments
The technical solution of the present invention is further described below with reference to the drawings and embodiments.
Referring to Fig. 1, the present invention first discloses a content retrieval system for a broadcast platform, intended for retrieving 3D television (stereoscopic television) program content. The idea of the invention is to make full use of the special attributes of 3D content metadata (video color, spatial layout, motion, and picture depth-of-field features) to implement catalog management and workflow configuration management of content, and at the same time to use those special attributes to perform efficient full-text search over 3D content.
As shown in Fig. 1, 3D content metadata is described with an XML tree structure. The tree structure supports hierarchical organization of the descriptive information, and defining optional nodes lets the metadata model adapt to different types of content. Based on the overall requirements that the back-end environment places on content metadata, the content metadata is divided into three parts: a core set, a general optional set, and a category superset.
Core set: the attribute tags that every type of content must carry, mainly the content provider, the content identifier, and similar information.
General optional set: information related to the content itself, made up of attributes that apply generally to different types of content, such as the content's producer, synopsis (brief description), validity period, and whether the content is encrypted, together with 3D attributes such as video color, spatial layout, motion, and picture depth-of-field features.
Category superset: attributes closely tied to the characteristics of a given category of content, defined according to the features of each category, which supplement the sets above with basic, necessary category-specific attributes. For film content, for example, the superset provides extended attributes such as cast, director, theme song, poster, and film genre. Supersets differ considerably from one category to another.
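The three-part division above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the schema used by the invention: every element name (`ContentMetadata`, `CoreSet`, `GeneralOptionalSet`, `CategorySuperset`, `ThreeD`, and so on) is hypothetical, since the source does not publish its XML schema. The sketch only shows that a core set is mandatory while the other two parts are optional nodes of the tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical 3D content metadata document; all element names are illustrative.
SAMPLE_XML = """
<ContentMetadata>
  <CoreSet>
    <ContentId>c-0001</ContentId>
    <Provider>ExampleProvider</Provider>
  </CoreSet>
  <GeneralOptionalSet>
    <Synopsis>A short description of the program.</Synopsis>
    <Encrypted>false</Encrypted>
    <ThreeD>
      <Color>warm</Color>
      <SpatialLayout>side-by-side</SpatialLayout>
      <Motion>slow</Motion>
      <DepthOfField>deep</DepthOfField>
    </ThreeD>
  </GeneralOptionalSet>
  <CategorySuperset category="film">
    <Director>Example Director</Director>
    <Genre>science fiction</Genre>
  </CategorySuperset>
</ContentMetadata>
"""


def flatten(element, prefix=""):
    """Return {path: text} for every leaf element below `element`."""
    result = {}
    for child in element:
        path = f"{prefix}/{child.tag}" if prefix else child.tag
        if len(child) == 0:
            result[path] = (child.text or "").strip()
        else:
            result.update(flatten(child, path))
    return result


def parse_metadata(xml_text):
    """Split a metadata document into core, general optional, and superset parts."""
    root = ET.fromstring(xml_text)
    core = root.find("CoreSet")
    if core is None or core.find("ContentId") is None:
        # The core set is the only part every content type must carry.
        raise ValueError("a core set with a ContentId is mandatory")
    optional = root.find("GeneralOptionalSet")
    superset = root.find("CategorySuperset")
    return {
        "core": flatten(core),
        "general": flatten(optional) if optional is not None else {},
        "category": superset.get("category") if superset is not None else None,
        "superset": flatten(superset) if superset is not None else {},
    }


metadata = parse_metadata(SAMPLE_XML)
```

Because the optional parts are looked up rather than required, the same parser accepts a document that carries only a core set, which is the adaptability the tree structure is meant to provide.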
3D content metadata items are broadly divided into message-layer metadata and content-layer metadata. The content-layer metadata comprises content object basic information, content object feature information (audio feature information, video feature information, streaming media feature information), and content entity information. The hierarchy and the metadata items of each layer are shown in Fig. 1.
Specifically, the content object basic information forms the first independent sub-layer of the content-layer metadata.
The content object feature information forms the second independent sub-layer of the content-layer metadata and further comprises audio feature information, video feature information, and streaming media feature information. The audio feature information includes (audio) basic information; the video feature information further comprises (video) basic information, (video) extended information, (video) cue-point information, and (video) clip-splitting information; the streaming media feature information further comprises (streaming media) basic information and program information.
The content entity information forms the third independent sub-layer of the content-layer metadata and further comprises (content entity) basic information, (content entity) audio information, (content entity) video information, (content entity) picture information, and (content entity) streaming media information.
The content service management system first parses the metadata XML of the content and stores it in a relational database management system, which makes querying and modification by operators convenient. When metadata must be sent to other systems, a conforming metadata XML file is regenerated from the information in the database.
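The parse, store, and regenerate cycle just described might look like the following sketch. It is an assumption-laden illustration: SQLite stands in for the relational database, and the flat `<Content id="...">` document, the `metadata` table, and its column names are all hypothetical rather than taken from the source.

```python
import sqlite3
import xml.etree.ElementTree as ET


def store_metadata(conn, xml_text):
    """Parse a flat metadata XML document and store one row per field."""
    root = ET.fromstring(xml_text)
    content_id = root.get("id")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata ("
        "content_id TEXT, field TEXT, value TEXT, "
        "PRIMARY KEY (content_id, field))"
    )
    for child in root:
        conn.execute(
            "INSERT OR REPLACE INTO metadata VALUES (?, ?, ?)",
            (content_id, child.tag, child.text or ""),
        )
    conn.commit()
    return content_id


def regenerate_xml(conn, content_id):
    """Rebuild an XML document from the rows currently held in the database."""
    root = ET.Element("Content", {"id": content_id})
    rows = conn.execute(
        "SELECT field, value FROM metadata WHERE content_id = ? ORDER BY field",
        (content_id,),
    )
    for field, value in rows:
        ET.SubElement(root, field).text = value
    return ET.tostring(root, encoding="unicode")


conn = sqlite3.connect(":memory:")
cid = store_metadata(
    conn,
    '<Content id="c-0001"><Title>Demo</Title><Synopsis>Short text</Synopsis></Content>',
)
# An operator edits a field directly in the database ...
conn.execute(
    "UPDATE metadata SET value = ? WHERE content_id = ? AND field = ?",
    ("Demo (revised)", cid, "Title"),
)
# ... and the XML sent to other systems reflects the edit.
xml_out = regenerate_xml(conn, cid)
```

Keeping the database as the editable copy and treating XML as an exchange format is the design choice the paragraph describes; the sketch regenerates the file on demand instead of patching the original document.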
The 3D content metadata description file is submitted to the full-text search service system, which builds an index over the full text of the 3D metadata and provides full-text retrieval of the 3D content metadata.
To improve metadata retrieval efficiency, the 3D content metadata management and retrieval subsystem uses a full-text search server with Sphinx as the search engine and Tokyo Tyrant for search-engine data storage. The implementation architecture of the 3D content metadata management and retrieval subsystem is shown in Fig. 2. The main architecture of the content retrieval system for a broadcast platform of the present invention includes a full-text search server, a web server, a database, and a distributed data storage layer.
Referring to Fig. 2, the database is connected to both the full-text search server and the distributed data storage layer, and the full-text search server and the distributed data storage layer are accessed through the web server. The distributed data storage layer holds message-layer metadata and content-layer metadata; the content-layer metadata further comprises content object basic information, content object feature information, and content entity information. The database parses the XML information of the content-layer metadata and stores it. The database submits the 3D content metadata description file to the full-text search service system, which builds an index over the full text of the 3D metadata.
Specifically, the full-text search server uses Sphinx. Sphinx is a distributed index server consisting of a master index and an incremental index. New data is first added to the incremental index server, and the data on the incremental index server is then periodically merged into the master index server. A single Sphinx index can hold up to 100 million records, and query speed with 1,000 records is at the millisecond level. Sphinx index creation speed is as follows: building an index over 1,000,000 records takes 3 to 4 minutes, and rebuilding an incremental index that contains the newest 100,000 records takes a few tens of seconds.
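The master-plus-incremental arrangement can be illustrated with a small in-memory sketch. This is not Sphinx and does not use its API; `InvertedIndex` and `MainPlusDelta` are hypothetical names, and whitespace tokenization is an assumption made to keep the example short. It shows only the pattern: writes go to a small index that is cheap to rebuild, searches consult both indexes, and a periodic merge folds the small index into the large one.

```python
from collections import defaultdict


class InvertedIndex:
    """Token -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for token in text.lower().split():
            self.postings[token].add(doc_id)

    def search(self, token):
        # .get() avoids creating empty entries for unknown tokens.
        return set(self.postings.get(token.lower(), set()))


class MainPlusDelta:
    """New documents land in the delta index; merge() moves them to main."""

    def __init__(self):
        self.main = InvertedIndex()
        self.delta = InvertedIndex()

    def add(self, doc_id, text):
        self.delta.add(doc_id, text)

    def search(self, token):
        # A query must consult both indexes to see recent documents.
        return self.main.search(token) | self.delta.search(token)

    def merge(self):
        """Periodic job: fold the delta into main and start a fresh delta."""
        for token, ids in self.delta.postings.items():
            self.main.postings[token] |= ids
        self.delta = InvertedIndex()


index = MainPlusDelta()
index.add(1, "3D film Avatar")
index.merge()                          # document 1 now lives in the main index
index.add(2, "3D film live broadcast")  # document 2 is only in the delta so far
recent_and_old = index.search("3d")
```

The benefit is that the expensive structure is touched only at merge time, which is consistent with the figures quoted above: the incremental index is small enough to rebuild in tens of seconds.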
Sphinx supports unigram (single-character) segmentation, which takes place in the index update module. For CJK (Chinese, Japanese, Korean) text, which must be UTF-8 encoded, the Sphinx index engine supports unigram cutting: given the text [3D电影阿凡达] ("3D film Avatar"), Sphinx cuts it into single characters [3 D 电 影 阿 凡 达] and then builds an inverted index entry for each character. If a query is formed from characters that occur in the text but do not make up an existing word, such as [影阿], the record is still found. Searches therefore need to be quoted: searching for ["阿凡达"] matches only records in which those characters appear together and in that order, and a quoted string whose characters are not adjacent in the text is not matched.
Sphinx also supports Chinese word segmentation, which takes place in the search query module. To search for "3D电影阿凡达" ("3D film Avatar") or "3D电影饥饿游戏" ("3D film The Hunger Games"), an independent Chinese word segmentation system is called first and cuts the queries into "3D电影 阿凡达" and "3D电影 饥饿游戏" respectively. The space-separated words are then each put in quotation marks, and Sphinx is searched with ["3D电影" "阿凡达"] or ["3D电影" "饥饿游戏"], which finds the corresponding record. When entries in the Chinese segmentation dictionary are added, deleted, or changed, the Sphinx search index does not have to be rebuilt.
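The difference between unquoted and quoted queries over a unigram index can be shown with a short sketch. This is a toy positional index written for illustration, not Sphinx's implementation or query syntax; `UnigramIndex`, `match_chars`, `match_phrase`, and `search_segmented` are hypothetical names, and the segmented word lists passed in are assumed to come from a separate Chinese word segmenter that is not modelled here.

```python
from collections import defaultdict


class UnigramIndex:
    """Toy inverted index that stores one posting list per character."""

    def __init__(self):
        # character -> document id -> positions of that character
        self.positions = defaultdict(lambda: defaultdict(list))

    def add(self, doc_id, text):
        chars = [c for c in text if not c.isspace()]
        for pos, ch in enumerate(chars):
            self.positions[ch][doc_id].append(pos)

    def match_chars(self, query):
        """Documents containing every character of the query, in any order."""
        docs = None
        for ch in query:
            if ch.isspace():
                continue
            ids = set(self.positions[ch]) if ch in self.positions else set()
            docs = ids if docs is None else docs & ids
        return docs or set()

    def match_phrase(self, phrase):
        """Documents in which the characters appear adjacent and in order."""
        chars = [c for c in phrase if not c.isspace()]
        hits = set()
        for doc_id in self.match_chars(phrase):
            for start in self.positions[chars[0]][doc_id]:
                if all(start + i in self.positions[ch][doc_id]
                       for i, ch in enumerate(chars)):
                    hits.add(doc_id)
                    break
        return hits


def search_segmented(index, words):
    """AND together one quoted-phrase match per segmented word."""
    results = [index.match_phrase(word) for word in words]
    return set.intersection(*results) if results else set()


index = UnigramIndex()
index.add(1, "3D电影阿凡达")
index.add(2, "3D电影饥饿游戏")
```

With this index, an unquoted character match for `达电` finds document 1 even though those characters never occur side by side, while the phrase match rejects it. Because segmentation is applied to the query and not to the stored index, changing the segmentation dictionary only changes which phrases are submitted, which is why no reindexing is needed.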
Search-engine data is stored in Tokyo Tyrant, a distributed data cache store. Like Memcached, it can keep key-value pairs in memory to speed up indexing. Because Tokyo Tyrant is also the network interface of the file-based database Tokyo Cabinet, the system can easily be extended later, with a non-relational database storing the large volume of content that needs to be retrieved. A single Tokyo Tyrant server supports 10,000 requests per second. It uses the same keys as Sphinx.
The full-text search server includes a master index server and an incremental index server; the incremental index server receives data updates, and the data updates are synchronized to the distributed data storage layer. MySQL extends data storage through a main table and an incremental table.
Correspondingly, the invention also discloses a content retrieval method for a broadcast platform that corresponds to the system architecture above. As shown in Fig. 3, it mainly comprises the following steps:
S1: Build a full-text search server, a web server, a database, and a distributed data storage layer. The database is connected to both the full-text search server and the distributed data storage layer, and the full-text search server and the distributed data storage layer are accessed through the web server.
S2: Build the distributed data storage layer to hold message-layer metadata and content-layer metadata; the content-layer metadata further comprises content object basic information, content object feature information, and content entity information.
S3: Parse the XML information of the content-layer metadata and store it in the database.
S4: Submit the 3D content metadata description file to the full-text search service system, which builds an index over the full text of the 3D metadata.
Further, retrieval is processed as follows:
When website data is written to the database, the data is also updated into the incremental index of the full-text search server and synchronously updated into the search engine's distributed data store; the two share the same index ID. Sphinx is responsible for building the index over the data to support full-text search. Tokyo Tyrant is responsible for serving the data quickly.
When a client retrieves content through the web server, the web server first sends a retrieval request to Sphinx; Sphinx retrieves the list of index IDs of the matching data and returns it to the web server. The web server sends the index ID list to Tokyo Tyrant, which returns the data for those index IDs to the web server. The web server returns the data to the client, which has then retrieved the data successfully.
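The two-stage request flow (index lookup for IDs, then key-value lookup for the records) can be sketched as follows. The classes are in-memory stand-ins, not clients for Sphinx or Tokyo Tyrant; `IndexServer`, `KeyValueStore`, `publish`, and `handle_search` are hypothetical names, and indexing only a `title` field with whitespace tokenization is an assumption made for brevity.

```python
class IndexServer:
    """Stand-in for the full-text search server: returns matching ids only."""

    def __init__(self):
        self.postings = {}

    def add(self, doc_id, text):
        for token in text.lower().split():
            self.postings.setdefault(token, set()).add(doc_id)

    def query(self, text):
        tokens = text.lower().split()
        if not tokens:
            return []
        # A document must contain every query token.
        ids = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        return sorted(ids)


class KeyValueStore:
    """Stand-in for the key-value data store: returns records by id."""

    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value

    def get_many(self, keys):
        return [self.data[k] for k in keys if k in self.data]


def publish(index, store, doc_id, record):
    """On update, write to both systems under the same id."""
    index.add(doc_id, record["title"])
    store.put(doc_id, record)


def handle_search(index, store, query):
    """Web-server logic: ask the index for ids, then fetch the records."""
    ids = index.query(query)
    return store.get_many(ids)


index = IndexServer()
store = KeyValueStore()
publish(index, store, 1, {"title": "3D film Avatar", "duration_min": 120})
publish(index, store, 2, {"title": "3D film live broadcast", "duration_min": 90})
results = handle_search(index, store, "3D film")
```

Sharing one ID between the two systems is what lets the index stay small: it never has to store the records themselves, only enough to answer which IDs match.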
Those of ordinary skill in the art should appreciate that the above embodiments are intended only to illustrate the present invention and not to limit it. Changes and modifications to the embodiments described above that remain within the spirit of the invention all fall within the scope of the claims of the present invention.
Claims (8)
1. A content retrieval system for a broadcast platform, characterized by comprising:
a full-text search server, a web server, a database, and a distributed data storage layer, wherein the database is connected to both the full-text search server and the distributed data storage layer, and the full-text search server and the distributed data storage layer are accessed through the web server;
the distributed data storage layer includes message-layer metadata and content-layer metadata, and the content-layer metadata further comprises content object basic information, content object feature information, and content entity information;
the database parses the XML information of the content-layer metadata and stores it;
the database submits the 3D content metadata description file to the full-text search service system, which builds an index over the full text of the 3D metadata.
2. The content retrieval system for a broadcast platform according to claim 1, characterized in that:
the content object feature information includes audio feature information, video feature information, and streaming media feature information.
3. The content retrieval system for a broadcast platform according to claim 1, characterized in that:
the content entity information includes basic information, audio information, video information, picture information, and streaming media information.
4. The content retrieval system for a broadcast platform according to claim 1, characterized in that:
the full-text search server includes a master index server and an incremental index server; the incremental index server receives data updates, and the data updates are synchronized to the distributed data storage layer.
5. A content retrieval method for a broadcast platform, characterized by comprising:
building a full-text search server, a web server, a database, and a distributed data storage layer, wherein the database is connected to both the full-text search server and the distributed data storage layer, and the full-text search server and the distributed data storage layer are accessed through the web server;
building the distributed data storage layer to include message-layer metadata and content-layer metadata, wherein the content-layer metadata further comprises content object basic information, content object feature information, and content entity information;
parsing the XML information of the content-layer metadata and storing it in the database;
submitting the 3D content metadata description file to the full-text search service system, which builds an index over the full text of the 3D metadata.
6. The content retrieval method for a broadcast platform according to claim 5, characterized in that:
the content object feature information includes audio feature information, video feature information, and streaming media feature information.
7. The content retrieval method for a broadcast platform according to claim 5, characterized in that:
the content entity information includes basic information, audio information, video information, picture information, and streaming media information.
8. The content retrieval method for a broadcast platform according to claim 5, characterized in that:
a master index server and an incremental index server are built in the full-text search server, data is updated onto the incremental index server, and the data update is synchronized to the distributed data storage layer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711440357.1A CN109982105A (en) | 2017-12-27 | 2017-12-27 | Content retrieval system and method for broadcast platform |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711440357.1A CN109982105A (en) | 2017-12-27 | 2017-12-27 | Content retrieval system and method for broadcast platform |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109982105A | 2019-07-05 |
Family
ID=67071365
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711440357.1A Pending CN109982105A (en) | 2017-12-27 | 2017-12-27 | Content retrieval system and method for broadcast platform |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109982105A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101021855A (en) * | 2006-10-11 | 2007-08-22 | 鲍东山 | Video searching system based on content |
CN101520800A (en) * | 2009-03-27 | 2009-09-02 | 华中科技大学 | Cryptogram-based safe full-text indexing and retrieval system |
US20100257049A1 (en) * | 2009-04-03 | 2010-10-07 | Avichai Flombaum | System and method for identifying and retrieving targeted advertisements or other related documents |
US20110218997A1 (en) * | 2010-03-08 | 2011-09-08 | Oren Boiman | Method and system for browsing, searching and sharing of personal video by a non-parametric approach |
CN102831253A (en) * | 2012-09-25 | 2012-12-19 | 北京科东电力控制系统有限责任公司 | Distributed full-text retrieval system |
US8948515B2 (en) * | 2010-03-08 | 2015-02-03 | Sightera Technologies Ltd. | Method and system for classifying one or more images |
CN107423349A (en) * | 2017-05-18 | 2017-12-01 | 福建中金在线信息科技有限公司 | A kind of method and system of full-text search |
Non-Patent Citations (2)
Title |
---|
张宴: "亿级数据的高并发通用搜索引擎架构设计", 《张宴的博客》 * |
顾国颖: "立体电视内容聚合与智能检索系统的设计与思考", 《电视工程》 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190705 |