CN111078905A - Data processing method, device, medium and equipment - Google Patents
- Publication number: CN111078905A
- Application number: CN201811231400.8A
- Authority: CN (China)
- Prior art keywords: data, data structure, label, field, preset
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
An embodiment of the application discloses a data processing method. Acquired material data is processed based on a preset data structure to obtain label basic data conforming to that structure; the preset data structure comprises a plurality of fields in a specified order, and different fields carry different types of attribute information of the material data. The label basic data is then input into a label system so that the label system performs labeling according to it. The method ensures that all data input into the label system has the same data structure, so the label system only needs a labeling method for the preset data structure, and no large investment of manpower and material resources is needed to devise a labeling method that adapts to many data structures. In addition, material data from a newly added source is converted directly into label basic data based on the preset data structure, so the label system does not need to be upgraded.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data processing method, apparatus, device, and medium.
Background
Media fusion, a new operating model that has emerged as information transmission channels diversify, combines traditional media such as newspapers, television, and radio with network media such as the internet. It lets traditional media and network media share materials, which are then distributed to the public through the various traditional media platforms and network media platforms.
In order to ensure that the shared materials can be accurately delivered to each specific platform, the obtained materials generally need to be tagged, so that the materials are delivered according to the tags corresponding to the materials.
In the prior art, labeling is usually performed independently by a label system. The material data obtained by the label system comes from many sources, and material data from different sources has different data structures; that is, the type of content stored in each field differs from source to source. The label system therefore needs a labeling method that adapts to all of these kinds of material data, but such a method is difficult to devise and labeling with it is error-prone. In addition, for each newly added material source, staff from technology, operations, data analysis, and other roles must devise a new labeling method for that source, and the label system must undergo a version change and upgrade. This process is cumbersome and consumes a large amount of labor.
Disclosure of Invention
The embodiment of the application provides a data processing method, which can provide label basic data with a uniform data structure for a label system, thereby reducing the difficulty of labeling the label system and enabling the label system to have stronger universality.
In a first aspect, an embodiment of the present application provides a data processing method, where the method includes:
acquiring material data;
obtaining label basic data conforming to a preset data structure according to the material data and the preset data structure; the preset data structure comprises a plurality of fields with a specified sequence, and different fields are used for bearing different types of attribute information of the material data;
and inputting the label basic data into a label system so that the label system sets a label for the material data according to the label basic data.
Optionally, the obtaining, according to the material data and a preset data structure, tag basic data conforming to the preset data structure includes:
determining attribute categories corresponding to the fields of the material data according to the content borne by the fields of the material data;
and writing the content carried by each field of the material data into the corresponding field of the preset data structure according to the attribute category corresponding to each field of the material data to obtain the label basic data conforming to the preset data structure.
Optionally, the preset data structure includes a plurality of fields for carrying region attribute information, and different fields for carrying region attribute information are used for carrying region attribute information of different region levels;
writing the content carried by each field of the material data into the corresponding field of the preset data structure according to the attribute category corresponding to each field of the material data, including:
splitting the content carried by the fields corresponding to the region attribute categories in the material data into corresponding region content according to the region levels corresponding to the fields for carrying the region attribute information in the preset data structure;
and correspondingly writing the region content into a field for bearing region attribute information in the preset data structure.
Optionally, the preset data structure includes a field for bearing material keyword attribute information;
writing the content carried by each field of the material data into the corresponding field of the preset data structure according to the attribute category corresponding to each field of the material data, including:
determining a material keyword according to content borne by a field corresponding to the material content attribute category in the material data;
and writing the material keywords into a field for bearing the material keyword attribute information in the preset data structure.
Optionally, the preset data structure includes fields for carrying the following attribute information: material source, material title, material subtitle, material creation time, material acquisition time, material content classification, material data type, material link address, material content text, material content abstract, primary region, secondary region, tertiary region, and material audit source.
Optionally, the material data includes: news material data from traditional media and news material data from network media.
Optionally, a plurality of fields included in the preset data structure are stored in a form of a database table.
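As a minimal sketch of storing the fields in the form of a database table, the following uses Python's built-in `sqlite3` module. The table name `label_basic_data`, the column names, and the choice of `TEXT` for every column are assumptions made for illustration; the application does not specify them. The column order follows the attribute list given above.

```python
import sqlite3

# Hypothetical column names for the fourteen attributes listed above;
# the order of this list is taken as the specified field order.
PRESET_COLUMNS = [
    "source", "title", "subtitle", "created_at", "acquired_at",
    "content_category", "data_type", "link", "content_text",
    "content_summary", "region_level1", "region_level2",
    "region_level3", "audit_source",
]

def create_label_basic_table(conn: sqlite3.Connection) -> None:
    """Create a table whose columns follow the preset field order."""
    columns_sql = ", ".join(f"{name} TEXT" for name in PRESET_COLUMNS)
    conn.execute(f"CREATE TABLE IF NOT EXISTS label_basic_data ({columns_sql})")

def insert_record(conn: sqlite3.Connection, record: dict) -> None:
    """Insert one record; attributes missing from the record are stored as NULL."""
    placeholders = ", ".join("?" for _ in PRESET_COLUMNS)
    values = [record.get(name) for name in PRESET_COLUMNS]
    conn.execute(f"INSERT INTO label_basic_data VALUES ({placeholders})", values)

conn = sqlite3.connect(":memory:")
create_label_basic_table(conn)
insert_record(conn, {"source": "Baidu", "title": "Example title", "created_at": "2016/03/23"})

# Read the column names back in table order.
stored_columns = [row[1] for row in conn.execute("PRAGMA table_info(label_basic_data)")]
```

Because every record is written through the same column list, rows from any material source end up with the same columns in the same order.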
In a second aspect, an embodiment of the present application provides a data processing apparatus, where the apparatus includes:
the acquisition module is used for acquiring material data;
the processing module is used for obtaining label basic data conforming to a preset data structure according to the material data and the preset data structure; the preset data structure comprises a plurality of fields with a specified sequence, and different fields are used for bearing different types of attribute information of the material data;
and the input module is used for inputting the label basic data into a label system so that the label system sets a label for the material data according to the label basic data.
In a third aspect, an embodiment of the present application provides a processor, where the processor is configured to execute a program, where the program executes the data processing method according to the first aspect.
In a fourth aspect, an embodiment of the present application provides a storage medium, where the storage medium includes a stored program, and when the program runs, a device in which the storage medium is located is controlled to execute the data processing method according to the first aspect.
The embodiment of the application provides a data processing method in which acquired material data is processed based on a preset data structure to obtain label basic data conforming to the preset data structure. The preset data structure comprises a plurality of fields in a specified order, and different fields carry different types of attribute information of the material data. The label basic data is then input into a label system so that the label system performs labeling according to it. Because every piece of acquired material data is processed in this way, all data input into the label system is label basic data conforming to the preset data structure; that is, it all has the same data structure. The label system therefore only needs a labeling method for the preset data structure, and no large investment of manpower and material resources is needed to devise a labeling method that adapts to many data structures. Accordingly, during labeling, labels can be set for the material data quickly and accurately from the input label basic data. In addition, material data from a newly added source is converted directly into label basic data based on the preset data structure, so the label system does not need a version change or upgrade.
Drawings
Fig. 1 is a schematic view of an application scenario of a data processing method according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of a data processing method according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In the prior art, a labeling system generally performs labeling processing on material data acquired by the labeling system by using a labeling method which can basically adapt to various data structures. However, the method has the technical problems of high difficulty in formulating the label method, low label accuracy and the like, and when the material data from a newly added source is faced, the related technical personnel is often required to re-formulate the label method according to the data structure of the material data of the newly added source and perform version change and upgrade on the label system.
In view of the technical problems in the prior art, an embodiment of the present application provides a data processing method, which can provide tag basic data with a uniform data structure for a tag system, thereby reducing the difficulty of tagging processing of the tag system, improving the tag accuracy, and enabling the tag system to have stronger universality. The following first introduces the core technical idea of the data processing method provided in the embodiment of the present application:
According to the data processing method provided by the embodiment of the application, material data is first obtained; label basic data conforming to a preset data structure is then obtained according to the material data and the preset data structure, where the preset data structure comprises a plurality of fields in a specified order and different fields carry different types of attribute information of the material data; finally, the label basic data is input into a label system so that the label system sets a label for the material data according to the label basic data.
The data processing method is carried out on each acquired material data, so that the data input into the label system can be guaranteed to be label basic data conforming to a preset data structure, namely the data input into the label system are guaranteed to have the same data structure, the label system only needs to formulate a corresponding label method according to the preset data structure, a large amount of manpower and material resources are not consumed to formulate label methods capable of universally adapting to various data structures, and correspondingly, when labeling is carried out, labels can be quickly and accurately set for the material data according to the input label basic data by using the label method; in addition, when material data from a newly added source is faced, the material data is directly converted into corresponding label basic data based on the preset data structure, and the label system does not need to be subjected to version changing and upgrading.
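The benefit described above can be sketched as follows: each source gets its own small converter, while the label system is written once against the preset structure. Everything named here is hypothetical and chosen for illustration only, including the reduced three-field structure, the raw field names of the two feeds, and the trivial labeling rule.

```python
from typing import Callable, Dict, List

# Hypothetical reduced preset structure: three fields in a fixed order.
PRESET_FIELDS = ("source", "content", "created")

def from_newspaper(raw: dict) -> dict:
    """Converter for a hypothetical newspaper feed."""
    return {"source": raw["paper"], "content": raw["body"], "created": raw["date"]}

def from_website(raw: dict) -> dict:
    """Converter for a hypothetical website feed."""
    return {"source": raw["site"], "content": raw["text"], "created": raw["published"]}

# Supporting a newly added source means registering one more converter here;
# label_system below stays unchanged.
CONVERTERS: Dict[str, Callable[[dict], dict]] = {
    "newspaper": from_newspaper,
    "website": from_website,
}

def label_system(record: dict) -> List[str]:
    """Stand-in label system: it only ever sees the preset structure."""
    labels = [f"source:{record['source']}"]
    if "election" in record["content"].lower():
        labels.append("topic:politics")
    return labels

def process(kind: str, raw: dict) -> List[str]:
    """Convert raw material data to the preset structure, then label it."""
    record = CONVERTERS[kind](raw)
    if tuple(record) != PRESET_FIELDS:
        raise ValueError("record does not conform to the preset data structure")
    return label_system(record)

labels_a = process("newspaper", {"paper": "Daily News",
                                 "body": "Election results announced",
                                 "date": "2016/03/23"})
labels_b = process("website", {"site": "Baidu",
                               "text": "Weather update",
                               "published": "2016/03/24"})
```

In this sketch the two feeds use different raw field names, yet `label_system` never needs to know about either of them.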
In order to facilitate understanding of the technical solution of the present application, the following describes a data processing method provided in the embodiments of the present application with reference to an actual application scenario.
Referring to fig. 1, fig. 1 is a schematic view of an application scenario of a data processing method provided in an embodiment of the present application. The application scenario includes a plurality of databases 101 and a server 102; the database 101 may be a database for storing material data for each media platform, and the server 102 may obtain the material data from each database 101.
The server 102 is configured to execute the data processing method provided in the embodiment of the present application, where the server 102 may specifically be an application server or a Web server, and when the actual application is deployed, the server 102 may be an independent server or a cluster server. The server 102 may obtain a large amount of material data from the database 101 for storing material data of various conventional media platforms and/or network media platforms, then process the obtained material data based on a preset data structure to obtain tag basic data conforming to the preset data structure, and further transmit the tag basic data to the tag system, so that the tag system sets a tag for the material data accordingly according to the received tag basic data.
It should be understood that, in practical applications, the server 102 may obtain the material data from other channels, and the manner of obtaining the material data from the database 101 is only an example, and no limitation is made to the manner of obtaining the material data by the server 102.
It should be understood that the tag system may be run in the server 102 for executing the data processing method provided in the embodiment of the present application, and may also be run on other devices, where no specific limitation is made to a hardware device supporting the operation of the tag system.
It should be noted that the application scenario described in fig. 1 is only an example, and in practical application, the data processing method provided in the embodiment of the present application may also be applied to other application scenarios, and no specific limitation is made to the application scenario of the data processing method here.
The data processing method provided by the present application is described below by way of an embodiment.
Referring to fig. 2, fig. 2 is a schematic flowchart of a data processing method according to an embodiment of the present application. For convenience of description, the following embodiments are described with a server as an execution subject. As shown in fig. 2, the data processing method includes the steps of:
Step 201: Acquire material data.
With the development of media fusion technology, the server can acquire material data from various traditional media and/or network media. The material data generally includes fields for carrying material attributes and a field for carrying material content. The material attributes include information such as material source, material creation time, material content classification, material data type, and material link address. The material content is the carrier data that holds the content itself: for article materials it is the document containing the article, for image materials it is the picture, and so on.
In one possible implementation, the server may communicate with a data management server associated with each conventional media and/or network media, and accordingly obtain the data propagated by each conventional media and/or network media from a database for storing material data of each conventional media and/or network media through the data management server associated with each conventional media and/or network media.
In another possible implementation, the server may capture material data directly from the channels through which the various traditional media and/or network media distribute their materials. Specifically, the server may use data capture software to capture the material data from those data transmission channels.
It should be understood that, in practical applications, the server may also obtain the material data in other manners, and no limitation is made to the manner in which the server obtains the material data.
It should be noted that, in practical application, when the server acquires the material data, the server may correspondingly acquire the material data meeting the actual needs of the server according to the actual needs of the server. Specifically, when the server acquires the material data, the material data can be screened according to material attributes and/or material contents in the material data, and finally the screened material data is acquired.
For example, if the server needs to acquire news material data from a traditional medium and from a network medium, when the server acquires the material data, the server can screen out material data of which the material content is classified into news according to the material attributes in each material data in a database or a material propagation medium, and further acquire the part of news material data for subsequent processing; for another example, if the server needs to acquire video data from a conventional medium and from a network medium, when the server acquires the material data, the material data of which the material data type belongs to the video can be screened out according to the material attributes of the material data in the database or the material propagation medium, and then the video material data is acquired for subsequent processing.
It should be understood that, when the server acquires the material data, it may also directly acquire all the material data that can be acquired by itself, that is, the server does not need to screen the material data when acquiring the material data, and directly performs subsequent processing on all the material data that can be acquired by itself.
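The optional screening step can be sketched as a simple filter over material attributes. The attribute names `content_category` and `data_type` and the sample records are assumptions for illustration; calling the function with no criteria corresponds to acquiring all material data without screening.

```python
from typing import Iterable, List, Optional

def screen_materials(materials: Iterable[dict],
                     content_category: Optional[str] = None,
                     data_type: Optional[str] = None) -> List[dict]:
    """Keep materials matching every criterion given; None means no filter."""
    selected = []
    for material in materials:
        if content_category is not None and material.get("content_category") != content_category:
            continue
        if data_type is not None and material.get("data_type") != data_type:
            continue
        selected.append(material)
    return selected

materials = [
    {"id": 1, "content_category": "news", "data_type": "article"},
    {"id": 2, "content_category": "sports", "data_type": "video"},
    {"id": 3, "content_category": "news", "data_type": "video"},
]

news = screen_materials(materials, content_category="news")   # news materials only
videos = screen_materials(materials, data_type="video")       # video materials only
everything = screen_materials(materials)                      # no screening
```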
Step 202: obtaining label basic data conforming to a preset data structure according to the material data and the preset data structure; the preset data structure comprises a plurality of fields with a specified sequence, and different fields are used for bearing different types of attribute information of the material data.
After acquiring the material data, the server converts it into label basic data conforming to the preset data structure. The server converts every piece of material data according to the same preset data structure, so the label basic data obtained for each piece of material data has the same data structure; that is, all of it conforms to the preset data structure.
It should be noted that the preset data structure includes a plurality of fields in a specified order, and different fields carry different types of attribute information of the material data. The order of the fields in the preset data structure is fixed, and the type of attribute information that each field carries is fixed. Correspondingly, in label basic data conforming to the preset data structure, the order of the fields and the type of attribute information carried by each field are also fixed.
For example, assume that in the preset data structure field 01 carries source attribute information, field 02 carries content attribute information, and field 03 carries creation-time attribute information. In material data a acquired by the server, field 01 carries the material source, field 02 the material creation time, and field 03 the material content; in material data b, field 01 carries the material content, field 02 the material source, and field 03 the material creation time. The server processes material data a and material data b according to the preset data structure to obtain label basic data A and label basic data B, respectively. In both A and B, field 01 carries the material source, field 02 carries the material content, and field 03 carries the material creation time.
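The example can be traced in a few lines of code. This is only a sketch: the sample values are invented, and the layouts of material data a and b are assumed to be already known here, whereas the determination of attribute categories is a separate matter.

```python
from typing import NamedTuple, Sequence

class LabelBasicData(NamedTuple):
    """Preset structure from the example, in its fixed field order."""
    source: str    # field 01
    content: str   # field 02
    created: str   # field 03

# Attribute category held at each position of the incoming material data.
LAYOUT_A = ("source", "created", "content")   # material data a
LAYOUT_B = ("content", "source", "created")   # material data b

def convert(values: Sequence[str], layout: Sequence[str]) -> LabelBasicData:
    """Reorder positional values into the preset field order."""
    by_category = dict(zip(layout, values))
    return LabelBasicData(**by_category)

# Invented sample values for illustration.
A = convert(("Baidu", "2016/03/23", "Article text"), LAYOUT_A)
B = convert(("Another article", "Daily News", "2016/03/24"), LAYOUT_B)
```

After conversion, `A` and `B` expose the same fields in the same order, regardless of how the incoming data was laid out.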
In specific implementation, after the server acquires the material data, the server may determine the attribute category corresponding to each field of the material data according to the content carried by each field of the material data.
Specifically, when a field of the material data includes a label that indicates the attribute category of the content it carries, the server may use that label directly as the attribute category of the field. For example, if field 01 of material data a is labeled "material source" and carries the content "Baidu", and field 02 is labeled "material creation time" and carries the content "2016/03/23", the server determines that the attribute category of field 01 of material data a is material source and that of field 02 is material creation time. When a field of the material data does not include such a label, the server may determine the attribute category by analyzing the content the field carries. For example, if field 01 of material data b carries "Baidu" and field 02 carries "2016/03/23", the server can determine by analysis that "Baidu" is a material source and "2016/03/23" is a material creation time; that is, the attribute category of field 01 is material source and that of field 02 is material creation time.
Of course, the server may also determine the attribute type corresponding to each field of the material data in other manners, and the manner of determining the attribute type corresponding to each field of the material data is not limited herein.
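One way to picture the two cases is a function that prefers a field's own label and otherwise falls back to analyzing its content. The content analysis below is a deliberately simple assumption (a lookup set of known sources and a date pattern); the application does not specify how the analysis is performed.

```python
import re
from typing import Optional

# Hypothetical lookup data and pattern used only for this illustration.
KNOWN_SOURCES = {"Baidu"}
DATE_PATTERN = re.compile(r"^\d{4}/\d{2}/\d{2}$")

def infer_category(content: str, label: Optional[str] = None) -> str:
    """Return the attribute category of one field of the material data."""
    if label:                         # case 1: the field carries its own label
        return label
    if content in KNOWN_SOURCES:      # case 2: analyze the content instead
        return "material source"
    if DATE_PATTERN.match(content):
        return "material creation time"
    return "unknown"

labeled = infer_category("Baidu", label="material source")
unlabeled_source = infer_category("Baidu")
unlabeled_time = infer_category("2016/03/23")
```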
After determining the attribute type corresponding to each field of the material data, the server writes the content borne by each field of the material data into the corresponding field of the preset data structure according to the attribute type corresponding to each field, so as to obtain the label basic data conforming to the preset data structure.
Each field in the preset data structure is preset to carry content of a designated attribute category. Therefore, once the attribute category of each field of the material data has been determined, it can be matched against the attribute categories of the fields in the preset data structure. When the attribute category of a field of the material data matches that of a field in the preset data structure, the content carried by the material data field is written into that field of the preset data structure. Each field of the material data is processed in this way, and once the content of every field has been written, the filled preset data structure serves as the label basic data.
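The matching-and-writing step might look like the sketch below. The field names, the reduced three-field structure, and the handling of categories that match no preset field (they are collected and returned) are assumptions for illustration; the application does not say what happens in that case.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical reduced preset structure: ordered (field name, attribute category) pairs.
PRESET: List[Tuple[str, str]] = [
    ("field_01", "material source"),
    ("field_02", "material content"),
    ("field_03", "material creation time"),
]

def write_fields(material: List[Tuple[str, str]]) -> Tuple[Dict[str, Optional[str]], List[str]]:
    """Write (category, content) pairs into the preset structure.

    Returns the filled structure and the categories that matched no preset field.
    """
    # Start from an empty structure so the field order is always the preset order.
    record: Dict[str, Optional[str]] = {name: None for name, _ in PRESET}
    field_for = {category: name for name, category in PRESET}
    unmatched: List[str] = []
    for category, content in material:
        name = field_for.get(category)
        if name is None:
            unmatched.append(category)   # assumption: report rather than drop silently
        else:
            record[name] = content
    return record, unmatched

record, unmatched = write_fields([
    ("material creation time", "2016/03/23"),
    ("material source", "Baidu"),
    ("author", "Someone"),
])
```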
In one possible case, the preset data structure includes a plurality of fields for carrying region attribute information, and different such fields carry region attribute information of different region levels. When the content carried by a field of the material data whose attribute category is region information is written into the preset data structure, that content must first be split into region content for each level, according to the region levels of the fields for carrying region information in the preset data structure. The region content is then written into the corresponding fields.
For example, assume that the preset data structure includes three fields for carrying region attribute information: one for country-level, one for province-level, and one for city-level region attribute information. If the content carried by the region-information field of the material data is "Chengdu, Sichuan Province, China", the server splits it along the three levels of country, province, and city into "China", "Sichuan", and "Chengdu". It then writes "China" into the country-level field, "Sichuan" into the province-level field, and "Chengdu" into the city-level field.
It should be understood that, in a specific implementation, the split depends on what the material data actually contains. If the material data carries region attribute information for only some levels, the region attribute information that can be split out is written into the corresponding fields of the preset data structure, and the fields for region levels not present in the material data are left empty.
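A rough sketch of the region split, including the case where only some levels are present, is given below. It assumes a small lookup table that maps place names to region levels; this table and the substring matching are illustrative simplifications and are not taken from the application.

```python
from typing import Dict, Optional

# Hypothetical gazetteer mapping place names to region levels.
REGION_LEVELS = {
    "China": "country",
    "Sichuan": "province",
    "Chengdu": "city",
}
LEVEL_FIELDS = ("country", "province", "city")

def split_region(text: str) -> Dict[str, Optional[str]]:
    """Split a free-text region string into one value per region level.

    Levels that do not appear in the text are left empty (None).
    """
    result: Dict[str, Optional[str]] = {level: None for level in LEVEL_FIELDS}
    for name, level in REGION_LEVELS.items():
        if name in text:
            result[level] = name
    return result

full = split_region("Chengdu, Sichuan Province, China")
partial = split_region("Sichuan Province")
```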
In another possible case, the preset data structure includes a field for carrying material keyword attribute information. The server determines the material keywords from the content carried by the field of the material data whose attribute category is material content, and then writes the determined keywords into the field for carrying material keyword attribute information in the preset data structure.
Specifically, when the material keyword is determined, the server can determine the material keyword according to the content tag carried by the field of the material content with the attribute category in the material data; specifically, the content carried by the field with the attribute category being the material content is actually the material content, the material content may carry a keyword tag, that is, there may be some vocabularies marked as keywords in the material content, and the server may directly use the vocabularies marked as keywords as the material keywords. When the material content does not have the keyword tag, the server can determine the semantic keyword of the material content by performing semantic analysis on the material content, and directly take the determined semantic keyword of the material content as the material keyword.
It should be understood that the server may also determine the material keywords in other manners, and the specific implementation manner of determining the material keywords by the server is not limited herein.
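By way of example only, the two keyword paths described above (marked keywords first, analysis of the content otherwise) might look like the sketch below. The `<kw>...</kw>` markup, the stopword list, and the function name `material_keywords` are hypothetical, and the word-frequency count is only a stand-in for the semantic analysis, whose method the application does not specify.

```python
import re
from collections import Counter
from typing import List

# Hypothetical stopword list used only by the fallback path.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is"}


def material_keywords(content: str, top_n: int = 3) -> List[str]:
    """Return marked keywords if present, otherwise frequent words."""
    # Assumption: keywords are marked in the content as <kw>...</kw>.
    tagged = re.findall(r"<kw>(.*?)</kw>", content)
    if tagged:
        return tagged
    # Stand-in for semantic analysis: most frequent non-stopwords.
    words = [w for w in re.findall(r"[a-z]+", content.lower())
             if w not in STOPWORDS]
    return [word for word, _ in Counter(words).most_common(top_n)]


marked = material_keywords("The <kw>marathon</kw> ended in <kw>Chengdu</kw>.")
unmarked = material_keywords("Football fans watched the football final", top_n=1)
```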
It should be noted that, in practical applications, the preset data structure may further include a field for bearing attribute information of other categories, and when the server processes the material data, the server may perform corresponding processing on the material data according to the attribute category corresponding to the field in the preset data structure, so as to obtain the attribute information meeting the attribute category, and correspondingly write the attribute information meeting the attribute category into the preset data structure.
It should be understood that the preset data structure is designed by referring to the data structures of various material data in advance, and the attribute information carried by each field of the preset data structure can cover the attribute information carried by each field in the material data of various data structures, that is, all information carried in the corresponding material data can be covered in the tag basic data processed according to the preset data structure.
Optionally, the preset data structure may include fields for carrying the following attribute information: material source, material title, material subtitle, material creation time, material acquisition time, material content classification, material data type, material link address, material content text, material content summary, primary region (corresponding to country-level region), secondary region (corresponding to provincial region), tertiary region (corresponding to city-level region), and material audit source.
It should be understood that, in practical applications, the preset data structure is not limited to fields carrying the attribute information listed above; its fields may also carry other attribute information, and no limitation is placed here on the attribute information that the preset data structure can carry.
It should be understood that, if there is a newly added material source and the material data from the newly added material source further includes attribute information that cannot be borne in the preset data structure, the preset data structure may be updated according to the attribute information carried in the material data from the material source.
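One possible way to picture such an update, offered only as a sketch, is to append any attribute category that the preset data structure cannot yet carry while leaving the existing field order untouched. The function name `extend_structure` and the category "material author" are hypothetical and do not appear in the application.

```python
from typing import Iterable, List


def extend_structure(preset: List[str], incoming: Iterable[str]) -> List[str]:
    """Return the preset field list extended with unseen categories."""
    updated = list(preset)  # keep the existing specified order
    for category in incoming:
        if category not in updated:
            updated.append(category)  # new fields go at the end
    return updated


preset = ["material source", "material title"]
# "material author" is a hypothetical attribute from a newly added source.
updated = extend_structure(preset, ["material title", "material author"])
```

Appending at the end is a design choice of this sketch; it keeps the positions of existing fields stable for a tag system that relies on the specified order.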
Optionally, the plurality of fields included in the preset data structure may be stored in the form of a database table, in which each row corresponds to one field of the preset data structure. The database table may include a field number column, an attribute category column, and a content column, where the attribute category column records the attribute category of the attribute information carried by each field of the preset data structure, and the content column records the content written into each field from the material data.
To facilitate understanding of the above database tables, the database tables are described below in connection with Table 1:
TABLE 1
As shown in Table 1, field No. 01 carries attribute information whose attribute category is material source, field No. 02 carries attribute information whose attribute category is material title, field No. 03 carries attribute information whose attribute category is material subtitle, and so on.
It should be understood that when the content of each field of the material data is written into the preset data structure, if the material data does not carry attribute information corresponding to some attribute categories in the preset data structure, the fields corresponding to these attribute categories in the preset data structure may be set to null values.
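The database-table form described above can be illustrated, under stated assumptions, with an in-memory SQLite table. The table name `tag_base_data`, the column names, and the sample title are hypothetical; only the three columns (field number, attribute category, content), the numbering 01 to 03, and the use of null values for missing attribute information come from the description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tag_base_data ("
    " field_number TEXT PRIMARY KEY,"
    " attribute_category TEXT NOT NULL,"
    " content TEXT)"  # content may be NULL when the material lacks it
)

# One row per field of the preset data structure.
rows = [
    ("01", "material source", "Baidu"),
    ("02", "material title", "Example title"),
    ("03", "material subtitle", None),  # not carried by this material
]
conn.executemany("INSERT INTO tag_base_data VALUES (?, ?, ?)", rows)

# Fields that were set to null because the material data lacks them.
empty_fields = conn.execute(
    "SELECT field_number FROM tag_base_data WHERE content IS NULL"
).fetchall()
```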
Step 203: inputting the label basic data into a label system so that the label system sets a label for the material data according to the label basic data.
After the label basic data are obtained, the server inputs the obtained label basic data into the label system, so that the label system directly sets a corresponding label for the corresponding material data according to the label basic data.
It should be understood that when the tag system runs on the server, the server may directly input the tag basic data into the tag system that runs on itself, and when the tag system runs on other devices independent of the server, the server transmits the tag basic data to the device that supports the operation of the tag system through a corresponding data communication method, so that the tag system on the device performs tagging processing on the tag basic data.
When the tag system performs tagging processing according to the tag basic data, the content borne by the field in the tag basic data can be directly used as a tag, for example, the tag system directly uses "Baidu" borne by the field for bearing the source information of the material in the tag basic data as a tag, and uses "sports news" borne by the field for bearing the material content classification in the tag basic data as a tag; the tag system may also determine the tag accordingly according to the content carried by the field in the tag base data, for example, extract a keyword as the tag from the content carried by the field in the tag base data for carrying the material title.
It should be understood that the label system may also adopt other methods, and the label is set for the corresponding material data according to the label base data, and the method for performing labeling processing on the label system is not limited in any way.
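As one hedged illustration of the two tagging approaches mentioned above, the sketch below uses the content of some fields directly as tags and derives one further tag from the material title. The function name `set_tags`, the choice of `DIRECT_FIELDS`, and the "longest word" rule are assumptions; the latter is merely a placeholder for whatever keyword extraction a real tag system applies.

```python
from typing import Dict, List, Optional

# Fields whose content is used directly as a tag (assumed selection).
DIRECT_FIELDS = ("material source", "material content classification")


def set_tags(tag_base: Dict[str, Optional[str]]) -> List[str]:
    """Derive tags from tag base data keyed by attribute category."""
    tags: List[str] = []
    for name in DIRECT_FIELDS:
        value = tag_base.get(name)
        if value:  # skip fields that are empty or missing
            tags.append(value)
    # Placeholder keyword extraction: the longest word of the title.
    words = (tag_base.get("material title") or "").split()
    if words:
        tags.append(max(words, key=len))
    return tags


tags = set_tags({
    "material source": "Baidu",
    "material content classification": "sports news",
    "material title": "Chengdu hosts marathon",
})
```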
In the data processing method provided by the embodiment of the application, the acquired material data is processed based on a preset data structure, and the label basic data conforming to the preset data structure is obtained, where the preset data structure includes a plurality of fields in a specified order, and different fields are used for bearing different types of attribute information of the material data; and inputting the obtained basic label data into a label system so that the label system carries out labeling processing according to the basic label data. The obtained material data are processed in the above way, so that the data input into the label system can be ensured to be label basic data conforming to a preset data structure, namely the data input into the label system are ensured to have the same data structure, so that the label system only needs to formulate a corresponding label method according to the preset data structure, and does not need to consume a large amount of manpower and material resources to formulate label methods capable of universally adapting to various data structures; in addition, when material data from a newly added source is faced, the material data is directly converted into corresponding label basic data based on the preset data structure, and the label system does not need to be subjected to version changing and upgrading.
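The benefit summarized above, namely that material data of differing structures reaches the tag system in one uniform structure, can be sketched as follows. The names `PRESET_FIELDS`, `FIELD_ALIASES`, and `to_tag_base` are hypothetical, the preset structure is shortened to three fields, and mapping by source field name is a simplification of this sketch rather than the application's category determination.

```python
from typing import Dict, Optional

# Shortened preset data structure: fields in a specified order.
PRESET_FIELDS = ("material source", "material title", "material content text")

# Hypothetical mapping from source-specific field names to categories.
FIELD_ALIASES = {
    "src": "material source",
    "origin": "material source",
    "headline": "material title",
    "title": "material title",
    "body": "material content text",
}


def to_tag_base(material: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Convert material data into tag base data with the preset fields."""
    base: Dict[str, Optional[str]] = {name: None for name in PRESET_FIELDS}
    for key, value in material.items():
        category = FIELD_ALIASES.get(key)
        if category is not None:
            base[category] = value
    return base


# Two sources with different structures yield the same structure.
from_paper = to_tag_base({"headline": "Match report", "origin": "Daily"})
from_web = to_tag_base({"body": "Text", "title": "Match report", "src": "Baidu"})
```

Supporting a newly added source then amounts to extending the mapping, while the consumer of the tag base data is left unchanged.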
For the data processing methods described above, the present application also provides corresponding data processing apparatuses, so as to facilitate the application and implementation of these methods in practice.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a data processing apparatus 300 corresponding to the method shown in fig. 2, where the data processing apparatus 300 includes:
an obtaining module 301, configured to obtain material data;
a processing module 302, configured to obtain, according to the material data and a preset data structure, tag base data that conforms to the preset data structure; the preset data structure comprises a plurality of fields with a specified sequence, and different fields are used for bearing different types of attribute information of the material data;
an input module 303, configured to input the tag basic data into a tag system, so that the tag system sets a tag for the material data according to the tag basic data.
Optionally, the processing module 302 includes:
the category determination module is used for determining the attribute categories corresponding to the fields of the material data according to the content borne by the fields of the material data;
and the writing module is used for writing the content borne by each field of the material data into the corresponding field of the preset data structure according to the attribute category corresponding to each field of the material data to obtain the label basic data conforming to the preset data structure.
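For illustration only, a category determination based on the content carried by a field might use simple pattern checks, as in the sketch below. The function name `guess_category` and both patterns are assumptions, and the sketch covers only two attribute categories; a time value alone cannot distinguish creation time from acquisition time, so a real implementation would need further context.

```python
import re
from typing import Optional


def guess_category(content: str) -> Optional[str]:
    """Guess the attribute category from the content of one field."""
    if re.match(r"https?://", content):
        return "material link address"
    # Assumed timestamp form: YYYY-MM-DD with an optional HH:MM[:SS].
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}( \d{2}:\d{2}(:\d{2})?)?", content):
        return "material creation time"
    return None  # category could not be determined from content alone


link = guess_category("https://example.com/news/1")
created = guess_category("2018-10-22 08:30")
unknown = guess_category("Match report")
```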
Optionally, the preset data structure includes a plurality of fields for carrying region attribute information, and different fields for carrying region attribute information are used for carrying region attribute information of different region classes;
the write module is specifically configured to:
splitting the content carried by the fields corresponding to the region attribute categories in the material data into corresponding region content according to the region levels corresponding to the fields for carrying the region attribute information in the preset data structure;
and correspondingly writing the region content into a field for bearing region attribute information in the preset data structure.
Optionally, the preset data structure includes a field for bearing material keyword attribute information;
the write module is specifically configured to:
determining a material keyword according to content borne by a field corresponding to the material content attribute category in the material data;
and writing the material keywords into a field for bearing the material keyword attribute information in the preset data structure.
Optionally, the preset data structure includes fields for carrying the following attribute information: material source, material title, material subtitle, material creation time, material acquisition time, material content classification, material data type, material link address, material content text, material content abstract, primary region, secondary region, tertiary region, and material audit source.
Optionally, the material data includes: news material data from traditional media and news material data from network media.
Optionally, a plurality of fields included in the preset data structure are stored in a form of a database table.
The data processing apparatus provided in the embodiment of the present application processes the acquired material data based on the preset data structure to obtain the tag base data that conforms to the preset data structure, where the preset data structure includes a plurality of fields in a specified order, and different fields are used for bearing different types of attribute information of the material data; and inputting the obtained basic label data into a label system so that the label system carries out labeling processing according to the basic label data. The obtained material data are processed in the above way, so that the data input into the label system can be ensured to be label basic data conforming to a preset data structure, namely the data input into the label system are ensured to have the same data structure, so that the label system only needs to formulate a corresponding label method according to the preset data structure, and does not need to consume a large amount of manpower and material resources to formulate label methods capable of universally adapting to various data structures; in addition, when material data from a newly added source is faced, the material data is directly converted into corresponding label basic data based on the preset data structure, and the label system does not need to be subjected to version changing and upgrading.
The data processing device comprises a processor and a memory, wherein the acquisition module, the processing module, the input module and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to realize corresponding functions.
The processor comprises a kernel, and the kernel calls the corresponding program unit from the memory. One or more kernels may be provided, and the material data is processed by adjusting the kernel parameters.
The memory may include volatile memory in a computer readable medium, Random Access Memory (RAM) and/or nonvolatile memory such as Read Only Memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip.
An embodiment of the present invention provides a storage medium on which a program is stored, the program implementing the data processing method when executed by a processor.
The embodiment of the invention provides a processor, which is used for running a program, wherein the data processing method is executed when the program runs.
The embodiment of the invention provides equipment, which comprises a processor, a memory and a program which is stored on the memory and can run on the processor, wherein the processor executes the program and realizes the following steps:
acquiring material data;
obtaining label basic data conforming to a preset data structure according to the material data and the preset data structure; the preset data structure comprises a plurality of fields with a specified sequence, and different fields are used for bearing different types of attribute information of the material data;
and inputting the label basic data into a label system so that the label system sets a label for the material data according to the label basic data.
Optionally, the obtaining, according to the material data and a preset data structure, tag basic data conforming to the preset data structure includes:
determining attribute categories corresponding to the fields of the material data according to the content borne by the fields of the material data;
and writing the content carried by each field of the material data into the corresponding field of the preset data structure according to the attribute category corresponding to each field of the material data to obtain the label basic data conforming to the preset data structure.
Optionally, the preset data structure includes a plurality of fields for carrying region attribute information, and different fields for carrying region attribute information are used for carrying region attribute information of different region classes;
writing the content carried by each field of the material data into the corresponding field of the preset data structure according to the attribute category corresponding to each field of the material data, including:
splitting the content carried by the fields corresponding to the region attribute categories in the material data into corresponding region content according to the region levels corresponding to the fields for carrying the region attribute information in the preset data structure;
and correspondingly writing the region content into a field for bearing region attribute information in the preset data structure.
Optionally, the preset data structure includes a field for bearing material keyword attribute information;
writing the content carried by each field of the material data into the corresponding field of the preset data structure according to the attribute category corresponding to each field of the material data, including:
determining a material keyword according to content borne by a field corresponding to the material content attribute category in the material data;
and writing the material keywords into a field for bearing the material keyword attribute information in the preset data structure.
Optionally, the preset data structure includes fields for carrying the following attribute information: material source, material title, material subtitle, material creation time, material acquisition time, material content classification, material data type, material link address, material content text, material content abstract, primary region, secondary region, tertiary region, and material audit source.
Optionally, the material data includes: news material data from traditional media and news material data from network media.
Optionally, a plurality of fields included in the preset data structure are stored in a form of a database table.
The device herein may be a PC, a tablet (PAD), a mobile phone, or the like.
The present application further provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps:
acquiring material data;
obtaining label basic data conforming to a preset data structure according to the material data and the preset data structure; the preset data structure comprises a plurality of fields with a specified sequence, and different fields are used for bearing different types of attribute information of the material data;
and inputting the label basic data into a label system so that the label system sets a label for the material data according to the label basic data.
Optionally, the obtaining, according to the material data and a preset data structure, tag basic data conforming to the preset data structure includes:
determining attribute categories corresponding to the fields of the material data according to the content borne by the fields of the material data;
and writing the content carried by each field of the material data into the corresponding field of the preset data structure according to the attribute category corresponding to each field of the material data to obtain the label basic data conforming to the preset data structure.
Optionally, the preset data structure includes a plurality of fields for carrying region attribute information, and different fields for carrying region attribute information are used for carrying region attribute information of different region classes;
writing the content carried by each field of the material data into the corresponding field of the preset data structure according to the attribute category corresponding to each field of the material data, including:
splitting the content carried by the fields corresponding to the region attribute categories in the material data into corresponding region content according to the region levels corresponding to the fields for carrying the region attribute information in the preset data structure;
and correspondingly writing the region content into a field for bearing region attribute information in the preset data structure.
Optionally, the preset data structure includes a field for bearing material keyword attribute information;
writing the content carried by each field of the material data into the corresponding field of the preset data structure according to the attribute category corresponding to each field of the material data, including:
determining a material keyword according to content borne by a field corresponding to the material content attribute category in the material data;
and writing the material keywords into a field for bearing the material keyword attribute information in the preset data structure.
Optionally, the preset data structure includes fields for carrying the following attribute information: material source, material title, material subtitle, material creation time, material acquisition time, material content classification, material data type, material link address, material content text, material content abstract, primary region, secondary region, tertiary region, and material audit source.
Optionally, the material data includes: news material data from traditional media and news material data from network media.
Optionally, a plurality of fields included in the preset data structure are stored in a form of a database table.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal or a carrier wave.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.
Claims (10)
1. A method of data processing, the method comprising:
acquiring material data;
obtaining label basic data conforming to a preset data structure according to the material data and the preset data structure; the preset data structure comprises a plurality of fields with a specified sequence, and different fields are used for bearing different types of attribute information of the material data;
and inputting the label basic data into a label system so that the label system sets a label for the material data according to the label basic data.
2. The method according to claim 1, wherein the obtaining, according to the material data and a preset data structure, tag base data conforming to the preset data structure comprises:
determining attribute categories corresponding to the fields of the material data according to the content borne by the fields of the material data;
and writing the content carried by each field of the material data into the corresponding field of the preset data structure according to the attribute category corresponding to each field of the material data to obtain the label basic data conforming to the preset data structure.
3. The method according to claim 2, wherein the preset data structure includes a plurality of fields for carrying region attribute information, and different fields for carrying region attribute information are used for carrying region attribute information of different region levels;
writing the content carried by each field of the material data into the corresponding field of the preset data structure according to the attribute category corresponding to each field of the material data, including:
splitting the content carried by the fields corresponding to the region attribute categories in the material data into corresponding region content according to the region levels corresponding to the fields for carrying the region attribute information in the preset data structure;
and correspondingly writing the region content into a field for bearing region attribute information in the preset data structure.
4. The method according to claim 2, wherein the preset data structure comprises a field for carrying material keyword attribute information;
writing the content carried by each field of the material data into the corresponding field of the preset data structure according to the attribute category corresponding to each field of the material data, including:
determining a material keyword according to content borne by a field corresponding to the material content attribute category in the material data;
and writing the material keywords into a field for bearing the material keyword attribute information in the preset data structure.
5. The method according to claim 1, wherein the preset data structure comprises fields for carrying the following attribute information: material source, material title, material subtitle, material creation time, material acquisition time, material content classification, material data type, material link address, material content text, material content abstract, primary region, secondary region, tertiary region, and material audit source.
6. The method of claim 1, wherein the material data comprises: news material data from traditional media and news material data from network media.
7. The method of claim 1, wherein the plurality of fields included in the predetermined data structure are stored in a database table.
8. A data processing apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring material data;
the processing module is used for obtaining label basic data conforming to a preset data structure according to the material data and the preset data structure; the preset data structure comprises a plurality of fields with a specified sequence, and different fields are used for bearing different types of attribute information of the material data;
and the input module is used for inputting the label basic data into a label system so that the label system sets a label for the material data according to the label basic data.
9. A processor for running a program, wherein the program is run to perform the data processing method of any one of claims 1 to 7.
10. A storage medium, characterized in that the storage medium includes a stored program, wherein a device on which the storage medium is located is controlled to execute the data processing method according to any one of claims 1 to 7 when the program runs.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811231400.8A CN111078905A (en) | 2018-10-22 | 2018-10-22 | Data processing method, device, medium and equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111078905A (en) | 2020-04-28 |
Family
ID=70309788
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811231400.8A Pending CN111078905A (en) | 2018-10-22 | 2018-10-22 | Data processing method, device, medium and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111078905A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112667871A (en) * | 2020-12-30 | 2021-04-16 | 新奥数能科技有限公司 | Data identification method and device, computer readable storage medium and electronic equipment |
CN113971500A (en) * | 2020-07-23 | 2022-01-25 | 中国移动通信集团广东有限公司 | Data subdivision management method and device and data management platform |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030004960A1 (en) * | 2001-05-17 | 2003-01-02 | Peter Pressmar | Virtual database of heterogeneous data structures |
CN106354857A (en) * | 2016-09-06 | 2017-01-25 | 中国传媒大学 | News tag management system |
CN107231570A (en) * | 2017-06-13 | 2017-10-03 | 中国传媒大学 | News data content characteristic obtains system and application system |
CN107748803A (en) * | 2017-11-20 | 2018-03-02 | 中国运载火箭技术研究院 | A kind of roomage state characteristic event database design method |
CN107861974A (en) * | 2017-09-19 | 2018-03-30 | 北京金堤科技有限公司 | A kind of adaptive network crawler system and its data capture method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112203122B (en) | Similar video processing method and device based on artificial intelligence and electronic equipment | |
Staar et al. | Corpus conversion service: A machine learning platform to ingest documents at scale | |
US10409583B2 (en) | Content deployment system having a content publishing engine with a filter module for selectively extracting content items provided from content sources for integration into a specific release and methods for implementing the same | |
US20200264865A1 (en) | Content deployment system having a proxy for continuously providing selected content items to a content publishing engine for integration into a specific release and methods for implementing the same | |
CN110941621A (en) | Method and device for synchronizing databases between internal network and external network | |
US20210049711A1 (en) | Method of automatically transmitting data information and device of automatically transmitting data information | |
CN110895544B (en) | Interface data processing method, device, system and storage medium | |
CN107943465B (en) | Method and device for generating HTML (Hypertext markup language) form | |
CN110019111B (en) | Data processing method, data processing device, storage medium and processor | |
CN105630475A (en) | Data label organization system and organization method | |
CN112346761B (en) | Front-end resource online method, device, system and storage medium | |
CN104765849A (en) | Method and system for acquiring copied data source information | |
US20110055373A1 (en) | Service identification for resources in a computing environment | |
CN105447040B (en) | Binary file management and updating method, device and system | |
CN114416868B (en) | Data synchronization method, device, equipment and storage medium | |
CN111078905A (en) | Data processing method, device, medium and equipment | |
CN110716804A (en) | Method and device for automatically deleting useless resources, storage medium and electronic equipment | |
CN114428705A (en) | Network data monitoring method, device, equipment and storage medium | |
CN117271478A (en) | Data migration method and device, storage medium and electronic equipment | |
CN113254455A (en) | Dynamic configuration method and device of database, computer equipment and storage medium | |
CN117539981A (en) | Method, equipment and medium for constructing theme data set | |
CN108520012B (en) | Mobile internet user comment mining method based on machine learning | |
CN116594628A (en) | Data tracing method and device and computer equipment | |
CN112491943A (en) | Data request method, device, storage medium and electronic equipment | |
CN108268545B (en) | Method and device for establishing hierarchical user label library |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20200428 |