
CN110134702A - Data flow joining method, device, equipment and storage medium - Google Patents


Info

Publication number
CN110134702A
Authority
CN
China
Prior art keywords
media information
data source
data
current media
external storage
Legal status (as assumed by Google Patents; not a legal conclusion)
Pending
Application number
CN201910412910.3A
Other languages
Chinese (zh)
Inventor
张小虎
万田起
程怡
Current Assignee (as listed by Google Patents; may be inaccurate)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201910412910.3A
Publication of CN110134702A
Current legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2379 Updates performed during online database operations; commit processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24568 Data stream processing; Continuous queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

Embodiments of the invention disclose a data stream joining method, apparatus, device, and storage medium. The method includes: obtaining at least three data source logs of current media information from at least three data sources; determining a row key of the current media information, where each piece of media information maps uniquely to a row key; and writing, according to the row key of the current media information, the at least three data source logs of the current media information into the same row of an external store. By exploiting the multi-column property of tables in the store and writing the multiple data streams of the same media information into one row, the embodiments integrate the multiple data streams of the same media information during joining. In the subsequent data analysis of the joining process, the multiple data source logs of the media information can therefore be fetched quickly from the external store, which improves joining efficiency and reduces the resources consumed by joining.

Description

Data flow joining method, device, equipment and storage medium
Technical field
The embodiments of the invention relate to the technical field of data processing, and in particular to a data stream joining method, apparatus, device, and storage medium.
Background
With the rapid development of Internet technology, media information is presented in increasingly diverse ways. To evaluate how effective the delivery of media information is, its backend logs, impression logs, click logs, and conversion logs must be analyzed. Because media information can be delivered through many channels, logs of the same type can come from many data sources. Mining the delivery effectiveness of media information therefore requires joining the data from its multiple data sources.
Currently, streaming joins are generally used: a streaming engine, driven by the arrival time of the data streams, writes the first data stream into external storage; when the second data stream arrives, it is written into that external storage and into the next external storage, and so on, so that pairs of data streams are joined in real time in external storage. Supporting joins of many data streams requires aligning the progress of the streams, saving stream state in memory, and using multiple operators to perform pairwise joins.
However, when many data streams must be joined, existing joining methods have poor timeliness, consume substantial system resources, and cannot join the streams in real time.
Summary of the invention
Embodiments of the invention provide a data stream joining method, apparatus, device, and storage medium that effectively integrate the multiple data streams of the same media information during joining.
In a first aspect, an embodiment of the invention provides a data stream joining method applied to an external storage device, the method including:
obtaining at least three data source logs of current media information from at least three data sources;
determining a row key of the current media information, where each piece of media information maps uniquely to a row key; and
writing, according to the row key of the current media information, the at least three data source logs of the current media information into the same row of an external store.
In a second aspect, an embodiment of the invention provides a data stream joining apparatus configured in an external storage device, the apparatus including:
a data stream obtaining module, configured to obtain at least three data source logs of current media information from at least three data sources;
a media row key determining module, configured to determine the row key of the current media information, where each piece of media information maps uniquely to a row key; and
a data stream writing module, configured to write, according to the row key of the current media information, the at least three data source logs of the current media information into the same row of an external store.
In a third aspect, an embodiment of the invention provides a device, including:
one or more processors; and
a memory, configured to store one or more programs,
where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the data stream joining method of any embodiment of the invention.
In a fourth aspect, an embodiment of the invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the data stream joining method of any embodiment of the invention.
When multiple data streams are joined, the embodiments of the invention obtain at least three data source logs of the current media information and, based on the row key associated with the current media information, write the at least three data source logs of the same media information into the same row of the external store. By exploiting the multi-column property of tables in the store and writing the multiple data streams of the same media information into one row, the embodiments integrate those streams during joining, so that in the subsequent data analysis of the joining process the multiple data source logs of the media information can be fetched quickly from the external store. This improves joining efficiency and reduces the resources consumed by joining.
Brief description of the drawings
Fig. 1 is a flowchart of a data stream joining method according to Embodiment 1 of the invention;
Fig. 2 is a flowchart of a data stream joining method according to Embodiment 2 of the invention;
Fig. 3 is a schematic diagram of multi-source data stream joining according to Embodiment 2 of the invention;
Fig. 4 is a schematic structural diagram of a data stream joining apparatus according to Embodiment 3 of the invention;
Fig. 5 is a schematic structural diagram of a device according to Embodiment 4 of the invention.
Detailed description of embodiments
The embodiments of the invention are described in further detail below with reference to the drawings and examples. It should be understood that the specific embodiments described here serve only to explain the embodiments of the invention and do not limit the invention. It should also be noted that, for ease of description, the drawings show only the parts related to the embodiments of the invention rather than the entire structure.
Before the exemplary embodiments are discussed in more detail, note that some of them are described as processes or methods depicted as flowcharts. Although a flowchart describes operations (or steps) as a sequential process, many of those operations can be performed in parallel, concurrently, or simultaneously, and their order can be rearranged. A process may terminate when its operations are complete, and may also include additional steps not shown in the drawings. A process may correspond to a method, function, procedure, subroutine, subprogram, and so on.
Embodiment 1
Fig. 1 is a flowchart of a data stream joining method according to Embodiment 1 of the invention. This embodiment applies to joining multi-source data streams. The method may be performed by an external storage device, specifically by a data stream joining apparatus that may be implemented in software and/or hardware and is preferably configured in the external storage device. The method specifically includes the following steps.
S110: Obtain at least three data source logs of the current media information from at least three data sources.
In the specific embodiments of the invention, media information is information to be publicized or disseminated, such as advertisements or film and video resources. A data source is a device or original medium that provides the current media information, such as a server or a website platform. Correspondingly, each data source can generate data source logs that record user behavior. For example, in advertising the same advertisement can be placed on different websites, and each website, acting as a data source, generates web logs such as backend logs, impression logs, click logs, and conversion logs. These respectively record the physical placement of the advertisement, reflect how widely it was exposed, record user visits, browsing, and clicks, and record conversions such as users becoming members or customers through the advertisement. The data source logs of media information across different data sources therefore help webmasters, operations staff, and promoters obtain website traffic information in real time, and provide a data basis for web analytics covering traffic sources, site content, visitor characteristics, and more. This helps increase website traffic, improve the user experience, convert more visitors into members or customers, and maximize revenue from a smaller investment in media information.
In this embodiment, to address the technical problem that the prior art can join only two data streams in real time and cannot join many data streams in real time, at least three data source logs of the current media information to be analyzed are obtained from at least three data sources on which it was placed. Each data source may generate at least one data source log. For example, suppose advertisement A is the current media information to be analyzed. The at least three data sources on which advertisement A was placed, such as website A, website B, and website C, are identified, and the logs related to advertisement A are obtained from each website's servers.
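As a rough illustration of S110, the sketch below models each data source log as a small record and groups the logs by media information. The `SourceLog` fields, the site and ad names, and the rule of keeping only media seen on at least three sources are all assumptions made for this sketch; the text does not define a log schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


# Hypothetical log schema; the source does not define concrete fields.
@dataclass(frozen=True)
class SourceLog:
    source: str      # data source, e.g. "site_a"
    media_key: str   # keyword of the media information, e.g. an ad name
    timestamp: int   # generation time of the log (epoch seconds, assumed)
    payload: str     # raw log content (impression, click, conversion, ...)


MIN_SOURCES = 3  # the method joins logs from at least three data sources


def collect_logs(logs: Iterable[SourceLog]) -> Dict[str, List[SourceLog]]:
    """Group logs by media information, keeping media seen on >= MIN_SOURCES sources."""
    by_media: Dict[str, List[SourceLog]] = defaultdict(list)
    for log in logs:
        by_media[log.media_key].append(log)
    return {
        key: group
        for key, group in by_media.items()
        if len({log.source for log in group}) >= MIN_SOURCES
    }


logs = [
    SourceLog("site_a", "ad_A", 100, "impression"),
    SourceLog("site_b", "ad_A", 105, "click"),
    SourceLog("site_c", "ad_A", 130, "conversion"),
    SourceLog("site_a", "ad_B", 110, "impression"),  # only one source: dropped
]
grouped = collect_logs(logs)
print(sorted(grouped))  # ['ad_A']
```

Counting distinct sources rather than raw logs reflects the text's point that one source may emit several logs for the same media information.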
S120: Determine the row key of the current media information.
In the specific embodiments of the invention, the row key is the unique identifier of the media information and can be built from information such as key fields of the media information and a timestamp. Because each piece of media information maps uniquely to a row key, data can be looked up quickly after the multi-source data streams have been integrated.
For example, a unified timestamp of the current media information is selected from the timestamps of the at least three data source logs; the unified timestamp can be the first timestamp among those logs. The keyword of the current media information is then combined with its unified timestamp to form the row key of the current media information.
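The row key construction above can be sketched as follows. This interprets the "first timestamp" as the earliest one; the `#` separator and the 10-digit zero padding are hypothetical choices so that keys of the same media sort chronologically as strings, and are not specified by the text.

```python
from typing import Iterable


def unified_timestamp(timestamps: Iterable[int]) -> int:
    """Pick the unified timestamp: here, the earliest of the source-log timestamps."""
    ts = list(timestamps)
    if not ts:
        raise ValueError("at least one source-log timestamp is required")
    return min(ts)


def build_row_key(media_keyword: str, timestamps: Iterable[int], sep: str = "#") -> str:
    """Combine the media keyword and the unified timestamp into a row key.

    The '#' separator and 10-digit zero padding are assumptions: padding keeps
    keys of the same media in chronological order under byte-wise sorting
    (for non-negative timestamps of up to 10 digits).
    """
    return f"{media_keyword}{sep}{unified_timestamp(timestamps):010d}"


print(build_row_key("ad_A", [105, 100, 130]))  # ad_A#0000000100
```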
S130: According to the row key of the current media information, write the at least three data source logs of the current media information into the same row of the external store.
In the specific embodiments of the invention, the row of the external store associated with the current media information can be determined from the unique mapping between media information and row keys. Based on the multi-column property of tables in the store, the column associated with each data source in that row can also be determined. If the external store has no column associated with a given data source, a column associated with that data source is configured in the external store, updating the mapping between columns and data sources. The at least three data source logs of the current media information are then written into at least three columns of the same row. Specifically, each data source log of the current media information can be written as the cell value into the column associated with its data source; alternatively, each data source log together with its timestamp can be written as the cell value into that column. This completes the integration of the multi-source data streams during joining. Later, in the data analysis phase of joining, in response to a join request for target media information sent by a joining module, the at least three data source logs are looked up in the row associated with the target media information and returned to the joining module, which then joins the multi-source data streams.
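The write path of S130 can be sketched with an in-memory stand-in for a wide-column store. `WideColumnTable`, the `cf` column family name, and the `family:source` column naming are hypothetical; a real external store would expose its own client API. The sketch stores each log together with its timestamp as the cell value and configures a column the first time a data source appears.

```python
from typing import Dict, Iterable, Tuple

Cell = Tuple[int, str]  # (log timestamp, log content) stored as the cell value


class WideColumnTable:
    """In-memory stand-in for a wide-column external store (hypothetical).

    rows maps row_key -> {column: (timestamp, log)}; column_of_source holds the
    unique data source -> column mapping described in the text.
    """

    def __init__(self, family: str = "cf") -> None:
        self.family = family
        self.rows: Dict[str, Dict[str, Cell]] = {}
        self.column_of_source: Dict[str, str] = {}

    def _column_for(self, source: str) -> str:
        # First log seen from this source: configure a new column and
        # record it in the source -> column mapping.
        if source not in self.column_of_source:
            self.column_of_source[source] = f"{self.family}:{source}"
        return self.column_of_source[source]

    def write(self, row_key: str, source_logs: Iterable[Tuple[str, int, str]]) -> None:
        """Write (source, timestamp, log) triples into one row, one column per source."""
        row = self.rows.setdefault(row_key, {})
        for source, timestamp, log in source_logs:
            # Simplification: a later log from the same source overwrites the cell.
            row[self._column_for(source)] = (timestamp, log)

    def get_row(self, row_key: str) -> Dict[str, Cell]:
        return dict(self.rows.get(row_key, {}))


table = WideColumnTable()
table.write("ad_A#0000000100", [
    ("site_a", 100, "impression"),
    ("site_b", 105, "click"),
    ("site_c", 130, "conversion"),
])
print(table.get_row("ad_A#0000000100"))
# {'cf:site_a': (100, 'impression'), 'cf:site_b': (105, 'click'), 'cf:site_c': (130, 'conversion')}
```

Keeping one cell per source is a simplification; a store that retains multiple cell versions could keep every log a source emits for the same row.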
In the technical solution of this embodiment, when multiple data streams are joined, at least three data source logs of the current media information are obtained and, based on the row key associated with the current media information, the at least three data source logs of the same media information are written into the same row of the external store. By exploiting the multi-column property of tables in the store and writing the multiple data streams of the same media information into one row, the embodiment integrates those streams during joining, so that in the subsequent data analysis of the joining process the multiple data source logs of the media information can be fetched quickly from the external store. This improves joining efficiency and reduces the resources consumed by joining.
Embodiment 2
Building on Embodiment 1, this embodiment provides a preferred implementation of the data stream joining method, in which the multi-source data streams integrated in the external store are fetched in one pass and joined in one pass. Fig. 2 is a flowchart of a data stream joining method according to Embodiment 2 of the invention. As shown in Fig. 2, the method specifically includes the following steps.
S210: Obtain at least three data source logs of the current media information from at least three data sources.
In the specific embodiments of the invention, the current media information is placed through at least three data sources, and each data source can generate at least one data source log. When the data streams of the current media information are joined to analyze its delivery effectiveness, data source logs are obtained from each of its at least three data sources, yielding at least three data source logs.
S220: Select the unified timestamp of the current media information from the timestamps of the at least three data source logs.
In the specific embodiments of the invention, each data source log is generated at its own pace, and its timestamp records when it was generated. The at least three data source logs are parsed to determine the timestamp of each, and one of these timestamps is selected as the unified timestamp of the current media information. For example, the first timestamp among the at least three data source logs is used as the unified timestamp. The unified timestamp marks the approximate point in time of the obtained data streams of the current media information.
S230: Combine the keyword of the current media information with its unified timestamp to form the row key of the current media information.
In the specific embodiments of the invention, the keyword of the current media information can be any field that identifies, or uniquely identifies, it, such as the advertisement name or advertisement version. The row key is the unique identifier of the current media information, and each piece of media information maps uniquely to a row key. The keyword and the unified timestamp of the current media information are combined to form its row key, which uniquely identifies the logs of the multiple data sources under the current media information.
S240: According to the row key of the current media information and the unique mapping between columns and data sources in the external store, write the at least three data source logs of the current media information into at least three columns of the same row of the external store.
In the specific embodiments of the invention, the row of the external store associated with the current media information is determined from the unique mapping between media information and row keys. Based on the multi-column property of tables in the store, the column associated with each data source in that row is determined. The at least three data source logs of the current media information are then written as values, that is, as the stored data corresponding to the row key, into the at least three associated columns of that row.
Optionally, each data source log of the current media information is written together with its timestamp as the value into the column associated with its data source.
In this embodiment, in addition to the unified timestamp carried in the row key, the timestamp of each data source log is written together with the log as the value into the associated column of the external store. The data source logs in different columns of the same row therefore each carry their own timestamp, which facilitates further querying and analysis.
Optionally, if the external store has no column associated with a given data source, a column associated with that data source is configured in the external store, updating the mapping between columns and data sources, and the data source log is written into the newly configured column.
In this embodiment, when the data streams of media information are joined for the first time, or when the media information is placed through a new data source, the external store may not yet have a column associated with that data source. In that case, a column associated with the data source is configured in the external store and the mapping between columns and data sources is updated. The row associated with the current media information is then determined from the row key, and the data source log is written into the newly configured column of that row.
For example, using the column property of tables in the store, the result of integrating multi-source data streams during joining is shown in Table 1. The RowKey column holds the row key, formed with a timestamp (Timestamp_logkey). ColumnFamily denotes a column family, and each Column in the family identifies one data source; for example, the data streams of a company's different websites are separate Columns under the same ColumnFamily. As Table 1 shows, during data stream joining the data source logs that different data sources generate for the same media information are stored, keyed by the row key, as values in different columns of the same row, so a single lookup by row key retrieves the data source logs of all data sources for that media information.
Table 1: Sample integration of multi-source data streams in data stream joining
S250: In response to a join request for target media information sent by a joining module, look up the at least three data source logs in the row associated with the target media information.
In the specific embodiments of the invention, the joining module is the module that performs data analysis during joining, and it can be deployed on a computer device separate from the external store. The joining module sends a join request for the target media information to the external store holding the integrated data, in order to obtain the integrated data to be analyzed. The join request may contain the target media information or row key information. Correspondingly, the external storage device responds to the join request by using the row key as an index: it compares the join request with the row keys to locate the row of the target media information in the table of the external store, and fetches the at least three data source logs of the target media information from that row in one pass.
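A minimal sketch of S250, assuming rows are held in a plain dictionary and row keys follow a hypothetical `keyword#timestamp` layout. A request carrying a row key is answered by an exact lookup; a request carrying only the media keyword is answered by a row-key prefix match, which is what a range or prefix scan would do in a sorted store. `handle_join_request` is an illustrative name, not an API from the text.

```python
from typing import Dict, Optional, Tuple

Row = Dict[str, Tuple[int, str]]  # column -> (timestamp, log)


def handle_join_request(rows: Dict[str, Row], row_key: Optional[str] = None,
                        media_keyword: Optional[str] = None,
                        sep: str = "#") -> Dict[str, Row]:
    """Return every row matching a join request in a single lookup.

    Either an exact row key or a media keyword may be given; a keyword is
    matched as a row-key prefix, which assumes the 'keyword#timestamp' layout.
    """
    if row_key is not None:
        return {row_key: rows[row_key]} if row_key in rows else {}
    if media_keyword is not None:
        prefix = media_keyword + sep  # separator avoids matching 'ad_AB' for 'ad_A'
        return {k: v for k, v in sorted(rows.items()) if k.startswith(prefix)}
    raise ValueError("join request must carry a row key or a media keyword")


rows = {
    "ad_A#0000000100": {"cf:site_a": (100, "impression"),
                        "cf:site_b": (105, "click"),
                        "cf:site_c": (130, "conversion")},
    "ad_B#0000000110": {"cf:site_a": (110, "impression")},
}
result = handle_join_request(rows, media_keyword="ad_A")
print(list(result))  # ['ad_A#0000000100']
```

Because all source logs of a media item sit in one row, the joining module receives every column it needs from this single lookup instead of querying each stream separately.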
S260: Return the at least three data source logs found to the joining module, which joins the multi-source data streams.
In the specific embodiments of the invention, during data stream joining, the external storage device returns all the data source logs found for the target media information to the joining module in one pass, and the joining module joins all the data streams it obtains in one pass.
Specifically, Fig. 3 is a schematic diagram of multi-source data stream joining. As shown in Fig. 3, in the data integration phase of data stream joining, the column property of tables in the store is used to store the data stream logs of the multiple data sources of the same media information in different columns of the same row of the external store, so the external store integrates the multi-source data streams. In the data analysis phase, the external store receives the join request sent by the joining module and, using the timestamp in the row key together with the progress of the data streams, can quickly find the multi-source data streams of the target media information. The joining module can therefore obtain the multiple data source logs of the target media information in one pass and join them in one pass, according to the progress of each data stream tracked by a watermark manager and the join conditions preconfigured for business needs; day-level join windows are supported.
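The watermark-driven, one-pass join described above might look roughly like the sketch below. Everything here is an assumption for illustration: the global watermark is taken as the slowest stream's progress, a row is joined only once its day-level window (starting at the unified timestamp in the row key) has closed on every stream, and the business join condition is modeled as a hypothetical set of required columns.

```python
from typing import Dict, Iterable, List, Set, Tuple

DAY_SECONDS = 24 * 60 * 60  # day-level join window, as the text mentions

Row = Dict[str, Tuple[int, str]]  # column -> (timestamp, log)


class WatermarkManager:
    """Tracks per-stream progress; the global watermark is that of the slowest stream."""

    def __init__(self, streams: Iterable[str]) -> None:
        self.progress: Dict[str, float] = {s: float("-inf") for s in streams}

    def advance(self, stream: str, event_time: int) -> None:
        self.progress[stream] = max(self.progress[stream], event_time)

    def watermark(self) -> float:
        return min(self.progress.values())


def join_ready_rows(rows: Dict[str, Row], watermarks: WatermarkManager,
                    required_columns: Set[str], window: int = DAY_SECONDS
                    ) -> List[Tuple[str, List[str]]]:
    """Join every row whose window has closed on all streams, in one pass per row.

    Assumes row keys end in '#<unified timestamp>'; required_columns is a
    hypothetical stand-in for the business-configured join condition.
    """
    joined = []
    wm = watermarks.watermark()
    for row_key, cells in sorted(rows.items()):
        unified_ts = int(row_key.rsplit("#", 1)[1])
        if unified_ts + window > wm:
            continue  # a lagging stream may still deliver logs for this row
        if not required_columns.issubset(cells):
            continue  # join condition not satisfied
        # All sources' logs for the media are combined at once, ordered by column.
        joined.append((row_key, [cells[col][1] for col in sorted(cells)]))
    return joined


wm = WatermarkManager(["cf:site_a", "cf:site_b", "cf:site_c"])
for stream in ("cf:site_a", "cf:site_b", "cf:site_c"):
    wm.advance(stream, 100 + DAY_SECONDS)
rows = {"ad_A#0000000100": {"cf:site_a": (100, "impression"),
                            "cf:site_b": (105, "click"),
                            "cf:site_c": (130, "conversion")}}
result = join_ready_rows(rows, wm, {"cf:site_a", "cf:site_b"})
print(result)  # [('ad_A#0000000100', ['impression', 'click', 'conversion'])]
```

Waiting on the minimum watermark trades latency for completeness: a row is emitted only when no stream can still contribute logs inside its window.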
In the technical solution of this embodiment, when multiple data streams are joined, at least three data source logs of the current media information are obtained and, based on the row key associated with the current media information, the at least three data source logs of the same media information are written into the same row of the external store. During data stream joining, the multiple data source logs of the target media information are looked up and provided to the joining module in one pass, so that the joining module can join them in one pass. By exploiting the multi-column property of tables in the store and writing the multiple data streams of the same media information into one row, the embodiment integrates those streams during joining, so that in the subsequent data analysis the multiple data source logs of the media information can be fetched quickly from the external store. This reduces the number of joins performed on the data streams, improves joining efficiency, and reduces the resources consumed by joining.
Embodiment 3
Fig. 4 is a schematic structural diagram of a data stream joining apparatus according to Embodiment 3 of the invention. This embodiment applies to joining multi-source data streams, and the apparatus can implement the data stream joining method of any embodiment of the invention. The apparatus specifically includes:
a data stream obtaining module 410, configured to obtain at least three data source logs of current media information from at least three data sources;
a media row key determining module 420, configured to determine the row key of the current media information, where each piece of media information maps uniquely to a row key; and
a data stream writing module 430, configured to write, according to the row key of the current media information, the at least three data source logs of the current media information into the same row of an external store.
Optionally, the media row key determining module 420 is specifically configured to:
select the unified timestamp of the current media information from the timestamps of the at least three data source logs; and
combine the keyword of the current media information with its unified timestamp to form the row key of the current media information.
Optionally, the media row key determining module 420 is specifically configured to:
use the first timestamp among the at least three data source logs as the unified timestamp of the current media information.
Optionally, the data stream writing module 430 is specifically configured to:
write, according to the unique mapping between columns and data sources in the external store, the at least three data source logs of the current media information into at least three columns of the same row of the external store.
Optionally, the data stream writing module 430 is specifically configured to:
if the external store has no column associated with a given data source, configure a column associated with that data source in the external store, updating the mapping between columns and data sources; and
write the data source log into the column configured in the external store.
Optionally, the data stream writing module 430 is specifically configured to:
write each data source log of the current media information together with its timestamp as the value into the column associated with its data source.
Further, the apparatus includes a data stream query module 440, which is specifically configured to:
after the at least three data source logs of the current media information have been written into the same row of the external store, respond to a join request for target media information sent by a joining module by looking up the at least three data source logs in the row associated with the target media information; and
return the at least three data source logs found to the joining module, which joins the multi-source data streams.
Through the cooperation of its functional modules, the technical solution of this embodiment implements data stream acquisition, row key determination, value writing, and data stream querying. By exploiting the multi-column property of tables in the store and writing the multiple data streams of the same media information into one row, the embodiments of the invention integrate those streams during joining, so that in the subsequent data analysis of the joining process the multiple data source logs of the media information can be fetched quickly from the external store. This reduces the number of joins performed on the data streams, improves joining efficiency, and reduces the resources consumed by joining.
Embodiment 4
Fig. 5 is a schematic structural diagram of a device according to Embodiment 4 of the invention, showing a block diagram of an exemplary device suitable for implementing the embodiments of the invention. The device 12 shown in Fig. 5 is merely an example and should not limit the functions or scope of use of the embodiments of the invention.
Device 12 is preferably an external storage device.
As shown in Fig. 5, the device 12 takes the form of a general-purpose computing device. Components of the device 12 may include, but are not limited to, one or more processors 16, a system memory 28, and a bus 18 connecting the different system components (including the system memory 28 and the processor 16).
The bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The device 12 typically includes a variety of computer-system-readable media. These media may be any available media accessible by the device 12, including volatile and non-volatile media and removable and non-removable media.
The system memory 28 may include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 30 and/or a cache memory 32. The device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 34 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in Fig. 5, commonly referred to as a "hard disk drive"). Although not shown in Fig. 5, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from and writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical medium) may be provided. In these cases, each drive may be connected to the bus 18 through one or more data media interfaces. The system memory 28 may include at least one program product having a set of (e.g., at least one) program modules configured to perform the functions of the embodiments of the present invention.
A program/utility 40 having a set of (at least one) program modules 42 may be stored, for example, in the system memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described herein.
The device 12 may also communicate with one or more external devices 14 (e.g., a keyboard, a pointing device, a display 24), with one or more devices that enable a user to interact with the device 12, and/or with any device (e.g., a network card or modem) that enables the device 12 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22. In addition, the device 12 may communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 20. As shown, the network adapter 20 communicates with the other modules of the device 12 through the bus 18. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the device 12, including but not limited to microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
The processor 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example implementing the data stream splicing method provided by the embodiments of the present invention.
Embodiment Five
Embodiment Five of the present invention further provides a computer-readable storage medium storing a computer program (or computer-executable instructions) which, when executed by a processor, performs a data stream splicing method, the method including:
obtaining at least three data source logs of current media information from at least three data sources;
determining a row key of the current media information, wherein there is a unique mapping relationship between media information and row keys; and
writing, according to the row key of the current media information, the at least three data source logs of the current media information into the same row of an external storage.
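The three steps above can be illustrated with a minimal Python sketch. It is not the patented implementation: the in-memory `ExternalStorage` class, the `col_<source>` column naming, the log tuple shape, and the reading of "first timestamp" as the earliest timestamp are all assumptions. The row key rule (media keyword plus unified timestamp), the source-to-column mapping that is extended on demand, and the timestamp-keyed writes follow claims 2 to 6.

```python
from typing import Dict, List, Tuple

# (data source name, log timestamp, log payload); the tuple shape is an assumption.
SourceLog = Tuple[str, int, str]

MIN_SOURCES = 3  # the claims require logs from at least three data sources


class ExternalStorage:
    """Hypothetical in-memory stand-in for a wide-column external storage.

    rows: row key -> {column -> {log timestamp -> log payload}}
    """

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, Dict[int, str]]] = {}
        # Unique mapping between data sources and columns.
        self.source_to_column: Dict[str, str] = {}

    def column_for(self, source: str) -> str:
        # If no column is associated with this data source yet, configure one
        # and record it, which updates the column/data-source mapping.
        if source not in self.source_to_column:
            self.source_to_column[source] = f"col_{source}"  # hypothetical naming
        return self.source_to_column[source]

    def write_row(self, row_key: str, logs: List[SourceLog]) -> None:
        row = self.rows.setdefault(row_key, {})
        for source, ts, payload in logs:
            # Store each log under its timestamp, as a key-value pair,
            # in the column associated with its data source.
            row.setdefault(self.column_for(source), {})[ts] = payload


def row_key_for(media_keyword: str, logs: List[SourceLog]) -> str:
    # Unified timestamp: the "first" timestamp, assumed here to mean the earliest.
    unified_ts = min(ts for _source, ts, _payload in logs)
    # Keyword plus unified timestamp gives one row key per media information item.
    return f"{media_keyword}_{unified_ts}"


def write_media_logs(store: ExternalStorage, media_keyword: str, logs: List[SourceLog]) -> str:
    if len({source for source, _ts, _payload in logs}) < MIN_SOURCES:
        raise ValueError("expected logs from at least three data sources")
    row_key = row_key_for(media_keyword, logs)
    store.write_row(row_key, logs)  # all logs land in the same row
    return row_key
```

For example, `write_media_logs(store, "ad42", [("impression", 1000, "imp"), ("click", 1005, "clk"), ("conversion", 1020, "cv")])` returns `"ad42_1000"` and leaves all three logs in that single row, one column per data source.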
The computer storage medium of the embodiments of the present invention may use any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by, or in connection with, an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device.
Program code contained on a computer-readable medium may be transmitted using any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
Computer program code for performing the operations of the embodiments of the present invention may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).
Note that the above are merely preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments, and substitutions can be made without departing from the scope of protection of the present invention. Therefore, although the embodiments of the present invention have been described in further detail through the above embodiments, they are not limited to those embodiments and may include further equivalent embodiments without departing from the concept of the present invention; the scope of the present invention is determined by the scope of the appended claims.

Claims (16)

1. A data stream splicing method, characterized in that it is applied to an external storage device, the method comprising:
obtaining at least three data source logs of current media information from at least three data sources;
determining a row key of the current media information, wherein there is a unique mapping relationship between media information and row keys; and
writing, according to the row key of the current media information, the at least three data source logs of the current media information into the same row of an external storage.
2. The method according to claim 1, wherein determining the row key of the current media information comprises:
selecting a unified timestamp of the current media information from the timestamps of the at least three data source logs; and
combining a keyword of the current media information with the unified timestamp of the current media information to form the row key of the current media information.
3. The method according to claim 2, wherein selecting the unified timestamp of the current media information from the timestamps of the at least three data source logs comprises:
taking the first timestamp among the at least three data source logs as the unified timestamp of the current media information.
4. The method according to claim 1, wherein writing the at least three data source logs of the current media information into the same row of the external storage comprises:
writing, according to a unique mapping relationship between columns and data sources in the external storage, the at least three data source logs of the current media information into at least three columns of one row of the external storage.
5. The method according to claim 4, wherein writing, according to the unique mapping relationship between columns and data sources in the external storage, the at least three data source logs of the current media information into at least three columns of one row of the external storage comprises:
if no column associated with a given data source exists in the external storage, configuring a column associated with that data source in the external storage, so as to update the mapping relationship between columns and data sources in the external storage; and
writing the data source log into the column configured in the external storage.
6. The method according to claim 4, wherein writing the at least three data source logs of the current media information into the same row of the external storage comprises:
writing each data source log of the current media information, together with the timestamp of that data source log, as a key-value pair into the column associated with the corresponding data source.
7. The method according to claim 1, wherein, after writing the at least three data source logs of the current media information into the same row of the external storage, the method further comprises:
in response to a splicing request for target media information sent by a splicing module, querying the at least three data source logs in the row associated with the target media information; and
feeding the queried at least three data source logs back to the splicing module, so that the splicing module can perform multi-source data stream splicing.
8. A data stream splicing apparatus, characterized in that it is configured in an external storage device, the apparatus comprising:
a data stream acquisition module, configured to obtain at least three data source logs of current media information from at least three data sources;
a media row key determination module, configured to determine a row key of the current media information, wherein there is a unique mapping relationship between media information and row keys; and
a data stream writing module, configured to write, according to the row key of the current media information, the at least three data source logs of the current media information into the same row of an external storage.
9. The apparatus according to claim 8, wherein the media row key determination module is specifically configured to:
select a unified timestamp of the current media information from the timestamps of the at least three data source logs; and
combine a keyword of the current media information with the unified timestamp of the current media information to form the row key of the current media information.
10. The apparatus according to claim 9, wherein the media row key determination module is specifically configured to:
take the first timestamp among the at least three data source logs as the unified timestamp of the current media information.
11. The apparatus according to claim 8, wherein the data stream writing module is specifically configured to:
write, according to a unique mapping relationship between columns and data sources in the external storage, the at least three data source logs of the current media information into at least three columns of one row of the external storage.
12. The apparatus according to claim 11, wherein the data stream writing module is specifically configured to:
if no column associated with a given data source exists in the external storage, configure a column associated with that data source in the external storage, so as to update the mapping relationship between columns and data sources in the external storage; and
write the data source log into the column configured in the external storage.
13. The apparatus according to claim 11, wherein the data stream writing module is specifically configured to:
write each data source log of the current media information, together with the timestamp of that data source log, as a key-value pair into the column associated with the corresponding data source.
14. The apparatus according to claim 8, wherein the apparatus further comprises a data stream query module, the data stream query module being configured to:
after the at least three data source logs of the current media information have been written into the same row of the external storage, in response to a splicing request for target media information sent by a splicing module, query the at least three data source logs in the row associated with the target media information; and
feed the queried at least three data source logs back to the splicing module, so that the splicing module can perform multi-source data stream splicing.
15. A device, comprising:
one or more processors; and
a memory configured to store one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the data stream splicing method according to any one of claims 1-7.
16. A computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements the data stream splicing method according to any one of claims 1-7.
CN201910412910.3A 2019-05-17 2019-05-17 Data flow joining method, device, equipment and storage medium Pending CN110134702A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910412910.3A CN110134702A (en) 2019-05-17 2019-05-17 Data flow joining method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110134702A true CN110134702A (en) 2019-08-16

Family

ID=67574984

Country Status (1)

Country Link
CN (1) CN110134702A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190816