CN116910113B

CN116910113B - Streaming statistics method and device for archive data, server and readable storage medium

Info

Publication number: CN116910113B
Application number: CN202310402956.3A
Authority: CN
Inventors: 陈常雨
Original assignee: Beijing Hesi Information Technology Co Ltd
Current assignee: Beijing Hesi Information Technology Co Ltd
Priority date: 2023-04-14
Filing date: 2023-04-14
Publication date: 2024-10-29
Anticipated expiration: 2043-04-14
Also published as: CN116910113A

Abstract

The invention provides a streaming statistics method and device for archive data, a server and a readable storage medium. The streaming statistics method comprises: monitoring, through a data monitoring model, data changes of the archive data stored in each database of a database cluster, and generating a data change message corresponding to a target database when a data change is detected in the target database; maintaining, through the data monitoring model, the associated fields corresponding to the changed archive data based on the data change message and a first data category of the changed archive data; and issuing, through the data monitoring model, the maintained associated fields to the target database, so that when the target database receives an archive query request, it queries and performs statistics for the archive identifier to be queried carried in the request based on the maintained associated fields and obtains an archive query result. The invention significantly relieves pressure on service memory when archive data is queried and effectively improves the query efficiency of archive data.

Description

Streaming statistics method and device for archive data, server and readable storage medium

Technical Field

The present invention relates to the field of data processing technologies, and in particular, to a method, an apparatus, a server, and a readable storage medium for stream statistics of archive data.

Background

At present, an archive management system platform can ingest various third-party electronic accounting archive data, such as accounting vouchers, original receipts, bank electronic receipts and value-added tax invoices. Existing enterprise financial archives are generally managed with the voucher as the dimension: each voucher contains data such as invoices and receipts. These data are pushed by vendors of different platforms, and the invoices, receipts and other information associated with a voucher may change at any time.

In practical applications, a voucher and an invoice may not be associated directly; for example, a voucher is associated with a receipt, and that receipt is in turn associated with an invoice. Because of this indirect relation, the number of invoices belonging to a voucher cannot be read directly from the relation when the voucher is queried. Resolving it by walking the relation layer by layer at query time puts considerable pressure on service memory and makes queries inefficient.
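To make the cost of the indirect relation concrete, the following Python sketch models the voucher, receipt and invoice links as an in-memory association table and counts a voucher's invoices by walking the relation level by level at query time. The table layout, the IDs and the function names are hypothetical; in a real deployment each `children_of` call would be a database query, so round trips and memory use grow with the depth and breadth of the association.

```python
# Hypothetical association table: (parent_id, child_id, child_category).
# A voucher links to receipts and each receipt links to invoices, so the
# voucher-invoice relation is only indirect.
ASSOCIATIONS = [
    ("V1", "R1", "receipt"),
    ("V1", "R2", "receipt"),
    ("R1", "I1", "invoice"),
    ("R1", "I2", "invoice"),
    ("R2", "I3", "invoice"),
]


def children_of(archive_id):
    """Direct children of one archive; in production this would be one database query."""
    return [(child, cat) for parent, child, cat in ASSOCIATIONS if parent == archive_id]


def count_invoices_layer_by_layer(voucher_id):
    """Walk the relation level by level at query time, as in the background art."""
    invoices = set()
    visited = {voucher_id}  # keeps every ID fetched so far in memory
    frontier = [voucher_id]
    while frontier:
        next_frontier = []
        for archive_id in frontier:
            for child_id, category in children_of(archive_id):
                if child_id in visited:
                    continue
                visited.add(child_id)
                if category == "invoice":
                    invoices.add(child_id)
                next_frontier.append(child_id)
        frontier = next_frontier
    return len(invoices)


print(count_invoices_layer_by_layer("V1"))  # 3
```

Every query repeats this walk, which is the work the streaming approach below moves out of the query path.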

Disclosure of Invention

Accordingly, the present invention is directed to a method, an apparatus, a server, and a readable storage medium for streaming statistics of archive data, which can significantly relieve the pressure on a service memory when querying archive data, and can effectively improve the querying efficiency of archive data.

In a first aspect, an embodiment of the present invention provides a streaming statistics method for archive data. The method is applied to a server configured with a data monitoring model, and comprises:

monitoring, through the data monitoring model, data changes of the archive data stored in each database of the database cluster, and generating a data change message corresponding to a target database when a data change is detected in the target database, wherein the data change message comprises at least a first archive identifier of the changed archive data;

maintaining, through the data monitoring model, an associated field corresponding to the changed archive data based on the data change message and a first data category of the changed archive data, wherein the associated field records a second archive identifier of associated archive data that has a direct or indirect association relationship with the changed archive data; and

issuing, through the data monitoring model, the maintained associated field to the target database, so that when the target database receives an archive query request, it queries and performs statistics for the archive identifier to be queried carried in the request based on the maintained associated field, to obtain an archive query result.

In one embodiment, the data monitoring model includes a Debezium unit, and the Debezium unit is provided with a DebeziumConfig class and a DebeziumEngine class;

monitoring, through the data monitoring model, the data changes of the archive data stored in each database of the database cluster comprises:

establishing a connection with each database in the database cluster based on the DebeziumConfig class, wherein the DebeziumConfig class comprises at least the address information, port information and account information of each database; and

after the connection with each database is established successfully, loading the DebeziumEngine class into a global environment variable and monitoring, in an asynchronous thread, data changes of the archive data stored in each database, wherein the DebeziumEngine class declares the data receiving format and the data processing functions.

In one embodiment, the data processing functions include a time judging function, a data category judging function and an operation category judging function;

generating a data change message corresponding to the target database when a data change is detected in the target database comprises:

if data change information sent by a database in the data receiving format is received, determining that database as the target database and determining that a data change has been detected in the target database, wherein the data change information carries the first archive identifier of the changed archive data;

determining, through the time judging function, the change time corresponding to the data change information; determining, through the data category judging function, the first data category of the changed archive data involved in the data change information; and determining, through the operation category judging function, the operation category corresponding to the data change information; and

and generating a data change message corresponding to the target database based on the change time, the first data category and the operation category corresponding to the data change information.

In one embodiment, the data monitoring model further includes a Kafka unit configured with a Topic and a consumer corresponding to each data category;

Before maintaining the association field corresponding to the changed archive data based on the data change message and the first data category of the changed archive data, the method further comprises:

transmitting a data change message to a target Topic corresponding to the first data category through the Debezium unit;

Based on the data change message and the first data category of the changed archive data, maintaining an associated field corresponding to the changed archive data, including:

And maintaining the associated field corresponding to the changed archive data according to the data change information and the operation category corresponding to the data change information by the target consumer corresponding to the first data category.

In one embodiment, the first data category is a parent data category, the target consumer is a parent consumer, and the associated field includes at least one associated sub-level field;

maintaining, by the target consumer corresponding to the first data category of the changed archive data, an association field corresponding to the changed archive data according to the data change information and an operation category corresponding to the data change information, including:

if the operation category corresponding to the data change information is an add operation or a modification operation, searching downwards, through the parent consumer, for the second archive identifier and the second data category of each item of sub-level associated archive data that has a direct or indirect association relationship with the first archive identifier of the changed archive data, wherein the second data category is a sub-level data category; and

for each item of sub-level associated archive data, determining a first target associated sub-level field to be added or modified according to the sub-level data category to which it belongs, and adding or modifying the first target associated sub-level field based on the second archive identifier of that sub-level associated archive data.

In one embodiment, the first data category is a sub-level data category, the target consumer is a sub-level consumer, and the associated field is a second target associated sub-level field corresponding to the sub-level data category;

Maintaining, by the target consumer corresponding to the first data category of the changed archive data, an association field corresponding to the changed archive data according to the data change information and the operation category corresponding to the data change information, and further including:

searching upwards, through the sub-level consumer, for the parent-level associated archive data that has a direct or indirect association relationship with the first archive identifier of the changed archive data;

determining the second target associated sub-level field, according to the sub-level data category, from the at least one associated sub-level field corresponding to the parent-level associated archive data;

if the operation category corresponding to the data change information is an add operation or a modification operation and the first archive identifier of the changed archive data is not yet stored in the second target associated sub-level field, adding or modifying the second target associated sub-level field based on the first archive identifier; or

if the operation category corresponding to the data change information is a delete operation and the first archive identifier of the changed archive data is stored in the second target associated sub-level field, deleting the first archive identifier from the second target associated sub-level field.
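The sub-level consumer logic above (search upwards, then add the identifier only if absent, or remove it only if present) can be sketched in Python as follows. The upward relation `PARENTS`, the comma-separated field layout and all names are hypothetical; the sketch also assumes the association rows are still readable when a delete message is processed, which a real implementation would need to guarantee.

```python
# Hypothetical upward relation: archive ID -> IDs of its direct parents.
PARENTS = {"I5": ["R3"], "R3": ["V3"]}


def parent_vouchers(archive_id, voucher_ids):
    """Search upwards until parent-level (voucher) archives are reached."""
    found, seen, stack = [], set(), [archive_id]
    while stack:
        for parent in PARENTS.get(stack.pop(), []):
            if parent in seen:
                continue
            seen.add(parent)
            if parent in voucher_ids:
                found.append(parent)
            else:
                stack.append(parent)  # an intermediate level, e.g. a receipt
    return found


def on_child_change(vouchers, message, field):
    """Sub-level consumer: add the first archive ID once, or remove it on delete."""
    child_id = message["archiveId"]
    for voucher_id in parent_vouchers(child_id, vouchers):
        ids = [x for x in vouchers[voucher_id].get(field, "").split(",") if x]
        if message["operation"] in ("add", "modify") and child_id not in ids:
            ids.append(child_id)  # the "not already stored" check de-duplicates
        elif message["operation"] == "delete" and child_id in ids:
            ids.remove(child_id)
        vouchers[voucher_id][field] = ",".join(ids)
```

Because both branches check membership first, replaying the same message leaves the field unchanged, which matters if the consumer re-reads messages after a restart.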

In one embodiment, the Debezium unit is further provided with a sync class; the method further comprises the steps of:

Querying each parent level history archive data stored in each database through the synchronization class;

For each parent level history archive data, querying each child level history archive data directly or indirectly associated with the parent level history archive data;

generating, for that parent-level history archive data, the associated sub-level field of each level according to the sub-level data category to which each item of sub-level history archive data belongs and its archive identifier.
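The synchronization step above backfills associated fields for archive data that existed before monitoring started. A minimal Python sketch, assuming a hypothetical `children` mapping in place of the association table and processing historical vouchers in fixed-size batches so that the backfill itself does not recreate the memory pressure the method is meant to avoid:

```python
from itertools import islice


def iter_batches(ids, batch_size):
    """Yield lists of at most batch_size IDs; with a lazy `ids`, only one batch is held."""
    it = iter(ids)
    while batch := list(islice(it, batch_size)):
        yield batch


def backfill(voucher_ids, children, batch_size=500):
    """Synchronization sketch: build associated fields for historical vouchers, batch by batch.

    `children` maps an archive ID to (child_id, child_category) pairs; it and the
    field names are hypothetical stand-ins for the real association and archive tables.
    """
    for batch in iter_batches(voucher_ids, batch_size):
        updates = {}
        for voucher_id in batch:
            found = {"receipt": set(), "invoice": set()}
            stack = [voucher_id]
            while stack:  # direct and indirect children of this voucher
                for child_id, category in children.get(stack.pop(), []):
                    if child_id not in found[category]:
                        found[category].add(child_id)
                        stack.append(child_id)
            updates[voucher_id] = {
                "associated_receipts": ",".join(sorted(found["receipt"])),
                "associated_invoices": ",".join(sorted(found["invoice"])),
            }
        yield updates  # in practice: one batched UPDATE per yielded dict
```

Debezium's own initial snapshot (events with `op = "r"`) is an alternative source for historical rows, but the method described here uses a separate synchronization class.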

In a second aspect, an embodiment of the present invention further provides a streaming statistics device for archive data. The device is applied to a server configured with a data monitoring model, and comprises:

a monitoring module, configured to monitor, through the data monitoring model, data changes of the archive data stored in each database of the database cluster, and to generate a data change message corresponding to a target database when a data change is detected in the target database, wherein the data change message comprises at least a first archive identifier of the changed archive data;

a maintenance module, configured to maintain, through the data monitoring model, the associated field corresponding to the changed archive data based on the data change message and the first data category of the changed archive data, wherein the associated field records a second archive identifier of associated archive data that has a direct or indirect association relationship with the changed archive data; and

an issuing module, configured to issue, through the data monitoring model, the maintained associated field to the target database, so that when the target database receives an archive query request, it queries and performs statistics for the archive identifier to be queried carried in the request based on the maintained associated field, to obtain an archive query result.

In a third aspect, embodiments of the present invention also provide a server comprising a processor and a memory storing computer executable instructions executable by the processor, the processor executing the computer executable instructions to implement the method of any one of the first aspects.

In a fourth aspect, embodiments of the present invention also provide a computer-readable storage medium storing computer-executable instructions which, when invoked and executed by a processor, cause the processor to implement the method of any one of the first aspects.

The streaming statistics method and device for archive data, the server and the readable storage medium provided by the embodiments of the present invention are applied to a server configured with a data monitoring model. The data monitoring model monitors data changes of the archive data stored in each database and generates a corresponding data change message when a change is detected; it then maintains the associated field corresponding to the changed archive data based on the data change message and the first data category of the changed archive data, and issues the maintained associated field to the target database. When the target database receives an archive query request, it can return the archive query result based on the maintained associated field. This effectively converts the archive query process into a streaming operation.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

In order to make the above objects, features and advantages of the present invention more comprehensible, preferred embodiments accompanied with figures are described in detail below.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flowchart of a streaming statistics method for archive data according to an embodiment of the present invention;

FIG. 2 is a flowchart of another streaming statistics method for archive data according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a streaming statistics device for archive data according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a server according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described in conjunction with the embodiments, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

At present, as the business grows and data volume increases, retrieving the directly or indirectly associated invoices, receipts and other data of a voucher, by first querying the voucher from the voucher list page and then querying layer by layer, is very slow, and interface timeouts frequently occur. This approach also consumes a great deal of service memory: when there are many association relationships and the hierarchy is deep, large amounts of data are cached in memory, which can cause the service memory to overflow and may even bring the service down.

Based on this, embodiments of the present invention provide a streaming statistics method and device for archive data, a server and a readable storage medium, which significantly relieve pressure on service memory when archive data is queried and effectively improve the query efficiency of archive data.

To facilitate understanding of this embodiment, the streaming statistics method for archive data disclosed herein is first described in detail. The method is applied to a server configured with a data monitoring model. Referring to the flowchart of a streaming statistics method for archive data shown in FIG. 1, the method mainly includes the following steps S102 to S106:

Step S102: monitor, through the data monitoring model, data changes of the archive data stored in each database of the database cluster, and generate a data change message corresponding to the target database when a data change is detected in the target database. The data change message comprises at least the first archive identifier of the changed archive data, which is the archive ID (identification number) of the changed archive data. The data change message may further include the change time, the first data category and the operation category corresponding to the data change information. The first data category, i.e., the category of the changed archive data, may be a parent data category or a sub-level data category. Illustratively, the parent data category may be a voucher category, and the sub-level data categories may include a receipt category and an invoice category, where the level below the voucher category is the receipt category and the level below the receipt category is the invoice category. The operation category, i.e., the category of the operation performed on the changed archive data, may be an add operation, a modification operation or a delete operation. The database cluster may be a MySQL database cluster.

In one embodiment, the data monitoring model includes a Debezium unit and a Kafka unit. The Debezium unit monitors data changes of the archive data stored in each database; when a change is detected, it generates the corresponding data change message from the first archive identifier, the first data category, the change time and the category of the operation performed on the changed archive data, and sends the message to the Kafka unit, which then maintains the associated field corresponding to the changed archive data.

Step S104: maintain, through the data monitoring model, the associated field corresponding to the changed archive data based on the data change message and the first data category of the changed archive data. The associated field records the second archive identifier of associated archive data that has a direct or indirect association relationship with the changed archive data; the second archive identifier is the archive ID of the associated archive data.

In one embodiment, the Kafka unit is provided with a Topic for each data category. The data change message is received on the target Topic corresponding to the first data category, and Kafka's consumption mechanism broadcasts it so that every consumer listening on the target Topic receives the message. In addition, a consumer is configured in the Kafka unit for each data category, such as a parent consumer (e.g., a voucher change consumer) and sub-level consumers (e.g., a receipt change consumer). The corresponding consumers separately process changes to archive data of the voucher category (voucher data for short), the receipt category (receipt data for short) and the invoice category (invoice data for short), so that the archive identifiers of the receipt data and invoice data associated with each item of voucher data are recorded accurately.

Step S106: issue, through the data monitoring model, the maintained associated field to the target database, so that when the target database receives an archive query request, it queries and performs statistics for the archive identifier to be queried carried in the request based on the maintained associated field, to obtain an archive query result. The archive query result may include the archive data corresponding to the archive identifier to be queried. If that archive data is voucher data, the archive query result may further include the archive identifiers and archive numbers of the receipt data associated with the voucher data, and/or the archive identifiers and archive numbers of the invoice data associated with it.
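Once the associated fields are maintained, answering a query no longer requires any recursion: one row read and a string split per field are enough. The following Python sketch assumes the comma-separated `associated_receipts` and `associated_invoices` fields used elsewhere in this description; the field names and result keys are hypothetical, and archive numbers are omitted because their storage is not specified.

```python
def query_voucher(vouchers, voucher_id):
    """Answer an archive query from the maintained associated fields alone."""
    row = vouchers[voucher_id]
    receipts = [x for x in row.get("associated_receipts", "").split(",") if x]
    invoices = [x for x in row.get("associated_invoices", "").split(",") if x]
    return {
        "archiveId": voucher_id,
        "receiptIds": receipts,
        "receiptCount": len(receipts),
        "invoiceIds": invoices,
        "invoiceCount": len(invoices),
    }
```

If the statistics are computed inside MySQL instead, `FIND_IN_SET` can test whether a given ID appears in such a comma-separated field.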

In the streaming statistics method for archive data described above, the data monitoring model monitors data changes of the archive data stored in each database and generates the corresponding data change message when a change is detected. It then maintains the associated field corresponding to the changed archive data based on the data change message and the first data category of the changed archive data and issues the maintained associated field to the target database, so that the target database can return the archive query result based on the maintained associated field when it receives an archive query request. This effectively converts the archive query process into a streaming operation.

The core of the embodiments of the present invention is to use the streaming capabilities of Kafka and Debezium to convert, according to the business scenario, a real-time archive query that previously could not be performed efficiently into a streaming operation. This avoids the inefficiency of running recursive queries directly against the original association relation table and improves data query efficiency. Specifically, the embodiments design a data monitoring model based on Kafka and Debezium, in which the Debezium unit is provided with a DebeziumConfig class and a DebeziumEngine class, and the Kafka unit is configured with a Topic and a consumer for each data category.

Building on this data monitoring model, step S102 is explained first. In practical applications, the Debezium unit monitors the archive tables (i.e., the archive data) in the MySQL database cluster and sends their data changes to the archive Topics of the Kafka unit; these changes cover database operations on the archive tables such as modification, addition and deletion. Specifically, see (1) to (6) below:

(1) Establish a connection with each database in the database cluster based on the DebeziumConfig class. The DebeziumConfig class includes at least the address information, port information and account information (i.e., account name and password) of each database, and may also include the database type. In practical applications, the DebeziumConfig class is written in code and holds the database address, account name, password, port, database type and similar information, which are used to connect to and monitor the database.
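The Debezium embedded engine is configured through a flat set of string properties, so the DebeziumConfig class essentially maps connection details onto those keys. The Python sketch below shows that mapping only; the dataclass, its field names, the table names and the file paths are hypothetical, and the property keys are those of the Debezium 2.x MySQL connector (1.x used `database.server.name` and `database.history.*` instead). In Java, the same keys would typically go into a `java.util.Properties` object passed to `DebeziumEngine.create(Json.class).using(props)`.

```python
from dataclasses import dataclass


@dataclass
class DebeziumConfig:
    """Hypothetical Python mirror of the DebeziumConfig class described above."""
    name: str
    host: str
    port: int
    user: str
    password: str
    db_type: str = "mysql"
    server_id: int = 1  # must be unique among the MySQL server's replication clients


CONNECTOR_CLASSES = {"mysql": "io.debezium.connector.mysql.MySqlConnector"}


def to_engine_properties(cfg: DebeziumConfig, tables: list) -> dict:
    """Build embedded-engine properties (Debezium 2.x key names) for one database."""
    return {
        "name": cfg.name,
        "connector.class": CONNECTOR_CLASSES[cfg.db_type],
        "database.hostname": cfg.host,
        "database.port": str(cfg.port),
        "database.user": cfg.user,
        "database.password": cfg.password,
        "database.server.id": str(cfg.server_id),
        "topic.prefix": cfg.name,
        "table.include.list": ",".join(tables),
        "offset.storage": "org.apache.kafka.connect.storage.FileOffsetBackingStore",
        "offset.storage.file.filename": f"/tmp/{cfg.name}-offsets.dat",
        "schema.history.internal": "io.debezium.storage.file.history.FileSchemaHistory",
        "schema.history.internal.file.filename": f"/tmp/{cfg.name}-schema-history.dat",
    }
```

One such property set would be built per database in the cluster, each with its own `name` and `server_id`.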

(2) After the connection with each database is established successfully, load the DebeziumEngine class into a global environment variable and monitor data changes of the archive data stored in each database in an asynchronous thread. The DebeziumEngine class declares the data receiving format and the data processing functions. The data receiving format may be JSON. The data processing functions include a time judging function, a data category judging function and an operation category judging function: the time judging function determines the change time of the archive data, the data category judging function determines the data category of the changed archive data, and the operation category judging function determines the category of the operation performed on the changed archive data.

In one embodiment, the DebeziumEngine class is written and loaded into a global environment variable. It runs in an asynchronous thread that spans the whole life cycle of the connection to the MySQL database cluster, and declares the format in which data change information is received; for example, the data receiving format is set to JSON so that data change information is received in JSON format.

(3) If data change information sent by a database in the data receiving format is received, determine that database as the target database and determine that a data change has been detected in it. The data change information carries the first archive identifier of the changed archive data. In one embodiment, each thread listens to one database; when a thread receives JSON data change information sent by its database, the database corresponding to that thread is determined as the target database in which the data change occurred.
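Steps (2) and (3) amount to one long-lived listener per database that decodes JSON records and hands them to a handler, with the listener's database taken as the target database. The Python sketch below simulates this with a thread and a `queue.Queue` standing in for the Debezium engine's record stream; the function names, the sentinel convention and the database name are hypothetical. (In Java, `DebeziumEngine` implements `Runnable` and is commonly submitted to an `ExecutorService`.)

```python
import json
import queue
import threading


def start_listener(db_name, source, handler):
    """Run one daemon thread per monitored database (stand-in for an embedded engine).

    Each JSON record received marks `db_name` as the target database and is passed
    to `handler`. A None item on `source` stops the thread.
    """
    def run():
        while True:
            raw = source.get()
            if raw is None:  # shutdown sentinel
                break
            event = json.loads(raw)  # declared receive format: JSON
            handler(db_name, event)

    thread = threading.Thread(target=run, name=f"cdc-{db_name}", daemon=True)
    thread.start()
    return thread


received = []
feed = queue.Queue()
t = start_listener("archive_db_1", feed, lambda db, ev: received.append((db, ev["payload"]["op"])))
feed.put(json.dumps({"payload": {"op": "c", "after": {"id": "V1"}}}))
feed.put(None)
t.join(timeout=5)
print(received)  # [('archive_db_1', 'c')]
```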

(4) Determine the change time corresponding to the data change information through the time judging function; determine the first data category of the changed archive data involved in the data change information through the data category judging function; and determine the operation category corresponding to the data change information through the operation category judging function. In practical applications, a method for obtaining the data change time (i.e., the time judging function) is written in the DebeziumEngine class and used to determine the change time of the data change information. Because the data change message must be sent to the target Topic for its category, the first data category of the changed archive data is determined through the data category judging function. Because the embodiments of the present invention only process data changes produced by add, modification and delete operations, the operation category judging function is used to determine whether the operation applied to the changed archive data is one of these three; if it is, the corresponding data change message is generated.
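The three judging functions map naturally onto fields of Debezium's change-event envelope: `source.ts_ms` (when the change was made in the database), `source.table`, and `op` (`c`, `u`, `d`, plus `r` for snapshot reads and `t` for truncates). A Python sketch over a decoded JSON payload, assuming a hypothetical table-to-category mapping and an `id` column holding the archive ID:

```python
from datetime import datetime, timezone

# Hypothetical mapping from archive table name to data category.
TABLE_CATEGORY = {"voucher": "voucher", "receipt": "receipt", "invoice": "invoice"}

# Debezium op codes: c = create, u = update, d = delete, r = snapshot read,
# t = truncate. Only the three operation categories the method handles are mapped.
HANDLED_OPS = {"c": "add", "u": "modify", "d": "delete"}


def change_time(payload):
    """Time judging function: source.ts_ms is when the change was made in the database."""
    return datetime.fromtimestamp(payload["source"]["ts_ms"] / 1000, tz=timezone.utc)


def data_category(payload):
    """Data category judging function, keyed on the table the change came from."""
    return TABLE_CATEGORY.get(payload["source"]["table"])


def operation_category(payload):
    """Operation category judging function; returns None for ops that are ignored."""
    return HANDLED_OPS.get(payload["op"])


def first_archive_id(payload):
    """Deletes carry the old row in 'before'; adds and updates carry it in 'after'."""
    row = payload["before"] if payload["op"] == "d" else payload["after"]
    return row["id"]  # 'id' as the archive ID column is an assumption
```

A message is generated only when `operation_category` returns a non-None value, which implements the filter to add, modification and delete operations.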

(5) Generate the data change message corresponding to the target database based on the change time, the first data category and the operation category corresponding to the data change information.

(6) Send the data change message, through the Debezium unit, to the target Topic corresponding to the first data category. The archive Topics include a voucher Topic, a receipt Topic and an invoice Topic. In one embodiment, the first archive identifier, change time, first data category and operation category may be sent to the Kafka unit as a message. For example, if the changed archive data is voucher data, the data change message is sent to the voucher Topic; if it is receipt data, the message is sent to the receipt Topic; and if it is invoice data, the message is sent to the invoice Topic.
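Steps (5) and (6) can be sketched as assembling the message and routing it by category. In the Python sketch below an in-memory dictionary stands in for the Kafka unit, and the Topic names and message keys are hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical Topic names: one archive Topic per data category.
TOPIC_BY_CATEGORY = {
    "voucher": "archive.voucher",
    "receipt": "archive.receipt",
    "invoice": "archive.invoice",
}

# In-memory stand-in for the Kafka unit: topic name -> list of serialized messages.
topics = defaultdict(list)


def publish_change(archive_id, change_time_ms, category, operation):
    """Assemble the data change message and append it to the Topic for its category."""
    message = {
        "archiveId": archive_id,
        "changeTime": change_time_ms,
        "category": category,
        "operation": operation,
    }
    topic = TOPIC_BY_CATEGORY[category]  # KeyError for an unknown category
    topics[topic].append(json.dumps(message))
    return topic
```

With a real Kafka producer, keying each record by the archive ID would keep all changes to the same archive in one partition and therefore in order, which the consumers rely on when they add and delete identifiers.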

Further, the statistics logic of step S104 is also a core idea of the embodiments of the present invention. When data is processed, the voucher-associated data statistics logic performs statistics on the data by category as needed and attributes the data to vouchers quickly and accurately according to the business scenario, with clearly defined de-duplication and deletion logic.

To facilitate understanding, an embodiment of the present invention provides an implementation of step S104 in which the associated field corresponding to the changed archive data is maintained by the target consumer corresponding to the first data category according to the data change information and its operation category. In one embodiment, the message may be broadcast using the Kafka subscription model, so that all consumers listening to the Topic receive the corresponding archive change message. Meanwhile, several types of consumers are configured to process changes to different data: voucherConsumer (voucher change consumer), receiptConsumer (receipt change consumer) and invoiceConsumer (invoice change consumer) are written separately, where voucherConsumer monitors and computes changes to voucher data, receiptConsumer monitors and computes changes to receipt data, and invoiceConsumer monitors and computes changes to invoice data. Optionally, after a message is received, the category of the data in the message may be checked so that the corresponding consumer performs the monitoring computation.
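The subscription and dispatch arrangement above can be simulated with a toy broker in which every subscriber of a Topic receives every message. In real Kafka this broadcast behavior holds across consumer groups (consumers sharing one group instead split the partitions), so each consumer that must see every message needs its own group. The broker class and handler names below are hypothetical; the three handlers only mirror voucherConsumer, receiptConsumer and invoiceConsumer.

```python
from collections import defaultdict


class InMemoryBroker:
    """Toy stand-in for Kafka's subscription model: every subscriber of a topic
    (think: one consumer group each) receives every message published to it."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self._subscribers[topic]:
            callback(message)


handled = []


def voucher_consumer(msg):  # mirrors voucherConsumer
    handled.append(("voucher", msg["archiveId"]))


def receipt_consumer(msg):  # mirrors receiptConsumer
    handled.append(("receipt", msg["archiveId"]))


def invoice_consumer(msg):  # mirrors invoiceConsumer
    handled.append(("invoice", msg["archiveId"]))


broker = InMemoryBroker()
broker.subscribe("archive.voucher", voucher_consumer)
broker.subscribe("archive.receipt", receipt_consumer)
broker.subscribe("archive.invoice", invoice_consumer)
```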

The embodiments of the present invention provide implementations for maintaining the associated field corresponding to the changed archive data according to the data category, described as Mode one and Mode two below:

Mode one: the first data category is a parent data category, the target consumer is a parent consumer, and the associated field includes at least one associated sub-level field. Illustratively, the parent data category is the voucher category, the parent consumer is the voucher change consumer, and the associated sub-level fields are an associated receipt field and an associated invoice field.

On this basis, the step of maintaining the associated field corresponding to the changed archive data may be performed as in steps a1 and a2 below:

Step a1: if the operation category corresponding to the data change information is an add operation or a modification operation, search downwards, through the parent consumer, for the second archive identifier and second data category of each item of sub-level associated archive data that has a direct or indirect association relationship with the first archive identifier of the changed archive data, where the second data category is a sub-level data category. The sub-level associated archive data is archive data of the receipt category (receipt data for short) and/or of the invoice category (invoice data for short). For example, if the changed archive data is voucher data, the voucher change consumer searches downwards for the receipt data associated with the voucher data, continues downwards to the invoice data associated with that receipt data, and determines the archive identifier of each item of receipt data and invoice data.

Step a2: for each sub-level associated archive data, the first target associated sub-level field to be added or modified is determined according to the sub-level data category to which the data belongs. That field is then added or modified based on the second archive identifier of the sub-level associated archive data. For receipt data, the first target associated sub-level field is the associated receipt field of the credential data; for invoice data, it is the associated invoice field of the credential data. For example, if the sub-level associated archive data is receipt data, the archive ID of the receipt data is added to, or modified in, the associated receipt field. Similarly, if it is invoice data, the archive ID of the invoice data is added to, or modified in, the associated invoice field.

In practical application, the operation type in the data change message is determined first. If the message concerns added or modified credential data, the association relationships are queried recursively starting from the credential ID. This finds, in turn, all directly or indirectly associated archive data, and the data category of each archive data is then determined. For receipt data, the archive ID (receipt ID) is recorded, and all associated receipt IDs are written, comma-separated, into the associated receipt field of the credential data. Similarly, for invoice data, the archive ID (invoice ID) is recorded, and all associated invoice IDs are written, comma-separated, into the associated invoice field of the credential data.

If the operation type is delete, no processing is needed: the credential data has been deleted and will no longer be queried, so the message is ignored.
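Mode one can be sketched as follows: step a1, step a2, and the ignored delete case. The sketch makes several assumptions, and all of the names are hypothetical:

* an in-memory association graph (CHILDREN) and a category lookup (CATEGORY) stand in for the database queries;
* "voucher" denotes credential data;
* associated_receipts and associated_invoices are the field names.

The sketch uses an explicit stack, which visits the same archives as the recursive query described above. A visited set stops cyclic associations from looping forever.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical association graph: archive ID -> directly associated child IDs.
CHILDREN: Dict[str, List[str]] = {
    "V-1": ["R-1", "R-2"],
    "R-1": ["I-1"],
    "R-2": ["I-2", "I-3"],
}
# Hypothetical category lookup ("voucher" = credential data).
CATEGORY: Dict[str, str] = {
    "V-1": "voucher", "R-1": "receipt", "R-2": "receipt",
    "I-1": "invoice", "I-2": "invoice", "I-3": "invoice",
}
# Step a2 mapping: sub-level category -> associated field of the credential.
FIELD_FOR = {"receipt": "associated_receipts", "invoice": "associated_invoices"}


def find_descendants(root_id: str) -> List[Tuple[str, str]]:
    """Step a1: collect (archive ID, category) of all direct/indirect children."""
    found: List[Tuple[str, str]] = []
    seen: Set[str] = {root_id}
    stack = list(CHILDREN.get(root_id, []))
    while stack:
        child = stack.pop()
        if child in seen:  # guard against cycles in the association data
            continue
        seen.add(child)
        found.append((child, CATEGORY[child]))
        stack.extend(CHILDREN.get(child, []))
    return found


def maintain_voucher(voucher_id: str, operation: str, record: Dict[str, str]) -> None:
    """Mode one: rebuild the credential's associated fields on add/modify."""
    if operation == "delete":
        return  # deleted credentials are never queried, so the message is ignored
    ids: Dict[str, List[str]] = {name: [] for name in FIELD_FOR.values()}
    for archive_id, category in find_descendants(voucher_id):
        name = FIELD_FOR.get(category)
        if name is not None:
            ids[name].append(archive_id)
    for name, values in ids.items():
        record[name] = ",".join(sorted(values))  # comma-separated, as in the text
```

For V-1, the record ends up with associated_receipts "R-1,R-2" and associated_invoices "I-1,I-2,I-3".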

Mode two: the first data category is a sub-level data category, the target consumer is a sub-level consumer, and the association field is the second target associated sub-level field corresponding to that sub-level data category. Illustratively, the sub-level data category is the receipt category or the invoice category, and the sub-level consumer is the receipt change consumer or the invoice change consumer. If the sub-level data category is the receipt category, the association field is the associated receipt field; if it is the invoice category, the association field is the associated invoice field.

On this basis, when the step of maintaining the associated field corresponding to the changed archive data is performed, the following steps b1 to b4 may be referred to:

Step b1: the sub-level consumer searches upward for the parent-level associated archive data that is directly or indirectly associated with the first archive identifier of the changed archive data. The parent-level associated archive data is credential data. In one embodiment, a reverse search is performed whether the operation type is add, modify or delete. The associated archives are traversed upward, level by level and recursively, until all upper-level archive data has been found. If the changed archive data is receipt data, the receipt change consumer searches upward for the credential data associated with that receipt data. If it is invoice data, the invoice change consumer searches upward for the receipt data associated with the invoice data. It then continues upward to the credential data associated with that receipt data.

Step b2: the second target associated sub-level field is determined, according to the sub-level data category, from the at least one level of associated sub-level field corresponding to the parent-level associated archive data. The credential data has an associated receipt field and an associated invoice field. If the changed archive data is receipt data, the associated receipt field of the credential data is taken as the second target associated sub-level field. If it is invoice data, the associated invoice field of the credential data is taken as the second target associated sub-level field.

Step b3: this step applies if the operation type corresponding to the data change information is an add operation or a modify operation, and the first archive identifier of the changed archive data is not yet stored in the second target associated sub-level field. In that case, the second target associated sub-level field is added or modified based on the first archive identifier. Taking receipt data as an example, for an add or modify operation, the receipt ID being added or modified is checked against the associated receipt field. If it already exists there, the message is ignored; otherwise the receipt ID is added to the associated receipt field. Similarly, for invoice data, if the invoice ID being added or modified already exists in the associated invoice field, the message is ignored. Otherwise the invoice ID is added to the associated invoice field.

Step b4: if the operation type corresponding to the data change information is a delete operation and the first archive identifier of the changed archive data is stored in the second target associated sub-level field, the first archive identifier is deleted from that field. Taking receipt data as an example, for a delete operation, if the deleted receipt ID is stored in the associated receipt field, it is removed from the field; otherwise the message is ignored. Similarly, for invoice data, if the deleted invoice ID is stored in the associated invoice field, it is removed from the field; otherwise the message is ignored.
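Steps b1 to b4 can be sketched as an upward search followed by an idempotent add or remove of the ID in a comma-separated field. The sketch makes these assumptions, and the names PARENTS, CATEGORY, FIELD_FOR and maintain_child_change are all hypothetical:

* the association records are still readable when a delete message is processed;
* in-memory dictionaries stand in for the reverse association query;
* "voucher" denotes credential data.

```python
from typing import Dict, List, Set

# Hypothetical reverse associations: archive ID -> directly associated parent IDs.
PARENTS: Dict[str, List[str]] = {
    "R-1": ["V-1"],
    "I-1": ["R-1"],
}
CATEGORY = {"V-1": "voucher", "R-1": "receipt", "I-1": "invoice"}
FIELD_FOR = {"receipt": "associated_receipts", "invoice": "associated_invoices"}


def find_vouchers(archive_id: str) -> List[str]:
    """Step b1: walk upward until reaching credential (parent-level) archives."""
    vouchers: List[str] = []
    seen: Set[str] = {archive_id}
    stack = list(PARENTS.get(archive_id, []))
    while stack:
        parent = stack.pop()
        if parent in seen:
            continue
        seen.add(parent)
        if CATEGORY[parent] == "voucher":
            vouchers.append(parent)
        else:
            stack.extend(PARENTS.get(parent, []))  # keep climbing (invoice -> receipt)
    return vouchers


def maintain_child_change(archive_id: str, category: str, operation: str,
                          records: Dict[str, Dict[str, str]]) -> None:
    """Steps b2 to b4: add or remove the child ID in each credential's field."""
    field_name = FIELD_FOR[category]  # step b2
    for voucher_id in find_vouchers(archive_id):
        record = records.setdefault(voucher_id, {})
        current = [x for x in record.get(field_name, "").split(",") if x]
        if operation in ("add", "modify") and archive_id not in current:
            current.append(archive_id)  # step b3: only if not already stored
        elif operation == "delete" and archive_id in current:
            current.remove(archive_id)  # step b4: only if stored
        record[field_name] = ",".join(current)
```

Because of the membership checks, replaying the same add or delete message leaves the field unchanged. This helps when a message is delivered more than once.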

After the query mode is changed, historical archive data created earlier has no receipt or invoice statistics recorded beneath it, so it cannot be queried directly. Therefore, once the streaming monitoring mechanism is in place, the statistics are initialized once, in full, in the archive data dimension. This processes the existing historical archive data, and new changes are then handled by the streaming operation. In one implementation, the Debezium unit is further provided with a synchronization class. Using it, the embodiment of the invention initializes, for all credentials, the directly or indirectly associated receipt and invoice data, as described in (one) to (three) below:

(one) Each parent-level historical archive data stored in each database is queried through the synchronization class. In one embodiment, all credential data is queried through the synchronization class.

(two) For each parent-level historical archive data, each child-level historical archive data directly or indirectly associated with it is queried. In one embodiment, the associated archive data of each credential is queried recursively, one credential at a time, until all of its subordinate associated archive data (including receipt data and invoice data) has been found.

(three) Each level of associated sub-level field corresponding to the parent-level historical archive data is generated according to the sub-level data category of each child-level historical archive data and its archive identifier. In one embodiment, the receipt IDs of the queried receipt data are filled into the associated receipt field of the corresponding credential data. The invoice IDs of the queried invoice data are filled into its associated invoice field.

Steps (one) to (three) fill the associated receipt field and the associated invoice field of the historical data. When association data changes, the consumers written in the second step (step S204) perceive the change quickly, query it and adjust the fields in time. For historical data, however, that step-by-step processing is too slow and time-consuming, so a full initialization of the directly or indirectly associated receipt and invoice data for all credentials is written instead. In a specific implementation, a synchronization class queries all credential data. It then recursively queries the associated data of each credential, one by one, until all lower-level associated data is found. The required directly or indirectly associated receipt and invoice data is recorded in the credential data by category, which completes the initialization of the credential data.
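The full initialization in (one) to (three) is essentially the Mode one traversal run once for every historical credential. The following is a minimal sketch under these assumptions:

* the synchronization class has already loaded the archive categories and the parent-to-child associations into dictionaries;
* "voucher" denotes credential data;
* the function name initialize_all and the field names are hypothetical.

```python
from typing import Dict, List


def initialize_all(category_of: Dict[str, str],
                   children: Dict[str, List[str]]) -> Dict[str, Dict[str, str]]:
    """Steps (one) to (three): fill associated fields for all historical credentials."""
    field_for = {"receipt": "associated_receipts", "invoice": "associated_invoices"}
    result: Dict[str, Dict[str, str]] = {}
    # (one) every parent-level (credential) archive
    vouchers = [aid for aid, cat in category_of.items() if cat == "voucher"]
    for voucher_id in vouchers:
        collected: Dict[str, List[str]] = {f: [] for f in field_for.values()}
        # (two) discover all directly or indirectly associated children
        seen = {voucher_id}
        stack = list(children.get(voucher_id, []))
        while stack:
            aid = stack.pop()
            if aid in seen:
                continue
            seen.add(aid)
            field_name = field_for.get(category_of.get(aid, ""))
            if field_name:
                collected[field_name].append(aid)  # (three) record by category
            stack.extend(children.get(aid, []))
        result[voucher_id] = {f: ",".join(sorted(v)) for f, v in collected.items()}
    return result
```

A credential without associated documents still receives both fields, as empty strings. Query-time counting can then treat every credential the same way.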

On the basis of the foregoing embodiments, the credential data has already been processed in advance by the time the database receives an archive query request. The statistics of data directly or indirectly associated with a credential can therefore be queried directly from the results of the real-time streaming operation. Using the values maintained by the foregoing embodiments, the associated receipt data and invoice data can be queried directly in the credential dimension, which greatly improves query efficiency.
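At query time, the statistics reduce to counting the IDs already stored in the credential's associated fields, with no recursive query. The following is a minimal sketch, assuming the comma-separated field layout described in step a2. The field names and function names are hypothetical.

```python
from typing import Dict


def count_ids(field_value: str) -> int:
    """Count the archive IDs stored in one comma-separated associated field."""
    return len([part for part in field_value.split(",") if part.strip()])


def credential_statistics(record: Dict[str, str]) -> Dict[str, int]:
    """Return receipt and invoice counts for one credential record."""
    return {
        "receipt_count": count_ids(record.get("associated_receipts", "")),
        "invoice_count": count_ids(record.get("associated_invoices", "")),
    }


stats = credential_statistics({"associated_receipts": "R-1,R-2",
                               "associated_invoices": "I-1"})
```

The fields store IDs rather than bare counts. This keeps the add and delete maintenance in steps b3 and b4 idempotent, because a repeated message can be detected by checking whether the ID is already present.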

Previously, the number of directly or indirectly associated receipts and invoices could not be obtained by a direct query, which made such queries inefficient and slow. To solve this, the embodiments of the invention use Kafka and Debezium to build a real-time streaming operation that sends message notifications on data changes. They also establish a consumption mechanism for receipt and invoice statistics. By writing the real-time statistics into the corresponding credential data, the receipts and invoices associated with each credential can be counted in real time more effectively and accurately.

For the convenience of understanding the foregoing embodiments, the embodiment of the present invention further provides an application example of a streaming statistics method for archive data, referring to a flowchart of another streaming statistics method for archive data shown in fig. 2, the method mainly includes the following steps S202 to S208:

Step S202: design a data monitoring model based on Kafka and Debezium that sends all archive data changes to the archive Topic of Kafka. For details, refer to (1) to (6) above; they are not repeated in the embodiments of the present invention.

Step S204: write consumers for the archive table topic and broadcast messages using the Kafka consumption mechanism, so that every consumer listening to the topic receives the corresponding archive change message. A credential change consumer, a receipt change consumer and an invoice change consumer are written. They monitor and calculate credential data changes, receipt data changes and invoice data changes respectively. For details, refer to steps 1 to 2 above; they are not repeated in the embodiments of the present invention.

Step S206: write the initialization of the directly or indirectly associated receipt and invoice data for all credentials. For details, refer to (one) to (three) above; they are not repeated in the embodiments of the present invention.

Step S208: query the associated receipt and invoice statistics directly in the credential dimension.

The core of the embodiment of the invention is to use the streaming operation capability of Kafka and Debezium to write credential-associated data statistics logic for the business scenario. This turns what real-time queries previously could not achieve into a streaming operation. Based on the established statistics logic, the required data can be queried quickly and accurately. The approach is not limited to receipts and invoices; it applies equally to other documents associated with credentials.

The archive data statistics method of the embodiment of the invention counts the receipts or invoices associated with each credential in an electronic accounting archive management workflow. Similar products do not offer such statistics, which makes this capability a core competitive feature of the embodiment. It plays a key role in later business development and is highly attractive to users purchasing the product.

In summary, the streaming statistics method for archive data provided by the embodiment of the invention can improve the query efficiency, ensure the accuracy and the integrity of the query result, and avoid the possible problems of other query optimization schemes. The embodiment of the invention has at least the following characteristics:

(1) Improved query efficiency: the data is associated across many levels and in large volumes, so recursive layer-by-layer queries in MySQL are slow. As a result, the statistics of data directly or indirectly associated with a credential cannot be retrieved quickly on the credential list page. The streaming capabilities of Kafka and Debezium, together with custom statistics logic, allow the required data to be queried efficiently, further improving query efficiency;

(2) Improved user experience: with the implementation scheme of the invention, users can search dynamic fields in the business organization more quickly and accurately, which improves their experience;

(3) Data consistency and accuracy are guaranteed: the invention realizes a data synchronization mechanism, ensures the synchronous update of the data in the certificate, and ensures the consistency and accuracy of the data;

(4) Scalability: the implementation scheme of the invention subscribes to and consumes messages through Kafka and therefore extends well. Statistics on other categories of data associated with credentials may be required later. In that case, the data can be counted accurately simply by writing corresponding consumers that perform the judgment, without large-scale code changes.

For the streaming statistics method of archive data provided in the foregoing embodiment, the embodiment of the present invention provides a streaming statistics device of archive data, where the device is applied to a server, and the server is configured with a data monitoring model, and referring to a schematic structural diagram of the streaming statistics device of archive data shown in fig. 3, the device mainly includes the following parts:

The monitoring module 302 is configured to monitor, through the data monitoring model, data changes of the archive data stored in each database in the database cluster. When a data change in a target database is detected, it generates a data change message corresponding to that target database. The data change message at least comprises the first archive identifier of the changed archive data;

The maintenance module 304 is configured to maintain, through the data monitoring model, the association field corresponding to the changed archive data based on the data change message and the first data category of the changed archive data. The association field records the second archive identifier of each associated archive data that is directly or indirectly associated with the changed archive data;

The issuing module 306 is configured to issue the maintained association field to the target database through the data monitoring model. When the target database receives an archive query request, it queries and counts the archive identifier to be queried carried by the request, based on the maintained association field, to obtain an archive query result.

In the streaming statistics device for archive data described above, the data monitoring model monitors data changes of the archive data stored in each database and generates a corresponding data change message when a change is detected. It then maintains the association field corresponding to the changed archive data based on the data change message and the first data category of the changed archive data. The maintained association field is issued to the target database. When the target database receives an archive query request, it can return the archive query result based on the maintained association field, which is equivalent to converting the archive query process into a streaming operation.

In one embodiment, the data monitoring model includes a Debezium unit provided with a DebeziumConfig class and a DebeziumEngine class;

The listening module 302 is further configured to:

Establishing a connection with each database in the database cluster based on the DebeziumConfig class, where the DebeziumConfig class at least comprises the address information, port information and account information of each database;

After the connections to all databases are established successfully, loading the DebeziumEngine class into global environment variables and using an asynchronous thread to monitor data changes of the archive data stored in each database. The DebeziumEngine class is used to declare a data receiving format and a data processing function.
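DebeziumEngine is a Java API, so the following Python code does not reproduce it; it only sketches the structure described above under stated assumptions. DatabaseConfig is a hypothetical stand-in for the DebeziumConfig class, holding the address, port and account of one database. start_listener is a hypothetical helper that shows the asynchronous-thread pattern. Change events arrive on a queue, which here replaces the engine's own event delivery. A handler (the data processing function) is invoked for each event on a background thread.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class DatabaseConfig:
    """Hypothetical stand-in for DebeziumConfig: connection info of one database."""
    host: str
    port: int
    user: str
    password: str


def start_listener(config: DatabaseConfig,
                   events: queue.Queue,
                   handler: Callable[[dict], None]) -> threading.Thread:
    """Run the data processing function for each change event on a background thread."""

    def run() -> None:
        while True:
            event: Optional[dict] = events.get()
            if event is None:  # sentinel value: stop listening
                break
            handler(event)

    thread = threading.Thread(target=run, name=f"cdc-{config.host}:{config.port}",
                              daemon=True)
    thread.start()
    return thread


received = []
event_queue: queue.Queue = queue.Queue()
listener = start_listener(DatabaseConfig("db1.example", 3306, "archive", "secret"),
                          event_queue, received.append)
event_queue.put({"op": "c", "source": {"table": "receipt"}})
event_queue.put(None)
listener.join(timeout=2)
```

Running one listener thread per configured database keeps the main thread free. This corresponds to monitoring "by an asynchronous thread" as described above.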

The listening module 302 is further configured to:

If data change information sent by a database in the data receiving format is received, determining that database as the target database and determining that a data change in the target database has been detected. The data change information carries the first archive identifier of the changed archive data;

Determining the change time corresponding to the data change information through a time judging function; determining, through a data category judging function, the first data category of the changed archive data involved in the data change information; determining, through an operation type judging function, the operation type corresponding to the data change information; and generating the data change message corresponding to the target database based on the change time, the first data category and the operation type.
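The three judging functions map naturally onto the fields of a Debezium change event payload:

* before and after row images;
* a source block that includes the table name and the time the change was made in the database (ts_ms);
* an op code: c for create, u for update, d for delete, r for snapshot read.

The sketch below makes these assumptions:

* each archive category lives in its own table;
* the primary key column is named id;
* snapshot reads are skipped, because historical data is initialized separately by the synchronization class.

The table names and the DataChangeMessage shape are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

# Hypothetical mapping used by the data category judging function.
TABLE_CATEGORY = {"voucher": "voucher", "receipt": "receipt", "invoice": "invoice"}
# Operation type judging function: Debezium op codes -> operation types.
# "r" (snapshot read) is deliberately absent: history is initialized separately.
OP_TYPE = {"c": "add", "u": "modify", "d": "delete"}


@dataclass
class DataChangeMessage:
    archive_id: str      # first archive identifier
    change_time_ms: int  # change time from the time judging function
    category: str        # first data category
    operation: str       # operation type


def to_change_message(payload: Dict[str, Any]) -> Optional[DataChangeMessage]:
    """Build a data change message from a Debezium-style change event payload."""
    source = payload.get("source") or {}
    operation = OP_TYPE.get(payload.get("op", ""))
    category = TABLE_CATEGORY.get(source.get("table", ""))
    if operation is None or category is None:
        return None  # not an archive table, or an operation that is not handled
    # Delete events carry only a "before" row image; others carry "after".
    row = payload["before"] if operation == "delete" else payload["after"]
    return DataChangeMessage(
        archive_id=str(row["id"]),            # assumes the key column is "id"
        change_time_ms=int(source["ts_ms"]),  # when the change happened in the DB
        category=category,
        operation=operation,
    )
```

Returning None for unrelated tables lets the caller drop those events before anything is sent to Kafka.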

In one embodiment, the data monitoring model further comprises a Kafka unit configured with a Topic and a consumer corresponding to each data category;

The device further comprises a message sending module configured for:

Transmitting the data change message to a target Topic corresponding to the first data category through a Debezium unit;

The maintenance module 304 is further configured to:

Maintaining, through the target consumer corresponding to the first data category, the association field corresponding to the changed archive data according to the data change information and its operation type.
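Here each data category has its own Topic, unlike the broadcast arrangement described for step S104. Routing therefore happens on the sending side, so only the target consumer receives a message and no category filtering is needed on receipt. The following is a minimal in-memory sketch. The topic names and the InMemoryBus class are hypothetical stand-ins for Kafka topics and clients.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical topic names, one per data category ("voucher" = credential data).
TOPIC_FOR = {"voucher": "archive.voucher", "receipt": "archive.receipt",
             "invoice": "archive.invoice"}


class InMemoryBus:
    """Tiny stand-in for Kafka topics: each topic has its own subscribers."""

    def __init__(self) -> None:
        self.subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self.subscribers[topic].append(handler)

    def publish(self, message: dict) -> None:
        # Route by the first data category, so only the target consumer sees it.
        topic = TOPIC_FOR[message["category"]]
        for handler in self.subscribers[topic]:
            handler(message)
```

Each category's consumer subscribes to one topic. The trade-off against broadcasting is that the sender must know the category-to-topic mapping.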

The maintenance module 304 is further configured to:

If the operation type corresponding to the data change information is an add operation or a modify operation, searching downward, through the parent-level consumer, for the sub-level associated archive data directly or indirectly associated with the first archive identifier of the changed archive data. The second archive identifier and the second data category of each such archive data are obtained, where the second data category is a sub-level data category;

For each sub-level associated archive data, determining the first target associated sub-level field to be added or modified according to the sub-level data category to which it belongs, and adding or modifying that field based on the second archive identifier of the sub-level associated archive data.

The maintenance module 304 is further configured to:

Searching parent-level associated archive data with direct or indirect association relation with the first archive identification of the changed archive data upwards through the child-level consumer;

Determining a second target associated sub-level field from at least one associated sub-level field corresponding to the parent level associated archive data according to the sub-level data category;

If the operation type corresponding to the data change information is an add operation or a modify operation and the first archive identifier of the changed archive data is not stored in the second target associated sub-level field, adding or modifying the second target associated sub-level field based on the first archive identifier;

or, if the operation type corresponding to the data change information is a delete operation and the first archive identifier of the changed archive data is stored in the second target associated sub-level field, deleting the first archive identifier from the second target associated sub-level field.

In one embodiment, the Debezium unit is further provided with a synchronization class, and the device further comprises a historical archive data processing module configured for:

querying each parent level history archive data stored in each database through a synchronization class;

generating each level of associated sub-level field corresponding to the parent level history archive data according to the sub-level data category to which each sub-level history archive data belongs and the archive identification of each sub-level history archive data.

The device provided by the embodiment of the present invention has the same implementation principle and technical effects as those of the foregoing method embodiment, and for the sake of brevity, reference may be made to the corresponding content in the foregoing method embodiment where the device embodiment is not mentioned.

The embodiment of the invention provides a server, which specifically comprises a processor and a storage device; the storage means has stored thereon a computer program which, when executed by the processor, performs the method of any of the embodiments described above.

Fig. 4 is a schematic structural diagram of a server according to an embodiment of the present invention, where the server 100 includes: a processor 40, a memory 41, a bus 42 and a communication interface 43, the processor 40, the communication interface 43 and the memory 41 being connected by the bus 42; the processor 40 is arranged to execute executable modules, such as computer programs, stored in the memory 41.

The memory 41 may include a high-speed random access memory (Random Access Memory, RAM), and may further include a non-volatile memory, such as at least one disk memory. This system network element communicates with at least one other network element through at least one communication interface 43, which may be wired or wireless. The connection may use the Internet, a wide area network, a local area network, a metropolitan area network, and the like.

The bus 42 may be an ISA bus, a PCI bus, an EISA bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one bidirectional arrow is shown in Fig. 4, but this does not mean that there is only one bus or one type of bus.

The memory 41 is configured to store a program, and the processor 40 executes the program after receiving an execution instruction. The method performed by the apparatus disclosed in any of the foregoing embodiments of the present invention may be applied to, or implemented by, the processor 40.

The processor 40 may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 40 or by instructions in the form of software. The processor 40 may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), and the like. It may also be any of the following: a digital signal processor (Digital Signal Processor, DSP); an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC); a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device; a discrete gate or transistor logic device; or a discrete hardware component. Such a processor may implement or perform the methods, steps and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium well established in the art, such as a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory 41. The processor 40 reads the information in the memory 41 and completes the steps of the above method in combination with its hardware.

The computer program product of the readable storage medium provided by the embodiment of the present invention includes a computer readable storage medium storing a program code, where the program code includes instructions for executing the method described in the foregoing method embodiment, and the specific implementation may refer to the foregoing method embodiment and will not be described herein.

If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product. This may be the solution in essence, the part that contributes to the prior art, or part of the technical solution. The software product is stored in a storage medium and comprises several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disk, or various other media capable of storing program code.

Finally, it should be noted that the above examples are only specific embodiments of the present invention. They illustrate, rather than limit, its technical solutions, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing examples, those skilled in the art should understand the following. Within the technical scope disclosed by the present invention, any person skilled in the art may still modify, or readily conceive of changes to, the technical solutions described in the foregoing embodiments. They may also substitute equivalents for some of the technical features. Such modifications, changes or substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A method for streaming statistics of archive data, the method being applied to a server configured with a data monitoring model, the method comprising:

Issuing the maintained association field to the target database through the data monitoring model, so that when the target database receives an archive query request, the archive identifier to be queried carried by the archive query request is queried and counted based on the maintained association field to obtain an archive query result;

The data monitoring model comprises a Debezium unit provided with a DebeziumConfig class and a DebeziumEngine class. Monitoring, through the data monitoring model, the data change of the archive data stored by each database in the database cluster comprises: establishing a connection with each database in the database cluster based on the DebeziumConfig class, wherein the DebeziumConfig class at least comprises the address information, port information and account information of each database; and, after the connections with all databases are established successfully, loading the DebeziumEngine class into global environment variables and monitoring, with an asynchronous thread, the data change of the archive data stored in each database; wherein the DebeziumEngine class is used for declaring a data receiving format and a data processing function;

The data processing function comprises a time judging function, a data category judging function and an operation category judging function. Generating a data change message corresponding to the target database when a data change in the target database is detected comprises: if data change information sent by a database in the data receiving format is received, determining that database as the target database and determining that a data change in the target database has been detected, wherein the data change information carries the first archive identifier of the changed archive data; determining the change time corresponding to the data change information through the time judging function; determining, through the data category judging function, the first data category of the changed archive data involved in the data change information; determining the operation category corresponding to the data change information through the operation category judging function; and generating the data change message corresponding to the target database based on the change time, the first data category and the operation category corresponding to the data change information.

2. The streaming statistics method of archive data according to claim 1, wherein the data monitoring model further comprises a Kafka unit configured with a Topic and a consumer corresponding to each data category;

3. A method of streaming statistics of archival data according to claim 2, wherein the first data category is a parent data category, the target consumer is a parent consumer, and the association field comprises at least one level of association sub-level field;

4. A method of streaming statistics of archive data according to claim 2, wherein the first data category is a sub-level data category, the target consumer is a sub-level consumer, and the association field is a second target association sub-level field corresponding to the sub-level data category;

5. A method of streaming statistics of archive data according to claim 1, wherein the Debezium unit is further provided with a synchronization class; the method further comprises the steps of:

generating each level of associated sub-level field corresponding to the parent level of history archive data according to the sub-level data category to which each sub-level of history archive data belongs and the archive identifier of each sub-level of history archive data.

6. A streaming statistics apparatus for archive data, the apparatus being applied to a server configured with a data monitoring model, the apparatus comprising:

the issuing module is configured to issue the maintained associated field to the target database through the data monitoring model, so that when the target database receives an archive query request, it queries and counts the archive identifier to be queried carried by the archive query request based on the maintained associated field, and obtains an archive query result;
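Once the associated field has been issued to the target database, a query can count sub-level archives by reading that field instead of joining the sub-level tables. This is the mechanism behind the abstract's claim of reduced service memory pressure. The following is a minimal Java 17 sketch. It assumes the field is available as a map from parent archive identifier to per-category lists of child identifiers, and the `ArchiveQuery` class is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ArchiveQuery {
    // Hypothetical maintained associated fields: parentId -> (subCategory -> child ids).
    private final Map<String, Map<String, List<String>>> associatedFields;

    ArchiveQuery(Map<String, Map<String, List<String>>> associatedFields) {
        this.associatedFields = associatedFields;
    }

    // For each archive identifier to be queried, counts its sub-level archives per
    // category from the pre-maintained field; unknown identifiers yield empty counts.
    Map<String, Map<String, Integer>> queryAndCount(List<String> archiveIds) {
        Map<String, Map<String, Integer>> result = new LinkedHashMap<>();
        for (String id : archiveIds) {
            Map<String, Integer> counts = new LinkedHashMap<>();
            associatedFields.getOrDefault(id, Map.of())
                    .forEach((category, ids) -> counts.put(category, ids.size()));
            result.put(id, counts);
        }
        return result;
    }
}
```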

The data monitoring model comprises a Debezium unit provided with a DebeziumConfig class and a DebeziumEngine class; the monitoring module is specifically configured to: establish a connection with each database in the database cluster based on the DebeziumConfig class, wherein the DebeziumConfig class comprises at least the address information, port information and account information of each database; and, after the connection with each database is successfully established, load the DebeziumEngine class into a global environment variable and monitor, using an asynchronous thread, the data change of the archive data stored in each database, wherein the DebeziumEngine class is used for declaring a data receiving format and a data processing function;
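The configuration-then-asynchronous-listening flow can be sketched in self-contained Java 17. The sketch has two parts. The first converts address, port and account information into Debezium-style engine properties. It assumes a MySQL source and an in-memory offset store, and it is not a complete connector configuration. The second runs the processing function on a single asynchronous thread. With the real library, those properties would be passed to `DebeziumEngine.create(Json.class).using(props)` and the built engine submitted to an `ExecutorService`. Here an in-memory queue stands in for the engine, and `DebeziumConfigSketch` and `AsyncListener` are hypothetical names.

```java
import java.util.Properties;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical counterpart of the DebeziumConfig class: per-database connection info.
record DebeziumConfigSketch(String name, String host, int port, String user, String password) {
    // Converts the connection info into Debezium-style properties (MySQL source assumed).
    Properties toProperties() {
        Properties p = new Properties();
        p.setProperty("name", name);
        p.setProperty("connector.class", "io.debezium.connector.mysql.MySqlConnector");
        p.setProperty("database.hostname", host);
        p.setProperty("database.port", Integer.toString(port));
        p.setProperty("database.user", user);
        p.setProperty("database.password", password);
        p.setProperty("offset.storage", "org.apache.kafka.connect.storage.MemoryOffsetBackingStore");
        return p;
    }
}

// Stand-in for the engine: one asynchronous thread drains change events (strings on an
// in-memory queue) and passes each to the data processing function, in arrival order.
final class AsyncListener implements AutoCloseable {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final BlockingQueue<String> events = new LinkedBlockingQueue<>();
    private volatile boolean running = true;

    void start(Consumer<String> processingFunction) {
        executor.execute(() -> {
            while (running) {
                try {
                    String event = events.poll(50, TimeUnit.MILLISECONDS);
                    if (event != null) {
                        processingFunction.accept(event);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });
    }

    // Simulates a database emitting a change event.
    void publish(String event) {
        events.add(event);
    }

    // Stops the loop and waits briefly for the listener thread to finish.
    @Override
    public void close() {
        running = false;
        executor.shutdown();
        try {
            executor.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Running the listener off the caller's thread matches the claim's asynchronous monitoring: connection setup returns immediately while change events are processed in the background.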

The data processing function comprises a time judging function, a data category judging function and an operation category judging function; the monitoring module is specifically configured to: if data change information sent by a database in the data receiving format is received, determine that database as the target database and determine that a data change in the target database has been monitored, wherein the data change information carries a first archive identifier of the changed archive data; determine, through the time judging function, the change time corresponding to the data change information; determine, through the data category judging function, the first data category of the changed archive data related to the data change information; determine, through the operation category judging function, the operation category corresponding to the data change information; and generate the data change message corresponding to the target database based on the change time, the first data category and the operation category corresponding to the data change information.

7. A server comprising a processor and a memory, the memory storing computer executable instructions executable by the processor, the processor executing the computer executable instructions to implement the method of any one of claims 1 to 5.

8. A computer readable storage medium storing computer executable instructions which, when invoked and executed by a processor, cause the processor to implement the method of any one of claims 1 to 5.