
CN1223951C - Self adapting history data compression method - Google Patents

Self adapting history data compression method

Info

Publication number
CN1223951C
Authority
CN
China
Prior art keywords
data
compression
value
current
slope
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN 02120383
Other languages
Chinese (zh)
Other versions
CN1459743A (en)
Inventor
王宏安
金宏
王强
戴国忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Software of CAS
Original Assignee
Institute of Software of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Software of CAS
Priority to CN 02120383
Publication of CN1459743A
Application granted
Publication of CN1223951C
Anticipated expiration
Expired - Fee Related (current legal status)


Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The present invention relates to an adaptive method for compressing historical data in a database. The method comprises the following steps. Compression time judgment: determine whether the measurement time of the current measured value lies within the specified compression time interval. If the time difference of the current value is too small, the following steps are skipped and the next measured value is examined. If the time difference is too large, the value preceding the current value is stored and becomes the starting point, and the new last stored point, for the next round of compression testing. Slope calculation: compute the slope of the current value together with the current maximum slope and current minimum slope. Compression deviation calculation: from the current maximum or minimum slope, adaptively and dynamically compute the compression deviation parameter for measurements taken at different times. Slope bound calculation: use the newly computed compression deviation parameter to compute the upper and lower bounds of the current value's slope. Compression test: if the current value passes the compression test, the value preceding it is stored and becomes the starting point and the new last stored point for the next round of compression testing; otherwise, the next data point is tested.

Description

Adaptive historical data compression method
Technical field
The present invention relates to techniques for compressing historical data in real-time databases, and in particular to an adaptive historical data compression method.
Background technology
Because a real-time database handles a huge volume of data, managing the data archive is one of its most important functions, and the key technique is storing and accessing data efficiently. The volume of historical data is so large that storing it uncompressed would require very large physical storage media. Compressing historical data is therefore one of the most important techniques in a real-time database.
In a real-time database system, historical data compression has traditionally served to reduce disk space, and different applications call for different compression methods. A real-time database system must store large amounts of historical data in limited disk space, and it must also allow that data to be accessed quickly. Once compression is used, however, the main problem is that processing queries against the compressed database becomes very slow. This constrains the compression method, which should therefore be designed around the characteristics of the process data.
While developing production information management and decision support systems for petrochemical application software demonstration projects, the applicant developed AGILOR. AGILOR is a real-time database system that integrates real-time database and temporal database technology. Within that system, the applicant carried out focused research on the compression and archive management of historical data.
The research and development of this historical data compression technique drew on the swinging door compression algorithm developed by the U.S. company OSI. In the swinging door (also called "revolving door") algorithm, the compression deviation parameter is fixed as a constant. Experimental data show the effect of that constant. With the same compression deviation parameter, the swinging door algorithm achieves a high compression ratio on some test data sets and a much lower one on others.
In general, the data from any measuring device can be split into three parts:
- The relative-change part depends on the physical process behind the measured parameter. It usually varies continuously and regularly.
- The normal-value (baseline) part is absolute and can be predicted from past historical data. It has no effect on the swinging door algorithm.
- The sensor's measurement noise, or measurement error, depends on the sensor's measurement precision. It is random and unpredictable.

Only the relative-change part and the measurement error affect the swinging door algorithm. When the sensor is precise, the measurement error is small, and the compression ratio depends mainly on the relative-change part of the data. When the sensor is imprecise, the measurement error is large and strongly affects compression. Once the measurement error is large enough, it masks the relative-change part, and describing that part of the data becomes meaningless.
Specifically, the swinging door algorithm has two main shortcomings:
1). The compression ratio it achieves depends strongly on the size of the compression deviation parameter;
2). It is very sensitive to measurement error. When the sensor's measurement error is relatively large, the compression ratio it achieves drops significantly.
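The baseline the patent compares against can be made concrete with a short sketch. It is a minimal fixed-deviation swinging door compressor. It assumes points are `(time, value)` pairs with strictly increasing times. It follows the commonly described form of the algorithm, in which the previous point is stored once the lower and upper "doors" cross. It is not OSI's actual implementation, and the function name `swinging_door` is hypothetical.

```python
import math

def swinging_door(points, e):
    """Fixed-deviation swinging door compression (illustrative baseline).

    points: list of (time, value) pairs, times strictly increasing.
    e: constant compression deviation parameter.
    Returns the stored points; the first and last points are always kept.
    """
    if len(points) <= 2:
        return list(points)
    stored = [points[0]]
    t0, v0 = points[0]            # pivot: last stored point
    lo, hi = -math.inf, math.inf  # max lower-door slope, min upper-door slope
    prev = points[0]
    for t, v in points[1:]:
        dt = t - t0
        lo = max(lo, (v - e - v0) / dt)
        hi = min(hi, (v + e - v0) / dt)
        if lo > hi:               # doors have crossed: the segment cannot be extended
            stored.append(prev)
            t0, v0 = prev         # previous point becomes the new pivot
            dt = t - t0
            lo = (v - e - v0) / dt
            hi = (v + e - v0) / dt
        prev = (t, v)
    stored.append(points[-1])
    return stored
```

Because `e` is a constant, noise larger than `e` keeps crossing the doors and forces extra stored points. This is shortcoming 2) above.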
Summary of the invention
The purpose of this invention is to provide an adaptive historical data compression method that selects the compression deviation parameter dynamically and uses data smoothing to suppress measurement noise. This reduces the effect of sensor measurement error on the compression algorithm.
The adaptive data compression method of the present invention comprises the following steps:
Compression time judgment: for the current measured value taken from the historical data buffer queue, determine whether its measurement time falls within the corresponding compression time interval;
Slope calculation: compute the slope of the current value, together with quantities such as the current maximum slope and the current minimum slope;
Compression deviation calculation: for measurements taken at different times, compute the compression deviation parameter adaptively and dynamically from the current maximum and minimum slopes;
Slope bound calculation: use the newly computed compression deviation parameter to compute the upper and lower bounds of the current value's slope;
Compression test judgment: if the current value passes the compression test, the system stores the value preceding it in the historical data buffer pool. That value becomes the starting point and the new last stored point for the next round of compression testing. Otherwise, the next data point is tested. This step determines whether the current value lies within its compression region, and hence whether the point under test must be stored.
The compression time judgment step also includes a data compression preprocessing step for data with a linear trend. It comprises an acquisition sensitivity test and a second-order difference upper bound test. Its purpose is to handle certain characteristic measurements quickly and so reduce compression processing time.
The method also includes a data smoothing step, in which a filtering smoothing technique is applied to the measurement data. Its purpose is to weaken the effect of measurement noise on the data and so raise the compression ratio.
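The absolute value of the current second-order difference is:

$$\left| \frac{\tau_{i-1}\, y_i - (\tau_{i-1} + \tau_i)\, y_{i-1} + \tau_i\, y_{i-2}}{\tau_{i-1}^{2}\, \tau_i} \right|$$

where $y_i$ is the current measured value and $\tau_i$ is the difference between the measurement time of the current measured value and that of the previous measured value.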
One smoothing formula that the filtering smoothing technique can use is:

$$x_i = a\,x_{i-1} + b\,y_i + (1 - a - b)\,y_{i-1}$$

where $x_i$ is the smoothed result at the current time, $x_{i-1}$ is the smoothed value at the previous time, $y_i$ is the measurement at the current time, and $y_{i-1}$ is the measurement at the previous time. $a$ and $b$ are weighting factors greater than zero that satisfy $a + b < 1$.
Data compression preprocessing: preprocess data with a linear trend, comprising the acquisition sensitivity test and the second-order difference upper bound test. This skips the further compression steps and shortens the compression time;
Data smoothing: apply filtering smoothing or a related technique to the measurement data. When the sensor's measurement precision is high, smoothing can be omitted to save compression time;
Slope calculation: compute the slope of the current value, together with quantities such as the current maximum slope and the current minimum slope;
Compression deviation calculation: for measurements taken at different times, compute the compression deviation parameter adaptively and dynamically from the current maximum and minimum slopes;
Slope bound calculation: use the newly computed compression deviation parameter to compute the upper and lower bounds of the current value's slope;
Compression test judgment: if the current value passes the compression test, the system stores the value preceding it in the historical data buffer pool. That value becomes the starting point and the new last stored point for the next round of compression testing. Otherwise, the next data point is tested.
Brief description of the drawings
Fig. 1 is a flowchart of an embodiment of the adaptive data compression method of the present invention;
Fig. 2 compares the points stored by the adaptive data compression method of the present invention with those stored by the swinging door algorithm;
Fig. 3 shows the slope variation regions of the adaptive data compression method of the embodiment of the invention;
Fig. 4 shows the slope variation regions of the prior-art swinging door algorithm;
Fig. 5 shows the points stored by the swinging door algorithm;
Fig. 6 shows the points stored by the adaptive data compression method of the present invention.
Detailed description of embodiments
The present invention is described in detail below with reference to the drawings and embodiments:
Fig. 1 is a flowchart of an embodiment of the adaptive data compression method. First, the current measured value is obtained from the historical data buffer queue (step 1). The method then proceeds to compression time judgment (step 2). For the current measured value, it first checks whether the difference between its measurement time and that of the last stored point is less than the minimum compression time or greater than the maximum compression time. This is a simple check made before compression. Only when the current time difference lies within the given range is the current measurement passed on to the subsequent compression steps.

The three cases are handled as follows:
- Time difference less than the minimum compression time (N in step 2): the remaining steps are skipped and no data is stored (step 23). The method returns immediately and goes on to judge the next measured value.
- Time difference greater than the maximum compression time (Y in step 2): the system stores the value preceding the current value in the historical data buffer pool (steps 21, 22). That value becomes the starting point and the new last stored point for the next round of compression testing. The current maximum and minimum slopes and the current lower bound (S1) and upper bound (S2) of the current value's slope are reinitialized.
- Time difference between the minimum and maximum compression times: the current measurement is processed further.
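The three-way decision in step 2 can be sketched as a small function. It is an illustration only. The names `TimeDecision` and `judge_compression_time` are hypothetical. A time difference exactly equal to either limit is assumed to count as inside the window, because the text does not say how ties are handled.

```python
from enum import Enum

class TimeDecision(Enum):
    SKIP = "skip"                      # N branch: too soon, store nothing (step 23)
    STORE_PREVIOUS = "store_previous"  # Y branch: too long, store the previous value (steps 21, 22)
    COMPRESS = "compress"              # inside the window: continue with the later steps

def judge_compression_time(t_current, t_last_stored, min_time, max_time):
    """Step 2: compare the time since the last stored point with the compression window."""
    dt = t_current - t_last_stored
    if dt < min_time:
        return TimeDecision.SKIP
    if dt > max_time:
        return TimeDecision.STORE_PREVIOUS
    return TimeDecision.COMPRESS
```

On `STORE_PREVIOUS`, the caller would also reset the slope extremes and the bounds S1 and S2, as described above.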
Compression time judgment (step 2) may include a compression preprocessing step for data with a linear trend. Two events are checked:
- Event A: the absolute difference between the current measured value and the previous measured value is less than the acquisition sensitivity.
- Event B: the absolute value of the current second-order difference is less than the second-order difference upper bound.

When events A and B both hold, the current measured value is not stored, the remaining steps are skipped, and the next measurement is examined. One expression for the absolute value of the current second-order difference is:

$$\left| \frac{\tau_{i-1}\, y_i - (\tau_{i-1} + \tau_i)\, y_{i-1} + \tau_i\, y_{i-2}}{\tau_{i-1}^{2}\, \tau_i} \right|$$

where $y_i$ is the current measured value and $\tau_i$ is the difference between the measurement time of the current measured value and that of the previous measured value.
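Events A and B can be sketched as two small functions. The function names and the parameters `sensitivity` and `diff_bound` are hypothetical. The second-order difference is coded exactly as written above. By the same indexing, τ_{i-1} is taken to be the spacing between y_{i-2} and y_{i-1}. For equally spaced samples the expression reduces to |y_i - 2y_{i-1} + y_{i-2}| / τ², and it is zero for any straight line.

```python
def second_order_difference(y_i, y_im1, y_im2, tau_i, tau_im1):
    """|(tau_{i-1} y_i - (tau_{i-1} + tau_i) y_{i-1} + tau_i y_{i-2}) / (tau_{i-1}^2 tau_i)|"""
    numerator = tau_im1 * y_i - (tau_im1 + tau_i) * y_im1 + tau_i * y_im2
    return abs(numerator / (tau_im1 ** 2 * tau_i))

def can_skip_point(y_i, y_im1, y_im2, tau_i, tau_im1, sensitivity, diff_bound):
    """Preprocessing: True when events A and B both hold (point is neither stored nor tested)."""
    event_a = abs(y_i - y_im1) < sensitivity
    event_b = second_order_difference(y_i, y_im1, y_im2, tau_i, tau_im1) < diff_bound
    return event_a and event_b
```

For example, samples of y = t taken at the unevenly spaced times 0, 1, 3 give a second-order difference of 0. Samples of y = t² at times 0, 1, 2 give 2.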
Next, the data that passed the compression time judgment go to the smoothing decision (step 3). When the measurement accuracy of the sensor is very poor, the current measured value must undergo data smoothing (step 31). Data smoothing (step 31) can use the following formula:

$$x_i = a\,x_{i-1} + b\,y_i + (1 - a - b)\,y_{i-1}$$

where $x_i$ is the smoothed result at the current time, $x_{i-1}$ is the smoothed value at the previous time, $y_i$ is the measurement at the current time, and $y_{i-1}$ is the measurement at the previous time. $a$ and $b$ are weighting factors greater than zero that satisfy $a + b < 1$. Other smoothing formulas may also be used in step 31.
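The smoothing formula translates directly into a loop. In this sketch the function name `smooth` is hypothetical. The filter is assumed to be seeded with x_0 = y_0, because the text does not specify an initial smoothed value.

```python
def smooth(measurements, a, b):
    """Filter smoothing: x_i = a*x_{i-1} + b*y_i + (1 - a - b)*y_{i-1}."""
    if not (a > 0 and b > 0 and a + b < 1):
        raise ValueError("need a > 0, b > 0 and a + b < 1")
    if not measurements:
        return []
    smoothed = [measurements[0]]  # assumption: x_0 = y_0
    for i in range(1, len(measurements)):
        x = a * smoothed[-1] + b * measurements[i] + (1 - a - b) * measurements[i - 1]
        smoothed.append(x)
    return smoothed
```

The three weights sum to 1, so a constant signal passes through unchanged. A step from 0 to 1 with a = 0.5 and b = 0.25 is smoothed to 0, 0.25, 0.625, 0.8125.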
Next, the measurement data output by the smoothing decision (step 3) undergo slope calculation (step 4). This step supplies the parameters needed in the next step to compute the compression deviation parameter:
Current slope (S) = (current value - last stored value) / time difference.
When S is greater than the current maximum slope, S replaces the current maximum slope.
When S is less than the current minimum slope, S replaces the current minimum slope.
Here the time difference is the difference between the measurement time of the current value and that of the last stored point.
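Step 4 amounts to one division and a running max/min. In this sketch the name `update_slopes` is hypothetical. Starting the extremes at -inf and +inf is an assumption for the reinitialized state, which lets the first slope after a stored point become both the maximum and the minimum.

```python
import math

def update_slopes(t, value, t_stored, v_stored, s_max, s_min):
    """Step 4: slope from the last stored point, plus the running max/min slope."""
    dt = t - t_stored             # time difference to the last stored point
    s = (value - v_stored) / dt   # current slope S
    return s, max(s_max, s), min(s_min, s)

# Usage: a fresh segment starts with s_max = -inf and s_min = +inf.
s, s_max, s_min = update_slopes(2.0, 3.0, 0.0, 1.0, -math.inf, math.inf)
```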
Next, the data from slope calculation (step 4) undergo compression deviation calculation (step 5). This step computes the compression deviation parameter from how the data are changing, and it supplies the adaptive parameter needed in the next step. One exponential form for the current compression deviation parameter is:

$$E = \alpha \exp\{-\beta\,(S_{\max} - S_{\min})\,\Delta t\}$$

The symbols are:
- $E$ is the current compression deviation parameter.
- $S_{\max}$ and $S_{\min}$ are the current maximum and minimum slopes.
- $\Delta t$ is the time difference.
- $\alpha$ and $\beta$ are given positive numbers. $\alpha$ is called the upper bound of the current compression deviation parameter, and $\beta$ is called its variation parameter.
- $\exp(\cdot)$ is the exponential function.

$\alpha$ and $\beta$ can be chosen from statistics of past data and the measurement accuracy of the sensor.
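The exponential form can be computed in one line. In this sketch the function name is hypothetical. The defaults in the usage lines are the simulation's values, α = 0.05 and β = 0.5. When the slopes in the current segment agree, the deviation equals its upper bound α. The more the slopes spread over time, the smaller the allowed deviation becomes.

```python
import math

def compression_deviation(s_max, s_min, dt, alpha, beta):
    """Step 5: E = alpha * exp(-beta * (S_max - S_min) * dt)."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    return alpha * math.exp(-beta * (s_max - s_min) * dt)

e_flat = compression_deviation(1.0, 1.0, 3.0, alpha=0.05, beta=0.5)    # slopes agree: E = alpha
e_spread = compression_deviation(2.0, 0.0, 1.0, alpha=0.05, beta=0.5)  # E = 0.05 * exp(-1)
```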
Next, the current compression deviation parameter is used to compute the current lower bound S1 and current upper bound S2 of the current value's slope (step 6). These bounds are the critical parameters for the compression test:
S1 = current slope - current compression deviation parameter / time difference
S2 = current slope + current compression deviation parameter / time difference
A newly computed S1 replaces the old S1 only if it is greater. A newly computed S2 replaces the old S2 only if it is smaller.
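The update rule keeps only the narrowest band seen so far in the current segment. In this sketch the function name is hypothetical, and an unbounded band (S1 = -inf, S2 = +inf) is assumed as the reinitialized state.

```python
def update_slope_bounds(s, e, dt, s1, s2):
    """Step 6: compute the new bounds and keep the larger S1 and the smaller S2."""
    s1_new = s - e / dt
    s2_new = s + e / dt
    return max(s1, s1_new), min(s2, s2_new)
```

The band can only narrow within a segment, as the swinging door's doors do. It is widened again only when a point is stored and the bounds are reinitialized.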
Next comes the compression test judgment (step 7). Two events are examined:
- Event C: the current slope S is less than the updated S1.
- Event D: the current slope S is greater than the updated S2.

If event C or event D occurs, the current value passes the compression test. The system stores the value preceding the current value in the historical data buffer pool, and that value becomes the starting point and the new last stored point for the next round of compression testing. The current maximum and minimum slopes, S1, and S2 are then reinitialized. If neither event occurs, the next data point is tested.
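Step 7 and its reset can be sketched together. `DoorState` and `compression_test` are hypothetical names. Representing the reinitialized slope extremes and bounds as ±infinity is an assumption, because the text does not give their initial values.

```python
import math
from dataclasses import dataclass

@dataclass
class DoorState:
    t0: float                  # time of the last stored point
    v0: float                  # value of the last stored point
    s_max: float = -math.inf   # current maximum slope
    s_min: float = math.inf    # current minimum slope
    s1: float = -math.inf      # current lower slope bound
    s2: float = math.inf       # current upper slope bound

def compression_test(state, s, prev_point):
    """Step 7: return the point to store if event C or D occurs, else None."""
    event_c = s < state.s1
    event_d = s > state.s2
    if not (event_c or event_d):
        return None
    # The value preceding the current one becomes the new starting point,
    # and the slope extremes and bounds are reinitialized.
    state.t0, state.v0 = prev_point
    state.s_max, state.s_min = -math.inf, math.inf
    state.s1, state.s2 = -math.inf, math.inf
    return prev_point
```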
After these steps have been executed, the whole process repeats when a new value arrives.
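For illustration, steps 4 to 7 can be combined into one loop. This is a sketch under several assumptions, not the patented implementation:
- Points are `(time, value)` pairs with strictly increasing times.
- The time window and smoothing are omitted, as in the simulations that follow.
- The first and last points are always stored.
- After a point is stored, the current point is re-tested against the new starting point. The text does not state this; standard swinging door implementations behave this way.

The function name `adaptive_compress` is hypothetical.

```python
import math

def adaptive_compress(points, alpha, beta):
    """Steps 4-7: slope, adaptive deviation, slope bounds, compression test.

    points: (time, value) pairs with strictly increasing times.
    Returns the stored points; the first and last points are always kept.
    """
    if len(points) <= 2:
        return list(points)
    stored = [points[0]]
    t0, v0 = points[0]                   # last stored point (starting point)
    s_max, s_min = -math.inf, math.inf   # current maximum / minimum slope
    s1, s2 = -math.inf, math.inf         # current lower / upper slope bound
    prev = points[0]
    i = 1
    while i < len(points):
        t, v = points[i]
        dt = t - t0
        s = (v - v0) / dt                                     # step 4: current slope
        s_max, s_min = max(s_max, s), min(s_min, s)
        e = alpha * math.exp(-beta * (s_max - s_min) * dt)    # step 5: adaptive deviation
        s1 = max(s1, s - e / dt)                              # step 6: narrowest bounds
        s2 = min(s2, s + e / dt)
        if s < s1 or s > s2:                                  # step 7: event C or D
            stored.append(prev)          # store the value preceding the current one
            t0, v0 = prev                # it becomes the new starting point
            s_max, s_min = -math.inf, math.inf
            s1, s2 = -math.inf, math.inf
            continue                     # assumption: re-test the current point
        prev = (t, v)
        i += 1
    stored.append(points[-1])
    return stored
```

The re-test cannot loop forever. Right after a reset, S1 and S2 are S minus and plus E/Δt with E ≥ 0, so the current point never triggers the test a second time. A straight line compresses to its two endpoints. A symmetric peak such as (0, 0), (1, 1), (2, 2), (3, 1), (4, 0) keeps the apex (2, 2).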
To better illustrate the advantages of the present invention, the embodiment above is compared below with the prior art.
For simplicity, the maximum and minimum compression times are ignored in the simulations below. The data compression rate is defined as (M-N)/M, where M is the total number of test data points and N is the total number of stored points. The number of points not stored, that is, compressed away, is therefore M-N.
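The compression rate definition can be checked against the figures reported below. In this sketch the function name `compression_rate` is hypothetical. The inputs are the 1001 simulated points and the 564 and 34 points stored in the noisy-data run.

```python
def compression_rate(total_points, stored_points):
    """(M - N) / M, where M is the total number of points and N the number stored."""
    if total_points <= 0 or not 0 <= stored_points <= total_points:
        raise ValueError("need 0 <= N <= M and M > 0")
    return (total_points - stored_points) / total_points

print(f"{compression_rate(1001, 564):.2%}")  # 43.66%  (swinging door, noisy data)
print(f"{compression_rate(1001, 34):.2%}")   # 96.60%  (adaptive method, noisy data)
```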
This comparison uses the following setup:
- 1001 noise-free points (including the initial point) take part in the simulation; see the solid line in Fig. 2.
- The swinging door algorithm uses a compression deviation parameter of 0.01.
- In the adaptive data compression method of the present invention, the initial compression deviation parameter is also 0.01. The upper bound and the variation parameter used to compute the current compression deviation parameter are 0.05 and 0.5, respectively.

The simulation shows that the swinging door algorithm stores 16 points in total (including the first and last points). The adaptive method of the present invention stores only 8 (including the first and last points), which is an improvement in compression. Fig. 2 compares the stored points of the two methods. Points marked o are stored by the present invention, and points marked * are stored by the swinging door algorithm.
Figs. 3 and 4 show the current slope and its slope variation region for the present invention and for the swinging door algorithm, respectively. As the figures show, the swinging door algorithm splits these 1001 points into 15 test segments, whereas the present invention needs only 7. In the figures, the solid line is the current slope within each segment, the dashed line is the upper bound of the slope in each segment, and the dash-dot line is the lower bound. The simulation results show that adjusting the current compression deviation parameter adaptively widens the allowed range of the initial current slope in each segment. This effectively improves compression.
Normally distributed random noise with standard deviation 0.01 was then added to the simulation data above. With this noise, the swinging door algorithm achieves a compression rate of only 43.66%, whereas the present invention achieves 96.60% (see Figs. 5 and 6). The curve in Fig. 5 connects the points stored by the swinging door algorithm, 564 points in total. In Fig. 6, the solid line connects the 1001 actual measured values, and the points marked o are the points stored by the present invention, 34 points in total. Compared with the swinging door algorithm, the present invention clearly improves the compression rate substantially. It also suppresses measurement noise to some extent.

Claims (4)

1. An adaptive data compression method, characterized by comprising the following steps:
Compression time judgment: for the current measured value from the historical data buffer queue, judging whether its measurement time lies within the corresponding compression time interval. When the time difference of the current measured value is too small, skipping the subsequent steps and examining the next measurement. When the time difference of the current measured value is too large, storing, by the system, the value preceding the current value in the historical data buffer pool as the starting point and the new last stored point for the next round of compression testing;
Slope calculation: calculating the slope of the current value, together with the current maximum slope and the current minimum slope;
Compression deviation calculation: for measurements taken at different times, adaptively and dynamically calculating the compression deviation parameter from the current maximum and minimum slopes;
Slope bound calculation: using the newly calculated compression deviation parameter to calculate the upper and lower bounds of the current value's slope;
Compression test judgment: when the current value passes the compression test, storing, by the system, the value preceding the current value in the historical data buffer pool as the starting point and the new last stored point for the next round of compression testing; otherwise, testing the next data point;
Historical data buffering: storing key historical data in the historical data buffer pool;
Data smoothing: applying a filtering smoothing technique to the measurement data.
2. The adaptive data compression method according to claim 1, characterized in that the compression time judgment step further comprises a data compression preprocessing step for data with a linear trend. The preprocessing step comprises an acquisition sensitivity test and a second-order difference upper bound test.
3. The adaptive data compression method according to claim 2, characterized in that the absolute value of the current second-order difference is:

$$\left| \frac{\tau_{i-1}\, y_i - (\tau_{i-1} + \tau_i)\, y_{i-1} + \tau_i\, y_{i-2}}{\tau_{i-1}^{2}\, \tau_i} \right|$$

where $y_i$ is the current measured value and $\tau_i$ is the difference between the measurement time of the current measured value and that of the previous measured value.
4. The adaptive data compression method according to claim 1, characterized in that a smoothing formula usable by the filtering smoothing technique is:

$$x_i = a\,x_{i-1} + b\,y_i + (1 - a - b)\,y_{i-1}$$

where $x_i$ is the smoothed result at the current time, $x_{i-1}$ is the smoothed value at the previous time, $y_i$ is the measurement at the current time, and $y_{i-1}$ is the measurement at the previous time. $a$ and $b$ are weighting factors greater than zero that satisfy $a + b < 1$.
CN 02120383 2002-05-24 2002-05-24 Self adapting history data compression method Expired - Fee Related CN1223951C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 02120383 CN1223951C (en) 2002-05-24 2002-05-24 Self adapting history data compression method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 02120383 CN1223951C (en) 2002-05-24 2002-05-24 Self adapting history data compression method

Publications (2)

Publication Number Publication Date
CN1459743A (en) 2003-12-03
CN1223951C (en) 2005-10-19

Family

ID=29427022

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 02120383 Expired - Fee Related CN1223951C (en) 2002-05-24 2002-05-24 Self adapting history data compression method

Country Status (1)

Country Link
CN (1) CN1223951C (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7076402B2 (en) * 2004-09-28 2006-07-11 General Electric Company Critical aperture convergence filtering and systems and methods thereof
CN100412863C (en) * 2005-08-05 2008-08-20 北京人大金仓信息技术有限公司 Huge amount of data compacting storage method and implementation apparatus therefor
CN100430943C (en) * 2006-01-09 2008-11-05 中国科学院自动化研究所 Intelligent two-stage compression method for process industrial historical data
CN101807925B (en) * 2010-02-08 2013-01-30 江苏瑞中数据股份有限公司 Historical data compression method based on numerical ordering and linear fitting
CN102622367B (en) * 2011-01-30 2014-08-20 上海振华重工(集团)股份有限公司 Method for filtering and compressing process data
CN103646056B (en) * 2013-11-29 2017-02-01 北京广利核系统工程有限公司 Method for storing and extracting historical data based on characteristic value storage
CN106649026B (en) * 2016-09-26 2020-07-07 国家电网公司北京电力医院 Monitoring data compression method suitable for operation and maintenance automation system
US11143545B2 (en) * 2019-02-12 2021-10-12 Computational Systems, Inc. Thinning of scalar vibration data
CN109933568A (en) * 2019-03-13 2019-06-25 安徽海螺集团有限责任公司 A kind of industry big data platform system and its querying method
CN112182034A (en) * 2019-07-03 2021-01-05 河南许继仪表有限公司 Data compression method and device
CN113114265B (en) * 2021-04-26 2024-03-19 北京交通大学 Synchronous phasor real-time data compression method based on extrapolation
CN117650791B (en) * 2024-01-30 2024-04-05 苏芯物联技术(南京)有限公司 Welding history airflow data compression method integrating welding process mechanism

Also Published As

Publication number Publication date
CN1459743A (en) 2003-12-03


Legal Events

Date Code Title Description
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20051019

Termination date: 20140524