CN1223951C - Self adapting history data compression method - Google Patents
Self adapting history data compression method
- Publication number
- CN1223951C (application number CN 02120383)
- Authority
- CN
- China
- Prior art keywords
- data
- compression
- value
- current
- slope
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The present invention relates to a self-adaptive data compression method for compressing historical data in a database. The method comprises the following steps. Compression time judgment: determine whether the measurement time of the current measured value lies within the specified compression time interval; when the time difference of the current measured value is too small, the subsequent steps are not executed and the next measured value is examined; when the time difference is too large, the value preceding the current value is stored and serves as the starting point of the next round of compression testing and as the new last stored point. Slope calculation: calculate the slope of the current value together with the current maximum and minimum slopes. Compression deviation parameter calculation: for measurement data at different times, calculate the compression deviation parameter dynamically and adaptively from the current maximum or minimum slope. Slope bound calculation: calculate the upper and lower bounds of the current-value slope from the newly calculated compression deviation parameter. Compression test judgment: if the current value passes the compression test, the value preceding it is stored and serves as the starting point of the next round of compression testing and as the new last stored point; otherwise, the next new data point is tested.
Description
Technical field
The present invention relates to techniques for compressing historical data in a real-time database, and in particular to an adaptive historical data compression method.
Background technology
In a real-time database, the data volume is huge, so managing the data archive is one of its most important functions; the key technical question is whether data can be stored and accessed efficiently. Because the volume of historical data is large, storing it directly without compression requires very large physical storage media, so compressing historical data has become one of the most important techniques in real-time databases.
In a real-time database system, historical data compression is traditionally used to reduce disk space, and different applications call for different compression methods. A real-time database system must not only store a large amount of historical data in limited disk space but also allow that data to be accessed quickly. The main problem that arises once compression is used is that queries against the compressed database become very slow. This places constraints on the compression method, which should compress the data according to the characteristics of the process data.
While developing a production information management and decision support system within a petrochemical application software demonstration project, the applicant developed AGILOR, a real-time database system that integrates real-time database technology and temporal database technology. In this system, the compression and archiving management of historical data were studied in depth.
The historical data compression technique was developed with reference to the swinging door compression algorithm developed by the U.S. company OSI. In the swinging door algorithm, the compression deviation parameter is fixed as a constant. Experimental data show that, for the same compression deviation parameter, the swinging door algorithm achieves a high compression ratio on some test data but a much lower one on other test data.
In general, the data from any measuring device can be divided into three parts. The first is the relative-change part, which depends on the physical process of the measured parameter. The second is the constant-value part, which is absolute and can be predicted from past historical data. The third is the measurement noise of the sensor, also called the measurement error, which depends on the measurement accuracy of the sensor. Of these three parts, the constant-value part has no effect on the swinging door algorithm; only the other two parts affect it. The relative-change part usually varies continuously and regularly, whereas the measurement error is random and unpredictable. When the sensor accuracy is high, the measurement error is small, and the compression rate depends mainly on the relative-change part of the data. When the sensor accuracy is low, the measurement error is large and strongly affects data compression. When the measurement error grows large enough, it masks the relative-change part of the data, and describing that part becomes meaningless.
Specifically, the swinging door algorithm has the following two main shortcomings:
1). The size of the compression deviation parameter strongly affects the compression rate obtained with the swinging door algorithm;
2). It is very sensitive to measurement error: when the measurement error of the sensor is relatively large, the compression rate obtained with the swinging door algorithm drops significantly.
Summary of the invention
The object of the present invention is to provide an adaptive historical data compression method that selects the compression deviation parameter dynamically and uses data smoothing to suppress measurement noise, thereby reducing the influence of sensor measurement error on the compression algorithm.
The adaptive data compression method of the present invention comprises the following steps:
Compression time judgment: for the current measured value taken from the historical data buffer queue, determine whether its measurement time lies within the corresponding compression time interval;
Slope calculation: in addition to the current-value slope, the current maximum slope and current minimum slope are also calculated;
Compression deviation parameter calculation: for measurement data at different times, the compression deviation parameter is calculated dynamically and adaptively from the current maximum and minimum slopes;
Slope upper and lower bound calculation: the newly calculated compression deviation parameter is used to calculate the upper and lower bounds of the current-value slope;
Compression test judgment (whose function is to decide whether the current value lies within its compression zone, and hence whether the tested point needs to be stored): if the current value passes the compression test, the system stores the value preceding the current value in the historical data buffer pool and uses it as the starting point of the next round of compression testing and as the new last stored point; otherwise, testing continues with the next new data point.
The compression time judgment step may further include a data compression pre-processing step for data with a linear trend (its function is to handle measurement data with typical characteristics quickly and so reduce the compression processing time), comprising an acquisition sensitivity test and a second-order difference upper-bound test.
The method also includes a data smoothing step (its function is to weaken the influence of measurement noise on the measurement data and so improve the compression ratio), in which a filtering smoothing technique is applied to the measurement data.
The absolute value of the current second-order difference is given by an expression (the formula image is not reproduced in this text) in which $y_i$ is the current measured value and $\tau_i$ is the difference between the measurement time of the current measured value and that of the previous measured value.
One data smoothing formula that the filtering smoothing technique may use is:
$$x_i = a\,x_{i-1} + b\,y_i + (1-a-b)\,y_{i-1}$$
where $x_i$ is the smoothed result at the current time, $x_{i-1}$ is the smoothed value at the previous time, $y_i$ is the measurement at the current time, $y_{i-1}$ is the measurement at the previous time, and $a$ and $b$ are weighting factors greater than zero that satisfy $a+b<1$.
Data compression pre-processing: data with a linear trend is pre-processed (the effect is to skip the further compression steps and shorten the time spent compressing the data), comprising an acquisition sensitivity test and a second-order difference upper-bound test;
Data smoothing: the measurement data is smoothed using filtering, smoothing, and related techniques; when the sensor measurement accuracy is relatively high, the smoothing step can be omitted to save compression time;
Slope calculation: in addition to the current-value slope, the current maximum slope and current minimum slope are also calculated;
Compression deviation parameter calculation: for measurement data at different times, the compression deviation parameter is calculated dynamically and adaptively from the current maximum and minimum slopes;
Slope upper and lower bound calculation: the newly calculated compression deviation parameter is used to calculate the upper and lower bounds of the current-value slope;
Compression test judgment: if the current value passes the compression test, the system stores the value preceding the current value in the historical data buffer pool and uses it as the starting point of the next round of compression testing and as the new last stored point; otherwise, testing continues with the next new data point.
Description of drawings
Fig. 1 is a flowchart of an embodiment of the adaptive data compression method of the present invention;
Fig. 2 is a schematic comparison of the stored points obtained by the adaptive data compression method of the present invention and those obtained by the swinging door algorithm;
Fig. 3 shows the slope variation region of the adaptive data compression method of the embodiment of the invention;
Fig. 4 shows the slope variation region of the prior-art swinging door algorithm;
Fig. 5 is a schematic diagram of the stored points obtained by the swinging door algorithm;
Fig. 6 is a schematic diagram of the stored points obtained by the adaptive data compression method of the present invention.
Specific embodiments
The present invention is described in detail below with reference to the accompanying drawings and the embodiments:
Fig. 1 is a flowchart of an embodiment of the adaptive data compression method of the present invention. First, the current measured value is obtained from the historical data buffer queue (step 1). The method then proceeds to compression time judgment (step 2): for the current measured value, it first determines whether the difference between its measurement time and that of the last stored point is less than the minimum compression time or greater than the maximum compression time. This step is a simple check on the data before compression processing: only when the current measurement time difference lies within the given time range is the current measurement passed to the next compression step. When the current measurement time difference is less than the minimum compression time (N in step 2), the following steps are not executed and the data is not stored (step 23); the method returns directly and goes on to judge the next measured value. When the current measurement time difference is greater than the maximum compression time (Y in step 2), the system stores the value preceding the current value in the historical data buffer pool (steps 21, 22) and uses it as the starting point of the next round of compression testing and as the new last stored point; the current maximum and minimum slopes and the current lower bound (S1) and upper bound (S2) of the current-value slope are also assigned initial values. When the current measurement time difference lies between the minimum and maximum compression times, the current measurement is processed further.
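The three-way time check of step 2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `GapDecision` and `judge_compression_time` are hypothetical, and treating the two boundary values as "inside the window" is an assumption, since the text does not say how equality is handled.

```python
from enum import Enum


class GapDecision(Enum):
    SKIP = "skip"                # gap below the minimum compression time: do not store
    FORCE_STORE = "force_store"  # gap above the maximum: store the previous value
    PROCESS = "process"          # gap inside the window: run the later steps


def judge_compression_time(t_current, t_last_stored, t_min, t_max):
    """Classify the time gap between the current sample and the last stored point.

    Boundary values (gap == t_min or gap == t_max) are treated as PROCESS,
    which is an assumption of this sketch.
    """
    gap = t_current - t_last_stored
    if gap < t_min:
        return GapDecision.SKIP
    if gap > t_max:
        return GapDecision.FORCE_STORE
    return GapDecision.PROCESS
```

A caller would store the previous value and reset the slope bookkeeping on `FORCE_STORE`, and simply move on to the next sample on `SKIP`.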
Compression time judgment (step 2) may include a compression pre-processing step for data with a linear trend. Two events are evaluated: event A, that the absolute value of the difference between the current measured value and the previous measured value is less than the acquisition sensitivity; and event B, that the absolute value of the current second-order difference is less than the difference upper bound. When events A and B both hold, the current measured value is not stored, the following steps are not executed, and the next measurement is examined. One expression for the absolute value of the current second-order difference is given by a formula (the formula image is not reproduced in this text) in which $y_i$ is the current measured value and $\tau_i$ is the difference between the measurement time of the current measured value and that of the previous measured value.
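The joint test on events A and B can be illustrated with a small helper. Because the second-order difference expression is not available in this text, the sketch takes its absolute value as an already computed argument instead of guessing a formula; the function and parameter names are hypothetical.

```python
def can_skip_linear_sample(y_current, y_previous, second_diff_abs,
                           sensitivity, diff_upper_bound):
    """Return True when events A and B both hold, so the sample is skipped.

    second_diff_abs is the absolute value of the current second-order
    difference, computed by the caller with whatever expression applies.
    """
    # Event A: the change since the previous measurement is below the
    # acquisition sensitivity.
    event_a = abs(y_current - y_previous) < sensitivity
    # Event B: the second-order difference is below its upper bound.
    event_b = second_diff_abs < diff_upper_bound
    return event_a and event_b
```

When the helper returns `True`, the sample is neither stored nor passed on to smoothing and slope calculation.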
Next, the data that has passed the compression time judgment enters the smoothing judgment (step 3). When the measurement accuracy of the sensor is poor, the current measured value must undergo data smoothing (step 31), which can be carried out with the following data smoothing formula:
$$x_i = a\,x_{i-1} + b\,y_i + (1-a-b)\,y_{i-1}$$
where $x_i$ is the smoothed result at the current time, $x_{i-1}$ is the smoothed value at the previous time, $y_i$ is the measurement at the current time, $y_{i-1}$ is the measurement at the previous time, and $a$ and $b$ are weighting factors greater than zero that satisfy $a+b<1$. Other data smoothing formulas may also be used in step 31.
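The smoothing recursion can be sketched in a few lines. The function names are hypothetical, and seeding the first smoothed value with the first measurement is an assumption of this sketch, since the text does not specify how the recursion starts.

```python
def smooth_step(x_prev, y_curr, y_prev, a, b):
    """One step of x_i = a*x_{i-1} + b*y_i + (1 - a - b)*y_{i-1}."""
    if not (a > 0 and b > 0 and a + b < 1):
        raise ValueError("weights must satisfy a > 0, b > 0 and a + b < 1")
    return a * x_prev + b * y_curr + (1 - a - b) * y_prev


def smooth_series(ys, a, b):
    """Smooth a whole sequence of measurements.

    Assumption: the first smoothed value equals the first measurement.
    """
    if not ys:
        return []
    xs = [ys[0]]
    for i in range(1, len(ys)):
        xs.append(smooth_step(xs[-1], ys[i], ys[i - 1], a, b))
    return xs
```

Since the three weights sum to one, a constant signal passes through unchanged, which makes for a quick sanity check on any choice of `a` and `b`.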
Next, the measurement that has passed through the smoothing judgment (step 3) enters slope calculation (step 4). This step supplies the parameters needed to calculate the compression deviation parameter in the next step:
Current-value slope (S) = (current value - last stored point value) / time difference,
When S is greater than the current maximum slope, the current maximum slope is replaced with the new value S,
When S is less than the current minimum slope, the current minimum slope is replaced with the new value S,
Here the time difference is the difference between the measurement time of the current value and the measurement time of the last stored point.
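The slope bookkeeping of step 4 amounts to one division and two comparisons. In the sketch below the function names are hypothetical, and starting a round with a maximum of minus infinity and a minimum of plus infinity is an assumption; the text only says that these quantities are assigned initial values.

```python
import math


def current_slope(y_current, t_current, y_stored, t_stored):
    """Slope from the last stored point to the current value."""
    dt = t_current - t_stored
    if dt <= 0:
        raise ValueError("the current sample must be later than the stored point")
    return (y_current - y_stored) / dt


def update_extremes(s, s_max, s_min):
    """Replace the running maximum or minimum slope when s exceeds it."""
    return max(s_max, s), min(s_min, s)


# Assumed initial values at the start of a compression round.
INITIAL_MAX_SLOPE = -math.inf
INITIAL_MIN_SLOPE = math.inf
```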
Next, the measurement from slope calculation (step 4) enters compression deviation parameter calculation (step 5). This step calculates the compression deviation parameter from the dynamic behavior of the data and supplies the adaptive parameter needed by the next step. One exponential form for calculating the current compression deviation parameter is:
Current compression deviation parameter = α exp{-β (current maximum slope - current minimum slope) × time difference}
where the parameters α and β are given positive numbers; α is called the upper bound of the current compression deviation parameter, β is called the variation parameter of the current compression deviation parameter, and exp(·) is the exponential function. α and β can be chosen from statistics of past data and the measurement accuracy of the sensor.
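The exponential form translates directly into code. The function name is hypothetical; the example values 0.05 and 0.5 are the upper bound and variation parameter quoted for the simulation later in the description.

```python
import math


def compression_deviation(alpha, beta, s_max, s_min, dt):
    """alpha * exp(-beta * (s_max - s_min) * dt).

    The result equals alpha when the slope has not varied (s_max == s_min)
    and shrinks as the slope spread or the elapsed time grows.
    """
    return alpha * math.exp(-beta * (s_max - s_min) * dt)


# With no slope spread the deviation sits at its upper bound alpha.
print(compression_deviation(0.05, 0.5, 1.0, 1.0, 2.0))
# A slope spread of 2 over one time unit scales alpha by exp(-1).
print(compression_deviation(0.05, 0.5, 3.0, 1.0, 1.0))
```

This shows why α is described as an upper bound: the exponent is never positive as long as the maximum slope is at least the minimum slope and the time difference is positive.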
Next, the current compression deviation parameter is used to calculate the current lower bound S1 and current upper bound S2 of the current-value slope (step 6). This step supplies the decision parameters for the compression test:
S1 = current-value slope - current compression deviation parameter / time difference
S2 = current-value slope + current compression deviation parameter / time difference
The newly calculated S1 replaces the existing S1 only when it is greater; the newly calculated S2 replaces the existing S2 only when it is smaller.
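The bound update of step 6 keeps the tightest interval seen so far in the current round. A minimal sketch with a hypothetical function name follows; starting from an unbounded interval is an assumption of the example, not something the text specifies.

```python
import math


def update_slope_bounds(s, deviation, dt, s1_old, s2_old):
    """Return the updated (S1, S2) pair for the current-value slope s."""
    s1_new = s - deviation / dt
    s2_new = s + deviation / dt
    # The lower bound only moves up and the upper bound only moves down.
    return max(s1_old, s1_new), min(s2_old, s2_new)


# First sample of a round, starting from an unbounded interval (assumption).
s1, s2 = update_slope_bounds(1.0, 0.5, 1.0, -math.inf, math.inf)
# A later sample narrows the interval from both sides.
s1, s2 = update_slope_bounds(1.2, 0.5, 2.0, s1, s2)
```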
Next, the compression test judgment (step 7) is carried out. Two events are examined: event C, that the current-value slope S is less than the updated S1; and event D, that S is greater than the updated S2. When either event C or event D occurs, the current value passes the compression test: the system stores the value preceding the current value in the historical data buffer pool, uses it as the starting point of the next round of compression testing and as the new last stored point, and assigns initial values to the current maximum and minimum slopes, S1, and S2. When neither event occurs, testing continues with the next new data point.
After the above steps have been executed, the process is repeated when a new value arrives.
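Steps 4 to 7 can be combined into one loop, sketched below under several assumptions that the text does not confirm: the minimum and maximum compression times, the pre-processing step, and smoothing are left out; each round starts with unbounded S1 and S2; after the previous value is stored, the current point is tested again against the new stored point; and the first and last points are always kept. The function name `adaptive_compress` is hypothetical.

```python
import math


def adaptive_compress(times, values, alpha, beta):
    """Return the indices of stored points; times must be strictly increasing."""
    n = len(values)
    if n <= 2:
        return list(range(n))
    stored = [0]                         # the initial point is always kept
    anchor = 0                           # index of the last stored point
    s_max, s_min = -math.inf, math.inf   # assumed initial slope extremes
    s1, s2 = -math.inf, math.inf         # assumed initial slope bounds
    i = 1
    while i < n:
        dt = times[i] - times[anchor]
        s = (values[i] - values[anchor]) / dt               # step 4
        s_max, s_min = max(s_max, s), min(s_min, s)
        dev = alpha * math.exp(-beta * (s_max - s_min) * dt)  # step 5
        s1 = max(s1, s - dev / dt)                          # step 6
        s2 = min(s2, s + dev / dt)
        if (s < s1 or s > s2) and i - 1 > anchor:           # step 7: event C or D
            anchor = i - 1               # store the value before the current one
            stored.append(anchor)
            s_max, s_min = -math.inf, math.inf
            s1, s2 = -math.inf, math.inf
            continue                     # re-test point i against the new anchor
        i += 1
    if stored[-1] != n - 1:
        stored.append(n - 1)             # keep the final point
    return stored
```

On a perfectly linear series the slope never leaves its bounds, so only the two end points are kept; a step change forces the points on either side of the jump to be stored.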
To better illustrate the advantages of the present invention, the above embodiment is compared below with a prior-art example.
For simplicity, the maximum and minimum compression times are not considered in the following simulation. In addition, the data compression rate is defined as (M-N)/M, where M is the total number of test data points and N is the total number of stored data points, so the total number of data points not stored (that is, compressed away) is M-N.
In this comparison, 1001 points (including the initial point) take part in the simulation (noise-free; see the solid line in Fig. 2). The compression deviation parameter of the swinging door algorithm is set to 0.01. In the adaptive data compression method of the present invention, the initial value of the compression deviation parameter is also set to 0.01, and the upper bound of the current compression deviation parameter and its variation parameter are set to 0.05 and 0.5, respectively. The simulation shows that compression with the swinging door algorithm requires 16 stored points (including the initial and final points), whereas compression with the adaptive method requires only 8 stored points (including the initial and final points), so the compression rate is improved. Fig. 2 compares the stored points of the swinging door algorithm and of the method of the invention: points marked o are stored points of the invention, and points marked * are stored points of the swinging door algorithm.
Fig. 3 and Fig. 4 show the current-value slope and its variation region obtained by the technique of the present invention and by the swinging door algorithm, respectively. As the figures show, compression with the swinging door algorithm divides the 1001 data points into 15 segments of test data, whereas the present invention needs only 7 segments. In the figures, the solid line is the current-value slope curve within each segment, the dashed line is the upper-bound curve of each segment's slope, and the dash-dot line is the lower-bound curve of each segment's slope. The simulation results show that adaptively adjusting the current compression deviation parameter dynamically widens the initial variation range of the current-value slope in each segment and effectively improves the compression rate of the data.
When normally distributed random noise with a standard deviation of 0.01 is added to the simulation data above, the data compression rate obtained by the swinging door algorithm is only 43.66%, whereas that obtained by the present invention reaches 96.60% (see Fig. 5 and Fig. 6). The curve in Fig. 5 connects the stored points obtained by the swinging door algorithm (564 data points in total). The solid line in Fig. 6 connects the 1001 actual measured values, and the points marked o are the stored points obtained by the present invention (34 data points in total). Clearly, compared with the swinging door algorithm, the data compression rate of the present invention is greatly improved, and the method also suppresses measurement noise to some extent.
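The quoted compression rates follow from the definition (M-N)/M and the stored-point counts given for the noisy simulation. A short check, with a hypothetical function name:

```python
def compression_rate(total_points, stored_points):
    """Compression rate (M - N) / M for M test points and N stored points."""
    if total_points <= 0 or not 0 <= stored_points <= total_points:
        raise ValueError("need 0 <= stored_points <= total_points and total_points > 0")
    return (total_points - stored_points) / total_points


# Noisy simulation: 1001 test points, 564 and 34 stored points respectively.
print(f"swinging door: {compression_rate(1001, 564):.2%}")   # 43.66%
print(f"adaptive method: {compression_rate(1001, 34):.2%}")  # 96.60%
```

Applying the same definition to the noise-free case (16 and 8 stored points out of 1001) gives about 98.40% and 99.20%.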
Claims (4)
1. An adaptive data compression method, characterized by comprising the steps of:
Compression time judgment: for the current measured value taken from the historical data buffer queue, determine whether its measurement time lies within the corresponding compression time interval; when the time difference of the current measured value is too small, the subsequent steps are not executed and the next measurement is examined; when the time difference of the current measured value is too large, the system stores the value preceding the current value in the historical data buffer pool and uses it as the starting point of the next round of compression testing and as the new last stored point;
Slope calculation: in addition to the current-value slope, the current maximum slope and current minimum slope are also calculated;
Compression deviation parameter calculation: for measurement data at different times, the compression deviation parameter is calculated dynamically and adaptively from the current maximum and minimum slopes;
Slope upper and lower bound calculation: the newly calculated compression deviation parameter is used to calculate the upper and lower bounds of the current-value slope;
Compression test judgment: if the current value passes the compression test, the system stores the value preceding the current value in the historical data buffer pool and uses it as the starting point of the next round of compression testing and as the new last stored point; otherwise, testing continues with the next new data point;
Historical data buffering: the historical data buffer pool is used to store the key historical data;
Data smoothing: a filtering smoothing technique is applied to smooth the measurement data.
2. The adaptive data compression method according to claim 1, characterized in that the compression time judgment step further includes a data compression pre-processing step for pre-processing data with a linear trend, comprising an acquisition sensitivity test and a second-order difference upper-bound test.
3. The adaptive data compression method according to claim 2, characterized in that the absolute value of the current second-order difference is given by an expression (the formula image is not reproduced in this text) in which $y_i$ is the current measured value and $\tau_i$ is the difference between the measurement time of the current measured value and that of the previous measured value.
4. The adaptive data compression method according to claim 1, characterized in that a data smoothing formula that the filtering smoothing technique may use is:
$$x_i = a\,x_{i-1} + b\,y_i + (1-a-b)\,y_{i-1}$$
where $x_i$ is the smoothed result at the current time, $x_{i-1}$ is the smoothed value at the previous time, $y_i$ is the measurement at the current time, $y_{i-1}$ is the measurement at the previous time, and $a$ and $b$ are weighting factors greater than zero that satisfy $a+b<1$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 02120383 CN1223951C (en) | 2002-05-24 | 2002-05-24 | Self adapting history data compression method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 02120383 CN1223951C (en) | 2002-05-24 | 2002-05-24 | Self adapting history data compression method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1459743A CN1459743A (en) | 2003-12-03 |
CN1223951C true CN1223951C (en) | 2005-10-19 |
Family
ID=29427022
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 02120383 Expired - Fee Related CN1223951C (en) | 2002-05-24 | 2002-05-24 | Self adapting history data compression method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN1223951C (en) |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7076402B2 (en) * | 2004-09-28 | 2006-07-11 | General Electric Company | Critical aperture convergence filtering and systems and methods thereof |
CN100412863C (en) * | 2005-08-05 | 2008-08-20 | 北京人大金仓信息技术有限公司 | Huge amount of data compacting storage method and implementation apparatus therefor |
CN100430943C (en) * | 2006-01-09 | 2008-11-05 | 中国科学院自动化研究所 | Intelligent two-stage compression method for process industrial historical data |
CN101807925B (en) * | 2010-02-08 | 2013-01-30 | 江苏瑞中数据股份有限公司 | Historical data compression method based on numerical ordering and linear fitting |
CN102622367B (en) * | 2011-01-30 | 2014-08-20 | 上海振华重工(集团)股份有限公司 | Method for filtering and compressing process data |
CN103646056B (en) * | 2013-11-29 | 2017-02-01 | 北京广利核系统工程有限公司 | Method for storing and extracting historical data based on characteristic value storage |
CN106649026B (en) * | 2016-09-26 | 2020-07-07 | 国家电网公司北京电力医院 | Monitoring data compression method suitable for operation and maintenance automation system |
US11143545B2 (en) * | 2019-02-12 | 2021-10-12 | Computational Systems, Inc. | Thinning of scalar vibration data |
CN109933568A (en) * | 2019-03-13 | 2019-06-25 | 安徽海螺集团有限责任公司 | A kind of industry big data platform system and its querying method |
CN112182034A (en) * | 2019-07-03 | 2021-01-05 | 河南许继仪表有限公司 | Data compression method and device |
CN113114265B (en) * | 2021-04-26 | 2024-03-19 | 北京交通大学 | Synchronous phasor real-time data compression method based on extrapolation |
CN117650791B (en) * | 2024-01-30 | 2024-04-05 | 苏芯物联技术(南京)有限公司 | Welding history airflow data compression method integrating welding process mechanism |
-
2002
- 2002-05-24 CN CN 02120383 patent/CN1223951C/en not_active Expired - Fee Related
Also Published As
Publication number | Publication date |
---|---|
CN1459743A (en) | 2003-12-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN1223951C (en) | Self adapting history data compression method | |
US7647585B2 (en) | Methods and apparatus to detect patterns in programs | |
KR101779992B1 (en) | Caching based on spatial distribution of accesses to data storage devices | |
CN1279441C (en) | Internal insersion of dynamic analysis based on storehouse tracing high speed buffer storage | |
CN106649026B (en) | Monitoring data compression method suitable for operation and maintenance automation system | |
CN110191159B (en) | Load adjusting method, system and equipment of resource server | |
CN114279842B (en) | Method and system for determining cracking stress and damage stress of rock cracks | |
CN111782700B (en) | Data stream frequency estimation method, system and medium based on double-layer structure | |
CN110146374B (en) | Method and device for measuring brittleness index | |
CN116915259B (en) | Bin allocation data optimized storage method and system based on internet of things | |
CN1547145A (en) | Dynamic detecting and ensuring method for equipment operating status data quality | |
US6845312B1 (en) | Method for detecting engine knock | |
CN107707680A (en) | A kind of distributed data load-balancing method and system based on node computing capability | |
CN111779572B (en) | Fire diagnosis method, device, equipment and storage medium | |
CN116610469A (en) | Comprehensive quality performance test method and system for solid state disk | |
CN113407425B (en) | Internal user behavior detection method based on BiGAN and OTSU | |
CN112922724B (en) | Method for identifying knock interference | |
CN109828031A (en) | Rock brittleness evaluation method and device | |
CN106484539A (en) | A kind of determination method of processor cache characteristic | |
CN115684363A (en) | Concrete performance degradation evaluation method based on acoustic emission signal processing | |
CN1096895A (en) | Chaos processor | |
Tsybanyov | Application of Modified Fatigue Curve for Evaluation of Fatigue Damage of Steels at Variable Stress Amplitudes. Part 1. Calculation Model and Initial Data at Constant Stress Amplitudes | |
CN118118553B (en) | Intelligent sensor data caching method and system based on edge calculation | |
CN111863117B (en) | Flash memory error page proportion evaluation model and method | |
RU2823230C1 (en) | Method and system for authenticating users on web resource using identifier based on web browser fingerprint |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
C17 | Cessation of patent right | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20051019 Termination date: 20140524 |