CN103487558B - A method for detecting abnormal samples during pattern recognition analysis of tea quality using intelligent sensory signals - Google Patents
- Publication number
- CN103487558B (application CN201310323279.2A)
- Authority
- CN
- China
- Prior art keywords
- sample
- sensor
- model
- tea
- mahalanobis distance
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
A method for detecting abnormal samples during pattern recognition analysis of tea quality using intelligent sensory signals, characterized in that: it is first determined whether the abnormal sample arose from operator error or an instrument fault; if so, it is corrected by remeasurement; if not, the abnormal sample is identified by the principal component analysis (PCA) score plot method combined with the Mahalanobis distance method.
Description
Technical field
This application relates to a method for detecting abnormal samples during pattern recognition analysis of tea quality using intelligent sensory signals.
Background technology
Sensory evaluation has long been the principal method for judging tea quality, but it demands extensive knowledge of tea science and considerable tasting experience. Only professional tea assessors, distributors, or producers can apply it reliably; ordinary tea buyers, lacking such experience, find it difficult to judge quality and obtain dependable results. Training a tea assessor requires careful selection and substantial expense, and the training period is long. Moreover, even a professional assessor's sensory acuity is easily disturbed by external factors, which affects the accuracy, objectivity, and consistency of the evaluation. For example, human olfactory discrimination is easily disturbed by extraneous odors; taste sensitivity is easily affected by other irritant foods and by their temperature; and human vision involves optics, visual physiology, and visual psychology, so color discrimination differs somewhat from person to person. An assessor's sensory acuity is also affected by other factors such as regional differences, sex, mental state, and health. In addition, sensory evaluation must be carried out against physical reference samples, and the preparation of such reference samples is subject to various constraints, making it difficult to keep them consistent over several years. Reference samples are made from the previous year's or earlier years' production, which is inevitably affected by season, weather, and geographical conditions, so in practice the quality of a reference sample can hardly serve as an absolute standard.
The present invention concerns Longjing (Dragon Well) teas from different picking periods, tree varieties, and origins. Starting from physicochemical and sensory indices, it combines intelligent sensory analysis, multivariate statistics, and modern instrumental analysis to characterize Longjing tea comprehensively, analyze the internal relationships among the tea's indices, and establish mathematical models for the qualitative and quantitative evaluation of Longjing tea quality, so that quality characteristics can be identified and grades assigned accurately, providing a sound basis for a unified green tea evaluation standard. In theory, this research provides a foundation and support for the quality evaluation of other Chinese teas. In practice, it helps improve the stability of Chinese tea quality, strengthens grading and classification through standardized instruments, and allows high-quality tea to command a matching price, ending the tradition of exporting high-quality Chinese tea at low prices and dispelling doubts in developed countries about high-quality, low-priced Chinese products. It is of significance, with notable social and economic benefit, for maintaining order in the domestic market, protecting consumers' interests, defending the international reputation of Chinese tea products, and promoting international trade.
In recent years, with the development of modern instrumental analysis, physicochemical research on tea has also advanced. Techniques for separating and analyzing tea aroma compounds have gradually moved from conventional gas chromatography (GC) or gas chromatography-mass spectrometry (GC-MS) to gas chromatography-olfactometry (GC-O). More than 700 tea aroma components have been detected so far, including fatty derivatives, terpene derivatives, aromatic derivatives, and nitrogen- and oxygen-containing heterocyclic compounds. Even so, composition alone can hardly reflect the overall character and quality of tea aroma. The main instrumental techniques for analyzing tea taste compounds are liquid chromatography, spectroscopy, mass spectrometry, and nuclear magnetic resonance. More than 600 organic chemical components have been identified in tea, along with more than 40 inorganic mineral elements. However, because taste substances interact with one another (for example through taste contrast, modification, synergy, and mutual suppression), the measured chemical parameters cannot fully reflect the taste character of a sample.
The emergence of intelligent sensory analysis has further raised the level of tea quality detection. It is a technology based on imitating human perception. The sensors correspond to the sensory organs of a biological system and produce response signals to the attributes of a sample; the signal collector, like the nervous system, transmits the response signals and performs simple processing; the computer, like the human brain, performs complex processing, analysis, and recognition of the signal data and forms a comprehensive overall judgment. Intelligent sensory analysis offers short detection times, good repeatability, no need for complex sample pretreatment, no sensory fatigue, and objective, reliable results. More importantly, it can to some extent simulate human senses to provide evaluation results and fingerprint information on tea aroma, taste, and appearance, and it is a focus and development trend of current tea quality detection research. For the sensory attributes of tea, such as color and shape, the intelligent sensory techniques in use are mainly machine vision, the electronic nose, and the electronic tongue. The workflow mainly comprises four steps: the sensors produce response signals, the signals are preprocessed, sample feature information is extracted, and the relevant models are built and used for pattern recognition. Pattern recognition is an important part of an intelligent sensory system. The main methods currently applied are principal component analysis, artificial neural networks, and fuzzy recognition. Principal component analysis is used for signal processing, suppressing noise in the multidimensional sensor response signals and compressing the signal data. An artificial neural network learns from and is trained on the processed signals to build a network model. Fuzzy recognition uses fuzzy reasoning to recognize and quantify complex cases in a fuzzy manner.
Intelligent sensory technology is used to simulate the function and characteristics of human sensory evaluation, and multiple algorithms are studied to process the rich product quality information contained in intelligent sensory measurements and to extract corresponding computational models and methods. Algorithms aimed at solving the end problem analyze the statistical regularities that arise when multiple intelligent sensor responses and multiple product indices are interrelated, which suits the characteristics of food science research well. Integrating multiple algorithms, intelligent sensory analysis, and modern instrumental analysis overcomes the difficulty of statistical analysis in multi-attribute comprehensive evaluation and also makes full use of the experimental data to obtain implicit details related to the characteristic quality attributes of tea, so that statistical analysis and pattern discrimination of tea quality can be completed together, both quickly and accurately. This supports building a characteristic quality database and an intelligent quality evaluation system for Chinese tea, enabling fast, accurate, and comprehensive analysis of tea quality; it offers reference and guidance for the scientific evaluation and reasonable definition of the characteristic quality of Chinese tea, and core technical support for quality assurance, protection of distinctive characteristics, and authentication of Chinese tea.
The electronic nose, a novel odor-scanning instrument developed in the 1990s, is now widely used in food, beverages, cosmetics, environmental monitoring, and agricultural product processing control. Compared with ordinary chemical analysis, the electronic nose exploits its cross-sensitivity to multiple gases to evaluate the overall information of a gas; compared with human olfaction, its measurements are more objective and reliable.
Electronic tongue technology, developed in the mid-1980s, is a novel means of analyzing and identifying the taste of liquids and has been applied in food, medicine, cosmetics, the chemical industry, and environmental monitoring. Unlike ordinary chemical analysis, the electronic tongue does not output an analysis of the sample's taste components but a signal pattern related to the sample; after analysis by software with pattern recognition capability, an overall assessment of the sample's taste characteristics is obtained.
In summary, intelligent sensory analysis (machine vision, electronic nose, and electronic tongue technology) has achieved good results in tea quality detection and shows promising application prospects. However, these technologies are still some way from practical application, and several key problems remain to be solved, such as:
(1) Key technologies of the electronic nose and electronic tongue: machine vision is already widely applied, but the electronic nose and electronic tongue are still under development. To build a comprehensive intelligent sensory system, the electronic nose and electronic tongue need to be studied in depth and their key problems solved.
(2) Development and screening of specific sensors: different types of sample have their own specific substance systems, so different types of sensor respond differently to different substances. Further in-depth research is therefore needed to develop, for specific substances, sensor arrays that respond quickly, are highly sensitive, last long, are easy to clean, and are economical and practical.
(3) Representativeness of samples and soundness of sampling: most current research reports show high discrimination rates for tea classification or grading. In these studies, however, the tea samples are not sufficiently representative and the sample sets are incomplete; the data collected are essentially from replicate samples, that is, tea of each grade is measured many times over, so the resulting models are not very stable and apply only narrowly. Only by establishing a scientific sample collection method and a criterion for sample representativeness can the subsequent models be built successfully.
(4) Signal drift and denoising: changes in instrument parameters, measurement method, measurement environment, sample source, and similar factors easily cause the sensor response curve to drift, producing errors in intelligent sensory detection and making it unsuited to long periods of continuous industrial operation. Research on reducing response signal drift and on analyzing and processing signal noise therefore needs to be strengthened.
(5) Robustness of models: some studies do not discuss the model in detail when building a discrimination model, nor do they test its robustness with independent prediction samples. In addition, the models built are not stable enough for quality discrimination, so research on and improvement of the algorithms must be strengthened to improve pattern recognition.
An electronic nose system is an array of many sensors. Because tea aroma is compositionally complex, each sensor responds to many aroma compounds and each aroma compound produces a response on many sensors. The sensor fingerprint array therefore retains aroma information as fully as possible, but it also readily introduces a large amount of redundant information, which makes quality modeling computationally heavy and time-consuming and makes the resulting models complex and unstable. The main reasons are: (1) in the intelligent sensory fingerprint, the sample response of some sensors is very weak, which directly affects the prediction accuracy of the model; (2) because of instrument noise in the electronic nose, the signal-to-noise ratio (SNR) of the sample information from some sensors is low; meanwhile, external interfering factors (such as temperature and humidity) strongly affect the fingerprint response of some sensors, reducing the robustness of the model; (3) tea aroma contains many components, each of which responds strongly on one or several sensors, and to detect tea aroma as a whole, sensor arrays with specific responses to different aromas must be combined optimally before the aroma fingerprint can be characterized comprehensively and effectively.
Rational selection and combination of sensors can remove irrelevant or nonlinear olfactory sensors and redundant sensor data, extract the most informative aroma fingerprint, give the calibration model better predictive ability, and simplify computation. It also allows sensors that have no significant effect, or even a negative effect, on pattern recognition to be omitted, which helps reduce the manufacturing cost of the electronic nose and improve system stability.
Sensor selection is an optimization problem frequently encountered in practice. The optimal combination method currently in use does apply the idea of combination to some extent, but it combines the grouped sensor arrays only after a preliminary elimination and does not achieve globally optimal combination. The loading value method avoids adding redundant sensors, but it does not analyze the response performance of the selected sensors, namely the repeatability of a sensor's response to the same sample and its ability to discriminate between different samples. The genetic algorithm (GA) is an optimization method built by simulating heredity and evolution in nature, based on the Darwinian theory of natural selection and survival of the fittest; it requires no derivatives, performs stochastic global optimization, avoids becoming trapped in local minima, and is easy to implement.
Summary of the invention
A method for detecting abnormal samples during pattern recognition analysis of tea quality using intelligent sensory signals, characterized in that: it is first determined whether the abnormal sample arose from operator error or an instrument fault; if so, it is corrected by remeasurement; if not, the abnormal sample is identified by the principal component analysis (PCA) score plot method combined with the Mahalanobis distance method. The PCA score plot method reduces the dimensionality of the data without losing the main fingerprint information: a smaller number of new variables replaces the larger number of original variables so as to eliminate the overlapping parts of the coexisting information, and by transforming the original fingerprint variables the new variables become linear combinations of the original variables. The Mahalanobis distance discrimination method is applied to the sensor response data, and the Mahalanobis distance of a fingerprint sample is calculated as follows:
[Formulas not reproduced in this text]
Here T holds the fingerprints of the sampled teas; ti is the fingerprint score of calibration set sample i, and the mean matrix is taken over the m calibration set samples; Tcen is the mean-centered matrix of T; M is the Mahalanobis matrix of the calibration set samples; and MDi is the Mahalanobis distance of calibration set sample i. The Mahalanobis distance threshold for outliers is determined from the allowable quantitative calibration error and the corresponding Mahalanobis distance. After the fingerprint data are standardized, the Mahalanobis distance of each sample is given by:
[Formula not reproduced in this text]
In intelligent sensory sensor detection, hii expresses the influence of sample i on the regression model; the largest hii indicates that the regression model depends strongly on sample i, in which case sample i is an abnormal sample.
Description of the drawings
Fig. 1 PCA score plot (a) and Mahalanobis distance-residual plot (b) before rejection of abnormal samples.
Fig. 2 Signals of different samples at the characteristic response points of the electronic nose sensors.
Fig. 3 PCA score plot (a) and Mahalanobis distance-residual plot (b) after rejection of abnormal sample LLJ.
Fig. 4 Electronic nose sensor response intensity for the four grades of Longjing tea.
Fig. 5 Mean electronic nose sensor responses for teas of different grades after rejection of abnormal samples.
Fig. 6 Loading plots for the first four principal components after rejection of abnormal samples.
Fig. 7 Relationship between PRESS value and number of principal components in the grade model.
Fig. 8 Flow chart of the genetic algorithm.
Fig. 9 Crossover algorithm.
Fig. 10 Mutation algorithm.
Fig. 11 Loading plot of the four grades of sample on principal components 1 and 2.
Fig. 12 Mean sensor responses of teas in the origin model.
Fig. 13 Principal component score plot (PC1-PC2) of the origin model.
Fig. 14 Loading plots of the origin model for LHT-LMT (a) and LYJ-LWJ (b) on principal components 1 and 2.
Fig. 15 Mean sensor responses of teas in the tree variety model.
Fig. 16 Principal component score plot (PC1-PC2) of the tree variety model.
Detailed description of the invention
1 Tea sample collection and processing
The invention used West Lake Longjing tea samples collected in 2011 from local tea growers in the West Lake Longjing producing area of Hangzhou, covering 4 grades, 2 tree varieties, and 5 origins. To distinguish the tea samples, each was given a systematic code; details are given in Table 1. To keep the quality of each tea sample consistent, the samples were stored in a cold room at or below -4 °C, and a small bag was taken out for each experiment according to the amount needed.
2 Electronic nose detection method
The invention uses a Fox 4000 electronic nose with a headspace autosampler, made by Alpha MOS (France). First, 1.00 g of dry Longjing tea is placed in each 20 mL headspace vial, 5 mL of room-temperature ultrapure water is added, and the vial is capped and sealed; every tea sample is prepared in this way and measured in turn. For each measurement, the headspace vial is first moved to the preheating zone and heated for 900 s at a stirring speed of 500 rpm and a headspace temperature of 60 °C; 2.0 mL of headspace gas is then withdrawn and injected at 2.0 mL/s into the electronic nose sensor array chamber (containing 18 metal oxide sensors). The gas adsorbs onto and desorbs from the semiconductor material on the surfaces of the 18 sensors, changing the sensor resistance, so different resistance values are produced at different times. The sample gas remains in the sensor array chamber for 120 s and is sampled every 0.5 s; the electronic nose software records each reading automatically.
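As a rough check on the size of the raw data, the acquisition settings above imply the following number of readings per measurement. This is only a sketch: the constant names are illustrative, and it assumes one reading at the end of each 0.5 s interval, which the description does not state explicitly.

```python
# Acquisition settings taken from the description above.
N_SENSORS = 18      # metal oxide sensors in the array
DWELL_S = 120.0     # residence time of the sample gas in the chamber, seconds
INTERVAL_S = 0.5    # sampling interval, seconds

# Assumption: one reading per elapsed interval (a reading at t = 0 is not counted).
readings_per_sensor = int(DWELL_S / INTERVAL_S)
values_per_measurement = readings_per_sensor * N_SENSORS

print(readings_per_sensor, values_per_measurement)
```

If the instrument also records a reading at time zero, each sensor would contribute one more value.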
3 Tea quality modeling method
The samples used to build the tea quality models (grade, origin, tree variety, and so on) are divided into a calibration set and a prediction set. For each model, two thirds of the samples are selected at random as calibration set samples and the remaining third is used as the prediction set. The invention builds qualitative discrimination models with SIMCA (soft independent modeling of class analogy, also called the similarity analysis method): a PCA model is first built for each class of sample, and the SIMCA distance of an unknown sample is then calculated to determine the class to which it belongs. All model calculations are carried out with in-house MATLAB 7.0 programs.
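The random two-thirds / one-third split described above might be sketched as follows. The patent's own programs were written in MATLAB 7.0 and are not reproduced in the text, so this Python version is only an illustration; the function name, the fixed seed, and the use of integer sample IDs are assumptions.

```python
import random

def split_calibration_prediction(sample_ids, calibration_fraction=2 / 3, seed=0):
    """Randomly split sample IDs into a calibration set and a prediction set."""
    ids = list(sample_ids)
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_cal = round(len(ids) * calibration_fraction)
    return ids[:n_cal], ids[n_cal:]

# Hypothetical example: 30 samples belonging to one model.
calibration, prediction = split_calibration_prediction(range(30))
print(len(calibration), len(prediction))
```

In practice the split would be repeated separately for each model (grade, origin, tree variety), as the text describes.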
4 Analysis and rejection of abnormal sample points
4.1 Principles of abnormal sample point analysis
When intelligent sensory signals are used for pattern recognition analysis of tea quality, the reliability of all classification and recognition results depends first on the accuracy of the raw data, that is, on the reliability of the acquired intelligent sensory signals and of the original classification information for the tea; the quality of the data set directly determines whether pattern discrimination succeeds. The presence of abnormal (outlier) sample points can therefore influence, or even change, the distribution trend of the data as a whole, and so affect the accuracy of the calibration model.
An abnormal sample point is not only one whose measured intelligent fingerprint or original sample information differs significantly from the true value; it also includes a sample whose fingerprint differs significantly from the mean fingerprint of the samples in the modeling set. Abnormalities can generally be divided into fingerprint abnormalities and abnormalities in the original tea information.
The main causes of fingerprint abnormality are:
(1) changes in the measuring instrument and its performance parameters, such as changes in instrument energy, instrument noise, and band drift;
(2) changes in the measurement method, such as differences in sampling, in the measurement position, and in the measurement distance;
(3) changes in the measurement environment, such as changes in temperature and humidity;
(4) changes in other physical or mechanical properties of the sample, such as particle size, viscosity, and fineness;
(5) changes in sample source that make the sensor response resistivity or the intensity of certain characteristic peaks abnormal, such as changes in origin, storage time, storage method, picking period, and cultivation method;
(6) errors such as sample spoilage or sample mix-ups;
(7) operating errors during intelligent sensory signal scanning.
The main sources of abnormality in the original tea quality information are:
(1) the reliability of the physicochemical instruments and methods used;
(2) changes in the sensory evaluation method;
(3) changes in sample source;
(4) errors by the tea assessor, such as errors during evaluation and during data entry.
If an abnormal sample results from operator error or an instrument fault, it can be corrected simply by remeasurement once discovered. If the abnormality arises from the sample itself, it cannot be corrected by remeasurement, and whether the predicted value for that sample is reliable depends on how well its abnormal sensor response fits the model. Finding and effectively rejecting abnormal samples is therefore critical to the calibration model and to the results of data analysis.
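The decision rule in this paragraph can be written down as a small dispatch function. This is a minimal sketch only: the `Cause` categories and the returned action strings are illustrative names, not terms defined by the patent.

```python
from enum import Enum

class Cause(Enum):
    """Hypothetical labels for why a sample looks abnormal."""
    OPERATOR_ERROR = "operator error"
    INSTRUMENT_FAULT = "instrument fault"
    SAMPLE_ITSELF = "sample itself"

def handle_abnormal_sample(cause):
    """Return the action the description prescribes for a given cause."""
    if cause in (Cause.OPERATOR_ERROR, Cause.INSTRUMENT_FAULT):
        # Measurement problems are corrected by measuring the sample again.
        return "remeasure"
    # Otherwise the sample is screened statistically and rejected if confirmed.
    return "identify with PCA score plot and Mahalanobis distance"

print(handle_abnormal_sample(Cause.INSTRUMENT_FAULT))
```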
4.2 Methods of abnormal sample point analysis
The abnormal sample analysis method of the invention is the principal component analysis score plot combined with the Mahalanobis distance method.
(1) Principal component analysis score plot method
Principal component analysis (PCA) is a data mining technique in multivariate statistics. It reduces the dimensionality of the data without losing the main fingerprint information: a few new variables replace the larger number of original variables so as to eliminate the overlapping parts of the many coexisting pieces of information. By transforming the large number of original fingerprint variables, a small number of new variables are obtained as linear combinations of the original variables.
The principal component scores obtained from PCA reflect the similarity and distinctiveness of samples; each sample has a different score on each principal component. The score plot reveals the internal features and clustering of the samples and shows whether any sample differs markedly from the rest of its class, providing a theoretical basis for the analysis of abnormal sample points.
(2) Mahalanobis distance discrimination method
The Mahalanobis distance is one of the effective ways of studying the similarity of vectors in a multidimensional space and is widely used in qualitative fingerprint analysis and outlier discrimination. The Mahalanobis distance is calculated from the response data (such as resistivity) of several sensors, and the calculation for the sample set proceeds as follows:
[Formulas not reproduced in this text]
In the formulas, ti is the fingerprint score of calibration set sample i, and the mean matrix is taken over the m calibration set samples; Tcen is the mean-centered matrix of T; M is the Mahalanobis matrix of the calibration set samples; MDi is the Mahalanobis distance of calibration set sample i. The Mahalanobis distance threshold for outliers is determined from the allowable quantitative calibration error and the corresponding Mahalanobis distance. After the fingerprint data are standardized, the Mahalanobis distance of each sample is given by:
[Formula not reproduced in this text]
hii can be used to measure the influence of a sample on the whole set of standard samples. In intelligent sensory sensor detection, hii expresses the influence of sample i on the regression model; if hii is too large, the regression model depends strongly on sample i, which harms model stability; in other words, sample i may be an abnormal sample.
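The formulas themselves are missing from this text. One plausible reconstruction, assuming the standard chemometric definitions that match the symbols defined above, is given here. The notation \bar{T} for the mean matrix and the exact form of h_{ii} are assumptions, not taken from the patent.

```latex
% Assumed reconstruction; \bar{T} and the form of h_{ii} are not given in the source.
\begin{align}
\bar{T} &= \frac{1}{m}\sum_{i=1}^{m} t_i \\
T_{cen} &= T - \bar{T} \\
M &= \frac{T_{cen}^{\top}\, T_{cen}}{m-1} \\
MD_i^{2} &= (t_i - \bar{T})\, M^{-1}\, (t_i - \bar{T})^{\top} \\
h_{ii} &= t_i\, (T^{\top} T)^{-1}\, t_i^{\top}
\end{align}
```

Under these definitions M is the sample covariance matrix of the scores, and h_{ii} is the leverage of sample i computed on the standardized score matrix T.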
4.3 Analysis and rejection of abnormal sample points
The principal components (score matrix) are linear combinations of the original variables, and representing the original variables by them yields the minimum sum of squared errors. The first principal component explains the largest share of the variance of the original variables, the second the next largest, and so on, and the principal components are mutually orthogonal. There are many ways to calculate principal components; here the nonlinear iterative partial least squares (NIPALS) method with leave-one-out cross-validation is used. The principal component scores of the Longjing teas and the corresponding Mahalanobis distance-residual results are shown in Fig. 1. The LLJ samples lie far from the other sample sets in the score plot, and their Mahalanobis distance values are large, so these LLJ samples are abnormal sample points. Examination of the corresponding sensor response plot (Fig. 2) shows that this premium-grade tea responds very differently from the other premium-grade samples and does not belong to the same grade. A check of the original sample collection records showed that this sample was not genuine premium West Lake Longjing from the Longwu area of Hangzhou but Zhejiang Longjing, the result of a sample supply error. After these abnormal points were rejected, the principal component scores and Mahalanobis distance values were analyzed again (Fig. 3): the teas are evenly distributed in the score plot and their Mahalanobis distance values show no abnormality, so the samples are representative and can be used for subsequent model building and the associated mathematical processing.
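For readers unfamiliar with NIPALS, the core iteration for PCA can be sketched in a few lines of plain Python. This is an illustrative sketch, not the patent's MATLAB program: it omits the leave-one-out cross-validation step, the function name and tolerance are assumptions, and the small data matrix is made up. It also assumes the data contain at least as many directions of nonzero variance as components requested.

```python
def nipals_pca(X, n_components=2, tol=1e-10, max_iter=500):
    """Return (scores, loadings) of the mean-centered data X via NIPALS."""
    m, n = len(X), len(X[0])
    means = [sum(row[j] for row in X) / m for j in range(n)]
    E = [[row[j] - means[j] for j in range(n)] for row in X]  # residual matrix
    scores, loadings = [], []
    for _ in range(n_components):
        # Start from the residual column with the largest sum of squares.
        col = max(range(n), key=lambda j: sum(E[i][j] ** 2 for i in range(m)))
        t = [E[i][col] for i in range(m)]
        for _ in range(max_iter):
            tt = sum(v * v for v in t)
            # Loading: project the residuals onto the current score vector.
            p = [sum(E[i][j] * t[i] for i in range(m)) / tt for j in range(n)]
            norm = sum(v * v for v in p) ** 0.5
            p = [v / norm for v in p]
            # Score: project the residuals onto the normalized loading.
            t_new = [sum(E[i][j] * p[j] for j in range(n)) for i in range(m)]
            diff = sum((a - b) ** 2 for a, b in zip(t_new, t))
            t = t_new
            if diff < tol * sum(v * v for v in t):
                break
        # Deflate: remove the extracted component before finding the next one.
        E = [[E[i][j] - t[i] * p[j] for j in range(n)] for i in range(m)]
        scores.append(t)
        loadings.append(p)
    return scores, loadings

# Made-up 5 samples x 3 sensors, for illustration only.
X = [[2.0, 1.0, 0.5], [4.0, 2.1, 1.1], [6.0, 2.9, 1.4],
     [8.0, 4.2, 2.0], [10.0, 5.0, 2.6]]
scores, loadings = nipals_pca(X, n_components=2)
```

Plotting `scores[0]` against `scores[1]` gives the kind of PC1-PC2 score plot referred to in Fig. 1(a) and Fig. 3(a).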
To some extent the principal component scores reflect the similarity and distinctiveness of samples; each sample has a different score on each principal component. Fig. 3(a) is a scatter plot of the scores of each tea sample on the first two principal components. It shows the dispersion of and differences between sample points: samples with the same or similar properties cluster together, while clearly different samples lie far apart. The figure shows that second-grade tea differs greatly from the other teas and occupies its own region, whereas the premium, special-grade, and first-grade teas differ very little and overlap clearly. This agrees with the earlier analysis of the sensor response curves. The score plot reveals the internal features and clustering of the samples and further demonstrates the large differences between samples in sensor response, providing a theoretical basis for using the electronic nose to classify teas of different grades. However, because the regions of the other grades overlap heavily, the four sample grades can hardly be told apart by visual inspection of this plot.
The Mahalanobis distance-residual plot shows the influence of each sample point on the corresponding
principal component model. It is determined by the Mahalanobis distance and the residual of the
sample point; a sample point with both a high Mahalanobis distance and a high residual is regarded
as an abnormal sample point. The Mahalanobis distance is the distance from the projection of the
sample point in the model to the model center. It expresses how the sample differs from the other
samples in the model and how strongly it influences the model: the larger the value, the greater
the influence. The residual is the difference between the observed value and the fitted value of
the sample point, that is, the part of the sample's features that the model cannot explain; the
smaller the residual, the better the model fits. Fig. 3(b) shows that the residuals and Mahalanobis
distances of the sample points are all small, which indicates that the calibration samples selected
for each model are representative of the corresponding tea characteristics.
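The Mahalanobis distance check described above can be sketched as follows. This is only a minimal
illustration, assuming each sample has already been projected onto the first two principal
components. The function names, the toy score values and the cut-off of 3.0 are assumptions and are
not taken from the patent, and the residual criterion is not shown.

```python
import math


def mahalanobis_2d(scores):
    """Return the Mahalanobis distance of each 2-D score point from the centroid."""
    n = len(scores)
    mean_x = sum(x for x, _ in scores) / n
    mean_y = sum(y for _, y in scores) / n
    # Entries of the 2x2 sample covariance matrix.
    sxx = sum((x - mean_x) ** 2 for x, _ in scores) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in scores) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in scores) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in scores:
        dx, dy = x - mean_x, y - mean_y
        # d^2 = v^T S^-1 v, written out for the 2x2 case.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(math.sqrt(max(d2, 0.0)))
    return distances


def flag_abnormal(scores, cutoff=3.0):
    """Indices of samples whose distance exceeds the (assumed) cut-off."""
    return [i for i, d in enumerate(mahalanobis_2d(scores)) if d > cutoff]


# Hypothetical scores: 20 clustered samples plus one far-away sample (index 20).
toy_scores = [((i % 5) * 0.1, (i // 5) * 0.1) for i in range(20)] + [(10.0, 10.0)]
print(flag_abnormal(toy_scores))
```

In practice the cut-off would be chosen from the distribution of distances in the calibration set,
and a flagged sample would still be checked against its residual and its collection records before
removal.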
After the abnormal samples are removed, the final number of experimental samples is given in Table
2: 667 samples before removal and 617 after. The main cause of abnormal samples is that data not
belonging to the same population are mixed into the sample set. Once such abnormal data are mixed
in, predictions become inaccurate, statistical inference is compromised, and the measurement results
are adversely affected. The influence of abnormal samples on the calibration model cannot be
ignored. To ensure the validity of the model, abnormal samples must be found and identified during
data processing and removed from the sample set before any follow-up study.
Abnormal sample points were searched for during modeling by means of PCA, the Mahalanobis distance
plot and sensor response diagram analysis. The results show that the sensor response fingerprint is
easily affected by external interference, which produces abnormal sample points that do not
represent the true character of the sample. Their presence can strongly affect, or even change, the
distribution of the whole data set and has a very large influence on modeling. Mathematically, an
abnormal sample point is a sample far from the centroid in the multivariate space. Most importantly,
an abnormal sample point represents properties that do not belong to the model and that the
prediction set normally does not contain, so its presence reduces the predictive ability and
robustness of the model. If abnormal points are not analyzed and removed, neither fingerprint
preprocessing nor other modeling methods can improve the model much; abnormal sample removal is
therefore a problem every modeler must consider.
5 Establishment of the grade model of West Lake Longjing tea
5.1 Division of calibration set and prediction set samples for the grade model
After the abnormal sample points are removed, 617 samples remain for tea grade discrimination. Two
thirds of them are selected at random as calibration set samples and the remaining one third are
used as prediction set samples, so that the calibration set is well representative while the
prediction range of the model is widened and its adaptability is enhanced. The sample distribution
is given in Table 3.
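The random division into calibration and prediction sets can be sketched as follows. This is a
minimal illustration, assuming a plain random split over all 617 sample indices; the function name
and the fixed seed are assumptions, and the per-grade counts of Table 3 are not reproduced.

```python
import random


def split_calibration_prediction(sample_ids, calibration_fraction=2 / 3, seed=0):
    """Shuffle the sample ids and cut them into calibration and prediction sets."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    n_calibration = round(len(shuffled) * calibration_fraction)
    return shuffled[:n_calibration], shuffled[n_calibration:]


calibration, prediction = split_calibration_prediction(range(617))
print(len(calibration), len(prediction))
```

If each grade must keep the same two-to-one ratio, the same function can be applied to the sample
ids of each grade separately and the results concatenated.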
5.2 Electronic nose response diagram analysis of tea of different grades
The response curves of the resistance ratio (resistance change relative to the initial resistance)
of the 18 sensors during tea aroma detection are shown in Fig. 4. Each curve corresponds to one
sensor, 18 curves in total. Each point on a curve represents the resistance ratio at that time as
the volatile substances of the tea infusion pass through the sensor channel. Depending on the sensor
principle, the response intensity is positive or negative: the LY-type sensors lie below the
abscissa, and the T-type and P-type sensors lie above it. As Fig. 4 shows, in the early stage of
acquisition the volatile substances in the sample are strongly enriched on the sensor surface, so
the curves change quickly and the absolute slope is large. When adsorption of the volatile
substances on the sensor reaches equilibrium, the absolute value of the sensor response reaches its
maximum, and this point best reflects the character of the gas in the sample. As acquisition
continues, the gas concentration gradually decreases, the sensor response gradually falls, and the
curve slowly levels off to a relatively stable state. The premium and superfine patterns are very
close, the second-grade pattern differs most from the other grades, and the first-grade pattern is
close to the premium and superfine patterns but has a different response range: the higher the
grade, the larger the absolute response. The electronic nose therefore responds clearly to the aroma
components of the tea infusion, which shows that measuring tea quality with an electronic nose is
feasible.
The full response diagram over the 120 s acquisition time does not allow different samples to be
compared directly. A characteristic response point is needed, that is, a value that represents the
characteristic response intensity of each sensor to a given sample. At the peak or trough of the
response curve, the relative standard deviation (RSD) for the same sample is relatively low and the
discrimination between different samples is greatest. Therefore the point of maximum absolute
response, that is, the peak or valley of the sensor signal intensity curve, is chosen as the
characteristic point. To analyze the differences in tea quality between grades, production areas and
cultivars, Fig. 2 shows the sensor response signals, at the peak or trough of each sensor, of the
different teas measured on one day (numbered LLJ, LWJ, LYJ, LHT, LMT, QWJ, QHJ, QLJ, QYT, QMT, 1, 2).
As Fig. 2 shows, each sensor responds differently to tea aroma. The amplitudes of the LY-type
sensors fluctuate markedly with tea quality and discriminate well, whereas the response curves of
the T-type and P-type sensors are less dispersed. The red curve of the second-grade sample is
clearly separated from the other samples. The first-grade curve is not clearly separated from the
premium and superfine curves, but the premium and superfine curves both lie between the first-grade
and second-grade curves. The differences in the characteristic response patterns of the sensor array
thus reflect, to some extent, the quality differences of West Lake Longjing tea and have a certain
characteristic, fingerprint-like nature, which provides a mathematical basis for the classification
and identification of tea.
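The choice of the characteristic point, the signed response with the largest absolute value on each
sensor curve, can be sketched as follows. The function names and the two short curves are
hypothetical; real curves would cover the full 120 s acquisition for all 18 sensors.

```python
def characteristic_point(curve):
    """Return the signed response with the largest absolute value on one curve."""
    return max(curve, key=abs)


def feature_vector(sensor_curves):
    """One characteristic value per sensor, in the order the curves are given."""
    return [characteristic_point(curve) for curve in sensor_curves]


# Hypothetical resistance-ratio curves for two sensors (values are made up).
ly_type = [0.0, -0.12, -0.31, -0.27, -0.20]   # negative-going response (trough)
t_type = [0.0, 0.05, 0.18, 0.22, 0.15]        # positive-going response (peak)
print(feature_vector([ly_type, t_type]))
```

Keeping the sign preserves the distinction between sensors that respond below and above the
abscissa.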
Fig. 5 shows the mean response of each tea grade. The response of the second-grade tea is clearly
different from that of the other grades. The response patterns of the premium, superfine and
first-grade teas are very similar and differ appreciably only at sensors such as LY2/G, LY2/AA,
LY2/gCTL and P30/2. These differences in the sensor response signals are the basis of the
subsequent mathematical modeling.
5.3 Trend of the principal component scores of all grade samples
Principal component analysis is applied to the data matrix formed by the aroma characteristic
parameters of the tea samples of different grades. The principal component analysis model is
$A_{m \times p} = T_{m \times f} P_{f \times p} + E$, where $A_{m \times p}$ is the response pattern
matrix, $T_{m \times f}$ is the score matrix, $P_{f \times p}$ is the loading matrix, and $E$ is the
residual matrix, which has the same dimensions as $A_{m \times p}$. Here $m$ is the number of
samples, $p$ is the number of sensors, and $f$ is the number of principal components.
For each measured value $a_{ij}$ in the matrix $A_{m \times p}$, the principal component analysis
can be written as $a_{ij} = \sum_{n=1}^{f} t_{in} p_{nj} + e_{ij}$, where $t_{in}$ is the score of
sample $i$ on the $n$-th principal component, $p_{nj}$ is the loading of sensor $j$ on the $n$-th
principal component, and $e_{ij}$ is the residual of variable $j$ for sample $i$.
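The decomposition of each measured value into a score, a loading and a residual can be illustrated
with a one-component model. The sketch below is an assumption-laden illustration, not the procedure
of the patent: it extracts only the first principal component by power iteration on a centered,
non-zero matrix, and all function names are hypothetical.

```python
import math


def center_columns(matrix):
    """Subtract each column mean so that every variable has zero mean."""
    m = len(matrix)
    means = [sum(row[j] for row in matrix) / m for j in range(len(matrix[0]))]
    return [[row[j] - means[j] for j in range(len(means))] for row in matrix]


def first_component(centered, iterations=200):
    """Scores t and unit-length loadings p of the first principal component."""
    # Start from the longest row; it cannot be orthogonal to every data row.
    start = max(centered, key=lambda row: sum(v * v for v in row))
    norm = math.sqrt(sum(v * v for v in start))
    p = [v / norm for v in start]
    for _ in range(iterations):
        t = [sum(a * w for a, w in zip(row, p)) for row in centered]   # t = A p
        q = [sum(row[j] * t[i] for i, row in enumerate(centered))     # q = A^T t
             for j in range(len(p))]
        norm = math.sqrt(sum(v * v for v in q))
        p = [v / norm for v in q]
    t = [sum(a * w for a, w in zip(row, p)) for row in centered]
    return t, p


def residual(centered, t, p):
    """E = A - t p for a one-component model, so a_ij = t_i * p_j + e_ij."""
    return [[row[j] - t[i] * p[j] for j in range(len(p))]
            for i, row in enumerate(centered)]


# Hypothetical 4 x 2 response matrix whose columns are perfectly correlated.
A = center_columns([[2.0, 1.0], [-2.0, -1.0], [4.0, 2.0], [-4.0, -2.0]])
t, p = first_component(A)
E = residual(A, t, p)
print(p, E)
```

Further components would be obtained by repeating the same step on the residual matrix; a library
routine based on singular value decomposition is the usual choice for real data.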
Principal component analysis is carried out with leave-one-out cross-validation. Table 4 gives the
cumulative contribution rates of the principal component analysis of all tea grade samples. The
first principal component has a contribution rate of 93% and represents most of the information in
the original data; the first 4 principal components represent 99% of the sensor information. The
first four principal components can therefore characterize the structure of the electronic nose
intelligent sensory data of the samples, which reduces the data dimension and simplifies the data.
The first 4 principal components are selected for modeling, so the data matrix is reduced from the
original 617 × 18 to 617 × 4 (4 principal components).
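Selecting the number of principal components from the cumulative contribution rate can be sketched
as follows. Only the 93% contribution of the first component and the 99% total of the first four
come from the text above; the individual values for the other components are hypothetical
placeholders for Table 4, and the function name is an assumption.

```python
def components_for_coverage(contributions, target):
    """Smallest number of leading components whose cumulative contribution reaches target."""
    cumulative = 0.0
    for count, contribution in enumerate(contributions, start=1):
        cumulative += contribution
        if cumulative >= target:
            return count, cumulative
    return len(contributions), cumulative


# Contribution rates in percent; entries after the first are made-up placeholders.
contribution_percent = [93.0, 3.0, 2.0, 1.0, 0.6, 0.4]
print(components_for_coverage(contribution_percent, target=99.0))
```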
5.4 Principal component loading analysis of all grade samples
In principal component analysis, the score on the $n$-th principal component is calculated as
$t_{in} = \sum_{j=1}^{p} p_{nj} a_{ij}$, where $p_{nj}$ is called the loading of the variable
$a_{ij}$. The larger the loading, the stronger the correlation between the principal component and
that variable. The variable $a_{ij}$ corresponds to the response value of the $j$-th sensor in the
sensor response matrix. After principal component analysis of the sensor response signals of the
different tea grades, the first 4 principal components account for 99% of the variation in the
intelligent sensory fingerprint of the tea. Fig. 6 shows the loadings of the first 4 principal
components against the sensors, from which the relation between each principal component and the
sensors can be seen.
As Fig. 6 shows, for PC1 (93%), which carries the most information about the tea, the sensors with
large loadings are mainly LY2/G, LY2/AA, LY2/GH and LY2/gCTL. For the second principal component,
besides sensor LY2/AA, the sensors P10/1, P10/2, P40/1, T40/1 and TA/2 are also strongly correlated.
For the third principal component, sensors LY2/LG, LY2/G, LY2/AA and LY2/GH are strongly correlated;
for the fourth, sensors LY2/AA, T30/1, T70/1 and T40/1 are strongly correlated.
5.5 Selection of the number of principal components for SIMCA grade modeling
Modeling with the similarity classification method (SIMCA) first builds a principal component
analysis model for each class of samples, so that samples of the same class gather in the same
region of space. Table 5 gives the contribution rates of the principal component model of each grade
for different numbers of principal components. The first principal component contribution rate of
every grade is above 99%, and for nearly all grades the first 5 principal components essentially
represent the main information of the samples.
The SIMCA algorithm is based on building a principal component analysis model for each class. The
changes in the principal components of the sensor response signals reflect the trend of the tea
quality features quite directly, and determining the number of principal components is the key to
building a good model. Because SIMCA is concerned with the similarity within each grade, and each
principal component represents the variation of the calibration samples of one grade, the earlier
principal components contain richer grade features and contribute more to classification. Selecting
the first several principal components therefore gives the best classification, and the more grade
features the selected principal components contain, the better the modeling and prediction.
However, selecting too many principal components leads to overfitting. In this invention, the
optimal number of principal components of each tea grade model is determined preliminarily by
cross-validation: the smaller number of principal components is chosen where the prediction residual
error sum of squares (PRESS) changes little. As the number of principal components increases, PRESS
gradually decreases, but beyond a certain number PRESS rises again because of overfitting. Fig. 7
shows the relation between the PRESS value and the number of principal components for the model of
each grade. Because the PRESS values of the premium and superfine grades are very large at one and
two principal components, they are not all plotted. For the premium grade, PRESS is smallest at 9
principal components and changes little between 5 and 8. For the superfine grade, PRESS is smallest
at 7 and changes little at 5 and 6. For the first grade, PRESS is smallest at 6 and changes little
at 4 and 5. For the second grade, PRESS is smallest at 6 and changes little at 4 and 5.
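One way to express the rule of choosing fewer principal components where PRESS changes little is
sketched below. The 5% tolerance, the function name and the PRESS values are assumptions made for
illustration only; the patent does not state a numerical tolerance, and the values of Fig. 7 are not
reproduced.

```python
def choose_component_count(press, tolerance=0.05):
    """Smallest component count whose PRESS is within `tolerance` of the minimum."""
    limit = min(press) * (1 + tolerance)
    for count, value in enumerate(press, start=1):
        if value <= limit:
            return count


# Made-up PRESS values for 1 to 7 principal components.
toy_press = [50.0, 20.0, 9.0, 5.2, 5.1, 5.0, 5.6]
print(choose_component_count(toy_press))
```

With a tolerance of zero the function simply returns the position of the minimum PRESS.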
5.6 Establishment and prediction of the SIMCA grade model of tea
The predictive performance of the SIMCA grade model is very important; it shows mainly in whether
the model can be applied to the measurement of new data. A good model can describe data similar to
the modeling data. Validation means feeding new, similar data into the model and checking whether
the prediction error meets the predetermined requirement, which confirms that the selected number of
principal components is reasonable.
There are two kinds of prediction test. External validation uses completely new prediction data.
Internal validation uses the modeling data themselves to check the model. In theory the predictive
ability of a model can only be tested with completely new data, but cross-validation can also give
reasonable results.
If the number of samples is small, cross-validation uses the limited samples more effectively,
although it is slower to compute than external validation. In cross-validation the same samples are
used both to build the model and to test it. The basic idea is as follows. A certain number of
samples are first held out from the calibration set, and the calibration model is built with the
remaining samples. The held-out samples are then fed to the model and the prediction error is
obtained. This process is repeated until every sample has been held out once and predicted, and the
prediction errors of the repeated models are used to calculate the overall residual variance and
mean square error. Cross-validation is a very good internal validation method. Like external
validation, it aims to test the model with independent data; its main advantage over external
validation is that no data are set aside solely for testing, so no data are wasted.
Cross-validation methods include full cross-validation and segmented cross-validation, among others.
Full cross-validation is the earliest cross-validation method. In each round only one sample is held
out from all the samples as the prediction sample and the other samples are used for modeling; this
is repeated until every sample has been held out once to test the model. Full cross-validation takes
a great deal of time and is slow, whereas segmented cross-validation simply divides all the samples
into several parts for validation.
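Full cross-validation, in which one sample is held out in each round, can be sketched as follows.
The nearest-centroid classifier is only a simple stand-in chosen to keep the example short; it is
not the SIMCA model of the patent, and the function names and toy data are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Predict the label whose class mean is closest to the sample."""
    sums, counts = {}, {}
    for row, label in zip(train_x, train_y):
        total = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            total[j] += value
        counts[label] = counts.get(label, 0) + 1

    def squared_distance(label):
        return sum((sample[j] - sums[label][j] / counts[label]) ** 2
                   for j in range(len(sample)))

    return min(sums, key=squared_distance)


def leave_one_out_rate(x, y, predict=nearest_centroid_predict):
    """Fraction of samples predicted correctly when each is held out once."""
    hits = 0
    for i in range(len(x)):
        # Build the model without sample i, then predict sample i.
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_x, train_y, x[i]) == y[i]:
            hits += 1
    return hits / len(x)


# Made-up one-feature data for two well-separated classes.
toy_x = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
toy_y = ["a", "a", "a", "b", "b", "b"]
print(leave_one_out_rate(toy_x, toy_y))
```

Segmented cross-validation follows the same loop but holds out a block of samples in each round
instead of a single sample.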
Nevertheless, full cross-validation is widely used because it works well. First, it can estimate the
actual predictive ability of the model: although it is an internal validation, the predicted sample
does not take part in modeling, so the prediction of an unknown sample is simulated. Second, the
more samples the calibration set contains, the smaller the share of samples left out in each round
and the better the estimate.
The predictive ability of a model is usually tested by full cross-validation of the calibration set
and external prediction of the prediction set. Full cross-validation evaluates the predictive
ability of the model on the calibration set and is a self-check; external prediction evaluates the
predictive ability of the model on the prediction set samples. In general, full cross-validation
performance is higher than external prediction performance. Full cross-validation indicates, to some
extent, the classification ability of the model and guides parameter selection, while external
prediction is the more telling index because it reflects the robustness and adaptability of the
characteristic variables used and of the model. Table 6 gives the results of SIMCA calibration
modeling for the four grades.
Table 6 shows that the recognition rate of the four-grade model only slightly exceeds 70%, which is
not high. The main reason is that the aroma features of the premium and superfine teas are very
close, which degrades the predictive performance of the whole model. A model built to discriminate
only these two grades has a recognition rate of only about 67%, which shows that the samples of the
two grades overlap seriously. The reason is that the division into premium and superfine is made
mainly from the commercial tea point of view: the two differ very little in aroma, taste and
plucking time and differ mainly in appearance, such as the neatness and size uniformity of the
leaves. Tea that is free of defects and uniform is graded premium, and the other pre-Qingming tea is
graded superfine. The odor features of premium and superfine tea are therefore very close.
To study the detection ability of the electronic nose further, the premium and superfine samples of
the four-grade set are merged into one grade, called "premium-superfine", and a three-grade SIMCA
discrimination model is built together with the first and second grades. The model predicts well:
the recognition rates of the calibration set and prediction set reach 93.43% and 92.72%
respectively, both above 92%. Three-grade SIMCA discrimination models are also built separately for
premium, first and second grade and for superfine, first and second grade. These three-grade models
all have strong recognition ability, with recognition rates above 90%, which fully confirms that the
poor prediction of the four-grade model is caused by the overlap of the premium and superfine sample
information. In addition, SIMCA pattern recognition builds the tea grade model essentially by linear
discrimination, and the tea recognition results have not yet reached a recognition rate of 100%.
This may be because storage time, storage conditions and the characteristics of the sensor response
signals introduce nonlinear information into the acquired signals, so other nonlinear pattern
recognition methods can be tried in future work. At present these three-grade models basically meet
the needs of market testing.
In the principal component analysis of Fig. 3(a), the second-grade sample set is the most widely
separated from the other sample sets and can be distinguished clearly by eye. A two-class SIMCA
discrimination model for the first and second grades has recognition rates of 100% for both the
calibration set and the prediction set, which shows that the first-grade and second-grade sample
information differs greatly and that this model is well suited to practical application.
6 Sensor selection methods for intelligent sensory pattern features
The response performance of a sensor in the electronic nose mainly comprises whether the sensor
responds to the same sample with good stability and whether it discriminates well between different
samples.
In the optimal combination method, the electronic nose collects the odor response signals of samples
of different quality. Analysis of variance of the response signal values of the different sensors is
used to screen and group the sensors preliminarily according to their response performance. The
grouped sensors are then permuted and combined, and, with the discrimination index (DI) of the
principal component analysis result as the criterion, the sensor array that classifies the samples
most effectively is finally determined. Although this method uses combination to some extent, it
combines the grouped sensor array only after preliminary rejection and does not achieve globally
optimal combination.
In the loading value method, the sensors are taken as the objects of analysis, principal component
analysis is applied to the sensor responses to different samples, and sensors with similar functions
are identified from the principal component plot (which is also the loading plot of the sensors) and
rejected. Although this method avoids adding redundant sensors, it does not analyze the response
performance of the selected sensors, that is, the repeatability of a sensor's response to the same
sample and the difference in its response to different samples.
The genetic algorithm (Genetic Algorithms, abbreviated GA) is an optimization method built on
Darwin's theory of biological evolution, natural selection and survival of the fittest, which
simulates heredity and evolution in the biological world. It needs no derivatives, performs
stochastic global optimization, avoids becoming trapped in local minima and is easy to implement.
Its basic idea is to regard each possible solution (a particular sensor combination) in the problem
domain (the group of multi-sensor combinations) as an individual, or chromosome, of a population,
and to encode each individual as a binary string. The genetic algorithm evaluates a chromosome by
its fitness value: chromosomes with large fitness values are selected with high probability and
chromosomes with small fitness values with low probability, and the selected chromosomes enter the
next generation. Chromosomes in the next generation produce new chromosomes, the offspring, through
genetic operations such as crossover and mutation. After a number of generations the algorithm
converges to the best chromosome, which is the optimal or near-optimal solution of the problem, that
is, the selected optimal sensor array. The implementation of a genetic algorithm mainly involves 5
basic elements: parameter encoding, variable selection, population initialization, fitness function
design, genetic operation design and the convergence criterion. The genetic operation, an important
step, comprises three operators: selection, crossover and mutation. The operating flow is shown in
Fig. 8.
The present invention uses the genetic algorithm to select and optimize the sensors used in building
the grade, production area and cultivar models. All calculations of the genetic algorithm are
performed by a self-written MATLAB 7.0 program; the key parameters are given in Table 7. The
specific steps of the algorithm are as follows:
(1) Select suitable parameters: population size 40, crossover probability pc = 0.6, mutation
probability pm = 0.1, and termination generation T = 200.
(2) Set k = 0 and randomly generate the initial population.
(3) Chromosome encoding: all sensors are binary-coded, each sensor being one gene (18 genes in
total). If a gene is coded 1, the sensor is included in modeling; if 0, the sensor is excluded. One
coded combination is called a chromosome.
(4) Determine the fitness function F(k): the experiment uses cross-validation to evaluate the
predictive ability of the model. The model is required to have the maximum recognition rate, and the
fitness function is defined accordingly.
(5) Chromosome selection: the commonly used roulette wheel method passes the information of
previous-generation chromosomes with large fitness values to the next generation.
(6) Chromosome crossover: single-point crossover is used. A number of chromosome pairs are chosen at
random as parents according to the predetermined crossover probability pc; a crossover point is then
chosen at random and the gene segments to the right of the crossover point are exchanged between the
parents to produce new offspring; finally the parent chromosomes are replaced by the offspring
chromosomes to produce a new population (see Fig. 9). This is the main way of producing new
individuals and determines the global search ability of the genetic algorithm.
(7) Chromosome mutation: basic bit mutation is used. Genes of a chromosome are changed with the
predetermined probability pm, that is, 1 and 0 are interchanged, and the parent is replaced by the
mutated offspring chromosome (see Fig. 10). Mutating the individuals obtained after crossover gives
the next-generation population. This is an auxiliary way of producing new individuals; it prevents
premature convergence and improves the local search ability of the algorithm.
(8) Stopping criterion: if the preset maximum number of generations (Genmax) or the optimal solution
is reached, stop; otherwise return to step (4).
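The steps above can be sketched in Python as follows. This is an illustrative sketch only and not
the MATLAB 7.0 program of the patent. The parameter defaults follow step (1); the function names,
the fixed seed, the tracking of the best chromosome seen so far and the toy fitness function are
assumptions. In the patent the fitness would be the cross-validated recognition rate of the model
built with the sensors whose genes are 1, and only the generation limit of step (8) is implemented
here.

```python
import random


def run_ga(fitness, n_genes=18, pop_size=40, pc=0.6, pm=0.1, generations=200, seed=0):
    """Binary GA with roulette selection, single-point crossover and bit mutation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Roulette wheel: selection probability proportional to (non-negative) fitness.
        weights = [fitness(c) + 1e-9 for c in population]
        parents = rng.choices(population, weights=weights, k=pop_size)
        children = []
        for a, b in zip(parents[0::2], parents[1::2]):
            a, b = a[:], b[:]                        # copy so parents stay unchanged
            if rng.random() < pc:
                cut = rng.randint(1, n_genes - 1)    # single crossover point
                a[cut:], b[cut:] = b[cut:], a[cut:]  # swap the right-hand segments
            children.extend([a, b])
        for child in children:
            for g in range(n_genes):
                if rng.random() < pm:                # basic bit mutation
                    child[g] = 1 - child[g]
        population = children
        best = max(population + [best], key=fitness)
    return best


# Hypothetical fitness: fraction of genes that agree with a made-up target mask.
TARGET = [1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0]


def toy_fitness(chromosome):
    return sum(g == t for g, t in zip(chromosome, TARGET)) / len(TARGET)


best = run_ga(toy_fitness)
print(best, toy_fitness(best))
```

Counting how often each gene equals 1 across several runs with different seeds gives a usage
frequency per sensor, which is one way to read the frequency-based rejection described in the
following section.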
6.1 Sensor selection in the grade model
After 3 rounds of the genetic algorithm on the sensor response patterns of the grade model, the
three sensors LY2/LG, P40/1 and TA/2 were found to be used least frequently in each genetic process,
so these three sensors were rejected. The remaining 15 sensors, LY2/G, LY2/AA, LY2/GH, LY2/gCTL,
LY2/gCT, T30/1, P10/1, P10/2, T70/2, PA/2, P30/1, P40/2, P30/2, T40/2 and T40/1, were used to build
the models of the different grades. The modeling results before and after sensor rejection are given
in Table 8. For the first-grade and second-grade model, the samples themselves differ greatly, and
the recognition rate remains 100% after sensor rejection. For the premium and superfine samples,
which differ very little, the model performance hardly changes: the calibration set stays above 67%
and the prediction set also changes little. With the same sensors deleted, the recognition rate of
the prediction set of the four-grade model (premium, superfine, first grade and second grade) does
not change and remains about 70%. In the three-class model of premium, first grade and second grade,
the recognition rates of the calibration set and prediction set rise from 92.11% and 90.65% to
92.83% and 92.09% respectively. The predictive performance of the superfine, first-grade and
second-grade model falls slightly but stays very close, at about 95% before and after sensor
rejection. The prediction recognition rate of the premium-superfine, first-grade and second-grade
model also rises, from 92.73% with all sensors to 93.20% with 15 sensors. Thus the performance of
the grade discrimination models is not reduced by sensor selection and in some cases improves, while
the number of sensors is reduced.
To study further why these sensors were rejected, the response performance of the electronic nose
sensors is analyzed in detail. The response performance is measured mainly by whether a sensor
responds to samples of the same class with good cohesion and whether it discriminates well between
samples of different classes. Following the principle of analysis of variance, each sensor is
treated as a factor and its responses to the different samples as levels. A homogeneity of variance
test is performed first to ensure that the data satisfy the conditions for analysis of variance. The
SPSS data analysis software is then used to carry out one-way analysis of variance on the sensor
data of all grade samples and to calculate the F values (Table 9). The F value indicates the ability
of a sensor to distinguish samples of different classes: the larger the F value, the greater the
discrimination.
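The F value of a one-way analysis of variance for a single sensor can be computed as in the sketch
below. The patent uses SPSS; this stand-alone function and its toy data are only an illustration
of the statistic, with one list of responses per grade.

```python
def one_way_f(groups):
    """F statistic of a one-way ANOVA: between-group over within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up responses of one sensor to three groups of samples.
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
print(one_way_f(toy_groups))
```

The computed value is compared with the critical F value for the chosen significance level and
degrees of freedom; a sensor whose F value is small separates the groups poorly.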
The F values of all sensors are greater than F0.05 = 2.60, that is, every sensor discriminates
significantly between the four grades. Comparing the F values of all sensors, however, those of
LY2/LG, TA/2 and T40/1 are all below 25, while the F value of P10/1, the fourth smallest, is more
than 5 times those of these three sensors. The F value of LY2/LG is the smallest, only 8.003, so
this sensor is rejected.
At the same time, in the loading plot of the four-grade sample data after principal component
analysis (Fig. 11), TA/2 and T40/1 lie relatively close together and are sensors with a similar
function, but the loading of TA/2 on PC2 is smaller than that of T40/1, so sensor TA/2 can be
rejected. By the same principle, sensors P40/1 and P10/1 almost overlap in the loading plot; taking
the combination optimization of the sensors into account as well, sensor P40/1 is finally rejected.
6.2 Sensor selection optimization and screening analysis in the origin models
(1) Division of calibration-set and prediction-set samples for the origin models
To ensure comparability of the origin models, this study mainly considers models of tea from different
origins with the same grade and the same cultivar. Among the 617 collected tea samples there are four
origin models: (1) the Hupao Houshan (LHT) and Meijiawu (LMT) model for superfine tea of the Longjing 43
cultivar; (2) the Yangmeiling (QYT) and Meijiawu (QMT) model for superfine tea of the Qunti (population)
cultivar; (3) the Yangmeiling (LYJ) and Wengjiashan (LWJ) model for premium tea of the Longjing 43
cultivar; (4) the Hupao Houshan (QHJ), Longwu (QLJ) and Wengjiashan (QWJ) model for premium tea of the
Qunti cultivar. For each model, two thirds of the samples are randomly selected as calibration-set
samples and the remaining third as prediction-set samples; the sample distribution is shown in
Table 10.
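The random two-thirds / one-third division can be sketched in Python. This is a minimal example assuming the split is made separately within each class; the class sizes below are hypothetical and are not the counts of Table 10.

```python
import random

def split_two_thirds(samples_by_class, seed=0):
    """Randomly assign 2/3 of each class to the calibration set and the rest to the prediction set."""
    rng = random.Random(seed)              # fixed seed so the split is reproducible
    calibration, prediction = {}, {}
    for label, samples in samples_by_class.items():
        shuffled = samples[:]              # copy so the caller's list is untouched
        rng.shuffle(shuffled)
        n_cal = round(len(shuffled) * 2 / 3)
        calibration[label] = shuffled[:n_cal]
        prediction[label] = shuffled[n_cal:]
    return calibration, prediction

# Hypothetical sample identifiers for one two-origin model.
samples_by_class = {"LHT": list(range(30)), "LMT": list(range(30, 48))}
calibration, prediction = split_two_thirds(samples_by_class)
print({k: len(v) for k, v in calibration.items()})
print({k: len(v) for k, v in prediction.items()})
```

Splitting within each class keeps the class proportions of the calibration and prediction sets the same.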
(2) Electronic-nose response profiles and principal component analysis of the origin models
Figure 12 shows the average response profiles of the four origin models. As the figure shows, the
profiles differ greatly in model LHT-LMT and in model QYT-QMT; in model LYJ-LWJ the profiles differ
markedly at sensors LY2/G, LY2/AA, LY2/GH, LY2/gCTL and P30/2; and in model QHJ-QLJ-QWJ the average
fingerprint profiles of the three origins differ very little.
The principal component score plots (Figure 13) likewise show that in model QYT-QMT the samples of each
origin occupy clearly separate regions, and the difference between the samples of the two origins is the
largest. In model LHT-LMT the samples of each origin also have their own regions, but there is no clear
boundary between the two origins. In model LYJ-LWJ the two origins not only lack a clear boundary but
also have crossing and overlapping regions. In model QHJ-QLJ-QWJ the samples cross the most, and
separate origin classes hardly form at all.
(3) Sensor selection in the origin models
For origin model QYT-QMT alone, three rounds of the genetic algorithm selected the seven sensors LY2/G,
LY2/AA, T30/1, P10/1, P40/1, T70/2 and PA/2 and rejected the 11 infrequently used sensors LY2/LG,
LY2/GH, LY2/gCTL, LY2/gCT, P10/2, P30/1, P40/2, P30/2, T40/2, T40/1 and TA/2. The origin discrimination
obtained with the selected sensors is shown in Table 11. After the 11 sensors were rejected, the
prediction performance of the origin model built for superfine Qunti-cultivar tea from Yangmeiling and
Meijiawu remained 100%, and the respective numbers of principal components fell from 5 and 6 to 2, so
the model is simpler and uses far fewer sensors. From the average fingerprint profile and the principal
component analysis plot of this model it can be inferred that the samples of the two origins differ so
much that every sensor performs well; sensor selection here simply minimizes the number of sensors
needed for modeling while keeping model performance unchanged. Seven sensors suffice to build a good
model for superfine Qunti-cultivar tea from the two origins Yangmeiling and Meijiawu.
For models LHT-LMT, LYJ-LWJ and QHJ-QLJ-QWJ, three rounds of the genetic algorithm on the sensor
response profiles of each model showed that the four sensors LY2/LG, PA/2, P30/1 and TA/2 were used
least frequently in every run, so these four sensors were rejected. The remaining 14 sensors (LY2/G,
LY2/AA, LY2/GH, LY2/gCTL, LY2/gCT, T30/1, P10/1, P10/2, P40/1, T70/2, P40/2, P30/2, T40/2 and T40/1)
were used to build the origin models; the modeling results before and after sensor rejection are shown
in Table 11.
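Selecting sensors by how often a genetic algorithm uses them can be sketched in Python. The text does not specify the fitness function, the genetic operators or the exact counting rule, so everything below is an assumption made for illustration: a nearest-centroid accuracy with a small penalty per sensor as fitness, tournament selection, single-point crossover, bit-flip mutation, and a count of how often each sensor appears in the best chromosome of each generation over three runs. The data are hypothetical.

```python
import random

def centroid_accuracy(X, y, mask):
    """Fraction of samples whose nearest class centroid (on the masked sensors) is their own class."""
    cols = [j for j, used in enumerate(mask) if used]
    if not cols:
        return 0.0
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    hits = 0
    for x, lab in zip(X, y):
        nearest = min(centroids,
                      key=lambda c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c])))
        hits += nearest == lab
    return hits / len(y)

def sensor_usage_frequency(X, y, runs=3, generations=20, pop_size=12, p_mut=0.1, seed=0):
    """Count how often each sensor appears in the best chromosome of every generation."""
    rng = random.Random(seed)
    n = len(X[0])
    def fitness(mask):
        return centroid_accuracy(X, y, mask) - 0.01 * sum(mask)   # assumed penalty per sensor
    counts = [0] * n
    for _ in range(runs):
        pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
        for _ in range(generations):
            new_pop = []
            while len(new_pop) < pop_size:
                p1 = max(rng.sample(pop, 2), key=fitness)         # tournament selection
                p2 = max(rng.sample(pop, 2), key=fitness)
                cut = rng.randrange(1, n)                         # single-point crossover
                child = p1[:cut] + p2[cut:]
                child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
                new_pop.append(child)
            pop = new_pop
            best = max(pop, key=fitness)
            for j, g in enumerate(best):
                counts[j] += g
    return counts

# Hypothetical data: sensor 0 separates the two classes, sensor 1 does not.
X = [[0.0, 3.0], [0.1, 1.0], [1.0, 3.1], [0.9, 1.1]]
y = ["a", "a", "b", "b"]
print(sensor_usage_frequency(X, y, runs=3, generations=5, pop_size=6))
```

Sensors with the lowest counts would be the candidates for rejection under this counting rule.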
For model QHJ-QLJ-QWJ (premium Qunti-cultivar tea from the three origins Hupao Houshan, Longwu and
Wengjiashan), whose full-spectrum modeling results were also unsatisfactory, sensor selection raised the
overall recognition rates of the calibration set and the prediction set from 71.43% and 67.35% to 79.59%
and 69.39%, respectively. For model LYJ-LWJ, the calibration-set recognition rate stayed at 93.85%
before and after sensor rejection, but the prediction-set recognition rate rose from 87.88% to 90.91%.
With the number of sensors reduced to 14, the prediction performance of model LHT-LMT no longer reaches
the original 100% but is still 96.97%, above 95%, which is fully adequate for practical application.
Applying the principle of analysis of variance, each sensor is treated as a factor and the responses of
the different samples in the model as its levels, and a variance test is carried out. Table 12 gives the
F tests of sensors LY2/LG and TA/2 for origin models LHT-LMT and LWJ-LYJ. Since F0.05 = 3.84 for these
two models, these two sensors do not discriminate significantly between the origins in either model and
can therefore be rejected from both.
Meanwhile, in the loading plot (Figure 14(a)) of the sample data of origin model LHT-LMT after principal
component analysis, PA/2 and P30/1 each lie close to, and act similarly to, the other two sensors inside
the red and blue marks, respectively; in view of the combination optimization result, these two sensors
are rejected from model LHT-LMT. By the same principle, in the loading plot of model LYJ-LWJ, PA/2 and
T70/2 lie close together and play the same kind of role, and P30/1 and P40/2 lie very close together
with similar loading values and also play the same kind of role, so these two sensors (PA/2 and P30/1)
are rejected from this model as well.
6.3 Sensor selection optimization and screening analysis in the cultivar models
(1) Division of calibration-set and prediction-set samples for the cultivar models
To ensure comparability of the cultivar models, this study mainly considers models of tea from different
cultivars with the same grade and the same origin. Among the 617 collected tea samples there are two
cultivar models: (1) Longjing 43 (LMT) and the Qunti cultivar (QMT) for superfine tea from Meijiawu;
(2) Longjing 43 (LWJ) and the Qunti cultivar (QWJ) for premium tea from Wengjiashan. For each model, two
thirds of the samples are randomly selected as calibration-set samples and the remaining third as
prediction-set samples; the sample distribution is shown in Table 13.
(2) Electronic-nose response profiles and principal component analysis of the cultivar models
Figure 15 shows the average response profiles of the cultivars in the two cultivar models; the cultivars
are hard to tell apart directly from the profiles. In the principal component plots of Figure 16 the
samples of the different cultivars overlap, so the cultivar cannot be judged visually.
(3) Sensor selection in the cultivar models
After three rounds of the genetic algorithm on the sensor response profiles of cultivar model LMT-QMT,
the five sensors LY2/AA, LY2/GH, LY2/gCT, T30/1 and TA/2 were found to be used least frequently in every
run and were therefore rejected. The remaining 13 sensors (LY2/LG, LY2/G, LY2/gCTL, P10/1, P10/2, P40/1,
T70/2, PA/2, P30/1, P40/2, P30/2, T40/2 and T40/1) were used to build the cultivar model; the modeling
results before and after sensor rejection are shown in Table 14. With this array of 13 well-performing
sensors, the overall recognition rate of the Longjing 43 versus Qunti cultivar model for superfine
Meijiawu tea improved: the calibration set rose from 95.38% to 96.92% and the prediction set from 93.94%
to 96.97%, very close to the calibration-set rate, which indicates that the model is stable.
After three rounds of the genetic algorithm on the sensor response profiles of cultivar model LWJ-QWJ,
the four sensors P10/1, P40/1, T40/1 and TA/2 were found to be used least frequently in every run and
were therefore rejected. The remaining 14 sensors (LY2/LG, LY2/G, LY2/AA, LY2/GH, LY2/gCTL, LY2/gCT,
T30/1, P10/2, T70/2, PA/2, P30/1, P40/2, P30/2 and T40/2) were used to build the cultivar model; the
modeling results before and after sensor rejection are shown in Table 14. As the table shows, although
the number of sensors fell from 18 to 14, the prediction performance of this cultivar model did not
change: the calibration set and the prediction set kept their original rates of 92.31% and 93.34%,
respectively.
One-way analysis of variance on the sensor data of each cultivar model shows that the five sensors
rejected from cultivar model LMT-QMT discriminate very weakly, their F values all being below
F0.05 = 3.84 (Table 15); in cultivar model LWJ-QWJ, the four rejected sensors are exactly those whose
discrimination is not significant (Table 16).
The grade, origin and cultivar models differ in their raw data and in their nature, so after the genetic
algorithm the number of sensors used for modeling differs as well. All grade models use 15 sensors.
Among the origin models, the model for superfine Qunti-cultivar tea from Yangmeiling and Meijiawu
(QYT-QMT) is reduced to 7 sensors, while the other three origin models (LHT-LMT, LYJ-LWJ, QHJ-QLJ-QWJ)
all use 14. Among the cultivar models, the Longjing 43 versus Qunti cultivar model for superfine
Meijiawu tea (LMT-QMT) uses 13 sensors, and that for premium Wengjiashan tea (LWJ-QWJ) uses 14.
The present invention exploits the parallel optimization and global convergence of the genetic algorithm
and applies it to screening the modeling sensors for electronic-nose analysis of tea quality. This
effectively reduces the number of modeling sensors, simplifies the models, lowers the instrument's
requirement for sensors, and saves resources and instrument cost, while maintaining or further improving
prediction accuracy and giving good results.
Claims (1)
1. A method for detecting abnormal samples during pattern recognition analysis of tea quality using
intelligent sensory signals, characterized in that: samples are collected by electronic nose detection,
the electronic nose being a Fox 4000 electronic nose with a headspace autosampler produced by Alpha MOS
(France); first, 1.00 g of dry Longjing tea is placed in each 20 mL headspace vial, 5 mL of
room-temperature ultrapure water is added, and the vial is capped and sealed; every tea sample is
prepared in this way and the samples are detected in sequence; in the detection of each sample, the
headspace vial is first sent to the preheating zone and heated for 900 s at a stirrer speed of 500 rpm
and a headspace temperature of 60 °C, then 2.0 mL of gas is withdrawn and injected into the electronic
nose sensor array chamber at an injection rate of 2.0 mL/s, where it undergoes adsorption and desorption
on the semiconductor material on the surfaces of the 18 metal oxide sensors in the chamber, causing the
sensor resistance to change; the sample gas stays in the sensor array chamber for 120 s, a reading is
taken every 0.5 s, and the electronic nose software automatically records each reading;
it is judged whether the abnormal sample arose from operator error or instrument abnormality; if so, it
is corrected by re-measurement; if not, the abnormal sample is identified using the principal component
analysis score plot method combined with the Mahalanobis distance method;
the principal component analysis score plot method eliminates the mutually overlapping parts of
coexisting information without losing the main profile information; the principal component scores after
principal component analysis reflect the similarity and uniqueness among samples; the sample score plot
can reveal the internal features and clustering information of the samples and further show whether each
sample differs greatly within the large-class sample set; the principal components are calculated by the
nonlinear iterative partial least squares method with leave-one-out cross-validation;
the Mahalanobis distance discrimination method is carried out on the sensor response data, and the
Mahalanobis distance of a profile sample is calculated as follows:
in the formulas, $T_i$ is the profile score of calibration-set sample i, $T$ is the fingerprint profile
sample of the tea samples, and $\bar{T}$ is the mean matrix of the m calibration-set samples; $T_{cen}$
is the mean-centered matrix of $T$; $M$ is the Mahalanobis matrix of the calibration-set samples;
$MD_i$ is the Mahalanobis distance of calibration-set sample i; the outlier Mahalanobis distance
threshold is determined from the allowable error of quantitative calibration and the corresponding
Mahalanobis distance; and after standardization of the profile data, the Mahalanobis distance of each
sample is determined by the following formula:
$$h_{ii} = t_i^{T}\,(T^{T}T)^{-1}\,t_i \qquad (1\text{-}5)$$
in intelligent sensory sensor detection, $h_{ii}$ expresses the degree of influence of sample i on the
regression model; a maximal $h_{ii}$ indicates that the regression model depends strongly on sample i,
in which case sample i is an abnormal sample.
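Formula (1-5) can be evaluated directly from a score matrix, as the following Python sketch shows. It assumes the rows of `T` are already standardized principal component scores with linearly independent columns; the score values and the helper names `solve` and `leverages` are hypothetical. The system with $T^{T}T$ is solved by Gaussian elimination rather than by forming an explicit inverse.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]      # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                          # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def leverages(T):
    """h_ii = t_i^T (T^T T)^{-1} t_i for every row t_i of the score matrix T."""
    k = len(T[0])
    TtT = [[sum(row[a] * row[b] for row in T) for b in range(k)] for a in range(k)]
    return [sum(ti[j] * xj for j, xj in enumerate(solve(TtT, ti))) for ti in T]

# Hypothetical scores of four samples on two principal components.
T = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-2.0, -1.0]]
h = leverages(T)
for i, value in enumerate(h):
    print(i, round(value, 4))
```

The values of $h_{ii}$ sum to the number of score columns, so a sample whose $h_{ii}$ is far above the average stands out as a candidate abnormal sample; the cut-off itself is left to the threshold described in the claim.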
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310323279.2A CN103487558B (en) | 2013-07-30 | 2013-07-30 | A method for detecting abnormal samples during pattern recognition analysis of tea quality using intelligent sensory signals |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103487558A (en) | 2014-01-01 |
CN103487558B (en) | 2016-10-12 |
Family
ID=49827944
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310323279.2A Active CN103487558B (en) | A method for detecting abnormal samples during pattern recognition analysis of tea quality using intelligent sensory signals | 2013-07-30 | 2013-07-30 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103487558B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109884257A (en) * | 2019-03-28 | 2019-06-14 | 南京林业大学 | The discrimination method of cyclocarya paliurus tea |
Families Citing this family (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103499607B (en) * | 2013-07-30 | 2016-03-09 | 中国标准化研究院 | Exceptional sample point elimination method in a kind of detection by electronic nose honey |
KR101637773B1 (en) * | 2014-12-11 | 2016-07-07 | 현대자동차주식회사 | Apparatus for judging sense of smell and method for the same |
CN104655812B (en) * | 2014-12-16 | 2016-05-04 | 谢绍鹏 | The good and bad method for quick identification of a kind of pseudo-ginseng true and false |
CN104849320A (en) * | 2015-06-04 | 2015-08-19 | 安徽农业大学 | Yellow bud tea aroma grade sorting method by use of electronic nose |
CN106096649B (en) * | 2016-06-08 | 2019-08-06 | 北京科技大学 | Sense of taste inductive signal otherness feature extracting method based on core linear discriminant analysis |
CN106227039B (en) * | 2016-08-24 | 2019-07-09 | 贵州铜仁和泰茶业有限公司 | A kind of tea-processing equipment control method based on pattern-recognition |
CN106325154B (en) * | 2016-08-24 | 2018-12-11 | 贵州铜仁和泰茶业有限公司 | A kind of tealeaves rolling heating stirring machine control method based on pattern-recognition |
CN106501470B (en) * | 2016-11-23 | 2018-10-30 | 广东嘉豪食品有限公司 | Utilize the method for gustatory system and electronic nose association evaluation mustard thick chilli sauce flavor grade |
CN106680241A (en) * | 2017-01-13 | 2017-05-17 | 北京化工大学 | Novel spectrum multi-analysis classification and identification method and application thereof |
CN107273421B (en) * | 2017-05-16 | 2020-10-23 | 浙江大学 | High-accuracy mode identification and detection method for aroma type and quality of tea |
CN107436285A (en) * | 2017-06-20 | 2017-12-05 | 苏州优函信息科技有限公司 | Fast high-flux bloom spectrum detection device and detection method based on linear light source excitation |
CN107846670B (en) * | 2017-11-01 | 2020-05-26 | 东华大学 | Blind regression modeling and updating method for protecting data privacy in mobile group perception |
CN108133313B (en) * | 2017-11-12 | 2021-07-20 | 华南农业大学 | Artificial intelligent sensory evaluation food flavor system and construction method thereof |
CN108627641A (en) * | 2018-04-28 | 2018-10-09 | 璞晞(广州)生物免疫技术有限公司 | The check and evaluation method and kit of hepatopathy T cell function |
CN109115692B (en) * | 2018-07-04 | 2021-06-25 | 北京格致同德科技有限公司 | Spectral data analysis method and device |
CN110780010A (en) * | 2019-09-16 | 2020-02-11 | 陕西师范大学 | Food flavor quality evaluation information detection method and system |
CN110672582B (en) * | 2019-10-08 | 2020-09-15 | 浙江大学 | Raman characteristic spectrum peak extraction method based on improved principal component analysis |
CN112415152A (en) * | 2020-10-10 | 2021-02-26 | 华南农业大学 | Method for identifying yak milk adulteration and application |
CN113705856B (en) * | 2021-07-16 | 2023-10-03 | 北京电子工程总体研究所 | Maintenance strategy optimization method based on dynamic monitoring of multiple quality characteristics |
CN113836784B (en) * | 2021-07-23 | 2023-10-27 | 塔里木大学 | Apple identification system and method based on information fusion technology |
CN114235981B (en) * | 2021-11-17 | 2024-07-02 | 上海应用技术大学 | Method for identifying perilla leaf essential oil by combining gas phase-mass spectrum-sniffing instrument and gas chromatography-ion mobility spectrometry |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102222164A (en) * | 2011-05-30 | 2011-10-19 | 中国标准化研究院 | Food sensory quality evaluation method and system thereof |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2009217555A (en) * | 2008-03-11 | 2009-09-24 | Mitsubishi Electric Corp | Device for determining abnormality of network |
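Non-Patent Citations (4)
Title |
---|
Principal component analysis-Mahalanobis distance method for FTIR-ATR fingerprints applied to quality control of tobacco flavors; Wang Jiajun et al.; Spectroscopy and Spectral Analysis; 20070531; Vol. 27 (No. 5); pp. 895-989 * |
An electronic nose signal preprocessing method based on Mahalanobis distance and principal component analysis; Ma Jianwei et al.; Computer Knowledge and Technology; 20100331; Vol. 6 (No. 7); pp. 1699-1700, 1717 * |
Outlier identification and quantitative model optimization in near-infrared spectral analysis; Min Shungeng et al.; Spectroscopy and Spectral Analysis; 20041031; Vol. 24 (No. 10); pp. 1205-1209 * |
Principal component analysis-Mahalanobis distance classification of near-infrared spectra applied to rapid identification of cut tobacco of cigarette brands; Li Weili et al.; Journal of Yunnan Agricultural University; 20100331; Vol. 25 (No. 2); pp. 268-271 * |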
Also Published As
Publication number | Publication date |
---|---|
CN103487558A (en) | 2014-01-01 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |