CN103487558B - Method for detecting abnormal samples during pattern-recognition analysis of tea quality using intelligent sensory signals - Google Patents

Publication number
CN103487558B
Authority
CN
China
Prior art keywords
sample
sensor
model
tea
mahalanobis distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310323279.2A
Other languages
Chinese (zh)
Other versions
CN103487558A (en
Inventor
赵镭
史波林
支瑞聪
汪厚银
裴高璞
刘宁晶
解楠
张璐璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China National Institute of Standardization
Original Assignee
China National Institute of Standardization
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China National Institute of Standardization
Priority to CN201310323279.2A
Publication of CN103487558A
Application granted
Publication of CN103487558B
Legal status: Active
Anticipated expiration

Landscapes

  • Investigating Or Analysing Materials By Optical Means (AREA)

Abstract

A method for detecting abnormal samples during pattern-recognition analysis of tea quality using intelligent sensory signals, characterized in that: it is determined whether the abnormal sample was caused by misoperation or instrument abnormality; if so, it is corrected by remeasurement; if not, the principal component analysis score plot method combined with the Mahalanobis distance method is used to identify the abnormal sample.

Description

Method for detecting abnormal samples during pattern-recognition analysis of tea quality using intelligent sensory signals
Technical field
The application relates to a method for detecting abnormal samples during pattern-recognition analysis of tea quality using intelligent sensory signals.
Background art
Sensory evaluation has long been an important method for judging tea quality, but it demands extensive knowledge of tea science and evaluation experience. Only professional tea evaluators, distributors, or producers can apply it; ordinary tea buyers, lacking the necessary experience, find it difficult to judge tea quality and obtain reliable results. Training a tea evaluator requires careful selection and considerable expense, and the training period is long. Moreover, even a professional tea evaluator's sensory sensitivity is easily disturbed by external factors, which affects the accuracy, objectivity, and consistency of the evaluation. For example, human olfactory discrimination is easily disturbed by extraneous odors; taste sensitivity is easily affected by other irritating foods and their temperature; and vision involves optical, physiological, and psychological factors, so color discrimination differs to some extent between people. An evaluator's sensory sensitivity is also affected by other factors such as regional differences, sex, mental state, and physical health. In addition, sensory evaluation must be carried out against physical standard samples, and the preparation of these standard samples is constrained in various ways, so it is difficult to keep them consistent over several years. Standard samples are made from the products of the previous year or earlier years, which are inevitably affected by season, weather, and geographical conditions, so in practice the quality of a standard sample can hardly serve as an absolute standard.
The present invention examines Longjing tea from different picking periods, tree varieties, and producing areas in terms of physicochemical and sensory indices. It integrates intelligent sensory analysis, multivariate statistics, and modern instrumental analysis to characterize Longjing tea comprehensively, analyze the internal relationships among its indices, build mathematical models for the qualitative and quantitative evaluation of Longjing tea quality, and identify quality characteristics and assign grades accurately, providing a solid basis for a unified green tea evaluation standard. In theory, this research provides a foundation and support for evaluating the quality of other Chinese teas. In practice, it is significant for improving the stability of Chinese tea quality, strengthening the grading and classification of Chinese tea through standardized instruments, achieving good prices for high-quality tea, ending the tradition of exporting high-quality Chinese tea at low prices, dispelling developed countries' doubts about high-quality, low-priced Chinese products, maintaining order in the domestic market, protecting the vital interests of consumers, defending the international reputation of Chinese tea products, and promoting international trade, with notable social and economic benefits.
In recent years, with the development of modern instrumental analysis, physicochemical research on tea has also progressed. The separation and analysis of tea aroma substances is gradually moving from conventional gas chromatography (GC) or gas chromatography-mass spectrometry (GC-MS) to gas chromatography-olfactometry (GC-O). More than 700 tea aroma components have been detected so far, including aliphatic derivatives, terpene derivatives, aromatic derivatives, and nitrogen- and oxygen-containing heterocyclic compounds. Even so, composition alone can hardly reflect the overall characteristics and aroma quality of tea. The main instrumental techniques for analyzing tea taste compounds are liquid chromatography, spectroscopy, mass spectrometry, and nuclear magnetic resonance. More than six hundred organic chemical components have been identified in tea so far, along with more than 40 inorganic mineral elements. However, because the flavor substances interact with one another through taste contrast, modification, synergy, and mutual suppression, the measured chemical parameters cannot fully reflect the taste characteristics of a sample.
The emergence of intelligent sensory analysis has further raised the level of tea quality detection. It is a technology based on imitating human perception. The sensor corresponds to a sensory organ in a biological system and produces a response signal to an attribute of the sample; the signal collector, like the nervous system, transmits the response signal and performs simple processing; the computer, like the human brain, performs complex processing, analysis, and recognition of the signal data and forms a comprehensive, overall judgment. Intelligent sensory analysis offers short detection times, good reproducibility, no need for complex sample pretreatment, no sensory fatigue, and objective, reliable results. More importantly, it can simulate human senses to some extent and provide evaluation results and fingerprint information on tea aroma, taste, and appearance, which makes it a focus and development trend of current tea quality research. For the sensory attributes of tea (color, shape, and so on), the intelligent sensory techniques mainly used are machine vision, the electronic nose, and the electronic tongue. Their workflow mainly consists of generating sensor response signals, preprocessing the response signals, extracting characteristic sample information, building the relevant models, and performing pattern recognition. Pattern recognition is an important component of an intelligent sensory system. The main methods currently applied are principal component analysis, artificial neural networks, and fuzzy recognition. Principal component analysis is used for signal processing: it suppresses noise in the multidimensional sensor response signals and compresses the signal data. Artificial neural networks learn from and are trained on the processed signals to build a network model. Fuzzy recognition uses fuzzy reasoning to carry out fuzzy recognition and fuzzy quantification of complex problems.
Intelligent sensory technology is used to simulate the function and characteristics of human sensory evaluation, and multiple algorithms are studied to process the rich product quality information contained in intelligent sensory measurements and to derive the corresponding computational models and methods. Algorithms aimed at solving the end problem analyze the statistical regularities that arise when multiple intelligent sensor objects and multiple product indices are interrelated, which suits the characteristics of food science research. Integrating multiple algorithms, intelligent sensory analysis, and modern instrumental analysis overcomes the statistical and analytical difficulties of multi-attribute comprehensive evaluation and, at the same time, makes full use of the experimental data to obtain implicit details related to the characteristic quality attributes of tea, so that the statistical analysis and the pattern discrimination of tea quality can be completed together, both quickly and accurately. This supports the establishment of a characteristic quality database and an intelligent quality evaluation system for Chinese tea, enables fast, accurate, and comprehensive analysis of tea quality, provides reference and guidance for the scientific evaluation and reasonable definition of the characteristic quality of Chinese tea, and provides core technical support for quality assurance, characteristic protection, and authenticity identification of Chinese tea.
The electronic nose, a novel odor-scanning instrument developed in the 1990s, is now widely used in fields such as food, beverages, cosmetics, environmental monitoring, and the control of agricultural product processing. Compared with ordinary chemical analysis, the electronic nose uses its cross-sensitivity to multiple gases to evaluate the overall information of a gas; compared with human olfaction, its measurements are more objective and reliable.
The electronic tongue, developed in the mid-1980s, is a novel means of analyzing and identifying the taste of liquids and has been applied in fields such as food, medicine, cosmetics, the chemical industry, and environmental monitoring. Compared with ordinary chemical analysis, the electronic tongue does not output an analysis of the taste components of a sample but a signal pattern related to the sample; after analysis by software with pattern-recognition capability, an overall assessment of the taste characteristics of the sample can be obtained.
In summary, intelligent sensory analysis (machine vision, electronic nose, and electronic tongue technology) has achieved good results in tea quality detection and shows good application prospects. However, these technologies are still some way from practical application, and several key problems remain to be solved, such as:
(1) Key technologies of the electronic nose and electronic tongue: machine vision is already widely applied, but the electronic nose and electronic tongue are still under development. To build a comprehensive intelligent sensory system, the electronic nose and electronic tongue must be studied in depth and their key problems solved.
(2) Development and screening of specific sensors: different types of samples have their own specific substance systems, so different types of sensors respond differently to different substances. Further in-depth research is therefore needed to develop, for specific substances, sensor arrays that respond quickly, are highly sensitive, last long, are easy to clean, and are economical and practical.
(3) Representativeness of samples and sound sampling: most current research reports show a high discrimination rate for tea classification or grading. In these studies, however, the tea samples are not sufficiently representative and the sample sets are incomplete. The sample information collected consists largely of parallel samples, that is, the tea of each grade is simply measured many times, so the resulting models are not very stable and do not apply widely. Only by establishing a scientific sample collection method and a criterion for sample representativeness can subsequent models be built successfully.
(4) Signal drift and denoising: changes in instrument parameters, measuring method, measuring environment, sample source, and other factors easily cause the sensor response curve to drift, introducing errors into intelligent sensory detection and making it unsuitable for long-term continuous industrial operation. Research on reducing response signal drift and on signal noise analysis and processing therefore needs to be strengthened.
(5) Robustness of models: some studies do not discuss the model in detail when building a discrimination model, nor do they test its robustness with independent prediction samples. In addition, the models built are not stable enough for quality discrimination, so research on and improvement of the algorithms is needed to improve pattern-recognition performance.
An electronic nose system is an array of many sensors. Because tea aroma is complex in composition, each sensor responds to many aroma substances and each aroma component produces a response on many sensors. The sensor fingerprint array therefore retains as much aroma information as possible, but it also readily introduces a large amount of redundant information, which makes quality modeling computationally heavy and time-consuming and makes the resulting models complex and unstable. The main reasons are: (1) in the intelligent sensory fingerprint, some sensors carry very weak sample response information, which directly affects the prediction accuracy of the model; (2) because of instrument noise in the electronic nose, the sample information from some sensors has a low signal-to-noise ratio (SNR); at the same time, external interference factors (such as temperature and humidity) strongly affect the fingerprint response of sample quality at some sensors, which reduces the robustness of the model; (3) tea aroma contains many components, each of which produces a strong response on one or several sensors, and detecting the overall information of tea aroma requires an optimized combination of sensors with specific responses to different aromas in order to characterize the aroma fingerprint comprehensively and effectively.
Reasonable selection and combination of sensors not only rejects irrelevant or nonlinear olfactory sensors and removes redundant sensor data, extracting the most effective intelligent olfactory fingerprint information, giving the calibration model better predictive ability, and simplifying computation. It also makes it possible to omit sensors that have no significant effect, or even a negative effect, on pattern recognition, which helps reduce the manufacturing cost of the electronic nose and improve system stability.
Sensor selection is exactly the kind of optimization problem frequently encountered in practice. Although the optimal combination method currently in use applies the idea of combination to some extent, it combines the grouped sensor arrays only after a preliminary rejection step and therefore does not achieve a globally optimized combination. The loading value method avoids including redundant sensors, but it does not analyze the response performance of the selected sensors, namely the repeatability of a sensor's response to the same sample and its ability to distinguish between different samples. The genetic algorithm (GA) is an optimization method built on Darwin's theory of biological evolution through natural selection and survival of the fittest; it simulates heredity and evolution in nature, requires no derivatives, performs stochastic global optimization, avoids becoming trapped in local minima, and is easy to implement.
Summary of the invention
A method for detecting abnormal samples during pattern-recognition analysis of tea quality using intelligent sensory signals, characterized in that: it is determined whether the abnormal sample was caused by misoperation or instrument abnormality; if so, it is corrected by remeasurement; if not, the principal component analysis score plot method combined with the Mahalanobis distance method is used to identify the abnormal sample. The principal component analysis score plot method reduces the dimensionality of the data without losing the main fingerprint information: a smaller number of new variables replaces the larger number of original variables, eliminating the overlapping part of the coexisting information; the original fingerprint variables are transformed so that the few new variables are linear combinations of the original variables. The Mahalanobis distance discrimination method is applied to the sensor response data, and the Mahalanobis distance of a fingerprint sample is calculated as follows:
T is the fingerprint matrix of the sampled tea. In the formulas, ti is the fingerprint score of calibration set sample i, and the mean matrix is that of the m calibration set samples; Tcen is the mean-centered matrix of T; M is the Mahalanobis matrix of the calibration set samples; MDi is the Mahalanobis distance of calibration set sample i. The threshold Mahalanobis distance for outliers is determined from the permissible quantitative calibration error and the corresponding Mahalanobis distance. After the fingerprint data are standardized, the Mahalanobis distance of each sample is determined by the following formula:
In intelligent sensory sensor detection, hii expresses the influence of sample i on the regression model; the largest hii indicates that the regression model depends heavily on sample i, and sample i is then an abnormal sample.
Brief description of the drawings
Fig. 1: PCA score plot (a) and Mahalanobis distance versus residual plot (b) before rejection of abnormal samples.
Fig. 2: Signals of different samples at the characteristic response points of the electronic nose sensors.
Fig. 3: PCA score plot (a) and Mahalanobis distance versus residual plot (b) after rejection of the abnormal sample LLJ.
Fig. 4: Electronic nose sensor response signal intensity for the four grades of Longjing tea.
Fig. 5: Mean electronic nose sensor responses of the different tea grades after rejection of abnormal samples.
Fig. 6: Loading plots for the first four principal components after rejection of abnormal samples.
Fig. 7: Relationship between the PRESS value and the number of principal components in the grade model.
Fig. 8: Flow chart of the genetic algorithm.
Fig. 9: Crossover operation.
Fig. 10: Mutation operation.
Fig. 11: Loading plot of the four grade samples on principal components 1 and 2.
Fig. 12: Mean tea sensor responses in the origin model.
Fig. 13: Principal component score plot (PC1-PC2) of the origin model.
Fig. 14: Loading plots of the origin models LHT-LMT (a) and LYJ-LWJ (b) on principal components 1 and 2.
Fig. 15: Mean tea sensor responses in the tree variety model.
Fig. 16: Principal component score plot (PC1-PC2) of the tree variety model.
Detailed description of the invention
1 Tea sample collection and handling
For the present invention, West Lake Longjing tea samples from 2011 were collected from local tea growers in the West Lake Longjing producing area of Hangzhou, covering 4 grades, 2 tree varieties, and 5 origins. To make the tea samples easy to tell apart, each sample was given a systematic code; details are given in Table 1. To keep the quality of each tea sample consistent, the samples were stored in cold storage below -4 °C, and a small pouch was taken out for each experiment according to the amount needed.
2 Electronic nose detection method
The present invention uses a Fox 4000 electronic nose with an automatic headspace sampler, produced by Alpha MOS (France). First, 1.00 g of dry Longjing tea is placed in each 20 mL headspace vial, 5 mL of room-temperature ultrapure water is added, and the vial is capped and sealed; every tea sample is prepared in this way and measured in turn. For each measurement, the headspace vial is first sent to the preheating zone and heated for 900 s at a stirrer speed of 500 rpm and a headspace temperature of 60 °C. Then 2.0 mL of headspace gas is drawn and injected at 2.0 mL/s into the electronic nose sensor array chamber (containing 18 metal oxide sensors). The gas adsorbs on and desorbs from the semiconductor material on each of the 18 sensor surfaces, changing the sensor resistance, so different resistance values are produced at different times. The sample gas stays in the sensor array chamber for 120 s, with one reading taken every 0.5 s, and the electronic nose software automatically records every reading.
3 Tea quality modeling method
The samples used to build each tea quality model (grade, origin, tree variety, and so on) are divided into a calibration set and a prediction set. For the samples in each model, two thirds are randomly selected as calibration set samples and the remaining third are used as prediction set samples. The present invention builds qualitative discrimination models with soft independent modeling of class analogy (SIMCA, also known as the similarity analysis method): a PCA model is first built for each class of samples, and the SIMCA distance of an unknown sample is then calculated on that basis to determine the class to which it belongs. All model calculations are carried out with self-written MATLAB 7.0 programs.
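The random two-thirds / one-third split described above can be sketched as follows. This is a minimal illustration in Python rather than the MATLAB programs mentioned in the text; the function name, the fixed seed, and the sample labels are hypothetical and chosen only for the example.

```python
import random

def split_calibration_prediction(samples_by_class, calib_fraction=2 / 3, seed=0):
    """Randomly split each class into a calibration set and a prediction set."""
    rng = random.Random(seed)  # fixed seed only so the example is repeatable
    calibration, prediction = {}, {}
    for label, samples in samples_by_class.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        n_calib = round(len(shuffled) * calib_fraction)
        calibration[label] = shuffled[:n_calib]
        prediction[label] = shuffled[n_calib:]
    return calibration, prediction

# Hypothetical measurement IDs: two classes with nine measurements each.
demo = {
    "grade_A": [f"A{i}" for i in range(9)],
    "grade_B": [f"B{i}" for i in range(9)],
}
calib, pred = split_calibration_prediction(demo)
print({k: len(v) for k, v in calib.items()}, {k: len(v) for k, v in pred.items()})
```

Splitting within each class, rather than across the pooled data, keeps every class represented in both sets, which matters when a separate PCA model is built per class.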
4 Abnormal sample point analysis and rejection
4.1 Principles of abnormal sample point analysis
When intelligent sensory signals are used for pattern-recognition analysis of tea quality, the reliability of every classification and recognition result depends first on the accuracy of the raw data, that is, on the reliability of the acquired intelligent sensory signals and of the original classification information of the tea. The quality of the data set directly determines the success or failure of pattern discrimination. Abnormal (outlier) sample points can therefore affect, or even change, the distribution trend of the overall data to some extent and thereby affect the accuracy of the calibration model.
An abnormal sample point is not only one whose measured intelligent fingerprint or original sample information deviates significantly from the true value; it also includes a sample whose fingerprint differs significantly from the mean fingerprint of the samples in the modeling set. Abnormalities can generally be divided into fingerprint abnormalities and abnormalities in the original tea information.
The main causes of fingerprint abnormalities are:
(1) changes in the measuring instrument and its performance parameters, such as changes in instrument energy, instrument noise, and band drift;
(2) changes in the measuring method, such as differences in sampling, differences in measuring position, and unequal measuring distance;
(3) changes in the measuring environment, such as changes in temperature and humidity;
(4) changes in other physical or mechanical properties of the sample, such as particle size, viscosity, and fineness;
(5) changes in sample source that make the sensor response resistivity or the intensity of certain characteristic peaks abnormal, such as changes in origin, storage time, storage method, picking period, and cultivation method;
(6) errors such as spoiled or mixed-up samples;
(7) operating errors during intelligent sensory signal scanning.
The main sources of abnormalities in the original tea quality information are:
(1) the reliability of the physicochemical instruments and methods used;
(2) changes in the sensory evaluation method;
(3) changes in sample source;
(4) errors by the tea evaluator, such as errors during evaluation and during data entry.
If an abnormal sample arises from misoperation or an instrument abnormality, it can be corrected simply by remeasurement once it is discovered. If the abnormality arises from the sample itself, it cannot be corrected by remeasurement, and whether the predicted value for that sample is reliable depends on how abnormal its sensor response is and on how well it fits the model. Detecting and effectively rejecting abnormal samples is therefore key to the calibration model and to the data analysis results.
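The decision described in the preceding paragraph can be written as a small routing function. This is only a sketch: the enum members and the returned action strings are hypothetical names introduced for illustration, not terms from the patent.

```python
from enum import Enum

class Cause(Enum):
    """Hypothetical labels for where an abnormal measurement came from."""
    MISOPERATION = "misoperation"
    INSTRUMENT_FAULT = "instrument_fault"
    SAMPLE_ITSELF = "sample_itself"

def handle_abnormal_sample(cause):
    """Return the follow-up action for an abnormal sample."""
    if cause in (Cause.MISOPERATION, Cause.INSTRUMENT_FAULT):
        # Operator or instrument problems are corrected by measuring again.
        return "remeasure"
    # Otherwise the sample itself is suspect and is examined with the
    # PCA score plot combined with the Mahalanobis distance.
    return "identify_with_pca_scores_and_mahalanobis"

print(handle_abnormal_sample(Cause.INSTRUMENT_FAULT))
print(handle_abnormal_sample(Cause.SAMPLE_ITSELF))
```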
4.2 Abnormal sample point analysis methods
The abnormal sample analysis method of the present invention combines principal component analysis scores with the Mahalanobis distance method.
(1) Principal component analysis score plot method
Principal component analysis (PCA) is a data mining technique in multivariate statistics. It reduces the dimensionality of the data without losing the main fingerprint information: a few new variables replace the larger number of original variables, eliminating the overlapping part of the coexisting information. The large number of original fingerprint variables is transformed so that the small number of new variables are linear combinations of the original variables.
The principal component scores obtained from PCA reflect the similarity between samples and the distinctiveness of each sample; every sample has a different score on each principal component. A score plot of the samples reveals their internal features and clustering, and further shows whether an individual sample differs greatly from the rest of its class, which provides a theoretical basis for abnormal sample point analysis.
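As a concrete illustration of how such scores can be obtained, the sketch below computes PCA scores and loadings from a samples-by-sensors matrix using a singular value decomposition. It assumes NumPy is available and uses random numbers in place of real sensor responses; the function and variable names are illustrative only.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Return scores, loadings and explained-variance ratios for mean-centered X."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each sensor column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T               # columns are principal directions
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]

# Placeholder data: 20 hypothetical samples x 18 sensors (not real measurements).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 18))
scores, loadings, explained = pca_scores(X_demo)
print(scores.shape, loadings.shape)
```

Plotting the first column of `scores` against the second gives the PC1-PC2 score plot; a sample lying far from the cluster of its class is a candidate for closer inspection.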
(2) Mahalanobis distance discrimination method
The Mahalanobis distance is one of the effective ways of studying the similarity of vectors in a multidimensional space and is widely used in the qualitative analysis of fingerprints and in outlier discrimination. The Mahalanobis distance is calculated from the response data (such as resistivity) of several sensors, and the calculation for a sample set proceeds as follows:
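The formulas for this step are not reproduced in the text. A common formulation that is consistent with the symbol definitions given here is sketched below as an assumption, not as the patent's own equations. Each t_i is treated as a row vector of scores, and \bar{t} is a notation introduced here for the mean over the m calibration set samples.

```latex
\bar{t} = \frac{1}{m}\sum_{i=1}^{m} t_i, \qquad
T_{\mathrm{cen}} = T - \mathbf{1}\,\bar{t}, \qquad
M = \frac{T_{\mathrm{cen}}^{\mathsf{T}}\, T_{\mathrm{cen}}}{m-1}, \qquad
MD_i^{2} = (t_i - \bar{t})\, M^{-1}\, (t_i - \bar{t})^{\mathsf{T}}
```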
In the formulas, ti is the fingerprint score of calibration set sample i, and the mean matrix is that of the m calibration set samples; Tcen is the mean-centered matrix of T; M is the Mahalanobis matrix of the calibration set samples; MDi is the Mahalanobis distance of calibration set sample i. The threshold Mahalanobis distance for outliers is determined from the permissible quantitative calibration error and the corresponding Mahalanobis distance. After the fingerprint data are standardized, the Mahalanobis distance of each sample is determined by the following formula:
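The formula for hii is likewise not reproduced in the text. Assuming that hii denotes the usual leverage of sample i computed from the standardized (mean-centered) score matrix T, it would take the form:

```latex
h_{ii} = t_i \left(T^{\mathsf{T}} T\right)^{-1} t_i^{\mathsf{T}}
```

Under this assumption, and for a mean-centered T, hii equals MDi squared divided by (m - 1), which is why it can stand in for the Mahalanobis distance of each sample.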
hii can be used to measure the influence of a single sample on the whole set of standard samples. In intelligent sensory sensor detection, hii expresses the influence of sample i on the regression model. If hii is too large, the regression model depends heavily on sample i, which is unfavorable to model stability; in other words, sample i may be an abnormal sample.
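A minimal numerical sketch of these two quantities follows. It assumes the standard definitions (covariance-based Mahalanobis distance and hat-matrix leverage on mean-centered scores), which the text does not spell out, and it uses NumPy with made-up score data containing one planted outlier.

```python
import numpy as np

def mahalanobis_distances(T):
    """Mahalanobis distance of each row of the score matrix T from the set mean."""
    T = np.asarray(T, dtype=float)
    m = T.shape[0]
    T_cen = T - T.mean(axis=0)                  # mean-centered scores
    M = T_cen.T @ T_cen / (m - 1)               # covariance ("Mahalanobis") matrix
    # Solve M z = T_cen^T instead of forming the inverse explicitly.
    md_sq = np.einsum("ij,ji->i", T_cen, np.linalg.solve(M, T_cen.T))
    return np.sqrt(md_sq)

def leverages(T):
    """Leverage h_ii of each row, assuming h_ii = t_i (T^T T)^-1 t_i^T on centered T."""
    T = np.asarray(T, dtype=float)
    T_cen = T - T.mean(axis=0)
    return np.einsum("ij,ji->i", T_cen, np.linalg.solve(T_cen.T @ T_cen, T_cen.T))

# Made-up scores for 30 samples on two components, with one planted outlier.
rng = np.random.default_rng(1)
T_demo = rng.normal(size=(30, 2))
T_demo[0] = [8.0, 8.0]
md = mahalanobis_distances(T_demo)
h = leverages(T_demo)
print(int(md.argmax()), float(h.max()))
```

The threshold that separates normal from abnormal samples is left out on purpose: the text derives it from the permissible calibration error, and no numeric value is given.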
4.3 Abnormal sample point analysis and rejection
The principal components (score matrix) are linear combinations of the original variables, and representing the original variables with them gives the minimum sum of squared errors. The first principal component explains the largest share of the variation in the original variables, the second the next largest, and so on; the principal components are mutually orthogonal. There are several ways of calculating principal components; here the nonlinear iterative partial least squares method (NIPALS) with leave-one-out cross-validation is used. The principal component scores of the Longjing tea samples and the corresponding Mahalanobis distance versus residual results are shown in Fig. 1. The LLJ samples lie far from the other sample sets in the score plot and their Mahalanobis distance values are large, so these LLJ samples are abnormal sample points. The corresponding sensor response plot (Fig. 2) shows that this premium-grade tea responds very differently from the other premium-grade samples and does not belong to the same grade. A check of the original sample collection records showed that the sample was not genuine premium-grade West Lake Longjing tea from the Longwu origin in Hangzhou but Zhejiang Longjing tea, supplied by mistake. After these abnormal points were rejected, the principal component score and Mahalanobis distance analyses were repeated (Fig. 3). The tea samples are now evenly distributed in the score plot and their Mahalanobis distance values show no abnormality, so the samples are representative and can be used for subsequent model building and related mathematical processing.
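The NIPALS iteration named above can be sketched as follows. This is a generic textbook version written with NumPy, not the self-written MATLAB program used in the patent; the leave-one-out cross-validation step is omitted, and the starting vector, tolerance, and iteration limit are illustrative choices. The demo data are synthetic.

```python
import numpy as np

def nipals_pca(X, n_components=2, tol=1e-10, max_iter=500):
    """Extract principal components one at a time with the NIPALS iteration."""
    X = np.asarray(X, dtype=float)
    E = X - X.mean(axis=0)                      # residual matrix, starts as centered data
    scores, loadings = [], []
    for _ in range(n_components):
        # Start from the column with the largest variance.
        t = E[:, np.argmax(E.var(axis=0))].copy()
        for _ in range(max_iter):
            p = E.T @ t / (t @ t)               # project data onto current score vector
            p /= np.linalg.norm(p)              # normalize the loading vector
            t_new = E @ p                       # update the score vector
            converged = np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        scores.append(t)
        loadings.append(p)
        E = E - np.outer(t, p)                  # deflate: remove the extracted component
    return np.column_stack(scores), np.column_stack(loadings), E

# Synthetic data with two dominant directions plus small noise (not real measurements).
rng = np.random.default_rng(2)
latent = rng.normal(size=(20, 2)) * np.array([5.0, 2.0])
X_demo = latent @ rng.normal(size=(2, 18)) + 0.1 * rng.normal(size=(20, 18))
T_scores, P_loadings, residual = nipals_pca(X_demo)
print(T_scores.shape, P_loadings.shape, residual.shape)
```

Because each component is extracted from the deflated residual, the score vectors come out mutually orthogonal, matching the statement above that the principal components are orthogonal to one another.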
Principal component scores reflect, to some extent, the similarity and distinctiveness of samples; each sample has a different score on each principal component. Fig. 3(a) is the scatter plot of each tea sample's scores on the first two principal components. It shows the dispersion and differences of the sample points: samples with the same or similar properties cluster together, and clearly different samples lie far apart. The figure shows that second-grade tea differs greatly from the other teas and occupies its own region, whereas premium-grade, special-grade and first-grade teas differ very little and overlap substantially. This agrees with the sensor response curve analysis. The score plot reveals the internal features and clustering of the samples and further shows that the samples differ considerably in sensor response, which provides a theoretical basis for using the electronic nose to classify teas of different grades. However, because the regions of the other grades overlap heavily, the score plot alone cannot separate all four grades by eye.
The Mahalanobis distance-residual plot shows how strongly each sample point influences the corresponding principal component model. The influence is determined by the Mahalanobis distance and the residual of the sample point: a point with both a high Mahalanobis distance and a high residual is regarded as an abnormal sample point. The Mahalanobis distance is the distance from the projection of the sample point in the model to the model centre; it represents how much the sample differs from the other samples in the model and how strongly it influences the model, and a larger value means a larger influence. The residual is the difference between the observed and fitted values of the sample point; it represents the part of the sample's features that the model cannot explain, and the smaller it is, the better the model fits. Fig. 3(b) shows that the residuals and Mahalanobis distances of all sample points are small, which indicates that the calibration samples chosen for each model are representative of the corresponding tea characteristics.
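The two quantities plotted in a Mahalanobis distance-residual plot can be computed as in the sketch below. It assumes the distance is taken in score space with the sample covariance of the scores, and the residual is the row sum of squares of E; the text gives no cut-off values, so none are invented here. All names and numbers are illustrative.

```python
import numpy as np

def mahalanobis_and_residual(T, E):
    """Per-sample Mahalanobis distance in score space and squared residual."""
    T = np.asarray(T, dtype=float)
    centred = T - T.mean(axis=0)
    cov = np.atleast_2d(np.cov(centred, rowvar=False))
    # d_i^2 = (t_i - mean) S^-1 (t_i - mean)^T for every row i.
    d2 = np.einsum("ij,jk,ik->i", centred, np.linalg.inv(cov), centred)
    q = np.sum(np.asarray(E, dtype=float) ** 2, axis=1)
    return np.sqrt(d2), q

# Made-up scores (two components) and residuals; sample 4 is far from the rest.
T_demo = np.array([[1.0, 0.2], [-1.0, -0.1], [0.5, -0.3], [-0.5, 0.1], [6.0, 0.1]])
E_demo = np.zeros((5, 3))
E_demo[4] = [0.5, -0.5, 0.0]
d, q = mahalanobis_and_residual(T_demo, E_demo)
print(int(np.argmax(d)), int(np.argmax(q)))  # index of the most suspicious sample
```

A sample that stands out on both axes, as sample 4 does here, is the kind of point the text treats as abnormal.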
The final number of experimental samples after rejection is given in Table 2: 667 samples before rejection and 617 after. Abnormal samples arise mainly because data that do not belong to the same population are mixed into the sample set. Such abnormal data (abnormal samples) make predictions inaccurate, compromise statistical inference and adversely affect the measurement results. Their influence on the calibration model cannot be ignored. To ensure that the model is valid, abnormal samples must be found and identified during data processing and removed from the sample set before any further study.
Abnormal sample points in the modeling data are searched for by PCA, the Mahalanobis distance plot and sensor response plot analysis. The results show that sensor response fingerprints are very easily affected by external interference, which produces abnormal sample points that do not represent the true properties of the samples. Their presence can strongly affect, or even change, the distribution of the whole data set and has a very large impact on modeling. Mathematically, an abnormal sample point is a sample far from the centroid in multivariate space. Most importantly, abnormal sample points carry properties that do not belong to the model and that the prediction set normally does not contain, so their presence reduces the predictive ability and robustness of the model. Without outlier analysis and rejection, neither fingerprint preprocessing nor other modeling methods can easily improve the model, so the rejection of abnormal samples is a problem every modeler must consider.
5 Establishing the grade model for West Lake Longjing tea
5.1 Division of calibration-set and prediction-set samples for the grade model
After the abnormal sample points were rejected, 617 samples remained for tea grade discrimination. Two thirds were selected at random as calibration-set samples and the remaining third as prediction-set samples, so that the calibration set is representative while the prediction range of the model is widened and its adaptability enhanced. The sample distribution is given in Table 3.
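A random two-thirds / one-third division of this kind might look like the sketch below. The seed, the rounding rule and the function name are assumptions; the text does not say whether the split was made per grade, so this version splits the pooled samples and the resulting counts need not match Table 3.

```python
import random

def split_calibration_prediction(sample_ids, calib_fraction=2 / 3, seed=0):
    """Shuffle the sample ids and cut them into calibration and prediction sets."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)          # reproducible random order
    n_cal = round(len(ids) * calib_fraction)  # rounding rule is an assumption
    return ids[:n_cal], ids[n_cal:]

calib, pred = split_calibration_prediction(range(617))
print(len(calib), len(pred))  # 411 206
```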
5.2 Analysis of the electronic nose response plots of teas of different grades
The response curves of the resistance ratios (resistance change relative to the initial resistance) of the 18 sensors during tea aroma detection are shown in Fig. 4; each curve corresponds to one sensor, 18 curves in total. Each point on a curve represents the resistance ratio at a given time as the volatile substances of the tea infusion pass through the sensor channel. Depending on the sensor principle, the response can be positive or negative: LY-type sensors lie below the abscissa and T-type and P-type sensors above it. As Fig. 4 shows, in the early stage of acquisition the volatile substances in the sample are strongly enriched on the sensor surface, so the curves change quickly and the absolute slope is large. When adsorption of the volatile substances on the sensor reaches equilibrium, the sensor response reaches its maximum absolute value, which best reflects the properties of the gas in the sample. As acquisition continues, the gas concentration gradually falls, the sensor response gradually decreases, and the curves slowly level off and finally reach a relatively stable state. The plots of premium grade and special grade are very close, and second grade differs most from the other grades. The plot of first grade is close to those of premium and special grade, but the response ranges differ: the higher the grade, the larger the absolute response. The electronic nose therefore responds clearly to the aroma components of the tea infusion, which shows that measuring tea quality with an electronic nose is feasible.
The response plot over the 120 s acquisition time does not allow differences between samples to be compared directly. A characteristic response point is needed, that is, a value representing the characteristic response intensity of each sensor to a given sample. The peak or trough of a response curve has a low relative standard deviation (RSD) for the same sample and the largest discrimination between different samples. The point of maximum absolute sensor response, that is, the peak or trough of the sensor response signal, is therefore chosen as the characteristic point. To analyse differences in tea quality between grades, producing areas and cultivars, Fig. 2 shows the response signals at the peak or trough of each sensor for the different teas measured on one day (codes: LLJ, LWJ, LYJ, LHT, LMT, QWJ, QHJ, QLJ, QYT, QMT, 1, 2). Fig. 2 shows that each sensor responds differently to tea aroma. The amplitudes of the LY-type sensors fluctuate markedly with tea quality and discriminate clearly, whereas the response curves of the T-type and P-type sensors are less dispersed. The red curve of the second-grade sample is clearly separated from the other samples. The curves of the first-grade, premium-grade and special-grade samples are not clearly separated, but the premium-grade and special-grade curves all lie between those of first and second grade. The differences in the characteristic response plots of the sensor array thus reflect, to some extent, the quality differences of West Lake Longjing tea and have a fingerprint character, which provides a mathematical basis for classifying and identifying tea.
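Picking the characteristic point of each sensor curve can be sketched as below: for every sensor, keep the signed value at which the absolute response is largest. The array layout (time points in rows, sensors in columns) and the toy numbers are assumptions for illustration.

```python
import numpy as np

def characteristic_response(curves):
    """Return, per sensor, the signed response with the largest absolute value.

    curves: array of shape (n_timepoints, n_sensors).
    """
    R = np.asarray(curves, dtype=float)
    idx = np.argmax(np.abs(R), axis=0)          # time index of the peak or trough
    return R[idx, np.arange(R.shape[1])]

# Made-up curves: sensor 0 responds positively, sensor 1 negatively.
curves_demo = np.array([[0.0, 0.0], [0.2, -0.5], [0.6, -0.9], [0.4, -0.3]])
features = characteristic_response(curves_demo)
print(features)  # [ 0.6 -0.9]
```

Keeping the sign preserves the distinction the text draws between sensors whose curves lie below the abscissa and those above it.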
Fig. 5 shows the mean response of each tea grade. The response of second-grade tea clearly differs from that of the other grades. The response plots of premium grade, special grade and first grade are very similar and differ appreciably only at sensors such as LY2/G, LY2/AA, LY2/gCTL and P30/2. These differences between sensor response signals are the basis of the subsequent mathematical modeling.
5.3 Principal component score trends of all grade samples
Principal component analysis is applied to the data matrix formed by the odour characteristic parameters of the tea samples of different grades. The principal component analysis model is $A_{m \times p} = T_{m \times f} P_{f \times p} + E$, where $A_{m \times p}$ is the response spectrum matrix, $T_{m \times f}$ is the score matrix, $P_{f \times p}$ is the loading matrix, and $E$ is the spectrum residual, with the same dimensions as $A_{m \times p}$. Here m is the number of samples, p the number of sensors and f the number of principal components.
For each measured value $a_{ij}$ in the matrix $A_{m \times p}$, the principal component analysis can be written as $a_{ij} = \sum_{n=1}^{f} t_{in} p_{nj} + e_{ij}$, where $t_{in}$ is the score of sample i on the n-th principal component, $p_{nj}$ is the loading of sensor j on the n-th principal component, and $e_{ij}$ is the residual of variable j for sample i.
Principal component analysis was carried out with leave-one-out cross-validation; Table 4 lists the cumulative contribution rates for the tea samples of all grades. The first principal component has a contribution rate of 93% and represents most of the sample information in the raw data; the first four principal components represent 99% of the sensor information. By the properties of principal components, the first four can characterise the structure of the electronic nose intelligent sensory data of the samples, which reduces the data dimension and simplifies the data. The first four principal components are selected for modeling, reducing the data matrix from the original 617 × 18 to 617 × 4 (four principal components).
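Cumulative contribution rates, and the number of components needed to reach a target such as 99%, can be computed as in this sketch. It uses the singular values of the mean-centred matrix, which is a standard equivalent of PCA; the function names and the tiny demo matrix are invented, and the cross-validation step is left out.

```python
import numpy as np

def cumulative_contribution(A):
    """Cumulative share of variance explained by successive principal components."""
    X = np.asarray(A, dtype=float)
    X = X - X.mean(axis=0)
    s = np.linalg.svd(X, compute_uv=False)   # singular values, largest first
    var = s ** 2                             # proportional to component variances
    return np.cumsum(var) / var.sum()

def components_needed(cum, target):
    """Smallest number of components whose cumulative contribution reaches target."""
    return int(np.searchsorted(cum, target) + 1)

# Made-up matrix with three orthogonal, mean-zero columns of very different size.
A_demo = np.array([[3.0, 1.0, 0.1], [-3.0, 1.0, -0.1],
                   [3.0, -1.0, -0.1], [-3.0, -1.0, 0.1]])
cum = cumulative_contribution(A_demo)
k = components_needed(cum, 0.99)
print(np.round(cum, 4), k)  # first component about 0.899, two components pass 0.99
```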
5.4 Principal component loading analysis of all grade samples
In principal component analysis, the score on the n-th principal component is computed as $t_{in} = \sum_{j=1}^{p} p_{nj} a_{ij}$, where $p_{nj}$ is called the loading (Loading) of variable $a_{ij}$: the larger the loading, the stronger the correlation between the principal component and that variable. The variable $a_{ij}$ is the response of the j-th sensor in the sensor response matrix. After principal component analysis of the sensor response signals of the different tea grades, the first four principal components account for 99% of the variation in the tea intelligent fingerprint. Fig. 6 plots the loadings of the first four principal components against the sensors and shows the relation between each principal component and the sensors.
As Fig. 6 shows, for PC1, which carries the most tea information (93%), the sensors with the largest loadings are LY2/G, LY2/AA, LY2/GH and LY2/gCTL. For the second principal component, besides sensor LY2/AA, sensors P10/1, P10/2, P40/1, T40/1 and TA/2 also show large correlations. For the third principal component, sensors LY2/LG, LY2/G, LY2/AA and LY2/GH show large correlations; for the fourth, sensors LY2/AA, T30/1, T70/1 and T40/1.
5.5 Selecting the number of principal components for SIMCA grade modeling
Modeling with the similarity classification method (SIMCA) first builds a principal component model for each class of samples, so that samples of the same class gather in the same region of space. Table 5 lists the contribution rates of each grade's principal component model for different numbers of principal components. The first principal component contribution rate of every grade is above 99%, and for almost all grades the first five principal components essentially represent the main information of the samples.
The SIMCA algorithm is based on building a principal component analysis model for each class. The changes in the principal components of the sensor response signals show the trend of the tea quality features more directly, and determining the number of principal components is the key to building a good model. Because SIMCA is concerned with the similarity within each grade, and each principal component represents the variation of the calibration samples of one grade, the earlier principal components contain richer grade features and contribute more to classification. Selecting the first several principal components therefore gives the best classification, and the more grade features the selected components contain, the better the modeling and prediction.
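The class-modeling idea can be illustrated with a deliberately simplified sketch: one PCA model per class, and a new sample is assigned to the class whose principal component subspace leaves the smallest residual. Full SIMCA implementations usually compare residual variances with a statistical test, which is omitted here; this shortcut, the function names and the toy data are all assumptions, not the patented procedure.

```python
import numpy as np

def fit_class(X, n_components):
    """Fit a PCA class model: return the class mean and the loading rows."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def residual_distance(x, model):
    """Distance from x to the class subspace (norm of the unexplained part)."""
    mean, P = model
    xc = np.asarray(x, dtype=float) - mean
    return float(np.linalg.norm(xc - (xc @ P.T) @ P))

def classify(x, models):
    """Assign x to the class whose model leaves the smallest residual."""
    return min(models, key=lambda name: residual_distance(x, models[name]))

# Made-up classes: "a" varies along the first axis, "b" along the second.
class_a = [[-2, 0.1, 0], [-1, -0.1, 0.05], [0, 0.05, -0.05], [1, -0.05, 0], [2, 0, 0]]
class_b = [[0, -2, 5], [0.1, -1, 5.05], [-0.05, 0, 4.95], [0, 1, 5], [0.05, 2, 5]]
models = {"a": fit_class(class_a, 1), "b": fit_class(class_b, 1)}
print(classify([1.5, 0.0, 0.02], models), classify([0.0, 1.5, 5.02], models))  # a b
```

Each class keeps its own number of components, which is why the choice of that number per grade matters so much in the discussion that follows.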
However, selecting too many principal components causes the model to overfit. In this invention, cross-validation is used to make a preliminary determination of the optimal number of principal components for each grade model: the smaller number of principal components is chosen where the predicted residual error sum of squares (PRESS) changes little. PRESS decreases as principal components are added, but beyond a certain number it rises again because of overfitting. Fig. 7 shows PRESS against the number of principal components for each grade model. The PRESS values of premium grade and special grade at one and two principal components are very large and are therefore not all drawn. For premium grade, PRESS is smallest at 9 principal components and changes little between 5 and 8. For special grade, PRESS is smallest at 7 and changes little at 5 and 6. For first grade, PRESS is smallest at 6 and changes little at 4 and 5. For second grade, PRESS is smallest at 6 and changes little at 4 and 5.
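The selection rule described here (take the smallest number of components whose PRESS is already close to the minimum) can be written as a small helper. The 5% tolerance and the PRESS values in the example are invented; the text does not state a numeric threshold.

```python
def choose_n_components(press, rel_tol=0.05):
    """Return the smallest component count whose PRESS is within rel_tol of the minimum.

    press[k - 1] is the PRESS value obtained with k principal components.
    """
    best = min(press)
    for k, value in enumerate(press, start=1):
        if value <= best * (1 + rel_tol):
            return k

# Made-up PRESS curve: falls quickly, flattens, then rises again with overfitting.
press_demo = [10.0, 4.0, 2.05, 2.0, 2.2]
print(choose_n_components(press_demo))  # 3
```

Here the minimum is at four components, but three are chosen because the extra component lowers PRESS only marginally.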
5.6 Establishment and prediction of the SIMCA tea grade model
The predictive performance of the SIMCA grade model is very important; it shows whether the model can be applied to new data. A good model can describe data similar to the modeling data. Testing means feeding new, similar data into the model and checking whether the prediction error meets the predetermined requirement, which confirms that the chosen number of principal components is reasonable.
Prediction tests are of two kinds. External validation uses entirely new prediction data to test the model; internal validation uses the modeling data themselves. In theory, the predictive ability of a model can be tested only with entirely new data, but cross-validation (Cross validation) also gives reasonable results.
When samples are few, cross-validation makes more effective use of the limited samples, although it is computationally slower than external validation. In cross-validation, the same samples are used both to build the model and to test it. The basic idea is as follows: a certain number of samples are first held out from the calibration set and a calibration model is built from the rest; the held-out samples are then fed into the model to obtain the prediction error. This is repeated until every sample has been held out and predicted once, and the overall residual variance and mean squared error are then computed from the prediction errors of all the models. Cross-validation is a very good internal validation method. Like external validation, it aims to test the model with independent data; its main advantage is that, unlike external validation, it does not set data aside solely for testing and so does not waste data.
Cross-validation can be divided into full cross-validation (full cross validation), segmented cross-validation (segmented cross validation) and other variants. Full cross-validation, the earliest form, holds out only one sample from the whole set as the prediction sample each time a model is built and uses the other samples for modeling; this is repeated until every sample has been held out once to test the model. Because full cross-validation takes a great deal of time and is slow, segmented cross-validation instead divides all the samples into several segments and validates with each segment in turn.
Nevertheless, full cross-validation is widely used because it is effective. First, it estimates the actual predictive ability of the model: although it is an internal validation, the predicted sample does not take part in modeling, so it simulates the prediction of an unknown sample. Second, the more samples there are in the calibration set and the fewer that are removed for each model, the better the estimate.
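The difference between full (leave-one-out) and segmented cross-validation comes down to how the held-out indices are generated, as in this sketch. The interleaved assignment of samples to segments is an assumption; contiguous or random segments would serve equally well.

```python
def leave_one_out(n):
    """Yield (train, test) index lists; each sample is held out exactly once."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]

def segmented(n, n_segments):
    """Yield (train, test) index lists with the samples split into n_segments groups."""
    for s in range(n_segments):
        test = list(range(s, n, n_segments))      # every n_segments-th sample
        held_out = set(test)
        yield [j for j in range(n) if j not in held_out], test

loo_folds = list(leave_one_out(5))
seg_folds = list(segmented(10, 3))
print(len(loo_folds), [test for _, test in seg_folds])
# 5 [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

Full cross-validation builds as many models as there are samples, whereas the segmented form builds only one model per segment, which is the speed trade-off the text describes.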
The predictive ability of a model is usually checked by full cross-validation of the calibration set and external prediction of the prediction set. Full cross-validation evaluates the predictive ability of the model for the calibration set and is a self-check; external prediction evaluates the predictive ability for the prediction-set samples. In general, performance under full cross-validation is higher than under external prediction. Full cross-validation indicates, to some extent, the classification ability of the model and guides parameter selection, whereas external prediction is the more telling index: it reflects the robustness and adaptability of the chosen characteristic variables and of the model. Table 6 gives the results of SIMCA calibration modeling for the four grades.
Table 6 shows that the recognition rate of the four-grade model is only a little over 70%, which is not high, mainly because the aroma features of premium-grade and special-grade tea are very close and degrade the predictive performance of the overall model. A model discriminating only these two grades reaches a recognition rate of only about 67%, which shows that the samples of the two grades overlap seriously. The reason is that premium and special grade are distinguished mainly from a commercial-tea point of view: they differ very little in aroma, taste and picking time and differ mainly in appearance, such as the regularity and size uniformity of the leaves. Tea that is free of defects and uniform is graded premium, and the remaining pre-Qingming tea is graded special. The odour features of premium and special grade are therefore very close.
To study the detection ability of the electronic nose further, the premium-grade and special-grade samples of the four-grade set were merged into a single grade called "premium-special", and a three-grade SIMCA discrimination model was built with first and second grade. The model predicts very well: the recognition rates of the calibration set and the prediction set reach 93.43% and 92.72%, both above 92%. Three-grade SIMCA discrimination models were also built separately for premium, first and second grade and for special, first and second grade. These three-grade models all have strong recognition ability, with recognition rates above 90%, which confirms that the poor performance of the four-grade model is caused by the overlap of premium-grade and special-grade sample information. In addition, SIMCA pattern recognition essentially builds the tea grade model with linear discrimination, and tea recognition has not reached 100%. This may be because storage time, storage conditions and the characteristics of the sensor response signals introduce nonlinear information into the acquired signals, so other, nonlinear pattern recognition methods could be tried in future work. At present these three-grade models essentially meet market testing needs.
In the principal component plot of Fig. 3(a), the second-grade sample set is the most widely separated from the other sample sets and can be distinguished very clearly by eye. A two-class SIMCA model of first and second grade gives recognition rates of 100% for both the calibration set and the prediction set, which shows that first-grade and second-grade sample information differs greatly and that this model is well suited to wider application.
6 Methods for selecting sensors for intelligent sensory spectrum features
In an electronic nose, sensor response performance mainly covers whether the same sensor responds stably to the same sample and whether it discriminates well between different samples.
The optimal combination method uses the electronic nose to collect odour response signals of samples of different quality, screens and groups the sensors preliminarily by response performance through analysis of variance of the sensor response values, then permutes and combines the grouped sensors and, using the discrimination index DI of the principal component analysis result as the criterion, finally determines the sensor array most effective for sample classification. Although the method does use combination to some extent, the combination is performed on the grouped sensor arrays after preliminary rejection, so it does not achieve a globally optimal combination.
The loading value method treats the sensors as the objects of analysis: principal component analysis is applied to the sensor responses for different samples, and sensors with nearly identical function are identified from the principal component plot (which is also the loading plot of the sensors) and rejected. Although the method avoids adding redundant sensors, it does not analyse the response performance of the selected sensors, that is, the repeatability of a sensor's response to the same sample and its discrimination between different samples.
The genetic algorithm (Genetic Algorithms, abbreviated GA) is an optimisation method that simulates heredity and evolution in nature, based on Darwin's theory of evolution by natural selection and survival of the fittest. It needs no derivatives, performs stochastic global optimisation, avoids being trapped in local minima and is easy to implement. Its basic idea is to treat each possible solution (a particular sensor combination) in the problem domain (the set of multi-sensor combinations) as an individual, or chromosome, of a population, and to encode each individual as a binary string. The genetic algorithm evaluates chromosomes by their "fitness value": a chromosome with a large fitness value has a high probability of being selected and one with a small fitness value a low probability, and the selected chromosomes enter the next generation. Chromosomes in the next generation produce new chromosomes ("offspring") through genetic operations such as crossover and mutation. After a number of generations the algorithm converges to the best chromosome, which is the optimal or near-optimal solution of the problem, that is, the selected optimal sensor array. Implementing a genetic algorithm mainly involves the following basic elements: parameter encoding, variable selection, population initialisation, fitness function design, genetic operator design and the convergence criterion. The genetic operations, the key step, comprise three operators: selection, crossover and mutation. The flow is shown in Fig. 8.
The present invention uses the genetic algorithm to select and optimise the sensors used in building the grade, producing-area and cultivar models. All genetic algorithm calculations are performed by a self-written MATLAB 7.0 program; the key parameters are listed in Table 7. The steps of the algorithm are as follows:
(1) Select suitable parameters: population size 40, crossover probability pc = 0.6, mutation probability pm = 0.1, and terminating generation T = 200.
(2) Set the generation counter k = 0 and randomly generate the initial population.
(3) Chromosome encoding: all sensors are binary-encoded, with each sensor as one gene (18 genes in total). A gene value of 1 means the sensor is included in modeling; a value of 0 means it is excluded. One combination of gene values is called a chromosome.
(4) Determine the fitness function F(k): this experiment evaluates the predictive ability of the model by cross-validation and requires the recognition rate of the model to be as large as possible, so the fitness function is based on the cross-validated recognition rate of the model.
(5) Chromosome selection: the usual roulette-wheel method is used to pass the information of parent chromosomes with large fitness values to the next generation.
(6) Chromosome crossover: single-point crossover is used. A certain number of chromosome pairs are chosen at random as parents according to the predetermined crossover probability pc; a crossover point is then chosen at random and the gene strings to the right of the crossover point are exchanged between the parents to produce new offspring; finally, the offspring chromosomes replace the parent chromosomes to form a new population (see Fig. 9). This is the main way of producing new individuals and determines the global search ability of the genetic algorithm.
(7) Chromosome mutation: basic bit mutation is used. Genes of a chromosome are changed with the predetermined probability pm, that is, 1 and 0 are interchanged, and the mutated offspring chromosome replaces its parent (see Fig. 10). Mutating the individuals obtained after crossover gives the next-generation population. This is an auxiliary way of producing new individuals; it prevents premature convergence and improves the local search ability of the algorithm.
(8) Stopping criterion: check whether the preset maximum number of generations (Genmax) or the optimal solution has been reached; if so, stop; otherwise return to step (4).
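Steps (1) to (8) can be sketched in Python as below. This is not the inventors' MATLAB program. The parameter values come from step (1), but the fitness function here is a stand-in: in the patent the fitness is tied to the cross-validated recognition rate of the model, whereas `toy_fitness` simply rewards agreement with a hypothetical mask `TARGET`. Applying pm to every gene independently and keeping track of the best chromosome seen are also assumptions.

```python
import random

N_SENSORS, POP_SIZE, PC, PM, GENERATIONS = 18, 40, 0.6, 0.1, 200

# Hypothetical "useful sensor" mask, used only so the sketch has something to optimise.
TARGET = [1] * 15 + [0] * 3

def toy_fitness(chrom):
    """Stand-in fitness (always positive): 1 + number of genes matching TARGET."""
    return 1 + sum(g == t for g, t in zip(chrom, TARGET))

def roulette(population, fitnesses, rng):
    """Roulette-wheel selection: sample with probability proportional to fitness."""
    chosen = rng.choices(population, weights=fitnesses, k=len(population))
    return [list(c) for c in chosen]            # copy so duplicates are independent

def crossover(population, rng):
    """Single-point crossover on randomly paired chromosomes, applied with rate PC."""
    rng.shuffle(population)
    for a, b in zip(population[0::2], population[1::2]):
        if rng.random() < PC:
            cut = rng.randrange(1, N_SENSORS)
            a[cut:], b[cut:] = b[cut:], a[cut:]  # swap the gene strings right of the cut

def mutate(population, rng):
    """Bit mutation: flip each gene with probability PM (per-gene rate assumed)."""
    for chrom in population:
        for i in range(N_SENSORS):
            if rng.random() < PM:
                chrom[i] = 1 - chrom[i]

def run_ga(fitness, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(N_SENSORS)] for _ in range(POP_SIZE)]
    best = max(population, key=fitness)[:]
    for _ in range(GENERATIONS):
        fitnesses = [fitness(c) for c in population]
        population = roulette(population, fitnesses, rng)
        crossover(population, rng)
        mutate(population, rng)
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate[:]
    return best

best = run_ga(toy_fitness)
selected_sensors = [i for i, gene in enumerate(best) if gene == 1]
```

To use the sketch on real data, `toy_fitness` would be replaced by a function that builds the classification model from the sensors whose genes are 1 and returns its cross-validated recognition rate.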
6.1 Sensor selection for the grade model
For the sensor response spectra of the grade model, three rounds of the genetic algorithm showed that the three sensors LY2/LG, P40/1 and TA/2 were used least often in each genetic run, so these three sensors were rejected. The remaining 15 sensors (LY2/G, LY2/AA, LY2/GH, LY2/gCTL, LY2/gCT, T30/1, P10/1, P10/2, T70/2, PA/2, P30/1, P40/2, P30/2, T40/2 and T40/1) were used to build the models for the different grades; the modeling results before and after sensor rejection are given in Table 8. For the first/second-grade model, whose samples differ greatly, the recognition rate stays at 100% after sensor rejection. For the premium/special-grade model, whose samples differ very little, model performance hardly changes: the calibration set stays above 67% and the prediction set changes little. For the four-grade model of premium, special, first and second grade, the prediction-set recognition rate does not change after sensor rejection and remains about 70%. In the three-class model of premium, first and second grade, the recognition rates of the calibration and prediction sets rise from 92.11% and 90.65% to 92.83% and 92.09% respectively. The predictive performance of the three-class model of special, first and second grade decreases slightly but stays very close, at about 95% both before and after sensor rejection. The prediction recognition rate of the three-class model of premium-special, first and second grade also rises, from 92.73% with all sensors to 93.20% with 15 sensors. Thus sensor selection does not degrade the grade discrimination models, and some even improve, while the number of sensors is reduced.
To study further why these sensors were rejected, the response performance of the electronic nose sensors was analysed in detail. Response performance is measured mainly by whether a sensor responds consistently to samples of the same class and whether it discriminates well between samples of different classes. Following the principle of analysis of variance, each sensor is taken as a factor and its responses to the different samples as levels, and a test of homogeneity of variance is carried out to ensure that the data satisfy the conditions for analysis of variance. One-way analysis of variance was then applied to the sensor data of all grade samples with the SPSS data analysis software to compute the F values (Table 9). The F value indicates how well a sensor distinguishes between classes of samples: the larger the F value, the greater the discrimination.
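The F value used to rank the sensors is the ordinary one-way ANOVA statistic, the ratio of between-group to within-group mean squares. The sketch below computes it for one sensor from its responses grouped by grade; the numbers are invented, and the homogeneity-of-variance check performed in SPSS is not reproduced.

```python
def one_way_f(groups):
    """One-way ANOVA F statistic for a list of groups of observations."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(x for g in groups for x in g) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Made-up responses of one sensor to three classes of samples.
responses = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_value = one_way_f(responses)
print(f_value)  # 13.0
```

Running this once per sensor and sorting the results reproduces the kind of ranking used to single out the weakest sensors.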
Although the F values of all sensors exceed F0.05 = 2.60, that is, every sensor discriminates significantly among the four grades, a comparison of the F values shows that those of LY2/LG, TA/2 and T40/1 are all below 25, while the fourth smallest, that of P10/1, is more than 5 times those of these three sensors. The F value of LY2/LG is the smallest, only 8.003, so this sensor is rejected.
In the loading plot of the four-grade sample data after principal component analysis (Fig. 11), TA/2 and T40/1 lie close together and therefore play a similar classification role, but the loading of TA/2 on PC2 is smaller than that of T40/1, so sensor TA/2 can be rejected. By the same principle, sensors P40/1 and P10/1 almost overlap in the loading plot; taking the combinatorial optimisation of the sensors into account as well, sensor P40/1 is finally rejected.
6.2 Sensor selection, optimization and screening analysis in the origin models
(1) Division of the origin-model samples into calibration and prediction sets
To keep the origin models comparable, this section considers tea from different origins only under the same grade and the same tree species (cultivar). Among the 617 tea samples collected there are four such origin models: (1) superfine Longjing 43 tea from Hupao Houshan (LHT) and Meijiawu (LMT); (2) superfine population-cultivar tea from Yangmeiling (QYT) and Meijiawu (QMT); (3) fine-grade Longjing 43 tea from Yangmeiling (LYJ) and Wengjiashan (LWJ); (4) fine-grade population-cultivar tea from Hupao Houshan (QHJ), Longwu (QLJ) and Wengjiashan (QWJ). For each model, two thirds of the samples were selected at random as the calibration set and the remaining third was used as the prediction set; the sample distribution is shown in Table 10.
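The random two-thirds / one-third division can be sketched as below. The function name, the seed and the sample identifiers are hypothetical; the actual sample counts are those of Table 10 and are not reproduced here.

```python
import random

def split_calibration_prediction(sample_ids, seed=0):
    """Randomly assign two thirds of the samples to calibration, the rest to prediction."""
    rng = random.Random(seed)            # fixed seed so the split is reproducible
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    n_cal = round(len(shuffled) * 2 / 3)
    return shuffled[:n_cal], shuffled[n_cal:]

# Hypothetical sample identifiers for one origin class.
lht = [f"LHT-{i:02d}" for i in range(1, 31)]
calibration, prediction = split_calibration_prediction(lht)
```

Splitting each origin class separately, rather than the pooled samples, keeps the class proportions the same in both sets; whether the patent did so is not stated.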
(2) Electronic Nose of place of production model responds collection of illustrative plates and principal component analysis
Figure 12 is four place of production model respective average response collection of illustrative plates, as seen from the figure model LHT-LMT and model QYT-QMT Collection of illustrative plates distinguish very big, the collection of illustrative plates of model LYJ-LWJ is at sensor LY2/G, LY2/AA, LY2/GH, LY2/gCTL and P30/2 Place differs greatly, and in model QHJ-QLJ-QWJ, the average fingerprint profile variation in three places of production is the least.
From principal component scores Figure 13, it is also possible to see in model QYT-QMT there being the most substantially the sample in each place of production Region, and sample variation degree between two places of production is maximum;Although each place of production sample also has respective district in model LHT-LWT Territory, but between two places of production, there is no obvious distinguishing limit;In model LYJ-LWJ, two places of production not only do not have obvious boundary, simultaneously Also have and intersect and overlapping region;And sample cross in model QHJ-QLJ-QWJ is the most, almost it is hardly formed respective product Ground classification.
(3) Sensor selection in the origin models
For origin model QYT-QMT alone, three rounds of the genetic algorithm selected seven sensors, LY2/G, LY2/AA, T30/1, P10/1, P40/1, T70/2 and PA/2, and rejected the 11 least frequently used sensors: LY2/LG, LY2/GH, LY2/gCTL, LY2/gCT, P10/2, P30/1, P40/2, P30/2, T40/2, T40/1 and TA/2. Origin discrimination was carried out with the selected sensors; the results are shown in Table 11. After the 11 sensors were rejected, the predictive performance of the resulting origin model for superfine population-cultivar tea from Yangmeiling and Meijiawu remained 100%, and the numbers of principal components fell from 5 and 6 to 2, so the model is simpler and needs far fewer sensors. The average fingerprint spectra and the principal component plot of this model suggest that the samples of the two origins differ so much that every sensor performs well; the selection merely reduces the number of sensors needed for modeling as far as possible while keeping model performance unchanged. Here, seven sensors are enough to build a good origin model for superfine population-cultivar tea from Yangmeiling and Meijiawu.
For models LHT-LMT, LYJ-LWJ and QHJ-QLJ-QWJ, three rounds of the genetic algorithm on the sensor response spectra of each model found the same four sensors, LY2/LG, PA/2, P30/1 and TA/2, to be used least frequently in every run, so these four sensors were rejected. The remaining 14 sensors (LY2/G, LY2/AA, LY2/GH, LY2/gCTL, LY2/gCT, T30/1, P10/1, P10/2, P40/1, T70/2, P40/2, P30/2, T40/2 and T40/1) were used to build the origin models; the modeling results before and after sensor rejection are shown in Table 11.
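The idea of running a genetic algorithm several times and rejecting the sensors that are used least often can be sketched as follows. Everything specific here is an assumption made for illustration: the patent does not give the chromosome encoding, fitness function, population size or operators, so the sketch uses a bit mask per sensor, a nearest-centroid recognition rate with a small per-sensor penalty, one-point crossover, bit-flip mutation and elitism, on deterministic toy data with six hypothetical sensors.

```python
import random

SENSORS = ["S1", "S2", "S3", "S4", "S5", "S6"]  # hypothetical sensor names

def make_data():
    """Deterministic toy data: sensors 0-2 separate two classes, sensors 3-5 do not."""
    rows, labels = [], []
    for label in (0, 1):
        for j in range(10):
            informative = [label + 0.01 * j] * 3
            uninformative = [0.1 * ((j * (s + 1)) % 7) for s in range(3)]
            rows.append(informative + uninformative)
            labels.append(label)
    return rows, labels

def accuracy(mask, rows, labels):
    """Nearest-centroid recognition rate using only the sensors switched on in mask."""
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    centroids = {}
    for c in sorted(set(labels)):
        members = [r for r, lab in zip(rows, labels) if lab == c]
        centroids[c] = [sum(r[i] for r in members) / len(members) for i in cols]
    hits = 0
    for r, lab in zip(rows, labels):
        def sq_dist(c):
            return sum((r[i] - m) ** 2 for i, m in zip(cols, centroids[c]))
        hits += min(centroids, key=sq_dist) == lab
    return hits / len(rows)

def fitness(mask, rows, labels):
    """Recognition rate minus a small penalty per sensor used (assumed objective)."""
    return accuracy(mask, rows, labels) - 0.01 * sum(mask)

def run_ga(rows, labels, n, rng, pop_size=20, generations=30, p_mut=0.1):
    """One GA round; returns the best mask and per-sensor usage counts."""
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    usage = [0] * n
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, rows, labels), reverse=True)
        parents = pop[: pop_size // 2]
        for mask in parents:                      # count sensors used by the fitter half
            for i, bit in enumerate(mask):
                usage[i] += bit
        children = [pop[0][:]]                    # elitism: carry the best mask over
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n - 1)           # one-point crossover
            child = a[:cut] + b[cut:]
            children.append([bit ^ (rng.random() < p_mut) for bit in child])  # mutation
        pop = children
    return max(pop, key=lambda m: fitness(m, rows, labels)), usage

rows, labels = make_data()
rng = random.Random(0)
total_usage = [0] * len(SENSORS)
for _ in range(3):                                # three rounds, as described in the text
    best, usage = run_ga(rows, labels, len(SENSORS), rng)
    total_usage = [t + u for t, u in zip(total_usage, usage)]
least_used_first = sorted(zip(SENSORS, total_usage), key=lambda pair: pair[1])
```

Sensors at the head of `least_used_first` would be the candidates for rejection. Here "three rounds" is interpreted as three GA runs whose usage counts are summed, which is one possible reading of the text.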
For model QHJ-QLJ-QWJ (fine-grade population-cultivar tea from Hupao Houshan, Longwu and Wengjiashan), whose full-spectrum modeling results were the least satisfactory, sensor selection raised the overall recognition rates of the calibration and prediction sets from 71.43% and 67.35% to 79.59% and 69.39%, respectively. For model LYJ-LWJ the calibration-set recognition rate is 93.85% both before and after sensor rejection, while that of the prediction set rises from 87.88% to 90.91%. With the number of sensors reduced to 14, the prediction result of model LHT-LMT no longer reaches the original 100% but is still 96.97%, above 95%, which is fully adequate for practical application.
Following the principle of analysis of variance, each sensor is treated as a factor and its responses to the different samples in the model as levels, and a homogeneity-of-variance test is carried out. Table 12 gives the F tests of sensors LY2/LG and TA/2 in origin models LHT-LMT and LWJ-LYJ. For these two models F0.05 = 3.84, and these two sensors do not discriminate significantly between the origins, so they can be rejected in both models.
Likewise, in the loading plot obtained by principal component analysis of the sample data of origin model LHT-LMT (Figure 14(a)), PA/2 and P30/1 each lie close to another sensor (marked in red and in blue, respectively) and so play a similar role; on the basis of combinatorial optimization, these two sensors are rejected in model LHT-LMT. By the same principle, in the loading plot of model LYJ-LWJ, PA/2 lies close to T70/2 and plays a similar role, and P30/1 lies very close to P40/2 with similar loading values and likewise plays a similar role, so these two sensors (PA/2 and P30/1) are also rejected.
6.3 Sensor selection, optimization and screening analysis in the tree-species (cultivar) models
(1) Division of the cultivar-model samples into calibration and prediction sets
To keep the cultivar models comparable, this study considers tea of different cultivars only under the same grade and the same origin. Among the 617 tea samples collected there are two cultivar models: (1) Longjing 43 (LMT) and the population cultivar (QMT) for superfine tea from Meijiawu; (2) Longjing 43 (LWJ) and the population cultivar (QWJ) for fine-grade tea from Wengjiashan. For each model, two thirds of the samples were selected at random as the calibration set and the remaining third was used as the prediction set; the sample distribution is shown in Table 13.
(2) Electronic nose response spectra and principal component analysis of the cultivar models
Figure 15 shows the average response spectra of the cultivars in the two cultivar models; the cultivars are difficult to tell apart directly from the spectra. In the principal component plots of Figure 16 the samples of the different cultivars overlap, so the cultivar cannot be judged by eye.
(3) Sensor selection in the cultivar models
For cultivar model LMT-QMT, three rounds of the genetic algorithm on the sensor response spectra found five sensors, LY2/AA, LY2/GH, LY2/gCT, T30/1 and TA/2, to be used least frequently in every run, so these five sensors were rejected. The remaining 13 sensors (LY2/LG, LY2/G, LY2/gCTL, P10/1, P10/2, P40/1, T70/2, PA/2, P30/1, P40/2, P30/2, T40/2 and T40/1) were used to build the cultivar model; the modeling results before and after sensor rejection are shown in Table 14. With this array of 13 well-performing sensors, the overall recognition rate of the Longjing 43 versus population-cultivar model for superfine Meijiawu tea increases: the calibration set rises from 95.38% to 96.92% and the prediction set from 93.94% to 96.97%, very close to the calibration-set rate, which shows that the model is highly stable.
For cultivar model LWJ-QWJ, three rounds of the genetic algorithm on the sensor response spectra found four sensors, P10/1, P40/1, T40/1 and TA/2, to be used least frequently in every run, so these four sensors were rejected. The remaining 14 sensors (LY2/LG, LY2/G, LY2/AA, LY2/GH, LY2/gCTL, LY2/gCT, T30/1, P10/2, T70/2, PA/2, P30/1, P40/2, P30/2 and T40/2) were used to build the cultivar model; the modeling results before and after sensor rejection are shown in Table 14. As the table shows, although the number of sensors falls from 18 to 14, the predictive performance of this cultivar model does not change: the calibration set and prediction set remain at 92.31% and 93.34%, respectively.
One-way analysis of variance of the sensor data of each cultivar model shows that the five sensors rejected in model LMT-QMT discriminate very poorly, with F values all below F0.05 = 3.84 (Table 15); in model LWJ-QWJ, the four sensors rejected are all those whose discrimination is not significant (Table 16).
The grade, origin and cultivar models are built from different raw data and have different properties, so after the genetic algorithm is applied, the number of sensors used for modeling also differs. All grade models use 15 sensors. Among the origin models, the model for superfine population-cultivar tea from Yangmeiling and Meijiawu (QYT-QMT) is reduced to 7 sensors, and the other three origin models (LHT-LMT, LYJ-LWJ and QHJ-QLJ-QWJ) all use 14. Among the cultivar models, the Longjing 43 versus population-cultivar model for superfine Meijiawu tea (LMT-QMT) uses 13 sensors, and the Longjing 43 versus population-cultivar model for fine-grade Wengjiashan tea (LWJ-QWJ) uses 14.
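The sensor counts summarized above can be cross-checked against the rejection lists given in sections 6.1 to 6.3. The sketch below only does set arithmetic on the sensor names that appear in the text; the dictionary keys and variable names are labels chosen for this illustration.

```python
# The 18 sensor names that appear in the text.
ALL_SENSORS = [
    "LY2/LG", "LY2/G", "LY2/AA", "LY2/GH", "LY2/gCTL", "LY2/gCT",
    "T30/1", "P10/1", "P10/2", "P40/1", "T70/2", "PA/2",
    "P30/1", "P40/2", "P30/2", "T40/2", "T40/1", "TA/2",
]

# Sensors rejected in each model group, as listed in the text.
REJECTED = {
    "grade models": {"LY2/LG", "P40/1", "TA/2"},
    "origin QYT-QMT": {"LY2/LG", "LY2/GH", "LY2/gCTL", "LY2/gCT", "P10/2", "P30/1",
                       "P40/2", "P30/2", "T40/2", "T40/1", "TA/2"},
    "origin LHT-LMT, LYJ-LWJ, QHJ-QLJ-QWJ": {"LY2/LG", "PA/2", "P30/1", "TA/2"},
    "cultivar LMT-QMT": {"LY2/AA", "LY2/GH", "LY2/gCT", "T30/1", "TA/2"},
    "cultivar LWJ-QWJ": {"P10/1", "P40/1", "T40/1", "TA/2"},
}

retained = {m: [s for s in ALL_SENSORS if s not in r] for m, r in REJECTED.items()}
counts = {m: len(s) for m, s in retained.items()}

# Sensors rejected in every model group.
rejected_everywhere = set.intersection(*REJECTED.values())
```

The counts come out as 15, 7, 14, 13 and 14, matching the text, and TA/2 is the only sensor rejected in every model group.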
The present invention exploits the parallel optimization and global convergence of the genetic algorithm and applies it to screening the modeling sensors used in electronic nose analysis of tea quality. This effectively reduces the number of modeling sensors, simplifies the models, lowers the instrument's requirement for sensors and saves resources and instrument cost, while maintaining or further improving prediction accuracy.

Claims (1)

1. A method for detecting abnormal samples during pattern recognition analysis of tea quality using intelligent sensory signals, characterized in that: samples are measured by an electronic nose method, the electronic nose being a Fox 4000 electronic nose with an automatic headspace sampler produced by Alpha MOS (France); first, 1.00 g of dry Longjing tea is placed in each 20 mL headspace vial, 5 mL of room-temperature ultrapure water is added, and the vial is capped and sealed; every tea sample is prepared in this way and measured in turn; for each sample, the headspace vial is first sent to the preheating zone and heated for 900 s, after which 2.0 mL of headspace gas is withdrawn at an agitator speed of 500 rpm and a headspace temperature of 60 °C and injected at 2.0 mL/s into the electronic nose sensor array chamber, where it undergoes adsorption and desorption on the semiconductor material on the surfaces of the 18 metal oxide sensors, changing the sensor resistance; the sample gas remains in the sensor array chamber for 120 s and is sampled every 0.5 s, and the electronic nose software automatically records each data point;
it is determined whether the abnormal sample results from misoperation or an instrument abnormality; if so, it is corrected by re-measurement; if not, the abnormal sample is identified by the principal component analysis score plot method combined with the Mahalanobis distance method;
the principal component analysis score plot method eliminates the mutually overlapping parts of the coexisting information without losing the main spectral information; the principal component scores obtained reflect the similarity and particularity between samples, and the sample score plot reveals the internal features and clustering information of the samples, further showing whether each sample differs greatly within the large-class sample set; the principal components are calculated by nonlinear iterative partial least squares with leave-one-out cross-validation;
the Mahalanobis distance discrimination method is applied to the sensor response data, and the Mahalanobis distance of a spectrum sample is calculated as follows:
$$\bar{T} = \frac{\sum_{i=1}^{m} t_i}{m} \tag{1-1}$$
$$T_{cen} = T - \bar{T} \tag{1-2}$$
$$M = \frac{T_{cen}' \, T_{cen}}{m - 1} \tag{1-3}$$
$$MD_i = \left[ (t_i - \bar{T}) \, M^{-1} \, (t_i - \bar{T})' \right]^{1/2} \tag{1-4}$$
where $t_i$ is the spectrum score of calibration-set sample $i$; $T$ is the matrix of fingerprint-spectrum samples of the sampled tea; $\bar{T}$ is the mean matrix of the $m$ calibration-set samples; $T_{cen}$ is the mean-centered matrix of $T$; $M$ is the Mahalanobis matrix of the calibration-set samples; and $MD_i$ is the Mahalanobis distance of calibration-set sample $i$; the Mahalanobis distance threshold for outliers is determined from the allowable quantitative calibration error and the corresponding Mahalanobis distance, and after the spectral data are standardized, the Mahalanobis distance of each sample is given by:
$$h_{ii} = t_i^{T} \, (T^{T} T)^{-1} \, t_i \tag{1-5}$$
in intelligent sensory sensor detection, $h_{ii}$ expresses the degree of influence of sample $i$ on the regression model; the largest $h_{ii}$ indicates that the regression model depends strongly on sample $i$, and sample $i$ is then an abnormal sample.
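Formulas (1-1) to (1-5) can be followed step by step in a short Python sketch. It assumes two principal component scores per sample, so that a closed-form 2x2 inverse suffices, and it applies (1-5) to the mean-centered scores; the score values are hypothetical, with the last sample deliberately placed far from the others. The claim does not state a numerical threshold, so the sketch only ranks the samples.

```python
def inv2(A):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def quad_form(v, A):
    """v A v' for a 2-vector v and a 2x2 matrix A."""
    return sum(v[i] * A[i][j] * v[j] for i in range(2) for j in range(2))

def mahalanobis_and_leverage(T):
    """Apply (1-1) to (1-5) to an m x 2 list of principal component scores."""
    m = len(T)
    t_bar = [sum(row[j] for row in T) / m for j in range(2)]           # (1-1)
    T_cen = [[row[j] - t_bar[j] for j in range(2)] for row in T]       # (1-2)
    gram = [[sum(r[i] * r[j] for r in T_cen) for j in range(2)] for i in range(2)]
    M = [[g / (m - 1) for g in g_row] for g_row in gram]               # (1-3)
    M_inv, gram_inv = inv2(M), inv2(gram)
    md = [quad_form(r, M_inv) ** 0.5 for r in T_cen]                   # (1-4)
    h = [quad_form(r, gram_inv) for r in T_cen]                        # (1-5), centered scores assumed
    return md, h

# Hypothetical PC1/PC2 scores; the last sample is an artificial outlier.
scores = [
    (0.1, 0.2), (-0.2, 0.1), (0.0, -0.1), (0.2, -0.2),
    (-0.1, 0.0), (0.1, -0.1), (-0.2, -0.1), (3.0, 2.5),
]
md, h = mahalanobis_and_leverage(scores)
suspect = max(range(len(scores)), key=lambda i: h[i])   # index of the largest h_ii
```

With centered scores the two measures are tied together by $h_{ii} = MD_i^2 / (m - 1)$, so ranking by either one flags the same sample; here that is the last sample (index 7).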
CN201310323279.2A 2013-07-30 2013-07-30 A kind of method detecting exceptional sample during the pattern recognition analysis applying intelligent sensory signal to carry out tea leaf quality Active CN103487558B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310323279.2A CN103487558B (en) 2013-07-30 2013-07-30 A kind of method detecting exceptional sample during the pattern recognition analysis applying intelligent sensory signal to carry out tea leaf quality

Publications (2)

Publication Number Publication Date
CN103487558A CN103487558A (en) 2014-01-01
CN103487558B true CN103487558B (en) 2016-10-12

Family

ID=49827944

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310323279.2A Active CN103487558B (en) 2013-07-30 2013-07-30 A kind of method detecting exceptional sample during the pattern recognition analysis applying intelligent sensory signal to carry out tea leaf quality

Country Status (1)

Country Link
CN (1) CN103487558B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant