Application of Chemometric Methods in Search For Illicit Leuckart Amphetamine Sources

Analytica Chimica Acta 446 (2001) 107–114

Application of chemometric methods in searching for illicit Leuckart amphetamine sources
Waldemar Krawczyk a,∗ , Andrzej Parczewski b
a Central Forensic Laboratory of Police, Al. Ujazdowskie 7, 00-583 Warsaw, Poland
b Faculty of Chemistry, Jagiellonian University, Ul. Ingardena 3, 30-060 Cracow, Poland
Received 7 November 2000; accepted 25 June 2001

A chromatogram of the contaminants of illicit amphetamine is referred to as a contamination profile. The profile depends on the method and conditions of the drug synthesis. It therefore makes it possible to link a drug sample seized by the police to the source of illicit production, and to uncover links between dealers and users. In our study, statistical and chemometric methods were applied. The areas of the 15 most informative chromatographic peaks were selected and used as the original variables describing the drug samples. The selected peaks form a profile. As a rule, samples submitted to the police laboratory are classified using the correlation coefficient between the corresponding profiles as a measure of similarity. This preliminary classification, and the affiliation of a given profile to a class, is then verified by means of principal component analysis and cluster analysis. Different distances between profiles are used, including the Euclidean and Pearson distances. The data structure is visualized in the two- or three-dimensional space of the two or three most important principal components. For the purpose of this study, a total of nearly 1000 drug samples were tested. It has been concluded that amphetamine samples may be attributed to one source if the Euclidean distance between the corresponding profiles is less than 1 and the Pearson distance is less than 0.04. This approach proved very effective in the identification of drug traffic routes and clandestine laboratories by the police.
© 2001 Elsevier Science B.V. All rights reserved.
Keywords: Forensic research; Illicit amphetamine; Impurities profile; Correlation coefficient; Principal component analysis; Cluster analysis

∗ Corresponding author. Fax: +48-22-8497694.
E-mail addresses: (W. Krawczyk), (A. Parczewski).
0003-2670/01/$ – see front matter © 2001 Elsevier Science B.V. All rights reserved.
PII: S0003-2670(01)01273-9

1. Introduction

The synthesis of amphetamine is accompanied by various side reactions, which result in a certain level of impurities in the final product (i.e. amphetamine). The composition of the impurities is characteristic of the method and conditions used for the synthesis. Moreover, it has been experimentally proven that changes in the conditions of the synthesis significantly influence the composition of impurities. On the other hand, amphetamine samples obtained in a series of repeated syntheses, carried out according to the same recipe and using the same batch of substrates, are of similar composition.

A graphic or numerical representation of the composition of impurities in a drug (here amphetamine) is referred to as an impurities profile, and may include, for instance, a chromatogram. A profile helps to identify the method of amphetamine synthesis. In the present study, statistical and chemometric methods [1–5] were applied in order to classify amphetamine samples and to recognize a tested sample as belonging to a class of samples originating from the same source, e.g. the same batch produced by an illicit laboratory.

The Leuckart amphetamine synthesis proceeds according to Scheme 1 [6–8]. The main by-products include, among others, the following compounds: 4-methyl-5-phenylpyrimidine, 4-benzylpyrimidine, α,α-dibenzylmethylamine, N,N-di(β-phenylisopropyl)amine and N,N-di(β-phenylisopropyl)formamide. In addition to these, some unreacted substrates and impurities from the starting material may be encountered, e.g. formamide, formic acid, BMK and dibenzyl ketone [7,8].

Scheme 1. (a) Formamide; (b) hydrochloric acid; (c) diethyl ether, sulfuric acid.

The profiling of impurities as a tool to identify illicit drug samples has also been used by other authors. Some examples are contained in the "Proceedings of the International Symposium of Forensic Science", Tokyo, 1993 [9]. Mitsui and Okuyama used principal component analysis (PCA) in the evaluation of data from the GC/MS determination of amphetamine and methamphetamine in urine. Cluster analysis (CA) using the Euclidean distance was applied in a quantitative assay by comparing unknown samples with known ones. Okuyama and Mitsui used the results of GC/MS hemp analysis for classification purposes. CA was applied using the Euclidean distance for the 35 samples examined. Marumo et al. used trace element (i.e. inorganic) profiles for the characterization of seized methamphetamine samples (ICP/MS and AAS were applied in the trace analysis). In their evaluation of analytical data and comparison of samples, the authors used simple graphic representations. Inoue et al. examined impurity profiles of methamphetamine seized in Japan by means of capillary GC. Similarity between samples was tested by multivariate analysis using the Euclidean distance. Jonson and Strömberg applied the 'quotient' method to the first clustering step and a supervised learning method (SIMCA) for the classification of illicit Leuckart amphetamine samples on two hierarchical levels, trying to establish batch and source relations [10,11]. By means of impurity profiling, Perkal, Ng and Pearson were able to identify methamphetamine samples as being produced according to one of the three most common drug synthesis routes encountered in Australia. A total of 224 samples seized by police were examined. Three criteria were taken into account, including the Euclidean distance between the profiles, and canonical variable analysis was applied [12]. The main ideas used in the present study have been presented in a Ph.D. thesis [8] and in a book by Krawczyk [13].

2. Experimental

2.1. Sample preparation

Amphetamine impurities were extracted with 200 µl of n-heptane (containing 35 mg l⁻¹ of diphenylamine as internal standard) from 2 ml of a buffer solution at pH 7.0 containing 200 mg of amphetamine sulfate. In the extraction step, the pH was checked and adjusted to 7.0 if necessary.

2.2. Gas chromatography

An HP 5890 series II gas chromatograph (Hewlett-Packard) with a flame ionization detector (FID) and an HP 7673 automatic sampler was used with the following set-up: splitless injection mode (1 min), with 1 µl samples injected at an injection port temperature of 250 °C; a 25 m HP-5 column with 0.32 mm inner diameter and 0.52 µm film thickness; oven temperature: 100 °C for 1 min, a ramp at 12 °C/min to 240 °C, where it was held for 5.5 min, then a ramp at 15 °C/min to the final temperature of 300 °C, held for 10 min.

2.3. Data analysis

Data were analyzed by means of special programs developed at the Central Forensic Laboratory of the Police (for preliminary classification of profiles) and using STATISTICA (StatSoft) (PCA, CA, data visualization).

3. Methodology

In general, the procedure of amphetamine profiling presented in this paper consists of the following steps.
• Gas-chromatographic analysis of amphetamine samples submitted to the laboratory. Computer collection of the data (chromatograms).
• Statistical handling of the data: calculation of the correlation coefficients and distances (similarity measures) between profiles, application of PCA and CA.
• Attribution of the profiles of the tested samples to the appropriate similarity groups (categories, classes, clusters); each group is presumed to have originated from the same illicit source.

In practice, profiling is carried out as follows. Impurities are separated from the drug (amphetamine), then pre-concentrated and analyzed by GC. In this way the impurities profile (chromatogram) of a tested sample is obtained. Out of several dozen chromatographic peaks only fifteen, the most characteristic ones, were selected for further examination. They correspond to organic compounds (impurities) whose composition depends significantly on the method and conditions of amphetamine synthesis, as well as on the subsequent steps of amphetamine treatment until the final product is obtained. The areas of the 15 selected chromatographic peaks are entered into the database, which is built up 'dynamically' as samples are submitted to the laboratory for analysis.

The first step of data handling is a preliminary grouping of the profiles. To achieve this, correlation coefficients (r) between the profiles are calculated. High coefficient values (r > 0.95) indicate a high probability that the corresponding drug samples originate from the same source and can therefore be assigned to the same class. A profile may tentatively be assigned to a class even if 0.8 < r < 0.95, but only if a visual comparison of the entire chromatograms shows a high similarity between them.

Following the procedure described above, nearly 1000 chromatograms were separated into 37 groups. In order to verify the correctness of this preliminary classification and to visualize the data structure, PCA and CA were applied both to the original data set and to the pre-selected groups of profiles.

To sum up, the classification and visualization of the data were carried out according to the following scheme.
• Tentative classification according to the correlation coefficients between the profiles as well as their visual comparison.
• PCA and CA using all fifteen original variables as well as the principal components (PCs) which explain 90% of the total variance, and a visual presentation of the data structure by means of the two or three most informative PCs.
• Application of CA for verification of the tentative classification of a profile to a given category.

4. Results and discussion

4.1. Principal component analysis (PCA)

The purpose of the investigation was to examine and visualize the data structure of the 37 groups, I–XXXVII, pre-selected according to the correlation coefficient, using the most important PCs. The application of PCA was preceded by data standardization. The results presented in Table 1 concern 11 groups, each containing at least 15 members (profiles). For each group of profiles, the number of eigenvalues higher than 1 and the percentage of total variance explained by the corresponding PCs are given. The numbers of PCs which explain a given fraction of the total variance are also presented.

In Table 1, it can be seen that for the pre-selected groups of profiles there are 2–5 eigenvalues higher than 1. The corresponding PCs

Table 1
The results of PCAᵃ

Group       NP     EV > 1   Variance (%)   Number of PCs which explain total variance
                                           >90%   >95%   >98%
I           141    5        84.8           7      8      10
IV          177    5        77.8           8      10     11
VII         29     3        81.0           3      5      7
VIII        24     4        78.1           7      9      11
X           37     5        81.1           7      9      10
XVI         69     5        74.2           8      10     12
XIX         25     3        82.7           5      7      9
XXII        45     4        81.2           6      8      9
XXIV        42     5        88.9           6      7      8
XXVIII      16     4        90.0           4      5      7
XXXIII      30     2        86.2           3      5      6
Full base   1000   6        65.9           11     12     13

ᵃ Columns contain the following data: 1: symbols of the groups; 2: number of profiles in the group (NP); 3: number of eigenvalues higher than 1 (EV > 1); 4: percent of total variance explained by the PCs corresponding to the eigenvalues in column 3; 5: number of PCs which explain a given fraction (%) of total variance.
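
The summary columns of Table 1 can be illustrated with a short sketch. It assumes that the eigenvalues of the correlation matrix of the standardized data are already available; the function counts the eigenvalues higher than 1, reports the percentage of total variance they explain, and finds how many leading PCs are needed to exceed 90, 95 and 98% of the total variance. The function name `summarize_eigenvalues` and the eight example eigenvalues are hypothetical and are not taken from the study; Python is used only for illustration, since the original calculations were performed in STATISTICA.

```python
def summarize_eigenvalues(eigenvalues, thresholds=(90.0, 95.0, 98.0)):
    """Return (number of eigenvalues > 1, % of variance they explain,
    {threshold: number of leading PCs whose cumulative % exceeds it})."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)

    # Columns 3 and 4 of Table 1: eigenvalues higher than 1 and their share.
    n_above_one = sum(1 for ev in ordered if ev > 1.0)
    pct_above_one = 100.0 * sum(ordered[:n_above_one]) / total

    # Column 5 of Table 1: smallest number of PCs exceeding each threshold.
    pcs_needed = {}
    for threshold in thresholds:
        cumulative = 0.0
        for count, ev in enumerate(ordered, start=1):
            cumulative += ev
            if 100.0 * cumulative / total > threshold:
                pcs_needed[threshold] = count
                break
    return n_above_one, pct_above_one, pcs_needed


# Hypothetical eigenvalues for eight standardized variables (they sum to 8).
example = [4.0, 2.0, 1.0, 0.5, 0.25, 0.125, 0.0625, 0.0625]
print(summarize_eigenvalues(example))
# (2, 75.0, {90.0: 4, 95.0: 5, 98.0: 6})
```

In this hypothetical case only two eigenvalues exceed 1 and they explain 75% of the variance, so four PCs are needed to pass the 90% level; this is the same kind of gap between columns 3 and 5 that Table 1 shows for the real groups.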

explain 74.2–90.0% of the total variance. For the entire database there are six such PCs, which explain 65.9% of the total variance. The results demonstrate that the peak areas in the chromatograms (profiles) are not strongly correlated. This applies particularly to the database of 1000 profiles. It may be concluded that the objects (profiles) in the 15-dimensional space of the original variables (peak areas) are not arranged regularly, and that the arrangement varies considerably from group to group.

A common problem arises in connection with the criterion for selecting the number of PCs necessary to describe properly the original data structure in a reduced space. The authors decided to use the percentage of total variance explained as the criterion (see the last columns in Table 1). Examples of the dependence of the cumulative variance explained on the number of PCs are presented in Fig. 1a and b.

Fig. 1. Cumulative variance explained vs. number of PCs: (a) for the group IV (177 profiles) and (b) for entire database (ca. 1000 profiles).

An advantage of PCA is the possibility of at least partial visualization of the data structure (the distribution of objects in the 15-dimensional space) in the two- or three-dimensional space of the most informative PCs. A projection of the objects onto this reduced space retains the most characteristic features of the original structure, e.g. clusters and outliers. An example is presented in Fig. 2. The projection presented in Fig. 2 reflects well the arrangement of objects in the original 15-dimensional space. It can be seen that there are samples which differ significantly from the distinct root of the cluster. They seem to be outliers, i.e. they may not belong to the same source as the samples making up the root of the cluster. Their membership of the class has to be carefully checked.

4.2. Cluster analysis (CA)

The effectiveness of the preliminary classification carried out on the basis of the correlation coefficient is also verified by means of CA. A decision has to be made concerning the choice of an appropriate distance or similarity measure between the profiles and of a clustering method. Another factor which may influence the results of CA is the pre-treatment of the data (e.g. standardization).

Different distances are used in CA, e.g. Minkowski (including Manhattan and Euclidean), squared Euclidean, Chebyshev, Pearson (based on the correlation coefficient) and Mahalanobis. Which distance is appropriate depends on the context and purpose of the data analysis. Various distances were dealt with in our investigations. In the examples below we used the Euclidean and Pearson distances.

There are various approaches to forming the clusters in CA. In hierarchical agglomerative clustering, the most common one, several methods are applied, including single linkage (nearest neighbor), complete linkage (furthest neighbor), and average linkage with weighted or unweighted pair group methods. In our study we used the single linkage method. The results of CA are usually presented graphically in the form of a dendrogram (Fig. 3). The symbols of the objects are shown along the horizontal axis (abscissa) and the distance between the objects is given along the vertical axis (ordinate).

Fig. 3 presents a dendrogram for the pre-selected groups XXII and XXIV (Table 1) generated by means of Euclidean distances. Subgroups can be observed in which the distances between the objects are very small and which correspond to very similar drug samples. In such subgroups the Euclidean distance is less than 0.3 (the Pearson distance is less than 0.01, the Manhattan distance less than 0.5 and the Chebyshev distance less than 0.1). It may be assumed that the subgroups involve amphetamine samples which originate from the same production batch.

The main purpose of applying CA in the proposed analysis scheme is to confirm the preliminary classification of the profiles made on the basis of correlation coefficients. The dendrogram obtained with the Euclidean distance is particularly useful in checking new samples (profiles) which were assigned to a group with a rather low correlation coefficient (0.8 < r < 0.95). A careful inspection of many real-case CA diagrams, taking into account knowledge and practice in the field of drug characterization and gas chromatography, leads to the conclusion that profiles can be considered similar (i.e. the amphetamine samples originate from the same source) if the Euclidean distance is less than 1 and the Pearson distance is less than 0.04.
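
The decision rule above can be written down as a minimal sketch. It assumes that the Pearson distance is defined as 1 − r, where r is the correlation coefficient between two profiles, and that the profiles have already been scaled in the same way as in the study; neither detail is spelled out in the text. The function names and the five-peak example profiles are hypothetical (real profiles contain 15 peak areas), and the handling of r exactly equal to 0.95 is an arbitrary choice.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(ss_a * ss_b)


def euclidean_distance(a, b):
    """Euclidean distance between two profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_distance(a, b):
    """Assumed definition: 1 - r (not stated explicitly in the paper)."""
    return 1.0 - pearson_r(a, b)


def preliminary_verdict(r):
    """Preliminary grouping by correlation coefficient (Section 3)."""
    if r > 0.95:
        return "same class"
    if r > 0.8:
        # Tentative: requires visual comparison of the whole chromatograms.
        return "tentative"
    return "different"


def same_source(a, b, max_euclidean=1.0, max_pearson=0.04):
    """Both distance criteria must hold for a common source."""
    return (euclidean_distance(a, b) < max_euclidean
            and pearson_distance(a, b) < max_pearson)


# Hypothetical, already scaled profiles with five peaks each.
profile_a = [1.0, 2.0, 3.0, 4.0, 5.0]
profile_b = [1.1, 2.0, 3.1, 3.9, 5.2]
profile_c = [5.0, 1.0, 4.0, 2.0, 3.0]

print(same_source(profile_a, profile_b))  # True
print(same_source(profile_a, profile_c))  # False
```

Under the assumed definition, a Pearson distance below 0.04 corresponds to r > 0.96, which is consistent with the r > 0.95 level used in the preliminary grouping.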
The above thesis is also supported by a model example based on 50 amphetamine profiles taken from the five most numerous pre-selected groups: I, IV, XVI, XXII and XXIV (see Table 1). From each group, 10 profiles were randomly selected out of those for which the correlation coefficients exceeded 0.95 within a subgroup. In this way, a model database with 50 profiles, each described by 15 peak areas, was generated. The results of CA are presented in Fig. 4. The dendrogram reveals five distinct groups (clusters). The Euclidean distance between the clusters exceeds 1 and the Pearson distance exceeds 0.04. Indeed, the profiles attributed to a group in Fig. 4 belong to the same illicit source.

Fig. 4. Dendrograms for the model database composed of the selected members of five pre-selected groups of profiles (see text). Euclidean distances applied.

The score plot in Fig. 2 and the dendrogram in Fig. 3 concern two pre-selected groups of samples, XXII and XXIV (see Table 1). All members of each group were taken into account, including profiles for which 0.8 < r < 0.95 as compared with those forming the close core of a group. Fig. 2 reveals two distinct clusters with some samples that clearly differ from the clusters' cores (see the cases in Fig. 3 for which the Euclidean distance exceeds 1). These samples require special examination, including comparison of the entire chromatograms. The data presented in Figs. 2 and 3 demonstrate the credibility and effectiveness of the pre-selection of amphetamine samples according to the correlation coefficient. On the other hand, the application of PCA and CA makes it possible to visualize the data structure using different similarity/dissimilarity measures, e.g. the Euclidean distance used in the examples presented above. The examples in Figs. 2–4 show that the Euclidean distance is useful not only for indicating outliers inside a pre-selected group of samples (profiles) but also for discriminating between different groups.

Fig. 2. Score plot. The projection of profiles in groups XXII and XXIV (see Table 1) onto the PC1, PC2, PC3 space (factor 1 vs. factor 2 vs. factor 3 space).

Fig. 3. Dendrograms for the pre-selected groups of profiles XXII and XXIV. Euclidean distances applied.

5. Final remarks

Chemometrics is becoming an increasingly useful tool in searching for clandestine laboratories and drug traffic routes. Profiling of impurities produces data that are handled by statistical and chemometric methods such as PCA and CA. These are used to classify drug samples into appropriate similarity groups.

The efficiency of the approach presented in this study has been proven on more than 1000 real drug samples confiscated by the police. The method enables an objective computer-aided comparison and classification of amphetamine profiles. On the other hand, the approach includes an interactive step, at the beginning of the classification procedure, in which the whole chromatograms may be compared and an appropriate decision can be made concerning the assignment of a sample (profile) to the pre-selected groups. The approach has proved very helpful in police operations against illicit amphetamine manufacturers and dealers in Poland, and the authors will therefore continue to work on improving the efficiency of the method.

A very characteristic feature of the described application of chemometric methods is that it concerns a 'dynamic', evolving system. This means that as drug samples seized by the police are delivered to the laboratory, their profiles are handled according to the proposed approach, and the corresponding points in the multi-component space of features are added to the previous patterns. In effect, the patterns expand. As long as a batch of illicit drug production is still on the illegal market, and drug samples are still being seized by the police, the population of the corresponding cluster increases. A new drug source results in the formation of a new cluster, while if the drug stock of an old production batch is running low, no new profiles are added to the existing cluster, which becomes stagnant.

Thus the examples presented in this paper correspond to a given moment in time, and the patterns before and after that time differ, their changes reflecting the changes in the drug market and distribution. Of course, for police operational purposes each profile is described by the time, place and other conditions and circumstances in which the drug sample was seized.

Although capillary GC/MS turned out to be useful in amphetamine profiling, some additional significant (informative) variables, apart from the areas of chromatographic peaks, will be taken into consideration, e.g. the inorganic impurities profile. The usefulness of various pattern recognition methods is already being tested.

References

[1] D.L. Massart, L. Kaufman, The Interpretation of Analytical Data by the Use of Cluster Analysis, Wiley, New York, 1983.
[2] M.A. Sharaf, D.L. Illman, B.R. Kowalski, Chemometrics, Wiley, New York, 1986.
[3] D.L. Massart, B.G.M. Vandeginste, S.N. Deming, Y. Michotte, L. Kaufman, in: B.G.M. Vandeginste, L. Kaufman (Eds.), Chemometrics: A Textbook, Data Handling in Science and Technology, Vol. 2, Elsevier, Amsterdam, 1988.
[4] R.G. Brereton, Chemometrics. Application of Mathematics and Statistics to Laboratory Systems, Ellis Horwood, New York, 1990.
[5] G. Kateman, L. Buydens, Quality control, in: J.D. Winefordner, I.M. Kolthoff (Eds.), Analytical Chemistry, A Series of Monographs on Analytical Chemistry and its Applications, Vol. 60, Wiley, New York, 1993.
[6] F.S. Crossley, M.L. Moore, J. Org. Chem. 9 (1944) 529.
[7] A.M.A. Verweij, Forensic Sci. Rev. 1 (1989) 2.
[8] W. Krawczyk, Chemometrics in forensic amphetamine analysis, Ph.D. thesis, Faculty of Chemistry, Jagiellonian University, 1998 (in Polish).
[9] National Research Institute of Police Science, in: Proceedings of the International Symposium of Forensic Science, Tokyo, 1993.
[10] C.S.L. Jonson, L. Strömberg, Forensic Sci. Int. 69 (1994) 31.
[11] S. Jonson, Identification and forensic classification of amphetamine, Linköping studies in science and technology, Dissertation No. 641, Linköping, 2000.
[12] M. Perkal, Y.L. Ng, J.R. Pearson, Forensic Sci. Int. 69 (1994) 77.
[13] W. Krawczyk, Amphetamine Profiling, Central Forensic Laboratory of Police, Warszawa, 1998 (in Polish).

