Automated Identification and Classification of Diatoms from Water Resources

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11401)

Included in the following conference series:

Iberoamerican Congress on Pattern Recognition

Abstract

The abundance of certain types of diatoms is used to determine water quality. Currently, diatomists carry out the precise identification of the species present in a water sample. However, differing judgements among diatomists, together with the variety of sizes and shapes that diatoms may have in samples, make identification difficult, and identification is required before diatoms can be classified into the genera to which they belong. Additionally, the chemical processes applied to eliminate unwanted elements in water samples (debris, flocs, etc.) are insufficient. Thus, diatoms have to be differentiated from those structures before being classified into a genus. For these reasons, researchers have a special interest in finding ways to automate the identification and classification of diatoms. Despite its applications, automatic identification of diatoms is highly difficult because of the presence of unwanted elements in water samples. Once diatoms have been identified, classifying them into genera is an additional problem.

In this paper, an automatic method for the identification and classification of diatoms from images is presented. The method combines the Scale and Curvature Invariant Ridge Detector (SCIRD-TS), followed by a post-processing step, with nested Convolutional Neural Networks (CNNs). Whilst the identification approach is able to identify well-defined ridge structures, the nested CNN is able to classify a diatom into the genus to which it belongs.

1 Introduction and Related Work

Diatoms are a type of microscopic algae or plankton called phytoplankton, divided into more than twenty thousand species. They are used as paleoenvironmental indicators, since the presence of certain diatom genera indicates water purity or contamination, or the presence of fecal matter, among others. Additionally, diatoms may be used to make historical environmental estimates of water sources from the abundance or scarcity of some diatom individuals, for example by studying fossil deposits in lake sediments. Environmental variables that changed or dominated in the past can also be tracked and estimated by identifying the presence of diatoms in the source to be analysed [1]. Variations in temperature, pH or conductivity over centuries may be estimated by studying diatoms in sediments. This makes it possible to understand how climate has affected a studied area and to establish baseline conditions, from which criteria for determining water quality can be defined and parameters can be set by the environmental regulatory bodies of some governments.

Currently, diatomists visually identify those microscopic structures in a given sample under a microscope. Visual identification of diatoms is largely subjective, has limited repeatability and requires inter-observer agreement [3, 4]. However, images of different sections of water samples can be obtained by connecting a camera to a microscope, and different methods for diatom identification have been studied. Identification methods based on coherent optics and holography have also been proposed. However, these methods have a high computational cost and have not been adopted as an alternative to support biologists. Operators invariant to translation, rotation and scale, as well as Legendre polynomials and Principal Component Analysis, have been used to identify specific genera of diatoms [6, 7]. Rojas Camacho et al. [5] studied a tuning method that searches iteratively for the best parameters, posed as an optimisation problem in which the current result is compared with the previous one, and validated it with the Canny edge detector and a binarisation technique.

Although segmentation of structures such as diatoms is the first step in any investigation, computer science applied to the diatom field has focused on the classification of species. The Automatic Diatom Identification and Classification (ADIAC) project is a reference in the investigation of diatom analysis systems [8]. In ADIAC, 171 features were used for diatom classification, describing symmetry, shape, geometry and texture by means of different descriptors. Dimitrovski et al. argue that, on the ADIAC image data set, SIFT descriptors give better results than the use of Support Vector Machines (SVM). The best results, up to 97.97% accuracy, have been obtained with 38 classes using Fourier and SIFT descriptors with a random forest classifier. Alvarez et al. [11] proposed a method to classify diatoms using a Learning Vector Quantization (LVQ) neural network. According to Hawickhorst et al. [12], LVQ requires less training time than networks trained with back-propagation. However, if more hidden units are needed, the LVQ network will take more time. Approaches such as [3, 4] are based on hand-crafted or “hand-designed” methods, where a set of fixed features is used. However, hand-crafted methods present limited results, as in [9], where 14 classes were classified with SVMs under 10-fold cross-validation, using 44 GLCM features that describe geometric and morphological properties, with an accuracy of 94.7%.

In this paper, an automated method for the identification and classification of diatoms from images is presented. The proposed method combines the Scale and Curvature Invariant Ridge Detector (henceforth SCIRD-TS) [2], followed by a post-processing step, with nested Convolutional Neural Networks (CNNs). An experimental evaluation is conducted against a set of ground-truth images, using the F-Score to assess the results. Our approach is able to segment well-defined ridge structures or Regions of Interest (henceforth, RoIs), and the nested CNN is able to classify the RoIs that have previously been segmented in an image of a water sample. The first CNN discards RoIs that are well-defined structures but correspond to undesired elements (debris, flocs, etc.), and a second CNN classifies the RoIs containing diatoms into the genera to which they belong.

2 Identification and Classification of Diatoms

The diatom identification method has two phases: the first phase segments the objects present in images, called RoIs, and the second phase identifies diatoms by classifying each RoI according to whether or not it corresponds to a diatom. The classification step then takes the RoIs identified as diatoms and classifies them into genera.

2.1 RoIs Segmentation

The segmentation of RoIs is based on SCIRD-TS, which was presented as a filter bank in the application domain of retinal images and is able to identify thin structures [2]. The SCIRD-TS filter bank is adapted and tested on a set of diatom images, using the implementation available at the author's web page. The SCIRD-TS filter bank, by Annunziata [2], is defined as:

$$\begin{aligned} F(x;\sigma ;k)=\frac{1}{\sigma _{2}^2}\left[ \frac{(x_{2}+kx_{1}^2)^2}{\sigma _{2}^2}-1 \right] \exp \left[ -\frac{x_{1}^2}{2\sigma _{1}^2}-\frac{(x_{2}+kx_{1}^2)^2}{2\sigma _{2}^2}\right] , \end{aligned} \tag{1}$$

where $(x_{1}, x_{2})$ represents a point in the image coordinate system, $k$ is a shape parameter and $\sigma = (\sigma _{1},\sigma _{2})$ gives the standard deviations of the Gaussian distribution in each coordinate direction; $k$, $\sigma _{1}$ and $\sigma _{2}$ are parameters provided by the user. Since the quality of water sample images may vary, two segmentation methods are presented.

Method 1 is proposed for images with high luminosity, large diatoms, fluorescence conditions, concentrations of large debris and low noise levels, in which diatoms have high relief. It applies SCIRD-TS with the following parameters: $\sigma _{1}=\left[ 1 , 2 \right] $ with step 1, $\sigma _{2}=\left[ 1 , 2 \right] $ with step 1; $k=\left[ -0.1 , 0.1 \right] $ with step 0.1 and $\theta _{step}=15$. This is followed by a post-processing consisting of a fixed threshold, morphological operations and filtering based on area, under the assumption that flocs are small because noise levels are low.

Method 2 is proposed for images with high noise levels, caused by a large load of particles, fragments and flocs of organic matter, and a low signal-to-noise ratio. It is based on a difference of Gaussians, computed as the difference between the image resulting from a single application of SCIRD-TS and the image resulting from a double application of SCIRD-TS. The first image is obtained using the set of parameters: $\sigma _{1}=\left[ 1 , 2 \right] $ with step 1, $\sigma _{2}=\left[ 1 , 2 \right] $ with step 1; $k=\left[ -0.1 , 0.1 \right] $ with step 0.05 and $\theta _{step}=15$. The second image is obtained using the same set of parameters except for $\sigma _{2}$: $\sigma _{1}=\left[ 1 , 2 \right] $ with step 1, $\sigma _{2}=\left[ 1 , 11 \right] $ with step 3; $k=\left[ -0.1 , 0.1 \right] $ with step 0.05 and $\theta _{step}=15$. Since the images have a high presence of fluff and dust, the first image has higher intensities than the second one. Subtracting two Gaussian blurs keeps the spatial information that is conserved in the two blurred images, which is assumed to be the desired information [10]; in effect, dust and fluff are purged. After the difference of Gaussians, an adaptive threshold is applied, followed by morphological operations and filtering of objects by area. Figure 1 illustrates the results obtained at the different steps of the two methods.
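To make Eq. (1) and the parameter ranges more concrete, the following is a minimal Python sketch. It is not the implementation used in the paper, which is taken from the SCIRD-TS author's web page. It evaluates the filter at a point, samples it on a small grid and enumerates the Method 1 parameter combinations. The function names, the kernel half-size of 7 pixels, the reading of $\theta$ as a rotation of the coordinate frame and the angle range of 0 to 180 degrees are all assumptions made for illustration.

```python
import math
from itertools import product


def scird_ts(x1, x2, sigma1, sigma2, k):
    """Evaluate Eq. (1) at the point (x1, x2)."""
    c = x2 + k * x1 ** 2  # cross-section coordinate, bent by the shape parameter k
    gauss = math.exp(-x1 ** 2 / (2 * sigma1 ** 2) - c ** 2 / (2 * sigma2 ** 2))
    return (c ** 2 / sigma2 ** 2 - 1.0) / sigma2 ** 2 * gauss


def kernel(sigma1, sigma2, k, theta_deg=0.0, half_size=7):
    """Sample Eq. (1) on a square grid of side 2 * half_size + 1.

    Assumption: theta_deg rotates the coordinate frame before Eq. (1)
    is evaluated. Eq. (1) itself contains no angle term.
    """
    t = math.radians(theta_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    rows = []
    for r in range(-half_size, half_size + 1):
        row = []
        for c in range(-half_size, half_size + 1):
            x1 = cos_t * c + sin_t * r
            x2 = -sin_t * c + cos_t * r
            row.append(scird_ts(x1, x2, sigma1, sigma2, k))
        rows.append(row)
    return rows


def frange(start, stop, step):
    """Inclusive range with a float step, rounded to avoid drift."""
    n = int(round((stop - start) / step))
    return [round(start + i * step, 10) for i in range(n + 1)]


# Method 1 parameter grid as (sigma1, sigma2, k, theta).
# Assumption: angles run from 0 (inclusive) to 180 (exclusive) degrees.
METHOD1_GRID = list(product(
    frange(1, 2, 1),        # sigma1 in [1, 2], step 1
    frange(1, 2, 1),        # sigma2 in [1, 2], step 1
    frange(-0.1, 0.1, 0.1),  # k in [-0.1, 0.1], step 0.1
    range(0, 180, 15),      # theta_step = 15
))

bank = [kernel(s1, s2, k, th) for s1, s2, k, th in METHOD1_GRID]
```

Under these assumptions the Method 1 grid holds 2 × 2 × 3 × 12 = 144 filters. Halving the step of $k$ to 0.05, as in Method 2, gives five values of $k$ instead of three.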

2.2 Classification of Diatoms

Once RoIs have been segmented, three nested Convolutional Neural Network (henceforth, CNN) models are used to classify them into diatom and non-diatom. AlexNet, GoogLeNet and ResNet are among the best-known architectures, commonly used for the recognition of objects such as animals, people and equipment, or for the recognition of specialised objects through transfer learning techniques. With fine-tuning, a pre-trained CNN model is taken and some of its layers are modified so that their parameters are recalculated from the training images of the problem being addressed. A nested CNN consists of a first network that discards the unwanted elements produced by the segmentation phase (background and debris). RoIs classified as diatoms are passed to a second network, where they are classified according to the genera to which they belong. Figure 2 shows the classification results obtained using the three nested CNN models.
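The control flow of the nested arrangement described above can be sketched as follows. This is an illustrative Python outline, not the MATLAB code used in the paper: `is_diatom` and `genus_of` are hypothetical callables standing in for the first and second fine-tuned networks, and the genus labels in the usage lines are placeholders.

```python
from typing import Callable, Iterable, List, Optional, Tuple, TypeVar

RoI = TypeVar("RoI")


def nested_classify(
    rois: Iterable[RoI],
    is_diatom: Callable[[RoI], bool],   # stand-in for the first network
    genus_of: Callable[[RoI], str],     # stand-in for the second network
) -> List[Tuple[RoI, Optional[str]]]:
    """Run the two-stage cascade over segmented RoIs.

    RoIs rejected by the first stage get the label None and never
    reach the second stage.
    """
    results = []
    for roi in rois:
        if is_diatom(roi):
            results.append((roi, genus_of(roi)))
        else:
            results.append((roi, None))
    return results


# Usage with toy stand-ins: RoIs are plain strings here.
toy_rois = ["diatom_a", "debris", "diatom_b"]
labels = nested_classify(
    toy_rois,
    is_diatom=lambda roi: roi.startswith("diatom"),
    genus_of=lambda roi: "genus_" + roi[-1].upper(),
)
```

One consequence of this design is that a diatom wrongly rejected by the first stage can never be recovered by the second, so first-stage false negatives propagate directly to the final result.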

3 Experimental Results

The experimental evaluation is performed using two groups of images, corresponding to the two methods defined previously. The first group is composed of 96 images, obtained with a Nikon Eclipse Ni-U90 microscope, and the second group is composed of 269 images, obtained with a Nikon E200 microscope. The ground truth consists of the 365 images of these two groups, with regions indicating the specimens labelled by experts. The CNN models are trained using 16,000 segmented RoIs that contain diatoms, background and debris. The CNNs are trained with MATLAB© Deep Learning Toolbox™. The performance of the proposed segmentation methods is evaluated quantitatively at two levels, pixel and diatom identification, and measured with the F-Score. Table 1 shows the results in terms of pixels correctly identified and at the diatom identification level. The experimental results indicate that Method 1 yields higher F-Scores at the pixel and diatom levels than Method 2, whilst Method 2 has higher accuracy than Method 1 at the pixel level.
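Accuracy and F-Score can rank two methods differently, as reported above for the pixel level. The short Python sketch below shows how both measures are derived from the same pixel counts. It assumes that the F-Score is the balanced F1 measure (the text does not state the weighting) and that masks are nested lists of 0/1 values; the function names are chosen for illustration.

```python
def confusion(pred, truth):
    """Count (tp, fp, fn, tn) between two binary masks of equal shape."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def f_score(tp, fp, fn):
    """Balanced F-Score (F1): harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def accuracy(tp, fp, fn, tn):
    """Fraction of pixels labelled correctly, background included."""
    return (tp + tn) / (tp + fp + fn + tn)


pred = [[1, 1, 0, 0]]
truth = [[1, 0, 1, 0]]
tp, fp, fn, tn = confusion(pred, truth)
```

Because true negatives enter the accuracy but not the F-Score, a mask dominated by background can score high accuracy with a poor F-Score. For example, an image with 2 diatom pixels out of 100 and an empty prediction has accuracy 0.98 and F-Score 0. This is one possible reading of the reported difference, not a conclusion drawn in the paper.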

Table 1. Error analysis of segmentation results, where G1 and G2 denote group 1 and group 2 of images.

Classification tests with the three nested CNN models were carried out using the RoIs obtained in the segmentation phase as input. Table 2 shows the architecture of each network with the respective error analysis. Among the three nested CNNs, AlexNet showed the best performance.

Table 2. Classification error analysis at the diatom identification level using the three CNNs. G1 and G2 symbolise the group 1 and the group 2 of images.

4 Final Remarks

We proposed a method to automatically identify and classify diatoms. The method combines SCIRD-TS hand-crafted filter banks with a post-processing step, in two different ways depending on specific image characteristics, in order to identify RoIs. We believe that combining the detection of structures with a post-processing strategy to detect potential regions of interest may lead to a substantial speed-up of diatom segmentation, since the post-processing filters out unwanted elements. Although morphological operations and filters remove small flocs, regions with large flocs remain. Those flocs cannot be removed by the above-mentioned operations without affecting wanted structures, such as diatoms, which may be even smaller than the unwanted structures.
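The limitation described above can be illustrated with a small sketch of area-based filtering on a binary mask. This is a plain-Python illustration rather than the post-processing used in the paper: 4-connectivity, the function name and the `min_area` threshold are assumptions.

```python
from collections import deque


def filter_by_area(mask, min_area):
    """Keep only 4-connected foreground components with at least min_area pixels."""
    if not mask:
        return []
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search collects one connected component.
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) >= min_area:
                for y, x in component:
                    out[y][x] = 1
    return out


# Two single-pixel specks and one 2x2 blob.
mask = [
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 1, 0, 0, 0],
]
cleaned = filter_by_area(mask, min_area=2)
```

The filter sees only size: the 2x2 blob survives whether it is a diatom or a large floc, and raising `min_area` far enough to remove large flocs would also remove any diatom smaller than them.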

Well-known CNN models were tested for classifying RoIs into diatoms and unwanted elements, such as debris or flocs. Once diatoms are identified from RoIs, a second CNN is used to classify those diatoms into genera. AlexNet showed the best performance among the three evaluated networks. In general, the first network of the nested CNN models performed well, although it can be improved in future work. This indicates that the proposal meets the objective of discarding undesired RoIs. Some RoIs may nevertheless be discarded without justification, which contributes to false negatives. The performance of the second network, used to classify identified diatoms into genera, is lower, which leaves room for improving the networks. Maintaining a balance among training images per class appears to be very important. While the first network has an acceptable level of balance (more than 16,000 diatom training images), the imbalance among the second network's classes is large. This is due to a scarce image bank: a larger set of images per genus is needed, especially for genera with a limited number of individuals. In addition, much work remains in exploring other ways of augmenting the data, enhancing different characteristics to be learned by a network.

References

1. Smol, J.P., Stoermer, E.F. (eds.): The Diatoms: Applications for the Environmental and Earth Sciences, vol. 17, pp. 283–284. Cambridge University Press, Cambridge (2010)
2. Annunziata, R., Trucco, E.: Accelerating convolutional sparse coding for curvilinear structures segmentation by refining SCIRD-TS filter banks. IEEE Trans. Med. Imag. 35, 2381–2392 (2016)
3. Bueno, G., et al.: Automated diatom classification (Part A): handcrafted feature approaches. Appl. Sci. 7, 753 (2017)
4. Pedraza, A., Bueno, G., Deniz, O., Cristóbal, G., Blanco, S., Borrego-Ramos, M.: Automated diatom classification (Part B): a deep learning approach. Appl. Sci. 7, 460 (2017)
5. Rojas Camacho, O., Forero, M., Menéndez, J.: A tuning method for diatom segmentation techniques. Appl. Sci. 7, 762 (2017)
6. Pech-Pacheco, J.L., Alvarez-Borrego, J.: Optical-digital system applied to the identification of five phytoplankton species. Mar. Biol. 132, 357–365 (1998)
7. Pappas, J.L., Stoermer, E.F.: Legendre shape descriptors and shape group determination of specimens in the Cymbella cistula species complex. Phycologia 42, 90–97 (2003)
8. Du Buf, H., et al.: Diatom identification: a double challenge called ADIAC. In: 10th International Conference on Image Analysis and Processing, pp. 734–739 (1999)
9. Lai, Q.T.K., Lee, K.C.M., Tang, A.H.L., Wong, K.K.Y., So, H.K.H., Tsia, K.K.: High-throughput time-stretch imaging flow cytometry for multi-class classification of phytoplankton. Opt. Soc. Am. 24, 28170–28184 (2016)
10. Davidson, M.W., Abramowitz, M.: Molecular expressions microscopy primer: digital image processing-difference of gaussians edge enhancement algorithm. Olympus America Inc., and Florida State University (2006)
11. Alvarez, T., et al.: Classification of microorganisms using image processing techniques. In: 2001 International Conference on Image Processing (Cat. No. 01CH37205), Thessaloniki, vol. 1, pp. 329–332. IEEE Conferences (2001)
12. Hawickhorst, B.A., Zahorian, S.A., Rajagopal, R.: A comparison of three neural network architectures for automatic speech recognition. In: Intelligent Engineering Systems Through Artificial Neural Networks, vol. 5, pp. 221 (1995). In: Advances in Neural Information Processing Systems. Neural Information Processing Systems Foundation Inc., La Jolla, CA, USA, pp. 1097–1105 (2012)

Acknowledgments

The first author thanks Santander Bank for the financial support for his mobility to Universidad de Castilla-La Mancha, Ciudad Real, Spain.

Gloria Bueno acknowledges financial support of the Spanish Government under the Aqualitas-retos project (Ref. CTM2014-51907-C2-2-R-MINECO).

The authors acknowledge the contribution to this work of Dr. E. Peña from Universidad del Valle. The authors are also grateful to the anonymous reviewers for their valuable comments, suggestions and remarks, which helped to improve the paper.

Author information

Authors and Affiliations

Multimedia and Computer Vision Group, Universidad del Valle, Cali, Colombia
Jose Libreros & Maria Trujillo
Grupo de Visión y Sistemas Inteligentes, Universidad de Castilla La Mancha, Ciudad Real, Spain
Gloria Bueno
Grupo de Investigación en Biología de Plantas y Microorganismos, Universidad del Valle, Cali, Colombia
Maria Ospina

Corresponding authors

Correspondence to Jose Libreros or Maria Trujillo.

Editor information

Editors and Affiliations

Biometrics and Data Pattern Analytics Lab, Universidad Autonoma de Madrid, Madrid, Spain
Ruben Vera-Rodriguez
Biometrics and Data Pattern Analytics Lab, Universidad Autonoma de Madrid, Madrid, Spain
Julian Fierrez
Biometrics and Data Pattern Analytics Lab, Universidad Autonoma de Madrid, Madrid, Spain
Aythami Morales

Cite this paper

Libreros, J., Bueno, G., Trujillo, M., Ospina, M. (2019). Automated Identification and Classification of Diatoms from Water Resources. In: Vera-Rodriguez, R., Fierrez, J., Morales, A. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2018. Lecture Notes in Computer Science, vol 11401. Springer, Cham. https://doi.org/10.1007/978-3-030-13469-3_58

DOI: https://doi.org/10.1007/978-3-030-13469-3_58
Published: 03 March 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-13468-6
Online ISBN: 978-3-030-13469-3
eBook Packages: Computer Science, Computer Science (R0)
