Data, Volume 4, Issue 3 (September 2019) – 41 articles

Cover Story: Satellite earth observation is gaining increasing recognition as a technology for addressing global environmental challenges. Due to large global programs such as Landsat and Copernicus, the availability of free data has increased significantly in recent years. In response, there has been a drive to find more efficient ways to process data at scale and facilitate access to derived insights. Two concepts addressing this challenge are data cubes as a technical solution for efficiently scaling computations and analysis-ready data (ARD) for defining quality standards and ensuring traceability with extensive metadata. This study compared different processing schemes for converting standard synthetic aperture radar (SAR) image datasets into radiometrically terrain corrected (RTC) analysis-ready products. The goal is to assist in the practical implementation of defined ARD standards, enabling routine analyses [...]
6 pages, 1015 KiB  
Data Descriptor
Video Recordings of Male Face and Neck Movements for Facial Recognition and Other Purposes
by Collin Gros and Jeremy Straub
Data 2019, 4(3), 130; https://doi.org/10.3390/data4030130 - 6 Sep 2019
Viewed by 4613
Abstract
Facial recognition is made more difficult by unusual facial positions and movement. However, for many applications, the ability to accurately recognize moving subjects with movement-distorted facial features is required. This dataset includes videos of multiple subjects, taken under multiple lighting brightness and temperature conditions, which can be used to train and evaluate the performance of facial recognition systems.
Figure 1. Depicts the positions of lights and the video camera.
Figure 2. The different lighting settings used for each video (left to right: Warm, Cold, Low, Medium, and High).
Figure 3. Subjects were told to position their head in multiple orientations, for one second at a time, during video recording.

15 pages, 2106 KiB  
Article
Predicting High-Risk Prostate Cancer Using Machine Learning Methods
by Henry Barlow, Shunqi Mao and Matloob Khushi
Data 2019, 4(3), 129; https://doi.org/10.3390/data4030129 - 2 Sep 2019
Cited by 41 | Viewed by 9035
Abstract
Prostate cancer can be low- or high-risk to the patient’s health. Current screening on the basis of prostate-specific antigen (PSA) levels has a tendency towards both false positives and false negatives, both of which have negative consequences. We obtained a dataset of 35,875 patients from the screening arm of the National Cancer Institute’s Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial. We segmented the data into instances without prostate cancer, instances with low-risk prostate cancer, and instances with high-risk prostate cancer. We developed a pipeline to deal with imbalanced data and proposed algorithms to perform preprocessing on such datasets. We evaluated the accuracy of various machine learning algorithms in predicting high-risk prostate cancer. An accuracy of 91.5% can be achieved by the proposed pipeline, using standard scaling, the SVMSMOTE sampling method, and AdaBoost for machine learning. We then evaluated the contribution of rate of change of PSA, age, BMI, and filtration by race to this model’s accuracy. We identified that including the rate of change of PSA and age in our model increased the area under the curve (AUC) of the model by 6.8%, whereas BMI and race had a minimal effect.
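The abstract reports both accuracy and AUC for an imbalanced classification problem. The following is a minimal sketch, not the authors' pipeline, of why the two metrics can tell different stories; the toy labels, scores, and function names are hypothetical and chosen only for illustration.

```python
def accuracy(labels, predictions):
    """Fraction of instances whose predicted class equals the true class."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 8 negatives and 2 positives (imbalanced on purpose).
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 0.4, 0.6, 0.5, 0.9]

# A trivial classifier that never predicts the positive class.
always_negative = [0] * len(labels)

print(accuracy(labels, always_negative))  # 0.8, despite finding no positives
print(roc_auc(labels, scores))            # 0.9375
```

On imbalanced data a classifier that ignores the minority class can still look accurate, which is one reason to report a ranking-based metric such as AUC alongside accuracy.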
Figure 1. Population of the National Cancer Institute’s Prostate, Lung, Colorectal, and Ovarian Screening Trial (NCI PLCO) dataset under different inclusion/exclusion criteria, and the positive instances selected for analysis.
Figure 2. Segmentation of NCI PLCO into no cancer, low-risk prostate cancer, and high-risk prostate cancer.
Figure 3. The whole process of the analysis.
Figure 4. Average area under the curve (AUC) vs. average accuracy across all classifiers for each sampling method on optimally scaled PoPC data.
Figure 5. Average AUC for the ensemble of algorithms on each scaling method vs. average accuracy for the ensemble of algorithms on each scaling method on PoPC data.
Figure 6. Receiver operating characteristic curve for decision tree on PoPC data.
Figure 7. Receiver operating characteristic curve for AdaBoost on PoHRPC data.

11 pages, 4065 KiB  
Data Descriptor
Flights of a Multirotor UAS with Structural Faults: Failures on Composite Propeller(s)
by Srikanth Gururajan, Kyle Mitchell and William Ebel
Data 2019, 4(3), 128; https://doi.org/10.3390/data4030128 - 28 Aug 2019
Cited by 8 | Viewed by 3700
Abstract
Data acquired from several flights of a custom-fabricated Hexacopter Unmanned Aerial System (UAS) with composite structure (carbon fiber arms and central hub) and composite (carbon fiber) propellers are described in this article. The Hexacopter was assembled from a commercially available kit (Tarot 690) and flown in manual and autonomous modes. Takeoffs and landings were under manual control and the bulk of the flight tests was conducted with the Hexacopter in a “position hold” mode. All flights were flown within the UAS flight cage at Parks College of Engineering, Aviation and Technology at Saint Louis University for approximately 5 min each. Several failure conditions (different types, artificially induced) on the composite (carbon fiber) propellers were tested, including failures on up to two propellers. The dataset described in this article contains flight data from the onboard flight controller (Pixhawk) as well as three accelerometers, each with three axes, mounted on the arms of the Hexacopter UAS. The data are included as supplemental material.
Figure 1. Tarot-RC 690 Hexacopter Unmanned Aerial System (UAS).
Figure 2. Schematic of Hexacopter UAS with accelerometers mounted on the arms.
Figure 3. Location of motor, accelerometer, and Pixhawk; top view of UAS.
Figure 4. Location of motor, accelerometer, and Pixhawk; side view of UAS.
Figure 5. GPS altitude above ground level of Hexacopter UAS.
Figure 6. Output of the flight controller, raw Pulse Width Modulation (PWM) values.
Figure 7. Outdoor UAS flight cage at Parks College of Engineering, Aviation, and Technology, Saint Louis University.
Figure 8. Broken propeller #3.
Figure 9. Broken propeller #4.
Figure 10. ST H3LIS331DL three-axis accelerometer.
Figure 11. Featherboard M0 datalogger.
Figure 12. Clock drift among the three accelerometers on the Hexacopter.
Figure 13. Accelerometer output ($a_x$) from one node (n1) under nominal and failure conditions.

9 pages, 382 KiB  
Data Descriptor
NILMPEds: A Performance Evaluation Dataset for Event Detection Algorithms in Non-Intrusive Load Monitoring
by Lucas Pereira
Data 2019, 4(3), 127; https://doi.org/10.3390/data4030127 - 24 Aug 2019
Cited by 9 | Viewed by 4095
Abstract
Datasets are important for researchers to build models and test how these perform, as well as to reproduce research experiments from others. This data paper presents the NILM Performance Evaluation dataset (NILMPEds), which is aimed primarily at research reproducibility in the field of non-intrusive load monitoring (NILM). This initial release of NILMPEds is dedicated to event detection algorithms and comprises ground-truth data for four test datasets, the specification of 47,950 event detection models, the power events returned by each model in the four test datasets, and the performance of each individual model according to 31 performance metrics.
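To make the idea of scoring an event detector against ground truth concrete, here is a minimal sketch of an F1-score computed under a detection tolerance. The greedy matching rule, the function names, and the sample event positions are assumptions made for illustration; they are not taken from NILMPEds and may differ from the matching scheme the dataset uses.

```python
def match_events(detected, ground_truth, tolerance):
    """Greedily pair each ground-truth event with the nearest unused detection.

    Events are sample indices (or timestamps). A detection counts as a true
    positive only if it lies within `tolerance` of a ground-truth event, and
    each detection can be matched at most once.
    """
    unused = sorted(detected)
    true_positives = 0
    for truth in sorted(ground_truth):
        candidates = [d for d in unused if abs(d - truth) <= tolerance]
        if candidates:
            best = min(candidates, key=lambda d: abs(d - truth))
            unused.remove(best)
            true_positives += 1
    false_positives = len(unused)
    false_negatives = len(ground_truth) - true_positives
    return true_positives, false_positives, false_negatives


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when nothing was matched."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical event positions, in samples.
ground_truth = [100, 250, 400, 600]
detected = [102, 245, 330, 610]

for tolerance in (5, 10):
    tp, fp, fn = match_events(detected, ground_truth, tolerance)
    print(tolerance, (tp, fp, fn), f1_score(tp, fp, fn))
# 5 (2, 2, 2) 0.5
# 10 (3, 1, 1) 0.75
```

Widening the tolerance turns a near miss (the detection at 610) into a true positive, which is why a score such as F1 is usually reported per tolerance value.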
Figure 1. Underlying folder and file organization of NILMPEds.
Figure 2. Distribution of detected power events across the five algorithms and four datasets.
Figure 3. Median $F_1$-Score, for each dataset, under the different detection tolerance values.

11 pages, 3235 KiB  
Article
A Novel Ensemble Neuro-Fuzzy Model for Financial Time Series Forecasting
by Alexander Vlasenko, Nataliia Vlasenko, Olena Vynokurova, Yevgeniy Bodyanskiy and Dmytro Peleshko
Data 2019, 4(3), 126; https://doi.org/10.3390/data4030126 - 23 Aug 2019
Cited by 19 | Viewed by 3791
Abstract
Neuro-fuzzy models have a proven record of successful application in finance. Forecasting future values is a crucial element of successful decision making in trading. In this paper, a novel ensemble neuro-fuzzy model is proposed to overcome the limitations of, and improve upon, a previously successfully applied five-layer multidimensional Gaussian neuro-fuzzy model and its learning. The proposed solution allows skipping the error-prone hyperparameter selection process and shows better accuracy on real-life financial data.
(This article belongs to the Special Issue Data Analysis for Financial Markets)
Figure 1. General architecture of the proposed model.
Figure 2. Architecture of the member model.
Figure 3. General algorithm of ensemble initialization.
Figure 4. Visualization of ensemble learning.
Figure 5. An example of a multidimensional Gaussian with an identity receptive field matrix.
Figure 6. Examples of Gaussian units that were identically initialized, but tuned with different hyperparameters.
Figure 7. Error decay.
Figure 8. Ensemble model prediction plot.
Figure 9. The best performing single neuro-fuzzy model.

10 pages, 365 KiB  
Data Descriptor
Google Web and Image Search Visibility Data for Online Store
by Artur Strzelecki
Data 2019, 4(3), 125; https://doi.org/10.3390/data4030125 - 22 Aug 2019
Cited by 13 | Viewed by 7267
Abstract
This data descriptor describes Google search engine visibility data. The visibility of a domain name in a search engine comes from search engine optimization and can be evaluated based on four data metrics and five data dimensions. The data metrics are the following: clicks volume (1), impressions volume (2), click-through ratio (3), and ranking position (4). Data dimensions are as follows: queries that are entered into search engines that trigger results with the researched domain name (1), page URLs from the researched domain that are available in the search engine results page (2), country of origin of search engine visitors (3), type of device used for the search (4), and date of the search (5). Search engine visibility data were obtained from the Google Search Console for an international online store, which is visible in 240 countries and territories, for a period of 15 months. The data contain 123 K clicks and 4.86 M impressions for the web search and 22 K clicks and 9.07 M impressions for the image search. The proposed method for obtaining data can be applied in any other area, not only in the e-commerce industry.
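As a worked example of the third metric, the click-through ratio is the number of clicks divided by the number of impressions. The sketch below applies that definition to the rounded totals quoted in the abstract, so the results are approximate; the function name is hypothetical.

```python
def click_through_ratio(clicks, impressions):
    """Clicks divided by impressions; 0.0 when there were no impressions."""
    if impressions == 0:
        return 0.0
    return clicks / impressions


# Rounded totals from the abstract (123 K / 4.86 M and 22 K / 9.07 M).
web_ctr = click_through_ratio(123_000, 4_860_000)
image_ctr = click_through_ratio(22_000, 9_070_000)

print(f"web search CTR:   {web_ctr:.2%}")    # 2.53%
print(f"image search CTR: {image_ctr:.2%}")  # 0.24%
```

With these totals, image search generated almost twice the impressions of web search but roughly a tenth of the click-through ratio.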
(This article belongs to the Special Issue Data Analysis for Financial Markets)
Figure 1. Clusteric Search Auditor with the Google Search Console (GSC) Application Programming Interface (API).

16 pages, 3264 KiB  
Data Descriptor
A Dataset of Students’ Mental Health and Help-Seeking Behaviors in a Multicultural Environment
by Minh-Hoang Nguyen, Manh-Toan Ho, Quynh-Yen T. Nguyen and Quan-Hoang Vuong
Data 2019, 4(3), 124; https://doi.org/10.3390/data4030124 - 21 Aug 2019
Cited by 21 | Viewed by 32313
Abstract
University students, especially international students, face a higher risk of mental health problems than the general population. However, the literature regarding the prevalence and determinants of mental health problems as well as help-seeking behaviors of international and domestic students in Japan seems to be limited. This dataset contains 268 records of depression, acculturative stress, social connectedness, and help-seeking behaviors reported by international and domestic students at an international university in Japan. One of the main findings that can be drawn from this dataset is how the levels of social connectedness and acculturative stress are predictive of the reported depression among international as well as domestic students. The dataset is expected to provide reliable materials for further cross-cultural public health studies and policy-making in higher education.
(This article belongs to the Special Issue Big Data and Digital Health)
Figure 1. Age distribution of respondents.
Figure 2. Different types of acculturative stress according to the type of depressive disorder.
Figure 3. Acculturative stress and social connectedness among international and domestic students according to language proficiency.
Figure 4. Level of depression, acculturative stress, and social connectedness of students from different origins.
Figure 5. The regression line with “ToDep” being the dependent variable using international student dataset.
Figure 6. The regression line with “ToDep” being the dependent variable using domestic student dataset.

12 pages, 1284 KiB  
Technical Note
dsCleaner: A Python Library to Clean, Preprocess and Convert Non-Intrusive Load Monitoring Datasets
by Manuel Pereira, Nuno Velosa and Lucas Pereira
Data 2019, 4(3), 123; https://doi.org/10.3390/data4030123 - 12 Aug 2019
Cited by 10 | Viewed by 5549
Abstract
Datasets play a vital role in data science and machine learning research as they serve as the basis for the development, evaluation, and benchmarking of new algorithms. Non-Intrusive Load Monitoring is one of the fields that has been benefiting from the recent increase in the number of publicly available datasets. However, there is a lack of consensus concerning how datasets should be made available to the community, resulting in considerable structural differences between the publicly available datasets. This technical note presents dsCleaner, a Python library to clean, preprocess, and convert time series datasets to a standard file format. Two application examples using real-world datasets are also presented to show the technical validity of the proposed library.
Figure 1. Dataflow diagram of the dsCleaner workflow.
Figure 2. dsCleaner’s Class Diagram.
Figure 3. Example of files with extra samples.
Figure 4. Example of time correction by sample replication: current (top), voltage (bottom).
Figure 5. Resampling example: original signal at 16 kHz vs. resampled signal at (12.8, 6.4, 3.2) kHz.
Figure 6. Normalization example: raw file (left), normalized file (right).

11 pages, 3182 KiB  
Data Descriptor
Sea Ice Climate Normals for Seasonal Ice Monitoring of Arctic and Sub-Regions
by Ge Peng, Anthony Arguez, Walter N. Meier, Freja Vamborg, Jake Crouch and Philip Jones
Data 2019, 4(3), 122; https://doi.org/10.3390/data4030122 - 10 Aug 2019
Cited by 5 | Viewed by 6051
Abstract
The climate normal, that is, the latest three full-decade average, of Arctic sea ice parameters is useful for baselining the sea ice state. A baseline ice state on both regional and local scales is important for monitoring how the current regional and local states depart from their normal to understand the vulnerability of marine and sea ice-based ecosystems to the changing climate conditions. Combined with up-to-date observations and reliable projections, normals are essential to business strategic planning, climate adaptation and risk mitigation. In this paper, monthly and annual climate normals of sea ice parameters (concentration, area, and extent) of the whole Arctic Ocean and 15 regional divisions are derived for the period of 1981–2010 using monthly satellite sea ice concentration estimates from a climate data record (CDR) produced by NOAA and the National Snow and Ice Data Center (NSIDC). Basic descriptions and characteristics of the normals are provided. Empirical Orthogonal Function (EOF) analysis has been utilized to describe spatial modes of sea ice concentration variability and how the corresponding principal components change over time. To provide users with basic information on data product accuracy and uncertainty, the climate normal values of Arctic sea ice extents (SIE) are compared with those of other products, including a product from NSIDC and two products from the Copernicus Climate Change Service (C3S). The SIE differences between different products are in the range of 2.3–4.5% of the CDR SIE mean. Additionally, data uncertainty estimates are represented by using the range (the difference between the maximum and minimum), standard deviation, 10th and 90th percentiles, and the first, second, and third quartile distribution of all monthly values, a distinct feature of these sea ice normal products.
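The statistics listed in the abstract (30-year mean, range, standard deviation, 10th and 90th percentiles, and quartiles) can be sketched with the Python standard library. This is an illustration under stated assumptions, not the authors' processing code: the input series is synthetic, the function name is hypothetical, and the choice of sample standard deviation and the "inclusive" quantile method is arbitrary here.

```python
import statistics


def climate_normal(values_by_year, start=1981, end=2010):
    """Summarize a yearly series over the normal period [start, end]."""
    sample = [v for year, v in sorted(values_by_year.items())
              if start <= year <= end]
    deciles = statistics.quantiles(sample, n=10, method="inclusive")
    q1, q2, q3 = statistics.quantiles(sample, n=4, method="inclusive")
    return {
        "n": len(sample),
        "normal": statistics.fmean(sample),   # the climate normal itself
        "std": statistics.stdev(sample),      # sample standard deviation
        "range": max(sample) - min(sample),
        "p10": deciles[0],
        "p90": deciles[-1],
        "q1": q1,
        "q2": q2,
        "q3": q3,
    }


# Synthetic September sea ice extents (10^6 km^2) with a linear decline;
# these numbers are invented for the example, not observations.
september_sie = {year: 7.5 - 0.08 * (year - 1979) for year in range(1979, 2017)}

stats = climate_normal(september_sie)
print(stats["n"], round(stats["normal"], 3), round(stats["range"], 3))
```

Years outside 1981–2010 are present in the input but excluded from the summary, mirroring how a fixed normal period serves as the baseline for later observations.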
(This article belongs to the Special Issue Open Data and Robust & Reliable GIScience)
Figure 1. Location map of the regions in Arctic. From [3].
Figure 2. Spatial distributions of annual sea ice concentration (SIC): (a) climate normal, i.e., average of monthly fields for the period of 1981–2010, (b) standard deviation, (c) quality flags, (d) first quartile, (e) second quartile, and (f) third quartile. Light blue in (a,b) denotes water, gray denotes land, and white is for lakes, coastal, and the North Pole hole areas. The flag values of (−0.05, −0.04, −0.03, −0.02, −0.01, 0, 1, 2, 3, 4) in (c) denote (the North Pole hole, Lakes, Coastal, Land, Missing data, All water, Low record, Provisional, Standard, and Complete). See Table 3 for more information.
Figure 3. (a–c) are the spatial patterns of the first three leading empirical orthogonal functions (EOFs) for September SIC, and (d–f) are their corresponding principal component time series. The dashed line in (d) is the linear regression trend line. Green (purple) areas project positively (negatively) onto the associated time series, with the EOF magnitude modulating the intensity of the effect of the same (opposite) sign of the time series values.
Figure 4. (a) The temporal distribution of monthly Arctic sea ice extent (SIE, $10^6$ km$^2$) for the period of 1979–2016. The horizontal dashed lines denote the beginning and the ending of the climate normal period (1981–2010). The vertical dotted lines denote the months of March and September, which are normally the annual maximum and minimum, respectively. The white space denotes missing sensor data in December 1987 and January 1988. (b) Seasonal cycle of 30-year average monthly SIE ($10^6$ km$^2$, thick red line with filled circles) and the maximum and minimum values for each month (dashed blue lines) over the climate normal period for the Arctic region. The lower two panels are time series of sea ice extent (red circles with solid black line), its linear regression (thick green dashed line) for the annual maximum (c) and minimum (d). The values in red are decadal trends that are significant at the 99% confidence level.
Figure 5. The relative frequency distribution (thick solid black line with circles) of all valid monthly SIE values going into the calculation of the Arctic annual SIE climate normal. The number of SIEs within each bin is normalized by the total number of all valid SIEs. The minimum and maximum SIE values of all valid data are denoted by the intersection points between the x-axis and the dashed purple vertical lines. The 10th and 90th percentiles are denoted by that of the dashed blue lines. The first, second, and third quartiles are denoted by that of the dashed green lines. The mean, which is the annual SIE climate normal, is denoted by that of the dashed red line.
Figure 6. The seasonal cycle of monthly Arctic GSFC$_m$ SIE climate normal values ($10^6$ km$^2$, thick blue line with circles) bounded by the maximum and minimum values for each month over the climate normal period (1981–2010) (light grey shade), superimposed with the SIE climate normal values from SII (solid green line), ERA-Interim (dashed purple line) and ERA5 (dashed red line).
Figure 7. The scatter diagram between monthly SII SIEs and GSFC$_m$ SIEs (left) and GSFC-derived NT SIEs (right), respectively.

20 pages, 590 KiB  
Article
Aspect Extraction from Bangla Reviews Through Stacked Auto-Encoders
by Matteo Bodini
Data 2019, 4(3), 121; https://doi.org/10.3390/data4030121 - 9 Aug 2019
Cited by 8 | Viewed by 4384
Abstract
Interactions between online users have grown steadily in recent years, due to the latest developments of the web. People share online comments, opinions, and reviews about many topics. Aspect extraction is the automatic process of understanding the topic (the aspect) of such comments, which has attracted huge interest from commercial and academic points of view. For instance, reviews available in webshops (like eBay, Amazon, Aliexpress, etc.) can help customers in purchasing products, and automatic analysis of reviews would be useful, as sometimes it is almost impossible to read all the available ones. In recent years, aspect extraction in the Bangla language has been regarded as a task of growing importance. In the previous literature, a few methods have been introduced to classify Bangla texts according to the aspect they were focused on. This kind of research is limited mainly due to the lack of publicly available datasets for aspect extraction in the Bangla language. We take into account the only two publicly available datasets, recently published, collected for the task of aspect extraction in the Bangla language. Then, we introduce several classification methods based on stacked auto-encoders, as far as we know never exploited in the task of aspect extraction in Bangla, and we achieve better aspect classification performance with respect to the state-of-the-art: the experiments show an average improvement of 0.17, 0.31, and 0.30 (across the two datasets) in precision, recall, and F1-score, respectively, over the values reported in the state-of-the-art works that tackled the problem.
Figure 1. The architecture presented in Kim et al. [37]. The first layers embed words into low-dimensional vectors. Then, convolutions using different filter sizes are performed over the embedded word vectors. The result of the convolutional layer is given in input into a max-pooling layer and the result is a feature vector. The final level performs the classification step and it is composed of a fully connected neural network and a softmax.
Figure 2. The architecture presented in Rahman et al. [8]. The network consisted of a unique convolutional layer followed by a max-pooling and finally a classification layer, composed by a fully connected neural network (NN) and a sigmoid.
Figure 3. An example of an auto-encoder (AE) with six neurons both in the input and output layers, respectively $L_1$ and $L_3$, and four neurons in the hidden layer $L_2$. The two “+1” nodes represent the bias vectors, initially set to the unit vector.

8 pages, 4219 KiB  
Data Descriptor
Satellite-Based Reconstruction of the Volcanic Deposits during the December 2015 Etna Eruption
by Gaetana Ganci, Annalisa Cappello, Giuseppe Bilotta, Claudia Corradino and Ciro Del Negro
Data 2019, 4(3), 120; https://doi.org/10.3390/data4030120 - 8 Aug 2019
Cited by 13 | Viewed by 3041
Abstract
Satellite-derived data, including an estimation of the eruption rate, proximal volcanic deposits and lava flow morphometric parameters (area, maximum length, thickness, and volume) are provided for the eruption that occurred at Mt Etna on 6–8 December 2015. This eruption took place at the New Southeast Crater (NSEC), the youngest of the summit craters of Etna, shortly after a sequence of four violent paroxysmal events took place in 65 h (3–5 December) at “Voragine”, the oldest summit crater. Multispectral SEVIRI images at 15 min sampling time have been used to compute time-averaged eruption rate curves, while tri-stereo Pléiades images, at 50 cm spatial resolution, provided the pre-eruptive topography and topographic changes due to volcanic deposits. In addition to the two types of satellite data, other parameters have been inferred, such as probable vesicularity and pyroclastic deposits.
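A cumulative erupted volume can be obtained from a time-averaged eruption rate curve by integrating it over time. The sketch below uses the trapezoidal rule with the 15 min sampling interval mentioned in the abstract; it is an illustration only, not the authors' HOTSAT processing, and the rate values and function name are hypothetical.

```python
def cumulative_volume(rates, dt_seconds=900.0):
    """Integrate a discharge-rate series (m^3/s) with the trapezoidal rule.

    Returns the running volume (m^3) at every sample, starting from 0.
    The default step of 900 s corresponds to one image every 15 minutes.
    """
    volumes = [0.0]
    for previous, current in zip(rates, rates[1:]):
        volumes.append(volumes[-1] + 0.5 * (previous + current) * dt_seconds)
    return volumes


# Hypothetical eruption rates (m^3/s), one value per 15-minute image.
rates = [0.0, 10.0, 20.0, 20.0, 10.0, 0.0]

print(cumulative_volume(rates))
# [0.0, 4500.0, 18000.0, 36000.0, 49500.0, 54000.0]
```

Running the same integration on minimum, medium, and maximum rate estimates would give a band of cumulative volumes rather than a single curve.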
Figure 1. Minimum (orange), medium (dark red) and maximum (violet) estimates for (a) TADR and (b) cumulative volume computed from 6 to 8 December 2015 by HOTSAT using SEVIRI data.
Figure 2. (a) Spatial distribution of the GPS GCPs used to validate the Pleiades-derived pre-eruptive DEM. Colors represent the height difference between the GCPs and the corresponding pixels of the DEM. The five summit craters of Etna are highlighted: NEC (North-East Crater), VOR (Voragine), BN (Bocca Nuova), SEC (South-East Crater) and NSEC (New South-East Crater); (b) Histogram of the residuals, peaking at 0.94 m, with a standard deviation of 1.63 m, representing the vertical accuracy of the DEM.
Figure 3. (a) Elevation change obtained by differencing the two DEMs derived from Pleiades images acquired before and after the December 2015 Etna eruptions. The colors indicate flow thickness in meters inside the lava flow fields. The five summit craters of Etna are highlighted: NEC (North-East Crater), VOR (Voragine), BN (Bocca Nuova), SEC (South-East Crater) and NSEC (New South-East Crater); (b) The zero-peaked histogram of the terrain residuals, proving that the two DEMs are properly aligned.

15 pages, 2706 KiB  
Article
Gifted and Talented Services for EFL Learners in China: A Step-by-Step Guide to Propensity Score Matching Analysis in R
by Shifang Tang, Fuhui Tong and Xiuhong Lu
Data 2019, 4(3), 119; https://doi.org/10.3390/data4030119 - 3 Aug 2019
Cited by 2 | Viewed by 3786
Abstract
We sought to quantify the effectiveness of a gifted and talented (GT) program provided to university students in China who demonstrated a talent for learning English as a foreign language (EFL). To do so, we used propensity score matching (PSM) techniques to analyze data collected from a tier-1 university where an English talent (ET) program was provided. Specifically, we provided (a) a step-by-step guide to PSM analysis in R, (b) the code for PSM analysis and visualization, and (c) the final analysis of baseline equivalence and treatment effect based on the matched sample. Collectively, the results of descriptive statistics, visualization, and baseline equivalence indicate that PSM is an effective matching technique for generating an unbiased counterfactual analysis. Moreover, the ET program yields a statistically significant, positive effect on ET students’ English language proficiency.
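The matching step at the core of PSM can be sketched in a few lines. The example below is written in Python for illustration and is not the article's R code or the MatchIt implementation: it assumes propensity scores have already been estimated, and it pairs each treated unit with the nearest unused control unit within a hypothetical `caliper`. The unit identifiers and scores are invented.

```python
def match_nearest(treated, control, caliper=0.1):
    """Greedy 1:1 nearest-neighbor matching on propensity scores.

    treated and control map unit ids to propensity scores in [0, 1].
    Each control is used at most once (matching without replacement).
    caliper is a hypothetical maximum allowed score difference.
    """
    available = dict(control)
    pairs = []
    # Process treated units from the highest score down (an assumption of this sketch).
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1], reverse=True):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # remove the matched control from the pool
    return pairs


# Invented scores for illustration only.
pairs = match_nearest(
    {"t1": 0.8, "t2": 0.3},
    {"c1": 0.75, "c2": 0.35, "c3": 0.1},
)
```

Treated units with no control inside the caliper stay unmatched, which is why the matched sample can be smaller than the treated group.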
Figure 1. Secure CRAN mirrors of the MatchIt package.
Figure 2. Installing the MatchIt package in R.
Figure 3. Loading the MatchIt package in R.
Figure 4. Layout of data for the propensity score matching (PSM) procedure.
Figure 5. Example code for loading a data set.
Figure 6. Demonstration of the first 10 cases of imported data.
Figure 7. Example code for performing PSM and summarizing the results.
Figure 8. Results of the PSM procedure.
Figure 9. Code for visualizing the effectiveness of the PSM procedure.
Figure 10. Distribution of propensity scores.
Figure 11. Histograms of propensity scores before and after matching by condition.
Figure 12. Code to output a matched dataset.
Figure 13. R’s feedback for missing values.
Figure 14. How to get a data location.
16 pages, 2542 KiB  
Data Descriptor
A Rainfall Data Intercomparison Dataset of RADKLIM, RADOLAN, and Rain Gauge Data for Germany
by Jennifer Kreklow, Björn Tetzlaff, Gerald Kuhnt and Benjamin Burkhard
Data 2019, 4(3), 118; https://doi.org/10.3390/data4030118 - 2 Aug 2019
Cited by 17 | Viewed by 5923
Abstract
Quantitative precipitation estimates (QPE) derived from weather radars provide spatially and temporally highly resolved rainfall data. However, they are also subject to systematic and random bias and various potential uncertainties and therefore require thorough quality checks before use. The dataset described in this paper is a collection of precipitation statistics calculated from the hourly nationwide German RADKLIM and RADOLAN QPEs provided by the German Weather Service (Deutscher Wetterdienst (DWD)), which were combined with rainfall statistics derived from rain gauge data for intercomparison. Moreover, additional information on parameters that can potentially influence radar data quality, such as the height above sea level, information on wind energy plants and the distance to the nearest radar station, was included in the dataset. The resulting two point shapefiles are readable with all common GIS and constitute a spatially highly resolved rainfall statistics geodataset for the period 2006 to 2017, which can be used for statistical rainfall analyses or for the derivation of model inputs. Furthermore, the publication of this data collection can benefit other users who intend to use precipitation data for Germany, allowing them to identify the rainfall dataset best suited to their application through a straightforward comparison of three rainfall datasets, without tedious data processing and georeferencing.
Figure 1. Comparison of height above sea level derived from the gauge metadata (field ‘G_height’) and from the Shuttle Radar Topography Mission (SRTM) Digital Elevation Model (DEM) aggregated to 1 km² (‘height_dem’) for the 997 points in the gauge shapefile.
Figure 2. Differences between planar and geodesic distance calculation in the projected, Cartesian stereographic coordinate system defined for the radar products by Deutscher Wetterdienst (DWD). The very low precipitation values in the northern area, which are due to several months of missing data during the upgrade of the Flechtdorf radar in 2014, provide an ideal radar range for the distance validation.
Figure 3. Schematic rain gauge data cleaning workflow.
Figure 4. Mean annual precipitation sum 2006–2017 calculated from rain gauges (map on the left) and RADKLIM (map on the right). The dashed rectangle in the RADKLIM map indicates the area around the radar station in Hanover discussed in the text.
Figure 5. Importing the gauge shapefile into a DataFrame and plotting the mean annual precipitation sums of gauges and RADKLIM against each other.
10 pages, 2455 KiB  
Article
Paving the Way towards an Armenian Data Cube
by Shushanik Asmaryan, Vahagn Muradyan, Garegin Tepanosyan, Azatuhi Hovsepyan, Armen Saghatelyan, Hrachya Astsatryan, Hayk Grigoryan, Rita Abrahamyan, Yaniss Guigoz and Gregory Giuliani
Data 2019, 4(3), 117; https://doi.org/10.3390/data4030117 - 2 Aug 2019
Cited by 30 | Viewed by 5780
Abstract
Environmental issues have become an increasing global concern because of the continuous pressure on natural resources. Earth observations (EO), which include both satellite/UAV and in-situ data, can provide robust monitoring for various environmental concerns. Realizing the full information potential of EO data requires innovative tools that minimize the time and scientific knowledge needed to access, prepare and analyze large volumes of data. The EO Data Cube (DC) is a new paradigm that aims to achieve this. The article presents the Swiss-Armenian joint initiative on the deployment of an Armenian DC, which is anchored on the best practices of the Swiss model. Benefiting from Switzerland’s expertise in implementing the Swiss DC, the Armenian DC is a complete and up-to-date archive of EO data (e.g., Landsat 5, 7, 8, Sentinel-2). A use case on the delineation of Lake Sevan using the McFeeters band-ratio algorithm is discussed. The validation shows that the results are sufficiently reliable. The transfer of the necessary knowledge from Switzerland to Armenia for developing and implementing the first version of an Armenian DC should be considered a first step in a permanent collaboration paving the way towards continuous remote environmental monitoring in Armenia.
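The McFeeters band ratio mentioned in the abstract is commonly written as NDWI = (Green − NIR) / (Green + NIR), with positive values conventionally read as open water. The Python below is a minimal sketch under that assumption; it is not the Armenian DC code, the zero threshold is the conventional default rather than a value reported in the abstract, and the reflectance values are invented.

```python
def ndwi(green, nir):
    """McFeeters normalized difference water index for one pixel."""
    total = green + nir
    if total == 0:
        return 0.0  # avoid division by zero on empty pixels
    return (green - nir) / total


def water_mask(green_band, nir_band, threshold=0.0):
    """Return a nested list of booleans, True where NDWI exceeds the threshold."""
    return [
        [ndwi(g, n) > threshold for g, n in zip(green_row, nir_row)]
        for green_row, nir_row in zip(green_band, nir_band)
    ]


# Invented reflectance values: the first pixel behaves like water, the second like land.
mask = water_mask([[3, 1]], [[1, 3]])
```

A shoreline would then be traced along the boundary between True and False pixels in the mask.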
(This article belongs to the Special Issue Earth Observation Data Cubes)
Figure 1. Interface of the Armenian DC.
Figure 2. Sentinel-2 ingestion of Lake Sevan.
Figure 3. McFeeters band-ratio algorithm for Lake Sevan calculated from (a) Sentinel-2 and (b) Landsat sensors.
Figure 4. Comparison of shorelines derived from UAV, Sentinel-2 and Landsat imagery.
10 pages, 2762 KiB  
Data Descriptor
A High-Resolution Map of Singapore’s Terrestrial Ecosystems
by Leon Yan-Feng Gaw, Alex Thiam Koon Yee and Daniel Rex Richards
Data 2019, 4(3), 116; https://doi.org/10.3390/data4030116 - 1 Aug 2019
Cited by 60 | Viewed by 26255
Abstract
The natural and semi-natural areas within cities provide important refuges for biodiversity, as well as many benefits to people. To study urban ecology and quantify the benefits of urban ecosystems, we need to understand the spatial extent and configuration of different types of vegetated cover within a city. It is challenging to map urban ecosystems because they are typically small and highly fragmented, thus requiring high-resolution satellite images. This article describes a new high-resolution map of land cover for the tropical city-state of Singapore. We used images from WorldView and QuickBird satellites, and classified these images using random forest machine learning and supplementary datasets into 12 terrestrial land classes. Close to 50% of Singapore’s land cover is vegetated, while freshwater covers about 6%, and the rest is bare or built up. The overall accuracy of the map was 79%, and the class-specific errors are described in detail. Tropical regions such as Singapore have extensive cloud cover year-round, complicating the process of mapping using satellite imagery. The land cover map provided here will have applications for urban biodiversity studies, ecosystem service quantification, and natural capital assessment.
Figure 1. The classified map of Singapore made from satellite images taken from 2003 to 2018.
Figure 2. The geographical coverage of high resolution satellite images used for this study.
Figure 3. A flowchart of classifying high resolution satellite imagery into the land cover map. The image sample was taken from a WorldView 3 image (ID 301).
12 pages, 2595 KiB  
Article
Dynamic Data Citation Service—Subset Tool for Operational Data Management
by Chris Schubert, Georg Seyerl and Katharina Sack
Data 2019, 4(3), 115; https://doi.org/10.3390/data4030115 - 1 Aug 2019
Cited by 2 | Viewed by 4342
Abstract
In earth observation and climatological sciences, data and their data services grow daily and over large spatial extents, owing to the high coverage rate of satellite sensors, model calculations, and continuous meteorological in situ observations. To reuse such data, especially data fragments and their data services, in a collaborative and reproducible manner that cites the original source, data analysts, e.g., researchers or impact modelers, need a way to identify the exact version, precise time information, parameters, and names of the dataset used. A manual process would make citing data fragments as a subset of an entire dataset rather complex and imprecise. Data in climate research are in most cases multidimensional, structured grid data that can change partially over time. Citing such evolving content requires the approach of “dynamic data citation”. The applied approach is based on associating queries with persistent identifiers. These queries contain the subsetting parameters, e.g., the spatial coordinates of the desired study area or the time frame with a start and end date, which are automatically included in the metadata of the newly generated subset and thus record the data history, i.e., the data provenance, which has to be established in data repository ecosystems. The Research Data Alliance Data Citation Working Group (RDA Data Citation WG) summarized the scientific status quo and the state of the art from existing citation and data management concepts and developed a scalable dynamic data citation methodology for evolving data. The Data Centre at the Climate Change Centre Austria (CCCA) has implemented the given recommendations and has offered an operational service for dynamic data citation of climate scenario data since 2017.
Aware that this topic depends heavily on bibliographic citation research, which is still under discussion, the CCCA service on Dynamic Data Citation focused on climate-domain-specific issues, such as characteristics of data, formats, software environment, and usage behavior. Beyond sharing the experience gained, the current effort addresses the scalability of the implementation, e.g., towards the potential of an Open Data Cube solution.
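The idea of associating queries with persistent identifiers can be illustrated with a small query store. The Python below is a minimal sketch and not the CCCA implementation: the `QueryStore` class, the `hdl:00.0000/demo` prefix and the way the identifier is derived from a hash are all hypothetical. It shows only that a normalized, timestamped query against a given dataset version can be stored once and resolved to the same identifier whenever the same subset is requested again.

```python
import hashlib
import json
from datetime import datetime, timezone


class QueryStore:
    """Toy query store: one record and one identifier per unique subset query."""

    def __init__(self, prefix="hdl:00.0000/demo"):  # hypothetical prefix, not a real handle
        self.prefix = prefix
        self.records = {}

    def register(self, dataset_id, version, params, executed_at=None):
        # Normalize the query so that key order does not change its identity.
        normalized = json.dumps(
            {"dataset": dataset_id, "version": version, "params": params},
            sort_keys=True,
            separators=(",", ":"),
        )
        query_hash = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if query_hash in self.records:
            return self.records[query_hash]  # same subset, same identifier
        timestamp = executed_at or datetime.now(timezone.utc)
        record = {
            "pid": f"{self.prefix}-{query_hash[:12]}",
            "query": normalized,
            "executed_at": timestamp.isoformat(),
        }
        self.records[query_hash] = record
        return record


store = QueryStore()
first = store.register(
    "climate-scenario", "v1",
    {"bbox": [9.5, 46.4, 17.2, 49.0], "time": ["2021-01-01", "2050-12-31"]},
)
again = store.register(
    "climate-scenario", "v1",
    {"time": ["2021-01-01", "2050-12-31"], "bbox": [9.5, 46.4, 17.2, 49.0]},
)
newer = store.register(
    "climate-scenario", "v2",
    {"bbox": [9.5, 46.4, 17.2, 49.0], "time": ["2021-01-01", "2050-12-31"]},
)
```

Including the dataset version in the stored query is what lets a subset of an older version remain citable after a new version is published.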
(This article belongs to the Special Issue Earth Observation Data Cubes)
Figure 1. A structured order for the Research Data Alliance (RDA) recommendation on dynamic data citation.
Figure 2. Schematic draft of subset needs, which includes the control on versioning and the alignment with the persistent identifier (PID), here a Handle.NET identifier (hdl). For the fragmented subset (blue cube), a new identifier is assigned, coupled with its own version number.
Figure 3. Simplified structure of server and hardware components for dynamic data citation within the CCCA Data Centre environment: (i) CKAN web server, (ii) the application server for access, data management used as query store, (iii) Handle.NET® Registry Server for PID allocation, and (iv) the Unidata Thredds Data Server (TDS), NCSS Subset Service and planned features on Open EO support.
Figure 4. The general landing page of a dataset resource after the personalized login, where the subset can be created (on top): (a) the visualization is a view service (WMS) created by Thredds; by activating the time control, the user can visualize each time step up to 2100; (b) additionally, a timeline diagram is shown after a point of interest is created in the map window.
Figure 5. GUI of the subset creation function: (a) the upper part of the web page, for defining the parameter or reusing an existing query and for defining a bounding box either by polygon or by predefined administrative units; (b) the part that allows choosing a time range; for other datasets, such as the globally available radio occultation data packages, a fourth dimension, e.g., the Potential High, was introduced and can be chosen.
Figure 6. The screenshot gives an impression of what the versions, relations, and suggested citation text look like. In addition, the user can create, with the same arguments, a subset based on older versions, although a subset is normally based on the newest published version. If new versions are available, a notification is sent to the subset creator, who is part of the metadata profile.
8 pages, 4499 KiB  
Data Descriptor
A New Multi-Temporal Forest Cover Classification for the Xingu River Basin, Brazil
by Margaret Kalacska, Oliver Lucanus, Leandro Sousa and J. Pablo Arroyo-Mora
Data 2019, 4(3), 114; https://doi.org/10.3390/data4030114 - 1 Aug 2019
Cited by 5 | Viewed by 3529
Abstract
We describe a new multi-temporal classification of forest/non-forest classes for a 1.3 million square kilometer area encompassing the Xingu River basin, Brazil. This region is well known for its exceptionally high biodiversity, especially in terms of the ichthyofauna, with approximately 600 known species, 10% of which are endemic to the river basin. Global and regional scale datasets do not adequately capture the rapidly changing land cover in this region. Accurate forest cover and forest cover change data are important for understanding the anthropogenic pressures on the aquatic ecosystems. We developed the new classifications with a minimum mapping unit of 0.8 ha from cloud-free mosaics of Landsat TM5 and OLI 8 imagery in Google Earth Engine using a classification and regression tree (CART), aided by field photographs for the selection of training and validation points.
Figure 1. Four classifications of forest cover for the Xingu river basin for (a) circa 1989, (b) circa 2000, (c) circa 2010, and (d) circa 2018 from Landsat TM5 and OLI 8 imagery. Boundaries of states, larger rivers, and landmarks including cities and the Jericoá rapids are shown for reference.
Figure 2. Examples of field photographs used to train the visual interpretation of the imagery for selecting classification and regression tree (CART) classification training and validation points. * indicates the Northern zone between Vitória do Xingu and Sao Felix do Xingu, ** indicates the Southern sector (south of Castelo do Sonhos). 1: clearing of Arapujá Island across from Altamira*, 2: partially deciduous forest, sandy beach, and rocks*, 3: large patch of cleared forest*, 4: intact forest*, 5: homestead in Amazonia lowlands*, 6: small settlement*, 7: small household*, 8: larger settlement*, 9: large-scale deforestation*, 10: deforestation with secondary growth*, 11: large-scale deforestation*, 12: burned land prior to deforestation*, 13: homestead*, 14: large-scale deforestation*, 15: large-scale deforestation*, 16: aerial view north from the Jericoá rapids*, 17: aerial view from the Xadá rapids with deforestation on the unprotected side*, 18: aerial view of intact forest in protected area*, 19: aerial view of intact forest at the Iriri rapids*, 20: small-scale clearing at the Jericoá rapids*, 21: intact forest along the Culuene river**, 22: exposed soil for agriculture**, 23: cattle herd**, 24: Belo Monte dam*, 25: isolated forest patch in corn field**, 26: large corn field**, 27: plantation**, 28: extensive pasture land**, 29: extensive cotton field**, 30: recently cut forest**, 31: pasture with forest patch**, 32: pasture with isolated trees**, 33: cornfield with forest patch**.
Figure 3. Open rivers wider than a single pixel (i.e., >30 m) such as in the photograph on the left are included in the classifications. Users are cautioned, however, in examining the surface water class for narrow rivers/streams (<30 m), especially those with dense overgrowth, such as in the photograph on the right. The classifications underestimate the area for these smaller rivers/streams.
23 pages, 4465 KiB  
Article
Paving the Way to Increased Interoperability of Earth Observations Data Cubes
by Gregory Giuliani, Joan Masó, Paolo Mazzetti, Stefano Nativi and Alaitz Zabala
Data 2019, 4(3), 113; https://doi.org/10.3390/data4030113 - 30 Jul 2019
Cited by 38 | Viewed by 9847
Abstract
Earth observations data cubes (EODCs) are a paradigm transforming the way users interact with large spatio-temporal Earth observation (EO) data. They enhance connections between data, applications and users, facilitating the management, access and use of analysis ready data (ARD). The ambition is to allow users to harness big EO data at minimum cost and effort. This significant interest is illustrated by the various implementations that exist. The novelty of the approach has resulted in different innovative solutions and the lack of a commonly agreed definition of an EODC. Consequently, their interoperability has been recognized as a major challenge for the global change and Earth system science domains. The objectives of this paper are to prevent EODCs from becoming silos of information, to present how interoperability can be enabled using widely adopted geospatial standards, and to contribute to the debate on enhanced interoperability of EODCs. We demonstrate how standards can be used, profiled and enriched to pave the way to increased interoperability of EODCs and can help deliver and leverage the power of EO data by building efficient discovery, access and processing services.
(This article belongs to the Special Issue Earth Observation Data Cubes)
Figure 1. The Swiss Data Cube viewer showing snow cover change over Switzerland between 1995 and 2017.
Figure 2. The Catalan Data Cube. Dynamic normalized difference vegetation index (NDVI) values and a histogram for the view area, computed by the client using original Sentinel-2A red and infrared bands retrieved as binary arrays (centered on Barcelona and surroundings).
Figure 3. The Catalan Data Cube. Dynamic NDVI layer animation including a temporal profile for sand (black) and crop (red) areas (centered on the Ebre river delta area).
10 pages, 1765 KiB  
Article
Catastrophic Household Expenditure for Healthcare in Turkey: Clustering Analysis of Categorical Data
by Onur Dogan, Gizem Kaya, Aycan Kaya and Hidayet Beyhan
Data 2019, 4(3), 112; https://doi.org/10.3390/data4030112 - 29 Jul 2019
Cited by 3 | Viewed by 4520
Abstract
Household-level health expenditure is one of the most basic indicators of a country’s development. In many countries, health expenditure increases relative to national income. Out-of-pocket health spending that exceeds household income, or is too high relative to it, signals economic distress that lowers living standards; this is called catastrophic health expenditure. Catastrophic expenditure may be affected by many factors, such as household type, property status, smoking and alcohol consumption habits, being active in sports, and having private health insurance. The study aims to investigate households with respect to catastrophic health expenditure using clustering. Clustering reveals the main similarities and differences between groups. The results show that there are significant and interesting differences between the five groups. C4 households earn more but spend less on health problems, at a rate of 3.10%, because people who exercise regularly have fewer health problems. Households in cluster C5, single-adult families of three people in total (mother or father and two children) who own their homes, earn more and spend more on health than the other clusters. C1 households, nuclear families with three children who do not pay rent although they do not own their homes, have the highest catastrophic health expenditure. Households in C3 have an average health expenditure rate of 3.83%, which is higher than in the other clusters. Households in cluster C2 make the most catastrophic health expenditure.
(This article belongs to the Special Issue Data-Driven Healthcare Tasks: Tools, Frameworks, and Techniques)
Figure 1. Health expenditure details in clusters.
Figure 2. Catastrophic health expenditures.
11 pages, 2655 KiB  
Data Descriptor
TIRF Microscope Image Sequences of Fluorescent IgE-FcεRI Receptor Complexes inside a FcεRI-Centric Synapse in RBL-2H3 Cells
by Rachel Drawbond and Kathrin Spendier
Data 2019, 4(3), 111; https://doi.org/10.3390/data4030111 - 28 Jul 2019
Cited by 2 | Viewed by 4006
Abstract
Total internal reflection fluorescence (TIRF) microscope image sequences are commonly used to study receptors in live cells. The dataset presented herein facilitates the study of the IgE-FcεRI receptor signaling complex (IgE-RC) in rat basophilic leukemia (RBL-2H3) cells coming into contact with a supported lipid bilayer with 25 mol% N-dinitrophenyl-aminocaproyl phosphatidylethanolamine, modeling an immunological synapse. TIRF microscopy was used to image IgE-RCs within this FcεRI-centric synapse by loading RBL-2H3 cells with fluorescent anti-dinitrophenyl (anti-DNP) immunoglobulin E (IgE) in suspension for 24 h. Fluorescent anti-DNP IgE (IgE488) concentrations of this suspension increased from 10% to 100% and corresponding non-fluorescent anti-DNP IgE concentrations decreased from 90% to 0%. After the removal of unbound anti-DNP IgE, multiple image sequences were taken for each of these ten conditions. Prior to imaging, anti-DNP IgE-primed RBL-2H3 cells were kept in Hanks buffer for a few minutes, about 30 min, or about one hour. The dataset contains 482 RBL-2H3 model synapse image stacks, dark images to correct for background intensity, and TIRF illumination profile images to correct for non-uniform TIRF illumination. After background subtraction, non-uniform illumination correction, and conversion of pixel units from analog-to-digital units to photoelectrons, the average pixel intensity was calculated. The average pixel intensity within FcεRI-centric synapses for all three Hanks buffer conditions increased linearly at a rate of 0.42 ± 0.02 photoelectrons per pixel per % IgE488 in suspension. RBL-2H3 cell degranulation was tested by detecting β-hexosaminidase activity. Prolonged RBL-2H3 cell exposure to Hanks buffer inhibited exocytosis in RBL-2H3 cells.
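The three processing steps listed in the abstract (background subtraction, non-uniform illumination correction, and conversion to photoelectrons) can be sketched as follows. This Python example is an illustration under stated assumptions, not the authors' analysis code: it assumes the dark image is also subtracted from the illumination profile, that the profile is normalized to a mean of one, and that a single hypothetical `electrons_per_adu` factor applies to the whole frame. All pixel values are invented.

```python
def correct_frame(raw, dark, illum, electrons_per_adu):
    """Return a frame in photoelectrons after dark and flat-field correction.

    raw, dark and illum are equally shaped nested lists in analog-to-digital units.
    The dark-subtracted illumination profile must be positive everywhere.
    """
    flat = [[i - d for i, d in zip(irow, drow)] for irow, drow in zip(illum, dark)]
    n_pixels = sum(len(row) for row in flat)
    flat_mean = sum(sum(row) for row in flat) / n_pixels
    corrected = []
    for raw_row, dark_row, flat_row in zip(raw, dark, flat):
        corrected.append([
            # subtract background, divide by the normalized profile, convert units
            (r - d) * flat_mean / f * electrons_per_adu
            for r, d, f in zip(raw_row, dark_row, flat_row)
        ])
    return corrected


def mean_intensity(frame):
    """Average pixel value of a frame."""
    values = [v for row in frame for v in row]
    return sum(values) / len(values)


# Invented 1 x 2 frame: uneven illumination makes equal signals look different.
frame = correct_frame(
    raw=[[110, 210]], dark=[[10, 10]], illum=[[60, 110]], electrons_per_adu=2.0
)
average = mean_intensity(frame)
```

After correction, the two pixels carry the same value, which is the purpose of dividing by the illumination profile.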
Show Figures

Figure 1

Figure 1
<p>(<b>a</b>) TIRF microscope image of RBL-2H3 cell FcεRI-centric synapse. Prior to imaging, the cell was loaded with 80% fluorescent anti-DNP IgE and 20% dark anti-DNP IgE. Scale bar represents 5 µm; (<b>b</b>) Schematic of RBL-2H3 cell coming into contact with a supported lipid bilayer with monovalent ligand (25 mol% DNP-lipid) in bilayer.</p>
Full article ">Figure 2
<p>Flowchart of the experiment data files folder and subfolder structure of the published Mendeley data [<a href="#B33-data-04-00111" class="html-bibr">33</a>]. Cells of Sample 1, Sample 2, and Sample 3 were kept in Hanks buffer for a few minutes, about 30 min, or about 1 h before cells were added to the supported lipid bilayer, respectively. Each subfolder contains between 15 and 26 image stacks of individual RBL-2H3 cell FcεRI-centric synapses saved as ome.tif files as indicated by the number within curved brackets. A dark field image stack (Dark.ome.tif) and a TIRF illumination image stack (TIRF.ome.tif) are also contained in the experimental data files. Each image stack contains a sequence of 500 images.</p>
Full article ">Figure 3
<p>Gallery of images of RBL-2H3 cell model synapses labeled with 80% IgE<sub>488</sub> and 20% IgE<sub>dark</sub> in contact with supported lipid bilayers containing mobile ligand. The gallery shows 16 images for each of the three samples. The scale bar in the first image panel represents 5 µm and applies to all remaining panels.</p>
Full article ">Figure 4
<p>Prolonged RBL-2H3 cell exposure to Hanks buffer inhibits exocytosis in RBL-2H3 cells. Cells were either resuspended for one hour in cell media (black open circles) or in Hanks buffer (red open triangles) prior to the start of a DNP-BSA dose-dependent degranulation assay. Exocytosis or degranulation was measured as the percentage of total cellular hexosaminidase released.</p>
Full article ">Figure 5
<p>Mean intensity of fluorescently labeled IgE-FcεRI receptor signaling complexes within synaptic patches as a function of percent fluorescent anti-DNP IgE (IgE<sub>488</sub>) added to solution: (<b>a</b>) Mean intensity for each sample. <span class="html-italic">Sample 1</span> (solid black circles), <span class="html-italic">Sample 2</span> (solid blue squares), and <span class="html-italic">Sample 3</span> (solid red triangles) were kept in Hanks buffer for a few minutes, about 30 min, or about 1 h before cells were added to the supported lipid bilayer, respectively. Error bars represent the standard deviations; (<b>b</b>) Mean intensity of all three samples averaged together with error bars representing the standard deviations. Solid line represents a weighted linear fit <span class="html-italic">y</span> = <span class="html-italic">a x</span>. The slope <span class="html-italic">a</span> of this fit was 0.42 ± 0.01 photo electrons (e<sup>−</sup>) per pixel per percent (%) IgE<sub>488</sub> in solution.</p>
17 pages, 617 KiB  
Review
Reinforcement Learning in Financial Markets
by Terry Lingze Meng and Matloob Khushi
Data 2019, 4(3), 110; https://doi.org/10.3390/data4030110 - 28 Jul 2019
Cited by 82 | Viewed by 17958
Abstract
Recently there has been an exponential increase in the use of artificial intelligence for trading in financial markets such as stock and forex. Reinforcement learning has become of particular interest to financial traders ever since the program AlphaGo defeated the strongest human contemporary Go board game player Lee Sedol in 2016. We systematically reviewed all recent stock/forex prediction or trading articles that used reinforcement learning as their primary machine learning method. All reviewed articles had some unrealistic assumptions such as no transaction costs, no liquidity issues and no bid or ask spread issues. Transaction costs had significant impacts on the profitability of the reinforcement learning algorithms compared with the baseline algorithms tested. Despite showing statistically significant profitability when reinforcement learning was used in comparison with baseline models in many studies, some showed no meaningful level of profitability, in particular with large changes in the price pattern between the system training and testing data. Furthermore, few performance comparisons between reinforcement learning and other sophisticated machine/deep learning models were provided. The impact of transaction costs, including the bid/ask spread on profitability has also been assessed. In conclusion, reinforcement learning in stock/forex trading is still in its early development and further research is needed to make it a reliable method in this domain. Full article
(This article belongs to the Special Issue Data Analysis for Financial Markets)
Figure 1
<p>Flowchart showing the filtering of reinforcement articles chosen for review.</p>
12 pages, 1100 KiB  
Article
Prediction of Fault Fix Time Transition in Large-Scale Open Source Project Data
by Hironobu Sone, Yoshinobu Tamura and Shigeru Yamada
Data 2019, 4(3), 109; https://doi.org/10.3390/data4030109 - 27 Jul 2019
Cited by 3 | Viewed by 2808
Abstract
Open source software (OSS) programs are adopted as embedded systems regarding their server usage, due to their quick delivery, cost reduction, and standardization of systems. Many OSS programs are developed using the peculiar style known as the bazaar method, in which faults are detected and fixed by developers around the world, and the result is then reflected in the next release. Furthermore, the fix time of faults tends to be shorter as the development of the OSS progresses. However, several large-scale open source projects encounter the problem that fault fixing takes much time because the fault corrector cannot handle many fault reports. Therefore, OSS users and project managers need to know the stability degree of open source projects by determining the fault fix time. In this paper, we predict the transition of the fix time in large-scale open source projects. To make the prediction, we use the software reliability growth model based on the Wiener process considering that the fault fix time in open source projects changes depending on various factors such as the fault reporting time and the assignees to fix the faults. In addition, we discuss the assumption that fault fix time data depend on the prediction of the transition in fault fixing time. Full article
Figure 1
<p>Open source software (OSS) development using the bug-tracking system.</p>
Figure 2
<p>The transition of the weekly average fault fixing time and the number of fault reports for Eclipse.</p>
Figure 3
<p>The transition of the weekly average fault fixing time and the number of fault reports for OpenStack.</p>
Figure 4
<p>Remaining fault fixing time using the exponential model and the S-shaped model in Eclipse.</p>
Figure 5
<p>Remaining fault fixing time using the exponential model and the S-shaped model in OpenStack.</p>
Figure 6
<p>The transition of the fault fixing time using the exponential model and the S-shaped model in Eclipse.</p>
Figure 7
<p>The transition of the fault fixing time using the exponential model and the S-shaped model in OpenStack.</p>
Figure 8
<p>Evaluation method of prediction accuracy for the test data.</p>
Figure 9
<p>Remaining fault fixing time using the exponential model with 58 weeks of learning data in Eclipse.</p>
Figure 10
<p>Remaining fault fixing time using the exponential model with 58 weeks of learning data in OpenStack.</p>
Figure 11
<p>Fault fixing time transition using the exponential model with 58 weeks of learning data in Eclipse.</p>
Figure 12
<p>Fault fixing time transition using the exponential model with 58 weeks of learning data in Eclipse.</p>
Figure 13
<p>Fault fixing time transition using the exponential model with 28 weeks of learning data in Eclipse.</p>
Figure 14
<p>Fault fixing time transition using the exponential model with 28 weeks of learning data in Eclipse.</p>
12 pages, 1029 KiB  
Article
Urban Mobility Demand Profiles: Time Series for Cars and Bike-Sharing Use as a Resource for Transport and Energy Modeling
by Michel Noussan, Giovanni Carioni, Francesco Davide Sanvito and Emanuela Colombo
Data 2019, 4(3), 108; https://doi.org/10.3390/data4030108 - 26 Jul 2019
Cited by 13 | Viewed by 4242
Abstract
The transport sector is currently facing a significant transition, with strong drivers including decarbonization and digitalization trends, especially in urban passenger transport. The availability of monitoring data is at the basis of the development of optimization models supporting an enhanced urban mobility, with multiple benefits including lower pollutants and CO2 emissions, lower energy consumption, better transport management and land space use. This paper presents two datasets that represent time series with a high temporal resolution (five-minute time step) both for vehicles and bike sharing use in the city of Turin, located in Northern Italy. These high-resolution profiles have been obtained by the collection and elaboration of available online resources providing live information on traffic monitoring and bike sharing docking stations. The data are provided for the entire year 2018, and they represent an interesting basis for the evaluation of seasonal and daily variability patterns in urban mobility. These data may be used for different applications, ranging from the chronological distribution of mobility demand, to the estimation of passenger transport flows for the development of transport models in urban contexts. Moreover, traffic profiles are at the basis for the modeling of electric vehicles charging strategies and their interaction with the power grid. Full article
Figure 1
<p>Distribution of the number of accurate sensors for each measurement.</p>
Figure 2
<p>Median vehicle flow depending on the working day (five-minute data for 2018).</p>
Figure 3
<p>Median vehicle flow depending on the month of the year (working days only, five-minute data for 2018).</p>
Figure 4
<p>Median vehicle speed depending on the working day (five-minute data for 2018).</p>
Figure 5
<p>Average bike pickups depending on the working day (five-minute data for 2018).</p>
Figure 6
<p>Average bike pickups depending on the working day (five-minute data for 2018).</p>
10 pages, 1145 KiB  
Data Descriptor
Internal Seed Structure of Alpine Plants and Extreme Cold Exposure
by Ganesh K. Jaganathan and Sarah E. Dalrymple
Data 2019, 4(3), 107; https://doi.org/10.3390/data4030107 - 24 Jul 2019
Cited by 1 | Viewed by 3789
Abstract
Cold tolerance in seeds is not well understood compared to mechanisms in aboveground plant tissue but is crucial to understanding how plant populations persist in extreme cold conditions. Counter-intuitively, the ability of seeds to survive extreme cold may become more important in the future due to climate change projections. This is due to the loss of the insulating snow bed resulting in the actual temperatures experienced at soil surface level being much colder than without snow cover. Seed survival in extremely low temperatures is conferred by mechanisms that can be divided into freezing avoidance and freezing tolerance depending on the location of ice crystal formation within the seed. We present a dataset of alpine angiosperm species with seed mass and seed structure defined as endospermic and non-endospermic. This is presented alongside the locations of temperature minima per species which can be used to examine the extent to which different seed structures are associated with snow cover. We hope that the dataset can be used by others to demonstrate if certain seed structures and sizes are associated with snow cover, and if so, would they be negatively impacted by the loss of snow resulting from climate change. Full article
Figure 1
<p>Global distribution of locations at which species attain the coldest temperatures in their respective ranges. Blue circles denote endospermic species occurrences, red crosses denote non-endospermic species occurrences.</p>
Figure 2
<p>Seed mass (g) after natural log transformation for species with endospermic and non-endospermic seeds.</p>
Figure 3
<p>Mean diurnal range (Mean of monthly (max temp–min temp)) at locations where alpine species reach their distributional temperature minima. Data derived from Worldclim [<a href="#B37-data-04-00107" class="html-bibr">37</a>].</p>
Figure 4
<p>Isothermality at locations where alpine species reach their distributional temperature minima. Data derived from Worldclim [<a href="#B37-data-04-00107" class="html-bibr">37</a>].</p>
Figure 5
<p>Minimum temperature of the coldest month at locations where alpine species reach their distributional temperature minima. Data derived from Worldclim [<a href="#B37-data-04-00107" class="html-bibr">37</a>].</p>
Figure 6
<p>Temperature annual range (°C) at locations where alpine species reach their distributional temperature minima. Data derived from Worldclim [<a href="#B37-data-04-00107" class="html-bibr">37</a>].</p>
Figure 7
<p>Temperature seasonality at locations where alpine species reach their distributional temperature minima. Data derived from Worldclim [<a href="#B37-data-04-00107" class="html-bibr">37</a>].</p>
8 pages, 868 KiB  
Data Descriptor
Scots Pine Seedlings Growth Dynamics Data Reveals Properties for the Future Proof of Seed Coat Color Grading Conjecture
by Arthur Novikov, Vladan Ivetić, Tatyana Novikova and Evgeniy Petrishchev
Data 2019, 4(3), 106; https://doi.org/10.3390/data4030106 - 23 Jul 2019
Cited by 14 | Viewed by 3300
Abstract
Seed coat color grading conjecture is also known as Pravdin’s conjecture. To verify the conjecture, we established a long-term field experiment. This data set included unique empirical data of Scots pine (Pinus sylvestris L.) container-grown seedlings produced from different seed color grades, outplanted on a post fire site in the Voronezh region, Russia. Variables were provided for 10 rows of 90 samples in each row. These data contribute to our understanding of seed germination and seedlings growth dynamics from size and color gradings of seeds. This structure is the future basis of the Forest Reproductive Material Library (FRMLib) and will be used for assisted migration and forest seed transfer. Full article
Graphical abstract
Figure 1
<p>Relationship data model for image and variable collections. This structure is the future basis of the Forest Reproductive Material Library (FRMLib) and will be used for assisted migration and transfer of Forest Reproductive Material (FRM).</p>
9 pages, 625 KiB  
Data Descriptor
Building Stock and Building Typology of Kigali, Rwanda
by Felix Bachofer, Andreas Braun, Florian Adamietz, Sally Murray, Pablo d’Angelo, Edward Kyazze, Abias Philippe Mumuhire and Jonathan Bower
Data 2019, 4(3), 105; https://doi.org/10.3390/data4030105 - 21 Jul 2019
Cited by 15 | Viewed by 6376
Abstract
This study uses very high-resolution Pléiades imagery for the densely built-up central part of the City of Kigali for the year 2015 in order to derive urban morphology data on building footprints, building archetypes and building heights. Aerial images of the study area from 2008–2009 were used in combination with the 2015 dataset to create a change monitoring dataset on a single building basis. A semi-automated approach was chosen which combined an object-based image analysis with an expert-based revision. The result is a geospatial dataset that detects 165,625 buildings for 2008–2009 and 211,458 for 2015. The dataset includes information on the type of changes between the two dates. Analysis of this geospatial dataset can be used for a range of research applications in economics and the social sciences, as well as a range of policy applications in urban planning and municipal finance administration. Full article
Graphical abstract
Figure 1
<p>Study area and analysis extent.</p>
10 pages, 2649 KiB  
Data Descriptor
Towards the Fulfillment of a Knowledge Gap: Wood Densities for Species of the Subtropical Atlantic Forest
by Laio Zimermann Oliveira, Heitor Felippe Uller, Aline Renata Klitzke, Jackson Roberto Eleotério and Alexander Christian Vibrans
Data 2019, 4(3), 104; https://doi.org/10.3390/data4030104 - 20 Jul 2019
Cited by 20 | Viewed by 4771
Abstract
Wood density (ρ) is a trait involved in forest biomass estimates, forest ecology, prediction of stand stability, wood science, and engineering. Regardless of its importance, data on ρ are scarce for a substantial number of species of the vast Atlantic Forest phytogeographic domain. Given that, the present paper describes a dataset composed of three data tables: (i) determinations of ρ (kg m⁻³) for 153 species growing in three forest types within the subtropical Atlantic Forest, based on wood samples collected throughout the state of Santa Catarina, southern Brazil; (ii) a list of 719 tree/shrub species observed by a state-level forest inventory and a ρ value assigned to each one of them based on local determinations and on a global database; (iii) the means and standard deviations of ρ for 477 permanent sample plots located in the subtropical Atlantic Forest, covering ∼95,000 km². The mean ρ over the 153 sampled species is 538.6 kg m⁻³ (standard deviation = 120.5 kg m⁻³), and the mean ρ per sample plot, considering the three forest types, is 525.0 kg m⁻³ (standard error = 1.8 kg m⁻³). The described dataset has potential to underpin studies on forest biomass, forest ecology, alternative uses of timber resources, as well as to enlarge the coverage of global datasets. Full article
(This article belongs to the Special Issue Forest Monitoring Systems and Assessments at Multiple Scales)
Figure 1
<p>(<b>a</b>) Distribution of the 153 species’ <math display="inline"><semantics> <mi>ρ</mi> </semantics></math>; (<b>b</b>) distribution of the species’ <math display="inline"><semantics> <mi>ρ</mi> </semantics></math> in the forest type in which they were sampled. ERF: evergreen rainforest; AF: <span class="html-italic">Araucaria</span> forest; SF: semi-deciduous forest.</p>
Figure 2
<p>(<b>a</b>) Distribution of all the IFFSC sample plots’ mean <math display="inline"><semantics> <mi>ρ</mi> </semantics></math>; (<b>b</b>) distribution of sample plots’ mean <math display="inline"><semantics> <mi>ρ</mi> </semantics></math> in the three main forest types in the state; the first box in each group refers to arithmetic means and the second to means weighted by the species’ basal area. ERF: evergreen rainforest; AF: <span class="html-italic">Araucaria</span> forest; SF: semi-deciduous forest.</p>
Figure 3
<p>The study area and its main forest types, the IFFSC’s sample plots, and sites wherein wood samples were collected.</p>
Figure 4
<p>(<b>a</b>) Example of collected branches with ⌀ ≥ 5 cm and (<b>b</b>) discs taken from a branch.</p>
8 pages, 355 KiB  
Data Descriptor
Correlations between Environmental Factors and Milk Production of Holstein Cows
by Roman Mylostyvyi and Olexandr Chernenko
Data 2019, 4(3), 103; https://doi.org/10.3390/data4030103 - 19 Jul 2019
Cited by 25 | Viewed by 8150
Abstract
Global climate change is a challenge for dairy farming. In this regard, identifying reliable correlations between environmental parameters and animals’ physiological responses is a starting point for the mathematical modeling of their effects on the future welfare and milk production of cows. The aim of the study was to examine the relationship between environmental parameters and the milk production of cows in the hot period. Archival data from the Ukrainian Hydrometeorological Center were used to study the state of insolation conditions (IC), wind direction (WD), wind strength (WS), air temperature (AT), and relative humidity (RH). The temperature–humidity index (THI) (Kibler, 1964) and temperature–humidity index in the hangar-type cowshed (THICHT) (Mylostyvyi et al., 2019) served as integral indicators of the state of the cowshed’s microclimate. The daily milk yield (DMY), yield of milk fat (MF) and milk protein (MP), and percentage of milk fat (PMF) and protein (PMP) were taken into account by the DairyComp 305 herd management system (VAS, USA). Statistical data processing was performed using the mathematical functions of Microsoft Excel (Microsoft Inc.) and Statistica 10 (StatSoft Inc.). There was a weak correlation between IC and DMY (r = −0.2), between RH and DMY (r = +0.4), and between RH and MF (r = +0.2). The correlations of DMY, MF, and MP with WS ranged from r = −0.2 to 0.4. Those with AT ranged from r = −0.2 to 0.5 (p < 0.05). The effects of weather factors on animal productivity will be the subject of further research. Full article
Figure 1
<p>Directions of prevailing winds during the study period.</p>
19 pages, 10075 KiB  
Article
Semantic Earth Observation Data Cubes
by Hannah Augustin, Martin Sudmanns, Dirk Tiede, Stefan Lang and Andrea Baraldi
Data 2019, 4(3), 102; https://doi.org/10.3390/data4030102 - 17 Jul 2019
Cited by 34 | Viewed by 7653
Abstract
There is an increasing amount of free and open Earth observation (EO) data, yet more information is not necessarily being generated from them at the same rate despite high information potential. The main challenge in the big EO analysis domain is producing information from EO data, because numerical, sensory data have no semantic meaning; they lack semantics. We are introducing the concept of a semantic EO data cube as an advancement of state-of-the-art EO data cubes. We define a semantic EO data cube as a spatio-temporal data cube containing EO data, where for each observation at least one nominal (i.e., categorical) interpretation is available and can be queried in the same instance. Here we clarify and share our definition of semantic EO data cubes, demonstrating how they enable different possibilities for data retrieval, semantic queries based on EO data content and semantically enabled analysis. Semantic EO data cubes are the foundation for EO data expert systems, where new information can be inferred automatically in a machine-based way using semantic queries that humans understand. We argue that semantic EO data cubes are better positioned to handle current and upcoming big EO data challenges than non-semantic EO data cubes, while facilitating an ever-diversifying user-base to produce their own information and harness the immense potential of big EO data. Full article
(This article belongs to the Special Issue Earth Observation Data Cubes)
Figure 1
<p>Schematic illustration of a semantic Earth observation (EO) data cube (left) used for an exemplary semantic content-based image retrieval (SCBIR) query. Here, a query searches for images with low cloud and low snow cover within a user-defined area of interest (AOI)-based on the associated semantic information. It retrieves images that match the semantic content-based criteria for the AOI instead of the entire image’s extent. In a classic image wide query such AOI specific semantic queries are not possible.</p>
Figure 2
<p>A flood mask generated from 78 semantically enriched Landsat 8 images over 9 months in Somalia (left) as an indicator for flood risk is compared to a single event analysis following a reported flood event in the year before (right). Both maps are the result of basic user queries using the semantic information only, without the use of additional parameters or calculations on the original data sets. Originally published as CC-BY-ND by [<a href="#B48-data-04-00102" class="html-bibr">48</a>], modified.</p>
Figure 3
<p>The spatial extent of the semantic EO data cube comprises three Sentinel-2 granules. (<b>a</b>) displays the true colour Sentinel-2 images as processed by the European Space Agency (ESA); (<b>b</b>) shows the area as represented in OpenStreetMap.</p>
Figure 4
<p>This figure displays the results of the semantic query for water-like observations for two spatio-temporal extents of interest. (<b>a</b>) Query for water-like observations from 15 March to 15 April 2018. (<b>b</b>) Query for water-like observations from 15 March to 15 April 2019. (<b>c</b>) Close-up of an area where water-like observations were present in 2019 but not in 2018.</p>
Figure 5
<p>Pseudocode describing how the normalised observed surface water occurrence (SWO) over time is calculated based on semi-concepts, in addition to two other outputs necessary for its calculation. The array of “total clean observations” provides the number of observations over time per-pixel after excluding cloud-like, snow-like and unknown pixels in the spatio-temporal extent of interest. Snow-like pixels are excluded in this case based on the knowledge that there is generally no snow within the spatio-temporal extent of interest. “Total water observations” refers to the number of observations over time per-pixel in which water-like spectral profiles were observed. The ratio between these two outputs (i.e., total water observations divided by total clean observations per-pixel) gives the normalised observed SWO.</p>
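The calculation described in this caption can be sketched in plain Python as follows. This is an illustrative sketch only, not the authors' pseudocode: the category labels, the function name, and the tiny example stack are hypothetical, and a real data cube would hold these categories as arrays rather than nested lists.

```python
from typing import List, Optional

# Hypothetical category labels; the actual semi-concept names are not given in the caption.
WATER = "water"
EXCLUDED = {"cloud", "snow", "unknown"}

Grid = List[List[str]]


def normalised_swo(stack: List[Grid]) -> List[List[Optional[float]]]:
    """Per-pixel ratio of water-like observations to clean observations over time."""
    rows, cols = len(stack[0]), len(stack[0][0])
    clean = [[0] * cols for _ in range(rows)]  # total clean observations
    water = [[0] * cols for _ in range(rows)]  # total water observations
    for grid in stack:
        for r in range(rows):
            for c in range(cols):
                label = grid[r][c]
                if label in EXCLUDED:
                    continue  # cloud-like, snow-like and unknown are not clean
                clean[r][c] += 1
                if label == WATER:
                    water[r][c] += 1
    # None marks pixels without any clean observation, where the ratio is undefined.
    return [
        [water[r][c] / clean[r][c] if clean[r][c] else None for c in range(cols)]
        for r in range(rows)
    ]


# Two time steps over a 2 x 2 pixel extent (made-up values)
stack = [
    [["water", "cloud"], ["vegetation", "unknown"]],
    [["water", "water"], ["water", "cloud"]],
]
swo = normalised_swo(stack)
```

Normalising by the clean observation count rather than by the number of images keeps pixels that are frequently cloud-covered comparable with pixels that are rarely obscured.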
Figure 6
<p>This figure displays the results of a different semantic query for the same two spatio-temporal extents of interest used in the query of water-like observations seen in <a href="#data-04-00102-f005" class="html-fig">Figure 5</a>. (<b>a</b>) Normalised observed vegetation occurrence from 15 March to 15 April 2018. (<b>b</b>) Normalised observed vegetation occurrence from 15 March to 15 April 2019. (<b>c</b>) Normalised observed SWO from 15 March to 15 April 2019 overlaid above normalised observed vegetation occurrence as represented in (<b>b</b>).</p>
23 pages, 4815 KiB  
Article
Feedforward Neural Network-Based Architecture for Predicting Emotions from Speech
by Mihai Gavrilescu and Nicolae Vizireanu
Data 2019, 4(3), 101; https://doi.org/10.3390/data4030101 - 15 Jul 2019
Cited by 9 | Viewed by 4446
Abstract
We propose a novel feedforward neural network (FFNN)-based speech emotion recognition system built on three layers: A base layer where a set of speech features are evaluated and classified; a middle layer where a speech matrix is built based on the classification scores computed in the base layer; a top layer where an FFNN- and a rule-based classifier are used to analyze the speech matrix and output the predicted emotion. The system offers 80.75% accuracy for predicting the six basic emotions and surpasses other state-of-the-art methods when tested on emotion-stimulated utterances. The method is robust and the fastest in the literature, computing a stable prediction in less than 78 s and proving attractive for replacing questionnaire-based methods and for real-time use. A set of correlations between several speech features (intensity contour, speech rate, pause rate, and short-time energy) and the evaluated emotions is determined, which enhances previous similar studies that have not analyzed these speech features. Using these correlations to improve the system leads to a 6% increase in accuracy. The proposed system can be used to improve human–computer interfaces, in computer-mediated education systems, for accident prevention, and for predicting mental disorders and physical diseases. Full article
Graphical abstract
Figure 1
<p>Emotion prediction accuracy based on the duration of the analyzed utterance (intra-subject methodology).</p>
Figure 2
<p>Emotion prediction accuracy based on the duration of the analyzed utterance (inter-subject methodology).</p>
Figure 3
<p>Emotion prediction accuracy based on the number of subjects involved in the training phase (inter-subject methodology).</p>
Figure 4
<p>Average processing time: comparison with state-of-the-art methods.</p>
Figure 5
<p>The overall architecture for predicting emotions based on speech.</p>
Figure 6
<p>Prediction accuracy using different feedforward neural network (FFNN) combinations; we denote with T—tanh activation function, S—sigmoid activation function, R—ReLU activation function, SMAX—softmax activation function, 1L—FFNN with one hidden layer, 2L—FFNN with two hidden layers, and each combination is labeled with <span class="html-italic">[number of hidden layers][1<sup>st</sup> hidden layer activation function][2<sup>nd</sup> hidden layer activation function (if FFNN has two hidden layers)][output layer activation function]</span>.</p>
Figure 7
<p>SENN prediction accuracy based on the number of neurons in hidden layers.</p>
Figure 8
<p>SENN configuration and hyperparameters.</p>