4.3.1 CCD: Device Identification Using Clock Characteristics.
The broad variety of IoT devices makes it challenging to design identification mechanisms that are universally applicable. One mechanism that relies on a universal information source is CCD, which is used in the SAFER prototype and originates from our previous work [45]. CCD probes an internal clock of a device over the network using timestamps in TCP. The advantage of using TCP timestamps [7] is that they are easily accessible and that scanning is minimally disruptive, because nearly all IoT devices implement TCP and use it for their primary functionality. The underlying assumption of CCD is that all devices of the same model share the same clock-related characteristics, which makes them identifiable. In this work, we extend the existing evaluation of CCD from our previous work [45] to a larger dataset with more device models to demonstrate the scalability of the framework. We then use the results of this evaluation to explain how subjective logic opinions should be formed for use within SAFER.
Overview of CCD. CCD uses a non-invasive scanning method to identify the device model of a particular device, based on clock characteristics that are exposed through the TCP timestamps the device provides over time. Compared to other scanning methods with similar intrusiveness (e.g., a very low intensity Nmap scan), CCD can provide more device information [45] for trained models by exploiting information in TCP timestamps. To achieve this, a scan is performed over a much longer period of time than with other methods. A single scan results in a vector of 576 TCP timestamps, collected over a period of 48 hours. These vectors are used as input for a random forest [9] classification algorithm, which is trained to detect known device models. In essence, the classifier compares the clock characteristic features of the DUT to a dataset of known device models and provides the probability of a match for each of these.
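To illustrate what a scan vector can expose, the following is a minimal sketch and not the feature set used by CCD. It assumes that each sample pairs the scanner's receive time with the TCP timestamp value (TSval) of the device, and it estimates the device's timestamp tick rate as a least-squares slope. The function names and the synthetic scan are hypothetical, and 32-bit TSval wrap-around is ignored.

```python
SAMPLES_PER_SCAN = 576           # samples per scan, from the text
SCAN_DURATION_S = 48 * 3600      # 48 hours, from the text
SAMPLE_INTERVAL_S = SCAN_DURATION_S / SAMPLES_PER_SCAN  # 300 s between samples


def estimate_tick_rate(recv_times, tsvals):
    """Least-squares slope of TSval over scanner time, in ticks per second.

    Illustrative assumption: CCD's actual features are not specified here.
    """
    n = len(recv_times)
    if n != len(tsvals) or n < 2:
        raise ValueError("need two equally long sequences with >= 2 samples")
    mean_t = sum(recv_times) / n
    mean_v = sum(tsvals) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(recv_times, tsvals))
    var = sum((t - mean_t) ** 2 for t in recv_times)
    return cov / var


def synthetic_scan(ticks_per_second, start_tsval=1000):
    """Build a noise-free scan for a hypothetical device clock."""
    times = [i * SAMPLE_INTERVAL_S for i in range(SAMPLES_PER_SCAN)]
    tsvals = [start_tsval + round(t * ticks_per_second) for t in times]
    return times, tsvals
```

Two device models whose clocks tick at slightly different effective rates would yield different slopes, which is the kind of model-specific signal a classifier can learn from the raw vectors.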
Applying CCD. CCD can be applied using a previously collected dataset of device model scans, which can be expanded with data from new device models. Because the random forest classifier is trained in a supervised manner, expanding the dataset requires a series of initial scans of each new device model. However, this task can be performed as part of standard operation. The results of the initial scans can be shared and enable the detection of the same device model in other networks, even for a different instance of that model, because the features are related to the device model rather than the environment. This process works for IoT devices that support TCP timestamps and is not limited to specific vendors or protocols. IoT devices of the same model have identical hardware specifications and use similar firmware, which makes their clock characteristics similar. Non-IoT devices, in contrast, are built with varying and more complex hardware and operating systems, where TCP timestamps enable, for example, clock-based fingerprinting approaches [32].
CCD evaluation. In this article, we extend the existing evaluation of CCD in our previous work [45] to a larger dataset with more device models to demonstrate scalability. We also use this dataset to determine the appropriate quantification of evidence that enables the integration of CCD in the subjective-logic-based fusion used by SAFER. The fusion process itself is then separately evaluated in Section 4.4.
Dataset. We collected 5,993,052 unique scans, each containing 576 TCP timestamp samples taken over a time frame of 48 hours, as the dataset for this evaluation. The scans were performed in the CERN network, which has a wide variety of IoT and non-IoT devices brought and used by visiting researchers from all over the world. The scans were performed from several hosts located in different subnets of the network infrastructure. The ground truth for these scans is provided by a manual identification process in SAFER’s setup phase, which labels each scan with the associated device model; we evaluate SAFER against this ground truth. The manual process is not required for the practical application of SAFER, as a ground truth is only needed for the experimental validation of SAFER. The dataset contains 752 unique IoT devices associated with 57 different device models. It was collected from a network with a total of 13,181 networked devices of various kinds, including servers, workstations, virtual machines, and embedded devices. The classification into IoT and non-IoT devices was done manually.
Methodology. To evaluate CCD, we split the dataset into a training set representing 75% of all scans, a test set using 20%, and a validation set using 5%. In this experiment, the random forest classifier at the heart of CCD is trained in 10 iterations, where the final model is the average of these individual iterations. A preliminary, unpublished study showed that the classifier performs best with 60 decision trees. Each iteration uses a different split of training and test set to avoid overfitting to a specific training set. The evaluation of CCD, which should provide an unbiased assessment of the model’s performance on data not used during any training iteration, is performed using the validation set.
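The split described above can be sketched as follows. This is a minimal illustration using only the Python standard library; it assumes that the split is made per scan and that the validation set is fixed once before the iterations start. The function name and the seed handling are hypothetical.

```python
import random


def split_scans(scan_ids, seed, n_iterations=10):
    """Hold out 5% for validation, then draw 75%/20% train/test splits.

    Assumption for illustration: percentages refer to the full dataset,
    and every iteration reshuffles the non-validation scans.
    """
    rng = random.Random(seed)
    ids = list(scan_ids)
    rng.shuffle(ids)

    n_val = round(0.05 * len(ids))
    n_train = round(0.75 * len(ids))
    validation, rest = ids[:n_val], ids[n_val:]

    splits = []
    for _ in range(n_iterations):
        rng.shuffle(rest)
        # Slicing copies, so each iteration keeps its own train/test lists.
        splits.append((rest[:n_train], rest[n_train:]))
    return validation, splits
```

Because the validation scans are removed before any iteration starts, none of the ten training sets can contain them, which is what makes the validation results an unbiased estimate.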
Results. We used the clock characteristics identification mechanism in CERN’s large-scale network to identify the IoT devices in the dataset. For the 57 device models of various categories that CCD is able to identify, we obtained the following results for the training set: precision 90.81%, recall 90.19%, and accuracy 98.67%. We then used the trained classifier to identify the scans of the validation set, which the classifier had never seen during training. This yielded the following results for the validation set: precision 90.43%, recall 90.42%, and accuracy 98.64%. There are notable differences between different categories of device models, as can be seen in Table 1.
Discussion. In Table 1, we show the identified IoT devices, which are grouped for the sake of readability. We observed multiple NAS device models of QNAP and Synology that are only identifiable with an accuracy of 30% to 40%, which reduces the overall results significantly. One reason is that both manufacturers use the same hardware for different device models. CCD identifies these device similarities but is not able to distinguish these devices with high accuracy. To combine the evidence with other identification mechanisms, we convert the output into a subjective logic opinion. CCD’s scans were performed by multiple hosts within different subnets of CERN’s network infrastructure by actively initiating TCP connections. This approach is less likely to be affected by intermediate network equipment, as the TCP timestamps of the DUT’s clock are carried unmodified in the packet itself and are thus not significantly affected by inter-arrival times. Additionally, CCD uses a sampling interval of 5 minutes between the 576 data points, which makes it less susceptible to network interference. We stress that our scans are non-invasive, as they mimic normal user behavior.
Opinion configuration. To generate an opinion, the supporting evidence that CCD provides per device model must be quantified. The random forest classifier provides a probability of a correct match, \(p(x)\). A naive belief definition would assign the probability output directly to the belief function. However, this fails to consider the uncertainty in the predictions. We define the belief in a particular classification result \(x\) as given in Equation (3). This equation considers two types of uncertainty: the uncertainty due to classification errors, \(k\), and the uncertainty due to the limited model size, which is defined by the number of decision trees used for training, \(D\). The intuition is that \(k\) redistributes belief mass to uncertainty, based on the positive predictive value (i.e., precision) of CCD, essentially reducing overestimated belief due to overfitting. To ensure this always generates valid opinions, the division by \(n\), the total number of identifiable device models, is needed. However, as the number of decision trees used for training, \(D\), increases, our confidence in the correct classification increases; therefore, we use this to weight the probability. The values for these parameters are based on an unpublished preliminary study. In that study, we determined the optimal performance to be at \(D=60\) with the same set of \(n\) identifiable devices, resulting in a precision value for the classifier of 0.7, such that \(k=0.7\). \(D=60\) is a parameter of the random forest classifier, used both in this work and in our previous work [45]. The remaining belief mass not assigned to any \(x\) is then uncertainty.
To conclude, Equation (3) describes the probability of having identified the correct device model in relation to the probability of classifying all device models. Since one could possibly overfit the classifier using many decision trees \(D\), we introduced \(k\) as the precision value that represents how useful CCD’s validation set results are in general.
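The last step, assigning the remaining belief mass to uncertainty, can be sketched as follows. This minimal example takes the per-model belief masses as given and does not reproduce the belief definition itself; the function name and the tolerance parameter are hypothetical.

```python
def uncertainty_from_beliefs(beliefs, tol=1e-9):
    """Return the uncertainty mass of an opinion.

    `beliefs` maps each candidate device model x to its belief mass.
    The mass not assigned to any x becomes uncertainty, so belief and
    uncertainty together sum to 1.
    """
    total = sum(beliefs.values())
    if any(b < 0 for b in beliefs.values()) or total > 1 + tol:
        raise ValueError("belief masses must be non-negative and sum to at most 1")
    return max(0.0, 1.0 - total)
```

For example, belief masses of 0.5 and 0.2 for two candidate models leave an uncertainty of 0.3. The validity check reflects why the belief definition has to guarantee that the masses never sum to more than 1.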
4.3.2 WPD: Device Identification Using Web Pattern Detection.
As a second source of information to identify the device model, and additionally the firmware version, we developed a mechanism focusing on IoT web pages containing the relevant information. WPD uses patterns in the user interfaces of IoT devices, which are typically web pages, to automatically identify the device model and firmware. We convert the evidence provided by this mechanism into a subjective logic opinion, which then enables us to fuse the different evidence sources in our fusion approach. In contrast to the well-known Nmap tool, WPD does not use banners of non-HTTP services for identification. This is because we found that using Nmap to identify device models based on service banners only was not able to achieve a sufficient identification ratio. The low identification ratio is caused by the fact that, within the set of devices we are interested in, the service banners contain very little information to distinguish device models. Moreover, identifying the firmware version of a device using banners also poses a challenge, since version information is often removed from banners, and the version of a service does not uniquely identify a firmware image. Therefore, we designed WPD as an alternative approach to address this challenge.
Overview. For IoT devices, the primary way to administer and configure the device is often through a web interface, typically exposed through HTTP. A common way to manually gather information about the device is to analyze these administration interfaces; WPD essentially automates this process. WPD retrieves web pages and searches for patterns that reveal information about the device, such as strings in HTML pages and version numbers in embedded JavaScript snippets. WPD also uses hash values of images on these web pages, as well as Application Programming Interfaces (APIs) to fetch device-related information. We observed that CERN’s IoT devices also serve web pages on ports other than 80 and 443, so WPD covers these ports as well. The main novelty of WPD is not the automated operation but rather the evidence quantification that allows us to fuse the evidence with other evidence. Using our large, heterogeneous network, we are also able to demonstrate that this mechanism scales after an initial pattern definition effort.
To limit intrusiveness of WPD, our patterns are designed to require few HTTP queries: a generic query that is always used and a device-specific query. The first query corresponds to a typical user interaction, such as opening a device’s main page in a typical web browser. The resulting files are then analyzed using all configured patterns to gather basic information about the device; if the information retrieved is limited (e.g., if only the manufacturer can be identified), then a second query is performed. The second query accesses HTTP resources that can be used to identify a specific device model, such as a manufacturer-specific debug page that provides additional device information.
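The two-query strategy can be sketched as follows. This is a minimal illustration, not WPD's implementation: the `fetch` callable, the pattern dictionaries, and the rule that the second query is triggered when the manufacturer is known but the model is not are all assumptions made for this example.

```python
import re


def identify(base_url, fetch, generic_patterns, model_queries):
    """Run the generic query and, only if needed, one device-specific query.

    fetch(url) -> str is a hypothetical helper returning the page source.
    generic_patterns maps a field name to a regex with one capture group.
    model_queries maps a manufacturer to (path, regex) for the second query.
    Returns the identified fields and the number of HTTP queries issued.
    """
    result = {}
    page = fetch(base_url + "/")  # first query: the device's main page
    queries = 1
    for field, regex in generic_patterns.items():
        match = re.search(regex, page)
        if match:
            result[field] = match.group(1)

    manufacturer = result.get("manufacturer")
    if "model" not in result and manufacturer in model_queries:
        path, regex = model_queries[manufacturer]
        match = re.search(regex, fetch(base_url + path))  # second query
        queries += 1
        if match:
            result["model"] = match.group(1)
    return result, queries
```

Passing `fetch` as a parameter keeps the sketch independent of any HTTP library and makes the number of queries per device easy to verify.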
Web patterns to identify devices are set manually once per major user interface release of a firmware during SAFER’s setup phase. We observed that manufacturers aim to minimize development efforts and equip several device models with identical firmware and hence a similar user interface. This enables us to identify various devices using the same web patterns, even across multiple firmware versions. Web patterns are created once by a single person to let SAFER identify (1) previously unknown devices and (2) major user interface updates. Pursuing a community-based approach, those patterns are propagated to a central SAFER instance, enabling the entire community to identify such devices as well.
Applying WPD. WPD can be applied to a network using a pre-configured set of web patterns, which are part of the setup phase of SAFER. This pre-configured set of web patterns is based on several years of observations within the large and heterogeneous CERN network infrastructure. Many of the patterns are designed with specific device types or device models in mind; we expect WPD to be regularly extended with additional web patterns for new devices or firmware versions to improve accuracy. In the following, we introduce the types of patterns that are currently supported by SAFER.
String patterns: WPD collects string patterns from page sources of the IoT device, which are often specific to the device manufacturer, the device model, and sometimes even the firmware version. For example, if the device is a printer, the identification mechanism detects common printer string patterns, like “Cartridge” or “Tray” occurring in well-defined positions in the page source. Naively searching for web patterns in the page source without considering the context can cause false positives, because, for example, a random number may erroneously be identified as a firmware version. We describe a web page’s context using XPath, which narrows down the page content to which WPD applies its web patterns.
Embedded libraries: IoT manufacturers embed a variety of custom-built or public third-party libraries, such as JavaScript or CSS libraries, into their device’s web pages. WPD recognizes the occurrence of model-specific libraries, for example, and uses this as additional evidence for device identification.
Hashes of images: Some manufacturers embed descriptive text into images. For example, Sony’s network camera includes the text “Ipela network camera SNC-RZ50P” inside an image on the device web page. Hence, WPD can detect the device model by comparing hashes of such images.
APIs: Some manufacturers, such as QNAP, provide applications to administer their devices, which can scan the network for their devices. We analyzed the network traffic of such an application for QNAP devices, for example, and found that it uses a generic REST API URL that is always unprotected. WPD mimics such a scan using the same REST API endpoints as the administrative applications to read out the model and firmware version directly.
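Two of the pattern types above, context-restricted string patterns and image hashes, can be sketched as follows. This is a minimal illustration under stated assumptions: the page is well-formed enough to be parsed as XML, the context is expressed in the limited XPath subset of Python's `xml.etree.ElementTree`, and SHA-256 is used as the hash function. The source does not specify the parser or the hash function, and both function names are hypothetical.

```python
import hashlib
import re
import xml.etree.ElementTree as ET


def match_in_context(page_source, xpath, regex):
    """Apply a string pattern only to the nodes selected by an XPath context."""
    root = ET.fromstring(page_source)
    for node in root.findall(xpath):
        match = re.search(regex, node.text or "")
        if match:
            return match.group(0)
    return None


def model_from_image(image_bytes, known_hashes):
    """Look up a device model by the hash of an image from its web page."""
    digest = hashlib.sha256(image_bytes).hexdigest()
    return known_hashes.get(digest)
```

On a page that contains both an unrelated number such as "Uptime 3.1.4 days" and a firmware cell, a version regex applied to the whole page source matches the uptime first, whereas restricting the search to the firmware cell returns the intended version.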
WPD evaluation. To evaluate the performance of WPD, we scanned the CERN network, which at the time contained 13,181 networked devices, out of which 687 are classified as IoT devices. This ground truth is derived from a manual classification process, which we use to verify whether WPD identified all IoT devices, as well as the associated IoT device models and firmware versions. However, this manual process is not required for the practical application of SAFER. Within this dataset, WPD identified 526 out of 687 IoT devices, which is an identification rate of 76.56%. We manually verified the correctness of this classification and found that there were no false positives. The detailed results of the detected devices can be found in Table 2, which lists the identified devices per category, models, and firmware version found. The 526 detected IoT devices span 50 manufacturers, 110 models, and 124 firmware versions of home, business, and laboratory IoT devices. For the 161 IoT devices that were not fully identified, a partial classification was possible for a further 110 devices (e.g., only category, manufacturer, or model). The remaining 51 IoT devices were fully protected by an authentication prompt.
These results assume no knowledge of credentials for the landing pages of IoT devices. However, if the user specifies the credentials, WPD is able to analyze password-protected devices.
Opinion configuration. The WPD opinion is set by matching previously specified web patterns against a device’s web page. In the best case, this identifies the device manufacturer, model, and firmware version. Each additional web pattern match increases the certainty of the identified device model. If WPD identifies a firmware version, detects image hashes, or identifies configuration pages, the certainty increases further. This accumulated certainty expresses how strongly we believe that the device information identified by WPD is correct. Configuring a subjective opinion requires knowledge of both the device model and the identification mechanism, because we consider only the most descriptive web patterns of a device model. To explain the background knowledge that leads to the belief in Equation (6), we split the calculation into different parts.
The first part of Equation (6) is the term in brackets containing \(amount\_text\_patterns\) and \(amount\_hashes\). Based on our observations, the devices assessed at CERN serve either a web page containing text that describes the device or pages consisting mainly of an image that embeds device information. We identified image-embedded device information for seven device models of seven different categories. Since WPD matches only a few text patterns in such an “image-only” case, we set the weight of \(amount\_hashes\) to 0.3. As a result, one hash has a weight similar to that of two matching text patterns, because in an “image-only” case the image contains both the vendor and the model of the device. By comparison, identifying a vendor and a model with text patterns would require two text patterns. We argue that hashes are rarely used and, due to their weight, are limited in occurrence.
The second part of Equation (6) relates to \(amount\_config\_patterns\). First, we add \(\frac{1}{6}\) per matched config pattern (\(amount\_config\_patterns\)), analogous to each matched text pattern (\(amount\_text\_patterns\)). The reason is that we consider at most six web patterns (text and config) when calculating the belief value, which we discuss later in this section. Second, if WPD matches a web pattern on a configuration page that contains detailed technical or administrative information about a device, we consider it more likely that this information is correct. To express this increased belief in Equation (6), we added the factor 0.1 to \(amount\_config\_patterns\).
The last part of Equation (6) refers to \(fw\_detected\). As shown in Equation (5), this value is set to 0.4 if WPD finds the firmware version, which occurs in well-defined places within a device’s web page. WPD outputs only the most characteristic patterns it matched for a device model, its vendor, and the firmware. In our experiment, most commonly two text patterns matched the device vendor and model, and one config pattern matched the firmware version. In this case, the first part of Equation (6) is \(\frac{2}{6}\), the second part is \(\frac{1}{6} + 0.1\), and the last part is 0.4. This results in a high belief of 1.0, meaning that the device was identified by WPD.
where
amount_hashes is the number of recognized images (using known hashes),
amount_text_patterns is the number of matches from text-based web patterns,
amount_config_patterns is the number of matches from configuration page web patterns, and
fw_detected represents whether WPD detected the firmware version.
The maximum number of patterns counted by WPD (\(amount\_text\_patterns\) and \(amount\_config\_patterns\) combined) is set to 6. We limit the number of matching text patterns and hashes because WPD is designed to use only the most descriptive ones per device model. This prevents WPD from including, for example, unrelated patterns of other categories or vendors that would distort the belief in a device identification.
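The parts described above can be combined in a short sketch that reproduces the worked example. It is one reading of the description rather than a reproduction of Equation (6): it assumes that the 0.1 bonus applies to each matched config pattern, that text and config patterns together are capped at six, and that the resulting belief is capped at 1.0. The function and constant names are hypothetical.

```python
MAX_PATTERNS = 6      # at most six text and config patterns are considered
HASH_WEIGHT = 0.3     # weight of one recognized image hash
CONFIG_BONUS = 0.1    # extra belief for a match on a configuration page
FW_DETECTED = 0.4     # value of fw_detected when a firmware version is found


def wpd_belief(text_patterns, config_patterns, hashes, firmware_found):
    """Belief in a WPD identification under the assumptions stated above."""
    text = min(text_patterns, MAX_PATTERNS)
    config = min(config_patterns, MAX_PATTERNS - text)  # assumed joint cap

    first = text / MAX_PATTERNS + HASH_WEIGHT * hashes
    second = config * (1 / MAX_PATTERNS + CONFIG_BONUS)  # assumed per-pattern bonus
    last = FW_DETECTED if firmware_found else 0.0
    return min(1.0, first + second + last)  # assumed cap for a valid belief
```

With two text patterns, one config pattern, and a detected firmware version, the sketch yields a belief of 1.0, matching the example in the text. An "image-only" page with a single recognized hash yields 0.3.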