1 Introduction
Human environments contain rich contextual information that could be used to power a variety of context-aware computing applications. Users’ presence in a kitchen, for example, often indicates food preparation activities, whereas classrooms indicate learning and theaters indicate entertainment. As a result, accurate and robust sensing of user presence in environments with varying functionalities has long been desired in HCI [1, 70, 71]. Additionally, fine-grained information on user location could also facilitate conventional sensor-aided approaches such as gait analysis [14, 30] and activity logging [26], as well as medical research and many more in-the-wild studies.
In this research, we create a wearable system to recognize ground surfaces, which are a universal and expressive feature of human environments and often strong indicators of user contexts. Surface texture, a distinguishing feature of any ground surface defined by four characteristics (lay, flaw, roughness, and waviness) [43], has recently received considerable attention in the sensing research field. For example, texture-based ground surface detection has been widely used in robotics applications, such as assisting mobile robots in detecting obstacles [73] and enabling autonomous agriculture [42].
When we walk barefoot on ground surfaces, we feel the soft grass of a lawn, the lumpy fabric of a carpet, the gritty soil of a hiking trail, the smooth tiles of a bathroom, the grainy wood of a floor, and the rough sand of a beach. We believe that wearable intelligence could benefit from a similar perceptual capability for sensing ground surfaces, but without human limitations in sensitivity, granularity, latency, and time of operation, in order to better understand environments and user contexts, provide assistance, accommodate natural interactions, and log important information for analysis and diagnosis.
As users’ feet are almost always in contact with ground surfaces, shoe-instrumented wearables serve as an ideal platform for sensing them. To enable shoe wearables to sense ground surfaces, we propose LaserShoes, a low-cost ground surface detection system using the laser speckle imaging technique (Fig. 1). In comparison with conventional vision-based approaches that take RGB photos of ground surfaces, laser speckle imaging reveals richer and more accurate information about surface textures using an active signal: laser beams. It can distinguish surface textures that appear visually similar to cameras. Additionally, unlike conventional imaging systems, laser speckle imaging does not require a lens and thus cannot capture clear visuals of users’ surroundings, which helps preserve privacy.
Our system mainly consists of a laser emitter, an image sensor (CCD), and a Raspberry Pi board. The laser emitter and the image sensor are attached to shoes to capture videos of speckle patterns that reflect surface textures. The Raspberry Pi board is instrumented on a user’s lower leg and runs the detection pipeline, which features a pre-processing phase to eliminate blurry images and a deep learning model to acquire ground surface types. The entire system costs $136. We recruited 15 participants in a user study where they were asked to walk on 24 ground surfaces for 1~2 minutes each. In total, we collected 28,492 1.5s video sessions. We validated our system under within-user and cross-user conditions, achieving classification accuracies of \(86.93\%\) and \(80.57\%\), respectively. We also carried out three additional studies to characterize the performance of our system in detecting dry, wet, and frozen surfaces, sand surfaces of different grain sizes, and surfaces under various lighting conditions. Finally, we demonstrated applications enabled by our system, such as a personal running assistant, gait analysis, surface-aware cleaning equipment, coarse navigation, and daily activity recognition through localization.
In summary, our main contributions include:
•
We designed and implemented LaserShoes, a wearable system that identifies ground surfaces based on laser speckle imaging.
•
We designed a data pre-processing method for LaserShoes to identify relatively stationary frames from collected videos and completed an end-to-end real-time inference pipeline based on contemporary deep learning techniques.
•
We conducted an evaluation with 15 participants to investigate the performance of LaserShoes with two validation mechanisms (i.e., within- and cross-user), and under various surface and environmental conditions.
3 Principles of Operation
LaserShoes is based on two principles of operation: 1) we used Laser Speckle Imaging to detect ground surface textures, and 2) we used the variance of grayscale-converted frames from recorded videos to infer gait status and obtain high-quality speckle images.
First, Laser Speckle Imaging can reveal surface texture characteristics. When a beam of coherent light (e.g., a laser) illuminates a ground surface, the light is reflected and captured by a nearby image sensor, forming an image with laser speckles, as shown in Fig. 2 (a). This phenomenon occurs because ground surfaces are rough: their micro geometry varies the optical paths of the laser beam. Thus, each pixel of the image sensor receives the reflected laser light with different constructive and destructive interference, forming laser speckles. Because different ground surfaces have different micro geometries, the resulting laser speckle patterns vary and can be leveraged to identify ground surfaces.
Second, we applied Laser Speckle Imaging with the consideration that a user’s feet can be in constant motion (e.g., walking and running) relative to ground surfaces. The sensor’s movement relative to the ground manifests as motion blur on images, resulting in blurry laser speckle images with lower variances than those with sharp speckles. As illustrated in Fig. 2 (b), laser speckle images are much clearer when a user’s foot is in contact with the ground than when the foot is moving in the air. We utilized the variances of grayscale speckle images to identify foot-ground contact periods in recorded videos and used only speckle images collected from these periods for the subsequent classification.
4 Hardware Design
We prototype LaserShoes to investigate the capabilities of laser imaging in ground surface detection. Although our current implementation is relatively bulky and impractical for direct adoption, our end-to-end prototype enables us to effectively verify our sensing principle, conduct technical evaluation, and explore potential applications. The form factor of our current prototype is akin to established works in the HCI community [7, 61, 68]. In this section, we introduce our hardware configurations and fabrication.
4.1 Embedded System
We apply Laser Speckle Imaging to capture speckle patterns and recognize ground surfaces. The technique has been used in the HCI community and can be eye-safe [4]. To utilize this technique, our system consists of four parts: 1) a laser emitter, 2) an image sensor, 3) a Raspberry Pi board, and 4) assistant modules. The laser emitter and the image sensor compose the detecting component, while the other parts compose the processing and assistant component. The hardware details of our system are shown in Fig. 3. Compared to prior works [20, 65], the core sensors bundled in our system are more compact and easier to mount on shoes. The enclosure of the system is 3D printed using photosensitive resin. The entire system, including manufacturing, costs $135.23, of which the laser emitter and image sensor together cost $23.14. The cost of each component is shown in Table 1.
Laser Emitter. We select a laser emitter with a 520 nm wavelength and 5 mW output power based on our configuration experiments (see Section 4.2.2). Given that a low-power laser emitter results in insufficient illumination and unclear speckle patterns, while a high-power laser may not be eye-safe, we ultimately choose a 5 mW laser (Class IIIA), which poses a chronic viewing hazard but is safe for transient exposures. Additionally, to maximize laser reflection and preserve the signal-to-noise ratio (SNR), we orient the laser emitter perpendicular to ground surfaces.
Image Sensor. Given that our system is mounted on users’ shoes, it is subject to movement as users walk, leading to the loss of speckle information in parts of the image due to motion blur. To extract images with clear speckle patterns from captured videos, we select an OV2710 image sensor with a relatively high frame rate of 60 fps. We set the resolution of the image sensor to 1280 × 720 pixels, the highest resolution at the 60-fps frame rate. It is worth noting that our system does not use a lens because laser beams reflected by ground surfaces are always in focus, producing sharp speckle patterns distributed uniformly across the captured images when a user’s shoe is relatively still with respect to ground surfaces. To further improve SNR, the image sensor is placed right next to the laser emitter.
Raspberry Pi Board and Assistant Modules. For image acquisition and processing, we choose the Raspberry Pi Zero 2 W for its compact size, superior speed, and wireless connectivity. With the connected laser emitter and image sensor, the Raspberry Pi board carries out three functions: 1) supplying power to the laser emitter from GPIO, 2) acquiring videos from the image sensor through a USB interface, and 3) processing acquired videos and reporting the detected type of ground surface to users. The assistant modules include a battery module, a USB interface module, and a switch module to safely supply power to the entire system.
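To make the acquisition setup concrete, the following minimal Python sketch shows how such a USB image sensor could be opened and read with OpenCV. This is illustrative only: the device index is an assumption, and our actual capture and pre-processing code is implemented in C++ (see Section 5.4).

```python
import cv2

# Illustrative capture setup; device index 0 is an assumption.
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)   # highest resolution at 60 fps
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)
cap.set(cv2.CAP_PROP_FPS, 60)

session = []
while len(session) < 90:                  # one 1.5 s video session (Section 5)
    ok, frame = cap.read()
    if not ok:
        break
    session.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
cap.release()
```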
4.2 Configurations
In order to identify the optimal configuration of our system, we conducted experiments using various combinations of laser wavelengths and sensor-to-ground distances, as these are two significant factors affecting the formation of laser speckles, and investigated their surface classification performance. In these experiments, we used an image sensor of a model commonly used in webcams, with a pixel size of 3 μm × 3 μm.
4.2.1 Image sensor.
Given that our system operates in a moving scenario, an image sensor with a sufficient frame rate is required to ensure the quality of captured videos and to extract clear speckle patterns from them. Through experiments in which researchers collected videos while walking at their normal speed with the camera configured at different frame rates, we discovered that the standard 30-fps frame rate is insufficient due to motion effects, resulting in an excessive number of blurry images. On the other hand, sensors with higher frame rates are often costly, which contradicts our design goal of being low-cost. As a result, we choose a frame rate of 60 fps and rely on a custom pre-processing pipeline to mitigate motion blur (see Section 5.1).
4.2.2 Wavelength and distance.
Since infrared lasers are difficult to debug, we selected laser wavelengths in the visible spectrum. Specifically, in our experiments, we investigated four representative laser wavelengths (405 nm, 450 nm, 520 nm, and 650 nm). In terms of distance, considering that our system is intended to be fixed on shoes, which typically sit a short distance above ground surfaces, we kept the distance as short as possible while maintaining sufficient clearance for the light path (i.e., from the emitter to ground surfaces and back to the image sensor). Thus, for each wavelength, we investigated its performance at distances to the ground surface of 1 cm, 3 cm, 5 cm, 7 cm, 9 cm, 11 cm, 13 cm, and 15 cm (Fig. 4).
For each wavelength-and-distance combination, we collected a number of images with speckle patterns on five surfaces (wood, fabric, concrete, rubber, and ceramic). During the collection, we manually swapped laser emitters of different wavelengths and adjusted the sensor distance to the ground surface. To evaluate the quality of these images, we conducted a quick validation using ResNet-18 [21], with the collected images split into a training set and a testing set. Our assumption is that laser speckle images with high-quality speckle patterns will yield relatively high classification accuracy, revealing optimal wavelength-and-distance combinations.
The average classification accuracies and their standard deviations for all wavelength-and-distance combinations are shown in Appendix A. Results indicate that the green laser (520 nm) exhibits both high accuracy and stability, though almost all combinations reach high classification accuracies. When the distance is under 11 cm, the accuracies of the green laser are all above \(98\%\). Thus, in our subsequent studies, we choose the green laser with a 520 nm wavelength and keep the distance between the sensor and ground surfaces under 11 cm when affixing the sensor to users’ shoes.
4.3 Mechanical Structure and Fabrication
We build a mechanical structure of two modules that together allow adjusting the angle of the detecting component relative to ground surfaces and fixing the system on a user’s leg (Fig. 3). The first module consists of five parts: two semi-cubic shells forming a container (b11, b13), a limiter with two cylindrical channels (b12), a cylindrical housing (b14), and a clamping part (b15). The two semi-cubic shells are joined into a cube container by screws on the side. The image sensor is fixed inside the cube container via slots in the four corners of its inner side, and the laser is fixed on the bottom side of the cube container via the limiter (b12). Rivet structures connect the cube container to the cylindrical housing (b14) and implement the rotatable connection between the cylindrical housing and the clamping part (b15). Screws through a series of discontinuous holes in the cylindrical housing and the clamping part allow an adjustable angle between the cube container and the clamping part, ranging from 0 to 90 degrees in 15-degree steps. As the clamping part of the first module is fixed to the outer side of a user’s ankle, adjusting this angle changes the angle of the laser sensing beam relative to the user’s leg and thus to the ground surfaces.
The second module contains four parts: a supporting part (b8), a square housing (b9), a top lid (b10), and a controller box (b5). Among these, b8, b9, and b10 are joined by three studs on the corners to form a container for the combined structure of the Raspberry Pi board and the battery module. The container measures approximately 65.7 mm in length, 30.6 mm in width, and 46.0 mm in height. The USB port and the charging port are exposed on the exterior of the container. The controller box (b5) contains the switch module and is attached to the rest of the module with a side slide. This module is fixed to the outside of the user’s lower leg with straps fitting through b8, and the main structure of the container is kept away from the user’s skin to avoid possible discomfort due to the heat dissipation of our system. The above mechanical structures are 3D printed with photosensitive resin at a 0.05 mm resolution using a Lite600HD 3D printer.
5 Ground Surface Detection
The whole ground surface detection pipeline of LaserShoes is illustrated in Fig. 5. The LaserShoes device is expected to work despite constant motion relative to ground surfaces while users are walking. Every 90 frames are treated as a video session, taking about 1.5 seconds to collect. This duration is selected based on our observation that at least one foot-ground contact appears within such a session when users walk at normal speeds.
Video sessions are fed into our ground surface detection system, which consists of a pre-processing phase and a deep learning model for classification. Specifically, in the pre-processing phase, we select images with clear speckle patterns from the collected videos and crop the selected speckle images into smaller images before feeding them into a deep learning model for classification, which also serves as a data augmentation technique that increases our data collection efficiency. This pre-processing phase allows LaserShoes to deal with distance changes and motion blur caused by users’ gait.
5.1 Data Pre-processing
The motion of users’ feet causes speckle patterns to blur and thus contain little information about ground surfaces (Fig. 2 (b)). To achieve high detection accuracy, it is necessary to extract high-quality images with clear speckle patterns. Our pre-processing phase contains four stages (Fig. 5 (b)-(e)): 1) identifying foot-ground contact periods, 2) cropping images, 3) removing partially blurry images, and 4) removing fuzzy patterns. Specifically, we first identify images collected during foot-ground contact periods. We then crop these foot-ground contact images into smaller images of size 256 × 256 and discard cropped images with partial blur or fuzzy patterns. After the pre-processing phase, we obtain a group of cropped images with clear speckle patterns to feed into our deep learning model. The details of each stage are explained below, and the efficacy of the data pre-processing is discussed in Section 8.1.
5.1.1 Identifying foot-ground contact periods with a variance-based threshold.
We observe that the distribution of bright and dark regions in speckle images contains the majority of information about ground surfaces, and that color is not a significant factor. Therefore, to increase the efficiency of our pre-processing phase, we convert all speckle images to grayscale.
The first step, after acquiring the grayscale frames, is to identify speckle images that correspond to foot-ground contact periods. These images are often less blurry and reveal more information about ground surfaces. We note that when LaserShoes is moving relative to the ground, the edges of the speckle patterns become fuzzy, resulting in lower variances of pixel intensities across the collected images. Fig. 6 shows example speckle images from a foot-ground contact period and from a user’s foot in motion, illustrating the difference in blurriness. Hence, by comparing pixel variances, we identify speckle images collected during foot-ground contact periods and pass them to the next stage.
We calculate the grayscale variance of each speckle image in each video session. We then recognize a speckle image as one collected during a foot-ground contact period if its cross-pixel variance is larger than the top 8% variance value of the previous 90-frame video segment. To further improve robustness, we use adjacent images to aid identification: we consider a speckle image to be a foot-ground contact image only when both its previous frame and its next frame also have high variance. Finally, before feeding the selected images into the next stage, we center-crop them because the edges of the CCD module lack sensitivity and cannot output clear laser speckles. The pseudo-code of this pre-processing stage is shown in Algorithm 1.
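A minimal Python sketch of the logic in Algorithm 1 follows. Our actual implementation is in C++; the function name is ours, and we read the "top 8% variance value" as the 92nd percentile of the previous segment’s variances, which is an assumption.

```python
import numpy as np

def select_contact_frames(gray_frames, prev_vars, crop_hw=(592, 1024)):
    """Sketch of Algorithm 1: keep frames from foot-ground contact periods.

    gray_frames: 90 grayscale frames of one video session (HxW uint8 arrays).
    prev_vars:   per-frame variances of the previous 90-frame segment.
    """
    threshold = np.percentile(prev_vars, 92)  # "top 8% variance value"
    variances = np.array([float(f.var()) for f in gray_frames])
    sharp = variances > threshold
    selected = []
    for i in range(1, len(gray_frames) - 1):
        # robustness: the previous and next frames must also be sharp
        if sharp[i - 1] and sharp[i] and sharp[i + 1]:
            h, w = gray_frames[i].shape
            ch, cw = crop_hw
            y0, x0 = (h - ch) // 2, (w - cw) // 2
            # center crop: CCD edges lack sensitivity
            selected.append(gray_frames[i][y0:y0 + ch, x0:x0 + cw])
    return selected, variances
```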
5.1.2 Cropping images.
The first stage yields foot-ground contact images of 1024 × 592 pixels. We conduct a test to investigate the effect of image size on the detection performance in Section 5.3, and choose 256 × 256 pixels as the size of our input data. Specifically, we use an extraction window of that size to crop input images out of each foot-ground contact image. This cropping operation also increases the number of samples and improves the efficiency of deep learning model training.
However, within each foot-ground contact image, some regions may still be blurry while others have clear speckle patterns. We eliminate crops with blurry speckle patterns at this stage to further improve our system’s robustness. Instead of the intuitive approach of calculating the pixel variance of every possible cropped image, which could be computationally expensive, we calculate the variances of cropped images along the left edge of a foot-ground contact image to determine the blurriness of the rows in which these crops reside. We note that, owing to the rolling shutter of our image sensor, the distribution of speckle patterns within an image row is often uniform (Fig. 6). Thus, we can determine whether a row has clear speckle patterns by inspecting only one section of it. Specifically, we slide the extraction window in the y direction to crop out different image patches and check whether they are clear by thresholding their pixel variances (Fig. 5). The slide stride is 56 pixels, so six cropped images are extracted from each foot-ground contact image. If a cropped image has a variance higher than the top 20 percent of all variances of all foot-ground contact images belonging to the current video session, we consider it to have clear speckle patterns and save it in a buffer. We also save the indexes of these cropped images and slide the extraction window along the rows at these indexes with a 128-pixel stride. The patches extracted in this step are candidate images. Histogram equalization is applied to candidate images to amplify their contrast before they are fed into the next pre-processing stage. Algorithm 2 shows the pseudo-code of this stage.
5.1.3 Removing partially blurry images with region-based sum comparison.
Blurry images may still result from the aforementioned stages. To eliminate them, we design an additional pre-processing stage for fine selection. Because histogram equalization greatly amplifies the contrast of these potentially blurry candidate images, the pixel statistics of different regions within such images vary greatly (Fig. 7 (a)). Thus, to identify blurry images, each candidate image is divided equally into four sub-images. We calculate the sum of the grayscale values of each sub-image and eliminate the candidate image if the difference between any two sums exceeds a given threshold. The remaining candidate images are then fed into the final pre-processing stage.
5.1.4 Removing fuzzy patterns with Gabor filters.
Since there may still be relative motion between our sensor and ground surfaces during the foot-ground contact period due to the deformation of ground surfaces, fuzzy patterns can appear in the speckle images. These fuzzy patterns often appear as stripes oriented in a particular direction, whereas clear speckle images have patterns with no obvious orientation (Fig. 7 (b) and (c)). To remove images with fuzzy patterns, we apply 8 Gabor filters with different orientations (30, 60, 120, 150, 210, 240, 300, and 330 degrees) and remove images with unbalanced filtered results. Specifically, we eliminate an image if the difference between any two filtered results is greater than a given threshold. The candidate images that survive the third and fourth stages are the output of our pre-processing phase and the input to the deep learning model. The pseudo-code for these two pre-processing stages is described in Algorithm 3.
5.2 Deep Learning Model
Image classification is a mature field in Computer Vision (CV), and many deep learning algorithms have shown remarkable performance. To choose a proper model for our sensing, we conduct a comparison study with different models, including ResNet-18 [21], VGG [47], GoogleNet [55], and MobileNetV3 [24]. As shown in Table 2, ResNet-18 and GoogleNet achieve comparatively high accuracies. We eventually choose ResNet-18 to implement LaserShoes for its smaller size, despite its slightly lower accuracy than GoogleNet.
In the ResNet model, input images first pass through a convolution layer, a batch normalization (BN) layer, and a rectified linear unit (ReLU) layer. The data then goes through a series of basic blocks, each consisting of a residual mapping and an identity mapping. In the residual mapping, the input passes through a convolution layer, a BN layer, a ReLU layer, another convolution layer, and another BN layer; in the identity mapping, the input only passes through a 1 × 1 convolution layer to be downsampled to the same size as the residual mapping result. The two mapping results are then added, and the sum passes through a ReLU layer to produce the output of a basic block. Finally, an average pooling layer and a fully connected layer yield the classification results. During training, we select Cross Entropy Loss as the loss function and use the Adam optimizer. The learning rate and the batch size are set to 0.0001 and 32, respectively. We do not use a pre-trained model to initialize our parameters, and we train for 150 epochs, which we find sufficient for our models to converge.
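The training setup described above maps to a few lines of PyTorch. The following sketch assumes 24 surface classes and grayscale crops replicated to three channels; data loading is omitted.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

model = resnet18(num_classes=24)          # trained from scratch, no pretraining
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train(loader, epochs=150):            # 150 epochs suffice to converge
    model.train()
    for _ in range(epochs):
        for images, labels in loader:     # images: N x 3 x 256 x 256 crops
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
```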
5.3 Image Size Selection
The model’s input is the set of clear candidate images from the data pre-processing phase, and the model’s output is the type of ground surface. The input image size is set to 256 × 256 in our ground surface detection, the same as the size used in SensiCut [13]. To verify the efficacy of this image size, we extract a number of clear candidate images of different sizes to train a series of ResNet-18 models. The image sizes tested were 64 × 64, 128 × 128, 256 × 256, and 512 × 512. The average accuracy and the inference time for classifying one input image are shown in Table 3. As expected, larger input images lead to higher accuracy but take significantly longer to classify. Given that the improvement in accuracy from 256 × 256 to 512 × 512 is modest, we select 256 × 256 as the input image size to balance accuracy with inference time.
5.4 Real-Time Inference
In real-time detection, the image sensor continually records frames, and every 90 frames constitute a video session that is fed into the pre-processing phase. If the pre-processing phase detects no clear candidate images, the detection pipeline outputs “None” as a neutral label. We conducted testing using 100 video sessions captured during participants’ normal walks on various everyday ground surfaces. Our results show that after the pre-processing phase, an average of 11 input images per video session are fed into the subsequent model. We use C++ to implement the data pre-processing for superior speed and Python to implement the deep learning model. For every input image of a video session, the classification model outputs a corresponding surface type; among these outputs, we choose the most frequent surface type as the surface label of the video session. This label is also provided to the user as detection feedback. We recorded the average time needed to complete pre-processing and inference for one video session, using 100 sessions collected from various participants and ground surfaces, processed on a Raspberry Pi Zero 2 W, on a laptop with a 3.1 GHz dual-core Intel Core i5 CPU, and on an NVIDIA GeForce RTX 3090 GPU, respectively. Results are shown in Table 4. We find that the current implementation of LaserShoes running solely on the Raspberry Pi board cannot perform real-time detection without dropping input images if the duty cycle of users’ feet contacting ground surfaces is too high, which we acknowledge as a limitation of our system.
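Per-session labeling reduces to a majority vote over the per-crop predictions; a minimal sketch (the function name is ours):

```python
from collections import Counter

def session_label(per_crop_predictions):
    """Majority vote over one session's per-crop model outputs; returns
    "None" when pre-processing yielded no clear candidate images."""
    if not per_crop_predictions:
        return "None"
    return Counter(per_crop_predictions).most_common(1)[0][0]
```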
6 Evaluation
Our user study consisted of one main study and three supplementary investigations. The main study involved collecting data on 24 ground surfaces to understand LaserShoes’ ability to classify ground surface materials while the wearer is walking. In the supplementary studies, we aimed to evaluate the robustness of LaserShoes under various conditions (i.e., on dry, wet, and icy surfaces, on sand surfaces of different grain sizes, and under different lighting conditions).
Considering that identifying the foot-ground contact periods of a 1.5s video session is the first pre-processing stage and the basis of the subsequent stages, a high detection accuracy (DA) of identifying foot-ground contact periods (FGCPs) is necessary. Thus, we first evaluated this detection accuracy, which is defined as
\( DA = \frac{\text{number of correctly identified FGCPs}}{\text{number of ground-truth FGCPs}} \).
Then, we used accuracy, precision, recall, and F1 score as our evaluation metrics for the ground surface classification. To calculate them, we only considered the 1.5s video sessions that produced a surface label (SL) output and excluded those with “None” outputs. The classification accuracy (CA) is defined as
\( CA = \frac{\text{number of correctly classified video sessions with an SL}}{\text{number of video sessions with an SL}} \).
6.1 Main Study with 24 Ground Surfaces
6.1.1 Ground surface materials.
We selected a total of 24 common ground surfaces, comprising 15 indoor surfaces and 9 outdoor surfaces, for our study. These surfaces can be classified into five groups: 1) rough, 2) smooth, 3) hard, 4) discontinuous, and 5) granular. The surfaces are shown in detail in Fig. 8. For each ground surface, we prepared at least one continuous area of 20 square meters so that participants could walk naturally (e.g., without needing to turn frequently or keep looking down at the ground) during data collection.
6.1.2 Participants and apparatus.
We recruited 15 participants (7 males and 8 females), with ages ranging from 20 to 27 years old (mean = 23.40, SD = 1.56) via social media and flyers. Their body weights ranged from 48.0kg to 82.6kg (mean = 61.03, SD = 9.93) and their heights ranged from 158.5cm to 182.0cm (mean = 170.13, SD = 6.83). Of all the participants, 5 wore sneakers, 6 wore running shoes, 3 wore canvas shoes, 1 wore ankle boots, and 1 wore snow boots. Their shoe sizes ranged from 23.0cm to 27.0cm, with a mean of 24.67 (SD = 1.12).
Participants wore their own shoes normally along with our LaserShoes as described in Section 4 to collect videos of ground surfaces while walking on them. Considering that our device requires proximity to ground surfaces, we required participants to wear flat shoes. Fig. 9 shows some example shoe styles that LaserShoes is compatible with. Distances between our image sensor and ground surfaces varied from 6 cm to 10 cm across the 15 participants. The detecting component was attached tightly to participants’ shoes through our clamping mechanism, while the processing and assistant component was attached to participants’ lower legs using Nylon tapes.
6.1.3 Data collection procedure.
We started the study with an introduction to the procedure and helped each participant put the devices on. For each surface, we used tape to mark an area within which participants could walk freely. Each study had two sessions. A short practice session came first, in which the participant walked across all surfaces; it familiarized participants with the system, and no data was collected. We asked participants to slow down their walk if LaserShoes captured no clear speckle patterns (i.e., no output from the pre-processing phase). After the practice session, participants walked on each chosen surface for 1~2 minutes in the second session for data collection. The order of surfaces was randomized for each participant to avoid bias (e.g., a change in walking speed or gait caused by fatigue). In addition, to simulate real-world scenarios, participants were asked to re-adjust their LaserShoes after each session and to take breaks (around 2 minutes) in between sessions. The study was conducted under typical indoor and outdoor lighting conditions. To collect ground truth for foot-ground contact periods, a camera recorded participants’ foot movements during the study, and research assistants manually labeled all foot-ground contact timestamps. In total, we collected 28,492 1.5s video sessions on 24 surfaces from the 15 participants. Each participant took around 2 hours to finish the data collection.
6.1.4 Results.
To evaluate the performance of our system for ground surface classification, we used both within-user and cross-user approaches. For the within-user evaluation, to ensure no overlap between the training and test sets, we first split all data into ten folds and randomly selected two folds as the test set; no time-adjacent input images were included in both the training and test sets. For the cross-user evaluation, we used a leave-one-out method, training on 14 participants’ data and testing on the remaining participant.
Detection Accuracy of Identifying Foot-Ground Contact Periods. The collected videos were processed using the method described in Section 5.1, and we first evaluated the performance of identifying foot-ground contact periods using the formula defined above. The detection accuracy is \(90.91\%\), indicating that our method can detect the majority of foot-ground contact periods from recorded data.
Within-User Evaluation Results. Results of the within-user detection accuracy for the 24 ground surfaces are shown in Fig. 10 (a). The average classification accuracy over the 24 ground surfaces is \(86.93\%\), with a recall of \(87.17\%\) (SD = 10.09), a precision of \(85.82\%\) (SD = 13.57), and an F1 score of \(85.94\%\) (SD = 10.59). For the 15 indoor surfaces, the average classification accuracy is \(91.53\%\), with a recall of \(90.60\%\) (SD = 9.62), a precision of \(92.48\%\) (SD = 7.23), and an F1 score of \(91.23\%\) (SD = 7.06); for the 9 outdoor surfaces, the average classification accuracy is \(78.86\%\), with a recall of \(81.46\%\) (SD = 8.07), a precision of \(74.73\%\) (SD = 14.39), and an F1 score of \(77.13\%\) (SD = 9.58). Indoor surface detection is more accurate than outdoor surface detection. The reason could be that outdoor lighting is less stable than indoor lighting due to changes in the intensity and angle of sunlight, which may reduce the quality of collected images and degrade detection results.
We also evaluated the detection accuracy of surfaces with different characteristics; the results are shown in Table 5. Rough surfaces have the highest accuracy and the lowest standard deviation among the five surface groups. This makes sense because the microstructure of rough surfaces is more complex, resulting in more distinctive patterns. Furthermore, discontinuous surfaces have the lowest average accuracy and a large standard deviation.
Cross-User Evaluation Results. For the cross-user evaluation, the detection results are shown in Fig. 10 (b). The average classification accuracy of the cross-user model is \(80.57\%\), with a recall of \(80.36\%\) (SD = 10.48), a precision of \(78.32\%\) (SD = 17.62), and an F1 score of \(78.73\%\) (SD = 13.86). For indoor and outdoor surfaces, the average classification accuracies are \(83.22\%\) and \(73.13\%\), with recalls of \(85.48\%\) (SD = 8.95) and \(71.85\%\) (SD = 6.56), precisions of \(87.79\%\) (SD = 10.00) and \(62.54\%\) (SD = 16.21), and F1 scores of \(86.39\%\) (SD = 8.45) and \(65.97\%\) (SD = 11.53), respectively. In contrast to the within-user results, classification accuracy decreases in the cross-user evaluation. This could be because participants wore different shoes in the study, which caused different distances between the image sensor and ground surfaces. Furthermore, participants’ different foot postures when their feet contact ground surfaces contribute to the decrease in accuracy: some participants’ feet were in eversion, while others were in inversion or in neutral positions. These different foot postures (Fig. 11) change the distance between the image sensor and ground surfaces. The distance differences result in differently formed speckle patterns and thus variance between training and test datasets: the same type of ground surface may correspond to multiple speckle patterns. This variance may decrease the accuracy of the cross-user evaluation. As with the within-user results, indoor detection outperformed outdoor detection.
We also tested the performance of the cross-user model on the five groups of surfaces with different characteristics. The results, shown in Table 5, indicate that compared to the within-user results, detection accuracy did not change much for smooth and hard surfaces. However, for rough, discontinuous, and granular surfaces, there is a large decrease. The reason may be that surfaces with complex microstructures amplify the differences in participants’ foot postures, resulting in larger differences between speckle patterns belonging to the same type of ground surface.
Visually Similar Ground Surfaces. Among our selected ground surfaces, light-colored wood and artificial flooring look very similar and are not easy to distinguish with conventional RGB cameras. The results in Fig. 10 reveal that under both within-user and cross-user conditions, these two visually similar surfaces can be distinguished from each other with LaserShoes.
6.2 Supplementary Investigations
Given the length of the primary data collection, the supplementary studies were not conducted on the same day, to avoid participant fatigue. 12 participants took part in our supplementary studies. The basic procedure was the same as in the main study. In total, we collected 19,319, 4,250, and 41,005 1.5s video sessions for the three studies, respectively.
6.2.1 Dry, wet, and icy surfaces.
In outdoor settings, ground surfaces can be dry, wet, or icy depending on the weather, which may pose a danger to pedestrians. Thus, the capability of LaserShoes to identify ground surface conditions could have real-world uses. We conducted experiments to classify surface conditions on the nine types of outdoor surfaces shown in Fig. 8, under three conditions (i.e., dry, wet, and icy). For the wet condition, we poured water on the ground; for the icy condition, we put crushed ice on the ground. We conducted two evaluations in this study. In the first evaluation, we treated each combination of surface and condition as a separate label (27 in total). In the second evaluation, we combined all surfaces under the icy condition into one label (19 in total). The detailed results are shown in Fig. 12. In the first evaluation, the detection model achieves a \(62.89\%\) recall, a \(66.06\%\) precision, and a \(59.91\%\) F1 score. In the second evaluation, after merging icy surfaces, the detection model achieves a \(76.06\%\) recall, a \(76.75\%\) precision, and a \(74.29\%\) F1 score. These results show the feasibility of LaserShoes detecting ground surface conditions in real-world applications to improve pedestrian safety.
6.2.2 Sand surfaces with different grain sizes.
Even when the material is the same, its physical state (e.g., graininess, looseness) can vary. We also investigated how LaserShoes could perform finer-grained ground surface material sensing. Participants were asked to walk on three different types of sand surfaces following the same procedure as the main study. Specifically, we assessed the classification performance using data collected on sand surfaces with three different grain sizes (i.e., small, medium, and large). The classification accuracy for the sand types is \(92.28\%\), with an \(87.60\%\) recall, a \(95.56\%\) precision, and a \(90.59\%\) F1 score, indicating that LaserShoes can identify the same type of surface with different fine-grained surface geometries.
6.2.3 Different lighting conditions.
Lighting conditions may affect the quality of speckle images and thus the ground surface detection performance. To test the robustness of LaserShoes against this factor, we collected data in five different lighting conditions. These conditions included two for the 15 indoor surfaces and three for the 9 outdoor surfaces, and are listed as follows:
•
Indoor-with-light: lamps (cold light source) on in a room.
•
Indoor-without-light: lamps off in a room.
•
Outdoor-at-daytime: strong sunlight outdoors during daytime.
•
Outdoor-at-dusk: little sunlight outdoors at dusk.
•
Outdoor-at-night: no sunlight, with streetlamps on, outdoors at night.
We trained five classification models, each using the data collected under one lighting condition. Table 6 shows the average surface classification accuracies for the five lighting conditions. The results demonstrate that, with the exception of the outdoor-at-daytime condition, the classification accuracy for all conditions was above 87%. This indicates the robustness of LaserShoes, except under lighting conditions with strong ambient light, which require further improvement.
8 Discussion
8.1 Efficacy of Data Pre-processing
Although machine learning models are somewhat resilient to noisy data points, noisy inputs require more computational power during inference. To alleviate this burden, a denoising process is commonly performed before feeding data into machine learning models [68, 69]. In our case, if we did not remove blurry images, the time consumed by inference would be large, which runs counter to our goal of real-time prediction. Even if we only extracted one image by cropping each raw frame and performed no further data pre-processing, 90 images from each video session would be fed into the classification model. In contrast, the average number of images fed into the classification model after data pre-processing is 11, indicating that our data pre-processing significantly reduces computation costs during inference.
Further, to evaluate the influence of the data pre-processing step in terms of ground surface classification performance, we conducted experiments on data collected from one of our participants. The experiment procedure is the same as our main study except that we replaced the pre-processing part with cropping one 256 × 256 image from each frame. For the classification model trained with raw data, the recall, precision, and F1 are \(64.25\%\), \(67.22\%\), and \(60.60\%\), respectively. For the classification model trained with data after pre-processing, the recall, precision, and F1 are \(88.45\%\), \(88.05\%\), and \(87.60\%\), respectively. Therefore, conducting our data pre-processing step can achieve better performance compared to using raw data.
8.2 Avoiding Overfitting
Overfitting is a common issue in deep learning applications, especially when the number of training samples is small. To prevent the deep model in our system from overfitting, common techniques including data augmentation and normalization were applied during training. Besides, as described in Section 5.1.2, cropping a raw speckle image into multiple smaller input images helps increase the number of training samples. Moreover, we set the number of training epochs to 150 after experiments on a validation set showed that the training loss converged while the validation loss did not degrade at around 150 epochs. The evaluation results with high classification accuracies, especially those from the cross-user study, demonstrate effective mitigation of overfitting.
8.3 Power Consumption
There are three main parts that consume power in our system: the laser emitter with a switch module (51.3 mW), the image sensor (1047.9 mW), and the Raspberry Pi (2643.6 mW). LaserShoes has a relatively high total consumption, which prevents it from being used continuously for a long time without battery exchange. In the future, to reduce the power consumption of the Raspberry Pi, the collected data could be transferred to a cloud server via low-power wireless communication. We could also design a custom circuit that reduces power consumption by removing unused components and using a low-power MCU and communication modules. Besides, the current image sensor captures images of 1280 × 720 pixels for efficient data collection; in live classification, the input images need not be that large and could be taken by smaller image sensors to save power.
8.4 Sensing Surfaces Ahead for Early Alerts
Since LaserShoes uses images captured when a user’s foot is in contact with the ground, our system in its current implementation cannot predict ground surface conditions in advance, limiting use scenarios such as alerts about dangerous surface conditions. To achieve this, LaserShoes would need to leverage in-flight images. To mitigate the motion effect, we could add an IMU sensor to measure motion speed and implement deblurring methods [9, 17]. Image sensors with a short exposure time could also help obtain clear images while the user’s foot is moving in the air. Second, we could tilt the device up to sense ground surfaces in front of a user for early alerts (Fig. 15 (a)). We performed a test to see whether our sensing system could still function with the device tilted up, pointing to the front of a shoe. Results indicate discernible speckle patterns up to 45 degrees for the three types of surfaces we tested (Fig. 15 (b)). However, it merits future research to investigate how this sensor configuration could work in real use cases powered by real-time signal processing and classification.
8.5 Loose or Transparent Ground Surfaces
In practice, we discover that LaserShoes cannot capture frames with high-quality speckle patterns on loose ground surfaces such as grass, owing to insufficient reflected light intensity. We suspect that grass surfaces diffuse or absorb most of the laser energy due to their layered surface micro geometries. Besides, on transparent ground surfaces such as glass (Fig. 15 (b)), the reflected laser is also weakened. Though speckle patterns can still form on transparent ground surfaces, information about the textured surface underneath the transparent layer is much diluted, resulting in less discernible speckle patterns than those induced on surfaces without the transparent layer.
8.6 LaserShoes under Intense Ambient Light
When the ambient light is too intense, the image sensor receives too much ambient light, which lowers the signal-to-noise ratio (SNR). As a result, the speckle patterns became blurry or invisible under some outdoor conditions in our study. To mitigate this issue, future systems could leverage optical filters. Given that laser light is polarized and has a narrow frequency band, we could place a polarizer or a band-pass filter between ground surfaces and the image sensor. These filters would make the laser the dominant signal on captured laser speckle images, preserving sufficient SNR for classification. Another tactic to preserve SNR is synchronous detection, with the image sensor and the laser in sync. Specifically, we could leverage high-speed image sensors to take two consecutive photos, one with the laser on and one with it off. Subtracting one photo from the other should remove most of the effect of ambient light, which is relatively constant across the two exposures.
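As a rough sketch of this synchronous-detection idea (assuming perfectly alternating laser-on/laser-off exposures, which our current hardware does not provide):

```python
import numpy as np

def ambient_rejected(frame_on, frame_off):
    """Subtract a laser-off frame from a laser-on frame; roughly constant
    ambient light cancels, leaving mostly the laser speckle signal."""
    diff = frame_on.astype(np.int16) - frame_off.astype(np.int16)
    return np.clip(diff, 0, 255).astype(np.uint8)
```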
8.7 Form Factor Optimization
Our current implementation is relatively bulky. Furthermore, different image sensor heights, which are affected by shoe styles and foot postures, reduce ground detection accuracy, as discussed in Section 6.1.4. In the future, LaserShoes could be replicated with better form factor designs.
One possible solution is to make the height of the image sensor consistent across shoe styles by adding a height-adjustable mechanical module, as shown in Fig. 16 (a). This module could also mitigate variances introduced by foot posture by asking users to calibrate and adjust LaserShoes before use.
Since the diode of a laser emitter and the chip of an image sensor are both very small, they could be combined into a single integrated component that might be sufficiently thin to integrate into a smart sole under a shoe, as shown in Fig. 16 (b). In this case, the sensing distance would be short and consistent, and the sensor would be isolated from ambient light when the sole is in contact with ground surfaces, all of which could result in an improved SNR and thus higher classification accuracies.