1. Introduction
Transportation is an essential part of the tourism system and serves as the basis for the movement of tourists between the origin, destination, and different attractions to engage in recreational and tourism activities. In tourism cities, metros, buses, and taxis are the popular public transportation services for tourists. Of these, taxis only account for a small proportion of public transport travel due to the promotion of green and shared travel in recent years and the restrictions on taxi reservations. However, taxis are still the most favored transport mode for tourists owing to their convenience, quickness, and “point-to-point” accessibility [
1,
2]. Unlike the fixed routes and stops of buses and subways, the pick-up and drop-off locations in taxi trajectory records are highly related to human activities [
3]. Therefore, analyzing the spatial and temporal characteristics of taxi travel and their relationship with the configuration of tourist attractions and tourist supporting elements (i.e., geographic contextual factors in urban geography) is important for shaping tourism perception and for the sustainable development of urban tourism systems.
In the past few years, most researchers have used questionnaires to collect tourism data, such as mode of transportation [
4,
5], attraction choices, and tourism satisfaction. The sampling of tourist movements [
6,
7] is often conducted using satellite and Wi-Fi positioning technologies. Most of these studies have analyzed preferences, impressions, perceptions, and the distribution of tourists in a given area and at a given time to guide tourism product development and tourist flow management. However, this approach makes it difficult to provide timely feedback on tourist dynamics and can be costly when applied on a large scale. In recent years, digital footprints have become a widely used method in tourism research. From tourism web portals and social media platforms, data such as travelogues and photos of tourists can be collected [
8,
9,
10]. Such data have helped make the spatial characterization of tourism flows more precise and personalized, but they also limit large-scale analysis. Moreover, the mobility that underpins urban tourism activity is often neglected in these studies, which makes it difficult to extract the spatial patterns of tourism traffic from web data.
A growing number of studies have attempted to use taxi trajectory data to analyze the operational characteristics of urban transport [
11,
12,
13,
14,
15,
16,
17,
18], traffic state identification [
19,
20], traffic flow parameter calculations [
21,
22], optimal route selection [
23], and daily travel characteristics and patterns research [
16,
18]. For example, such data have been used to analyze travel hotspots according to passenger pick-up and drop-off locations and their relationships with land use [
15,
24] and explain the functional structure of cities [
13,
15,
25,
26]. Most of the literature focuses only on the behavior of urban residents on workdays. The spatial and temporal characteristics of tourists’ taxi travel and their relationship with the organization of tourist attractions have received little attention [
27]. Although taxi trajectory data have the advantage of broad coverage and dynamic characteristics, they have not been effectively used in the study of urban tourism traffic patterns and the factors affecting them. Furthermore, modeling the flow status in tourism is critical to understanding the linkages between attractions within a destination and the entire tourism system. It can explain how tourism systems are shaped and reconfigured [
3,
11,
28,
29]. Intra-city tourist flows are strongly integrated with the transportation network. However, transport research and tourism research have largely remained compartmentalized. Few studies have focused on the dynamics of urban tourism transport in a destination from the perspective of tourists.
Although good progress has been made in previous studies of daily taxi travel, few have explored the behavior and structural characteristics of taxi travel during the peak tourism period. This study aims to fill these gaps, taking taxi trajectories in Shenzhen as a case study. In China, ‘May Day’ (a.k.a. International Labor Day) is one of the traditional holidays and the preferred date for tourism in the first half of the year. During this period, Shenzhen, as a coastal tourist city, receives numerous tourists. To investigate the spatial and temporal characteristics of tourists’ travel by taxi, we built a dataset of taxi trajectories recorded during ‘May Day’. We explored the taxi trajectories from two perspectives: trip origins and travel networks of attractions (i.e., building a travel network for each attraction). This study makes two major contributions to the literature. First, unlike previous studies on tourist visitation patterns at the scenic scale and on the characteristics of tourism network structures at the regional scale, this paper focuses on intra-city tourism. We use taxi data to characterize intra-city tourism flows and the structure of attraction networks, which extends the exploration of complex urban tourism flows. Second, while most of our previous knowledge of tourism flows comes from manual surveys and panel data, this study uses taxi data to provide a bottom-up, objective perspective that reveals the geographical relevance of tourist trips and the differences in intra-city tourism network structure and spatial attractiveness.
The remainder of this paper proceeds as follows.
Section 2 describes the study area, the methodology, and the key algorithms used in this paper, including kernel density estimation (KDE), Getis-Ord Gi*, geographically weighted regression (GWR), and complex network metrics.
Section 3 presents the experimental results.
Section 4 discusses the findings, and the final section concludes this paper.
2. Data and Methods
2.1. Study Area and Dataset
Shenzhen is a coastal city in southern China with a subtropical maritime climate and is a famous tourist city. Owing to its unique geographical location near Hong Kong and Macau, it attracts many domestic and foreign tourists every year. The annual report of Shenzhen tourism statistics shows that in 2015, tourist accommodation facilities received 53.752 million overnight visitors throughout the year, an increase of 7.7% as compared to the previous year. Among them, overseas tourists accounted for 22.67%, and domestic tourists accounted for 77.33% (
http://wtl.sz.gov.cn/, accessed on 26 June 2021). In the same year, taxi travel accounted for 10.5% of public transport travel in Shenzhen. To investigate the characteristics of tourism travel by taxi, we selected the top 26 attractions ranked by tourists on Ctrip.com (a popular Chinese travel booking and travel diary sharing website) and the taxi trips to these 26 attractions from May 1 to 3, 2015. These attractions are shown in
Figure 1 and listed in
Table 1. As can be seen in
Figure 1, taxi trip origins are mainly distributed in the three districts of Nanshan, Futian, and Luohu, where there are more tourist attractions.
The trajectory data were collected by 16,828 GNSS (Global Navigation Satellite System)-equipped taxis operating in Shenzhen, with an average sampling interval of 30 s. In total, there are 69.16 million GNSS records. Each record includes the taxi’s identification, coordinates (i.e., latitude and longitude), instantaneous speed, time, and occupancy state (carrying passengers or not). To explore the relationship between tourist trip origins and geographic contextual factors, this study also used POI (Point of Interest) data and road network data. The POI data were crawled from the open API of Gaode Maps (
https://lbs.amap.com/api/webservice/guide/api/search, accessed on 26 June 2021), with over 1.7 million records as of the end of September 2018 [
30], and each POI record includes attributes such as name, address, type, longitude, and latitude. The POI data are divided into 11 types: catering services (CS); corporate enterprise services (CES); shopping services (SS); transportation facilities (TF); finance and insurance services (FIS); science, education and culture services (SECS); residential housing (RH); living services (LS); sports and leisure services (SLS); health care services (HCS); and accommodation services (AS).
2.2. Methods
The spatial structure of urban tourist trips contains three components: trip origins, tourist attractions, and travel networks. To analyze the spatiotemporal characteristics of tourist trips from both supply and demand perspectives, we first processed the taxi trajectories to extract the tourist trips. Second, we analyzed the aggregation trends and spatial dependence of tourist trip origins and their correlations with geographic contextual factors by using KDE, Getis-Ord Gi*, and GWR. Third, we established tourist travel networks and analyzed their structure and characteristics using complex network metrics to explore how the tourist network of each attraction forms. The methodology is accordingly divided into three parts: building a tourist trip dataset, analyzing the spatial aggregation of tourist trip origins and their spatial dependence on geographic contextual factors, and quantitatively analyzing the travel network structure of each attraction. All data analysis was conducted on a Dell Tower 7810 server with an Intel Xeon CPU and 32 GB of RAM. Taxi trajectory data were pre-processed using Python. Tourist trip data were mapped and spatially analyzed using ArcGIS.
The methodological framework is illustrated in
Figure 2.
Step 1: Building tourist trip dataset.
The original collected taxi trajectories are disorganized, and tourist trips need to be extracted for subsequent tasks.
(1) Extraction of taxi trips.
We first identified and removed trajectory records with large latitude and longitude jumps and speed anomalies. Next, we used a map-matching algorithm called ST-Matching [
31] to align all trajectory points between the passenger pick-up and drop-off locations (identified from changes in the occupancy state) with the road network.
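As a minimal sketch of the cleaning step described above, the following Python function drops records with anomalous reported speeds and records whose position implies an implausible jump from the previous retained point. The record layout (`t`, `lat`, `lon`, `speed`), the function names, and the 120 km/h threshold are illustrative assumptions and are not taken from the study; the map-matching step is not shown.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def clean_track(records, max_speed_kmh=120.0):
    """Filter one taxi's records, assumed to be sorted by time.

    Each record is a dict with keys t (seconds), lat, lon, speed (km/h).
    The threshold is a hypothetical value chosen for illustration.
    """
    kept = []
    for rec in records:
        # Speed anomaly: reported speed is negative or above the threshold.
        if rec["speed"] < 0 or rec["speed"] > max_speed_kmh:
            continue
        if kept:
            prev = kept[-1]
            dt = rec["t"] - prev["t"]
            if dt <= 0:
                continue  # duplicate or out-of-order timestamp
            # Position jump: speed implied by the displacement is implausible.
            implied_kmh = haversine_m(prev["lat"], prev["lon"], rec["lat"], rec["lon"]) / dt * 3.6
            if implied_kmh > max_speed_kmh:
                continue
        kept.append(rec)
    return kept
```

Comparing each record with the last retained record, rather than with its immediate predecessor, prevents a single outlier from also causing the next valid point to be discarded.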
(2) Selecting taxi trips for tourism.
The taxi trip data are divided into two types of trips: tourist trips and residential trips (i.e., trips for other activities of local residents). To extract tourist trips, we first delineated the tourist drop-off areas of the 26 tourist attractions. Considering the randomness of taxi drop-off locations and the influence of satellite positioning accuracy, we repeatedly compared and corrected the boundaries of the potential drop-off area near the entrance of each attraction with the help of Google satellite images and Baidu Street View (
https://map.baidu.com/, accessed on 26 June 2021) to ensure the reliability of the extracted tourist trips. If the taxi drop-off location is in the potential drop-off area of a tourist attraction, this trip is considered to be a tourist trip. The final tourist trip dataset contained 37,878 records.
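The selection rule can be illustrated with a short point-in-polygon test. The sketch below assumes that each potential drop-off area is stored as a simple polygon of (longitude, latitude) vertices and that each trip carries a `drop_off` coordinate pair; these names and the data layout are hypothetical, and the study itself performed this step with manually delineated areas in a GIS environment.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside the polygon.

    polygon is a list of (lon, lat) vertices of a simple polygon.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def label_tourist_trips(trips, drop_off_areas):
    """Keep trips whose drop-off point falls in any attraction's area.

    trips: list of dicts with a "drop_off" key holding (lon, lat).
    drop_off_areas: dict mapping attraction name -> polygon.
    """
    tourist_trips = []
    for trip in trips:
        lon, lat = trip["drop_off"]
        for name, polygon in drop_off_areas.items():
            if point_in_polygon(lon, lat, polygon):
                tourist_trips.append({**trip, "attraction": name})
                break  # a trip is assigned to at most one attraction
    return tourist_trips
```

Trips that match no drop-off area are simply left out of the result, which corresponds to treating them as residential trips.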
Step 2: Spatial aggregation of tourist trip origins and spatial dependence on geographic contextual factors.
The taxi travel network comprises trip origins, trip routes, and trip destinations. Of these, trip origins are commonly used in trajectory-based urban travel studies for predicting trip generation rates and trip distribution, as well as for analyzing traffic impacts, associations, and driving factors. In this step, we focus on the aggregation trends of tourist taxi trips, their spatial dependence, and the influence of geographic contextual factors.
(1) Aggregation trends and spatial dependencies of tourist trip origins.
We used kernel density estimation (KDE) (
https://desktop.arcgis.com/en/arcmap/10.3/tools/spatial-analyst-toolbox/kernel-density.htm, accessed on 26 June 2021) [
32] to estimate the aggregation trends of trip origins. KDE is a nonparametric method for estimating surface density: from the input point dataset, it computes the degree of aggregation across the entire region and produces a continuous density surface. A larger kernel density value indicates a stronger concentration, i.e., more tourists traveling from this location.
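To make the density surface concrete, the following sketch evaluates a kernel density at a single location using a quartic kernel, a common choice for point-based kernel density tools. It assumes planar coordinates in metres (i.e., already projected) and an unweighted point set; the function name and the search radius are illustrative, and the bandwidth used in the study is not stated here.

```python
import math


def quartic_kde(x, y, points, radius):
    """Kernel density at (x, y) from unweighted points.

    Uses the quartic kernel K(u) = (3 / pi) * (1 - u^2)^2 for u < 1,
    where u = distance / radius. Coordinates and radius share one
    planar unit (e.g., metres); the result is in points per unit area.
    """
    total = 0.0
    r2 = radius * radius
    for px, py in points:
        d2 = (x - px) ** 2 + (y - py) ** 2
        if d2 < r2:  # only points inside the search radius contribute
            total += (3.0 / math.pi) * (1.0 - d2 / r2) ** 2
    return total / r2
```

Evaluating this function on a regular grid of cell centres yields the continuous surface described above; cells near many trip origins receive larger values.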
Next, we used the Getis-Ord Gi* (
https://pro.arcgis.com/en/pro-app/latest/tool-reference/spatial-statistics/h-how-hot-spot-analysis-getis-ord-gi-spatial-stati.htm, accessed on 26 June 2021) [
33] algorithm to explore the local spatial dependence of the tourist trip origins and to determine the hot or cold regions for tourism travel. The Gi* statistic is based on the ratio of the sum of observations at locations within a given distance of the target location (including the target itself) to the sum of observations at all locations. It is used to identify whether there is a dependence between the target location and the surrounding locations in terms of high and low values. The Gi* statistic returns a z-score for each element in the dataset. For positive z-scores, the higher the z-score, the tighter the spatial dependence for higher values. For negative z-scores, the lower the z-score, the tighter the spatial dependence for lower values. Thus, the Gi* statistic can identify significant hot spots (high-value spatial dependence) and cold spots (low-value spatial dependence).
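The z-score form of the statistic can be sketched as follows. The example assumes binary weights within a fixed distance band (every location within the band, including the target itself, receives weight 1), which is only one of several possible conceptualizations of spatial relationships; the function name and the band value are illustrative and do not reflect the settings used in the study.

```python
import math


def getis_ord_gi_star(coords, values, band):
    """Return the Gi* z-score of every location.

    coords: list of (x, y) planar coordinates.
    values: observation at each location (e.g., trip-origin counts).
    band:   distance threshold defining the neighbourhood (assumed).
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum(v * v for v in values) / n - mean * mean
    s = math.sqrt(max(variance, 0.0))
    z_scores = []
    for xi, yi in coords:
        # Binary weights; the target location is its own neighbour.
        w = [1.0 if math.hypot(xi - xj, yi - yj) <= band else 0.0 for xj, yj in coords]
        sw = sum(w)
        sw2 = sum(wj * wj for wj in w)
        swx = sum(wj * v for wj, v in zip(w, values))
        denom = s * math.sqrt((n * sw2 - sw * sw) / (n - 1))
        # A zero denominator means no variation or a band covering everything.
        z_scores.append((swx - mean * sw) / denom if denom > 0 else 0.0)
    return z_scores
```

Large positive scores mark locations surrounded by high values (hot spots), and large negative scores mark locations surrounded by low values (cold spots).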
(2) Correlation between geographic contextual factors and trip origins.
To identify the factors correlated with trip origins, we created buffers with radii of 20, 50, 100, 200, and 300 m at each trip origin and counted the number of POIs of each type within each buffer. We used the counts of the 11 POI types as candidate explanatory variables and the kernel density value at the trip origin as the dependent variable to build a geographically weighted regression model, which was used to test the validity of the explanatory variables. Geographically weighted regression (GWR) (
https://pro.arcgis.com/en/pro-app/latest/tool-reference/spatial-statistics/geographically-weighted-regression.htm, accessed on 26 June 2021) [
34] introduces geographic location into the model parameters and uses locally weighted least squares for parameter estimation. The regression coefficients therefore vary with spatial location, so the model can better reveal the spatial heterogeneity of geographic elements.
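The idea of locally weighted least squares can be illustrated with a deliberately simplified case: one explanatory variable plus an intercept, with Gaussian distance-decay weights. The kernel, the fixed bandwidth, and the function name are assumptions made for this sketch; the model in the study uses several POI variables and was fitted with GIS software, whose kernel and bandwidth selection may differ.

```python
import math


def gwr_local_fit(target, coords, x, y, bandwidth):
    """Fit y = b0 + b1 * x by weighted least squares around `target`.

    target:    (x, y) planar location at which coefficients are estimated.
    coords:    planar location of each observation.
    x, y:      explanatory and dependent values of each observation.
    bandwidth: Gaussian kernel bandwidth in coordinate units (assumed).
    Returns (intercept, slope) for this location.
    """
    sw = swx = swy = swxx = swxy = 0.0
    for (cx, cy), xi, yi in zip(coords, x, y):
        d2 = (cx - target[0]) ** 2 + (cy - target[1]) ** 2
        w = math.exp(-0.5 * d2 / bandwidth ** 2)  # nearer observations weigh more
        sw += w
        swx += w * xi
        swy += w * yi
        swxx += w * xi * xi
        swxy += w * xi * yi
    det = sw * swxx - swx * swx
    if abs(det) < 1e-12:
        raise ValueError("local design matrix is singular at this location")
    slope = (sw * swxy - swx * swy) / det
    intercept = (swy - slope * swx) / sw
    return intercept, slope
```

Calling the function once per trip origin gives a set of location-specific coefficients; mapping the slopes shows where a given POI type is more or less strongly associated with trip density.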
Step 3: Quantitative analysis of tourists’ travel networks.
Finally, a travel network was created for each of the 26 tourist attractions based on the tourist trip data. The nodes of a travel network comprise the trip origins, the road intersections through which travel routes pass, and the target tourist attraction. The edges of a travel network are composed of sections between road intersections. The resulting travel network integrates discrete trip origins and tourist attractions into a holistic system that can represent the spatial range of attractiveness and services of each attraction. To compare the structural differences in travel networks, complex network metrics such as average degree, network diameter, average path length, and average clustering coefficient were used. The specific details of each metric are not presented here and can be found in the relevant literature [
35].
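For readers unfamiliar with these metrics, the sketch below computes them for a small graph using only the Python standard library. It assumes an undirected, unweighted network given as a list of node pairs, and it averages path lengths over connected node pairs only; whether the travel networks in the study were treated as directed or weighted is not specified, so these are simplifying assumptions.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def bfs_distances(graph, source):
    """Hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def network_metrics(graph):
    """Average degree, diameter, average path length, average clustering."""
    n = len(graph)
    avg_degree = sum(len(nbs) for nbs in graph.values()) / n
    lengths = []
    for source in graph:
        for target, d in bfs_distances(graph, source).items():
            if target != source:
                lengths.append(d)
    diameter = max(lengths) if lengths else 0
    avg_path = sum(lengths) / len(lengths) if lengths else 0.0
    coeffs = []
    for node, nbs in graph.items():
        k = len(nbs)
        if k < 2:
            coeffs.append(0.0)  # convention: undefined cases count as 0
            continue
        # Each link between two neighbours is seen twice, hence the division.
        links = sum(1 for a in nbs for b in graph[a] if b in nbs) / 2
        coeffs.append(links / (k * (k - 1) / 2))
    return {
        "average_degree": avg_degree,
        "diameter": diameter,
        "average_path_length": avg_path,
        "average_clustering": sum(coeffs) / n,
    }
```

Computing the same dictionary for each attraction's network gives directly comparable figures, for example a larger diameter for attractions that draw trips from more distant origins.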
4. Discussion
(1) Analysis of tourism travel models.
For the distribution model, there are various models to describe the distribution patterns of travel distance and travel time in trajectory data, such as power law distribution, exponential distribution, exponentially truncated power law distribution, lognormal distribution, and gamma distribution. Brockmann [
36] observed that human travel distances show a power law distribution. Yan [
37] argued that the mode of transportation affects the aggregated travel patterns, and the displacement from a single mode traffic should follow an exponential distribution rather than a power law. Liang [
38] argued that the displacement of taxi passenger trips follows an exponential decay. Zhang explored urban mobility in Harbin, China, and found that travel distances follow a log-normal distribution [
39]. Veloso found that the gamma distribution can describe the travel distance of taxis [
40]. These studies show that the travel time and distance patterns contained in trajectory data are difficult to represent with a single model across different datasets and study areas. In this study, through the modeling and comparative analysis of travel distances, we found that the GEV model better represents the characteristics of tourist trips on ‘May Day’, as it can describe both the rising and falling parts of the trip distance distribution. In line with the patterns derived from other data, all data follow a long-tailed distribution, which represents a decreasing volume of traffic over long distances. However, the GEV fits the rising part of the distribution better. One possible reason is that the layout of tourism resources, combined with weather factors, leads to a need for more comfortable transportation when visiting nearby attractions. This pattern reflects tourists’ preference for taxi travel and could help in planning an efficient and effective transport system that facilitates the turnover of tourists between multiple attractions. Moreover, it is expected to guide tourists in making informed decisions about transportation services when visiting multiple attractions.
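For reference, and assuming that GEV here denotes the generalized extreme value distribution, its cumulative distribution function in the standard parameterization is given below. The symbols (location μ, scale σ, shape ξ) follow common usage and are not taken from the study, which does not report its parameterization in this section.

```latex
F(x;\mu,\sigma,\xi) =
\begin{cases}
\exp\!\left\{-\left[1+\xi\,\dfrac{x-\mu}{\sigma}\right]^{-1/\xi}\right\},
  & \xi \neq 0,\quad 1+\xi\,\dfrac{x-\mu}{\sigma} > 0,\\[2ex]
\exp\!\left\{-\exp\!\left(-\dfrac{x-\mu}{\sigma}\right)\right\},
  & \xi = 0,
\end{cases}
\qquad \sigma > 0 .
```

The corresponding density is unimodal, rising to a peak and then decaying, and for ξ > 0 the right tail is heavy. This shape is what allows a single curve to capture both a rising part and a long tail, in contrast to monotonically decreasing models such as the exponential distribution.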
For travel mobility models, most of the previous literature has given flow patterns between destinations [
41]. In contrast, few studies have been devoted to modelling intra-destination flows. It is therefore important to clarify the patterns of intra-destination flows [
42], particularly the characteristics of city-scale tourism flows. In many countries, taxis are the preferred mode of travel for many trips, especially for individuals conducting business and tourism. Existing studies of travel behavior using trajectory data focus on the transportation characteristics of commuters in general. In this study, we analyzed the structural characteristics of taxi travel networks between intra-city attractions. McKercher and Lew [
43] identified four mobility patterns for tourists: single destination with or without side trips, transit leg and circle tour, circle tour with or without multiple access, and hub-and-spoke style. These patterns can be used to guide tourism product development. However, they are difficult to adapt to the needs of transportation organization and coordinated development planning among multiple attractions. We used taxi origin-destination (OD) data and tourist volume to establish a flow network between multiple attractions. The results reported in this study show that taxi trip data can reveal how tourism resources are used spatially, which can guide tourism product development as well as tourism route organization and planning.
For the modeling of impact factors, a lot of meaningful work has been done. Urban taxi travel is closely related to geographic location, particularly sociodemographic distribution and built environment characteristics [
1]. Compared to previous studies, this study focuses on the impact of the built environment. Because it is difficult to obtain taxi trajectory data and the demographic characteristics of tourists at the same time, we built a geographically weighted regression model with trip density as the dependent variable and POI counts within the buffer as explanatory variables to help explain the spatial imbalance of tourist trips. Since it is difficult to determine the range over which environmental factors influence travel, we conducted a buffer zone analysis at seven different scales to build models that may accommodate more explanatory variables. Our experiments show that the variables that can be modeled to explain trip occurrence differ across scales. At 100 m, the associated influence of the various POI types is more effectively expressed.
(2) Implications for tourism transportation planning.
This study analyzed the characteristics of urban tourist trips from a spatiotemporal and network perspective. The results show that the morphological structure of cities and the uneven distribution of tourism resources are among the main reasons for the hot and cold distribution of tourist trips. In such a scenario, reliance on a single mode of transport affects tourists’ willingness to travel and their impression of urban tourism. Moreover, although taxis are a fast means of transport, drivers’ need to make a profit affects the spatial distribution of their search for passengers, as they prefer to go to places with a high population density. This also generates competitive pressure to travel. For urban tourism resources to be favored by the public, the development of time-saving, long-distance transportation modes is a necessary part of a sustainable urban tourism system. Taking Shenzhen as an example, the data collection year for this study was 2015, when there were five metro lines and a lack of metro lines in the northern and eastern parts of the city. When taxis are not adequately distributed, the public has to rely on buses, which may lead to increased travel time and reduced willingness to travel. The Shenzhen government is also working to improve transport conditions; although the main goal is to serve the needs of daily commuting and mobility, this also benefits the city’s tourism industry. As of 2020, Shenzhen had 11 metro lines, and travel conditions in the east and north had been significantly improved. Moreover, since 2016, Shenzhen has developed shared cars that can be hailed via smartphones and picked up at the departure point on time. These modes of transportation facilitate long-distance tourism travel and help offset the imbalance in the spatial distribution of taxicabs carrying tourists. Overall, taxis are one of the most important parts of urban tourism transport systems. However, to achieve sustainable urban tourism, opening long-distance metro lines and increasing car sharing will help to meet tourism travel demand in suburban areas and improve the equity of urban tourism, the image of urban tourism, and tourist satisfaction.
(3) Limitations.
There are various modes of transport that can be used for urban tourism, such as metros, buses, and taxis. Recently, car sharing has gradually emerged as a new transport mode for the public. This study focuses only on taxi travel during a single time period, ‘May Day’, and is therefore limited in the comprehensiveness of the transport modes covered. However, the related data analysis methods are applicable to other transport modes, and the results are reliable when compared to traditional manual tourist surveys, especially for transport modes such as taxis, for which on-site survey data are difficult to collect. Another limitation is the method used to extract tourist trips. Given the random nature of taxi drop-off locations, it is difficult to define a precise area at the entrance of a tourist attraction from which reliable trips can be extracted. Nevertheless, we tried to ensure the quality of the data and the accuracy of the drop-off areas, for example, by setting different drop-off areas for different attraction entrances and road layouts, and by combining Baidu Street View and Google satellite images to correct the drop-off areas to account for the congestion of tourist traffic during ‘May Day’.
5. Conclusions and Future Work
In this study, we investigated the spatial distribution characteristics of tourist trip origins and their correlation with geographic contextual factors, as well as the structural characteristics of tourist travel networks. First, we used the KDE algorithm to analyze the spatial aggregation characteristics of tourist trip origins. The results show that tourist trips are concentrated in areas with a high density of tourist attractions and urban entry/exit ports. Second, we examined the spatial dependence of tourist trips using Getis-Ord Gi* and found that urban spatial structure, morphological characteristics, and the distribution of tourist resources can affect tourist taxi trips. Third, we explored the correlations between the tourist trip origins and urban geographic contextual factors using the GWR model. The results revealed significant differences in the correlations between tourist trips and the factors. Finally, we constructed travel networks and quantified and compared them using complex network metrics. We also found other interesting insights that are either consistent or inconsistent with some preconceived ideas and related research. First, the coverage of the tourism network and the volume of tourist trips show similar trends. Furthermore, for attractions with high coverage, the peak in tourist volumes occurs on the second day of the tourism period, whereas attractions in the middle of the coverage rankings show a downward trend in tourist volumes. Second, the spatial interaction intensity between urban tourist attractions has two structural characteristics: grouping and hierarchy. However, the groups are not evenly distributed spatially. This is one of the reasons why the hot and cold spots of tourist trips differ so markedly between the north and south of Shenzhen.
Compared with other public transport data, taxi GNSS records have higher location and timestamp accuracy, which can reliably reveal people’s movement patterns. Here, we have only analyzed tourist travel characteristics during ‘May Day’. In the future, we aim to obtain data for the same period over several years and carry out a comparative analysis covering other tourist seasons in China, such as National Day. Moreover, we will focus on the environmental semantic features of traffic around tourist attractions to assist in tourism product development and tourist flow management.