1 Introduction

The maturity of e-commerce has fundamentally changed the searching and purchasing behavior of consumers and has enabled them to compare products before purchasing (Bronnenberg et al., 2016). For instance, when purchasing a digital camera online, consumers engage on average in browsing 6.4 products from 3 companies (ibid), while when booking hotels online, they browse on average 2.3 companies, with large heterogeneity across groups (Chen & Yao, 2017). Prior studies have explored the relationship between purchasing intention and browsing behavior, such as depth (e.g., Johnson et al., 2004), frequency (e.g., Moe & Fader, 2004a, b), and duration of browsing (e.g., Sismeiro & Bucklin, 2004). We extend this line of research by considering consumers’ browsing-path dependence.

Browsing-path dependence refers to how a consumer’s browsing behavior at one company affects this consumer’s behavior at another. Considering browsing-path dependence can reduce the deviation between the inferred purchasing intention and the actual behavior, which supports a better understanding of consumers and informs service response strategies. Consumer service is defined as a useful, responsive service that responds quickly to consumer inquiries and returns or complaints during or after a purchase (Holloway & Beatty, 2008) and is one of the dimensions of e-service quality measurement (Behera et al., 2021; Wang et al., 2004). We specifically focus on service response during purchasing (Bhatnagar et al., 2017), both because of its effect on consumer satisfaction, which affects purchasing behavior (Udo et al., 2010), and because prior studies have explored the topic only from a post-purchase consumer review perspective (e.g., Proserpio & Zervas, 2017).

Compared to the attempts in the extant literature, the browsing path dependence could demonstrate more accurately the real-time purchasing intentions of consumers, thereby enabling decision support for e-commerce platforms to formulate dynamic service response strategies. To study the browsing path dependence of consumers, we use clickstream data that showcase their browsing behavior in a time-variant and objective way (Balan & Mathew, 2021; Bucklin & Sismeiro, 2009; Kim et al., 2011). Clickstream data have recently become a rich source of information for both researchers and practitioners to unearth the potential interests and preferences of heterogeneous consumers. We use a dataset from one of the largest e-commerce platforms in Asia capturing the time-variant browsing behavior of consumers for 11 weeks (77 days). We separately model the browsing path dependencies for overall consumers and heterogeneous consumer groups through duration analysis (Bhatnagar et al., 2017).

Our findings demonstrate that the actual behavior of consumers can be accurately described by considering the dependencies of the sequential browsing paths. We show that the path dependence of consumers decreases with the increase in browsing times. In practice, this means that companies need to provide a timely service response to first-time consumers. We also find that the behavior patterns of heterogeneous consumer groups are quite different, and the degree of such a difference is related to the position that a certain company holds in the market.

The rest of our paper is structured as follows. In the next section, we provide an overview of the literature. After this, the dataset and methodology of the study are discussed. The following section describes the results of our study. Next, these results are discussed, and their implications are also considered. Finally, a brief conclusion is provided along with limitations, as well as an agenda for future research on the topic.

2 Background

Clickstream data contain inter alia consumers’ i) browsing path, ii) sequential purchase, iii) website entrance, and iv) other associated information (Bhatnagar et al., 2017; Montgomery, 2001). Browsing-path information differs from simple sequential transaction information in that it records the complete trajectory of consumers in the online shopping process (Hui et al., 2009), and it is therefore more likely to explain consumers’ interests and purchasing intentions. Prior studies have explored the purchasing intentions of consumers based on the characteristics of their online browsing behavior. For instance, Moe and Fader (2004a, b) explored the relationship between consumers’ purchasing intentions and browsing frequency and showed that those who browse a website more frequently have a higher purchasing intention. Similarly, when consumers’ purchasing process is decomposed, the longer consumers browse a specific product page, the higher their purchasing intention is (e.g., Sismeiro & Bucklin, 2004). Such studies, therefore, provide a foundation for examining clickstream data and consumers’ purchasing intention.

Most studies on online browsing behavior have been conducted at the aggregate level, that is, they have studied the cumulative browsing behavior across all consumers. However, many studies have found that heterogeneous consumer groups have large differences in learning ability and consumption experience, resulting in large differences in browsing behavior (Bhatnagar et al., 2017; Johnson et al., 2003). In addition, Sismeiro and Bucklin (2004) found that the results of analysis using aggregate-level browsing data are often misleading. Therefore, we first use the overall consumer behavior to illustrate the importance of browsing-path-dependent information. We then explain the heterogeneity in consumers’ browsing behavior by using behavioral information of heterogeneous consumer groups.

We argue that browsing-path dependence is critical for explaining consumers’ behavior, especially the purchasing behavior of complex products. For example, a consumer may acquire product and company information from the webpage of company A and then purchase directly from another company B. If sequential browsing information is not integrated into the analysis, then consumers’ behavior is calculated based only on partial information and the results can be biased (Park & Fader, 2004). Prior studies have demonstrated that integrating consumers’ sequential path information can yield a better explanation. For instance, Park and Fader (2004) explained consumers’ behavior on the website of one company based on the browsing-path information associated with frequency and timing when consumers switch to the websites of other companies. Karimi (2021) used information on consumers’ cross-visit behavior between comparison and retailer websites to explain consumers’ holistic behavior.

Prior studies have considered consumers’ browsing-path behavior on multiple websites (e.g., Karimi, 2021; Park, 2017). We argue that it is equally, if not more, important to consider consumers’ browsing-path behavior across multiple companies on the same website, since competition intensity on one platform is greater than that across different websites (Bronnenberg et al., 2016). In addition, explaining consumers’ browsing-path intentions can provide a decision-making reference for timely service responses and enhance companies’ reputation, increasing their competitiveness. Thus, we deploy duration analysis to analyze the browsing-path dependence of related companies and to explain the spell and browsing patterns of heterogeneous consumers.

3 Methodology

3.1 Data

We acquired data from one of the most popular e-commerce platforms in Asia, which sells millions of high-quality products from tens of thousands of companies in 12 product categories, which include: electrical appliances, digital cameras, computers, mobile phones, home furniture, clothes, shoes, luggage, cosmetics, mother and baby products, books, and food. To increase the practical implications of our findings, we adopted the product categorization of the focal e-commerce platform without further interference. We collected information about consumers’ time-variant online browsing behavior for a total of 11 weeks (77 days) from this platform. The dataset contains consumers’ sequential browsing paths and demographics, as well as information about the companies they browsed. The dataset was constructed by merging three databases: i) consumers’ sequential browsing path, covering the time when the browsing behavior occurred and the name of the company being browsed; ii) attributes of the browsed companies, including the name and category the company belongs to; and iii) consumers’ demographic information, including gender, membership level, and age. To ensure confidentiality, all information about consumers and companies is pseudonymized via numerical coding.

Since consumers’ browsing behavior can vary based on the category of product, to verify the effectiveness of path dependence information, we follow Park and Fader (2004) who use consumers’ browsing behavior data in two different categories of products. We present the summary statistics of consumers’ browsing behavior in these two product categories in Table 1. For the convenience of the subsequent presentation, we named these two product categories 1 and 2. According to Table 1, the average number of times consumers repurchase products in category 2 is twice that of category 1, while the number of browses for category 1 products is more than twice that of category 2. Different categories of products have different attribute factors and consumption frequencies, so there is a difference in the number of consumers’ browses and repurchases (e.g., Zuo et al., 2019). Therefore, according to consumers’ browsing and purchasing information, we can see these categories are different.

Table 1 Statistics of consumers' behavior characteristics in two product categories

According to the preliminary statistics of the dataset, there are 66 companies in product category 1 and 127 in category 2. Because each product category contains many companies, we screen the companies so that consumers’ dependence on the browsing paths of related companies can be shown in detail. Our preliminary statistics on product sales reveal that the cumulative volume of the top 10 companies in each product category accounts for about 90% of the entire market share. That is, the remaining companies, roughly the bottom 90% in each product category, have extremely low market shares, and the number of consumers with co-browsing behavior is sparse, making the results insignificant. Subsequently, we randomly combined the top 10 companies in each product category into pairs of two and counted the number of consumers who co-browsed each pair. According to these results, we select the pair of companies with the most co-browsing consumers, and we name these two companies in the first product category A and B, and the two companies in the second product category C and D. In these product categories, the number of consumers with browsing path records for at least one of the two related companies was 2,634 and 1,169, and the number of browsing records was 22,349 and 15,042, respectively. To study the impact of consumers’ current browsing path information about related companies on subsequent browsing behavior, we exclude consumers who did not visit both companies. Table 2 shows, for each category, the number of consumers who browsed each of the two companies separately and the number who browsed both: 1,610 out of 2,634 consumers (61%) in the first category and 785 out of 1,169 consumers (67%) in the second one.
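The pair-selection step can be illustrated with a minimal sketch. It assumes a hypothetical record layout of (consumer, company) tuples and enumerates every pair of candidate companies, which may differ from the random pairing used in the study; the toy records are invented for illustration only.

```python
from itertools import combinations

def count_co_browsers(visits, companies):
    """Count, for every pair of companies, the consumers who browsed both.

    visits: iterable of (consumer_id, company_id) records (hypothetical layout).
    companies: the candidate companies, e.g. the top 10 by sales volume.
    """
    # Collect the set of consumers seen at each candidate company.
    browsers = {c: set() for c in companies}
    for consumer_id, company_id in visits:
        if company_id in browsers:
            browsers[company_id].add(consumer_id)
    # The size of the intersection is the number of co-browsing consumers.
    return {
        (a, b): len(browsers[a] & browsers[b])
        for a, b in combinations(companies, 2)
    }

def best_pair(visits, companies):
    """Return the pair with the most co-browsing consumers and that count."""
    counts = count_co_browsers(visits, companies)
    pair = max(counts, key=counts.get)
    return pair, counts[pair]

# Toy records, invented for illustration only.
toy_visits = [
    (1, "A"), (1, "B"), (2, "A"), (2, "B"), (3, "A"),
    (3, "C"), (4, "B"), (4, "C"), (2, "A"),
]
pair, n = best_pair(toy_visits, ["A", "B", "C"])
print(pair, n)
```

Using sets means that repeated visits by the same consumer to the same company are counted once, which matches the idea of counting consumers rather than browsing records.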

Table 2 Statistics of consumers browsing two related companies

In line with prior studies (Moe & Fader, 2004a, b; Park & Fader, 2004), we adopt the calendar day as a spell, which means that if consumer i browses a specific company multiple times on the same day, that company is coded as browsed once for that calendar day. For example, if consumer i first browses company A three times and then company B twice on a given day, we code this consumer’s browsing behavior as browsing both A and B that day.
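The calendar-day coding rule can be sketched as follows. The record layout (consumer, ISO timestamp, company), the function name, and the dates are assumptions made for illustration; they are not taken from the dataset.

```python
from collections import defaultdict
from datetime import datetime

def code_daily_browsing(records, companies=("A", "B")):
    """Collapse raw clicks into one indicator per consumer, day, and company.

    records: iterable of (consumer_id, timestamp_string, company_id),
             with timestamps in ISO format (an assumed layout).
    Returns {(consumer_id, date): {company: 0 or 1}}.
    """
    coded = defaultdict(lambda: {c: 0 for c in companies})
    for consumer_id, ts, company_id in records:
        if company_id not in companies:
            continue
        day = datetime.fromisoformat(ts).date()
        # Repeated visits on the same day still count as a single "browsed".
        coded[(consumer_id, day)][company_id] = 1
    return dict(coded)

# Toy clickstream: three visits to A and two to B on one day, one visit to A later.
toy_records = [
    ("i", "2020-01-01T09:00:00", "A"),
    ("i", "2020-01-01T09:05:00", "A"),
    ("i", "2020-01-01T09:30:00", "A"),
    ("i", "2020-01-01T10:00:00", "B"),
    ("i", "2020-01-01T10:20:00", "B"),
    ("i", "2020-01-03T08:00:00", "A"),
]
daily = code_daily_browsing(toy_records)
for key, flags in sorted(daily.items()):
    print(key, flags)
```

Days on which the consumer browses neither company do not appear in the output; in the duration model these would be the spells handled through right censoring.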

To study the heterogeneity of consumers’ browsing paths of related companies in the same product category, we classify consumers according to available demographic information including age (3 types), gender (2 types), and membership level (2 levels). Accordingly, consumers can be divided into 12 (3*2*2) types (see Table 3).

Table 3 Consumer characteristics of browsing related companies in product category 1

3.2 Data Analysis

Duration analysis is typically considered between two companies, but it can analyze path dependence among multiple ones (Park & Fader, 2004). We first present a simple bivariate distribution model that assumes consumers’ browsing path among related companies is completely independent. Then we establish a Farlie-Gumbel-Morgenstern family model to show the dependence of the browsing paths (Chintagunta & Haldar, 1998; Park & Fader, 2004).

3.3 Benchmarking Models

Let T be a continuous random variable denoting the time that elapses until an event occurs. A specific value of \({T}_{ijl}\) is represented by \({t}_{ijl}\), in which both \({T}_{ijl}\) and \({t}_{ijl}\) are greater than or equal to 0, where i stands for different consumers, j represents different companies, and l represents different product categories. The possible values of j are 1 or 2, indicating two related companies in the same product category. Assume that the probability density and cumulative distribution function of \({t}_{ijl}\) are \(f({t}_{ijl})\) and \(F({t}_{ijl})\) respectively, where \(F({t}_{ijl})\) is also referred to as the failure function. The probability that the consumer browsing interval exceeds \({t}_{ijl}\), namely the univariate survival function, is defined in formula (1):

$$S\left({t}_{ijl}\right)=P\left({T}_{ijl}>{t}_{ijl}\right)=1-F\left({t}_{ijl}\right),$$
(1)

According to formula (1), the survival function \(S\left({t}_{ijl}\right)\) is the complement of \(F({t}_{ijl})\). Since \(F({t}_{ijl})\) is monotonically increasing, \(S\left({t}_{ijl}\right)\) is monotonically decreasing. Therefore, if consumer i has not browsed company j in product category l until time \({t}_{ijl}\), the instantaneous probability of browsing at that time, namely the hazard function, can be expressed by formula (2):

$$h\left({t}_{ijl}\right)=\frac{f({t}_{ijl})}{S({t}_{ijl})}=\frac{f({t}_{ijl})}{1-F({t}_{ijl})},$$
(2)

When a consumer’s browsing paths among the related companies are assumed to be independent, the joint cumulative distribution function is defined by formula (3):

$${F}_{12}\left({t}_{{1}_{l}},{t}_{{2}_{l}}\right)={F}_{1}\left({t}_{{1}_{l}}\right){F}_{2}\left({t}_{{2}_{l}}\right),$$
(3)

where the function \({F}_{12}\left({t}_{{1}_{l}},{t}_{{2}_{l}}\right)\) represents the joint cumulative distribution function of the browsing paths between companies 1 and 2 in product category l, while \({F}_{1}\left({t}_{{1}_{l}}\right)\) and \({F}_{2}\left({t}_{{2}_{l}}\right)\) represent the respective marginal cumulative distribution functions.

3.4 Proposed Association Models

The most common way to express the correlation between two univariate distribution functions is to combine the two marginal distributions with a third term. For instance, Farlie (1960) proposed a common bivariate distribution function, as defined in formula (4):

$$\begin{array}{c}{F}_{12}\left({t}_{{1}_{l}},{t}_{{2}_{l}}\right)={F}_{1}\left({t}_{{1}_{l}}\right){F}_{2}\left({t}_{{2}_{l}}\right)\{1+\beta \left[1-{F}_{1}\left({t}_{{1}_{l}}\right)\right]\left[1-{F}_{2}\left({t}_{{2}_{l}}\right)\right]\}\\ -1\le \beta \le 1\end{array},$$
(4)

The notation in formula (4) is the same as in formula (3); the additional parameter \(\beta\) indicates the degree of dependence of consumers’ browsing paths among related companies. According to Chintagunta and Haldar (1998), \(\beta\) can also identify the relationship between two companies. When \(\beta\) is negative, these companies have a potential complementary relationship, while when \(\beta\) is positive, they have a potential substitutional relationship (Park & Fader, 2004). If \(\beta\) is 0 or not significant, the browsing paths of these two companies are not related. The density function of the bivariate distribution in formula (4) and the corresponding contributions to the likelihood function are expressed in formulas (5), (6), (7) and (8):

$$\begin{array}{c}{f}_{12}\left({t}_{{1}_{l}},{t}_{{2}_{l}}\right)=\frac{{\partial }^{2}{F}_{12}\left({t}_{{1}_{l}},{t}_{{2}_{l}}\right)}{\partial {t}_{{1}_{l}}\partial {t}_{{2}_{l}}}\\ ={f}_{1}\left({t}_{{1}_{l}}\right){f}_{2}\left({t}_{{2}_{l}}\right)\{1+\beta \left[2{S}_{1}\left({t}_{{1}_{l}}\right)-1\right]\left[2{S}_{2}\left({t}_{{2}_{l}}\right)-1\right]\}\end{array},$$
(5)

When consumer i browses both companies in the sequential browsing path, the contribution of the browsing behavior to the likelihood function is captured in formula (5), following Chintagunta and Haldar (1998). If a consumer only browses the first company, the contribution is captured in formula (6), following Johnson and Kott (1975):

$${f}_{1}\left({t}_{{1}_{l}},{t}_{{2}_{l}}\right)={f}_{1}\left({t}_{{1}_{l}}\right){F}_{2}\left({t}_{{2}_{l}}\right)\left\{1+\beta \left[1-2{F}_{1}\left({t}_{{1}_{l}}\right)\right]\left[1-{F}_{2}\left({t}_{{2}_{l}}\right)\right]\right\},$$
(6)

Similarly, when the consumer only browses the second company, the browsing behavior in the likelihood function is captured in formula (7):

$${f}_{2}\left({t}_{{1}_{l}},{t}_{{2}_{l}}\right)={F}_{1}\left({t}_{{1}_{l}}\right){f}_{2}\left({t}_{{2}_{l}}\right)\left\{1+\beta \left[1-{F}_{1}\left({t}_{{1}_{l}}\right)\right]\left[1-2{F}_{2}\left({t}_{{2}_{l}}\right)\right]\right\},$$
(7)

When the consumer browses neither of these companies, the browsing behavior in the likelihood function is shown in formula (8):

$${S}_{12}\left({t}_{{1}_{l}},{t}_{{2}_{l}}\right)={S}_{1}\left({t}_{{1}_{l}}\right){S}_{2}\left({t}_{{2}_{l}}\right)\left[1+\beta {F}_{1}\left({t}_{{1}_{l}}\right){F}_{2}\left({t}_{{2}_{l}}\right)\right],$$
(8)
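Formulas (4) to (8) can be checked numerically with a short sketch. The function names are hypothetical, and the check uses uniform marginals on [0, 1] (so that F(t) = t and f(t) = 1) purely as a convenient assumption; it confirms by finite differences that formula (5) is the mixed partial derivative of formula (4), that formula (6) is its partial derivative in the first argument, and that formula (8) equals 1 - F1 - F2 + F12.

```python
def fgm_cdf(F1, F2, beta):
    """Formula (4): joint CDF built from the two marginal CDF values."""
    return F1 * F2 * (1.0 + beta * (1.0 - F1) * (1.0 - F2))

def fgm_both(f1, f2, F1, F2, beta):
    """Formula (5): joint density, written with S = 1 - F."""
    S1, S2 = 1.0 - F1, 1.0 - F2
    return f1 * f2 * (1.0 + beta * (2.0 * S1 - 1.0) * (2.0 * S2 - 1.0))

def fgm_first_only(f1, F1, F2, beta):
    """Formula (6): only the first company is browsed."""
    return f1 * F2 * (1.0 + beta * (1.0 - 2.0 * F1) * (1.0 - F2))

def fgm_second_only(f2, F1, F2, beta):
    """Formula (7): only the second company is browsed."""
    return F1 * f2 * (1.0 + beta * (1.0 - F1) * (1.0 - 2.0 * F2))

def fgm_neither(F1, F2, beta):
    """Formula (8): joint survival function."""
    return (1.0 - F1) * (1.0 - F2) * (1.0 + beta * F1 * F2)

# Illustrative values only: uniform marginals, so F(t) = t and f(t) = 1.
beta, u, v, h = 0.5, 0.3, 0.7, 1e-4

# Central finite difference for the mixed partial derivative of formula (4).
mixed = (fgm_cdf(u + h, v + h, beta) - fgm_cdf(u + h, v - h, beta)
         - fgm_cdf(u - h, v + h, beta) + fgm_cdf(u - h, v - h, beta)) / (4 * h * h)

# Central finite difference for the partial derivative in the first argument.
partial_u = (fgm_cdf(u + h, v, beta) - fgm_cdf(u - h, v, beta)) / (2 * h)

print(mixed, fgm_both(1.0, 1.0, u, v, beta))
print(partial_u, fgm_first_only(1.0, u, v, beta))
print(fgm_neither(u, v, beta), 1 - u - v + fgm_cdf(u, v, beta))
```

Each printed pair should agree up to rounding error, which gives a quick term-by-term consistency check of the formulas before they are combined in a likelihood.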

3.5 Right Censoring of Association Models

A right censoring problem typically exists in consumers’ sequential association browsing paths, meaning that when the pre-defined observation period (namely spell) ends, consumer i has no browsing behavior within the spell (Bhatnagar et al., 2017). As illustrated in Fig. 1, we set the starting observation time as 0, and the corresponding browsing behavior is also 0. The period from 0 to \({t}_{1}\) is spell 1. According to the illustrative example in Fig. 1, consumer i only browses company A in spell 1. In spell 3, this consumer does not browse any company. To show the continuity of the consumer’s browsing behavior, we use right censoring to deal with the consumer’s browsing behavior in the third spell. To address the potential right censoring problem, a dummy variable \({d}_{ijl}\) is introduced in the likelihood function to indicate whether consumer i has the browsing behavior in the respective spell. When a consumer is interested in two related companies A and B, there are four possibilities for their browsing paths in any one spell: i) browsing only A; ii) browsing only B; iii) browsing both; and iv) browsing neither of these. Therefore, when consumer i has browsing behavior during this spell, then the contribution of the sequential browsing path to the likelihood function is shown in the above formulas (5), (6), (7); otherwise, the function is shown in formula (8).

Fig. 1 The Browsing Path of Consumer i

3.6 Synthetic Likelihood Function of Association Models

Considering the four possible browsing paths during the observation period, the likelihood function over all spells is given in formula (9):

$$L={\prod }_{\zeta =1}^{M}[{f}_{12}\left({t}_{{1}_{l}},{t}_{{2}_{l}}\right){]}^{{\delta }_{12\zeta }}[{f}_{1}\left({t}_{{1}_{l}},{t}_{{2}_{l}}\right){]}^{{\delta }_{10\zeta }}[{f}_{2}\left({t}_{{1}_{l}},{t}_{{2}_{l}}\right){]}^{{\delta }_{02\zeta }}[{S}_{12}\left({t}_{{1}_{l}},{t}_{{2}_{l}}\right){]}^{{d}_{ijl}},$$
(9)

where M represents the number of completed spells. If, when a certain spell \(\zeta\) ends, the consumer has browsed both companies in product category l, then the value of \({\delta }_{12\zeta }\) is equal to 1; otherwise, \({\delta }_{12\zeta }\) is equal to zero. If the consumer only browses the first company, then the value of \({\delta }_{10\zeta }\) is equal to 1; otherwise, \({\delta }_{10\zeta }\) is equal to zero. If the consumer only browses the second company, the value of \({\delta }_{02\zeta }\) is equal to 1; otherwise, \({\delta }_{02\zeta }\) is equal to zero. The last term of the function accounts for the right censoring problem. Because each browsing path can only be one of the above four types, the sum of \({\delta }_{12\zeta }\), \({\delta }_{10\zeta }\), \({\delta }_{02\zeta }\) and \({d}_{ijl}\) must equal 1.
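As a minimal sketch of how formula (9) can be evaluated, the following code sums the log of the contribution selected by the indicator for each spell. The spell layout (outcome label, t1, t2), the function names, and the use of exponential marginals are assumptions made only to keep the example short; any marginal that returns a density and a CDF value could be substituted.

```python
import math

def exp_marginal(t, rate):
    """Exponential density and CDF, used here only as a stand-in marginal."""
    F = 1.0 - math.exp(-rate * t)
    f = rate * math.exp(-rate * t)
    return f, F

def spell_term(outcome, t1, t2, rate1, rate2, beta):
    """Likelihood contribution of one spell under formulas (5) to (8)."""
    f1, F1 = exp_marginal(t1, rate1)
    f2, F2 = exp_marginal(t2, rate2)
    if outcome == "both":      # delta_12 = 1, formula (5), using 2S - 1 = 1 - 2F
        return f1 * f2 * (1 + beta * (1 - 2 * F1) * (1 - 2 * F2))
    if outcome == "first":     # delta_10 = 1, formula (6)
        return f1 * F2 * (1 + beta * (1 - 2 * F1) * (1 - F2))
    if outcome == "second":    # delta_02 = 1, formula (7)
        return F1 * f2 * (1 + beta * (1 - F1) * (1 - 2 * F2))
    if outcome == "neither":   # d = 1, formula (8), right-censored spell
        return (1 - F1) * (1 - F2) * (1 + beta * F1 * F2)
    raise ValueError(f"unknown outcome: {outcome!r}")

def log_likelihood(spells, rate1, rate2, beta):
    """Log of formula (9): exactly one indicator is 1 per spell, so each
    spell contributes the log of a single term."""
    return sum(
        math.log(spell_term(outcome, t1, t2, rate1, rate2, beta))
        for outcome, t1, t2 in spells
    )

# Toy spells, invented for illustration only.
toy_spells = [
    ("both", 1.0, 2.0),
    ("first", 1.0, 1.0),
    ("second", 2.0, 1.0),
    ("neither", 3.0, 3.0),
]
ll = log_likelihood(toy_spells, rate1=0.5, rate2=0.25, beta=0.3)
print(ll)
```

Working on the log scale turns the product in formula (9) into a sum, which avoids numerical underflow when the number of spells is large.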

3.7 Distribution Assumptions of the Density Function

Before conducting the parameter estimation, the distribution of the density function \(f({t}_{ijl})\) of the continuous random variable \({t}_{ijl}\) needs to be assumed, and it should satisfy the non-monotonicity assumption (Chintagunta & Haldar, 1998). The density functions commonly used in duration analysis include the exponential (Park & Fader, 2004), Weibull, and Log-logistic distributions (Bhatnagar et al., 2017; Chintagunta & Haldar, 1998). Among them, the exponential density function is the earliest hypothesized form. However, since the hazard function \(h({t}_{ijl})\) corresponding to the exponential density function is constant, the probability of consumer i browsing the two companies is unrelated to the length of the browsing interval. In other words, the exponential distribution is memoryless, which differs from the behavior of consumers in practice. Therefore, the exponential density function, which has only one parameter, is extended to the Log-logistic and Weibull distributions, which have two parameters. After the preliminary processing of the data, we find that the Log-logistic density function fits better than the Weibull distribution. Therefore, we assume that the density function \(f({t}_{ijl})\) of the sequential browsing path of consumer i follows a Log-logistic distribution, whose hazard function is shown in formula (10):

$$h\left({t}_{ijl}\right)=\frac{\alpha \gamma {t}_{ijl}^{\gamma -1}}{1+\alpha {t}_{ijl}^{\gamma }},$$
(10)

In formula (10), \(\alpha\) represents the scale parameter: with the other parameters unchanged, a larger value of \(\alpha\) indicates that consumer i is more likely to browse this company again. The parameter \(\gamma\) is the shape parameter, which determines the form of the hazard function. When \(\gamma >1\), the hazard function has a single peak, which means that when consumer i enters a company store, the probability of browsing first increases due to the need to receive information about the company. However, once a certain time is reached, the browsing probability starts declining, which means that when consumer i has a certain understanding of a company, they will concentrate on understanding a certain product. Therefore, the corresponding browsing probability will be reduced. By contrast, when \(\gamma \le 1\), the hazard function is monotonically decreasing, that is, as time increases, the probability of browsing each company gradually decreases. This indicates that consumer i is familiar with the company and has a clear shopping goal.
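The two regimes of formula (10) can be illustrated with a small sketch. The parameter values are invented rather than estimated, and the function names are hypothetical. For \(\gamma >1\), setting the derivative of formula (10) to zero gives a peak at \(t^{*}={\left[(\gamma -1)/\alpha \right]}^{1/\gamma }\), which the code compares against a grid search.

```python
def loglogistic_hazard(t, alpha, gamma):
    """Formula (10): hazard of the Log-logistic distribution, for t > 0."""
    return alpha * gamma * t ** (gamma - 1.0) / (1.0 + alpha * t ** gamma)

def hazard_peak(alpha, gamma):
    """Time at which the hazard peaks, or None when it has no interior peak."""
    if gamma <= 1.0:
        return None  # the hazard is monotonically decreasing
    return ((gamma - 1.0) / alpha) ** (1.0 / gamma)

# Evaluation grid over (0, 10]; t = 0 is excluded because t**(gamma - 1)
# diverges there when gamma < 1.
grid = [0.05 * k for k in range(1, 201)]

# Illustrative parameter values, not estimates from the study.
single_peak = [loglogistic_hazard(t, 0.5, 2.0) for t in grid]   # gamma > 1
decreasing = [loglogistic_hazard(t, 0.5, 0.8) for t in grid]    # gamma <= 1

t_star = hazard_peak(0.5, 2.0)
grid_argmax = grid[single_peak.index(max(single_peak))]
print(t_star, grid_argmax)
```

With these values the analytical peak and the grid maximum should agree up to the grid spacing, while the second curve decreases over the whole grid.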

3.8 Bayesian Estimation Process of Parameters

To explore the uncertainty and heterogeneity of the estimated parameters, we use the Bayesian method for their estimation (Chu et al., 2017; Manouchehri et al., 2020). According to the Bayesian principle, the joint posterior distribution of each parameter can be expressed by formula (11):

$$\begin{array}{c}p({\alpha }_{i1l},{\gamma }_{i1l},{\alpha }_{i2l},{\gamma }_{i2l},{\beta }_{il}|{t}_{i1l},{t}_{i2l})\\ \propto L\left({t}_{i1l},{t}_{i2l}|{\alpha }_{i1l}, {\gamma }_{i1l}, {\alpha }_{i2l}, {\gamma }_{i2l}, {\beta }_{il}\right)p\left({\alpha }_{i1l}\right)p\left({\gamma }_{i1l}\right)p\left({\alpha }_{i2l}\right)p\left({\gamma }_{i2l}\right)p({\beta }_{il})\end{array},$$
(11)

The likelihood function \(L(\bullet )\) on the right side of formula (11) is calculated by formula (9), and \(p(\bullet )\) is the prior distribution of each parameter. If no prior information exists for a parameter, a weakly informative prior can be used, as suggested by Gelman et al. (2013). Therefore, in the Markov chain Monte Carlo (MCMC) sampling process, the prior distribution settings for each parameter are shown in formula (12):

$$\begin{array}{c}{\alpha }_{i1l}\sim \mathrm{HalfNormal}\left({\sigma }_{{\alpha }_{i1l}}^{2}\right)\\ {\alpha }_{i2l}\sim \mathrm{HalfNormal}\left({\sigma }_{{\alpha }_{i2l}}^{2}\right)\\ {\gamma }_{i1l}\sim \mathrm{HalfNormal}\left({\sigma }_{{\gamma }_{i1l}}^{2}\right)\\ {\gamma }_{i2l}\sim \mathrm{HalfNormal}\left({\sigma }_{{\gamma }_{i2l}}^{2}\right)\\ {\beta }_{il}\sim \mathrm{Uniform}(-1,1)\end{array},$$
(12)
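The structure of formulas (11) and (12) can be sketched as an unnormalized log-posterior. The prior scale sigma = 1 is a placeholder assumption, since the scale values are not reported here, and the function names and the `log_likelihood(params, data)` callable are hypothetical. The sketch does not implement a sampler; it only shows the quantity that an MCMC algorithm would evaluate.

```python
import math

def log_half_normal(x, sigma):
    """Log density of a Half-Normal prior with scale sigma; -inf for x < 0."""
    if x < 0:
        return -math.inf
    return 0.5 * math.log(2.0 / math.pi) - math.log(sigma) - x * x / (2.0 * sigma * sigma)

def log_uniform(x, low=-1.0, high=1.0):
    """Log density of a Uniform(low, high) prior; -inf outside the support."""
    if low <= x <= high:
        return -math.log(high - low)
    return -math.inf

def log_prior(alpha1, gamma1, alpha2, gamma2, beta, sigma=1.0):
    """Sum of the independent log priors in formula (12).

    sigma = 1.0 is a placeholder scale, not a value taken from the study.
    """
    return (log_half_normal(alpha1, sigma) + log_half_normal(gamma1, sigma)
            + log_half_normal(alpha2, sigma) + log_half_normal(gamma2, sigma)
            + log_uniform(beta))

def log_posterior(params, log_likelihood, data, sigma=1.0):
    """Unnormalized log posterior of formula (11).

    params: (alpha1, gamma1, alpha2, gamma2, beta).
    log_likelihood: any callable (params, data) -> float, e.g. the log of
    formula (9); it is passed in so that this sketch stays self-contained.
    """
    lp = log_prior(*params, sigma=sigma)
    if lp == -math.inf:
        return lp  # outside the prior support, skip the likelihood
    return lp + log_likelihood(params, data)

# Dummy likelihood for demonstration only.
value = log_posterior((1.0, 1.0, 1.0, 1.0, 0.0), lambda p, d: -2.0, None)
print(value)
```

Returning negative infinity outside the prior support keeps \(\beta\) within [-1, 1] and the scale and shape parameters non-negative, as formula (12) requires.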

3.9 Markov Chain-Monte Carlo Parameter Estimation

We use the No-U-Turn Sampler (NUTS) algorithm to sample and estimate the above parameters, which is an efficient sampling method that adaptively sets the path length in Hamiltonian Monte Carlo (Hoffman & Gelman, 2014). In each sampling iteration, a recursive algorithm generates posterior information for candidate parameter sets. As long as the no-U-turn condition is satisfied, sub-trees that update the parameters continue to be built. Otherwise, the parameter set of this iteration is recorded and the next iteration starts, until sufficient samples are obtained. The algorithm thereby avoids the redundant sampling caused by random-walk behavior and improves the efficiency of parameter estimation. We run two independent MCMC chains to sample the parameters. Each chain was sampled 10,000 times, keeping one sample in every 5. Hence, the number of samples used for the posterior inference of our model is 20,000.

4 Results

4.1 Estimation of all Parameters

We summarize the estimated results of the parameters in Table 4: Alpha represents the scale parameter and Gamma represents the shape parameter of each company in the two product categories. Beta indicates the degree of dependence between consumers’ browsing paths in related companies. We use the Watanabe-Akaike Information Criterion (WAIC) (Vehtari et al., 2017; Watanabe, 2010) as the indicator of model fit, in which the smaller the value, the better the fit of the model. We use Mean (s.d.) to represent the average (standard deviation) of the estimated parameters. The interval from 2.5% to 97.5% represents the 95% credible interval of the estimated parameters. If 0 is not in this interval, the parameter to be estimated is significant.
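For readers unfamiliar with WAIC, a minimal sketch of its computation is given below, following the definition in Vehtari et al. (2017): the log pointwise predictive density minus the sum of the posterior variances of the pointwise log-likelihood, multiplied by -2. The input layout and function names are assumptions, and the toy numbers are invented; how the values in Table 4 were actually computed is not stated here.

```python
import math

def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

def sample_variance(values):
    """Unbiased sample variance across posterior draws."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def waic(pointwise_loglik):
    """WAIC on the deviance scale (smaller is better).

    pointwise_loglik[s][n]: log-likelihood of observation n under
    posterior draw s (an assumed layout, with at least two draws).
    """
    n_obs = len(pointwise_loglik[0])
    columns = [[row[n] for row in pointwise_loglik] for n in range(n_obs)]
    lppd = sum(log_mean_exp(col) for col in columns)      # fit term
    p_waic = sum(sample_variance(col) for col in columns)  # complexity penalty
    return -2.0 * (lppd - p_waic)

# Toy log-likelihood matrices (2 draws x 2 observations), invented for illustration.
benchmark = [[-1.0, -2.0], [-1.0, -2.0]]
association = [[-0.5, -1.0], [-0.5, -1.0]]
print(waic(benchmark), waic(association))
```

In this toy comparison the second matrix has higher pointwise log-likelihood values and therefore a smaller WAIC, which is the direction of comparison used when two models are ranked by fit.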

Table 4 Estimation of parameters

The parameter estimates and WAIC values indicate the following. First, in both product categories, all WAIC values of the proposed association models are smaller than those of the corresponding benchmarking models. This suggests that, regardless of category, the proposed association model with browsing-path dependence fits better than the benchmarking one. Second, all parameters are significant, since none of the 95% credible intervals of the estimated parameters contains 0, which is the judgment criterion of the Bayesian estimation method. Third, the maximum value of the average standard deviation is 0.017, and most of the other values are within 0.010, indicating that the distribution of the estimated values of each parameter is relatively concentrated. Fourth, the significantly positive beta indicates that the related companies in the two categories are competing (Park & Fader, 2004).

To test the robustness of the results in Table 4, we randomly select two of the other 8 companies among the top 10 in sales volume in each product category. We name the companies in the first product category E and F, and those in the second category G and H. In these categories, the number of consumers with browsing path records for at least one of the two related companies was 1,845 and 718, respectively, while the number of browsing records was 12,072 and 6,140, respectively. Table 5 shows the number of unique consumers who browsed the two companies separately and both in each product category: 664 out of 1,845 consumers (36%) in the first product category and 412 out of 718 consumers (57%) in the second one. We summarize the parameter estimates of the robustness test in Table 6.

Table 5 Statistics of consumers browsing two related companies
Table 6 Estimation of parameters

The parameter estimation and WAIC values indicate the following results, which demonstrate the robustness of the results in Table 4:

  1. In both product categories, all WAIC values of the proposed association models are smaller than those of the corresponding benchmarking models. This suggests that the proposed association model with browsing-path dependence fits better than the benchmarking model, regardless of product category.

  2. All parameters are significant, since none of the 95% credible intervals of the estimated parameters contains 0, which is the judgment criterion of Bayesian estimation.

  3. The maximum value of the average standard deviation is 0.017, and most of the other values are within 0.010, indicating that the distribution of the estimated values of each parameter is relatively concentrated.

  4. The significantly positive beta indicates that the related companies in each of the two product categories are in a competitive relationship (Park & Fader, 2004).

We illustrate consumers’ dependence on their paths when browsing related companies in Fig. 2, where the x-axis and y-axis denote the distribution of browsing time for each of the two companies, and deltat represents the time interval between the current and the last browsing behavior. We use Kernel Density Estimation to fit the correlation of browsing time. According to Fig. 2, the path dependence of most consumers occurs in the first four calendar days. The dependence on the browsing path of related companies is highest on the first calendar day and then gradually decreases, because when consumers browse related companies for the first time, they need to compare them. As the number of browses increases, consumers gradually become familiar with the information on related companies and no longer compare them, which leads to a gradual decrease in the degree of path dependence.

Fig. 2 The effect of path dependence. Note: The axes represent the time interval between the current and last browsing in days. The side graphs show the distribution of consumers’ browsing time in the two companies. The main graph represents the frequency of consumers’ browsing path dependence on both companies, and denser lines indicate higher browsing path dependence

4.2 Parameter Estimations Considering Consumer Heterogeneity

The literature suggests that the browsing paths, frequency of browsing, and the time spent on each webpage can be distinct for different types of consumers (Su & Chen, 2015). Our findings show that consumers significantly differ in their online browsing strategies. We, therefore, categorize consumers into 12 types based on their demographics (i.e., 3 age groups * 2 gender categories * 2 membership levels). To analyze the heterogeneity of consumer browsing behavior, we choose the sequential browsing-path information of consumers in the first product category as the illustration. Each type of consumer is labeled with a three-digit number: the first digit indicates the age group (1, 2, or 3), the second digit indicates the consumer’s gender (0 or 1), and the third digit indicates the consumer’s membership level (1 or 2). For instance, the type with code 102 refers to male VIP consumers under 35 years old. In Table 7 we present the browsing path characteristics of the 12 consumer groups in related companies.
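The three-digit labeling scheme can be made concrete with a short sketch that enumerates the 12 group codes and splits a code back into its digits. The function and constant names are hypothetical, and the sketch deliberately leaves the digits undecoded, because only the meaning of code 102 is stated in the text.

```python
from itertools import product

AGE_GROUPS = (1, 2, 3)        # first digit
GENDERS = (0, 1)              # second digit
MEMBERSHIP_LEVELS = (1, 2)    # third digit

def group_code(age_group, gender, membership):
    """Build the three-digit label for one consumer type."""
    return f"{age_group}{gender}{membership}"

def parse_group_code(code):
    """Split a three-digit label back into its three attributes."""
    age_group, gender, membership = (int(ch) for ch in code)
    return {"age_group": age_group, "gender": gender, "membership": membership}

# All 3 * 2 * 2 = 12 consumer types, in ascending order of their codes.
ALL_GROUPS = [
    group_code(a, g, m)
    for a, g, m in product(AGE_GROUPS, GENDERS, MEMBERSHIP_LEVELS)
]
print(ALL_GROUPS)
print(parse_group_code("102"))
```

Keeping the label as a string rather than an integer makes the three positions explicit and avoids treating the code as a quantity.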

Table 7 Consumers’ browsing path characteristics of companies A & B
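The three-digit labeling scheme can be made concrete with a small decoder. This is a minimal sketch: the digit positions follow the description in the text, but several labels are assumptions inferred from the two worked examples (102 and 301), namely that gender code 1 denotes female consumers and that age group 2 is a middle band whose bounds are not stated here.

```python
from itertools import product

# Labels inferred from the examples in the text; entries marked "assumed"
# are not stated explicitly in this section.
AGE = {"1": "under 35", "2": "middle band (bounds not stated)", "3": "over 46"}
GENDER = {"0": "male", "1": "female (assumed)"}
MEMBERSHIP = {"1": "non-VIP", "2": "VIP"}


def decode_group(code: str) -> dict:
    """Split a three-digit group code into age, gender, and membership."""
    if len(code) != 3:
        raise ValueError(f"expected a three-digit code, got {code!r}")
    age, gender, membership = code
    return {
        "age": AGE[age],
        "gender": GENDER[gender],
        "membership": MEMBERSHIP[membership],
    }


# All 3 x 2 x 2 = 12 group codes, in the order age, gender, membership.
ALL_CODES = ["".join(digits) for digits in product("123", "01", "12")]

print(decode_group("102"))
print(ALL_CODES)
```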

Our analyses demonstrate that Gamma values have two types of distributions: (i) both Gamma values are greater than 1 for groups 101, 111, 201, 202, 211, 301, and 302; (ii) both Gamma values are less than 1 for groups 102, 112, 212, 311, and 312. We compare the browsing of heterogeneous consumer groups according to this distribution of Gamma values by including one group of each type in the detailed analysis: 211 from type (i) and 112 from type (ii). We present the hazard functions of these two consumer groups in Fig. 3, where the three-digit number refers to the focal consumer group. Because each consumer group browses two related companies at the same time, there are two curves of browsing path characteristics for each consumer group. Alpha1 and Gamma1 represent the scale and shape parameters of company A, while Alpha2 and Gamma2 represent the scale and shape parameters of company B.

Fig. 3 Hazard function of different consumer groups

Several observations can be drawn from these results. Groups 101, 111, 201, 202, 211, 301, and 302 have both Gamma values greater than 1, which distinguishes them from all other consumer groups. Specifically, their browsing path characteristics in the related companies follow a single-peak distribution. For these groups, the hazard function value for each related company is smallest at time 0, implying that when these consumers enter the website, no matter which company they browse, they do not have a clear goal to browse and compare companies at the beginning. In the first spell, as browsing duration increases, the hazard function increases monotonically, reaches a turning point near the end of the first spell, and then gradually decreases. This pattern shows that the desire to browse and compare companies initially grows, peaks near the end of the first spell, and then gradually declines. By contrast, for all other types of consumers, the hazard function value is largest at time 0 and gradually decreases as the spell lengthens. These results indicate that consumers outside the above-mentioned groups start a comprehensive comparison between companies immediately when they enter the website, and that these comparisons fade as time goes on.
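The contrast between the two hazard shapes can be reproduced with a short sketch. It assumes the standard Log-logistic parameterization with scale `alpha` and shape `gamma`, h(t) = (gamma/alpha)(t/alpha)^(gamma-1) / (1 + (t/alpha)^gamma), which may differ from the exact form used in the model; the parameter values are illustrative and are not the estimates in Table 7.

```python
import math


def loglogistic_hazard(t, alpha, gamma):
    """Hazard of a Log-logistic spell with scale alpha and shape gamma (t > 0)."""
    x = t / alpha
    return (gamma / alpha) * x ** (gamma - 1.0) / (1.0 + x ** gamma)


def hazard_peak_time(alpha, gamma):
    """Time at which the hazard peaks, or None when gamma <= 1 and the
    hazard is largest at the start of the spell and never rises."""
    if gamma <= 1.0:
        return None
    # Setting the derivative of log h(t) to zero gives (t/alpha)^gamma = gamma - 1.
    return alpha * (gamma - 1.0) ** (1.0 / gamma)


# Illustrative shapes only: one below 1 and one above 1.
for gamma in (0.8, 2.0):
    values = [round(loglogistic_hazard(t, 1.0, gamma), 3) for t in (0.1, 0.5, 1.0, 2.0, 5.0)]
    print(f"gamma={gamma}: hazards={values}, peak={hazard_peak_time(1.0, gamma)}")
```

Under this parameterization, a shape above 1 yields a hazard that starts at zero, rises to a single peak, and then declines, whereas a shape below 1 yields a hazard that is largest at the start and decreases throughout.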

4.3 Robustness Tests

4.3.1 Overall Consumers’ Browsing Behavior

In line with prior studies (Bhatnagar et al., 2017; Chintagunta & Haldar, 1998), we test the robustness of our results by alternatively assuming that the spell data follow the Weibull distribution. We use two distributional models (Log-logistic and Weibull) to analyze the influence of two factors, i.e., the distributional form of the spell data and the dependence of browsing paths, on consumers’ actual browsing. We assess the robustness of the findings by comparing goodness of fit on the same data set for both product categories. In Table 8 we summarize our results by comparing the WAIC values of four different models in each product category. The results show that models incorporating the dependence of browsing paths explain overall consumers’ browsing behavior better, and that the Log-logistic distribution fits the spell data better than the Weibull distribution, which verifies the robustness of our findings.

Table 8 Results of the overall consumers’ browsing behavior robustness tests
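For readers unfamiliar with WAIC, the sketch below shows one common way to compute it from pointwise log-likelihoods evaluated at posterior draws, reported on the deviance scale so that lower values indicate better expected fit. How WAIC was computed for Table 8 is not described in this section, so the function and its input layout (`log_lik[s][i]` for draw `s` and observation `i`) are assumptions, and the numbers are toy values.

```python
import math
from statistics import variance


def waic(log_lik):
    """WAIC on the deviance scale.

    log_lik[s][i] is the log-likelihood of observation i under posterior
    draw s; at least two draws are required for the variance term.
    """
    n_draws = len(log_lik)
    n_obs = len(log_lik[0])
    lppd = 0.0    # log pointwise predictive density
    p_waic = 0.0  # effective number of parameters
    for i in range(n_obs):
        column = [log_lik[s][i] for s in range(n_draws)]
        # Log of the posterior-mean likelihood, computed stably (log-sum-exp).
        m = max(column)
        lppd += m + math.log(sum(math.exp(v - m) for v in column) / n_draws)
        # Posterior sample variance of the pointwise log-likelihood.
        p_waic += variance(column)
    return -2.0 * (lppd - p_waic)


# Toy log-likelihoods for two hypothetical models (two draws, two observations).
model_a = [[-1.0, -2.0], [-1.0, -2.0]]
model_b = [[-0.5, -1.0], [-0.5, -1.0]]
print(waic(model_a), waic(model_b))
```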

4.3.2 Consumer Characteristics

To study the heterogeneity of consumers’ browsing paths of related companies in the same product category, we classified all consumers according to the available demographic information, including (i) age (3 available types), (ii) gender (2 available types), and (iii) membership level (2 available levels). Accordingly, consumers can be divided into 12 (3 × 2 × 2) types, and the number of consumers in each type is shown in Table 9.

Table 9 Consumer characteristics of browsing related companies in product category 1

Table 10 captures the browsing path characteristics of the 12 groups. Our analyses demonstrate that Gamma values have three types of distributions:

  1. both Gamma values are greater than 1, referring to the only consumer group 301

  2. only one of the Gamma values is larger than 1, referring to consumer groups 111 and 201

  3. both Gamma values are less than 1, referring to all other consumer groups in Table 10.

Table 10 Consumers’ browsing path characteristics of companies E & F
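The three-way grouping by Gamma values amounts to counting how many of a group's two shape parameters exceed 1. The sketch below expresses that rule; the group labels and shape values are invented for illustration and are not the estimates reported in Table 10.

```python
def gamma_pattern(gamma_e, gamma_f):
    """Classify a consumer group by how many of its two shape parameters
    exceed 1. A value of exactly 1 is counted as not exceeding 1, an edge
    case the text does not address."""
    above = sum(g > 1.0 for g in (gamma_e, gamma_f))
    labels = {
        2: "type 1: both above 1",
        1: "type 2: exactly one above 1",
        0: "type 3: neither above 1",
    }
    return labels[above]


# Hypothetical groups with made-up (Gamma for E, Gamma for F) pairs.
examples = {
    "group_x": (1.4, 1.2),
    "group_y": (1.3, 0.7),
    "group_z": (0.6, 0.9),
}
for name, (g_e, g_f) in examples.items():
    print(name, "->", gamma_pattern(g_e, g_f))
```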

We compare the browsing behavior of heterogeneous groups according to the above distribution of Gamma values by including the following groups in the detailed analysis: 301 for type (1), 111 and 201 for type (2), and 202 as a random example of type (3). The hazard functions of these groups are displayed in Fig. 4.

Fig. 4 Hazard function of different consumer groups

The three-digit number in Fig. 4 refers to the focal consumer group. As each group browses two related companies at the same time, there are two curves of browsing path characteristics for each group: Alpha1 and Gamma1 represent the scale and shape parameters of company E, while Alpha2 and Gamma2 represent the scale and shape parameters of company F. Several visual inferences can be drawn from Fig. 4. Group 301 is the only one with both Gamma values greater than 1, suggesting that non-VIP male consumers over 46 years of age differ from all other consumer groups. Specifically, their browsing path characteristics in the related companies follow a single-peak distribution. For this group, the hazard function value for each related company is smallest at time 0, implying that when these consumers enter the website, no matter which company they browse, they do not have a clear goal to browse and compare companies at the beginning. In the first spell, as browsing duration increases, the hazard function increases monotonically, reaches a turning point near the end of the first spell, and then gradually decreases. This pattern shows that the group’s desire to browse and compare companies initially grows, peaks near the end of the first spell, and then gradually declines.

Our findings indicate that the single-peak distribution also applies to the consumer groups labeled 111 and 201, as shown in Fig. 4. However, unlike group 301, the hazard function value for groups 111 and 201 at time 0 is the largest for company F and the smallest for company E. As the spell lengthens, the hazard function value of E gradually exceeds that of F. These results demonstrate that when groups 111 and 201 enter the website, they are more interested in one company (i.e., company F), but as browsing continues, they gradually develop a stronger interest in the other company (i.e., company E), implying that these two consumer groups are easily influenced by external information. The hazard function value for all other types of consumers is largest at time 0 and gradually decreases as the spell lengthens. These results indicate that all consumer groups other than 301, 111, and 201 start a comprehensive comparison between companies immediately when they enter the website, and that these comparisons fade as time goes on.
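The point at which one company's hazard overtakes the other's can be located numerically. The sketch below uses bisection on the difference of two Log-logistic hazards, again assuming the standard scale and shape parameterization; the parameter pairs for companies E and F are illustrative values chosen so that one shape is above 1 and the other below 1, not estimates from Table 10.

```python
def loglogistic_hazard(t, alpha, gamma):
    """Hazard of a Log-logistic spell with scale alpha and shape gamma (t > 0)."""
    x = t / alpha
    return (gamma / alpha) * x ** (gamma - 1.0) / (1.0 + x ** gamma)


def crossover_time(params_e, params_f, lo=1e-6, hi=100.0, tol=1e-9):
    """Bisection search for a time at which the hazard for company E
    overtakes the hazard for company F. Returns None when E is not below F
    at `lo` or not above F at `hi`."""
    def diff(t):
        return loglogistic_hazard(t, *params_e) - loglogistic_hazard(t, *params_f)

    if diff(lo) >= 0.0 or diff(hi) <= 0.0:
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if diff(mid) < 0.0:
            lo = mid  # E still below F: the crossing lies later
        else:
            hi = mid  # E already above F: the crossing lies earlier
    return 0.5 * (lo + hi)


# Illustrative (alpha, gamma) pairs: E has shape above 1, F has shape below 1.
params_e = (1.0, 2.0)
params_f = (1.0, 0.5)
t_cross = crossover_time(params_e, params_f)
print(f"E overtakes F at about t = {t_cross:.3f}")
```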

5 Discussion

5.1 Key Findings

We use duration analysis to explore how path dependence can better explain consumers’ subsequent behavior. We develop a multiple-spell competing hazard model that incorporates the dependence of consumers’ browsing paths in related companies. Through the analysis of overall consumers’ browsing behavior, we find that, compared to the bivariate independent model, the multivariate model that considers the dependence of consumers’ sequential browsing paths describes their actual behavior more accurately. Moreover, the Log-logistic distribution of spell data fits their actual behavior better than the Weibull distribution. In particular, the dependence on the browsing path of related companies is highest on the first calendar day and then gradually decreases. Through the analysis of the heterogeneous browsing behavior of consumers, we find that different groups exhibit markedly different behavior patterns when browsing related companies, and that the degree of this difference is related to the company’s position in the market. According to the parameter estimates for heterogeneous consumer groups, a personalized recommendation service strategy should be implemented near the end of the first spell for groups with unclear goals, in order to increase the recommendation conversion rate. For other groups, which have clear goals when they browse, online platforms should offer fewer personalized recommendations to make better use of their resources.

5.2 Theoretical and Managerial Implications

Our findings contribute to research on online service response strategy, personalized recommendations, and duration analysis. Online service response strategy includes both in-purchase and post-purchase components; however, prior studies have explored the topic only from a consumer review perspective after purchase. Compared to physical service, online service lacks face-to-face contact, which may result in greater perceived risks. Thus, timely online service response during purchasing is conducive to improving purchasing conversion rates. Contributing to this line of work, we incorporate a perspective of consumers’ online browsing behavior. Although prior studies considered the browsing behavior of consumers (e.g., Bhatnagar et al., 2017), they have not considered the browsing-path dependence of related companies. Moreover, we explain the browsing behavior and the spell of the browsing behavior from the perspective of the heterogeneity of consumer groups.

In the field of personalized recommendation, current research usually studies how to improve recommendation accuracy from the perspective of algorithms (Chang & Jung, 2017; Gan et al., 2019). Among the biggest challenges for recommendation algorithms are data sparsity and the cold start problem (Bunnell et al., 2020). To address these challenges, we use information from heterogeneous consumer groups. In addition, the timing of recommendations is as important as the recommendation algorithm, since inappropriate recommendations are not conducive to purchase conversion. We, therefore, first identify the types of heterogeneous consumer groups that are suitable for personalized recommendation and those that are not, and then analyze the timing of recommendation. In doing so, we enrich the theoretical underpinnings of personalized recommendations, while also presenting a methodological contribution by incorporating duration analysis, which is used to analyze consumers’ purchasing behavior (Chintagunta & Haldar, 1998) but is less frequently adopted in information systems (IS) (Bhatnagar et al., 2017). Unlike the online browsing behavior of consumers, their purchasing behavior cannot reveal the real-time intentions needed to dynamically adapt the service response strategy of e-commerce platforms.

When it comes to the implications for managerial practice, our results provide a solid reference point for online marketplaces to optimize differentiated service response strategies. Specifically, our findings provide rich insights into consumer differentiation. For groups with a Gamma value of the hazard function greater than 1, as they do not have clear browsing goals, online marketplaces should focus on providing personalized recommendations near the end of the first spell to increase their conversion rates. For the other groups that have clear browsing goals, online marketplaces should implement fewer personalized recommendations to better utilize their computational resources.

5.3 Limitations and Future Research

Although we followed a structured research design, there are limitations that we need to acknowledge and that present opportunities for future research. In line with the literature, as well as to simplify our data processing and analyses, we aggregated consumers’ behavior to calendar days, which might result in coarser granularity in the subsequent analysis of consumer access time. To address this issue, we suggest that future research use multiple time windows, such as minutes or hours, for the data analysis. Additionally, limited by the model and calculation choices, we have only used two related companies to compare the fitting effect. In practice, however, consumers may compare more than two companies during their shopping journey. We, therefore, encourage future research to explore the topic with more combinations of products, companies, and categories. Furthermore, when analyzing consumer heterogeneity, we take the group as a unit. In practice, however, the behavior of individual consumers is heterogeneous. Therefore, we recommend that future research use individual consumers as a unit when analyzing the heterogeneity of browsing path dependence. Ideally, future research should explore whether a personalized recommendation strategy for heterogeneous consumer groups helps improve the revenue of a company, as well as the performance of companies in general in terms of other indicators. Finally, we adopted the product categorization of the focal e-commerce platform without further interference in order to enhance the practical implications of our findings. To further contribute to this line of work, future research could form categories based on the nature of products through, for instance, the product involvement concept (e.g., Drossos et al., 2013) or by distinguishing between hedonic and utilitarian ones (Lim & Ang, 2008).
The novel avenues for future research that stem from our work can be extended beyond the context of e-commerce platforms. For instance, online recommendations can be beneficial for e-government websites (e.g., Angelopoulos et al., 2010; Kitsios et al., 2009) and can further be applied to policymaking (e.g., Georgiadou et al., 2020). Moreover, our work could also be extended through research on the use of intimate personal data by e-commerce platforms (e.g., Angelopoulos et al., 2021), and has extensions related to cybersecurity threats (e.g., Janse et al., 2017; Ou et al., 2022). We, therefore, encourage future research to further explore these topical and timely areas, in line with the extant IS research agenda (Struijk et al., 2022).

6 Conclusion

In this paper, we examined how the sequential browsing behavior of consumers can enable targeted marketing strategies for e-commerce platforms, by using clickstream data. We deployed duration analysis to explore how path dependence can better explain consumers’ sequential browsing behavior in various product categories and characterized the sequential browsing behavior of heterogeneous consumer groups. In doing so, we demonstrated that sequential browsing path dependence can explain the behavior of consumers with high accuracy. Our findings provide nuanced expositions for strategic decision-making on e-commerce platforms and open up avenues for future research.