Introduction

The development of type 2 diabetes is multifactorial. Besides inherited traits and age, various modifiable risk factors have been identified. Among clinical risk factors, obesity is one of the strongest for type 2 diabetes. It has been suggested that excess body fat, especially visceral fat, is central to the pathogenesis of insulin resistance (Lee et al., 2018; Neeland et al., 2019). Prospective cohort studies have also found an abnormal blood lipid profile, such as low HDL-cholesterol and high triglycerides, to be a strong predictor of the development of type 2 diabetes (Després & Lemieux, 2006; Kruit et al., 2010; von Eckardstein & Widmann, 2014). For lifestyle behaviors, both intervention and observational studies have demonstrated that poor diet (Maghsoudi et al., 2016; Schulze et al., 2005), physical inactivity (Astrup, 2001; Aune et al., 2015), and smoking (Pan et al., 2015) may contribute to the risk of type 2 diabetes independent of weight change. Observational studies have also established that risk drinking is associated with a higher risk of type 2 diabetes (Knott et al., 2015). In addition, emerging lifestyle risk factors, such as excessive TV watching (Llavero-Valero et al., 2021; Patterson et al., 2018) and unhealthy sleep duration (Cappuccio et al., 2010), have potential as new type 2 diabetes prevention targets. After controlling for the aforementioned risk factors, low socioeconomic status, such as low education and insufficient income, has been found to be associated with a higher risk of type 2 diabetes (Foster et al., 2018; Maty et al., 2005; Vinke et al., 2020). We present a more extensive summary of evidence in Supplementary Table 1.

In diabetes research, conventional approaches for risk identification often apply traditional regression models, in which the net effects of risk factors are estimated under the assumption of an independent direct effect on diabetes status. However, some risk factors may act as mediators (e.g., obesity, blood lipids) or mainly exert indirect effects (e.g., education, income) (Bardenheier et al., 2013; Roman-Urrestarazu et al., 2016). The lack of insight into their holistic interrelationships has led to the fragmentation of evidence and development of unfocused prevention programs. More specifically, obesity and abnormal blood lipids are largely attributed to unhealthy lifestyle behaviors, whereas all are strongly influenced by socioeconomic status. These factors, in turn, collectively form several hypothesized intersecting pathways that lead to the eventual development of type 2 diabetes (Duan et al., 2021; Foster et al., 2018; Maty et al., 2005; Vinke et al., 2020; Zhu et al., 2021). Socioeconomic status is thus considered the overarching upstream determinant of type 2 diabetes for its significant effects on proximal (or downstream) risk factors. Likewise, lifestyle behaviors are the upstream determinants of clinical disorders such as obesity (Lakerveld & Mackenbach, 2017). In terms of primary prevention, it would be highly useful to understand the relatedness of a broad range of risk factors, so that aiming at prioritized risk factor targets and their most influential upstream determinants would optimize the effectiveness of diabetes prevention at population level.

To this end, we aimed to analyze a conceptual model, originally proposed by Bardenheier et al. for prevalent prediabetes (Bardenheier et al., 2013; Roman-Urrestarazu et al., 2016), that includes multiple modifiable risk factors and their interrelationships for type 2 diabetes (Fig. 1). We extended the original conceptual model with four important lifestyle behaviors, i.e., TV watching (Llavero-Valero et al., 2021; Patterson et al., 2018), smoking (Pan et al., 2015), sleep duration (Cappuccio et al., 2010), and risk drinking (Knott et al., 2015). We examined this model by structural equation modeling (SEM) using data from the Lifelines cohort study, focusing on incident type 2 diabetes as outcome. SEM is a multivariate statistical technique that allows the simultaneous quantification of multiple intersecting pathways (yielding path coefficients) within a conceptual model. Untangling the pathways of these risk factors may provide the additional evidence needed to develop better prevention strategies by identifying the most crucial pathways as priority prevention targets.

Fig. 1

Conceptual model illustrating pathways of risk factors to incident type 2 diabetes. MVPA denotes non-occupational moderate-to-vigorous physical activity; WC denotes waist circumference; and sleep denotes unhealthy sleep duration (versus healthy sleep duration). Straight line with one arrowhead denotes a direct effect (e.g., income to MVPA), and curved line with double arrowheads denotes a correlation term (e.g., triglycerides and HDL-cholesterol). For easy reading, several factors are repeated at different locations with different pathways depicted, but they do not differ from their identical others (e.g., education and income [socioeconomic status])

Methods

Study Design of the Lifelines Cohort Study

The Lifelines study is a multi-disciplinary prospective general population-based cohort study that applies a unique three-generation design to study the health and health-related behaviors of 167,729 people living in the north of The Netherlands. The Lifelines cohort was established between 2006 and 2013. Detailed information regarding the recruitment strategy and the representativeness of the Lifelines study population is provided in Supplementary Text 1 (Klijs et al., 2015; Scholtens et al., 2015).

Four assessment rounds have taken place: T1-baseline assessment (year 2007 to 2014) and three follow-ups, i.e., T2, T3, and T4. Comprehensive physical examinations, biobanking, and questionnaires were conducted at T1 and T4 (Supplementary Fig. 1). The Lifelines study was conducted according to the principles of the Declaration of Helsinki and was approved by the medical ethical committee of the University Medical Center Groningen, The Netherlands (approval number 2007/152). All participants gave written informed consent to participate in the study.

Study Population and Exclusion Criteria

In this study, participants between the ages of 35 and 80 years who were free of diabetes at baseline from the Lifelines cohort study were included. We further excluded participants if (1) they were diagnosed with cancer or renal failure before enrollment; (2) they were pregnant at baseline; (3) they developed type 1 diabetes or gestational diabetes during follow-ups; (4) they had no available follow-up data; and (5) they had unreliable dietary intake data. Dietary intake data was considered unreliable when the ratio between reported energy intake and basal metabolic rate, calculated with the Schofield equation (Schofield, 1985), was below 0.50 or above 2.75, based on the considerations of Goldberg (Black, 2000). Furthermore, except for physical activity and income, participants with missing data on other variables (missing less than 1%) were excluded. This led to an additional exclusion of 1.7% of the study population. In this study, multiple imputation was used to deal with missing data (Kline, 2015). This additional exclusion aimed to avoid massive imputation and was not expected to have major impacts on our results. After applying exclusion criteria, in total 68,649 participants (40,121 women and 28,528 men) were included in the analysis. Supplementary Fig. 2 shows the study flow chart.
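The energy-intake screen in exclusion criterion (5) can be sketched as follows. This is a minimal illustration and not the study's code: the function name and arguments are hypothetical, and the basal metabolic rate is taken as an input rather than computed, because the Schofield coefficients are not reproduced in this text.

```python
def is_reliable_intake(energy_intake_kcal: float, bmr_kcal: float,
                       lower: float = 0.50, upper: float = 2.75) -> bool:
    """Return True when reported energy intake / BMR lies within [lower, upper].

    Hypothetical helper: bmr_kcal is assumed to come from a separate
    Schofield-equation calculation that is not shown here.
    """
    if bmr_kcal <= 0:
        raise ValueError("BMR must be positive")
    ratio = energy_intake_kcal / bmr_kcal
    # Ratios below 0.50 or above 2.75 are treated as unreliable reporting.
    return lower <= ratio <= upper
```

For example, a reported intake of 600 kcal/day against a basal metabolic rate of 1500 kcal/day gives a ratio of 0.40 and would be flagged as unreliable.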

Clinical Measurements

Blood samples were collected by venipuncture in a fasting state between 8 and 10 am. Serum levels of glucose, HbA1c, HDL-cholesterol, and triglycerides were subsequently analyzed. Baseline measurements of blood pressure and anthropometry were made by trained research staff following standardized protocols. Anthropometric measurements were performed without shoes and heavy clothing. Participants were considered to have hypertension at baseline if they (1) used antihypertensive medication (ATC codes C02, C03, C07, C08, and C09) (WHO Collaborating Centre for Drug Statistics Methodology & Norwegian Institute of Public Health, 2020); (2) had systolic blood pressure ≥ 140 mmHg; or (3) had diastolic blood pressure ≥ 90 mmHg (Williams et al., 2018). Detailed information on clinical measurements is available in Supplementary Text 2.
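The baseline hypertension definition combines a medication criterion with two blood pressure thresholds. A minimal sketch is given below; the function and argument names are hypothetical, and matching medication by ATC code prefix is an assumption about how the listed ATC groups would be applied.

```python
# ATC groups listed in the text (matched here by code prefix, an assumption).
BP_LOWERING_ATC_PREFIXES = ("C02", "C03", "C07", "C08", "C09")


def has_hypertension(atc_codes, systolic_mmhg: float, diastolic_mmhg: float) -> bool:
    """Return True if any of the three baseline criteria is met."""
    on_medication = any(
        code.upper().startswith(BP_LOWERING_ATC_PREFIXES) for code in atc_codes
    )
    return on_medication or systolic_mmhg >= 140 or diastolic_mmhg >= 90
```

A participant with normal readings who uses a drug coded under C09 is therefore still classified as hypertensive.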

Assessment of Lifestyle and Socioeconomic Covariates

Age, education level, income level, smoking status, sleep duration, TV watching time, and physical activity level were assessed by self-administered questionnaires. Age at baseline was calculated from date of birth in the questionnaire. Highest education level achieved was categorized according to the International Standard Classification of Education (ISCED): (1) low—level 0, 1, or 2; (2) middle—level 3 or 4; and (3) high—level 5 or 6 (UNESCO, 1997). Income was based on monthly household net income and was categorized as < 1000, 1000–2000, 2000–3000, and > 3000 euro/month. Smoking status was categorized as never, former, and current smoker. Unhealthy sleep duration was defined as sleep time less than 6 or more than 9 h per day (Cappuccio et al., 2010). Average TV watching time per day was asked in hours plus minutes. Physical activity level was assessed by the validated Short QUestionnaire to ASsess Health-enhancing physical activity (SQUASH) (Wendel-Vos et al., 2003), from which non-occupational moderate-to-vigorous physical activity (MVPA), including commuting and sports (both if ≥ 4.0 MET), was calculated in minutes per week, and was further divided into sex-specific quartiles (if not zero) or coded to zero (Byambasukh et al., 2020; Wendel-Vos et al., 2003).
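Two of the categorizations above are simple threshold rules and can be written down directly. The sketch below is illustrative only; the function names are hypothetical, and the ISCED level is assumed to be available as an integer from 0 to 6.

```python
def education_category(isced_level: int) -> str:
    """Map an ISCED (1997) level to the low/middle/high grouping used in the text."""
    if isced_level in (0, 1, 2):
        return "low"
    if isced_level in (3, 4):
        return "middle"
    if isced_level in (5, 6):
        return "high"
    raise ValueError(f"unexpected ISCED level: {isced_level}")


def unhealthy_sleep(hours_per_day: float) -> bool:
    """Sleep of less than 6 h or more than 9 h per day counts as unhealthy."""
    return hours_per_day < 6 or hours_per_day > 9
```

Under these rules, exactly 6 h or exactly 9 h of sleep per day is classified as healthy.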

Dietary intake was assessed using a semi-quantitative self-administered food frequency questionnaire (FFQ), which was aimed to assess the habitual intake of 110 food items (including alcohol) during the last month and was designed based on the validated Dutch FFQ (Streppel et al., 2013). The questionnaire assessed the frequency of consumption and portion sizes. The latter was estimated using fixed portion sizes (e.g., slices of bread, pieces of fruit) and commonly used household measures (e.g., cups, spoons). The food-based Lifelines Diet Score (LLDS) was calculated to evaluate the diet quality of each participant. More specifically, this score ranks the relative intake of nine food groups with positive health effects (vegetables, fruit, whole grain products, legumes/nuts, fish, oils/soft margarines, unsweetened dairy, coffee, and tea) and three food groups with negative health effects (red/processed meat, butter/hard margarines, and sugar-sweetened beverages). The development of this score is described in detail elsewhere (Vinke et al., 2018). Risk drinking was defined as consuming more than 15 g of alcohol per day, which was approximated to one drink per day.

Ascertainment of Incident Type 2 Diabetes

Incident type 2 diabetes was assessed by self-report questionnaires (T2, T3, and T4) and a blood test (T4). Participants were considered an incident case if they met any of the following criteria: (1) self-reported newly developed type 2 diabetes in the last available questionnaire; (2) had fasting glucose ≥ 7.0 mmol/L; or (3) had HbA1c ≥ 48 mmol/mol (6.5%) (American Diabetes Association, 2020).
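The case definition is a logical OR over the three criteria. A minimal sketch follows, with hypothetical names; laboratory values are optional because blood tests were only available at T4.

```python
from typing import Optional


def is_incident_case(self_reported_t2d: bool,
                     fasting_glucose_mmol_l: Optional[float] = None,
                     hba1c_mmol_mol: Optional[float] = None) -> bool:
    """Return True if any criterion for incident type 2 diabetes is met."""
    glucose_hit = fasting_glucose_mmol_l is not None and fasting_glucose_mmol_l >= 7.0
    hba1c_hit = hba1c_mmol_mol is not None and hba1c_mmol_mol >= 48
    return bool(self_reported_t2d or glucose_hit or hba1c_hit)
```

A participant without a self-reported diagnosis but with an HbA1c of 50 mmol/mol at T4 is counted as an incident case.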

The Conceptual Model

Figure 1 illustrates the conceptual model that connects modifiable risk factors with incident type 2 diabetes and with each other, in which they are grouped into four different levels, i.e., socioeconomic status (education and income), lifestyle behaviors (diet quality [LLDS], non-occupational MVPA, smoking status, TV watching time, unhealthy sleep duration, and risk drinking), clinical markers (triglycerides, HDL-cholesterol, BMI, and waist circumference), and clinical outcomes (blood pressure and incident type 2 diabetes).

The original conceptual model was first proposed by Bardenheier et al. for prevalent prediabetes (Bardenheier et al., 2013; Roman-Urrestarazu et al., 2016). We extended the original model by adding four modifiable lifestyle behaviors (smoking, TV watching, risk drinking, and unhealthy sleep duration) and adapting several pathways based on previous evidence (Supplementary Table 1). Specifically, we hypothesized that (Fig. 1) (1) socioeconomic status had direct effects on lifestyle behaviors; (2) lifestyle behaviors had direct effects on clinical markers; (3) blood lipids (HDL-cholesterol and triglycerides) had direct effects on obesity status (BMI and waist circumference); (4) blood pressure had a direct effect on incident type 2 diabetes; and (5) clinical markers had direct effects on clinical outcomes. In the conceptual model, we also allowed direct effects from socioeconomic status and lifestyle behaviors on obesity status and clinical outcomes, because there might be unobserved mediators along the causal pathways. Furthermore, age and sex, as two strong non-modifiable risk factors for type 2 diabetes, were also included in the conceptual model and were hypothesized to have direct effects on all other factors. In total, the conceptual model yielded 96 hypothesized paths and 3 correlations between the measurement errors of variables.

Statistical Analysis

We used structural equation modeling (SEM) to examine our conceptual model (Fig. 1). SEM analysis is chiefly a confirmatory statistical technique to test whether the hypothesized model is correctly specified and supported by the observed data, rather than to generate new hypotheses (Kline, 2015). Because the hypothesized model contained ordered categorical variables (e.g., income), we used the weighted least squares with mean and variance adjustment (WLSMV) estimator (Muthén et al., 1997). WLSMV is suggested to be the most suitable estimator in SEM if the model tested contains multiple binary or ordered categorical endogenous variables (Muthén et al., 1997). Additionally, we estimated the associations between each included risk factor and incident type 2 diabetes using a logistic regression model as a conventional approach for risk identification.
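To illustrate how path coefficients are read: in a linear path model, the indirect effect along one chain of paths is the product of the coefficients on that chain, and the total effect is the direct effect plus the sum of all indirect effects. The sketch below uses made-up coefficients and hypothetical names; it is not output from the fitted model, and it ignores the latent-response scaling that applies to categorical outcomes under WLSMV.

```python
from math import prod


def indirect_effect(chain):
    """Product of the path coefficients along a single mediation chain."""
    return prod(chain)


def total_effect(direct, chains):
    """Direct effect plus the sum of the indirect effects over all chains."""
    return direct + sum(indirect_effect(chain) for chain in chains)


# Hypothetical standardized coefficients, for illustration only.
education_to_tv = -0.20      # education -> TV watching
tv_to_waist = 0.10           # TV watching -> waist circumference
waist_to_diabetes = 0.30     # waist circumference -> type 2 diabetes

via_tv_and_waist = indirect_effect([education_to_tv, tv_to_waist, waist_to_diabetes])
overall = total_effect(0.0, [[education_to_tv, tv_to_waist, waist_to_diabetes]])
```

With these illustrative numbers, the indirect effect of education through TV watching and waist circumference is small and negative, which is the kind of quantity a single regression coefficient does not expose.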

In order to improve and evaluate model fit, the following aspects were considered. First, we referred to the model fit indices calculated from the SEM output, i.e., comparative fit index (CFI), standardized root mean square residual (SRMR), root mean square error of approximation (RMSEA), and Tucker-Lewis index (TLI). We did not purely rely on the commonly used cut-offs of these fit indices as the absolute criteria (Xia & Yang, 2019). Additionally, we performed sensitivity analyses using other estimators to cross-check the model fit. Second, modification indices, which are based on chi-square statistics indicating the changes in model’s goodness-of-fit if an omitted path was added, were also used as reference for adjustments of particular paths (Kline, 2015).

Missing data for income (proportion missing 15.3%) and non-occupational MVPA (proportion missing 6.4%) were imputed with chained equations, creating 25 imputed datasets (Van Buuren et al., 1999), from which results were pooled according to Rubin’s rules (Li et al., 1991).
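For a single scalar estimate, pooling across imputed datasets under Rubin's rules takes the mean of the per-dataset estimates and combines within-imputation and between-imputation variance. The sketch below shows only that scalar case with hypothetical names; it does not reproduce how the semTools package pools SEM results or test statistics.

```python
from statistics import mean, variance


def pool_rubin(estimates, standard_errors):
    """Pool one parameter across m imputed datasets using Rubin's rules.

    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(standard_errors):
        raise ValueError("need at least two estimates, each with a standard error")
    pooled = mean(estimates)
    within = mean([se ** 2 for se in standard_errors])   # average sampling variance
    between = variance(estimates)                        # sample variance, m - 1 denominator
    total = within + (1 + 1 / m) * between
    return pooled, total ** 0.5
```

The factor (1 + 1/m) inflates the between-imputation variance to account for using a finite number of imputations, which here would be m = 25.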

In order to ensure the robustness of our results, we performed several sensitivity analyses. Detailed methods and results are discussed in Supplementary Text 3.

We used Stata (version 13.1) for data management and descriptive data analyses, and RStudio (version 1.1.383) with the lavaan package (version 0.6-5; Y. Rosseel) for SEM analysis (Rosseel, 2012). Multiple imputation was performed with the mice package (version 3.8.0; S. van Buuren et al.) in RStudio (Van Buuren & Groothuis-Oudshoorn, 2010), and results from imputed datasets were pooled with the semTools package (version 0.5-2; T.D. Jorgensen et al.) in RStudio (Jorgensen et al., 2019). A p value < 0.05 was considered statistically significant.

Results

Descriptive Statistics

Among 68,649 participants (aged 35–80 years) included in the analysis, we identified 1124 type 2 diabetes cases (incidence 1.6%) after a median follow-up of 41 months. Compared with participants who did not develop type 2 diabetes throughout the study, those who developed type 2 diabetes tended to be older and male, have less education and lower income at baseline, engage in negative lifestyle behaviors, and have poorer clinical markers (Table 1).

Table 1 Baseline characteristics by diabetes status

Structural Equation Model

The best-fit model (Fig. 2; CFI 0.981, TLI 0.949, RMSEA 0.032, SRMR 0.023) was achieved after we made adjustments to our original hypothesized model (Fig. 1; CFI 0.953, TLI 0.774, RMSEA 0.068, SRMR 0.039). The model fit indices of the best-fit model indicated that it was well supported by the observed data (cut-offs commonly considered for a good model fit: CFI > 0.90, TLI > 0.90, RMSEA < 0.080, and SRMR < 0.060). In brief, we dropped paths that did not yield significant estimates. Based on modification indices (mi), we further added two correlation paths, one between smoking status and risk drinking (mi = 2444.854) and one between non-occupational MVPA and LLDS (mi = 869.306). Additionally, several paths (e.g., TV watching to incident type 2 diabetes) were dropped because results from sensitivity analyses showed substantial changes in path coefficients, which suggested that these estimates were not robust. We present details of stepwise adjustments and reasons for changes in Supplementary Table 2.
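The comparison of the two models against the fit cut-offs can be checked mechanically. The sketch below is illustrative, with hypothetical names, and assumes the conventional thresholds of CFI > 0.90, TLI > 0.90, RMSEA < 0.08, and SRMR < 0.06; as noted in the Methods, such cut-offs were not treated as absolute criteria.

```python
def check_fit(cfi: float, tli: float, rmsea: float, srmr: float) -> dict:
    """Return, per index, whether the assumed conventional cut-off is met."""
    return {
        "CFI": cfi > 0.90,
        "TLI": tli > 0.90,
        "RMSEA": rmsea < 0.08,
        "SRMR": srmr < 0.06,
    }


# Fit indices as reported in the text.
hypothesized = check_fit(cfi=0.953, tli=0.774, rmsea=0.068, srmr=0.039)
best_fit = check_fit(cfi=0.981, tli=0.949, rmsea=0.032, srmr=0.023)
```

Under these assumed thresholds, the original hypothesized model falls short only on TLI, whereas the best-fit model meets all four.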

Fig. 2

Quantified best-fit conceptual model illustrating pathways of risk factors to incident type 2 diabetes. MVPA denotes non-occupational moderate-to-vigorous physical activity; WC denotes waist circumference; and sleep denotes unhealthy sleep duration (versus healthy sleep duration). Straight line with one arrowhead denotes a direct effect (e.g., income to MVPA), and straight or curved line with double arrowheads denotes a correlation term (e.g., triglycerides and HDL-cholesterol). For easy reading, several factors are repeated at different locations with different pathways depicted, but they do not differ from their identical others (e.g., education and income [socioeconomic status]). Sample size tested for the conceptual model, n = 68,649. Tests for significance: p value < 0.001 for all path coefficients except for HDL-cholesterol to blood pressure (p value = 0.002) and smoking to incident type 2 diabetes (p value = 0.012). Adjusted for sex and age

Figure 2 presents the best-fit hypothesized model with standardized path coefficients. Paths related to age and sex are not shown in Fig. 2 but are available in Supplementary Table 3. Among all modifiable risk factors included in the conceptual model (standardized β-coefficients are given in parentheses), waist circumference (0.214) had the strongest direct effect on type 2 diabetes, followed by HDL-cholesterol (− 0.134), triglycerides (0.096), income (− 0.074), blood pressure (0.055), diet quality (− 0.045), and smoking (0.035). Except for unhealthy sleep duration, education showed larger positive effects than income on all lifestyle behaviors. All included lifestyle behaviors were significantly associated with clinical markers, among which non-occupational MVPA, smoking, and TV watching yielded larger effect sizes. Risk drinking and smoking showed mixed effects on metabolic profiles. Almost all factors received strong direct effects from age and sex. In addition, correlations were found between BMI and waist circumference, between education and income, between triglycerides and HDL-cholesterol, between smoking status and risk drinking, and between diet quality and non-occupational MVPA.

For more information, please see Supplementary Table 3, which shows all standardized and unstandardized coefficients with standard errors for all paths.

Supplementary Table 4 shows the results of logistic regression model as a conventional approach for risk identification. The strongest effects were found for income group > 3000 euro/month (− 0.405), waist circumference (0.386), sex (women compared with men, 0.355), and HDL-cholesterol (− 0.339).

Sensitivity analyses yielded consistent results, indicating that our estimates are robust. Compared with the main analysis, some variations were found when incident type 2 diabetes was replaced by fasting glucose and HbA1c measured at T4. Detailed discussions of sensitivity analyses are presented in Supplementary Text 3.

Discussion

This study is the first that examined a broad range of key modifiable risk factors simultaneously in relation to incident type 2 diabetes using SEM. Our analysis quantified the complex pathways of these concomitant risk factors on the subsequent risk of developing type 2 diabetes, which provides valuable insights into the identification of priority prevention targets. Our results further extend knowledge of previous similar studies on prevalent prediabetes (Bardenheier et al., 2013) and prevalent type 2 diabetes (Roman-Urrestarazu et al., 2016) by incorporating four important lifestyle behavioral factors, i.e., smoking, TV watching, risk drinking, and unhealthy sleep duration.

Interrelationships of Risk Factors

There are several key findings. First, of the two obesity indicators examined, large waist circumference was found to have a strong direct effect on type 2 diabetes. Our results highlight the importance of waist management, in addition to BMI control, for diabetes prevention in both clinical practice and public health interventions (Lee et al., 2018; Neeland et al., 2019). Second, blood lipids, assessed as a higher level of HDL-cholesterol and a lower level of triglycerides, had critical direct effects on lowering diabetes risk. Additionally, healthier lifestyle behaviors, especially watching less TV and engaging in more non-occupational MVPA, indirectly and favorably affected diabetes risk through the mediation of clinical markers (i.e., blood lipids and obesity status), indicating their equal importance in diabetes prevention.

For socioeconomic status, our analysis dissected the differential effects between education and income, showing that low education, rather than insufficient income, is the major upstream determinant of unhealthy lifestyle behaviors. In the context of The Netherlands, where the level of income inequality is relatively low, the effect of lower income on lifestyle behaviors may not predominantly be due to less access to healthy lifestyle resources. Instead, it is suggested that self-perceived control, attitudes, and social norms towards adopting a healthier lifestyle are more restrained among those with lower education (Stronks et al., 1997). Programs promoting healthy lifestyle should be complemented by additional elements to help people with lower education (Ball et al., 2012; Van der Lucht & Polder, 2010).

It is noteworthy that we observed direct effects of education on obesity status, as well as of income, diet quality, and smoking on type 2 diabetes. A cautious interpretation is warranted, as it cannot be excluded that the observed direct effects are in fact due to other, but unobserved, existing mediators or confounders, such as neighborhood deprivation (distal environmental factors) and chronic inflammation (proximal clinical biomarkers) (Dekker et al., 2020; Kivimäki et al., 2018; Zhu et al., 2021).

Identification of Priority Prevention Targets

In terms of primary prevention, this simultaneous quantification of multiple risk factors and their intersecting pathways puts scattered evidence together and enables the identification of key upstream prevention targets for type 2 diabetes. Public health programs on these targets may have the potential to address as much of the broader risk profile as possible, particularly for those proximal clinical markers, for which pharmacological interventions may often be needed. Based on our results, (1) reducing large waist circumference may be prioritized as a main clinical target for diabetes prevention; (2) less TV watching time and more physical activity may be the main behavioral targets; and (3) better education may be the main societal target. Future studies are encouraged to examine the conceptual model in other populations.

It should be noted that the prevalence of type 2 diabetes at baseline in our population from the northern Netherlands (4.5%) is comparable to the average of upper-middle-income countries (5.6%), but lower than the average of high-income countries (7.9%) (Institute for Health Metrics and Evaluation, 2021). Regarding incidence, 1.6% of our study sample developed type 2 diabetes after a median follow-up of 41 months (230,259 person-years), which translates into an incidence rate of 4.9 per 1000 person-years. In the literature, we found a wide range of incidence across different countries and cohorts, ranging from 2.6 per 1000 person-years in the UK Biobank study (Levy et al., 2021) to 11.4 per 1000 person-years in the American Multi-Ethnic Study of Atherosclerosis (Joseph et al., 2016). Despite the differences in cohort design and methodology that preclude direct comparisons, the high prevalence and incidence of type 2 diabetes worldwide call for further research to curb this global pandemic, especially by adopting innovative approaches to strengthen the evidence base for the design of more effective public health programs (for detailed data, please see Supplementary Table 5).
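The incidence figures quoted above follow directly from the reported counts. A short worked check is given below; the function name is hypothetical, and the inputs are the case count, sample size, and person-years stated in the text.

```python
def incidence_rate_per_1000(cases: int, person_years: float) -> float:
    """Incidence rate expressed per 1000 person-years of follow-up."""
    return 1000 * cases / person_years


cases = 1124
participants = 68_649
person_years = 230_259

cumulative_incidence_pct = 100 * cases / participants
rate = incidence_rate_per_1000(cases, person_years)

print(round(cumulative_incidence_pct, 1))  # 1.6
print(round(rate, 1))                      # 4.9
```

The first figure is a proportion of participants, while the second accounts for the length of follow-up, which is why the two are not interchangeable when comparing cohorts.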

Strengths and Limitations

Conventional approaches for risk identification commonly estimate the total net effects of risk factors, but leave their interrelationships masked. We further illustrated this by comparing the results between using SEM and logistic regression model (Supplementary Table 4). More specifically, SEM clearly elucidated the extent to which education impacted on risk of type 2 diabetes through the mediation of lifestyle behaviors, while such information is unavailable in results from logistic regression models. Using SEM also avoids possible multiple testing of significance if each mediation pathway was modelled separately.

In our conceptual model, we did not develop latent variables as in previous similar studies (Bardenheier et al., 2013; Roman-Urrestarazu et al., 2016). Instead, we used single aggregate measures for diet and physical activity, and added a correlation term between income and education. For diet and physical activity, our selected indicators are evidence-based and easy to apply to evaluation at population level (Byambasukh et al., 2020; Vinke et al., 2018). In contrast, indicators for latent variables are usually selected arbitrarily for a specific study population, which may limit their generalizability. Nevertheless, we acknowledge that constructing a latent variable for lifestyle factors may help reduce measurement error. For socioeconomic status, we clearly illustrated that the effects of income and education differed along the pathways to type 2 diabetes.

Our study also has some limitations. Even though we constructed the model in a prospective setting, the hypothesized pathways from socioeconomic status to clinical biomarkers are still cross-sectional in nature, although the lifestyle questionnaires were collected before the clinical measurements, and socioeconomic status was unlikely to change throughout the study period. An alternative conceptual model is also possible, even if model fit indices and sensitivity analyses indicate that our final model was well supported by the observed data. In addition, as the Lifelines cohort mainly consists of local Dutch participants, it may not be possible to extrapolate our results to other populations. Another limitation of this study is that misclassification could occur in the ascertainment of type 2 diabetes cases, since only self-reported data were available at T2 and T3. We also do not have data on medication use during follow-up to validate self-reported diagnoses of type 2 diabetes. However, as most cases were identified by objective laboratory measurements at T4, this limitation is unlikely to have introduced severe bias in our results. A final concern is that we could not analyze the potential impact of loss to follow-up (23.2%) among eligible participants. Such attrition could affect our estimates, specifically for the pathways directly linked to type 2 diabetes status. Nonetheless, the baseline characteristics of those who had no follow-up data were comparable with those of the study population, except for some minor differences in education level (Supplementary Table 6). Simulation studies have shown that such attrition bias may have only a limited influence on estimates of associations in regression analysis (Howe et al., 2013; Peters et al., 2012).

Conclusions

This prospective study examined modifiable risk factors as a system in relation to incident type 2 diabetes through integrated pathways in a large population-based cohort. Quantifying the pathways of those modifiable risk factors using SEM may be a useful tool for the prioritization of prevention targets. Primary prevention strategies targeting proximal clinical risk factors should be complemented with public health initiatives that simultaneously address their corresponding upstream determinants. Regarding the current guideline for diabetes prevention, waist management in addition to BMI control (clinical level), as well as less TV watching in addition to more physical activity (behavioral level), may provide additional public health benefits. Better education would be the main societal goal for the prevention of type 2 diabetes.