Introduction: High-income countries are undergoing significant demographic shifts, characterized by population decline and progressive aging. These transformations are associated with an increase in the prevalence of chronic diseases, which often coexist, worsening individuals’ quality of life and increasing healthcare costs. Identifying the factors that contribute to the onset of multimorbidity is particularly complex, as these factors often interact with each other and affect several diseases at once.
Objectives: This study aimed to identify the main risk factors for multimorbidity within a large UK cohort using a fully nonparametric ensemble method. This approach makes no assumptions about the underlying relationships between variables and allows high-dimensional data to be handled while guarding against overfitting.
Methods: We analyzed data from the UK Biobank cohort, which includes detailed information on socioeconomic status, lifestyle, anthropometric measures, and environmental exposures collected at recruitment, along with disease occurrence obtained through linkage with hospital admissions (primary and secondary diagnoses), death records, and cancer registries. Multimorbidity was defined as the presence of at least two chronic conditions from a list developed through an international consensus using a modified Delphi method [1]. To assess the role of 18 candidate variables in predicting the onset of multimorbidity over a five-year follow-up, we applied a random forest algorithm adapted for survival analysis within a competing risk framework [2], considering two competing events: the development of multimorbidity and death prior to its onset.
The candidate variables included: white British/Irish ethnicity (Yes/No), qualification level, average total household income before tax (adjusted for household size and categorized into quintiles), area-level index of multiple deprivation (deciles), body mass index (kg/m²), waist circumference (cm), pack-years of smoking, alcohol drinking (g/day), healthy diet score (ranging from 0 to 5, based on the intake of fruit, vegetables, fish, whole grains, and processed and red meat), walking (at least 10 min, number of times a week), moderate physical activity (at least 10 min, number of times a week), vigorous physical activity (at least 10 min, number of times a week), fine particulate matter (PM2.5) (µg/m³), PM2.5-10 (µg/m³), PM10 (µg/m³), NO2 (µg/m³), and average exposure to evening (7:00 pm – 11:00 pm) or night noise (11:00 pm – 7:00 am) (dB). Results were summarized using out-of-bag partial dependence plots and variable importance (VIMP) metrics.
Results: Of the 422,344 individuals included in the cohort, aged between 39 and 73 years, we selected 137,565 participants who were free from the conditions included in the definition of multimorbidity at the time of recruitment and for whom risk factor information was available. During the five-year follow-up, 4,384 individuals developed multimorbidity (2,740 males, 1,644 females). The five-year cumulative incidence was 3.9% in males and 2.6% in females. Among individuals who developed multimorbidity during follow-up, the main conditions observed were cancer (52.4% of males and 52.1% of females), arrhythmias (44.7% of males and 28.5% of females), and coronary artery disease (42.1% of males and 24.8% of females). Based on VIMP metrics, the strongest predictors in men were smoking, waist circumference, and sleep duration; in women, they were alcohol intake, smoking, and waist circumference.
Five-year cumulative incidence was higher for heavy smokers (sex-specific 95th percentile of pack-years) (males: 6.3%, females: 4.0%) compared to non-smokers (males: 3.5%, females: 2.4%); for individuals with elevated waist circumference (sex-specific 95th percentile) (males: 6.1%, females: 5.2%) versus those with median values (males: 3.9%, females: 2.6%); for heavy alcohol drinkers (sex-specific 95th percentile) (males: 4.6%, females: 4.0%) versus those with median intake (males: 3.8%, females: 2.4%); and for those sleeping 4 hours/day (males: 6.3%, females: 4.2%) or 10 hours/day (males: 6.5%, females: 4.5%) versus 7 hours/day (males: 3.7%, females: 2.5%). Diet, physical activity, and air pollution had smaller impacts.
Conclusions: Preventive interventions targeting smoking, abdominal obesity, and heavy alcohol consumption among middle-aged adults in the UK, and likely in other high-income countries, may substantially reduce the incidence of multimorbidity. Such interventions could improve the health trajectory and reduce the burden of disease of future older populations. In addition, promoting adequate sleep duration appears to be beneficial and should be integrated into public health recommendations.
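The five-year cumulative incidences reported in this abstract are estimated in a competing-risk setting, where death before the onset of multimorbidity removes a person from the risk set. As a minimal sketch of that idea only (it is not the authors' random survival forest), the nonparametric Aalen-Johansen estimator can be written in plain Python. The function name `cumulative_incidence`, the event coding (0 = censored, 1 = multimorbidity, 2 = death first), and the toy data are assumptions made for illustration and do not come from the study.

```python
from collections import Counter


def cumulative_incidence(times, events, cause, horizon):
    """Aalen-Johansen estimate of P(event of type `cause` by `horizon`).

    times  : follow-up time for each person (e.g. years since recruitment)
    events : 0 = censored, 1 = multimorbidity, 2 = death before multimorbidity
             (hypothetical coding chosen for this sketch)
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")

    # Tally how many outcomes of each type occur at every distinct time.
    by_time = {}
    for t, e in zip(times, events):
        by_time.setdefault(t, Counter())[e] += 1

    at_risk = len(times)  # people still under observation
    surv = 1.0            # probability of being free of any event before t
    cif = 0.0             # cumulative incidence of the cause of interest

    for t in sorted(by_time):
        if t > horizon:
            break
        counts = by_time[t]
        d_cause = counts[cause]
        d_any = sum(n for e, n in counts.items() if e != 0)
        # The increment uses event-free survival just *before* time t.
        cif += surv * d_cause / at_risk
        surv *= 1.0 - d_any / at_risk
        # Events and censorings at t both leave the risk set.
        at_risk -= sum(counts.values())
    return cif


# Toy data (invented): eight people followed for up to five years.
toy_times = [1, 2, 2, 3, 4, 5, 5, 5]
toy_events = [1, 2, 0, 1, 0, 1, 0, 0]
print(round(cumulative_incidence(toy_times, toy_events, cause=1, horizon=5), 3))
# prints 0.475
```

Treating deaths as ordinary censoring (one minus Kaplan-Meier) would overstate the incidence of multimorbidity, which is why the estimator multiplies each increment by the probability of being free of both events.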
Linia Patel, Silvia Mignozzi, Margherita Pizzato, Carlo La Vecchia, Gianfranco Alicandro (2025). A Random Forest Algorithm For Identifying Risk Factors For Multimorbidity In The UK Biobank Cohort. DOI: https://doi.org/10.54103/2282-0930/29299.
Type: Article
Year: 2025
Authors: 5
Datasets: 0
Total Files: 0
Language: en
DOI: https://doi.org/10.54103/2282-0930/29299