
Clustering of Adolescent Risk Profiles

Adolescence is the stage of life between childhood and adulthood. It is marked by significant physical growth, psychological development, and evolving social relationships. There are currently 1.86 billion young people aged 10 to 24 years worldwide (24 percent of the global population), the largest number ever recorded (1). Adolescents' right to health is established in the United Nations Convention on the Rights of the Child. Even so, adolescent health, development, and well-being have received little attention in global health and social policy (2,3). The key behavioral risks that impede population health and health-related human development are well understood (4–8). They include poor diet, consumption of sugar-sweetened beverages, poor hygiene practices, low physical activity, substance use, excessive alcohol consumption, smoking, and violence (9). These behaviors drive the major biological and physical risk factors for ill health, including hypertension, diabetes, infections, and injuries. Biological and physical risk factors are often causally linked to each other. Behavioral risk factors, however, do not necessarily cluster in a given population. Identifying behavioral health risk profiles in different populations is therefore valuable. Youth behavioral health risk profiles could serve both as 'sensors' for adult illness and death and as starting points for designing and targeting public health and behavior change initiatives (10,11).

Problem definition

Youth are an especially important age group for risk factor analyses because many risk and protective behaviors form in youth and become lifelong habits (12). In most countries and communities, behavioral health risk profiles are not known, either for youth or for other age groups. The small existing literature on this topic is not based on rigorous statistical learning and data mining approaches (13).

Related work

Multiple risk behaviors means engaging in two or more modifiable risk behaviors, such as smoking, excessive alcohol use, and poor diet. These behaviors often begin in adolescence and can persist as habits into adulthood, increasing the risk of comorbidities and early mortality. Adolescents who engage in one risk behavior are more likely to engage in others (14–16). This holds true both for substances, such as tobacco, alcohol, and illicit drugs, and for behaviors such as sexual risk-taking, self-harm, and antisocial behavior (16). According to the World Health Organization (WHO), behaviors that begin in adolescence are associated with two-thirds of premature deaths in adults. WHO also reports that 81 percent of youth aged 11–17 years are insufficiently physically active; the corresponding figure for heavy drinking is 11.7% (17). Noncommunicable diseases (NCDs) account for more than 70% of all deaths worldwide and are becoming more prevalent among youth (18). Most NCDs share predisposing risk factors. In patients with diabetes and cancer, for example, these include concurrent exposure to poor diet, physical inactivity, and harmful use of cigarettes and/or alcohol (17). These risk factors rarely occur in isolation. Instead, they cluster and combine to compound health risks, including the risk of NCDs (19).
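The sketch below makes "multiple risk behaviors" and their co-occurrence concrete. It counts behaviors per adolescent. It also computes an observed-to-expected (O/E) ratio for each pair of behaviors, a common descriptive measure of behavioral clustering. The ratio is the joint prevalence divided by the product of the single prevalences. Values above 1 suggest that two behaviors co-occur more often than they would if independent. The data, behavior names, and function names are hypothetical toy values, not GSHS variables. This is an illustration only, not this project's method.

```python
from itertools import combinations

# Hypothetical toy data: 1 = the adolescent engages in the behavior, 0 = not.
# Behavior names are illustrative only, not actual GSHS variable names.
STUDENTS = [
    {"smoking": 1, "alcohol": 1, "poor_diet": 1},
    {"smoking": 0, "alcohol": 1, "poor_diet": 1},
    {"smoking": 0, "alcohol": 0, "poor_diet": 1},
    {"smoking": 1, "alcohol": 1, "poor_diet": 0},
    {"smoking": 0, "alcohol": 0, "poor_diet": 0},
]


def share_with_multiple_risks(rows, threshold=2):
    """Fraction of adolescents engaging in at least `threshold` risk behaviors."""
    return sum(sum(r.values()) >= threshold for r in rows) / len(rows)


def observed_to_expected(rows, a, b):
    """Observed joint prevalence of behaviors a and b divided by the joint
    prevalence expected if they were independent (p_a * p_b)."""
    n = len(rows)
    p_a = sum(r[a] for r in rows) / n
    p_b = sum(r[b] for r in rows) / n
    p_ab = sum(r[a] * r[b] for r in rows) / n
    if p_a == 0 or p_b == 0:
        raise ValueError("O/E ratio is undefined when a behavior never occurs")
    return p_ab / (p_a * p_b)


if __name__ == "__main__":
    print("share with >= 2 behaviors:", share_with_multiple_risks(STUDENTS))
    behaviors = list(STUDENTS[0])
    for a, b in combinations(behaviors, 2):
        print(f"O/E({a}, {b}) = {observed_to_expected(STUDENTS, a, b):.2f}")
```

On this toy data, 3 of 5 adolescents (0.6) report two or more behaviors. Smoking and alcohol have an O/E ratio of about 1.67, so they co-occur more often than independence would predict. Smoking and poor diet have a ratio of about 0.83.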


Our overarching research goal is to identify behavioral health risk profiles in many countries and communities worldwide and to eventually translate this information into effective interventions for behavior change. We have designed the following three specific research aims for this project: to (i) identify behavioral health risk profiles (‘sensors’), (ii) measure the prevalence of each profile, and (iii) quantify the risk factors characterizing each profile.


To our knowledge, our research is the first to use analytic techniques to ‘discover’ clustering patterns of ten major lifestyle risk behaviors among school-aged adolescents. These are dietary behavior, hygiene, injury, mental health, smoking, drinking, drugs, sexual behavior, physical inactivity, and protective factors. Previous studies using this data set have typically identified clusters with latent class analysis (20–23), logistic regression (24), or other methods. These studies focused on a limited set of risk variables and countries. They showed that such methods improved our understanding of youth risk clusters. With our approach, we take the next step by exploring the usefulness of a selection of machine learning algorithms. The resulting individual risk profiles could serve as powerful segments for social marketing of healthy lifestyles. They could also inform other prevention and public health campaigns and interventions.
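The three research aims can be illustrated on toy data once each adolescent is encoded as a vector of 0/1 risk-behavior indicators. The sketch below uses k-means (Lloyd's algorithm) in plain Python as one example of a clustering algorithm. It is an assumption for illustration, not the algorithm or pipeline used in this project. Methods designed for binary data, such as latent class analysis or k-modes, may suit survey indicators better. The data, the fixed k, the deterministic farthest-point seeding, and all function names are hypothetical choices. The sketch maps onto the aims as follows:

- Cluster labels define the profiles (aim i).
- Cluster shares give each profile's prevalence (aim ii).
- Each centroid is the mean of 0/1 indicators, so its entries are the within-profile prevalence of each behavior (aim iii).

```python
# Hypothetical toy data: each row is one adolescent's 0/1 indicators for
# (smoking, alcohol, poor_diet, inactivity); not real GSHS records.
POINTS = [
    (1, 1, 0, 0),
    (1, 1, 0, 0),
    (1, 1, 1, 0),
    (0, 0, 1, 1),
    (0, 0, 1, 1),
    (0, 0, 0, 1),
]


def sq_dist(p, c):
    """Squared Euclidean distance between a data point and a centroid."""
    return sum((x - y) ** 2 for x, y in zip(p, c))


def init_farthest(points, k):
    """Deterministic seeding: start from the first point, then repeatedly add
    the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; returns (cluster labels, centroids)."""
    centroids = init_farthest(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its member points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def profile_prevalence(labels, k):
    """Share of adolescents assigned to each profile (aim ii)."""
    return [labels.count(j) / len(labels) for j in range(k)]


if __name__ == "__main__":
    labels, centroids = kmeans(POINTS, k=2)
    print("labels:", labels)
    print("prevalence:", profile_prevalence(labels, 2))
    for j, c in enumerate(centroids):
        print(f"profile {j}:", [round(v, 2) for v in c])
```

Here the two toy profiles each cover half the sample. Profile 0 is characterized by smoking and alcohol (within-profile prevalence 1.0 each). Profile 1 is characterized by inactivity (1.0) and poor diet (about 0.67). With real data, k and the choice of algorithm would themselves need to be evaluated.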

Data source: Global School-based Student Health Survey

For our analyses, we used the publicly available Global School-based Student Health Survey (GSHS). The GSHS aims to assess and quantify risk and protective factors among students. Its content draws on the Youth Risk Behavior Survey of the Centers for Disease Control and Prevention (CDC), for which test-retest reliability has been established (25). Overall, the GSHS contains data from students in more than 100 countries on 10 major behavioral health risk factor categories. These include:

- tobacco use, alcohol use, and drug use;
- dietary behaviors, such as servings of fruits and vegetables, soft drinks, and fast food (26,27);
- hygiene, such as tooth brushing (28,29) and water, sanitation, handwashing, and hygiene rules (WASH) (30);
- physical activity and sexual risk behaviors;
- violence and injuries.

Outcomes so far

  • Dashboard (Power BI) presenting our clustering approach for all countries in the Global School-based Student Health Survey