Michaël Rioux, Data Scientist, M.Sc.
Daniel Braun, Data Scientist, Ph.D.
Executive Summary
Our longitudinal analysis of wellness data over a three-year period (2022-2025) aimed to identify the drivers of wellbeing and BIOTONIX Health platform engagement. Using Linear Mixed-Effects Models, we studied the wellbeing-engagement relationships of 1,628 corporate users on four key health domains: Posture, Fitness, Mental, and PsychoSocial health. Our main analytical findings are reported in Table 1.
The analysis revealed that a user’s health state is highly stable; the strongest predictor of any future score was the score from the previous quarter, which underscores that meaningful change requires consistent, long-term effort. Beyond specific behaviors, we also uncovered a natural, albeit modest, upward trend in both Fitness and Mental health scores over time, suggesting that sustained participation in the wellness platform itself contributes to positive outcomes.
The most significant driver of positive change, above and beyond the general time effect, was consistent physical activity. A positive interaction between externally logged exercise and time was the only behavior associated with longitudinal improvements in both Fitness and Mental scores. Furthermore, physical activity creates a powerful feedback loop. Users who log external exercise are far more likely to engage with other platform features, like watching educational videos, and vice-versa, making exercise the key « gateway » to broader engagement.
Conversely, some interventions showed different patterns. The structured posture-correction (BTX) programs, while clinically validated, did not produce a measurable change in Posture scores over time, likely due to low adherence to the demanding protocol in a corporate setting. Similarly, users primarily turned to Meditation to manage bad moods, using it for short-term emotional regulation rather than as a tool for long-term mental improvement.
Overall, the primary strategic implication is to prioritize and foster the habit of consistent physical activity. This not only directly improves key health outcomes beyond the baseline improvement seen over time, but also drives a positive feedback loop of wider platform engagement.
| Finding | Supporting Evidence & Magnitude |
| --- | --- |
| 1. Health States are Highly Stable | A user’s previous score was the single strongest predictor across all domains (e.g., +22 points for Fitness, +14 for Mental). |
| 2. External Exercise is the Primary Driver of Improvement | The interaction of External Exercise and Time was the only behavior significantly associated with longitudinal gains in Fitness (+1 points/3 months) and Mental (+0.5 points/3 months) scores. |
| 3. Sustained Engagement Has Inherent Benefits (Time Effect) | A baseline increase over time was observed in Fitness (+0.66 points/3 months) and Mental (+0.28 points/3 months) scores, independent of specific activities. |
| 4. Platform Activities Create a Positive Feedback Loop | BTX Programs ➔ External Exercise (+13 sessions); Videos Watched ➔ External Exercise (+12 sessions); External Exercise ➔ BTX Programs (+3 sessions); Corporate League ➔ Videos Watched (+2 videos) |
| 5. Demanding Interventions Have Low Adherence | The clinically robust BTX Programs for posture showed no significant longitudinal improvement in Posture scores. Cross-sectional analysis revealed users were more likely to stop using time-consuming features like the BTX Program before observing quantifiable results. |
| 6. Mindfulness is Used Reactively for Mood Regulation | Bad Mood was a significant predictor of time-consuming activities like Meditation (+0.04 sessions). Conversely, engaging in more breathing exercises was associated with a 0.25-point increase in logging a bad mood. |
Table 1: Main Analytical Findings of the Longitudinal Study.
Introduction: The Wellbeing Feedback Loop
Our analysis was structured around two bidirectional questions:
- The Intervention Hypothesis: Does proactive engagement with platform activities drive measurable changes in users’ wellness scores over time?
- The Engagement Hypothesis: Do wellness scores and mood states predict how and why users will engage with the platform in the future?
Looking at both sides of this loop allows us to uncover what truly works to improve employee wellbeing and to identify the behavioral patterns that lead to sustained positive outcomes. To provide context for these findings, this section outlines the validated instruments used to measure wellbeing and the primary user activities tracked by the BIOTONIX Health platform.
The Wellbeing Scores (%)
- Posture Score: Derived from the BIOTONIX Posture system developed by Dr. Sylvain Guimond, an FDA-compliant clinical tool used in over 2 million assessments worldwide. The system, which has been used to optimize the performance of world-renowned athletes, provides a detailed analysis of postural deviations (BIOTONIX, 2023).
- Fitness Score: Based on the BIOTONIX Physiological Age assessment, a protocol also used by major fitness partners like Énergie Cardio, which evaluates key markers of physical health beyond chronological age.
- Mental Score: A composite score developed using Item Response Theory (IRT), a modern psychometric standard for accurately measuring latent traits (Embretson & Reise, 2000). These instruments have been cross-validated in a clinical setting against established tools such as the BDI-II, BAI, and IVA-2.
- PsychoSocial Score: Calculated from a proprietary BIOTONIX workspace happiness scale and the psychosocial risk factors as identified by the INSPQ: autonomy, balance, recognition and support.
Key Platform Activities (Counts)
- External Exercises: User-logged physical activities, such as walking or running.
- BTX Programs: Structured exercise programs, primarily for posture correction.
- Mindfulness: Guided breathing and meditation sessions.
- Mood Tracker: Daily journaling of mood and its causes.
- Videos Watched: Engagement with the platform’s educational content.
Methodology and Validation
The analysis covered a longitudinal dataset of 1,628 corporate users. Most participants contributed data for a single season — a pattern typical of workplace wellness programs. Because the data were sparse and uneven across time, we modeled individual trajectories with Linear Mixed-Effects Models (LMMs) (Pinheiro & Bates, 2000), which balance population-level and person-specific trends.
Data preparation followed a structured sequence. We first identified outliers with an Isolation Forest (Liu et al., 2008) and trimmed remaining extremes using interquartile range (IQR) rules to limit leverage (Kutner et al., 2005). Missing values were handled in passes: rule-based forward or zero fills where appropriate, then z-score standardization to place variables on a common scale before applying Multiple Imputation by Chained Equations (MICE) on the standardized data (van Buuren & Groothuis-Oudshoorn, 2011). Structural gaps (e.g., before a feature launch) were deliberately left un-imputed to avoid artificial completion (van Buuren, 2018).
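To make the trimming and standardization steps more concrete, the following minimal sketch applies Tukey-style IQR fences and then z-score standardization using only the Python standard library. It is an illustration under stated assumptions, not the study's pipeline: the 1.5 multiplier, the choice to drop (rather than cap) out-of-fence values, the function names, and the sample values are all hypothetical. The Isolation Forest and MICE steps are not reproduced here.

```python
import statistics


def iqr_fences(values, k=1.5):
    """Return (low, high) fences; k=1.5 is a conventional, assumed multiplier."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def trim_outliers(values, k=1.5):
    """Drop values outside the IQR fences (assumption: drop, not winsorize)."""
    low, high = iqr_fences(values, k)
    return [v for v in values if low <= v <= high]


def zscore(values):
    """Standardize to mean 0 and sample standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Made-up quarterly scores for one variable; 140.0 is an implausible extreme.
scores = [62.0, 65.0, 66.0, 68.0, 70.0, 71.0, 73.0, 140.0]
trimmed = trim_outliers(scores)
standardized = zscore(trimmed)
print(trimmed)
print([round(z, 2) for z in standardized])
```

Standardizing after trimming keeps a single extreme value from inflating the standard deviation that every other observation is scaled by.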
To guard against p-hacking or confirmation bias, the workflow split the data into exploratory, confirmatory, and test subsets (Wagenmakers et al., 2012). The exploratory phase produced FDR-corrected p-values that prioritized candidate predictors; these guided but did not constrain the confirmatory analysis. The confirmatory phase used an AIC-guided bidirectional stochastic beam search, which favors models that improve AIC while allowing probabilistic exploration to avoid local optima. Predictors with stronger exploratory support were sampled more often, and the algorithm explicitly considered the expected AIC behavior of models containing weaker candidates — a design that required holding a test set aside to verify whether confirmatory AIC gains generalized out of sample. We then applied model averaging to the best-performing models to reflect selection uncertainty (Burnham & Anderson, 2002; Symonds & Moussalli, 2011). Models with multicollinearity (VIF > 5.0) were excluded (Kutner et al., 2005).
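Two building blocks of this workflow can be sketched compactly: the Benjamini-Hochberg procedure, a common way to obtain FDR-adjusted p-values, and Akaike weights, which underlie AIC-based model averaging (Burnham & Anderson, 2002). The sketch below is a generic illustration, assuming Benjamini-Hochberg is the FDR method and that coefficients are averaged with simple Akaike weights; the function names and all numbers are hypothetical, and the stochastic beam search itself is not shown.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def akaike_weights(aic_values):
    """Convert AIC values into weights that sum to 1 (lower AIC, higher weight)."""
    best = min(aic_values)
    relative = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative)
    return [r / total for r in relative]


def model_average(estimates, weights):
    """Weighted average of one coefficient across candidate models."""
    return sum(e * w for e, w in zip(estimates, weights))


# Hypothetical exploratory p-values for four candidate predictors.
p_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])

# Hypothetical AICs and estimates of one coefficient from three models.
weights = akaike_weights([100.0, 102.0, 110.0])
averaged = model_average([1.3, 1.1, 0.4], weights)
print([round(p, 4) for p in p_adjusted])
print([round(w, 3) for w in weights], round(averaged, 3))
```

Because the weights decay exponentially with the AIC difference, a model ten AIC units behind the best one contributes almost nothing to the averaged estimate.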
Results are reported as statistically significant when p < 0.05 and as weak trends when 0.05 ≤ p < 0.10.
Interpreting the Fit Quality
To evaluate model performance, we used the two R² measures defined by Nakagawa & Schielzeth (2013). The Marginal R² captures how much variance is explained by fixed predictors such as activities or demographics, whereas the Conditional R² adds the influence of random effects that reflect individual differences.
The gap between them illustrates how much behavior depends on personal habits rather than observable factors. Two main patterns emerged:
- For health outcomes, a high Marginal R² confirmed that baseline health is the dominant predictor of a user’s future state, reflecting strong temporal stability.
- For platform engagement, a large gap between Marginal and Conditional R² showed that unmeasured personal habits are a primary driver of how users interact with the platform.
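For reference, the two measures discussed above can be written out for the simplest case, a Gaussian mixed model with a single random intercept per user, following Nakagawa & Schielzeth (2013). The symbols are our notation for this illustration: the variance of the fixed-effect predictions is written as sigma_f squared, the between-user (random intercept) variance as sigma_alpha squared, and the residual variance as sigma_epsilon squared.

```latex
\[
R^2_{\text{marginal}} = \frac{\sigma_f^2}{\sigma_f^2 + \sigma_\alpha^2 + \sigma_\varepsilon^2},
\qquad
R^2_{\text{conditional}} = \frac{\sigma_f^2 + \sigma_\alpha^2}{\sigma_f^2 + \sigma_\alpha^2 + \sigma_\varepsilon^2}
\]
```

Under these assumptions, the gap between the two values equals the share of total variance attributable to stable between-user differences, which is why a large gap points to unmeasured personal habits.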
We also saw common challenges in longitudinal modeling. Prediction errors were slightly skewed and heavy-tailed (high kurtosis): the models fit the stable majority of users well, while a small minority experiencing abrupt changes are less predictable (e.g., an injury that sharply lowers fitness, or a life event that suddenly affects mental health). This difficulty with infrequent, abrupt changes is also reflected in relatively high Root Mean Square Errors (RMSE), especially for posture in our dataset.
Note on Imputation Quality
To validate the MICE procedure, we compared variable distributions and covariance structures pre- and post-imputation, testing for significant shifts using the two-sample Kolmogorov-Smirnov test (Massey, 1951) and Wasserstein distance (Rubner et al., 2000). The diagnostics confirmed the imputation did not meaningfully alter the data’s underlying structure for most variables. The only exception was the recently introduced mood tracking feature, which, due to its sparse initial data, showed a moderate distributional shift that we deemed acceptable but flagged for consideration during model interpretation.
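Both diagnostics compare the empirical distribution of a variable before and after imputation. The sketch below computes the two-sample Kolmogorov-Smirnov statistic (the largest vertical gap between the two empirical CDFs) and the one-dimensional Wasserstein distance (the area between them) in plain Python. The function names and the toy samples are hypothetical, and the sketch returns only the statistics, not the p-value a full KS test would report.

```python
from bisect import bisect_right


def ecdf_at(sorted_sample, x):
    """Empirical CDF of a sorted sample evaluated at x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def ks_statistic(a, b):
    """Largest absolute gap between the two empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    grid = sorted(set(sa) | set(sb))
    return max(abs(ecdf_at(sa, x) - ecdf_at(sb, x)) for x in grid)


def wasserstein_1d(a, b):
    """Area between the two empirical CDFs (1-D Wasserstein-1 distance)."""
    sa, sb = sorted(a), sorted(b)
    grid = sorted(set(sa) | set(sb))
    total = 0.0
    # Both CDFs are step functions, constant between consecutive grid points.
    for left, right in zip(grid, grid[1:]):
        total += abs(ecdf_at(sa, left) - ecdf_at(sb, left)) * (right - left)
    return total


# Toy example: the "imputed" sample is the observed one shifted by one unit.
observed = [1.0, 2.0, 3.0, 4.0]
imputed = [2.0, 3.0, 4.0, 5.0]
ks = ks_statistic(observed, imputed)
w = wasserstein_1d(observed, imputed)
print(ks, w)
```

The two measures are complementary: the KS statistic is bounded between 0 and 1 and insensitive to how far values moved, while the Wasserstein distance is expressed in the variable's own units.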
Figure 1: Absolute Differences in Correlation Post-Imputation
The Intervention Hypothesis: What Drives Health Improvement?
The longitudinal analysis revealed two primary forces shaping health outcomes: a strong baseline stability where past health is the best predictor of future health, and the singular impact of consistent external exercise, which was the only behavior found to drive improvement over time.
Emergent Engagement Patterns
A preliminary cross-sectional analysis revealed that engagement in one activity was generally positively correlated with engagement in others. However, activities requiring a more significant time commitment, such as the structured BTX Programs and Meditation sessions, showed lower inter-correlation with other platform features. This suggests that users might selectively engage with less time-consuming activities. This highlighted the need for a longitudinal approach to determine which specific activities drive long-term wellbeing.
Health States are Inherently Stable
Across all four wellbeing domains, a user’s score in the previous quarter was the most significant predictor of their current score. This high degree of autocorrelation, a common feature in longitudinal health data, confirms that behavioral effects, while important, operate on top of a strong pre-existing baseline (Fitzmaurice, Laird, & Ware, 2011). The magnitudes of this baseline effect were:
- PsychoSocial Health: +33 points
- Fitness: +22 points
- Posture: +22 points
- Mental Health: +14 points
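The "previous quarter" predictor behind these magnitudes can be pictured as a lag-1 feature built per user. The following sketch shows one way to derive it from long-format records; the tuple layout, the integer quarter index, the rule of skipping non-consecutive quarters, and the sample data are assumptions made for illustration rather than a description of the study's data model.

```python
from collections import defaultdict


def add_lagged_score(records):
    """Pair each score with the same user's score from the previous quarter.

    records: iterable of (user_id, quarter_index, score) tuples.
    Returns rows of (user_id, quarter_index, previous_score, current_score).
    """
    by_user = defaultdict(list)
    for user_id, quarter, score in records:
        by_user[user_id].append((quarter, score))

    rows = []
    for user_id, series in by_user.items():
        series.sort()  # chronological order within each user
        for (q_prev, s_prev), (q_cur, s_cur) in zip(series, series[1:]):
            if q_cur - q_prev == 1:  # assumption: keep consecutive quarters only
                rows.append((user_id, q_cur, s_prev, s_cur))
    return rows


# Hypothetical records; user u1 has a gap between quarters 2 and 4.
records = [
    ("u1", 1, 60.0), ("u1", 2, 62.0), ("u1", 4, 65.0),
    ("u2", 1, 70.0), ("u2", 2, 71.0),
]
rows = add_lagged_score(records)
print(rows)
```

Users with a single season of data contribute no lagged rows under this rule, which is one reason sparse participation limits what a lagged model can estimate.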
Posture was a uniquely stable metric; a user’s score from the previous quarter was its only statistically significant predictor. This suggests posture reflects chronic adaptations that are resistant to short-term change (Sahrmann, 2002). This stability likely explains why the structured BTX posture programs failed to produce a measurable effect. We hypothesize that the 8-week protocol, while clinically robust, is too demanding for a corporate environment, leading to low adherence—a known challenge for digital health interventions (Eysenbach, 2005). Our future work will focus on re-imagining demanding interventions, like the posture program, into smaller, more manageable « micro-habits » to improve adherence, based on behavior design principles suggesting that simplicity is a key driver of action adoption (Fogg, 2009).
Selection Effects and Static Health Links
The data’s static associations primarily pointed to a ‘selection effect,’ where a user’s pre-existing condition predicts their platform engagement (Groenwold et al., 2009).
- Program Usage: Engagement with posture programs correlated with a 1.47-point lower Fitness score, indicating these foundational programs are attracting users with the greatest need. Similarly, this engagement was also linked to a small but significant 0.34-point decrease in PsychoSocial scores, which may indicate users with lower psychosocial well-being are drawn to more structured interventions.
- Demographics: Older age was associated with more BTX program usage (an increase of 0.27 sessions). There was also a weak trend for male users to have lower Fitness scores overall (a decrease of 0.82 points). This demographic is the most vulnerable to neck and back pain.
- Exercise and Mental Health: Higher initial engagement in exercise correlated with a 0.86-point lower Mental score, likely reflecting that users with lower mental wellbeing are self-selecting into accessible activities like walking.
- Total Activity and PsychoSocial Health: A higher number of total logged activities was a significant predictor of a higher PsychoSocial score (an increase of 0.50 points). Being in a corporate league was also associated with a 0.26-point increase in this score. Conversely, there was a weak trend suggesting that a higher number of total activities was associated with a small decrease in Mental scores (0.54 points decrease), further supporting the selection effect hypothesis that corporate users with a greater need may try a wider variety of interventions.
We also found several small but significant cross-domain associations:
- Better physical Posture was associated with a minor 0.35-point decrease in the PsychoSocial score.
- The relationship between Mental and PsychoSocial health was notably asymmetrical: while a higher PsychoSocial score predicted a 0.48-point increase in the Mental score, the inverse was negative—higher Mental scores were associated with a 0.23-point decrease in PsychoSocial scores. This suggests a unidirectional positive influence from psychosocial to mental wellbeing, a finding that aligns with established occupational health literature where workplace factors are known predictors of psychological health (Stansfeld & Candy, 2006).
- Higher Fitness scores were also associated with a 0.38-point increase in Mental scores, highlighting the influence of physical factors on mental health.
Alternatively, rather than a selection effect, it’s possible that individuals with better baseline health hold higher standards for workplace wellbeing and are therefore more sensitive in their reporting of psychosocial risks.
Consistent Exercise is the Primary Driver of Improvement Over Time
The only user behavior associated with a significant improvement in health scores over time was consistent external exercise. A significant positive interaction between exercise and time predicted a 1.32-point increase in Fitness scores and a smaller, but still significant, improvement in Mental Health. This finding was consistent with established research on the benefits of physical activity for both physical and mental health (e.g., Hillman, Erickson, & Kramer, 2008; Chekroud et al., 2018).
Beyond the impact of specific behaviors, the models revealed a significant « time effect, » where prolonged engagement improved both health outcomes and platform proficiency, independent of any specific logged activity. The impact of specific interventions, such as external exercise, should therefore be interpreted as an additional benefit on top of this natural upward trajectory.
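The distinction between the general time effect and the exercise-by-time interaction is easier to see in equation form. The expression below is a simplified, illustrative structure and not the fitted model: it assumes a single random intercept per user, writes the wellbeing score of user i in quarter t as y, and writes logged external exercise as E. The actual models included additional predictors selected through the procedure described in the methodology.

```latex
\[
y_{it} = \beta_0 + \beta_1\, y_{i,t-1} + \beta_2\, t + \beta_3\, E_{it}
       + \beta_4\,(E_{it} \times t) + \alpha_i + \varepsilon_{it},
\qquad
\alpha_i \sim \mathcal{N}(0, \sigma_\alpha^2), \quad
\varepsilon_{it} \sim \mathcal{N}(0, \sigma_\varepsilon^2)
\]
```

In this notation, the coefficient on t corresponds to the baseline time effect shared by all users, while the coefficient on the product term captures the additional per-quarter change associated with exercise, so a positive value means the trajectories of exercising users steepen over time.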
The specific gains were:
- Inherent Health Gains: Both Fitness (+0.66 points per quarter) and Mental (+0.28 points per quarter) scores showed a statistically significant increase over time.
- Increased Platform Mastery: The number of Challenges completed also increased with continued participation. Furthermore, users with higher overall wellness scores completed significantly more challenges (+0.12). This suggests that as users spend more time on the platform, they become more comfortable and proficient with its gamified features.
The Engagement Hypothesis: What Drives Platform Use?
To complement the health outcome analysis, we modeled the drivers of user activity on the platform. This revealed a powerful pattern of synergistic engagement.
A Virtuous Cycle of Engagement
A strong, reciprocal relationship was observed between physical activities and educational content, supporting the concept of health behavior clustering, where interventions targeting one behavior can spill over to positively affect others (Prochaska & Prochaska, 2011). Both in-platform activities (BTX Programs, Videos Watched) and off-program actions (External Exercises) create a virtuous cycle. For instance, engaging with BTX Programs and watching videos were highly predictive of logging more external exercises (increases of approximately 13 and 12 sessions, respectively). In turn, logging more external exercises was a powerful predictor of engaging more with BTX Programs (an increase of nearly 3 sessions).
Analysis reveals that watching educational videos acts as a central hub for platform use. This finding aligns with behavioral models like the Theory of Planned Behavior, where information influences attitudes and perceived control, which are antecedents to behavioral intention and action (Ajzen, 1991). Engagement with videos is significantly predicted by a wide range of other activities, including completing BTX Programs (an increase of 0.92 videos), completing challenges (an increase of 2.50 videos), logging external exercises (an increase of 1.20 videos), and even doing breathing exercises (an increase of 5.53 videos).
Social features (through gamification) can amplify this effect, though in nuanced ways. Participation in a corporate league was associated with watching nearly two more videos and showed a weak trend for logging more external exercise. However, it was also associated with engaging in significantly fewer BTX Programs (a decrease of 0.78 sessions), suggesting league participation may steer users toward more social or general fitness activities over corrective, individualized programs. Furthermore, the influence of these behaviors is not static; for example, the positive impact of external exercise on starting a BTX program significantly increases over time, while the impact of watching videos on the same outcome decreases over time.
Mood Dictates Proactive vs. Reactive Use
The data reveals a clear behavioral divide dictated by a user’s logged mood, which determined whether their engagement was for immediate relief or proactive growth.
When users reported negative feelings, their actions were consistent with in-the-moment coping. Logging a Bad Mood was a significant predictor of turning to Meditation (an increase of 0.04 sessions). This behavior aligns with research on mobile health interventions, which are frequently used reactively to manage acute emotional distress (Mohr et al., 2017). This reactive pattern was further substantiated by the finding that greater engagement in breathing exercises also predicted an increased likelihood of logging a bad mood (an increase of 0.25 points).
In contrast, positive moods spurred proactive, goal-oriented behaviors. Logging a Good Mood significantly predicted the completion of more platform Challenges (an increase of 0.25 challenges). This finding is consistent with Fredrickson’s « Broaden-and-Build » theory, which posits that positive emotional states broaden an individual’s cognitive and behavioral repertoires, encouraging exploration and skill-building (Fredrickson, 2001). This behavioral dichotomy was reinforced by a weak trend suggesting that a good mood correlated with less use of reactive or demanding interventions, such as Meditation (a decrease of 0.06 sessions) and the structured BTX Programs (a decrease of 0.67 sessions).
Distinct User Pathways for Mindfulness and Physical Activity
A strong synergy appeared in mindfulness practices, where engaging in Guided Breathing was the single strongest predictor of engaging in Meditation (an associated increase of about 1.35 additional sessions). This suggests users may treat these as a paired set of activities. Participation in a corporate league was also associated with a small but significant increase in meditation sessions.
However, the data also reveals a potential trade-off between contemplative and physically demanding activities. A weak negative trend was observed between engaging in Guided Breathing and using BTX Programs, a finding reinforced by the fact that higher engagement in BTX Programs significantly predicted a 0.29-session decrease in breathing exercises. This trade-off is further highlighted by the finding that more breathing exercises were associated with significantly fewer logged external exercises (a decrease of 3.93 sessions).
Interestingly, while meditation appears to be used reactively to bad moods, breathing exercises are associated with both positive and negative states. As with meditation, more breathing is a predictor of logging a bad mood. Yet, unlike meditation, logging a Good Mood was also associated with an increase of 0.15 breathing sessions. This suggests that users may prioritize either a mental/mindfulness track or a physical activity track within a given period, creating distinct user patterns based on their immediate goals.
Discussion and Strategic Implications
This three-year analysis provides several data-driven insights. The high stability of wellbeing scores, particularly Posture, underscores that interventions should be designed to foster gradual, consistent change. The most actionable finding is the significant positive interaction between external exercise and time in predicting improvements in both Fitness and Mental scores. Consistent physical activity appears to be the behavior most clearly associated with positive health improvement, while also being part of a synergistic loop that drives broader platform use.
More broadly, Fitness and Mental scores tended to improve as users remained engaged with the platform over time, independent of specific logged activities. This suggests the platform may foster a general increase in health awareness, or a « Hawthorne effect » from being part of a wellness initiative, in which subjects modify their behavior in response to their awareness of being observed (McCarney et al., 2007), providing a rising tide that lifts all users. However, consistent external exercise was a significant additional driver of improvement, highlighting that while general engagement is beneficial, targeted action is what produces the most substantial and measurable gains.
Furthermore, the analysis reveals the nuanced and targeted role of different tools within the ecosystem. The use of meditation for in-the-moment mood regulation shows the platform is successfully providing users with tools for immediate emotional support, a critical component of psychosocial health, and aligns with a large body of evidence demonstrating the health benefits of mindfulness-based practices (Grossman et al., 2004). Similarly, the data on the demanding posture programs provides clear, actionable insights for continuous product evolution. By identifying an adherence challenge, the platform has generated the data needed to innovate, for instance by reframing the program into more manageable « micro-interventions. »
In conclusion, the BIOTONIX platform has proven to be a powerful and intelligent tool for corporate wellness. It establishes an environment of inherent improvement, directs users toward the most effective behaviors, provides targeted tools for specific needs, and generates the data required for its own continuous enhancement. It is a promising and demonstrably effective solution for improving employee wellbeing.
Bibliography
- Ajzen, I. (1991). The theory of planned behavior. Organizational Behavior and Human Decision Processes, 50(2), 179–211.
- Burnham, K. P., & Anderson, D. R. (2002). Model selection and multimodel inference: A practical information-theoretic approach (2nd ed.). Springer-Verlag.
- BIOTONIX (2023). Scientific Reviews and Documentation Related to the Products and Services of Biotonix, Inc. BIOTONIX Posture. Scientific Review.
- Chekroud, S. R., et al. (2018). Association between physical exercise and mental health in 1.2 million individuals in the USA between 2011 and 2015: a cross-sectional study. The Lancet Psychiatry.
- Deterding, S., Dixon, D., Khaled, R., & Nacke, L. (2011). From Game Design Elements to Gamefulness: Defining « Gamification ». Proceedings of the 15th International Academic MindTrek Conference, 9–15.
- Embretson, S. E., & Reise, S. P. (2013). Item Response Theory for Psychologists. Taylor and Francis.
- Ester, M., Kriegel, H. P., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD’96), 226–231. AAAI Press.
- Eysenbach, G. (2005). The Law of Attrition. Journal of Medical Internet Research, 7(1), e11.
- Fitzmaurice, G. M., Laird, N. M., & Ware, J. H. (2011). Applied Longitudinal Analysis (2nd ed.). Wiley.
- Fogg, B. J. (2009). A behavior model for persuasive design. Proceedings of the 4th International Conference on Persuasive Technology, 1–7.
- Fredrickson, B. L. (2001). The role of positive emotions in positive psychology: The broaden-and-build theory of positive emotions. American Psychologist, 56(3), 218–226.
- Groenwold, R. H., et al. (2009). Quantitative assessment of confounding by indication in observational studies. European Journal of Epidemiology.
- Grossman, P., et al. (2004). Mindfulness-based stress reduction and health benefits: A meta-analysis. Journal of Psychosomatic Research.
- Hillman, C. H., et al. (2008). Be smart, exercise your heart: exercise effects on brain and cognition. Nature Reviews Neuroscience.
- Jolliffe, I. T. (2002). Principal Component Analysis (2nd ed.). Springer.
- Kutner, M. H., et al. (2005). Applied linear statistical models (5th ed.). McGraw-Hill Irwin.
- Liu, F. T., Ting, K. M., & Zhou, Z. H. (2008). Isolation Forest. 2008 Eighth IEEE International Conference on Data Mining, 413–422.
- Massey, F. J., Jr. (1951). The Kolmogorov-Smirnov Test for Goodness of Fit. Journal of the American Statistical Association, 46(253), 68–78.
- McCarney, R., Warner, J., Iliffe, S., van Haselen, R., Griffin, M., & Fisher, P. (2007). The Hawthorne Effect: a randomised, controlled trial. BMC Medical Research Methodology, 7(1), 30.
- Mohr, D. C., Weingardt, K. R., Reddy, M., & Schueller, S. M. (2017). Three Problems With Current Digital Mental Health Research . . . and Three Things We Can Do About Them. Psychiatric Services, 68(5), 427–429.
- Nakagawa, S., & Schielzeth, H. (2013). A general and simple method for obtaining R² from generalized linear mixed-effects models. Methods in Ecology and Evolution.
- Pinheiro, J. C., & Bates, D. M. (2000). Mixed-effects models in S and S-PLUS. Springer.
- Prochaska, J. J., & Prochaska, J. O. (2011). A review of multiple health behavior change interventions for primary prevention. American Journal of Lifestyle Medicine.
- Rubner, Y., Tomasi, C., & Guibas, L. J. (2000). The earth mover’s distance as a metric for image retrieval. International Journal of Computer Vision, 40(2), 99–121.
- Sahrmann, S. A. (2002). Diagnosis and treatment of movement impairment syndromes. Mosby.
- Stansfeld, S., & Candy, B. (2006). Psychosocial work environment and mental health—a meta-analytic review. Scandinavian Journal of Work, Environment & Health, 32(6), 443–462.
- Symonds, M. R., & Moussalli, A. (2011). A brief guide to model selection, multimodel inference and model averaging in behavioural ecology using Akaike’s information criterion. Behavioral Ecology and Sociobiology.
- van Buuren, S. (2018). Flexible Imputation of Missing Data (2nd ed.). CRC Press.
- van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1–67.
- van der Maaten, L., & Hinton, G. (2008). Visualizing Data using t-SNE. Journal of Machine Learning Research, 9(11), 2579–2605.
- Wagenmakers, E.-J., et al. (2012). An agenda for purely confirmatory research. Perspectives on Psychological Science.
Annex A: Cohort Demographics and Participation Dynamics
This annex provides a descriptive overview of the cohort to contextualize the main analytical findings. The visualizations highlight the user base’s composition, seasonal activity fluctuations, and typical participation lifecycles.

Figure A1: Seasonal Trends in User Activity
A review of user participation reveals distinct seasonal patterns. As Figure A1 shows, the percentage of active users consistently drops during the summer months each year before rebounding in the autumn. This cycle appears for both league and non-league users, which suggests that external factors like summer vacations have a broad influence on platform engagement. While league members are consistently more active, they are not immune to this seasonal dip.

Figure A2: Demographic Composition by League Status
The cohort’s age and gender distribution reveals a noticeable selection effect related to league participation. Although the age distribution is similar across both groups, Figure A2 shows a visibly higher proportion of female participants in the « In League » cohort. This likely reflects the demographics of corporate partners who enroll employees in league activities and tend to have more gender-diverse teams. Across all seasons, the age distribution for both genders and league statuses remains stable.

Figure A3: Participant Entry

Figure A4: Participant Study Duration
The study used a rolling enrollment, with participants joining throughout the three-year period. Figure A3 demonstrates that this entry was not uniform, as fewer participants joined during the summer seasons. The largest influx of new users occurred in the winter of 2023-2024.
Once enrolled, a clear pattern emerges in participation length. As illustrated in Figure A4, an overwhelming majority of users are active for a single three-month season, a duration that aligns with standard corporate wellness programs (8 weeks). The number of users who remain engaged for two or more consecutive seasons drops off sharply, a typical pattern that highlights the challenge of long-term retention.
Annex B: Engagement Dynamics for Key Platform Activities
This annex details engagement trends for specific platform activities, stratified by league participation. These visualizations show how different user segments interact with the platform’s tools, revealing distinct patterns tied to gamification, seasonal initiatives, and feature adoption.

Figure B1: Assessment Count
The frequency of health assessments points to two distinct user behaviors. A segment of users, shown in Figure B1, engages in a frequent evaluation-re-evaluation loop, completing six or more assessments in a quarter. In contrast, another segment does not update their scores within a three-month period; these cases are treated as « constant outcomes » in the longitudinal models.
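One way to picture the « constant outcomes » treatment is a simple carry-forward of the last observed score into quarters without a new assessment. The sketch below assumes that interpretation; the function name, the use of `None` for a quarter without an assessment, and the example values are hypothetical and not taken from the study's pipeline.

```python
def carry_forward(quarterly_scores):
    """Fill quarters without a new assessment (None) with the last observed score.

    Assumption: a "constant outcome" means the previous score is kept unchanged.
    Quarters before the first assessment stay None.
    """
    filled = []
    last = None
    for score in quarterly_scores:
        if score is not None:
            last = score
        filled.append(last)
    return filled


# Hypothetical user: assessed in quarters 1 and 4 only.
filled = carry_forward([62.0, None, None, 68.0, None])
print(filled)
```

Under this assumption, a user with a constant outcome contributes no within-person change for that quarter.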

Figure B2: BTX Program Engagement
Engagement with the structured BTX Programs is highly sensitive to external initiatives. Figure B2 shows that participation peaks align with targeted corporate wellness campaigns, followed by a significant drop-off when users are not actively prompted. This pattern suggests the demanding nature of the programs requires consistent external encouragement to drive adherence.

Figure B3: Challenges and Gamification

Figure B4: External Exercise as a Motivator
The impact of gamification is clearly visible in challenge completion and logged physical exercise. League participants complete significantly more challenges than their non-league counterparts, indicating that the competitive scoring system is a powerful motivator (Figure B3). This finding is consistent with literature showing that gamification elements like points and leaderboards can significantly increase engagement (Deterding et al., 2011). A similar trend is observed for external exercise, where league members consistently log more activity (Figure B4). Both activities show seasonal peaks that often correspond with corporate wellness initiatives.

Figure B5: Mindfulness Meditation Drop

Figure B6: Mindfulness Breathing Adoption
The adoption of mindfulness tools reveals distinct lifecycles. Meditation usage (Figure B5) saw a significant spike upon the feature’s launch, followed by a gradual decline, suggesting novelty-driven adoption. In contrast, breathing exercises (Figure B6) were adopted more steadily and maintained more consistent engagement. For both features, league participants showed a higher propensity for engagement.
Note on Mood Tracker
The user-logged mood tracker was introduced late in the study period. The available data is therefore not yet sufficient to generate insightful longitudinal visualizations.
Annex C: Group-Level Trends in Wellbeing Scores
This annex provides a visual overview of average wellbeing scores for the cohort. While the primary analysis uses models to isolate individual-level change, these aggregated plots complement those findings by illustrating the overall trends and seasonal patterns that characterize the user base.

Figure C1: A General Trajectory of Improvement in Fitness Health

Figure C2: A General Trajectory of Improvement in Mental Health
In contrast to Posture scores (Figure C3), the group averages for both Fitness (Figure C1) and Mental (Figure C2) scores show a clear, albeit modest, upward trajectory over the study period. This observation aligns with the core model results, which identified a significant positive effect of time on these outcomes. The plots reflect this inherent benefit of sustained participation, which is then supplemented by the impact of specific behaviors like external exercise.

Figure C3: The Stability of Posture Scores
The average Posture score shows considerable quarter-to-quarter fluctuation but no clear directional trend over the three-year period, which may reflect the different workplace ergonomic needs of each of our partners. This visual evidence (Figure C3) supports the model’s finding that posture is a highly stable metric reflecting chronic adaptations. The absence of a discernible upward slope in the group average is consistent with the analysis, which found no measurable longitudinal effect from interventions like the BTX Programs.

Figure C4: PsychoSocial Scores and League Participation
The trend for PsychoSocial scores (Figure C4) highlights the distinct experience of users in a corporate league, who often report considerably higher average scores than non-league participants. The plot illustrates this baseline difference between the groups. The gap fluctuates over time without a clear, sustained upward or downward trajectory for the cohort, which may reflect the different workplace environment needs of each of our partners.
Annex D: Evolution of Cross-Sectional Correlation Structures
While the longitudinal models infer change over time, a cross-sectional analysis of correlations within each season provides a valuable snapshot of user behavior. The following analysis reveals a clear evolution from isolated activities in the early stages to a highly interconnected engagement pattern in later periods, illustrating how users develop more complex habits over time.
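As a rough illustration of this per-season approach, the sketch below computes pairwise Pearson correlations separately within each season. The record layout, the field names (`season`, `btx`, `videos`, `exercise`) and the toy values are assumptions made for illustration; they are not the study's data or code.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when a variable is constant in a season
    return cov / sqrt(var_x * var_y)


def seasonal_correlations(rows, variables):
    """Return {season: {(var_a, var_b): r}}, computed within each season only."""
    by_season = {}
    for row in rows:
        by_season.setdefault(row["season"], []).append(row)
    result = {}
    for season, group in by_season.items():
        pairs = {}
        for i, a in enumerate(variables):
            for b in variables[i + 1:]:
                pairs[(a, b)] = pearson([r[a] for r in group], [r[b] for r in group])
        result[season] = pairs
    return result


# Hypothetical toy records: one row per user per season (illustrative values only).
rows = [
    {"season": "Autumn 2023", "btx": 0, "videos": 1, "exercise": 2},
    {"season": "Autumn 2023", "btx": 2, "videos": 3, "exercise": 5},
    {"season": "Autumn 2023", "btx": 4, "videos": 5, "exercise": 9},
    {"season": "Winter 2025", "btx": 1, "videos": 4, "exercise": 3},
    {"season": "Winter 2025", "btx": 2, "videos": 2, "exercise": 6},
    {"season": "Winter 2025", "btx": 3, "videos": 3, "exercise": 9},
]
corr = seasonal_correlations(rows, ["btx", "videos", "exercise"])
for season, pairs in corr.items():
    print(season, pairs)
```

Each season's dictionary corresponds to one heatmap; comparing them across seasons shows how the correlation structure changes, without implying anything about within-user change.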

Figure D1: The Mid-Study Period, Emergence of a Core Engagement Loop
The Autumn 2023 season (Figure D1) marks a critical turning point. In contrast to the sparse correlations of earlier seasons, this period shows the clear emergence of a synergistic engagement pattern. The heatmap reveals a strong positive feedback loop: engagement with BTX Programs is significantly correlated with watching more Videos and logging more External Exercises. This is the first period where this foundational relationship becomes statistically visible.

Figure D2: The Late-Study Period, a Mature and Holistic Ecosystem
By the Winter 2025 season (Figure D2), the correlational structure has evolved into a much denser and more complex web, representing a mature ecosystem. The core physical activity loop remains strong but is now complemented by a fully integrated set of mindfulness and gamification habits. The heatmap shows new, strong correlations between Breathing, Meditation, and Challenge completion. This snapshot illustrates the behavior of a mature user base that engages holistically across the platform’s features.

Figure D3: The Next Steps of the Study, including Mood Tracking
The final period of the current study, Spring 2025 (Figure D3), provides the first clear snapshot of the newly introduced mood tracker. This allows for an initial cross-sectional analysis of the relationship between self-reported mood and the primary wellbeing scores. The clearest pattern is a negative correlation between the Cognitive score and logging a Bad Mood: users with higher Cognitive scores tend to report fewer bad moods.
Annex E: Principal Component Analysis of Cohort Structure
To complement the regression analysis and visually confirm the dataset’s correlational structure, we performed a Principal Component Analysis (PCA) on the imputed data (Jolliffe, 2002). This approach allows us to project the high-dimensional user data onto its primary axes of variance, revealing the dominant patterns in both health outcomes and user activities.
Figure E1: Cohort Overlap by League Status (PCA)
The projection of the principal components (Figure E1) shows no clear separation between league and non-league users, indicating that despite differences in mean activity levels identified by the regression models, the overall variance structure of the two groups is largely indistinguishable.
Figure E2: Inter-relationships Among Scores (PCA)

Figure E3: Inter-relationships Among Activities (PCA)
The biplots provide a visualization of the variable relationships identified in our main analysis. The near orthogonality of the Posture vector relative to the Fitness, Cognitive, and PsychoSocial vectors (Figure E2) indicates that Posture is largely uncorrelated with the other scores, a geometric confirmation of its status as a separate health domain. Furthermore, the activity biplot (Figure E3) reveals two distinct behavioral archetypes. The first is a « Physical Rehabilitation » axis, where BTX Programs and External Exercises are tightly coupled. The second is a « Mindful Engagement » axis, which groups mindfulness features (Breathing, Meditation) with gamification (Challenges) and content consumption (Videos). This clustering visually substantiates the synergistic feedback loops our regression models uncovered, showing how distinct user engagement patterns emerge from the platform’s ecosystem.
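To make the geometric reading of a biplot concrete, the following sketch runs a small PCA on standardized toy variables and compares the angles between their loading vectors on the first two components. It is a minimal illustration under stated assumptions: the three toy series are invented, and power iteration with deflation stands in for whatever eigen-solver the actual analysis used.

```python
from math import sqrt


def standardize(column):
    """Center a variable and scale it to unit sample standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]


def correlation_matrix(columns):
    """Correlation matrix of the variables (PCA on standardized data)."""
    z = [standardize(c) for c in columns]
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]


def top_eigenpair(matrix, iterations=500):
    """Power iteration: dominant eigenvalue and unit eigenvector of a symmetric matrix."""
    size = len(matrix)
    vec = [1.0 / (i + 1) for i in range(size)]  # arbitrary non-degenerate start
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(size)) for i in range(size)]
        norm = sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    value = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(size)) for i in range(size))
    return value, vec


def biplot_loadings(columns):
    """Coordinates of each variable on the first two principal components."""
    corr = correlation_matrix(columns)
    size = len(corr)
    val1, vec1 = top_eigenpair(corr)
    # Deflation removes the first component so the next dominant one can be found.
    deflated = [[corr[i][j] - val1 * vec1[i] * vec1[j] for j in range(size)] for i in range(size)]
    val2, vec2 = top_eigenpair(deflated)
    return [(vec1[k] * sqrt(val1), vec2[k] * sqrt(val2)) for k in range(size)]


def cosine(u, v):
    """Cosine of the angle between two loading vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


# Invented toy scores for eight users (not study data): the first two series
# move together, while the third is uncorrelated with both by construction.
fitness = [1, 2, 3, 4, 5, 6, 7, 8]
cognitive = [2, 1, 4, 3, 6, 5, 8, 7]
posture = [1, -1, -1, 1, 1, -1, -1, 1]

loadings = biplot_loadings([fitness, cognitive, posture])
cos_fitness_cognitive = cosine(loadings[0], loadings[1])
cos_fitness_posture = cosine(loadings[0], loadings[2])
print(cos_fitness_cognitive, cos_fitness_posture)
```

A cosine near 1 means two variables point the same way in the biplot, and a cosine near 0 means their vectors are close to orthogonal, which is the pattern described for Posture relative to the other scores.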
Annex F: Model Space Search and Selection
To identify the most accurate and parsimonious predictive models, this study employed an advanced search methodology designed to overcome the limitations of traditional automated approaches. Standard stepwise regression methods are « greedy », meaning they follow a single path of improvements and can easily get stuck in a « local optimum », a model that appears best at a given step but is not the best model overall. Our approach, combining a Stochastic Beam Search with a topological analysis of the model space, is explicitly designed to navigate this complexity and avoid such pitfalls.

Figure F1: The Cognitive Stochastic Beam Search Tree
Instead of following a single path, the algorithm maintains a « beam » of several of the best-performing models at each step, allowing it to explore multiple promising paths in parallel. Furthermore, the selection of which new predictors to test is probabilistic, enabling the search to occasionally explore less-obvious paths that may lead to a superior solution. The resulting tree structure for the Cognitive outcome (Figure F1) is a visualization of this parallel exploration. Each point represents a candidate model, plotted by its step in the search process and its performance via the AIC score (vertical axis, lower is better). The algorithm can effectively backtrack and jump between different model families. The red line highlights the winding trajectory the algorithm took to arrive at the final, best-performing model, a path that a simple, linear stepwise search would likely have missed.
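The search logic described above can be sketched in a few lines. This is a simplified illustration, not the study's implementation: the function `toy_aic`, its predictor names and gain values are hypothetical stand-ins for fitting a mixed-effects model and reading off its AIC, and the proposal rule (toggle one randomly drawn predictor) is an assumption about how the probabilistic step might work.

```python
import random


def stochastic_beam_search(predictors, score, beam_width=3, proposals=4, steps=12, seed=0):
    """Search predictor subsets, keeping the `beam_width` best models at each step.

    `score` maps a frozenset of predictor names to an AIC-like value (lower is better).
    Returns the best model found, its score, and every model scored along the way.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    start = frozenset()
    explored = {start: score(start)}
    beam = [start]

    def rank(model):
        # Ties are broken by predictor names so the ordering is deterministic.
        return (explored[model], sorted(model))

    for _step in range(steps):
        candidates = set()
        for model in beam:
            for _draw in range(proposals):
                # Probabilistic proposal: toggle one randomly drawn predictor
                # (added if absent, dropped if present).
                candidates.add(model ^ {rng.choice(predictors)})
        for model in candidates:
            if model not in explored:
                explored[model] = score(model)
        # Parents stay eligible, so the beam never gets worse from step to step.
        beam = sorted(set(beam) | candidates, key=rank)[:beam_width]

    best = min(explored, key=rank)
    return best, explored[best], explored


def toy_aic(model):
    """Hypothetical stand-in for fitting a model and returning its AIC."""
    gain = {"lag_score": 40.0, "time": 12.0, "exercise": 8.0, "videos": 1.0, "meditation": 0.5}
    fit = sum(gain[p] for p in model)
    if {"time", "exercise"} <= model:
        fit += 6.0  # toy interaction bonus when both predictors are present
    return 320.0 - fit + 2 * len(model)  # 2 * k mimics the AIC complexity penalty


predictors = ["lag_score", "time", "exercise", "videos", "meditation"]
best, best_score, explored = stochastic_beam_search(predictors, toy_aic)
print(sorted(best), best_score, len(explored))
```

Plotting each entry of `explored` by the step at which it was scored and by its score would give a tree of the kind shown for the Cognitive outcome. Because several models survive each step, a branch that looks worse early on can still lead to the final model.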

Figure F2: Navigating the Cognitive Model Space Topology
We also visualized the explored model space as a 2D topological map using t-SNE (Figure F2), a non-linear dimensionality reduction technique well-suited for visualizing high-dimensional datasets (van der Maaten & Hinton, 2008). On this map, each circle represents a candidate model, and models with similar predictors and coefficient weights are placed close together. The background color corresponds to the AIC score, with the yellow « valleys » representing regions of better-fitting models.
This visualization confirms that the model space is a complex topography with multiple local optima. For the Cognitive outcome, the search identified two distinct « families » of solutions, clustered using the DBSCAN algorithm (Ester et al., 1996) with a custom hybrid distance that combines AIC and model weights. A standard greedy search could easily have been trapped in the first local optimum (best AIC ≈ 308). The plot clearly shows how the stochastic beam search was able to traverse the model space, escape this region, and discover the superior set of models in the valley of the second cluster, which contains the best model found (AIC ≈ 288). This process helps ensure that the final model selected for averaging and inference is drawn from the most robust and best-performing region of the explored model space.
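The clustering step can be illustrated with a minimal DBSCAN over a blended distance. The exact form of the custom hybrid distance is not given in this report, so `hybrid_distance` below (a weighted sum of the scaled AIC gap and the Euclidean distance between coefficient vectors), its parameters `aic_scale` and `alpha`, and the toy candidate models are all assumptions made for illustration.

```python
from math import sqrt


def hybrid_distance(a, b, aic_scale=10.0, alpha=0.5):
    """Assumed blend of AIC gap and distance between coefficient vectors."""
    aic_gap = abs(a["aic"] - b["aic"]) / aic_scale
    coef_gap = sqrt(sum((x - y) ** 2 for x, y in zip(a["coefs"], b["coefs"])))
    return alpha * aic_gap + (1 - alpha) * coef_gap


def dbscan(items, distance, eps, min_samples):
    """Minimal DBSCAN; returns one label per item, with -1 marking noise."""
    n = len(items)
    # Each point counts as its own neighbor (distance 0).
    neighbors = [[j for j in range(n) if distance(items[i], items[j]) <= eps] for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise point reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # only core points expand the cluster
    return labels


# Hypothetical candidate models: two families plus one isolated model.
models = [
    {"aic": 308.0, "coefs": [0.50, 0.00]},
    {"aic": 309.0, "coefs": [0.52, 0.01]},
    {"aic": 310.0, "coefs": [0.48, 0.02]},
    {"aic": 288.0, "coefs": [0.10, 0.40]},
    {"aic": 289.0, "coefs": [0.12, 0.38]},
    {"aic": 290.0, "coefs": [0.09, 0.41]},
    {"aic": 330.0, "coefs": [2.00, 2.00]},
]
labels = dbscan(models, hybrid_distance, eps=0.3, min_samples=2)
best_index = min(range(len(models)), key=lambda i: models[i]["aic"])
print(labels, labels[best_index])
```

With these toy values the two families separate cleanly and the isolated model is flagged as noise; the cluster containing the lowest-AIC model is the one that would feed into averaging and inference.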
