A digital twin-based comparative reinforcement learning framework for personalized behavioral recommendation

Publication details

Journal: Frontiers in Artificial Intelligence, vol. 9, 1834771, 25 June 2026

DOI: doi.org/10.3389/frai.2026.1834771
Archive: hdl.handle.net/11250/5531855
Archive: nva.sikt.no/registration/019efdabe8ed-b6b98454-452a-48d7-bc2c-b3171f001a44

Abstract:
Promoting healthy lifestyle behaviors such as physical activity, sleep, diet quality, stress management, hydration, and other healthy habits requires adaptive systems that respond dynamically to changing behavioral and environmental conditions. However, the development and evaluation of personalized recommendation systems are hindered by fragmented observational data, privacy constraints, delayed feedback, and the ethical limitations of long-term human experimentation. To address these challenges, this study proposes a digital twin-driven reinforcement learning framework for generating personalized behavioral recommendations in a fully simulated and statistically validated environment. The framework formulates personalized behavioral recommendation as a stochastic Markov Decision Process (MDP) that incorporates adherence uncertainty, behavioral drift, environmental modulation, and engagement dynamics. Synthetic longitudinal behavioral trajectories are generated by a digital twin simulator that models demographic heterogeneity, lifestyle behaviors, contextual variables, and variability in policy adherence over time. The optimization objective is defined by a reward formulation that balances behavioral compliance gains against penalties for violating health and environmental constraints. The study implements several reinforcement learning (RL) paradigms under simulated conditions, including multi-armed bandits, tabular Q-learning, State-Action-Reward-State-Action (SARSA), temporal difference (TD) learning with function approximation, and a deep Q-network (DQN). The results demonstrate that richer state representations and context-dependent action dynamics are necessary for higher-capacity reinforcement learning models to consistently outperform simpler baselines.
Furthermore, this study provides a reproducible method for comparing learning dynamics, performance, and computational cost in digital twin-based recommender systems. The framework additionally supports privacy-preserving experimentation through the exclusive use of synthetic behavioral data and locally controlled simulation environments.
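
Illustrative sketch (not part of the published record): the abstract describes a stochastic MDP with adherence uncertainty and behavioral drift, solved by, among others, tabular Q-learning. The Python below is a minimal toy version of that idea and does not reproduce the authors' simulator or reward. The state space, the constants ADHERENCE, GAIN, DRIFT, and PENALTY, and the function names step and q_learning are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical toy "digital twin": the state is a discretized activity level 0..4.
# Actions: 0 = no recommendation, 1 = gentle nudge, 2 = ambitious goal.
N_STATES, N_ACTIONS = 5, 3
ADHERENCE = [0.0, 0.8, 0.4]  # assumed probability that the simulated user follows each action
GAIN = [0, 1, 2]             # assumed state improvement when the user adheres
DRIFT = 0.2                  # assumed probability of behavioral drift (state drops by 1)
PENALTY = 0.5                # assumed penalty for an ambitious goal at a low activity level


def step(state, action, rng):
    """Sample one stochastic transition and reward from the toy simulator."""
    next_state = state
    if rng.random() < ADHERENCE[action]:   # adherence uncertainty
        next_state = min(N_STATES - 1, next_state + GAIN[action])
    if rng.random() < DRIFT:               # behavioral drift
        next_state = max(0, next_state - 1)
    reward = float(next_state - state)     # compliance gain
    if action == 2 and state <= 1:
        reward -= PENALTY                  # constraint-violation penalty
    return next_state, reward


def q_learning(episodes=500, horizon=30, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        state = rng.randrange(N_STATES)
        for _ in range(horizon):
            if rng.random() < epsilon:
                action = rng.randrange(N_ACTIONS)
            else:
                action = max(range(N_ACTIONS), key=lambda a: q[state][a])
            next_state, reward = step(state, action, rng)
            # Off-policy TD target: bootstrap from the best next action.
            td_target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (td_target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
greedy_policy = [max(range(N_ACTIONS), key=lambda a: q_table[s][a]) for s in range(N_STATES)]
```

Replacing max(q[next_state]) in the TD target with the value of the action actually selected in the next state would turn this update into SARSA, the on-policy counterpart also named in the abstract.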