Publication details
Journal: Environmental Research, vol. 291, article 123558, 2026
DOI: doi.org/10.1016/j.envres.2025.123558
Archive: nva.sikt.no/registration/019b69e39445-471dd000-4962-4c5a-a70c-029e94819080
Abstract:
We evaluate the added value of integrating validated Low-Cost Sensor (LCS) data into a Machine Learning (ML) framework that provides surface PM2.5 estimates over Central Europe at 1 km spatial resolution. The synergistic ML-based S-MESH (Satellite and ML-based Estimation of Surface air quality at High resolution) approach is extended to incorporate LCS data through two strategies: using validated LCS data as a target variable (LCST), and using it as an input feature via an inverse distance weighted spatial convolution layer (LCSI). Both strategies are implemented within a stacked XGBoost model that ingests satellite-derived aerosol optical depth, meteorological variables, and CAMS (Copernicus Atmosphere Monitoring Service) regional forecasts. Model performance for 2021–2022 is evaluated against a baseline trained only on air quality monitoring station data, without any form of LCS integration. Our results indicate that the LCSI approach consistently outperforms both the baseline and LCST models, particularly in urban areas, with RMSE reductions of up to 15–20%. It also achieves higher accuracy than the CAMS regional interim reanalysis, with a lower annual mean absolute error (MAE) of 2.68 μg/m³ compared to 3.32 μg/m³. SHapley Additive exPlanations (SHAP) analysis indicates that LCSI information improves both spatial and temporal representativeness, with the LCSI strategy better capturing localized pollution dynamics. However, the LCSI strategy's dependence on the spatial LCS layer limits its ability to capture inter-urban pollution transport in regions with sparse or no LCS data. These findings highlight the value of large-scale sensor networks for filling spatial coverage gaps in official air quality monitoring networks and for advancing high-resolution air quality modeling.
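The abstract does not spell out how the inverse distance weighted (IDW) layer behind the LCSI strategy is built. As an illustration only, the sketch below shows the general IDW idea of turning scattered sensor readings into a gridded input feature. It assumes planar coordinates in km, a weighting power of 2 and a 10 km search radius, and it uses a hypothetical function name, `idw_layer`. None of these choices come from the paper, and the authors' layer may differ. Grid cells with no sensor inside the radius get `None`, meaning they carry no LCS signal. This is a simple way to picture the limitation the abstract describes for regions with sparse or no LCS data.

```python
import math
from typing import List, Optional, Sequence, Tuple


def idw_layer(
    grid_x: Sequence[float],
    grid_y: Sequence[float],
    sensor_xy: Sequence[Tuple[float, float]],
    sensor_pm25: Sequence[float],
    power: float = 2.0,            # assumed IDW exponent (not from the paper)
    max_radius_km: float = 10.0,   # assumed search radius (not from the paper)
) -> List[List[Optional[float]]]:
    """Hypothetical sketch: IDW-interpolate sensor PM2.5 onto grid cell centres.

    Returns one row per y value and one column per x value. Cells with no
    sensor within max_radius_km are None (no LCS information available).
    """
    layer: List[List[Optional[float]]] = []
    for y in grid_y:
        row: List[Optional[float]] = []
        for x in grid_x:
            weighted_sum = 0.0
            weight_total = 0.0
            exact: Optional[float] = None
            for (sx, sy), value in zip(sensor_xy, sensor_pm25):
                d = math.hypot(x - sx, y - sy)
                if d == 0.0:
                    # A sensor sits exactly on the cell centre: use its value directly.
                    exact = value
                    break
                if d <= max_radius_km:
                    w = 1.0 / d ** power  # closer sensors get larger weights
                    weighted_sum += w * value
                    weight_total += w
            if exact is not None:
                row.append(exact)
            elif weight_total > 0.0:
                row.append(weighted_sum / weight_total)
            else:
                row.append(None)  # no sensor within the search radius
        layer.append(row)
    return layer


if __name__ == "__main__":
    # Two hypothetical sensors 2 km apart, one grid row.
    print(idw_layer([0.0, 1.0, 50.0], [0.0], [(0.0, 0.0), (2.0, 0.0)], [10.0, 20.0]))
```

In this toy case, the cell midway between the two sensors receives their equal-weighted average. The cell 50 km away stays empty. In a model such as the one described, a layer like this would be passed to the regressor as one more input feature next to aerosol optical depth, meteorology and CAMS forecasts. How missing cells are encoded (for example, as NaN) is a further assumption.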