PV power generation forecast technique for solar plants with missing data
<p class="p1"><span class="s1">Scientists in China have developed a novel missingness-aware power forecasting method that leverages signal decomposition, multi-scale covariate interaction, and multi-domain collaborative transfer learning. The approach reportedly improves average forecasting accuracy by 15.3%.</span></p><p>A research team led by China’s <a href="https://www.pv-magazine.com/2024/12/02/chinese-scientists-develop-photovoltaic-window-with-heat-flow-control/" rel="noopener" target="_blank">Hunan University</a> has developed a missingness-aware PV power forecasting method designed to handle missing and incomplete data.</p>
<p>The multi-domain collaborative transfer learning–multiscale covariate interaction (MDCTL-MCI) methodology combines signal decomposition, multi-scale covariate interaction, and multi-domain collaborative transfer learning.</p>
<p>“This study considers how covariate information can be effectively utilized to enhance predictive performance, and whether the inherent generalization capacity and robustness of deep learning algorithms can be leveraged to directly forecast solar irradiance in the presence of substantial missing input features, without performing additional imputation, and to conduct a thorough analysis of the various influencing factors and the underlying predictive mechanisms,” the group said.</p>
<p>To achieve this, the method first applies multivariate singular spectrum analysis (MSSA) to reduce noise and enhance data representation. Next, a lightweight MCI approach models the relationships among variables and extracts deep temporal patterns. In the third step, the MDCTL strategy improves model robustness under low-quality data conditions by integrating data from multiple PV sites. Finally, a Shapley additive explanations (SHAP) analysis identifies the key factors influencing forecasting performance.</p>
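<p>The first step of this pipeline, MSSA denoising, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the window length, the retained rank, and the synthetic two-channel signal are all assumptions made for illustration, and the sketch expects gap-free input, whereas the study works with missing covariates.</p>

```python
import numpy as np

def trajectory_matrix(x, window):
    """Hankel (trajectory) matrix of shape (window, n - window + 1)."""
    k = len(x) - window + 1
    return np.stack([x[i:i + k] for i in range(window)])

def diagonal_average(mat):
    """Average the anti-diagonals of a (window, k) matrix back into a series."""
    window, k = mat.shape
    n = window + k - 1
    total = np.zeros(n)
    counts = np.zeros(n)
    for i in range(window):
        total[i:i + k] += mat[i]   # mat[i, j] belongs to time index i + j
        counts[i:i + k] += 1
    return total / counts

def mssa_denoise(series, window, rank):
    """Low-rank MSSA reconstruction of an (n_steps, n_channels) array.

    Hypothetical helper: channel trajectory matrices are stacked side by
    side, truncated to `rank` singular components, then mapped back.
    """
    series = np.asarray(series, dtype=float)
    n, n_channels = series.shape
    k = n - window + 1
    stacked = np.hstack(
        [trajectory_matrix(series[:, c], window) for c in range(n_channels)]
    )
    u, s, vt = np.linalg.svd(stacked, full_matrices=False)
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
    return np.column_stack(
        [diagonal_average(low_rank[:, c * k:(c + 1) * k]) for c in range(n_channels)]
    )

# Synthetic example (assumed data): two noisy channels sharing a daily cycle
# of 48 half-hour steps.
rng = np.random.default_rng(0)
t = np.arange(200)
clean = np.column_stack([np.sin(2 * np.pi * t / 48), np.cos(2 * np.pi * t / 48)])
noisy = clean + 0.3 * rng.standard_normal(clean.shape)
denoised = mssa_denoise(noisy, window=48, rank=2)
```

<p>Keeping only the leading singular components retains the oscillation shared by both channels and discards most of the noise. Choosing the window and rank is a modeling decision that the article does not specify.</p>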
<p>The dataset used in the study consists of one year of continuous operational data from four solar PV stations in northern, central, and northwestern China, recorded at 30-minute intervals. These stations have rated output capacities ranging from 30 MW to 130 MW. According to the researchers, the dataset “exhibits significant data quality issues.” While PV power output data are relatively complete, covariates such as solar irradiance and weather conditions show missing rates ranging from 0% to 80% across different stations. The data were divided into training, validation, and test sets using a 6:1:1 ratio.</p>
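<p>As a rough illustration of the data preparation described above, the sketch below computes a per-covariate missing rate and the index ranges of a 6:1:1 split. The helper names are hypothetical, and the split is assumed to be chronological, which the article does not state.</p>

```python
import math

def missing_rate(values):
    """Fraction of entries that are None or NaN."""
    missing = sum(
        1 for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return missing / len(values)

def chronological_split(n_samples, ratios=(6, 1, 1)):
    """Index ranges for train/validation/test, kept in time order (assumed)."""
    total = sum(ratios)
    train_end = n_samples * ratios[0] // total
    val_end = n_samples * (ratios[0] + ratios[1]) // total
    return (
        range(0, train_end),
        range(train_end, val_end),
        range(val_end, n_samples),
    )

# One year at 30-minute resolution: 365 days x 48 samples per day.
samples_per_year = 365 * 48
train_idx, val_idx, test_idx = chronological_split(samples_per_year)

# Toy irradiance readings with gaps (made-up values).
ghi_example = [510.0, float("nan"), None, 620.0, float("nan")]
ghi_missing = missing_rate(ghi_example)
```

<p>With these assumptions, a year of half-hourly data yields 17,520 samples, of which three quarters go to training and one eighth each to validation and testing.</p>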
<figure class="wp-caption aligncenter" id="attachment_320491" style="width: 600px;"><img alt="" class="size-medium wp-image-320491" height="419" src="https://www.pv-magazine.com/wp-content/uploads/2025/10/1-s2.0-S0306261925015016-gr9_lrg-600x419.jpg" tabindex="0" width="600" /><figcaption class="wp-caption-text">Observed and predicted value curves <p><i>Image: Hunan University, Applied Energy, CC BY 4.0</i></p>
</figcaption></figure>
<p>“Given the critical role of covariate types in determining model accuracy, both Pearson correlation analysis (for linear relationships) and Spearman correlation analysis (for nonlinear relationships) are conducted on six variables,” explained the team. “Global horizontal irradiance (GHI), direct normal irradiance (DNI), and total solar irradiance (TSI), which show the strongest correlation with PV power output, are selected as input variables for subsequent experiments. To better understand the data distribution, marginal histograms are plotted to depict the relationship between each selected variable and PV power output.”</p>
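<p>The two correlation measures mentioned by the team can be sketched in plain Python. Spearman correlation is Pearson correlation computed on ranks, so it captures monotonic relationships that need not be linear. The function names and the toy values below are illustrative assumptions, not data from the study.</p>

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up example: output rises monotonically but not linearly with irradiance.
ghi = [0.0, 100.0, 200.0, 400.0, 800.0]
power = [(g / 100.0) ** 3 for g in ghi]
linear_corr = pearson(ghi, power)
rank_corr = spearman(ghi, power)
```

<p>In this toy case the rank correlation is essentially 1 while the Pearson coefficient is lower, which is why the two measures are often reported side by side. Records with missing values would need to be dropped pairwise before calling either function.</p>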
<p>The MDCTL-MCI model uses 48 historical time steps as input and performs multi-step forecasting for the next 48 time steps in a single forward pass. Its performance was compared with several state-of-the-art time series forecasting methods, including Pyraformer, Transformer, Informer, TimeXer, iTransformer, and PatchTST, as well as MLP-based models such as LightTS, TSMixer, and MCI.</p>
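<p>The 48-in, 48-out setup can be pictured as a sliding window over the time series. The sketch below is an assumption about how such samples might be built, with a hypothetical function name and a stride of one step. If the model runs at the dataset's 30-minute resolution, 48 steps would correspond to 24 hours of history and 24 hours of forecast.</p>

```python
def make_windows(series, input_len=48, horizon=48, stride=1):
    """Pair each block of `input_len` past steps with the next `horizon` steps."""
    samples = []
    last_start = len(series) - input_len - horizon
    for start in range(0, last_start + 1, stride):
        history = series[start:start + input_len]
        target = series[start + input_len:start + input_len + horizon]
        samples.append((history, target))
    return samples

# Toy series of 200 consecutive readings (made-up values).
toy_series = list(range(200))
windows = make_windows(toy_series)
first_history, first_target = windows[0]
```

<p>Each target holds all 48 future steps at once, which mirrors a direct multi-step forecast produced in one pass rather than a step-by-step recursive rollout.</p>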
<p>“Extensive experiments on four Chinese PV installations reveal that, compared to baseline methods, the proposed method improves average accuracy by 10.5% under complete data conditions and by 15.3% under various missing data scenarios,” the researchers said. “In summary, the MDCTL-MCI method proposed in this study effectively tackles the limitations of covariate underutilization and the instability and inaccuracy of forecasts under poor data quality conditions, which remain common in existing research. The proposed model establishes a solid foundation for the deployment of PV systems in complex environments and offers significant contributions to the development of PV technology.”</p>
<p>The new approach was described in “<a href="https://www.sciencedirect.com/science/article/pii/S0306261925015016" rel="noopener" target="_blank">Robust photovoltaic forecasting under severe data missingness via multi-domain collaboration and covariate interaction</a>,” published in <em>Applied Energy</em>. Scientists from China’s <a href="https://www.pv-magazine.com/2024/12/02/chinese-scientists-develop-photovoltaic-window-with-heat-flow-control/" rel="noopener" target="_blank">Hunan University</a>, <a href="https://www.pv-magazine.com/2025/09/01/using-drones-satellite-and-ground-data-to-map-vegetation-in-pv-plants/" rel="noopener" target="_blank">Zhejiang University</a>, Japan’s <a href="https://www.pv-magazine.com/2025/05/02/planting-turf-clover-beneath-solar-plants-can-increase-soil-organic-carbon/" rel="noopener" target="_blank">Kyushu University</a>, and Australia’s <a href="https://www.pv-magazine.com/2024/08/05/recovering-silver-from-pv-waste-via-green-graphene/" rel="noopener" target="_blank">James Cook University</a> contributed to the study.</p>