Optimizing solar-plus-storage operation for markets with imbalance penalties
<p class="p1"><span class="s1">Scientists in Japan have used a deep reinforcement learning–based AI model to calculate discrepancies between the planned and actual electricity supply volumes in PV-battery systems operating in markets where grid imbalances are penalized. Through a series of simulations, they found that the proposed methodology can reduce imbalance penalties by approximately 47%.</span></p><p>Researchers from Japan’s <a href="https://www.pv-magazine.com/2023/01/02/a-closer-look-at-potential-induced-degradation-in-solar-cells/" rel="noopener" target="_blank">University of Tsukuba</a> have developed a novel imbalance-aware control framework for photovoltaic battery storage systems (PV-BSS) that trade in day-ahead electricity markets with strict penalty mechanisms.</p>
In such markets, power producers and retailers are charged imbalance fees when their actual power supply or demand deviates from planned values.
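As a rough illustration of how such a settlement might work, the Python sketch below charges every kWh of deviation between the day-ahead plan and the energy actually delivered. The function and the two per-kWh penalty rates are hypothetical; the article does not reproduce the Japanese market's exact formula.

```python
def imbalance_fee(planned_kwh: float, actual_kwh: float,
                  shortage_price: float, surplus_price: float) -> float:
    """Hypothetical two-price imbalance settlement (not the paper's formula).

    Deviations between the scheduled and delivered energy are charged at a
    per-kWh rate that depends on the direction of the error.
    """
    deviation = actual_kwh - planned_kwh
    if deviation < 0:
        # Delivered less than planned: pay the shortage rate.
        return -deviation * shortage_price
    # Delivered more than planned: pay the surplus rate.
    return deviation * surplus_price
```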
“This research will contribute to a mechanism that improves profitability, avoids imbalance penalties, and provides a stable supply of renewable energy to the market,” the scientists said in a statement. “Furthermore, it may lay the foundation for a system that treats aggregated household power sources—such as storage batteries and electric vehicles—as a new power source, delivering societal benefits such as stabilized electricity prices and a reduced risk of power outages.”
At the core of the new system is a proximal policy optimization (PPO)-based deep reinforcement learning (DRL) framework that embeds imbalance penalties directly into its reward function. Although tailored to Japan’s market rules, the researchers note that the method can be adapted to other electricity markets. The proposed setup involves three key components: a renewable energy aggregator, a PV-BSS, and a wholesale market.
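A minimal sketch of such a reward, with hypothetical per-step quantities (the paper's exact terms and weighting are not given in the article), might look like this:

```python
def step_reward(revenue: float, planned_kwh: float, actual_kwh: float,
                penalty_price: float, penalty_weight: float = 1.0) -> float:
    """Per-step reward for the PPO agent with the imbalance penalty embedded.

    revenue        -- market revenue earned in this step
    penalty_price  -- per-kWh imbalance price forecast for this step
    penalty_weight -- hypothetical tuning coefficient, not from the paper
    """
    imbalance_kwh = abs(actual_kwh - planned_kwh)
    return revenue - penalty_weight * penalty_price * imbalance_kwh
```

Because the penalty enters the reward itself, the agent is pushed to bid schedules it can actually meet rather than maximizing revenue alone.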
<figure class="wp-caption aligncenter" id="attachment_320396" style="width: 600px;"><img alt="" class="size-medium wp-image-320396" height="268" src="https://www.pv-magazine.com/wp-content/uploads/2025/10/access-gagraphic-3615960-600x268.jpg" tabindex="0" width="600" /><figcaption class="wp-caption-text">Graphical abstract of the proposed approach <p><i> Image: University of Tsukuba, IEEE Access, CC BY 4.0 </i></p>
</figcaption></figure>
The framework first collects and forecasts PV generation for the following day using a lower–upper bound estimation (LUBE) model. A multi-layer perceptron (MLP) network then predicts electricity and imbalance prices based on weather data and PV forecasts. Using these predictions, the DRL model schedules the battery’s charge and discharge cycles to maximize revenue and minimize penalties. During real-time operation, model predictive control (MPC) acts as a safety layer, fine-tuning actions to account for short-term fluctuations. The final output is used to formulate the day-ahead market bid.
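The real-time correction step can be pictured as follows. This one-step sketch is a simplification of the receding-horizon MPC described in the paper, and all names and defaults here are assumptions apart from the 4 kWh battery and 4 kW inverter ratings reported below.

```python
def mpc_correct(planned_kwh: float, pv_actual_kwh: float, soc_kwh: float,
                capacity_kwh: float = 4.0, max_power_kw: float = 4.0,
                dt_h: float = 1.0) -> float:
    """One-step stand-in for the MPC safety layer (a simplification).

    Adjusts battery dispatch so that delivered energy tracks the day-ahead
    plan despite PV forecast error, within inverter and state-of-charge
    limits. Returns kWh discharged (+) or charged (-) this interval.
    """
    needed = planned_kwh - pv_actual_kwh              # shortfall (+) or surplus (-)
    limit = max_power_kw * dt_h                       # inverter energy limit per step
    discharge_room = min(limit, soc_kwh)              # cannot discharge below empty
    charge_room = min(limit, capacity_kwh - soc_kwh)  # cannot charge past full
    return max(-charge_room, min(needed, discharge_room))
```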
“Other computational methods can control the balance to some extent, but they cannot adequately reflect real-world uncertainties such as sudden weather changes and complex market dynamics,” the university explained. “The novel method optimizes the operation of solar power generation and battery storage systems while conforming to market rules. The method relies on deep reinforcement learning-based AI, which can handle problems involving uncertainty.”
The team tested the novel approach on a single household in Tsukuba City, using real data collected between April 2022 and March 2023. The setup included a 4 kWh battery and a 4 kW inverter. Data for PV and price forecasting were split into 70% for training, 15% for validation, and 15% for testing. Stable model training was achieved after about 5,000 episodes.
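For forecasting models like these, the split is typically chronological, so the test period lies strictly after the training data. A minimal sketch, assuming contiguous blocks were used (the exact boundaries are not stated in the article):

```python
def chronological_split(series, train_frac=0.70, val_frac=0.15):
    """Split a time series into contiguous train/validation/test blocks,
    mirroring the 70/15/15 proportions reported in the study."""
    n = len(series)
    i = int(n * train_frac)
    j = int(n * (train_frac + val_frac))
    return series[:i], series[i:j], series[j:]
```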
The researchers incorporated the imbalance penalty into the reward function and refined control using MPC. The proposed method achieved approximately 63% of the ideal net revenue while cutting imbalance penalties by 47% compared with the rule-based baseline and by 26% compared with a DRL model lacking imbalance awareness.
“The hybrid PPO+MPC strategy outperformed standalone PPO, reducing imbalance events from 140 to 99 and improving revenue stability,” the researchers concluded. “Moreover, seasonal evaluations confirmed stable profitability across diverse conditions.”
Their findings were presented in “Imbalance-Aware Scheduling for PV-Battery Storage Systems Using Deep Reinforcement Learning” (https://ieeexplore.ieee.org/document/11184820/authors), published in IEEE Access.