We formulate a batch reinforcement learning-based demand response approach to prevent distribution network constraint violations in unknown grids. We use fitted Q-iteration to compute a network-safe policy from historical measurements for thermostatically controlled load aggregations providing frequency regulation. We test our approach in a numerical case study based on real load profiles from Austin, TX, and compare its performance to a greedy, grid-aware approach and a standard, grid-agnostic approach. The average tracking root mean square error is 0.0932 for our approach, compared with 0.0600 and 0.0614 for the grid-aware and grid-agnostic implementations, respectively. The case study shows that our approach leads to a 95% reduction, on average, in the total number of rounds with at least one constraint violation when compared to the grid-agnostic approach. Working under limited information, our approach thus offers lower but acceptable setpoint tracking performance while ensuring safer distribution network operations.
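For readers unfamiliar with fitted Q-iteration, the following is a minimal sketch of the general technique on a toy problem, not the implementation used in this work. Everything in it is an assumption made for illustration: the one-dimensional state, the two actions in `STEP`, the quadratic penalty used as reward, the discount factor, and the bin-averaging regressor (the abstract does not specify any of these). The idea is to start from Q = 0 and repeatedly regress the targets r + gamma * max Q(s', a') on a fixed batch of recorded transitions, without any model of the system.

```python
import random

N_BINS, N_ACTIONS = 10, 2
STEP = {0: -0.1, 1: 0.1}  # hypothetical actions: move the state down or up


def to_bin(s):
    """Map a state in [-1, 1] to one of N_BINS equal-width bins."""
    return min(int((s + 1.0) / 2.0 * N_BINS), N_BINS - 1)


def fitted_q_iteration(batch, gamma=0.5, n_iter=30):
    """Batch Q-learning from transitions (s, a, r, s_next).

    The regressor here is a simple per-bin, per-action average of the
    targets (an assumption for this sketch; any regressor could be used).
    """
    q = [[0.0] * N_ACTIONS for _ in range(N_BINS)]  # Q_0 = 0
    for _ in range(n_iter):
        sums = [[0.0] * N_ACTIONS for _ in range(N_BINS)]
        counts = [[0] * N_ACTIONS for _ in range(N_BINS)]
        for s, a, r, s_next in batch:
            # Bellman target built from the previous iterate
            target = r + gamma * max(q[to_bin(s_next)])
            b = to_bin(s)
            sums[b][a] += target
            counts[b][a] += 1
        # Regression step: average the targets; keep the old value if no data
        q = [[sums[b][a] / counts[b][a] if counts[b][a] else q[b][a]
              for a in range(N_ACTIONS)] for b in range(N_BINS)]
    return q


def greedy(q, s):
    """Return the action with the largest estimated Q-value at state s."""
    row = q[to_bin(s)]
    return max(range(N_ACTIONS), key=lambda a: row[a])


# Toy "historical" batch: random states and actions, with a reward that
# penalizes distance from zero (a stand-in for staying away from limits).
rng = random.Random(0)
batch = []
for _ in range(2000):
    s = rng.uniform(-1.0, 1.0)
    a = rng.randrange(N_ACTIONS)
    s_next = max(-1.0, min(1.0, s + STEP[a]))
    batch.append((s, a, -s_next ** 2, s_next))

q = fitted_q_iteration(batch)
```

In this toy setting, `greedy(q, 0.85)` selects the downward action and `greedy(q, -0.85)` the upward one, so the learned policy steers the state away from both bounds using only the recorded transitions.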
Published October 2021, 14 pages
G2155.pdf (600 KB)