IEMS Virtual Microgrid — Deep-RL Energy Agent

Held-out test set (Oct–Dec 2016). The DQN agent controls only the battery; the grid import/export is whatever is left over after the battery and solar meet the factory load.

How to read this dashboard

Pick a test day

Press ▶ Play to watch the agent run the battery through the day, minute by minute.
Live readouts: Price now · Agent decision · Battery SOC · Grid (agent) · LP optimum here

1 · The grid  — what the utility sees

Flatter and lower is better. The agent (blue) reshapes the raw no-battery profile (orange). The dashed LP optimum is the best any controller could do with perfect foresight — how close the agent hugs it is the whole story.

2 · The agent's decisions  — battery power & state of charge

Green bars above zero = charging (storing energy); below zero = discharging (releasing it). The white line is the battery's state of charge; the dashed line is what the LP would have done. Agent and LP filling/emptying at the same times = the agent learned the optimal timing.

3 · Context  — price, load and solar driving the decision

The agent watches the day-ahead price (red): notice the battery charges into the price dips and discharges into the peaks.

Whole test set — cumulative electricity cost

The widening gap between the two curves is the money the agent saves over the full Oct–Dec test period.

How the IEMS works — the whole system, end to end

Every number and equation below is taken directly from the project code (config.py, env.py, dqn.py, lp_benchmark.py). Nothing here is illustrative-only.

1 · What problem are we solving?

An Industrial Energy Management System (IEMS) decides, every 15 minutes, how a factory should use a big battery to cut its electricity bill. The factory has three energy sources/sinks tied together at one electrical bus: solar PV, the factory load and the utility grid.

The one thing we can control is a 1600 kWh battery (ESS) rated at 400 kW. Charging it when power is cheap and discharging it when power is expensive shifts the factory's demand in time and lowers the bill — that is the entire job of the agent.

2 · System architecture

Diagram: components connected at the AC BUS, with the balance P_grid = P_load − P_PV + P_battery
☀ Solar PV: ~131 kW peak · weather-driven · P_PV →
🏭 Factory load: fixed demand · must be served · → P_load
🔋 Battery (ESS): 1600 kWh · ±200 kW/step · η=0.95 · SOC 15–100% · P_battery (charge/discharge)
⚡ Utility grid: hourly price (can be <0) · buy / sell @ β=0.9 · P_grid (import/export)
🧠 DQN Agent (the IEMS): observes everything → sets battery power
The agent only commands the green battery flow. Everything else is exogenous (decided by weather, the factory, and the market). The grid flow is whatever is left over to keep the bus balanced — so by moving the battery, the agent indirectly controls the bill.
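A minimal sketch of one 15-minute step of this bus balance, using the figures shown in the diagram (1600 kWh, ±200 kW/step, η=0.95, SOC 15–100%, β=0.9). It is not the env.py code. The one-way efficiency on each direction, the reading of β as a multiplier on the price when exporting, and the names `step`, `DT_H` and the other constants are assumptions made for illustration.

```python
# Illustrative sketch only; constants come from the diagram, the physics is assumed.
DT_H = 0.25            # one 15-minute step, in hours
CAPACITY_KWH = 1600.0  # battery energy capacity
P_MAX_KW = 200.0       # per-step battery power limit
ETA = 0.95             # assumed one-way efficiency (charge and discharge)
SOC_MIN, SOC_MAX = 0.15, 1.00
BETA = 0.9             # assumed: export is paid BETA times the price


def step(soc, p_batt_kw, load_kw, pv_kw, price):
    """Advance one step. p_batt_kw > 0 charges, p_batt_kw < 0 discharges.

    Returns (new_soc, p_grid_kw, cost), where cost is in price units
    (price is taken to be per kWh).
    """
    # Clip the request to the power rating.
    p = max(-P_MAX_KW, min(P_MAX_KW, p_batt_kw))

    if p >= 0:
        # Charging: limited by the energy room left below SOC_MAX.
        room_kwh = (SOC_MAX - soc) * CAPACITY_KWH
        p = min(p, room_kwh / (ETA * DT_H))
        new_soc = soc + ETA * p * DT_H / CAPACITY_KWH
    else:
        # Discharging: limited by the energy stored above SOC_MIN.
        avail_kwh = (soc - SOC_MIN) * CAPACITY_KWH
        p = max(p, -avail_kwh * ETA / DT_H)
        new_soc = soc + p * DT_H / (ETA * CAPACITY_KWH)

    # Bus balance: the grid covers whatever is left over.
    p_grid = load_kw - pv_kw + p
    energy_kwh = p_grid * DT_H
    cost = price * energy_kwh if p_grid >= 0 else BETA * price * energy_kwh
    return new_soc, p_grid, cost
```

For example, `step(0.5, 200, 300, 100, 0.10)` charges at the full 200 kW, so the grid import rises from 200 kW to 400 kW for that step.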

3 · The physical model (exact equations from env.py)

4 · How Deep Reinforcement Learning works here

Reinforcement learning frames control as an agent interacting with an environment in a loop. There is no labelled "correct answer" — the agent learns purely from a reward signal by trial and error.

Loop diagram: AGENT (DQN), "given this state, which battery action?" → action aₜ (battery power) → ENVIRONMENT (microgrid: battery + grid + load + PV) → next state sₜ₊₁ + reward rₜ → back to the agent.
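The loop can be sketched with a toy environment. Everything here is hypothetical and much simpler than the real simulator: the class `ToyMicrogridEnv`, the three-level action set, the lossless battery, the constant load and the reward defined as the negative step cost are all assumptions for illustration.

```python
# Toy agent-environment loop; not the project's env.py.
ACTIONS_KW = (-200.0, 0.0, 200.0)  # hypothetical discrete battery actions


class ToyMicrogridEnv:
    """Constant load, no PV, lossless battery, one price per 15-minute step."""

    def __init__(self, prices, load_kw=100.0, dt_h=0.25, capacity_kwh=1600.0):
        self.prices = list(prices)
        self.load_kw = load_kw
        self.dt_h = dt_h
        self.capacity_kwh = capacity_kwh
        self.reset()

    def _state(self):
        # After the last step, keep reporting the final price.
        price = self.prices[min(self.t, len(self.prices) - 1)]
        return (self.t, self.energy_kwh / self.capacity_kwh, price)

    def reset(self):
        self.t = 0
        self.energy_kwh = 0.5 * self.capacity_kwh
        return self._state()

    def step(self, action_index):
        price = self.prices[self.t]
        requested_kw = ACTIONS_KW[action_index]
        # Keep stored energy inside [0, capacity], then recover the power actually used.
        new_energy = min(max(self.energy_kwh + requested_kw * self.dt_h, 0.0),
                         self.capacity_kwh)
        p_batt_kw = (new_energy - self.energy_kwh) / self.dt_h
        self.energy_kwh = new_energy
        p_grid_kw = self.load_kw + p_batt_kw
        reward = -price * p_grid_kw * self.dt_h  # reward = negative cost
        self.t += 1
        done = self.t >= len(self.prices)
        return self._state(), reward, done


def run_episode(env, policy):
    """Roll one episode: observe state, act, receive reward, repeat."""
    state = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(state)
        state, reward, done = env.step(action)
        total_reward += reward
    return total_reward
```

A hand-written rule such as `lambda s: 2 if s[2] < 0.2 else 0` (charge when cheap, discharge when expensive) already earns more reward than staying idle on a cheap-then-expensive price series; the DQN has to discover that kind of rule from the reward alone.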

The three ingredients, exactly as coded:

Over hundreds of thousands of steps the agent discovers a policy — a mapping from state to action — that earns the most reward, i.e. the cheapest, smoothest battery schedule. The strategy it converges on is price arbitrage: charge in the cheap hours, discharge in the expensive ones (watch it happen in the animation above).

5 · The DQN agent in detail

"DQN" = Deep Q-Network. A Q-value Q(s,a) estimates the total future reward of taking action a in state s. If we know Q accurately, the best policy is simply "pick the action with the highest Q." DQN learns Q with a neural network (from dqn.py):

For the headline result we train 5 independent agents (different random seeds) and report the average, because a single RL run is noisy; averaging over seeds keeps one lucky or unlucky run from driving the reported numbers.
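The "pick the action with the highest Q" rule from this section, together with the textbook one-step Q-learning target, can be sketched in a few lines. This is the generic form, not a copy of dqn.py: the discount `gamma=0.99`, the epsilon-greedy exploration rule and the function names are assumptions, and in a real DQN the Q-values would come from the neural network rather than from a plain list.

```python
import random


def greedy_action(q_values):
    """Index of the action with the highest estimated Q-value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def epsilon_greedy(q_values, epsilon, rng=random):
    """Explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return greedy_action(q_values)


def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target: r + gamma * max over a' of Q(s', a').

    At the end of an episode there is no future reward, so the target is r.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)
```

Training then amounts to nudging the network's Q(s, a) toward `td_target(...)` for sampled transitions; at evaluation time only `greedy_action` is needed.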

6 · What we compare against — the two benchmarks

⊘ No-battery baseline

The factory with its solar but no battery and no control: it simply imports/exports whatever the net load is, at the market price. This is the "do nothing" reference — the bill the agent has to beat. Lower bound on effort, upper bound on cost.

◆ Perfect-foresight LP optimum

A Linear Program (scipy HiGHS) that is told the entire day's prices, load and PV in advance and computes the mathematically cheapest possible battery schedule. No real controller can beat it. The theoretical best — the target.
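A perfect-foresight schedule of this kind can be sketched with `scipy.optimize.linprog` and the HiGHS solver. This is a simplified formulation and not the one in lp_benchmark.py: it assumes the same price for import and export (the β=0.9 export factor is ignored), a one-way efficiency η on each direction, a battery that starts at its SOC floor, and no end-of-day SOC requirement. The function name `lp_schedule` and its parameters are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def lp_schedule(prices, load_kw, pv_kw, *, dt_h=0.25, capacity_kwh=1600.0,
                p_max_kw=200.0, eta=0.95, soc_min=0.15, soc_max=1.0, soc0=0.15):
    """Cheapest battery schedule when the whole horizon is known in advance."""
    prices = np.asarray(prices, dtype=float)
    net_kw = np.asarray(load_kw, dtype=float) - np.asarray(pv_kw, dtype=float)
    n = len(prices)

    # Decision vector x = [charge_0..charge_{n-1}, discharge_0..discharge_{n-1}] in kW.
    # Grid power is net + charge - discharge, so only the battery terms enter the
    # objective; the no-battery cost is a constant that is added back at the end.
    objective = np.concatenate([prices * dt_h, -prices * dt_h])

    # Stored energy after step t: e0 + dt * sum_{k<=t}(eta*charge_k - discharge_k/eta).
    cumulative = np.tril(np.ones((n, n)))
    a_energy = dt_h * np.hstack([eta * cumulative, -cumulative / eta])
    e0 = soc0 * capacity_kwh
    a_ub = np.vstack([a_energy, -a_energy])
    b_ub = np.concatenate([
        np.full(n, soc_max * capacity_kwh - e0),  # stay at or below the SOC ceiling
        np.full(n, e0 - soc_min * capacity_kwh),  # stay at or above the SOC floor
    ])

    res = linprog(objective, A_ub=a_ub, b_ub=b_ub,
                  bounds=[(0.0, p_max_kw)] * (2 * n), method="highs")
    if not res.success:
        raise RuntimeError(res.message)

    charge, discharge = res.x[:n], res.x[n:]
    baseline_cost = float(np.sum(prices * net_kw * dt_h))
    return {
        "battery_kw": charge - discharge,
        "grid_kw": net_kw + charge - discharge,
        "baseline_cost": baseline_cost,
        "lp_cost": baseline_cost + float(res.fun),
    }
```

On a cheap-then-expensive price series the solver charges at full power in the cheap steps and discharges in the expensive ones, which is the same timing pattern the dashboard's dashed LP line shows.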

So the agent is sandwiched between the two: it should be far below the no-battery baseline (big saving) and as close as possible to the LP optimum (near-perfect). Our result:

7 · How we built it — the pipeline

  1. Data (data.py): a full year (2016) of 15-min load, PV and price data is loaded and converted to kW. It is split by time — train on Jan→Sep, test on Oct→Dec — so the agent is always evaluated on days it never trained on (no data leakage).
  2. Environment (env.py): the battery/grid physics above, wrapped as a Gym-style step()/reset() simulator running ~500 steps/second.
  3. Training (train.py, run_best.py): the DQN agent interacts with the training months for ~100k–250k steps, learning the policy by trial and error (the headline improved run averages 5 independent seeds).
  4. Evaluation (evaluate.py): the trained policy is rolled out greedily on the held-out test months and scored on real KPIs — cost, peak, variability — against both benchmarks.
  5. This dashboard (dashboard.py): re-runs that rollout and renders every chart you see from the actual step-by-step results.
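Steps 1 and 4 can be sketched with the standard library alone. This is an illustration, not the data.py or evaluate.py code: the function names, the use of the population standard deviation as the "variability" KPI, and the single price for import and export are assumptions.

```python
from datetime import datetime, timedelta
from statistics import pstdev


def split_by_month(timestamps, last_train_month=9):
    """Chronological split: months up to last_train_month train, the rest test."""
    train_idx = [i for i, ts in enumerate(timestamps) if ts.month <= last_train_month]
    test_idx = [i for i, ts in enumerate(timestamps) if ts.month > last_train_month]
    return train_idx, test_idx


def kpis(grid_kw, prices, dt_h=0.25):
    """Cost, peak and variability of a grid-power trace (one value per step)."""
    cost = sum(price * power * dt_h for price, power in zip(prices, grid_kw))
    return {
        "cost": cost,
        "peak_kw": max(grid_kw),
        "variability_kw": pstdev(grid_kw),  # assumed definition of variability
    }


# One timestamp per 15-minute step of 2016 (a leap year: 366 days x 96 steps).
start = datetime(2016, 1, 1)
timestamps_2016 = [start + timedelta(minutes=15 * i) for i in range(366 * 96)]
train_idx, test_idx = split_by_month(timestamps_2016)
```

Because the split is by calendar month rather than random, every test index comes after every training index, which is what rules out leakage from the test period into training.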
One-sentence summary: we built a fast simulator of a solar-plus-battery factory and trained a Deep Q-Network to drive the battery by trial and error against real 2016 price data; on days it had never seen, it learned to cut the electricity bill by about a third, landing within a single-digit percentage of the perfect-foresight optimum while keeping the grid peak unchanged.
Generated by iems_drl/dashboard.py. Self-contained — Plotly is embedded, no internet required.