Water Distribution System Overview:
A water distribution system is a water supply network that uses components such as pumps and valves to carry water from storage tanks to consumers and satisfy their demand.
Need for Pump / Valve optimization:
Inefficient pump usage can lead to serious problems: high energy consumption, high operating cost, storage-tank overflow, and unmet demand. Controlling pumps in real time by manual calculation is resource-intensive and impractical. Efficient pump control keeps energy usage and cost low, holds tank levels within their upper- and lower-bound constraints, and meets demand requirements.
Reinforcement Learning as a pump optimization solution:
Optimization is valuable in almost any field. Having said that, it is important to know what it can achieve and whether it can satisfy the requirements at hand. Optimizing pumps in water plants lowers energy usage and operating cost. The optimization must also satisfy the existing operational requirements, such as fulfilling demand and keeping reservoir levels within their constraints.
Reinforcement Learning is a subfield of machine learning whose goal is to find a policy with which an agent can drive its environment toward the most advantageous state from any initial state, maximizing the cumulative reward over time.
There are two main components in a reinforcement learning solution: the agent and the environment. The RL agent is an AI algorithm, and the RL environment is the task or simulation the agent must solve. The environment interacts with the agent by sending it a state and a reward. The environment is therefore constructed from a simulation, a reward system, and state vectors that represent the internal state of the simulation.
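The interaction described above can be sketched as a simple loop. Everything here is illustrative (a random placeholder agent and a toy one-tank simulation), not a specific library's API:

```python
# Minimal sketch of the agent-environment loop: the environment returns a
# state vector and a reward, and the agent picks the next action.
import random

class RandomAgent:
    """Placeholder agent: turns the pump ON (1) or OFF (0) at random."""
    def act(self, state):
        return random.choice([0, 1])

class TankEnv:
    """Toy environment: one tank whose level rises when the pump is ON
    and falls with a fixed demand each step."""
    def __init__(self, level=5.0, demand=1.0, inflow=2.0):
        self.level, self.demand, self.inflow = level, demand, inflow

    def step(self, action):
        self.level += self.inflow * action - self.demand
        # Reward staying inside the illustrative bounds [2, 8].
        reward = 1.0 if 2.0 <= self.level <= 8.0 else -1.0
        return [self.level], reward

agent, env = RandomAgent(), TankEnv()
state, reward = [env.level], 0.0
for _ in range(10):
    action = agent.act(state)
    state, reward = env.step(action)
```

A real solution would replace `RandomAgent` with a learning algorithm and `TankEnv` with a hydraulic or ML simulation, but the loop itself stays the same.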
The goal is a Reinforcement Learning agent that provides an optimized pump schedule satisfying the requirements listed above while saving cost and energy. The agent must communicate with the environment, and it should be reusable.
Reinforcement Learning agents (AI algorithms) can be model-based or model-free. A popular model-based approach is dynamic programming (DP); popular model-free algorithms are Monte Carlo, SARSA, and Q-learning. In 2015, DeepMind combined Q-learning with deep neural networks to produce the Deep Q-Network (DQN), one of the most powerful algorithms, capable of solving a wide variety of problems. DQN approximates the Q-value function with a deep neural network and also belongs to the model-free category.
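For concreteness, here is the tabular Q-learning update that DQN generalizes by replacing the table with a neural network. The state/action values and hyperparameters are illustrative:

```python
# One tabular Q-learning step:
#   Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """Move Q[(s, a)] toward the bootstrapped target and return the new value."""
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]

Q = defaultdict(float)
# Example: in state 0 the agent turned the pump ON (action 1) and got reward 1.
q_update(Q, s=0, a=1, r=1.0, s_next=1, actions=[0, 1])
```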
An environment consists of two main components: a simulation and a reward mechanism. The simulation takes the existing state and the action chosen by the RL agent, estimates the next state, and updates it. The simulation can be implemented with any method; hydraulic models and ML models are the common practice. The reward mechanism is the feedback loop to the agent: it computes a reward from the action taken and the outcome (the simulation results). Because it drives the agent's learning, the reward mechanism is one of the most crucial components. Here are some examples:
- Penalty on breaking reservoir constraints
- Reward on keeping the level under constraints
- Small penalty on turning the pump ON during expensive tariff hours
- Small reward on turning the pump ON during cheaper tariff hours
- Reward on meeting the demand
- Penalty on not meeting the demand
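The rules above can be combined into a single reward function. The constraint bounds, tariff hours, and penalty magnitudes below are illustrative assumptions, not values from the text:

```python
# One possible reward function implementing the listed rules.
def reward(level, pump_on, demand_met, hour,
           low=2.0, high=8.0, expensive_hours=range(8, 20)):
    r = 0.0
    # Reward keeping the level within constraints; penalize breaking them.
    r += 1.0 if low <= level <= high else -5.0
    # Small penalty for pumping in expensive tariff hours, small reward otherwise.
    if pump_on:
        r += -0.5 if hour in expensive_hours else 0.5
    # Reward meeting demand; penalize missing it.
    r += 1.0 if demand_met else -5.0
    return r
```

Tuning the relative magnitudes (e.g. making constraint violations much costlier than tariff penalties) is typically where most of the design effort goes.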
The Reinforcement Learning agent takes actions to maximize the reward over the episode. In a water distribution system, the control variables are pumps (binary or variable-speed) and valves. Based on the current state and the reward received from the previous action, the agent decides the next action. For example, with a binary pump, the agent decides when to turn the pump ON and OFF.
Once an RL agent is trained, it takes a few parameters as inputs and recommends schedules in real time. In a water distribution system, the agent needs the predicted water demand and the state variables (reservoir levels). Based on the current reservoir levels and the predicted demand, the agent chooses an action for the next timestep. Since the agent was trained on the reward function, its actions work to meet all requirements:
- Keep the reservoir level under constraint
- Meet the demand
- Keep energy usage low
- Meet the toggle count constraint for pumps
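This real-time use of a trained agent can be sketched as below. `ThresholdPolicy` is a hypothetical stand-in for a trained policy, and in practice the reservoir levels would be re-observed (or re-simulated) after every step rather than held fixed:

```python
# Sketch of real-time schedule recommendation: each timestep the agent
# receives the reservoir levels plus the predicted demand and returns
# the next pump action.

def recommend_schedule(trained_agent, levels, demand_forecast):
    """Return one pump action (0/1) per forecast timestep."""
    schedule = []
    for demand in demand_forecast:
        state = levels + [demand]   # state vector: levels + predicted demand
        action = trained_agent.act(state)
        schedule.append(action)
    return schedule

class ThresholdPolicy:
    """Illustrative stand-in for a trained agent: pump ON when the
    first reservoir drops below a threshold."""
    def act(self, state):
        return 1 if state[0] < 4.0 else 0

schedule = recommend_schedule(ThresholdPolicy(), [3.5], [1.0, 1.2, 0.9])
```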
Terminal State:
An episode in RL is the length of the simulation, ending in a terminal state. In other words, the RL agent can take actions or recommend schedules until it reaches a terminal state. Choosing the terminal state depends heavily on the business problem, and an RL environment can apply several terminal-state rules at once. Here are some examples:
- The agent is allowed to run for a fixed number of steps to reach the most advantageous state. The environment keeps track of the number of steps taken and terminates the episode when the agent reaches the limit.
- The agent can choose a "do nothing" action instead of changing one of the control variables or speed ratios.
- If the agent remains in the same state for a predefined number of consecutive steps, the episode is terminated as well.
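The rules above can be folded into one check. The step budget and the "no change" window size are illustrative assumptions:

```python
# Combined terminal-state check: end the episode when the step budget is
# exhausted, or when the agent has repeated the same choice (e.g. "do
# nothing") for `patience` consecutive steps.

def is_terminal(step, max_steps, recent_actions, patience=5):
    # Rule: fixed step budget exhausted.
    if step >= max_steps:
        return True
    # Rule: same action/state repeated for `patience` consecutive steps.
    if len(recent_actions) >= patience and len(set(recent_actions[-patience:])) == 1:
        return True
    return False
```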
Many RL platforms are currently available that let users develop and compare multiple RL algorithms. A few of the more popular ones are:
- Amazon SageMaker RL
- Google’s Dopamine
- Facebook’s ReAgent
- Deepmind’s bSuite
- OpenAI Gym
Among all of these, OpenAI Gym is a good starting point for beginners: it already provides a wide variety of simulated reinforcement learning environments, and it has rich documentation.
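To tie things together, a pump-control environment can be written against the classic Gym interface (`reset()` returning an observation; `step(action)` returning observation, reward, done flag, and an info dict). The sketch below follows that interface shape without importing `gym` itself, and all dynamics and bounds are illustrative:

```python
# Gym-style tank environment: 0 = pump OFF, 1 = pump ON.
class PumpTankEnv:
    def __init__(self, low=2.0, high=8.0, inflow=2.0, demand=1.0, max_steps=24):
        self.low, self.high = low, high
        self.inflow, self.demand = inflow, demand
        self.max_steps = max_steps

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.level, self.steps = 5.0, 0
        return [self.level]

    def step(self, action):
        """Advance one timestep: update the level, score it, check termination."""
        self.level += self.inflow * action - self.demand
        self.steps += 1
        in_bounds = self.low <= self.level <= self.high
        reward = 1.0 if in_bounds else -5.0
        done = self.steps >= self.max_steps or not in_bounds
        return [self.level], reward, done, {}

env = PumpTankEnv()
obs = env.reset()
obs, reward, done, info = env.step(1)
```

A class with this shape can be dropped into most Gym-compatible training loops, which is what makes the interface a convenient starting point.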