Digital Water for DoD Installations: Demonstration and Deployment Guidance for Leveraging Big Data at Water and Wastewater Utilities

Principal Investigator. Dr. Kate Newhart, kathryn.newhart@oregonstate.edu

Co-Principal Investigators. COL Andrew Pfluger, COL Corey James, Dr. Kris Villez

Overview. The DoD provides water and wastewater (W/WW) services to a continuously changing population under a broad set of local and federal water quantity and quality standards. To ensure that this critical infrastructure meets mission requirements, W/WW systems are designed and operated conservatively, resulting in large, overdesigned facilities and excessive electrical and embedded energy consumption. To meet present and future needs without costly infrastructure upgrades or putting the installation at risk, data-driven monitoring and control must be leveraged. Data-driven modeling (DDM), including statistical and machine learning (ML) approaches, captures real-world variability and system-specific performance with precision unmatched by traditional mechanistic models and controllers. However, the majority of DoD W/WW systems lack the digital infrastructure and knowledge needed to implement real-time DDM for improved monitoring and control. To address this challenge, the proposed project will (1) characterize and assess the potential for data-driven monitoring and control of W/WW utilities at DoD installations, (2) define a framework for installations to develop and deploy data-driven monitoring and control at W/WW utilities, (3) demonstrate the framework at a W/WW utility on a DoD installation, (4) quantify the change in process efficiency (electrical or embedded energy use) and process reliability (downtime) when data-driven monitoring or control is used, and (5) identify the challenges, barriers, and opportunities associated with data-driven monitoring and control of W/WW processes at DoD installations. The final framework will allow DoD installations to target capacity building to become resilient digital utilities.

Task 1. Evaluation of water treatment data management practices at DoD installations and selection of pilot site.

  • Data management practices related to process (i.e., instrumentation and laboratory) data collection, validation, storage, and utilization will be catalogued via a Qualtrics survey distributed to DoD installations and private utility partners. Follow-up interviews will be conducted with facility engineers where available; at minimum, interviews will be held with the installations that have provided letters of collaboration. The survey will draw on existing sensor quality standards (e.g., from ISO) and Scientific and Technical Reports by the International Water Association [19], [20].
  • The results from the survey and interviews will be synthesized as a digital infrastructure map. A SWOT (strengths, weaknesses, opportunities, and threats) analysis will be conducted to inform the DoD and the project team of the DoD’s capacity for data-driven process monitoring and control at W/WW facilities. This assessment will (1) identify installations that are well positioned for subsequent tasks and (2) establish the baseline for framework development.
  • Facilities with the best data systems and management practices identified in Subtask 1a will be approached to participate in subsequent tasks for DDM development. An ideal facility will have a data management system that (1) includes both online process data (instrumentation and mechanical operation) and laboratory data, (2) supports a connection (e.g., an API) that can be accessed with Python or another scripting language (a minimal connectivity check is sketched after this list), and (3) can be accessed on an onsite server.
  • At least two facilities will be selected for detailed evaluation and potential problem statement development; at minimum, the installations that have provided letters of collaboration will be considered. Each selected facility will be interviewed to identify present needs and, in collaboration with the project team, will define modeling problem statements for subsequent project tasks. Each problem statement must address a challenge that can be benchmarked against existing monitoring and control practices (i.e., not an entirely new monitoring or control capability but rather an improvement on an existing capability at the facility), including existing safety factors and acceptable error.
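
As an illustration of the connectivity criterion above, the following is a minimal sketch in Python against a hypothetical historian REST endpoint. The base URL, credential, and tag name are placeholders rather than any specific vendor's API; real data management systems will differ in authentication and query syntax.

    # Minimal connectivity check for a candidate facility's data management
    # system. The endpoint, credential, and tag name are hypothetical
    # placeholders, not a specific historian product's API.
    import requests

    BASE_URL = "https://historian.example.mil/api/v1"  # placeholder endpoint
    TOKEN = "REPLACE_WITH_SITE_CREDENTIAL"             # placeholder credential

    def fetch_recent(tag: str, hours: int = 24) -> list:
        """Pull the last `hours` of readings for one process tag."""
        resp = requests.get(
            f"{BASE_URL}/tags/{tag}/recorded",
            params={"hours": hours},
            headers={"Authorization": f"Bearer {TOKEN}"},
            timeout=30,
        )
        resp.raise_for_status()  # fail loudly if the connection is refused
        return resp.json()["values"]

    if __name__ == "__main__":
        readings = fetch_recent("influent_flow_mgd")  # example tag name
        print(f"Retrieved {len(readings)} readings")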

Task 2. Exploratory data analysis and additional data collection.

  • For each problem statement, available historical data will be used to evaluate data quality, including trueness, precision, and the rate of outlying observations. This will be achieved through a data preprocessing pipeline that will (1) detect and remove outliers (e.g., operators can confirm detected outliers and identify their cause, such as sample contamination, power disruption, or other deviations), (2) blend heterogeneous data sources (e.g., laboratory and sensor data), (3) quantify accuracy, including trueness and precision, by comparing online sensor measurements against reference laboratory values, (4) quantify variability among control and response variables based on the modeling objective identified in the problem statement, and (5) perform a cursory correlation analysis to determine whether the available data are relevant to the problem of interest (a minimal sketch of this pipeline follows this list).
  • If, after outlier removal and data blending, the historical data are insufficient for model development to address the initial problem statement, a data collection plan will be developed and executed with the partner facility.
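
A minimal sketch of the preprocessing pipeline is shown below, assuming pandas DataFrames indexed by timestamp. The column names, the outlier rule (a rolling median absolute deviation filter), and the one-hour lab/sensor matching tolerance are illustrative assumptions, not project requirements.

    # Illustrative preprocessing pipeline: outlier removal, lab/sensor
    # blending, trueness/precision estimates, and a cursory correlation
    # screen. Column and variable names are hypothetical examples.
    import numpy as np
    import pandas as pd

    def remove_outliers(s: pd.Series, window: int = 48, k: float = 3.5) -> pd.Series:
        """Mask points far from a rolling median (MAD-based rule)."""
        med = s.rolling(window, center=True, min_periods=1).median()
        mad = (s - med).abs().rolling(window, center=True, min_periods=1).median()
        z = (s - med).abs() / (1.4826 * mad.replace(0, np.nan))
        return s.where(z.fillna(0) <= k)  # flagged points become NaN for operator review

    def blend(sensor: pd.DataFrame, lab: pd.DataFrame) -> pd.DataFrame:
        """Align lab samples to the nearest sensor timestamp (within 1 h)."""
        return pd.merge_asof(
            sensor.sort_index(), lab.sort_index(),
            left_index=True, right_index=True,
            direction="nearest", tolerance=pd.Timedelta("1h"),
        )

    def accuracy(online: pd.Series, reference: pd.Series) -> dict:
        """Trueness = mean bias vs. lab reference; precision = SD of the error."""
        err = (online - reference).dropna()
        return {"trueness_bias": err.mean(), "precision_sd": err.std()}

    def relevance(df: pd.DataFrame, response: str) -> pd.Series:
        """Cursory screen: correlation of candidate predictors with the response."""
        return df.corr(numeric_only=True)[response].drop(response).sort_values()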

Task 3. Evaluation of statistical and ML models of varying complexity.

  • Prediction, forecasting, and fault detection problem statements will include both statistical and traditional ML modeling approaches. At least five DDM approaches of varying complexity will be explored for prediction or forecasting problem statements, and at least three DDM approaches of varying complexity will be explored for fault detection problem statements (one way to structure such a comparison is sketched below).
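
As a sketch of how such a comparison might be structured for a prediction problem statement, the example below scores candidate models of increasing complexity with time-ordered cross-validation in scikit-learn. The candidate set and the RMSE metric are illustrative choices, not a prescribed list.

    # Illustrative comparison of DDM approaches of increasing complexity
    # for a prediction problem statement. The candidate models and the
    # error metric (RMSE) are example choices.
    import numpy as np
    from sklearn.model_selection import TimeSeriesSplit, cross_val_score
    from sklearn.linear_model import LinearRegression, Lasso
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

    CANDIDATES = {
        "linear": LinearRegression(),
        "lasso": Lasso(alpha=0.1),
        "pls": PLSRegression(n_components=3),
        "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
        "gbm": GradientBoostingRegressor(random_state=0),
    }

    def rank_models(X: np.ndarray, y: np.ndarray) -> dict:
        """Score each candidate with time-ordered cross-validation."""
        cv = TimeSeriesSplit(n_splits=5)  # respects temporal ordering of process data
        scores = {}
        for name, model in CANDIDATES.items():
            rmse = -cross_val_score(
                model, X, y, cv=cv, scoring="neg_root_mean_squared_error"
            ).mean()
            scores[name] = rmse
        return dict(sorted(scores.items(), key=lambda kv: kv[1]))  # best first

Fault detection problem statements would follow the same pattern with, e.g., statistical process monitoring baselines compared against ML alternatives.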

Task 4. Deploy model for supervised and unsupervised performance evaluation.

  • Stand up a server with a bidirectional data connection to the required data sources that meets the cybersecurity requirements of the DoD and the utility.
  • Operationalize the model identified in Task 3. This builds on the model development code from Task 3, including but not limited to detection of missing variables, updating of the model with new historical data, and addition of log files for troubleshooting (a minimal sketch follows this list).
  • Monitor the stability and accuracy of the unchanged (i.e., unsupervised) model for no fewer than 30 days.
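
The following is a minimal sketch of the operationalization step, assuming a scikit-learn-style model serialized with joblib. The file paths, variable names, and retraining schedule are hypothetical placeholders.

    # Illustrative operationalization wrapper: check for missing variables,
    # score new data, log results for troubleshooting, and periodically
    # refit on accumulated history. Paths and tag names are placeholders.
    import logging
    import joblib
    import pandas as pd

    logging.basicConfig(
        filename="model_service.log",  # log file for troubleshooting
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
    )

    REQUIRED_VARS = ["influent_flow_mgd", "mlss_mg_l", "do_mg_l"]  # example tags
    MODEL_PATH = "model.joblib"

    def score(batch: pd.DataFrame):
        """Score one batch of new data, guarding against missing variables."""
        missing = [v for v in REQUIRED_VARS if v not in batch or batch[v].isna().all()]
        if missing:
            logging.warning("Skipping batch; missing variables: %s", missing)
            return None
        model = joblib.load(MODEL_PATH)
        preds = pd.Series(model.predict(batch[REQUIRED_VARS]), index=batch.index)
        logging.info("Scored %d rows; mean prediction %.3f", len(preds), preds.mean())
        return preds

    def update(history: pd.DataFrame, response: str) -> None:
        """Refit the model on accumulated history (e.g., on a weekly schedule)."""
        model = joblib.load(MODEL_PATH)
        model.fit(history[REQUIRED_VARS], history[response])
        joblib.dump(model, MODEL_PATH)
        logging.info("Model refit on %d historical rows", len(history))

Logged predictions from the 30-day unsupervised window can then be compared against laboratory benchmarks to quantify model stability and accuracy.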

Task 5. Synthesize the framework as a guidance document for DoD installations.