Paper Title: A data-driven multivariable framework for operational regime identification, product transition detection, and anomaly detection in industrial pumping systems using SCADA data
Authors: Johnatan Corrales-Bonilla, William Hidalgo-Ozorio, Christian Corrales-Otáñez, Francisco Viteri
Corresponding Author: Johnatan Corrales-Bonilla (johnatan.corrales5518@utc.edu.ec), Ecuador
Abstract
This study analyzes a centrifugal pumping system in an industrial facility using fifteen months of operational data collected from a Supervisory Control and Data Acquisition (SCADA) system. A flow-greater-than-zero criterion retained 15,049 records corresponding to active operation. Quality control then removed incomplete and feature-inconsistent observations, leaving 14,501 records for the multivariable analysis. Rather than analyzing variables independently, the study characterizes system behavior through the relationships among hydraulic, electrical, and fluid-related variables. Principal Component Analysis (PCA) is applied first, and the first two principal components explain 69.8% of the total variance. In this reduced space, K-means clustering identifies two operational regimes, corresponding to dominant and low-load conditions. A Gaussian Mixture Model (GMM) fitted to fluid density reveals two product regimes with mean densities of 716.84 kg/m³ and 830.35 kg/m³. In addition, anomaly detection based on the Mahalanobis distance flags 73 observations (0.5% of the 14,501 analyzed records) as anomalous. These observations are associated with reduced discharge pressure, a lower pressure differential, and decreased power consumption, indicating degraded operating conditions. The proposed framework provides a physically interpretable representation of system behavior that identifies operational regimes, product-related variations, and anomalous conditions within a single analytical approach. This supports its application in industrial monitoring environments aligned with Industry 4.0 (I4.0) principles.
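The abstract chains four steps: standardization and PCA, K-means with k = 2 on the first two component scores, a two-component GMM on density, and Mahalanobis-distance anomaly scoring, d² = (x − μ)ᵀ Σ⁻¹ (x − μ). The sketch below strings these steps together in NumPy so the data flow is easy to follow. It is an illustration, not the authors' implementation. The feature set, the synthetic data, the k-means initialization, and the anomaly cutoff are all assumptions. In particular, the abstract does not say how the Mahalanobis threshold was chosen, so the 99.5th-percentile cutoff used here is hypothetical.

```python
import numpy as np


def standardize(X):
    """Z-score each column, since the variables carry different physical units."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pca_scores(Z, n_components=2):
    """PCA via SVD of the centered matrix: component scores and explained-variance ratios."""
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    ratio = S**2 / np.sum(S**2)  # variance share of each component, descending
    return Z @ Vt[:n_components].T, ratio


def kmeans(P, k=2, n_iter=100, seed=0):
    """Plain Lloyd's algorithm (single random initialization) on the PCA scores."""
    rng = np.random.default_rng(seed)
    centers = P[rng.choice(len(P), size=k, replace=False)]
    for _ in range(n_iter):
        d = ((P[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new = np.array([P[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers


def gmm_1d(x, n_iter=200):
    """Two-component 1-D Gaussian mixture fitted by EM (applied here to density)."""
    mu = np.percentile(x, [25, 75]).astype(float)
    var = np.full(2, x.var())
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample
        pdf = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        r = w * pdf
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances
        nk = r.sum(axis=0)
        w = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var


def mahalanobis_sq(Z):
    """Squared Mahalanobis distance of each row to the sample mean."""
    D = Z - Z.mean(axis=0)
    inv = np.linalg.pinv(np.cov(Z, rowvar=False))
    return np.einsum("ij,jk,ik->i", D, inv, D)


# Synthetic stand-in for SCADA records (hypothetical columns: flow, dP, power, density)
rng = np.random.default_rng(42)
n = 2000
load = np.where(rng.random(n) < 0.8, 1.0, 0.4)  # dominant vs low-load regime
flow = 100 * load + rng.normal(0, 5, n)
dp = 8 * load + rng.normal(0, 0.4, n)
power = 50 * load + rng.normal(0, 2, n)
density = np.where(rng.random(n) < 0.5, 717.0, 830.0) + rng.normal(0, 8, n)
X = np.column_stack([flow, dp, power, density])

Z = standardize(X)
scores, ratio = pca_scores(Z, 2)
labels, _ = kmeans(scores, k=2)
w, mu, var = gmm_1d(density)
d2 = mahalanobis_sq(Z)  # unchanged by per-column scaling, so Z and X give the same d2
threshold = np.quantile(d2, 0.995)  # hypothetical cutoff, not from the paper
anomalies = np.flatnonzero(d2 > threshold)

print(f"PC1 + PC2 explain {ratio[:2].sum():.1%} of the variance")
print("regime sizes:", np.bincount(labels, minlength=2))
print("product means (kg/m^3):", np.round(np.sort(mu), 1))
print("anomalies flagged:", anomalies.size)
```

In practice, scikit-learn's `PCA`, `KMeans` (with several `n_init` restarts to avoid a poor local optimum), and `GaussianMixture` would replace the hand-written loops. They are written out here only to make each step visible. A chi-square quantile on d² is a common alternative to an empirical percentile when the features are roughly Gaussian.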