The objective of this thesis is to develop and implement an automated ETL system that acquires, processes, and stores public data on the electrical infrastructure of the National Grid operator in the United Kingdom, and that applies a hybrid relational-graph approach to analyzing the network structure. The system will regularly download publicly available Excel files from the National Grid platform and archive them in Google Cloud Storage to preserve historical traceability. It will then process them with Python scripts that clean, validate, and transform the data. Special attention will be given to the Demand Headroom indicator. This indicator is defined as the difference between the reliable capacity of a network element and its expected peak load, so it shows how much capacity remains before infrastructure upgrades are required. The processed data will be loaded into a PostgreSQL database extended with Apache AGE, a graph extension that allows both standard SQL and graph-based Cypher queries to run on the same underlying data. The hybrid approach will be demonstrated by implementing analytical queries for network connection optimization and electrical transmission loss calculation in both query languages, allowing a direct comparison of their advantages and limitations.
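The Demand Headroom definition can be illustrated with a minimal Python sketch. It assumes that capacity and load are reported in MVA and that a missing value should yield no result rather than a guess. The `Substation` record, its field names, and the `demand_headroom` function are hypothetical and do not reflect the actual National Grid column headers or the thesis implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Substation:
    # Hypothetical record; field names and MVA units are illustrative assumptions.
    name: str
    reliable_capacity_mva: Optional[float]
    peak_load_mva: Optional[float]


def demand_headroom(site: Substation) -> Optional[float]:
    """Return reliable capacity minus expected peak load (MVA).

    Returns None when either input is missing, so incomplete rows are
    flagged instead of silently producing a misleading headroom value.
    """
    if site.reliable_capacity_mva is None or site.peak_load_mva is None:
        return None
    # Basic validation: negative capacities or loads indicate bad source data.
    if site.reliable_capacity_mva < 0 or site.peak_load_mva < 0:
        raise ValueError(f"negative capacity or load for {site.name}")
    return site.reliable_capacity_mva - site.peak_load_mva


if __name__ == "__main__":
    sites = [
        Substation("Example A", 60.0, 45.5),
        Substation("Example B", 30.0, 33.0),
        Substation("Example C", 40.0, None),
    ]
    for s in sites:
        print(s.name, demand_headroom(s))
```

A negative result, as for the hypothetical "Example B", would mean that expected peak load already exceeds reliable capacity. Such an element has no remaining headroom.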
The entire process will be fully automated, with Google Cloud Scheduler triggering regular execution. It will also include data quality control mechanisms and complete traceability of all operations. The system is designed to be scalable: it can be extended to additional distribution network operators, and it can serve as a foundation for a comprehensive national platform for monitoring electrical infrastructure in support of the energy transition.