This master's thesis focuses on high-throughput processing of big data streams in real time. Because optimizing business processes depends on timely information, data has to be obtained in real time or with minimal delay. On the production level, complex event processing is mostly used for fault detection and quality assurance. Statistical process control systems can also detect out-of-bound conditions and raise alarms. Complex event processing makes it possible to query different types of data streams, from business to production, with relative ease. The queries are static and the data is dynamic, which is a paradigm shift from the conventional analysis of static datasets. This enables real-time querying without the delay of first writing the data to a database. This kind of analysis minimizes the time delay in decision making, and the processed data can still be stored in a data lake. The complex event processing solution is implemented on the Microsoft StreamInsight platform. Real-time and historical data are analysed in the database management system Microsoft SQL Server and in Power BI, Microsoft's platform for business intelligence solutions. A solution is proposed for fault detection using principal component analysis (PCA) and dynamic principal component analysis (DPCA). The offline part of the algorithm is implemented in Python using Jupyter Notebook. The online part is implemented on the StreamInsight platform using the MathNet library for the C# programming language.
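The split between an offline and an online part of PCA-based fault detection can be illustrated with a minimal Python sketch. It is not the implementation from the thesis: the function names (`fit_pca_monitor`, `monitor_statistics`, `is_fault`), the synthetic three-variable data, and the use of empirical 99 % quantiles as control limits are all assumptions made for illustration. The sketch uses the two monitoring statistics commonly paired with PCA, Hotelling's T² and the squared prediction error (SPE).

```python
import numpy as np

def monitor_statistics(Z, P, lam):
    """Return Hotelling's T^2 and SPE for standardized samples Z (rows)."""
    scores = Z @ P                              # projection onto retained components
    t2 = np.sum(scores**2 / lam, axis=1)        # variation inside the PCA model
    residual = Z - scores @ P.T                 # part not explained by the model
    spe = np.sum(residual**2, axis=1)           # squared prediction error
    return t2, spe

def fit_pca_monitor(X_train, n_components, quantile=0.99):
    """Offline part: fit PCA on fault-free data and derive control limits.

    Assumption: limits are empirical quantiles of the training statistics,
    chosen here for simplicity instead of distribution-based limits.
    """
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    Z = (X_train - mean) / std
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_components]   # largest eigenvalues first
    P, lam = eigvecs[:, order], eigvals[order]
    t2, spe = monitor_statistics(Z, P, lam)
    return {
        "mean": mean, "std": std, "P": P, "lam": lam,
        "t2_limit": np.quantile(t2, quantile),
        "spe_limit": np.quantile(spe, quantile),
    }

def is_fault(model, sample):
    """Online part: check one incoming sample against the control limits."""
    z = (np.atleast_2d(np.asarray(sample, dtype=float)) - model["mean"]) / model["std"]
    t2, spe = monitor_statistics(z, model["P"], model["lam"])
    return bool(t2[0] > model["t2_limit"] or spe[0] > model["spe_limit"])

# Hypothetical fault-free training data: three strongly correlated variables.
rng = np.random.default_rng(0)
t = rng.normal(size=500)
X_train = np.column_stack([
    t,
    2.0 * t + 0.05 * rng.normal(size=500),
    -t + 0.05 * rng.normal(size=500),
])
model = fit_pca_monitor(X_train, n_components=1)

normal_sample = [0.5, 1.0, -0.5]    # follows the learned correlation
faulty_sample = [1.0, -2.0, 1.0]    # breaks the learned correlation
print(is_fault(model, normal_sample), is_fault(model, faulty_sample))
```

In this sketch only `is_fault` would run on the stream, since it needs nothing but the stored model parameters and one matrix product per sample. DPCA is usually described as the same procedure applied to a data matrix extended with time-lagged copies of the variables, so the sketch would need an additional lagging step before fitting.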