The development of artificial intelligence and machine learning has significantly simplified data analysis. As the volume of data held by companies continues to grow, so does the need for automated systems that can perform complex analyses without human intervention. This thesis presents the development of a dynamic system for data segmentation and anomaly detection based on the K-Means algorithm and the ML.NET framework. The system automatically prepares and scales the data using methods such as Min-Max normalization and Robust Scaling, determines the optimal number of clusters using the Elbow and Silhouette methods, and detects anomalies using Principal Component Analysis (PCA). In the final stage, the results are interpreted by a large language model (GPT-4o) accessed via the Azure OpenAI platform, which provides deeper insight into the detected patterns. The solution was tested on real, anonymized data from a pharmaceutical company, demonstrating its practical applicability.
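To make the preprocessing and anomaly-detection steps concrete, the following is a minimal NumPy sketch. It is an illustration only, not the thesis's ML.NET implementation: the function names (`min_max`, `robust_scale`, `pca_anomaly_scores`) are hypothetical. Scoring each point by its reconstruction error after projecting onto the top principal components is one common form of PCA-based anomaly detection and is assumed here. Min-Max normalization maps each feature to $[0, 1]$ via $(x - \min)/(\max - \min)$, while Robust Scaling centers on the median and divides by the interquartile range, which makes it less sensitive to outliers.

```python
# Illustrative NumPy sketch; assumed formulations, not the thesis's ML.NET code.
import numpy as np

def min_max(X):
    """Rescale each column to [0, 1] using (x - min) / (max - min)."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # constant columns: avoid division by zero
    return (X - lo) / span

def robust_scale(X):
    """Center each column on its median and divide by its interquartile range."""
    X = np.asarray(X, dtype=float)
    med = np.median(X, axis=0)
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = np.where(q3 > q1, q3 - q1, 1.0)  # zero IQR: leave the column unscaled
    return (X - med) / iqr

def pca_anomaly_scores(X, k):
    """Score rows by reconstruction error after projection onto the top-k PCs."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # PCA operates on centered data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k].T                            # d x k basis of principal directions
    recon = Xc @ W @ W.T                    # projection onto the k-dim subspace
    return np.linalg.norm(Xc - recon, axis=1)

# Usage: 200 points along a linear trend plus one point far off that trend.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + rng.normal(scale=0.05, size=200)])
X = np.vstack([X, [[0.0, 5.0]]])
scores = pca_anomaly_scores(robust_scale(X), k=1)
print(int(np.argmax(scores)))  # index of the highest-scoring row
```

Robust Scaling is applied before PCA in the usage example because, unlike Min-Max normalization, its center and spread are barely moved by a single extreme value. The injected point therefore stands out clearly from the linear trend in the scaled space.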