This paper addresses the challenge of ensuring data privacy when using Large Language Models (LLMs) in construction management workflows. It analyses how well existing Named Entity Recognition (NER) tools can identify and redact sensitive information in technical construction documents, particularly those written in Slovenian. A qualitative evaluation was performed in which four NLP frameworks (spaCy, spaCy SLO, Flair, NLTK) were applied to a sample of five real-world construction documents and their output was compared against a manually annotated baseline. The evaluation also included anonymisation with VJM, which masks sensitive data using regular expressions. The results show that while basic anonymisation is possible, all classical NER frameworks underperform in identifying domain-specific entities such as project codes, engineering titles and structured numerical data. These findings emphasise the urgent need for domain-adapted preprocessing tools, as inaccurate redaction poses legal and ethical risks when integrating LLMs in regulated domains such as construction. Future work should focus on building hybrid redaction pipelines and training custom models on annotated corpora to improve accuracy and compliance in technical domains.
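To illustrate what regular-expression masking of the kind mentioned above might look like, the following is a minimal sketch in Python. The patterns, the placeholder labels and the `mask` function are assumptions made for this example only; they are not the rules used by VJM or by any of the evaluated frameworks, and the project-code format in particular is hypothetical.

```python
import re

# Hypothetical patterns for illustration only; real documents would need
# rules derived from the actual corpus.
# The project-code rule runs first so that its digits cannot be picked up
# by the looser phone-number rule.
PATTERNS = {
    "PROJECT_CODE": re.compile(r"\b[A-Z]{2,4}-\d{2,4}/\d{4}\b"),
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s/-]{7,}\d"),
}


def mask(text: str) -> str:
    """Replace every match of each pattern with a bracketed label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Rules of this kind only catch entities with a predictable surface form. Names, engineering titles and other free-text entities would still depend on an NER model, which is one motivation for a hybrid pipeline that combines both approaches.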