Details

Recent advances in automatic term extraction : a comprehensive survey
Tran, Hanh Thi Hong (Author), Martinc, Matej (Author), Caporusso, Jaya (Author), Delaunay, Julien (Author), Doucet, Antoine (Author), Pollak, Senja (Author)

PDF - Presentation file (1,85 MB)
MD5: 09D320DE02A903CF6142EFBF5620790C
URL - Source URL: https://dl.acm.org/doi/10.1145/3787584

Abstract
Automatic terminology or term extraction (ATE) is a Natural Language Processing (NLP) task intended to automatically identify specialized terms present in domain-specific corpora. As units of knowledge in a specific field of expertise, extracted terms are not only beneficial for several terminographical tasks, but also support and improve several complex downstream tasks, e.g., information retrieval, machine translation, topic detection, and sentiment analysis. ATE systems and datasets annotated for the task at hand have been studied and developed for decades, but more recent approaches have increasingly involved novel neural systems. Despite a large amount of new research on ATE tasks, systematic survey studies covering novel neural approaches are lacking, especially when it comes to the usage of large-scale language models (LLMs). We present a comprehensive survey of neural approaches to ATE, focusing on transformer-based neural models and the recent generative approaches based on LLMs. The study also compares these systems and previous ML-based approaches, which employed feature engineering and non-neural supervised learning algorithms.

Language:English
Keywords:computing methodologies, natural language processing, neural networks, language resources, language models, transformers, automatic term extraction, ATE, low-resourced languages, monolingual, multilingual, deep learning, zero-shot, few-shot, transfer learning, prompt engineering, large-scale language models, LLMs
Work type:Article
Typology:1.01 - Original Scientific Article
Organization:FRI - Faculty of Computer and Information Science
Publication status:Published
Publication version:Version of Record
Year:2026
Number of pages:34 pp.
Numbering:Vol. , no.
PID:20.500.12556/RUL-179419
UDC:004.89:81'322
ISSN on article:0360-0300
DOI:10.1145/3787584
COBISS.SI-ID:268167171
Publication date in RUL:13.02.2026
Views:66
Downloads:7

Record is a part of a journal

Title:ACM computing surveys
Shortened title:ACM comput. surv.
Publisher:Association for Computing Machinery
ISSN:0360-0300
COBISS.SI-ID:24841216

Licences

License:CC BY 4.0, Creative Commons Attribution 4.0 International
Link:http://creativecommons.org/licenses/by/4.0/
Description:This is the standard Creative Commons license that gives others maximum freedom to do what they want with the work as long as they credit the author.

Secondary language

Language:Slovenian
Keywords:računalniške metodologije, obdelava naravnega jezika, nevronske mreže, jezikovni viri, jezikovni modeli, transformatorji, avtomatsko pridobivanje izrazov, ATE, jeziki z malo viri, enojezični, večjezični, globoko učenje, učenje z ničelnim številom poskusov, učenje z malo poskusi, prenosno učenje, hitro inženirstvo, jezikovni modeli velikega obsega

Projects

Funder:ARIS - Slovenian Research and Innovation Agency
Project number:P2-0103-2022
Name:Tehnologije znanja (Knowledge Technologies)

Funder:EC - European Commission
Project number:101186647
Name:Centre of Excellence in Artificial Intelligence for Digital Humanities
Acronym:AI4DH

Funder:ARIS - Slovenian Research and Innovation Agency
Project number:J5-50169-2023
Name:Jezikovna dostopnost pravic socialnega varstva v Sloveniji (Linguistic accessibility of social welfare rights in Slovenia)

Funder:ARIS - Slovenian Research and Innovation Agency
Project number:J6-3131-2021
Name:Kombinatorika besedotvornih obrazil v slovenščini (Combinatorics of word-formation affixes in Slovenian)
