Writing tests is a time-consuming part of software development to which
developers devote considerable effort every day. With rapid advances in
large language models, we investigated the effectiveness of Retrieval-Augmented
Generation (RAG) technology for automated Python test generation. Large
language models are successful at generating code, but producing high-quality
tests remains one of their major weaknesses: they often generate tests without
understanding the surrounding context. RAG mitigates this lack of domain
knowledge by retrieving tests similar to the analyzed function from a
knowledge base and using them as examples for generating new ones. Guided by
the retrieved examples, the language model produces tests with fewer
hallucinations and other shortcomings.
We designed and implemented a prototype system and evaluated it on a real
project using both objective and subjective metrics. The generated tests
achieved higher code coverage but were less successful at detecting actual
bugs than manually written tests. In their subjective assessments, developers
likewise rated the manually written tests higher.
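A minimal sketch of the retrieve-then-generate flow described above, purely for illustration: the knowledge-base layout, the token-overlap similarity that stands in for embedding-based retrieval, and all names are assumptions, not the prototype's actual implementation; the assembled prompt would be passed to a language model.

```python
# Illustrative sketch of retrieval-augmented test generation (assumed names).
from dataclasses import dataclass


@dataclass
class Example:
    function_src: str   # source of a previously tested function
    test_src: str       # its existing (e.g., manually written) test


def _tokens(code: str) -> set[str]:
    """Crude lexical fingerprint of a code snippet."""
    return set(code.replace("(", " ").replace(")", " ").split())


def retrieve(target_src: str, knowledge_base: list[Example], k: int = 2) -> list[Example]:
    """Return the k examples most similar to the target function.
    Jaccard overlap of tokens stands in for embedding similarity here."""
    target = _tokens(target_src)

    def score(ex: Example) -> float:
        other = _tokens(ex.function_src)
        return len(target & other) / max(len(target | other), 1)

    return sorted(knowledge_base, key=score, reverse=True)[:k]


def build_prompt(target_src: str, examples: list[Example]) -> str:
    """Assemble the augmented prompt: retrieved (function, test) pairs,
    followed by the function under test."""
    parts = ["Write pytest tests for the last function, following the examples.\n"]
    for ex in examples:
        parts.append(f"# Function:\n{ex.function_src}\n# Test:\n{ex.test_src}\n")
    parts.append(f"# Function:\n{target_src}\n# Test:")
    return "\n".join(parts)


if __name__ == "__main__":
    kb = [Example("def add(a, b):\n    return a + b",
                  "def test_add():\n    assert add(2, 3) == 5")]
    target = "def sub(a, b):\n    return a - b"
    prompt = build_prompt(target, retrieve(target, kb))
    print(prompt)  # this prompt would be sent to the language model
```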