Imagine giving your software tester a super-smart assistant that can read requirements, write test cases, suggest missing checks, and even help explain bugs—just by talking to it in natural language. This paper surveys how those assistants, powered by large language models like ChatGPT, are being used in software testing and what still goes wrong.
Traditional software testing is slow, labor‑intensive, and often under‑resourced. Teams struggle to keep test cases up to date, achieve good coverage, and understand complex failures. The reviewed approaches show how LLMs can automate or accelerate test design, test code generation, documentation, and analysis, while highlighting the risks (hallucinations, lack of reliability, data/privacy issues) that must be managed.
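As an illustration of the test-design use case above, the sketch below assembles a test-generation prompt from a plain-language requirement. The model call itself is stubbed out; the requirement text, function signature, and `build_test_prompt` helper are illustrative assumptions, not from the survey. In practice the resulting prompt would be sent to a chat-completion API and the returned test reviewed by an engineer before use.

```python
# Minimal sketch: turn a requirement into a test-generation prompt.
# The LLM call is deliberately omitted; only prompt assembly is shown.

def build_test_prompt(requirement: str, function_signature: str) -> str:
    """Assemble a prompt asking an LLM to draft a pytest test."""
    return (
        "You are a software test engineer.\n"
        f"Requirement: {requirement}\n"
        f"Function under test: {function_signature}\n"
        "Write a pytest test covering the happy path and one edge case.\n"
        "Return only runnable Python code."
    )

prompt = build_test_prompt(
    "Discounts over 100% must be rejected",
    "apply_discount(price: float, pct: float) -> float",
)
print(prompt)
```

Keeping prompt assembly in a plain function like this makes the generated artifacts easier to review and to regenerate when requirements change.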
Defensibility for a company applying these ideas will come mostly from proprietary test data and historical defect logs, tight integration into existing SDLC and CI/CD workflows, and domain-specific fine-tuning that makes the LLM markedly better at testing that company's particular tech stack and business rules.
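One way the proprietary defect logs mentioned above become a moat is through retrieval: when a new failure appears, similar historical defects are fetched and given to the LLM as context. The toy sketch below stands in for that retrieval step, assuming a bag-of-words embedding in place of a real embedding model; the defect texts and query are invented examples.

```python
# Toy retrieval over historical defect logs via cosine similarity.
# A real system would use learned embeddings and a vector database;
# here a bag-of-words Counter stands in for the embedding.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Very rough stand-in for an embedding: token counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

defect_log = [
    "checkout crashes when discount exceeds 100 percent",
    "login page times out under heavy load",
    "report export drops unicode characters",
]

query = "discount over 100 percent causes crash"
q = embed(query)
best = max(defect_log, key=lambda d: cosine(q, embed(d)))
print(best)  # most similar historical defect
```

The retrieved defect would then be injected into the LLM prompt, which is where company-specific history, rather than the base model, does the differentiating work.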
Approach: Hybrid
Retrieval: Vector Search
Technical complexity: Medium (integration logic)
Key risks: Context-window cost and reliability of generated tests at scale; ensuring reproducibility and versioning of LLM-generated test assets.
Adoption stage: Early Adopters
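The reproducibility risk noted above can be partly addressed by fingerprinting each generated test asset. The sketch below hashes the prompt, model name, and sampling parameters into a stable ID that can be committed alongside the generated test; the function name, model string, and parameters are illustrative assumptions, not prescribed by the survey.

```python
# Minimal sketch of versioning LLM-generated test assets: fingerprint
# the exact inputs (prompt, model, sampling params) so every generated
# test can be traced back to what produced it.
import hashlib
import json

def asset_fingerprint(prompt: str, model: str, params: dict) -> str:
    """Deterministic short ID for a generated test asset."""
    payload = json.dumps(
        {"prompt": prompt, "model": model, "params": params},
        sort_keys=True,  # key order must not change the hash
    )
    return hashlib.sha256(payload.encode()).hexdigest()[:12]

fp = asset_fingerprint(
    "Write a pytest test for apply_discount",
    "gpt-4o-2024-08-06",
    {"temperature": 0.0, "seed": 42},
)
print(fp)
```

Storing this ID in a manifest next to the generated test file makes it possible to detect drift: if the prompt or model changes, the fingerprint changes, flagging the asset for regeneration and review.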
This work is a broad survey, not a single product. Its differentiator is mapping the full landscape of how LLMs are applied to test case generation, test maintenance, code explanation, bug localization, and test prioritization, while systematically cataloging failure modes (hallucinations, security and privacy issues, evaluation challenges) and open research problems.