Test smells 20 years later : detectability, validity, and reliability

Panichella, Annibale; Panichella, Sebastiano; Fraser, Gordon; Sawant, Anand; Hellendoorn, Vincent

doi:10.1007/s10664-022-10207-5

Please use this identifier to cite or link to this item: https://doi.org/10.21256/zhaw-25672

Full metadata record

DC Field	Value	Language
dc.contributor.author	Panichella, Annibale	-
dc.contributor.author	Panichella, Sebastiano	-
dc.contributor.author	Fraser, Gordon	-
dc.contributor.author	Sawant, Anand	-
dc.contributor.author	Hellendoorn, Vincent	-
dc.date.accessioned	2022-09-17T09:21:50Z	-
dc.date.available	2022-09-17T09:21:50Z	-
dc.date.issued	2022	-
dc.identifier.issn	1382-3256	de_CH
dc.identifier.issn	1573-7616	de_CH
dc.identifier.uri	https://digitalcollection.zhaw.ch/handle/11475/25672	-
dc.description	Acquired under the Swiss National Licences (http://www.nationallizenzen.ch)	de_CH
dc.description.abstract	Test smells aim to capture design issues in test code that reduce its maintainability. These have been extensively studied and generally found quite prevalent in both human-written and automatically generated test cases. However, most evidence of prevalence is based on specific static detection rules. Although those are based on the original, conceptual definitions of the various test smells, recent empirical studies indicate that developers perceive warnings raised by detection tools as overly strict and non-representative of the maintainability and quality of test suites. This leads us to re-assess test smell detection tools’ detection accuracy and investigate the prevalence and detectability of test smells more broadly. Specifically, we construct a hand-annotated dataset spanning hundreds of test suites both written by developers and generated by two test generation tools (EvoSuite and JTExpert) and perform a multistage, cross-validated manual analysis to identify the presence of six types of test smells in these. We then use this manual labeling to benchmark the performance and external validity of two test smell detection tools – one widely used in prior work and one recently introduced with the express goal to match developer perceptions of test smells. Our results primarily show that the current vocabulary of test smells is highly mismatched to real concerns: multiple smells were ubiquitous in developer-written tests but virtually never correlated with semantic or maintainability flaws; machine-generated tests actually often scored better, but in reality, suffered from a host of problems not well captured by current test smells. Current test smell detection strategies poorly characterized the issues in these automatically generated test suites; in particular, the older tool’s detection strategies misclassified over 70% of test smells, both missing real instances (false negatives) and marking many smell-free tests as smelly (false positives). We identify common patterns in these tests that can be used to improve the tools, refine and update the definition of certain test smells, and highlight as yet uncharacterized issues. Our findings suggest the need for (i) more appropriate metrics to match development practice, and (ii) more accurate detection strategies, to be evaluated primarily in industrial contexts.	de_CH
dc.language.iso	en	de_CH
dc.publisher	Springer	de_CH
dc.relation.ispartof	Empirical Software Engineering	de_CH
dc.rights	http://creativecommons.org/licenses/by/4.0/	de_CH
dc.subject	Test generation	de_CH
dc.subject	Test smell	de_CH
dc.subject	Software quality	de_CH
dc.subject.ddc	005: Computer programming, programs and data	de_CH
dc.title	Test smells 20 years later : detectability, validity, and reliability	de_CH
dc.type	Article in scientific journal	de_CH
dcterms.type	Text	de_CH
zhaw.departement	School of Engineering	de_CH
zhaw.organisationalunit	Institut für Informatik (InIT)	de_CH
dc.identifier.doi	10.1007/s10664-022-10207-5	de_CH
dc.identifier.doi	10.21256/zhaw-25672	-
zhaw.funding.eu	info:eu-repo/grantAgreement/EC/H2020/957254//DevOps for Complex Cyber-physical Systems/COSMOS	de_CH
zhaw.issue	7	de_CH
zhaw.originated.zhaw	Yes	de_CH
zhaw.pages.start	170	de_CH
zhaw.publication.status	publishedVersion	de_CH
zhaw.volume	27	de_CH
zhaw.publication.review	Peer review (publication)	de_CH
zhaw.webfeed	Software Engineering	de_CH
zhaw.funding.zhaw	COSMOS – DevOps for Complex Cyber-physical Systems of Systems	de_CH
zhaw.author.additional	No	de_CH
zhaw.display.portrait	Yes	de_CH
zhaw.relation.references	https://zenodo.org/record/3337892#.XswWby-w3yU	de_CH
Appears in collections:	Publications School of Engineering

Files in This Item:

File	Description	Size	Format
2022_Panichella-etal_Test-smells-20-years-later.pdf	Published Version	3.27 MB	Adobe PDF
2022_Panichella-etal_Test-smells-20-years-later_EMSE.pdf	Submitted Version	420.55 kB	Adobe PDF

Panichella, A., Panichella, S., Fraser, G., Sawant, A., & Hellendoorn, V. (2022). Test smells 20 years later : detectability, validity, and reliability. Empirical Software Engineering, 27(7), 170. https://doi.org/10.1007/s10664-022-10207-5

Panichella, A. et al. (2022) ‘Test smells 20 years later : detectability, validity, and reliability’, Empirical Software Engineering, 27(7), p. 170. Available at: https://doi.org/10.1007/s10664-022-10207-5.

A. Panichella, S. Panichella, G. Fraser, A. Sawant, and V. Hellendoorn, “Test smells 20 years later : detectability, validity, and reliability,” Empirical Software Engineering, vol. 27, no. 7, p. 170, 2022, doi: 10.1007/s10664-022-10207-5.

PANICHELLA, Annibale, Sebastiano PANICHELLA, Gordon FRASER, Anand SAWANT and Vincent HELLENDOORN, 2022. Test smells 20 years later : detectability, validity, and reliability. Empirical Software Engineering. 2022. Vol. 27, no. 7, p. 170. DOI 10.1007/s10664-022-10207-5

Panichella, Annibale, Sebastiano Panichella, Gordon Fraser, Anand Sawant, and Vincent Hellendoorn. 2022. “Test Smells 20 Years Later : Detectability, Validity, and Reliability.” Empirical Software Engineering 27 (7): 170. https://doi.org/10.1007/s10664-022-10207-5.

Panichella, Annibale, et al. “Test Smells 20 Years Later : Detectability, Validity, and Reliability.” Empirical Software Engineering, vol. 27, no. 7, 2022, p. 170, https://doi.org/10.1007/s10664-022-10207-5.