Published May 31, 2018 | Version v1
Dataset Open

Code Smells and their Collocations: A Large-scale Experiment on Open-source Systems

  • 1. Faculty of Computing, Poznań University of Technology, Poznań, Poland
  • 2. Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milano, Italy
  • 3. Software Institute, Faculty of Informatics, USI Lugano, Switzerland

Description

This dataset includes classes with code smells, acquired from the Qualitas Corpus (QC).
Folder 'all' contains data coming from the QC rev.20130901 (92 systems).
Folder 'domains' contains data coming from QC rev.20111026 (76 systems updated to their most recent releases from rev.20130901). 
Folder 'pca' contains the results of the principal component analysis (PCA), generated with the R prcomp() function for regular PCA and the logisticPCA() function for the binary data.

Filenames include the base release of the QC and a number (25, 50 or 75) that specifies the minimum percentage of the available detectors that had to identify a specific smell instance (25%, 50%, and 75%, respectively). For example, if a given code smell in a class X has been identified by 1 out of 4 available detection tools, then the smell for class X is reported in file 25, but not in 50 or 75. Note that for smells detected with only one tool, the values are equal in all datasets (in that case, the smell was detected by either 0% or 100% of the tools).
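
The thresholding rule can be sketched in a few lines of Python. This is only an illustration, not the code used to build the dataset: the function name threshold_files is hypothetical, and the threshold is assumed to be inclusive, as the example with 1 out of 4 tools suggests.

```python
# Hypothetical helper illustrating the 25/50/75 rule; not part of the dataset.
THRESHOLDS = (25, 50, 75)


def threshold_files(detected_by: int, available_tools: int) -> list:
    """Return the thresholds (25, 50, 75) whose files would report the smell.

    detected_by: number of tools that flagged the smell in a class.
    available_tools: number of tools able to detect that smell.
    Assumption: a smell is reported when the share of detecting tools is
    greater than or equal to the threshold percentage.
    """
    if available_tools <= 0 or not 0 <= detected_by <= available_tools:
        raise ValueError("expected 0 <= detected_by <= available_tools, with available_tools > 0")
    # Integer arithmetic avoids floating-point rounding at the boundaries.
    return [t for t in THRESHOLDS if 100 * detected_by >= t * available_tools]


print(threshold_files(1, 4))  # smell found by 1 of 4 tools
print(threshold_files(1, 1))  # smell with a single detector, found
print(threshold_files(0, 1))  # smell with a single detector, not found
```

With a single available tool the result is either empty or all three thresholds, which is why the values for such smells are identical across the 25, 50 and 75 files.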

In all files, "1" denotes that the smell was identified (subject to the detector threshold described above), and "0" that the smell was not found in a given class.
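
Because each class is encoded as a row of 0/1 flags, smell collocations can be counted directly from the rows. The sketch below uses a toy in-memory table rather than the actual files: the smell names, the row layout (one dictionary per class) and the function name count_collocations are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def count_collocations(rows):
    """Count how often each pair of smells occurs together in one class.

    rows: iterable of dicts mapping a smell name to 0 or 1 for a single class.
    Returns a Counter keyed by alphabetically ordered (smell_a, smell_b) pairs.
    """
    counts = Counter()
    for row in rows:
        # Keep only the smells flagged with 1, in a stable order.
        present = sorted(name for name, flag in row.items() if flag == 1)
        counts.update(combinations(present, 2))
    return counts


# Toy data with hypothetical smell names; not taken from the dataset.
toy_rows = [
    {"God Class": 1, "Feature Envy": 1, "Data Class": 0},
    {"God Class": 1, "Feature Envy": 0, "Data Class": 0},
    {"God Class": 1, "Feature Envy": 1, "Data Class": 1},
]

for pair, n in count_collocations(toy_rows).most_common():
    print(pair, n)
```

Running the same function over the rows of the 25, 50 and 75 files would show how the stricter thresholds change the observed pairs.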

The filename also includes the domain abbreviation (app, css, dev, dgdv) or the keyword ALL, which indicates that the dataset includes data from all domains.

The smells were detected by 11 tools, most of which detect more than one smell.
The tool used to detect a given smell is named in the header of each file. Additionally, the 'smell detectors.csv' file lists the smells detected by each tool.

Files

Code_Smells_Dataset.zip

Files (3.0 MB)

md5:13f5adb3d883d0625bda10564c9b1e5f
3.0 MB