Software Engineering Research Group - SERG

Computer Science | Faculty of Engineering, LTH

Active Learning

Experiment package for "On Using Active Learning and Self-training when Mining Performance Discussions on Stack Overflow" (EASE'17)

Raw data


A set of Stack Overflow posts, each stored in a separate text file. The filename corresponds to the post ID. Each file is either in the folder "training set" (i.e., the post has been manually annotated as performance-related or not) or in the folder "unlabelled".
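
The folder layout described above can be indexed with a few lines of Python. This is only a minimal sketch, not part of the experiment package: the function name index_posts and the root path are hypothetical, and it assumes that the post ID is the filename without its extension (e.g., a file named "12345.txt" would hold post 12345).

```python
from pathlib import Path


def index_posts(root):
    """Map each post ID (the filename stem) to its folder name and file path.

    Assumption: the package root contains the folders "training set" and
    "unlabelled", and each file is named after the post ID it contains.
    """
    index = {}
    for folder in ("training set", "unlabelled"):
        directory = Path(root) / folder
        if not directory.is_dir():
            continue  # tolerate a partially extracted package
        for path in sorted(directory.iterdir()):
            if path.is_file():
                index[path.stem] = (folder, path)
    return index
```

Calling index_posts on the extracted package root returns a dictionary keyed by post ID, so the text of any post can then be read with the stored path's read_text method.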

In the root folder, "files.xlsx" lists all files. For each file, it gives the active learning iteration in which the post was annotated (batch 0-16, where A and B denote separate annotators) and its label (1 = related to performance, 0 = not related to performance).

Finally, the folder "components" contains, per iteration, the component names we identified and annotated.

For more information, please contact Markus Borg.