Experiment package for "On Using Active Learning and Self-training when Mining Performance Discussions on Stack Overflow" (EASE'17)
The package contains a set of Stack Overflow posts, each stored in a separate text file whose filename corresponds to the post ID. Each file is in either the folder "training set" (i.e., it has been manually annotated as performance-related or not) or the folder "unlabelled".
In the root folder, "files.xlsx" lists all files together with the active learning iteration in which each was annotated (batch 0-16; A and B denote separate annotators) and its label (1=related to performance, 0=not related to performance).
Finally, the folder "components" contains, per iteration, the component names we identified and annotated.
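The folder layout described above can be read with a few lines of Python. This is a minimal sketch, not part of the original package: the helper name load_posts is hypothetical, and it assumes that the post ID is the filename without its extension and that the files decode as UTF-8 (undecodable bytes are replaced rather than raising an error). Labels are not read here, since the column layout of "files.xlsx" is not described in this document.

```python
from pathlib import Path


def load_posts(root, folder):
    """Return a dict mapping post ID to post text for one folder.

    Assumption: the post ID is the file name without its extension.
    Assumption: files are UTF-8; undecodable bytes are replaced.
    """
    posts = {}
    for path in sorted(Path(root, folder).iterdir()):
        if path.is_file():
            posts[path.stem] = path.read_text(encoding="utf-8", errors="replace")
    return posts
```

Assuming the package is unpacked in the current directory, load_posts(".", "training set") would return the annotated posts and load_posts(".", "unlabelled") the remaining ones.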
For more information, please contact Markus Borg.