AD 2022 - Dataset of Student Solutions to Algorithm and Data Structure Programming Assignments (LREC 2022)
This is the data for our paper accepted at LREC 2022. The dataset contains the source code of programming assignments collected in the course "Algorithmen und Datenstrukturen" (algorithms and data structures) at Universität Hamburg. The source code was collected from all students who consented to its collection and distribution in the winter terms 2019/2020, 2020/2021 and 2021/2022.
Authors
- Fynn Petersen-Frey
- Marcus Soll
- Louis Kobras
- Melf Johannsen
- Peter Kling
- Chris Biemann
Licence
This work is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/4.0/ or consult the license.txt file.
For questions regarding licensing, please contact Daniel Sitzmann.
Gathering
All source code was extracted from the online evaluation tool Moodle/CodeRunner as used in the course. Source code in this corpus passed all implemented test cases and is therefore considered correct. The source code was pseudonymised and every student was assigned a random ID.
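As an illustration of what assigning random IDs can look like, here is a minimal Python sketch. It is not the procedure used to build this dataset: the function name `pseudonymise`, the ID length and the hexadecimal ID format are all assumptions made for the example.

```python
import secrets


def pseudonymise(student_names, id_bytes=8):
    """Map each distinct name to a random hex ID (hypothetical helper).

    The IDs are drawn at random, so they carry no information about
    the names they replace.
    """
    mapping = {}
    used_ids = set()
    for name in student_names:
        if name in mapping:
            continue  # the same student always keeps the same ID
        new_id = secrets.token_hex(id_bytes)
        while new_id in used_ids:  # redraw on the (very unlikely) collision
            new_id = secrets.token_hex(id_bytes)
        used_ids.add(new_id)
        mapping[name] = new_id
    return mapping


if __name__ == "__main__":
    # Invented example names; the real data contains no names.
    print(pseudonymise(["alice", "bob", "alice"]))
```

In a setup like this, the mapping would have to be kept private or discarded after use, since anyone holding it could reverse the pseudonymisation.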
Tasks
The dataset contains 21 tasks from three years of the course. For each task, a few example tests were given, and students could attempt the task an unlimited number of times. The task descriptions were originally written in German, but we also provide English translations.
Statistics
Course | 19/20 | 20/21 | 21/22 |
---|---|---|---|
Exercises | 10 | 5 | 6 |
Students | 85 | 91 | 128 |
Correct solutions (abs.) | 541 | 415 | 570 |
Correct solutions (rel. %) | 68.5 | 75.0 | 73.3 |
Test cases | 241 | 142 | 150 |
Avg. task description length | 122.7 | 200.4 | 201.0 |
Avg. lines of code | 25.3 | 21.8 | 16.6 |
Avg. lines of code (Java) | 28.8 | 26.1 | 20.0 |
Avg. lines of code (Python) | 19.7 | 17.7 | 12.7 |
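The per-course figures in the table above can be combined into totals for the whole dataset. The following Python sketch hard-codes three rows of the table; the dictionary layout and the function names are assumptions made for this example and do not reflect the file structure of the archive.

```python
# Values copied from the statistics table; the layout is illustrative only.
STATS = {
    "19/20": {"exercises": 10, "correct_solutions": 541, "test_cases": 241},
    "20/21": {"exercises": 5, "correct_solutions": 415, "test_cases": 142},
    "21/22": {"exercises": 6, "correct_solutions": 570, "test_cases": 150},
}


def totals(stats):
    """Sum each count over all three courses."""
    keys = ("exercises", "correct_solutions", "test_cases")
    return {key: sum(course[key] for course in stats.values()) for key in keys}


def solutions_per_exercise(stats):
    """Average number of correct solutions per exercise, per course."""
    return {
        name: round(course["correct_solutions"] / course["exercises"], 1)
        for name, course in stats.items()
    }


if __name__ == "__main__":
    print(totals(STATS))
    # {'exercises': 21, 'correct_solutions': 1526, 'test_cases': 533}
    print(solutions_per_exercise(STATS))
    # {'19/20': 54.1, '20/21': 83.0, '21/22': 95.0}
```

The exercise total of 21 matches the number of tasks stated in the Tasks section.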
Download
Please download the zip archive of the dataset here.
Acknowledgements
We would like to thank all students who participated in the courses and agreed to the distribution and use of their assignment solutions for scientific purposes. Additionally, we would like to thank Matin Urdu and Ahmad Shallouf for integrating further programming assignments for the courses during the winter semesters 2020/2021 and 2021/2022.
Marcus Soll, Louis Kobras and Melf Johannsen were funded by MINTFIT Hamburg. MINTFIT Hamburg is a joint project of the four STEM universities in Hamburg: Hamburg University of Applied Sciences (HAW), HafenCity University Hamburg (HCU), Hamburg University of Technology (TUHH), University Medical Center Hamburg-Eppendorf (UKE) as well as Universität Hamburg (UHH) and is funded by the Hamburg Authority for Science, Research and Gender Equality (BWFG).
Fynn Petersen-Frey was partly supported by the Cluster of Excellence CLICCS (EXC 2037), Universität Hamburg, funded through the German Research Foundation (DFG).