Thank you for participating in this study to evaluate the quality of AI-generated code. Your expertise is crucial in helping us understand the performance of different code generation models.
Your Task: You will be presented with 10 code generation tasks. For each task, you will see:
1. A Task Description, which includes a natural language objective and the available programmatic context.
2. The Ground Truth solution for the task, along with three anonymized Code Solutions (A, B, C) generated by different AI models.
Please use the following rubric to score each code solution on a scale of 0 to 2 for each criterion.
Criterion 1: Correctness (Does the program satisfy the given requirement?)
- 0 points: The program is totally inconsistent with the requirement.
- 1 point: The program partially satisfies the requirement, but misses some details.
- 2 points: The program is correctly implemented.
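To make the 1-point vs. 2-point distinction concrete, here is a hypothetical example (not taken from the study tasks) for an assumed requirement "Return the n-th Fibonacci number, where fib(0) = 0 and fib(1) = 1":

```python
# Hypothetical requirement: "Return the n-th Fibonacci number,
# where fib(0) = 0 and fib(1) = 1."

# Likely 1 point: implemented, but misses a detail of the
# requirement -- it returns 1 for n == 0 instead of 0.
def fib_partial(n):
    if n <= 1:
        return 1  # incorrect base case for n == 0
    return fib_partial(n - 1) + fib_partial(n - 2)

# Likely 2 points: matches the stated requirement exactly.
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)
```

A 0-point solution would be one that ignores the requirement entirely, for example computing factorials instead.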
Criterion 2: Maintainability (Is the implementation standardized and does it have good readability?)
- 0 points: The program does not follow a consistent specification, uses many meaningless variable names, or has repetitive and redundant code.
- 1 point: The program implementation largely follows coding conventions, but some variable names could be further refined.
- 2 points: The program implementation is relatively standardized, variable naming is semantically straightforward, and readability is good.
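As an illustration of the maintainability scale, here is a hypothetical pair of functionally identical solutions (not taken from the study tasks) that would likely receive different Maintainability scores:

```python
# Likely 0 points: meaningless names, a redundant else branch,
# and no indication of intent.
def f(a):
    r = 0
    for x in a:
        if x % 2 == 0:
            r = r + x * x
        else:
            r = r + 0  # redundant no-op branch
    return r

# Likely 2 points: a semantically clear name, a docstring,
# and an idiomatic, readable implementation.
def sum_of_even_squares(numbers):
    """Return the sum of the squares of the even numbers."""
    return sum(n * n for n in numbers if n % 2 == 0)
```

Both functions produce the same results, so this difference would affect only the Maintainability score, not the Correctness score.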