Comparing Large Language Models and Human Programmers for Generating Programming Code
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Wiley, 2025-02-01 |
| Series: | Advanced Science |
| Subjects: | |
| Online Access: | https://doi.org/10.1002/advs.202412279 |
| Summary: | Abstract The performance of seven large language models (LLMs) in generating programming code using various prompt strategies, programming languages, and task difficulties is systematically evaluated. GPT-4 substantially outperforms other LLMs, including Gemini Ultra and Claude 2. The coding performance of GPT-4 varies considerably with different prompt strategies. In most LeetCode and GeeksforGeeks coding contests evaluated in this study, GPT-4, employing the optimal prompt strategy, outperforms 85 percent of human participants in a competitive environment, many of whom are students and professionals with moderate programming experience. GPT-4 demonstrates strong capabilities in translating code between different programming languages and in learning from past errors. The computational efficiency of the code generated by GPT-4 is comparable to that of human programmers. GPT-4 is also capable of handling broader programming tasks, including front-end design and database operations. These results suggest that GPT-4 has the potential to serve as a reliable assistant in programming code generation and software development. A programming assistant is designed based on an optimal prompt strategy to facilitate the practical use of LLMs for programming. |
|---|---|
| ISSN: | 2198-3844 |