December 2, 2024

An open-source project to explore the capabilities and limitations of LLMs on coding challenges

Image by author (created with Flux 1.1 Pro)

What is this about?

If 2024 taught us anything in the realm of Generative AI, then it is that coding is one of the most promising applications for large language models (LLMs).

In this blog post, I will describe how I am using one of the most advanced LLMs, Gemini Experimental 1121, which currently leads the LMArena Leaderboard, to tackle the Advent of Code challenge.

Image by author

I will outline my approach and share my open-source code repository so that readers can explore it further and replicate the results.

Why should we care?

There are many reasons why LLMs + Coding is an exciting area. To highlight a few:

  • Code is just like language and can be learned the same way by transformer models
  • The output is easily validated: we can just run the code and check whether it does what it should
  • There is a huge demand for code assistants, which can make coders considerably more productive

So, this is definitely an interesting and exciting direction, and I thought it might be fun to explore it a bit more with a hands-on challenge.

The Advent of Code challenge

For those not familiar with the Advent of Code challenge: It is an annual event that runs from December 1st to December 25th, offering daily programming puzzles similar to an advent calendar. Each day, a new two-part puzzle is released where coders can test their coding and problem-solving skills. It’s a fun way for developers of all levels to practice coding.

Both parts of the daily challenge revolve around a similar problem and use the same input data. Here, the idea is to write a Python program that processes the input data and produces a solution (typically a number). Once we have run the code and it has calculated the solution, we can paste it into the website, which will then tell us whether the solution was correct. If so, the second part is unlocked and follows a similar procedure.

The competition runs for 25 days and allows users to collect a maximum of 50 stars (2 per day).

A great challenge for LLMs

As mentioned above, this is a great challenge for LLMs. We can just take the problem statement and plug it into an LLM of our choice, let it produce the code, run the code, and take the solution that was produced and paste it into the website to see if the LLM was successful.

For this project I’m using Gemini Experimental 1121, which has greatly improved coding and reasoning capabilities. It is available through Google’s AI Studio. I use the same system prompt throughout the challenge — it is a zero-shot prompt (no chain-of-thought) with the addition that the code should expect the input via input redirection, like so:

python day01/part1.py < day01/input.txt

The system prompt is:

Provide python code to solve a given puzzle.
Assume there is an input.txt file that can be read
via input redirection in the command line.

I then post the actual challenge, and Gemini creates code that should produce the correct solution. I copy the code into the GitHub repo, run it, and paste the output into the Advent of Code website to see whether it was correct.
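The run step of this workflow can also be scripted. The following is a minimal sketch and not part of the workflow described above: `run_solution` is a hypothetical helper that starts a generated script with the input file attached to standard input, which is what the shell redirection does.

```python
import subprocess
import sys
from pathlib import Path


def run_solution(script: Path, input_file: Path) -> str:
    """Run a solution script with input_file on stdin and return what it printed."""
    with input_file.open("rb") as stdin:
        completed = subprocess.run(
            [sys.executable, str(script)],  # use the current Python interpreter
            stdin=stdin,                    # equivalent to `< input.txt` in the shell
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError if the script exits with an error
        )
    return completed.stdout.strip()
```

Calling `run_solution(Path("day01/part1.py"), Path("day01/input.txt"))` would then return the value that still has to be pasted into the Advent of Code website by hand.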

The repository

Each day’s challenge is organized in its own directory:

dayXX/
├── input.txt # Challenge input
├── part1-problem.txt # Problem description for part 1
├── part2-problem.txt # Problem description for part 2
├── part1.py # Solution for part 1
└── part2.py # Solution for part 2

The part1 and part2-problem text files contain the problems of the challenge as stated by Advent of Code. I also appended the correct solution to the end of each text file:

Image by author
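Because the correct solution is appended to each problem file, a script's output can be compared against it automatically. The sketch below assumes the appended line has the form `### Solution: <value>`, as in the day 1 file; `expected_solution` and `is_correct` are hypothetical helper names, not functions from the repository.

```python
import re
from typing import Optional

# Assumed format of the appended line: "### Solution: <value>"
SOLUTION_LINE = re.compile(r"^###[ \t]*Solution:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def expected_solution(problem_text: str) -> Optional[str]:
    """Return the value from the last '### Solution: ...' line, or None if absent."""
    matches = SOLUTION_LINE.findall(problem_text)
    return matches[-1] if matches else None


def is_correct(script_output: str, problem_text: str) -> bool:
    """Compare a script's printed output with the recorded solution."""
    expected = expected_solution(problem_text)
    return expected is not None and script_output.strip() == expected
```

The values are compared as strings, so the check also works for puzzles whose answer is not a number.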

The Python scripts contain the code as produced by Gemini. To be fully transparent, I also link to the actual conversations so that everyone can see and review the steps:

Image by author

To see an example of one of these chats, head over to my chat with Gemini about the day 1 challenge.

I will record all the results in a table that gives readers a good first overview of how the model has fared so far:

Image by author

Example

To get a better idea of what this looks like, let’s have a look at part 1 of the day 1 challenge. Here is the problem statement:

The Chief Historian is always present for the big Christmas sleigh launch, but nobody has seen him in months! Last anyone heard, he was visiting locations that are historically significant to the North Pole; a group of Senior Historians has asked you to accompany them as they check the places they think he was most likely to visit.

As each location is checked, they will mark it on their list with a star. They figure the Chief Historian must be in one of the first fifty places they'll look, so in order to save Christmas, you need to help them get fifty stars on their list before Santa takes off on December 25th.

Collect stars by solving puzzles. Two puzzles will be made available on each day in the Advent calendar; the second puzzle is unlocked when you complete the first. Each puzzle grants one star. Good luck!

You haven't even left yet and the group of Elvish Senior Historians has already hit a problem: their list of locations to check is currently empty. Eventually, someone decides that the best place to check first would be the Chief Historian's office.

Upon pouring into the office, everyone confirms that the Chief Historian is indeed nowhere to be found. Instead, the Elves discover an assortment of notes and lists of historically significant locations! This seems to be the planning the Chief Historian was doing before he left. Perhaps these notes can be used to determine which locations to search?

Throughout the Chief's office, the historically significant locations are listed not by name but by a unique number called the location ID. To make sure they don't miss anything, The Historians split into two groups, each searching the office and trying to create their own complete list of location IDs.

There's just one problem: by holding the two lists up side by side (your puzzle input), it quickly becomes clear that the lists aren't very similar. Maybe you can help The Historians reconcile their lists?

For example:

3 4
4 3
2 5
1 3
3 9
3 3

Maybe the lists are only off by a small amount! To find out, pair up the numbers and measure how far apart they are. Pair up the smallest number in the left list with the smallest number in the right list, then the second-smallest left number with the second-smallest right number, and so on.

Within each pair, figure out how far apart the two numbers are; you'll need to add up all of those distances. For example, if you pair up a 3 from the left list with a 7 from the right list, the distance apart is 4; if you pair up a 9 with a 3, the distance apart is 6.

In the example list above, the pairs and distances would be as follows:

  • The smallest number in the left list is 1, and the smallest number in the right list is 3. The distance between them is 2.
  • The second-smallest number in the left list is 2, and the second-smallest number in the right list is another 3. The distance between them is 1.
  • The third-smallest number in both lists is 3, so the distance between them is 0.
  • The next numbers to pair up are 3 and 4, a distance of 1.
  • The fifth-smallest numbers in each list are 3 and 5, a distance of 2.
  • Finally, the largest number in the left list is 4, while the largest number in the right list is 9; these are a distance 5 apart.

To find the total distance between the left list and the right list, add up the distances between all of the pairs you found. In the example above, this is 2 + 1 + 0 + 1 + 2 + 5, a total distance of 11!

Your actual left and right lists contain many location IDs. What is the total distance between your lists?

### Solution: 1666427

When given this challenge, Gemini produces this code:

import sys

def solve():
    left_list = []
    right_list = []

    for line in sys.stdin:
        left, right = map(int, line.strip().split())
        left_list.append(left)
        right_list.append(right)

    left_list.sort()
    right_list.sort()

    total_distance = 0
    for i in range(len(left_list)):
        total_distance += abs(left_list[i] - right_list[i])

    print(total_distance)

solve()

Running this code with

python day01/part1.py < day01/input.txt

gives the result:

Image by author

As we can see, Gemini’s code produced the correct result.
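As a quick sanity check, the same sort-and-pair logic can be run on the six-line example from the problem statement. The sketch below is a condensed rewrite for illustration rather than Gemini's output, and `total_distance` is a name introduced here.

```python
def total_distance(left, right):
    """Sum of absolute differences after pairing both lists in sorted order."""
    return sum(abs(a - b) for a, b in zip(sorted(left), sorted(right)))


# The example lists from the puzzle statement (left and right columns).
left = [3, 4, 2, 1, 3, 3]
right = [4, 3, 5, 3, 9, 3]
print(total_distance(left, right))  # 11, matching the total in the puzzle text
```

Sorting both lists first is what pairs the smallest number with the smallest, the second-smallest with the second-smallest, and so on, as the puzzle asks.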

Bonus: Get an explanation

As a bonus, we can also ask Gemini to explain what the code does and why it solves the challenge:

Image by author

You can find the chat with the explanation here.

Conclusion

With this project, I want to explore how capable state-of-the-art LLMs currently are at solving coding challenges. My hypothesis is that Gemini (and other LLMs) have become good enough to solve most of these challenges. Of course, this does not mean that they are fit (yet) to solve real software challenges, which are much more complex.

That being said, I was just curious about this and decided to hop onto this fun little project. I hope you enjoy it and it gives you some insight into where we are headed with LLMs + Coding 🤗

Heiko Hotz

👋 Follow me on Medium and LinkedIn to read more about Generative AI, Machine Learning, and Natural Language Processing.

👥 If you’re based in London, join one of our NLP London Meetups.

Image by author


Google Gemini Is Entering the Advent of Code Challenge was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.

