Tell HN: OpenAI can solve Leetcode problems

I gave OpenAI the following LC prompt:

https://leetcode.com/problems/determine-if-two-strings-are-close

https://gist.github.com/autotune/e6345fab54f6acab0ba95c45ec15f22a#file-openai_test_prompt-txt

It returned the following result that matched 3/3 test cases:

https://gist.github.com/autotune/e6345fab54f6acab0ba95c45ec15f22a#file-openai_test_result-txt

I didn't even have to do any manual formatting; it just generated the solution right where it needed to go. Now that LC has been automated, can we move on to trying something else in the standard software dev interview process?

55 points | by autotune 511 days ago

16 comments

  • teraflop 511 days ago
    This code superficially resembles a correct solution, but it is not correct, and if it passes, it's only because the test cases for this problem are woefully sparse.

    (EDIT: I tried actually submitting it, and it doesn't pass Leetcode's tests, so I'm not sure what OP was talking about.)

    For example, the correct answer for closeStrings("aabbcc", "abbbcc") is false, but this solution incorrectly returns true.

    In addition, it has O(n^3) time complexity and may take a very long time to run. Try closeStrings("a"*1000+"b", "a"*1000+"c"), and then consider that the problem statement allows each string to be up to 100,000 characters in length.
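
    For reference, a minimal sketch of a correct approach (in Python; it relies on the standard observation that two strings are "close" iff they use the same set of characters and have the same multiset of character frequencies):

        from collections import Counter

        def closeStrings(word1: str, word2: str) -> bool:
            c1, c2 = Counter(word1), Counter(word2)
            # Operation 1 (swapping) reorders characters freely; operation 2
            # permutes which character carries which frequency, but only among
            # characters actually present in the string. So the strings are
            # "close" iff the character sets match and the sorted frequency
            # lists match. Effectively O(n).
            return set(c1) == set(c2) and sorted(c1.values()) == sorted(c2.values())

        # closeStrings("aabbcc", "abbbcc") -> False, as expected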

    • b20000 511 days ago
      that doesn’t really matter. leetcode interviews are about producing the solution the interviewer expects.
      • saxonww 511 days ago
        Maybe I've just drunk the koolaid, but as someone who has given these 'coding challenge' interviews, I:

        - Do go into them expecting to see a certain solution.

        - Do not penalize people for giving me a different one, if it works. I think that's awesome, personally.

        - Do ask people if they can think of additional test cases vs. those provided, and I penalize if there are obvious ones that they don't come up with, or if they just give me a bunch of variations of what we've already provided.

        - Do consider whether people actually read and/or understood the prompt (I read it to them). If I've asked you to write a function that returns a boolean, and you write a function that prints 'true' and doesn't return anything, and after some gentle prodding you don't get it, then you didn't solve the problem.

        - Do ask for you to talk through your solution and approach so I understand how you're thinking about the problem.

        I need to know whether you can code, and more importantly that you understand why you're doing what you're doing. I can't ask your prior employer. Not everyone has open source contributions or a portfolio of work they can show me, and no, I am not going to just look at your resume and say "well, they worked for a software company before so clearly they know what they're doing."

        I prefer take-home exercises, where there's less manufactured pressure. I think they also lend themselves to a wider variety of challenges vs. the online 1-hour test using coderpad or something, meaning you may be able to ask something more appropriate for the role. But this approach upsets people as well.

        • haswell 511 days ago
          > I need to know whether you can code, and more importantly that you understand why you're doing what you're doing.

          I haven't gone through the interview process since 2013, so I might be a bit behind here, but are these challenges still happening under pressure in a highly time constrained environment?

          Because if they are, and if you interview me, you will probably not learn whether or not I can code, or if I know why I'm doing what I'm doing.

          I know this is a subject that has been discussed to death, but all this tells you is whether the candidate is good at high pressure coding exercises.

          I'm on sabbatical and half considering semi-retirement, but economic pressures are pushing me in the direction of seeking work. If there's one thing I do know, it's that I will pass on companies who insist on this kind of exercise.

          Let me take it home, and that's a different story. I know that not everyone will be happy with every approach, but time and time again, I've found the high pressure exercises useless, and that includes when I'm interviewing candidates.

          One dev team at my former company passed on me because I didn't do well in one of these high pressure exercises. A different team picked me up (they didn't believe in this type of exercise), and I spent over 8 years at the company and was known as a top performer.

          • saxonww 511 days ago
            I try to point out at the beginning that we understand it's manufactured pressure, and try to ease their minds if we can before getting started. In particular, we ask people to solve the problems in whatever language they are most comfortable with, and that it's an open book test; we're not testing you on whether you've memorized the Python standard library. This can certainly make things interesting! I've spent most of my career with perl, python, c#, and java, so the guy deciding to do his challenge in clojure was perhaps more of a challenge for me than it was for him.

            I don't think there's a universal definition of 'high pressure,' so I am not sure what else to say about this. It's certainly not a fishbowl experience where you're surrounded by 20 people yelling and pleading for you to save their lives by writing a function to reverse a string in place. And while I've read that this happens elsewhere, we're not critiquing you on the spot and arguing with you about why using this technique causes the interpreter to do some less efficient thing and how could you be so stupid.

            We try to coach and offer suggestions though, to help a candidate get to the right answer. That doesn't mean they necessarily passed - if we have to basically give you the answer, then you didn't solve the problem, and the coding interview is one of several inputs into a hiring decision - but I don't like to let candidates leave feeling like they failed if I can help it. I like to find at least one thing to praise about the solutions they turn in and tell them why I liked what they did. It is not the point of these interviews to ruin someone's day or make them feel worthless.

            All that said, we can only do so much. Ultimately whether you feel an environment is high pressure is on you, and how you handle it is also on you. We would regret missing out on a good candidate but I'm sure it happens. There are other candidates, and other opportunities for those we've missed out on.

            • quickthrower2 511 days ago
              High pressure to me, in this case, is because the stakes are high. And if not enough time is given, that adds pressure. The whole way I think in these circumstances is different from on the job: there's more shortcutting, looking for artificial tricks, and so on. On the real job there's more deep, strategic thinking, looking for the real-world Murphy's law type things that'll bite you.

              But interviewing, i.e. determining if you want 5k hours of work from someone in less than 5 hours, is going to be heuristic and imperfect. Always.

            • haswell 511 days ago
              > Ultimately whether you feel an environment is high pressure is on you, and how you handle it is also on you.

              Obviously that is subjective, but this is on everyone involved, especially the company creating said environment. Twitter 2.0 is an obvious (if extreme) example.

              The bottom line is that the whole approach is manufactured, and not indicative of real world problem solving skills or what the candidate will be dealing with on the team.

              > We would regret missing out on a good candidate but I'm sure it happens.

              It absolutely happens, and I’d bet good money that it happens often. Whether they realize it or not, most teams that take this approach end up creating an environment that is skewed towards a particular kind of personality, and that is ultimately a net negative in the long run.

              Back to my personal anecdote (which I’ve observed over and over throughout my career), the teams that rely too heavily on this approach end up rather dysfunctional and unbalanced over time. The team that turned me down was disbanded after some horrific security issues came to the surface. They were so focused on whether candidates could solve certain algorithms that they apparently didn’t verify that these devs understood the fundamentals of client-side vs. server-side validation and why they’re important (and not the same thing). But those devs could bang out a chess algorithm on a whiteboard with the best of them.

              I’m not saying that’s how you’re running things, but in my experience, teams that overemphasize this are doing so at the expense of far more important indicators. It takes a lot of discipline and care to not let this become a crutch.

              I’m also not saying it isn’t important to verify that a candidate has the necessary skills, but having a conversation about a person’s learning style, how they got into development, how they go about taking on new tasks, etc. can go a long way. I care more about mindset and approach to information acquisition than I do about what a candidate can whip up in an hour.

              Sitting down together and going over some existing code (after giving them enough time to absorb it) while having a discussion about the nuances of that code is far more collaborative, and in my experience far more likely to indicate whether or not the candidate knows what’s going on.

              I’ve spoken to many like-minded and high-performing colleagues who will not consider companies that are overly focused on this.

              Another colleague was recently offered a position after going through such a process, but bailed when they failed to follow up with enough meaningful conversation. Her conclusion: if that’s all they care about, I don’t trust that they’ve built a good team.

              There’s a growing contingent of the types of people who you’d want the most, but will not consider your team for doing this.

              • saxonww 511 days ago
                > I’m not saying that’s how you’re running things

                Good, because we're not.

                The main takeaway here, which I think we agree on, is that a good hiring decision requires more than one input. We happen to use a coding challenge as one input in our decisions, but I agree that a collaborative discussion could work too. In particular, we do ask about background and learning style, among other things, in subsequent interviews.

                Overall, if I had to attach a percentage to it, I'd say the coding challenge is no more than 25% of the overall decision, with the provision that if you clearly just don't know what you're doing at all - which has happened just a handful of times that I can remember - we're not going to invest more time in you as a candidate.

                > There’s a growing contingent of the types of people who you’d want the most, but will not consider your team for doing this.

                All I can say to this is that you're probably right, but also that almost no one is as uniquely qualified as they think they are. There are millions of candidates, and especially now that more software companies are OK with remote work, we can consider more of them than before. This is really good and really bad, the latter because employers are even more likely to :shrug: at this sort of thing and move to an equivalently-capable candidate who doesn't complain about having a one hour coding test.

                • haswell 511 days ago
                  Yeah, I think we’re on the same page regarding the importance of multiple inputs.

                  > In particular, we do ask about background and learning style, among other things, in subsequent interviews.

                  In the past, I’ve done the opposite, and started with these questions up front. In my mind, they’re the most important thing to learn about a candidate, but I realize not everyone goes about this the same way. It does require more time investment up front though, and I get that.

                  > I'd say the coding challenge is no more than 25% of the overall decision, with the provision that if you clearly just don't know what you're doing at all - which has happened just a handful of times that I can remember - we're not going to invest more time…

                  If this is the case, wouldn’t that be evidence that this process should be de-emphasized even further, or at least moved to a later stage?

                  Correct me if I’m misreading this, but if almost everyone coming through has had some level of success with the coding stage, it’s not really screening many candidates out, but still could be changing the candidate’s view of your team and company. If it’s the first thing they encounter, it’s a big part of the first impression the candidate forms.

                  In a worst case scenario, you may drive someone good away with minimal benefit to the hiring process, and in a best case scenario, you’ve only saved a few conversations given the low failure rate.

                  > but also that almost no one is as uniquely qualified as they think they are.

                  I completely agree with you here, I just don’t believe an artificial coding test is the antidote to that. The more experience and confidence someone has (warranted or not), the less they feel they need to put themselves through pain for their next role.

                  Really not trying to be difficult here, but trying to understand the mindset and strategy behind this style of hiring, and I’m not quite seeing where the value is coming from based on what you’ve shared.

                  I will say that what you describe is one of the more reasonable stances I’ve come across, but still seems potentially counterproductive.

                  I do hear you regarding the access to millions of candidates. It’s a great problem to have, I just wonder if the old ways are still relevant.

                  Thanks for being willing to share your thought processes on this - it’s what I love about this site.

        • bobleeswagger 511 days ago
          > I need to know whether you can code

          On the flip-side, how much time do you spend explaining to an interviewee what hurdles they must overcome in order to code in the first place? Cultural blocks? Inefficiencies that don't have an owner or visibility to the greater org?

          If your infra is well architected and any random engineer can start contributing within a week to a month, kudos.

          In my experience, way too much energy is put into "can they do the job?" when the energy should be in "is this an environment where engineers can do their job?"

          • saxonww 511 days ago
            This is a good question. At least where I'm working, we have multiple conversations with a candidate throughout the interview process. When I have the opportunity, I do talk about technical or process challenges we have and try to be up front about aspects of a job that we know from experience can be a problem for some people.

            For example, I'm on an internal tooling team, and so there's a support component where we have to be accessible and accountable to devs who encounter problems using tools we've created. We have a support rotation where one person a sprint is primarily tasked with being the first contact for people with questions, problems, etc. We've had people on the team before who didn't like this and it showed because they put little effort into it. So now we're very up front during the interviews that this is part of the job; we don't want to hire someone who will be unhappy in the role. It's what we think is the best we can do.

            It's hard to answer about cultural blocks, because I haven't experienced those myself. I'm not sure how to even approach that; "hey, I see you're $different, here's some potential challenges you may need to deal with" ? That seems like it is as likely to offend as aid. That said, we do try to get a candidate in front of a variety of people, both to get a sense of how they deal with different people, and to give the candidate an opportunity to get feedback from people in different parts of the organization.

        • whimsicalism 511 days ago
          > But this approach upsets people as well.

          Ultimately you have to realize that any process that is gating people from an opportunity & resources is going to upset people.

          People have an amazing capacity to make up elaborate reasons and narratives for things to be unfair, especially when there is a group of them.

        • b20000 511 days ago
          you forgot to mention that you don't care about their experience relevant to the position, other projects they completed successfully on their own, or code they have written themselves for these projects, and you also forgot to mention that you expect them to provide the optimal solution in under 2 minutes, and if they don't, they are out.

          you also seem to think that everyone is able to talk while they are coding, which is not the case.

      • dogbox 511 days ago
        What kind of interviewer expects a solution that's not only incorrect, but also woefully inefficient?
        • actually_a_dog 511 days ago
          Nobody expects that, obviously, but such solutions do occasionally get accepted (by humans). Beyond a certain basic level of competency, a lot of interviews, including LC-based ones, are about how good the candidate is at story telling and/or explaining their thought process.
      • nine_zeros 511 days ago
        > that doesn’t really matter. leetcode interviews are about producing the solution the interviewer expects.

        This. In an interview, it doesn't matter. The interview process only selects for correct results, not for an individual's ability to solve problems.

        • teraflop 511 days ago
          But this is a demonstration that OpenAI doesn't produce correct results, which becomes obvious as soon as you hit the "Submit" button on Leetcode.
      • teraflop 511 days ago
        So you're saying the interviewer expects a wrong solution, and the wrong solution they expect is the same wrong one the AI generates (which happens to be twice as long as the correct version), and the interviewer doesn't bother to run the tests?
        • whimsicalism 511 days ago
          Hacker News generally doesn't make much sense on topics adjacent to leetcode or tech interviews; best to stay away.
          • dogbox 511 days ago
            This thread is making me feel like I'm taking crazy pills! Staying away is great advice.
      • autotune 511 days ago
        Exactly. Would I use this for actual production code in any way right now? Probably not. In an interview setting where the interviewer thinks testing candidates through Leetcode is a good strategy? Maybe just to see if I could use it to make it to the next round.
      • d0liver 511 days ago
        If all they expect are passing test cases, then just use canned values in your code to make the tests pass. This is way more performant and even faster to write.
        • teraflop 511 days ago
          Part of the point of this style of challenge is that you're not given all of the test cases up front. And even if you are, you generally can't come up with those "canned values" without actually figuring out an algorithmic solution.

          As I said, some of the test cases for this particular problem can be strings up to 100,000 characters long. Do you really think it's faster to solve those by hand than to come up with a correct algorithm to do it for you?

  • going_ham 511 days ago
    Dear executives,

    You have AI that is capable of solving your interview questions. These questions were designed to select the best candidates. Now you have AI that is cheaper and faster. Please, fire your engineers and embrace the next generation of programming. Why would you need an engineer when a bot will do it for you? After all, your candidates suck at leetcode interviews.

    • jstx1 511 days ago
      Dear candidate,

      What's even cheaper and faster than the AI solution is for us to look up the answer in our own database. It turns out that we already know the answers to all the questions we ask in interviews. We're afraid that you've completely misunderstood the goals of our interviewing process.

      • spoils19 511 days ago
        What are the goals then?
    • paulcole 511 days ago
      > These questions were designed to select the best candidates

      I’d reconsider this assumption. Mine is probably also wrong, but in a different way. I think these questions were designed to (in no particular order):

      • Identify hirable candidates

      • Be very scalable and allow for the interviewing, rejection, and acceptance of hundreds/thousands of candidates

      • Determine who would put the work in to pass this sort of screening

    • riku_iki 511 days ago
      > These questions were designed to select the best candidates.

      more like do initial screening and filter out garbage candidates?

    • hcks 511 days ago
      I’m sorry you still can’t code a medium leetcode after all these years
  • samvher 511 days ago
    It could be that it was trained on leetcode content? I.e. that it has seen this problem more or less verbatim. Any information on whether that would be the case? I'm not so familiar with what OpenAI do and don't use for training.
  • b20000 511 days ago
    thanks. since AI is not yet capable of generating novel/unique output beyond its training, but can only learn to remix and transform what is already there, it makes sense that it would do well on leetcode.

    this implies that hiring people via leetcode interviews will result in staff who can only copy paste and remix standard solutions they have been trained on and are incapable of being creative and generating completely new or unorthodox solutions.

    the latter is what most companies need, but don’t yet know they need.

    leetcode was the wet dream of executives: a standard, factory-like process to produce more code drones in a cheap and efficient way. the next wet dream is replacing them with AI. they forget that software development is a creative process.

    • GaggiX 511 days ago
      >thanks. since AI is not yet capable of generating novel/unique output beyond its training, but can only learn to remix and transform what is already there, it makes sense that it would do well on leetcode.

      I guess you're implying that this doesn't happen with humans as well. But if you give a pen to an infant, or to a man who has always lived in a cave, he's not going to draw anything meaningful either, just like an untrained AI. My opinion is that everything is derivative.

    • nicholasjarnold 511 days ago
      > they forget that software development is a creative process

      Exactly. Very few executives have read Hackers & Painters [0], from what I've been able to gather anecdotally. Almost nobody should be fearful yet that low-code solutions and/or AI will replace their livelihood.

      [0] - http://www.paulgraham.com/hackpaint.html

  • tarr11 511 days ago
    The kind of work solved by leetcode will be solved by AI. Testing for it in interviews in a few years will make no more sense than testing whether people can generate optimized machine code. (Unless you are a super specialized systems developer who actually writes such things.)

    This will enable engineers to focus on higher order problems, similar to how compilers and programming languages allowed engineers to be more productive than having to write assembly.

    The way this plays out for software development is that we may include GPT prompts in our code repositories and treat them as library functions or macros, which could be compiled like regular code or interpreted at runtime.
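
    As a sketch of that idea (a hypothetical wrapper; it uses the openai Python package's legacy Completion API, and all names are illustrative):

        import openai

        def gpt_macro(prompt_template: str):
            # Treat a stored prompt like a library function: fill in the
            # arguments, call the model at runtime, and return its text.
            def call(**kwargs):
                resp = openai.Completion.create(
                    model="text-davinci-003",
                    prompt=prompt_template.format(**kwargs),
                    max_tokens=256,
                )
                return resp["choices"][0]["text"]
            return call

        # Hypothetical usage: a prompt checked into the repo, called like a function.
        summarize = gpt_macro("Summarize the following text:\n{text}\n")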

  • jstx1 511 days ago
    It's cool if it can really do it (as opposed to spitting out wrong solutions that look plausible at a glance), but I don't think it's a good argument against LC interviews. The point is that they're testing your understanding of general CS concepts and how you communicate about code; a language model being able to solve these problems doesn't really change any of this. It's not like the company cares about your actual solution, they already know what the solution is.
    • autotune 511 days ago
      >It's not like the company cares about your actual solution

      Every time I have ever been tested in any kind of interview, they 100 percent care more about the solution than how to get there. Some of the honest ones have admitted it, some of the others have pretended they don't to try and make you feel better about not reaching it in time. I have never passed an interview with a live coding round, especially one involving leetcode, where they were fine with me not having reached a solution in the 10, 35, or 45 minute time window given.

      • jstx1 511 days ago
        I think I phrased it in a confusing way. They care about your solution in the sense that they want to know if/how you'll solve the problem. They don't care in the sense that they aren't trying to learn the answer by interviewing you. It's not like they have a critical red-black tree problem and they called you in to fix things for them. They know the solution so the existence of an additional out-of-interview way to get to the solution like a language model doesn't change anything about the usefulness of the interview.
  • heurist 511 days ago
    It solves easy problems and gets nowhere close on hard problems. But I'm sure that's coming.
    • sayogo1227 511 days ago
      Here is what I tried for (https://leetcode.com/problems/longest-increasing-path-in-a-m...)

      INPUT:

      Given an m x n integers matrix, return the length of the longest increasing path in matrix.

      From each cell, you can either move in four directions: left, right, up, or down. You may not move diagonally or move outside the boundary (i.e., wrap-around is not allowed).

      Input: matrix = [[9,9,4],[6,6,8],[2,1,1]]
      Output: 4
      Explanation: The longest increasing path is [1, 2, 6, 9].

          class Solution:
              def longestIncreasingPath(self, matrix: List[List[int]]) -> int:

      ChatGPT:

                  if not matrix or not matrix[0]:
                      return 0

                  # initialize the 2D array with all zeros
                  dp = [[0] * len(matrix[0]) for _ in range(len(matrix))]

                  # initialize the result with 0
                  result = 0

                  # iterate through each cell in the matrix
                  for i in range(len(matrix)):
                      for j in range(len(matrix[0])):
                          # perform DFS starting from the current cell
                          result = max(result, self.dfs(matrix, dp, i, j))

                  # return the result
                  return result

              # helper function to perform DFS
              def dfs(self, matrix, dp, i, j):
                  # if the path length is already calculated, return it
                  if dp[i][j] > 0:
                      return dp[i][j]

                  # initialize the path length with 1
                  path_len = 1

                  # check the adjacent cells
                  for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
                      # get the coordinates of the adjacent cell
                      x, y = i + dx, j + dy

                      # check if the adjacent cell is valid
                      if 0 <= x < len(matrix) and 0 <= y < len(matrix[0]):
                          # check if the adjacent cell has a larger value
                          if matrix[x][y] > matrix[i][j]:
                              # perform DFS on the adjacent cell and update the path length
                              path_len = max(path_len, 1 + self.dfs(matrix, dp, x, y))

                  # store the path length in the 2D array
                  dp[i][j] = path_len

                  # return the path length
                  return path_len
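
      (For completeness: with List imported from typing, Solution().longestIncreasingPath([[9,9,4],[6,6,8],[2,1,1]]) returns 4, matching the expected output above.)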
      • DenisM 511 days ago
        Have you tried submitting it?
        • sayogo1227 511 days ago
          Yes. Accepted.
          • DenisM 511 days ago
            So what do we make of that? A really good search engine, or an entity that can synthesize?
  • yakkityyak 511 days ago
    We've basically just automated copying and pasting stack overflow answers.
  • wizofaus 511 days ago
    I gave it a similar example that happened to be a function I needed to write recently (case-insensitive substring replacement, which is missing from the version of the .NET Framework I was stuck with), and the solution it gave looked superficially correct too (and very similar to what I wrote), except it actually lower-cased everything, which I wouldn't consider correct. I just had to re-prompt "now without using ToLower", and it gave the correct solution. I haven't tried Copilot yet, but I'd be curious to see how similar they are.
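
    For illustration, a rough Python analogue of the behavior I wanted (the function name is mine; the actual code was C# against the .NET APIs):

        import re

        def replace_ci(text: str, old: str, new: str) -> str:
            # Match `old` case-insensitively, but leave the casing of the
            # rest of `text` untouched (the generated solution's mistake was
            # lower-casing everything). The lambda keeps any backslashes in
            # `new` from being treated as backreferences.
            return re.sub(re.escape(old), lambda _: new, text, flags=re.IGNORECASE)

        # replace_ci("Hello World, hello!", "hello", "goodbye")
        # -> "goodbye World, goodbye!"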

    I then asked it to give me a class to convert a Stream to a seekable Stream (something else I needed to do recently), and again, I was pretty impressed with the result (though I needed to avoid using MemoryStream so it could work with potentially huge streams; I tried to explain that, but the solution didn't look right at all - basically the only way you can do it is by storing the data on disk).
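
    In Python terms, the disk-backed approach looks roughly like this (my sketch, not the generated C#; the memory limit is an arbitrary example):

        import shutil, tempfile

        def make_seekable(stream, mem_limit=16 * 1024 * 1024):
            # Copy a forward-only stream into a buffer that spills to disk
            # once it grows past mem_limit, so huge streams never have to
            # sit fully in memory.
            buf = tempfile.SpooledTemporaryFile(max_size=mem_limit)
            shutil.copyfileobj(stream, buf)
            buf.seek(0)
            return buf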

  • frontman1988 511 days ago
    It's basically ingested the question and the answer. It won't be able to solve a new question, and just changing the language of the question fools the AI. For example, it's not trained on the Codeforces competitive programming website and fails to solve most problems like this one:

    https://codeforces.com/contest/1672/problem/D

    Leetcode is a hazing ritual, and it doesn't matter if AI excels at it; what matters is whether human programmers are able to 'perform' in front of an interviewer. So leetcode is here to stay; otherwise you will have these competitive programming problems replacing it, turning interviewing into an even bigger nightmare.

  • aaek9 501 days ago
    Can you write a cover letter to become chief marketing manager at a car wash?
  • tmaly 511 days ago
    GPT-4 might be able to handle it.

    All we need now is a pocket AI device out of one of William Gibson's novels, paired to a Neuralink, and one can solve the hardest interview questions.

  • DenisM 511 days ago
    As time goes more content on the internet will be produced by these models, and eventually these models will be learning from each other rather than from humans.

    That kind of snake eating its own tail will likely accumulate errors over time, and eventually the internet will be just chaos.

    • jstx1 511 days ago
      Wouldn't it converge to a bad average of everything instead of descending into chaos?
      • DenisM 511 days ago
        It’s not obvious to me. It might degenerate into an alphabet soup, or it might stabilize around something higher level. I guess it’s on the same spectrum.
  • DonsDiscountGas 511 days ago
    I've tried a few easy-levels and it passed them all. I tried one medium and it completely failed.

    I especially liked its solution for two sum; first it gave the O(N^2) solution and then it gave the O(N) solution. With good explanations for both.
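
    For reference, a sketch of the usual O(N) hash-map idea (my sketch, not ChatGPT's exact output):

        def two_sum(nums, target):
            # One pass: remember each value's index, and for each element
            # check whether its complement has already been seen.
            seen = {}
            for i, n in enumerate(nums):
                if target - n in seen:
                    return [seen[target - n], i]
                seen[n] = i
            return []

        # two_sum([2, 7, 11, 15], 9) -> [0, 1]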

  • DenisM 510 days ago
    There’s no limit to what you can do when testing your model on the training dataset.
  • bugfix-66 511 days ago
    It was trained on numerous Leetcode solutions (question/answer), which you can find on GitHub. People are proud of their Leetcode efforts: they post their solutions, as if it is an achievement.

    Now tell me, how many of the BUGFIX-66 problems can it solve?

    https://BUGFIX-66.com

    Copilot can solve a few of the simple ones at the beginning (simple matrix multiplication, simple radix sort, etc., which appear often in the training data), and some of the harder ones whose solutions appear on GitHub (e.g., the uncorrected prediction/correction compressor/decompressor, whose solutions were front-page on Hacker News).

    But as you go down the list, into material outside the training set, the language model won't be able to solve it.

    Try it. Paste the title, the introductory text, and the unfixed code, and see whether the omnipotent AI can solve these (quite straightforward) problems. For how many does it need the hint?

    • kapsi 510 days ago
      Let me check in a year