"It’s very unsettling, then, to find myself feeling like I’m in danger of being left behind - like I’m missing something. As much as I don’t like it, so many people have started going so hard on LLM-generated code in a way that I just can’t wrap my head around.
...
I’ve been using Copilot - and more recently Claude - as a sort of “spicy autocomplete” and occasional debugging assistant for some time, but any time I try to get it to do anything remotely clever, it completely shits the bed. Don’t get me wrong, I know that a large part of this is me holding it wrong, but I find it hard to justify the value of investing so much of my time perfecting the art of asking a machine to write what I could do perfectly well in less time than it takes to hone the prompt.
You’ve got to give it enough context - but not too much or it gets overloaded. You’re supposed to craft lengthy prompts that massage the AI assistant’s apparently fragile ego by telling it “you are an expert in distributed systems” as if it were an insecure, mediocre software developer.
Or I could just write the damn code in less time than all of this takes to get working."
Well there's your problem. Nobody does role-based prompts anymore, and the entire point of coding agents is that they search your code base, do internet searches and web fetches, launch sub-agents, and use todo lists to fill and adjust their context exactly as needed themselves, without you having to do it manually.
It's funny reading people plaintively saying, "I just don't get how people could possibly be getting use out of these things. I don't understand it." And then they immediately reveal that it's not the baffling mystery or existential question they're pretending it is for the purpose of this essay: the reason they don't understand it is that they literally don't understand the tech itself lol
> I have a source file of a few hundred lines implementing an algorithm that no LLM I've tried (and I've tried them all) is able to replicate, or even suggest, when prompted with the problem. Even with many follow up prompts and hints.
People making this kind of claim will never post the question and prompts they tried, because if they did, everyone would know that they just don't know how to prompt.
This just shows that the models (not AI; statistical models of text, used without consent) are not that smart; it's the tooling around them that allows using these models as a heuristic for brute-force search of the solution space.
Just last week, I prompted (not asked, it is not sentient) Claude to generate (not tell me or find out or any other anthropomorphization) an answer to whether I need to call Dispose on objects passed to me from two different libraries for industrial cameras. Being industrial, most people using them typically don't post their code publicly, which means the models have poor statistical coverage around these topics.
The LLM generated a response which triggered the tooling around it to perform dozens of internet searches and based on my initial prompt, the search results and lots of intermediate tokens ("thinking"), generated a reply which said that yes, I need to call Dispose in both cases.
It was phrased authoritatively and confidently.
So I tried it: one library segfaulted, the other threw an exception on a later call. I performed my own internet search (a single one) and immediately found documentation from one of the libraries clearly stating I don't need to call Dispose. The other library, being much more poorly documented, didn't mention this explicitly but had examples which didn't call Dispose.
I am sure if I used LLMs "properly" "agentically", then they would have triggered the tooling around them to build and execute the code, gotten the same results as me much faster, then equally authoritatively and confidently stated that I don't need to call Dispose.
This is not thinking. It's a form of automation but not thinking and not intelligence.
Yes, I think you are spot on. I've been toying with Claude Code recently to counter my own bias against agentic coding. It will confidently create a broken project, run it, read the error messages, fix it, run it, read the error messages and keep going until it runs. I used it to create a firefox addon, which meant that it invoked me very frequently to validate its output. This was much more tedious than letting it work on problems that it could validate with the console. It also kinda sucks at googling and looking up documentation.
AI "reasoning" in its current state is a hack meant to overcome the problem of contextual learning[0]. It somewhat works given enough time and good automatic tooling. When this problem is solved, I think we will see a significant boost in productivity from these tools. In its current state, I'm not convinced that they are worth my time (and money).
> I am sure if I used LLMs "properly" "agentically", then they would have triggered the tooling around them to build and execute the code, gotten the same results as me much faster, then equally authoritatively and confidently stated that I don't need to call Dispose.
Yes, usually my agents directly read the source code of libraries that don't have lots of good documentation or information in their training data, and/or create test programs as minimal viable examples and compile and run them themselves to see what happens, it's quite useful.
But you're right overall; LLMs placed inside agents are essentially providing a sort of highly steerable, plausible prior for a genetic-algorithm-like process that automatically solves problems and does automation tasks. It's not as brute-force as a classic genetic algorithm, but it can't always one-shot things; there is sometimes an element of guess-and-check. But IME that element is usually not more iterations than it takes me to figure something out (2-3) on average, although sometimes it needs more iterations than I would have needed on simple problems, and other times far fewer on harder ones, or vice versa.
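To make that "guess-and-check" element concrete, here is a minimal sketch of the loop being described; `propose` and `check` are hypothetical stand-ins for the LLM call and the build/test step, not any particular agent's API:

```ts
// Generic guess-and-check loop: the LLM supplies a steerable "prior" (propose),
// and the harness supplies the objective check (compile, run tests, etc.).
// Both functions are hypothetical stand-ins, not a real agent's interface.

type CheckResult = { ok: true } | { ok: false; feedback: string };

async function guessAndCheck(
  propose: (task: string, feedback?: string) => Promise<string>,
  check: (candidate: string) => Promise<CheckResult>,
  task: string,
  maxIterations = 3, // roughly the 2-3 iterations mentioned above
): Promise<string | undefined> {
  let feedback: string | undefined;
  for (let i = 0; i < maxIterations; i++) {
    const candidate = await propose(task, feedback); // "guess"
    const result = await check(candidate);           // "check"
    if (result.ok) return candidate;
    feedback = result.feedback; // steer the next guess with the failure output
  }
  return undefined; // iteration budget spent without a passing candidate
}
```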
“Brute force” is mostly what makes it all work, and what is most disappointing to me currently. Including the brute force necessary to train an LLM, the vast quantity of text necessary to approach almost human quality, the massive scale of data centers necessary to deploy these models, etc.
I am hoping this is a transitional period, where LLMs could be used to create better models that rely more on finesse and less on brute force.
To be honest, these models being bad is what gives me some hope we can figure out how to approach a potential future AI as a society before it arrives.
Because right now everything in the west is structured around rich people owning things they have not built while people who did the actual work with their hands and their minds are left in the dust.
For a brief period of time (a couple of decades), tech was a path for anyone from any background to get at least enough to not struggle. Not truly rich (for that you need to own real estate or companies), but having all your reasonable material needs taken care of and being able to save up for retirement (or, in countries without free education, to pay for kids' college).
And that might be coming to an end, with people who benefited from this opportunity cheering it on.
I make $1500/mo, working part time for a friend, being vastly underpaid for technical work, and part time as a meat packer in the back of a supermarket, because I had to drop out of university a year before finishing due to a disability and loss of financial resources. Thanks to being trans, disabled, and a dropout, as well as a job market that's fucked up due to AI and all the other reasons for layoffs, I can't get a tech job, despite having been programming since I was seven and being very good at it.
I don't think it's really fair to talk about "people who benefited from this opportunity cheering it on" in the comments on one of my posts. I'm an agentic AI coding enthusiast because I find it fascinating, it allows me to focus more on what I like most about programming (software architecture, systems thinking, etc), and the decreased cognitive load and increased productivity allow me to continue to do interesting projects in the time and energy I have left after my jobs and disability take their toll.
Have you heard about "run or raise" scripts/utilities?
I have each program I use regularly bound to a keyboard shortcut, and the search bar is ctrl+k in most browsers. If we're talking purely about the time saving of not having to open/focus another program and its search bar, then those costs can be negligible.
(Valley Girl voice) Role-based prompts are like, so September 2025. Everybody is just using agents now. You don't get to sit with us at the cool kids' table until you learn how to prompt, loser.
Mocking me (a woman, as it happens) with a sexist stereotype and a strawman of my argument doesn't make me wrong. All that extra work is now very much unnecessary, and it's the extra work they are complaining about, so, they are complaining about a problem that no longer exists, thanks to intentionally not learning as things develop. This isn't an argument about fashion or the "cool kids table."
I hardly write code anymore myself and am a heavy user of Claude Code.
One of the things I’m struggling to come to terms with is the “don’t commit code that you don’t understand” thing.
This makes sense, however it’s a thorn in my side that if I’m honest I’ve not been able to come up with a compelling answer to yet.
I agree with the sentiment. But in practice it only works for folks who have become domain experts.
My pain point is this - it's fine to only review if you have put in the work of writing lots of code over your career.
But, what’s the game plan in 10 years if we’re all reviewing code?
I know you learn from reviewing, but there’s no doubt that humans learn from writing code, failing, and rewriting.
So…this leaves me in a weird place. Do we move passively into a future where we don't truly, deeply understand the code anymore because we don't write it? Where we leave it up to the AI to handle the lower level details and we implicitly trust it so we can focus on delivering value further up the chain? What happens to developers in 10-20 years?
I don’t know but I’m not so convinced that we’re going to be able to review AI code so well when we’ve lost the skill to write it ourselves.
There are a couple of news stories doing the rounds at the moment which point to the fact that AI isn't "there yet":
1. Microsoft's announcement of cutting their copilot products sales targets[0]
2. Moltbook's security issues[1] after being "vibe coded" into life
Leaving the undeniable conclusion to be: the vast majority (seriously) distrusts AI much more than we're led to believe, and with good reason.
Thinking (as a SWE) is still very much the most important skill in SWE, and relying on AI has limitations.
For me, AI is a great tool for helping me to discover ideas I had not previously thought of, and it's helpful for boilerplate, but it still requires me to understand what's being suggested, and, even, push back with my ideas.
"Thinking (as a SWE) is still very much the most important skill in SWE, and relying on AI has limitations."
I'd go further and say the thinking is humanity's fur and claws and teeth. It's our strong muscles. It's the only thing that has kept us alive in a natural world that would have us extinct long, long ago.
But now we're building a machine with the very purpose of thinking, or at least of producing the results of thinking. And we use it. Boy, do we use it. We use it to think of birthday presents (it's the thought that counts) and greeting card messages. We use it for education coursework (against the rules, but still). We use it, as programmers, to come up with solutions and to find bugs.
If AI (of any stripe, LLM or some later invention) represents an existential threat, it is not because it will rise up and destroy us. Its threat lies solely in the fact that it is in our nature to take the path of least resistance. AI is the ultimate such path, and it does weaken our minds.
My challenge to anyone who thinks it's harmless: use it for a while. Figure out what it's good at and lean on it. Then, after some months, or years, drop it and try working on your own like in the before times. I would bet that one will discover that significant amounts of fluency will be lost.
It seems pretty hard to say at this point—we have people who say they get good results and have high standards. They don’t owe us any proof of course. But we don’t really have any way to validate that. Everybody thinks their code is good, right?
Microsoft might just be having trouble selling copilot because Claude or whatever is better, right?
Moltbook is insecure, but the first couple of iterations of any non-trivial web service end up having some crazy security hole. Also, Moltbook seems to be some sort of… intentional statement of recklessness.
I think we’ll only know in retrospect, if there’s a great die-off of the companies that don’t adopt these tools.
Every one of these posts and most of the comments on them could be written by an LLM. Nobody says anything new. Nobody has any original thoughts. People make incredibly broad statements and make fundamental attribution errors.
In fact, most LLMs would do a better job than most commenters on HN.
The best part about this whole debate is that we don't have to wait years and years for one side to be proven definitively right. We will know beyond a shadow of a doubt which side is right by this time next year. If agentic coding has not progressed any further by then, we will know. On the other hand, if coding agents are 4x better than they are today, then there will be a deluge of software online, the number of unemployed software engineers will have skyrocketed, and HN will be swamped by perma-doomers.
> I’ve been using Copilot - and more recently Claude - as a sort of “spicy autocomplete” and occasional debugging assistant for some time, but any time I try to get it to do anything remotely clever, it completely shits the bed.
This seems like a really disingenuous statement. If Claude can write an entire C compiler that is able to compile the Linux kernel, I think it has already surpassed an unimaginable threshold for "cleverness".
I can't help but draw parallels to the systems programmers who would scoff at people getting excited over CSS and JavaScript features. "Just write the code yourself! There is nothing new here! Just think!"
The point of programming is to automate reasoning. Don't become a reactionary just because your skills got got. The market is never wrong; even if there is a correction, in 20 years we'll see Nvidia with a 10T market cap, like every other correction (AT&T, NTT).
Ah yes, slightly abstracted mathematical concepts compiling down to mathematical logic is totally the same as trying to unpredictably guess which mathematical concepts your massive, complex, non-mathematical natural language might possibly be referring to
Programmers for some reason love to be told what to do. First thing in the morning, they look for someone else to tell them how to do it, how to test it, how to validate it.
Why do it yourself, the way you want to do it, when you could just fall back to mediocrity and do it like everybody else does?
Why think when you can be told what to do?
Why have intercourse with your wife when you can let someone else do it instead? This is the typical LLM user mentality.
Maybe I don't understand it correctly, but to me this reads like the author isn't actually using AI agents. I don't talk or write prompts anymore. I write tasks and I let a couple of AI agents complete those tasks, exactly how I'd distribute tasks to a human. The AI code is of varying quality, and they certainly aren't great at computer science (at least not yet), but it's not like they write worse code than some actual humans would.
I like to say that you don't need computer science to write software, until you do. The thing is that a lot of software in the organisations I've worked in doesn't actually need computer science. I've seen horrible JavaScript code on the back-end live a full lifecycle of 5+ years without needing much maintenance, if any, and be fine. It could probably have been more efficient, but compute is so cheap that it never really mattered. Of course I've also seen inefficient software or errors cost us a lot of money when our solar plants didn't output what they were supposed to. I'd let AIs write one of those things any day.
Hell I did recently. We had an old javascript service which was doing something with the hubspot API. I say something because I didn't ever really find out what it was. Basically hubspot sunset the v1 of their API, and before the issue arrived at my table my colleagues had figured out that was the issue. I didn't really have the time to fix this, so when I saw how much of a mess the javascript code was and realized it would take me a few hours to figure out what it even did... well... I told my AI agent running on our company framework to fix it. It did so in 5-10 minutes with a single correction needed. It improved the javascript quite a bit while doing it, typing everything. I barely even got out of my flow to make it happen. So far it's run without any issues for a month. I was frankly completely unnecessary in this process. The only reason it was me who fired up the AI is because the people who sent me the task haven't yet adopted AI agents.
That being said... AIs are a major security risk that needs to be handled accordingly.
> I think it’s important to highlight at this stage that I am not, in fact, “anti-LLM”. I’m anti-the branding of it as “artificial intelligence”, because it’s not intelligent. It’s a form of machine learning.
It's a bit weird to be against the use of the phrase "artificial intelligence" and not "machine learning". Is it possible to learn without intelligence? Methinks the author is a bit triggered by the term "intelligence" at a base primal level ("machines can't think!").
> “Generative AI” is just a very good Markov chain that people expect far too much from.
The author of this post doesn't know the basics of how LLMs work. The whole reason LLMs work so well is that they are extremely stateful and not memoryless, the key property of Markov processes.
As long as AI (genAI, LLMs, whatever you use to describe the current tech) is perceived not as a "bicycle of the mind" and a tool to utilize 'your' skills to a next phase, but as a commodity to be exploited by giant corporations whose existence is based on maximizing profits regardless of virtue or dignity (a basic set of ethics that says, for example, not to burn books after you scan them to feed your LLM, like Anthropic did), it is really hard to justify the current state of AI.
Once you understand that the sole winner in this hype is the one who'll be brutally scraping every bit of data, whether it's real-time or static, and then refining it to give it back to you without your involvement in the process (a.k.a. learning), you'll come to understand that current AI is by nature hugely unfavorable to mental progression...
> not to burn books after you scan it
Shouldn't we blame copyright laws for that?
How would copyright law possibly compel the burning of books?
IANAL, I can only cite court decision: "And, the digitization of the books purchased in print form by Anthropic was also a fair use but not for the same reason as applies to the training copies. Instead, it was a fair use because all Anthropic did was replace the print copies it had purchased for its central library with more convenient space-saving and searchable digital copies for its central library — without adding new copies, creating new works, or redistributing existing copies."
> As long as AI (...) is perceived not as a "bicycle of the mind" and a tool to utilize 'your' skills to a next phase but as a commodity (...), it is really hard to justify the current state of AI.
I don't agree at all. The "commodity" argument is actually a discussion on economic viability. This is the central discussion, and what will determine if tomorrow we will still have higher-quality and up-to-date LLMs available to us.
You need to understand that nowadays there is a clear race to the bottom in LLM-related services, at a time when the vast majority is not economically viable. The whole AI industry is unsustainable at this point. Thus it's rather obvious that making a business case and generating revenue is a central point of discussion.
Highlighting this as a good thing is just... not great
> Code you did not write is code you do not understand
> You cannot maintain code you do not understand
We need no AI for this one: if I could only maintain code I wrote, I'd have to work alone. Whether I am reviewing code written by an AI or by a human is irrelevant here. The rest of the comments, about the human we know getting more trust, point to a smell in your development process: this is how we get cliques, and people considering repos hostile, because there are clear differences in how well-known contributors are treated, even though they are just as capable of writing something bad as anyone else. If there's anything I have seen in code review processes, regardless of where I've worked over the last couple of decades, it's that visual inspections rarely get us anywhere, and no organization is willing to dedicate the time to double-check anything. Bugs happily slip through inspection even when the organization commits to extreme programming: full-time TDD and full pairing.
There is a question on how fast we can merge code in a project safely, regardless of the identity of the author of the PRs. A systematic approach to credible, reliable development. But the answer here has very little to do with what the article's author is saying, and has nothing to do with whether a contribution was made by an AI, a human, or a golden retriever with access to a keyboard and a lot of hope.
> If I could only maintain code I wrote, I'd have to work alone.
I don't think that "Code you did not write is code you do not understand" implies that the only way to understand code is to write it. I can read code, run it, debug it and figure out how it works.
It's true that all of that is possible with AI-generated code. But the thing is, is it worth it for me to spend time understanding that PR, when I can make the change myself or launch my own agent and prompt it myself? At least with my own prompt I know exactly what I want it to be.
Exactly. If the only effort of coding was in the writing, we would have much more code, applications, etc.
The real effort of coding is understanding it, reviewing it, improving it, making it work, and work well and durably. All of this requires thinking.
> We need no AI for this one: If I could only maintain code I wrote, I'd have to work alone.
I think you missed the whole point. This is not about you understanding a particular change. This is about the person behind the code change not understanding the software they are tasked to maintain. It's akin to the discussion about the fundamental differences between script kiddies and hackers.
With LLMs and coding agents, there is a clear pressure to turn developers into prompt kiddies: someone who is able to deliver results when the problem is bounded, but is fundamentally unable to understand what they did or the whole system being used.
This is not about sudden onsets of incompetence. This is about a radical change in workflows that no longer favor or allow the research needed to familiarize yourself with projects. You no longer need to pick through a directory tree to know where things are, or navigate through code to check where a function is called or which component is related to which. Having to manually open a file to read or write to it represents a learning moment that allows you to recall and understand how and why things are done. With LLMs you don't even understand what is there.
Thus developers who lean heavily on LLMs don't get to learn what's happening. Everyone can treat the project as a black box, and focus on observable changes to the project's behavior.
> Everyone can treat the project as a black box, and focus on observable changes to the project's behavior.
This is a good thing. I don’t need to focus on oil refineries when I fill my car with gas. I don’t know how to run a refinery, and don’t need to know.
Someone somewhere knows though.
I read this and thought, "are we using the same software?" For me, I have turned the corner where I barely hand-edit anything. Most of the tasks I take on are nearly one-shot successful, simply pointing Claude Code at a ticket URL. I feel like I'm barely scratching the surface of what's possible.
I'm not saying this is perfect or unproblematic. Far from it. But I do think that shops that invest in this way of working are going to vastly outproduce ones that don't.
LLMs are the first technology where everyone literally has a different experience. There are so many degrees of freedom in how you prompt. I actually believe that people's expectations and biases tend to correlate with the outcomes they experience. People who approach it with optimism will be more likely to problem-solve the speed bumps that pop up. And the speed bumps are often things that can mostly be addressed systemically, with tooling and configuration.
This only works if you don't look at the code.
If all you're doing is reviewing behaviour and tests, then yes: almost 100% of the time, if you're able to document the problem exactly enough, codex 5.3 will get it right.
I had codex 5.3 write flawless Svelte 5 code, but only because I had already written valid Svelte 5 code around it.
The minute I started a new project, asked it to use Svelte 5, and let it loose, it not only started writing a weird mixture of Svelte 3/4 and Svelte 5 code but also straight up ignored Tailwind and started writing its own CSS.
I asked it multiple times to update the syntax to Svelte 5, but it couldn't figure it out. So I gave up and just accepted it; that's what I think is going to happen more frequently. If the code doesn't matter anymore and it's just the process of evaluating inputs and outputs, then whatever.
However if I need to implement a specific design I will 100% end up spending more time generating than writing it myself.
I'm working in a very mature codebase on product features that are not technically unprecedented, which probably is determining a lot of my experience so far. Very possible that I'm experiencing a sweet spot.
I can totally imagine that in greenfield, the LLM is going to explore huge search spaces. I can see that when observing the reasoning of these same models in non-coding contexts.
That's exactly what I meant: when I've used LLMs on mature code bases they do very well, because the code base was curated by engineers. When you have a greenfield project it's slop central; it's literally whatever the LLM has been trained on and can get to compile and run.
Which is still okay, but only as long as I have access to good and cheap LLMs.
This person is not using Claude Code or Cursor. They refuse to use the tools and have convinced themselves that they are right. Sadly, they won't recognize how wrong they were until they are unemployable.
I was a huge skeptic on this stuff less than a year ago, so I get it. For a couple of years, the hype really was just hype when it came to the actual business utility of AI tools. It's just interesting to me the extent to which people have totally different lived experiences right now.
I do agree that some folks are in for rude awakening, because markets (labor and otherwise) will reveal winning strategies. I'm far from a free market ideologist, but this is a place where the logic seems to apply.
To be totally fair to them... it is quite literally in the last few months that the tools have actually begun to meet the promises that the breathless hypers have been screeching about for years at this point.
But it's also true that it simply is better than the OP is giving it credit for.
Depressingly. Because I like writing code.
If Claude Code or Cursor is actually that good then we're all unemployed anyway. Using the tools won't save any of our jobs.
I say this as someone who does use the tools, they're fine. I have yet to ever have an "it's perfect, no notes" result. If the bar is code that technically works along the happy path then fine, but that's the floor of what I'm willing to put forth or accept in a PR.
> If Claude Code or Cursor is actually that good then we're all unemployed anyway. Using the tools won't save any of our jobs.
There is absolutely reason for concern, but it's not inevitable.
For the foreseeable future, I don't think we can simply Ralph Wiggum-loop real business problems. A lot of human oversight and tuning is required.
Also, I haven't seen anything to suggest that AI is good at strategic business decisionmaking.
I do think it dramatically changes the job of a software developer, though. We will be more like developers of software assembly lines and strategists.
Every company I have ever worked for has had a deep backlog of tasks and ideas we realistically were never going to get to. These tools put a lot of those tasks in play.
> I have yet to ever have an "it's perfect, no notes" result.
It frequently gets close for me, but usually some follow-up is needed. The ones that are closest to pure one-shot are bug fixes where replication can be captured in a regression test.
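That pattern (capture the replication first, then verify the fix against it) is easy to sketch. A minimal example using Node's built-in test runner; `parseDuration` and the specific bug are invented for illustration, not taken from the thread:

```ts
// regression.test.ts - run with `node --test`.
// The failing case is written first, so the reproduction is captured
// before any fix is attempted; it then stays in the suite as a regression guard.
import { test } from "node:test";
import assert from "node:assert/strict";

import { parseDuration } from "./parseDuration.js"; // hypothetical module under test

test("parses bare seconds", () => {
  assert.equal(parseDuration("30s"), 30_000);
});

test("regression: mixed units from the bug report", () => {
  // This is the exact input that was reported as broken.
  assert.equal(parseDuration("1m30s"), 90_000);
});
```

With the failing test in place, a near one-shot fix can be checked mechanically rather than by eyeballing the diff.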
> Every company I have ever worked for has had a deep backlog of tasks and ideas we realistically were never going to get to. These tools put a lot of those tasks in play.
Some of that backlog was never meant to be implemented. “Put it in the backlog” is a common way to deflect conflict over technical design and the backlog often becomes a graveyard of ideas. If I unleashed a brainless agent on our backlog the system would become a Frankenstein of incompatible design choices.
An important part of management is to figure out what actually brings value instead of just letting teams build whatever they want.
You need to groom your backlog.
> If Claude Code or Cursor is actually that good then we're all unemployed anyway.
I don't know about that. This PR stunt is a greenfield project, no one really knows what volume of work went on behind it, and it targeted a problem (bootstrapping a C compiler) that is actually quite small and relatively trivial to accomplish.
Go ahead and google for small C compilers. They are a dime a dozen, and some don't venture beyond a couple thousand lines of code.
Check out this past discussion.
https://news.ycombinator.com/item?id=21210087
Hilarious take. There's absolutely no advantage to learning to use LLMs now. Even LLM "skills", if you can call them that, that you may have learnt 6 months ago are already irrelevant and obsolete. Do you really think a smart person couldn't get to your level in about an hour? You are not building fundamental skills and experience by using LLM agents now; you're just coasting and possibly even atrophying.
I'm reading all these articles and having the same thought. These folks aren't using the same tools I'm using.
I feel so weird not being the grumpy one for once.
Can't relate to GP's experience of one-shotting. I need to try a couple of times and really home in on the right plan and constraints.
But I am getting so much done. My todo list used to grow every year. Now it shrinks every month.
And this is not mindless "vibe coding". I insist on what I deploy being quality, and I use every tool I can that can help me achieve that (languages with strong types, TDD with tests that specify system behaviour, E2E tests where possible).
I'm on my 5th draft of an essentially vibe-coded project. Maybe it's because I'm using not-frontier models to do the coding, but I have to take two or three tries to get the shape of a thing just right. Drafting like this is something I do when I code by hand as well. I have to implement a thing a few times before I begin to understand the domain I'm working in. Once I begin to understand the domain, the separation of concerns follows naturally, and so do the component APIs (and how those APIs hook together).
My suggestions:
- like the sister comment says, use the best model available. For me that has been opus but YMMV. Some of my colleagues prefer the OAI models.
- iterate on the plan until it looks solid. This is where you should invest your time.
- Watch the model closely and make sure it writes tests first, checks that they fail, and only then proceeds to implementation
- the model should add pieces one by one, ensuring each step works before proceeding. Commit each step so you can easily retry if you need to. Each addition will involve a new plan that you go back and forth on until you're happy with it. The planning usually gets easier as the project moves along.
- this is sometimes controversial, but use the best language you can target. That can be Rust, Haskell, or Erlang, depending on the context. Strong types will make a big difference: they catch silly mistakes models are liable to make (see the sketch below this list).
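A small sketch of what "strong types catch silly mistakes" can look like in practice, here in TypeScript rather than the languages named above, with an invented domain: an exhaustive switch over a discriminated union turns a forgotten case into a compile error instead of a silent runtime fallthrough.

```ts
// A discriminated union plus an exhaustiveness check: if a model adds a new
// variant (say "retrying") but forgets to handle it here, the `never`
// assignment below becomes a compile error instead of a quiet bug.

type JobState =
  | { kind: "queued" }
  | { kind: "running"; startedAt: Date }
  | { kind: "done"; exitCode: number };

function describe(state: JobState): string {
  switch (state.kind) {
    case "queued":
      return "waiting to start";
    case "running":
      return `running since ${state.startedAt.toISOString()}`;
    case "done":
      return `finished with code ${state.exitCode}`;
    default: {
      const unreachable: never = state; // compile-time exhaustiveness guard
      return unreachable;
    }
  }
}
```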
Cursor is great for trying out the different models. If Opus is what you like, I have found Claude Code to be better value, and personally I prefer the CLI to the VS Code UI Cursor builds on. It's not a panacea though; the CLI has its own issues, like occasionally slowing to a crawl. It still gets the work done.
> and personally I prefer the CLI to the vscode UI cursor builds on
So do I, but I also quite like Cursor's harness/approach to things.
Which is why their `agent` CLI is so handy! You can use cursor in any IDE/system now, exactly like claude code/codex cli
I tried it when it first came out and it was lacking then. Perhaps it's better now--will give it a shot when I sign up for cursor again.
Thank you for sharing that!
> Maybe its because I'm using not-frontier models to do the coding
IMO it’s probably that. The difference between where this was a year ago and now is night and day, and not using frontier models is roughly like stepping back in time 6-12 months.
I am one of the ones who reviews code and pushes projects to the finish line for people who use AI like you. I hate it. The code is slop. You don’t realize because you aren’t looking closely enough, but we do, and it’s annoying.
I disagree with the characterization as "slop", if the tools are used well. There's no reason the user has to submit something that looks fundamentally different from what they would handwrite.
You can't simply throw the generated code over the wall to the reviewer. You have to put in the work to understand what's being proposed and why.
Lastly, an extremely important part of this is the improvement cycle. The tools will absolutely do suboptimal things sometimes, usually pretty similar to a human who isn't an expert in the codebase. Many people just accept what comes out. It's very important to identify the gaps between the first draft, what was submitted for code review, and the mergeable final product and use that information to improve the prompt architecture and automation.
What I see is a tool that takes a lot of investment to pay off, but where the problems for operationalizing it are very tractable, and the opportunity is immense.
I'm worried about many other aspects, but not the basic utility.
Here’s the thing: they say all the same things you just said in this comment. Yet the code I end up having to work in is still bad. It’s 5x longer than it needs to be, and the naming is usually bad, so it takes way longer to read than human code. To top it off, very often it doesn’t integrate completely with the other systems and I have to rewrite a portion, which takes longer because the code was designed to solve a different problem.
If you are really truly reviewing every single line in a way that it is the same as if you hand wrote it… just hand write it. There’s no way you’re actually saving time if this is the case. I don’t buy that people are looking at it as deeply as they claim to be.
> If you are really truly reviewing every single line in a way that it is the same as if you hand wrote it… just hand write it.
I think this is only true for people who are already experts in the codebase. If you know it inside-out, sure, you can simply handwrite it. But if not, the code writing is a small portion of the work.
I used to describe it as: this task will take 2 days of code archaeology but result in a +20/-20 change. Or much longer, if you are brand new to the codebase. This is where the AI systems excel, in my experience.
If the output is +20/-20, then there's a pretty good chance it nailed the existing patterns. If it wrote a bunch more code, then it probably deserves deeper scrutiny.
In my experience, the models are getting better and better at doing the right thing. But maybe that's also because I'm working in a codebase with many existing patterns to slot into, and the entire team is investing heavily in the agent instructions, skills, and tooling.
It may also have to do with the domain and language to some extent.
Yes, the code archaeology is the time-consuming part. I could use an LLM to do that for me in my co-workers' generated code, but I don’t want to, because when I have worked with AI I have found it to typically create overly complex and uncreative solutions. I think there may be some confirmation bias with LLM coders, where they look at the code and think it’s pretty good, so they assume it’s basically the same way they would have written it themselves. But you miss a lot of experiences compared to when you’re actually in the code trenches reading, designing, and tinkering with code on your own. Like moving functions around to different modules and suddenly realizing there’s a conceptual shift you can make that lets you code it all much more simply, or recalling that stakeholder feedback from last week that, if worked with, could open a solution pathway that wasn’t viable with the current API design. I have also found that LLMs make assumptions about what parts of the code base can and can’t be changed, and they’re often inaccurate.
> *I find it hard to justify the value of investing so much of my time perfecting the art of asking a machine to write what I could do perfectly well in less time than it takes to hone the prompt.*
This sums up my interactions with LLMs
This is correct if the prompt is single-use. It's wrong if the prompt becomes a reusable skill that fires correctly 200 times.
The problem isn't generation — it's unstructured generation. Prompting ad-hoc and hoping the output holds up. That fails, obviously.
200 skills later: skills are prose instructions matched by description, not slash commands. The thinking happens when you write the skill. The generation happens when you invoke it. That's composition, not improvisation. Plus drift detection that catches when a skill's behavior diverges from its intent.
Don't stop generating. Start composing.
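For anyone unfamiliar with the pattern, here is a rough, hypothetical sketch of what one such reusable skill might look like, assuming the common convention of a markdown file with name/description frontmatter that the agent matches against the task (the specific skill and steps are made up for illustration):

```markdown
---
name: add-db-migration
description: Use when a task requires a database schema change. Produces a
  migration, updates the models, and adds a rollback test.
---

1. Inspect the existing migrations directory and copy its naming scheme.
2. Write the up and down migration as a single reviewable diff.
3. Update the corresponding model definitions.
4. Add or extend a test that applies and then rolls back the migration.
5. Run the migration test suite and report any failures verbatim.
```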
>You’re supposed to craft lengthy prompts that massage the AI assistant’s apparently fragile ego by telling it “you are an expert in distributed systems”
This isn't GPT-3 anymore. We have added fine-tuning and other post-training techniques to make this unnecessary.
I’ve reached a point where I stop reading whenever I see a post that mentions “one-shot.” It's becoming increasingly obvious that many platforms are riddled with bots or incompetent individuals trying to convince others that AI is some kind of silver bullet.
Nice name you got there localghost ;p
I understand the author's POV quite well, because it used to be mine. It's really only in the last 6 months or so that my perspective has changed. The author still sounds like they are in the "black box that I toss wishes into and is dumb as fuck" phase. It also sounds like they are resistant to learning how to make the most of it, which is a shame. If you take the time to learn the techniques that make this stuff tick, you'll be amazed at what it can do. I mean, maybe I am a total idiot and this stuff will get good enough that I am no longer necessary. Right now, though? I see it as an augmentation. An amplification of me and what I am capable of.
"It’s very unsettling, then, to find myself feeling like I’m in danger of being left behind - like I’m missing something. As much as I don’t like it, so many people have started going so hard on LLM-generated code in a way that I just can’t wrap my head around.
...
I’ve been using Copilot - and more recently Claude - as a sort of “spicy autocomplete” and occasional debugging assistant for some time, but any time I try to get it to do anything remotely clever, it completely shits the bed. Don’t get me wrong, I know that a large part of this is me holding it wrong, but I find it hard to justify the value of investing so much of my time perfecting the art of asking a machine to write what I could do perfectly well in less time than it takes to hone the prompt.
You’ve got to give it enough context - but not too much or it gets overloaded. You’re supposed to craft lengthy prompts that massage the AI assistant’s apparently fragile ego by telling it “you are an expert in distributed systems” as if it were an insecure, mediocre software developer.
Or I could just write the damn code in less time than all of this takes to get working."
Well there's your problem. Nobody does role-based prompts anymore, and the entire point of coding agents is that they search your code base, do internet searches and web fetches, launch sub-agents, and use todo lists to fill and adjust their context exactly as needed themselves, without you having to do it manually.
It's funny reading people plaintively saying, "I just don't get how people could possibly be getting use out of these things. I don't understand it." And then they immediately reveal that it's not the baffling mystery or existential question they're pretending it is for the purpose of this essay: the reason they don't understand it is that they literally don't understand the tech itself lol
Yeah, reminds me of this: https://news.ycombinator.com/item?id=46929505
> I have a source file of a few hundred lines implementing an algorithm that no LLM I've tried (and I've tried them all) is able to replicate, or even suggest, when prompted with the problem. Even with many follow up prompts and hints.
People making this kind of claim will never post the question and prompts they tried, because if they did, everyone would know it's just that they don't know how to prompt.
At what point will the proper way to prompt just be "built-in"? Why aren't they built-in already if the "proper way to prompt" is so well understood?
Eh, I think that one is fair. LLMs aren't great at super novel solutions
This just shows that the models (not AI, statistical models of text used without consent) are not that smart, it's the tooling around them which allows using these models as a heuristic for brute force search of the solution space.
Just last week, I prompted (not asked, it is not sentient) Claude to generate (not tell me or find out or any other anthropomorphization) whether I need to call Dispose on objects passed to me from 2 different libraries for industrial cameras. Being industrial, most people using them typically don't post their code publicly, which means the models have poor statistical coverage around these topics.
The LLM generated a response which triggered the tooling around it to perform dozens of internet searches and, based on my initial prompt, the search results, and lots of intermediate tokens ("thinking"), generated a reply which said that yes, I need to call Dispose in both cases.
It was phrased authoritatively and confidently.
So I tried it, one library segfaulted, the other returned an exception on a later call. I performed my own internet search (a single one) and immediately found documentation from one of the libraries clearly stating I don't need to call Dispose. The other library being much more poorly documented didn't mention this explicitly but had examples which didn't call Dispose.
I am sure if I used LLMs "properly" "agentically", then they would have triggered the tooling around them to build and execute the code, gotten the same results as me much faster, then equally authoritatively and confidently stated that I don't need to call Dispose.
This is not thinking. It's a form of automation but not thinking and not intelligence.
Yes, I think you are spot on. I've been toying with Claude Code recently to counter my own bias against agentic coding. It will confidently create a broken project, run it, read the error messages, fix it, run it, read the error messages and keep going until it runs. I used it to create a firefox addon, which meant that it invoked me very frequently to validate its output. This was much more tedious than letting it work on problems that it could validate with the console. It also kinda sucks at googling and looking up documentation.
AI "reasoning" in it's current state is a hack meant to overcome the problem of contextual learning[0]. It somewhat works given enough time and good automatic tooling. When this problem is solved, I think we will see a significant boost in productivity from these tools. In it's current state, I'm not convinced that they are worth my time (and money).
[0] - https://hy.tencent.com/research/100025?langVersion=en
> I am sure if I used LLMs "properly" "agentically", then they would have triggered the tooling around them to build and execute the code, gotten the same results as me much faster, then equally authoritatively and confidently stated that I don't need to call Dispose.
Yes, usually my agents directly read the source code of libraries that don't have lots of good documentation or information in their training data, and/or create test programs as minimal viable examples and compile and run them themselves to see what happens; it's quite useful.
But you're right overall; LLMs placed inside agents are essentially providing a sort of highly steerable, plausible prior for a genetic-algorithm-like loop that automatically solves problems and does automation tasks. It's not as brute force as a classic genetic algorithm, but it can't always one-shot things; there is sometimes an element of guess-and-check. In my experience, though, that element usually takes no more iterations than it would take me to figure something out myself (2-3 on average), although sometimes it needs more than I would have on simple problems and far fewer on harder ones, or vice versa.
>brute force search of the solution space
“Brute force” is mostly what makes it all work, and what is most disappointing to me currently: the brute force necessary to train an LLM, the vast quantity of text necessary to approach near-human quality, the massive scale of data centers necessary to deploy these models, and so on.
I am hoping this is a transitional period, where LLMs can be used to create better models that rely more on finesse and less on brute force.
To be honest, these models being bad is what gives me some hope we can figure out how to approach a potential future AI as a society before it arrives.
Because right now everything in the west is structured around rich people owning things they have not built while people who did the actual work with their hands and their minds are left in the dust.
For a brief period of time (a couple of decades), tech was a path for anyone from any background to get at least enough to not struggle. Not to become truly rich (for that you need to own real estate or companies), but to have all your reasonable material needs taken care of and be able to save up for retirement (or, in countries without free education, to pay for kids' college).
And that might be coming to an end, with people who benefited from this opportunity cheering it on.
I make $1500/mo, working part time for a friend, being vastly underpaid for technical work, and part time as a meat packer in the back of a supermarket, because I had to drop out of university a year before finishing due to a disability and a loss of financial resources. Thanks to being trans, disabled, and a dropout, as well as a job market that's fucked up due to AI and all the other reasons for layoffs, I can't get a tech job, despite having been programming since I was seven and being very good at it.
I don't think it's really fair to talk about "people who benefited from this opportunity cheering it on" in the comments on one of my posts. I'm an agentic AI coding enthusiast because I find it fascinating, it allows me to focus more on what I like most about programming (software architecture, systems thinking, etc.), and the decreased cognitive load and increased productivity allow me to continue doing interesting projects in the time and energy I have left after what my jobs and disability take.
What a sober and accurate observation of the real capabilities of LLMs.
And it’s nothing to sneeze at because it allows me to stay in the terminal rather than go back and forth between the terminal and Google.
Have you heard about "run or raise" scripts/utilities?
I have each program I use regularly bound to a keyboard shortcut, and the search bar is Ctrl+K in most browsers. If we're talking purely about the time saved by not having to open/focus another program and its search bar, then those costs can be negligible.
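For the curious, a minimal run-or-raise sketch (assuming an X11 session with the `wmctrl` utility installed; the target application is arbitrary and the compiled binary would be bound to a keyboard shortcut):

```rust
use std::process::Command;

fn main() {
    // List open windows together with their WM_CLASS (wmctrl -l -x).
    let windows = Command::new("wmctrl")
        .args(["-l", "-x"])
        .output()
        .map(|o| String::from_utf8_lossy(&o.stdout).to_lowercase())
        .unwrap_or_default();

    if windows.contains("firefox") {
        // A matching window exists: raise and focus it by class.
        let _ = Command::new("wmctrl").args(["-x", "-a", "firefox"]).status();
    } else {
        // Nothing running yet: launch a fresh instance.
        let _ = Command::new("firefox").spawn();
    }
}
```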
(Valley Girl voice) Role-based prompts are like, so September 2025. Everybody is just using agents now. You don't get to sit with us at the cool kids' table until you learn how to prompt, loser.
Mocking me (a woman, as it happens) with a sexist stereotype and a strawman of my argument doesn't make me wrong. All that extra work is now very much unnecessary, and it's the extra work they are complaining about, so they are complaining about a problem that no longer exists, thanks to intentionally not learning as things develop. This isn't an argument about fashion or the "cool kids' table."
So I guess you could say, they're "holding it wrong"?
Yeah, so?
I hardly write code anymore myself and am a heavy user of Claude Code.
One of the things I’m struggling to come to terms with is the “don’t commit code that you don’t understand” thing.
This makes sense, however it’s a thorn in my side that if I’m honest I’ve not been able to come up with a compelling answer to yet.
I agree with the sentiment. But in practice it only works for folks who have become domain experts.
My pain point is this - it’s fine to only review if you have put in the work by writing lots of code over your career.
But, what’s the game plan in 10 years if we’re all reviewing code?
I know you learn from reviewing, but there’s no doubt that humans learn from writing code, failing, and rewriting.
So…this leaves me in a weird place. Do we move passively into a future where we don’t truly, deeply understand the code anymore because we don’t write it? Where we leave it up to the AI to handle the lower-level details and we implicitly trust it so we can focus on delivering value further up the chain? What happens to developers in 10-20 years?
I don’t know but I’m not so convinced that we’re going to be able to review AI code so well when we’ve lost the skill to write it ourselves.
There are a couple of news stories doing the rounds at the moment which point to the fact that AI isn't "there yet":
1. Microsoft's announcement of cutting their copilot products sales targets[0]
2. Moltbook's security issues[1] after being "vibe coded" into life
Leaving the undeniable conclusion: the vast majority (seriously) distrusts AI much more than we're led to believe, and with good reason.
Thinking (as a SWE) is still very much the most important skill in SWE, and relying on AI has limitations.
For me, AI is a great tool for helping me to discover ideas I had not previously thought of, and it's helpful for boilerplate, but it still requires me to understand what's being suggested, and, even, push back with my ideas.
[0] https://arstechnica.com/ai/2025/12/microsoft-slashes-ai-sale...
[1] https://www.reuters.com/legal/litigation/moltbook-social-med...
"Thinking (as a SWE) is still very much the most important skill in SWE, and relying on AI has limitations."
I'd go further and say the thinking is humanity's fur and claws and teeth. It's our strong muscles. It's the only thing that has kept us alive in a natural world that would have us extinct long, long ago.
But now we're building a machine with the very purpose of thinking, or at least of producing the results of thinking. And we use it. Boy, do we use it. We use it to think of birthday presents (it's the thought that counts) and greeting card messages. We use it for education coursework (against the rules, but still). We use it, as programmers, to come up with solutions and to find bugs.
If AI (of any stripe, LLM or some later invention) represents an existential threat, it is not because it will rise up and destroy us. Its threat lies solely in the fact that it is in our nature to take the path of least resistance. AI is the ultimate such path, and it does weaken our minds.
My challenge to anyone who thinks it's harmless: use it for a while. Figure out what it's good at and lean on it. Then, after some months or years, drop it and try working on your own like in the before times. I would bet you'll discover that a significant amount of fluency has been lost.
It seems pretty hard to say at this point—we have people who say they get good results and have high standards. They don’t owe us any proof of course. But we don’t really have any way to validate that. Everybody thinks their code is good, right?
Microsoft might just be having trouble selling copilot because Claude or whatever is better, right?
Moltbook is insecure, but the first couple of iterations of any non-trivial web service end up having some crazy security hole. Also, Moltbook seems to be some sort of… intentional statement of recklessness.
I think we’ll only know in retrospect, if there’s a great die-off of the companies that don’t adopt these tools.
Every one of these posts and most of the comments on them could be written by an LLM. Nobody says anything new. Nobody has any original thoughts. People make incredibly broad statements and make fundamental attribution errors.
In fact, most LLMs would do a better job than most commenters on HN.
You're absolutely right!
And in that moment, blackqueeriroh was enlightened. Come, let us transcend this plane and go "touch grass".
The best part about this whole debate is that we don't have to wait years and years for one side to be proven definitively right. We will know beyond a shadow of a doubt which side is right by this time next year. If agentic coding has not progressed any further by then, we will know. On the other hand, if coding agents are 4x better than they are today, then there will be a deluge of software online, the number of unemployed software engineers will have skyrocketed, and HN will be swamped by perma-doomers.
Wait - why are we waiting (another) year?
ChatGPT has been out for 3 years (Nov 2022)
Claude almost 3 years (March 2023)
Gemini 1 and a bit years (2024)
There hasn't been an avalanche of new software online - if anything, things have slowed.
Surely this will be the year of AI dominance. And the year of the Linux desktop will be next year.
Did you miss the chart from FT that shows the number of iOS apps and GH commits taking off late last year? It’s happening.
Yes, the Garbogization[1] of the Internet is happening at full steam.
1: Garbage filling a place.
I'm not sure if you are being serious, but
Number of commits don’t tell anything about the value and quality of those commits. Please don’t measure yourself with this poor metric.
— Jaana Dogan ヤナ ドガン (@rakyll) February 26, 2019
The number of iOS apps isn't much of a metric either, to be honest, but sure, we can pretend that that's a result of the AI revolution...
Maybe we will know, but meanwhile thousands of developers are a long way down the rabbit-hole signposted "the psychology of prior investment".
Outsourcing their thinking is going to be the stupidest thing humans ever did and we won't even be smart enough to understand that this is the case.
Thought we learned this lesson with attention span/ADHD-mimicking symptoms from phone addiction but apparently not!
> I’ve been using Copilot - and more recently Claude - as a sort of “spicy autocomplete” and occasional debugging assistant for some time, but any time I try to get it to do anything remotely clever, it completely shits the bed.
This seems like a really disingenuous statement. If Claude can write an entire C compiler that is able to compile the Linux kernel, I think it has already surpassed an unimaginable threshold for "cleverness".
"Hammock driven development"'s force multiplying projected to increase steadily on, as the coding becomes less the pressure point.
This text color and background is unreadable.
What theme did you use? I really like the "garden" theme
I can't help but draw parallels to the systems programmers who would scoff at people getting excited over CSS and JavaScript features. "Just write the code yourself! There is nothing new here! Just think!"
The point of programming is to automate reasoning. Don't become a reactionary just because your skills got got. The market is never wrong; even if there is a correction, in 20 years we'll see Nvidia with a $10T market cap, like every other correction (AT&T, NTT).
Ah yes, slightly abstracted mathematical concepts compiling down to mathematical logic is totally the same as trying to unpredictably guess which mathematical concepts your massive, complex, non-mathematical natural language might possibly be referring to
Programmers for some reason love to be told what to do. First thing in the morning, they look for someone else to tell them what to do, how to test, how to validate.
Why do it yourself, the way you want to do it, when you could just fall back to mediocrity and do it like everybody else does?
Why think when you can be told what to do?
Why have intercourse with your wife when you can let someone else do it instead? This is the typical LLM user mentality.
Maybe I don't understand it correctly, but to me this reads like the author isn't actually using AI agents. I don't talk or write prompts anymore. I write tasks and I let a couple of AI agents complete those tasks, exactly how I'd distribute tasks to a human. The AI code is of varying quality and they certainly aren't great at computer science (at least not yet), but it's not like they write worse code than some actual humans would.
I like to say that you don't need computer science to write software, until you do. The thing is that a lot of software in the organisations I've worked in doesn't actually need computer science. I've seen horrible JavaScript code on the back-end live a full lifecycle of 5+ years without needing much maintenance, if any, and be fine. It could probably have been more efficient, but compute is so cheap that it never really mattered. Of course, I've also seen inefficient software or errors cost us a lot of money when our solar plants didn't output what they were supposed to. I'd let AIs write one of those things any day.
Hell, I did recently. We had an old JavaScript service which was doing something with the HubSpot API. I say something because I never really found out what it was. Basically, HubSpot sunset v1 of their API, and before the issue arrived at my table my colleagues had figured out that was the problem. I didn't really have the time to fix this, so when I saw how much of a mess the JavaScript code was and realized it would take me a few hours to figure out what it even did... well... I told my AI agent running on our company framework to fix it. It did so in 5-10 minutes with a single correction needed, and it improved the JavaScript quite a bit while doing it, adding types to everything. I barely even got out of my flow to make it happen. So far it's run without any issues for a month. I was frankly completely unnecessary in this process. The only reason it was me who fired up the AI is that the people who sent me the task haven't yet adopted AI agents.
That being said... AIs are a major security risk that needs to be handled accordingly.
> I think it’s important to highlight at this stage that I am not, in fact, “anti-LLM”. I’m anti-the branding of it as “artificial intelligence”, because it’s not intelligent. It’s a form of machine learning.
It's a bit weird to be against the use of the phrase "artificial intelligence" and not "machine learning". Is it possible to learn without intelligence? Methinks the author is a bit triggered by the term "intelligence" at a base primal level ("machines can't think!").
> “Generative AI” is just a very good Markov chain that people expect far too much from.
The author of this post doesn't know the basics of how LLMs work. The whole reason LLMs work so well is that they condition on the entire context rather than being memoryless, and memorylessness is the key property of a Markov process.
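To make the contrast concrete (a rough formalization, leaving aside that one could also treat the whole context window as the "state"): a first-order Markov chain predicts the next token from only the previous one, whereas an autoregressive LLM with context length n conditions on everything it can see.

```latex
% First-order Markov chain over tokens: memoryless beyond the last token
P(x_t \mid x_1, \dots, x_{t-1}) = P(x_t \mid x_{t-1})

% Autoregressive LLM with context length n: conditions on the whole window
P_\theta(x_t \mid x_1, \dots, x_{t-1}) = P_\theta(x_t \mid x_{t-n}, \dots, x_{t-1})
```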