Anyone else find reading things like this slightly exhausting?
I'm very much pro AI for coding; there are clearly significant capabilities there, but I'm still getting my head around how best to utilise it.
Posts like these make it sound like ruthlessly optimizing your workflow, every single day, letting no possible efficiency slip, is the only way to work now. That has always been possible, and it has generally not been a good idea to focus on exclusively. There have always been processes to optimise and automate, and always a balance to strike over which ones to pursue.
Personally I am incorporating AI into my daily work but not getting too bogged down by it. I read about some of the latest ideas and techniques and choose carefully which I employ. Sometimes I'll try an AI workflow and then abandon it. I recently connected Claude up to draw.io with an MCP; it had some good capabilities, but for the specific task I wanted it wasn't really getting it, so doing it manually was the better choice to achieve what I wanted in good time.
The models themselves and the coding harnesses are also evolving quickly, so complex workflows people put together can quickly become pointless.
More haste, less speed as they say!
Don't worry so much about speed. Some people obsess over it and don't realize they are running in the wrong direction.
Yeah, AI generated diagrams can be pretty hit or miss. The lack of good quality examples in training data and minimal documentation for these tools can make it difficult for models to even get basic syntax correct for more complex diagrams.
I’ve had a lot of success dogfooding my own product, the Mermaid Studio plugin for JetBrains IDEs (https://mermaidstudio.dev).
It combines the deep semantic code intelligence of an IDE with a suite of integrated MCP tools that your preferred agent can plug into for static analysis, up to date syntax, etc.
I basically tell Claude Code to run the generated diagram through the analysis tool, fix issues it detects and repeat until fixed. Then generate a png or svg for a visual inspection before finalizing the diagram.
Now all of my planning and architecture docs are filled with illustrative flowcharts, sequence diagrams, and occasionally block diagrams for workshopping proposed UI layouts.
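If you want to approximate this loop without the plugin, here's a rough sketch using the stock Mermaid CLI (mmdc from @mermaid-js/mermaid-cli) as the validator; the script shape and file arguments are just my assumptions for illustration, the plugin's MCP tools do a richer version of the same check:

```python
# Rough sketch of a validate/render step an agent can be told to loop on.
# Assumes `mmdc` (@mermaid-js/mermaid-cli) is installed and on PATH.
import subprocess
import sys

def validate_and_render(diagram_path: str, output_path: str) -> bool:
    """Render the diagram; mmdc exits non-zero when the syntax is invalid."""
    result = subprocess.run(
        ["mmdc", "-i", diagram_path, "-o", output_path],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Surface the parser error so the agent can read it, fix the
        # .mmd source, and run the script again.
        print(result.stderr, file=sys.stderr)
        return False
    return True

if __name__ == "__main__":
    sys.exit(0 if validate_and_render(sys.argv[1], sys.argv[2]) else 1)
```

The instruction to the agent then amounts to "edit the diagram until this script exits 0, then show me the rendered output."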
Nope, but I do find it revolting.
I know people like this; there's a form of procrastination where they are busy hyperoptimising their todo lists and workflows while getting only a tiny amount of actual work done. It's a form of the disconnected intellect: they can tell you the problem, they can even tell you the solution, but they can't turn that knowledge into action. By convincing themselves that utterly trivial inconveniences are so harmful to productivity or to their psychology, they can then rationalize all this "meta-work" while simultaneously claiming to be virtuous, when in reality it's usually insecurity in their abilities, or cowardice about facing the unknown, that keeps them from doing real work.
I almost agree with you, but with regard to this specific blog I'm not sure I can.
From my perspective, all this energy spent on AI prompting is actually just planning meetings and whiteboarding in disguise, but since those have the bad reputation of luring devs into power struggles and yak shaving, this is the new way.
It's likely where most of their improved productivity is coming from. The people doing the meta-work just need to be mature about it to avoid procrastinating.
"Agents should work overnight, on commutes, in meetings, asynchronously."
If I read stuff like that, I wonder what the F they are doing. Agents work overnight? On what? Stuck in some loop, trying to figure out how to solve a bug by trial and error because the agent isn't capable of finding the right solution? Nothing good will come out of that. When the agent clearly isn't capable of solving an issue in a reasonable amount of time, it needs help. Quite often, a hint is enough. That, of course, requires the developer to still understand what the agent is doing. Otherwise, most likely, it will sooner or later do something stupid to "solve" the issue. And later, you need to clean up that mess.
If your prompt is good and the agent is capable of implementing it correctly, it will be done in 10 minutes or less. If not, you still need to step in.
Everyone here (including me) agrees on how dumb this idea is, yet I know C level people who would love it.
I wonder how our comments will age in a few years.
Edit: to add
> Review the output, not the code. Don't read every line an agent writes
This can't be a serious project. It must be a greenfield startup that's just starting.
> I wonder how our comments will age in a few years.
I don't think there will be a future where agents need to work on a limited piece of code for hours. Either they are smart enough to do it in a limited amount of time, or someone smarter needs to get involved.
> This can't be a serious project. It must be a greenfield startup that's just starting.
I rarely review UI code. Doesn't mean that I don't need to step in from time to time, but generally, I don't care enough about the UI code to review it line-by-line.
> I wonder how our comments will age in a few years.
Badly. While I wouldn't assign a task to an LLM that requires such a long running time right now (for many reasons: control, cost etc) I am fully aware that it might eventually be something I do. Especially considering how fast I went from tab completion to whole functions to having LLMs write most of the code.
My competition right now is probably the grifters and hustlers already doing this, and not the software engineers that "know better". Laughing at the inevitable security disasters and other vibe coded fiascos while back-patting each other is funny but missing the forest for the trees.
We don't have enough context here really. For simple changes, sure - 10min is plenty. But imagine you actually have a big spec ready, with graphical designs, cucumber tests, integration tests, sample data, very detailed requirements for multiple components, etc. If the tests are well integrated and the harness is solid, I don't see a reason not to let it go for a couple hours or more. At some point you just can't implement things using the agent in a few simple iterations. If it can succeed on a longer timeline without interruption, that may be actually a sign of good upfront design.
To be clear, this is not a hypothetical situation. I wrote long specs like that and had large chunks of services successfully implemented up to around 2h real-time. And that was limited by the complexity of what I needed, not by what the agent could handle.
To be fair, for major features 30m to an hour isn’t out of this world. Browser testing is critical at this point but it _really_ slows down the AI in the last 15% of the process.
I can see overnight for a prototype of a completely new project with a detailed SPEC.md and a project requirements file that it eats up as it goes.
I can think of one reason for letting agents run overnight: running large models locally is incredibly slow or incredibly expensive, even more so with the recent RAM price spikes thanks to the AI bubble. Running AI overnight can be a decent solution for solving complex prompts without being dependent on the cloud.
This approach breaks the moment you need to provide any form of feedback, of course.
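For the curious, the bare-bones version of that overnight setup can be a simple batch script against a local server. This is only an illustrative sketch, assuming an Ollama instance on its default port, a model that fits in local memory, and a prompts.txt file with one prompt per line:

```python
# Illustrative overnight batch against a local Ollama server (default port
# 11434). The model name and file layout are assumptions for the example.
import json
import pathlib
import urllib.request

MODEL = "qwen2.5-coder:32b"  # whatever actually fits your RAM/VRAM

def run_prompt(prompt: str) -> str:
    payload = json.dumps({"model": MODEL, "prompt": prompt, "stream": False})
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload.encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    prompts = pathlib.Path("prompts.txt").read_text().splitlines()
    for i, prompt in enumerate(prompts):
        # Write each answer as it finishes so a crash overnight
        # doesn't lose the earlier results.
        pathlib.Path(f"answer_{i:03}.md").write_text(run_prompt(prompt))
```

Start it before bed (cron, nohup, whatever) and read the answers in the morning; as said, it falls apart the moment a prompt needs feedback mid-run.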
Yeah if you have agents running overnight you're probably stupid
10 minutes is not the limit for current models. I can have them work for hours on a problem.
Humans are not the only thing initiating prompts either. Exceptions and crashes coming in from production trigger agentic workflows to work on fixes. These can happen autonomously overnight, 24/7.
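To make the trigger concrete, here's a hedged sketch of a webhook receiver that turns an incoming crash report into a headless agent run. The payload fields, repo path, branch naming, and the exact agent CLI invocation are illustrative assumptions, not a description of any particular error tracker or product:

```python
# Sketch of a crash-report webhook that kicks off a headless agent run.
# Payload shape, repo path, and the agent command line are assumptions;
# wire it up to whatever error tracker and agent CLI you actually use.
import json
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

REPO = "/srv/checkouts/my-service"  # hypothetical checkout location

class CrashHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        crash = json.loads(body)
        prompt = (
            "A production exception was reported.\n"
            f"Message: {crash.get('message')}\n"
            f"Stack trace:\n{crash.get('stacktrace')}\n"
            "Find the root cause, add a failing test, fix it, and leave "
            "the work on a branch named fix/auto-triage."
        )
        # Fire-and-forget headless run; in this sketch nothing merges on
        # its own, the run just leaves a branch behind for human review.
        subprocess.Popen(["claude", "-p", prompt], cwd=REPO)
        self.send_response(202)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), CrashHandler).serve_forever()
```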
> 10 minutes is not the limit for current models. I can have them work for hours on a problem.
Admittedly, I have never tried to run it that long. If 10 minutes are not enough, I check what it is doing and tell it what to do differently, or what to look at, or offer to run it with debug logs. Recently, I also had a case where Opus was working on an issue forever, fixing one issue and thereby introducing another, then fixing that, only for the original issue to reappear. Then I tried out Codex, and it fixed it at first sight. So changing models can certainly help.
But do you really get a good solution after running it for hours? To me, that sounds like it doesn't understand the issue completely.
Sometimes it doesn't work or it will give up early, but considering these run when I'm not working, it is not a big deal. When it does work, I would say that it has figured out the hard part of the solution. I may have to do another prompt to clean it up a bit, but it got the hard work out of the way.
> or offer to run it with debug logs.
Enabling it to add its own debug logs and use a debugger can allow it to do these loops itself and understand where it's going wrong with its current approach.
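One concrete way to set that up, as a sketch: give the agent a tiny helper it's allowed to run that reruns the failing test with debug-level logging captured to a file it can read back. The test runner and flags here are plain pytest; the log file name is an assumption:

```python
# Tiny helper an agent can invoke to rerun a failing test with debug
# logging captured to a file it can then inspect.
import subprocess
import sys

def rerun_with_debug(test_id: str, log_path: str = "debug_run.log") -> int:
    """Rerun one test with live DEBUG logging and stdout captured to a file."""
    with open(log_path, "w") as log:
        proc = subprocess.run(
            ["pytest", test_id, "-x", "-s", "--log-cli-level=DEBUG"],
            stdout=log,
            stderr=subprocess.STDOUT,
        )
    return proc.returncode

if __name__ == "__main__":
    sys.exit(rerun_with_debug(sys.argv[1]))
```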
That assumes that it can easily reproduce the issues. But it's not good at interacting with a complex UI like a human user.
> Don’t spec the process, spec the outcome.
For this, which summarises vibe coding and hence the rest of the article, the models aren't good enough yet for novel applications.
With current models and assuming your engineers are of a reasonable level of experience, for now it seems to result in either greatly reduced velocity and higher costs, or worse outcomes.
One course correction in terms of planned process, because the model missed an obvious implication or statement, can save days of churning.
The math only really has a chance to work if you reduce your spend on in-house talent to compensate, and your product sits on a well-trodden path.
In terms of capability we're still at "could you easily outsource this particular project, low touch, to your typical software farm?"
You can expand it beyond novel applications. The models aren't good enough for autonomous coding without a human in the loop, period.
They can one-shot basic changes and refactors, or even many full prototypes, but for pretty much everything else they're going to start making mistakes at some point, usually very quickly. It's just where the technology is right now.
The thing that frustrates me is that this is really easy to demonstrate. Articles like this are essentially hallucinations that many people, mystifyingly, take seriously.
I assume the reason they get any traction is that a lot of people don't have enough experience with LLM agents yet to be confident that their personal experience generalizes. So they think maybe there are magical context tricks to get the current generation of agents to not make the kinds of mistakes they're seeing.
There aren't. It doesn't matter if it's Opus 4.6 in Claude Code or Codex 5.3 xhigh, they still hallucinate, fail to comprehend context and otherwise drift.
Anyone who can read code can fire up an instance and see this for themselves. Or you can prove it for free by looking at the code of any app that the author says was vibecoded without human review. You won't have to look very hard.
Agents can accomplish impressive things but also, often enough, they make incomprehensibly bad decisions or make things up. It's baked into the technology. We might figure out how to solve that problem eventually, but we haven't yet.
You can iterate, add more context to AGENTS.md or CLAUDE.md, add skills, set up hooks, and no matter how many times you do it the agents will still make mistakes. You can make specialized code review agents and run them in parallel, you can have competing models do audits, you can do dozens of passes and spend all the tokens you want; if it's a non-trivial amount of code doing non-trivial things and there's no human in the loop, there will still be critical mistakes.
No one has demonstrated different behavior; articles and posts claiming otherwise never attempt to prove that what they claim is actually possible. Because it isn't.
Just to be clear, I think coding agents are incredibly useful tools and I use them extensively. But you can't currently use them to write production code without a human in the loop. If you're not reading and understanding the code, you're going to be shipping vulnerabilities and tech debt.
Articles like this are just hype. But as long as they keep making front pages they'll keep distorting the conversation. And it's an otherwise interesting conversation! We're living through an unprecedented paradigm shift, the field of possibilities is vast, and there's a lot to figure out. The idea of autonomous coding agents is just a distraction from that, at least for now.
It blows my mind how these posts make it seem like everyone is the victim of a collective amnesia.
Literally every single point in the article was good engineering practice way before AI. So it's either amnesia or simple ignorance.
In particular, "No coding before 10am" is worded a bit awkward, as it simply means "think before you write code", which... Does it need an article for saying it?
> Does it need an article for saying it?
Not for nothing but The Art of War includes really insightful quotes like "If you do not feed your soldiers, they will die."
Good point. To clarify my stance: what I meant is that the narrative of the article is the following: AI made us change the playbook and so now, because of AI, the playbook is this one. Which is like saying that Sun Tzu wrote the cited line of the Art of War in a second edition, whereas his first version was "completely different".
Is this what LLM-induced brain damage looks like? I think it is.
> If 10x more tokens saves a day, spend the tokens. The bottleneck is human decision-making time, not compute cost.
This seems entirely backwards. Why spend money to optimize something that _isn't_ the bottleneck?
I think I finally understand why the LLM craze is like catnip to management types: they think they've found a cheat code to work around the mythical man-month.
https://x.com/a16z/status/2018418113952555445
> For my whole life in technology, there was this thing called the Mythical Man Month: nine women cannot have a baby in a month. If you're Google, you can't just put a thousand software engineers on a product and wipe out a startup because you can only... build that product with seven or eight people. Once they've figured it out, they've got that lead.
> That's not true with AI. If you have data and you have enough GPUs, you can solve almost any problem. It is magic. You can throw money at the problem. We've never had that in tech.
A woman can at times, on extremely rare occasions, have twins, or triplets, or quadruplets. In 9 months.
So that only goes some distance and then you face new limitations.
Maybe it could have been written slightly clearer, but I think the intended meaning is, "If 10x more tokens saves a day, spend the tokens. The bottleneck should be human decision-making time, not agent compute time."
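A toy back-of-the-envelope version of that trade-off, with every number a made-up assumption purely to show the shape of the comparison:

```python
# Toy break-even check: are extra agent tokens cheaper than an extra day
# of engineer time? Every figure below is an illustrative assumption.
TOKEN_PRICE_PER_M = 15.00   # assumed $ per million output tokens
BASELINE_TOKENS_M = 2.0     # assumed baseline task usage, in millions
EXTRA_TOKENS_M = 9 * BASELINE_TOKENS_M  # extra tokens when going to 10x usage
ENGINEER_DAY_COST = 800.00  # assumed fully loaded cost of one day

extra_token_cost = EXTRA_TOKENS_M * TOKEN_PRICE_PER_M  # $270 here
net_saving = ENGINEER_DAY_COST - extra_token_cost       # $530 here

print(f"extra tokens: ${extra_token_cost:.0f}, "
      f"day of engineer time: ${ENGINEER_DAY_COST:.0f}, "
      f"net: ${net_saving:.0f}")
```

Under those assumptions the 10x token spend is a few hundred dollars set against a day of someone's time, which is the whole "spend the tokens" argument in numbers.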
Any human in the loop will be a bottleneck in comparison to AI performance.
If we take that to its logical conclusion, I think we can answer that question.
Getting rid of humans, unfortunately, also takes away their earnings and therefore their ability to purchase whatever product you are developing. The ultra-rich can only purchase your product so often; hence, better make it a subscription model.
So there is pressure on purchasing power versus earnings. Interesting to see what happens and why.
Just sell it to the government, then you can take printed money directly instead of having to go through such a filthy thing as "consumers".
I'm not sure I agree with this. 10x more tokens means leaving the agent to work for 10x longer, which may lead to bugs and misinterpretation of the intention. Breaking the goal into multiple tasks seems more efficient in terms of tokens and getting close to the desired goal. Of course this means more human involvement, but probably not 10x more.
The day being referred to is the human’s time, not the AI’s time. That sentence is saying substitute the cheap, abundant resource for the expensive, bottlenecked resource.
Am I misunderstanding? Spending more tokens certainly is not optimizing for compute cost. It's the opposite.
Oh, I also have a rule of not coding before 10 am, but that's because I'm drinking tea and thinking.
I thought that's what the article would be. Instead I got an article going "we're dumber than the Artificial Intern we hired, let's just be their secretary".
The points look like disconnected pieces of wisdom rather than being tied to some common goals or objectives. First get clarity on root objectives, roles (who is doing what), artifacts, etc., and then define rules that are immediately traceable to those objectives, roles, and artifacts.
This is easy, no need for AI: just join any public-sector IT organisation, regardless of the country. :)
So if you specify work the right way, you can move at incredible velocity. If you have the confidence in your setup and ways of working, you don’t need to look at the code being put out.
Genuinely seeking answers on the following - if you’re working that way, what are you “understanding” about what’s being produced? Are you monitoring for signal that points out gaps in your spec which you update; code base is updated, bugs are fixed and the show goes on? What insights can you bring to how the code base works in reality?
Not a sceptic, but thinking this stuff through ain’t easy!
The spec prompts are typically better off being reviewed iteratively using AI itself, a lot more so than merely by pairing with coworkers. Perhaps a combination is best. The point is that AI reviews of the task spec must never be overlooked prior to its execution.
Also, if your spec is taking too long for the agent to execute, odds are high that it's ambiguous, unsound, unreviewed, underspecified, unmaintainable, or the model is just optimized to waste tokens so as to bill you maximally.
This sounds like a genuinely awful way to work.
It has always been like this.
Plan before you code. Now your plan is just in a prompt.
> Code is context, not a library. Data is the real interface.
I don't *yet* subscribe to the idea of "code is context for AI, not an interface for a human", but I have to admit that the idea sounds feasible. I have many examples of small-to-mid-size apps (local use only) where I pretty much didn't even look at the code beyond checking that it doesn't do anything finicky. There, the code doesn't matter, because I know that I can always regenerate it from my specs, POCs, etc. I agree that the paradigm changes completely if you look at code as something temporary that can be thrown away and re-created when the specification changes. I don't know where this leads, or whether it is good for our industry, but the fact is: it is feasible.
I would never use this paradigm for anything related to production, though. Nope. Never. Not in the foreseeable future anyway.
> Everyone uses their own IDE, prompting style, and workflow.
In my experience with recent models this is still not a good idea: it quickly leads to messy code where neither AI nor human can do anything anymore. Consistency is key. (And abstractions/layers/isolation everywhere, as usual).
IDE - of course. But, at the very least, I would suggest using the same foundation model across the code base, .agent/ dirs with plenty of project documents, reusable prompts, etc.
--
P.S. Still not sure what the 10AM rule brings, though...
lol. My personal preference has always been to do ALL the coding as early as possible. I get progressively dumber as the day wears on; it seems sad to waste the prime hours on meetings and other more human things.
I don't see how that would change if you accept the premise that code is now a commodity.
Not linear in my case. My best is somewhere around 11ish. So that's usually when I start my ballmer peak and take my first beer. (Joking, of course, for people who don't get the reference)
These kinds of generalizations are very organization-specific because they rely on preexisting rules set within the company. I dismiss every such rule and workflow that forces me to adjust my daily routine too heavily. Let me choose my own best ways to deliver more instead of trying to fit me into a box.
In these cases, I just read for the main point behind it, which in this case is "create a way for devs to share context when working with AI".
That is essentially what the article says: that mornings are the most productive time, but it has shifted the focus from you doing the work, mostly in the morning, to you outlining the work clearly in the morning and the agent doing the work all day (and all night, and while you commute, and while you are in meetings).
Yet to see cancer-solving or fusion breakthroughs; until then, what exactly are they running around the clock for, a CRUD app?
Coding tools are less stable as the code grows for several reasons.
Some recent techniques claim to be solving this problem, but none has reached a release yet.
Working with what we have now, this is a recipe for disaster. Agents often lie about their outputs. The smaller the context space they have to work with, and the bigger the data already in context, the more prone they are to lying and deceiving.
It works OK for small changes on top of human code. That's what we know works now. The rest is yet to be reached.
I'm writing my side project as if I can afford only a 70B model in 2028, even if I have VC-subsidised unlimited GLM-5 now. I'm trimming away most of the generated code and generating more tests.
Would prefer it if 2028 models were concise and generated perfect refactors.
Which company is actually doing this?
Sounds like GPT wrote this piece based on some tech exec's „we must use AI or lose" „strategy". Just let engineers use the tools they want instead of force-feeding them yet another ridiculous process. For me, if I had to do meetings in the morning (or „write prompts" lmao) instead of clearing out the ridiculous AI slop debt of code agents, my product would never ship.
> What would your team’s tenets look like? I’d genuinely love to hear.
My team is incredibly clueless and complacent. I can't even get them to use TypeScript or to migrate from Yarn v1.