I did a bit of digging into why you think agentic coding is “not there yet”, and I think you are bashing a tool you have very little experience with and are using it a bit wrong.
Nothing wrong with that, except that, unlike with any other tool out there, agentic coding gets approached by smart senior engineers who would otherwise spend time reading documentation and understanding a new package/tool/framework before drawing conclusions, yet here the conclusion is “I spun up Claude code and it’s not working”. Dunno why the same level of diligence isn’t applied to agentic coding as well.
The first question I always ask such engineers is “what model have you tried?”, and it always turns out to be a non-SOTA model for tasks that are not simple. Have you tried Claude Opus?
Second question: have you tried plan mode?
And then I politely ask them to read some documentation on using these tools, because the simplicity of the chat interface is deceptive.
> You could take an editor session, a diff, or a pull request and automatically split it into a series of more focused commits that are easier for people to review. This is one of the cases where the AI can reduce human review labor
I feel this should be a bigger focus than it is. All the AI code review startups are mostly doing “hands-off” code review. It’s just an agent reviewing everything.
Why not have an agent create a perfect “review plan” for human consumption? Split the review into parts that can be individually (or independently) reviewed and then fixed by the coding agent. Impose a proper ordering of files (GitHub shows files in a commit alphabetically, which is suboptimal), and hide boring details like function implementations that can be easily unit tested.
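A minimal sketch of what such a machine-generated review plan could look like as a data structure, purely as an illustration (the field names and layout here are invented, not from any existing tool):

    from dataclasses import dataclass, field

    @dataclass
    class ReviewChunk:
        """One independently reviewable slice of a PR."""
        title: str               # e.g. "Rename struct fields"
        files: list[str]         # ordered for reading, not alphabetically
        summary: str             # what changed and why
        skippable: bool = False  # mechanical change already covered by unit tests

    @dataclass
    class ReviewPlan:
        chunks: list[ReviewChunk] = field(default_factory=list)

        def render(self) -> str:
            """Print the plan in the order a human should review it."""
            out = []
            for i, c in enumerate(self.chunks, 1):
                note = " (skim: covered by tests)" if c.skippable else ""
                out.append(f"{i}. {c.title}{note}")
                out.append(f"   {c.summary}")
                out.extend(f"   - {path}" for path in c.files)
            return "\n".join(out)

The coding agent would fill this in from the diff; the human then walks the rendered list top to bottom.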
Unfortunately GitHub doesn’t let you easily review commits in a PR. You can easily selectively review files, but comments are assumed to apply to the most recent HEAD of the PR branch. This is probably why review agents don’t natively use that workflow. It would probably not be hard to instruct the released versions of Opus or Codex to do this, however, particularly if you can generate a PR plan, either via human or model.
I do this. For example, the other day I made a commit where I renamed some fields of a struct and removed others, then I realized it would be easier to review if those were two separate commits. But it was hard to split them out mechanically, so I asked Claude to do it, creating two new commits whose end result must match the old one and must both pass tests. It works quite well.
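A rough sketch of how that invariant could be checked mechanically, assuming the original single commit is still reachable (e.g. via ORIG_HEAD) and the two replacement commits are the last two on the current branch; the test command is a placeholder:

    import subprocess

    def git(*args: str) -> str:
        """Run a git command and return its stdout."""
        return subprocess.run(("git",) + args, check=True,
                              capture_output=True, text=True).stdout.strip()

    def verify_split(original: str, test_cmd: list[str]) -> None:
        tip = git("rev-parse", "HEAD")
        first = git("rev-parse", "HEAD~1")

        # 1. The two new commits combined must reproduce the original tree exactly.
        assert git("diff", original, tip) == "", \
            "split commits do not reproduce the original end state"

        # 2. Each new commit must pass the test suite on its own.
        for rev in (first, tip):
            git("checkout", "--quiet", rev)
            subprocess.run(test_cmd, check=True)
        git("checkout", "--quiet", tip)  # leave the tree at the tip

    # Illustrative usage (placeholders): verify_split("ORIG_HEAD", ["pytest", "-q"])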
Yes please. There are many use cases where failure modes are similar to not using AI at all, which is useful.
Many very low risk applications of AI can add up to high payoff without high risk.
I wonder if the problem of idle time / waiting / breaking flow is a function of the slowness. That would be simple to test, because there are super fast 1000 tok/s providers now.
(Waiting for Cerebras coding plan to stop being sold out ;)
I've used them for smaller tasks (making small edits), and the "realtime" aspect of it does provide a qualitative difference. It stops being async and becomes interactive.
A sufficient shift in quantity produces a phase shift in quality.
--
That said, the main issue I find with agentic coding is my mental model getting desynchronized. No matter how fast the models get, it takes a fixed amount of time for me to catch up and understand what they've done.
The most enjoyable way I've found of staying synced is to stay in the driver's seat and command many small rapid edits manually. (i.e. I have my own homebrew "agent" that's just a loop: I prompt it, it proposes edits, I accept or edit, repeat.)
So the "synchronization" of the mental state happens continuously, because there is no opportunity for desynchronization: you are the one driving. I call that approach semi-auto, or Power Coding (akin to Power Armor, which is wielded manually but greatly enhances speed and strength).
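A minimal sketch of such a loop, with propose_patch as a placeholder for whatever model API or CLI is actually used (nothing below is a real tool, just an illustration of the accept/edit/reject cycle):

    import os
    import subprocess
    import tempfile

    def propose_patch(instruction: str, context: str) -> str:
        """Placeholder: call your model of choice and return a unified diff."""
        raise NotImplementedError

    def read_context() -> str:
        """Whatever context the model should see; here, the current working diff."""
        return subprocess.run(["git", "diff"], capture_output=True, text=True).stdout

    def edit_in_editor(text: str) -> str:
        """Let the human tweak the proposed diff before it is applied."""
        with tempfile.NamedTemporaryFile("w", suffix=".patch", delete=False) as f:
            f.write(text)
            path = f.name
        subprocess.run([os.environ.get("EDITOR", "vi"), path], check=True)
        with open(path) as f:
            return f.read()

    def power_coding_loop() -> None:
        while True:
            instruction = input("edit> ").strip()
            if not instruction:
                break
            patch = propose_patch(instruction, read_context())
            print(patch)
            choice = input("[a]ccept / [e]dit / [r]eject? ").strip().lower()
            if choice == "e":
                patch = edit_in_editor(patch)
            if choice in ("a", "e"):
                subprocess.run(["git", "apply", "-"], input=patch, text=True, check=True)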
You still have to synchronize with your code reviewers and teammates, so how well you work together as a team becomes a limiting factor at some point, I guess.
Yes, and that constraint shows up surprisingly early.
Even if you eliminate model latency and keep yourself fully in sync via a tight human-in-the-loop workflow, the shared mental model of the team still advances at human speed. Code review, design discussion, and trust-building are all bandwidth-limited in ways that do not benefit much from faster generation.
There is also an asymmetry: local flow can be optimized aggressively, but collaboration introduces checkpoints. Reviewers need time to reconstruct intent, not just verify correctness. If the rate of change exceeds the team’s ability to form that understanding, friction increases: longer reviews, more rework, or a tendency to rubber-stamp changes.
This suggests a practical ceiling where individual "power coding" outpaces team coherence. Past that point, gains need to come from improving shared artifacts rather than raw output: clearer commit structure, smaller diffs, stronger invariants, better automated tests, and more explicit design notes. In other words, the limiting factor shifts from generation speed to synchronization quality across humans.
This thread seems to have re-identified Amdahl’s law in the context of software development workflow.
Agentic coding is only speeding up or parallelising a small part of the workflow - the rest is still sequential and human-driven.
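To make the Amdahl point concrete with made-up numbers: if code generation were, say, 30% of the end-to-end workflow and an agent made it 5x faster, the overall speedup would still be modest:

    def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
        """Overall speedup when only part of the work is sped up (Amdahl's law)."""
        return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)

    # Illustrative numbers only.
    print(amdahl_speedup(0.3, 5))    # ~1.32x overall
    print(amdahl_speedup(0.3, 1e9))  # ~1.43x even with an "infinitely fast" agent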
You can ask the agent to reverse engineer its own design and provide a design document that can inform the code review discussion. Plus, hopefully human code review would only occur after several rounds of the agent refactoring its own one-shot slop into something that's up to near-human standards of surveyability and maintainability.
The post had nothing to do with Haskell, so the title is a bit misleading. But the rest of the article is good, and I actually think that agentic/AI coding will probably evolve in this way.
The current tools are the infancy of AI assisted coding. It’s like the MS-DOS era. Over time maybe the backpropagating from “your comfort language” to “target language” could become commonplace.
Agree. Gist of the FA is about "calm technology". Title should reflect it better.
Also agree on everything author mentions. I can't attest to all examples but I know what a UI is.
The author mentions the center of the focus of attention. We should hear more often about the periphery of our attention field. Its bandwidth, so to speak, is an order of magnitude lower than the center's, but it's still there and can guide some decisions quite unintrusively, without breaking flow.
(Major) eye movements are a detriment to attention, which itself should be treated like a commodity (in the case of a UI thousands use, more so like a borrowed commodity).
> Post had nothing to do with Haskell so the title is a bit misleading.
To be fair, that's not part of the article's title, but rather the title of the website that the article was posted to.
I know, but that's not typically how you see titles posted here. I'm just disappointed as I enjoy writing Haskell. :)
I was excited to see a non-AI article on this site for once. Oh well.
It was a good article though
Agreed. This website seems to prepend the blog name to each page's document.title
Would suggest that one of the mods remove it
Is the article good? I found it of surprisingly poor quality. Is my assessment incorrect? Basically it is an article that tries to convince people of how relevant AI is nowadays. I don't really see it like that at all, and I found none of the "arguments" convincing.
Programming languages are the most interesting area in CS for the next 10 years. AI needs criteria for correctness that can't be faked, so the boundary between proof verification and programs will become fuzzier and fuzzier. The runtimes also need to support massively parallel development in a way that is totally unnecessary for humans.
What I've found is that most people who dislike the chat interface aren't using it in a way that leverages its strengths.
Up until recently, LLMs just plain sucked. You'd set them on a task and then spend hours hand-holding them to output something almost correct.
Nowadays you can have a conversation with the chatbot, hash out a design, rubber duck and discuss what-ifs until you have a solid idea of the thing you're building, codified in a way an agent could understand, and now you have a PLAN.
From there, it's a matter of setting the agent in motion and checking from time to time to make sure it's not getting stuck on something under-specified.
That said, I've found that this kind of workflow works a lot better with Claude than with Gemini.
I have had the same feeling recently: we should focus more on using AI to enable us, to empower us to do the important things. Not take away but enhance: boring, clear boilerplate yes, design decisions no. And making reviewing easier is a perfect example of enhancing our workflow. Not reviewing for us, but supporting us.
I have recently been using this tiny[1] skill to generate an order in which to review a PR, and it has been very helpful to me.
https://www.dev-log.me/pr_review_navigator_for_claude/
I have been considering what it would be like to give each function name a specific color, give each variable a color for its type followed by a color derived from the hash of the symbol name, and give keywords each their own specific color, then essentially print a matrix of this, transforming your code into a printable "low-LOD" or "mipmap" form. This could be implemented like the VSCode minimap, but I think the right move here is to implement it as a hook that can modify the output of your agent. That way you can look at the structure of the code without reading the names in particular.
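A minimal sketch of the name-hashing half of that idea (no type information, which would need a language server), assuming a crude regex tokenizer and 256-color ANSI backgrounds in a terminal; everything here is illustrative:

    import re
    import zlib

    TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\S")
    KEYWORDS = {"def", "return", "if", "else", "for", "while", "class", "import"}

    def color_of(token: str) -> int:
        """Keywords get fixed colors; identifiers get a color hashed from their name."""
        if token in KEYWORDS:
            return 1 + sorted(KEYWORDS).index(token)      # small fixed palette
        return 16 + (zlib.crc32(token.encode()) % 216)    # hash into the 6x6x6 color cube

    def low_lod(source: str) -> str:
        """Render each line as a row of colored cells, one two-character cell per token."""
        rows = []
        for line in source.splitlines():
            cells = "".join(f"\x1b[48;5;{color_of(t)}m  \x1b[0m" for t in TOKEN.findall(line))
            rows.append(cells)
        return "\n".join(rows)

    print(low_lod(open(__file__).read()))  # a "minimap" of this very file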
Great idea. As a "visual type", I would find this so much more intuitive to decipher. I prefer TUIs over GUIs exactly because they're simpler and work hard to focus on the essential. This is low-hanging fruit for enhancing TUIs.
I really like the "file lens" example:
> “Focus on…” would allow the user to specify what they're interested in changing and present only files and lines of code related to their specified interest.
> “Edit as…” would allow the user to edit the file or selected code as if it were a different programming language or file format.
I wholeheartedly prefer chat interfaces over inline AI suggestions.
I find the inline stuff so incredibly annoying because it moves around the text I am looking at.
Same! It feels like being shouted at nonstop by an overeager teacher's pet who's wrong 60% of the time.
I do appreciate in-IDE functionality that can search the codebase etc etc, but I want to hit a button when I need it.
This is an amazing article. The HN title should be edited a bit. "Calm Technology - Beyond Agentic Coding"
Hard agree.
The "junior dev" analogy is the one I keep coming back to, but the part people miss is the review surface area problem.
When a human junior writes code, they leave breadcrumbs of their thinking — commit messages, PR descriptions, comments explaining why they chose approach A over B. You can reconstruct their reasoning from the artifact trail.
Agents don't do this naturally. You get a diff with no context for why it went that direction. So the reviewer has to reverse-engineer the thinking from the code alone, which is actually harder than reviewing human code because there are no "tells" — no familiar coding style, no consistent patterns that hint at the developer's mental model.
The semi-auto approach mentioned upthread works precisely because it solves this: you were there for every decision, so there's nothing to reconstruct. The productivity loss from staying in the loop is offset by the time you save not having to audit opaque changes after the fact.
The only way AI companies can recover their capex is to replace workers. That's why their interfaces are only facially built for the workers they're replacing (engineers, finance, etc) and why this is a non-starter: it totally undermines the business model.
The “Calm technology” thing always annoys me, because it skips every economic, social, and psychological reason for the current state of affairs and presents itself as some kind of wondrous discovery, as opposed to “the way things were before we invented the MBA.” A willing blindness to predators doesn’t provide a particularly useful toolkit.
I would be interested to hear you elaborate on this more. I feel like I almost get what you are saying but am not confident I actually understand.
“Facet-based project navigation
You could browse a project by a tree of semantic facets. For example, if you were editing the Haskell implementation of Dhall the tree viewer might look like this prototype I hacked up”
^ This is a genius idea - someone add this to claude
At work we use Clean Architecture, which is incredibly hard to browse. Even though I've been there for 6+ months now and know where everything is, I have to use so much working memory to gather together the files for a feature slice (endpoint, command, command handler, etc.).
I've thought for a while of building this exact thing as a vscode extension because of how utterly shit it is :D
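For what it's worth, a rough sketch of grouping a Clean Architecture tree into feature slices, assuming the slice name can be recovered from file naming conventions (the conventions and suffixes below are invented for illustration):

    import re
    from collections import defaultdict
    from pathlib import Path

    # Invented convention: CreateOrderEndpoint.cs, CreateOrderCommand.cs,
    # CreateOrderCommandHandler.cs, ... all belong to the "CreateOrder" slice.
    ROLE = re.compile(
        r"(?P<feature>[A-Z]\w+?)"
        r"(?P<role>Endpoint|CommandHandler|Command|QueryHandler|Query|Validator|Dto)\.cs$"
    )

    def feature_slices(root: str) -> dict[str, list[str]]:
        """Group source files by the feature slice their name implies."""
        slices: dict[str, list[str]] = defaultdict(list)
        for path in Path(root).rglob("*.cs"):
            m = ROLE.search(path.name)
            if m:
                slices[m.group("feature")].append(str(path))
        return slices

    for feature, files in sorted(feature_slices(".").items()):
        print(feature)
        for f in sorted(files):
            print("   ", f)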
I really want the source code!
> A tool is not meant to be the object of our attention; rather the tool should reveal the true object of our attention (the thing the tool acts upon), rather than obscuring it
I think this is true of AI agents. What is the object of our engineering attention? Applications, features, defect resolution. Not code.
"I allow interview candidates to use agentic coding tools and candidates who do so consistently performed worse"
I have a similar impression. It seems to me that people get something that kind of works and then their interest runs out and they're left with a shallow understanding of the result and how it might be achieved. This seems detrimental to learning, which tends to happen when one is struggling.
"I strongly believe that chat is the least interesting interface to LLMs"
This is also something I agree with. When I work with databases, the best part is not sitting with an immediate client writing raw queries by hand.
Agentic coding doesn't make any sense for a job interview. To do it well requires a detailed specification prompt which can't reliably be written in an interview. It ideally also requires iterating upon the prompt to refine it before execution. You get out of it what you put into it.
As someone that agentically codes A LOT: detailed specs are not required, but they are certainly one way to use these systems.
If you are going to do a big build-out of something, spec up front, at least to have a clear idea of the application's architectural boundaries.
If you are adding features to a mature code base, then the general order of the day is: first have the AI scout all the code related to the thing you are changing. Then have it give you a summary of its general plan of action. Then fire it off and review the results (or watch it, though that's less needed now).
For smaller edits or even significant features, I often just give it very short instructions of a few sentences; if I have done my job well the code is fairly opinionated and the models pick up the patterns well, so I don't really have to give much guidance. I'll usually just ask for a few touch-ups like introducing some fluent API niceties.
That being said, I do tend to make a few surgical requests of the AI when I review the PR, usually around abstraction seams.
(For my play projects I don't even look at the code any more unless I hit a wall, and I haven't really hit a wall since Opus 4.5, though I do have a material physics simulator that Opus 4.5 wrote that runs REALLY slow that I should muck around in, but I'm thinking of seeing if Opus 4.6 can move it to the GPU by itself first.)
So if I were doing an interview with an interview question, I would probably do a "let's break down what we know", "what can we apply to this", "OK, let's start with x", and then iterate quickly and look at the code to validate as needed.
In the UK the driving test requires a portion of driving using a satnav, the idea being that drivers are going to use satnavs, so it's important to test that they know how to use them safely.
The same goes for using Claude in a programming interview. If the environment of the interview is not representative of how people actually work, then the interview needs to be changed.
In the Before Times we used to do programming interviews with “you can use Google and stack overflow” for precisely this reason. We weren’t testing for encyclopaedic knowledge - we were testing to see if the candidate could solve a problem.
But the hard part is designing the problem so that it exercises skill.
We don't solve LeetCode for a living yet it is asked in interviews anyway, so nah, we don't have to use AI in interviews.
You’ve just written the exact reason LeetCode is widely mocked as an interview technique. Those problems are not representative of most real-world software, and engineers who train to solve them give a false impression of their ability to solve most other problems.
I’ve interviewed hundreds of engineers for software and hardware roles. A good coding test is based on self-contained problems that the team actually encountered while developing our product. Boil the problem down to its core, create a realistic setup that reflects the information the team had when they encountered the challenge, and then ask the candidate to think it through. It doesn’t matter if they only write notes or pseudo code, and it doesn’t matter if they reach the wrong conclusion. What it’s testing for is the thought process. The fact the candidate has to ask the interviewer questions as though the interviewer is effectively the IDE, is great! The interviewer experiences the engineer’s thought process first-hand. And the interviewer can nudge the candidate in the correct direction by communicating answers that aren’t just typical IDE error messages.
To validate these kinds of questions in advance, I’d often run them on existing team members that hadn’t already been exposed to the real challenge the problem was based on.
How about bug fixing? Give someone a repo with a tricky bug, ask them to figure it out with the help of their coding agent of choice.
It doesn't have to be a "tricky" bug. A straightforward bug will do. If it's too tricky, the logic could be better off being rewritten.
>which can't reliably be written in an interview
Why not? It sounds like a skill issue to me.
>It ideally also requires iterating upon the prompt to refine it before execution.
I don't understand. It's not like you would need to one shot it.
It's a time issue. Interviews hardly offer much time as it is. To ask for something that benefits from multiple iterations is probably not going to fit in the available time.
> I believe there is a lot of untapped potential in AI-assisted coding tools
Yikes.
By the way, the whole website is strange. Just the name alone "haskell for all".
Many years ago when I tried to learn Haskell (and wrote some Haskell code that worked, but it was sooooo much harder compared to Ruby or Python), one of the few things that came up early on, aside from the monad barrier, was that many Haskell people said that Haskell is deliberately not for everyone. This was back when IRC was still in vogue, so I "heard" that via various discussions on #haskell.
I did not fully understand this part, because... why would you write a language that only a few big-brain people could use? I found that elitist and snobbish, even arrogant.
Only at a later time did I understand one part of the meaning. The "we don't want you here" also means "we don't want YOU to change Haskell into some other new meta-variant". I understood this much better when some guys wanted to have Ruby embrace types. Then I understood that people not only want to change a language but can also ruin it; whether on purpose or because they prefer something else (such as the types-only code bases their brains have embraced) is a separate discussion. I still find the Haskell attitude very elitist, but I at least understand that they don't want everyone to use, and change, Haskell.
> For example, someone who was new to Haskell could edit a Haskell file “as Python” and then after finishing their edits the AI attempts to back-propagate their changes to Haskell.
I like the general idea behind "write in any language, have it work in EVERY language". But the whole AI movement seems more about trying to dumb people down, really, or make them lazy, in many ways. I have seen people use it to great effect, so I am not at all saying AI has no use cases. What I have noticed, however, is that it has made many normal folks super-lazy. They type on their smartphone, a solution comes out, task finished, move on. That's not necessarily only bad, but it comes with trade-offs. My approach is much slower, but it is systematic and I am in full control of what is documented how and where.
> This is obviously not a comprehensive list of ideas, but I wrote this to encourage people to think of more innovative ways to incorporate AI into people's workflows
Oh, he has achieved this in a different way. Now I have another reason to not want AI in my "workflows". The whole website also seems super-strange to me. Has he used AI to write the whole content and layout? It's hard to say because I don't know how it used to be in the past, but the paragraphs and the content seem so strange. I suspect he used AI to generate the layout too, and some of the content as well. We are losing "interaction" with real humans here too. (OK, there is not a lot of interaction with regard to a static website, but if a blog is written by AI, then there is not really any possibility for interaction with a human - you could not even distinguish WHO wrote the content or made decisions such as which style to choose and so forth. It looks very fake to me, or at least in part. I typically don't see this with other blogs.)
Author here: my pronouns are she/her
I did not use AI to generate my blog's content nor layout.
Also, the reason my blog is named "Haskell for all" is because I originally created my blog a long time ago to try to make Haskell more accessible to people and counter the elitist tendencies.