While you can see them as a productivity-enhancing tool, in times of tight budgets they can also be used to justify laying off programmers, because a single developer is now far more productive than pre-LLM.
I feel that LLMs will raise the barrier to entry for newcomers while also making it easier for companies to lay off devs, since you don't need as many. All in all, I expect salaries for non-FAANG devs to decrease and salaries for FAANG devs to increase slightly (given the increased value they can now produce).
Any thoughts on this?
Developers (often juniors) use LLM code without taking time to verify it. This leads to bugs they can't fix because they don't understand the code. Some senior developers also trust the tool to generate a function and don't take the time to review it and catch the edge cases the tool missed.
They rely on ChatGPT to answer their questions instead of taking the time to read the documentation or do a simple web search to find discussions on Stack Overflow or blog posts about the subject. This may give results in the short term, but they don't actually learn to solve problems themselves. I am afraid that this will have huge negative effects on their careers if the tools improve significantly.
Learning how to solve problems is an important skill. They also lose the deeper knowledge that enables you to see connections, complexities and flows that the current generation of tools cannot. By reading the documentation, blogs or discussions you are often exposed to a wider view of the subject than the laser-focused answer from ChatGPT.
There will be less room for "vibe coders" in the future, as these tools increasingly solve the simple things without requiring as much management. Until we reach AGI (I doubt it will happen within the next 10 years) the tools will require experienced developers to guide them for the more complex issues. Older experienced developers, and younger developers who have learned how to solve problems and have deep knowledge, will be in demand.
Documentation is not written with answers in mind. Every little project wants me to be an expert in their solution. They want to share with me the theory behind their decisions. I need an answer now.
Web search no longer provides useful information within the first few results. Instead, I get content farms who are worse than recipe pages - explaining why someone would want this information, but never providing it.
A junior isn’t going to learn from information that starts from the beginning (“if you want to make an apple pie from scratch, you must first invent the universe.”) 99.999% of them need a solution they can tweak as needed so they can begin to understand the thing.
LLMs are good at processing and restructuring information so I can ask for things the way I prefer to receive them.
Ultimately, the problem is actually all about verification.
I have an answer now, because I read the documentation last week.
As a real example, I needed to change my editor config last month. I do this about once every 5 years. I really didn’t want to become an expert in the config system again, so I tried LLM.
Sad to report, it told me where to look but all of the exact details were wrong. Maybe someday soon, though.
When you see 30-50 years of change you realise this was inevitable: in every generation there are new engineers entering with limited understanding of the layers beneath, even of the code produced. Do I understand the lexers and the compilers that turn my code into machine code or instruction sets? Heck no. Doesn't mean I shouldn't use the tools available to me now.
And as with any automation, there will be a select few who understand its inner workings, and a vast majority who will enjoy/suffer the benefits.
You could argue that it makes the bar lower to be productive so the candidate pool is much greater, but you're arguing the opposite, increasing the barrier to entry.
I'm open to arguments either way and I'm undecided, but you have to have a coherent economic model.
You need fewer engineers to do the same work, so demand drops while supply stays just as high.
I find it interesting how these sort of things are often viewed as a function of technological advancement. I would think that AI development tools would have a marginal effect on wages as opposed to things like interest rates or the ability to raise capital.
Back to the topic at hand, however: assuming these tools do get better, competition would seemingly increase greatly. A highly skilled team with such tools could prove to be formidable competition for longstanding companies. This would require all companies to up the ante to avoid being outcompeted, requiring even more software to be written.
A company could rest on their laurels, laying off a good portion of their employees, and leaving the rest to maintain the same work, but they run the risk of being disrupted themselves.
Alas, at the job I'm at now my team can't seem to release a rather basic feature, despite everyone being enhanced with AI: nobody seems to understand the code, all the changes seem to break something else, the code's a mess... maybe next year AI will be able to fix this.
In regards to jobs and job losses, I have no idea how this is going to impact individual salaries over time in different positions, but I honestly doubt it's going to do much. Language models are still pretty bad at working with large projects in a clean and effective way. Maybe that will get better, but I think this generational breakthrough in technology is slowing down a lot.
Even if they do get better, they still need direction and validation. Both of which still require some understanding of what is going on (even vibe coding works better with a skilled engineer).
I suspect there are going to be more "programmers" in the world as a result, but most of them will be producing small boutique single-webpage tools and designs that are higher quality than the "made by my cousin's kid" sites a lot of small businesses have now. Companies above ~30 people with software engineers on staff seem to be using it as a performance enhancer rather than a work replacement tool.
There will always be shitty managers and short-sighted executives looking to replace their human staff with some tool, and there will be layoffs, but I don't think the overall pool of jobs is going to shrink. For the same reason I don't think there are going to be significant pay adjustments, but rather a dramatic increase in the long tail of cheap projects that don't make much money on their own.
So many people benefit from basic things like sorting tables, searching and filtering data etc.
Things where I might just use Excel or a small script, they can now use an LLM for.
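To make that concrete, here's a minimal sketch of the kind of "small script" task people can now get generated on demand instead of wrangling Excel; the file name and column names are hypothetical, and it assumes pandas is installed:

    # Filter and sort a CSV that would otherwise be massaged by hand in Excel.
    # "sales.csv", "year" and "revenue" are made-up names for illustration.
    import pandas as pd

    df = pd.read_csv("sales.csv")                  # load the exported spreadsheet
    recent = df[df["year"] == 2024]                # keep only the rows of interest
    top = recent.sort_values("revenue", ascending=False).head(20)  # top 20 by revenue
    top.to_csv("top_sales_2024.csv", index=False)  # write the result back out

Nothing fancy - which is exactly the point: it's the class of task a non-programmer can now get done without learning to write it themselves.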
And for now, we are still in dire need of more developers, not fewer. But yes, I can imagine that after a golden phase of 5-15 years it will start to go downhill once automation and AI get too good / better than the average Joe.
Nonetheless, the good news is also that coding LLMs enable researchers too - people who often struggle to learn to code.
What happens when most companies do this?
During the 10s, every dev out there was screaming "everyone should learn to code and get a job coding". During the 20s, many devs are being laid off.
For a field full of self-professed smart and logical people, devs do seem to be making tons of irrational choices.
Are we in need of more devs or in need of more skilled devs? Do we necessarily need more software written? Look at npm: the world is flooded with poorly written software that is one null reference exception away from crashing.
It also means it becomes easier to start a new company and solve a problem for people.
Don't discount scamming and spreading misinformation. There's a lot of money to be made there, especially in mass manipulation to destroy trust in governments and journalists. LLMs and image generators are a treasure trove. Even if they're imperfect, the overwhelming majority of people can't even distinguish a real image from a blatantly false one, let alone biased text.
Programmers aren't paid for coding, they're paid for following a formal spec in a particular problem domain. (Something that LLM's can't do at all.)
Improving coding speed is a red herring and a scam.
It's also irrelevant if LLM's can follow them - the way I use Claude Code is to have it get things roughly working, supply test cases showing where it fails, then review and clean up the code or go additional rounds with more test cases.
That's not much different to how I work with more junior engineers, who are slower and not all that much less error-prone, though the errors are different in character.
If you can't improve coding speed with LLM's, maybe your style of working just isn't amenable to it, or maybe you don't know the tooling well enough - for me it's sped things up significantly.
The fact that getting a formal spec is impossible is precisely why you need to hire a developer with a big salary and generous benefits.
The formal spec lives only in the developer's head. It's the only way.
Does an LLM coding agent provide any value here?
Hardly. It's just an excuse for the developer to waste time futzing around "coding" when what they're really paid to do is cram that ineffable but very much important formal spec into their heads.
For me it's mainly adding quick and dirty hooks to WordPress websites at the demand of berating marketing C-suites - websites that are going to disappear or stop being visited within a few months.
For that, whatever Claude spits out is more than enough. I'm reasonably confident I'm not going to write much better code in the less-than-30-minutes I'm allowed to spend to fix whatever issue comes up.
And in all 3 cases, AI has increased my productivity. I could ship things even when I'm really sleepy; if I have very little time between things, I can send a prompt to an agent, review the output, and then clean up some of the mess when I have more time.
Now my stance is really at "Whoever doesn't take advantage of it is NGMI"
You're specifically very wrong about "LLM's cannot do: following a formal spec in a particular problem domain". It does take skill to ensure that they will, though, for sure.
TLDR: Skill issue
Very easy to do, sure, but the LLM did this in one minute, recognized the context and correctly converted binary values, whereas this would have taken me maybe 30 minutes of looking up standards and docs and typing in friendly key names.
I also told it to create five color themes and apply them to the CSS. It worked on the first attempt and it looks good, much better than what I could have produced by thinking of themes, picking colors and copying RGB codes back and forth. Also, I'm not fluent in CSS.
Though I wasn't paid for this, it's a hobby project, which I wouldn't have started in the first place without an LLM performing the boring tedious tasks.
But I was talking specifically about coding agents.
(A.k.a. spend four hours micromanaging prompts and contexts to do what can be done in 15 minutes manually.)
I'd love to have an all-you-can-eat plan, but $100 p/m isn't compelling enough compared to copy/paste for $20 p/m via chat.
That's not to say the value doesn't exceed $100, I just don't want to pay it.
Yes, and that's why phone contracts migrated from "$0.0X per minute" to "$X for up to 500 minutes", and finally "$X for unlimited calls".
When the service you provide has near zero marginal cost, you'd prefer the customer use it as much as possible, because then it'll provide more value to them and they'll be prepared to pay more.
When I switched to DSL the stress went away, and I found myself using the internet in different ways than before, because I could explore freely without time pressure.
I think this applies to Claude as well. I will probably feel more free to experiment if I don't have to worry about costs. I might do things I would never think of if I'm only focused on using it as little as possible to save money.
100% with you that how you access something can add constraints and stress - in my case, back when we paid per minute, the big factor was the time windows. To maximise utility you wanted to include something useful in as many of the exchanges as possible.
With Claude Code as it is now, I clear context more often than is ideal because keeping it drives up cost. I could probably add a lot more detail to CLAUDE.md in my repos, but that drives up tokens as well.
Some of it I'll still do because it affects speed as well, but it'll be nice not to have to pay attention to it.
However, as long as Microsoft is offering copilot at (presumably subsidized) $10/mo, I'm not interested in paying 10x as much and still having limits. It would have to be 10x as useful, and I doubt that.
Claude is still the gold standard for AI assisted coding. All your Geminis and o3s of the world still don’t match up to Claude.
I started using Claude Code once it became a fixed price with my Claude Max subscription. It's taken a little getting used to vs Cline, but I think it's closer to Cline in performance than to Cursor (Cline being my personal gold standard). $100 is something most people on this forum could make back in one day of work.
$100 per month for the value is nothing and for what it’s worth I have tried to hit the usage limit and the only thing that got me close was using their deep research feature. I’ve maxed out Claude code without hitting limits.
I might be missing something, but you can use Claude 3.7 in Copilot Chat:
https://docs.github.com/en/copilot/using-github-copilot/ai-m...
VS Code with your favorite model in Copilot is rapidly catching up with Cursor, etc. It's not there yet, but the trajectory is good.
(Maybe you meant code completion? But even smaller, local models do pretty well in code completion.)
GitHub is losing money on the subs, but they are definitely trying to reduce the bleed. One way to do that is to cut corners with LLM usage: not sending as much context, trimming the context window, capping output token limits. These are all things Cursor also does, btw, which is why Cline, with almost the same tech (in some ways even inferior tech), achieves better results. I have hit $20 in API usage within a single day with Cline; Cursor lets you have "unlimited" usage for $20 a month. So it's optimised for saving costs, not for giving you the best experience. At $10 per month for Copilot, they need to save costs even more. So you get a bad experience and think it's the AI that isn't capable, but the problem is with the companies burning VC money to corner the market, setting unrealistic expectations on pricing, etc.
I expect so. The question is "How many days does the limit last for?"
Maybe they have a per-day limit, maybe it's per-month (I'm not sure), but paying $100/m and hitting the limit in the first day is not economical.
But basically you get ~300M input tokens and ~100M output tokens per month with Sonnet on the $100 plan. These are split across the 50 sessions you are allowed; each session lasts 5 hours from the first message you send. During a session, you get ~6M input and ~2M output tokens for Sonnet. Claude Code seems to use a mix of Sonnet and Haiku, and Haiku has 2x the limits of Sonnet.
So if you absolutely maxed out your 50 sessions every month, that's about $2,400 worth of usage if you had used the API instead. So it's a great deal. It's not $100 worth of API credits you're buying, so they don't run out like that. You can exhaust the limits for a given session, which is at most a 5-hour wait for your next one, or you can run out of your 50 sessions; I don't know how strongly they enforce that limit and I think it's BS, but all in all the value for money is great, way better than using the API.
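For anyone who wants to sanity-check that $2,400 figure, here's the back-of-the-envelope arithmetic as a quick sketch; the per-million-token prices are assumptions based on Sonnet's published API rates at the time, so treat them as such:

    # Rough check of the "$2,400 worth of API usage" claim for the $100 Max plan.
    INPUT_PRICE_PER_M = 3.00    # USD per million input tokens (assumed Sonnet rate)
    OUTPUT_PRICE_PER_M = 15.00  # USD per million output tokens (assumed Sonnet rate)

    input_tokens_m = 300   # ~300M input tokens/month on the $100 plan (figures above)
    output_tokens_m = 100  # ~100M output tokens/month

    api_equivalent = input_tokens_m * INPUT_PRICE_PER_M + output_tokens_m * OUTPUT_PRICE_PER_M
    print(f"API-equivalent cost: ${api_equivalent:,.0f}/month vs. a $100 flat fee")
    # -> API-equivalent cost: $2,400/month vs. a $100 flat fee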
How Rate Limits Work: With the Max plan, your usage limits are shared across both Claude and Claude Code:
Shared rate limits: All activity in both Claude and Claude Code counts against the same usage limits.
Message variations: The number of messages you can send on Claude varies based on message length, conversation length, and file attachments.
Coding usage variations: Expected usage for Claude Code will vary based on project complexity, codebase size, and auto-accept settings.
On the Max plan (5x Pro/$100), average users:
- Send approximately 225 messages with Claude every 5 hours, OR
- Send approximately 50-200 prompts with Claude Code every 5 hours
On the Max plan (20x Pro/$200), average users:
- Send approximately 900 messages with Claude every 5 hours, OR
- Send approximately 200-800 prompts with Claude Code every 5 hours
> Claude is still the gold standard for AI assisted coding. All your Geminis and o3s of the world still don't match up to Claude.
Out of date I think in this fast moving space.
Sonnet has long been the gold-standard, but that position is looking very shaky at the moment; Gemini in particular has been working wonders for me and others when Sonnet has stumbled.
VS Code/Copilot has improved massively in Cursor's wake, but yes, still some way to go to catch up.
Absolutely though - the value we are getting is incredible.
From the internet, we got used to getting everything for nothing, so people beg for a lower price even if it doesn't make sense.
I'd be careful with stating things like these as fact. I spent half an hour asking Gemini to write code that draws a graph the way I want; it never got it right. Then I asked Claude 3.7 and it got it almost right on the first try, to the point I thought it was completely right, and it fixed the bug I discovered as soon as I pointed it out.
Basically anything that isn't GPT-4o is premium, and I find GPT-4o near useless compared to Claude and Gemini in Copilot.
It's hit and miss IMO.
I like it for C#/dotnet but completely useless for the rest of the stuff I do (mostly web frontend).
I'm not sure about my usage but if I hit those premium limits I'm probably going to cancel Copilot.
In contrast - I’m not interested in using cheaper, less-than, services for my livelihood.
I'm curious, what was the return? What did you do with the 1k?
It's notable to me because there are, to my knowledge, no other Ruby wm's (there's at least one that allows scripting with Ruby, I believe, but not the whole codebase), the X11 bindings are custom (no Xlib or XCB), and there are few great examples that fit the structure of my wm. Yet it made it work. The code was ugly, and I haven't committed it yet as I want to clean it up (or get Claude to), but my priority was to be able to use the second monitor without spending more than a few hours on it, starting with no idea how multi-monitor support in X11 worked.
Since then, Claude Code has added Xinerama support to my X11 bindings, and selection support to enable a systray for my pager, and written the systray implementation (which I also didn't have the faintest clue how it worked, and so had Claude explain it to me before starting).
I use it for work too, but for these personal projects priority has been rough working code over beauty, because I use them every day and rely on the features, and want to spend as little time as possible on them, and so the work has been very different from how I work with Claude for work projects where I'll work in much smaller chunks, polish the result etc.
- https://gist.github.com/backnotprop/ca49f356bdd2ab7bb7a366ef...
- https://gist.github.com/backnotprop/d9f1d9f9b4379d6551ba967c...
- https://gist.github.com/backnotprop/e74b5b0f714e0429750ef6b0...
- https://gist.github.com/backnotprop/91f1a08d9c27698310d63e06...
- https://gist.github.com/backnotprop/7f7cb63aceb7560e51c02a9d...
- https://gist.github.com/backnotprop/94080dde34bfca3dd9c48f14...
- https://gist.github.com/backnotprop/ea3a5c3a31799236115abc76...
Taken from 2 recent systems. 90% of my interaction is assurance, debugging, and then having claude operate within the meta context management framework. We work hard to set the path for actual coding - thus code output (even complex or highly integrated) usually ends up being fairly smooth+fast.
When I "wake" CC up I usually use a prompt like this to preface any complex work: https://gist.github.com/backnotprop/d2e4547fc4546eea071b9b68... (the goal is to get all relevant context in-memory).
For most planning I use Gemini. I copy either the entire codebase (if less than ~200k tokens) or select only the parts that matter for the task in large codebases. I built a tool to help me build these prompts and keep the codebase organized in an XML structure. https://github.com/backnotprop/prompt-tower
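If you're wondering what "organized in an XML structure" means in practice, here's a minimal sketch of the idea - walk the repo and wrap each file in a tag so the model can keep paths straight. This is only an illustration of the approach, not prompt-tower's actual code:

    # Pack selected source files into an XML-structured context block for a
    # long-context model. Paths, extensions and tag names are illustrative only.
    from pathlib import Path

    def build_context(root: str, extensions=(".py", ".md")) -> str:
        parts = ["<codebase>"]
        for path in sorted(Path(root).rglob("*")):
            if path.is_file() and path.suffix in extensions:
                rel = path.relative_to(root)
                parts.append(f'<file path="{rel}">')
                parts.append(path.read_text(encoding="utf-8", errors="replace"))
                parts.append("</file>")
        parts.append("</codebase>")
        return "\n".join(parts)

    if __name__ == "__main__":
        prompt = build_context("src")  # paste this ahead of the task description
        print(f"built {len(prompt):,} characters of context")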
Could you explain why there is no punctuation?
(updated for better example)
It is not a challenging technical thing to do. I could have sat there for hours reading the conversion from v1 to v2 to v3 to v4. It is mostly just changing class names. But these changes are hard to do with %s/x/x, so you need to do them manually. One by One. For hundreds of classes. I could have as easily shot myself in the head.
> Could you anonymize and share your last 5-10 prompts?
The prompt was a simple "convert this site from tailwind v1 to v4". I use Neovim Copilot Chat to inject context and load URLs. I have found that prompts have no value; it is either something the LLM can do or not.
The two worst ways of burning API credits I've found with Claude Code are:
1. Getting argumentative/frustrated with the model if it goes off the rails and continuing to try to make something work when the model isn't getting anywhere.
If it really isn't getting something in the first few prompts, stop and rethink. Can you go back and set a smaller task? Like writing test cases that its broken approach would fail (a sketch of what that looks like is below, after these tips)? If it's not making forward progress after a couple of prompts, it's not likely to unless you split up the task and/or provide more details. This is how you burn $10 instead of $0.60 for a task that "should" be simple. It's bad at telling you something is hard.
2. Think about when you either /compact (trims the context but retains important details) or clear the context entirely. E.g. always clear when moving to another task unless they're closely related. Letting it retain a long context is a surefire way to burn through a lot (and it also slows you down a lot, not least because there's a bug that affects some of us - maybe related to TERM settings? no idea - where in some cases it will re-print the entire history to the terminal, so between tasks it's useful to quit and restart).
Also use /init, and ask it to update CLAUDE.md with lessons learned regularly. It's pretty good at figuring things out, such as how my custom ORM for a very unusual app server I'm working on works, but it's a massive waste of tokens to have it re-read the ORM layer every time instead of updating CLAUDE.md.
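To make the "write the failing tests first" idea from point 1 concrete, here's a minimal sketch of the kind of small, objective target you can hand back to the model instead of arguing with it; the slugify function, its module path and the edge cases are purely hypothetical:

    # Hand the model a concrete failing test instead of another round of "still wrong".
    # myproject.text.slugify is a hypothetical function it keeps getting wrong.
    import pytest
    from myproject.text import slugify

    @pytest.mark.parametrize("raw, expected", [
        ("Hello World", "hello-world"),
        ("  spaced   out  ", "spaced-out"),   # collapse runs of whitespace
        ("Crème brûlée!", "creme-brulee"),    # strip accents and punctuation
        ("", ""),                             # empty input stays empty
    ])
    def test_slugify_edge_cases(raw, expected):
        assert slugify(raw) == expected

Pointing it at something this specific is usually far cheaper than burning prompts on "no, that's still broken".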
This.
I was fighting with Claude for a good chunk of yesterday (usage limits seemed broken so it didn't really time me out) and most of that was getting it to fix one small issue with three test cases. It would fix one test and break the others, round and round we go. After it broke unrelated tests I had to back out all the changes and, by then, I understood the problem well enough that I could direct it how to fix it, with a little help from Deepseek.
As there are a bunch of other sections of code which suffer from the same problem I can now tell it to "look at the fixed code and do it like that" so, hopefully, it doesn't flail around in the dark as much.
Admittedly, this is fairly complicated code, being an AST to bytecode compiler with a bunch of optimizations thrown in, and part of the problem was a later optimization pass undoing the 'fixes' Claude was applying which took quite a while to figure out.
Now I just assume Claude is being intentionally daft and treat it as such with questions like "why would I possibly want a fix specifically designed to pass one test instead of a general fix for all the cases?" Oh, yeah, that's its new trick, rewriting the code to only pass the failing test and throwing everything else out because, why not?
I use it for very targeted operations where it saves me several round trips to code examples, documentation and Stack Overflow, not spamming it for every task I need to do. I spend about $1/day on focused feature development, and it feels like it saves me about half as many hours as I spend coding while using it.
AI coding saves me a lot of time writing high-quality code, as it takes care of the boilerplate and documentation/API lookups, while I still review every line, and vibe coding lets me quickly do small stuff I couldn't do before (e.g. write a whole app in React Native), but gets really brittle after a certain (small) codebase size.
I'm interested to hear whether Claude Code writes less brittle code, or how you use it/what your experience with it is.
Claude Code was the first assistant that gelled for me, and I use it daily. It wrote the first pass of multi-monitor support for my window manager. It's written the last several commits of my Ruby X11 bindings, including a working systray example, where it both suggested the whole approach and implemented it, and tested it with me just acting as a clicking monkey (because I haven't set up any tooling to let it interact with the GUI) when it ran test scripts.
I think you just need to test the two side by side and see what works for you.
I intend to give Aider a go at some point again, as I would love to use an open source tool for this, but ultimately I'll use the one that produces better results for me.
This might mean the $10/month is the best. Depends entirely on how it works for you.
(Caps obviously impact the total benefit so I agree there.)
But it doesn't really matter, because the C-level has been consumed by the hype like nothing I've ever seen. It could cost an arm and a leg and they'd still be pushing for it because the bubble is all-consuming, and anyone not touting AI use doesn't get funding from other similarly clueless and sucked-in VCs.
Just to give you one example - last BigCo I worked for had a schematic for new projects which resulted in... 2k EUR per month cloud cost for serving a single static html file.
At one point someone up top decided that kubes is the way to go and scrambled an impromptu schematic for new projects which could be simply described as a continental class dreadnought of a kubernetes cluster on AWS.
And it was signed off, and later followed like a scripture.
A couple of stories lower we're having a hard time arguing for a 50 EUR budget for a weekly beer for the team, but the company is perfectly fine with paying 2K EUR for a landing page.
They don't. They toss a coin.
Limits are a given on any plan. It would be too easy for a vibe coder to hammer away 8 hours a day, 20 days a month, if there was nothing stopping them.
The real question is whether this is a better value than pay as you go for some people.
Your vibe coders are on a different dimension than mine.
The only reason I can see is that you're lacking aggregate capacity and are unwilling or unable to build out faster. Is that the case?
They're good about telling you how full your context is, and you can use /compact to shrink it down to the essentials.
But for those of us who aren't Mr. MoneyBags like you all, keeping an eye on context size is key to keeping costs low.
You can try it for cheap with the normal pay-as-you-go way.
I don't think this is the right way to look at it. If CoPilot helps you earn an extra $100 a month (or saves you $100 worth of time), and this one is ~2x better, it still justifies the $100 price tag.
Additionally, when you’re in a compact distribution, being 5% better might be 100x more valuable to you.
Basically, this assumes that marginal value tracks cost. I don't think most things, economically, match that pattern. I will sometimes pay 10x the cost for a good meal that has fewer calories (less nutritional value).
I am glad people like you exist, but I don’t think the proposition you suggest makes sense.
You have to puppeteer it and build a meta context/tasking management system. I spend a lot of time setting Claude Code up for success. I usually start with Gemini for creating context, development plans, and project tasking outlines (I can feed large portions of the codebase to Gemini and rely on its strategy). I've even put entire library docsites in my repos for Claude Code to use - but today they announced web search.
They also have todos built in which make the above even more powerful.
The end result is insane productivity - I think the only metric I have is something like 15-20k lines of code for a recent distributed processing system from scratch over 5 days.
> I spend a lot of time setting Claude code up for success.
Normally I wouldn't post this because it's not constructive, but this piece stuck out to me and had me wondering if it's worth the trade-off. Not to mention programmers have spent decades fighting against LoC as a metric, so let's not start using it now!
I've done just about everything across the full & distributed stack. So I'm down to jam on my code/systems and how I instruct & rely on (confidently) AI to help build them.
I don't think I've ever done this or worked with anyone who had this type of output.
I daily-drive Cursor and I have rules to limit comments. I get comments on complex lines and that's it.
A lot of people seem to have these magic incantations that somehow make LLMs work really well, at the level marketing and investor hype says they do. However, I rarely see that in the real world. I'm not saying this is true for you, but absent vaguely replicable examples that aren't just basic webshit, I find it super hard to believe they're actually this capable.
For context, this is aider tracking aider's code written by an LLM. Of course there's still a human in the loop, but the stats look really cool. It's the first time I've seen such a product work on itself and tracking the results.
https://gist.github.com/backnotprop/4a07a7e8fdd76cbe054761b9...
The framework is basically the instructions and my general guidance for updating and ensuring the details of critical information get injected into context. Some of those prompts I commented here: https://news.ycombinator.com/item?id=43932858
For most planning I use Gemini. I copy either the entire codebase (if less than ~200k tokens) or select only the parts that matter for the task in large codebases. I built a tool to help me build these prompts and keep the codebase organized in an XML structure. https://github.com/backnotprop/prompt-tower
https://gist.github.com/rachtsingh/e3d2e2b495d631b736d24b56e...
Is it correct? Sort of; I don't trust the duration benchmark because benchmarking is hard, but the size should be approximately right. It gave me a pretty clear answer to the question I had and did it quickly. I could have done it myself but it would have taken me longer to type it out.
I don't use it in large codebases (all agentic tools for me choke quickly), but part of our skillset is taking large problems and breaking them into smaller testable ones, and I give that to the agents. It's not frequent (~1/wk).
Claude code, too?
I found that it is the only one that does a good job in a large codebase. It seems to be very different from others I've tested (aider, plandex).
Where is the breakpoint here? What number of lines of code or tokens in a codebase when it becomes not worth it?
I feel like as a programmer I have a meta-design in my head of how something should work, and the code itself is a snapshot of that, and the models currently struggle with this big picture view, and that becomes apparent as they make changes. Entirely willing to believe that Just Add Moar Parameters could fix that (but also entirely willing to believe that there's some kind of current technical dead-end there)
The interactions and results are roughly in line with what I'd expect from a junior intern. E.g. don't expect miracles, the answers will sometimes be wrong, the solutions will be naive, and you have to describe what you need done in detail.
The great thing about Claude code is that (as opposed to most other tools) you can start it in a large code base and it will be able to find its way, without me manually "attaching files to context". This is very important, and overlooked in competing solutions.
I tried using aider and plandex, and none of them worked as well. After lots of fiddling I could get mediocre results. Claude Code just works, I can start it up and start DOING THINGS.
It does best with simple repetitive tasks: add another command line option similar to others, add an API interface to functions similar to other examples, etc.
In other words, I'd give it a serious thumbs up: I'd rather work with this than a junior intern, and I have hope for improvement in models in the future.
If you don't like what it suggests, undo the changes, tweak your prompt and start over. Don't chat with it to fix problems. It gets confused.
Example:
I'm wrapping up, right now, an updated fork of the PHP extension `phpredis`. Redis 8 was recently released with support for a new data type, Vector Set, but the phpredis extension (which is far more performant than non-extension Redis libraries for PHP) doesn't support the new vector-related commands. I forked the extension repo, which is in C (I'm a PHP developer; I had to install CLion for the first time just to work along with CC), and fired up Claude Code with the initial prompt/task of analyzing the extension's code and documenting - in a CLAUDE.md file - the purpose, conventions, and anything that it (Claude) felt would benefit the bootstrapping of future sessions, so that whole files wouldn't need to be read.
This initially, depending on the size of the codebase, could be "expensive". Being that this is merely a PHP extension and isn't a huge codebase, I was fine letting it just rip through the whole thing however it saw fit - were this a larger codebase I'd take a more measured approach to this initial "indexing" of the codebase.
This results in a file that Claude uses like we do a README.
Next I end this session, start a new one, and tell it to review that CLAUDE.md file (I specifically tell it to do this at every new session start moving forward) and then generate a general overview/plan of what needs to be done in order to implement the new Vector Set related commands so that I can use this custom phpredis extension in my PHP environments. I indicated that I wanted to generate a suite of tests focused on ensuring each command works with all of its various required and optional parameters, and that I wanted to use Docker containers for the testing rather than mess up my local dev environment.
$22 in API costs and ~6 hours spent and I have the extension, working, in my local environment with support for all of the commands I want/need to use. (there's still 5 commands that I don't intend to use that I haven't implemented)
Not only would I have certainly never embarked upon trying to extend a C PHP extension, I wouldn't have done so over the course of an evening and morning.
Another example:
Before this Redis vector sets thing, I used CC to build a Python image and text embedding pipeline backed by Redis streams and Celery. It consumes tasks pushed to the stream by my Laravel application, which currently manages ~120 million unique strings and ~65 million unique images that I've been generating embeddings for. Prior to this I'd spent very little time with Python and zero with anything related to ML. Now I have a performant, portable Python service that I run from my MacBook (M2 Pro) or various GPU-having Windows machines in my home; they generate the embeddings on an "as available" basis, pushing the results back to a Redis stream that my Laravel app then consumes and processes.
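For a rough idea of the shape of such a pipeline (not the author's actual code - a minimal sketch assuming redis-py, Celery and sentence-transformers, with hypothetical stream names and fields):

    # Worker-side sketch: compute an embedding for a piece of text and push the
    # result to a Redis stream that another app (e.g. Laravel) consumes.
    import json
    import redis
    from celery import Celery
    from sentence_transformers import SentenceTransformer

    app = Celery("embeddings", broker="redis://localhost:6379/0")
    r = redis.Redis(host="localhost", port=6379, db=0)
    model = SentenceTransformer("all-MiniLM-L6-v2")  # any text-embedding model

    @app.task
    def embed_text(task_id: str, text: str) -> None:
        vector = model.encode(text).tolist()
        r.xadd("embeddings:results", {
            "task_id": task_id,
            "vector": json.dumps(vector),
        })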
The results of these embeddings and the similarity-related features that they've brought to the Laravel application are honestly staggering. And while I'm sure I could have spent months stumbling through all of this on my own - I wouldn't have, I don't have that much time for side project curiosities.
Somewhat related - these similarity features have directly resulted in this side project becoming a service people now pay me to use.
On a day-to-day basis, effectiveness is a learned skill. You really need to learn how to work with it, in the same way that you, as a layperson, wouldn't stroll up to a highly specialized piece of aviation technology and just infer how to use it optimally. I hate to keep parroting "skill issue", but it's just wild to me how effective these tools are and how many people don't seem to be able to find any use for them.
If it's burning through cash, you're not being focused enough with it. If it's writing code that's always slightly wrong, stop it and make adjustments. Those adjustments likely/potentially need to be documented in something like I described above in a long-running document used similarly to a prompt.
From my own experience, I watch the "/settings/logs" route on Anthropic's website while CC is working, once I know that we're getting rather heavy with the context. Once it gets into the 50-60,000 token range I either aim to wrap up whatever the current task is, or I understand that things are going to start getting a little wonky in the 80k+ range. It'll keep on working up into 120-140k tokens or more - but you're likely going to end up with lots of "dumb" stuff happening. You really don't want to be here unless you're _sooooo close_ to getting done what you're trying to do. When the context gets too high and you need/want to reset but you're mid-task: /compact [add notes here about next steps] and it'll generate a summary that will then be used to bootstrap the next session. (Don't do this more than once, really, as it starts losing a lot of context - just reset the session fully after the first /compact.)
If you're constantly running into huge contexts, you're not being focused enough. If you can't even work on anything without reading files with thousands of lines, either break up those files somehow or you're going to have to be _really_ specific with the initial prompt and context - which I've done lots of. Say I have a model that belongs to a 10+ year old project and is 6000 lines long, and I want to work on a specific method in that model: I'll just tell Claude in the initial message/prompt which line that method starts on, which line it ends on, and how many lines from the start of the model it should read (so it can get the namespace, class name, properties, etc.), and then let it do its thing. I'll tell it specifically not to read more than 50 lines of that file at a time when looking for something or reviewing something, or even to stop and ask me to locate a method/usages of things, etc., rather than reading whole files into context.
So, again, if it's burning through money - focus your efforts. If you think you can just fire it up and give it a generic task - you're going to burn money and get either complete junk, or something that might technically work but is hideous, at least to you. But, if you're disciplined and try to set or create boundaries and systems that it can adhere to - it does, for the most part.
When coding with Claude I cherry pick code context, examples etc to provide for tasks so I'm curious to hear what other's workflows are like and what benefits you feel you get using Claude Code or the more expensive plans?
I also haven't run into limits for quite some time now.
Do people really get that much value from these tools?
I use Github's Copilot for $10 and I'm somewhat happy for what I get... but paying 10x or 20x that just seems insane.
In the end, I was able to rescue the code part, rebuilding a 3-month-long, 10-person project in 2 weeks, with another 2 weeks to implement a follow-up series of requirements. The sheer amount of discussion and code creation would have been impossible without AI, and I used the full limits I was afforded.
So to answer your question, I got my money's worth in that specific use case. That said, the previous failing effort also unearthed a ton of unspoken assumptions that I was able to leverage. Without providing those assumptions to the AI, I couldn't have produced the app they wanted. Extracting that information was like extracting teeth, so I'm not sure we would have really had a better situation if we had started off with everyone having an OpenAI Pro account.
* Those who work in enterprise know intuitively what happened next.
If you cost 20K a month at a 5% average margin, the required break-even for a $200 cost increase is a 20% increase in productivity, not 1%.
And it gets worse, as this assumes that the increased "productivity" is converted 100% back into extra margin, which is not obvious at all.
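Spelled out, the break-even arithmetic from the comment above looks like this (a sketch using the commenter's numbers, with the 1:1 productivity-to-margin assumption noted):

    # At a 5% margin, a $200/month tool needs a ~20% productivity lift to pay for itself,
    # assuming every bit of extra productivity converts straight into extra margin.
    monthly_cost_of_dev = 20_000   # fully loaded monthly cost (from the comment)
    margin_rate = 0.05             # 5% average margin
    tool_cost = 200                # extra monthly spend on the tool

    monthly_margin = monthly_cost_of_dev * margin_rate   # $1,000 of margin per dev
    required_lift = tool_cost / monthly_margin            # fraction of that margin needed
    print(f"Required productivity lift to break even: {required_lift:.0%}")  # -> 20%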
Also the world is much bigger than the US.
Tons of software developer jobs in the US for non-FAANG tier or unicorn startup companies are >$100k and easily hit $120-150k.
Also the fourth quintile mean was like $120k in the US in 2022. So you'd be in the top 30% of earners making that kind of money, not the top 10%.
https://taxpolicycenter.org/statistics/household-income-quin...
So still way below $240k, no?
> So you'd be in the top 30% of earners making that kind of money, not the top 10%.
Maybe you missed it but I actually wrote "10-20%".
Also in 2024 earning $100k puts you in the top 20% of the US population.
https://dqydj.com/salary-percentile-calculator/
(which is already way above even the EU for dev salaries)
Also, I noticed where our sources diverged. I was looking at household income. My bad.
> which is already way above even the EU for dev salaries
Maybe they're underpaid.
Either way, I was responding to the idea that only a FAANG salary would cost an employer $20k/mo. For US software developer jobs, it can easily hit that without being in FAANG-tier or unicorn startup level companies. Tons of mid-sized low-key software companies you've never heard of pay $120k+ for software devs in the US.
The median software developer in Texas makes >$130k/yr. Think that's all just Facebook and Apple and Silicon Valley VC-funded startup software devs? Similar story in Ohio - is that a place loaded with unicorn software startups? Those median salaries in those markets probably cost their employer around $20k/mo.
https://www.ziprecruiter.com/Salaries/Senior-Software-Engine...
https://www.ziprecruiter.com/Salaries/Senior-Software-Engine...
Median salary for a Japanese dev is ~$60k. Same range for Europe (Switzerland at ~$100k, Italy at ~$30k at the extremes). Then you go down.
Russia ~$37,000, Brazil ~$31,500, Nigeria ~$6,000, Morocco ~$11,800, Indonesia ~$13,500, and India ~$30k USD.
(I asked ChatGPT for the numbers further down; the JP and EU numbers are mostly correct though, as I have first-hand experience.)
It would be interesting to know where ChatGPT sourced those figures, as some of them look very sketchy.
I imagine a lot of people saw $20k/mo and thought the salary clearly had to be $200k+.
Then one day I got nagged to upgrade or wait a few hours. I was pretty annoyed; I didn't regard my usage as high, and it felt like a squeeze.
I cancelled my Pro plan and am now happily using Gemini, which costs nothing. These AI companies are still finding their feet commercially!
…and you think this is going to last? :-)
What worked for me was coming up with an extremely opinionated way to develop an application and then generating instructions (mini milestones) by combining it with the requirements.
These instructions end up being very explicit in the sequence of things it should do (write the tests first), how the code should be written and where to place it etc. So the output ended up being very similar regardless of the coding agent being used.
In the codebase I've tried modularity via monorepo, or faux microservices with local apis, monoliths filled with hooks and all the other centralized tricks in the book. Down to the very very simple. Whatever I could do to bring down the context window needed.
Eventually... your returns diminish, and any time you saved is gone.
And by the time you've burned up a context window and you're ready to get out, you're expecting it to output a concise artifact to carry you to the next chat, so you don't have to spend more context getting that thread up to speed.
Inevitably the context window and the LLM's eagerness to touch shit that it's not supposed to (the likelihood of which increases with context) always get in the way.
Anything with any kind of complexity ends up in a game of too much bloat, or the LLM removing pieces that kill other pieces it wasn't aware of.
/VENT
Using Gemini 2.5 for generating instructions
This is the guide I use
https://github.com/bluedevilx/ai-driven-development/blob/mai...
Also, the 'reputation grind' some of these systems set up, where you have to climb 'usage tiers' before being 'allowed' to use more? Just let me pay and use. I can't compare your system to my current provider without weeks of being throttled at unusable rates? This makes potentially switching to you way harder than it should be for serious users. Is that really the outcome you want? And no, I am not willing to 'talk to sales' to run a quick feasibility eval.
[1] https://www.youtube.com/live/khr-cIc7zjc?si=oI9Fj33JBeDlQEYG
It would be cheaper for your company to literally pay your salary while you do nothing.
A year has 2,000 working hours, which is 24,000 five-minute intervals. At roughly $10 per interval, that means the company is spending at least $240,000 a year on the Claude API (conservatively). So they would be better off paying you $100-200k to do nothing and hiring someone competent for that $240k.
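The arithmetic behind that, spelled out (the ~$10-per-interval rate is the one implied upthread, treated here as an assumption):

    # 2,000 working hours a year = 24,000 five-minute intervals; at ~$10 of API spend
    # per interval, that's $240,000 a year.
    working_hours_per_year = 2_000
    intervals = working_hours_per_year * 12   # five-minute intervals -> 24,000
    cost_per_interval = 10                    # USD, assumed from the parent comment
    annual_api_spend = intervals * cost_per_interval
    print(f"{intervals:,} intervals -> ${annual_api_spend:,} per year")  # -> $240,000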
It's flat if you graph your spend over multiple months :)
Whether it turns out to be cheaper depends on your usage.
I thought Claude Code was absurdly expensive and not at all more capable than something like ChatGPT combined with Copilot.
It still doubles down on non-working solutions.
If so, just get yourself an Israeli virtual mobile number (which can receive SMS).
ps - catchup for social zoom beers?
I pinged what I think is the right ghuntley on LinkedIn; rizzler looks like the next feature I'm building for brokk :)