At the end of the day, all I ever seem to use is the chat completion API with structured outputs turned on. Despite my "basic" usage, I am employing tool use, recursive conversations, RAG, etc. I don't see the value in outsourcing state management of my "agent" to a 3rd party. I have way more autonomy if I keep things like this local.
The entire premise of these products is that you are feeding a string literal into some black box and it gives you a new string. Hopefully, as JSON or whatever you requested. If you focus just on the idea of composing the appropriate string each time, everything else melts away. This is the only grain that really matters. Think about other ways in which we compose highly-structured strings based upon business state stored in a database. It's literally the exact same thing you do when you SSR a webpage with PHP. The only real difference is how it is served.
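To make that framing concrete, here's a minimal sketch of composing the prompt string from business state, exactly the way you'd render a server-side template. The customer/orders data and template are hypothetical stand-ins for real DB rows:

```python
# Sketch: composing a prompt from business state, the same way you'd SSR a page.
# The "orders" data and the template below are hypothetical stand-ins.

PROMPT_TEMPLATE = """You are a support assistant for ACME Corp.

Customer: {name}
Open orders:
{orders}

Answer the customer's question using only the data above.
Question: {question}"""

def compose_prompt(name: str, orders: list[dict], question: str) -> str:
    """Render business state into the string we send to the model."""
    order_lines = "\n".join(f"- #{o['id']}: {o['status']}" for o in orders)
    return PROMPT_TEMPLATE.format(name=name, orders=order_lines, question=question)

prompt = compose_prompt(
    name="Ada",
    orders=[{"id": 1001, "status": "shipped"}, {"id": 1002, "status": "pending"}],
    question="Where is my order?",
)
# `prompt` is the string literal you feed to the chat completion call;
# the response (ideally JSON) is the "page" that comes back.
```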
I haven't really found any agent framework that gives me anything I need above a simple structured gen call.
As you say, most requests to LLMs are (should be?) prompt-in structure-out, in line with the Unix philosophy of doing precisely one thing well.
Agent frameworks are simply too early. They are layers built to abstract a set of design patterns that are not common. We should only build abstractions when it is obvious that everyone is reinventing the wheel.
In the case of agents, there is no wheel to invent. It's all simple language model calls.
I commonly use the phrase "the language model should be the most boring part of your code". You should be spending most of your time building the actual software and tooling -- LLMs are a small component of your software. Agent frameworks often make the language model too large a character in your codebase, at least for my tastes.
huh? sample code please? this should not be true since Structured Outputs came out - literally prevented from generating invalid json
You have to set 'strict' to True manually to use the same grammar-based sampling they use for structured outputs.
https://platform.openai.com/docs/guides/function-calling?api...
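For reference, a sketch of what that looks like in the Chat Completions tool format. Setting `"strict": True` opts the tool's arguments into the same grammar-constrained sampling used for structured outputs; the weather schema is just an illustration:

```python
# A function tool declared with "strict": True so argument generation is
# constrained to this JSON Schema. The get_weather schema is illustrative.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location.",
        "strict": True,
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
            },
            "required": ["location"],
            "additionalProperties": False,
        },
    },
}]
# Pass tools=tools to client.chat.completions.create(...); with strict mode
# on, the model cannot emit arguments that fail this schema.
```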
If you built on the Assistant API, maybe take the hint and don't just rewrite to the Responses API? Own your product, black box the LLM-of-the-day.
Is it actually the case that OpenAI couldn't be viable if all they offered was a simple chat completion API on top of the web experience?
It seems to me the devil is all in how the margin plays out. I'd focus on driving down costs and pushing boundaries on foundation models. If you are always a half step ahead, highly reliable and reasonably cheap, your competitors will have a tough time. Valuations can be justified if businesses begin to trust the roadmap and stability of the services.
I'll tell you what's not working right now is the insane model naming scheme and rapid fire vision changes. This kind of stuff is spooking the technology leaders of large prospective customers. Only the most permanently online people can keep things straight. Everyone was super excited and on board with AI in 2024 because who wants to be left out. I think that energy is still justified in many ways, but we've also got to find a way to meet more of the customer base where they are currently at. Wrappers and agentic SDKs are not what these people are looking for. Many F500s already have a gigantic development team who can deal with deep, nasty API integrations and related state contraptions. They're looking for assurances/evidence that OAI's business & product line will remain stable for the next 5+ years before going all-in.
> When using Chat Completions, the model always retrieves information from the web before responding to your query. To use web_search_preview as a tool that models like gpt-4o and gpt-4o-mini invoke only when necessary, switch to using the Responses API.
Porting over to the new Responses API is non-trivial, and we already have history, RAG and other things an assistant needs already.
i can understand them trying to prevent their business from becoming a commodity, but i don't see that working out for them beyond some short-term buzz; others will run with their ideas in domain-specific applications
Don’t be fooled by moving state management to somewhere other than your business logic unless it enables a novel use case (which these SDKs do not)
With that said, glad to see the agentic endpoints available but still going to be managing my state this way
But you can't expect them not to try.
So - are people forming relationships with OAI which include an SLA, and if so - what do those look like?
Sounds exactly like “the cloud”, especially AWS. Basically “get married to our platform, build on top of it, and make it hard to leave.” The benefits are that it’s easy to get started, and that they invested in the infrastructure. But now they are trying to lock you in by storing as much state and data as possible with them, without an easy way to migrate. So, increase your switching costs. For social networks the benefit was the network effect, but that doesn’t apply here.
Yeah, they keep pushing higher-level services, but the uptake of these is extremely limited. If you used something like SageMaker, which has an extremely high lock-in factor, it's probably because you're an old-school company that doesn't know what it's doing and AWS held your developers' hands to get the Hello World-level app working, but at least you got your name printed in their case study materials at the end of the project.
I think OpenAI looks at AWS and thinks they can do better. And for their investors, they must do better. But in the end I think the commoditization of LLMs is already almost complete, and this is just a futile attempt to fight it.
Here's the alternative link for people who aren't signed in to Twitter: https://nitter.net/athyuttamre/status/1899541471532867821
The current AI agent approach appears to be permutations of the joke about how people will use AI to expand their one sentence into a long, nice e-mail, and the AI on the receiving end will summarize that long e-mail back into a single sentence.
I get that there's a use case for automating tasks on legacy systems, but IMHO the real opportunity is removing most of the legacy systems.
Humans are not that bad, you know? Is creating UIs for humans using AI, and then making AI use those UIs to do stuff, really the way forward?
How many hand-crafted clay bowls, baked in a human-powered kiln, are you using every day? Or how many woven baskets, made out of hand-picked sticks?
History has shown that anything that can be automated will be automated, and everything that can be made "cheaper" or "faster" will be as well.
Why would you want your swipes on Tinder and your trip planning to Rio to be automated through a human interface? If it were for legit reasons, it would have happened as machine-to-machine communication. I'm a big fan of the AI agent concept; my objection is that in its current state people don't think outside the box, and propose using the current infrastructure to delegate human functions instead of re-imagining the new world that is possible when working together with AI.
Ah, my bad, I misread your initial post.
If I now understand what you're saying, I think there's a parallel in manufacturing, where "custom-made bots" on an assembly line will win against "humanoid bots" every time. The problem is that you have to build the custom-made bots first, and they only work on that one task, while a "humanoid" bot can, in theory, do more general things with tools already in place for humans.
I think specialised APIs and stuff will eventually be built for AI agents. But in the meantime everyone wants to be first to market, and the "human facing" UI/UX is all we have. So they're trying to make it work with what's available.
They just need to go a few steps back and evaluate why this system was needed in the first place. An awful lot of software, and all kinds of interfaces, exist only to accommodate the humans who need to be in the loop when working with machines; they are not actually needed if you take the human out of the loop. You can take humans out of the loop for legit or nefarious reasons, and when it's legit there's usually an opportunity to coordinate with the other machines to remove the people-specific parts and make things more efficient.
In programming this is even more evident; i.e., 100% of programming libraries exist only to make developers' work easier or to prevent re-inventing the wheel.
The part about making the developer's life easier is quite substantial, and it can be removed by having the AI write the exact code needed to accomplish the task, without bothering with human-developer accommodations like libraries that separate the code into modules for maintainability.
For example, as a supplemental user experience that power users in your org can leverage to macro out client configuration and project management tasks in a B2B SaaS ecosystem. Tool use can be very reliable when you have a well constrained set of abstractions, contexts and users to work with.
but yes, it's the strongest anti-developer move to not directly support MCP. not surprised given OpenAI generally. but would be a very nice addition!
I wish they'd done a smaller launch of it and gathered feedback, rather than announcing a supposed new standard which feels a lot like a wrapper.
This here is atrocious https://github.com/modelcontextprotocol/quickstart-resources... It includes this mcp PyPI package which pulls in a bunch of other PyPI dependencies. And for some reason they say "we recommend uv". How is that related to just setting up a tool for an AI to use?
Compare that to this get weather example: https://api-docs.deepseek.com/guides/function_calling/
It makes me not want to use Claude/Anthropic.
> [Q] Does the Agents SDK support MCP connections? So can we easily give certain agents tools via MCP client server connections?
> [A] You're able to define any tools you want, so you could implement MCP tools via function calling
in short, we need to do some plumbing work.
relevant issue in the repo: https://github.com/openai/openai-agents-python/issues/23
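Until that lands, the plumbing is mostly mechanical: translate each tool an MCP server advertises into the function-tool format and route calls back by name. A rough sketch (the `mcp_tools` dicts are simplified stand-ins for a real MCP `list_tools` result):

```python
# Sketch: adapt MCP-style tool listings to Chat Completions function tools.
# The mcp_tools list below is a simplified stand-in for a real MCP server's
# tool listing.

def mcp_tool_to_function_tool(mcp_tool: dict) -> dict:
    """Map one MCP tool description onto the function-calling schema."""
    return {
        "type": "function",
        "function": {
            "name": mcp_tool["name"],
            "description": mcp_tool.get("description", ""),
            "parameters": mcp_tool.get(
                "inputSchema", {"type": "object", "properties": {}}
            ),
        },
    }

mcp_tools = [
    {
        "name": "query_permissions",
        "description": "Check a user's permissions.",
        "inputSchema": {
            "type": "object",
            "properties": {"email": {"type": "string"}},
            "required": ["email"],
        },
    },
]
tools = [mcp_tool_to_function_tool(t) for t in mcp_tools]
# When the model emits a tool call, dispatch it back to the MCP client by name.
```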
Querying schema from prompt is great, but also being able to say "I cannot see the Create Project button on the projects list screen. Use MCP to see if user with email me@domain.com has the appropriate permissions" is just amazing.
This SDK is trying to provide a bunch of code for implementing specific agent codebases. There are a bunch of open source ones already, so this is OpenAI throwing their hat in the ring.
IMO this OpenAI release is kind of ecosystem-hostile in that they are directly competing with their users, in the same way that the GPT apps were.
It does not specify how “agentic” systems interact with each other. Depending on what you mean there.
https://github.com/slavakurilyak/awesome-ai-agents
CrewAI is a popular VC-backed one, but two that I think are kind of interesting in the open source space are:
https://github.com/i-am-bee/beeai-framework
https://github.com/lastmile-ai/mcp-agent
... However I think the vast majority of "AI Agent" use-cases in practice right now are actually just workflows, and imo dify is great for those:
https://github.com/langgenius/dify
[edit] worth mentioning [langfuse](https://github.com/langfuse/langfuse), which is more like a platform that addresses the observability/evals/prompt management piece of the puzzle as opposed to a full-on "agent framework". In practice I have not yet run into a case where I needed something like what OpenAI just released, nor crewAI etc (despite it feeling like those cases may be coming)
https://latent.space/p/openai-agents-platform
main fun part - since responses are stored for free by default now, how can we abuse the Responses API as a database :)
other fun qtns that a HN crew might enjoy:
- hparams for websearch - depth/breadth of search for making your own DIY Deep Research
- now that OAI is offering RAG/reranking out of the box as part of the Responses API, when should you build your own RAG? (i basically think somebody needs to benchmark the RAG capabilities of the Files API now, because the community impression has not really updated from back when Assistants API was first launched)
- whats the diff between Agents SDK and OAI Swarm? (basically types, tracing, pluggable LLMs)
- will the `search-preview` and `computer-use-preview` finetunes be merged into GPT5?
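On the Responses-API-as-a-database idea: since responses are stored by default, the response ID effectively becomes a primary key you can chain. A sketch of the wire shapes involved (stdlib only, nothing is actually sent; the model name, content, and ID below are made up):

```python
import json

# Writing: POST /v1/responses with store=True persists the turn server-side;
# the returned response id acts as a key.
write_payload = {
    "model": "gpt-4o-mini",
    "input": "remember: deploy freeze starts Friday",
    "store": True,
}

# Reading: GET /v1/responses/{id} fetches a stored turn back, and chaining
# via previous_response_id threads state without resending the history.
# The id below is made up.
read_next_payload = {
    "model": "gpt-4o-mini",
    "input": "when does the deploy freeze start?",
    "previous_response_id": "resp_hypothetical123",
}

body = json.dumps(write_payload)
```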
> Notably, our SDK is compatible with any model providers that support the OpenAI Chat Completions API format.
so you can use with everything, not only OpenAI?
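Yes: the Chat Completions format is just an HTTP convention, so anything that speaks it can sit behind the SDK. A minimal sketch of the wire format (the base URL and model name are placeholders; only those, plus the API key, change between providers):

```python
import json

# The Chat Completions request is a plain JSON POST; swapping providers
# means swapping the base URL and key, not the payload shape.
# "example-model" and the URL below are placeholders.
base_url = "https://api.example-provider.com/v1"
payload = {
    "model": "example-model",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hi."},
    ],
}
body = json.dumps(payload)
# POST {base_url}/chat/completions with an Authorization: Bearer <key> header;
# every compatible provider returns choices[0].message.content.
```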
I’ve noticed that with longer responses (particularly involving latex), models are a lot less accurate when the results need to be additionally encoded into JSON.
I like structured, but my preference is yaml/markdown, as it is a lot more readable (and the only thing that works with longer responses, latex or code generation).
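One way to keep that readable and still machine-parseable is to have the model fence the structured part and parse only that. A small stdlib-only sketch (the fake response is made up, and the flat `key: value` handling is just for illustration; a real implementation would use a YAML parser like PyYAML):

```python
import re

FENCE = "`" * 3  # a markdown code fence

def extract_fenced_block(text: str, lang: str = "yaml") -> str:
    """Pull the first fenced code block of the given language out of a response."""
    match = re.search(FENCE + lang + r"\n(.*?)" + FENCE, text, re.DOTALL)
    if not match:
        raise ValueError("no fenced block found")
    return match.group(1)

# A fake model response: prose surrounding a fenced YAML payload.
response = (
    "Here is the summary you asked for:\n\n"
    + FENCE + "yaml\n"
    + "title: Quarterly report\n"
    + "status: draft\n"
    + FENCE
    + "\n\nLet me know if you need changes."
)

block = extract_fenced_block(response)
# Flat key: value pairs only -- use a real YAML parser for anything nested.
data = dict(line.split(": ", 1) for line in block.strip().splitlines())
```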
one of the main reasons i build these ai search tools from scratch is that i can fully control the depth and breadth (and also customize loader to whatever data/sites). and currently the web search isn't very transparent on what sites they do not have full text or just use snippets.
having computer use + websearch is definitely something very powerful (openai's deep research essentially)
To ease into it, I added the entire SDK with examples and full documentation as a single text file in my repo [2], so you can quickly get up to speed by adding it to a prompt and just asking about it, or by getting some quick-start code to play around with.
The code in my repo is very modular so you can try implementing any module using one of the other frameworks to do a head-to-head.
Here’s a blog post with some more thoughts on this SDK [3] and some of its major capabilities.
I’m liking it. A lot!
[1] https://github.com/dazzaji/agento6
[2] https://raw.githubusercontent.com/dazzaji/agento6/refs/heads...
[3] https://www.dazzagreenwood.com/p/unleashing-creativity-with-...
I wonder what justifies this drastic difference in price.
The new Responses API is a step in the right direction, especially with the built-in “handoff” functionality.
For agentic use cases, the new API still feels a bit limited, as there’s a lack of formal “guardrails”/state machine logic built in.
> “Our goal is to give developers a seamless platform experience for building agents”
It will be interesting to see how they move towards this platform, my guess is that we’ll see a graph-based control flow in the coming months.
Now there are countless open-source solutions for this, but most of them fall short and/or add unnecessary obfuscation/complexity.
We’ve been able to build our agentic flows using a combination of tool calling and JSON responses, but there’s still a missing higher order component that no one seems to have cracked yet.
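For what it's worth, the "guardrails as a state machine" piece can be prototyped in very little code: let the agent loop move only along whitelisted edges, and reject anything else before a tool runs. A sketch (the states and transitions are made up for illustration):

```python
# A minimal guardrail: the agent may only transition along whitelisted edges.
# The states and transitions below are illustrative, not a real product flow.
ALLOWED_TRANSITIONS = {
    "triage":   {"research", "respond"},
    "research": {"respond"},
    "respond":  {"done"},
}

class GuardrailViolation(Exception):
    pass

class AgentStateMachine:
    def __init__(self, start: str = "triage"):
        self.state = start

    def transition(self, next_state: str) -> None:
        """Refuse any move the flow graph doesn't permit."""
        if next_state not in ALLOWED_TRANSITIONS.get(self.state, set()):
            raise GuardrailViolation(f"{self.state} -> {next_state} not allowed")
        self.state = next_state

sm = AgentStateMachine()
sm.transition("research")
sm.transition("respond")
# sm.transition("triage") would raise GuardrailViolation here.
```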
BTW I have something somewhat similar to some of this like Responses and File Search in MindRoot by using the task API: https://github.com/runvnc/mindroot/blob/main/api.md
Which could be combined with the query_kb tool from the mr_kb plugin (in my mr_kb repo) which is actually probably better than File Search because it allows searching multiple KBs.
Anyway, if anyone wants to help with my program, create a plugin on PR, or anything, feel free to connect on GitHub, email or Discord/Telegram (runvnc).
Kind of annoying that they've made a bunch of tiny changes to the history format though. It doesn't seem to change anything important, and only serves to make existing code incompatible.
I wrote a wrapper around it that works in a web browser (you'll need an OpenAI API key): https://github.com/uhsealevelcenter/IDEA
Also very nice of them to include extensible tracing. The AgentOps integration is a nice touch to getting behind the scenes to understand how handoffs and tool calls are triggered
otoh, they've dropped prices for everything else a ton previously so maybe they will for this as well
https://github.com/openai/openai-python/blob/main/examples/r...
The chunking strategy is... pretty basic, but I guess we'll see if it works well enough for enough people.
- assistant-like and thread-like objects to the responses api
- async responses
- code interpreter in responses
once we do this, we'll share a migration guide that allows you to move over without any loss of features or data. we'll also give you a full 12 months to do your migration. feel free to reach out at nikunj[at]openai.com if you have any questions about any of this, and thank you so much for building on the assistants api beta! I think you'll really like responses api too!
Web Search [0]
* $30 and $25 per 1K queries for GPT‑4o search and 4o-mini search.
File search [1]
* $2.50 per 1K queries and file storage at $0.10/GB/day
* First 1GB is free.
Computer use tool (computer-use-preview model) [2]
* $3 per 1M input tokens and $12/1M output tokens.
[0] https://platform.openai.com/docs/pricing#web-search

I have a hard time seeing how this API is better than https://www.anthropic.com/news/model-context-protocol.
It seems like the motivation was "how can we make more money", rather than "how can we be more useful for our users".
I also wrote a script that searches the web and works pretty well (using the vercel ai sdk)[1]
[0] - https://brave.com/search/api/
[1] - https://gist.github.com/bramses/41e90b27d156590154bcefd4119f...
It seems completely upside down, they always said traditional search was cheaper/less intensive, I guess a lot of tokens must go into the actual LLM searching and retrieving.
As an engineer, I have to manage the cost/service ratio manually, making sure I charge enough to handle my traffic, while enforcing/managing/policing the usage.
Additionally, there are customers who already pay for OpenAI, so the value add for them is less, since they are paying twice for the underlying capabilities.
If OpenAI had a billing API/platform à la the App Store/Play Store, I could have multiple price points matched to OpenAI usage limits (and maybe configurable profit margins).
For customers that don't have an existing relationship with me, OpenAI could support a Netflix/YouTube-style profit-sharing system, where OpenAI customers can try out and use products integrated with the billing platform/API, and my products would receive payment in accordance with customer usage...
Two, yes, many people will pay $20/mo for ChatGPT and then also pay for a product that under the hood uses OpenAI API. If you're worried about your product's value not being differentiated from ChatGPT, I'd say you have a product problem moreso than OpenAI has a billing model problem.
But they could just make great services and live in the infra layer instead of trying to squeeze everyone out at the application layer. Seems unnecessarily ecosystem-hostile
Ok I’ll start: an agent is a computer program that uses LLMs for decision making.
As an example, I can provide a system prompt that mentions a function like get_weather() being available to call. Then, I can pass whatever my user's prompt text is and the LLM will determine what code I need to call on the back-end.
So if a user types "What is the weather in Nashville?", the LLM would infer that the user is asking about weather and reply with a string like "call function get_weather with location Nashville" or, if you prompted it to, some JSON like { function_to_call: 'get_weather', location: 'Nashville' }. From there, I'd just call that function with whatever data I asked the LLM to provide.
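The dispatch side of that is only a few lines. A sketch, where `get_weather` is a stub (a real one would hit a weather API) and the JSON string stands in for what the model would return:

```python
import json

def get_weather(location: str) -> str:
    """Stub backend function; a real one would call a weather API."""
    return f"72F and sunny in {location}"

# Map the function names the model may emit to actual callables.
DISPATCH_TABLE = {"get_weather": get_weather}

def handle_model_reply(reply: str) -> str:
    """Parse the model's JSON reply and call the function it names."""
    call = json.loads(reply)
    fn = DISPATCH_TABLE[call["function_to_call"]]
    return fn(call["location"])

# Pretend the model replied with this string:
result = handle_model_reply(
    '{"function_to_call": "get_weather", "location": "Nashville"}'
)
# result == "72F and sunny in Nashville"
```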
I personally strongly prefer the term "bots" for what most of these frameworks call "agents"
- Workflows are systems where LLMs and tools are orchestrated through predefined code paths. (imo this is what most people are referring to as "agents")
- Agents, on the other hand, are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks.
https://www.anthropic.com/engineering/building-effective-age...