This is actually very cool. Not really replacing a browser, but it could enable an alternative way of browsing the web with a combination of deterministic search and prompts. It would probably work even better as a command line tool.
A natural next step could be doing things with multiple "tabs" at once, e.g. tab 1 contains news outlet A's coverage of a story, tab 2 has outlet B's coverage, tab 3 has Wikipedia; summarize and provide references. I guess the problem at that point is whether the underlying model can support this type of workflow, which doesn't really seem to be the case even with SOTA models.
(I'm not affiliated with them; just saw them in the sponsorship section of a Kurzgesagt video the other day and figured they're doing the thing you described +/- UI differences.)
I was thinking of showing multiple tabs/views at the same time, but only from the same source.
Maybe we could have one tab with the original content optimised for CLI viewing, and another tab just doing fact checking (could ground it with Google Search or Brave). Would be a fun experiment.
In your cleanup step, after cleaning obvious junk, I think you should do whatever Firefox's reader mode does to further clean up, and if that fails, bail out to the current output. That should reduce the number of tokens you send to the LLM even more.
You should also have some way for the LLM to indicate there is no useful output, because perhaps the page is supposed to be a SPA. This would force you to execute JavaScript to render that particular page, though.
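A minimal sketch of that bail-out logic, assuming a crude text-density heuristic standing in for Firefox's actual Readability algorithm (the `reader_extract` function and its 40-character threshold are made up for illustration):

```python
from html.parser import HTMLParser


class TextDensity(HTMLParser):
    """Collect text blocks while skipping obvious chrome tags."""
    SKIP = {"script", "style", "nav", "footer", "aside"}

    def __init__(self):
        super().__init__()
        self.blocks = []
        self.skip_depth = 0  # nesting level inside skipped tags

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.blocks.append(data.strip())


def reader_extract(html: str, min_chars: int = 40):
    """Keep only reasonably long text blocks; return None if nothing survives."""
    parser = TextDensity()
    parser.feed(html)
    kept = [b for b in parser.blocks if len(b) >= min_chars]
    return "\n\n".join(kept) if kept else None


def clean_for_llm(html: str) -> str:
    # Bail out to the current output when reader-style extraction fails.
    return reader_extract(html) or html
```

The `or html` fallback is the "bail out" part: if the aggressive pass strips everything, you send the original cleanup output instead of an empty prompt.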
Interestingly, the original idea of what we call a "browser" nowadays – the "user agent" – was built on the premise that each user has specific needs and preferences. The user agent was designed to act on their behalf, negotiating data transfers and resolving conflicts between content author and user (content consumer) preferences according to "strengths" and various reconciliation mechanisms.
(The fact that browsers nowadays are usually expected to represent something "pixel-perfect" to everyone with similar devices is utterly against the original intention.)
Yet the original idea was (due to the state of technical possibilities) primarily about design and interactivity. The fact that we now have tools to extend this concept to core language and content processing is… huge.
It seems we're approaching the moment when our individual personal agent, when asked about a new page, will tell us:
Well, there's nothing new of interest for you, frankly:
All information presented there was present on pages visited recently.
-- or --
You've already learned everything mentioned there. (*)
Here's a brief summary: …
(Do you want to dig deeper, see the content verbatim, or anything else?)
Because its "browsing history" will also contain a notion of what we "know" from chats or what we had previously marked as "known".
I can definitely see a future in which we each have our own personal memetic firewall, keeping us safe and cozy in our personal little worldview bubbles.
It would have to have a pretty good model of my brain to help me make these decisions. Just as a random example, it will have to understand that an equation is a sort of thing that I’m likely to look up even if I understand the meaning of it, just to double check and get the particulars right. That’s an obvious example, I think there must be other examples that are less obvious.
Or that I’m looking up a data point that I already actually know, just because I want to provide a citation.
Well, we should first establish some sort of contract for how to convey "I feel that I actually understand this particular piece of information, so when confronted with it in the future, you can mark it as such". My lines of thought were more about a tutorial page that presents the same techniques as a course you finished a week prior, or a news page reporting on an event you just read about on a different news site a minute before … stuff like this … so you would potentially save the time spent skimming/reading/understanding only to realise there was no added value for you in that particular moment. Or, while scrolling through a comment section, hide comment parts repeating the same remark or joke.
Or (and this is actually doable absolutely without any "AI" at all):
What the bloody hell actually newly appeared on this particular URL since my last visit?
(There is one page nearby that would be quite unusable for me, had I not a crude userscript aid for this particular purpose. But I can imagine having a digest about "What's new here?" / "Noteworthy responses?" would be way better.)
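That no-AI version really is just a diff against the last visit. A sketch with stdlib `difflib` (the cache layout and function name are invented for illustration):

```python
import difflib
import hashlib
from pathlib import Path


def whats_new(url: str, text: str, cache_dir: Path):
    """Return the lines that appeared at this URL since the last visit."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # One file per URL, keyed by a hash of the URL.
    key = cache_dir / hashlib.sha256(url.encode()).hexdigest()
    old = key.read_text().splitlines() if key.exists() else []
    key.write_text(text)
    diff = difflib.unified_diff(old, text.splitlines(), lineterm="")
    # Keep only added lines, dropping the "+++" file header.
    return [line[1:] for line in diff
            if line.startswith("+") and not line.startswith("+++")]
```

On a first visit everything is "new"; on later visits you only get the added lines, which is exactly the "What's new here?" digest, minus any summarization.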
For the "I need to cite this source" case, naturally, you would want the "verbatim" view without any amendments anyway. Also, before sharing / directing someone to the resource, looking at the "true form" would probably still be pretty necessary.
Not any more "sentient" than existing LLMs already are, even within their limited chat context span.
Naturally, »nothing new of interest for you« here is indeed just a proxy for »does not involve any significant concept that you haven't previously expressed knowledge about« (or however you put it), which seems pretty doable, provided that the contract for "expressing knowledge about something" had been made beforehand.
Let's say that you have really grokked all pages you have ever bookmarked (yes, a stretch, no "read it later" here) - then your personal model would be able to (again, figuratively) "make a qualified guess" about your knowledge. Or some kind of tag that you could add to any browsing history entry, or fragment, indicating "I understand this". Or set the agent up to quiz you when leaving a page (that would be brutal). Or … I think you get the gist by now.
Classic that the first example is for parsing the goddamn recipe from the goddamn recipe site. Instant thumbs up from me haha, looks like a neat little project.
Which it apparently does by completely changing the recipe in random places including ingredients and amounts thereof. It is _indeed_ a very good microcosm of what LLMs are, just not in the way these comments think.
It was actually a bit worse than that: the LLM never got the full recipe due to some truncation logic I had added. So it regurgitated the recipe from training, and apparently it couldn't both do that and convert units at the same time with the lite model (it worked with just flash).
I should have caught that, and there are probably other bugs too waiting to be found. That said, it's still a great recipe.
What do you mean? The recipes in the screenshot look more or less the same, the formatting has just changed in the Spiegel one (which is what was asked for, so no surprises there).
Edit: just saw the author's comment, I think I'm looking at the fixed page
The output was then posted to the Internet for everyone to see, without the minimal amount of proofreading that would be necessary to catch that, which gives us a good microcosm of how LLMs are used.
On a more pleasant topic the original recipe sounds delicious, I may give it a try when the weather cools off a little.
There are extensions that do that for you, in a deterministic way and without relying on LLMs. For example, Recipe Filter for Chrome. It just shows a pop-up over the page when it loads if it detects a recipe.
I definitely like the LLM in the middle, it’s a nice way to circumvent the SEO machine and how Google has optimized writing in recent years. Removing all the cruft from a recipe is a brilliant case for an LLM. And I suspect more of this is coming: LLMs to filter. I mean, it would be nice to just read the recipe from HTML, but SEO has turned everything into an arms race.
> Removing all the cruft from a recipe is a brilliant case for an LLM
Is it though, when the LLM might mutate the recipe unpredictably? I can't believe people trust probabilistic software for cases that cannot tolerate error.
I foresaw this a couple of years ago. We already have web search tools in LLMs, and they are amazing when they chain multiple searches. But Spegel is a completely different take.
I think the ad blocker of the future will be a local LLM, small and efficient. Want to sort your timeline chronologically? Or want a different UI? Want some things removed, and others promoted? Hide low quality comments in a thread? All are possible with LLM in the middle, in either agent or proxy mode.
I wonder if you could use a less sophisticated model (maybe even something based on LSTMs) to walk over the DOM and extract just the chunks that should be emitted and collected into the browsable data structure, but doing it all locally. I feel like it'd be straightforward to generate training data for this, using an LLM-based toolchain like what the author wrote to be used directly.
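Generating that training data could be as simple as labeling each DOM text chunk by whether it survived into the LLM's markdown output. A rough stdlib-only sketch of the labeling step (the function names and feature choice are hypothetical):

```python
from html.parser import HTMLParser


class ChunkCollector(HTMLParser):
    """Collect (enclosing_tag, text) chunks from the raw page."""

    def __init__(self):
        super().__init__()
        self.stack = ["root"]
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():
            self.chunks.append((self.stack[-1], data.strip()))


def label_chunks(html: str, llm_markdown: str):
    """Label each chunk 1 if the LLM kept its text, 0 if it dropped it.
    The resulting (tag, text, label) rows become training data for a
    small local extraction model."""
    collector = ChunkCollector()
    collector.feed(html)
    return [(tag, text, int(text in llm_markdown))
            for tag, text in collector.chunks]
```

Run that over enough (page, LLM output) pairs and you have supervised keep/drop labels without ever hand-annotating a DOM.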
Unfortunately in the modern web simply walking the DOM doesn't cut it if the website's content loads in with JS. You could only walk the DOM once the JS has loaded, and all the requests it makes have finished, and at that point you're already using a whole browser renderer anyway.
It would be cool if it were smart enough to figure out whether it was necessary to rewrite the page on every visit. There's a large chunk of the web where one of us could visit once, rewrite to markdown, and then serve the cleaned-up version to each other without requiring a distinct rebuild on each visit.
Cache headers exist for servers to communicate to clients how long it is safe to cache things for. The client could be updated to add a cache layer that respects those headers.
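A sketch of that client-side layer, honoring only `Cache-Control: max-age` (real HTTP caching also involves `ETag`, `Vary`, revalidation, etc.; the `fetch_and_render` callback is a hypothetical stand-in for the fetch-plus-LLM step):

```python
import re
import time

# url -> (expires_at_epoch_seconds, rendered_markdown)
_cache = {}


def max_age(cache_control: str) -> int:
    """Extract max-age seconds from a Cache-Control header (0 if absent)."""
    m = re.search(r"max-age=(\d+)", cache_control or "")
    return int(m.group(1)) if m else 0


def get_rendered(url: str, fetch_and_render) -> str:
    """Serve the cached LLM rendering while the server says it's fresh."""
    hit = _cache.get(url)
    if hit and hit[0] > time.time():
        return hit[1]
    # fetch_and_render returns (markdown, cache_control_header)
    markdown, cache_control = fetch_and_render(url)
    _cache[url] = (time.time() + max_age(cache_control), markdown)
    return markdown
```

With `max-age=0` or no header, every visit still triggers a rebuild, which is exactly what the header is for.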
Each user has distinct needs and distinct prior knowledge of the topic, so even the "raw" super-clean source form will probably end up adjusted differently for most users.
But yes, having some global shared redundant P2P cache (of the "raw" data), like IPFS (?), could possibly save some processing power and help with availability and data preservation.
If the goal is to have a more consistent layout on each visit, I think we could save the last page's markdown and send it to the model as a one-shot example...
The author says this is for “personalized views using your own prompts.” Though, I suppose it’s still useful to cache the outputs for the default prompt.
People here are not realizing that html is just the start. If you can render a webpage into a view, you can render any input the model accepts. PDF to this view. Zip file of images to this view. Giant json file into this view. Whatever. The view is the product here, not the html input.
Very cool! My retired AI agent transformed live webpage content, here's an old video clip of transforming HN to My Little Pony (with some annoying sounds): https://www.youtube.com/watch?v=1_j6cYeByOU. Skip to ~37 seconds for the outcome. I made an open-source standalone Chrome extension as well, it should probably still work for anyone curious: https://github.com/joshgriffith/ChromeGPT
Welcome to 2025, where it's more reasonable to filter all content through an LLM than to expect web developers to make use of the semantic web that's existed for more than a decade...
Seriously though, this looks like a novel fix for the problem that most terminal browsers face: namely, that terminals are text-based, but the web, while it contains text, is often subdivided in a way that only really makes sense graphically.
I wonder if a similar type of thing might work for screen readers or other accessibility features
Gosh. Lovely project and cool, and - likewise - a bit scary: this is where the "bubble" seals itself "from the inside" and custom (or cloud, biased) LLMs seal the "bubble" in.-
The ultimate rose (or red, or blue or black ...) coloured glasses.-
Super neat - I did something similar on a lark to enable useful "web browsing" over 1200 baud packet - I have Starlink back at my camp but might be a few miles away, so as long as I can get line of sight I can Google up stuff, albeit slow. Worked well but I never really productionalized it beyond some weekend tinkering.
The main problem with these approaches is that most sites now are useless without JS or access to the accessibility tree. Projects like browser-use or other DOM-based approaches at least see the DOM (and screenshots).
I wonder if you could turn this into a chrome extension that at least filters and parses the DOM
I actually made a CLI tool recently that uses Puppeteer to render the page including JS, summarizes key info and actions, and enables simple form filling all from a CLI menu. I built it for my own use-cases (checking and paying power bills from CLI), but I'd love to get feedback on the core concept: https://github.com/jadbox/solomonagent
Thanks, it's alpha at the moment- next feature is complex forms and bug fixing broken actions (downloading). Do give it a spin! Welcome to contribute or drop feedback on the repo :)
True for stuff requiring interaction, but to help their LCP/SEO lots of sites these days render plain html first. It's not "usable" but for viewing it's pretty good
The worst[1] part about losing my job last month was having to take LinkedIn seriously, and the best[2] part about now having found a new job is logging off LinkedIn, for a very long time hopefully. The self-aggrandising, pretentious, occasionally virtue-signalling performance-posting makes me want to throw up. It takes a considerable amount of effort on my part to not make sarcastic shitposts, but in the interest of self-preservation, I restrain myself. My header picture, however, is my extremely messy desk, full of electronics, tools, test equipment, drawings, computers and coffee cups. Because that's just how I work when I'm in the zone, and it serves as a quiet counterpoint to the polished self-promotion people do.
And I didn't even get the new job through LinkedIn, though it did yield one interview.
We’ll probably have to add some custom code to log in, get an auth token, and then browse with it. Not sure if LinkedIn would like that, but I certainly would.
I have been thinking of a project extremely similar to this for a totally different purpose. It’s lovely to see something like this. Thank you for sharing it, inspiring
The web has existed for long before javascript was around.
The web was useful for long before javascript was around.
I literally hate javascript -- not the language itself but the way it is used. It has enabled some pretty cool things, yes. But javascript is not required to make useful webpages.
I think you misunderstood him. Yes, it’s possible to CREATE a useful webpage without JavaScript, but many EXISTING webpages rely on JavaScript to be functional.
If Amazon.com can work with JavaScript disabled, any site could be rewritten to do without. But I think that to even get to the content on a lot of SPAs, this would need to run a headless browser to render the page before extracting the static content, unfortunately.
No - an experiment: try disabling JavaScript in your browser settings, and then whenever you see a webpage that isn't working, enable JavaScript for that domain. You'd be surprised how fast 90% of the web feels with JS disabled.
Changes Spegel made to the linked recipe's ingredients:
Pounds of lamb become kilograms (more than doubling the quantity of meat), a medium onion turns large, one celery stalk becomes two, six cloves of garlic turn into four, tomato paste vanishes, we lose nearly half a cup of wine, beef stock gets an extra ¾ cup, rosemary is replaced with oregano.
Great catch. I was getting ready to mention the theoretical risk of asking an LLM to be your arbiter of truth; it didn't even occur to me to check the chosen example for correctness. In a way, this blog post is a useful illustration not just of the hazards of LLMs, but also of our collective tendency to eschew verity for novelty.
> Great catch. I was getting ready to mention the theoretical risk of asking an LLM to be your arbiter of truth; it didn't even occur to me to check the chosen example for correctness.
It's beyond parody at this point. Shit just doesn't work, but this fundamental flaw of LLMs is just waved away or simply not acknowledged at all!
You have an algorithm that rewrites textA to textB (so nice), where textB potentially has no relation to textA (oh no). Were it anything else, this would mean "you don't have an algorithm to rewrite textA to textB", but for gen AI? Apparently this is not a fatal flaw; it's not even a flaw at all!
I should also note that there is no indication that this fundamental flaw can be corrected.
Fantastic catch! It led me down a rabbit hole, and I finally found the root cause.
The recipe page was so long that it got truncated before being sent to the LLM. Then, based on the first 8000 characters, Gemini hallucinated the rest of the recipe; it was definitely in its training set.
I have fixed it and pushed a new version of the project. Thanks again, it really highlights how we can never fully trust models.
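One cheap guard against that failure mode is to truncate on a block boundary and tell the model explicitly that the input was cut, rather than stopping mid-recipe and letting it fill in the tail from training data. A sketch (the marker text and helper name are hypothetical; 8000 is the limit mentioned above):

```python
MAX_CHARS = 8000  # the truncation limit from the bug report above


def truncate_for_llm(text: str, limit: int = MAX_CHARS) -> str:
    """Cut on a paragraph boundary and mark the cut, so the model
    can't silently invent the missing remainder."""
    if len(text) <= limit:
        return text
    cut = text.rfind("\n\n", 0, limit)  # last paragraph break before limit
    if cut == -1:
        cut = limit
    return text[:cut] + "\n\n[TRUNCATED: do not guess the remaining content]"
```

It doesn't make truncation safe, but it turns a silent hallucination into something the model (and the user) can at least see.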
That would violate the do-one-thing-and-do-it-well principle for no apparent benefit. There are plenty of tools to convert markdown to basic HTML already.
You could also use headless Selenium under the hood and pipe the entire DOM of the document to the model after the JavaScript has loaded. Of course it would be much slower, but it would also address the main worry people have, which is that many websites will flat out not show anything in the initial GET request.
You have a point, as it uses Gemini under the hood. However, the moment Google introduces ads into the model, users will run away. So Google really has no opportunity here to inject ads.
And wouldn't it be ironic if Gemini was used to strip ads from webpages?
I did something similar, but with a chrome extension. Basically, for every web page, I feed the HTML to a local LLM (well, on a server in my basement). I ask it to consider if the content is likely clickbait or can be summarized without losing too many interesting details, and if so, it adds a little floating icon to the top of the page that I can click on to see the summary instead.
My next plan is to rewrite hyperlinks to provide a summary of the page on hover, or possibly to rewrite the hyperlinks to be more indicative of the content at the end of it(no more complaining about the titles of HN posts...). But, my machine isn't too beefy and I'm not sure how well that will work, or how to prioritize links on the page.
Couldn't this run reasonably well on a local machine if you have some kind of neural processing chip and enough RAM? Conversion to MD shouldn't require a huge model.
Because the modern web isn't reliably HTML, it's "web apps" with heavy use of JavaScript and API calls. To first display the HTML that you see in your browser, you need a user agent that runs JavaScript and makes all the backend calls that Chrome would make to put together some HTML.
Some websites may still return some static HTML upfront that could be usefully understood without JavaScript processing, but a lot don't.
That's not to say you need an LLM, there are projects like Puppeteer that are like headless browsers that can return the rendered HTML, which can then be sent through an HTML to Markdown filter. That would be less computationally intensive.
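The HTML-to-Markdown filter part really is cheap; here is a toy stdlib converter handling just a few tags (real tools like html2text cover far more, so treat this as a shape sketch, not a drop-in):

```python
from html.parser import HTMLParser


class ToMarkdown(HTMLParser):
    """Convert a small subset of HTML tags to markdown lines."""
    PREFIX = {"h1": "# ", "h2": "## ", "h3": "### ", "li": "- "}

    def __init__(self):
        super().__init__()
        self.lines = []
        self.prefix = ""

    def handle_starttag(self, tag, attrs):
        # Remember the markdown prefix for the next text node.
        self.prefix = self.PREFIX.get(tag, "")

    def handle_data(self, data):
        if data.strip():
            self.lines.append(self.prefix + data.strip())
            self.prefix = ""


def to_markdown(html: str) -> str:
    parser = ToMarkdown()
    parser.feed(html)
    return "\n".join(parser.lines)
```

No model involved at all, which is the point: once a headless browser has produced the final HTML, the conversion itself is trivial and deterministic.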
Because this isn't just converting HTML to markdown. I'd recommend taking another look at the website and particularly read the recipe example as it demonstrates the goal of the project pretty well.
I think the project itself is really cool; that said, I really don't like the trend of having LLMs regurgitate content back to us. It kinda makes me think of Browsh, which took the opposite approach and tries to render the HTML in the terminal (without LLMs, as far as I know).
This is another layer of abstraction on top of an already broken system. You're running HTML through an LLM to get markdown that gets rendered in a terminal browser. That's like... three format conversions just to read text. The original web had simple HTML that was readable in any terminal browser already. Now pages aren't designed as documents anymore, but rather as applications that happen to deliver some content as a side effect.
That's the world we live in. You can either not have access to content or you must accept abstractions to remove all the bad decisions browser vendors have forced on us the last 30 years to support ad-browsing.
If the web site is a SPA that is hydrated using an API it would be conceivable that the LLM can build a reusable interface around the API while taking inspiration from the original page. That interface can then be stored in some cache.
I'm not saying it's necessarily a good idea but perhaps a bad/fun idea that can inspire good ideas?
Think of it as a secretary transforming and formatting information. You might wish the original medium were already in the form you want, but you don't get that, so you get a cheap, dumber secretary instead.