A fun trend on the "small web" is the use of 88x31 badges that link to friends' websites or to webrings. I have a few on my website, and you can browse a ton of small web websites that way.
A beautiful trend that has been going for 30 years ;-)
One of the happiest moments of my childhood (I'm exaggerating) was when my button was placed on that website that I loved to visit every day. It was one of the best validations I ever received :)
One objection I have to the kagi smallweb approach is the avoidance of infrequently updated sites. Some of my favorite blogs post very rarely; but when they post it's a great read. When I discover a great new blog that hasn't been updated in years I'm excited to add it to my feed reader, because it's a really good signal that when they publish again it will be worth reading.
I'm with you. Also, sometimes I'm specifically looking for some dusty old site that has long been forgotten about. Maybe I'm trying to find something I remember from ages ago. Or maybe I'm trying to deeply research something.
There's a lot more to fixing search than prioritizing recency. In fact, I think recency bias sometimes makes search worse.
Cool to see Gemini mentioned here. A few years back I created Station, Gemini's first "social network" of sorts, still running today: https://martinrue.com/station
This is a specific definition of "small web" which is even narrower than the one I normally think of. But reading about Gemini, it does make me wonder if the original sin is client-side dynamism.
We could say: that's JavaScript. But some JavaScript operates only on the DOM. It's really XHR/fetch and friends that are the problem.
We could say: CSS is ok. But CSS can fetch remote resources and if JS isn't there, I wonder how long it would take for ad vendors to have CSS-only solutions...or maybe they do already?
Kagi Small Web has about 32K sites and I'd like to think that we have captured most of the (English-language) personal blogs out there (we are adding about 10 per day, and a significant effort went into discovering/finding them).
It is kind of sad that the entire size of this small web is only 30k sites these days.
I think the article briefly touches on an important part: people still write blogs, but they are buried by Google that now optimizes their algorithm for monetization and not usefulness.
Anyone interested in seeing what the web looks like when a search engine selects for real people and not SEO-optimized slop should check out https://marginalia-search.com .
It's a search engine with the goal of finding exactly that - blogs, writings, all by real people. I am always fascinated by what it unearths when using it, and it really is a breath of fresh air.
It's currently funded by NLNet (temporarily) and the project's scope is really promising. It's one of those projects that I really hope succeeds long term.
The old web is not dead, just buried, and it can be unearthed. In my opinion an independent non monetized search engine is a public good as valuable as the internet archive.
So far as I know marginalia is the only project that instead of just taking google's index and massaging it a bit (like all the other search engines) is truly seeking to be independent and practical in its scope and goals.
Regarding the financials, even though the second nlnet grant runs out in a few weeks, I've got enough of a war chest to work full time probably a good bit into 2029 (modulo additional inflation shocks). The operational bit is self-funding now, and it's relatively low maintenance, so if worse comes to worst I'll have to get a job (if jobs still exist in 2029, otherwise I guess I'll live in the shameful cardboard box of those who were NGMI ;-).
Whether the results are less relevant or not depends massively on what you searched and whether the best results even exist in the Marginalia search index or not.
If Google is ranking small web results better than Marginalia, that’s actionable.
If the best result isn’t in the index and it should be, that’s actionable.
Well, to be fair, Marginalia is also developed by 1 guy (me), and Google has like 10K people and infinite compute they can throw at the problem. There have been definite improvements, and there will be more still, but Google's still got hands.
> Google that now optimizes their algorithm for monetization and not usefulness.
I don't think they do that. Instead, "usefulness" is mostly synonymous with commercial intent: searching for <x> often means "I want to buy <x>".
Even for non-commercial queries, I think the sad reality is that most people subconsciously prefer LLM-generated or content-farmed stuff too. It looks more professional, has nice images (never mind that they're stock photos or AI-generated), etc. Your average student looking for an explanation of why the sky is blue is more interested in a TikTok-style short than some white-on-black or black-on-gray webpage that gives them 1990s vibes.
TL;DR: I think that Google gives the average person exactly the results they want. It might not be what a small minority on HN wants.
Google and most search engines optimize for what is most likely to be clicked on. This works poorly and creates a huge popularity bias at scale because it starts feeding on its own tail: What major search engines show you is after all a large contributor to what's most likely to be clicked on.
The reason Marginalia (for some queries) feels like it shows such refreshing results is that it simply does not take popularity into account.
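The feedback loop described above can be made concrete with a toy model. This is only a sketch under made-up assumptions: the page names, the starting click counts, and the fixed number of clicks per result position are all hypothetical, and no real search engine is claimed to work this way.

```python
def simulate_click_feedback(initial_clicks, rounds=50, position_clicks=(50, 30, 20)):
    """Rank pages by accumulated clicks, then hand out new clicks by position.

    Hypothetical model: each round, the page in position i receives
    position_clicks[i] new clicks, no matter how good the page is.
    """
    clicks = dict(initial_clicks)
    for _ in range(rounds):
        # Most-clicked first; ties are broken by name to keep the run deterministic.
        ranking = sorted(clicks, key=lambda page: (-clicks[page], page))
        for position, page in enumerate(ranking):
            if position < len(position_clicks):
                clicks[page] += position_clicks[position]
    return clicks


# Three hypothetical pages; one starts with a single extra click.
result = simulate_click_feedback(
    {"big-site": 11, "small-blog-a": 10, "small-blog-b": 10}
)
for page, total in sorted(result.items(), key=lambda item: -item[1]):
    print(page, total)
```

In this model a one-click head start keeps the first page on top in every round, because the ranking that hands out the clicks is itself computed from the clicks. A ranker that ignores popularity never enters that loop.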
> I think that Google gives the average person exactly the results they want.
There is some truth in this, but to me it's similar to saying that a drug dealer gives their customers exactly what they want. People "want" those things because Google and its ilk have conditioned them to want those things.
It's easy to hand-curate a list of 5,000 "small web" URLs. The problem is scaling. For example, Kagi has a hand-curated "small web" filter, but I never use it because far more interesting and relevant "small web" websites are outside the filter than in it. The same is true for most other lists curated by individual folks. They're neat, but also sort of useless because they are too small: 95% of the things you're looking for are not there.
The question is how do you take it to a million? There probably are at least that many good personal and non-commercial websites out there, but if you open it up, you invite spam & slop.
I mainly use Kagi Small Web as a starting point for my day, with my morning coffee. Especially now that categories have been added, I always find something worth reading. The size here is not a problem, as I usually browse 20-30 sites this way.
Right, but that basically works as a retro alternative to scrolling through social media. If you're looking for something specific, it's simultaneously true that there's a small web page that answers your question and that it's not on any "small web" list because the owner of the webpage never submitted it there, or didn't meet the criteria for inclusion.
For example, I have several non-commercial, personal websites that I think anyone would agree are "small web", but each of them fails the Kagi inclusion criteria for a different reason. One is not a blog, another is a blog but with the wrong cadence of posts, etc.
mm, yeah. I like the idea of the small web not as a size category but as a mindset. people publishing for the sake of sharing rather than optimizing for attention or monetization.
The fediverse is also generally experienced as a small web when it comes to mindset, though that is not always to the liking of those expecting to find alternatives to big church social media platforms.
If it takes off in any amount, then LLMs will just subscribe and pull said data from sites at a reasonable pace (or not, it's free so make many accounts).
I moved my site to Gemini on sdf.org, I find it far easier to use and maintain. I also mirror it on gopher. Maintaining both is still easier than dealing with *panels or hosting my own. There is a lot of good content out there, for example:
gemini://gemi.dev/
FWIW, dillo now has plugins for both Gemini and Gopher and the plugins work fine on the various BSDs.
Small Web, Indie Web and Gemini are terminally missing the point. The web in the 90s was an ecosystem that attracted people because of experimentation with the medium, diversity of content and certain free-spirited social defaults. It also attracted attention because it was a new, exciting and rapidly expanding phenomenon. To create something equivalent right now you would need to capture those properties, rather than try to revive old visual styles and technology.
For a while I hoped that VR would become the new World Wide Web, but it was successfully torpedoed by the Metaverse initiative.
There's an element of nostalgia, certainly, but it's also a reaction to the overwhelmingly commercial web. Why not build something instead of scrolling through brief videos interspersed with more and more ads that follow you everywhere?
Large companies have helped build the web but they've done at least as much, if not more, to help kill it.
It's about capturing the noncommerciality, not the experimentation. Most of the small web sites are just blogs, a solved problem by now, but there's interesting content in many of them.
I'm a dinosaur who bemoans the loss of whatever-it-was we had prior to the mass exploitation and saturation of the web today, so I feel it's my duty to check out Gemini and stop complaining. I'm prepared to trade ease of use or some modern functionality for better content and less of what the internet has become.
Not quite. I think Gemini has deliberately gone for a "text only" philosophy, which I think is very constraining.
The early web had a lot going on and allowed for a lot of creative experimentation which really caught the eye and the imagination.
Gemini seems designed to only allow long-form text content. You can't even have a table, let alone inline images, which makes it very limited even for dry scientific research papers, which I think would otherwise be an excellent use case for Gemini. But it seems that this sort of thing is a deliberate design/philosophical decision by the authors, which is a shame. They could have supported full Markdown, but they chose not to (ostensibly to ease client implementation, but there are a squillion Markdown libraries so that assertion doesn't hold water for me).
It's their protocol so they can do what they want with it, but it's why I think Gemini as a protocol is a dead end unless all you want to do is write essays (with no images or tables or inline links or table of contents or MathML or SVG diagrams or anything else you can think of in Markdown). It's a shame, as I think the client-cert stuff for auth is interesting.
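For anyone who hasn't seen the format being discussed: gemtext is line-oriented, so each line's type is decided by its first few characters, and a link has to sit on a line of its own. Below is a rough sketch of a classifier for those line types. It is not a complete or authoritative implementation of the spec, and the function name and tuple layout are made up for illustration.

```python
FENCE = "`" * 3  # the preformatted-toggle marker, built this way to keep this example readable


def parse_gemtext(text):
    """Classify each line of a gemtext document by its leading characters."""
    parsed = []
    preformatted = False
    for line in text.splitlines():
        if line.startswith(FENCE):
            # Toggle lines switch preformatted mode and are not content themselves.
            preformatted = not preformatted
            continue
        if preformatted:
            parsed.append(("pre", line))
        elif line.startswith("=>"):
            parts = line[2:].split(maxsplit=1)
            if not parts:
                # "=>" with no URL: treated as plain text in this sketch.
                parsed.append(("text", line))
                continue
            url = parts[0]
            label = parts[1] if len(parts) > 1 else url
            parsed.append(("link", url, label))
        elif line.startswith("#"):
            # Heading levels 1 to 3; extra "#" characters stay in the heading text.
            level = min(len(line) - len(line.lstrip("#")), 3)
            parsed.append((f"h{level}", line[level:].strip()))
        elif line.startswith("* "):
            parsed.append(("list", line[2:]))
        elif line.startswith(">"):
            parsed.append(("quote", line[1:].strip()))
        else:
            parsed.append(("text", line))
    return parsed


sample = "# Notes\nPlain text cannot contain a link.\n=> gemini://gemi.dev/ A capsule"
for item in parse_gemtext(sample):
    print(item)
```

There is no line type here for tables, images, or links inside a sentence, which is the limitation described above.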
https://varun.ch (at the bottom of the page)
There are also a couple of directories/network graphs: https://matdoes.dev/buttons https://eightyeightthirty.one/
I would expect a raw link in the top bar to the page shown, to be able to bookmark it etc.
On the other hand, we could probably convince Cory Doctorow to write a piece about how fentanyl is really about the enshittification of opiates.