Show HN: Quality News – Towards a fairer ranking algorithm for Hacker News

(news.social-protocols.org)

140 points | by manx 263 days ago

19 comments

  • PaulHoule 263 days ago
    When my RSS reader shows me an arXiv paper about ML with ‘fair’ in the title I hit the reject button. What is ‘fair’ is subjective and what I want is a feed relevant to my interests (also subjective.)

    This is 2023, and text classification problems that I struggled with at a startup 5 years ago are now easy, and the power of transformer models is obscured by the ChatGPT hype. It is time that we turn our backs on the collaborative filtering algorithms that made social media a hellscape and embrace content-based filtering.

    I have a model that predicts if an article will front page or get a high ratio of comments/votes. It has a terrible ROCAUC because it is such a fuzzy problem but it is well calibrated and just today my RSS reader told me a story I thought was a nothingburger would succeed on both metrics and… It did!

    I did make an attempt to take into account the factors you’re concerned about and I was surprised that the AUC didn’t go up. Probably I did it wrong though.

    Look up my profile, I’d love to chat about it.

    • jwarden 263 days ago
      Yes, we understand taking issue with the word "fair". But we should say that we mean fair in a very specific way: our algorithm is, in some ways, fairer in the sense that it more accurately reflects the intent of the HN community as revealed by their upvote behavior. We talk about this more in the Readme: https://github.com/social-protocols/news#readme
    • mgraczyk 263 days ago
      I don't agree that "fair" is inherently subjective. There are many sensible ways to objectively define fair. For example you could say HN ranking is "fair" if the probability any viewer would upvote that article is independent of its position-history on HN. That is an objective definition that is "fair" with respect to positions.

      This and other notions of "fairness" are very common problems in ranking (I used to rank things on Instagram) that have to be addressed, even if you're only doing content based ranking.

      • JohnFen 263 days ago
        "Fair" is inherently subjective because, in a vacuum, everyone has a subjective definition of the word. It can legitimately mean many different things, after all. People will choose the definition they read it as according to their own subjective experiences.

        I reacted poorly to the use of the word "fair" here, too, because I didn't see how "fairness" really entered into it. Naturally, you can provide a specific definition of the word to make it objectively measurable, but if you use a word before you've said what your definition is, people are going to use the common definition -- and therefore, it's subjective.

        • mistermann 263 days ago
          The word "is" is also subjective in this context, since "subjective" is not a binary.
    • nico 263 days ago
      > It is time that we turn our backs on the collaborative filtering algorithms that made social media a hellscape and embrace content-based filtering.

      Except people deeply care about what other people are doing.

      That was the whole point of Google’s Pagerank algorithm.

      So, it might not be what you personally want. But to a lot of people, it’s more important to read/consume something popular (i.e. that a lot of others care about), rather than something related to their own interests.

      • jwarden 263 days ago
        Further, in a community like HN, it’s not just about collaborative filtering to find things that you as an individual will like. It is about focusing the collective attention of the community on a small set of topics to drive rich discussions.
      • ClapperHeid 263 days ago

        > So, it might not be what you personally want. But to a lot of people, it’s more important to read/consume something popular...

        Count me in the "don't care" group. Which is why I always browse HN by "New" [0]. I neither know nor care which stories have been voted to prominence on the default frontpage, nor how this ranking has been determined. I just want to see what's new and select for myself what looks interesting.

        [0] https://news.ycombinator.com/newest

        • jwarden 263 days ago
          I would suggest that one of the reasons you find the submissions on the New page valuable is that submitters are actively seeking out stories they think the HN community would upvote. So the content of the New page indirectly reflects popularity within the HN community.
          • ClapperHeid 262 days ago
            Not really. There's a hell of a lot of wading through junk, when you browse HN by "New". So much so that I maintain a special uBlock filter list to weed out a lot of crappy domains and keywords.

            That's one advantage of browsing by the default frontpage: your fellow HNers have pre-filtered what ends up there, for you. It's also the disadvantage.

      • PaulHoule 263 days ago
        A good system uses both but it's not trivial to blend them. My system is right now showing me maybe 30% of what it ingests, if I was seeing just 3% I'd have to cut back more harshly and a popularity score would help. Fundamentally a popularity score has a much larger dynamic range than a relevance score.

        Google has both a document-query relevance score plus a document quality score.
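PaulHoule doesn't say how he blends the two scores. One common way to handle the dynamic-range mismatch is to log-compress the popularity count before adding it to a relevance score. A minimal sketch, assuming a calibrated relevance probability in [0, 1] and a raw upvote count; `blended_score` and the `weight` value are hypothetical:

```python
import math


def blended_score(relevance: float, upvotes: int, weight: float = 0.1) -> float:
    """Blend a content-based relevance score with a popularity signal.

    Hypothetical scheme: relevance is a calibrated probability in [0, 1];
    popularity is log-compressed so that 1000 upvotes vs. 10 upvotes does
    not completely swamp the relevance term.
    """
    if not 0.0 <= relevance <= 1.0:
        raise ValueError("relevance must be a probability in [0, 1]")
    popularity = math.log1p(max(upvotes, 0))  # log(1 + upvotes), 0 for no votes
    return relevance + weight * popularity


# A highly relevant story with no votes vs. a lukewarm but very popular one.
print(blended_score(0.8, 0))     # 0.8
print(blended_score(0.5, 1000))  # about 1.19
```

Raising `weight` shifts more of the filtering onto the crowd, which is the knob one would presumably turn when cutting back from 30% to 3% of ingested items.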

        I've heard from a lot of people who like reading HN from a comment-centric point of view, and I tried feeding all the comments into my system and it was really too much. When I fed in high-scoring comments, however, I liked the results. I had somebody suggest comments from Metafilter and I think that could be a winner, but of course comments have a network structure of relatedness to other comments and to the submission that a comment-oriented reader could take advantage of.

    • petercooper 263 days ago
      > it is well calibrated and just today my RSS reader told me a story I thought was a nothingburger would succeed on both metrics and… It did!

      I was originally going to joke that maybe you should turn your script on to the stock market, but I'm guessing with your background you may have some experience in that regard!

    • manx 263 days ago
      Just looked up your profile. There's some super interesting stuff you worked on. We'll get in touch!
  • sokoloff 263 days ago
    There's a related question of "what's the purpose of the ranking algorithm?"

    Is it to ensure that the #1 article is strictly "better" (via whatever function) than the #2 and the #2 better than the #3?

    Or is it to ensure that at least N of the top 30 (page 1) submissions will tend to be interesting to many users on the site (driving engagement and discussion)?

    As a user, I'm a lot more interested in the second goal than I am the first goal. This change seems to serve the first goal much more than it serves the second goal. The reinforcement loop of "on the front page => gets more votes" is a property that supports the second goal more than it supports the first. Looking at the top 30 on social-protocols (this algorithm) vs the front page on HN, I saw 1 additional story on HN that would motivate me to click through (5 vs 4), so not a massive difference.

    • jwarden 263 days ago
      I would say the ranking algorithm has many purposes. Driving quality engagement and discussion, as you suggest, is probably the most important. But I think simply being “interesting enough” to many users is not the goal. I think the goal is to make the front page as interesting as possible. That’s why HN has already put so much effort into ranking algorithms and moderation, and why we think it is still worth improving if possible.

      We actually haven’t implemented a new algorithm (for reasons discussed in the readme). What you see when you click on our site is the exact same rankings, but with the upvoteRate next to each in addition to the score, which you can click on to see charts with a history of the story’s rank and upvoteRate.

  • woollyhat 263 days ago
    I think it would be interesting if users could spend their karma on performing moderator actions, perhaps with some sort of algorithmic exchange rate that converts acquired karma into modcoins.

    For example, it might cost 1000 modcoins to pin an article to the top of the page for ten minutes. Or perhaps more of a bold change: 2000 modcoins to make the text of your comment glow with a golden hue to make it more noticeable. 5000 modcoins to display an image of Paul Graham at the top of the thread, smiling beatifically at all the comments below Him. And so on.

    This would of course be of no interest to users such as myself who habitually generate throwaway accounts and discard them, but I would be curious to see how high karma users would use such a feature.

    • jwarden 262 days ago
      Yeah, we have thought of a model where Karma is a measure of how much "marginal value" you have created for other users by your submissions. If we take upvotes as a proxy for value, we can calculate how many additional site-wide upvotes were generated as a result of you submitting your high-upvote-rate story: the number of upvotes your story received minus the number of upvotes the stories that were displaced by your story being ranked above them would have received.

      Then, we could allow you to spend a part of that value. That is, you can promote a story with a lower upvote rate, thereby decreasing site-wide upvotes by displacing stories with higher upvote rates. You would only be able to spend some fraction of the value you created, so that after you have spent all your Karma, the net value you have created for other users would still be positive.
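The karma budget described above could be sketched like this. It is a hypothetical illustration, not Quality News code: it assumes a "displaced upvotes" estimate is already available for each submission, and `spend_fraction` is a made-up parameter standing in for "some fraction of the value you created".

```python
from dataclasses import dataclass


@dataclass
class Submission:
    upvotes: int            # upvotes the story actually received
    displaced_upvotes: int  # estimated upvotes the stories it pushed down would have received


def marginal_value(subs: list[Submission]) -> int:
    """Net site-wide upvotes created by a user's submissions."""
    return sum(s.upvotes - s.displaced_upvotes for s in subs)


def spendable_karma(subs: list[Submission], spend_fraction: float = 0.5) -> float:
    """Only a fraction of the created value may be spent on promotions,
    so the user's net contribution stays positive after spending it all."""
    return max(0.0, spend_fraction * marginal_value(subs))


def can_promote(subs: list[Submission], cost: float, already_spent: float = 0.0,
                spend_fraction: float = 0.5) -> bool:
    """`cost` is the site-wide upvotes lost by promoting a lower-upvote-rate story."""
    return already_spent + cost <= spendable_karma(subs, spend_fraction)


history = [Submission(upvotes=120, displaced_upvotes=80), Submission(upvotes=30, displaced_upvotes=40)]
print(marginal_value(history))   # 30
print(spendable_karma(history))  # 15.0
```

With `spend_fraction` below 1, spending the whole budget still leaves at least `(1 - spend_fraction)` of the created value as a net gain for other users.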

    • mtlmtlmtlmtl 263 days ago
      Interesting idea, but this reminds me too much of Tinder.
  • abecedarius 263 days ago
    Idea: if you know all of a user's votes, you can estimate that they at least glanced over the items up to the lowest-placed one they voted on on the same page. This is a bit more information than "users tend to read higher-placed items following a known distribution" like the formula from your readme. I guess you'd have to be HN to implement this.
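A minimal sketch of that inference, assuming per-user vote data for a single page view (which only HN has) and that users scan a page top-down. The function names are hypothetical. Users who never vote contribute nothing here, so the counts are a lower bound on impressions rather than an estimate of them.

```python
from collections import Counter


def inferred_seen_ranks(voted_ranks: list[int]) -> set[int]:
    """Ranks a user probably glanced over on one page view.

    Assumes a top-down scan: a vote at rank r implies ranks 1..r were seen.
    """
    if not voted_ranks:
        return set()
    return set(range(1, max(voted_ranks) + 1))


def impressions_by_rank(votes_per_user: dict[str, list[int]]) -> Counter:
    """Lower bound on how many users saw each rank (non-voters are invisible)."""
    counts: Counter = Counter()
    for ranks in votes_per_user.values():
        counts.update(inferred_seen_ranks(ranks))
    return counts


print(impressions_by_rank({"alice": [2], "bob": [1, 3]}))  # Counter({1: 2, 2: 2, 3: 1})
```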
    • jwarden 263 days ago
      Interesting idea. Yes that's probably true. One issue is that a story could appear on multiple pages (top, new, show, etc.), and we don't know where the upvote came from. But I think we could deal with that issue and we might be able to use that as a datapoint to refine the upvoteRate calculation, and we could experiment with adding that to our model.
      • kqr 263 days ago
        What percentage do you believe does not come from the front page? Is it big enough to actually be worried about?
        • jwarden 263 days ago
          If it's on rank 1 of the front page, then the vast majority of votes come from the front page. But if it's at rank 90 of the front page, and rank 1 of the new or the best page, then in those cases only a minority of upvotes may be coming from the front page.

          If HN implemented this, they would know where the vote is coming from. But on Quality News we could just assume there was an X% chance the vote came from the front page, a Y% chance it came from the new page, etc., and adjust our upvoteRate formula based on that.
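That adjustment amounts to attributing only the expected front-page share of each upvote. A sketch, where the page names and probabilities are hypothetical placeholders for the X% and Y% above (in practice they would presumably depend on the story's rank on each page):

```python
import math


def front_page_upvotes(upvotes: int, source_probs: dict[str, float]) -> float:
    """Expected number of upvotes that came from the front page.

    `source_probs` maps a page name to the probability that a given upvote
    came from that page; the probabilities must sum to 1.
    """
    if not math.isclose(sum(source_probs.values()), 1.0):
        raise ValueError("source probabilities must sum to 1")
    return upvotes * source_probs.get("front", 0.0)


# Hypothetical: a story at rank 90 on the front page but rank 1 on /newest.
print(front_page_upvotes(10, {"front": 0.3, "new": 0.6, "best": 0.1}))
```

The adjusted count would then replace the raw upvote count wherever the upvoteRate formula assumes front-page votes.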

  • supernova87a 263 days ago
    Are you sure / do you have info that HN doesn't use some kind of "holding pen" for stories to have a fixed amount of time to see if they get a certain % of votes before being kicked down the list?

    This is a classic problem with forums, and I wonder if HN already has something in place that you might not have factored in (which could then just be tuned better).

    • jwarden 263 days ago
      I think there may well be some sort of “holding pen” system. But we don’t have the info. We only know what the “raw” ranking formula is. But the actual rankings differ significantly from what the raw ranking formula says. You can actually see the difference in charts on our site. For example: https://news.social-protocols.org/stats?id=35183317

      We have noticed that the “raw” rank (black line) will sometimes put a story on page 1 early on, while the actual rank (orange line) does not yet show it there. But then sometimes the orange line suddenly jumps up. This seems to support the “holding pen” hypothesis.
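For reference, the "raw" formula is usually taken from the ranking function in the Arc source HN originally released, which (as commonly reported) divides (points - 1)^0.8 by (age in hours + 2)^1.8; simplified write-ups often drop the 0.8 exponent. The sketch below covers only that published part: the flags, penalties and moderator actions that make the orange line diverge from the black one are not modeled, and the function names are ours.

```python
def raw_rank_score(points: int, age_hours: float, gravity: float = 1.8) -> float:
    """'Raw' HN score as commonly reported from the released Arc source.

    Ignores flags, penalties, the second-chance queue, and other moderator
    adjustments, which is why real rankings differ from this.
    """
    base = points - 1  # discount the submitter's automatic first point
    if base > 0:
        base = base ** 0.8
    return base / (age_hours + 2) ** gravity


def raw_ranking(stories: list[tuple[str, int, float]]) -> list[str]:
    """Order (story_id, points, age_hours) tuples best-first by raw score."""
    ranked = sorted(stories, key=lambda s: raw_rank_score(s[1], s[2]), reverse=True)
    return [story_id for story_id, _, _ in ranked]


# A 10-hour-old story with 100 points vs. a 1-hour-old story with 10 points.
print(raw_ranking([("older", 100, 10.0), ("newer", 10, 1.0)]))  # ['newer', 'older']
```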

  • MilnerRoute 263 days ago
    I'd like to see these alternate algorithms implemented. The API exists - and isn't that really the only way to ultimately judge if it's better or worse?

    Another random idea: have the parameters affecting rankings be visible and adjustable with interactive sliders, so you could customize the various weights to try to attain the ideal mix of stories for you.

    Or does that defeat the purpose? Is the joy of HN in knowing that when a story reaches the front page, it's on everyone's front page...

    • jwarden 263 days ago
      One thing that prevents us from actually implementing the algorithm is that there is some "secret sauce" to HN rankings that is not publicly available. There are flags, vote ring detectors, domain penalties, the second chance queue, and other means by which HN moderators change the rank of stories. And these make a *huge* difference. Our initial implementation of an alternative ranking algorithm was not an improvement over the existing HN home page for this reason.
      • zamalek 263 days ago
        > secret sauce

        This is a feature FWIW. It prevents blatant gaming of rankings.

    • jnakayama 263 days ago
      We played around with customization options like that (URL parameters etc.), but ultimately decided against it. The reasoning behind it was that a lack of personalization might be a feature not a bug for a news aggregator like HN. One issue that arises with personalization is that it is detrimental to a sense of shared experience and we thought that the global frontpage might be a distinct reason for the sense of community on HN.

      This issue was also discussed previously on HN: - https://news.ycombinator.com/item?id=31375092

  • password4321 263 days ago
    I wonder how long this project will run. So many Hacker News interface reimplementations have gone dark over the years.

    Success to you!

    • manx 263 days ago
      Thank you! We built a lightweight page on purpose (server-side rendered Go templates), so it never involves a big hosting cost. In fact, right now, it's hosted on the fly.io free tier. But it's open source and anybody could host it if we decide to shut it down.
  • 23B1 263 days ago
    I guess I don't understand what the problem is with the current way of doing things? Like, HN is one of the few communities online now where the content and the conversation seem interesting, varied, and polite.

    What's the core problem you're trying to solve here?

    • cycomanic 263 days ago
      I can tell you one issue I observe as someone living in a different time zone: HN ranking depends heavily on timing. As a consequence, during the hours when the US is asleep, my impression (I have not investigated this thoroughly) is that the front page is dominated by older stories and new submissions don't get enough votes to reach the front page. Then, once the US wakes up, new stories get enough votes and make it to the front page. This leads to biases in the news, which I find unfortunate, because I believe everyone would benefit from news being geographically (for lack of a better word) broader.

      I certainly welcome someone playing with different algorithms to see how they affect ranking.

      • jnakayama 263 days ago
        If you're interested in investigating timing effects: We collected a dataset last year where we took a snapshot of the newest 1500 stories on HN every minute for several months which should contain the information required. Feel free to play around with it and get in touch with us if you find something interesting!

        [1] Dataset: https://osf.io/bnysw/

        [2] Exploratory analyses: https://github.com/social-protocols/hacker-news-data

    • user3939382 263 days ago
      My biggest gripe, which may not be solvable and isn’t unique to HN, is this vote/point system that ends up as “truth by consensus”. Controversial opinions, which may actually be the correct ones, are hidden and buried and even have their text slowly disappear and fade out which I think is ridiculous.

      It promotes groupthink and encourages users to just repeat mainstream opinions.

      • SllX 263 days ago
        Not as much as you may think.

        “showdead” is a feature if you have enough karma and it’s usually easy to see why something is dead.

        Controversial arguments made well, or at least to the best of your ability that fall within the guidelines can and do get upvoted. A lot of the dead comments I see are really just ad hominem attacks or near enough to it.

    • jwarden 263 days ago
      The problem certainly isn’t that HN content and conversation isn’t good. But that doesn’t mean it couldn’t be better.

      The core problem we are trying to solve is that the community sometimes misses out on the opportunity to discuss content that many people would find valuable and that would engender quality discussion.

    • chasebank 263 days ago
      The only problem I have with HN is the endless content. I wish you could say, 'Show me stories that reached the front page', then hide or watch each story as you deem. That way I could clear out the front page like I do my inbox.
      • manx 263 days ago
        Some feed readers like feedly have a fifo mode, where you, for example, can go through all the stories which appeared on hacker news best.
      • yesenadam 263 days ago
        Sounds like https://hckrnews.com/ would be perfect for you.
  • troydavis 263 days ago
    How does your model compare to using:

    (Users who upvoted a given submission / Users who saw a page that includes the submission and its vote icon)

    This would be a percent between 0 (no one who saw a page containing a given submission upvoted it) and 100 (everyone who saw it upvoted it). Receiving more impressions wouldn’t change that percentage.

    Weaknesses: It can only be calculated by HN itself. On pages that list lots of submissions (like the home page), it may need to compensate for relative position on the page. These pages may already randomize position enough for this not to be an issue, or for it to be an issue only for the first 3-5 items on the home page.
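The proposed metric, including the position compensation mentioned under "Weaknesses", could look like this. A sketch only: the impression counts would have to come from HN's own logs, and `attention_by_rank` (the fraction of page viewers assumed to actually look at a given rank) is a hypothetical input that HN would need to estimate.

```python
def upvote_probability(upvoters: int, viewers: float) -> float:
    """Share of users who saw the submission (with its vote icon) and upvoted it."""
    if viewers <= 0:
        raise ValueError("need at least one viewer")
    return upvoters / viewers


def position_adjusted_probability(upvoters: int, viewers: int, rank: int,
                                  attention_by_rank: dict[int, float]) -> float:
    """Discount impressions at low-attention ranks before computing the share."""
    effective_viewers = viewers * attention_by_rank[rank]
    return upvote_probability(upvoters, effective_viewers)


attention = {1: 1.0, 10: 0.25}  # hypothetical attention weights per rank
print(upvote_probability(5, 100))                            # 0.05
print(position_adjusted_probability(5, 100, 10, attention))  # 0.2
```

Unlike the raw percentage, the adjusted value can exceed 100% if the attention weights are too pessimistic, which would be a sign they need re-estimating.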

    • jwarden 263 days ago
      Interesting question.

      One difference is that upvoteRate formula adjusts for where the submission appears on the page (the rank). It also adjusts for how many site-wide upvotes occurred during that time period.

      You are right that we don't know the number of users who saw the page with the vote icon, so we can't calculate the probability Pr(upvote|saw submission with upvote button). But the upvoteRate formula would be proportional to this probability, times additional factors for rank and time.

      We talked about this in our original blog article here: https://felx.me/2021/08/29/improving-the-hacker-news-ranking...
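One reading of that description, as a sketch: each time interval contributes an expected number of upvotes equal to the site-wide upvotes in that interval times the share of upvotes that typically goes to the story's rank, and upvoteRate is the story's actual upvotes divided by that sum. The field names and the share table are hypothetical, and the exact formula in the README and blog post may differ in detail.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    rank: int              # story's rank during this interval (e.g. one minute)
    sitewide_upvotes: int  # total upvotes cast on the site during the interval


def expected_upvotes(samples: list[Sample], share_by_rank: dict[int, float]) -> float:
    """Upvotes an average story would have received with the same rank history."""
    return sum(s.sitewide_upvotes * share_by_rank.get(s.rank, 0.0) for s in samples)


def upvote_rate(upvotes: int, samples: list[Sample], share_by_rank: dict[int, float]) -> float:
    """Ratio of actual to expected upvotes; 1.0 means 'about average for its ranks'."""
    expected = expected_upvotes(samples, share_by_rank)
    if expected == 0:
        raise ValueError("story was never shown at a rank with a known upvote share")
    return upvotes / expected


share = {1: 0.1, 2: 0.05}  # hypothetical fraction of site-wide upvotes going to each rank
history = [Sample(rank=1, sitewide_upvotes=100), Sample(rank=2, sitewide_upvotes=200)]
print(upvote_rate(30, history, share))  # 1.5
```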

  • mtVessel 263 days ago
    This is interesting, but it would be even better if I could see it sorted by descending upvoteRate.
    • manx 263 days ago
      That's a great idea! Similar to the hacker news /best page, where stories of the past 7 days are sorted by their upvote count, we could provide a page where those stories are sorted by their upvoteRate. Should be easy to do.
  • jrussino 263 days ago
    Lots of discussion here about this approach and alternatives. Not sure how feasible this is but I think it would be even cooler to turn this into a site where users can define their own custom ranking algorithm and/or select from a set of available algorithms (including the one you're currently using). Maybe even provide a meta-ranking of the most popular ranking algorithms?
    • jwarden 262 days ago
      Yes we'd like to do that, but the main problem is that we don't have access to some data that goes into HN rankings: flags, penalties, and other moderator actions. These make a huge difference to the quality of front-page content.

      But we have thought of allowing users to provide this data on our site. They could flag posts, mark them as duplicate, mark them as being non-technical (HN mods regularly apply penalties to popular news items that are not really about hacking and startups), etc.

  • taubek 263 days ago
    Can I somehow see the historical data for some of my old submissions? It seems to me that on https://news.social-protocols.org/ I can only get data for the past 24 hours.

    I think you have a point about the feedback loop.

    • manx 263 days ago
      We just reset the history yesterday and plan to keep data for about one month for now. In the future, it should definitely be possible to retain a much longer time span.
  • akomtu 263 days ago
    HN needs to separate emotion from reason in upvotes and downvotes. I bet that many readers here confuse the upvote button with "I like it" and the downvote button with "I dislike it" while it should be about "is this comment truthful and informative, does it add something novel to the discussion?"

    HN could implement this with a cosmetic change: all upvotes and downvotes would show a form to provide an explanation. Those explanations would be reviewed, randomly, to spot emotional users and suspend their voting power for a month or two. As for those who can't be bothered with explaining their voting decisions, they shouldn't be able to influence the global ranking. Rage downvoting and hive-mind upvoting would be gone very quickly.

    • manx 263 days ago
      I think this boils down to:

      - Users upvote, because they want that story to get MORE attention, BECAUSE they agree

      - Users downvote, because they want that story to get LESS attention, BECAUSE they disagree

      So the intent is still attention control, but the reason is (dis)agreement.

      But in the case of HN, the downvote is a moderation mechanism, instead of a community poll. So this might be confusing to the user. Treating downvotes differently, based on a top-level reason (disagreement, violating ToC, false or misleading, not interesting, etc) makes a lot of sense to me.

      • JohnFen 263 days ago
        Usually (but not always), I upvote not because I agree, but because someone said something that I think was worth reading. Usually (but not always), I downvote not because I disagree, but because someone said something that I think is worth negative attention (trolling, etc.).

        But I would love a separate Like/Dislike mechanism. It's a bit painful to upvote an insightful (thus upvote-worthy) comment that expresses a view that I disagree with.

    • layer8 263 days ago
      You expect emotional up-/downvoters to be objective and truthful about their up-/downvote motivation?
      • akomtu 263 days ago
        They'll have to explain their votes, and if the mods deem the reasoning bs, they'll reverse the votes.
  • unethical_ban 263 days ago
    I would like the HN UI to have a "favorite" button that is as accessible as the "upvote" button.

    I know favorites are a feature, but they require clicking into the comments. I end up using upvote as a bookmark function, not as a method of approving of a post, because that's easier.

    As it relates to this post, the HN UI encourages the feedback loop this submission is trying to fix.

    Put a bookmark icon next to the upvote icon. Provide a unified view of upvotes+bookmarked for a user so they can see everything that got their interest.

    • jwarden 263 days ago
      Good point, I hadn't thought that some people effectively use the upvote button as a bookmark button. The unified upvotes+bookmarked view is an interesting idea.

      And your comment raises the question, what does an upvote mean? Why do people upvote? There may be lots of strange reasons. But whatever upvotes mean -- whether it means people want to bookmark it, or people find it valuable, or people want to bring something they disagree with to the attention of other people -- an upvote is a rough signal of "this should get more attention". The whole concept of a link aggregator like HN only makes sense if we assume that upvotes can be interpreted as a proxy for what people think deserves the attention of other users.

    • nick__m 263 days ago
      If something is interesting enough to be bookmarked it is surely interesting enough to be upvoted !
  • mhb 263 days ago
    Would just increasing the number of posts on the front page be an improvement?
    • manx 263 days ago
      I think so. The first page gets many more upvotes than the second page. There is a visible step in the data: https://github.com/social-protocols/news#upvote-share-by-ran...

      Once a story drops to the second page, it receives fewer upvotes and can't sustain any growth anymore. Having a longer front page (we're showing 90 ranks), smooths out that effect.

  • 4dayworkweek4u 263 days ago
    HN really needs tags per story submitted too.
  • exolymph 263 days ago
    I like https://hckrnews.com/ as an alternative front page
  • wpietri 263 days ago
    As long as we're talking about redoing this, let me suggest letting authors see the names of upvoters (and only upvoters).

    Quora had this and it did a fair bit to create positive community feelings for me. It also let people signal agreement/support without having to create a comment to do so, which I would find handy.

    • 999900000999 263 days ago
      I actually wouldn't like this; it would make me afraid to upvote unpopular opinions I agree with. As it is, Hacker News falls into the same trap as Reddit, where there's a bit of a hive-mind effect.
    • luckylion 263 days ago
      Wouldn't that lead to you seeing a pattern in who upvotes you which would make you more likely to upvote their submissions or comments, slowly guiding you towards bubble-forming?

      And if someone doesn't upvote your "let's not eat babies" comment, do you go after them for being pro-baby-eating?

      • wpietri 261 days ago
        Bubble-forming is one possible downside of community-forming, yes.
    • JohnFen 263 days ago
      If this were the system, I'd just stop voting. Which may or may not be a bad thing.