Show HN: Find HN threads about the page you're browsing

(github.com)

260 points | by achairapart 1365 days ago

21 comments

  • joshstrange 1365 days ago
    Love the idea. I have thought about this before, and the main issue is training myself to click an extension to check. If an icon flashed when there was an HN submission (ideally only if I didn't come directly from HN, since I already know) it would be way more useful.

    That said, we all know the big issue with that is privacy. I don't want an extension sending every URL I visit to any service (directly to the API or through some third party). I've mulled over this issue before, and I'm not sure how much space it would take to store a list of URLs that have been submitted to HN (maybe keep 1 month's worth plus all submissions that got over 100 upvotes or something) and check against that local list.

    Then, and only then, when you get a match you can call out to the Algolia API to get the HN url (or store that as well depending on size).

    I have no idea, off the top of my head, what the storage requirements for this look like, but I don't think they would be huge. The other issue (I want to look into the source to see how this extension handles it) is the stupid social/ads tracking params that get added to URLs. Maybe there is a good list of these that you can strip (from both the current URL and the HN submission) so you can check whether it's the same base URL.

    • oefrha 1365 days ago
      Every day there are at most around a hundred articles that generate any meaningful discussion at all (see past). Let's say each article takes up <=200 bytes (URL + title + some stats, a very generous limit actually); then one year's worth of data is at most ~7 MB. That might be a bit too much, but not by much. If you gate submissions by a votes/comments threshold as a function of time, it's conceivable to store metadata for all the good discussions on HN within 20 MB or even 10 MB.

      Chrome and Firefox extensions can request the unlimitedStorage permission, btw. (Chrome has a 5MB default limit, Firefox doesn’t seem to have one.)
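A quick sanity check of the estimate above (the 100-articles-per-day and 200-bytes-per-article figures are the commenter's own assumptions):

```javascript
// Back-of-the-envelope check of the storage estimate above.
const articlesPerDay = 100;   // generous bound on stories with meaningful discussion
const bytesPerArticle = 200;  // URL + title + some stats
const oneYearBytes = articlesPerDay * 365 * bytesPerArticle;
console.log((oneYearBytes / 1e6).toFixed(1) + " MB"); // prints "7.3 MB"
```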

    • relate 1365 days ago
      You could allow for errors and use a bloom filter to avoid the space issue:

      https://en.wikipedia.org/wiki/Bloom_filter
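A minimal sketch of the idea in JavaScript (the bit-array size, hash count, and seeded-FNV scheme here are illustrative assumptions, not tuned values):

```javascript
// Minimal Bloom filter sketch: k seeded hash functions over one bit array.
// In a real extension, size `bits` and `hashes` from the expected URL count
// and the target false-positive rate.
class BloomFilter {
  constructor(bits = 1 << 20, hashes = 4) {
    this.bits = bits;
    this.hashes = hashes;
    this.array = new Uint8Array(Math.ceil(bits / 8));
  }
  // FNV-1a with a per-hash seed, reduced modulo the bit-array size.
  _hash(str, seed) {
    let h = 0x811c9dc5 ^ seed;
    for (let i = 0; i < str.length; i++) {
      h ^= str.charCodeAt(i);
      h = Math.imul(h, 0x01000193);
    }
    return (h >>> 0) % this.bits;
  }
  add(url) {
    for (let s = 0; s < this.hashes; s++) {
      const bit = this._hash(url, s);
      this.array[bit >> 3] |= 1 << (bit & 7);
    }
  }
  mightContain(url) {
    for (let s = 0; s < this.hashes; s++) {
      const bit = this._hash(url, s);
      if (!(this.array[bit >> 3] & (1 << (bit & 7)))) return false; // definitely absent
    }
    return true; // possibly present; confirm via the API
  }
}
```

A 2^20-bit filter like this is only 128 KB. A negative answer is definite, so no request ever leaves the browser; a positive answer can then be confirmed against the Algolia API.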

      • StavrosK 1364 days ago
        Or you could use a privacy-preserving lookup API, but that might be too much traffic. A Bloom filter could be downloaded locally and is probably a better solution.
        • waterhouse 1364 days ago
          Bloom filter as first pass and then privacy-preserving lookup API if it returns "probably a match"?
          • StavrosK 1364 days ago
            If it returns "probably a match", you can just look up with HN directly, like it does now.
    • Willamin 1364 days ago
      > I'm not sure how much space it would take up to store a list of urls that have been submitted to HN

      I was recently wondering how much space this would take up myself. After a lot of searching, I found this reddit post, which links to an archive of Hacker News. It contains HN data from late 2006 until mid 2018 and totals just over 2 gigabytes. These dumps contain all comments, job postings, polls, poll options, and stories.

      I did some super quick analysis of the 2018-05 archive (the latest provided by this source). I found that there were 237,646 total items, and only 32,473 of those are stories. That's only ~14%. Assuming the ratio of stories to non-stories has been constant for the entire dataset, that's only 280 megabytes for the entire 2006 to 2018 set.

      That data can be shrunk further by removing extraneous information from each story. Mirroring the HN API, it has the following pieces of data for each item: author username, id, date retrieved, score, time posted, title, type, url, whether it's dead, how many descendants it has, and which items are its kids. I didn't attempt to reduce the data to only contain links, but I imagine it would significantly reduce the size.

      Once you've reduced the data down to a list of urls, I imagine it can be reduced even more by removing duplicate links.

      Depending on the average size of the urls, it's not unreasonable to think that taking a hash of each of the urls would result in a smaller set of data.

      On top of that, there's wonderful text compression, but I don't have the numbers on how much that would reduce the size of data.

      • tleb_ 1364 days ago
        I was curious, so I downloaded a list of id-url pairs from here [0]. It's CSV-formatted and contains 1_960_207 entries (last updated 22 Feb 2019). It is 134MiB uncompressed and 35MiB compressed with xz, so definitely storable in a web extension.

        IDs being integers smaller than 10_000_000, they can be stored in 3 bytes, and a 64-bit hash function is enough (using this approximation [1] with k=2_000_000 and N=2^64 gives p=1.08e-7), which comes to 22MB for 2 million entries. Stats on duplicates would be needed to know the impact of bundling identical hashes together. Definitely doable!

        Keeping the list up to date would be harder; having a server query the API and distribute the day-by-day data to every extension user is probably the best option.

        [0]: https://console.cloud.google.com/marketplace/product/y-combi... [1]: https://preshing.com/20110504/hash-collision-probabilities/

      • Ruthalas 1364 days ago
        If you can find that thread, I'd love to mirror that content!
    • Jarwain 1365 days ago
      Comparing hashes would help a bit on both anonymity and size concerns.

      I also think, in a majority of cases, one could remove all of the query parameters from a URL and still have the same page. I'm not 100% confident about this though

      • joshstrange 1365 days ago
        Yeah, I thought about that but there are still a number of blogs that use something like

            https://mysite.com/post.php?id=123
        
        But it would work for a lot of sites to strip the query.
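A hedged sketch of the stripping approach using the standard URL API (the particular parameter names stripped here are a guess at common trackers, not an authoritative list):

```javascript
// Strip known tracking parameters but keep the rest, since some sites
// (e.g. post.php?id=123) need their query string to identify the page.
const TRACKING_PARAMS = ['fbclid', 'gclid'];

function normalizeUrl(href) {
  const url = new URL(href);
  for (const name of [...url.searchParams.keys()]) {
    if (name.startsWith('utm_') || TRACKING_PARAMS.includes(name)) {
      url.searchParams.delete(name);
    }
  }
  url.hash = ''; // fragments never change the fetched page
  return url.toString();
}
```

Both the current URL and the HN submission's URL would be run through the same normalization before comparing.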
    • hunter2_ 1364 days ago
      > params that are added to URLs. Maybe there is a good list of these that you can remove

      Although not every page offers it, for those that do, comparing the canonical link [0] should be pretty robust.

      [0] https://en.wikipedia.org/wiki/Canonical_link_element
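In a content script this is a one-liner against the live DOM (`document.querySelector('link[rel="canonical"]')`). As a self-contained illustration, a naive string-based extractor might look like this (the regex is a sketch: it assumes `rel` appears before `href` and won't handle every real-world attribute layout):

```javascript
// Sketch: prefer the page's declared canonical URL, if any, when comparing
// against HN submissions.
function canonicalFromHtml(html) {
  const match = html.match(/<link[^>]*rel=["']canonical["'][^>]*href=["']([^"']+)["']/i);
  return match ? match[1] : null;
}
```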

      • joshstrange 1364 days ago
        That's a really good idea! Thanks for mentioning that, it's something I've used before (coded support for) but it completely slipped my mind when thinking about this.
  • coffeemug 1364 days ago
    This is awesome. I really love this idea, and love the implementation. Works great and I think will be really useful or at least interesting (can't tell yet).

    One piece of feedback-- I'd love a mode where the extension notifies me there are HN threads that pass a certain threshold (e.g. number of upvotes, number of comments, etc.) for every page I visit. This is less privacy preserving, but I'd be willing to make that trade-off in exchange for useful information being surfaced to me opportunistically.

    Thanks for making it!!

  • adenozine 1364 days ago
    I'd like something like this in reverse.

    Downvote if you will, all the same, I find most HN discussions to be of relatively low-value, and also it's not easy to vet whether or not someone's credentials align with what they're writing. I come across interesting links on HN all the time, and I wish I could have something to tell me "Oh, there's a LtU user with hundreds of posts discussing this with links to papers and proofs."

    My personal perspective is just that way, I don't see myself coming across anything and thinking "Gee, I wonder what HN thinks about this."

    I love the idea though. I wish browsers didn't suck so much. I wish Opera had won more, and maybe we'd have lots of different browsers, infinitely configurable like emacs/vim, with my whole little customized universal browsing tool. Extensions are an adequate compromise; it's just that the kind of person who thinks this stuff up could do so much MORE if browsers weren't so limited.

    • mindfulhack 1364 days ago
      Just to offer my perspective, and thank you for yours - I don't feel like HN is a hive mind with a single-voiced consensus, nor that I need to check whether 'HN has endorsed this project or the idea proposed by this article' as the use case for this extension. Instead, I instantly recognise the value in having a quick link to further information and analysis about a given page, if it exists, because HN comment sections can very often be a wealth of information in their own right.

      Just installed the add-on, let's see how useful it becomes. :)

    • summitsummit 1364 days ago
      I've been thinking of the same. I spend a lot of my time in forums and would like to be able to discriminate users based on certain parameters.

      I envision it as some sort of extension that analyzes the users in the current discussion thread, visits their profiles, analyzes them (post history, stats, etc.), and decorates each user's handle on my current page.

      Performance shouldn't be too bad using caching and prioritization.

      What value adds were you envisioning specifically?

      • adenozine 1362 days ago
        Sorry, I didn't see this the other day, I'm not really a power-user.

        I guess if I could have any features I want, it'd be this:

        * Topics, perhaps categorized by keywords.

        Maybe, if someone uses the word "TensorFlow", I'd like to know whether they have other posts containing that word that have scored well.

        * Similarly, I'd like to know which topics a user is more likely to post in. If a user only ever posts in threads that contain the word "SomeSmallStartupTheyAreClearlyShillingFor", I'd like to know that; I find that happens in practice. Algorithmically, I think it's not too hard to separate these two categories and distinguish high-quality users, because of the way posts are scored here on HN.

        Really though, I have been thinking about building something like this, maybe a service reading thousands of RSS feeds and keeping track of comment sections in blogs and forum threads, and just compiling these webs of influence for certain links. Like a search engine, except it'd be specialized for discovering high-quality conversations.

    • localcdn 1364 days ago
      >there's a LtU user

      What is LtU?

  • yalooze 1364 days ago
    I use Kiwi Conversations [0] for this, which lets you check HN, Reddit, and others if you want.

    The design of What Hacker News Says is really nice though.

    [0] https://chrome.google.com/webstore/detail/kiwi-conversations...

  • llimos 1364 days ago
    I made a bookmarklet for Firefox that opens the HN discussion in a new tab (if there is one) and offers to submit if there isn't. It's very quick and dirty but it does the trick. It opens the first result from the search API, can be modified to open all of them if you want.

      javascript:(()=>{const w=window.open();fetch(`https://hn.algolia.com/api/v1/search?tags=story&query=${encodeURIComponent(window.location.href)}`).then(a => a.json()).then(a=>{const c=a.hits.filter(b=>b.url===window.location.href)[0];if(c){w.location.replace(`https://news.ycombinator.com/item?id=${c.objectID}`)}else{w.confirm('Not on HN. Submit?') ? w.location.replace(`https://news.ycombinator.com/submitlink?u=${encodeURIComponent(document.location)}&t=${encodeURIComponent(document.title)}`):w.close();}})})()
    
    Interestingly, I had to open the tab before getting the search results; it seems there is an exemption to the popup blocker for bookmarklets, but only for synchronous opens.

    Edit: It seems the backticks mess something up in HN formatting. Code here: https://gist.github.com/llimos/ee818bcb3060adc8469f4978c654a...

  • kioleanu 1365 days ago
    Haha, I had the exact same idea a few weeks ago: https://github.com/viorelsfetea/commenter

    Your design is much nicer though

    • nishparadox 1364 days ago
      Hey. I've been using Commenter for a month now. Thanks for such a handy extension. I use it on a daily basis... Appreciate it.
  • KenanSulayman 1364 days ago
    This is great!

    I'm also using [0] which displays mentions of a site on reddit.

    (And while you're at it, one [1] that replaces youtube comments with reddits comments from the subreddit threads where a video was posted to.)

    [0] https://chrome.google.com/webstore/detail/reddit-check/mllce...

    [1] https://chrome.google.com/webstore/detail/karamel-view-reddi...

  • maxbaines 1365 days ago
    I also think it would be great to see an indicator of whether or not there are hits. Perhaps not a flash, as that's pretty invasive, but a hit count, kind of like the SMS or email badge counts on icons.

    The privacy thing also makes me flinch. One idea could be to disable lookups unless clicked: when I find an interesting page, product, or application, I often wonder whether it's been featured on HN.

  • alexpi 1365 days ago
    Simple and great! I consume a lot of content from HN and often bookmark posted links. As everyone here knows, HN comments sometimes contribute more to the topic than the article itself (so I add them to favorites). Now both of them are linked. Thanks
  • ekzy 1364 days ago
    Your extension looks good! I made a similar extension with ClojureScript 4 years ago, using the Algolia API too. It's not intrusive and only looks up when you click. Check out the code here: https://github.com/jazzytomato/hnlookup

    https://chrome.google.com/webstore/detail/hacker-news-lookup...

  • XCSme 1365 days ago
    That's a really cool idea! I added it. I am mostly curious to learn more about sites using HN as a way to market their products or to know more about the context in which a product is discussed.
  • eatonphil 1364 days ago
    One weird thing to me about this extension and Kiwi Conversations is that they only search submissions, not comments.
  • arendtio 1364 days ago
    'Looped in' is a very similar extension and a few years older:

    https://news.ycombinator.com/item?id=16316374

    https://github.com/jdormit/looped-in

  • SilasX 1365 days ago
    Love it! One suggestion, at the risk of promoting feature creep/visual bloat: maybe go into those threads and pull the top comments (ideally, the top comments across all discussions), and have those pop up as the first thing I see in the drop-down, instead of just links to the discussions?
  • commonturtle 1362 days ago
    Nice, I've wanted something like this for a while. HN often has substantive comments on writing on the internet, so I often find myself checking whether something interesting I read has been submitted to HN before.
  • ikedaosushi 1363 days ago
    I'm using a similar extension which allows you to see comments and threads: https://github.com/doublemarket/hnpopup
  • _5p0g 1364 days ago
    To me, HN Algolia has been a one-stop shop for everything search-related on HN. I always have a browser tab with HN Algolia open for any kind of research. I'd love it if this extension could be extended to include HN Algolia too.
  • ivan_ah 1364 days ago
    Here is a similar web extension for reddit discussions https://thredd.io/
  • lukeplato 1364 days ago
    off-topic: even if it's not necessary for a blog, someone needs to hook Paul Graham up with an SSL certificate.
  • swyx 1364 days ago
    relatedly - a browser extension that shows you twitter convos about the page you are browsing https://github.com/round/Twitter-Links-beta
  • zingermc 1365 days ago
    I feel compelled to point out that this extension sends the URLs of all open tabs to algolia.com when you click the extension (at least on Chrome).

    I would much prefer if it only looked up the current tab.

    A more private design might fetch the top N results from algolia.com and only search through them locally.

    That being said, this is cool! Thanks for sharing.
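The "fetch top N, match locally" design might be sketched like this (the matching logic is illustrative; the cached list is assumed to come from a periodic query such as the Algolia HN API's `front_page` tag):

```javascript
// Sketch of the privacy-friendlier design: fetch a list of popular stories
// once, then compare the current tab's URL against that cached list locally,
// so no per-page URL ever leaves the browser.
function findDiscussions(hits, currentUrl) {
  return hits
    .filter(hit => hit.url === currentUrl)
    .map(hit => `https://news.ycombinator.com/item?id=${hit.objectID}`);
}

// The cached `hits` would come from something like:
//   fetch('https://hn.algolia.com/api/v1/search?tags=front_page')
//     .then(r => r.json()).then(data => data.hits)
```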

    • achairapart 1364 days ago
      >I feel compelled to point out that this extension sends the URLs of all open tabs to algolia.com when you click the extension (at least on Chrome).

      Wait, how's that possible? The extension doesn't even have permission to get urls from tabs that are not the active one...

      • zingermc 1364 days ago
        Your comment made me dig in a little more. I was wrong, it is only fetching the current tab, although it wouldn't need more permissions to see all the tabs.

        In popup.js[1]:

            chrome.tabs.query({active:true,currentWindow:true}, function(tabs){ ... })
        
        These `active` and `currentWindow` parameters to query() [2] restrict the results to the current tab. If I remove those parameters and run in DevTools, I seem to get a full tab listing.

        [1]: https://github.com/pinoceniccola/what-hn-says-webext/blob/ma...

        [2]: https://developer.chrome.com/extensions/tabs#method-query

        • achairapart 1364 days ago
          Even without the `active` and `currentWindow` parameters, the extension cannot get URLs and titles from tabs other than the active one, because it only has the `activeTab`[1] permission declared in the manifest. You need a more powerful permission for that.

          I think with the `activeTab` permission you still get an object for every tab other than the active one, but without access to the `url`, `title`, and `favIconUrl` properties.

          Thanks for checking it out anyway. I built this tool precisely because all of the others already available were a privacy nightmare.

          [1]: https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/Web...
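For reference, the permission in question is a one-line manifest entry. A minimal, illustrative fragment (the broader `tabs` permission, by contrast, exposes `url`, `title`, and `favIconUrl` for every open tab):

```json
{
  "permissions": ["activeTab"]
}
```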

    • jasonjayr 1365 days ago
      Maybe a Bloom filter would help, if we could get a dump of the URLs from HN?