Ask HN: Where do you find interesting papers to read?

I'd like to start reading a lot more academic papers on NLP and LLMs, but I'm not sure where to look for interesting ones. It feels like there is an overload of them because ChatGPT makes them so easy to generate.

My main source right now is Twitter: the arXiv links retweeted most by people I follow.

My favourite ones are:

https://twitter.com/arxiv_cs_cl

https://twitter.com/papers_daily

Where do you mainly find good papers?

11 points | by Obertr 684 days ago

7 comments

  • PaulHoule 684 days ago
    I am working on a smart RSS reader that collects about 1000 articles on a good day from various sources, including CS papers from arXiv. Most days it selects about 300 articles (summary only) that I browse through a TikTok-like interface (I judge one article at a time, so I get valid negatives, unlike the typical “learning to rank” problem). I can favorite an article to retrieve it later, say I like it to see more like it in the future without saving it, or say I dislike it.

    It is powered by transformer models and sbert.net, which are used to assign articles to 20 clusters generated daily; I see the top 15 from each cluster. This does a reasonable job of handling a diverse feed that includes CS abstracts, trade publication articles, sports news, etc. My satisfaction is high on days when the system gets a lot of articles (it peaks on Thursday) but lower on weekends, when I sometimes backfill high-scoring articles from the previous week.
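
A minimal sketch of the "top 15 from each cluster" step, assuming each article already has a cluster id and a relevance score. The tuple layout, the `top_per_cluster` name, and the sample data are made up for illustration and are not the actual system. With 20 clusters and 15 picks each this yields at most 300 articles, which lines up with the daily count mentioned above.

```python
from collections import defaultdict


def top_per_cluster(articles, k=15):
    """Group (article_id, cluster_id, score) tuples by cluster, keep the k best of each."""
    by_cluster = defaultdict(list)
    for article_id, cluster_id, score in articles:
        by_cluster[cluster_id].append((score, article_id))

    selected = {}
    for cluster_id, scored in by_cluster.items():
        # Highest score first; ties keep their original order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        selected[cluster_id] = [article_id for _, article_id in scored[:k]]
    return selected


# Hypothetical data: two clusters, k=2 to keep the example small.
articles = [
    ("a1", 0, 0.91), ("a2", 0, 0.40), ("a3", 0, 0.75),
    ("b1", 1, 0.10), ("b2", 1, 0.66),
]
picked = top_per_cluster(articles, k=2)
```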

    I tried using fine-tuned BERT-like models for classification and got them to equal the performance of the embedding-based system after a huge amount of work and a much longer training time. My problem is pretty noisy and there is some limit to how high I can get the AUC.

    • extasia 684 days ago
      Are you tracking your satisfaction somewhere?

      Interested in your embedding based system - is that embedding layer + neural net?

      Sounds very cool overall:)

      • PaulHoule 684 days ago
        I’ve thought about satisfaction and mood tracking (I am sure these are linked) but haven’t built anything that I really use other than my memory.

        The embedding system uses a probability-calibrated SVM. My average AUC is 0.77; I hear TikTok gets in the low 80’s and they are using collaborative filtering. I got 0.72 with a bag-of-words and logistic regression model.
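
For anyone unfamiliar with the AUC numbers quoted here: AUC is the probability that a randomly chosen liked article gets a higher score than a randomly chosen disliked one, so 0.5 is chance and 1.0 is a perfect ranking. A rough sketch of that definition in plain Python follows; the `roc_auc` helper and the toy labels and scores are illustrative assumptions, not taken from the system described.

```python
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy judgements: 1 = liked, 0 = disliked, with made-up model scores.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

The pairwise loop is quadratic, which is fine for a few thousand judgements; a rank-based formula does the same job faster on larger sets.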

        From a product standpoint it has the disadvantage that it takes about 1000 judgements to really get good. Right now I am training over the last 40 days of data because it doesn’t really get better with more than that, which is good news because the compute and storage are nicely bounded.

        • extasia 684 days ago
          Are you looking to make this a product?

          On an unrelated note I realized recently that the 'bag' in bag-of-words is another name for the multiset data structure... Which makes sense when you think about the text as being a _set_ of tokens which can appear _multiple_ times.
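
To make the multiset point concrete, Python's `collections.Counter` behaves like a bag: it stores each token with its multiplicity and supports multiset sum and intersection. A small sketch, where the whitespace tokenizer and the `bag_of_words` name are simplifying assumptions:

```python
from collections import Counter


def bag_of_words(text):
    """Lowercase, split on whitespace, and count how often each token occurs."""
    return Counter(text.lower().split())


bag = bag_of_words("the cat sat on the mat")
other = bag_of_words("the dog sat")

combined = bag + other  # multiset sum: counts are added
common = bag & other    # multiset intersection: minimum of the counts
```

Word order is discarded, which is exactly what separates a bag-of-words model from a sequence model.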

          • PaulHoule 684 days ago
            It's got potential as a product but I haven't committed to it yet. I have two ideas in mind: (1) a consumer product which is a bit like social media without the social (basically what it is now) and (2) a "pro" product that is capable of handling a network of classifications and more complex workflows.

            I was talking to somebody about the potential as an open source project and came to the conclusion that it's a research project right now but my research projects are more solid than average. I know I'm not afraid to demo it because I run it every day and it spins like a top.

            If you want to chat about it look up my profile and send me an email.

  • bjourne 684 days ago
    Read a random paper about LLMs and look at what it cites. Read those papers and look at what they cite. And so on. You'll soon figure out what the academic community considers the seminal papers in that field.
    • menshiki 682 days ago
      This is a good approach. It is basically how all literature reviews are done in academia.
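
The citation-chasing approach above amounts to a breadth-first walk over the citation graph, counting how often each paper is cited by the papers you visit. A minimal sketch, assuming you already have reference lists in a dict; the `chase_citations` name and the paper ids are placeholders, and fetching real reference lists is left out.

```python
from collections import Counter, deque


def chase_citations(citations, start, max_depth=2):
    """Walk outward from `start` and rank papers by how many visited papers cite them."""
    cited_count = Counter()
    seen = {start}
    queue = deque([(start, 0)])

    while queue:
        paper, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand the reference list of papers this far out
        for ref in citations.get(paper, []):
            cited_count[ref] += 1
            if ref not in seen:
                seen.add(ref)
                queue.append((ref, depth + 1))

    return cited_count.most_common()


# Hypothetical graph: paper id -> ids of the papers it cites.
citations = {
    "P0": ["A", "B", "C"],
    "A": ["S", "B"],
    "B": ["S"],
    "C": ["S"],
    "S": [],
}
ranking = chase_citations(citations, "P0", max_depth=2)
```

Papers that keep turning up at the top of the ranking are the likely seminal ones.
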
  • is_true 683 days ago
    When I find a paper I'm interested in, I usually follow its citations.

    The last time I was interested in a topic (tree segmentation) I used elicit.org * and found it really nice for discovering new papers.

    * From the FAQ:

    If you ask a question, Elicit will show relevant papers and summaries of key information about those papers in an easy-to-use table.

  • mamediz 684 days ago
  • tikkun 684 days ago
    There's paperswithcode which has a ranking of sorts.
  • throwaway29303 682 days ago
  • m348e912 684 days ago
    This is assuming you have access behind the research paper paywalls. Not everyone does and sci-hub doesn't always have access to recent papers.