Ask HN: ML Papers to Implement

103 points | by Heidaradar 11 days ago


  • tdekken 11 days ago
    Great question!

    I seem to be in a similar situation as an experienced software engineer who has jumped into the deep end of ML. It seems most resources either abstract away too much detail or too little. For example, building a toy example that just calls gensim.word2vec doesn't help me transfer that knowledge to other use cases. Yet on the other extreme, most research papers are impenetrable walls of math that obscure the forest for the trees.

    Thus far, I would also recommend Andrej Karpathy's Zero to Hero course. He assumes a high level of programming knowledge but demystifies the ML side.


    P.S. If anyone is, by chance, interested in helping chip away at the literacy crisis (e.g., 40% of US 4th graders can't read even at a basic level), I would love to find a collaborator for evaluating the practical application of results from the ML fields of cognitive modeling and machine teaching. These seemingly simple ML models offer powerful insight into the neural basis for learning but are explained in the most obtuse ways.

    • ttul 11 days ago
      Jeremy Howard's FastAI's course is another great one:

      I'm enrolled in their latest course via University of Queensland; presently, they're teaching us by implementing one of the latest text-to-image papers in PyTorch. They cover the math as side lectures if you're interested in it and have the pre-requisite knowledge. But it's not necessary if what you're keen on is the programming of models.

    • Heidaradar 11 days ago
      great, thanks!

      also for your PS, can you give a little more detail? What's your end result, what have you done so far etc

      • tdekken 11 days ago
        > what have you done so far

        So far, I am a week into learning ML :). I have spent ~30 hours watching various ML courses and am in the process of testing the hypothesis that teaching reading with a shallower orthography (e.g., differentiating between the short and long 'e' sounds by introducing an 'ē' grapheme) leads to improved recognition of sublexical patterns. The step I am working on is building an embedding layer to ensure that these new graphemes (i.e., 'ē', 'ā', etc.) are near their parent grapheme (i.e., 'e', 'a') in the embedding space. (Although the model seems straightforward, I could also be completely misguided in how I am tackling this problem :) ).
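
        A minimal sketch of the "new graphemes start near their parent" idea, in plain Python rather than any particular framework. The `PARENT` mapping, the function names, and the noise scale are all assumptions made for illustration; this only covers initialisation, and nothing here keeps the vectors close once training starts (that would need something like a regularisation term).

```python
import math
import random

# Hypothetical mapping from each new grapheme to its parent grapheme.
PARENT = {"ē": "e", "ā": "a"}


def init_embeddings(base_vocab, dim, noise=0.01, seed=0):
    """Random vectors for base graphemes; children copy their parent plus small noise."""
    rng = random.Random(seed)
    table = {g: [rng.gauss(0.0, 1.0) for _ in range(dim)] for g in base_vocab}
    for child, parent in PARENT.items():
        table[child] = [x + rng.gauss(0.0, noise) for x in table[parent]]
    return table


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


emb = init_embeddings(["a", "e", "t"], dim=16)
print(cosine(emb["ē"], emb["e"]), cosine(emb["ē"], emb["t"]))
```

        Comparing the two printed similarities is a quick sanity check that 'ē' sits much closer to 'e' than to an unrelated grapheme.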

        FYI, this orthographic approach (i.e., how words are spelled using an alphabet) is used in a few highly researched literacy programs, but AFAICT there isn't direct research on the approach itself. The motivation is to initially make English a consistent language (i.e., the letters you see have a one-to-one correspondence with a particular sound). This should greatly simplify the initial roadblock in learning to read English (as seen by studies of countries with "shallow" orthographic languages) and then learners would transfer this knowledge to the normal (inconsistent) English orthography.

        • Heidaradar 11 days ago
          damn, this all sounds like very interesting, cool stuff!! I'm not sure if I'd be able to help much/have the time for it though, but best of luck!
      • tdekken 11 days ago
        I would love to!

        My main goal is to use cognitive modeling to evaluate the efficacy of interventions and inform the personalized "minimum effective dose" for a particular learner. Academically, this is well-trodden territory [0-2] but these results haven't found their way into practice. This is critically important because we know that ~30% of children will learn to read regardless of method, ~50% require explicit, systematic instruction, ~15% require prolonged explicit and systematic instruction, and up to 6% have severe cognitive impairments that make acquiring reading skills extremely difficult [3]. Yet, how much is enough?

        To make this more concrete, imagine you are learning a foreign language with Duolingo. How much effort per day is necessary to achieve that? Many people have long streaks and are no closer to fluency (I learned nearly nothing despite a 400 day streak). Similarly, many reading interventions are once-a-week and, predictably, don't meaningfully affect the learning outcomes for those students.

        BTW, this ML portion is part of a much larger effort (e.g., our team is a Phase II finalist in the Learning Engineering Tools Competition). If anyone is interested in collaborating, please feel free to reach out to me.

        [0] Phonology, reading acquisition, and dyslexia: insights from connectionist models

        [1] Modeling the successes and failures of interventions for disabled readers

        [2] Learning to Read through Machine Teaching

        [3] Education Advisory Board. (2019). Narrowing the Third-grade Reading Gap: Embracing the Science of Reading, District Leadership Forum: Research briefing

        • thundergolfer 11 days ago
          Don’t get me wrong I think your work is really cool and a worthy cause, but surely the literacy crisis is a socio-economic problem not a technological one.
          • tdekken 11 days ago
            > surely the literacy crisis is a socio-economic problem not a technological one.

            Yes and no. It is, of course, not strictly a technological one, but the argument that it is a socio-economic one is, at best, an oversimplification. If you are interested in a more complete understanding, I highly recommend checking out APM's documentaries on this issue.

            From my research, the underlying causes of the literacy crisis are:

            1. The mistaken belief that reading, like speaking, is biologically natural. This belief manifests as guidance to surround your child with books and read to them. Unfortunately, this isn't sufficient for the majority of children.

            2. The majority of teachers lack the content knowledge to teach children to read. For example, imagine helping a child to sound out the word "father". What is the sound of the second letter? It isn't a short 'a' nor a long 'a'.

            3. Many popular programs used in schools are completely debunked by science (e.g., cueing theory), but as a teacher it is difficult to identify that your approach is faulty. (If ~30% of children learn regardless of method, it is too easy to offer excuses for why the other children don't learn).

            4. Helping a struggling child is a "rich man's game". If you are high SES and your child is struggling, you will pay a tutor to rectify the problem. That isn't an option for the vast majority of families.

            In other words, this is a highly complex puzzle and it is completely understandable why society is seemingly no closer to solving it :). Consequently, the majority of our effort is directed at understanding these root causes and identifying how to overcome them (FWIW, we have made significant progress here). The cognitive modeling portion is a small but plausibly important part of the larger landscape.

            • thundergolfer 11 days ago
              Thanks for the answer. I haven't learned that much about this topic, but most of what I have learned is from reading E.D. Hirsch Jr.

              Who would you recommend I read next?

              • tdekken 11 days ago
                It depends how far down the rabbit hole you want to go :). I highly recommend checking out APM's documentaries on this issue. These are in-depth and accessible.

                If you want to go further, you can read Moats's Speech to Print, Seidenberg's Language at the Speed of Sight, and many others. If you want to go even deeper, then welcome to the firehose that is educational research :D.

  • phonebucket 11 days ago
    A lot depends on what you're interested in.

    Some papers that are runnable on a laptop CPU (so long as you stick to small image sizes/tasks):

    1) Generative Adversarial Networks. Good practice for writing custom training loops and swapping in different optimisers, networks, etc.

    2) Neural Style Transfer. Nice to be able to manipulate pretrained networks and intercept intermediate layers.

    3) Deep Image Prior. Nice low-data exercise in building out an autoencoder.

    4) Physics Informed Neural Networks. If you're interested in scientific applications, this might be fun. It's a good exercise in calculating higher-order derivatives of neural networks and using these in loss functions.

    5) Vanilla Policy Gradient is the easiest reinforcement learning algorithm to implement and can be used as a black-box optimiser in a lot of settings.

    6) Deep Q Learning is also not too hard to implement and was the first time I had heard about DeepMind, as well as being a foundational deep reinforcement learning paper.

    OpenAI Gym would help get started with the latter two.
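
    To give a feel for how small vanilla policy gradient can be, here is a rough sketch on a toy multi-armed bandit instead of a Gym environment. The function names, arm means, learning rate, and running-mean baseline are all assumptions chosen for illustration, not taken from any of the papers above.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_bandit(arm_means, steps=2000, lr=0.1, seed=0):
    """REINFORCE with a running-mean baseline on a noisy multi-armed bandit."""
    rng = random.Random(seed)
    logits = [0.0] * len(arm_means)
    baseline = 0.0
    for t in range(1, steps + 1):
        probs = softmax(logits)
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = arm_means[action] + rng.gauss(0.0, 0.1)
        advantage = reward - baseline
        baseline += (reward - baseline) / t  # running mean of observed rewards
        # For a softmax policy, d log pi(action) / d logit_i = 1[i == action] - probs[i].
        for i in range(len(logits)):
            grad_log = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad_log
    return softmax(logits)


probs = train_bandit([0.2, 0.8, 0.5])
print(probs)
```

    The policy should end up favouring the arm with the highest mean reward. Swapping the bandit for an environment loop is where most of the extra work comes from.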

    • OgAstorga 11 days ago
      thank you, this is a great compilation
    • Heidaradar 11 days ago
      this is exactly what I needed, thanks!
  • p1esk 11 days ago
    It's best to choose something you personally find interesting. For example, I'm interested in audio generation, so I'd pick some papers that describe a music/voice generation model or algorithm, but for you it might be something completely different.

    When you do decide on a paper, take a look at Phil Wang's implementation style; he has hundreds of papers implemented.

    If you don't already have a GPU machine, you can rent a 40GB A100 instance for $1.1/hr or a 24GB A10 for $0.6/hr.

    • Heidaradar 11 days ago
      it's hard for me right now to tell if something is simple or not to implement, but i totally agree! thanks for the links.
  • loveparade 10 days ago
    > ideally, a list of papers that could take 2-5 hours each and a few hundred lines of code?

    I think you are severely underestimating the time required, unless you are quite experienced, know exactly what to look for, or the paper is just a slight variation on previous work that you are already familiar with.

    Even seasoned researchers can easily spend 30+ hours on trying to reproduce a paper, because papers almost never contain all the details that went into the experiments. You are left with a lot of fiddling and iteration. Of course, if you only care about roughly reproducing what the authors did, and don't care about getting the same results, the time can be much shorter. If the code is available that's even better, but looking at it is cheating since wrestling with issues yourself is a big part of the learning process.

    A few people here mentioned Andrej's lectures, and I also think they are amazing, but they are not a replacement for getting stuck and solving problems yourself. You can easily watch these lectures and think "I get it!" because everything is so well explained, but you'll probably still be stuck when you run into your own problems trying to reproduce papers from scratch. There's no replacement for the experience you gain by struggling :)

    It's like watching a math lecture and thinking you get it, but then getting stuck at the exercise problems. The real learning happens when you force yourself to struggle through the exercises.

    • Heidaradar 10 days ago
      I completely agree (that struggling is the most important part, and also about andrej's lectures - they almost spoon feed you imo), but I also didn't know papers took so long. Any particular papers that you remember implementing yourself that you would recommend?
      • loveparade 10 days ago
        I believe which paper you implement matters less than you think. Most papers take some kind of well-known base model, make a few tweaks, and then run experiments. To implement it you need to follow the references and start with the base model, and 99% of all papers within some field will eventually lead you back to implementing the same base model, so it really doesn't matter where you start. The overlap is huge.
  • pizza 10 days ago
    Biggest productivity boosters for me:

    - for pytorch

    - conda for packaging w CUDA

    - einops

    - tensorboard

    - huggingface datasets

    Interesting models/structures:

    - resnet

    - unet

    - transformers

    - convnext

    - vision transformers

    - ddpm

    - novel optimizers

    - generative flow nets

    - “your classifier is secretly an energy-based model and you should treat it like one” paper

    - self-supervision

    - distance metric learning

    Places where you can read implementations:

    - lucidrains’ github

    - timm computer vision models library

    - fastai

    - labml (annotated quite nicely)

    Biggest foreseeable headaches:

    - not suuper easy to do test-driven development

    - data normalization (floating point error, not using eg batchnorm)

    - sensitivity of model performance to (hyper)params (layer sizes, learning rates, optimizer, etc)

    - impatience

    - lack of data

    I’d also recommend watching Mark Saroufim live code in PyTorch, on YouTube. My 2 cents, you can only get really fast as well as good at this with a lot of experience. A lot of rules-of-thumb have to come together just right for the whole system to work.

    • gschoeni 8 days ago
      Early days, but trying to solve the lack of data problem with a project called "Oxen".

      At its core it's a data version control system, built from the ground up to handle the size and scale of some of these ML datasets (unlike git-lfs or DVC, which I have found to be relatively slow and hard to work with). Also building out a web hub similar to GitHub to collaborate on the data with a nice UI.

      Would love any feedback on the project as it grows! Here's the github repo:

    • Heidaradar 10 days ago
  • badpun 11 days ago
    2-5 hours for a few hundred lines of tricky math code sounds like way too little. Not to mention, having to read and understand the paper first. Depending on the difficulty of the paper and your level of skill in the field, I'd say implementing a paper should take 20-200 hours.
    • Heidaradar 11 days ago
      Oh interesting, didn't know it'd take so long lol Any recommendations for papers you've implemented before?
      • badpun 11 days ago
        Sorry, I mostly work on (hobbyist) 3d computer vision and not ML.

        BTW time of implementation also greatly depends on what you've implemented already. Most papers are a small derivation of some preexisting idea, so if you've already implemented that idea, there isn't that much work to do on top of it - just modify your existing code. But, if you're just starting with some area, getting up to that point will take time.

        • Heidaradar 11 days ago
          Oh i see, thanks for the info.
  • carom 11 days ago
    Andrej Karpathy is currently releasing videos for a course [1] that goes from zero to GPT.


  • Mxbonn 11 days ago
    Not sure if it's beginner friendly but I found implementing NeRF from scratch a good exercise. Especially since it reveals many details that are not immediately obvious from the paper.
    • kkaranth 11 days ago
      I've been working on implementing this too! It's been fun trying to debug problems by figuring out how to reduce it to the "bare minimum" to repro. Ex: does it work if I disable positional encoding? Does it work if I have only one sample per ray on a single image dataset? Etc
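
      For anyone following along, a small sketch of the positional encoding being toggled in that kind of ablation: each coordinate is mapped to sin/cos pairs at frequencies 2^k * pi. The function names and the `enabled` switch are made up here for illustration, and real implementations differ in details (some also concatenate the raw input).

```python
import math


def positional_encoding(p, num_freqs):
    """Map a scalar p to [sin(2^k*pi*p), cos(2^k*pi*p)] for k = 0..num_freqs-1."""
    out = []
    for k in range(num_freqs):
        freq = (2.0 ** k) * math.pi
        out.append(math.sin(freq * p))
        out.append(math.cos(freq * p))
    return out


def encode_point(xyz, num_freqs, enabled=True):
    """Encode each coordinate separately; with enabled=False, pass raw coordinates through."""
    if not enabled:
        return list(xyz)
    feats = []
    for coord in xyz:
        feats.extend(positional_encoding(coord, num_freqs))
    return feats


print(len(encode_point((0.1, 0.2, 0.3), num_freqs=10)))
```

      Keeping the switch inside the encoder makes the "does it work without positional encoding?" experiment a one-argument change, as long as the network's input width is adjusted to match.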
    • jahewson 11 days ago
      Sounds neat. Tell me more.
  • polygamous_bat 11 days ago
    I would recommend diffusion: try starting with Lilian Weng's blog post and writing up the process for yourself. For all its abilities, the code for DDPM is surprisingly simple.
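
    As a rough illustration of that simplicity, here is a sketch of just the forward (noising) half of DDPM in plain Python: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps. The function names are hypothetical, and the linear schedule constants (1e-4 to 0.02 over 1000 steps) are the commonly used defaults, assumed here rather than prescribed. The denoising network and the sampling loop are left out.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_t, linearly spaced from beta_start to beta_end."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + step * t for t in range(num_steps)]


def cumulative_alphas(betas):
    """abar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0, t, alpha_bars, rng):
    """Sample x_t from x_0 in one shot; also return the noise that was used."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
xt, eps = q_sample([0.5, -0.25, 1.0], t=10, alpha_bars=alpha_bars, rng=random.Random(0))
```

    Training then amounts to having a network predict `eps` from `xt` and `t` and minimising the squared error, which is where the actual model code comes in.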
  • m-watson 11 days ago
    This is not exactly what you are looking for but you should browse Papers with Code:

  • osaariki 11 days ago
    I'd love for someone to do a good quality PyTorch enabled implementation of Sampled AlphaZero/MuZero [1]. RLLib has an AlphaZero, but it doesn't have the parallelized MCTS you really want to have and the "Sampled" part is another twist to it. It does implement a single player variant though, which I needed. This would be amazing for applying MCTS based RL to various hard combinatorial optimization problems. Case in point, AlphaTensor uses their internal implementation of Sampled AlphaZero.

    An initial implementation might be doable in 5 hours for someone competent and familiar with RLLib's APIs, but could take much longer to really polish.


    • Heidaradar 11 days ago
      I'll definitely take a look!
  • la_fayette 11 days ago
    I have implemented YOLO v1 and trained/tested it on synthetic images with geometric forms. Implementing the loss function taught me a lot about how backpropagation really works. I used keras/tf.
  • imranq 11 days ago
    Highly recommend this resource for RL

  • biotechbio 11 days ago
    Just finished assignment 2 of cs224n[1], which has you derive gradients and implement word2vec. I thought it was a pretty good exercise. You could read the GloVe paper and try implementing that as well.

    Knowing how to step through backpropagation in a neural network gets you pretty far in conceptual understanding of a lot of architectures. Imo there’s no substitute for writing out the gradients by hand to make sure you get what’s going on, if only in a toy example.
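
    A toy version of that exercise, as a sketch: the skip-gram loss with a naive (full) softmax, its hand-derived gradient with respect to the center vector, and a finite-difference check. The names and the tiny vectors are made up for illustration and are not taken from the course assignment.

```python
import math


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def loss_and_grad(center, outside_vectors, target):
    """Loss -log softmax(U v)[target] and its analytic gradient with respect to v."""
    scores = [sum(u_i * v_i for u_i, v_i in zip(u, center)) for u in outside_vectors]
    probs = softmax(scores)
    loss = -math.log(probs[target])
    grad = [0.0] * len(center)
    for w, u in enumerate(outside_vectors):
        coeff = probs[w] - (1.0 if w == target else 0.0)  # y_hat - y
        for i in range(len(center)):
            grad[i] += coeff * u[i]
    return loss, grad


def numerical_grad(center, outside_vectors, target, h=1e-5):
    """Central finite differences, used only to check the analytic gradient."""
    grad = []
    for i in range(len(center)):
        plus, minus = list(center), list(center)
        plus[i] += h
        minus[i] -= h
        loss_plus, _ = loss_and_grad(plus, outside_vectors, target)
        loss_minus, _ = loss_and_grad(minus, outside_vectors, target)
        grad.append((loss_plus - loss_minus) / (2.0 * h))
    return grad


U = [[0.1, -0.3, 0.2], [0.4, 0.0, -0.1], [-0.2, 0.5, 0.3], [0.0, 0.1, -0.4]]
v = [0.3, -0.2, 0.1]
loss, analytic = loss_and_grad(v, U, target=2)
numeric = numerical_grad(v, U, target=2)
```

    If the analytic and numeric gradients disagree beyond rounding error, the derivation (or the code) has a bug, which is exactly the feedback loop that makes the by-hand exercise useful.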


  • paulmorio 11 days ago
    I have done this a few times now, alone and in collaboration with others, primarily as a way to learn about the methods I was interested in from a research perspective whilst improving my skills in software engineering. I am still learning.

    Starting out I would recommend implementing fundamental building blocks within whatever 'subculture' of ML you are interested in whether that be DL, kernel methods, probabilistic models, etc.

    Let's say you are interested in deep learning methods (as that's something I could at least speak more confidently about). In that case build yourself an MLP layer, then an RNN layer, then a GNN layer, then a CNN layer, and an attention layer along with some full models with those layers on some case studies exhibiting different data modalities (images, graphs, signals). This should give you a feel for the assumptions driving the inductive biases in each layer and what motivates their existence (vs. an MLP). It also gives you all the building blocks you can then extend to build every other DL layer+model out there. Another reason is that these fundamental building blocks have been implemented many times so you have a reference to look to when you get stuck.

    On that note: here are some fun GNN papers to implement in order of increasing difficulty (try building using vanilla PyTorch/Jax instead of PyG): SGC, GCN, GAT.
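
    As a starting point for the first two, a dependency-free sketch of the shared propagation step: the symmetrically normalised adjacency with self-loops, S = D^(-1/2) (A + I) D^(-1/2), applied repeatedly to the node features as in SGC. Function names are hypothetical and dense Python lists are used purely for readability; the learned linear layer and any nonlinearity are left out.

```python
import math


def normalized_adjacency(adj):
    """S = D^(-1/2) (A + I) D^(-1/2), with degrees taken from A + I."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Dense matrix product on lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def sgc_features(adj, features, hops):
    """Propagate features `hops` times; a linear classifier would be trained on the result."""
    s = normalized_adjacency(adj)
    out = features
    for _ in range(hops):
        out = matmul(s, out)
    return out


# Path graph 0 - 1 - 2 with one-hot node features.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
smoothed = sgc_features(adj, feats, hops=2)
```

    A GCN layer reuses the same S but interleaves a weight matrix and a nonlinearity after each propagation step.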

    After building the basic building blocks these should each take about 2-5 hours (reading paper + implementation). Probably quicker at the end with all this practice. Good luck and remember to have fun!

    • Heidaradar 5 days ago
      another question: Do you have a good resource for learning more about GNNs, I'm currently looking at the Stanford course, is that good enough? any other courses/books/ just something else that you think would be more useful?
    • Heidaradar 6 days ago
      quick question: when you say write your own MLP, RNN etc, is that without using PyTorch, I'm assuming so? i.e., i'm guessing you want me to write my own NN library that handles all of this stuff, including backprop etc
    • Heidaradar 11 days ago
  • manimino 11 days ago
    I enjoyed doing this through the Coursera deep learning specialization:

    The lectures take you through each major paper, then you implement the paper in the homework. Much faster than reading the paper yourself.

    • Heidaradar 11 days ago
      hmm, I found that I didn't really like this course but I'll look through it again
  • Artgor 11 days ago
    You could start with Word2Vec or GloVe, for example. Another option is to go with CV papers and start with AlexNet or the first ResNet.
  • KhoomeiK 11 days ago
    Hey, feel free to reach out if you’d like to join an NLP project I’m working on to gain more experience. Will provide mentorship and potentially coauthorship on the publication.
    • zebloxxer 11 days ago
      Hey - I would love to help out with whatever needs doing. Happy to do grunt work. I've been deeply studying NLP for a few months now and a project is exactly what I need to help me move forward with it.
      • KhoomeiK 9 days ago
        Sure, feel free to email me as well. Email is listed on the personal website linked in my profile.
    • freetheelephant 11 days ago
      Hey, I'd love to potentially join and help out on this project and learn more about NLP.
      • KhoomeiK 9 days ago
        Sure, feel free to email me as well. Email is listed on the personal website linked in my profile.
    • Heidaradar 11 days ago
      yeah, i'd love to! how do you want me to message you?
      • KhoomeiK 10 days ago
        Email is on my personal website linked in my profile
  • ausbah 11 days ago
    you could maybe write the whole thing in a few hours, but debugging what you wrote to recreate prior results will probably take much longer depending on choice of problem
    • Heidaradar 11 days ago
      what paper is this referring to, sorry?
      • ausbah 11 days ago
        sorry, should've been more specific: i meant ML in general. from what i've seen it's not hard to reimplement the pseudo-code in any ML paper. it gets tricky when you actually try to use the code you've written, usually when trying to recreate the performance results from the paper you're implementing. it's very common for authors to leave out or downplay tricks and implementation details that greatly contributed to the model's performance, on top of how finicky machine learning is in general
        • Heidaradar 11 days ago
          i see, that makes sense. thanks for the info.
  • Buttons840 11 days ago
    Soft Actor Critic, a reinforcement learning algorithm.