Catgrad: A categorical deep learning compiler

(catgrad.com)

77 points | by remexre 150 days ago

8 comments

incrudible 147 days ago
Category theory always reminds me of this quip by Feynman:
https://www.youtube.com/watch?v=B-eh2SD54fM
When I hear people talking in the jargon of category theory, I do not understand what they say, but I have a suspicion that it is something rather mundane that I would understand if they were using the more specific terms for the given context. I understand the appeal of generalization from a mathematical perspective, but from a practical programming perspective, I fail to understand the value proposition.
[-]
- steego 147 days ago
  While I understand Feynman’s point that mathematicians love to generalize everything and that physicists focus on what actually works in the real world, as programmers, we’re doing something different from both of those groups. We’re engaged in a mix of engineering and analysis. We need to build systems that hold together under real conditions.
  Category theory might seem obsessed with abstraction. In some ways, it is. However, when skilled practitioners apply these tools to areas of computer science and engineering, the point isn’t to create generalized abstractions. The point is identify the minimal, leak-proof abstractions needed to fulfill some desired quality. While I understand the language sounds very general, cryptic and off-putting, it’s also very precise.
  While category theory might seem like a discipline that’s obsessed with generalizations, in practice, the abstractions one constructs using categorical tools can be very specific. The point isn’t always to construct the most general abstractions, it’s to construct abstraction that actually work 100% of the time. The irony is, mathematicians and computer scientists who are concerned with rigor often do this by proving specific cases first and then incrementally generalizing, not the other way around.
- catgary 147 days ago
  It might help the programmer but confuse the quantum physicist or algebraic topologist. It’s an interdisciplinary field and we do our best (well, I’m out of it so it’s not really “we” anymore). The abstract differentiation stuff that makes up the backbone here also gets used in homotopy theory and is applied to algebraic geometry.
  [-]
  - incrudible 146 days ago
    I doubt that the average quantum physicist would have a better time with the jargon of category theory than the programmer. If you want to say something to people, you need to speak their language.
    > The abstract differentiation stuff that makes up the backbone here also gets used in homotopy theory and is applied to algebraic geometry.
    I should certainly hope so!
    [-]
    - catgary 146 days ago
      Look, I agree that someone pitching category theory to programmers should make more of an effort to introduce terms/use common language when possible. And you can look at Gordon Plotkin’s papers to find some good, hard CS papers that take the time to work out examples relating things to structures familiar to a programmer.
      But there’s the reality that CT is an interdisciplinary field, and you will have topologists working with computer scientists and they’ll often be actively disinterested in each other’s fields. Imposing your jargon or notation on your coauthors from other fields is hardly a great way to foster collaboration.
- feanaro 147 days ago
  How do you discover the principles in the first place? You can discover them once and then apply them in all applicable places precisely because you generalised them.
  You have a point that the result may very well be more easily explained in concrete terms to practitioners of a given field in which you applied it, though.
abeppu 148 days ago
So, learning about the categorical structure is interesting, but is there a specific advantage to seeing these concepts directly informing the implementation vs as a post-hoc way of explaining what autodiff is doing? E.g. Tensorflow is creating and transforming graphs of computation. Despite being written before most of the categorical work cited was done, isn't it doing the "same" thing, but we just wouldn't find names or comments in the code that closely align with the categorical work?
[-]
tripplyons 148 days ago
How does this compare to the XLA's ability to compile full training steps from JAX?
catgary 148 days ago
This is a nice project, but “You only linearize once” is more-or-less the type theory version of “Reverse Derivative Categories”, so JAX really does this already.
[-]
- statusfailed 148 days ago
  (author here) the goals of catgrad are a bit different to JAX - first of all, the "autodiff" algorithm is really a general approach to composing optics, of which autodiff is just a special case. Among other things, this means you can "plug your own language" into the syntax library to get a "free" autodiff algorithm. Second, catgrad itself is more like an IR right now - we're using it at hellas.ai to serve as a serverless runtime for models.
  More philosophically, the motto is "write programs as morphisms directly". Rather than writing a term in some type theory which you then (maybe) give a categorical semantics, why not just work directly in a category?
  Long term, the goal is to have a compiler which is a stack of categories with functors as compiler passes. The idea being that in contrast to typical compilers where you are "stuck" at a given abstraction level, this would allow you to view your code at various levels of abstractions. So for example, you could write a program, then write an x86-specific optimization for one function which you can then prove correct with respect to the more abstract program specification.
  [-]
  - efferifick 148 days ago
    > Among other things, this means you can "plug your own language" into the syntax library to get a "free" autodiff algorithm.
    Hello, as a compiler engineer I am interested in this area. Can you expand a little bit more? How would I be able to plug in my own language for example?
    > So for example, you could write a program, then write an x86-specific optimization for one function which you can then prove correct with respect to the more abstract program specification.
    So, what you are saying is that catgrad alllows me to write a program and then also plug in a compiler pass? I.e., the application author can also be the compiler developer?
  - catgary 147 days ago
    I get it, but once you’ve compiled your program down to a computational graph…well those are basically morphisms in a monoidal/multi category, no? And then you can have a stack of MLIR dialects that represent different levels of abstraction, where optimization passes are functors between the multicategory determined by that dialect, then you hand it off to LLVM for CPU execution, or maybe you want to run it with WebGPU, go down to CUDA, etc.
  - nh23423fefe 148 days ago
    can you expand on "write programs as morphisms directly"
    I'm someone super interested in category theory and ITT, but I can't quite parse what you are trying to convey even though I think I have the prereqs to understand the answer.
drumnerd 147 days ago
I think about cones all the time when doing machine learning. If I have an object O that can be mapped to A and B I can learn a function from A to B or B to A if I can generate Os. That’s my view of self supervised learning.
bguberfain 148 days ago
It remembers me Theano [0].
[0] https://en.wikipedia.org/wiki/Theano_(software)
cyanydeez 148 days ago
i always find it fascinating when projects like these can seem to find a proper overview for the AI curious non-PhD students.