I really like the second part of the blogpost but starting with Gaussian elimination is a little "mysterious" for lack of a better word. It seems more logical to start with a problem ("how to solve linear equations?" "how to find intersections of lines?"), show its solution graphically, and then present the computational method or algorithm that provides this solution. Doing it backwards is a little like teaching the chain rule in calculus before drawing the geometric pictures of how derivatives are like slopes.
Author here – I think you're probably right. I wrote the Gaussian elimination section more as a recap, because I figured most readers have seen Gaussian elimination before, and I was keen to get to the rest of it. I'd love to hear if other folks had trouble with this section. Maybe I need to slow it down and explain it better.
Loved the article, and also the shoutout to Strang's lectures.
I agree with the order; the Gaussian elimination section should come later. I almost closed the article - glad I kept scrolling out of curiosity.
Also, I felt like I had been primed to think about nickels and pennies as variables rather than coefficients due to the color scheme, so when I got to the food section I naturally expected to see the column picture first.
When I encountered the carb/protein matrix instead, I perceived it in the form:
[A][x], where the x is [milk bread].T
so I naturally perceived the matrix as a transformation and saw the food items as variables about to be "passed through" the matrix.
But another part of my brain immediately recognized the matrix as a dataset of feature vectors, [[milk].T [bread].T], yearning for y = f(W @ x).
I was never able to resolve this tension in my mind...
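For what it's worth, the two readings give the same numbers either way. A minimal numpy sketch, with made-up carb/protein values (not the post's actual ones):

    import numpy as np

    # Illustrative values only -- columns are foods, rows are nutrients.
    #              bread  milk
    A = np.array([[30.0, 12.0],   # carbs per unit
                  [ 5.0,  8.0]])  # protein per unit

    x = np.array([2.0, 1.0])      # quantities: 2 bread, 1 milk

    # "Transformation" reading: the quantities are passed through the matrix.
    print(A @ x)                              # total [carbs, protein]

    # "Column" reading: the same total as a combination of the food columns.
    print(x[0] * A[:, 0] + x[1] * A[:, 1])    # identical result

The "dataset of feature vectors" reading is just the same matrix scanned column by column, so the tension is really two names for one object.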
I really like this, and I think one way to make it even more clear would be to use other variable letters to represent breads and milks, because their x’s and y’s somehow morph into the x’s and y’s that represent carbs and protein in the graph.
This is nice. Until I took an actual semester of it in college, linear algebra was a total mystery to me. Great job.
For those unfamiliar with vectors, it might be helpful to briefly explain how the two vectors (their magnitude and direction) represent the one bread and one milk and how vectors can be moved around and added to each other.
I feel like it's obligatory to also drop a link to the 3blue1brown series on linear algebra, for anyone interested in learning - it is a step up from what's in this post, but these videos are brilliant and still super accessible:
https://youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2xVFit...
This is great. I really appreciate visual explanations and the way you build up the motivation. I'm using a few resources to learn linear algebra right now, including "The No Bullshit Guide to Linear Algebra", which has been pretty decent so far. Does anyone have other recommendations? I've found a lot of books to be too dense or academic for what I need. My goal is to develop a practical, working understanding I can apply directly.
>My goal is to develop a practical, working understanding I can apply directly.
Apply directly... to what? IMO it is weird to learn theory (like linear algebra) expressly for practical reasons: surely one could just pick up a book on those practical applications and learn the theory along the way? And if in this process, you end up really needing the theory then certainly there is no substitute for learning the theory no matter how dense it is.
For example, linear algebra is very important to learning quantum mechanics. But if someone wanted to learn linear algebra for this reason they should read quantum mechanics textbooks, not linear algebra textbooks.
You're totally right. I left out the important context. I'm learning linear algebra mainly for applied use in ML/AI. I don't want to skip the theory entirely, but I've found that approaching it from the perspective of how it's actually used in models (embeddings, transformations, optimization, etc.) helps me with motivation and retention.
So I'm looking for resources that bridge the gap, not purely computational "cookbook" type resources but also not proof-heavy textbooks. Ideally something that builds intuition for the structures and operations that show up all over ML.
https://math.mit.edu/~gs/learningfromdata/
Although if your goal is to learn ML, you should probably focus on that first and foremost. After a while you will see which concepts from linear algebra keep appearing (for example, singular value decomposition, positive definite matrices, etc.) and work your way back from there.
Thanks. I have a copy of Strang and have been going through it intermittently. I am primarily focused on ML itself and that's been where I'm spending most of my time. I'm hoping to simultaneously improve my mathematical maturity.
I hadn't known about Learning from Data. Thank you for the link!
Since you're associating ML with singular value decomposition, do you know if it is possible to factor the matrices of neural networks for fast inverse Jacobian products? If this is possible, then optimizing through a neural network becomes roughly as cheap as doing half a dozen forward passes.
Less popular techniques like normalizing flows do need that, but instead of SVD they directly design transformations that are easier to invert.
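For a single linear layer at least, this works out: the Jacobian is just the weight matrix W, and once you have its SVD, every inverse-Jacobian product costs only matrix-vector work. A rough numpy sketch (square layer, no nonlinearity; with nonlinearities and weights that change every step the factorization would have to be redone, which is presumably why flows design easily invertible layers instead):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 256
    W = rng.standard_normal((n, n))   # Jacobian of a linear layer y = W @ x

    # Factor once (O(n^3)), then reuse for every inverse-Jacobian product.
    U, s, Vt = np.linalg.svd(W)

    def inv_jacobian_product(y):
        # Solves W @ x = y via x = V @ diag(1/s) @ U.T @ y -- only mat-vecs.
        return Vt.T @ ((U.T @ y) / s)

    y = rng.standard_normal(n)
    x = inv_jacobian_product(y)
    print(np.allclose(W @ x, y))      # True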
"Aside: Another solution for the above is 23 pennies. Or -4 nickels + 43 pennies."
This is where the math nerds just can't help themselves, and I'm here for it. However, these things drive me crazy at the same time. You cannot have -4 nickels. In pure math with only x and y, sure those values can be negative. But when using real world examples using physical objects, no, you cannot have a negative nickel. Maybe you owe your mate the value of 4 nickels, but that's outside the scope of this lesson. Your negative nickels are not in another dimension (because again, the math works that way). You want to help people understand math with real world concepts but then go and confuse things with pure math concepts. And these negative nickels are still not even getting into imaginary nickels territory like you have square root of -4 nickels.
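For anyone who wants the arithmetic spelled out: both quoted solutions add up to 23 cents, so assuming the aside's equation is 5n + p = 23, the whole-number solutions (physical or not) are easy to list:

    # Assuming the total is 23 cents, with nickels worth 5 and pennies worth 1:
    # 5*n + p = 23, so p = 23 - 5*n for any integer n.
    for n in range(-4, 5):
        p = 23 - 5 * n
        tag = "ok" if n >= 0 and p >= 0 else "(math only, no negative coins)"
        print(f"{n:3d} nickels + {p:3d} pennies = 23 cents  {tag}")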
About 15 years ago I started an aggregator to accumulate/sort/filter the best instruction on various topics, kinda like Reddit for learning. This is such a perfect example of the kind of thing I hoped would filter to the top. Thinking about trying to redo it. Is there a use for this sort of thing in today's world?
That "Bam!" thing just brought Josh Starmer to mind. Anyone remember his book with the illustrated ML stuff? I used to watch his YouTube channel too. I really dig these kinds of explainers; they make learning so much more fun.
I don’t like these examples because IRL nobody does things this way.
Try actual problems that require you to use these tools and the inter-relationships between them, where it becomes blindingly obvious why they exist. Calculus is a prime example, and it’s comical that most students find Calculus hard because their linear algebra is weak. But Calculus has extensive uses, just not for doing basic carb counting.
Seems a bit premature? This is "linear algebra" in the sense of middle/high school algebra with linear equations. I suppose many more chapters are coming?
As much as I like posts like this, I can't feel anything other than hate for the Substack platform. It just sucks, I'm sorry, but I can't understand how people can rely on that bloated web app. I just click around and it's so slow and buggy; recently I canceled a subscription because it kept signing me out, and the signup/sign-in experience just sucks.
The (an) answer is that since the LHS and RHS are equal, you can add them to or subtract them from another equation and preserve equality.
If I remember correctly, substitution (isolating x or y) was introduced before this technique.
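To make the "add the two sides" step concrete, here is a small made-up system and the elimination step in numpy (not the post's own numbers):

    import numpy as np

    # Made-up system:  2x + y = 5  and  x - y = 1, as an augmented matrix [A | b].
    aug = np.array([[2.0,  1.0, 5.0],
                    [1.0, -1.0, 1.0]])

    # Adding row 0 to row 1 adds equal quantities to both sides of the second
    # equation, so equality is preserved -- and the y terms cancel.
    aug[1] = aug[1] + aug[0]          # row 1 is now: 3x + 0y = 6

    x = aug[1, 2] / aug[1, 0]         # x = 2
    y = aug[0, 2] - aug[0, 0] * x     # back-substitute into 2x + y = 5 -> y = 1
    print(x, y)                       # 2.0 1.0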
Same, and I think ML is a perfect use case for this. I also have a series for that coming.
B: I miss scroll bars. I really, really miss scroll bars.
Also, HT to your user name! Egon Schiele is one of my favorite artists! Loved seeing his works at the Neue in NYC.