Looks like this uses mutation of global/shared state. For the example:
z=x*y+3,
what if there is another function that does:
w=x+2*y
and then both functions run a backward pass (simultaneously, perhaps in different threads); then it seems dangerous to accumulate the results of the backward pass (the partial derivatives) in the shared variables x and y and expose them through x.get_grad() and y.get_grad(). IMHO, a better design would let you ask z.get_grad(x) and z.get_grad(y), and w.get_grad(x) and w.get_grad(y), so each output's gradients stay separate.
I wanted to store the graph in a heap to be able to send it to the gpu later on, but then I got lazy and abandoned it. But you always learn something. :)
That sounds interesting; what do you mean by "in a heap"? Is the stack they're currently linearized into not GPU-friendly? I don't know much about GPU programming, so this might be a dumb question.
My idea was to put the nodes in a Vec and use indices into that Vec instead of pointers, so it would be easier to send the array to the GPU. I wanted to make a minimal example of running a micrograd network on the GPU, with wgpu or macroquad, but I didn’t complete it, so it would be nice if someone else did. :)