Training of Physical Neural Networks

(arxiv.org)

133 points | by Anon84 14 days ago

9 comments

  • tomxor 14 days ago
    Last time I read about this the main practical difficulty was model transferability.

    The very thing that makes it so powerful and efficient is also the thing that makes it uncopyable, because sensitivity to tiny physical differences between devices inevitably gets encoded into the model during training.

    It seems intuitive this is an unavoidable, fundamental problem. Maybe that scares away big tech, but I quite like the idea of having invaluable, non-transferable, irreplaceable little devices. Not so easily deprecated by technological advances, flying in the face of consumerism, getting better with age, making people want to hold onto things.
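    A toy sketch of that effect, under a hypothetical device model where each device realises a programmed weight w as g*w, with g a fixed per-device gain from fabrication variation (the model and the 5% spread are assumptions for illustration, not from the paper). Idealised in-situ training on device A absorbs A's gains into the programmed weights, so the same weights copied to device B are off by B's mismatch:

```python
import random


def make_device(n_weights, spread, rng):
    """Hypothetical device: every weight is realised with a fixed multiplicative gain."""
    return [1.0 + rng.gauss(0.0, spread) for _ in range(n_weights)]


def forward(device_gains, programmed, x):
    """Physical output: the device applies its own gain to each programmed weight."""
    return sum(g * w * xi for g, w, xi in zip(device_gains, programmed, x))


def train_in_situ(device_gains, target_weights):
    """Idealised in-situ training: choose programmed weights so that THIS device
    implements the target linear map (its gains get baked into the weights)."""
    return [t / g for t, g in zip(target_weights, device_gains)]


def mse(device_gains, programmed, target_weights, inputs):
    total = 0.0
    for x in inputs:
        ideal = sum(t * xi for t, xi in zip(target_weights, x))
        total += (forward(device_gains, programmed, x) - ideal) ** 2
    return total / len(inputs)


rng = random.Random(0)
n = 8
target = [rng.uniform(-1, 1) for _ in range(n)]
inputs = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(200)]

device_a = make_device(n, spread=0.05, rng=rng)
device_b = make_device(n, spread=0.05, rng=rng)

weights_a = train_in_situ(device_a, target)           # trained on device A only
err_on_a = mse(device_a, weights_a, target, inputs)   # essentially zero on A
err_on_b = mse(device_b, weights_a, target, inputs)   # copied to B: mismatch appears
print(f"error on A: {err_on_a:.2e}, error after copying to B: {err_on_b:.2e}")
```

    Real devices add noise, drift and nonlinearity on top of this, but even the purely multiplicative case shows why "train once, copy everywhere" breaks down.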

    • alexpotato 14 days ago
      > Last time I read about this the main practical difficulty was model transferability.

      There is a great write up of this in this old blog post: https://www.damninteresting.com/on-the-origin-of-circuits/

    • jegp 13 days ago
      It's still possible to train a network that's aware of the physics and then transfer that to physical devices. One approach to this from the neuromorphic community (that's been working on this for a long time) is called the Neuromorphic Intermediate Representation (NIR) and already lets you transfer models to several hardware platforms [1]. This is pretty cool because we can use the same model across systems, similar to a digital instruction set. Ofc, this doesn't fix the problem of sensitivity. But biology fixed that with plasticity, so we can probably learn to circumvent that.

      [1]: https://github.com/neuromorphs/nir (disclaimer: I'm one of the authors)

    • robertsdionne 14 days ago
      This is “Mortal Computation”, a term coined in Hinton’s The Forward-Forward Algorithm: Some Preliminary Investigations https://arxiv.org/abs/2212.13345.
    • 6gvONxR4sf7o 13 days ago
      If (somehow/waves hands) you could parallelize training, maybe this would turn into an implicit regularization and be a benefit, not a flaw. Then again, physical parallelizability might be an infeasibly restrictive constraint?
    • CuriouslyC 13 days ago
      This was the thing Geoff Hinton cited as a problem with analog networks.

      I think eventually we'll get to the point where we do a stage of pretraining on noisy digital hardware to create a transferable network, then fine-tune it on the analog system.
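      A minimal sketch of that two-stage idea, reusing a hypothetical multiplicative-gain device model: stage 1 trains a linear model in software while drawing fresh random gains on every step (noise injection, so the solution does not depend on any one device's quirks), and stage 2 fine-tunes the copied weights on one fixed simulated device. Learning rates, step counts and the noise level are arbitrary choices, not anything from the paper:

```python
import random


def forward(gains, w, x):
    """Simulated analog output: each weight is scaled by a device gain."""
    return sum(g * wi * xi for g, wi, xi in zip(gains, w, x))


def sgd_step(gains, w, x, y, lr):
    """One squared-error gradient step taken through the simulated device gains."""
    err = forward(gains, w, x) - y
    return [wi - lr * 2 * err * g * xi for wi, g, xi in zip(w, gains, x)]


def device_mse(gains, w, data):
    return sum((forward(gains, w, x) - y) ** 2 for x, y in data) / len(data)


rng = random.Random(1)
n, spread = 4, 0.05
true_w = [0.5, -1.0, 0.25, 0.8]
data = []
for _ in range(100):
    x = [rng.uniform(-1, 1) for _ in range(n)]
    data.append((x, sum(t * xi for t, xi in zip(true_w, x))))

# Stage 1: "noisy digital" pretraining with fresh random gains on every step.
w = [0.0] * n
for _ in range(20):
    for x, y in data:
        noisy_gains = [1.0 + rng.gauss(0.0, spread) for _ in range(n)]
        w = sgd_step(noisy_gains, w, x, y, lr=0.05)

# Stage 2: fine-tune the copied weights on one particular (simulated) device.
device = [1.0 + rng.gauss(0.0, spread) for _ in range(n)]
before = device_mse(device, w, data)
for _ in range(20):
    for x, y in data:
        w = sgd_step(device, w, x, y, lr=0.05)
after = device_mse(device, w, data)
print(f"device MSE before fine-tuning: {before:.2e}, after: {after:.2e}")
```

      Stage 1 gets the weights close on any device drawn from the same distribution; stage 2 then only has to correct a small device-specific residual.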

    • programjames 14 days ago
      You can regularize the networks to make them transfer more easily. I can't remember the paper's title off the top of my head, though.
    • dsabanin 14 days ago
      Couldn’t you still copy by training a new network on a new device to have same outputs for the same inputs as the original?
      • tomxor 14 days ago
        Yes, but training is the most expensive part of ML, for example GPT-3 is estimated to cost something like 1-4 million USD.

        With an ANN you can do it once and then clone the result at negligible energy cost.

        Maybe training a batch of PNNs in parallel could save some of the energy cost, but I don't know how feasible that is, considering they could behave slightly differently during training and diverge... Now that sarcastic "Schools?" comment at the bottom of this thread is starting to sound relevant.

        • l33tman 13 days ago
          That's not true for the most well-known models. For example, Meta's LLaMA training and architecture were predicated on the observation that training cost is a drop in the bucket compared to the inference cost over a model's lifetime.
        • kmmlng 13 days ago
          > Yes, but training is the most expensive part of ML, for example GPT-3 is estimated to cost something like 1-4 million USD.

          That entirely depends on how many inferences the model will perform during its lifecycle. You can find different estimates for the energy consumption of ChatGPT, but they range from something like 500-1000 MWh a day. Assuming an electricity price of $0.165 per kWh, that would put you at roughly $80,000 to $165,000 a day.

          Even at the lower end of $80,000 a day, you'll reach your $4 million in about 50 days.
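          The same arithmetic as a quick script (the energy range and electricity price are the comment's assumptions; note that 500 MWh at $0.165/kWh is $82,500 a day, so the 50-day figure comes from rounding down to $80,000):

```python
KWH_PER_MWH = 1_000


def daily_inference_cost(mwh_per_day, price_per_kwh=0.165):
    """USD per day for a given daily energy use (price is the comment's assumption)."""
    return mwh_per_day * KWH_PER_MWH * price_per_kwh


def days_to_match(training_cost, daily_cost):
    """Days of inference spending needed to equal a one-off training cost."""
    return training_cost / daily_cost


for mwh in (500, 1000):
    cost = daily_inference_cost(mwh)
    print(f"{mwh} MWh/day -> ${cost:,.0f}/day, "
          f"$4M reached after {days_to_match(4_000_000, cost):.1f} days")
```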

          • tomxor 13 days ago
            That's not a proportional comparison: n simultaneous users versus 1 training run. How many users across how many GPUs does that $80k cover?

            With PNNs you would have to multiply that 1-4 million by n, so the training cost explodes.
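            A back-of-the-envelope version of that scaling argument, assuming a digital model is trained once and copied for a small per-copy cost, while every physical device needs its own full training run. The $4 million figure is the upper estimate quoted in this thread; the $1 copy cost is a hypothetical placeholder:

```python
def fleet_cost_digital(n_devices, training_cost, copy_cost):
    """Train once, then copy the weights onto every device."""
    return training_cost + n_devices * copy_cost


def fleet_cost_physical(n_devices, training_cost):
    """No copying possible: every physical device gets its own training run."""
    return n_devices * training_cost


for n in (1, 10, 1_000):
    digital = fleet_cost_digital(n, training_cost=4e6, copy_cost=1.0)
    physical = fleet_cost_physical(n, training_cost=4e6)
    print(f"{n:>5} devices: digital ${digital:,.0f} vs physical ${physical:,.0f}")
```

            The gap grows linearly with the number of devices, which is why per-device training only makes sense if each physical device is far cheaper to train or run than its digital counterpart.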

      • etiam 13 days ago
        Distillation (as you may be aware). https://arxiv.org/abs/1503.02531

        Having to do that in each instance is still really cumbersome for cheap mass deployment compared to just making a digital-style exact copy, but then again I guess a main argument for wanting these systems is that they'd be doing things unachievable in practice on digital computers.

        In some cases one might be able to distill to digital arithmetic after the heavy parts of the optimization are done, for replication, distribution, better access for software analysis, etc.
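        A minimal sketch of copying by querying, as described above: the original device is treated as a black box (hypothetical hidden weights and gains), and a digital student is fitted to its outputs on probe inputs by plain gradient descent on squared error. This is a simplification of the soft-target distillation in the linked paper, and all sizes and rates are arbitrary:

```python
import math
import random

rng = random.Random(2)
n = 3

# Hypothetical black-box "physical" teacher: its internals (weights and device
# gains) are not readable; we can only query inputs -> outputs.
_hidden_w = [rng.uniform(-1, 1) for _ in range(n)]
_hidden_gain = [1.0 + rng.gauss(0.0, 0.05) for _ in range(n)]


def teacher(x):
    return math.tanh(sum(g * w * xi for g, w, xi in zip(_hidden_gain, _hidden_w, x)))


def student(v, x):
    """Digital copy with readable weights v."""
    return math.tanh(sum(vi * xi for vi, xi in zip(v, x)))


def mse(v, probes):
    return sum((student(v, x) - teacher(x)) ** 2 for x in probes) / len(probes)


train_probes = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(200)]
test_probes = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(200)]

v = [0.0] * n
initial_error = mse(v, test_probes)
lr = 0.1
for _ in range(50):
    for x in train_probes:
        y = teacher(x)                          # query the original device
        out = student(v, x)
        scale = 2 * (out - y) * (1 - out ** 2)  # d/dv of (tanh(v.x) - y)^2, per x_i
        v = [vi - lr * scale * xi for vi, xi in zip(v, x)]
final_error = mse(v, test_probes)
print(f"held-out MSE: {initial_error:.2e} -> {final_error:.2e}")
```

        Once fitted, the student's weights v can be copied exactly like any other digital model.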

    • trextrex 14 days ago
      Well, the brain is a physical neural network, and evolution seems to have figured out how to generate a (somewhat) copiable model. I bet we could learn a trick or two from biology here.
      • hansworst 14 days ago
        The way the brain does it is by giving users a largely untrained model that they themselves have to train over the next 20 years for it to be of any use.
        • salomonk_mur 13 days ago
          It is extremely trained already. Everyone alive was born with the ability for all their organs and bodily function to work autonomously.

          A ton of that is probably encoded elsewhere, but no doubt the brain plays a huge part. And somehow, it's all reconstructed for each new "device".

        • wbillingsley 13 days ago
          Sometimes. Foals are born (almost) able to walk. There are occasions where evolution baked the model into the genes.
        • lynx23 13 days ago
          20 years of training is not enough. Neuroscientists say 25. In my own experience, it's more like 30.
          • dcuthbertson 13 days ago
            In the end, it's a life-long process.
      • tomxor 14 days ago
        Some parts are copiable, but not the more abstract things like the human intellect, for lack of a better word.

        We are not even born with what you might consider basic mental faculties, for example it might seem absurd, but we have to learn to see... We are born with the "hardware" for it, a visual cortex, an eye, all defined by our genes, but it's actually trained from birth, there is even a feedback loop that causes the retina to physically develop properly.

        • alexpotato 14 days ago
          Another example:

          Children who were "raised in the wild" or locked in a room by themselves have been shown to be incapable of learning full human language.

          The working theory is that our brains can only learn certain skills at certain times of brain development/ages.

          • deepfriedchokes 14 days ago
            We should also consider the effects of trauma on those brains. If you’ve ever spent time around people with extreme trauma, they are very much in their own heads and can’t focus outside themselves long enough to learn anything. It definitely impacts intellectual capacity. Humans are social animals, and anyone raised without proper socializing, intimacy and nurturing will inevitably end up traumatized.
        • immibis 14 days ago
          They raised some cats from birth in an environment with only vertically-oriented edges, none horizontal. Those cats could not see horizontally-oriented things. https://computervisionblog.wordpress.com/2013/06/01/cats-and...

          Likewise, kittens with an eye patch over an eye in the same time period remain blind in that eye forever.

          • tomxor 14 days ago
            Wow, that's a horrific way of proving that theory.
          • BriggyDwiggs42 14 days ago
            Geez poor kitties, but that is interesting.
    • bongodongobob 14 days ago
      Reminds me of the evolutionary FPGA experiment that was dependent on magnetic flux or something. The same program wouldn't work on a different FPGA.
  • ksd482 14 days ago
    PNNs resemble neural networks, however at least part of the system is analog rather than digital, meaning that part or all the input/output data is encoded continuously in a physical parameter, and the weights can also be physical, with the ultimate goal of surpassing digital hardware in performance or efficiency.

    I am trying to understand what form a node takes in PNNs. Is it a transistor? Or is it more complex than that? Or is it a combination of a few things, such as an analog signal and some other sensors, which work together to form a single node that looks like the one we are all familiar with?

    Can anyone please help me understand what exactly is "physical" about PNNs?

    • sigmoid10 14 days ago
      It's just a general idea to implement the computation part of neurons directly in hardware instead of software. For example by calculating sums or products using voltages in circuits, i.e. analog computing. The actual implementation is up to the designer, who in turn will try to mimic a certain architecture.
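      One commonly cited concrete form of this is a resistive crossbar: weights are stored as conductances G, inputs are applied as voltages V, Ohm's law (I = G·V) does each multiplication, and Kirchhoff's current law sums the currents on each output wire, so the column currents form a matrix-vector product. A minimal simulation of that idealised circuit, with made-up values and no wire resistance, noise or nonlinearity (it is not a model of any specific system from the paper):

```python
def crossbar_currents(conductances, voltages):
    """Ideal crossbar: conductances[i][j] (siemens) joins input row i to output
    column j. Each cell passes I = G * V (Ohm's law), and the currents meeting
    on a column wire add up (Kirchhoff's current law)."""
    n_cols = len(conductances[0])
    currents = [0.0] * n_cols
    for row, v in zip(conductances, voltages):
        for j, g in enumerate(row):
            currents[j] += g * v
    return currents


# Hypothetical 3x2 array: 3 input voltages, 2 output columns.
G = [[1e-6, 2e-6],
     [3e-6, 0.5e-6],
     [0.0, 4e-6]]
V = [0.2, 0.1, 0.3]          # volts
I = crossbar_currents(G, V)  # amperes: the matrix-vector product G^T V
print(I)
```

      Real designs need extra tricks on top of this, for example pairs of devices to represent negative weights, since a single conductance cannot be negative.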
  • Shawnecy 14 days ago
    My knowledge in this area is incredibly limited, but I figured the paper would mention NanoWire Networks (NWNs) as an emerging physical neural network[0].

    Last year, researchers from the University of Sydney and UCLA used NWNs to demonstrate online learning of handwritten digits with an accuracy of 93%.

    [0] = https://www.nature.com/articles/s41467-023-42470-5

    • orbifold 13 days ago
      Classifying MNIST digits with 93% accuracy can also be accomplished using a linear classifier. So it isn't clear to me what the advantage would be.
    • programjames 14 days ago
      That doesn't implement a trainable network in hardware; it just creates a "reservoir" of associations between the inputs.
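      For readers unfamiliar with the distinction: in reservoir computing the physical system acts as a fixed, untrained nonlinear map, and only a linear readout on top of it is trained. A minimal sketch, assuming a random tanh feature map stands in for the nanowire network (a simplification, since real reservoirs also have internal dynamics and memory), with the readout fitted by full-batch gradient descent:

```python
import math
import random

rng = random.Random(3)
N_IN, N_RES = 2, 20

# Fixed "reservoir": random weights drawn once and never updated.
W_RES = [[rng.gauss(0.0, 1.0) for _ in range(N_IN)] for _ in range(N_RES)]
B_RES = [rng.gauss(0.0, 1.0) for _ in range(N_RES)]


def reservoir(x):
    """Untrained nonlinear map standing in for the physical substrate."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W_RES, B_RES)]


def predict(w_out, feats):
    return sum(w * f for w, f in zip(w_out, feats))


def readout_mse(w_out, data):
    return sum((predict(w_out, f) - y) ** 2 for f, y in data) / len(data)


# Toy target x0 * x1, which is not linear in the raw inputs.
data = []
for _ in range(200):
    x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    data.append((reservoir(x), x[0] * x[1]))

w_out = [0.0] * N_RES          # the only trainable parameters
initial = readout_mse(w_out, data)
lr = 0.01
for _ in range(300):           # full-batch gradient descent on the readout only
    grad = [0.0] * N_RES
    for feats, y in data:
        err = predict(w_out, feats) - y
        for k, f in enumerate(feats):
            grad[k] += 2 * err * f / len(data)
    w_out = [w - lr * g for w, g in zip(w_out, grad)]
final = readout_mse(w_out, data)
print(f"readout MSE: {initial:.4f} -> {final:.4f}")
```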
  • programjames 14 days ago
    > These methods are typically slow because the number of gradient updates scales linearly with the number of learnable parameters in the network, posing a significant challenge for scaling up.

    This is a pretty big problem, though if you use information-bottleneck training you can train each layer simultaneously.
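    To make the quoted scaling concrete: if gradients are estimated purely by perturbing parameters on the hardware, a forward-difference estimate needs one extra measurement per parameter, so N parameters cost N + 1 system evaluations per update. Simultaneous-perturbation methods (SPSA, swapped in here as a familiar contrast) use 2 evaluations regardless of N, at the price of a much noisier estimate. A generic sketch with a made-up quadratic loss, not the specific methods reviewed in the paper:

```python
import random


class CountingLoss:
    """Wraps a loss function and counts how often the 'hardware' is queried."""

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, params):
        self.calls += 1
        return self.fn(params)


def finite_difference_grad(loss, params, eps=1e-4):
    """Forward differences: one extra evaluation per parameter (N + 1 total)."""
    base = loss(params)
    grad = []
    for i in range(len(params)):
        bumped = list(params)
        bumped[i] += eps
        grad.append((loss(bumped) - base) / eps)
    return grad


def spsa_grad(loss, params, rng, eps=1e-4):
    """SPSA: perturb all parameters at once with random +/-1 signs (2 evaluations)."""
    delta = [rng.choice((-1.0, 1.0)) for _ in params]
    plus = loss([p + eps * d for p, d in zip(params, delta)])
    minus = loss([p - eps * d for p, d in zip(params, delta)])
    return [(plus - minus) / (2 * eps * d) for d in delta]


def quadratic(params):
    """Hypothetical stand-in for a loss measured on a physical system."""
    return sum(p * p for p in params)


for n in (10, 100, 1000):
    params = [0.5] * n
    fd_loss, spsa_loss = CountingLoss(quadratic), CountingLoss(quadratic)
    finite_difference_grad(fd_loss, params)
    spsa_grad(spsa_loss, params, random.Random(0))
    print(f"N={n:>4}: finite differences {fd_loss.calls} evals, SPSA {spsa_loss.calls} evals")
```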

  • UncleOxidant 14 days ago
    So it sounds like these PNNs are essentially analog implementations of neural nets? Seems like an odd choice of naming to call them 'physical'.
    • tomxor 14 days ago
      ANN is taken.
      • agarwaen163 14 days ago
        Well, this is particularly frustrating because PNN is already taken as well, by the (imo) much more novel idea of Physics Informed Neural Networks (aka PINNs or PNNs).

        Why this isn't called Hardware Neural Nets is beyond me.

      • TheLoafOfBread 14 days ago
        I mean, LoRa was taken too before LoRA became a thing.
        • tomxor 14 days ago
          I don't mean globally; LoRa and LoRA are at least in different domains. Artificial Neural Networks and Physical Neural Networks are both machine learning, so discussion referring to both is highly probable, and the former is far more established, so calling it an Analog Neural Network would never last long.
          • TeMPOraL 13 days ago
            You can't call analog NN an AnalNN either, as then the "brain-gut axis" people will throw a fit.
    • pessimizer 14 days ago
      Makes sense as opposed to "abstract." With the constant encoding and decoding that has to be done when things are going in and out of processors and storage (or sensors), digital processes are always in some sense simulations.
  • craigmart 14 days ago
    Schools?