6 comments

  • mrinterweb 35 minutes ago
    I've been wondering when we will see general-purpose consumer FPGAs, and eventually ASICs, for inference. This reminds me of bitcoin mining. Bitcoin mining started with GPUs, and I think I remember a brief FPGA period before the transition to ASICs. My limited understanding of Google's tensor processing unit (TPU) chips is that they are effectively a transformer ASIC. That's likely a wild oversimplification of Google's TPU, but Gemini is proof that GPUs are not needed for inference.

    I suspect GPU inference will come to an end soon, as it will likely be wildly inefficient compared to purpose-built transformer chips. All those Nvidia GPU-based servers may become obsolete should transformer ASICs become mainstream. GPU bitcoin mining is an absolute waste of money (the cost of electricity) now, and I believe the same will soon be true for GPU-based inference. The hundreds of billions of dollars being invested in GPU-based inference look like an extremely risky bet that transformer ASICs won't happen, even though Google has already widely deployed its own TPUs.

    • fooblaster 12 minutes ago
      FPGAs will never rival GPUs or TPUs for inference. The main reason is that GPUs aren't really GPUs anymore: 50% or more of the die area goes to fixed-function matrix multiplication units and their dedicated storage. That just isn't general purpose anymore. FPGAs cannot rival this with their configurable DSP slices; they would need dedicated systolic blocks, which they aren't getting. The closest thing is the Versal ML tiles, and those are entire processors, not FPGA blocks. They have failed because they are impossible to program.
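As a rough illustration of the "dedicated systolic blocks" point, here is a minimal Python simulation of an output-stationary systolic array: each processing element (PE) holds one output accumulator, values of A flow rightward, values of B flow downward, and the inputs are skewed by one cycle per row or column so that matching operands meet in the right PE. The output-stationary dataflow and the name `systolic_matmul` are assumptions for illustration; real tensor cores and TPU matrix units may use different dataflows and are not modeled here.

```python
def systolic_matmul(A, B):
    """Cycle-by-cycle simulation of an output-stationary systolic array computing A @ B."""
    M, K = len(A), len(A[0])
    if len(B) != K:
        raise ValueError("inner dimensions must match")
    N = len(B[0])
    a_reg = [[0] * N for _ in range(M)]  # A operand each PE passes to its right neighbor
    b_reg = [[0] * N for _ in range(M)]  # B operand each PE passes to the PE below
    acc = [[0] * N for _ in range(M)]    # stationary partial sums, one per PE

    # PE(i, j) sees A[i][k] and B[k][j] at cycle t = i + j + k, so M + N + K - 2 cycles suffice.
    for t in range(M + N + K - 2):
        new_a = [[0] * N for _ in range(M)]
        new_b = [[0] * N for _ in range(M)]
        for i in range(M):
            for j in range(N):
                if j == 0:
                    k = t - i  # row i of A enters the array skewed by i cycles
                    new_a[i][j] = A[i][k] if 0 <= k < K else 0
                else:
                    new_a[i][j] = a_reg[i][j - 1]
                if i == 0:
                    k = t - j  # column j of B enters the array skewed by j cycles
                    new_b[i][j] = B[k][j] if 0 <= k < K else 0
                else:
                    new_b[i][j] = b_reg[i - 1][j]
                # One multiply-accumulate per PE per cycle.
                acc[i][j] += new_a[i][j] * new_b[i][j]
        a_reg, b_reg = new_a, new_b
    return acc


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

The efficiency of the hardware version comes from every PE doing one multiply-accumulate per cycle with only neighbor-to-neighbor wiring; a hard block gets that for free, while a fabric of DSP slices has to route and clock it through general-purpose interconnect.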
    • tucnak 23 minutes ago
      It all comes down to memory and fabric bandwidth. For example, the state-of-the-art developer-friendly (PCIe 5.0) FPGA platform is the Alveo V80, which rocks four 200G NICs. Basically, Alveo currently occupies a niche where it's the only platform on the market that allows programmable in-network compute. However, the bandwidth available lags behind even pathetic platforms like BlueField. Those in the know are aware of the challenges in actually saturating it for inference in practical designs. I think Xilinx is super well-positioned here, but without some solid hard IP it's still a far cry from purpose-built silicon.
      • mrinterweb 7 minutes ago
        As far as I understand, all the purpose-built inference silicon out there is kept in-house rather than sold to competitors: Google's TPU, Amazon's Inferentia (horrible name), Microsoft's Maia, Meta's MTIA. It seems that custom inference silicon is a huge part of the AI game. I doubt GPU-based inference will stay relevant or competitive for long.
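To put numbers on "it all comes down to bandwidth", a simple roofline model caps attainable throughput at min(peak compute, bandwidth × arithmetic intensity). The sketch below uses the four 200G ports mentioned above (4 × 200 Gb/s = 100 GB/s raw line rate, ignoring protocol overhead) and a hypothetical 100 TFLOP/s compute peak. The function names are illustrative, and treating the network line rate as the operand feed is an assumption about an in-network-compute design, not a measured V80 figure.

```python
def line_rate_bytes_per_s(ports: int, gbit_per_port: float) -> float:
    """Aggregate raw line rate in bytes/s (ignores encoding and protocol overhead)."""
    return ports * gbit_per_port * 1e9 / 8


def roofline(peak_flops: float, bw_bytes_per_s: float, flops_per_byte: float) -> float:
    """Attainable FLOP/s: limited either by compute or by how fast operands arrive."""
    return min(peak_flops, bw_bytes_per_s * flops_per_byte)


bw = line_rate_bytes_per_s(4, 200)  # four 200G ports -> 1e11 bytes/s (100 GB/s)
# Batch-1 decoding with 8-bit weights: each weight byte feeds one multiply-add (2 FLOPs).
perf = roofline(peak_flops=100e12, bw_bytes_per_s=bw, flops_per_byte=2.0)  # hypothetical peak
print(f"{perf / 1e9:.0f} GFLOP/s, {perf / 100e12:.1%} of peak")  # 200 GFLOP/s, 0.2% of peak
```

At low arithmetic intensity the compute peak is irrelevant; only raising bandwidth or reusing each byte more (for example, batching) moves the result, which is why hard memory and network IP matter more than extra DSP slices.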
  • hinkley 2 hours ago
    I think I could trust AI more if we used it to provide heuristics for expensive deterministic processes, sort of a cross between Bloom filters and speculative execution: estimate the odds that expensive operation 1 will indicate that expensive operation 2 needs to happen, then start operation 2 while we determine whether it's actually needed. If it's right 95% of the time, which is the sort of range AI can aspire to, that skips the high-latency task chaining 19 times out of 20, which would be pretty good.
    • hnuser123456 1 hour ago
      There are Bayesian neural networks that can apparently track probability rather than just, e.g., randomly selecting one output from the top-k based on probability, but I'm still reading up on them myself. It sounds like they're not normally combined with language models.
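For reference, the top-k sampling mentioned above works roughly like this: keep the k highest-scoring logits, renormalize them with a softmax, and draw one index. This is a minimal standard-library sketch; the name `top_k_sample` is hypothetical, temperature is omitted, and it says nothing about how Bayesian neural networks represent uncertainty.

```python
import math
import random


def top_k_sample(logits, k, rng=random):
    """Draw one index from the k largest logits, weighted by their softmax."""
    # Indices of the k highest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    # random.choices normalizes the weights itself.
    return rng.choices(top, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded so repeated runs give the same draws
print([top_k_sample([0.1, 2.0, 1.0, -3.0], k=2, rng=rng) for _ in range(10)])
```

With k=1 this reduces to greedy (argmax) decoding; the single draw carries no record of how confident the model was, which is the gap the comment is pointing at.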
    • rjsw 2 hours ago
      There have been comments that some leading AI researchers were switching away from working on language models to do stuff with "real world data".
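The 19-out-of-20 arithmetic in the speculative-execution comment can be made concrete with a small expected-latency model. It assumes (hypothetically) that operations 1 and 2 can run in parallel on spare capacity, that the predictor's accuracy is the same whether or not operation 2 is needed, and that a wrong speculative start costs wasted work but no extra latency. The function names are illustrative.

```python
def speculative_latency(t1: float, t2: float, p_needed: float, accuracy: float) -> float:
    """Expected latency when a predictor decides whether to start op 2 alongside op 1."""
    overlapped = max(t1, t2)  # predictor said "needed" and was right: ops run in parallel
    chained = t1 + t2         # predictor said "not needed" but it was: op 2 runs afterwards
    when_needed = accuracy * overlapped + (1 - accuracy) * chained
    # If op 2 isn't needed, latency is just op 1; a wrong speculative start only wastes work.
    return p_needed * when_needed + (1 - p_needed) * t1


def chained_latency(t1: float, t2: float, p_needed: float) -> float:
    """Baseline: always wait for op 1 before deciding whether to run op 2."""
    return p_needed * (t1 + t2) + (1 - p_needed) * t1


def wasted_fraction(p_needed: float, accuracy: float) -> float:
    """Fraction of requests where op 2 is started speculatively but turns out unneeded."""
    return (1 - p_needed) * (1 - accuracy)


# Hypothetical: two 100 ms operations, op 2 always needed, predictor right 95% of the time.
print(speculative_latency(100, 100, 1.0, 0.95))  # about 105 (ms)
print(chained_latency(100, 100, 1.0))            # 200.0 (ms)
```

With accuracy 1.0 the model reduces to max(t1, t2), and with accuracy 0 it degrades to the chained baseline, so the latency benefit scales directly with how often the predictor is right, while `wasted_fraction` tracks the extra work paid for it.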
  • babl-yc 31 minutes ago
    This is cool. I'm noticing a trend of "build a tiny version from the ground up to understand it", à la Karpathy's micrograd/minGPT. It seems like one of the best ways to learn.
  • aunty_helen 2 hours ago
    I think it's only a matter of time before we see ASIC vendors making TPU devices. The same thing happened with BTC: there was enough money there to spawn an industry. Nvidia's 70% margins are too hard to ignore. And if playing on the open market seems too rough, there's always acquisition potential, like what happened to Groq.
    • NitpickLawyer 1 hour ago
      Aren't high-end accelerators already closer to ASICs than to OG GPUs, though?
  • ph4evers 46 minutes ago
    Such a cool project! Next one is to run jaxprs via the driver?
  • fooblaster 1 hour ago
    Great! How do you program it?