SIMD-accelerated computer vision on a $2 microcontroller

(shraiwi.github.io)

86 points | by shraiwi 30 days ago

3 comments

  • evanjrowley 30 days ago
    A comparable board is the ESP32-CAM, which is supported by this really practical computer vision project: https://github.com/jomjol/AI-on-the-edge-device?tab=readme-o...
    • maven29 29 days ago
      There is an ESP32-S3 version of this camera breakout board, which is presumably what OP used for prototyping.

      The S3 variant easily justifies the slight additional cost: with SIMD instructions and an FPU, it can be an order of magnitude or more faster.

      https://github.com/espressif/esp-dl/tree/master/examples/fac...

  • westurner 30 days ago
    > As I've been really interested in computer vision lately, I decided on writing a SIMD-accelerated implementation of the FAST feature detector for the ESP32-S3 [...]

    > In the end, I was able to improve the throughput of the FAST feature detector by about 220%, from 5.1MP/s to 11.2MP/s in my testing. This is well within the acceptable range of performance for realtime computer vision tasks, enabling the ESP32-S3 to easily process a 30fps VGA stream.
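    A quick sanity check on the quoted figures: 11.2 / 5.1 ≈ 2.2, so the new throughput is about 220% of the original (roughly a 120% increase), and a 640×480 stream at 30 fps needs 640 × 480 × 30 = 9.216 MP/s, which fits under 11.2 MP/s but not under 5.1 MP/s. The helper names below are hypothetical, chosen for illustration rather than taken from the article.

```c
/* Hypothetical helpers for checking a vision pipeline's throughput budget. */

/* Megapixels per second required to process width x height frames at fps. */
double required_mpps(int width, int height, double fps)
{
    return (double)width * (double)height * fps / 1e6;
}

/* Highest frame rate that a throughput of mpps (MP/s) sustains at width x height. */
double max_fps(double mpps, int width, int height)
{
    return mpps * 1e6 / ((double)width * (double)height);
}

/* Ratio of new to old throughput: 2.0 means twice as fast (a 100% increase). */
double speedup(double before, double after)
{
    return after / before;
}
```

    With the quoted numbers, max_fps(11.2, 640, 480) is about 36 fps and max_fps(5.1, 640, 480) is about 17 fps, which is why the speedup is what makes 30 fps VGA feasible.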

    What are some use cases for FAST?

    Features from accelerated segment test: https://en.wikipedia.org/wiki/Features_from_accelerated_segm...

    Is there TPU-like functionality in anything in this price range of chips yet?

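    Neon is an optional SIMD instruction set extension for ARMv7 and ARMv8, so the Pi 2 and later, as well as the Pi Zero 2 W, have Neon; the original Pi Zero (like the Pi 1) uses an ARMv6 core without Neon, which has only the narrower 32-bit ARMv6 SIMD instructions.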

    The Jetson Orin Nano has 40 TOPS, which is sufficient for Copilot+ AFAIU. "A PCIe Coral TPU Finally Works on Raspberry Pi 5" https://news.ycombinator.com/item?id=38310063

    From https://phys.org/news/2024-06-infrared-visible-device-2d-mat... :

    > Using this method, they were able to up-convert infrared light of wavelength around 1550 nm to 622 nm visible light. The output light wave can be detected using traditional silicon-based cameras.

    > "This process is coherent—the properties of the input beam are preserved at the output. This means that if one imprints a particular pattern in the input infrared frequency, it automatically gets transferred to the new output frequency," explains Varun Raghunathan, Associate Professor in the Department of Electrical Communication Engineering (ECE) and corresponding author of the study published in Laser & Photonics Reviews.

    "Show HN: PicoVGA Library – VGA/TV Display on Raspberry Pi Pico" https://news.ycombinator.com/item?id=35117847#35120403 https://news.ycombinator.com/item?id=40275530

    "Designing a SIMD Algorithm from Scratch" https://news.ycombinator.com/item?id=38450374

    • shraiwi 30 days ago
      Thanks for reading!

      > What are some use cases for FAST?

      The FAST feature detector is an algorithm for finding regions of an image that are visually distinctive, which can be used as a first step in motion tracking and SLAM (simultaneous localization and mapping) algorithms typically seen in XR, robotics, etc.
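      To make the segment test concrete, here is a minimal scalar sketch in C of the standard FAST criterion (not the SIMD implementation from the article): a pixel counts as a corner if at least n contiguous pixels on the 16-pixel circle of radius 3 around it are all brighter than the center plus a threshold t, or all darker than the center minus t. The function name fast_is_corner and the parameters t and n (commonly 9 or 12) are assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets (dx, dy) of the 16 pixels on a Bresenham circle of radius 3,
   listed in order around the circle. */
static const int FAST_CIRCLE[16][2] = {
    {0, -3}, {1, -3}, {2, -2}, {3, -1}, {3, 0}, {3, 1}, {2, 2}, {1, 3},
    {0, 3}, {-1, 3}, {-2, 2}, {-3, 1}, {-3, 0}, {-3, -1}, {-2, -2}, {-1, -3}
};

/* Hypothetical helper: returns true if pixel (x, y) of an 8-bit grayscale
   image passes the FAST-n segment test with threshold t (n in 9..16).
   The caller must keep (x, y) at least 3 pixels away from every border. */
bool fast_is_corner(const uint8_t *img, size_t stride, int x, int y,
                    int t, int n)
{
    int center = img[(size_t)y * stride + (size_t)x];
    int state[16]; /* +1 brighter, -1 darker, 0 similar */

    for (int i = 0; i < 16; i++) {
        int v = img[(size_t)(y + FAST_CIRCLE[i][1]) * stride
                    + (size_t)(x + FAST_CIRCLE[i][0])];
        state[i] = (v > center + t) ? 1 : (v < center - t) ? -1 : 0;
    }

    /* Walk the circle twice so that runs wrapping past index 15 are counted. */
    for (int sign = -1; sign <= 1; sign += 2) {
        int run = 0;
        for (int i = 0; i < 32; i++) {
            if (state[i % 16] == sign) {
                if (++run >= n)
                    return true;
            } else {
                run = 0;
            }
        }
    }
    return false;
}
```

      Practical detectors usually add a quick rejection test on a few compass pixels and non-maximum suppression on top of this; both are omitted here to keep the core test visible.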

      > Is there TPU-like functionality in anything in this price range of chips yet?

      I think that in the case of the ESP32-S3, its SIMD instructions are designed to accelerate the inference of quantized AI models (see: https://github.com/espressif/esp-dl), and also some signal processing like FFTs. I guess you could call the SIMD instructions TPU-like, in the sense that the chip has specific instructions that facilitate ML inference (EE.VRELU.Sx performs the ReLU operation). However, these instructions still consume CPU time, whereas a TPU is typically a separate processing core that operates asynchronously. I’d say this is closer to ARM NEON.
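      As an illustration of the kind of work such an instruction replaces, the sketch below is a plain scalar ReLU over int8 activations, plus the arithmetic for how many vector operations would cover the same data. It assumes 128-bit vector registers (16 int8 lanes per instruction) and does not model the exact semantics of EE.VRELU.Sx; relu_s8 and vector_ops_needed are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: one 128-bit vector register holds 16 int8 lanes. */
#define LANES 16

/* Plain ReLU on int8 activations: x = max(x, 0), one element per iteration. */
void relu_s8(int8_t *x, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (x[i] < 0)
            x[i] = 0;
}

/* Vector operations needed to cover n elements if each op handles LANES lanes. */
size_t vector_ops_needed(size_t n)
{
    return (n + LANES - 1) / LANES;
}
```

      The scalar loop does one compare per element, while a single vector instruction would handle 16 lanes at a time, which is where the inference speedup comes from; it is still executed by the CPU core itself, unlike a separate TPU.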

      • kylixz 29 days ago
        Interested in doing more of this type of work, optimizing a SLAM/factor-graph pipeline?

        Email in bio and would love to chat!

  • restricted_ptr 30 days ago
    I wonder if the ESP32 has VLIW slots that would allow tighter instruction packing?
    • duskwuff 29 days ago
      Neither Xtensa nor RISC-V are VLIW architectures.