LiME-TM: A Lightning-fast and Memory-Efficient ML Framework
for On-Device Training on MCUs

Han Wu, Shengyu Duan and Tousif

EMG and Why a Tsetlin Machine (TM)?

Vanilla TM: Memory Consumption

  • 100 clauses
  • 10 classes
  • 28 x 28 pixels
  • 8 bits

100 clauses x 10 classes x 28 x 28 pixels x 8 bits x 2 literals (each bit and its negation)
= 12,544,000 automata, i.e. 12,544,000 bytes (~12 MB) at 1 byte per automaton
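
The estimate above can be reproduced with a few lines of Python. This is only a sketch: the product of the four sizes in the bullet list is 6,272,000, so the function assumes two automata per bit (one for the bit, one for its negation) and one byte of state per automaton, which together give the stated 12,544,000 bytes. The function name and its parameters are illustrative and not part of LiME-TM.

```python
def vanilla_tm_bytes(clauses, classes, pixels, bits_per_pixel,
                     literals_per_bit=2, bytes_per_automaton=1):
    """Estimate the state memory of an unoptimised Tsetlin Machine.

    literals_per_bit and bytes_per_automaton are assumptions chosen so
    that the slide's sizes reproduce its 12,544,000-byte total.
    """
    automata = clauses * classes * pixels * bits_per_pixel * literals_per_bit
    return automata * bytes_per_automaton


# Sizes taken from the slide: 100 clauses, 10 classes, 28 x 28 pixels, 8 bits.
total = vanilla_tm_bytes(clauses=100, classes=10, pixels=28 * 28, bits_per_pixel=8)
print(f"{total:,} bytes")  # 12,544,000 bytes
```

Setting `literals_per_bit=1` yields 6,272,000, the value obtained from the four listed sizes alone.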

  • On-Device Training: 20 KB SRAM
  • On-Device Inference: 2 KB SRAM

Vanilla TM (12 MB) --> LiME-TM Pruning (ESP32 520 KB)

LiME-TM: Pruning (ESP32)

Vanilla TM (12 MB) --> LiME-TM Pruning (ESP32 520 KB) --> LiME-TM Compiler (STM32F4 192 KB)

LiME-TM: Compiler (STM32F4)

Vanilla TM (12 MB) --> LiME-TM Pruning (ESP32 520 KB) --> LiME-TM Compiler (STM32F4 192 KB) --> LiME-TM Compression (STM32L4 128 KB)

LiME-TM: Compression (STM32L4)

Vanilla TM (12 MB) --> LiME-TM Pruning (ESP32 520 KB) --> LiME-TM Compiler (STM32F4 192 KB) --> LiME-TM Compression (STM32L4 128 KB) --> LiME-TM Bare-metal (STM32F103 20 KB)

LiME-TM: Bare-metal (STM32F103)

Vanilla TM (12 MB) --> LiME-TM Pruning (ESP32 520 KB) --> LiME-TM Compiler (STM32F4 192 KB) --> LiME-TM Compression (STM32L4 128 KB) --> LiME-TM Bare-metal (STM32F103 20 KB) --> LiME-TM Inference-Only (Arduino 2 KB)
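
To put the chain above in numbers, the sketch below computes how much smaller than the vanilla footprint a model must be to fit each board. It assumes 1 KB = 1024 bytes and that the whole SRAM is available to the model, so the factors are lower bounds; a real firmware image also needs room for stack and buffers. The names `STAGES` and `min_reduction` are illustrative only.

```python
VANILLA_BYTES = 12_544_000  # vanilla TM footprint stated on the slide

# (stage, board, SRAM in KB) exactly as listed in the chain
STAGES = [
    ("Pruning", "ESP32", 520),
    ("Compiler", "STM32F4", 192),
    ("Compression", "STM32L4", 128),
    ("Bare-metal", "STM32F103", 20),
    ("Inference-Only", "Arduino", 2),
]


def min_reduction(model_bytes, sram_kb):
    """Smallest shrink factor that fits model_bytes into sram_kb KB of SRAM."""
    return model_bytes / (sram_kb * 1024)


factors = {stage: min_reduction(VANILLA_BYTES, kb) for stage, _, kb in STAGES}

for stage, board, kb in STAGES:
    print(f"{stage:<15}{board:<10}{kb:>4} KB  at least {factors[stage]:,.1f}x smaller")
```

Under these assumptions, the 20 KB training target calls for a reduction of about 612x and the 2 KB inference target about 6,125x.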

LiME-TM: Inference-Only (Arduino)

Thanks

  Source Code