AI inference

Affordable robotic systems often have access to inexpensive CPU resources. What they often lack is the budget for a large, expensive, power-hungry GPU. That motivated us to build SARA (Sharded Activation Reduction Architecture): a distributed inference path that uses multiple CPUs to reduce single-stream token latency.

SARA: Sharded Activation Reduction Architecture