Abstract
The paper focuses on edge-friendly transformer design, including attention compression and memory-aware scheduling for constrained mobile hardware.
The Mesklin Research Labs team explores memory-efficient attention mechanisms for mobile hardware.
Authors
A. Chen, J. Smith
Version
v1.0.0
Status
Published
Summary
The paper focuses on edge-friendly transformer design, including attention compression and memory-aware scheduling for constrained mobile hardware.