IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse

(github.com)

3 points | by teleforce 5 hours ago

1 comment