
Code for the paper: [ICLR 2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference

Stars · 169
Language · Python
License · Apache-2.0
