vllm.v1.attention.backends.mla.flashinfer_fa2_mla

FlashInfer FA2 MLA Backend.

Uses the FlashInfer BatchMLAPagedAttentionWrapper with the FA2 backend for bf16 MLA decode. The backend drives the wrapper through its plan()/run() API, passing page indices in CSR format, and supports returning the log-sum-exp (LSE) values needed for DCP.
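As an illustration of the CSR layout mentioned above, the sketch below builds a flat page-index array plus per-request offsets from a batch of variable-length page lists. The names `kv_indptr`/`kv_indices` follow FlashInfer's naming convention; this is a standalone example, not vLLM's actual backend code.

```python
def build_csr_page_table(pages_per_request):
    """Build CSR-format paged-KV indices for a batch of requests.

    pages_per_request: list of lists of page ids, one inner list per
    request. CSR layout stores all page ids in one flat array
    (kv_indices) plus per-request offsets (kv_indptr), so request i's
    pages are kv_indices[kv_indptr[i]:kv_indptr[i + 1]].
    """
    kv_indptr = [0]
    kv_indices = []
    for pages in pages_per_request:
        kv_indices.extend(pages)
        kv_indptr.append(len(kv_indices))
    return kv_indptr, kv_indices


# Example: request 0 occupies pages [3, 7], request 1 occupies page [1].
kv_indptr, kv_indices = build_csr_page_table([[3, 7], [1]])
# kv_indptr == [0, 2, 3]; kv_indices == [3, 7, 1]
```

Arrays of this shape are what a plan() call would consume before the corresponding run() executes the batched decode.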