Pull requests: vllm-project/vllm
Revert "[Core] Remove unnecessary copies in flash attn backend" (#5478, opened Jun 13, 2024 by Yard1)
[CI/Build][REDO] Add is_quant_method_supported to control quantization test configurations (#5466, opened Jun 12, 2024 by mgoin)
[Usage] Clarify and Update Argument for Specifying Model Revisions (#5453, opened Jun 12, 2024 by Etelis)
[Hardware][Intel] Support CPU inference with AVX2 ISA (#5452, opened Jun 12, 2024 by DamonFool)
[Hardware][Intel] Generate custom activation ops using torch.compile for CPU backend (#5446, opened Jun 12, 2024 by bigPYJ1151; label: x86 CPU)
[Misc] Rs/compressed tensors cleanup (#5432, opened Jun 12, 2024 by robertgshaw2-neuralmagic)
[Bugfix] Fix lora_dtype value type in arg_utils.py - part 2 (#5428, opened Jun 11, 2024 by c3-ali)
[Hardware][AMD][CI/Build][Doc][Kernel] Upgrade to ROCm 6.1, Dockerfile improvements, Paged Attention tuning (#5422, opened Jun 11, 2024 by mawong-amd; label: rocm)
[WIP] [Speculative Decoding] Support draft model on different tensor-parallel size than target model (#5414, opened Jun 11, 2024 by wooyeonlee0; draft, 3 of 4 tasks done)
[Bugfix] Fix a bug when using FlashInfer as the backend in vLLM Speculative Decoding (#5412, opened Jun 11, 2024 by bong-furiosa)
[Core] Refactor Worker and ModelRunner to consolidate control plane communication (#5408, opened Jun 11, 2024 by stephanie-wang; label: action-required, 2 tasks done)