Pull requests: vllm-project/vllm
Revert "[Core] Remove unnecessary copies in flash attn backend" (#5478, opened Jun 13, 2024 by Yard1)
[CI/Build][REDO] Add is_quant_method_supported to control quantization test configurations (#5466, opened Jun 12, 2024 by mgoin)
[Usage] Clarify and Update Argument for Specifying Model Revisions (#5453, opened Jun 12, 2024 by Etelis)
[Hardware][Intel] Support CPU inference with AVX2 ISA (#5452, opened Jun 12, 2024 by DamonFool)
[Hardware][Intel] Generate custom activation ops using torch.compile for CPU backend (#5446, opened Jun 12, 2024 by bigPYJ1151; label: x86 CPU)
[Misc] Rs/compressed tensors cleanup (#5432, opened Jun 12, 2024 by robertgshaw2-neuralmagic)
[Bugfix] Fix lora_dtype value type in arg_utils.py - part 2 (#5428, opened Jun 11, 2024 by c3-ali)
[Hardware][AMD][CI/Build][Doc][Kernel] Upgrade to ROCm 6.1, Dockerfile improvements, Paged Attention tuning (#5422, opened Jun 11, 2024 by mawong-amd; label: rocm)
[WIP] [Speculative Decoding] Support draft model on different tensor-parallel size than target model (#5414, opened Jun 11, 2024 by wooyeonlee0; draft, 3 of 4 tasks done)
[Bugfix] Fix a bug when using FlashInfer as the backend in vLLM Speculative Decoding (#5412, opened Jun 11, 2024 by bong-furiosa)
[Core] Refactor Worker and ModelRunner to consolidate control plane communication (#5408, opened Jun 11, 2024 by stephanie-wang; label: action-required, 2 tasks done)