Issues: triton-inference-server/server
- #7347 · Regression from 23.07 to 24.05 on model count lifecycle/restarts · opened Jun 12, 2024 by sboudouk
- #7346 · The trt llm container does not have the other backends [question] · opened Jun 12, 2024 by MatthieuToulemont
- #7339 · could you give some examples about ragged input config for tensorrt backend · opened Jun 11, 2024 by wanghuihhh
- #7337 · Triton server crash when running a large model with an ONNX/CPU backend [investigating] · opened Jun 10, 2024 by LucasAudebert
- #7333 · Does Triton Server support Dynamic Request Batching for models which has sparse tensors as inputs [enhancement, investigating] · opened Jun 7, 2024 by MorrisMLZ
- #7330 · Segmentation fault (core dumped) - Server version 2.46.0 [question] · opened Jun 6, 2024 by rahchuenmonroe
- #7324 · CUDA runtime API error raised when using only cpu on Mac M3 [investigating] · opened Jun 5, 2024 by SunXuan90
- #7319 · Triton Server 24.05 can't initialize CUDA drivers if host system has installed Nvidia driver 555.85 · opened Jun 4, 2024 by romanvelichkin
- #7318 · Uneven QPS leads to low throughput and high latency as well as low GPU utilization [question] · opened Jun 4, 2024 by SunnyGhj
- #7316 · When the request is large, the Triton server has a very high TTFT. [investigating] · opened Jun 4, 2024 by Godlovecui
- #7315 · Memory over 100% with decoupled dali video model [investigating] · opened Jun 3, 2024 by wq9
- #7314 · Single docker layer is too large [investigating] · opened Jun 3, 2024 by ShuaiShao93
- #7313 · Low QPS with momentary traffic surges cause significant increases in inference TP99 latency. [question] · opened Jun 3, 2024 by a1342772
- #7308 · triton malloc fail [question] · opened May 31, 2024 by MouseSun846
- #7307 · unexpected datatype TYPE_INT64 for inference input, expecting TYPE_INT32 [question] · opened May 31, 2024 by CallmeZhangChenchen
- #7305 · Add TT-Metalium as a backend [enhancement] · opened May 30, 2024 by jvasilje
- #7303 · Why is my model in ensemble receiving out-of-order input [question] · opened May 30, 2024 by Joenhle
- #7296 · ONNX backend with TensorRT optimizer sometimes fails to start · opened May 29, 2024 by ShuaiShao93
- #7287 · Support histogram custom metric in Python backend [enhancement] · opened May 28, 2024 by ShuaiShao93