Signed-off-by: kerthcet <kerthcet@gmail.com>
A draft; if you're interested, feel free to take a look. @googs1025 @nayihz
-->

- e2e tests to make sure the lora service will run successfully
- e2e tests to make sure the lora autoscaling works as expected, both scaling up and down
Most e2e test cases may take a long time because they have to download models/images, so we should design the test plan carefully.
Yes, but we need to make sure the feature works as expected, so the tests are still needed.
/hold
proposal will be implemented, this is the place to discuss them.
-->

### The LoRA Autoscaler
I don't know much about LoRA. Do we have a simple diagram to describe different components? This may help us understand it more quickly. 🤔
- Replica 7: lora-1

Make sure **at least one lora exists** in replicas, to avoid lora loading overhead in runtime.
- Once the lora model loaded successfully, the gateway will update the route table for the lora requests
Does gateway refer to another component or something else?
Yes, we'll introduce the envoy gateway for smart routing. I may need to implement the gateway first.
Revisit this after #404, since it's a blocking feature.
What this PR does / why we need it
Support dense deployment for LoRA models
Which issue(s) this PR fixes
xref: #287
Special notes for your reviewer
Does this PR introduce a user-facing change?