Research how to deploy your own LLM model instead of using a vendor one.
In the first review of the problem, I’ve identified the following points to investigate:
Here I will call them runtimes, because that is how I understand an LLM can be run, but in most cases they are called “libraries”.
There are 3 common ones:
Among others like LocalAI, ktransformers (https://github.com/kvcache-ai/ktransformers), and so on.
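
To make the idea concrete: many of these runtimes (LocalAI, for example) can expose an OpenAI-compatible HTTP API, so swapping a vendor model for a self-hosted one is largely a matter of pointing the client at a local endpoint. Below is a minimal sketch, assuming a runtime is already serving a model on localhost; the base URL, API key, and model name are placeholders, not tied to any specific runtime:

```python
# Minimal sketch: calling a locally deployed LLM instead of a vendor API.
# Assumes a runtime is already serving an OpenAI-compatible endpoint on
# localhost; the base_url, api_key, and model name below are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # local runtime, not a vendor endpoint
    api_key="not-needed",                 # most local runtimes ignore the key
)

response = client.chat.completions.create(
    model="my-local-model",               # whatever name the runtime registered
    messages=[{"role": "user", "content": "Hello from a self-hosted LLM"}],
)
print(response.choices[0].message.content)
```

Because the interface stays the same, the rest of the application code does not need to care whether the model is vendor-hosted or self-hosted.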
Then, these runtimes share a number of common capabilities: