Research how to deploy your own LLM model instead of using a vendor one.
In the first review of the problem, I’ve identified the following points to investigate:
Here I will call them runtimes, because that is how I understand an LLM can be run, but in most cases they are called “libraries”.
There are 3 common ones:
Among others like LocalAI, ktransformers (https://github.com/kvcache-ai/ktransformers), and so on.
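
To make the idea concrete: many of these runtimes (LocalAI, for example) can expose an OpenAI-compatible HTTP API, so swapping a vendor model for a self-hosted one is largely a matter of pointing the client at a local endpoint. Below is a minimal sketch, assuming a runtime is already serving a model on localhost; the base URL, API key, and model name are placeholders, not tied to any specific runtime:

```python
# Minimal sketch: calling a locally deployed LLM instead of a vendor API.
# Assumes a runtime is already serving an OpenAI-compatible endpoint on
# localhost; the base_url, api_key, and model name below are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # local runtime, not a vendor endpoint
    api_key="not-needed",                 # most local runtimes ignore the key
)

response = client.chat.completions.create(
    model="my-local-model",               # whatever name the runtime registered
    messages=[{"role": "user", "content": "Hello from a self-hosted LLM"}],
)
print(response.choices[0].message.content)
```

Because the interface stays the same, the rest of the application code does not need to care whether the model is vendor-hosted or self-hosted.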
Then, these runtimes share a number of common capabilities: