Ollama – What is it?
Ollama is a tool for running, managing, and interacting with large language models (LLMs) on your local machine. It provides an easy way to download, run, and customize open-source models like Llama, Mistral, and Gemma without requiring cloud-based APIs.
Key Features:
- Runs Locally: No need for cloud services—everything runs on your computer.
- Supports Multiple Models: Works with models like Meta’s Llama, Mistral, and others.
- Simple Interface: You can interact with models via a CLI or programmatically in Python/Node.js (see the sketch after this list).
- Customization: Lets you create customized model variants (system prompt, parameters) from a Modelfile.
- Efficient Execution: Optimized for fast performance on local hardware.
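For instance, talking to a model from Python can be as short as the sketch below. This is a minimal sketch, assuming the official Python client is installed (pip install ollama) and the gemma3:1b model used later in this post has already been pulled.

import ollama

# Send one chat message to a locally running model via the official Python client.
response = ollama.chat(
    model="gemma3:1b",
    messages=[{"role": "user", "content": "What is the capital of Australia?"}],
)
print(response["message"]["content"])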
How to get started?
Download the Ollama tool by navigating to the official website, ollama.com. The installation is straightforward, just like any other software tool.
Once installed, you are ready to run LLMs locally.
Download and run a model using the command below:
ollama run <model>:<parameter-size>
Ex: ollama run gemma3:1b
You can find the list of available models and their memory requirements in the model library at ollama.com/library.
How to use?
Once a model is running, you can interact with Ollama through the command line or through its APIs.
In the command line, you interact with the LLM by typing prompts directly. Sample prompt: “What is the capital of Australia?”
You can also set a system message or show the current settings; the available options can be listed by typing “/?”.
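For example, an interactive session might look like the sketch below; the model's reply is illustrative and will vary.

>>> /set system "You are a concise geography tutor."
Set system message.
>>> What is the capital of Australia?
The capital of Australia is Canberra.
>>> /bye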
The other way to interact is through the HTTP APIs. Ollama listens on port 11434 by default, and you can test the APIs using a tool such as Postman.
Below are some of the APIs to try.
Generate API:
POST http://localhost:11434/api/generate
Content-Type: application/json
{
  "model": "gemma3:1b",
  "prompt": "What is the capital of France?",
  "stream": false
}
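If you would rather script these calls than use Postman, the same request can be sent from Python. A minimal sketch, assuming the requests library is installed (pip install requests):

import requests

# Single-prompt completion against the local Ollama server (default port 11434).
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "gemma3:1b",
        "prompt": "What is the capital of France?",
        "stream": False,  # return one JSON object instead of a token stream
    },
)
print(resp.json()["response"])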
Chat completion API:
POST http://localhost:11434/api/chat
Content-Type: application/json
{
  "model": "gemma3:1b",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me a joke"}
  ]
}
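The chat endpoint becomes more useful over multiple turns: append each assistant reply to the messages array and send the full history back with the next request. A sketch of that loop, under the same assumptions as the previous snippet:

import requests

URL = "http://localhost:11434/api/chat"
messages = [{"role": "system", "content": "You are a helpful assistant."}]

for prompt in ["Tell me a joke", "Now explain why it is funny"]:
    messages.append({"role": "user", "content": prompt})
    resp = requests.post(URL, json={"model": "gemma3:1b", "messages": messages, "stream": False})
    reply = resp.json()["message"]  # a single assistant message, not an array
    messages.append(reply)          # resend it next turn so the model keeps context
    print(reply["content"])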
Let's take a quick look at the differences between the two APIs.
| /api/generate | /api/chat |
| --- | --- |
| Used for a single prompt | Used for multi-turn interactions |
| Request has a single “prompt” field | Request has an array of “messages” |
| Doesn't maintain conversation context across requests | Previous messages can be resent to maintain context in subsequent requests |
| Response contains a single “response” string, plus token counts | Response contains a “message” object, plus token counts |
| Use case: one-off text generation | Use case: chatbot |
Other APIs to try include the following:
GET /api/tags: Lists the installed models
POST /api/pull: Pulls and installs a model
{ "model": "gemma3:1b" }
POST /api/create: Creates a custom model from a Modelfile
{
  "name": "custom-mistral",
  "modelfile": "FROM mistral\nPARAMETER temperature 0.7\n"
}
POST /api/embeddings: Generates embeddings
{
  "model": "mistral",
  "prompt": "Generate embeddings for this text"
}
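Embeddings turn text into vectors that can be compared numerically, which is the building block for semantic search. The sketch below, under the same assumptions as the earlier Python snippets (requests installed, mistral pulled), embeds two sentences and measures their cosine similarity.

import math
import requests

def embed(text):
    # Request an embedding vector for the given text from the local Ollama server.
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "mistral", "prompt": text},
    )
    return resp.json()["embedding"]

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

v1 = embed("Canberra is the capital of Australia.")
v2 = embed("What is the capital city of Australia?")
print(f"similarity: {cosine(v1, v2):.3f}")  # values closer to 1.0 mean more similar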
The Postman collection for these APIs can be found in my GitHub repo at https://github.com/dcurioustech/ollama-local.