Ollama – What is it?
Ollama is a tool for running, managing, and interacting with large language models (LLMs) on your local machine. It provides an easy way to download, run, and customize open-source models like Llama, Mistral, and Gemma without requiring cloud-based APIs.
Key Features:
- Runs Locally: No need for cloud services—everything runs on your computer.
- Supports Multiple Models: Works with models like Meta’s Llama, Mistral, and others.
- Simple Interface: You can interact with models via a CLI or programmatically in Python/Node.js (see the sketch after this list).
- Customization: Lets you create customized model variants (system prompt, parameters) from a Modelfile.
- Efficient Execution: Optimized for fast performance on local hardware.
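For instance, talking to a model from Python can be as short as the sketch below. This is a minimal sketch, assuming the official Python client is installed (pip install ollama) and the gemma3:1b model used later in this post has already been pulled.

import ollama

# Send one chat message to a locally running model via the official Python client.
response = ollama.chat(
    model="gemma3:1b",
    messages=[{"role": "user", "content": "What is the capital of Australia?"}],
)
print(response["message"]["content"])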
How to get started?
Download the Ollama tool by navigating to the official website, ollama.com. The installation is straightforward, just like any other software tool.
Once installed, you are ready to run LLMs locally.
Download and run a model using the command below:
ollama run <model>:<parameter-size>
Ex: ollama run gemma3:1b
You can find the list of available models and their memory requirements in the model library at ollama.com/library.
How to use?
Once a model is running, you can interact with Ollama through the command line or through its APIs.
In the command line, you interact with the LLM by typing prompts directly. Sample prompt: “What is the capital of Australia?”
You can also set a system message or show the current settings; the available options can be listed by typing “/?”.
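For example, an interactive session might look like the sketch below; the model's reply is illustrative and will vary.

>>> /set system "You are a concise geography tutor."
Set system message.
>>> What is the capital of Australia?
The capital of Australia is Canberra.
>>> /bye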
The other way to interact is through the HTTP APIs. Ollama listens on port 11434 by default, and you can test the APIs using a tool such as Postman.
Below are some of the APIs to try.
Generate API:
POST http://localhost:11434/api/generate
Content-Type: application/json
{
  "model": "gemma3:1b",
  "prompt": "What is the capital of France?",
  "stream": false
}
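If you would rather script these calls than use Postman, the same request can be sent from Python. A minimal sketch, assuming the requests library is installed (pip install requests):

import requests

# Single-prompt completion against the local Ollama server (default port 11434).
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "gemma3:1b",
        "prompt": "What is the capital of France?",
        "stream": False,  # return one JSON object instead of a token stream
    },
)
print(resp.json()["response"])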
Chat completion API:
POST http://localhost:11434/api/chat
Content-Type: application/json
{
  "model": "gemma3:1b",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me a joke"}
  ]
}
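The chat endpoint becomes more useful over multiple turns: append each assistant reply to the messages array and send the full history back with the next request. A sketch of that loop, under the same assumptions as the previous snippet:

import requests

URL = "http://localhost:11434/api/chat"
messages = [{"role": "system", "content": "You are a helpful assistant."}]

for prompt in ["Tell me a joke", "Now explain why it is funny"]:
    messages.append({"role": "user", "content": prompt})
    resp = requests.post(URL, json={"model": "gemma3:1b", "messages": messages, "stream": False})
    reply = resp.json()["message"]  # a single assistant message, not an array
    messages.append(reply)          # resend it next turn so the model keeps context
    print(reply["content"])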
Let's take a quick look at the differences between the two APIs.
| /api/generate | /api/chat |
| --- | --- |
| Used for a single prompt | Used for multi-turn interactions |
| Request has a single “prompt” field | Request has an array of “messages” |
| Doesn't maintain conversation context across requests | Previous messages can be resent to maintain context in subsequent requests |
| Response contains a single “response” string, plus token counts | Response contains a “message” object, plus token counts |
| Use case: one-off text generation | Use case: chatbot |
Other APIs to try include the following:
GET /api/tags: Lists the installed models
POST /api/pull: Pulls and installs a model
{ "model": "gemma3:1b" }
POST /api/create: Creates a custom model from a Modelfile
{
  "name": "custom-mistral",
  "modelfile": "FROM mistral\nPARAMETER temperature 0.7\n"
}
POST /api/embeddings: Generates embeddings
{
  "model": "mistral",
  "prompt": "Generate embeddings for this text"
}
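Embeddings turn text into vectors that can be compared numerically, which is the building block for semantic search. The sketch below, under the same assumptions as the earlier Python snippets (requests installed, mistral pulled), embeds two sentences and measures their cosine similarity.

import math
import requests

def embed(text):
    # Request an embedding vector for the given text from the local Ollama server.
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "mistral", "prompt": text},
    )
    return resp.json()["embedding"]

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

v1 = embed("Canberra is the capital of Australia.")
v2 = embed("What is the capital city of Australia?")
print(f"similarity: {cosine(v1, v2):.3f}")  # values closer to 1.0 mean more similar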
The Postman collection for these APIs can be found in my GitHub repo at https://github.com/dcurioustech/ollama-local.