Tag: Ollama

  • Ollama – The power of local LLMs

    Ollama – What is it?

    Ollama is a tool for running, managing, and interacting with large language models (LLMs) on your local machine. It provides an easy way to download, run, and customize open-source models like Llama, Mistral, and Gemma without relying on cloud-based APIs.

    Key Features:

    • Runs Locally: No need for cloud services—everything runs on your computer.
    • Supports Multiple Models: Works with models like Meta’s Llama, Mistral, and others.
    • Simple Interface: You can interact with models via a CLI or programmatically in Python/Node.js.
    • Customization: Allows you to customize models (system prompt, parameters, templates) via Modelfiles and to import your own fine-tuned weights.
    • Efficient Execution: Optimized for fast performance on local hardware.

    How to get started?

    Download the Ollama tool from the official website, ollama.com. The installation is straightforward, just like any other software tool.

    Once installed, you are ready to run LLMs locally.

    Download and run a model using the following command:

    ollama run <Model:Parameter>
    
    Ex: ollama run gemma3:1b

    You can find the list of available models and their memory requirements in the model library on ollama.com.

    How to use?

    Once the model is running, you can interact with Ollama through the command line or through its REST API.

    On the command line, you can interact with the LLM by typing prompts, for example: “What is the capital of Australia?”

    You can also set a system message or show the current settings; the available options can be listed by typing “/?”.

    The other way to interact is through the REST API. By default, Ollama listens on port 11434, and you can test the endpoints with a tool such as Postman.
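
    If you prefer a quick script over Postman, the snippet below is a minimal sketch (assuming Ollama is running locally on the default port and that Python's requests library is installed) that confirms the server is reachable by listing the installed models via /api/tags, which is covered later in this post.

    import requests

    # List the locally installed models; an HTTP 200 here confirms the Ollama server is up.
    resp = requests.get("http://localhost:11434/api/tags", timeout=5)
    resp.raise_for_status()
    for model in resp.json().get("models", []):
        print(model["name"])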

    Below are some of the APIs to try

    Generate API:

    POST http://localhost:11434/api/generate
    Content-Type: application/json
    
    {
        "model": "gemma3:1b",
        "prompt": "What is the capital of France?",
        "stream": false
    }
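
    The same request can be made from Python. The snippet below is a rough sketch using the requests library; it assumes the gemma3:1b model has already been pulled and that Ollama is listening on the default port.

    import requests

    payload = {
        "model": "gemma3:1b",
        "prompt": "What is the capital of France?",
        "stream": False,  # return a single JSON object instead of a stream of tokens
    }
    resp = requests.post("http://localhost:11434/api/generate", json=payload, timeout=120)
    resp.raise_for_status()
    print(resp.json()["response"])  # the generated text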

    Chat completion API:

    POST http://localhost:11434/api/chat
    Content-Type: application/json
    
    {
      "model": "gemma3:1b",
      "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a joke"}
      ]
    }
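
    Here is the equivalent call from Python, again a rough sketch under the same assumptions as above; with streaming disabled, the reply comes back in a single “message” object.

    import requests

    payload = {
        "model": "gemma3:1b",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Tell me a joke"},
        ],
        "stream": False,  # ask for one JSON object instead of a token stream
    }
    resp = requests.post("http://localhost:11434/api/chat", json=payload, timeout=120)
    resp.raise_for_status()
    print(resp.json()["message"]["content"])  # the assistant's reply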

    Let's take a quick look at the differences between the two APIs:

    /api/generate                              | /api/chat
    Used for a single prompt                   | Used for multi-turn interactions
    Request has a single “prompt”              | Request has an array of “messages”
    Doesn’t hold context                       | Previous messages can be included to maintain context in subsequent requests
    Response contains a single “response”      | Response contains a “message” along with metadata such as token counts
    Use case: one-off text generation          | Use case: chatbot
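
    To illustrate the context point above, the sketch below keeps a running message history and re-sends it with every /api/chat request; the follow-up question and the model name are only placeholders.

    import requests

    URL = "http://localhost:11434/api/chat"
    history = [{"role": "system", "content": "You are a helpful assistant."}]

    for question in ["What is the capital of Australia?", "What is its population?"]:
        history.append({"role": "user", "content": question})
        body = {"model": "gemma3:1b", "messages": history, "stream": False}
        reply = requests.post(URL, json=body, timeout=120).json()["message"]
        history.append(reply)  # keep the assistant turn so the next request has the full context
        print(f"{question} -> {reply['content']}")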

    Other APIs to try include the following:

    GET /api/tags: Lists the installed models
    
    
    POST /api/pull: Pulls and installs the model
    { "model": "gemma3:1b" }
    
    
    POST /api/create: Creates a custom model
    {
      "name": "custom-mistral",
      "modelfile": "FROM mistral\nPARAMETER temperature 0.7\n"
    }
    
    
    POST /api/embeddings: Generates embeddings
    {
      "model": "mistral",
      "prompt": "Generate embeddings for this text"
    }
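
    As a final sketch, here is the embeddings call from Python, assuming the mistral model is installed locally; the response should carry the vector in an “embedding” field.

    import requests

    payload = {"model": "mistral", "prompt": "Generate embeddings for this text"}
    resp = requests.post("http://localhost:11434/api/embeddings", json=payload, timeout=120)
    resp.raise_for_status()
    embedding = resp.json()["embedding"]  # a list of floats
    print(len(embedding), embedding[:5])  # vector length and a short preview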

    The Postman collection can be found in my GitHub repo at https://github.com/dcurioustech/ollama-local.