Adding Llama Models

How to install Ollama and add local AI models to Promptly

Promptly supports running local language models through Ollama, including Llama, Mistral, and other open-source models. This guide will walk you through setting up Ollama and configuring models to use with Promptly.

What is Ollama?

Ollama is an open-source tool that lets you run large language models locally on your Mac. It provides a simple way to download, manage, and use various open-source AI models without sending your data to external services.

Benefits of using local models with Ollama include:

  • Privacy: Your data stays on your device
  • No API costs: Free to use as much as you want
  • No internet required: Works offline
  • Low latency: No network round trip, so responses can begin faster than with cloud services

Installing Ollama

  1. Visit the Ollama website (ollama.com)
  2. Download the macOS installer
  3. Open the downloaded file and follow the installation instructions
  4. Once installed, Ollama will run in the background with a menu bar icon
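
If the installation succeeded, the ollama command-line tool should also be available in Terminal:

# Confirm the Ollama CLI is installed and on your PATH
ollama --version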

Pulling Your First Model

Before you can use a model in Promptly, you need to download ("pull") it using Ollama:

Using the Ollama App

  1. Click the Ollama icon in your menu bar
  2. Select "Pull Model" from the menu
  3. Choose a model from the list or enter a specific model name
  4. Wait for the download to complete (this may take several minutes depending on the model size)

Using Terminal

You can also pull models using the Terminal:

# Pull the Mistral 7B model
ollama pull mistral

# Pull Llama 3 8B
ollama pull llama3

# Pull a specific model version
ollama pull codellama:7b
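
To confirm that a download completed, you can list the models available locally:

# Show all locally available models, their tags, and sizes
ollama list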

Popular models to start with:

  • mistral - Balanced performance and size (7B parameters)
  • llama3 - Meta's Llama 3 model (8B parameters)
  • gemma:2b - Google's smaller Gemma model
  • codellama - Specialized for coding tasks

Adding Ollama Models to Promptly

Once you've pulled a model with Ollama, add it to Promptly:

  1. Open Promptly's Preferences (⌘,)
  2. Navigate to the Models tab
  3. Find the Ollama group (or create it if it doesn't exist)
  4. Click the "+" button
  5. Enter the model details:
    • Display Name: A user-friendly name (e.g., "Mistral 7B")
    • API Name: Must be prefixed with "ollama:" followed by the model name (e.g., "ollama:mistral"); see the examples below
  6. Click "Add" to save the model
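
For example, using the models pulled earlier, the model name in Ollama maps to the API name in Promptly like this:

Model in Ollama        API Name in Promptly
mistral                ollama:mistral
llama3                 ollama:llama3
codellama:7b           ollama:codellama:7b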

Hardware Considerations

Local models require significant system resources:

  • Memory: Models can use 4 GB to 32 GB of RAM depending on their size
  • Storage: Model files range from roughly 2 GB to 20 GB
  • CPU/GPU: Inference is significantly faster on Apple Silicon Macs, which can run models on the built-in GPU

For the best experience:

  • Start with smaller models (7B or less) on machines with limited resources
  • Ensure you have at least 8GB of RAM, preferably 16GB+
  • Apple Silicon Macs (M1/M2/M3) provide significantly better performance
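
To check how much memory a loaded model is actually using, recent versions of Ollama include a ps command:

# Show which models are currently loaded in memory and how they are running
ollama ps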

Troubleshooting

If you're having issues with Ollama models:

  1. Ensure Ollama is running: Check for the Ollama icon in your menu bar
  2. Verify model is pulled: Open Terminal and run ollama list to see available models
  3. Check API name: The API name in Promptly must exactly match the model name in Ollama, prefixed with "ollama:"
  4. Restart Ollama: Quit Ollama from its menu bar icon and reopen it; this often resolves connection issues
  5. Check system resources: If your Mac is low on memory, models may fail to load
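
Checks 1 and 2 can also be run from Terminal; Ollama's local server listens on port 11434 by default:

# Verify the Ollama server is responding
curl http://localhost:11434/api/version

# Verify the model you configured in Promptly has been pulled
ollama list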

Advanced: Customizing Models

Ollama supports creating custom model configurations using Modelfiles:

# Example Modelfile for a custom Mistral configuration
FROM mistral

# Set different parameters
PARAMETER temperature 0.7
PARAMETER top_p 0.9

# Add a custom system message
SYSTEM You are a helpful AI assistant specialized in explaining complex topics simply.

Save this to a file named "Modelfile" and create your custom model:

ollama create mycustom -f ./Modelfile

Then add it to Promptly with the API name "ollama:mycustom".
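
Before adding the custom model to Promptly, you can test it directly from Terminal:

# Send a one-off prompt to the custom model
ollama run mycustom "Explain how DNS works in simple terms"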