Adding Llama Models
How to install Ollama and add local AI models to Promptly
Promptly supports running local language models through Ollama, including Llama, Mistral, and other open-source models. This guide will walk you through setting up Ollama and configuring models to use with Promptly.
What is Ollama?
Ollama is an open-source tool that lets you run large language models locally on your Mac. It provides a simple way to download, manage, and use various open-source AI models without sending your data to external services.
Benefits of using local models with Ollama include:
- Privacy: Your data stays on your device
- No API costs: Free to use as much as you want
- No internet required: Works offline
- Low latency: Responses can be faster than cloud services
Installing Ollama
- Visit the Ollama website
- Download the macOS installer
- Open the downloaded file and follow the installation instructions
- Once installed, Ollama will run in the background with a menu bar icon
Pulling Your First Model
Before you can use a model in Promptly, you need to download ("pull") it using Ollama:
Using the Ollama App
- Click the Ollama icon in your menu bar
- Select "Pull Model" from the menu
- Choose a model from the list or enter a specific model name
- Wait for the download to complete (this may take several minutes depending on the model size)
Using Terminal
You can also pull models using the Terminal:
# Pull the Mistral 7B model
ollama pull mistral
# Pull Llama 3 8B
ollama pull llama3
# Pull a specific model version
ollama pull codellama:7b
Popular models to start with:
- mistral - Balanced performance and size (7B parameters)
- llama3 - Meta's Llama 3 model (8B parameters)
- gemma:2b - Google's smaller Gemma model
- codellama - Specialized for coding tasks
Adding Ollama Models to Promptly
Once you've pulled a model with Ollama, add it to Promptly:
- Open Promptly's Preferences (⌘,)
- Navigate to the Models tab
- Find the Ollama group (or create it if it doesn't exist)
- Click the "+" button
- Enter the model details:
- Display Name: A user-friendly name (e.g., "Mistral 7B")
- API Name: Must be prefixed with "ollama:" followed by the model name (e.g., "ollama:mistral")
- Click "Add" to save the model
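The "ollama:" prefix convention above can be sketched in a few lines of Python (the helper name and validation are illustrative, not part of Promptly itself):

```python
def to_ollama_name(api_name: str) -> str:
    """Map a Promptly API name like "ollama:mistral" to the
    underlying Ollama model name ("mistral")."""
    prefix = "ollama:"
    if not api_name.startswith(prefix):
        raise ValueError(f"expected an API name starting with {prefix!r}, got {api_name!r}")
    # Everything after the prefix, including any tag, is the Ollama name
    return api_name[len(prefix):]
```

For example, `to_ollama_name("ollama:mistral")` returns `"mistral"`, and tagged names like `"ollama:codellama:7b"` map to `"codellama:7b"`.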
Hardware Considerations
Local models require significant system resources:
- Memory: Models can use 4GB-32GB RAM depending on their size
- Storage: Model files range from 2GB to 20GB
- CPU/GPU: Models run faster on Apple Silicon Macs or Macs with dedicated GPUs
For the best experience:
- Start with smaller models (7B or less) on machines with limited resources
- Ensure you have at least 8GB of RAM, preferably 16GB+
- Apple Silicon Macs (M1/M2/M3) provide significantly better performance
Troubleshooting
If you're having issues with Ollama models:
- Ensure Ollama is running: Check for the Ollama icon in your menu bar
- Verify the model is pulled: Open Terminal and run "ollama list" to see available models
- Check the API name: The API name in Promptly must exactly match the model name in Ollama, prefixed with "ollama:"
- Restart Ollama: Sometimes restarting the Ollama service can resolve connection issues
- Check system resources: If your Mac is low on memory, models may fail to load
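To rule out a connection problem programmatically, you can probe the local Ollama server, which by default listens on http://localhost:11434 (a rough sketch; the function name and timeout are arbitrary choices):

```python
import urllib.error
import urllib.request


def ollama_is_running(base_url: str = "http://localhost:11434",
                      timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers at base_url.

    A running server typically responds to GET / with a 200 and the
    text "Ollama is running"; any connection error means it is down
    or unreachable.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

If this returns False, start (or restart) the Ollama app before retrying the model in Promptly.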
Advanced: Customizing Models
Ollama supports creating custom model configurations using Modelfiles:
# Example Modelfile for a custom Mistral configuration
FROM mistral
# Set different parameters
PARAMETER temperature 0.7
PARAMETER top_p 0.9
# Add a custom system message
SYSTEM You are a helpful AI assistant specialized in explaining complex topics simply.
Save this to a file named "Modelfile" and create your custom model:
ollama create mycustom -f ./Modelfile
Then add it to Promptly with the API name "ollama:mycustom".
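Under the hood, clients reach models like this through Ollama's local REST API. As a minimal sketch, this is the kind of JSON body a client might POST to the /api/generate endpoint for the custom model (the endpoint and fields follow Ollama's API; the helper itself is illustrative):

```python
import json


def build_generate_request(model: str, prompt: str) -> str:
    """Build the JSON body for a POST to Ollama's /api/generate.

    Setting "stream" to False requests a single JSON response
    instead of a stream of chunks.
    """
    return json.dumps({
        "model": model,    # e.g. "mycustom" -- no "ollama:" prefix at this layer
        "prompt": prompt,
        "stream": False,
    })


body = build_generate_request("mycustom", "Explain entropy simply.")
```

Note that the "ollama:" prefix is a Promptly convention; the request sent to Ollama uses the bare model name.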