Ollama (Local Models)
Ollama lets you run large language models locally on your own machine, giving you privacy, control, and freedom from API costs. This guide covers how to use Ollama through EasilyAI.
Getting Started
Installation
First, install Ollama on your system:
macOS:
# Download from https://ollama.ai or use Homebrew
brew install ollama
Linux:
curl -fsSL https://ollama.ai/install.sh | sh
Windows: Download the installer from ollama.ai
Start Ollama Service
# Start Ollama service
ollama serve
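Before making requests, you can confirm from Python that the service is reachable. This is a minimal sketch, assuming Ollama is listening on its default local address (http://localhost:11434):
import urllib.request

def ollama_is_running(url="http://localhost:11434"):
    """Return True if the local Ollama server responds at the given URL."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status == 200
    except OSError:
        # Connection refused or timed out: the service is not running
        return False

print(ollama_is_running())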
Pull a Model
# Pull a model (e.g., Llama 2)
ollama pull llama2
# See available models
ollama list
Basic Usage with EasilyAI
from easilyai import create_app
# Create Ollama app (no API key needed)
app = create_app("Ollama", "ollama", "", "llama2")
# Generate text
response = app.request("Explain machine learning in simple terms")
print(response)
Available Models
Ollama supports many open-source models. Here are some popular options:
Llama 2
- Model IDs: llama2 (7B), llama2:13b, llama2:70b
- Best for: General-purpose tasks, instruction following
- Strengths: Well-balanced, good performance
# Pull different sizes
ollama pull llama2 # 7B parameters
ollama pull llama2:13b # 13B parameters
ollama pull llama2:70b # 70B parameters (requires more RAM)
# Use different Llama 2 variants
llama2_7b = create_app("Llama7B", "ollama", "", "llama2")
llama2_13b = create_app("Llama13B", "ollama", "", "llama2:13b")
Code Llama
- Model IDs: codellama, codellama:13b, codellama:34b
- Best for: Code generation and programming tasks
- Strengths: Specialized for coding tasks
ollama pull codellama
code_app = create_app("CodeLlama", "ollama", "", "codellama")
code_response = code_app.request("Write a Python function to calculate fibonacci numbers")
Mistral
- Model IDs: mistral, mistral:7b
- Best for: Tasks where response speed matters
- Strengths: Strong performance relative to its small size
ollama pull mistral
mistral_app = create_app("Mistral", "ollama", "", "mistral")
response = mistral_app.request("What are the benefits of renewable energy?")
Neural Chat
- Model ID: neural-chat
- Best for: Conversational AI
- Strengths: Optimized for chat applications
ollama pull neural-chat
chat_app = create_app("NeuralChat", "ollama", "", "neural-chat")
chat_response = chat_app.request("Let's discuss the future of artificial intelligence")
Dolphin Models
- Model IDs: dolphin-mistral, dolphin-llama2
- Best for: Uncensored, helpful responses
- Strengths: Less restricted outputs
ollama pull dolphin-mistral
dolphin_app = create_app("Dolphin", "ollama", "", "dolphin-mistral")
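As with the other models, requests go through the same interface; for example:
dolphin_response = dolphin_app.request("Summarize the arguments for and against open-sourcing large models")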
Parameters
Text Generation Parameters
response = app.request(
"Write a creative story",
temperature=0.8, # Controls randomness (0.0 to 2.0)
num_predict=500, # Number of tokens to predict
top_p=0.9, # Nucleus sampling
top_k=40, # Top-k sampling
repeat_penalty=1.1 # Penalty for repetition
)
Advanced Parameters
# Fine-tune model behavior
advanced_response = app.request(
"Explain quantum computing",
temperature=0.7,
num_predict=1000,
top_p=0.95,
top_k=50,
repeat_penalty=1.05,
presence_penalty=0.5,
frequency_penalty=0.5
)
Use Cases
Local Development
Perfect for development without API costs:
from easilyai import create_app
dev_app = create_app("LocalDev", "ollama", "", "codellama")
# Code generation
function_code = dev_app.request(
"Write a Python class for a simple todo list with add, remove, and list methods"
)
# Code review
review = dev_app.request(
"Review this code for improvements:\n"
"def bubble_sort(arr):\n"
" for i in range(len(arr)):\n"
" for j in range(len(arr)-1):\n"
" if arr[j] > arr[j+1]:\n"
" arr[j], arr[j+1] = arr[j+1], arr[j]\n"
" return arr"
)
# Documentation
docs = dev_app.request(
"Write documentation for this function:\n"
"def calculate_distance(lat1, lon1, lat2, lon2):\n"
" # Implementation here\n"
" pass"
)
Private AI Assistant
Run a personal AI assistant locally:
from easilyai import create_app
assistant_app = create_app("LocalAssistant", "ollama", "", "llama2")
# Personal planning
planning = assistant_app.request(
"Help me plan a productive workday. I have meetings at 10 AM and 3 PM, "
"and I need to finish a presentation."
)
# Learning assistance
learning = assistant_app.request(
"I'm learning React. Can you explain the concept of hooks and give me "
"a simple example?"
)
# Creative writing
creative = assistant_app.request(
"Help me brainstorm ideas for a short story about time travel"
)
Content Creation
Generate content without sending data to external services:
from easilyai import create_app
content_app = create_app("ContentCreator", "ollama", "", "mistral")
# Blog posts
blog_post = content_app.request(
"Write a 500-word blog post about the benefits of local AI models"
)
# Social media content
social_content = content_app.request(
"Create 5 engaging social media posts about sustainable technology"
)
# Email drafts
email_draft = content_app.request(
"Draft a professional email to introduce our new software product to potential clients"
)
Research and Analysis
Analyze data and documents privately:
from easilyai import create_app
research_app = create_app("LocalResearch", "ollama", "", "llama2:13b")
# Document analysis
analysis = research_app.request(
"Analyze this market research data and provide insights:\n"
"[Your private data here]"
)
# Summarization
summary = research_app.request(
"Summarize the key points from this internal meeting transcript:\n"
"[Private transcript here]"
)
# Trend analysis
trends = research_app.request(
"Based on this sales data, what trends do you see?\n"
"[Private sales data here]"
)
Model Management
Listing Models
import subprocess
def list_ollama_models():
"""List all downloaded Ollama models"""
result = subprocess.run(['ollama', 'list'], capture_output=True, text=True)
return result.stdout
print(list_ollama_models())
Model Information
def get_model_info(model_name):
"""Get information about a specific model"""
result = subprocess.run(['ollama', 'show', model_name], capture_output=True, text=True)
return result.stdout
print(get_model_info("llama2"))
Switching Models
from easilyai import create_app
# Switch between models for different tasks
code_model = create_app("Coder", "ollama", "", "codellama")
chat_model = create_app("Chat", "ollama", "", "neural-chat")
general_model = create_app("General", "ollama", "", "llama2")
# Use appropriate model for each task
code_response = code_model.request("Write a sorting algorithm")
chat_response = chat_model.request("Let's discuss AI ethics")
general_response = general_model.request("Explain photosynthesis")
Performance Optimization
Hardware Considerations
import psutil
def check_system_resources():
"""Check if system can handle larger models"""
ram_gb = psutil.virtual_memory().total / (1024**3)
cpu_count = psutil.cpu_count()
print(f"RAM: {ram_gb:.1f} GB")
print(f"CPU cores: {cpu_count}")
# Model recommendations based on RAM
if ram_gb >= 32:
print("Can run: llama2:70b, codellama:34b")
elif ram_gb >= 16:
print("Can run: llama2:13b, codellama:13b")
else:
print("Recommended: llama2, mistral, codellama")
check_system_resources()
Model Selection by Task
from easilyai import create_app
class LocalAIManager:
def __init__(self):
self.models = {
"code": create_app("Code", "ollama", "", "codellama"),
"chat": create_app("Chat", "ollama", "", "neural-chat"),
"general": create_app("General", "ollama", "", "llama2"),
"fast": create_app("Fast", "ollama", "", "mistral")
}
def request(self, prompt, task_type="general", **kwargs):
"""Route request to appropriate model"""
if task_type not in self.models:
task_type = "general"
return self.models[task_type].request(prompt, **kwargs)
# Usage
ai_manager = LocalAIManager()
# Automatically use best model for task
code_response = ai_manager.request("Write a web scraper", task_type="code")
chat_response = ai_manager.request("How are you today?", task_type="chat")
quick_response = ai_manager.request("What is 2+2?", task_type="fast")
Temperature Control
from easilyai import create_app
app = create_app("TempControl", "ollama", "", "llama2")
# Consistent, factual responses
factual = app.request(
"What is the capital of Japan?",
temperature=0.1
)
# Balanced responses
balanced = app.request(
"Explain machine learning",
temperature=0.5
)
# Creative responses
creative = app.request(
"Write a poem about the ocean",
temperature=0.9
)
Troubleshooting
Common Issues
from easilyai import create_app
from easilyai.exceptions import EasilyAIException
def safe_ollama_request(prompt, model="llama2"):
"""Make a safe Ollama request with error handling"""
try:
app = create_app("Ollama", "ollama", "", model)
return app.request(prompt)
except EasilyAIException as e:
error_msg = str(e).lower()
if "connection" in error_msg:
return "Error: Ollama service not running. Start with 'ollama serve'"
elif "model" in error_msg:
return f"Error: Model {model} not found. Pull with 'ollama pull {model}'"
else:
return f"Error: {e}"
# Usage
response = safe_ollama_request("Hello!")
print(response)
Model Loading Issues
import subprocess
from easilyai import create_app
def ensure_model_available(model_name):
"""Ensure a model is available, pull if necessary"""
# Check if model exists
result = subprocess.run(['ollama', 'list'], capture_output=True, text=True)
if model_name not in result.stdout:
print(f"Model {model_name} not found. Downloading...")
        pulled = subprocess.run(['ollama', 'pull', model_name])
        if pulled.returncode != 0:
            print(f"Failed to download {model_name}")
            return False
        print(f"Model {model_name} downloaded successfully")
return True
# Ensure model is available before use
ensure_model_available("llama2")
app = create_app("Ollama", "ollama", "", "llama2")
Performance Issues
def optimize_ollama_performance():
"""Tips for optimizing Ollama performance"""
tips = [
"Use smaller models (7B) for faster responses",
"Increase system RAM for larger models",
"Use SSD storage for better model loading",
"Close other applications to free up resources",
"Use GPU acceleration if available (CUDA/Metal)"
]
for tip in tips:
print(f"• {tip}")
optimize_ollama_performance()
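To see how these tips play out on your hardware, you can time the same prompt against different local models. This is only a rough wall-clock sketch built on the EasilyAI interface used throughout this guide (note that the first request to a model also includes its load time):
import time
from easilyai import create_app

def time_model(model_name, prompt):
    """Measure wall-clock time for one request to a local model."""
    app = create_app("Benchmark", "ollama", "", model_name)
    start = time.perf_counter()
    response = app.request(prompt)
    elapsed = time.perf_counter() - start
    print(f"{model_name}: {elapsed:.1f}s, {len(response)} characters")
    return elapsed

for model in ["mistral", "llama2"]:
    time_model(model, "Summarize the benefits of local AI models in two sentences.")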
Batch Processing
Processing Multiple Requests
from easilyai import create_app
import time
def batch_process_local(prompts, model="llama2"):
"""Process multiple prompts with local model"""
app = create_app("BatchOllama", "ollama", "", model)
results = []
for i, prompt in enumerate(prompts):
print(f"Processing {i+1}/{len(prompts)}")
try:
response = app.request(prompt)
results.append({"prompt": prompt, "response": response, "success": True})
except Exception as e:
results.append({"prompt": prompt, "error": str(e), "success": False})
# Small delay to prevent overwhelming the system
time.sleep(0.1)
return results
# Usage
prompts = [
"What is Python?",
"Explain machine learning",
"Write a hello world program"
]
results = batch_process_local(prompts, "codellama")
for result in results:
if result["success"]:
print(f"✓ {result['prompt']}: {result['response'][:50]}...")
else:
print(f"✗ {result['prompt']}: {result['error']}")
Best Practices
1. Choose the Right Model
# For code tasks
code_app = create_app("Code", "ollama", "", "codellama")
# For general conversation
chat_app = create_app("Chat", "ollama", "", "neural-chat")
# For quick responses
fast_app = create_app("Fast", "ollama", "", "mistral")
2. Manage System Resources
import psutil
def check_resources_before_request():
"""Check system resources before making requests"""
ram_usage = psutil.virtual_memory().percent
cpu_usage = psutil.cpu_percent(interval=1)
if ram_usage > 90:
print("Warning: High RAM usage. Consider using a smaller model.")
if cpu_usage > 90:
print("Warning: High CPU usage. Wait before making more requests.")
return ram_usage < 90 and cpu_usage < 90
# Check before making requests
if check_resources_before_request():
response = app.request("Your prompt here")
3. Use Appropriate Parameters
# For consistent outputs
consistent = app.request("Explain AI", temperature=0.1)
# For creative outputs
creative = app.request("Write a story", temperature=0.8)
# For longer responses
detailed = app.request("Detailed analysis", num_predict=1000)
4. Privacy and Security
Ollama keeps everything local, but consider:
- Regularly update models for security patches (see the sketch after this list)
- Be aware of model capabilities and limitations
- Monitor system resources and performance
- Keep sensitive data processing local
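For the first point, one way to keep local models current is to re-pull everything you have installed, since `ollama pull` fetches the latest version of a model you already have. A minimal sketch, assuming the Ollama CLI is on your PATH:
import subprocess

def update_installed_models():
    """Re-pull every installed model so it matches the latest published version."""
    listing = subprocess.run(['ollama', 'list'], capture_output=True, text=True)
    # Skip the header row; the first column of each row is the model name
    for line in listing.stdout.splitlines()[1:]:
        if not line.strip():
            continue
        model = line.split()[0]
        print(f"Updating {model}...")
        subprocess.run(['ollama', 'pull', model])

update_installed_models()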
Advantages of Local Models
Privacy
- No data sent to external servers
- Complete control over your information
- Compliance with data protection regulations
Cost
- No API fees or usage limits
- One-time setup cost for hardware
- Unlimited usage once installed
Control
- Choose specific models for your needs
- Customize parameters without restrictions
- No rate limits or quotas
Availability
- Works offline
- No dependency on external services
- Consistent availability
Limitations
Performance
- Generally slower than cloud APIs
- Limited by local hardware
- Larger models require significant RAM
Model Selection
- Limited to open-source models
- May not have latest cutting-edge capabilities
- Updates require manual model downloads
Maintenance
- Requires system administration
- Model management and updates
- Hardware requirements planning
Ollama provides an excellent way to run AI models locally, offering privacy, control, and cost savings for many use cases. It's particularly valuable for development, private data processing, and scenarios where data security is paramount.