Installing Hermes Agent With Local Gemma 4: A Step-by-Step Guide
In this article, I describe my hands-on experience installing Hermes Agent: I connected it to a local Gemma 4 model via Ollama and then enabled features such as web search, a Telegram bot, and tool-calling management.
Introduction and Important Security Tips
Before anything else, we must clarify an important point:
The goal of this Hermes Agent setup is a completely local AI Agent that works with local models and does not send your data outside your system.
⚠️ Security Warning (Very Important)
- It is best not to run this system on your main computer.
- If the Agent gets hacked or misconfigured:
  - It might gain access to personal files.
  - It might read sensitive information such as passwords.
  - It might even gain access to system tools.
👉 My recommendation: Use a separate system or VM.
System
I rented a machine from Vast.ai with an RTX 5060 Ti GPU, which was enough to run these models. You can pick your own system from the link below. Prices vary with many parameters (even the time of day), so choose according to your needs. Since Docker is a prerequisite for this tutorial, use a VM-based template such as Ubuntu 22.04 VM.
https://cloud.vast.ai/?ref_id=441593
Installing Prerequisites:
Installing Docker:
To install Docker, follow the official Docker documentation:
https://docs.docker.com/engine/install/ubuntu/
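Before moving on, it can help to confirm that Docker and the Compose plugin are actually available. The sketch below is a small helper, not part of the official Docker instructions; `require_cmd` is a hypothetical name chosen for illustration.

```shell
# require_cmd NAME: succeed if NAME is found on PATH, otherwise
# print a message to stderr and return 1.
# (require_cmd is an illustrative helper, not a Docker command.)
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  printf 'missing: %s\n' "$1" >&2
  return 1
}
```

For example, `require_cmd docker && docker --version && docker compose version` prints both versions when the installation succeeded.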
Installing firecrawl:
For your agent to search the web, extract data, and perform similar tasks, it needs a search and scraping API. The easiest and cheapest option is to run Firecrawl locally on your own system. For more information, you can visit the project’s GitHub page.
Note: Before installing firecrawl, make sure Docker is installed.
To install, simply copy and paste the following commands:
cd ~
git clone https://github.com/firecrawl/firecrawl.git
cd firecrawl
cat > .env << 'EOF'
PORT=3002
HOST=0.0.0.0
USE_DB_AUTHENTICATION=false
BULL_AUTH_KEY=CHANGEME
EOF
# GNU sed (Ubuntu) syntax; on macOS/BSD sed, use: sed -i '' '...'
sed -i 's|# image: ghcr.io/firecrawl/firecrawl|image: ghcr.io/firecrawl/firecrawl|' docker-compose.yaml
sed -i 's| build: apps/api| # build: apps/api|' docker-compose.yaml
sed -i 's|# image: ghcr.io/firecrawl/playwright-service:latest|image: ghcr.io/firecrawl/playwright-service:latest|' docker-compose.yaml
sed -i 's| build: apps/playwright-service-ts| # build: apps/playwright-service-ts|' docker-compose.yaml
docker compose up -d
Note that you can ask ChatGPT or other AIs to customize the commands for you.
Execution:
docker run -p 3002:3002 firecrawl/firecrawl
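To confirm that the local Firecrawl instance answers requests, you can send it a single scrape request. This is a minimal sketch that assumes the instance listens on port 3002 (as set in the `.env` file) and exposes the `/v1/scrape` endpoint; the function names are illustrative, and newer Firecrawl releases may use a different API version.

```shell
# Build the JSON body for a scrape request that asks for Markdown output.
# Assumes the URL contains no double quotes or backslashes.
firecrawl_scrape_body() {
  printf '{"url":"%s","formats":["markdown"]}' "$1"
}

# POST the request to a local Firecrawl instance
# (assumed: port 3002 and the /v1/scrape endpoint).
firecrawl_scrape() {
  curl -fsS -X POST "http://localhost:3002/v1/scrape" \
    -H 'Content-Type: application/json' \
    -d "$(firecrawl_scrape_body "$1")"
}
```

Running `firecrawl_scrape https://example.com` should return a JSON document if the containers are up; a connection error usually means they are not running yet.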
Step 1: Choosing a Local Model (Ollama + Gemma 4)
Installing Ollama
Ollama is a tool for managing and running local models. To install it:
Ubuntu
curl -fsSL https://ollama.com/install.sh | sh
macOS
brew install ollama
Windows
irm https://ollama.com/install.ps1 | iex
Download from:
Downloading the Gemma 4 Model
Very simple and lightweight version (E2B)
ollama pull gemma4:e2b
Lightweight version (E4B)
ollama pull gemma4:e4b
Heavy version (12B)
ollama pull gemma4:12b
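After pulling, you may want to confirm that the model is actually available locally. The helper below is a sketch that parses the table printed by `ollama list`; `model_installed` is a hypothetical name, and it assumes the model tag appears in the first column.

```shell
# model_installed TAG: read `ollama list` output on stdin and succeed
# if TAG appears in the NAME column (first column, header row skipped).
model_installed() {
  awk -v tag="$1" 'NR > 1 && $1 == tag { found = 1 } END { exit !found }'
}
```

For example: `ollama list | model_installed gemma4:e4b && echo "model ready"`.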
My Real Experience
The E2B version (older) performed poorly in web search tests.
The E4B (~9.6GB) version had the best balance between speed and accuracy.
The 12B version has excellent quality but consumes a lot of RAM.
Step 2: Installing Hermes Agent (Official Method)
In the first step, we go to the Hermes Agent GitHub project and run the installation command:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
This script:
Downloads all dependencies.
Creates the initial environment.
Installs the necessary tools.
⏱ This step usually takes 2 to 5 minutes.
During installation, you will be asked for a Provider:
👉 You must select the following option:
Custom Endpoint (Free Local AI)
Step 3: Connecting Hermes to Ollama
When Hermes asks you for the API URL, enter Ollama's OpenAI-compatible endpoint:
http://localhost:11434/v1
Or, in some versions, the base URL without the /v1 suffix:
http://localhost:11434
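Whichever form of the URL your Hermes version expects, it is worth checking that Ollama answers on it before continuing. The sketch below maps each form to a model-listing endpoint: `/v1/models` on the OpenAI-compatible path and `/api/tags` on the native Ollama API. `ollama_models_url` is an illustrative helper name.

```shell
# ollama_models_url BASE: print the URL that lists available models.
# BASE ending in /v1 -> OpenAI-compatible endpoint; otherwise native API.
ollama_models_url() {
  case "${1%/}" in
    */v1) printf '%s/models\n' "${1%/}" ;;
    *)    printf '%s/api/tags\n' "${1%/}" ;;
  esac
}
```

With Ollama running, `curl -fsS "$(ollama_models_url http://localhost:11434/v1)"` should print a JSON list of the models you have pulled.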
Step 4: Important Hermes Agent Settings
Tool Calling Iterations
You can adjust the maximum number of tool-calling iterations:
max_tool_iterations = 60
Context Compression
This option ensures:
Long chat memory is not corrupted.
The model does not suffer from amnesia during long conversations.
Reset Session (Very Important)
If you don’t enable it:
RAM gets filled up.
Speed decreases.
👉 Recommendation:
Daily reset.
Or inactivity timeout.
Step 5: Connecting Tools (Tools Integration)
1. Connecting Telegram Bot
To create a bot:
Go to @BotFather in Telegram.
Send the command:
/newbot
Copy the token that BotFather returns.
Then enter it in Hermes:
TELEGRAM_BOT_TOKEN=xxxxx
TELEGRAM_USER_ID=your_id
To get the user ID, you can also send a message to the @userinfobot bot.
👉 This ensures that only you can send messages to the Agent.
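Before handing the token to Hermes, you can check it against the Telegram Bot API. This sketch assumes the usual token shape (a numeric bot ID, a colon, then a secret string) and uses the Bot API's `getMe` method; the function names are illustrative.

```shell
# valid_bot_token TOKEN: succeed if TOKEN looks like "<digits>:<secret>".
# This only checks the shape, not whether Telegram accepts the token.
valid_bot_token() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+:[A-Za-z0-9_-]+$'
}

# telegram_getme_url TOKEN: print the Bot API URL for the getMe method.
telegram_getme_url() {
  printf 'https://api.telegram.org/bot%s/getMe\n' "$1"
}
```

For example, `valid_bot_token "$TELEGRAM_BOT_TOKEN" && curl -fsS "$(telegram_getme_url "$TELEGRAM_BOT_TOKEN")"` returns JSON containing `"ok":true` when Telegram accepts the token.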
2. Connecting Firecrawl (Web Search)
Hermes Configuration:
web_provider = firecrawl
endpoint = http://localhost:3002
Step 6: Running the Agent
After configuration:
hermes start
Initial Testing
File Test:
Summarize this report.txt file
Web Test:
Search for the latest AI news
Code Test:
Write an API in Flask
Real Execution Experience
Model loading: 10 to 20 seconds.
After loading: Responses are stable and fast.
Telegram Test
After connecting:
You send a message:
Send me a summary of the five recent news articles about artificial intelligence.
The response arrives directly on your phone.
Agent Management
Complete Removal
hermes uninstall
Then choose:
Keep data
Full wipe
Model Comparison
| Model | RAM | Speed | Quality | Recommendation |
|---|---|---|---|---|
| E2B | Low | Fast | Medium | ⭐⭐⭐ |
| E4B | Medium | Fast | Good | ⭐⭐⭐⭐ |
| 12B | High | Medium | Excellent | ⭐⭐⭐⭐⭐ |
Common Issues:
Ollama GPU is not working
CUDA not installed.
Outdated driver.
Out of Memory
Using 12B on a weak system.
Solution: E4B.
Slow Execution
Model is running on CPU.
Solution: Enable GPU.
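To see whether a loaded model is actually using the GPU, look at the PROCESSOR column of `ollama ps`. The filter below is a sketch that prints the names of loaded models whose row mentions the CPU; `models_on_cpu` is a hypothetical helper, and it assumes model names themselves do not contain the string `CPU`.

```shell
# models_on_cpu: read `ollama ps` output on stdin and print the NAME of
# every loaded model whose row mentions "CPU" (fully or partly on CPU).
models_on_cpu() {
  awk 'NR > 1 && /CPU/ { print $1 }'
}
```

`ollama ps | models_on_cpu` prints nothing when every loaded model runs fully on the GPU. If a model is listed, `nvidia-smi` is a good next check for driver and CUDA problems.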
Hermes connection to Ollama fails
Incorrect URL.
Port issue.
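For the port case, a quick check is whether anything is listening on Ollama's default port 11434. The sketch below parses the output of `ss -ltn` (Linux); `port_listening` is an illustrative name, and it assumes the local address sits in the fourth column of that output.

```shell
# port_listening PORT: read `ss -ltn` output on stdin and succeed if some
# socket is listening on PORT (local address:port is the 4th column).
port_listening() {
  awk -v port="$1" 'NR > 1 && $4 ~ (":" port "$") { found = 1 } END { exit !found }'
}
```

For example: `ss -ltn | port_listening 11434 || echo "nothing is listening on 11434"`. The same check with 3002 applies to the local Firecrawl instance.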
Conclusion
The combination of:
Hermes Agent
Ollama
Gemma 4
Firecrawl
Telegram Bot
Creates a completely personal and local AI Agent that:
Works without cloud.
Is scalable.
Keeps your data on your own machine instead of sending it to external APIs.
My Personal Experience
After this setup:
For many daily tasks, I no longer need the ChatGPT API.