Installing Hermes Agent With Local Gemma 4: A Step-by-Step Guide

In this article, I explain my actual experience installing Hermes Agent; where I connected it to a local Gemma 4 model via Ollama and subsequently activated features such as web search, Telegram Bot, and Tool Calling management.

Introduction and Important Security Tips

Before anything else, we must clarify an important point:

The goal of Hermes Agent is to have a completely local AI Agent that works with local models and does not transmit your data outside your system.

⚠️ Security Warning (Very Important)

  • It is best not to run this system on your main computer. - If the Agent gets hacked or misconfigured: - It might gain access to personal files.
  • It might read sensitive information such as passwords.
  • It might even gain access to system tools.

👉 My recommendation: Use a separate system or VM.

System

I rented from vast with a GPU RTX 5060 Ti Which allowed me to implement these models. You can select your own system from the following link. Note that depending on various parameters (even the time of day), prices will vary, so try to choose according to your needs. Also, since Docker is a prerequisite for this tutorial, you should use VM-related templates like Ubuntu 22.04 VM.

  https://cloud.vast.ai/?ref_id=441593

Installing Prerequisites:

Installing Docker:

To install Docker, you can refer to the main Docker tutorial:

  https://docs.docker.com/engine/install/ubuntu/

Installing firecrawl:

For your agent to be able to search, extract data, and perform similar tasks on the internet, you need an API, the easiest and most cost-free way being to set up this system locally on your own system. For more information, you can visit the project’s GitHub link.

Note: Before installing firecrawl, make sure Docker is installed.

To install, simply copy and paste the following commands:

  cd ~
  git clone [https://github.com/firecrawl/firecrawl.git](https://github.com/firecrawl/firecrawl.git)
  cd firecrawl
  cat > .env << 'EOF'
  PORT=3002
  HOST=0.0.0.0
  USE_DB_AUTHENTICATION=false
  BULL_AUTH_KEY=CHANGEME
  EOF
  sed -i '' 's|# image: ghcr.io/firecrawl/firecrawl|image: ghcr.io/firecrawl/firecrawl|' docker-compose.yaml
  sed -i '' 's|  build: apps/api|  # build: apps/api|' docker-compose.yaml
  sed -i '' 's|# image: ghcr.io/firecrawl/playwright-service:latest|image: ghcr.io/firecrawl/playwright-service:latest|' docker-compose.yaml
  sed -i '' 's|    build: apps/playwright-service-ts|    # build: apps/playwright-service-ts|' docker-compose.yaml
  docker compose up -d

Note that you can ask ChatGPT or other AIs to customize the commands for you.

Execution:

docker run -p 3002:3002 firecrawl/firecrawl

Step 1: Choosing a Local Model (Ollama + Gemma 4)

Installing Ollama

It is a tool for managing local models. To install, simply:

Ubuntu

curl -fsSL [https://ollama.com/install.sh](https://ollama.com/install.sh) | sh

macOS

brew install ollama

Windows

irm [https://ollama.com/install.ps1](https://ollama.com/install.ps1) | iex

Download from:

https://ollama.com/download

Downloading the Gemma 4 Model

Very simple and lightweight version: (E2B):

ollama pull gemma4:e2b

Lightweight version (E4B)

ollama pull gemma4:e4b

Heavy version (12B)

ollama pull gemma4:12b

My Real Experience

  • The E2B version (older) performed poorly in web search tests.

  • The E4B (~9.6GB) version had the best balance between speed and accuracy.

  • The 12B version has excellent quality but consumes a lot of RAM.

Step 2: Installing Hermes Agent (Official Method)

In the first step, we go to the Hermes Agent GitHub project and run the installation command:

curl -fsSL [https://hermes-agent.nousresearch.com/install.sh](https://hermes-agent.nousresearch.com/install.sh) | bash

This script:

  • Downloads all dependencies.

  • Creates the initial environment.

  • Installs the necessary tools.

⏱ This step usually takes 2 to 5 minutes.

During installation, you will be asked for a Provider:

👉 You must select the following option:

Custom Endpoint (Free Local AI)

Step 3: Connecting Hermes to Ollama

When Hermes asks you for the API URL:

http://localhost:11434/v1

Or in some versions:

http://localhost:11434

Step 4: Important Hermes Agent Settings

Tool Calling Iterations

You can adjust the number of tool interactions:

max_tool_iterations = 60

Context Compression

This option ensures:

  • Long chat memory is not corrupted.

  • The model does not suffer from amnesia during long conversations.

Reset Session (Very Important)

If you don’t enable it:

  • RAM gets filled up.

  • Speed decreases.

👉 Recommendation:

  • Daily reset.

  • Or inactivity timeout.

Step 5: Connecting Tools (Tools Integration)

1. Connecting Telegram Bot

To create a bot:

  1. Go to @BotFather in Telegram.

  2. Command:

   /newbot
  1. Get the Token.

Then enter it in Hermes:

TELEGRAM_BOT_TOKEN=xxxxx
TELEGRAM_USER_ID=your_id

To get the user ID, you can also send a message to the @userinfobot bot.

👉 This ensures that only you can send messages to the Agent.

Hermes Configuration:

web_provider = firecrawl
endpoint = http://localhost:3002

Step 6: Running the Agent

After configuration:

hermes start

Initial Testing

File Test:

Summarize this report.txt file

Web Test:

Search for the latest AI news

Code Test:

Write an API in Flask

Real Execution Experience

  • Model loading: 10 to 20 seconds.

  • After loading: Responses are stable and fast.

Telegram Test

After connecting:

  • You send a message:

  • Send me a summary of the five recent news articles about artificial intelligence.

  • The response is received directly on the mobile.

Agent Management

Complete Removal

hermes uninstall

Then choose:

  • Keep data

  • Full wipe

Model Comparison

ModelRAMSpeedQualityRecommendation
E2BLowFastMedium⭐⭐⭐
E4BMediumFastGood⭐⭐⭐⭐
12BHighMediumExcellent⭐⭐⭐⭐⭐

Common Issues:

Ollama GPU is not working

  • CUDA not installed.

  • Outdated driver.

Out of Memory

  • Using 12B on a weak system.

  • Solution: E4B.

Slow Execution

  • Model is running on CPU.

  • Solution: Enable GPU.

Hermes connection to Ollama fails

  • Incorrect URL.

  • Port issue.

Conclusion

The combination of:

  • Hermes Agent

  • Ollama

  • Gemma 4

  • Firecrawl

  • Telegram Bot

Creates a completely personal and local AI Agent that:

  • Works without cloud.

  • Is scalable.

  • Is more secure than external APIs.

My Personal Experience

After this setup:

For many daily tasks, I no longer need the ChatGPT API.