Self-hosted AI HR assistant. Runs on your own hardware. No cloud dependency. No data leaves your network.
Disclaimer. This is an independent project, not an official Kolay Yazilim A.S. product. All write operations target live HR data. There is no sandbox. You are responsible for your API token and all actions performed.
Kolay AI Box packages three components into a single Docker Compose stack:
| Component | Role |
|---|---|
| Open WebUI | Chat interface (port 3000) |
| Ollama | Local LLM inference (Gemma 4) |
| Kolay MCP Proxy | Connects the LLM to Kolay IK API |
The browser talks to Open WebUI. Open WebUI talks to Ollama. When the LLM needs HR data, it calls the proxy. The proxy fetches from Kolay IK and returns JSON. The LLM formats the response. No external AI service is involved.
| Requirement | Minimum |
|---|---|
| Docker | 24.0+ |
| Docker Compose | v2.20+ |
| RAM | 8 GB (CPU mode), 16 GB recommended |
| Disk | 15 GB free (for model weights) |
| GPU (optional) | NVIDIA with CUDA 12+ and nvidia-container-toolkit |
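The version minimums in the table can be checked before starting the stack. The following is a minimal sketch, not part of the repository: `version_ge` and `check_min` are hypothetical helper names, and the comparison assumes GNU `sort` with `-V` (version sort) is available.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helpers; not shipped with Kolay AI Box.

# version_ge FOUND REQUIRED: succeeds when FOUND >= REQUIRED.
# Feeds REQUIRED then FOUND to sort and asks whether that order is already sorted.
version_ge() {
  printf '%s\n' "$2" "$1" | sort -V -C
}

# check_min NAME FOUND REQUIRED: prints a one-line verdict, returns 1 if too old.
check_min() {
  if version_ge "$2" "$3"; then
    echo "ok: $1 $2 (need >= $3)"
  else
    echo "too old: $1 $2 (need >= $3)"
    return 1
  fi
}
```

One possible use is `check_min docker "$(docker version --format '{{.Client.Version}}')" 24.0`. Version sort matters here because a plain string comparison would rank 2.9 above 2.20.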
macOS: Download and install Docker Desktop.
Ubuntu / Debian:
sudo apt-get update
sudo apt-get install -y docker.io docker-compose-v2
sudo usermod -aG docker $USER
# log out and back in for group change to take effect
Verify:
docker --version
docker compose version
# Ubuntu / Debian
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Verify: docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
# clone the repository
git clone https://github.com/ezapmar/kolay-cli.git
cd kolay-cli/box
# start in CPU mode (Mac, laptop, no GPU)
make dev
# or start with GPU (NVIDIA)
make up
On first run, make creates a .env file from the template with generated secrets.
It prints the admin password and asks you to run the command again.
Created .env with generated secrets.
1. Optionally set KOLAY_API_TOKEN for global mode
2. Note your admin password: <generated>
Then run this command again.
Run make dev (or make up) again to start.
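The first-run behaviour described above (create .env once, fill in random secrets, never overwrite an existing file) could look roughly like the sketch below. This is an illustration, not the project's actual Makefile logic: `gen_secret` and `init_env` are hypothetical names, and the sketch assumes the template does not already define ADMIN_PASS or WEBUI_SECRET_KEY.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of first-run .env creation; the real make target may differ.

# gen_secret [BYTES]: prints BYTES random bytes as lowercase hex (default 32).
gen_secret() {
  head -c "${1:-32}" /dev/urandom | od -An -v -tx1 | tr -d ' \n'
}

# init_env TEMPLATE TARGET: create TARGET from TEMPLATE once, never overwrite it.
init_env() {
  local template=$1 target=$2
  if [ -e "$target" ]; then
    echo "$target already exists, leaving it untouched."
    return 0
  fi
  {
    cat "$template"
    # Assumption: these two keys are appended rather than substituted in place.
    printf '\nADMIN_PASS=%s\n' "$(gen_secret 12)"
    printf 'WEBUI_SECRET_KEY=%s\n' "$(gen_secret 32)"
  } > "$target"
  echo "Created $target with generated secrets."
}
```

Refusing to touch an existing file is what makes running the same command twice safe: the second run finds .env in place and goes on to start the stack.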
The first boot pulls the Gemma 4 model. This takes 5-15 minutes depending on your connection speed. Monitor progress:
docker compose logs -f ollama
Once the stack is up, open http://localhost:3000 and sign in with the admin credentials from .env. That is all. No settings menus, no configuration wizards.
All configuration is in the .env file. Edit it and restart:
# edit
nano .env
# restart
make down && make dev
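For scripted setups, the edit step can be done without an editor. The helper below is a minimal sketch with a hypothetical name (`set_env_var`); it assumes one KEY=VALUE assignment per line and moves a changed key to the end of the file.

```shell
#!/usr/bin/env bash
# Hypothetical helper for non-interactive .env edits; not part of the repository.

# set_env_var FILE KEY VALUE: replace KEY's line if present, otherwise append it.
set_env_var() {
  local file=$1 key=$2 value=$3 tmp
  touch "$file"
  tmp=$(mktemp)
  # Keep every line except an existing assignment of KEY.
  grep -v "^${key}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back through the original file so its permissions are preserved.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example, `set_env_var .env OLLAMA_MODEL gemma4:e4b` followed by `make down && make dev` switches to the smaller model.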
| Variable | Default | Purpose |
|---|---|---|
| OLLAMA_MODEL | gemma4:26b | LLM model to use |
| KOLAY_API_TOKEN | (empty) | Global token (see Token Modes below) |
| KOLAY_SECURITY_PROFILE | standard | standard or enterprise |
| ADMIN_EMAIL | admin@kolay.box | Admin account email |
| ADMIN_PASS | (generated) | Admin account password |
| ADMIN_NAME | Admin | Admin display name |
| WEBUI_PORT | 3000 | Web UI port |
| WEBUI_SECRET_KEY | (generated) | Session encryption key |
| Model | VRAM | Use case |
|---|---|---|
| gemma4:e4b | ~6 GB | Laptop / CPU / demo |
| gemma4:26b | ~17 GB | Production default (MoE, fast) |
| gemma4:31b | ~20 GB | Maximum quality (dense) |
| qwen2.5:32b | ~20 GB | Fallback if Gemma tool-calling is insufficient |
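The VRAM column can be turned into a simple selection rule. The sketch below is one possible reading of the table, not project code: `pick_model` is a hypothetical name, the thresholds are the approximate figures above with no headroom added, and "largest model that fits" is an assumption rather than a recommendation from the project.

```shell
#!/usr/bin/env bash
# Hypothetical helper; thresholds are the approximate VRAM figures from the table.

# pick_model VRAM_GB: prints the largest Gemma 4 tag whose VRAM figure fits
# in VRAM_GB (whole gigabytes).
pick_model() {
  local vram=$1
  if [ "$vram" -ge 20 ]; then
    echo "gemma4:31b"
  elif [ "$vram" -ge 17 ]; then
    echo "gemma4:26b"
  else
    echo "gemma4:e4b"
  fi
}
```

On an NVIDIA host the input could come from `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits`, which reports MiB and so needs dividing by 1024 first.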
| Mode | Configuration | Use case |
|---|---|---|
| Per-user (default) | Leave KOLAY_API_TOKEN empty | Each user pastes their own token. Isolated permissions. |
| Global | Set KOLAY_API_TOKEN in .env | Entire team shares one token. Quick setup. |
Per-user mode is recommended for production. Each user’s actions are scoped to their own Kolay IK permissions.
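Which mode a given .env selects follows directly from whether KOLAY_API_TOKEN has a value. A small sketch of that rule, using a hypothetical `token_mode` helper (how the proxy itself reads the variable is not shown in this document):

```shell
#!/usr/bin/env bash
# Hypothetical helper; mirrors the rule in the Token Modes table.

# token_mode FILE: prints "global" when KOLAY_API_TOKEN has a non-empty value
# in FILE, otherwise "per-user" (including when the key or file is missing).
token_mode() {
  local token
  # Take the last assignment, drop the key, strip optional surrounding quotes.
  token=$(grep '^KOLAY_API_TOKEN=' "$1" 2>/dev/null | tail -n 1 | cut -d= -f2- | tr -d "\"'") || true
  if [ -n "$token" ]; then
    echo "global"
  else
    echo "per-user"
  fi
}
```

Running `token_mode .env` before a production rollout is a quick way to confirm that a shared token was not left in place by accident.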
| Command | Action |
|---|---|
| make up | Start with GPU support |
| make dev | Start in CPU mode (no GPU needed) |
| make down | Stop all containers (data preserved) |
| make logs | Tail all container logs |
| make diagnose | Run health checks and show fixes |
| make build | Rebuild the proxy image |
| make clean | Remove containers AND volumes (destructive) |
| make cloud-up | Provision DigitalOcean GPU droplet |
| make cloud-down | Destroy droplet (volume preserved) |
| make help | Show all available commands |
Browser
|
v
Open WebUI (port 3000)
|
+--> Ollama (Gemma 4, local inference)
|
+--> Kolay MCP Proxy (Python, FastMCP)
|
v
Kolay IK API (api.kolayik.com, HTTPS)
All components run as Docker containers on the same network. The proxy is the only container that makes external network requests (to the Kolay IK API over HTTPS).
The web UI listens on localhost:3000 by default. It is not exposed to the internet.
Set KOLAY_SECURITY_PROFILE=enterprise for PII masking and DLP scanning.
Change ADMIN_PASS and WEBUI_SECRET_KEY from their generated values.
Bind the web UI to 127.0.0.1:3000 and use a TLS reverse proxy.
make diagnose
This checks container health, port availability, GPU detection, model download status, and API connectivity. It prints specific fixes for each problem found.
# check container status
docker compose ps
# check logs
docker compose logs -f
# check Ollama progress
docker compose logs -f ollama
The Gemma 4 26b model is approximately 17 GB. On slow connections, use the smaller model:
# in .env
OLLAMA_MODEL=gemma4:e4b
Wait 30-60 seconds after starting. Open WebUI needs time to initialize. If it still fails:
docker compose ps | grep webui
If the container is not running, check logs:
docker compose logs webui
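Instead of waiting a fixed 30-60 seconds, a script can poll until the UI answers. The sketch below uses a hypothetical generic helper, `wait_until`, that retries any command once per second; the probe command itself is left to the caller.

```shell
#!/usr/bin/env bash
# Hypothetical retry helper; not part of the repository.

# wait_until TIMEOUT_SECONDS COMMAND...: rerun COMMAND once per second until it
# succeeds (returns 0) or TIMEOUT_SECONDS attempts have failed (returns 1).
wait_until() {
  local timeout=$1 elapsed=0
  shift
  while [ "$elapsed" -lt "$timeout" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 1
}
```

A possible call is `wait_until 60 curl -fsS http://localhost:3000/`, assuming the default WEBUI_PORT; if it returns non-zero, the container logs above are the next place to look.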
# verify NVIDIA driver
nvidia-smi
# verify container toolkit
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
If GPU is not available, use CPU mode:
make dev
Reduce model size in .env:
OLLAMA_MODEL=gemma4:e4b # smallest, ~6 GB
# test from inside the container
docker compose exec proxy curl -s https://api.kolayik.com/health
If this fails, check your network configuration. If you are behind a corporate
proxy, set HTTPS_PROXY in .env.
make clean # WARNING: deletes all data, models, and user accounts
# provision a GPU droplet with attached volume
make cloud-up
# when done, destroy the droplet (volume preserved, billing stops)
make cloud-down
The scripts handle droplet creation, volume attachment, Docker installation, and stack deployment.
Change into the box directory: cd box.
Start the stack: make up (GPU) or make dev (CPU).
# stop containers and remove volumes
make clean
# remove the repository
cd ../..
rm -rf kolay-cli
Docker images can be removed with:
docker image prune -a
MIT