feat: LLM routing by tier (free→Ollama, pro→Timeweb)

- Add tier-based provider routing in llm-svc (sketched in Go after this list)
  - free tier → Ollama (local qwen3.5:9b)
  - pro/business → Timeweb Cloud AI
- Add /api/v1/embed endpoint for embeddings via Ollama
- Update Ollama client: qwen3.5:9b default, remove auth
- Add GenerateEmbedding() function for qwen3-embedding:0.6b (see the embedding sketch after this list)
- Add Ollama K8s deployment with GPU support (RTX 4060 Ti)
- Add monitoring stack (Prometheus, Grafana, Alertmanager)
- Add Grafana dashboards for LLM and security metrics
- Update deploy.sh with monitoring and Ollama deployment
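
The tier→provider mapping is small enough to sketch. Below is a minimal Go illustration of the routing described above; the `Provider` type, the in-cluster Ollama URL, and the Timeweb base URL are assumptions for illustration, not the actual llm-svc code.

```go
package main

import "fmt"

// Provider is a hypothetical descriptor for an upstream LLM backend.
type Provider struct {
	Name    string
	BaseURL string
	Model   string
}

// routeByTier maps a user tier to a provider: free goes to the local
// Ollama instance, pro/business to Timeweb Cloud AI.
func routeByTier(tier string) (Provider, error) {
	switch tier {
	case "free":
		return Provider{
			Name:    "ollama",
			BaseURL: "http://ollama:11434", // assumed in-cluster service name
			Model:   "qwen3.5:9b",
		}, nil
	case "pro", "business":
		return Provider{
			Name:    "timeweb",
			BaseURL: "https://api.timeweb.cloud", // placeholder URL
		}, nil
	default:
		return Provider{}, fmt.Errorf("unknown tier %q", tier)
	}
}

func main() {
	for _, tier := range []string{"free", "pro", "business"} {
		p, err := routeByTier(tier)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-8s -> %s (%s)\n", tier, p.Name, p.Model)
	}
}
```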
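Similarly, a hedged sketch of what GenerateEmbedding() might look like against Ollama's /api/embeddings endpoint: the function signature, request/response types, and base URL are assumptions; only the model name qwen3-embedding:0.6b comes from this commit.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

type embedRequest struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
}

type embedResponse struct {
	Embedding []float64 `json:"embedding"`
}

// GenerateEmbedding embeds text via the local Ollama instance using
// Ollama's /api/embeddings endpoint (no auth, matching the commit).
func GenerateEmbedding(baseURL, text string) ([]float64, error) {
	body, err := json.Marshal(embedRequest{
		Model:  "qwen3-embedding:0.6b",
		Prompt: text,
	})
	if err != nil {
		return nil, err
	}
	resp, err := http.Post(baseURL+"/api/embeddings", "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("ollama returned %s", resp.Status)
	}
	var out embedResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return out.Embedding, nil
}

func main() {
	vec, err := GenerateEmbedding("http://ollama:11434", "hello world")
	if err != nil {
		fmt.Println("embed failed:", err)
		return
	}
	fmt.Println("dimensions:", len(vec))
}
```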

Made-with: Cursor
home · 2026-03-03 02:25:22 +03:00
parent 5ac082a7c6 · commit 7a40ff629e
19 changed files with 1759 additions and 35 deletions


@@ -0,0 +1,38 @@
#!/bin/bash
# Script for pulling models into Ollama
# Run ONCE, after the first deploy
# Models are stored in the PVC and do not need to be downloaded again
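# Usage: NAMESPACE=<namespace> ./<this-script> [model ...]
#   (defaults below: NAMESPACE=gooseek, model llama3.2:3b)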
set -e
NAMESPACE="${NAMESPACE:-gooseek}"
MODELS="${*:-llama3.2:3b}"
echo "=== Ollama Model Loader ==="
echo "Namespace: $NAMESPACE"
echo "Models: $MODELS"
# Check that the Ollama pod is running
echo ""
echo "Checking Ollama pod status..."
kubectl -n "$NAMESPACE" wait --for=condition=ready pod -l app=ollama --timeout=120s
# Get the pod name
POD=$(kubectl -n "$NAMESPACE" get pod -l app=ollama -o jsonpath='{.items[0].metadata.name}')
echo "Pod: $POD"
# Pull the models
for MODEL in $MODELS; do
    echo ""
    echo "=== Pulling model: $MODEL ==="
    kubectl -n "$NAMESPACE" exec -it "$POD" -c ollama -- ollama pull "$MODEL"
done
# Show the list of installed models
echo ""
echo "=== Installed models ==="
kubectl -n "$NAMESPACE" exec -it "$POD" -c ollama -- ollama list
echo ""
echo "=== Done! ==="
echo "Models are stored in PVC and will persist across restarts."