feat: LLM routing by tier (free→Ollama, pro→Timeweb)

- Add tier-based provider routing in llm-svc (see the routing sketch after this list)
  - free tier → Ollama (local qwen3.5:9b)
  - pro/business → Timeweb Cloud AI
- Add /api/v1/embed endpoint for embeddings via Ollama
- Update Ollama client: qwen3.5:9b default, remove auth
- Add GenerateEmbedding() function for qwen3-embedding:0.6b
- Add Ollama K8s deployment with GPU support (RTX 4060 Ti)
- Add monitoring stack (Prometheus, Grafana, Alertmanager)
- Add Grafana dashboards for LLM and security metrics
- Update deploy.sh with monitoring and Ollama deployment
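
As a reading aid, here is a minimal Go sketch of what the tier-based provider routing described above could look like. The Tier/Provider types and the ProviderFor function are hypothetical names used for illustration only, not the actual llm-svc code in this commit.

// Hypothetical sketch of tier-based provider selection; all names here
// (Tier, Provider, ProviderFor) are illustrative, not llm-svc internals.
package routing

type Tier string

const (
	TierFree     Tier = "free"
	TierPro      Tier = "pro"
	TierBusiness Tier = "business"
)

type Provider string

const (
	ProviderOllama  Provider = "ollama"  // local model via Ollama
	ProviderTimeweb Provider = "timeweb" // Timeweb Cloud AI
)

// ProviderFor routes pro/business requests to Timeweb Cloud AI and
// everything else (including unknown tiers) to the local Ollama instance.
func ProviderFor(t Tier) Provider {
	switch t {
	case TierPro, TierBusiness:
		return ProviderTimeweb
	default:
		return ProviderOllama
	}
}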

Made-with: Cursor
This commit is contained in: home
2026-03-03 02:25:22 +03:00
parent 5ac082a7c6
commit 7a40ff629e
19 changed files with 1759 additions and 35 deletions


@@ -79,8 +79,11 @@ type Config struct {
 	TimewebProxySource string
 	// Ollama (local LLM)
-	OllamaBaseURL  string
-	OllamaModelKey string
+	OllamaBaseURL        string
+	OllamaModelKey       string
+	OllamaEmbeddingModel string
+	OllamaNumParallel    int
+	OllamaAPIToken       string
 	// Timeouts
 	HTTPTimeout time.Duration
@@ -160,8 +163,11 @@ func Load() (*Config, error) {
 		TimewebAPIKey:      getEnv("TIMEWEB_API_KEY", ""),
 		TimewebProxySource: getEnv("TIMEWEB_X_PROXY_SOURCE", "gooseek"),
-		OllamaBaseURL:      getEnv("OLLAMA_BASE_URL", "http://ollama:11434"),
-		OllamaModelKey:     getEnv("OLLAMA_MODEL", "llama3.2"),
+		OllamaBaseURL:        getEnv("OLLAMA_BASE_URL", "http://ollama:11434"),
+		OllamaModelKey:       getEnv("OLLAMA_MODEL", "qwen3.5:9b"),
+		OllamaEmbeddingModel: getEnv("OLLAMA_EMBEDDING_MODEL", "qwen3-embedding:0.6b"),
+		OllamaNumParallel:    getEnvInt("OLLAMA_NUM_PARALLEL", 2),
+		OllamaAPIToken:       getEnv("OLLAMA_API_TOKEN", ""),
 		HTTPTimeout: time.Duration(getEnvInt("HTTP_TIMEOUT_MS", 60000)) * time.Millisecond,
 		LLMTimeout:  time.Duration(getEnvInt("LLM_TIMEOUT_MS", 120000)) * time.Millisecond,
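
For orientation, below is a rough Go sketch of how a GenerateEmbedding helper could use the new OllamaEmbeddingModel setting. It assumes Ollama's public /api/embeddings endpoint (model name and prompt in, a single embedding vector out); the function signature, request/response structs, and parameter names are assumptions for illustration, not the code added in this commit.

// Illustrative sketch only; not the GenerateEmbedding from this commit.
// Assumes Ollama's /api/embeddings endpoint, which accepts a model name
// and a prompt and returns one embedding vector.
package ollama

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
)

type embeddingRequest struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
}

type embeddingResponse struct {
	Embedding []float64 `json:"embedding"`
}

// GenerateEmbedding asks the configured Ollama instance (e.g. the
// qwen3-embedding:0.6b default above) for an embedding of the given text.
func GenerateEmbedding(ctx context.Context, baseURL, model, text string) ([]float64, error) {
	body, err := json.Marshal(embeddingRequest{Model: model, Prompt: text})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, baseURL+"/api/embeddings", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("ollama embeddings: unexpected status %s", resp.Status)
	}

	var out embeddingResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return out.Embedding, nil
}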