Architecture: EVA AIOps Platform

3.1 Product layers

EVAIOps is built as five functional layers on top of a polyglot storage tier. Together they turn raw infrastructure data into actionable operational intelligence:

┌─────────────────────────────────────────────────────────────────────┐
│  LAYER 5: PRESENTATION AND AUTOMATION                              │
│  NOC Dashboard · Reports · EOE Flow Designer · TRE Rule Builder    │
├─────────────────────────────────────────────────────────────────────┤
│  LAYER 4: GENERATIVE AI                                            │
│  Chatbot LLM · AI Narrator · Incident Explainer · NL Query         │
├─────────────────────────────────────────────────────────────────────┤
│  LAYER 3: MACHINE LEARNING                                         │
│  Anomaly Detection · RCA Prediction · Pattern Mining               │
├─────────────────────────────────────────────────────────────────────┤
│  LAYER 2: CORRELATION AND TOPOLOGY                                 │
│  Correlation Engine (Union-Find) · Topology Inference · Impact     │
├─────────────────────────────────────────────────────────────────────┤
│  LAYER 1: INGESTION AND PROCESSING                                 │
│  Edge Collectors · Kafka Pipeline · Zabbix Sync · Normalization    │
├─────────────────────────────────────────────────────────────────────┤
│  POLYGLOT STORAGE                                                  │
│  PostgreSQL · Redis · Neo4j · OpenSearch · MinIO                   │
└─────────────────────────────────────────────────────────────────────┘
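
Layer 2 states that the Correlation Engine is based on Union-Find. The sketch below shows how a disjoint-set structure can merge alerts into incident groups. It assumes a hypothetical linking rule: two alerts are linked when they come from the same host and the gap between them is no larger than a fixed time window. `Alert`, `DisjointSet`, `correlate` and the 300 s default window are illustrative names and values. They are not the engine's actual rules or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    # Hypothetical minimal alert shape, for illustration only.
    alert_id: str
    host: str
    timestamp: float  # seconds


class DisjointSet:
    """Union-Find with path halving and union by size."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x: int) -> int:
        # Path halving: point each visited node at its grandparent on the way up.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the smaller tree under the larger root
        self.size[ra] += self.size[rb]


def correlate(alerts: list[Alert], window_s: float = 300.0) -> list[list[str]]:
    """Group alerts from the same host that are within window_s of their predecessor.

    Assumed rule for illustration; a real engine would also use topology and rules.
    """
    ds = DisjointSet(len(alerts))
    # Sorting by (host, timestamp) makes linkable alerts adjacent.
    order = sorted(range(len(alerts)), key=lambda i: (alerts[i].host, alerts[i].timestamp))
    for prev, cur in zip(order, order[1:]):
        a, b = alerts[prev], alerts[cur]
        if a.host == b.host and b.timestamp - a.timestamp <= window_s:
            ds.union(prev, cur)
    groups: dict[int, list[str]] = {}
    for i, alert in enumerate(alerts):
        groups.setdefault(ds.find(i), []).append(alert.alert_id)
    return sorted(groups.values())


alerts = [Alert("a1", "sw-01", 0), Alert("a2", "sw-01", 120),
          Alert("a3", "rtr-02", 130), Alert("a4", "sw-01", 1000)]
print(correlate(alerts))  # [['a1', 'a2'], ['a3'], ['a4']]
```

Union is transitive. With a 300 s window, alerts at 0 s, 250 s and 500 s on the same host therefore end up in a single group, even though the first and last are 500 s apart. This chaining, combined with near-constant amortized cost per `find`/`union`, is what makes Union-Find a natural fit for incremental grouping.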

3.2 Architectural pattern

EVAIOps implements a hybrid architecture that combines event-driven processing with microservices:

Principle	Implementation
Event-Driven Ingestion	Kafka as the central bus; 4 consumers in a sequential pipeline
Partial CQRS	Writes flow through Kafka into PostgreSQL; reads go to the Redis cache first and fall back to PostgreSQL
Specialized Microservices	API, Workers, Consumers, EOE, Billing and Gateway run as independent processes
Edge Computing	Collector Agent deployable on-premises, with a local SQLite buffer
Polyglot Persistence	PostgreSQL, Neo4j, OpenSearch, Redis and MinIO, each chosen according to the data type
AI-First Design	Every incident passes through 3 intelligence layers: ML, LLM and correlation rules
Native Multi-Tenancy	`tenant_id` on every ORM model and in every query
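
The Partial CQRS and multi-tenancy rows meet on the read path. Reads try Redis first and fall back to PostgreSQL, and every lookup is scoped by `tenant_id`. The following is a minimal cache-aside sketch. It assumes a hypothetical key scheme (`tenant:{tenant_id}:incident:{incident_id}`) and a TTL. An in-memory dict stands in for Redis and a plain function stands in for the PostgreSQL query, so the example runs without either service. `TenantScopedReadModel`, `fake_db_query` and the `incidents` table are illustrative names only.

```python
import time
from typing import Callable, Optional

Row = dict[str, object]


class TenantScopedReadModel:
    """Cache-aside reader: cache first, database on a miss, always scoped by tenant_id."""

    def __init__(self, db_query: Callable[[str, str], Optional[Row]], ttl_s: float = 60.0) -> None:
        # In-memory stand-in for Redis: key -> (expires_at, row).
        self._cache: dict[str, tuple[float, Row]] = {}
        self._db_query = db_query
        self._ttl_s = ttl_s

    @staticmethod
    def _key(tenant_id: str, incident_id: str) -> str:
        # Hypothetical key scheme; the tenant prefix keeps tenants from sharing entries.
        return f"tenant:{tenant_id}:incident:{incident_id}"

    def get_incident(self, tenant_id: str, incident_id: str) -> Optional[Row]:
        key = self._key(tenant_id, incident_id)
        entry = self._cache.get(key)
        if entry is not None and entry[0] > time.monotonic():
            return entry[1]  # cached path
        row = self._db_query(tenant_id, incident_id)  # database path
        if row is not None:
            self._cache[key] = (time.monotonic() + self._ttl_s, row)
        return row

    def invalidate(self, tenant_id: str, incident_id: str) -> None:
        # Call after a write is committed to PostgreSQL so the next read is fresh.
        self._cache.pop(self._key(tenant_id, incident_id), None)


# Fake PostgreSQL for the demo; the table and columns are hypothetical.
FAKE_DB = {("acme", "inc-1"): {"tenant_id": "acme", "id": "inc-1", "status": "open"}}
calls: list[tuple[str, str]] = []


def fake_db_query(tenant_id: str, incident_id: str) -> Optional[Row]:
    # Stands in for: SELECT * FROM incidents WHERE tenant_id = %s AND id = %s
    calls.append((tenant_id, incident_id))
    return FAKE_DB.get((tenant_id, incident_id))
```

The tenant prefix in the cache key matters as much as the `tenant_id` filter in the query. Without it, two tenants with the same incident ID would share a cache entry.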

3.3 End-to-end data flow

┌──────────────┐   HTTPS+JWT   ┌───────────────────┐   Kafka produce
│ Edge         │ ─────────────▶│ Ingestion Gateway │ ──────────────────┐
│ Collector    │               │ :8080             │                   │
│ (Syslog/SNMP │               └───────────────────┘                   ▼
│  Webhook/    │                                              ┌─────────────────┐
│  Zabbix)     │   JSON-RPC pull (15min)                     │  Apache Kafka   │
└──────────────┘ ──────────────────────────────────────────▶ │  KRaft :9092    │
                  Celery: sync_all_datasources                │                 │
                                                              └────────┬────────┘
                                                                       │
                                          ┌────────────────────────────┼────────────────────────────┐
                                          ▼                            ▼                            ▼
                               ┌──────────────────┐        ┌──────────────────┐       (parallel sinks)
                               │  Normalization   │        │   Enrichment     │
                               │  Consumer        │──────▶ │   Consumer       │──────▶ ┌────────────┐
                               │  Syslog/SNMP →  │        │  tenant_id +     │        │  Routing   │──▶ OpenSearch
                               │  canonical fmt   │        │  source_agent    │        │  Consumer  │──▶ PostgreSQL
                               └──────────────────┘        └──────────────────┘        └────────────┘
                                                                       │
                                                                       ▼
                                                           ┌────────────────────┐
                                                           │  AIOps Adapter     │
                                                           │  Kafka → Celery    │──▶ Correlation Engine
                                                           └────────────────────┘        │
                                                                                         ▼
                                                                                ┌─────────────────┐
                                                                                │ Incident Manager│──▶ PostgreSQL
                                                                                └─────────────────┘
                                                                                         │
                                                                        ┌────────────────┼────────────────┐
                                                                        ▼                ▼                ▼
                                                                   ML Workers      LLM Layer         EOE/TRE
                                                                    (Anomaly,       (Chatbot,         (Automation)
                                                                    RCA, Pattern)   Narrator, NL)
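
The Normalization and Enrichment consumers turn heterogeneous input into a single canonical event before it reaches routing. The following is a minimal sketch of those two stages as pure functions. It assumes a BSD-style (RFC 3164) syslog line with a `<PRI>` prefix and a hypothetical canonical schema (`source`, `facility`, `severity`, `host`, `message`). `tenant_id` and `source_agent` are added during enrichment. The Kafka consume/produce loop around each stage is omitted. The field names, and the `acme`/`collector-01` values, are illustrative and do not represent the platform's actual format.

```python
import re
from typing import Any

# BSD syslog (RFC 3164) shape: <PRI>Mmm dd hh:mm:ss HOSTNAME MSG
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<ts>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<msg>.*)$"
)

# Severity names indexed by the numeric code 0..7.
SEVERITIES = ["emergency", "alert", "critical", "error",
              "warning", "notice", "info", "debug"]


def normalize_syslog(line: str) -> dict[str, Any]:
    """Normalization stage: raw syslog line -> canonical event (hypothetical schema)."""
    m = SYSLOG_RE.match(line)
    if m is None:
        raise ValueError(f"unparseable syslog line: {line!r}")
    pri = int(m["pri"])
    if pri > 191:  # facility 0..23 and severity 0..7 give PRI 0..191
        raise ValueError(f"invalid PRI value: {pri}")
    return {
        "source": "syslog",
        "facility": pri // 8,  # PRI = facility * 8 + severity
        "severity": SEVERITIES[pri % 8],
        "host": m["host"],
        "timestamp_raw": m["ts"],  # RFC 3164 timestamps carry no year or timezone
        "message": m["msg"],
    }


def enrich(event: dict[str, Any], tenant_id: str, source_agent: str) -> dict[str, Any]:
    """Enrichment stage: add tenant and agent identity, returning a new dict."""
    return {**event, "tenant_id": tenant_id, "source_agent": source_agent}


raw = "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8"
print(enrich(normalize_syslog(raw), tenant_id="acme", source_agent="collector-01"))
```

Because each stage in this sketch is a pure function of its input, it can be tested in isolation and retried without side effects. The consumer wrapping each stage would then only deserialize the message, call the function, and produce the result to the next topic.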

3.4 Latency per segment (estimated)

Segment	Typical latency	SLA target
Edge → Gateway	50–200 ms	< 500 ms
Gateway → Kafka	< 5 ms	< 10 ms
Normalization + Enrichment	< 20 ms	< 50 ms
Routing → OpenSearch	10–50 ms	< 100 ms
Routing → PostgreSQL	5–20 ms	< 50 ms
API (cached path)	2–10 ms	< 50 ms
API (database query)	20–200 ms	< 500 ms
Correlation Engine	100 ms – 2 s	< 5 s
ML Inference (RCA)	50–500 ms	< 2 s
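
Summing the per-segment SLA targets gives a rough upper bound on the time an event takes to travel from the edge until it is indexed or stored. This bound holds only if every segment meets its target and Kafka consumer lag stays negligible, because queueing time is not a row in the table. The sketch below shows that arithmetic. The numbers come from the table above. The per-sink path definitions are an assumption read off the data-flow diagram.

```python
# SLA targets in milliseconds, copied from the latency table.
SLA_MS = {
    "edge_to_gateway": 500,
    "gateway_to_kafka": 10,
    "normalization_enrichment": 50,
    "routing_to_opensearch": 100,
    "routing_to_postgresql": 50,
}

# Sequential segments per sink, following the data-flow diagram (assumed).
PATHS = {
    "opensearch": ["edge_to_gateway", "gateway_to_kafka",
                   "normalization_enrichment", "routing_to_opensearch"],
    "postgresql": ["edge_to_gateway", "gateway_to_kafka",
                   "normalization_enrichment", "routing_to_postgresql"],
}


def ingest_budget_ms(sink: str) -> int:
    """End-to-end ingest latency if every segment takes exactly its SLA target."""
    return sum(SLA_MS[segment] for segment in PATHS[sink])


for sink in PATHS:
    print(f"{sink}: {ingest_budget_ms(sink)} ms")  # opensearch: 660 ms, postgresql: 610 ms
```

Under these targets, the edge hop accounts for 500 ms of the 610 to 660 ms budget. The WAN link between the collector and the gateway therefore dominates end-to-end ingest latency.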

3.5 Full architecture documentation

docs/architecture/01-technical-overview.md: complete technical documentation (~100 KB), with Mermaid diagrams of every flow
docs/modules/index.md: module reference, including environment variables and status
docs/architecture/02-arquitectura-objetivo.md: long-term target architecture