
System Architecture Overview

OpenModels is a hybrid infrastructure layer for discovering, validating, and comparing LLMs and the inference providers that serve them. The system follows a registry-first, model-centric architecture where:

  • Open Registry — YAML files in a public GitHub repository serve as the single source of truth for models, providers, and mappings
  • Proprietary Platform — Internal services provide telemetry, analytics, search, and a web interface
  • Model-Centric Design — Users search for canonical models (e.g., deepseek-v3, qwen3-coder) and discover which providers offer them

High-Level Architecture

Key Design Principles

| Principle | Description |
| --- | --- |
| Registry as Source of Truth | All model, provider, and mapping data originates from validated YAML files |
| Validation-First | Automated validation prevents invalid data from entering the system |
| Performance Through Caching | Redis caching reduces database load and improves response times |
| Observability | Comprehensive telemetry tracks provider health, latency, and availability |
| Developer Experience | Clear APIs, interactive documentation, and intuitive web interface |

Technology Stack

| Component | Technology | Justification |
| --- | --- | --- |
| Registry | YAML + JSON Schema | Human-readable, version-controlled, schema-validated |
| Validation | GitHub Actions + Python | Automated CI/CD, native GitHub integration |
| Platform API | NestJS (TypeScript) | Type-safe, modular, excellent OpenAPI support |
| Web Interface | Next.js 16 (React) | SSR/SSG for SEO, App Router, modern DX |
| Telemetry Worker | Python + Celery | Async task execution, robust scheduling |
| Database | PostgreSQL 15+ | JSONB support, full-text search, reliability |
| Cache | Redis 7+ | High-performance caching, pub/sub for invalidation |
| API Documentation | OpenAPI 3.1 + Swagger UI | Interactive docs, code generation support |

System Components

1. Public Registry

The registry is a public GitHub repository containing YAML definitions for all models, providers, and mappings. It serves as the canonical source of truth for the entire platform.

```
openmodels/
├── models/      # Model definitions (e.g., deepseek-v3.yaml)
├── providers/   # Provider definitions (e.g., openai.yaml)
├── mappings/    # Model-to-provider mappings with pricing
│   ├── openai/
│   ├── anthropic/
│   └── ...
└── schemas/     # JSON Schema definitions for validation
```
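
For illustration, a model definition might look like the following. The field names here are hypothetical; the authoritative shape is defined by the JSON Schemas in schemas/.

```yaml
# models/deepseek-v3.yaml — illustrative only; actual fields are schema-defined
id: deepseek-v3
name: DeepSeek-V3
family: deepseek
context_window: 131072
modalities:
  - text
license: mit
```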

2. Validation Pipeline

A GitHub Actions workflow that runs on every pull request to the registry:

  1. YAML Syntax — Validates that all files are parseable YAML
  2. Schema Validation — Checks files against JSON Schema definitions
  3. Referential Integrity — Ensures mappings reference existing models and providers
  4. Duplicate Detection — Prevents duplicate model or provider IDs
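
The referential-integrity and duplicate checks (steps 3 and 4) can be sketched in a few lines of Python. This operates on already-parsed registry entries; file loading, YAML parsing, and schema validation are omitted, and the function and field names are illustrative rather than the actual workflow code.

```python
def find_registry_errors(models, providers, mappings):
    """Return a list of human-readable validation errors (empty list = valid)."""
    errors = []

    # Duplicate detection: model and provider IDs must be unique.
    for kind, entries in (("model", models), ("provider", providers)):
        seen = set()
        for entry in entries:
            if entry["id"] in seen:
                errors.append(f"duplicate {kind} id: {entry['id']}")
            seen.add(entry["id"])

    # Referential integrity: every mapping must point at a known
    # model and a known provider.
    model_ids = {m["id"] for m in models}
    provider_ids = {p["id"] for p in providers}
    for mapping in mappings:
        if mapping["model_id"] not in model_ids:
            errors.append(f"mapping references unknown model: {mapping['model_id']}")
        if mapping["provider_id"] not in provider_ids:
            errors.append(f"mapping references unknown provider: {mapping['provider_id']}")

    return errors
```

In CI, the workflow would exit non-zero when this list is non-empty, failing the pull request.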

3. Ingestion Script

A Python script triggered on merge to main that loads registry data into PostgreSQL:

  • Uses upsert logic (INSERT ... ON CONFLICT UPDATE) for idempotent operations
  • Maintains transaction boundaries for data consistency
  • Invalidates affected Redis cache keys after updates
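
A minimal sketch of the idempotent upsert is shown below. Production uses PostgreSQL; sqlite3 stands in here only because it shares the INSERT ... ON CONFLICT syntax. Table and column names are illustrative.

```python
import sqlite3

def upsert_models(conn, models):
    # One transaction per batch keeps the registry snapshot consistent.
    with conn:
        conn.executemany(
            """
            INSERT INTO models (id, name)
            VALUES (:id, :name)
            ON CONFLICT (id) DO UPDATE SET name = excluded.name
            """,
            models,
        )
    # After commit, the real script would invalidate the affected Redis
    # cache keys (e.g., the model's detail and list entries).

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE models (id TEXT PRIMARY KEY, name TEXT)")
upsert_models(conn, [{"id": "deepseek-v3", "name": "DeepSeek-V3"}])
# Re-running updates in place instead of failing on the duplicate key:
upsert_models(conn, [{"id": "deepseek-v3", "name": "DeepSeek-V3.1"}])
```

Because the operation is an upsert, re-running ingestion after a partial failure converges to the same database state.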

4. Platform API

A NestJS REST API providing model/provider discovery and telemetry endpoints:

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /api/models | List models with filtering and search |
| GET | /api/models/{id} | Get model details |
| GET | /api/models/{id}/providers | List providers for a model |
| GET | /api/models/{id}/compare | Compare providers for a model |
| GET | /api/providers | List all providers |
| GET | /api/providers/{id} | Get provider details |
| GET | /api/telemetry/health/{provider_id} | Provider health status |
| GET | /api/telemetry/latency/{provider_id} | Provider latency metrics |
| GET | /api/telemetry/ranked/{model_id} | Ranked providers by performance |
| GET | /api/health | System health check |
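
As a usage illustration, a client composes URLs from a model ID plus optional query filters. The base URL and the query parameter names below are assumptions for the sketch, not documented values.

```python
from urllib.parse import urlencode

BASE = "https://api.openmodels.example"  # hypothetical base URL

def model_compare_url(model_id, **filters):
    """URL for GET /api/models/{id}/compare with optional query filters."""
    query = f"?{urlencode(filters)}" if filters else ""
    return f"{BASE}/api/models/{model_id}/compare{query}"

# e.g. model_compare_url("deepseek-v3", sort="latency")
```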

5. Telemetry Worker

A Python Celery worker that monitors provider health and latency:

  • Health probes — Every 5 minutes, checks provider API availability
  • Latency probes — Every 15 minutes, measures time-to-first-token and total response time
  • Results are stored in PostgreSQL and exposed via the Platform API
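
The latency measurement above can be sketched as follows. The real worker runs this as a scheduled Celery task against each provider's streaming API; the token stream here is a stand-in, and all names are illustrative.

```python
import time

def measure_latency(stream_tokens):
    """Return (time_to_first_token, total_time) in seconds for a token stream."""
    start = time.monotonic()
    ttft = None
    for _ in stream_tokens():
        if ttft is None:
            # First token arrived: record time-to-first-token.
            ttft = time.monotonic() - start
    total = time.monotonic() - start
    return ttft, total

def fake_stream():
    """Stand-in for a provider's streaming response."""
    for _ in range(3):
        time.sleep(0.01)
        yield "tok"
```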

6. Web Interface

A Next.js 16 application providing a user-facing interface for:

  • Searching and browsing models
  • Comparing providers side-by-side (pricing, latency, uptime)
  • Viewing real-time telemetry dashboards

7. Caching Layer

Redis provides high-performance caching with key-based invalidation:

  • Model/Provider queries — 5-minute TTL
  • Telemetry data — 1-minute TTL
  • Pattern-based invalidation — When data changes, affected cache keys are cleared
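
A toy in-process sketch of this strategy is below. Production uses Redis (where pattern invalidation is typically SCAN plus DEL); the key names and TTLs mirror the values above, but the class and its API are illustrative.

```python
import fnmatch
import time

class TTLCache:
    """In-memory stand-in for the Redis cache: per-key TTLs + glob invalidation."""

    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl):
        self._store[key] = (value, time.monotonic() + ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None or entry[1] < time.monotonic():
            self._store.pop(key, None)  # expired or missing
            return None
        return entry[0]

    def invalidate(self, pattern):
        """Drop every key matching a glob pattern, e.g. 'models:*'."""
        for key in [k for k in self._store if fnmatch.fnmatch(k, pattern)]:
            del self._store[key]

cache = TTLCache()
cache.set("models:deepseek-v3", {"id": "deepseek-v3"}, ttl=300)  # 5-minute TTL
cache.set("telemetry:openai:latency", {"p50_ms": 420}, ttl=60)   # 1-minute TTL
cache.invalidate("models:*")  # e.g., after re-ingesting the registry
```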

System Boundaries

In Scope

  • Registry management (YAML files, schemas, validation)
  • Data ingestion pipeline (YAML → PostgreSQL)
  • REST API for model/provider discovery and comparison
  • Telemetry collection (health probes, latency monitoring)
  • Web interface for browsing and comparing models
  • Caching layer for performance optimization

Out of Scope

  • Actual LLM inference (delegated to providers)
  • User authentication and authorization (future phase)
  • Billing and payment processing
  • Model fine-tuning or training

Related Documentation

  • Data Flow — Detailed sequence diagrams for registry contributions and model discovery
  • Schemas — YAML schema definitions for models, providers, and mappings