Data Flow

This page describes the primary data flows in OpenModels — how data moves from community contributions through validation and ingestion into the platform, how users discover models via the API, and how telemetry is collected from providers.

Registry Contribution Flow

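When a contributor adds or updates a model, provider, or mapping in the registry, the change passes through two stages: the validation pipeline checks the pull request, and after the pull request is merged into main, the ingestion script writes the change to the platform database.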

Validation Steps

The validation pipeline performs the following checks on every pull request:

| Step | Check | Failure Behavior |
|---|---|---|
| 1 | YAML syntax parsing | Rejects malformed YAML files |
| 2 | JSON Schema validation | Rejects files that don’t match schema definitions |
| 3 | Referential integrity | Rejects mappings referencing non-existent models or providers |
| 4 | Duplicate detection | Rejects duplicate model or provider IDs |
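Steps 3 and 4 (referential integrity and duplicate detection) can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: it assumes the YAML files have already been parsed and schema-validated into dicts, and the field names id, model_id, and provider_id, as well as the function names, are hypothetical.

```python
# Hypothetical sketch of validation steps 3 and 4 on already-parsed records.
# Field names (id, model_id, provider_id) are assumptions, not the real schema.
from collections import Counter


def check_references(models: list[dict], providers: list[dict], mappings: list[dict]) -> list[str]:
    """Step 3: every mapping must point at an existing model and provider."""
    model_ids = {m["id"] for m in models}
    provider_ids = {p["id"] for p in providers}
    errors = []
    for mapping in mappings:
        if mapping["model_id"] not in model_ids:
            errors.append(f"unknown model: {mapping['model_id']}")
        if mapping["provider_id"] not in provider_ids:
            errors.append(f"unknown provider: {mapping['provider_id']}")
    return errors


def check_duplicates(kind: str, records: list[dict]) -> list[str]:
    """Step 4: no ID may appear more than once within one record type."""
    counts = Counter(r["id"] for r in records)
    return [f"duplicate {kind} id: {rid}" for rid, n in sorted(counts.items()) if n > 1]


def validate(models: list[dict], providers: list[dict], mappings: list[dict]) -> list[str]:
    """Run steps 3 and 4 in table order; an empty list means the pull request passes."""
    errors = check_references(models, providers, mappings)
    errors += check_duplicates("model", models)
    errors += check_duplicates("provider", providers)
    return errors
```

Collecting every error into a list, rather than stopping at the first failure, is one reasonable choice: a contributor sees all problems in a single CI run.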

Ingestion Details

The ingestion script runs on every push to main that modifies the models/, providers/, or mappings/ directory:

| Aspect | Detail |
|---|---|
| Trigger | Push to main branch (filtered by path) |
| Runtime | Python 3.11 on GitHub Actions |
| Strategy | Upsert (INSERT ... ON CONFLICT ... DO UPDATE) for idempotency |
| Transaction | Atomic: all changes succeed or all roll back |
| Cache invalidation | Pattern-based Redis key deletion after a successful write |
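The upsert, transaction, and invalidation steps fit together roughly as below. This sketch substitutes Python's built-in sqlite3 module for the platform's database (SQLite 3.24 and later accept the same INSERT ... ON CONFLICT ... DO UPDATE form) and a plain dict for Redis; the models table, its columns, the function names, and the api:models* key pattern are all hypothetical.

```python
# Hypothetical sketch: sqlite3 stands in for the real database, a dict for Redis.
import fnmatch
import sqlite3


def upsert_models(conn: sqlite3.Connection, models: list[dict]) -> None:
    """Upsert all rows in one transaction: either every row lands or none do."""
    with conn:  # commits on success; rolls back if any statement raises
        conn.executemany(
            "INSERT INTO models (id, name) VALUES (:id, :name) "
            "ON CONFLICT(id) DO UPDATE SET name = excluded.name",
            models,
        )


def invalidate(cache: dict, patterns: list[str]) -> int:
    """Delete keys matching any glob pattern and return how many were removed."""
    stale = [k for k in cache if any(fnmatch.fnmatchcase(k, p) for p in patterns)]
    for key in stale:
        del cache[key]
    return len(stale)


def run_ingestion(conn: sqlite3.Connection, cache: dict, models: list[dict]) -> int:
    """Write first, invalidate second: a failed write leaves the cache untouched."""
    upsert_models(conn, models)  # raises (after rollback) if any row is invalid
    return invalidate(cache, ["api:models*"])  # hypothetical key pattern
```

Because the upsert is idempotent, re-running ingestion on the same commit leaves the table unchanged, and because invalidation runs only after the commit, a rolled-back write never evicts cached responses. Against a real Redis server, pattern deletion is usually done by iterating SCAN with a MATCH pattern rather than calling KEYS, which blocks the server while it walks the whole keyspace.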

Model Discovery Flow

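Users search for models and retrieve model details through the API. API responses are cached in Redis; the table below lists the TTL and invalidation trigger for each endpoint.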

API Response Caching

| Endpoint Pattern | Cache TTL | Invalidation Trigger |
|---|---|---|
| /api/models | 5 minutes | Registry ingestion |
| /api/models/{id} | 5 minutes | Registry ingestion |
| /api/models/{id}/providers | 5 minutes | Registry ingestion |
| /api/providers | 5 minutes | Registry ingestion |
| /api/telemetry/* | 1 minute | New telemetry data |
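A cache-aside read path using these TTLs might look like the sketch below. The glob encoding of the endpoint patterns, the ResponseCache class, and the injectable clock are illustrative assumptions, and an in-process dict stands in for Redis.

```python
# Hypothetical cache-aside sketch; an in-process dict stands in for Redis.
import fnmatch
import time
from typing import Callable

# (glob pattern, TTL in seconds) mirroring the table; the first match wins.
TTL_RULES = [
    ("/api/telemetry/*", 60),
    ("/api/models*", 300),
    ("/api/providers*", 300),
]


def ttl_for(path: str) -> int:
    """Return the cache TTL for a request path, or 0 if the path is not cached."""
    for pattern, ttl in TTL_RULES:
        if fnmatch.fnmatchcase(path, pattern):
            return ttl
    return 0


class ResponseCache:
    """Serve fresh entries; otherwise call the loader and store its result."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: dict[str, tuple[float, object]] = {}  # path -> (expires_at, value)

    def get(self, path: str, load: Callable[[], object]) -> object:
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and now < entry[0]:
            return entry[1]  # fresh hit
        value = load()  # miss or expired: query the database
        ttl = ttl_for(path)
        if ttl:
            self._entries[path] = (now + ttl, value)
        return value

    def invalidate(self, pattern: str) -> None:
        """Drop every cached path matching a glob, e.g. after registry ingestion."""
        for path in [p for p in self._entries if fnmatch.fnmatchcase(p, pattern)]:
            del self._entries[path]
```

Injecting the clock lets tests advance time past a TTL without sleeping; the TTL bounds staleness for telemetry, while explicit invalidation handles registry changes immediately.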

Telemetry Collection Flow

The telemetry worker continuously probes providers for health and latency and stores the results in PostgreSQL.
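Latency probes record two timings per request: time to first token (TTFT) and total response time. One way a probe could measure both from a streamed completion is sketched below, assuming a hypothetical send_request callable that issues the request when called and yields response chunks as they arrive.

```python
# Hypothetical latency probe; send_request is an assumed callable, not a real API.
import time
from typing import Callable, Iterable, Optional


def probe_latency(
    send_request: Callable[[], Iterable[str]],
    clock: Callable[[], float] = time.perf_counter,
) -> tuple[Optional[float], float]:
    """Time one streamed completion and return (ttft_seconds, total_seconds).

    TTFT is None if the provider returned no chunks at all.
    """
    start = clock()  # taken before the request is sent
    ttft = None
    for _chunk in send_request():
        if ttft is None:
            ttft = clock() - start  # first chunk arrived
    total = clock() - start  # stream fully consumed
    return ttft, total
```

time.perf_counter is monotonic and high-resolution, so wall-clock adjustments on the worker host do not distort the measurements.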

Telemetry Metrics

| Metric | Collection Interval | Storage | Retention |
|---|---|---|---|
| Health status | Every 5 minutes | PostgreSQL | 30 days |
| Time to first token (TTFT) | Every 15 minutes | PostgreSQL | 30 days |
| Total response time | Every 15 minutes | PostgreSQL | 30 days |
| Availability (uptime %) | Computed from health records | Derived | Rolling 7 days |
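Availability is derived rather than stored. Assuming it is the fraction of successful health probes in the window (the exact formula is not given here), a sketch looks like this; HealthRecord and uptime_percent are hypothetical names. At one probe every 5 minutes, a full 7-day window holds 7 × 24 × 12 = 2,016 records per provider, and 30-day retention keeps 8,640.

```python
# Hypothetical availability sketch: uptime = successful probes / all probes in window.
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class HealthRecord:
    checked_at: datetime
    healthy: bool


def uptime_percent(
    records: list[HealthRecord], now: datetime, window: timedelta = timedelta(days=7)
) -> Optional[float]:
    """Share of health probes in the rolling window that succeeded, as a percentage."""
    recent = [r for r in records if now - window <= r.checked_at <= now]
    if not recent:
        return None  # no data: availability is unknown, not 0%
    return 100.0 * sum(r.healthy for r in recent) / len(recent)
```

Returning None for an empty window keeps a newly added provider from being ranked as permanently down before its first probes land.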

Provider Ranking Algorithm

When a user requests ranked providers for a model via GET /api/telemetry/ranked/{model_id}, the API computes a composite score:

| Factor | Weight | Source |
|---|---|---|
| Uptime percentage (7-day rolling) | 40% | Health probe records |
| Median latency (TTFT) | 30% | Latency probe records |
| Price per million tokens | 20% | Registry mapping data |
| Median total response time | 10% | Latency probe records |
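The table gives the weights but not how raw values such as percentages, milliseconds, and prices are put on a common scale. The sketch below assumes min-max normalization across the candidate providers, inverting the lower-is-better factors so that 1.0 is always best; the class, function, and field names are hypothetical.

```python
# Hypothetical ranking sketch; min-max normalization is an assumption, not documented.
from dataclasses import dataclass

# Weights from the table; they sum to 1.0, so scores fall in [0, 1].
WEIGHTS = {"uptime": 0.40, "ttft": 0.30, "price": 0.20, "total_time": 0.10}
HIGHER_IS_BETTER = {"uptime": True, "ttft": False, "price": False, "total_time": False}


@dataclass
class ProviderStats:
    provider_id: str
    uptime: float      # 7-day rolling uptime, percent
    ttft: float        # median time to first token, ms
    price: float       # price per million tokens
    total_time: float  # median total response time, ms


def normalize(values: list[float], higher_is_better: bool) -> list[float]:
    """Min-max scale to [0, 1] so that 1.0 is always the best value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)  # no spread: every provider ties on this factor
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def rank_providers(stats: list[ProviderStats]) -> list[tuple[str, float]]:
    """Return (provider_id, composite score) pairs, best first."""
    if not stats:
        return []
    scores = [0.0] * len(stats)
    for factor, weight in WEIGHTS.items():
        values = [getattr(s, factor) for s in stats]
        for i, norm in enumerate(normalize(values, HIGHER_IS_BETTER[factor])):
            scores[i] += weight * norm
    ranked = sorted(zip(stats, scores), key=lambda pair: pair[1], reverse=True)
    return [(s.provider_id, round(score, 4)) for s, score in ranked]
```

With min-max scaling, a provider's score depends on which other providers are in the candidate set, so adding or removing one provider can reorder the rest; a fixed reference scale would avoid that at the cost of choosing bounds by hand.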

End-to-End Data Lifecycle

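The complete lifecycle of data in OpenModels runs from contribution to user consumption. A contributor's pull request passes the validation pipeline; once merged, the ingestion script upserts the change and invalidates the affected cache keys; users then read the updated data through the cached API. In parallel, the telemetry worker records provider health and latency, which the API combines with registry pricing when ranking providers.

Related pages: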

  • Architecture Overview — High-level system architecture and component descriptions
  • Schemas — YAML schema definitions for models, providers, and mappings