Data Flow
This page describes the primary data flows in OpenModels: how community contributions move through validation and ingestion into the platform, how users discover models via the API, and how telemetry is collected from providers.
Registry Contribution Flow
When a contributor adds or updates a model, provider, or mapping in the registry, the following sequence occurs:
Validation Steps
The validation pipeline performs the following checks on every pull request:
| Step | Check | Failure Behavior |
|---|---|---|
| 1 | YAML syntax parsing | Rejects malformed YAML files |
| 2 | JSON Schema validation | Rejects files that don’t match schema definitions |
| 3 | Referential integrity | Rejects mappings referencing non-existent models or providers |
| 4 | Duplicate detection | Rejects duplicate model or provider IDs |
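As an illustration of steps 3 and 4, the following is a minimal sketch of how referential integrity and duplicate detection might be checked once the YAML files have been parsed into dictionaries. The field names (`id`, `model`, `provider`) and both function names are hypothetical; the actual registry schema and pipeline code may differ.

```python
from collections import Counter


def find_duplicate_ids(records):
    """Return the IDs that appear more than once (step 4)."""
    counts = Counter(r["id"] for r in records)
    return sorted(i for i, n in counts.items() if n > 1)


def find_broken_mappings(mappings, models, providers):
    """Return mappings that reference an unknown model or provider (step 3)."""
    model_ids = {m["id"] for m in models}
    provider_ids = {p["id"] for p in providers}
    return [
        m for m in mappings
        if m["model"] not in model_ids or m["provider"] not in provider_ids
    ]


# Hypothetical registry content, already parsed from YAML.
models = [{"id": "model-a"}, {"id": "model-b"}, {"id": "model-a"}]
providers = [{"id": "provider-x"}]
mappings = [
    {"model": "model-a", "provider": "provider-x"},
    {"model": "model-c", "provider": "provider-x"},  # model-c does not exist
]

duplicates = find_duplicate_ids(models)
broken = find_broken_mappings(mappings, models, providers)
```

A pull request check could fail whenever either result is non-empty.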
Ingestion Details
The ingestion script runs on every push to main that modifies the models/, providers/, or mappings/ directories:
| Aspect | Detail |
|---|---|
| Trigger | Push to main branch (filtered by path) |
| Runtime | Python 3.11 on GitHub Actions |
| Strategy | Upsert (INSERT ... ON CONFLICT ... DO UPDATE) for idempotency |
| Transaction | Atomic — all changes succeed or all roll back |
| Cache invalidation | Pattern-based Redis key deletion after successful write |
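The upsert and atomic transaction behavior can be sketched as follows. To stay self-contained, this example uses Python's built-in `sqlite3` module as a stand-in for the real database; the `models` table, its columns, and the `ingest` function are assumptions made for illustration only.

```python
import sqlite3


def ingest(conn, models):
    """Upsert all rows in one transaction; any error rolls everything back."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            """
            INSERT INTO models (id, name) VALUES (:id, :name)
            ON CONFLICT (id) DO UPDATE SET name = excluded.name
            """,
            models,
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE models (id TEXT PRIMARY KEY, name TEXT NOT NULL)")

# Re-running ingestion with overlapping IDs updates rows instead of failing.
ingest(conn, [{"id": "model-a", "name": "Model A"}])
ingest(conn, [{"id": "model-a", "name": "Model A v2"},
              {"id": "model-b", "name": "Model B"}])

# A batch containing one invalid row is rejected as a whole.
try:
    ingest(conn, [{"id": "model-c", "name": "Model C"},
                  {"id": "model-d", "name": None}])
except sqlite3.IntegrityError:
    pass

rows = conn.execute("SELECT id, name FROM models ORDER BY id").fetchall()
```

Because the failed batch is rolled back, `model-c` never becomes visible even though its row was valid on its own.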
Model Discovery Flow
When a user searches for models or retrieves model details through the API:
API Response Caching
| Endpoint Pattern | Cache TTL | Invalidation Trigger |
|---|---|---|
| /api/models | 5 minutes | Registry ingestion |
| /api/models/{id} | 5 minutes | Registry ingestion |
| /api/models/{id}/providers | 5 minutes | Registry ingestion |
| /api/providers | 5 minutes | Registry ingestion |
| /api/telemetry/* | 1 minute | New telemetry data |
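The combination of per-endpoint TTLs and pattern-based invalidation can be sketched with a small in-memory cache. This is not the platform's Redis code: the `TTLCache` class, its methods, and the key format are hypothetical, and an injectable clock is used only so that expiry can be demonstrated without waiting.

```python
import fnmatch
import time


class TTLCache:
    """In-memory stand-in for a TTL cache with pattern-based invalidation."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def set(self, key, value, ttl_seconds):
        self._entries[key] = (self._clock() + ttl_seconds, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # lazily evict expired entries
            return None
        return value

    def invalidate(self, pattern):
        """Delete every key matching a glob pattern; return how many."""
        doomed = [k for k in self._entries if fnmatch.fnmatchcase(k, pattern)]
        for key in doomed:
            del self._entries[key]
        return len(doomed)


now = [0.0]  # fake clock, in seconds
cache = TTLCache(clock=lambda: now[0])
cache.set("/api/models", ["model-a"], ttl_seconds=300)           # 5 minutes
cache.set("/api/telemetry/health", {"ok": True}, ttl_seconds=60)  # 1 minute

now[0] = 61.0
telemetry_after_61s = cache.get("/api/telemetry/health")  # expired
models_after_61s = cache.get("/api/models")               # still cached

removed = cache.invalidate("/api/models*")  # e.g. after registry ingestion
models_after_invalidation = cache.get("/api/models")
```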
Telemetry Collection Flow
The telemetry worker continuously monitors provider health and latency:
Telemetry Metrics
| Metric | Collection Interval | Storage | Retention |
|---|---|---|---|
| Health status | Every 5 minutes | PostgreSQL | 30 days |
| Time to first token (TTFT) | Every 15 minutes | PostgreSQL | 30 days |
| Total response time | Every 15 minutes | PostgreSQL | 30 days |
| Availability (uptime %) | Computed from health records | Derived | Rolling 7 days |
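Since availability is derived from health records rather than stored, one plausible way to compute it is the share of successful probes within the rolling window. The sketch below assumes each health record reduces to a `(timestamp, healthy)` pair; the function name and record shape are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def uptime_percent(records, now, window=timedelta(days=7)):
    """Percentage of healthy probes within the rolling window, or None."""
    cutoff = now - window
    recent = [healthy for ts, healthy in records if cutoff <= ts <= now]
    if not recent:
        return None  # no probes in the window, so uptime is undefined
    return 100.0 * sum(recent) / len(recent)


now = datetime(2024, 1, 8, tzinfo=timezone.utc)
records = [
    (now - timedelta(days=8), False),  # outside the 7-day window, ignored
    (now - timedelta(days=4), True),
    (now - timedelta(days=3), False),
    (now - timedelta(days=2), True),
    (now - timedelta(days=1), True),
]
uptime = uptime_percent(records, now)
```

Here three of the four probes inside the window succeeded, so `uptime` is 75.0.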
Provider Ranking Algorithm
When a user requests ranked providers for a model via GET /api/telemetry/ranked/{model_id}, the API computes a composite score:
| Factor | Weight | Source |
|---|---|---|
| Uptime percentage (7-day rolling) | 40% | Health probe records |
| Median latency (TTFT) | 30% | Latency probe records |
| Price per million tokens | 20% | Registry mapping data |
| Median total response time | 10% | Latency probe records |
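The weights above can be combined into a single score in several ways; the source does not specify how the factors are normalized. The sketch below assumes min-max normalization across the candidate providers, inverted for latency and price so that lower values score higher. The field names and the `rank_providers` function are hypothetical.

```python
# Weights taken from the table above.
WEIGHTS = {"uptime": 0.40, "ttft": 0.30, "price": 0.20, "total_time": 0.10}


def _inverted_min_max(values):
    """Scale to [0, 1] so the lowest value scores 1.0 (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    return [(hi - v) / (hi - lo) for v in values]


def rank_providers(providers):
    """Return (provider_id, score) pairs, best first."""
    if not providers:
        return []
    ttft = _inverted_min_max([p["ttft_ms"] for p in providers])
    price = _inverted_min_max([p["price_per_mtok"] for p in providers])
    total = _inverted_min_max([p["total_ms"] for p in providers])
    scored = []
    for i, p in enumerate(providers):
        score = (
            WEIGHTS["uptime"] * p["uptime_pct"] / 100.0
            + WEIGHTS["ttft"] * ttft[i]
            + WEIGHTS["price"] * price[i]
            + WEIGHTS["total_time"] * total[i]
        )
        scored.append((p["id"], round(score, 4)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical telemetry and pricing for two providers of the same model.
ranked = rank_providers([
    {"id": "provider-y", "uptime_pct": 95.0, "ttft_ms": 400,
     "price_per_mtok": 0.5, "total_ms": 2000},
    {"id": "provider-x", "uptime_pct": 99.0, "ttft_ms": 200,
     "price_per_mtok": 1.0, "total_ms": 1000},
])
```

In this example `provider-x` ranks first: its advantage in uptime and latency (80% of the total weight) outweighs `provider-y`'s lower price.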
End-to-End Data Lifecycle
The complete lifecycle of data in OpenModels from contribution to user consumption:
Related Pages
- Architecture Overview — High-level system architecture and component descriptions
- Schemas — YAML schema definitions for models, providers, and mappings