Adapters aren't a theoretical exercise. They solve concrete operational problems that come up any time you work with multiple embedding models or providers in production.
Serve queries with a lightweight local model against an index built with a heavier cloud provider's model. Get sub-10ms query embeddings without sacrificing retrieval quality.
Move from one embedding provider to another incrementally. No need to re-embed millions of documents at once — translate on the fly during the transition.
Handle rate limits, outages, and provider changes without breaking your retrieval pipeline. Route through local adapters when the cloud is unavailable.
Compare how different embedding providers behave on your data without committing to any single one. Sample multiple target spaces from a single source model, as sketched after these use cases.
Detect when a query falls outside what your local model handles well and route it to a stronger provider. Built-in confidence scoring tells you when to escalate.
Slash embedding API costs by running common queries through a free local model with an adapter and reserving cloud calls for the hardest cases.
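To make the mechanics concrete: at its simplest, an adapter is a learned map from the source model's space into a target space. The sketch below is illustrative only; the random matrices stand in for trained adapter weights, and the target names are just examples.

```python
# Illustrative only: the adapter weights below are random placeholders for maps
# that would normally be trained on paired embeddings from both models.
import numpy as np
from sentence_transformers import SentenceTransformer

source = SentenceTransformer("all-MiniLM-L6-v2")   # fast 384-dim local model
rng = np.random.default_rng(0)

# One placeholder adapter per target space you want to evaluate.
adapters = {
    "openai/text-embedding-3-small": rng.normal(size=(384, 1536)),
    "BAAI/bge-base-en-v1.5": rng.normal(size=(384, 768)),
}

query_vec = source.encode("how do I rotate API keys safely?")   # shape (384,)

# One cheap local embedding, projected into each candidate provider's space.
translated = {name: query_vec @ W for name, W in adapters.items()}
```

One local embedding gives you a candidate vector in every target space, which is what makes side-by-side provider comparison possible without re-embedding your corpus per provider.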
Your Pinecone index has 5M documents embedded with OpenAI text-embedding-3-small. You want to switch to a local model for cost and latency reasons, but re-embedding everything would take days and require downtime.
With an adapter, you embed new queries locally with MiniLM, translate them into OpenAI's space on the fly, and query your existing index directly. Zero downtime, zero re-indexing. Migrate at your own pace.
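A sketch of what that query path can look like, assuming you've trained a MiniLM-to-text-embedding-3-small adapter offline. The weight file, index name, and API key below are placeholders; the Pinecone calls use the standard v3 client.

```python
# Placeholder names throughout: the adapter weight file, index name, and API key
# are made up; the adapter itself is a linear map trained offline on paired
# MiniLM / text-embedding-3-small embeddings.
import numpy as np
from pinecone import Pinecone
from sentence_transformers import SentenceTransformer

local = SentenceTransformer("all-MiniLM-L6-v2")          # 384-dim, runs on CPU
W_adapter = np.load("minilm_to_te3small.npy")            # (384, 1536) trained weights
index = Pinecone(api_key="YOUR_KEY").Index("docs-prod")  # existing OpenAI-embedded index

def search(query: str, top_k: int = 10):
    q_local = local.encode(query)                        # cheap local embedding
    q_translated = q_local @ W_adapter                   # into OpenAI's space
    q_translated /= np.linalg.norm(q_translated)         # match the index's unit-norm vectors
    return index.query(vector=q_translated.tolist(), top_k=top_k, include_metadata=True)

hits = search("reset a forgotten password")
```

The normalization step matters if the index was built with cosine or dot-product similarity over OpenAI's unit-length embeddings; drop it if your index uses a different metric.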
You want most queries handled locally for speed and cost, but some queries are out-of-distribution for your adapter. Use `score_source()` to check per-query confidence and call the API only when the adapter isn't sure.
Result: 90% of queries resolve locally in under 2 ms, and the remaining 10% fall back to the API. Blended cost drops by about 90%, and median latency is roughly 50× lower.
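A routing sketch along those lines. `score_source()` comes from the text above, but its exact signature here (source-space vector in, confidence in [0, 1] out) is an assumption, as are the adapter object, the 0.8 threshold, and the OpenAI fallback call.

```python
# score_source() is named above; its signature here (source-space vector in,
# confidence in [0, 1] out) is an assumption, as are the adapter object and
# the 0.8 threshold.
from openai import OpenAI
from sentence_transformers import SentenceTransformer

local = SentenceTransformer("all-MiniLM-L6-v2")
cloud = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed_for_search(query: str, adapter, threshold: float = 0.8):
    q_local = local.encode(query)
    if adapter.score_source(q_local) >= threshold:
        return adapter.translate(q_local)        # local path: ~2 ms, no API call
    # Out-of-distribution for the adapter: escalate to the cloud model.
    resp = cloud.embeddings.create(model="text-embedding-3-small", input=query)
    return resp.data[0].embedding
```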
Different teams in your org use different embedding models: one uses BGE, another OpenAI, a third E5. With adapters, you can define a single "canonical" embedding space and translate everything into it.
Treat embedding space as a contract, not an implementation detail. Each team keeps its own model; adapters handle the translation layer.
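One way to sketch that contract, assuming a trained adapter per team model into a shared space. The 1024-dimensional canonical space and the placeholder matrices below are illustrative, not a real configuration.

```python
# Illustrative configuration: the 1024-dim canonical space and the random
# matrices (placeholders for trained adapters) are not a real setup.
import numpy as np

CANONICAL_DIM = 1024
rng = np.random.default_rng(0)

# One adapter per team model, all mapping into the shared canonical space.
to_canonical = {
    "BAAI/bge-base-en-v1.5": rng.normal(size=(768, CANONICAL_DIM)),
    "openai/text-embedding-3-small": rng.normal(size=(1536, CANONICAL_DIM)),
    "intfloat/e5-base-v2": rng.normal(size=(768, CANONICAL_DIM)),
}

def to_shared_space(vec: np.ndarray, source_model: str) -> np.ndarray:
    """Translate a team's native embedding into the org-wide canonical space."""
    out = vec @ to_canonical[source_model]
    return out / np.linalg.norm(out)   # keep the shared index on unit vectors
```

A new model joins by training one adapter into the canonical space, rather than every team re-embedding against every other team's model.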