Each adapter is a compact neural network that translates one embedding space into another. The key isn't just the translation — it's knowing when the translation is reliable. A built-in quality head scores every query, enabling smart routing: use the adapter when confident, call the provider when not.
Generate embeddings using your local model as you normally would. No changes to your existing pipeline.
The adapter translates vectors into the target space and scores confidence in the same forward pass. One number tells you how reliable this translation is.
High-confidence queries go straight to your index with no API cost. Low-confidence queries route to the provider: one API call, only when needed.
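The three steps above can be sketched as a small routing function. This is a minimal illustration rather than the library's own routing code: `route_queries` and `embed_with_provider` are hypothetical names, the 0.99 default simply mirrors the threshold in the usage example on this page, and vectors are passed as plain Python lists to keep the sketch dependency-free.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def route_queries(
    texts: Sequence[str],
    translated: Sequence[Vector],
    scores: Sequence[float],
    embed_with_provider: Callable[[str], Vector],
    threshold: float = 0.99,
) -> Tuple[List[Vector], int]:
    """Return one vector per query plus the number of provider calls made."""
    if not (len(texts) == len(translated) == len(scores)):
        raise ValueError("texts, translated and scores must have the same length")

    vectors: List[Vector] = []
    provider_calls = 0
    for text, vec, score in zip(texts, translated, scores):
        if score >= threshold:
            # Confident: keep the locally translated vector, no API call.
            vectors.append(list(vec))
        else:
            # Not confident: fall back to the provider (hypothetical callable).
            vectors.append(embed_with_provider(text))
            provider_calls += 1
    return vectors, provider_calls


# Toy usage with a stand-in provider that returns a fixed vector.
def fake_provider(text: str) -> Vector:
    return [0.0, 1.0]


vectors, calls = route_queries(
    texts=["refund policy", "zxqv unusual input"],
    translated=[[1.0, 0.0], [0.5, 0.5]],
    scores=[0.995, 0.42],
    embed_with_provider=fake_provider,
)
print(calls)  # 1
```

Only the second query falls below the threshold, so only that one triggers a provider call. In a real pipeline the stand-in would be replaced by your provider's embedding client.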
Each adapter is a compact, multi-layer model — not a simple linear transform. The architecture is designed to capture the non-trivial geometric differences between embedding spaces while staying fast enough for real-time inference (<0.5ms per embedding on CPU).
The adapter files are small (a few MB), load instantly, and have negligible memory overhead. The design is deliberately lightweight so that the translation step never becomes the bottleneck in your pipeline.
Each query is scored individually. The quality head evaluates how well an input embedding fits within the adapter's training distribution, returning a confidence value between 0 and 1.
High scores mean the translation is reliable — retrieval matches the target model. Low scores mean the input is out-of-distribution and you should route to the native provider for a guaranteed-quality embedding.
You control the confidence threshold. A higher threshold routes more queries to the provider for guaranteed accuracy; a lower one keeps more queries local for maximum savings. The calibrate endpoint analyzes your data and recommends a setting, so you never degrade below your baseline.
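To make the trade-off concrete, here is one way a threshold could be chosen from held-out queries. The calibrate endpoint's actual method is not described here, so treat this as an assumption-laden sketch: `pick_threshold` is a hypothetical helper, `quality` is assumed to be a per-query measure you compute yourself (for example, recall of the translated results against the target model's results), and `min_quality` is the baseline you do not want to fall below.

```python
from typing import Optional, Sequence, Tuple


def pick_threshold(
    scores: Sequence[float],
    quality: Sequence[float],
    min_quality: float,
) -> Optional[Tuple[float, float]]:
    """Lowest threshold whose accepted queries average at least min_quality.

    Returns (threshold, fraction_kept_local), or None if no threshold works.
    """
    if len(scores) != len(quality) or not scores:
        raise ValueError("scores and quality must be non-empty and aligned")

    # Try candidate thresholds from most permissive to most strict.
    for candidate in sorted(set(scores)):
        kept = [q for s, q in zip(scores, quality) if s >= candidate]
        if sum(kept) / len(kept) >= min_quality:
            return candidate, len(kept) / len(scores)
    return None


# Toy held-out set: confidence scores and measured retrieval quality.
result = pick_threshold(
    scores=[0.2, 0.6, 0.9, 0.95, 0.99],
    quality=[0.1, 0.5, 0.9, 1.0, 1.0],
    min_quality=0.9,
)
print(result)  # (0.9, 0.6)
```

Here a threshold of 0.9 keeps 60% of queries local while the accepted queries still average above the 0.9 quality target. Raising `min_quality` pushes the threshold up and sends more traffic to the provider.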
Adapters are trained on parallel corpora: the same set of texts is embedded by both the source and target models, producing aligned vector pairs. The adapter learns to map each source vector to its target counterpart.
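The parallel-corpus setup can be illustrated with a short sketch. Everything named here is an assumption for illustration: `build_parallel_pairs` and `mean_alignment` are hypothetical helpers, the two toy embedders stand in for real source and target models, and mean cosine similarity is just one plausible way to measure how well a candidate mapping lines up with the target space (the actual training objective is not stated).

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def build_parallel_pairs(
    texts: Sequence[str],
    embed_source: Callable[[str], Vector],
    embed_target: Callable[[str], Vector],
) -> List[Tuple[Vector, Vector]]:
    """Embed every text with both models and keep the results aligned by index."""
    return [(embed_source(t), embed_target(t)) for t in texts]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_alignment(
    pairs: Sequence[Tuple[Vector, Vector]],
    adapter: Callable[[Vector], Vector],
) -> float:
    """Average cosine similarity between adapter(source) and the target vector."""
    return sum(cosine(adapter(src), tgt) for src, tgt in pairs) / len(pairs)


# Toy stand-ins for the two embedding models (2-d source, 3-d target).
def toy_source(text: str) -> Vector:
    return [float(len(text)), float(text.count("a"))]


def toy_target(text: str) -> Vector:
    return [2.0 * len(text), 2.0 * text.count("a"), 0.0]


pairs = build_parallel_pairs(["banana", "kiwi"], toy_source, toy_target)

# A candidate mapping that pads the source vector into the target dimension.
alignment = mean_alignment(pairs, lambda v: [v[0], v[1], 0.0])
print(round(alignment, 6))
```

In this toy case the padding map points in exactly the same direction as the target vectors, so the alignment is essentially 1. Real embedding spaces differ in far less trivial ways, which is why the adapters are multi-layer models rather than a fixed transform.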
The critical challenge in training is avoiding collapse — ensuring the adapter preserves fine-grained distinctions rather than converging to a degenerate mapping. Our training pipeline is carefully designed to prevent this, producing adapters that maintain neighbor structure, relative distances, and cluster boundaries even across very different embedding geometries.
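Neighbor preservation is something you can check yourself on a sample of vectors. The sketch below is one possible diagnostic, not the pipeline's own metric: `neighbor_overlap` is a hypothetical helper that compares each item's top-k cosine neighbors in the translated space with its top-k neighbors in the native target space. A collapsed mapping, where every input lands on nearly the same vector, scores poorly.

```python
import math
from typing import Sequence, Set

Vector = Sequence[float]


def _cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_neighbors(vectors: Sequence[Vector], index: int, k: int) -> Set[int]:
    """Indices of the k most similar vectors to vectors[index], excluding itself."""
    sims = [
        (_cosine(vectors[index], v), j)
        for j, v in enumerate(vectors)
        if j != index
    ]
    # Highest similarity first; ties are broken by the lower index.
    sims.sort(key=lambda pair: (-pair[0], pair[1]))
    return {j for _, j in sims[:k]}


def neighbor_overlap(
    translated: Sequence[Vector], native: Sequence[Vector], k: int
) -> float:
    """Mean fraction of top-k neighbors shared between the two spaces."""
    if len(translated) != len(native):
        raise ValueError("both sets must contain the same number of vectors")
    if not 0 < k < len(native):
        raise ValueError("k must be between 1 and len(vectors) - 1")
    total = 0.0
    for i in range(len(native)):
        shared = top_k_neighbors(translated, i, k) & top_k_neighbors(native, i, k)
        total += len(shared) / k
    return total / len(native)


native = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
# A rotation of the native vectors: geometry differs, neighbors are preserved.
faithful = [[0.0, 1.0], [-0.1, 0.9], [-1.0, 0.0], [-0.9, 0.1]]
# A degenerate mapping: every input collapses to the same vector.
collapsed = [[1.0, 1.0]] * 4

faithful_score = neighbor_overlap(faithful, native, k=1)
collapsed_score = neighbor_overlap(collapsed, native, k=1)
print(faithful_score, collapsed_score)
```

The rotated set keeps every nearest neighbor intact, while the collapsed set only matches by accident of tie-breaking. On real data you would run this with a larger `k` and a proper nearest-neighbor index instead of the quadratic loop used here.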
Pre-trained adapters in the registry cover common model pairs (MiniLM → OpenAI, BGE → OpenAI, E5 → OpenAI, etc.). You can also commission custom adapters trained on your domain data for higher fidelity on specialized corpora.
from sentence_transformers import SentenceTransformer
from embedding_adapters import EmbeddingAdapter

# Example input; replace with your own queries
texts = ["How do I reset my password?"]

# 1) Load a lightweight local model
model = SentenceTransformer("all-MiniLM-L6-v2")

# 2) Load a pre-trained adapter
adapter = EmbeddingAdapter.from_registry(
    source="sentence-transformers/all-MiniLM-L6-v2",
    target="openai/text-embedding-3-small",
    flavor="large",
)

# 3) Encode locally, translate into OpenAI's space
src_embs = model.encode(texts, normalize_embeddings=True)
translated = adapter.translate(src_embs)

# Check confidence per query
scores = adapter.score_source(src_embs)
# scores >= 0.99 → use translated
# scores < 0.99 → fall back to API