Each adapter is a compact, carefully trained model that translates one embedding space into another. The key isn't architecture complexity — it's the training process, which ensures the adapter preserves the geometric relationships that matter for retrieval without collapsing or losing fidelity.
1. Generate embeddings with your local or source model, exactly as you normally would.
2. The adapter applies its trained transformation, mapping each vector into the target space.
3. Query your existing vector index directly: no re-embedding, no downtime, no data migration.
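If the existing index was built with, say, OpenAI's text-embedding-3-small, the translated vectors drop straight into the same similarity search. Here is a minimal sketch assuming a FAISS-backed index; the index path is hypothetical, and `translated` is the output of `adapter.translate()` as in the quickstart below:

```python
import faiss
import numpy as np

def search_existing_index(index: faiss.Index, translated: np.ndarray, k: int = 10):
    """Query an index built from OpenAI embeddings using translated
    local embeddings; the corpus itself is never re-embedded."""
    queries = np.ascontiguousarray(translated, dtype="float32")
    faiss.normalize_L2(queries)  # cosine similarity via inner product
    return index.search(queries, k)

# Hypothetical existing index, built long ago from OpenAI vectors:
index = faiss.read_index("openai_corpus.faiss")
scores, ids = search_existing_index(index, translated)
```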
Each adapter is a compact, multi-layer model — not a simple linear transform. The architecture is designed to capture the non-trivial geometric differences between embedding spaces while staying fast enough for real-time inference (<0.5ms per embedding on CPU).
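The exact architecture isn't documented here, but for intuition, a compact multi-layer adapter might look like the following PyTorch sketch. The layer sizes, residual shortcut, and output normalization are assumptions for illustration, not the shipped design (384 and 1536 are the MiniLM and text-embedding-3-small dimensions):

```python
import torch
import torch.nn as nn

class AdapterSketch(nn.Module):
    """Illustrative multi-layer adapter: a small MLP plus a linear
    shortcut. Not the actual shipped architecture."""

    def __init__(self, src_dim: int = 384, tgt_dim: int = 1536, hidden: int = 1024):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(src_dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, tgt_dim),
        )
        self.skip = nn.Linear(src_dim, tgt_dim)  # linear base map

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Non-linear correction on top of a linear base map, then
        # renormalize so outputs sit on the unit sphere like the targets.
        out = self.mlp(x) + self.skip(x)
        return nn.functional.normalize(out, dim=-1)
```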
The adapter files are small (a few MB), load instantly, and have negligible memory overhead. The design is deliberately lightweight so that the translation step never becomes the bottleneck in your pipeline.
Each adapter also learns a distribution boundary during training. At inference time, score_source() evaluates how well an input embedding fits within the source model's known space. This returns a confidence value between 0 and 1.
High scores (≥ 0.99) mean the input is well represented by the adapter's training data and the translation is reliable. Low scores (< 0.95) indicate the input is out-of-distribution: the adapter may still work, but you should consider falling back to the native API. Scores in between are a gray zone; the quickstart below simply routes anything under 0.99 to the API.
This is the foundation of the hybrid routing strategy: use the adapter when confident, escalate to the API when not.
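Put together, the routing logic fits in a few lines. This sketch follows the 0.99 threshold from the quickstart below; `embed_with_api` is a hypothetical stand-in for your native embedding API call, not part of the library:

```python
import numpy as np

CONFIDENCE_THRESHOLD = 0.99

def embed_queries(texts, model, adapter, embed_with_api):
    """Translate locally when the adapter is confident; otherwise
    fall back to the native embedding API."""
    src = model.encode(texts, normalize_embeddings=True)
    scores = np.asarray(adapter.score_source(src))
    translated = np.asarray(adapter.translate(src))

    out = np.empty_like(translated)
    confident = scores >= CONFIDENCE_THRESHOLD
    out[confident] = translated[confident]

    # Escalate only the out-of-distribution queries to the API.
    fallback_texts = [t for t, ok in zip(texts, confident) if not ok]
    if fallback_texts:
        out[~confident] = embed_with_api(fallback_texts)
    return out
```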
Adapters are trained on parallel corpora: the same set of texts is embedded by both the source and target model, producing aligned vector pairs. The adapter learns to map from one space to the other on this parallel data.
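Concretely, building the parallel data amounts to embedding the same corpus twice, as in this simplified sketch. The toy corpus and batching are illustrative; the target side uses the standard OpenAI SDK and assumes `OPENAI_API_KEY` is set:

```python
import numpy as np
from openai import OpenAI
from sentence_transformers import SentenceTransformer

corpus = ["first training text", "second training text"]  # toy corpus

# Source side: the local model.
src_model = SentenceTransformer("all-MiniLM-L6-v2")
src_vecs = src_model.encode(corpus, normalize_embeddings=True)

# Target side: the API model, embedding the *same* texts.
client = OpenAI()
resp = client.embeddings.create(model="text-embedding-3-small", input=corpus)
tgt_vecs = np.array([d.embedding for d in resp.data])

# Aligned pairs (src_vecs[i], tgt_vecs[i]) are the training examples.
```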
The critical challenge in training is avoiding collapse — ensuring the adapter preserves fine-grained distinctions rather than converging to a degenerate mapping. Our training pipeline is carefully designed to prevent this, producing adapters that maintain neighbor structure, relative distances, and cluster boundaries even across very different embedding geometries.
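One way to check this property yourself is to measure how much of each point's nearest-neighbor set survives translation, comparing native target embeddings against translated ones. This diagnostic is not part of the library's API, just an evaluation sketch:

```python
import numpy as np

def neighbor_overlap(native: np.ndarray, translated: np.ndarray, k: int = 10) -> float:
    """Mean overlap between each point's k nearest neighbors under the
    native target embeddings vs. the translated embeddings.
    Assumes both arrays are unit-normalized."""
    def knn(vecs):
        sims = vecs @ vecs.T
        np.fill_diagonal(sims, -np.inf)  # exclude self-matches
        return np.argsort(-sims, axis=1)[:, :k]

    a, b = knn(native), knn(translated)
    overlaps = [len(set(r1) & set(r2)) / k for r1, r2 in zip(a, b)]
    return float(np.mean(overlaps))

# Values near 1.0 mean the translation kept local neighborhoods intact.
```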
Pre-trained adapters in the registry cover common model pairs (MiniLM → OpenAI, BGE → OpenAI, E5 → OpenAI, etc.). You can also commission custom adapters trained on your domain data for higher fidelity on specialized corpora.
```python
from sentence_transformers import SentenceTransformer
from embedding_adapters import EmbeddingAdapter

# 1) Load a lightweight local model
model = SentenceTransformer("all-MiniLM-L6-v2")

# 2) Load a pre-trained adapter
adapter = EmbeddingAdapter.from_registry(
    source="sentence-transformers/all-MiniLM-L6-v2",
    target="openai/text-embedding-3-small",
    flavor="large",
)

# 3) Encode locally, translate into OpenAI's space
texts = ["What is the capital of France?"]  # example input
src_embs = model.encode(texts, normalize_embeddings=True)
translated = adapter.translate(src_embs)

# Check confidence per query
scores = adapter.score_source(src_embs)
# scores >= 0.99 → use translated
# scores <  0.99 → fall back to API
```