1. Embed with source

Generate embeddings using your local model as you normally would. No changes to your existing pipeline.

2. Adapt + score

The adapter translates vectors into the target space and scores confidence in the same forward pass. A single score between 0 and 1 tells you how reliable each translation is.

3. Route or retrieve

High-confidence queries go straight to your index at $0 in API costs. Low-confidence queries are routed to the provider: one API call, made only when needed.
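The routing step reduces to splitting queries by score and stitching the results back together. The sketch below is a minimal illustration, assuming scores arrive as one value per query; `route_queries` and `merge_embeddings` are hypothetical helper names, and the provider call itself is left out (its results are stand-in vectors).

```python
import numpy as np

def route_queries(scores, threshold):
    """Split query indices by confidence: (kept local, sent to provider)."""
    scores = np.asarray(scores, dtype=float)
    local_idx = np.flatnonzero(scores >= threshold)    # translated embedding is trusted
    provider_idx = np.flatnonzero(scores < threshold)  # re-embed these with the provider API
    return local_idx, provider_idx

def merge_embeddings(translated, provider_embs, local_idx, provider_idx):
    """Reassemble one embedding per query, in the original query order."""
    n = len(local_idx) + len(provider_idx)
    merged = np.empty((n, translated.shape[1]), dtype=translated.dtype)
    merged[local_idx] = translated[local_idx]
    merged[provider_idx] = provider_embs  # rows must follow provider_idx order
    return merged

# Usage with made-up scores and placeholder vectors
scores = np.array([0.995, 0.42, 0.999, 0.97])
local_idx, provider_idx = route_queries(scores, threshold=0.99)
print(local_idx.tolist(), provider_idx.tolist())  # [0, 2] [1, 3]

translated = np.zeros((4, 3))                     # stand-in for adapter output
provider_embs = np.ones((len(provider_idx), 3))   # stand-in for provider API results
merged = merge_embeddings(translated, provider_embs, local_idx, provider_idx)
print(merged[:, 0].tolist())  # [0.0, 1.0, 0.0, 1.0]
```

Only the queries in `provider_idx` incur an API call, so `len(provider_idx) / len(scores)` is the fraction of traffic that still costs money.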

What adapters actually do

The adapter model

Each adapter is a compact multi-layer model, not a simple linear transform. Its architecture is designed to capture the non-trivial geometric differences between embedding spaces while staying fast enough for real-time inference (under 0.5 ms per embedding on CPU).
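The actual adapter architecture is not published here, so the following is only a toy sketch of the general shape described above: a shared nonlinear trunk feeding a translation head and a confidence head, so that both outputs come from one forward pass. `ToyAdapter`, the 512-unit hidden layer, ReLU, and the sigmoid quality head are all assumptions; the weights are random rather than trained. The 384 and 1536 dimensions match `all-MiniLM-L6-v2` and the default size of `text-embedding-3-small`.

```python
import numpy as np

class ToyAdapter:
    """Hypothetical adapter: shared MLP trunk, translation head, quality head.

    Weights are random here; a real adapter would learn them from paired data.
    """

    def __init__(self, d_src=384, d_hidden=512, d_tgt=1536, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, d_src ** -0.5, (d_src, d_hidden))
        self.b1 = np.zeros(d_hidden)
        self.W2 = rng.normal(0.0, d_hidden ** -0.5, (d_hidden, d_tgt))
        self.b2 = np.zeros(d_tgt)
        self.w_q = rng.normal(0.0, d_hidden ** -0.5, d_hidden)  # quality-head weights
        self.b_q = 0.0

    def forward(self, x):
        """Return (translated, confidence) computed in one pass over x."""
        h = np.maximum(0.0, x @ self.W1 + self.b1)  # shared nonlinear trunk (ReLU)
        y = h @ self.W2 + self.b2                    # translation head
        norms = np.maximum(np.linalg.norm(y, axis=1, keepdims=True), 1e-12)
        y = y / norms                                # unit-length outputs
        logits = h @ self.w_q + self.b_q             # quality head reuses the trunk
        conf = 1.0 / (1.0 + np.exp(-logits))         # sigmoid maps to (0, 1)
        return y, conf

rng = np.random.default_rng(1)
x = rng.normal(size=(8, 384))
x /= np.linalg.norm(x, axis=1, keepdims=True)        # normalized source embeddings
translated, conf = ToyAdapter().forward(x)
print(translated.shape, conf.shape)  # (8, 1536) (8,)
```

Sharing the trunk is what keeps scoring cheap: the quality head adds one dot product per query on top of the translation.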

The adapter files are small (a few MB), load instantly, and have negligible memory overhead. The design is deliberately lightweight so that the translation step never becomes the bottleneck in your pipeline.

Confidence scoring

Each query is scored individually. The adapter's quality head evaluates how well an input embedding fits within the adapter's training distribution and returns a confidence value between 0 and 1.
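The source does not say how the quality head measures distribution fit. As a swapped-in, well-known technique for the same idea, the sketch below scores inputs by Mahalanobis distance to the training embeddings and converts that distance into a 0-to-1 value: the fraction of training points that lie at least as far from the mean. `MahalanobisScorer` and its `ridge` parameter are hypothetical names for illustration, not the product's scorer.

```python
import numpy as np

class MahalanobisScorer:
    """Illustrative out-of-distribution score, not the product's quality head."""

    def __init__(self, train_embs, ridge=1e-3):
        X = np.asarray(train_embs, dtype=float)
        self.mean = X.mean(axis=0)
        cov = np.cov(X, rowvar=False) + ridge * np.eye(X.shape[1])  # ridge keeps it invertible
        self.prec = np.linalg.inv(cov)
        self.train_d = np.sort(self._dist(X))  # reference distances from training data

    def _dist(self, X):
        diff = X - self.mean
        return np.einsum("ij,jk,ik->i", diff, self.prec, diff)  # squared Mahalanobis distance

    def score(self, X):
        """1.0 = as typical as the most central training point, 0.0 = farther than all of them."""
        d = self._dist(np.atleast_2d(np.asarray(X, dtype=float)))
        n_closer = np.searchsorted(self.train_d, d, side="left")  # training points strictly closer
        return 1.0 - n_closer / len(self.train_d)

rng = np.random.default_rng(0)
train = rng.normal(size=(2000, 16))     # stand-in for source-model training embeddings
scorer = MahalanobisScorer(train)
print(scorer.score(np.full((1, 16), 8.0)))  # [0.]
```

An input far outside the training cloud scores 0, which under the routing rule above sends it to the provider.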

A high score means the translation is reliable: retrieval results closely match what the target model would return. A low score means the input is out-of-distribution, so you should route it to the native provider for a guaranteed-quality embedding.

You control the confidence threshold. A higher threshold routes more queries to the provider for guaranteed accuracy; a lower one keeps more queries local for maximum savings. The calibrate endpoint analyzes your data and recommends a threshold that keeps retrieval quality from degrading below your baseline.
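The calibrate endpoint's internals are not described, but the trade-off it resolves can be sketched locally. Assume a held-out set where each query has a confidence score and a flag saying whether adapter-based retrieval agreed with native target-model retrieval (for example, the same top-1 result). Provider-routed queries are treated as exact, so overall agreement can only rise as the threshold rises; the lowest threshold that meets the target therefore keeps the most traffic local. `calibrate_threshold` and the agreement flags are hypothetical.

```python
import numpy as np

def calibrate_threshold(scores, agrees, target=0.99):
    """Return (threshold, local_fraction) keeping overall agreement >= target.

    Queries below the threshold go to the provider and are assumed exact,
    so only local queries whose translation disagrees cost quality.
    """
    scores = np.asarray(scores, dtype=float)
    agrees = np.asarray(agrees, dtype=bool)
    n = len(scores)
    for t in np.unique(scores):                      # candidate thresholds, ascending
        local = scores >= t
        quality = (agrees[local].sum() + (~local).sum()) / n
        if quality >= target:
            return t, local.mean()                   # lowest passing threshold: most savings
    return np.inf, 0.0                               # fallback: route everything to the provider

# Made-up calibration data: six queries, two bad translations
scores = [0.2, 0.5, 0.9, 0.95, 0.99, 0.999]
agrees = [False, True, False, True, True, True]
t, frac = calibrate_threshold(scores, agrees, target=0.99)
print(t, frac)  # 0.95 0.5
```

Here the bad translation at score 0.9 forces the threshold up to 0.95, so half the queries stay local.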

Training process

Adapters are trained on parallel corpora: the same set of texts is embedded by both the source and target model, producing aligned vector pairs. The adapter learns to map from one space to the other on this parallel data.
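To make "learning on aligned pairs" concrete, the simplest possible adapter is a linear map fitted by least squares on the parallel vectors. This is a baseline for illustration only: the adapters described here are multi-layer and nonlinear, but they consume the same kind of paired data. The synthetic "source" and "target" embeddings below are random stand-ins, and `fit_linear_map` / `apply_map` are hypothetical names.

```python
import numpy as np

def fit_linear_map(src, tgt):
    """Least-squares W (with bias row) minimizing ||[src, 1] @ W - tgt||^2 over aligned pairs."""
    X = np.hstack([src, np.ones((len(src), 1))])     # append a bias column
    W, *_ = np.linalg.lstsq(X, tgt, rcond=None)
    return W

def apply_map(W, src):
    """Translate source vectors with a fitted linear map."""
    return np.hstack([src, np.ones((len(src), 1))]) @ W

rng = np.random.default_rng(0)
A = rng.normal(size=(32, 64))      # hidden "true" relation between the two spaces
src = rng.normal(size=(500, 32))   # pretend source embeddings of 500 texts
tgt = src @ A                      # pretend target embeddings of the same texts
W = fit_linear_map(src, tgt)

held_out = rng.normal(size=(5, 32))
print(np.allclose(apply_map(W, held_out), held_out @ A))  # True
```

Real embedding spaces are not related by an exact linear map, which is why a nonlinear adapter trained on the same pairs can do better than this baseline.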

The critical challenge in training is avoiding collapse, where the adapter converges to a degenerate mapping instead of preserving fine-grained distinctions. Our training pipeline is carefully designed to prevent this, producing adapters that maintain neighbor structure, relative distances, and cluster boundaries even across very different embedding geometries.
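Neighbor-structure preservation and collapse can both be checked with simple diagnostics. The sketch below assumes you have, for the same texts, the adapter's translated embeddings and the native target-model embeddings. `neighbor_overlap` reports the mean fraction of shared k-nearest neighbors (1.0 means identical neighborhoods). `mean_pairwise_cosine` flags collapse when all outputs point in nearly the same direction. These are illustrative metrics, not necessarily the ones used in the training pipeline.

```python
import numpy as np

def _unit(embs):
    return embs / np.linalg.norm(embs, axis=1, keepdims=True)

def knn_indices(embs, k):
    """Indices of each row's k nearest neighbors by cosine similarity (self excluded)."""
    X = _unit(embs)
    sims = X @ X.T
    np.fill_diagonal(sims, -np.inf)                  # a point is not its own neighbor
    return np.argsort(-sims, axis=1)[:, :k]

def neighbor_overlap(translated, native, k=10):
    """Mean fraction of shared k-NN between translated and native target embeddings."""
    a, b = knn_indices(translated, k), knn_indices(native, k)
    return float(np.mean([len(set(r) & set(s)) / k for r, s in zip(a, b)]))

def mean_pairwise_cosine(embs):
    """Average off-diagonal cosine similarity; values near 1.0 signal collapse."""
    X = _unit(embs)
    n = len(X)
    sims = X @ X.T
    return float((sims.sum() - np.trace(sims)) / (n * (n - 1)))

rng = np.random.default_rng(0)
native = rng.normal(size=(200, 64))
print(neighbor_overlap(native, native, k=10))       # 1.0

collapsed = np.ones((200, 64)) + 1e-3 * rng.normal(size=(200, 64))
print(mean_pairwise_cosine(collapsed) > 0.99)        # True
```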

Pre-trained adapters in the registry cover common model pairs (MiniLM → OpenAI, BGE → OpenAI, E5 → OpenAI, etc.). You can also commission custom adapters trained on your domain data for higher fidelity on specialized corpora.

Three lines to cross-model compatibility

quickstart.py
from sentence_transformers import SentenceTransformer
from embedding_adapters import EmbeddingAdapter

# 1) Load a lightweight local model
model = SentenceTransformer("all-MiniLM-L6-v2")

# 2) Load a pre-trained adapter
adapter = EmbeddingAdapter.from_registry(
    source="sentence-transformers/all-MiniLM-L6-v2",
    target="openai/text-embedding-3-small",
    flavor="large",
)

# 3) Encode locally, translate into OpenAI's space
texts = ["How do I reset my password?", "Refund policy for annual plans"]  # your inputs
src_embs = model.encode(texts, normalize_embeddings=True)
translated = adapter.translate(src_embs)

# Check confidence per query
scores = adapter.score_source(src_embs)
# scores >= 0.99 → use translated
# scores <  0.99 → fall back to API