How It Works — EmbeddingAdapters

Under the hood

What adapters actually do

The adapter model

Each adapter is a compact, multi-layer model, not a simple linear transform. The architecture is designed to capture the non-trivial geometric differences between embedding spaces while staying fast enough for real-time inference (under 0.5 ms per embedding on CPU).

The adapter files are small (a few MB), load instantly, and have negligible memory overhead. The design is deliberately lightweight so that the translation step never becomes the bottleneck in your pipeline.
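To make "multi-layer, not linear" concrete, here is a minimal sketch of what such a forward pass could look like. The actual architecture is not described on this page, so the layer count, the hidden width (`HIDDEN_DIM`), the ReLU activation, and the residual connection are all assumptions chosen for illustration. Only the input and output sizes (384 for all-MiniLM-L6-v2, 1536 for text-embedding-3-small) are real.

```python
import numpy as np

# 384 and 1536 are the real output sizes of all-MiniLM-L6-v2 and
# text-embedding-3-small. HIDDEN_DIM is a hypothetical choice.
SRC_DIM, HIDDEN_DIM, TGT_DIM = 384, 512, 1536


def init_adapter(rng):
    """Random weights for illustration; a real adapter would load trained ones."""
    scale = 0.02
    return {
        "w_in": rng.normal(0.0, scale, (SRC_DIM, HIDDEN_DIM)),
        "b_in": np.zeros(HIDDEN_DIM),
        "w_hidden": rng.normal(0.0, scale, (HIDDEN_DIM, HIDDEN_DIM)),
        "b_hidden": np.zeros(HIDDEN_DIM),
        "w_out": rng.normal(0.0, scale, (HIDDEN_DIM, TGT_DIM)),
        "b_out": np.zeros(TGT_DIM),
    }


def forward(params, x):
    """Map a batch of source embeddings (n, SRC_DIM) to (n, TGT_DIM)."""
    # Input projection followed by a ReLU non-linearity.
    h = np.maximum(x @ params["w_in"] + params["b_in"], 0.0)
    # One residual block (assumed, not confirmed by the source).
    h = h + np.maximum(h @ params["w_hidden"] + params["b_hidden"], 0.0)
    # Project into the target dimension and L2-normalise each row.
    y = h @ params["w_out"] + params["b_out"]
    norms = np.maximum(np.linalg.norm(y, axis=1, keepdims=True), 1e-12)
    return y / norms
```

A model of this size is a few matrix multiplications per embedding, which is consistent with the lightweight design described above, although this sketch makes no claim about matching the quoted latency.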

Confidence scoring

Each adapter also learns a distribution boundary during training. At inference time, score_source() evaluates how well an input embedding fits within the source model's known space. This returns a confidence value between 0 and 1.

High scores (≥ 0.99) mean the input is well-represented by the adapter's training data and the translation is reliable. Low scores (< 0.95) indicate the input is out-of-distribution: the adapter may still work, but you should consider falling back to the native API. For scores in between, the quick start below takes the conservative route and falls back whenever the score is under 0.99.

This is the foundation of the hybrid routing strategy: use the adapter when confident, escalate to the API when not.
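The routing rule can be sketched as a small helper that keeps the translated vector when the score clears a threshold and asks the native API for the rest. This is a sketch under assumptions: `route_embeddings` and `embed_with_api` are hypothetical names, not part of the library, and the caller supplies `embed_with_api` as any function that takes a list of texts and returns target-space embeddings. The default threshold of 0.99 is the cutoff used in the quick start.

```python
import numpy as np


def route_embeddings(texts, translated, scores, embed_with_api, threshold=0.99):
    """Use adapter output when confident, the native API otherwise.

    texts:          list of input strings
    translated:     (n, d) array of adapter outputs for those texts
    scores:         length-n confidence values in [0, 1]
    embed_with_api: hypothetical callable, list[str] -> (m, d) array
    Returns the routed (n, d) array and the indices sent to the API.
    """
    scores = np.asarray(scores, dtype=float)
    # Copy so the caller's translated array is left untouched.
    routed = np.array(translated, dtype=float, copy=True)
    low = np.flatnonzero(scores < threshold)
    if low.size:
        # Only the low-confidence texts are sent to the API, in one batch.
        api_vecs = np.asarray(embed_with_api([texts[i] for i in low]), dtype=float)
        routed[low] = api_vecs
    return routed, low
```

Returning the fallback indices makes it easy to log what fraction of traffic is escalated, which is the number that determines how much API cost the adapter actually saves.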

Training process

Adapters are trained on parallel corpora: the same set of texts is embedded by both the source and target model, producing aligned vector pairs. The adapter learns to map from one space to the other on this parallel data.
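To illustrate what "aligned vector pairs" means in practice, the sketch below fits a plain least-squares linear map on a toy parallel set. This is deliberately a simpler stand-in, not the multi-layer adapter or the training pipeline described here. The data is synthetic and the dimensions are toy sizes; in real use, row i of `src` and row i of `tgt` would be the embeddings of the same text from the source and target model.

```python
import numpy as np


def fit_linear_baseline(src, tgt):
    """Least-squares W minimising ||src @ W - tgt||; a linear baseline only."""
    w, *_ = np.linalg.lstsq(src, tgt, rcond=None)
    return w


def mean_cosine(a, b):
    """Average row-wise cosine similarity between two aligned matrices."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return float(np.mean(np.sum(a * b, axis=1)))


# Synthetic parallel corpus: the "target" space is a noisy linear image
# of the "source" space, so a linear fit can recover it almost exactly.
rng = np.random.default_rng(0)
n_texts, src_dim, tgt_dim = 200, 8, 12  # toy sizes, not real model dimensions
src = rng.normal(size=(n_texts, src_dim))
true_map = rng.normal(size=(src_dim, tgt_dim))
tgt = src @ true_map + 0.01 * rng.normal(size=(n_texts, tgt_dim))

w = fit_linear_baseline(src, tgt)
fit_quality = mean_cosine(src @ w, tgt)
```

On real model pairs the relationship between spaces is not linear, which is why a baseline like this is useful mainly as a reference point for judging how much a multi-layer adapter adds.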

The critical challenge in training is avoiding collapse — ensuring the adapter preserves fine-grained distinctions rather than converging to a degenerate mapping. Our training pipeline is carefully designed to prevent this, producing adapters that maintain neighbor structure, relative distances, and cluster boundaries even across very different embedding geometries.
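One way to check for collapse after the fact is to measure how much of each item's nearest-neighbor set survives translation. The sketch below is an evaluation idea, not the training pipeline itself; `neighbor_overlap` is a hypothetical helper, and the choice of cosine similarity and k = 5 is an assumption.

```python
import numpy as np


def top_k_neighbors(embs, k):
    """Indices of the k most cosine-similar rows for each row, excluding itself."""
    normed = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    sims = normed @ normed.T
    np.fill_diagonal(sims, -np.inf)  # a row is never its own neighbor
    return np.argsort(-sims, axis=1)[:, :k]


def neighbor_overlap(translated, target, k=5):
    """Mean fraction of top-k neighbors shared between the two spaces.

    Row i of `translated` and row i of `target` must describe the same text.
    A value of 1.0 means identical neighbor sets; a collapsed mapping,
    where all outputs are nearly the same vector, scores near chance.
    """
    a = top_k_neighbors(np.asarray(translated, dtype=float), k)
    b = top_k_neighbors(np.asarray(target, dtype=float), k)
    overlaps = [len(set(ra) & set(rb)) / k for ra, rb in zip(a, b)]
    return float(np.mean(overlaps))
```

Comparing adapter output against embeddings from the real target model on a held-out set gives a direct read on whether neighbor structure is being preserved.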

Pre-trained adapters in the registry cover common model pairs (MiniLM → OpenAI, BGE → OpenAI, E5 → OpenAI, etc.). You can also commission custom adapters trained on your domain data for higher fidelity on specialized corpora.

Quick start

Three lines to cross-model compatibility

quickstart.py

from sentence_transformers import SentenceTransformer
from embedding_adapters import EmbeddingAdapter

# 1) Load a lightweight local model
model = SentenceTransformer("all-MiniLM-L6-v2")

# 2) Load a pre-trained adapter
adapter = EmbeddingAdapter.from_registry(
    source="sentence-transformers/all-MiniLM-L6-v2",
    target="openai/text-embedding-3-small",
    flavor="large",
)

# 3) Encode locally, translate into OpenAI's space
texts = ["How do I reset my password?", "Where can I download my invoice?"]  # example inputs
src_embs = model.encode(texts, normalize_embeddings=True)
translated = adapter.translate(src_embs)

# Check confidence per query
scores = adapter.score_source(src_embs)
# scores >= 0.99 → use translated
# scores <  0.99 → fall back to API
