Text Reranker
The `text_reranker` model reorders a list of documents by relevance to a query. It is typically the last step in a retrieval pipeline: after a fast initial retrieval stage, it improves precision before the context is passed to an LLM.
Dependencies
See Dependency Management for the complete provider matrix.
Overview
Rerankers score each (query, document) pair directly, producing a relevance score that is more accurate than embedding similarity alone. They enable:
- RAG Precision: Select the most relevant chunks before LLM context
- Search Quality: Improve ranking over BM25 or vector search results
- Multilingual Reranking: Rank documents across different languages
1. Quick Start
Example
```python
import msgflux as mf

model = mf.Model.text_reranker("jinaai/jina-reranker-v3")

documents = [
    "Python is a general-purpose programming language.",
    "The Eiffel Tower is located in Paris, France.",
    "Machine learning is a subset of artificial intelligence.",
    "Neural networks are inspired by the human brain."
]

results = model(
    query="What is machine learning?",
    documents=documents
).consume()

for r in results:
    print(f"[{r['relevance_score']:.3f}] {documents[r['index']]}")
# [ 0.452] Machine learning is a subset of artificial intelligence.
# [-0.059] Neural networks are inspired by the human brain.
# [-0.086] Python is a general-purpose programming language.
# [-0.184] The Eiffel Tower is located in Paris, France.
```
2. Supported Providers
JinaAI
Example
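The hosted JinaAI reranker is the model used throughout this page; construction follows the Quick Start pattern. Authentication (e.g. an API key exported in the environment) follows the provider's usual setup and is not shown here:

```python
import msgflux as mf

# Hosted JinaAI reranker, same model id as in the Quick Start.
model = mf.Model.text_reranker("jinaai/jina-reranker-v3")
```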
vLLM (self-hosted)
Example
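vLLM serves models behind an OpenAI-compatible HTTP endpoint, so the usual pattern is to point the client at your own server. This is a hypothetical sketch: the `base_url` parameter name and the endpoint below are assumptions for illustration, not confirmed msgflux API; consult the provider matrix for the actual configuration option:

```python
import msgflux as mf

# Hypothetical sketch: `base_url` is an assumed parameter name for
# targeting a self-hosted, OpenAI-compatible vLLM server; check the
# provider matrix for the real option.
model = mf.Model.text_reranker(
    "jinaai/jina-reranker-v3",
    base_url="http://localhost:8000/v1",
)
```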
3. Response Format
`consume()` returns a list of results sorted by `relevance_score` in descending order, each containing:
- `index` — original position in the `documents` list
- `relevance_score` — relevance score (higher = more relevant). The range depends on the model:
    - v3: raw logit score, can be negative — compare relative values, not absolute
    - v2: normalized probability in `[0, 1]`
Example
```python
import msgflux as mf

model = mf.Model.text_reranker("jinaai/jina-reranker-v3")

docs = ["Doc A", "Doc B", "Doc C"]
results = model(query="my query", documents=docs).consume()

# Pick only the top result
best = results[0]
print(docs[best["index"]])      # most relevant document text
print(best["relevance_score"])  # e.g. 0.45 (highest among the batch)
```
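Because the list comes back sorted by score rather than by position, restoring the original document order is just a sort on `index`. A plain-Python sketch with made-up scores in the same shape the reranker returns (no API call needed):

```python
# Hypothetical reranker output: already sorted by relevance_score.
results = [
    {"index": 2, "relevance_score": 0.452},
    {"index": 3, "relevance_score": -0.059},
    {"index": 0, "relevance_score": -0.086},
    {"index": 1, "relevance_score": -0.184},
]

# Restore the original document order.
by_position = sorted(results, key=lambda r: r["index"])
print([r["index"] for r in by_position])  # [0, 1, 2, 3]

# Score lookup keyed by original position.
scores = {r["index"]: r["relevance_score"] for r in results}
print(scores[2])  # 0.452
```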
4. RAG Integration
A typical pattern: retrieve candidates with BM25 or vector search, then rerank before passing to the LLM.
Example
```python
import msgflux as mf

reranker = mf.Model.text_reranker("jinaai/jina-reranker-v3")
chat = mf.Model.chat_completion("openai/gpt-4.1-mini")

def rag(query: str, candidate_docs: list[str], top_k: int = 3) -> str:
    # Rerank candidates
    ranked = reranker(query=query, documents=candidate_docs).consume()

    # Take top-k most relevant
    context = "\n\n".join(
        candidate_docs[r["index"]] for r in ranked[:top_k]
    )

    response = chat(messages=[{
        "role": "user",
        "content": f"Context:\n{context}\n\nQuestion: {query}"
    }])
    return response.consume()

answer = rag(
    query="What causes aurora borealis?",
    candidate_docs=[...],  # Retrieved from vector store
)
print(answer)
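The top-k selection step in the pipeline above can be exercised without any API calls. A sketch using a hand-written `ranked` list in the shape the reranker returns (documents and scores are made up for illustration):

```python
candidate_docs = ["solar wind", "baking bread", "charged particles", "knitting"]

# Stand-in for reranker(...).consume(): sorted by score, descending.
ranked = [
    {"index": 2, "relevance_score": 0.9},
    {"index": 0, "relevance_score": 0.7},
    {"index": 3, "relevance_score": -0.2},
    {"index": 1, "relevance_score": -0.5},
]

top_k = 2
context = "\n\n".join(candidate_docs[r["index"]] for r in ranked[:top_k])
print(context)  # the two highest-scored documents, joined by blank lines
```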
5. Async Support
Example
```python
import msgflux as mf
import msgflux.nn.functional as F

model = mf.Model.text_reranker("jinaai/jina-reranker-v3")

queries = ["What is AI?", "How does photosynthesis work?"]
docs = ["Doc A", "Doc B", "Doc C"]

results = F.map_gather(
    model,
    kwargs_list=[
        {"query": q, "documents": docs} for q in queries
    ]
)

for query, result in zip(queries, results):
    top = result.consume()[0]
    print(f"{query!r} → best doc: {docs[top['index']]}")
```
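Conceptually, this fan-out behaves like `asyncio.gather` over one call per kwargs dict, with results returned in input order. A self-contained sketch with a dummy word-overlap scorer standing in for the model (the scorer is invented for illustration, not part of msgflux):

```python
import asyncio

async def fake_rerank(query: str, documents: list[str]) -> list[dict]:
    # Dummy scorer: score = number of words shared with the query.
    q = set(query.lower().split())
    scored = [
        {"index": i, "relevance_score": float(len(q & set(d.lower().split())))}
        for i, d in enumerate(documents)
    ]
    return sorted(scored, key=lambda r: r["relevance_score"], reverse=True)

async def main():
    docs = ["AI is artificial intelligence", "Plants use sunlight"]
    kwargs_list = [
        {"query": "What is AI?", "documents": docs},
        {"query": "How do plants use sunlight?", "documents": docs},
    ]
    # One concurrent task per kwargs dict; gather preserves input order.
    return await asyncio.gather(*(fake_rerank(**kw) for kw in kwargs_list))

results = asyncio.run(main())
print([r[0]["index"] for r in results])  # best-doc index per query: [0, 1]
```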
6. Response Caching
Example
```python
import msgflux as mf

model = mf.Model.text_reranker(
    "jinaai/jina-reranker-v3",
    enable_cache=True,
    cache_size=256
)

docs = ["Doc A", "Doc B", "Doc C"]

# First call — hits API
results1 = model(query="my query", documents=docs).consume()

# Second call — served from cache
results2 = model(query="my query", documents=docs).consume()
```
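The cache is keyed on the call arguments, so only an identical (query, documents) pair is served from cache. A conceptual sketch of that behaviour using Python's `functools.lru_cache` (an analogy, not msgflux internals; documents are a tuple so the key is hashable):

```python
from functools import lru_cache

calls = 0

@lru_cache(maxsize=256)  # analogous to cache_size=256
def rerank(query: str, documents: tuple[str, ...]):
    global calls
    calls += 1  # counts actual "API" invocations
    return tuple(range(len(documents)))  # placeholder result

docs = ("Doc A", "Doc B", "Doc C")
rerank("my query", docs)     # miss: does the work
rerank("my query", docs)     # hit: served from cache
rerank("other query", docs)  # different query: miss again
print(calls)  # 2
```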