Semble: Efficient Code Search for AI Agents

Semble is an open-source tool designed to solve a common pain point when using AI agents like Claude Code on large codebases. Agents often rely on grep and file reading to find relevant code, consuming massive token budgets and still missing the right matches. Semble combines static embeddings, BM25, and code-aware reranking to deliver fast, accurate code retrieval using up to 98% fewer tokens than grep-based approaches—all on CPU and with zero configuration. Here are the answers to common questions about Semble.

What problem does Semble solve?

When AI agents search code, they typically fall back to grep and then read entire files or launch subagents. This process is token-inefficient, often spending thousands of tokens to find a single function, and it can still miss relevant code. Developers using Claude Code on large repositories ran into this frustration repeatedly. Existing code search tools were either too slow to index on demand, required API keys for external services, or had poor retrieval quality. Semble addresses these pain points with a local, CPU-friendly alternative that uses up to 98% fewer tokens than a grep-and-read workflow while reaching 99% of the retrieval quality of a much larger code-trained transformer.

How does Semble work under the hood?

Semble combines two retrieval signals: static Model2Vec embeddings (using the potion-code-16M model) and BM25 keyword matching. These are fused via Reciprocal Rank Fusion (RRF) and then reranked with code-aware signals like function boundaries and symbol names. Crucially, the embeddings are static—no transformer inference is needed—so everything runs on CPU. The index for a typical repository builds in about 250 milliseconds, and queries complete in roughly 1.5 milliseconds. This architecture avoids GPU dependency and keeps latency low while delivering high-quality code search results.
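The fusion step can be illustrated with a minimal sketch of Reciprocal Rank Fusion. This is not Semble's actual code: the function name, the document ids, and the constant k=60 (a commonly used default) are assumptions chosen for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document receives sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id so output is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the two retrievers (ids are made up).
embedding_hits = ["auth.py:login", "session.py:refresh", "db.py:connect"]
bm25_hits = ["session.py:refresh", "utils.py:hash_password", "auth.py:login"]

fused = reciprocal_rank_fusion([embedding_hits, bm25_hits])
print(fused)
```

Because RRF works only on ranks, it needs no score normalization between the embedding similarities and the BM25 scores, which is one reason it is a common choice for hybrid retrieval.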

How does Semble compare to traditional grep-based search?

Grep-based search requires reading entire files to find matches, consuming a huge number of tokens—especially when agents grep first and then read the matched file fully. Semble uses 98% fewer tokens for the same retrieval task. On a benchmark of ~1250 query/document pairs across 63 repositories and 19 languages, Semble achieves an NDCG@10 of 0.854, which is 99% of the retrieval quality of a 137M-parameter code-trained transformer. Yet it is about 200 times faster on CPU and doesn't require any GPU or API keys. This makes it ideal for token-conscious agent workflows where both speed and accuracy matter.
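For readers unfamiliar with the metric, here is a minimal sketch of how NDCG@10 is computed for a single query. It assumes the input list holds graded relevance labels for every judged document in ranked order; the example labels are invented and are not taken from Semble's benchmark.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k results."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, k=10):
    """DCG of the actual ranking divided by DCG of the ideal ranking.

    Assumes ranked_relevances contains all judged documents for the query,
    so sorting it yields the ideal ordering.
    """
    ideal_dcg = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


# Invented example: the single relevant document is ranked first vs. second.
perfect = ndcg_at_k([1, 0, 0])
second_place = ndcg_at_k([0, 1, 0])
print(perfect, second_place)
```

A benchmark-level score such as the one quoted above is typically the mean of this per-query value across all queries.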

What are the key performance metrics?

On the evaluated dataset, Semble achieves an NDCG@10 of 0.854, which is 99% of the score of the best transformer-based setup tested. Indexing a typical repository takes ~250 milliseconds, and each query completes in ~1.5 milliseconds on CPU (very large repos may take longer). The tool uses 98% fewer tokens than a grep+read approach. For developers concerned about latency, this means no waiting on a GPU and no remote API calls: everything runs locally with minimal overhead. The combination of speed, accuracy, and token efficiency makes Semble a practical choice for real-time code search in agent-based development environments.

What makes Semble token-efficient?

Token efficiency comes mainly from what Semble returns to the agent. Semble builds a compact, local index of the code, so the agent does not need to read entire files during search. When an agent issues a query, Semble returns only the most relevant code snippets (function bodies or small sections) rather than full files. Separately, because retrieval uses static embeddings and BM25 on CPU, no transformer inference is required; this affects latency and compute rather than token count, but it is what makes each query cheap enough to replace grep. The result is that an agent can find the right code with 98% fewer tokens than if it had to grep for a term and then read the matching file completely.
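The idea of returning a function body instead of a whole file can be sketched as follows. This is an illustration only, not Semble's chunking logic: it uses Python's standard ast module, a made-up source file, and a whitespace split as a very rough stand-in for a real tokenizer, so the savings it reports say nothing about the 98% figure.

```python
import ast

# A made-up source file standing in for something an agent might open.
SOURCE = '''\
import os

def load_config(path):
    with open(path) as fh:
        return fh.read()

def connect(host, port):
    return f"{host}:{port}"

def retry(fn, attempts=3):
    for _ in range(attempts):
        try:
            return fn()
        except OSError:
            continue
    raise RuntimeError("all attempts failed")
'''


def function_snippets(source):
    """Map each top-level function name to the source lines of its body."""
    lines = source.splitlines()
    snippets = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # lineno is 1-based; end_lineno is inclusive (Python 3.8+).
            snippets[node.name] = "\n".join(lines[node.lineno - 1:node.end_lineno])
    return snippets


def rough_token_count(text):
    """Crude proxy for token count: number of whitespace-separated pieces."""
    return len(text.split())


snippets = function_snippets(SOURCE)
snippet = snippets["connect"]
fraction_saved = 1 - rough_token_count(snippet) / rough_token_count(SOURCE)
print(snippet)
print(f"rough fraction of tokens saved: {fraction_saved:.0%}")
```

The gap widens as files grow, since the snippet size stays roughly constant while the cost of reading the full file scales with its length.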

How can developers integrate Semble into their workflow?

Semble provides an MCP (Model Context Protocol) server, making it a drop-in replacement for grep-based search in tools like Claude Code, Cursor, Codex, and OpenCode. Installation is simple. To register it with Claude Code, run a single command: claude mcp add semble -s user -- uvx --from "semble[mcp]" semble. No API keys, GPU, or external services are needed. Semble also supports standalone usage via its Python package. The full README includes instructions for other environments, benchmark details, and methodology. Developers can start using Semble immediately without any configuration overhead, and it works out of the box on any machine with a CPU.

What languages and repositories does Semble support?

Semble is language-agnostic and has been benchmarked across 19 programming languages in 63 repositories. This includes common languages like Python, JavaScript, TypeScript, Go, Java, Rust, C++, and more. The static embedding model (potion-code-16M) was trained on code data and generalizes well to different syntaxes. Because it uses both static embeddings and BM25, it can handle a wide variety of codebases, from small libraries to large monorepos. The index time scales with repository size, but even for larger repos the CPU-only processing remains practical. Developers can use Semble on any codebase without worrying about language-specific setup.
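The lexical half of the pipeline is language-agnostic because BM25 only looks at tokens, not syntax. Below is a minimal sketch of Okapi BM25 scoring. The parameters k1=1.5 and b=0.75 are conventional defaults assumed here, the whitespace tokenization is a simplification, and the three "documents" are invented snippets; none of this is taken from Semble's implementation.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(doc) for doc in docs_tokens) / n_docs

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for doc in docs_tokens:
        doc_freq.update(set(doc))

    scores = []
    for doc in docs_tokens:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # Length normalization: long documents are penalized via b.
            denom = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / denom
        scores.append(score)
    return scores


# Invented snippets in three different languages, pre-split on whitespace.
docs = [
    "def parse_config path return json load path".split(),
    "func ParseConfig path string error".split(),
    "fn connect host port socket".split(),
]
scores = bm25_scores(["parse_config", "json"], docs)
print(scores)
```

Exact-token matching like this misses the Go-style `ParseConfig` spelling in the second snippet, which is the kind of gap the embedding signal is meant to cover.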

How accurate is Semble compared to larger models?

Despite using a small 16M-parameter static model, Semble achieves 99% of the retrieval quality of a full 137M-parameter code-trained transformer, with an NDCG@10 of 0.854. The key is the fusion of static embeddings with BM25 and code-aware reranking: this compensates for the smaller model size by using lexical matches and structural signals like function boundaries. As a result, Semble comes close to the accuracy of the transformer baseline while being about 200 times faster on CPU and using no GPU resources. For most practical code search tasks in agent workflows, the quality difference is small, and the gains in speed and token efficiency make Semble a strong choice.
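To make "static" concrete: a static embedding model is essentially a lookup table of per-token vectors that are pooled into one vector, with no neural network run at query time. The sketch below shows that idea with a toy three-dimensional table whose values are made up. It does not use the Model2Vec API or the potion-code-16M weights, and the mean pooling and cosine similarity are assumptions about a typical setup.

```python
import math

# Toy lookup table; real models store one learned vector per vocabulary token.
TOKEN_VECTORS = {
    "parse":   [0.9, 0.1, 0.0],
    "json":    [0.8, 0.2, 0.1],
    "config":  [0.6, 0.3, 0.2],
    "open":    [0.1, 0.9, 0.1],
    "socket":  [0.0, 0.8, 0.3],
    "connect": [0.1, 0.9, 0.2],
}


def embed(text):
    """Mean-pool the vectors of known tokens; no model inference involved."""
    vectors = [TOKEN_VECTORS[t] for t in text.lower().split() if t in TOKEN_VECTORS]
    if not vectors:
        return [0.0, 0.0, 0.0]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


query = embed("parse json config")
docs = {
    "load_settings": embed("parse json"),
    "open_socket": embed("open socket connect"),
}
best = max(docs, key=lambda name: cosine(query, docs[name]))
print(best)
```

Since embedding a query is just a few table lookups and an average, the cost per query is tiny compared with a transformer forward pass, which is consistent with the CPU-only latency described above.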
