Qdrant 1.18: Faster, Smarter, and Easier Vector Search

1. A More Flexible Future for Vector Databases

Qdrant 1.18 is a big step toward making vector databases more flexible and easier to manage in production. Instead of treating vector indexes as something rigid that needs rebuilding every time you change them, Qdrant is moving toward a more dynamic system that can adapt while staying online.

For teams running AI systems at scale, this means better scalability, better monitoring, and fewer painful maintenance operations.

This release focuses on five major improvements:

In-place vector schema updates
Better memory monitoring
Improved audit logging
Stronger strict mode protections
TurboQuant, a new high-efficiency quantization method

One of the biggest improvements starts with solving a long-time issue in vector databases: re-indexing.

2. Dynamic Updates Without Rebuilding Collections

In AI applications, embedding models change often. Before Qdrant 1.18, updating a collection’s vector setup usually meant recreating the collection and re-uploading all data.

Now, Qdrant allows adding or removing named vectors directly using the PATCH API or schema endpoints, without downtime.

This means your collection structure can evolve together with your application while the cluster keeps running normally.

Real-World Benefit

A common use case is migrating to a new embedding model.

You can now:

Create a new vector field
Re-embed data gradually in the background
Test the new setup
Remove the old vector field later

All without shutting down the cluster or re-importing everything.

As collections become more dynamic, monitoring resource usage becomes even more important.

To add a new dense named vector to an existing collection:

client.create_vector_name(
    collection_name="{collection_name}",
    vector_name="{vector_name}",
    vector_name_config=models.DenseVectorNameConfig(
        dense=models.DenseVectorConfig(
            size=256,
            distance=models.Distance.COSINE,
        ),
    ),
)

To add a new sparse named vector to an existing collection:

client.create_vector_name(
    collection_name="{collection_name}",
    vector_name="{vector_name}",
    vector_name_config=models.SparseVectorNameConfig(
        sparse=models.SparseVectorConfig(
            modifier=models.Modifier.IDF,
        ),
    ),
)

3. Better Memory Monitoring and Observability

Memory usage in databases is often difficult to understand because operating system metrics don’t clearly show what is using RAM.

Memory Monitoring Dashboard

Qdrant 1.18 improves this with detailed memory monitoring for each collection component. These metrics are available both in the Web UI and through an API endpoint.

Memory Breakdown by Component

Component	What It Tracks
Vectors	Dense vectors, HNSW indexes, quantization data, RAM usage, cached pages
Sparse Vectors	Sparse vector storage and indexing memory
Payload	Metadata storage attached to vectors
Payload Index	Memory used for metadata indexes
ID Tracker	Mapping between external IDs and internal IDs

Understanding “Expected Cache”

Modern operating systems use free RAM as file cache to improve performance. When vectors are stored with on_disk: true, the OS automatically keeps frequently used data in memory.

The Expected Cache metric shows how much data should ideally stay cached for fast searches.

If the Cached value is high, that’s usually a good sign, not a memory leak.

Comparing these values helps determine whether your system is properly warmed up for production traffic.

For a deeper technical explanation, see my last blog at:

TurboQuant: the compression Shannon would approve

4. TurboQuant: Better Compression Without Losing Too Much Accuracy

Before this release, there was a large gap between Scalar Quantization and Binary Quantization in terms of compression and recall quality.

TurboQuant fills that gap.

TurboQuant Compression Performance

It offers compression between 8x and 32x while still keeping strong recall performance.

TurboQuant works using:

Walsh-Hadamard rotations
Lloyd-Max quantization
Additional calibration techniques for real-world embeddings

One important advantage is that the rotation step preserves distances, so queries stay efficient without requiring expensive reverse operations.

Practical Results

TQ 4-bit is a strong replacement for Scalar Quantization users, cutting memory usage in half while keeping recall very close to the original.
TQ 1-bit and 2-bit significantly improve recall compared to traditional Binary Quantization at the same storage size.

In some datasets, TurboQuant improved recall by more than 20 percentage points compared to standard binary quantization.

Performance Optimizations

TurboQuant also uses low-level SIMD optimizations to keep searches fast even with heavy compression.

This allows Qdrant to reduce memory usage while still maintaining strong throughput.

For a full technical breakdown of TurboQuant and its math, check:

https://mohamedarbi.xyz/posts/turboquant-qdrant

5. Better Security and Infrastructure Protection

Qdrant 1.18 also adds several features focused on operational safety and debugging

Query Audit Logs

Administrators can now search JSON audit logs across the cluster using filters such as:

Time ranges
User or subject information

This simplifies security reviews and troubleshooting.

Request Tracing

Support for headers like:

x-request-id
x-tracing-id
traceparent

makes it easier to connect client requests with server logs during debugging.

Strict Mode Improvements

New protections help avoid resource exhaustion:

max_resident_memory_percent prevents memory-heavy writes when the system is close to running out of memory.
search_max_batchsize prevents very large workloads from affecting other users or collections.

Per-Collection Metrics

The new ?per_collection=true parameter helps identify latency spikes for specific collections in multi-tenant environments.

6. Conclusion

Qdrant 1.18 makes vector search systems more flexible, observable, and production-ready.

With dynamic schema updates, improved monitoring, TurboQuant compression, and stronger operational safeguards, this release helps teams scale AI applications more efficiently.

Upgrade Path

Recommended upgrade order:

1.16.x → 1.17.x → 1.18.0

Cloud Users

Select version 1.18 from the Qdrant Cloud dashboard for automatic migration.

Self-Hosted Users

Perform rolling node restarts and verify each node before moving to the next one to maintain availability.

Overall, Qdrant 1.18 gives developers more flexibility to evolve their vector search systems without sacrificing stability or performance.

References

Share this article:

(END)