
Rspamd 4.0.0: New Protocol, Memory Savings, Required Migration

Summary

– Rspamd 4.0 introduces a new `/checkv3` scan protocol using structured JSON or msgpack for metadata and supports per-part compression.
– The update integrates Fasttext internally, eliminating an external library and significantly reducing memory usage in multi-worker setups.
– Fuzzy hash storage is enhanced to support multiple flags per digest, and a new rule detects phishing campaigns that reuse HTML templates.
– A breaking change replaces Jump Hash with Ring Hash for sharded Bayes, requiring a data migration step before upgrade for affected deployments.
– Configuration now supports Jinja2 templating with environment variables, and workers can natively serve HTTPS without a reverse proxy.

The latest major release of the open-source spam filtering system introduces significant architectural improvements focused on performance, memory efficiency, and deployment flexibility. This version includes several breaking changes that require careful attention, particularly for administrators managing sharded Redis deployments for per-user statistics.

A fundamental shift arrives with a new scanning protocol. The platform now offers a `/checkv3` endpoint that carries metadata as structured JSON or msgpack instead of traditional HTTP headers. The protocol uses multipart requests and responses, supports per-part zstd compression, and writes output with zero-copy I/O. Administrators can activate it using command-line flags, while the legacy protocol remains available for compatibility.
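To make the multipart idea concrete, here is a minimal Python sketch (standard library only) of a request body that puts JSON metadata in one part and the raw message in another. The part names (`metadata`, `message`) and the metadata keys are hypothetical placeholders, not the documented `/checkv3` field names. Per-part zstd compression and the msgpack variant are left out.

```python
import json
import uuid


def build_multipart_request(metadata: dict, message: bytes, boundary: str = "") -> tuple:
    """Build a multipart/form-data body: a JSON metadata part plus the raw message.

    The part names "metadata" and "message" are hypothetical placeholders,
    not the documented /checkv3 field names. Compression is not applied.
    """
    boundary = boundary or uuid.uuid4().hex
    crlf = b"\r\n"
    delimiter = b"--" + boundary.encode("ascii")
    parts = [
        ("metadata", "application/json", json.dumps(metadata).encode("utf-8")),
        ("message", "message/rfc822", message),
    ]
    body = bytearray()
    for name, content_type, payload in parts:
        body += delimiter + crlf
        body += f'Content-Disposition: form-data; name="{name}"'.encode("ascii") + crlf
        body += f"Content-Type: {content_type}".encode("ascii") + crlf + crlf
        body += payload + crlf
    body += delimiter + b"--" + crlf  # closing delimiter ends the multipart body
    return f"multipart/form-data; boundary={boundary}", bytes(body)
```

For example, `build_multipart_request({"ip": "192.0.2.1"}, raw_message)` returns a `Content-Type` header value and a body that any multipart-aware HTTP client could send. The point of the design is that structured metadata travels in its own typed part instead of being squeezed into header strings.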

Memory consumption drops sharply thanks to a redesigned way of loading fastText models. The dependency on the external libfasttext library has been eliminated; instead, a new built-in mmap-based shim loads model data directly into shared memory that all worker processes can access. This removes redundant per-worker heap copies, with estimated RAM savings of 500 MB to 7 GB in typical deployments. Existing model files continue to work without modification.
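This short Python sketch shows why memory-mapping a file avoids per-worker copies. It does not reproduce Rspamd's shim or the fastText file format. When several processes map the same file read-only, the kernel backs every mapping with the same page-cache pages, so the data is resident once rather than once per worker. The fake "model" file and the helper names are hypothetical.

```python
import mmap
import os
import tempfile


def map_model_readonly(path: str) -> mmap.mmap:
    """Map a whole file read-only; its pages are shared through the OS page cache."""
    with open(path, "rb") as f:
        # Length 0 maps the entire file; the mapping stays valid after the file is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def demo_two_workers() -> bytes:
    """Map one (hypothetical) model file twice, as two workers would, and read its header."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"FAKEMODEL" + bytes(1024))  # stand-in for real model data
        worker_a = map_model_readonly(path)
        worker_b = map_model_readonly(path)
        try:
            # Both mappings see identical bytes without either loading a heap copy.
            assert worker_a[:9] == worker_b[:9]
            return worker_a[:9]
        finally:
            worker_a.close()
            worker_b.close()
    finally:
        os.remove(path)
```

In a real multi-worker server, each worker would keep its mapping open for its lifetime. Only the small slices it actually reads are copied into its own memory.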

The fuzzy storage system now allows a single stored hash digest to carry up to eight flags at once, so multiple detection rules can match the same hash independently without duplicate entries in the database. The Redis update logic has been rewritten in Lua for better performance. A new detection symbol, HTMLFUZZYPHISHING, targets phishing campaigns that reuse HTML template structures while swapping out the embedded malicious links.
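A toy in-memory model of "one digest, several flags" follows. It is not Rspamd's Redis schema or its Lua update logic. The eight-flag limit comes from the release notes. The overflow policy (rejecting a ninth flag) is a hypothetical choice made for the sketch.

```python
MAX_FLAGS_PER_DIGEST = 8  # per-digest limit reported for Rspamd 4.0


class FuzzyStore:
    """Toy in-memory store: one entry per digest, each carrying a set of rule flags."""

    def __init__(self) -> None:
        self._entries = {}  # digest -> set of integer flags

    def add(self, digest: str, flag: int) -> bool:
        """Attach a flag to a digest; returns False if the digest is already full."""
        flags = self._entries.setdefault(digest, set())
        if flag in flags:
            return True  # already present, nothing to do
        if len(flags) >= MAX_FLAGS_PER_DIGEST:
            return False  # hypothetical overflow policy: reject the extra flag
        flags.add(flag)
        return True

    def check(self, digest: str) -> set:
        """Return every flag stored for a digest (empty set if the digest is unknown)."""
        return set(self._entries.get(digest, ()))

    def __len__(self) -> int:
        return len(self._entries)
```

Two rules that learn the same digest under different flags now share one entry. A lookup returns both flags, so each rule can match independently.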

A critical change for administrators concerns the hashing algorithm used for consistent sharding. The system has moved from Jump Hash to Ring Hash (Ketama) with virtual nodes. Ring Hash is more stable: when one of n upstream servers fails, only about 1/n of the keys are redistributed, and those keys return to their original shard once the server recovers. This is a breaking change for deployments that use sharded Redis for per-user Bayesian filtering. Before upgrading, administrators must run the `rspamadm statistics_dump migrate` command so that data does not land on the wrong shards after the update.
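The stability property above can be illustrated with a generic Ketama-style hash ring in Python. The sketch uses MD5-derived 32-bit ring points and 160 virtual nodes per server. Both choices are assumptions, and Rspamd's own hash function, virtual-node count, and key format may differ. It is meant only to show how the technique behaves, not to predict where Rspamd places a given key.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring with virtual nodes (generic Ketama-style sketch)."""

    def __init__(self, nodes, vnodes: int = 160) -> None:
        self.vnodes = vnodes  # virtual points per server (assumed value)
        self._ring = []  # sorted list of (point, node) tuples
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        # First 4 bytes of MD5 as a 32-bit position on the ring.
        return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def lookup(self, key: str) -> str:
        point = self._hash(key)
        # First virtual node clockwise from the key's point, wrapping past the end.
        idx = bisect.bisect_left(self._ring, (point, ""))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]
```

Suppose one of four servers is removed. Only the keys that lived on that server move, roughly a quarter of them. When the server is added back, its virtual nodes reappear at the same points, so every key returns to its original shard. This is also why switching algorithms forces a migration: Jump Hash and a ring place the same key on different shards.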

Operational flexibility increases with new native features. Workers can now serve HTTPS directly without a front-end reverse proxy, with SSL support auto-detected from the socket configuration. For load balancing across proxy upstreams, the default algorithm has switched from simple round-robin to token bucket balancing, which offers configurable parameters for handling traffic bursts.
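The following Python sketch shows token bucket balancing in its generic form. Each upstream has a bucket that refills at a fixed rate up to a burst capacity, and each request spends one token. The selection rule (pick the upstream with the most tokens left) and the `rate`/`burst` parameter names are assumptions for illustration, not Rspamd's configuration options.

```python
import time


class TokenBucket:
    """One upstream's bucket: refills at `rate` tokens/s, capped at `burst`."""

    def __init__(self, rate: float, burst: float, now: float) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst  # start full so an initial burst is allowed
        self.updated = now

    def refill(self, now: float) -> None:
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated = now


class TokenBucketBalancer:
    """Hypothetical balancer: send each request to the upstream with the most tokens."""

    def __init__(self, upstreams, rate: float, burst: float, clock=time.monotonic) -> None:
        self.clock = clock
        now = clock()
        self.buckets = {name: TokenBucket(rate, burst, now) for name in upstreams}

    def pick(self):
        now = self.clock()
        for bucket in self.buckets.values():
            bucket.refill(now)
        name, bucket = max(self.buckets.items(), key=lambda item: item[1].tokens)
        if bucket.tokens < 1.0:
            return None  # every upstream is exhausted; caller may queue or reject
        bucket.tokens -= 1.0
        return name
```

Unlike strict round-robin, an upstream that has just absorbed a burst is skipped until its bucket refills. The `burst` capacity controls how much short-term imbalance is tolerated.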

Configuration management becomes more dynamic with the introduction of a Jinja2-compatible templating engine. Configuration files are preprocessed, allowing the use of control structures and expressions. Environment variables prefixed with `RSPAMD_` are automatically available within templates, facilitating modern containerized deployments where configuration is injected via environment variables. Built-in validation filters can abort startup on invalid input, ensuring configuration integrity.
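The preprocessing flow can be sketched in Python with a deliberately tiny stand-in. This is not Jinja2 and not Rspamd's template engine. It collects `RSPAMD_`-prefixed environment variables, substitutes bare `{{ NAME }}` expressions, and fails on unknown names the way a validation filter would abort startup. Keeping variable names unchanged (prefix included) is an assumption, and so is the example variable `RSPAMD_REDIS_SERVERS`.

```python
import os
import re
from typing import Mapping, Optional

PREFIX = "RSPAMD_"
# Matches only bare {{ NAME }} expressions; real Jinja2 supports far more syntax.
EXPR = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def template_env(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Collect RSPAMD_-prefixed environment variables, keeping names unchanged (assumption)."""
    source = os.environ if environ is None else environ
    return {k: v for k, v in source.items() if k.startswith(PREFIX)}


def render(text: str, variables: Mapping[str, str]) -> str:
    """Substitute {{ NAME }}; an unknown name raises, like a validation filter aborting startup."""
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise ValueError(f"configuration error: {name} is not set")
        return variables[name]

    return EXPR.sub(substitute, text)
```

In a container, for example, `render('servers = "{{ RSPAMD_REDIS_SERVERS }}";', template_env())` would fill in a value injected by the orchestrator. Variables without the prefix, such as `HOME`, are never visible to the template.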

Additional noteworthy updates include native support for generating UUID v7 identifiers per scanning task, synchronized with logging. The Bayesian classifier now features multiclass support, enabling it to learn arbitrary categories beyond simple spam/ham classification. The regular expression engine’s Hyperscan pattern compilation has been moved to an asynchronous Lua backend with a Redis-based shared cache, resolving several use-after-free conditions that occurred during live reloads.
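UUID version 7 (RFC 9562) places a 48-bit Unix timestamp in milliseconds in the most significant bits, followed by the version and variant fields and random bits. As a result, IDs generated later sort after earlier ones, which makes them convenient for correlating tasks with time-ordered logs. The Python sketch below builds one by hand from that layout. How Rspamd attaches the ID to its log lines is not shown.

```python
import os
import time
import uuid
from typing import Optional


def uuid7(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUID version 7: 48-bit Unix ms timestamp, version/variant bits, random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits fill the low part
    value = ((unix_ms & ((1 << 48) - 1)) << 80) | rand
    # Version field (bits 76-79) = 0b0111.
    value = (value & ~(0xF << 76)) | (0x7 << 76)
    # Variant field (bits 62-63) = 0b10, the RFC 4122/9562 variant.
    value = (value & ~(0x3 << 62)) | (0x2 << 62)
    return uuid.UUID(int=value)
```

Because the timestamp occupies the top bits, an ID minted in a later millisecond always compares greater than one from an earlier millisecond.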

In a change to the default security posture, SenderScore RBLs are now disabled by default. These rules require a paid MyValidity account and previously returned blocked results for all unregistered IPs, which could cause false positives. Operators with valid accounts must explicitly re-enable the rules in their configuration.

The release also bundles numerous fixes, including corrections for PDF parsing evasions, updates to DKIM key handling to follow RFC standards, a resolved memory leak in DKIM signing, and patches for a fuzzy storage client busy-loop and a use-after-free condition.

(Source: Help Net Security)
