Spam Scanner: the best anti-spam, email filtering, and phishing prevention service

by do son · Published November 11, 2021 · Updated October 25, 2022

Spam Scanner

Spam Scanner is a drop-in replacement for, and the best alternative to, SpamAssassin, rspamd, SpamTitan, and more.

Foreword

Spam Scanner is a tool and service that @niftylettuce built after hitting countless roadblocks with existing spam-detection solutions. In other words, it’s our current plan for spam.

Our goal is to build and use a scalable, performant, simple, easy-to-maintain, and powerful API in our service at Forward Email to limit spam and provide other measures that prevent attacks on our users.

Initially, we tried SpamAssassin and later evaluated rspamd. In the end, we learned that all existing solutions (even ones besides these) are overly complex, are missing required features or documentation, are incredibly challenging to configure, have a high barrier to entry, or have proprietary storage backends (which could store and read your messages without your consent) that limit our scalability.

We value privacy and the security of our data and users. Specifically, we have a “Zero-Tolerance Policy” on storing logs or metadata of any kind, whatsoever (see our Privacy Policy for more on that). None of these solutions honored this privacy policy (without removing essential spam-detection functionality), so we had to create our own tool, and thus “Spam Scanner” was born.

The solution we created provides several Features and is completely configurable to your liking. You can learn more about the actual Algorithm below. Contributors are welcome.

Features

Spam Scanner includes modern, essential, and performant features that help reduce spam, phishing, and executable attacks.

Naive Bayes Classifier

Our Naive Bayesian classifier is available in this repository and in the npm package, and it is updated frequently as it gains upstream, anonymous, SHA-256 hashed data from Forward Email.

It was trained with an extremely large dataset of spam, ham, and abuse reporting format (“ARF”) data. This dataset was compiled privately from multiple sources.
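To make the idea of SHA-256 hashed training data concrete, here is a minimal JavaScript sketch using Node's built-in crypto module. The source does not describe how Forward Email hashes its upstream data, so the helper names hashToken and hashTokens, and the choice to hash individual tokens, are assumptions for illustration only.

```javascript
const crypto = require('crypto');

// Hypothetical helper: replace a token with its SHA-256 hex digest so that
// only the digest, not the raw word, is ever shared.
function hashToken(token) {
  return crypto.createHash('sha256').update(token, 'utf8').digest('hex');
}

// Hypothetical helper: hash every token of an already tokenized message.
function hashTokens(tokens) {
  return tokens.map(hashToken);
}

const hashed = hashTokens(['free', 'money']);
console.log(hashed); // two 64-character hex strings
```

Because the same token always produces the same digest, a classifier could in principle still count occurrences of hashed tokens without seeing the original words.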

Spam Content Detection

Provides an out-of-the-box trained Naive Bayesian classifier (uses naivebayes and natural under the hood), which was trained on hundreds of thousands of spam and ham emails. This classifier uses tokenized and stemmed words (stemmed with respect to the language of the email) to sort each message into one of two categories (“spam” and “ham”).
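The following is a minimal sketch of how a two-category naive Bayes classifier works in general, written in plain JavaScript with no dependencies. It is not Spam Scanner's implementation and does not use the naivebayes or natural packages. The class name TinyNaiveBayes, its methods, and the four training sentences are all hypothetical, and stemming is omitted for brevity.

```javascript
// Multinomial naive Bayes with Laplace (add-one) smoothing.
class TinyNaiveBayes {
  constructor() {
    this.docCounts = { spam: 0, ham: 0 };
    this.tokenCounts = { spam: new Map(), ham: new Map() };
    this.totalTokens = { spam: 0, ham: 0 };
    this.vocabulary = new Set();
  }

  // Lowercase and split on non-letters. A real pipeline would also stem
  // each token according to the language of the message.
  static tokenize(text) {
    return text.toLowerCase().split(/[^\p{L}]+/u).filter(Boolean);
  }

  learn(text, category) {
    this.docCounts[category] += 1;
    const counts = this.tokenCounts[category];
    for (const token of TinyNaiveBayes.tokenize(text)) {
      counts.set(token, (counts.get(token) || 0) + 1);
      this.totalTokens[category] += 1;
      this.vocabulary.add(token);
    }
  }

  // Log prior plus the sum of smoothed log likelihoods for each token.
  score(tokens, category) {
    const totalDocs = this.docCounts.spam + this.docCounts.ham;
    let logProb = Math.log(this.docCounts[category] / totalDocs);
    const denominator = this.totalTokens[category] + this.vocabulary.size;
    for (const token of tokens) {
      const count = this.tokenCounts[category].get(token) || 0;
      logProb += Math.log((count + 1) / denominator);
    }
    return logProb;
  }

  categorize(text) {
    const tokens = TinyNaiveBayes.tokenize(text);
    return this.score(tokens, 'spam') > this.score(tokens, 'ham') ? 'spam' : 'ham';
  }
}

const classifier = new TinyNaiveBayes();
classifier.learn('win free money now', 'spam');
classifier.learn('claim your free prize today', 'spam');
classifier.learn('meeting notes attached for tomorrow', 'ham');
classifier.learn('lunch tomorrow with the team', 'ham');

console.log(classifier.categorize('free money prize')); // spam
console.log(classifier.categorize('notes for the team meeting')); // ham
```

Working in log space avoids numeric underflow on long messages, and the add-one smoothing keeps a single unseen word from forcing a category's probability to zero.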

Phishing Content Detection

A robust phishing detection approach that guards against domain swapping, IDN homograph attacks, and more.
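As a rough illustration of the two attack types named above, the sketch below uses only Node's built-in url module. It is an assumption about how such checks could look, not Spam Scanner's actual logic. The function names isIdnHostname and isDomainSwap are hypothetical, and treating every Punycode hostname as suspicious is a deliberate simplification, since many legitimate domains are internationalized.

```javascript
const { domainToASCII } = require('url');

// True when any label of the hostname is Punycode-encoded, meaning the
// original name contains non-ASCII characters that may imitate ASCII ones.
function isIdnHostname(hostname) {
  const ascii = domainToASCII(hostname);
  return ascii.split('.').some((label) => label.startsWith('xn--'));
}

// True when the visible link text is itself a URL whose hostname differs
// from the hostname the link really points to.
function isDomainSwap(linkText, href) {
  let shown;
  let actual;
  try {
    shown = new URL(linkText.trim());
    actual = new URL(href);
  } catch {
    return false; // one of the values is not an absolute URL, nothing to compare
  }
  return shown.hostname !== actual.hostname;
}

// '\u0430' is the Cyrillic letter that looks like a Latin "a".
console.log(isIdnHostname('\u0430pple.com')); // true
console.log(isIdnHostname('apple.com')); // false
console.log(isDomainSwap('https://paypal.com/login', 'https://evil.example/login')); // true
```

A production check would likely combine signals like these with allowlists and reputation data rather than rely on either test alone.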

Executable Link and Attachment Detection

Link and attachment detection techniques that check links in the message, “Content-Type” headers, file extensions, and magic numbers, and that guard against homograph attacks on file names, all against a list of executable file extensions.
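The extension and magic-number checks described above can be sketched as follows, using only Node's built-in modules. This is an illustrative assumption, not the project's code. The extension list is intentionally tiny, the names EXECUTABLE_EXTENSIONS, MAGIC_NUMBERS, and isExecutableAttachment are hypothetical, and NFKC normalization only folds compatibility look-alikes (such as fullwidth letters), so it is a partial stand-in for real homograph handling.

```javascript
const path = require('path');

// Assumed, deliberately short list; a production list would be much longer.
const EXECUTABLE_EXTENSIONS = new Set(['exe', 'bat', 'cmd', 'com', 'scr', 'msi', 'jar', 'vbs']);

// Well-known signatures found at offset 0 of executable files.
const MAGIC_NUMBERS = [
  { name: 'Windows PE (MZ)', bytes: Buffer.from([0x4d, 0x5a]) },
  { name: 'ELF', bytes: Buffer.from([0x7f, 0x45, 0x4c, 0x46]) },
];

function hasExecutableExtension(filename) {
  // NFKC maps compatibility characters (e.g. fullwidth "ｅｘｅ") to ASCII.
  // It does not catch cross-script look-alikes such as Cyrillic letters.
  const normalized = filename.normalize('NFKC').toLowerCase().trim();
  const extension = path.extname(normalized).slice(1);
  return EXECUTABLE_EXTENSIONS.has(extension);
}

function hasExecutableMagicNumber(content) {
  return MAGIC_NUMBERS.some(({ bytes }) =>
    content.subarray(0, bytes.length).equals(bytes)
  );
}

// Flag the attachment if either the name or the leading bytes look executable.
function isExecutableAttachment(filename, content) {
  return hasExecutableExtension(filename) || hasExecutableMagicNumber(content);
}

// A file named like a PDF whose content starts with the "MZ" signature.
console.log(isExecutableAttachment('report.pdf', Buffer.from('MZ\x90\x00', 'latin1'))); // true
console.log(isExecutableAttachment('report.pdf', Buffer.from('%PDF-1.7'))); // false
```

Checking the leading bytes as well as the name matters because a sender can rename an executable freely, while the file's signature is much harder to disguise.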