Training and maintaining the Bayesian databases
Bayesian scanning uses databases to determine if an email is spam. For Bayesian scanning to be effective, the databases must be trained with known-spam and known-good email messages so the scanner can learn the differences between the two types of email. To maintain the scanner's effectiveness, false positives and false negatives must be sent to the FortiMail unit so the Bayesian scanner can learn from its mistakes.
The AntiSpam > Bayesian submenu lets you manage the databases used to store statistical information for Bayesian antispam processing, and configure the email addresses used for remote control and training of the Bayesian databases.
To use a Bayesian database, you must enable the Bayesian scan in the antispam profile. For more information, see “Managing antispam profiles” on page 503.
This section contains the following topics:
- Types of Bayesian databases
- Training the Bayesian databases
- Example: Bayesian training
- Backing up, batch training, and monitoring the Bayesian databases
- Configuring the Bayesian training control accounts
- Backup and restore
Types of Bayesian databases
FortiMail units can have up to three types of Bayesian databases:
- Global
- Group
- User
All types contain Bayesian statistical data that can be used by Bayesian scans to detect spam, and should be trained in order to be most accurate for detecting spam within their respective scopes. For more information on training each type of Bayesian database, see “Training the Bayesian databases” on page 645.
Only one Bayesian database is used by any individual Bayesian scan; which type will be used depends on the directionality of the email and your configuration of the FortiMail unit’s protected domains and antispam profiles. For information, see “Use global Bayesian database” on page 401 and “Use personal database” on page 510.
Global
The global Bayesian database is a single database that contains Bayesian statistics that can be used to detect spam for any email user.
Outgoing antispam profiles can use only the global Bayesian database. Incoming antispam profiles can use global, per-domain, or per-user Bayesian databases.
If all spam sent to all protected domains has similar characteristics and you do not require your Bayesian scans to be tailored specifically to the email of a protected domain or email user, using the global database for all Bayesian scanning may be an ideal choice, because there is only one database to train and maintain.
Email that is not required to use the global database, such as email handled by incoming antispam profiles, uses the global database only if you disable use of the per-domain and per-user Bayesian databases. For information on configuring protected domains to use the global Bayesian database, see “Use global Bayesian database” on page 401. For information on disabling use of per-user Bayesian databases by incoming antispam profiles, see “Use personal database” on page 510.
Group
Group Bayesian databases, also known as per-domain Bayesian databases, contain Bayesian statistics that can be used to detect spam for email users in a specific protected domain.
FortiMail units can have multiple group Bayesian databases: one for each protected domain.
If you require Bayesian scans to be tailored specifically to the email received by each protected domain, using per-domain Bayesian databases may provide greater accuracy and fewer false positives.
For example, medical terms are a common characteristic of many spam messages. However, those terms may be a poor indicator of spam if the protected domain belongs to a hospital. In this case, you may want to train a separate, per-domain Bayesian database in which medical terms are not statistically likely to indicate spam.
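The effect of per-domain training can be illustrated with a toy calculation. The following sketch is not the FortiMail unit's actual algorithm: it uses a simplified per-token estimate, the function name is hypothetical, and the message counts are invented purely for illustration.

```python
def token_spam_probability(spam_with_token, spam_total, ham_with_token, ham_total):
    """Simplified per-token estimate: the token's relative frequency in spam,
    divided by the sum of its relative frequencies in spam and in good email."""
    spam_rate = spam_with_token / spam_total
    ham_rate = ham_with_token / ham_total
    if spam_rate + ham_rate == 0:
        return 0.5  # token never seen in training: no evidence either way
    return spam_rate / (spam_rate + ham_rate)


# Invented counts for a medical term, out of 1000 spam and 1000 good messages.
# A database trained on general email rarely sees the term in good email.
general = token_spam_probability(400, 1000, 10, 1000)

# A database trained on a hospital's email sees the term often in good email.
hospital = token_spam_probability(400, 1000, 600, 1000)

print(f"general database:  {general:.2f}")
print(f"hospital database: {hospital:.2f}")
```

With these invented counts, the same term is a strong spam indicator in the general database and a weak one in the hospital's per-domain database.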
If you want to use a per-domain database, you must disable use of the global and per-user Bayesian databases. For information on disabling use of the global Bayesian database for a protected domain, see “Use global Bayesian database” on page 401. For information on disabling use of per-user Bayesian databases by incoming antispam profiles, see “Use personal database” on page 510.
User
User Bayesian databases, also known as personal or per-user Bayesian databases, contain Bayesian statistics that can be used to detect spam for individual email users or alias email addresses. FortiMail units can have multiple user Bayesian databases: one for each recipient email address.
Per-user Bayesian databases are separate for each email address on each protected domain. For example, if example.com and example.org are defined as protected domains, user1@example.com and user1@example.org will have separate per-user Bayesian databases, even if both email addresses belong to the same person.
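As a rough mental model, per-user databases can be pictured as a collection keyed by the full recipient email address, not by the person. The sketch below is an in-memory stand-in with hypothetical names (`per_user_databases`, `train`); it does not reflect how the FortiMail unit actually stores its databases.

```python
from collections import defaultdict

# Hypothetical stand-in: one set of token counts per recipient email address.
per_user_databases = defaultdict(lambda: {"spam": {}, "ham": {}})


def train(recipient, label, tokens):
    """Add token counts to the database of one recipient address.
    label must be "spam" or "ham"."""
    database = per_user_databases[recipient]
    for token in tokens:
        database[label][token] = database[label].get(token, 0) + 1


# Training one address does not affect the other, even for the same person.
train("user1@example.com", "spam", ["stock", "quote"])
train("user1@example.org", "ham", ["stock", "quote"])
```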
If you require Bayesian scans to be tailored specifically to the email received by each email user, using per-user Bayesian databases may provide greater accuracy and fewer false positives.
For example, stock quotes are a common characteristic of many spam messages. However, stock quotes may be a poor indicator of spam if the email user is a financial advisor. In this case, for that email user, you may want to train a separate, per-user Bayesian database in which stock quotes are not statistically likely to indicate spam.
If you want to use a per-user database, you must enable the use of per-user Bayesian databases. For information on enabling use of per-user Bayesian databases by incoming antispam profiles, see “Use personal database” on page 510. Unlike global and per-domain Bayesian databases, per-user Bayesian databases also require you to verify that the database has reached maturity. For more information, see “Training the Bayesian databases” on page 645.
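The rules described for the three database types can be summarized as a small decision function. This is one reading of the rules in this section, not the FortiMail unit's implementation: the function and parameter names are hypothetical stand-ins for the “Use global Bayesian database” and “Use personal database” settings, and database maturity is not modeled.

```python
from enum import Enum


class Database(Enum):
    GLOBAL = "global"
    GROUP = "per-domain"
    USER = "per-user"


def select_database(direction, domain_uses_global, profile_uses_personal):
    """Return the type of Bayesian database a scan would use.

    direction: "incoming" or "outgoing"
    domain_uses_global: stand-in for "Use global Bayesian database"
        on the protected domain
    profile_uses_personal: stand-in for "Use personal database"
        in the incoming antispam profile
    """
    if direction == "outgoing":
        # Outgoing antispam profiles can use only the global database.
        return Database.GLOBAL
    if direction != "incoming":
        raise ValueError(f"unknown direction: {direction!r}")
    if profile_uses_personal:
        # Global and per-domain use both require per-user to be disabled.
        return Database.USER
    if domain_uses_global:
        return Database.GLOBAL
    return Database.GROUP


print(select_database("incoming", domain_uses_global=False,
                      profile_uses_personal=False).value)
```

Only one value is returned for any combination of settings, which matches the rule that a single Bayesian database is used by any individual scan.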