Kris Deugau
2008-04-09 16:12:43 UTC
Anyone have any suggestions on tuning a large global Bayes db for
stability and sanity? I've got my fingers in the pie of a moderately
large mail cluster, but I haven't yet found a Bayes configuration that's
sane and stable for any extended period. Wiping it completely about
once a week seems to provide "acceptable" filtering performance (we have
a number of add-on rulesets), but I still see spam in my inbox hitting
BAYES_00, a sure sign of a mistuned Bayes database.
Past experience with (much) smaller systems has shown stable behaviour
with bayes_expiry_max_db_size set to 1500000 (a ~40M BDB Bayes database);
daily expiry runs delete ~25-35K tokens at a mail volume of ~3K
messages/day. However, the larger system (MySQL, currently with
bayes_expiry_max_db_size at 3000000, on-disk files running ~100M) only
seems to be expiring that same 25-35K tokens per day, even though
autolearn is picking up 1.5M+ tokens from ~300K messages daily.
Reading through the docs on token expiry, I would guess expiry should
be far more aggressive than it is. (Among other things, I really don't
want to bump max_db_size up by two orders of magnitude; up to ~5M
should be fine, and I could see going as high as 7.5M if really
necessary.)
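For reference, the Bayes-related settings on the large system look more
or less like the sketch below; the SQL DSN and credentials are
placeholders, and running expiry from a nightly cron job rather than
inline during scans (bayes_auto_expire 0) is how I've assumed it should
be set up at this volume:

    # local.cf (Bayes-related excerpt, sketch only)
    use_bayes               1
    bayes_auto_learn        1

    # SQL-backed Bayes store; DSN/credentials are placeholders
    bayes_store_module      Mail::SpamAssassin::BayesStore::MySQL
    bayes_sql_dsn           DBI:mysql:bayes:dbhost
    bayes_sql_username      bayes
    bayes_sql_password      xxxxxxxx

    # Target token count; expiry should trim the db back toward this
    bayes_expiry_max_db_size  3000000

    # Skip opportunistic expiry during scans; run it from cron instead
    bayes_auto_expire       0

Nothing exotic there, as far as I can tell.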
I'm not even really sure what questions to ask to get more detail;
sa-learn -D doesn't really spit out *enough* detail about the expiry
process to know for sure if something is going wrong there.
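For what it's worth, the most useful output I've been able to get so
far comes from something along these lines (assuming a stock sa-learn
run as the user that owns the Bayes db; the grep is just to cut down
the noise):

    # Dump the db's own bookkeeping: nspam/nham, token count, oldest
    # and newest token atimes, and the size of the last expiry pass
    sa-learn --dump magic

    # Force an expiry run with Bayes-area debugging turned on, keeping
    # only the expiry-related lines
    sa-learn -D bayes --force-expire 2>&1 | grep -i expir

If the "last expire reduction count" line in the magic dump keeps
tracking the same ~25-35K while the token count climbs, that would at
least narrow it down to the expiry pass itself rather than the learner.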
-kgd