Discussion:
Spam messages bypassing SA
Geoff Soper
2014-09-04 06:51:54 UTC
Hi,
I've got an issue whereby spam messages seem to be somehow bypassing SA and getting into my inbox. I call SA via procmail as per https://wiki.apache.org/spamassassin/UsedViaProcmail

The exact procmail file that calls SA is as follows:

#
#Standard SA call to be included from .procmailrc files
#

:0 f
| formail -A"X-Procmail-SpamAssassinInclude: 23/03/2010"

#
# Pipe the mail through spamassassin (replace 'spamassassin' with 'spamc'
# if you use the spamc/spamd combination)
#
# The condition line ensures that only messages smaller than 250 kB
# (250 * 1024 = 256000 bytes) are processed by SpamAssassin. Most spam
# isn't bigger than a few k and working with big messages can bring
# SpamAssassin to its knees.
#
# The lock file ensures that only 1 spamassassin invocation happens
# at 1 time, to keep the load down.
#

:0fw: spamassassin.lock
* < 400000
| spamc -x


# Work around procmail bug: any output on stderr will cause the "F" in "From"
# to be dropped. This will re-add it.
:0
* ^^rom[ ]
{
LOG="*** Dropped F off From_ header! Fixing up. "

:0 fhw
| sed -e '1s/^/F/'
}


:0
* ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*
/dev/null
#${DIR}.Spam.Bad/

:0
* ^X-Spam-Status: Yes
${DIR}.Spam/

When I see these spam messages in my inbox, they have the X-Procmail-SpamAssassinInclude header but no evidence (i.e. no X-Spam-* headers) of SA having processed the message. According to Thunderbird, these messages are well below the size threshold specified. Could it be to do with the locking mechanism specified in the file above? If the lock is held, what happens? Does procmail wait for SA to become available, or does it simply skip SA? Historically I've had these messages occasionally, but over the past week I've had bursts of dozens of messages over a few minutes, all apparently bypassing SA. I'm running version 3.3.2.
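
One way to put numbers on this symptom is to scan delivered mail for messages that carry the include header but no SA result header. The following is only a rough shell sketch, assuming one message per file (as in a Maildir) with LF line endings, and assuming X-Spam-Status is the header SA would have added; the function name bypassed_sa and the Maildir path in the usage comment are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when a message went through the
# include file but shows no SpamAssassin result header.
# Assumes one message per file, as in a Maildir.
bypassed_sa() {
    # The header section ends at the first empty line.
    headers=$(sed '/^$/q' "$1")
    # Not handled by the include file at all: not a bypass.
    printf '%s\n' "$headers" | grep -q '^X-Procmail-SpamAssassinInclude:' || return 1
    # SA result header present: the message was scanned.
    if printf '%s\n' "$headers" | grep -q '^X-Spam-Status:'; then
        return 1
    fi
    return 0
}
# Usage (the path is an assumption, adjust to your own mail store):
#   for f in "$HOME"/Maildir/new/*; do bypassed_sa "$f" && echo "$f"; done
```

Comparing the delivery times of the files it lists against the spamd log may show whether the bypasses line up with bursts of incoming mail.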

Thanks in anticipation
Timothy Murphy
2014-09-04 11:32:46 UTC
1) Is there a simple way of dumping email with an empty To: header?
This seems invariably to be spam, and I'm surprised SA doesn't seem
to score it highly.
Maybe it doesn't consider this to be a header?

2) Does "autolearn" actually remove spam with a very high score?
Or does it still get marked as spam by SA and passed on?

3) As will be obvious, I am not a student of SA;
I just use the default setting, which seems to work well enough for me.
But I'm a little surprised that more or less identical email
that I have marked as spam many times and passed through salearn
still seems to get through.
Is there a simple check to make sure salearn is working?
(I get the message that "192 messages have been examined",
and ~/.spamassassin/bayes_seen and bayes_tok are pretty large,
300kB and 5MB.)

4) I haven't found a short and simple SA tutorial,
explaining how SA works,
with a few tests that one might add to the default,
and a couple of checks one could try to make sure it is working.
--
Timothy Murphy
e-mail: gayleard /at/ eircom.net
School of Mathematics, Trinity College, Dublin 2, Ireland
John Hardin
2014-09-04 14:52:07 UTC
Post by Timothy Murphy
1) Is there a simple way of dumping email with an empty To: header?
If by "dump" you mean "discard", this simple test might be better done in
your MTA. However, "poison pill" rules (absent certain DNSBLs) are
generally discouraged.
Post by Timothy Murphy
This seems invariably to be spam, and I'm surprised SA doesn't seem
to score it highly.
Probably because even if it's a good spam sign, it isn't very common or it
appears together with enough other spam signs that it's not scored very
highly by itself.

If you post some spamples of such to pastebin we'll take a look.
Post by Timothy Murphy
Maybe it doesn't consider this to be a header?
Yes, it does. There are rules that check for no TO or CC. For example:

http://ruleqa.spamassassin.org/20140902-r1621946-n/REPLYTO_WITHOUT_TO_CC/detail

If you want to score for "no TO or CC header", you could do this:

meta NO_TO_CC !__TOCC_EXISTS
Post by Timothy Murphy
2) Does "autolearn" actually remove spam with a very high score?
Or does it still get marked as spam by SA and passed on?
"autolearn" is submission of the message to the Bayes backend for
training. This can affect the scoring of subsequently-scanned messages,
but it does not affect the score of that message.

Also: SA does not directly have anything to do with the delivery process.
All it does is generate a spamminess score. *Something else* has to
interpret that score to decide the ultimate destination of the message:
inbox, quarantine or bit bucket.
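
To make that division of labour concrete, here is a small shell sketch of the "something else": it only reads the headers SA added and picks a destination, mirroring the two delivery recipes in the procmail file at the top of this thread. The function name route_message and the destination labels are hypothetical, and it assumes one message per file with LF line endings.

```shell
#!/bin/sh
# Hypothetical sketch: map SA's verdict headers to a destination.
# SA itself only adds the headers; this step makes the delivery decision.
route_message() {
    # The header section ends at the first empty line.
    headers=$(sed '/^$/q' "$1")
    # Ten or more stars in X-Spam-Level: discard.
    if printf '%s\n' "$headers" | grep -q '^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*'; then
        echo discard
    # Flagged as spam but below ten stars: file in the spam folder.
    elif printf '%s\n' "$headers" | grep -q '^X-Spam-Status: Yes'; then
        echo spam-folder
    else
        echo inbox
    fi
}
```

A message that was never scanned has neither header, so it falls through to the inbox branch.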
Post by Timothy Murphy
3) As will be obvious, I am not a student of SA;
I just use the default setting, which seems to work well enough for me.
But I'm a little surprised that more or less identical email
that I have marked as spam many times and passed through salearn
still seems to get through.
That would seem to indicate a problem with Bayes.
Post by Timothy Murphy
Is there a simple check to make sure salearn is working?
You will see BAYES_* rule hits on messages if Bayes is working. You have
to learn a minimum number of spam *and* ham messages before it will start
working.

This will report statistics about the Bayes database:

/usr/bin/sa-learn --dump magic

The most common mistake is to train Bayes as a user that is not the same
user that SA is running under to scan messages - i.e., you're training the
wrong Bayes database. Check which user spamd is running under, and which
user you're running sa-learn as. They should be the same user.
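
The figures worth checking in that output are the nspam and nham counts. As a rough sketch only: the shell function below reads the dump on stdin and reports whether both counts reach a minimum. It assumes the count is the third whitespace-separated field on the "non-token data: nspam" and "non-token data: nham" lines, and that 200 of each (the stock default for bayes_min_spam_num and bayes_min_ham_num) is the threshold in force; the name bayes_ready is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `sa-learn --dump magic` output on stdin and
# report whether both learned-message counts reach a minimum.
# Assumption: 200 is the stock default minimum; pass another value as $1
# if your configuration overrides it.
bayes_ready() {
    awk -v min="${1:-200}" '
        /non-token data: nspam$/ { nspam = $3 }
        /non-token data: nham$/  { nham  = $3 }
        END {
            printf "nspam=%d nham=%d\n", nspam, nham
            # Exit status 0 only when both counts reach the minimum.
            exit !(nspam + 0 >= min + 0 && nham + 0 >= min + 0)
        }'
}
# Usage (run as the same user that spamd scans mail as):
#   sa-learn --dump magic | bayes_ready
```

If the counts look far too low for the amount of training done, that points back at the wrong-user problem described above.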
Post by Timothy Murphy
4) I haven't found a short and simple SA tutorial,
explaining how SA works,
with a few tests that one might add to the default,
and a couple of checks one could try to make sure it is working.
The definitive test to check whether SA is scanning messages is to send a
message containing the GTUBE string; it should always be detected and
score 1000 points. Google "spam GTUBE" for more details.
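
For a quick local check without sending mail, a test message can be generated and piped straight through the scanner. This is a minimal sketch: the function name make_gtube_message and the example.com addresses are placeholders, and the usage line assumes spamc can reach a running spamd, as in the setup discussed in this thread.

```shell
#!/bin/sh
# Print a minimal test message whose body is the GTUBE test string.
# The addresses are placeholders (assumptions), not real accounts.
make_gtube_message() {
    cat <<'EOF'
From: sender@example.com
To: recipient@example.com
Subject: GTUBE test

XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X
EOF
}
# Usage (assumes a reachable spamd):
#   make_gtube_message | spamc | grep '^X-Spam-'
```

If the usage line prints no X-Spam-* headers at all, the message came back unscanned, which is the same symptom as the bypass reported at the start of the thread.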
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
***@impsec.org FALaholic #11174 pgpk -a ***@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
The tree of freedom must be freshened from time to time
with the blood of tyrants and tyrannosaurs.
-- DW, commenting on the GM6 Lynx .50BMG bullpup
-----------------------------------------------------------------------
13 days until the 227th anniversary of the signing of the U.S. Constitution
LuKreme
2014-09-04 17:26:01 UTC
Post by Timothy Murphy
1) Is there a simple way of dumping email with an empty To: header?
This seems invariably to be spam, and I'm surprised SA doesn't seem
to score it highly.
You may be surprised if you actually check spam and ham.
Post by Timothy Murphy
2) Does "autolearn" actually remove spam with a very high score?
Or does it still get marked as spam by SA and passed on?
SA never removes mail under any circumstances.
Post by Timothy Murphy
Is there a simple check to make sure salearn is working?
(I get the message that "192 messages have been examined",
and ~/.spamassassin/bayes_seen and bayes_tok are pretty large,
300kB and 5MB.)
For the record, using sql for babes is considerably faster.
Post by Timothy Murphy
4) I haven't found a short and simple SA tutorial,
explaining how SA works,
with a few tests that one might add to the default,
and a couple of checks one could try to make sure it is working.
If you see X-Spam headers, it’s working. If in the X-Spam-Report you see BAYES_ then that is working.
--
she [Esk] was already learning that if you ignore the rules people will,
half the time, quietly rewrite them so they don't apply to you. --Equal
Rites
John Hardin
2014-09-04 17:51:41 UTC
Post by LuKreme
For the record, using sql for babes is considerably faster.
Is that anything like "SQL for Dummies"?
jdow
2014-09-04 17:59:49 UTC
Post by John Hardin
Post by LuKreme
For the record, using sql for babes is considerably faster.
Is that anything like "SQL for Dummies"?
John, I was wondering if there was an SQL for boys, too.

{O,o}
John Hardin
2014-09-04 18:18:17 UTC
Post by jdow
Post by John Hardin
Post by LuKreme
For the record, using sql for babes is considerably faster.
Is that anything like "SQL for Dummies"?
John, I was wondering if there was an SQL for boys, too.
SQL for Jocks, maybe?

I gotta wonder how LuKreme developed *that* particular finger-macro... :)
Kevin A. McGrail
2014-09-04 18:33:47 UTC
Post by John Hardin
Post by jdow
Post by John Hardin
Post by LuKreme
For the record, using sql for babes is considerably faster.
Is that anything like "SQL for Dummies"?
John, I was wondering if there was an SQL for boys, too.
SQL for Jocks, maybe?
I gotta wonder how LuKreme developed *that* particular finger-macro... :)
His new website development work to replace facebook?

Select * from babes where interested in me = 'true';
0 rows in set (0.00 sec)

Just to continue the silliness...

Regards,
KAM
Chris
2014-09-04 20:14:42 UTC
Post by jdow
Post by John Hardin
Post by LuKreme
For the record, using sql for babes is considerably faster.
Is that anything like "SQL for Dummies"?
John, I was wondering if there was an SQL for boys, too.
{O,o}
Haven't seen you on a list in, well, years. You're still as witty as
ever I see :)
--
Chris
31.11°N 97.89°W (Elev. 1092 ft)
15:12:48 up 1 day, 6:43, 1 user, load average: 0.11, 0.18, 0.18
Ubuntu 14.04 LTS, kernel 3.13.0-35-generic
Joe Quinn
2014-09-04 18:36:30 UTC
Post by John Hardin
Post by LuKreme
For the record, using sql for babes is considerably faster.
Is that anything like "SQL for Dummies"?
I've heard good things about the Derek Zoolander Center for Kids who
can't SQL Good and who Wanna Learn to do Other Stuff Good too.
LuKreme
2014-09-06 06:43:44 UTC
Post by John Hardin
Post by LuKreme
For the record, using sql for babes is considerably faster.
Is that anything like "SQL for Dummies"?
I've heard good things about the Derek Zoolander Center for Kids who can't SQL Good and who Wanna Learn to do Other Stuff Good too.
I think I've gotten more comments on that not-typo, both onlist and off, than any email in recent memory.

OS X autocorrect doesn't like the word "bayes" much. Heh.
--
'I don't see why everyone depends on me. I'm not dependable. Even I
don't depend on me, and I'm me.'
Timothy Murphy
2014-09-04 19:56:43 UTC
Post by LuKreme
Post by Timothy Murphy
Is there a simple check to make sure salearn is working?
(I get the message that "192 messages have been examined",
and ~/.spamassassin/bayes_seen and bayes_tok are pretty large,
300kB and 5MB.)
For the record, using sql for babes is considerably faster.
Do you mean using SQL in some way would speed up salearn?
Do you have a reference for that?

Actually, I run salearn as a cron job in the middle of the night,
so it doesn't matter too much to me if it takes 1 minute or 5 minutes.
Post by LuKreme
Post by Timothy Murphy
4) I haven't found a short and simple SA tutorial,
explaining how SA works,
with a few tests that one might add to the default,
and a couple of checks one could try to make sure it is working.
If you see X-Spam headers, it’s working. If in the X-Spam-Report you see
BAYES_ then that is working.
I'm not certain that SA is taking account of the result of sa-learn.
I'm surprised that the spam score does not seem to change significantly
after many instances of almost identical messages are put through sa-learn.
John Hardin
2014-09-04 20:07:02 UTC
Post by Timothy Murphy
I'm not certain that SA is taking account of the result of sa-learn.
I'm surprised that the spam score does not seem to change significantly
after many instances of almost identical messages are put through sa-learn.
(1) Do you see any BAYES_* rules hitting at all?

(2) What does /usr/bin/sa-learn --dump magic report?

(3) Did you review the spamd user vs. sa-learn user as I suggested?
LuKreme
2014-09-06 06:46:25 UTC
Post by Timothy Murphy
Post by LuKreme
Post by Timothy Murphy
Is there a simple check to make sure salearn is working?
(I get the message that "192 messages have been examined",
and ~/.spamassassin/bayes_seen and bayes_tok are pretty large,
300kB and 5MB.)
For the record, using sql for babes is considerably faster.
Do you mean using SQL in some way would speed up salearn?
More importantly, it speeds up the bayes checks on incoming spam.
--
"you'd think you could trust a horde of hungarian barbarians"
Ian Zimmerman
2014-09-07 02:17:16 UTC
Others have gracefully answered as to the substance of your message.

I'll have to be a pest and ask that you please do not use "Reply" or
"Followup" when you're starting a new topic. For list readers with user
agents that thread the standard (RFC standard) way, that breaks
threading.

The way to start a new topic is to copy the list address, do a "New
Message" or similar, and paste the address into the destination field.
You can also save the address in your contact list / address book to
avoid the copy and paste in the future.

Thanks for your cooperation.
--
Please *no* private copies of mailing list or newsgroup messages.
John Hardin
2014-09-04 14:56:53 UTC
Post by Geoff Soper
I've got an issue whereby spam messages seem to be somehow bypassing SA
and getting into my inbox.
:0fw: spamassassin.lock
* < 400000
| spamc -x
Are the messages that bypass SA always rather large?
g***@alphaworks.co.uk
2014-10-23 21:51:38 UTC
Post by John Hardin
Post by Geoff Soper
I've got an issue whereby spam messages seem to be somehow bypassing
SA and getting into my inbox.
:0fw: spamassassin.lock
* < 400000
| spamc -x
Are the messages that bypass SA always rather large?
No, unfortunately not... I understand why the large messages bypass SA
but not the small ones.

As a slightly related aside, what do people typically do about larger
messages containing virus laden zip files?

Many thanks,
Geoff
Axb
2014-10-23 21:57:25 UTC
Post by g***@alphaworks.co.uk
Post by John Hardin
Post by Geoff Soper
I've got an issue whereby spam messages seem to be somehow bypassing
SA and getting into my inbox.
:0fw: spamassassin.lock
* < 400000
| spamc -x
Are the messages that bypass SA always rather large?
No, unfortunately not... I understand why the large messages bypass SA
but not the small ones.
As a slightly related aside, what do people typically do about larger
messages containing virus laden zip files?
I do

### ClamAV
CLAMSCAN=/usr/local/bin/clamdscan

:0
{
VIRUS=`$CLAMSCAN --stdout -`

:0 Di
* VIRUS ?? FOUND
/dev/null
}
###

You can also store them for review instead of sending them to /dev/null.
John Hardin
2014-10-23 22:51:12 UTC
Post by John Hardin
Post by Geoff Soper
I've got an issue whereby spam messages seem to be somehow bypassing SA
and getting into my inbox.
:0fw: spamassassin.lock
* < 400000
| spamc -x
Are the messages that bypass SA always rather large?
No, unfortunately not... I understand why the large messages bypass SA but
not the small ones.
As a slightly related aside, what do people typically do about larger
messages containing virus laden zip files?
Personally, my site security policy is no zipped executables absent prior
arrangement.

<plug type="shameless">http://www.impsec.org/email-tools/procmail-security.html</plug>
g***@alphaworks.co.uk
2014-10-23 21:47:53 UTC
Post by Kevin A. McGrail
Using procmail without MTA glue is OK for many uses. I am wondering how many spamd connections you allow and if you have checked your logs?
I also cannot remember but the uses of a lock file seem odd for something that can thread. Any one know if that is a good idea to remove?
Regards,
KAM
Hi,
apologies for the delay in my response...

I wonder if you could explain in simple terms what the lockfile achieves
in this situation? Is it even possible that it could cause messages to
bypass SA?

I have access to SA through a VPS, it's not a server I administer but
support are very helpful...

Many thanks,
Geoff
Kevin A. McGrail
2014-10-23 22:00:29 UTC
Post by g***@alphaworks.co.uk
Using procmail without MTA glue is OK for many uses. I am wondering
how many spamd connections you allow and if you have checked your logs?
I also cannot remember but the uses of a lock file seem odd for
something that can thread. Any one know if that is a good idea to
remove?
Regards,
KAM
Hi,
apologies for the delay in my response...
I wonder if you could explain in simple terms what the lockfile
achieves in this situation? Is it even possible that it could cause
messages to bypass SA?
I don't think a lockfile achieves anything because it's a call to a
program. Procmail has some weird syntax so hopefully someone with some
procmail-fu can tell us if a lock on a procmail system call does
anything. In the meantime, here's a good resource about procmail and
lockfiles:
http://www.techrepublic.com/article/all-the-wonders-of-procmail-part-2-lockfiles-and-nondelivering-recipes/

regards,
KAM
g***@alphaworks.co.uk
2014-10-23 22:06:52 UTC
Post by Kevin A. McGrail
Post by g***@alphaworks.co.uk
Using procmail without MTA glue is OK for many uses. I am wondering
how many spamd connections you allow and if you have checked your logs?
I also cannot remember but the uses of a lock file seem odd for
something that can thread. Any one know if that is a good idea to
remove?
Regards,
KAM
Hi,
apologies for the delay in my response...
I wonder if you could explain in simple terms what the lockfile
achieves in this situation? Is it even possible that it could cause
messages to bypass SA?
I don't think a lockfile achieves anything because it's a call to a
program. Procmail has some weird syntax so hopefully someone with
some procmail-fu can tell us if a lock on a procmail system call does
anything. In the meantime, here's a good resource about procmail and lockfiles:
http://www.techrepublic.com/article/all-the-wonders-of-procmail-part-2-lockfiles-and-nondelivering-recipes/
BTW, I followed the example detailed at
http://wiki.apache.org/spamassassin/UsedViaProcmail (see "How do I use
SpamAssassin with procmail?") for my implementation...