tildes and underscore replacing spaces in spam gets by Spamassassin
Jeffrey L. Premer
2003-04-17 05:55:28 UTC
Hi all. I have been receiving more and more mail like the one
below, which gets past SpamAssassin with no problem. Any suggestions?
Thanks, Jeff

begin
message____________________________________________________________
From: <***@cs.com>
Date: Thu Apr 17, 2003 13:01:34 Asia/Taipei
To: <***@889.net>
Subject: jeff~The~Impact~of~War~on~Investing???_tip_#45
Return-Path: <***@cs.com>
Received: from cs.com (adsl-67-118-0-72.dsl.sntc01.pacbell.net
[67.118.0.72]) by 889.net (8.12.8p1/8.12.8) with SMTP id h3H5CJEt021887
for <***@889.net>; Thu, 17 Apr 2003 01:12:20 -0400
Message-Id: <91a4aa887808$049a9354$***@wxtbnoxkt.sp>
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
X-Priority: 3
X-Mimeole: Produced By Microsoft MimeOLE V5.00.2919.6700
X-Msmail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.00.2919.6700
Content-Transfer-Encoding: 8bit
X-Mime-Autoconverted: from base64 to 8bit by 889.net id h3H5CJEt021887
X-Spam-Status: No, hits=1.2 required=5.0
tests=SUBJ_HAS_Q_MARK,NO_REAL_NAME version=2.31
X-Spam-Level: *

Great to meet you, jeff

jeff_Do_You_Know_The_Impact_of_War_on_Global_Financial_Markets?

jeff_Are_Your_Investments_Safe?

http://www.wealthideas.com.ar/invest/war_impact.html

jeff_You_can_not_afford_to_ignore_the_vital_information_contained_within
_this_special_report.

http://www.workingathome.com.ar/invest/war_impact.html










If you would rather say goodbye, please visit:
http://www.workingathome.com.ar/invest/cleanlist.html



-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
Chris Santerre
2003-04-17 13:25:12 UTC
-----Original Message-----
Great to meet you, jeff
jeff_Do_You_Know_The_Impact_of_War_on_Global_Financial_Markets?
jeff_Are_Your_Investments_Safe?
http://www.wealthideas.com.ar/invest/war_impact.html
jeff_You_can_not_afford_to_ignore_the_vital_information_contained_within
_this_special_report.
OUCH. Yeah, I'm getting shorter and simpler spam as well. Haven't seen this
yet. One of the reasons I wanted to have procmail setups for each user was
just for such a case. If SA doesn't implement a new rule, I think I could
pass this through a filter to replace all the underscores with spaces before
going to SA.

I have a feeling that legit uses of the underscore in an email are very
small.
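
A minimal sketch of the pre-filter idea above, assuming it is applied only to a copy of the body text that gets scanned, not to the delivered mail. The name `despace` and the choice to skip URLs and leave leading/trailing underscores alone are illustrative assumptions, not anything SpamAssassin provides:

```python
import re

# One or more underscores/tildes sitting directly between two
# letters or digits, e.g. "Impact_of_War" or "The~Impact".
# [^\W_] means "a word character that is not an underscore", so
# separator lines (_____) and _emphasis_ markers are not matched.
JOINER = re.compile(r"(?<=[^\W_])[_~]+(?=[^\W_])")


def despace(text: str) -> str:
    """Return text with word-joining underscores/tildes turned into spaces.

    Tokens that look like URLs are passed through untouched so that
    URL-based rules still see the real link.
    """
    # The capturing group keeps the original whitespace in the result.
    parts = re.split(r"(\s+)", text)
    return "".join(p if "://" in p else JOINER.sub(" ", p) for p in parts)
```

For example, `despace("jeff_Are_Your_Investments_Safe?")` gives `"jeff Are Your Investments Safe?"`. Code identifiers written with underscores would be split up too, so this only makes sense on a throwaway copy of the message.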

Chris


Jeffrey L. Premer
2003-04-17 13:35:29 UTC
Filtering may be a good method, but the problem is that
tildes, underscores and dashes all have legitimate uses in email formatting.
I was thinking of a ruleset that looks for strings joined by underscores,
tildes or dashes with no spaces, and either rejects outright or adds points
to email that does not pass. It is easy to code, but the ruleset may be a bit
tougher to get right without being too restrictive. How many
tildes/underscores per how long a run of characters is or is not acceptable?
What characters are not acceptable in this context? What about entire
lines of underscores and dashes and mixes thereof (people do that to
separate content in email)?
Jeff
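
One way to make those questions concrete is a rough sketch like the following. The function names and the `min_words=4` threshold are guesses for illustration, not tuned values: it counts whitespace-free tokens made of several words glued together by underscores, tildes or dashes, and ignores pure separator lines.

```python
import re


def looks_obfuscated(token: str, min_words: int = 4) -> bool:
    """True if token is min_words or more words glued by _ ~ or -.

    URLs are skipped, and a token made only of separators
    (a divider line) contains no words, so it never matches.
    """
    if "://" in token:
        return False
    # Split on runs of separators and drop the empty pieces.
    pieces = [p for p in re.split(r"[_~\-]+", token) if p]
    # Only count pieces that contain at least one letter.
    words = [p for p in pieces if re.search(r"[A-Za-z]", p)]
    return len(words) >= min_words


def count_obfuscated(text: str, min_words: int = 4) -> int:
    """Number of whitespace-separated tokens that look obfuscated."""
    return sum(looks_obfuscated(t, min_words) for t in text.split())
```

With these defaults a divider line, a hyphenated name or a single `_emphasized_` word is left alone, but a long underscore-separated identifier in pasted code would still be counted, which is an argument for adding points rather than rejecting outright.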
Chris Devers
2003-04-17 14:55:22 UTC
Post by Chris Santerre
I have a feeling that legit uses of the underscore in an email are very
small.
*shrug*

I use them, but only _sometimes_, as a way to place _emphasis_.

'Course you could place emphasis in HTML, but then in my opinion HTML mail
is mail pollution, and should have been banned by the Kyoto accords.

Not that anybody asked me :)

Also,

if sending_code($for_example)
$it_may_make_sense_to_use = "underscores";

but it's not something that I for one have to use every day :)
--
Chris Devers ***@boston.com

portable, adj.
(Of a program) able to CRASH any OS on any PLATFORM. Compare MACHINE-
INDEPENDENT; VENDOR-INDEPENDENT. See also C; UNIX.

-- from _The Computer Contradictionary_, Stan Kelly-Bootle, 1995


Tony Earnshaw
2003-04-17 15:04:32 UTC
Post by Jeffrey L. Premer
i have been receiving more and more mail such as the one
below, which is getting by spamassassin no problem. Any suggestions?
thanks jeff
Dunno if it'll encourage you or not (I suspect not), but I cut'n'pasted
your message into a vi'ed file jeff.txt in the spamd user's temp
directory and did 'spamassassin -D -L -t < jeff.txt' on it;

in 2.60-CVS :->

Content analysis details: (9.60 points, 6 required)
RCVD_FAKE_HELO_DOTCOM (3.7 points) Received contains a faked HELO hostname
NO_REAL_NAME (0.9 points) From: does not include a real name
INVALID_DATE (0.6 points) Invalid Date: header (not RFC 2822)
MSGID_OUTLOOK_TIME (4.4 points) Message-Id is fake (in Outlook Express format)

then:

sa-learn --spam --file jeff.txt

then the first test again (I've marked up the BAYES_90 points):

Content analysis details: (13.70 points, 6 required)
RCVD_FAKE_HELO_DOTCOM (3.7 points) Received contains a faked HELO hostname
NO_REAL_NAME (0.9 points) From: does not include a real name
INVALID_DATE (0.6 points) Invalid Date: header (not RFC 2822)
BAYES_90 (4.1 points) BODY: Bayesian classifier says spam probability is 90 to 99%
[score: 0.9702]
MSGID_OUTLOOK_TIME (4.4 points) Message-Id is fake (in Outlook Express format)

Bingo.
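
As a quick check of the arithmetic in the two reports, here is a small sketch that adds up the per-rule points exactly as listed above and compares them with the required score of 6. The helper names are made up for illustration, and treating "total at or above the required score" as spam is an assumption of this sketch:

```python
# Per-rule points copied from the first 2.60-CVS report above.
first_pass = {
    "RCVD_FAKE_HELO_DOTCOM": 3.7,
    "NO_REAL_NAME": 0.9,
    "INVALID_DATE": 0.6,
    "MSGID_OUTLOOK_TIME": 4.4,
}

# The second report is the same hits plus the Bayes rule.
second_pass = {**first_pass, "BAYES_90": 4.1}

REQUIRED = 6.0  # "6 required" in both reports


def total(hits: dict) -> float:
    """Sum of rule points, rounded to hide float noise."""
    return round(sum(hits.values()), 2)


def is_spam(hits: dict, required: float = REQUIRED) -> bool:
    """Assumed decision: spam once the total reaches the required score."""
    return total(hits) >= required


print(total(first_pass), is_spam(first_pass))    # 9.6 True
print(total(second_pass), is_spam(second_pass))  # 13.7 True
```

The first pass is already over the threshold without Bayes; the learned message only adds the 4.1 points from BAYES_90 on top.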

Best,

Tony
--
Tony Earnshaw

Do not come to visit me with both arms the same length.

e-post: ***@billy.demon.nl
www: http://www.billy.demon.nl



Chris Santerre
2003-04-17 16:05:18 UTC
Post by Chris Devers
Post by Chris Santerre
I have a feeling that legit uses of the underscore in an email are very
small.
*shrug*
I use them, but only _sometimes_, as a way to place _emphasis_.
'Course you could place emphasis in HTML, but then in my opinion HTML mail
is mail pollution, and should have been banned by the Kyoto accords.
Here it wouldn't hurt anything to replace with spaces..
Post by Chris Devers
Also,
if sending_code($for_example)
$it_may_make_sense_to_use = "underscores";
I am an idiot :)
Yeah, this list uses them for talking about rules. Then again one can
whitelist the list. Or make a rule to offset for list using them. The 2.60
example, 1st pass, seems to be only from increased scores. The second pass
shows the coolness of bayes. Which I still am not using :)



Chris Devers
2003-04-17 16:02:39 UTC
Post by Chris Santerre
Post by Chris Devers
Post by Chris Santerre
I have a feeling that legit uses of the underscore
in an email are very small.
*shrug*
I use them, but only _sometimes_, as a way to place _emphasis_.
Here it wouldn't hurt anything to replace with spaces..
Hmm...

"I use them, but only sometimes , as a way to place emphasis ."

Doesn't quite have the same effect, I think. Just looks like funny
spacing, rather than conveying any subtlety of meaning.
Post by Chris Santerre
Post by Chris Devers
Also,
if sending_code($for_example)
$it_may_make_sense_to_use = "underscores";
I am an idiot :)
Nah. :)

I just think that, in my own use of email, enough odd characters pop up
that I wouldn't want them *banned*. Maybe /scored/ a little higher, but
not enough to say _this *is* spam_. Ya know?

Maybe the rule[s] for gappy text can be extended to count examples like
these. It should be possible to use symbols without drawing a severe spam
penalty, but I'd be willing to spend a small number of points on ham if it
would boost the score for obfuscating use of these characters in real spam.
Hopefully it won't lead to very many false positives...
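
To illustrate "scored a little higher, not banned", here is a hedged sketch of a capped contribution. The function name and the numbers (0.5 points per suspicious token, at most 2.0 in total) are invented for the example and are not SpamAssassin scores:

```python
def obfuscation_points(n_tokens: int,
                       per_token: float = 0.5,
                       cap: float = 2.0) -> float:
    """Points to add for n_tokens suspicious tokens, never more than cap.

    Keeping the cap well under the required score means this rule
    alone cannot push a message over the line; it can only tip
    mail that other rules already find suspicious.
    """
    return min(cap, per_token * max(0, n_tokens))
```

With a required score of 5.0, ham containing one odd token would pick up 0.5 points, while a message full of underscore-joined sentences would hit the 2.0 cap and still need other rules to fire before being tagged.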
--
Chris Devers cdevers+***@boston.com

debugger, n. (Anglo-Irish)
The person responsible for errors in a program; the person who sold us
our system.

-- from _The Computer Contradictionary_, Stan Kelly-Bootle, 1995

