How to facilitate SA-LEARN...

Discussion:

Ken Morley

2004-04-05 13:38:36 UTC

I've successfully set up Sendmail, MIMEDefang, ClamAV and SpamAssassin on a
Red Hat Linux 8.0 box front-ending our Exchange server and it's all working
well.

I would like to train SA by hand-sorting spam/ham and using the sa-learn
command. The problem is that we use Outlook 2000 as our MUA. There doesn't
appear to be a convenient way to export a mail message as a text file that I
could copy over to a directory on the Linux server.

Could I set up Ham / Spam mailboxes on the Linux box and forward messages
there?

Any suggestions on how best to accommodate this?

Thanks!

Matt Kettler

2004-04-05 13:56:34 UTC

Post by Ken Morley
I've successfully set up Sendmail, MIMEDefang, ClamAV and SpamAssassin on a
Red Hat Linux 8.0 box front-ending our Exchange server and it's all working
well.
I would like to train SA by hand-sorting spam/ham and using the sa-learn
command. The problem is that we use Outlook 2000 as our MUA. There doesn't
appear to be a convenient way to export a mail message as a text file that I
could copy over to a directory on the Linux server.
Could I set up Ham / Spam mailboxes on the Linux box and forward messages
there?

Forward? Definitely not. A forwarded message is an entirely new message,
with new headers. Since sa-learn does tokenize message headers, you'd be
training Bayes to think that all forwarded email is spam.

However, according to the SA wiki's FAQ, Outlook's "redirect" option may
work. As long as the message gets there with the headers more-or-less the
same as when they came in (an extra Received: or two is fine), then it
should work.

http://wiki.spamassassin.org
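
One rough way to check whether a redirected copy really kept its headers is to
compare it against the message as it first arrived. The Python sketch below is
only an illustration: the function name changed_headers and the choice to
ignore only Received: lines are assumptions made here, not anything taken from
SpamAssassin or its wiki.

```python
import email


def changed_headers(original_bytes, received_bytes, ignore=("received",)):
    """Return the lowercase names of headers that differ between two messages.

    Header names listed in `ignore` are skipped, since an extra Received:
    line or two is expected. This is a hypothetical helper for illustration.
    """
    orig = email.message_from_bytes(original_bytes)
    recv = email.message_from_bytes(received_bytes)
    names = {n.lower() for n in orig.keys()} | {n.lower() for n in recv.keys()}
    changed = []
    for name in sorted(names):
        if name in ignore:
            continue
        # get_all() matches header names case-insensitively.
        if orig.get_all(name, []) != recv.get_all(name, []):
            changed.append(name)
    return changed
```

An empty list suggests the copy is safe to learn from. A mail client's
redirect may add headers of its own, so the ignore tuple may need extending
after looking at a real sample.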

Another approach is to set up spamtraps and "hamtraps". Spamtraps are
unused system accounts, nonexistent addresses used in dictionary attacks,
etc., that will only get spam. I find a LOT of spam going to mail@, sync@,
gopher@, apache@ and adm@, presumably because most of these are default
system accounts on many *nix boxes.

Hamtraps are addresses that I set up and subscribe to non-spam newsletters,
mailing lists, etc. which are typical of the kinds of things my users
subscribe to. It's not a comprehensive sample of nonspam, but you can pick
up a decent amount of nonspam in a more-or-less automated way. I prefer to
use a separate alias for each list; that way, if one somehow gets leaked to
a spammer, I just cut off that alias and the others keep going.

Post by Ken Morley
Any suggestions on how best to accommodate this?
Thanks!

Nick Gilbert

2004-04-05 15:02:17 UTC

Post by Matt Kettler

Post by Ken Morley
Could I set up Ham / Spam mailboxes on the Linux box and forward messages
there?

If you forward the messages as attachments (i.e. drag the hams/spams onto
a new message) then all the headers are kept intact. I'm not sure if you
need to do anything else to the messages after they arrive in your
spam/ham accounts or if sa-learn automatically realises what you're
trying to do and only learns the attached e-mails rather than the
outer message itself.

I'd like to set this up myself so any pointers for a Qmail box would be
very useful.

Nick..

Theo Van Dinter

2004-04-05 15:05:48 UTC

Post by Nick Gilbert
spam/ham accounts or if sa-learn automatically realises what you're
trying to do and only learns the attached e-mails rather than the
outer message itself.

sa-learn has no way to tell whether you're trying to learn a message that's
attached or the whole message, so you'd need to unencapsulate the message
you want to learn first.
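
As a rough illustration of what "unencapsulate" could look like, here is a
minimal Python sketch using the standard email package. It assumes the mail
client attached the original as a message/rfc822 part; the function name
extract_attached_messages is made up for this example and is not part of
SpamAssassin.

```python
import email
from email import policy


def extract_attached_messages(raw_bytes):
    """Return the raw bytes of every message/rfc822 part in a wrapper message.

    Hypothetical helper: it assumes the MUA forwarded the original mail as a
    message/rfc822 attachment. Messages nested inside an attached message
    would be returned as well, because walk() descends into them.
    """
    wrapper = email.message_from_bytes(raw_bytes, policy=policy.default)
    extracted = []
    for part in wrapper.walk():
        if part.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a list holding one message.
            inner = part.get_payload(0)
            extracted.append(inner.as_bytes())
    return extracted
```

Each returned item could then be written to its own file in a spam or ham
directory and handed to sa-learn, which sees only the original headers and
body rather than the forwarding wrapper.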

--
Randomly Generated Tagline:
I've always maintained a cordial dislike for indent, because it's usually
right.
-- Larry Wall in <***@wall.org>

Matt Kettler

2004-04-05 15:23:22 UTC

Post by Theo Van Dinter
sa-learn has no way to tell whether you're trying to learn a message that's
attached or the whole message, so you'd need to unencapsulate the message
you want to learn first.

Aw, come on Theo, you guys haven't included Mime::Parser_Psychic support in
SA yet?

More seriously, it might be useful to have an option to sa-learn to tell it
to learn an attachment instead of a message. If nothing else, it would make
it much easier to answer the "how do I let my users forward mail?" question
in the FAQ.

Would that be brutally ugly in the sa-learn code, and thus not worth the
effort, or is it simple enough to be worth opening an enhancement bug in
bugzilla for?

Theo Van Dinter

2004-04-05 15:46:28 UTC

Post by Matt Kettler
Aw, come on Theo, you guys haven't included Mime::Parser_Psychic support in
SA yet?

Heh. People think my tagline choosing script is psychic. It usually
picks a tagline related to what I really feel, but am not going to say. ;)

(it's just "fortune" BTW ...)

If it were psychic though, I could link it in against sa-learn. ;)

Post by Matt Kettler
Would that be brutally ugly in the sa-learn code, and thus not worth the
effort, or is it simple enough to be worth opening an enhancement bug in
bugzilla for?

I personally don't want to add that functionality into sa-learn proper.
I'd rather see a set of contrib scripts or something to (un)encapsulate
the mails for easy/standardized reporting.

--
Randomly Generated Tagline:
LILO, you've got me on my knees!
(from David Black, ***@pilot.njin.net, with apologies to Derek and the
Dominos, and Werner Almsberger)

Justin Mason

2004-04-05 18:16:51 UTC

Post by Matt Kettler

Aw, come on Theo, you guys haven't included Mime::Parser_Psychic support in
SA yet?
More seriously, it might be useful to have an option to sa-learn to tell it
to learn an attachment instead of a message. If nothing else, it would make
it much easier to answer the "how do I let my users forward mail?" question
in the FAQ.
Would that be brutally ugly in the sa-learn code, and thus not worth the
effort, or is it simple enough to be worth opening an enhancement bug in
bugzilla for?

That's an interesting idea -- definitely open the bug for discussion.
Issues I can see:

1. how hard is it to extract that inside SpamAssassin?

2. how many MUAs can competently forward as rfc-822 attachment?

--j.

Matt Kettler

2004-04-05 18:38:42 UTC

Post by Justin Mason
That's an interesting idea -- definitely open the bug for discussion.
1. how hard is it to extract that inside SpamAssassin?

Justin, I think Theo had the right idea of wanting it done as contrib
scripts, rather than implemented in sa-learn proper.

I think in the long run separate scripts would likely be more maintainable
anyway, and just as useful.

Post by Justin Mason
2. how many MUAs can competently forward as rfc-822 attachment?

That I'm not sure of. Eudora appears not to be able to do it, but
GroupWise does (it creates a Content-Type: message/rfc822 part).

Shaun T. Erickson

2004-04-05 15:44:51 UTC

Post by Theo Van Dinter

Post by Nick Gilbert
spam/ham accounts or if sa-learn automatically realises what you're
trying to do and only learns the attached e-mails rather than the
outer message itself.

sa-learn has no way to tell whether you're trying to learn a message that's
attached or the whole message, so you'd need to unencapsulate the message
you want to learn first.

Does this mean I cannot train sa-learn with spam that arrives as an
attachment to the spam report? Do I have to configure SA to pass it
through unmolested, without the report?

-ste

Matt Kettler

2004-04-05 16:01:42 UTC

Post by Shaun T. Erickson
Does this mean I cannot train sa-learn with spam that arrives as an
attachment to the spam report? Do I have to configure SA to pass it
through unmolested, without the report?

If the reporting and encapsulation were done by SA itself, then sa-learn is
smart enough to recognize and remove them. sa-learn has ALWAYS accepted SA's
output as input, and this is VERY unlikely to ever change.

However, this thread was not discussing SA's own encapsulation; it was
discussing an end user forwarding an entire email as an attachment. Since
SA can't recognize the actions of end users, it can't know that it needs to
take some form of special action.

Grant Baxter

2004-04-05 17:39:59 UTC

This was an answer on the SAProxy maillist, so I hope I haven't broken
any netiquette rules (I don't use either Outlook or Mozilla, so I'm
assuming that Mozilla has the ability to import Outlook mail):

Easiest is to install Mozilla and have Mozilla convert your Outlook mail
to Mozilla's "mbox" format.

Once you have your email in "mbox" format, you can run sa-learn.exe
against it easily.

You don't need to actually switch to using Mozilla as your mailer, you
only need to do this to convert the mail.

I expect you want to continue using Outlook as your mailer so I suggest
that, as a safety measure, you make sure Mozilla doesn't download your
POP3 mail into its inbox. One way to make sure this doesn't happen
is to enter an invalid password in Mozilla's POP3 configuration.
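
For what it's worth, once the mail is in mbox format the training step can be
scripted. The Python sketch below is only an illustration: the helper names
are invented here, and it assumes sa-learn is on the PATH. The --spam, --ham
and --mbox options are sa-learn's own.

```python
import mailbox
import subprocess


def build_sa_learn_command(mbox_path, is_spam):
    """Build the sa-learn argument list for one mbox file (hypothetical helper)."""
    kind = "--spam" if is_spam else "--ham"
    return ["sa-learn", kind, "--mbox", mbox_path]


def count_messages(mbox_path):
    """Count the messages in an mbox file, as a sanity check before training."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return len(box)
    finally:
        box.close()


def train(mbox_path, is_spam):
    """Run sa-learn against the mbox; raises if sa-learn exits with an error."""
    return subprocess.run(build_sa_learn_command(mbox_path, is_spam), check=True)
```

Counting first is a cheap way to notice a conversion that produced an empty
or truncated mbox before feeding it to the Bayes database.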

Don Dudan

2004-04-05 23:54:16 UTC

Post by Grant Baxter
Easiest is to install Mozilla and have Mozilla convert your Outlook mail
to Mozilla's "mbox" format.

Just a note for Mac OS X users (there ARE some of us out here! ;-))

You can easily create an mbox directly in OS X using Apple's Mail program.
Drag any mailbox to the Desktop. It appears as <mailbox-name>.mbox.
It actually contains a few files. To see the files, rename it to
<mailbox-name>.doc. It then appears as a folder. Open it and there is the
mbox file inside.

If you use Entourage, dragging a mailbox will create the mbox file
directly (without all the other index and associated files like Mail
creates). You should be able to use sa-learn with these mboxes.

(You can create two mailboxes in your mail client, spam and ham for example,
and use the routine above to create mboxes.)

Don