SA skipping URI processing

Discussion:

Ken Bass

2014-10-15 20:49:32 UTC

I'm using Centos 7, which means SA version 3.3.2.

I am encountering several emails that are not being processed correctly
when checking against URI rules.

1) My local.cf has a rule to address the new .link domain which spammers
appear to be using recently:

uri LR_LINK_TLD /^(?:https?:\/\/|mailto:)[^\/]+\.link(?:\/|$)/i
describe LR_LINK_TLD Contains a URL in the LINK top-level domain
score LR_LINK_TLD 3.0

2) The URIDNSBL rules are not being executed for these emails either.

SA's debug output shows an empty list of domains to query. Huh?
Oct 15 16:24:55.416 [15519] dbg: uridnsbl: domains to query:

Here is the pastebin link to the full spam email:

http://pastebin.com/RJWyGkKB
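
As a sanity check outside SpamAssassin, the LR_LINK_TLD pattern from the rule above can be exercised on its own. This is a minimal Python sketch, assuming Python's re module treats this particular pattern the same way Perl does; the sample URIs are made up for illustration and are not taken from the pastebin message. It only checks the regex in isolation, so it says nothing about whether SA ever hands a given URI to the rule.

```python
import re

# Same pattern as the LR_LINK_TLD rule; the Perl /i flag becomes re.IGNORECASE.
LR_LINK_TLD = re.compile(r"^(?:https?://|mailto:)[^/]+\.link(?:/|$)", re.IGNORECASE)


def hits_link_tld(uri):
    """Return True if the URI string matches the LR_LINK_TLD pattern."""
    return LR_LINK_TLD.search(uri) is not None


# Made-up sample URIs for illustration only.
samples = [
    "http://example.link/",
    "HTTPS://EXAMPLE.LINK",
    "mailto:user@example.link",
    "http://example.com/foo.link",
    "http://example.linkedin.com/",
]

for uri in samples:
    print(uri, hits_link_tld(uri))
```

If the pattern matches here but the rule does not fire in SA, the URI was probably never extracted from the message in the first place.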

Kevin A. McGrail

2014-10-15 20:52:26 UTC

Post by Ken Bass
I'm using Centos 7, which means SA version 3.3.2.
I am encountering several emails that are not being processed
correctly when checking against URI rules.
1) My local.cf has a rule to address the new .link domain which
uri LR_LINK_TLD /^(?:https?:\/\/|mailto:)[^\/]+\.link(?:\/|$)/i
describe LR_LINK_TLD Contains a URL in the LINK top-level domain
score LR_LINK_TLD 3.0
2) The URIDNSBL rules are not being executed for these emails either.
SA's debug output shows an empty list of domains to query. Huh?
http://pastebin.com/RJWyGkKB

The TLDs are hardcoded in SA 3.3.2. We are working on not having them
hard-coded in 3.4.1.

I believe someone made a patch suitable for 3.3.2 but I can't find it at
the moment.

regards,
KAM

Ken Bass

2014-10-15 21:01:57 UTC

Post by Kevin A. McGrail

Post by Ken Bass
1) My local.cf has a rule to address the new .link domain which
uri LR_LINK_TLD /^(?:https?:\/\/|mailto:)[^\/]+\.link(?:\/|$)/i
describe LR_LINK_TLD Contains a URL in the LINK top-level domain
score LR_LINK_TLD 3.0
2) The URIDNSBL rules are not being executed for these emails either.
SA's debug output shows an empty list of domains to query. Huh?
http://pastebin.com/RJWyGkKB

The TLDs are hardcoded in SA 3.3.2. We are working on not having
them hard-coded in 3.4.1.
I believe someone made a patch suitable for 3.3.2 but I can't find it
at the moment.

Sorry, but I think you might be confusing this with some specific
TLD-related rule issue rather than the more generic custom uri rules and
uridnsbl rules that I am using, because these work fine on OTHER emails.
Something in specific emails, like the one in the above pastebin, is
causing the issue. I've got lots of other emails that hit the above
LR_LINK_TLD and/or URIBL_DBL_SPAM.

Martin Gregorie

2014-10-15 22:12:12 UTC

Post by Ken Bass

Post by Kevin A. McGrail

The TLDs are hardcoded in SA 3.3.2. We are working on not having
them hard-coded in 3.4.1.
I believe someone made a patch suitable for 3.3.2 but I can't find it
at the moment.

I'm certain KAM is right and here's why.

I recently wrote a set of three experimental rules to detect *.link
in body text, Received headers and From headers, and set up some
test messages since I've yet to see any .link TLDs. The body text rule
was, of course, a URI rule. It didn't work, though the other two rules,
which used ordinary regexes with \.link as part of the expression,
worked as expected. Eventually, as a debugging aid, I changed the rules
and the test messages to search for \.com and all three rules worked as
expected.

IOW, uri rules depend on matching the terminal part of the domain name
with an entry in SA's built-in TLD list and my version, installed from
the Fedora repo, doesn't yet include .link.

I reverted my rules and test messages to test for the .link TLD and am
now waiting for a TLD list that contains .link to percolate through the
Fedora update process.
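
The behaviour described above can be illustrated with a small sketch. This is Python, not SA's actual Perl code, and everything in it is an assumption made for illustration: the tiny KNOWN_TLDS set, the helper names and the candidate regex are all hypothetical. It only shows how a plain-text URI extractor that checks the last label of the host against a fixed list would silently drop a .link URI before any uri rule gets to see it.

```python
import re

# Hypothetical, deliberately tiny stand-in for a hard-coded TLD list.
KNOWN_TLDS = {"com", "net", "org"}

# Rough candidate pattern for URIs in plain text (an assumption, not SA's regex).
URI_CANDIDATE = re.compile(r"\b(?:https?://|mailto:)[^\s<>\"']+", re.IGNORECASE)


def host_of(uri):
    """Return the lower-cased host part of an http(s) or mailto URI."""
    rest = re.sub(r"^(?:https?://|mailto:)", "", uri, flags=re.IGNORECASE)
    host = re.split(r"[/?#]", rest, maxsplit=1)[0]
    host = host.rsplit("@", 1)[-1]   # drop any user part
    host = host.split(":", 1)[0]     # drop any port
    return host.lower().rstrip(".")


def plain_text_uris(text, known_tlds=KNOWN_TLDS):
    """Keep only candidate URIs whose host ends in a TLD from the list."""
    found = []
    for match in URI_CANDIDATE.finditer(text):
        uri = match.group(0).rstrip(".,;)")
        tld = host_of(uri).rsplit(".", 1)[-1]
        if tld in known_tlds:
            found.append(uri)
    return found


body = "see http://spam.link/x and http://ok.com/y"
print(plain_text_uris(body))                         # .link URI is dropped
print(plain_text_uris(body, KNOWN_TLDS | {"link"}))  # kept once the list knows .link
```

With a list like this, no uri rule or URIDNSBL lookup can fire on the dropped URI, however correct the rule's regex is.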

HTH
Martin

Ken Bass

2014-10-15 22:20:46 UTC

Post by Martin Gregorie
I'm certain KAM is right and here's why.

...snip...

Post by Martin Gregorie
IOW, uri rules depend on matching the terminal part of the domain name
with an entry in SA's built-in TLD list and my version, installed from
the Fedora repo, doesn't yet include .link.
I reverted my rules and test messages to test for the .link TLD and am
now waiting for a TLD list that contains .link to percolate through the
Fedora update process.

I think my confusion is that for many spam messages, the uri rule is
working fine for the .link domain.
After looking at some different spam emails, I think the difference is
that if the .link is inside an 'HTML' spam, the url processing works. If
it is a normal text spam email, the url processing does not work. That
has been the source of my confusion and why I was thinking KAM was
referring to a different issue.

So I am thinking that the HTML decoding part of SA doesn't use that
built-in TLD list, but the text email processing does. That is the only
way I can explain what I am seeing.
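
To make that guess concrete, here is a rough Python sketch of what an HTML path without any TLD check might look like. It is purely an illustration of the hypothesis, not SA's real code; the HrefCollector class and html_uris function are hypothetical names. The point is that a URI taken straight from an href attribute never has to pass a TLD list, so a .link URL would come through untouched.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect href values from <a> tags without validating the host."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def html_uris(html):
    """Return every href found in the HTML, with no TLD filtering at all."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


print(html_uris('<p><a href="http://spam.link/x">click</a></p>'))
```

If SA behaves anything like this for HTML parts, it would explain why the same .link URL hits in an HTML spam and is missed in a plain-text one.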

Kevin A. McGrail

2014-10-15 22:50:59 UTC

Post by Ken Bass

Post by Martin Gregorie
I'm certain KAM is right and here's why.

...snip...

I think my confusion is that for many spam messages, the uri rule is
working fine for the .link domain.
After looking at some different spam emails, I think the difference is
that if the .link is inside an 'HTML' spam, the url processing works.
If it is a normal text spam email, the url processing does not work.
That has been the source of my confusion and why I was thinking KAM
was referring to a different issue.
So I am thinking that the HTML decoding part of SA doesn't use that
built-in TLD list, but the text email processing does. That is the
only way I can explain what I am seeing.

I'd have to dig into it to find out more but there are different modules
used for different tests so deviation in behavior is not something that
alarms me. If you replace your RegistrarBoundaries.pm and it still has
issues, please let us know. I am 99.9% sure I'm right.

regards,
KAM

Ken Bass

2014-10-15 23:33:11 UTC

Post by Kevin A. McGrail
I'd have to dig into it to find out more but there are different
modules used for different tests so deviation in behavior is not
something that alarms me. If you replace your RegistrarBoundaries.pm
and it still has issues, please let us know. I am 99.9% sure I'm right.
regards,
KAM

Thanks -- My apologies for doubting you. Kind of scary that there is a
loophole that will grow each time a new TLD is introduced. For now, I'll
just block the .link domain at the SMTP level.

Kevin A. McGrail

2014-10-16 00:20:01 UTC

Post by Ken Bass

Thanks -- My apologies for doubting you. Kind of scary that there is
a loophole that will grow each time a new TLD is introduced. For now,
I'll just block the .link domain at the SMTP level.

I'm an engineer, so doubt is a good thing. Trust but verify ;-)

But yes, we know the TLD issue is a growing pain point and we have some
thoughts in progress to resolve it.

Martin Gregorie

2014-10-15 22:54:30 UTC

Post by Ken Bass

Post by Martin Gregorie
I'm certain KAM is right and here's why.

...snip...

I think my confusion is that for many spam messages, the uri rule is
working fine for the .link domain.
After looking at some different spam emails, I think the difference is
that if the .link is inside an 'HTML' spam, the url processing works. If
it is a normal text spam email, the url processing does not work. That
has been the source of my confusion and why I was thinking KAM was
referring to a different issue.
So I am thinking that the HTML decoding part of SA doesn't use that
built-in TLD list, but the text email processing does. That is the only
way I can explain what I am seeing.

That's quite possible. My test messages are all plaintext or have the
uris in plaintext MIME parts.

Martin

Ken Bass

2014-10-15 21:49:01 UTC

Post by Kevin A. McGrail
The TLDs are hardcoded in SA 3.3.2. We are working on not having
them hard-coded in 3.4.1.

I found Bug 6782, which I think you are referring to. I don't quite
understand the details of it. But are you saying that the 'uri' and
uridnsbl rules rely on those functions? If so, I am confused, because I
have many spam emails with the '.link' domain that are being tagged
properly.