SURBL implementation

by **kimkodde** » Fri Feb 29, 2008 12:06 am

SURBL differs from an RBL in that it looks into the body text for URLs and then tests those URLs.
There is (at this moment) no support for SURBL, but it can be implemented reasonably easily with an
external application. Content Filtering has no option to give an external program access to the message
body... but Virus Filtering has!

I wrote a program that takes the message body file as a parameter, reads its contents, checks the links
and returns 1 as soon as it finds the first blacklisted entry. The rest is very easy: mark it as spam or
whatever you wish to do with it.
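
A minimal sketch of such a checker, for illustration only; it is not the program described above. It assumes the standard SURBL lookup convention (query `<domain>.multi.surbl.org`; an answer in 127.0.0.0/8 means listed, no answer means not listed). The names `extract_hosts`, `is_listed`, `scan` and `main` are hypothetical, and only a leading `www.` is stripped, whereas a real checker would reduce each host to its registered domain before querying.

```python
import re
import socket

SURBL_ZONE = "multi.surbl.org"  # SURBL's combined zone

# Host part of http://, https:// and bare "www." links.
URL_HOST_RE = re.compile(r"(?:https?://|\b(?=www\.))([a-z0-9.-]+)", re.IGNORECASE)


def extract_hosts(text):
    """Return the distinct link hosts in text, lower-cased, in order of appearance."""
    hosts = []
    for host in URL_HOST_RE.findall(text):
        host = host.lower().strip(".")
        if host.startswith("www."):
            host = host[4:]  # simplification, see the note above
        if host and host not in hosts:
            hosts.append(host)
    return hosts


def is_listed(host, resolve=socket.gethostbyname):
    """True if the host resolves to 127.x.x.x under the SURBL zone."""
    try:
        answer = resolve(f"{host}.{SURBL_ZONE}")
    except (OSError, UnicodeError):  # no such name, or not a usable host name
        return False
    return answer.startswith("127.")


def scan(text, resolve=socket.gethostbyname):
    """Return 1 at the first blacklisted host, otherwise 0."""
    for host in extract_hosts(text):
        if is_listed(host, resolve):
            return 1
    return 0


def main(argv):
    """argv[1] is the path of the message body file."""
    with open(argv[1], encoding="utf-8", errors="replace") as body:
        return scan(body.read())
```

A wrapper script would finish with `sys.exit(main(sys.argv))` so that the mail server sees the 0 or 1 as the exit code. The `resolve` parameter exists only so the lookup can be replaced during testing.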

What triggered this was a flood of "casino" messages that contain reasonably normal Dutch text
and were not detected as spam by the Bayesian filter, while ISPs using SURBL were faster at marking
them as spam.

The way I currently implemented this (an external program) is not optimal for performance, but it is very effective.
Of course it would be better if it were implemented directly in Ability Mailserver. Technically this must be quite easy.

Kim

by **rob** » Fri Feb 29, 2008 12:44 pm

Thank you for the tip. I have made a recommendation that we look into SURBL for direct support (and any other similar services). We do have plans for a URL blacklist feature, and this of course would complement that very nicely.

by **kimkodde** » Fri Feb 29, 2008 11:29 pm

For more info about SURBL see http://www.surbl.org/

My program is still under development. I have added the following features:
- valid TLD check
- a 24-hour local blacklist cache to minimize DNS checks
- a hardcoded whitelist of URLs (w3.org, google.com etc.)
- a runtime whitelist cache to prevent a URI from being checked multiple times in one run
- enhanced URI check (now checks for http:// and www.)
- for invalid URLs, a copy of the message file is kept for later diagnosis (to improve the program)
- a list of non-blacklisted URLs is created to extend my hardcoded whitelist
- logfile
- support for URLs with an IP number
- hardcoded whitelist URLs now with generic TLD (google.*)
- support for cloaked, encoded URLs (like http://%6E%6F%73....)
- support for "aaaa[DOT]bbb" URL notations, which are now popping up in some mails (July 5th, 2008)
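
Two of the items above, decoding cloaked URLs and the generic-TLD whitelist, can be sketched roughly as follows. This is an assumption about how such steps might look, not the actual program: `normalize`, `is_whitelisted`, `DOT_RE` and the `WHITELIST` tuple are hypothetical names, and the pattern `google.*` would also match a host such as `google.evil.example`, so a real whitelist needs a stricter rule.

```python
import re
from fnmatch import fnmatchcase
from urllib.parse import unquote

# "[DOT]" or "(dot)", optionally surrounded by spaces, in any letter case.
DOT_RE = re.compile(r"\s*[\[(]\s*dot\s*[\])]\s*", re.IGNORECASE)

# Hypothetical whitelist; "*" stands in for the TLD.
WHITELIST = ("w3.org", "google.*")


def normalize(raw):
    """Undo percent-encoding and [DOT] cloaking, then lower-case the result."""
    text = unquote(raw)            # %6E%6F -> "no"
    text = DOT_RE.sub(".", text)   # aaaa[DOT]bbb -> aaaa.bbb
    return text.lower()


def is_whitelisted(host):
    """True if the host, or a parent domain of it, matches a whitelist pattern."""
    return any(
        fnmatchcase(host, pattern) or fnmatchcase(host, "*." + pattern)
        for pattern in WHITELIST
    )
```

Running `normalize` before the link extraction means the cloaked forms are found by the same `http://` and `www.` checks as ordinary links.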

Most bad URL triggers now seem to be badly formed URLs.
Once I strip out the debugging parts, it's ready.

"White" URLs are not cached for 24 hours, as they might be blacklisted at any moment.
I have seen new 'casino' URLs appear that were blacklisted in SURBL within an hour.
This kind of spam is especially difficult to identify because the text is reasonably normal.
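
The caching rule described here (remember blacklisted entries for 24 hours, remember clean entries only for the current run) could look something like the sketch below. It is an in-memory illustration under stated assumptions: `ListingCache`, `start_run` and the injected `lookup` and `now` callables are hypothetical, and a program started once per message would need to persist the blacklist entries to disk, which is omitted.

```python
import time

CACHE_TTL = 24 * 60 * 60  # blacklisted entries are kept for 24 hours


class ListingCache:
    """Caches positive SURBL results for 24 hours and negative ones per run."""

    def __init__(self, lookup, now=time.time):
        self._lookup = lookup          # callable: host -> True if blacklisted
        self._now = now                # clock, replaceable for testing
        self._listed = {}              # host -> expiry timestamp
        self._clean_this_run = set()   # hosts found clean during this run

    def start_run(self):
        """Forget clean hosts; they may have been blacklisted in the meantime."""
        self._clean_this_run.clear()

    def is_listed(self, host):
        expires = self._listed.get(host)
        if expires is not None and expires > self._now():
            return True                # still within the 24-hour window
        if host in self._clean_this_run:
            return False               # already checked in this run
        if self._lookup(host):
            self._listed[host] = self._now() + CACHE_TTL
            return True
        self._clean_this_run.add(host)
        return False
```

The asymmetry is deliberate: a stale "listed" answer costs little, while a stale "clean" answer would let through exactly the fast-moving URLs this check is meant to catch.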

Kim

by **MikeG** » Tue Mar 04, 2008 10:32 am

I'd be very keen to see a SURBL facility. I ran a few manual tests on the emails which the normal RBLs didn't detect as spam, and SURBL correctly identified them all.
There are already a few third-party tools out there which claim to give access to SURBL from an email system that allows the execution of external applications, but none seem to be ideal.
