Bayesian learning/updating question

by **Pugglewuggle** » Fri Feb 27, 2009 12:47 am

Hi, just a quick question:

When AMS runs autolearn on a schedule, does it update tokens accordingly based on which folder a mail is currently in?

For example:

I get a real message and a spam message. Somehow, the two end up in the wrong folders (the real message is in the spam folder and the spam message is in the inbox). The autolearn process runs and catalogs the real message as spam and the spam as legit mail. I then move the two items into the correct folders at a later date. Does AMS update the mails' corresponding tokens in the Bayesian database to reflect the changes I made, thus correctly marking the spam as spam and the real message as legit, or do the tokens remain in the Bayesian database incorrectly categorized?

Thanks!

by **rob** » Tue Mar 03, 2009 1:07 pm

Once the mails have been processed, this unfortunately cannot be undone (due to the way the database is stored and processed, incrementally). However, moving the mails and then relearning them (each mail contains a flag and can be learned once as SPAM and once as normal mail) goes some way toward reducing the negative effect. So basically, what you did was the correct approach. What we often recommend, however, is that you set the Bayesian filter to only process mails older than a few days. This gives the user and yourself some time to correct any mistakes and avoid polluting the Bayesian database. Ultimately, if mistakes are commonplace and the Bayesian filter begins to lose effectiveness, you can wipe the slate clean and relearn it from scratch. It is for this reason that you should try to hold onto your SPAM mails and store them (when we clear our SPAM folder, we zip up the raw mails and put them on the backup server).
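A minimal Python sketch of the behaviour described above, assuming a simple token-count database rather than whatever AMS actually uses internally. All names here (`Message`, `TokenDB`, `autolearn`, the `"spam"` folder name, the five-day `min_age` default) are hypothetical. Counts are only ever added, which is why a past mistake cannot simply be subtracted; each message carries two flags so it is learnt at most once as SPAM and once as normal mail; and mails younger than `min_age` are skipped to leave time for corrections.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Message:
    tokens: list[str]
    folder: str                    # hypothetical folder names: "spam" or anything else
    received: datetime
    learned_as_spam: bool = False  # per-mail flag: learnt once as SPAM
    learned_as_ham: bool = False   # per-mail flag: learnt once as normal mail


@dataclass
class TokenDB:
    spam_counts: Counter = field(default_factory=Counter)
    ham_counts: Counter = field(default_factory=Counter)
    nspam: int = 0
    nham: int = 0

    def learn(self, msg: Message, as_spam: bool) -> None:
        # Incremental update: counts are only added. Nothing records which
        # message contributed which token, so an update cannot be taken back.
        target = self.spam_counts if as_spam else self.ham_counts
        target.update(set(msg.tokens))
        if as_spam:
            self.nspam += 1
        else:
            self.nham += 1


def autolearn(db: TokenDB, messages: list[Message], now: datetime,
              min_age: timedelta = timedelta(days=5)) -> int:
    """Learn each mail from its current folder; return how many were learnt."""
    learned = 0
    for msg in messages:
        if now - msg.received < min_age:
            continue  # too recent: leave time to fix misfiled mail
        if msg.folder == "spam" and not msg.learned_as_spam:
            db.learn(msg, as_spam=True)
            msg.learned_as_spam = True
            learned += 1
        elif msg.folder != "spam" and not msg.learned_as_ham:
            db.learn(msg, as_spam=False)
            msg.learned_as_ham = True
            learned += 1
    return learned


now = datetime(2009, 3, 5)
db = TokenDB()
mail = [
    Message(["cheap", "pills"], "spam", now - timedelta(days=7)),      # old enough
    Message(["meeting", "agenda"], "inbox", now - timedelta(days=1)),  # too new
]
print(autolearn(db, mail, now))  # prints 1
print(autolearn(db, mail, now))  # prints 0: already flagged as learnt as SPAM
```

Moving the first mail to the inbox and running `autolearn` again would learn it once as normal mail, while its earlier SPAM counts stay in the database.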

by **Pugglewuggle** » Wed Mar 04, 2009 4:42 am

Hi Rob,

I've actually got the system set to process mails after 5 days in case anybody is out for a while and ALSO to make sure it doesn't immediately learn and start flagging legit messages as spam.

One question: in your second sentence you say "relearn." By this do you mean delete the current token database and then learn again, or do you mean auto-relearning that occurs when an already-flagged message changes folders?

Thanks!

BTW, did you get my emails from a few weeks ago about the spam from local addresses that included example messages? If I recall, it was in relation to the previous thread "Getting SPF to work properly".

by **rob** » Thu Mar 05, 2009 9:46 am

I really meant both. A message can be learnt as both SPAM and non-SPAM, so moving an incorrectly identified mail to the appropriate folder will result in it being learnt again as the opposite type. This effectively means the mail will be learnt as both SPAM and non-SPAM, which goes some way to neutralise the negative effect of the mistake. But I was also referring to the act of resetting the database and learning from scratch (hence why you should archive SPAM and non-SPAM mails if possible).
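To see why learning a mail as both types only partly neutralises the mistake, here is a simplified Graham-style per-token score (without Graham's ham-count doubling or probability clamping). It is an assumption for illustration, not AMS's actual scoring formula, and the counts are made up. Consider a hypothetical token that appears only in one legitimate mail that was wrongly learnt as SPAM:

```python
def token_spamminess(spam_hits: int, ham_hits: int, nspam: int, nham: int) -> float:
    """Simplified Graham-style spam probability for one token (illustrative)."""
    spam_freq = spam_hits / nspam if nspam else 0.0
    ham_freq = ham_hits / nham if nham else 0.0
    if spam_freq + ham_freq == 0:
        return 0.5  # token never seen: neutral
    return spam_freq / (spam_freq + ham_freq)


# Hypothetical corpus: 100 SPAM and 100 normal mails before the mistake.
wrongly_learned = token_spamminess(1, 0, nspam=101, nham=100)    # learnt as SPAM only
after_relearn = token_spamminess(1, 1, nspam=101, nham=101)      # then also learnt as normal
learned_correctly = token_spamminess(0, 1, nspam=100, nham=101)  # what it should have been
print(wrongly_learned, after_relearn, learned_correctly)  # prints 1.0 0.5 0.0
```

After relearning, the token ends up neutral (0.5) rather than back at 0.0: the mistake is softened but not erased, which is why resetting and relearning from archived mail is the only full fix.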

Sorry, I am unsure which mails you are referring to, so if you could forward them again, I or other members of staff will be happy to respond.

by **Pugglewuggle** » Thu Mar 05, 2009 11:47 pm

Thanks for the info! How's 3 coming?

I just resent those mails to your email address.
