by Code Crafters » Mon Jan 21, 2008 11:51 am
Altering the subject to "<SPAM> ####subject####" (where ####subject#### is the original subject) in this content filter action will have only a very minimal effect on Bayesian training. Every word is stored as a token with good and bad counts in the Bayesian database. A word that appears more often in one class than the other may be used for Bayesian scoring, but only the strongest words, those that appear almost exclusively in SPAM or almost exclusively in non-SPAM, are actually used. For this reason it will probably never use these words anyway, and will usually rely on other parts of the header or body that are more reliably characteristic of SPAM. In short, it's fine to use this, and it won't adversely affect your Bayesian training.
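
To make the good/bad counts idea concrete, here is a rough sketch in Python. It is not the product's actual code: the class name TokenDB, the whitespace tokenizer, and the scoring formula (a simplified Graham-style ratio) are all assumptions for illustration only. It shows how each token gets a good and a bad count, and how only the tokens furthest from neutral would be picked for scoring.

```python
from collections import Counter


def tokenize(text):
    # Assumption: a naive whitespace tokenizer; real filters are more elaborate.
    return [word.lower() for word in text.split()]


class TokenDB:
    """Hypothetical token database with good and bad counts per token."""

    def __init__(self):
        self.good = Counter()  # messages containing the token, trained as non-SPAM
        self.bad = Counter()   # messages containing the token, trained as SPAM
        self.n_good = 0
        self.n_bad = 0

    def train(self, text, is_spam):
        # Count each token once per message.
        tokens = set(tokenize(text))
        if is_spam:
            self.bad.update(tokens)
            self.n_bad += 1
        else:
            self.good.update(tokens)
            self.n_good += 1

    def spamicity(self, token):
        # Simplified ratio: 1.0 = seen only in SPAM, 0.0 = seen only in non-SPAM.
        good_rate = self.good[token] / max(self.n_good, 1)
        bad_rate = self.bad[token] / max(self.n_bad, 1)
        if good_rate + bad_rate == 0:
            return 0.5  # unseen token: neutral
        return bad_rate / (good_rate + bad_rate)

    def strongest(self, text, n=3):
        # Keep the n tokens furthest from neutral (0.5); ties broken alphabetically.
        tokens = set(tokenize(text))
        return sorted(tokens, key=lambda t: (-abs(self.spamicity(t) - 0.5), t))[:n]


db = TokenDB()
db.train("cheap pills offer", is_spam=True)
db.train("meeting agenda offer", is_spam=False)

print(db.spamicity("offer"))                    # appears equally in both classes
print(db.strongest("cheap offer meeting", 2))   # "offer" is neutral, so it is dropped
```

In this toy example "offer" appears equally in both classes, so it scores as neutral and is not selected, while "cheap" and "meeting" each appear in only one class and are.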