Spamassassin: is bayesian learning working here? [closed]


Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 2 years ago.
I am trying to train a recently installed copy of SpamAssassin, and I have the impression that Bayesian learning isn't working.
First of all: yes, spamd is running with the --allow-tell option.
Now, I have a piece of spam. I first run it through SpamAssassin and get a given score:
[paulo@myserver ~]$ spamc -R < spam6.txt
2.9/5.0
Spam detection software, running on the system "myserver",
has NOT identified this incoming email as spam. The original
message has been attached to this so you can view it or label
similar future email. If you have any questions, see
the administrator of that system for details.
Content preview: Nombre - herbertrl1 E-mail: - mu18#atsushi1010.masumi76.pushmail.fun
Asunto - Mensaje - New sexy website is available on the web http://porndreamscene.sexjanet.com/?katarina
porn star carl paula blum porn double d hamster porn video oiled porn clitoris
massage free young nubile porn [...]
Content analysis details: (2.9 points, 5.0 required)
pts rule name description
---- ---------------------- --------------------------------------------------
1.2 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
[Blocked - see <https://www.spamcop.net/bl.shtml?164.132.34.35>]
1.7 URIBL_BLACK Contains an URL listed in the URIBL blacklist
[URIs: sexjanet.com]
0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record
So I feed it to spamc using the -L option:
[paulo@myserver ~]$ spamc -L spam < spam6.txt
Message successfully un/learned
And then I try to analyze it with spamc again... and I get the exact same score:
[paulo@myserver ~]$ spamc -R < spam6.txt
2.9/5.0
Spam detection software, running on the system "myserver",
has NOT identified this incoming email as spam. The original
message has been attached to this so you can view it or label
similar future email. If you have any questions, see
the administrator of that system for details.
Content preview: Nombre - herbertrl1 E-mail: - mu18#atsushi1010.masumi76.pushmail.fun
Asunto - Mensaje - New sexy website is available on the web http://porndreamscene.sexjanet.com/?katarina
porn star carl paula blum porn double d hamster porn video oiled porn clitoris
massage free young nubile porn [...]
Content analysis details: (2.9 points, 5.0 required)
pts rule name description
---- ---------------------- --------------------------------------------------
1.2 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
[Blocked - see <https://www.spamcop.net/bl.shtml?164.132.34.35>]
1.7 URIBL_BLACK Contains an URL listed in the URIBL blacklist
[URIs: sexjanet.com]
0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record
Am I missing something?
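A quick way to tell whether Bayes took part in a verdict is to look for a BAYES_* rule (for example BAYES_50) in the `spamc -R` report; when the classifier is active, one such rule normally appears, and none appears in either run above. The Python sketch below is a hypothetical helper, not part of SpamAssassin, and it assumes the report layout shown above: a "score/threshold" first line, then rule rows after the dashed separator.

```python
import re

# Score line, e.g. "2.9/5.0" (score/threshold)
SCORE_LINE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)/(-?\d+(?:\.\d+)?)\s*$")
# Rule row, e.g. "1.7 URIBL_BLACK Contains an URL listed in the URIBL blacklist"
RULE_ROW = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s+([A-Za-z0-9_]+)\s")


def parse_report(text):
    """Return (score, threshold, {rule: points}) from `spamc -R` output."""
    lines = text.splitlines()
    head = SCORE_LINE.match(lines[0]) if lines else None
    if head is None:
        raise ValueError("first line is not 'score/threshold'")
    rules, in_table = {}, False
    for line in lines[1:]:
        if line.lstrip().startswith("----"):
            in_table = True  # rule rows follow the dashed separator line
            continue
        row = RULE_ROW.match(line) if in_table else None
        if row:
            rules[row.group(2)] = float(row.group(1))
    return float(head.group(1)), float(head.group(2)), rules


def bayes_fired(rules):
    """True if any BAYES_* rule contributed to the score."""
    return any(name.startswith("BAYES_") for name in rules)
```

The text could come from, e.g., `subprocess.run(["spamc", "-R"], stdin=..., capture_output=True, text=True).stdout`; comparing `bayes_fired()` before and after a batch of `spamc -L` calls shows whether Bayes has started contributing.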

SpamAssassin: How much learning is needed for Bayes?
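The default SpamAssassin configuration requires at least 200 spam and 200 ham messages to be learned before Bayes is used; until both minimums are reached, no BAYES_* rule fires, so learning a single message does not change the score. You can run sa-learn --dump magic to check how many messages have been passed to Bayes learning.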
man Mail::SpamAssassin::Conf (SpamAssassin version 3.1)
bayes_min_ham_num (Default: 200)
bayes_min_spam_num (Default: 200)
To be accurate, the Bayes system does not activate until a certain number of ham (non-spam) and spam have been learned. The default is 200 of each ham and spam, but you can tune these up or down with these two settings.
$ sa-learn --dump magic
[…]
0.000 0 2508 0 non-token data: nspam
0.000 0 508 0 non-token data: nham
[…]
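The threshold check can be scripted. A minimal Python sketch, assuming the column layout shown above (the third number on each "non-token data" line is the count) and the default minimums of 200; `parse_magic` and `bayes_ready` are hypothetical names. The counts are only meaningful if `sa-learn` reads the same Bayes database that spamd uses.

```python
import re

# Matches "non-token data" lines of `sa-learn --dump magic`, e.g.
# "0.000          0       2508          0  non-token data: nspam"
MAGIC_LINE = re.compile(r"^\s*[\d.]+\s+\d+\s+(\d+)\s+\d+\s+non-token data: (.+?)\s*$")


def parse_magic(text):
    """Map each non-token data name (nspam, nham, ...) to the count in the third column."""
    counts = {}
    for line in text.splitlines():
        match = MAGIC_LINE.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


def bayes_ready(counts, min_spam=200, min_ham=200):
    """Mirror bayes_min_spam_num / bayes_min_ham_num: both minimums must be reached."""
    return counts.get("nspam", 0) >= min_spam and counts.get("nham", 0) >= min_ham
```

If you have changed `bayes_min_spam_num` or `bayes_min_ham_num`, pass the same values as `min_spam` and `min_ham`.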

Related

Test RADIUS configuration method

I'm developing a product that needs to integrate with a RADIUS server as an authentication method.
When configuring the RADIUS server (IP address, port, shared secret), I would like to run a "test" to check that the configuration is valid: the server is available, it is indeed a RADIUS server, and the shared secret is correct.
I did some research on how to do it.
My options are:
Send Access-Request message with fictional user name and password to the RADIUS server
Send Status-Server message to the RADIUS server
RFC 5997 introduces the use of Status-Server packets in the RADIUS protocol.
This packet type enables clients to query the status of a RADIUS server.
Status-Server is marked as experimental, and RFC 5997 is an Informational RFC rather than a Standards-Track RFC.
My questions are:
Which RADIUS server vendors are most commonly in use? MS NPS, FreeRADIUS, others?
Do these vendors support Status-Server requests, i.e. do they implement this packet type?
If I use Access-Request, I will receive an "Access-Reject" with a failure message in the "Reply-Message" attribute. Can I tell the reason for the refusal from that text? Is there a list of error codes/messages that is part of the standard?
Thanks a lot,
Yossi Zrahia

Ad 1) Exact (or even estimated) numbers are hard to come by, but you should expect to encounter FreeRADIUS, Microsoft NPS, Radiator and maybe Cisco ACS/ISE.
Ad 2) FreeRADIUS and Radiator support it; Microsoft NPS and Cisco ACS/ISE do not. If your "test" runs only once (when the server is configured), I would use option 1, the Access-Request. If you wish to periodically check the availability and configuration of a RADIUS server, I would suggest implementing both options and making the check selectable as part of the RADIUS configuration:
IP: 1.2.3.4
Port: 1812
Shared Secret: U7tr453cur3
Servercheck: [x] Status-Server
[ ] Access-Request
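Both checks can be sketched in Python directly from the RFCs. This is a hypothetical sketch, not a vendor API: the function names, the fictional test user, port 1812 and the 3-second timeout are assumptions. It builds a Status-Server packet with the Message-Authenticator that RFC 5997 requires, builds an Access-Request with the User-Password hiding from RFC 2865, and classifies the reply by verifying its Response Authenticator. Because that authenticator is keyed with the shared secret, an authentic Access-Reject for a fictional user already shows that address, port and secret are right, while a reply that fails verification most likely means the secrets differ.

```python
import hashlib
import hmac
import os
import socket
import struct

ACCESS_REQUEST, ACCESS_ACCEPT, ACCESS_REJECT, STATUS_SERVER = 1, 2, 3, 12
USER_NAME, USER_PASSWORD, MESSAGE_AUTHENTICATOR = 1, 2, 80


def _packet(code, identifier, request_auth, attrs):
    """Header (code, identifier, length) + 16-octet authenticator + attributes."""
    return struct.pack("!BBH", code, identifier, 20 + len(attrs)) + request_auth + attrs


def build_status_server(secret, identifier):
    """Status-Server (RFC 5997) carrying the Message-Authenticator it requires."""
    request_auth = os.urandom(16)
    zeroed = struct.pack("!BB", MESSAGE_AUTHENTICATOR, 18) + bytes(16)
    packet = _packet(STATUS_SERVER, identifier, request_auth, zeroed)
    # HMAC-MD5 over the whole packet while the attribute value is still zero
    mac = hmac.new(secret, packet, hashlib.md5).digest()
    return packet[:-16] + mac


def hide_password(password, secret, request_auth):
    """User-Password hiding (RFC 2865, 5.2): 16-octet blocks XORed with chained MD5."""
    padded = (password + bytes(-len(password) % 16)) if password else bytes(16)
    hidden, previous = b"", request_auth
    for start in range(0, len(padded), 16):
        key = hashlib.md5(secret + previous).digest()
        previous = bytes(p ^ k for p, k in zip(padded[start:start + 16], key))
        hidden += previous
    return hidden


def build_access_request(user, password, secret, identifier):
    """Access-Request for a (fictional) test user."""
    request_auth = os.urandom(16)
    hidden = hide_password(password, secret, request_auth)
    attrs = (struct.pack("!BB", USER_NAME, 2 + len(user)) + user
             + struct.pack("!BB", USER_PASSWORD, 2 + len(hidden)) + hidden)
    return _packet(ACCESS_REQUEST, identifier, request_auth, attrs)


def classify(reply, request, secret):
    """Map a reply (or None for a timeout) to a coarse check result."""
    if reply is None:
        return "no-reply"  # wrong address/port, or client unknown to the server
    if len(reply) < 20 or reply[1] != request[1]:
        return "unrelated"
    length = struct.unpack("!H", reply[2:4])[0]
    # Response Authenticator = MD5(Code+ID+Length+RequestAuth+Attributes+Secret)
    expected = hashlib.md5(reply[:4] + request[4:20] + reply[20:length] + secret).digest()
    if not hmac.compare_digest(expected, reply[4:20]):
        return "bad-authenticator"  # most likely a shared-secret mismatch
    return {ACCESS_ACCEPT: "accept", ACCESS_REJECT: "reject"}.get(reply[0], "other")


def exchange(host, packet, port=1812, timeout=3.0):
    """Send one UDP datagram and return the reply, or None on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (host, port))
        try:
            return sock.recvfrom(4096)[0]
        except socket.timeout:
            return None
```

In use, pass the same request packet to `exchange()` and then to `classify()`. A server that does not implement Status-Server simply times out on that packet, which is one reason to keep the check selectable. A production client would likely add more attributes (e.g. NAS-IP-Address) than this sketch does.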
Ad 3) From RFC2865, section 5.18 (Reply-Message):
"[...] This Attribute indicates text which MAY be displayed to the user. [...] When used in an Access-Reject, it is the failure message. It MAY indicate a dialog message to prompt the user before another Access-Request attempt. [...] The Text field is one or more octets, and its contents are implementation dependent. It is intended to be human readable, and MUST NOT affect operation of the protocol. It is recommended that the message contain UTF-8 encoded 10646 [7] characters."
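There apparently are no standard messages specified. However, if the IP address or port is wrong, or the server has no shared secret configured for your client, you should not get a response at all, because RFC 2865 specifies:
"A request from a client for which the RADIUS server does not have a shared secret MUST be silently discarded."
A mismatched shared secret is different: the server cannot detect it in a plain Access-Request, so it may still reply, but the Response Authenticator in that reply will not verify on your side.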

Why do I receive a DMARC report everyday? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 5 years ago.
This post was edited and submitted for review 1 year ago and failed to reopen the post:
Original close reason(s) were not resolved
I've set up a DMARC policy on my domain.
But every day I receive an XML report from Google.
I don't understand what the problem is.
The report is:
<?xml version="1.0" encoding="UTF-8" ?>
<feedback>
<report_metadata>
<org_name>google.com</org_name>
<email>noreply-dmarc-support#google.com</email>
<extra_contact_info>http://support.google.com/a/bin/answer.py?answer=2466580</extra_contact_info>
<report_id>7241837801886321635</report_id>
<date_range>
<begin>1431388800</begin>
<end>1431475199</end>
</date_range>
</report_metadata>
<policy_published>
<domain>rigweb.ru</domain>
<adkim>r</adkim>
<aspf>r</aspf>
<p>none</p>
<sp>none</sp>
<pct>100</pct>
</policy_published>
<record>
<row>
<source_ip>144.76.154.188</source_ip>
<count>2</count>
<policy_evaluated>
<disposition>none</disposition>
<dkim>pass</dkim>
<spf>pass</spf>
</policy_evaluated>
</row>
<identifiers>
<header_from>site.ru</header_from>
</identifiers>
<auth_results>
<dkim>
<domain>rigweb.ru</domain>
<result>pass</result>
</dkim>
<spf>
<domain>site.ru</domain>
<result>pass</result>
</spf>
</auth_results>
</record>
</feedback>
My DMARC Policy:
v=DMARC1; sp=none; aspf=r; p=none; rua=mailto: support#site.ru
How can I solve the problem?

In short: it's all good.
Here's the explanation for this:
DMARC stands for Domain-based Message Authentication, Reporting, and Conformance. It lets you publish a policy for mail that uses your domain, and it lets receiving servers report back to you on how the messages they received from your domain fared in the SPF and DKIM checks.
You are receiving those XML reports because that is what you asked for with the rua=mailto:email@example.com; part of your DMARC TXT record. Note that you didn't set any processing rule for failing messages: p=none means you only want to see the results of the checks.
As you can read in the specification (RFC 7489), RUA stands for "Reporting URI of Aggregate reports". With it, you are telling every DMARC-compliant receiving server to send you a (typically daily) aggregate report on the email it receives from you or from others sending on your behalf.
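To see which tags a receiver reads from your record, here is a minimal Python sketch that splits a DMARC record into tag/value pairs. `parse_dmarc` and the helper names are hypothetical, the address is a placeholder, and the only validation is the v=DMARC1 check. When the ri tag is absent, RFC 7489 defaults the aggregate-report interval to 86400 seconds (one day), which matches the daily reports.

```python
def parse_dmarc(record):
    """Split a DMARC TXT record ("v=DMARC1; p=none; ...") into a tag -> value dict."""
    tags = {}
    for item in record.split(";"):
        key, sep, value = item.partition("=")
        if sep:
            tags[key.strip().lower()] = value.strip()
    if tags.get("v") != "DMARC1":
        raise ValueError("not a DMARC record (missing v=DMARC1)")
    return tags


def aggregate_report_targets(tags):
    """rua holds a comma-separated list of URIs that receive aggregate reports."""
    return [uri.strip() for uri in tags.get("rua", "").split(",") if uri.strip()]


def report_interval_seconds(tags):
    """ri is the requested aggregate-report interval; RFC 7489 defaults it to 86400."""
    return int(tags.get("ri", "86400"))
```

Removing the rua tag (and leaving ruf unset) is what stops the reports; p=none by itself only controls how failing mail is treated.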
Have a look at dmarc.org where you can find a nice overview of the system.
What does this specific report from Google tell you?
<date_range> The time range this report covers (in your case, May 12, 2015, 00:00:00 to 23:59:59 UTC)
<policy_published> The parsed content of the DMARC record Google found in your DNS zone
<source_ip> The IP address the emails were sent from
<policy_evaluated> The results of the DKIM and SPF checks: both messages passed both tests.
Meaning:
The two messages sent from an @yoursite.ru address and received by Google's mail servers on May 12 (UTC) were correctly signed (DKIM) and were sent from authorized IPs (SPF). Based on this, we can reasonably say that Google has only received legitimate messages from your domain.
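Reading these reports by hand gets tedious. A minimal Python sketch using the standard library's `xml.etree.ElementTree` can pull out the same fields; `summarize` and `failing` are hypothetical names, and only the elements that appear in the sample report are read. Decoding the sample's `<begin>` and `<end>` values this way gives 2015-05-12 00:00:00 and 23:59:59 UTC.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def summarize(report_xml):
    """Return (begin, end, rows) from a DMARC aggregate report (bytes or str)."""
    root = ET.fromstring(report_xml)
    date_range = root.find("report_metadata/date_range")
    begin = datetime.fromtimestamp(int(date_range.findtext("begin")), tz=timezone.utc)
    end = datetime.fromtimestamp(int(date_range.findtext("end")), tz=timezone.utc)
    rows = []
    for record in root.iter("record"):
        rows.append({
            "source_ip": record.findtext("row/source_ip"),
            "count": int(record.findtext("row/count")),
            "dkim": record.findtext("row/policy_evaluated/dkim"),
            "spf": record.findtext("row/policy_evaluated/spf"),
            "header_from": record.findtext("identifiers/header_from"),
        })
    return begin, end, rows


def failing(rows):
    """Rows worth a look: DKIM or SPF did not pass (DMARC needs one aligned pass)."""
    return [row for row in rows if row["dkim"] != "pass" or row["spf"] != "pass"]
```

An empty `failing(rows)` list, as with this report, means there is nothing to fix.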

There's no problem. Aggregate reports (like this one) are sent as a summary (typically daily) of all emails received by that receiver - passing and failing.
The sample report you provided shows that all emails are passing, so there's nothing to fix.

Separate email from original email using Perl

When people email each other, they generally include the original email in their reply to a sender, adding a little more information each time to the email. Each email client seems to have a different way of adding the original email to a reply.
I need to parse email arriving at our mail server and extract the new part of the message. Is there a sensible way to strip this appended (or prepended) information (the "original message") and get just the new information in the mail body? Sadly, I believe there is no special encoding and the original email is simply added to the new message, but I thought I'd check with the experts.
thanks.

No, there is no simple, straightforward algorithm to separate quoted or forwarded text from new content. Quoting and forwarding are poorly standardized and different conventions have existed at different times.
Having said that, Google's Gmail, for example, does this fairly well in practice. With enough samples, you can clearly come up with reasonable heuristics.
Good indicators for quoted material are forwarded (pseudo-) headers and indented text, perhaps with a quote indicator along the left margin before the quoted text. You occasionally see outdents as well.
Traditionally, on Usenet in the early 1990s, people would use different, unique quoting styles.
: ~ | This seems to be the original.
: ~ This is the first reply.
: This is the second reply.
This is the third reply, quoting the
previous three messages in sequence.
Around 1995, both clients and standardization initiatives by and large converged on "wedge" quotes:
> >> This seems to be the original.
> > This is the first reply.
> This is the second reply.
This is the third reply, quoting the
previous three messages in sequence.
Then along came Microsoft and ruined it all. I suppose that top quoting makes sense in some corporate settings where you quickly need to collect all the background from a thread to a new participant, but even for that purpose it's a horrible abomination.
This is the third reply, quoting the
previous three messages in sequence.
---- Begin forwarded message ----
From: Him [smtp:bogus]
To: His Friend
Subject: VS: Re: Same as on this message
Date: nothing machine-readable
This is the second reply.
---- Alkuperäinen viesti ----
Lähettäjä: His Friend [smtp:poppycock]
Saaja: Some Guy
Aihe: Re: Same as on this message
Päivämäärä: maybe yesterday, perhaps
This is the first reply.
----- Original message ----
From: Somebody Else [smtp:mindless]
To: Some Guy
Subject: Same as on this message
Date: like, the day before
This seems to be the original.
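These heuristics can be sketched in a few lines. This is Python rather than Perl, purely as an illustration; `new_text` is a hypothetical name, and the separator list (including the Finnish "Alkuperäinen viesti", i.e. "Original message", from the example) plus the "On ... wrote:" attribution line are assumptions about common clients, not a complete list. It drops prefixed quote lines in the wedge and Usenet styles and cuts everything after the first forward/reply separator, which covers top quoting.

```python
import re

# Separator lines that typically introduce a top-quoted original (illustrative, not exhaustive).
SEPARATORS = [
    re.compile(r"^-{2,}\s*(original message|begin forwarded message|alkuperäinen viesti)\s*-{2,}$", re.I),
    re.compile(r"^On .+ wrote:$"),  # common attribution line, e.g. "On Mon, ... Bob wrote:"
]
# Quote prefixes: ">" wedges and the older ":", "~", "|" styles,
# followed by whitespace, another prefix, or the end of the line.
QUOTE_PREFIX = re.compile(r"^\s*[>:|~](?:[\s>:|~]|$)")


def new_text(body):
    """Heuristically keep only the new (unquoted) part of a reply."""
    kept = []
    for line in body.splitlines():
        if any(sep.match(line.strip()) for sep in SEPARATORS):
            break  # treat everything after a forward/reply separator as quoted
        if QUOTE_PREFIX.match(line):
            continue  # skip prefixed quote lines
        kept.append(line)
    return "\n".join(kept).strip()
```

Outdented quotes, interleaved replies and clients that localize other separators will slip through; in practice you would grow the patterns from your own mail samples.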

Want to automatically process email attachments based on username and subject

I'm seeking advice about setting up an email gateway so students can email me homework and the email will be processed automatically.
For example, if studenta@univ.edu emails me with the subject "CS208 hw1", I would check that studenta is on the list of students taking CS208, then take all the attached files, put them in that student's hw1 folder, and reply with an email stating which files were received and when. If the student's email was malformed in some way, such as a bad subject or missing files, the service would send an appropriate reply.
I have administrative access to an on-campus Linux machine that could be configured as an email server.
Offhand, I was thinking of using fetchmail and a cron job to periodically read a designated user's email and respond appropriately with some sort of script. Does this sound like a good way to go? I would welcome better ideas.
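If you do go the email route, the per-message step could look like the Python sketch below. It assumes fetchmail (or a procmail-style filter) hands each raw message to the script, that `rosters` maps a course such as "CS208" to a set of enrolled addresses, and that files go under root/<course>/<student>/<hw>/. `file_submission` and that layout are hypothetical, and sending the returned text back to the student is left out.

```python
import email
import re
from email import policy
from pathlib import Path

# Hypothetical subject convention from the question: "<course> <assignment>", e.g. "CS208 hw1"
SUBJECT = re.compile(r"^\s*(CS\d{3})\s+(hw\d+)\s*$", re.IGNORECASE)


def file_submission(raw, rosters, root):
    """Save attachments to root/<course>/<student>/<hw>/ and return the text of a reply."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    sender = msg["From"]
    if sender is None or not sender.addresses:
        return "rejected: no sender address"
    address = sender.addresses[0]
    match = SUBJECT.match(str(msg["Subject"] or ""))
    if match is None:
        return "rejected: subject must look like 'CS208 hw1'"
    course, hw = match.group(1).upper(), match.group(2).lower()
    if address.addr_spec.lower() not in rosters.get(course, set()):
        return f"rejected: {address.addr_spec} is not on the {course} roster"
    dest = root / course / address.username / hw
    saved = []
    for part in msg.iter_attachments():
        # Keep only the final path component so a filename cannot escape dest
        name = Path(part.get_filename() or "").name
        if name in ("", ".", ".."):
            continue
        dest.mkdir(parents=True, exist_ok=True)
        (dest / name).write_bytes(part.get_payload(decode=True))
        saved.append(name)
    if not saved:
        return "rejected: no attached files"
    return f"accepted {course} {hw}: " + ", ".join(saved)
```

Every rejection branch is a place where a real student message will eventually land, which is the maintenance cost the first answer warns about.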

I expect that in practice there will be far, far more exceptions to whatever rules you prescribe than there will be conforming mail which is properly handled. You'll be buying yourself a headache of manual fixups and "the computer ate my homework" claims.
Since this is a CS 200-level class, require them to use a version control system and save yourself the hassle of parsing free-format e-mail by relying on the rigid structure a VCS imposes. Your students will benefit from the requirement too. If my 10-year-old could appreciate the merit of automatic revision control within Google Docs, I'm guessing your students can handle Mercurial or git or even (gasp!) Subversion.
Added in response to a comment:
Yes, but with Mercurial (and presumably git) "repository" is a fancy word for "directory" and is not the heavyweight DBMSy thingy that older VCS models may have led you to expect.
Here is how as a student I would expect to work on a hypothetical assignment:
studenta@dorm$ hg clone https://Rich.univ.edu/studenta/cs208
$ cd cs208 ; browser ./hw1.html    # read the assignment
$ mkdir hw1 ; cd hw1               # ...then create your work files here
$ hg add * ; hg commit -m "perfect the first time!" # updates locally only
$ # ...make lots of bug fixes...
$ hg commit -m "okay really done now"
$ hg push
# sleep, party, go to class with hangover
$ hg pull -u                       # -u also updates the working copy
$ browser ../hw2.html ; mkdir ../hw2
...
The assignments you placed in each student's repository are just for the sake of demonstration. Since you "own" the Rich.univ.edu machine, their pushes become authoritative. You'd:
Write a (tiny) script to hg init $student/cs208 on Rich.univ.edu for each student in the roster.
Figure out whether HTTPS or SSH works best in your environment
Add commentary (if desired) to the student's files, which they'd pick up on their next pull
Have a managed, convenient, logged record of all the interactions.
The students get affirmative feedback at the moment of the push that it was accepted.
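The (tiny) init script from the first step might look like this in Python, assuming a roster file with one user name per line and a hypothetical repository root such as /srv/hg; `init_repos` and `read_roster` are hypothetical names. It defaults to a dry run that only returns the `hg init` commands it would execute, and it skips students who already have a repository. Serving the repositories over HTTPS or SSH is a separate step and is not shown.

```python
import subprocess
from pathlib import Path


def init_repos(roster, root, course="cs208", dry_run=True):
    """Return (and unless dry_run, run) one `hg init` per student lacking a repo."""
    commands = []
    for student in roster:
        repo = Path(root) / student / course
        if (repo / ".hg").is_dir():
            continue  # already initialised; leave the student's history alone
        command = ["hg", "init", str(repo)]
        commands.append(command)
        if not dry_run:
            repo.parent.mkdir(parents=True, exist_ok=True)
            subprocess.run(command, check=True)
    return commands


def read_roster(path):
    """One student user name per line; blank lines and '#' comments are ignored."""
    names = (line.strip() for line in Path(path).read_text().splitlines())
    return [name for name in names if name and not name.startswith("#")]
```

Running it first with the default `dry_run=True` lets you review the command list before anything is created.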
Finally, should the repository server be down, they could run:
$ hg export tip | mail -s "server down; assignment done" Rich@univ.edu
And you'd still have a timestamped, digested version of their submission in a rigid format, which you could commit for them or, better still:
"Dr. Rich, the server was down!!!"
"But you sent me an export via e-mail, yes?"
"Of course, sir."
"Well, just push when the machine is back up; I already have proof that you completed it on time."
"Oh gee, Dr. Rich, you're swell!"

Personally, I would go for a web page with an upload dialog that can also list the current files, and maybe an FTP server. The problem with email is that its path to your server is out of your control, since the mail is processed by servers other than your own along the way. Mail could be lost or altered in transit, and not all servers accept attachments of every size or type. Although the idea is quite good, I think it would produce a less optimal solution than the alternatives, such as the upload page or an FTP server.
Edit:
I now prefer msw's approach. A version control system would spare you a lot of hassle and problems. *tips hat to msw*

How to accurately parse smtp message status code (DSN)?

RFC 1893 (since obsoleted by RFC 3463) specifies that status codes use the format quoted below.
But our bounce management system is having a hard time parsing the error status codes from bounce messages. We are able to get the raw message, but depending on the email server, the code appears in different places. Is there any rule for parsing this type of message to obtain better results? We are not looking for a 100% solution, but at least 80%.
This document defines a new set of status codes to report mail system
conditions. These status codes are intended to be used for media and
language independent status reporting. They are not intended for
system specific diagnostics.
The syntax of the new status codes is defined as:
status-code = class "." subject "." detail
class = "2"/"4"/"5"
subject = 1*3digit
detail = 1*3digit
White-space characters and comments are NOT allowed within a status-code. Each numeric sub-code within the status-code MUST be expressed without leading zero digits.
The RFC quote above says one thing, but the table below, from a leading bounce-management tool, says something different. Where can I find a good source of standard status codes?
Return Code Description
0 UNDETERMINED - (ie. Recipient Reply)
10 HARD BOUNCE - (ie. User Unknown)
20 SOFT BOUNCE - General
21 SOFT BOUNCE - Dns Failure
22 SOFT BOUNCE - Mailbox Full
23 SOFT BOUNCE - Message Too Large
30 BOUNCE - NO EMAIL ADDRESS. VERY RARE!
40 GENERAL BOUNCE
50 MAIL BLOCK - General
51 MAIL BLOCK - Known Spammer
52 MAIL BLOCK - Spam Detected
53 MAIL BLOCK - Attachment Detected
54 MAIL BLOCK - Relay Denied
60 AUTO REPLY - (ie. Out Of Office)
70 TRANSIENT BOUNCE
80 SUBSCRIBE Request
90 UNSUBSCRIBE/REMOVE Request
100 CHALLENGE-RESPONSE

I'm not sure that it's a full answer, but this algorithm for detecting bounces might be useful.
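As a separate, minimal Python sketch of an "80%" approach (not the algorithm that answer refers to): prefer the machine-readable Status: field of a standard delivery status notification (RFC 3464), and only fall back to scanning text parts for a token that matches the RFC 3463 syntax quoted in the question. `find_status` and the class labels are hypothetical; mapping codes onto a vendor scheme like the 0-100 table is left out, because that scheme is the tool's own.

```python
import email
import re
from email import policy

# RFC 3463 syntax: class "." subject "." detail, 1-3 digits, no leading zeros.
# The lookarounds keep it from matching inside IP addresses such as 192.5.1.1.
STATUS_RE = re.compile(r"(?<![\d.])([245])\.(0|[1-9]\d{0,2})\.(0|[1-9]\d{0,2})(?!\.?\d)")
CLASSES = {"2": "success", "4": "persistent transient failure", "5": "permanent failure"}


def _match(text):
    found = STATUS_RE.search(text)
    return (found.group(0), CLASSES[found.group(1)]) if found else None


def find_status(raw):
    """Return (code, class) for a bounce message, or None if no status code is found."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    # 1) Structured DSN: Status: fields inside the message/delivery-status part
    for part in msg.walk():
        if part.get_content_type() != "message/delivery-status":
            continue
        blocks = part.get_payload()
        if not isinstance(blocks, list):
            continue
        for block in blocks:
            status = block.get("Status")
            found = _match(str(status)) if status else None
            if found:
                return found
    # 2) Fallback heuristic: first code-like token in any text part (e.g. "550 5.1.1 ...")
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True) or b""
            found = _match(payload.decode("utf-8", "replace"))
            if found:
                return found
    return None
```

Bounces that carry only a bare SMTP reply code (a plain "550", say) return None; that remainder is where further, server-specific heuristics would go.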
