Remove Text between two Strings - perl

I have a disclaimer message in an email which I want to remove using Perl.
The code is below:
my $stval = 'hii This is a test Email*************** CAUTION - Disclaimer *****************
This e-mail contains PRIVILEGED AND CONFIDENTIAL INFORMATION intended solely
for the use of the addressee(s). If you are not the intended recipient, please
notify the sender by e-mail and delete the original message.
******MAILEND***** End of Disclaimer ******MAILEND*****';
$stval =~ s/[*]//g; # this removes all * Characters
print "$stval\n\n";
The output I expect is:
hii This is a test Email

That scalar string has several embedded newlines (\n), as if it were a here-document. You can remove everything from the first '*' to the end of the string with:
$stval =~ s/[*\n]+.+//g; # each match removes a run of * or \n plus the rest of that line

Use the /s modifier so that . also matches newlines and the deletion runs to the end of the string:
$stval =~ s/\*{10}.*//s;
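A self-contained check of both substitutions against the question's sample string. Both rely on an assumption about the input: the disclaimer begins at the first run of asterisks, so a * earlier in the message body would cut the message short.

```perl
use strict;
use warnings;

# The question's sample string, copied verbatim (single quotes keep the
# embedded newlines and asterisks literal).
my $stval = 'hii This is a test Email*************** CAUTION - Disclaimer *****************
This e-mail contains PRIVILEGED AND CONFIDENTIAL INFORMATION intended solely
for the use of the addressee(s). If you are not the intended recipient, please
notify the sender by e-mail and delete the original message.
******MAILEND***** End of Disclaimer ******MAILEND*****';

# /s version: \*{10} finds the first run of ten asterisks and, because /s
# lets '.' match "\n", '.*' then consumes everything to the end of the string.
(my $with_s = $stval) =~ s/\*{10}.*//s;

# /g version: without /s, '.+' stops at each newline, so every match removes
# one run of '*'/"\n" characters plus the rest of that line.
(my $with_g = $stval) =~ s/[*\n]+.+//g;

print "[$with_s]\n";    # brackets make any trailing whitespace visible
print "[$with_g]\n";
```

Both lines print [hii This is a test Email] for this input. The /s version is the safer of the two: in the /g version every "\n" starts a new match, so it also deletes any body lines that come before the disclaimer (for "hello\nworld****..." it leaves only hello).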

Related

Perl Net::Jabber::Bot new line

I'm using the Net::Jabber::Bot module in my Perl script and it works properly, but when I send a message, all newlines get removed! Two questions:
How can we have newlines in our messages? Should we disable a chomp somewhere?
What happens with new lines in Jabber/XMPP?
This is a known issue; somebody has already submitted a patch for it: http://code.google.com/p/perl-net-jabber-bot/issues/detail?id=24
You can't send \n directly, but you may be able to send an XMPP/Jabber-encoded newline, as long as that encoding contains no unprintable characters.
In this sub:
sub _send_individual_message {
...
# Strip out anything that's not a printable character
# Now with unicode support?
$message_chunk =~ s/[^[:print:]]+/./xmsg;
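To see why the newlines disappear, this sketch runs the quoted substitution on a made-up two-line message, next to a hypothetical variant that exempts \n. It only shows what the regex does; whether a newline then survives the rest of Net::Jabber::Bot and the XMPP server is not tested here.

```perl
use strict;
use warnings;

my $message = "line one\nline two\tend";

# The filter quoted above from _send_individual_message: each run of
# characters outside [:print:] (newline and tab included) becomes one '.'.
(my $stripped = $message) =~ s/[^[:print:]]+/./xmsg;

# Hypothetical local tweak (not the module's code, and not necessarily what
# the linked patch does): leave "\n" out of the negated class so newlines
# survive, while other non-printable characters are still replaced.
(my $kept = $message) =~ s/[^[:print:]\n]+/./g;

print "$stripped\n";
print "$kept\n";
```

With the module's filter, the newline and the tab both turn into dots (line one.line two.end); the variant keeps the newline and only replaces the tab.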

What are these lines of log-parsing Perl doing and how can I come up with something that might work?

This problem comes under the context of pop-before-smtp / Postfix / Dovecot, but if I knew Perl string parsing, I could come up with an answer myself. However, I'm so lost I don't even know the precise question. To wit:
We've been using Postfix for a LONG time now and are kind of hooked on it. Now we need to "move into the modern era" and let people SEND email from our SMTP server(s) even when they're outside our network. So, tasked with this job, I've found pop-before-smtp.
You can find it here.
So, I've got it all configured but it fails in testing. I've troubleshot it using the directions here, and determined that the Perl that's trying to parse the log appears to be incorrect. We're using Dovecot as our IMAP / POP server, and there are three choices given in the configuration file. Here is an excerpt from the config file showing the three sets:
# For Dovecot POP3/IMAP when using syslog.
#$pat = '^[LOGTIME] \S+ (?:dovecot: )?(?:imap|pop3)-login: ' .
# 'Login: .*? (?:\[|rip=)[:f]*(\d+\.\d+\.\d+\.\d+)[],]';
#$out_pat = '^[LOGTIME] \S+ (?:dovecot: )?(?:imap|pop3)-login: ' .
# 'Disconnected.*? (?:\[|rip=)[:f]*(\d+\.\d+\.\d+\.\d+)[],]';
# For Dovecot POP3/IMAP when it does its own logging.
##$logtime_pat = '(\d\d\d\d-\d+-\d+ \d+:\d+:\d+)';
#$pat = '^dovecot: [LOGTIME] Info: (?:imap|pop3)-login: ' .
# 'Login: .+? rip=[:f]*(\d+\.\d+\.\d+\.\d+),';
#$out_pat = '^dovecot: [LOGTIME] Info: (?:imap|pop3)-login: ' .
# 'Disconnected.*? rip=[:f]*(\d+\.\d+\.\d+\.\d+),';
# For older Dovecot POP3/IMAP when it does its own logging.
#$pat = '^(?:imap|pop3)-login: [LOGTIME] Info: ' .
# 'Login: \S+ \[[:f]*(\d+\.\d+\.\d+\.\d+)\]';
#$out_pat = '^(?:imap|pop3)-login: [LOGTIME] Info: ' .
# 'Disconnected.*? \[[:f]*(\d+\.\d+\.\d+\.\d+)\]';
One is supposed to uncomment the ones that apply, however, none of them work.
I surmise that $pat is the pattern for logging in, and $out_pat is the pattern for logging out or otherwise disconnecting.
The actual log record format is clearly different from all three of these, but they're close. Here is an example pair:
Mar 11 17:53:55 imap-login: Info: Login: user=<username>, method=PLAIN, rip=208.54.4.205, lip=192.168.1.1, TLS
Mar 11 17:59:10 IMAP(username): Info: Disconnected: Logged out bytes=352/43743
When using POP, 'imap-login' is replaced by 'pop-login', and on log-out, 'POP' replaces 'IMAP' - why the changes in capitalization I can't say!
The important data are the timestamp, the username, and, when logging in, the "remote" IP ("rip").
Given enough time, I may be able to piece together something that works, but since I don't actually know Perl, this is kind of tough. Please help me write new rules to parse the logging output used with our Dovecot package.
The (?:...) portion of a Perl regular expression asks for clustering without capturing: it lets an entire group be matched or skipped as a unit without affecting the capture group numbers. All the lines capture exactly one field, the IP to allow. (That's a little odd; I might have expected both the username and the IP, but this may be easier in the long run.)
# For Dovecot POP3/IMAP when using syslog.
$pat = '^[LOGTIME] \S+ (?:imap|pop3)-login: Info: ' .
'Login: .*? (?:\[|rip=)[:f]*(\d+\.\d+\.\d+\.\d+)[],]';
# not necessary? see comment header START OF PATTERNS
# $out_pat = '^[LOGTIME] \S+ (?:IMAP|POP3)\(\S+\): Info: ' .
# 'Disconnected.*';
I've removed the dovecot pieces since they weren't in your input. I added the Info: to both lines. I've modified the $out_pat to use IMAP(username) instead of the no-longer-there imap-login from the original. (The use of \S+ will break if usernames have spaces. Since this assumption was made elsewhere in the file, I hope it's fine.)
Since there is no longer any IP address to capture on the logout line, it is probably best not to define $out_pat. The START OF PATTERNS comment block includes the phrase "If the entry of your choice also provides $out_pat, you should uncomment that variable as well, which allows us to keep track of users who are still connected to the server (e.g. Thunderbird caches open IMAP connections)."
I haven't tested this but I have good feelings about it.
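Since the pattern is untested, a quick way to check a candidate $pat before editing the config is to run it against the question's sample line. Two assumptions in this sketch: [LOGTIME] is replaced by a hypothetical non-capturing timestamp regex (pop-before-smtp supplies its own $logtime_pat, which may contain a capture group of its own and shift the IP to a different group number), and the sample line is exactly as quoted in the question.

```perl
use strict;
use warnings;

# Sample login line, copied from the question.
my $line = 'Mar 11 17:53:55 imap-login: Info: Login: user=<username>, '
         . 'method=PLAIN, rip=208.54.4.205, lip=192.168.1.1, TLS';

# ASSUMPTION: a non-capturing stand-in for [LOGTIME] that matches
# "Mar 11 17:53:55". pop-before-smtp supplies its own $logtime_pat.
my $logtime = '(?:\w{3} +\d+ \d+:\d+:\d+)';

# Shared tail of the answer's $pat; its single capture group is the IP.
my $tail = 'Login: .*? (?:\[|rip=)[:f]*(\d+\.\d+\.\d+\.\d+)[],]';

my %pat = (
    with_host    => '^[LOGTIME] \S+ (?:imap|pop3)-login: Info: ' . $tail,  # as in the answer
    without_host => '^[LOGTIME] (?:imap|pop3)-login: Info: ' . $tail,      # nothing before imap-login
);

my %ip;
for my $name (sort keys %pat) {
    (my $re = $pat{$name}) =~ s/\[LOGTIME\]/$logtime/;
    # Read $1 only after a successful match; it keeps its old value on failure.
    $ip{$name} = $line =~ /$re/ ? $1 : undef;
    printf "%-12s %s\n", $name, $ip{$name} // 'no match';
}
```

With the line exactly as quoted, only without_host matches (capturing 208.54.4.205): the \S+ field expects something, such as a syslog host or program name, between the timestamp and imap-login:, and the quoted line has nothing there. It is worth checking which form your real log lines take. Also, if your POP lines really say pop-login rather than pop3-login, as the question reports, the alternation would need to be (?:imap|pop3?)-login.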

Perl: pattern match a string and then print next line/lines

I am using Net::Whois::Raw to query a list of domains from a text file and then parse through this to output relevant information for each domain.
It was all going well until I hit Nominet results, where the information I need is never on the same line as the text I am matching.
For instance:
Name servers:
ns.mistral.co.uk 195.184.229.229
So what I need to do is pattern match for "Name servers:" and then display the next line or lines but I just can't manage it.
I have read through all of the answers on here but they either don't seem to work in my case or confuse me even further as I am a simple bear.
The code I am using is as follows:
while ($record = <DOMAINS>) {
$domaininfo = whois($record);
if ($domaininfo=~ m/Name servers:(.*?)\n/){
print "Nameserver: $1\n";
}
}
I have tried an example from Stack Overflow where
<DOMAINS>;
will take the next line but this didn't work for me and I assume it is because we have already read the contents of this into $domaininfo.
EDIT: Forgot to say thanks! How rude.
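So, the $domaininfo string contains the whole whois response for your domain?
Then you can match across the line break: the \n characters are part of the string, so the pattern can match the \n at the end of the Name servers: line and carry on into the next line. (The m modifier on the regex below only changes how ^ and $ behave, so it is harmless here but not what makes this work.) This works for me: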
my $domaininfo =<<DATA;
Name servers:
ns.mistral.co.uk 195.184.229.229
DATA
$domaininfo =~ m/Name servers:\n(\S+)\s+(\S+)/m;
print "Server name = $1\n";
print "IP Address = $2\n";
Now I can match the \n at the end of the Name servers: line and capture the name and IP address, which are on the next line.
This might have to be munged a bit to get it to work in your situation.
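The question also asks for the next line or lines. A sketch extending the same idea: capture every non-blank line that follows Name servers: up to the first blank line. The sample text is hypothetical (the indentation and the second server are made up), so compare it against real Net::Whois::Raw output for a .uk domain.

```perl
use strict;
use warnings;

# Hypothetical whois text: the indentation and the second server are made up.
my $domaininfo = <<'WHOIS';
    Name servers:
        ns.mistral.co.uk          195.184.229.229
        ns2.example.net

    Other section:
WHOIS

my @servers;
# Capture every non-blank line after "Name servers:" up to the first blank line.
if ($domaininfo =~ /Name servers:[ \t]*\n((?:[ \t]*\S.*\n)+)/) {
    for my $entry (split /\n/, $1) {
        my ($host, $ip) = split ' ', $entry;   # ' ' skips leading whitespace
        push @servers, $host;
        print "Nameserver: $host", (defined $ip ? " ($ip)" : ''), "\n";
    }
}
```

The repeated group (?:[ \t]*\S.*\n)+ stops at the first line that has no non-whitespace character, which is what ends the block here; if real output separates servers with blank lines, the stopping rule would need to change.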
This is half a question and perhaps half an answer (the question's in here as I am not yet allowed to write comments...). Okay, here we go:
Name servers:
ns.mistral.co.uk 195.184.229.229
Is this what an entry in the file you're parsing looks like? What will follow immediately afterwards - more domain names and IP addresses? And will there be blank lines in between?
Anyway, I think your problem may (in part?) be related to your reading the file line by line. Once you get to the IP address line, the info about 'Name servers:' having been present will be gone. Multiline matching will not help if you're looking at your file line by line. Thus I'd recommend switching to paragraph mode:
{
local $/ = ''; # one paragraph instead of one line constitutes a record
while ($record = <DOMAINS>) {
# $record will now contain all consecutive lines that were NOT separated
# by blank lines; once there are >= 1 blank lines $record will have a
# new value
# do stuff, e.g. pattern matching
}
}
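A runnable version of the paragraph-mode idea, reading from an in-memory filehandle so it needs no input file (the sample text, including the Registrar: block, is made up). In the question's own loop, whois() already returns the whole response as one string, so paragraph mode matters mainly when reading saved whois output from a file.

```perl
use strict;
use warnings;

# Made-up sample: two "paragraphs" separated by a blank line.
my $text = "Name servers:\nns.mistral.co.uk 195.184.229.229\n\nRegistrar:\nExample Registrar\n";

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @records;
{
    local $/ = '';    # paragraph mode: records end at one or more blank lines
    while (my $record = <$fh>) {
        chomp $record;    # in paragraph mode, chomp removes all trailing newlines
        push @records, $record;
    }
}
close $fh;

for my $record (@records) {
    # Both lines of the block are in $record, so \n can be matched directly.
    if ($record =~ /^Name servers:\n(\S+)\s+(\S+)/) {
        print "Server: $1, IP: $2\n";
    }
}
```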
But then you said
I have tried an example from Stack Overflow where
<DOMAINS>;
will take the next line but this didn't work for me and I assume it is because we have already read the contents of this into $domaininfo.
so maybe you've already tried what I have just suggested? An alternative would be to just add another variable ($indicator or whatever) which you'll set to 1 once 'Name servers:' has been read, and as long as it's equal to 1 all following lines will be treated as containing the data you need. Whether this is feasible, however, depends on you always knowing what else your data file contains.
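The flag idea, as a sketch over an array standing in for lines read from a file. The sample lines are made up, and ending the block at the first blank line is an assumption about the data, not something this answer specifies.

```perl
use strict;
use warnings;

# Made-up lines, standing in for what <DOMAINS> or a whois dump would yield.
my @lines = (
    "Domain name:\n",
    "    mistral.co.uk\n",
    "Name servers:\n",
    "    ns.mistral.co.uk 195.184.229.229\n",
    "\n",
    "Registrar:\n",
);

my $in_name_servers = 0;    # the "indicator" flag described above
my @servers;
for my $line (@lines) {
    if ($line =~ /^\s*Name servers:/) {
        $in_name_servers = 1;          # following lines belong to this block
        next;
    }
    if ($in_name_servers) {
        if ($line =~ /^\s*$/) {        # ASSUMPTION: a blank line ends the block
            $in_name_servers = 0;
            next;
        }
        my ($host) = split ' ', $line; # first field is the server name
        push @servers, $host;
    }
}
print "Nameserver: $_\n" for @servers;
```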
I hope something in here has been helpful to you. If there are any questions, please ask :)

PHP - How to identify e-mail addresses from input containing lines of misc data

Apologies in advance for yet another email pattern-matching question.
Here is what I have so far:
$text = strtolower($intext);
$lines = preg_split("/[\s]*[\n][\s]*/", $text);
$pattern = '/[A-Za-z0-9_-]+@[A-Za-z0-9_-]+\.([A-Za-z0-9_-][A-Za-z0-9_]+)/';
$pattern1= '/^[^@]+@[a-zA-Z0-9._-]+\.[a-zA-Z]+$/';
foreach ($lines as $email) {
preg_match($pattern,$email,$goodies);
$goodies[0]=filter_var($goodies[0], FILTER_SANITIZE_EMAIL);
if(filter_var($goodies[0], FILTER_VALIDATE_EMAIL)){
array_push($good,$goodies[0]);
}
}
$pattern works fine, but .rr.com addresses (and, I am sure, others) are stripped of .com.
$pattern1 only grabs emails that are on a line by themselves.
I am pasting a whole page of miscellaneous text, containing some emails from an old data file I am trying to recover, into a textarea.
Everything works great except for emails with more than one "." either before or after the "@".
I am sure there must be more issues as well.
I have tried several patterns I found, as well as some I tried to write.
Can someone show me the light here before I pull my remaining hair out?
How about this?
/((?:\w+[.]*)*(?:\+[^@ \t]*)?@(?:\w+[.])+\w+)/
Explanation: (?:\w+[.]*)* recognizes zero or more runs of word characters (alphanumerics plus _), each optionally followed by a run of periods. Next, (?:\+[^@ \t]*)? recognizes an optional plus sign followed by zero or more characters that are not an at sign, space, or tab. Then comes the @ sign, and finally (?:\w+[.])+\w+, which matches a sequence of word-character strings separated by periods and ending in a word-character string (i.e., [subdomain.]domain.topleveldomain).
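A quick check of the pattern against a made-up line, written in Perl like the other examples here. The constructs it uses are also supported by PCRE, so the same pattern should work with preg_match_all in PHP (an expectation, not tested here; in PHP the @ needs no escaping). With /g in list context, every address on the line is returned, not just the first one as with preg_match in the question's loop.

```perl
use strict;
use warnings;

# Made-up sample text. Single quotes stop Perl from trying to interpolate
# "@mail" or "@example" as arrays.
my $text = 'Old record: John Doe, john.doe@mail.rr.com; '
         . 'also jane+list@example.co.uk (alt).';

# The answer's pattern, with @ written as \@ so Perl never treats it as an array sigil.
my @emails = $text =~ /((?:\w+[.]*)*(?:\+[^\@ \t]*)?\@(?:\w+[.])+\w+)/g;

print "$_\n" for @emails;
```

Both addresses come back whole, including the .rr.com one that the question's $pattern truncated.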

Splitting a variable and putting into an array

I have a string like <name>sekar</name>. I want to split this string (I am using Perl), take out only sekar, and push it into an array, leaving the other stuff behind.
I know how to push onto an array, but I'm stuck on the splitting part.
Does anyone have any idea how to do this?
push @output, $1 if m|<name>(\w*)</name>|;  # matches against $_
Try this:
my($name) = $string =~ m|<name>(.*)</name>|;
From perldoc perlop:
If the "/g" option is not used, "m//" in list context returns a
list consisting of the subexpressions matched by the
parentheses in the pattern, i.e., ($1, $2, $3...).
Try <(("[^"]*"|'[^']*'|[^'">])*)>(\w+)<\/\1>. It should work; I'll test it when I get home. The idea is that the first capture group finds the contents within a <>, and its nested capture group prevents something like <blah=">"> from matching as <blah=">. The third capture group, (\w+), matches the inner word; it may have to be changed depending on what can appear in <tag>content</tag>. Lastly, \1 refers back to the content of the first capture group, so that you find the matching closing tag. (Because \1 repeats the whole opening-tag contents, attributes included, this only finds the closing tag when the opening tag has no attributes.)
Edit: I've tested this with perl and it works.
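Collecting every name, rather than one per match, comes down to the /g option: in list context, m//g returns all the captured substrings. A sketch with a made-up string holding two <name> tags; it also shows why a non-greedy (.*?) is safer than the greedy (.*) above when a string holds more than one tag (the greedy version would capture from the first <name> to the last </name>). For real XML, a parser module such as XML::LibXML or XML::Twig is more robust than any regex.

```perl
use strict;
use warnings;

# Made-up input containing two <name> elements and one unrelated tag.
my $string = '<name>sekar</name><age>30</age><name>kumar</name>';

# Non-greedy (.*?) stops at the first </name> after each <name>;
# /g in list context returns every capture.
my @output = $string =~ m|<name>(.*?)</name>|g;

print "$_\n" for @output;
```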