How to extract the date from the string pattern in Perl

How do I extract the date from this string in Perl?
Simple_Invoice_Report.Summary.20150701000000.csv
i.e. 20150701
Can you please help me out?

The simplest solution is a regular expression match. Assuming your date is always the first 8 digits in your string:
my ( $date ) = ( $string =~ m/(\d{8})/ );
If it's more complicated than that, you'll need to be a bit more specific.
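If other runs of eight or more digits can appear earlier in the name, a stricter pattern may be safer. The following is only a sketch: it assumes the timestamp is always 14 digits and sits directly before the .csv extension, and extract_date is a hypothetical helper name, not something from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the YYYYMMDD part of a 14-digit
# timestamp placed directly before the ".csv" extension.
sub extract_date {
    my ($name) = @_;
    my ($date) = $name =~ m/\.(\d{8})\d{6}\.csv\z/;
    return $date;    # undef when the name does not match
}

print extract_date('Simple_Invoice_Report.Summary.20150701000000.csv'), "\n";
```

The list assignment puts the first capture group into $date, so a non-matching name simply yields undef.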

Below is the code to extract the date from the above string:
#!/usr/bin/perl
use strict;
use warnings;
my $string = q{Simple_Invoice_Report.Summary.20150701000000.csv};
if ($string =~ m#\.(\d{8})\d+\.csv\z#) { # escape the dots so they match literally
print $1;
}
Output: 20150701

Related

Trying to find the index of the first number in a string using perl

I'm trying to find the index of the first occurrence of a number from 0-9.
Let's say that:
$myString = "ABDFSASF9fjdkasljfdkl1"
I want to find the position where 9 is.
I've tried this:
print index($myString,[0-9]);
And:
print index($myString,\d);
Use regex Positional Information:
use strict;
use warnings;
my $myString = "ABDFSASF9fjdkasljfdkl1";
if ($myString =~ /\d/) {
print $-[0];
}
Outputs:
8
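The special array @- holds the start offsets of the last successful match, so $-[0] is where the whole match began. As a small sketch, the same idea can be wrapped in a function that mimics index() by returning -1 when there is no digit; first_digit_index is a made-up name used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: position of the first digit, or -1 if none,
# mirroring the return convention of the built-in index().
sub first_digit_index {
    my ($text) = @_;
    return $text =~ /\d/ ? $-[0] : -1;
}

print first_digit_index("ABDFSASF9fjdkasljfdkl1"), "\n";
```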
You can also try the Perl code below:
use strict;
use warnings;
my $String = "ABDFSASF9fjdkasljfdkl11";
if($String =~ /(\d)/)
{
print "Prematch string of first number $1 is $`\n";
print "Index of first number $1 is " . length($`);
}
You can try this:
perl -e '$string="ABDFSASF9fjdkasljfdkl1";@array=split(//,$string);for $i (0..$#array) {if($array[$i]=~/\d/){print $i;last;}}'

Separating a string which does not have a delimiter in it

Can we separate a string using index values, so that my input, which consists of characters run together, is split up into a readable string?
Input = 20140610182213
Expecting Output = 2014-06-10 18:22:13
I usually use a substitution for similar tasks:
my $input = '20140610182213';
$input =~ s/(....)(..)(..)(..)(..)(..)/$1-$2-$3 $4:$5:$6/;
print $input;
Another possibility is to use substr:
my $input = '20140610182213';
my @delims = ('-', '-', ' ', ':', ':');
substr $input, $_, 0, pop @delims for 12, 10, 8, 6, 4;
print $input;
You might like unpack for this:
use feature qw(say);
my ( $year, $mon, $mday, $hour, $min, $sec ) = unpack "A4 A2 A2 A2 A2 A2", "20140610182213";
say "$year-$mon-$mday $hour:$min:$sec";
Let's get all modern on you:
use strict;
use warnings;
use feature qw(say);
my $input = "20140610182213";
$input =~ /(?<year>\d{4})
(?<month>\d{2})
(?<day>\d{2})
(?<hour>\d{2})
(?<minute>\d{2})
(?<second>\d{2})/x;
say "$+{year}-$+{month}-$+{day} $+{hour}:$+{minute}:$+{second}";
I'm using named back references here. Perl has had numeric back references from the very beginning, which you set by using parentheses:
$input =~ /(\d{4})(\d{2})(\d{2})/;
my $year = $1;
my $month = $2;
my $day = $3;
Each parenthesized group is a back reference. This was taken directly from sed.
Since Perl 5.10, named back references can be used as well. They're in the format (?<name>regex), where name is the name of the back reference and regex is the regular expression. To refer to them, you use $+{name}, where name is your back reference name.
The big advantage is that you now have actual names for your back references, and you don't have to worry what $2 means:
$input =~ /(?<year>\d{4})(?<month>\d{2})(?<day>\d{2})/;
my $year = $+{year};
my $month = $+{month};
my $day = $+{day};
Now, we use the x flag in regular expressions. It makes Perl ignore whitespace inside the pattern, which lets us spread a regular expression over multiple lines:
$input =~ /(?<year>\d{4})
(?<month>\d{2})
(?<day>\d{2})/x; #The /x flag
my $year = $+{year};
my $month = $+{month};
my $day = $+{day};
If you're not familiar with the syntax, it can be a bit hard on the eyes at first. However, one of the nice things about this is that it documents what's going on, and makes maintenance easier. Once your eyes adjust to the light, it is easy to see what is going on and find errors.
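Because the x flag also allows # comments inside the pattern, the self-documenting style can be taken one step further. This is a sketch only: format_stamp is a hypothetical helper name, and the \A and \z anchors are an added assumption that the input contains nothing but the 14 digits.

```perl
use strict;
use warnings;

# Hypothetical helper: turns "YYYYMMDDhhmmss" into "YYYY-MM-DD hh:mm:ss".
sub format_stamp {
    my ($stamp) = @_;
    return unless $stamp =~ /
        \A
        (?<year>   \d{4} )   # four-digit year
        (?<month>  \d{2} )   # two-digit month
        (?<day>    \d{2} )   # day of month
        (?<hour>   \d{2} )
        (?<minute> \d{2} )
        (?<second> \d{2} )
        \z
    /x;
    return "$+{year}-$+{month}-$+{day} $+{hour}:$+{minute}:$+{second}";
}

print format_stamp("20140610182213"), "\n";
```

Returning early on a failed match avoids reading stale values from %+ when the input is malformed.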
Another possibility is to use Time::Piece to convert that date time to something that Perl can directly manipulate:
use strict;
use warnings;
use feature qw(say);
use Time::Piece;
my $input = "20140610182213";
my $date_object = Time::Piece->strptime($input, "%Y%m%d%H%M%S");
printf "%04d-%02d-%02d %02d:%02d:%02d\n",
$date_object->year,
$date_object->mon,
$date_object->mday,
$date_object->hour,
$date_object->minute,
$date_object->second;
Again, what's going on is well documented. You can see what the input string is, and you can easily refer to each part of the string, and even change the formatting. What if you want the name of the month? Use $date_object->month for the abbreviated name, or $date_object->fullmonth for the full one.
For simple parsing that you will never use again, the first way is probably the best. However, by using Time::Piece, you can now check how many days there are between two dates (for example, how old is your date/time stamp?).
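As a rough sketch of that date arithmetic: subtracting one Time::Piece object from another gives a Time::Seconds object, which can report the difference in days. The second timestamp below is invented for the example, and days_between is a hypothetical helper name.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper: days between two "YYYYMMDDhhmmss" stamps.
sub days_between {
    my ($from, $to) = @_;
    my $format = "%Y%m%d%H%M%S";
    my $start  = Time::Piece->strptime($from, $format);
    my $end    = Time::Piece->strptime($to,   $format);
    my $diff   = $end - $start;    # a Time::Seconds object
    return $diff->days;            # may be fractional
}

# The second stamp is an assumed value, five days after the first.
printf "%d days\n", days_between("20140610182213", "20140615182213");
```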
I think this is tidiest using a global pattern match with sprintf:
my $dt = sprintf '%s%s-%s-%s %s:%s:%s', '20140610182213' =~ /../g;
print $dt;
output
2014-06-10 18:22:13
If you want the values assigned to variables (e.g. $y, $m,$d etc.) for use later:
perl -E '($y,$m,$d,$H,$M,$S)="20140610182213"=~/(....)(..)(..)(..)(..)(..)/;say "$y-$m-$d $H:$M:$S"'
Just use Time::Piece
use strict;
use warnings;
use Time::Piece;
my $string = "20140610182213";
my $date = Time::Piece->strptime($string, "%Y%m%d%H%M%S");
print $date->strftime("%Y-%m-%d %H:%M:%S"), "\n";
Outputs:
2014-06-10 18:22:13

How to split this string 'gi|216ATGCTGATGCTGTG' in this format 'gi|216 ATGCTGTGCTGATGCTG' in Perl?

I am parsing a FASTA alignment file which contains:
gi|216CCAACGAAATGATCGCCACACAA
gi|21-GCTGGTTCAGCGACCAAAAGTAGC
I want to split this string into this:
gi|216 CCAACGAAATGATCGCCACACAA
gi|21- GCTGGTTCAGCGACCAAAAGTAGC
For first string, I use
$aar=split("\d",$string);
But that didn't work. What should I do?
So you're parsing some genetic data and each line has a gi| prefix followed by a sequence of numbers and hyphens followed by the nucleotide sequence? If so, you could do something like this:
my ($number, $nucleotides);
if($string =~ /^gi\|([\d-]+)([ACGT]+)$/) {
$number = $1;
$nucleotides = $2;
}
else {
# Broken data?
}
That assumes that you've already stripped off leading and trailing whitespace. If you do that, you should get $number = '216' and $nucleotides = 'CCAACGAAATGATCGCCACACAA' for the first one and $number = '21-' and $nucleotides = 'GCTGGTTCAGCGACCAAAAGTAGC' for the second one.
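To get the space-separated output asked for in the question, the two captures can be joined back together. A minimal sketch, assuming every line has the gi| prefix and only A, C, G and T in the sequence; split_fasta_line is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: "gi|216CCAA..." becomes "gi|216 CCAA...".
sub split_fasta_line {
    my ($line) = @_;
    $line =~ s/^\s+|\s+$//g;    # strip leading and trailing whitespace
    return unless $line =~ /^gi\|([\d-]+)([ACGT]+)$/;
    return "gi|$1 $2";
}

print split_fasta_line($_), "\n"
    for "gi|216CCAACGAAATGATCGCCACACAA", "gi|21-GCTGGTTCAGCGACCAAAAGTAGC";
```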
Looks like BioPerl has some stuff for dealing with fasta data so you might want to use BioPerl's tools rather than rolling your own.
Here's how I'd go about doing that.
#!/usr/bin/perl -Tw
use strict;
use warnings;
use Data::Dumper;
while ( my $line = <DATA> ) {
my @strings =
grep {m{\A \S+ \z}xms} # no whitespace tokens
split /\A ( \w+ \| [\d-]+ )( [ACTG]+ ) /xms, # capture left & right
$line;
print Dumper( \@strings );
}
__DATA__
gi|216CCAACGAAATGATCGCCACACAA
gi|21-GCTGGTTCAGCGACCAAAAGTAGC
If you just want to add a space (can't really tell from your question), use substitution. To put a space in front of the first grouping of ACTG:
$string =~ s/([ACTG]+)/ $1/;
or to add a tab after the first grouping of digits and dashes:
$string =~ s/([\d-]+)/$1\t/;
Note that this will substitute on $string in place.

In Perl, how can I extract part of the hostname from a URI?

I want to find a string that begins with http:// and ends with .com,
but the http:// and .com don't need to be printed.
$str = "http://example.com";
$str =~ /http:\/\/example.com/;$result = "$&\n";
print $result;
I want essentially the same as what this Python code does:
#!/usr/bin/python
import re
str = 'http://example.com'
search = re.search(r'http://(\w+).com', str)
if search:
    print search.group(1)
It will only show "example". How do I do this in Perl?
Robust solution with a specialised parser:
use feature 'say';
use strict; use warnings;
use URI;
use URI::Find;
URI::Find->new(sub {
my $uri = shift;
say $uri->host =~ m{(\w+)[.]com\z};
})->find(\ (my $x = q{http://example.com/}) );
Not so perlish solution below:
$str = 'http://example.com';
if (($url) = $str =~ /http:\/\/(\w+)\.com/) {
print $url, "\n";
}
Try this simple code:
$str = 'http://example.com';
print "$_\n" for $str =~ m{\A http:// (\w+) [.] com \z}x;
To ensure your result is complete, anchor the pattern at the beginning, \A, and at the end, \z. Use a pattern delimiter other than / to avoid leaning toothpick syndrome, and use the x option to make your pattern more readable.
You need to use (...) to capture the part you want to extract.
You can test this code on ideone.com
In your Python snippet you're capturing the text you want with parentheses, but in your Perl snippet you've left them out. Also, the part you want to capture is hard-coded instead of expressed as \w+. That is where to look.

In Perl, how can I find the day of the week which corresponds to a given date?

I am trying to convert given dates to the corresponding days of the week using Perl with Date::Day.
The input string is in the format: October 24, 2011; March 12, 1989; November 26, 1940.
I have written a script which will parse the above input and convert each date to a format which will be accepted by Date::Day::day().
This subroutine accepts the input in format, mm,dd,yyyy. I have done this using hashes. Recently posted a thread on this query on stackoverflow and with the help of other members, was able to do it.
Here's my script. It returns ERR for each of the dates instead of the day of the week corresponding to the date.
There seems to be something wrong with the input format of the parameter passed to day() subroutine.
Here's more documentation on the Perl Module I am using:
http://helpspy.com/c.m/programming/lang/perl/cpan/c06/Date/Day/d_1/
I am interested to know, where exactly I am going wrong. Do I have to make some modifications to the date before passing it as a parameter to the day() subroutine?
#!/usr/bin/perl
use Date::Day;
use strict;
use warnings;
my @arr;
print "Enter the string: ";
my $str=<>;
chomp $str;
my @dates= split /; /,$str;
my %days= ("January",1,"February",2,"March",3,"April",4,"May",5,"June",6,"July",7,"August",8,"September",9,"October",10,"November",11,"December",12);
my @output = map {
my $pattern=$_;
$pattern =~ s/(\S*)\s/$days{$1}, /;
$pattern =~ s/\s//g;
$pattern
} @dates;
print &day(11,9,1987); # for test purpose and it returns correct value
foreach my $output (@output)
{
chomp $output;
my $result=&day($output);
push(@arr,$result);
}
foreach my $arr (@arr)
{
print $arr."; ";
}
The output of the above script is: ERR; ERR; ERR;
Date::Day looks like a rather old module. It was last updated almost nine years ago. That's not to say that it's broken, but these days the DateTime family of modules handle pretty much any date/time processing that you want. I'd strongly recommend taking a look at those instead.
Here's an example solving your problem using DateTime::Format::Strptime.
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use DateTime::Format::Strptime;
my $dt_parser = DateTime::Format::Strptime->new(
pattern => '%B %d, %Y'
);
while (<DATA>) {
chomp;
my $dt = $dt_parser->parse_datetime($_);
say $dt->day_name;
}
__END__
October 24, 2011
March 12, 1989
November 26, 1940
You're passing a single string to the sub &day, which expects three separate arguments (month, day, year).
Here is a rewrite:
#!/usr/local/bin/perl
use Data::Dump qw(dump);
use strict;
use warnings;
use Date::Day;
print "Enter the string: ";
my $str=<>;
chomp $str;
my @dates= split /; /,$str;
my %days = ("January",1,"February",2,"March",3,"April",4,"May",5,"June",6,"July",7,"August",8,"September",9,"October",10,"November",11,"December",12);
my @output = map {
my @l = split/[ ,]+/;
$l[0] = $days{$l[0]};
[@l];
} @dates;
my @arr;
foreach my $date(@output) {
push @arr, &day(@$date);
}
dump @arr;
output:
("MON", "SUN", "TUE")
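For comparison, the core Time::Piece module can probably do the same job without a month-number hash. This is a sketch rather than a drop-in replacement: day_of_week is a hypothetical helper name, and it assumes the input always looks like "October 24, 2011" with English month names.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper: full weekday name for a "Month DD, YYYY" date.
sub day_of_week {
    my ($text) = @_;
    return Time::Piece->strptime($text, "%B %d, %Y")->fullday;
}

# Same "; " separated input format as in the question.
my $str = "October 24, 2011; March 12, 1989";
print join("; ", map { day_of_week($_) } split /; /, $str), "\n";
```

Here strptime takes care of both parsing the month name and validating the date, so no manual reformatting of the string is needed.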