reformat text in perl - perl

I have a file of 1000 lines, each line in the format
filename dd/mm/yyyy hh:mm:ss
I want to convert it to read
filename mmddhhmm.ss
I have been attempting to do this in Perl and awk without success and would appreciate any help.
thanks

You can do a simple regular expression replacement if the format is really fixed:
s|(..)/(..)/.... (..):(..):(..)$|$2$1$3$4.$5|
I used | as a separator so that I do not need to escape the slashes.
You can use this with Perl on the shell in place:
perl -pi -e 's|(..)/(..)/.... (..):(..):(..)$|$2$1$3$4.$5|' file
(Look up the option descriptions with man perlrun).

Another, somewhat ugly approach: for each line you read from the file ($str here), do something like this:
my $str = 'filename 26/12/2010 21:09:12';
my @arr1 = split(' ',$str);
my @arr2 = split('/',$arr1[1]);
my @arr3 = split(':',$arr1[2]);
my $day = $arr2[0];
my $month = $arr2[1];
my $year = $arr2[2];
my $hours = $arr3[0];
my $minutes = $arr3[1];
my $seconds = $arr3[2];
print $arr1[0].' '.$month.$day.$hours.$minutes.'.'.$seconds."\n";

Pipe your file to a perl script with:
while ( my $line = <> ) {
    if ( $line =~ /(\S+)\s+(\d{2})\/(\d{2})\/\d{4}\s+(\d{2}):(\d{2}):(\d{2})/ ) {
        print $1 . " " . $3 . $2 . $4 . $5 . '.' . $6 . "\n";
    }
}
Redirect the output however you want.
This says match line to:
(non-whitespace>=1)whitespace>=1(2digits)/(2digits)/4digits
whitespace>=1(2digits):(2digits):(2digits)
Capture groups are in () numbered 1 to 6 left to right.

Using sed:
sed -r 's|([0-9]{2})/([0-9]{2})/|\2/\1/|; s|/[0-9]{4} ||; s|/||; s/://; s/:/./' file.txt
swap the day and the month so that the result reads mmdd
delete the year /yyyy
delete the remaining slash
delete the first colon
change the remaining colon to a dot
Using awk:
awk '{split($2,d,"/"); split($3,t,":"); print $1, d[2] d[1] t[1] t[2] "." t[3]}' file.txt

Related

Split a perl string with a substring and a space

local_addr = sjcapp [value2]
How do you split this string so that I get 2 values in my array i.e.
array[0] = sjcapp and array[1] = value2.
If I do this
@array = split('local_addr =', $input)
then my array[0] has sjcapp [value2]. I want to be able to separate it into two in my split function itself.
I was trying something like this but it didn't work:
split(/local_addr= \s/, $input)
Untested, but maybe something like this?
@array = ($input =~ /local_addr = (\S+)\s\[(\S+)\]/);
Rather than split, this uses a regex match in list context, which gives you an array of the parts captured in parentheses.
~/ cat data.txt
local_addr = sjcapp [value2]
other_addr = superman [value1492]
euro_addr = overseas [value0]
If the data really is as regularly structured as that, then you can just split on the whitespace. On the command line (see the perlrun(1) manual page) this is easiest with "autosplit" (-a), which magically creates an array of fields called @F from the input:
perl -lane 'print "$F[2] $F[3]" ' data.txt
sjcapp [value2]
superman [value1492]
overseas [value0]
In your script you can change the name of the array, and the position of the elements within it, by shift-ing or splice-ing. There may be a more elegant way than this, but it works:
perl -lane 'my @array = ($F[2],$F[3]) ; print "$array[0], $array[1]" ' data.txt
Or, without using autosplit, as follows:
perl -lne 'my @arr=split(" ");splice(@arr,0,2); print "$arr[0] $arr[1]"' data.txt
try :
if ( $input =~ /(=)(.+)(\[)(.+)(\])/ ) {
@array=($2,$4);
}
I would use a regexp rather than a split, since this is clearly a standard format config file line. How you construct your regexp will likely depend on the full line syntax and how flexible you want to be.
if( $input =~ /(\S+)\s*=\s*(\S+)\s*\[\s*(\S+)\s*\]/ ) {
@array = ($2,$3);
}

split file into single lines via delimiter

Hi I have the following file:
>101
ADFGLALAL
GHJGKGL
>102
ASKDDJKJS
KAKAKKKPP
>103
AKNCPFIGJ
SKSK
etc etc;
and I need it in the following format:
>101
ADFGLALALGHJGKGL
>102
ASKDDJKJSKAKAKKKPP
>103
AKNCPFIGJSKSK
how can I do this? perhaps a perl one liner?
Thanks very much!
perl -npe 'chomp if ($.!=1 && !s/^>/\n>/)' input
Remove the newline at the end (chomp) if there is no > at the beginning (!s/^>/\n>/ is false). Also, add a newline at the beginning of the line if this is not the first line ($.!=1) and there is a > at the beginning of the line (s/^>/\n>/).
perl -lne '
    if (/^>/) {print}
    else {
        if ($count) {
            print $string . $_;
            $count = 0;
        } else {
            $string = $_;
            $count++;
        }
    }
' file.txt

Replace characters in certain postion of lines with whitespace

I need to be able to replace, character positions 58-71 with whitespace on every line in a file, on Unix / Solaris.
Extract example:
LOCAX0791LOCPIKAX0791LOC AX0791LOC095200130008PIKAX079100000000000000WL1G011 000092000000000000
LOCAX0811LOCPIKAX0811LOC AX0811LOC094700450006PIKAX0811000000000000006C1G011 000294000000000000
LOCAX0831LOCPIKAX0831LOC AX0831LOC094000180006PIKAX083100000000000000OJ1G011 000171000000000000
Or:
sed -r 's/^(.{57})(.{14})/\1              /' bar.txt
With apologies for the horrible 14 space string.
Simple Perl oneliner
perl -pe 'substr($_, 57, 14) = (" " x 14);' inputfile.txt > outputfile.txt
try this:
awk 'BEGIN{FS=OFS=""} {for(i=58;i<=71;i++)$i=" "}1' file
output for your first line:
LOCAX0791LOCPIKAX0791LOC AX0791LOC095200130008PIKAX0791              WL1G011
Try this in Perl:
use strict;
use warnings;
while (<STDIN>) {
    my @input = split(//, $_);
    # Positions 58-71 are array indices 57..70.
    for (my $i = 57; $i < 71; $i++) {
        $input[$i] = " ";
    }
    $_ = join('', @input);
    print $_;
}
If you have gawk on your Solaris box, you could try:
gawk 'BEGIN{FIELDWIDTHS = "57 14 1000"} gsub(/./," ",$2)' OFS= file

Perl parsing - mixture of chars, tabs and spaces

I have the following types of line in my code:
MMAPI_CLOCK_OUTPUTS = 1, /*clock outputs system*/
MMAPI_SYSTEM_MANAGEMENT = 0, /*sys man system*/
I want to parse them to get:
'MMAPI_CLOCK_OUTPUTS'
'1'
'clock outputs system'
So I tried:
elsif($TheLine =~ /\s*(.*)s*=s*(.*),s*\/*(.*)*\//)
but this doesn't get the last string 'clock outputs system'
What should the parsing code actually be?
You should escape the slashes, stars and the s for spaces. Instead of writing /, * or s in your regex, write \/, \* and \s:
/\s*(.*)\s=\s*(.*),\s\/\*(.*)\*\//
if($TheLine =~ m%^(\S+)\s+=\s+(\d+),\s+/\*(.*)\*/%) {
print "$1 $2 $3\n"
}
This uses % as an alternative delimiter in order to avoid leaning toothpick syndrome when you escape the / characters.
Try this regex: /^\s*(.*?)\s*=\s*(\d+),\s*\/\*(.*?)\*\/$/
Here is an example in which you can test it:
#!/usr/bin/perl
use strict;
use warnings;
my $str = "MMAPI_CLOCK_OUTPUTS = 1, /*clock outputs system*/\n
MMAPI_SYSTEM_MANAGEMENT = 0, /*sys man system*/";
while ($str =~ /^\s*(.*?)\s*=\s*(\d+),\s*\/\*(.*?)\*\/$/gm) {
print "$1 $2 $3 \n";
}
# Output:
# MMAPI_CLOCK_OUTPUTS 1 clock outputs system
# MMAPI_SYSTEM_MANAGEMENT 0 sys man system

Piping output from awk to perl

I want to make an array in Perl with the values obtained from my awk script. Then I can do math on them in Perl.
Here is my Perl, which runs a program, which saves a text file:
my $unix_command_dsc = (`./program -s test.fasta saved_file.txt`);
my $dsc_run = qx($unix_command_dsc);
Now I have some Awk that parses that data saved in the text file:
#!/usr/bin/awk -f
BEGIN{ # Initialize the values to zero. Note, done automatically also.
sumc4 = 0
sumc5 = 0
sumc6 = 0
}
/^[1-9][0-9]* residue/ {next} #Match line that begins with number and has word 'residue', skip it.
/^[1-9]/ { #Match line that begins with number.
sumc4 += $4 #Add up the values of the nth column into the variables.
sumc5 += $5
sumc6 += $6
print $4 "\t" $5 "\t" $6 #This will show the whole columns.
}
END{
print "sum H" "\t" "sum E" "\t" "sum C"
print sumc4 "\t" sumc5 "\t" sumc6
}
I run this Awk from terminal with the following commands:
./awk_program.txt saved_file.txt
Any ideas how I would gather this data from the print statements in awk into arrays in perl?
What I've tried is to just run that awk script in perl:
my $unix_command_awk = (`./awk_program.txt saved_file.txt`);
my $awk_run = qx($unix_command_awk);
But perl gives me errors and commands not found, like it thinks the data are commands. Should there be a STDOUT in the awk that I'm missing, rather than print?
It should just be:
my $awk_run = `./awk_program.txt saved_file.txt`;
Backticks tell perl to run the command and return the output. So your assignment to $unix_command_awk is running the command, and then qx($unix_command_awk) executes the output as a new command.
Pipe from awk to your perl script:
./awk_program file.txt | perl perl-script.pl
Then read from stdin inside the perl:
while (<>) {
# do stuff with $_
my @cols = split(/\t/);
}