split file into single lines via delimiter - perl

Hi, I have the following file:
>101
ADFGLALAL
GHJGKGL
>102
ASKDDJKJS
KAKAKKKPP
>103
AKNCPFIGJ
SKSK
etc etc;
and I need it in the following format:
>101
ADFGLALALGHJGKGL
>102
ASKDDJKJSKAKAKKKPP
>103
AKNCPFIGJSKSK
How can I do this? Perhaps a Perl one-liner?
Thanks very much!

perl -npe 'chomp if ($.!=1 && !s/^>/\n>/)' input
Remove the trailing newline (chomp) when the line does not start with > (so the substitution s/^>/\n>/ fails) and it is not the first line ($. != 1). When the line does start with > and it is not the first line, that same substitution adds a newline in front of it, so each new record's header starts on its own line.
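Written out as an explicit loop, this is roughly what the one-liner does under -p (ignoring the implicit continue block):
while (<>) {
    # on every line but the first: if the line starts with '>', put a newline
    # in front of it; otherwise drop its trailing newline so it joins the
    # previous sequence line
    chomp if $. != 1 && !s/^>/\n>/;
    print;
}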

perl -lne '
    if (/^>/) { print }
    else {
        if ($count) {
            print $string . $_;
            $count = 0;
        } else {
            $string = $_;
            $count++;
        }
    }
' file.txt
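Note that this assumes each record has exactly two sequence lines, as in the sample. If a record may wrap onto any number of lines, a more general sketch (a plain script rather than a one-liner) could be:
#!/usr/bin/perl
use strict;
use warnings;

my $seq = '';
while (<>) {
    chomp;
    if (/^>/) {
        print "$seq\n" if length $seq;   # flush the previous record's sequence
        print "$_\n";                    # print the header on its own line
        $seq = '';
    } else {
        $seq .= $_;                      # join wrapped sequence lines
    }
}
print "$seq\n" if length $seq;           # flush the final record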

lowercase everything except content between single quotes - perl

Is there a way in Perl to lowercase all the text on an input line except the parts within single quotes (there could be more than one quoted section) using a regex? I have achieved this with the code below, but would like to see whether it can be done with a regex and map.
while (<>) {
    my $m = 0;
    for (split(//)) {
        if (/'/ and ! $m) {
            $m = 1;
            print;
        }
        elsif (/'/ and $m) {
            $m = 0;
            print;
        }
        elsif ($m) {
            print;
        }
        else {
            print lc;
        }
    }
}
**Sample input:**
and (t.TARGET_TYPE='RAC_DATABASE' or (t.TARGET_TYPE='ORACLE_DATABASE' and t.TYPE_QUALIFIER3 != 'racinst'))
**Sample output:**
and (t.target_type='RAC_DATABASE' or (t.target_type='ORACLE_DATABASE' and t.type_qualifier3 != 'racinst'))
You can give this a shot. All one regexp.
$str =~ s/(?:^|'[^']*')\K[^']*/lc($&)/ge;
Or, cleaner and better documented (this is semantically equivalent to the above):
$str =~ s/
    (?:
        ^ |       # Match either the start of the string, or
        '[^']*'   # some text in quotes.
    )\K           # Then ignore that part,
                  # because we want to leave it be.
    [^']*         # Take the text after it, and
                  # lowercase it.
/lc($&)/gex;
The g flag tells the regexp to run as many times as necessary. e tells it that the substitution portion (lc($&), in our case) is Perl code, not just text. x lets us put those comments in there so that the regexp isn't total gibberish.
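As a quick check, the one-regexp version can be run against the sample line from the question (hard-coded into $str here just for demonstration):
my $str = q{and (t.TARGET_TYPE='RAC_DATABASE' or (t.TARGET_TYPE='ORACLE_DATABASE' and t.TYPE_QUALIFIER3 != 'racinst'))};
$str =~ s/(?:^|'[^']*')\K[^']*/lc($&)/ge;
print "$str\n";   # prints the sample output shown above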
Aren't you playing a bit too hard with regexes for such a simple job?
Why not let good old split do it instead?
#!/usr/bin/perl
while (<>)
{
    @F = split "'";
    @F = map { $_ % 2 ? $F[$_] : lc $F[$_] } (0 .. $#F);
    print join "'", @F;
}
The above is spelled out for clarity. The last two lines are often joined into one:
print join "'", map { $_ % 2 ? $F[$_] : lc $F[$_] } (0 .. $#F);
Or, for more fun, make it a one-liner (in a bash shell)? In concept, it looks like:
perl -pF/'/ -e '$_ = join "'", map { $_ % 2 ? $F[$_] : lc $F[$_] } (0 .. $#F);' YOUR_FILE
In reality, however, we need to respect the shell and do some hard escaping work:
perl -pF/\'/ -e '$_ = join "'"'"'", map { $_ % 2 ? $F[$_] : lc $F[$_] } (0 .. $#F);' YOUR_FILE
(The single-quoted single quote needs to become five characters: '"'"')
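As a small illustration of that escape on its own, a command like this just prints a word containing a single quote:
perl -e 'print "it'"'"'s\n"'
bash stitches the five characters '"'"' back into a single literal quote inside the -e code, so this prints it's.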
If it doesn't help with the job, it at least helps you sleep.
One more variant with a Perl one-liner. I'm using hex \x27 for the single quotes so they don't have to be escaped for the shell.
$ cat sql_str.txt
and (t.TARGET_TYPE='RAC_DATABASE' or (t.TARGET_TYPE='ORACLE_DATABASE' and t.TYPE_QUALIFIER3 != 'racinst'))
$ perl -ne ' { @F=split(/\x27/); for my $val (0..$#F) { $F[$val]=lc($F[$val]) if $val%2==0 } ; print join("\x27",@F) } ' sql_str.txt
and (t.target_type='RAC_DATABASE' or (t.target_type='ORACLE_DATABASE' and t.type_qualifier3 != 'racinst'))
$

Find matches in a log file based on the time and ID

I have a radius log file which is comma separated.
"1/3/2013","00:52:23","NASK","Stop","15444111111","200","15444111111","15444111111","10.142.98.190","moen",,,,,"D89BA1F93E5DC400",,,"31026","216.155.166.8","310260010265999",,"10.184.81.145","780246","18","ATGGSN17","2","7",,,"1385772885",,
"1/3/2013","00:52:23","NASK","Start","15444111111","200","15444111111","15444111111","10.142.98.190","moen",,,,,"D89BA1F93E5DC500",,,"31026","216.155.166.8","310260010265999",,"10.184.81.145","780246","18","A","2","7",,,"1385772885",,
Is it possible through any Linux command line tool like awk to count the number of occurrences where the second column (the time) and the seventh column (the number) are the same, and a Start event follows a Stop event?
I want to find the occurrences where a Stop is followed by a Start at the same time for the same number.
There will be other entries as well with the same timestamp between these cases.
You don't say very clearly what kind of result you want, but you should use Perl with Text::CSV to process CSV files.
This program just prints the three relevant fields from all lines of the file where the event is Start or Stop and the time and the ID string are duplicated.
use strict;
use warnings;

use Text::CSV;

my $csv = Text::CSV->new;
open my $fh, '<', 'text.csv' or die $!;

my %data;
while (my $row = $csv->getline($fh)) {
    my ($time, $event, $id) = @$row[1,3,6];
    next unless $event eq 'Start' or $event eq 'Stop';
    push @{ $data{"$time/$id"} }, $row;
}

for my $lines (values %data) {
    next unless @$lines > 1;
    print "@{$_}[1,3,6]\n" for @$lines;
    print "\n";
}
output
00:52:23 Stop 15444111111
00:52:23 Start 15444111111
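If you also want to enforce that the Stop actually appears before the Start within each time/ID group, as the question asks, one possible tweak to the final loop above (an untested sketch) would be:
for my $lines (values %data) {
    next unless @$lines > 1;
    # keep only groups where a Stop line is later followed by a Start line
    next unless join(' ', map { $_->[3] } @$lines) =~ /Stop.*Start/;
    print "@{$_}[1,3,6]\n" for @$lines;
    print "\n";
}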
I have tried the following using GNU sed & awk
sed -n '/Stop/,/Start/{/Stop/{h};/Start/{H;x;p}}' text.csv \
| awk -F, 'NR%2 != 0 {prev=$0;time=$2;num=$7} \
NR%2 == 0 {if($2==time && $7==num){print prev,"\n", $0}}'
The sed part selects pairs of a Stop line and the following Start line. There may or may not be other lines between the two, and if there are multiple Stop lines before a Start line, only the last Stop line is selected (which may not be necessary in this case).
The awk part then compares each pair selected by sed: if the second and seventh columns are identical, the pair is printed out.
My test is as below:
text.csv:
"1/3/2013","00:52:20","NASK","Stop","15444111111","200","15444111111","15444111111","10.142.98.190","moen",,,,,"D89BA1F93E5DC400",,,"31026","216.155.166.8","310260010265999",,"10.184.81.145","780246","18","ATGGSN17","2","7",,,"1385772885",,
"1/3/2013","00:52:23","NASK","XXXX","15444111111","200","15444111111","15444111111","10.142.98.190","moen",,,,,"D89BA1F93E5DC400",,,"31026","216.155.166.8","310260010265999",,"10.184.81.145","780246","18","ATGGSN17","2","7",,,"1385772885",,
"1/3/2013","00:52:23","NASK","Stop","15444111111","200","15444111111","15444111111","10.142.98.190","moen",,,,,"D89BA1F93E5DC400",,,"31026","216.155.166.8","310260010265999",,"10.184.81.145","780246","18","ATGGSN17","2","7",,,"1385772885",,
"1/3/2013","00:52:23","NASK","XXXX","15444111111","200","15444111111","15444111111","10.142.98.190","moen",,,,,"D89BA1F93E5DC400",,,"31026","216.155.166.8","310260010265999",,"10.184.81.145","780246","18","ATGGSN17","2","7",,,"1385772885",,
"1/3/2013","00:52:23","NASK","Start","15444111111","200","15444111111","15444111111","10.142.98.190","moen",,,,,"D89BA1F93E5DC500",,,"31026","216.155.166.8","310260010265999",,"10.184.81.145","780246","18","A","2","7",,,"1385772885",,
"1/3/2013","00:52:28","NASK","Stop","15444111111","200","15444111111","15444111111","10.142.98.190","moen",,,,,"D89BA1F93E5DC400",,,"31026","216.155.166.8","310260010265999",,"10.184.81.145","780246","18","ATGGSN17","2","7",,,"1385772885",,
"1/3/2013","00:52:29","NASK","Start","15444111111","200","15444111111","15444111111","10.142.98.190","moen",,,,,"D89BA1F93E5DC500",,,"31026","216.155.166.8","310260010265999",,"10.184.81.145","780246","18","A","2","7",,,"1385772885",,
The output:
"1/3/2013","00:52:23","NASK","Stop","15444111111","200","15444111111","15444111111","10.142.98.190","moen",,,,,"D89BA1F93E5DC400",,,"31026","216.155.166.8","310260010265999",,"10.184.81.145","780246","18","ATGGSN17","2","7",,,"1385772885",,
"1/3/2013","00:52:23","NASK","Start","15444111111","200","15444111111","15444111111","10.142.98.190","moen",,,,,"D89BA1F93E5DC500",,,"31026","216.155.166.8","310260010265999",,"10.184.81.145","780246","18","A","2","7",,,"1385772885",,
If the "stop" line is followed immediately by the "start" line, you could try the following:
awk -f cnt.awk input.txt
where cnt.awk is
BEGIN {
    FS=","
}
$4=="\"Stop\"" {
    key=($2 $5)
    startl=$0
    getline
    if ($4=="\"Start\"") {
        if (key==($2 $5)) {
            print startl
            print $0
        }
    }
}
Update
If there can be other lines between a "Stop" and a "Start" line, you could try:
BEGIN {
    FS=","
}
$4=="\"Stop\"" {
    a[($2 $5)]=$0
    next
}
$4=="\"Start\"" {
    key=($2 $5)
    if (key in a) {
        sl[++i]=a[key]
        el[i]=$0
    }
}
END {
    nn=i
    for (i=1; i<=nn; i++) {
        print sl[i]
        print el[i]
    }
}
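The updated script is run exactly the same way as before, awk -f cnt.awk input.txt.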

Replace characters in certain positions of lines with whitespace

I need to be able to replace character positions 58-71 with whitespace on every line in a file, on Unix / Solaris.
Extract example:
LOCAX0791LOCPIKAX0791LOC AX0791LOC095200130008PIKAX079100000000000000WL1G011 000092000000000000
LOCAX0811LOCPIKAX0811LOC AX0811LOC094700450006PIKAX0811000000000000006C1G011 000294000000000000
LOCAX0831LOCPIKAX0831LOC AX0831LOC094000180006PIKAX083100000000000000OJ1G011 000171000000000000
One way with sed:
sed -r 's/^(.{57})(.{14})/\1              /' bar.txt
With apologies for the horrible 14-space string.
Simple Perl one-liner (substr offsets are 0-based, so columns 58-71 start at offset 57):
perl -pe 'substr($_, 57, 14) = (" " x 14);' inputfile.txt > outputfile.txt
try this:
awk 'BEGIN{FS=OFS=""} {for(i=57;i<=71;i++)$i=" "}1' file
output for your first line:
LOCAX0791LOCPIKAX0791LOC AX0791LOC095200130008PIKAX079 WL1G011
Try this in Perl:
use strict;
use warnings;
while (<STDIN>) {
    my @input = split(//, $_);
    for (my $i = 57; $i < 71; $i++) {
        $input[$i] = " ";
    }
    $_ = join('', @input);
    print $_;
}
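Assuming the script above is saved as, say, blank_cols.pl (a name chosen here purely for illustration), it reads from standard input, so it can be run the same way as the one-liner:
perl blank_cols.pl < inputfile.txt > outputfile.txt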
If you have gawk on your Solaris box, you could try:
gawk 'BEGIN{FIELDWIDTHS = "57 14 1000"} gsub(/./," ",$2)' OFS= file

How to compress 4 consecutive blank lines into one single line in Perl

I'm writing a Perl script that reads a log and rewrites it to a new log, removing blank lines wherever four or more of them appear in a row. In other words, I have to compress any run of 4 or more consecutive blank lines into a single line, but runs of 1, 2 or 3 blank lines must keep their original format. I have tried to find a solution online, but all I can find is
perl -00 -pe ''
or
perl -00pe0
Also, I have seen a vim example that deletes blocks of 4 empty lines, :%s/^\n\{4}//, which matches what I'm looking for, but it is vim, not Perl. Can anyone help with this? Thanks.
To collapse 4+ consecutive Unix-style EOLs to a single newline:
$ perl -0777 -pi.bak -e 's|\n{4,}|\n|g' file.txt
An alternative flavor using look-behind:
$ perl -0777 -pi.bak -e 's|(?<=\n)\n{3,}||g' file.txt
use strict;
use warnings;

my $cnt = 0;

sub flush_ws {
    $cnt = 1 if ($cnt >= 4);
    while ($cnt > 0) { print "\n"; $cnt--; }
}

while (<>) {
    if (/^$/) {
        $cnt++;
    } else {
        flush_ws();
        print $_;
    }
}
flush_ws();
Your -0 hint is a good one, since you can use -0777 to slurp the whole file in -p mode. Read more about these switches in perlrun. So this one-liner should do the trick:
$ perl -0777 -pe 's/\n{5,}/\n\n/g'
If there are up to four newlines in a row, nothing happens. Five newlines or more (four or more empty lines) are replaced by two newlines (one empty line). Note the /g modifier here, which replaces every match rather than only the first.
Deparsed code:
BEGIN { $/ = undef; $\ = undef; }
LINE: while (defined($_ = <ARGV>)) {
    s/\n{5,}/\n\n/g;
}
continue {
    die "-p destination: $!\n" unless print $_;
}
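As a quick self-check of the five-newline boundary (a throwaway test, not part of the deparsed code), something like this can be used:
my $text = "a\n\n\n\nb" . "\n" x 6 . "c\n";   # four newlines, then a run of six
(my $squeezed = $text) =~ s/\n{5,}/\n\n/g;
print $squeezed;   # the four newlines before "b" survive; the six after it collapse to two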
HTH! :)
One way using GNU awk, setting the record separator to NUL:
awk 'BEGIN { RS="\0" } { gsub(/\n{5,}/,"\n")}1' file.txt
This assumes that your definition of empty excludes whitespace.
This will do what you need
perl -ne 'if (/\S/) {$n = 1 if $n >= 4; print "\n" x $n, $_; $n = 0} else {$n++}' myfile
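Spelled out as a plain script (just an expanded sketch of the one-liner above, with the same behaviour):
my $n = 0;                  # blank lines seen but not yet printed
while (<>) {
    if (/\S/) {             # a non-blank line
        $n = 1 if $n >= 4;  # collapse a run of 4 or more blank lines to one
        print "\n" x $n, $_;
        $n = 0;
    } else {
        $n++;               # a blank line: just count it for now
    }
}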

reformat text in perl

I have a file of 1000 lines, each line in the format
filename dd/mm/yyyy hh:mm:ss
I want to convert it to read
filename mmddhhmm.ss
I've been attempting to do this in perl and awk with no success, and would appreciate any help.
Thanks
You can do a simple regular expression replacement if the format is really fixed:
s|(..)/(..)/.... (..):(..):(..)$|$2$1$3$4.$5|
I used | as a separator so that I do not need to escape the slashes.
You can use this with Perl on the command line to edit the file in place:
perl -pi -e 's|(..)/(..)/.... (..):(..):(..)$|$2$1$3$4.$5|' file
(Look up the option descriptions with man perlrun).
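For example, applying it to a line in the question's format (this sample line is borrowed from the split-based answer further down):
$ echo 'filename 26/12/2010 21:09:12' | perl -pe 's|(..)/(..)/.... (..):(..):(..)$|$2$1$3$4.$5|'
filename 12262109.12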
Another somewhat ugly approach: for each line ($str here) you get from the file, do something like this:
my $str = 'filename 26/12/2010 21:09:12';
my @arr1 = split(' ', $str);
my @arr2 = split('/', $arr1[1]);
my @arr3 = split(':', $arr1[2]);
my $day = $arr2[0];
my $month = $arr2[1];
my $year = $arr2[2];
my $hours = $arr3[0];
my $minutes = $arr3[1];
my $seconds = $arr3[2];
print $arr1[0].' '.$month.$day.$hours.$minutes.'.'.$seconds;
Pipe your file to a perl script with:
while ( my $line = <> ) {
    if ( $line =~ /(\S+)\s+(\d{2})\/(\d{2})\/\d{4}\s+(\d{2}):(\d{2}):(\d{2})/ ) {
        print $1 . " " . $3 . $2 . $4 . $5 . '.' . $6 . "\n";
    }
}
Redirect the output however you want.
This says: match the line to
(non-whitespace>=1) whitespace>=1 (2digits)/(2digits)/4digits
whitespace>=1 (2digits):(2digits):(2digits)
Capture groups are in () numbered 1 to 6 left to right.
Using sed:
sed -r 's|([0-9]{2})/([0-9]{2})/[0-9]{4} |\2\1|; s/://; s/:/./' file.txt
swap the day and month, dropping the slashes, the year and the space after it
delete the first colon
change the remaining colon to a dot
Using awk:
awk '{split($2,d,"/"); split($3,t,":"); print $1, d[2] d[1] t[1] t[2] "." t[3]}'