Copy everything before the first blank line - Perl

I have a file with several blocks of text separated by blank lines. Example:
block1
block1

block2

block3
block3
I need a solution in sed, awk, or Perl that locates each blank line and redirects the preceding block to a separate file, and so on until the end of the file.
I have this command in sed that locates the first block, but not the rest:
sed -e '/./!Q'
Can someone help me?

Give this line a try:
awk -v RS="" '{print > ("file" ++c ".txt")}' input
RS="" makes awk read blank-line-separated records, so this generates file1.txt through fileN.txt.
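For comparison, Perl has the same paragraph mode behind its -00 switch; a rough one-liner equivalent (a sketch only; note each record keeps its trailing blank line):
perl -00 -ne 'open my $o, ">", "file".++$c.".txt" or die $!; print $o $_' input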

Here's an awk:
$ awk 'BEGIN{file="file"++cont}/^$/{file="file"++cont;next}{print>file}' infile
Results
$ cat file1
block1
block1
$ cat file2
block2
$ cat file3
block3
block3

To take several consecutive empty lines between blocks into account:
awk '/./{if(!L)++C;print>("Out" C ".txt")}{L=$0!~/^$/}' YourFile
sed is not an option here: it cannot write to an arbitrary (and unknown in advance) number of different output files.
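The same tolerance for runs of blank lines comes for free in Perl's paragraph mode: with $/ set to the empty string, consecutive blank lines count as a single separator. A minimal sketch mirroring the Out1.txt naming (the script name is arbitrary; run as perl split_blocks.pl YourFile):
#!/usr/bin/perl
use strict;
use warnings;

local $/ = "";    # paragraph mode: a block ends at one or more blank lines
my $count = 0;
while ( my $block = <> ) {
    chomp $block;                      # strip the trailing newlines
    open my $out, '>', 'Out' . ++$count . '.txt' or die $!;
    print {$out} $block, "\n";
    close $out;
}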

Here's a solution in Perl:
open( my $fh, '<', '/tmp/a.txt' ) or die $!;
{
    ## record delimiter: blocks end at a blank line
    local $/ = "\n\n";
    my $count = 1;
    while ( my $block = <$fh> ) {
        chomp $block;    # strip the trailing separator, if any
        open( my $ofh, '>', sprintf( '/tmp/file%d', $count++ ) ) or die $!;
        print {$ofh} $block;
        close($ofh);
    }
}
close($fh);
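One caveat: $/ = "\n\n" is not the same as Perl's paragraph mode. With several consecutive blank lines, the fixed two-newline separator yields blank records, while $/ = '' collapses them. A small self-contained sketch of the difference, reading from an in-memory string:
use strict;
use warnings;

my $text = "a\n\n\n\nb\n";    # two blocks separated by three blank lines

for my $sep ( "\n\n", "" ) {
    open my $fh, '<', \$text or die $!;    # filehandle over an in-memory scalar
    local $/ = $sep;
    my @blocks = <$fh>;
    close $fh;
    printf "separator %-20s -> %d record(s)\n",
        $sep eq "" ? "'' (paragraph mode)" : '"\n\n"', scalar @blocks;
}
This prints 3 records for "\n\n" (one of them blank) but 2 for paragraph mode.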

Here's my solution in Perl:
#!/usr/bin/perl
use strict;
use warnings;
my $n = 0;
my $block = '';

while (<DATA>) {       # each line is stored in $_
    if (/^\s*$/) {     # blank line: the current block is complete
        write_to_file( 'file' . ++$n, $block );
        $block = '';
    } else {
        $block .= $_;
    }
}

# Write any remaining lines
write_to_file( 'file' . ++$n, $block );

sub write_to_file {
    my $file = shift;
    my $data = shift;
    open my $fh, '>', $file or die $!;
    print $fh $data;
    close $fh;
}
__DATA__
block1
block1

block2

block3
block3
Output:
$ grep . file*
file1:block1
file1:block1
file2:block2
file3:block3
file3:block3

Another way to do it in Perl:
#!/usr/bin/perl
use strict;
use warnings;
# store the whole file in $data
my $data = do { local $/; <DATA> };
my @blocks = split /\n\n/, $data;

my $n = 0;
write_to_file( 'file' . ++$n, $_ ) for @blocks;

sub write_to_file {
    my $file = shift;
    my $data = shift;
    open my $fh, '>', $file or die $!;
    print $fh $data;
    close $fh;
}

__DATA__
block1
block1

block2

block3
block3

This might work for you (GNU csplit & sed):
csplit -qf uniqueFileName file '/^$/' '{*}' && sed -i '/^$/d' uniqueFileName*
or if you want to go with the defaults:
csplit -q file '/^$/' '{*}' && sed -i '/^$/d' xx*
Use:
tail -n+1 xx* # to check the results

Related

How to awk values of a Perl array?

I have some attributes with values stored in an array, as below. Now I need to perform some checks on the attribute values. Please suggest how I can proceed in Perl.
@arr1 = `cat passwd.txt | tr ' ' '\n' | egrep -i "maxage|minage" | sort`;
Array @arr1 contains info such as "maxage=0 minage=0".
I need an if condition on the value of maxage. Is there any way to do it like below? Please advise, as I am new to Perl.
if ( @arr1[0] | awk -F= '{print $2}' == 0 )
{
    printf "Then print task done";
}
You can do the whole process in Perl. For example:
use feature qw(say);
use strict;
use warnings;

my $fn = 'passwd.txt';
open ( my $fh, '<', $fn ) or die "Could not open file '$fn': $!";
my @arr1 = sort grep /maxage|minage/i, split ' ', <$fh>;
close $fh;

if ( (split /=/, shift @arr1)[1] == 0 ) {
    say "done";
}
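Assuming passwd.txt holds the attributes on one line (e.g. "maxage=0 minage=0", as in the question), a hash-based variant avoids depending on sort order; a sketch:
use strict;
use warnings;
use feature qw(say);

my $fn = 'passwd.txt';
open my $fh, '<', $fn or die "Could not open file '$fn': $!";

# build { maxage => 0, minage => 0 } from the first line
my %attr = map { split /=/, lc } grep { /^(?:maxage|minage)=/i } split ' ', <$fh>;
close $fh;

say "done" if $attr{maxage} == 0;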
Can you try this?
my @arr1 = `cat passwd.txt | tr ' ' '\n' | egrep -i "maxage|minage" | sort`;
chomp( my $x = `echo "$arr1[0]" | awk -F= '{print \$2}'` );   # escape \$2 so Perl doesn't interpolate it; adjust the field number if needed
if ( $x == 0 )
{
    print "Then print task done";
}

Perl - Open file - Last line is missing if newline is not present after the last line

Hello, can somebody please explain why I get different output from the following two scripts?
01.pl
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
open FDGROUP, "< file" or die "Can't open file: $!\n";
my @file = <FDGROUP>;
close FDGROUP;

@file = grep {/\S/} @file;
@file = grep {s/\r//} @file;
@file = grep {s/\n//} @file;

print Dumper @file;
02.pl
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
open FDGROUP, "< file" or die "Can't open file: $!\n";
my @file = <FDGROUP>;
close FDGROUP;

@file = grep {/\S/} @file;

my $j = 0;
foreach (@file){
    $_ =~ s/\r//;
    $_ =~ s/\n//;
    $file[$j++] = $_;
}

print Dumper @file;
Output:
wakatana@azureus ~/scripts/stackoverflow
$ perl 01.pl
$VAR1 = '1';
$VAR2 = '2';
$VAR3 = '3';
$VAR4 = '4';
$VAR5 = '5';
$VAR6 = '6';
wakatana@azureus ~/scripts/stackoverflow
$ perl 02.pl
$VAR1 = '1';
$VAR2 = '2';
$VAR3 = '3';
$VAR4 = '4';
$VAR5 = '5';
$VAR6 = '6';
$VAR7 = '7';
wakatana@azureus ~/scripts/stackoverflow
$ od -ab file
0000000 1 cr nl 2 cr nl 3 cr nl 4 cr nl 5 cr nl 6
061 015 012 062 015 012 063 015 012 064 015 012 065 015 012 066
0000020 cr nl 7
015 012 067
0000023
wakatana@azureus ~/scripts/stackoverflow
$ perl -e 'print $/' | od -ab
0000000 nl
012
0000001
When I add another newline after the last line of the file, both scripts give the same result (7 variables). I know that chomp is meant for such operations, but when I used the following script:
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
open FDGROUP, "< file" or die "Can't open file: $!\n";
my @file = <FDGROUP>;
close FDGROUP;

@file = grep {/\S/} @file;
chomp @file;

print Dumper @file;
I get the following output:
wakatana#azureus ~/scripts/stackoverflow
$ perl 03.pl
';AR1 = '1
';AR2 = '2
';AR3 = '3
';AR4 = '4
';AR5 = '5
';AR6 = '6
';AR7 = '7
Probably this is caused by the CR whitespace character or something related to it.
All this is done under Cygwin.
Thanks
With these statements:
@file = grep {/\S/} @file; # keeps only elements that contain a non-whitespace character
@file = grep {s/\r//} @file; # keeps only elements that contained a \r, stripping the \r from them
@file = grep {s/\n//} @file; # keeps only elements that contained a \n, stripping the \n from them
you build a new array each time. The new array consists of only those elements of grep's input for which the given { block } returns true. Since the last line is missing its \n, the s/\n// substitution returns 0 for it, and that line is left out.
grep only keeps elements for which the expression is true. The last line has no \n, so the substitution returns false and grep drops it.
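To make the difference concrete, here is a minimal sketch (the three-element lists are hypothetical) comparing grep, which drops elements when the block returns false, with map, which keeps every element:
use strict;
use warnings;

# the last "line" is missing its newline, as in the question
my @a = ( "1\n", "2\n", "3" );
my @b = ( "1\n", "2\n", "3" );

# grep keeps an element only if the block is true: s/\n// returns
# the number of substitutions made, which is 0 for "3"
my @kept = grep { s/\n// } @a;     # ("1", "2")

# map keeps every element; return $_ after the substitution
my @all = map { s/\n//; $_ } @b;   # ("1", "2", "3")

print "grep kept ", scalar(@kept), ", map kept ", scalar(@all), "\n";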
Unlike my other answer https://stackoverflow.com/a/24890193/3755747, this is not technically an answer to what you're asking ... but your code is quite old-style Perl, so here are a few more modern alternatives.
Fully written out, basic Perl:
use strict;
use warnings;
use Data::Printer; # I prefer this over Data::Dumper
open( my $fh, '<', 'file' ) or die "can't open 'file': $!";

my @lines;
while ( my $line = <$fh> ) {
    $line =~ s/^(.*?)\r?\n?$/$1/;    # strip optional CR and LF
    next if $line eq '';
    push @lines, $line;
}

close $fh or die "can't close 'file': $!";
p( @lines );
A very compact version, but with explanation:
use strict;
use warnings;
use File::Slurp;    # provides read_file
use Data::Printer;

my @lines = grep {
    s/
        ^        # start of string
        (.*?)    # non-greedy capture; without the ? it would consume the \r and \n as well
        \r? \n?  # optional CR, optional LF
        $        # end of string
    /$1/x        # replace with the capture; /x allows whitespace in the regex
    && length    # and the string must have some length remaining
} read_file( 'file' );

p( @lines );
Different way, using split:
use Modern::Perl '2012';
use File::Slurp;
use Data::Printer;
# parentheses around split's arguments added for clarity; they're not required
my @lines = grep { length } split( /\r?\n/, read_file 'file' );
p( @lines );
Slurping is perfectly possible without modules as well:
use Modern::Perl;
use Data::Printer;
open( my $fh, '<', 'file' ) or die "can't open 'file': $!";
my @lines = grep { s/^(.*?)\r?\n?$/$1/ && length } <$fh>;
close $fh or die "can't close 'file': $!";
p( @lines );
I think I prefer the split version.

Remove first line of a file using perl -pe

I'm trying to remove the first line of the output file "bss_concurrent_calls.txt" using perl -pe instead of system and sed -i. The server I'm using runs Solaris 9 (in fact, it didn't recognize sed -i).
open my $file_in, "<", "/export/home/cassi/4.1-15_HPBX/cfg/LicenseCounters.log" or die($!);
open my $file_out, '>', 'bss_concurrent_calls.txt' or die $!;

while( <$file_in> ) {
    my @columns = split /\s+/, $_;
    print $file_out "$columns[0]\t$columns[2]\n";
}

system('sed "1d" bss_concurrent_calls.txt');

close $file_in;
close $file_out or die $!;
No need to call sed from Perl here (or anywhere else).
perl -ane 'print "$F[0]\t$F[2]\n" unless $. == 1' \
< /export/.../LicenseCounters.log > bss_concurrent_calls.txt
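For reference, -a autosplits each input line into @F and $. holds the current line number, so the one-liner corresponds roughly to this script (a sketch, not an exact deparse):
use strict;
use warnings;

while (<>) {
    my @F = split ' ', $_;                    # what -a does for each line
    print "$F[0]\t$F[2]\n" unless $. == 1;    # $. is the input line number
}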
I like @choroba's answer, but if you want to keep your program structure:
use autodie;

open my $file_in, '<', '/export/home/cassi/4.1-15_HPBX/cfg/LicenseCounters.log';
open my $file_out, '>', 'bss_concurrent_calls.txt';

my $line1 = <$file_in>;    # read and discard the first line
while (<$file_in>) {
    print $file_out join("\t", (split)[0,2]), "\n";
}

close $file_in;
close $file_out;
Or keep your loop and skip the first line with a counter:
my $i = 0;
while( <$file_in> ) {
    next unless $i++;    # skip the first line
    my @columns = split /\s+/, $_;
    print $file_out "$columns[0]\t$columns[2]\n";
}

Error when running a bash command in a Perl script

I wrote a Perl script to automate submitting a lot of jobs to an LSF-based cluster.
The problem is that, for a reason I don't understand, it cuts the line with the job in two, so the job cannot run.
here is my script:
my $controler = $ARGV[0];
open my $cont, $controler or die "Could not open $controler: $!";

my $str = qx(wc -l $controler | awk '{print $1}');
#my $str2 = system($str);
#my @count = split /\n/,$str;
#print $str;

for (my $f = 1; $f <= $str; $f++) {
    #print $count[0];
    `mkdir ~/Multiple_alignments/ELEMENTS-AUTO-$f`;
    #`mkdir ../SCORES-AUTO-$f`;
}

while ( $cont = <>){
    chomp;
    my @lines = split /\t/, $cont;
    my $count2 = $lines[0];
    my $cover = $lines[1];
    my $length = $lines[2];
    my $rho = $lines[3];
    my @files = <*maf>;
    foreach my $file (@files) {
        #print $file . "\n";
        my $base = basename($file, ".maf");
        #print "$base\n";
        #print "$cover\n";
        #print "$length\n";
        #print "$rho\n";
        print "`bsub -q q_cf_htc_work -R \"select[type==X86_64 && mem>6000]\" rusage[mem=6000] -M6000 -o /home/vasilis.lenis/Multiple_alignments/out-files/phastCons_$base.o -e /home/vasilis.lenis/Multiple_alignments/out-files/phastCons_$base.e -J phastCons$base phastCons --target-coverage $cover --expected-length $length --rho $rho --most-conserved ../ELEMENTS-AUTO-$count2/most_conserved_$base.bed --msa-format MAF $file mytrees_no3.noncons.mod --no-post-probs`\n";
    }
}
I just cannot understand why it's happening.
(Also, the awk command on the third line doesn't work.)
Thank you in advance,
Vasilis.
Thank you very much for your guidance.
I believe I solved the problem with the newline: I was using the chomp function incorrectly. It should be:
chomp($cont);
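For the awk issue mentioned in the question: inside qx(...), Perl interpolates $1 before the shell ever runs, so awk receives an empty field and prints the whole wc output. Escaping the sigil fixes it; a minimal sketch (assuming $controler holds the filename, as in the question):
my $str = qx(wc -l $controler | awk '{print \$1}');   # \$1 reaches awk intact
chomp $str;    # drop the trailing newline so the count is a plain number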

Unique elements in a file - Perl

I want to get the unique elements (lines) from a file, which I will then send by email.
I have tried two methods, but neither works:
1st way:
my @array = "/tmp/myfile.$device";
my %seen = ();
my $file = grep { ! $seen{ $_ }++ } @array;
2nd way:
my $filename = "/tmp/myfile.$device";
cat $filename | sort | uniq > $file
How can I do it?
You seem to have forgotten to read the file!
open(my $fh, '<', $file_name)
    or die("Can't open \"$file_name\": $!\n");

my %seen;
my @unique = grep !$seen{$_}++, <$fh>;
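A complete sketch built around that grep, writing the unique lines back out for the later mail step (the .uniq output name and reading $device from the command line are assumptions, not from the question):
use strict;
use warnings;

my $device    = shift @ARGV;               # device name, e.g. from the command line
my $file_name = "/tmp/myfile.$device";

open my $fh, '<', $file_name
    or die "Can't open \"$file_name\": $!\n";

my %seen;
my @unique = grep !$seen{$_}++, <$fh>;     # keep the first occurrence of each line
close $fh;

open my $out, '>', "$file_name.uniq" or die $!;
print {$out} @unique;
close $out;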
You need to open the file and read it. "cat" is a shell command, not Perl. Try something like this:
my $F;
die $! if !open($F, '<', "/tmp/myfile.$device");

my @array = <$F>;
my %seen = ();
my @unique = grep { ! $seen{ $_ }++ } @array;
The die $! stops the program with an error if the file doesn't open correctly.
@array = <$F> reads all the lines from the filehandle $F opened above into the array.
If you rig the argument list, you can make Perl open the file automatically, using:
perl -n -e 'BEGIN{@ARGV=("/tmp/myfile.device");} print if $count{$_}++ == 0;'
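Equivalently, passing the filename as a normal argument avoids the BEGIN block entirely; a small variation on the same idea:
perl -ne 'print unless $count{$_}++' /tmp/myfile.device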