Unix/Perl - Remove contents of a file before a pattern

I have a file like this
### SECTION 1 ###
data data
data data
### SECTION 2 ###
data data
data data
Now I want everything before SECTION 2 to be removed.
How can I do this in Perl or Unix?

To edit the file in-place:
perl -i -ne 'print if /SECTION 2/..0' file

perl -ne '$m = 1 if $_ =~ /SECTION 2/ ; next unless $m ; print $_;' filename > newfilename

$ perl -pi -e '$_ = "" unless /SECTION 2/ .. /(*FAIL)/' file

Related

awk output to variable and change directory

In the script below I am not able to change the directory. For every filesystem above 70% disk usage, I need to find which directories inside it are consuming the most space.
#!/usr/bin/perl
use strict;
use warnings;
my $test=qx("df -h |awk \+\$5>=70 {print \$6} ");
chdir($test) or die "$!";
print $test;
system("du -sh * | grep 'G'");
No need to call awk in your case because Perl is quite good at splitting and printing certain lines itself. Your code has some issues:
The code qx("df -h |awk \+\$5>=70 {print \$6} ") tries to execute the string "df -h | awk ..." as a command which fails because there is no such command called "df -h | awk". When I run that code I get sh: 1: df -h |awk +>=70 {print } : not found. You can fix that by dropping the quotes " because qx() already is quoting. The variable $test is empty afterwards, so the chdir changes to your $HOME directory.
Then you'll see the next error: awk: line 1: syntax error at or near end of line, because it calls awk +\$5>=70 {print \$6}. Correct would be awk '+\$5>=70 {print \$6}', i.e. with ticks ' around the awk scriptlet.
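For completeness, a corrected version of the original qx call that keeps the awk step (a sketch; $dir is a made-up variable name, and it assumes exactly one filesystem is over the threshold, since chdir can only take a single directory):
my $dir = qx(df -hP | awk '+\$5 >= 70 {print \$6}');
chomp $dir;   # strip the trailing newline before chdir
chdir($dir) or die "cannot chdir to '$dir': $!";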
As stated in a comment, df -h splits long lines into two lines. Example:
Filesystem 1K-blocks Used Available Use% Mounted on
/long/and/possibly/remote/file/system
10735331328 10597534720 137796608 99% /local/directory
Use df -hP to get guaranteed column order and one line output.
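With -P the same example entry stays on one line (the exact header wording varies a little between platforms):
Filesystem                            1K-blocks        Used Available Use% Mounted on
/long/and/possibly/remote/file/system 10735331328 10597534720 137796608  99% /local/directory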
The last system call shows the directory usage (space) for all lines containing the letter G. I reckon that's not exactly what you want.
I suggest the following Perl script:
#!/usr/bin/env perl
use strict;
use warnings;

foreach my $line ( qx(df -hP) ) {
    my ($fs, $size, $used, $avail, $use, $target) = split(/\s+/, $line);
    next unless ($use =~ /^\d+\s*\%$/);  # skip header line
    # now $use is e.g. '90%' and we drop the '%' sign:
    $use =~ s/\%$//;
    if ($use > 70) {
        print "almost full: $target; top 5 directories:\n";
        # no need to chdir here. Simply use $target/* as search pattern,
        # reverse-sort by "human readable" numbers, and show the top 5:
        system("du -hs $target/* 2>/dev/null | sort -hr | head -5");
        print "\n\n";
    }
}
#!/usr/bin/perl
use strict;
use warnings;

my @bigd = map  { my @f = split " "; $f[5] }
           grep { my @f = split " "; $f[4] =~ /^(\d+)/ && $1 >= 70 }
           split "\n", `df -hP`;

print "big directories: $_\n" for @bigd;

for my $bigd (@bigd) {
    chdir($bigd) or die "cannot chdir to $bigd: $!";
    my @bigsubd = grep { my @f = split " "; $f[0] =~ /G/ }
                  split "\n", `du -sh *`;
    print "big subdirectories in $bigd:\n";
    print "$_\n" for @bigsubd;
}
I believe you wanted to do something like this.

Performing a one-liner on multiple input files specified by extension

I'm using the following line to split and process a tab-delimited .txt file:
perl -lane 'next unless $. >30; @array = split /[:,\/]+/, $F[2]; print if $array[1]/$array[2] >0.5 && $array[4] >2' input.txt > output.txt
Is there a way to alter this one-liner in order to perform this on multiple input files without specifying each individually?
Ideally this would be accomplished by performing it on all files within the current directory holding the .txt (or other) file extension - and then outputting a set of modified files names e.g.:
Input:
test1.txt
test2.txt
Output:
test1MOD.txt
test2MOD.txt
I know that I can access the filename to modify it with $ARGV but I do not know how to go about getting it to run on multiple files.
Solution:
perl -i.MOD -lane 'next unless $. >30; @array = split /[:,\/]+/, $F[2]; print if $array[1]/$array[2] >0.5 && $array[4] >2; close ARGV if eof;' *.txt
$. needs to be reset for each file; otherwise the first 30 (header) lines of the later files are not skipped and the division throws a division-by-zero error.
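A quick way to see the effect (a sketch; a.txt and b.txt are hypothetical small files): with the close, $. restarts at 1 for each file, while without it $. keeps counting across files, so the '$. > 30' guard would only skip the first file's header.
perl -ne 'print "$ARGV line $.: $_"; close ARGV if eof' a.txt b.txt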
If you don't mind slightly different output file name,
perl -i.MOD -lane'
next unless $. >30;
@array = split /[:,\/]+/, $F[2];
print if $array[1]/$array[2] >0.5 && $array[4] >2;
close ARGV if eof; # Reset $. for each file.
' *.txt
Have you considered calling the perl script from a shell for loop?
for TXT in *.txt; do
    OUT=$(basename "$TXT" .txt)MOD.txt
    perl ... "$TXT" > "$OUT"
done

How to add blank line after every grep result using Perl?

How to add a blank line after every grep result?
For example, grep -o "xyz" may give something like -
file1:xyz
file2:xyz
file2:xyz2
file3:xyz
I want the output to be like this -
file1:xyz

file2:xyz
file2:xyz2

file3:xyz
I would like to do something like
grep "xyz" | perl (code to add a new line after every grep result)
This is the direct answer to your question:
grep 'xyz' | perl -pe 's/$/\n/'
But this is better:
perl -ne 'print "$_\n" if /xyz/'
EDIT
Ok, after your edit, you want (almost) this:
grep 'xyz' * | perl -pe 'print "\n" if /^([^:]+):/ && ! $seen{$1}++'
If you don’t like the blank line at the beginning, make it:
grep 'xyz' * | perl -pe 'print "\n" if /^([^:]+):/ && ! $seen{$1}++ && $. > 1'
NOTE: This won’t work right on filenames with colons in them. :)
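If the grep step is not strictly required, a single Perl pass can do both the matching and the per-file grouping (a sketch; xyz and *.txt stand in for the real pattern and file list):
perl -ne 'if (/xyz/) { print "\n" if defined $prev && $ARGV ne $prev; print "$ARGV:$_"; $prev = $ARGV }' *.txt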
If you want to use perl, you could do something like
grep "xyz" | perl -p -e 's/(.*)/\1\n/g'
If you want to use sed (where I seem to have gotten better results), you could do something like
grep "xyz" | sed 's/.*/\0\n/g'
This prints a blank line after every single line of grep output:
grep "xyz" | perl -pe '$_ .= "\n"'
This prints a newline in between results from different files. (Answering the question as I read it.)
grep 'xyz' * | perl -pe '/(.*?):/; if ($f ne $1) {print "\n"; $f=$1}'
Use a state machine to determine when to print a blank line:
#!/usr/bin/env perl
use strict;
use warnings;
# state variable to determine when to print a blank line
my $prev_file = '';
# change DATA to the appropriate input file handle
while ( my $line = <DATA> ) {
    # did the state change?
    if ( my ( $file ) = $line =~ m{ \A ([^:]*) \: .*? xyz }msx ) {
        # blank lines between states
        print "\n" if $file ne $prev_file && length $prev_file;
        # set the new state
        $prev_file = $file;
    }
    # print every line
    print $line;
}
__DATA__
file1:xyz
file2:xyz
file2:xyz2
file3:xyz
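Run against the __DATA__ block above, the script should print:
file1:xyz

file2:xyz
file2:xyz2

file3:xyz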

s3cmd list of contents - only filenames - perl one liner?

Currently I'm using s3cmd ls s3://location/ > file.txt to get a list of the contents of my s3 bucket and save it to a txt file. However, the above returns dates, file sizes, paths and filenames.
for example:
2011-10-18 08:52 6148 s3://location//picture_1.jpg
I only need the filenames from the s3 bucket - so in the above example I only need picture_1.jpg.
Any suggestions?
Could this be done with a Perl one liner maybe after the initial export?
Use awk:
s3cmd ls s3://location/ | awk '{ print $4 }' > file.txt
If you have filenames with spaces, try:
s3cmd ls s3://location/ | awk '{ s = ""; for (i = 4; i <= NF; i++) s = s $i " "; print s }' > file.txt
File::Listing does not support this format because the designers of this listing format were stupid enough to not simply reuse an existing one. Let's parse it manually instead.
use URI;

my @ls = (
    "2011-10-18 08:52 6148 s3://location//picture_1.jpg\n",
    "2011-10-18 08:52 6148 s3://location//picture_2.jpg\n",
    "2011-10-18 08:52 6148 s3://location//picture_3.jpg\n",
);

for my $line (@ls) {
    chomp $line;
    my $basename = (URI->new((split q( ), $line)[-1])->path_segments)[-1];
    print "$basename\n";
}

__END__
picture_1.jpg
picture_2.jpg
picture_3.jpg
As a one-liner:
perl -mURI -lne 'print ((URI->new((split q( ), $_)[-1])->path_segments)[-1])' < input
I am sure a specific module is the safer option, but if the data is reliable, you can get away with a one-liner:
Assuming the input is:
2011-10-18 08:52 6148 s3://location//picture_1.jpg
2011-10-18 08:52 6148 s3://location//picture_2.jpg
2011-10-18 08:52 6148 s3://location//picture_3.jpg
...
The one-liner:
perl -lnwe 'print for m#(?<=//)([^/]+)$#'
-l chomps the input, and adds newline to end of print statements
-n adds a while(<>) loop around the script
(?<=//) lookbehind assertion finds a double slash
...followed by non-slashes to the end of the line
The for loop assures us that non-matches are not printed.
The benefit of the -n option is that this one-liner may be used in a pipe, or on a file.
command | perl -lnwe '...'
perl -lnwe '...' filename

awk or perl one-liner to print line if second field is longer than 7 chars

I have a file of 1000 lines, each line has 2 words, separated by a space. How can I print each line only if the last word length is greater than 7 chars? Can I use awk RLENGTH? is there an easy way in perl?
@OP, awk's RLENGTH is only set when you call the match() function. Instead, use the length() function to check the length of the field:
awk 'length($2)>7' file
If you are using bash, a shell solution:
while read -r a b
do
    if [ "${#b}" -gt 7 ]; then
        echo "$a $b"
    fi
done < "file"
perl -ane 'print if length($F[1]) > 7'
You can do:
perl -ne '@a=split/\s+/; print if length($a[1]) > 7' input_file.txt
Options used:
-n assume 'while (<>) { ... }' loop around program
-e 'command' one line of program (several -e's allowed, omit programfile)
You can use the auto-split option as used by Chris
-a autosplit mode with -n or -p (splits $_ into @F)
perl -ane 'length $F[1] > 7 && print' <input_file>
perl -lane 'print if (length($F[$#F]) > 7)' fileName
or
perl -pae '$_ = "" if (length($F[$#F]) <= 7)' fileName