want to read a file line by line and then cut each line on a delimiter - perl

My attempt (which doesn't work):
cat $INPUT_FILE | while read LINE
do
abc=cut -d ',' -f 4 $LINE
done

Perl:
cat $INPUT_FILE | perl -ne 'my @fields = split /,/; print $fields[3];'
(Note that cut's fields are 1-based while Perl arrays are 0-based, so -f 4 corresponds to $fields[3].)

The key is to use command substitution if you want the output of a command saved in a variable. Note also that cut reads from standard input (or from files named as arguments), so the line has to be piped into it rather than passed as an argument.
POSIX shell (sh):
while read -r LINE
do
abc=$(printf '%s\n' "$LINE" | cut -d ',' -f 4)
done < "$INPUT_FILE"
If you're using a legacy Bourne shell, use backticks instead of the preferred $():
abc=`echo "$LINE" | cut -d ',' -f 4`
In some shells, you may not need to use an external utility.
Bash, ksh, zsh:
while read -r LINE
do
IFS=, read -r f1 f2 f3 abc remainder <<< "$LINE"
done < "$INPUT_FILE"
or (bash; ksh93 and zsh spell the array flag -A rather than -a):
while read -r LINE
do
IFS=, read -r -a array <<< "$LINE"
abc=${array[3]}
done < "$INPUT_FILE"
or
saveIFS=$IFS
while read -r LINE
do
IFS=,
array=($LINE)
IFS=$saveIFS
abc=${array[3]}
done < "$INPUT_FILE"
(Note that zsh arrays are 1-indexed by default, so there the fourth field is ${array[4]}.)

Bash:
while read -r line ; do
cut -d, -f4 <<< "$line"
done < "$INPUT_FILE"

Straight Perl:
open(my $input, '<', $INPUT_FILE) or die "Could not open $INPUT_FILE: $!";
while (<$input>) {
    my @fields = split(/,/, $_);
    my $use_this_field_value = $fields[3];
    # do something with the field value here
}
close($input);
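If the input is real CSV with quoted fields or embedded commas, a plain split /,/ will mis-parse it; the CPAN module Text::CSV is the robust route. A minimal sketch (the filename is a placeholder, not from the original answers):
use strict;
use warnings;
use Text::CSV;

my $file = 'input.csv';   # placeholder filename
my $csv  = Text::CSV->new({ binary => 1 }) or die Text::CSV->error_diag;

open my $fh, '<', $file or die "Could not open $file: $!";
while (my $row = $csv->getline($fh)) {
    my $abc = $row->[3];   # fields are 0-indexed, so this is cut's field 4
    # do something with $abc here
}
close $fh;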

Related

awk output to variable and change directory

In the script below, I am not able to change the directory. For each filesystem above 70% disk usage, I want to go inside that directory and see which subdirectory is consuming the most space.
#!/usr/bin/perl
use strict;
use warnings;
my $test=qx("df -h |awk \+\$5>=70 {print \$6} ");
chdir($test) or die "$!";
print $test;
system("du -sh * | grep 'G'");
No need to call awk in your case because Perl is quite good at splitting and printing certain lines itself. Your code has some issues:
The code qx("df -h |awk \+\$5>=70 {print \$6} ") tries to execute the string "df -h | awk ..." as a command, which fails because there is no such command called "df -h | awk". When I run that code I get sh: 1: df -h |awk +>=70 {print } : not found. You can fix that by dropping the quotes ", because qx() is already doing the quoting. The variable $test is empty afterwards, so the chdir changes to your $HOME directory.
Then you'll see the next error: awk: line 1: syntax error at or near end of line, because it calls awk +\$5>=70 {print \$6}. Correct would be awk '+\$5>=70 {print \$6}', i.e. with ticks ' around the awk scriptlet.
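Putting those two fixes together (and already using df -hP, which the next point explains), a minimal corrected capture might look like this - a sketch assuming exactly one filesystem matches:
my $test = qx(df -hP | awk '+\$5>=70 {print \$6}');
chomp $test;   # qx() output keeps its trailing newline, which would break chdir
chdir($test) or die "chdir '$test': $!";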
As stated in a comment, df -h splits long lines into two lines. Example:
Filesystem 1K-blocks Used Available Use% Mounted on
/long/and/possibly/remote/file/system
10735331328 10597534720 137796608 99% /local/directory
Use df -hP to get a guaranteed column order and one-line output.
The last system call shows the directory usage (space) for all lines containing the letter G. I reckon that's not exactly what you want.
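As a quick illustration of the "no need for awk" point above, the filtering part alone can be a Perl one-liner (a sketch, using -a to autosplit the way awk does):
df -hP | perl -ane 'print "$F[5]\n" if $F[4] =~ /^(\d+)/ && $1 >= 70'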
I suggest the following Perl script:
#!/usr/bin/env perl
use strict;
use warnings;
foreach my $line ( qx(df -hP) ) {
    my ($fs, $size, $used, $avail, $use, $target) = split(/\s+/, $line);
    next unless ($use =~ /^\d+\s*\%$/); # skip the header line
    # now $use is e.g. '90%' and we drop the '%' sign:
    $use =~ s/\%$//;
    if ($use > 70) {
        print "almost full: $target; top 5 directories:\n";
        # no need to chdir here. Simply use $target/* as the search pattern,
        # reverse-sort by "human readable" numbers, and show the top 5:
        system("du -hs $target/* 2>/dev/null | sort -hr | head -5");
        print "\n\n";
    }
}
#!/usr/bin/perl
use strict;
use warnings;
my @bigd = map  { my @f = split " "; $f[5] }
           grep { my @f = split " "; $f[4] =~ /^(\d+)/ && $1 >= 70 }
           split "\n", `df -hP`;
print "big directories: $_\n" for @bigd;
for my $bigd (@bigd) {
    chdir($bigd) or next;
    my @bigsubd = grep { my @f = split " "; $f[0] =~ /G/ }
                  split "\n", `du -sh *`;
    print "big subdirectories in $bigd:\n";
    print "$_\n" for @bigsubd;
}
I believe you wanted to do something like this.

running awk command in perl

I have a tab-delimited file (dummy) that looks like this:
a b
a b
a c
a c
a b
I am trying to run an awk command inside the Perl script in which file.txt is created.
The awk command:
$n=system(" awk -F"\t" '{if($1=="a" && $2=="b") print $1,$2}' file.txt|wc -l ")
Error:
comparison operator :error in '==' , ',' between $1 and $2 in print }'
The awk command runs fine on the command line but gives this error when run inside the script.
I don't see any syntax error in the awk command.
Setting aside the question of why you want to execute awk from within Perl at all (the task could be accomplished in Perl itself), the immediate problem is quoting: you could use the q operator:
$cmd = q(awk -F"\t" '{if($1=="a" && $2=="b") print $1,$2}' file.txt | wc -l);
$n = system($cmd);
Note that using double-quotes would interpolate variables and you'd need to escape those.
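One further caveat: system() returns the command's exit status, not its output, so $n above will be 0 on success rather than the line count. To capture the count itself, use backticks (or qx), e.g. with a slightly simplified awk body:
my $cmd = q(awk -F"\t" '$1=="a" && $2=="b"' file.txt | wc -l);
chomp(my $n = `$cmd`);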
You can get the number of a\tbs from Perl itself without calling an external command:
#!/usr/bin/perl
use warnings;
use strict;
open my $FH, '<', 'file.txt' or die $!;
my $n = 0;
"a\tb\n" eq $_ and $n++ while <$FH>;
print "$n\n";
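For the dummy file shown in the question, this prints 3.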

How to add blank line after every grep result using Perl?

How to add a blank line after every grep result?
For example, grep -o "xyz" may give something like -
file1:xyz
file2:xyz
file2:xyz2
file3:xyz
I want the output to be like this -
file1:xyz

file2:xyz
file2:xyz2

file3:xyz
(that is, a blank line between results from different files)
I would like to do something like
grep "xyz" | perl (code to add a new line after every grep result)
This is the direct answer to your question:
grep 'xyz' | perl -pe 's/$/\n/'
But this is better:
perl -ne 'print "$_\n" if /xyz/'
EDIT
Ok, after your edit, you want (almost) this:
grep 'xyz' * | perl -pe 'print "\n" if /^([^:]+):/ && ! $seen{$1}++'
If you don’t like the blank line at the beginning, make it:
grep 'xyz' * | perl -pe 'print "\n" if /^([^:]+):/ && ! $seen{$1}++ && $. > 1'
NOTE: This won’t work right on filenames with colons in them. :)
If you want to use perl, you could do something like
grep "xyz" | perl -p -e 's/(.*)/$1\n/'
If you want to use sed (where I seem to have gotten better results), you could do something like
grep "xyz" | sed 's/.*/&\n/'
(& is sed's reference to the whole match; the \n in the replacement relies on GNU sed. The portable idiom for "append a blank line" is simply sed G.)
This appends a newline after every single line of grep output:
grep "xyz" | perl -pe '$_ .= "\n"'
This prints a newline in between results from different files. (Answering the question as I read it.)
grep 'xyz' * | perl -pe '/(.*?):/; if ($f ne $1) {print "\n"; $f=$1}'
Use a state machine to determine when to print a blank line:
#!/usr/bin/env perl
use strict;
use warnings;

# state variable to determine when to print a blank line
my $prev_file = '';

# change DATA to the appropriate input file handle
while( my $line = <DATA> ){
    # did the state change?
    if( my ( $file ) = $line =~ m{ \A ([^:]*) \: .*? xyz }msx ){
        # blank lines between states
        print "\n" if $file ne $prev_file && length $prev_file;
        # set the new state
        $prev_file = $file;
    }
    # print every line
    print $line;
}
__DATA__
file1:xyz
file2:xyz
file2:xyz2
file3:xyz
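With that __DATA__ section, the script prints:
file1:xyz

file2:xyz
file2:xyz2

file3:xyz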

Counting lines ignored by grep

Let me try to explain this as clearly as I can...
I have a script that at some point does this:
grep -vf ignore.txt input.txt
This ignore.txt has a bunch of lines with things I want my grep to ignore, hence the -v (meaning I don't want to see them in the output of grep).
Now, I want to know how many lines of input.txt have been ignored by each line of ignore.txt.
For example, if ignore.txt had these lines:
line1
line2
line3
I would like to know how many lines of input.txt were ignored by ignoring line1, how many by ignoring line2, and so on.
Any ideas on how can I do this?
I hope that made sense... Thanks!
Note that the sum of the ignored lines plus the shown lines may NOT add up to the total number of lines... "line1 and line2 are here" will be counted twice.
#!/usr/bin/perl
use warnings;
use strict;

local @ARGV = 'ignore.txt';
chomp(my @pats = <>);
foreach my $pat (@pats) {
    print "$pat: ", qx/grep -c $pat input.txt/;
}
According to unix.stackexchange,
grep -o pattern file | wc -l
counts the total number of matches of a given pattern in a file. Given that, and since you already use a script, one solution is to run several grep instances, one per pattern you want to ignore, and count each pattern's matches.
However, I'd try to build a more comfortable solution involving a scripting language, e.g. Python.
This script will count the matched lines by hash lookup and save the lines to be printed in @result, where you may process them as you will. To emulate grep, just print them.
I made the script so it can print out an example. To use it with the real files, uncomment the commented-out code and remove the lines marked # example line.
Code:
use strict;
use warnings;
use v5.10;
use Data::Dumper;                      # example line

# Example data.
my @ignore = ('line1' .. 'line9');                                   # example line
my @input  = ('line2' .. 'line9', 'fo' .. 'fx', 'line2', 'line3');   # example line

#my $ignore = shift;                   # first argument is ignore.txt
#open my $fh, '<', $ignore or die $!;
#chomp(my @ignore = <$fh>);
#close $fh;

my @result;
my %lookup = map { $_ => 0 } @ignore;
my $rx = join '|', map quotemeta, @ignore;

#while (<>) {         # this processes the remaining arguments, input.txt etc.
for (@input) {                         # example line
    chomp;            # required to avoid bugs due to a missing newline at eof
    if (/($rx)/) {
        $lookup{$1}++;
    } else {
        push @result, $_;
    }
}

#say for @result;     # this will emulate grep
print Dumper \%lookup;                 # example line
Output:
$VAR1 = {
          'line6' => 1,
          'line1' => 0,
          'line5' => 1,
          'line2' => 2,
          'line9' => 1,
          'line3' => 2,
          'line8' => 1,
          'line4' => 1,
          'line7' => 1
        };
while IFS= read -r pattern ; do
printf '%s:' "$pattern"
grep -c "$pattern" input.txt
done < ignore.txt
grep -c counts matching lines, and the matching lines are exactly the ones that grep -v discards, so this prints, for each pattern, the number of input.txt lines it causes to be ignored. Simply loop over the patterns and count once for each pattern.
This will print the number of ignored matches along with the matching pattern:
grep -of ignore.txt input.txt | sort | uniq -c
For example:
$ perl -le 'print "Coroline" . ++$s for 1 .. 21' > input.txt
$ perl -le 'print "line2\nline14"' > ignore.txt
$ grep -of ignore.txt input.txt | sort | uniq -c
1 line14
3 line2
I.e., a line matching "line14" was ignored once. A line matching "line2" was ignored 3 times.
If you just wanted to count the total ignored lines this would work:
grep -cof ignore.txt input.txt
Update: modified the example above to use strings so that the output is a little clearer.
This might work for you:
# seq 1 15 | sed '/^1/!d' | sed -n '$='
7
Explanation:
Delete all lines except those that match, then pipe these matching (ignored) lines to a second sed command, which deletes everything but prints the line number of the last line ($=), i.e. the count. So in this example of 1 through 15, lines 1 and 10 through 15 are ignored - a total of 7 lines.
EDIT:
Sorry misread the question (still a little confused!):
sed 's,.*,sed "/&/!d;s/.*/matched &/" input.txt| uniq -c,' ignore.txt | sh
This shows the number of matches for each pattern in the ignore.txt.
sed 's,.*,sed "/&/d;s/.*/non-matched &/" input.txt | uniq -c,' ignore.txt | sh
This shows the number of non-matches for each pattern in the ignore.txt.
If using GNU sed, these should work too:
sed 's,.*,sed "/&/!d;s/.*/matched &/" input.txt | uniq -c,;e' ignore.txt
or
sed 's,.*,sed "/&/d;s/.*/non-matched &/" input.txt | uniq -c,;e' ignore.txt
N.B. Your success with patterns may vary, i.e. check for metacharacters beforehand.
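In Perl, the same precaution is a one-liner with quotemeta, as the hash-lookup script earlier does when building its regex (@patterns here is a stand-in for the lines of ignore.txt):
my @safe = map { quotemeta } @patterns;   # e.g. 'a.b' becomes 'a\.b'
my $rx = join '|', @safe;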
On reflection I thought this can be improved to:
sed 's,.*,/&/i\\matched &,;$a\\d' ignore.txt | sed -f - input.txt | sort -k2n | uniq -c
or
sed 's,.*,/&/!i\\non-matched &,;$a\\d' ignore.txt | sed -f - input.txt | sort -k2n | uniq -c
But NO, on large files this is actually slower.
Are both ignore.txt and input.txt sorted?
If so, you can use the comm command!
$ comm -12 ignore.txt input.txt
How many lines are ignored?
$ comm -12 ignore.txt input.txt | wc -l
Or, if you want to do more processing, combine comm with awk:
$ comm ignore.txt input.txt | awk '
END {print "Ignored lines = " igtotal " Lines not ignored = " commtotal " Lines unique to Ignore file = " uniqtotal}
{
    if ($0 !~ /^\t/)     {uniqtotal+=1}
    if ($0 ~ /^\t[^\t]/) {commtotal+=1}
    if ($0 ~ /^\t\t/)    {igtotal+=1}
}'
Here I'm taking advantage of the tabs that the comm command places in its output:
* If there are no tabs, the line is in ignore.txt only.
* If there is a single tab, it is in input.txt only.
* If there are two tabs, the line is in both files.
By the way, not all the lines in ignore.txt are ignored. If the line isn't also in input.txt, the line can't really be said to be ignored.
With Dennis Williamson's suggestion:
comm ignore.txt input.txt | awk '
!/^\t/     {uniqtotal++}
/^\t[^\t]/ {commtotal++}
/^\t\t/    {igtotal++}
END {print "Ignored lines = " igtotal " Lines not ignored = " commtotal " Lines unique to Ignore file = " uniqtotal}'
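For completeness, here is a rough Perl equivalent of that comm+awk pipeline - a sketch assuming both files fit in memory and, like comm, comparing whole lines (duplicate lines are counted per occurrence rather than paired up the way comm pairs them):
#!/usr/bin/env perl
use strict;
use warnings;

open my $ig, '<', 'ignore.txt' or die "ignore.txt: $!";
chomp(my @ignore = <$ig>);
close $ig;

my %ignore = map { $_ => 1 } @ignore;
my %seen;                      # lines actually present in input.txt
my ($igtotal, $commtotal) = (0, 0);

open my $in, '<', 'input.txt' or die "input.txt: $!";
while (my $line = <$in>) {
    chomp $line;
    $seen{$line} = 1;
    $ignore{$line} ? $igtotal++ : $commtotal++;
}
close $in;

my $uniqtotal = grep { !$seen{$_} } @ignore;
print "Ignored lines = $igtotal Lines not ignored = $commtotal ",
      "Lines unique to Ignore file = $uniqtotal\n";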

How to write a Perl script to convert file to all upper case?

How can I write a Perl script to convert a text file to all upper case letters?
perl -ne "print uc" < input.txt
The -n wraps your command-line script (which is supplied by -e) in a while loop. uc returns the ALL-UPPERCASE version of the default variable $_, and what print does, well, you know that yourself. ;-)
The -p is just like -n, but it does a print in addition, again acting on the default variable $_.
To store that in a script file:
#!perl -n
print uc;
Call it like this:
perl uc.pl < in.txt > out.txt
perl -pe '$_ = uc($_)' input.txt > output.txt
But then you don't even need Perl if you're using Linux (or *nix). Some other ways are:
awk:
awk '{ print toupper($0) }' input.txt >output.txt
tr:
tr '[:lower:]' '[:upper:]' < input.txt > output.txt
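All of the above assume ASCII data; in particular, uc only uppercases non-ASCII letters correctly when Perl knows the encoding. For UTF-8 files you can add the -C switch (a hedged variant; see perlrun for the -CSD flags):
perl -CSD -pe '$_ = uc' input.txt > output.txt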
$ perl -Tpe " $_ = uc; " --
$ perl -MO=Deparse -Tpe " $_ = uc; " -- a s d f
LINE: while (defined($_ = <ARGV>)) {
$_ = uc $_;
}
continue {
die "-p destination: $!\n" unless print $_;
}
-e syntax OK
$ cat myprogram.pl
#!/usr/bin/perl -T --
LINE: while (defined($_ = <ARGV>)) {
$_ = uc $_;
}
continue {
die "-p destination: $!\n" unless print $_;
}