Perl: How to remove spaces and blank lines in one pass - perl

I have got 2 perl scripts, first one removes blank lins from a file and the second one removes all spaces inside a file. I wonder, if it's possible to connect both of these regular expressions inside 1 script?
For spaces, i have used this regsub: $str =~ tr/ //d;
and for Blank lines, I have used this regexp
while (<$file>) {
if (/\S/){
print $new_file $_; }}

It should be really easy: just add tr/ //d before the if line.
Note: It will remove lines containing spaces only, too. If you want to keep them (but transliterated to empty lines), insert the transliteration before the print line.

If you wish to trim the end of the line that contains space,
you might want it to work like this:
perl -pi -e 's/\s*$/\n/' f1 f2 f3 #UNIX file format
perl -pi -e 's/\s*$/\r\n/' f1 f2 f3 #DOS file format

Related

How to remove newline from the end of a file using Perl

I have a file that reads like this:
dog cat mouse
apple orange pear
red yellow green
There is a tab \t separating the words on each row, and a newline \n separating each of the rows. Below the last line, red yellow green there is a blank line due to a newline \n after green.
I would like to use Perl to remove the newline.
I have seen a few articles like this How can I delete a newline if it is the last character in a file? that give solutions for Perl, but I would like to do this in hard code so that I can incorporate it into my Perl script.
I don't know if this might be possible using chomp, or if chomp works on each line separately (I would like to keep the newline between lines).
Also I have seen previously comments that suggest maintaining a newline at the end of a file because Unix commands work better when a file ends with a newline. However, I have created a script which relies on input files not ending with a newline, therefore I really feel removing the newlines is necessary for my work.
You can try this:
perl -pe 'chomp if eof' file.txt
Here is another simple way, if you need it in a script:
open $fh, "file.txt";
#lines=<$fh>; # read all lines and store in array
close $fh;
chomp $lines[-1]; # remove newline from last line
print #lines;
Or something like this (in script), as suggested by jnhc for the command line:
open $fh, "file.txt";
while (<$fh>) {
chomp if eof $fh;
print;
}
close $fh;

How to split a file (with sed) into numerous files according to a value found on each line?

I have several Company_***.csv files (altough the separator's a tab not a comma; hence should be *.tsv, but never mind) which contains a header plus numerous data lines e.g
1stHeader 2ndHeader DateHeader OtherHeaders...
111111111 SOME STRING 2020-08-01 OTHER STRINGS..
222222222 ANOT STRING 2020-08-02 OTHER STRINGS..
I have to split them according to the 3rd column here, it's a date.
Each file should be named like e.g. Company_2020_08_01.csv Company_2020_08_02.csv & so one
and containing: same header on the 1st line + matching rows as the following lines.
At first I thought about saving (once) the header in a single file e.g.
sed -n '1w Company_header.csv' Company_*.csv
then parsing the files with a pattern for the date (hence the headers would be skipped) e.g.
sed -n '/\t2020-[01][0-9]-[0-3][0-9]\t/w somefilename.csv' Company_*.csv
... and at last, insert the (missing) header in each generated file.
But I'm stuck at step 2: I can't find how I could generate (dynamically) the "filename" expected by the w command, neither how to capture the date in the search pattern (because apparently this is just an address, not a search-replace "field" as in the s/regexp/replacement/[flags] command, so you can't have capturing groups ( ) in there).
So I wonder if this is actually doable with sed? Or should I look upon other tools e.g. awk?
Disclaimer: I'm quite a n00b with these commands so I'm just learning/starting from scratch...
Perl to the rescue!
perl -e 'while (<>) {
$h = $_, next if $. == 1;
$. = 0 if eof;
#c = split /\t/;
open my $out, ">>", "Company_" . $c[2] =~ tr/-/_/r . ".csv" or die $!;
print {$out} $h unless tell $out;
print {$out} $_;
}' -- Company_*.csv
The diamond operator <> in scalar context reads a line from the input.
The first line of each file is stored in the variable $h, see $. and eof
split populates the #c array by the column values for each line
$c[2] contains the date, using tr we translate dashes to underscores to create a filename from it. open opens the file for appending.
print prints the header if the file is empty (see tell)
and prints the current line, too.
Note that it only appends to the files, so don't forget to delete any output files before running the script again.

Awk's output in Perl doesn't seem to be working properly

I'm writing a simple Perl script which is meant to output the second column of an external text file (columns one and two are separated by a comma).
I'm using AWK because I'm familiar with it.
This is my script:
use v5.10;
use File::Copy;
use POSIX;
$s = `awk -F ',' '\$1==500 {print \$2}' STD`;
say $s;
The contents of the local file "STD" is:
CIR,BS
60,90
70,100
80,120
90,130
100,175
150,120
200,260
300,500
400,600
500,850
600,900
My output is very strange and it prints out the desired "850" but it also prints a trailer of the line and a new line too!
ka#man01:$ ./test.pl
850
ka#man01:$
The problem isn't just printing. I need to use the variable generated by awk "i.e. the $s variable) but the variable is also being reserved with a long string and a new line!
Could you guys help?
Thank you.
I'd suggest that you're going down a dirty road by trying to inline awk into perl in the first place. Why not instead:
open ( my $input, '<', 'STD' ) or die $!;
while ( <$input> ) {
s/\s+\z//;
my #fields = split /,/;
print $fields[1], "\n" if $fields[0] == 500;
}
But the likely problem is that you're not handling linefeeds, and say is adding an extra one. Try using print instead, or chomp on the resultant string.
perl can do many of the things that awk can do. Here's something similar that replaces your entire Perl program:
$ perl -naF, -le 'chomp; print $F[1] if $F[0]==500' STD
850
The -n creates a while loop around your argument to -e.
The -a splits up each line into #F and -F lets you specify the separator. Since you want to separate the fields on a comma you use -F,.
The -l adds a newline each time you call print.
The -e argument is the program to run (with the added while from -n). The chomp removes the newline from the output. You get a newline in your output because you happen to use the last field in the line. The -l adds a newline when you print; that's important when you want to extract a field in the middle of the line.
The reason you get 2 newlines:
the backtick operator does not remove the trailing newline from the awk output. $s contains "850\n"
the say function appends a newline to the string. You have say "850\n" which is the same as print "850\n\n"

How to remove all lines except .c extention(at last) lines using perl scripting

I've a string $string which has got list of lines, some ending with *.c, *.pdf,etc and few without any extensions(these are directories). I need to remove all lines except *.c lines. How can i do that using regular expression? I've written to get removed *.c files as below but how to do a not of it?
next if $line =~ /(\.c)/i;
Any ideas.
thanks,
Sharath
Use unless instead of if to reverse the sense of the condition.
next unless $line =~ /\.c$/i;
or simply invert the test:
next if $line !~ /\.c$/i;
Also, you don't need parentheses around the regexp, and you need $ to anchor it to the end of the line.

How to combine two lines together using Perl?

How to combine two lines together using Perl? I'm trying to combine these two lines using a Perl regular expression:
__Data__
test1 - results
dkdkdkdkdkd
I would like the output to be like this:
__Data__
test1 - results dkdkdkdkdkd
I thought this would accomplish this but not working:
$_ =~ s/__Data__\n(test1.*)\n(.*)\n/__Data__\n$1 $2/smg;
If you have a multiline string:
s/__Data__\ntest1.*\K\n//g;
The /s modifier only makes the wildcard . match \n, so it will cause .* to slurp your newline and cause the match of \n to be displaced to the last place it occurs. Which, depending on your data, might be far off.
The /m modifier makes ^ and $ match inside the string at newlines, so not so useful. The \K escape preserves whatever comes before it, so you do not need to put it back afterwards.
If you have a single line string, for instance in a while loop:
while (<>) {
if (/^__Data__/) {
$_ .= <>; # add next line
chomp; # remove newline
$_ .= <>; # add third line
}
print;
}
There seems to be a problem with the setup of $_. When I run this script, I get the output I expect (and the output I think you'd expect). The main difference is that I've added a newline at the end of the replacement pattern in the substitute. The rest is cosmetic or test infrastructure.
Script
#!/usr/bin/env perl
use strict;
use warnings;
my $text = "__Data__\ntest1 - results\ndkdkdkdkdkd\n";
my $copy = $text;
$text =~ s/__Data__\n(test1.*)\n(.*)\n/__Data__\n$1 $2\n/smg;
print "<<$copy>>\n";
print "<<$text>>\n";
Output
<<__Data__
test1 - results
dkdkdkdkdkd
>>
<<__Data__
test1 - results dkdkdkdkdkd
>>
Note the use of << and >> to mark the ends of strings; it often helps when debugging. Use any symbols you like; just enclose your displayed text in such markers to help yourself debug what's going on.
(Tested with Perl 5.12.1 on RHEL 5 for x86/64, but I don't think the code is version or platform dependent.)