Perl deleting "blank" lines from a csv file - perl

I'm looking to delete blank lines in a CSV file, using Perl.
I'm not too sure how to do this, as these lines aren't exactly "blank" (they're just a bunch of commas).
I'd also like to save the output as a file of the same name, overwriting the original.
How could I go about doing this?
edit: I can't use modules or any source code due to network restrictions...

You can do this using a simple Perl one-liner:
perl -i -ne 'print unless /^[,\s]*$/' <filename>
The -n flag assumes this loop around your program:
while(<>) {
print unless /^[,\s]*$/;
}
and the -i flag means inplace and modifies your input file.
Note: If you are worried about losing your data with -i, you can specify -i.bak and perl will automatically write the original file to your <filename>.bak

More of a command line hack,
perl -i -ne 'print if /[^,\r\n]/' file.csv

If you want to put it inside a shell script you can do this ...
#!/bin/sh
$(perl -i -n -e 'print $_ unless ($_ =~ /^\,+$/);' $*)

Related

Cross-platform compatible sed command?

I'm trying to use sed, but I want to run properly both under Linux and Mac. Currently, I have something like this:
if test -f ${GENESISFILE};
then
echo "Replacing ..."
sed -i '' "s/ADDRESS/${ADDRESS}/g" ${GENESISFILE}
else
echo "No such file"
fi
Now, the point is that using -i '' part it runs properly under Mac, but doesn't under Linux, and if I remove it then it doesn't work under Mac. What's proper way to make it cross-platform compatible?
Instead of sed one-liner:
sed -i '' "s/ADDRESS/${ADDRESS}/g" ${GENESISFILE}
use this cross-platform Perl one-liner, which runs OK on both Linux and macOS:
perl -i.bak -pe 's/ADDRESS/$ENV{ADDRESS}/g' ${GENESISFILE}
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
-i.bak : Edit input files in-place (overwrite the input file). Before overwriting, save a backup copy of the original file by appending to its name the extension .bak. Use -i alone, without .bak, to skip making the backup.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches

perl one-liner to keep only desired lines

I have a text file (input.txt) like this:
NP_414685.4: 15-26, 131-138, 441-465
NP_418580.2: 493-500
NP_418780.2: 36-48, 44-66
NP_418345.2:
NP_418473.3: 1-19, 567-1093
NP_418398.2:
I want a perl one-liner that keeps only those lines in file where ":" is followed by number range (that means, here, the lines containing "NP_418345.2:" and "NP_418398.2:" get deleted). For this I have tried:
perl -ni -e "print unless /: \d/" -pi.bak input.txt del input.txt.bak
But it shows exactly same output as the input file.
What will be the exact pattern that I can match here?
Thanks
First, print unless means print if not -- opposite to what you want.
More to the point, it doesn't make sense using both -n and -p, and when you do -p overrides the other. While both of them open the input file(s) and set up the loop over lines, -p also prints $_ for every iteration. So with it you are reprinting every line. See perlrun.
Finally, you seem to be deleting the .bak file ... ? Then don't make it. Use just -i
Altogether
perl -i -ne 'print if /:\s*\d+\s*-\s*\d+/' input.txt
If you do want to keep the backup file use -i.bak instead of -i
You can see the code equivalent to a one-liner with particular options with B::Deparse (via O module)
Try: perl -MO=Deparse -ne 1 and perl -MO=Deparse -pe 1
This way:
perl -i.bak -ne 'print if /:\s+\d+-\d/' input.txt
This:
perl -ne 'print if /:\s*(\d+\s*-\s*\d+\s*,?\s*)+\s*$/' input.txt
Prints:
NP_414685.4: 15-26, 131-138, 441-465
NP_418580.2: 493-500
NP_418780.2: 36-48, 44-66
NP_418473.3: 1-19, 567-1093
I'm not sure if you want to match lines that are possibly like this:
NP_418580.2: 493-500, asdf
or this:
NP_418580.2: asdf
This answer will not print these lines, if given to it.

Require exactly 1 trailing newline in a file

I have a mix of files with various ways of using trailing new lines. There are no carriage returns, it's only \n. Some files have multiple newlines and some files have no trailing newline. I want to edit the files in place.
How can I edit the files to have exactly 1 trailing newline?
To change text files in-place to have one and only one trailing newline:
sed -zi 's/\n*$/\n/'
This requires GNU sed.
-z tells sed to read in the file using the NUL character as a separator. Since text files have no NUL characters, this has the effect of reading the whole file in at once.
-i tells GNU sed to change the file in place.
s/\n*$/\n/ tells sed to replace however many newlines there are at the end of the file with a single newline.
Replace all trailing new lines with one?
$text =~ s/\n+$/\n/;
This leaves the file with one newline at the end – if it had at least one to start with. If you want it to be there even if the file didn't have one, replace \n+ with \n*.
For the in-place specification, implying a one-liner:
perl -i -0777 -wpe 's/\n+$/\n/' file.txt
The meaning of the switches is explained in Command Switches in perlrun.
Here is a summary of the switches. Please see the above docs for precise explanations.
-i changes the file "in place." Note that data is still copied and temporary files used
-0777 reads the file whole. The -0[oct|hex] sets $/ to the number, so to nul with -0
-w uses warnigs. Not exactly the same as use warnings but better than nothing
-p the code in '' runs on each line of file in turn, like -n, and then $_ is printed
-e what follows between '' is executed as Perl code
-E is the same but also enables features, likesay
Note that we can see the equivalent code by using core O and B::Deparse modules as
perl -MO=Deparse -wp -e 1
This prints
BEGIN { $^W = 1; }
LINE: while (defined($_ = <ARGV>)) {
'???';
}
continue {
print $_;
}
-e syntax OK
showing a script equivalent to the one liner with -w and -p.
perl -i -0 -pe 's/\n\n*$/\n/' input-file
The solutions posted so far read your whole input file into memory which will be an issue if your file is huge. This only reads contiguous empty lines into memory:
awk -i inplace '/./{printf "%s", buf; buf=""; print; next} {buf = buf $0 ORS}' file
The above uses GNU awk for inplace editing.

Only print matching lines in perl from the command line

I'm trying to extract all ip addresses from a file. So far, I'm just using
cat foo.txt | perl -pe 's/.*?((\d{1,3}\.){3}\d{1,3}).*/\1/'
but this also prints lines that don't contain a match. I can fix this by piping through grep, but this seems like it ought to be unnecessary, and could lead to errors if the regexes don't match up perfectly.
Is there a simpler way to accomplish this?
Try this:
cat foo.txt | perl -ne 'print if s/.*?((\d{1,3}\.){3}\d{1,3}).*/\1/'
or:
<foo.txt perl -ne 'print if s/.*?((\d{1,3}\.){3}\d{1,3}).*/\1/'
It's the shortest alternative I can think of while still using Perl.
However this way might be more correct:
<foo.txt perl -ne 'if (/((\d{1,3}\.){3}\d{1,3})/) { print $1 . "\n" }'
If you've got grep, then just call grep directly:
grep -Po "(\d{1,3}\.){3}\d{1,3}" foo.txt
You've already got a suitable answer of using grep to extract the IP addresses, but just to explain why you were seeing non-matches being printed:
perldoc perlrun will tell you about all the options you can pass Perl on the command line.
Quoting from it:
-p causes Perl to assume the following loop around your program, which makes it
iterate over filename arguments somewhat like sed:
LINE:
while (<>) {
... # your program goes here
} continue {
print or die "-p destination: $!\n";
}
You could have used the -n switch instead, which does similar, but does not automatically print, for example:
cat foo.txt | perl -ne '/((?:\d{1,3}\.){3}\d{1,3})/ and print $1'
Also, there's no need to use cat; Perl will open and read the filenames you give it, so you could say e.g.:
perl -ne '/((?:\d{1,3}\.){3}\d{1,3})/ and print $1' foo.txt
ruby -0777 -ne 'puts $_.scan(/((?:\d{1,3}\.){3}\d{1,3})/)' file

Perl regex to act on a file from the command line

In a file, say xyz.txt i want to replace the pattern of any number followed by a dot example:1.,2.,10.,11. etc.. with a whitespace.
How to compose a perl command on the command line to act on the file to do the above, what should be the regex to be used ?
Please Help
Thank You.
This HAS to be a Perl oneliner?
perl -i -pe 's/\d+\./ /g' <fileName>
The Perl command line options: -i is used to specify what happens to the input file. If you don't give it a file extension, the original file is lost and is replaced by the Perl munged output. For example, if I had this:
perl -i.bak -pe 's/\d+\./ /g' <fileName>
The original file would be stored with a .bak suffix and <fileName> itself would contain your output.
The -p means to enclose your Perl program in a print loop that looks SOMEWHAT like this:
while ($_ = <>) {
<Your Perl one liner>
print "$_";
}
This is a somewhat simplified explanation what's going on. You can see the actual perl loop by doing a perldoc perlrun from the command line. The main idea is that it allows you to act on each line of a file just like sed or awk.
The -e simply contains your Perl command.
You can also do file redirection too:
perl -pe 's/\d+\./ /g' < xyz.txt > xyz.txt.out
Answer (not tested):
perl -ipe "s/\d+\./ /g" xyz.txt
Both
perl -ipe "s/\d+\./ /g" xyz.txt
and
perl -pie
cannot execute on my system.
I use the following order:
perl -i -pe