Require exactly 1 trailing newline in a file - perl

I have a mix of files with various ways of using trailing new lines. There are no carriage returns, it's only \n. Some files have multiple newlines and some files have no trailing newline. I want to edit the files in place.
How can I edit the files to have exactly 1 trailing newline?

To change text files in-place to have one and only one trailing newline:
sed -zi 's/\n*$/\n/' file
This requires GNU sed.
-z tells sed to read in the file using the NUL character as a separator. Since text files have no NUL characters, this has the effect of reading the whole file in at once.
-i tells GNU sed to change the file in place.
s/\n*$/\n/ tells sed to replace however many newlines there are at the end of the file with a single newline.
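A quick check on a throwaway file (demo.txt and its contents are made up for illustration):
printf 'one\ntwo\n\n\n' > demo.txt     # two extra blank lines at the end
sed -zi 's/\n*$/\n/' demo.txt
od -c demo.txt                         # shows: o n e \n t w o \n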

Replace all trailing new lines with one?
$text =~ s/\n+$/\n/;
This leaves the file with one newline at the end – if it had at least one to start with. If you want it to be there even if the file didn't have one, replace \n+ with \n*.
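As a standalone script rather than a one-liner, the same substitution could look like this sketch (file.txt is a placeholder name):
#!/usr/bin/perl
use strict;
use warnings;

my $file = 'file.txt';            # placeholder name
my $text = do {
    open my $fh, '<', $file or die "Can't open $file: $!";
    local $/;                     # slurp the whole file
    <$fh>;
};
$text =~ s/\n*$/\n/;              # exactly one trailing newline, added if missing
open my $out, '>', $file or die "Can't write $file: $!";
print $out $text;
close $out or die "Can't close $file: $!";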
For the in-place specification, implying a one-liner:
perl -i -0777 -wpe 's/\n+$/\n/' file.txt
The meaning of the switches is explained in Command Switches in perlrun.
Here is a summary of the switches. Please see the above docs for precise explanations.
-i changes the file "in place." Note that data is still copied and temporary files are used.
-0777 reads the file whole. The -0[oct|hex] switch sets $/ to the character with that code, so a bare -0 sets it to NUL; values of 0400 and above (0777 by convention) mean slurp the whole file.
-w enables warnings. Not exactly the same as use warnings, but better than nothing.
-p the code in '' runs on each line of the file in turn, like -n, and then $_ is printed.
-e what follows between '' is executed as Perl code.
-E is the same but also enables features, like say.
Note that we can see the equivalent code by using core O and B::Deparse modules as
perl -MO=Deparse -wp -e 1
This prints
BEGIN { $^W = 1; }
LINE: while (defined($_ = <ARGV>)) {
'???';
}
continue {
print $_;
}
-e syntax OK
showing a script equivalent to the one liner with -w and -p.

perl -i -0 -pe 's/\n\n*$/\n/' input-file
Here -0 sets the input record separator to the NUL character; since a text file contains no NULs, the whole file is read at once, and the substitution collapses one or more trailing newlines into one (like \n+ above, it will not add a newline to a file that lacks one).

The solutions posted so far read your whole input file into memory which will be an issue if your file is huge. This only reads contiguous empty lines into memory:
awk -i inplace '/./{printf "%s", buf; buf=""; print; next} {buf = buf $0 ORS}' file
The above uses GNU awk for inplace editing.
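A rough Perl counterpart of the same streaming idea (a sketch, not from the answer above) buffers only runs of blank lines and, like awk's print, makes sure the final line ends with a newline:
perl -i -ne 'chomp; if (/./) { print $blank, "$_\n"; $blank = "" } else { $blank .= "\n" }' file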

Related

Deleting lines between two characters using sed

I have multiple datasets in txt format which have a predictable content. I am trying to remove the first set of lines. The first line starts with >*chromosome and I want to delete everything until >*plasmid. I can either tell it to delete everything from > until it encounters it again or delete everything between the first > and the second >. I have been trying something like this:
sed -i.bak '/>/,/^\>*$/{d}' file.txt
This did not work. The original code I found was:
sed -i.bak '/>/,/^\s*$/{d}' file.txt
Use this Perl one-liner:
perl -0777 -pe 's{^>chromosome.*(?=^>plasmid)}{}sm' in.fasta
EXAMPLE:
# Create example input file:
cat > in.fasta <<EOF
>foo
TCGA
>chromosome
ACGT
>plasmid
CGTA
EOF
perl -0777 -pe 's{^>chromosome.*(?=^>plasmid)}{}sm' in.fasta > out.fasta
Output in out.fasta:
>foo
TCGA
>plasmid
CGTA
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
-0777 : Slurp files whole.
The regex uses these modifiers:
/m : Allow multiline matches.
/s : Allow . to match a newline.
^>chromosome.*(?=^>plasmid) : Regex that matches >chromosome at the beginning of a line, followed by 0 or more characters, ending right before (but not including) the match of >plasmid at the beginning of a line. The expression (?=PATTERN) is a zero-width positive lookahead.
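For very large files, a line-by-line alternative is possible with Perl's range (flip-flop) operator. This is a sketch added for illustration, not part of the answer above, and it assumes >chromosome is always followed later by a >plasmid record:
perl -ne 'print unless /^>chromosome/ .. /^>plasmid/ and !/^>plasmid/' in.fasta > out.fasta
The range is true from the >chromosome line through the >plasmid line, and the extra !/^>plasmid/ test keeps the >plasmid line itself in the output.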
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlre: Perl regular expressions (regexes)
perldoc perlre: Perl regular expressions (regexes): Quantifiers; Character Classes and other Special Escapes; Assertions; Capture groups
perldoc perlrequick: Perl regular expressions quick start

Insert linebreak in a file after a string

I have a unique (to me) situation:
I have a file - file.txt with the following data:
"Line1", "Line2", "Line3", "Line4"
I want to insert a linebreak each time the pattern ", is found.
The output of file.txt shall look like:
"Line1",
"Line2",
"Line3",
"Line4"
I am having a tough time trying to escape ", .
I tried sed -i -e "s/\",/\n/g" file.txt, but I am not getting the desired result.
I am looking for a one liner using either perl or sed.
You may use this gnu sed:
sed -E 's/(",)[[:blank:]]*/\1\n/g' file.txt
"Line1",
"Line2",
"Line3",
"Line4"
Note how you can use single quotes around the sed command to avoid unnecessary escaping.
If you don't have gnu sed then here is a POSIX compliant sed solution:
sed -E 's/(",)[[:blank:]]*/\1\
/g' file.txt
To save changes inline use:
sed -i.bak -E 's/(",)[[:blank:]]*/\1\
/g' file.txt
Could you please try the following, which uses awk's substitution mechanism, in case you are ok with awk.
awk -v s1="\"" -v s2="," '{gsub(/",[[:blank:]]+"/,s1 s2 ORS s1)} 1' Input_file
Here's a Perl solution:
perl -pe 's/",\K/\n/g' file.txt
The substitution pattern matches the ",, but the \K tells the regex engine to ignore everything matched so far when doing the replacement, so the ", itself is not replaced. The replacement then effectively inserts the newline right after it.
I used the single quote for the argument to -e, but that doesn't work on Windows where you have to use ". Instead of escaping the ", you can specify it in another way. That's code number 0x22, so you can write:
perl -pe "s/\x22,\K/\n/g" file.txt
Or in octal:
perl -pe "s/\042,\K/\n/g" file.txt
Use this Perl one-liner:
perl -F'/"\K,\s*/' -lane 'print join ",\n", @F;' in_file > out_file
Or this for in-place replacement:
perl -i.bak -F'/"\K,\s*/' -lane 'print join ",\n", @F;' in_file
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in the -F option.
-F'/"\K,\s*/' : Split into @F on a double quote, followed by a comma, followed by 0 or more whitespace characters, rather than on whitespace. \K : Causes the regex engine to "keep" everything it had matched prior to the \K and not include it in the match. This keeps the double quote in the @F elements, while the comma and whitespace are removed during the split.
-i.bak : Edit input files in-place (overwrite the input file). Before overwriting, save a backup copy of the original file by appending to its name the extension .bak.
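A quick demonstration with the sample line from the question (in_file is a placeholder name):
printf '"Line1", "Line2", "Line3", "Line4"\n' > in_file
perl -F'/"\K,\s*/' -lane 'print join ",\n", @F;' in_file
which prints:
"Line1",
"Line2",
"Line3",
"Line4"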
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlrequick: Perl regular expressions quick start

sed or awk to change a specific number in a file on RHEL7

I need help figuring out the syntax or what command to use to find and replace a specific number in a file.
I need to replace the number 10 with 25 in a configuration file. I have tried the following:
sed 's/10/25/g' /etc/security/limits.conf
This changes other instances that contain 10, such as 1000 and 10000, to 2500 and 25000. I need to just change 10 to 25. Please help.
Thank you,
Joseph
The trick here is to limit the sed substitution to the line you want to change. For limits.conf you are best off matching the domain, type and item. So if you wanted to just change a limit for domain @student, type hard, item nproc, you'd use something like
sed '/@student.*hard.*nproc/s/10/25/g' /etc/security/limits.conf
sed -ri '/^#/!s/(^.*)([[:space:]]10$)/\1 25/' /etc/security/limits.conf
With extended regular expressions enabled (-r or -E), process all lines that don't start with a # by negating the /^#/ address with !. We then capture the line in two sections, and replace it with the first section followed by a space and 25. The $ ensures that the entry to replace is anchored at the end of the line.
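A minimal sketch of the effect, using a made-up limits.conf-style fragment and a pipe instead of -i:
printf '#maxlogins      10\n*        hard    nproc           10\n' |
  sed -E '/^#/!s/(^.*)([[:space:]]10$)/\1 25/'
The commented line passes through untouched, while the live entry now ends in 25 instead of 10.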
Awk is another option:
awk -i inplace 'NF==4 && $4==10 { gsub("10","25",$4) }1' /etc/security/limits.conf
Check if the line has 4 space-delimited fields (NF==4) and the 4th field ($4) is 10. If this condition is met, replace 10 with 25 using gsub, and print all lines via the trailing 1.
The -i inplace option is GNU awk's in-place editing extension, available in more recent versions. If a compliant version is not available, use:
awk 'NF==4 && $4==10 { gsub("10","25",$4) }1' /etc/security/limits.conf > /etc/security/limits.tmp && mv -f /etc/security/limits.tmp /etc/security/limits.conf
Use this Perl one-liner, where \b stands for word break (so that 10 will not match 210 or 102):
perl -pe 's/\b10\b/25/g' in_file > out_file
Or to change the file in-place:
perl -i.bak -pe 's/\b10\b/25/g' in_file
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
-i.bak : Edit input files in-place (overwrite the input file). Before overwriting, save a backup copy of the original file by appending to its name the extension .bak.
The regex uses modifier /g : Match the pattern repeatedly.
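A quick check of the word-boundary behavior on a made-up line:
printf 'nproc limit: 10, open files: 1000\n' | perl -pe 's/\b10\b/25/g'
This prints nproc limit: 25, open files: 1000. The standalone 10 is changed, while 1000 is left alone.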
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlrequick: Perl regular expressions quick start

Perl deleting "blank" lines from a csv file

I'm looking to delete blank lines in a CSV file, using Perl.
I'm not too sure how to do this, as these lines aren't exactly "blank" (they're just a bunch of commas).
I'd also like to save the output as a file of the same name, overwriting the original.
How could I go about doing this?
edit: I can't use modules or any source code due to network restrictions...
You can do this using a simple Perl one-liner:
perl -i -ne 'print unless /^[,\s]*$/' <filename>
The -n flag assumes this loop around your program:
while(<>) {
print unless /^[,\s]*$/;
}
and the -i flag means inplace and modifies your input file.
Note: If you are worried about losing your data with -i, you can specify -i.bak and perl will automatically write the original file to your <filename>.bak
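For example (data.csv and its contents are made up for illustration):
printf 'a,b,c\n,,,\n1,2,3\n , ,\n' > data.csv
perl -i.bak -ne 'print unless /^[,\s]*$/' data.csv
Afterwards data.csv contains only the a,b,c and 1,2,3 lines, and the original is saved as data.csv.bak.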
More of a command line hack,
perl -i -ne 'print if /[^,\r\n]/' file.csv
If you want to put it inside a shell script you can do this ...
#!/bin/sh
perl -i -n -e 'print $_ unless ($_ =~ /^\,+$/);' "$@"
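Assuming the script is saved as strip_blank_csv.sh (a name invented here), it can be run over one or more files:
sh strip_blank_csv.sh report1.csv report2.csv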

Perl regex to act on a file from the command line

In a file, say xyz.txt, I want to replace the pattern of any number followed by a dot (for example: 1., 2., 10., 11., etc.) with a whitespace.
How do I compose a perl command on the command line to act on the file and do the above, and what regex should be used?
Please Help
Thank You.
This HAS to be a Perl oneliner?
perl -i -pe 's/\d+\./ /g' <fileName>
The Perl command line options: -i is used to specify what happens to the input file. If you don't give it a file extension, the original file is lost and is replaced by the Perl munged output. For example, if I had this:
perl -i.bak -pe 's/\d+\./ /g' <fileName>
The original file would be stored with a .bak suffix and <fileName> itself would contain your output.
The -p means to enclose your Perl program in a print loop that looks SOMEWHAT like this:
while ($_ = <>) {
<Your Perl one liner>
print "$_";
}
This is a somewhat simplified explanation of what's going on. You can see the actual perl loop by doing a perldoc perlrun from the command line. The main idea is that it allows you to act on each line of a file just like sed or awk.
The -e simply contains your Perl command.
You can also use file redirection:
perl -pe 's/\d+\./ /g' < xyz.txt > xyz.txt.out
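A small sketch with made-up contents for xyz.txt:
printf 'See items 1. and 2. and also 10.\n' > xyz.txt
perl -pe 's/\d+\./ /g' xyz.txt
Each number-plus-dot token is replaced by a single space.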
Answer (not tested):
perl -ipe "s/\d+\./ /g" xyz.txt
Both
perl -ipe "s/\d+\./ /g" xyz.txt
and
perl -pie
cannot execute on my system. That is because -i takes its optional backup extension immediately after it, so -ipe is parsed as -i with the extension pe (and -pie as -p -i with the extension e); with no -e left, Perl treats the substitution as a script file name.
I use the following order:
perl -i -pe