To remove blank lines in data set


I need a one-liner using sed, awk or perl to remove blank lines from my data file. The data in my file looks like this:
Aamir
Ravi
Arun
Rampaul
Pankaj
Amit
Bianca
The blank lines occur at random positions anywhere in my data file. Can someone suggest a one-liner to remove them from my dataset?

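It can be done in many ways.
E.g. with awk (note that the pattern $0 is also false for a line that consists only of 0, so such a line is dropped too):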
awk '$0' yourFile
or sed:
sed '/^$/d' yourFile
or grep:
grep -v '^$' yourFile

A Perl solution. From the command line.
$ perl -i.bak -n -e'print if /\S/' INPUT_FILE
Edits the file in-place and creates a backup of the original file.

AWK Solution:
Here awk reads the input file line by line and checks whether each line has any fields. NF is AWK's built-in variable that holds the number of fields. If the line is empty or contains only whitespace, NF is 0. In this one-liner we test whether NF is true, i.e. non-zero. If it is, the line is printed, which is the implicit action in AWK when the pattern is true.
awk 'NF' INPUT_FILE
SED Solution:
This solution is similar to the ones mentioned in the other answers. As the syntax shows, -n suppresses automatic printing and !p prints only the lines that are not empty.
sed -n '/^$/!p' INPUT_FILE
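The patterns in these answers do not all treat whitespace-only lines the same way. Here is a minimal sketch of the difference, assuming a hypothetical sample (not the asker's real file) that contains one truly empty line and one line holding only spaces:

```shell
# Hypothetical sample: three names, one empty line, one spaces-only line.
sample=$(printf 'Aamir\n\nRavi\n   \nArun\n')

# /^$/ matches only lines with no characters at all,
# so the spaces-only line survives.
empty_only=$(printf '%s\n' "$sample" | sed '/^$/d')

# NF is 0 for empty and for whitespace-only lines, so both are dropped.
all_blank=$(printf '%s\n' "$sample" | awk 'NF')

printf 'sed /^$/d keeps %s lines\n' "$(printf '%s\n' "$empty_only" | wc -l | tr -d ' ')"
printf 'awk NF keeps %s lines\n' "$(printf '%s\n' "$all_blank" | wc -l | tr -d ' ')"
```

If the "blank" lines in the data may contain spaces or tabs, awk 'NF' or the Perl /\S/ test is probably the safer choice.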

You can do:
sed -i.bak '/^$/d' file
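To see what -i.bak does, here is a small sketch run on a throwaway file. The directory and file name are made up for the demo, and -i itself is an extension (GNU and BSD sed both accept the attached-suffix form), not part of POSIX sed:

```shell
# Work in a temporary directory so no real data is touched.
workdir=$(mktemp -d)
printf 'Aamir\n\nRavi\n' > "$workdir/names.txt"

# Edit in place; the original content is saved as names.txt.bak.
sed -i.bak '/^$/d' "$workdir/names.txt"

edited_lines=$(wc -l < "$workdir/names.txt" | tr -d ' ')
backup_lines=$(wc -l < "$workdir/names.txt.bak" | tr -d ' ')
echo "edited: $edited_lines lines, backup: $backup_lines lines"

rm -r "$workdir"
```

The backup still has the blank line, so the change can be undone by moving the .bak file back.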

A Perl solution:
perl -ni.old -e 'print unless /^\s*$/' file
...which creates a backup copy of the original file, suffixed with '.old'

With Perl it is as easy as with sed, awk, or grep.
$ cat tmp/tmpfile
Aamir
Ravi
Arun
Rampaul
Pankaj
Amit
Bianca
$ perl -i -pe 's{^\s*\n$}{}' tmp/tmpfile
$ cat tmp/tmpfile
Aamir
Ravi
Arun
Rampaul
Pankaj
Amit
Bianca

Related

Sed Remove 3 last digits from string

27211;18:05:03479;20161025;0;0;0;0;10991;0;10991;000;0;0;000;1000000;0;0;000;0;0;0;82
The second field (after the first ;) is a time in the form gg:mm:sssss. I just want it to be gg:mm:ss.
Like so:
27211;18:05:03;20161025;0;0;0;0;10991;0;10991;000;0;0;000;1000000;0;0;000;0;0;0;82
I tried with cut, but it deletes everything after the nth occurrence of a character, and for now I am stuck. Please help.

give this one liner a try:
awk -F';' -v OFS=";" 'sub(/...$/,"",$2)+1' file
It removes the last 3 chars from column 2.
update with sed one liner
If you are a fan of sed:
sed -r 's/(;[^;]*)...;/\1;/' file
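The same idea can be written without a regular expression. This is only a sketch, using substr and length on the second field and assuming (as in the question) that the field always ends with three surplus characters:

```shell
line='27211;18:05:03479;20161025;0;0;0;0;10991;0;10991;000;0;0;000;1000000;0;0;000;0;0;0;82'

# Keep everything in field 2 except its last three characters.
# Assigning to $2 makes awk rebuild the line with OFS, and the
# trailing 1 prints it.
trimmed=$(printf '%s\n' "$line" |
  awk -F';' -v OFS=';' '{ $2 = substr($2, 1, length($2) - 3) } 1')

printf '%s\n' "$trimmed"
```

Unlike the sed versions, this touches only field 2, so it cannot accidentally match a later field.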

With sed:
sed -r 's/^([^;]+;[^;]+)...;/\1;/' file
(Or)
sed -r 's/^([^;]+;[0-9]{2}:[0-9]{2}:[0-9]{2})...;/\1;/' file

It can also be something like sed -E 's/(([0-9]{2}:){2}[0-9]{2})[0-9]*;/\1;/'
It is not very clean, but at least it is clearer to me.
Regards

I'd use perl for this:
perl -pe 's/(?<=:\d\d)\d+(?=;)//' file
That removes any digits between "colon-digit-digit" and the semicolon (first match only, not globally in the line).
If you want to edit the file in-place: perl -i -pe ...

With sed:
sed -E 's/(:[0-9]{2})[0-9]{3}/\1/' file
or perl:
perl -pe's/:\d\d\B\K...//' file

split header of string

I want to reformat the lines below. Please see input example and desired output. I have been messing around with awk without finding the correct solution
Input:
>1-672762
TGAGGTAGTAGGTTGTATGGTT
>2-240457
TGAGGTAGTAGGTTGTGTGGTT
>3-130231
TAGCAGCACGTAAATATTGGCG
>4-116485
TGAGGTAGTAGGTTGTATAGTT
Output (needs to be tab separated):
TGAGGTAGTAGGTTGTATGGTT 672762
TGAGGTAGTAGGTTGTGTGGTT 240457
TAGCAGCACGTAAATATTGGCG 130231
TGAGGTAGTAGGTTGTATAGTT 116485

With perl :
$ perl -lne '/^>\d+-(\d+)/ or print "$_\t$1"' file
Output:
TGAGGTAGTAGGTTGTATGGTT 672762
TGAGGTAGTAGGTTGTGTGGTT 240457
TAGCAGCACGTAAATATTGGCG 130231
TGAGGTAGTAGGTTGTATAGTT 116485

Another approach in perl ("-" is chr(055)):
perl -wln055e's/(\S+)\s+(\S+).*/$2\t$1/s and print'
or
perl -wlp055e'BEGIN{<>}s/(\S+)\s+(\S+).*/$2\t$1/s'

$ awk -F- '/>/{x=$2;next} {print $0 "\t" x}' file
TGAGGTAGTAGGTTGTATGGTT 672762
TGAGGTAGTAGGTTGTGTGGTT 240457
TAGCAGCACGTAAATATTGGCG 130231
TGAGGTAGTAGGTTGTATAGTT 116485

This might work for you (GNU sed):
sed -r 'N;s/^[^-]*-(.*)\n(.*)/\2\t\1/' file

How to change part of the string using sed?

I have a file data.txt with the following strings:
text-common-1.1.1-SNAPSHOT.jar
text-special-common-2.1.2-SNAPSHOT.jar
some-text-variant-1.1.1-SNAPSHOT.jar
text-another-variant-text-3.3.3-SNAPSHOT.jar
I want to change all of the text-something-digits-something.jar to text-something-5.0.jar.
Here is my script with sed (GNU sed version 4.2.1), but it doesn't work, and I don't know why:
#!/bin/bash
for t in ./data.txt
do
sed -i "s/\(text-[a-z]*-(\d|\.)*\).*\(.jar\)/\15.0\2/" ${t}
done
What is wrong with my sed usage?

How about this awk
awk '/^text/ {sub(/[0-9].*\./,"5.0.")}1'
text-common-5.0.jar
text-special-common-5.0.jar
some-text-variant-1.1.1-SNAPSHOT.jar
text-another-variant-text-5.0.jar
Changing text-something-digits-something.jar to text-something-5.0.jar amounts to replacing digits-something with 5.0.
The /^text/ pattern also ensures that only lines starting with text are changed.

I think a simpler approach might be enough:
sed -r -e 's/(text-(.*-)?common-)([0-9\.]+)(-.*\.jar)/\15.0\4/' < your_data
Another way of saying the same thing with perl:
perl -pe 's/(text-(?:(.*-))*common-)([\d\.]+)(-.*\.jar)/${1}5.0${4}/' < your_data
Both only match names that contain common-.

#!/bin/bash
for t in ./data.txt
do
sed -i '/^text-/ s/[.0-9]\{1,\}-something\(\.jar\)$/5.0\1/' ${t}
# for "any" something
#sed -i '/^text-/ s/[.0-9]\{1,\}-[^?]\{1,\}\(\.jar\)$/5.0\1/' ${t}
done
This selects lines starting with text- and replaces the version digits, if present. The pattern has only one group, so the replacement must refer to it as \1.

Using sed:
sed '/^text-/ s/-[0-9.]*-/-5.0-/' file
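The command above keeps the -SNAPSHOT suffix. If the goal is exactly text-something-5.0.jar, a variant that also consumes everything up to .jar might look like this. It is a sketch that assumes the version always starts with a digit right after a dash, and that lines not starting with text- should stay untouched:

```shell
data=$(printf '%s\n' \
  'text-common-1.1.1-SNAPSHOT.jar' \
  'text-special-common-2.1.2-SNAPSHOT.jar' \
  'some-text-variant-1.1.1-SNAPSHOT.jar' \
  'text-another-variant-text-3.3.3-SNAPSHOT.jar')

# Match "-<version>-<anything>.jar" at the end of the line and
# replace the whole tail with "-5.0.jar".
renamed=$(printf '%s\n' "$data" |
  sed '/^text-/ s/-[0-9][0-9.]*-.*\.jar$/-5.0.jar/')

printf '%s\n' "$renamed"
```

This uses only basic regular expressions, so it does not need -r or -E.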

Add new line using awk, sed

I have a large file which is slightly corrupted. The new lines have disappeared. There should have been a new line at every 250th character. How can I fix that?
Thanks in advance.

How about
sed 's/.\{250\}/&\n/g'
The .\{250\} matches any 250 characters. The matched text (&) is replaced by itself plus a newline. Note that \n in the replacement is supported by GNU sed but is not specified by POSIX.

try this:
sed -r 's/.{250}/&\n/g'
gawk:
awk -v FPAT='.{1,250}' -v OFS='\n' '$1=$1'

There is a command in coreutils that can wrap lines, it is called fold:
fold -w 250
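One difference from the sed substitutions above shows up when the line length is an exact multiple of the width: sed appends a newline after the last chunk too, which leaves an extra empty line, while fold does not. A small sketch, using a width of 5 in place of 250 and a made-up 10-character line:

```shell
# sed: a newline is added after every 5 characters, including the last 5.
# The backslash-newline form is used because it also works outside GNU sed.
sed_lines=$(printf 'abcdefghij\n' | sed 's/.\{5\}/&\
/g' | wc -l | tr -d ' ')

# fold: breaks only where another character would exceed the width.
fold_lines=$(printf 'abcdefghij\n' | fold -w 5 | wc -l | tr -d ' ')

echo "sed: $sed_lines lines, fold: $fold_lines lines"
```

With the sed approach, the extra empty line can be removed afterwards with any of the blank-line one-liners.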

sed 's/^.\{250\}/&\
/;P;D' YourFile
This could be faster on a huge file.

An awk version
awk '{L=250;for (i=1;i<=length($0);i+=L) print substr($0,i,L)}'
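A quick way to convince yourself that the awk loop cuts in the right places is to run it with a small width. This sketch passes the width in with -v (the variable name width is arbitrary) and uses a made-up 12-character line:

```shell
# Print consecutive substrings of at most `width` characters.
chunks=$(printf 'abcdefghijkl\n' |
  awk -v width=5 '{ for (i = 1; i <= length($0); i += width) print substr($0, i, width) }')

printf '%s\n' "$chunks"
```

The last chunk is simply shorter when the line length is not a multiple of the width.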

Delete first and last line or record from file using sed

I want to delete the first and last line from a file.
Contents of file1:
H|ACCT|XEC|1|TEMP|20130215035845|
849002|48|1208004|1
849007|28|1208004|1
T|2
After delete the output should be
849002|48|1208004|1
849007|28|1208004|1
I have tried the method below, but I have to run it twice. I want a one-liner that removes both lines in one go!
sed '1,1d' file1.txt >> file1.out
sed '$d' file1.out >> file2
Please suggest one liner code....

You could use ;
sed '1d; $d' file

Use Command Separator
In sed, you can separate commands using a semicolon. For example:
sed '1d; $d' /path/to/file
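Here is the same command applied to the sample from the question, as a sketch that reads from a pipe instead of a file:

```shell
input=$(printf '%s\n' \
  'H|ACCT|XEC|1|TEMP|20130215035845|' \
  '849002|48|1208004|1' \
  '849007|28|1208004|1' \
  'T|2')

# 1d deletes the first line, $d deletes the last one.
# The single quotes stop the shell from expanding $d.
body=$(printf '%s\n' "$input" | sed '1d;$d')

printf '%s\n' "$body"
```

If the sed script is put in double quotes, the $ has to be escaped as \$d.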

How about:
sed '$d' < file1.txt | sed "1d"

Try sed -i '1d;$d' /path/to/file

awk 'NR>2{print v}{v=$0}'
Starting with line 3, print the previous line each time. This means the first and last lines will not be printed.
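The one-line-delay trick can be checked against the sed version on a tiny input. A sketch with made-up four-line input:

```shell
lines=$(printf 'first\nsecond\nthird\nfourth\n')

# awk: v always holds the previous line; printing starts at line 3,
# so line 1 is never printed and the last line is stored but never printed.
awk_out=$(printf '%s\n' "$lines" | awk 'NR>2{print v}{v=$0}')

# sed: delete line 1 and the last line directly.
sed_out=$(printf '%s\n' "$lines" | sed '1d;$d')

printf '%s\n' "$awk_out"
```

Both commands should give the same result here; the awk form works because it never needs to know in advance which line is the last.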
