split header of string - perl

I want to reformat the lines below. Please see the input example and the desired output. I have been messing around with awk without finding the correct solution.
Input:
>1-672762
TGAGGTAGTAGGTTGTATGGTT
>2-240457
TGAGGTAGTAGGTTGTGTGGTT
>3-130231
TAGCAGCACGTAAATATTGGCG
>4-116485
TGAGGTAGTAGGTTGTATAGTT
Output (needs to be tab separated):
TGAGGTAGTAGGTTGTATGGTT 672762
TGAGGTAGTAGGTTGTGTGGTT 240457
TAGCAGCACGTAAATATTGGCG 130231
TGAGGTAGTAGGTTGTATAGTT 116485

With perl:
$ perl -lne '/^>\d+-(\d+)/ or print "$_\t$1"' file
Output:
TGAGGTAGTAGGTTGTATGGTT 672762
TGAGGTAGTAGGTTGTGTGGTT 240457
TAGCAGCACGTAAATATTGGCG 130231
TGAGGTAGTAGGTTGTATAGTT 116485

Another approach in perl ("-" is chr(055)):
perl -wln055e's/(\S+)\s+(\S+).*/$2\t$1/s and print'
or
perl -wlp055e'BEGIN{<>}s/(\S+)\s+(\S+).*/$2\t$1/s'

$ awk -F- '/>/{x=$2;next} {print $0 "\t" x}' file
TGAGGTAGTAGGTTGTATGGTT 672762
TGAGGTAGTAGGTTGTGTGGTT 240457
TAGCAGCACGTAAATATTGGCG 130231
TGAGGTAGTAGGTTGTATAGTT 116485

This might work for you (GNU sed):
sed -r 'N;s/^[^-]*-(.*)\n(.*)/\2\t\1/' file

Related

Remove all the characters from string after last '/'

I have the following input file and I need to remove all the characters that appear after the last '/' in each string. My expected output is shown below.
input:
/start/one/two/stopone.js
/start/one/two/three/stoptwo.js
/start/one/stopxyz.js
expected output:
/start/one/two/
/start/one/two/three/
/start/one/
I have tried to use sed but with no luck so far.
You could simply use good old grep:
grep -o '.*/' file.txt
This simple expression takes advantage of the fact that grep matches greedily, meaning it will consume as many characters as possible, including any /, up to the last / in the path.
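A quick way to see the greedy match in action is to feed the sample paths from the question through the same expression. This is only a sketch; the variable names input and result are made up for illustration.

```shell
# Sample paths copied from the question.
input='/start/one/two/stopone.js
/start/one/two/three/stoptwo.js
/start/one/stopxyz.js'

# '.*/' is greedy, so the match runs up to the LAST slash on each line;
# -o prints only the matched part of the line.
result=$(printf '%s\n' "$input" | grep -o '.*/')
printf '%s\n' "$result"
```

Note that a line containing no slash at all produces no match, so grep -o would print nothing for it.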
Original Answer:
You can use dirname (note that it drops the trailing slash):
while read -r line ; do
dirname "$line"
done < file.txt
or sed:
sed 's~\(.*/\).*~\1~' file.txt
perl -lne 'print $1 if m{(.*/)}' your_file
Try this GNU sed command,
$ sed -r 's~^(.*\/).*$~\1~g' file
/start/one/two/
/start/one/two/three/
/start/one/
Through awk,
awk -F/ '{sub(/.*/,"",$NF); print}' OFS="/" file

How to change part of the string using sed?

I have a file data.txt with the following strings:
text-common-1.1.1-SNAPSHOT.jar
text-special-common-2.1.2-SNAPSHOT.jar
some-text-variant-1.1.1-SNAPSHOT.jar
text-another-variant-text-3.3.3-SNAPSHOT.jar
I want to change all of the text-something-digits-something.jar to text-something-5.0.jar.
Here is my script with sed (GNU sed version 4.2.1), but it doesn't work and I don't know why:
#!/bin/bash
for t in ./data.txt
do
sed -i "s/\(text-[a-z]*-(\d|\.)*\).*\(.jar\)/\15.0\2/" ${t}
done
What is wrong with my sed usage?
How about this awk
awk '/^text/ {sub(/[0-9].*\./,"5.0.")}1'
text-common-5.0.jar
text-special-common-5.0.jar
some-text-variant-1.1.1-SNAPSHOT.jar
text-another-variant-text-5.0.jar
Changing text-something-digits-something.jar to text-something-5.0.jar
amounts to replacing digits-something with 5.0.
The /^text/ pattern also makes sure that only lines starting with text are changed.
I think a simpler approach might be enough: sed -r -e 's/(text-(.*-)?common-)([0-9\.]+)(-.*\.jar)/\15.0\4/' < your_data.
Another way of saying the same thing with perl: perl -pe 's/(text-(?:(.*-))*common-)([\d\.]+)(-.*\.jar)/${1}5.0${4}/' < your_data.
#!/bin/bash
for t in ./data.txt
do
sed -i '/^text-/ s/[.0-9]\{1,\}-something\(\.jar\)$/5.0\1/' ${t}
# for "any" something
#sed -i '/^text-/ s/[.0-9]\{1,\}-[^?]\{1,\}\(\.jar\)$/5.0\1/' ${t}
done
This selects the lines starting with text- and replaces the version digits where they are present.
Using sed:
sed '/^text-/ s/-[0-9.]*-/-5.0-/' file
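As for what is wrong with the original attempt: sed does not treat \d as a digit class, and in its default (basic) regex syntax an unescaped (, | or ) is a literal character, so the pattern never matches the version number. A minimal sketch of one possible fix follows, assuming the version is always followed by a single qualifier such as SNAPSHOT; the variable names are illustrative only.

```shell
# Sample lines copied from the question.
input='text-common-1.1.1-SNAPSHOT.jar
text-special-common-2.1.2-SNAPSHOT.jar
some-text-variant-1.1.1-SNAPSHOT.jar
text-another-variant-text-3.3.3-SNAPSHOT.jar'

# On lines starting with "text-", replace "-<version>-<qualifier>.jar"
# with "-5.0.jar". [0-9] stands in for \d, and -E enables unescaped groups.
result=$(printf '%s\n' "$input" |
  sed -E '/^text-/ s/-[0-9][0-9.]*-[^-]*(\.jar)$/-5.0\1/')
printf '%s\n' "$result"
```

Once the output looks right, the same expression can be used with sed -i on data.txt.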

Grep numbers between colon and comma

I want to grep all results that show over 70 percent usage.
Example of output:
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":69,"dir":"/root"},
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":79,"dir":"/oracle"},
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":1,"dir":"/oradump"},
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":90,"dir":"/archive"},
Expected View after the grep:
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":79,"dir":"/oracle"},
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":90,"dir":"/archive"},
Awk is more suited here:
$ awk -F'[:,]' '$6>70' file
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":79,"dir":"/oracle"},
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":90,"dir":"/archive"},
Or with Perl:
$ perl -ne'print if /"percentage":([0-9]+),/ and $1 > 70'
(no pesky separator counting needed)
perl -F'[:,]' -ane 'print if $F[5]>70' file
GNU sed (this pattern covers 71 through 99; a value of 100 would not be matched):
sed -n '/:[0]\?70,/d;/:[0-1]\?[7-9][0-9],/p' file
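If counting separators or enumerating digit ranges feels fragile, another option is to let awk find the "percentage" key by name and compare its value numerically. The sketch below is one way to do that; the function name over_70 is hypothetical, and it assumes the value is a plain non-negative integer as in the sample.

```shell
# Hypothetical helper: print input lines whose "percentage" value exceeds 70.
over_70() {
  awk '
    # match() sets RSTART and RLENGTH to the position and length of the hit.
    match($0, /"percentage":[0-9]+/) {
      # Skip the 13 characters of the quoted key plus colon, keep the digits.
      value = substr($0, RSTART + 13, RLENGTH - 13) + 0
      if (value > 70) print
    }'
}

# Sample lines copied from the question.
input='{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":69,"dir":"/root"},
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":79,"dir":"/oracle"},
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":1,"dir":"/oradump"},
{"ipaddr":"1.1.1.1","hostname":"host1.test.com","percentage":90,"dir":"/archive"},'

result=$(printf '%s\n' "$input" | over_70)
printf '%s\n' "$result"
```

Because the comparison is numeric, a value of 100 is handled the same way as any other number.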

To remove blank lines in data set

I need a one liner using sed, awk or perl to remove blank lines from my data file. The data in my file looks like this -
Aamir
Ravi
Arun
Rampaul
Pankaj
Amit
Bianca
These blanks are at random and appear anywhere in my data file. Can someone suggest a one-liner to remove these blank lines from my dataset.
It can be done in many ways.
e.g. with awk (note that this also drops a line consisting only of 0, since awk treats that value as false):
awk '$0' yourFile
or sed:
sed '/^$/d' yourFile
or grep:
grep -v '^$' yourFile
A Perl solution. From the command line.
$ perl -i.bak -n -e'print if /\S/' INPUT_FILE
Edits the file in-place and creates a backup of the original file.
AWK Solution:
Here awk reads the input file line by line and checks whether each line has any fields. NF is AWK's built-in variable holding the number of fields in the current line. If the line is empty (or contains only whitespace), NF is 0. In this one-liner we test whether NF is true, i.e. non-zero. If it is, the line is printed, which is AWK's implicit action when the pattern is true.
awk 'NF' INPUT_FILE
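One practical difference between the NF test and the /^$/ based commands is how they treat lines that contain only spaces. The sketch below uses a small made-up sample (not the asker's real file) to show it; the variable names are illustrative.

```shell
# Four lines: text, an empty line, a line holding only two spaces, text.
input=$(printf 'Aamir\n\n  \nRavi')

# NF is 0 for empty and whitespace-only lines, so both are dropped.
with_nf=$(printf '%s\n' "$input" | awk 'NF')

# /^$/ matches only truly empty lines, so the spaces-only line survives.
with_sed=$(printf '%s\n' "$input" | sed '/^$/d')

printf '%s\n' "$with_nf"
echo '---'
printf '%s\n' "$with_sed"
```

Which behaviour is wanted depends on whether whitespace-only lines count as blank in the data set.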
SED Solution:
This solution is similar to the ones already mentioned. As the syntax shows, -n suppresses the default output and !p prints only the lines that are not empty.
sed -n '/^$/!p' INPUT_FILE
You can do:
sed -i.bak '/^$/d' file
A Perl solution:
perl -ni.old -e 'print unless /^\s*$/' file
...which creates a backup copy of the original file, suffixed with '.old'
With Perl it is as easy as with sed, awk, or grep.
$ cat tmp/tmpfile
Aamir
Ravi
Arun
Rampaul
Pankaj
Amit
Bianca
$ perl -i -pe 's{^\s*\n$}{}' tmp/tmpfile
$ cat tmp/tmpfile
Aamir
Ravi
Arun
Rampaul
Pankaj
Amit
Bianca

Perl regex to act on a file from the command line

In a file, say xyz.txt, I want to replace every occurrence of a number followed by a dot (for example 1., 2., 10., 11. etc.) with a whitespace.
How do I compose a perl command on the command line to do this to the file, and what regex should I use?
Please Help
Thank You.
This HAS to be a Perl oneliner?
perl -i -pe 's/\d+\./ /g' <fileName>
The Perl command line options: -i is used to specify what happens to the input file. If you don't give it a file extension, the original file is lost and is replaced by the Perl munged output. For example, if I had this:
perl -i.bak -pe 's/\d+\./ /g' <fileName>
The original file would be stored with a .bak suffix and <fileName> itself would contain your output.
The -p means to enclose your Perl program in a print loop that looks SOMEWHAT like this:
while ($_ = <>) {
<Your Perl one liner>
print "$_";
}
This is a somewhat simplified explanation of what's going on. You can see the actual perl loop by running perldoc perlrun from the command line. The main idea is that it allows you to act on each line of a file just like sed or awk.
The -e simply contains your Perl command.
You can also use file redirection:
perl -pe 's/\d+\./ /g' < xyz.txt > xyz.txt.out
Answer (not tested):
perl -ipe "s/\d+\./ /g" xyz.txt
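Neither
perl -ipe "s/\d+\./ /g" xyz.txt
nor
perl -pie
works on my system. Perl takes everything that directly follows -i as the backup suffix, so -ipe means -i with the suffix pe (and no -p or -e at all), and -pie means -p plus -i with the suffix e.
I use the following order:
perl -i -pe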