Replace compound number with sed

I need to replace a list of numeric values with text, but my sed command can't distinguish between numbers such as 0, 1 and 10. For example:
0 - papaya
1 - banana
2 - apple
10 - strawberry
11 - lemon
sed 's/0/papaya/;s/1/banana/;s/2/apple/;s/10/strawberry/;s/11/lemon/'
cat fruits | sed 's/0/papaya/;s/1/banana/;s/2/apple/;s/10/strawberry/;s/11/lemon/'
papaya
banana
apple
bananapapaya
banana1
The values 10 and 11 are replaced incorrectly: they should become strawberry and lemon, but I get bananapapaya and banana1 instead.
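One possible fix, sketched here under the assumption that every input line holds exactly one number and nothing else, is to anchor each pattern with ^ and $ so that 1 can no longer match inside 10 or 11. The function name replace_fruits is made up for this illustration.

```shell
# Hypothetical helper: reads one number per line on stdin, prints the fruit name.
# Assumes each line contains only the number (no other text, no spaces).
replace_fruits() {
    sed 's/^0$/papaya/;s/^1$/banana/;s/^2$/apple/;s/^10$/strawberry/;s/^11$/lemon/'
}

printf '0\n1\n2\n10\n11\n' | replace_fruits
```

If the numbers sit inside longer lines, the anchors will not match. GNU sed's word-boundary operators (\b, or \< and \>) might help in that case, but whether that applies depends on input the question does not show.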

Related

How to skip a line every two lines starting by skipping the first line?

Here's my code: ls -lt | sed -n 'p;n'
This prints every other line of the file listing, but it starts by printing the first line instead of skipping it. How can I make it skip the first line?
Here's an example without the skip code, to make it clear:
And here's an example of the output when I use the skip code:
You have to invert your sed command: it should be n;p instead of p;n:
Your code:
for x in {1..20}; do echo $x ; done | sed -n 'p;n'
1
3
5
7
9
11
13
15
17
19
The version with sed inverted:
for x in {1..20}; do echo $x ; done | sed -n 'n;p'
Output:
2
4
6
8
10
12
14
16
18
20
With GNU sed you can use the first~step address form (2~2 is the variant that skips the first line):
$ seq 1 10 | sed -n '1~2p'
1
3
5
7
9
$ seq 1 10 | sed -n '2~2p'
2
4
6
8
10
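Since first~step is a GNU extension, a portable alternative may be handy on systems that ship a different sed. A minimal sketch: awk's NR holds the current line number, so keeping only even values of NR skips the first line. The function name every_other is hypothetical.

```shell
# Hypothetical helper: print every second line of stdin, starting with line 2.
every_other() {
    awk 'NR % 2 == 0'
}

seq 1 10 | every_other
```

Changing the condition to NR % 2 == 1 would give the odd-numbered lines, which matches the original p;n behaviour.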

prepend text to every n:th line in a textfile

This sed command-line script prepends text to every line in a file:
sed -i 's/^/to be prepended/g' text.txt
How can I make it do that only on every nth line?
I am working with sequencing data. In the "normal" multiple FASTA format there is first an identifier line starting with a > followed by additional text.
The next line holds a DNA sequence such as "AATTGCC". When that string ends, there is a new line with a new identifier. How can I prepend text (additional bases) to the beginning of each sequence line?
Just use the following GNU sed syntax:
sed '0~Ns/^/to be prepended/'
#    ^^^
#    set N to the number you want!
For example, to prepend HA to lines whose numbers are multiples of 4:
$ seq 10 | sed '0~4s/^/HA/'
1
2
3
HA4
5
6
7
HA8
9
10
Or to those whose numbers are of the form 4N+1:
$ seq 10 | sed '1~4s/^/HA/'
HA1
2
3
4
HA5
6
7
8
HA9
10
From the sed manual → 3.2. Selecting lines with sed:
first~step
This GNU extension matches every stepth line starting with line first. In particular, lines will be selected when there exists a non-negative n such that the current line-number equals first + (n * step). Thus, to select the odd-numbered lines, one would use 1~2; to pick every third line starting with the second, ‘2~3’ would be used; to pick every fifth line starting with the tenth, use ‘10~5’; and ‘50~0’ is just an obscure way of saying 50.
By the way, there is no need to use /g for global replacement, since ^ matches only once per line.
$ seq 10 | perl -pe's/^/to be prepended / unless $. % 3'
1
2
to be prepended 3
4
5
to be prepended 6
7
8
to be prepended 9
10
$ seq 10 | perl -pe's/^/to be prepended / unless $. % 3 - 1'
to be prepended 1
2
3
to be prepended 4
5
6
to be prepended 7
8
9
to be prepended 10
$ seq 10 | perl -pe's/^/to be prepended / unless $. % 3 - 2'
1
to be prepended 2
3
4
to be prepended 5
6
7
to be prepended 8
9
10
Here is another idea, using awk:
seq 15|awk -v line=4 'NR%line==0{$0="Prepend this text : " $0}1'
1
2
3
Prepend this text : 4
5
6
7
Prepend this text : 8
9
10
11
Prepend this text : 12
13
14
15
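For the FASTA case described in the question, counting lines may not be necessary at all. Assuming every identifier line starts with > and each record has its sequence on a single line, a minimal sketch is to modify only the lines that do not match /^>/. The bases ACGT and the function name prepend_bases are placeholders, not values taken from the question.

```shell
# Hypothetical helper: prepend the text given in $1 to every non-header line.
# Assumes header lines start with ">" and that the sequence of each record
# is on one line (wrapped sequences would get the prefix on every line).
prepend_bases() {
    awk -v pre="$1" '!/^>/ { $0 = pre $0 } 1'
}

printf '>seq1 sample\nAATTGCC\n>seq2 sample\nGGCCTTA\n' | prepend_bases ACGT
```

Matching on the leading > rather than on the line number keeps the command correct even if the file has a stray blank or comment line that would shift the every-nth-line count.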

Character count (length) within specific column

Is there a one-line method to obtain character length for strings held within a specific column of a tab-delimited .txt file and then append these counts onto the final column (number of columns may be variable)?
Sample Data:
1 AA
2 BBB
3 CCCCC
4 EE
5 DDD
6 AAA
7 FFFFF
8 AA
9 BBB
10 NNN
To get the counts, I have attempted to use:
perl -lane 'print length $F[2]' in > out
perl -F, -Mopen=:locale -lane 'print length $F[2]' in > out
However, the results are empty.
I have also tried:
perl -lane '$_.=$F[2]; print length $_'
But this, as I now realise, prints the number of characters for the entire line rather than a specific column.
I am not sure how I would then append the final column.
Desired Output (when counting column 2):
1 AA 2
2 BBB 3
3 CCCCC 5
4 EE 2
5 DDD 3
6 AAA 3
7 FFFFF 5
8 AA 2
9 BBB 3
10 NNN 3
It seems that you were close. Perl array indices start at zero, so the second column is $F[1], not $F[2]. You will also need some sort of separator between the line and the count:
perl -lape '$_ .= "\t". length($F[1])' input
output
1 AA 2
2 BBB 3
3 CCCCC 5
4 EE 2
5 DDD 3
6 AAA 3
7 FFFFF 5
8 AA 2
9 BBB 3
10 NNN 3
If you want the output exactly as you show, then you will need to use printf like this:
perl -lane 'printf qq{%-4d%-8s%d\n}, @F, length($F[1])' input
output
1   AA      2
2   BBB     3
3   CCCCC   5
4   EE      2
5   DDD     3
6   AAA     3
7   FFFFF   5
8   AA      2
9   BBB     3
10  NNN     3
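The same idea can be sketched in awk, which may be convenient when the column number needs to be a parameter. This assumes the file really is tab-delimited; the function name append_length and its argument are hypothetical.

```shell
# Hypothetical helper: append the character length of tab-separated
# column number $1 to each line read from stdin.
append_length() {
    awk -F '\t' -v OFS='\t' -v col="$1" '{ print $0, length($col) }'
}

printf '1\tAA\n2\tBBB\n3\tCCCCC\n' | append_length 2
```

Because the count is appended to $0 rather than to a fixed field, this works regardless of how many columns each line has.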

Find "N" minimum and "N" maximum values with respect to a column in the file and print the specific rows

I have a tab delimited file such as
Jack 2 98 F
Jones 6 25 51.77
Mike 8 11 61.70
Gareth 1 85 F
Simon 4 76 4.79
Mark 11 12 38.83
Tony 7 82 F
Lewis 19 17 12.83
James 12 1 88.83
I want to find the N minimum values and N maximum values (more than 5) in the last column and print the rows that have those values. I want to ignore the rows with F. For example, if I want the two minimum and two maximum values in the data above, my output would be
Minimum case
Simon 4 76 4.79
Lewis 19 17 12.83
Maximum case
James 12 1 88.83
Mike 8 11 61.70
I can ignore the rows that do not have a numeric value in the fourth column using
awk -F "\t" '$4+0 != $4{next}1' inputfile.txt
I can also pipe this output and find one minimum value using
awk -F "\t" '$4+0 != $4{next}1' inputfile.txt |awk 'NR == 1 || $4 < min {line = $0; min = $4}END{print line}'
and similarly for the maximum value, but how can I extend this to more than one value, such as 2 values in the toy example above and 10 in my real data?
n can be a variable; in this case I set n=3. Note that this may give wrong results if several lines have the same value in the last column, because the array is keyed by that value. It also needs GNU awk for asorti with a sort specifier.
kent$ awk -v n=3 '$NF+0==$NF{a[$NF]=$0}
END{ asorti(a,k,"@ind_num_asc")
print "min:"
for(i=1;i<=n;i++) print a[k[i]]
print "max:"
for(i=length(a)-n+1;i<=length(a);i++)print a[k[i]]}' f
min:
Simon 4 76 4.79
Lewis 19 17 12.83
Mark 11 12 38.83
max:
Jones 6 25 51.77
Mike 8 11 61.70
James 12 1 88.83
You can get the minimum and maximum at once with a little redirection:
minmaxlines=2
( ( grep -v 'F$' inputfile.txt | sort -n -k4 | tee /dev/fd/4 | head -n $minmaxlines >&3 ) 4>&1 | tail -n $minmaxlines ) 3>&1
Here's a pipeline approach to the problem.
$ grep -v 'F$' inputfile.txt | sort -nk 4 | head -2
Simon 4 76 4.79
Lewis 19 17 12.83
$ grep -v 'F$' inputfile.txt | sort -nk 4 | tail -2
Mike 8 11 61.70
James 12 1 88.83
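A variation on the pipeline, offered as a sketch: it reuses the numeric test from the question instead of matching F, takes N as an argument, and keeps rows with tied values because sort does not collapse duplicates. The function names numeric_rows, min_rows and max_rows are hypothetical, and the input is assumed to be tab-delimited with the value in column 4.

```shell
# Hypothetical helpers for a tab-delimited file whose 4th column holds the value.

# Keep only the rows whose 4th column is numeric (same test as in the question).
numeric_rows() {
    awk -F '\t' '$4 + 0 == $4' "$1"
}

# usage: min_rows FILE N   -> the N rows with the smallest values, ascending
min_rows() {
    numeric_rows "$1" | LC_ALL=C sort -t "$(printf '\t')" -k4,4n | head -n "$2"
}

# usage: max_rows FILE N   -> the N rows with the largest values, descending
max_rows() {
    numeric_rows "$1" | LC_ALL=C sort -t "$(printf '\t')" -k4,4nr | head -n "$2"
}

# Example call (file name taken from the question):
# min_rows inputfile.txt 2; max_rows inputfile.txt 2
```

Sorting in reverse and taking the head gives the maximum rows largest first, which is the order shown in the desired output.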

print lines if $2<25 from text files with sed or awk

I would like to print $1 and $2 if $2<25 from text files. I also need to get the total number of students with marks less than 25 from all files. How can I do this with awk or sed?
students marks
jerry 12
peter 35
john 5
jerry 15
john 10
Desired output
jerry 12
john 5
jerry 15
john 10
Total no:of students :- 4
In awk:
$ awk '$2<25 {print; i++} END{print "\nTotal number of students:- "i}' file
Output:
jerry 12
john 5
jerry 15
john 10
Total number of students:- 4
If you want the output sorted by grade (lowest to highest):
$ sort -n -k2,2 file | awk '$2<25 {print; i++} END{print "\nTotal number of students:- "i}'
Sorted Output:
john 5
john 10
jerry 12
jerry 15
Total number of students:- 4
-n numerical sort;
-k2,2 sort on the second field.
awk '$2<25{count++ ; print}END{print "Total No of Students :-",count}' your_file
tested below:
> awk '$2<25{count++ ; print}END{print "Total No of Students :-",count}' temp
jerry 12
john 5
jerry 15
john 10
Total No of Students :- 4
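The question mentions a total across all files, so here is a sketch that takes several files at once. It assumes each file begins with a "students marks" header line, which FNR == 1 skips per file; the function name low_marks and the file names in the example call are hypothetical.

```shell
# Hypothetical helper: list students with marks below LIMIT across several files.
# usage: low_marks LIMIT FILE...
low_marks() {
    limit=$1
    shift
    awk -v limit="$limit" '
        FNR == 1 { next }                  # skip the header line of each file
        NF >= 2 && $2 + 0 < limit {        # ignore blank or incomplete lines
            print $1, $2
            count++
        }
        END { print "Total number of students:- " (count + 0) }
    ' "$@"
}

# Example call:
# low_marks 25 class1.txt class2.txt
```

awk resets FNR for every input file but keeps count running, which is why a single END block can report the combined total.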