How to skip a line every two lines starting by skipping the first line? - sh

Here's my code: ls -lt | sed -n 'p;n'
That code prints every other line when listing file names, but it doesn't start by skipping the first one. How can I make that happen?
Here's an example without the skip code, to make it clear:
And here's an example of when I use the skip code:

You have to invert your sed command: it should be n;p instead of p;n:
Your code:
for x in {1..20}; do echo $x ; done | sed -n 'p;n'
1
3
5
7
9
11
13
15
17
19
The version with sed inverted:
for x in {1..20}; do echo $x ; done | sed -n 'n;p'
Output:
2
4
6
8
10
12
14
16
18
20
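The order matters because, with -n, sed prints only when it reaches p, and n replaces the pattern space with the next input line. So 'p;n' prints a line and then swallows the following one, while 'n;p' swallows first and prints second. A minimal sketch comparing the two, plus an awk equivalent (the function names are made up for illustration):

```shell
# Hypothetical helper names; each reads its input from stdin.
odd_lines()      { sed -n 'p;n'; }        # keeps lines 1, 3, 5, ...
even_lines()     { sed -n 'n;p'; }        # keeps lines 2, 4, 6, ...
even_lines_awk() { awk 'NR % 2 == 0'; }   # same selection as even_lines, using awk

# Usage: prints b and d.
printf '%s\n' a b c d e | even_lines
```

For the question's case this would be used as ls -lt | even_lines.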

You can use GNU sed's first~step address form:
$ seq 1 10 | sed -n '1~2p'
1
3
5
7
9
$ seq 1 10 | sed -n '2~2p'
2
4
6
8
10
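Since first~step is a GNU extension, a portable fallback can be handy. The following is only a sketch of the same selection rule in awk, assuming step is greater than zero; select_lines is a hypothetical name, not an existing tool:

```shell
# select_lines FIRST STEP (hypothetical helper): mimics GNU
# sed -n 'FIRST~STEPp' for STEP > 0. Reads stdin.
select_lines() {
    awk -v first="$1" -v step="$2" \
        'NR >= first && (NR - first) % step == 0'
}

# Usage: every third line starting with the second (prints 2, 5, 8).
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 | select_lines 2 3
```

A step of 0 is not handled here (it would be a division by zero in awk), so that corner of the sed syntax is left out on purpose.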

Related

prepend text to every n:th line in a textfile

This sed command-line script prepends text to every line in a file:
sed -i 's/^/to be prepended/g' text.txt
How can I make it do that only on every nth line?
I am working with sequencing data, and in the "normal" multi-FASTA format there is first an identifier line starting with a > followed by additional text.
The next line holds a DNA sequence such as "AATTGCC" and so on; when that string is done there is a new line with a new identifier. How can I prepend text (additional bases) to the beginning of the sequence line?
Just use the following GNU sed syntax:
sed '0~Ns/^/to be prepended/'
# ^^^
# set N to the number you want!
For example, prepend HA to lines whose numbers are multiples of 4:
$ seq 10 | sed '0~4s/^/HA/'
1
2
3
HA4
5
6
7
HA8
9
10
Or to those whose numbers are of the form 4N+1:
$ seq 10 | sed '1~4s/^/HA/'
HA1
2
3
4
HA5
6
7
8
HA9
10
From the sed manual → 3.2. Selecting lines with sed:
first~step
This GNU extension matches every stepth line starting with line first. In particular, lines will be selected when there exists a non-negative n such that the current line-number equals first + (n * step). Thus, to select the odd-numbered lines, one would use 1~2; to pick every third line starting with the second, ‘2~3’ would be used; to pick every fifth line starting with the tenth, use ‘10~5’; and ‘50~0’ is just an obscure way of saying 50.
By the way, there is no need to use /g for global replacement, since ^ can just be replaced once on every line.
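For the FASTA case described in the question, counting lines only works if every sequence sits on exactly one line. A possible alternative is to key on the > header instead. This is a sketch under that assumption about the goal; prepend_bases is a hypothetical helper and ACGT is just sample text:

```shell
# prepend_bases PREFIX (hypothetical helper): reads FASTA on stdin and
# prepends PREFIX to the first sequence line after each ">" header.
prepend_bases() {
    awk -v prefix="$1" '
        /^>/         { print; after_header = 1; next }   # identifier line, unchanged
        after_header { print prefix $0; after_header = 0; next }
                     { print }                            # further wrapped sequence lines
    '
}

# Usage with two made-up records.
printf '%s\n' '>seq1 sample' 'AATTGCC' '>seq2' 'GGCCA' | prepend_bases ACGT
```

Unlike the line-number approach, this leaves continuation lines of a wrapped sequence alone.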
$ seq 10 | perl -pe's/^/to be prepended / unless $. % 3'
1
2
to be prepended 3
4
5
to be prepended 6
7
8
to be prepended 9
10
$ seq 10 | perl -pe's/^/to be prepended / unless $. % 3 - 1'
to be prepended 1
2
3
to be prepended 4
5
6
to be prepended 7
8
9
to be prepended 10
$ seq 10 | perl -pe's/^/to be prepended / unless $. % 3 - 2'
1
to be prepended 2
3
4
to be prepended 5
6
7
to be prepended 8
9
10
Here is an idea using awk:
seq 15|awk -v line=4 'NR%line==0{$0="Prepend this text : " $0}1'
1
2
3
Prepend this text : 4
5
6
7
Prepend this text : 8
9
10
11
Prepend this text : 12
13
14
15

print every 4 columns to one row in perl or awk

Would you please help me convert every 4 consecutive rows into one tab-separated row?
convert:
A
1
2
3
3
3
4
1
to :
A 1 2 3
3 3 4 1
A simple way to do this is to use xargs:
$ xargs -n4 < file
A 1 2 3
3 3 4 1
With awk you would do:
$ awk '{printf "%s%s",$0,(NR%4?FS:RS)}' file
A 1 2 3
3 3 4 1
Another flexible approach is to use pr:
$ pr -tas' ' --columns 4 file
A 1 2 3
3 3 4 1
Both the awk and pr solutions can be easily modified to change the output separator to a TAB:
$ pr -at --columns 4 file
A 1 2 3
3 3 4 1
$ awk '{printf "%s%s",$0,(NR%4?OFS:RS)}' OFS='\t' file
A 1 2 3
3 3 4 1
$ perl -pe 's{\n$}{\t} if $. % 4' old.file > new.file
or simply (thanks to mpapec's comment):
$ perl -pe 'tr_\n_\t_ if $. % 4' old.file > new.file
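The awk one-liners above assume the line count is a multiple of 4; otherwise the last output row ends without a newline. A sketch of a parameterized variant that terminates a short final row as well (join_rows is a hypothetical name):

```shell
# join_rows N (hypothetical helper): joins every N input lines into one
# tab-separated row. Reads stdin.
join_rows() {
    awk -v n="$1" -v OFS='\t' '
        { printf "%s%s", ((NR - 1) % n ? OFS : ""), $0 }  # separator before all but the first item of a row
        NR % n == 0 { printf "%s", ORS }                  # row is complete
        END { if (NR % n) printf "%s", ORS }              # close a short final row
    '
}

# Usage: nine input lines give two full rows and one short row.
printf '%s\n' A 1 2 3 3 3 4 1 9 | join_rows 4
```

Writing the separator before each item rather than after it avoids a trailing TAB on a short final row.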

Find "N" minimum and "N" maximum values with respect to a column in the file and print the specific rows

I have a tab delimited file such as
Jack 2 98 F
Jones 6 25 51.77
Mike 8 11 61.70
Gareth 1 85 F
Simon 4 76 4.79
Mark 11 12 38.83
Tony 7 82 F
Lewis 19 17 12.83
James 12 1 88.83
I want to find the N minimum values and N maximum values (more than 5) in the last column and print the rows that have those values. I want to ignore the rows with F. For example, if I want the two minimum and two maximum values in the above data, my output would be
Minimum case
Simon 4 76 4.79
Lewis 19 17 12.83
Maximum case
James 12 1 88.83
Mike 8 11 61.70
I can ignore the rows that do not have a numeric value in the fourth column using
awk -F "\t" '$4+0 != $4{next}1' inputfile.txt
I can also pipe this output and find one minimum value using
awk -F "\t" '$4+0 != $4{next}1' inputfile.txt |awk 'NR == 1 || $4 < min {line = $0; min = $4}END{print line}'
and similarly for the maximum value, but how can I extend this to more than one value, such as 2 values in the toy example above and 10 for my real data?
n could be a variable; in this case, I set n=3. Note: this may have problems if there are lines with the same value in the last column, because the value is used as the array key. It also requires GNU awk for asorti.
kent$ awk -v n=3 '$NF+0==$NF{a[$NF]=$0}
END{ asorti(a,k,"@ind_num_asc")
print "min:"
for(i=1;i<=n;i++) print a[k[i]]
print "max:"
for(i=length(a)-n+1;i<=length(a);i++)print a[k[i]]}' f
min:
Simon 4 76 4.79
Lewis 19 17 12.83
Mark 11 12 38.83
max:
Jones 6 25 51.77
Mike 8 11 61.70
James 12 1 88.83
You can get the minimum and maximum at once with a little redirection:
minmaxlines=2
( ( grep -v 'F$' inputfile.txt | sort -n -k4 | tee /dev/fd/4 | head -n $minmaxlines >&3 ) 4>&1 | tail -n $minmaxlines ) 3>&1
Here's a pipeline approach to the problem.
$ grep -v 'F$' inputfile.txt | sort -nk 4 | head -2
Simon 4 76 4.79
Lewis 19 17 12.83
$ grep -v 'F$' inputfile.txt | sort -nk 4 | tail -2
Mike 8 11 61.70
James 12 1 88.83
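The pieces above can be combined into one helper that takes N as a parameter and keeps rows with duplicate values, since nothing is stored under the value as a key. This is only a sketch; extreme_rows is a hypothetical name, and it assumes the value of interest is both the last and the fourth field:

```shell
# extreme_rows FILE N min|max (hypothetical helper): prints the N rows
# with the smallest or largest numeric value in field 4.
extreme_rows() {
    file=$1 n=$2 mode=$3
    case $mode in
        min) order=n  ;;   # ascending numeric sort
        max) order=nr ;;   # descending numeric sort
        *)   echo "usage: extreme_rows FILE N min|max" >&2; return 2 ;;
    esac
    # Keep only rows whose last field is numeric, then sort and cut.
    awk '$NF + 0 == $NF' "$file" | LC_ALL=C sort -k4,4"$order" | head -n "$n"
}

# Usage on a small tab-delimited sample taken from the question.
data=$(mktemp)
printf '%s\t%s\t%s\t%s\n' Jack 2 98 F Jones 6 25 51.77 Simon 4 76 4.79 \
    Lewis 19 17 12.83 James 12 1 88.83 > "$data"
extreme_rows "$data" 2 min
extreme_rows "$data" 2 max
rm -f "$data"
```

LC_ALL=C is set so that sort treats "." as the decimal point regardless of the user's locale.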

bash merge files by matching columns

I do have two files:
File1
12 abc
34 cde
42 dfg
11 df
9 e
File2
23 abc
24 gjr
12 dfg
8 df
I want to merge files column by column (if column 2 is the same) for the output like this:
File1 File2
12 23 abc
42 12 dfg
11 8 df
34 NA cde
9 NA e
NA 24 gjr
How can I do this?
I tried it like this:
cat File* >> tmp; sort tmp | uniq -c | awk '{print $2}' > column2; for i in
$(cat column2); do grep -w "$i" File*
But this is where I am stuck...
Don't know how after greping I should combine files column by column & write NA where value is missing.
Hope someone could help me with this.
Since I was testing with bash 3.2 running as sh (which does not support process substitution when invoked as sh), I used two temporary files to get the data ready for use with join:
$ sort -k2b File2 > f2.sort
$ sort -k2b File1 > f1.sort
$ cat f1.sort
12 abc
34 cde
11 df
42 dfg
9 e
$ cat f2.sort
23 abc
8 df
12 dfg
24 gjr
$ join -1 2 -2 2 -o 1.1,2.1,0 -a 1 -a 2 -e NA f1.sort f2.sort
12 23 abc
34 NA cde
11 8 df
42 12 dfg
9 NA e
NA 24 gjr
$
With process substitution, you could write:
join -1 2 -2 2 -o 1.1,2.1,0 -a 1 -a 2 -e NA <(sort -k2b File1) <(sort -k2b File2)
If you want the data formatted differently, use awk to post-process the output:
$ join -1 2 -2 2 -o 1.1,2.1,0 -a 1 -a 2 -e NA f1.sort f2.sort |
> awk '{ printf "%-5s %-5s %s\n", $1, $2, $3 }'
12 23 abc
34 NA cde
11 8 df
42 12 dfg
9 NA e
NA 24 gjr
$
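If sorting is unwanted, the same merge can be sketched in awk alone, keeping File1's original order. This is a hypothetical alternative rather than a replacement for join: it assumes both files are non-empty and that each key appears at most once per file. merge_by_key is a made-up name:

```shell
# merge_by_key FILE1 FILE2 (hypothetical helper): full outer join on
# column 2, printing "value1 value2 key" with NA for missing values.
merge_by_key() {
    # FILE2 is read first so that FILE1 can be streamed in its own order.
    awk '
        NR == FNR { val2[$2] = $1; order[++n] = $2; next }   # remember FILE2
        {
            print $1, (($2 in val2) ? val2[$2] : "NA"), $2   # every FILE1 row
            seen[$2] = 1
        }
        END {
            for (i = 1; i <= n; i++)                         # keys only in FILE2
                if (!(order[i] in seen))
                    print "NA", val2[order[i]], order[i]
        }
    ' "$2" "$1"
}

# Usage with the two files from the question:
#   merge_by_key File1 File2
```

The NR == FNR test is what limits this to a non-empty FILE2: with an empty first input, the rows of FILE1 would be mistaken for it.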

SED: How to remove every 10 lines in a file (thin or subsample the file)

I have this so far:
sed -n '0,10p' yourfile > newfile
But it is not working, just outputs a blank file :(
Your question is ambiguous, so here is every permutation I can think of:
Print only the first 10 lines
head -n10 yourfile > newfile
Skip the first 10 lines
tail -n +11 yourfile > newfile
Print every 10th line
awk '!(NR%10)' yourfile > newfile
Delete every 10th line
awk 'NR%10' yourfile > newfile
(Since an ambiguous question can only have an ambiguous answer...)
To print every tenth line (GNU sed):
$ seq 1 100 | sed -n '0~10p'
10
20
30
40
...
100
Alternatively (GNU sed):
$ seq 1 100 | sed '0~10!d'
10
20
30
40
...
100
To delete every tenth line (GNU sed):
$ seq 1 100 | sed '0~10d'
1
...
9
11
...
19
21
...
29
31
...
39
41
...
To print the first ten lines (POSIX):
$ seq 1 100 | sed '11,$d'
1
2
3
4
5
6
7
8
9
10
To delete the first ten lines (POSIX):
$ seq 1 100 | sed '1,10d'
11
12
13
14
...
100
python -c "import sys;sys.stdout.write(''.join(line for i, line in enumerate(open('yourfile')) if i%10 == 0 ))" >newfile
It is longer, but it is a single language, without different syntax and parameters for each thing one tries to do. Note that because enumerate starts at 0, this keeps lines 1, 11, 21, and so on.
With non-GNU sed, to print every 10th line use
sed -n '10,${p;n;n;n;n;n;n;n;n;n;}'
(GNU : sed -n '0~10p')
and to delete every 10th line use
sed 'n;n;n;n;n;n;n;n;n;d;'
(GNU : sed '0~10d')
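Typing the chain of n commands by hand gets tedious for other step sizes. As a sketch, the script can be generated for any N; delete_every_nth is a hypothetical helper, and it assumes N is at least 1:

```shell
# delete_every_nth N (hypothetical helper): reads stdin and drops lines
# N, 2N, 3N, ... using only the n and d commands.
delete_every_nth() {
    # Build "n;n;...;d" with N-1 copies of "n;".
    script=$(awk -v n="$1" 'BEGIN { for (i = 1; i < n; i++) printf "n;"; print "d" }')
    sed "$script"
}

# Usage: drops lines 3 and 6.
printf '%s\n' 1 2 3 4 5 6 7 | delete_every_nth 3
```

With N set to 10 this builds the same n;n;...;d chain as the hand-written command above.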