print every 4 rows as one row in perl or awk

Would you please help me convert every 4 sequential rows into one tab-separated row?
convert:
A
1
2
3
3
3
4
1
to :
A 1 2 3
3 3 4 1

A simple way to do this is to use xargs:
$ xargs -n4 < file
A 1 2 3
3 3 4 1
With awk you would do:
$ awk '{printf "%s%s",$0,(NR%4?FS:RS)}' file
A 1 2 3
3 3 4 1
Another flexible approach is to use pr:
$ pr -tas' ' --columns 4 file
A 1 2 3
3 3 4 1
Both the awk and pr solutions can easily be modified to change the output separator to a TAB:
$ pr -at --columns 4 file
A 1 2 3
3 3 4 1
$ awk '{printf "%s%s",$0,(NR%4?OFS:RS)}' OFS='\t' file
A 1 2 3
3 3 4 1

$ perl -pe 's{\n$}{\t} if $. % 4' old.file > new.file
or simply (thanks to mpapec's comment):
$ perl -pe 'tr_\n_\t_ if $. % 4' old.file > new.file
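For readers who would rather script this than use a one-liner, the same grouping can be sketched in Python. This is only an illustration of the idea behind the answers above; the function name group_lines and its parameters are hypothetical, not part of any of the tools mentioned.

```python
def group_lines(lines, n=4, sep="\t"):
    """Join every n consecutive lines into one sep-separated row."""
    items = [line.rstrip("\n") for line in lines]
    # Step through the input n items at a time; a short final group is kept as-is.
    return [sep.join(items[i:i + n]) for i in range(0, len(items), n)]


rows = group_lines(["A\n", "1\n", "2\n", "3\n", "3\n", "3\n", "4\n", "1\n"])
print("\n".join(rows))
```

Passing sep=" " gives the space-separated form that xargs -n4 prints.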

Related

Why does sed (insert line) output spaces between each character?

I have split a larger data file into individual 2-column files for each field. This results in something like this:
0.00 3.02211e+07
1.00 3.02211e+07
2.00 3.02211e+07
3.00 3.02211e+07
4.00 3.02211e+07
5.00 3.01295e+07
6.00 3.00608e+07
7.00 2.99768e+07
When I try to add a row via sed,
sed -i '1i pressure-prof' myfile.txt
the output has a space character between each character (including existing spaces). If I look in Notepad++, the extra spaces appear as the ASCII "NULL". In the terminal it looks like this:
pressure-prof
0 . 0 0 3 . 0 2 2 1 1 e + 0 7
1 . 0 0 3 . 0 2 2 1 1 e + 0 7
2 . 0 0 3 . 0 2 2 1 1 e + 0 7
3 . 0 0 3 . 0 2 2 1 1 e + 0 7
4 . 0 0 3 . 0 2 2 1 1 e + 0 7
5 . 0 0 3 . 0 1 2 9 5 e + 0 7
6 . 0 0 3 . 0 0 6 0 8 e + 0 7
7 . 0 0 2 . 9 9 7 6 8 e + 0 7
This is on Windows, and I think sed is being provided by cygwin or msys2. I don't know if that has anything to do with the output format issues.
Yes, I can resort to opening the files in a text editor and adding the row that way. I would like to be able to use sed in the future though.
Thanks for any thoughts and assistance.
cat myfile.txt | tr -d ' ' | sed 's/./0 /4' | sed '1s/0 //' > mf2 && mv mf2 myfile.txt
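Run that after you've finished adding your rows. tr first deletes every space, the first sed then replaces the fourth character of each line with "0 " to put the column separator back (this assumes the first column is always four characters wide, as in 0.00), and the second sed strips the "0 " that this inserts into the header on line 1.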
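The NUL bytes seen in Notepad++ hint that the file may be UTF-16 encoded rather than ASCII. That is an assumption; the question does not confirm it. If it holds, sed inserts a one-byte-per-character header into a two-bytes-per-character file, and the terminal shows the NULs as gaps. A minimal Python sketch of detecting a UTF-16 byte order mark and re-encoding as UTF-8 follows; the function name to_utf8 is hypothetical.

```python
import codecs


def to_utf8(raw: bytes) -> bytes:
    """Re-encode UTF-16 data (recognised by its BOM) as UTF-8; pass other data through."""
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec uses the BOM to pick the byte order and drops it.
        return raw.decode("utf-16").encode("utf-8")
    return raw


sample = "0.00 3.02211e+07\n".encode("utf-16")  # BOM plus two bytes per character
print(to_utf8(sample))
```

Once the file is single-byte text, an insert such as sed -i '1i pressure-prof' should no longer produce the gaps, assuming the encoding really was the cause.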

How to skip a line every two lines starting by skipping the first line?

Here's my code: ls -lt | sed -n 'p;n'
That code skips every other line when listing file names, but it doesn't start by skipping the first one. How can I make that happen?
Here's an example without my skip code, to make it clear:
And here's an example of when I use the skip code:
You have to invert your sed command: it should be n;p instead of p;n:
Your code:
for x in {1..20}; do echo $x ; done | sed -n 'p;n'
1
3
5
7
9
11
13
15
17
19
The version with sed inverted:
for x in {1..20}; do echo $x ; done | sed -n 'n;p'
Output:
2
4
6
8
10
12
14
16
18
20
You can use GNU sed's ~ address form, first~step:
$ seq 1 10 | sed -n '1~2p'
1
3
5
7
9
$ seq 1 10 | sed -n '2~2p'
2
4
6
8
10
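The first~step address can be mimicked in a few lines of Python, which may help in seeing exactly which line numbers it selects. This is a sketch of the selection rule only, not of sed itself; select_step is a hypothetical name.

```python
def select_step(lines, first, step):
    """Keep the lines whose 1-based number is first, first+step, first+2*step, ..."""
    if step <= 0:
        # With a zero step only line number `first` itself is selected.
        return [line for number, line in enumerate(lines, start=1) if number == first]
    return [line for number, line in enumerate(lines, start=1)
            if number >= first and (number - first) % step == 0]


numbers = [str(i) for i in range(1, 11)]
print(select_step(numbers, 2, 2))
```

Here select_step(numbers, 2, 2) corresponds to sed -n '2~2p', which skips the first line and then prints every second one.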

prepend text to every n:th line in a textfile

This sed command-line script prepends text to every line in a file:
sed -i 's/^/to be prepended/g' text.txt
How can I make it do that only on every nth line?
I am working with sequencing data, and in the "normal" multiple FASTA format there is first an identifier line starting with a > followed by additional text.
The next line holds a random DNA sequence such as "AATTGCC"; when that string is done, a new line starts with a new identifier. How can I prepend text (additional bases) to the beginning of each sequence line?
Just use the following GNU sed syntax:
sed '0~Ns/^/to be prepended/'
# ^^^
# set N to the number you want!
For example, prepend HA to the lines whose numbers are multiples of 4:
$ seq 10 | sed '0~4s/^/HA/'
1
2
3
HA4
5
6
7
HA8
9
10
Or to those of the form 4N+1:
$ seq 10 | sed '1~4s/^/HA/'
HA1
2
3
4
HA5
6
7
8
HA9
10
From the sed manual → 3.2. Selecting lines with sed:
first~step
This GNU extension matches every stepth line starting with line first. In particular, lines will be selected when there exists a non-negative n such that the current line-number equals first + (n * step). Thus, to select the odd-numbered lines, one would use 1~2; to pick every third line starting with the second, ‘2~3’ would be used; to pick every fifth line starting with the tenth, use ‘10~5’; and ‘50~0’ is just an obscure way of saying 50.
By the way, there is no need to use /g for global replacement, since ^ can just be replaced once on every line.
$ seq 10 | perl -pe's/^/to be prepended / unless $. % 3'
1
2
to be prepended 3
4
5
to be prepended 6
7
8
to be prepended 9
10
$ seq 10 | perl -pe's/^/to be prepended / unless $. % 3 - 1'
to be prepended 1
2
3
to be prepended 4
5
6
to be prepended 7
8
9
to be prepended 10
$ seq 10 | perl -pe's/^/to be prepended / unless $. % 3 - 2'
1
to be prepended 2
3
4
to be prepended 5
6
7
to be prepended 8
9
10
You get the idea.
seq 15|awk -v line=4 'NR%line==0{$0="Prepend this text : " $0}1'
1
2
3
Prepend this text : 4
5
6
7
Prepend this text : 8
9
10
11
Prepend this text : 12
13
14
15
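Applied to the FASTA case from the question, the same modulo test can be sketched in Python. The sketch assumes each sequence occupies exactly one line, so that sequences sit on every second line. The function name prepend_every_nth and the sample records are made up for illustration.

```python
def prepend_every_nth(lines, n, prefix, offset=0):
    """Prepend prefix to lines whose 1-based number leaves remainder offset modulo n."""
    return [prefix + line if number % n == offset % n else line
            for number, line in enumerate(lines, start=1)]


# Hypothetical two-record FASTA file: header lines are odd, sequence lines are even.
fasta = [">seq1 sample", "AATTGCC", ">seq2 sample", "GGCCTTA"]
print("\n".join(prepend_every_nth(fasta, 2, "HA")))
```

With n=4 and the default offset this matches sed '0~4s/^/HA/', and offset=1 matches sed '1~4s/^/HA/'.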

Character count (length) within specific column

Is there a one-line method to obtain character length for strings held within a specific column of a tab-delimited .txt file and then append these counts onto the final column (number of columns may be variable)?
Sample Data:
1 AA
2 BBB
3 CCCCC
4 EE
5 DDD
6 AAA
7 FFFFF
8 AA
9 BBB
10 NNN
To get the counts, I have attempted to use:
perl -lane 'print length $F[2]' in > out
perl -F, -Mopen=:locale -lane 'print length $F[2]' in > out
However, the results are empty.
I have also tried:
perl -lane '$_.=$F[2]; print length $_'
But this, as I now realise, prints the number of characters for the entire line rather than a specific column.
I am not sure how I would then append the final column.
Desired Output (when counting column 2):
1 AA 2
2 BBB 3
3 CCCCC 5
4 EE 2
5 DDD 3
6 AAA 3
7 FFFFF 5
8 AA 2
9 BBB 3
10 NNN 3
It seems that you were close. Perl array indices start at zero, so how about using the length of $F[1]? You will also need some sort of separator:
perl -lape '$_ .= "\t". length($F[1])' input
output
1 AA 2
2 BBB 3
3 CCCCC 5
4 EE 2
5 DDD 3
6 AAA 3
7 FFFFF 5
8 AA 2
9 BBB 3
10 NNN 3
If you want the output exactly as you show, then you will need to use printf like this
perl -lane 'printf qq{%-4d%-8s%d\n}, @F, length($F[1])' input
output
1 AA 2
2 BBB 3
3 CCCCC 5
4 EE 2
5 DDD 3
6 AAA 3
7 FFFFF 5
8 AA 2
9 BBB 3
10 NNN 3
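The off-by-one between "column 2" and index 1 is easy to see when the same step is written out in Python. This is a minimal sketch assuming tab-delimited input; append_length and its 1-based column parameter are hypothetical.

```python
def append_length(line, column=2, sep="\t"):
    """Append the character count of the given 1-based column as a new last field."""
    fields = line.rstrip("\n").split(sep)
    # column is 1-based for the caller, list indices are 0-based.
    return sep.join(fields + [str(len(fields[column - 1]))])


for row in ["1\tAA", "2\tBBB", "3\tCCCCC"]:
    print(append_length(row))
```

Because the count is appended after whatever fields are present, this also works when the number of columns varies from file to file.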

replace character with increasing numbers per line

I have large matrix files consisting of only "0" and "a" in columns and I want to do what this does:
perl -pe 'BEGIN { our $i = 1; } s/a/($i++)/ge;' < FILE > NEW_FILE
but only increment once for each line instead of every instance on each line.
So if my first line in the file is:
0 0 a a a
The perl command gives me:
0 0 1 2 3
While I would want
0 0 1 1 1
and on the next line for instance 2 0 2 0 2 and so on...
This should be possible to do with awk, but using:
awk '{ i=1; gsub(/a/,(i+1));print}' tmp2
just gives me 0's and 2's for all lines...
Just increment before, not on every substitution:
awk '{i++; gsub(/a/,i)}1' file
This way, the variable gets updated once per line (record), not once per substitution.
The same applies to the Perl script:
perl -pe 'BEGIN { our $i = 0; } $i++; s/a/$i/ge;' file
Test
$ cat a
0 0 a a a
2 3 a a a
$ awk '{i++; gsub(/a/,i)}1' a
0 0 1 1 1
2 3 2 2 2
$ perl -pe 'BEGIN { our $i = 0; } $i++; s/a/$i/ge;' a
0 0 1 1 1
2 3 2 2 2
You can simply replace every occurrence of a with the current line number
perl -pe 's/a/$./g' FILE > NEW_FILE
perl -pe'$i++;s/a/$i/g'
or if you like to increment only for lines with any substitution
perl -pe'/a/&&$i++;s/a/$i/g'
In action:
$ cat a
0 0 a a a
1 2 0 0 0
2 3 a a a
$ perl -pe'$i++;s/a/$i/g' a
0 0 1 1 1
1 2 0 0 0
2 3 3 3 3
$ perl -pe'/a/&&$i++;s/a/$i/g' a
0 0 1 1 1
1 2 0 0 0
2 3 2 2 2
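The difference between the two counting strategies (increment on every line, or only on lines that contain an a) can be sketched in Python as well. The function number_by_line and its only_matching flag are hypothetical names used for illustration.

```python
def number_by_line(lines, target="a", only_matching=False):
    """Replace every target in a line with a counter that advances once per line."""
    counter = 0
    result = []
    for line in lines:
        # Advance on every line, or only on lines that contain the target.
        if not only_matching or target in line:
            counter += 1
        result.append(line.replace(target, str(counter)))
    return result


matrix = ["0 0 a a a", "1 2 0 0 0", "2 3 a a a"]
print(number_by_line(matrix))
print(number_by_line(matrix, only_matching=True))
```

The default behaves like perl -pe'$i++;s/a/$i/g', while only_matching=True behaves like perl -pe'/a/&&$i++;s/a/$i/g'.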