Rearrange row to column awk or shell [duplicate]

This question already has answers here:
An efficient way to transpose a file in Bash
(33 answers)
Closed 6 years ago.
Given a text file file.txt, transpose its content.
For example, if file.txt has the following content:
name age
alice 21
ryan 30
Output the following:
name alice ryan
age 21 30

With shell utils cut and paste:
for f in 1 2 ; do cut -d ' ' -f $f file.txt ; done | paste -d ' ' - - -
Outputs:
name alice ryan
age 21 30
How it works: the field separator in file.txt is a space, so both cut and paste (whose default delimiter is a tab) need the -d ' ' option. We know in advance that file.txt has two columns, so the for loop makes two passes: pass 1 has cut select column 1 from file.txt, pass 2 selects column 2. When the loop ends, what has been fed to the pipe '|' is one value per line:
name
alice
ryan
age
21
30
paste then joins those lines three at a time, one per hyphen, because file.txt has three rows.
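The cut and paste loop needs the column and row counts up front. As a sketch of one way around that, the function below swaps in awk and stores every cell before printing, so it assumes the file fits in memory and that fields are separated by whitespace. The name transpose is made up for illustration.

```shell
# Hypothetical helper: transpose whitespace-separated input of any size
# that fits in memory. Reads the named files, or stdin if none are given.
transpose() {
    awk '
    {
        # Remember every cell, indexed by (row, column).
        for (i = 1; i <= NF; i++)
            cell[NR, i] = $i
        if (NF > ncols) ncols = NF
    }
    END {
        # Each input column becomes one output row.
        for (i = 1; i <= ncols; i++) {
            line = cell[1, i]
            for (j = 2; j <= NR; j++)
                line = line " " cell[j, i]
            print line
        }
    }' "$@"
}
```

Calling transpose file.txt on the sample input should give the same two lines as the cut and paste version.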

Related

Insert filename into text file with sed

I've been learning about sed and finding it very useful, but cannot find an answer to this in any of the many guides and examples ... I'd like to insert the filename of a text file, minus its path and extension, into a specific line within the text itself. Possible?

In such cases the right starting point is the man page. sed itself offers no way to strip the path and extension from a file name, but it does support inserting text before or after a line.
As a result, you need to extract the filename separately, store it in a variable, and inject that text before or after the line you want.
Example:
$ a="/home/gv/Desktop/PythonTests/cpu.sh"
$ a="${a##*/}";echo "$a"
cpu.sh
$ a="${a%.*}"; echo "$a"
cpu
$ cat file1
LOCATION 0 X 0
VALUE 1a 2 3
VALUE 1b 2 3
VALUE 1c 2 3
$ sed "2a $a" file1 # Inject the contents of variable $a after line2
LOCATION 0 X 0
VALUE 1a 2 3
cpu
VALUE 1b 2 3
VALUE 1c 2 3
$ sed "2i $a" file1 # Inject the contents of variable $a before line 2
LOCATION 0 X 0
cpu
VALUE 1a 2 3
VALUE 1b 2 3
VALUE 1c 2 3
$ sed "2a George" file1 #Inject a fixed string "George" after line 2
LOCATION 0 X 0
VALUE 1a 2 3
George
VALUE 1b 2 3
VALUE 1c 2 3
Explanation:
a="${a##*/}" : Removes everything from the beginning of the string up to and including the last slash / (longest match)
a="${a%.*}" : Removes everything from the last dot . to the end of the string (shortest match). Use %% instead to remove from the first dot onwards (longest match).
sed "2a $a" : Insert after line 2 the contents of variable $a
sed "2i $a" : Insert before line 2 the contents of variable $a
Optionally, you can use sed -i to make the changes in place, in the file being processed.
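The steps above can be combined into one small function. This is only a sketch: the name insert_basename is invented, it uses the POSIX multi-line form of the a command rather than the GNU one-liner form, and it assumes the file name contains no backslashes or newlines.

```shell
# Hypothetical helper: print file $1 with its own base name (no path,
# no last extension) inserted after line number $2.
insert_basename() {
    name=${1##*/}      # drop everything up to the last slash
    name=${name%.*}    # drop the last dot and what follows it
    # "a\" followed by a newline and the text is the POSIX spelling of
    # the append command; the doubled backslash survives the double quotes.
    sed "${2}a\\
$name" "$1"
}
```

For example, insert_basename /home/gv/Desktop/PythonTests/cpu.sh 2 would print that file with a line reading cpu after line 2.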

Regarding "I've been learning about sed": you may have been wasting your time, as there isn't a lot TO learn about sed beyond s/old/new. Sure, there's a ton of other language constructs and things you could do with sed, but in practice you should avoid them all and simply use awk instead. If you edit your question to include concise, testable sample input and expected output and add an awk tag, then we can show you how to do whatever you want to do the right way.
Meanwhile, it sounds like all you need is:
$ cat /usr/tmp/file
a
b
c
d
e
$ awk 'NR==3{print gensub(/.*\//,"",1,FILENAME)} 1' /usr/tmp/file
a
b
file
c
d
e
The above inserts the current file name before line 3 of the open file. It uses GNU awk for gensub(), with other awks you'd just use sub() and a variable.
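For awks without gensub(), the "sub() and a variable" variant mentioned above might look like the following sketch. The function name and its two arguments are made up for illustration; the awk program itself uses only POSIX features.

```shell
# Hypothetical helper: print file $2 with its base name inserted
# before line number $1, without relying on GNU awk.
insert_filename_awk() {
    awk -v n="$1" '
    NR == n {
        name = FILENAME          # copy, because sub() edits its target in place
        sub(/.*\//, "", name)    # strip the directory part
        print name
    }
    1                            # print every input line unchanged
    ' "$2"
}
```

insert_filename_awk 3 /usr/tmp/file should reproduce the output shown above.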

removing just the middle of the file with sed

I want to use sed as part of a pipeline to keep just the first 10 and last 10 lines of its input. It will not be working on physical files, only reading from STDIN and writing to STDOUT. The amount of data in the stream is bigger than the machine's RAM (or its disk space), so it needs to be relatively efficient. It must also work in stream mode, without creating temporary files (there are no writable filesystems).
Extra bonus if it could display one line instead of all of the middle it deleted:
for example, if I had input lines containing numbers from 1 to 100000, I would need it to output (line with literal <cut> text would be nice, but is optional):
1
2
3
4
5
6
7
8
9
10
<cut>
99991
99992
99993
99994
99995
99996
99997
99998
99999
100000
The best I've come up with is to have it output first 10 lines, and last one 1 line with:
yes ' ' | head -n 100000 |nl | \
sed -e '$q;11,$d'
which outputs
1
2
3
4
5
6
7
8
9
10
100000
but I need it to output more context (10 lines instead of just 1) at the end of data too.
Update: length of the input stream is unknown and will vary, 100000 above is just an example.
Update: as noted in the question and the tag, I need it in sed, not awk, perl or other programming languages in which it is more easy to accomplish (that requirement, along with no tmp files, is due to fact it is embedded system with limited commands and resources available)
Update: if the input is fewer than those 10+10 lines, it should ideally just print the whole input

You can try following command:
sed -n 'H; 1,10 { p; b }; g; s/\n[^\n]*//; h; $ { s/\n/<cut>\n/; p }'
sed has two buffers for holding content: the pattern space and the hold space. The first holds the line currently being processed; the second can be used as a backup. The approach is to keep the last ten lines processed in the hold space.
H appends every line to the hold space and g copies the hold space back into the pattern space; the substitution then removes the oldest line and h saves the result to the hold space again. On the last line ($), the <cut> marker is added in front and the result is printed.
The whole command:
yes ' ' | head -n 100000 |nl|\
sed -n 'H; 1,10 { p; b }; g; s/\n[^\n]*//; h; $ { s/\n/<cut>\n/; p }'
Yields:
1
2
3
4
5
6
7
8
9
10
<cut>
99991
99992
99993
99994
99995
99996
99997
99998
99999
100000
That said, follow the advice from Ed Morton, because awk is simpler and easier to debug or modify some weeks later.
UPDATE:
You can append to the hold space only after the first ten lines, and check whether it holds more than 10 newline characters before removing the oldest line, treating it as a FIFO structure:
sed -n '1,10 { p; b }; H; g; /\(\n[^\n]\+\)\{11\}/ s/\n[^\n]*//; h; $ { s/^\n//; p }'
Now it's more challenging to know where to add the <cut> string in the edge case of 20 input lines, but I will leave it as an exercise for you.

sed is for simple substitutions on a single line, that is all. For anything else, including this task, you should be using awk:
$ cat tst.awk
BEGIN { beg=(beg?beg:3); end=(end?end:3) }
NR<=beg
{ rec[(NR-1)%end+1] = $0 }
END {
    print "<cut>"
    for (i=1;i<=end;i++) {
        print rec[(NR+i-1)%end+1]
    }
}
$ seq 10 | awk -f tst.awk
1
2
3
<cut>
8
9
10
$ seq 10 | awk -v beg=2 -v end=4 -f tst.awk
1
2
<cut>
7
8
9
10
I see you've added a "it has to be sed" requirement to your question but I'll leave this answer here for future readers looking for a sensible way to perform the task.

This might work for you (GNU sed):
sed '1,10b;:a;$!{N;s/\n/&/10;Ta;D};i\<cut>' file
Print the first 10 lines as normal. Collect the next 11 lines and if it is not the end of file, delete the first of them and repeat always maintaining the last 10 lines. At the end of the file, insert a line containing <cut> and print the remaining 10 lines.
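With 11 to 20 input lines, the one-liner above still emits <cut> even though nothing was removed. The sketch below is one possible way to handle that, assuming GNU sed (for the T command): it leaves a flag in the hold space the first time a line is dropped and only inserts the marker if that flag is set. The function name head_tail_cut and the flag letter c are arbitrary choices.

```shell
# Hypothetical helper: keep the first and last 10 lines of stdin and
# print <cut> only when lines were actually removed (GNU sed assumed).
head_tail_cut() {
    sed '1,10b
:a
$!{
N
s/\n/&/10
# Fewer than 11 buffered lines: keep collecting.
Ta
# Window overflowed: leave the flag "c" in the hold space,
# then drop the oldest buffered line and restart.
x
s/.*/c/
x
D
}
# Last line reached: without the flag, print the buffer as is.
x
/c/!{
x
b
}
x
i\
<cut>'
}
```

With this version, seq 15 | head_tail_cut should print all 15 lines unchanged, while seq 25 | head_tail_cut should print 1 to 10, <cut>, then 16 to 25.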

Remove last 5 characters of string in Unix

I need to trim the last 8 characters of a file name
Example:
Input: Vignesh.dat12345678
Expected output: Vignesh.dat
I tried using rev but it didn't work.

Your question title says that you need to remove 5 characters, while the description says 8!
General form:
echo Vignesh.dat12345678 | rev | cut -c X- | rev
Assuming you want to remove 8 characters,
echo Vignesh.dat12345678 | rev | cut -c 9- | rev
The above code will remove the last 8 characters. To remove n characters, simply put (n+1) in place of X.
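If rev is not available, the same trimming can be sketched with POSIX parameter expansion alone, removing one trailing character per loop pass. The function name strip_last is invented for this example.

```shell
# Hypothetical helper: print string $1 without its last $2 characters.
strip_last() {
    s=$1
    n=$2
    # ${s%?} removes exactly one trailing character; stop early if the
    # string runs out before n characters have been removed.
    while [ "$n" -gt 0 ] && [ -n "$s" ]; do
        s=${s%?}
        n=$((n - 1))
    done
    printf '%s\n' "$s"
}
```

strip_last Vignesh.dat12345678 8 should print Vignesh.dat.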

sed editor: How to remove all fields except ones that I want in text file

I have a file that contains multiple lines with fields (from FIX protocol) like this:
35=V|311=123|515=ABC|825=BBB|9803=AKEFP Oct 12|55=1
35=V|311=456|515=CDE|825=CCC|9803=BUF Nov|55=33|66=8
I need to remove all fields except 311 and 9803, so for the above lines I want to receive:
311=123|9803=AKEFP Oct 12
311=456|9803=BUF Nov
How is it possible to do this with sed editor (or with another application)?

If the format of your data is really consistent and always has the same number of columns in the same order you can do it easily with awk
awk -F'|' '{print $2 "|" $5}' file.dat
This command sets the field separator to | and then prints the second and fifth fields of each line. If the structure of your data file is not that consistent and you actually have to pattern match, you can use the following more complicated awk expression
awk -F'|' '/311|9803/{for(i=1;i<=NF;++i){if($i~/311|9803/)printf "%s|", $i} printf "\n"}' file.dat
This will output
311=123|9803=AKEFP Oct 12|
311=456|9803=BUF Nov|
Note the trailing |; if that is really a problem, you can strip it afterwards.
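As a sketch of a variant that avoids the trailing | and only matches the tag number itself (so a value such as 55=311 is not picked up), the fields can be collected into a string and joined before printing. The function name keep_tags is made up, and the two tags are hard-coded as in the question.

```shell
# Hypothetical helper: keep only the 311 and 9803 fields of each
# |-separated line, joined by | with no trailing separator.
keep_tags() {
    awk -F'|' '
    {
        out = ""
        for (i = 1; i <= NF; i++)
            # Anchor on the start of the field and the "=" sign so that
            # only the tag number is compared, not the value.
            if ($i ~ /^(311|9803)=/)
                out = (out == "" ? $i : out "|" $i)
        if (out != "") print out
    }' "$@"
}
```

Running keep_tags file.dat on the two sample lines should give exactly the output requested in the question.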

How to use 'sed or gawk' to delete a text block until the third line previous the last one

Good day,
I was wondering how to delete a text block like this:
1
2
3
4
5
6
7
8
and delete from the second line until the third line previous the last one, to obtain:
1
2
6
7
8
Thanks in advance!!!
BTW, this text block is just an example; the real text blocks I'm working on are huge, and each one has a different number of lines.

Getting the number of lines with wc and using awk to print the requested range:
$ awk 'NR<M || NR>N-M' M=3 N="$(wc -l < file)" file
1
2
6
7
8
This allows you to easily change the range by just changing the value of M.

This might work for you (GNU sed):
sed '3,${:a;$!{N;s/\n/&/3;Ta;D}}' file
or if you prefer:
sed '1,2b;:a;$!{N;s/\n/&/3;Ta;D}' file
These always print the first two lines, then build a running window of three lines.
Unless the end of file is reached the first line is popped off the window and deleted. At the end of file the remaining 3 lines are printed.

Since you mentioned that the files are huge and that the line counts differ, I would suggest this awk one-liner:
awk 'NR<3{print;next}{delete a[NR-3];a[NR]=$0}END{for(x=NR-2;x<=NR;x++)print a[x]}' file
It processes the input file only once, without (pre)calculating the total number of lines.
It keeps minimal data in memory: at any time during processing, only 3 lines are stored.
If you want to change the filtering criteria, for example to remove from line x to $-y, you simply change the offsets in the one-liner.
A test:
kent$ seq 8|awk 'NR<3{print;next}{delete a[NR-3];a[NR]=$0}END{for(x=NR-2;x<=NR;x++)print a[x]}'
1
2
6
7
8
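To make the offsets in the one-liner above adjustable without editing the program, they could be passed in as awk variables. The following is a sketch along those lines, not the answer's own code: drop_middle, head and tail are invented names, and tail is assumed to be at least 1. It uses a ring buffer indexed by NR modulo tail instead of delete.

```shell
# Hypothetical helper: print the first $1 and last $2 lines of stdin,
# dropping everything in between, in a single pass.
drop_middle() {
    awk -v head="$1" -v tail="$2" '
    NR <= head { print; next }
    # Ring buffer: slot NR % tail always holds the most recent line
    # with that remainder, so at most tail lines are kept.
    { buf[NR % tail] = $0 }
    END {
        # Replay the last tail lines in order, skipping any that were
        # already printed as part of the head.
        for (i = NR - tail + 1; i <= NR; i++)
            if (i > head) print buf[i % tail]
    }'
}
```

seq 8 | drop_middle 2 3 should print 1, 2, 6, 7, 8, matching the test above.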

Using sed:
sed -n '
## Append second line, print first two lines and delete them.
N;
p;
s/^.*$//;
## Read next three lines removing leading newline character inserted
## by the "N" command.
N;
s/^\n//;
N;
:a;
N;
## I will keep three lines in buffer until last line when I will print
## them and exit.
$ { p; q };
## Not the last line yet, so drop the oldest line from the buffer (FIFO).
s/^[^\n]*\n//;
## Goto label "a".
ba
' infile
It yields:
1
2
6
7
8
