Separating a nested field into two new fields, maintaining order

I've been trying to break a sample file as below such that the third column becomes two parts while maintaining order within the file.
100 400 500.00APPLE 5.8 9.2
200 300 600.00DOG 5.3 9.1
300 763 454.44KITTEN 5.7 9.2
Should result in
100 400 500.00 APPLE 5.8 9.2
200 300 600.00 DOG 5.3 9.1
300 763 454.44 KITTEN 5.7 9.2
I've toyed with doing this in awk but keep running into issues.
PS: The point upon which to separate is always a digit [0-9] followed by [a-zA-Z] in regex.

Try:
sed 's/\([0-9]\)\([A-Z]\)/\1 \2/' ./infile
Proof of Concept
$ sed 's/\([0-9]\)\([A-Z]\)/\1 \2/' ./infile
100 400 500.00 APPLE 5.8 9.2
200 300 600.00 DOG 5.3 9.1
300 763 454.44 KITTEN 5.7 9.2
Or if you have gawk you can limit the split to just the 3rd field by using:
awk '{$3=gensub(/([0-9])([A-Z])/,"\\1 \\2",1,$3)}1' ./infile
Proof of Concept
$ awk '{$3=gensub(/([0-9])([A-Z])/,"\\1 \\2",1,$3)}1' ./infile
100 400 500.00 APPLE 5.8 9.2
200 300 600.00 DOG 5.3 9.1
300 763 454.44 KITTEN 5.7 9.2
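The question's PS says the split point is a digit followed by any letter, upper or lower case, whereas the commands above match only [A-Z]. A minimal sketch of the same sed substitution with the wider class; the function name split_digit_letter and the lowercase sample line are illustrative assumptions, not part of the original post:

```shell
# Hypothetical helper: insert a space at the first digit-letter boundary on each line.
split_digit_letter() {
    sed 's/\([0-9]\)\([A-Za-z]\)/\1 \2/' "$@"
}

# One upper-case and one lower-case sample line.
printf '%s\n' '100 400 500.00APPLE 5.8 9.2' '200 300 600.00dog 5.3 9.1' | split_digit_letter
```

Like the original sed command, this acts on the first digit-letter pair anywhere on the line, so it relies on the first two columns being purely numeric.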

Related

Understanding sed hold-space work-flow

I would like to print the last line of a file that contains one or more digits ("hippo 9991" in the example below). I tried to achieve this with the command gsed -n -r '/[0-9]+/h;x;$p', but it doesn't quite work:
$ cat testfile
dog
lion 34
elephant
tiger 7
hippo 9991
zebra
gepard
cat
$ cat testfile | gsed -n -r '/[0-9]+/h;x;$p'
gepard
$
Could somebody explain what exactly gsed -n -r '/[0-9]+/h;x;$p' does? As I understand, it should remove the trailing new-line character from line and read the line into pattern space. Then if the line in pattern space contains one or more integers, the line is put into hold space by replacing the previous data in hold space. This cycle is repeated until the last line which will be printed. Obviously I do not understand this correctly. More than a correct answer I would like to understand the work-flow of sed.

You almost have it. Here is what your script does:
/[0-9]+/h # if line contains a number, save the line to hold space
x # swap content of pattern space and hold space
$p # when on the last line print pattern space
You save the line to hold space then swap it back to pattern space. The contents of pattern space and hold space can be illustrated like this:
Line  Command     Pattern Space  Hold Space
~~~~  ~~~~~~~~~   ~~~~~~~~~~~~~  ~~~~~~~~~~
1     /[0-9]+/h   dog
1     x                          dog
2     /[0-9]+/h   lion 34        lion 34
2     x           lion 34        lion 34
3     /[0-9]+/h   elephant       lion 34
3     x           lion 34        elephant
4     /[0-9]+/h   tiger 7        tiger 7
4     x           tiger 7        tiger 7
.
.
.
$     /[0-9]+/h   cat            gepard
$     x           gepard         cat
$     p           gepard         cat
What you really want is to only swap contents when the last line of the input file is reached. You can do this by grouping the x and p commands:
gsed -n -r '/[0-9]+/h; $ {x;p}' testfile
Output:
hippo 9991
The corresponding pattern space and hold space sequence is now:
Line  Command     Pattern Space  Hold Space
~~~~  ~~~~~~~~~   ~~~~~~~~~~~~~  ~~~~~~~~~~
1     /[0-9]+/h   dog
2     /[0-9]+/h   lion 34        lion 34
3     /[0-9]+/h   elephant       lion 34
4     /[0-9]+/h   tiger 7        tiger 7
.
.
.
$     /[0-9]+/h   cat            hippo 9991
$     x           hippo 9991     cat
$     p           hippo 9991     cat
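To see the difference between the two scripts side by side, here is a small sketch that rebuilds the question's test file and runs both. It assumes a POSIX-style sed; the -r flag is dropped and the pattern shortened to [0-9], which selects the same lines, and the variable names are made up for illustration:

```shell
# Recreate the question's test file in a temporary location (the path is arbitrary).
testfile=$(mktemp)
printf '%s\n' dog 'lion 34' elephant 'tiger 7' 'hippo 9991' zebra gepard cat > "$testfile"

# Original script: x runs on every line, so the two buffers keep trading places.
swap_every_line=$(sed -n '/[0-9]/h;x;$p' "$testfile")

# Fixed script: x and p run only on the last line.
swap_on_last=$(sed -n '/[0-9]/h;${x;p;}' "$testfile")

echo "swap on every line: $swap_every_line"
echo "swap on last line:  $swap_on_last"
rm -f "$testfile"
```

The first result is gepard and the second is hippo 9991, matching the two traces above.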

The following works for me:
sed -n -r '/[0-9]+/ {h;x}; ${x;p}'
You want to run both h and x only if the integer is present; in your example, x is run every time. At the end, you don't want to print the last line but the last stored line, so you have to exchange them once more.

I cannot help you with the sed version, but an awk solution could easily do it.
awk '/[0-9]+/ {f=$0} END {print f}' file
hippo 9991
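One edge case: if no line contains a digit, the END block above still prints an empty line. A sketch that prints nothing in that case; the function name last_numeric_line and the variables last and found are illustrative assumptions:

```shell
# Hypothetical helper: print the last line containing a digit, or nothing at all.
last_numeric_line() {
    awk '/[0-9]/ { last = $0; found = 1 } END { if (found) print last }' "$@"
}

# Input with matches: prints the last matching line.
printf '%s\n' dog 'lion 34' elephant 'hippo 9991' cat | last_numeric_line

# Input without matches: prints nothing.
printf '%s\n' dog cat | last_numeric_line
```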

This might work for you (GNU sed):
sed '/[0-9]/h;$!d;x' file
If a line contains a digit, hive it away in the hold space (previously hived-away lines are overwritten!). Delete every line except the last (deleted lines never get printed). On the last line, swap to the hold space. The natural flow of the program then prints the last line containing a digit, with no need for options.

I would do:
sed -n -r '/[0-9]+/{h}; ${x;p}' file
h overwrites the hold space with the current (matched) line.
When the last line ($) is reached, x exchanges the pattern and hold spaces and p prints what was in the hold space, which is the last line matching [0-9]+.

grep '[0-9]' testfile | tail -n 1
This has the disadvantage that we don't get to learn about "sed", but it is so much simpler.

Extract every nth number from a txt file

So I have a txt file where I need to extract every third number and print it to separate file using Terminal. The txt file is just a long list of numbers, tab delimited:
18 25 0 18 24 5 18 23 5 18 22 8.2 ...
I know there is a way to do this using sed or awk, but so far I've only been able to extract every third line by using:
awk 'NR%3==1' testRain.txt > rainOnly.txt
So here's the answer (or rather, the answer I utilized!):
xargs -n1 < input.txt | awk '!(NR%3)' > output.txt
This gives you an output.txt that has every third number of the original file as a separate line.

A quick pipeline to extract every 3rd number:
$ xargs -n1 < file | sed '3~3!d'
0
5
5
8.2
If you don't want each number on its own line, pass the result back through xargs:
$ xargs -n1 < file | sed '3~3!d' | xargs
0 5 5 8.2
Use redirection to store the output in a new file:
$ xargs -n1 < file | sed '3~3!d' | xargs > new_file
With awk using a simple for loop you could do:
$ awk '{for(i=3;i<=NF;i+=3)print $i}' file
0
5
5
8.2
or (adds a trailing tab):
$ awk '{for(i=3;i<=NF;i+=3)printf "%s\t",$i;print ""}' file
0 5 5 8.2
Or by setting the value of RS (adds trailing newline):
$ awk '!(NR%3)' RS='\t' file
0
5
5
8.2
$ awk '!(NR%3)' RS='\t' ORS='\t' file
0 5 5 8.2
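Note that the awk for loop above restarts its count on every input line, while the xargs pipeline counts across the whole file. A sketch that keeps the whole-file count and takes the step as a parameter; every_nth and the variables n and count are hypothetical names, and tr is used here instead of xargs to put one number per line:

```shell
# Hypothetical helper: print every n-th whitespace-separated number, counting across lines.
every_nth() {
    n=$1      # step, e.g. 3 for every third number
    shift     # remaining arguments are input files (stdin if none)
    cat -- "$@" |
        tr -s ' \t' '\n\n' |                       # one number per line
        awk -v n="$n" 'NF && ++count % n == 0'     # skip blank lines, keep every n-th
}

# Two tab-delimited lines of six numbers each.
printf '18\t25\t0\t18\t24\t5\n18\t23\t5\t18\t22\t8.2\n' | every_nth 3
```

For this sample the result is 0, 5, 5 and 8.2, one per line.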

You can print every third character by substituting the next two with nothing, globally. When the count straddles a newline, using Perl might be the simplest solution:
perl -p000 -e 's/(.)../$1/gs'
If you want the first, fourth etc character from every line, a line-oriented tool like sed suffices:
sed 's/\(.\)../\1/g'

Using grep -P
grep -oP '([^\t]+\t){2}\K[^\t\n]+' file
0
5
5
8.2

This might work for you (GNU sed):
sed -r 's/(\S+\s){3}/\1/g;s/\s$//' file

@user2718946
Your solution was close, but here you are without xargs.
awk 'NR%3==1' RS=" " file
18
18
18
18
Different start:
awk 'NR%3==0' RS=" " file
0
5
5
8.2

strip the last and first character from a String

Is it fairly easy to strip the first and last character from a string using awk/sed?
Say I have this string
( 1 2 3 4 5 6 7 )
I would like to strip parentheses from it.
How should I do this?

sed way
$ echo '( 1 2 3 4 5 6 7 )' | sed 's/^.\(.*\).$/\1/'
1 2 3 4 5 6 7
awk way
$ echo '( 1 2 3 4 5 6 7 )' | awk '{print substr($0, 2, length($0) - 2)}'
1 2 3 4 5 6 7
POSIX sh way
$ var='( 1 2 3 4 5 6 7 )'; var="${var#?}"; var="${var%?}"; echo "$var"
1 2 3 4 5 6 7
bash way
$ var='( 1 2 3 4 5 6 7 )'; echo "${var:1: -1}"
1 2 3 4 5 6 7
If you use bash then use the bash way.
If not, prefer the POSIX sh way. It is faster than loading sed or awk.
Other than that, you may also be doing other text processing, that you can combine with this, so depending on the rest of the script you may benefit using sed or awk in the end.
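The expansions above remove the first and last character whatever they are. If the string might not be wrapped in parentheses at all, a guarded variant could look like the following sketch; strip_parens is a hypothetical name, and it only uses POSIX sh features:

```shell
# Hypothetical helper: strip one leading "(" and one trailing ")" only when both are present.
strip_parens() {
    case $1 in
        '('*')')
            s=${1#?}              # drop the first character (s is a plain global; POSIX sh has no local)
            s=${s%?}              # drop the last character
            printf '%s\n' "$s"
            ;;
        *)
            printf '%s\n' "$1"    # leave anything else untouched
            ;;
    esac
}

strip_parens '( 1 2 3 4 5 6 7 )'
strip_parens 'no parentheses'
```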
Why doesn't this work? sed '..' s_res.temp > s_res.temp
This does not work, as the redirection > will truncate the file before it is read.
To solve this you have some choices:
what you really want to do is edit the file. sed is a stream editor not a file editor.
ed though, is a file editor (the standard one too!). So, use ed:
$ printf '%s\n' "%s/^.\(.*\).$/\1/" "." "wq" | ed s_res.temp
use a temporary file, and then mv it to replace the old one.
$ sed 's/^.\(.*\).$/\1/' s_res.temp > s_res.temp.temp
$ mv s_res.temp.temp s_res.temp
use the -i option of sed. -i is not POSIX: GNU sed accepts it as shown, while BSD/macOS sed requires an explicit backup-suffix argument (for example -i ''):
$ sed -i 's/^.\(.*\).$/\1/' s_res.temp
abuse the shell (not recommended really):
$ (rm test; sed 's/XXX/printf/' > test) < test
On Mac OS X (latest version 10.12 - Sierra) bash is stuck at version 3.2.57, which is quite old. One can always install bash using brew and get version 4.x, which includes the substitutions needed for the above to work.
There is a collection of bash versions and respective changes, compiled on the bash-hackers wiki
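The temporary-file approach can be wrapped up so the original is only touched when sed succeeds. A sketch, assuming mktemp is available (it is common but was not historically part of POSIX); edit_in_place and the demo file are hypothetical:

```shell
# Hypothetical helper: apply a sed script to a file via a temporary copy.
edit_in_place() {
    script=$1
    file=$2
    tmp=$(mktemp) || return 1
    # Overwrite the original only if sed succeeded; otherwise discard the temp file.
    if sed "$script" "$file" > "$tmp"; then
        cat "$tmp" > "$file"    # rewriting the contents keeps the file's own permissions
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Demo on a throwaway file.
demo=$(mktemp)
printf '%s\n' '(1 2 3)' '(4 5 6)' > "$demo"
edit_in_place 's/^.\(.*\).$/\1/' "$demo"
cat "$demo"
```

Copying the contents back with cat rather than mv is a design choice here: mktemp creates its file with restrictive permissions, and mv would carry those over to the target.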

To remove the first and last characters from a given string, I like this sed:
sed -e 's/^.//' -e 's/.$//'
#         ^^          ^^
#     first char   last char
See an example:
sed -e 's/^.//' -e 's/.$//' <<< "(1 2 3 4 5 6 7)"
1 2 3 4 5 6 7

And also a perl way:
perl -pe 's/^.|.$//g'

To remove the first character and the last two characters using sed:
Input "t2.large",
Output t2.large
sed -e 's/^.//' -e 's/..$//'

How can I emulate `uniq -d` in awk?

I've got a busybox system which doesn't have uniq and I'd like to generate a unique list of duplicated lines.
A plain uniq emulated in awk would be:
sort <filename> | awk '!($0 in a){a[$0]; print}'
How can I use awk (or sed for that matter, not perl) to accomplish:
sort <filename> | uniq -d

On a busybox system, you might need to save bytes. ;-)
awk ++a[\$0]==2

Could do this (needn't sort it):
awk '{++a[$0]; if(a[$0] == 2) print}'
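Because a line is printed the moment its count reaches 2, unsorted input yields the duplicates in the order of their second occurrence rather than in sorted order. A small sketch with made-up sample data; dup_lines is a hypothetical wrapper around the same counting idea:

```shell
# Hypothetical helper: print each duplicated line once, when it is seen the second time.
dup_lines() {
    awk '++count[$0] == 2' "$@"
}

# Unsorted input: duplicates appear in order of their second occurrence (b, then a).
printf '%s\n' b a b c a b | dup_lines

# Sorted input: same set of lines, now in sorted order (a, then b), like sort | uniq -d.
printf '%s\n' b a b c a b | sort | dup_lines
```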

This might work for you:
# make some test data
seq 25 >/tmp/a
seq 3 3 25 >>/tmp/a
seq 5 5 25 >>/tmp/a
# run old command
sort -n /tmp/a | uniq -d
3
5
6
9
10
12
15
18
20
21
24
25
# run sed command
sort -n /tmp/a |
sed ':a;$bb;N;/^\([^\n]*\)\(\n\1\)*$/ba;:b;/^\([^\n]*\)\(\n\1\)*/{s//\1/;P};D'
3
5
6
9
10
12
15
18
20
21
24
25

Sed remove all content in line after \t

I have output like
19599 user 20 0 120m 32m 4260 S 14.0 5.3 3:21.13 app.out \t Wed Jun 8 09:31:06 UTC 2011
19599 user 20 0 120m 32m 4260 S 14.0 5.4 3:21.61 app.out \t Wed Jun 8 09:31:12 UTC 2011
19599 user 20 0 121m 32m 4260 S 12.0 5.4 3:22.31 app.out \t Wed Jun 8 09:31:17 UTC 2011
I want to remove all characters starting from \t to the end of the line.
How can I do that with sed?
I tried awk -F t '{print $1}'
but it removes the t from app.out.
I want output like
19599 user 20 0 120m 32m 4260 S 14.0 5.3 3:21.13 app.out
19599 user 20 0 120m 32m 4260 S 14.0 5.4 3:21.61 app.out
19599 user 20 0 121m 32m 4260 S 12.0 5.4 3:22.31 app.out
If I wrote the awk like this:
awk -F t '{print $1"t"}'
it works fine, but it is only a workaround. How can I remove all characters from \t to the end of the line?

If the output contains the two characters backslash and 't', then you use:
sed 's/ *\\t.*//'
This removes the blanks leading up to the two characters, the backslash and the 't', plus everything after them.
If the output contains a tab character, then you need to replace the '\\t' with an actual tab character.
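For the literal-tab case, one portable way to get a tab into the sed script is to let printf produce it. A sketch, assuming a POSIX shell; strip_after_tab and the variable tab are hypothetical names:

```shell
# Hypothetical helper: delete the first tab on each line, any blanks before it, and everything after.
strip_after_tab() {
    tab=$(printf '\t')              # a real tab character, held in a shell variable
    sed "s/ *${tab}.*//" "$@"
}

# Sample line with a real tab between "app.out " and the date.
printf 'app.out \tWed Jun 8 09:31:06 UTC 2011\n' | strip_after_tab
```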

It sounds like you want the first field in a tab-delimited text. You might try one of:
cut -d $'\t' -f 1
awk -F '\t' '{print $1}'
sed $'s/\t.*//'
The $'' syntax is used in bash (and ksh and zsh I believe) to more easily allow for embedding escape sequences in strings.

awk 'BEGIN { FS = "\t" } 1 == 1 {print $1}' file.name

Just pipe it through:
sed 's/\(.*\)\t.*/\1/'
