Extract a substring using command line utilities

I have a text file including lines in the form of:
(term1 x:a y:b (term2 z:c k:a))
I want to extract only the terms from these lines using command-line utilities such as awk, grep or sed, i.e. I want the result to be:
term1
term2
I have formed a regex that matches everything except the terms, but I could not find a way to negate it:
(\()|( \()|( (.*?) \()|( (.*?)\)+)
How can I form a command that extracts every substring after '(' and before ' '?
Thanks

Try this (GNU sed, because \n in the replacement is a GNU extension):
sed "s/(\([^ )]*\)[^(]*/\1\n/g"
For example:
$ echo "(term1 x:a y:b (term2 (term3) z:c k:a) x (termX a:b ) )" | sed "s/(\([^ )]*\)[^(]*/\1\n/g"
term1
term2
term3
termX
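If grep is available, a similar result can be sketched with grep -o, which prints each match on its own line. This is an alternative to the sed answer, not part of it; the function name extract_terms is hypothetical, and the pattern assumes a term is any non-empty run of characters other than a space or a parenthesis.

```shell
# Hypothetical helper: print every word that follows "(" up to the next space, "(" or ")".
# Assumes a grep that supports -o (GNU and BSD grep both do).
extract_terms() {
    # ([^ ()][^ ()]*  : a literal "(" followed by at least one char that is not a space or paren
    # cut -c2-        : drop the leading "(" from each match
    grep -o '([^ ()][^ ()]*' | cut -c2-
}

echo "(term1 x:a y:b (term2 (term3) z:c k:a) x (termX a:b ) )" | extract_terms
```

Requiring at least one character after "(" keeps a bare "( " from producing an empty output line.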

Related

How to substitute with basic regex with alternating signs?

I want to do the following to all of the statements in the file:
Input: xblahxxblahxxblahblahx
Output: <blah><blah><blahblah>
So far I am thinking of using sed -i 's/x/</g' something.ucli
You can use
sed 's/x\([^x]*\)x/<\1>/g'
Details:
x - an x
\([^x]*\) - Group 1 (\1 refers to this group value from the replacement pattern): zero or more (*) chars other than x ([^x])
x - an x
See the online demo:
#!/bin/bash
s='xblahxxblahxxblahblahx'
sed 's/x\([^x]*\)x/<\1>/g' <<< "$s"
# => <blah><blah><blahblah>
If x is a multi-character string, e.g. xyz, it is easier with Perl, because sed has no non-greedy quantifier like .*?:
perl -pe 's/xyz(.*?)xyz/<$1>/g'

Appending using sed pattern after certain line number

I am using the following command to append a string after every AMP line:
$ sed -i '/AMP/a Target4' test.txt
Now I want to append it only after the AMP lines that come after #Set2 (line 9). Can this command be modified to do that? And if I want to append only after the #SET1 AMPs, i.e. before line 9, what would the command be? Thanks.
$ cat test.txt
#SET1
AMP
Target 1
Target 2
AMP
Target 3
Target 4
Target 5
#Set2
AMP
Target 11
Target 12
Note that there are no blank lines in the file above.
Would you please try the following:
sed -i '
/^#Set2/,${ ;# if the line starts with "#Set2", execute the {block} until the last line $
# append the string after "AMP" (a comment placed after "a Target4" would be appended as text)
/AMP/a Target4
} ;# end of the block
' test.txt
If you want to append the string before the #Set2 line, please try:
sed -i '
1,/^#Set2/ { ;# execute the {block} from line 1 until the line matching the pattern /^#Set2/
/AMP/a Target4
}
' test.txt
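To check the two commands above without modifying test.txt, the same scripts can be run on the sample file and printed to stdout. This sketch assumes GNU sed (for the one-line "a text" form); the function names after_set2 and before_set2 are hypothetical.

```shell
# The sample file from the question, held in a variable instead of test.txt.
input='#SET1
AMP
Target 1
Target 2
AMP
Target 3
Target 4
Target 5
#Set2
AMP
Target 11
Target 12'

# Append only after the AMP lines from "#Set2" through the last line.
after_set2() {
    printf '%s\n' "$input" | sed '/^#Set2/,${
/AMP/a Target4
}'
}

# Append only after the AMP lines from line 1 through the "#Set2" line.
before_set2() {
    printf '%s\n' "$input" | sed '1,/^#Set2/{
/AMP/a Target4
}'
}

after_set2
```

after_set2 adds one Target4 line (after the AMP on line 10), while before_set2 adds two (after the AMPs on lines 2 and 5).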
The expression address1,address2 is a flip-flop operator. Once
address1 (a line number, regular expression, or other condition) matches,
the operator keeps returning true until address2 matches.
The following command or block is therefore executed on every line from
the one matching address1 through the one matching address2, inclusive.
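A small sketch of the flip-flop behaviour, using hypothetical start/end marker lines: the range is inclusive at both ends, and once it closes it can open again on the next line that matches address1.

```shell
# Print only the lines inside the range /^start$/,/^end$/ (hypothetical markers).
show_range() {
    sed -n '/^start$/,/^end$/p'
}

printf '%s\n' a start b end c start d end e | show_range
```

This prints start, b, end, start, d and end; the lines a, c and e fall outside both ranges.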
If you want to append after the AMP lines that come after either #Set2 or line 9,
whichever comes first, I think it is better to process lines 1-8 and lines 9 onward separately.
For example:
sed '
1,8{
/^#Set2/,${
/AMP/a Target4
}
}
9,${
/AMP/a Target4
}' test.txt
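With the sample file, #Set2 is itself on line 9, so only the 9,$ part does any work. The sketch below uses a hypothetical input in which #Set2 appears on line 3 to exercise the 1,8 part as well; it assumes GNU sed, and the function name is hypothetical.

```shell
# The combined command from above, reading stdin and writing stdout.
append_after_set2_or_line9() {
    sed '
1,8{
/^#Set2/,${
/AMP/a Target4
}
}
9,${
/AMP/a Target4
}'
}

# Lines: 1 #SET1, 2 AMP, 3 #Set2, 4 AMP, 5-8 filler, 9 AMP
printf '%s\n' '#SET1' AMP '#Set2' AMP t5 t6 t7 t8 AMP | append_after_set2_or_line9
```

The AMP on line 2 is left alone, the AMP on line 4 is handled by the 1,8 block (it follows #Set2), and the AMP on line 9 is handled by the 9,$ block.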

Multiple mathematical operations on a file containing numbers

I have extracted the following data from a file using grep and sed pipes, and now I want to perform a calculation on the last two numbers and replace them with a single number.
Mathematical operations:
1. Add the numbers together
2. Divide by 2
3. Multiply by 141
4. Round up (ROUNDUP) to a whole number
File data:
AJ29 IO_0_VRN_10 77.234 78.011
AJ30 IO_L1P_T0_100M 89.886 90.789
AJ31 IO_L1N_T0_100S 101.388 102.406
AK29 IO_L2P_T0_101M 66.163 66.828
AL29 IO_L2N_T0_101S 63.626 64.266
So the line starting with AJ29 should become:
AJ29 IO_0_VRN_10 10945
I could do this in MS Excel or OpenOffice Calc, but I want to avoid that and keep everything in a single Linux script if possible. Hope you can help. The script I have so far is below; ideally I'd like to add a few more pipes to it to achieve this.
grep IOB xc7vx690tffg1930.pkg | sed 's/pin//g' | sed 's/IOB_[A-Za-z0-9]*//g' | sed 's/ /-/g' | sed 's/\t//g' | sed 's/^[-]*//g' | sed 's/-/ /g' | sed 's/ [0-9][0-9] //g' | sed 's/[[:space:]]\+/,/g' | sed 's/,X[0-9A-Z]*,//g' | sed 's/,[0-9]*[A-Z],//g' | sed 's/N\.A\.,/,/g' | sed 's/,$//g' | sed 's/,/ /g'
For calculations, use awk!
$ awk '{$(NF-1)=sprintf("%.0f", ($(NF-1) + $NF)/2 * 141); NF--}1' file
AJ29 IO_0_VRN_10 10945
AJ30 IO_L1P_T0_100M 12738
AJ31 IO_L1N_T0_100S 14367
AK29 IO_L2P_T0_101M 9376
AL29 IO_L2N_T0_101S 9016
This replaces the penultimate field with the result of (penultimate + last) / 2 * 141. To round it, we use the %.0f format, as indicated in Awk printf number in width and round it up. Note that %.0f rounds to the nearest integer rather than always up.
Also, it seems to me that you are piping way too many things: I counted one call to grep and 13 (!) to sed. You can probably combine them into a single sed call with sed -e 'first command' -e 'second command' ... instead.
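As a sketch of that advice, here are the last two stages of the pipeline run as two sed processes and then as one sed with two -e expressions, which are applied in order to each line. The input line is a hypothetical intermediate result, and the function names are hypothetical.

```shell
# Last two stages of the original pipeline, as separate processes.
two_seds() {
    sed 's/,$//g' | sed 's/,/ /g'
}

# The same two substitutions in a single sed call.
one_sed() {
    sed -e 's/,$//g' -e 's/,/ /g'
}

printf '%s\n' 'AJ29,IO_0_VRN_10,77.234,78.011,' | one_sed
```

Because the expressions run in the order given, the other stages can be folded in the same way as long as their order is preserved.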
Explanation
In awk, NF refers to the number of fields on the current line. Since $n refers to the field number n, with $(NF-1) we refer to the penultimate field.
{...}1 do the work, then print the resulting line. 1 is a pattern that always evaluates to true, and a true pattern without an action makes awk perform its default action, which is to print the current line.
($(NF-1) + $NF)/2 * 141 performs the calculation (penultimate + last) / 2 * 141.
$(NF-1)=sprintf( ... ) assigns the result of that calculation to the penultimate field. sprintf with %.0f rounds the value to the nearest whole number.
{...; NF--} once the calculation is done, we have its result in the penultimate field. To remove the last column, we just say "hey, decrease the number of fields" so that the last one gets "removed".
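The question asks for ROUNDUP, but %.0f rounds to the nearest integer, so for example AJ31 ((101.388 + 102.406) / 2 * 141 = 14367.477) becomes 14367 rather than 14368. A minimal sketch of a rounding-up variant, assuming the last two fields are non-negative numbers (so truncating with int() and adding 1 when there is a fractional part gives the ceiling); the function name is hypothetical:

```shell
# Hypothetical helper: replace the last two fields with ceil((a + b) / 2 * 141).
# Assumes non-negative inputs and an awk that rebuilds $0 when NF is decremented
# (gawk and mawk do), as the answer above already relies on.
roundup_last_two() {
    awk '{
        v = ($(NF-1) + $NF) / 2 * 141   # (penultimate + last) / 2 * 141
        r = int(v)                      # truncate toward zero
        if (r < v) r++                  # bump up if there was a fractional part
        $(NF-1) = r                     # overwrite the penultimate field
        NF--                            # drop the last field
    }1' "$@"
}

printf '%s\n' 'AJ31 IO_L1N_T0_100S 101.388 102.406' | roundup_last_two
```

On the sample data this differs from the %.0f version only for AJ31 (14368) and AL29 (9017).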

use sed to change a text report to csv

I have a report that looks like this:
par_a
  .xx
  .yy
par_b
  .zz
  .tt
I want to convert it into CSV format, as below, using a sed one-liner:
par_a,.xx
par_a,.yy
par_b,.zz
par_b,.tt
Please help.
With awk:
awk '/^par_/{v=$0;next}/^ /{$0=v","$1;print}' File
Or to make it more generic:
awk '/^[^[:blank:]]/{v=$0;next} /^[[:blank:]]/{$0=v","$1;print}' File
When a line starts with par_, save its content in variable v and move on to the next line. When a line starts with a space, replace the line with the content of v, followed by a comma, followed by the first field, and print it.
Output:
$ awk '/^par_/{v=$0;next}/^ /{$0=v","$1;print}' File
par_a,.xx
par_a,.yy
par_b,.zz
par_b,.tt
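To see why the generic [[:blank:]] version is more robust, here is a quick check on a hypothetical variation of the report in which the entries are indented with a TAB instead of spaces (function names are hypothetical):

```shell
# The first awk command: /^ / matches a leading space only.
csv_space_only() {
    awk '/^par_/{v=$0;next}/^ /{$0=v","$1;print}'
}

# The generic command: [[:blank:]] matches a space or a TAB.
csv_blank() {
    awk '/^[^[:blank:]]/{v=$0;next} /^[[:blank:]]/{$0=v","$1;print}'
}

printf 'par_a\n\t.xx\n\t.yy\n' | csv_space_only   # prints nothing
printf 'par_a\n\t.xx\n\t.yy\n' | csv_blank        # prints par_a,.xx and par_a,.yy
```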
With sed:
sed '/^par_/ { h; d; }; G; s/^[[:space:]]*//; s/\(.*\)\n\(.*\)/\2,\1/' filename
This works as follows:
/^par_/ { # if a new paragraph begins
h # remember it
d # but don't print anything yet
}
# otherwise:
G # append a newline and the remembered paragraph line to the pattern space
s/^[[:space:]]*// # remove leading whitespace
s/\(.*\)\n\(.*\)/\2,\1/ # rearrange to desired CSV format
Depending on your actual input data, you may want to replace the /^par_/ with, say, /^[^[:space:]]/. It just has to be a pattern that recognizes the beginning line of a paragraph.
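A sketch of that replacement, run on a hypothetical report whose header lines do not start with par_; any line that does not start with whitespace is treated as the start of a paragraph (the function name is hypothetical):

```shell
# Hypothetical helper: the sed answer with /^par_/ replaced by /^[^[:space:]]/.
report_to_csv() {
    sed '/^[^[:space:]]/ { h; d; }; G; s/^[[:space:]]*//; s/\(.*\)\n\(.*\)/\2,\1/'
}

printf '%s\n' 'hdr1' '  .aa' '  .bb' 'hdr2' '  .cc' | report_to_csv
```

This prints hdr1,.aa, hdr1,.bb and hdr2,.cc.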
Addendum: a shorter version that avoids repeating the regex when leading whitespace is used to recognize paragraphs:
sed -r '/^\s+/! { h; d; }; s///; G; s/(.*)\n(.*)/\2,\1/' filename
Or, if you have to use BSD sed (as comes with Mac OS X):
sed '/^[[:space:]]\{1,\}/! { h; d; }; s///; G; s/\(.*\)\n\(.*\)/\2,\1/' filename
The latter should be portable to all seds, but as you can see, writing portable sed involves some pain.

How to escape minus in regular expression with sed?

I need to strip unwanted characters from a string. In this example I want to keep only the +'s and -'s in b and write the result to c. So if b is +fdd-dfdf+, c should be +-+.
read b
c=$(echo $b | sed 's/[^(\+|\-)]//g')
But when I run the script, the console says:
sed: -e expression #1, char 15: Invalid range end
The reason is the \- in my regular expression. How can I solve this problem and say that I want to keep all -'s?
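Are you looking for this? Inside a bracket expression the backslash is not an escape character, so in [^(\+|\-)] the sequence \-) is read as a range from \ to ), which is invalid because \ sorts after ). Put - first (right after ^) or last so it is taken literally; + needs no escaping either. Note also that (, | and ) are literal inside brackets, so your version would have kept them too.
$ echo 'a + b + c - d - e' | sed 's/[^-+]//g'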
++--
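Applied to the original script, a minimal sketch looks like this (the function name keep_signs is hypothetical). It also quotes "$b" and uses printf instead of echo, so the input is neither word-split nor misread by echo as an option if it starts with -n or -e.

```shell
# Hypothetical helper: keep only "+" and "-" characters.
# "-" right after "^" is literal, and "+" is never special inside [...].
keep_signs() {
    printf '%s\n' "$1" | sed 's/[^-+]//g'
}

b='+fdd-dfdf+'
c=$(keep_signs "$b")
printf '%s\n' "$c"
```

This prints +-+, as required in the question.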