I am trying to replace a hyphen character - in the key name of a JSON file with _ without impacting the value side of the key-value pair.
Example input:
{
"outcome": "failed",
"failure-description": "ra ra ra - and more",
"rolled-back": true
}
Is there any way to do this using sed? It could be a match pattern where sed would only replace between "(.*[^"])": but I have not been able to work out how to replace the unwanted character in the matched substring.
The expected result would look like this:
{
"outcome": "failed",
"failure_description": "ra ra ra - and more",
"rolled_back": true
}
This would work:
$ sed 's/-\([^:]*\):/_\1:/' infile
{
"outcome": "failed",
"failure_description": "ra ra ra - and more",
"rolled_back": true
}
This looks for a - followed by a captured series of characters other than a colon, and then a colon; it replaces the hyphen by an underscore, and puts the captured group and the colon back.
A limitation of this is that it only replaces the first hyphen. Assume our input looks like this:
{
"outcome": "failed",
"failure-description": "ra ra ra - and more",
"two-hyphens-here": "ra ra ra - and more",
"rolled-back": true
}
To replace all hyphens before the colon, we can use conditional branching:
$ sed ':a;s/-\([^:]*\):/_\1:/;ta' infile
{
"outcome": "failed",
"failure_description": "ra ra ra - and more",
"two_hyphens_here": "ra ra ra - and more",
"rolled_back": true
}
This sets a label (:a) and uses the t command, which branches to the label if a substitution has been made since the last input line was read or the last branch was taken.
For BSD sed, as found on macOS, the label has to be in a separate command:
sed -e ':a' -e 's/-\([^:]*\):/_\1:/;ta' infile
Notice that an inherent limitation of all this is that a value must not contain a hyphen that is followed later on the same line by a colon, because that hyphen would be replaced as well. It's generally advisable to use a proper JSON parser such as jq for this kind of manipulation.
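One way to reduce that risk while staying with sed is to anchor the match to the key itself. The following is only a minimal sketch: it assumes one key-value pair per line, with the key first on the line and no escaped quotes inside the key. The sample input is made up for illustration.

```shell
# Assumed sample: one pair per line, key first; the second value contains "-" and ":".
input='{
"failure-description": "ra ra ra - and more",
"two-hyphens-here": "a-b: c",
"rolled-back": true
}'

# Replace a hyphen only when it sits between the opening quote at the start
# of the line and the closing quote that is followed by a colon.
# The label/t loop repeats until the key contains no more hyphens.
result=$(printf '%s\n' "$input" |
  sed -e ':a' \
      -e 's/^\([[:space:]]*"[^"-]*\)-\([^"]*"[[:space:]]*:\)/\1_\2/' \
      -e 'ta')
printf '%s\n' "$result"
```

With this anchoring the value "a-b: c" is left alone, whereas the unanchored command would turn it into "a_b: c".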
Use extended regular expressions and parenthesis structures.
-r, --regexp-extended
use extended regular expressions in the script.
This produces the correct results, but some adjustment may be needed to harden the regular expression against false matches:
sed -re 's/([[:alpha:]]*)[-]([[:alpha:]]*)/\1_\2/'
Result:
{
"outcome": "failed",
"failure_description": "ra ra ra - and more",
"rolled_back": true
}
Note that the simple expression given above is inadequate if the value side contains the pattern. Examine your data set and add more parenthesized expressions, and references to them, as needed to anchor the match more tightly. It is possible to nest parenthesized expressions, though that complicates working out which back-reference number belongs to which group.
$ sed --version
GNU sed version 4.1.5
Just use awk:
$ awk 'BEGIN{FS=OFS="\": \""} {gsub(/-/,"_",$1)} 1' file
{
"outcome": "failed",
"failure_description": "ra ra ra - and more",
"rolled_back": true
}
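The field-separator approach relies on the value being a quoted string. On a line such as "rolled-back": true the separator never occurs, so the whole line is treated as the key. A hedged alternative is to split at the first ": sequence explicitly. This sketch assumes one pair per line and no ": sequence inside the key; the sample lines are invented.

```shell
# Assumed sample: an unquoted negative number and a quoted string as values.
input='"exit-code": -1
"failure-description": "ra ra ra - and more"'

result=$(printf '%s\n' "$input" | awk '{
  p = index($0, "\":")            # position of the quote that closes the key
  if (p) {
    key = substr($0, 1, p)        # everything up to and including that quote
    gsub(/-/, "_", key)           # hyphens are replaced in the key part only
    $0 = key substr($0, p + 1)    # re-attach the untouched remainder
  }
  print
}')
printf '%s\n' "$result"
```

Here the minus sign of -1 survives, because only the text before the first ": is passed to gsub.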
Related
I want to do the following to all of the statements in the file:
Input: xblahxxblahxxblahblahx
Output: <blah><blah><blahblah>
So far I am thinking of using sed -i 's/x/</g' something.ucli
You can use
sed 's/x\([^x]*\)x/<\1>/g'
Details:
x - an x
\([^x]*\) - Group 1 (\1 refers to this group value from the replacement pattern): zero or more (*) chars other than x ([^x])
x - an x
See the online demo:
#!/bin/bash
s='xblahxxblahxxblahblahx'
sed 's/x\([^x]*\)x/<\1>/g' <<< "$s"
# => <blah><blah><blahblah>
If x is a multi-character string, e.g. xyz, it will be easier with perl:
perl -pe 's/xyz(.*?)xyz/<$1>/g'
I have this sed filter:
/.*[1-9][0-9][0-9] .*/{
s/.*\([1-9][0-9][0-9] .*\)/\1/
}
/.*[1-9][0-9] .*/{
s/.*\([1-9][0-9] .*\)/\1/
}
/.*[0-9] .*/{ # but this is always preferred/executed
s/.*\([0-9] .*\)/\1/
}
The problem is that the first two patterns are more restrictive, yet their effect is lost because the third one is more general: it matches everything the first two match and always runs as well. Is there a way to make sed apply them in priority order? Like
if the first matches
do first things
elif the second matches
do second things
elif the third matches
do third things
if .. elif
sed is a simple GOTO language. Research b and : commands in sed.
/.*[1-9][0-9][0-9] .*/{
s/.*\([1-9][0-9][0-9] .*\)/\1/
b END
}
/.*[1-9][0-9] .*/{
s/.*\([1-9][0-9] .*\)/\1/
b END
}
/.*[0-9] .*/{ # fallback: reached only if neither block above matched
s/.*\([0-9] .*\)/\1/
}
: END
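To see the effect of the b commands, here is a small self-contained run. The three input lines are assumed samples, the leading .* and trailing .* of the addresses are dropped because they do not change what is matched, and the label is written as :end.

```shell
# Assumed sample lines: a three-, two- and one-digit number, each followed by a space.
input='abc 123 foo
abc 45 foo
abc 7 foo'

# Each block ends with "b end", so later, more general blocks are skipped
# once a more specific one has handled the line.
result=$(printf '%s\n' "$input" | sed '
/[1-9][0-9][0-9] /{
  s/.*\([1-9][0-9][0-9] .*\)/\1/
  b end
}
/[1-9][0-9] /{
  s/.*\([1-9][0-9] .*\)/\1/
  b end
}
/[0-9] /{
  s/.*\([0-9] .*\)/\1/
}
:end
')
printf '%s\n' "$result"
```

Without the two b commands, every line would fall through to the last block and the first line would end up as "3 foo" instead of "123 foo".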
This might work for you (GNU sed):
sed -E 's/(^|[^0-9])([1-9][0-9]{,2}|[0-9]) .*/\n\2\n/;s/.*\n(.*)\n.*/\1/' file
I assume you want to capture a 1,2 or 3 digit number followed by a space.
Alternation | works left to right.
The above regexp will capture the first match or just return the whole string.
N.B. The ^|[^0-9] is necessary to restrict the match to a 1,2 or 3 digit number.
If the required string occurs more than once in a line, the match may be altered to the nth match, e.g. the second:
sed -E 's/(^|[^0-9])([1-9][0-9]{,2}|[0-9]) /\n\2\n/2;s/.*\n(.*)\n.*/\1/' file
The last match for the above situation is:
sed -E 's/(^|.*[^0-9])([1-9][0-9]{,2}|[0-9]) .*/\n\2\n/;s/.*\n(.*)\n.*/\1/' file
I'd like to replace commas within brackets with spaces (and also remove the brackets). I used sed, but the solution I could come up to is dependent on the elements in the list.
sed 's/\[\(.*\), \(.*\)\]/\1 \2/g'
# [-0.0, 1.23] => -0.0 1.23 (works)
# [-0.0, 1.23, 4.56] => -0.0, 1.23 4.56 (doesn't work)
# foo=[12.3, 4.5, 3.0, 4.1], bar=123.0, xyz=6.7 => foo=12.3, 4.5, 3.0 4.1, bar=123.0, xyz=6.7 (doesn't work, expected: foo=12.3 4.5 3.0 4.1, bar=123.0, xyz=6.7)
Is there any way sed can be used to do what I want?
Consider this test file:
$ cat file
[-0.0, 1.23]
[-0.0, 1.23, 4.56]
foo=[12.3, 4.5, 3.0, 4.1], bar=123.0, xyz=6.7
[1,2,-3,4]
To remove any commas within square brackets and also remove the square brackets:
$ sed -E ':a; s/(\[[^],]*), */\1 /; ta; s/\[([^]]*)\]/\1/g' file
-0.0 1.23
-0.0 1.23 4.56
foo=12.3 4.5 3.0 4.1, bar=123.0, xyz=6.7
1 2 -3 4
How it works
:a
This defines a label a.
s/(\[[^],]*), */\1 /
This looks for the first comma inside a pair of square brackets and replaces it, together with any spaces that follow it, with a single space.
[^],] matches any character except ] or ,. Thus, (\[[^],]*) matches [ followed by any number of characters not ] or , and stores the result in group 1.
ta
If the above substitution resulted in a change, jump back to label a so we can try the substitution again.
s/\[([^]]*)\]/\1/g
After we have finished removing commas, this removes the square brackets.
Note that [^]] matches any character that is not ]. Thus \[([^]]*)\] matches a [ followed by any number of any character except ] followed by ]. In other words, it matches a single bracketed expression and the contents of the expression, excluding the square brackets, are stored in group 1.
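To watch the loop at work, the pattern space can be printed after every attempt. This is a small illustrative trace on an assumed one-line input, not part of the solution itself; the label is passed in its own -e so that BSD sed accepts it too.

```shell
# -n suppresses the automatic print; p prints the pattern space after each
# substitution attempt, so the last line shows the attempt that changed nothing.
result=$(printf '%s\n' '[1, 2, 3]' |
  sed -nE -e ':a' -e 's/(\[[^],]*), */\1 /; p; ta')
printf '%s\n' "$result"
```

The trace shows one comma disappearing per pass. The final pass makes no change, so t does not branch and the loop ends.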
I have the following string:
signal[i]
signal[bg]
output [10:0]
input [i:1]
What I want is to replace the letters between square brackets (with underscores, for example) while keeping the other strings, which represent table declarations:
signal[_]
signal[__]
output [10:0]
input [i:1]
thanks
try:
awk '{gsub(/\[[a-zA-Z]+\]/,"[_]")} 1' Input_file
This globally replaces each run of letters enclosed in square brackets with [_]. The trailing 1 prints every line, edited or not.
EDIT: The above replaces all the letters with one single _. To get as many underscores as there are letters, the following may help:
awk '{i=0;q="";match($0,/\[[a-zA-Z]+\]/);VAL=substr($0,RSTART+1,RLENGTH-2);if(VAL){len=length(VAL);while(i<len){q=q?q"_":"_";i++}};gsub(/\[[a-zA-Z]+\]/,"["q"]")}1' Input_file
OR
awk '{
i=0; q="";
match($0,/\[[a-zA-Z]+\]/);
VAL=substr($0,RSTART+1,RLENGTH-2);
if(VAL){
len=length(VAL);
while(i<len){
q=q?q"_":"_";
i++
}
};
gsub(/\[[a-zA-Z]+\]/,"["q"]")
}
1
' Input_file
EDIT2: The same script with explanatory comments:
awk '{
i=0; q=""; #### Resetting the counter i and the underscore string q for every line.
match($0,/\[[a-zA-Z]+\]/); #### Using the built-in match function to find [letters] on the line.
VAL=substr($0,RSTART+1,RLENGTH-2); #### VAL holds the text between the brackets: the substring that starts at RSTART+1 and has length RLENGTH-2.
#### RSTART and RLENGTH are built-in variables set by match; RSTART is 0 and RLENGTH is -1 when nothing matched.
if(VAL){ #### Checking that VAL is not empty before doing anything else.
len=length(VAL); #### len holds the number of letters found between the brackets.
while(i<len){ #### Looping once per letter, until i reaches len.
q=q?q"_":"_"; #### Appending one "_" to q on every pass.
i++ #### Incrementing i by 1 each time.
}
};
gsub(/\[[a-zA-Z]+\]/,"["q"]") #### Globally replacing [letters] with [ followed by q (all underscores) followed by ].
}
1 #### The 1 prints every line, edited or not.
' Input_file #### Naming the input file here.
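The scripts above take the length from the first bracketed group on a line and reuse it for every group on that line. A hedged sketch that pads each group separately is shown below; it assumes bracketed names consist of ASCII letters only, and the sample line is made up.

```shell
# Assumed sample: two bracketed names of different length plus a range that must stay.
input='a[bg] b[i] c[i:1]'

result=$(printf '%s\n' "$input" | awk '{
  out = ""
  # Consume the line one bracketed group at a time.
  while (match($0, /\[[a-zA-Z]+\]/)) {
    pad = substr($0, RSTART + 1, RLENGTH - 2)   # letters between the brackets
    gsub(/./, "_", pad)                         # one underscore per letter
    out = out substr($0, 1, RSTART - 1) "[" pad "]"
    $0 = substr($0, RSTART + RLENGTH)           # keep only the unprocessed tail
  }
  print out $0
}')
printf '%s\n' "$result"
```

Because the processed part is moved into out on every pass, each group gets its own number of underscores and [i:1] is never touched.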
Alternative gawk solution:
awk -F'\\[|\\]' '$2!~/^[0-9]+:[0-9]$/{ gsub(/./,"_",$2); $2="["$2"]" }1' OFS= file
The output:
signal[_]
signal[__]
output [10:0]
-F'\\[|\\]' - treating [ and ] as field separators
$2!~/^[0-9]+:[0-9]$/ - performing action if the 2nd field does not represent table declaration
gsub(/./,"_",$2) - replace each character with _
This might work for you (GNU sed):
sed ':a;s/\(\[_*\)[[:alpha:]]\([[:alpha:]]*\]\)/\1_\2/;ta' file
Match on opening and closing square brackets with any number of _'s and at least one alpha character and replace said character by an underscore and repeat.
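As a quick check, the loop can be run against the four lines from the question. The only change in this sketch is that the label and the branch are passed as separate -e expressions, which BSD sed also accepts.

```shell
input='signal[i]
signal[bg]
output [10:0]
input [i:1]'

# Each pass turns the first remaining letter inside [letters] into "_";
# t repeats the substitution until no letter is left before the closing bracket.
result=$(printf '%s\n' "$input" |
  sed -e ':a' -e 's/\(\[_*\)[[:alpha:]]\([[:alpha:]]*\]\)/\1_\2/' -e 'ta')
printf '%s\n' "$result"
```

The line with [i:1] is left alone because the letter i is followed by a colon rather than by more letters and a closing bracket.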
awk '{sub(/\[i\]/,"[_]");sub(/\[bg\]/,"[__]")}1' file
signal[_]
signal[__]
output [10:0]
input [i:1]
The explanation is as follows: since a bracket is a special character, it has to be escaped to be matched literally; after that, it is easy to use sub.
I have some java code declaring a 2d array that I want to flip.
Content is like:
zData[0][0] = 198;
zData[0][1] = 198;
zData[0][2] = 198;
...
And I want to flip indices to have
zData[0][0] = 198;
zData[1][0] = 198;
zData[2][0] = 198;
So I tried doing it with sed:
sed -r 's#zData[([0-9]*)][([0-9]*)]#zData[\2][\1]#g' DataSample1.java
But unfortunately sed says:
sed: -e expression #1, char 43: Unmatched ) or \)
Might the string "zData" hold kind of flag or option?
I tried not using the -r option but I have the same kind of message for:
sed 's#zData[\(\[\0\-\9\]\*\)][\(\[\0\-\9\]\*\)]#zData[\2][\1]#g' DataSample1.java
Thanks for your help
Simples:
$ sed -r 's/(zData)(\[[^]]+])(\[[^]]+])/\1\3\2/' file
zData[0][0] = 198;
zData[1][0] = 198;
zData[2][0] = 198;
Regexplanation:
# Match
(zData) # Capture the variable name we want to transpose
( # Start capture group for first index
\[ # Opening bracket escaped to mean literal [
[^]]+ # One or more non-] characters, i.e. the digits
] # The closing literal ] doesn't need escaping here.
) # Close the capture
(\[[^]]+]) # Same regexp as before for the second index
# Replace
\1\3\2 # Switch the indexes by rearranging the 2nd and 3rd capture groups
Note: Switch \[[^]]+] to \[[0-9]+] if that is clearer for you. Instead of saying "match an opening square bracket followed by one or more non-closing-bracket characters followed by a closing bracket", you are then saying "match an opening square bracket followed by one or more digits followed by a closing bracket".
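For context, the attempt in the question fails because an unescaped [ opens a bracket expression: the ( inside it is taken as a literal character, so the later ) has no partner. Below is a minimal sketch of the digit-only variant from the note, written with -E (accepted by GNU and BSD sed alike) and fed two of the sample lines from the question.

```shell
input='zData[0][1] = 198;
zData[0][2] = 198;'

# Literal brackets are escaped; the capture groups wrap whole [index] units,
# so swapping \2 and \3 in the replacement swaps the two indices.
result=$(printf '%s\n' "$input" |
  sed -E 's/(zData)(\[[0-9]+\])(\[[0-9]+\])/\1\3\2/')
printf '%s\n' "$result"
```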
Try that one:
sed 's#\([a-zA-Z0-9_-]\+\)\(\[[^]]*\]\)\(\[[^]]*\]\)\(.*$\)#\1\3\2\4#'
It adds four captures for the variable name, the first index, the second index and the rest and then switches order.
Edit: @Sudo_O's solution with extended regular expressions is much more readable. Thanks for that! Nevertheless, on some systems sed -r may not be available, since it is not part of POSIX.
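If portability is the concern, note that \+ is itself a GNU extension to basic regular expressions. A sketch that stays within POSIX basic regular expressions, assuming the indices are plain digits, spells "one or more" as [0-9][0-9]*. The input line is an assumed sample with a two-digit index.

```shell
# Basic regular expressions only: \( \) for groups, X X* instead of X\+.
result=$(printf '%s\n' 'zData[0][12] = 198;' |
  sed 's/\(zData\)\(\[[0-9][0-9]*\]\)\(\[[0-9][0-9]*\]\)/\1\3\2/')
printf '%s\n' "$result"
```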