execute a command if substitution succeeded - sed

I would like to remove \" from a Perl script with sed:
sed -ne '
#(here some substitutions...)
s/print "\(.*[^"]\)"/\1/p;
' | \
sed -e 's/\\"/"/g'
Is it possible to replace \" with " only on lines where the first substitution was made? In other words, can this script be written as a single sed command?
Branching alone does not help, because if an earlier substitution (not shown here) has succeeded, the condition is considered true even though the last substitution was not made.
EXAMPLE:
#! /usr/bin/perl
(...)
while (@someArray) {
print "la variable \"$_\" est cool!\n";
syslog ('info|machin', "la variable \"$_\" est cool!");
}
should become
"la variable "$_" est cool!\n"
but no substitution should be made in
syslog ('info|machin', "la variable \"$_\" est cool!");
even if this line was selected by an earlier substitution.

sed -ne '
# substitutions to run regardless of whether s/print.../ matches
#(here some substitutions...)
# reset the t flag so the substitutions above cannot trigger the branch below
t reset
: reset
s/print "\(.*[^"]\)"/\1/;
t bs
# substitutions to run only if /print.../ was NOT found
#(here some substitutions...)
b
: bs
# substitutions to run only if /print.../ was found
#(here some substitutions...)
s/\\"/"/g
p
'
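After an s command, t tests whether a substitution has succeeded since the last input line was read (or since the last t branch was taken) and, if so, jumps to the given label (bs in this case).
So if the print substitution occurs, sed jumps to bs, makes the other substitutions and then prints the result; if not, it reaches b, which without a label jumps to the end of the script.
(Code revised after a different reading of what the other substitutions should do.)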
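For comparison, the same "only unescape lines where the print substitution succeeded" logic can be sketched in Python. This is an illustration only, not part of the sed answer; the function name extract_print_strings is made up for the example, and the regex is a direct transcription of the sed pattern.

```python
import re

# Same pattern as the sed command: print "...", where the last character
# before the closing quote is not itself a quote.
PRINT_RE = re.compile(r'print "(.*[^"])"')

def extract_print_strings(lines):
    """Yield the unescaped string of each print statement; skip all other lines."""
    for line in lines:
        new_line, count = PRINT_RE.subn(r"\1", line, count=1)
        if count == 0:
            continue  # like `b`: the print substitution did not happen
        # like the `: bs` branch: only reached when the substitution succeeded
        yield new_line.replace('\\"', '"')
```

Because the second replacement is guarded by the match count of that one substitution, earlier substitutions cannot trigger it by accident.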

Related

Sed - substitute only within the line containing braces

I have been struggling with this all day. I am trying to turn words into variables (prefix them with $), but only in the parts of a line enclosed in square brackets.
Lines look like this:
blah blah [ae b c] blah [zv y] blah
I need to make this:
blah blah [$ae $b $c] blah [$zv $y] blah
There must be an easy way to do this. However, whenever I try
$ echo "blah blah [ae b c] blah [zv y] blah" | sed 's/\[\(\b.*\b\)\]/$\1/g'
I get greedy matching and just one variable:
blah blah $ae b c] blah [zv y blah
Is there something better?
Thanks,
$ echo "blah blah [ae b c] blah [zv y] blah" | sed -r ':b; s/([[][^]$]* )([[:alnum:]]+)/\1$\2/g; t b; s/[[]([[:alnum:]])/[$\1/g'
blah blah [$ae $b $c] blah [$zv $y] blah
How it works
-r
This turns on extended regex.
:b
This creates a label b.
s/([[][^]$]* )([[:alnum:]]+)/\1$\2/g
This looks for [, followed by anything except ] or $, followed by a space, followed by any alphanumeric characters. It puts a $ in front of the alphanumeric characters.
Note the bracket-expression convention: [[] matches a literal [, while [^]$] matches any character except ] and $. This is more portable than attempting to escape these characters with backslashes.
t b
If the command above resulted in a substitution, this branches back to label b so that the substitution is attempted again.
s/[[]([[:alnum:]])/[$\1/g
The last step is to look for [ followed by an alphanumeric character and put a $ between them.
Because the character class [[:alnum:]] is used instead of ranges such as a-z, this code is Unicode-safe in a UTF-8 locale.
Mac OSX (BSD) Version
BSD sed (OS X) limits the ability to combine statements with semicolons. Try this instead:
sed -E -e ':b' -e 's/([[][^]$]* )([[:alnum:]]+)/\1$\2/g' -e 't b' -e 's/[[]([[:alnum:]])/[$\1/g'
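If a scripting language is an option, the same result can be had without a loop. The sketch below uses a different technique from the sed answer: a callback replacement that rewrites each bracket group once. It assumes brackets are not nested, and the function name prefix_bracket_words is only illustrative.

```python
import re

def prefix_bracket_words(line):
    """Put a $ in front of every word inside each [...] group."""
    def fix_group(match):
        # \w+ roughly plays the role of [[:alnum:]]+ (it also matches "_")
        inner = re.sub(r"\w+", lambda m: "$" + m.group(0), match.group(1))
        return "[" + inner + "]"
    # [^\]]* cannot cross a "]", so each match stays inside one bracket pair
    return re.sub(r"\[([^\]]*)\]", fix_group, line)
```

Text outside the brackets is never passed to the inner substitution, so it is left untouched.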
To stop the match from being greedy, match any character except a closing bracket instead of any character:
sed 's/\[\(\b[^]]*\b\)\]/$\1/g'
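The difference between the greedy pattern and the negated character class is easy to see in isolation. The following Python snippet is only a demonstration of the two patterns on the sample line; it does not add the $ signs.

```python
import re

line = "blah blah [ae b c] blah [zv y] blah"

# Greedy: .* runs on to the last "]" of the line, so there is a single match.
greedy = re.findall(r"\[(.*)\]", line)

# Negated class: [^\]]* stops at the first "]", so each group matches on its own.
bounded = re.findall(r"\[([^\]]*)\]", line)
```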
The task you want to do cannot be done with sed because context-sensitive matching cannot be described with regular grammar.
It's difficult to solve this using sed. As an alternative, you can use perl with the help of the Text::Balanced module, which extracts text between balanced delimiters such as square brackets. Each call returns an array with the bracketed text, the text after it and the text before it, so you can apply the regex that inserts the $ sign to the relevant part of the string.
perl -MText::Balanced=extract_bracketed -lne '
BEGIN { $result = q||; }
do {
@result = extract_bracketed($_, q{[]}, q{[^[]*});
if (! defined $result[0]) {
$result .= $result[1];
last;
}
$result[0] =~ s/(\[|\s+)/$1\$/g;
$result .= $result[2] . $result[0];
$_ = $result[1];
} while (1);
END { printf qq|%s\n|, $result; }
' infile
It yields:
blah blah [$ae $b $c] blah [$zv $y] blah
sed 's/\[\([^]]*\)\]/[ \1]/g
:loop
s/\(\(\[[^]$]*\)\([[:blank:]]\)\)\([^][:blank:]$][^]]*\]\)/\1\$\4/g
t loop
s/\[ \([^]]*\)\]/[\1]/g' YourFile
POSIX version, assuming there are no nested brackets such as [a b[c] d ].
Algorithm:
add a space after each opening bracket (a blank is used as the word separator, and the first word often has none in front of it)
set a label to loop back to
add a $ in front of the last word in each bracket group that does not already start with $; this is done for every bracket group on the line, but only one $ is added per group on each pass
if a substitution occurred, go back to the label loop and try again
remove the space added in the first step
This might work for you (GNU sed):
sed -r 'h;s/\</$/g;T;G;s/^/\n/;:a;s/\n[^[]*(\[[^]]*\])(.*\n)([^[]*)[^]]*\]/\3\1\n\2/;ta;s/\n(.*)\n(.*)/\2/' file
Make a copy of the current line. Insert $ in front of all start-of-word boundaries. If nothing is substituted, print the current line and bail out. Otherwise append the copy of the unadulterated line and insert a newline at the start of the adulterated current line. Using substitution and pattern matching, replace the parts of the line between [...] with the original matching parts, using the newline to move the match forwards through the line. When all matches have been made, replace the end of the original line and remove the newlines.

Use sed to replace word in 2-line pattern

I am trying to use sed to replace a word in a 2-line pattern with another word. When the pattern 'MACRO "something"' is found on one line, 'BLOCK' on the next line should be replaced with 'CORE'. The "something" is to be captured in a backreference and printed out as well.
My input data:
MACRO ABCD
CLASS BLOCK ;
SYMMETRY X Y ;
Desired outcome:
MACRO ABCD
CLASS CORE ;
SYMMETRY X Y ;
My attempt in sed so far:
sed 's/MACRO \([A-Za-z0-9]*\)/,/ CLASS BLOCK ;/MACRO \1\n CLASS CORE ;/g' input.txt
The above did not work giving message:
sed: -e expression #1, char 30: unknown option to `s'
What am I missing?
I'm open to one-liner solutions in perl as well.
Thanks,
Gert
Using a perl one-liner in slurp mode:
perl -0777 -pe 's/MACRO \w+\n CLASS \KBLOCK ;/CORE ;/g' input.txt
Or using a streaming example:
perl -pe '
s/^\s*\bCLASS \KBLOCK ;/CORE ;/ if $prev;
$prev = $_ =~ /^MACRO \w+$/
' input.txt
Explanation:
Switches:
-0777: Slurp files whole
-p: Creates a while(<>){...; print} loop for each line in your input file.
-e: Tells perl to execute the code on command line.
When in one line the pattern 'MACRO "something"' is found then in the
next line replace 'BLOCK' with 'CORE'.
sed works on lines of input. If you want to perform substitution on the next line of a specified pattern, then you need to add that to the pattern space before being able to do so.
The following might work for you:
sed '/MACRO/{N;s/\(CLASS \)BLOCK/\1CORE/;}' filename
Quoting from the documentation:
`N'
Add a newline to the pattern space, then append the next line of
input to the pattern space. If there is no more input then sed
exits without processing any more commands.
If you want to make use of address range as in your attempt, then you need:
sed '/MACRO/,/CLASS BLOCK/{s/\(CLASS\) BLOCK/\1 CORE/}' filename
I'm not sure why you need a backreference for substituting the macro name.
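The "remember that the previous line matched, then edit the next one" idea behind N can also be written as a small state machine. Here is a minimal Python sketch; the function name replace_after_macro is hypothetical, and it assumes the MACRO line starts with the word MACRO (the sed commands above match MACRO anywhere on the line).

```python
def replace_after_macro(lines):
    """Replace BLOCK with CORE on a line that directly follows a MACRO line."""
    out = []
    prev_was_macro = False
    for line in lines:
        if prev_was_macro:
            line = line.replace("CLASS BLOCK", "CLASS CORE", 1)
        # remember whether this line opens a MACRO, for the next iteration
        prev_was_macro = line.lstrip().startswith("MACRO ")
        out.append(line)
    return out
```

A CLASS BLOCK line that does not directly follow a MACRO line is left alone.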
You could try this awk command also,
awk '{print}/MACRO/ {getline; sub (/BLOCK/,"CORE");{print}}' file
It prints every line as is and, on seeing the word MACRO on a line, reads the next line with getline, replaces BLOCK with CORE in it and prints it.
Since getline has so many pitfalls I try not to use it, so:
awk 'f {sub(/BLOCK/,"CORE"); f=0} /MACRO/ {f=1} 1' file
MACRO ABCD
CLASS CORE ;
SYMMETRY X Y ;
This could do it
#!awk -f
BEGIN {
RS = ";"
}
/MACRO/ {
sub("BLOCK", "CORE")
}
{
printf "%s", (s++ ? ";" : "") $0
}
A "line" (record) ends with ;
replace BLOCK with CORE in "lines" containing MACRO
print ; before each "line" except the first

find the line number where a specific word appears with “sed” on tcl shell

I need to search for a specific word in a file starting from specific line and return the line numbers only for the matched lines.
Let's say I want to search a file called myfile for the word my_word and then store the returned line numbers.
In a shell script, the command
sed -n '10,$ { /$my_word /= }' $myfile
works fine, but how do I write that command in the Tcl shell?
% exec sed -n '10,$ { /$my_word/= }' $file
extra characters after close-brace.
I want to add that the following command works fine on tcl shell but it starts from the beginning of the file
% exec sed -n "/$my_word/=" $file
447431
447445
448434
448696
448711
448759
450979
451006
451119
451209
451245
452936
454408
I have solved the problem as follows
set lineno 10
if { ! [catch {exec sed -n "/$new_token/=" $file} lineFound] && [string length $lineFound] > 0 } {
set lineNumbers [split $lineFound "\n"]
foreach num $lineNumbers {
if {[expr {$num >= $lineno}] } {
lappend col $num
}
}
}
I still can't find a one-line solution to the problem, though.
Any suggestions?
There is one thing I don't understand: is the text you are looking for stored in a variable called my_word, or is it the literal value my_word?
In your line
% exec sed -n '10,$ { /$my_word/= }' $file
I'd say it's the first case. So you have before it something like
% set my_word wordtosearch
% set file filetosearchin
Your mistake is to use the single quote character ' to enclose the sed expression. That character is an enclosing operator in sh, but has no meaning in Tcl.
You use it in sh to group many words in a single argument that is passed to sed, so you have to do the same, but using Tcl syntax:
% set my_word wordtosearch
% set file filetosearchin
% exec sed -n "10,$ { /$my_word/= }" $file
Here, you use the "..." to group.
You don't escape the $ in $my_word because you want $my_word to be substituted with the string wordtosearch.
I hope this helps.
After some trial and error I came up with:
set output [exec sed -n "10,\$ \{ /$myword/= \}" $myfile]
# Do something with the output
puts $output
The key is to escape characters that are special to Tcl, such as the dollar sign and curly braces.
Update
Per Donal Fellows, we do not need to escape the dollar sign:
set output [exec sed -n "10,$ \{ /$myword/= \}" $myfile]
I have tried the new revision and found it works. Thank you, Donal.
Update 2
I finally gained access to a Windows 7 machine, installed Cygwin (which includes sed and tclsh). I tried out the above script and it works just fine. I don't know what your problem is. Interestingly, the same script failed on my Mac OS X system with the following error:
sed: 1: "10,$ { /ipsum/= }": extra characters at the end of = command
while executing
"exec sed -n "10,$ \{ /$myword/= \}" $myfile"
invoked from within
"set output [exec sed -n "10,$ \{ /$myword/= \}" $myfile]"
(file "sed.tcl" line 6)
I guess the difference is between GNU sed (Linux, Cygwin) and BSD sed (OS X); BSD sed appears to want a semicolon or newline before the closing brace.
Update 3
I have tried the same script under Linux/Tcl 8.4 and it works. That might mean Tcl 8.4 has nothing to do with it. Here is something else that might help: Tcl comes with a package called fileutil, which is part of the tcllib. The fileutil package contains a useful tool for this case: fileutil::grep. Here is a sample on how to use it in your case:
package require fileutil
proc grep_demo {myword myfile} {
foreach line [fileutil::grep $myword $myfile] {
# Each line is in the format:
# filename:linenumber:text
set lineNumber [lindex [split $line :] 1]
if {$lineNumber >= 10} { puts $lineNumber}
}
}
grep_demo $myword $myfile
Here is how to do it with awk
awk 'NR>=10 && $0~f {print NR}' f="$my_word" "$myfile"
This prints the number of every line, from line 10 onward, that matches the word stored in the shell variable my_word, in the file whose name is stored in the variable myfile.
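For reference, the filtering itself is just "line number at or after 10, and the line contains the word". Here is a minimal Python sketch of that logic; the function name matching_line_numbers is made up, and it treats the word as a literal string, whereas sed and awk treat it as a regular expression.

```python
def matching_line_numbers(lines, word, start=10):
    """Return the 1-based numbers of lines at or after `start` that contain `word`."""
    return [
        number
        for number, line in enumerate(lines, 1)  # number lines from 1, like sed
        if number >= start and word in line
    ]
```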

Remove newline depending on the format of the next line

I have a special file with this kind of format :
title1
_1 texthere
title2
_2 texthere
I would like every line starting with "_" to be appended, as a second column, to the line before it.
I tried to do that with this sed command:
sed 's/_\n/ /g' filename
but it does not do what I want (it basically does nothing).
Can anyone point me to the right way of doing it?
Thanks
Try the following solution.
In sed, a loop is built by creating a label (:a); while the current line is not the last one ($!), append the next line (N) and branch back to label a:
:a
$! {
N
b a
}
After this the whole file is in memory (in the pattern space), so do a global substitution for each _ preceded by a newline:
s/\n_/ _/g
p
All together is:
sed -ne ':a ; $! { N ; ba }; s/\n_/ _/g ; p' infile
That yields:
title1 _1 texthere
title2 _2 texthere
If your whole file is like your sample (pairs of lines), then the simplest answer is
paste - - < file
Otherwise
awk '
NR > 1 && /^_/ {printf "%s", OFS}
NR > 1 && !/^_/ {print ""}
{printf "%s", $0}
END {print ""}
' file
This might work for you (GNU sed):
sed ':a;N;s/\n_/ /;ta;P;D' file
This avoids slurping the file into memory.
or:
sed -e ':a' -e 'N' -e 's/\n_/ /' -e 'ta' -e 'P' -e 'D' file
A Perl approach:
perl -00pe 's/\n_/ /g' file
Here, -00 causes perl to read the file in paragraph mode, where records are separated by blank lines rather than single newlines. Your example has no blank lines, so the entire file is read into memory as one record, and a simple global substitution of \n_ with a space will work.
That is not very efficient for very large files though. If your data is too large to fit in memory, use this:
perl -ne 'chomp;
s/^_// ? print "$l " : print "$l\n" if $. > 1;
$l=$_;
END{print "$l\n"}' file
Here, the file is read line by line (-n) and the trailing newline is removed from every line (chomp). At the end of each iteration, the current line is saved as $l ($l=$_). For each line after the first, if the substitution succeeds and a _ is removed from the beginning of the line (s/^_//), the previous line is printed followed by a space instead of a newline (print "$l "). If the substitution fails, the previous line is printed with a newline. The END{} block just prints the final line of the file.
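The same "hold the previous line until we know what the next one looks like" logic, sketched in Python for comparison. The function name join_continuations is hypothetical. Unlike the Perl version, this sketch keeps the leading _, matching the sed output shown earlier.

```python
def join_continuations(lines, marker="_"):
    """Append each line starting with `marker` to the previous line, space-separated."""
    pending = None  # the line waiting to be emitted, like $l in the Perl version
    for line in lines:
        if pending is not None and line.startswith(marker):
            pending = pending + " " + line
        else:
            if pending is not None:
                yield pending
            pending = line
    if pending is not None:
        yield pending  # flush the final line, like the END block
```

Only one pending line is kept at a time, so the whole file never has to fit in memory.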

divide each line into equal parts

I would be grateful if anyone could suggest a command (a sed or awk one-liner) to divide each line of a file into an equal number of parts, for example 4 parts per line.
Input:
ATGCATHLMNPHLNTPLML
Output:
ATGCA THLMN PHLNT PLML
This should work using GNU sed:
sed -r 's/(.{4})/\1 /g'
-r is needed to use extended regular expressions
.{4} matches four characters at a time; the parentheses ( ) capture them as a group
\1 refers to the captured group, and the replacement adds a space after it
g makes sure that the replacement is done as many times as possible on each line
A test; this is the input and output in my terminal (note that it produces groups of four characters rather than four groups):
$ echo "ATGCATHLMNPHLNTPLML" | sed -r 's/(.{4})/\1 /g'
ATGC ATHL MNPH LNTP LML
I suspect awk is not the best tool for this, but:
gawk --posix '{ l = sprintf( "%d", 1 + (length()-1)/4);
gsub( ".{"l"}", "& " ) } 1' input-file
If you have a POSIX-compliant awk you can omit --posix. It is needed for older versions of GNU awk (before 4.0, where interval expressions such as {5} are off by default), and since gawk seems to be the most commonly used implementation I've given the solution in terms of it.
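The width calculation in the awk command is a ceiling division: the line length divided by 4, rounded up. A small Python sketch of the same idea follows; the function name split_into_parts is only illustrative.

```python
def split_into_parts(line, parts=4):
    """Split `line` into at most `parts` chunks of width ceil(len(line) / parts)."""
    if not line:
        return ""
    width = -(-len(line) // parts)  # ceiling division, like 1 + (length()-1)/4
    chunks = [line[i:i + width] for i in range(0, len(line), width)]
    return " ".join(chunks)
```

With the sample line of 19 characters the width is 5, which gives the chunk sizes 5, 5, 5 and 4 asked for in the question.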
This might work for you (GNU sed):
sed 'h;s/./X/g;s/^\(.*\)\1\1\1/\1 \1 \1 \1/;G;s/\n/&&/;:a;/^\n/bb;/^ /s/ \(.*\n.*\)\n\(.\)/\1 \n\2/;ta;s/^.\(.*\n.*\)\n\(.\)/\1\2\n/;ta;:b;s/\n//g' file
Explanation:
h copy the pattern space (PS) to the hold space (HS)
s/./X/g replace every character in the PS with the same non-space character (in this case X)
s/^\(.*\)\1\1\1/\1 \1 \1 \1/ split the line into 4 parts (space separated)
G append a newline followed by the contents of the HS to the PS
s/\n/&&/ double the newline (to be later used as markers)
:a introduce a loop label
/^\n/bb if we reach a newline we are done and branch to label b
/^ /s/ \(.*\n.*\)\n\(.\)/\1 \n\2/;ta; if the first character is a space add a space to the real line at this point and repeat
s/^.\(.*\n.*\)\n\(.\)/\1\2\n/;ta any other character just bump along and repeat
:b;s/\n//g all done just remove the markers and print out the result
This works for any length of line; however, if the length is not exactly divisible by 4, the last portion will contain the remainder as well.
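The behaviour described above (equal parts, with any remainder added to the last one) can be sketched in Python as follows. This is a plain-slicing illustration of the described result, not a translation of the sed script; the function name split_with_remainder is made up.

```python
def split_with_remainder(line, parts=4):
    """Split `line` into `parts` chunks of floor(len/parts); the last gets the rest."""
    width = len(line) // parts
    if width == 0:
        return line  # too short to split into `parts` non-empty chunks
    cuts = [line[i * width:(i + 1) * width] for i in range(parts - 1)]
    cuts.append(line[(parts - 1) * width:])  # remainder goes into the last chunk
    return " ".join(cuts)
```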
perl
perl might be a better choice here:
export cols=4
perl -ne 'chomp; $fw = 1 + int length()/$ENV{cols}; while(/(.{1,$fw})/gm) { print $1 . " " } print "\n"'
This re-calculates field-width for every line.
coreutils
A GNU coreutils alternative, field-width is chosen based on the first line of infile:
cols=4
len=$(( $(head -n1 infile | wc -c) - 1 ))
fw=$(echo "scale=0; 1 + $len / $cols" | bc)
cut_arg=$(paste -d- <(seq 1 $fw $len) <(seq $fw $fw $len) | head -c-1 | tr '\n' ',')
Value of cut_arg is in the above case:
1-5,6-10,11-15,16-
Now cut the line into appropriate chunks:
cut --output-delimiter=' ' -c $cut_arg infile