How can I use sed to convert $$ blah $$ in TeX to \begin{equation} blah \end{equation}?

I have files with entries of the form:
$$
y = x^2
$$
I'm looking for a way (specifically using sed) to convert them to:
\begin{equation}
y = x^2
\end{equation}
The solution should not rely on the form of the equation (which may also span multiple lines) nor on the text preceding the opening $$ or following the closing $$.
Thanks for the help.

sed '
/^\$\$$/ {
x
s/begin/&/
t use_end_tag
s/^.*$/\\begin{equation}/
h
b
: use_end_tag
s/^.*$/\\end{equation}/
h
}
'
Explanation:
sed maintains two buffers: the pattern space (pspace) and the hold space (hspace). It operates in cycles, where during each cycle it reads a line and executes the script for that line. pspace is usually auto-printed at the end of each cycle (unless the -n option is used), and then deleted before the next cycle. hspace holds its contents between cycles.
The idea of the script is that whenever $$ is seen, hspace is first checked to see if it contains the word "begin". If it does, then substitute the end tag; otherwise substitute the begin tag. In either case, store the substituted tag in the hold space so it can be checked next time.
sed '
/^\$\$$/ { # if line contains only $$
x # exchange pspace and hspace
s/begin/&/ # see if "begin" was in hspace
t use_end_tag # if it was, goto use_end_tag
s/^.*$/\\begin{equation}/ # replace pspace with \begin{equation}
h # set hspace to contents of pspace
b # start next cycle after auto-printing
: use_end_tag
s/^.*$/\\end{equation}/ # replace pspace with \end{equation}
h # set hspace to contents of pspace
}
'
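As a quick check of the toggle logic, the script can be wrapped in a shell function and fed a sample with a multi-line equation. This is only a sketch: tex_equations is a made-up name, and the here-document delimiter is quoted so that the shell does not expand $$ into its process ID.

```shell
#!/bin/sh
# tex_equations is a hypothetical wrapper name; the sed script is the one above.
tex_equations() {
    sed '
    /^\$\$$/ {
        x
        s/begin/&/
        t use_end_tag
        s/^.*$/\\begin{equation}/
        h
        b
        : use_end_tag
        s/^.*$/\\end{equation}/
        h
    }
    ' "$@"
}

# Quoted delimiter: the $$ lines reach sed literally.
tex_equations <<'EOF'
Some text.
$$
y = x^2
$$
More text.
$$
a = b
  + c
$$
EOF
```

Each odd-numbered $$ line should come out as \begin{equation} and each even-numbered one as \end{equation}, with all other lines untouched.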

This might work for you (GNU sed):
sed -r '1{x;s/^/\\begin{equation}\n\\end{equation}/;x};/\$\$/{g;P;s/(.*)\n(.*)/\2\n\1/;h;d}' file
Prime the hold space with the required strings. On encountering the marker print the first line and then swap the strings in anticipation of the next marker.

I can not help you with sed, but this awk should do:
awk '/\$\$/ && !f {$0="\\begin{equation}";f=1} /\$\$/ && f {$0="\\end{equation}";f=0}1' file
\begin{equation}
y = x^2
\end{equation}
The f=0 is not needed if the file contains only one $$ pair, since the flag then never has to be reset.

Related

Can Sed match matching brackets?

My code has a ton of occurrences of something like:
idof(some_object)
I want to replace them with:
some_object["id"]
It sounds simple:
sed -i 's/idof(\([^)]\+\))/\1["id"]/g' source.py
The problem is that some_object might be something like idof(get_some_object()), or idof(my_class().get_some_object()), in which case, instead of getting what I want (get_some_object()["id"] or my_class().get_some_object()["id"]), I get get_some_object(["id"]) or my_class(["id"].get_some_object()).
Is there a way to have sed match closing bracket, so that it internally keeps track of any opening/closing brackets inside my (), and ignores those?
It needs to keep everything that's between those brackets: idof(ANYTHING) becomes ANYTHING["id"].
Using sed
$ sed -E 's/idof\(([[:alpha:][:punct:]]*)\)/\1["id"]/g' input_file
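With ERE (-E), \( and \) match literal parentheses, while the unescaped ( and ) form the capture group. idof and its outer parentheses are matched but left outside the group, so only the argument is kept.
Because * is greedy and [[:punct:]] includes ( and ), the group extends to the last ) it can reach, so any nested parentheses in between are captured too.
[[:alpha:]] matches all alphabetic characters, upper and lower case, while [[:punct:]] matches punctuation characters including ().-{} and more. Digits and whitespace belong to neither class, so an argument containing them will not match.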
The g option will make the substitution as many times as the pattern is found.
Theoretically, you can write a regex that handles idof(....) up to some fixed limit of nested () calls inside the argument. Such a regex has to spell out every possible combination of calls. For example, idof(one(two(three))) can be matched with idof([^()]*([^()]*([^()]*)[^()]*)[^()]*), and idof(one(two(three)four(five))) with idof([^()]*([^()]*([^()]*)[^()]*([^()]*)[^()]*)[^()]*).
The following regex handles only some cases, but shows the complexity and the general path. Writing a regex that handles all possible cases to "eat" everything in front of the trailing ) is left to the OP as an exercise in why it is better to use something else. Note that handling string literals such as ")" becomes increasingly complex.
The following Bash code:
sed '
: begin
# No idof? Just print the line!
/^\(.*\)idof(\([^)]*)\)/!n
# Note: regex is greedy - we start from the back!
# Note: using newline as a stack separator.
s//\1\n\2/
# hold the front
{ h ; x ; s/\n.*// ; x ; s/[^\n]*\n// ; }
: handle_brackets
# Eat everything before final ) up to some number of nested ((())) calls.
# Insert more jokes here.
: eat_brackets
/^[^()]*\(([^()]*\(([^()]*\(([^()]*\(([^()]*\(([^()]*\(([^()]*)\)\?[^()]*)\)\?[^()]*)\)\?[^()]*)\)\?[^()]*)\)\?[^()]*)\)/{
s//&\n/
# Hold the front.
{ H ; x ; s/\n\([^\n]*\)\n.*/\1/ ; x ; s/[^\n]*\n// ; }
b eat_brackets
}
/^\([^()]*\))/!{
s/^/ERROR: eating brackets did not work: /
q1
}
# Add the id after trailing ) and remove it.
s//\1["id"]/
# Join with hold space and clear the hold space for next round
{ H ; s/.*// ; x ; s/\n//g ; }
# Restart for another idof if in input.
b begin
' <<EOF
before idof(some_object) after
before idof(get_some_object()) after
before idof(my_class().get_some_object()) after
before idof(one(two(three)four)five) after
before idof(one(two(three)four)five) between idof(one(two(three)four)five) after
before idof( one(two(three)four)five one(two(three)four)five ) after
before idof(one(two(three(four)five)six(seven(eight)nine)ten) between idof(one(two(three(four)five)six(seven(eight)nine)ten) after
EOF
Will output:
before some_object["id"] after
before get_some_object()["id"] after
before my_class().get_some_object()["id"] after
before one(two(three)four)five["id"] after
before one(two(three)four)five["id"] between one(two(three)four)five["id"] after
before one(two(three)four)five one(two(three)four)five ["id"] after
ERROR: eating brackets did not work: one(two(three(four)five)six(seven(eight)nine)ten) after
The last line is not handled correctly, because the (()()) case is not covered by the regex. One would have to extend the regex to match it.
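If only one level of nested calls ever occurs in the argument, a much shorter ERE may be enough. The sketch below is an assumption-laden shortcut, not a general solution: idof_one_level is a made-up name, and the pattern accepts any number of (...) pairs inside the argument as long as those pairs contain no parentheses themselves. Deeper nesting is left unchanged rather than mangled.

```shell
#!/bin/sh
# idof_one_level is a hypothetical helper name.
# Group 1 is: non-paren text, then zero or more "(non-paren text)" pairs,
# each followed by more non-paren text.
idof_one_level() {
    sed -E 's/idof\(([^()]*(\([^()]*\)[^()]*)*)\)/\1["id"]/g' "$@"
}

idof_one_level <<'EOF'
before idof(some_object) after
before idof(get_some_object()) after
before idof(my_class().get_some_object()) between idof(other) after
before idof(one(two(three)four)five) after
EOF
```

Because [^()] can never cross a parenthesis, two idof(...) calls on one line are matched separately instead of being swallowed by one greedy match. The last sample line has two levels of nesting and passes through untouched.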

sed conditional insertion between text & pattern

I would like to parse a file made up of text blocks and add a complementary delimiter where it is missing.
Example of a good existing block:
%
sometext
-+- some signature
Example of a bad existing block:
%
sometext
%
someothertext
What I can already do is identify the pattern and insert the pattern unconditionally, like:
sed '/%$/ i\-+-' toto
-+-
%
1
-+-
%
in my test file.
How can I identify that the line above the % char is a text block, and if so, insert a -+- signature -+- line between the text and the new signature line?
Full example:
%
good signature is present
-+- signature -+-
%
bad signature is no present
%
this is also bad
%
this one is good
-+- signature -+-
must become
%
good signature is present
-+- signature -+-
%
bad signature is no present
-+- signature -+-
%
this is also bad
-+- signature -+-
%
this one is good
-+- signature -+-
The texts themselves won't change.
The following script:
#!/bin/sh
cat <<EOF |
%
good signature is present
-+- signature -+-
%
bad signature is no present
%
this is also bad
%
this one is good
EOF
sed -E '
# Last line is a bit special - we add to hold buffer first.
${
# Give me functions in sed....
# Keep last 2 lines in hold space.
x; G; s/^.*((\n[^\n]*){2})$/\1/; x;
# Add the line.
b ADD;
}
# Check if current line does not contain -+-
/^-\+-/!{ b ADD; }
b NOADD; { : ADD;
# Check if two last lines match the pattern.
x; /^\n% *\n[a-zA-Z ]+$/{
# Last line needs to print pattern space first.
${ x; p; x; };
# Insert the line with signature.
# Flush hold space.
s/.*/-+- signature -+-/; p; s/.*//;
# Last line exits
${ d; };
}; x;
}; : NOADD
# Keep last 2 lines in hold space.
x; G; s/^.*((\n[^\n]*){2})$/\1/; x;
'
outputs:
%
good signature is present
-+- signature -+-
%
bad signature is no present
-+- signature -+-
%
this is also bad
-+- signature -+-
%
this one is good
-+- signature -+-
The general idea is to accumulate enough state in the hold buffer to decide what to do. You then only check whether the hold buffer plus the pattern buffer contain what you are looking for, and act on it.
The last-line handling is semi-broken and probably needs to be fixed and handled better, which is left to others.
As an alternative to storing state in the hold buffer, you can "store" state in the current control-flow position inside the script. Which method to choose is subjective and depends on the work to be done. I believe the control-flow approach is actually simpler here:
sed -E '
: RESTART
# Check for %
/^%/{
n;
# Check next line for words.
/^[a-zA-Z ]+$/{
# If this is the last line of input, print it first, then add.
${ p; b ADD; }
n;
# If something else, we also add.
/^-\+-/!{ b ADD; }
b NOADD; { : ADD;
# Add the signature.
x; s/.*/-+- signature -+-/p; x;
# Last line already printed - just quit.
${ d; }
# We already read next line above - restart.
b RESTART
}; : NOADD
}
}
'
With awk:
awk -v s='-+- signature -+-' '
$0=="%"{if(f) print s; f=1}
$0==s{f=0} 1; END{if(f) print s}' ip.txt
f is a flag that is set when the input line is % and unset when a signature is found.
If f is still set when the next % occurs, the signature is printed.
END{if(f) print s} is needed in case the final block has no signature.
Note that exact string comparison is used here to check the input lines. If there is excess whitespace, you'll have to use a regex instead or strip the excess whitespace first.
Using a regexp instead of string matching (adjust the regexp as needed):
awk -v s='-+- signature -+-' '
/^%/{if(f) print s; f=1}
/^-\+- signature/{f=0} 1;
END{if(f) print s}' ip.txt
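To try the flag logic without a file, the regexp version can be wrapped in a function and fed from a pipe. This is just a sketch: add_missing_signatures is a made-up name, and the sample covers both a missing signature in the middle and one at the very end.

```shell
#!/bin/sh
# add_missing_signatures is a hypothetical wrapper around the awk program above.
add_missing_signatures() {
    awk -v s='-+- signature -+-' '
        /^%/ { if (f) print s; f = 1 }   # new block: close the previous one if still open
        /^-\+- signature/ { f = 0 }      # this block already has its signature
        1                                # print every input line
        END { if (f) print s }           # close the final block if needed
    ' "$@"
}

printf '%s\n' '%' 'first block' '%' 'second block' '-+- signature -+-' '%' 'last block' |
    add_missing_signatures
```

The first and last blocks should each gain a signature line, while the second block keeps its existing one.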
This might work for you (GNU sed):
sed '1{x;s/^/-+- dummy sig -+-/;x};/^%/{:a;${G;b};n;/^%/{x;p;x;ba};/^-+-/!ba}' file
On the first line, set up a dummy signature in the hold space for later use.
If a line begins with %, keep printing lines until either another % is found, in which case a dummy signature is inserted and the process repeats, or a line beginning with -+- is found, in which case processing of that block ends.
The solution may be altered to use the previous signature, like so:
sed -e '1{x;s/^/-+- dummy sig -+-/;x};/^%/{:a;${G;b};n;/^%/{x;p;x;ba};/^-+-/!ba;h}' file
N.B. If the last line of the file is reached while the text after a % is still being processed, the dummy signature is appended.

use sed to change a text report to csv

I have a report that looks like this:
par_a
    .xx
    .yy
par_b
    .zz
    .tt
I wish to convert this into CSV format, as below, using a sed one-liner:
par_a,.xx
par_a,.yy
par_b,.zz
par_b,.tt
please help.
With awk:
awk '/^par_/{v=$0;next}/^ /{$0=v","$1;print}' File
Or to make it more generic:
awk '/^[^[:blank:]]/{v=$0;next} /^[[:blank:]]/{$0=v","$1;print}' File
When a line starts with par_, save the content to variable v. Now, when a line starts with space, change the line to content of v followed by , followed by the first field.
Output:
AMD$ awk '/^par_/{v=$0}/^ /{$0=v","$1;print}' File
par_a,.xx
par_a,.yy
par_b,.zz
par_b,.tt
With sed:
sed '/^par_/ { h; d; }; G; s/^[[:space:]]*//; s/\(.*\)\n\(.*\)/\2,\1/' filename
This works as follows:
/^par_/ { # if a new paragraph begins
h # remember it
d # but don't print anything yet
}
# otherwise:
G # fetch the remembered paragraph line to the pattern space
s/^[[:space:]]*// # remove leading whitespace
s/\(.*\)\n\(.*\)/\2,\1/ # rearrange to desired CSV format
Depending on your actual input data, you may want to replace the /^par_/ with, say, /^[^[:space:]]/. It just has to be a pattern that recognizes the beginning line of a paragraph.
Addendum: Shorter version that avoids regex repetition when using the space pattern to recognize paragraphs:
sed -r '/^\s+/! { h; d; }; s///; G; s/(.*)\n(.*)/\2,\1/' filename
Or, if you have to use BSD sed (as comes with Mac OS X):
sed '/^[[:space:]]\{1,\}/! { h; d; }; s///; G; s/\(.*\)\n\(.*\)/\2,\1/' filename
The latter should be portable to all seds, but as you can see, writing portable sed involves some pain.
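A small check of the portable variant, assuming (as the patterns in the answers imply) that the sub-entries are indented with whitespace in the real report. report_to_csv is a made-up name used only for this sketch.

```shell
#!/bin/sh
# report_to_csv is a hypothetical wrapper around the portable sed command above.
report_to_csv() {
    # Unindented line: remember it in the hold space and print nothing.
    # Indented line: strip the indentation (s/// reuses the address regex),
    # append the remembered line, then swap the two halves around a comma.
    sed '/^[[:space:]]\{1,\}/! { h; d; }; s///; G; s/\(.*\)\n\(.*\)/\2,\1/' "$@"
}

printf '%s\n' 'par_a' '    .xx' '    .yy' 'par_b' '    .zz' '    .tt' | report_to_csv
```

If the real report uses tabs or a different indentation width, the [[:space:]] class should still match it.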

error in debugging the algorithm

I'm trying to write an algorithm in Matlab that scans a character array from left to right. If it encounters a single space, it should do nothing, but if it encounters 2 consecutive spaces, it should print the rest of the array starting on the next line. For example,
inpuut='a bc  d';
After applying this algorithm, the final output should be:
a bc
d
but this algorithm is giving me the output as:
a bc
d d
Also, if someone has a simpler algorithm for this task, please help me :)
m=1; t=1;
inpuut='a bc  d';
while(m<=(length(inpuut)))
if((inpuut(m)==' ')&&(inpuut(m+1)==' '))
n=m;
fprintf(inpuut(t:(n-1)));
fprintf('\n');
t=m+2;
end
fprintf(inpuut(t));
if(t<length(inpuut))
t=t+1;
elseif(t==length(inpuut))
t=t-1;
else
end
m=m+1;
end
fprintf('\n');
OK, I gave up on working out why your code doesn't work. Here is a working version.
inpuut='a bc  d ';
% remove trailing space
while (inpuut(end)==' ')
inpuut(end)=[];
end
str = regexp(inpuut, '  ', 'split');
for ii = 1:length(str)
fprintf('%s\n', str{ii});
end
regexp with the 'split' option splits the string into a cell array, using the matching expression as the delimiter.
fprintf can do much more than print a single string: it takes a format specification, as in the '%s\n' call above.
You can remove the trailing space before printing, or do it inside the loop (check if the last cell is empty, but it's more costly).
You can use regexprep to replace two consecutive spaces with a line feed:
result_string = regexprep(inpuut, '  ', '\n');
If you need to remove trailing spaces, do this first:
inpuut = regexprep(inpuut, ' +$', '');
I have a solution without using regex, but I assumed you wanted to print on 2 lines maximum.
Example: with 'a b  c  hello':
a b
c  hello
and not:
a b
c
hello
In any case, here is the code:
inpuut = 'a b c';
while(length(inpuut) > 2)
% Read the next 2 characters
first2char = inpuut(1:2);
switch(first2char)
case '  ' % 2 white spaces
% we add a new line and print the rest of the input
fprintf('\n%s', inpuut(3:end));
inpuut = [];
otherwise % not 2 white spaces
% Just print one character
fprintf('%s', inpuut(1))
inpuut(1) = [];
end
end
fprintf('%s\n', inpuut);

How to efficiently transpose rows into columns in Vim?

I have a data file like the following:
----------------------------
a b c d e .............
A B C D E .............
----------------------------
But I want it to be in the following format:
----------------------------
a A
b B
c C
d D
e E
...
...
----------------------------
What is the quickest way to do the transformation in Vim or Perl?
Basically :.s/ /SpaceCtrl+vEnter/gEnterjma:.s/ /Ctrl+vEnter/gEnterCtrl+v'axgg$p'adG will do the trick. :)
OK, let's break that down:
:.s/ /SpaceCtrl+vEnter/gEnter: On the current line (.), substitute (s) spaces (/ /) with a space followed by a carriage return (SpaceCtrl+vEnter/), in all positions (g). The cursor should now be on the last letter's line (e in the example).
j: Go one line down (to A B C D E).
ma: Set mark a to the current position... because we want to refer to this position later.
:.s/ /Ctrl+vEnter/gEnter: Do the same substitution as above, but without the Space. The cursor should now be on the last letter's line (E in the example).
Ctrl+v'a: Select from the current cursor position (E) to mark a (that we set in step 3 above), using the block select.
x: Cut the selection (into the " register).
gg: Move the cursor to the first line.
$: Move the cursor to the end of the line.
p: Paste the previously cut text after the cursor position.
'a: Move the cursor to the a mark (set in step 3).
dG: Delete everything (the empty lines left at the bottom) from the cursor position to the end of the file.
P.S. I was hoping to learn about a "built-in" solution, but until such time...
Simple re-map of the columns:
use strict;
use warnings;
my @a = map [ split ], <>; # split each line on whitespace and store in array
for (0 .. $#{$a[0]}) { # for each such array element
printf "%s %s\n", $a[0]->[$_], $a[1]->[$_]; # print elements in order
}
Usage:
perl script.pl input.txt
Assuming that the cursor is on the first of the two lines, I would use the command
:s/ /\r/g|+&&|'[-;1,g/^/''+m.|-j