sed conditional insertion between text & pattern - sed

I would like to parse a file's content with text blocks & add a complementary delimiter.
Example of a good existing block:
%
sometext
-+- some signature
Example of a bad existing block:
%
sometext
%
someothertext
What I can already do is identify the pattern and insert the pattern unconditionally, like:
sed '/%$/ i\-+-' toto
-+-
%
1
-+-
%
in my test file.
How can I identify that the line above the % char is a text block, and if so, insert a -+- signature -+- line between the text and the new signature line?
Full example:
%
good signature is present
-+- signature -+-
%
bad signature is no present
%
this is also bad
%
this one is good
-+- signature -+-
must become
%
good signature is present
-+- signature -+-
%
bad signature is no present
-+- signature -+-
%
this is also bad
-+- signature -+-
%
this one is good
-+- signature -+-
The texts themselves won't change.

The following script:
#!/bin/sh
cat <<EOF |
%
good signature is present
-+- signature -+-
%
bad signature is no present
%
this is also bad
%
this one is good
EOF
sed -E '
# Last line is a bit special - we add to hold buffer first.
${
# Give me functions in sed....
# Keep last 2 lines in hold space.
x; G; s/^.*((\n[^\n]*){2})$/\1/; x;
# Add the line.
b ADD;
}
# Check if current line does not contain -+-
/^-\+-/!{ b ADD; }
b NOADD; { : ADD;
# Check if two last lines match the pattern.
x; /^\n% *\n[a-zA-Z ]+$/{
# Last line needs to print pattern space first.
${ x; p; x; };
# Insert the line with signature.
# Flush hold space.
s/.*/-+- signature -+-/; p; s/.*//;
# Last line exits
${ d; };
}; x;
}; : NOADD
# Keep last 2 lines in hold space.
x; G; s/^.*((\n[^\n]*){2})$/\1/; x;
'
outputs:
%
good signature is present
-+- signature -+-
%
bad signature is no present
-+- signature -+-
%
this is also bad
-+- signature -+-
%
this one is good
-+- signature -+-
The general idea is to accumulate enough state in the hold buffer to decide what to do. You then check whether the hold buffer plus the pattern buffer contain what you are looking for, and act only in that case.
The last-line handling is semi-broken and probably needs to be fixed and handled better, which is left to others.
As an alternative to storing state in the hold buffer, you can "store" state in the current control-flow position inside the script. Which method to choose is subjective and depends on the work to be done. I believe the control-flow approach is actually simpler here:
sed -E '
: RESTART
# Check for %
/^%/{
n;
# Check next line for words.
/^[a-zA-Z ]+$/{
# If end of line, first print, then we add.
${ p; b ADD; }
n;
# If something else, we also add.
/^-\+-/!{ b ADD; }
b NOADD; { : ADD;
# Add the signature.
x; s/.*/-+- signature -+-/p; x;
# Last line already printed - just quit.
${ d; }
# We already read next line above - restart.
b RESTART
}; : NOADD
}
}
'

With awk:
awk -v s='-+- signature -+-' '
$0=="%"{if(f) print s; f=1}
$0==s{f=0} 1; END{if(f) print s}' ip.txt
f is a flag that will be set if input line is % and unset if signature is found
If f is still set when the next % occurs, print the signature
END{if(f) print s} is needed if the final block didn't have a signature
Note that exact string comparison is used here to check the input lines. If there is excess whitespace, you'll have to use a regex instead or strip the excess whitespace first.
Using a regexp instead of string matching (adjust the regexp as needed):
awk -v s='-+- signature -+-' '
/^%/{if(f) print s; f=1}
/^-\+- signature/{f=0} 1;
END{if(f) print s}' ip.txt
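For comparison, the flag logic of the awk answer can be sketched in Python. This is only a minimal sketch and not part of the original answers: the function name `add_missing_signatures` and its parameters are made up for illustration, and it assumes exact line matches on `%` and on the signature text, like the first awk version.

```python
SIGNATURE = "-+- signature -+-"  # signature text taken from the question's example


def add_missing_signatures(lines, delimiter="%", signature=SIGNATURE):
    """Return a copy of lines with a signature added to every block lacking one.

    Hypothetical helper: a block starts at a delimiter line and is considered
    signed once a line equal to the signature has been seen.
    """
    out = []
    open_block = False  # True while inside a block with no signature seen yet
    for line in lines:
        if line == delimiter:
            if open_block:
                # The previous block ended without a signature: add it first.
                out.append(signature)
            open_block = True
        elif line == signature:
            open_block = False
        out.append(line)
    if open_block:
        # The final block had no signature (the awk END rule).
        out.append(signature)
    return out
```

Called on the lines of the full example from the question, it should return the expected "must become" listing; blocks that already carry a signature are left untouched.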

This might work for you (GNU sed):
sed '1{x;s/^/-+- dummy sig -+-/;x};/^%/{:a;${G;b};n;/^%/{x;p;x;ba};/^-+-/!ba}' file
On the first line set up a dummy signature in the hold space for later use.
If a line begins with %, keep printing lines until either another % is found, in which case a dummy signature is inserted and the process repeats, or a line beginning with -+- is found, in which case processing of the leading % ends.
The solution may be altered to use the previous signature, like so:
sed -e '1{x;s/^/-+- dummy sig -+-/;x};/^%/{:a;${G;b};n;/^%/{x;p;x;ba};/^-+-/!ba;h}' file
N.B. If the last line is reached while processing the text that follows a %, the dummy signature is appended.

Related

Can Sed match matching brackets?

My code has a ton of occurrences of something like:
idof(some_object)
I want to replace them with:
some_object["id"]
It sounds simple:
sed -i 's/idof(\([^)]\+\))/\1["id"]/g' source.py
The problem is that some_object might be something like idof(get_some_object()), or idof(my_class().get_some_object()), in which case, instead of getting what I want (get_some_object()["id"] or my_class().get_some_object()["id"]), I get get_some_object(["id"]) or my_class(["id"].get_some_object()).
Is there a way to have sed match closing bracket, so that it internally keeps track of any opening/closing brackets inside my (), and ignores those?
It needs to keep everything that's between those brackets: idof(ANYTHING) becomes ANYTHING["id"].
Using sed
$ sed -E 's/idof\(([[:alpha:][:punct:]]*)\)/\1["id"]/g' input_file
Using ERE, exclude idof and the first opening parenthesis.
As a literal closing parenthesis is also excluded, everything in-between the capture parenthesis including additional parenthesis will be captured.
[[:alpha:]] will match all alphabetic characters including upper and lower case while [[:punct:]] will capture punctuation characters including ().-{} and more.
The g option will make the substitution as many times as the pattern is found.
Theoretically, you can write a regex that handles all combinations of idof(....) up to some limit of nested () calls inside the ..... Such a regex would have to list every possible combination of calls. For example, idof(one(two(three))) can be matched with idof([^()]*([^()]*([^()]*)[^()]*)[^()]*), and idof(one(two(three)four(five)) with idof([^()]*([^()]*([^()]*)[^()]*([^()]*)[^()]*).
The following regex handles only some cases, but shows the complexity and general path. Writing a regex to handle all possible cases to "eat" everything in front of the trailing ) is left to OP as an exercise why it's better to use something else. Note that handling string literals ")" becomes increasingly complex.
The following Bash code:
sed '
: begin
# No idof? Just print the line!
/^\(.*\)idof(\([^)]*)\)/!n
# Note: regex is greedy - we start from the back!
# Note: using newline as a stack separator.
s//\1\n\2/
# hold the front
{ h ; x ; s/\n.*// ; x ; s/[^\n]*\n// ; }
: handle_brackets
# Eat everything before final ) up to some number of nested ((())) calls.
# Insert more jokes here.
: eat_brackets
/^[^()]*\(([^()]*\(([^()]*\(([^()]*\(([^()]*\(([^()]*\(([^()]*)\)\?[^()]*)\)\?[^()]*)\)\?[^()]*)\)\?[^()]*)\)\?[^()]*)\)/{
s//&\n/
# Hold the front.
{ H ; x ; s/\n\([^\n]*\)\n.*/\1/ ; x ; s/[^\n]*\n// ; }
b eat_brackets
}
/^\([^()]*\))/!{
s/^/ERROR: eating brackets did not work: /
q1
}
# Add the id after trailing ) and remove it.
s//\1["id"]/
# Join with hold space and clear the hold space for next round
{ H ; s/.*// ; x ; s/\n//g ; }
# Restart for another idof if in input.
b begin
' <<EOF
before idof(some_object) after
before idof(get_some_object()) after
before idof(my_class().get_some_object()) after
before idof(one(two(three)four)five) after
before idof(one(two(three)four)five) between idof(one(two(three)four)five) after
before idof( one(two(three)four)five one(two(three)four)five ) after
before idof(one(two(three(four)five)six(seven(eight)nine)ten) between idof(one(two(three(four)five)six(seven(eight)nine)ten) after
EOF
Will output:
before some_object["id"] after
before get_some_object()["id"] after
before my_class().get_some_object()["id"] after
before one(two(three)four)five["id"] after
before one(two(three)four)five["id"] between one(two(three)four)five["id"] after
before one(two(three)four)five one(two(three)four)five ["id"] after
ERROR: eating brackets did not work: one(two(three(four)five)six(seven(eight)nine)ten) after
The last line is not handled correctly, because (()()) case is not correctly handled. One would have to write a regex to match it.
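As an illustration of the "use something else" route, here is a minimal Python sketch that tracks bracket depth with a counter instead of a regex. It is not from the original answers; the function name `replace_idof` and its parameters are hypothetical. It assumes balanced parentheses, does not understand string literals that contain parentheses, and would also match a longer identifier that merely ends in `idof`.

```python
def replace_idof(text, name="idof", suffix='["id"]'):
    """Rewrite name(ARG) as ARG + suffix, where ARG may contain nested calls.

    Hypothetical helper: scans for the matching closing parenthesis by
    counting nesting depth rather than using a regular expression.
    """
    marker = name + "("
    out = []
    i = 0
    while i < len(text):
        start = text.find(marker, i)
        if start == -1:
            out.append(text[i:])  # no further calls: copy the rest
            break
        out.append(text[i:start])
        depth = 1
        j = start + len(marker)
        while j < len(text) and depth:
            if text[j] == "(":
                depth += 1
            elif text[j] == ")":
                depth -= 1
            j += 1
        if depth:
            # No matching close found: leave the rest of the text unchanged.
            out.append(text[start:])
            break
        inner = text[start + len(marker):j - 1]
        # Recurse so that nested idof(...) calls inside ARG are rewritten too.
        out.append(replace_idof(inner, name, suffix) + suffix)
        i = j
    return "".join(out)
```

Because the depth counter has no fixed nesting limit, sibling groups such as `a(b)c(d)` inside the argument need no special case.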

Using matlab to extract data in between text

I am trying to export a set of data and I am pretty new to this. The data in question has this structure:
# ************************************
# ***** GLOBAL ATTRIBUTES ******
# ************************************
#
# PROJECT THEMIS
#
UT UT BX_FGL-D BY_FGL-D BZ_FGL-D
(#_1_) (#_2_) (#_3_)
dd-mm-yyyy hh:mm:ss.mil.mic.nan.pic sec nT_GSE nT_GSE nT_GSE
21-05-2015 00:00:00.223.693.846.740 1.43208E+09 1.14132 9.14226 27.1446
21-05-2015 00:00:00.473.693.845.716 1.43208E+09 1.11194 9.16192 27.1798
21-05-2015 00:00:00.723.693.844.692 1.43208E+09 1.12992 9.11103 27.1595
21-05-2015 00:00:00.973.693.843.668 1.43208E+09 1.15966 9.15324 27.1589
21-05-2015 00:00:01.223.693.846.740 1.43208E+09 1.20576 9.14420 27.1388
21-05-2015 00:09:59.973.693.843.668 1.43208E+09 1.97445 8.66407 26.1837
#
# Key Parameter and Survey data (labels K0,K1,K2) are preliminary browse data.
# Generated by CDAWeb on: Mon May 27 06:01:29 2019
I need the lines between “dd-mm-yyyy….” and “# # Key Parameter” to be exported to columns.
E.g., the first line 21-05-2015 00:00:00.223.693.846.740 1.43208E+09 1.14132 9.14226 27.1446 has to be exported as 21, 05, 2015, 00, 00, 00, 223, 693, 846, 740, 1.43208E+09, 1.14132, 9.14226 and 27.1446.
A similar question is tackled at Use MATLAB to extract data beyond "Data starts on next line:" in text-file, but I believe my data is more complicated and I could not get any further. The best I could do was to write some code that reads up to “dd-mm-yyyy”:
clear;clc;close all;
f = fopen('dataa_file.txt');
line = fgetl(f);
while isempty(strfind(line, 'nT_GSE'))
if line == -1 %// If we reach the end of the file, get out
break;
end
line = fgetl(f);
end
Any help will be deeply appreciated…
This seems to work. It assumes that
The first line that contains numbers is that immediately after the line that begins with 'dd-mm-yyyy'.
The last line that contains numbers is two lines above the line that begins with '# Key Parameter'.
Code:
t = fileread('file.txt'); % Read the file as a character vector
t = strsplit(t, {'\r' '\n'}, 'CollapseDelimiters', true); % Split on newline or carriage
% return. This gives a cell array with each line in a cell
ind_start = find(cellfun(@any, regexp(t, '^dd-mm-yyyy', 'once')), 1) + 1; % index of
% line where the numbers begin: immediately after the line 'dd-mm-yyyy...'
ind_end = find(cellfun(@any, regexp(t, '^# Key Parameter', 'once')), 1) - 2; % index of
% line where numbers end: two lines before the line '# Key Parameter...'
result = cellfun(@(u) sscanf(u, '%d-%d-%d %02d:%02d:%02d.%d.%d.%d.%d %f %f %f %f').', ...
t(ind_start:ind_end), 'UniformOutput', false);
% Apply sscanf to each line. The format specifier uses %d where needed to prevent
% the dot from being interpreted as part of a floating point number. Also, the
% possible existence of leading zeros needs to be taken into account. The result is
% a cell array, where each cell contains a numeric vector corresponding to one line
result = cell2mat(result.'); % convert the result to a numerical array
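The same splitting can be sketched outside MATLAB. The Python below is a rough illustration only, not the answer's method: the function name `parse_data_lines` is hypothetical, and it assumes that data rows start right after the line beginning with `dd-mm-yyyy` and end at the first following line that starts with `#`.

```python
import re


def parse_data_lines(text):
    """Return one list of 14 numbers per data row of the file contents.

    Hypothetical helper. The first 10 values (date and time parts) are
    ints, the remaining 4 are floats.
    """
    rows = []
    in_data = False
    for line in text.splitlines():
        if line.startswith("dd-mm-yyyy"):
            in_data = True  # data rows begin on the next line
            continue
        if not in_data or not line.strip():
            continue
        if line.startswith("#"):
            break  # assumed end of the data section
        date_part, time_part, *values = line.split()
        # Split the timestamp on '-', ':' and '.' so the dots are never
        # read as decimal points.
        stamp = re.split(r"[-:.]", date_part + "." + time_part)
        rows.append([int(p) for p in stamp] + [float(v) for v in values])
    return rows
```

Leading zeros such as `05` are handled by `int`, so no special format specifier is needed for them.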

use sed to change a text report to csv

I have a report looks like this:
par_a
.xx
.yy
par_b
.zz
.tt
I wish to convert this format into csv format as below using sed 1 liner:
par_a,.xx
par_a,.yy
par_b,.zz
par_b,.tt
please help.
With awk:
awk '/^par_/{v=$0;next}/^ /{$0=v","$1;print}' File
Or to make it more generic:
awk '/^[^[:blank:]]/{v=$0;next} /^[[:blank:]]/{$0=v","$1;print}' File
When a line starts with par_, save the content to variable v. Now, when a line starts with space, change the line to content of v followed by , followed by the first field.
Output:
AMD$ awk '/^par_/{v=$0}/^ /{$0=v","$1;print}' File
par_a,.xx
par_a,.yy
par_b,.zz
par_b,.tt
With sed:
sed '/^par_/ { h; d; }; G; s/^[[:space:]]*//; s/\(.*\)\n\(.*\)/\2,\1/' filename
This works as follows:
/^par_/ { # if a new paragraph begins
h # remember it
d # but don't print anything yet
}
# otherwise:
G # fetch the remembered paragraph line to the pattern space
s/^[[:space:]]*// # remove leading whitespace
s/\(.*\)\n\(.*\)/\2,\1/ # rearrange to desired CSV format
Depending on your actual input data, you may want to replace the /^par_/ with, say, /^[^[:space:]]/. It just has to be a pattern that recognizes the beginning line of a paragraph.
Addendum: Shorter version that avoids regex repetition when using the space pattern to recognize paragraphs:
sed -r '/^\s+/! { h; d; }; s///; G; s/(.*)\n(.*)/\2,\1/' filename
Or, if you have to use BSD sed (as comes with Mac OS X):
sed '/^[[:space:]]\{1,\}/! { h; d; }; s///; G; s/\(.*\)\n\(.*\)/\2,\1/' filename
The latter should be portable to all seds, but as you can see, writing portable sed involves some pain.
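For readers who find the hold-space juggling hard to follow, the same "remember the paragraph line, prefix it to each indented line" logic can be sketched in Python. This is an illustrative sketch only: `report_to_csv` is a made-up name, and it assumes, like the generic awk and sed variants, that member lines start with whitespace and paragraph lines do not.

```python
def report_to_csv(lines):
    """Return 'paragraph,member' rows for a two-level indented report.

    Hypothetical helper mirroring the awk answer: an unindented line is
    remembered, and each indented line is emitted with it as a prefix.
    """
    rows = []
    current = None  # most recent paragraph line, like the awk variable v
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if line[0].isspace():
            if current is not None:
                # Like $1 in awk: use the first whitespace-separated field.
                rows.append(f"{current},{line.split()[0]}")
        else:
            current = line.rstrip()
    return rows
```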

How can I use sed to convert $$ blah $$ in TeX to \begin{equation} blah \end{equation}

I have files with entries of the form:
$$
y = x^2
$$
I'm looking for a way (specifically using sed) to convert them to:
\begin{equation}
y = x^2
\end{equation}
The solution should not rely on the form of the equation (which may also span multiple lines) nor on the text preceding the opening $$ or following the closing $$.
Thanks for the help.
sed '
/^\$\$$/ {
x
s/begin/&/
t use_end_tag
s/^.*$/\\begin{equation}/
h
b
: use_end_tag
s/^.*$/\\end{equation}/
h
}
'
Explanation:
sed maintains two buffers: the pattern space (pspace) and the hold space (hspace). It operates in cycles, where during each cycle it reads a line and executes the script for that line. pspace is usually auto-printed at the end of each cycle (unless the -n option is used), and then deleted before the next cycle. hspace holds its contents between cycles.
The idea of the script is that whenever $$ is seen, hspace is first checked to see if it contains the word "begin". If it does, then substitute the end tag; otherwise substitute the begin tag. In either case, store the substituted tag in the hold space so it can be checked next time.
sed '
/^\$\$$/ { # if line contains only $$
x # exchange pspace and hspace
s/begin/&/ # see if "begin" was in hspace
t use_end_tag # if it was, goto use_end_tag
s/^.*$/\\begin{equation}/ # replace pspace with \begin{equation}
h # set hspace to contents of pspace
b # start next cycle after auto-printing
: use_end_tag
s/^.*$/\\end{equation}/ # replace pspace with \end{equation}
h # set hspace to contents of pspace
}
'
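The toggle that the script keeps in the hold space can also be written as a plain boolean. The Python below is a minimal sketch under that reading, not a replacement for the sed answer: `convert_display_math` is a hypothetical name, and like the sed pattern it only treats lines consisting of exactly `$$` as markers.

```python
def convert_display_math(lines, marker="$$"):
    """Replace alternating marker lines with \\begin / \\end{equation}.

    Hypothetical helper: 'inside' plays the role of the word "begin"
    stored in sed's hold space.
    """
    out = []
    inside = False  # True after an opening marker has been replaced
    for line in lines:
        if line == marker:
            out.append("\\end{equation}" if inside else "\\begin{equation}")
            inside = not inside
        else:
            out.append(line)
    return out
```

Because the flag flips on every marker, several equations in one file are handled in turn.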
This might work for you (GNU sed):
sed -r '1{x;s/^/\\begin{equation}\n\\end{equation}/;x};/\$\$/{g;P;s/(.*)\n(.*)/\2\n\1/;h;d}' file
Prime the hold space with the required strings. On encountering the marker print the first line and then swap the strings in anticipation of the next marker.
I can not help you with sed, but this awk should do:
awk '/\$\$/ && !f {$0="\\begin{equation}";f=1} /\$\$/ && f {$0="\\end{equation}";f=0}1' file
\begin{equation}
y = x^2
\end{equation}
The f=0 is not needed if the $$ pair is not repeated.

Matlab - how to remove a line break when printing to screen?

For example, I have a long computation that runs for about 2 hours, and I need to see how much is left before it finishes. I print progress to the screen from the outer loop, but there are many values, around 70,000.
Question: how do I remove the line break when printing to the screen, so that I do not get 70,000 lines and instead see only the current value on one line?
Instead of using disp to display text to the screen, use fprintf, which requires you to enter line breaks manually.
Compare
>> disp('Hello, '), disp('World')
Hello,
World
with
>> fprintf('Hello, '), fprintf('World\n')
Hello, World
The \n at the end of 'World\n' signifies a line break (or newline as they're commonly called).
Try this function, which you can use in place of disp for a string argument. It displays to the command window, and remembers the message it has displayed. When you call it the next time, it first deletes the previous output from the command window (using ASCII backspace characters), then prints the new message.
In this way you only get to see the last message, and the command window doesn't fill up with old messages.
function teleprompt(s)
%TELEPROMPT prints to the command window, over-writing the last message
%
% TELEPROMPT(S)
% TELEPROMPT() % Terminate
%
% Input S is a string.
persistent lastMsg
if isempty(lastMsg)
lastMsg = '';
end
if nargin == 0
lastMsg = [];
fprintf('\n');
return
end
fprintf(repmat('\b', 1, numel(sprintf(lastMsg))));
fprintf(s);
lastMsg = s;
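The overwrite idea is not specific to MATLAB. As a rough sketch in Python (not part of the original answer), the example below uses a carriage return instead of backspace characters and pads with spaces so that a shorter message fully covers a longer one; the names `render_update` and `show_progress` are hypothetical, and the approach assumes a terminal that honors `\r`.

```python
import sys


def render_update(message, previous_length):
    """Return the text that overwrites a previous message of given length."""
    # Pad with spaces so leftovers of a longer previous message are erased.
    padding = " " * max(0, previous_length - len(message))
    return "\r" + message + padding


def show_progress(total, stream=sys.stdout):
    """Print 'i/total' on a single line, overwriting the previous value."""
    previous_length = 0
    for i in range(1, total + 1):
        message = f"{i}/{total}"
        stream.write(render_update(message, previous_length))
        stream.flush()
        previous_length = len(message)
    stream.write("\n")  # finish the line, like calling TELEPROMPT() with no input
```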