Use sed to remove a block of duplicated text

The data looks like this:
Para1
X12Y1
AABBAABABA
BBAAABABAB
Para2
X13Y2
ABABAABAAB
ABABABABAA
Para3
X13Y2
BBBABABABA
BABABABABA
Para4
X12Y1
BBBABABABA
BABABABABA
Para5
X20Y9
BBBABABABA
BABABABABA
How can I remove Para3 and Para4, based on the rule that their keys X13Y2 and X12Y1 are duplicates of keys seen earlier?
Desired output as below:
Para1
X12Y1
AABBAABABA
BBAAABABAB
Para2
X13Y2
ABABAABAAB
ABABABABAA
Para5
X20Y9
BBBABABABA
BABABABABA

awk solution:
awk '/Para/{ p=$0 }/^X[0-9]/ && !a[$0]++{ rn=NR+2; printf "%s\n%s\n",p,$0; next }NR<=rn' file
The output:
Para1
X12Y1
AABBAABABA
BBAAABABAB
Para2
X13Y2
ABABAABAAB
ABABABABAA
Para5
X20Y9
BBBABABABA
BABABABABA
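The one-liner is easier to follow when spread over several lines. The following is only a sketch of the same logic, assuming (as in the sample data) that every block is a Para line, a key line and exactly two data lines; the function name dedupe_blocks and the variable names are made up for illustration.

```shell
# dedupe_blocks: hypothetical helper wrapping the awk logic shown above
dedupe_blocks() {
  awk '
    /^Para/ { header = $0 }          # remember the most recent ParaN line
    /^X[0-9]/ && !seen[$0]++ {       # key line that has not been seen before
      last = NR + 2                  # the next two lines belong to this block
      print header                   # emit the remembered header
      print                          # and the key line itself
      next
    }
    NR <= last                       # print the data lines of a kept block
  ' "$@"
}

# small demonstration: the second block repeats the key X1Y1 and is dropped
printf 'Para1\nX1Y1\nAA\nBB\nPara2\nX1Y1\nCC\nDD\n' | dedupe_blocks
```

With a file on disk it would be called as dedupe_blocks file.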

This might work for you (GNU sed):
sed -r '/^Para/{N;H;N;N;x;s/^(.*)\n.*(\n.*)$/\2\1/;/^(\n[^\n]*)(\n.*)*(\1)/{x;d};x}' file
On matching a line beginning with Para, append the next line (the index, line 2 of the block) to the pattern space (PS) and append both lines to the hold space (HS); then append the following 2 lines to the PS. Swap to the HS and move the index just added to the front (separated by a newline), dropping its Para line. Check whether that index is already present in the HS: if so, swap back to the PS and delete that entry; otherwise swap back to the PS and print that entry.

Related

Replace file entries in subdirectories with sed from list

I want to replace a certain line (#6) in a whole bunch of documents that looks like this:
N Metal1 Metal2 Metal3 Metal4
where the metals need to be replaced with chemical symbols from a list of permutations:
CrHfMoNb CrHfMoTa CrHfMoTi CrHfMoV CrHfMoW CrHfMoZr CrHfNbTa CrHfNbTi CrHfNbV CrHfNbW CrHfNbZr CrHfTaTi CrHfTaV CrHfTaW CrHfTaZr CrHfTiV CrHfTiW CrHfTiZr CrHfVW CrHfVZr CrHfWZr CrMoNbTa CrMoNbTi CrMoNbV CrMoNbW CrMoNbZr CrMoTaTi CrMoTaV CrMoTaW CrMoTaZr CrMoTiV CrMoTiW CrMoTiZr CrMoVW CrMoVZr CrMoWZr CrNbTaTi CrNbTaV CrNbTaW CrNbTaZr CrNbTiV CrNbTiW CrNbTiZr CrNbVW CrNbVZr CrNbWZr CrTaTiV CrTaTiW CrTaTiZr CrTaVW CrTaVZr CrTaWZr CrTiVW CrTiVZr CrTiWZr CrVWZr HfMoNbTa HfMoNbTi HfMoNbV HfMoNbW HfMoNbZr HfMoTaTi HfMoTaV HfMoTaW HfMoTaZr HfMoTiV HfMoTiW HfMoTiZr HfMoVW HfMoVZr HfMoWZr HfNbTaTi HfNbTaV HfNbTaW HfNbTaZr HfNbTiV HfNbTiW HfNbTiZr HfNbVW HfNbVZr HfNbWZr HfTaTiV HfTaTiW HfTaTiZr HfTaVW HfTaVZr HfTaWZr HfTiVW HfTiVZr HfTiWZr HfVWZr MoNbTaTi MoNbTaV MoNbTaW MoNbTaZr MoNbTiV MoNbTiW MoNbTiZr MoNbVW MoNbVZr MoNbWZr MoTaTiV MoTaTiW MoTaTiZr MoTaVW MoTaVZr MoTaWZr MoTiVW MoTiVZr MoTiWZr MoVWZr NbTaTiV NbTaTiW NbTaTiZr NbTaVW NbTaVZr NbTaWZr NbTiVW NbTiVZr NbTiWZr NbVWZr TaTiVW TaTiVZr TaTiWZr TaVWZr TiVWZr
to make it look for example like this:
N Cr Hf Mo Nb
I can do this easily with the sed command using:
sed -i '6s/Metal1 Metal2 Metal3 Metal4/Cr Hf Mo Nb/' filename
The problem is that I need to do it automatically for all 126 combinations. Each file resides in its own subdirectory, one per composition, and has to be adjusted according to its own elements. The file always has the same name and is completely identical before this change.
The chemical symbols have to be listed alphabetically and there must be one space between each, or the code won't work. I assume this is difficult because all the used elements have two letters in their symbols except for V. Is there an efficient way to do this?

Handling the two-char (Zr) one-char (W) problem is easy. Each capital letter marks the beginning of a new element.
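cd dir/
for dir in *N; do
    # strip the trailing N from the directory name, then put a space before each capital letter
    split="$(sed 's/N$//;s/[A-Z]/ &/g' <<< "$dir")"
    # rewrite the placeholder line in that directory's file
    sed -i "s/N Metal1 Metal2 Metal3 Metal4/N $split/" "$dir/file"
done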
Note that $split starts with a space, so the replacement string is something like N  Cr Hf Mo Nb, with two spaces between N and Cr, just as you wanted.
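The capital-letter split can be tried on its own before any file is modified. This is a minimal sketch; split_elements is a hypothetical helper name, and unlike the loop above it strips the leading space so the result is easier to read.

```shell
# split_elements: hypothetical helper; prints the symbols of a name such as CrHfMoV
split_elements() {
  # put a space before every capital letter, then drop the leading space
  printf '%s\n' "$1" | sed 's/[A-Z]/ &/g; s/^ //'
}

split_elements CrHfMoV
```

One-letter symbols such as V or W need no special handling, because only capital letters start a new symbol.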

matlab: delimit .csv file where no specific delimiter is available

I wonder whether it is possible to read a .csv file that looks like this:
0,0530,0560,0730,....
90,15090,15290,157....
I should get:
0,053 0,056 0,073 0,...
90,150 90,152 90,157 90,...
When using dlmread(path, ''), MATLAB throws an error saying:
Mismatch between file and Format character vector.
Trouble reading 'Numeric' field from file (row 1, field number 2) ==> ,053 0,056 0,073 ...
I also tried using "0," as the delimiter, but MATLAB does not allow this.
Thanks,
jonnyx

str = importdata('file.csv', '');   % import the data as a cell array of char
for k = 1:length(str)               % loop until the last line
    str{k} = myfunc(str{k});        % apply the required operation
end
where
function new = myfunc(str)
    old = str(1:regexp(str, ',', 'once'));   % the characters up to the first comma
    % old is the pattern of the current line
    new = strrep(str, old, [' ', old]);      % add a space before that pattern
    new = new(2:end);                        % remove the space at the start
end
and file.csv :
0,0530,0560,073
90,15090,15290,157
Output:
>> str
str=
'0,053 0,056 0,073'
'90,150 90,152 90,157'

You can actually do this using textscan without any loops and using a few basic string manipulation functions:
fid = fopen('no_delim.csv', 'r');
C = textscan(fid, ['%[0123456789' 10 13 ']%[,]%3c'], 'EndOfLine', '');
fclose(fid);
C = strcat(C{:});
output = strtrim(strsplit(sprintf('%s ', C{:}), {'\n' '\r'})).';
And the output using your sample input file:
output =
2×1 cell array
'0,053 0,056 0,073'
'90,150 90,152 90,157'
How it works:
The format string specifies 3 items to read repeatedly from the file:
1. A string containing any number of characters from 0 through 9, newlines (ASCII code 10), or carriage returns (ASCII code 13).
2. A comma.
3. Three individual characters.
Each set of 3 items is concatenated, then all sets are printed to a string separated by spaces. The string is split at any newlines or carriage returns to create a cell array of strings, and any spaces on the ends are removed.

If you have access to a GNU / *NIX command line, I would suggest using sed to preprocess your data before feeding it into MATLAB. In this case the command would be: sed 's/,[0-9]\{3\}/& /g'
$ echo "90,15090,15290,157" | sed 's/,[0-9]\{3\}/& /g'
90,150 90,152 90,157
$ echo "0,0530,0560,0730,356" | sed 's/,[0-9]\{3\}/& /g'
0,053 0,056 0,073 0,356
You can also easily change the decimal commas to decimal points:
$ echo "0,053 0,056 0,073 0,356" | sed 's/,/./g'
0.053 0.056 0.073 0.356

sed: Delete first line of hold space?

How do I delete the first line of the hold space in sed?
I've tried
x;
s/.*\n//;
x;
But .*\n matches up to the last newline, deleting all the lines except for the last one.

This should remove the first line from the hold space:
x;s/[^\n]*\n//
Example (the output starts with an empty line, because H prepends a newline even on the first input line):
kent$ sed -n 'H;${x;p}' <(seq 3)

1
2
3
Remove the first empty line:
kent$ sed -n 'H;${x;s/[^\n]*\n//;p}' <(seq 3)
1
2
3

Simply put some string into the hold space with h first, e.g. 1h;1d; by default the hold space is empty.
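The difference between the greedy and the non-greedy pattern can be seen side by side. This is a small sketch that assumes GNU sed (matching a newline with [^\n] inside a bracket expression is a GNU extension); the two function names are hypothetical. Both collect all input in the hold space with 1h;1!H, so there is no leading empty line.

```shell
# greedy: .*\n matches up to the LAST newline, so only the last line survives
drop_greedy() {
  sed -n '1h;1!H;${x;s/.*\n//;p}'
}

# non-greedy: [^\n]*\n stops at the FIRST newline, so only line 1 is removed
drop_first_held_line() {
  sed -n '1h;1!H;${x;s/[^\n]*\n//;p}'
}

printf '1\n2\n3\n' | drop_greedy            # prints 3
printf '1\n2\n3\n' | drop_first_held_line   # prints 2 and 3
```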

replace two consecutive lines based on a pattern and repeat through out the file

I'm trying to replace two consecutive lines based on a pattern match, and would want this to repeat for the entire file. Here is the input file:
c aaaaa bbb
+ 0.1
c xxxx
c yyyy
+ 0.2
* c gggg
m eeeee hhhhh
+ 0.3
The command I tried is:
sed '/^c/{N;s/+/*+/}'
I expected a * to be prepended to each line beginning with +, but only on those lines immediately following a line that begins with c:
c aaaaa bbb
*+ 0.1
c xxxx
c yyyy
*+ 0.2
* c gggg
m eeeee hhhhh
+ 0.3
What I actually get:
c aaaaa bbb
*+ 0.1
c xxxx
c yyyy
+ 0.2
* c gggg
m eeeee hhhhh
+ 0.3
Here, I see that only the first occurrence of + (where the previous line begins with c) is replaced with *+. The second occurrence of + in the file is not replaced.
What am I doing wrong? How do I get the result I want, with the replacement happening for every such pair of consecutive lines in the file?

The problem you run into is that when a line that starts with c comes right after another line that starts with c, the N command in your code consumes it, so it is no longer available for checking when you process the line that comes next.
Instead of reading ahead to see if the next line should be changed, I'd remember the last line and look back to see if the current line should be changed:
sed 'x; G; /^c/ s/+/*+/; s/.*\n//' file
This works as follows:
x               # Swap pattern space and hold buffer. Because we do this here,
                # the previous line will be in the hold buffer for every line
                # (except the first, when it is empty)
G               # Append hold buffer to pattern space. Now the pattern space
                # contains the previous line followed by the current line.
/^c/ s/+/*+/    # If the pattern space begins with a c (i.e., if the previous
                # line began with a c), replace + with *+
s/.*\n//        # Remove the first line (the previous one) from the pattern
                # space
                # Then drop off the end. The changed current line is printed.
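One caveat with the look-back command: the unanchored s/+/*+/ rewrites the first + anywhere in the two-line pattern space, so a + inside the preceding c line would be changed (and then thrown away) instead of the + that starts the current line. A possible variant, sketched here for GNU sed (which accepts \n in the replacement text), ties the match to the newline between the two lines; the function name star_after_c is hypothetical.

```shell
# star_after_c: hypothetical helper; same look-back idea, but the + must
# directly follow the newline that separates previous and current line
star_after_c() {
  sed 'x; G; /^c/ s/\n+/\n*+/; s/.*\n//' "$@"
}

# the first c line contains a + of its own, which must be left alone
printf 'c a+b\n+ 0.1\nc x\nc y\n+ 0.2\nm e\n+ 0.3\n' | star_after_c
```

The last + line stays unchanged because it follows an m line, not a c line.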

sed -e 'H;$!d' -e 'x' -e ':cycle' -e 's/\(\nc[[:alnum:][:blank:][:punct:]]*\n\)+/\1*+/g;t cycle' -e 's/.//' YourFile
POSIX version, changing the whole file in at most 2 internal cycles:
Load the file into memory (-e 'H;$!d' -e 'x').
Add the * in front of a line starting with + that follows a line starting with c ( s/\(\nc[[:alnum:][:blank:][:punct:]]*\n\)+/\1*+/g).
Repeat the substitution in case a match was missed on the previous pass ( :cycle and t cycle).
Use a trick to ensure the buffer starts with a newline (H appends the current line to the buffer even for the first line, which gives an extra leading newline), so that a first line starting with c is handled too, and remove that newline at the end (s/.//).

How can I use sed to convert $$ blah $$ in TeX to \begin{equation} blah \end{equation}

I have files with entries of the form:
$$
y = x^2
$$
I'm looking for a way (specifically using sed) to convert them to:
\begin{equation}
y = x^2
\end{equation}
The solution should not rely on the form of the equation (which may also span multiple lines) nor on the text preceding the opening $$ or following the closing $$.
Thanks for the help.

sed '
  /^\$\$$/ {
    x
    s/begin/&/
    t use_end_tag
    s/^.*$/\\begin{equation}/
    h
    b
    : use_end_tag
    s/^.*$/\\end{equation}/
    h
  }
'
Explanation:
sed maintains two buffers: the pattern space (pspace) and the hold space (hspace). It operates in cycles, where during each cycle it reads a line and executes the script for that line. pspace is usually auto-printed at the end of each cycle (unless the -n option is used), and then deleted before the next cycle. hspace holds its contents between cycles.
The idea of the script is that whenever $$ is seen, hspace is first checked to see if it contains the word "begin". If it does, then substitute the end tag; otherwise substitute the begin tag. In either case, store the substituted tag in the hold space so it can be checked next time.
sed '
  /^\$\$$/ {                      # if line contains only $$
    x                             # exchange pspace and hspace
    s/begin/&/                    # see if "begin" was in hspace
    t use_end_tag                 # if it was, goto use_end_tag
    s/^.*$/\\begin{equation}/     # replace pspace with \begin{equation}
    h                             # set hspace to contents of pspace
    b                             # start next cycle after auto-printing
    : use_end_tag
    s/^.*$/\\end{equation}/       # replace pspace with \end{equation}
    h                             # set hspace to contents of pspace
  }
'
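The begin/end toggle that the hold space implements here can also be written with a single flag variable. The following is only an illustrative sketch in awk, not a replacement for the sed answer; the function name tex_dollars_to_equation and the variable inside are hypothetical. It rewrites only lines that consist of exactly $$.

```shell
# tex_dollars_to_equation: hypothetical helper; alternates between the
# begin tag and the end tag each time a line containing only $$ is seen
tex_dollars_to_equation() {
  awk '
    $0 == "$$" {
      if (inside) print "\\end{equation}"
      else        print "\\begin{equation}"
      inside = !inside               # flip the state for the next marker
      next
    }
    { print }                        # every other line passes through unchanged
  ' "$@"
}

printf 'text\n$$\ny = x^2\n$$\nmore text\n' | tex_dollars_to_equation
```

Because the flag is flipped on every marker, any number of equations in one file is handled, and the equation body may span several lines.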

This might work for you (GNU sed):
sed -r '1{x;s/^/\\begin{equation}\n\\end{equation}/;x};/\$\$/{g;P;s/(.*)\n(.*)/\2\n\1/;h;d}' file
Prime the hold space with the required strings. On encountering the marker print the first line and then swap the strings in anticipation of the next marker.

I can not help you with sed, but this awk should do:
awk '/\$\$/ && !f {$0="\\begin{equation}";f=1} /\$\$/ && f {$0="\\end{equation}";f=0}1' file
\begin{equation}
y = x^2
\end{equation}
The f=0 is not needed if the $$ pair does not occur more than once.
