A way to append the beginning of every line before a pattern to the end of each same line? - sed

I am trying to copy the beginning of every line in a text file before a certain character to the end of the same line.
I've tried duplicating each line onto the end of itself and then deleting everything after the character, but I haven't been able to figure out how to skip the first instance of the character. As a result, the duplicated text gets deleted along with everything beyond the first instance of the character.
I've tried things like
sed '/S/ s/$/ append text/' sample.txt > cleaned.txt
but this only adds a fixed text. I've also tried using:
s/\(.*\)/\1 \1/
to duplicate the line, and then deleting everything after the S, but I can't figure out how to get it to go to the 3rd S not the 1st to start deleting.
What I have to start with:
dog 50_50_S5_Scale
cat 10_RV_17_S76_Scale
mouse 15_EQ_S81_Scale
What I'm trying to get:
dog 50_50_S5_Scale dog 50_50_
cat 10_RV_17_S76_Scale cat 10_RV_17_
mouse 15_EQ_S81_Scale mouse 15_EQ_
Where everything before the first S gets copied to the end of the line.

You may use
sed 's/\([^S]*\)S.*/& \1/' file
Details
\([^S]*\) - Capturing group 1 (\1): any 0+ chars other than S
S.* - S and the rest of the string (actually, line, since sed processes line by line by default).
The replacement is the concatenation of the whole match (&), space and Group 1 value.
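As a quick check, here is a minimal sketch that runs the command above on sample lines shaped like the question's expected output. The variable names input and result are only illustrative.

```shell
# Sample lines in the shape of the question's expected output (illustrative).
input='dog 50_50_S5_Scale
cat 10_RV_17_S76_Scale
mouse 15_EQ_S81_Scale'

# [^S]* cannot cross an S, so group 1 ends right before the first S.
# & re-emits the whole match (here the whole line), followed by a space and group 1.
result=$(printf '%s\n' "$input" | sed 's/\([^S]*\)S.*/& \1/')
printf '%s\n' "$result"
```

Lines that contain no S at all do not match and are passed through unchanged.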

You could try:
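awk '{print $0 " " substr($0, 1, index($0,"S") - 1)}' file
We take the substring from the first character up to, but not including, the first occurrence of "S". awk string positions start at 1, so substr should start at 1; a start of 0 is not portable and, in gawk for example, drops the last character of the prefix.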
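A small sketch of the awk approach on a single sample line, assuming 1-based string positions (substr starting at 1). The names line and result are illustrative.

```shell
line='dog 50_50_S5_Scale'

# index() returns the 1-based position of the first S (0 if there is none),
# so index - 1 is the length of the prefix in front of it.
result=$(printf '%s\n' "$line" |
  awk '{ print $0 " " substr($0, 1, index($0, "S") - 1) }')
printf '%s\n' "$result"
```

When a line has no S, index returns 0, the substring is empty, and only a trailing space is appended.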

Related

Replace block of text inside a file using the contents of another file using sed

I am looking to replace a block of text that is between markers with the contents of another file.
I came across this solution but it only works with one line
$ sed -n '/foo/{p;:a;N;/bar/!ba;s/.*\n/REPLACEMENT\n/};p' file
line 1
line 2
foo
REPLACEMENT
bar
line 6
line 7
I am trying to get the following working but it's not.
content=`cat file_content`
sed -n '/foo/{p;:a;N;/bar/!ba;s/.*\n/${content}\n/};p' file
output
line 1
line 2
foo
${content}
bar
line 6
line 7
How can I get ${content} to list the output of the file?
So I guess this should be a reasonably short way to replace the text between the foo and bar lines with the content of a file (called content_file in the commands below):
sed -e '/^foo$/,/^bar$/{/^bar$/{x;r content_file
D};d}' file
For the range of lines from one matching ^foo$ to one matching ^bar$: if the line matches ^bar$, x swaps the (empty) hold space into the pattern space, r queues the content of content_file for output at the end of the cycle, and D deletes the pattern space up to the first newline and starts the next cycle with the remainder of the pattern space (here there is none, so it behaves like d). For all other lines in that range, just drop the line (delete the pattern space and move on to the next line of input).
As for why your attempt fails: the shell takes any string enclosed in single quotes literally, without any expansion (including of variables). '${content}' means literally ${content}, and that is what gets passed to sed. In double-quoted text ("${content}") the shell would still expand the variable to its value before it becomes part of the sed arguments. Since the content could still trip up sed (for example if it contains a slash or a newline), I would opt for the r method as the more generic and robust one.
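The quoting difference can be seen in a minimal sketch like the one below. The variable names single and double are illustrative, and the replacement value is a plain word so that it cannot upset sed.

```shell
content='REPLACEMENT'

# Single quotes: the shell passes ${content} to sed untouched,
# and sed treats it as literal replacement text.
single=$(printf 'foo\n' | sed 's/foo/${content}/')

# Double quotes: the shell expands ${content} before sed sees the script.
double=$(printf 'foo\n' | sed "s/foo/${content}/")

printf '%s\n' "$single" "$double"
```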
EDIT: A version that keeps the start and end lines (since I misread the question):
sed -e '/^foo$/,/^bar$/{/^foo$/{r content_file
p};/^bar$/!d}' file
This time, for the range between lines matching ^foo$ and ^bar$: on the opening line matching ^foo$, r queues the content of content_file for output at the end of the cycle and p prints the line itself (needed because the delete that follows suppresses the automatic print). Every line in the range that does not match the closing pattern ^bar$ is then dropped.
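To try the variant that keeps the marker lines, a sketch like the following can be used. It works in a temporary directory, and the file contents are made up for illustration.

```shell
tmpdir=$(mktemp -d)

# Hypothetical input: a block between foo and bar that should be replaced.
printf '%s\n' 'line 1' 'foo' 'old a' 'old b' 'bar' 'line 6' > "$tmpdir/file"
printf '%s\n' 'new 1' 'new 2' > "$tmpdir/content_file"

# The file name after r runs to the end of the line, hence the line break.
result=$(cd "$tmpdir" && sed -e '/^foo$/,/^bar$/{/^foo$/{r content_file
p};/^bar$/!d}' file)

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

The old lines between the markers are dropped, and the new content appears directly after the foo line.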
This might work for you (GNU sed):
sed '/foo/!b;:a;$b;N;/bar/!ba;P;s/.*\n//;e cat contentFile' file
Print all lines until one containing foo.
If this is the last line, then there will never be a line containing bar so break out and do not insert the contentFile.
Otherwise, append the next line and check for it containing bar, if not repeat.
The pattern space should now contain both foo and bar so, print the first line (containing foo), remove all other lines other than the one containing bar, print the file contentFile and then print the last line of the collection containing bar.
N.B. This does not insert the contentFile unless both foo and bar exist in file. Also the e command will evaluate the cat contentFile immediately and insert the result into the output stream before printing the line containing bar, whereas the r command always prints to the output stream after the implicit print of the sed cycle.
An alternative:
sed -ne '/foo/{p;:a;n;/bar/!ba;e cat contentFile' -e '};p' file
However, if file has no line containing bar, this solution prints only the lines up to and including the one containing foo.
sed '/foo/,/bar/{//!d;/foo/s//&\n'"${content}"'/}' file
From foo to bar, delete lines not matching previous match //!d.
On foo line, replace match & with match followed by \n${content}

sed pattern matching, stop at first found character

Take the string "hello_world 1 2 3"
I want the output to be "hello_world"
My attempt is "s/\(.*\) .*/\1/g"
But I get "hello_world 1 2"
Instead of stopping at the first space after the sequence, it gets the last space on the line.
I want to take any length of characters \(.*\) followed by a space ' ' and remove anything that comes after it .*
How can I do it?
Could you please try following.
echo "hello_world 1 2 3" | sed 's/\([^ ]*\).*/\1/'
Explanation of above:
This uses sed's ability to store the text matched by a parenthesized part of the regex in a temporary buffer (a capture group), which can later be referenced as \1, \2 and so on (depending on how many groups you define).
Here we capture everything up to the first space into the 1st buffer and match the rest of the line with .*. In the replacement, \1 substitutes the whole line with the value stored in the 1st buffer (which is hello_world).
Why the OP's code is not working: .* is greedy, so \(.*\) captures as much as it can, up to the last space on the line. \1 therefore holds hello_world 1 2 instead of hello_world.
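The difference between the greedy attempt and the negated character class can be checked side by side. This is only a sketch, and the variable names greedy and first are illustrative.

```shell
line='hello_world 1 2 3'

# Greedy: \(.*\) runs to the LAST space that still lets the rest match.
greedy=$(printf '%s\n' "$line" | sed 's/\(.*\) .*/\1/')

# Negated class: [^ ]* cannot cross a space, so it stops before the first one.
first=$(printf '%s\n' "$line" | sed 's/\([^ ]*\).*/\1/')

printf '%s\n' "$greedy" "$first"
```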
This might work for you (GNU sed):
sed 's/\s.*//' file
Matches the first whitespace character and everything after it and removes them, leaving whatever is in front of it, i.e. the leading non-whitespace characters.
Same as:
sed -E 's/^(\S+).*/\1/' file

I want to print the last line of group using sed

I have a file, shown below:
Section1
George, 1998-1995
Peter, 1999-1990
Simon, 1988-1960
Section2
Gery, 2019-2015
John, 1984-1983
Thomson, 1978-1965
When I give Section1, the expected output is
Simon, 1988-1960
I have lots of sections like this. I want to achieve this with sed, not awk.
I tried the following, but it hard-codes the line number and also prints the complete range:
sed -n '/Section1/,4{p}'
Here I was able to remove the hard-coding, but it still does not print only the last line, and the next section name is printed as well.
sed -n '/Section1/ , /Section./{p}'
This might work for you (GNU sed):
sed '$b;N;/\nSection/P;D' file
Make a moving window of two lines and print the first line if the second line begins with Section, and always print the last line.
For the last line of a specific section use:
sed -n '/^Section1/{:a;h;$!{n;/^\S/!ba};x;s/^\s*//p}' file
A GNU awk solution.
awk -v RS='Section' '$1=="1" {print $(NF-1),$NF}' file
Simon, 1988-1960
By setting the record separator (RS) to Section, awk works on one block at a time. It then prints the second-to-last and the last field of the block whose first field is 1, since the Section prefix itself has been stripped off.
You may consider using
sed -n '/^Section1$/,/^Section[0-9]*$/{:a;h;n;/^Section[0-9]*$/!ba;x;s/^[ \t]*//;p;q}' file > newfile
Details
-n - suppresses the default output of each line
/^Section1$/,/^Section[0-9]*$/ - a block of lines between a line that is equal to Section1 and a line that fully matches a Section and any 0 or more digits pattern (the next {...} group of commands relates to the range matched with this)
:a - sets a label named a
h - copies the current line into hold buffer
n - discards the current pattern space value and reads the next line into it
/^Section[0-9]*$/!ba - if the pattern space value does not match the end block line go back to label a
x - else, once we get to the next section header, the previous line is in hold space, so x is used to swap hold and pattern space
s/^[ \t]*// - remove initial whitespace
p - print the pattern space
q - quit. The closing header was consumed inside the loop, so without q the range is still open when the next cycle starts and the last line of each following section would be printed as well.
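A sketch of this approach on hypothetical indented sample data with three sections. It assumes GNU sed and ends the command block with q, so that processing stops once the requested section has been handled.

```shell
# Hypothetical input: section headers at column 0, entries indented.
input='Section1
  George, 1998-1995
  Peter, 1999-1990
  Simon, 1988-1960
Section2
  Gery, 2019-2015
  Thomson, 1978-1965
Section3
  Alice, 2001-2000'

# h keeps the most recent line; when n reads the next header, x brings the
# saved line back, s strips its indentation, p prints it and q stops sed.
result=$(printf '%s\n' "$input" |
  sed -n '/^Section1$/,/^Section[0-9]*$/{:a;h;n;/^Section[0-9]*$/!ba;x;s/^[ \t]*//;p;q}')
printf '%s\n' "$result"
```

Note that if the requested section is the last one in the file, n reaches the end of input inside the loop and nothing is printed.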
Regex:
(Section1)((\n.*,.*)*\n\s*)(?'lastLine'.*)
I did not understand exactly what you want to do with the result, so I cannot tell you the exact sed command.

SED: Operate on Last seven lines regardless of file length

I would like to operate on the last 7 lines of a file with sed regardless of the filelength.
According to a related question this type of range won't work: $-6,$ {..commands..}
What is the equivalent that will?
Pipe the output of tail -n 7 into sed.
tail -n 7 test.txt | sed -e "s/e/WWW/"
You could just switch from sed(1) to ed(1), the commands are about the same. In this case, the command is the same, except with no limitations on address range.
$ cat > fl7.ed
ed - $1 << \eof
1,7s/$/ (one of the first seven lines)/
$-6,$s/$/ (one of the last seven lines)/
w
q
eof
$ sh fl7.ed yourfile
perl -lne 'END{print join$\,@a,"-",@b}push@a,$_ if@a<6;push@b,$_;shift@b if@b>7'
In the END{} block you can do whatever is required; @a contains the first 6, @b the last 7 lines as requested.
This should work for you:
sed '1{N;N;N;N;N};N;$s/foo/bar/g;P;D' inputfile
Explanation:
1{N;N;N;N;N} - when the first line is read, load the pattern space with five more lines (total: 6 at this point)
N - append another line
$s/foo/bar/g - when the last line is read, perform some operation on the entire contents of pattern space (the last seven lines of the file). Operations can be more complex than shown here
P - print the text before the first newline in pattern space
D - delete the text just printed and loop to the beginning of the script (the "append another line" step - the first instruction is skipped since it only applies to the first line in the file)
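To watch the seven-line window at work, here is a sketch that feeds ten numbered lines through the same script, swapping in a substitution that wraps every number in angle brackets. It assumes GNU sed, which prints the pattern space when N runs out of input, and the variable name result is illustrative.

```shell
# Ten input lines: 1 .. 10. Only the last seven (4 .. 10) should be changed.
result=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10 |
  sed '1{N;N;N;N;N};N;$s/[0-9][0-9]*/<&>/g;P;D')
printf '%s\n' "$result"
```

Lines 1 to 3 leave the window through P before the last line is read, so only lines 4 to 10 are in the pattern space when the $ address fires.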
This might work for you:
sed ':a;1,6{$!N;ba};${s/foo/bar/g;q};N;D' file
Explanation:
Create a loop label. :a
Gather lines 1 to 6 in the pattern space (PS). 1,6{$!N;ba}
If it's the last line, process the PS and quit, therefore printing out the last seven lines. ${s/foo/bar/g;q}
If it's not the last line, append the next line to the PS. N
Delete up to the first newline and begin a new cycle without reading a new line. D

Very basic replace using sed

Really would appreciate help on this.
I am using sed to create a CSV file. Essentially multiple html files are all merged to a single html file and sed is then used to remove all the junk pictures etc to get to the raw columnar data.
I have all this working but am stuck on the last bit.
What I want to do is very basic - I want to replace the following lines:
"a variable string"
"end td"
"begin td"
with a single line:
"a variable string"
(with a tab character at the end of this line)
I'M USING DOS.
As you see I'm new to all this. If I could get this working would save me a lot of time in the future so would appreciate the help.
At the moment I have to inject some html headers back into the text file, open it in a html editor, select the table and then paste this into a spreadsheet which is a bit of pain.
P.S. is there an easy way to get sed to remove the parenthesis '(' and ')' from a given line?
I doubt that this is what you really want, but it's what you asked for.
sed "s/\"a variable string\"/&\t/; s/\"end td\"//; s/\"begin td\"//" inputfile
What you probably want to do is replace them when they appear consecutively. Here's how you might do that:
sed "1{N;N}; /\"a variable string\"\n\"end td\"\n\"begin td\"/ s/\n.*$/\t/;ta;bb;:a;N;N;:b;$!P;N;D" inputfile
This will remove all parentheses in a file:
sed "s/[()]//g" inputfile
To select particular lines, you could do something like this:
sed "/foo/ s/[()]//g" inputfile
which will only make the replacement if the word "foo" is somewhere on a line.
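A short sketch of the address-restricted version on two made-up lines, showing that only the line containing foo loses its parentheses:

```shell
input='foo (a) (b)
bar (c)'

# The /foo/ address limits the substitution to lines containing foo.
result=$(printf '%s\n' "$input" | sed "/foo/ s/[()]//g")
printf '%s\n' "$result"
```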
Edit: Changed single quotes to double quotes to accommodate GNUWin32 and CMD.EXE.
A previous comment I left doesn't appear to have been saved - so will try again
The code to remove the ( and ) worked perfectly thanks
You are right: I was looking to merge the 3 lines into one line, so the second example you gave, where it looks like it's reading the next two lines into the pattern space, looks more promising. The output wasn't what I was expecting, however.
I now realize the code is going to have to be more complicated, and I don't want to trouble you any more: my manual method of injecting some html code back into the text file, opening it up in Openoffice and pasting into a spreadsheet only takes a few seconds, and I have a feeling that producing the sed code for this by hand would be a nightmare.
Essentially the rules for converting the html would need to be:
[each tag has been formatted so it appears on its own line]
I have given example of an input file and desired output file below for reference
1) if < tr > is followed by < td > on the next line completely remove the < tr > and < td > lines [i.e. do not output a carriage return] and on the NEXT line stick a " at the start of that line [it doesn't matter about a carriage return at the end of this line as it is going to be edited later]
2) if < /td > is followed by < td > completely remove both these two lines [again do not output a carriage return after these lines] and on the PREVIOUS line output a ", [do not output a carriage return] and on the NEXT line stick a " at the start of the line [don't worry about the ending carriage return, it will be edited later]
3) if < /td > is followed by < /tr > delete both of these lines and on the previous line add a " to the end of the line and a final carriage return.
I have given an example of what the input and desired output would be:
input: http://medinfo.redirectme.net/input.txt
[the wanted file will be posted in the next message - this board will not allow new users to post a message with more than one hyperlink!]
there is an added issue that the address column is on multiple lines on the input file - this could be reduced to one line by looking to see if the first character of the NEXT line is a " If it isn't then do not output the carriage return at the end of the current line
Phew that was a nightmare just to type out never mind actually code. But thanks again for all your help in getting this far!
:-)