Can someone please help me expand a sed command to extract the required data?

I must admit that sed at times seems like a bit of a black art to me. I found the following statement, which provides some of what I require. I am assuming that sed is my handiest option, as this will run in a bash script.
I have a file with lots of stuff, e.g.
LOTS_OF_OTHER_STUFF/STRING1\nOL8.0:2019-10-08-2/STRING2/LOTS_OF_OTHER_STUFF_HERE/STRING1\nOL8-slim:2019-10-08-20/STRING2/LOTS_OF_OTHER_STUFF
The statement I found is:
sed '/STRING1/!d;s//&\n/;s/.*\n//;:a;/STRING2/bb;$!{n;ba};:b;s//\n&/;P;D'
which gives:
\nOL8.0:2019-10-08-2/
\nOL8-slim:2019-10-08-20/
What I require is:
8.0 2
8-slim 20
Can anyone help?
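One possible approach, sketched under the assumption that every wanted tag has the shape OL<version>:<YYYY-MM-DD>-<build> as in the two samples: let grep -o isolate the tags, then let sed keep only the version and the build number. The function name extract_versions is hypothetical, and the pattern would need adjusting if real tags differ from the samples.

```shell
# Hypothetical helper: reads text on stdin and prints "<version> <build>"
# for every tag shaped like OL<version>:<YYYY-MM-DD>-<build>.
extract_versions() {
    # grep -o prints each matching tag on its own line
    grep -oE 'OL[^:/]+:[0-9]{4}-[0-9]{2}-[0-9]{2}-[0-9]+' |
        # keep the text between "OL" and the colon, plus the digits after the last hyphen
        sed -E 's/^OL([^:]+):.*-([0-9]+)$/\1 \2/'
}

printf '%s\n' 'LOTS_OF_OTHER_STUFF/STRING1\nOL8.0:2019-10-08-2/STRING2/LOTS_OF_OTHER_STUFF_HERE/STRING1\nOL8-slim:2019-10-08-20/STRING2/LOTS_OF_OTHER_STUFF' | extract_versions
# prints:
# 8.0 2
# 8-slim 20
```

Splitting the job in two keeps each regex short: grep decides what counts as a tag, and sed only has to rearrange a string that is already known to match.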

Related

Substitution/Replacement problem with a data file Perl, Sublime

My file consists of entries in the format \D{5}\d\d or \D{5}\d.
example:
aahed9aalii5aargh9abaca9abaci9aback13
The \d part may be 1 or 2 digits; there are no spaces or line breaks in the entire document.
The goal is to create a .csv file separating the \D{5} from the \d{1} or \d{2}.
I tried Sublime Text, Perl, TextEdit, and Pages.
In Sublime I understand how to find the (\D{5}) group but not how to replace it with (\D{5}),
I found the s/dog/cat/ substitution example but could not get it to translate to Perl or Sublime.
I also found this Perl command-line idea (may not be exact):
perl -pi.bak -e 's\/D{5}/D{5}\,/g' $filename
but could not decipher all the errors.
The reason I chose regex for this is the only commonality to each value is the length of the word is the same throughout the document. There are no tabs, no parens, no spaces, no fixed length fields nothing to get my hooks in.
The question:
How do I retain the original values in the replace/substitution function?
I realize what this board has to deal with in regard to duplicate questions. Do you realize, on my side, how difficult it is to search through all the previous questions when I am not sure what I am looking for?
I am not looking for someone to give me a fish; I am looking for someone to teach me how to fish.
If regex is not the answer, maybe I am missing something. Any guidance would be appreciated.
Thanks
The $1, $2, etc variables may be used to refer back to "captures" (parenthesized parts) within the most recent regexp.
echo aahed9aalii5aargh9abaca9abaci9aback13 | perl -pe 's/(\D{5})(\d*)/$1=$2,/g'
Outputs:
aahed=9,aalii=5,aargh=9,abaca=9,abaci=9,aback=13,

Using SED to remove a domain name starting with http from result

I am very new to shell scripting, the command line, sed, awk, etc so bear with me.
I have a script that outputs -
Reseller: iwantmyname http://iwantmyname.com
I want it to read -
Reseller: iwantmyname
Dropping anything starting with http
I figured SED would be a good tool but I only have a basic knowledge of it, and the tutorials I've found online seem advanced and difficult for me.
I know the basic form is sed 's/find_this/replace_with_this/' and I figured I'd replace the found http with // or nothing. But I don't know how to search for something that starts with http and include EVERYTHING after it. I've looked up regex but that seems quite difficult as well.
Replace a space followed by http and the rest of the line with nothing:
sed 's/ http.*//'
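Applied to the sample line from the question, the substitution can be tried like this. The wrapper function strip_url is only an illustrative name. The .* part matches any run of characters up to the end of the line, which is what makes the pattern cover everything after http.

```shell
strip_url() {
    # delete a space, the literal text "http", and everything after it on the line
    sed 's/ http.*//'
}

echo 'Reseller: iwantmyname http://iwantmyname.com' | strip_url
# prints: Reseller: iwantmyname
```

Lines that contain no " http" pass through unchanged, so the filter is safe to apply to the script's whole output.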

Deleting lines of a file with sed - unexpected behaviour

I noticed something a bit odd while fooling around with sed. If you try to remove multiple line intervals (by number) from a file, but any interval specified later in the list is fully contained within an interval earlier in the list, then an additional single line is removed after the specified (larger) interval.
seq 10 > foo.txt
sed '2,7d;3,6d' foo.txt
1
9
10
This behaviour was behind an annoying bug for me, since in my script I generated the interval endpoints on the fly, and in some cases the intervals produced were redundant. I can clean this up, but I can't think of a good reason why sed would behave this way on purpose.
Since this question was highlighted as needing an answer in the Stack Overflow Weekly Newsletter email for 2015-02-24, I'm converting the comments above (which provide the answer) into a formal answer. Unattributed comments here were made by me in essentially equivalent form.
Thank you for a concise, complete question. The result is interesting. I can reproduce it with your script. Intriguingly, sed '3,6d;2,7d' foo.txt (with the delete operations in the reverse order) produces the expected answer with 8 included in the output. That makes it look like it might be a reportable bug in (GNU) sed, especially as BSD sed (on Mac OS X 10.10.2 Yosemite) works correctly with the operations in either order. I tested using 'sed (GNU sed) 4.2.2' from an Ubuntu 14.04 derivative.
More data points for you/them. Both of these include 8 in the output:
sed -e '/2/,/7/d' -e '/3/,/6/d' foo.txt
sed -e '2,7d' -e '/3/,/6/d' foo.txt
By contrast, this does not:
sed -e '/2/,/7/d' -e '3,6d' foo.txt
The latter surprised me (even accepting the basic bug).
Beats me. I thought given some of sed's arcane constructs that you might be missing the batman symbol or something from the middle of your command but sed -e '2,7d' -e '3,6d' foo.txt behaves the same way and swapping the order produces the expected results (GNU sed 4.2.2 on Cygwin). /bin/sed on Solaris always produces the expected result and interestingly so does GNU sed 3.02. (Ed Morton)
More data: it only seems to happen with sed 4.2.2 if the 2nd range is a subset of the first: sed '2,5d;2,5d' shows the bug, sed '2,5d;1,5d' and sed '2,5d;2,6d' do not. (glenn jackman)
The GNU sed home page says "Please send bug reports to bug-sed at gnu.org" (except it has an @ in place of ' at '). You've got a good reproduction; be explicit about the output you expect vs the output you get (they'll get the point, but it's best to make sure they can't misunderstand). Point out that the reverse ordering of the commands works as expected, and give the various other commands as examples of working or not working. (You could even give this Q&A URL as a cross-reference, but make sure that the bug report is self-contained so that it can be understood even if no-one follows the URL.)
You can also point to BSD sed (and the Solaris version, and the older GNU 3.02 sed) as behaving as expected. With the old version GNU sed working, it means this is arguably a regression. […After a little experimentation…] The breakage occurred in the 4.1 release; the 4.0.9 release is OK. (I also checked 4.1.5 and 4.2.1; both are broken.) That will help the maintainers if they want to find the trouble by looking at what changed.
The OP noted:
Thanks everyone for comments and additional tests. I'll submit a bug report to GNU sed and post their response. (santayana)
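In the meantime, a script that generates its intervals on the fly can sidestep the issue by not depending on how sed tracks overlapping ranges. The sketch below is not from the thread: it swaps sed for awk, and delete_ranges is a hypothetical helper that takes a file name followed by any number of START,END pairs.

```shell
# Hypothetical helper: delete_ranges FILE START,END [START,END ...]
# Prints FILE without the lines that fall inside any of the given ranges.
delete_ranges() {
    file=$1
    shift
    awk -v ranges="$*" '
        BEGIN {
            # split "2,7 3,6" into pairs, then each pair into its bounds
            n = split(ranges, r, " ")
            for (i = 1; i <= n; i++) {
                split(r[i], b, ",")
                lo[i] = b[1] + 0
                hi[i] = b[2] + 0
            }
        }
        {
            # skip the line if any range covers its line number
            for (i = 1; i <= n; i++)
                if (NR >= lo[i] && NR <= hi[i]) next
            print
        }
    ' "$file"
}

seq 10 > foo.txt
delete_ranges foo.txt 2,7 3,6
# prints 1, 8, 9 and 10, one per line
```

Because each line is tested against every range independently, redundant or nested intervals make no difference to the result.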

using sed with ? (question mark) special character

I have an infected website, and I am trying to clean it out using sed. Unfortunately I am unable to escape the question mark in the URL and I am really stuck here. I've searched the web for a possible solution, but unfortunately I didn't find a proper way to do so.
Just an explanation:
The injected code is similar to this one:
< iframe src=http://test.com/index.html?i=23123>< /iframe>
Note that I am not a pro, and that is why I need your help!
So my attempt to remove the code is:
sed -i '/< iframe src=http:\/\/test.com\/index.html\?i=23123>/,/< \/iframe>/d' index.html
Unfortunately that didn't work, and neither did any of my other attempts.
All help will be greatly appreciated.
echo "< iframe src=http://test.com/index.html?i=23123>< /iframe>" \
| sed 's#< iframe src=http://test.com/index.html?i=23123>< /iframe>##'
Produces no output, which to me means this is successfully deleting your problem string.
Note that most seds will accept an alternate regex delimiter; here I am using # because there are no #s in the search target. On some seds, you have to tell it 'hey, I'm using an alternate' and escape the char, like s\#.....##.
I don't see why your attempt to quote the ? is failing. Did you try [?] and (worst case) [\?]. Are there 2nd level evaluations happening by the shell that you're not mentioning here? Does my simple example also fail?
As others will certainly tell you, your approach is strictly a bandaid, you need to figure out what the security hole is in your system and fix it. Your pages will get corrupted again. :-(
IHTH
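One likely explanation for the failed attempt, assuming GNU sed: in its default basic-regex mode a plain ? is already a literal character, while \? is a GNU extension meaning "zero or one of the preceding item", so escaping the question mark turns it into a quantifier. A bracket expression [?] is literal in both basic and extended mode. Note also that an address range /start/,/end/ looks for the end pattern only on later lines, so with both tags on one line a range delete can remove more than intended; a substitution avoids that. The function name strip_iframe is hypothetical.

```shell
strip_iframe() {
    # [?] matches a literal question mark; \. matches a literal dot.
    # "#" is the delimiter, so the slashes in the URL need no escaping.
    sed 's#< iframe src=http://test\.com/index\.html[?]i=23123>< /iframe>##'
}

echo 'before< iframe src=http://test.com/index.html?i=23123>< /iframe>after' | strip_iframe
# prints: beforeafter
```

Running the function on a copy of one infected file first, and comparing the result with diff, is a cheap way to confirm the pattern before using sed -i on the real site.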

General help for deciphering/explaining sed one-liners?

I've just stumbled upon some cryptic sed expression in a legacy script. Could you give me some hints how to start decoding it?
Best thing would be some automatic tool translating sed incantations to English, but for a close runner up, I'd be very grateful for some nice index of (all) sed commands. Otherwise, I'm certainly highly interested in any help at all on how to quickly attack the problem (other than having to read the manual cover to cover...).
(Side note: as you may have guessed, I don't want to just paste the expression here, as I'd like to be able to do it easier and faster next time I stumble on some similar line noise...)
I'd be very grateful for help!
Edit: regexps themselves aren't a problem, by the way; I'm good enough at them.
I don't think there is an automatic tool that can 'translate' sed commands to English. However, you may want to check http://aurelio.net/sedsed/. It will help you understand a sed script: what it does, and how.
Anyway, it would be good if you listed some examples.
This might work for you.
Unix in a Nutshell by Robbins has a very nice chapter on sed. Clear and concise descriptions of the commands.
Your best bet would be to learn the sed language in depth. Unfortunately, the sed documentation is more like a reference. Here's a nice step-by-step guide that doesn't take too long to read.
I found "Sed One-Liners Explained" to be very informative as well as fun.
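One low-tech way to attack a cryptic one-liner is to split it at the semicolons and rewrite it as a multi-line script with a # comment per command. As an illustration (not the expression from the question, which was never posted), here is the well-known one-liner sed '$!N;$!D', which prints the last two lines of its input, written out that way. The function name last_two_lines is made up for the example.

```shell
last_two_lines() {
    sed '
        # $!N: unless this is the last line, append the next input line
        # to the pattern space, separated by a newline
        $!N
        # $!D: unless the last line has now been read, delete up to the
        # first newline and restart the script without reading new input
        $!D
    '
}

printf '%s\n' 1 2 3 4 | last_two_lines
# prints:
# 3
# 4
```

The pattern space therefore always holds a sliding window of two lines, and only the final window reaches the automatic print at the end of the script. Each command can then be looked up individually in the manual, which is usually quicker than reading the manual front to back.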