Invalid preceding regular expression error given by sed - sed

I'm part of a rotating team that manages a lot of websites, and we inherited some particularly bad code for one website that we're in the process of completely redesigning. Horrifically enough, there are links on the development server that lead you to the live server and to old domains and a lot of other terrible, terrible things.
I've been trying to write a grep/sed command to replace all of these links with the user-defined php function full_link that we use across our websites now to prevent all the linking to different places. So (using a placeholder for our domain) instead of writing http://www.place.com/foo/bar, you'd write <?php echo full_link('foo/bar'); ?> and it will work when we move it from one server to another.
Here's what I got:
grep -v 'echo' * -r -P | grep "(?<=<a href=['\"])(http:\/\/foo\.bar\.net\/|10\.41\.6\.118\/|http:\/\/foo2\.bar\.net\/)([^<]*?)(?=['\"])" -P | sed -r "s#(?<=<a href=['\"])(http://foo\.bar\.net/|10\.41\.6\.118/|http://foo2\.bar\.net/)([^<]*?)(?=['\"])#<?php echo full_link('\2'); ?>#gpw output"
(If you're wondering about the first grep or the [^<], they're both just a basic attempt to keep from putting php tags inside existing php tags. Since it's just a first pass to make manual editing less full of copy-pasting links and getting redirected to the wrong server, it doesn't need to be perfect, but I am open to better ways to do that.)
I've got the grep statements working and picking up what I want them to get, but when I add the sed to the end, this is what happens:
sed: -e expression #1, char 159: Invalid preceding regular expression
From what research I've done, it seems like I probably escaped something in my sed statement incorrectly, and I've tried a number of things, but it only ever gives me the same message pointing to one of the last few characters of the expression.

sed "s.http://[^/]*/.<?php echo full_link('.;s/$/'); ?>/"
look for http://[^/]*/
replace with <?php echo full_link('
look for $
replace with '); ?>
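As for the error itself: sed, even with -r/-E, has no lookarounds ((?<=...), (?=...)) and no lazy *? quantifier. In (?<= the ? follows an opening parenthesis and has nothing to repeat, which is what "Invalid preceding regular expression" is complaining about. A minimal sketch of the same idea using plain capture groups instead, assuming GNU sed (rewrite_links is a made-up helper name, and the hosts are the placeholders from the question):

```shell
# rewrite_links: hypothetical helper; reads HTML on stdin, writes the result to stdout.
# \1 = the '<a href="' prefix, \3 = the path after the host, \4 = the closing quote.
rewrite_links() {
  sed -E "s#(<a href=['\"])(http://foo\.bar\.net/|http://foo2\.bar\.net/|10\.41\.6\.118/)([^<'\"]*)(['\"])#\1<?php echo full_link('\3'); ?>\4#g"
}

printf '%s\n' '<a href="http://foo.bar.net/foo/bar">x</a>' | rewrite_links
```

Because the prefix and the closing quote are captured and written back, nothing has to be "looked around"; the [^<'\"]* class stops at the first quote, so no lazy quantifier is needed either.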

Related

The perl -pe command

So I've done a research about the perl -pe command and I know that it takes records from a file and creates an output out of it in a form of another file. Now I'm a bit confused as to how this line of command works since it's a little modified so I can't really figure out what exactly is the role of perl pe in it. Here's the command:
cd /usr/kplushome/entities/Standalone/config/webaccess/WebaccessServer/etc
(PATH=/usr/ucb:$PATH; ./checkall.sh;) | perl -pe "s,^, ,g;"
Any idea how it works here?
What's even more confusing in the above statement is this part : "s,^, ,g;"
Any help would be much appreciated. Let me know if you guys need more info. Thank you!
It simply takes an expression given by the -e flag (in this case, s,^, ,g) and performs it on every line of the input, printing the modified line (i.e. the result of the expression) to the output.
The expression itself is something called a regular expression (or "regexp" or "regex") and is a field of learning in and of itself. Quick googles for "regular expression tutorial" and "getting started with regular expressions" turn up tons of results, so that might be a good place to start.
This expression, s,^, ,g, inserts the whitespace that sits between the second and third commas at the start of the line, and as I said earlier, perl -p applies it to every line, so the whole output ends up indented.
"s,^, ,g;"
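s is used for substitution. The syntax is s/somestring/replacement/.
In your command, the delimiter is , instead of /.
g means global: replace every occurrence on the line rather than only the first. (With the ^ anchor there is only one possible match per line, so g has no effect here.)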
For example:
perl -p -i -e "s/oldstring/newstring/g" file.txt;
In file.txt, every oldstring will be replaced with newstring.
-i is for in-place file editing.
See these docs for more information:
perlre
perlretut
perlop

Sed replace in html file

How could I append 'index.html' to all links in a html file that do not end with that word ?
So that, for example, href="http://mysite/" would become href="http://mysite/index.html".
I am not a sed expert, but think this works:
sed -e "s_\"\(http://[^\"]*\)/index.html\"_\"\1\"_g" \
-e "s_\"\(http://[^\"]*[^/]\)/*\"_\"\1/index.html\"_g"
The first replacement finds URLS already ending in /index.html and deletes this ending.
The second replacement adds the /index.html as required. It deals with cases that end in / and also those that don't.
More than one version of sed exists. I'm using the one that comes in XCode for OS X.
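A quick way to try the two expressions above on sample input (add_index is just a made-up wrapper name for this sketch):

```shell
# add_index: hypothetical wrapper around the two substitutions; reads stdin.
add_index() {
  sed -e "s_\"\(http://[^\"]*\)/index.html\"_\"\1\"_g" \
      -e "s_\"\(http://[^\"]*[^/]\)/*\"_\"\1/index.html\"_g"
}

printf '%s\n' 'href="http://mysite/"' 'href="http://mysite/index.html"' | add_index
```

Both lines come out as href="http://mysite/index.html". Note that a quoted URL pointing at some other file (say, one ending in page.php) would also get /index.html appended, so this only suits pages whose links are all directories.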
For href values ending with /:
sed 's|\(href="http://[^"]*/\)"|\1index.html"|g' YourFile
If there are folder references without a trailing /, you need to specify what counts as a file and what does not (for example, a last path component containing a dot is a file, ...).
What about this:
echo 'href="http://mysite/"' | awk '/http/ {sub(/\/\"/,"/index.html\"")}1'
href="http://mysite/index.html"
echo 'href="http://www.google.com/"' | awk '/http/ {sub(/\/\"/,"/index.html\"")}1'
href="http://www.google.com/index.html"
In general this is an almost unsolvable problem. If your html is "reasonably well behaved", the following expression searches for things that "look a lot like a URL"; you can see it at work at http://regex101.com/r/bZ9mR8 (this shows the search and replace for several examples; it should work for most others)
((?:(?:https?|ftp):\/{2})(?:(?:[0-9a-z_#-]+\.)+(?:[0-9a-z]){2,4})?(?:(?:\/(?:[~0-9a-z\#\+\%\#\.\/_-]+))?\/)*(?=\s|\"))(\/)?(index\.html?)?
The result of the above match should be replaced with
\1index.html
Unfortunately this requires regex wizardry that is well beyond the rather pedestrian capabilities of sed, so you will have to unleash the power of perl, as follows:
perl -p -e 's/((?:(?:https?|ftp):\/{2})(?:(?:[0-9a-z_#-]+\.)+(?:[0-9a-z]){2,4})?(?:(?:\/(?:[~0-9a-z\#\+\%\#\.\/_-]+))?\/)*(?=\s|\"))(\/)?(index\.html?)?/${1}index.html/gi'
It looks a bit daunting, I know. But it works. The only problem - if a link ends in /, it will add /index.html. You could easily take the output of the above and process it with
sed 's/\/\/index.html/\/index.html/g'
This replaces a double slash before index.html with a single slash.
Some examples (several more given in the link above)
http://www.index.com/ add /index.html
http://ex.com/a/b/" add /index.html
http://www.example.com add /index.html
http://www.example.com/something do nothing
http://www.example.com/something/ add /index.html
http://www.example.com/something/index.html do nothing

Using bash/tail/perl/alias for easy highlighting of different strings

I am developing a tomcat application and would like to be able to search for specific things and highlight it when viewing the log. I want something like an alias that takes a parameter (regex) as input and highlight the matching string.
So far, I've figured out that this works, but it's not practical to have to change a small part of it every time I want to search for something new:
tail -n 100 -f /opt/apache-tomcat-6.0.26/logs/catalina.out | perl -pe 's/null/\e[1;31m$&\e[0m/g'
This is what I thought would work:
logColor(){
x="'s/"
y="/\e[1;31m$&\e[0m/g'"
tail -n 100 -f /opt/apache-tomcat-6.0.26/logs/catalina.out | perl -pe $x$1$y
}
alias logC=logColor
I've tested that this prints out the two same lines:
logColorTest(){
x="'s/"
y="/\e[1;31m$&\e[0m/g'"
echo $x$1$y
echo "'s/null/\e[1;31m$&\e[0m/g'"
}
alias logCT=logColorTest
logCT null
So I am lost on why this does not work and would appreciate input from someone who knows how this works :)
The problem with grep is that you get only the matching lines; the other lines are filtered out. (That's what grep is supposed to do anyway.) Many times, however, we need all the output, but with some particular strings highlighted.
I have this small bash function in my .bashrc for such requirement:
mark ()
{
# Escape every / in the search expression so it cannot terminate the s/// command early.
local searchExpr=${1//\//\\\/};
sed "s/$searchExpr/"`echo -n -e "\e[91;1m"`'&'`echo -n -e "\e[0m"`'/gi' $2
}
Usage:
command | mark some_string # OR
mark some_string some_file
Rename to suitable function name if required.
NOTE: There is a great command called highlight. Hence I could not use that as my function name.
As @fedorqui pointed out, you can use grep to do this:
grep --colour 'null\|$'
This will match and highlight null or the end of a line, meaning all lines are shown.
Using the GREP_COLORS environment variable you can control how different parts are highlighted, e.g. to mark matched text in yellow:
export GREP_COLORS='ms=1;33'

Perl Search and replace keeping middle part of string

I've been using Codeigniter for my PHP project and I've been using their session class.
$this->session->userdata('variablename')
I've been having a lot of problems with this, so I've decided to use PHP native sessions.
$_SESSION['variablename']
This is what I've got so far
perl -p -i -e "s/$this->session->userdata('.*?$SOMEVAR.*?\')/$_SESSION['$1']/g" offer.php
But truth be told, I don't really know what I'm doing.
I would also like to do this on all php files in my project.
Help much appreciated.
The regex should be:
s/\$this->session->userdata\('(.*?)'\)/\$_SESSION['$1']/g
Issues with the version you posted are mostly with un-escaped characters--you can escape a $ or parenthesis by adding a \ prior to the character. For example, \$this will find the text "$this", while $this will search for the value of the $this variable.
For a more comprehensive look at escapes (and other quick tips), if you have $2, I highly recommend this cheat sheet.
Also, you don't need to use the .*?$SOMEVAR.*? construct you added in there...Perl will automatically capture the result found between the first pair of parentheses and store it in $1, the second set of parentheses gets $2, etc.
When shell quoting is getting complicated, the simplest thing to do is to just put the source into a file. You can still use it as a one-liner. I have used a negative lookbehind assertion so that the closing quote is not matched at an escaped single quote inside the string.
# source file, regex.txt
s/\$this->session->userdata\('(.+?)(?<!\\)'\)/\$_SESSION['$1']/g;
Usage:
perl -pi regex.txt yourfile.php
Note that you simply leave out the -e switch. Also note that -i requires a backup extension for Windows, e.g. -i.bak.

How to use sed to return something from first line which matches and quit early?

I saw from
How to use sed to replace only the first occurrence in a file?
How to do most of what I want:
sed -n '0,/.*\(something\).*/s//\1/p'
This finds the first match of something and extracts it (of course my real example is more complicated). I know sed has a 'quit' command which will make it stop early, but I don't know how to combine the 'q' with the above line to get my desired behavior.
I've tried replacing the 'p' with {p;q;} as I've seen in some examples, but that is clearly not working.
The "print and quit" trick works, if you also put the substitution command into the block (instead of only putting the print and quit there).
Try:
sed -n '/.*\(something\).*/{s//\1/p;q}'
Read this like: Match for the pattern given between the slashes. If this pattern is found, execute the actions specified in the block: Replace, print and exit. Typically, match- and substitution-patterns are equal, but this is not required, so the 's' command could also contain a different RE.
Actually, this is quite similar to what ghostdog74 answered for gawk.
My specific use case was to find out via 'git log' on a git-p4 imported tree which perforce branch was used for the last commit. git log, when called without -n, will log every commit that ever happened (hundreds of thousands for me).
We can't know a-priori what value to give git for '-n.' After posting this, I found my solution which was:
git log | sed -n '0,/.*\[git-p4:.*\/\/depot\/blah\/\([^\/]*\)\/.*/s//\1/p; /\[git-p4/ q'
I'd still like to know how to do this with a non '/' separator, and without having to specify the 'git-p4' part twice, once for the extraction and once for the quit. There has got to be a way to combine both on the same line...
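Both wishes can probably be met with the block form shown earlier: an address may use any delimiter if it is introduced with a backslash (\|regex|), and an empty regex in s reuses the last regex that was matched, so the pattern is written only once. A sketch assuming GNU sed and the depot path from the command above (first_p4_branch is a made-up name):

```shell
# first_p4_branch: hypothetical helper; prints the branch name from the first
# git-p4 line on stdin and quits immediately afterwards.
# \|...| is an address with | as delimiter; s||\1|p reuses that same regex.
first_p4_branch() {
  sed -n '\|.*\[git-p4:.*//depot/blah/\([^/]*\)/.*|{s||\1|p;q;}'
}

printf '%s\n' 'commit 1234' \
  '    [git-p4: depot-paths = "//depot/blah/branchA/": change = 2]' \
  '    [git-p4: depot-paths = "//depot/blah/branchB/": change = 1]' | first_p4_branch
```

In real use this would be git log | first_p4_branch. Since sed exits at the first match, git log should stop shortly after instead of walking the whole history.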
Sed usually has an option to specify more than one pattern to execute (IIRC, it's the -e option). That way, you can specify a second pattern that quits after the first line.
Another approach is to use sed to extract the first line (sed '1q'), then pipe that to a second sed command (what you show above).
Use gawk:
gawk '/MATCH/{
print "do something with "$0
exit
}' file
To "return something from first line and quit" you could also use grep:
grep --max-count 1 'something'
grep is supposed to be faster than sed and awk [1]. Even if not, this syntax is easier to remember (and to type: grep -m1 'something').
If you don't want to see the whole line but just the matching part, the --only-matching (-o) option might suffice.
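Putting those two options together gives a short sketch like the following (first_match is a hypothetical name; -E is added only so the demo pattern can use +):

```shell
# first_match: hypothetical helper; stops reading at the first line that matches
# the extended regex in $1 and prints only the matching part(s) of that line.
first_match() {
  grep -m 1 -o -E "$1"
}

printf '%s\n' 'no match here' 'id=123 trailing text' 'id=456' | first_match 'id=[0-9]+'
```

One difference from the sed version: -o prints every match found on that first matching line, each on its own line, and it cannot print a capture group, only the whole match.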