Extract data using grep and sed

I'm using this code to get the titles of all links whose URLs look like http://something.txt:
#!/usr/bin/perl -w
$output = `cat source.html | grep -o '<a .*href=.*>' | grep -E 'txt' | sed -e 's/<a /\n<a /g' | sed -e 's/<a .*title="//' | cut -f1 -d '"'`;
print("$output");
When I run this from Perl, I get the error:
sed: -e expression #1, char 6: unterminated `s' command
The error comes from this part of the code:
sed -e 's/<a /\n<a /g'

Inside backquotes (backticks), Perl applies the same interpolation rules as in double-quoted strings, so \n becomes an actual newline character before the command reaches the shell. Escape the backslash so that a literal \ is passed to the shell:
`sed -e 's/<a /\\n<a /g'`
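To see the difference from sed's side, here is a small sketch, assuming GNU sed (which expands \n in the replacement). The two function names are made up for illustration: one hands sed the two characters \n, which is what the doubled backslash in Perl produces, and the other hands sed a real newline inside the s command, which is what the single backslash turned into.

```shell
#!/bin/sh
# Hypothetical helper: sed receives the two characters \n, and
# GNU sed turns them into a newline in the replacement.
split_anchors_ok() {
    sed -e 's/<a /\n<a /g'
}

# Hypothetical helper: sed receives a real newline inside the
# s command, which is what Perl's backticks produce from a single \n.
split_anchors_broken() {
    sed -e 's/<a /
<a /g'
}

printf '%s\n' 'x<a href="1.txt">y<a href="2.txt">' | split_anchors_ok
if ! printf '%s\n' 'x<a href="1.txt">' | split_anchors_broken; then
    echo "sed rejected the script containing a raw newline" >&2
fi
```

The first call prints the input split before each <a tag; the second should fail with an "unterminated `s' command" error like the one in the question.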

Related

How to insert multiple complex lines containing spaces, pipes, grep and sed commands before pattern

The goal is to insert the following complex lines before a specific pattern in a file:
NDPI_VERSION_SHORT=$(cat Makefile | grep -P "^NDPI_VERSION_SHORT = " | sed -E 's|^NDPI_VERSION_SHORT = (.*)$|\1|g') \
NDPI_VERSION_SHORT=${NDPI_VERSION_SHORT//[[:space:]]/} \
NDPI_MAJOR=$(cat Makefile | grep -P "^NDPI_MAJOR = " | sed -E 's|^NDPI_MAJOR = (.*)$|\1|g') \
NDPI_MAJOR=${NDPI_MAJOR//[[:space:]]/}
I unsuccessfully tried the following:
sed -i '/pattern/i \
NDPI_VERSION_SHORT=$(cat Makefile | grep -P "^NDPI_VERSION_SHORT = " | sed -E \'s|^NDPI_VERSION_SHORT = (.*)$|\1|g\') \
NDPI_VERSION_SHORT=${NDPI_VERSION_SHORT\/\/[[:space:]]\/} \
NDPI_MAJOR=$(cat Makefile | grep -P "^NDPI_MAJOR = " | sed -E \'s|^NDPI_MAJOR = (.*)$|\1|g\') \
NDPI_MAJOR=${NDPI_MAJOR\/\/[[:space:]]\/}' file
bash: syntax error near unexpected token `('
I also tried quoting all of the inserted lines, with the same result.
What am I doing wrong?
This should work:
sed "/pattern/i \
NDPI_VERSION_SHORT=\$\(cat Makefile | grep -P \"^NDPI_VERSION_SHORT = \" | sed -E 's|^NDPI_VERSION_SHORT = \(.*\)\$|\\\1|g'\) \\\ \n\
NDPI_VERSION_SHORT=\${NDPI_VERSION_SHORT//[[:space:]]/} \\\ \n\
NDPI_MAJOR=\$\(cat Makefile | grep -P \"^NDPI_MAJOR = \" | sed -E 's|^NDPI_MAJOR = \(.*\)\$|\\\1|g'\) \\\ \n\
NDPI_MAJOR=\${NDPI_MAJOR//[[:space:]]/}" file
The problem is the single quotes within the inserted text: each one ends the single-quoted sed script, and a single quote cannot be escaped inside single quotes. You can use single quotes in the text, though, if you enclose the whole script in double quotes instead. This, however, means you need to escape quite a lot in your text: the $, ", ( and ). Since both the shell and sed's i command consume a backslash when escaping, you need to write \\\ wherever the inserted text should contain a single \. The line breaks are produced with \n. Note that / does not need to be escaped, since sed does not use it as a delimiter here.
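The double-quote rules this answer relies on can be checked directly. This sketch (show_double_quoted is a made-up name) prints a string after the shell's double-quote processing: \$ and \" lose their backslash, \\\1 shrinks to \\1, while \( and the single quotes pass through untouched. GNU sed's i command then processes escapes once more, so \\1 ends up as \1 in the inserted line.

```shell
#!/bin/sh
# Print exactly what the shell passes on after double-quote processing.
# Inside "...": \$ -> $, \" -> ", \\ -> \ ; but \( and \1 keep their
# backslash, and single quotes are ordinary characters.
show_double_quoted() {
    printf '%s\n' "\$(cat Makefile) \"x\" \\\1 \(.*\) 's|a|b|'"
}

show_double_quoted
# prints: $(cat Makefile) "x" \\1 \(.*\) 's|a|b|'
```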

sed search and replace \" but not \\"

I am trying to replace every escaped quote \" in a string with "", but not when the \" is itself preceded by a \.
For example:
\"\"\"\" would return """"""""
\"\\"\"\" would return ""\\"""""
\" would return ""
\"\" would return """"
\\"\" would return \\"""
\"\\" would return ""\\"
\\\\\\\" would return \\\\\\\"
So far I have
$ sed -e 's/\([^\]\)\\"/\1""/;s/^\\"/""/'
but in the case of
$ echo '\"\"\"\"\"' | sed -e 's/\([^\]\)\\"/\1""/;s/^\\"/""/'
I am getting incorrect results.
Any help would be appreciated.
This might work for you (GNU sed):
sed 's/\\\\"/\n/g;s/\\"/""/g;s/\n/\\\\"/g' file
Replace every occurrence of the string you want left untouched with a placeholder (\n is a good choice, since a line read by sed never contains a newline), then replace the target string globally, then restore the placeholders.
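Wrapped in a function (unescape_quotes is an illustrative name; GNU sed is assumed, because \n in the replacement is a GNU extension), the three-step placeholder approach can be checked against the examples from the question:

```shell
#!/bin/sh
# 1) park every \\" as a newline (a line read by sed never contains one),
# 2) turn each remaining \" into "",
# 3) put the parked \\" back.
unescape_quotes() {
    sed 's/\\\\"/\n/g;s/\\"/""/g;s/\n/\\\\"/g'
}

printf '%s\n' '\"\\"\"\"' | unescape_quotes
# prints: ""\\"""""
```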
How about this:
#!/bin/bash
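function myreplace {
  # 1) every \" becomes MYDUMMY
  # 2) a MYDUMMY preceded by \ was really \\", so restore it
  # 3) every remaining MYDUMMY becomes ""
  printf '%s\n' "$1" | sed -e "s/[\\]\"/MYDUMMY/g" \
    -e 's/\\MYDUMMY/\\\\"/g' \
    -e 's/MYDUMMY/""/g'
}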
myreplace '\"\"\"\"'
myreplace '\"\\"\"\"'
myreplace '\"'
myreplace '\"\"'
myreplace '\\"\"'
myreplace '\"\\"'
myreplace '\\\\\\\"'
Executing the script above results in:
""""""""
""\\"""""
""
""""
\\"""
""\\"
\\\\\\\"
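Using a sed loop avoids having to pick a replacement string that is guaranteed not to occur in an unknown dataset: t inner repeats the substitution until nothing changes, which also catches adjacent matches that a single g pass skips.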
sed -e 's/^\\"/""/;:inner; s/\([^\]\)[\]"/\1""/g; t inner'
$ echo '\"\"\"\"' | sed -e 's/^\\"/""/;:inner; s/\([^\]\)[\]"/\1""/g;t inner'
""""""""
$ echo '\"\\"\"\"' | sed -e 's/^\\"/""/;:inner; s/\([^\]\)[\]"/\1""/g; t inner'
""\\"""""
$ echo '\"' | sed -e 's/^\\"/""/;:inner; s/\([^\]\)[\]"/\1""/g; t inner'
""
$ echo '\"\"' | sed -e 's/^\\"/""/;:inner; s/\([^\]\)[\]"/\1""/g; t inner'
""""
$ echo '\\"\"' | sed -e 's/^\\"/""/;:inner; s/\([^\]\)[\]"/\1""/g; t inner'
\\"""
$ echo '\"\\"' | sed -e 's/^\\"/""/;:inner; s/\([^\]\)[\]"/\1""/g; t inner'
""\\"
$ echo '\\\\\\\"' | sed -e 's/^\\"/""/;:inner; s/\([^\]\)[\]"/\1""/g; t inner'
\\\\\\\"

Removing matching text from line

I have an example line cut down from a log file:
112 172.172.172.1#50912 (ssl.bing.com):
I would like to remove the # and the digits after it, as well as the parentheses and the trailing colon around the hostname, to get this result:
112 172.172.172.1 ssl.bing.com
Here is the sed one-liner I have been working on:
cat newdns.log | sed -e 's/.*query: //' | cut -f 1 -d' ' | sort | uniq -c | sort -k2 > old.log
Thanks
Using sed, you could say:
sed 's/#[0-9]*//;s/(\(.*\)):$/\1/' filename
or, in a single substitution:
sed 's/#[0-9]* *(\(.*\)):$/ \1/' filename
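As a quick check, the single-substitution version can be wrapped in a function (clean_query_line is a made-up name) and fed the sample line from the question. Lines that do not match the pattern pass through unchanged.

```shell
#!/bin/sh
# Remove "#" plus the digits after it, and replace " (host):" at the
# end of the line with " host", in one substitution.
clean_query_line() {
    sed 's/#[0-9]* *(\(.*\)):$/ \1/'
}

printf '%s\n' '112 172.172.172.1#50912 (ssl.bing.com):' | clean_query_line
# prints: 112 172.172.172.1 ssl.bing.com
```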
Another sed:
sed -r 's/#[^ ]+|[():]//g'
$ echo '112 172.172.172.1#50912 (ssl.bing.com):' | sed -r 's/#[^ ]+|[():]//g'
112 172.172.172.1 ssl.bing.com

Solaris sed label too long

I am trying to run a shell script that contains this line:
sed -ne ':1;/PinnInstitutionPath/{n;p;b1}' Institution | sed -e s/\ //g | sed -e s/\=//g | sed -e s/\;//g | sed -e s/\"//g | sed -e s/\Name//g
It fails with the error message "Label too long: :1;/PinnInstitutionPath/{n;p;b1}".
I am new to Linux, so can anyone help me solve this problem? Thank you!
Try changing
sed -ne ':1;/PinnInstitutionPath/{n;p;b1}'
to
sed -ne ':1' -e '/PinnInstitutionPath/{n;p;b1}'
Also, you don't need to call sed so many times. Do the substitutions on the line that n reads, before p prints it:
sed -ne ':1' -e '/PinnInstitutionPath/{n;s/[ =;"]//g;s/Name//g;p;b1' -e '}'
Regarding 'sed: Label too long' on Solaris (SunOS): Solaris sed reads a label up to the end of the line, so if you use labels you need to split the script into several lines.
In your case:
sed -ne ':1
/PinnInstitutionPath/{
n
p
b 1
}' Institution | sed -e s/\ //g -e s/\=//g -e s/\;//g -e s/\"//g -e s/\Name//g
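Combining the two answers, a single sed call can do both jobs in the multi-line style. This is a sketch under assumptions: extract_institution is a made-up name, the sample input is guessed because the question does not show the Institution file, and it has only been reasoned through with GNU sed, not with Solaris sed.

```shell
#!/bin/sh
# One sed call instead of six: for each PinnInstitutionPath line,
# read the next line (n), strip spaces, =, ; and " plus the word Name,
# print it (p), and branch back to label 1 in case that line matches too.
# Labels and braces sit on their own lines, in the multi-line style
# the answer above suggests for Solaris sed.
extract_institution() {
    sed -n '
:1
/PinnInstitutionPath/{
n
s/[ =;"]//g
s/Name//g
p
b 1
}'
}

# Hypothetical input layout: the question does not show the Institution file.
printf '%s\n' 'Institution ={' '  PinnInstitutionPath = "x";' '  Name = "Foo Hospital";' '};' |
    extract_institution
# prints: FooHospital
```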

'sed' usage in perl script error

I have the following line in a Perl script:
my $temp = `sed 's/ /\n/g' /sys/bus/w1/devices/w1_bus_master1/10-000802415bef/w1_slave | grep t= | sed 's/t=//'`;
Which throws up the error:
"sed: -e expression #1, char 2: unterminated `s' command"
If I run a shell script as below it works fine:
temp1=`sed 's/ /\n/g' /sys/bus/w1/devices/w1_bus_master1/10-000802415bef/w1_slave | grep t= | sed 's/t=//'`
echo $temp1
Anyone got any ideas?
Inside backticks, Perl interprets your \n as an actual newline character, so from sed's perspective your command line looks something like this:
sed s/ /
/g ...
which sed rejects. The shell, by contrast, passes the single-quoted \n to sed unchanged, as the two characters \ and n.
The proper solution is not to use sed/grep in such a situation at all. Perl is, after all, very, very good at handling text. For example (untested):
use File::Slurp;
# split on spaces and newlines, as sed 's/ /\n/g' did
my @lines = split m/[ \n]/, scalar(read_file("/sys/bus..."));
@lines = map { s/t=//; $_ } grep { m/t=/ } @lines;
Alternatively, escape the backslash of \n once more, e.g. sed 's/ /\\n/g' ...
You need to escape the \n in your first sed expression. Perl's backtick operator interpolates it like a double-quoted string and inserts a newline character instead of the two characters \n.
|
V
my $temp = `sed 's/ /\\n/g' /sys/bus/ # ...