Using variables within sed

I am trying to use a variable within sed but can't work it out. I have tried
read -p " enter your name" name
sed -i 's/myname/$name/g' file
But unfortunately it just replaces myname with "$name".
Is there an alternative way?

The problem is that the bash shell does not do variable expansion within single quotes. For example:
pax> name=paxdiablo
pax> echo 'Hello, $name, how are you?'
Hello, $name, how are you?
For simple cases like this, you can just use double quotes:
pax> echo "Hello, $name, how are you?"
Hello, paxdiablo, how are you?
Having said that, there are some serious concerns with just using arbitrary data expanded like this. A clever attacker could give a name containing characters that would cause your sed to do things you may not want (via an injection attack):
pax@paxbox$ name='/; e printenv; echo '
pax@paxbox$ echo 'Hello, myname' | sed "s/myname/$name/"
LESSOPEN=| /usr/bin/lesspipe %s
USER=pax
: (lots of other stuff about my environment I don't want people to see)
HOSTTYPE=x86_64
/
Hello,
And it doesn't even have to be clever - anyone entering a name containing / will almost certainly cause your sed to fail.
You should either sanitise your input data or use tools where the scope for attack is greatly reduced.
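For the simple substitution in the question, one possible mitigation (a rough sketch, not a complete sanitiser) is to escape the characters that are special on the replacement side of s///, namely the backslash, the ampersand and the chosen delimiter, before expanding the variable:
read -p "enter your name: " name
# Escape \, & and the / delimiter so the value is treated as literal text
safe_name=$(printf '%s' "$name" | sed 's/[\/&]/\\&/g')
sed -i "s/myname/$safe_name/g" file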

Related

Use perl to replace some string in file with complicated password

I want to use perl in a bash script to replace some string in a text file with other strings that may contain various special characters (passwords). The variables containing the special characters come from the environment, so I cannot know them.
For example
# pw comes from the environment and I cannot be sure about its content
pw='%$&/|some!\!smart%(]password'
pw=${pw//:/\\:}
perl -p -e "s:PASSWORD:$pw:" <<< "my pw is: PASSWORD" # this would come from a text file
# yields: my pw is: %PASSWORD/|some!!smart%(]password
Here I use : as a delimiter and escape possible occurrences beforehand, which should prevent some errors. But executing this shows that it isn't even remotely working as expected. The bash expansion is still messing up the password.
Now my question is: How can I safely take some environment variable and place it somewhere else? What might be a better approach? I could of course replace and escape further characters in the unknown variable but how can I ever be sure if this is enough?
Use Perl's %ENV special variable, and keep the Perl code in single quotes so the shell never touches the password:
pw='%$&/|some!\!smart%(]password'
export pw=${pw//:/\\:}
perl -pe 's:PASSWORD:$ENV{pw}:' <<< "my pw is: PASSWORD"
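A quick way to convince yourself that the value is never parsed as code: even a hostile value (a made-up example) comes through verbatim, because Perl fixes the s::: delimiters before $ENV{pw} is interpolated:
export pw='1:2; system("oops")'
perl -pe 's:PASSWORD:$ENV{pw}:' <<< "my pw is: PASSWORD"
# prints: my pw is: 1:2; system("oops")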

Best scripting tool to profile a log file

I have extracted log files from servers based on my date and time requirement, and the extracted file has hundreds of HTTP requests (URLs). Each request may or may not contain various parameters a, b, c, d, e, f, g, etc.
For example:
http:///abcd.com/blah/blah/blah%20a=10&b=20ORC
http:///abcd.com/blah/blah/blahsomeotherword%20a=30&b=40ORC%26D
http:///abcd.com/blah/blah/blahORsomeORANDworda=30%20b=40%20C%26D
http:///abcd.com/blah/blah/"blah"%20b=40ORCANDD%20G%20F
I wrote a shell script that profiles this log file in a while loop, grepping for the different parameters a, b, c, d, e: if a request contains a given parameter, report its value, otherwise TRUE or FALSE.
while read line ; do
    echo -n -e $line | sed 's/^.*XYZ:/ /;s/ms.*//' >> output.txt
    echo -n -e "\t" >> output.txt
    echo -n -e $line | sed 's/^.*XYZ:/ /;s/ABC.*//' >> output.txt
    echo -n -e "\t" >> output.txt
    echo -n -e $line | sed 's/^.*?q=/ /;s/AUTH_TYPE:.*//'>> output.txt
    echo -n -e "\t" >> output.txt
    echo " " >> output.txt
done < queries.csv
My question is: Cygwin is taking a lot of time (an hour or so) to run this on a log file containing 70k-80k requests. Is there a better way to write this script so that it executes faster? I'm okay with Perl too, but my concern is that the script stays flexible enough to extract the parameters.
As @reinerpost already pointed out, the loop-internal redirection is probably the #1 killer issue here. You might be able to reap significant gains already by switching from
while read line; do
    something >>file
    something else too >>file
done <input
to instead do a single redirection after done:
while read line; do
    something
    something else too
done <input >file
Notice how this also simplifies the loop body, and allows you to overwrite the file when you (re)start the script, instead of separately needing to clean out any old results. As also suggested by @reinerpost, not hard-coding the output file would also make your script more general; simply print to standard output, and let the invoker decide what to do with the results. So maybe just remove the redirections altogether.
(Incidentally, you should switch to read -r unless you specifically want the shell to interpret backslashes and other slightly weird legacy behavior.)
Additionally, collecting results and doing a single print at the end would probably be a lot more efficient than the repeated unbuffered echo -n -e writes. (And again, printf would probably be preferable to echo for both portability and usability reasons.)
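For example, each line's fields could be collected and emitted with a single printf instead of six separate echo calls (a sketch reusing the sed commands from the question; it still forks sed three times per line, which is why the awk rewrite below is the bigger win):
while IFS= read -r line; do
    f1=$(printf '%s' "$line" | sed 's/^.*XYZ:/ /;s/ms.*//')
    f2=$(printf '%s' "$line" | sed 's/^.*XYZ:/ /;s/ABC.*//')
    f3=$(printf '%s' "$line" | sed 's/^.*?q=/ /;s/AUTH_TYPE:.*//')
    printf '%s\t%s\t%s\n' "$f1" "$f2" "$f3"
done < queries.csv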
The current script could be reimplemented in sed quite easily. You collect portions of the input URL and write each segment to a separate field. This is easily done in sed with the following logic: Save the input to the hold space. Swap the hold space and the current pattern space, perform the substitution you want, append to the hold space, and swap back the input into the pattern space. Repeat as necessary.
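A rough sketch of that hold-space dance, assuming GNU sed and reusing two of the patterns from the question:
sed '
# save the original line in the hold space
h
# extract the first field
s/^.*XYZ:/ /;s/ms.*$//
# swap: the field goes to hold, the original line comes back
x
# extract the query field
s/^.*?q=/ /;s/AUTH_TYPE:.*$//
# append it to the hold space, then copy both fields back
H
g
# join the fields with a tab (GNU sed understands \t here)
s/\n/\t/
' queries.csv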
Because your earlier script was somewhat more involved, I suggest using Awk instead. Here is a crude skeleton for doing the things you seem to want to do with your data.
awk '# Make output tab-delimited
BEGIN { OFS="\t" }
{ xyz_ms = $0; sub("^.*XYZ:", " ", xyz_ms); sub("ms.*$", "", xyz_ms);
  xyz_abc = $0; sub("^.*XYZ:", " ", xyz_abc); sub("ABC.*$", "", xyz_abc);
  # Note: the literal "?" must be escaped in Awk, where it is a quantifier
  q = $0; sub("^.*\\?q=", " ", q); sub("AUTH_TYPE:.*$", "", q);
  # ....
  # Demonstration of how to count something
  n = split($0, _, "&"); ampersand_count = n-1;
  # ...
  # Done: Now print
  print xyz_ms, xyz_abc, q, " " }' queries.csv
Notice how we collect stuff in variables and print only at the end. This is less crucial here than it would have been in your earlier shell script, though.
The big saving here is avoiding spawning a large number of subprocesses for each input line. Awk is also better optimized for doing this sort of processing quickly.
If Perl is more convenient for you, converting the entire script to Perl should produce similar benefits, and be somewhat more compatible with the sed-centric syntax you have already. Perl is bigger and sometimes slower than Awk, but in the grand scheme of things, not by much. If you really need to optimize, do both and measure.
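For illustration, a rough Perl counterpart of the loop might look like this (the substitutions mirror the question's sed expressions; treat it as a sketch, not a tested drop-in):
perl -ne '
    chomp;
    my $xyz = $_; $xyz =~ s/^.*XYZ:/ /; $xyz =~ s/ms.*$//;
    my $abc = $_; $abc =~ s/^.*XYZ:/ /; $abc =~ s/ABC.*$//;
    my $q   = $_; $q   =~ s/^.*\?q=/ /; $q   =~ s/AUTH_TYPE:.*$//;
    print join("\t", $xyz, $abc, $q), "\n";
' queries.csv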
Problems with your script:
The main problem: you append to a file in every statement. This means the file has to be opened and closed in every statement, which is extremely inefficient.
You hardcode the name of the output file in your script. This is bad practice. Your script will be much more versatile if it just writes its output to stdout; leave it to the caller to specify where to direct the output. That will also get rid of the previous problem.
bash is interpreted and not optimized for text manipulation: it is bound to be slow, and complex text filtering won't be very readable. Using awk instead will probably make the script more concise and more readable (once you know the language). However, if you don't know awk yet, I advise learning Perl instead, which is good at what awk is good at but is also a general-purpose language: it makes you far more flexible, allows even more readable code (those who complain about Perl being hard to read have never seen nontrivial shell scripts), and will probably be a lot faster, because perl compiles scripts before running them. If you'd rather invest your efforts into a more popular language than Perl, try Python.

How to capture a single quote when using Perl on the CLI?

Suppose I have a text file with content like below:
'Jack', is a boy
'Jenny', is a girl
...
...
...
I'd like to use Perl on the command line to capture only the names between pairs of single quotes
cat text| perl -ne 'print $1."\n" if/\'(\w+?)\'/'
The above command was what I ran, but it didn't work. It seems the "'" messed things up with the shell.
I know we have other options, like writing a perl script. But given my circumstances, I'd like to find a way to do this on the shell command line.
Please advise.
The shell has the interesting property of concatenating quoted strings. Or rather, '...' or "..." should not be considered strings, but modifiers for available escapes. The '...'-surrounded parts of a command have no escapes available. Outside of '...', a single quote can be passed as \'. Together with the concatenating property, we can embed a single quote like
$ perl -E'say "'\''";'
'
into the -e code. The first ' exits the no-escape zone, \' is our single quote, and ' re-enters the escapeless zone. What perl saw was
perl // argv[0]
-Esay "'"; // argv[1]
This would make your command
cat text| perl -ne 'print $1."\n" if/'\''(\w+?)'\''/'
(quotes don't need escaping in regexes), or
cat text| perl -ne "print \$1.qq(\n) if/'(\w+?)'/"
(using double quotes to surround the command, but using qq// for double quoted strings and escaping the $ sigil to avoid shell variable interpolation).
Here are some methods that do not require manually escaping the perl statement:
(Disclaimer: I'm not sure how robust these are – they haven't been tested extensively)
Cat-in-the-bag technique
perl -ne "$(cat)" text
You will be prompted for input. To terminate cat, press Ctrl-D.
One shortcoming of this: The perl statement is not reusable. This is addressed by the variation:
$pline=$(cat)
perl -ne "$pline" text
The bash builtin, read
Multiple lines:
read -rd'^[' pline
Single line:
read -r pline
Reads user input into the variable pline.
The meaning of the switches:
-r: stop read from interpreting backslashes (e.g. by default read interprets \w as w)
-d: determines what character ends the read command.
^[ is the character corresponding to Esc, you insert ^[ by pressing Ctrl-V then Esc.
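For example, with the single-line variant and the sample file from the question, the statement is typed once into pline and can then be reused (a quick usage sketch):
$ read -r pline
print $1."\n" if /'(\w+?)'/
$ perl -ne "$pline" text
Jack
Jenny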
Heredoc and script.
(You said no scripts, but this is quick and dirty, so might as well...)
cat << 'EOF' > scriptonite
print $1 . "\n" if /'(\w+)'/
EOF
then you simply
perl -n scriptonite text

How can I have a newline in a string in sh?

This
STR="Hello\nWorld"
echo $STR
produces as output
Hello\nWorld
instead of
Hello
World
What should I do to have a newline in a string?
Note: This question is not about echo.
I'm aware of echo -e, but I'm looking for a solution that allows passing a string (which includes a newline) as an argument to other commands that do not have a similar option to interpret \n's as newlines.
If you're using Bash, you can use backslash-escapes inside of a specially-quoted $'string'. For example, adding \n:
STR=$'Hello\nWorld'
echo "$STR" # quotes are required here!
Prints:
Hello
World
If you're using pretty much any other shell, just insert the newline as-is in the string:
STR='Hello
World'
Bash recognizes a number of other backslash escape sequences in the $'' string. Here is an excerpt from the Bash manual page:
Words of the form $'string' are treated specially. The word expands to
string, with backslash-escaped characters replaced as specified by the
ANSI C standard. Backslash escape sequences, if present, are decoded
as follows:
\a alert (bell)
\b backspace
\e
\E an escape character
\f form feed
\n new line
\r carriage return
\t horizontal tab
\v vertical tab
\\ backslash
\' single quote
\" double quote
\nnn the eight-bit character whose value is the octal value
nnn (one to three digits)
\xHH the eight-bit character whose value is the hexadecimal
value HH (one or two hex digits)
\cx a control-x character
The expanded result is single-quoted, as if the dollar sign had not
been present.
A double-quoted string preceded by a dollar sign ($"string") will cause
the string to be translated according to the current locale. If the
current locale is C or POSIX, the dollar sign is ignored. If the
string is translated and replaced, the replacement is double-quoted.
Echo is so nineties and so fraught with perils that its use should result in core dumps no less than 4GB. Seriously, echo's problems were the reason why the Unix Standardization process finally invented the printf utility, doing away with all the problems.
So to get a newline in a string, there are two ways:
# 1) Literal newline in an assignment.
FOO="hello
world"
# 2) Command substitution.
BAR=$(printf "hello\nworld\n") # Alternative; note: final newline is deleted
printf '<%s>\n' "$FOO"
printf '<%s>\n' "$BAR"
There! No SYSV vs BSD echo madness, everything gets neatly printed and fully portable support for C escape sequences. Everybody please use printf now for all your output needs and never look back.
What I did based on the other answers was
NEWLINE=$'\n'
my_var="__between eggs and bacon__"
echo "spam${NEWLINE}eggs${my_var}bacon${NEWLINE}knight"
# which outputs:
spam
eggs__between eggs and bacon__bacon
knight
I find the -e flag elegant and straightforward
bash$ STR="Hello\nWorld"
bash$ echo -e $STR
Hello
World
If the string is the output of another command, I just use quotes
indexes_diff=$(git diff index.yaml)
echo "$indexes_diff"
The problem isn't with the shell. The problem is actually with the echo command itself, and the lack of double quotes around the variable interpolation. You can try using echo -e, but that isn't supported on all platforms, which is one of the reasons printf is now recommended for portability.
You can also try and insert the newline directly into your shell script (if a script is what you're writing) so it looks like...
#!/bin/sh
echo "Hello
World"
#EOF
or equivalently
#!/bin/sh
string="Hello
World"
echo "$string" # note double quotes!
The only simple alternative is to actually type a new line in the variable:
$ STR='new
line'
$ printf '%s' "$STR"
new
line
Yes, that means writing Enter where needed in the code.
There are several equivalents to a new line character.
\n ### A common way to represent a new line character.
\012 ### Octal value of a new line character.
\x0A ### Hexadecimal value of a new line character.
But all those require "an interpretation" by some tool (POSIX printf):
echo -e "new\nline" ### on POSIX echo, `-e` is not required.
printf 'new\nline' ### Understood by POSIX printf.
printf 'new\012line' ### Valid in POSIX printf.
printf 'new\x0Aline' ### Not POSIX, but accepted by bash's printf.
printf '%b' 'new\0012line' ### Valid in POSIX printf.
And therefore, the tool is required to build a string with a new-line:
$ STR="$(printf 'new\nline')"
$ printf '%s' "$STR"
new
line
In some shells, the sequence $'...' is a special shell expansion.
Known to work in ksh93, bash and zsh:
$ STR=$'new\nline'
Of course, more complex solutions are also possible:
$ echo '6e65770a6c696e650a' | xxd -p -r
new
line
Or
$ echo "new line" | sed 's/ \+/\n/g'
new
line
A $ right before the single quotation marks in '...\n...' works, as shown below; with double quotation marks, however, it doesn't.
$ echo $'Hello\nWorld'
Hello
World
$ echo $"Hello\nWorld"
Hello\nWorld
Disclaimer: I first wrote this and then stumbled upon this question. I thought this solution wasn't yet posted, and saw that tlwhitec did post a similar answer. Still I'm posting this because I hope it's a useful and thorough explanation.
Short answer:
This seems quite a portable solution, as it works in quite a few shells (see the comment in the code below).
This way you can get a real newline into a variable.
The benefit of this solution is that you don't have to use literal newlines in your source code, so you can indent your code any way you want and the solution still works. This makes it robust. It's also portable.
# Robust way to put a real newline in a variable (bash, dash, ksh, zsh; indentation-resistant).
nl="$(printf '\nq')"
nl=${nl%q}
Longer answer:
Explanation of the above solution:
The newline would normally be lost due to command substitution, but to prevent that, we add a 'q' and remove it afterwards. (The reason for the double quotes is explained further below.)
We can prove that the variable contains an actual newline character (0x0A):
printf '%s' "$nl" | hexdump -C
00000000 0a |.|
00000001
(Note that the '%s' was needed; otherwise printf would translate a literal '\n' string into an actual 0x0A character, meaning we would prove nothing.)
Of course, instead of the solution proposed in this answer, one could use this as well (but...):
nl='
'
... but that's less robust and can be easily damaged by accidentally indenting the code, or by forgetting to outdent it afterwards, which makes it inconvenient to use in (indented) functions, whereas the earlier solution is robust.
Now, as for the double quotes:
The reason for the double quotes " surrounding the command substitution, as in nl="$(printf '\nq')", is that you can then even prefix the variable assignment with the local keyword or builtin (such as in functions), and it will still work in all shells. Without them, the dash shell would have trouble: dash would lose the 'q' and you'd end up with an empty 'nl' variable (again, due to command substitution).
That issue is better illustrated with another example:
dash_trouble_example() {
    e=$(echo hello world) # Not using 'local'.
    echo "$e" # Fine. Outputs 'hello world' in all shells.

    local e=$(echo hello world) # But now, when using 'local' without double quotes ...:
    echo "$e" # ... oops, outputs just 'hello' in dash,
              # ... but 'hello world' in bash and zsh.

    local f="$(echo hello world)" # Finally, using 'local' and surrounding with double quotes.
    echo "$f" # Solved. Outputs 'hello world' in dash, zsh, and bash.

    # So back to our newline example, if we want to use 'local', we need
    # double quotes to surround the command substitution:
    # (If we didn't use double quotes here, then in dash the 'nl' variable
    # would be empty.)
    local nl="$(printf '\nq')"
    nl=${nl%q}
}
Practical example of the above solution:
# Parsing lines in a for loop by setting IFS to a real newline character:
nl="$(printf '\nq')"
nl=${nl%q}
IFS=$nl
for i in $(printf '%b' 'this is line 1\nthis is line 2'); do
echo "i=$i"
done
# Desired output:
# i=this is line 1
# i=this is line 2
# Exercise:
# Try running this example without the IFS=$nl assignment, and predict the outcome.
I'm no bash expert, but this one worked for me:
STR1="Hello"
STR2="World"
NEWSTR=$(cat << EOF
$STR1
$STR2
EOF
)
echo "$NEWSTR"
I found this easier for formatting the text.
Those picky ones who need just the newline and despise multiline code that breaks indentation could do:
IFS="$(printf '\nx')"
IFS="${IFS%x}"
Bash (and likely other shells) gobbles all the trailing newlines after command substitution, so you need to end the printf string with a non-newline character and delete it afterwards. This can also easily become a one-liner.
IFS="$(printf '\nx')" IFS="${IFS%x}"
I know this is two actions instead of one, but my indentation and portability OCD is at peace now :) I originally developed this to be able to split newline-only separated output, and I ended up using a modification that uses \r as the terminating character. That makes the newline splitting work even for DOS output ending with \r\n.
IFS="$(printf '\n\r')"
On my system (Ubuntu 17.10) your example just works as desired, both when typed from the command line (into sh) and when executed as a sh script:
[bash]§ sh
$ STR="Hello\nWorld"
$ echo $STR
Hello
World
$ exit
[bash]§ echo "STR=\"Hello\nWorld\"
> echo \$STR" > test-str.sh
[bash]§ cat test-str.sh
STR="Hello\nWorld"
echo $STR
[bash]§ sh test-str.sh
Hello
World
I guess this answers your question: it just works. (The likely explanation: on Ubuntu, sh is dash, whose builtin echo interprets backslash escapes such as \n by default.)
However, I noticed that this same script would behave differently when executed with bash and would print out Hello\nWorld instead:
[bash]§ bash test-str.sh
Hello\nWorld
I've managed to get the desired output with bash as follows:
[bash]§ STR="Hello
> World"
[bash]§ echo "$STR"
Note the double quotes around $STR. This behaves identically if saved and run as a bash script.
The following also gives the desired output:
[bash]§ echo "Hello
> World"
I wasn't really happy with any of the options here. This is what worked for me.
str=$(printf "%s" "first line")
str=$(printf "$str\n%s" "another line")
str=$(printf "$str\n%s" "and another line")
This isn't ideal, but I had written a lot of code and defined strings in a way similar to the method used in the question. The accepted solution required me to refactor a lot of the code, so instead I replaced every \n with "$'\n'", and this worked for me.
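Concretely, that refactoring looks something like this (a sketch):
# before: the \n is stored literally, so most commands print it as-is
msg="line1\nline2"
# after: concatenate double-quoted parts with ANSI-C quoted newlines
msg="line1"$'\n'"line2"
printf '%s\n' "$msg"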

What is the correct usage of (nested | double | simple) quotes

I'm sure this question may seem foolish to some of you, but I'm here to learn.
Are these assumptions true for most languages?
EDIT : OK, let's assume I'm talking about Perl/Bash scripting.
'Single quotes'
=> No interpretation at all (e.g. '$' or any metacharacter will be considered as a character and will be printed on screen)
"Double quotes"
=> Variable interpretation
To be more precise about my concerns, I'm writing some shell scripts (in which quotes can sometimes be a big hassle), and wrote this line:
CODIR=`pwd | sed -e "s/$MODNAME//"`
If I had used single quotes in my sed, my pattern would have been '$MODNAME', right? (And not the actual value of $MODNAME, which is 'alpha' in this particular case.)
Another problem I had, with an awk inside an echo:
USAGE=`echo -ne "\
Usage : ./\`basename $0\` [-hnvV]\n\
\`ls -l ${MODPATH}/reference/ | awk -F " " '$8 ~ /\w+/{print "> ",$8}'\`"`
I spent some time debugging that one. I came to the conclusion that the backticks were escaped so that the interpreter doesn't "split" the command (and stop right before «basename»). In the awk command, '$8' is successfully interpreted by awk, thus not by the shell. What if I wanted to use a shell variable? Would I write awk -F "\"$MY_SHELL_VAR\""? Because $MY_SHELL_VAR, as is, will be interpreted by awk, won't it?
Don't hesitate to add any information about quoting or backticks!
Thank you! :)
It varies massively by language. For example, in the C/Java/C++/C# etc family, you can't use single quotes for a string at all - they're only for single characters.
I think it's far better to learn the rules properly for the languages you're actually interested in than to try to generalise.
Are these assumptions true for most languages?
Answer: No
In bash scripting, backticks are deprecated in favor of $() in part because it is non-obvious how nested quotes and escaping are supposed to work. You may also want to take a look at Bash Pitfalls.
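A quick illustration of why $() nests more cleanly than backticks (a sketch):
# $() nests without any extra escaping:
outer=$(basename "$(dirname "/tmp/some/path")")   # -> some
# the backtick version needs the inner backticks escaped:
outer=`basename \`dirname /tmp/some/path\``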
It's definitely not the same for all languages. In Python, for example, single and double quotes are interchangeable. The only difference is that you can include single quotes within a double-quoted string without escaping them and vice versa ("How's it going?").
Also, there are triple-quoted strings that can span multiple lines.
In Perl, you also have q() and qq() to help you in nested quoting situations:
my $x = q(a string with 'single quotes');
my $y = qq(an $interpreted string with "double quotes");
These certainly will help you avoid "\"needlessly\"" '\'escaping\'' internal quotes.
Yes, something like awk -F "\"$MY_SHELL_VAR\"" will work; however, in this case you wouldn't be able to use variables in awk, since they would be interpreted by the shell. So the way to go is something like this (I will use a command simpler than yours, if you don't mind :) ):
awk -F " " '$8 ~ /\w+/{print "> ",$8, '$SOME_SHELL_VAR'}'
Note the single quotes terminating and restarting.
The trickiest part, usually, is to pass a quote in the argument to the command. In this case you need to terminate the single quote, add an escaped quote character, and start the quote again, like this:
awk '$1 ~ /'\''/{print}'
Note that a single quote can't be escaped inside single quotes, since the "\" won't be treated as an escape character.
This is probably not related directly to your question, but still useful.
I don't know about perl, but for bash you don't need to backslash the newline.
As for quotes, I have a (very personal) pattern that I call the "five quotes" pattern. It helps to put one quote in a string enclosed by the same kind of quotes.
For instance:
doublequoted="some things "'"'"quoted"'"'" and some not"
simplequoted='again '"'"'quote this'"'"' but not that'
Note that you can freely append strings with different kinds of quotes, which is useful when you want the shell to interprete some vars but not some others:
awk -F " " '$8 ~ /\w+/{print "> ",$8, '"$SOME_SHELL_VAR"'}'
Also, I don't use the backtick anymore but the $(...) pattern, which is more legible and can be nested.
USAGE=$(echo -ne "
Usage : ./$(basename $0) [-hnvV]\n
$(ls -l ${MODPATH}/reference/ | awk -F " " '$8 ~ /\w+/{print "> ",$8}')")
In perl, double quoted strings will have their variables expanded.
If you write, for instance:
my $email = "foo@bar.com";
perl will try to expand @bar. If you use strict, you'll see a complaint about the array @bar not existing. If you don't, you'll just see weird behavior.
So it's better to write:
my $email = 'foo@bar.com';
For these types of reason, my advice is to always use single quote for strings, unless you know that you need variable expansion.