Brace expansion as flag argument pairs - fish

I'm trying to use top in conjunction with pgrep to filter the processes that are being shown. What I have working so far is:
top -pid (pgrep dart)
Which works great for getting a single process to show in top's interactive view.
However, the macOS version of top has only one way of listing multiple processes: you have to repeat -pid [process id] for each one, e.g.:
top -pid 354 -pid 236 -pid 287
My first thought was that I could use brace expansion and command substitution to achieve this, and I tried:
top "-pid "{(pgrep dart)}
But I get invalid option or syntax: -pid {33978}. Even when I manually add in pids it doesn't work:
top "-pid "{45, 23}
invalid option or syntax: -pid 45
Is it possible to achieve what I'm trying to do with fish, i.e., insert flags into a command via a combination of command substitution and brace expansion?

I'm thinking we can come up with something more concise, but at least:
top (string split " " (printf " -p %s" (pgrep dart)))
seems to work for me on fish on Ubuntu/WSL as a first attempt. That should translate to:
top (string split " " (printf " -pid %s" (pgrep dart)))
on macOS, but from the comments, it sounds like the -pid may be getting passed to printf first, and then to string split, and neither accepts it as an option, of course. On WSL/Ubuntu, I ran into that problem too, but was able to resolve it by adding the space at the beginning of the string. However, that appears not to work on macOS. Your solution is really the more canonical one: use -- to stop those commands from interpreting the -pid as an option intended for them.
So, top (string split -n " " -- (printf -- "-pid %s " (pgrep dart))) works for macOS top, and the only change needed for Linux/Ubuntu is to replace the -pid with -p, as in top (string split -n " " -- (printf -- "-p %s " (pgrep dart))).
Which works, but is quite verbose. To simplify, you came up with:
top (string split " " -- "-pid "(pgrep dart)) # macOS
top (string split " " -- "-p "(pgrep dart)) # Linux
Brilliant! I've honestly never used Cartesian Product in fish, but that's clearly what you were going for in the first place with your brace-expansion attempts.
Using that example, I was then able to improve it (at least on Linux) to:
top "-p "(pgrep dart)
But I have a feeling that the equivalent top "-pid "(pgrep dart) isn't going to work on macOS, since top "-pid "{45, 23} didn't work for you either. That same construct (top "-p "{1,11}) works for me, for some reason.

You need to compose an expansion that results in every term of the expanded expression, when quoted, yielding a valid argument for top.
"-pid "{45, 23} looks, at first glance, as though it ought to be viable:
> echo "-pid "{45, 23}
-pid 45 -pid 23
So, what's the problem? If we delimit the original expression and expand again:
> echo ["-pid "{45, 23}]
[-pid 45] [-pid 23]
Thus, the command-line you're generating for top using this expansion would be parsed like:
top "-pid 45" "-pid 23"
Instead, observe the following:
> p={45,23} echo [{-pid,$p}]
[-pid] [45] [-pid] [23]
As many things as fish does right, it does just as many wrong, so for no good reason whatsoever, the following isn't viable:
> echo [{-pid,{45,23}}]
[-pid] [45] [23]
Therefore, store your list of process IDs in a variable, and then use a brace expansion:
p=(pgrep dart) top {-pid,$p}
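The difference between one argument that contains a space and two separate arguments is not specific to fish. As a rough illustration in POSIX sh (not fish), here is a minimal sketch; show_args is a hypothetical helper introduced only for this example, and it brackets each argument a command would receive:

```shell
# show_args is a hypothetical helper: it prints every argument it receives
# wrapped in brackets, so word boundaries become visible.
show_args() {
    out=
    for arg in "$@"; do
        out="$out${out:+ }[$arg]"
    done
    printf '%s\n' "$out"
}

# One quoted word per pair: a command would see "-pid 45" as a single argument.
show_args "-pid 45" "-pid 23"

# Flag and value as separate words: the shape top expects.
show_args -pid 45 -pid 23
```

The first call mirrors what the "-pid "{45, 23} expansion produces, the second what {-pid,$p} produces.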


Sed to replace variable length string between 2 known patterns

I'd like to be able to replace a string between 2 known patterns. The catch is that I want to replace it by a string of the same length that is composed only of 'x'.
Let's say I have a file containing:
Hello.StringToBeReplaced.SecondString
Hello.ShortString.SecondString
I'd like the output to be like this:
Hello.xxxxxxxxxxxxxxxxxx.SecondString
Hello.xxxxxxxxxxx.SecondString
Using sed loops
You can use sed, though the thinking required is not wholly obvious:
sed ':a;s/^\(Hello\.x*\)[^x]\(.*\.SecondString\)/\1x\2/;t a'
This is for GNU sed; BSD (Mac OS X) sed and other versions may be fussier and require:
sed -e ':a' -e 's/^\(Hello\.x*\)[^x]\(.*\.SecondString\)/\1x\2/' -e 't a'
The logic is identical in both:
Create a label a
Substitute the lead string and a sequence of x's (capture 1), followed by a non-x, and arbitrary other data plus the second string (capture 2), and replace it with the contents of capture 1, an x and the content of capture 2.
If the s/// command made a change, go back to the label a.
It stops substituting when there are no non-x's between the two marker strings.
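To watch the loop at work, here is a minimal sketch that drops the label and branch and instead re-runs a single substitution from a shell loop, printing each intermediate line. The names step and trace_mask are hypothetical helpers for this illustration, and the short input is made up:

```shell
# One pass of the substitution from the answer, without the label and branch.
step() {
    sed 's/^\(Hello\.x*\)[^x]\(.*\.SecondString\)/\1x\2/'
}

# trace_mask is a hypothetical helper: it re-applies step until the line
# stops changing, printing each intermediate result.
trace_mask() {
    line=$1
    while :; do
        next=$(printf '%s\n' "$line" | step)
        [ "$next" = "$line" ] && break
        line=$next
        printf '%s\n' "$line"
    done
}

trace_mask 'Hello.abc.SecondString'
```

Each pass replaces exactly one non-x character, which is why the sed version needs the t a branch to keep going.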
Two tweaks to the regex allow the code to recognize two copies of the pattern on a single line. Lose the ^ that anchors the match to the beginning of the line, and change .* to [^.]* (so that the regex is not quite so greedy):
$ echo Hello.StringToBeReplaced.SecondString Hello.StringToBeReplaced.SecondString |
> sed ':a;s/\(Hello\.x*\)[^x]\([^.]*\.SecondString\)/\1x\2/;t a'
Hello.xxxxxxxxxxxxxxxxxx.SecondString Hello.xxxxxxxxxxxxxxxxxx.SecondString
$
Using the hold space
hek2mgl suggests an alternative approach in sed using the hold space. This can be implemented using:
$ echo Hello.StringToBeReplaced.SecondString |
> sed 's/^\(Hello\.\)\([^.]\{1,\}\)\(\.SecondString\)/\1#\3##\2/
> h
> s/.*##//
> s/./x/g
> G
> s/\(x*\)\n\([^#]*\)#\([^#]*\)##.*/\2\1\3/
> '
Hello.xxxxxxxxxxxxxxxxxx.SecondString
$
This script is not as robust as the looping version but works OK as written when each line matches the lead-middle-tail pattern. The steps are:
The first s/// splits the line into three sections (the first marker, the bit to be mangled, and the second marker) and reorganizes them so that the two markers are separated by #, followed by ## and the bit to be mangled.
h copies the result to the hold space.
s/.*##// removes everything up to and including the ##.
s/./x/g replaces each character in the bit to be mangled with x.
G appends the material in the hold space after the x's in the pattern space, with a newline separating them.
The final s/// captures the x's, the lead marker, and the tail marker, ignoring the newline, the # and ## plus trailing material, and reassembles them as lead marker, x's, and tail marker.
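If the hold space is unfamiliar, a stripped-down sketch of just the h and G commands may help. The function name keep_and_mask and the input abc are made up for illustration:

```shell
# keep_and_mask is a hypothetical helper showing h and G in isolation:
#   h         copies the pattern space (the current line) to the hold space
#   s/./x/g   overwrites every character of the pattern space with x
#   G         appends a newline plus the hold space to the pattern space
keep_and_mask() {
    sed -e 'h' -e 's/./x/g' -e 'G'
}

printf 'abc\n' | keep_and_mask
```

The output is two lines, the x's first and the saved original second, which is the shape the final s/// in the script above picks apart.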
To make it robust, you'd recognize the pattern and then group the commands shown inside { and } so that they're only executed when the pattern is recognized:
sed '/^\(Hello\.\)\([^.]\{1,\}\)\(\.SecondString\)/{
s/^\(Hello\.\)\([^.]\{1,\}\)\(\.SecondString\)/\1#\3##\2/
h
s/.*##//
s/./x/g
G
s/\(x*\)\n\([^#]*\)#\([^#]*\)##.*/\2\1\3/
}'
Adjust to suit your needs...
Adjusting to suit your needs
[I tried one of your solutions and it worked fine.]
However when I try to replace the 'hello' by my real string (which is
'1.2.840.') and my second string (which is simply a dot '.'), things stop
working. I guess all these dots confuse the sed command.
What I try to achieve is transform this '1.2.840.10008.' to
'1.2.840.xxxxx.'
And this pattern happens several times in my file with variable number
of characters to be replaced between the '1.2.840.' and the next dot '.'
There are times when it is important to get your question close enough to the real scenario, and this may be one such. Dot is a metacharacter in sed regular expressions (and in most other dialects of regular expression, shell globbing being the notable exception). If the 'bit to be mangled' is always digits, then we can tighten up the regular expressions, though actually (when I look at the code ahead) the tightening really isn't imposing much in the way of a restriction.
Pretty much any solution using regular expressions is a balancing act that has to pit convenience and abbreviation against reliability and precision.
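A quick sketch of why the dots matter, using a made-up input line that contains no dots at all:

```shell
# Unescaped: each . matches any single character, so 1a2b840c matches too.
printf '1a2b840c\n' | sed 's/1.2.840./MATCH/'

# Escaped: \. matches only a literal dot, so the line is left untouched.
printf '1a2b840c\n' | sed 's/1\.2\.840\./MATCH/'
```

The first command rewrites the line; the second prints it unchanged.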
Revised code plus data
cat <<EOF |
transform this '1.2.840.10008.' to '1.2.840.xxxxx.'
OK, and hence 1.2.840.21. and 1.2.840.20992. should lose the 21 and 20992.
EOF
sed ':a;s/\(1\.2\.840\.x*\)[^x.]\([^.]*\.\)/\1x\2/;t a'
Example output:
transform this '1.2.840.xxxxx.' to '1.2.840.xxxxx.'
OK, and hence 1.2.840.xx. and 1.2.840.xxxxx. should lose the 21 and 20992.
The changes in the script are:
sed ':a;s/\(1\.2\.840\.x*\)[^x.]\([^.]*\.\)/\1x\2/;t a'
Add 1\.2\.840\. as the start pattern.
Revise the 'character to replace' expression to 'not x or .'.
Use just \. as the tail pattern.
You could replace the [^x.] with [0-9] if you're sure you only want digits matched, in which case you won't have to worry about spaces as discussed below.
You may decide you don't want spaces to be matched so that a casual comment like:
The net prefix is 1.2.840. And there are other prefixes too.
does not end up as:
The net prefix is 1.2.840.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.
In which case, you probably need to use:
sed ':a;s/\(1\.2\.840\.x*\)[^x. ]\([^ .]*\.\)/\1x\2/;t a'
And so the changes continue until you've got something precise enough to do what you want without doing anything you don't want on your current data set. Writing bullet-proof regular expressions requires a precise specification of what you want matched, and can be quite hard.
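For completeness, here is a sketch of the digits-only variant mentioned above, written in the multiple -e form so it should not depend on GNU sed. The function name mask_oid is hypothetical:

```shell
# mask_oid is a hypothetical wrapper around the looping substitution,
# with [0-9] as the 'character to replace' expression.
mask_oid() {
    sed -e ':a' -e 's/\(1\.2\.840\.x*\)[0-9]\([^.]*\.\)/\1x\2/' -e 't a'
}

printf '%s\n' "transform this '1.2.840.10008.'" \
    'The net prefix is 1.2.840. And there are other prefixes too.' | mask_oid
```

The first line has its digits masked, while the casual comment on the second line is left alone because the character after the prefix is a space, not a digit.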
I'd choose perl:
perl -pe 's/(?<=Hello\.)(.*?)(?=\.SecondString)/ "x" x length($1) /e' file
This awk should do:
awk -F. '{for (i=1;i<=length($2);i++) a=a"x";$2=a;a=""}1' OFS="." file
Hello.xxxxxxxxxxxxxxxxxx.SecondString
Hello.xxxxxxxxxxx.SecondString
Bash Works Too
While the perl, sed and awk solutions are probably the better choice, a Bash solution is not that difficult (just longer). Bash has good character-by-character handling abilities as well:
#!/bin/bash
rep=0    # replace flag
skip=0   # delay reset flag
while read -r line; do                    # read each line
    for ((i=0; i<${#line}; i++)); do      # for each character in the line
        c=${line:i:1}                     # current character (quoted below so spaces survive)
        # if '.' and replace on, turn off and set skip
        [ "$c" = '.' ] && [ "$rep" -eq 1 ] && { rep=0; skip=1; }
        # print char or "x" depending on replace flag
        if [ "$rep" -eq 0 ]; then printf '%s' "$c"; else printf 'x'; fi
        # if '.' and replace off
        if [ "$c" = '.' ] && [ "$rep" -eq 0 ]; then
            # if skip, turn skip off, else set replace on
            [ "$skip" -eq 1 ] && skip=0 || rep=1
        fi
    done
    printf "\n"
done
exit 0
Input
$ cat dat/replacefile.txt
Hello.StringToBeReplaced.SecondString
Hello.ShortString.SecondString
Output
$ bash replacedot.sh < dat/replacefile.txt
Hello.xxxxxxxxxxxxxxxxxx.SecondString
Hello.xxxxxxxxxxx.SecondString
For the sake of your sanity, just use awk:
$ awk 'BEGIN{FS=OFS="."} {gsub(/./,"x",$2)} 1' file
Hello.xxxxxxxxxxxxxxxxxx.SecondString
Hello.xxxxxxxxxxx.SecondString

How to capture single quote when using Perl in CLI?

Suppose I have a text file with content like below:
'Jack', is a boy
'Jenny', is a girl
...
...
...
I'd like to use perl on the command line to capture only the names between pairs of single quotes:
cat text| perl -ne 'print $1."\n" if/\'(\w+?)\'/'
The command above is what I ran, but it didn't work. It seems the "'" confuses the shell.
I know we have other options like writing a perl script. But given my circumstances, I'd like to find a way to fulfill this in Shell command line.
Please advise.
The shell has the interesting property of concatenating quoted strings. Or rather, '...' or "..." should not be considered strings, but modifiers for available escapes. The '...'-surrounded parts of a command have no escapes available. Outside of '...', a single quote can be passed as \'. Together with the concatenating property, we can embed a single quote like
$ perl -E'say "'\''";'
'
into the -e code. The first ' exits the no-escape zone, \' is our single quote, and ' re-enters the escapeless zone. What perl saw was
perl // argv[0]
-Esay "'"; // argv[1]
This would make your command
cat text| perl -ne 'print $1."\n" if/'\''(\w+?)'\''/'
(quotes don't need escaping in regexes), or
cat text| perl -ne "print \$1.qq(\n) if/'(\w+?)'/"
(using double quotes to surround the command, but using qq// for double quoted strings and escaping the $ sigil to avoid shell variable interpolation).
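The concatenation trick can be checked without perl at all. A minimal sketch, where the variable names word and regex are made up for illustration:

```shell
# Three adjacent pieces form one word:  'it'  \'  's'
word='it'\''s'
printf '%s\n' "$word"

# The same trick applied to the regex from the question.
regex='/'\''(\w+?)'\''/'
printf '%s\n' "$regex"
```

printf '%s\n' shows exactly what a program such as perl would receive as its argument.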
Here are some methods that do not require manually escaping the perl statement:
(Disclaimer: I'm not sure how robust these are – they haven't been tested extensively)
Cat-in-the-bag technique
perl -ne "$(cat)" text
You will be prompted for input. To terminate cat, press Ctrl-D.
One shortcoming of this: The perl statement is not reusable. This is addressed by the variation:
pline=$(cat)
perl -ne "$pline" text
The bash builtin, read
Multiple lines:
read -rd'^[' pline
Single line:
read -r pline
Reads user input into the variable pline.
The meaning of the switches:
-r: stop read from interpreting backslashes (e.g. by default read interprets \w as w)
-d: determines what character ends the read command.
^[ is the character corresponding to Esc, you insert ^[ by pressing Ctrl-V then Esc.
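A small sketch of what -r changes, feeding the same text to read with and without it. The helper names read_plain and read_raw are hypothetical:

```shell
# Without -r, read treats the backslash as an escape character and drops it.
read_plain() { read line; printf '%s\n' "$line"; }

# With -r, the backslash is kept as typed.
read_raw() { read -r line; printf '%s\n' "$line"; }

printf '%s\n' '\w+' | read_plain
printf '%s\n' '\w+' | read_raw
```

For a perl statement full of \w and \n, losing the backslashes would silently change the regex, hence the -r.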
Heredoc and script.
(You said no scripts, but this is quick and dirty, so might as well...)
cat << 'EOF' > scriptonite
print $1 . "\n" if /'(\w+)'/
EOF
then you simply
perl -n scriptonite text

Manipulate ampersand in sed

Is it possible to manipulate the ampersand in sed? I want to add +1 to all numbers in a file. Something like this:
sed -i "s/[0-9]\{1,2\}/$(expr & + 1)/g" filename
EDIT: Today I created a loop using grep and sed that does the job needed. But the question remains open if anyone knows of a way of manipulating the ampersand, since this is not the first time I wanted to run commands on the replacement string, and couldn't.
You may use the e modifier (a GNU sed extension) to achieve this:
$ cat test.txt
1
2
$ sed 's/^[0-9]\{1,2\}$/expr & + 1/e' test.txt
2
3
In this case you construct a command in the replacement part; it is executed, and its output is used as the replacement.
sed will need to thunk out to some shell command (with '!') on each line to do that.
Here you think you are calling sed which then calls back to the shell to evaluate $(expr & + 1) for each line, but actually it isn't. $(expr & + 1) will just get statically evaluated (once) by the outer shell, and cause an error, since '&' is not at that point a number.
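A tiny sketch of that static evaluation, using a harmless echo in place of expr. The variable name script is made up:

```shell
# The shell expands $(...) inside double quotes before sed is ever started,
# so sed only sees the already-substituted text.
script="s/[0-9]/$(echo EXPANDED)/g"
printf '%s\n' "$script"
```

Whatever the command substitution prints is baked into the sed script once; sed never gets a chance to run it per match.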
To actually do this, either:
hardcode all ten cases of last digit 0..9, as per this example in sed documentation
Use a sed-command which starts with '1,$!' to invoke the shell on each line, and perform the increment there, with expr, awk, perl or whatever.
FOOTNOTE: I never knew about the /e modifier, which php-coder shows.
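If the goal is only to increment the numbers, rather than to run arbitrary commands on the match, a plain awk loop avoids sed altogether. This is a sketch under that assumption, not the /e technique; incr_numbers is a hypothetical name:

```shell
# incr_numbers is a hypothetical helper: it adds 1 to every run of digits.
incr_numbers() {
    awk '{
        out = ""
        # Peel off one number at a time, left to right.
        while (match($0, /[0-9]+/)) {
            n = substr($0, RSTART, RLENGTH) + 1
            out = out substr($0, 1, RSTART - 1) n
            $0 = substr($0, RSTART + RLENGTH)
        }
        print out $0
    }'
}

printf 'hdf 4 fs 88\n5 22 sdf lsd 6\n' | incr_numbers
```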
Great question. smci answered first and was spot on about shells.
In case you want to solve this problem in general, here is (for fun) a Ruby solution embedded in an example:
echo "hdf 4 fs 88\n5 22 sdf lsd 6" | ruby -e 'ARGF.each {|line| puts line.gsub(/(\d+)/) {|n| n.to_i+1}}'
The output should be
hdf 5 fs 89\n6 23 sdf lsd 7

How to reformat a source file to go from 2 space indentations to 3?

This question is nearly identical to this question except that I have to go to three spaces (company coding guidelines) rather than four and the accepted solution will only double the matched pattern. Here was my first attempt:
:%s/^\(\s\s\)\+/\1 /gc
But this does not work because four spaces get replaced by three. So I think that what I need is some way to get the count of how many times the pattern matched "+" and use that number to create the other side of the substitution but I feel this functionality is probably not available in Vim's regex (Let me know if you think it might be possible).
I also tried doing the substitution manually by replacing the largest indents first and then the next smaller indent until I got it all converted but this was hard to keep track of the spaces:
:%s/^ \(\S\)/ \1/gc
I could send it through Perl as it seems like Perl might have the ability to do it with its Extended Patterns. But I could not get it to work with my version of Perl. Here was my attempt with trying to count a's:
:%!perl -pe 'm<(?{ $cnt = 0 })(a(?{ local $cnt = $cnt + 1; }))*aaaa(?{ $res = $cnt })>x; print $res'
My last resort will be to write a Perl script to do the conversion but I was hoping for a more general solution in Vim so that I could reuse the idea to solve other issues in the future.
Let vim do it for you?
:set sw=3<CR>
gg=G
The first command sets the shiftwidth option, which is how much you indent by. The second line says: go to the top of the file (gg), and reindent (=) until the end of the file (G).
Of course, this depends on vim having a good formatter for the language you're using. Something might get messed up if not.
Regexp way... Safer, but less understandable:
:%s#^\(\s\s\)\+#\=repeat(' ',strlen(submatch(0))*3/2)#g
(I had to do some experimentation.)
Two points:
If the replacement starts with \=, it is evaluated as an expression.
You can use many things instead of /, so / is available for division.
The perl version you asked for...
From the command line (edits in-place, no backup):
bash$ perl -pi -e 's{^((?:  )+)}{"   " x (length($1)/2)}e' YOUR_FILE
(in-place, original backed up to "YOUR_FILE.bak"):
bash$ perl -pi.bak -e 's{^((?:  )+)}{"   " x (length($1)/2)}e' YOUR_FILE
From vim while editing YOUR_FILE:
:%!perl -pe 's{^((?:  )+)}{"   " x (length($1)/2)}e'
The regex matches the beginning of the line, followed by (the captured set of) one or more "two space" groups. The substitution pattern is a perl expression (hence the 'e' modifier) which counts the number of "two space" groups that were captured and creates a string of that same number of "three space" groups. If an "extra" space was present in the original it is preserved after the substitution. So if you had three spaces before, you'll have four after, five before will turn into seven after, etc.
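The same arithmetic can be sketched with awk, in case perl is not at hand. This is an illustration of the rule described above (three spaces per two-space group, leftover space preserved); reindent is a hypothetical name:

```shell
# reindent is a hypothetical helper that rewrites leading indentation.
reindent() {
    awk '{
        # Count the leading spaces.
        n = 0
        while (substr($0, n + 1, 1) == " ") n++
        # Three spaces per pair, plus one if a single space is left over.
        width = int(n / 2) * 3 + n % 2
        pad = ""
        for (i = 0; i < width; i++) pad = pad " "
        print pad substr($0, n + 1)
    }'
}

printf '  a\n    b\n   c\nd\n' | reindent
```

From vim, this could presumably be used as a filter in the same way as the perl one-liner, provided the function is available as a command in the shell vim invokes.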

How can I have a newline in a string in sh?

This
STR="Hello\nWorld"
echo $STR
produces as output
Hello\nWorld
instead of
Hello
World
What should I do to have a newline in a string?
Note: This question is not about echo.
I'm aware of echo -e, but I'm looking for a solution that allows passing a string (which includes a newline) as an argument to other commands that do not have a similar option to interpret \n's as newlines.
If you're using Bash, you can use backslash-escapes inside of a specially-quoted $'string'. For example, adding \n:
STR=$'Hello\nWorld'
echo "$STR" # quotes are required here!
Prints:
Hello
World
If you're using pretty much any other shell, just insert the newline as-is in the string:
STR='Hello
World'
Bash recognizes a number of other backslash escape sequences in the $'' string. Here is an excerpt from the Bash manual page:
Words of the form $'string' are treated specially. The word expands to
string, with backslash-escaped characters replaced as specified by the
ANSI C standard. Backslash escape sequences, if present, are decoded
as follows:
\a alert (bell)
\b backspace
\e
\E an escape character
\f form feed
\n new line
\r carriage return
\t horizontal tab
\v vertical tab
\\ backslash
\' single quote
\" double quote
\nnn the eight-bit character whose value is the octal value
nnn (one to three digits)
\xHH the eight-bit character whose value is the hexadecimal
value HH (one or two hex digits)
\cx a control-x character
The expanded result is single-quoted, as if the dollar sign had not
been present.
A double-quoted string preceded by a dollar sign ($"string") will cause
the string to be translated according to the current locale. If the
current locale is C or POSIX, the dollar sign is ignored. If the
string is translated and replaced, the replacement is double-quoted.
Echo is so nineties and so fraught with perils that its use should result in core dumps no less than 4GB. Seriously, echo's problems were the reason why the Unix Standardization process finally invented the printf utility, doing away with all the problems.
So to get a newline in a string, there are two ways:
# 1) Literal newline in an assignment.
FOO="hello
world"
# 2) Command substitution.
BAR=$(printf "hello\nworld\n") # Alternative; note: final newline is deleted
printf '<%s>\n' "$FOO"
printf '<%s>\n' "$BAR"
There! No SYSV vs BSD echo madness, everything gets neatly printed and fully portable support for C escape sequences. Everybody please use printf now for all your output needs and never look back.
What I did based on the other answers was
NEWLINE=$'\n'
my_var="__between eggs and bacon__"
echo "spam${NEWLINE}eggs${my_var}bacon${NEWLINE}knight"
# which outputs:
spam
eggs__between eggs and bacon__bacon
knight
I find the -e flag elegant and straightforward
bash$ STR="Hello\nWorld"
bash$ echo -e $STR
Hello
World
If the string is the output of another command, I just use quotes
indexes_diff=$(git diff index.yaml)
echo "$indexes_diff"
The problem isn't with the shell. The problem is actually with the echo command itself, and the lack of double quotes around the variable interpolation. You can try using echo -e, but that isn't supported on all platforms, which is one of the reasons printf is now recommended for portability.
You can also try and insert the newline directly into your shell script (if a script is what you're writing) so it looks like...
#!/bin/sh
echo "Hello
World"
#EOF
or equivalently
#!/bin/sh
string="Hello
World"
echo "$string" # note double quotes!
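A short sketch of why the double quotes matter once the variable holds a real newline:

```shell
string="Hello
World"

# Unquoted: the shell splits the value on the newline and echo joins the
# resulting words with a single space.
echo $string

# Quoted: the newline reaches the command intact.
printf '%s\n' "$string"
```

The first command prints one line, the second prints two.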
The only simple alternative is to actually type a new line in the variable:
$ STR='new
line'
$ printf '%s' "$STR"
new
line
Yes, that means writing Enter where needed in the code.
There are several equivalents to a new line character.
\n ### A common way to represent a new line character.
\012 ### Octal value of a new line character.
\x0A ### Hexadecimal value of a new line character.
But all those require "an interpretation" by some tool (POSIX printf):
echo -e "new\nline" ### on POSIX echo, `-e` is not required.
printf 'new\nline' ### Understood by POSIX printf.
printf 'new\012line' ### Valid in POSIX printf.
printf 'new\x0Aline'
printf '%b' 'new\0012line' ### Valid in POSIX printf.
And therefore, the tool is required to build a string with a new-line:
$ STR="$(printf 'new\nline')"
$ printf '%s' "$STR"
new
line
In some shells, the sequence $' is a special shell expansion.
Known to work in ksh93, bash and zsh:
$ STR=$'new\nline'
Of course, more complex solutions are also possible:
$ echo '6e65770a6c696e650a' | xxd -p -r
new
line
Or
$ echo "new line" | sed 's/ \+/\n/g'
new
line
Put a $ right before single quotation marks, as in $'...\n...' below. Double quotation marks don't work.
$ echo $'Hello\nWorld'
Hello
World
$ echo $"Hello\nWorld"
Hello\nWorld
Disclaimer: I first wrote this and then stumbled upon this question. I thought this solution wasn't yet posted, and saw that tlwhitec did post a similar answer. Still I'm posting this because I hope it's a useful and thorough explanation.
Short answer:
This seems quite a portable solution, as it works in quite a few shells (see comment).
This way you can get a real newline into a variable.
The benefit of this solution is that you don't have to use newlines in your source code, so you can indent
your code any way you want, and the solution still works. This makes it robust. It's also portable.
# Robust way to put a real newline in a variable (bash, dash, ksh, zsh; indentation-resistant).
nl="$(printf '\nq')"
nl=${nl%q}
Longer answer:
Explanation of the above solution:
The newline would normally be lost due to command substitution, but to prevent that, we add a 'q' and remove it afterwards. (The reason for the double quotes is explained further below.)
We can prove that the variable contains an actual newline character (0x0A):
printf '%s' "$nl" | hexdump -C
00000000 0a |.|
00000001
(Note that the '%s' was needed, otherwise printf will translate a literal '\n' string into an actual 0x0A character, meaning we would prove nothing.)
Of course, instead of the solution proposed in this answer, one could use this as well (but...):
nl='
'
... but that's less robust and can be easily damaged by accidentally indenting the code, or by forgetting to outdent it afterwards, which makes it inconvenient to use in (indented) functions, whereas the earlier solution is robust.
Now, as for the double quotes:
The reason for the double quotes " surrounding the command substitution, as in nl="$(printf '\nq')", is that you can then even prefix the variable assignment with the local keyword or builtin (such as in functions), and it will still work in all shells. Without the quotes, dash would lose the 'q' and you'd end up with an empty 'nl' variable (again, due to how the unquoted command substitution is handled).
That issue is better illustrated with another example:
dash_trouble_example() {
e=$(echo hello world) # Not using 'local'.
echo "$e" # Fine. Outputs 'hello world' in all shells.
local e=$(echo hello world) # But now, when using 'local' without double quotes ...:
echo "$e" # ... oops, outputs just 'hello' in dash,
# ... but 'hello world' in bash and zsh.
local f="$(echo hello world)" # Finally, using 'local' and surrounding with double quotes.
echo "$f" # Solved. Outputs 'hello world' in dash, zsh, and bash.
# So back to our newline example, if we want to use 'local', we need
# double quotes to surround the command substitution:
# (If we didn't use double quotes here, then in dash the 'nl' variable
# would be empty.)
local nl="$(printf '\nq')"
nl=${nl%q}
}
Practical example of the above solution:
# Parsing lines in a for loop by setting IFS to a real newline character:
nl="$(printf '\nq')"
nl=${nl%q}
IFS=$nl
for i in $(printf '%b' 'this is line 1\nthis is line 2'); do
echo "i=$i"
done
# Desired output:
# i=this is line 1
# i=this is line 2
# Exercise:
# Try running this example without the IFS=$nl assignment, and predict the outcome.
I'm no bash expert, but this one worked for me:
STR1="Hello"
STR2="World"
NEWSTR=$(cat << EOF
$STR1
$STR2
EOF
)
echo "$NEWSTR"
I found this easier for formatting the text.
Those picky ones that need just the newline and despise the multiline code that breaks indentation, could do:
IFS="$(printf '\nx')"
IFS="${IFS%x}"
Bash (and likely other shells) gobble all the trailing newlines after command substitution, so you need to end the printf string with a non-newline character and delete it afterwards. This can also easily become a oneliner.
IFS="$(printf '\nx')" IFS="${IFS%x}"
I know this is two actions instead of one, but my indentation and portability OCD is at peace now :) I originally developed this to be able to split newline-only separated output and I ended up using a modification that uses \r as the terminating character. That makes the newline splitting work even for the dos output ending with \r\n.
IFS="$(printf '\n\r')"
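The stripping of trailing newlines is easy to confirm by measuring lengths. A minimal sketch, with made-up variable names stripped and kept:

```shell
# Command substitution removes all trailing newlines...
stripped=$(printf 'a\n\n\n')

# ...unless a sentinel character follows them and is removed afterwards.
kept=$(printf 'a\n\n\nx')
kept=${kept%x}

printf 'stripped=%s kept=%s\n' "${#stripped}" "${#kept}"
```

Here ${#var} is the length of the variable, so the two numbers differ by the three newlines that were lost.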
On my system (Ubuntu 17.10) your example just works as desired, both when typed from the command line (into sh) and when executed as a sh script:
[bash]§ sh
$ STR="Hello\nWorld"
$ echo $STR
Hello
World
$ exit
[bash]§ echo "STR=\"Hello\nWorld\"
> echo \$STR" > test-str.sh
[bash]§ cat test-str.sh
STR="Hello\nWorld"
echo $STR
[bash]§ sh test-str.sh
Hello
World
I guess this answers your question: it just works. (I have not tried to figure out details such as at what moment exactly the substitution of the newline character for \n happens in sh).
However, I noticed that this same script behaves differently when executed with bash and prints Hello\nWorld instead:
[bash]§ bash test-str.sh
Hello\nWorld
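One likely explanation, as far as I know: on Ubuntu, sh is dash, whose builtin echo interprets backslash escapes, while bash's echo does not unless -e (or the xpg_echo option) is used. A sketch that sidesteps the difference by making the choice explicit with printf:

```shell
STR='Hello\nWorld'

# %s prints the argument verbatim: the backslash and the n stay two characters.
printf '%s\n' "$STR"

# %b interprets backslash escapes in the argument, regardless of the shell.
printf '%b\n' "$STR"
```

With printf the result should not depend on which shell runs the script.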
I've managed to get the desired output with bash as follows:
[bash]§ STR="Hello
> World"
[bash]§ echo "$STR"
Note the double quotes around $STR. This behaves identically if saved and run as a bash script.
The following also gives the desired output:
[bash]§ echo "Hello
> World"
I wasn't really happy with any of the options here. This is what worked for me.
str=$(printf "%s" "first line")
str=$(printf "$str\n%s" "another line")
str=$(printf "$str\n%s" "and another line")
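One caveat with putting $str inside the format string is that any % or backslash in the accumulated text gets interpreted by printf. A variation that keeps the data in the arguments, sketched with a made-up first line containing a %:

```shell
str='100% first line'

# Data goes in arguments, never in the format string, so % and \ are safe.
str=$(printf '%s\n%s' "$str" 'another line')
str=$(printf '%s\n%s' "$str" 'and another line')

printf '%s\n' "$str"
```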
This isn't ideal, but I had written a lot of code and defined strings in a way similar to the method used in the question. The accepted solution required me to refactor a lot of the code so instead, I replaced every \n with "$'\n'" and this worked for me.