New line character in Scala - scala

Is there a shorthand for a new line character in Scala? In Java (on Windows) I usually just use "\n", but that doesn't seem to work in Scala - specifically
val s = """abcd
efg"""
val s2 = s.replace("\n", "")
println(s2)
outputs
abcd
efg
in Eclipse,
efgd
(sic) from the command line, and
abcdefg
from the REPL (GREAT SUCCESS!)
String.format("%n") works, but is there anything shorter?

A platform-specific line separator is returned by
sys.props("line.separator")
This will give you either "\n" or "\r\n", depending on your platform. You can wrap that in a val as terse as you please, but of course you can't embed it in a string literal.
If you're reading text that's not following the rules for your platform, this obviously won't help.
References:
scala.sys package scaladoc (for sys.props)
java.lang.System.getProperties javadoc (for "line.separator")
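A minimal sketch of the "wrap it in a val" idea, assuming Scala 2.10 or later. The names LineSep, EOL and joined are made up for illustration. A plain literal cannot contain the separator, but an interpolated string can splice it in:

```scala
object LineSep {
  // Platform line separator: "\n" on Unix-like systems, "\r\n" on Windows.
  val EOL: String = sys.props("line.separator")

  // The s interpolator splices the separator into the string at runtime.
  val joined: String = s"aaa${EOL}bbb"
}
```

Printing LineSep.joined should put aaa and bbb on separate lines on any platform.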

Your Eclipse is making the newline marker the standard Windows \r\n, so you've got "abcd\r\nefg". The replace call is turning it into "abcd\refg", and the Eclipse console is treating the \r slightly differently from how the Windows shell does. The REPL is just using \n as the newline marker, so it works as expected.
Solution 1: change Eclipse to just use \n newlines.
Solution 2: don't use triple-quoted strings when you need to control newlines; use ordinary double-quoted strings with explicit \n characters.
Solution 3: use a more sophisticated regex to replace \r\n, \n, or \r

Try this interesting construction :)
import scala.compat.Platform.EOL
println("aaa"+EOL+"bbb")

If you're sure the file's line separator is the one used by this OS, do the following:
s.replaceAll(System.lineSeparator, "")
Otherwise your regex should match all of the following newline sequences: "\n" (Linux), "\r" (classic Mac OS), "\r\n" (Windows):
s.replaceAll("(\r\n)|\r|\n", "")
The second one is shorter and, I think, is more correct.
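The platform-independent variant can be sketched as a small helper. This is only an illustration of the regex approach above; the object and method names are hypothetical:

```scala
object StripNewlines {
  // CRLF is listed first so that "\r\n" is matched as one unit,
  // followed by a lone CR or a lone LF.
  private val AnyNewline = "\r\n|\r|\n"

  // Removes every line break, whichever platform produced the text.
  def stripNewlines(s: String): String = s.replaceAll(AnyNewline, "")
}
```

With this, StripNewlines.stripNewlines("abcd\r\nefg") gives "abcdefg" whether the source file was saved with Windows or Unix line endings.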

// stripMargin is not needed here (there is no | margin), and a val is enough
val s = """abcd
efg""".replaceAll("[\n\r]", "")

Use \r\n instead

Related

Replace \r\n with \n

This does not work:
scala> """one\r\ntwo\r\nthree\r\nfour""".replace("\r\n", "\n")
res1: String = one\r\ntwo\r\nthree\r\nfour
How to do that in Scala?
Is there a more idiomatic way of doing that, instead of using replace?
The problem is that """ quotes do not expand escape sequences. Three different approaches:
Use single " quotes so that escape sequences are treated correctly: "one\r\ntwo";
Use the s string interpolator, but be careful with this approach because it can lead to unexpected replacements (for example, a $ in the text is read as the start of an interpolation): s"""one\r\ntwo""";
Call treatEscapes directly to expand escape sequences in your string: StringContext.treatEscapes("""one\r\ntwo""").
Refer also to this earlier question.
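The difference between the quoting styles can be checked directly. This is a small sketch with made-up names (EscapeDemo and its fields), not part of the original answer:

```scala
object EscapeDemo {
  // Triple quotes keep the backslashes: 10 characters, no real CR or LF.
  val raw: String = """one\r\ntwo"""

  // An ordinary literal expands the escapes: 8 characters, with a real CR and LF.
  val cooked: String = "one\r\ntwo"

  // The s interpolator also expands escapes, even inside triple quotes.
  val interpolated: String = s"""one\r\ntwo"""

  // Once the string holds a real CRLF, the original replace call works.
  val normalized: String = cooked.replace("\r\n", "\n")
}
```

Here raw.length is 10 and cooked.length is 8, which is why replace("\r\n", "\n") finds nothing to replace in the triple-quoted version.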
Try this:
"""one\r\ntwo\r\nthree\r\nfour""".replace("\\r\\n", "\n")
\ is treated as an escape character within an ordinary string literal, so you need to write \\ to tell the compiler that it is a literal backslash rather than the start of an escape sequence.

powershell -split('') specify a new line

Get-Content $user| Foreach-Object{
$user = $_.Split('=')
New-Variable -Name $user[0] -Value $user[1]}
I'm trying to work on a script and have it split a text file into an array, splitting the file on each new line.
What should I change the "=" sign to?
It depends on the exact encoding of the textfile, but [Environment]::NewLine usually does the trick.
"This is `r`na string.".Split([Environment]::NewLine)
Output:
This is
a string.
The problem with the String.Split method is that it splits on each character in the given string. Hence, if the text file has CRLF line separators, you will get empty elements.
Better solution, using the -Split operator.
"This is `r`na string." -Split "`r`n" #[Environment]::NewLine, if you prefer
You can use the String.Split method to split on CRLF and not end up with the empty elements by using the Split(String[], StringSplitOptions) method overload.
There are a couple different ways you can use this method to do it.
Option 1
$input.Split([string[]]"`r`n", [StringSplitOptions]::None)
This will split on the combined CRLF (Carriage Return and Line Feed) string represented by `r`n. The [StringSplitOptions]::None option will allow the Split method to return empty elements in the array, but there should not be any if all the lines end with a CRLF.
Option 2
$input.Split([Environment]::NewLine, [StringSplitOptions]::RemoveEmptyEntries)
This will split on either a Carriage Return or a Line Feed. So the array will end up with empty elements interspersed with the actual strings. The [StringSplitOptions]::RemoveEmptyEntries option instructs the Split method to not include empty elements.
The answers given so far consider only Windows as the running environment. If your script needs to run in a variety of environments (Linux, Mac and Windows), consider using the following snippet:
$lines = $input.Split(
@("`r`n", "`r", "`n"),
[StringSplitOptions]::None)
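For comparison, the same "split on CRLF, CR or LF" idea on the JVM, as a rough Scala sketch. This is an analogy rather than a PowerShell solution, and the names are hypothetical:

```scala
object SplitLines {
  // CRLF is listed first so a Windows line ending counts as one separator.
  // The -1 limit keeps trailing empty strings, roughly like StringSplitOptions.None.
  def splitLines(text: String): Array[String] = text.split("\r\n|\r|\n", -1)
}
```

Listing the two-character separator before the single-character ones is what avoids the empty elements that appear when CR and LF are treated as independent split characters.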
There is a simple and unusual way to do this.
$lines = [string[]]$input
This will split $input like:
$input.Split(@("`r`n", "`n"))
This is undocumented at least in docs for Conversions.
Beware, this will not remove empty entries.
And it doesn't work for Carriage Return (\r) line ending at least on Windows.
Experimented in Powershell 7.2.
This article also explains a lot about how it works with carriage return and line ends. https://virot.eu/powershell-and-newlines/
Having had some issues with additional empty lines and the like, I found that it helped me understand the problem. Excerpt from virot.eu:
So what makes up a new line. Here comes the tricky part, it depends.
To understand this we need to go to the line feed the character.
Line feed is the ASCII character 10. It in most programming languages
escaped by writing \n, but in powershell it is `n. But Windows is not
content with just one character, Windows also uses carriage return
which is ASCII character 13. Escaped \r. So what is the difference?
Line feed advances the pointer down one row and carriage return
returns it to the left side again. If you store a file in Windows by
default are linebreaks are stored as first a carriage return and then
a line feed (\r\n). When we aren’t using any parameters for the
split() command it will split on all white-space characters, that is
both carriage return, linefeed, tabs and a few more. This is why we
are getting 5 results when there is both carriage return and line
feeds.

Why does Scala see more lines in a file?

Running this from the terminal prompt:
$ wc data.csv
195727 15924341 201584826 data.csv
So, 195727 lines. What about Scala?
val raw_rows: Iterator[String] = scala.io.Source.fromFile("data.csv").getLines()
println(raw_rows.length)
Result: 200945
What am I facing here? I wish for it to be the same. In fact, if I use mighty csv (opencsv wrapper lib) it also reads 195727 lines.
It might be a newline issue. From the doc of getLines
Returns an iterator who returns lines (NOT including newline character(s)). It will treat any of \r\n, \r, or \n as a line separator (longest match) - if you need more refined behavior you can subclass Source#LineIterator directly
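A small sketch of how the two counts can diverge, assuming the extra lines come from bare \r characters inside the data (not confirmed for this file). The object and method names are made up:

```scala
import scala.io.Source

object LineCounts {
  // Counts lines the way getLines does: "\r\n", "\r" and "\n" all end a line.
  def viaGetLines(text: String): Int = Source.fromString(text).getLines().length

  // Counts only "\n" characters, which is what wc reports as its line count.
  def viaNewlineCount(text: String): Int = text.count(_ == '\n')
}
```

For the text "a,b\rc\nd,e\n", viaGetLines returns 3 while viaNewlineCount returns 2, because the lone \r is a line break for getLines but not for wc.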

How do I use sed to modify a file?

I have a question about using sed to modify file. My file content:
<data-value name="WLS_INSTALL_DIR" value="/home/Oracle/wlserver_10.3">
I want to replace the content of field value="/home/Oracle/wlserver_10.3"
to get this result:
<data-value name="WLS_INSTALL_DIR" value="/u03/Middle_home/Oracle/wlserver_10.3">
I use sed:
sed "6 i/^value=/>/s/value= />\(.*\)/value=\"\/u03\/Oracle/Middleware/wlserver_10.3"\" \/\ /u03/silent.xml
Your sed script has a number of issues.
First off, anything that looks like 6istuff will simply write everything after i ("insert") verbatim as a new line before the sixth line. (Some dialects require a newline after the i and will basically do nothing.)
Secondly, ^value= does not match your input; it would only select a line starting with the string value= (the ^ metacharacter means beginning of line).
Thirdly, the /> in your substitution regex terminates the substitution and so everything from > onwards is parsed as invalid flags for the substitution. I cannot see the purpose of this part, anyway; it doesn't match your data, and so the regex fails.
What remains after removing all these superfluous and erroneous details is a more or less useful sed script. (I assume the 6 to address only the sixth line of input is intentional, although you don't mention this in the question at all.) I have made some additional minor improvements, such as using % as the substitution delimiter and tightening the regex so that it only ever substitutes a double-quoted value.
sed '6s%value="[^"]*"%value="/u03/Oracle/Middleware/wlserver_10.3"%' /u03/silent.xml
Better than 6 would perhaps be to identify the line with /name="WLS_INSTALL_DIR"/.
Still, as alluded to in a comment, the proper way to manipulate XML is with a dedicated tool such as xsltproc.
Try:
sed 's|/home|/u03/Middle_home|'

What's the difference between single and double quotes in Perl?

I am just beginning to learn Perl. I looked at the beginning perl page and started working.
It says:
The difference between single quotes and double quotes is that single quotes mean that their contents should be taken literally, while double quotes mean that their contents should be interpreted
When I run this program:
#!/usr/local/bin/perl
print "This string \n shows up on two lines.";
print 'This string \n shows up on only one.';
It outputs:
This string
shows up on two lines.
This string
shows up on only one.
Am I wrong somewhere?
the version of perl below:
perl -v
This is perl, v5.8.5 built for aix
Copyright 1987-2004, Larry Wall
Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.
Complete documentation for Perl, including FAQ lists, should be found on
this system using `man perl' or `perldoc perl'. If you have access to the
Internet, point your browser at http://www.perl.com/, the Perl Home Page.
I am inclined to say something is up with your shell/terminal, and whatever you are outputting to is interpreting the \n as a newline and that the problem is not with Perl.
To confirm: This Shouldn't Happen(TM) - in the first case I would expect to see a new line inserted, but with single quotes it ought to output literally the characters \n and not a new line.
In Perl, single-quoted strings do not expand backslash-escapes like \n or \t. The reason you're seeing them expanded is probably due to the nature of the shell that you're using, which is munging your output for some reason.
Everything you need to know about quoting and quote-like operators is in perlop.
To answer your specific question, double-quotes can turn certain sequences of literal characters into other characters. In your example, the double quotes turn the sequence of characters \ and n into the single character that represents a newline. In a single quoted string, that same literal sequence is just the literal \ and n characters.
By "interpreted", they mean that variable names and such will not be printed, but their values instead. \n is an escape sequence, so I'd think it would not be interpreted.
In addition to your O'Reilly link, a reference no less authoritative than the 'Programming Perl' book by Larry Wall, states that backslash interpolation does not occur in single quoted strings.
... much like Unix shell quotes: double quoted string literals are subject to
backslash and variable interpolation; single quoted strings are not
(except for \' and \\, so that you may ...)
Programming Perl, 2nd ed, 1996, page 16
So it would be interesting to see what your Perl does with
print 'Double backslash n: \\n';
As above, please show us the output from 'perl -v'.
And I believe I have confused the forum editor software, because that last Perl 'print' should have been indented.
If you use double quotes, the \n will be interpreted as a newline.
But if you use single quotes, the \n will not be interpreted as a newline.
For me it is working correctly.
file content
print "This string \n shows up on two lines.";
print 'This string \n shows up on only one.'