I am fairly new to Scala and I need to convert a pipe-delimited string to a comma-delimited one, with each value wrapped in quotes and any embedded quotes escaped with "\".
In C# I would probably do it like this:
string st = "\"" + oldStr.Replace("\"", "\\\"").Replace("|", "\",\"") + "\"";
I haven't verified that this actually works, but that is the basic idea behind what I am trying to do. Is there an easy way to do this in Scala?
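Similarly in Scala. Note that replaceAll takes a regex, so the pipe has to be escaped and the backslash in the replacement has to be doubled again: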
val st = "\"" + oldStr.replaceAll("\"", "\\\\\"").replaceAll("\\|", "\",\"") + "\""
Could also be:
val st = oldStr.replaceAll("\"","\\\\\"").split("\\|").mkString("\"","\",\"","\"")
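A variant of the second form, sketched as a reusable function. The name pipeToQuotedCsv is made up for illustration. It assumes you want to keep trailing empty fields, which a plain split("\\|") drops, and it uses the literal String.replace so there is no regex replacement escaping to get wrong.

```scala
object PipeToCsv {
  // Hypothetical helper name; not part of the original answers.
  def pipeToQuotedCsv(line: String): String =
    line
      .split("\\|", -1)              // limit -1 keeps trailing empty fields
      .map(_.replace("\"", "\\\""))  // literal replace: " becomes \"
      .mkString("\"", "\",\"", "\"") // wrap each value in quotes, join with commas
}
```

For example, the input a|b"c| has three fields (the last one empty), and each of them comes out quoted, with the embedded quote escaped.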
I am new to Spark. I have a huge file with data like this:
18765967790#18765967790#T#20130629#00#31#2981546 " "18765967790#18765967790#T#20130629#19#18#3240165 " "18765967790#18765967790#T#20130629#18#18#1362836
13478756094#13478756094#T#20130629#31#26#2880701 " "13478756094#13478756094#T#20130629#19#18#1230206 " "13478756094#13478756094#T#20130629#00#00#1631440
40072066693#40072066693#T#20130629#79#18#1270246 " "40072066693#40072066693#T#20130629#79#18#3276502 " "40072066693#40072066693#T#20130629#19#07#3321860
I am trying to replace " " with a newline character so that my output looks like this:
18765967790#18765967790#T#20130629#00#31#2981546
18765967790#18765967790#T#20130629#19#18#3240165
18765967790#18765967790#T#20130629#18#18#1362836
13478756094#13478756094#T#20130629#31#26#2880701
13478756094#13478756094#T#20130629#19#18#1230206
13478756094#13478756094#T#20130629#00#00#1631440
40072066693#40072066693#T#20130629#79#18#1270246
40072066693#40072066693#T#20130629#79#18#3276502
40072066693#40072066693#T#20130629#19#07#3321860
I have tried:
val fact1 = sc.textFile("s3://abc.txt").map(x=>x.replaceAll("\"","\n"))
But this doesn't seem to work. Can someone tell me what I am missing?
Edit 1: My final output will be a dataframe with a schema imposed after splitting on the delimiter "#".
I am getting the output below:
scala> fact1.take(5).foreach(println)
18765967790#18765967790#T#20130629#00#31#2981546
18765967790#18765967790#T#20130629#19#18#3240165
18765967790#18765967790#T#20130629#18#18#1362836
13478756094#13478756094#T#20130629#31#26#2880701
13478756094#13478756094#T#20130629#19#18#1230206
13478756094#13478756094#T#20130629#00#00#1631440
40072066693#40072066693#T#20130629#79#18#1270246
40072066693#40072066693#T#20130629#79#18#3276502
40072066693#40072066693#T#20130629#19#07#3321860
I am getting extra blank lines, which makes it harder to create the dataframe. This might seem simple here, but the file is huge and the rows containing " " are long. In the question I have shown only two double quotes per separator, but a row can contain more than 40-50 of them.
There is more than one quote between the records, and each quote becomes its own line break. You need to either remove the additional quotes before the replace or remove the empty lines after it:
.map(x=>x.replaceAll("\"","\n").replaceAll("(?m)^[ \t]*\r?\n", ""))
Reference: Remove all empty lines
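If the goal is one record per row (for the dataframe mentioned in the edit), keep in mind that map still produces one element per input line, so the newlines end up embedded inside each element. A possible alternative, assuming the records are always separated by quote characters with optional spaces around them, is to split each line and flatten the result. splitRecords is a hypothetical helper name:

```scala
object SplitRecords {
  // Hypothetical helper: split one input line into its individual records.
  def splitRecords(line: String): Array[String] =
    line
      .split("\"")        // cut at every double quote
      .map(_.trim)        // drop the spaces around the separators
      .filter(_.nonEmpty) // discard the empty pieces between adjacent quotes

  // With Spark this would be used roughly as follows (not run here):
  //   val fact1 = sc.textFile("s3://abc.txt").flatMap(SplitRecords.splitRecords)
}
```

Each resulting element can then be split on "#" without any blank rows to filter out first.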
You might be missing an implicit Encoder. Try the code below, which passes one explicitly:
spark.read.text("src/main/resources/doubleQuoteFile.txt").map(row => {
  // replace each " " separator with a newline
  row.getString(0).replace("\" \"", "\n")
})(org.apache.spark.sql.Encoders.STRING)
Newbie question here.
I need to create a list, but what is the best way to avoid starting it with a comma?
eg:
output to /usr2/appsrv/test/Test.txt.
def var dTextList as char.
for each emp no-lock:
dTextList = dTextList + ", " + emp.Name.
end.
put unformatted dTextList skip.
output close.
then my end result is
, jack, joe, brad
what is the best way to get rid of the leading comma?
thank you
Here's one way:
ASSIGN
dTextList = dTextList + ", " WHEN dTextList > ""
dTextList = dTextList + emp.Name
.
This does it without any conditional logic:
for each emp no-lock:
csv = csv + emp.Name + ",".
end.
csv = right-trim( csv, "," ).
or you can do this:
for each emp no-lock:
csv = substitute( "&1,&2", csv, emp.Name ).
end.
csv = trim( csv, "," ).
Which also has the advantage of playing nicely with unknown values (the ? value...)
TRIM() trims both sides, LEFT-TRIM() only does leading characters and RIGHT-TRIM() gets trailing characters.
My vanilla list:
output to /usr2/appsrv/test/Test.txt.
def var dTextList as char no-undo.
for each emp no-lock:
dTextList = substitute( "&1, &2", dTextList, emp.Name ).
end.
put unformatted substring( dTextList, 3 ) skip.
output close.
substitute prevents unknowns from wiping out list
keep list delimiter checking outside of loop
generally leave the list delimiter prefixed unless the prefix really needs to go as in the case when outputting it
If you use delimited lists often, you may want to consider creating a list class to keep this irrelevant noise out of your code, so that you can simply add an item to a list and export the list without tinkering with these details every time.
I usually do
ASSIGN dTextList = dTextList + (if dTextList = '' then '' else ',') + emp.name.
My colleague came up with this:
dTextList = substitute ("&1&3&2", dTextList, emp.Name, min(dTextList,",")).
But it is cool to see the various ways to do this. Thank you for all the responses.
This results in no leading comma (delimiter) and no fiddling with trim/substring/etc
def var cDelim as char.
def var dTextList as char.
cDelim = ''.
for each emp no-lock:
dTextList = dTextList + cDelim + emp.Name.
cDelim = ','.
end.
I need to turn some Windows file paths that contain spaces into string literals in Scala. I have tried wrapping the entire path in double quotes, and also wrapping the entire path in double quotes with each directory name that contains a space in single quotes. Now the compiler wants an escape character for "\Jun" in both places and I don't know why.
Here are the strings:
val input = "R:\'Unclaimed Property'\'CT State'\2015\Jun\ct_finderlist_2015.pdf"
val output = "R:\'Unclaimed Property'\'CT State'\2015\Jun"
Here is the latest error:
The problem is the \ character, which has to be escaped in an ordinary string literal. The single quotes are not needed; spaces are fine inside a string.
This should work:
val input = "R:\\Unclaimed Property\\CT State\\2015\\Jun\\ct_finderlist_2015.pdf"
val output = "R:\\Unclaimed Property\\CT State\\2015\\Jun"
A cleaner way to create string literals is to use triple quotes.
You can wrap your string directly in triple quotes without escaping backslashes or quotes, and the string can span multiple lines.
It is much easier to write and read.
For example:
val input =
  """R:\Unclaimed Property\CT State\2015\Jun\ct_finderlist_2015.pdf"""
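To add a variable to the string, prefix the literal with an interpolator and write "$variableName". Use raw rather than s for paths, because s still processes backslash escapes and rejects sequences such as \U:
val input =
  raw"""R:\Unclaimed Property\$variablePath\CT State\2015\Jun\ct_finderlist_2015.pdf"""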
I have a Scala program that takes "\t" as a command-line argument.
Inside the program I want to split a string on the delimiter passed from the command line.
val splitter = args(0).charAt(0)
if(splitter == '\t')
println("true")
else
println("false")
This prints "false", and splitter is "\".
The same approach works for a "," (comma) delimiter.
How can I pass a tab or any other delimiter as a command-line parameter and use it for splitting?
It's because if you're passing "\t" in on the command line, then it's coming in as a two-character string \t, not a single-character tab. To do what you want, you can't just take the first character (charAt(0)) since you'll miss the t. Instead you'll have to unescape it by converting from the string \t to the tab character.
An easy way:
val splitter = args(0) match {
case "\\t" => '\t'
case x => x.head // same as x.charAt(0)
}
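The same idea extended to a couple more escapes and wired up to the actual split, as a rough sketch. parseDelimiter and splitLine are hypothetical names, and the set of escapes handled is an assumption about what a caller might pass.

```scala
object Delimiter {
  // Hypothetical helper: turn a command-line argument into a delimiter character.
  def parseDelimiter(arg: String): Char = arg match {
    case "\\t"           => '\t'   // the two characters \ and t
    case "\\n"           => '\n'   // the two characters \ and n
    case "\\\\"          => '\\'   // two backslashes mean one literal backslash
    case s if s.nonEmpty => s.head // anything else: use the first character
    case _               => throw new IllegalArgumentException("empty delimiter")
  }

  // split(Char) treats the character literally, so '|' or '.' need no regex escaping.
  def splitLine(line: String, arg: String): Array[String] =
    line.split(parseDelimiter(arg))
}
```

Splitting on a Char rather than a String avoids the regex interpretation that String.split(String) applies to delimiters such as | or the period.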
Hello, I am new to Play Framework and Scala. I don't know how to display special characters like #, -, "" as text.
Help!
If I understood your question correctly, you are asking how to escape special characters in Scala. It works much the same as in other languages: use the escape character \.
val str = " \\\" " //output str: \"
Or use the triple-quote syntax below, which stores exactly what is inside the triple quotes:
val str = """ \" """ //output str: \"