Convert Word8Vector.vector to Word8VectorSlice.slice - type-conversion

How can I convert buf to a Word8VectorSlice.slice in SML/NJ? For example,
val msg = "hello\n";
val buf = Byte.stringToBytes msg; (* how to convert to Word8VectorSlice.slice ?*)

Use Word8VectorSlice.full, which returns a slice spanning the whole vector:
Word8VectorSlice.full buf;
Docs: http://sml-family.org/Basis/mono-vector-slice.html#SIG:MONO_VECTOR_SLICE.full:VAL


scala/databricks/pyspark: how to parse/format a string based on a pattern?

How can I parse/format a string based on a pattern, similar to what SQL Server offers, using Scala or PySpark?
For example:
val string = abcdefg
val pattern = XX-XXX-XX
Then format_string = ab-cde-fg
Another example: if val pattern = X-XX-XXXX
Then format_string = a-bc-defg
I tried substring and other approaches.
Here is my initial code:
val pattern="XX-XX-XXXX"
val item="abcdefgh"
var lenplus = pattern.indexOf("-")
var fitem=item.substring(0,lenplus)
val result = pattern.split("-")
for ( a <-result )
{
val len= a.length()
// Displays output
//println(len)
var sitem= item.substring(lenplus,lenplus + len)
lenplus = len + lenplus
fitem=fitem + "-" + sitem
//println(lenplus)
println(fitem)
}
But I received the error below. This is the output and the error message:
ab-cd
ab-cd-ef
StringIndexOutOfBoundsException: String index out of range: 10
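One possible approach, as a minimal sketch: walk the pattern character by character and substitute the next input character for every placeholder. The loop above fails because it starts from the first segment and then iterates over all segments again, so the last substring runs past the end of the input. The names formatByPattern and placeholder are hypothetical, and the sketch assumes the number of X characters in the pattern equals the length of the input.

```scala
object PatternFormat {
  // Hypothetical helper: replaces each placeholder in `pattern` with the
  // next character of `item`; all other pattern characters are kept as-is.
  def formatByPattern(item: String, pattern: String, placeholder: Char = 'X'): String = {
    require(
      pattern.count(_ == placeholder) == item.length,
      "pattern must contain exactly one placeholder per input character"
    )
    val chars = item.iterator
    pattern.map(c => if (c == placeholder) chars.next() else c)
  }
}
```

With this, PatternFormat.formatByPattern("abcdefg", "XX-XXX-XX") follows the first example from the question. In Spark the same function could presumably be wrapped in a UDF, but that part is not shown here.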

How to return a value from the while/for loop in Scala

I am new to Scala. I want to read data from an Oracle database on each Spark node and convert it to a Spark DataFrame. The code is as follows:
def read_data(group_id: Int):String = {
val table_name = "table"
val col_name = "col"
val query =
""" select f1,f2,f3,f4,f5,f6,f7,f8
| from """.stripMargin + table_name + """ where MOD(TO_NUMBER(substr("""+col_name+""", -LEAST(2, LENGTH("""+col_name+""")))),"""+num_node+""")="""+group_id
val oracleUser = "ORCL"
val oraclePassword = "*******"
val oracleURL = "jdbc:oracle:thin:#//x.x.x.x:1521/ORCLDB"
val ods = new OracleDataSource()
ods.setUser(oracleUser)
ods.setURL(oracleURL)
ods.setPassword(oraclePassword)
val con = ods.getConnection()
val statement = con.createStatement()
statement.setFetchSize(1000) // important
val resultSet : java.sql.ResultSet = statement.executeQuery(query)
var ret = " "
while(resultSet.next()) {
for {i <- 1 until 8 by 1
ret = ret.concat(resultSet.getString(i))
ret = ret.concat(" ")
}yield(ret)
return ret
}
println("ret:",ret)
return ret
}
val conf = new SparkConf()
.setMaster("local[2]")
.setAppName("testScala")
.set("spark.executor.memory", "8g")
.set("spark.executor.cores", "2")
.set("spark.task.cpus","1")
val sc = new SparkContext(conf)
val rdd = sc.parallelize(group_list,num_node)
.map(read_data).map(x => println(x)).count()
println("rdd:",rdd)
The part of the code I have a problem with is the following:
var ret = " "
while(resultSet.next()) {
for (i <- 1 until 8 by 1) {
ret = ret.concat(resultSet.getString(i))
ret = ret.concat(" ")
return ret
}
return ret
}
println("ret:",ret)
println("ret:",ret) prints an empty string. When I change the code like this:
var ret = " "
while(resultSet.next()) {
for {i <- 1 until 8 by 1
ret = ret.concat(resultSet.getString(i))
ret = ret.concat(" ")
}yield(ret)
return ret
}
I receive this error:
ret is already defined as value ret
ret = ret.concat(" ")
In fact, before running, I see that the code has a problem with concat:
Cannot resolve symbol concat
Could you please tell me how I can access the result of the while/for loop outside of it?
Any help is really appreciated.
You can replace your code
var ret = " "
while(resultSet.next()) {
for {i <- 1 until 8 by 1
ret = ret.concat(resultSet.getString(i))
ret = ret.concat(" ")
}yield(ret)
return ret
}
by
val ret = Iterator.continually(resultSet)
.takeWhile(_.next)
.flatMap(r => (1 until 8).map(i => r.getString(i)))
.mkString(" ")
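To see the Iterator.continually pattern in isolation, here is a small sketch that runs without a database. FakeCursor is a hypothetical stand-in for java.sql.ResultSet (same next()/getString(i) shape, 1-based columns); it is not part of JDBC.

```scala
object CursorDemo {
  // Hypothetical in-memory stand-in for java.sql.ResultSet:
  // next() advances to the following row, getString uses 1-based columns.
  final class FakeCursor(rows: Vector[Vector[String]]) {
    private var idx = -1
    def next(): Boolean = { idx += 1; idx < rows.length }
    def getString(col: Int): String = rows(idx)(col - 1)
  }

  // Reads `columns` columns of every row and joins all values with a space.
  def readAll(cursor: FakeCursor, columns: Int): String =
    Iterator.continually(cursor)
      .takeWhile(_.next())
      .flatMap(c => (1 to columns).map(i => c.getString(i)))
      .mkString(" ")
}
```

The result is an ordinary val, so nothing has to be returned from inside a loop.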
You're using a for-comprehension here. What you actually do is create a new val called ret inside it. What you wrote is evaluated roughly as
for(i <- 1 until 8 by 1){
val ret = ret.concat(resultSet.getString(i))
val ret = ret.concat(" ")
} yield(ret)
What you can do instead is:
for {i <- 1 until 8 by 1
_ = ret = ret.concat(resultSet.getString(i))
_ = ret = ret.concat(" ")
} yield(ret)
Or, to simplify the whole loop, replace it with the following. (I'm not sure what your intent is with that code. I assume you want the whole concatenated string ret; however, since you use yield, I assume you also want the intermediate steps of the process.)
val ret = new StringBuilder(" ")
var steps: Seq[String] = Nil
while (resultSet.next()) {
  // StringBuilder is mutated in place, so ret can stay a val
  steps = for (i <- 1 until 8) yield {
    ret.append(resultSet.getString(i)).append(" ")
    ret.toString // snapshot of the string after this column
  }
}

IPv6ToBigInteger

I have this function which uses InetAddress, but the output is occasionally wrong. (example: "::ffff:49e7:a9b2" will give an incorrect result.)
def IPv6ToBigInteger(ip: String): BigInteger = {
val i = InetAddress.getByName(ip)
val a: Array[Byte] = i.getAddress
new BigInteger(1, a)
}
I also have this function
def IPv6ToBigInteger(ip: String): BigInteger = {
val fragments = ip.split(":|\\.|::").filter(_.nonEmpty)
require(fragments.length <= 8, "Bad IPv6")
var ipNum = new BigInteger("0")
for (i <-fragments.indices) {
val frag2Long = new BigInteger(s"${fragments(i)}", 16)
ipNum = frag2Long.or(ipNum.shiftLeft(16))
}
ipNum
}
which appears to have a parsing error, because it gives the wrong output unless the address is in 0:0:0:0:0:0:0:0 format. It is based on my IPv4ToLong function:
def IPv4ToLong(ip: String): Long = {
val fragments = ip.split('.')
var ipNum = 0L
for (i <- fragments.indices) {
val frag2Long = fragments(i).toLong
ipNum = frag2Long | ipNum << 8L
}
ipNum
}
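Note that this
ipNum = frag2Long | ipNum << 8L
is parsed as
ipNum = frag2Long | (ipNum << 8L)
not
ipNum = (frag2Long | ipNum) << 8L
because Scala ranks infix operators by their first character, and << binds tighter than |. Explicit parentheses make the intent easier to read.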
[ And please use foldLeft rather than var and while ]
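A minimal sketch of the foldLeft variant of IPv4ToLong, with the shift parenthesised explicitly. The object name Ipv4 and method name toLong are only illustrative, and no validation of the octets is done.

```scala
object Ipv4 {
  // Folds the dotted-quad fragments left to right: shift the accumulator
  // by one octet, then OR in the next fragment.
  def toLong(ip: String): Long =
    ip.split('.').foldLeft(0L)((acc, frag) => (acc << 8) | frag.toLong)
}
```

The same shape carries over to IPv6 with a 16-bit shift and BigInt as the accumulator, once the address has been expanded to its eight groups.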
Interesting challenge: transform IP address strings into BigInt values, allowing for all legal IPv6 address forms.
Here's my try.
import scala.util.Try
def iPv62BigInt(ip: String): Try[BigInt] = Try{
val fill = ":0:" * (8 - ip.split("[:.]").count(_.nonEmpty))
val fullArr =
raw"((?<=\.)(\d+)|(\d+)(?=\.))".r
.replaceAllIn(ip, _.group(1).toInt.toHexString)
.replace("::", fill)
.split("[:.]")
.collect{case s if s.nonEmpty => s"000$s".takeRight(4)}
if (fullArr.length == 8) BigInt(fullArr.mkString, 16)
else throw new NumberFormatException("wrong number of elements")
}
This is, admittedly, a bit lenient in that it won't catch all non-IPv6 forms, but that's not a trivial task using tools like regex.
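For comparison, here is a sketch without regex that expands "::" explicitly and folds the eight 16-bit groups into a BigInt. It assumes a dotted IPv4 part stands for two groups (8 bits per octet), and it is still lenient: for instance it does not insist that the dotted part comes last. The names Ipv6ToBigInt, groups and toBigInt are illustrative. As far as I know, InetAddress.getByName turns an IPv4-mapped literal such as ::ffff:49e7:a9b2 into an Inet4Address, so getAddress yields only 4 bytes there, which would explain the result from the first function.

```scala
object Ipv6ToBigInt {
  // Parses one side of "::" into 16-bit groups.
  // A dotted quad such as 1.2.3.4 contributes two groups.
  private def groups(part: String): Seq[Int] =
    if (part.isEmpty) Seq.empty
    else part.split(":", -1).toSeq.flatMap { g =>
      if (g.contains('.')) {
        val o = g.split("\\.", -1).map(_.toInt)
        require(o.length == 4 && o.forall(b => b >= 0 && b <= 255), s"bad IPv4 part: $g")
        Seq((o(0) << 8) | o(1), (o(2) << 8) | o(3))
      } else {
        require(
          g.nonEmpty && g.length <= 4 && g.forall(c => Character.digit(c, 16) >= 0),
          s"bad group: '$g'"
        )
        Seq(Integer.parseInt(g, 16))
      }
    }

  def toBigInt(ip: String): BigInt = {
    val all = ip.split("::", -1) match {
      case Array(full) => groups(full) // no "::" present
      case Array(head, tail) =>
        val (h, t) = (groups(head), groups(tail))
        require(h.length + t.length < 8, "'::' must stand for at least one group")
        h ++ Seq.fill(8 - h.length - t.length)(0) ++ t
      case _ => throw new IllegalArgumentException("more than one '::'")
    }
    require(all.length == 8, s"expected 8 groups, got ${all.length}")
    all.foldLeft(BigInt(0))((acc, g) => (acc << 16) | BigInt(g))
  }
}
```

Invalid input surfaces as an IllegalArgumentException (NumberFormatException is a subclass), so callers can wrap the call in Try just like the version above.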

for loop into map method with Spark using Scala

Hi, I want to use a "for" loop inside a map method in Scala.
How can I do it?
For example, here I want to generate a random word for each line read:
val rdd = file.map(line => (line,{
val chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
val word = new String;
val res = new String;
val rnd = new Random;
val len = 4 + rnd.nextInt((6-4)+1);
for(i <- 1 to len){
val char = chars(rnd.nextInt(51));
word.concat(char.toString);
}
word;
}))
My current output is :
Array[(String, String)] = Array((1,""), (2,""), (3,""), (4,""), (5,""), (6,""), (7,""), (8,""), (9,""), (10,""), (11,""), (12,""), (13,""), (14,""), (15,""), (16,""), (17,""), (18,""), (19,""), (20,""), (21,""), (22,""), (23,""), (24,""), (25,""), (26,""), (27,""), (28,""), (29,""), (30,""), (31,""), (32,""), (33,""), (34,""), (35,""), (36,""), (37,""), (38,""), (39,""), (40,""), (41,""), (42,""), (43,""), (44,""), (45,""), (46,""), (47,""), (48,""), (49,""), (50,""), (51,""), (52,""), (53,""), (54,""), (55,""), (56,""), (57,""), (58,""), (59,""), (60,""), (61,""), (62,""), (63,""), (64,""), (65,""), (66,""), (67,""), (68,""), (69,""), (70,""), (71,""), (72,""), (73,""), (74,""), (75,""), (76,""), (77,""), (78,""), (79,""), (80,""), (81,""), (82,""), (83,""), (84,""), (85,""), (86...
I don't know why the right side is empty.
There's no need for var here. It's a one liner
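Seq.fill(len)(chars(rnd.nextInt(chars.length))).mkString
This will create a sequence of Char of length len by repeatedly calling chars(rnd.nextInt(chars.length)) and then turn it into a String. (nextInt(n) excludes n, so nextInt(51) would never pick the last of the 52 characters.)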
Thus you'll get something like this :
import org.apache.spark.rdd.RDD
import scala.util.Random
val chars = ('a' to 'z') ++ ('A' to 'Z')
val rdd = file.map(line => {
val randomWord = {
val rnd = new Random
val len = 4 + rnd.nextInt((6 - 4) + 1)
Seq.fill(len)(chars(rnd.nextInt(chars.length))).mkString
}
(line, randomWord)
})
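The word generation can also be tried without Spark. This is a small sketch that pulls it into a function; the names RandomWord, randomWord, minLen and maxLen are made up for the example, and passing the Random in makes the result reproducible with a seed.

```scala
import scala.util.Random

object RandomWord {
  private val chars = ('a' to 'z') ++ ('A' to 'Z')

  // Builds a word of minLen..maxLen random letters without any var.
  def randomWord(rnd: Random, minLen: Int = 4, maxLen: Int = 6): String = {
    val len = minLen + rnd.nextInt(maxLen - minLen + 1)
    Seq.fill(len)(chars(rnd.nextInt(chars.length))).mkString
  }
}
```

Inside the map this would become something like file.map(line => (line, RandomWord.randomWord(new Random))).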
word.concat doesn't modify word but returns a new String. You can make word a var and append each new character to it:
var word = new String
....
for {
...
word += char
...
}

Reading lines and raw bytes from the same source in scala

I need to write code that does the following:
Connect to a tcp socket
Read a line ending in "\r\n" that contains a number N
Read N bytes
Use those N bytes
I am currently using the following code:
val socket = new Socket(InetAddress.getByName(host), port)
val in = socket.getInputStream;
val out = new PrintStream(socket.getOutputStream)
val reader = new DataInputStream(in)
val baos = new ByteArrayOutputStream
val buffer = new Array[Byte](1024)
out.print(cmd + "\r\n")
out.flush
val firstLine = reader.readLine.split("\\s")
if(firstLine(0) == "OK") {
def read(written: Int, max: Int, baos: ByteArrayOutputStream): Array[Byte] = {
if(written >= max) baos.toByteArray
else {
val count = reader.read(buffer, 0, buffer.length)
baos.write(buffer, 0, count)
read(written + count, max, baos)
}
}
read(0, firstLine(1).toInt, baos)
} else {
// RAISE something
}
baos.toByteArray()
The problem with this code is that the use of DataInputStream#readLine raises a deprecation warning, but I can't find a class that implements both read(...) and readLine(...). BufferedReader for example, implements read but it reads Chars and not Bytes. I could cast those chars to bytes but I don't think it's safe.
Any other ways to write something like this in scala?
Thank you
Be aware that on the JVM a char has 2 bytes, so "\r\n" is 4 bytes. This is generally not true for Strings stored outside of the JVM.
I think the safest way would be to read the stream as raw bytes until you reach the binary representation of "\r\n". You can then create a Reader (which turns bytes into JVM-compatible chars) on those first bytes, where you can be sure that there is text only, parse it, and continue safely with the rest of the binary data.
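A minimal sketch of that idea, assuming the header line is plain ASCII and the byte count is small enough to fit in one array. The names LineThenBytes, readCrlfLine and readExactly are made up for the example. Both functions read from the same InputStream, so no reader can buffer away part of the binary payload.

```scala
import java.io.{ByteArrayOutputStream, EOFException, InputStream}
import java.nio.charset.StandardCharsets

object LineThenBytes {
  // Reads bytes up to and including "\r\n" and returns the line
  // without the terminator, decoded as ASCII.
  def readCrlfLine(in: InputStream): String = {
    val line = new ByteArrayOutputStream
    var prev = -1 // last byte read, not yet written to `line`
    var done = false
    while (!done) {
      val b = in.read()
      if (b == -1) throw new EOFException("stream ended before \\r\\n")
      if (prev == '\r' && b == '\n') done = true
      else {
        if (prev != -1) line.write(prev)
        prev = b
      }
    }
    new String(line.toByteArray, StandardCharsets.US_ASCII)
  }

  // Reads exactly n bytes; loops because read may return fewer than requested.
  def readExactly(in: InputStream, n: Int): Array[Byte] = {
    val buf = new Array[Byte](n)
    var off = 0
    while (off < n) {
      val count = in.read(buf, off, n - off)
      if (count == -1) throw new EOFException(s"expected $n bytes, got $off")
      off += count
    }
    buf
  }
}
```

For a socket, wrapping socket.getInputStream in a single BufferedInputStream and passing that to both functions keeps the byte-by-byte reads cheap.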
You can achieve the goal of using read(...) and readLine(...) in one class. The idea is to use BufferedReader.read(): Int. Because BufferedReader buffers its input, you can read one character at a time without a performance penalty.
The change can be: (without scala style optimization)
import java.io.BufferedInputStream
import java.io.BufferedReader
import java.io.ByteArrayOutputStream
import java.io.PrintStream
import java.net.InetAddress
import java.net.Socket
import java.io.InputStreamReader
object ReadLines extends App {
val host = "127.0.0.1"
val port = 9090
val socket = new Socket(InetAddress.getByName(host), port)
val in = socket.getInputStream;
val out = new PrintStream(socket.getOutputStream)
// val reader = new DataInputStream(in)
val bufIns = new BufferedInputStream(in)
val reader = new BufferedReader(new InputStreamReader(bufIns, "utf8"));
val baos = new ByteArrayOutputStream
val buffer = new Array[Byte](1024)
val cmd = "get:"
out.print(cmd + "\r\n")
out.flush
val firstLine = reader.readLine.split("\\s")
if (firstLine(0) == "OK") {
def read(written: Int, max: Int, baos: ByteArrayOutputStream): Array[Byte] = {
if (written >= max) {
println("get: " + new String(baos.toByteArray))
baos.toByteArray()
} else {
// val count = reader.read(buffer, 0, buffer.length)
var count = 0
var b = reader.read()
while(b != -1){
buffer(count) = b.toByte
count += 1
if (count < max){
b = reader.read()
}else{
b = -1
}
}
baos.write(buffer, 0, count)
read(written + count, max, baos)
}
}
read(0, firstLine(1).toInt, baos)
} else {
// RAISE something
}
baos.toByteArray()
}
For testing, here is the server code:
import java.io.PrintStream
import java.net.ServerSocket
object ReadLinesServer extends App {
  val serverSocket = new ServerSocket(9090)
  while (true) {
    val socket = serverSocket.accept() // blocks until a client connects
    println("accepted a connection.")
    val ops = socket.getOutputStream()
    val printStream = new PrintStream(ops, true, "utf8")
    printStream.print("OK 2\r\n") // header line: status and payload length
    printStream.print("ab") // payload: 1 byte per alphanumeric char
    printStream.flush()
  }
}
Seems this is the best solution I can find:
val reader = new BufferedReader(new InputStreamReader(in))
val buffer = new Array[Char](1024)
out.print(cmd + "\r\n")
out.flush
val firstLine = reader.readLine.split("\\s")
if(firstLine(0) == "OK") {
def read(readCount: Int, acc: List[Byte]): Array[Byte] = {
if(readCount <= 0) acc.toArray
else {
val count = reader.read(buffer, 0, buffer.length)
val asBytes = buffer.slice(0, count).map(_.toByte)
read(readCount - count, acc ++ asBytes)
}
}
read(firstLine(1).toInt, List[Byte]())
} else {
// RAISE
}
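That is, use buffer.map(_.toByte) to transform a Char array into a Byte array. Note that this only round-trips arbitrary bytes if the reader decodes with a single-byte charset such as ISO-8859-1; with a multi-byte charset such as UTF-8 the decoded chars no longer correspond one-to-one to the original bytes.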
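Whether the char-to-byte conversion preserves the data depends on the charset given to InputStreamReader. The small sketch below makes that visible; the names CharsetRoundTrip and viaReader are made up for the example.

```scala
import java.io.{BufferedReader, ByteArrayInputStream, InputStreamReader}
import java.nio.charset.Charset

object CharsetRoundTrip {
  // Pushes the bytes through a reader with the given charset and
  // converts every decoded char back with toByte.
  def viaReader(bytes: Array[Byte], cs: Charset): Array[Byte] = {
    val reader = new BufferedReader(
      new InputStreamReader(new ByteArrayInputStream(bytes), cs)
    )
    Iterator.continually(reader.read()).takeWhile(_ != -1).map(_.toByte).toArray
  }
}
```

With StandardCharsets.ISO_8859_1 every byte maps to the char with the same value, so the output equals the input. With StandardCharsets.UTF_8, bytes that do not form valid UTF-8 sequences are replaced during decoding and the output differs.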