Scala: fast check whether a string is an IP address

I am running Spark with Scala and want to filter a Seq[String] whose elements may be a domain name, an IPv4 address, an IPv6 address, some other string, or empty.
What is the fastest way to check whether a string is an IPv4 or IPv6 address?
Right now this is my approach:
import com.google.common.net.InetAddresses
val someString: String = ""
InetAddresses.isInetAddress(someString)
Is there a faster approach?
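One way to cut the per-string cost, offered as a sketch rather than a measured result: run a cheap hand-written scan that recognises dotted-quad IPv4 literals, and fall back to the general check only for the rest. The object name IpPreCheck and the method isIpv4 are hypothetical, and rejecting leading zeros such as "01" is a choice made here for strictness. Whether this beats InetAddresses.isInetAddress on real data needs benchmarking.

```scala
object IpPreCheck {
  /** Hypothetical helper: true if `s` is a dotted-quad IPv4 literal such as "10.0.0.1". */
  def isIpv4(s: String): Boolean = {
    val n = s.length
    // Shortest form is "0.0.0.0" (7 chars), longest is "255.255.255.255" (15 chars).
    if (n < 7 || n > 15) return false
    var i = 0
    var dots = 0
    var digits = 0 // digits seen in the current octet
    var value = 0  // numeric value of the current octet
    while (i < n) {
      val c = s.charAt(i)
      if (c >= '0' && c <= '9') {
        // Assumption: leading zeros ("01") are rejected.
        if (digits == 1 && value == 0) return false
        value = value * 10 + (c - '0')
        digits += 1
        if (digits > 3 || value > 255) return false
      } else if (c == '.') {
        // An empty octet or a fourth dot means this is not an IPv4 literal.
        if (digits == 0 || dots == 3) return false
        dots += 1
        digits = 0
        value = 0
      } else {
        return false
      }
      i += 1
    }
    dots == 3 && digits > 0
  }
}
```

In the Spark filter this could be combined as strings.filter(s => IpPreCheck.isIpv4(s) || InetAddresses.isInetAddress(s)), so IPv6 literals and everything else still go through Guava.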

Related

Scala - How to get Array[Byte] from scala.io.Source

I have an instance of scala.io.BufferedSource (retrieved from scala.io.Source) and want to get raw bytes out of it. Most of the answers found on the internet use getLines method which disregards new-line delimiters. I need to retrieve the contents as-is and the API seems rather complicated. What is the easiest way to do that?
You can do something like this:
import java.net.URI
import scala.io.BufferedSource
val bs: BufferedSource = scala.io.Source.fromURL(new URI("https://google.com").toURL)
val result: Array[Byte] = bs.map(_.toByte).toArray
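Note that the snippet above converts decoded characters back to bytes, which reproduces the original bytes only when the source was decoded with a single-byte codec such as ISO-8859-1. If the underlying java.io.InputStream is available, a minimal sketch that skips decoding altogether could look like the following; the object name RawBytes and the method readAll are hypothetical.

```scala
import java.io.{ByteArrayOutputStream, InputStream}

object RawBytes {
  /** Hypothetical helper: drains `in` into a byte array without any character decoding. */
  def readAll(in: InputStream): Array[Byte] = {
    val out = new ByteArrayOutputStream()
    val buf = new Array[Byte](8192)
    var n = in.read(buf)
    while (n != -1) {        // read returns -1 at end of stream
      out.write(buf, 0, n)   // copy only the bytes actually read
      n = in.read(buf)
    }
    out.toByteArray
  }
}
```

For a URL, the stream could come from new URI("https://google.com").toURL.openStream(), and the caller would be responsible for closing it.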

How to override stdin to a String

Basic question, I want to set the standard input to be a specific string. Currently I am trying it with this:
import java.nio.charset.StandardCharsets
import java.io.ByteArrayInputStream
// Let's say we are inside a method now
val str = "textinputgoeshere"
System.setIn(new ByteArrayInputStream(str.getBytes(StandardCharsets.UTF_8)))
That is similar to how I'd do it in Java. However, str.getBytes seems to work differently in Scala, because System.in appears to be set to a memory address when I check it with println.
I've looked at the Scala API: http://www.scala-lang.org/api/current/scala/Console$.html#setIn(in:java.io.InputStream):Unit
and I've found
def withIn[T](in: InputStream)(thunk: ⇒ T): T
But this seems to only set the input stream for a specific chunk of code, I'd like this to be a feature in a Setup method in my JUnit tests.
My problem ended up being something related to my code, not this specific concept. The correct way to override Standard In / System In to a String in Scala is the following:
import java.io.{ByteArrayInputStream, InputStream}
import java.nio.charset.StandardCharsets
val str = "your string here"
val in: InputStream = new ByteArrayInputStream(str.getBytes(StandardCharsets.UTF_8))
Console.withIn(in)(yourMethod())
My tests run correctly now.
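As a side note, println on an input stream prints its default toString (class name and hash code), not the stream's contents, so that output alone does not show a problem. A small sketch of how the withIn approach might be wrapped for tests follows; StdinDemo, readGreeting and withInput are hypothetical names, and readGreeting stands in for whatever code under test reads from standard input.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import scala.io.StdIn

object StdinDemo {
  // Hypothetical code under test: reads one line from standard input.
  def readGreeting(): String = "hello, " + StdIn.readLine()

  // Runs `body` with Console's input replaced by the given text.
  def withInput[T](text: String)(body: => T): T = {
    val in = new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8))
    Console.withIn(in)(body)
  }
}
```

A test could then call StdinDemo.withInput("world\n")(StdinDemo.readGreeting()). Because withIn restores the previous input when the block finishes, each test wraps its own call instead of relying on a shared setup method.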

What is the fastest way to read an entire file into a String in Scala?

I am currently using the following to do this.
val string = scala.io.Source.fromFile(filePath).mkString
However, I've noticed that it is quite slow. Is there any better (in terms of speed) method to read the whole file into a string?
I used the following. This is much faster than my previous approach.
import java.nio.file.Files
import java.nio.file.Paths
val string = new String(Files.readAllBytes(Paths.get(filePath)))
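One caveat with the snippet above: new String(bytes) decodes with the platform default charset, so the result can differ between machines. A minimal variant that names the charset explicitly is sketched below, assuming the file is UTF-8; ReadWhole and readUtf8 are hypothetical names.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

object ReadWhole {
  /** Hypothetical helper: reads the whole file and decodes it as UTF-8 (an assumption about the file). */
  def readUtf8(path: Path): String =
    new String(Files.readAllBytes(path), StandardCharsets.UTF_8)
}
```

Usage would be ReadWhole.readUtf8(Paths.get(filePath)), with java.nio.file.Paths imported.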

Is there a substring proxy in scala?

Working with String objects is convenient for many string-processing tasks.
I need to extract some substrings for processing, and the String class provides that. But it is rather expensive: a new String object is created every time substring is called. Using tuples (string: String, start: Int, stop: Int) solves the performance problem, but makes the code much more complicated.
Is there a library for creating string proxies that store the original string and the range bounds, and that are compatible with other string functions?
Java 7u6 and later now implement #substring as a copy, not a view, making this answer obsolete.
If you're running your Scala program on the Sun/Oracle JVM, you shouldn't need to perform this optimization, because java.lang.String already does it for you.
A string is stored as a reference to a char array, together with an offset and a length. Substrings share the same underlying array, but with a different offset and/or length.
Look at the implementation of String (in particular substring(int beginIndex, int endIndex)): it's already represented as you wish.
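Since substring copies on current JVMs, one possible stand-in for a proxy is java.nio.CharBuffer.wrap, which returns a read-only CharSequence view over a range of the original string without copying the characters. This is a sketch, not a drop-in String replacement: the result is a CharSequence, so it works only with APIs that accept one (for example java.util.regex.Pattern.matcher). SubstringView and view are hypothetical names.

```scala
import java.nio.CharBuffer

object SubstringView {
  /** Hypothetical helper: a view of s[start, end) that shares the original characters. */
  def view(s: String, start: Int, end: Int): CharSequence =
    CharBuffer.wrap(s, start, end)
}
```

For example, SubstringView.view("hello world", 6, 11) behaves as a five-character sequence starting at 'w'; calling toString on it materialises a real String only when one is needed.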

How to create an Array from Iterable in Scala 2.7.7?

I'm using Scala 2.7.7
I'm having difficulty accessing the documentation, so code snippets would be great.
Context
I parse an IP address that is 4 or 16 bytes long. I need an array of bytes to pass into java.net.InetAddress. String.split(separator).map(_.toByte) gives me an instance of Iterable.
I see two ways to solve the problem:
use an array of 16 bytes, fill it from the Iterable, and return just a part of it if not all fields are used (Is there a function to fill an array in 2.7.7? How do I get the part?).
use a dynamic-length container and form an array from it (Which container is suitable?).
Current implementation is published in my other question about memory leaks.
In Scala 2.7, Iterable has a method called copyToArray.
I'd strongly advise you not to use an Array here, unless you have to use a particular library/framework that requires an array.
Normally, you'd be better off with a native Scala type:
String.split(separator).map(_.toByte).toList
//or
String.split(separator).map(_.toByte).toSeq
Update
Assuming that your original string is a delimited list of hostnames, why not just:
import java.net.InetAddress
val namesStr = "www.sun.com;www.stackoverflow.com;www.scala-tools.com"
val separator = ";"
val addresses = namesStr.split(separator).map(InetAddress.getByName)
That'll give you an iterable of InetAddress instances.
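For the original case of turning a dotted IPv4 string into the byte array that InetAddress.getByAddress expects, here is a sketch for current Scala versions (not 2.7.7, where the collections API differs). It parses each field as an Int first, because "200".toByte throws a NumberFormatException: Byte covers only -128 to 127. IpBytes and parseV4 are hypothetical names, and no validation of the field count or range is attempted.

```scala
import java.net.InetAddress

object IpBytes {
  /** Hypothetical helper: "192.168.0.1" -> 4 bytes; values above 127 wrap to negative bytes. */
  def parseV4(s: String): Array[Byte] =
    s.split('.').map(_.toInt.toByte)

  /** Builds an InetAddress from the parsed bytes; no DNS lookup is involved. */
  def toAddress(s: String): InetAddress =
    InetAddress.getByAddress(parseV4(s))
}
```

On these versions split followed by map already yields an Array[Byte], so no separate conversion from Iterable is needed.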