hi I have string like this :
var ma_test="~0.000000~~~"
I am using the split function with ~ as delimiter but it did not split correctly
what I try :
scala> var ma_test="~0.000000~~~"
scala> val split_val = ma_test.split("~")
split_val: Array[String] = Array("", 0.000000)
scala> val split_dis = split_val(2)
java.lang.ArrayIndexOutOfBoundsException: 2
... 32 elided
I try also to use val split_val = ma_test.split("\~") and ma_test.split('~') still not able to split correctly
Using split will remove all the trailing empty strings, so there are 2 elements after split (as the leading ~ will also split), and starting from index 0, 1 etc..
Note that you get the first empty entry in the Array, as there is a ~ at the start that will also split, so you should use index 1.
var ma_test="~0.000000~~~"
val split_val = ma_test.split("~")
val split_dis = split_val(1)
Output
var ma_test: String = ~0.000000~~~
val split_val: Array[String] = Array("", 0.000000)
val split_dis: String = 0.000000
You can pass -1 as the second argument to see all parts, and using index 2 will then give you an empty string.
var ma_test="~0.000000~~~"
val split_val = ma_test.split("~", -1)
val split_dis = split_val(2)
Output
var ma_test: String = ~0.000000~~~
val split_val: Array[String] = Array("", 0.000000, "", "", "")
val split_dis: String = ""
The output is the same with a "special character" ~ or the character x which suggests the the split function is not the issue. In either case if you try and access split_val(2) you will get an ArrayOutOfBoundsException since that index does not exist in the array. 0 or 1 will work ok.
scala> var ma_test="x0.000000xxx"
var ma_test: String = x0.000000xxx
scala> ma_test.split("x")
val res1: Array[String] = Array("", 0.000000)
scala> var ma_test="~0.000000~~~"
var ma_test: String = ~0.000000~~~
scala> ma_test.split("~")
val res0: Array[String] = Array("", 0.000000)
I am implementing function that gets random index and returns the element at random index of tuple.
I know that for tuple like, val a=(1,2,3) a._1=2
However, when I use random index val index=random_index(integer that is smaller than size of tuple), a._index doesnt work.
You can use productElement, note that it is zero based and has return type of Any:
val a=(1,2,3)
a.productElement(1) // returns 2nd element
If you know random_index only at runtime the best what you can have is (as #GuruStron answered)
val a = (1,2,3)
val i = 1
val x = a.productElement(i)
x: Any // 2
If you know random_index at compile time you can do
import shapeless.syntax.std.tuple._
val a = (1,2,3)
val x = a(1)
x: Int // 2 // not just Any
// a(4) // doesn't compile
val i = 1
// a(i) // doesn't compile
https://github.com/milessabin/shapeless/wiki/Feature-overview:-shapeless-2.0.0#hlist-style-operations-on-standard-scala-tuples
Although this a(1) seems to be pretty similar to standard a._1.
I'm changing the column position of my DF, because I will put it into Cassandra.
The problems is that I have more that 22 columns and I get this error:
<console>:1: error: too many elements for tuple: 38, allowed: 22
I am using this procedure:
scala> val columns: Array[String] = firstDF.columns
columns: Array[String] = Array(HOCPNY, HOCOL, HONUMR, HOLINH, HODTTO, HOTOUR, HOCLIC, HOOE, HOTPAC, HODTAC, HOHRAC, HODESF, HOCDAN, HOCDRS, HOCDSL, HOOBS, HOTDSC, HONRAC, HOLINR, HOUSCA, HODTEA, HOHREA, HOUSEA, HODTCL, HOHRCL, HOUSCL, HODTRC, HOHRRC, HOUSRC, HODTRA, HOHRRA, HOUSRA, HODTCM, HOHRCM, HOUSCM, HODTUA, HOHRUA, HOUSER)
scala> val reorderedColumnNames: Array[String] = (hoclic,hotpac, hocdan, hocdrs,hocdsl,hocol,hocpny,hodesf,hodtac,hodtcl,hodtcm,hodtea,hodtra,hodtrc,hodtto,hodtua,hohrac,hohrcl,hohrcm,hohrea,hohrra,hohrrc,hohrua,holinh,holinr,honrac,honumr,hoobs,hooe,hotdsc,hotour,housca,houscl,houscm,housea,houser,housra,housrc)
<console>:1: error: too many elements for tuple: 38, allowed: 22
val reorderedColumnNames: Array[String] = (hoclic,hotpac,hocdan,hocdrs,hocdsl,hocol,hocpny,hodesf,hodtac,hodtcl,hodtcm,hodtea,hodtra,hodtrc,hodtto,hodtua,hohrac,hohrcl,hohrcm,hohrea,hohrra,hohrrc,hohrua,holinh,holinr,honrac,honumr,hoobs,hooe,hotdsc,hotour,housca,houscl,houscm,housea,houser,housra,housrc)
How can I solve?.
P.S. The table in Cassandra has this structure:
CREATE TABLE tfm.foehis(hocpny text, hocol text,honumr int,holinh text,hodtto date,hotour text,hoclic int,hooe text,hotpac text,hodtac int,hohrac int,hodesf text,hocdan text,hocdrs text,hocdsl text, hoobs text,hotdsc int,honrac int,holinr int,housca text,hodtea int,hohrea int,housea text,hodtcl int,hohrcl int,houscl text,hodtrc int,hohrrc int,housrc text,hodtra int,hohrra int,housra text,hodtcm int,hohrcm int,houscm text,hodtua int,hohrua int,houser text, PRIMARY KEY((hoclic),hotpac,hocdan));
val reorderedColumnNames: Array[String] = (hoclic,hotpac,hocdan,hocdrs,hocdsl,hocol,hocpny,hodesf,hodtac,hodtcl,hodtcm,hodtea,hodtra,hodtrc,hodtto,hodtua,hohrac,hohrcl,hohrcm,hohrea,hohrra,hohrrc,hohrua,holinh,holinr,honrac,honumr,hoobs,hooe,hotdsc,hotour,housca,houscl,houscm,housea,houser,housra,housrc)
The issue is in the definition of the right hand side of this assignment. Let's take a quick look and what happens with a smaller example
scala> val x = ("hello", "world")
x: (String, String) = (hello,world)
x became a two element tuple! That's because in scala (...) is syntax for making a tuple not a sequence. Instead you should use something like
scala> val x = Seq("hello", "world")
x: Seq[String] = List(hello, world)
To make a sequence or
scala> val x = Array("hello", "world")
x: Array[String] = Array(hello, world)
to make an array. Depending on what you need.
Something like:
val arr : Array[Array[Double]] = new Array(featureSize)
sc.parallelize(arr, 100).saveAsTextFile(args(1))
Then Spark will store data type into HDFS.
Array in Scala exactly corresponds to Java Arrays - in particular, it's a mutable type, and its toString method will return a reference to the Array. When you save this RDD as textFile, it's invoking toString method on each element of the RDD and therefore giving you gibberish. If you want to output actual elements of the Array, you first have to stringify the Array, for example by applying mkString(",") method to each array. Example from Spark shell:
scala> Array(1,2,3).toString
res11: String = [I#31cba915
scala> Array(1,2,3).mkString(",")
res12: String = 1,2,3
For double arrays:
scala> sc.parallelize(Array( Array(1,2,3), Array(4,5,6), Array(7,8,9) )).collect.mkString("\n")
res15: String =
[I#41ff41b0
[I#5d31aba9
[I#67fd140b
scala> sc.parallelize(Array( Array(1,2,3), Array(4,5,6), Array(7,8,9) ).map(_.mkString(","))).collect.mkString("\n")
res16: String =
1,2,3
4,5,6
7,8,9
So, your code should be:
sc.parallelize(arr.map(_.mkString(",")), 100).saveAsTextFile(args(1))
or
sc.parallelize(arr), 100).map(_.mkString(",")).saveAsTextFile(args(1))
Does anyone know how to create a lazy iterator in scala?
For example, I want to iterate through instantiating each element. After passing, I want the instance to die / be removed from memory.
If I declare an iterator like so:
val xs = Iterator(
(0 to 10000).toArray,
(0 to 10).toArray,
(0 to 10000000000).toArray)
It creates the arrays when xs is declared. This can be proven like so:
def f(name: String) = {
val x = (0 to 10000).toArray
println("f: " + name)
x
}
val xs = Iterator(f("1"),f("2"),f("3"))
which prints:
scala> val xs = Iterator(f("1"),f("2"),f("3"))
f: 1
f: 2
f: 3
xs: Iterator[Array[Int]] = non-empty iterator
Anyone have any ideas?
Streams are not suitable because elements remain in memory.
Note: I am using an Array as an example, but I want it to work with any type.
Scala collections have a view method which produces a lazy equivalent of the collection. So instead of (0 to 10000).toArray, use (0 to 10000).view. This way, there will be no array created in the memory. See also https://stackoverflow.com/a/6996166/90874, https://stackoverflow.com/a/4799832/90874, https://stackoverflow.com/a/4511365/90874 etc.
Use one of Iterator factory methods which accepts call-by-name parameter.
For your first example you can do one of this:
val xs1 = Iterator.fill(3)((0 to 10000).toArray)
val xs2 = Iterator.tabulate(3)(_ => (0 to 10000).toArray)
val xs3 = Iterator.continually((0 to 10000).toArray).take(3)
Arrays won't be allocated until you need them.
In case you need different expressions for each element, you can create separate iterators and concatenate them:
val iter = Iterator.fill(1)(f("1")) ++
Iterator.fill(1)(f("2")) ++
Iterator.fill(1)(f("3"))