I have a ReactiveMongo BSONDocument and I want to write it to a file according to the BSON spec (http://bsonspec.org/spec.html), but the problem is that I can't find any method call to do this. I've been able to convert it to an array of bytes; the trouble starts when I convert those bytes to a string, which is UTF-8 by default.
However, the BSON spec requires a 32-bit length at the beginning of the document. Is there a library that can do this for me? If not, how can I concatenate a string representing a 32-bit number and a UTF-8 string without corrupting the encoding of either one?
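(For concreteness, this is the kind of byte-level concatenation I mean; it's just an illustration with a made-up "hello" payload, not ReactiveMongo code:)
import java.nio.{ByteBuffer, ByteOrder}
import java.nio.charset.StandardCharsets

val payload = "hello".getBytes(StandardCharsets.UTF_8)
val framed = ByteBuffer
  .allocate(4 + payload.length)
  .order(ByteOrder.LITTLE_ENDIAN)   // BSON lengths are little-endian Int32
  .putInt(4 + payload.length)       // the 32-bit number up front
  .put(payload)                     // the UTF-8 bytes, untouched
  .array()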
Here's what I've got in Scala:
import java.nio.charset.Charset

import reactivemongo.bson.BSONDocument
import reactivemongo.bson.buffer.ArrayBSONBuffer

val doc = BSONDocument("data" -> overall)
val buffer = new ArrayBSONBuffer()
BSONDocument.write(doc, buffer)
val bytes = buffer.array
val str = new String(bytes, Charset.forName("UTF8")) // this is where the binary data gets mangled
For reference, I know that in Ruby we can do something like this; how do I do the same thing with ReactiveMongo?
bson_data = BSON.serialize({data: arr}).to_s
As indicated in the documentation, you can use BSONDocument.pretty(myDoc).
Note that you are using the deprecated BSON API, which is in the process of being removed.
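If the goal is the binary BSON itself on disk, a minimal sketch against the same (deprecated) API the question uses would skip the String conversion and write the byte array as-is; the output path below is just a placeholder:
import java.nio.file.{Files, Paths}

import reactivemongo.bson.BSONDocument
import reactivemongo.bson.buffer.ArrayBSONBuffer

val doc = BSONDocument("data" -> overall)   // `overall` is the value from the question
val buffer = new ArrayBSONBuffer()
BSONDocument.write(doc, buffer)

// buffer.array should already start with the little-endian Int32 length the spec
// calls for, so writing the raw bytes (no String round-trip) keeps the document valid BSON
Files.write(Paths.get("/tmp/doc.bson"), buffer.array)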
I'm trying to write a Dataset to a text file.
Example
datasets
  .write
  .text(path)
What I intend to write is "some\text" (the String that the Dataset contains).
For Scala to interpret this String, the value has to be written like this:
val text: String = "some\\text"
Of course, when testing in Scala, it prints out the correct value ("some\text").
But when I write this Dataset with spark.write, it comes out as "some\\text" in the file.
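A minimal, self-contained reproduction of what I mean (the SparkSession setup and the output path are just placeholders):
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// the literal "some\\text" is the single-backslash string some\text
val ds = Seq("some\\text").toDS()
ds.show()                    // prints some\text, as expected
ds.write.text("/tmp/out")    // the written file is where the doubled backslash shows up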
Reading the internal code, I only found an escape option for CSV writing.
Is there any way to solve this problem?
Thanks
I'm running a Spark job in Scala and I'm stuck parsing the input file.
The input file (TAB-separated) is something like:
date=20160701 name=mike age=26
date=20160402 name=john age=33
I want to parse it and extract only the values, not the keys, like this:
20160701 mike 26
20160402 john 33
How can this be achieved in Scala?
I'm using:
Scala version: 2.11
You can use CSVParser(); since you know the position of each key, it will be easy and clean.
Test data:
val data = "date=20160701\tname=mike\tage=26\ndate=20160402\tname=john\tage=33\n"
One statement to do what you asked
val rdd = sc.parallelize(data.split('\n'))
  .map(_.split('\t')             // split each line into key=value fields
    .map(_.split('=')(1)))       // split each field at "=" and keep only the value
Display what we got
rdd.collect().foreach(r=>println(r.mkString(",")))
// 20160701,mike,26
// 20160402,john,33
But don't do this in real code. It's very fragile in the face of data format errors, etc. Use CSVParser or something like it instead, as Narendra Parmar suggests.
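For illustration, a slightly more defensive variant (still not a real TSV parser) keys each record by field name so a missing or reordered field doesn't silently shift values; it reuses the `data` value from above, and the column list is an assumption taken from the sample data:
val wanted = Seq("date", "name", "age")

val rows = sc.parallelize(data.split('\n'))
  .map { line =>
    // turn "date=20160701\tname=mike\tage=26" into Map("date" -> "20160701", ...)
    val kv = line.split('\t').flatMap { field =>
      field.split("=", 2) match {
        case Array(k, v) => Some(k -> v)
        case _           => None          // drop malformed fields instead of throwing
      }
    }.toMap
    wanted.map(k => kv.getOrElse(k, ""))  // fixed column order, blank for missing fields
  }

rows.collect().foreach(r => println(r.mkString("\t")))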
val rdd = sc.textFile("path/to/input")   // placeholder path
rdd.map(x => x.split("\t").map(kv => kv.split("=")(1)).mkString("\t")).saveAsTextFile("path/to/output")   // placeholder path
I'm using Akka to develop a server application and I was wondering if there is a "cleaner" way to get a substring of a ByteString - something like
bytestr.getSubstringAtFor(start: Int, len: Int): ByteString
or similar. Right now I'm converting the ByteString to a list, creating another List[Byte], looping over it with a for loop and copying the relevant bytes to my new list, then converting that list of bytes back to a ByteString.
Is there a "cleaner" way to get a substring of a ByteString?
You should be able to use slice to get a contiguous subset of the bytes; it takes a start index (inclusive) and an end index (exclusive). For instance, if you had a ByteString wrapping the string "foobar" and wanted a ByteString of just "oob", that would look like this:
val bs = ByteString("foobar")
val subbs = bs.slice(1, 4)
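Since akka.util.ByteString is an immutable IndexedSeq[Byte], the usual collection operations work on it as well; a small sketch (the utf8String call is only there to show the decoded result):
import akka.util.ByteString

val bs = ByteString("foobar")
val subbs = bs.slice(1, 4)          // the bytes of "oob"
println(subbs.utf8String)           // "oob"

// drop/take is an equivalent spelling of the same slice
val same = bs.drop(1).take(3)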
I have a dataset of employees and their leave-records. Every record (of type EmployeeRecord) contains EmpID (of type String) and other fields. I read the records from a file and then transform into PairRDDFunctions:
val empRecords = sc.textFile(args(0))
....
val empsGroupedByEmpID = this.groupRecordsByEmpID(empRecords)
At this point, 'empsGroupedByEmpID' is of type RDD[(String, Iterable[EmployeeRecord])]. I transform this into PairRDDFunctions:
val empsAsPairRDD = new PairRDDFunctions[String,Iterable[EmployeeRecord]](empsGroupedByEmpID)
Then I process the records according to the application's logic. Finally, I get an RDD of type RDD[Iterable[EmployeeRecord]]:
val finalRecords: RDD[Iterable[EmployeeRecord]] = <result of a few computations and transformation>
When I try to write the contents of this RDD to a text file using the available API thus:
finalRecords.saveAsTextFile("./path/to/save")
I find that in the file every record begins with ArrayBuffer(...). What I need is a file with one EmployeeRecord per line. Is that not possible? Am I missing something?
I have spotted the missing API. It is, well... flatMap! :-)
By using flatMap with identity, I can get rid of the Iterable and 'unpack' the contents, like so:
finalRecords.flatMap(identity).saveAsTextFile("./path/to/file")
That solves the problem I have been having.
I also have found this post suggesting the same thing. I wish I saw it a bit earlier.
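For anyone else hitting this, a tiny self-contained demonstration of the difference (the toy data and paths are made up):
val nested = sc.parallelize(Seq(Iterable("a", "b"), Iterable("c")))

// each line is the toString of the whole Iterable, e.g. List(a, b)
// (or ArrayBuffer(...)/CompactBuffer(...) when it came out of a groupBy)
nested.saveAsTextFile("/tmp/nested")

// one element per line: a, b, c
nested.flatMap(identity).saveAsTextFile("/tmp/flat")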
I am trying to write a query with Casbah and Salat to query a field that contains part of a name.
I tried to use a regular expression like this (inside a SalatDAO):
val regexp = (""".*"""+serverName+""".*""").r
val query = "serverName" -> regexp
val result = find(MongoDBObject(query))
and with
val regexp = ".*"+serverName+".*"
The record is in MongoDB, and when I search for it with the complete name it works.
What is the right way to tell Casbah to search for part of the string?
Another thing I would like to fix is the string concatenation for the parameter.
Is there a default way to escape input parameters with Casbah, so that the parameter is not
interpreted as a JavaScript command?
Best Regards,
Oliver
In the mongodb shell you can find the server names that contain a specific string with:
db.collection.find({serverName:/whatever/i})
I don't have any experience with Casbah, but I believe it should be something like this. Please test:
val regexp = ("(?i)" + serverName).r   // (?i) gives the same case-insensitive behaviour as the /i flag above
find(MongoDBObject("serverName" -> regexp))
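On the escaping part of the question: one approach (plain java.util.regex, not Casbah-specific) is to quote the user-supplied text so any regex metacharacters in it are matched literally; a sketch:
import java.util.regex.Pattern

val safePart = Pattern.quote(serverName)   // wraps the input in \Q...\E
val regexp = ("(?i)" + safePart).r         // case-insensitive substring match
val result = find(MongoDBObject("serverName" -> regexp))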