Find all names in a MongoDB collection containing a given substring - Scala

Hi there, I am using Scala and play2-reactivemongo version 0.16.2-play26.
I want to run a search query on a JSON collection that returns all names containing a given substring. I have come across using $text with indexes, but I am not sure how to set this up in the version of ReactiveMongo I am using. Is $text supposed to be declared in my code?
Does anyone have an example written in Scala?
Many thanks

Thank you #cchantep :)
I managed to solve this problem with this code:
def searchItem(name: String): Future[List[Item]] =
  jsonCollectionFuture.flatMap(
    _.find(Json.obj("name" -> Json.obj("$regex" -> (".*" + name + ".*"))), None)
      .cursor[Item](ReadPreference.Primary)
      .collect[List](-1, Cursor.FailOnError[List[Item]]())
  )

In case you want a BSONDocument, please use the below code snippet:
val query = BSONDocument("columnName" -> "tally")
collection.find(query, Option.empty[BSONDocument])
  .cursor[BSONDocument]()
  .collect[List](-1, Cursor.FailOnError[List[BSONDocument]]())
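Since the $regex value is an ordinary regular expression, the .*name.* pattern simply means "contains name" (in fact MongoDB $regex is unanchored, so the wrapping .* is optional). A minimal plain-Scala check of the same pattern, outside MongoDB:

```scala
// The same "contains" pattern the $regex query uses, checked in plain Scala.
def containsPattern(name: String): String = ".*" + name + ".*"

val pattern = containsPattern("mongo").r

// Regex#matches requires the whole string to match, which is what the
// leading and trailing .* allow.
val hit  = pattern.matches("play2-reactivemongo")
val miss = pattern.matches("postgres")
```

One caveat: if name can contain regex metacharacters (., *, and so on), consider escaping it with java.util.regex.Pattern.quote before building the pattern.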

Related

Removing Data Type From Tuple When Printing In Scala

I currently have two maps: -
mapBuffer = Map[String, ListBuffer[(Int, String, Float)]]
personalMapBuffer = Map[mapBuffer, String]
The idea of what I'm trying to do is create a list of something, and then allow a user to create a personalised list which includes a comment, so they'd have their own list of maps.
I am simply trying to print information as everything is good from the above.
To print the Key from mapBuffer, I use: -
mapBuffer.foreach(line => println(line._1))
This returns: -
Sample String 1
Sample String 2
To print the same thing from personalMapBuffer, I am using: -
personalMapBuffer.foreach(line => println(line._1.map(_._1)))
However, this returns: -
List(Sample String 1)
List(Sample String 2)
I obviously would like it to just return "Sample String" and remove the List() aspect. I'm assuming this has something to do with the .map function, although this was the only way I could find to access a tuple within a tuple. Is there a simple way to remove the data type? I was hoping for something simple like: -
line._1.map(_._1).removeDataType
But obviously no such pre-function exists. I'm very new to Scala so this might be something extremely simple (which I hope it is haha) or it could be a bit more complex. Any help would be great.
Thanks.
What you see is the default List.toString behaviour. You can build your own string with the mkString operation:
val separator = ","
personalMapBuffer.foreach(line => println(line._1.map(_._1).mkString(separator)))
which will produce the desired result of Sample String 1, or Sample String 1, Sample String 2 if there are two strings.
Hope this helps!
I have found a way to get the result I was looking for; however, I'm not sure if it's the best way.
The .map() method just returns a collection. You can see more info on that here:- https://www.geeksforgeeks.org/scala-map-method/
By using any sort of specific element finder at the end, I'm able to return only the element and not the data type. For example: -
line._1.map(_._1).head
As I was writing this, Ivan Kurchenko replied above suggesting I use .mkString. This also works and looks a little better than .head to my mind.
line._1.map(_._1).mkString("")
Again, I'm not 100% sure this is the most efficient way, but it has worked for me so far.
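For what it's worth, the difference between the two suggestions above comes down to List.toString versus mkString; a small standalone illustration (sample values made up to mirror the output shown):

```scala
val names = List("Sample String 1", "Sample String 2")

val asToString = names.toString       // default rendering, keeps the List(...) wrapper
val joined     = names.mkString(", ") // joins elements with a separator, no wrapper
val firstOnly  = names.head           // just the first element
```

Note that .head throws on an empty list, while mkString simply yields an empty string, which is one more reason to prefer it here.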

How to remove header by using filter function in spark?

I want to remove header from a file. But, since the file will be split into partitions, I can't just drop the first item. So I was using a filter function to figure it out and here below is the code I am using :
val noHeaderRDD = baseRDD.filter(line=>!line.contains("REPORTDATETIME"));
and the error I am getting says "error: not found: value line". What could be the issue with this code?
I don't think anybody pointed out the obvious, which is that infix line contains works as well:
val noHeaderRDD = baseRDD.filter(line => !(line contains "REPORTDATETIME"))
You were nearly there, just a syntax issue, but that is significant of course!
Using textFile as below:
val rdd = sc.textFile(<<path>>)
rdd.filter(x => !x.startsWith(<<"Header Text">>))
Or
In Spark 2.0:
spark.read.option("header","true").csv("filePath")
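Since the predicate passed to filter is plain Scala, it can be sanity-checked on a local collection before running it on the RDD; a sketch with made-up sample lines:

```scala
// Hypothetical CSV lines; the first one is the header.
val lines = List(
  "REPORTDATETIME,LOCATION,COUNT",
  "2020-01-01,NW,12",
  "2020-01-02,SE,7"
)

// Same predicate as the RDD filter: keep every line that does not
// contain the header marker.
val noHeader = lines.filter(line => !line.contains("REPORTDATETIME"))
```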

Using ORACLE-LIKE like feature between Spark DataFrame and a List of words - Scala

My requirement is similar to one in :
LINK
Instead of direct match I need LIKE type match on a list. i.e Want to LIKE match COMMENTS with List
ID,COMMENTS
1,bad is he
2,hell thats good
3,sick !thats hell
4,That was good
List = ('good','horrible','hell')
I want to get output like
ID, COMMENTS,MATCHED_WORD,NUM_OF_MATCHES
1,bad is he,,
2,hell thats good,(hell,good),2
3,sick !thats hell,hell,1
4,That was good,good,1
In simpler terms I need the following (rlike doesn't match values from a list; as far as I know it expects one single string):
file.select($"COMMENTS",$"ID").filter($"COMMENTS".rlike(List_ :_*)).show()
I tried isin , that works but matches WHOLE WORDS ONLY.
file.select($"COMMENTS",$"ID").filter($"COMMENTS".isin(List_ :_*)).show()
Kindly help, or please redirect me to any relevant links, as I have already tried a lot of searching!
With simple words I'd use an alternative:
val xs = Seq("good", "horrible", "hell")
df.filter($"COMMENTS".rlike(xs.mkString("|")))
otherwise:
df.filter(xs.foldLeft(lit(false))((acc, x) => acc || $"COMMENTS".rlike(x)))
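The question also asks for the matched words and their count, which a filter alone does not produce. A plain-Scala sketch of the per-comment matching (in Spark the same function could be wrapped in a UDF; the word list and comments are taken from the question):

```scala
val words = Seq("good", "horrible", "hell")

// For one comment, collect every word from the list that occurs in it.
def matchedWords(comment: String): Seq[String] =
  words.filter(w => comment.toLowerCase.contains(w))

val comments = Seq("bad is he", "hell thats good", "sick !thats hell", "That was good")

// (COMMENTS, MATCHED_WORD, NUM_OF_MATCHES)
val result = comments.map { c =>
  val m = matchedWords(c)
  (c, m, m.size)
}
```

This uses contains rather than rlike semantics; for genuine regex matching, replace the test with w.r.findFirstIn(comment).isDefined.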

Is there a way to filter a field not containing something in a spark dataframe using scala?

Hopefully I'm stupid and this will be easy.
I have a dataframe containing the columns 'url' and 'referrer'.
I want to extract all the referrers that contain the top level domain 'www.mydomain.com' and 'mydomain.co'.
I can use
val filteredDf = unfilteredDf.filter(($"referrer").contains("www.mydomain."))
However, this also pulls out www.google.co.uk search URLs that happen to contain my web domain. Is there a way, using Scala in Spark, to filter out anything with google in it while keeping the correct results?
Thanks
Dean
You can negate a predicate using either not or !, so all that's left is to add another condition:
import org.apache.spark.sql.functions.not
df.where($"referrer".contains("www.mydomain.") &&
not($"referrer".contains("google")))
or separate filter:
df
.where($"referrer".contains("www.mydomain."))
.where(!$"referrer".contains("google"))
You may use a Regex. Here you can find a reference for the usage of regex in Scala. And here you can find some hints about how to create a proper regex for URLs.
Thus in your case you will have something like the following (note that a Regex cannot be applied to a Column directly, so the check is wrapped in a UDF):
import org.apache.spark.sql.functions.udf

val regex = "PUT_YOUR_REGEX_HERE".r // something like (https?|ftp)://www.mydomain.com?(/[^\s]*)? should work
val matchesRegex = udf { referrer: String => regex.findFirstIn(referrer).isDefined }
val filteredDf = unfilteredDf.filter(matchesRegex($"referrer"))
This solution requires a bit of work but is the safest one.
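Whichever of the answers above you go with, the boolean condition itself can be checked locally first; a plain-Scala version of the "contains my domain but not google" predicate (the referrer values are made up):

```scala
// Hypothetical referrer values.
val referrers = List(
  "https://www.mydomain.com/page",
  "https://www.google.co.uk/search?q=www.mydomain.com",
  "https://other.example/landing"
)

// Same boolean logic as the DataFrame filter.
def keep(referrer: String): Boolean =
  referrer.contains("www.mydomain.") && !referrer.contains("google")

val kept = referrers.filter(keep)
```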

Casbah/Salat: How to query a field that a part of a string is contained?

I am trying to write a query with Casbah and Salat that matches a field containing part of a name.
I tried to use a regular expression like this (inside a SalatDAO):
val regexp = (""".*"""+serverName+""".*""").r
val query = "serverName" -> regexp
val result = find(MongoDBObject(query))
and with
val regexp = ".*"+serverName+".*"
The record is in MongoDB, and when I search for it with the complete name it works.
What is the right way to tell Casbah to search for a part of the string?
Another thing I would like to fix is the string concatenation for the parameter. Is there any default way to escape input parameters with Casbah, so that the parameter is not interpreted as a JavaScript command?
Best Regards,
Oliver
In the mongodb shell you can find the server names containing the specific string with:
db.collection.find({serverName: /whatever/i})
I don't have any experience with Casbah, but I believe it must be something like this (in Scala the /whatever/i flag syntax becomes an inline (?i) flag, and an unanchored pattern already matches substrings). Please test:
val regexp = ("(?i)" + serverName).r
find(MongoDBObject("serverName" -> regexp))
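On the escaping question raised above: java.util.regex.Pattern.quote wraps its argument in \Q...\E so any metacharacters in the server name are matched literally, and an inline (?i) gives the case-insensitivity of the shell's /whatever/i. A plain-Scala sketch of building such a pattern (the resulting Regex can then be used in the MongoDBObject exactly as before):

```scala
import java.util.regex.Pattern

// Build a case-insensitive "contains" regex, escaping any regex
// metacharacters in the user-supplied fragment.
def containsRegex(fragment: String): scala.util.matching.Regex =
  ("(?i).*" + Pattern.quote(fragment) + ".*").r

val re = containsRegex("app.server") // the dot is matched literally

val hit  = re.matches("Prod-APP.SERVER-01")
val miss = re.matches("prod-appXserver-01")
```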