Removing Data Type From Tuple When Printing In Scala - scala

I currently have two maps: -
mapBuffer = Map[String, ListBuffer[(Int, String, Float)]]
personalMapBuffer = Map[mapBuffer, String]
The idea of what I'm trying to do is create a list of something, and then allow a user to create a personalised list which includes a comment, so they'd have their own list of maps.
Everything above works; I am simply trying to print the information.
To print the Key from mapBuffer, I use: -
mapBuffer.foreach(line => println(line._1))
This returns: -
Sample String 1
Sample String 2
To print the same thing from personalMapBuffer, I am using: -
personalMapBuffer.foreach(line => println(line._1.map(_._1)))
However, this returns: -
List(Sample String 1)
List(Sample String 2)
I obviously would like it to just return "Sample String" and remove the List() aspect. I'm assuming this has something to do with the .map function, although this was the only way I could find to access a tuple within a tuple. Is there a simple way to remove the data type? I was hoping for something simple like: -
line._1.map(_._1).removeDataType
But obviously no such pre-function exists. I'm very new to Scala so this might be something extremely simple (which I hope it is haha) or it could be a bit more complex. Any help would be great.
Thanks.

What you see is the default List.toString behaviour. You can build your own string with the mkString operation:
val separator = ", "
personalMapBuffer.foreach(line => println(line._1.map(_._1).mkString(separator)))
This produces the desired result: Sample String 1, or Sample String 1, Sample String 2 if there are two strings.
Hope this helps!

I have found a way to get the result I was looking for, however I'm not sure if it's the best way.
The .map() method just returns a collection. You can see more info on that here:- https://www.geeksforgeeks.org/scala-map-method/
By using any sort of specific element finder at the end, I'm able to return only the element and not the data type. For example: -
line._1.map(_._1).head
As I was writing this Ivan Kurchenko replied above suggesting I use .mkString. This also works and looks a little bit better than .head in my mind.
line._1.map(_._1).mkString("")
Again, I'm not 100% sure this is the most efficient way, but it has worked for me so far.
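To make the difference between printing a collection and printing a joined string concrete, here is a minimal sketch that can be pasted into the Scala REPL or a script. The sample keys, tuples and the comment are made up for illustration; only the map shapes come from the question.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical sample data shaped like the maps in the question.
val mapBuffer: Map[String, ListBuffer[(Int, String, Float)]] = Map(
  "Sample String 1" -> ListBuffer((1, "first", 1.5f)),
  "Sample String 2" -> ListBuffer((2, "second", 2.5f))
)

// Each key of the personal map is itself a map; the value is the user's comment.
val personalMapBuffer: Map[Map[String, ListBuffer[(Int, String, Float)]], String] =
  Map(mapBuffer -> "my comment")

// line._1 is the inner map, so line._1.keys is the collection of its String keys.
// mkString joins them into a single String, so no "List(...)" wrapper is printed.
val rendered: List[String] =
  personalMapBuffer.toList.map(line => line._1.keys.mkString(", "))

rendered.foreach(println)
```

Note that .head returns only the first key, while mkString keeps all of them, so the two are only interchangeable when each inner map has exactly one entry.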

Related

Match part of a string with regex

I have two arrays of strings and I want to check if a string of array a matches a string from array b. Those strings are phone numbers that might come in different formats. For example:
Array a might have a phone number with prefix like so +44123123123 or 0044123123123
Array b have a standard format without prefixes like so 123123123
So I'm looking for a regex that can match a part of a string like +44123123123 with 123123123
Btw I'm using Swift but I don't think there's a native way to do it (at least a more straightforward solution)
EDIT
I decided to reactivate the question after experimenting with the library @Larme mentioned because of inconsistent results.
I'd prefer a simpler solution as I've stated earlier.
SOLUTION
Thanks guys for the responses. I saw many comments saying that Regex is not the right solution for this problem. And this is partly true. It could be true (or false) depending on my current setup/architecture ( which thinking about it now I realise that I should've explained better).
So I ended up using the native solution (hasSuffix/contains), but to do that I had to refactor the way the entire flow was structured. In the end I think it was the less complicated and more performant of the two solutions. I'll give the bounty to @Alexey Inkin for being the first to mention the native solution and the right answer to @Ωmega for providing a more complete solution.
I believe regex is not the right approach for this task.
Instead, you should do something like this:
var c : [String] = b.filter ({ (short : String) -> Bool in
    var result = false
    for full in a {
        result = result || full.hasSuffix(short)
    }
    return result
})
Check this demo.
...or similar solution like this:
var c : [String] = b.filter ({ (short : String) -> Bool in
    for full in a {
        if full.hasSuffix(short) { return true }
    }
    return false
})
Check this demo.
As you do not mention requirements to prefixes, the simplest solution is to check if string in a ends with a string in b. For this, take a look at https://developer.apple.com/documentation/swift/string/1541149-hassuffix
Then, if you have to check if the prefix belongs to a country, you may replace ^00 with + and then run a whitelist check against known prefixes. And the prefix itself can be obtained as a substring by cutting b's length of characters. Not really a regex's job.
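The suffix check and the prefix handling described above can be sketched as follows. The sketch is written in Scala rather than Swift, to match the other examples on this page, and the sample numbers and the helper names normalise and prefixOf are hypothetical.

```scala
// Hypothetical sample data: `a` holds prefixed numbers, `b` holds local numbers.
val a = List("+44123123123", "0044555000111")
val b = List("123123123", "999888777")

// Keep every local number that at least one prefixed number ends with.
val matched: List[String] = b.filter(short => a.exists(full => full.endsWith(short)))

// Replace a leading "00" with "+" so both prefix styles look the same.
def normalise(full: String): String = full.replaceFirst("^00", "+")

// The prefix is what remains after cutting off as many characters as the local number has.
def prefixOf(full: String, short: String): String =
  normalise(full).dropRight(short.length)
```

The value returned by prefixOf could then be compared against a whitelist of known country prefixes.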
I agree with Alexey Inkin that this can also nicely be solved without regex. If you really want a regex, you can try something like the following:
(?:(\+|00)(93|355|213|1684|376))?(\d+)
          ^^^^^^^^^^^^^^^^^^^^^ Add here all your expected country prefixes (see below)
^^^                            ^^ Match a country prefix if it exists but don't give it a group number
   ^^^^^^^ Match the "prefix-prefix" (+ or 00)
                                 ^^^^^ Match the local phone number
Unfortunately, with this regex you have to provide all the expected country prefixes. But you can surely get this list online, e.g. here: https://www.countrycode.org
With this regex above you will get the local phone number in matching group 3 (and the "prefix-prefix" in group 1 and the country code in group 2).
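As a rough illustration of how the groups come out, here is the same regex used from Scala (chosen to match the other examples on this page, not the asker's Swift). The helper name localNumber is hypothetical, and the pattern is applied to the whole string, so a number whose country code is not in the list does not match at all.

```scala
// The regex from the answer, unchanged; only five sample country codes are listed.
val PhonePattern = """(?:(\+|00)(93|355|213|1684|376))?(\d+)""".r

// Returns the local number (group 3) when the whole input matches the pattern.
// Groups 1 and 2 are null when no prefix is present; the `_` patterns accept that.
def localNumber(input: String): Option[String] = input match {
  case PhonePattern(_, _, local) => Some(local)
  case _                         => None
}
```

For example, localNumber("+355123123123") and localNumber("0093123123123") both yield Some("123123123"), while "+44123123123" yields None because 44 is not among the listed prefixes.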

Using ORACLE-LIKE like feature between Spark DataFrame and a List of words - Scala

My requirement is similar to one in :
LINK
Instead of a direct match, I need a LIKE-type match against a list, i.e. I want to LIKE-match COMMENTS against List.
ID,COMMENTS
1,bad is he
2,hell thats good
3,sick !thats hell
4,That was good
List = ('good','horrible','hell')
I want to get output like
ID, COMMENTS,MATCHED_WORD,NUM_OF_MATCHES
1,bad is he,,
2,hell thats good,(hell,good),2
3,sick !thats hell,hell,1
4,That was good,good,1
In simpler terms, I need something like the following (as far as I know, rlike does not match against a list of values; it expects a single string):
file.select($"COMMENTS",$"ID").filter($"COMMENTS".rlike(List_ :_*)).show()
I tried isin, which works but matches WHOLE WORDS ONLY.
file.select($"COMMENTS",$"ID").filter($"COMMENTS".isin(List_ :_*)).show()
Please help or point me to any relevant links; I have done a lot of searching!
With simple words I'd use a regex alternation:
val xs = Seq("good", "horrible", "hell")
df.filter($"COMMENTS".rlike(xs.mkString("|")))
otherwise:
// needs: import org.apache.spark.sql.functions.lit
df.filter(xs.foldLeft(lit(false))((acc, x) => acc || $"COMMENTS".rlike(x)))
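The filters above only keep matching rows; they do not produce the MATCHED_WORD and NUM_OF_MATCHES columns from the question. As a sketch of that logic only, here it is on plain Scala collections rather than a Spark DataFrame, using the sample rows from the question and simple substring matching as an assumed stand-in for LIKE.

```scala
val words = Seq("good", "horrible", "hell")

// Sample rows from the question as (ID, COMMENTS) pairs.
val comments = Seq(
  1 -> "bad is he",
  2 -> "hell thats good",
  3 -> "sick !thats hell",
  4 -> "That was good"
)

// For each row, collect the list words that occur as substrings of the comment,
// then report them together with how many were found.
val result: Seq[(Int, String, Seq[String], Int)] = comments.map { case (id, comment) =>
  val matched = words.filter(w => comment.contains(w))
  (id, comment, matched, matched.size)
}

result.foreach(println)
```

The matched words come out in the order of the word list, not in the order they appear in the comment.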

How to find most frequent string in List of strings

I have a list of strings (List[String]) and I want to obtain the most frequent string from this list:
val list1 = List("a","a","0","b","b","a")
The answer should be:
freq_list1 = a
I was thinking to use list1.sliding(2).count... in order to get the count of unique string, but I don't know how to wrap it into finding the most frequent string.
list1.groupBy(identity).mapValues(_.size).maxBy(_._2)._1
EDIT: See comment below, can be made shorter by using maxBy(_._2.size) without mapping beforehand, thanks @kawty
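Spelled out step by step, the one-liner and its shorter variant look roughly like this. The intermediate names counts, mostFrequent and mostFrequentShort are only for illustration.

```scala
val list1 = List("a", "a", "0", "b", "b", "a")

// Group equal strings together, then replace each group by its size.
val counts: Map[String, Int] =
  list1.groupBy(identity).map { case (s, occurrences) => s -> occurrences.size }

// Pick the entry with the largest count and keep only its key.
val mostFrequent: String = counts.maxBy(_._2)._1

// Shorter variant: compare group sizes directly, without building the counts map.
val mostFrequentShort: String = list1.groupBy(identity).maxBy(_._2.size)._1
```

Both variants throw on an empty list, because maxBy needs at least one element; if several strings tie for the highest count, which one is returned depends on the map's iteration order.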

How do I print the contents of an ApacheSpark RDD in my terminal?

This is my first time using Scala and ApacheSpark for a project. I'm trying to print the contents of a matrix when I run my code in the terminal, but nothing I try is working so far.
Instead I only get this printed:
org.apache.spark.mllib.linalg.distributed.MatrixEntry;@71870da7
org.apache.spark.mllib.linalg.distributed.CoordinateMatrix@1dcca8d3
I am just using println(), but when I use collect(), that doesn't give a good result either.
The default toString prints the name of the class followed by @ and the object's hash code in hexadecimal.
org.apache.spark.mllib.linalg.distributed.MatrixEntry;@71870da7
You're going to want to find a way to iterate through your matrix and print each element.
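The effect can be reproduced without Spark. In this sketch, Entry is a hypothetical stand-in for any class that does not override toString; it is not the real MatrixEntry.

```scala
// Hypothetical stand-in for a class without its own toString.
class Entry(val i: Long, val j: Long, val value: Double)

val entries = Array(new Entry(0, 0, 1.0), new Entry(0, 1, 2.5))

// Default rendering inherited from java.lang.Object: class name, '@', hash code in hex.
val opaque: String = entries(0).toString

// Explicit rendering: visit each element and format the fields you care about.
val readable: String =
  entries.map(e => s"(${e.i}, ${e.j}, ${e.value})").mkString("\n")

println(opaque)
println(readable)
```

Printing the array itself gives a similarly opaque result, since JVM arrays do not override toString either.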
Building on @zero323's comment (as an aside, would you like to post an answer?): given an RDD[SomeType] you can call
rdd.collect()
or
rdd.take(k)
Then you can print out the results using normal toString() methods that depend on the type of the rdd contents. So if SomeType were a List[Double] then the
println(s"${rdd.collect().mkString(",")}")
would give you a single-line comma separated output of the results.
As @zero323 noted, another consideration is: "do you really want to print out the contents of your rdd?" More likely you might only want a summary - such as
println(s"Number of entries in RDD is ${rdd.count()}")
Iterate over the rdd like this,
rdd.foreach(println)
scala> val rdd1 = sc.parallelize(List(1,2,3,4)).map(_*2)
To print the data within RDD
scala> rdd1.collect().foreach(println)
Output:
2
4
6
8

Parsing options that take more than one value with scopt in scala

I am using scopt to parse command line arguments in scala. I want it to be able to parse options with more than one value. For instance, the range option, if specified, should take exactly two values.
--range 25 45
Coming, from python background, I am basically looking for a way to do the following with scopt instead of python's argparse:
parser.add_argument("--range", default=None, nargs=2, type=float,
metavar=('start', 'end'),
help=(" Foo bar start and stop "))
I don't think minOccurs and maxOccurs solve my problem exactly, nor does the key:value example in its help.
Looking at the source code, this is not possible. The Read type class used has a member tuplesToRead, but it doesn't seem to be working when you force it to 2 instead of 1. You will have to make a feature request, I guess, or work around this by using --min 25 --max 45, or --range '25 45' with a custom Read instance that splits this string into two parts. As @roterl noted, this is not a standard way of parsing.
It should be OK as long as your values are delimited with something other than a space...
--range 25-45
... although you need to split them manually. Parse it with something like:
opt[String]('r', "range").action { (x, c) =>
  val rx = "([0-9]+)\\-([0-9]+)".r
  val rx(from, to) = x
  c.copy(from = from.toInt, to = to.toInt)
}
// ...
println(s" Got range ${parsedArgs.from}..${parsedArgs.to}")
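The manual split can also be written so that malformed input does not throw. The following is a sketch in plain Scala, independent of scopt; the names RangePattern and parseRange are hypothetical, and wiring the result into a scopt option is left out.

```scala
import scala.util.Try

// Same idea as the pattern above: two runs of digits separated by a hyphen.
val RangePattern = "([0-9]+)-([0-9]+)".r

// Returns Some((from, to)) for input like "25-45", and None for anything else,
// including digit runs too large to fit in an Int.
def parseRange(raw: String): Option[(Int, Int)] = raw match {
  case RangePattern(from, to) => Try((from.toInt, to.toInt)).toOption
  case _                      => None
}
```

By contrast, the pattern-binding form val rx(from, to) = x throws a MatchError when the argument does not match, so the Option-returning version may be easier to turn into a readable error message.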