If I have a list of pairs, e.g. (String, int), how do I add up all the ints whose strings are the same into one single pair? - eclipse

I have this problem: I have a list of (string, int) pairs and I want to sum the ints that have the same string, e.g.:
list -> [("a",1);("b",1);("a",1);("c",1)] should return
list -> [("a",2);("b",1);("c",1)] (order doesn't matter; I will sort later)
For now I have this:
let rec merge l =
  match l with
  | [] -> []
  | (c1,n1)::(c2,n2)::xs ->
    if c1 = c2
    then (c1, n1+n2)::merge xs
    else something i cant think of yet
;;
That's my train of thought, but I know it won't work yet.
P.S. I can't use the imperative features supported by OCaml.

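If you assume that your list is sorted, the "something i cant think of yet" is simply copying the head of the list and applying the function recursively to the tail: (c1,n1)::merge ((c2,n2)::xs). Two more changes make the function complete: the then branch should be merge ((c1, n1+n2)::xs), so that a run of three or more equal strings keeps accumulating into one pair, and the match needs a | [p] -> [p] case, because a single-element list matches neither existing pattern. To get a sorted list in the first place, call merge (List.sort compare l).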
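For comparison, here is the complete sorted-list merge sketched in Scala, the language of most of the other threads on this page (it is a transliteration of the idea, not OCaml). It assumes the list is sorted by key, and it recurses on the merged pair rather than on the rest of the list, so a run of three or more equal keys is summed into one pair.

```scala
// Sums the Ints of adjacent pairs that share a key.
// Assumes equal keys are adjacent, i.e. the list is sorted by key.
def merge(l: List[(String, Int)]): List[(String, Int)] = l match {
  // Same key: fold the second pair into the first and keep merging,
  // so a run of three or more equal keys is summed completely.
  case (c1, n1) :: (c2, n2) :: xs if c1 == c2 => merge((c1, n1 + n2) :: xs)
  // Different key, or a single pair left: the head is final.
  case p :: xs => p :: merge(xs)
  case Nil => Nil
}

val input = List(("a", 1), ("b", 1), ("a", 1), ("c", 1))
// Sort by key first so that equal keys become adjacent.
val merged = merge(input.sortBy(_._1))
```

Here `merged` is List(("a", 2), ("b", 1), ("c", 1)). If a Map is acceptable instead of a list, Scala 2.13's `input.groupMapReduce(_._1)(_._2)(_ + _)` groups and sums in one call without sorting.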

Related

Scala combination function issue

I have an input file like this:
The Works of Shakespeare, by William Shakespeare
Language: English
and I want to use flatMap with the combinations method to get the K-V pairs per line.
This is what I do:
var pairs = input.flatMap { line =>
  line.split("[\\s*$&#/\"'\\,.:;?!\\[\\(){}<>~\\-_]+")
    .filter(_.matches("[A-Za-z]+"))
    .combinations(2)
    .toSeq
    .map { case array => array(0) -> array(1) }
}
I got 17 pairs after this, but 2 of them are missing: (by,shakespeare) and (william,shakespeare). I think there might be something wrong with the last word of the first sentence, but I don't know how to solve it. Can anyone tell me?
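The combinations method treats equal elements as interchangeable: it returns each combination only once, with the elements taken in order of their first occurrence, never also in the opposite order. Because "shakespeare" occurs twice and first appears before "by" and "william", those pairs come out as (shakespeare,by) and (shakespeare,william). So the values you are missing already appear in the result, just in the other order.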
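A quick local check of this, assuming the first line has already been split into words (punctuation dropped, original case kept); `words` and `pairs` are illustrative names:

```scala
// The first line of the question's input, already split into words.
val words = List("The", "Works", "of", "Shakespeare", "by", "William", "Shakespeare")

// combinations(2) returns each pair of words once, with the elements in order
// of their first index, so the repeated "Shakespeare" always comes before
// words that first appear after it.
val pairs = words.combinations(2).map(c => c(0) -> c(1)).toList
```

Here `pairs` has 16 elements: 15 pairs of distinct words plus ("Shakespeare", "Shakespeare"). It contains ("Shakespeare", "by") and ("Shakespeare", "William") but not ("by", "Shakespeare"); together with the single pair from the second line, that is presumably the 17 pairs reported in the question.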
This code will create all ordered pairs of words in the text.
for {
  line <- input
  t <- line.split("""\W+""").tails if t.length > 1
  a = t.head
  b <- t.tail
} yield a -> b
Here is the description of the tails method:
Iterates over the tails of this traversable collection. The first value will be this traversable collection and the final one will be an empty traversable collection, with the intervening values the results of successive applications of tail.
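A self-contained sketch of the tails approach on the first line of the question's input, using a plain String in place of `input`:

```scala
val line = "The Works of Shakespeare, by William Shakespeare"

// Each tail with at least two words pairs its head with every later word,
// so every (earlier word, later word) pair is produced, repeats included.
// The length guard also keeps t.tail from being called on the empty tail.
val orderedPairs = (for {
  t <- line.split("""\W+""").toList.tails if t.length > 1
  b <- t.tail
} yield t.head -> b).toList
```

Here `orderedPairs` holds all 21 pairs of the seven words, including ("by", "Shakespeare") and ("William", "Shakespeare"). Unlike combinations, nothing is deduplicated, so the repeated word is also paired with itself.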

scala Convert String to tuple and insert into list

Code:
var tup = ""
var l1 = new ListBuffer[String]()
tup = ""
for (element1 <- tds) {
  tup += element1.text + "|"
}
l1 += tup
l1
Output:
ListBuffer(STANDINGS|CONFERENCE|OVERALL|, ACC|W-L|GB|PCT|W-L|PCT|STRK|, North Carolina|14-2|--|.875|29-5|.853|L1|, Duke|13-3|1|.813|27-6|.818|L1|)
Now this is a list of strings. I want it to be a list of tuples.
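You can't. The thing you're looking for (assuming you want to split on |) is not well-typed. You would get
ListBuffer(("STANDINGS", "CONFERENCE", "OVERALL"), ("ACC", "W-L", "GB", ...), ...)
The first element would be a Tuple3[String, String, String] and the second a Tuple7[String, ..., String], and a ListBuffer, like all collections, has a single element type, so it can't hold tuples of different arities while keeping their precise types. You can get a ListBuffer of arrays, though. Note that split with a String argument treats it as a regular expression, in which a bare | matches the empty string, so split on the character '|' instead:
l1.map(_.split('|'))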
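A sketch of both options on three of the rows shown above (the names `rows` and `headers` are illustrative): split every string into a List[String], and turn only the rows whose length is known, here the three-column header, into a tuple by pattern matching:

```scala
import scala.collection.mutable.ListBuffer

val l1 = ListBuffer(
  "STANDINGS|CONFERENCE|OVERALL|",
  "ACC|W-L|GB|PCT|W-L|PCT|STRK|",
  "North Carolina|14-2|--|.875|29-5|.853|L1|")

// split(Char) splits on that literal character; the empty string after the
// trailing '|' is dropped, as with Java's String.split.
val rows: ListBuffer[List[String]] = l1.map(_.split('|').toList)

// A row becomes a tuple only when its arity is fixed in the pattern;
// rows of any other length are skipped by collect.
val headers: ListBuffer[(String, String, String)] =
  rows.collect { case List(a, b, c) => (a, b, c) }
```

Here `rows(0)` is List("STANDINGS", "CONFERENCE", "OVERALL") and `headers` contains only that row as a Tuple3.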
I used List[List[String]], and now I am able to refer to each element.
I am adding Lists to a List like this:
(1::2::Nil)::(5::7::Nil)::Nil
Now my output is like this:
List(List(STANDINGS, CONFERENCE, OVERALL), List(ACC, W-L, GB, PCT, W-L, PCT, STRK))

Set of functions that are instances in a common way

I'm pretty new to Haskell and I think I'm falling into some OO traps. Here's a sketch of a (simplified) structure that I'm having trouble implementing:
A concept of an Observable that acts on a list of samples (Int) to produce a result (Int)
A concept SimpleObservable that achieves the result using a certain pattern (while there will be Observables that do it other ways), e.g. something like an average
A function instance, e.g. one that's just an average times a constant
My first thought was to use a subclass; something like the following (it's kinda contrived but hopefully gets the idea across):
class Observable a where
  estimate :: a -> [Int] -> Int

class (Observable a) => SimpleObservable a where
  compute :: a -> Int -> Int
  simpleEstimate :: a -> [Int] -> Int
  simpleEstimate obs list = sum $ map (compute obs) list

data AveConst = AveConst Int

instance Observable AveConst where
  estimate = simpleEstimate

instance SimpleObservable AveConst where
  compute (AveConst c) x = c * x
However, even if something like the above compiles, it's ugly. Googling tells me DefaultSignatures might help, in that I wouldn't have to write estimate = simpleEstimate for each instance, but from the discussions around it, doing it this way seems to be an antipattern.
Another option would be to have no subclass, but something like (with the same Observable class):
data AveConst = AveConst Int

instance Observable AveConst where
  estimate (AveConst c) list = sum $ map (*c) list
But this way I'm not sure how to reuse the pattern; each Observable has to contain the complete estimate definition and there will be code repetition.
A third way is a type with a function field:
data SimpleObservable = SimpleObservable {
    compute :: Int -> Int
  }

instance Observable SimpleObservable where
  estimate obs list =
    sum $ map (compute obs) list

aveConst :: Int -> SimpleObservable
aveConst c = SimpleObservable {
    compute = (*c)
  }
But I'm not sure this is idiomatic either. Any advice?
I propose going even simpler:
type Observable = [Int] -> Int
Then, an averaging observable is:
average :: Observable
average ns = sum ns `div` length ns
If your Observable needs some data inside -- say, a constant to multiply by -- no problem; that's what closures are for. For example:
sumTimesConst :: Int -> Observable
sumTimesConst c = sum . map (c*)
You can abstract over the construction of Observables without trouble; e.g. if you want a SimpleObservable which only looks at elements, and then sums, you can:
type SimpleObservable = Int -> Int
timesConst :: Int -> SimpleObservable
timesConst = (*)
liftSimple :: SimpleObservable -> Observable
liftSimple f = sum . map f
Then liftSimple . timesConst is another perfectly fine way to spell sumTimesConst.
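The same function-type approach transliterated into Scala, as a sketch for comparison with the Haskell above (it is not Haskell; the names mirror the answer's):

```scala
// An "observable" is just a function from samples to a result.
type Observable = List[Int] => Int

// Integer average of a non-empty list. Scala's / rounds toward zero,
// whereas Haskell's `div` rounds down, so they differ for negative sums.
val average: Observable = ns => ns.sum / ns.length

// Data such as a constant lives in the closure, not in a type-class instance.
def sumTimesConst(c: Int): Observable = ns => ns.map(c * _).sum

// A per-sample function, lifted to an Observable by mapping and summing.
type SimpleObservable = Int => Int
def timesConst(c: Int): SimpleObservable = c * _
def liftSimple(f: SimpleObservable): Observable = ns => ns.map(f).sum
```

`liftSimple(timesConst(3))` and `sumTimesConst(3)` compute the same result, just as `liftSimple . timesConst` and `sumTimesConst` do in Haskell.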
...but honestly, I'd feel dirty doing any of the above things. sum . map (c*) is a perfectly readable expression without introducing a questionable new name for its type.
I do not fully understand the question yet but I'll edit this answer as I learn more.
Something which acts on a list and produces a result can simply be a function. The interface (that is, the type) of this function can be [a] -> b. This says the function accepts a list of elements of some type and returns a result of a possibly different type.
Now, let's invent a small problem as an example. I want to take a list of lists, some function on lists which produces a number, apply this function to every list, and return the average of the numbers.
import Data.List (genericLength)

average :: (Fractional b) => ([a] -> b) -> [[a]] -> b
average f xs = sum (fmap f xs) / genericLength xs
For example, average genericLength will tell me the average length of the sub-lists. I do not need to define any type classes or new types. Simply, I use the function type [a] -> b for those functions which map a list to some result.

How to create a map from a RDD[String] using scala?

My file is:
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
There are 7 rows and 5 columns (0,1,2,3,4).
I want the output as:
Map(0 -> Set("sunny","overcast","rainy"))
Map(1 -> Set("hot","mild","cool"))
Map(2 -> Set("high","normal"))
Map(3 -> Set("false","true"))
Map(4 -> Set("yes","no"))
The output must be of the type [Map[Int,Set[String]]].
EDIT: Rewritten to present the map-reduce version first, as it's more suited to Spark
Since this is Spark, we're probably interested in parallelism/distribution. So we need to take care to enable that.
Splitting each string into words can be done in partitions. Getting the set of values used in each column is a bit more tricky - the naive approach of initialising a set then adding every value from every row is inherently serial/local, since there's only one set (per column) we're adding the value from each row to.
However, if we have the set for some part of the rows and the set for the rest, the answer is just the union of these sets. This suggests a reduce operation where we merge sets for some subset of the rows, then merge those and so on until we have a single set.
So, the algorithm:
1. Split each row into an array of strings, then change this into an array of sets, each holding the single string value for its column. This can all be done with one map, and distributed.
2. Reduce this using an operation that merges the set for each column in turn. This can also be distributed.
3. Turn the single row that results into a Map.
It's no coincidence that we do a map, then a reduce, which should remind you of something :)
Here's a one-liner that produces the single row:
val data = List(
  "sunny,hot,high,FALSE,no",
  "sunny,hot,high,TRUE,no",
  "overcast,hot,high,FALSE,yes",
  "rainy,mild,high,FALSE,yes",
  "rainy,cool,normal,FALSE,yes",
  "rainy,cool,normal,TRUE,no",
  "overcast,cool,normal,TRUE,yes")
val row = data.map(_.split("\\W+").map(s => Set(s)))
  .reduce { (a, b) => (a zip b).map { case (l, r) => l ++ r } }
Converting it to a Map as the question asks:
val theMap = row.zipWithIndex.map(_.swap).toMap
1. Zip the list with the index, since that's what we need as the key of the map.
2. The elements of each tuple are unfortunately in the wrong order for .toMap, so swap them.
3. Then we have a list of (key, value) pairs, which .toMap will turn into the desired result.
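To try the List version without Spark, here is a self-contained sketch of the map-reduce step and the Map conversion on the question's seven rows (names follow the answer). Values keep their original case, so column 3 is Set("FALSE", "TRUE"); apply .toLowerCase to each field if you want the lowercase values shown in the question.

```scala
val data = List(
  "sunny,hot,high,FALSE,no",
  "sunny,hot,high,TRUE,no",
  "overcast,hot,high,FALSE,yes",
  "rainy,mild,high,FALSE,yes",
  "rainy,cool,normal,FALSE,yes",
  "rainy,cool,normal,TRUE,no",
  "overcast,cool,normal,TRUE,yes")

// Map: each row becomes an array of one-element sets, one per column.
// Reduce: merge two such arrays column by column with set union.
val row: Array[Set[String]] =
  data.map(_.split("\\W+").map(s => Set(s)))
    .reduce((a, b) => (a zip b).map { case (l, r) => l ++ r })

// Pair each set with its column index, put the index first, build the Map.
val theMap: Map[Int, Set[String]] = row.zipWithIndex.map(_.swap).toMap
```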
These don't need to change AT ALL to work with Spark. We just need to use an RDD instead of the List. Let's convert data into an RDD just to demo this:
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("spark-scratch").setMaster("local")
val sc = new SparkContext(conf)
val rdd = sc.makeRDD(data)
val row = rdd.map(_.split("\\W+").map(s => Set(s)))
  .reduce { (a, b) => (a zip b).map { case (l, r) => l ++ r } }
(This can be converted into a Map as before)
An earlier one-liner works neatly (transpose is exactly what's needed here) but is very difficult to distribute (transpose inherently needs to visit every row):
data.map(_.split("\\W+")).transpose.map(_.toSet)
(Omitting the conversion to Map for clarity)
1. Split each string into words.
2. Transpose the result, so we have a list that has a list of the first words, then a list of the second words, etc.
3. Convert each of those to a set.
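A self-contained sketch of the transpose version including the Map conversion, splitting on "," rather than "\\W+" (equivalent for this data, since every field contains only letters):

```scala
val data = List(
  "sunny,hot,high,FALSE,no",
  "sunny,hot,high,TRUE,no",
  "overcast,hot,high,FALSE,yes",
  "rainy,mild,high,FALSE,yes",
  "rainy,cool,normal,FALSE,yes",
  "rainy,cool,normal,TRUE,no",
  "overcast,cool,normal,TRUE,yes")

val columns: Map[Int, Set[String]] =
  data.map(_.split(",").toList) // rows as lists of fields
    .transpose                  // one list per column
    .map(_.toSet)               // distinct values per column
    .zipWithIndex
    .map(_.swap)                // (column index, set of values)
    .toMap
```

This gives the same Map as the map-reduce version; the difference is only in how well it distributes.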
Maybe this does the trick (keyed by the Int column index, as the question asks):
val a = Array(
  "sunny,hot,high,FALSE,no",
  "sunny,hot,high,TRUE,no",
  "overcast,hot,high,FALSE,yes",
  "rainy,mild,high,FALSE,yes",
  "rainy,cool,normal,FALSE,yes",
  "rainy,cool,normal,TRUE,no",
  "overcast,cool,normal,TRUE,yes")
val b = new Array[Map[Int, Set[String]]](5)
for (i <- 0 to 4)
  b(i) = Map(i -> a.map(_.split(",")(i)).toSet)
println(b.mkString("\n"))

simple function to return list of integers

I am trying to write a simple function that takes a list of pairs of integers, representing a graph, and returns a list of integers: all the nodes in the graph.
E.g. if the input is [(1,2), (3,4), (5,6), (1,5)]
the output should be [1,2,3,4,5,6,1,5].
The function simply returns the list of nodes; values in the returned list may repeat, as above.
I wrote the following function:
fun listofnodes ((x:int,y:int)::xs) = if xs=nil then [x::y] else [[x::y]@listofnodes(xs)]
stdIn:15.12-15.18 Error: operator and operand don't agree [tycon mismatch]
  operator domain: int * int list
  operand:         int * int
  in expression:
    x :: y
I am not able to figure out what is wrong.
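First of all, you should know what each operator does:
:: puts an individual element at the front of an existing list, so that 1::2::3::[] = [1,2,3]
@ puts two lists together, so that [1,2] @ [3,4] = [1,2,3,4]
You can also use :: with lists as the elements, but then you build a list of lists, like [1,2] :: [[3,4]] = [[1,2],[3,4]]
So by writing [x::y] you are saying that x should be put in front of a list y, and that the result should sit inside another list. Since y is an int, not an int list, the operand (int * int) doesn't match the operator's domain (int * int list), which is exactly what the error message says.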
And you shouldn't use an if expression to check for the end of the list; instead you can use patterns, like this:
fun listofnodes [] = []
  | listofnodes ((x,y)::xs) = x :: y :: listofnodes(xs);
The first pattern handles the end of the list: when the final tuple is extracted, xs is bound to the empty list, and the recursive call on it returns [], which all the elements are consed onto. So [(1,2), (3,4), (5,6), (1,5)] evaluates like this:
1 :: 2 :: 3 :: 4 :: 5 :: 6 :: 1 :: 5 :: [] = [1,2,3,4,5,6,1,5]
You could also write it like this:
fun listofnodes [] = []
  | listofnodes ((x,y)::xs) = [x,y] @ listofnodes(xs);
This way you make a small two-element list out of each tuple and then append all these small lists into one big list. Appending the final [] doesn't change the result, but the base case has to return something, and it is what stops the recursion at the end of the list. It evaluates like this:
[1,2] @ [3,4] @ [5,6] @ [1,5] @ [] = [1,2,3,4,5,6,1,5]
Also, you annotate x and y as ints, but you don't have to. If you don't, the function gets the type ('a * 'a) list -> 'a list, which means it works for any element type, including ints, as long as both components of every tuple have the same type (so not, say, a char and an int).
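For comparison, the same flattening sketched in Scala (not SML). Like the un-annotated SML version, it is polymorphic in the element type; `listOfNodes` and `listOfNodesRec` are illustrative names:

```scala
// Library version: each edge (x, y) contributes List(x, y), and flatMap
// concatenates them in order, keeping duplicates.
def listOfNodes[A](edges: List[(A, A)]): List[A] =
  edges.flatMap { case (x, y) => List(x, y) }

// Explicit recursion mirroring the two pattern clauses of the SML answer.
def listOfNodesRec[A](edges: List[(A, A)]): List[A] = edges match {
  case Nil => Nil
  case (x, y) :: xs => x :: y :: listOfNodesRec(xs)
}
```

Both return List(1, 2, 3, 4, 5, 6, 1, 5) for List((1, 2), (3, 4), (5, 6), (1, 5)).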
I'm guessing you know this, but in case you don't: what you call pairs, such as (1,2), are tuples (a pair is a 2-tuple).