I'm in a situation where I get a jsonb value (from the scrape column, which is jsonb) that looks like this:
SELECT COALESCE(scrape->'amenity_ids', '[]'::jsonb) AS ids
FROM my_table
ids |
-------------------------------------------------------------------------------------------------------------+
[] |
[33, 34, 35, 4, 5, 37, 8, 40, 9, 41, 42, 11, 44, 45, 46, 47, 16, 21, 56] |
[129, 35, 4, 36, 37, 103, 40, 41, 45, 77, 17, 23, 30] |
[1, 33, 34, 35, 4, 36, 8, 40, 41, 44, 45, 77, 46, 47, 85, 56, 90, 91, 92, 93, 30, 95] |
[1, 129, 2, 4, 8, 9, 77, 85, 89, 90, 91, 92, 93, 30, 94, 95, 96, 33, 34, 100, 37, 38, 40, 41, 44, 45, 46, 57]|
Note that there are NULL values in the jsonb column, hence the COALESCE. At this point ids is of type jsonb, but what I need is an array of integers, because I'm trying to query like:
SELECT int_array_ids @> '{33,34,35}' FROM my_table;
Once ids is converted to int[], I can create indexes to speed up my array-contains queries.
I tried a subquery using array_agg, but it's terribly slow:
SELECT array_agg(arrayed.am_id)
FROM (
  SELECT
    id,
    jsonb_array_elements_text(scrape->'amenity_ids') AS am_id
  FROM my_table
) AS arrayed
GROUP BY arrayed.id
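One route, sketched under the assumption that every amenity_ids element really is an integer (the helper name jsonb_to_int_array is made up for illustration): wrap the conversion in an IMMUTABLE SQL function, which can then back a GIN expression index for @> queries, avoiding the per-row array_agg pass entirely.

-- Hypothetical helper: builds int[] from a jsonb array.
-- IMMUTABLE is required before the expression can be indexed.
CREATE FUNCTION jsonb_to_int_array(j jsonb) RETURNS int[]
LANGUAGE sql IMMUTABLE AS
$$ SELECT ARRAY(SELECT jsonb_array_elements_text(j)::int) $$;

-- GIN expression index over the derived int[]
CREATE INDEX my_table_amenity_ids_idx ON my_table
USING gin (jsonb_to_int_array(COALESCE(scrape->'amenity_ids', '[]'::jsonb)));

-- Array-contains query; the WHERE expression matches the index
-- expression exactly, so the planner can use the GIN index.
SELECT *
FROM my_table
WHERE jsonb_to_int_array(COALESCE(scrape->'amenity_ids', '[]'::jsonb)) @> '{33,34,35}'::int[];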
I am new to Scala. I am trying to find the even numbers from 1 to 100, but while filtering I am getting
scala.collection.immutable.Range.Inclusive
scala> var a = List(1 to 100)
a: List[scala.collection.immutable.Range.Inclusive] = List(Range(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100))
scala> a.filter(x => (x % 2 == 0))
<console>:26: error: value % is not a member of scala.collection.immutable.Range.Inclusive
a.filter(x => (x % 2 == 0))
^
scala> val b = a.filter(x => x % 2 == 0)
<console>:25: error: value % is not a member of scala.collection.immutable.Range.Inclusive
val b = a.filter(x => x % 2 == 0)
^
You're creating a List containing a single Range, not a list of the ints in that range. For that, change it to:
val a = (1 to 100).toList
But @Tim's right: you can filter directly on the Range.
You don't need to wrap the Range in a List, just do this:
val a = 1 to 100
a.filter(x => x % 2 == 0)
I have two RDDs, for example:
firstmapRDD - (0-14,List(0, 4, 19, 19079, 42697, 444, 42748))
secondmapRdd-(0-14,List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94))
I want to find the intersection.
I tried var interResult = firstmapRDD.intersection(secondmapRdd), which shows no result in the output file.
I also tried cogrouping based on keys, mapRDD.cogroup(secondMapRDD).filter(x => ...), but I don't know how to find the intersection between the values. Is it x => x._1.intersect(x._2)? Can someone help me with the syntax?
Even this throws a compile-time error: mapRDD.cogroup(secondMapRDD).filter(x => x._1.intersect(x._2))
var mapRDD = sc.parallelize(map.toList)
var secondMapRDD = sc.parallelize(secondMap.toList)
var interResult = mapRDD.intersection(secondMapRDD)
It may be that the intersection is not working because the values are ArrayBuffer[List[]]. Is there any hack to get around that?
I tried doing this
var interResult = mapRDD.cogroup(secondMapRDD)
  .filter { case (_, (l, r)) => l.nonEmpty && r.nonEmpty }
  .map { case (k, (l, r)) => (k, l.toList.intersect(r.toList)) }
Still getting an empty list!
Since you are looking for the intersection of the values, you need to join both RDDs, collect the matched value pairs, and then intersect the values. (Plain RDD.intersection comes back empty because it compares entire (key, value) pairs, and the two lists are never equal.)
Sample code:
val firstMap = Map(1 -> List(1, 2, 3, 4, 5))
val secondMap = Map(1 -> List(1, 2, 5))
val firstKeyRDD = sparkContext.parallelize(firstMap.toList, 2)
val secondKeyRDD = sparkContext.parallelize(secondMap.toList, 2)
// join pairs up the two values that share a key
val joinedRDD = firstKeyRDD.join(secondKeyRDD)
val finalResult = joinedRDD.map(tuple => {
  // tuple._2 holds the matched lists from both RDDs
  val matchedLists = tuple._2
  val intersectValues = matchedLists._1.intersect(matchedLists._2)
  (tuple._1, intersectValues)
})
finalResult.foreach(println)
The output will be
(1,List(1, 2, 5))
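If you'd rather keep the cogroup approach from the question, a sketch of what was missing: cogroup wraps each side's values in an Iterable, so l.toList is a List[List[Int]] and intersect compares whole inner lists, which is why it came back empty. Flatten each side before intersecting:

// same RDDs as in the question; flatten unwraps the
// Iterable[List[Int]] that cogroup produces on each side
var interResult = mapRDD.cogroup(secondMapRDD)
  .filter { case (_, (l, r)) => l.nonEmpty && r.nonEmpty }
  .map { case (k, (l, r)) => (k, l.flatten.toList.intersect(r.flatten.toList)) }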
In the context of another Stack Overflow question, I have this snippet:
import scala.annotation.tailrec

def orderedGroupBy[T, P](seq: Traversable[T], f: T => P): Traversable[Tuple2[P, Traversable[T]]] = {
  @tailrec
  def accumulator(seq: Traversable[T], f: T => P, res: List[Tuple2[P, Traversable[T]]]): Traversable[Tuple2[P, Traversable[T]]] = seq.headOption match {
    case None => res.reverse
    case Some(h) => {
      val key = f(h)
      val subseq = seq.takeWhile(f(_) == key)
      accumulator(seq.drop(subseq.size), f, (key -> subseq) :: res)
    }
  }
  accumulator(seq, f, Nil)
}
I'd like to use it just like one can use .groupBy, e.g.:
orderedGroupBy(1 to 100, (_ / 10))
But the compiler yields an error about not having enough type information:
<console>:10: error: missing parameter type for expanded function ((x$1) => x$1.$div(10))
orderedGroupBy(1 to 100, (_ / 10))
What is the idiomatic way to do this?
You can curry the parameters, so that T is inferred solely from seq: Traversable[T].
def orderedGroupBy[T, P](seq: Traversable[T])(f: T => P): Traversable[Tuple2[P, Traversable[T]]] = ???
scala> orderedGroupBy(1 to 100)(_ / 10)
res110: Traversable[(Int, Traversable[Int])] = List((0,Range(1, 2, 3, 4, 5, 6, 7, 8, 9)), (1,Range(10, 11, 12, 13, 14, 15, 16, 17, 18, 19)), (2,Range(20, 21, 22, 23, 24, 25, 26, 27, 28, 29)), (3,Range(30, 31, 32, 33, 34, 35, 36, 37, 38, 39)), (4,Range(40, 41, 42, 43, 44, 45, 46, 47, 48, 49)), (5,Range(50, 51, 52, 53, 54, 55, 56, 57, 58, 59)), (6,Range(60, 61, 62, 63, 64, 65, 66, 67, 68, 69)), (7,Range(70, 71, 72, 73, 74, 75, 76, 77, 78, 79)), (8,Range(80, 81, 82, 83, 84, 85, 86, 87, 88, 89)), (9,Range(90, 91, 92, 93, 94, 95, 96, 97, 98, 99)), (10,Range(100)))
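For completeness, a sketch of the full curried definition, with the body carried over unchanged from the question; T is now fixed by the first argument list before the (_ / 10) lambda is type-checked:

import scala.annotation.tailrec

def orderedGroupBy[T, P](seq: Traversable[T])(f: T => P): Traversable[Tuple2[P, Traversable[T]]] = {
  @tailrec
  def accumulator(seq: Traversable[T], f: T => P, res: List[Tuple2[P, Traversable[T]]]): Traversable[Tuple2[P, Traversable[T]]] = seq.headOption match {
    case None => res.reverse
    case Some(h) => {
      val key = f(h)
      val subseq = seq.takeWhile(f(_) == key)
      accumulator(seq.drop(subseq.size), f, (key -> subseq) :: res)
    }
  }
  accumulator(seq, f, Nil)
}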
I am using elixir-mongo and trying to stream the results of a query. Here's the code...
def archive_stream(z) do
  Stream.resource(
    fn ->
      {jobs, datetime} = z
      lt = datetime_to_bson_utc datetime
      c = jobs |> Mongo.Collection.find(%{updated_at: %{"$lt": lt}}) |> Mongo.Find.exec
      {:cont, c.response.buffer, c}
    end,
    fn(z) ->
      {j, {cont, therest, c}} = next(z)
      case cont do
        :cont -> {j, {cont, therest, c}}
        :halt -> {:halt, {cont, therest, c}}
      end
    end,
    fn(:halt, resp) -> resp end
  )
end
All of the sub-bits seem to work (like the query), but when I try to get at the stream, I fail...
Mdb.archive_stream({jobs, {{2013,11,1},{0,0,0}}})|>Enum.take(2)
I get...
(BadArityError) #Function<2.49475906/2 in Mdb.archive_stream/1> with arity 2 called with 1 argument ({:cont, <<90, 44, 0, 0, 7, 95, 105, 100, 0, 82, 110, 129, 221, 102, 160, 249, 201, 109, 0, 137, 233, 4, 95, 115, 108, 117, 103, 115, 0, 51, 0, 0, 0, 2, 48, 0, 39, 0, 0, 0, 109, 97, 110, 97, 103, 101, 114, 45, ...>>, %Mongo.Cursor{batchSize: 0, collection: %Mongo.Collection{db: %Mongo.Db{auth: {"admin", "edd5404c4f906060b125688e26ffb281"}, mongo: %Mongo.Server{host: 'db-stage.member0.mongolayer.com', id_prefix: 57602, mode: :passive, opts: %{}, port: 27017, socket: #Port<0.27099>, timeout: 6000}, name: "db-stage", opts: %{mode: :passive, timeout: 6000}}, name: "jobs", opts: %{}}, exhausted: false, response: %Mongo.Response{buffer: <<188, 14, 0, 0, 7, 95, 105, 100, 0, 82, 110, 129, 221, 102, 160, 249, 201, 109, 0, 137, 242, 4, 95, 115, 108, 117, 103, 115, 0, 45, 0, 0, 0, 2, 48, 0, 33, 0, 0, 0, 114, 101, 116, 97, ...>>, cursorID: 3958284337526732701, decoder: &Mongo.Response.bson_decode/1, nbdoc: 101, requestID: 67280413, startingFrom: 0}}})
(elixir) lib/stream.ex:1020: Stream.do_resource/5
(elixir) lib/enum.ex:1731: Enum.take/2
I'm stumped. Any ideas?
thanks for the help
Dang! Rookie Error.
:halt -> {:halt, {cont, therest, c}} should be _ -> {:halt, z},
and fn(:halt, resp) -> resp end should be fn(resp) -> resp end.
I've been d**king around with everything but the after function for a day and a half.
A little more explanation for fellow rookies...
The last clause in the next_fun should probably be _, in order to catch other "bad behavior" and not just :halt.
The after_fn expects only one argument; in the code above that is the z tuple from the last clause of the next_fun. It is not expecting to see :halt and z, just z.
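Putting both fixes together, a sketch of the corrected function (same assumptions as the original code, including the next/1 and datetime_to_bson_utc helpers):

def archive_stream(z) do
  Stream.resource(
    fn ->
      {jobs, datetime} = z
      lt = datetime_to_bson_utc datetime
      c = jobs |> Mongo.Collection.find(%{updated_at: %{"$lt": lt}}) |> Mongo.Find.exec
      {:cont, c.response.buffer, c}
    end,
    fn(z) ->
      {j, {cont, therest, c}} = next(z)
      case cont do
        :cont -> {j, {cont, therest, c}}
        # catch anything that isn't :cont and halt with the accumulator itself
        _ -> {:halt, z}
      end
    end,
    # the after-function takes exactly one argument: the final accumulator
    fn(resp) -> resp end
  )
end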
Would like to have real experts' input.