CAST jsonb column into INT[] - postgresql

I'm in a situation where I get a jsonb value (from the scrape field which is jsonb) that looks like this:
SELECT COALESCE(scrape->'amenity_ids', '[]'::jsonb) AS ids
FROM my_table
ids |
-------------------------------------------------------------------------------------------------------------+
[] |
[33, 34, 35, 4, 5, 37, 8, 40, 9, 41, 42, 11, 44, 45, 46, 47, 16, 21, 56] |
[129, 35, 4, 36, 37, 103, 40, 41, 45, 77, 17, 23, 30] |
[1, 33, 34, 35, 4, 36, 8, 40, 41, 44, 45, 77, 46, 47, 85, 56, 90, 91, 92, 93, 30, 95] |
[1, 129, 2, 4, 8, 9, 77, 85, 89, 90, 91, 92, 93, 30, 94, 95, 96, 33, 34, 100, 37, 38, 40, 41, 44, 45, 46, 57]|
Note that there are NULL values in the jsonb object. At this point ids is of type jsonb, but what I need is an array of integers, because I'm trying to run a query like this:
SELECT int_array_ids @> '{33,34,35}' FROM my_table;
Once I have ids converted to INT[], I can create indexes to speed up my array-contains queries.
I tried a subquery using array_agg, but it's terribly slow:
SELECT array_agg(arrayed.am_id) FROM (
    SELECT
        id,
        jsonb_array_elements_text(scrape->'amenity_ids') AS am_id
    FROM my_table
) AS arrayed
GROUP BY arrayed.id


Place the networkx graph on top of the mplleaflet map

I have a vehicle routing (VRP) model whose output is currently visualized using networkx. I am now trying to place the networkx graph on top of an mplleaflet map, but I ran into the following error:
File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\mplleaflet\utils.py", line 14, in iter_rings
    raise ValueError('Unrecognized code: {}'.format(code))
ValueError: Unrecognized code: S
Below is the networkx part of the code and the graph.
import matplotlib.pyplot as plt
import networkx as nx

def DrawNetwork():
    G = nx.DiGraph()
    locations = DataProblem()._locations
    # print(locations)
    x = 0
    # for vehicle_id in vlist:
    for vehicle_id in new_vlist:
        n = 0
        e = []
        node = []
        cl = PickupColor(x)
        # print(cl)
        # print(data.num_vehicles)
        # print(this_vehicle.id)
        # print(this_vehicle.routes)
        for i in vehicle_id:
            G.add_node(i, pos=(locations[i][0], locations[i][1]))
            # a= [locations[i][0], locations[i][1]]
            # print(a)
            ################
            node.append(i)
            ################
            if n > 0:
                # print(n)
                # print(vehicle_id.routes[n])
                # print (vehicle_id.routes[n-1])
                u = (vehicle_id[n - 1], vehicle_id[n])
                e.append(u)
                node.append(i)
                G.add_edge(vehicle_id[n - 1], vehicle_id[n])
                # nx.draw(G, nx.get_node_attributes(G, 'pos'), nodelist=node, edgelist=e, with_labels=True,
                #         node_color=cl, width=2, edge_color=cl,
                #         style='dashed', font_color='w', font_size=12, font_family='sans-serif')
            n += 1
        nx.draw(G, nx.get_node_attributes(G, 'pos'), nodelist=node, edgelist=e, with_labels=True,
                node_color=cl, width=2, edge_color=cl,
                style='dashed', font_color='w', font_size=12, font_family='sans-serif')
        x += 1
    # let's color the node 0 in black
    nx.draw_networkx_nodes(G, locations, nodelist=[0], node_color='k')
    plt.axis('on')
The "new_vlist" used for drawing networkx is:
[[32, 2, 90], [83, 82, 68, 90], [62, 40, 39, 90], [44, 60, 59, 61, 67, 90], [54, 53, 55, 90], [10, 77, 7, 84, 13, 90], [8, 51, 26, 71, 90], [76, 72, 75, 69, 90], [63, 19, 20, 52, 90], [42, 81, 65, 38, 28, 27, 30, 31, 90], [80, 43, 64, 22, 21, 66, 25, 29, 90], [85, 9, 88, 70, 6, 90], [3, 90], [49, 33, 35, 16, 14, 15, 87, 90], [24, 23, 78, 79, 17, 86, 90], [34, 18, 58, 11, 12, 57, 90], [37, 74, 73, 36, 5, 4, 89, 90], [48, 47, 46, 45, 50, 1, 41, 56, 90]]
Below is the plotted networkx graph:
What is the right way to combine a networkx graph with an mplleaflet map? Thanks.
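A side note on the edge-building loop in DrawNetwork: the counter n is only used to pair each stop with the one before it. Below is a minimal sketch of the same pairing in plain Python, without networkx or mplleaflet. The helper name route_edges is hypothetical, and the sample routes are just the first three entries of new_vlist. It does not address the mplleaflet error itself.

```python
def route_edges(route):
    """Return the consecutive (from, to) pairs for one vehicle route."""
    # zip stops at the shorter sequence, so a single-stop route yields no edges.
    return list(zip(route, route[1:]))


# First three routes taken from new_vlist above.
sample_routes = [[32, 2, 90], [83, 82, 68, 90], [62, 40, 39, 90]]

all_edges = [route_edges(route) for route in sample_routes]
for route, edges in zip(sample_routes, all_edges):
    print(route, "->", edges)
```

Each list of pairs could then be passed to G.add_edges_from and reused as the edgelist argument when drawing that route.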

I am getting the error "value % is not a member of scala.collection.immutable.Range.Inclusive" while filtering

I am new to Scala. I am trying to find the even numbers from 1 to 100, but when I filter, I get an error saying that value % is not a member of scala.collection.immutable.Range.Inclusive:
scala> var a = List(1 to 100)
a: List[scala.collection.immutable.Range.Inclusive] = List(Range(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100))
scala> a.filter(x => (x % 2 == 0))
<console>:26: error: value % is not a member of scala.collection.immutable.Range.Inclusive
a.filter(x => (x % 2 == 0))
^
scala> val b = a.filter(x => x % 2 == 0)
<console>:25: error: value % is not a member of scala.collection.immutable.Range.Inclusive
val b = a.filter(x => x % 2 == 0)
^
You're creating a list of Range, not a list with the ints in that range. For that, change it to:
val a = (1 to 100).toList
But @Tim's right, you can filter directly on the Range.
You don't need to wrap the Range in a List, just do this:
val a = 1 to 100
a.filter(x => x % 2 == 0)

Intersection of two Map RDDs in Scala

I have two RDDs, for example:
firstmapRDD - (0-14,List(0, 4, 19, 19079, 42697, 444, 42748))
secondmapRdd-(0-14,List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94))
I want to find the intersection.
I tried var interResult = firstmapRDD.intersection(secondmapRdd), which produces no result in the output file.
I also tried cogrouping on the keys with mapRDD.cogroup(secondMapRDD).filter(x=>), but I don't know how to intersect the two sets of values. Is it x=>x._1.intersect(x._2)? Can someone help me with the syntax?
Even this throws a compile-time error: mapRDD.cogroup(secondMapRDD).filter(x=>x._1.intersect(x._2))
var mapRDD = sc.parallelize(map.toList)
var secondMapRDD = sc.parallelize(secondMap.toList)
var interResult = mapRDD.intersection(secondMapRDD)
The intersection may be failing because the values are of type ArrayBuffer[List[]]. Is there any hack to get rid of that wrapping?
I tried doing this:
var interResult = mapRDD.cogroup(secondMapRDD)
  .filter { case (_, (l, r)) => l.nonEmpty && r.nonEmpty }
  .map { case (k, (l, r)) => (k, l.toList.intersect(r.toList)) }
I'm still getting an empty list!
Since you want to intersect on the values, you need to join both RDDs by key, get the matched values, and then intersect those values.
Sample code:
val firstMap = Map(1 -> List(1,2,3,4,5))
val secondMap = Map(1 -> List(1,2,5))
val firstKeyRDD = sparkContext.parallelize(firstMap.toList, 2)
val secondKeyRDD = sparkContext.parallelize(secondMap.toList, 2)
val joinedRDD = firstKeyRDD.join(secondKeyRDD)
val finalResult = joinedRDD.map(tuple => {
  val matchedLists = tuple._2
  val intersectValues = matchedLists._1.intersect(matchedLists._2)
  (tuple._1, intersectValues)
})
finalResult.foreach(println)
The output will be
(1,List(1, 2, 5))
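One likely reason the plain intersection comes back empty is that it compares whole (key, list) elements, so two pairs only match when both the key and the entire list are identical. The sketch below illustrates that difference in plain Python rather than Spark, using dicts in place of the pair RDDs. The function name intersect_by_key is hypothetical, and the data is the example from the question.

```python
first = {"0-14": [0, 4, 19, 19079, 42697, 444, 42748]}
second = {"0-14": list(range(1, 95))}

# Whole-pair comparison: a (key, list) pair matches only an identical pair.
second_pairs = list(second.items())
whole_pairs = [pair for pair in first.items() if pair in second_pairs]


def intersect_by_key(left, right):
    """For keys present in both dicts, keep the left values that also occur on the right."""
    result = {}
    for key in left.keys() & right.keys():
        right_values = set(right[key])
        result[key] = [v for v in left[key] if v in right_values]
    return result


by_key = intersect_by_key(first, second)
print(whole_pairs)  # no identical pairs, so this list is empty
print(by_key)
```

Matching on the key first and only then intersecting the two lists is the same shape as the join-then-intersect approach in the sample code above.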

Is it bad practice to populate this array using a for loop?

Please forgive me for asking what is probably a real beginner's question. My search on Google and Stack Overflow didn't produce anything conclusive.
My array needs to contain the numbers 0 through 59. Here is a simple loop to populate the array:
var timeArray = [0]
var count = 1
while count < 60 {
    timeArray.append(count)
    count += 1
}
On the other hand, I could do this:
var timeArray = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59]
The second I guess is faster and maybe more readable. The first is maybe more concise.
What is the general best practice in this case? Is there another, better alternative?
Thanks.
Yes, you are right: the loop will be slower than the second one.
I would use the second option but with slightly different syntax, just to save typing:
var timeArray = Array(0..<60)

Histogram from two vectors in Matlab

Thanks in advance for the help.
I have two sets of parallel vectors:
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 55];
x_count = [7721, 6475, 3890, 2138, 1152, 784, 674, 492, 424, 365, 309, 302, 232, 250, 220, 208, 190, 162, 144, 134, 97, 93, 89, 97, 92, 85, 77, 87, 64, 75, 72, 82, 61, 48, 46, 44, 35, 20, 28, 20, 21, 10, 6, 8, 4, 4, 4, 3, 1, 1];
y = [1, 2, 3, 4, 5, 6, 7, 8, 9, 55];
y_count = [88, 40, 24, 12, 8, 5, 1, 1, 1, 100];
where x and y are the categories, and x_count and y_count are the frequencies of each category. x and y can be of unequal lengths, and need not contain the same categories.
I want to create a side-by-side bar/histogram plot, where the x-axis is the categories, placed side-by-side like this: side by side multiply histogram in matlab. The frequency counts go along the y-axis.
I've tried googling around, but I'm still stuck on this. If someone could help, that would be great. The solution in side by side multiply histogram in matlab works only if x and y have the same length, but mine do not.
You can try this:
% create unique bins
bins = unique([x y]);
% create vectors with zeros same size as bins
xBins = zeros(size(bins));
yBins = zeros(size(bins));
% find where each category sits in bins, then fill in the counts there
[~, xIdx] = ismember(x, bins);
[~, yIdx] = ismember(y, bins);
xBins(xIdx) = x_count;
yBins(yIdx) = y_count;
bar(bins, [xBins' yBins']);
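The alignment step is the part that matters here: every count has to land at the position of its own category in the shared bins, and categories missing from one vector get a zero. Below is a minimal sketch of that step in plain Python, not Matlab. The function name align_counts is hypothetical, and the vectors are a shortened sample of the ones in the question, not the full data.

```python
def align_counts(categories, counts, bins):
    """Place each count at the position of its category in bins; absent bins get 0."""
    lookup = dict(zip(categories, counts))
    return [lookup.get(b, 0) for b in bins]


# Shortened sample of the question's vectors (assumption: for illustration only).
x = [1, 2, 3, 4]
x_count = [7721, 6475, 3890, 2138]
y = [1, 2, 55]
y_count = [88, 40, 100]

bins = sorted(set(x) | set(y))
x_bins = align_counts(x, x_count, bins)
y_bins = align_counts(y, y_count, bins)

for b, xc, yc in zip(bins, x_bins, y_bins):
    print(b, xc, yc)
```

Note that the count for category 55 ends up in the last bin rather than directly after category 2, which is the behaviour the side-by-side plot needs.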