query returning empty dataframe when called through a module [closed] - pyspark

I have the following setup.
A module, ExampleModule, installed from an egg file, contains the following code:
def fetch_data(sql, sql_query_text):
    data = sql(sql_query_text).toPandas()
    print(data)  # this gives me an empty DataFrame with 0 rows and 28 columns
In my Jupyter notebook, which runs a PySpark kernel, I have the following code:
from pyspark.sql import SQLContext
sqlContext = SQLContext(spark)
sql = sqlContext.sql
from ExampleModule import *
sql_text = "<THE SELECT QUERY>"
fetch_data(sql, sql_text)
This gives me an empty DataFrame. However, if I define a local function, fetch_data_local, it runs fine and gives me 43k rows as expected. Example local function:
def fetch_data_local(sql, sql_text):
    data = sql(sql_text).toPandas()
    print(data.size)
fetch_data_local(sql, sql_text)
Above function works fine and gives me 43k rows.

I tried this on Databricks Community Edition, and it works for me:
spark.sparkContext.addPyFile("dbfs:/FileStore/shared_uploads/********#gmail.com/CustomModule.py")
from CustomModule import *
df = [{"Category": 'A', "date": '01/01/2022', "Indictor": 1},
      {"Category": 'A', "date": '02/01/2022', "Indictor": 0},
      {"Category": 'A', "date": '03/01/2022', "Indictor": 1},
      {"Category": 'A', "date": '04/01/2022', "Indictor": 1},
      {"Category": 'A', "date": '05/01/2022', "Indictor": 1},
      {"Category": 'B', "date": '01/01/2022', "Indictor": 0},
      {"Category": 'B', "date": '02/01/2022', "Indictor": 1},
      {"Category": 'B', "date": '03/01/2022', "Indictor": 1},
      {"Category": 'B', "date": '04/01/2022', "Indictor": 0},
      {"Category": 'B', "date": '05/01/2022', "Indictor": 0},
      {"Category": 'B', "date": '06/01/2022', "Indictor": 1}]
df = spark.createDataFrame(df)
df.write.mode("overwrite").saveAsTable("sample")
from pyspark.sql import SQLContext
sqlContext = SQLContext(spark)
sql = sqlContext.sql
sql_text = "select * from sample"
fetch_data(sql, sql_text)
Output
df:pyspark.sql.dataframe.DataFrame = [Category: string, Indictor: long ... 1 more field]
/databricks/spark/python/pyspark/sql/context.py:117: FutureWarning: Deprecated in 3.0.0. Use SparkSession.builder.getOrCreate() instead.
  warnings.warn(
    Category  Indictor        date
0          A         1  03/01/2022
1          A         1  04/01/2022
2          B         1  02/01/2022
3          B         1  03/01/2022
4          B         0  05/01/2022
5          B         1  06/01/2022
6          A         1  01/01/2022
7          A         0  02/01/2022
8          A         1  05/01/2022
9          B         0  01/01/2022
10         B         0  04/01/2022

Related

postgresql jsonb - from list of integers to list of Objects

I have a question regarding jsonb in postgresql.
I have a table that has a column of type jsonb, where I store a list of integers.
For example list_integers column:
[1, 2, 3, 4]
I want to add a new column in this table, and insert in this column the same IDs, but the form would be list of objects, where the ID field corresponds to the integer.
For example list_ids column:
[{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}]
What would be the best way to do this?
To transform:
test=> SELECT jsonb_agg(jsonb_build_object('id', id))
test-> FROM jsonb_array_elements(jsonb '[1, 2, 3, 4]') id;
                  jsonb_agg
----------------------------------------------
 [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}]
(1 row)
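To sanity-check the expected shape outside the database, the same mapping can be mirrored in a few lines of Python. This is only a sketch of the transformation, assuming the column value has been read as JSON text; the helper name to_id_objects is made up for illustration and is not part of PostgreSQL.

```python
import json


def to_id_objects(list_integers_json):
    """Turn a JSON array of integers into a JSON array of {"id": n} objects."""
    ids = json.loads(list_integers_json)
    return json.dumps([{"id": i} for i in ids])


print(to_id_objects("[1, 2, 3, 4]"))
# prints: [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}]
```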

Grafana transformations: calculate percentage in the table panel

I am using Grafana v7.3.6 on Ubuntu.
I basically have a time series with different versions as tags. I want to create a table with each version and its percentage of the total value. I am using OpenTSDB v2.4 as my data source.
example:
time, metric, value
... result{version = 1}, 10
... result{version = 2}, 5
... result{version = 1}, 5
... result{version = 3}, 2
... result{version = 1}, 2
... result{version = 3}, 5
... result{version = 2}, 5
... result{version = 1}, 3
... result{version = 2}, 0
... result{version = 3}, 3
Using the series to rows transformation, I was able to get the following:
metric, value
result{version = 1}, 20
result{version = 2}, 10
result{version = 3}, 10
What I would like is the following:
metric, value
result{version = 1}, 50%
result{version = 2}, 25%
result{version = 3}, 25%
How can I achieve this?
Any pointers or suggestions would be really appreciated. Thank you.
You need to replace the query you're using with the following:
your-query/scalar(sum(your-query))
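The arithmetic behind that expression is each series' total divided by the grand total. A minimal Python sketch of the same calculation on the sample data from the question (plain Python for illustration only, not a Grafana or OpenTSDB API; the function name is made up):

```python
from collections import defaultdict

# (version, value) samples copied from the question
samples = [(1, 10), (2, 5), (1, 5), (3, 2), (1, 2),
           (3, 5), (2, 5), (1, 3), (2, 0), (3, 3)]


def percentage_by_version(points):
    """Return {version: share of the grand total, in percent}."""
    totals = defaultdict(int)
    for version, value in points:
        totals[version] += value
    # Assumes the grand total is non-zero
    grand_total = sum(totals.values())
    return {v: 100 * t / grand_total for v, t in totals.items()}


print(percentage_by_version(samples))
# prints: {1: 50.0, 2: 25.0, 3: 25.0}
```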

MongoDB - Is it possible to only insert a record when the record doesn't exist

I have a database with custom ids and I only want to insert a new record if the id is different from the other ids. If the id exists I don't want to update the value (so I think upsert isn't a solution for me).
I'm using the pymongo connector.
Example Database:
[
{"_id": 1, "name": "john"},
{"_id": 2, "name": "paul"}
]
Trap and ignore the DuplicateKeyError, e.g.:
from pymongo import MongoClient
from pymongo.errors import DuplicateKeyError
db = MongoClient()['mydatabase']
records = [
    {"_id": 1, "name": "john"},
    {"_id": 2, "name": "paul"},
    {"_id": 3, "name": "ringo"}
]
for record in records:
    try:
        # insert_one raises DuplicateKeyError if the _id already exists
        db.mycollection.insert_one(record)
        print(f'Inserted {record}')
    except DuplicateKeyError:
        # Leave the existing document untouched
        print(f'Skipped duplicate {record}')
Result (something like):
Skipped duplicate {'_id': 1, 'name': 'john'}
Skipped duplicate {'_id': 2, 'name': 'paul'}
Inserted {'_id': 3, 'name': 'ringo'}
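An alternative sketch, if you would rather not rely on the exception: an upsert whose update document uses only `$setOnInsert`, which writes fields when a document is created and leaves an existing document untouched. The helper name insert_if_absent is made up for illustration, and the collection is passed in so the function assumes nothing about your connection.

```python
def insert_if_absent(collection, record):
    """Insert `record` only if no document with the same _id exists.

    `collection` is expected to be a pymongo Collection. Returns True if a
    new document was inserted, False if one already existed.
    Assumes `record` has at least one field besides _id.
    """
    fields = {k: v for k, v in record.items() if k != "_id"}
    result = collection.update_one(
        {"_id": record["_id"]},      # match on the custom id
        {"$setOnInsert": fields},    # applied only when the upsert inserts
        upsert=True,
    )
    # upserted_id is None when an existing document matched the filter
    return result.upserted_id is not None
```

With the example database above, calling insert_if_absent(db.mycollection, {"_id": 3, "name": "ringo"}) should return True the first time and False on later calls, without changing the stored document.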

How to fetch only few fields from all the MongoDB records [duplicate]

I'm new to MongoDB and need help with the following.
I have a collection named student in MongoDB, with the following data in it.
{'_id': 1, 'name': 'A', 'dt_joined': '2010'}
{'_id': 2, 'name': 'B', 'dt_joined': '2011'}
{'_id': 3, 'name': 'C', 'dt_joined': '2009'}
{'_id': 4, 'name': 'D', 'dt_joined': '2010'}
{'_id': 5, 'name': 'E', 'dt_joined': '2008'}
From the above collection, I want to retrieve (_id, dt_joined) from all the records. That means, I'm expecting the following result.
{'_id': 1, 'dt_joined': '2010'}
{'_id': 2, 'dt_joined': '2011'}
{'_id': 3, 'dt_joined': '2009'}
{'_id': 4, 'dt_joined': '2010'}
{'_id': 5, 'dt_joined': '2008'}
Is it possible with the MongoDB find command?
Thanks in advance!!
Try this:
db.student.find({}, {_id:1, dt_joined: 1 })
For more details, see the official documentation: https://docs.mongodb.com/manual/reference/method/db.collection.find/
You can use one of the following.
db.student.find({}, {_id:1, dt_joined: 1 })
db.student.find({}, {name: 0 })
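The documents in the question are printed Python-style, so the same projection may be wanted from pymongo. A minimal sketch under that assumption; the helper name fetch_id_and_join_year is made up for illustration, and the collection object is passed in rather than created here.

```python
def fetch_id_and_join_year(collection):
    """Return only _id and dt_joined for every document in `collection`.

    `collection` is expected to be a pymongo Collection, e.g. db.student.
    """
    projection = {"_id": 1, "dt_joined": 1}  # 1 = include this field
    # Empty filter {} matches all documents; find() returns a cursor
    return list(collection.find({}, projection))
```

The exclusion form from the shell example, {"name": 0}, can be passed as the projection in the same way.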

mongoid search in an array inside an array of hash

Say an Object model embeds_many searched_items.
Here is the document:
{"_id": { "$oid" : "5320028b6d756e1981460000" },
 "searched_items": [
   {
     "_id": { "$oid" : "5320028b6d756e1981470000" },
     "hotel_id": 127,
     "room_info": [
       {
         "price": 10,
         "amenity_ids": [
           1,
           2
         ]
       },
       {
         "price": 160,
         "amenity_ids": null
       }
     ]
   },
   {
     "_id": { "$oid" : "5320028b6d756e1981480000" },
     "hotel_id": 161,
     "room_info": [
       {
         "price": 400,
         "amenity_ids": [4,5]
       }
     ]
   }
 ]
}
I want to find the "searched_items" having room_info.amenity_ids IN [2,3].
I've tried
object.searched_items.where('room_info.amenity_ids' => [2, 3])
object.searched_items.where('room_info.amenity_ids' => {'$in' => [2,3]})
with no luck.
Mongoid provides an elem_match method for searching within the objects of an Array-type field, e.g.:
class A
  include Mongoid::Document
  field :some_field, type: Array
end
A.create(some_field: [{id: 'a', name: 'b'}, {id: 'c', name: 'd'}])
A.elem_match(some_field: { :id.in => ["a", "c"] }) # => returns the object
Let me know if you have any other doubts.
Update:
class SearchedHotel
  include Mongoid::Document
  field :hotel_id, type: String
  field :room_info, type: Array
end
SearchedHotel.create(hotel_id: "1", room_info: [{id: 1, amenity_ids: [1,2], price: 600},{id: 2, amenity_ids: [1,2,3], price: 1000}])
SearchedHotel.create(hotel_id: "2", room_info: [{id: 3, amenity_ids: [1,2], price: 600}])
SearchedHotel.elem_match(room_info: {:amenity_ids.in => [1,2]})
Mongoid::Criteria
  selector: {"room_info"=>{"$elemMatch"=>{"amenity_ids"=>{"$in"=>[1, 2]}}}}
  options: {}
  class: SearchedHotel
  embedded: false
And it returns both records. Am I missing something from your question or requirement? If so, do let me know.
It's important to distinguish between top-level queries sent to the MongoDB server and
client-side operations on embedded-documents that are implemented by Mongoid.
This is the underlying confusion between the original question and the answer from @sandeep-kumar and its associated comments.
The original question is about the where clause applied to embedded documents after the query result has already been fetched.
The answer from @sandeep-kumar and its comments are about top-level queries.
The following test covers both, showing that the answers from @sandeep-kumar do work on the examples in your comments,
and also what does and does not work for your original question.
To summarize, Sandeep's answers do work for top-level queries.
Please review your code; if problems remain, please post the exact Ruby code that reproduces the problem.
For your original question, please note that "object" has already been fetched from MongoDB,
and that you can verify this by looking at the log/test.log file.
The subsequent "where" operations are all client-side execution by Mongoid.
Simple "where" clauses do work at the embedded document level.
Complex "where" clauses involving nested array values don't seem to work -
I didn't really expect Mongoid to reimplement '$in' on the client-side.
Knowing that the "object" already has the query result,
and that the association "searched_items" gives you convenient access to the embedded documents,
you can write Ruby code to select what you want as in the following test.
Hope that this helps.
test/unit/my_object_test.rb
require 'test_helper'
require 'pp'
class MyObjectTest < ActiveSupport::TestCase
  def setup
    MyObject.delete_all
    A.delete_all
    SearchedHotel.delete_all
  end
  test "original question with client-side where operation on embedded documents" do
    doc = {"_id"=>{"$oid"=>"5320028b6d756e1981460000"}, "searched_items"=>[{"_id"=>{"$oid"=>"5320028b6d756e1981470000"}, "hotel_id"=>127, "room_info"=>[{"price"=>10, "amenity_ids"=>[1, 2]}, {"price"=>160, "amenity_ids"=>nil}]}, {"_id"=>{"$oid"=>"5320028b6d756e1981480000"}, "hotel_id"=>161, "room_info"=>[{"price"=>400, "amenity_ids"=>[4, 5]}]}]}
    MyObject.create(doc)
    puts
    object = MyObject.first
    <<-EOT.split("\n").each{|line| puts "#{line}:"; eval "pp #{line}"}
object.searched_items.where('hotel_id' => 127).to_a
object.searched_items.where(:hotel_id.in => [127,128]).to_a
object.searched_items.where('room_info.amenity_ids' => {'$in' => [2,3]}).to_a
object.searched_items.where('room_info.amenity_ids'.to_sym.in => [2,3]).to_a
object.searched_items.select{|searched_item| searched_item.room_info.any?{|room_info| room_info['amenity_ids'] && !(room_info['amenity_ids'] & [2,3]).empty?}}.to_a
    EOT
  end
  test "A comment - top-level queries" do
    A.create(some_field: [{id: 'a', name: 'b', tag_ids: [6,7,8]}, {id: 'c', name: 'd'}, tag_ids: [5,6,7]])
    A.create(some_field: [{id: 'a', name: 'b', tag_ids: [1,2,3]}, {id: 'c', name: 'd'}, tag_ids: [2,3,4]])
    puts
    pp A.where('some_field.tag_ids'.to_sym.in => [2,3]).to_a
    pp A.elem_match(some_field: { :tag_ids.in => [2,3,4] }).to_a
  end
  test "SearchedHotel comment - top-level query" do
    s = <<-EOT
[#<SearchedHotel _id: 53253c246d756e49a7030000, hotel_id: \"1\", room_info: [{\"id\"=>1, \"amenity_ids\"=>[1, 2], \"price\"=>600}, {\"id\"=>2, \"amenity_ids\"=>[1, 2, 3], \"price\"=>1000}]>, #<SearchedHotel _id: 53253c246d756e49a7040000, hotel_id: \"2\", room_info: [{\"id\"=>3, \"amenity_ids\"=>[1, 2], \"price\"=>600}]>]
    EOT
    a = eval(s.gsub('#<SearchedHotel ', '{').gsub(/>,/, '},').gsub(/>\]/, '}]').gsub(/_id: \h+, /, ''))
    SearchedHotel.create(a)
    puts
    <<-EOT.split("\n").each{|line| puts "#{line}:"; eval "pp #{line}"}
SearchedHotel.elem_match(room_info: {:amenity_ids.in => [1,2]}).to_a
    EOT
  end
end
$ ruby -Ilib -Itest test/unit/my_object_test.rb
Run options:
# Running tests:
[1/3] MyObjectTest#test_A_comment_-_top-level_queries
[#<A _id: 5359329d7f11ba034b000002, some_field: [{"id"=>"a", "name"=>"b", "tag_ids"=>[1, 2, 3]}, {"id"=>"c", "name"=>"d"}, {"tag_ids"=>[2, 3, 4]}]>]
[#<A _id: 5359329d7f11ba034b000002, some_field: [{"id"=>"a", "name"=>"b", "tag_ids"=>[1, 2, 3]}, {"id"=>"c", "name"=>"d"}, {"tag_ids"=>[2, 3, 4]}]>]
[2/3] MyObjectTest#test_SearchedHotel_comment_-_top-level_query
SearchedHotel.elem_match(room_info: {:amenity_ids.in => [1,2]}).to_a:
[#<SearchedHotel _id: 5359329d7f11ba034b000003, hotel_id: "1", room_info: [{"id"=>1, "amenity_ids"=>[1, 2], "price"=>600}, {"id"=>2, "amenity_ids"=>[1, 2, 3], "price"=>1000}]>,
#<SearchedHotel _id: 5359329d7f11ba034b000004, hotel_id: "2", room_info: [{"id"=>3, "amenity_ids"=>[1, 2], "price"=>600}]>]
[3/3] MyObjectTest#test_original_question_with_client-side_where_operation_on_embedded_documents
object.searched_items.where('hotel_id' => 127).to_a:
[#<SearchedItem _id: 5359329d7f11ba034b000006, hotel_id: 127, room_info: [{"price"=>10, "amenity_ids"=>[1, 2]}, {"price"=>160, "amenity_ids"=>nil}]>]
object.searched_items.where(:hotel_id.in => [127,128]).to_a:
[#<SearchedItem _id: 5359329d7f11ba034b000006, hotel_id: 127, room_info: [{"price"=>10, "amenity_ids"=>[1, 2]}, {"price"=>160, "amenity_ids"=>nil}]>]
object.searched_items.where('room_info.amenity_ids' => {'$in' => [2,3]}).to_a:
[]
object.searched_items.where('room_info.amenity_ids'.to_sym.in => [2,3]).to_a:
[]
object.searched_items.select{|searched_item| searched_item.room_info.any?{|room_info| room_info['amenity_ids'] && !(room_info['amenity_ids'] & [2,3]).empty?}}.to_a:
[#<SearchedItem _id: 5359329d7f11ba034b000006, hotel_id: 127, room_info: [{"price"=>10, "amenity_ids"=>[1, 2]}, {"price"=>160, "amenity_ids"=>nil}]>]
Finished tests in 0.089544s, 33.5031 tests/s, 0.0000 assertions/s.
3 tests, 0 assertions, 0 failures, 0 errors, 0 skips
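For readers who think more easily in Python, the client-side selection from the last test case can be mirrored on plain dictionaries. This is a sketch of the filtering logic only, not Mongoid or pymongo API; the function name items_with_any_amenity is made up for illustration, and the data is the document from the question with the _id fields omitted.

```python
def items_with_any_amenity(searched_items, wanted):
    """Keep items where at least one room lists any of the wanted amenity ids."""
    wanted = set(wanted)
    return [
        item for item in searched_items
        if any(
            # amenity_ids may be None, so fall back to an empty list
            wanted.intersection(room.get("amenity_ids") or [])
            for room in item.get("room_info", [])
        )
    ]


searched_items = [
    {"hotel_id": 127, "room_info": [
        {"price": 10, "amenity_ids": [1, 2]},
        {"price": 160, "amenity_ids": None},
    ]},
    {"hotel_id": 161, "room_info": [
        {"price": 400, "amenity_ids": [4, 5]},
    ]},
]

matches = items_with_any_amenity(searched_items, [2, 3])
print([item["hotel_id"] for item in matches])
# prints: [127]
```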