How to view child nodes of a parent from panda dataframe?

I am trying to extract data from MongoDB into a pandas DataFrame. The data is a Twitter dataset in JSON; after I import it into the database, a document looks like this:
user:Object
id:1292598776
id_str:1292598776
name:ahmd
screen_name:sameh7753
location:
url:null
description:null
protected:false
followers_count:5
friends_count:76
listed_count:0
created_at:Sat Mar 23 21:59:37 +0000 2013
favourites_count:1
utc_offset:null
time_zone:null
geo_enabled:true
lang:ar
contributors_enabled:false
is_translator:false
profile_background_color:C0DEED
profile_use_background_image:true
default_profile:true
default_profile_image:false
follow_request_sent:null
Here 'user' is the parent and it has many children. The dataset has other fields too.
I am trying to run a query that finds every tweet posted in 2013 whose location is "US", and then store the results in a pandas DataFrame. When I print the DataFrame I expect to see the screen_name values, but they are not printed and I cannot access them either.
Here is the code I am using:
import pandas as pd
from pymongo import MongoClient
import matplotlib.pyplot as plt
import re
pd.set_option('display.expand_frame_repr', False)
def _connect_mongo(host, port, db):
    conn = MongoClient(host, port)
    return conn[db]
def read_mongo(db, collection, host, port):
    """ Read from Mongo and Store into DataFrame """
    # Connect to MongoDB
    db = _connect_mongo(host=host, port=port, db=db)
    cursor = db[collection].find({'created_at':{'$regex': '2013'}},
                                 {'place.country':'US'}, no_cursor_timeout=True).toArray()
    print cursor
    # Expand the cursor and construct the DataFrame
    df = pd.DataFrame(list(cursor))
    return df
db = 'twittersmall' #'twitter'
collection='twitterdata' #'twitterCol' #
#query={'lang':'{$exists: true}'}
host='localhost'
port=27017
var = read_mongo(db, collection, host, port)
print var
Under the user column, the pandas DataFrame prints only this:
False {u'follow_request_sent':
u'profile_use_b...
The rest of the attributes are not printed, and I cannot access them by writing var['user.screen_name'] in the Python code either.
How can I access the data?

First, import json_normalize with from pandas.io.json import json_normalize (in pandas 1.0 and later it is also available as pandas.json_normalize).
Then your read_mongo function should look like this:
def read_mongo(db, collection, host, port):
    """ Read from Mongo and Store into DataFrame """
    # Connect to MongoDB
    db = _connect_mongo(host=host, port=port, db=db)
    cursor = db[collection].find({'created_at':{'$regex': '2013'}},
                                 no_cursor_timeout=True)
    # Materialize the cursor so json_normalize receives a list of dicts
    cursor = list(cursor)
    df = json_normalize(cursor)
    return df
Here json_normalize flattens the fields that have children and turns them into columns of the pandas DataFrame, named with dotted paths such as user.screen_name.
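To make the flattening concrete, here is a minimal pure-Python sketch of the idea. It is not pandas' implementation; the flatten helper and the sample tweet (including its id value) are hypothetical, and only the user values are taken from the document shown in the question.

```python
def flatten(doc, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in doc.items():
        full_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            # Recurse into child objects and keep the dotted path as the key
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat

# Hypothetical sample document shaped like the one in the question
tweet = {
    "id": 1,
    "user": {"screen_name": "sameh7753", "followers_count": 5, "lang": "ar"},
}

row = flatten(tweet)
print(sorted(row))
print(row["user.screen_name"])
```

Each flattened dict then corresponds to one row, which is why var['user.screen_name'] becomes a valid column lookup after normalizing.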

Related

Create a mongodb collection index with expireAfterSeconds using pymongo

I have a collection in MongoDB. In my Python program, I have a variable named coll pointing at it. I want to create an index on a specified field, digestedOn, which will cause a record to expire after 7776000 seconds.
I know how to create a simple index in python: coll.create_index([( "digestedOn", pymongo.ASCENDING)]). Where do I stick the {"expireAfterSeconds": 7776000} part?
Here's my whole program, I need the last line fixed so that the index is created with expireAfterSeconds.
import pymongo
import ssl
def connect_to_mongo(host, port, ssls, user, password, auth_source):
    return pymongo.MongoClient(host, port, ssl=ssls, username=user, ssl_cert_reqs=ssl.CERT_NONE,
                               password=password, authSource=auth_source,
                               authMechanism='SCRAM-SHA-1', maxPoolSize=None)
client = connect_to_mongo(host="10.10.10.10", port=27017, ssls=True, user="user",
                          password="password",auth_source="admin")
db = client['logs']
colnames = db.list_collection_names()
coll = db[colnames[0]]
coll.create_index([( "digestedOn", pymongo.ASCENDING )])

Just pass it as a named parameter:
coll.create_index([( "digestedOn", pymongo.ASCENDING )], expireAfterSeconds=7776000)
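As a small sanity check on the number itself, the sketch below derives 7776000 from a retention period in days and collects it as a keyword-argument dict. The ttl_seconds helper and the index_options name are hypothetical, introduced only for illustration.

```python
from datetime import timedelta

def ttl_seconds(days):
    """Convert a retention period in days to whole seconds."""
    return int(timedelta(days=days).total_seconds())

# Keyword arguments that could be forwarded to create_index
index_options = {"expireAfterSeconds": ttl_seconds(90)}
print(index_options)
```

With this, the call could be written as coll.create_index([("digestedOn", pymongo.ASCENDING)], **index_options). Note that MongoDB only expires documents whose indexed field holds a date value (or an array of dates), so digestedOn is assumed to be stored as a date.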

How to fetch mongo db data and pass in jmeter request

I have a POST call whose response looks like this:
{
"status":0,
"message":"Prescription Created",
"jsonResponse":{},
"cid":"C5975K",
"pid":"Rx5975K-175A",
"prescriptionSource":"GO_RX_CTO",
"imageStatus":[]
}
Using this pid, I have to query for another record. For example:
db.order.find({"pid":"Rx5975K-175A"})
The result of this query should then be passed to another JMeter request.
I tried MongoDB Script (DEPRECATED), but it does not work because it is deprecated.
I also tried a JSR223 Sampler, but it is not working in the new JMeter 3.2:
import com.mongodb.*
import com.mongodb.BasicDBObject
MongoCredential coreCredential = MongoCredential.createCredential("${mongodb_user}", "${mongodb_database}", "${mongodb_password}".toCharArray());
MongoClient coreMongoClient = new MongoClient(new ServerAddress("${mongodb_server}", 13017), Arrays.asList(coreCredential));
DB coreDB = coreMongoClient.getDB("${mongodb_database}");
DBCollection coll = coreDB.getCollection("order");
coll.find();

You have to find the result based on "pid", but you are not passing it anywhere. After getting the collection, you need to create a query and search with it:
import com.mongodb.*
import com.mongodb.BasicDBObject
MongoCredential coreCredential = MongoCredential.createCredential("${mongodb_user}", "${mongodb_database}", "${mongodb_password}".toCharArray());
MongoClient coreMongoClient = new MongoClient(new ServerAddress("${mongodb_server}", 13017), Arrays.asList(coreCredential));
DB coreDB = coreMongoClient.getDB("${mongodb_database}");
DBCollection coll = coreDB.getCollection("order");
BasicDBObject query = new BasicDBObject();
query.put("pid", "Rx5975K-175A");
DBObject getData= coll.findOne(query);
This should give you the desired result.

how to declare query to fetch data from mongodb?

import pandas as pd
from pymongo import MongoClient
import matplotlib.pyplot as plt
def _connect_mongo(host, port, db):
    """ A util for making a connection to mongo
    if username and password:
        mongo_uri = 'mongodb://%s:%s@%s:%s/%s' % (username, password, host, port, db)
        conn = MongoClient(mongo_uri)
    else:
    """
    conn = MongoClient(host, port)
    return conn[db]
def read_mongo(db, collection, host, port, query):
    """ Read from Mongo and Store into DataFrame """
    # Connect to MongoDB
    db = _connect_mongo(host=host, port=port, db=db)
    # Make a query to the specific DB and Collection
    cursor = db[collection].find(query)
    # Expand the cursor and construct the DataFrame
    df = pd.DataFrame(list(cursor))
    '''
    Delete the _id
    if no_id:
        del df['_id']
    '''
    return df
#initialization
db = 'twittersmall'
collection='twitterdata'
query='{lang:{$exists: true}}'
host='localhost'
port=27017
var = read_mongo(db, collection, host, port, query)
print var
tweets_by_lang = var['lang'].value_counts()
fig, ax = plt.subplots()
ax.tick_params(axis='x', labelsize=15)
ax.tick_params(axis='y', labelsize=10)
ax.set_xlabel('Languages', fontsize=15)
ax.set_ylabel('Number of tweets' , fontsize=15)
ax.set_title('Top 5 languages', fontsize=15, fontweight='bold')
tweets_by_lang[:5].plot(ax=ax, kind='bar', color='red')
In this code, I am trying to fetch the documents from MongoDB in which the lang field exists (it might be null). So I assigned to query a filter to be used in the fetch. The problem is that when I initialize query='{lang:{$exists: true}}', query is a string, and it must be a dictionary. When I declare query={lang:{$exists: true}}, I get a syntax error, obviously because, as far as I know, a dictionary is declared as {'key':'value'}. And when I declare query={'lang':'{$exists: true}'}, it fails with a KeyError, even though there is a field called lang in the database.
So, how do I declare this query and pass it to the method?
P.S. When I use query={lang:{$exists: true}} in the WebStorm terminal, it works, but I am currently working in a Jupyter notebook, which integrates IPython, so that I can create graphs and charts from the MongoDB data. I also use pandas for the DataFrame.
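One way to declare such a filter, sketched under the assumption that read_mongo passes query straight to find(): write it as a nested Python dict, quoting both the field name and the $exists operator and using Python's True. The query_from_json variable is hypothetical and only shows that strict JSON text parses to the same dict.

```python
import json

# The filter as a Python dict: the field name and the operator are quoted
# strings, and the boolean is Python's True rather than JavaScript's true.
query = {"lang": {"$exists": True}}

# The same filter parsed from strict JSON text (keys must be double-quoted).
query_from_json = json.loads('{"lang": {"$exists": true}}')

print(query == query_from_json)
```

The shell form {lang:{$exists: true}} is JavaScript syntax, which is why it works in a JavaScript terminal but is a syntax error in Python. Writing '{$exists: true}' as a quoted value instead asks for documents whose lang equals that literal string, which would likely match nothing.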

How to add data in right collection in pymongo?

I want to add data to the right collection, chosen by name. The code below defines a function collections(db, name) that returns the collection for a given name. But when I store the result via rightCollection = collections(db, name) and insert with db.rightCollection.insert({"1" : "Righ collection"}), PyMongo creates a collection named rightCollection, not Peter. I want to insert the data into Peter. Why does this happen, and how can I fix it?
from pymongo import MongoClient
def collections(db,name):
    if(name is 'Peter'):
        return db.Peter
client = MongoClient()
db = client.myDB
name="Peter"
rightCollection = collections(db, name)
db.rightCollection.insert({"1" : "Righ collection"})

Using pymongo 3.2.2, you don't need the collections function, you can just use the collection name directly:
from pymongo import MongoClient
client = MongoClient()
db = client.myDB
db.Peter.insert_one({'1': 'Right collection'})
That should insert the document {'1': 'Right collection'} into the collection Peter in database myDB. To verify that the data was inserted correctly, you can use the mongo shell:
> use myDB
> db.Peter.find()
{ "_id": ObjectId("57df7a4f98e914c98d540992"), "1": "Right collection" }
Or, if you need the name Peter to be defined in a variable, you can do:
from pymongo import MongoClient
client = MongoClient()
db = client.myDB
coll_name = 'Peter'
db[coll_name].insert_one({'1': 'Right collection'})
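The difference between db.rightCollection and db[coll_name] can be illustrated without a running server. The FakeDatabase class below is a hypothetical stand-in, not PyMongo code; it only mimics the way attribute access uses the literal attribute name while subscript access uses the value of the expression.

```python
class FakeDatabase:
    """Hypothetical stand-in that mimics name-based collection lookup."""

    def __getattr__(self, name):
        # Reached for db.<name>: the attribute name itself is the collection name
        return self[name]

    def __getitem__(self, name):
        # Reached for db[<expression>]: the expression's value is the collection name
        return "collection:" + name

db = FakeDatabase()
coll_name = "Peter"
rightCollection = db[coll_name]

print(db.rightCollection)  # looks up a collection literally named rightCollection
print(db[coll_name])       # looks up the collection named by the variable's value
print(rightCollection)     # the variable already refers to the Peter collection
```

Under that reading, the question's code would also work by calling insert_one on the rightCollection variable itself rather than on db.rightCollection. The comparison name is 'Peter' in the question is better written as name == 'Peter', since is tests object identity rather than string equality.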

How to save data in mongo collection using spark with mongo-hadoop connector?

I followed mongo-hadoop connector's documentation.
I am able to transfer data from inputCol collection to outputCol collection in testDB database using:
Configuration mongodbConfig = new Configuration();
mongodbConfig.set("mongo.job.input.format", "com.mongodb.hadoop.MongoInputFormat");
mongodbConfig.set("mongo.input.uri", "mongodb://localhost:27017/testDB.inputCol");
JavaSparkContext sc = new JavaSparkContext(sparkClient.sparkContext);
JavaPairRDD<Object, BSONObject> documents = sc.newAPIHadoopRDD(
    mongodbConfig,            // Configuration
    MongoInputFormat.class,   // InputFormat: read from a live cluster.
    Object.class,             // Key class
    BSONObject.class          // Value class
);
Configuration outputConfig = new Configuration();
outputConfig.set("mongo.output.format",
    "com.mongodb.hadoop.MongoOutputFormat");
outputConfig.set("mongo.output.uri",
    "mongodb://localhost:27017/testDB.outputCol");
documents.saveAsNewAPIHadoopFile(
    "file:///this-is-completely-unused",
    Object.class,
    BSONObject.class,
    MongoOutputFormat.class,
    outputConfig
);
I want to save a simple document say
{"_id":1, "name":"dev"}
in outputCol collection in testDB database.
How can I achieve that?

To use a query with the Spark MongoDB Hadoop connector, you can set:
mongodbConfig.set("mongo.input.query","{'_id':1,'name':'dev'}")

It's the same: just put your BSONObject into an RDD[(Object, BSONObject)] (the Object key can be anything; null should be fine) and save it the same way you did for documents.
