How to Define and Use Variables in Pymongo Aggregation Script - mongodb

I am trying to learn about MongoDB aggregation. I've been able to get the commands to work for a single output. I am now working on a pymongo script to parse through a dirty collection and output sterilised data into a clean collection. I am stuck on how to define variables properly so that I can use them in the aggregation command. Please forgive me if this turns out to be a trivial matter; I've been searching through the online documentation for a while now without any luck.
This is the script so far:
from pymongo import MongoClient
import os, glob, json
#
var_Ticker = "corn"
var_Instrument = "Instrument"
var_Date = "Date"
var_OpenPrice = "prices.openPrice.bid"
var_HighPrice = "prices.highPrice.bid"
var_LowPrice = "prices.lowPrice.bid"
var_ClosePrice = "prices.closePrice.bid"
var_Volume = "prices.lastTradedVolume"
var_Unwind = "$prices"
#
#
client = MongoClient()
db = client.cmdty
col_clean = var_Ticker + "_clean"
col_dirty = var_Ticker + "_dirty"
db[col_dirty].aggregate([{$project:{_id:0,var_Instrument:1,var_Date:1,var_OpenPrice:1,var_HighPrice:1,var_LowPrice:1,var_ClosePrice:1,var_Volume:1}},{$unwind:var_Unwind},{$out:col_clean}])
This is the error that I get:
>>> db[col_dirty].aggregate([{$project:{_id:0,var_Instrument:1,var_Date:1,var_OpenPrice:1,var_HighPrice:1,var_LowPrice:1,var_ClosePrice:1,var_Volume:1}},{$unwind:var_Unwind},{$out:col_clean}])
File "<stdin>", line 1
db[col_dirty].aggregate([{$project:{_id:0,var_Instrument:1,var_Date:1,var_OpenPrice:1,var_HighPrice:1,var_LowPrice:1,var_ClosePrice:1,var_Volume:1}},{$unwind:var_Unwind},{$out:col_clean}])
^
SyntaxError: invalid syntax
If I take out the variables and use the proper values, the command works fine.
Any assistance would be greatly appreciated.

In Python, aggregation operator names such as "$project" are plain strings, so they must be wrapped in quotes:
db[col_dirty].aggregate([{"$project":{"_id":0,var_Instrument:1 ...
The same goes for "_id", which is a literal string. This is different from JavaScript, where unquoted object keys are treated as strings.
Note that you should not put quotes around var_Instrument, since that is not a string literal; it's a variable whose value is a string.
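As a rough sketch of the point above, the whole pipeline can be built as ordinary Python dicts: operator names and "_id" are quoted string literals, while the field names come from variables. The helper name build_pipeline is hypothetical, and only a few of the question's fields are shown. No database connection is needed to build or inspect the pipeline.

```python
# Field names held in variables, as in the question.
var_instrument = "Instrument"
var_date = "Date"
var_open_price = "prices.openPrice.bid"
var_unwind = "$prices"
col_clean = "corn_clean"


def build_pipeline(fields, unwind_path, out_collection):
    """Return an aggregation pipeline as a list of plain Python dicts.

    build_pipeline is a hypothetical helper, not part of pymongo.
    """
    projection = {"_id": 0}  # "_id" is a quoted string literal
    for field in fields:
        projection[field] = 1  # the variable's value becomes the key
    return [
        {"$project": projection},
        {"$unwind": unwind_path},
        {"$out": out_collection},
    ]


pipeline = build_pipeline(
    [var_instrument, var_date, var_open_price], var_unwind, col_clean
)
print(pipeline)
# With a live connection this would be passed as:
#   db[col_dirty].aggregate(pipeline)
```

Building the list first also makes it easy to print and check the pipeline before running it against the collection.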

Related

Save custom transformers in pyspark

When I implement this part of this python code in Azure Databricks:
class customTransformations(Transformer):
<code>
custom_transformer = customTransformations()
....
pipeline = Pipeline(stages=[custom_transformer, assembler, scaler, rf])
pipeline_model = pipeline.fit(sample_data)
pipeline_model.save(<your path>)
When I attempt to save the pipeline, I get this:
AttributeError: 'customTransformations' object has no attribute '_to_java'
Any work arounds?
It seems like there is no easy workaround but to try and implement the _to_java method, as is suggested here for StopWordsRemover:
Serialize a custom transformer using python to be used within a Pyspark ML pipeline
def _to_java(self):
    """
    Convert this instance to a dill dump, then to a list of strings with the unicode integer values of each character.
    Use this list as a set of dummy stopwords and store in a StopWordsRemover instance
    :return: Java object equivalent to this instance.
    """
    dmp = dill.dumps(self)
    pylist = [str(ord(d)) for d in dmp]  # convert bytes to string integer list
    pylist.append(PysparkObjId._getPyObjId())  # add our id so PysparkPipelineWrapper can id us.
    sc = SparkContext._active_spark_context
    java_class = sc._gateway.jvm.java.lang.String
    java_array = sc._gateway.new_array(java_class, len(pylist))
    for i in xrange(len(pylist)):
        java_array[i] = pylist[i]
    _java_obj = JavaParams._new_java_obj(PysparkObjId._getCarrierClass(javaName=True), self.uid)
    _java_obj.setStopWords(java_array)
    return _java_obj

Console log in MongoDB Shell

I want to write functions into MongoDB Shell like this:
var last = function(collection) { db[collection].find().sort({_id: -1}).limit(1).toArray(); }
But there is one problem: when I call the last() function, it produces no output. How can I fix it?
The function neither returns nor prints the result, so nothing is shown. Use either the JavaScript print() function or the mongo-specific printjson() function, which prints formatted JSON, to output the result of the find method, for example:
var last = function(collection) {
    var doc = db.getCollection(collection).find().sort({_id: -1}).limit(1).toArray();
    printjson(doc);
};
last("test");

Neo4j: Converting REST call output to JSON

I have a requirement to convert the output of cypher into JSON.
Here is my code snippet.
RestCypherQueryEngine rcqer=new RestCypherQueryEngine(restapi);
String nodeN = "MATCH n=(Company) WITH COLLECT(n) AS paths RETURN EXTRACT(k IN paths | LAST(nodes(k))) as lastNode";
final QueryResult<Map<String,Object>> queryResult = rcqer.query(searchQuery);
for(Map<String,Object> row:queryResult)
{
System.out.println((ArrayList)row.get("lastNode"));
}
Output:
[http://XXX.YY6.192.103:7474/db/data/node/445, http://XXX.YY6.192.103:7474/db/data/node/446, http://XXX.YY6.192.103:7474/db/data/node/447, http://XXX.YY6.192.103:7474/db/data/node/448, http://XXX.YY6.192.103:7474/db/data/node/449, http://XXX.YY6.192.103:7474/db/data/node/450, http://XXX.YY6.192.103:7474/db/data/node/451, http://XXX.YY6.192.103:7474/db/data/node/452, http://XXX.YY6.192.103:7474/db/data/node/453]
I am not able to see the actual data (I am getting URL's). I am pretty sure I am missing something here.
I would also like to convert the output to JSON.
The cypher works in my browser interface.
I looked at various articles around this:
Java neo4j, REST and memory
Neo4j Cypher: How to iterate over ExecutionResult result
Converting ExecutionResult object to json
The last 2 make use of EmbeddedDatabase which may not be possible in my scenario (as the Neo is hosted in another cloud, hence the usage of REST).
Thanks.
Your query does not make sense as written; try to work out what each part of it is doing.
Perhaps you should revisit the online course for Cypher: http://neo4j.com/online-course
Instead of
MATCH n=(Company) WITH COLLECT(n) AS paths RETURN EXTRACT(k IN paths | LAST(nodes(k))) as lastNode
you can just do:
MATCH (c:Company) RETURN c
RestCypherQueryEngine rcqer=new RestCypherQueryEngine(restapi);
final QueryResult<Map<String,Object>> queryResult = rcqer.query(query);
for(Node node : queryResult.to(Node.class))
{
    for (String prop : node.getPropertyKeys()) {
        System.out.println(prop+" "+node.getProperty(prop));
    }
}
I think it's better to use the JDBC driver for what you're trying to do, and to actually return the properties you want to convert to JSON.

Regular Expression error in mongodb if string contains arithmetic operators

I am using MongoDB in my Meteor application. I query with a regular expression to check whether a name or code is already present in the database. The strings can contain digits and special characters. When the string contains special characters such as ++, the regular expression fails with this error:
Exception while invoking method
'createSubject' SyntaxError: Invalid regular expression: /^C++$/: Nothing to repeat
I20140109-13:15:21.277(5.5)? at new RegExp ()
My code is:
var code_regex = new RegExp(["^",code,"$"].join(""),"i");
var curr = Meteor.curri.findOne({code: code_regex});
It works fine with ordinary strings, but when I tried C++ as the code it produced the above error.
You need to escape the special characters, because the text C++ is interpreted as part of the regular expression: + is a quantifier meaning "one or more of the preceding expression", so the second + has nothing to repeat.
From: How to escape regular expression special characters using javascript?
RegExp.escape = function(text) {
    return text.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, "\\$&");
};
var code_regex = new RegExp(["^",
    RegExp.escape(code),
    "$"].join(""),"i");

Passing dynamic value gets failed in Mongodb using java script

I have to copy data from one collection to another based on a date. The date is calculated dynamically as yesterday's date, and that part works properly.
If I pass the dynamic date value as /$yesterday/ to the mongo find method, it fails.
Assume the data_timestamp format is 2013-08-20 17:04:40.633; I am trying to get the result with a LIKE-style query.
Sample JS Code:
db=db.getSiblingDB('masterdb')
$today = new Date();
$yesterday = new Date($today);
$yesterday.setDate($today.getDate() - 1);
var $dd = $yesterday.getDate();
var $mm = $yesterday.getMonth()+1;
var $yyyy = $yesterday.getFullYear();
if($dd<10){$dd='0'+$dd}
if($mm<10){$mm='0'+$mm}
$yesterday = $yyyy+'-'+$mm+'-'+$dd;
db.mastercollection.find( { "data_timestamp": /$yesterday/ } ).forEach( function(x){db.newcollection.insert(x)} );
Is there any other way to pass a dynamic value without using the '$' symbol?
Please share your valuable comments
Thanks in advance...
Ramesh Kasi
The way you're doing your query now, /$yesterday/ is a regular expression literal, so the text between the slashes is taken as the pattern itself rather than as the name of your variable: it is read as the end-of-input anchor $ followed by the literal text "yesterday", which will not match your timestamps. A better approach would be to use the $regex operator so that you can pass in a JavaScript variable that holds the regular expression you hope to match.
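For illustration, here is a minimal sketch of the same $regex idea expressed as a Python query document, as it might be passed to pymongo's find(). It assumes data_timestamp is stored as a string in the format shown in the question; the helper name yesterday_filter is hypothetical.

```python
from datetime import date, timedelta


def yesterday_filter(today):
    """Build a filter matching string timestamps that start with yesterday's date.

    yesterday_filter is a hypothetical helper; data_timestamp is assumed
    to be stored as a string such as "2013-08-20 17:04:40.633".
    """
    yesterday = today - timedelta(days=1)
    prefix = yesterday.strftime("%Y-%m-%d")  # zero-padded, e.g. "2013-08-20"
    # "^" anchors the match at the start of the stored string.
    return {"data_timestamp": {"$regex": "^" + prefix}}


query = yesterday_filter(date(2013, 8, 21))
print(query)
# With a live connection this would be used as:
#   db.mastercollection.find(query)
```

Because the pattern is an ordinary string built at run time, the variable's value ends up in the query, which the /.../ literal cannot do. Using strftime also removes the manual zero-padding of the day and month.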