How to create an index when inserting a dataframe into MongoDB via PySpark

I need some help, please.
How can I add an index while inserting a dataframe into MongoDB like this?
my_dataframe.write.format("com.mongodb.spark.sql.DefaultSource").mode("append").save()
My dataframe contains a field named my_custom_id and I want to make it the new index instead of the default _id index added by MongoDB.
Thank you
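
This isn't answered in the thread, but here is a minimal sketch of two common approaches; the connection string, database name mydb, and collection name mycoll are placeholders, and my_dataframe is the DataFrame from the question. As far as I know the Spark connector does not create indexes itself, so a secondary index has to be created separately (for example with pymongo).

from pymongo import MongoClient
from pyspark.sql import functions as F

# Option 1: copy my_custom_id into _id before writing, so MongoDB uses it
# as the primary key (and its unique index) instead of a generated ObjectId.
(my_dataframe
    .withColumn("_id", F.col("my_custom_id"))
    .write
    .format("com.mongodb.spark.sql.DefaultSource")
    .mode("append")
    .save())

# Option 2: keep the generated _id and add a secondary unique index on
# my_custom_id with pymongo after (or before) the write.
client = MongoClient("mongodb://localhost:27017")  # placeholder URI
client["mydb"]["mycoll"].create_index("my_custom_id", unique=True)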

Related

Nested fields and partitioning in MongoDB to BigQuery template

I want to use the MongoDB to BigQuery Dataflow template and I have two questions:
Can I somehow configure partitioning for the destination table, e.g. if I want to dump my database every day?
Can I map nested fields in MongoDB to records in BQ instead of columns with string values?
I see the User option with values FLATTEN and NONE, but FLATTEN will flatten documents one level only.
Might either of these two approaches help?
Create a destination table with the structure defined before running Dataflow
Using a UDF
I tried the MongoDB to BigQuery Dataflow template with the User option set to FLATTEN.
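
Neither question is answered in the thread. As a hedged sketch of the first approach only (pre-creating the destination table with nested RECORD fields and daily partitioning before running the template), here is what that could look like with the google-cloud-bigquery client; the project, dataset, table, and field names are invented, and whether the template keeps this layout when it loads the data would still need to be verified.

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder schema: a nested MongoDB subdocument modelled as a RECORD
# column instead of a flattened string column.
schema = [
    bigquery.SchemaField("id", "STRING"),
    bigquery.SchemaField("created_at", "TIMESTAMP"),
    bigquery.SchemaField(
        "address",
        "RECORD",
        fields=[
            bigquery.SchemaField("city", "STRING"),
            bigquery.SchemaField("zip", "STRING"),
        ],
    ),
]

table = bigquery.Table("my-project.my_dataset.my_table", schema=schema)
# Day-based partitioning on the timestamp column, for daily dumps.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="created_at",
)
client.create_table(table)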

Mongoose insert based on filter

I need to insert a document in MongoDB using Mongoose when a certain condition is met. How can I achieve this with a single query?
I know I can do this using the find and create functions, but I want to do it in a single DB query. Is that possible?
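
No answer is given in the thread. One common way to do this in a single query is an upsert with $setOnInsert, which only inserts when nothing matches the filter; in Mongoose the same idea is Model.updateOne(filter, update, { upsert: true }). A pymongo sketch of the pattern, with a made-up filter and fields:

from pymongo import MongoClient

coll = MongoClient("mongodb://localhost:27017")["mydb"]["mycoll"]  # placeholders

# One round trip: if no document matches the filter, a new one is inserted
# with the $setOnInsert fields; if a match exists, nothing is modified.
coll.update_one(
    {"email": "user@example.com"},
    {"$setOnInsert": {"email": "user@example.com", "name": "New user"}},
    upsert=True,
)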

Updating old entries on MongoDB

I have some JSON entries in MongoDB, and I plan to add new attributes to all of them, like creating a new column with a default value in an RDBMS.
Is there any way to do this with MongoDB?
If you mean that every new document needs to be guaranteed to have the new fields, I'm afraid that's not possible in MongoDB itself, because it is schemaless. I'm not familiar with Mongoose, however, and there may be a solution using it.
If, as your title suggests, you just want to add the new fields to all existing documents, you can do it with an update with an empty filter, like this:
db.collection.updateMany({}, {"$set": {"newfield1": "val1", "newfield2": "val2"}})
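
For completeness, the same update from Python with pymongo (connection string, database, and collection names are placeholders):

from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["mydb"]

# Empty filter {} matches every document; $set adds the new fields
# (with their default values) to all existing entries.
db["mycoll"].update_many({}, {"$set": {"newfield1": "val1", "newfield2": "val2"}})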

Easiest way to replace existing mongo collection with new one in Meteor

I have a CSV file that I've imported into a Meteor project. I've updated the CSV file (added a couple of columns with data) and I'd like to re-import it. If I just import it again, will it overwrite the first one? Or would I end up with two collections with the same name? What's the best way to do this?
If you re-import the file, it will insert into the collection, not update it.
So if your collection has a unique index on a field (like _id, which is indexed and unique by default) and that field is a column in the CSV file, then when you import again, MongoDB will throw an error saying you have violated a unique constraint and stop; your old data is untouched.
If not (your collection doesn't have any other unique index and _id is not a column in the CSV file), then after re-importing, your collection will have duplicate records: the old data plus the new data you just imported.
Either way, the result is not what you wanted.
You can't have 2 collections with the same name in the same database.
The easiest way: if your data is not important, you can just drop the collection and import again.
Otherwise, you will have to update the documents in MongoDB (using the mongo console or by writing a script).
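
A sketch of the "write a script" option, assuming the updated CSV has an _id column; the file, database, and collection names are placeholders. Each row is upserted keyed on _id, so existing documents pick up the new columns and nothing is duplicated.

import csv
from pymongo import MongoClient, ReplaceOne

coll = MongoClient("mongodb://localhost:27017")["meteor"]["mycollection"]

with open("updated.csv", newline="") as f:
    # Upsert keyed on _id: matching documents are replaced with the new row
    # (including the added columns); unmatched rows are inserted.
    ops = [ReplaceOne({"_id": row["_id"]}, row, upsert=True)
           for row in csv.DictReader(f)]

if ops:
    coll.bulk_write(ops)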

Hadoop - producing multi-column output (MongoDB)

I am using Hadoop to apply map reduce in my MongoDB database.
I was able to execute the sample in this link.
Right now I only get a key/value pair in the output collection after the map-reduce job is executed. I wonder if it is possible to save multiple columns in a map-reduce output collection,
or an embedded document in the value column?
Thanks.
Yes - use BSONWritable as your reducer output class, and create a BSONWritable object with as many columns as you need.
See example here:
https://github.com/mongodb/mongo-hadoop/blob/master/examples/treasury_yield/src/main/java/com/mongodb/hadoop/examples/treasury/TreasuryYieldReducer.java