Omit database collection when importing from MongoDB to ElasticSearch

I'm trying to omit a collection from being included when importing from a MongoDB database to ElasticSearch.
https://github.com/appbaseio/abc is being used with a transform file, which has the following code:
t.Source('source', source, '/.*/')
  .Transform(omit(user_database_bugtest.users))
  .Save('sink', sink, '/.*/');
Database: user_database_bugtest
Collection to be omitted: users
I assume the part that isn't formatted correctly is user_database_bugtest.users, unless I need to make other changes as well.

In the end I used this transform file code, restricting the source namespace to just the collection I want instead of trying to omit the unwanted one:
t.Source("source", source, '/^testcollection$/').Save("sink", sink, "/.*/")

Related

Get Schema using collection name mongoose

Let's say I have a collection "employees" in MongoDB. Now I want to get the schema of that collection using Mongoose. Can I do that? I want to get the schema object from the collection name.
import mongoose from 'mongoose';

public getMappers(collectionName): Schema {
    let schema = mongoose.model(collectionName).schema;
    return schema;
}
Is there any way to do this?
Short answer: No.
Let me explain why. NoSQL databases are known for the flexibility of unknown data fields: one document can have a different set of fields from another sibling document. Because of this, a tool cannot determine all the fields in your schema (and keep in mind that fields can be added later as well). However, you can get a superset of fields by looking at all the documents in your collection and building a schema out of that.
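As a rough illustration of that superset approach (this is not a Mongoose API, just a sketch using PyMongo against an assumed employees collection; the database name and sample size are placeholders):

from collections import Counter
from pymongo import MongoClient

client = MongoClient()
coll = client["mydb"]["employees"]          # assumed database/collection names

field_counts = Counter()
total = 0
for doc in coll.find({}).limit(10000):      # sample; drop the limit for a full scan
    total += 1
    field_counts.update(doc.keys())         # top-level keys only

# A superset of the fields seen, plus how many sampled docs carry each one.
for field, count in field_counts.most_common():
    print(f"{field}: present in {count}/{total} sampled documents")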
MongoDB Compass has a Schema tab where you can analyze the collection, and its collection menu lets you export that schema analysis as JSON.
You will still need to do a lot of manipulation to create a schema out of this JSON, and it isn't really meant for creating a schema anyway; it is meant to help you understand the kind of data your collection holds, e.g. how many documents have a particular field, or how many unique values a field has (its cardinality).
Edit 1: I found that the analyzer runs only on a subset of documents, not the full collection, so some fields might be missed because of that. Read more here

MongoDB - querying GridFS by metadata does not return any results

I am trying to query a MongoDB database for a file stored in GridFS using its metadata, in the following way:
db['fs'].files.find({'metadata': {'a_field': 'a_value'}})
It does not return any results, whereas I can see that a file with such a field value exists when I run, e.g.:
db['fs'].files.find()
What is wrong about my query?
It turns out the problem is solved by changing the nesting of the JSON query document from:
{'metadata': {'a_field': 'a_value'}}
to:
{'metadata.a_field': 'a_value'}
It is still a mystery to me why the two queries are not equivalent, though.
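For what it's worth, the two are not equivalent because {'metadata': {'a_field': 'a_value'}} asks for documents whose metadata value is exactly that embedded document (no extra fields), while the dot-notation form matches any document whose metadata.a_field has that value, regardless of what else metadata contains. A quick PyMongo illustration, using the field names from the question and an assumed database name:

from pymongo import MongoClient

client = MongoClient()
fs_files = client["mydb"]["fs.files"]       # assumed database name

# Exact embedded-document match: only returns files whose metadata is
# exactly {'a_field': 'a_value'} and nothing else.
exact = fs_files.find({"metadata": {"a_field": "a_value"}})

# Dot notation: returns any file where metadata.a_field == 'a_value',
# no matter what other keys metadata carries.
by_field = fs_files.find({"metadata.a_field": "a_value"})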

mongodb datatypes when importing

I want to clone a collection to a new collection, remove all the documents, and then import new documents from a CSV file. When I do the copy using copyTo everything works fine: the datatypes are copied over from the source collection to the new collection. However, after I remove all the documents from the new collection and import from the CSV, the datatypes are lost. The datatypes in my source CSV are already set up to match what is in the source collection I copied from.
Is there a way to preserve the datatypes after removing all documents from a collection?
How can I keep the datatypes from my CSV when importing? For example, my date columns show up as strings.
A new collection doesn't have a fixed schema, so the documents added to it don't have to be similar unless you've created the collection with the validator option. You can also add validation to an existing collection. See Document Validation in the MongoDB manual.
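If the CSV import is not preserving the types you want (as with the date columns here), one workaround is to do the import with a small script and convert the typed columns yourself. A minimal sketch using Python's csv module and PyMongo; the file name, column names and date format below are placeholders:

import csv
from datetime import datetime
from pymongo import MongoClient

client = MongoClient()
coll = client["mydb"]["target_collection"]      # placeholder names

with open("data.csv", newline="") as f:         # placeholder file
    for row in csv.DictReader(f):
        # Convert the columns that should not stay strings; adjust the
        # names and formats to match your CSV.
        row["created_at"] = datetime.strptime(row["created_at"], "%Y-%m-%d")
        row["amount"] = float(row["amount"])
        coll.insert_one(row)

If you are importing with mongoimport, newer versions also have a --columnsHaveTypes option for declaring column types directly, which may be worth a look before scripting it.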

MongoDB import to different collections set by a field

I have a file called data.json, extracted with mongoexport, with the following structure:
{"id":"63","name":"rcontent","table":"modules"}
{"id":"81","name":"choicegroup","table":"modules"}
{"id":"681","course":"1242","name":"Requeriments del curs","timemodified":"1388667164","table":"page"}
{"id":"682","course":"1242","name":"Guia d'estudi","timemodified":"1374183513","table":"page"}
What I need is to import this file into my local mongodb with a command like mongoimport or with pymongo, but storing every line in the collection named after the table value.
For example, the collection modules would contain the documents
{"id":"63","name":"rcontent"} and {"id":"81","name":"choicegroup"}
I've tried with mongoimport but I haven't seen any option which allows that. Does anyone know if there is a command or a method to do that?
Thank you
The basic steps for this using Python are:
parse the data.json file to create Python objects
extract the table key/value pair from each document object
insert the remaining doc into a pymongo collection
Thankfully, pymongo makes this pretty straightforward, as below:
import json
from pymongo import MongoClient

client = MongoClient()   # this will use default host and port
db = client['test-db']   # select the db to use

with open("data.json", "r") as json_f:
    for str_doc in json_f.readlines():
        doc = json.loads(str_doc)
        table = doc.pop("table")        # remove the 'table' key
        db[table].insert_one(doc)       # insert into the collection named in 'table'

mongodump by date / find() in dumped data

How do I dump all collections by date if my records don't have a timestamp field?
Fields: _id, name, email, carnumber... etc.
And how do I look up / find() documents in the archived/dumped database?
I need to create a search mechanism for searching in the archive.
You can pass a query to mongodump that will make it dump only a portion of your data. If you can't write a query that finds the required portion of data, then you're out of luck.
The result of mongodump is a collection of BSON files. They are not directly queryable, but you can load them into another database and query that. Or you can use the mongoexport utility, which creates JSON documents; JSON is a little bit easier to work with.
Although what Sergio says is broadly true, let me expand a bit:
First, you mention using _id: if that is an ObjectId (the default), then it contains a timestamp; the first 4 bytes are a Unix-style timestamp:
http://www.mongodb.org/display/DOCS/Object+IDs#ObjectIDs-BSONObjectIDSpecification
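Since the ObjectId embeds its creation time, you can build a date-range filter from it and use that either in the driver or as the query you pass to mongodump. A minimal PyMongo sketch, with placeholder database/collection names and an arbitrary cutoff date:

from datetime import datetime, timezone
from bson.objectid import ObjectId
from pymongo import MongoClient

# ObjectIds generated after this moment compare greater than this value.
cutoff = ObjectId.from_datetime(datetime(2014, 1, 1, tzinfo=timezone.utc))

client = MongoClient()
coll = client["mydb"]["mycoll"]                 # placeholder names

# Find documents whose _id was generated on or after the cutoff date.
for doc in coll.find({"_id": {"$gte": cutoff}}):
    print(doc["_id"].generation_time, doc.get("name"))

The same kind of _id range filter, written out as an extended JSON document, is what you would hand to mongodump's --query option to dump only that slice.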
Next, the problem with using mongoexport is that JSON does not preserve all BSON types (http://bsonspec.org/#/specification). BSON has more types than JSON does, so storing data as JSON can be problematic unless you have rules for re-importing it.
If you keep the data in BSON format, there is the bsondump tool to inspect things as-is in the files:
http://www.mongodb.org/display/DOCS/Import+Export+Tools#ImportExportTools-bsondump
Or, if you had an "archive" MongoDB instance, you could just use mongodump/mongorestore, which work directly with the BSON files and do not have the JSON issues seen with mongoexport etc.:
http://www.mongodb.org/display/DOCS/Import+Export+Tools#ImportExportTools-mongodumpandmongorestore