Create schema.xml automatically for Solr from mongodb

Is there an option to automatically generate a schema.xml for Solr from MongoDB? E.g., each field of a document and its subdocuments in a collection should be indexed and searchable by default.

As written in this SO answer, Solr's Schemaless Mode could help you:
Solr supports a Schemaless Mode. When starting Solr this way, you are initially not bound to a schema. When you give Solr a first document it will guess the appropriate field types and generate a schema that includes those field types for you. These fields are then fixed. You may still add new fields on the fly that way.
What you still need to do is create an import route of some kind from your MongoDB into Solr.
After googling a bit, you may stumble over the SO question "Solr Data Import Handlers for MongoDB", which may help you with that part too.
Probably simpler would be to create a mongo query whose result contains all the relevant information you require, save the result as JSON, and send that to Solr's direct update handler, which can parse JSON.
So in short:
1. Create a new, empty core in Schemaless Mode.
2. Create an import of some kind that covers all entities and attributes you want.
3. Run the import.
4. Check if the result is as you want it to be.
As long as (4) is not satisfied, you may delete the core and repeat these steps.
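The "export as JSON and send it to the update handler" route could look roughly like the sketch below. It is only an illustration under several assumptions: the core is called mycore and listens on localhost:8983, its unique key field is named id, nested fields are joined with an underscore, and Node 18+ is available for fetch. None of these names come from the answer above.

```javascript
// Flatten a nested MongoDB-style document into a flat object, joining nested
// keys with "_", e.g. { address: { city: "Oslo" } } -> { address_city: "Oslo" }.
// Arrays and non-plain objects (such as Date) are passed through unchanged;
// arrays of subdocuments would need their own handling.
function flattenForSolr(doc, prefix = "", out = {}) {
  for (const [key, value] of Object.entries(doc)) {
    const path = prefix ? `${prefix}_${key}` : key;
    const isPlainObject =
      value !== null &&
      typeof value === "object" &&
      Object.getPrototypeOf(value) === Object.prototype;
    if (isPlainObject) {
      flattenForSolr(value, path, out);
    } else {
      out[path] = value;
    }
  }
  return out;
}

// Map Mongo's _id to a string field named "id" (assumed unique key of the core).
function toSolrDoc(mongoDoc) {
  const { _id, ...rest } = mongoDoc;
  return { id: String(_id), ...flattenForSolr(rest) };
}

// Send a batch to Solr's JSON update handler. The base URL is hypothetical.
async function postToSolr(docs, baseUrl = "http://localhost:8983/solr/mycore") {
  const res = await fetch(`${baseUrl}/update?commit=true`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(docs.map(toSolrDoc)),
  });
  if (!res.ok) throw new Error(`Solr update failed: ${res.status}`);
}
```

In Schemaless Mode the first batch sent this way determines the guessed field types, so it is worth checking the generated schema after a small test import before loading everything.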

No, MongoDB does not provide this option. You will have to create a script that maps documents to XML.

Related

Adding _cls to existing mongodb collections

I have an existing MongoDB without the _cls field in the documents.
Data will continue to enter the DB during its lifetime. It is added through Morphia, which doesn't add the _cls field automatically.
Adding a _cls field to the Morphia entities does not seem like the best idea.
Do you have a better idea for making the data coming from Morphia include the _cls field in the Mongo documents?
Edit:
I am using a Flask server in Python with mongoengine, which requires the field.
I saw the solution using @PreSave in Morphia. It is a good idea, and I will use it if no other solution is found. I am looking for a solution on the Python side; you aren't always able to change how the data is inserted.

You could use Morphia’s @PreSave annotation to ‘tweak’ the JSON document prior to it being saved to the database. You could then just inject the _cls field and value without having to declare a field in your Java class.

SHOW CREATE TABLE in mongodb

Does MongoDB have an analogue of MySQL's SHOW CREATE TABLE, which shows the create query for a collection?
Or can I create another collection like an existing one, with all its settings?

There is no analogue of SHOW CREATE TABLE.
But maybe you will find some useful functions here: https://docs.mongodb.com/manual/reference/command/nav-administration/
For example, you can retrieve information about indexes with the getIndexes function.
You can create indexes with the createIndexes function.
Example:
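var indexes = db.collection.getIndexes();
// createIndexes expects key patterns, so pass each index's "key" document (options such as unique are not copied here); newCollection is a placeholder name
db.newCollection.createIndexes(indexes.map(function (idx) { return idx.key; }));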
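To carry over collection options and index options as well, one possible approach is to combine the output of db.getCollectionInfos() with getIndexes(). The helper below is a hypothetical sketch, not a built-in: buildClonePlan and the collection names source and target are made up for illustration, and the shell usage in the comments assumes mongosh.

```javascript
// Build the arguments needed to recreate a collection under another name.
// collectionInfo: one element of db.getCollectionInfos({ name: "source" })
// indexes:        the array returned by db.source.getIndexes()
function buildClonePlan(collectionInfo, indexes, targetName) {
  return {
    create: { name: targetName, options: collectionInfo.options || {} },
    indexes: indexes
      // The _id index is created automatically with the collection.
      .filter((idx) => idx.name !== "_id_")
      // Split each definition into its key pattern and the remaining options,
      // dropping the bookkeeping fields "v" and "ns".
      .map(({ key, v, ns, ...options }) => ({ key, options })),
  };
}

// In mongosh the plan could be applied like this (not executed here):
//   var info = db.getCollectionInfos({ name: "source" })[0];
//   var plan = buildClonePlan(info, db.source.getIndexes(), "target");
//   db.createCollection(plan.create.name, plan.create.options);
//   plan.indexes.forEach(function (i) { db.target.createIndex(i.key, i.options); });
```

Special index types such as text indexes report an internal key pattern in getIndexes(), so they would likely need to be recreated by hand.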

Use MongoDB Compass: visualize, understand, and work with your data through an intuitive GUI.
https://www.mongodb.com/products/compass?_bt=208952627176&_bk=mongodb%20compass&_bm=e&_bn=g&utm_source=google&utm_campaign=Americas_US_CorpEntOnly_Brand_Alpha_FM&utm_keyword=mongodb%20compass&utm_device=c&utm_network=g&utm_medium=cpc&utm_creative=208952627176&utm_matchtype=e&_bt=208952627176&_bk=mongodb%20compass&_bm=e&_bn=g&jmp=search&gclid=Cj0KCQiAmITRBRCSARIsAEOZmr6S3Hw_plZO3dbZS7UGwhU2hS-EGz2vB1SR5tAuMOGd-6j82FkQunIaAgDQEALw_wcB

There is no good answer to this question because the schema involved when dealing with schema-less databases like MongoDB is dictated by the application, not the database.
The database will shove in whatever it is given, as there is nothing enforcing a consistent document structure within a given collection, even though all access to the database should be controlled through some kind of wrapper. In conclusion, the only place you should look for the schema is your model classes.

Import/Export of Mongo Collections, Preserving _id

I have a MEAN database application with a number of Mongo collections with hierarchical relationships via ObjectId. A copy of the application works locally offline, and another copy runs on the production server.
The data collectively describe rules and content that drive a complex process. These data need to be entered offline so that these processes can be tested before the data go into the production environment.
What I assumed I would be able to easily do is to export selected documents as JSON, then relatively simply import them into the production database. So, the system would have a big "Export" button that would take the current document and all subdocuments and related documents, and export them as a single JSON file. Then, my "Import" button would parse that JSON file on the production server.
So, exporting is no problem. Did that in a couple of hours.
But, I quickly found that when I import a document, its _id field value is not preserved. This breaks relationships, obviously.
I have considered writing parsing routines that preserved these relationships by programmatically setting ObjectIds in parent documents after the child documents have been saved. This will be a huge headache though.
I'm hoping there is either:
a) ... an easy way to import a JSON document with _id fields intact, or ...
b) ... another way to accomplish this entirely that is easier than I am making it.
I appreciate any advice.

There's always got to be someone that doesn't know the answer who complains about the question. The question is clear and the problem is familiar.
Indeed, Mongoose will overwrite any value you provide for _id when you create a document either via the create() method or using the constructor (var thing = new Thing()).
Also, mongoexport/mongoimport will not fill the need to do this programmatically, at least not easily.
If I'm understanding correctly, you want to export a subset of documents, along with any related documents, keeping references intact. Then, you want to import this data into a remote system, again, keeping references intact.
The approach you took would work just fine except it will destroy all references, as you found out.
I've worked on a similar problem and I believe that the best way to do this is to do what it sounds like you wanted to avoid. That is, you'll iterate over your collections and let Mongo generate its _ids as it will. Add your child documents first, then set the references correctly in your parent documents. I really don't think there is a better way that still gives you granular control.
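A minimal sketch of that children-first import is shown below. It assumes a single level of parent/child references stored as an array of ids in one field; importWithRemappedIds, refField and insertFn are hypothetical names, with insertFn standing in for whatever call actually saves a document and returns the id the database generated.

```javascript
// Re-insert exported documents children-first, letting the target database
// assign new ids, then rewrite parent references from old ids to new ones.
// insertFn(doc) must save the document (passed without _id) and return its new id.
async function importWithRemappedIds(children, parents, refField, insertFn) {
  const idMap = new Map(); // old id (as string) -> newly generated id

  for (const child of children) {
    const { _id: oldId, ...body } = child;
    idMap.set(String(oldId), await insertFn(body));
  }

  const imported = [];
  for (const parent of parents) {
    const { _id: oldId, ...body } = parent;
    // Replace every old child id with the id generated during this import.
    body[refField] = (parent[refField] || []).map((ref) => {
      const newId = idMap.get(String(ref));
      if (newId === undefined) throw new Error(`Unresolved reference: ${ref}`);
      return newId;
    });
    idMap.set(String(oldId), await insertFn(body));
    imported.push(body);
  }
  return { idMap, imported };
}
```

Deeper hierarchies could reuse the same id map by processing the collections level by level, leaves first.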

In the current version of MongoDB you can use db.copyDatabase(). Start the MongoDB instance where you want to copy the database and run the following command:
db.copyDatabase(fromDB, toDB)
For more options and details, refer to the db.copyDatabase() documentation.

What is the typical usage of ElasticSearch in conjunction with other storage?

It is not recommended to use ElasticSearch as the only storage, for some obvious reasons like security, transactions, etc. So how is it usually used together with another database?
Say I want to store some documents in MongoDB and be able to search effectively by some of their properties. What I'd do is store the full document in Mongo as usual and then trigger an insertion into ElasticSearch, but I'd insert only the searchable properties plus the MongoDB ObjectID there. Then I can search using ElasticSearch and, having found the ObjectID, go to Mongo and fetch the whole documents.
Is this a correct usage of ElasticSearch? I don't want to duplicate all the data, as I already have it in Mongo.
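The flow described in this question can be sketched as follows. This is only an illustration: toSearchDoc, searchFullDocuments, the mongoId field name, and the searchIds/fetchByIds callbacks are hypothetical stand-ins for an ElasticSearch query and a MongoDB lookup by a list of ids.

```javascript
// Keep only the searchable properties plus the MongoDB id for the search index.
function toSearchDoc(mongoDoc, searchableFields) {
  const searchDoc = { mongoId: String(mongoDoc._id) };
  for (const field of searchableFields) {
    if (field in mongoDoc) searchDoc[field] = mongoDoc[field];
  }
  return searchDoc;
}

// Two-step lookup: ask the search index for matching ids, then load the full
// documents from the primary store.
async function searchFullDocuments(query, searchIds, fetchByIds) {
  const ids = await searchIds(query);
  if (ids.length === 0) return [];
  const docs = await fetchByIds(ids);
  // Restore the relevance order returned by the search engine.
  const byId = new Map(docs.map((d) => [String(d._id), d]));
  return ids.map((id) => byId.get(String(id))).filter(Boolean);
}
```

The second round trip is the price of not duplicating the documents; ids that no longer exist in the primary store are silently skipped in this sketch.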

The best practice for now is to duplicate documents in ES.
The cool thing here is that when you search, you don't have to return to your database to fetch content, as ES provides it in a single call.
You have everything in the ES search response to display results to your user.
My 2 cents.

You may like to use the MongoDB river; take a look at this post.
There are more issues than the size of the data you store or index. You might like to have MongoDB as a backup with "near real time" queries for inserted data, and as a queue for the data to be indexed (you may like to run MongoDB as a cluster with a write concern suited to your application).

Creating collection only on MongoDB

I want to create only the collection structure.
i.e.
Say Products collection contains a list of Categories.
I want to specify this container structure by creating these dependencies, but I do not want to create any collection entry (say there is a loader program somewhere that bulk uploads the data).
The closest analogy in an RDBMS is: the DBA creates the schema design with constraints and dependencies; an application or ETL tool loads the actual data.
Most of the examples that I see simply create a sample collection and then invoke the
db.insert(document)
OR
db.save(document)
Is it even possible in MongoDB?
If the question is not clear, please let me know.
Thanks

The short answer is NO.
You cannot create a schema in MongoDB. A collection is just a set of documents. Furthermore, dependencies are likely to be represented with embedded documents (as opposed to referenced documents).
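To make the embedded versus referenced distinction concrete, here is a small sketch using the Products/Categories example from the question. All field names and values are invented for illustration; the point is only that the structure lives in the documents and in application code.

```javascript
// Embedded: the categories live inside the product document itself.
const productEmbedded = {
  name: "Laptop",
  categories: [{ name: "Electronics" }, { name: "Computers" }],
};

// Referenced: categories are separate documents and the product stores their ids.
const categories = [
  { _id: "cat1", name: "Electronics" },
  { _id: "cat2", name: "Computers" },
];
const productReferenced = { name: "Laptop", categoryIds: ["cat1", "cat2"] };

// References are resolved in application code; the database does not enforce them.
function resolveCategories(product, allCategories) {
  const byId = new Map(allCategories.map((c) => [c._id, c]));
  return product.categoryIds.map((id) => byId.get(id));
}
```

With the embedded form a single read returns the product together with its categories, while the referenced form needs a second lookup such as resolveCategories.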
We can be more specific if you post the data you want to represent.