Elasticsearch and NoSql database [duplicate] - nosql

This question already has answers here:
Elasticsearch as a database? [closed]
(4 answers)
Closed 8 years ago.
What is the point of using both Elasticsearch and a separate NoSQL database?
Can't Elasticsearch be used both as a database and for search indexing?

Yes, you can use ElasticSearch as a data source as well as an index.
By default, each document you send to ElasticSearch is indexed, and the original document is stored as well. This means that whenever you query ElasticSearch you can also retrieve the original JSON document that you indexed.
If you have large documents and want to retrieve a smaller amount of data, you can use the mapping API to set "store" to "yes" for specific fields, and then use the "fields" key to pull out only the fields you need.
In my system I have address autocompletion and I only fetch the address field of a property. Here is an example from my system:
_search?q=FullAddress:main&fields=FullAddress
Then when a user selects the address I pull up the entire JSON document (along with others).
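For reference, a field-limited search like the one above might look as follows with the Python Elasticsearch client. This is only a sketch: it assumes an 8.x client and an index named "properties", both of which are made up for illustration.

```python
# Hypothetical sketch: autocomplete-style search that returns only the
# FullAddress field instead of the whole stored document.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

resp = es.search(
    index="properties",                       # assumed index name
    query={"match": {"FullAddress": "main"}},
    fields=["FullAddress"],                   # return just this field
    source=False,                             # skip the full _source payload
)

for hit in resp["hits"]["hits"]:
    print(hit["_id"], hit["fields"]["FullAddress"])
```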
Note:
You cannot do updates like you can in SQL (update all items matching a query to increase an attribute, let's say)
You can, however, index a new document at the ID you want to update and replace the existing one. Elasticsearch increments a _version property on each document, which the developer can use to enforce optimistic concurrency, but it does not maintain a separate version history of each document; you can only retrieve the latest version of a document.
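A rough sketch of that optimistic-concurrency pattern with the Python client is below. The index and document ID are made up; note that recent Elasticsearch versions perform the check with _seq_no/_primary_term rather than with _version directly.

```python
# Hypothetical sketch: re-index a document only if nobody else changed it
# since we read it (optimistic concurrency).
from elasticsearch import Elasticsearch, ConflictError

es = Elasticsearch("http://localhost:9200")

doc = es.get(index="properties", id="42")           # assumed index and id
updated = dict(doc["_source"], FullAddress="123 Main St (corrected)")

try:
    es.index(
        index="properties",
        id="42",
        document=updated,
        if_seq_no=doc["_seq_no"],                   # reject if the doc changed
        if_primary_term=doc["_primary_term"],
    )
except ConflictError:
    # Someone wrote a newer version in the meantime: re-read and retry,
    # or surface the conflict to the caller.
    pass
```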

Related

How to solve concurrent read + write update to the document in MongoDB?

I have an application that runs multiple instances and that needs to perform this:
Read the document
Check a value of the timestamp field
If an incoming document is newer, then update (including the timestamp field).
Even if I use transactions, two parallel operations could perform steps 1 and 2 simultaneously, and then both would write to the database, potentially writing the "older" document last (since each checks the timestamp of the original document rather than the "new" one).
So what I am looking for is some kind of a read lock on the document or some other mechanism that will be able to solve this.
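For illustration, this is roughly the read-check-write flow in pymongo terms, plus the direction I'm considering; the database, collection and field names are placeholders.

```python
# Placeholder names throughout; this only illustrates the race described above.
from pymongo import MongoClient

docs = MongoClient("mongodb://localhost:27017")["mydb"]["docs"]

def store_if_newer_racy(incoming):
    current = docs.find_one({"_id": incoming["_id"]})                     # step 1: read
    if current is None or current["timestamp"] < incoming["timestamp"]:   # step 2: check
        # Step 3: write -- another instance may have stored a newer document
        # between the read above and this write, which is the problem.
        docs.replace_one({"_id": incoming["_id"]}, incoming, upsert=True)

def store_if_newer_atomic(incoming):
    # One direction I'm considering: fold the timestamp check into the update
    # filter so the comparison and the write happen atomically on the server.
    fields = {k: v for k, v in incoming.items() if k != "_id"}
    docs.update_one(
        {"_id": incoming["_id"], "timestamp": {"$lt": incoming["timestamp"]}},
        {"$set": fields},
    )
```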

How to get documents in Google Firestore by created/updated date? [duplicate]

This question already has answers here:
How can I query firestore by document snapshot updateTime?
(3 answers)
Closed 2 years ago.
I want to get only the documents that are newly created and/or updated in my collections/sub-collections.
I see we have create_date and update_date metadata methods on the document object (using the Python SDK).
Can we use them, and how? Or is there a better, easier way to do this?
Regards,
Sai
There is no way to query Firestore by this metadata. If you need to use such timestamps in your queries, you'll need to add them to your documents as regular fields, typically setting them as server-side timestamps (to prevent clients from tampering with them).
Also see: How can I query firestore by document snapshot updateTime?, which I'll mark your question as a duplicate of now that I found it.
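A small sketch of that approach with the Python SDK is below; the collection and field names are assumptions. The idea is to write a regular updated_at field with a server-side timestamp and then query on it.

```python
# Hypothetical sketch: maintain an "updated_at" field set by the server,
# then query documents changed after a cutoff.
import datetime
from google.cloud import firestore

db = firestore.Client()

# On every create/update, let the server stamp the time so clients can't fake it.
db.collection("items").document("item1").set(
    {"name": "example", "updated_at": firestore.SERVER_TIMESTAMP},
    merge=True,
)

# Later: fetch only documents updated after a given point in time.
cutoff = datetime.datetime(2023, 1, 1, tzinfo=datetime.timezone.utc)
recent = (
    db.collection("items")
    .where("updated_at", ">", cutoff)
    .order_by("updated_at")
    .stream()
)
for snap in recent:
    print(snap.id, snap.to_dict())
```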

MongoDB - Save vs Update [duplicate]

This question already has answers here:
Mongoose difference between .save() and using update()
(4 answers)
Closed 3 years ago.
I have around 400 fields in my collection (both at the top level and embedded). The write queries have the following characteristics:
Every write query updates a single document, touching an average of 60 fields in that document.
There are indexed fields in the collection, but no write query updates an indexed field.
The volume of write queries is very large.
I can use either .save() or .update() to update the document. With update I only pass the fields that need to be updated, whereas with save I pass the entire document. I want to know whether using update in this case will give me better performance than save (or vice versa), or whether it makes no difference at the database level and both perform equally well.
It doesn't make any significant difference in performance. The reasons are as follows:
When you save or update a document in MongoDB, you are probably calling save or update from another application that could be written in C#, Java, JavaScript, PHP or some other language.
In that case there is inter-process communication (or a network call if your MongoDB is running on another machine). Compared to this, the difference in time between selectively modifying a document with update and completely replacing it with save is negligible. By the way, save and update will both probably have run-time complexity of O(n) in the document size if no indexes are involved.
For a document with 250 fields, the size of the document is probably not big enough to matter. If the update document is significantly smaller than the full document, then use update.
Otherwise use save or update depending on which is more elegant in the client-side code.
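To make the distinction concrete, here is the same idea in pymongo terms (the Mongoose .save()/.update() calls behave analogously); the collection, id and field names are made up.

```python
# Hypothetical sketch: partial update with $set versus full-document replacement.
from pymongo import MongoClient

coll = MongoClient("mongodb://localhost:27017")["mydb"]["profiles"]
doc_id = "some-id"                              # placeholder _id
changed = {"status": "active", "visits": 42}    # ~60 fields in practice

# update: only the listed fields are sent over the wire and rewritten.
coll.update_one({"_id": doc_id}, {"$set": changed})

# save-style: the entire document is sent and stored again.
full_doc = coll.find_one({"_id": doc_id})
full_doc.update(changed)
coll.replace_one({"_id": doc_id}, full_doc)
```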

Create schema.xml automatically for Solr from mongodb

Is there an option to automatically generate a schema.xml for Solr from MongoDB? E.g. each field of a document and its subdocuments in a collection should be indexed and searchable by default.
As written in this SO answer, Solr's Schemaless Mode could help you:
Solr supports a Schemaless Mode. When starting Solr this way, you are initially not bound to a schema. When you give Solr a first document it will guess the appropriate field types and generate a schema that includes those field types for you. These fields are then fixed. You may still add new fields on the fly that way.
What you still need to do is to create an Import Route of some kind from your mongodb into Solr.
After googling a bit, you may stumble over the SO question - solr Data Import Handlers for MongoDB - which may help you on that part too.
Probably simpler would be to create a mongo query whose result contains all relevant information you require, save the result to json and send that to Solr's direct update handler, which can parse json.
So in short:
1. Create a new, empty core in Schemaless Mode
2. Create an import of some kind that covers all entities and attributes you want
3. Run the import
4. Check if the result is as you want it to be
As long as (4) is not satisfied, you may delete the core and repeat these steps.
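A rough sketch of what steps (2) and (3) could look like is below: pull documents from MongoDB and post them as JSON to the schemaless core's update handler. The host, core and collection names are assumptions, and nested subdocuments are flattened with an underscore separator so Solr can guess a type for each field.

```python
# Hypothetical sketch: one-shot import from MongoDB into a schemaless Solr core.
import requests
from pymongo import MongoClient

SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update?commit=true"  # assumed core

def flatten(doc, prefix=""):
    # Turn {"a": {"b": 1}} into {"a_b": 1} so every value maps to a flat field.
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat

coll = MongoClient("mongodb://localhost:27017")["mydb"]["articles"]  # assumed collection

batch = []
for doc in coll.find():
    doc["id"] = str(doc.pop("_id"))    # Solr expects a plain "id" field
    batch.append(flatten(doc))

resp = requests.post(SOLR_UPDATE_URL, json=batch)  # the update handler accepts a JSON array
resp.raise_for_status()
```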
No, MongoDB does not provide this option. You will have to create a script that maps documents to XML.

What is the typical usage of ElasticSearch in conjunction with other storage?

It is not recommended to use ElasticSearch as the only storage, for some obvious reasons like security, transactions, etc. So how is it usually used together with another database?
Say I want to store some documents in MongoDB and be able to search effectively by some of their properties. What I'd do is store the full document in Mongo as usual and then trigger an insertion into ElasticSearch, but I'd insert only the searchable properties plus the MongoDB ObjectID there. Then I can search using ElasticSearch and, having found the ObjectID, go to Mongo and fetch the whole document.
Is this the correct usage of ElasticSearch? I don't want to duplicate the whole data set, as I already have it in Mongo.
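Concretely, the flow I have in mind looks roughly like this; the index, collection and field names are just placeholders.

```python
# Placeholder names; this only illustrates the index-then-fetch pattern above.
from bson import ObjectId
from elasticsearch import Elasticsearch
from pymongo import MongoClient

es = Elasticsearch("http://localhost:9200")
mongo = MongoClient("mongodb://localhost:27017")["mydb"]["documents"]

def index_document(doc):
    # Index only the searchable properties; the ObjectID doubles as the ES id.
    es.index(
        index="documents",
        id=str(doc["_id"]),
        document={"title": doc["title"], "tags": doc.get("tags", [])},
    )

def search_full_documents(text):
    resp = es.search(index="documents", query={"match": {"title": text}})
    ids = [ObjectId(hit["_id"]) for hit in resp["hits"]["hits"]]
    # Second hop: fetch the complete documents from MongoDB.
    return list(mongo.find({"_id": {"$in": ids}}))
```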
For now, the best practice is to duplicate documents in ES.
The nice thing here is that when you search, you don't have to go back to your database to fetch the content, as ES provides it in one single call.
You have everything in the ES search response to display results to your user.
My 2 cents.
You may want to use the MongoDB river; take a look at this post.
There are more issues than the size of the data you store or index. You might want to have MongoDB as a backup with "near real time" querying for inserted data, and as a queue for the data to be indexed (you may want to run MongoDB as a cluster with the write concern suited to your application).
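If it helps, connecting with an explicit write concern in pymongo looks something like the sketch below; the replica-set name and hosts are made up.

```python
# Hypothetical sketch: acknowledge writes only after a majority of the
# replica set has them, which is the kind of write concern tuning meant above.
from pymongo import MongoClient

client = MongoClient(
    "mongodb://host1:27017,host2:27017,host3:27017/?replicaSet=rs0",  # assumed hosts
    w="majority",     # wait for a majority of nodes
    journal=True,     # and for the journal on those nodes
)
```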