ElasticSearch with river to MongoDB

I have installed and configured MongoDB and ES with the MongoDB river, but I'm not sure I really understand rivers in ES. For example, I want to index the collection "users" from MongoDB.
I will send a curl PUT/POST request to the URL /_river/mongodb_users/_meta:
{
    "type": "mongodb",
    "mongodb": {
        "db": "somedb",
        "collection": "users"
    },
    "index": {
        "name": "users",
        "type": "user"
    }
}
But now I want to index a second collection, for example "users2". Do I really need to create a new river using a curl POST/PUT on a URL like /_river/mongodb_users2/_meta with this JSON:
{
    "type": "mongodb",
    "mongodb": {
        "db": "somedb",
        "collection": "users2"
    },
    "index": {
        "name": "users2",
        "type": "user"
    }
}
Can I not use the already created river "mongodb_users"? Do I need to create one river per collection?
Thank you for the explanation!

Yes. The way the MongoDB river works does not allow fetching content from more than one collection in a single river.
But you can create as many rivers as you need.
That said, if you want to index users to the users type in Elasticsearch and users2 to the same users type, you can, as long as they don't use the same IDs.
Just point both rivers' index.name and index.type at the same index and type.
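For example, a minimal sketch of the second river feeding users2 into the same users index and user type (all names reused from the question):
curl -XPUT 'http://localhost:9200/_river/mongodb_users2/_meta' -d '{
    "type": "mongodb",
    "mongodb": {
        "db": "somedb",
        "collection": "users2"
    },
    "index": {
        "name": "users",
        "type": "user"
    }
}'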
Does it help?

Related

How to create mongodb 2dsphere index on strapi?

My Strapi project uses a Mongo database.
Strapi version: 3.0.0-beta.19.5
I have tried:
creating the 2dsphere index manually with a command in the mongo console, but when the application starts running, the index gets deleted. I think the database is synchronized with the Strapi model configuration.
I checked the Strapi documentation and saw there is an option to create an index by adding a configuration to model.settings.json, but there is only a single-field index option.
Is there any way to create a 2dsphere index?
I just found a solution. I had to look at the index section of the mongoose documentation.
The Strapi documentation only says that the value of 'index' is a boolean, which differs from the mongoose docs. Actually, the model.settings.json structure follows the mongoose documentation.
So, to create the 2dsphere index, we just need to specify "2dsphere" as the value of the "index" key on that field.
E.g.:
{
    "kind": "collectionType",
    "connection": "default",
    "collectionName": "phone_stores",
    "info": {
        "name": "phoneStore"
    },
    "options": {
        "increments": true,
        "timestamps": true
    },
    "attributes": {
        "car": {
            "type": "integer",
            "required": true
        },
        "userStoreId": {
            "type": "objectId"
        },
        "location": {
            "type": "json",
            "index": "2dsphere" // <------ <1>
        }
    }
}
<1> If true were specified, a single-field index would be created on this field. But you can also specify another type of index; in my case I use '2dsphere'.
UPDATE
What I said about
you can also specify another type of index
is not entirely correct. The index types are limited by the framework. So far I have tested 2dsphere, which works. I also tested a text index, but it did not work.
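To check which indexes actually survive a Strapi restart, the mongo shell can list them; a quick sketch assuming the phone_stores collection from the example above:
db.phone_stores.getIndexes()
// the output should include an entry whose key is { "location": "2dsphere" }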

Getting an error while loading data with DMS from mongodb to elasticsearch, any ideas?

I am trying to use AWS DMS to transfer data from MongoDB to Amazon Elasticsearch.
I am encountering the following log entry in CloudWatch:
{
    "error": {
        "root_cause": [
            {
                "type": "mapper_parsing_exception",
                "reason": "Field [_id] is a metadata field and cannot be added inside a document. Use the index API request parameters."
            }
        ],
        "type": "mapper_parsing_exception",
        "reason": "Field [_id] is a metadata field and cannot be added inside a document. Use the index API request parameters."
    },
    "status": 400
}
This is my configuration for the MongoDB source.
It has the "_id as a separate column" checkbox enabled.
I tried disabling it, but then it says that there is no primary key.
Is there anything you know of that can fix it?
Quick note: I have added a mapping of the _id field to old_id, and now it doesn't import the other fields, even when I add them to the mapping.
As ElasticSearch does not support the LOB data type, the other fields are not migrated.
Add an additional transformation rule to change the data type to string:
{
    "rule-type": "transformation",
    "rule-id": "3",
    "rule-name": "3",
    "rule-action": "change-data-type",
    "rule-target": "column",
    "object-locator": {
        "schema-name": "test",
        "table-name": "%",
        "column-name": "%"
    },
    "data-type": {
        "type": "string",
        "length": "30"
    }
}
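For context, a sketch of how this rule sits in the task's table mappings next to a selection rule (the test schema name comes from the rule above; the rule IDs are arbitrary):
{
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "1",
            "object-locator": {
                "schema-name": "test",
                "table-name": "%"
            },
            "rule-action": "include"
        },
        {
            "rule-type": "transformation",
            "rule-id": "3",
            "rule-name": "3",
            "rule-action": "change-data-type",
            "rule-target": "column",
            "object-locator": {
                "schema-name": "test",
                "table-name": "%",
                "column-name": "%"
            },
            "data-type": {
                "type": "string",
                "length": "30"
            }
        }
    ]
}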

MongoDB - how to properly model relations

Let's assume we have the following collections:
Users
{
    "id": MongoId,
    "username": "jsloth",
    "first_name": "John",
    "last_name": "Sloth",
    "display_name": "John Sloth"
}
Places
{
    "id": MongoId,
    "name": "Conference Room",
    "description": "Some longer description of this place"
}
Meetings
{
    "id": MongoId,
    "name": "Very important meeting",
    "place": <?>,
    "timestamp": "1506493396",
    "created_by": <?>
}
Later on, we want to return (e.g. from a REST web service) a list of upcoming events like this:
[
    {
        "id": MongoId(Meetings),
        "name": "Very important meeting",
        "created_by": {
            "id": MongoId(Users),
            "display_name": "John Sloth"
        },
        "place": {
            "id": MongoId(Places),
            "name": "Conference Room"
        }
    },
    ...
]
It's important to return the basic information that needs to be displayed on the main page of the web UI (so no additional calls are needed to render the table). That's why each entry contains the display_name of the user who created it and the name of the place. I think that's a pretty common scenario.
Now my question is: how should I store this information in the DB (the question-mark values in the Meeting document)? I see 2 options:
1) Store references to other collections:
place: MongoId(Places)
(+) data is always consistent
(-) additional DB calls have to be made to construct the response
2) Denormalize the data:
"place": {
    "id": MongoId(Places),
    "name": "Conference room"
}
(+) no need for additional calls (the response can be constructed from one document)
(-) the data must be updated each time the related documents are modified
What is the proper way of dealing with such a scenario?
If I use option 1), how should I query the other documents? Asking for each related document separately seems like overkill. How about getting the last 20 meetings, aggregating the list of related documents, and then performing a query like db.users.find({_id: { $in: <id list> }})?
If I go for option 2), how should I keep the data in sync?
Thanks in advance for any advice!
You can keep the DB model you already have and still do only a single query, as MongoDB introduced the $lookup aggregation stage in version 3.2. It is similar to a join in an RDBMS.
$lookup
Performs a left outer join to an unsharded collection in the same database to filter in documents from the “joined” collection for processing. The $lookup stage does an equality match between a field from the input documents with a field from the documents of the “joined” collection.
So instead of storing a reference to other collections, just store the document ID.
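A sketch of what that could look like for the meetings example, assuming option 1) storage where created_by and place hold the referenced _id values, and collection names users, places and meetings:
db.meetings.aggregate([
    // join the referenced user document
    { $lookup: { from: "users", localField: "created_by", foreignField: "_id", as: "created_by" } },
    { $unwind: "$created_by" },
    // join the referenced place document
    { $lookup: { from: "places", localField: "place", foreignField: "_id", as: "place" } },
    { $unwind: "$place" },
    // keep only the fields the web UI needs
    { $project: {
        name: 1,
        "created_by._id": 1, "created_by.display_name": 1,
        "place._id": 1, "place.name": 1
    } }
])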

River Plugin Not_analyzed option for Elasticsearch

I'm using Elasticsearch 1.1.1, the River Plugin and MongoDB 2.4.
I have a field called cidr that is being analyzed. I need to set it so that it is not_analyzed, in order to use it with Kibana correctly. The following is the river definition I used, but now I'm going to reindex (delete it and write a new one).
What's the proper way to write a new index so that the values in the "cidr" field are not analyzed? Thank you.
curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{
    "type": "mongodb",
    "mongodb": {
        "db": "collective_name",
        "collection": "ips"
    },
    "index": {
        "name": "mongoindex"
    }
}'
I see. It's working now. The mapping should be created BEFORE the river creates and populates the index:
curl -XPUT "localhost:9200/mongoindex" -d '
{
"mappings": {
"mongodb" : {
"properties": {
"cidr": {"type":"string", "index" : "not_analyzed"}
}
}
}
}'
This is it. :)
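Before recreating the river, the mapping can be read back to confirm it took effect (index name mongoindex as above):
curl -XGET "localhost:9200/mongoindex/_mapping"
# the "cidr" property should now report "index": "not_analyzed"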

Not all documents are indexed with ElasticSearch and MongoDB

I am using ElasticSearch 1.1.0 (I was running 1.2.0 but had issues with an ElasticSearch plugin) and MongoDB 2.6.1. I've installed them following a tutorial. When I create the river using
curl -XPUT "localhost:9200/_river/tenders/_meta" -d '{
"type": "mongodb",
"mongodb": {
"servers": [
{ "host": "127.0.0.1", "port": 27017 }
],
"options": { "secondary_read_preference": true },
"db": "tenderdb",
"collection": "tenders"
},
"index": {
"name": "tendersidx",
"type": "page"
}
}'
indexing of the collection starts fine, but only part of it gets indexed: e.g. the collection currently has 5184 records, while only 1060 are indexed.
Avish's comment did the trick. He wrote: "ElasticSearch rivers only monitor changes in the other data store; your river should only track documents added to the collection after the river has been set up."
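In other words, documents inserted before the river existed are never picked up on their own. One common workaround (a sketch, assuming the tenderdb/tenders names from the question) is to rewrite each existing document so it shows up in the oplog, where the river can see it:
use tenderdb
db.tenders.find().forEach(function (doc) {
    db.tenders.save(doc); // re-saving the document creates an oplog entry the river can process
});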