MongoDB river for Elasticsearch

Is there any official MongoDB river available for Elasticsearch? I am using MongoDB in Node.js through the mongoose module.
I have seen one at http://www.matt-reid.co.uk/blog_post.php?id=68
Is this the correct one? It says it's unofficial, though...
Edit:
It looks like https://github.com/aparo/elasticsearch has a built-in MongoDB plugin. Is there any documentation available about how to configure this with MongoDB, and how MongoDB pushes data to Elasticsearch for indexing?

There is a new MongoDB river on github:
https://github.com/richardwilly98/elasticsearch-river-mongodb

According to the code you can specify several things, but there is no separate documentation (except one mailing-list discussion):
https://github.com/aparo/elasticsearch/blob/master/plugins/river/mongodb/src/main/java/org/elasticsearch/river/mongodb/MongoDBRiver.java
https://github.com/aparo/elasticsearch/blob/master/plugins/river/mongodb/src/test/java/org/elasticsearch/river/mongodb/MongoDBRiverTest.java

This isn't really the answer you're looking for. I looked at building this mongo river, but I found some discussion about it having memory leaks, and I didn't want to fiddle with Java code, so I wrote my own mongo->ES importer using the bulk API.
It's a work in progress, so feel free to contribute! :)
https://github.com/orenmazor/elastic-search-loves-mongo
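To illustrate the idea behind such an importer (this is a hypothetical sketch, not code from that repository): the core of any bulk import is turning MongoDB documents into the newline-delimited body that the Elasticsearch bulk API expects, with one action line and one source line per document.

```javascript
// Build an Elasticsearch bulk-API body (NDJSON) from an array of
// MongoDB-style documents. Each document becomes an action line plus
// a source line; the body must end with a trailing newline.
function buildBulkBody(index, type, docs) {
  return docs
    .map(function (doc) {
      var action = { index: { _index: index, _type: type, _id: String(doc._id) } };
      // Copy the document without _id, which belongs in the action line.
      var source = {};
      for (var key in doc) {
        if (key !== '_id') source[key] = doc[key];
      }
      return JSON.stringify(action) + '\n' + JSON.stringify(source);
    })
    .join('\n') + '\n';
}

// Example: two person documents destined for a "mongoindex" index.
var body = buildBulkBody('mongoindex', 'person', [
  { _id: 1, firstName: 'John', lastName: 'Doe' },
  { _id: 2, firstName: 'Jane', lastName: 'Roe' },
]);
```

The resulting string would then be POSTed to `http://localhost:9200/_bulk`; batching documents this way is what makes a hand-rolled importer competitive with a river.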

Yes, there is a new MongoDB river on github:
https://github.com/richardwilly98/elasticsearch-river-mongodb
For further explanation you can follow these steps:
Step 1: Install the plugins
ES_HOME/bin/plugin -install elasticsearch/elasticsearch-mapper-attachments/1.4.0
ES_HOME/bin/plugin -install richardwilly98/elasticsearch-river-mongodb/1.4.0
Step 2: Restart Elasticsearch
ES_HOME/bin/service/elasticsearch restart
Step 3: Enable replica sets in MongoDB
Open mongod.conf and add the line:
replSet=rs0
Save and exit, then restart mongod.
Step 4: Tell Elasticsearch to index the "person" collection in the testmongo database by issuing the following command in your terminal:
curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{
  "type": "mongodb",
  "mongodb": {
    "db": "testmongo",
    "collection": "person"
  },
  "index": {
    "name": "mongoindex",
    "type": "person"
  }
}'
Step 5: Add some data to MongoDB through the mongo shell
use testmongo
var p = {firstName: "John", lastName: "Doe"}
db.person.save(p)
Step 6: Use this command to search the data:
curl -XGET 'http://localhost:9200/mongoindex/_search?q=firstName:John'
NOTE: If the index does not populate, delete the river and the index:
DELETE /_river
DELETE /mongoindex
then run this command again:
curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{
  "type": "mongodb",
  "mongodb": {
    "db": "testmongo",
    "collection": "person"
  },
  "index": {
    "name": "mongoindex",
    "type": "person"
  }
}'
Step 7: Check the HQ plugin
In mongoindex, you will see your data.
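The _meta document in step 4 just pairs a MongoDB source with a target index. As a sanity check against typos, a small helper (hypothetical, not part of the river plugin) can generate it:

```javascript
// Generate the river _meta document used in step 4. The field names
// (db, collection, name, type) mirror the curl payload above.
function riverMeta(db, collection, indexName, typeName) {
  return {
    type: 'mongodb',
    mongodb: { db: db, collection: collection },
    index: { name: indexName, type: typeName },
  };
}

var meta = riverMeta('testmongo', 'person', 'mongoindex', 'person');
// JSON.stringify(meta) is the body to PUT to
// http://localhost:9200/_river/mongodb/_meta
```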

Related

PGSync with AWS Elasticsearch: index not found

I am trying to sync my Postgres database with AWS Elasticsearch using PGSync.
I have defined a simple schema:
[
  {
    "database": "tenancyportal",
    "index": "properties",
    "nodes": [
      {
        "table": "properties",
        "schema": "public",
        "columns": ["id", "address"]
      }
    ]
  }
]
But when I try to bootstrap the database using
bootstrap --config schema.json
I get the following error:
elasticsearch.exceptions.NotFoundError: NotFoundError(404,
'index_not_found_exception', 'no such index [:9200]', :9200,
index_or_alias)
In the screenshot below, you can see that the GET URL for Elasticsearch is completely wrong; I am not able to understand what config is causing it to be formed like that.
It looks like your AWS Elasticsearch URL is not constructed properly. This was addressed in a recent update to PGSync. Can you pull the latest master branch and try this again?
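The shape of the error is telling: `no such index [:9200]` means the host portion of the URL was empty, so Elasticsearch interpreted the leftover `:9200` as an index name. A toy reproduction (not PGSync's actual code; the function name is made up):

```javascript
// Toy reproduction of the symptom: if the configured host is empty,
// naive string concatenation yields ":9200", which then surfaces in
// the NotFoundError as the "index" Elasticsearch failed to resolve.
function esUrl(host, port) {
  return host + ':' + port;
}

var broken = esUrl('', 9200); // ':9200' -- host missing from config/env
var fixed = esUrl('my-domain.us-east-1.es.amazonaws.com', 9200);
```

So before upgrading, it is also worth checking that the Elasticsearch host environment variable or config entry is actually set.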

Setting up Strapi using MongoDB via the PM2 Runtime

I'm quite new to Strapi and I'm following the Strapi deployment documentation at https://strapi.io/documentation/3.0.0-beta.x/guides/deployment.html#configuration. I have set up Strapi using MongoDB and it seems to work both in production and dev on my server. I can create content types and add data...
Now I'm trying to start Strapi using the PM2 Runtime. I have set up the ecosystem.config.js file (see below) and I run pm2 start ecosystem.config.js. The Strapi app seems to start just fine, but now in the browser I am prompted to create a new admin user. It seems like all users and data are lost... Is the Mongo DB not being accessed, or what's going on?
This is my ecosystem.config.js file:
module.exports = {
  apps: [{
    name: 'cms.strapi',
    cwd: '/var/www/domain/public_html',
    script: 'server.js',
    env: {
      NODE_ENV: 'production',
      DATABASE_HOST: '127.0.0.1',
      DATABASE_PORT: '28015',
      DATABASE_NAME: 'db-name',
      DATABASE_USERNAME: 'db-u-name',
      DATABASE_PASSWORD: 'pw',
    },
  }],
};
What am I missing?
Hi Jim, and thanks for your reply! I believe the problem was a mix-up between the prod and dev environments. Sorry, my bad; I thought I was in one environment when I was really in the other. It should be obvious when you start the server from the prompt whether you're starting dev or prod, but once the web server is up and running you can't tell from the GUI which one it is, other than that the admin usernames (and possibly the data) are different... Hmm..
Anyway my production/database.json file looks like this:
{
  "defaultConnection": "default",
  "connections": {
    "default": {
      "connector": "mongoose",
      "settings": {
        "uri": "mongodb://localhost:27017/db-prod",
        "database": "db-prod",
        "host": "127.0.0.1",
        "srv": false,
        "port": 27017,
        "username": "u-name-prd",
        "password": "pw"
      },
      "options": {
        "ssl": false
      }
    }
  }
}
PM2 Runtime seems to be working correctly with Strapi and Mongo now :-)
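For reference, the individual settings in production/database.json collapse into the same address as the uri field. A quick sketch (hypothetical helper, assuming the credentials contain no characters that need URL-encoding):

```javascript
// Build a MongoDB connection URI from the individual settings in
// production/database.json. Credentials are assumed to contain no
// characters that require URL-encoding.
function mongoUri(s) {
  var auth = s.username ? s.username + ':' + s.password + '@' : '';
  return 'mongodb://' + auth + s.host + ':' + s.port + '/' + s.database;
}

var uri = mongoUri({
  host: '127.0.0.1',
  port: 27017,
  database: 'db-prod',
  username: 'u-name-prd',
  password: 'pw',
});
// 'mongodb://u-name-prd:pw@127.0.0.1:27017/db-prod'
```

Comparing the URI each environment actually resolves to is a quick way to spot a prod/dev mix-up like the one above.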

AWS Data Pipelines with a Heroku Database

I'm wondering about the feasibility of connecting an AWS Data Pipeline to a Heroku database. The Heroku databases are stored on EC2 instances (east region) and require SSL.
I've tried to open a connection using a JdbcDatabase object, but have run into issues at every turn.
I've tried the following:
{
  "id": "heroku_database",
  "name": "heroku_database",
  "type": "JdbcDatabase",
  "jdbcDriverClass": "org.postgresql.Driver",
  "connectionString": "jdbc:postgresql://#{myHerokuDatabaseHost}:#{myHerokuDatabasePort}/#{myHerokuDatabaseName}",
  "jdbcProperties": "ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory",
  "username": "#{myHerokuDatabaseUserName}",
  "*password": "#{*myHerokuDatabasePassword}"
},
with the result of:
unable to find valid certification path to requested target
ActivityFailed:SunCertPathBuilderException
as well as:
{
  "id": "heroku_database",
  "name": "heroku_database",
  "type": "JdbcDatabase",
  "jdbcDriverClass": "org.postgresql.Driver",
  "connectionString": "jdbc:postgresql://#{myHerokuDatabaseHost}:#{myHerokuDatabasePort}/#{myHerokuDatabaseName}",
  "jdbcProperties": "sslmode=require",
  "username": "#{myHerokuDatabaseUserName}",
  "*password": "#{*myHerokuDatabasePassword}"
},
with the result of:
amazonaws.datapipeline.database.ConnectionFactory: Unable to establish connection to jdbc:postgresql://ec2-54-235-something-something.compute-1.amazonaws.com:5442/redacted FATAL: no pg_hba.conf entry for host "52.13.105.196", user "redacted", database "redacted", SSL off
To boot, I have also tried to use a ShellCommandActivity to copy the Postgres table from the EC2 instance and write it to stdout into my S3 bucket; however, the EC2 instance doesn't understand the psql command:
{
  "id": "herokuDatabaseDump",
  "name": "herokuDatabaseDump",
  "type": "ShellCommandActivity",
  "runsOn": {
    "ref": "Ec2Instance"
  },
  "stage": "true",
  "stdout": "#{myOutputS3Loc}/#{myOutputFileName}",
  "command": "PGPASSWORD=#{*myHerokuDatabasePassword} psql -h #{myHerokuDatabaseHost} -U #{myHerokuDatabaseUserName} -d #{myHerokuDatabaseName} -p #{myHerokuDatabasePort} -t -A -F',' -c 'select * from #{myHerokuDatabaseTableName}'"
},
and I also cannot yum install postgres beforehand.
It sucks to have both RDS and Heroku as our database sources. Any ideas on how to get a select query to run against a heroku postgres db via a data pipeline would be a great help. Thanks.
It looks like Heroku needs/wants the Postgres 42.2.1 JDBC driver: https://devcenter.heroku.com/articles/heroku-postgresql#connecting-in-java. Or at least, if you are compiling a Java app, that's what they tell you to use.
I wasn't able to find out which driver Data Pipeline uses by default, but it allows you to specify custom driver jars via the jdbcDriverJarUri field: https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-jdbcdatabase.html
An important note here is that it requires Java 7, so you are going to want to use postgresql-42.2.1.jre7.jar: https://jdbc.postgresql.org/download.html
That combined with a jdbcProperties field of sslmode=require should allow it to go through and create the dump file you are looking for.
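Putting the pieces together, the connection string the pipeline ends up using is just the JdbcDatabase fields joined in the standard JDBC shape. A sketch of that composition (illustrative only, not Data Pipeline's internals):

```javascript
// Sketch of how the JdbcDatabase fields combine into a full JDBC URL.
// Properties are appended after '?', matching the '&'-separated
// jdbcProperties syntax used in the pipeline definitions above.
function jdbcUrl(host, port, dbName, props) {
  var base = 'jdbc:postgresql://' + host + ':' + port + '/' + dbName;
  return props ? base + '?' + props : base;
}

var url = jdbcUrl('ec2-host.compute-1.amazonaws.com', 5432, 'mydb', 'sslmode=require');
// 'jdbc:postgresql://ec2-host.compute-1.amazonaws.com:5432/mydb?sslmode=require'
```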

Elasticsearch and MongoDB: no river _meta document found after 5 attempts

I have a MongoDB database named news that I tried to index with ES.
Using these plugins:
richardwilly98.elasticsearch/elasticsearch-river-mongodb/2.0.9
and elasticsearch/elasticsearch-mapper-attachments/2.5.0
This is what happens when I try to create the index. I have tried deleting the index and recreating it, but that didn't help.
$ curl -XPUT 'http://localhost:9200/_river/news/_meta' -d @init.json
init.json:
{
  "type": "mongodb",
  "mongodb": {
    "db": "news",
    "collection": "entries"
  },
  "index": {
    "name": "news",
    "type": "entries"
  }
}
Here is a log
update_mapping [mongodb] (dynamic)
MongoDB River Plugin - version[2.0.9] - hash[73ddea5] - time[2015-04-06T21:16:46Z]
setRiverStatus called with mongodb - RUNNING
river mongodb startup pending
Starting river mongodb
MongoDB options: secondaryreadpreference [false], drop_collection [false],
include_collection [], throttlesize [5000], gridfs [false], filter [null],
db [news], collection [entries], script [null], indexing to [news]/[entries]
MongoDB version - 3.0.2
update_mapping [mongodb] (dynamic)
[org.elasticsearch.river.mongodb.CollectionSlurper] Cannot import collection entries into existing index
setRiverStatus called with mongodb - INITIAL_IMPORT_FAILED
Started river mongodb
no river _meta document found after 5 attempts
no river _meta document found after 5 attempts
Any suggestions as to what might be wrong?
I'm running ES 1.5.2 and MongoDB 3.0.2 on OS X.
On the mongodb river github pages, it looks like the plugin is supported up to ES 1.4.2, but not higher (i.e. you're running 1.5.2).
Also note that rivers have been deprecated in ES v1.5 and there's an open issue in the mongodb river project on this very topic.
UPDATE after chatting with @martins
Finally, the issue was simply that the name of the created river was wrong (news instead of mongodb). The following command properly creates the mongodb river, which still works with ES 1.5.2 even though it's not officially tested:
curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d @init.json

Elasticsearch throws exception working with MongoDB river

I followed the link http://elasticsearch-users.115913.n3.nabble.com/ElasticSearch-and-Mongo-DB-td4033358.html to integrate Elasticsearch and MongoDB using the MongoDB river. The versions of each component are:
ubuntu 12.04 64bit
ES 0.90.0
mongodb 2.4.3
river 1.6.5
MongoDB is standalone, running on one server, but following this link http://loosexaml.wordpress.com/2012/09/03/how-to-get-a-mongodb-oplog-without-a-full-replica-set/, the oplog is opened as a replSet and oplogSize is configured in /etc/mongodb.conf; db.oplog.rs.find() also displays some operation records.
The index was added by:
curl -XPUT localhost:9200/_river/appdata/_meta -d '
{
  "type": "mongodb",
  "mongodb": {
    "db": "test_appdata",
    "collection": "app_collection"
  },
  "index": {
    "name": "test_appdata",
    "type": "app"
  }
}'
But when Elasticsearch started, the log showed the following exception:
[2013-05-07 23:20:40,400][INFO ][river.mongodb ] [Ransak the Reject] [mongodb][app] starting mongodb stream. options: secondaryreadpreference [false], throttlesize [500], gridfs [false], filter [], db [test_appdata], script [null], indexing to [test_appdata]/[app]
Exception in thread "elasticsearch[Sundragon][mongodb_river_slurper][T#1]" java.lang.NoSuchMethodError: org.elasticsearch.action.get.GetResponse.exists()Z
at org.elasticsearch.river.mongodb.MongoDBRiver.getLastTimestamp(MongoDBRiver.java:1088)
at org.elasticsearch.river.mongodb.MongoDBRiver.access$2200(MongoDBRiver.java:93)
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.getIndexFilter(MongoDBRiver.java:967)
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.oplogCursor(MongoDBRiver.java:1021)
at org.elasticsearch.river.mongodb.MongoDBRiver$Slurper.run(MongoDBRiver.java:858)
at java.lang.Thread.run(Thread.java:679)
I'm a newbie to Elasticsearch and MongoDB; did the replica setting of MongoDB cause the error?
Any suggestion is appreciated.
Your river is not compatible with Elasticsearch 0.90.
Move to ES 0.20.6 or ask for a patch in the MongoDB river project.