AWS Data Pipelines with a Heroku Database - postgresql

I'm wondering about the feasibility of connecting an AWS Data Pipeline to a Heroku database. The Heroku databases are hosted on EC2 instances (US East region) and require SSL.
I've tried to open a connection using a JdbcDatabase object, but have run into issues at every turn.
I've tried the following:
{
"id" : "heroku_database",
"name" : "heroku_database",
"type" : "JdbcDatabase",
"jdbcDriverClass" : "org.postgresql.Driver",
"connectionString" : "jdbc:postgresql://#{myHerokuDatabaseHost}:#{myHerokuDatabasePort}/#{myHerokuDatabaseName}",
"jdbcProperties": "ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory",
"username" : "#{myHerokuDatabaseUserName}",
"*password" : "#{*myHerokuDatabasePassword}"
},
with the result of:
unable to find valid certification path to requested target
ActivityFailed:SunCertPathBuilderException
as well as:
{
"id" : "heroku_database",
"name" : "heroku_database",
"type" : "JdbcDatabase",
"jdbcDriverClass" : "org.postgresql.Driver",
"connectionString" : "jdbc:postgresql://#{myHerokuDatabaseHost}:#{myHerokuDatabasePort}/#{myHerokuDatabaseName}",
"jdbcProperties": "sslmode=require",
"username" : "#{myHerokuDatabaseUserName}",
"*password" : "#{*myHerokuDatabasePassword}"
},
with the result of:
amazonaws.datapipeline.database.ConnectionFactory: Unable to establish connection to jdbc:postgresql://ec2-54-235-something-something.compute-1.amazonaws.com:5442/redacted FATAL: no pg_hba.conf entry for host "52.13.105.196", user "redacted", database "redacted", SSL off
To boot -- I have also tried to use a ShellCommandActivity to dump the Postgres table and stage stdout to my S3 bucket -- however the EC2 instance doesn't recognize the psql command:
{
"id": "herokuDatabaseDump",
"name": "herokuDatabaseDump",
"type": "ShellCommandActivity",
"runsOn": {
"ref": "Ec2Instance"
},
"stage": "true",
"stdout": "#{myOutputS3Loc}/#{myOutputFileName}",
"command": "PGPASSWORD=#{*myHerokuDatabasePassword} psql -h #{myHerokuDatabaseHost} -U #{myHerokuDatabaseUserName} -d #{myHerokuDatabaseName} -p #{myHerokuDatabasePort} -t -A -F',' -c 'select * #{myHerokuDatabaseTableName}'"
},
and I also cannot yum install postgres beforehand.
It sucks to have both RDS and Heroku as our database sources. Any ideas on how to get a SELECT query to run against a Heroku Postgres DB via a Data Pipeline would be a great help. Thanks.

It looks like Heroku needs/wants the Postgres 42.2.1 driver: https://devcenter.heroku.com/articles/heroku-postgresql#connecting-in-java. Or at least, if you are compiling a Java app, that's what they tell you to use.
I wasn't able to find out which driver Data Pipeline uses by default, but it lets you specify custom driver jars via the jdbcDriverJarUri field: https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-jdbcdatabase.html
An important note here is that Data Pipeline requires Java 7, so you are going to want to use postgresql-42.2.1.jre7.jar: https://jdbc.postgresql.org/download.html
That combined with a jdbcProperties field of sslmode=require should allow it to go through and create the dump file you are looking for.
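If it helps, here is a rough sketch of how those pieces might fit together in the JdbcDatabase object (the S3 path to the driver jar is a placeholder for wherever you upload postgresql-42.2.1.jre7.jar; the parameter names just mirror the ones in the question):
{
  "id" : "heroku_database",
  "name" : "heroku_database",
  "type" : "JdbcDatabase",
  "jdbcDriverClass" : "org.postgresql.Driver",
  "jdbcDriverJarUri" : "s3://YOUR_BUCKET/postgresql-42.2.1.jre7.jar",
  "connectionString" : "jdbc:postgresql://#{myHerokuDatabaseHost}:#{myHerokuDatabasePort}/#{myHerokuDatabaseName}",
  "jdbcProperties" : "sslmode=require",
  "username" : "#{myHerokuDatabaseUserName}",
  "*password" : "#{*myHerokuDatabasePassword}"
}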

Related

UserNotFound: Could not find user admin@automation

I am trying to connect to a MongoDB instance as follows and am running into the error below. How do I debug this? Please provide guidance on how to fix this error.
mongo automation --host machine40.scv.company.com -u admin -p passd
2018-05-15T21:11:30.346-0700 I ACCESS [conn3] SCRAM-SHA-1 authentication failed for admin on automation from client 17.xxx.xxx.x:54756 ; UserNotFound: Could not find user admin@automation
Please refer to the steps below.
1) Log into the mongo console as the super admin user.
mongo admin --host machine40.scv.company.com -u admin -p passd
2) Then check whether there is a user called admin who can authenticate to access the "automation" DB.
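One quick way to check (a sketch; this assumes the user was created in the automation database itself, otherwise run the same command against the database where the user was defined, such as admin):
use automation
db.getUser("admin")  // returns null if no such user is defined in this database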
Sample output is given below:
{
"_id" : "XXXXXX",
"user" : "admin",
"db" : "automation",
"roles" : [
{
"role" : "root",
"db" : "automation"
}
]
}

How to clone/copy a database in Azure Cosmos DB

How do I clone a database in Azure Cosmos DB to a mongod running on localhost:27001?
I've tried the following, but I can't get it working:
db.cloneDatabase("mycosmosdb:mypassword#username.documents.azure.com:10255/MyDatabase?ssl=true&replicaSet=globaldb")
This returns the following when running the command from my local machine where mongod is running:
{
"clonedColls" : [ ],
"ok" : 0,
"errmsg" : "connect failed to replica set mycosmosdb:mypassword#username.documents.azure.com:10255/Mydatabase?ssl=true&replicaSet=globaldb:27017",
"code" : 6,
"codeName" : "HostUnreachable"
}
Trying this variant also fails:
db.copyDatabase("NameOfAzureDB", "NameOfLocalDB", "username.documents.azure.com:10255", "username", "password")
{
"ok" : 0,
"errmsg" : "couldn't connect to server username.documents.azure.com:27017, connection attempt failed"
}
cloneDatabase is not supported on Cosmos DB because the clone command it wraps is not flexible enough to allow an SSL connection or the name of a replica set. It only accepts a hostname and port number and nothing else, so the connection string used in the question is not supported (and there is no such moniker as "mycosmosdb" either).

MongoDB shell authentication error

I created a user and password for a database called student:
db.createUser({user:'Catalin',pwd:'Catalin',roles:[{role:'userAdmin',db:'student'}]});
So I restarted MongoDB Server with this command:
mongod --auth --dbpath C:\data\db
In another terminal, I connected to the server with:
mongo
then queried the server with:
> db.getUsers()
[
{
"_id" : "student.Catalin",
"user" : "Catalin",
"db" : "student",
"roles" : [
{
"role" : "userAdmin",
"db" : "student"
}
]
}
]
went to the student database:
> use student
switched to db student
Entered my username and password successfully via this command:
> db.auth('Catalin','Catalin');
1
and when I want to view my collections I get an error. Why?
> show collections
2016-03-07T15:54:41.166+0300 E QUERY [thread1] Error: listCollections failed:
{
"ok" : 0,
**"errmsg" : "not authorized on student to execute command { listCollectio
ns: 1.0, filter: {} }",**
"code" : 13
} :
_getErrorWithCode@src/mongo/shell/utils.js:23:13
DB.prototype._getCollectionInfosCommand@src/mongo/shell/db.js:746:1
DB.prototype.getCollectionInfos@src/mongo/shell/db.js:758:15
DB.prototype.getCollectionNames@src/mongo/shell/db.js:769:12
shellHelper.show@src/mongo/shell/utils.js:695:9
shellHelper@src/mongo/shell/utils.js:594:15
@(shellhelp2):1:1
"errmsg" : "not authorized on student to execute command { listCollectio
ns: 1.0, filter: {} }",
P.S.: I'm using MongoDB 3.2.
The userAdmin built-in role only provides the ability to create and modify roles and users on a database. If you need access to the database, you would need to assign either database roles or other roles that have database access, such as dbOwner.
Please see Built-in Roles for more detailed information.
You may also find these useful:
Manage users and roles.
Enable client access control.
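For example, a minimal sketch for the user in this question (run while authenticated as a user administrator on student; use readWrite instead of read if the user also needs to insert or update documents):
use student
db.grantRolesToUser("Catalin", [ { role: "read", db: "student" } ])  // grants read access to the student database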
If you are using MongoDB version 5 or earlier, pay attention to this point: you should not write anything in the mongod.conf file at the /etc/mongod.conf path. Use the following two commands to update the dependencies.
Install dependencies:
sudo apt install libgconf-2-4
sudo add-apt-repository universe

MongoDB not authorized for query - code 13

In MongoDB 2.6.1 I've set up a user with dbAdmin rights:
{
"_id" : "mydbname.myusername",
"user" : "myusername",
"db" : "mydbname",
"credentials" : {
"MONGODB-CR" : "<some credentials>"
},
"roles" : [
{
"role" : "dbAdmin",
"db" : "mydbname"
}
]
}
When I use the mongo shell to connect to the database (using -u and -p on command line) and run a query like this:
db.mycollectionname.find()
I get this error:
error: { "$err" : "not authorized for query on mydbname.request", "code" : 13 }
Any ideas what can be happening?
So far I've tried adding every role I can find to the user but that hasn't helped.
You need to assign the read role to the user in question.
The dbAdmin role does not include read access on non-system collections.
For anybody running into this problem against Mongo 3.0.6, I fixed it by adding
?authMode=scram-sha1
at the end of my mongodb uri.
Here are some docs explaining the purpose of scram-sha1
In addition to @Sebastian's answer, that means explicitly:
Grant a Role
Grant a role using the db.grantRolesToUser() method. For example, the
following operation grants the reportsUser user the read role on the
accounts database:
db.grantRolesToUser(
"your_user",
[
{ role: "read", db: "your_db" }
]
)
I’m posting this because I had trouble finding this solution online. The problem is embarrassing but the error message and scenario make it somewhat difficult to figure out so I'm hoping to save someone some pain.
My application was able to establish a database connection, start and serve static pages but every time it tried to execute a query I got this error.
MongoError: not authorized on mydb to execute command { count: "urls", query: {} }
This error was caused by a userid and password with the wrong separator:
mongodb://myuserid/mypassword@ds112345.mlab.com:12345/mydb [wrong]
mongodb://myuserid:mypassword@ds112345.mlab.com:12345/mydb [right]
While the node application was able to successfully connect to MongoDB, the incorrectly formatted URI caused the driver to skip authentication prior to issuing database commands.
Thanks and a hat tip to the people at mLab support.
In my case it helped to add ?authSource=admin to the URI.
So the URI looked like this:
spring.data.mongodb.uri=mongodb://user:password@host:port/database?authSource=admin
I was getting an error like:
err { MongoError: not authorized on XXXX to execute command { $eval: function(){ return db.tbl_user.find( {
For me the executeFunctions role worked. Once you grant the role, your user should look like the following...
{
"_id" : "XXXXX",
"user" : "USERXXX",
"db" : "DBXXXX",
"roles" : [
{
"role" : "executeFunctions",
"db" : "admin"
},
{
"role" : "dbOwner",
"db" : "fryendsapp"
}
]
}
I had the same problem; after a lot of trial and error, I solved it by adding the "read" role to my user, but indicating the database as well:
db.grantRolesToUser("myUserName",
[{ role: "read", db: "myDataBaseName"}] );
I was having the same problem until I realized I made a mistake. I was inheriting a mongo client pointing to a different database from a parent Perl class without realizing it. Maybe something you can check.

How to use Elasticsearch with MongoDB?

I have gone through many blogs and sites about configuring Elasticsearch for MongoDB to index collections in MongoDB, but none of them were straightforward.
Please explain to me a step-by-step process for installing Elasticsearch, which should include:
configuration
run in the browser
I am using Node.js with express.js, so please help accordingly.
This answer should be enough to get you set up to follow this tutorial on Building a functional search component with MongoDB, Elasticsearch, and AngularJS.
If you're looking to use faceted search with data from an API then Matthiasn's BirdWatch Repo is something you might want to look at.
So here's how you can set up a single-node Elasticsearch "cluster" to index MongoDB for use in a NodeJS/Express app on a fresh EC2 Ubuntu 14.04 instance.
Make sure everything is up to date.
sudo apt-get update
Install NodeJS.
sudo apt-get install nodejs
sudo apt-get install npm
Install MongoDB - These steps are straight from MongoDB docs.
Choose whatever version you're comfortable with. I'm sticking with v2.4.9 because it seems to be the most recent version MongoDB-River supports without issues.
Import the MongoDB public GPG Key.
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 7F0CEB10
Update your sources list.
echo 'deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen' | sudo tee /etc/apt/sources.list.d/mongodb.list
Get the 10gen package.
sudo apt-get install mongodb-10gen
Then pick your version if you don't want the most recent. If you are setting your environment up on a Windows 7 or 8 machine, stay away from v2.6 until they work some bugs out with running it as a service.
apt-get install mongodb-10gen=2.4.9
Prevent the version of your MongoDB installation being bumped up when you update.
echo "mongodb-10gen hold" | sudo dpkg --set-selections
Start the MongoDB service.
sudo service mongodb start
Your database files default to /var/lib/mongo and your log files to /var/log/mongo.
Create a database through the mongo shell and push some dummy data into it.
mongo YOUR_DATABASE_NAME
db.createCollection("YOUR_COLLECTION_NAME")
for (var i = 1; i <= 25; i++) db.YOUR_COLLECTION_NAME.insert( { x : i } )
Now convert the standalone MongoDB into a replica set.
First, shut down the process.
mongo YOUR_DATABASE_NAME
use admin
db.shutdownServer()
Now we're running MongoDB as a service, so we don't pass in the "--replSet rs0" option in the command line argument when we restart the mongod process. Instead, we put it in the mongod.conf file.
vi /etc/mongod.conf
Add these lines, subbing for your db and log paths.
replSet=rs0
dbpath=YOUR_PATH_TO_DATA/DB
logpath=YOUR_PATH_TO_LOG/MONGO.LOG
Now open up the mongo shell again to initialize the replica set.
mongo DATABASE_NAME
config = { "_id" : "rs0", "members" : [ { "_id" : 0, "host" : "127.0.0.1:27017" } ] }
rs.initiate(config)
rs.slaveOk() // allows read operations to run on secondary members.
Now install Elasticsearch. I'm just following this helpful Gist.
Make sure Java is installed.
sudo apt-get install openjdk-7-jre-headless -y
Stick with v1.1.x for now until the Mongo-River plugin bug gets fixed in v1.2.1.
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.1.1.deb
sudo dpkg -i elasticsearch-1.1.1.deb
curl -L http://github.com/elasticsearch/elasticsearch-servicewrapper/tarball/master | tar -xz
sudo mv *servicewrapper*/service /usr/local/share/elasticsearch/bin/
sudo rm -Rf *servicewrapper*
sudo /usr/local/share/elasticsearch/bin/service/elasticsearch install
sudo ln -s `readlink -f /usr/local/share/elasticsearch/bin/service/elasticsearch` /usr/local/bin/rcelasticsearch
Make sure /etc/elasticsearch/elasticsearch.yml has the following config options enabled if you're only developing on a single node for now:
cluster.name: "MY_CLUSTER_NAME"
node.local: true
Start the Elasticsearch service.
sudo service elasticsearch start
Verify it's working.
curl http://localhost:9200
If you see something like this then you're good.
{
"status" : 200,
"name" : "Chi Demon",
"version" : {
"number" : "1.1.2",
"build_hash" : "e511f7b28b77c4d99175905fac65bffbf4c80cf7",
"build_timestamp" : "2014-05-22T12:27:39Z",
"build_snapshot" : false,
"lucene_version" : "4.7"
},
"tagline" : "You Know, for Search"
}
Now install the Elasticsearch plugins so it can play with MongoDB.
bin/plugin --install com.github.richardwilly98.elasticsearch/elasticsearch-river-mongodb/1.6.0
bin/plugin --install elasticsearch/elasticsearch-mapper-attachments/1.6.0
These two plugins aren't necessary but they're good for testing queries and visualizing changes to your indexes.
bin/plugin --install mobz/elasticsearch-head
bin/plugin --install lukas-vlcek/bigdesk
Restart Elasticsearch.
sudo service elasticsearch restart
Finally index a collection from MongoDB.
curl -XPUT localhost:9200/_river/DATABASE_NAME/_meta -d '{
"type": "mongodb",
"mongodb": {
"servers": [
{ "host": "127.0.0.1", "port": 27017 }
],
"db": "DATABASE_NAME",
"collection": "ACTUAL_COLLECTION_NAME",
"options": { "secondary_read_preference": true },
"gridfs": false
},
"index": {
"name": "ARBITRARY INDEX NAME",
"type": "ARBITRARY TYPE NAME"
}
}'
Check that your index is in Elasticsearch
curl -XGET http://localhost:9200/_aliases
Check your cluster health.
curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
It's probably yellow with some unassigned shards. We have to tell Elasticsearch what we want to work with.
curl -XPUT 'localhost:9200/_settings' -d '{ "index" : { "number_of_replicas" : 0 } }'
Check cluster health again. It should be green now.
curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
Go play.
Using river can present issues when your operation scales up. River will use a ton of memory when under heavy operation. I recommend implementing your own Elasticsearch models, or, if you're using Mongoose, you can build your Elasticsearch models right into it, or use mongoosastic, which essentially does this for you.
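As a rough illustration of the mongoosastic approach mentioned above (a sketch only; the schema fields, host, and query are assumptions, so check the mongoosastic docs for the exact options):
var mongoose = require('mongoose');
var mongoosastic = require('mongoosastic');

var PageSchema = new mongoose.Schema({
  title: { type: String, es_indexed: true },       // es_indexed marks fields to mirror into Elasticsearch
  categories: { type: Array, es_indexed: true }
});

// The plugin hooks into save/remove so documents are kept in sync with Elasticsearch.
PageSchema.plugin(mongoosastic, { hosts: ['localhost:9200'] });

var Page = mongoose.model('Page', PageSchema);

// Saving a document indexes it; searching goes straight to Elasticsearch.
Page.search({ query_string: { query: 'home' } }, function (err, results) {
  if (err) throw err;
  console.log(results.hits.hits);
});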
Another disadvantage of MongoDB River is that you'll be stuck using the MongoDB 2.4.x branch and ElasticSearch 0.90.x. You'll start to find that you're missing out on a lot of really nice features, and the MongoDB River project just doesn't produce a usable product fast enough to stay stable. That said, MongoDB River is definitely not something I'd go into production with. It has posed more problems than it's worth. It will randomly drop writes under heavy load, it will consume lots of memory, and there's no setting to cap that. Additionally, River doesn't update in realtime; it reads oplogs from MongoDB, and this can delay updates by as long as 5 minutes in my experience.
We recently had to rewrite a large portion of our project because it's a weekly occurrence that something goes wrong with ElasticSearch. We had even gone as far as to hire a DevOps consultant, who also agrees that it's best to move away from River.
UPDATE:
Elasticsearch-mongodb-river now supports ES v1.4.0 and MongoDB v2.6.x. However, you'll still likely run into performance problems on heavy insert/update operations, as this plugin reads MongoDB's oplogs to sync. If a lot of operations have accumulated since the lock (or latch, rather) unlocks, you'll notice extremely high memory usage on your Elasticsearch server. If you plan on having a large operation, River is not a good option. The developers of ElasticSearch still recommend that you manage your own indexes by communicating directly with their API using the client library for your language, rather than using River. That isn't really the purpose of River. Twitter-river is a great example of how River should be used: it's essentially a great way to source data from outside sources, but not very reliable for high traffic or internal use.
Also consider that mongodb-river falls behind in versions, as it's not maintained by the Elasticsearch organization; it's maintained by a third party. Development was stuck on the v0.90 branch for a long time after the release of v1.0, and when a version for v1.0 was released it wasn't stable until Elasticsearch released v1.3.0. The MongoDB versions it supports also fall behind. You may find yourself in a tight spot when you're looking to move to a later version of each, especially with ElasticSearch under such heavy development and many anticipated features on the way. Staying up on the latest ElasticSearch has been very important for us, as we rely heavily on constantly improving our search functionality; it's a core part of our product.
All in all, you'll likely get a better product if you do it yourself. It's not that difficult. It's just another database to manage in your code, and it can easily be dropped into your existing models without major refactoring.
River is a good solution if you want almost real-time synchronization and a general solution.
If you already have data in MongoDB and want to ship it to Elasticsearch very easily, as a "one-shot" operation, you can try my Node.js package: https://github.com/itemsapi/elasticbulk.
It uses Node.js streams, so you can import data from anything that supports streams (i.e. MongoDB, PostgreSQL, MySQL, JSON files, etc.).
Example for MongoDB to Elasticsearch:
Install packages:
npm install elasticbulk
npm install mongoose
npm install bluebird
Create a script, e.g. script.js:
const elasticbulk = require('elasticbulk');
const mongoose = require('mongoose');
const Promise = require('bluebird');
mongoose.connect('mongodb://localhost/your_database_name', {
useMongoClient: true
});
mongoose.Promise = Promise;
var Page = mongoose.model('Page', new mongoose.Schema({
title: String,
categories: Array
}), 'your_collection_name');
// stream query
var stream = Page.find({
}, {title: 1, _id: 0, categories: 1}).limit(1500000).skip(0).batchSize(500).stream();
elasticbulk.import(stream, {
index: 'my_index_name',
type: 'my_type_name',
host: 'localhost:9200',
})
.then(function(res) {
console.log('Importing finished');
})
Ship your data:
node script.js
It's not extremely fast, but it works for millions of records (thanks to streams).
Here I found another good option for migrating your MongoDB data to Elasticsearch: Monstache, a Go daemon that syncs MongoDB to Elasticsearch in realtime. It's available at: Monstache
Below are the initial steps to configure and use it.
Step 1:
C:\Program Files\MongoDB\Server\4.0\bin>mongod --smallfiles --oplogSize 50 --replSet test
Step 2 :
C:\Program Files\MongoDB\Server\4.0\bin>mongo
MongoDB shell version v4.0.2
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 4.0.2
Server has startup warnings:
2019-01-18T16:56:44.931+0530 I CONTROL [initandlisten]
2019-01-18T16:56:44.931+0530 I CONTROL [initandlisten] ** WARNING: Access control is not enabled for the database.
2019-01-18T16:56:44.931+0530 I CONTROL [initandlisten] ** Read and write access to data and configuration is unrestricted.
2019-01-18T16:56:44.931+0530 I CONTROL [initandlisten]
2019-01-18T16:56:44.931+0530 I CONTROL [initandlisten] ** WARNING: This server is bound to localhost.
2019-01-18T16:56:44.931+0530 I CONTROL [initandlisten] ** Remote systems will be unable to connect to this server.
2019-01-18T16:56:44.931+0530 I CONTROL [initandlisten] ** Start the server with --bind_ip <address> to specify which IP
2019-01-18T16:56:44.931+0530 I CONTROL [initandlisten] ** addresses it should serve responses from, or with --bind_ip_all to
2019-01-18T16:56:44.931+0530 I CONTROL [initandlisten] ** bind to all interfaces. If this behavior is desired, start the
2019-01-18T16:56:44.931+0530 I CONTROL [initandlisten] ** server with --bind_ip 127.0.0.1 to disable this warning.
2019-01-18T16:56:44.931+0530 I CONTROL [initandlisten]
MongoDB Enterprise test:PRIMARY>
Step 3 : Verify the replication.
MongoDB Enterprise test:PRIMARY> rs.status();
{
"set" : "test",
"date" : ISODate("2019-01-18T11:39:00.380Z"),
"myState" : 1,
"term" : NumberLong(2),
"syncingTo" : "",
"syncSourceHost" : "",
"syncSourceId" : -1,
"heartbeatIntervalMillis" : NumberLong(2000),
"optimes" : {
"lastCommittedOpTime" : {
"ts" : Timestamp(1547811537, 1),
"t" : NumberLong(2)
},
"readConcernMajorityOpTime" : {
"ts" : Timestamp(1547811537, 1),
"t" : NumberLong(2)
},
"appliedOpTime" : {
"ts" : Timestamp(1547811537, 1),
"t" : NumberLong(2)
},
"durableOpTime" : {
"ts" : Timestamp(1547811537, 1),
"t" : NumberLong(2)
}
},
"lastStableCheckpointTimestamp" : Timestamp(1547811517, 1),
"members" : [
{
"_id" : 0,
"name" : "localhost:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 736,
"optime" : {
"ts" : Timestamp(1547811537, 1),
"t" : NumberLong(2)
},
"optimeDate" : ISODate("2019-01-18T11:38:57Z"),
"syncingTo" : "",
"syncSourceHost" : "",
"syncSourceId" : -1,
"infoMessage" : "",
"electionTime" : Timestamp(1547810805, 1),
"electionDate" : ISODate("2019-01-18T11:26:45Z"),
"configVersion" : 1,
"self" : true,
"lastHeartbeatMessage" : ""
}
],
"ok" : 1,
"operationTime" : Timestamp(1547811537, 1),
"$clusterTime" : {
"clusterTime" : Timestamp(1547811537, 1),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
}
MongoDB Enterprise test:PRIMARY>
Step 4.
Download the "https://github.com/rwynn/monstache/releases".
Unzip the download and adjust your PATH variable to include the path to the folder for your platform.
GO to cmd and type "monstache -v"
# 4.13.1
Monstache uses the TOML format for its configuration. Configure the file for migration named config.toml
Step 5.
My config.toml -->
mongo-url = "mongodb://127.0.0.1:27017/?replicaSet=test"
elasticsearch-urls = ["http://localhost:9200"]
direct-read-namespaces = [ "admin.users" ]
gzip = true
stats = true
index-stats = true
elasticsearch-max-conns = 4
elasticsearch-max-seconds = 5
elasticsearch-max-bytes = 8000000
dropped-collections = false
dropped-databases = false
resume = true
resume-write-unsafe = true
resume-name = "default"
index-files = false
file-highlighting = false
verbose = true
exit-after-direct-reads = false
index-as-update=true
index-oplog-time=true
Step 6.
D:\15-1-19>monstache -f config.toml
I found mongo-connector useful. It is from MongoDB Labs (MongoDB Inc.) and can now be used with Elasticsearch 2.x.
Elastic 2.x doc manager: https://github.com/mongodb-labs/elastic2-doc-manager
mongo-connector creates a pipeline from a MongoDB cluster to one or more target systems, such as Solr, Elasticsearch, or another MongoDB cluster. It synchronizes data in MongoDB to the target then tails the MongoDB oplog, keeping up with operations in MongoDB in real-time. It has been tested with Python 2.6, 2.7, and 3.3+. Detailed documentation is available on the wiki.
https://github.com/mongodb-labs/mongo-connector
https://github.com/mongodb-labs/mongo-connector/wiki/Usage%20with%20ElasticSearch
Here is how to do this on MongoDB 3.0. I used this nice blog.
Install mongodb.
Create data directories:
$ mkdir RANDOM_PATH/node1
$ mkdir RANDOM_PATH/node2
$ mkdir RANDOM_PATH/node3
Start Mongod instances
$ mongod --replSet test --port 27021 --dbpath node1
$ mongod --replSet test --port 27022 --dbpath node2
$ mongod --replSet test --port 27023 --dbpath node3
Configure the Replica Set:
$ mongo
config = {_id: 'test', members: [ {_id: 0, host: 'localhost:27021'}, {_id: 1, host: 'localhost:27022'}]};
rs.initiate(config);
Installing Elasticsearch:
a. Download and unzip the latest Elasticsearch distribution.
b. Run bin/elasticsearch to start the es server.
c. Run curl -XGET http://localhost:9200/ to confirm it is working.
Installing and configuring the MongoDB River:
$ bin/plugin --install com.github.richardwilly98.elasticsearch/elasticsearch-river-mongodb
$ bin/plugin --install elasticsearch/elasticsearch-mapper-attachments
Create the “River” and the Index:
curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{
"type": "mongodb",
"mongodb": {
"db": "mydb",
"collection": "foo"
},
"index": {
"name": "name",
"type": "random"
}
}'
Test on browser:
http://localhost:9200/_search?q=home
Since mongo-connector now appears dead, my company decided to build a tool for using Mongo change streams to output to Elasticsearch.
Our initial results look promising. You can check it out at https://github.com/electionsexperts/mongo-stream. We're still early in development, and would welcome suggestions or contributions.
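For context, the core of that change-stream approach looks roughly like this (a minimal sketch, not the mongo-stream code itself; it assumes the Node.js mongodb driver 3.x, the legacy elasticsearch client, a replica set, and placeholder database/index names):
// Watch a MongoDB collection and mirror changes into Elasticsearch.
const { MongoClient } = require('mongodb');
const elasticsearch = require('elasticsearch');

const es = new elasticsearch.Client({ host: 'localhost:9200' });

MongoClient.connect('mongodb://localhost:27017/?replicaSet=test', { useNewUrlParser: true })
  .then((client) => {
    const collection = client.db('YOUR_DATABASE_NAME').collection('YOUR_COLLECTION_NAME');

    // Change streams require MongoDB 3.6+ and a replica set.
    const changeStream = collection.watch([], { fullDocument: 'updateLookup' });

    changeStream.on('change', (change) => {
      const id = String(change.documentKey._id);
      if (['insert', 'update', 'replace'].includes(change.operationType)) {
        // Index (or reindex) the changed document.
        es.index({ index: 'my_index_name', type: 'my_type_name', id: id, body: change.fullDocument });
      } else if (change.operationType === 'delete') {
        es.delete({ index: 'my_index_name', type: 'my_type_name', id: id });
      }
    });
  })
  .catch((err) => console.error(err));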