A few basic questions about getting started with Titan and TinkerPop

This morning I decided to try out Titan. I've used both Neo4j and OrientDB, and was going to implement a polyglot persistence model using one of those graph databases; however, since I am already using Cassandra, I decided to try out Titan.
I've read through the Titan docs, as well as the Tinkerpop docs, but a few things are still unclear. Both Neo4j and OrientDB are pretty much plug-and-play; since Titan seems like more of a layer on top of a db backend like Cassandra, I'm unsure of how to start with setting it up. I can start the gremlin console and connect to my Cassandra cluster, and I can start titan server, both from the console.
My main question is, am I supposed to install titan as a service? Do I make my own init scripts, or use supervisor/monit/etc to manage it? Basically, what is the right way to keep everything running and available?

Titan runs as an application by itself: it can configure and run an embedded backend (BerkeleyDB, embedded Cassandra, ...) or connect to an already running Cassandra or DynamoDB server or cluster.
This means that you can pass a single configuration file that includes all information you want Titan to use. In this config file, you could ask Titan to embed a backend (start and maintain it by itself) or connect to a local/remote instance.
There are several example config files, shipped with the Titan distribution and shown in its documentation, that are worth a look.
As a quick introduction, download Titan 1.0.0 and run its gremlin console by moving to its main directory and running
bin/gremlin.sh
Inside the gremlin console, you could run something like
TitanGraph g = TitanFactory.build().
set("storage.backend", "berkeleyje").
set("storage.directory", "/tmp/graph").
open();
Or you could load a configuration file like this:
TitanGraph g = TitanFactory.open("path/to/properties/file")
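As a concrete illustration (the hostnames below are placeholders, and the index backend is optional), a properties file that connects Titan to an already running Cassandra cluster could look roughly like this:
# point Titan at an existing Cassandra cluster
storage.backend=cassandrathrift
storage.hostname=10.0.0.1,10.0.0.2
# optional: an external index backend such as Elasticsearch for mixed/full-text indexes
# index.search.backend=elasticsearch
# index.search.hostname=10.0.0.3
Passing a file like that to TitanFactory.open gives you a TitanGraph backed by the cluster instead of an embedded store.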
The configuration section of the Titan documentation covers all of this in much more depth.

Related

What is upconfig and downconfig in zookeeper?

I am a noob in Solr and ZooKeeper and am trying to learn by myself. I understood that ZooKeeper is a file structure that manages the Solr cluster and prevents race conditions using locks. I didn't understand what upconfig and downconfig are and when we use them. It would be of great help if someone could give me a clear picture of this. Thanks in advance!
A better and more general description of ZooKeeper is that it is an application that provides centralised configuration for distributed systems. So in SolrCloud, you can have multiple Solr instances across multiple servers acting together as a single cloud. However, if you want to update a collection's configuration, you don't want to have to go to each server and update them all individually. You want only one version of the config, which is then used by any collection that needs it. Hence the upconfig/downconfig commands.
upconfig uploads a configuration to ZooKeeper, which then ensures that all collections using that configuration (throughout the Cloud, on all the servers) have that specific config. So you only need to upload it once, on one server.
downconfig lets you fetch a configuration from Zookeeper.
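For concreteness, with the zkcli.sh script that ships with Solr the two commands look roughly like this (the config name, paths, and ZooKeeper address are placeholders):
# upload a local config directory to ZooKeeper under the name "myconf"
server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd upconfig -confname myconf -confdir /path/to/conf
# fetch the same config back from ZooKeeper for local editing
server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd downconfig -confname myconf -confdir /tmp/myconf
Recent Solr releases also expose the same operations as bin/solr zk upconfig and bin/solr zk downconfig.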

Apache Ignite Failover functionality

I have set up Apache Ignite on a cluster of nodes and sent a job to a server node to run. When the connection to that server node is lost, I need to somehow store the result of that node locally (either in a binary file or in some other way). Then, when the connection with that node is established again, push the stored results back to some database server.
I'm working under .Net platform.
I can use
EventType.EVT_CLIENT_NODE_DISCONNECTED
EventType.EVT_CLIENT_NODE_RECONNECTED
these events and implement the 'store locally' and 'push to the DB server' functionality inside their handlers, but I wanted to find a ready-made solution.
Is there any ready-made tool with the functionality I mentioned that I can just take and use?
You can take a look at Checkpointing. I'm not sure this is exactly what you described (mainly because it will save the intermediate state on the server side), but I think it can be quite helpful.
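If you do end up wiring this yourself with the events you listed, a minimal sketch of the listener side could look like the following. It uses the Java API from Scala purely for illustration (the .NET API exposes a similar LocalListen call); the configuration path and the handler bodies are placeholders, and the event types generally have to be enabled in the node configuration before they are delivered.
import org.apache.ignite.{Ignite, Ignition}
import org.apache.ignite.events.{Event, EventType}
import org.apache.ignite.lang.IgnitePredicate

object ReconnectHandling {
  def main(args: Array[String]): Unit = {
    // "ignite-client.xml" is a placeholder for your client node configuration
    val ignite: Ignite = Ignition.start("ignite-client.xml")

    ignite.events().localListen(new IgnitePredicate[Event] {
      override def apply(evt: Event): Boolean = {
        evt.`type`() match {
          case EventType.EVT_CLIENT_NODE_DISCONNECTED =>
            // placeholder: start spooling job results to a local file
            println("disconnected - buffering results locally")
          case EventType.EVT_CLIENT_NODE_RECONNECTED =>
            // placeholder: flush the locally buffered results to the database server
            println("reconnected - pushing buffered results to the database")
          case _ => // ignore any other event type
        }
        true // returning true keeps the listener registered
      }
    }, EventType.EVT_CLIENT_NODE_DISCONNECTED, EventType.EVT_CLIENT_NODE_RECONNECTED)
  }
}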

MongoLab and Elasticsearch

My Mongo database is hosted at MongoLab. I'd like to use ElasticSearch as a full text search engine on top of my DB.
As I understand it, MongoDB needs to run as a replica set, but I don't have any control over how the database runs. I'm currently using the 500 MB free plan.
On top of that, I'm using the Scala Play framework.
Was anyone successful with those technologies and services?
Update:
In the end I'm not using MongoDB anymore, and went straight for an ElasticSearch-only solution.
I found a nice cloud host providing a 500 MB free plan: http://facetflow.com/
It was very useful for my development.
I didn't find any satisfying Scala library for ES, so I'm using Dispatch and making direct HTTP requests to the ES instance.
I hope that someone will find this useful.
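For anyone wondering what those direct HTTP calls look like, here is a rough sketch with Dispatch; the host, index name and query are made up for the example:
import dispatch._, Defaults._
import scala.concurrent.Future

// POST a search request straight to the ES instance (placeholder host and index)
val search = url("http://localhost:9200/articles/_search") <<
  """{"query": {"match": {"title": "play framework"}}}"""

// run the request asynchronously and hand back the response body as a String
val body: Future[String] = Http(search OK as.String)
body.foreach(println)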
Just a quick note: MongoHQ has oplog support with their MongoDB Elastic Deployments; those could help you with using Elasticsearch and the MongoDB river.
http://blog.mongohq.com/elastic-deployments-now-with-oplog-access/
I haven't looked into this too deeply, but you might want to check out Searchly (http://www.searchly.com/features/). The features mention:
Built-in crawler for crawling web pages and databases. (Currently MongoDB)
If you try this out, please let me know how it goes. I will do the same.
Update:
I haven't tried Searchly, but I was able to start a MongoDB instance in replica mode on OpenShift.
I also have an Elasticsearch server running on the same OpenShift "gear".
Now I need time to try connecting those two together, and then the fun will start :-)

MongoDB on Azure Cloud

Is MongoDB for Azure production-ready?
Can anyone share some experience with it?
It looks like the confidence needed to use it in production is still missing.
What do you think?
Edit: Since there is a misunderstanding in my question, I will try to restate it.
What I am looking for from the community is for someone who is running MongoDB on Windows Azure to share their experience with it.
By experience I don't mean how to run it in the cloud (we already have the manual in 10gen's FAQ), nor how many bugs it has (we can see that in the mongo-azure JIRA).
What I am looking for is: how is it going performance-wise?
Are there any problems (side effects) from running MongoDB on Azure?
How does MongoDB handle VM recycling?
Has anyone tried sharding?
In the end, is the mongo-azure worker role from 10gen stable enough to use in production?
Hope this clears things up.
A bit of clarification here. MongoDB itself is production-ready. And MongoDB works just fine in Windows Azure, as long as you set up the scaffolding to get it to work in the environment. This typically entails setting up an Azure Drive, to give you durable storage. Alternatively, using a replicaset, you effectively have eventual consistency across the set members. Then, you could consider going with a standalone (or standalone with hot standby). Personally, I prefer a replicaset model, and that's typical guidance for production MongoDB systems.
As far as 10gen's support for Windows Azure: While the page #SyntaxC4 points to does clarify the wrapper is in a preview state, note that the wrapper is the scaffolding code that launches MongoDB. This scaffolding was initially released in December 2011, and has had a few tweaks since then. It uses the production MongoDB bits (and works just fine with version 2.0.5 which was published on May 9). One caveat is that the MongoDB replicaset roles are deployed alongside your application's roles, since the client app needs visibility to all replica set nodes (to properly build the set). To avoid this limitation, you'd need to run mongos and the entry point (and that's not part of 10gen's scaffolding solution).
Forgetting the preview scaffolding a moment: I have customers running MongoDB in production, with custom scaffolding. One of them is running a rather large deployment, with multiple shards, using a replicaset per shard.
So... does it work in Windows Azure? Yes. Should you take advantage of 10gen's supplied scaffolding? If you're just looking for a simple way to launch a replicaset, I think it's fine. If you want a standalone model, or a shard model, or if you need a separate deployment for MongoDB, you'd currently need to do this on your own (or modify the project 10gen published).
MongoLab is now offering MongoDB as a service on Azure (see the MongoLab blog).
A free demo account with 0.5 GB of storage is available in the Windows Azure Store.
The warning message on their site says that it's a preview. This would mean that there would be no support for it at a product level in Windows Azure.
If you want to form your own opinion on a comfort level, you can take a look at their bug tracking system and get a feeling for what people are currently reporting as issues.

using the oplog monitoring class in casbah

I am trying to use the oplog monitoring class in casbah
https://github.com/mongodb/casbah/blob/master/casbah-core/src/main/scala/util/OpLog.scala
What I want to do is monitor the oplog entries of a production MongoDB instance at
production.someserver.com
and get the entries and send them to the storage DB at
test.someotherserver.com
and replicate all the data that is in the production server to the test server. I cannot use replica sets to do this, as I cannot redeploy now. I am trying to build a Scala app to do this. Casbah, the official Scala driver for MongoDB, has the above-mentioned class, which I am trying to instantiate using
val mongoColl = MongoConnection()("test")("test_data")
val oLog = new MongoOpLog(mongoColl)
But I'm not even able to instantiate it; I get an error that MongoOpLog is not found, even though I've imported the necessary package. And even if I were able to do this, I have no clue how to do what I want to do. Can anyone please point me in the right direction on how to achieve this? I am pretty new to Scala, so a bit of detailed explanation, or a link containing it, would be helpful for me.
You need to have replication enabled on the server for the oplog to be created, either as a member of a replica set or in master mode for master/slave.
Otherwise, MongoDB does not waste CPU cycles and disk space maintaining an oplog. Please see the documentation on Replication for more info - http://www.mongodb.org/display/DOCS/Replication
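If you just need an oplog to experiment against (not for production use), one option is to run mongod as a single-member replica set with the standard flags:
# start mongod with replication enabled so local.oplog.rs is maintained
mongod --dbpath /data/db --replSet rs0
# then initialise the (single-member) set once from the mongo shell
rs.initiate()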
You should really never be running any database with a single server in production, incidentally.
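For what it's worth, here is a very rough sketch of the overall idea without MongoOpLog: read insert entries from the production oplog with plain Casbah queries and write the documents to the test server. It is not production-ready (no tailable cursor, no resuming from a saved timestamp, insert operations only), and the namespace is only a guess based on the question:
import com.mongodb.casbah.Imports._

// source: the production server's oplog (requires replication, as noted above)
val prodConn = MongoConnection("production.someserver.com")
val oplog    = prodConn("local")("oplog.rs")

// target: the collection on the test server
val testConn = MongoConnection("test.someotherserver.com")
val target   = testConn("test")("test_data")

// copy over insert operations ("op" -> "i") for one namespace; a real version
// would tail the cursor, remember the last "ts" it saw, and also apply
// updates ("u") and deletes ("d")
for (entry <- oplog.find(MongoDBObject("op" -> "i", "ns" -> "test.test_data")))
  entry.getAs[DBObject]("o").foreach(doc => target.insert(doc))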