Do voltDB or NuoDB need ElasticSearch? - postgresql

Currently I am working in a project that use PostgreSQL + ElasticSearch. However I recently found VoltDB, and I was wondering if we still need ElasticSearch for doing searches with VoltDB.
If I am ok, elasticSearch get the data from PostgreSQL of from another relational Database, and them it reindexes the data to make faster Queries instead using the relational Database indexes. This is because the data stored in ElasticSearch is not completely trusted because ElasticSearch is not ACID compliant.

VoltDB is very fast and is excellent at parallelizing work across hardware resources. It doesn't contain any kind of full-text-indexing functionality. Any kind of full-text search on VoltDB will be at least mostly brute force. That doesn't mean it won't meet your needs, but it really depends on the kind of queries you want to run.

Based on my (limited) knowledge of ElasticSearch, it appears that it is a search server that would work in conjunction with the database and is used primarily to search and index document files.
If this is correct, I don't think that NuoDB would be a replacement for ElasticSearch but could likely work in conjunciton with it similar to PostgreSQL.
Also, similar to Volt, NuoDB doesn't have full-text-indexing functionality.

Related

Any way to get Meteor using a native ACID compliant db?

I am seriously considering Meteor framework for building every POC and apps in the future...but, I can't get ride of an ACID compliant database as I have few usages of multi-documents atomic transaction that require this compliance.
Meteor strongly rely on MongoDB syntax and storage engine at the moment (it means there are no "Transaction" related syntax available...)
I am currently evaluating any solution allowing this ACID capability :
Using a MySQL native driver for Meteor (different syntax than MongoDB?)
Using a PostgreSQL native driver for Meteor (SQL syntax)
Using a TokuMX (a MongoDB fork with ACID compliance...same syntax than MongoDB appart from transaction related commands that would be required to add)
Those 3 solutions are good candidates for the Meteor roadmap as shown here
What pros/cons about those solution ? Which are the most advanced one ?
What would you although consider as a solution to keep Meteor while storing documents in a NoSQL like ACID compliant db ?
sqlAndMeteor
If you are like me, you love Meteor but hate Mongo. In Meteor's Trello Roadmap (https://trello.com/b/hjBDflxp/meteor-roadmap), the most voted feature is SQL Support, either PostgreSQL or MySQL.
Since there is no date for that in Meteor, here I summarize the partial solutions I have found.
1.- Use SQL only for client-side querys.
Let's face it, Mongo sucks on common data operations, so having the ability to use SQL to query data (with JOINS, GRUP BY and so on) would relief a lot of pain. There are packages which let you use SQL in the client, at least for querys: The simplest one is a old (2010) utility, SqlLike (http://www.thomasfrank.se/sqlike.html). The new player in town is alaSQL, which is actively developed by #agershun (https://github.com/agershun/alasql). The SqlLike advantage is that it only has 10k. AlaSQL, is a lot more powerful, of course, but for using SQL to replace mongo sintax in unions and aggregations, SqlLike is OK.
With both of them you can do something like this in your helper:
productsSold:function(){
var customerSalesHistory=salesHistory.find({cutomerId:Session.get('currentCustomer')}).fetch();
var items=products.find().fetch();
return alasql("select item.name, sales.ordered as sumaVentas from ? sales, ? items
where items.Id=sales.itemId",[customerSalesHistory,items]);
}
2.- Experiment with direct SQL support.
Some packages try to replace Mongo (and minimongo) with MySql or PostgreSQL. #numtel's MySql package is Meteor-MySql https://github.com/numtel/meteor-mysql, and PostgreSQL is Meteor-pg (https://github.com/numtel/meteor-pg). Both are good attempts to solve the problem, but have some issues yet and are somehow cumbersome to adapt.
A team from Hack Reactor has formed Meteor Stream, and its first product is a PostgreSql integration with Meteor, meteor-postgres (https://github.com/meteor-stream/meteor-postgres). It looks very good and uses alaSql on the client to replace minimongo.
Both approaches are good, but they have some problems:
They broke deployment to meteor.
They are very, very young and not near production ready AFAIK
They still require tweaks to the usual pub-sub sintax we are used to, which could raise compatibility issues with other meteor packages.
3.- Still use Mongo, but as simple repository for your MySql database.
This option maintains all Meteor's characteristics and uses Mongo as a temporal repository for your MySql or PostgreSql databases.
A brilliant attempt to that is mysql-shadow by #perak (https://github.com/perak/mysql-shadow). It does what it says, keeps Mongo synchronized both ways with MySql and let's you work your data in MySql.
The bad news is that the developer will not continue maintaining it, but what is done is enough to work with simple scenarios where you don't have complex triggers that update other tables or stuff like that.
For a full featured synchronization you can use SymmetricsDS (http://www.symmetricds.org), a very well tested database replicator. This involves setting up a new java server, of course, but is by far the best way to be sure that you will be able to convert your Mongo database in a simple repository of your real MySql, PostgreSQL, SQL Server , Informix database. I have to check it myself yet.
For now MySQL Shadow seems like a good enough solution.
One advantage of this approach is that you can still use all standard Meteor features, packages, meteor deployment and so on. You donĀ“t have to do anything but set up the synch mechanism, and you are not breaking anything.
Also, if someday the Meteor team uses some of the dollars raised in SQL integration, your app is more likely to work as is.
If MySQL works for you, I've used the meteor-mysql package and it works well.
I finally forged my own conclusion...
I will have TWO platforms :
a Meteor front with business data in PostgreSQL and some front data or easy-to-replicate data in MongoDB
a Java data backend (server2server only) handling all atomic operations on my business data in PostgreSQL...plus technical adapters (SAP, Salesforce), a BPMN 2.0 workflow engine (Actility), and any registered SOA needed from other systems
Any comments are still very welcomed and will be considered and answered

HBase or Mongo for an Analytics DB if already using Hadoop?

I currently have a Hadoop cluster where I store tons of logs over which I run pig scripts for calculating aggregated analytics. I also have a Mongo cluster where I store production data.
I've recently been put in a position where I need to do a lot of one-off analytics queries, or enable others to do them. These queries frequently need to use both production data and log data together, so whatever I go with, I'd like to have everything in one place. My log data is in json and about 10x the size of my prod data. Here are the pros/cons of Mongo and HBase I'm seeing:
Mongo Pros/ HBase Cons:
Since log data is in JSON, I can get it into Mongo pretty easily, and I can do this in real time as it comes in through something like FluentD.
Most people I work with already have experience writing Mongo queries from needing to work with prod data, so getting an analytics db up on Mongo would be very simple for everyone to use.
I know much less about Hbase than Mongo.
No idea how easy/difficult it would be to get data in JSON or from Mongo into Hbase. I imagine this isn't so bad, but I don't see much documentation.
HBase Pros/Mongo Cons:
My log data is much bigger than my prod data, so storing it in both hadoop and mongo would be way more expensive than storing my prod data in both hadoop and mongo.
I can build HBase on top of my already running Hadoop cluster and fit my prod data in there without adding many extra machines. If I went with Mongo, I'd need a whole new Mongo cluster.
I could use Phoenix on top of Hbase to allow a simple SQL syntax for accessing all our data, but I'm not sure how unwieldily this would be for multi-level document-based data.
I know very little about Hbase currently, and I wouldn't consider myself a Mongo expert, so I'm probably missing a lot.
So, what am I missing, and which is right for my situation?
First of all, you should use something which you already can handle. Therefore, Mongo DB seems a good choice, especially when the data is already in the json format.
On the other hand, I used HBase quite a while and the read performance is amazing although having a lot of rows and I really don't know if there is any good and fast integration of Mongo DB with Hadoop.
HBase is the Hadoop database, so it is predestinated to work with Hadoop together.
If the logs could be indexed by (in the HBase Rowkey):
producing_program_identifier, timestamp, ...
HBase could work quite well for this query pattern.
But if you decide on HBase, use the
phoenix framwork, it will save you time using familiar interfaces like jdbc and sql-like queries. It also provides simple aggregation functions (count, avg, max, min) which may be sufficient.
From what you're saying it seems a mongoDB based solution would work best for you.
HBase is extremely versatile and you can get it to serve both your prod needs as well as your analytics needs however the general purpose SQL capabilities (in Phoenix, Cloudera's Impala and others) are in their infancy and the standard HBase way to get high query performance (designing the data structure for reads) will take a lot on effort (esp. since you don't have experience in HBase).
By the way it may be applicable for you to use map/reduces pre-aggregated data and then load it into MongoDB and thus utilize your current setup bette rather than change it either way

Existing Postgres Database vs Solr

We have an app that uses postgres database, that has about 50 tables. Each table contains about 3 Million records (on average). The tables get updated with new data every now and than. Now, we want to implement search feature in our app. The search needs to be performed on one table at a time (no joins needed).
I've read about postgres full text support and that looks promising. But it seems that Solr is Super fast in comparison to it. Can I use my existing postgres database with Solr? If tables get updated would I need to re-index everything again?
It is definitely worth giving Solr a try. We moved many MySQL queries involving JOINs on multiple tables with sorting on different fields to Solr. We are very happy with Solr's search speed, sort speed, faceting capabilities and highly configurable text analysis/tokenization options.
If tables get updated would I need to re-index everything again?
No, you can run delta imports to only re-index your new and updated documents. See https://wiki.apache.org/solr/DataImportHandler.
Get started with https://lucene.apache.org/solr/4_1_0/tutorial.html and all the links in there.
Since nobody has leapt in, I'll answer.
I'm afraid it all depends. It depends on (at least)
how big the text is in each "document"
how flexible you want your searching to be
how much integration you need between database and text-search
how fast is fast enough
how much experience you have with both
When I've had a database that needs some text searching, I've just used PG's built-in options. If I didn't have superuser access to the db, or was already running a big Java setup then Solr might well have appealed.

Java ORMs on NoSQL DB like HBase

I have recently started getting familiarized with NoSQL (HBase). I am definitely a noob.
I was investigating about ORMs and high level clients which can be used on HBase and came across a few.
Some ORM libraries like Kundera are providing SQL like data query functionality. I am finding this a little counter intuitive.
Can any one help me understand why we would again need SQL like querying if the whole objective was to move away from it?
Also can anyone comment on your experiences with ORMs for HBase? I looked at a few of them from http://wiki.apache.org/hadoop/SupportingProjects and started looking at Kundera.
Another related question - Does data query with Kundera run map reduce jobs internally?
kundera or Spring data might provide user friendly ORM layer over NoSQL databases, but the underlying entity model still has to be NoSQL friendly. This means that NoSQL users should not blindly follow RDBMS modeling strategies but design ORM entities in such a way so that all NoSQL capabilities can be used.
As a thumb rule, the kundera ORM entities should be designed using query-first strategy where first the queries need to defined so as to create primary keys and also ensuring that relationship model is used as minimal as possible. Querying on random columns and full scans should be avoided and so data might have to be replicated across entities for reducing multiple entity look ups. Also, transactions management needs to be planned. FYI, kundera does not support transactions(beyond single row TX supported by Hbase/Cassandra).
Reason for using Kundera:
1) If looking for SQL like support over HBase. As it is build on top of HBase native API, so it simply transforms these SQL queries in to corresponding GET or PUT method calls.
2) Currently it support HBase-0.20.6 only. Kundera-2.0.6 will enable support for HBase 0-90.x versions.
3) Kundera does not do sometihng out of the box to provide map reduce over SQL like queries. However support for such thing will be provided in Kundera-2.0.6 by enabling support for Hive native queries only!
It is totally JPA compliant, so no need to learn something new. It simply hides complexity at developer level with very minimal effort.
SQL like querying is for developement ease, quick developement, less error prone and reusability ofcourse!
-Vivek

How can I use MongoDB as a cache for Postgresql?

I have an application that can not afford to lose data, so Postgresql is my choice for database (ACID)
However, speed and query advantages of MongoDB are very attractive, but based on what I've read so far, MongoDB can report a successful write which may not have gone to disk, so I can't make it my mission critical db (I'll also need transactions)
I've seen references to people using mysql and MongoDB together, one for the transactions and the other for queries. Please not that I'm not talking about keeping some data in one DB and the rest in another. I want to use Postgresql as a gateway to data entry, and MongoDB for reads.
Are there any resources that offer an architecture/guide for Postgresql + MongoDB usage in this way? I can remember seeing this topic in Postgresql conference agenda, but I could not find the link.
I don't think you'll get much speed using MongoDB just as a cache. It's strengths are replication and horizontal scalability. On one computer you'd make Mongo and Postgres compete for memory, IO bandwidth and processor time.
As you can not afford to loose transactions you'll be better with Postgres only. Its has efficient caching, sophisticated query planner, prepared queries and wide indexing support cause that read-only queries will be very fast - really comparable to MongoDB on a single computer.
Postgres can even scale horizontally now using asynchronous, or, from version 9.1, synchronous replication.
One way to achieve this would be to set up a master-slave replication with the PostgreSQL database as master, and the MongoDB database as slave. You would then do all reads from MongoDB, and all writes to PostgreSQL.
This post discusses such a setup using a tool called Bucardo:
http://blog.endpoint.com/2011/06/mongodb-replication-from-postgres-using.html
You may also be able to do it with Tungsten Replicator, although it seems designed to be used with MySQL:
http://code.google.com/p/tungsten-replicator/wiki/TRCHeterogeneousReplication
I can remember seeing this topic in Postgresql conference agenda, but I could not find the
link.
Maybe, you are talking about this: https://www.postgresqlconference.org/content/hybrid-applications-using-mongodb-and-postgres
Depending how important transactions are to you, one option is to use MongoDb driver's safe mode and drop Postgresql.
http://www.mongodb.org/display/DOCS/getLastError+Command
How can you expect transactional consistency from Postgres but trust MongoDB for reads? How would you support rollbacks in this scenario? How do you detect when they've gotten out of sync?
I think you're better off going with memcache and implementing a higher level object cache. Alternatively, you could consider a replication slave for reads. If you have performance needs beyond what a dedicated read slave can provide, consider denormalizing your tables on your slave system.
Make sure that any of this is actually needed. For thin tables with PK lookups most modern database engines like Postgres or InnoDB are going to generally keep up with NoSQL solutions. Don't fall into the ROFLSCALE trap
http://www.youtube.com/watch?v=b2F-DItXtZs
I think you can run a mongo replica set.. Let say 3 Slave and 1 Master.. Then in your app you should run all write transactions on Postgresql and then on Mongo ReplicaSet.. After that you can query read operations on Mongo Replica set..
But Synchronizing will be a problem, you should work on it..
you may find some replacement for mongo in here or here that is safer and fast as well.
but I advise to simplify your solution instead of making a complicated design.
Visual Guide to NoSQL Systems
lucky
In mongodb we can specify writeConcern property to specify that it should write to journal/ instances and then send confirmation/ acknowledgement and i think even mongodb has teh concept of transactions. Not sure why we need postgres behind it.