I'm clearly not an expert on Cassandra, so the question may sound silly.
Given an existing SQL-based project, is it possible, and would it bring any benefit, to add a NoSQL database (e.g. Cassandra) as an additional layer between the business logic and the SQL database to speed up our queries or inserts?
It's a relatively new technology and I'm trying to find its place.
Cassandra will work fine, but if you don't mind having to rebuild your cached data, memcached will be faster.
But if you want a persistent cache, Cassandra is probably your best option; Reddit started by using Cassandra this way and is working on moving more functionality to it.
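To make the cache-layer idea concrete, here is a minimal read-through cache sketch in Python. It assumes a plain dict stands in for memcached (or a Cassandra table used as a persistent cache) and sqlite3 stands in for the existing SQL database; the class name `ReadThroughCache` and the `users` table are hypothetical. Reads check the cache first and fall back to SQL on a miss; writes go to SQL and then invalidate the cached copy.

```python
import sqlite3


class ReadThroughCache:
    """Hypothetical cache layer between business logic and the SQL database."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}    # stand-in for memcached or a Cassandra cache table
        self.db_reads = 0  # how many reads fell through to SQL

    def get_user(self, user_id):
        if user_id in self.cache:  # cache hit: the database is not touched
            return self.cache[user_id]
        self.db_reads += 1         # cache miss: read from the source of truth
        row = self.conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        value = row[0] if row else None
        self.cache[user_id] = value  # populate the cache for the next read
        return value

    def update_user(self, user_id, name):
        # Write to SQL first (committed by the context manager),
        # then drop the stale cached copy.
        with self.conn:
            self.conn.execute(
                "UPDATE users SET name = ? WHERE id = ?", (name, user_id)
            )
        self.cache.pop(user_id, None)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.commit()

layer = ReadThroughCache(conn)
print(layer.get_user(1), layer.get_user(1), layer.db_reads)  # alice alice 1
```

Whether the cache is memcached or Cassandra, the pattern is the same; the difference is whether the cached data survives a restart.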
I would go with the Windows Server AppFabric (formerly codenamed Velocity) distributed cache with SQL Server, assuming you are on the .NET platform.
Scott Hanselman has a bunch of posts on AppFabric.
My company has been using Oracle for a long time, but we would like to find a NoSQL database to replace it, for faster querying and more flexible schema design.
I have tried MongoDB, which is probably the most popular NoSQL database nowadays. I connected it to Spring Data to run some simple queries; it was quite easy to set up and the code is simple. Since we are using Spring MVC for web development, Spring Data seems well suited for integration.
However, I have heard that Cassandra has better read and write performance, especially in large-scale systems. I am not sure whether it is worth moving to Cassandra, nor how to compare the performance of MongoDB and Cassandra.
Here are some requirements for my system:
focusing on article fetching
tagging articles so users can easily search for their favorite or related articles
non-distributed system, but with load balancing and failover
Java-based, Spring MVC for web development
articles would be stored as XML
probably provide user-defined tables (collections) and fields (keys)
Therefore, I would like to ask a few questions:
Which database is the most suitable for my case? You may also suggest databases other than MongoDB and Cassandra.
If I use Cassandra, which framework would be suitable for integrating with Spring MVC?
Thank you so much in advance.
I have experience using Spring and Cassandra together, but I have always written my own data access layer.
The existing ORMs for Cassandra won't let you leverage its full power, and you will most likely introduce bugs, because your SQL background will make you expect behaviours that Cassandra simply doesn't provide.
My advice: write the code that accesses Cassandra yourself, and don't be afraid to denormalize A LOT. Think more about how you want to query (find) your data than about the format in which you want to save it.
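A rough illustration of "query first, denormalize a lot", using the tagging requirement from the question. Plain Python dicts stand in for two Cassandra tables (`articles_by_id` and `articles_by_tag`, both hypothetical names), one per query the application needs; the same article data is written to both at save time so that reads never need a join.

```python
from collections import defaultdict

# Hypothetical query-first model: one "table" per query the app needs.
# Plain dicts stand in for Cassandra tables; in CQL these would be separate
# tables with different partition keys.
articles_by_id = {}                  # query: fetch one article by id
articles_by_tag = defaultdict(dict)  # query: list articles for a tag


def save_article(article_id, title, tags, body):
    # Denormalize: write the data to every table that serves a query,
    # instead of joining at read time (Cassandra has no joins).
    articles_by_id[article_id] = {"title": title, "tags": list(tags), "body": body}
    for tag in tags:
        # Duplicate just what the tag listing needs (id -> title) per tag.
        articles_by_tag[tag][article_id] = title


def articles_for_tag(tag):
    # One lookup by tag: no join, no scan over all articles.
    return sorted(articles_by_tag.get(tag, {}).items())


save_article(1, "Intro to CQL", ["cassandra", "cql"], "...")
save_article(2, "Spring MVC tips", ["spring", "java"], "...")
save_article(3, "Cassandra with Spring", ["cassandra", "spring"], "...")
print(articles_for_tag("cassandra"))
# [(1, 'Intro to CQL'), (3, 'Cassandra with Spring')]
```

The trade-off is on the write side: renaming an article means updating every copy, which is exactly the kind of logic you end up owning in a hand-written data access layer.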
I also strongly recommend reading this amazing article: Cassandra Data Modeling Best Practices (part 1, part 2).
Another DB that might suit your application better is CouchDB (I like using BigCouch). It is another document-based NoSQL database and, in my opinion, superior to MongoDB. It offers a better solution for scaling and puts the emphasis on availability (just like Cassandra).
I'd like to point you to this question about the difference between CouchDB and MongoDB.
As far as frameworks go, the Play framework has a lot of plugins for working with NoSQL systems, so you might give it a try. You could also try playorm, which is the last one I experimented with.
EDIT: I forgot to mention Kundera, another ORM for Cassandra.
Choosing between Cassandra and MongoDB depends on the type of storage. MongoDB is primarily for document-based storage, where you get an edge from its various SQL-like features.
If you require a columnar database with high availability and multi-datacenter replication, go for Cassandra.
http://db-engines.com/en/system/Cassandra%3BHBase%3BMongoDB
I would like to give a web app with a PostgreSQL database 100% offline functionality. Ideally, the database would be completely replicated in each user's browser and synchronized when online, so that the same code can be used to talk to both the offline and the online database. I know this is possible with PouchDB and CouchDB, but I have not found a solution that works with PostgreSQL. Is this at all possible?
Short answer: I don't know of anything like this that currently exists.
However, in theory, this could be made to work. Long answer:
Write a PostgreSQL backend for levelup (one exists for MySQL: https://github.com/kesla/mysqldown)
Wire up pouch-server to read from and write to your PostgreSQL db using PouchDB's existing LevelDB adapter (which in turn has to be configured to use your Postgres backend). Congrats, you can now sync data using PouchDB!
Whether an approach like this is practical in reality for your application is a different question you'll have to answer.
You may be wondering, for example, "will I be able to sync an existing complex schema with multiple tables to the client with this approach?" The answer is probably not: the mysqldown implementation of leveldown uses a single MySQL table with three fields, id, key, and value (source), and I imagine any general-purpose PostgreSQL adapter would be similar (nothing says you can't write a special-purpose adapter just for your app, though!).
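To see why a general-purpose adapter flattens everything, here is a minimal sketch of a leveldown-style key/value store on top of a single SQL table with `id`, `key` and `value` columns, as described for mysqldown above. It uses sqlite3 as a stand-in for PostgreSQL so it runs without a server; the `KVTable` class and the `kv` table name are hypothetical, not part of any existing adapter.

```python
import sqlite3


class KVTable:
    """Hypothetical leveldown-style store: every key/value pair lives in ONE
    SQL table (id, key, value). sqlite3 stands in for PostgreSQL."""

    def __init__(self, conn, table="kv"):
        self.conn = conn
        self.table = table  # trusted identifier, interpolated into SQL below
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} "
            "(id INTEGER PRIMARY KEY, key TEXT UNIQUE NOT NULL, value TEXT)"
        )

    def put(self, key, value):
        # Upsert; in PostgreSQL this would be INSERT ... ON CONFLICT (key) DO UPDATE.
        with self.conn:
            self.conn.execute(
                f"INSERT OR REPLACE INTO {self.table} (key, value) VALUES (?, ?)",
                (key, value),
            )

    def get(self, key):
        row = self.conn.execute(
            f"SELECT value FROM {self.table} WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else None

    def delete(self, key):
        with self.conn:
            self.conn.execute(f"DELETE FROM {self.table} WHERE key = ?", (key,))

    def range(self, start, end):
        # LevelDB-style ordered iteration over keys in [start, end).
        return self.conn.execute(
            f"SELECT key, value FROM {self.table} "
            "WHERE key >= ? AND key < ? ORDER BY key",
            (start, end),
        ).fetchall()


store = KVTable(sqlite3.connect(":memory:"))
store.put("doc:1", '{"title": "a"}')
store.put("doc:2", '{"title": "b"}')
store.put("meta:seq", "2")
print(store.range("doc:", "doc;"))  # only the "doc:" keys, in key order
```

Anything stored through such an adapter becomes an opaque string in `value`, which is why an existing multi-table schema would not map onto it directly.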
On the other hand, if you were to implement a CouchDB-compatible API (or a subset; you may not need attachments, for example) over your existing database schema, there's nothing stopping you from using PouchDB on the client to talk directly to it as if it were an actual CouchDB: just pop in the URL and call replicate()! Implementing the replication protocol might be a fair bit of work, since you'd need to track revisions and so on somewhere, but again, it's technically not impossible!
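The revision tracking mentioned above is at the core of what such an API would need. Below is a rough sketch, assuming a hypothetical `RevisionStore` kept alongside your schema: revisions follow CouchDB's visible `<generation>-<hash>` shape, and a write with a stale `_rev` is rejected, mirroring CouchDB's 409 Conflict. The hash is a simplified stand-in, not CouchDB's actual revision algorithm, and the rest of the replication protocol (`_changes`, `_revs_diff`, `_bulk_docs` and so on) is not covered.

```python
import hashlib
import json


class ConflictError(Exception):
    """Raised when a write's _rev doesn't match the current revision
    (CouchDB answers such writes with HTTP 409 Conflict)."""


class RevisionStore:
    """Hypothetical revision tracking for a CouchDB-like API over your own
    schema. The hash is a simplified stand-in, not CouchDB's algorithm."""

    def __init__(self):
        self.docs = {}     # doc id -> (current rev, body)
        self.history = {}  # doc id -> revs, oldest first (replication needs these)

    def put(self, doc_id, body, rev=None):
        current = self.docs.get(doc_id)
        current_rev = current[0] if current else None
        if rev != current_rev:
            # Stale or missing _rev: the client didn't see the latest version.
            raise ConflictError(f"{doc_id}: expected {current_rev}, got {rev}")
        generation = int(current_rev.split("-")[0]) + 1 if current_rev else 1
        payload = json.dumps([current_rev, body], sort_keys=True).encode()
        new_rev = f"{generation}-{hashlib.md5(payload).hexdigest()}"
        self.docs[doc_id] = (new_rev, body)
        self.history.setdefault(doc_id, []).append(new_rev)
        return new_rev

    def get(self, doc_id):
        rev, body = self.docs[doc_id]
        return {"_id": doc_id, "_rev": rev, **body}


store = RevisionStore()
r1 = store.put("doc1", {"title": "draft"})
r2 = store.put("doc1", {"title": "final"}, rev=r1)
try:
    store.put("doc1", {"title": "stale edit"}, rev=r1)  # r1 is outdated
except ConflictError as exc:
    print("conflict:", exc)
```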
There are also implementations of levelup's backend storage that are designed for browsers. See level.js, which could be another way to sync between a server-side Postgres levelup backend and the browser.
TL;DR: There's tons of work being done around JavaScript databases right now. Is syncing with Postgres impossible? Probably not. Would it be a lot of work? Definitely. Worth it? Who knows, but it would be cool.
Without installing PostgreSQL on the client? No. Obviously you can cache data for offline use, but an entire RDBMS plus procedural languages in JavaScript? No.
We would like to use OrientDB Graph in an Azure environment. Does anybody have experience using it? We would also like to know whether OrientDB's own high availability is required in the Azure cloud. Azure already offers high availability for Azure Storage, Azure Drive and SQL; I understand they have replication and load balancing built in.
This is super important because we prefer not to get into the business of replication and infrastructure management.
Thanks
You can spin up two or more machines, install OrientDB on them, and configure them together as a distributed cluster. However, I haven't been able to find any simpler way to do it. I am interested in this topic too.
Azure does have features such as geo-replication, which protects your data against a major data-center incident, but it doesn't provide any performance benefit and will not make your database highly available.
Although Azure is pretty reliable, Microsoft will occasionally reboot servers for updates. To protect against downtime, you can place your two or more servers in an availability set so that one of them is always online. This does, however, need to be used in conjunction with database replication and, ideally, load balancing.
It's also worth noting that OrientDB recommends that clusters have an odd number of servers, as this can prevent conflicts when synchronising data after a communication issue between the servers.
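The arithmetic behind the odd-number advice is plain majority quorum. The short Python sketch below is generic reasoning, not OrientDB's actual configuration or code: adding a fourth server to three does not increase the number of failures tolerated, and a 2/2 network split leaves neither side with a majority.

```python
def majority(n):
    """Smallest number of servers that forms a strict majority of n."""
    return n // 2 + 1


def tolerated_failures(n):
    """How many servers can be lost while the rest still form a majority."""
    return n - majority(n)


def has_majority(partition_size, n):
    """Whether one side of a network split can still act on its own."""
    return partition_size >= majority(n)


for n in range(2, 7):
    print(f"{n} servers: majority {majority(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")

# 4 servers split 2 + 2: neither side reaches a majority.
print(has_majority(2, 4))  # False
# 3 servers split 2 + 1: the side with 2 keeps a majority.
print(has_majority(2, 3))  # True
```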
I am using it on Amazon, and I had to create a Java project to monitor HTTP requests, inserts and queries. The queries are very fast, but inserting data takes longer.
I recommend this type of graph database to reduce query times. Also, OrientDB handles empty fields very well compared to other databases.
If you need help with the Java project, reply to this post and I'll help you.
I hope it helps. Good luck.
I want to write a highly scalable web application for selling event tickets. I want to use a NoSQL database, like Bigtable or MongoDB, and a cloud service like Google App Engine (GAE) or Amazon Elastic Compute Cloud (Amazon EC2).
Is it possible, using this type of database, to be sure that two clients will not be able to buy a ticket for the same seat simultaneously? Or will I have to use an RDBMS and forget about Google App Engine?
Things like GAE's datastore can still support transactional semantics, for example:
http://code.google.com/appengine/docs/python/datastore/transactions.html
So yes, it is possible to do what you're seeking to do. (Note: GAE's Datastore is not exactly NoSQL, since it uses SQL-like queries.)
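As a rough illustration of why transactional semantics solve the double-booking problem, here is a sketch of optimistic concurrency: read the seat together with its version, then commit only if the version is unchanged, retrying otherwise. This loosely models how a datastore transaction behaves; `VersionedStore`, `buy_seat` and `SeatTaken` are hypothetical names, and none of this is the App Engine API.

```python
import threading


class VersionedStore:
    """Hypothetical in-memory store with optimistic concurrency: a commit
    succeeds only if the entity hasn't changed since it was read."""

    def __init__(self):
        self._data = {}                # key -> (version, value)
        self._lock = threading.Lock()  # makes each compare-and-swap atomic

    def read(self, key):
        with self._lock:
            return self._data.get(key, (0, None))

    def commit(self, key, expected_version, value):
        with self._lock:
            version, _ = self._data.get(key, (0, None))
            if version != expected_version:
                return False  # someone else wrote first: abort this attempt
            self._data[key] = (version + 1, value)
            return True


class SeatTaken(Exception):
    pass


def buy_seat(store, seat, customer, retries=5):
    # Like a transaction retry loop: re-read and re-check after a collision.
    for _ in range(retries):
        version, owner = store.read(seat)
        if owner is not None:
            raise SeatTaken(f"{seat} already sold to {owner}")
        if store.commit(seat, version, customer):
            return True
    raise RuntimeError("too much contention, giving up")


store = VersionedStore()
winners = []


def attempt(customer):
    try:
        buy_seat(store, "A1", customer)
        winners.append(customer)
    except SeatTaken:
        pass


threads = [threading.Thread(target=attempt, args=(f"client{i}",)) for i in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(winners))  # 1
```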
I have a problem with this question. Not all NoSQL databases are created equal, and different NoSQL databases store data in different ways. Generally, the first thing you should worry about is whether data is actually written to disk and not just into memory. Most NoSQL databases can do this, but not by default. Let's just say this is not a problem: you can usually tell a database like Mongo or Cassandra to write data to disk, and you can even specify the minimum number of servers the data must be written to.
The problem is that you may not get true transactional support. When you deal with e-commerce, it's important to have all-or-nothing transactions, where several operations either succeed completely or are rolled back. There must be absolutely no chance that only part of your data is saved. For example, if you need to write data to more than one table (collection or document in NoSQL lingo) and the server goes down in the middle of the process, leaving your data written to only one table, that's usually unacceptable in e-commerce.
I am not familiar with all NoSQL databases, but the ones I know don't have this option yet.
MySQL, on the other hand, does.
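For comparison, this is what all-or-nothing looks like in a relational database. The sketch uses Python's built-in sqlite3 as a stand-in for MySQL; the `seats` and `payments` tables and the `sell_seat` function are hypothetical. A simulated crash after the first write rolls back the whole transaction, so no payment is recorded without a seat.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE seats (seat TEXT PRIMARY KEY, sold_to TEXT);
    CREATE TABLE payments (customer TEXT, amount INTEGER);
    INSERT INTO seats VALUES ('A1', NULL);
""")


def sell_seat(conn, seat, customer, amount, fail_midway=False):
    # `with conn` wraps the writes in one transaction:
    # COMMIT if the block finishes, ROLLBACK if it raises.
    with conn:
        conn.execute("INSERT INTO payments VALUES (?, ?)", (customer, amount))
        if fail_midway:
            raise RuntimeError("simulated crash after the first write")
        cur = conn.execute(
            "UPDATE seats SET sold_to = ? WHERE seat = ? AND sold_to IS NULL",
            (customer, seat),
        )
        if cur.rowcount != 1:
            raise RuntimeError(f"seat {seat} is not available")


try:
    sell_seat(conn, "A1", "alice", 50, fail_midway=True)
except RuntimeError:
    pass

# The payment insert was rolled back together with the rest of the transaction.
print(conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0])  # 0
```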
If transactional support, or the lack of it, does not bother you, then I think it's OK to use NoSQL as long as you tell it to save data to disk and not just into memory.
The answer is 'maybe.'
Depending on what you're trying to build, you may be able to use some of the techniques in this post:
http://kylebanker.com/blog/2010/06/07/mongodb-inventory-transactions/
Using something like get_or_insert you can easily ensure that two clients are not receiving the same resource simultaneously on Google App Engine. However, there are big differences between GAE and a RDBMS, so make sure you study them further before you make a decision.
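The idea behind get_or_insert is to make the seat itself the unique key, so checking for an existing reservation and creating a new one happen in one atomic step. The sketch below imitates that behaviour in plain Python, with a lock standing in for the datastore transaction; `KeyedStore`, `reserve` and the `"event:seat"` key format are hypothetical, and this is not the App Engine API.

```python
import threading


class KeyedStore:
    """Hypothetical stand-in for get_or_insert-style semantics: atomically
    return the entity stored under a key name, creating it only if absent."""

    def __init__(self):
        self._entities = {}
        self._lock = threading.Lock()  # stands in for the datastore transaction

    def get_or_insert(self, key_name, **fields):
        with self._lock:
            if key_name not in self._entities:
                self._entities[key_name] = dict(fields)
            return self._entities[key_name]


def reserve(store, event_id, seat, customer):
    # One key per (event, seat): the key itself is the uniqueness constraint.
    reservation = store.get_or_insert(f"{event_id}:{seat}", owner=customer)
    # True only for the customer whose call created the entity.
    return reservation["owner"] == customer


store = KeyedStore()
results = []


def attempt(customer):
    results.append(reserve(store, "concert", "B7", customer))


threads = [threading.Thread(target=attempt, args=(f"client{i}",)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results.count(True))  # 1
```

Note that a repeated call by the winning customer also returns True, which makes retries after a network error harmless.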
I've prototyped an iPhone app that uses SQLite internally as its database. The intent was to ultimately have it communicate with a server via PHP, which would use MySQL as the back-end database.
However, I just discovered Google App Engine, though I know very little about it. I think it'd be nice to use the Python interface to write to the datastore, but I know very little about GQL's capabilities. I've basically written all the working database code using MySQL, testing internally on the iPhone with SQLite. Will GQL offer the same functionality as SQL? I read on the site that it doesn't support join queries. Also, is it truly relational?
Basically, my question is: can an app that typically uses an SQL backend work just as well on Google App Engine with GQL?
I hope that's clear... any guidance is great.
True, Google App Engine is a very cool product, but the datastore is a different beast from a regular MySQL database. That's not to say that what you need can't be done with the GAE datastore; however, it may take some reworking on your end.
The most prominent difference you'll notice right from the start is that GAE uses an object-relational mapping for its data storage scheme. Essentially, object graphs are persisted in the database, maintaining their attributes and relationships to other objects. In many cases, ORMs (object-relational mappings) map fairly well on top of a relational database (this is how Hibernate works). The mapping is not perfect, though, and you will find that you need to make alterations to persist your data. GAE also has some unique constraints that complicate things a bit. One constraint that bothers me a lot is not being able to query on attribute paths, e.g. "select ... where dog.owner.name = 'bob'". It is these rules that force you to read and understand how the GAE datastore works before you jump in.
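Since path queries like `dog.owner.name = 'bob'` aren't available, the usual workarounds are either two simple queries or a denormalized copy of the field. Here is a sketch of both using plain Python lists and dicts rather than the datastore API; the entities and the `owner_name` field are hypothetical.

```python
# Hypothetical entities as plain dicts; the owner keys play the role of
# datastore keys.
owners = {
    "owner1": {"name": "bob"},
    "owner2": {"name": "alice"},
}
dogs = [
    {"name": "rex", "owner": "owner1", "owner_name": "bob"},
    {"name": "fido", "owner": "owner2", "owner_name": "alice"},
    {"name": "spot", "owner": "owner1", "owner_name": "bob"},
]


def dogs_of_two_step(owner_name):
    # Option 1: two simple queries instead of one path query.
    # Step 1: find the keys of owners with that name.
    owner_keys = {key for key, owner in owners.items() if owner["name"] == owner_name}
    # Step 2: filter dogs on a single property, the owner key.
    return [dog["name"] for dog in dogs if dog["owner"] in owner_keys]


def dogs_of_denormalized(owner_name):
    # Option 2: copy owner.name onto each dog when saving, so the query
    # becomes a single-property filter. Renaming an owner means updating dogs too.
    return [dog["name"] for dog in dogs if dog["owner_name"] == owner_name]


print(dogs_of_two_step("bob"), dogs_of_denormalized("bob"))
# ['rex', 'spot'] ['rex', 'spot']
```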
I think GAE could work well in your situation. It may just take some time to understand ORM persistence in general, and the GAE datastore in particular.
GQL offers almost no functionality at all; it's only used for SELECT queries, and it only exists to make writing SELECT queries easier for SQL programmers. Behind the scenes, it converts your queries to db.Query objects.
The App Engine datastore isn't a relational database at all. You can do some stuff that looks relational, but my advice for anyone coming from an SQL background is to avoid GQL at all costs to avoid the trap of thinking the datastore is anything at all like an RDBMS, and to forget everything you know about database design. Specifically, if you're normalizing anything, you'll soon wish you hadn't.
I think this article should help you.
Summary: Cloud computing and software development for handheld devices are two very hot technologies that are increasingly being combined to create hybrid solutions. With this article, learn how to connect Google App Engine, Google's cloud computing offering, with the iPhone, Apple's mobile platform. You'll also see how to use the open source library, TouchEngine, to dynamically control application data on the iPhone by connecting to the App Engine cloud and caching that data for offline use.
That's a pretty generic question :)
Short answer: yes. It's going to involve some rethinking of your data model, but yes, chances are you can support it with the GAE Datastore API.
When you create your Python models (think of these as tables), you can certainly define references to other models (so now we have a foreign key). When you select this model, you'll get back the referenced models (pretty much like a join).
It'll most likely work, but it's not a drop-in replacement for a MySQL server.