Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 1 year ago.
The community reviewed whether to reopen this question 1 year ago and left it closed:
Opinion-based Update the question so it can be answered with facts and citations by editing this post.
Improve this question
What is the pros and cons of MongoDB (document-based), HBase (column-based) and Neo4j (objects graph)?
I'm particularly interested to know some of the typical use cases for each one.
What are good examples of problems that graphs can solve better than the alternative?
Maybe any Slideshare or Scribd worthy presentation?
MongoDB
Scalability: Highly available and consistent but sucks at relations and many distributed writes. It's primary benefit is storing and indexing schemaless documents. Document size is capped at 4mb and indexing only makes sense for limited depth. See http://www.paperplanes.de/2010/2/25/notes_on_mongodb.html
Best suited for: Tree structures with limited depth
Use Cases: Diverse Type Hierarchies, Biological Systematics, Library Catalogs
Neo4j
Scalability: Highly available but not distributed. Powerful traversal framework for high-speed traversals in the node space. Limited to graphs around several billion nodes/relationships. See http://highscalability.com/neo4j-graph-database-kicks-buttox
Best suited for: Deep graphs with unlimited depth and cyclical, weighted connections
Use Cases: Social Networks, Topological analysis, Semantic Web Data, Inferencing
HBase
Scalability: Reliable, consistent storage in the petabytes and beyond. Supports very large numbers of objects with a limited set of sparse attributes. Works in tandem with Hadoop for large data processing jobs. http://www.ibm.com/developerworks/opensource/library/os-hbase/index.html
Best suited for: directed, acyclic graphs
Use Cases: Log analysis, Semantic Web Data, Machine Learning
I know this might seem like an odd place to point to but, Heroku has recently gone nuts with their noSQL offerings and have an OK overview of many of the current projects. It is in no way a Slideshare press but it will help you start the comparison process:
http://blog.heroku.com/archives/2010/7/20/nosql/?utm_medium=email&utm_source=EmailBlast&utm_content=619506254&utm_campaign=HerokuSeptemberNewsletter-VersionB&utm_term=NoSQLHerokuandYou
Checkout this for at glance comparison of NoSQL dbs:
http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
MongoDB:
MongoDB is document database unlike Relational database. The document stores semi structured data like JSON object ( schema free)
Key features:
Schema can change over evolution of application
Full indexing
Load balancing & Data sharding
Data replication
Consistency & Partitioning in CAP theory ( Consistency-Availability-Partitioning)
When to use:
Real time analytics
High speed logging
Semi structured data management
When not to use:
Highly transactional applications with strong ACID properties ( Atomicity, Consistency, Isolation & Durability). RDBMS is preferred in this use case.
Operating on data sets involving relations - foreign keys etc
HBASE:
HBase is an open source, non-relational, distributed column family database
Key features:
It provides a fault-tolerant way of storing large quantities of sparse data (small amounts of information caught within a large collection of empty or unimportant data, such as finding the 50 largest items in a group of 2 billion records, or finding the non-zero items representing less than 0.1% of a huge collection)
Supports variable schema where each row is different
Can serve as the input and output for MapReduce job
Compression, in-memory operation, and Bloom filters on a per-column (A data structure designed to tell you, rapidly and memory-efficiently, whether an element is present in a set)
5.Achieve CP on CAP
When to use HBase:
If you’re loading data by key, searching data by key (or range), serving data by key, querying data by key
Storing data by row that doesn’t conform well to a schema (variable schema)
When not to use HBase:
For relational analytics
Full table scans
Data to be aggregated, analyzed by rows instead of columns
Neo4j:
Neo4j is graph database using Property Graph Data Model (Data is stored as a graph and nodes & relationships with properties)
Key features:
Supports full ACID(Atomicity, Consistency, Isolation and Durability) rules
Supports Indexes by using Apache Lucence
Schema free, bottom-up data model design
High scalability has been achieved due to compact storage and memory caching available for graphs
When to use:
Master data management
Network and IT Operations
Real time recommendations
Fraud detection
Social network (like facebook)
When not to use:
Bulk queries/Scans
If your application requires Partitioning & Sharding of data
Have a look at comparison of various NoSQL technologies in this article
Sources:
Wiki, Slide share, Cloudera,Tutorials Point,Neo4j
You could also evaluate a Multi-Model DBMS, as the second generation of NoSQL product. With a Multi-Model you don't have all the compromises on choosing just one model, but rather more than one model.
The first multi-model NoSQL is OrientDB.
Pretty decent article here on MongoDB and NoRM (.net extensions for MongoDB)
http://lukencode.com/2010/07/09/getting-started-with-mongodb-and-norm/
Related
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 8 years ago.
Improve this question
We're starting a new project and looking for an appropriate storage solution for our case.
Main requirements for the storage are as follows:
Ability to support highly flexible and connected domain
Ability to support queries like "give all children of that item and items linked to that children" in ms
Full text search
Ad hoc analytics
Solid read and write performance
Scalability (as we want to offer a Saas version of our product)
First of all we eliminated all RDBMS, since we have really flexible schema which can also be changed by the customer (add new fields etc.),
so supporting such solution in any RDBMS can become a nightmare...
And we came to NoSQL. We evaluated sevaral NoSQL storage engines and chose 3 most appropriate (as we think).
MongoDB
Pros:
Appropriate to store aggregates with flexible structure (as we have
them)
Scalability/Maturity/Support/Community
Experience with MongoDB on previous project
Drivers, cloud support
Analitycs
Price (it's free)
Cons:
No support for relationships (relly important for us as we have a lot of connected items)
Slow retrieval of connected data (all joins happen in app)
Neo4j:
Pros:
Support of conencted data in modeling, flexibility
Fast retrieval of interconnected data
Drivers, cloud support
Maturity/Support/Comminity (if we compare with other graph Dbs)
Cons:
No support for aggregate storage (we would like to have aggregates in one vertex than in several)
Scalability (as far as I know, now all data is duplicated on other servers)
Analitics ?
Write performance ? (read several blogs where customers complained on its write performance)
Price (it is not free for commercial software)
OrientDB
Pros:
It seems that OrientDB has all the features that we need (aggregates and graphdb in one solution)
Price (looks like is't free)
Cons:
Immaturity (comparing with others)
Really small company behind the technology (In particular one main contributor), so questions about support, known issues etc.
A lot of features, but do they work pretty well
So now, the main dilemma for as is between Neo4j and OrientDB (MongoDb is a third option because its lack of relationships that are really important in our case - this post explains the pitfalls). I've searched for any benchmarks/comparison of these dbs, but all all of them are old. Here is a comparison by features http://vschart.com/compare/neo4j/vs/orientdb. So now we need an advice from people who already used these dbs, what to choose. Thanks in advance.
I think there are interesting trade-offs with each of these:
MongoDB doesn't do graphs;
Neo4j's nodes are flat key-value properties;
OrientDB forces you to choose between graphs and documents (can't do both simultaneously).
So your choice is between a graph store (neo4j or orient) and a document store (mongo or orient). My sense is that MongoDB is the leading document store and Neo4j is the leading graph database which would lead me to pick one of thse. But since connectivity is important, I'd lean towards the graph database and take Neo4j.
Neo4j's scalability is proven: it's in use for graphs larger than Facebook's and by enormous companies like Walmart and EBay. So if your problem is anywhere between 0-120% of Facebook's social graph, Neo4j has you covered. Write throughput is fine with Neo4j - I get in excess of 2,000 proper ACID Transactions per second on a laptop and I can easily queue writes to multiply that out.
Everything else is pretty equal: you can choose to pay for any of these or use them freely under their open source licenses (including Neo4j if you can work with GPL/AGPL). Neo4j's paid licenses have great support (up 24x7x365, 1 hour turnaround worldwide) versus OrientDB's rather lacklustre support (4 hour turnaround in the EU daytime only), and I imagine MongoDB has good support too (though I have not checked up on it).
In short, there's a reason Neo4j is the top database for connected data: it kicks ass!
Jim
To correct some misconceptions regarding mongoDB
Relations are supported, by either linking to other documents or embedding them. Please see the Data Modeling Introduction in the mongoDB docs for details. It may be that you are forced to trade normalization against speed, though. However, there are use cases in which embedding is the better solution compared to relations. Think of orders: When embedding order items and their price, you do not need to have a price history table for each and every product ever sold.
What is not supported are JOINs. Which you can circumvent by embedding documents, as mentioned above.
MongoDB can be used for tree structures. Please see Model Tree Structures with Materialized Paths for details. This approach seems to be the most appropriate way to implement a tree structure for the mentioned use case. An alternative may be an array of ancestors, depending on your needs.
That being said, mongoDB may fail in one of the basic requirements, though this really depends on how you define it: ad hoc analysis. My suggestion would be to model the intended data structure using a document oriented approach (in opposite of putting a relational approach on a document oriented database) and prototype one of the possible analysis use cases with dummy data.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
Wondering which technology would do better for a typical product catalog of a webshop. I'm writing my master thesis about nosql in the enterprise environment and focused on document stores for to long now I think.
Read a lot articles which recommend document stores because of it's flexibilty which is needed to model thousands of different products. But as far as I know now, Column-Family Stores like Cassandra offer the same flexibility.
What I like most of the idea of using cassandra is, what nosql-database.org says about it (marked the most interesting features):
massively scalable, partitioned row store, masterless architecture, linear scale performance, no single points of failure, read/write support across multiple data centers & cloud availability zones. API / Query Method: CQL and Thrift, replication: peer-to-peer, written in: Java, Concurrency: tunable consistency, Misc: built-in data compression, MapReduce support, primary/secondary indexes, security features.
In the end I focus on building a prototype of a highly available and scaleable Multishop System which makes use of polyglot persistence, saying K/V Stores for Sessions, Document Store or Column-Family Store for Product Catalog and maybe RDBMS for Inventory/Pricing like Sadalage and Fowler mentioned in their book "NoSQL Destilled".
If possible, provide scientific papers or other reliable sources for your answers.
Thanks!
Document Store's Achilles Heel
Stuart Halloway mentioned that a document store is the biggest schema lock solution that is way too inflexible, which I agree with. Couch/Mongo and others try to mitigate that by providing workarounds to create secondary indicies, ability and necessity to be aware of plain object ids, etc. And of course if you think about versioning (i.e. add a "time" variable to your system), document stores fail fast to provide a smooth support and time travel.
Column Store: Problem Relevance
Cassandra is a really compelling solution for building "scalable"/"distributed" systems with real examples such as Netflix, where 500 Cassandra nodes can be brought up in AWS for several minutes, and all the requests hit a Cassandra ring.
However, given the problem as it is stated in your question, Cassandra would be an unnecessary overkill. Not just because it is a bit more complex than "others", or because it is mentally harder to create a solid data model on top of column oriented stores, but also because a "product catalog" problem is not quite a rocket science. It can be, if you want to add machine learning later to predict/recognize/etc.., but a catalog itself is not, and simpler stores such as PostgreSQL for example would solve it easily.
Simple Desire to NoSQL
If you really want to use NoSQL for a product catalog, I would definitely consider 3 solutions to fit your prototype:
Riak as a "K/V for Sessions"
Datomic to solve "Product Catalog, Inventory and Pricing"
Depending on the size and nature of the problem and the final solution, I would consider Redis to cache those sessions, while having Datomic comfortably sit on top of Riak as its storage service.
Practice vs. Theory
Two classical NoSQL papers that made NoSQL sound real in practice for the first time are Dynamo and BigTable. I consider Datomic to be the next evolutionary step in the DB universe by introducing a hybrid data model with true indicies and relations without a schema lock, and immutability from which everything follows: safe time travel, caching, local db values, etc.
Practically, if it wasn't a master theses, depending on the real problem scale and definition, I would be choosing between Datomic and PostreSQL to solve catalog, inventory, pricing, etc.
A big advantage of Datomic here is time travel. In practice it is very important to be able to safely and easily do that in a "Shopping System".
A big advantage of PostgreSQL is its familiarity and SQL tools availability for analytics and reporting.
By now I think that Column-Family Stores are not well suited for product calaloges.
It's because products often contain some kind of collections like tags, tracklists for music records, different sizes for clothes and so on.
Cassandra supports collections by now BUT they are not searchable! This is a must have feature for tags for example.
In contrast MongoDb for example offers the $in operator to search in nested arrays...
I don't want to say it is not possible to model a product calalog in Cassandra but I think it is much more straight forward to do it in a document store.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 8 years ago.
Improve this question
I'm looking for an opinion on replacing existing Data Grid (i.e. Oracle Coherence) with some document store alternative e.g. NoSQL MongoDB. I was think about the most important pros and cons and came up with:
NoSQL
Pros:
No additional database
No ORM mapping necessary
Although the best query efficiency can be achieved when looking up by ID, other queries can be satisfied by map/reduce queries
Cons:
Quite difficult to achieve data consistency when updating multiple collections or even multiple rows in a same collection
Slower response time ? (i suspect that Coherence reponse time might be better)
A read operation can return old data
Data Grid
Pros
With a Data Grid it seems easier to keep data consistent e.g. the data grid becomes is a SOR (System of Record)
As Data Grid becomes SOR, all data should always be available in the grid
Remote Executors
Cons
Additional database means additional overhead & system/application requirements
With a huge amount of data and sharding in place any kind of queries can take a lot of time
Couchbase Server is a very good replacement for Oracle Coherence particularly for enterprise class applications. Orbitz is a great example where large number of nodes of Coherence were replaced by 70 nodes of Couchbase.
You can read more about the Coherence replacement here: http://gigaom.com/cloud/balancing-oracle-and-open-source-at-orbitz/
Slides from an Orbitz presentation about Couchbase are also available here: http://www.slideshare.net/Couchbase/t1-s6-oww-usescouchbase
Pros:
High availability of nodes using replication and failover (avoid cold cache scenarios)
Sub-milli second latencies (built-in object level cache based on memcached)
High read / write throughput (very low granularity of locking) ( http://www.cisco.com/en/US/prod/collateral/switches/ps9441/ps9670/white_paper_c11-708169.pdf)
Strong consistency at a document / item level
TTL / Expiry per document / item
Cons:
Difficult to achieve consistency across multi-document updates. Can be achieved using sentinels. ( http://www.amainhobbies.com/FromTheCEO/2012/09/09/invalidating-couchbase-cache-entries-with-sentinels/ )
It can, but so can a pen and paper system.
The question is, will it be an acceptable replacement. That wholly depends on the situation. In some cases a NoSQL solution is faster, more scalable than a relational solution, but in some situations it is essential to have some kind of support for longer running transactions and relational constraints.
It depends.
You already gave the pros and cons in detail...
as iwein said it depends...
What are the queries that existing relational system forced?
we know that partitioning in nosql db's are easier than realtional db's...
So if you switch to mongo you can extend your systems performance more cheaper and quicker way...
if people are happy on your oracle system now. don't touch it :)
Yes - NoSQL can replace it. But a lot depends on what you are trying to do.
If you just need a simple document store with easy key based lookups - NoSQL is a no-brainer.
If you need an enterprise class solution with paid for support and features such as custom aggregation, entry processors etc etc. Maybe Coherence is what you want.
I've seen people build custom NoSQL solutions on top of Coherence - which is a really expensive thing to do.
As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
What kind of projects benefit from using a NoSQL database instead of rdbms wrapped by an ORM?
Examples:
Stackoverflow similiar sites?
Social communities?
forums?
Your question is very general. NoSQL describes a collection of database techniques that are very different from each other. Roughly, there are:
Key-value stores (Redis, Riak)
Triplestores (AllegroGraph)
Column-family stores (Bigtable, Cassandra)
Document-oriented stores (CouchDB, MongoDB)
Graph databases (Neo4j)
A project can benefit from the use of a document database during the development phase of the project, because you won't have to design complex entity-relation diagrams or write complex join queries. I've detailed other uses of document databases in this answer.
If your application needs to handle very large amounts of data, the development phase will likely be longer when you use a specialized NoSQL solution such as Cassandra. However, when your application goes into production, it will greatly benefit from the performance and scalability of Cassandra.
Very generally speaking, if an application has the following requirements:
scale horizontally
work with data model X
perform Y operations
the application will benefit from using a NoSQL solution that is geared towards storing data model X and perform Y operations on the data. If you need more specific answers regarding a certain type of NoSQL database, you'll need to update your question.
Benefits during development (e.g. easier to use than SQL, no licensing costs)?
Benefits in terms of performance (e.g. runs like hell with a million concurrent users)?
What type of NoSQL database?
Update
Key-value stores can only be queried by key in most cases. They're useful to store simple data, such as user sessions, simple profile data or precomputed values and output. Although it is possible to store more complex data in key-value pairs, it burdens the application with the responsibility of maintaining 'manual' indexes in order to perform more advanced queries.
Triplestores are for storing Resource Description Metadata. I don't anything about these stores, except for what Wikipedia tells me, so you'll have to do some research on that.
Column-family stores are built for storing and processing very large amounts of data. They are used by Google's search engine and Facebook's inbox search. The data is queried by MapReduce functions. Although MapReduce functions may be hard to grasp in the beginning, the concept is quite simple. Here's an analogy which (hopefully) explains the concept:
Imagine you have multiple shoe-boxes filled with receipts, and you want to calculate your total expenses. You invite some of your friends over and assign a person to each shoe-box. Each person writes down the total of each receipt in his shoe-box. This process of selecting the required data is the Map part.
When a person has written down the totals of (some of) his receipts, he can sum up these totals. This is the Reduce part and can be repeated multiple times until all receipts have been handled. In the end, all of your friends come together and sum up their total sums, giving you your total expenses. That's the final Reduce step.
The advantage of this approach is that you can have any number of shoe-boxes and you can assign any number of people to a shoe-box and still end up with the same result. Each shoe-box can be seen as a server in the database's network. Each friend can be seem as a thread on the server. With MapReduce you can have your data distributed across many servers and have each server handle part of the query, optimizing the performance of your database.
Document-oriented stores are explained in this question, so I won't discuss them here.
Graph databases are for storing networks of highly connected objects, like the users on a social network for example. These databases are optimized for graph operations, such as finding the shortest path between two nodes, or finding all nodes within three hops from the current node. Such operations are quite expensive on RDBMS systems or other NoSQL databases, but very cheap on graph databases.
NoSQL in the sense of different design approaches, not only the query language. It can have different features. E.g. column oriented databases are used for large amount of data warehouses, which might be used for OLAP.
Similar to my question, there you'll find a lot of resources.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 5 years ago.
Improve this question
I am evaluating what might be the best migration option.
Currently, I am on a sharded MySQL (horizontal partition), with most of my data stored in JSON blobs. I do not have any complex SQL queries (already migrated away after since I partitioned my db).
Right now, it seems like both MongoDB and Cassandra would be likely options. My situation:
Lots of reads in every query, less regular writes
Not worried about "massive" scalability
More concerned about simple setup, maintenance and code
Minimize hardware/server cost
Lots of reads in every query, fewer regular writes
Both databases perform well on reads where the hot data set fits in memory. Both also emphasize join-less data models (and encourage denormalization instead), and both provide indexes on documents or rows, although MongoDB's indexes are currently more flexible.
Cassandra's storage engine provides constant-time writes no matter how big your data set grows. Writes are more problematic in MongoDB, partly because of the b-tree based storage engine, but more because of the multi-granularity locking it does.
For analytics, MongoDB provides a custom map/reduce implementation; Cassandra provides native Hadoop support, including for Hive (a SQL data warehouse built on Hadoop map/reduce) and Pig (a Hadoop-specific analysis language that many think is a better fit for map/reduce workloads than SQL). Cassandra also supports use of Spark.
Not worried about "massive" scalability
If you're looking at a single server, MongoDB is probably a better fit. For those more concerned about scaling, Cassandra's no-single-point-of-failure architecture will be easier to set up and more reliable. (MongoDB's global write lock tends to become more painful, too.) Cassandra also gives a lot more control over how your replication works, including support for multiple data centers.
More concerned about simple setup, maintenance and code
Both are trivial to set up, with reasonable out-of-the-box defaults for a single server. Cassandra is simpler to set up in a multi-server configuration since there are no special-role nodes to worry about.
If you're presently using JSON blobs, MongoDB is an insanely good match for your use case, given that it uses BSON to store the data. You'll be able to have richer and more queryable data than you would in your present database. This would be the most significant win for Mongo.
I've used MongoDB extensively (for the past 6 months), building a hierarchical data management system, and I can vouch for both the ease of setup (install it, run it, use it!) and the speed. As long as you think about indexes carefully, it can absolutely scream along, speed-wise.
I gather that Cassandra, due to its use with large-scale projects like Twitter, has better scaling functionality, although the MongoDB team is working on parity there. I should point out that I've not used Cassandra beyond the trial-run stage, so I can't speak for the detail.
The real swinger for me, when we were assessing NoSQL databases, was the querying - Cassandra is basically just a giant key/value store, and querying is a bit fiddly (at least compared to MongoDB), so for performance you'd have to duplicate quite a lot of data as a sort of manual index. MongoDB, on the other hand, uses a "query by example" model.
For example, say you've got a Collection (MongoDB parlance for the equivalent to a RDMS table) containing Users. MongoDB stores records as Documents, which are basically binary JSON objects. e.g:
{
FirstName: "John",
LastName: "Smith",
Email: "john#smith.com",
Groups: ["Admin", "User", "SuperUser"]
}
If you wanted to find all of the users called Smith who have Admin rights, you'd just create a new document (at the admin console using Javascript, or in production using the language of your choice):
{
LastName: "Smith",
Groups: "Admin"
}
...and then run the query. That's it. There are added operators for comparisons, RegEx filtering etc, but it's all pretty simple, and the Wiki-based documentation is pretty good.
Why choose between a traditional database and a NoSQL data store? Use both! The problem with NoSQL solutions (beyond the initial learning curve) is the lack of transactions -- you do all updates to MySQL and have MySQL populate a NoSQL data store for reads -- you then benefit from each technology's strengths. This does add more complexity, but you already have the MySQL side -- just add MongoDB, Cassandra, etc to the mix.
NoSQL datastores generally scale way better than a traditional DB for the same otherwise specs -- there is a reason why Facebook, Twitter, Google, and most start-ups are using NoSQL solutions. It's not just geeks getting high on new tech.
I'm probably going to be an odd man out, but I think you need to stay with MySQL. You haven't described a real problem you need to solve, and MySQL/InnoDB is an excellent storage back-end even for blob/json data.
There is a common trick among Web engineers to try to use more NoSQL as soon as realization comes that not all features of an RDBMS are used. This alone is not a good reason, since most often NoSQL databases have rather poor data engines (what MySQL calls a storage engine).
Now, if you're not of that kind, then please specify what is missing in MySQL and you're looking for in a different database (like, auto-sharding, automatic failover, multi-master replication, a weaker data consistency guarantee in cluster paying off in higher write throughput, etc).
I haven't used Cassandra, but I have used MongoDB and think it's awesome.
If you're after simple setup, this is it: You simply untar MongoDB and run the mongod daemon and that's it ... it's running.
Obviously that's only a starter, but to get you started it's easy.
I saw a presentation on mongodb yesterday. I can definitely say that setup was "simple", as simple as unpacking it and firing it up. Done.
I believe that both mongodb and cassandra will run on virtually any regular linux hardware so you should not find to much barrier in that area.
I think in this case, at the end of the day, it will come down to which do you personally feel more comfortable with and which has a toolset that you prefer. As far as the presentation on mongodb, the presenter indicated that the toolset for mongodb was pretty light and that there werent many (they said any really) tools similar to whats available for MySQL. This was of course their experience so YMMV. One thing that I did like about mongodb was that there seemed to be lots of language support for it (Python, and .NET being the two that I primarily use).
The list of sites using mongodb is pretty impressive, and I know that twitter just switched to using cassandra.