Beginning Cassandra -- Use Kundera? Something else? [closed] - jpa

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 7 years ago.
Improve this question
We are in the process of getting our feet wet with Cassandra. None of us have any experience with this particular platform, but are experienced developers with JavaEE, JPA, etc. I came across the Kundera library that provides a JPA implementation compatible with several NoSQL datastores, including Cassandra.
It's tempting to go down this route, as we will be able to get up and running MUCH faster. However, is it the right idea? What are the tradeoffs of using a library like this? How does it affect performance? Is there a huge difference?
I'm curious to know what experiences others have had using this library. And, if there is something else we should look at instead I'd love to hear about it.

Taking Jonathan's (jbellis) to the next level, IMHO, the purpose of ORMs like hibernate or JPA specs is to hide the complexities of SQL since Application developers deal best in objects and not SQL.
Similarly, kundera hides the complexities of NoSQL but in an intelligent way that allows it to use the power of NoSQL but still make it easy for developers to use the traditional RDBMS paradigm.
However, as Jonathan mentions, you should still understand Cassandra data modeling concepts, otherwise, you will end up creating another SQL like monster even over Cassandra.
Kundera does help in this NoSQL modeling by using optimization techniques such as Embedded or One-to-Many relationships automatically converted into multiple Columns rather than creating new Columnfamilies (RDBMS tables equivalents) thus circumventing creating a physical relational model.
"A tool is only as good as the skills of the craftsman/woman using it!"

Use the native CQL driver and read the documentation on data modeling. Pretending that Cassandra is a relational database the way Kundera does is a great way to paint yourself into a corner without quite understanding how you got there.

PlayOrm is a noSql mapping layer and is NOT JPA compliant on purpose following many of the patterns of noSQL. noSQL is not relational. In fact, it probably should not have the R in PlayORM as it is not really relational completely either. You still have relationships in noSQL though. It is not just an RDBMS.

Related

Azure Hadoop and Entity Framework

I am stating a new project that needs to be portable and in some scenarios will have 100's millions of entities.
Now with Azure getting hadoop that of course got my attention for the big data scenario. But I also have the small data scenario under 1 million rows.
Entity Framework code-first is they way I see designing this but needing hadoop in the mix of course may complicate things (Entity Framework of course been used to give simpler storage providers for the smaller data sets)
Now the question is does anyone have experience in this mix?
Can anyone recommended if this is a good approach or not, if not is there a better way?
Working on a reasonably large system based on Entity Framework Code First, with the caveats that I have been working with EF4 and cannot upgrade to 5, that your mileage may vary and the outcomes are going to be strongly affected by what you intend to do, my experience has been that EF doesn't handle large amounts of data tremendously well, it is quite inflexible, so if you need to change its standard behaviour in some way there is a good chance you are going to end up having to hack some nasty workarounds, and performance isn't amazing. If you want to do things that aren't exactly the things that EF expects you to do, then you can run into walls.
If I was thinking of designing a relatively simple/small scale Asp.Net MVC setup, I think EF is a really good choice. For a larger scale operation where you need more flexibility or you are planning to go beyond basic operations you might find that something like NHibernate works better. I don't have experience with that, but colleagues who have worked with both tend to prefer NHibernate. ( Brief article on the comparison - a bit old so EF has addressed some, but not all of the points there. It is also different by design of course. )
It may be that for more high-traffic or unusual stuff you need to roll your own data access anyway just to achieve the right performance or be able to find the right data. Unsquestionably, if you are planning to try working with EF I strongly recommend some serious prototyping just to ensure it can do what you need.

NoSQL (e.g. MongoDB) or RDMS (e.g. PostgreSQL) for new Scala project?

I'm developing a brand new project in Scala. It's just an application for a bunch of CRUD operations, however, because of some eccentric requirements, Play2 or Lift does not fit the bill, so I'm going to develop the application from the ground up. This means that Anorm or ScalaQuery becomes less obvious choices for database integration, and leaves me with the question: is it time to try something new?
My past technology stacks mostly included Java and PostgreSQL and I have experience with both ORM and plain SQL. Are NoSQL database management systems like MongoDB a good replacement for a typical RDBMS or are they special case application data stores? Also, how does the choice of database effect the greater Scala system design (if at all)? For example, the fact that you are using a JSON-like interface to talk to the database, and JSON between the web and a REST service, does not mean that much if everything in the middle becomes Scala objects, or does it?
I'm basically asking for someone's experience on moving from relational to object/document type databases, using Scala in particular. I know that good RDBMS integration is promised in the upcoming release of SLICK. So, if a company like TypeSafe decides to make a RDBMS integration part of the TypeSafe stack, then will I be swimming upstream by integrating to MongoDB using Casbah for example?
Apologies if this question appears a bit vague. I do hope that someone with the right insights or experience will be able to help though.
Update:
Apologies for not adding links to SLICK (it being fairly new). Here goes:
Quick overview
Project home
Update 2:
My personal first win for a technology is usually developer productivity - this translates to lightweight and simple: quick to learn, easy to maintain, no magic
I am currently in a similar situation, and since I have some experience with web development and SQL databases, I took it as an opportunity to work with MongoDB, Cashbah (and Scalatra). My experience is still very limited and the project and the amount of data I am working with is pretty small, but here are a few observations I've made.
For the few sets of data I have, performance does not seem to motivate either SQL or NoSQL. However, performance in the presence of huge amounts of data is often listed as a reason for using NoSQL, e.g., by Wikipedia
My documents (entries in the database) arise from benchmarking test suits, and mainly have a static structure, and I am optimistic that I could store them in a fixed-schema SQL database. However, a few substructures are not static, e.g., new test cases are added, new statistics are tracked, others are removed. This was my main motivation for trying a schema-free NoSQL database. Also, because I had the feeling that the document approach of MongoDB makes it much more obvious which data belongs together (i.e., to a document), in contrast to entries in a relational database, where the data would be distributed over various tables and rows, and where a full "document" would need to be reconstructed by joins.
Tools such as Lift-Json or Rogue allow you to work with regular Scala objects in a type-safe, although the data is regularly (de-)serialised as (from) JSON. However, this naturally works best if the structure of your data is mainly static, otherwise, you you are left with using strings to access your data (e.g., for expanding the results of a query using Cashbah).
If you are mainly concerned about a coherent representation of data on server and client side, languages such as Opa or Haxe might be of interest, since they compile to code that can executed on both sides. See this page for "multitarget" or "tierless" languages.
Got too long for a comment. Was just trying to relate my short experience with Scala (about 6 months now, since about when Play2 came out--it's quickly become my go to language).
I've enjoyed using Salat/Casbah with MongoDB in my last few projects; most have been in Play2, but the latest was without a webapp framework. It definitely hasn't felt like swimming upstream.
I would say that there are particular use cases for which I wouldn't use mongo, but it works nicely as a general purpose object data store, especially if you expect to query by id or index and don't need transactions (and will need minimal ad-hoc aggregation type stuff).
Expect to require a separate set of servers dedicated to mongodb (or to use a service dedicated to mongodb), but I guess that's normal for most serious database apps.
I've also used Play2/Anorm, which was surprisingly enjoyable to use for some ad-hoc query dashboard-style report pages. I started trying to go the Squeryl route, but Anorm seemed easier to use for one-off aggregation queries. Haven't looked at SLICK, but it sounds interesting.
It's really hard to say without knowing what problems you would like the app to solve.
I've personally found my productivity increased using NoSQL DBs via REST/JSON. Though bear in mind most NoSQL DBs offer REST interfaces which preclude the need for much middleware, Scala or otherwise, unless you intend to write a webapp with a UI.
If this is a learning exercise, I recommend you try multiple things out, as each NoSQL DB has something different to offer to your toolkit, and have personally found CouchDB, Riak, Neo4j, and MongoDb all with various pluses and drawbacks and good for different purposes.
Hope this helps, good luck.

Any Postgres compatible ORM for Node.js? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
Improve this question
I'm seeking for a good ORM for postgres under Node.js, one that supports declaration of relationships beetween models, and fields validation. I've searched during a long time and cannot get any satisfying results. Maybe someone can point me to a project I missed during my researches.
Thx.
node-orm2 looks good: supports association, validators, and mysql, postgres, and mongo (in beta)
UPDATE: The node-orm2 package is no longer maintained. Possible alternatives include bookshelf or sequelize.
SequelizeJS - models, validation and migrations
BookshelfJS - a promise based ORM looks quite promising
JugglingDB - multidatabase ORM inspired by activerecord and datamapper. Supports validations, hooks, relations. Works with: mysql, postgres, sqlite, memory, redis, mongodb, neo4j.
Not production ready now (march 2012), but growing fast. I plan stable release soon.
Would recommend trying Knex for the database and Bookshelf as an ORM on top of it (developed by same person). I'm using it with postgres, but supports SQLite, MySQL/MariaDB and Oracle (in alpha) too.
Very expressive promise-based API with bluebird behind it, knex has a well documented and great command line tool for making migrations, seed files etc. Bookshelf uses backbone models and collections as an inspiration, including the .extend(..) paradigm for inheritance, so picking it up is a breeze if you come from that world. So far, so good.
Missy is a universal ORM for both SQL and NoSQL databases which is simple, flexible, well-documented and supports some fancy features that other ORMs are lacking
ORM's are a little too slow for fast nature of node.js; plain database driver is fine, but a little tiring. That's for I do write something just between: prego. It provides automatic statement preparation, migrations, simple models with associations, transactions and few utilities, all callback style and fast. Ideas/issues are welcome.
I suggest you use this pair: pg (like a driver) and light-orm (like orm wrapper).
https://npmjs.org/package/pg
https://npmjs.org/package/light-orm
https://www.npmjs.org/package/rdb
Simple, flexible mapper.
Transaction with commit and rollback.
Persistence ignorance - no need for explicit saving, everything is handled by transaction.
Eager or lazy loading.
Based on promises.
Well documented by (running) examples.

When to use NoSql, and which one? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
I've been programming with php and mySql for a while now and recently decided that I wanted to give nosql a try. I would really appreciate if some of you with experience could tell me:
When is it a good time to switch, how do I know nosql is for me?
Which nosql software would you recommend?
Thanks
When is it a good time to switch?
It really depends on the particular project. But in general I see that I can use nosql for 95% of web applications. I will still use old good sql for the systems which should guarantee ACID (for example, systems that work with 'real' money).
How do I know nosql is for me?
It's for you, for you, believe me. ;)
You just need to try something from the nosql world, read some existing articles and you will see all of the benefits and problems.
Which nosql software would you recommend?
I would personally recommend you to start from mongodb, because it really simple. To become an expert in sql takes years, to become an expert in mongodb needs a month or so.
I suggest that you spend an hour for reading "The little mongodb book" and try to write your first test application starting from tomorrow.
No one here will say that you need to use this, or this database. What database to use depends on project and requirements.
It depenends on your application needs.. There are a lot of options.
You can use a document-oriented like mongodb, a "extended" key-value like Redis or maybe a graph-oriented like neo4j
This article is very useful http://highscalability.com/blog/2010/12/6/what-the-heck-are-you-actually-using-nosql-for.html
This recent blog post in High Scalability pretty much answers your question in regards to when to use NoSQL.
I myself always go MySQL until it fails me and then choose the right tool for the job, some of the non-relational databases I worked with are:
Riak: a dynamo clone which is useful when you need to access records quickly but you have too many records to keep on one machine. For instance a recommender system for users in a web application, you want to access the data in a few milliseconds but you have 200M users.
MongoDB: a document-based database, for when I needed speed but had a write-intensive application (read/write ratio close to 1:1) the data was highly transient so I didn't care about the durability issues
The best time to switch is when you:
Start working on a new project and you make your first architectural decisions. Porting an existing application to a different database can cause a lot of headaches.
Hit a brick wall... or better before you see one coming :) The main reason is usually lack of performance or scalability.
Need a missing feature (eg: complex hierachical structures, graph-like traversals, etc..).
I would recommend a lot of them, but each of them has their own sweet-spots where they shine and other parts where they lack features. The only way to pick the right tool(s) is to get familiar with a couple of them.
Web developers usually learn key-value stores (memcached, redis) first as they can fix a lot of performance problems (but also add some complexity to your app...).
There are document (schema-less) databases like MongoDB or CouchDB which can significantly enhance your productivity if your data model often changes.
For graph traversals there is NeoJ.
For "big data" there is Hadoop and its related projects.
And a list goes on and on...

How to start a project [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
Just wondering how to start a project from concept,specs,dev etc. In development do you start with database design? or maybe theres a resource you know i can look at.
Starting with database design is actually a big pet peeve of mine. Sure, it's fine for some projects. Simple forms-over-data apps, stuff like that. But for anything more complex, anything that has a "domain" of logic, do not start with database design. Start with domain modeling. If you're taking business logic and putting it in code, then it's highly likely that the business users who define the logic flow do not think in terms of SQL or relational data at rest. They think in terms of logical interactions of concrete and abstract concepts.
As Eric S. Raymond said, "Smart data structures and dumb code works better than the other way around." Usually, when one starts with the database design, one creates a flat "dumb" data structure. Not dumb in the sense that it's a bad design, but in the sense that it has no built-in logic. It's flat and dimensionless. All of the intelligence would need to go into the code that uses it.
A rich domain model, on the other hand, incorporates business logic and concepts directly into the data structures. It's enhances the data itself with actual business intelligence, carrying that intelligence throughout the domain.
Now, this doesn't mean that you shouldn't think of persistence at all while designing the domain. But the persistence should be built to accompany the domain, not the other way around. Nilsson suggests starting with the domain and during the development of it take breaks to think and work on the persistence. This is because the domain model is really the core, but you'll need to evaluate any compromises on persistence to keep yourself realistic. Going for true persistence ignorance could dig yourself into some holes.
That all depends on what sparked the motivation to start a project in the first place. It could vary from sitting down and fleshing out detains of something that's been brewing amorphously in your mind for years, or sitting down and making a quick and dirty prototype to convince yourself that the genius idea you had that seemed so simple is actually quite a thorny bush that requires you to sit down and flesh out.
I never start with database design, as that's an implementation detail. I might not even want to use a database. I start with the functional design. What do I want it to do? Why? How? How does it do it differently from other approaches? Is the benefit enough to even bother doing it at all? You get the idea. Implementation design is tackled once I know clearly what I am doing and most importantly why.
That is very general but the first step is always to figure out and document what you want the application to do. Then I usually develop and ERD which defines the tables required to accomplish those functions along with the class structure that sits in front of those tables. Once those two big parts get done, it's usually pretty smooth sailing.