Mapping Scala class to Scalding or MongoDB

I am new to both Scala and NoSQL databases. I would like to know whether there are ORM tools that will map my Scala objects to a NoSQL database, as there are for RDBMS solutions.

There is a library called Kundera, based on JPA, that provides ORM for a number of NoSQL databases of different flavours, including HBase, Cassandra, MongoDB, CouchDB and Neo4j. See https://github.com/impetus-opensource/Kundera, which appears to be under active development. Note that the nature of some of the problems that gave rise to NoSQL in the first place, and the tradeoffs inherent in the CAP theorem between consistency, availability and partition tolerance, make ORM challenging in a NoSQL environment. There is a good discussion of some of these issues here: http://architects.dzone.com/articles/sqlifying-nosql-%E2%80%93-are-orm
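To make the mapping problem concrete, here is a minimal hand-written sketch of what an object-document mapper automates: turning a Scala case class into a document and back. It assumes documents are modelled as plain `Map[String, Any]` values, and `DocCodec`, `User` and `Mapper` are hypothetical names for illustration only; Kundera and the MongoDB drivers have their own, different types.

```scala
// Hypothetical codec type class: the conversion an ORM/ODM would generate for you.
trait DocCodec[A] {
  def toDoc(a: A): Map[String, Any]
  def fromDoc(doc: Map[String, Any]): Option[A]
}

// A plain domain class with no persistence annotations.
final case class User(id: String, name: String, age: Int, tags: List[String])

object User {
  implicit val codec: DocCodec[User] = new DocCodec[User] {
    def toDoc(u: User): Map[String, Any] =
      Map("_id" -> u.id, "name" -> u.name, "age" -> u.age, "tags" -> u.tags)

    // None if a required field is missing or has an unexpected type.
    // Non-string entries inside "tags" are silently dropped.
    def fromDoc(doc: Map[String, Any]): Option[User] =
      for {
        id   <- doc.get("_id").collect { case s: String => s }
        name <- doc.get("name").collect { case s: String => s }
        age  <- doc.get("age").collect { case i: Int => i }
        tags <- doc.get("tags").collect { case l: List[_] => l.collect { case s: String => s } }
      } yield User(id, name, age, tags)
  }
}

// Hypothetical entry points that pick up the codec implicitly.
object Mapper {
  def save[A](a: A)(implicit c: DocCodec[A]): Map[String, Any] = c.toDoc(a)
  def load[A](doc: Map[String, Any])(implicit c: DocCodec[A]): Option[A] = c.fromDoc(doc)
}
```

The codec maps one object to one document; it says nothing about relationships or consistency across documents, which is where the difficulties mentioned above come in.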
As you mentioned Scala specifically, here is a very interesting article from Foursquare about how they built a DSL in Scala for interacting with MongoDB: http://engineering.foursquare.com/2011/01/21/rogue-a-type-safe-scala-dsl-for-querying-mongodb/
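The central idea of a type-safe query DSL is that each field reference carries its Scala type, so comparing a field with a value of the wrong type is a compile error rather than a silently empty result. The toy sketch below illustrates only that idea and evaluates queries in memory against `Map[String, Any]` documents; `Field`, `Query`, `Venue` and the `===`/`in`/`and` operators are hypothetical and are not Rogue's actual API.

```scala
// A hypothetical field reference that remembers its document key and its Scala type.
final case class Field[T](name: String) {
  def ===(value: T): Query = Query.Eq(name, value)
  def in(values: Set[T]): Query = Query.In(name, values.toSet[Any])
}

sealed trait Query {
  def and(other: Query): Query = Query.And(this, other)

  // In-memory evaluation against a document, standing in for a real database query.
  def matches(doc: Map[String, Any]): Boolean = this match {
    case Query.Eq(f, v)  => doc.get(f).contains(v)
    case Query.In(f, vs) => doc.get(f).exists(vs.contains)
    case Query.And(l, r) => l.matches(doc) && r.matches(doc)
  }
}

object Query {
  final case class Eq(field: String, value: Any) extends Query
  final case class In(field: String, values: Set[Any]) extends Query
  final case class And(left: Query, right: Query) extends Query
}

// Typed field definitions for a hypothetical "venues" collection.
object Venue {
  val name  = Field[String]("name")
  val likes = Field[Int]("likes")
}
```

With these definitions, `Venue.likes === "ten"` does not compile, because `likes` is a `Field[Int]`.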

Related

What NoSQL database is good for complex many to many relationships?

My startup project requires a complex data structure, which must support geolocation and full-text search. The information in the database is added by two different types of users, who together build a complex relationship network.
For a better understanding of my questions, here is a simple diagram that shows this relationship:
My first choice was MongoDB and Elasticsearch, but I noticed the problem of duplicating the same data. After further planning, we concluded that certain parts of the application need ACID transactions.
What would be a good choice for us?
A graph database like Neo4j is perfect for that.
If you need features of a document database like MongoDB, but with Neo4j under the hood, try Structr: http://de.slideshare.net/AxelMorgner/neo4j-as-document-database http://structr.org
(Disclaimer: I'm the project initiator of Structr)
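As a rough illustration of why a graph model suits many-to-many data, here is a toy in-memory property graph in Scala. Relationships are stored as adjacency lists keyed by node, so following a link in either direction is a lookup rather than a join through a link table. The `Node`, `Rel` and `Graph` types and the `LIKES` relationship are hypothetical; Neo4j's own API and its Cypher query language look different.

```scala
// Hypothetical property-graph types: labelled nodes and typed, directed relationships.
final case class Node(id: Long, label: String, props: Map[String, Any])
final case class Rel(from: Long, relType: String, to: Long)

final class Graph(nodes: Map[Long, Node], rels: Vector[Rel]) {
  // Adjacency lists in both directions, built once up front.
  private val outgoingIndex: Map[Long, Vector[Rel]] = rels.groupBy(_.from)
  private val incomingIndex: Map[Long, Vector[Rel]] = rels.groupBy(_.to)

  // Nodes reached by following `relType` relationships out of `id`.
  def outgoing(id: Long, relType: String): Vector[Node] =
    outgoingIndex.getOrElse(id, Vector.empty).filter(_.relType == relType).map(r => nodes(r.to))

  // Nodes that point at `id` through `relType` relationships.
  def incoming(id: Long, relType: String): Vector[Node] =
    incomingIndex.getOrElse(id, Vector.empty).filter(_.relType == relType).map(r => nodes(r.from))
}
```

Both "which places does this person like" and "who likes this place" are answered from the same relationship records, with no intermediate join table.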

spring-data - connecting to relational Databases

Researching Spring Data, I understand why you would use it for NoSQL databases, but I am struggling to see why you would use Spring Data for relational databases over the standard Spring ORM capabilities (e.g. the standard JPA support).
Does anyone have clear use cases for using the Spring Data framework for relational queries?
Thanks,
James.
The JPA module of the Spring Data project is different from the NoSQL ones, as we don't need to provide a low-level store abstraction ourselves. So the main features are:
elimination of a large chunk of the implementation code needed for repositories (see this blog post for a showcase)
abstractions for pagination and dynamic sorting
DDD specifications to allow defining domain predicates (see this blog post as example)
support for Querydsl predicates
transparent entity auditing
The JDBC module of Spring contains support for Querydsl as well.
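To show what "abstractions for pagination and dynamic sorting" means in practice, here is a plain-Scala sketch in which a page request carries a page index, a page size and a sort order chosen at runtime, applied to an in-memory list. `SortBy`, `PageQuery`, `ResultPage`, `Customer` and `CustomerQueries` are hypothetical stand-ins; in Spring Data the corresponding roles are played by its `Pageable`, `Sort` and `Page` types, and the repository translates them into the database query for you.

```scala
// Hypothetical sort specification: a property name plus a direction.
final case class SortBy(property: String, ascending: Boolean = true)

// Hypothetical page request: zero-based page index, page size, runtime sort.
final case class PageQuery(page: Int, size: Int, sort: List[SortBy] = Nil) {
  require(page >= 0 && size > 0, "page must be >= 0 and size must be > 0")
}

// One page of results plus enough metadata to render paging controls.
final case class ResultPage[A](content: List[A], page: Int, size: Int, totalElements: Int) {
  def totalPages: Int = (totalElements + size - 1) / size
  def hasNext: Boolean = page + 1 < totalPages
}

final case class Customer(lastName: String, age: Int)

object CustomerQueries {
  // Maps a sortable property name to an Ordering, so the sort can be chosen at runtime.
  private val byProperty: Map[String, Ordering[Customer]] = Map(
    "lastName" -> Ordering.by[Customer, String](_.lastName),
    "age"      -> Ordering.by[Customer, Int](_.age)
  )

  // Compare with `first`, falling back to `second` on ties.
  private def thenBy(first: Ordering[Customer], second: Ordering[Customer]): Ordering[Customer] =
    new Ordering[Customer] {
      def compare(x: Customer, y: Customer): Int = {
        val c = first.compare(x, y)
        if (c != 0) c else second.compare(x, y)
      }
    }

  // Throws NoSuchElementException for an unknown property name.
  private def toOrdering(s: SortBy): Ordering[Customer] = {
    val o = byProperty(s.property)
    if (s.ascending) o else o.reverse
  }

  def findAll(all: List[Customer], q: PageQuery): ResultPage[Customer] = {
    // No sort requested: keep the original order.
    val sorted = q.sort.map(toOrdering).reduceOption(thenBy).fold(all)(o => all.sorted(o))
    val from = q.page * q.size
    ResultPage(sorted.slice(from, from + q.size), q.page, q.size, all.size)
  }
}
```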

What NoSQL database to use as replacement for MySQL?

When it comes to NoSQL, there is a bewildering number of databases to choose from, as the NoSQL wiki makes clear.
In my application I want to replace MySQL with a NoSQL alternative. My application has a user table with a one-to-many relation to a large number of other tables. Some of these tables are in turn related to yet other tables. Also, a user is connected to another user if they are friends.
I do not have documents to store, so this eliminates document-oriented NoSQL databases.
I want very high performance.
The NoSQL database should work very well with the Play Framework and the Scala language.
It should be open source and free.
So given above, what NoSQL database I should use?
I think you may be misunderstanding the nature of "document databases". I would recommend MongoDB, which is a document database, but I think you'll like it anyway.
MongoDB stores "documents" which are basically JSON records. The cool part is it understands the internals of the documents it stores. So given a document like this:
{
  "name": "Gregg",
  "fave-lang": "Scala",
  "fave-colors": ["red", "blue"]
}
You can query on "fave-lang" or "fave-colors". You can even index on either of those fields, even the array "fave-colors", which would necessitate a many-to-many in relational land.
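To illustrate what querying and indexing an array field means, here is a small in-memory Scala sketch, assuming documents are `Map[String, Any]` values and arrays are Scala `Seq`s. It mimics MongoDB's behaviour that an equality match on an array field succeeds if any element equals the value, and builds a "multikey"-style index with one entry per array element. `TinyCollection` is a hypothetical model of the idea, not the MongoDB driver API.

```scala
object TinyCollection {
  type Doc = Map[String, Any]

  // Equality match with MongoDB-like array semantics:
  // a scalar field must equal the value; an array field must contain it.
  def fieldMatches(doc: Doc, field: String, value: Any): Boolean =
    doc.get(field) match {
      case Some(xs: Seq[_]) => xs.contains(value)
      case Some(v)          => v == value
      case None             => false
    }

  def find(docs: Vector[Doc], field: String, value: Any): Vector[Doc] =
    docs.filter(fieldMatches(_, field, value))

  // Multikey-style index: every array element becomes its own index key,
  // pointing back at the positions of the documents that contain it.
  def buildIndex(docs: Vector[Doc], field: String): Map[Any, Set[Int]] =
    docs.zipWithIndex
      .flatMap { case (doc, i) =>
        doc.get(field) match {
          case Some(xs: Seq[_]) => xs.map(x => (x: Any) -> i)
          case Some(v)          => Seq(v -> i)
          case None             => Seq.empty[(Any, Int)]
        }
      }
      .groupBy(_._1)
      .map { case (k, pairs) => k -> pairs.map(_._2).toSet }
}
```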
Play offers a MongoDB plugin, which I have not used. You can also use the Casbah driver for MongoDB, which I have used a great deal and is excellent. The Rogue query DSL for MongoDB, written by Foursquare, is also worth looking at if you like MongoDB.
MongoDB is extremely fast. In addition you will save yourself the hassle of writing schemas because any record can have any fields you want, and they are still searchable and indexable. Your data model will probably look much like it does now, with a users "collection" (like a table) and other collections with records referencing a user ID as needed. But if you need to add a field to one of your collections, you can do so at any time without worrying about the older records or data migration. There is technically no schema to MongoDB records, but you do end up organizing similar records into collections.
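A small sketch of the two points above, again modelling documents as plain Scala maps (an assumption; a real driver has its own document type): records in the same collection may carry different fields, readers handle a missing field with a default instead of a migration, and a second collection refers to users by ID, with the "join" done in application code. `SchemalessDemo` and its sample records are hypothetical.

```scala
object SchemalessDemo {
  type Doc = Map[String, Any]

  // An older user record and a newer one that gained a "twitter" field.
  // No migration was needed; both live in the same "users" collection.
  val users: Vector[Doc] = Vector(
    Map("_id" -> 1, "name" -> "Gregg"),
    Map("_id" -> 2, "name" -> "Ann", "twitter" -> "@ann")
  )

  // Another collection whose records reference a user by ID.
  val posts: Vector[Doc] = Vector(
    Map("_id" -> 10, "userId" -> 2, "title" -> "Hello"),
    Map("_id" -> 11, "userId" -> 1, "title" -> "Casbah tips"),
    Map("_id" -> 12, "userId" -> 2, "title" -> "Rogue queries")
  )

  // Readers treat a missing field as absent rather than failing.
  def twitterOf(user: Doc): String =
    user.get("twitter").collect { case s: String => s }.getOrElse("(none)")

  // Application-side lookup by user ID, since there is no SQL join.
  def postsBy(userId: Int): Vector[String] =
    posts
      .filter(_.get("userId").contains(userId))
      .flatMap(_.get("title").collect { case s: String => s })
}
```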
MongoDB is one of the most fun technologies I have come across in the past few years. One happy Saturday I decided to check it out, and within 15 minutes I was productive and felt like I "got it". I routinely give a demo at work where I show people how to get started with MongoDB and Scala in 15 minutes, including installing MongoDB. Shameless plug: if you're into web services, here's my blog post on getting started with MongoDB and Scalatra using Casbah: http://janxspirit.blogspot.com/2011/01/quick-webb-app-with-scala-mongodb.html
You should at the very least go to http://try.mongodb.org
That's what got me started.
Good luck!
At this point the answer is none, I'm afraid.
You can't just convert your relational model with joins to a key-value store design and expect it to be a 1:1 mapping. From what you said it seems that you do have joins, some of them recursive, i.e. referencing another row from the same table.
You might start by denormalizing your existing relational schema to move it closer to a design you wish to achieve. Then, you could see more easily if what you are trying to do can be done in a practical way, and which technology to choose. You may even choose to continue using MySQL. Just because you can have joins doesn't mean that you have to, which makes it possible to have a non-relational design in a relational DBMS like MySQL.
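A minimal sketch of what denormalizing can look like in practice, under assumed table shapes (`UserRow`, `AddressRow`, `FriendRow` and `UserDoc` are hypothetical): rows from a one-to-many pair of tables are folded into one self-contained record per user, so reading a user no longer needs a join. The friend relation, being many-to-many and self-referencing, is kept as a list of IDs rather than embedded, which is one of the judgment calls this kind of redesign forces.

```scala
// Normalized rows, as they might come out of MySQL.
final case class UserRow(id: Int, name: String)
final case class AddressRow(userId: Int, city: String)
final case class FriendRow(userId: Int, friendId: Int)

// Denormalized shape: addresses embedded, friends kept as references.
final case class UserDoc(id: Int, name: String, cities: List[String], friendIds: List[Int])

object Denormalize {
  def apply(users: List[UserRow], addrs: List[AddressRow], friends: List[FriendRow]): List[UserDoc] = {
    // Group child rows by their foreign key once, instead of joining per user.
    val addrsByUser   = addrs.groupBy(_.userId)
    val friendsByUser = friends.groupBy(_.userId)
    users.map { u =>
      UserDoc(
        id = u.id,
        name = u.name,
        cities = addrsByUser.getOrElse(u.id, Nil).map(_.city),
        friendIds = friendsByUser.getOrElse(u.id, Nil).map(_.friendId)
      )
    }
  }
}
```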
Also, keep in mind that non-relational databases were designed for scalability, not performance! If you don't have thousands of users and a server farm, a traditional relational database may actually work better for you.
Hmm, you want very high traversal performance, and you use the word "friends". The first thing that comes to mind is graph databases. They are made specifically for this exact case.
Try Neo4j http://neo4j.org/
It is free and open source, but also has commercial support and commercial licensing, has excellent documentation, and can be accessed from many languages (via its REST interface).
It is written in Java, so you have native libraries, or you can embed it in your Java/Scala app.
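To show the kind of traversal a graph database is optimized for, here is a breadth-first friends-of-friends query over an in-memory adjacency map in Scala. It is only a model of the access pattern, assuming friendships are stored symmetrically in a `Map[String, Set[String]]`; `FriendGraph.withinHops` is a hypothetical name, and in Neo4j you would express this with its traversal API or a Cypher query instead.

```scala
import scala.collection.immutable.Queue

object FriendGraph {
  // Breadth-first search from `start`, stopping at `maxDepth` hops.
  // Returns each reachable person with their shortest hop distance, excluding `start`.
  def withinHops(friends: Map[String, Set[String]], start: String, maxDepth: Int): Map[String, Int] = {
    @annotation.tailrec
    def loop(queue: Queue[(String, Int)], seen: Map[String, Int]): Map[String, Int] =
      queue.dequeueOption match {
        case None => seen
        case Some(((person, depth), rest)) =>
          if (depth >= maxDepth) loop(rest, seen)
          else {
            // Only people not yet seen; marking them now keeps distances minimal.
            val next = friends.getOrElse(person, Set.empty[String]).filterNot(seen.contains)
            val discovered = next.map(p => p -> (depth + 1))
            loop(rest ++ discovered, seen ++ discovered)
          }
      }

    loop(Queue(start -> 0), Map(start -> 0)) - start
  }
}
```

In a relational schema, each extra hop typically means another self-join on the friendship table; here it is one more round of adjacency lookups.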
Regarding MongoDB or Cassandra, you can now (Dec. 2016, five years later) try longevityframework.org.
Build your domain model using standard Scala idioms such as case classes, companion objects, options, and immutable collections. Tell us about the types in your model, and we provide the persistence.
See "More Longevity Awesomeness with Macro Annotations!" from John Sullivan.
He provides an example on GitHub.
If you've looked at longevity before, you will be amazed at how easy it has become to start persisting your domain objects. And the best part is that everything persistence related is tucked away in the annotations. Your domain classes are completely free of persistence concerns, expressing your domain model perfectly, and ready for use in all portions of your application.

Main differences/features among the most known NoSQL systems

I have no experience with NoSQL database systems, but if I had to choose one of the best known (MongoDB, Cassandra, CouchDB, Redis), can someone describe the relevant main features/differences of each one? Is there anything I should know about their capabilities that might affect the choice of NoSQL system I use?
Redis is a key-value store. In a key-value store you can usually insert a primitive value, or a collection of primitives, under a single key; Redis itself stores strings (which can also hold numbers) as well as lists, sets, sorted sets and hashes. Retrieval of data is usually limited to query by key. These are the most basic NoSQL databases.
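A toy model of the key-value access pattern described above, in Scala: values live under a single key and can only be fetched by that key; there is no way to ask "which keys hold X" without scanning everything. The value variants loosely mirror Redis's string, list and hash types, but `KeyValueStore` and its methods are a hypothetical in-memory sketch, not a Redis client.

```scala
// Value types a simple key-value store might hold.
sealed trait Value
final case class Str(value: String) extends Value
final case class ListVal(items: Vector[String]) extends Value
final case class HashVal(fields: Map[String, String]) extends Value

final class KeyValueStore {
  private var data = Map.empty[String, Value]

  def set(key: String, value: Value): Unit = data = data.updated(key, value)
  def get(key: String): Option[Value] = data.get(key)

  // Returns true if the key existed before deletion.
  def del(key: String): Boolean = {
    val existed = data.contains(key)
    data = data - key
    existed
  }

  // Append to a list value, creating it if the key is absent.
  // Returns the new length, or None if the key holds a non-list value.
  def push(key: String, item: String): Option[Int] =
    data.get(key) match {
      case None                 => set(key, ListVal(Vector(item))); Some(1)
      case Some(ListVal(items)) => set(key, ListVal(items :+ item)); Some(items.size + 1)
      case Some(_)              => None
    }
}
```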
Cassandra is a column-family store. It's similar to a key-value store, but supports nesting of key-value pairs up to about four levels deep. Querying is limited to query by key and map-reduce functions. This type of database has a rather difficult data model (does 'supercolumn' ring any bells?) and is highly specialized for extremely large amounts of data.
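The column-family model is easiest to picture as nested maps. The following Scala sketch is a simplification, not Cassandra's storage format or API: a column family maps row keys to rows, and each row keeps its columns sorted by name, so reads go through the row key and a contiguous range of columns can be sliced cheaply. Old-style super columns would add one more map level inside each row. `ColumnFamilyModel` and its functions are hypothetical.

```scala
import scala.collection.immutable.SortedMap

object ColumnFamilyModel {
  // Within a row, columns are kept sorted by name.
  type Row = SortedMap[String, String]
  // A column family maps row keys to rows.
  type ColumnFamily = Map[String, Row]

  def insert(cf: ColumnFamily, rowKey: String, column: String, value: String): ColumnFamily = {
    val row = cf.getOrElse(rowKey, SortedMap.empty[String, String])
    cf.updated(rowKey, row.updated(column, value))
  }

  // Lookup is by row key; within the row, a contiguous column range is sliced.
  // `from` is inclusive, `until` is exclusive.
  def slice(cf: ColumnFamily, rowKey: String, from: String, until: String): List[(String, String)] =
    cf.get(rowKey).map(_.range(from, until).toList).getOrElse(Nil)
}
```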
MongoDB and CouchDB are both document databases. They both store JSON documents, which aren't restricted by a schema, giving you a lot of flexibility. The database allows you to query the contents of these documents, which makes it very easy to retrieve data, compared to other types of NoSQL databases. Map-reduce functions are also supported.
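Since map-reduce comes up for these databases, here is the shape of such a job in plain Scala over JSON-like documents: a map function emits key/value pairs per document, the pairs are grouped by key, and a reduce function folds each group. The document shape and the `mapFn`/`reduceFn` names are illustrative assumptions; MongoDB and CouchDB typically take their map and reduce functions as JavaScript and run them inside the database.

```scala
object MapReduceSketch {
  type Doc = Map[String, Any]

  // Map: emit (tag, 1) for every string tag on a document.
  def mapFn(doc: Doc): Seq[(String, Int)] =
    doc.get("tags") match {
      case Some(tags: Seq[_]) => tags.collect { case t: String => t -> 1 }
      case _                  => Seq.empty
    }

  // Reduce: combine all values emitted for one key.
  def reduceFn(key: String, values: Seq[Int]): Int = values.sum

  // Run map over every document, group by key, then reduce each group.
  def run(docs: Seq[Doc]): Map[String, Int] =
    docs
      .flatMap(mapFn)
      .groupBy(_._1)
      .map { case (key, pairs) => key -> reduceFn(key, pairs.map(_._2)) }
}
```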
Martin Fabik's answer contains some good links to comparisons of MongoDB and CouchDB.
Ayende has a series of blog posts called "That No SQL Thing" that covers each of these types. It's a good introduction to the NoSQL concept, the different types of NoSQL databases and how to query each of them. I highly recommend you read his articles on the database types I mentioned above; they are very instructive!
I was recently doing some research on this topic. Here are some links:
Nice categorization of NoSQL DBs.
Comparing MongoDB and CouchDB
Comparing MongoDB, CouchDB and RavenDB Part1
Comparing MongoDB, CouchDB and RavenDB Part2
I was pretty impressed by RavenDB. It also supports transactions and triggers, but the licensing is not so friendly.
Some arguments for RavenDB from Ayende (the creator of RavenDB)
In addition to the other links, take a look at the Survey of Distributed Databases, as well as this "what if" piece, which looks at the characteristics of other NoSQL databases and how they respond: [MongoDB and Foursquare - What If?].
Finally, [NoSQLSummer] has a lot of good papers describing the various offerings out there and the theoretical underpinnings of each one.

What are the differences between the NoSQL databases and how are they different from traditional relational databases?

There seem to be a lot of new "NoSQL" type databases out there.
Some of the popular ones are CouchDB, Cassandra and MongoDB.
What are the differences between such databases, and how are they different from traditional relational databases? What are the advantages and disadvantages of picking NoSQL DBs over SQL DBs?
The term NoSQL covers a lot of different approaches to data storage, ranging from the simplest key/value storage to sophisticated document databases. It's a catchy buzzword, but not very descriptive, IMHO.
For a quick intro you could take a look at the Wikipedia entry for NoSQL.
Agreed, the question is "not which is better," it's "which solution or set of solutions is best for this particular situation."
NoSQL covers a lot of different storage technologies such as CouchDB, MongoDB, Cassandra and Solr.
CouchDB and MongoDB store multi-dimensional data-structures. MongoDB is also schema-less. Cassandra is a column-based storage engine for fast retrieval, and Solr helps solve other problems such as faceting.
NoSQL simply refers to any storage facility which is not interacted with via SQL queries.
They are not better. NoSQL doesn't involve any new innovation or special feature. NoSQL just refers to a collection of software products that are used for certain types of application but don't necessarily have much else in common with each other. NoSQL does not have to mean a non-relational database.
Folks, SQL or NoSQL is a hot debate nowadays. Some admire the performance of NoSQL databases, while others want to stay with the legacy of SQL and the RDBMS. Each has its merits and demerits; I have tried to contrast them briefly in a few points.
RDBMSs use relations and joins to keep data simple in database tables, while NoSQL databases avoid joins for performance.
NoSQL scales freely in terms of schema and data, while it's very tough to scale an RDBMS as data grows.
RDBMS data types restrict the size of the values you can store, whereas NoSQL databases are typically more flexible about storing large values and files.
Data integrity enforcement comes into play only in RDBMSs, not in NoSQL databases.
ACID guarantees are the domain of RDBMSs rather than NoSQL databases.
RDBMSs support complex transactions, whereas NoSQL databases offer little or no transaction support.
NoSQL databases do not support constraints and validations, while these are basic ingredients of an RDBMS.
Data is not structured in NoSQL databases, but is highly structured in the form of tables in an RDBMS.
Whether to use SQL or NoSQL all depends on the nature and needs of the project.
An RDBMS is a completely structured way of storing data, while NoSQL is an unstructured way of storing data.
Another main difference is that in an RDBMS the amount of data you can store mainly depends on the physical resources of a single server, while in NoSQL you don't have such limits, because you can scale the system horizontally.
You'll find that NoSQL databases have few common characteristics. They can be roughly divided into a few categories:
key/value stores
Bigtable inspired databases (based on the Google Bigtable paper)
Dynamo inspired databases
distributed databases
document databases
Well, the basic differences are discussed below. Of course, NoSQL concepts are getting more popular day by day, but which one to use still depends on the project's needs and requirements.
1) SQL databases are primarily called RDBMSs, whereas NoSQL databases are primarily called non-relational or distributed databases.
2) RDBMSs follow the ACID properties, i.e. Atomicity, Consistency, Isolation, Durability. NoSQL databases, by contrast, are designed around the trade-offs of the CAP theorem (Consistency, Availability and Partition tolerance).
3) In SQL we store data only in tabular formats, but NoSQL uses collections of key-value pairs, documents, graphs or wide-column stores. So NoSQL is schema-free, and it can handle structured, semi-structured and unstructured data.
SQL, however, is not schema-free; it has a predefined schema. For example, in SQL, if the first column of a table has an int data type, then you can't store string or float values in it.
4) RDBMSs use SQL (Structured Query Language) for defining and manipulating data, which is very powerful. In NoSQL databases, queries are focused on collections of documents; this is sometimes called UnQL (Unstructured Query Language), and its syntax varies from database to database. SQL databases are a good fit for complex, query-intensive environments, whereas NoSQL databases are not a good fit for complex queries. At a high level, NoSQL databases don't have standard interfaces for performing complex queries, and NoSQL queries themselves are not as powerful as the SQL query language.
For example, take social networking sites: we upload photos, videos, music, albums, etc., and these get comments, replies to comments, likes, etc. Comments can contain numbers and special characters, so we can hardly predict what a reply or comment might be. In this case we use a document-type NoSQL database to store the comments, as below.
{
  user_id: ObjectId("65f82bda42e7b8c76f5c1969"),
  update: [
    {
      date: ISODate("2015-09-18T10:02:47.620Z"),
      text: "Nice picture."
    },
    {
      date: ISODate("2015-09-17T13:14:20.789Z"),
      text: "1234#some smile symbol"
    },
    {
      date: ISODate("2015-09-17T12:33:02.132Z"),
      text: "...Oh my god.."
    }
  ]
}
If we used SQL for the above, we couldn't simply store the comments (the text above) in a single column; we would need to store them based on type, so we would end up with a big, complex query with a number of joins across different tables. SQL is, however, good for transactions.
5) In most typical situations, SQL databases are vertically scalable: you can manage increasing load by increasing the CPU, RAM, SSD, etc. on a single server. NoSQL databases, on the other hand, are horizontally scalable: you can simply add a few more servers to your NoSQL infrastructure to handle large traffic.
6) SQL databases are the best fit for heavy-duty transactional applications, as they are more stable and promise atomicity as well as data integrity. While you can use NoSQL for transactional purposes, it is still not comparable, nor stable enough, under high load and for complex transactional applications.
7) Examples of NoSQL databases are MongoDB, Cassandra, etc., while examples of SQL databases are MySQL, SQL Server, etc.
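To make point 5's "just add a few more servers" concrete: a horizontally scaled store must decide which server owns each key. One common technique, used in some form by Dynamo-style systems such as Cassandra, is consistent hashing, where adding a server moves only the keys that now fall to it. Below is a minimal Scala sketch; `HashRing`, the server names, the use of MD5 and the default of 100 virtual nodes per server are arbitrary assumptions, and real systems differ in the details.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest
import scala.collection.immutable.TreeMap

// Consistent-hash ring: each server is placed at many points ("virtual nodes")
// on a ring of 64-bit hash values; a key belongs to the next point clockwise.
final class HashRing(ring: TreeMap[Long, String], vnodes: Int) {
  require(ring.nonEmpty, "ring needs at least one server")

  def serverFor(key: String): String = {
    val it = ring.iteratorFrom(HashRing.hash(key))
    // Wrap around to the first point if the key hashes past the last one.
    if (it.hasNext) it.next()._2 else ring.head._2
  }

  // Adding a server only inserts its own points; existing points stay put.
  def addServer(server: String): HashRing =
    new HashRing(HashRing.withPoints(ring, server, vnodes), vnodes)
}

object HashRing {
  // First 8 bytes of MD5, read as a signed Long (an arbitrary but stable choice).
  def hash(s: String): Long =
    MessageDigest.getInstance("MD5")
      .digest(s.getBytes(StandardCharsets.UTF_8))
      .take(8)
      .foldLeft(0L)((acc, b) => (acc << 8) | (b & 0xffL))

  def withPoints(ring: TreeMap[Long, String], server: String, vnodes: Int): TreeMap[Long, String] =
    (0 until vnodes).foldLeft(ring)((r, i) => r.updated(hash(s"$server#$i"), server))

  def apply(servers: Seq[String], vnodes: Int = 100): HashRing =
    new HashRing(servers.foldLeft(TreeMap.empty[Long, String])((r, s) => withPoints(r, s, vnodes)), vnodes)
}
```

By contrast, a naive `hash(key) % numberOfServers` scheme would reassign most keys whenever the number of servers changes.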