Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 8 years ago.
I'm starting out to design a site that has some requirements that I've never really dealt with. Specifically, the data objects will have similar, but not exact, attributes. Yes, I could probably figure out most of the possible attributes and then just not populate the ones that don't make sense, and therefore keep a traditional "Relational" table and column design, but I'm thinking this might be a really good time to learn NoSQL.
In addition, the user will have 1, and only 1, textbox to search, and I will need to search all data objects and their attributes to find that string.
Ideally, I'd like to have the search return in order of "importance", meaning that if a match for the user's entered string is found in a "name" attribute, it would be returned as a higher confidence match than if the string was matched on a sub-attribute.
Anyone have any experience in this sort of situation? What have you tried that worked or didn't work? Am I wrong in thinking that this project is very well suited to a NoSQL type of database?
Stick with a traditional relational database such as MySQL or PostgreSQL. I would suggest sorting by relevance in your application code after obtaining the matching results. The size of your result set should influence your design choices, but if you will have fewer than 1-2k results, just keep it simple and don't worry too much about optimization.
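A minimal sketch of what sorting by relevance in application code could look like, assuming the matching rows have already been fetched as dictionaries. The weights, field names, and sample records are hypothetical, chosen only to show a hit in "name" outranking a hit in any other attribute.

```python
from typing import Any

# Hypothetical weights: a hit in "name" counts more than a hit elsewhere.
FIELD_WEIGHTS = {"name": 10}
DEFAULT_WEIGHT = 1


def score(record: dict[str, Any], query: str) -> int:
    """Sum the weights of every attribute whose value contains the query."""
    needle = query.lower()
    total = 0
    for field, value in record.items():
        if needle in str(value).lower():
            total += FIELD_WEIGHTS.get(field, DEFAULT_WEIGHT)
    return total


def search(records: list[dict[str, Any]], query: str) -> list[dict[str, Any]]:
    """Return only the matching records, highest score first."""
    scored = [(score(record, query), record) for record in records]
    hits = [(s, record) for s, record in scored if s > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in hits]


# Illustrative data only.
records = [
    {"name": "Blue widget", "color": "red"},
    {"name": "Red widget", "color": "blue"},
    {"name": "Green gadget", "color": "green"},
]
results = search(records, "red")
```

Here the record matched on its name is returned ahead of the record matched only on a secondary attribute, and the non-matching record is dropped.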
NoSQL, in its simplest form, is just a dumb key-value store: a persistent dictionary that can be shared across multiple application instances. It can solve scalability issues, but it introduces new ones, since you now have only a dumb data store. Relational databases have had years of performance tuning and do a great job.
I find NoSQL to be much better suited to storing state data, like a user's preferences or a cache. If you are analyzing the relationships between data, then you need a relational database.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
I have a dataset that consists of 1500 columns and 6500 rows, and I am trying to figure out the best way to shape the data for web-based, interactive visualizations.
What I am trying to do is make the data more interactive and create an admin console that allows anyone to filter the data visually.
The front end could potentially be based on Crossfilter, D3 and DC.js and give the user basically endless filtering possibilities (date, value, country). In addition, there will be some predefined views, like the top and bottom 10 values.
I have seen and tested some great examples like this one, but after testing, it did not really fit the large number of columns I had, and it was based on a full JSON dump from MongoDB. This resulted in very long loading times and a loss of full interactivity with the data.
So in the end my question is: what is the best approach (starting with normalization) to getting the data shaped in the right way so it can be manipulated from a front end? Changing the number of columns is a priority.
A quick look at the piece of data that you shared suggests that the dataset is highly denormalized. To allow for querying and visualization from a database backend, I would suggest normalizing. This is no small bit of software work, but in the end you will have relational data, which is much easier to deal with.
It's hard to guess where you would start, but from the bit of data you showed, there would be a country table, an event table of some sort, and probably some tables of enumerated values.
In any case, you will have a hard time finding a DB engine that allows that many columns. The row count is not a problem. I think in the end you will want a DB with dozens of tables.
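As a rough illustration of the normalization idea, here is a sketch using Python's built-in sqlite3 module. The table names (country, measure, observation), the column names, and the sample values are all assumptions made up for the example, since the real dataset is not shown. It turns wide rows into one narrow row per value, with small lookup tables for the enumerated parts.

```python
import sqlite3

# Hypothetical wide rows: in the real data there would be ~1500 columns.
wide_rows = [
    {"country": "Norway", "date": "2014-01-01", "gdp": 1.2, "population": 5.1},
    {"country": "Sweden", "date": "2014-01-01", "gdp": 1.5, "population": 9.6},
]

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE measure (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE observation (
        country_id INTEGER REFERENCES country(id),
        measure_id INTEGER REFERENCES measure(id),
        date TEXT,
        value REAL
    );
    """
)


def lookup_id(table: str, name: str) -> int:
    """Insert the name if it is new, then return its id.

    `table` is only ever one of the fixed names above, never user input.
    """
    conn.execute(f"INSERT OR IGNORE INTO {table} (name) VALUES (?)", (name,))
    row = conn.execute(f"SELECT id FROM {table} WHERE name = ?", (name,)).fetchone()
    return row[0]


# Each former column becomes a row in `measure`; each cell becomes an observation.
for row in wide_rows:
    country_id = lookup_id("country", row["country"])
    for column, value in row.items():
        if column in ("country", "date"):
            continue
        conn.execute(
            "INSERT INTO observation VALUES (?, ?, ?, ?)",
            (country_id, lookup_id("measure", column), row["date"], value),
        )

# A predefined "top 10" view is then an ordinary query instead of a JSON dump.
top = conn.execute(
    """
    SELECT c.name, o.value
    FROM observation o
    JOIN country c ON c.id = o.country_id
    JOIN measure m ON m.id = o.measure_id
    WHERE m.name = 'gdp'
    ORDER BY o.value DESC
    LIMIT 10
    """
).fetchall()
observation_count = conn.execute("SELECT COUNT(*) FROM observation").fetchone()[0]
```

With this shape the front end can request only the filtered slice it needs, rather than loading every column for every row.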
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I appreciate there are one or two similar questions on SO but we are a few years on and I know EF's speed and general performance has been enhanced so those may be out of date.
I am writing a new webservice to replace an old one. Replicating the existing functionality it needs to do just a handful of database operations. These are:
Call existing stored procedures to get data (2)
Send SQL to the database to be executed (should be stored procedures I know) (5)
Update records (2)
Insert records (1)
So 10 operations in total. The database is HUGE but I am only dealing with 3 tables directly (stored procedures do some complex JOINs).
When getting the data I build an array of objects (e.g. Employees) which then get returned by the web service.
From my experience with Entity Framework, and because I'm not doing anything clever with the data, I believe EF is not the right tool for my purpose and SqlDataReader is better (I imagine it is going to be lighter and faster).
Entity Framework focuses mostly on developer productivity - easy to use, easy to get things done.
EF does add some abstraction layers on top of "raw" ADO.NET. It's not designed for large-scale, bulk operations, and it will be slower than "raw" ADO.NET.
Using a SqlDataReader will be faster - but it's also a lot more (developer's) work, too.
Pick whichever is more important to you - getting things done quickly and easily (as a developer), or getting top speed by doing it "the hard way".
There's really no good "one single answer" to this "question" ... pick the right tool / the right approach for the job at hand and use it.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
Let's say I have the following entities:
Student (Name, Courses, Department, Grades, Age, Gender, Address, Schedule)
Course (Name, Schedule, MaxStudent, EligibleStudents, Department, PreRequisite, CoRequisite)
Department (Name, NumberOfStudents)
Grades (Course, Value)
These are linked data. How should I save them in NoSQL?
I heard that joins are not encouraged in NoSQL...
If you are asking about MongoDB, there are two ways of modeling relationships between entities: embedding and linking/referencing.
Generally speaking, which one you choose depends on access patterns, atomicity considerations, expected document size growth and so on. There is no such thing as a MongoDB equivalent of the n-th normal form.
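To make the two options concrete, here is a sketch using plain Python dictionaries as stand-ins for documents. It does not use a MongoDB driver, and the field names and values are hypothetical, loosely based on the Student, Course and Grades entities from the question.

```python
# Embedding: grades live inside the student document, so a single read
# returns the student together with all of their grades.
student_embedded = {
    "_id": "s1",
    "name": "Ada",
    "department": "Math",
    "grades": [
        {"course": "Algebra", "value": "A"},
        {"course": "Logic", "value": "B"},
    ],
}

# Linking/referencing: courses are separate documents and the student
# stores only their ids, so course data is not duplicated per student.
courses = {
    "c1": {"_id": "c1", "name": "Algebra", "max_students": 30},
    "c2": {"_id": "c2", "name": "Logic", "max_students": 25},
}
student_linked = {"_id": "s1", "name": "Ada", "course_ids": ["c1", "c2"]}


def resolve_courses(student: dict, courses_by_id: dict) -> list[str]:
    """Application-side 'join': one extra lookup per referenced course."""
    return [courses_by_id[course_id]["name"] for course_id in student["course_ids"]]


course_names = resolve_courses(student_linked, courses)
```

Embedding tends to suit data that is read together and owned by one parent (grades of a student), while referencing tends to suit data shared by many parents (a course taken by many students).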
There is pretty extensive documentation on docs.mongodb.org so it is a good place to start.
Data Modeling Introduction
Data Modeling Concepts
Data Model Examples and Patterns
You can also watch a short video about MongoDB schema design by Dwight Merriman.
Note: Please don't use the terms NoSQL and Big Data here. They are just buzzwords and are almost completely meaningless when you ask a programming-related question.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I'm looking for an opinion on replacing an existing Data Grid (i.e. Oracle Coherence) with a document store alternative, e.g. NoSQL MongoDB. I was thinking about the most important pros and cons and came up with:
NoSQL
Pros:
No additional database
No ORM mapping necessary
Although the best query efficiency can be achieved when looking up by ID, other queries can be satisfied by map/reduce queries
Cons:
Quite difficult to achieve data consistency when updating multiple collections or even multiple rows in the same collection
Slower response time? (I suspect that Coherence response time might be better)
A read operation can return old data
Data Grid
Pros
With a Data Grid it seems easier to keep data consistent, e.g. the data grid becomes the SOR (System of Record)
As Data Grid becomes SOR, all data should always be available in the grid
Remote Executors
Cons
Additional database means additional overhead & system/application requirements
With a huge amount of data and sharding in place any kind of queries can take a lot of time
Couchbase Server is a very good replacement for Oracle Coherence, particularly for enterprise-class applications. Orbitz is a great example, where a large number of Coherence nodes were replaced by 70 nodes of Couchbase.
You can read more about the Coherence replacement here: http://gigaom.com/cloud/balancing-oracle-and-open-source-at-orbitz/
Slides from an Orbitz presentation about Couchbase are also available here: http://www.slideshare.net/Couchbase/t1-s6-oww-usescouchbase
Pros:
High availability of nodes using replication and failover (avoid cold cache scenarios)
Sub-millisecond latencies (built-in object-level cache based on memcached)
High read / write throughput (very low granularity of locking) ( http://www.cisco.com/en/US/prod/collateral/switches/ps9441/ps9670/white_paper_c11-708169.pdf)
Strong consistency at a document / item level
TTL / Expiry per document / item
Cons:
Difficult to achieve consistency across multi-document updates. Can be achieved using sentinels. ( http://www.amainhobbies.com/FromTheCEO/2012/09/09/invalidating-couchbase-cache-entries-with-sentinels/ )
It can, but so can a pen and paper system.
The question is whether it will be an acceptable replacement. That depends wholly on the situation. In some cases a NoSQL solution is faster and more scalable than a relational solution, but in some situations it is essential to have some kind of support for longer-running transactions and relational constraints.
It depends.
You already gave the pros and cons in detail...
As iwein said, it depends...
What queries does the existing relational system force on you?
We know that partitioning is easier in NoSQL databases than in relational databases...
So if you switch to Mongo, you can extend your system's performance in a cheaper and quicker way...
If people are happy with your Oracle system now, don't touch it :)
Yes - NoSQL can replace it. But a lot depends on what you are trying to do.
If you just need a simple document store with easy key-based lookups, NoSQL is a no-brainer.
If you need an enterprise-class solution with paid-for support and features such as custom aggregation, entry processors, etc., maybe Coherence is what you want.
I've seen people build custom NoSQL solutions on top of Coherence - which is a really expensive thing to do.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
What are the strengths and weaknesses of the various NoSQL databases available?
In particular, it seems like Redis is weak when it comes to distributing write load over multiple servers. Is that the case? Is it a big problem? How big does a service have to grow before that could be a significant problem?
The strengths and weaknesses of the NoSQL databases (and also SQL databases) are highly dependent on your use case. For very large projects, performance is king; but for brand new projects, or projects where time and money are limited, simplicity and time-to-market are probably the most important. For teaching yourself (broadening your perspective, becoming a better, more valuable programmer), perhaps the most important thing is simple, solid fundamental concepts.
What kind of project do you have in mind?
Some strengths and weaknesses, off the top of my head:
Redis
Very simple key-value "global variable server"
Very simple (some would say "non-existent") query system
Easily the fastest in this list
Transactions
Data set must fit in memory
Immature clustering, with unclear future (I'm sure it'll be great, but it's not yet decided.)
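On the question of spreading Redis writes over several servers: without server-side clustering, one common workaround is for the client to pick a server per key by hashing. The sketch below shows that idea only. It does not talk to Redis, and the server addresses and key names are hypothetical.

```python
import hashlib

# Hypothetical server addresses; nothing here opens a connection.
SERVERS = ["redis-a:6379", "redis-b:6379", "redis-c:6379"]


def server_for(key: str, servers: list[str] = SERVERS) -> str:
    """Map a key to one server using a stable hash of the key."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


# Every client that uses the same server list sends a given key to the same place.
placement = {key: server_for(key) for key in ("user:1", "user:2", "cart:17")}
```

The weakness of plain modulo hashing is that adding or removing a server remaps most keys; consistent hashing is the usual refinement when the server list is expected to change.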
Cassandra
Arguably the most community momentum of the BigTable-like databases
Probably the easiest of this list to manage in big/growing clusters
Support for map/reduce, good for analytics, data warehousing
Multi-datacenter replication
Tunable consistency/availability
No single point of failure
You must know what queries you will run early in the project, to prepare the data shape and indexes
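The "tunable consistency/availability" point for Cassandra comes down to simple arithmetic on replica counts: a read is guaranteed to overlap the latest acknowledged write when the number of replicas read plus the number of replicas written exceeds the replication factor. A small sketch of that rule, with function names that are my own and not part of any Cassandra API:

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum(rf)
strong = reads_see_latest_write(q, q, rf)  # quorum reads + quorum writes
weak = reads_see_latest_write(1, 1, rf)    # single-replica reads and writes
```

Lower read and write counts favor availability and latency; higher ones favor consistency, which is the trade-off being tuned per query.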
CouchDB
Hands-down the best sync (replication) support, supporting master/slave, master/master, and more exotic architectures
HTTP protocol, browsers/apps can interact directly with the DB partially or entirely. (Sync is also done over HTTP)
After a brief learning curve, pretty sophisticated query system using JavaScript and map/reduce
Clustered operation (no SPOF, tunable consistency/availability) is currently a significant fork (BigCouch). It will probably merge into Couch but there is no roadmap.
Similarly, clustering and multi-datacenter are theoretically possible (the "exotic" thing I mentioned) however you must write all that tooling yourself at this time.
Append-only file format (both databases and indexes) consumes disk surprisingly quickly, and you must manually run compaction (vacuuming), which makes a full copy of all records in the database. The same is required for each index file. Again, you have to be your own toolsmith.
Take a look at http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis. He does a good job of summing up why you would use one over the other.