What is BigData and NoSQL, any good books on both? [closed] - nosql

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I know I am asking two questions in one, but can someone please tell me what is meant by Big Data? Also, how does NoSQL differ from conventional SQL?
Lastly, can you please recommend good books, tutorials, or websites on these topics that can take a newbie to an advanced level?
Please reply.

"Big Data" is a buzzword, which means that it means different (albeit related) things to different people.
Some use it for database software that specializes in "Big Data", some use it for the whole infrastructure that manipulates large data sets, and some use it for the large data sets themselves (structured, semi-structured, and unstructured).
"Big Data" data sets possess at least one distinct property: due to their large size and/or lack of structure, they are assumed to hide valuable information and relationships. The end goal of just about every "Big Data" project is uncovering this valuable knowledge in an efficient and repeatable manner.
How large is "Big Data"? Large enough that a few years ago it would have demanded million-dollar investments in all three: hardware, software, and development. Today you may still need a significant (but smaller) investment, but probably in just two of these three.

Related

high volume database choice for php [closed]

Closed 10 years ago.
I am about to develop a website using the Yii framework, but I am not quite sure which database I should use.
The site will mostly do inserts and selects. Data will come from different relational tables, as I will have more than 50 filters so that users can see whatever data they want to see.
Here is the example of website. http://property.sulekha.com/
I want to design something like this.
Which new concepts can I use for optimization and better performance?
I have a few concepts in mind that I am planning to use:
1) MemCache
2) HipHop PHP
3) Doctrine ORM
I am also wondering how Facebook search works. Are they using any advanced tool for search?
Facebook's architecture is fascinating, but you shouldn't try to copy it, because you don't need it, and as we all know, premature optimization is the devil.
Scaling issues are not something you prepare for unless you're working for an enterprise and know firsthand that you'll receive huge amounts of traffic from day 1, like the new Mega.
If you're talking about a large denormalized table, which is what applying up to 50 filters sounds like, maybe you should consider a NoSQL solution such as MongoDB.
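To make the filter idea concrete, here is a minimal sketch (not a definitive implementation) of turning optional search filters into a MongoDB-style filter document. The field names (`city`, `bedrooms`, `price`) and the `build_filter` helper are hypothetical; only the `$in`, `$gte`, and `$lte` operators are actual MongoDB query operators.

```python
from typing import Any


def build_filter(params: dict[str, Any]) -> dict[str, Any]:
    """Turn optional search parameters into a MongoDB-style filter document.

    All field names (city, bedrooms, price) are hypothetical examples.
    Filters the user did not set are simply left out of the query.
    """
    query: dict[str, Any] = {}
    if params.get("city"):
        query["city"] = params["city"]
    if params.get("bedrooms"):
        # Match any of several allowed values.
        query["bedrooms"] = {"$in": list(params["bedrooms"])}
    price: dict[str, Any] = {}
    if params.get("min_price") is not None:
        price["$gte"] = params["min_price"]
    if params.get("max_price") is not None:
        price["$lte"] = params["max_price"]
    if price:
        query["price"] = price
    return query


example = build_filter({"city": "Chennai", "bedrooms": [2, 3], "max_price": 5000000})
print(example)
```

With a driver such as pymongo, a dict like this would be passed to `collection.find(...)`. The same "only add the conditions that were set" approach also works for building a parameterized SQL `WHERE` clause.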
From what I know about Facebook search, the servers are clustered and basically hold pointers to the "real" data, which means that a lot of their data isn't physical data. But as I said, unless you plan on sporting 1 billion users, that's over your head as of now.
Good luck.

Why is Perl market position in server-side scripting so low, even less than Java? [closed]

Closed 10 years ago.
According to the article at W3Techs, Perl ranks the lowest among the server-side scripting languages, even lower than Java. Is there any reason behind it? Perl, as far as I can see, is very popular and an awesome language, so how come it is hardly used by websites? Does it have issues with server-side scripting?
This article has a lot of details on how W3Techs gets their data: http://w3techs.com/blog/entry/usage_of_perl_for_websites_fell_below_1_percent
As I did some analysis on this, let me summarize in short: the data presented by W3Techs is deeply flawed and extremely misleading. First off, it is important to know that they detect the technologies used by sites by running simple scripts against them that look for file suffixes in URLs, and then they just take that result and never verify it with the site owner. As such they have a "no-detect" rate of 17.6% (plus an unknown "false-detect" rate). A more correct version of their chart would be this:
If you'd like to get more details and more mistakes in their data methodology, please take a look at the comments of the article, especially those written by "Mithaldu" or "Christian Walde", i.e. me. I posted extensively there as to why their data is nearly useless and why they're even misinterpreting the data they do have.

Nosql Database suggestion for high performance [closed]

Closed 11 years ago.
We have requirements that force us to have two layers of databases: a good caching solution backed by a large distributed database. We are thinking of using Redis for fast reads and writes. We have not yet settled on the backend database, but we would prefer it to have the following properties:
consistent over time.
robust (no data loss).
reasonably fast read.
distributed.
We are exploring Cassandra and MongoDB as our options. HBase might be an option too. Kindly let us know your views and the current state of work. We are expecting some comparative analysis like the one at http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis , but more up to date and giving us better insight. An example use case would be someone posting a comment on Facebook: the comment is then visible to all their friends in real time.
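For readers unfamiliar with the two-layer setup described in this question, the following is a minimal sketch of one common way to wire a cache in front of a durable store: write-through on writes, cache-aside on reads. It assumes plain Python dicts as stand-ins for both Redis and the backend database so that it runs without any server; the `CachedStore` class and the key format are hypothetical.

```python
class CachedStore:
    """Cache-aside reads and write-through writes over two dict-like layers.

    `cache` stands in for Redis and `backend` for the distributed database;
    both are plain dicts here so the sketch runs without any server.
    """

    def __init__(self, cache: dict, backend: dict) -> None:
        self.cache = cache
        self.backend = backend
        self.backend_reads = 0  # counts how often reads fall through the cache

    def write(self, key, value):
        # Write to the durable store first, then refresh the cache,
        # so losing the cache never loses data.
        self.backend[key] = value
        self.cache[key] = value

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        # Cache miss: fall back to the backend and remember the result.
        self.backend_reads += 1
        value = self.backend.get(key)
        if value is not None:
            self.cache[key] = value
        return value


store = CachedStore(cache={}, backend={"comment:1": "hello"})
first = store.read("comment:1")   # miss, served by the backend
second = store.read("comment:1")  # hit, served by the cache
store.write("comment:2", "hi there")
```

A real deployment would also need cache expiry and a policy for what happens when the two layers disagree, which this sketch leaves out.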

PostgreSQL nested queries performance [closed]

Closed 9 years ago.
Are there significant performance issues when using nested (at most 2 levels deep) queries in PostgreSQL?
I use version 8.4.2.
I am asking because I am planning to use quite a lot of those soon on a busy website.
The boring answer: it depends on the query and your data.
To write (and read and understand) a nested query might be easier than writing a non-nested one, but you might end up paying the price in reduced performance. During my previous database project we ended up rewriting quite a few of the more critical queries to avoid nesting and we saw order of magnitude performance improvements.
EXPLAIN is your friend. You should learn to love it and how to use it :)
http://www.postgresql.org/docs/current/static/sql-explain.html
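As a rough illustration of the workflow (write the query both ways, check that the results match, then compare the plans), here is a small sketch. It uses SQLite's `EXPLAIN QUERY PLAN` through Python's standard library purely as a stand-in that runs without a database server; in PostgreSQL you would prefix the query with `EXPLAIN` or `EXPLAIN ANALYZE` instead, and the plan output looks different. The `users` and `orders` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users VALUES (1, 'NO'), (2, 'SE'), (3, 'NO');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 2, 20.0), (3, 3, 30.0), (4, 1, 5.0);
""")

# Nested form: filter orders through a subquery.
nested = """
    SELECT id, total FROM orders
    WHERE user_id IN (SELECT id FROM users WHERE country = 'NO')
    ORDER BY id
"""

# Equivalent non-nested form: the same filter expressed as a join.
joined = """
    SELECT o.id, o.total FROM orders o
    JOIN users u ON u.id = o.user_id
    WHERE u.country = 'NO'
    ORDER BY o.id
"""


def show_plan(sql: str) -> list[str]:
    """Return the human-readable plan steps SQLite reports for a query."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


nested_rows = conn.execute(nested).fetchall()
joined_rows = conn.execute(joined).fetchall()

for label, sql in (("nested", nested), ("joined", joined)):
    print(label, show_plan(sql))
```

On toy data like this the plans say little about speed; the point is only the habit of inspecting the plan for each variant on your real tables before deciding which one to keep.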
Not really.
If you want to improve performance, do not forget to run ANALYZE on all tables periodically.
Your question is waaaaaaaaaaaaaaaaaaaaaaay too general. There isn't any inherent issue with using "nested" queries in Postgres, no matter how many levels deep. You need to post specific queries if you have issues.
Additionally...if you're designing a new system, then why 8.4 and not 9.0? And even on 8.4, you should update to 8.4.5.

How to answer this interview question? [closed]

Closed 11 years ago.
As I have between 1 and 2 years of experience, what should I say to this interview question?
What are the types of Normalization?
Should I list all the normal forms, or what?
Way too broad of a question for an interview - it could fill a small book. I would simply remember a few key points about the first 3 normal forms (4 and 5 for extra credit). Here's a somewhat decent summary of them.
If I were interviewing you, and asked the question, I would want to hear above anything else that most db designers strive for at least 3NF but should be able to deviate from that for X reasons. Knowing when to stray from normalization and why is way more important and telling than knowing the definitions.
Knowing the formal definitions of the normal forms and being able to give some real world examples would be an excellent answer to the question.
FWIW, I think it's a silly question to ask except when interviewing people straight from a University where there's not much to ask for but theory. One of the 1st things they taught me when they taught normalization was "we'll explain these [normalization] steps now, but keep in mind that once you understand it, you won't think in terms of normal forms because 3NF will come naturally". And they were right.
Much better interview questions would be "what's wrong with this schema?" and "design a schema for the following data...", because they show applied, practical knowledge of the underlying principles.
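As a small worked example of the "what's wrong with this schema?" style of question, here is a sketch using SQLite through Python's standard library. The tables and data are made up for illustration. The flat table repeats the customer's city on every order (the city depends on the customer, not on the order), and the second pair of tables is one possible 3NF fix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Flawed: customer_city depends on customer_name, not on order_id,
    -- so the same city is repeated on every order (a transitive dependency).
    CREATE TABLE orders_flat (
        order_id INTEGER PRIMARY KEY,
        customer_name TEXT,
        customer_city TEXT,
        amount REAL
    );
    INSERT INTO orders_flat VALUES
        (1, 'Ada', 'London', 10.0),
        (2, 'Ada', 'London', 25.0),
        (3, 'Linus', 'Helsinki', 7.5);

    -- One possible 3NF version: customer facts live in exactly one place.
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name TEXT UNIQUE,
        city TEXT
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        amount REAL
    );
    INSERT INTO customers (name, city)
        SELECT DISTINCT customer_name, customer_city FROM orders_flat;
    INSERT INTO orders
        SELECT f.order_id, c.customer_id, f.amount
        FROM orders_flat f JOIN customers c ON c.name = f.customer_name;
""")

# Moving a customer now touches one row instead of every one of their orders.
changed = conn.execute(
    "UPDATE customers SET city = 'Paris' WHERE name = 'Ada'"
).rowcount
cities = conn.execute("""
    SELECT o.order_id, c.city
    FROM orders o JOIN customers c USING (customer_id)
    ORDER BY o.order_id
""").fetchall()
print(changed, cities)
```

In an interview, being able to explain why the flat table invites update anomalies, and when you might keep it denormalized anyway (for example, for read-heavy reporting), is the kind of applied answer the posts above are asking for.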