Database optimization techniques for amateurs - RDBMS-agnostic

Can we get a list of basic optimization techniques going (anything from modeling to querying, creating indexes, views to query optimization). It would be nice to have a list of these, one technique per answer. As a hobbyist I would find this to be very useful, thanks.
And for the sake of not being too vague, let's say we are using a mainstream DB such as MySQL or Oracle, and that the DB will contain 500,000-1m or so records across ~10 tables, some with foreign key constraints, all using the most typical storage engines (eg: InnoDB for MySQL). And of course, the basics such as PKs are defined, as well as FK constraints.

Learn about indexes, and use them properly. Generally speaking*, follow these guidelines:
Every table should have a clustered index
Fields used for filters and sorts are good candidates for indexing
More selective fields are better candidates for indexing
For best performance on crucial queries, design "covering indexes" for those queries (see the sketch after this list)
Make sure your indexes are actually being used, and remove those that aren't
If your table has 15 fields, and you make 15 indexes, each with only a single field, you're doing it wrong :)
*There are some exceptions to these rules if you know what you're doing. My experience is with Microsoft SQL Server, but I would presume most of this advice still applies to other RDBMSs.
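For example, a minimal covering-index sketch (the orders table and its columns here are hypothetical, not from the question):

-- InnoDB clusters rows on the primary key, so the PK doubles as the clustered index.
CREATE TABLE orders (
    id          INT PRIMARY KEY,
    customer_id INT NOT NULL,
    status      VARCHAR(20) NOT NULL,
    created_at  DATETIME NOT NULL,
    total       DECIMAL(10,2) NOT NULL
);

-- A covering index for one crucial query: every column the query touches
-- is in the index, so the engine never has to visit the base table.
CREATE INDEX ix_orders_customer_status
    ON orders (customer_id, status, created_at, total);

-- EXPLAIN should report "Using index" for this query in MySQL.
SELECT created_at, total
FROM orders
WHERE customer_id = 42 AND status = 'shipped'
ORDER BY created_at;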

IMO, by far the best optimization is to have the data model fit the problem domain for which it was built. When it does not, the resulting symptom is difficult-to-write or convoluted queries to get the information desired, and that typically rears its head when reports are built against the database. Thus, in designing a database it helps to have an idea of the types and nature of the information, such as reports, that users will want from the system.

When talking database design, check out database normalization, e.g. the Wikipedia article on normal forms.
If you have a good design and still you need to optimize for performance, try Denormalisation.
If you have specific needs which the relational model does not cover efficiently, look at the other models covered by the term NoSQL.
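To make the denormalisation point concrete, a minimal sketch (table and column names are made up for illustration), trading redundancy for one less join on a hot read path:

-- Cache the customer's name on the order row so a frequent listing query
-- no longer needs to join customers; the cost is keeping the copy in sync.
ALTER TABLE orders ADD COLUMN customer_name VARCHAR(100);

-- MySQL's multi-table UPDATE syntax backfills the existing rows.
UPDATE orders o
JOIN customers c ON c.id = o.customer_id
SET o.customer_name = c.name;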

Some query/schema optimizations:
Be mindful when using DISTINCT or GROUP BY. I find that many new developers will use DISTINCT in places where it really is not needed, or where the query could be rewritten more efficiently using an EXISTS clause or a derived query (see the sketch after this list).
Be mindful of Left Joins. All too often I find new SQL developers will ignore the schema in place and use Left Joins where they really are not necessary. For example:
Select *
From Orders
Left Join Customers
On Customers.Id = Orders.CustomerId
If Orders.CustomerId is a required column, then it is not necessary to use a left join.
Be a student of new features. Currently, MySQL does not support common-table expressions, which means that some types of queries are cumbersome and probably slower to write than they would be if CTEs were supported. However, that will not be true forever. Keep up on new syntax features in MySQL which might be used to make existing queries more efficient.
You do not have to use surrogate keys everywhere. There might be tables better suited to an intelligent key (e.g. US State abbreviations, Currency Codes etc) which would enable developers to avoid additional joins in many cases.
If possible, find ways of archiving data to an OLAP or reporting server. The smaller you can make the production data, the faster it will run.
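On the DISTINCT point above, a sketch of the kind of rewrite I mean (hypothetical table names):

-- De-duplicating a join with DISTINCT forces the engine to sort or hash
-- the whole intermediate result...
SELECT DISTINCT c.Id, c.Name
FROM Customers c
JOIN Orders o ON o.CustomerId = c.Id;

-- ...whereas EXISTS only asks whether at least one match exists, so the
-- engine can stop probing at the first hit.
SELECT c.Id, c.Name
FROM Customers c
WHERE EXISTS (SELECT 1 FROM Orders o WHERE o.CustomerId = c.Id);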

A design that concisely models your problem is always a good start. Overgeneralizing the data model can lead to performance problems. For example, I've heard reports of projects striving for uber-flexibility that use the RDBMS as a dumb "name/value" store - and resulting performance was appalling.
Once a good design is in place, use the tools provided by the RDBMS to help it achieve good performance: single-field PKs (no composites) with the composite business key as a unique-constrained index, and appropriate data types, e.g. numeric types for numeric values rather than char or similar. The physical attributes of the hardware the RDBMS runs on should also be considered, since the bulk of query time is often disk I/O - but of course don't take this for granted - use a profiler to find out where the time is going.
Depending upon the update/query ratio, materialized views/indexed views can be useful in improving performance for slow running queries. A poor-man's alternative is to use triggers to invoke a procedure that populates the table with a result of a slow-running, infrequently-changed view.
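A sketch of that poor-man's alternative in MySQL (the table, columns and trigger name are all illustrative):

-- A summary table maintained by a trigger, standing in for a materialized view.
CREATE TABLE order_totals (
    customer_id INT PRIMARY KEY,
    order_count INT NOT NULL,
    total_spent DECIMAL(12,2) NOT NULL
);

DELIMITER //
CREATE TRIGGER trg_orders_after_insert AFTER INSERT ON orders
FOR EACH ROW
BEGIN
    INSERT INTO order_totals (customer_id, order_count, total_spent)
    VALUES (NEW.customer_id, 1, NEW.total)
    ON DUPLICATE KEY UPDATE
        order_count = order_count + 1,
        total_spent = total_spent + NEW.total;
END//
DELIMITER ;

A slow GROUP BY over the full orders table then becomes a primary-key lookup against order_totals. Corresponding UPDATE/DELETE triggers would be needed if order rows ever change.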
Query optimization is a bit of a black art since it is often database-dependent, but some rules of thumb are given here - Optimizing SQL.
Finally, although possibly outside the intended scope of your question, use a good data access layer in your application, and avoid the temptation to roll your own - there are surely tested and performant implementations available for all major languages. Use of caching at the data access layer, middle tier and application layer can help improve performance considerably.

Use fewer queries whenever possible. Use JOINs, and group your tables so that a single query gives you your results.
A good example is the Modified Preorder Tree Traversal (MPTT) technique for getting all of a tree node's parents, ordered, in a single query.
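A minimal MPTT sketch (schema and values are hypothetical): each node stores left/right bounds, and the whole ancestor chain falls out of one self-join, because a node's bounds always nest inside the bounds of every one of its ancestors:

CREATE TABLE category (
    id   INT PRIMARY KEY,
    name VARCHAR(50) NOT NULL,
    lft  INT NOT NULL,
    rgt  INT NOT NULL
);

-- All ancestors of node 7, ordered from the root down, in a single query.
SELECT parent.name
FROM category AS node
JOIN category AS parent
  ON node.lft BETWEEN parent.lft AND parent.rgt
WHERE node.id = 7
  AND parent.id <> node.id
ORDER BY parent.lft;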

Take a holistic approach to optimization.
Consider the impact of slow disks, network latency, lack of memory, and server load.


Recommendations for a DBMS for an EAV system with mostly insert and select operation needs on a .NET stack

In the project I have been working on, the data modeling requirements are:
A system consisting of N number of clients, each having N number of events. An event is an entity with a required name and a timestamp at which it occurs. Optionally, an event may have N number of properties (key/value pairs) defining attributes that a client wants to store with that particular instance of the event.
The system will have mostly:
inserts – events are logged but never updated.
selects – reports/actions will be generated/executed based on events and properties of any possible combinations.
The requirements reflect an entity-attribute-value (EAV) data model. After researching for some time, I feel that a relational DBMS like SQL Server might not be a good fit for this. (Correct me if I'm wrong!)
So I'm leaning toward a NoSQL option like MongoDB/CouchDB/RavenDB etc.
My questions are:
What is the best fit among the available NoSQL solutions, given my system's heavy insert/select needs?
I'm also open to a relational option if these requirements can be translated into a relational schema. I personally doubt this, but after reading performance DBA answers (like the one referenced here), I got curious. However, I couldn't figure out an optimal relational model for my requirements myself, perhaps because the system is rather generic (see the edit below).
thanks!
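Edit: for concreteness, the straightforward relational translation I could come up with looks roughly like this (just a sketch; all the names are mine). It needs one self-join per property predicate, which is part of why I doubt it scales:

CREATE TABLE client (
    id   INT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

CREATE TABLE event (
    id          BIGINT PRIMARY KEY,
    client_id   INT NOT NULL REFERENCES client(id),
    name        VARCHAR(100) NOT NULL,
    occurred_at DATETIME NOT NULL
);

-- The EAV part: one row per optional property of an event.
CREATE TABLE event_property (
    event_id BIGINT NOT NULL REFERENCES event(id),
    prop_key VARCHAR(100) NOT NULL,
    prop_val VARCHAR(400),
    PRIMARY KEY (event_id, prop_key)
);

-- Filtering on two properties already takes two joins.
SELECT e.id, e.name, e.occurred_at
FROM event e
JOIN event_property p1
  ON p1.event_id = e.id AND p1.prop_key = 'country' AND p1.prop_val = 'US'
JOIN event_property p2
  ON p2.event_id = e.id AND p2.prop_key = 'browser' AND p2.prop_val = 'IE9';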
MongoDB really shines when you write unstructured data to it (like your event). Also, it is able to sustain pretty heavy write load. However, it's not very good for reporting. At least, for reporting in the traditional sense.
So, if your reporting needs are simple, you might get away with some simple map-reduce jobs. Otherwise you can export data to a relational database (nightly job, for example) and report the hell out of it.
Such a hybrid solution is pretty common (in my experience).

NoSQL & AdHoc Queries - Millions of Rows

I currently run a MySQL-powered website where users promote advertisements and gain revenue every time someone completes one. We log every time someone views an ad ("impression"), every time a user clicks an ad ("click"), and every time someone completes an ad ("lead").
Since we get so much traffic, we have millions of records in each of these respective tables. We then have to query these tables to let users see how much they have earned, so we end up performing multiple queries on tables with millions and millions of rows multiple times in one request, hundreds of times concurrently.
We're looking to move away from MySQL to a key-value store or something along those lines. We need something that will let us store all these millions of rows, query them in milliseconds, and MOST IMPORTANTLY, run ad-hoc queries where we can query on any single column, so we could do things like:
SELECT * FROM leads WHERE country = 'US' AND user_id = 501 (the NoSQL equivalent, obviously)
SELECT * FROM clicks WHERE ad_id = 1952 AND user_id = 200 AND country = 'GB'
etc.
Does anyone have any good suggestions? I was considering MongoDB or CouchDB, but I'm not sure if they can handle querying millions of records multiple times a second and the type of ad-hoc queries we need.
Thanks!
With those requirements, you are probably better off sticking with SQL and setting up replication/clustering if you are running into load issues. You can set up indexing on a document database so that those queries are possible, but you don't really gain anything over your current system.
NoSQL systems generally improve performance by leaving out some of the more complex features of relational systems. This means that they will only help if your scenario doesn't require those features. Running ad hoc queries on tabular data is exactly what SQL was designed for.
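If you do stay on MySQL, a sketch of what that indexing might look like for the example queries in the question (column names assumed from the question; equality columns lead each index):

-- Each composite index matches one hot query shape, so these lookups stay
-- fast even with millions of rows per table.
CREATE INDEX ix_leads_user_country ON leads (user_id, country);
CREATE INDEX ix_clicks_ad_user_country ON clicks (ad_id, user_id, country);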
CouchDB's map/reduce is incremental which means it only processes a document once and stores the results.
Let's assume, for a moment, that CouchDB is the slowest database in the world. Your first query with millions of rows takes, maybe, 20 hours. That sounds terrible. However, your second query, your third query, your fourth query, and your hundredth query will take 50 milliseconds, perhaps 100 including HTTP and network latency.
You could say CouchDB fails the benchmarks but gets honors in the school of hard knocks.
I would not worry about performance, but rather if CouchDB can satisfy your ad-hoc query requirements. CouchDB wants to know what queries will occur, so it can do the hard work up-front before the query arrives. When the query does arrive, the answer is already prepared and out it goes!
All of your examples are possible with CouchDB. A so-called merge-join (lots of equality conditions) is no problem. However CouchDB cannot support multiple inequality queries simultaneously. You cannot ask CouchDB, in a single query, for users between age 18-40 who also clicked fewer than 10 times.
The nice thing about CouchDB's HTTP and Javascript interface is, it's easy to do a quick feasibility study. I suggest you try it out!
Most people would probably recommend MongoDB for a tracking/analytic system like this, for good reasons. You should read the „MongoDB for Real-Time Analytics” chapter from the „MongoDB Definitive Guide” book. Depending on the size of your data and scaling needs, you could get all the performance, schema-free storage and ad-hoc querying features. You will need to decide for yourself if issues with durability and unpredictability of the system are risky for you or not.
For a simpler tracking system, Redis would be a very good choice, offering rich functionality, blazing speed and real durability. To get a feel for how such a system would be implemented in Redis, see this gist. The downside is that you'd need to define all the „indices" yourself, not get them for „free", as is the case with MongoDB. Nevertheless, there's no free lunch, and MongoDB indices are definitely not a free lunch.
I think you should have a look at what ElasticSearch would enable you to do:
Blazing speed
Schema-free storage
Sharding and distributed architecture
Powerful analytic primitives in the form of facets
Easy implementation of „sliding window”-type of data storage with index aliases
It is at heart a „fulltext search engine", but don't get confused by that. Read the „Data Visualization with ElasticSearch and Protovis" article for a real-world use case of ElasticSearch as a data mining engine.
Have a look at these slides for a real-world use case of the „sliding window" scenario.
There are many client libraries for ElasticSearch available, such as Tire for Ruby, so it's easy to get off the ground with a prototype quickly.
For the record (with all due respect to @jhs :), based on my experience, I cannot imagine an implementation where CouchDB is a feasible and useful option. It would be an awesome backup store for your data, though.
If your working set can fit in memory, and you index the right fields in the document, you'd be all set. Your ask is not very typical, and I am sure that with proper hardware, the right collection design (denormalize!) and indexing, you should be good to go. Read up on Mongo querying, and use explain() to test your queries. Stay away from IN and NOT IN clauses; that'd be my suggestion.
It really depends on your data sets. The number one rule of NoSQL design is to define your query scenarios first. Once you really understand how you want to query the data, then you can look into the various NoSQL solutions out there. The default unit of distribution is the key. Therefore you need to remember that you must be able to split your data between your node machines effectively, otherwise you will end up with a horizontally scalable system with all the work still being done on one node (albeit better queries depending on the case).
You also need to think back to the CAP theorem: most NoSQL databases are eventually consistent (AP) or give up availability instead (CP), while traditional relational DBMSs are CA. This will impact the way you handle data and the creation of certain things; for example, key generation can become tricky.
Also remember that in some systems, such as HBase, there is no indexing concept. All your indexes will need to be built by your application logic, and any updates and deletes will need to be managed as such. With Mongo you can actually create indexes on fields and query them relatively quickly; there is also the possibility of integrating Solr with Mongo. You don't just need to query by ID in Mongo like you do in HBase, which is a column-family store (aka a Google BigTable-style database) where you essentially have nested key-value pairs.
So once again it comes down to your data: what you want to store, how you plan to store it, and most importantly, how you want to access it. The Lily project looks very promising. In the work I am involved with, we take a large amount of data from the web and we store it, analyse it, strip it down, parse it, analyse it, stream it, update it, etc. We don't just use one system but many, each best suited to the job at hand. For this process we use different systems at different stages, as this gives us fast access where we need it, provides the ability to stream and analyse data in real time and, importantly, lets us keep track of everything as we go (as data loss in a prod system is a big deal). I am using Hadoop, HBase, Hive, MongoDB, Solr, MySQL and even good old text files. Remember that productionizing a system using these technologies is a bit harder than installing MySQL on a server; some releases are not as stable, and you really need to do your testing first. At the end of the day it really depends on the level of business resistance and the mission-critical nature of your system.
Another path that no one has mentioned thus far is NewSQL - i.e. horizontally scalable RDBMSs... There are a few out there, like MySQL Cluster (I think) and VoltDB, which may suit your cause.
Again, it comes down to understanding your data and your access patterns. NoSQL systems are also Non-Rel, i.e. non-relational, and are therefore better suited to non-relational data sets. If your data is inherently relational and you need SQL query features that really need to do things like Cartesian products (aka joins), then you may well be better off sticking with Oracle and investing some time in indexing, sharding and performance tuning.
My advice would be to actually play around with a few different systems. However, for your use case I think a column-family database may be the best solution; I think there are a few places which have implemented similar solutions to very similar problems (I think the NYTimes is using HBase to monitor user page clicks). Another great example is Facebook and the like; they are using HBase for this. There is a really good article here which may help you along your way and further explain some points above. http://highscalability.com/blog/2011/3/22/facebooks-new-realtime-analytics-system-hbase-to-process-20.html
Final point: NoSQL systems are not the be-all and end-all. Putting your data into a NoSQL database does not mean it's going to perform any better than MySQL, Oracle or even text files... For example, see this blog post: http://mysqldba.blogspot.com/2010/03/cassandra-is-my-nosql-solution-but.html
I'd have a look at:
MongoDB - Document - CP
CouchDB - Document - AP
Redis - In memory key-value (not column family) - CP
Cassandra - Column Family - Available & Partition Tolerant (AP)
HBase - Column Family - Consistent & Partition Tolerant (CP)
Hadoop/Hive - Also have a look at Hadoop streaming...
Hypertable - Another CF CP DB.
VoltDB - A really good looking product, a relational database that is distributed and might work for your case (may be an easier move). They also seem to provide enterprise support, which may be more suited for a prod env (i.e. it gives business users a sense of security).
Anyway, that's my 2c. Playing around with the systems is really the only way you're going to find out what really works for your case.

Example of a task that a NoSQL database can't handle (if any)

I would like to test the NoSQL world. This is just curiosity, not an absolute need (yet).
I have read a few things about the differences between SQL and NoSQL databases. I'm convinced of the potential advantages, but I'm a little worried about cases where NoSQL is not applicable. If I understand correctly, NoSQL databases essentially lack ACID properties.
Can someone give an example of a real-world operation (for example an e-commerce site, or a scientific application, or...) that an ACID relational database can handle but where a NoSQL database could fail miserably, either systematically with some kind of race condition or because of a power outage, etc.?
The perfect example would be something where there can't be any workaround without modifying the database engine. Examples where a NoSQL database just performs poorly will eventually be another question; here I would like to see when, theoretically, we just can't use such technology.
Maybe finding such an example is database specific. If this is the case, let's take MongoDB to represent the NoSQL world.
Edit:
To clarify this question: I don't want a debate about which kind of database is better for certain cases. I want to know if this technology can be an absolute dead end in some cases, because no matter how hard we try, some of the features that a SQL database provides cannot be implemented on top of NoSQL stores.
Since there are many NoSQL stores available, I can accept picking an existing NoSQL store as a reference, but what interests me most is the minimum subset of features a store should provide to make implementing higher-level features possible (e.g., can transactions be implemented with a store that doesn't provide X...).
This question is a bit like asking what kind of program cannot be written in an imperative/functional language. Any Turing-complete language can express every program that can be computed by a Turing machine. The question is: do you, as a programmer, really want to write an accounting system for a Fortune 500 company in non-portable machine instructions?
In the end, NoSQL can do anything SQL-based engines can; the difference is that you as a programmer may be responsible for logic in something like Redis that MySQL gives you for free. SQL databases take a very conservative view of data integrity. The NoSQL movement relaxes those standards to gain better scalability and to make tasks that are common to web applications easier.
MongoDB (my current preference) makes replication and sharding (horizontal scaling) easy and inserts very fast, and drops the requirement for a strict schema. In exchange, users of MongoDB must code around slower queries when an index is not present, implement transactional logic in the app (perhaps with three-phase commits), and take a hit on storage efficiency.
CouchDB has similar trade-offs but also sacrifices ad-hoc queries for the ability to work with data off-line then sync with a server.
Redis and other key-value stores require the programmer to write much of the index and join logic that is built into SQL databases. In exchange, an application can leverage domain knowledge about its data to make indexes and joins more efficient than the general solution SQL would require. Redis also requires all data to fit in RAM, but in exchange gives performance on par with Memcache.
In the end, you really can do everything MySQL or Postgres does with nothing more than OS file system commands (after all, that is how the people who wrote these database engines did it). It all comes down to what you want the data store to do for you and what you are willing to give up in return.
Good question. First a clarification. While the field of relational stores is held together by a rather solid foundation of principles, with each vendor choosing to add value in features or pricing, the non-relational (nosql) field is far more heterogeneous.
There are document stores (MongoDB, CouchDB) which are great for content management and similar situations where you have a flat set of variable attributes that you want to build around a topic. Take site-customization. Using a document store to manage custom attributes that define the way a user wants to see his/her page is well suited to the platform. Despite their marketing hype, these stores don't tend to scale into terabytes that well. It can be done, but it's not ideal. MongoDB has a lot of features found in relational databases, such as dynamic indexes (up to 40 per collection/table). CouchDB is built to be absolutely recoverable in the event of failure.
There are key/value stores (Cassandra, HBase...) that are great for highly-distributed storage. Cassandra for low-latency, HBase for higher-latency. The trick with these is that you have to define your query needs before you start putting data in. They're not efficient for dynamic queries against any attribute. For instance, if you are building a customer event logging service, you'd want to set your key on the customer's unique attribute. From there, you could push various log structures into your store and retrieve all logs by customer key on demand. It would be far more expensive, however, to try to go through the logs looking for log events where the type was "failure" unless you decided to make that your secondary key. One other thing: The last time I looked at Cassandra, you couldn't run regexp inside the M/R query. Means that, if you wanted to look for patterns in a field, you'd have to pull all instances of that field and then run it through a regexp to find the tuples you wanted.
Graph databases are very different from the two above. Relations between items (objects, tuples, elements) are fluid. They don't scale into terabytes, but that's not what they are designed for. They are great for asking questions like "hey, how many of my users like the color green? Of those, how many live in California?" With a relational database, you would have a static structure. With a graph database (I'm oversimplifying, of course), you have attributes and objects. You connect them as makes sense, without schema enforcement.
I wouldn't put anything critical into a non-relational store. Commerce, for instance, where you want guarantees that a transaction is complete before delivering the product. You want guaranteed integrity (or at least the best chance of guaranteed integrity). If a user loses his/her site-customization settings, no big deal. If you lose a commerce transaction, big deal. There may be some who disagree.
I also wouldn't put complex structures into any of the above non-relational stores. They don't do joins well at-scale. And, that's okay because it's not the way they're supposed to work. Where you might put an identity for address_type into a customer_address table in a relational system, you would want to embed the address_type information in a customer tuple stored in a document or key/value. Data efficiency is not the domain of the document or key/value store. The point is distribution and pure speed. The sacrifice is footprint.
There are other subtypes of the family of stores labeled as "nosql" that I haven't covered here. There are a ton (122 at last count) of different projects focused on non-relational solutions to data problems of various types. Riak is yet another one that I keep hearing about and can't wait to try out.
And here's the trick. The big-dollar relational vendors have been watching and chances are, they're all building or planning to build their own non-relational solutions to tie in with their products. Over the next couple years, if not sooner, we'll see the movement mature, large companies buy up the best of breed and relational vendors start offering integrated solutions, for those that haven't already.
It's an extremely exciting time to work in the field of data management. You should try a few of these out. You can download Couch or Mongo and have them up and running in minutes. HBase is a bit harder.
In any case, I hope I've informed without confusing, that I have enlightened without significant bias or error.
RDBMSes are good at joins, NoSQL engines usually aren't.
NoSQL engines are good at distributed scalability, RDBMSes usually aren't.
RDBMSes are good at data validation constraints, NoSQL engines usually aren't.
NoSQL engines are good at flexible and schema-less approaches, RDBMSes usually aren't.
Both approaches can solve either set of problems; the difference is in efficiency.
Probably the answer to your question is that MongoDB can handle any task (and SQL too). But in some cases it's better to choose MongoDB, in others a SQL database. You can read about the advantages and disadvantages here.
Also, as @Dmitry said, MongoDB opens the door to easy horizontal and vertical scaling with replication & sharding.
RDBMSs enforce strong consistency, while most NoSQL stores are eventually consistent. So at a given point in time, data read from a NoSQL DB might not represent the most up-to-date copy of that data.
A common example is a bank transaction: when a user withdraws money, node A is updated with this event; if at the same time node B is queried for this user's balance, it can return an outdated balance. This can't happen in an RDBMS, as the consistency attribute guarantees that data is updated before it can be read.
RDBMSs are really good at quickly aggregating sums, averages, etc. from tables, e.g. SELECT SUM(x) FROM y WHERE z. It's something that is surprisingly hard to do in most NoSQL databases if you want an answer at once. Some NoSQL stores provide map/reduce as a way of solving the same thing, but it is not real-time in the same way it is in the SQL world.
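A slightly fuller example of the kind of ad-hoc aggregate that is a one-liner in SQL (table and column names are made up for illustration):

-- Per-user earnings in one statement; many NoSQL stores would need a
-- precomputed view or a map/reduce job to answer this at once.
SELECT user_id, COUNT(*) AS leads, SUM(amount) AS earned, AVG(amount) AS avg_payout
FROM leads
WHERE country = 'US'
GROUP BY user_id;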

NoSQL databases alternatives/patterns/equivalences to Relational DB

From what I saw in this video...
http://www.youtube.com/watch?v=LhnGarRsKnA
pretty much all the traditional RDBMS operations (JOINs, GROUP BY, HAVING, etc.) can be done in NoSQL databases through a combination of MapReduce/denormalization techniques.
Is there any article/document that has all these equivalences clearly described? Something like... the equivalence of a JOIN in a NoSQL database would be... blah blah.
I just can't find this kind of documentation anywhere :(
There's a reason you can't find that type of documentation. For a start, unlike SQL, there is no such thing as a standard NoSQL database. Have you tried searching for specific NoSQL data stores?
Also, trying to convert relational operations to non-relational systems will just get you into trouble. Instead, you need to look at what you are trying to do with those relational operations. For example, is that GROUP BY sorting a list into categories, or handling hierarchical objects because you don't have multi-valued fields? Is that join assembling a single object stored across multiple tables, or calculating a set intersection?
One of the biggest strengths of SQL is that any data can be represented in a standard way and any query can be run on that data. It won't necessarily be the best fit for that data, but it is standard and there is a single correct answer for almost any question. NoSQL is mostly about being able to optimize your data store for what you actually need by sacrificing the generality of SQL. That may be performance, handling a large dataset, handling inconsistent data, or just simpler code. In short you need to understand your requirements and the tradeoffs involved in optimizing for them rather than just choosing SQL by default.
Your best option is to pick a datastore that fits what you need (a good comparison of high level features is at
http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis ) and look for some examples of how systems are designed using that data store. With luck you will find something close to what you are working on.
This article has some good info on NoSQL vs using SQL
http://www.develop.com/mongoDB
Be very careful... you probably don't want to join a 1-trillion-row table with anything. Think about patterns for partitioning your data. For example, with playOrm you can partition your data and do S-SQL (Scalable SQL), so you can do all the joins you want within a partition, which in many OLTP apps is exactly what you need: if your customers are businesses, each business is in its own partition, and you can join all you want across the tables related to that one business.
Here is a list of patterns to help you gain ground in the NoSQL world... (it's a work in progress).
https://github.com/deanhiller/playorm/wiki/Patterns-Page

Why “Set based approaches” are better than the “Procedural approaches”?

I am very eager to know the real cause, though I have gathered some knowledge from googling.
Thanks in advance
Because SQL is a really poor language for writing procedural code, and because the SQL engine, storage, and optimizer are designed to make it efficient to assemble and join sets of records.
(Note that this isn't just applicable to SQL Server, but I'll leave your tags as they are)
Because, in general, the hundreds of man-years of development time that have gone into the database engine and optimizer, and the fact that it has access to real-time statistics about the data, have resulted in it being better than the user at working out the best way to process the data, for a given request.
Therefore, by saying what we want to achieve (with a set-based approach) and letting it decide how to do it, we generally achieve better results than by spelling out exactly how to process the data, line by line.
For example, suppose we have a simple inner join from table A to table B. At design time, we generally don't know 'which way round' will be most efficient to process: keep a list of all the values on the A side, and go through B matching them, or vice versa. But the query optimizer will know at runtime both the numbers of rows in the tables, and also the most recent statistics may provide more information about the values themselves. So this decision is obviously better made at runtime, by the optimizer.
Finally, note that I have put a number of 'generally's in this post - there will always be times when we know better than the optimizer, and for such times we can provide hints (NOLOCK etc.).
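As a concrete (if contrived) illustration in T-SQL, assuming a hypothetical Orders table, compare spelling out the work row by row with simply stating the result we want:

-- Procedural: a cursor visits one row at a time.
DECLARE @id INT;
DECLARE cur CURSOR FOR SELECT Id FROM Orders WHERE Shipped = 0;
OPEN cur;
FETCH NEXT FROM cur INTO @id;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE Orders SET Priority = 1 WHERE Id = @id;
    FETCH NEXT FROM cur INTO @id;
END
CLOSE cur;
DEALLOCATE cur;

-- Set-based: one statement; the optimizer chooses the access path.
UPDATE Orders SET Priority = 1 WHERE Shipped = 0;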
Set-based approaches are declarative, so you don't describe the way the work will be done, only what you want the result to look like. The server can decide between several strategies for how to comply with your request, and hopefully choose one that is efficient.
If you write procedural code, that code will at best be less than optimal in some situations.
Because using a set-based approach to SQL development conforms to the design of the data model. SQL is a very set-based language, used to build sets, subsets, unions, etc, from data. Keeping that in mind while developing in TSQL will generally lead to more natural algorithms. TSQL makes many procedural commands available that don't exist in plain SQL, but don't let that switch you to a procedural methodology.
This makes me think of one of my favorite quotes from Rob Pike in Notes on Programming C:
Data dominates. If you have chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.
SQL databases and the way we query them are largely set-based. Thus, so should our algorithms be.
From an even more tangible standpoint, SQL servers are optimized with set-based approaches in mind. Indexing, storage systems, query optimizers, and other optimizations made by various SQL database implementations will do a much better job if you simply tell them what data you need, through a set-based approach, rather than dictating how you want to get it procedurally. Let the SQL engine worry about the best way to get you the data; you just worry about telling it what data you want.
As everyone else has explained: let the SQL engine help you; believe me, it is very smart.
If you are not used to writing set-based solutions and are used to developing procedural code, you will have to spend some time before you write well-formed set-based solutions. This is a barrier for most people. A tip if you wish to start coding set-based solutions: stop thinking about what you can do with rows, start thinking about what you can do with columns, and practice with functional languages.