I am trying to find out whether there are any differences between SQL Server 2014 Enterprise and Standard editions in the context of T-SQL itself. I am aware that there are tooling and hardware limitations, for example.
I need to know if there are any T-SQL limitations like:
a query running faster on Enterprise
an index seek/scan running faster on Enterprise
updatable columnstore indexes being available in Enterprise only
According to this article comparing Programmability between the editions, there are none. Still, I want to double-check and be sure that performance will be the same (on the same hardware) and that I will not need to change anything in the T-SQL code.
An example of such a difference is the direct querying of indexed views (using the NOEXPAND hint):
In SQL Server Enterprise, the query optimizer automatically considers
the indexed view. To use an indexed view in the Standard edition or
the Datacenter edition, the NOEXPAND table hint must be used.
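For illustration, a minimal T-SQL sketch of that hint (the view and column names here are hypothetical, not from the documentation); on Standard edition the optimizer will only use the view's index when the hint is present:

    -- Assumes dbo.vSalesSummary is an indexed view (it must have a unique clustered index).
    SELECT SUM(TotalDue) AS TotalSales
    FROM dbo.vSalesSummary WITH (NOEXPAND)   -- force use of the view's index on Standard edition
    WHERE SalesYear = 2014;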
A number of features that are categorized under "Data Warehouse" and "Scalability and Performance" affect Programmability.
This comes into play when a developer modifies their T-SQL syntax to take advantage of a particular feature. The syntax will not necessarily produce an error, but it may be less efficient. Partitioned tables, indexed views (as you mention), compression, and star joins, for example, will all affect the execution plan. The query optimizer is usually smart enough to find the best execution plan for a given edition, but that is not always the case.
It is also likely, if you're dealing with a large database, that the optimal indexing strategy will differ between Enterprise and Standard, which in turn might affect the query.
To the degree that different editions suggest different T-SQL syntax, the Standard Edition syntax is usually the more intuitive. I would also say that in most modern environments one is much more likely to be affected by the resource limitations than by query optimizer differences.
Related
My client has an existing PostgreSQL database with around 100 tables, and almost every table has one or more relationships to other tables. He's got around a thousand customers who use an app that hits that database.
Recently he hired a new frontend web developer, and that person is trying to tell him that we should throw out the PostgreSQL database and replace it with a MongoDB solution. That seems odd to me, but I don't have experience with MongoDB.
Are there any clear reasons why he should, or should not, make the change? Obviously I'm arguing against it and the other guy is arguing for it, but I would like to remove the "I like this one better" aspect from the argument and hear from the community about their experience with such things.
1) Performance
Over the past few years there have been several benchmarks comparing Postgres and Mongo.
Here you can find the most recent performance benchmark (Yahoo): https://www.slideshare.net/profyclub_ru/postgres-vs-mongo-postgres-professional (start with slide #58, where an overview of past benchmarks is given).
Note that, traditionally, MongoDB published benchmarks where write-ahead logging was not turned on, or fsync was even turned off, so those benchmarks were unfair: in such a state the database system doesn't wait for the filesystem, so TPS is high but the probability of losing data is also very high.
2) Flexibility – JSON
Postgres has had non-structured and semi-structured data types since 2003 (hstore, XML, array data types). It now has very strong JSON support with indexing (the jsonb data type): you can create partial indexes, functional indexes, index only part of a JSON document, or index whole documents in different ways (you can tweak an index to reduce its size and improve its speed).
More interestingly, with Postgres you can combine the relational approach and non-relational JSON data; see this talk again, https://www.slideshare.net/profyclub_ru/postgres-vs-mongo-postgres-professional, for details. This gives you a lot of flexibility and power (I wouldn't keep money-related or basic account-related data in JSON format).
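As a minimal sketch of that combination (the table and column names here are hypothetical), you can mix relational columns with a jsonb column and index the JSON part:

    CREATE TABLE customer_events (
        id          bigserial PRIMARY KEY,
        customer_id integer NOT NULL,          -- relational part
        payload     jsonb   NOT NULL           -- semi-structured part
    );

    -- GIN index over the whole JSON document
    CREATE INDEX customer_events_payload_idx
        ON customer_events USING gin (payload);

    -- Partial expression index covering only one kind of event
    CREATE INDEX customer_events_login_idx
        ON customer_events ((payload->>'ip'))
        WHERE payload->>'type' = 'login';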
3) Standards and costs of support
SQL is experiencing a rebirth now: NoSQL products have started to add SQL dialects, a lot of people are doing big data analysis with SQL, and you can even run machine learning algorithms inside an RDBMS (see the MADlib project, http://madlib.incubator.apache.org).
When you need to work with data, SQL was, is, and will for a long time remain the best language; so much is included in it that all other languages lag far behind. I recommend http://modern-sql.com/ to learn modern SQL features and https://use-the-index-luke.com (from the same author) to learn how to reach the best performance using SQL.
When Mongo needed to create a "BI connector", they also needed to speak SQL, so guess what they chose? https://www.linkedin.com/pulse/mongodb-32-now-powered-postgresql-john-de-goes
SQL isn't going anywhere; it is being extended with SQL/JSON now, and this means that, looking ahead, Postgres is an excellent choice.
4) Scalability
If your data size is up to several terabytes, it's easy to live on a "single master, multiple replicas" architecture, either on your own installation or in the cloud (Amazon RDS, Heroku, Google Cloud Platform, and recently Azure all support Postgres). There is an increasing number of solutions which help you work with a microservice architecture, have automatic failover, and/or shard your data. Here are just a few of them, actively developed and supported, in no particular order:
https://wiki.postgresql.org/wiki/PL/Proxy
https://github.com/zalando/spilo and https://github.com/zalando/patroni
https://github.com/dalibo/PAF
https://github.com/postgrespro/postgres_cluster
https://www.2ndquadrant.com/en/resources/bdr/
https://www.postgresql.org/docs/10/static/postgres-fdw.html
5) Extensibility
There are many more additional projects built to work with Postgres than with Mongo. You can work with literally any data type (including but not limited to time ranges, geospatial data, JSON, XML, and arrays), have index support for it, have ACID guarantees, and manipulate it using standard SQL. You can develop your own functions, data types, operators, index structures, and much more!
If your data is relational (and it appears that it is), it makes no sense whatsoever to use a non-relational DB (like MongoDB). You shouldn't underestimate the power and expressiveness of standard SQL queries.
On top of that, Postgres has full ACID support. And it can handle free-form JSON reasonably well, if that is that guy's primary motivation.
I am currently developing a software system that imports and normalizes historical data in various formats (XML, JSON, CSV). Right now we are using SQL Server, and we are looking for the best replacement for this tool (Postgres or NoSQL). 90% of the time, the (archived/historical/static) data is accessed via a web client in a read-only fashion, with users picking and choosing canned reports. Changes to the data only occur to correct incorrect information.
The replacement DB must be able to store and report on tens of millions of rows very quickly, and scale across multiple servers with ease (data replication, clustering, etc.). There must also be data integrity, so if I update one KPI (let's say Cost per Hour), then all the reports that rely on that KPI are updated accordingly.
Having no prior experience with NoSQL databases, I am wondering if it is even the right choice for reporting software. We would like to allow users to create their own custom reports, and that means being able to query any data as opposed to our canned reports, but I don't know if this throws a wrench into the comparison between SQL and NoSQL.
There are a few too many variables in the question to comfortably answer it in its entirety, but here's an attempt.
Your choice of SQL vs NoSQL should be based on data structure. Scalability is generally a second-tier concern; it is only slightly easier on some NoSQL platforms, and as always, it isn't free.
If you're looking to report on tens of millions of rows 'very quickly', you are seriously testing the limits of either approach. An RDBMS would give you a plethora of options at the cost of speed, while a NoSQL store, although quite fast at ingesting at that rate, would make you code most of the RDBMS smartness into your application. Choose your poison.
Updating a metric and 'automagically' updating reports is clearly business-logic smartness that shouldn't be tied to platform selection.
PostgreSQL has, in the recent past, really picked up a lot of arsenal to deal with such file formats (JSON et al.) and is clearly worth a try (sans easy scalability).
Having said that, you should really investigate Postgres' otherwise-forgotten asset, FDWs (foreign data wrappers). You could consider using a NoSQL setup to ingest large amounts of unstructured data, and then utilize Postgres' powerful semantics over it to create an asynchronous yet structured backend for your application. If done well, that could mean the best of both worlds.
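As a rough sketch of the idea (server, table, and column names are hypothetical; postgres_fdw is used here purely for illustration, and the same pattern applies to FDWs for NoSQL stores):

    -- One-time setup on the reporting Postgres instance
    CREATE EXTENSION postgres_fdw;

    CREATE SERVER ingest_srv
        FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'ingest-host', dbname 'staging', port '5432');

    CREATE USER MAPPING FOR CURRENT_USER
        SERVER ingest_srv
        OPTIONS (user 'reporter', password 'secret');

    -- Expose the raw staging data as a local (foreign) table
    CREATE FOREIGN TABLE raw_events (
        id      bigint,
        payload jsonb
    )
    SERVER ingest_srv
    OPTIONS (schema_name 'public', table_name 'raw_events');

    -- Periodically materialize it into a structured, query-friendly form
    CREATE MATERIALIZED VIEW events_normalized AS
    SELECT id,
           payload->>'kpi'              AS kpi_name,
           (payload->>'value')::numeric AS kpi_value
    FROM raw_events;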
My projects are currently SQL Server focused (with a little Postgres and MongoDB thrown in for fun). A recent project involving some configuration on Oracle reminded me of the complexity of implementing and managing Oracle RDBMS instances compared to the above.
Having dealt with DB2 on OS/2 many years ago, I decided to download a trial, and install it on CentOS for comparison. It was a fairly quick and easy implementation, including docs, and sample data.
Noting that DB2 LUW seems to get relatively little attention, I am wondering why? In certain editions, it is price competitive, and by many measures, is highly capable and scalable.
So, I am interested in knowing, if you use DB2 Express(-C), WSE, or EE, on Linux or Windows, could you share why (if it is your database of choice) ?
I work with DB2 for LUW across the spectrum: from high-end Enterprise Server Edition in large enterprises to DB2 Express-C in SMBs.
In my opinion DB2 Express-C is absolutely brilliant for the SMB market. There is virtually no functionality that you would need as an SMB which does not exist in Express-C, and all the major technology from the more expensive DB2 editions is there, including pureXML (which I use extensively) and (with the Express-C support at $3k per server) full HADR support.
Things which are not in Express-C are:
Oracle compatibility support (the ability to run Oracle PL/SQL rather than DB2's standard SQL PL): not an issue unless you plan to migrate an existing Oracle application. Note that many of the features underpinning this are available, including such things as the associative arrays you mentioned.
Deep Compression: the DB2 compression which, in my experience, can save up to 70% of disk space on DB2 ESE. But SMBs don't tend to have the amount of data which would justify the extra cost of the compression license even if you could buy it (you are talking about many terabytes of storage before it becomes worthwhile at the current price point). In any case, the only thing stopping you from using this compression on Express-C is that you can't buy a license for it.
Some of the partitioning capabilities are also not available in Express-C: but again, partitioning is something that is really only needed by the highest-end customers. In fact, at least one type of partitioning (DPF) is not even available with ESE: you have to buy InfoSphere Warehouse (what used to be called DB2 Data Warehousing Edition) to get it these days.
If you want those you'd have to buy DB2 ESE (at a major price premium).
There is one other case where I'd recommend something other than Express-C these days, and that is if you want the ultra-scalability of pureScale. This is an extra-cost option on DB2 ESE, but it actually comes included with WSE (limited only by the total number of processors you can have in the cluster).
Anyway, the bottom line is that I would recommend DB2 (and especially Express-C) to almost anyone these days. I think the reason you don't hear more of it is because IBM just doesn't do a good job of marketing it.
HTH
Phil Nelson
(teamdba#scotdb.com)
We use DB2 LUW at work (though, I speak only for me, not for work). I like that:
It's fast, and has neat tools that help you make your queries faster.
It has facilities for high availability (HADR).
It has XML support, which may or may not be useful to you (but we don't currently use that at work).
Its procedural language is easy to use (if rather lacking in features, especially for versions prior to 9.7).
It has excellent documentation.
(The decision to use DB2 at work was made long before I started there, so I can't comment on work's rationale for choosing it.)
The only thing I would add to Phil Nelson's excellent answer is that DB2 Express-C is currently unique among the no-cost commercial DBMS products in that it does not limit the size of the database. The newest versions of Microsoft's and Oracle's no-cost database engines top out at around 10-11GB of data.
I want to write some queries which will work in almost all databases without any SQLExceptions. So, where can I get the ANSI standards to write such queries?
Not sure that'll help you.
Vendors are touch and go as far as standards implementation goes, and often the standards themselves are imprecise enough that you could never write a query that would work with every implementation.
For example, SQL-92 defines the concatenation operator as ||, but neither MySQL nor MSSQL uses it (Oracle does). Vendor-independent string concatenation is impossible.
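To illustrate (assuming a simple two-column table; the table and column names here are hypothetical), the same concatenation has to be written differently per vendor:

    -- Standard SQL-92 / Oracle / PostgreSQL
    SELECT first_name || ' ' || last_name FROM people;

    -- SQL Server
    SELECT first_name + ' ' + last_name FROM people;

    -- MySQL
    SELECT CONCAT(first_name, ' ', last_name) FROM people;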
Similarly, a standard escape character is not specified, so how you handle that might not work with all vendors either.
Having said that:
SQL 92:
http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
Wiki article with links to SQL 99 ISO documents:
http://en.wikipedia.org/wiki/SQL:1999
From wikipedia:
The SQL standard is not freely available. The whole standard may be purchased from the ISO as ISO/IEC 9075(1-4,9-11,13,14):2008.
Nevertheless, I would not advise you to follow this strategy, because no database engine follows any SQL standard (SQL 99, 2003, etc.) to the letter. All of them take liberties in the way they handle instructions or define variables (for example, different engines handle case sensitivity differently when comparing two strings). A method that is very efficient with one engine can be terribly inefficient with another.
A suggestion would be to develop a standard group of queries and develop different classes that contain the specific implementation of that query for a certain target RDBMS.
Hope this helped
Check out the BNF of the core SQL grammars available at http://savage.net.au/SQL/
This is part of the answer - the rest, as pointed out by Kiranu and MattMitchell, is that different vendors implement the standard differently. No DBMS adheres perfectly to even SQL-92, though most are pretty close.
One observation: the SQL standard says nothing about indexes - so there is no standard syntax for creating an index. It also says nothing about how to create a database; each vendor has their own mechanisms for doing that.
The SQL-92 standard is probably the one you want to target. I believe it is supported by most of the major RDBMSs.
Here is a less terse link. Sample content:
PostgreSQL: Has views. Breaks standard by not allowing updates to views...
DB2: Conforms to at least SQL-92.
MSSQL: Conforms to at least SQL-92.
MySQL: Conforms to at least SQL-92.
Oracle: Conforms to at least SQL-92.
Informix: Conforms to at least SQL-92.
Something else you might consider, if you're using .NET, is to use the factory pattern in System.Data.Common which does a good job of abstracting provider specifics for a number of RDBMSs.
If you are trying to make a product that will work against multiple databases, I don't think trying to use only standard SQL is the way to go, as other answers have indicated, due to the different 'interpretations' of the standard. Instead you should, if possible, have some kind of data access layer in your application which has a different implementation specific to each database. Depending on what you are trying to do, there are tools such as Hibernate which will do a lot of the heavy lifting in this regard for you.
Can we get a list of basic optimization techniques going (anything from modeling to querying, creating indexes and views, to query optimization)? It would be nice to have a list of these, one technique per answer. As a hobbyist I would find this very useful, thanks.
And for the sake of not being too vague, let's say we are using a mainstream DB such as MySQL or Oracle, and that the DB will contain 500,000 to 1 million or so records across ~10 tables, some with foreign key constraints, all using the most typical storage engines (e.g. InnoDB for MySQL). And of course, the basics such as PKs are defined, as well as FK constraints.
Learn about indexes, and use them properly. Generally speaking*, follow these guidelines:
Every table should have a clustered index
Fields used for filters and sorts are good candidates for indexing
More selective fields are better candidates for indexing
For best performance on crucial queries, design "covering indexes" for those queries (see the sketch after this list)
Make sure your indexes are actually being used, and remove those that aren't
If your table has 15 fields, and you make 15 indexes, each with only a single field, you're doing it wrong :)
*There are some exceptions to these rules if you know what you're doing. My experience is with Microsoft SQL Server, but I would presume most of this advice still applies to other RDBMSs.
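For the covering-index point above, a minimal T-SQL sketch (the table and column names are hypothetical):

    -- Query to cover:
    --   SELECT OrderDate, TotalDue FROM dbo.Orders WHERE CustomerId = @Id;

    CREATE NONCLUSTERED INDEX IX_Orders_CustomerId_Covering
        ON dbo.Orders (CustomerId)        -- seek column
        INCLUDE (OrderDate, TotalDue);    -- included columns satisfy the SELECT list without lookups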
IMO, by far the best optimization is to have the data model fit the problem domain for which it was built. When it does not, the resulting symptom is difficult-to-write or convoluted queries to get the information desired, and that typically reveals itself when reports are built against the database. Thus, in designing a database it helps to have an idea of the types and nature of the information, such as reports, that users will want from the system.
When talking about database design, check out database normalization, e.g. the Wikipedia article: Normal forms.
If you have a good design and you still need to optimize for performance, try denormalization.
If you have specific needs which are not covered efficiently by the relational model, look at other models covered by the term NoSQL.
Some query/schema optimizations:
Be mindful when using DISTINCT or GROUP BY. I find that many new developers will use DISTINCT in places where it really is not needed, or where the query could be rewritten more efficiently using an EXISTS clause or a derived query (a sketch follows).
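For instance, a rough sketch (the table and column names are hypothetical):

    -- DISTINCT used only to collapse duplicates produced by the join:
    SELECT DISTINCT c.Id, c.Name
    FROM Customers c
    JOIN Orders o ON o.CustomerId = c.Id;

    -- An EXISTS check avoids producing the duplicate rows in the first place:
    SELECT c.Id, c.Name
    FROM Customers c
    WHERE EXISTS (SELECT 1 FROM Orders o WHERE o.CustomerId = c.Id);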
Be mindful of Left Joins. All too often I find new SQL developers will ignore the schema in place and use Left Joins where they really are not necessary. For example:
Select *
From Orders
Left Join Customers
    On Customers.Id = Orders.CustomerId
If Orders.CustomerId is a required (NOT NULL) column, then it is not necessary to use a left join; an inner join is sufficient, as shown below.
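The equivalent inner-join form (same hypothetical tables as above):

    Select *
    From Orders
    Inner Join Customers
        On Customers.Id = Orders.CustomerId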
Be a student of new features. Currently, MySQL does not support common table expressions, which means that some types of queries are cumbersome and probably slower to write than they would be if CTEs were supported. However, that will not be true forever. Keep up with new syntax features in MySQL which might be used to make existing queries more efficient.
You do not have to use surrogate keys everywhere. There might be tables better suited to a natural, intelligent key (e.g. US state abbreviations, currency codes, etc.), which would let developers avoid additional joins in many cases (a brief sketch follows).
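A minimal sketch of the idea (the table and column names are hypothetical):

    -- Natural key: the two-letter state code itself is the primary key,
    -- so referencing tables can display it without a join.
    CREATE TABLE States (
        StateCode  char(2)     NOT NULL PRIMARY KEY,   -- e.g. 'CA', 'NY'
        StateName  varchar(50) NOT NULL
    );

    CREATE TABLE Addresses (
        AddressId  int          NOT NULL PRIMARY KEY,
        Street     varchar(200) NOT NULL,
        StateCode  char(2)      NOT NULL REFERENCES States (StateCode)
    );

    -- No join to States is needed just to show the state code:
    SELECT Street, StateCode FROM Addresses;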
If possible, find ways of archiving data to an OLAP or reporting server. The smaller you can make the production data, the faster it will run.
A design that concisely models your problem is always a good start. Overgeneralizing the data model can lead to performance problems. For example, I've heard reports of projects striving for uber-flexibility that use the RDBMS as a dumb "name/value" store - and resulting performance was appalling.
Once a good design is in place, use the tools provided by the RDBMS to help it achieve good performance: single-field PKs (no composites), but composite business keys as an index with a unique constraint; appropriate data types, e.g. numeric types for numeric values rather than char or similar. Physical attributes of the hardware the RDBMS is running on should also be considered, since the bulk of query time is often disk I/O - but of course don't take this for granted; use a profiler to find out where the time is going.
Depending on the update/query ratio, materialized views/indexed views can be useful in improving performance for slow-running queries. A poor man's alternative is to use triggers to invoke a procedure that populates a table with the result of a slow-running, infrequently-changed view (a sketch follows).
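For the materialized-view option, a minimal sketch in PostgreSQL syntax (the table and column names are hypothetical; Oracle has its own equivalents, and SQL Server uses indexed views instead):

    -- Precompute a slow aggregate once, then query the result like a table
    CREATE MATERIALIZED VIEW order_totals AS
    SELECT CustomerId, SUM(TotalDue) AS TotalSpent
    FROM Orders
    GROUP BY CustomerId;

    -- Re-run the aggregate on a schedule, or whenever the base data changes enough
    REFRESH MATERIALIZED VIEW order_totals;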
Query optimization is a bit of a black art since it is often database-dependent, but some rules of thumb are given here - Optimizing SQL.
Finally, although possibly outside the intended scope of your question, use a good data access layer in your application, and avoid the temptation to roll your own - there are surely tested and performant implementations available for all major languages. Use of caching at the data access layer, middle tier and application layer can help improve performance considerably.
Use fewer queries whenever possible. Use JOINs, and group your tables so that a single query gives you your results.
A good example is the Modified Preorder Tree Traversal (MPTT) technique, which retrieves all of a tree node's ancestors, ordered, in a single query.
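A rough sketch of that ancestor query under MPTT (the table and column names are hypothetical; lft and rgt are the precomputed left/right bounds that MPTT stores for each node):

    -- All ancestors of the node with id = 42, ordered from root down to its parent
    SELECT parent.id, parent.name
    FROM categories AS node
    JOIN categories AS parent
      ON node.lft BETWEEN parent.lft AND parent.rgt
    WHERE node.id = 42
      AND parent.id <> node.id
    ORDER BY parent.lft;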
Take a holistic approach to optimization.
Consider the impact of slow disks, network latency, lack of memory, and server load.