Median/percentile OOTB in SQL Server 2017 - T-SQL

I am pretty sure later versions of SQL Server can calculate quantiles/percentiles as easily as using AVG() (the arithmetic mean) in conjunction with GROUP BY for some columns. I am almost certain I used this a couple of years ago instead of more complex approaches like this. Unfortunately, I cannot find any information despite extensive searches. Any help appreciated. Thanks!
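For reference, the one-liner being half-remembered is most likely PERCENTILE_CONT / PERCENTILE_DISC, which have existed since SQL Server 2012. They are window functions rather than plain aggregates, so they use WITHIN GROUP ... OVER (PARTITION BY ...) rather than GROUP BY; a minimal sketch against a hypothetical Sales(Region, Amount) table:

-- Median (interpolated) and 90th percentile (discrete) per region.
-- DISTINCT collapses the window-function output to one row per region.
SELECT DISTINCT
       Region,
       PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Amount)
           OVER (PARTITION BY Region) AS MedianAmount,
       PERCENTILE_DISC(0.9) WITHIN GROUP (ORDER BY Amount)
           OVER (PARTITION BY Region) AS P90Amount
FROM dbo.Sales;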

Related

Performance issues using psqlodbc in Excel/PowerQuery [duplicate]

This question already exists:
psqlodbc performance issues when using functions [duplicate]
Closed 1 year ago.
I already asked the question here, but it was wrongly closed.
I use psqlODBC and Power Query to load data from a PostgreSQL database (12.4) into Microsoft Excel. Recently I have been having performance issues I cannot explain. Some of the functions take extremely long (15 minutes and longer) to pull the data via ODBC, while others take only seconds. When running the same functions with pgAdmin, Beekeeper or psql from the command line, none of them takes more than 500 ms to execute.
Is there anything I can do to improve the performance of psqlODBC?
I found this answer stating that it might be possible that the aggregations are done client-side. That would be really bad, as the tables contain several million rows and I only want the aggregated columns sent back. Can someone give more information on that? I wasn't able to find anything about it.
What I've tried so far
I replaced a function with a view, which seems to solve the problem. However, this is not a general solution, as some functions contain logic that cannot be expressed in a view.
I used pg_stat_statements and auto_explain to find out if there is a problem in the functions. I did not find anything suspicious. As mentioned before, the problem only occurs when using psqlODBC, so I expect the problem to lie somewhere there.
I modified the psqlODBC configuration parameters. No success so far.
Details
The functions in PostgreSQL are usually defined like this:
CREATE FUNCTION func(parameter integer) RETURNS TABLE(<col1> <type1>, ...)
LANGUAGE 'plpgsql' COST 100 STABLE PARALLEL UNSAFE ROWS 500
AS $BODY$
BEGIN
    RETURN QUERY
    <SELECT ...>;
END
$BODY$;
I use functions instead of views because some of them contain logic that cannot be done in views.
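For illustration, the replacement mentioned above only worked for the simple cases where the function is just a thin wrapper around one query; roughly like this (all names are made up):

-- A parameterless function that merely wraps a query ...
CREATE FUNCTION order_totals() RETURNS TABLE(customer_id integer, total numeric)
LANGUAGE 'plpgsql' STABLE
AS $BODY$
BEGIN
    RETURN QUERY
    SELECT o.customer_id, sum(o.amount)
    FROM orders o
    GROUP BY o.customer_id;
END
$BODY$;

-- ... replaced by a view, which the planner can expand inline:
CREATE VIEW order_totals_v AS
SELECT o.customer_id, sum(o.amount) AS total
FROM orders o
GROUP BY o.customer_id;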
The server runs PostgreSQL 12.4; the client uses psqlODBC 12.02 and Excel 2016 (latest).
I am thankful for any advice on how I could solve this problem.
Thank you.

Drools for rating telco records

Has anyone successfully used Drools as a kind of "rating engine" before? What are your experiences?
I'm trying to process a couple of millions of records (of slightly different types) and apply rating/pricing to these records.
Rating would be based on tables or database lookups, as well as long if/then/else chains using the lookup data.
Traditional rating engines don't employ rule mechanisms in ways that I'm comfortable with...
thanks for your help
To provide a slightly more informative response (although your question can't really be answered from the very vague description you've given): your "rating" is just one of many names for what I would call a "classification problem", and it has been solved many times using Drools.
However, that is not to say that your problem, with its particular environmental flavour and expected performance (how fast do you need the 2M records processed?), is best solved with Drools - especially when the measure for deciding quality isn't settled. (For instance: is ease of maintenance more important than top efficiency?)
Go ahead and rig up a prototype and run a test to see how it goes. That will give you a more reliable answer than anything else. If someone says that something similar couldn't be done, it could be due to bad rule coding. If someone says that something similar was done successfully, it may not have had one of the quirks of your setup. And so on.

Which NoSQL technology can replace MOLAP cubes for instantaneous queries?

I was wondering if you could tell me which NoSQL db or technology/tools I should use for my scenario. We are looking at replacing our OLAP cubes based on SQL Server Analysis Services with an open source technology, because the data is getting too huge to manage and queries are taking too long to return. We have followed every rule in the book to shard the data and optimize the design of the cube using aggregations, partitions, etc., and still some of our distinct count queries take 1-2 minutes :( The fact table is roughly 250 GB, and there are 10-12 dimensions connected in star-schema fashion.
So we decided to give open source technologies like Hadoop/HBase/NoSQL dbs a try to see if they can solve our OLAP scenarios with minimal setup and onboarding.
Our main requirements for the new technology are
It has to return blazing-fast or instantaneous results for distinct count queries (< 2 secs)
Supports the concept of measures and dimensions (like in OLAP).
Support SQL like query language as many of our developers are SQL experts.
Ability to connect Excel/Tableau to visualize the data.
As there are so many new technologies and tools in the open source world today, I was hoping you could help point me in the right direction.
Note: I'm from the Apache Kylin team.
Please refer to the answers below, which may give you some ideas:
Our main requirements for the new technology are
It has to return blazing-fast or instantaneous results for distinct count queries (< 2 secs)
--Luke: A 90th-percentile query latency of under 5s is our current statistic. For < 2s on distinct count, how much data will you have? Would an approximate result be OK?
Supports the concept of measures and dimensions (like in OLAP).
--Luke: Kylin is a pure OLAP engine with dimension definitions (hierarchies are also supported) and measure definitions (Sum/Count/Min/Max/Avg/DistinctCount).
Support SQL like query language as many of our developers are SQL experts.
--Luke: Kylin supports an ANSI SQL interface (most SELECT functions).
Ability to connect Excel/Tableau to visualize the data.
--Luke: Kylin has an ODBC driver that works very well with Tableau; Excel/Power BI support will be coming soon.
Please let us know if you have more questions.
Thanks.
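For a flavour of that SQL interface, a typical Kylin query over a hypothetical star schema (a fact_sales table joined to a dim_date dimension; all names are made up) is plain SQL, with the distinct count answered from the cube's precomputed measure rather than by scanning the fact table:

-- Unique buyers and revenue per month, sliced by the date dimension.
SELECT d.year_month,
       COUNT(DISTINCT f.user_id) AS unique_buyers,
       SUM(f.amount) AS revenue
FROM fact_sales f
JOIN dim_date d ON f.date_id = d.date_id
GROUP BY d.year_month;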
Looks like Kylin (http://www.kylin.io/) is my answer. It has all the requirements I wanted and even more. I'm gonna give it a try now! :)

How to store opening hours semantically in NoSQL?

I'm wondering what the best way would be to store opening hours and to retrieve whether a certain place is open right now (or at a specific time). For humans, Mo-Fr 9am-5pm, Sa 10am-2pm is fine, but how can I get a computer to understand that and query it in a NoSQL / document-based database like Elasticsearch?
FWIW: David Smiley (one of the Solr/Lucene gurus) and I came up with a working solution in Solr (on paper - never implemented, at least by me). The solution could be somewhat simplified if you only require one slot per day of the week, which may be what you want.
http://lucene.472066.n3.nabble.com/Modeling-openinghours-using-multipoints-td4025336.html
The problem is that this is based on fairly new spatial stuff in Solr 4 (which stuff -> read the post), which I'm pretty sure hasn't made its way into ES yet, although I might be mistaken.
No guarantees, no docs :)
A straightforward alternative, if indeed you only have one slot per day of the week, is to have 14 dynamic fields, representing 7 opening and 7 closing hours, and to do a simple boolean query on the correct fields.
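To make that concrete, here is a rough relational sketch of the 14-field idea (all field names are made up); in Elasticsearch the same shape becomes a simple bool query with two range clauses on the open_/close_ pair for the current weekday:

-- One opening/closing pair per weekday, stored as minutes since midnight.
CREATE TABLE place (
    id        integer,
    open_mon  smallint,   -- e.g. 540  = 9am
    close_mon smallint,   -- e.g. 1020 = 5pm
    open_tue  smallint,
    close_tue smallint
    -- ... one open_/close_ pair for each remaining weekday
);

-- "Is place 42 open right now?", assuming it is currently Monday, 1:30pm (= 810):
SELECT id
FROM place
WHERE id = 42
  AND open_mon <= 810
  AND close_mon > 810;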

Why to build a SSAS Cube?

I was just searching for the best explanations of, and reasons for, building an OLAP cube from relational data. Is it all about performance and query optimization?
It would be great if you could give links or point out the best explanations and reasons for building a cube, since we can do everything from the relational database that we can do from a cube, and the cube is just faster at showing results. Are there any other explanations or reasons?
There are many reasons why you should use a cube for analytical processing.
Speed. OLAP warehouses are read-only infrastructures providing queries 10 times faster than their OLTP counterparts. See wiki.
Multiple data integration. In a cube you can easily use multiple data sources and, with many automated tasks (especially when you use SSIS), do minimal work to integrate them into a single analysis system. See the ETL process.
Minimal code. That is, you need not write queries. Even though you can write MDX - the language of the cubes in SSAS - the BI Studio does most of the hard work for you. On a project I am working on, at first we used SSRS to provide reports for the client. The queries were long and hard to write and took days to implement. Their SSAS equivalent reports took us half an hour to make, writing only a few simple queries to transform some data.
A cube provides reports and drill up/down/through without the need to write additional queries. End users can traverse the dimensions on their own to produce their reports, as the aggregations are already stored in the warehouse.
It is part of Business Intelligence. Once you build a cube, it can be fed into many other technologies and help in the implementation of BI solutions.
I hope this helps.
If you want a top level view, use OLAP. Say you have millions of rows detailing product sales and you want to know your monthly sales totals.
If you want bottom-level detail, use OLTP (e.g. SQL). Say you have millions of rows detailing product sales and want to examine one store's sales on one particular day to find potential fraud.
OLAP is good for big numbers. You wouldn't use it to examine string values, really...
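To make the contrast concrete (table and column names are made up), the same hypothetical sales table can serve both workloads, but the queries look very different:

-- OLAP-style question: monthly totals over millions of rows;
-- a cube answers this from pre-computed aggregations.
SELECT YEAR(SaleDate) AS SaleYear,
       MONTH(SaleDate) AS SaleMonth,
       SUM(Amount) AS TotalSales
FROM dbo.Sales
GROUP BY YEAR(SaleDate), MONTH(SaleDate);

-- OLTP-style question: raw detail for one store on one day,
-- e.g. to look for potential fraud; plain SQL is the right tool here.
SELECT *
FROM dbo.Sales
WHERE StoreId = 17
  AND SaleDate = '2015-03-14';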
It's a bit like asking why use Java/C++ when we can do everything with assembly language ;-) Building a cube (apart from performance) gives you the MDX language; this language has higher-level concepts than SQL and is better for analytic tasks. Perhaps this question gives more info.
My 2 centavos.