If I have a PostgreSQL server running with my data already structured in facts and dimensions, how can I run MDX queries against it?
Let's suppose each row in the fact table is a sale, so the fact table has the following columns: id, product_id, country_id and amount.
And the dimension tables are very simple: product_id and product_name, and country_id and country_name.
How should I proceed to be able to run MDX queries against this data? I tried downloading Mondrian but I found it very hard to use.
Please keep in mind I am not a developer, so my technical skills are limited; I work at an investment fund and I want to be able to run more powerful analysis on our data sets. But I do have some basic knowledge of SQL and I can code a little bit in Ruby.
As you already have a DWH (data warehouse) in PostgreSQL containing dimension and fact tables, you are just a few steps away from a simple analysis solution. The solution I recommend consists of:
DWH: PostgreSQL
OLAP server: Mondrian OLAP (OLAP schema workbench tool)
Analysis tool: Saiku Analysis application (you can preview the Saiku demo online)
Steps:
Download the OLAP Schema Workbench tool. Using it you can easily create a Mondrian OLAP schema on top of the existing tables (dimensions, facts) in your DWH.
Once you have created the OLAP schema, download the Saiku Analysis application and configure it to use your OLAP schema and your DWH.
Run Saiku - you can run MDX queries against the DWH or do ad-hoc data analysis by dragging and dropping measures (amount, etc.) and dimensions (product name, country name).
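For orientation, here is a rough SQL sketch of the star schema described in the question (the table names are my assumptions, not a requirement). The Schema Workbench essentially lets you point a cube's measures and dimensions at tables laid out like this, and when you run an MDX query, Mondrian generates grouped SQL similar to the last statement below.

```sql
-- Sketch of the star schema from the question (names assumed):
-- one fact table with the measure, joined to dimension tables by keys.
CREATE TABLE dim_product (
    product_id   integer PRIMARY KEY,
    product_name text NOT NULL
);

CREATE TABLE dim_country (
    country_id   integer PRIMARY KEY,
    country_name text NOT NULL
);

CREATE TABLE fact_sales (
    id         bigserial PRIMARY KEY,
    product_id integer REFERENCES dim_product (product_id),
    country_id integer REFERENCES dim_country (country_id),
    amount     numeric(12,2) NOT NULL   -- the measure exposed to MDX
);

-- An MDX query such as "amount by country" is translated by Mondrian
-- into roughly this kind of SQL against the star schema:
SELECT c.country_name, SUM(f.amount) AS amount
FROM fact_sales f
JOIN dim_country c ON c.country_id = f.country_id
GROUP BY c.country_name;
```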
For a project I need two types of tables.
a hypertable (a special type of table provided by TimescaleDB, a PostgreSQL extension) for some time-series records
my ordinary tables, which are not time series
Can I create a PostgreSQL database with TimescaleDB and store my ordinary tables in it? Are all tables in a TimescaleDB-enabled database hypertables (time series)? If not, is there any overhead to storing my ordinary tables in PostgreSQL with TimescaleDB?
If I can, is there any benefit to storing my ordinary tables in a separate, ordinary PostgreSQL database?
Can I create a PostgreSQL database with TimescaleDB and store my ordinary tables in it?
Absolutely... TimescaleDB is delivered as an extension to PostgreSQL, and one of its biggest benefits is that you can use regular PostgreSQL tables alongside the specialist time-series tables. That includes mixing regular tables and hypertables in the same SQL query. Standard SQL works, plus there are some additional functions that Timescale created using PostgreSQL's extensibility features.
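For example, assuming a hypertable called readings and an ordinary PostgreSQL table called devices (both names invented for illustration), a plain SQL join between them works as usual:

```sql
-- Regular table and hypertable queried together; nothing TimescaleDB-specific needed.
SELECT d.device_name, avg(r.temperature) AS avg_temp
FROM readings r                                -- hypertable (time-series data)
JOIN devices  d ON d.device_id = r.device_id   -- ordinary PostgreSQL table
WHERE r.time > now() - INTERVAL '1 day'
GROUP BY d.device_name;
```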
Are all tables in a TimescaleDB-enabled database hypertables (time series)?
No, you have to explicitly create a table as a hypertable for it to implement TimescaleDB features. It would be worth checking out the how-to guides in the Timescale docs for full (and up to date) details.
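A minimal sketch of that explicit step, with invented table names: nothing is a hypertable until you call create_hypertable on it, and every other table remains a plain PostgreSQL table.

```sql
-- Enable the extension once per database.
CREATE EXTENSION IF NOT EXISTS timescaledb;

-- An ordinary table: stays a regular PostgreSQL table.
CREATE TABLE devices (
    device_id   serial PRIMARY KEY,
    device_name text NOT NULL
);

-- A table intended for time-series data...
CREATE TABLE readings (
    time        timestamptz NOT NULL,
    device_id   integer     NOT NULL,
    temperature double precision
);

-- ...only becomes a hypertable when you explicitly convert it.
SELECT create_hypertable('readings', 'time');
```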
If not, is there any overhead to storing my ordinary tables in PostgreSQL with TimescaleDB?
I don't think there's a storage overhead. You might see some performance gains e.g. for data ingest and query performance. This article may help clarify that https://docs.timescale.com/timescaledb/latest/overview/how-does-it-compare/timescaledb-vs-postgres/
Overall, you can think of TimescaleDB as providing additional functionality on top of 'vanilla' PostgreSQL, so unless there's a reason in your application design to keep non-time-series data in a separate database, you aren't obliged to do that.
One other point, shared by a very experienced member of our Slack community [thank you Chris]:
To have time-series data and “normal” data (normalized) in one or separate databases for us came down to something like “can we asynchronously replicate the time-series information”?
In our case we use two different pg systems, one replicating asynchronously (for TimescaleDB) and one with synchronous replication (for all other data).
Transparency: I work for Timescale
We are working on an audit system where auditors are given access to the transactions processed in the last quarter. Auditors perform various analyses on the data to find invalid or erroneous transactions that have some exceptions.
Generally, these analyses require the data to be presented on charts to spot the outliers, or duplicate detection is done based on multiple columns.
Sometimes the exception-detection algorithms are quite involved and require multiple processing steps using stored procedures.
Please note that the analysis rarely involves aggregation over huge numbers of rows.
Occasionally, they may change some data if they find it missing or incorrect.
We are evaluating row-based stores (SQL and NoSQL databases) and column stores (like data warehouse systems).
Is this a use case for a data warehouse, or for a row-based store such as NoSQL or some RDBMS?
In short, requirements are:
- Occasional updates
- Mostly read queries over the last 3 months of data
- Reading data may require several processing steps, like creating a temp table in step 1, joining it with another table in step 2, deleting some rows, etc.
Thanks
For your task, it does not really matter how the data is stored. You need to think instead about how to create a solid dimensional model, how to populate it with data properly, and what reporting tools to use.
To give you an example, here are a couple of common setups I've used in my projects:
Microsoft stack setup:
SQL Server for data storage
SSIS for data ETL (or write your own stored procedures if you know what you are doing)
Publish the dimensional model on the same SQL Server. If your data set is large (over a billion records), use SSAS Tabular instead
Power Pivot or Power BI for interactive reporting, or SSRS for paginated reports.
Open-source setup:
PostgreSQL for data storage
Use stored procedures and/or Python to process data
Publish the dimensional model to another PostgreSQL database. If your data is large, publish the dimensional model to Redshift or another columnar database
Use Tableau or Power BI for interactive reporting, or build your own reporting interface.
I think a NoSQL database is the wrong choice here because an audit will require highly structured data.
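To make the open-source setup above a bit more concrete, here is a minimal sketch of what a published dimensional model for the audit case could look like (all table and column names are assumptions, not a prescription):

```sql
-- Dimensions hold the descriptive attributes auditors slice by.
CREATE TABLE dim_account (
    account_key  serial PRIMARY KEY,
    account_no   text NOT NULL,
    account_type text
);

CREATE TABLE dim_date (
    date_key  integer PRIMARY KEY,   -- e.g. 20240131
    full_date date NOT NULL,
    quarter   text NOT NULL
);

-- Fact: one row per processed transaction, at the grain auditors work with.
CREATE TABLE fact_transaction (
    transaction_id bigint PRIMARY KEY,
    account_key    integer REFERENCES dim_account (account_key),
    date_key       integer REFERENCES dim_date (date_key),
    amount         numeric(14,2) NOT NULL,
    status         text NOT NULL          -- e.g. 'ok', 'exception'
);

-- Typical auditor query: duplicates detected across multiple columns.
SELECT account_key, date_key, amount, count(*) AS copies
FROM fact_transaction
GROUP BY account_key, date_key, amount
HAVING count(*) > 1;
```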
We're using Tableau 10.5.6. I used a reporting tool years ago called Oracle Sales Analyzer. In that tool you could get to the queries generated by the reports and graphs you created through back-end catalogs using their command line.
There you could rewrite the query to be more efficient by fine-tuning the code if you needed. It was a very cool feature of that reporting tool for geeks like me who like to dive into the back end of the product and tune it at a very low level.
My question is, does Tableau have any facility of this type? Is there a way to get to the queries that are stored once you create a report or a graph? Also, is there a command line where you can access these catalogs, if they exist? Or are these queries just stored in ASCII flat files that can be accessed by a user?
Thanks!
There are two ways that Tableau will query a database.
Option 1: Custom SQL
In your data source, you paste in the SQL you have written and Tableau will pass that query through to the database. This gives you complete control over the SQL, including adding any indexing hints you may want. See https://onlinehelp.tableau.com/current/pro/desktop/en-us/customsql.html
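For illustration, whatever you paste into the Custom SQL dialog is just an ordinary query you have tuned yourself; the table and column names below are invented:

```sql
-- Hand-written query pasted into Tableau's Custom SQL dialog.
-- Tableau sends it to the database essentially as-is, so you can
-- pre-aggregate or restrict the data however you like.
SELECT o.order_date,
       c.region,
       SUM(o.amount) AS total_amount
FROM   orders o
JOIN   customers c ON c.customer_id = o.customer_id
WHERE  o.order_date >= DATE '2018-01-01'
GROUP BY o.order_date, c.region
```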
Option 2: Use the Tableau data source designer
This is what many people do. Here, you visually design your data source with the joins. Tableau translates that design into what the Hyper engine considers to be the most effective way to run the query. Sometimes, Hyper translates that into a regular SQL statement. Sometimes it does some additional things to help boost performance, like breaking it up into different queries. A lot depends on the db engine you are connecting to. There is no "SQL" stored in a flat file for this. Tableau just translates your design at run-time. The Hyper engine does a good job with fine-tuning, assuming you have an efficient database design with proper indexing and current table statistics.
There is a way to see the SQL from option 2 at run-time using Performance Recording. Performance Recording keeps track of each step of the visualization process and will spit out the SQL statement(s) that Tableau ran to generate your dataset. The SQL is not stored in the .twb file though; it's a run-time analysis.
I want to connect two databases and establish a relationship between them in Tableau: one is SQL Server and the other is a Microsoft Excel sheet. How do I do that?
I have googled a lot for this but could not find a suitable answer.
You are speaking about data blending.
For connecting cross-database data, cross-database querying was a flagship upgrade in Tableau 10.0.
However, you cannot use cross-database joins with the connection types below:
Tableau Server
Firebird
Google Analytics
Microsoft Analysis Services
Microsoft PowerPivot
OData
Oracle Essbase
Salesforce
SAP BW
Splunk
Teradata OLAP Connector
For blending, you just need to connect to each data source separately and make sure they have the same column names. When creating a sheet, as you switch between data sources you will see a chain icon on the linked fields.
Do note that this is not a proper join but just blended data; it would be best to create another table in your SQL Server database for the Excel sheet.
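If you do go that route, one low-tech sketch is to export the sheet as a CSV and load it into a staging table on the SQL Server side, so you can do a real join there instead of blending (table, path, and column names below, including the existing Sales table, are made up):

```sql
-- Staging table for the spreadsheet data (columns are assumptions).
CREATE TABLE dbo.ExcelTargets (
    region      nvarchar(100),
    target_year int,
    target_amt  decimal(18, 2)
);

-- Load the exported CSV; skip the header row.
BULK INSERT dbo.ExcelTargets
FROM 'C:\data\targets.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);

-- Now a real join is possible inside SQL Server, no blending required.
SELECT s.region, s.sales_amount, t.target_amt
FROM dbo.Sales s
JOIN dbo.ExcelTargets t ON t.region = s.region;
```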
I am working on a project where I need to programmatically validate and/or compare a database schema between product releases.
I am using Perl and am looking for a cross-platform method to collect the database schema. I am currently able to perform database queries by utilizing the dbisql.exe command and then parsing the results.
I am wondering if there is potentially a stored procedure or set of queries that I can run to collect the database schema.
It appears that the dbunload.exe command could be used to generate a SQL regeneration script; however, I am thinking that this output may be difficult to parse.
Any feedback would be greatly appreciated.
If you would like to retrieve the DB schema data at a really low level, you could query the corresponding system tables. They are in the SYS namespace, especially SYSTABLE (for all tables) and SYSCOLUMN (for all columns in those tables).
Check the ASA SQL Reference Handbook for the schema of those system tables.
With Perl's DBI you can easily fire queries at those tables, but you will have to create some local storage for the schema to compare the query results against.
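As a sketch, this is the kind of query you could fire from Perl's DBI; check the exact column names against the ASA SQL Reference for your version:

```sql
-- One row per column of every base table, suitable for dumping
-- and diffing between releases.
SELECT t.table_name,
       c.column_name,
       c.domain_id,     -- data type id (see SYSDOMAIN for the names)
       c.width,
       c.scale,
       c.nulls
FROM   SYS.SYSTABLE  t
JOIN   SYS.SYSCOLUMN c ON c.table_id = t.table_id
WHERE  t.table_type = 'BASE'
ORDER BY t.table_name, c.column_id;
```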
Sybase Central v3.0 can export DDL for all DB objects; however, I think SC v6.0 can't connect to ASA 11 :(