We are using Tableau with a live connection to a Redshift data source.
I can't find the Median aggregate function that I see when I connect to other types of data sources. Is this a known issue? We haven't been able to find anything about it. Can we work around it with some kind of custom function?
First, I'm not a Redshift expert. However, I know that in other cases where the underlying data store doesn't offer a median function (MySQL, for example), a live Tableau connection doesn't offer Median either.
If you use a Tableau extract, "Median" should appear as an aggregation option, because Tableau implements median in its own data store.
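For the "custom function" part of the question, one possible workaround is to compute the median in SQL yourself, for example through a custom SQL data source. The sketch below is only an illustration: it runs against an in-memory SQLite database rather than Redshift, the `sales` table and `amount` column are made up, and it assumes the database supports window functions and integer division.

```python
import sqlite3

# Hypothetical table and data, used only to illustrate the query shape.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?)", [(v,) for v in (10, 20, 30, 40, 1000)])

# Rank the rows, then average the middle row (odd count) or the two
# middle rows (even count). Relies on integer division in (n + 1) / 2.
MEDIAN_SQL = """
SELECT AVG(amount)
FROM (
    SELECT amount,
           ROW_NUMBER() OVER (ORDER BY amount) AS rn,
           COUNT(*) OVER () AS n
    FROM sales
) AS ranked
WHERE rn IN ((n + 1) / 2, (n + 2) / 2)
"""

median_odd = conn.execute(MEDIAN_SQL).fetchone()[0]   # 5 rows: middle value

conn.execute("INSERT INTO sales VALUES (50)")
median_even = conn.execute(MEDIAN_SQL).fetchone()[0]  # 6 rows: mean of the two middle values

print(median_odd, median_even)
```

Whether this performs acceptably on a large Redshift table is a separate question; the extract approach above avoids the issue entirely.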
When combining multiple fact tables using shared dimensions, you need to use a drill-across query or a multi-pass query to get the correct results.
I'm looking for a BI tool that does this correctly by recognising which tables are fact tables and which are dimension tables. The tool should preferably generate a PostgreSQL query.
For most of the tools that I've been looking into, you need to recognise these situations and write SQL manually to fix this.
Are there any tools that will generate the correct queries for you without the need for writing the multi-pass or drill across yourself?
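To make the problem concrete, here is a minimal sketch of why a plain join across two fact tables gives wrong totals and what a drill-across query looks like. The table names, columns, and data are invented for illustration, and SQLite stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales   (product_id INTEGER, amount REAL);
CREATE TABLE fact_returns (product_id INTEGER, amount REAL);
INSERT INTO dim_product  VALUES (1, 'widget');
INSERT INTO fact_sales   VALUES (1, 100), (1, 50);
INSERT INTO fact_returns VALUES (1, 10), (1, 5), (1, 5);
""")

# Naive single-pass join: every sales row pairs with every returns row,
# so both sums are inflated by the row count of the other fact table.
naive = conn.execute("""
SELECT d.name, SUM(s.amount), SUM(r.amount)
FROM dim_product d
JOIN fact_sales   s ON s.product_id = d.product_id
JOIN fact_returns r ON r.product_id = d.product_id
GROUP BY d.name
""").fetchone()

# Drill-across: aggregate each fact table to the shared dimension grain
# first, then join the already-aggregated results.
drill_across = conn.execute("""
WITH s AS (SELECT product_id, SUM(amount) AS total_sales
           FROM fact_sales GROUP BY product_id),
     r AS (SELECT product_id, SUM(amount) AS total_returns
           FROM fact_returns GROUP BY product_id)
SELECT d.name, s.total_sales, r.total_returns
FROM dim_product d
LEFT JOIN s ON s.product_id = d.product_id
LEFT JOIN r ON r.product_id = d.product_id
""").fetchone()

print(naive)         # inflated totals
print(drill_across)  # correct totals
```

With two sales rows and three returns rows, the naive join triples the sales total and doubles the returns total, which is exactly the situation a tool would need to detect.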
We're using Tableau 10.5.6. I used a reporting tool years ago called Oracle Sales Analyzer. In that tool you could get to the queries generated by the reports and graphs you created through back-end catalogs using their command line.
There you could rewrite the query to be more efficient by fine-tuning the code if you needed. It was a very cool feature of that reporting tool for geeks like me who like to dive into the back end of the product and tune it at a very low level.
My question is: does Tableau have any facility of this type? Is there a way to get to the queries that are stored once you create a report or a graph? Also, is there a command line where you can access these catalogs, if they exist? Otherwise, are these queries just stored in ASCII flat files that a user can access?
Thanks!
There are two ways that Tableau will query a database.
Option 1: Custom SQL
In your data source, you paste in the SQL you have written and Tableau passes that query through to the database. This gives you complete control over the SQL, including adding any indexing hints you may want. See https://onlinehelp.tableau.com/current/pro/desktop/en-us/customsql.html
Option 2: Use the Tableau data source designer
This is what many people do. Here, you visually design your data source with the joins. Tableau translates that design into what the Hyper engine considers to be the most effective way to run the query. Sometimes, Hyper translates that into a regular sql statement. Sometimes it does some additional things to help boost performance, like breaking it up into different queries. A lot depends on the db engine you are connecting to. There is no "sql" stored in a flat file for this. Tableau just translates your design at run-time. The Hyper engine does a good job with fine-tuning, assuming you have an efficient database design with proper indexing and current table statistics.
There is a way to see the SQL from option 2 at run-time using Performance Recording. Performance Recording keeps track of each step of the visualization process and reports the SQL statement(s) that Tableau ran to generate your dataset. The SQL is not stored in the twb file, though; it's a run-time analysis.
We have developed a model in the Tabular Object Model (TOM), at most 3.5 GB in size, and built a few Tableau dashboards on top of it.
Each dashboard is built by dragging multiple sheets into one dashboard. All the sheets (dragged in one dashboard) fetch data from one fact table (of course it has relationships with Date and other related dimensions) in TOM.
Now, when we interact with Tableau dashboard, we see a performance degradation. When we checked the SQL profiler, Tableau is generating a huge query for almost every interaction that we have with the dashboard.
We checked the huge query and observed that it includes the DAX/query for almost all the measures in the fact tables, regardless of whether the fact table is used in that dashboard.
We have verified the filter settings in the dashboard. The settings apply only to the sheets dragged into our dashboard, so there is no question of visualizations in other dashboards being changed.
Even so, Tableau still creates a huge query that incorporates all the DAX/queries, and this hurts performance.
Is there any way we can restrict this behavior?
In case anyone else is having this issue: it comes down to Tableau not actually supporting SSAS Tabular. The connector you are using is for SSAS Multidimensional, so Tableau generates MDX queries against the DAX-based Tabular model.
This is also evident from Tableau's own techspecs site:
https://www.tableau.com/products/techspecs
"Microsoft SQL Server Analysis Services 2008 SP4 or later, multi-dimensional mode only*"
Tableau's website at https://www.tableau.com/products/techspecs clearly states support for
"Microsoft SQL Server Analysis Services 2005 or later, non-tabular mode only*(incl. support for Kerberos)"
Need to be able to report on Unique Visitors, but would like to avoid pre-computing every possible permutation of keys and creating multiple tables.
As a simplistic example, let's say I need to report Monthly Uniques in a table that has the following columns
date (Month/Year)
page_id
country_id
device_type_id
monthly_uniques
In Druid and Redis, the HyperLogLog data type takes care of this (assuming a small margin of error is acceptable): I would be able to run a query by any combination of the dimensions and receive a viable estimate of the uniques.
The closest I was able to find in the PostgreSQL world is the postgresql-hll plugin, but it seems to be for PostgreSQL 9.0+.
Is there a way to represent this in Redshift without either having to pre-compute or store visitor IDs (greatly inflating the table size, but allowing the use of Redshift's "approximate count" HLL implementation)?
Note: Redshift is the preferred platform, but I already know that other self-hosted PostgreSQL forks, such as CitusDB, can support this. I'm looking for ways to do this with Redshift.
Redshift recently announced support for HyperLogLog Sketches:
https://aws.amazon.com/about-aws/whats-new/2020/10/amazon-redshift-announces-support-hyperloglog-sketches/
https://docs.aws.amazon.com/redshift/latest/dg/hyperloglog-overview.html
UPDATE: blog post on HLL usage https://aws.amazon.com/blogs/big-data/use-hyperloglog-for-trend-analysis-with-amazon-redshift/
Redshift announced new HLL capabilities in October 2020. If your Redshift release version is 1.0.19097 or later, you can use all the available HLL functions. See the AWS Redshift documentation for details.
You can do something like
SELECT hll(column_name) AS unique_count FROM YOURTABLE;
or create HLL sketches directly
Redshift, while technically PostgreSQL-derived, was forked over ten years ago. It still speaks the same wire protocol as PostgreSQL, but its code has diverged a great deal. Among other incompatibilities, it no longer allows custom data types. That means the type of plugin you're looking to use is not going to be feasible.
However, as you pointed out, if you're able to get all the raw data in, you can use the built-in approximation capability.
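The reason HyperLogLog fits the "any combination of dimensions" requirement is that sketches can be merged: you store one small sketch per row instead of the visitor IDs, and combine sketches at query time. The following is a minimal illustrative sketch of that idea, not Redshift's implementation. The `TinyHLL` class, the register count, and the visitor IDs are all made up for the example.

```python
import hashlib
import math


class TinyHLL:
    """Illustrative HyperLogLog sketch (hypothetical, not Redshift's implementation)."""

    def __init__(self, p=12):
        self.p = p                     # number of hash bits used as register index
        self.m = 1 << p                # number of registers
        self.registers = [0] * self.m

    def add(self, value):
        # Derive a 64-bit hash from the value.
        h = int.from_bytes(hashlib.sha1(str(value).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                       # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1   # position of the first 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        # Union of two sketches: register-wise maximum.
        merged = TinyHLL(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


# One sketch per (month, country) row; the visitor sets overlap.
us = TinyHLL()
de = TinyHLL()
for visitor_id in range(0, 30000):
    us.add(visitor_id)
for visitor_id in range(20000, 50000):
    de.add(visitor_id)

# Monthly uniques across both countries, without storing visitor IDs.
combined = us.merge(de)
estimate = combined.count()
print(round(estimate))  # an estimate of the 50000 distinct visitors
```

Merging by register-wise maximum gives the same sketch as adding every visitor to a single sketch, which is why the estimate does not double-count the overlapping visitors.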
I am trying to run RAWSQL_REAL("select sum(amount_us)from gbsa_dpo_itg.Fact_tblHistoryData_new where qtr_data='Q42014'") in a calculated field, and I am getting the error message ERROR 2133: Aggregate function calls cannot contain subqueries.
I am using Tableau 8.3.3 with a live connection to an HP Vertica database.
When I run the same query as custom SQL, it works fine.
Please help with this.
Thanks in advance.
Read the manual on these functions; look under Reference, Functions.
You don't generally pass an entire SQL statement to the Raw SQL functions to execute in isolation. Instead, they are useful for writing expressions or calling nonstandard functions that your server may provide, which are embedded within the SQL that Tableau generates. So first learn to use Tableau to get the effect you want, and then resort to Raw SQL functions in the rare case where you need to access some database-server-specific feature.
There is no reason that you would need Raw SQL to get the information above using Tableau. You could put amount_us on the row shelf and qtr_data on the filter shelf, and Tableau would generate a similar query.
If you are doing this to combine data from multiple queries, first learn about calculated fields and data blending.