We have developed a model in Tabular Object Model (TOM), <= 3.5 GB in size, and built a few Tableau dashboards on top of this model.
Each dashboard is built by dragging multiple sheets into one dashboard. All the sheets (dragged in one dashboard) fetch data from one fact table (of course it has relationships with Date and other related dimensions) in TOM.
Now, when we interact with a Tableau dashboard, we see a performance degradation. When we checked SQL Profiler, we saw that Tableau generates a huge query for almost every interaction we have with the dashboard.
We examined the huge query and observed that it includes the DAX/query for almost all the measures in the fact tables, irrespective of whether a given fact table is used in that dashboard or not.
We have verified the filter settings in the dashboard. The settings apply only to the sheets dragged into our dashboard, so there is no question of visualizations in other dashboards being changed.
Even so, we still see Tableau creating a huge query that incorporates all the DAX/queries, and this hurts performance.
Is there any way we can restrict this behavior?
In case anyone else is having this issue: it comes down to Tableau not actually supporting SSAS Tabular. The connector you are using is for SSAS Multidimensional, so Tableau generates MDX queries against the DAX-based Tabular model.
This is also evident from Tableau's own techspecs site:
https://www.tableau.com/products/techspecs
"Microsoft SQL Server Analysis Services 2008 SP4 or later, multi-dimensional mode only*"
Tableau's website at https://www.tableau.com/products/techspecs clearly states support for
"Microsoft SQL Server Analysis Services 2005 or later, non-tabular mode only*(incl. support for Kerberos)"
We are working on an audit system where auditors are given access to transactions processed in the last quarter. An auditor performs various analyses on the data to find invalid or erroneous transactions that have some exceptions.
Generally, these analyses require the data to be shown on charts to spot outliers, and sometimes duplicate detection is done based on multiple columns.
Sometimes the exception detection algorithms are quite involved and require multiple processing steps using stored procedures.
Please note that the analysis rarely involves aggregation over huge numbers of rows.
Occasionally, they can change some data if they find it missing or incorrect.
We are evaluating row-based stores (SQL and NoSQL databases) and column stores (such as data warehouse systems).
Is this a use case for a data warehouse or for a row-based store, such as NoSQL or an RDBMS?
In short, requirements are:
- Occasional update
- Mostly read queries over the last 3 months of data
- Reading data may require several massaging steps, like creating a temp table in step 1, joining with another table in step 2, deleting some rows, etc.
Thanks
For your task, it does not really matter how the data is stored. You need to think instead how to create a solid dimensional model, populate it with data properly, and what reporting tools to use.
To give you an example, here are a couple of common setups I've used in my projects:
Microsoft stack setup:
- SQL Server for data storage
- SSIS for data ETL (or write your own stored procedures if you know what you are doing)
- Publish the dimensional model on the same SQL Server. If your data set is large (over a billion records), use SSAS Tabular instead
- Power Pivot or Power BI for interactive reporting, or SSRS for paginated reports
Open-source setup:
- PostgreSQL for data storage
- Use stored procedures and/or Python to process data
- Publish the dimensional model to another PostgreSQL database. If your data is large, publish the dimensional model to Redshift or another columnar database
- Use Tableau or Power BI for interactive reporting, or build your own reporting interface
I think a NoSQL database is the wrong choice here because auditing requires highly structured data.
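To illustrate why a plain relational store handles the multi-step analysis from the question, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for whichever RDBMS you choose. The table names (transactions, recurring_accounts, dup_keys), the columns, and the sample rows are all hypothetical and only mirror the steps described: stage duplicate candidates in a temp table, delete some of them, then join back to the data.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical audit data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions ("
        "id INTEGER PRIMARY KEY, account TEXT, amount_cents INTEGER, txn_date TEXT)"
    )
    conn.execute("CREATE TABLE recurring_accounts (account TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO transactions VALUES (?, ?, ?, ?)",
        [
            (1, "A-100", 25000, "2024-01-05"),
            (2, "A-100", 25000, "2024-01-05"),  # suspicious duplicate of row 1
            (3, "B-200", 9990, "2024-01-07"),
            (4, "C-300", 4000, "2024-02-01"),
            (5, "C-300", 4000, "2024-02-01"),  # duplicate, but account is recurring
        ],
    )
    conn.execute("INSERT INTO recurring_accounts VALUES ('C-300')")
    return conn


def find_duplicate_transactions(conn):
    """Return ids of transactions duplicated on (account, amount_cents, txn_date)."""
    # Step 1: stage the duplicated key combinations in a temp table.
    conn.execute("DROP TABLE IF EXISTS temp.dup_keys")
    conn.execute(
        """
        CREATE TEMP TABLE dup_keys AS
        SELECT account, amount_cents, txn_date
        FROM transactions
        GROUP BY account, amount_cents, txn_date
        HAVING COUNT(*) > 1
        """
    )
    # Step 2: delete candidates that are expected to repeat.
    conn.execute(
        "DELETE FROM dup_keys "
        "WHERE account IN (SELECT account FROM recurring_accounts)"
    )
    # Step 3: join the remaining keys back to the individual transactions.
    rows = conn.execute(
        """
        SELECT t.id
        FROM transactions AS t
        JOIN dup_keys AS d
          ON t.account = d.account
         AND t.amount_cents = d.amount_cents
         AND t.txn_date = d.txn_date
        ORDER BY t.id
        """
    ).fetchall()
    return [row[0] for row in rows]
```

With the sample data, only the two rows for the non-recurring account remain after the delete step. In a production setup the same steps would typically live in a stored procedure, as the question describes.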
We're using Tableau 10.5.6. I used a reporting tool years ago called Oracle Sales Analyzer. In that tool you could get to the queries generated by the reports and graphs you created through back-end catalogs using their command line.
There you could rewrite the query to be more efficient by fine-tuning the code if you needed. It was a very cool feature of that reporting tool for geeks like me who like to dive into the back end of the product and tune it at a very low level.
My question is, does Tableau have any facility of this type? Is there a way to get to the queries that get stored once you create a report or a graph? Also, is there a command line where you can access these catalogs, if they exist? Otherwise, are these queries just stored in ASCII flat files that can be accessed by a user?
Thanks!
There are two ways that Tableau will query a database.
Option 1: Custom SQL
In your data source, you paste in the SQL you have written and Tableau passes that query through to the database. This gives you complete control over the SQL, including adding any indexing hints you may want. See https://onlinehelp.tableau.com/current/pro/desktop/en-us/customsql.html
Option 2: Use the Tableau data source designer
This is what many people do. Here, you visually design your data source with the joins. Tableau translates that design into what it considers to be the most effective way to run the query. Sometimes that is a regular SQL statement. Sometimes it does additional things to boost performance, like breaking the work up into several queries. A lot depends on the database engine you are connecting to. There is no SQL stored in a flat file for this; Tableau just translates your design at run time. It generally does a good job with fine-tuning, assuming you have an efficient database design with proper indexing and current table statistics.
There is a way to see the SQL from option 2 at run time using Performance Recording. Performance Recording keeps track of each step of the visualization process and shows the SQL statement(s) that Tableau ran to generate your dataset. The SQL is not stored in the twb file, though; it is a run-time analysis.
We are building a dashboard with many reports. The relationships between tables are defined in MicroStrategy. We found that MicroStrategy is not using different SQL for different reports. It pulls all the data from the database (46 million rows) and then applies post-processing to that data to generate the individual reports.
This takes a lot of time and does not use the query engine of the database.
How can we configure MicroStrategy so that it generates a different query for each report and collects only the data required for that report, NOT all the data?
One way to do that is to use Freeform SQL. But we want to keep the capability for drag-and-drop reports.
How can we achieve this?
We are using MicroStrategy 10.1.
From your description it sounds like MicroStrategy is first pulling all the data (46 million records) from the DB using its SQL Engine and then applying filtering afterwards.
If your reports have been created in MicroStrategy Developer (or Web) using attribute filters, then each report should execute SQL with explicit WHERE conditions that correspond to those attribute filters. For example, if you have a report with an attribute titled 'Fruit' and you only want to display apples, you would put an attribute filter on that report that only displays results where 'Fruit' = 'Apple'. This translates to a WHERE condition in the SQL engine when the report is executed. However, if you apply a view filter to the report, the SQL engine first obtains everything and the entire dataset is then filtered in the analytical engine, which is slow, especially if multiple reports run on the dashboard.
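The difference between the two filter types can be sketched outside MicroStrategy. The following is a rough analogy in Python with sqlite3, not MicroStrategy code: the sales table, its columns, and both function names are made up for illustration. The first function pushes the condition into the SQL (like an attribute filter); the second fetches everything and filters in memory (like a view filter). Each returns the matching rows plus the number of rows that had to leave the database.

```python
import sqlite3


def build_sales_db():
    """Create a tiny in-memory table standing in for the fact data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (fruit TEXT, qty INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("Apple", 3), ("Pear", 5), ("Apple", 2), ("Plum", 7)],
    )
    return conn


def filter_in_database(conn, fruit):
    """Analogous to an attribute filter: the WHERE clause runs in the database."""
    rows = conn.execute(
        "SELECT fruit, qty FROM sales WHERE fruit = ? ORDER BY rowid", (fruit,)
    ).fetchall()
    return rows, len(rows)  # only matching rows are transferred


def filter_after_fetch(conn, fruit):
    """Analogous to a view filter: fetch everything, then filter in memory."""
    all_rows = conn.execute(
        "SELECT fruit, qty FROM sales ORDER BY rowid"
    ).fetchall()
    kept = [row for row in all_rows if row[0] == fruit]
    return kept, len(all_rows)  # every row is transferred first
```

Both approaches return the same result, but the second one transfers the whole table first. With 46 million rows and several reports on one dashboard, that transfer is where the time goes.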
It's important to know how you are bringing the dataset into the dashboard: is it using a cube as a dataset, a report, or something else? There are a few ways of achieving the performance you are looking for. Here are a couple:
Option 1: Develop each report in MicroStrategy Developer using attribute filters as desired. This requires that all your attribute relationships are defined correctly.
Option 2: Have all 46 million records pulled into a cube. Use the cube as the dataset for the dashboard and then use view filters however you want on the various reports you place on the dashboard.
Option 1 + 2: You can combine both of the above options if you wish. Store the entire dataset in a cube, define several reports (normal reports, not cube reports) that can dynamically source from the cube, using filters as required, and then add these reports to your dashboard.
These are the things I would do as first steps:
- Check that your attributes and attribute relationships are defined and work
- Create a test report and try to filter based on one of these attributes
- Try to create a few reports, each with different filter conditions based on one of the attributes
- Put these reports into the dashboard and see whether each one generates different SQL statements
This sounds like you have either:
- built the reports using view filters (which apply filtering after query execution) rather than applying the filter in the generated SQL, or
- not defined attribute relationships, so the system doesn't think the filters you've defined are relevant to the fact tables containing the data.
Are you using cubes? I am assuming that is what you mean by executing the query once.
You need to replace the individual reports with new reports: regular reports, not ones built on cubes. That's the only way.
I want to connect two data sources and establish a relationship between them in Tableau: one from SQL Server and another from a Microsoft Excel sheet. How do I do that?
I have googled a lot for this but could not find a suitable answer.
What you are describing is data blending.
For joining data across databases, use a cross-database join. Cross-database querying was a flagship feature of Tableau 10.0.
However, you cannot use cross-database joins with the following connection types:
Tableau Server
Firebird
Google Analytics
Microsoft Analysis Services
Microsoft PowerPivot
OData
Oracle Essbase
Salesforce
SAP BW
Splunk
Teradata OLAP Connector
You just need to connect to each data source separately and make sure the fields you want to link have the same column names. When you build a sheet and switch between data sources, you will see a chain icon on the linked fields.
Do note that this is not a proper join but just blended data. It would be best to create another table in your SQL database for the Excel sheet.
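The suggestion to move the Excel data into the database can be sketched as follows. This is only an illustration under several assumptions: sqlite3 stands in for SQL Server, the spreadsheet rows are passed in as plain tuples (reading the actual workbook would need a separate library and is not shown), and the orders and region_targets tables with their columns are hypothetical.

```python
import sqlite3


def build_orders_db():
    """In-memory stand-in for the existing SQL database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("East", 100), ("East", 50), ("West", 70), ("North", 10)],
    )
    return conn


def load_sheet_and_join(conn, sheet_rows):
    """Load spreadsheet rows into a table, then join them in the database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS region_targets ("
        "region TEXT PRIMARY KEY, target INTEGER)"
    )
    # Re-loading the sheet overwrites existing targets for the same region.
    conn.executemany(
        "INSERT OR REPLACE INTO region_targets VALUES (?, ?)", sheet_rows
    )
    # A real inner join: regions without a target row are dropped.
    return conn.execute(
        """
        SELECT o.region, SUM(o.amount) AS total, t.target
        FROM orders AS o
        JOIN region_targets AS t ON t.region = o.region
        GROUP BY o.region, t.target
        ORDER BY o.region
        """
    ).fetchall()
```

Once the sheet lives in a table like this, Tableau can connect to a single database and use an ordinary join instead of a blend.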
I am just going through the concepts of business intelligence for relational databases. There are lots of tools for relational DBs.
I want to know whether there is any tool for doing BI on NoSQL (MongoDB) and, if so, which one is more powerful.
I have heard about Nucleon BI, but I don't know how powerful it is or what its advantages are over other tools.
There are currently 3 major BI platforms for the MongoDB ecosystem.
Jaspersoft:
The only BI server that can connect directly to MongoDB, leveraging the aggregation framework APIs, so that you can report on and analyze data in MongoDB without having to move the data through ETL to a relational database.
Pentaho:
- Increase data value: With Pentaho, MongoDB data can be accessed, blended, visualized and reported in combination with any other data source for increased insight and operational analytics.
- Reduce complexity: Reporting on data stored in MongoDB is simplified, increasing developer productivity with Pentaho’s automatic document sampling, drag and drop interface and schema generation.
- Accelerate data access and querying: With no impact on throughput, this integration builds on the features and capabilities in MongoDB, such as the Aggregation Framework, Replication and Tag Sets.
JSON Analytics:
- Native JSON handling: no mapping to dimensions and measures means very short up-and-running times and no changes when the structure of the data changes. Contrary to previous-generation BI tools, JSON Studio was built from the ground up for JSON and MongoDB and is not based on a connector that tries to map JSON data into columns.
- Native usage of MongoDB’s aggregation framework under an easy to use UI means very fast response times, for the first time accessible to all types of users.
- HTTP Gateway with parameters means power users can design reports and graphs that can be used by any user, used for building dashboards and used from within other applications.
- Rich d3 visualization and exploratory analytics gives power users the perfect platform to understand and work with data.
- Low cost.
Nucleon BI is also in the picture but is not as popular.
I have used Jaspersoft and found it great for BI and reporting.