I'm using Tableau and I want to connect a workbook to a data source that is connected to BigQuery; the underlying BigQuery query processes about 700 MB of data.
The question is: is there a performance difference in Tableau between publishing the data source on its own and publishing it embedded in the workbook?
Right now I don't have many workbooks or data sources published in Tableau, but I will over the next few months. As the volume of data sources and workbooks grows, does Tableau cope better when data sources and workbooks are kept separate, since both consume memory and processing time?
I am using Delta Lake OSS version 0.8.0.
Let's assume we calculated aggregated data and cubes from the raw data and saved the results in a gold table using Delta Lake.
My question is: is there a well-known way to access this gold table data and deliver it to, for example, a web dashboard?
In my understanding, you need a running Spark session to query a Delta table.
So one possible solution could be to write a web API that executes these Spark queries.
You could also write the gold results to a database like Postgres to access them, but that seems like it would just duplicate the data.
Is there a known best practice solution?
The real answer depends on your requirements regarding latency, number of requests per second, amount of data, deployment options (cloud/on-prem, where the data is located: HDFS/S3/...), etc. Possible approaches are:
Run Spark in local mode inside your application; this may require a lot of memory, etc.
Run Thrift JDBC/ODBC server as a separate process, and access data via JDBC/ODBC
Read data directly using the Delta Standalone Reader library for the JVM, or via the delta-rs library, which works with Rust/Python/Ruby (see the sketch below)
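For the last option, a minimal sketch using the deltalake package (the Python binding of delta-rs); the table path and the idea of serving the result as JSON records are placeholders for illustration:

```python
# Minimal sketch: read an aggregated "gold" Delta table without a Spark session,
# using the deltalake package (delta-rs). The path below is a placeholder.
from deltalake import DeltaTable

table = DeltaTable("/mnt/datalake/gold/sales_cube")

# The gold table is already aggregated, so materialising it in memory is cheap
df = table.to_pandas()

# Hand the result to whatever serves the dashboard, e.g. as JSON records
payload = df.to_dict(orient="records")
```

Because delta-rs reads the Delta transaction log and Parquet files directly, this avoids both a running Spark session and a second copy of the data in Postgres.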
We are looking for usage stats on Tableau data sources. We are running Tableau with Snowflake. We have the raw queries, but cannot link the raw queries back to the data source name.
Tableau UI offers some usage stats, but we would like these stats at the data source level, not the workbook level.
Is there a way (either in Tableau or Snowflake) to track usage at this level, specifically linking the Tableau data source to its source query?
I have a workbook with blended data sources and calculated fields referencing those data sources.
I get an error when publishing to Tableau Server.
How do I publish the data sources?
I am using Tableau 2019.
Create an extract of the blended data source, and publish that extract to the server. This extract will include all the calculated fields that are in the current workbook.
Generally for this type of use case I suggest publishing your data extracts separately. They can contain your calculated fields. Publishing the calculated fields is actually more performant anyway, since many of them become materialised.
You can connect to both server data sources in desktop, and perform the data blending within the Tableau workbook using the published data sources. You can create additional calculated fields, including those from data blends, within the workbook. You're then able to publish the workbook and those calculated fields will remain at the workbook level, not the data source.
Other users connecting to the data sources will only see the calculations published into the extract, not the calculations published with the workbook.
Hope that makes sense.
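If you later want to script the "publish the extract" step rather than do it from Desktop, a small sketch using the tableauserverclient package might look like this; the server URL, credentials, project id and file name are all placeholders:

```python
# Sketch: publish an extract (.hyper) as a standalone data source with
# tableauserverclient. All names and ids below are placeholders.
import tableauserverclient as TSC

tableau_auth = TSC.TableauAuth("user", "password", site_id="mysite")
server = TSC.Server("https://tableau.example.com", use_server_version=True)

with server.auth.sign_in(tableau_auth):
    datasource = TSC.DatasourceItem(project_id="abc-123")
    # Overwrite replaces the published data source when re-publishing
    server.datasources.publish(datasource, "blended_extract.hyper",
                               TSC.Server.PublishMode.Overwrite)
```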
I want to automate the import process in Power BI, but I can't find out how to publish a CSV file as a dataset.
I'm using a C# solution for this.
Is there a way to do that?
You can't directly import CSV files into a published dataset in the Power BI Service. The AddRowsAPIEnabled property of datasets published from Power BI Desktop is false, i.e. that API is disabled. Currently the only way to enable it is to create a push dataset programmatically using the REST API (or create a streaming dataset from the site). You can then push rows to it (read the CSV file and push batches of rows, using C# or some other language, even PowerShell) and build reports on that dataset. However, there are a lot of limitations, and you have to take care of cleaning up the dataset yourself: to avoid reaching the limit of 5 million rows you can't delete only "some" of the rows, you can only truncate the whole dataset, or make it basicFIFO, which lowers the limit to 200k rows.
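As a rough illustration of the push-dataset route (the same REST calls work from C# via HttpClient); the dataset name, table schema and CSV columns below are made up, and access_token has to be a valid Azure AD token for the Power BI API:

```python
# Sketch: create a push dataset and push CSV rows via the Power BI REST API.
# Dataset name, table and columns are placeholders for illustration.
import csv
import requests

BASE = "https://api.powerbi.com/v1.0/myorg"
access_token = "<Azure AD access token for the Power BI API>"
headers = {"Authorization": f"Bearer {access_token}"}

# 1. Create a push dataset; basicFIFO retention keeps only the newest rows
dataset_def = {
    "name": "CsvImport",
    "defaultMode": "Push",
    "tables": [{
        "name": "Sales",
        "columns": [
            {"name": "Date", "dataType": "DateTime"},
            {"name": "Amount", "dataType": "Double"},
        ],
    }],
}
resp = requests.post(f"{BASE}/datasets?defaultRetentionPolicy=basicFIFO",
                     json=dataset_def, headers=headers)
dataset_id = resp.json()["id"]

# 2. Read the CSV and push rows in batches (the API caps a single POST at 10,000 rows)
with open("sales.csv", newline="") as f:
    rows = [{"Date": r["Date"], "Amount": float(r["Amount"])}
            for r in csv.DictReader(f)]

for i in range(0, len(rows), 10000):
    requests.post(f"{BASE}/datasets/{dataset_id}/tables/Sales/rows",
                  json={"rows": rows[i:i + 10000]}, headers=headers)
```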
That said, a better solution would be to automate the import of these CSV files into a database and have the report read the data from there. For example, import the files into Azure SQL Database or Databricks and use that as the data source for your report. You can then schedule the refresh of the dataset (if you use Import mode) or use DirectQuery.
After a Power BI update, it is now possible to import the dataset without importing the whole report.
So what I do is import the new dataset and update the parameters I set up for the CSV file source (stored in a data lake).
In my company we have 1K+ Tableau workbooks, all using the same Vertica data source via multi-table connections or custom SQL. We often end up in a situation where reports stop working because the underlying data source changed: a table was renamed, a field was removed, etc.
My question is how we can proactively react to these changes.
Can we edit the source code (XML) of the Tableau workbooks to batch-replace deprecated query parts?
Or can we monitor which database tables are used in a workbook, with or without parsing the workbook's source code, to build an alerting system?
Thanks
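One way to see which tables each workbook depends on, without parsing workbook XML, is Tableau's Metadata API (available on Tableau Server 2019.3+ once the metadata services are enabled). A rough Python sketch; the server URL, credentials and REST API version are placeholders:

```python
# Sketch: list the upstream database tables of every workbook via the
# Tableau Metadata API. Server URL, credentials and API version are placeholders.
import requests

SERVER = "https://tableau.example.com"
API_VERSION = "3.4"

# Sign in through the REST API to obtain a credentials token
signin = requests.post(
    f"{SERVER}/api/{API_VERSION}/auth/signin",
    json={"credentials": {"name": "user", "password": "password",
                          "site": {"contentUrl": ""}}},
    headers={"Accept": "application/json"},
)
token = signin.json()["credentials"]["token"]

# Ask the Metadata API which tables each workbook reads from
query = """
{
  workbooks {
    name
    upstreamTables { name fullName }
  }
}
"""
resp = requests.post(
    f"{SERVER}/api/metadata/graphql",
    json={"query": query},
    headers={"X-Tableau-Auth": token, "Accept": "application/json"},
)

for wb in resp.json()["data"]["workbooks"]:
    tables = sorted({t["fullName"] for t in wb["upstreamTables"]})
    print(wb["name"], tables)
```

Running something like this on a schedule and diffing the output against Vertica's catalog would give you an alert when a table a workbook relies on is renamed or dropped.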