I'm creating charts in the same report, each pointing at a different Google Cloud path (a different data source). What is the maximum data (cache) limit for each chart, and what is the cumulative total for the report?
For example, if I apply a date filter of 5 years to one chart, the data may be about, say, 150 MB; will it be able to show/load? Similarly, if I apply the same 5-year date filter to each chart of the report, and every chart has a different data source, what is the maximum Data Studio data limit for one report?
According to the Google Data Studio documentation, there is no mention of a cache size limit; the cache is only described as a way to manage "Data Freshness".
Additionally, judging by the comments on that document, each data source has its own cache, since you can configure when to refresh that data based on the connector you are using.
Hope you find this useful!
I was reading some API documentation, including https://developers.google.com/analytics/devguides/reporting/data/v1/basics. However, this API only allows downloading specific dimensions and metrics, for example:
from google.analytics.data_v1beta.types import (
    DateRange, Dimension, Metric, RunReportRequest,
)

request = RunReportRequest(
    property=f"properties/{property_id}",  # GA4 numeric property ID
    dimensions=[Dimension(name="country")],
    metrics=[Metric(name="activeUsers")],
    date_ranges=[DateRange(start_date="2020-09-01", end_date="2020-09-15")],
)
Is there any way to download the entire dataset as JSON or something similar?
BigQuery Export allows you to download all of your raw events. Using the Data API, you could create individual reports with, say, 5 dimensions & metrics each; then you could download your data in slices through, say, 10 of those reports.
BigQuery and the Data API have different schemas. For example, BigQuery gives the event timestamp, whereas the most precise time granularity the Data API gives is the hour. So your decision between the Data API & BigQuery may depend on which dimensions & metrics you need.
What dimensions & metrics are most important to you?
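In the meantime, here is a rough, self-contained sketch of the slicing approach (it assumes the google-analytics-data Python client and the same hypothetical property_id as in the question); each report slice is flattened and written out as JSON:

import json

from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import (
    DateRange, Dimension, Metric, RunReportRequest,
)

# One slice of dimensions & metrics; repeat with other combinations
# (within the API's per-request limits) to cover the fields you need.
client = BetaAnalyticsDataClient()
request = RunReportRequest(
    property=f"properties/{property_id}",
    dimensions=[Dimension(name="country"), Dimension(name="date")],
    metrics=[Metric(name="activeUsers")],
    date_ranges=[DateRange(start_date="2020-09-01", end_date="2020-09-15")],
)
response = client.run_report(request)

# Flatten the rows into plain dicts keyed by the header names.
records = [
    {
        **{h.name: v.value for h, v in zip(response.dimension_headers, row.dimension_values)},
        **{h.name: v.value for h, v in zip(response.metric_headers, row.metric_values)},
    }
    for row in response.rows
]
with open("report_slice.json", "w") as f:
    json.dump(records, f, indent=2)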
"Donwnload the entire data" is a vague statement. First you need to decide what kind of data you need. Aggregated data or Raw data?
Aggregated data, e.g. daily/hourly activeUsers, events, sessions, etc., can be extracted using the Data API, as you have already tried. Each API request accepts a fixed number of dimensions and metrics, so based on your business requirements you should decide on the dimension-metric combinations and extract the data using the API.
Raw events can be extracted from BigQuery if you have linked your GA4 property to it. Here is the BigQuery table schema for that export, where each row has a client ID, timestamp, and other event-level details that are not available with the Data API. Again, based on your business requirements, you can write queries and extract the data.
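As a minimal sketch of that, assuming the google-cloud-bigquery client and a hypothetical export dataset name (the GA4 link writes one events_YYYYMMDD table per day):

from google.cloud import bigquery

# Hypothetical project/dataset; _TABLE_SUFFIX selects a date range across
# the daily events_* tables created by the GA4 export.
client = bigquery.Client()
query = """
    SELECT user_pseudo_id, event_timestamp, event_name
    FROM `my-project.analytics_123456.events_*`
    WHERE _TABLE_SUFFIX BETWEEN '20200901' AND '20200915'
"""
for row in client.query(query).result():
    print(dict(row))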
I have 6.5 GB of data, consisting of 900,000 rows, in my input table (tPostgresqlInput). I am trying to load the same data into my output table (tPostgresqlOutput), but while running the job I get no response from my input table. Is there any solution to load the data? Please refer to my attachment.
You may need to develop a strategy to retrieve the data in more manageable chunks, for example by dividing it up based on row IDs (see the sketch below). That way, it does not take as much memory or time to retrieve the data.
You could also increase the default memory limit for the job from 1 GB to a higher number.
If your job runs on the same network as your database server, that can also improve performance.
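A minimal sketch of the chunking idea outside Talend, assuming psycopg2 and hypothetical table/column names, paging on an indexed id column:

import psycopg2

# Hypothetical connection and table names; reads the source in id-ordered
# chunks so only a small window of rows is in memory at any time.
conn = psycopg2.connect("dbname=sourcedb user=me")
cur = conn.cursor()

last_id, chunk_size = 0, 50000
while True:
    cur.execute(
        "SELECT id, col_a, col_b FROM source_table "
        "WHERE id > %s ORDER BY id LIMIT %s",
        (last_id, chunk_size),
    )
    rows = cur.fetchall()
    if not rows:
        break
    # ... write this chunk to the output table here ...
    last_id = rows[-1][0]

cur.close()
conn.close()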
Make sure you enable Use Cursor in the input's Advanced settings. The default value of 1k is fine.
Also enable the batch size on the output, which does something similar.
With these enabled, Talend will work with 1k records at a time.
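For reference, Use Cursor behaves roughly like a server-side cursor with a 1k fetch size; a minimal psycopg2 sketch of the same idea, with hypothetical connection and table names:

import psycopg2

conn = psycopg2.connect("dbname=sourcedb user=me")
# A named cursor is a PostgreSQL server-side cursor: rows are streamed
# from the server in batches of itersize instead of all 900k at once.
cur = conn.cursor(name="stream_source")
cur.itersize = 1000
cur.execute("SELECT id, col_a, col_b FROM source_table")

for row in cur:
    # ... hand each row (or batch of rows) to the output side here ...
    pass

cur.close()
conn.close()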
If these two tables are in the same database, you can try the Talend ELT components to push your processing down to the database. Take a look at the following set of components:
https://help.talend.com/display/TalendOpenStudioComponentsReferenceGuide60EN/tELTPostgresqlInput
https://help.talend.com/display/TalendOpenStudioComponentsReferenceGuide60EN/tELTPostgresqlMap
https://help.talend.com/display/TalendOpenStudioComponentsReferenceGuide60EN/tELTPostgresqlOutput
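For context, those ELT components generate a single SQL statement that runs entirely inside PostgreSQL, so the 6.5 GB never passes through the job's JVM. A rough sketch of that kind of statement, with hypothetical table and column names (psycopg2 is used here only to show the idea outside Talend):

import psycopg2

conn = psycopg2.connect("dbname=mydb user=me")
with conn, conn.cursor() as cur:
    # The whole copy/transform happens server-side in one statement.
    cur.execute("""
        INSERT INTO target_table (id, col_a, col_b)
        SELECT id, col_a, col_b
        FROM source_table
    """)
conn.close()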
I have over 300k records (rows) in my dataset, and I have a tab that stratifies these records into different categories and does conditional calculations. The issue is that it takes approximately an hour to run this tab. Is there any way I can improve calculation efficiency in Tableau?
Thank you for your help,
The issue is probably access to your source data. I have run into this problem when working directly against live data in a SQL database.
An easy solution is to use extracts. Quoting Tableau's "Improving Database Query Performance" article:
Extracts allow you to read the full set of data pointed to by your data connection and store it into an optimized file structure specifically designed for the type of analytic queries that Tableau creates. These extract files can include performance-oriented features such as pre-aggregated data for hierarchies and pre-calculated calculated fields (reducing the amount of work required to render and display the visualization).
Edit: If you are using an extract and you still have performance issues, I suggest massaging your source data to be more Tableau-friendly, for example by generating pre-calculated fields in your ETL process.
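A minimal sketch of that kind of pre-calculation step, assuming pandas and hypothetical file/column names (category, amount, is_valid), so Tableau reads an already-aggregated table instead of evaluating the conditions per row:

import pandas as pd

# Hypothetical source; pre-compute the conditional logic once, up front.
df = pd.read_csv("records.csv")
df["valid_amount"] = df["amount"].where(df["is_valid"], 0)

summary = df.groupby("category", as_index=False).agg(
    total_valid_amount=("valid_amount", "sum"),
    record_count=("category", "size"),
)
summary.to_csv("records_summary.csv", index=False)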
I have two different data sources (Users and Shipments). I wish to display all the users who didn't receive any shipment for a specific month, which will be selected by the user.
Assuming you are working with structured data, you can use Tableau's data munging features. See this Tableau Knowledge Base article. Also, Tableau supports joins across different data stores, a.k.a. data blending. See this Tableau video for more on data blending.
Based on your question, I believe you must be relatively new to Tableau. I was in your position about a year ago. Watch Tableau's free On-Demand Training videos on its website.
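For what it's worth, the logic you want is essentially an anti-join between the two sources. A minimal pandas sketch with hypothetical file and column names, just to make concrete what the blend has to express:

import pandas as pd

# Hypothetical columns: users.csv has user_id, shipments.csv has
# user_id and shipment_date.
users = pd.read_csv("users.csv")
shipments = pd.read_csv("shipments.csv", parse_dates=["shipment_date"])

selected_month = pd.Period("2023-05", freq="M")  # month picked by the user
shipped_ids = shipments.loc[
    shipments["shipment_date"].dt.to_period("M") == selected_month, "user_id"
]
users_without_shipment = users[~users["user_id"].isin(shipped_ids)]
print(users_without_shipment)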
I have created a report using SSRS 2008, and I have used grouping of data multiple times. The problem arises when I preview the report: it takes a lot of time, sometimes more than 40 minutes.
How can I resolve this?
See if you can cache the dataset; Microsoft recommends caching the dataset when the time taken is more than expected.
Also read http://www.simple-talk.com/sql/performance/a-performance-troubleshooting-methodology-for-sql-server/, which provides a basis for troubleshooting the query at the database level.