Data Import in GA4 for dimensions - google-analytics-4

We are planning to use GA4 for the analytics use case of an eCommerce website. We need to enrich our view_item event data, and all other events, with custom parameters which will then be used as dimensions in the Analysis Hub. I have the following questions:
Size limit - The Data Import CSV file limit is 1 GB, but in our use case we may have more than 10 GB of data. Instead of uploading a CSV file, is there a way to enrich event data using BigQuery etc.?
Custom parameters - If I introduce a custom parameter during my enrichment process, can I use it as a dimension in the Analysis Hub?
Thanks

Yes. BigQuery is a database, so you can upload whatever you want, for example into a separate table, and then build an ad hoc query against the GA4 export to obtain what you are interested in.
Yes, but you have to register it as a custom dimension in the GA4 interface first.
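To illustrate the BigQuery route, here is a minimal sketch using the google-cloud-bigquery Python client that joins the GA4 BigQuery export with a separately uploaded enrichment table; the project, dataset, table, and column names (analytics_123456789, enrichment.item_attributes, margin_bucket) are placeholders, not anything from GA4 itself.

```python
# Minimal sketch: enrich GA4 export events with an uploaded attributes table.
# All project/dataset/table/column names below are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

query = """
WITH views AS (
  SELECT
    user_pseudo_id,
    event_timestamp,
    (SELECT value.string_value
     FROM UNNEST(event_params)
     WHERE key = 'item_id') AS item_id
  FROM `my-project.analytics_123456789.events_*`
  WHERE event_name = 'view_item'
)
SELECT
  v.user_pseudo_id,
  v.event_timestamp,
  v.item_id,
  enrich.margin_bucket          -- custom attribute from the uploaded table
FROM views AS v
LEFT JOIN `my-project.enrichment.item_attributes` AS enrich
  ON enrich.item_id = v.item_id
"""

for row in client.query(query).result():
    print(row.item_id, row.margin_bucket)
```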

Related

Not all of the [GA4] BigQuery Export schema available via GA4 API?

I was looking for the number of IDFA / user consents collected per app release version.
I saw it exists in the [GA4] BigQuery Export schema (device.is_limited_ad_tracking).
But I couldn't find it in GA4.
Is there any alternative?
The API, web UI, and BigQuery export are different sources of data. Not only do they have different schemas (available dimensions and metrics); when compared, the data often will not match. This is by design.
This article compares the data sources:
https://analyticscanvas.com/4-ways-to-export-ga4-data/
This article explains why they don't match:
https://analyticscanvas.com/3-reasons-your-ga4-data-doesnt-match/
In most cases, you'll find the solution is to use the BigQuery export. It has the richest set of data and doesn't have quota limits.
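For the original question (consents per app release version), a direct query against the export works. Here is a sketch with the google-cloud-bigquery Python client, where the project and dataset names and the date range are placeholders:

```python
# Sketch: count distinct users per app version and limited-ad-tracking flag
# straight from the GA4 BigQuery export. Project/dataset names are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

query = """
SELECT
  app_info.version                 AS app_version,
  device.is_limited_ad_tracking    AS is_limited_ad_tracking,
  COUNT(DISTINCT user_pseudo_id)   AS users
FROM `my-project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
GROUP BY app_version, is_limited_ad_tracking
ORDER BY app_version
"""

for row in client.query(query).result():
    print(row.app_version, row.is_limited_ad_tracking, row.users)
```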

How to access a gold table in Delta Lake for web dashboards and other consumers?

I am using Delta Lake OSS version 0.8.0.
Let's assume we calculated aggregated data and cubes using the raw data and saved the results in a gold table using delta lake.
My question is, is there a well known way to access these gold table data and deliver them to a web dashboard for example?
In my understanding, you need a running Spark session to query a Delta table.
So one possible solution could be to write a web API which executes these Spark queries.
You could also write the gold results to a database like Postgres to access them, but that seems like just duplicating the data.
Is there a known best practice solution?
The real answer depends on your requirements regarding latency, number of requests per second, amount of data, deployment options (cloud/on-prem, where the data is located - HDFS/S3/...), etc. Possible approaches are:
Run Spark in local mode inside your application - it may require a lot of memory, etc.
Run the Thrift JDBC/ODBC server as a separate process, and access data via JDBC/ODBC
Read the data directly using the Delta Standalone Reader library for the JVM, or via the delta-rs library, which works with Rust/Python/Ruby (see the sketch below)
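As a sketch of the last option, the delta-rs Python bindings (the deltalake package) can read the gold table without a Spark session; the table path here is a placeholder for your own storage location:

```python
# Sketch: read a Delta "gold" table without Spark using delta-rs (pip install deltalake).
# The table location below is a placeholder.
from deltalake import DeltaTable

dt = DeltaTable("s3://my-bucket/gold/sales_cube")

# Materialize the table as a pandas DataFrame to serve from a web API or dashboard backend.
df = dt.to_pyarrow_table().to_pandas()
print(df.head())
```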

How to import datasets as CSV files to Power BI using the REST API?

I want to automate the import process in Power BI, but I can't find how to publish a CSV file as a dataset.
I'm using a C# solution for this.
Is there a way to do that?
You can't directly import CSV files into a published dataset in Power BI Service. The AddRowsAPIEnabled property of datasets published from Power BI Desktop is false, i.e. this API is disabled. Currently the only way to enable this API is to create a push dataset programmatically using the API (or create a streaming dataset from the site). In that case you will be able to push rows to it (read the CSV file and push batches of rows, either using C# or some other language, even PowerShell), and you will be able to create reports using this dataset. However, there are lots of limitations: you should take care of cleaning up the dataset (to avoid reaching the limit of 5 million rows, and you can't delete "some" of the rows, only truncate the whole dataset), or make it basicFIFO, which lowers the limit to 200k rows.
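A minimal sketch of the push-dataset route, shown in Python for brevity (the same REST calls work from C#). The dataset name, table, columns, and CSV file are assumptions for illustration, and acquiring the Azure AD access token is left out:

```python
# Sketch: create a push dataset and push rows from a CSV via the Power BI REST API.
# Dataset/table/column names and the CSV file are placeholders; the access token
# must be obtained separately (e.g. via MSAL with a service principal).
import csv
import requests

ACCESS_TOKEN = "<azure-ad-access-token>"
HEADERS = {"Authorization": f"Bearer {ACCESS_TOKEN}", "Content-Type": "application/json"}
BASE = "https://api.powerbi.com/v1.0/myorg"

# 1. Create a push dataset with a single table.
dataset_def = {
    "name": "SalesFromCsv",
    "defaultMode": "Push",
    "tables": [{
        "name": "Sales",
        "columns": [
            {"name": "OrderId", "dataType": "Int64"},
            {"name": "Amount", "dataType": "Double"},
        ],
    }],
}
dataset_id = requests.post(f"{BASE}/datasets", headers=HEADERS, json=dataset_def).json()["id"]

# 2. Read the CSV and push its rows in batches.
with open("sales.csv", newline="") as f:
    rows = [{"OrderId": int(r["OrderId"]), "Amount": float(r["Amount"])}
            for r in csv.DictReader(f)]

for i in range(0, len(rows), 1000):  # stay well under the rows-per-request limit
    requests.post(f"{BASE}/datasets/{dataset_id}/tables/Sales/rows",
                  headers=HEADERS, json={"rows": rows[i:i + 1000]})
```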
However, a better solution would be to automate the import of these CSV files into some database and make the report read the data from there. For example, import these files into Azure SQL Database or Databricks and use that as a data source for your report. You can then schedule the refresh of this dataset (in case you use Import mode) or use DirectQuery.
After a Power BI update, it is now possible to import the dataset without importing the whole report.
So what I do is import the new dataset and then update the parameters that I set up for the CSV file source (stored in the Data Lake).
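A sketch of that flow with the REST API, again in Python; the workspace and dataset IDs, the parameter name (CsvFolderPath), and the Data Lake URL are placeholders:

```python
# Sketch: point the deployed dataset's CSV-source parameter at a new Data Lake path,
# then trigger a refresh. IDs, parameter name, and URLs are placeholders.
import requests

ACCESS_TOKEN = "<azure-ad-access-token>"
HEADERS = {"Authorization": f"Bearer {ACCESS_TOKEN}", "Content-Type": "application/json"}
BASE = "https://api.powerbi.com/v1.0/myorg/groups/<workspace-id>/datasets/<dataset-id>"

# Update the parameter that holds the CSV folder path.
requests.post(f"{BASE}/Default.UpdateParameters", headers=HEADERS, json={
    "updateDetails": [
        {"name": "CsvFolderPath",
         "newValue": "https://mylake.dfs.core.windows.net/raw/2024-06/"}
    ]
})

# Refresh the dataset so it picks up the new file.
requests.post(f"{BASE}/refreshes", headers=HEADERS, json={})
```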

Data streaming to Google Cloud ML Engine

I found that Google Cloud ML Engine expects data in Cloud Storage, BigQuery, etc. Is there any way to stream data to ML Engine? For example, imagine that I need to use data from a WordPress or Drupal site to create a TensorFlow model, say a spam detector. One way is to export the whole dataset as CSV and upload it to Cloud Storage using the google-cloud-php library. The problem here is that, for every minor change, we have to upload the whole dataset again. Is there any better way?
By minor change, do you mean "when you get new data, you have to upload everything - the old and new data - again to GCS"? One idea is to export just the new data to GCS on some schedule, producing many CSV files over time. You can write your trainer to take a file pattern and expand it using get_matching_files/Glob, or to take multiple file paths.
You can also modify your training code to start from an old checkpoint and train over just the new data (which is in its own file) for a few steps.
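A minimal sketch of the file-pattern idea with TensorFlow's tf.data; the bucket path and the "is_spam" label column are assumptions:

```python
# Sketch: train over many incrementally exported CSV shards in GCS by expanding
# a file pattern. Bucket path and label column name are placeholders.
import tensorflow as tf

# Expand the pattern covering every CSV exported so far.
files = tf.io.gfile.glob("gs://my-bucket/exports/comments-*.csv")

# Build one training dataset from all matched shards.
dataset = tf.data.experimental.make_csv_dataset(
    files,
    batch_size=64,
    label_name="is_spam",
    num_epochs=1,
)

for features, labels in dataset.take(1):
    print(list(features.keys()), labels.shape)
```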

Does IBM DataWorks support CSV delimiters?

Does the IBM DataWorks Data Load API support CSV files as an input source?
The answer is yes. To accomplish this, you have to provide the structure of the file in the request payload. This is explained in the API documentation, Creating a Data Load Activity. This is an excerpt from the documentation:
Within the columns array, specify the columns to provision data
from. If Analytics for Hadoop, Amazon S3, or SoftLayer Object Storage
is the source, you must specify the columns. If you specify columns,
only the columns that you specify are provisioned to the target...
The Data Load application included in DataWorks is provided just as an example and assumes the input file has 2 columns, the first being an INTEGER and the second one a VARCHAR.
Note: This question was answered on dW Answers by user emalaga.