Google Analytics Data API V1 (GA4 property) - wrong API report results when including an event-specific conversions count metric

Golak Sarangi, offering a bounty on this question, comments:
I am also facing a similar issue where the dimensions in the results change based on the metric used. For a certain date range I get 24 cities when I query for totalUsers but 402 cities when I query for sessions.
I'm migrating an app from the Reporting API V4 to the new Data API V1 to support GA4 properties.
The sessions metric aggregation is usually only a few sessions off when I add multiple dimensions or request multiple days. However, adding an event-specific conversions count metric to the report request inflates the number of sessions on the result rows considerably - by a factor of ~4x in my empirical observations.
Request JSON with only the sessions metric and the date dimension:
{"dimensions":[{"name":"date"}],"metrics":[{"name":"sessions"}],"dateRanges":[{"startDate":"2022-12-01","endDate":"2022-12-01"}],"metricAggregations":["TOTAL"]}
Results: 612 sessions on the response row | 612 sessions in Metric Aggregations
Request JSON with the additional conversions:purchase metric:
{"dimensions":[{"name":"date"}],"metrics":[{"name":"sessions"},{"name":"conversions:purchase"}],"dateRanges":[{"startDate":"2022-12-01","endDate":"2022-12-01"}],"metricAggregations":["TOTAL"]}
Results: 2527 sessions on the response row | 612 sessions in Metric Aggregations
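For reference, both requests can be reproduced with the official Python client (google-analytics-data); here is a minimal sketch, assuming Application Default Credentials are set up and using a placeholder property ID:

from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import (
    DateRange, Dimension, Metric, MetricAggregation, RunReportRequest,
)

client = BetaAnalyticsDataClient()  # authenticates via Application Default Credentials

def run_report(metric_names):
    # Same request as the JSON above, parameterized by the metric list.
    request = RunReportRequest(
        property="properties/123456789",  # placeholder property ID
        dimensions=[Dimension(name="date")],
        metrics=[Metric(name=name) for name in metric_names],
        date_ranges=[DateRange(start_date="2022-12-01", end_date="2022-12-01")],
        metric_aggregations=[MetricAggregation.TOTAL],
    )
    return client.run_report(request)

baseline = run_report(["sessions"])
combined = run_report(["sessions", "conversions:purchase"])
# The sessions value on the row inflates in the second report (612 vs 2527 above),
# while the TOTAL aggregation stays at 612.
print(baseline.rows[0].metric_values[0].value)
print(combined.rows[0].metric_values[0].value)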
Note: this behavior is consistent across multiple properties, even for those with no conversions:purchase events.
Is this an API bug, or am I missing something? Is there something I can do to get accurate results?

Related

REST API for processed data

This may be opinionated and not belong here, but I can't seem to find any info about it.
I have a 'sales' resource, so I can GET, POST, PUT, DELETE sales of my store.
I want to build a dashboard with information about those sales: e.g. the average sales per day over the last month.
Since REST is resource-oriented, does that mean I have to retrieve all sales of the last month with GET /sales?sale_date>=... and calculate the average per day on the client? That doesn't seem optimal, since I could have thousands of sales in that period of time.
Also, I don't think REST can allow a URL like GET /sales/average-per-day-last-month. What is the alternative?
I don't think REST can allow a URL like GET /sales/average-per-day-last-month
Change your thinking - that's a perfectly satisfactory URL for a perfectly satisfactory resource.
Any information that can be named can be a resource -- Fielding, 2000
"Any information that can be named" of course includes "any sales report that you can imagine".
GET /sales/average-per-day-last-month HTTP/x.y
That's perfect. Depending on your needs, you might also want to have other resources that are the "average-per-day" report for specific months.
GET /sales/2021-02/average-per-day HTTP/x.y
REST doesn't constrain your resource model at all, and it doesn't constrain your resource identifier design beyond expecting conformance to the production rules described by RFC 3986.
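As a sketch of how simply such a report resource can be served (Flask is just one choice here; the route and the sales_last_month() helper are hypothetical):

from datetime import date, timedelta
from flask import Flask, jsonify

app = Flask(__name__)

def sales_last_month(since):
    # Hypothetical data-access helper; replace with your real storage query,
    # e.g. SELECT sale_date, amount FROM sales WHERE sale_date >= :since.
    return [(since + timedelta(days=1), 100.0), (since + timedelta(days=2), 250.0)]

@app.route("/sales/average-per-day-last-month")
def average_per_day_last_month():
    days = 30  # simplistic "last month" window
    since = date.today() - timedelta(days=days)
    total = sum(amount for _, amount in sales_last_month(since))
    return jsonify({"average_per_day": total / days})

The point is that the aggregation happens on the server, behind a named resource; the client never has to page through thousands of sales.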

Pushing key/value pair data to graphite/grafana

We are trying to see if Graphite will fit our use case. We have a number of parameters as key/value pairs, say:
Caller: abc
Site: xyz
HTTP status: 400
...plus 6-7 more similar fields (key/value pairs).
This data is continuously posted to us, and we want to draw visualisations over it: graphs that show, for example, how many 400s there are per site, or which sites or callers produce the most 400s.
We are wondering if this can be done with Graphite, but we have questions. Graphite stores numerical values, so how would we represent this data? Something like this?
Clicks.metric.status.400 1 currTime
Clicks.metric.site.xyz 1 currTime
Clicks.metric.caller.abc 1 currTime
Adding 1 as the numerical value to record the event.
Also, how would we group a set of values together, e.g. to express that this HTTP status belongs to this site because both come from the same record? In that case we would need something like:
Clicks.metric.status.{uuid1}.400 1 currTime
Clicks.metric.site.{uuid1}.xyz 1 currTime
Our aim is then to use Grafana to build graphs over this data, e.g. which are the top sites showing 400 status?
Will this approach work?
Regards
Graphite accepts three types of data: plaintext, pickled, and AMQP.
The plaintext protocol is the most straightforward protocol supported by Carbon. The data sent must be in the following format: <metric path> <metric value> <metric timestamp>. Carbon will then help translate this line of text into a metric that the web interface and Whisper understand.
If you're new to Graphite (which it sounds like you are), plaintext is definitely the easiest to get going with.
As to how you'll be able to group metrics and perform operations on them, remember that Graphite doesn't natively store any of this for you. It stores timeseries metrics and provides functions that manipulate that data for visual/reporting purposes. So when you send a metric, prod.host-abc.application-xyz.grpc.GetStatus.return-codes.400 1 1522353885, all you're doing is storing the value 1 for that specific metric at timestamp 1522353885. You can then use Graphite functions to display that data, e.g. sumSeries(prod.*.application-xyz.grpc.GetStatus.return-codes.400) will produce a sum of all 400 error codes from all hosts.
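For what it's worth, here is a minimal sketch of sending one such metric over the plaintext protocol; the host is a placeholder, and 2003 is Carbon's default plaintext port:

import socket
import time

def send_metric(path, value, timestamp=None, host="graphite.example.com", port=2003):
    # Carbon's plaintext listener expects "<metric path> <metric value> <metric timestamp>\n"
    line = f"{path} {value} {int(timestamp or time.time())}\n"
    with socket.create_connection((host, port)) as sock:
        sock.sendall(line.encode("ascii"))

send_metric("prod.host-abc.application-xyz.grpc.GetStatus.return-codes.400", 1)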

How do I gather geolocation data on GitHub users in a week's time, given their API rate limiting constraints?

To make a long story short, I need to gather geolocation data and signup date on ~23 million GitHub users for a historical data visualization project. I need to do this within a week's time.
I'm rate-limited to 5000 API calls/hour while authenticated. This is a very generous rate limit, but unfortunately, I've run into a major issue.
The GitHub API's "get all users" feature is great (it gives me around 30-50 users per API call, paginated), but it doesn't return the complete user information I need.
If you look at the following call (https://api.github.com/users?since=0), and compare it to a call to retrieve one user's information (https://api.github.com/users/mojombo), you'll notice that only the user-specific call retrieves the information I need such as "created_at": "2007-10-20T05:24:19Z" and "location": "San Francisco".
This means that I would have to make 1 API call per user to get the data I need. Processing 23,000,000 users then requires 4,600 hours or a little over half a year. I don't have that much time!
Are there any workarounds to this, or any other ways to get geolocation and sign up timestamp data of all GitHub users? If the paginated get all users API route returned full user info, it'd be smooth sailing.
I don't see anything obvious in the API. It's probably specifically designed to make mass data collection very slow. There are a few things you could do to try and speed this up.
Increase the page size
In your https://api.github.com/users?since=0 call, add per_page=100. That will cut the number of calls needed to fetch the whole user list to roughly a third of the default.
Randomly sample the user list.
With a set of 23 million people you don't need to poll every single one to get good data about GitHub signup and location patterns. Since you're sampling the entire population randomly, there's no poll bias to account for.
If user 1200 signed up on 2015-10-10 and user 1235 also signed up on 2015-10-10 then you know users 1201 to 1234 also signed up on 2015-10-10. I'm going to assume you don't need any more granularity than that.
Location can also be randomly sampled. If you randomly sample 1 in 10 users, or even 1 in 100 (one per page), you still get a great polling sample: 230,000 out of 23 million. Professional national polls in the US have sample sizes of a few thousand people for an even bigger population.
How Much Sampling Can You Do In A Week?
A week gives you 168 hours or 840,000 requests.
You can get 1 user or 1 page of 100 users per request. Getting all the users, at 100 users per request, is 230,000 requests. That leaves you with 610,000 requests for individual users or about 1 in 37 users.
I'd go with 1 in 50 to account for download and debugging time. So you can poll two random users per page or about 460,000 out of 23 million. This is an excellent sample size.
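A rough sketch of that sampling loop, assuming the requests library, a placeholder token, and the 1-in-50 rate estimated above:

import random
import requests

HEADERS = {"Authorization": "token YOUR_TOKEN_HERE"}  # placeholder token

def sample_users(sample_rate=50):
    since = 0
    while True:
        # One page of up to 100 lightweight user records.
        page = requests.get(
            "https://api.github.com/users",
            params={"since": since, "per_page": 100},
            headers=HEADERS,
        ).json()
        if not page:
            break
        for user in page:
            # Poll roughly 1 in sample_rate users for the full record,
            # which includes created_at and location.
            if random.randrange(sample_rate) == 0:
                full = requests.get(user["url"], headers=HEADERS).json()
                yield full.get("created_at"), full.get("location")
        since = page[-1]["id"]
        # Watch the X-RateLimit-Remaining response header and sleep as needed
        # to stay under the 5000 requests/hour limit.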

Google Analytics API TotalEvents query

I'm trying to get a total count for a given event; however, I get a number much bigger (10-20 times bigger) than what I see on the GA website. What am I doing wrong? (API v3)
Here is the query:
metric:
ga:totalEvents
segment:
dynamic::ga:eventCategory==mycategory;ga:eventAction==myaction;ga:eventLabel==mylabel
Note that I get wrong results with the Query Explorer as well.
You are using a segment instead of a filter.
Segments are session-based, so a segment will include all events in a session if the session matches your specification. Essentially, if I triggered the event you want plus other events, they would all be included in the totalEvents figure.
What you need to do is remove the segment and add a filter; the filter will include only the data you request.
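For example, keeping your category/action/label values, the query would become:
metric:
ga:totalEvents
filters:
ga:eventCategory==mycategory;ga:eventAction==myaction;ga:eventLabel==mylabel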
Hope that helps

Can response data from core reporting api be grouped?

Explanation:
I am able to query the Google Core Reporting API v3 using the client library to get data on pageviews for specific URLs of a website I am working on. I want to get data (pageviews) for each day within a specified range. So far I am simply looping through the range, sending an individual request to the API for each day; in each request I set the same value for the start date and the end date.
Problem:
Obviously this gets the job done, BUT it is certainly not the best way to go about it, because if I want to get data for the past 3 months for each of about 2000 URIs, I will need 360,000 requests, which is well over the quota limit defined by Google.
Potential solution: one way I thought of solving this is to send a request with the start-date and end-date a week apart, but then the API returns a sum of the values rather than the individual daily values.
Main question: is there a way to insist that these values not be added up and returned as a sum, but rather returned separately (as an associative array or something like that) for each day?
I hope the question is clear and that there is a solution! Thank you!
Very straightforward:
Metric: ga:pageviews, dimension: ga:date, a filter for your pagepath, and a start-date and end-date.
Example:
https://www.googleapis.com/analytics/v3/data/ga?ids=ga%3Axxyyzz&dimensions=ga%3Adate&metrics=ga%3Apageviews&filters=ga%3Apagepath%3D%3D%2Ffaq.html&start-date=2013-06-27&end-date=2013-07-11&max-results=50
This will return the pageviews for the faq.html page for each day in the time-frame.
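The same query through the Python client library would look roughly like this, where service is an authorized Analytics v3 client (e.g. from googleapiclient.discovery.build) and the profile ID is a placeholder:

# service = googleapiclient.discovery.build("analytics", "v3", credentials=...)
response = service.data().ga().get(
    ids="ga:XXXXXX",              # placeholder profile (view) ID
    start_date="2013-06-27",
    end_date="2013-07-11",
    metrics="ga:pageviews",
    dimensions="ga:date",
    filters="ga:pagePath==/faq.html",
).execute()

# Each row is [date, pageviews] for one day in the range.
for day, pageviews in response.get("rows", []):
    print(day, pageviews)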
You should check out the Query Explorer - a great tool for finding out how to structure queries.