SoundCloud track_count for client_id different from the real count when listing the tracks - soundcloud

I'm using the Python API but have also confirmed this using plain HTTP requests.
The issue is the following: when querying for a user's information I get a specific track_count, but when I list all the tracks the total is different; in this case it is bigger by 2.
Ex: https://api.soundcloud.com/users/<user_id>?client_id=<client_id> reports, in this specific case, a track_count of 1138.
Then, when paginating with linked_partitioning=1 as https://api.soundcloud.com/users/<user_id>/tracks?client_id=<client_id>&limit=200&linked_partitioning=1 per the documentation and following each next_href, I get a total of 1140. Using Python I found that 2 of them are repeated.
Is this usual, or is there a bug?

There's a bug on SoundCloud's side.
I've seen this on many platforms that use the API, including big ones like theartistunion.com, and in several mobile apps.
I don't see any workaround for this.
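If you just need consistent numbers on your side, you can at least deduplicate by track id while paginating, which is essentially what the question did in Python. A rough Kotlin sketch (using org.json for parsing; <user_id> and <client_id> are placeholders as in the question, and collection/next_href are the fields of a linked_partitioning response):

import java.net.URL
import org.json.JSONObject

fun main() {
    var next: String? = "https://api.soundcloud.com/users/<user_id>/tracks" +
        "?client_id=<client_id>&limit=200&linked_partitioning=1"
    val ids = mutableSetOf<Long>()
    var listed = 0
    while (next != null) {
        val page = JSONObject(URL(next).readText())
        val tracks = page.getJSONArray("collection")
        for (i in 0 until tracks.length()) {
            listed++
            ids.add(tracks.getJSONObject(i).getLong("id"))
        }
        next = page.optString("next_href", null) // null once there is no next page
    }
    println("listed: $listed, unique: ${ids.size}") // e.g. 1140 vs 1138
}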

Related

Firebase analytics - Unity - time spent on a level

Is there any possibility to get the exact time spent on a certain level in a game via Firebase Analytics? Thank you so much 🙏
I tried to use logEvent.
The best way to do this would be to measure the time on the level within your codebase, then log a dedicated event on level completion, in which you pass the time spent on the level.
Let's get into the details. I will use Kotlin as an example, but it should be obvious what I'm doing here, and you can see more language examples here.
// user_id is a user property: set once, attached to every event.
firebaseAnalytics.setUserProperty("user_id", userId)

// Dedicated event on level completion; minutesSpentOnLevel is measured
// in your own code (e.g. from a timestamp taken when the level started).
firebaseAnalytics.logEvent("level_completed") {
    param("name", levelName)
    param("difficulty", difficulty)
    param("subscription_status", subscriptionStatus)
    param("minutes", minutesSpentOnLevel)
    param("score", score)
}
Now see how I have a bunch of parameters with the event? These parameters are important since they will allow you to conduct a more thorough and robust analysis later on and answer more questions. Like: hey, what is the most difficult level? Do people still have trouble with it when the game difficulty is lower? How many times has this level been rage-quit or lost (for that you'd likely need a level_started event)? What about our paid players, are they having similar troubles on this level as well? How many people have rage-quit the game on this level and never played again? That last one would likely be easier to answer with SQL at this point, taking the latest value of the level name from level_started, grouped by user_id. Or you could have levelName as a UserProperty as well as an EventProperty; then it would be somewhat trivial to answer in the default analytics interface.
Note that you're limited in the number of event parameters you can send per event, in the total number of unique parameter names, and in the number of unique event names you're allowed to have. In our case, the event name would be level_completed. See the limits here.
Because of those limitations, it's important to name your event properties in a somewhat generic way so that you can efficiently reuse them elsewhere. For this reason I named the property minutes and not something like minutes_spent_on_the_level. You could then reuse this property to send the minutes the player spent actively playing, minutes spent idling, minutes spent on any info page, minutes spent choosing upgrades, etc. Same idea behind having a name property rather than level_name; it could just as well be id.
You need to carefully and thoughtfully stuff your event with event properties. I normally have a wrapper around the Firebase SDK in which I enrich events with dimensions that I always want to be there, like user_id or subscription_status, so I don't have to add them manually every time I send an event. I also usually have some more adequate logging there, since Firebase Analytics' default logging is completely awful. I also have some sanitizing there: lowercasing all values unless I'm passing something case-sensitive like base64 values, making sure I don't have double spaces (replacing \s+ with a single space), and maybe also adding the user's local timestamp as another parameter. The latter is very helpful for spotting time-cheating users, especially if your game is an idler.
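Here's a minimal sketch of what such a wrapper could look like (the class and its constructor arguments are mine, not part of the Firebase SDK; the ktx param builder is the same one used above):

import com.google.firebase.analytics.FirebaseAnalytics
import com.google.firebase.analytics.ktx.logEvent

class Analytics(
    private val firebase: FirebaseAnalytics,
    private val userId: String,
    private val subscriptionStatus: String
) {
    fun log(name: String, params: Map<String, Any>) {
        firebase.logEvent(sanitize(name)) {
            // Dimensions I always want attached, so I never forget them.
            param("user_id", userId)
            param("subscription_status", subscriptionStatus)
            // The user's local timestamp helps spot time-cheating players.
            param("client_ts", System.currentTimeMillis())
            for ((key, value) in params) {
                when (value) {
                    is Long -> param(key, value)
                    is Int -> param(key, value.toLong())
                    is Double -> param(key, value)
                    else -> param(key, sanitize(value.toString()))
                }
            }
        }
    }

    // Lowercase and collapse whitespace, as described above.
    private fun sanitize(value: String) =
        value.lowercase().replace(Regex("\\s+"), " ")
}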
Good. We're halfway there :) Bear with me.
Now you need to go to Firebase and register your eps (event parameters) as cds (custom dimensions and metrics). If you don't register your eps, you won't be able to use them in reports; once registered, they count towards the global cd limit (about 50 custom dimensions and 50 custom metrics). You register the cds in the Custom Definitions section of Firebase.
Now you need to know whether each parameter is a dimension or a metric, as well as the scope of your dimension. It's much easier than it sounds. The rule of thumb is: if you want to be able to run mathematical aggregation functions on it, then it's a metric. Otherwise, it's a dimension. So:
firebaseAnalytics.setUserProperty("user_id", userId) <-- dimension
param("name", levelName) <-- dimension
param("difficulty", difficulty) <-- dimension (or can be a metric, depends)
param("subscription_status", subscriptionStatus) <-- dimension (can be a metric too, but even less likely)
param("minutes", minutesSpentOnLevel) <-- metric
param("score", score) <-- metric
Now another important thing to understand is the scope. Because Firebase and GA4 are still essentially in beta and being actively worked on, you only have user or hit scope for dimensions, and only hit scope for metrics. The scope basically indicates how the value persists. In my example, we only need user_id as a user-scoped cd. Because user_id is a user-level dimension, it is set separately from the logEvent function. Although I suspect you can do it there too. Haven't tried, though.
Now, we're almost there.
Finally, you don't want to use Firebase to look at your data. It's horrible at data presentation. It's good at debugging though, since that's what it was initially intended for. Because of how horrible it is, it's always advised to link it to GA4, which will let you look at the Firebase values much more efficiently. Note that you will likely need to re-register your custom dimensions from Firebase in GA4, because GA4 is capable of receiving multiple data streams, of which Firebase would be just one data source. GA4's cd limits are very close to Firebase's, though. Ok, let's be frank: GA4's data model is almost exactly copied from Firebase's. But GA4 has much better analytics capabilities.
Good, you've moved to GA4. Now, GA4 is a very raw, not-officially-beta product, just like Firebase Analytics. Because of that, it's advised to first change your data retention to 12 months and to only use the explorer for analysis, pretty much ignoring the pre-generated reports. They are just not very reliable at this point.
Finally, you may find it easier to just use SQL to get your analysis done. For that, you can easily copy your data from GA4 to a sandbox instance of BQ; it's very easy to do, and it is the best, most reliable known method of using GA4 at this moment. I mean, advanced analysts do the export into BQ, then ETL the data from BQ into a proper storage like Snowflake, or even S3, or Aurora, or whatever you prefer, and then on top of that use a proper BI tool like Looker, PowerBI, Tableau, etc. A lot of people just stay in BQ though, and that's fine. Lots of BI tools have BQ connectors; it's just that BQ gets expensive quickly if you do a lot of analysis.
Whew, I hope you'll enjoy analyzing your game's data. Data-driven decisions rock in games. Well... They rock everywhere, to be honest.

Do I have to loop through each 'page' of orders to get all orders in one WooCommerce REST API query?

I've built a KNIME workflow that helps me analyse (sales) data from numerous channels. In the past I used to export all orders manually and use an XLSX or CSV reader, but I want to do it via WooCommerce's REST API to reduce manual labor.
I would like to be able to receive all orders up until now from a single query. So far, I only get as many as the number I fill in for &per_page=X, and if I fill in something like 1000, it gives an error. This, plus my common sense, gives me the feeling I'm thinking about it the wrong way!
If it is not possible, is looping through all pages the second best thing?
I've managed to connect to the API via basic auth. The following query returns orders, but only 10:
https://XXXX.nl/wp-json/wc/v3/orders?consumer_key=XXXX&consumer_secret=XXXX
I've tried increasing per_page, but I do not think this is the right way to get all orders in one table, and it doesn't feel like the common way to do it either.
Thanks in advance for your responses. I am more of a data analyst than a data engineer or scientist, and I hope your answers will help me towards my goal of becoming more of a scientist :)
It's possible by passing the "per_page" param with the request:
per_page integer Maximum number of items to be returned in result set. Default is 10.
Try -1 as the value
https://woocommerce.github.io/woocommerce-rest-api-docs/?php#list-all-orders
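If your install rejects -1 (as far as I know, the core REST API caps per_page at 100), looping through the pages is straightforward. A rough sketch in Kotlin (using org.json for parsing; the URL and query-string auth are taken from the question):

import java.net.URL
import org.json.JSONArray

fun main() {
    val base = "https://XXXX.nl/wp-json/wc/v3/orders" +
        "?consumer_key=XXXX&consumer_secret=XXXX"
    val perPage = 100                        // documented maximum per page
    val orders = mutableListOf<Any>()
    var page = 1
    while (true) {
        val body = URL("$base&per_page=$perPage&page=$page").readText()
        val batch = JSONArray(body)
        for (i in 0 until batch.length()) orders.add(batch.get(i))
        if (batch.length() < perPage) break  // last page reached
        page++
    }
    println("Fetched ${orders.size} orders")
}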

Why is the Google Analytics API v3 ALWAYS triggering sampling at 50%?

I have built a very simple crawler for Google Analytics (v3) and it used to work well, until this week I started to get sampled data in all queries.
I used to overcome sampling by simply reducing the date range of the queries, but now I get roughly 50% of all sessions, even for sample spaces of fewer than 100 sessions.
It seems like something is triggering sampling, but I cannot figure out what it could be. Has anyone suffered similar issues?
EDITED
We are also suffering sampling when querying the "Users Overview" standard report from the GA web interface (along with others), even when there are only 883 sessions and we are asking for a single day.
A sample query is below, where we are querying several metrics over 3 dimensions, with a sample size of 883 sessions and a sampling rate of around 50% (the query URL is cropped, but the parameters are listed under the "query" key).
It seems that the reason could be related to querying the ga:users metric with several dimensions, including ga:appId.
I have tried different combinations, and only ga:users returns sampled data when queried with more dimensions than ga:date.
In summary, if I query any other metric from the example with the same 3 dimensions, it returns full-space data.
Two weeks ago this was not happening, so I suppose Google has changed the way ga:users is computed recently.
Moreover, as a side effect, I realized that querying users in batches is somewhat misleading if you plan to compute the total number of users, because you cannot simply sum them. That is, ga:users behaves like ga:1dayUsers when queried with ga:date, and then you cannot aggregate the data. Also weird is the fact that you cannot use ga:appId with ga:1dayUsers, but you can with ga:users.
We have also detected another problem after discarding ga:users in the crawler. This issue is related to the segment parameter, which also triggers sampling when used in combination with the remaining metrics and dimensions.
We collect data from several apps in the same view (not recommended, but it is there for legacy reasons). Therefore we use a segment defined on the fly, like "sessions::condition::ga:appId=#com.xxx.yyy.zzz".
The fact is that when we filter that way we suffer sampling, but if we use a common filter like "ga:appId=com.xxx.yyy.zzz" we do not get sampled results.
Probably the question is why we use the segment-based filter instead of the standard filter. The reason is that we need it for some specific metrics like ga:7dayUsers and related ones, which cannot be combined with ga:appId as a dimension, so you cannot use ga:appId in filters either. Confusingly, for those metrics, when we use the segment-based filter we do not get sampled results.
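To make the two styles concrete, here is a sketch of how the v3 Core Reporting request differs in each case (auth is omitted; VIEW_ID, the dates, and the metrics/dimensions are placeholders; the filter and segment strings are exactly the ones quoted above):

import java.net.URLEncoder

fun enc(s: String) = URLEncoder.encode(s, "UTF-8")

// Base v3 Core Reporting query, trimmed for brevity.
val base = "https://www.googleapis.com/analytics/v3/data/ga" +
    "?ids=ga:VIEW_ID&start-date=2017-06-01&end-date=2017-06-01" +
    "&metrics=ga:sessions&dimensions=ga:date"

// Standard filter: did NOT trigger sampling for us.
val filtered = base + "&filters=" + enc("ga:appId=com.xxx.yyy.zzz")

// Segment defined on the fly: DID trigger sampling.
val segmented = base + "&segment=" + enc("sessions::condition::ga:appId=#com.xxx.yyy.zzz")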
Now it seems that all our API calls are returning real data.
I am still not sure, however, why a default report in the web interface like "Users Overview" returns sampled data for a single day with fewer than 1000 sessions.
Hope this information helps someone else having similar issues with sampling.

How to avoid redundant results in the Yahoo Answers API

I have a question about the Yahoo Answers API. I plan to use questionSearch, getByCategory, getQuestion, and getByUser. For example, I used getByCategory to query. Each time I call the function, I can query at most 50 questions. However, many of the questions are the same ones already returned by previous calls. So how can I remove this redundancy?
The API doesn't track what it has returned to you previously, as it's stateless.
This leaves you with two options that I can think of:
1) After you get your data back, filter out what you already have. This means checking what has already been displayed and not displaying duplicated items.
2) Store all the IDs you have shown in a list, then adjust your YQL query so that it passes that list of IDs as ones not to return (a small sketch of this follows below). Like:
select * from answers.getbycategory where category_id=2115500137 and type="resolved" and id not in ('20140216060544AA0tCLE', '20140215125452AAcNRTq', '20140215124804AAC1cQl');
The downside of this is that it could affect performance, since your YQL queries will take longer and longer to return as the list grows.
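A sketch of option 2, building the exclusion clause from the IDs you've already shown (the table and category_id are from the example above; fetching and parsing the results is left out):

// Remember every ID you've already displayed.
val seenIds = linkedSetOf<String>()

fun buildQuery(categoryId: String): String {
    val exclusion =
        if (seenIds.isEmpty()) ""
        else " and id not in (" + seenIds.joinToString(", ") { "'$it'" } + ")"
    return "select * from answers.getbycategory " +
        "where category_id=$categoryId and type=\"resolved\"" + exclusion
}

// After each batch, record what you displayed, e.g.:
// results.forEach { seenIds.add(it.id) }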

When does the Google Analytics API zero out ga:adCost? Is there a workaround?

Good afternoon. The ga:adCost metric and the ga:date and ga:referralPath dimensions are compatible, according to the reference doc. But when I query for these three values:
https://www.google.com/analytics/feeds/data?ids=XXX&dimensions=ga%3Adate%2Cga%3AreferralPath&metrics=ga%3AadCost&filters=ga%3AadCost%3E0&start-date=2011-04-21&end-date=2011-05-05&max-results=50
I get no results. Removing the filter does not change the outcome. If I remove ga:referralPath, I get expected results, with many records with non-zero ad cost. Other Campaign dimensions are OK, such as ga:source and ga:medium, though apparently ga:adContent is also no good.
At least one other person has seen very similar behavior (blog here). I've considered that it could be due to sampling and rounding, but it persists for very small date ranges.
Is there a workaround? ga:adCost is not allowed with ga:transactionId, which is the only unique identifier of which I'm aware, and even that only applies to customers who make a purchase.
I think the problem is due to there not being a referral path recorded in Google Analytics for AdWords visits. If you want to see where AdWords visits are coming from, then you need to use the other campaign dimensions (source, medium, campaign, keyword, and adContent).
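For example, taking the query from the question and swapping ga:referralPath for ga:source and ga:medium should return the non-zero ad cost rows, per the behaviour described above:
https://www.google.com/analytics/feeds/data?ids=XXX&dimensions=ga%3Adate%2Cga%3Asource%2Cga%3Amedium&metrics=ga%3AadCost&filters=ga%3AadCost%3E0&start-date=2011-04-21&end-date=2011-05-05&max-results=50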