Webmasters.Searchanalytics.Query rowLimit behaves strangely - google-search-console

I constructed the query this way:
...
val request = new SearchAnalyticsQueryRequest()
request.setStartDate(from)
request.setEndDate(to)
request.setDimensions(List("query", "page").asJava)
request.setRowLimit(5000)
request.setStartRow(0)
webmasters.searchanalytics().query(site, request)
The result has 3343 rows.
I tried to implement paging, and for testing purposes set rowLimit to 1000,
expecting to get 1000 rows, then another 1000, then another 1000, and finally 343 rows,
as described here: https://developers.google.com/webmaster-tools/v3/how-tos/search_analytics
If your query has more than 5,000 rows of data, you can request data in batches of 5,000 rows at a time by sending multiple queries and incrementing the startRow value each time. Count the number of retrieved rows; if you get less than the number of rows requested, you have retrieved all the data. If your request ends exactly on the data boundary (for example, there are 5,000 rows and you requested startRow=0 and rowLimit=5000), on your next call you will get an empty response.
But I got only 559 rows!
When I set rowLimit to 100, I got 51 rows!
What am I doing wrong? :)

I noticed the same behavior; it looks like data sampling.
You can get more results (and therefore more accurate metrics) by fetching the data day by day through multiple queries, instead of a single query spanning the whole date range.
Hope it helps!
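For what it's worth, here is a rough Scala sketch of that approach, building on the request code from the question. It combines day-by-day queries with startRow paging. It assumes the usual Java client pattern where the call is finished with execute() and the response exposes getRows() (which may be null for an empty page); those details are not shown in the original snippet, so treat them as assumptions, and webmasters and site are the same values used above.

import scala.collection.JavaConverters._
import scala.collection.mutable.ListBuffer
import com.google.api.services.webmasters.model.{ApiDataRow, SearchAnalyticsQueryRequest}

// Query one day at a time, paging with startRow until the API returns
// fewer rows than requested (or an empty page).
def fetchDay(day: String): Seq[ApiDataRow] = {
  val pageSize = 5000
  val rows = ListBuffer.empty[ApiDataRow]
  var start = 0
  var lastPageSize = pageSize
  while (lastPageSize == pageSize) {
    val request = new SearchAnalyticsQueryRequest()
    request.setStartDate(day)
    request.setEndDate(day)                        // single-day range
    request.setDimensions(List("query", "page").asJava)
    request.setRowLimit(pageSize)
    request.setStartRow(start)
    val page = Option(webmasters.searchanalytics().query(site, request).execute().getRows)
      .map(_.asScala.toList).getOrElse(Nil)        // getRows can be null on an empty page
    rows ++= page
    lastPageSize = page.size
    start += pageSize
  }
  rows.toList
}

// Collect all the days you care about:
// val allRows = List("2016-06-01", "2016-06-02").flatMap(fetchDay)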

Related

How do I make the trigger run after all the data has been inserted by the batch class?

I want to use an Apex Batch class to insert 10,000 records into an object called A, and use an after insert trigger to update the Weight field of all 10,000 records to the largest Weight value found among them (say 100).
But right now, if the batch size is 500, the largest Weight value among one chunk of 500 records is applied only to those 500 records, and the largest Weight value among the next 500 records is applied only to that next chunk.
For example, if the largest Weight among the first 500 records is 50, those records end up with Weight = 50;
if the largest Weight among the next 500 records is 100, those records end up with Weight = 100.
What I want is: if there are 10,000 records, find the largest Weight among all 10,000 and update the Weight field of every record with it.
How should I do it?
Here's the code for the trigger I wrote.
trigger myObjectTrigger on myObject_status__c (after insert) {
    // Records from the current trigger context (at most 200 at a time)
    List<myObject_status__c> objectStatusList = [SELECT Id, Weight FROM myObject_status__c
                                                 WHERE Id IN :Trigger.newMap.keySet()
                                                 ORDER BY Weight DESC];
    // Highest Weight currently stored on the object
    Decimal maxWeight = [SELECT Id, Weight FROM myObject_status__c ORDER BY Weight DESC LIMIT 1].Weight;
    for (Integer i = 0; i < objectStatusList.size(); i++) {
        objectStatusList[i].Weight = maxWeight;
    }
    update objectStatusList;
}
A trigger will not know whether the batch is still going on. A trigger works on a scope of at most 200 records at a time and normally sees only those. There are ways around it (a static variable, perhaps?), but even then it would be limited to the batch's chunk size, i.e. whatever came into a single execute(). So if you're running in chunks of 500, not even a static variable in a trigger would help you.
A couple of ideas:
How exactly do you know it'll be 10K? Are you inserting them based on another record? Are you using the "Iterator" variant of the batch? Could you "prescan" the records you're about to insert, figure out the max weight, and then apply it as you insert, eliminating the need for an update?
If it's never going to be bigger than 10K (and there are no side effects, no DML running on update), you could combine Database.Stateful and the finish() method: keep updating the max value as you go through the execute() calls, then in finish() update the records one last time. Cutting it real close, though.
Can you "daisy chain"? Submit another batch from this batch's finish(), passing the same records and the max you figured out.
Can you stamp the records inserted in the same batch with the same value, for example by putting the batch job's Id into a hidden field? Then have another batch (daisy chained?) look for them, find the max in that range, and apply it to any records that share the batch job Id but don't have the value applied yet.
Set the weight in the finish() method of the batch class; it runs once after all the batches have finished. Track the max weight in a member variable on the class (the class will need to implement Database.Stateful for the value to persist across execute() calls).

Long-running queries and new data

I'm looking at a Postgres system with tables containing tens or hundreds of millions of rows, being fed at a rate of a few rows per second.
I need to do some processing on the rows of these tables, so I plan to run some simple select queries: select * with a WHERE clause based on a range (each row contains a timestamp, which is what I'll use for the ranges). The range may be "closed", with a start and an end that I know are contained in the table and that no new data will fall into, or "open", i.e. one of the range boundaries might not be "in the table yet", so rows being fed into the table might still fall into that range.
Since the result will itself contain millions of rows, and the processing per row can take some time (tens of ms), I'm fully aware I'll need a cursor and will fetch, say, a few thousand rows at a time. My question is:
If I run an "open range" query, will I only get the result as it was when I started the query, or will new rows that are inserted into the table and fall within the range while I'm fetching show up?
(I tend to think that I won't see new rows, but I'd like confirmation...)
updated
It should not happen under any isolation level:
https://www.postgresql.org/docs/current/static/transaction-iso.html
but Postgres ensures it only at the Serializable isolation level.
Well, I think that when you run a query, you create a new transaction, and it will not see data from any other transaction until it commits.
So, basically, "you only get the result as it was when you started the query".
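To make that concrete, here is a minimal Scala/JDBC sketch of the cursor-style fetching described in the question. The connection string, the events table and its ts column are made up for the example, and it assumes the PostgreSQL JDBC driver is on the classpath. With autocommit off and a fetch size set, the driver pulls the result in chunks through a server-side cursor, and every chunk comes from the snapshot taken when the query started, so rows inserted afterwards do not show up in this result set.

import java.sql.{DriverManager, Timestamp}

object OpenRangeScan {
  def main(args: Array[String]): Unit = {
    val conn = DriverManager.getConnection("jdbc:postgresql://localhost/mydb", "user", "pass")
    conn.setAutoCommit(false)                  // required for cursor-based (chunked) fetching

    val stmt = conn.prepareStatement("SELECT * FROM events WHERE ts >= ?")  // "open range": no upper bound
    stmt.setTimestamp(1, Timestamp.valueOf("2016-01-01 00:00:00"))
    stmt.setFetchSize(1000)                    // fetch a few thousand rows at a time

    val rs = stmt.executeQuery()
    while (rs.next()) {
      // Per-row processing (may take tens of ms). Rows inserted after the query
      // started will not appear here: the snapshot is fixed for this statement.
      processRow(rs.getTimestamp("ts"))
    }
    rs.close(); stmt.close()
    conn.commit()
    conn.close()
  }

  def processRow(ts: Timestamp): Unit = println(ts)
}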

Get past row limitation

My report processes millions of records. When the number of rows gets too high, I get this error:
The number of rows or columns is too big. Try limiting the number of unique group values.
Details: The number of rows or columns exceeds its limit, 65535.
How can I work around (or increase) this limit?
This error is pretty straightforward. 65535 is 0xFFFF in hexadecimal, so once you hit that limit there are no more vacancies and the hotel is closed. Solutions include:
Reduce the number of rows displayed by using grouping in your crosstab or whatever.
Reduce the amount of data coming into your report with Record Selection (parameters).
Perform the dependent calculations in a custom SQL statement, generated as a temporary table in your report. You can then pass the results into your report as fields, rather than having to print millions of lines.

KDB+/Q query too heavy to handle

I want to grab data from a KDB data base for a list of roughly 200 days within the last two years. The 200 days are in no particular pattern.
I only need the data from 09:29:00.000 to 09:31:00.000 every day.
My first approach was to query all of the last two years data that have time stamp between 09:29:00.000 and 09:31:00.000, because I didn't see a way to just query the particular 200 days that I need.
However this proved to be too much for my server to handle.
Then I tried to summarize the 2 minute data for each date into an average and just print out the average, so now I will only have 200 rows of data as output. But somehow this still turns out to be too much. I'm not sure if this is because I'm not selecting the data correctly.
My other suspicion is that the query is grabbing all the data first and then averaging each date, which means the averaging is not making it any easier to handle.
Here's the code that I have:
select maxPriceB:max(price), minPriceB:min(price), avgPriceB:avg(price), avgSizeB:avg(qty) by date from dms where date within(2015.01.01, 2016.06.10), time within(09:29:00.000, 09:31:00.000), sym = `ZF6
poms is the table that the data is in
ZFU6 is the symbol that I'm looking for
I tried adding the keyword distinct after select.
I want to know if there's any way to break up the query, or make the query lighter for the server to handle.
Thank you!
If you use 32-bit kdb+ and get the infamous 'wsfull error, then you may try processing one day at a time like this:
raze{select maxPriceB:max(price), minPriceB:min(price), avgPriceB:avg(price), avgSizeB:avg(qty)
from dms where date=x,sym=`ZF6,time within 09:29:00.000 09:31:00.000}each 2015.01.01+1+til 2016.06.10-2015.01.01

SQLite vs Memory

I have a situation with my app.
Suppose I have 6 users, and each user can have up to 9 score entries (e.g. a score of 1000 points at 8:00 pm with 3 gold collected, 4 silver, etc.), say one score per stage and 9 stages.
All these scores come from an API call, so they can be updated at an interval of 3+ minutes.
The operations I need to do on this data are:
find the nearest min and max records from stage 4,
and some more operations like adding or subtracting two scores, etc.
All 6 users and their score records are already in the database, and are updated as needed after the API call.
Now my question is:
Is it better, for this kind of data (score data here), to keep all the data for all 6 users in memory in an NSArray or NSDictionary, and find the min and max in that array with a min-max algorithm,
OR
should it be taken from the database with queries like "WHERE score <= 200" and "WHERE score >= 200", in short, two database queries which each return the nearest min or max record, without keeping all the data in memory?
We are focusing on both speed and memory usage. The point is: would a DB call be faster and more efficient for finding the min and max, or would a search for the min and max in an array of all the records from the DB be?
There can be at most 6 users * 9 scores each = 54 records.
Records are updated every 3+ minutes.
The frequency of finding the min and max for certain values is high.
Please ask if any more details are required.
Thanks in advance.
You're working with such a small amount of data that I wouldn't imagine it would be worth worrying about. Do whichever method makes your development process easiest!
Edit:
If I had a lot of data (hundreds of competitors) I'd use SQLite. You can do queries like the following:
SELECT MIN(`score`) FROM `T_SCORE` WHERE `stage` = '4';
That way you can let the database handle doing the calculation for you, so you never have to fetch all the results.
My SQL-fu isn't the most awesome, but I think you can also do this:
SELECT `stage`, MIN(`score`) AS min, MAX(`score`) AS max FROM `T_SCORE` GROUP BY `stage`
That would do all the calculations in one single query.