Staggered API call in Power BI - REST

I'm fairly new to Power BI, so forgive my ignorance if this is a stupid question. I've set up an API query to pull records from a third-party app, but the app's rate limiting allows a maximum of 500 records per refresh, with some timing restrictions.
How can I set up my query to stagger the refresh so that it starts where it left off each time? For example, if there are 2000 records to pull, I'd want to stagger that into 500 (new) records pulled every minute until complete. I considered using incremental refresh, but there are no variables that group the data into small enough chunks.
Help? Can I provide anything else?
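For illustration only, the staggered pull described here (grab a page, wait, resume from where the last page ended) has roughly the shape of the sketch below. It is written in Python with a made-up endpoint, page size and delay just to show the pattern; in Power BI itself the same loop would be expressed as a Power Query (M) paging function.

    # Hypothetical paged pull that resumes from an offset and respects a
    # 500-records-per-refresh limit. Endpoint and parameter names are invented.
    import time
    import requests

    BASE_URL = "https://example.com/api/records"   # placeholder endpoint
    PAGE_SIZE = 500                                # the app's per-refresh cap
    WAIT_SECONDS = 60                              # the app's timing restriction

    def fetch_all():
        records, offset = [], 0
        while True:
            resp = requests.get(BASE_URL, params={"limit": PAGE_SIZE, "offset": offset})
            resp.raise_for_status()
            page = resp.json()
            records.extend(page)
            if len(page) < PAGE_SIZE:        # last (partial) page reached
                return records
            offset += PAGE_SIZE              # start where the last pull left off
            time.sleep(WAIT_SECONDS)         # stay inside the rate limit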

Related

PostgreSQL delete and aggregate data periodically

I'm developing a sensor monitoring application using Thingsboard CE and PostgreSQL.
Context:
We collect data every second so that we can have a real-time view of the sensors' measurements.
This, however, is very heavy on storage and isn't a requirement beyond enabling real-time monitoring. For example, there is no need to check measurements made last week at that granularity (1-second intervals), hence no need to keep such large volumes of data occupying resources. The average value for every 5 minutes would be perfectly fine when consulting the history of values from previous days.
Question:
This poses the question of how to delete existing rows from the database while aggregating the data being deleted and inserting a new row that averages the deleted data over a given interval. For example, I would like to keep raw data (measurements every second) for the present day and aggregated data (an average every 5 minutes) for the present month, etc.
What would be the best course of action to tackle this problem?
I checked to see if PostgreSQL had anything resembling this functionality but didn't find anything. My main idea is to use a cron job to periodically perform the aggregations/deletions from raw data to aggregated data. Can anyone think of a better option? I very much welcome any suggestions and input.
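A minimal sketch of that cron-job approach, assuming hypothetical table and column names (raw_measurements, agg_measurements, sensor_id, ts, value) rather than Thingsboard's real schema, could look like this in Python with psycopg2, with cron running it once a day:

    # Roll 1-second raw rows older than today into 5-minute averages,
    # then delete the raw rows, all inside one transaction.
    import psycopg2

    AGGREGATE = """
        INSERT INTO agg_measurements (sensor_id, bucket, avg_value)
        SELECT sensor_id,
               date_trunc('hour', ts)
                 + floor(extract(minute FROM ts) / 5) * interval '5 minutes' AS bucket,
               avg(value)
        FROM raw_measurements
        WHERE ts < current_date
        GROUP BY sensor_id, bucket
    """
    DELETE_RAW = "DELETE FROM raw_measurements WHERE ts < current_date"

    def rollup():
        # the connection context manager commits both statements together
        with psycopg2.connect("dbname=sensors") as conn:
            with conn.cursor() as cur:
                cur.execute(AGGREGATE)
                cur.execute(DELETE_RAW)

    if __name__ == "__main__":
        rollup()   # schedule with cron, e.g. shortly after midnight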

How to overcome API/WebSocket limitations with OHLC data for a trading platform with lots of users?

I'm using CCXT for some REST API calls for information, and for websockets. It's OK for one user, but if I wanted to have many users on the platform, how would I go about an in-house solution?
Currently each chart uses either websockets or REST calls, so if I have 20 charts that's 20 calls, and if I add users, that's 20x however many users. And if I fetch a complete coin list with real-time prices from one exchange, that just slows everything down.
Some ideas I have thought about so far are:
Use proxies with REST/websockets
Use TimescaleDB to store the data and serve that, or
Use caching on the server and serve that to the users
Would any of this be a solution? There has to be a way to overcome the rate limiting and reduce the number of calls to the exchanges.
Probably it's good to think about having separate layers to:
receive market data (a single connection that broadcasts data to the OHLC processors)
process OHLC histograms (subscribing to the internal market data)
serve histogram data (subscribing to the processed data)
The market data stream is huge, and if you think about these layers independently, it becomes easy to scale them and even decouple the components later if necessary.
With TimescaleDB, you can build materialized views that make it easy to access and retrieve the information. Each materialized view can have a continuous aggregate policy based on the interval of the histograms.
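As a rough example of what such a continuous aggregate could look like (the trades hypertable and its ts/symbol/price columns are assumptions, not anything from the question), the view and its refresh policy can be created from Python like this:

    # One-minute OHLC candles maintained automatically by TimescaleDB.
    import psycopg2

    CANDLES_1M = """
        CREATE MATERIALIZED VIEW candles_1m
        WITH (timescaledb.continuous) AS
        SELECT time_bucket('1 minute', ts) AS bucket,
               symbol,
               first(price, ts) AS open,
               max(price)       AS high,
               min(price)       AS low,
               last(price, ts)  AS close
        FROM trades
        GROUP BY bucket, symbol
    """
    POLICY = """
        SELECT add_continuous_aggregate_policy('candles_1m',
            start_offset      => INTERVAL '1 hour',
            end_offset        => INTERVAL '1 minute',
            schedule_interval => INTERVAL '1 minute')
    """

    conn = psycopg2.connect("dbname=marketdata")
    conn.autocommit = True   # continuous aggregates cannot be created inside a transaction
    with conn.cursor() as cur:
        cur.execute(CANDLES_1M)
        cur.execute(POLICY)
    conn.close()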
Fetching all data all the time for all the users is not a good idea.
Pagination can help by bringing the visible histograms first and limiting the query results, avoiding heavy I/O on the server and large chunks of memory.
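To make the server-side caching option from the question concrete, here is a small sketch (the symbol, timeframe and 5-second TTL are arbitrary choices) of serving every user from one cached CCXT call, so exchange traffic stops scaling with the number of users or charts:

    # One upstream fetch_ohlcv call per symbol/timeframe per TTL window,
    # shared by every chart and every user.
    import time
    import ccxt

    exchange = ccxt.binance()
    _cache = {}                # (symbol, timeframe) -> (fetched_at, candles)
    TTL_SECONDS = 5

    def get_ohlcv(symbol: str, timeframe: str = "1m"):
        key = (symbol, timeframe)
        hit = _cache.get(key)
        if hit and time.time() - hit[0] < TTL_SECONDS:
            return hit[1]                              # served from cache
        candles = exchange.fetch_ohlcv(symbol, timeframe, limit=500)
        _cache[key] = (time.time(), candles)
        return candles

    # Twenty charts (or twenty thousand users) asking for BTC/USDT within the
    # same five-second window trigger exactly one exchange call.
    print(len(get_ohlcv("BTC/USDT")))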

How to improve CloudKit server latency when uploading data

I am having a hard time uploading data to my CloudKit container in a series of
'modify records' operations. I have an 'uploader' function in my app that can populate the CloudKit private database with a lot of user data. I batch the records into multiple CKModifyRecordsOperations, with a maximum of 300 records per operation, before I upload them. When I do this with even a modest amount of data (less than 50MB), it can take dozens of minutes to complete a simple upload. This is with robust retry logic in place that takes the CKErrorRetryAfterKey value from any timed-out operations and replays them after the delay (which happens frequently).
I checked the CloudKit dashboard, and in the container's telemetry section the 'server latency' looks very high (over 100,000 at the 95th percentile). It also shows an 'average request size' of about 150KB over the last few days as I've been testing, which doesn't seem like a lot, but the server response time averages 10 seconds per operation! That seems super slow.
I've tried throttling the requests so only 20 modify operations are sent at a time, but it doesn't seem to help. I have 'query' indexes on the 'recordName' field for each recordType, and 'query, searchable, sortable' on some of the custom fields on the recordTypes (though not all). The CKModifyRecordsOperations' configurations have 'qualityOfService' set to 'userInitiated'. But none of this seems to help. I'm not sure what else I can try to improve the upload times (downloading records seems to happen as expected).
Is there anything else I can try to improve the time it takes to upload a few thousand records? Or is it out of my control?
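For reference, the batching/throttling/retry flow described above has this general shape; the sketch below is plain Python, not the CloudKit API (which would be CKModifyRecordsOperation in Swift), and upload_batch is only a placeholder:

    import time

    BATCH_SIZE = 300        # records per 'modify records' operation
    MAX_IN_FLIGHT = 20      # throttle: operations submitted per wave

    class RetryAfter(Exception):
        """Raised by a timed-out operation, carrying the server's requested delay."""
        def __init__(self, seconds):
            self.seconds = seconds

    def upload_batch(batch):
        # placeholder for one modify-records call against the private database
        pass

    def upload_all(records):
        pending = [records[i:i + BATCH_SIZE] for i in range(0, len(records), BATCH_SIZE)]
        while pending:
            wave, pending = pending[:MAX_IN_FLIGHT], pending[MAX_IN_FLIGHT:]
            for batch in wave:
                try:
                    upload_batch(batch)
                except RetryAfter as err:
                    time.sleep(err.seconds)   # honor the retry-after delay
                    pending.append(batch)     # replay the failed operation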

Reporting of workflow and times

I have to start moving transactional data into a reporting database, but I would like to move towards a more warehouse/data mart design, eventually leveraging SQL Server Analytics.
The thing being measured is the time between points of a workflow on a piece of work. How would you model that when the things that can happen do not have a specific order? Also, some work won't have all the actions, or might have the same action multiple times.
It makes me want to put the data into a typical relational design, with one table for the key or piece of work and another table that has all the actions and times. Is that wrong? The business is going to try to use Tableau for report writing, and I know it can handle all kinds of sources, but again, I would like to move away from a transactional design toward warehousing.
Is the work the dimension, and the actions and times the facts?
Are there any other good online resources for modeling questions?
Thanks
It may seem like splitting hairs, but you don't want to measure the time between points in a workflow; you need to measure time within a point of a workflow. If you change your perspective, it becomes much easier to model.
Your OLTP system will likely capture the timestamp of when the event occurred. When you convert that to OLAP, you should turn that into a start & stop time for each event. While you're at it, calculate the duration, in seconds or minutes, and the occurrence number for the event. If the task was sent to "Design" three times, you should have three design events, numbered 1,2,3.
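A rough pandas sketch of that reshaping (column names invented) may make it clearer: each state-change timestamp becomes a fact row with start, stop, duration and occurrence number.

    import pandas as pd

    events = pd.DataFrame({
        "work_id":  [1, 1, 1, 1],
        "state":    ["Design", "Build", "Design", "Ship"],
        "event_ts": pd.to_datetime(["2024-01-02 09:00", "2024-01-03 14:00",
                                    "2024-01-04 10:00", "2024-01-05 08:00"]),
    })

    facts = events.sort_values(["work_id", "event_ts"]).copy()
    facts["start_ts"] = facts["event_ts"]
    # the next event's timestamp closes out the current one
    facts["stop_ts"] = facts.groupby("work_id")["event_ts"].shift(-1)
    facts["duration_min"] = (facts["stop_ts"] - facts["start_ts"]).dt.total_seconds() / 60
    # occurrence number per state: the task's 1st, 2nd, 3rd visit to Design, etc.
    facts["occurrence"] = facts.groupby(["work_id", "state"]).cumcount() + 1

    print(facts[["work_id", "state", "start_ts", "stop_ts", "duration_min", "occurrence"]])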
If you want to know how much time a task spent in design, the cube will sum the duration of all three design events to present a total time. You can also build calculated measures to determine first time in and last time out.
Having the start & stop times of the task allows you to, for example, find all of the tasks that finished design in January.
If you're looking for an average above the event grain, for example the average time in design across all tasks, you'll need a new calculated measure using total time in design / number of tasks (not events).
Assuming you have more granular states, it is a good idea to define parent states for use in executive reporting. In my company, the operational teams have workflows with 60+ states, but management wanted them rolled up into five summary states. The rollup hierarchy should be part of your workflow states dimension.
Hope that helps.

Need a separate table to cache top 10 scores for leaderboard (MongoDB)?

I have a question about the following situation:
I already have a score table storing the best score per game per user. Roughly, each user has about 10 rows in the score table, one for their top score in each of 10 games. If I want to get the overall top 10 users for a leaderboard, theoretically I could query the score table directly. But I am worried that if there are a lot of users in the score table and many of them query the top 10 at the same time, the query on the server will take too long.
Is it worth creating a separate table to cache the overall top 10 users for each game, so that the caching table's size is fixed and query time does not grow as the number of users grows? I could update the caching table every hour or so.
I am not sure whether different databases differ on this. Right now I am using MongoDB, provided by Parse.com. I might move to another database later (like MySQL).
I would strongly recommend looking into caching for a score board.
You've already seen that with a lot of users and a lot of games, keeping the top scores board current and accurate will become more and more expensive over time.
You could use Mongo to pre-process results and store them in a different collection, but you might also consider using something like Couchbase (http://www.couchbase.com/), Memcached (http://memcached.org/) or Redis (http://redis.io/) to cache your data.
A good approach for caching the scores ...
For a top 10 list for all users for a given game, set the cache expiration to 1-5 minutes. So after 5 minutes it will run your Mongo query and refresh the cache. You'll have to decide how fresh you want that data to be.
For a user's own high score board ... cache it for days. You can proactively expire it when they have finished a game and set a new high score. Only then will you refresh it.
The idea is identifying which pieces of data need to be pro-actively expired and which can expire on their own.
Also, as a note, I would recommend trying as hard as possible to make everything expire on its own. It's much easier to manage.
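As one possible shape for that cache (collection and field names are assumptions), a Mongo query behind a short-lived Redis key keeps the leaderboard read cheap no matter how many players hit it:

    import json
    import pymongo
    import redis

    mongo = pymongo.MongoClient()["gamedb"]
    cache = redis.Redis()
    TTL_SECONDS = 300                     # refresh the board at most every 5 minutes

    def top10(game_id: str):
        key = f"leaderboard:{game_id}"
        cached = cache.get(key)
        if cached:
            return json.loads(cached)     # served from cache, no Mongo query
        rows = list(
            mongo["scores"]
            .find({"game_id": game_id}, {"_id": 0, "user_id": 1, "score": 1})
            .sort("score", pymongo.DESCENDING)
            .limit(10)
        )
        cache.set(key, json.dumps(rows), ex=TTL_SECONDS)
        return rows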
That's exactly what I will do in the game I'm creating. Since I have multiple game levels (easy, medium, hard), I need to store the top 10 for each level. To reduce the number of API calls to Parse.com, I will create a cloud background job on Parse that runs every 5 minutes and builds a cache table with the top 10 for each level; then my client app just selects this entire cache table (30 records) with one API call.