Streaming Analytics Query Yielding More Than Real Time Data - tsql

I recently set up a Stream Analytics job pulling data from an Event Hub that's capturing about 1000 events per month. I'd like to pull real-time data from the Event Hub and display it in near real time. I entered the following query, and data is flowing through the Stream Analytics job successfully:
SELECT system.timestamp AS time
, city
, state
, zip
, hascontactedconsultant
, websiteguideid
, status
, assignedto
, type
, COUNT(type)
INTO ttvleadsstream
FROM ttvhuball
GROUP BY time
, city
, state
, zip
, hascontactedconsultant
, websiteguideid
, status
, assignedto
, type
, TumblingWindow(ss, 5);
However, when I check the dataset in my PowerBI online service, I notice that it's capturing and displaying events from now all the way back to yesterday. I don't see what in my query would cause data to be preserved for longer than 5 seconds. Any ideas?

Based on your query, if there are events, Azure Stream Analytics will output a result every 5 seconds.
It looks like you are sending the data to PowerBI. PowerBI deletes old data based on a retention policy; details about the policy are here:
https://msdn.microsoft.com/en-us/library/mt186545.aspx
Roughly, it only purges old data when certain thresholds are met, which is likely why you see old data.

Related

MSK & building aggregate tables (e.g. for analytics)

I use MSK and I manually build aggregate tables of my streams in my application code (e.g. TypeScript in a node.js webservice). I have lots of data (approaching 1M events per day), and I want to be able to productionise different real-time 'views' on the incoming stream. E.g. for some sales data, I might want to create these views:
sales per customer (table schema: customer, sum_of_sales)
sales per day (table schema: date, sum_of_sales)
sale per customer per day (table schema: date, customer, sum_of_sales)
Today, if I wanted to achieve this, I would scaffold 3 tables (could be an RDBMS or something like DynamoDB), and then in my application code I would insert/upsert into the tables for every sales event that arrived (a sketch of that pattern is below). The scaffolding around that feels a little tedious, and I was wondering if there is a better way that doesn't require writing a bunch of code in my webservice to pull from the consumer and upsert the data into a table.
All I would expect my code in my web service to do is provide APIs (e.g. REST APIs) to fetch data from these views. E.g. a client makes a REST request to get all sales in the last 7 days for customers X, Y and Z.
There seem to be a lot of technologies out there, but my use case is fairly trivial, and from the not-so-brief look I took, nothing does this.
Thanks
If it's noteworthy, I currently keep my data indefinitely.
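
For concreteness, here is a minimal sketch of the per-event upsert pattern described above, assuming kafkajs as the MSK consumer and pg for the database; the topic, table, and column names are hypothetical:

import { Kafka } from "kafkajs";
import { Pool } from "pg";

const kafka = new Kafka({ clientId: "sales-aggregator", brokers: ["broker:9092"] });
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

async function run() {
  const consumer = kafka.consumer({ groupId: "sales-views" });
  await consumer.connect();
  await consumer.subscribe({ topic: "sales" });

  await consumer.run({
    eachMessage: async ({ message }) => {
      const { customer, amount, timestamp } = JSON.parse(message.value!.toString());
      const day = timestamp.slice(0, 10); // "YYYY-MM-DD"

      // Maintain the "sales per customer per day" view with an upsert;
      // the other two views follow the same pattern with different key columns.
      // Assumes a UNIQUE constraint on (date, customer).
      await pool.query(
        `INSERT INTO sales_per_customer_per_day (date, customer, sum_of_sales)
         VALUES ($1, $2, $3)
         ON CONFLICT (date, customer)
         DO UPDATE SET sum_of_sales = sales_per_customer_per_day.sum_of_sales + EXCLUDED.sum_of_sales`,
        [day, customer, amount]
      );
    },
  });
}

run().catch(console.error);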

Time Stamped Data

Our edge device has an inbuilt data logging function which logs data at regular intervals. If for some reason the connection to the cloud is lost for a period of time, then the next time it connects it will upload data from its internal data log memory. In this case the sample is sent with a timestamp of when the data was logged, which is obviously different from the time it is received by the cloud.
The time stamp is sent in a standard format as shown by the packet below.
{"d": { "Ch_1": 37.4,"Ch_2": 37.1,"Ch_3": 3276.7,"Ch_4": 3276.7},"bt": "2016-09-19T14:35:00.00+12:00"}
where "bt" is name for the base time of the sample. Looking at the property details in the schemas, I can set the data type to a string type but how would I get this data to be recognized as a date/time stamp and store this data accordingly?
Is there a way of doing this?
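
For what it's worth, the "bt" value is standard ISO 8601, so a consumer can convert it to a native date/time directly; here is a minimal TypeScript sketch of the parsing side (this assumes you can post-process the packet rather than rely on the schema):

// The packet from the question; "bt" is ISO 8601 with a +12:00 offset,
// which Date parses directly.
const packet = JSON.parse(
  '{"d": {"Ch_1": 37.4, "Ch_2": 37.1, "Ch_3": 3276.7, "Ch_4": 3276.7}, "bt": "2016-09-19T14:35:00.00+12:00"}'
);

const baseTime = new Date(packet.bt);
console.log(baseTime.toISOString()); // 2016-09-19T02:35:00.000Z (normalized to UTC)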

PostgreSQL REST API Pagination

I'm following Pagination Done the Right Way, which orders by date for news.
I want to order by created_at (a timestamp) for posts (like Facebook posts).
According to PostgreSQL Date/Time Types, timestamp has a resolution of 1 microsecond.
The API's clients, however, only need (to display) whole seconds.
So, should I just round created_at to whole seconds (with CURRENT_TIMESTAMP(0)) by default when inserting new posts?
That way, to get the next page, the client can simply send back to the REST API server the created_at timestamp (in whole seconds) of the last post it received.
Otherwise, the client would have to know the exact created_at timestamp (down to the microsecond) of the last post it received.
Is there any reason to store the exact microseconds in the database (especially if they're never sent to the clients)? Isn't a whole second enough precision for something like Facebook (or Instagram) posts?
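
Whichever precision you choose, note that with whole-second timestamps several posts can share the same created_at, so keyset pagination needs a unique tie-breaker. Here is a minimal sketch using pg, where posts and created_at come from the question and the id primary key and body column are assumed:

import { Pool } from "pg";

const pool = new Pool(); // connection settings come from the environment

// Fetch the page after the given (created_at, id) cursor, newest first.
// The row-value comparison keeps ordering stable even when several posts
// share the same whole-second created_at.
async function nextPage(lastCreatedAt: string, lastId: number, pageSize = 20) {
  const { rows } = await pool.query(
    `SELECT id, created_at, body
       FROM posts
      WHERE (created_at, id) < ($1, $2)
      ORDER BY created_at DESC, id DESC
      LIMIT $3`,
    [lastCreatedAt, lastId, pageSize]
  );
  return rows;
}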

Keep set members sorted in Redis

I'm coding an IM system.
I'm using Redis and JSON to store the data. I have a Redis Set with the conversation IDs. When I retrieve them, I would like to get the list sorted by the timestamp of each conversation's last message:
conversation 9 -> last message timestamp: 1390300000
conversation 12 -> last message timestamp: 1390200000
conversation 7 -> last message timestamp: 1390100000
I have a Set with the conversations where each user participates (user1337:conversations) and a List with the JSON-encoded messages of each conversation (conversation1234:messages).
I'd guess there is no need for tricks and that this can be done natively with Redis. How would you achieve this?
Sounds like a Sorted Set is exactly what you need.
You would set the timestamp of each conversation as its score (see ZADD), and then you can retrieve them in order using commands like ZRANGE, ZRANGEBYSCORE, ZREVRANGE and ZREVRANGEBYSCORE.
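
A minimal sketch of that approach, assuming ioredis and reusing the question's key naming (the plain Set user1337:conversations becomes a Sorted Set):

import Redis from "ioredis";

const redis = new Redis(); // defaults to localhost:6379

async function example() {
  // Score each conversation ID by its last-message timestamp
  // (ZADD updates the score if the member already exists).
  await redis.zadd("user1337:conversations", 1390300000, "9");
  await redis.zadd("user1337:conversations", 1390200000, "12");
  await redis.zadd("user1337:conversations", 1390100000, "7");

  // Most recently active conversations first.
  const ids = await redis.zrevrange("user1337:conversations", 0, -1);
  console.log(ids); // [ '9', '12', '7' ]
}

example().catch(console.error);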

Get inbox messages from a date onwards

Using the Graph API Explorer with GET /me/inbox, I can get a list of messages.
How would I limit them to messages from the past day, for example?
You can use time-based paging this way:
me/inbox?since=1372395600
It relies on the updated_time (unix timestamp) field of an inbox thread. This way you could get all the threads updated with a message since yesterday, for example.
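
As a rough sketch, the same call over the Graph API's HTTP endpoint in TypeScript, computing "one day ago" as a unix timestamp (ACCESS_TOKEN is a placeholder for a valid user access token):

async function threadsSinceYesterday() {
  // Unix timestamp for 24 hours ago.
  const since = Math.floor(Date.now() / 1000) - 24 * 60 * 60;

  const res = await fetch(
    `https://graph.facebook.com/me/inbox?since=${since}&access_token=${process.env.ACCESS_TOKEN}`
  );
  const { data } = await res.json();
  return data; // threads updated in the past day
}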