How to access all historical public posts of a user? - StockTwits

I am writing a script that will extract all posted entries and exits for EOD & EOW performance reporting. I would also like to start out by generating a report for all historical trades. Is there a simple way to retrieve all the message history of a single user?

We don't currently offer historical data, aside from the ability to request messages with a maximum message_id. Our data vendors do offer historical data, and we may in the future offer it ourselves if we find enough interest.
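Paging backwards with a maximum message_id, as the answer suggests, can be sketched as a loop that repeatedly requests messages older than the smallest id seen so far. The sketch below keeps the HTTP call abstract (the `fetch_page` callable is a placeholder you would implement against the actual streams endpoint; the endpoint and parameter names are not shown here and would need to be taken from the API docs):

```python
def fetch_all_messages(fetch_page, start_max=None):
    """Page backwards through a user's stream.

    fetch_page(max_id) -> list of message dicts (newest first), each
    with an "id" key; it should return messages with id <= max_id,
    or the newest page when max_id is None.
    Returns all messages, newest first.
    """
    messages = []
    max_id = start_max
    while True:
        page = fetch_page(max_id)
        if not page:
            break
        messages.extend(page)
        # Next request: strictly older than anything already fetched.
        max_id = min(m["id"] for m in page) - 1
    return messages
```

With a real implementation, `fetch_page` would issue the HTTP request and honor the API's rate limits; the loop itself is unchanged.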


How to count read & write operations programmatically

Are there any tools to count Firestore read & write operations?
Or how do you usually count each operation manually?
Thanks!
There's no tool to keep a count of the operations in your Firestore database; you would need to track them manually. If what you are trying to do is estimate the bill you'll receive for the month, I would instead recommend performing an estimate as depicted here.
If you have an issue with the number of reads you are currently performing, I would recommend taking a look at this article, which may help with that topic.
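Tracking manually usually means routing reads and writes through your own code and tallying them there. A minimal sketch of that idea follows; the `InMemoryClient` is a stand-in so the example runs without Firestore (the SDK itself exposes no counting hook), and with the real client you would wrap your document calls in the same way:

```python
class InMemoryClient:
    """Stand-in for a document store, so the sketch runs without Firestore."""
    def __init__(self):
        self.docs = {}

    def get(self, path):
        return self.docs.get(path)

    def set(self, path, data):
        self.docs[path] = data


class CountingClient:
    """Tallies every read/write made through it."""
    def __init__(self, client):
        self.client = client
        self.reads = 0
        self.writes = 0

    def get(self, path):
        self.reads += 1
        return self.client.get(path)

    def set(self, path, data):
        self.writes += 1
        return self.client.set(path, data)
```

The counts only cover operations your own code performs; reads triggered by queries, listeners, or the console would still need separate accounting.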

Do Firebase/Firestore Transactions create internal queues?

I'm wondering if transactions (https://firebase.google.com/docs/firestore/manage-data/transactions) are viable tools to use in something like a ticketing system, where users may be attempting to read/write to the same collection/document and requests should be handled in the order they were made.
If not, what would be a good structure for such a need with Firestore?
Transactions only guarantee an atomic, consistent update of the documents involved in the transaction. They don't guarantee the order in which those transactions complete, as the transaction handler might get retried in the face of contention.
Since you tagged this question with google-cloud-functions (but didn't mention it in your question), it sounds like you might be considering writing a database trigger to handle incoming writes. Cloud Functions triggers also do not guarantee any ordering when under load.
Ordering of any kind at the scale on which Firestore and other Google Cloud products operate is a really difficult problem to solve (please read that link to get a sense of that). There is no simple database structure that will impose an order in which changes are made. I suggest you think carefully about your need for ordering, and come up with a different solution.
The best indication of order you can get is probably by adding a server timestamp to individual documents, but you will still have to figure out how to process them. The easiest thing might be to have a backend periodically query the collection, ordered by that timestamp, and process things in that order, in batch.
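The periodic-query approach in the last paragraph can be sketched as a batch step that processes documents in timestamp order and remembers a checkpoint so the next run skips what it has already seen. The document shape (`{"ts": ...}`) is an assumption for illustration; with Firestore, the query itself would be `order_by` on the server-timestamp field:

```python
def process_batch(docs, last_ts, handler):
    """Process docs with ts > last_ts in ascending timestamp order.

    Returns the new checkpoint timestamp, so the caller can persist it
    and pass it back on the next periodic run.
    """
    pending = sorted((d for d in docs if d["ts"] > last_ts),
                     key=lambda d: d["ts"])
    for d in pending:
        handler(d)
    return pending[-1]["ts"] if pending else last_ts
```

Note that server timestamps only give an *indication* of order, as the answer says; two writes landing in the same instant, or clock skew between backends, can still reorder items.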

Calculating and reporting Data Completeness

I have been working on measuring data completeness and creating actionable reports for our HRIS system for some time.
Until now I have used Excel, but now that the reporting requirements have stabilized and the need for quicker response times has increased, I want to move the work to another level. I would also like more detailed options for distinguishing between different units.
As an example I am looking at missing fields. So for each employee in every company I simply want to count how many fields are missing.
For other fields I am looking to validate data - like birthdays compared to hiring dates, threshold for different values, employee groups compared to responsibility level, and so on.
My question is where to move from here. Is there any language that is better than the others for importing lists, evaluating fields in those lists, and then quantifying the results at company and other levels? I want to be able to extract data from our different systems, then have a program do all the calculations and summarize the findings in some way. (I consider it to be a good learning experience.)
I've done something like this in the past and sort of cheated. I wrote a program that ran nightly, identified missing fields (not required, but necessary for data integrity) and dumped those to an incomplete-record table that was cleared each night before the process ran. I then sent batch emails to the group responsible for each missing element (Payroll/Benefits/Compensation/HR Admin) so the missing data could be added. I used .NET against an Oracle database and sent emails via Lotus Notes, but a similar design should work in just about any environment.
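The two checks the question describes (count missing fields per employee per company, and validate fields against each other) fit in a few functions in any general-purpose language. A minimal sketch in Python, with a made-up required-field list and a deliberately crude age check as one example of a cross-field rule:

```python
from datetime import date

# Hypothetical required fields; substitute your HRIS schema.
REQUIRED = ["first_name", "last_name", "birthday", "hire_date", "department"]


def missing_fields(employee):
    """Names of required fields that are absent or empty for one employee."""
    return [f for f in REQUIRED if not employee.get(f)]


def validate(employee):
    """Cross-field checks, e.g. hiring shouldn't precede the 16th birthday.

    The year-difference comparison is intentionally rough; a real rule
    would compare full dates.
    """
    issues = []
    bday, hired = employee.get("birthday"), employee.get("hire_date")
    if bday and hired and (hired.year - bday.year) < 16:
        issues.append("hired before age 16")
    return issues


def completeness_report(employees):
    """Total count of missing fields, grouped by company."""
    report = {}
    for e in employees:
        company = e.get("company", "?")
        report[company] = report.get(company, 0) + len(missing_fields(e))
    return report
```

The same shape extends to thresholds and group-vs-responsibility rules: each rule becomes another small predicate in `validate`, and the report aggregates by whatever unit you need.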

Real-time statistics (example). NoSQL

Task
Hi, I have 2-3 thousand users online. I also have groups, teams, and a few other entities that contain users. About every 10 seconds I want to show online statistics (querying various parameters of users and the other entities). Roughly every 5-30 seconds a user can change his status, and about every hour he may move to another group or team. Which NoSQL database should I use? I don't have experience with them; I just know NoSQL is quite fast and have read a little about Redis, MongoDB, and Cassandra.
Of course, I store this data model in an RDBMS (except the online status and statistics).
I am thinking about the following solution:
Store all data as JSON in Redis, with keys prefixed by entity type (e.g. 'user_' + userId):
user_id:{"status":"123", "group":"group_id", "team":"team_id", "firstname":"firstname", "lastname":"lastname", ... other attributes}
group_id:{users:[user_id,user_id,...], ... other group attributes}
team_id:{users:[user_id,user_id,...], ... other team attributes}
...
What would you recommend or propose? Will it be convenient to query such data?
Maybe I can use some popular standard algorithms to compute the statistics (e.g. a Monte Carlo algorithm for percentage statistics, I don't know). Thanks
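A minimal sketch of the key layout described above, using a plain dict in place of Redis so it runs standalone (with redis-py, the dict assignments would become `r.set(key, value)` and the lookups `r.get(key)`; all helper names here are illustrative):

```python
import json


def user_key(user_id):
    return f"user_{user_id}"


def group_key(group_id):
    return f"group_{group_id}"


def store_user(store, user_id, status, group_id, team_id, **attrs):
    """Write one user as a JSON blob under a type-prefixed key."""
    store[user_key(user_id)] = json.dumps(
        {"status": status, "group": group_id, "team": team_id, **attrs})


def online_stats(store, group_id):
    """Count members of a group whose status is 'online' -- the kind
    of query the 10-second statistics refresh would run."""
    members = json.loads(store[group_key(group_id)])["users"]
    return sum(1 for uid in members
               if json.loads(store[user_key(uid)])["status"] == "online")
```

Note the drawback this layout exposes: every statistics query re-reads and re-parses one blob per member, which is why aggregate structures (sets, counters, HyperLogLogs) tend to be recommended for the counting side.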
You could use Redis HyperLogLog, a feature added in Redis 2.8.9.
This blog post describes how to very efficiently calculate some statistics that look quite similar to the ones you need.
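The shape of the HyperLogLog approach is: one HLL key per entity (group, team) per time window, `PFADD` on each activity event, `PFCOUNT` to read distinct-user counts, including merged across keys. The sketch below uses an exact-set stand-in so it runs without a server; with redis-py, `r.pfadd(key, *values)` and `r.pfcount(*keys)` have the same call shape, but return *approximate* counts in constant memory:

```python
class FakeHLL:
    """Exact-set stand-in for Redis's approximate HyperLogLog,
    so this sketch runs without a Redis server."""
    def __init__(self):
        self.sets = {}

    def pfadd(self, key, *values):
        s = self.sets.setdefault(key, set())
        before = len(s)
        s.update(values)
        # Like Redis: 1 if the structure changed, else 0.
        return int(len(s) > before)

    def pfcount(self, *keys):
        return len(set().union(*(self.sets.get(k, set()) for k in keys)))


def mark_online(r, entity, user_id):
    """Record one activity event for a user within an entity."""
    r.pfadd(f"online:{entity}", user_id)


def online_count(r, *entities):
    """Distinct online users across one or more entities."""
    return r.pfcount(*(f"online:{e}" for e in entities))
```

In production you would also bucket the keys by time window (e.g. `online:group_1:2024-01-02T10:30`) and expire old buckets, so each 10-second refresh only touches the current window.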

Creating a snapshot in a distributed architecture

I'm thinking about the problem in the question title: if I have to query for an aggregate in a distributed architecture, where the distributed event store may still be waiting for the last events to be distributed, how can I know that the aggregate I'm reading via the read model is not being replaced by an updated one on another server in the network?
I have an HTTP server that receives events to save to the store. The store does not actually exist yet, but I want to implement it soon.
The events concern a huge aggregate that takes 4 MB when serialized as JSON.
Another sub-question: what storage do you recommend for the snapshot?
EDIT
I don't know whether the question is badly written or whether I have selected the wrong tags...
The ability to know when the "last" event in the distributed store is processed depends on two things:
Can you define "last"?
Does the distributed storage engine expose it to you?
The CAP theorem is a good reference to the sort of problems you are going to have with both of those in a distributed data store; in general, unless you give up availability you are not going to be able to have the properties needed to get what you want.
On the other hand, if you can define last in a meaningful way, you can still have what you want. For example: do your events expire after a while? If, for example, they expire after 12 hours, you know that you can always meaningfully define last as "the moment in time 12 hours ago", because any unprocessed event older than that is obsolete...
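The expiry idea above reduces to a simple cutoff computation: anything older than the TTL is, by definition, before "last". A tiny sketch, with the 12-hour figure taken from the example (it is a parameter, not a recommendation):

```python
from datetime import datetime, timedelta, timezone


def is_obsolete(event_ts, now=None, ttl=timedelta(hours=12)):
    """True if the event is older than the TTL and can be treated as
    settled: the moment `now - ttl` is a meaningful definition of
    "last", since nothing older can still be in flight."""
    now = now or datetime.now(timezone.utc)
    return event_ts < now - ttl
```

Any event that passes this check is guaranteed to have either been distributed or expired, so a read model built only from events older than the cutoff is stable.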
To answer your sub-question: I strongly recommend a storage engine that you do not write yourself, because distributed data storage is an awesomely hard problem that many very smart people, working for companies doing nothing but solving problems in this space, are solving for you.
Leverage their work instead.