Is access to the Activity Log on the short term road map?
I have two use cases:
Compliance:
Weekly dump of the Activity Logs, consolidated to provide compliance metrics during initial adoption of the system.
Non-Compliance:
Weekly dump of the Activity Logs, consolidated to provide compliance metrics and compared against the user list to identify non-compliance/resistance during initial adoption of the system.
Of course, those could continue after roll-out, but they may be key to identifying areas of resistance to adoption and things to improve early in the process.
I use Python 3.6 with the associated SDK.
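The consolidation side is simple once an export exists; here is a minimal sketch of what I have in mind, assuming a hypothetical weekly CSV export with timestamp/user/action columns (the export format is my guess, since the Activity Log isn't exposed yet):

```python
# Sketch only: consolidate a hypothetical weekly CSV export of the Activity Log
# (columns assumed: timestamp, user, action) into simple compliance metrics.
import csv
from collections import Counter

def compliance_metrics(dump_path, all_users):
    """Count actions per user and list users with no recorded activity."""
    actions_per_user = Counter()
    with open(dump_path, newline="") as f:
        for row in csv.DictReader(f):
            actions_per_user[row["user"]] += 1
    non_compliant = sorted(set(all_users) - set(actions_per_user))
    return actions_per_user, non_compliant

if __name__ == "__main__":
    metrics, inactive = compliance_metrics("activity_log_week.csv",
                                           ["alice", "bob", "carol"])
    print("Actions per user:", dict(metrics))
    print("No recorded activity:", inactive)
```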
Craig
It's on our list, but I can't say "short term."
It's relatively expensive due to the large list of activities we need to model.
Thank you for including the use cases - that really helps us prioritize.
I am getting very high counts of entity writes in my Firestore database.
Write permission on most paths is restricted; those writes are done from the back-end server using the Admin SDK. Only a very few paths allow client writes, specifically for users who are authenticated, registered, and joined/approved in a specific group. So even though the avenues for abuse appear narrow, they are hard to pin down.
The only way I see is to execute a Cloud Function on every write and have the function log the paths somewhere for analysis. But that introduces further cost and complexity.
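For concreteness, that Cloud Function approach would look roughly like the sketch below, using the firebase-functions Python SDK (the document pattern and names are placeholders, and the decorator names should be checked against the current docs). Note that, as far as I know, the Firestore trigger event does not include the writing user's identity, so to log the "who" you would have to store the UID in the document itself:

```python
# Sketch only: log the path of every write under a placeholder collection,
# using the firebase-functions Python SDK for Cloud Functions.
from firebase_functions import firestore_fn

@firestore_fn.on_document_written(document="groups/{groupId}/items/{itemId}")
def log_write(event: firestore_fn.Event) -> None:
    # event.params holds the wildcard values, i.e. the path components;
    # printed lines end up in Cloud Logging, where they can be aggregated.
    print("write at groups/%(groupId)s/items/%(itemId)s" % event.params)
    # The writer's UID is not part of the event; if you need it, store it in
    # the document (e.g. an 'updatedBy' field) and read it from event.data here.
```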
Is there any way/recommendation to monitor/profile where (i.e. which path) and who (UID or any identity) is performing the writes? There are tools to do this for the RTDB, but I can't find anything for Firestore.
I am also wondering if there is any way to restrict IPs/users automatically in case of abuse (i.e. an unusually high rate of reads/writes)?
What I'm currently doing is going to the Firestore console => Usage menu => View usage,
where I see a chart of aggregate read/write/delete counts over time.
It's not the same as the profiler, but better than nothing.
I'm also keeping an eye on the video linked below to see if someone provides an answer; people are asking for the profiler there too.
https://www.youtube.com/watch?v=9CObBsjk6Tc
I'm trying to build instant-messaging functionality in my app as part of a bigger project.
Chats can have more than 2 participants (group chats).
If participant A deletes a message, it should still be visible to participant B (that's why I used the Message Participants table).
The same applies to a Conversation.
By the same logic, if all participants delete the conversation/message, it should be erased from the DB.
Questions :
I'm afraid this schema is too cumbersome, meaning the queries will become too slow once the app passes a certain traffic mark (1k active users? I'm guessing).
Message Participants will have multiple records for each message - one for each participant in the chat. Instant messaging means those writes will happen with very tight timing. Wouldn't that be a problem?
Should I add a Redis layer to manage the messaging of a chat's active session? It would store the recent messages and actively sync the PostgreSQL DB with them (perhaps with the asynchronous transaction functionality that PostgreSQL has?).
UPDATED schema:
I would also gladly hear ideas for a "read" status functionality. I'm assuming it's much more complex with group chats, so at least offering it for 1:1 chats would be nice.
I am a little confused by your diagram. Shouldn't the Conversation Participants be linked to the Conversations instead of the Message? The FKs look all right, just the lines appear wrong.
I wouldn't be worried about performance yet. The Premature Optimization Anti-Pattern warns us not to give up a clean design for performance reasons until we know whether we are going to have a performance problem. You anticipate 1000 users - that's not much for a modern information system. Even if they are all active at the same time and enter a message every 10 seconds, this will just mean 100 transactions per second, which is nothing to be afraid of. Of course, I don't know the platform on which you are going to run this. But it should be an easy task to set up those tables and write a simple test program that inserts those records as fast as possible.
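To make that last suggestion concrete, here is a minimal sketch of such a test program using psycopg2, with table and column names guessed from your diagram (adjust them to your actual schema):

```python
# Rough load test: insert messages plus one Message Participants row per member,
# one transaction per message, and report the achieved rate. Names are guesses.
import time
import psycopg2

PARTICIPANTS = [1, 2, 3, 4, 5]        # pretend members of one group chat
N_MESSAGES = 10000

conn = psycopg2.connect("dbname=chat_test user=postgres")
cur = conn.cursor()

start = time.time()
for i in range(N_MESSAGES):
    cur.execute(
        "INSERT INTO messages (conversation_id, sender_id, body) "
        "VALUES (%s, %s, %s) RETURNING id",
        (1, PARTICIPANTS[i % len(PARTICIPANTS)], "hello %d" % i))
    message_id = cur.fetchone()[0]
    for p in PARTICIPANTS:
        cur.execute(
            "INSERT INTO message_participants (message_id, participant_id, read) "
            "VALUES (%s, %s, FALSE)", (message_id, p))
    conn.commit()                     # one transaction per message, like real traffic
elapsed = time.time() - start
print("%d messages in %.1fs -> %.0f messages/s" % (N_MESSAGES, elapsed, N_MESSAGES / elapsed))
```

If the numbers from a run like this comfortably exceed your projected peak, the schema is fine as it is.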
Your second question makes me wonder how "instant" you expect your message passing to be. Shall all viewers of a message receive each keystroke of the text within a millisecond? Or do they just need to see each message appear right after it was posted? Anyway, the limiting factor for user responsiveness will probably be the network, not the database.
Maybe this is not mainly a database design issue. Let's assume you will have a tremendous rate of postings and viewings. But not all conversations will be busy all the time. If the need arises - but not earlier - it might be necessary to hold the currently busy conversations in memory and use the database just as a backup for the times when they aren't busy any more.
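If it ever comes to that, the in-memory side can start very small. A sketch with redis-py and made-up key names, keeping the last N messages of a busy conversation in a Redis list and falling back to the database on a cache miss:

```python
# Sketch: cache the most recent messages of a conversation in Redis and fall
# back to PostgreSQL when the cache is cold. Key and field names are illustrative.
import json
import redis

r = redis.Redis()
RECENT = 100                                # how many messages to keep hot

def push_message(conversation_id, message):
    key = "conv:%d:recent" % conversation_id
    r.lpush(key, json.dumps(message))       # newest first
    r.ltrim(key, 0, RECENT - 1)             # cap the list length
    # ...and also INSERT into PostgreSQL (directly or via a background worker).

def recent_messages(conversation_id):
    key = "conv:%d:recent" % conversation_id
    cached = r.lrange(key, 0, RECENT - 1)
    if cached:
        return [json.loads(m) for m in cached]
    return []                               # cache miss: load from PostgreSQL instead
```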
Concerning your additional comments:
100k users: This is a topic not for this forum but for the business development of a startup. Many founders of startup companies imagine huge masses of users being attracted to their site, while in reality most startups just fail or reach only very few. So beware of investments (in money, but also in design and implementation effort) that will only pay off in the highly improbable case that your company becomes the next WhatsApp.
In case you don't really anticipate such masses of users but just want to imagine this as a programming exercise, you still have a difficult task. You won't have the platform to simulate the traffic, so there is no way to make measurements on where you actually have a performance problem to solve. That's one of the reasons for the Premature Optimization warning: Unless you know positively where you have a bottleneck, you - and all of us - will be just guessing and probably make the wrong decisions.
Marking a message as read is easy: Introduce a boolean attribute read at Message Participants, and set it to true as soon as, well, the user has read the message. It's up to your business requirements in which cases and to whom you show this.
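In code, that is a one-line update per reader (again with psycopg2 and the guessed column names from the earlier sketch):

```python
# Sketch: mark a message as read for one participant (table/column names assumed;
# quote or rename the 'read' column if your tooling treats READ as a keyword).
def mark_read(cur, message_id, participant_id):
    cur.execute(
        "UPDATE message_participants SET read = TRUE "
        "WHERE message_id = %s AND participant_id = %s",
        (message_id, participant_id))
```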
This may be quite an easy question to answer, as it may just be my lack of understanding, but if you are having to run the query twice - once on the server and once on the client - why not just publish all the collection data and then run a single query on the client?
Obviously I don't mean doing this for the users collection, but if you have a blog Posts collection, wouldn't this be beneficial?
Publish all the post data, then subscribe to it and run whatever query is necessary on the client to get the data you need.
Publishing everything is fine in a 'development' environment, since Meteor adds the autopublish package by default, but it has some pitfalls in a 'production' environment. I find these two points to be of importance:
Security: The idea is to supply only as much data to the client as required. You can never trust the client, and you don't know what the client may use the data for. For your use case of simple blog posts this may not be a serious risk, but it can be a critical one for an e-commerce application. The last thing you want is a hacker using the data, and leveraging a bug in your code, to do nasty stuff.
Data overheads: For subscriptions, waitOn is generally used, so the templates are not rendered until all the data has been made available to the client. If you have a very large amount of data, it will take considerable time to render. So it is advisable to keep the data at an 'only what you need' level to optimize this time too.
One of our web sites is a typical "post your apartment ad for free" service.
Revenue is directly tied to the amount of public usage and the number of ads registered (our marketing department's argument).
On the other side, REST pushes you to maintain a clean API when designing your service (our software department's argument), which amounts to an open invitation for any competitor to steal our data. Seen this way, the web server becomes almost an intelligent database.
We have clearly identified our problem, but have no idea how to reconcile these constraints. Any tips would help.
Throttle the calls to the data-rich elements by IP to, say, 1000 per day (or triple what a normal user would use).
If you expose data then it can be stolen. And think about search elements that return large datasets even if they are initiated by JavaScript or forms - I personally have written trawlers that circumvent these issues.
You may also think (if the data is that important) about decrypting it in the client based on keys and authentication sent from the server (but this only raises the bar; it doesn't remove the ability to steal).
Add a captcha/re-captcha for users who are scanning too quickly or too much.
In short:
As always, only expose the minimum API needed to do the job (attack surface minimisation)
Log and throttle (a sketch of the throttling idea follows below)
Force sign-in(?). This at least MAY put off some scanners
Use a captcha mechanism for users you think may be bots trawling your data
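To illustrate the "log and throttle" point, here is a minimal sketch of a per-IP daily quota using Flask and an in-memory counter; in practice you would enforce this in the gateway/reverse proxy or back it with a shared store such as Redis:

```python
# Sketch: per-IP daily request quota on data-rich endpoints. In-memory only,
# so it resets on restart and does not work across multiple server processes.
from collections import defaultdict
from datetime import date
from flask import Flask, abort, jsonify, request

app = Flask(__name__)
DAILY_LIMIT = 1000
hits = defaultdict(int)                    # (ip, day) -> request count

@app.before_request
def throttle():
    key = (request.remote_addr, date.today())
    hits[key] += 1
    if hits[key] > DAILY_LIMIT:
        abort(429)                         # Too Many Requests

@app.route("/api/announces")
def announces():
    return jsonify([])                     # the data-rich listing endpoint goes here

if __name__ == "__main__":
    app.run()
```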
I'm studying UML and I've been reading about use cases. All of the examples I see deal with a single system, and I was wondering how an end-to-end process would be modeled. So I made up a fairly typical enterprise scenario and have been trying to model it, and I have questions that I've been unable to answer.
Scenario: My business use case is a Shopper creates a Shopping Cart which is received by the Vendor as an Order.
The end to end process flow is:
The shopper creates a cart
A manager reviews the cart and approves/rejects it, and a purchase order is created in the purchasing system.
The purchasing system sends all newly created POs to their respective vendors' systems.
The vendor receives the PO as an order.
However, the devil is in the details so I decided to make it more complex by adding the following details:
The shopping-purchasing system communication is point to point and real-time.
The PO can be sent to the vendor via fax or the internet. All POs go into a queue before being sent to the vendor. The queue is processed every X minutes; I picked 10 minutes as the interval.
The purchasing-vendor connection uses middleware (ESB).
Questions:
I believe I have 3 system use cases: Shopper - Creates Cart, Manager - Reviews the Cart, Time - Send PO to Vendors. Is that correct even though I have an ESB system between the Purchasing System and the Vendor System?
Since the middleware is not an actor in any of the above use cases, where should I model the ESB's involvement in the process (Purchasing -> ESB, ESB -> Vendor)?
Do I draw 2 system boundaries or 1? I believe I should have the Vendor's System as a secondary actor, so I only have the Shopping System and the Purchasing System. Or do I merge them into one E2E system (such as a Procurement System)?
I would create separate use cases for reviewing, approving and rejecting the cart, but otherwise I think your use cases should be accurate enough. Since the ESB system is not directly used by your actors, I don't think it's relevant in the use case diagram.
You could create a component diagram to model the relationships between separate systems and their subsystems in more detail than is possible or reasonable in a use case diagram. If you want to, you could probably isolate the ESB in its own system boundary, with a use case "Deliver PO to Vendor" marked as a dependency for the use cases related to the connection.
I suggest two or three system boundaries, depending on whether you create a separate boundary for the ESB. If the Vendor's system is outside your scope, you probably won't need to model it in too much detail - receiving the PO should be enough.
Use cases are meant to describe how the users of the system (actors) interact with it. They should be simple enough for your client to understand. So before you start racking your brain over use case questions, ask yourself who your client is and how you are creating a better system for them by writing a use case.
(Sorry for the philosophical answer...)