Application event logging for statistics

I have an app running in production. It is hosted on Heroku and uses Postgres 9.3 in the cloud. There are two databases: a master and a read-only follower (slave). There are tables like Users, Likes, Followings, Subscriptions and so on. We need to store a complete log of events like userCreated, userDeleted, userLikedSomething, userUnlikedSomething, userFollowedSomeone, userUnfollowedSomeone and so on. Later we have to prepare statistical reports and charts about current and historical data. The main problem is that when a user is deleted, the row is simply removed from the db, so we can't retrieve deleted users afterwards. The same applies to likes/unlikes, follows/unfollows and so on. There are a few things I don't know how to handle properly:
1) If we store events in the same database with foreign keys to user profiles, then historical data will change, because each event will be "linked" to the current user profile, which changes over time.
2) If we store events in a separate Postgres database (a db just for logs, to offload the main database), then to join the events with actual user profiles we would have to use cross-db joins (dblink), which might be slow, I guess (I have never used this feature before). Anyway, this won't solve the problem from point 1.
3) I thought about using a different type of database for storing logs, maybe MongoDB. As I remember, MongoDB is more "write-heavy" than Postgres (which is more "read-heavy"?), so it might be more suitable for storing logs/events. However, I would then have to store user profiles in two databases (and even a copy of the user profile per event to solve point 1).
I know this is a very general question, but maybe there is some kind of standard approach or special database type for storing such data?
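To make the "user profile per event" idea from the last point concrete, here is a minimal sketch of an append-only event table in which each row carries a JSON copy of the profile as it looked when the event happened. It uses Python's built-in sqlite3 module only as a stand-in for Postgres; the table name event_log, its columns, and the helper functions are assumptions made for illustration, not something taken from the question.

```python
import json
import sqlite3


def create_event_log(conn):
    """Create the (hypothetical) append-only event table."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS event_log (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            event_type TEXT NOT NULL,
            user_id INTEGER NOT NULL,      -- plain value, deliberately not a foreign key
            user_snapshot TEXT NOT NULL,   -- JSON copy of the profile at event time
            created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        )
        """
    )


def log_event(conn, event_type, user):
    """Append one event together with a snapshot of the user dict."""
    conn.execute(
        "INSERT INTO event_log (event_type, user_id, user_snapshot) VALUES (?, ?, ?)",
        (event_type, user["id"], json.dumps(user, sort_keys=True)),
    )


def events_for_user(conn, user_id):
    """Return (event_type, snapshot) pairs in insertion order."""
    rows = conn.execute(
        "SELECT event_type, user_snapshot FROM event_log WHERE user_id = ? ORDER BY id",
        (user_id,),
    ).fetchall()
    return [(event_type, json.loads(snapshot)) for event_type, snapshot in rows]
```

Because the snapshot is stored by value, deleting or editing the user row later does not change what the log says about the past, at the cost of duplicated profile data.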

Related

How to see changes in a postgresql database

My PostgreSQL database is updated each night.
At the end of each nightly update, I need to know what data changed.
The update process is complex: it takes a couple of hours and requires dozens of scripts, so I don't know whether that affects how I could see what data has changed.
The database is around 1 TB in size, so any method that requires starting a temporary database may be very slow.
The database is an AWS RDS instance. I have automated backups enabled (these are different from RDS snapshots, which are user-initiated). Is it possible to see the difference between two RDS automated backups?

I do not know whether it is possible to see the difference between RDS snapshots. But in the past we tested several solutions for a similar problem. Maybe you can take some inspiration from them.
The obvious solution is of course an auditing system. This way you can see relatively simply what was changed, down to individual column values, depending on the granularity of your auditing system. Of course there is an impact on your application due to the auditing triggers and the queries into the audit tables.
Another possibility: for tables with primary keys, you can store the primary key together with the hidden system columns 'xmin' and 'ctid' (https://www.postgresql.org/docs/current/static/ddl-system-columns.html) for each row before the update and compare them with the values after the update. But this way you can only identify which rows were changed, inserted or deleted, not which columns changed.
You can create a streaming replica and set up replication slots (and, to be on the safe side, also WAL archiving). Then stop replication on the replica before the updates and compare the data after the updates using dblink selects. But these queries can be very heavy.
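The xmin/ctid comparison described above can be sketched as follows. This is only an illustration under stated assumptions: the table name my_table and its key column id are hypothetical, and the code assumes you have already run the query before and after the nightly update and loaded each result into a dict that maps primary key to an (xmin, ctid) pair.

```python
# Hypothetical query to run before and after the update (table and key names are assumptions).
SNAPSHOT_SQL = "SELECT id, xmin::text, ctid::text FROM my_table"


def diff_snapshots(before, after):
    """Compare two snapshots that map primary key -> (xmin, ctid).

    Returns the keys that were inserted, deleted, or rewritten between
    the two snapshots. It cannot tell which columns changed.
    """
    inserted = sorted(k for k in after if k not in before)
    deleted = sorted(k for k in before if k not in after)
    changed = sorted(k for k in after if k in before and after[k] != before[k])
    return {"inserted": inserted, "deleted": deleted, "changed": changed}
```

Note that a row counts as "changed" whenever a new row version was written, even if an UPDATE set the same values again, so the result is an upper bound on real changes.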

MongoDB data modelling: any drawbacks in using lots of databases?

I have recently moved part of the back end of a web app to MongoDB. The web app itself is a validation tool, and the workflow looks like this:
the user uploads a file (typically hundreds of thousands of lines)
the validator checks it outputting a lot of messages (possibly more than one per line)
...and finally provides a few statistics
I modelled my application so that each user has their own DB containing:
The file (saved through GridFS)
A collection containing the messages (possibly over a million lines, in some cases)
A collection with the statistics
We have a few hundred users, so MongoDB will end up having a few hundred DBs.
Of course I could have held all the data in the same DB, using namespaces to separate the data of different users. However, I felt it was handy to send the DB name in the connection URI, and I found it more intuitive to issue a "drop database" statement to purge a user, rather than searching for and removing their data in one large DB.
I am pretty new to MongoDB, so my question is: is there any drawback to having several DBs in the same MongoDB instance? Or is there any special consideration that I should give to the problem?

I'm not familiar with MongoDB specifically. In general, opening a connection to a database is a relatively slow operation and ties up system resources. Whether this is enough to matter in your case I can't say.
Having a different db for each user would make it difficult to perform queries that access data for multiple users. Maybe you have no need to do this.
Still, I would think it would be a whole lot simpler in general to just put a user id in each record rather than create a separate database. What's the gain of separate databases? Okay, deleting a user means saying "drop database". But deleting a user from a single database should mean saying "delete from tableX where user=?; delete from tableY where user=?" etc for however many relevant tables you have. I can't imagine it's hundreds, right? Maybe half a dozen lines of code or so?
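As a rough illustration of the "half a dozen lines" claim, here is a sketch of purging one user from a shared relational database. It uses Python's built-in sqlite3 module rather than MongoDB, and the table names and the user_id column are assumptions chosen to mirror the file, messages and statistics described in the question.

```python
import sqlite3

# Hypothetical tables that each carry a user_id column.
USER_TABLES = ("files", "messages", "statistics")


def purge_user(conn, user_id):
    """Delete every row that belongs to user_id, all in one transaction."""
    with conn:  # commits on success, rolls back if any statement fails
        for table in USER_TABLES:
            # Table names come from the fixed tuple above, never from user input.
            conn.execute(f"DELETE FROM {table} WHERE user_id = ?", (user_id,))
```

With an index on user_id in each table, the deletes do not need to scan other users' rows.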

How to handle syncing a user's db with a master db on a server?

So I'm planning an app that will involve having a master db on a server with, let's say, 3,000 CDs and the columns Title, Artist, and Release Date.
1) When a user adds a CD to their collection, it will be added to the app's local SQLite DB. But let's say I spelled a CD title wrong, so I make an update to it. When the user goes to sync, how should I go about handling an updated row? Should I have a column 'IsUpdated' that is just a numeric value that increases by one every time I update that row? That way, when the app sees that IsUpdated on the server is larger than the local IsUpdated for that particular item, it will know to replace the contents. Does that make sense? Is it even practical? What other options would there be?
2) How would I go about handling the addition of brand new columns, like adding a Barcode or Price? Do I just push an update for the app that adds the new columns locally, then do the same on the server, and let the rest run its course? This would also tie back into the syncing issue from number 1.

First you have to give more detail than that. Is the entire 3000 master list also replicated down to the remote db?
Sounds like it.
OK, so if that's the case, this isn't a DB design issue so much as it is a replication issue.
It's a bad idea to update every row in a table, especially with an update that makes the row longer. You'll be better off just dropping the table and recreating it. <--- that's how it works in an RDBMS on servers; no idea if that concept changes on a client db. And now we get into more iPhone questions of replication than simple db replication. Would it be better to just republish the app? Is the user data segregated from the server data? Can DDL be done on the local/remote tables after publishing?
Instead of searching the entire list for changes as you outline in #1, I would keep a dated delta table. The local app would store a last_updated_Datetime, and any records in the delta table after that datetime would need to be brought down. Once they are downloaded, the local system can determine how to apply them. Again, this is inappropriate for mass changes.
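A minimal sketch of the dated delta table idea, using Python's built-in sqlite3 module for both the "server" and the local database. The table names cd_delta and cds, their columns, and the use of ISO 8601 text timestamps (which sort correctly as strings) are assumptions for illustration only.

```python
import sqlite3


def changes_since(server_conn, last_updated):
    """Fetch delta rows recorded after the client's last_updated timestamp."""
    return server_conn.execute(
        "SELECT cd_id, title, artist, release_date, changed_at "
        "FROM cd_delta WHERE changed_at > ? ORDER BY changed_at",
        (last_updated,),
    ).fetchall()


def apply_changes(local_conn, changes, last_updated):
    """Apply downloaded delta rows locally and return the new last_updated value."""
    with local_conn:  # one transaction: either all rows apply or none do
        for cd_id, title, artist, release_date, changed_at in changes:
            local_conn.execute(
                "INSERT OR REPLACE INTO cds (id, title, artist, release_date) "
                "VALUES (?, ?, ?, ?)",
                (cd_id, title, artist, release_date),
            )
            last_updated = max(last_updated, changed_at)
    return last_updated
```

The client only stores the returned timestamp after the transaction has committed, so an interrupted sync simply downloads the same rows again next time.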

What are the major challenges of building an iPhone application that synchronizes data with a server via web APIs?

I want to build an application that utilizes the data from a server, and it needs to synchronize the data in the application with the data entered by other client applications.
So, there are some questions:
How to design the database schema efficiently? Should it replicate the same database schema on the server or should it add some more fields & entities?
What are the strategies to synchronize the data: on each application start, during some idle state of the application, or something else?
How to handle conflicts between the data entered by the user within the application and data entered by another client application?
Any response is welcomed.

Well, you've identified the main challenges in your original question. The real answer is that this has little to do with the iPhone - database replication is just really hard.
Here are some rules of thumb I can offer:
one-way replication of data is a million times easier than two-way replication, if you can get away with it.
replication is always easier if the database schema is identical on the client and the server.
to do two-way replication, you either need to store timestamps for each row on each end, or to store the complete contents of one end on the other end (i.e. the server needs to know the client's most recent status, or the client needs to know the server's most recent status).
to allow adding rows from disconnected clients, you need to identify your rows using a GUID (or hash, e.g. SHA-1), not an autoincrement field. It's possible to keep new client-added rows as "identifierless" until you sync them with the server, but that way lies madness.
there is no actual good way to do conflict resolution. The imperfect options include last-writer-wins (last person who syncs a modified record gets their copy of the record inserted), three-way-merge (when someone sends a modified record, check which columns they have changed, and change only those columns, thus not overwriting any changes to other columns), split-into-two-records (if two people make changes to the same record, just make two records and assume someone will fix it eventually), and "ask the user" (which is technically the most sound, but requires a lot of UI work and users rarely understand what a conflict even is).
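Two of the rules of thumb above can be sketched in a few lines: generating GUID row identifiers on the client, and a column-level three-way merge. This is a simplified illustration, not a complete sync engine. The function names are made up, rows are plain dicts, and base is assumed to be the last version of the row that client and server agreed on.

```python
import uuid


def new_row_id():
    """Create a row identifier that a disconnected client can generate safely."""
    return str(uuid.uuid4())


def three_way_merge(base, server, client):
    """Merge a client's edited row into the server's current row.

    Only columns the client actually changed relative to base are
    written, so server-side changes to other columns are preserved.
    If both sides changed the same column, the client's value wins
    here, which is last-writer-wins at column level.
    """
    merged = dict(server)
    for column, value in client.items():
        if value != base.get(column):
            merged[column] = value
    return merged
```

For example, if the server changed the artist while the client fixed the title, the merged row keeps both edits.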

SQLite synchronization scheme

I know it is xmas eve, so it is a perfect time to find hardcore programmers online :).
I have a SQLite db file that contains over 10K records. I generate the db from a MySQL database, and I have built the SQLite db into my iPhone application the usual way.
The records contain information about products and their prices, shops and the like. This info is of course not static; I use an automatic scheme to populate and keep updating my MySQL db.
Now, how can I update the iPhone app's SQLite database with the new information available in the MySQL db? The db structure is still the same, but the records contain new information.
Thanks.
Ahed
info:
libsqlite3.0,
iphone OS 3.1,
mysql 2005,
Mac OS X 10.6.2

There is a question you need to answer first: how do you determine the set of changed records in your MySQL database?
Or, more specifically, given that the MySQL database is in state A, some transactions occur and now it is in state B, how do you know what changed between A and B?
Bottom line: you need a schema in MySQL that enables this. Once you have answered that question, you can answer the "how do I sync?" problem.

I have a similar application.
I am using Push Notification to let my users know there is new or updated data available.
Each time a record on the server is updated, I store a sequential record-number alongside the record.
Each UDID that's registered has a "last updated" number associated with it that contains the highest record-number it has ever downloaded.
When any given device comes to get its updates, all database records with a record-number greater than the UDID's last updated record-number, as stored on the server, are sent to the device. If everything goes OK, the last updated record-number for the UDID is set to the last record-number sent.
The user has the option to fetch all records and refresh his database if he feels any need to sync his device to the entire database.
Seems to be working well.
-t
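The record-number scheme in this answer could look roughly like the sketch below. It keeps everything in memory to stay short; the class and method names are invented for illustration, and a real server would keep the same bookkeeping in database tables.

```python
class SyncServer:
    """In-memory sketch of per-device sync by sequential record-number."""

    def __init__(self):
        self.next_seq = 1
        self.records = {}    # record key -> (record-number, data)
        self.last_sent = {}  # device id -> highest record-number delivered

    def upsert(self, key, data):
        """Insert or update a record and stamp it with the next record-number."""
        self.records[key] = (self.next_seq, data)
        self.next_seq += 1

    def pending_for(self, device_id):
        """Return (record-number, key, data) for everything the device has not seen."""
        since = self.last_sent.get(device_id, 0)
        rows = [
            (seq, key, data)
            for key, (seq, data) in self.records.items()
            if seq > since
        ]
        return sorted(rows, key=lambda row: row[0])

    def acknowledge(self, device_id, rows):
        """Record the highest number sent, once the device confirms success."""
        if rows:
            highest = rows[-1][0]
            self.last_sent[device_id] = max(self.last_sent.get(device_id, 0), highest)
```

Calling acknowledge only after a successful download matches the "if everything goes OK" step: a failed transfer leaves the device's number untouched, so the same records are offered again.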

You can find many other similar questions by searching for "iphone synchronization":
https://stackoverflow.com/search?q=iphone+synchronization
I'm going to assume that the data is going only from mysql to sqlite, and not the reverse direction.
There are a few ways that I could imagine doing this. The first is to just redownload the entire database during every update. Another way, which I'm describing below, would be to create a "log" table to record the modifications to your master table, and then download just the new logs when doing the update.
I would create a new "log" table in your SQL database to log changes to the table needing synchronization. The log could contain a "revision" column to track the order in which changes were made, a "type" column to specify whether the change was an insert, update, or delete, the row ID of the affected row, and finally the entire set of columns from your master table.
You could automate the creation of log entries by using stored procedures as wrappers to modify your master table.
With only 10k records, I wouldn't expect this log table to grow to be that huge.
You would then in sqlite keep track of the latest revision downloaded from mysql. To update the table, you would download all log entries after the latest update, and then apply them to your sqlite table.
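The client side of this log-table approach might look like the sketch below, which applies downloaded log entries to a local SQLite table in revision order. The products table, its name and price columns, and the tuple layout of a log entry are assumptions for illustration; how the entries are downloaded from MySQL is left out.

```python
import sqlite3


def apply_log(local_conn, log_entries, last_revision):
    """Apply log entries newer than last_revision and return the new revision.

    Each entry is assumed to be a tuple:
    (revision, change_type, row_id, name, price)
    where change_type is "insert", "update" or "delete".
    """
    with local_conn:  # apply the whole batch in one transaction
        for entry in sorted(log_entries, key=lambda e: e[0]):
            revision, change_type, row_id, name, price = entry
            if revision <= last_revision:
                continue  # already applied in an earlier sync
            if change_type == "delete":
                local_conn.execute("DELETE FROM products WHERE id = ?", (row_id,))
            else:
                # Inserts and updates both carry the full row, so one statement covers both.
                local_conn.execute(
                    "INSERT OR REPLACE INTO products (id, name, price) VALUES (?, ?, ?)",
                    (row_id, name, price),
                )
            last_revision = revision
    return last_revision
```

Storing the full row in every log entry, as the answer suggests, is what lets inserts and updates share a single code path here.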
