Do I need to calculate the field or update on each transaction? - postgresql

I'm trying to design a database. I have a User table which is supposed to have a balance field. The balance can be calculated each time I need it, based on four related tables (deposits, withdrawals, bets and bonuses).
So I need to do some math: the sum of all deposits, minus the sum of withdrawals, minus bet amounts plus profits, plus the amounts from bonuses. There can be thousands of records per user in each related table.
Alternatively, I can just update the balance field from the application code whenever one of the related tables is altered.
However, the first method tends to get slower and slower as the database grows. The second method is prone to errors: if my application ever fails to update the field, the real balance will get out of sync with the balance field.
Is there a design pattern or technique for handling such cases? How do online banking or similar services compute a balance? Do they go through every bank transaction each time the balance is requested?

For the first method, to avoid loading every transaction when calculating the account balance, you can take a snapshot of the account balance periodically (e.g. at the end of each day or month, or after a certain number of completed transactions). To compute the latest balance, you then only need to load the latest snapshot and the transactions recorded after it, rather than all transaction records.
You can find a similar snapshot pattern in event sourcing.
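As a rough illustration of that snapshot idea (all table and column names below are made up: a balance_snapshot table holding the balance as of snapshot_at, plus one amount column per related table), the latest balance could be read like this:
WITH latest AS (
    -- most recent snapshot for the user
    SELECT user_id, balance, snapshot_at
    FROM balance_snapshot
    WHERE user_id = :user_id
    ORDER BY snapshot_at DESC
    LIMIT 1
)
SELECT l.balance
     + COALESCE((SELECT SUM(amount)          FROM deposit    WHERE user_id = l.user_id AND created_at > l.snapshot_at), 0)
     - COALESCE((SELECT SUM(amount)          FROM withdrawal WHERE user_id = l.user_id AND created_at > l.snapshot_at), 0)
     - COALESCE((SELECT SUM(amount - profit) FROM bet        WHERE user_id = l.user_id AND created_at > l.snapshot_at), 0)  -- stake out, profit back in
     + COALESCE((SELECT SUM(amount)          FROM bonus      WHERE user_id = l.user_id AND created_at > l.snapshot_at), 0)
       AS current_balance
FROM latest l;
Only rows newer than snapshot_at are scanned, so the cost stays roughly constant no matter how large the history grows.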

Related

Reliable way to poll data from a Postgres table

I want to use a table in a Postgres database as storage for input documents (there will be billions of them).
Documents are continuously added (using "UPSERT" logic to avoid duplicates) and are rarely removed from the table.
There will be multiple worker apps that should continuously read data from this table, from the first inserted row to the latest, and then poll new rows as they are inserted, reading each row exactly once.
Also, when a worker's processing algorithm changes, all the data should be reread from the first row. Each app should be able to maintain its own row-processing progress, independent of the other apps.
I'm looking for a way to track last processed row, to be able to pause and continue polling at any moment.
I can think of these options:
Using an autoincrement field
And then store the autoincrement value of the last processed row somewhere, to use it in the next query like this:
SELECT * FROM document WHERE id > :last_processed_id LIMIT 100;
But after some research I found that in a concurrent environment, it is possible that rows with lower autoincrement values will become visible to clients LATER than rows with higher values, so some rows could be skipped.
Using a timestamp field
The problem with this option is that timestamps are not unique and can collide at high insertion rates, which, once again, leads to skipped rows. Also, adjusting the system time (manually or via NTP) may lead to unpredictable results.
Add a process completion flag to each row
This is the only truly reliable way I could think of, but it has drawbacks: each row must be updated after it is processed, extra storage is needed for a completion-flag field per app, and running a new app may require a DB schema change. It is a last resort for me; I'd like to avoid it if there is a more elegant way.
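For completeness, option 3 would look roughly like this (the worker name is hypothetical), which is where the per-app column and the extra updates come from:
-- One flag column per consumer app (hypothetical app name "worker_a"):
ALTER TABLE document ADD COLUMN processed_by_worker_a boolean NOT NULL DEFAULT false;
-- Each poll reads unprocessed rows and has to write them back afterwards:
SELECT * FROM document WHERE NOT processed_by_worker_a LIMIT 100;
UPDATE document SET processed_by_worker_a = true WHERE id = ANY(:processed_ids);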
I know the task definition screams that I should use Kafka for this, but the problem is that Kafka doesn't allow deleting individual messages from a topic, and I need that functionality. Keeping an external list of Kafka records that should be skipped during processing feels very clumsy and inefficient to me, and real-time deduplication with Kafka would also require some external storage.
I'd like to know if there are other, more efficient approaches to this problem using the Postgres DB.
I ended up saving the transaction id with each row and then selecting records whose txid is lower than the lowest transaction id still in progress at that moment, like this:
SELECT * FROM document
WHERE ((txid = :last_processed_txid AND id > :last_processed_id) OR txid > :last_processed_txid)
AND txid < pg_snapshot_xmin(pg_current_snapshot())
ORDER BY txid, id
LIMIT 100
This way, even if Transaction #2, which started after Transaction #1, completes faster than the first one, the rows it wrote won't be read by a consumer until Transaction #1 finishes.
Postgres docs state that
xid8 values increase strictly monotonically and cannot be reused in the lifetime of a database cluster
so it should fit my case.
This solution is not the most space-efficient, because an extra 8-byte txid field must be stored with each row and an index on the txid field should be created, but the main benefits over the other methods here are:
The DB schema stays the same when new consumers are added
No updates are needed to mark a row as processed; a consumer only has to keep the id and txid of the last processed row
System clock drift or adjustment won't lead to rows being skipped
Having the txid for each row helps query data in insertion order when multiple producers insert rows with ids generated from preallocated pools (for example, Producer 1 currently inserts rows with ids 1..100, Producer 2 with 101..200, and so on)
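For reference, a minimal sketch of the table layout this answer relies on (assuming PostgreSQL 13+, where xid8, pg_current_xact_id() and pg_current_snapshot() are available; everything except the txid column is illustrative):
CREATE TABLE document (
    id   bigserial PRIMARY KEY,
    body jsonb NOT NULL,
    txid xid8  NOT NULL DEFAULT pg_current_xact_id()  -- id of the inserting transaction
);
-- Supports the ORDER BY txid, id scan in the polling query above:
CREATE INDEX document_txid_id_idx ON document (txid, id);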

Create tables using a predefined schema in REST API call in Springboot

There is a scenario where I need to add entries for every user in a table. There will be around 5-10 records per user per day, and there are approximately 1000 users. So if I add every user's data to a single table each day, the table becomes very heavy, and the read/write operations on the table (which would mostly be for a particular user) would take some time to return data.
The tech stack for back-end is Spring-boot and PostgreSQL.
Is there a way to create a new table for every user dynamically from the Java code, and is that really a good way to manage the data, or should all the data be in a single table?
I'm concerned about query performance once there are many records, in the case of a single table holding data for every user.
The model will contain fields like userName, userData, time, etc.
Thank you for your time!
Creating one table per user is not good practice. Based on the information you provided, around 10,000 rows are created per day. Any RDBMS will be able to handle this amount of data without performance issues.
By making use of indexing and partitioning, you will be able to address any potential performance issues.
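As a sketch of what that could look like in PostgreSQL (the table and column names are assumptions based on the model described in the question):
-- One table for all users, range-partitioned by time and indexed by user:
CREATE TABLE user_entry (
    id         bigserial,
    user_name  text        NOT NULL,
    user_data  jsonb,
    entry_time timestamptz NOT NULL DEFAULT now()
) PARTITION BY RANGE (entry_time);

CREATE TABLE user_entry_2024_01 PARTITION OF user_entry
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

-- Reads are "mostly for a particular user", so index on (user_name, entry_time):
CREATE INDEX ON user_entry (user_name, entry_time);
At roughly 10,000 rows per day you may not even need partitioning; the index alone usually keeps per-user queries fast.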
PS: It is always recommended to define a retention period for the data you keep in an operational database. I am not sure about your use case, but if possible, define a retention period and move older data out of the operational table into backup storage.

Dynamically Add Rows to Entry Form - Access 2010

I'm building a database to track usage of and generate reports on particular products on a daily basis. There is a limited number of products (about 20); they're removed from inventory, used in production, and then the remaining product is returned to inventory but not put back into the system. The idea is to have production record how much they receive and then have storage record how much they get back from production.
The tables are pretty straightforward - I have one table with product properties that I completely control for the foreseeable future and another that will house usage data. The issue I'm having is how to design the form that operations will use when they receive product. I don't want a huge list of product #'s with entries for each one (usually 5 to 10 are used on a daily basis). I also don't want to populate the data table with a bunch of blank records. I want them to add a line, select the product code from a drop-down, record the amount received, and then repeat. Preferably the drop-down would update to exclude any codes already filled in on the form. I want to do this all at once to limit duplicate records. So they fill out all the product #'s they received and how many of each, and then click save to have it populate the data table.
Is there a way to have an "add line" option for a form of this design? I know this isn't a terribly extensive database but I want to design and test it prior to integration into our plant's larger scale product tracking system.

Statistics (& money transfers) with mongoDB

1) My first question is about the best way to store statistics with MongoDB.
If I want to store large amounts of statistics (let's say visitors to a specific site, down to hourly resolution), a NoSQL DB like MongoDB seems to work fine. But how do I structure those collections to get the most out of MongoDB?
I'd increase the visitor count for that specific object id (for example SITE_MONTH_DAY_YEAR_SOMEOTHERFANCYPARAMETER) by one every time a user visits the page. But if the database gets big (>10 GB), doesn't that slow down (like it would on MySQL) because it has to find the object_id and update it? Is the data always accurate when I update it (AFAIK MongoDB does not have table locking)?
Wouldn't it be faster (and more accurate) to just insert one row per visitor? On the other hand, reading the statistics would be much faster with my first solution, wouldn't it (especially in terms of "grouping" by site/date/[...])?
2) For every visitor counted I'd like to make a money transfer between two users. It is crucial that those transfers are always accurate. How would you achieve that?
I was thinking about an hourly cron that takes the number of visitors from mongoDB.statistics for the last hour and updates the users' balances. I'd prefer doing this directly/live while counting the visitor, but what happens if thousands of visitors call the script simultaneously? Is there any risk of ending up with wrong balances?

Calculating price drops or apps going free - App Store

I am working on a website that displays all the apps from the App Store. I am getting App Store data via their EPF Data Feeds, through the EPF Importer. In that database I get the pricing of each app for every storefront. There are dozens of rows in that dataset; the table structure is:
application_price - the retail price of an application.
Name             Key   Description
export_date            The date this application was exported, in milliseconds since the UNIX Epoch.
application_id   Y     Foreign key to the application table.
retail_price           Retail price of the application, or null if the application is not available.
currency_code          The ISO3A currency code.
storefront_id    Y     Foreign key to the storefront table.
This is the table I get. My problem is that I can't figure out how to calculate price reductions and newly free apps from this particular dataset. Does anyone have an idea how I can calculate this?
Any idea or answer will be highly appreciated.
I tried storing the previous data and the current data and then matching them. The problem is that the table itself is too large, and the comparison requires a JOIN that pushes query execution time to more than an hour, which I cannot afford. There are approximately 60,000,000 rows in the table.
With these fields alone you can't directly determine price drops or new applications. You'll have to insert the data into your own database and determine the differences from there. In a relational database like MySQL this isn't too complex:
To determine which applications are new, you can add your own "first_seen" column and then query your database for all rows whose first_seen value is no more than a day old.
To calculate price drops, you'll have to compute the difference between the retail_price of the current import and that of the previous import.
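A rough sketch of both ideas, assuming you add your own import-tracking columns (first_seen and import_date are not part of the EPF feed; the syntax is PostgreSQL-style):
-- Newly seen applications: first_seen is set once, on the first import that contains the app.
SELECT application_id
FROM   my_application
WHERE  first_seen >= CURRENT_DATE - INTERVAL '1 day';

-- Price drops: compare today's import against yesterday's for the same app and storefront.
-- With tens of millions of rows, an index on (import_date, application_id, storefront_id) keeps this join workable.
SELECT cur.application_id,
       cur.storefront_id,
       prev.retail_price AS old_price,
       cur.retail_price  AS new_price
FROM   application_price cur
JOIN   application_price prev
  ON   prev.application_id = cur.application_id
 AND   prev.storefront_id  = cur.storefront_id
WHERE  cur.import_date  = CURRENT_DATE
  AND  prev.import_date = CURRENT_DATE - INTERVAL '1 day'
  AND  cur.retail_price < prev.retail_price;   -- "gone free" is the special case where new_price = 0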
Since you've edited your question, my edited answer:
It seems like you're having storage/performance issues, and you know what you want to achieve. To solve this you'll have to start measuring and debugging: with datasets this large you'll have to make sure you have the correct indexes. Profiling your queries should help you find out whether you do.
Your environment is probably "write once a day", "read many times a minute" (I'm guessing you're building a website). So you could speed up the frontend by processing the differences (price drops and new applications) on import, rather than when displaying them on the website.
If you're still unable to solve this, I suggest you open a more specific question detailing your DBMS, queries, etc., so that real database administrators can help you. 60 million rows is a lot, but with the correct indexes it should be no real trouble for a normal database system.
Compare the table with one you've downloaded the previous day, and note the differences.
Added:
For only 60 million items, and on a contemporary PC, you should be able to keep a sorted array of the store id numbers and previous prices in memory and do an array lookup faster than the data arrives from the network feed. Mark any differences found and double-check them against the DB in post-processing.
Actually, I have also been playing with this data, and I think the best approach for you follows from how Apple provides it.
You have two types of data: full and incremental (updated daily). Within the new incremental data (not nearly as big as the full feed), you can check which records have been updated and insert them into another table to track price changes.
That way you have a list of records (app, song, video, ...) updated daily whose price has changed; you just read from the new table you created instead of comparing or joining data across several tables.
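As a sketch (the tables price_incremental, application_price_current and price_change are hypothetical, and the SQL is PostgreSQL-style): load the daily incremental file into a staging table, then keep only the rows whose price actually changed:
-- Record only genuine price changes from today's incremental feed:
INSERT INTO price_change (application_id, storefront_id, old_price, new_price, changed_on)
SELECT i.application_id, i.storefront_id, c.retail_price, i.retail_price, CURRENT_DATE
FROM   price_incremental i
JOIN   application_price_current c
  ON   c.application_id = i.application_id
 AND   c.storefront_id  = i.storefront_id
WHERE  i.retail_price IS DISTINCT FROM c.retail_price;
-- Afterwards, refresh application_price_current from the incremental rows for the next day's comparison.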
Cheers