I have a Postgres table with two columns: username and the items that user has bought. On average we can expect about 100 items per user. While thinking about this, I am wondering what issues I could face if I create a separate table for each user rather than a single table storing the items bought by all users.
One thing I realised is that, since Postgres stores data in 8 kB pages, I would consume a lot more disk space than necessary (no single user will come anywhere near 8 kB of data). But would it reduce the time needed to fetch the items for a given user, since I could read all rows of that user's table directly, compared to searching a single shared table for the rows belonging to that user and then returning the items?
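For reference, the single-table version I have in mind would look roughly like this (column types are just an example):

    CREATE TABLE purchases (
        username text NOT NULL,
        item     text NOT NULL
    );

    -- fetching the items bought by one user
    SELECT item FROM purchases WHERE username = 'some_user';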
Related
There is a scenario where I need to add entries for every user in a table. There will be around 5-10 records per user, and there are approximately 1000 users. So, if I add the data for every user each day to a single table, the table becomes very large, and I worry the read/write operations on it (which will mostly be for a particular user) will take some time to return the data.
The tech stack for the back-end is Spring Boot and PostgreSQL.
Is there any way to create a new table for every user dynamically from the Java code, and is that really a good way to manage the data, or should all the data be in a single table?
I'm concerned about the performance of the queries once there are many records, in the case of a single table holding the data of every user.
The model will contain similar fields, like userName, userData, time, etc.
Thank you for your time!
Creating one table per user is not good practice. Based on the information you provided, roughly 5,000-10,000 rows are created per day. Any RDBMS can handle this amount of data without any performance issues.
By making use of indexing and partitioning, you will be able to address any potential performance issues.
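For example, a minimal sketch, assuming a single shared table along the lines you describe (all names are illustrative):

    -- one table for all users, indexed on the user column
    CREATE TABLE user_data (
        user_name  text        NOT NULL,
        user_data  text,
        created_at timestamptz NOT NULL
    );

    CREATE INDEX idx_user_data_user_name ON user_data (user_name);

    -- or, since rows keep arriving every day, range-partition by time instead
    CREATE TABLE user_data_partitioned (
        user_name  text        NOT NULL,
        user_data  text,
        created_at timestamptz NOT NULL
    ) PARTITION BY RANGE (created_at);

    CREATE TABLE user_data_2024_01 PARTITION OF user_data_partitioned
        FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');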
PS: It is always recommended to define a retention period for the data you want to keep in the operational database. I am not sure about your use case, but if possible define a retention period and move older data out of the operational table into backup storage.
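With the partitioned layout sketched above, such a retention job could be as simple as detaching the partitions that have aged out and archiving them (names are again illustrative):

    -- remove the old month from the operational table
    ALTER TABLE user_data_partitioned DETACH PARTITION user_data_2024_01;

    -- archive it (e.g. with pg_dump) and then drop it
    DROP TABLE user_data_2024_01;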
According to Amazon:
Load your data in sort key order to avoid needing to vacuum. As long as each batch of new data follows the existing rows in your table, your data will be properly stored in sort order, and you will not need to run a vacuum. You don't need to presort the rows in each load because COPY sorts each batch of incoming data as it loads.
The sort key is a timestamp and the data is loaded as it comes in, 200 rows at a time. However, the rows are 99% unsorted. Why are so many rows unsorted?
Double-check the data you insert: VACUUM is unnecessary only if the newly loaded data is sorted by the SORTKEY and belongs entirely after the rows already in the table. If even one incoming row sorts before data that is already stored, that batch is placed into Redshift's unsorted region, which is why your rows end up unsorted.
See this example for further information.
Also, read about the VACUUM process.
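One way to check how much of a table is unsorted is Redshift's svv_table_info system view (the table name here is just an example):

    -- percent of rows sitting in the unsorted region, per table
    SELECT "table", unsorted
    FROM svv_table_info
    WHERE "table" = 'events';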
I am developing an application using a virtual private database pattern in postgres.
So every user gets an id, and all rows belonging to that user carry this id to keep them separated from other users' rows. This id should also be part of the primary key. In addition, every row has to have an id which is unique within the scope of the user. That id will be the other part of the primary key.
If we have to scale this across multiple servers, we can also append a third column to the primary key identifying the shard the id was generated on.
My question now is how to create these per-user unique ids. I came up with some options, but I am not sure about all their implications. The two solutions that seem most promising to me are:
Creating one sequence per user:
This can be done automatically, using a trigger, every time a user is created. It is certainly transaction-safe, and I think it should be fine in terms of performance.
What I am worried about is that this has to work for a lot of users (100k+), and I don't know how Postgres will deal with 100k+ sequences. I tried to find out how sequences are implemented, but without luck.
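For reference, the trigger I have in mind would be something along these lines (only a sketch; table, function and sequence names are made up):

    -- create a dedicated sequence whenever a new user row is inserted
    CREATE OR REPLACE FUNCTION create_user_sequence() RETURNS trigger AS $$
    BEGIN
        EXECUTE format('CREATE SEQUENCE user_seq_%s', NEW.user_id);
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER users_create_sequence
        AFTER INSERT ON users
        FOR EACH ROW EXECUTE PROCEDURE create_user_sequence();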
Counter in user table:
Keep all users in a table with a field holding the latest id handed out for that user.
When a user starts a transaction, I can lock that user's row in the user table and create a temporary sequence with the latest id from the user table as its starting value. This sequence can then be used to supply ids for new entries.
Before exiting the transaction, the current value has to be written back to the user table and the lock has to be released.
If another transaction from the same user tries to concurrently insert rows it will stall until the first transaction releases its lock on the user table.
This way I do not need thousands of sequences, and I don't think there will be frequent concurrent access from a single user (the application has OLTP character, so there will be no long-lasting transactions); even if it happens, it will just stall for about a second and not hurt anything.
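Roughly what I picture for one such transaction (the sequence's start value would come from the locked row, e.g. via dynamic SQL from the application; all names are made up):

    BEGIN;

    -- lock this user's row so concurrent transactions of the same user wait
    SELECT last_id FROM users WHERE user_id = 42 FOR UPDATE;

    -- suppose the locked row said last_id = 100
    CREATE TEMPORARY SEQUENCE user_42_ids START WITH 101;

    INSERT INTO user_items (user_id, item_id, payload)
    VALUES (42, nextval('user_42_ids'), '...');

    -- write the latest value back before the lock is released at commit
    UPDATE users SET last_id = (SELECT last_value FROM user_42_ids) WHERE user_id = 42;

    COMMIT;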
The second part of my question is whether I should just use 2 columns (or maybe three if the shard_id joins the game) and make them a composite primary key, or whether I should combine them into one column. I think handling will be much easier with separate columns, but what does performance look like? Let's assume both values are 32-bit integers: is it better to have two int columns in an index or one bigint column?
Thanks for all answers,
Alex
I do not think sequences would be scalable to the level you want (100k sequences). A sequence is implemented as a relation with just one row in it.
Each sequence will appear in the system catalog (pg_class), which also contains all of the tables, views, etc. Having 100k rows there is sure to slow the system down dramatically. The amount of memory required to hold all of the data structures associated with these sequence relations would also be large.
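You can see them in the catalog yourself; sequences are stored in pg_class with relkind 'S':

    -- count how many sequences currently exist in the database
    SELECT count(*) FROM pg_class WHERE relkind = 'S';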
Your second idea, especially if combined with temporary sequences, might be more practical and more scalable.
For your second question, I don't think a composite key would be any worse than a single column key, so I would go with whatever matches your functional needs.
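For illustration, the composite-key variant could look like this (table and column names are just examples):

    CREATE TABLE user_items (
        user_id integer NOT NULL,          -- the owning user
        item_id integer NOT NULL,          -- unique only within that user
        payload text,
        PRIMARY KEY (user_id, item_id)     -- add a shard_id here if sharding comes into play
    );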
We have a searchable list. Initially the list displayed will be empty. As soon as the user types in 3 characters, it starts hitting the database and fetching data which matches the 3 characters typed in. The list contains names of organizations. The catch is that the data may say "Private" but the user may type "Pvt" or something similar and end up thinking the data is missing. What is the best approach to get around this problem? One approach we are considering is to store all possible variations a user may try in another column, do the lookup against that column, and display the column which has the 'clean' data. Any other approaches? Performance is not a big concern because we will have at most 5000 records in the master table, and the master data will be queried by a user only once or twice, during registration.
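A sketch of that extra-column idea, assuming a simple organizations table (the list of replacements would of course be longer in practice):

    -- keep a normalised search column next to the clean display name
    ALTER TABLE organizations ADD COLUMN search_terms text;

    UPDATE organizations
       SET search_terms = lower(name)
                          || ' ' || replace(lower(name), 'private', 'pvt')
                          || ' ' || replace(lower(name), 'limited', 'ltd');

    -- look up against the search column, display the clean column
    SELECT name
      FROM organizations
     WHERE search_terms LIKE '%' || lower('Pvt') || '%';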
Hi, I am building a database with a couple of tables (products, orders, customers), and I would like to know whether it is possible to do the following trick: generate a table every day, named after the current day, based on the orders table, because the orders table will get about 1000 or more rows every day and that will hurt application speed.
1000 rows is nothing. What database are you using? Most modern databases can handle millions of rows with no issue, as long as you put some effort into proper indexing of the table.
From your comment, I'm assuming you don't know about database table indexing.
A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of slower writes and increased storage space. Indices can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records.
From http://en.wikipedia.org/wiki/Database_index
You need to add indexes to your database tables to ensure they can be searched optimally.
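For example, assuming the orders table is mostly queried by date or by customer (column names are illustrative):

    CREATE INDEX idx_orders_order_date  ON orders (order_date);
    CREATE INDEX idx_orders_customer_id ON orders (customer_id);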
What you are suggesting is a bad idea IMO, and it's going to make working with the application a pain. Instead, if you really do fill this table with vast amounts of data, you could consider periodically archiving old data, but don't do that until you really need to.