SQLite optimization -- two databases vs. one - iphone

I'm developing an iPhone app with two SQLite databases. I'm wondering if I should be using only one.
The first database has a dozen small tables (each having four or five INTEGER or TEXT columns). Most of the tables have only a few dozen rows, but one table will have several thousand rows of "Items." The tables are all related so they have many joins between them.
The second database contains only one table. It has a BLOB column with one row related to some of the "Items" from the first database (photos of the items). These BLOBs are only used in a small part of the program, so they are rarely attached or joined with the first database (rows can be fetched easily without joins).
The first database file size will be about a megabyte; the db with the BLOBs will be much larger -- about 50 megabytes. I separated the BLOBs because I wanted to be able to backup the smaller database without having to carry the BLOBs, too. I also thought separating the BLOBs might improve performance for the smaller tables, but I'm not sure.
What are the pros and cons of having two databases (SQLIte databases in particular ) to separate the BLOBs, vs. having just one database with a table full of very narrow tables mixed in with the table of BLOBs?

Even at 50 MB, that is a very small database size, and you're not going to see a performance difference due to the size of the database itself.
However, if you think performance is the issue, check your DB queries, and make sure you're not 'SELECT'ing the BLOB rows in queries where you don't need that information. Returning more rows from the database than you need (think of it as reading more data from a hard drive than you need to), is where you're much more likely to see performance issues.
I don't see any pros to separating them, unless you are under a very tight restriction on the total size of the backup. It's just going to add complexity in weird places in your app.

Related

huge postgres database size reduction

I have a huge database (9 billon rows, 3000 columns) which is currently host on postgres, it is huge about 30 TB in size. I am wondering if there is some practical ways to reduce the size of the database while preserving the same information in the database, as storage is really costly.
If you don't want to delete any data.
Vacuuming. Depending on how many updates/deletes your database performs there will be much garbage. A database that size is likely full of tables that do not cross vacuum thresholds often (pg13 has a fix for this). Manually running vacuums will allocate dead rows for re-use, and free up space where ends of pages are no longer needed.
Index management. These bloat over time and should be smaller than your tables. Re-indexing (concurrently) will give you some space back, or allow re-use of existing pages.
data de-duplication / normalisation. See where you can remove data from tables where it is not needed, or presented elsewhere in the database.

Redshift Performance of Flat Tables Vs Dimension and Facts

I am trying to create dimensional model on a flat OLTP tables (not in 3NF).
There are people who are thinking dimensional model table is not required because most of the data for the report present single table. But that table contains more than what we need like 300 columns. Should I still separate flat table into dimensions and facts or just use the flat tables directly in the reports.
You've asked a generic question about database modelling for data warehouses, which is going to get you generic answers that may not apply to the database platform you're working with - if you want answers that you're going to be able to use then I'd suggest being more specific.
The question tags indicate you're using Amazon Redshift, and the answer for that database is different from traditional relational databases like SQL Server and Oracle.
Firstly you need to understand how Redshift differs from regular relational databases:
1) It is a Massively Parallel Processing (MPP) system, which consists of one or more nodes that the data is distributed across and each node typically does a portion of the work required to answer each query. There for the way data is distributed across the nodes becomes important, the aim is usually to have the data distributed in a fairly even manner so that each node does about equal amounts of work for each query.
2) Data is stored in a columnar format. This is completely different from the row-based format of SQL Server or Oracle. In a columnar database data is stored in a way that makes large aggregation type queries much more efficient. This type of storage partially negates the reason for dimension tables, because storing repeating data (attibutes) in rows is relatively efficient.
Redshift tables are typically distributed across the nodes using the values of one column (the distribution key). Alternatively they can be randomly but evenly distributed or Redshift can make a full copy of the data on each node (typically only done with very small tables).
So when deciding whether to create dimensions you need to think about whether this is actually going to bring much benefit. If there are columns in the data that regularly get updated then it will be better to put those in another, smaller table rather than update one large table. However if the data is largely append-only (unchanging) then there's no benefit in creating dimensions. Queries grouping and aggregating the data will be efficient over a single table.
JOINs can become very expensive on Redshift unless both tables are distributed on the same value (e.g. a user id) - if they aren't Redshift will have to physically copy data around the nodes to be able to run the query. So if you have to have dimensions, then you'll want to distribute the largest dimension table on the same key as the fact table (remembering that each table can only be distributed on one column), then any other dimensions may need to be distributed as ALL (copied to every node).
My advice would be to stick with a single table unless you have a pressing need to create dimensions (e.g. if there are columns being frequently updated).
When creating tables purely for reporting purposes (as is typical in a Data Warehouse), it is customary to create wide, flat tables with non-normalized data because:
It is easier to query
It avoids JOINs that can be confusing and error-prone for causal users
Queries run faster (especially for Data Warehouse systems that use columnar data storage)
This data format is great for reporting, but is not suitable for normal data storage for applications — a database being used for OLTP should use normalized tables.
Do not be worried about having a large number of columns — this is quite normal for a Data Warehouse. However, 300 columns does sound rather large and suggests that they aren't necessarily being used wisely. So, you might want to check whether they are required.
A great example of many columns is to have flags that make it easy to write WHERE clauses, such as WHERE customer_is_active rather than having to join to another table and figuring out whether they have used the service in the past 30 days. These columns would need to be recalculated daily, but are very convenient for querying data.
Bottom line: You should put ease of use above performance when using Data Warehousing. Then, figure out how to optimize access by using a Data Warehousing system such as Amazon Redshift that is designed to handle this type of data very efficiently.

Guideline when designing a database in Postgresql

I am designing a database in Postgresql and I would like to have some expert advices before refactorizing my work.
The database naturally contains different parts that I plan to separate into schemas in order to have a mangling of object names that reflect logical organization of them. About 20 tables are for scientific purposes and 20 others are technical and 20 furthers are about administrative tasks.
Is that a good idea or am I misleading myself into a management overhead that I will regret later?
The database contains 3 tables that are huge. By huge, I mean there is more than 60 millions of rows in it and they might grow a little bit. I think I will create special tablespace for that tables. I would like to do it, in order to separate logically the place where data are stored because the rest of the database should be backuped in a different way than that three tables.
Further more one those 3 tables contains binary data that are not heavy but weight a bit when multiplying by the amount of rows and also this table grows faster than the 2 others. Then I will periodically purge it after backuping the table.
Is it a good idea to have more than one tablespace in a database? If so, is there any precaution to be taken when proceeding this way?
Thank you in advance for your advices.
Choosing good names & grouping database stuffs is always a wise choice, and such overheads are not usually considerable.
About separating tablespace of a single database, it also should not cause any special problem, I've a similar database (but in mysql) that has a large file table and I had to move all of it's content to another server for some optimization reasons and i had no problem with it till now.
There is a very more important matter in RDBMS designing and that's CORRECT TABLE INDEXING. I think choosing good indexes is most critical phase of designing a relational database and you'll see it's effect soon (when you begin to write JOIN queries!).
In general, designing and implementing database is an experimental job that depends to your situation and expertness, so you can't seek for a solid instruction.

PostgreSQL: What is the maximum number of tables can store in postgreSQL database?

Q1: What is the maximum number of tables can store in database?
Q2: What is the maximum number of tables can union in view?
Q1: There's no explicit limit in the docs. In practice, some operations are O(n) on number of tables; expect planning times to increase, and problems with things like autovacuum as you get to many thousands or tens of thousands of tables in a database.
Q2: It depends on the query. Generally, huge unions are a bad idea. Table inheritance will work a little better, but if you're using constraint_exclusion will result in greatly increased planning times.
Both these questions suggest an underlying problem with your design. You shouldn't need massive numbers of tables, and giant unions.
Going by the comment in the other answer, you should really just be creating a few tables. You seem to want to create one table per phone number, which is nonsensical, and to create views per number on top of that. Do not do this, it's mismodelling the data and will make it harder, not easier, to work with. Indexes, where clauses, and joins will allow you to use the data more effectively when it's logically structured into a few tables. I suggest studying basic relational modelling.
If you run into scalability issues later, you can look at partitioning, but you won't need thousands of tables for that.
Both are, in a practical sense, without limit.
The number of tables a database can hold is restricted by the space on your disk system. However, having a database with more than a few thousand tables is probably more an expression of an incorrect analysis of your application domain. Same goes for unions: if you have to union more than a handful of tables you probably should look at your table structure.
One practical scenario where this can happen is with Postgis: having many tables with similar attributes that could be joined in a single view (this is a flaw in the design of Postgis IMHO), but that would typically be handled at the application side (e.g. a GIS).
Can you explain your scenario where you would need a very large number of tables that need to be queried in one sweep?

Entity FrameWork CodeFirst- Table name

Hi i am building database, with couple tables (products, orders, costumers) and i am interested if it is possible to do such a trick, generate table every-day based on orders table with name of current day, because orders table will have about 1000 or more rows every-day and it will hurt application speed.
1000 rows is nothing. What database are you using? Most modern databases can handle millions of rows with no issue, as long as you put some effort into proper indexing of the table.
From your comment, I'm assuming you don't know about database table indexing.
A database index is a data structure that improves the speed of data
retrieval operations on a database table at the cost of slower writes
and increased storage space. Indices can be created using one or more
columns of a database table, providing the basis for both rapid random
lookups and efficient access of ordered records.
From http://en.wikipedia.org/wiki/Database_index
You need to add indexes to your database tables to ensure they can be searched optimally.
What you are suggesting is a bad idea IMO, and it's going to make working with the application a pain. Instead, if you really fill this table with vast amounts of data you could consider periodically archiving old data, but don't do this until you really need to.