What are the properties of normalization in databases?
Is there any relation between ACID properties and Normalization properties?
Normalization and ACID are different things.
ACID refers to four important properties of transactions. Each transaction is
Atomic (all or nothing),
Consistent (changing the database from one consistent state to another),
Isolated (hidden from other transactions until it's committed),
Durable (after COMMIT, data survives a crash).
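To make the "all or nothing" property concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `account` table, its CHECK constraint, and the `transfer` helper are assumptions made up for illustration; they are not part of the original answer.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first so a failed debit has something to undo.
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit was rolled back.
        return False


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
    conn.commit()

    ok = transfer(conn, "A", "B", 30)       # valid transfer
    failed = transfer(conn, "A", "B", 500)  # would make A negative
    balances = dict(conn.execute("SELECT name, balance FROM account"))
    return ok, failed, balances
```

After the failed transfer, B's balance does not include the 500 that was briefly credited, which is the atomicity guarantee in action.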
Normalization refers to the process of refining the structure of a relation (informally, a table) by identifying dependencies that lead to trouble, and then decomposing a table into two or more tables that have more favorable properties.
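As a rough illustration of such a decomposition, the sketch below splits a table in which a customer's city is repeated on every order. It uses Python's sqlite3 module; all table and column names (`orders_flat`, `customer`, `orders`) are hypothetical.

```python
import sqlite3


def normalize_demo():
    conn = sqlite3.connect(":memory:")
    # Before: customer_city depends only on customer, not on the order,
    # so the same city is stored once per order.
    conn.execute(
        "CREATE TABLE orders_flat ("
        " order_id INTEGER PRIMARY KEY, customer TEXT,"
        " customer_city TEXT, item TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders_flat VALUES (?, ?, ?, ?)",
        [
            (1, "alice", "Paris", "book"),
            (2, "alice", "Paris", "pen"),
            (3, "bob", "Oslo", "lamp"),
        ],
    )

    # After: the dependency customer -> city gets its own table.
    conn.execute("CREATE TABLE customer (customer TEXT PRIMARY KEY, city TEXT)")
    conn.execute(
        "CREATE TABLE orders ("
        " order_id INTEGER PRIMARY KEY,"
        " customer TEXT REFERENCES customer(customer), item TEXT)"
    )
    conn.execute(
        "INSERT INTO customer SELECT DISTINCT customer, customer_city FROM orders_flat"
    )
    conn.execute("INSERT INTO orders SELECT order_id, customer, item FROM orders_flat")

    # A city change is now a single-row update instead of one per order.
    conn.execute("UPDATE customer SET city = 'Lyon' WHERE customer = 'alice'")
    conn.commit()

    n_customers = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
    rows = conn.execute(
        "SELECT o.order_id, c.city FROM orders o"
        " JOIN customer c ON c.customer = o.customer ORDER BY o.order_id"
    ).fetchall()
    return n_customers, rows
```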
During normalization, the statements that create and alter tables are themselves executed by the database, and in many DBMSs (PostgreSQL, for example) they run inside transactions. Where that is the case, the ACID properties apply to those statements, as well as to the more common SELECT, INSERT, UPDATE, and DELETE statements.
No, there is no relation between ACID and normalization properties.
For the basics of normalization, see: http://databases.about.com/od/specificproducts/a/normalization.htm
On this page of Microsoft's documentation on EF, it is stated literally:
Entity Framework does not wrap queries in a transaction
If I am right, this means that SQL reads are not implicitly wrapped in transactions, and thus every SELECT in our code is executed independently. But if this is so, can we ensure that two reads are consistent with each other? In the typical scenario, is there a guarantee that the sum of the loaded amount of A and the loaded amount of B will be right (in one connection) if a transfer between A and B is started (in a different connection) between the read of A and the read of B? Would Entity Framework be able to solve this case in some way?
The built-in solution in EF is client-side optimistic concurrency. On update EF will build a query that ensures that the row to be updated has not been changed since it was read.
Properties configured as concurrency tokens are used to implement
optimistic concurrency control: whenever an update or delete operation
is performed during SaveChanges, the value of the concurrency token on
the database is compared against the original value read by EF Core.
If the values match, the operation can complete. If the values do not
match, EF Core assumes that another user has performed a conflicting
operation and aborts the current transaction.
You can also opt in to Transactions at whatever isolation level you choose, which may provide similar protections. Or use Raw SQL queries with lock hints for your target database.
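The optimistic concurrency check described above can be sketched independently of EF. The following is not EF code: it is a minimal Python/sqlite3 illustration of the same idea, where a hypothetical `version` column plays the role of the concurrency token and `ConcurrencyConflict` is a made-up exception name.

```python
import sqlite3


class ConcurrencyConflict(Exception):
    """Raised when the row changed since it was read (hypothetical name)."""


def read_row(conn, account_id):
    """Return (amount, version) as currently stored."""
    return conn.execute(
        "SELECT amount, version FROM account WHERE id = ?", (account_id,)
    ).fetchone()


def save_amount(conn, account_id, new_amount, original_version):
    """Update only if the token still matches the value read earlier."""
    with conn:  # rolls back if the conflict exception is raised
        cur = conn.execute(
            "UPDATE account SET amount = ?, version = version + 1"
            " WHERE id = ? AND version = ?",
            (new_amount, account_id, original_version),
        )
        if cur.rowcount == 0:
            raise ConcurrencyConflict(f"account {account_id} was modified")


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        " id INTEGER PRIMARY KEY, amount INTEGER,"
        " version INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute("INSERT INTO account (id, amount) VALUES (1, 100)")
    conn.commit()

    amount_a, version_a = read_row(conn, 1)  # first user reads
    amount_b, version_b = read_row(conn, 1)  # second user reads the same state

    save_amount(conn, 1, amount_a - 10, version_a)  # first save wins
    try:
        save_amount(conn, 1, amount_b + 5, version_b)  # stale token
        conflict = False
    except ConcurrencyConflict:
        conflict = True
    return conflict, read_row(conn, 1)
```

Note that this only protects writes. For two reads to see the same snapshot, they would still need to run inside one transaction at a suitable isolation level, as mentioned above.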
I am developing a multi-step data pipeline that should optimize the following process:
1) Extract data from a NoSQL database (MongoDB).
2) Transform and load the data into a relational (PostgreSQL) database.
3) Build a data warehouse using the Postgres database.
I have manually coded a script to handle steps 1) and 2), which is an intermediate ETL pipeline. Now my goal is to build the data warehouse using the Postgres database, but I have a few doubts regarding the DW design. Below is the dimensional model for the relational database:
There are 2 main tables, Occurrence and Canonical, from which inherit a set of others (drawn in red and blue, respectively). Note that there are 2 child data types, ObserverNodeOccurrence and CanonicalObserverNode, that have an extra many-to-many relationship with another table.
I did some research on how inheritance should be implemented in a data warehouse and concluded that the best practice would be to merge the family data types (super and child tables) into a single table. Doing this would imply adding extra attributes and a lot of null values. My new dimensional model would look like the following:
Question 1: Do you think this is the best approach to address this problem? If not, what would be?
Question 2: Any software recommendations for on-premise data warehouses? (on-premise is a must since it contains sensitive data)
Usually having fewer tables to join and denormalizing data will improve query performance for data warehouse queries, so they are often considered a good thing.
This would suggest your second table design. NULL values occupy almost no space in a PostgreSQL table (they are recorded in the row's null bitmap rather than stored as data), so you need not worry about that.
As described here there are three options to implement inheritance in a relational database.
IMO the only practicable option for a data warehouse is Table-Per-Hierarchy, which merges all entities into one table.
The reason is not only the performance gained by saving the joins. In a data warehouse, the historical view of the data is often important. Think: how would you model a change of subtype for some entity?
An important thing is to define a discriminator column which uniquely identifies the source entity.
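A minimal sketch of a Table-Per-Hierarchy layout with a discriminator column follows, using Python's sqlite3 module so it runs as-is. The column names and the `sensor` subtype are hypothetical; only the idea of one merged table with nullable subtype-specific columns comes from the discussion above.

```python
import sqlite3


def build_occurrence_table():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE occurrence (
            id              INTEGER PRIMARY KEY,
            occurrence_type TEXT NOT NULL,  -- discriminator: source entity
            recorded_at     TEXT NOT NULL,  -- attribute shared by all subtypes
            observer_name   TEXT,           -- only set for 'observer_node'
            sensor_reading  REAL            -- only set for 'sensor'
        )
        """
    )
    conn.executemany(
        "INSERT INTO occurrence VALUES (?, ?, ?, ?, ?)",
        [
            (1, "observer_node", "2020-01-01", "node-7", None),
            (2, "sensor", "2020-01-02", None, 21.5),
            (3, "sensor", "2020-01-03", None, 19.0),
        ],
    )
    conn.commit()
    return conn


def rows_per_subtype(conn):
    """Count rows per source entity using only the discriminator column."""
    return conn.execute(
        "SELECT occurrence_type, COUNT(*) FROM occurrence"
        " GROUP BY occurrence_type ORDER BY occurrence_type"
    ).fetchall()
```

With this layout, a change of subtype becomes an update of the discriminator (or a new historical row) rather than a move between tables.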
NOTE: I have never done this before:
What are some steps or documentation to help normalize tables/views in a database? Currently, there are several tables and views in a database that do not use the primary/foreign key concept and repeat the same information in multiple tables.
I'd like to clean this up and also set up a process that would keep relationships updated. For example, if a person's zipcode changes or a record is removed, then its relationships with rows in other tables are updated automatically.
NOTE: My question is about normalizing existing database tables. The tables are live, so how do I approach normalization? Do I create a brand new database with the table structure I want and then move the data to that database? Once the data is moved, do I plug in the stored procedures and imports?
This question is somewhat broad, so I will only explain the concept.
Views are generally used for reporting/data presentation purposes and therefore I would not try to normalise them. Your case may be different.
You also need to be clear about primary / foreign key concept:
Lack of actual constraints (e.g. PRIMARY KEY, FOREIGN KEY) defined on the table does not mean that the tables do not have logical relationships on columns.
Data maintenance can be implemented in Triggers.
If you really have a situation where a lot of highly de-normalised data exists in tables for no apparent reason and you want to normalise it then this problem can be approached in two ways:
1. Full re-write - I would recommend for small / new Apps
2. "Gradual" re-factoring - large / mature applications, where underlying data relationships are complex and / or may not be fully understood.
Within "Gradual" re-factoring there are a few ways as well:
2.a. You take 1 old table and replace it with a new table and at the same time change all code that uses the old table to use the new table. For large systems this can be problematic as you simply may not be aware of all places that reference this table. On the other hand, it may be useful for situations where the table structure change is not significant and/or when the number of dependencies is small.
2.b. Another way is to create new table(s) (in the same database) in the shape / form you desire. The current tables should be replaced with Views that return identical data (to the old tables) but sourced from the "new" tables. This approach removes / minimises the need to modify all dependencies immediately. The drawback is that the View that replaces the old table can become rather complex, especially if INSTEAD OF triggers need to be implemented on the View.
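Approach 2.b can be sketched as follows, using Python's sqlite3 module. The table names (`person_info`, `person`, `zip`) are hypothetical, and a real migration would also have to handle writes to the old name, for example with INSTEAD OF triggers.

```python
import sqlite3


def refactor_with_view():
    conn = sqlite3.connect(":memory:")
    # Old, de-normalised table: city is repeated for every person in a zipcode.
    conn.execute(
        "CREATE TABLE person_info ("
        " person_id INTEGER PRIMARY KEY, name TEXT, zipcode TEXT, city TEXT)"
    )
    conn.executemany(
        "INSERT INTO person_info VALUES (?, ?, ?, ?)",
        [
            (1, "Ann", "11111", "Springfield"),
            (2, "Bob", "11111", "Springfield"),
            (3, "Cy", "22222", "Shelbyville"),
        ],
    )

    # New tables in the desired shape, filled from the old table.
    conn.execute("CREATE TABLE zip (zipcode TEXT PRIMARY KEY, city TEXT)")
    conn.execute(
        "CREATE TABLE person ("
        " person_id INTEGER PRIMARY KEY, name TEXT,"
        " zipcode TEXT REFERENCES zip(zipcode))"
    )
    conn.execute("INSERT INTO zip SELECT DISTINCT zipcode, city FROM person_info")
    conn.execute("INSERT INTO person SELECT person_id, name, zipcode FROM person_info")

    # Replace the old table with a view of the same name and columns.
    conn.execute("DROP TABLE person_info")
    conn.execute(
        "CREATE VIEW person_info AS"
        " SELECT p.person_id, p.name, p.zipcode, z.city"
        " FROM person p JOIN zip z ON z.zipcode = p.zipcode"
    )
    conn.commit()
    return conn


def legacy_query(conn):
    """An existing read query that still targets the old table name."""
    return conn.execute(
        "SELECT name, city FROM person_info ORDER BY person_id"
    ).fetchall()
```

Existing read queries keep working unchanged, which is what lets the dependencies be migrated gradually.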
In order to cut down on "stupid" tables (the ones which are identical for several related parent entities) we made a few generic tables.
Here is an example:
tbl_settings
id
owner_type (e.g. "account", "user" etc.)
owner_id (actual ID of a foreign table record)
setting_name
setting_value
Now the problem is the deletes, where it is quite easy to forget to delete e.g. the user's settings when the user is deleted.
What is the right way to handle deletes for this kind of a table in PostgreSQL?
Do it in an application (e.g. when deleting a user, do a manual delete of related settings)?
Do it in a database trigger on the tbl_user (and in all other parent tables)?
Something else?
If a table's relationships have meaning only in the application and upwards - i.e. they have no bearing on the referential integrity of the data in the database - you can do this in the application layer.
If "orphaned" records violate data (as opposed to business logic) relationships, then do this in the database: the safest way is probably via a trigger, though that has its disadvantages too (e.g. the likelihood of obfuscating DML errors is higher if there is a trigger action involved).
My impression from your question is that these tables are mainly there because of some business logic, in which case I would handle the deletes outside the database, in an ORM layer, for example.
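For comparison, the trigger option mentioned above could look roughly like this. The sketch uses SQLite through Python's sqlite3 module so that it is runnable as-is; PostgreSQL would need a trigger function plus CREATE TRIGGER instead. The trigger name and the `tbl_user` columns are assumptions.

```python
import sqlite3


def setup():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl_user (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE tbl_settings ("
        " id INTEGER PRIMARY KEY, owner_type TEXT, owner_id INTEGER,"
        " setting_name TEXT, setting_value TEXT)"
    )
    # One such trigger is needed per parent table (user, account, ...).
    conn.execute(
        """
        CREATE TRIGGER trg_user_delete_settings
        AFTER DELETE ON tbl_user
        FOR EACH ROW
        BEGIN
            DELETE FROM tbl_settings
            WHERE owner_type = 'user' AND owner_id = OLD.id;
        END
        """
    )
    conn.execute("INSERT INTO tbl_user VALUES (1, 'ann')")
    conn.executemany(
        "INSERT INTO tbl_settings (owner_type, owner_id, setting_name, setting_value)"
        " VALUES (?, ?, ?, ?)",
        [
            ("user", 1, "theme", "dark"),
            ("user", 1, "lang", "en"),
            ("account", 1, "theme", "light"),  # same id, different owner type
        ],
    )
    conn.commit()
    return conn


def delete_user(conn, user_id):
    """Delete the user; the trigger removes that user's settings."""
    with conn:
        conn.execute("DELETE FROM tbl_user WHERE id = ?", (user_id,))
    return conn.execute(
        "SELECT owner_type, owner_id, setting_name FROM tbl_settings ORDER BY id"
    ).fetchall()
```

The account's setting survives even though it shares `owner_id` 1, because the trigger also filters on `owner_type`.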
When we say RDBMS, that means it may be Oracle, MySQL, MS Access, etc. But what are the examples of a DBMS? Are there any examples, or is it just a concept?
A DBMS is a database management system. There are two crucial features a DBMS must provide:
storing data
standardised access to the data
The second function is the crucial one. I can connect to a DBMS with a generic client (e.g. through JDBC) and discover the organisation of the data stored therein. I can do this because a real DBMS maintains metadata - data about the data it stores - in a data dictionary or an INFORMATION_SCHEMA.
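As a small illustration of this kind of discovery, the sketch below asks a database for its own structure without knowing it in advance. It uses SQLite via Python's sqlite3 module; SQLite has no INFORMATION_SCHEMA, so the sketch relies on its `sqlite_master` catalog and `PRAGMA table_info` instead. The `describe_schema` helper and the sample tables are hypothetical.

```python
import sqlite3


def describe_schema(conn):
    """Return {table_name: [column_name, ...]} read from the DBMS's own metadata."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    schema = {}
    for table in tables:
        # Each table_info row is (cid, name, type, notnull, dflt_value, pk).
        quoted = '"' + table.replace('"', '""') + '"'
        columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        schema[table] = [col[1] for col in columns]
    return schema


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER)")
    return describe_schema(conn)
```

A flat file offers nothing comparable: the reader has to know the layout beforehand.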
So we can see that flat files do not constitute a DBMS. They handle the first part, persistence, easily enough, but they fail on the second: only the application (or person) which wrote the data can interpret the data structure. This means that spreadsheets don't count as a DBMS either (although a case can be made for XML files).
An RDBMS is a particular type of DBMS which implements Codd's famous Twelve Rules. Many database theoreticians would argue that the products you list (Oracle, MySQL, MS Access) are examples of SQL DBMS rather than RDBMS because they fail to satisfy two or more of Codd's rules: they all fail Rule 0 and then at least one other rule.
There are other types of DBMS. There is the hierarchical form, of which the most venerable is MUMPS. There are object-oriented OODBMS, such as Intersystems Cache. There are network (graph) DBMS e.g. IDMS and Neo4J. And then there's the whole raft of other NoSQL databases, most of which probably qualify as DBMSes.
dbms = database management system
rdbms = relational database management system
So every rdbms is also a dbms.
You might want to name Gemstone, an OODBMS, or Cache, a hierarchical one.
Database management system has a list of links to various types of DBMSs which then link to lists of examples for that type, for example a list of Object DBMSs