Derived Associations with Entity Framework

I've just started with Entity Framework this week, and am struggling with a few of the concepts.
Right now, I have a database structure that I am struggling to transfer across to Entity Framework.
I started with Model First, and have this:
Order_Item          Order_FetchableItem
----------          -------------------
order_id            order_id
item_id             item_id
                    fetch_url
The idea is that orders contain items, and this relation is conveyed in the Order_Item table. However, some (not all) of the items in an order have a URL, so this property needs to be stored too.
I can't get this working in EF, because EF treats Order_Item as a relationship (a many-to-many association) rather than an entity, so I can't derive from it. What's the best alternative for doing this?
I have considered moving the fetch_url field to the Order_Item table, but as it is a wide column, I don't want lots of NULL values in the order_item table.
Thanks, and please excuse the formatting above!
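For reference, here is a minimal sketch of the relational shape described above, using Python's built-in sqlite3 as a stand-in database (an assumption; the question targets Entity Framework and doesn't give column types). Order_FetchableItem reuses Order_Item's composite key and has a foreign key back to it, so it acts as an optional one-to-zero-or-one extension: only items that actually have a URL get a row, and Order_Item holds no NULL fetch_url values. A LEFT JOIN reads both kinds of item back together.

```python
import sqlite3

def build_order_schema(conn: sqlite3.Connection) -> None:
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE Order_Item (
            order_id INTEGER NOT NULL,
            item_id  INTEGER NOT NULL,
            PRIMARY KEY (order_id, item_id)
        );
        -- Same key as Order_Item: at most one URL row per order/item pair.
        CREATE TABLE Order_FetchableItem (
            order_id  INTEGER NOT NULL,
            item_id   INTEGER NOT NULL,
            fetch_url TEXT    NOT NULL,
            PRIMARY KEY (order_id, item_id),
            FOREIGN KEY (order_id, item_id)
                REFERENCES Order_Item (order_id, item_id)
        );
    """)

def items_for_order(conn: sqlite3.Connection, order_id: int):
    # LEFT JOIN: items without a URL come back with fetch_url = None.
    return conn.execute("""
        SELECT oi.item_id, fi.fetch_url
        FROM Order_Item AS oi
        LEFT JOIN Order_FetchableItem AS fi
               ON fi.order_id = oi.order_id AND fi.item_id = oi.item_id
        WHERE oi.order_id = ?
        ORDER BY oi.item_id
    """, (order_id,)).fetchall()

conn = sqlite3.connect(":memory:")
build_order_schema(conn)
conn.executemany("INSERT INTO Order_Item VALUES (?, ?)", [(1, 10), (1, 11)])
conn.execute("INSERT INTO Order_FetchableItem VALUES (1, 11, 'https://example.com/a')")
print(items_for_order(conn, 1))  # [(10, None), (11, 'https://example.com/a')]
```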

Related

PostgreSQL: More efficient way of joining tables based on multiple address fields

I have a table that lists two connected values, ID and tax number (TIN), and looks somewhat like this:
IDTINMap
ID         | TIN
-----------+-------
1234567890 | 654321
3456321467 | 986321
8764932312 | 245234
An ID can map to multiple TINs, and a TIN might map to multiple IDs, but there is a unique constraint on the table for each (ID, TIN) pair.
This list isn't complete, and the table has about 8000 rows. I have another table, IDListing, that contains metadata for about 9 million IDs, including name, address, city, state, postal code, and the ID.
What I'm trying to do is build an expanded ID-TIN map. Currently I do this by first joining the IDTINMap table with IDListing on the ID field, which, in a CTE I'll call Step1, gives something like this:
ID         | TIN    | Name      | Address        | City          | State | Zip
-----------+--------+-----------+----------------+---------------+-------+------
1234567890 | 654321 | John Doe  | 123 Easy St    | Seattle       | WA    | 65432
3456321467 | 986321 | Tyler Toe | 874 W 84th Ave | New York      | NY    | 48392
8764932312 | 245234 | Jane Poe  | 984 Oak Street | San Francisco | CA    | 12345
Then I join the IDListing table a second time, matching Step1 rows where name, address, city, state and zip are all equal. I know I could do something more complicated like fuzzy matching, but for now we're only looking at exact matches. In this join I keep the ID from Step1 as 'ReferenceID', keep the TIN, and add a column with every matching ID. I don't keep any of the name/address/city/state/zip data, just those three numbers.
Then I can go back and insert all the distinct pairs into a final table.
I've written this as a query; it works and gives me the desired result. However, it is slower than I'd like. I'm used to joining on columns I've indexed (like ID or TIN), but joining on all of the address fields is slow. Is there a good way to improve this? Joining on each field individually is faster than joining on a CONCAT() of all the fields (I have tried this). I'm just wondering if there is another way I can optimize this.
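The two-step process described above can be sketched like this. SQLite stands in for PostgreSQL so the example is self-contained, and the sample rows (including a second ID for John Doe) are invented for illustration. The multi-column index on the five matched columns is an assumption on my part, not something the question says was tried: it gives the second join a single index that covers all five equality conditions instead of scanning IDListing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE IDTINMap (id TEXT, tin TEXT, UNIQUE (id, tin));
    CREATE TABLE IDListing (
        id TEXT PRIMARY KEY, name TEXT, address TEXT,
        city TEXT, state TEXT, zip TEXT
    );
    -- Assumed optimization: one index covering every column of the second join.
    CREATE INDEX idlisting_name_addr
        ON IDListing (name, address, city, state, zip);
    CREATE TABLE ExpandedIDTINMap (id TEXT, tin TEXT, UNIQUE (id, tin));
""")
conn.executemany("INSERT INTO IDTINMap VALUES (?, ?)",
                 [("1234567890", "654321"), ("3456321467", "986321")])
conn.executemany("INSERT INTO IDListing VALUES (?, ?, ?, ?, ?, ?)", [
    ("1234567890", "John Doe", "123 Easy St", "Seattle", "WA", "65432"),
    ("1111111111", "John Doe", "123 Easy St", "Seattle", "WA", "65432"),  # invented extra ID
    ("3456321467", "Tyler Toe", "874 W 84th Ave", "New York", "NY", "48392"),
])

conn.execute("""
    WITH Step1 AS (              -- known ID/TIN pairs plus their address data
        SELECT m.id AS reference_id, m.tin,
               l.name, l.address, l.city, l.state, l.zip
        FROM IDTINMap AS m
        JOIN IDListing AS l ON l.id = m.id
    )
    INSERT OR IGNORE INTO ExpandedIDTINMap (id, tin)
    SELECT DISTINCT l2.id, s.tin  -- every ID with the exact same name and address
    FROM Step1 AS s
    JOIN IDListing AS l2
      ON l2.name = s.name AND l2.address = s.address AND l2.city = s.city
     AND l2.state = s.state AND l2.zip = s.zip
""")
print(sorted(conn.execute("SELECT id, tin FROM ExpandedIDTINMap")))
```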
Make the final result a materialized view. Refresh it when you need to update the data (every night? every three hours?). Then use this view for your normal operations.
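In PostgreSQL this is CREATE MATERIALIZED VIEW ... AS followed by the expensive query, plus a scheduled REFRESH MATERIALIZED VIEW. SQLite, used here only so the sketch runs anywhere, has no materialized views, so this sketch emulates the idea with an ordinary snapshot table that a refresh function rebuilds inside a single transaction; the table name id_tin_snapshot and the shortened query are hypothetical. Readers query the snapshot, which stays cheap however slow the underlying join is, at the cost of being as stale as the last refresh.

```python
import sqlite3

# Hypothetical stand-in for the expensive expansion query from the question.
EXPENSIVE_QUERY = """
    SELECT DISTINCT l2.id, m.tin
    FROM IDTINMap AS m
    JOIN IDListing AS l1 ON l1.id = m.id
    JOIN IDListing AS l2
      ON l2.name = l1.name AND l2.address = l1.address AND l2.zip = l1.zip
"""

def refresh_snapshot(conn: sqlite3.Connection) -> None:
    """Rebuild the snapshot table; roughly the role of REFRESH MATERIALIZED VIEW."""
    try:
        # Explicit BEGIN/COMMIT so the drop and rebuild happen atomically.
        conn.executescript(f"""
            BEGIN;
            DROP TABLE IF EXISTS id_tin_snapshot;
            CREATE TABLE id_tin_snapshot AS {EXPENSIVE_QUERY};
            COMMIT;
        """)
    except sqlite3.Error:
        conn.rollback()  # keep the previous snapshot if the rebuild fails
        raise

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE IDTINMap (id TEXT, tin TEXT);
    CREATE TABLE IDListing (id TEXT, name TEXT, address TEXT, zip TEXT);
    INSERT INTO IDTINMap VALUES ('A', 'T1');
    INSERT INTO IDListing VALUES ('A', 'Jo', '1 Main St', '11111'),
                                 ('B', 'Jo', '1 Main St', '11111');
""")
refresh_snapshot(conn)  # e.g. run nightly from a scheduler
print(conn.execute("SELECT * FROM id_tin_snapshot ORDER BY id").fetchall())
```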

Knowing which tables are affected by a connection

I want to know if there is a way to retrieve which tables are affected by the requests made from a connection in PostgreSQL 9.5 or higher.
The purpose is to have this information in a form that tells me which tables were affected, in which order and in what way.
More precisely, something like this would suffice:
id | datetime | id_conn | id_query | table | action
---+----------+---------+----------+---------+-------
1 | ... | 2256 | 125 | user | select
2 | ... | 2256 | 125 | order | select
3 | ... | 2256 | 125 | product | select
(This would be the result of a SELECT query on user JOIN order JOIN product.)
I know I can retrieve id_conn through pg_stat_activity, and I can see whether a query is currently running, but I can't find a history of past queries.
The final purpose is to debug the database when inconsistent data is inserted into a table (due to a missing constraint). Knowing which connection did the insert will lead me to the faulty script, since I already have the script name linked to the connection id.

Versioning in the database

I want to store a full version history of a row every time an update is made to an amount-sensitive table.
So far, I have decided to use the following approach:
Do not allow updates.
Every time an update is made, create a new entry in the table.
However, I am undecided on what is the best database structure design for this change.
Current Structure
Primary Key: id
id(int) | amount(decimal) | other_columns
First Approach
Composite Primary Key: id, version
id(int) | version(int) | amount(decimal) | change_reason
1 | 1 | 100 |
1 | 2 | 20 | correction
Second Approach
Primary Key: id
Unique index on [origin_id, version]
id(int) | origin_id(int) | version(int) | amount(decimal) | change_reason
1 | NULL | 1 | 100 | NULL
2 | 1 | 2 | 20 | correction
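A minimal sketch of the first approach, assuming SQLite and a hypothetical table name amounts. The composite primary key (id, version) rejects two rows with the same version number, a BEFORE UPDATE trigger enforces "do not allow updates", a change becomes an insert of version + 1, and the current amount is the highest version per id. The helper names record_change and current_amount are made up for illustration, and amounts are stored as text to keep decimals exact.

```python
import sqlite3
from decimal import Decimal

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE amounts (
        id            INTEGER NOT NULL,
        version       INTEGER NOT NULL,
        amount        TEXT    NOT NULL,   -- text keeps exact decimal values
        change_reason TEXT,
        PRIMARY KEY (id, version)
    );
    -- Insert-only table: any UPDATE is rejected.
    CREATE TRIGGER amounts_no_update BEFORE UPDATE ON amounts
    BEGIN
        SELECT RAISE(ABORT, 'amounts is insert-only');
    END;
""")

def record_change(conn, row_id, amount, reason=None):
    """Insert the next version instead of updating the existing row."""
    with conn:
        (next_version,) = conn.execute(
            "SELECT COALESCE(MAX(version), 0) + 1 FROM amounts WHERE id = ?",
            (row_id,)).fetchone()
        # If two writers race, the primary key rejects the duplicate version.
        conn.execute("INSERT INTO amounts VALUES (?, ?, ?, ?)",
                     (row_id, next_version, str(amount), reason))
    return next_version

def current_amount(conn, row_id):
    row = conn.execute(
        "SELECT amount FROM amounts WHERE id = ? ORDER BY version DESC LIMIT 1",
        (row_id,)).fetchone()
    return Decimal(row[0]) if row else None

record_change(conn, 1, Decimal("100"))
record_change(conn, 1, Decimal("20"), "correction")
print(current_amount(conn, 1))  # 20
```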
I would suggest a new table that stores a unique id for each item. It serves as a lookup table for all available items.
item Table:
id(int)
1000
For the table that stores all changes to an item, let's call it item_changes, item_id is a FOREIGN KEY to the item table's id. The relationship between the item table and the item_changes table is one-to-many.
item_changes Table:
id(int) | item_id(int) | version(int) | amount(decimal) | change_reason
1 | 1000 | 1 | 100 | NULL
2 | 1000 | 2 | 20 | correction
With this, item_id will never be NULL as it is a valid FOREIGN KEY to item table.
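Sketched in SQLite (an assumption; the question doesn't name its database), the suggested layout looks like this. The REFERENCES clause is what guarantees item_id always points at an existing item. The UNIQUE (item_id, version) constraint is not in the answer; it is added here as an assumption to keep version numbers from repeating for the same item.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY);
    CREATE TABLE item_changes (
        id            INTEGER PRIMARY KEY,
        item_id       INTEGER NOT NULL REFERENCES item (id),
        version       INTEGER NOT NULL,
        amount        TEXT    NOT NULL,
        change_reason TEXT,
        UNIQUE (item_id, version)          -- assumed extra safeguard
    );
    INSERT INTO item VALUES (1000);
    INSERT INTO item_changes VALUES (1, 1000, 1, '100', NULL);
    INSERT INTO item_changes VALUES (2, 1000, 2, '20', 'correction');
""")

# Full history of one item, oldest version first.
history = conn.execute(
    "SELECT version, amount, change_reason FROM item_changes "
    "WHERE item_id = ? ORDER BY version", (1000,)).fetchall()
print(history)  # [(1, '100', None), (2, '20', 'correction')]
```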
The best method is to use Version Normal Form (VNF). Here is an answer I gave for a neat way to track all changes to specific fields of specific tables.
The static table contains the static data, such as PK and other attributes which do not change over the life of the entity or such changes need not be tracked.
The version table contains all dynamic attributes that need to be tracked. The best design uses a view which joins the static table with the current version from the version table, as the current version is probably what your apps need most often. Triggers on the view maintain the static/versioned design without the app needing to know anything about it.
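A minimal SQLite sketch of that layout; the table, view and trigger names are hypothetical and the linked answer's exact design is not reproduced here. item_static holds the untracked attributes, item_version holds one row per change of the tracked amount, the view item_current joins each item with its latest version, and an INSTEAD OF UPDATE trigger turns an UPDATE against the view into a name change on the static table and/or an INSERT of a new version, so the application keeps issuing ordinary updates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item_static (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL             -- not tracked
    );
    CREATE TABLE item_version (
        item_id INTEGER NOT NULL REFERENCES item_static (id),
        version INTEGER NOT NULL,
        amount  TEXT    NOT NULL,      -- tracked attribute
        PRIMARY KEY (item_id, version)
    );
    -- Current state: each item joined with its highest-numbered version.
    CREATE VIEW item_current AS
        SELECT s.id, s.name, v.version, v.amount
        FROM item_static AS s
        JOIN item_version AS v ON v.item_id = s.id
        WHERE v.version = (SELECT MAX(version) FROM item_version
                           WHERE item_id = s.id);
    -- Updates through the view: overwrite static data, append tracked data.
    CREATE TRIGGER item_current_update INSTEAD OF UPDATE ON item_current
    BEGIN
        UPDATE item_static SET name = NEW.name
        WHERE id = OLD.id AND NEW.name IS NOT OLD.name;
        INSERT INTO item_version (item_id, version, amount)
        SELECT OLD.id, OLD.version + 1, NEW.amount
        WHERE NEW.amount IS NOT OLD.amount;
    END;
""")
conn.execute("INSERT INTO item_static VALUES (1, 'widget')")
conn.execute("INSERT INTO item_version VALUES (1, 1, '100')")
conn.execute("UPDATE item_current SET amount = '20' WHERE id = 1")
print(conn.execute("SELECT * FROM item_current").fetchall())  # [(1, 'widget', 2, '20')]
```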
The link above also contains a link to a document which goes into much more detail including queries to get the current version or to "look back" at any version you need.
Why not go for SCD Type 2 (Slowly Changing Dimension)? It is a well-established methodology for exactly this problem and a standard design pattern for the database. Here is how Type 2 works, with an example:
Type 2 - Creating a new additional record. In this methodology, all history of dimension changes is kept in the database. You capture an attribute change by adding a new row with a new surrogate key to the dimension table. Both the prior and new rows contain the natural key (or other durable identifiers) as attributes. 'Effective date' and 'current indicator' columns are also used in this method. There can be only one record with the current indicator set to 'Y'. For the 'effective date' columns, i.e. start_date and end_date, the end_date of the current record is usually set to 9999-12-31. Introducing changes to the dimensional model in Type 2 can be a very expensive database operation, so it is not recommended for dimensions where a new attribute could be added in the future.
id | amount | start_date  | end_date    | current_flag
---+--------+-------------+-------------+-------------
1  | 100    | 01-Apr-2018 | 02-Apr-2018 | N
2  | 80     | 04-Apr-2018 | NULL        | Y
Detailed explanation:
All you need is to add three extra columns, START_DATE, END_DATE and CURRENT_FLAG, to track your records properly. When a record is first inserted from the source, the table stores it as:
id | amount | start_date  | end_date | current_flag
---+--------+-------------+----------+-------------
1  | 100    | 01-Apr-2018 | NULL     | Y
Then, when the same record is updated, set END_DATE of the previous record to the current system date and CURRENT_FLAG to 'N', and insert the new record as below. That way you can track everything about your records:
id | amount | start_date  | end_date    | current_flag
---+--------+-------------+-------------+-------------
1  | 100    | 01-Apr-2018 | 02-Apr-2018 | N
2  | 80     | 04-Apr-2018 | NULL        | Y
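The expire-then-insert step can be sketched as follows, assuming SQLite, a hypothetical table name amount_history, and an added natural-key column item_key, which the tables above leave out but which the quoted description says both rows should carry. Dates are ISO strings passed in explicitly so the example is deterministic; in practice they would be the current system date. Both statements run in one transaction, and a partial unique index (also an assumption) enforces "only one record with the current indicator set to 'Y'" per natural key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE amount_history (
        id           INTEGER PRIMARY KEY,   -- surrogate key, new for every version
        item_key     TEXT NOT NULL,         -- natural key shared by all versions
        amount       TEXT NOT NULL,
        start_date   TEXT NOT NULL,
        end_date     TEXT,                  -- NULL while the row is current
        current_flag TEXT NOT NULL CHECK (current_flag IN ('Y', 'N'))
    );
    CREATE UNIQUE INDEX one_current_row ON amount_history (item_key)
        WHERE current_flag = 'Y';
""")

def scd2_upsert(conn, item_key, amount, today):
    with conn:  # both statements commit together or not at all
        conn.execute(
            "UPDATE amount_history SET end_date = ?, current_flag = 'N' "
            "WHERE item_key = ? AND current_flag = 'Y'", (today, item_key))
        conn.execute(
            "INSERT INTO amount_history "
            "(item_key, amount, start_date, end_date, current_flag) "
            "VALUES (?, ?, ?, NULL, 'Y')", (item_key, amount, today))

scd2_upsert(conn, "A", "100", "2018-04-01")   # first load
scd2_upsert(conn, "A", "80", "2018-04-04")    # later change
for row in conn.execute("SELECT * FROM amount_history ORDER BY id"):
    print(row)
```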

Is it possible to use different forms and create one row of information in a table?

I have been searching for a way to combine two or more rows of one table in a database into one row.
I am currently creating multiple web-based forms that connect to one table in my database. Is there any way to write some MySQL and PHP code that will take separate form submissions and put them into one row of the database instead of multiple rows?
Here is an example of what is going into the database:
This is all in one table with three rows.
Form_ID represents the three different forms that I used to insert the data into the table.
Form_ID | Lot_ID | F_Name | L_Name | Date       | Age
--------+--------+--------+--------+------------+-----
1       | 1      | John   | Evans  | NULL       | NULL
2       | NULL   | NULL   | NULL   | 2017-07-06 | NULL
3       | NULL   | NULL   | NULL   | NULL       | 22
This is an example of three separate forms going into one table. Every time the submit button is hit the data just inserts down to the next row of information.
I need some sort of join or update once the submit button is hit to replace the preceding NULL values.
Here is what I want to do after the submit button is hit:
I want it all combined into one row, still in the same table.
Form_ID still refers to the three separate forms, but there is only one row now.
Form_ID | Lot_ID | F_Name | L_Name | Date       | Age
--------+--------+--------+--------+------------+-----
1       | 1      | John   | Evans  | 2017-07-06 | 22
My goal is that once one form has been submitted, the next, different form submission replaces the NULL values in the row above it, and so on, to create a single row of information.
I found a way to solve this issue. I used UPDATE tablename SET columnname = newValue WHERE Form_ID = newID
This way, when I want to fill in a row that has blank (NULL) fields, the WHERE clause finds the row with the matching ID.
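The asker's fix, sketched with Python's sqlite3 in place of PHP and MySQL (an assumption, so the example runs on its own; the table name lot_form and the submit_form* helpers are hypothetical). The first form INSERTs the row; each later form runs a parameterized UPDATE that fills its own column on the row with the matching Form_ID instead of inserting a new, mostly-NULL row. Placeholders (?) rather than string concatenation keep form input out of the SQL text; prepared statements in PHP's PDO or mysqli serve the same purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE lot_form (
        Form_ID INTEGER PRIMARY KEY, Lot_ID INTEGER,
        F_Name TEXT, L_Name TEXT, Date TEXT, Age INTEGER
    )
""")

def submit_form1(conn, form_id, lot_id, first, last):
    with conn:  # first form creates the row
        conn.execute("INSERT INTO lot_form (Form_ID, Lot_ID, F_Name, L_Name) "
                     "VALUES (?, ?, ?, ?)", (form_id, lot_id, first, last))

def submit_form2(conn, form_id, date):
    with conn:  # fills in the NULL Date of the existing row
        conn.execute("UPDATE lot_form SET Date = ? WHERE Form_ID = ?", (date, form_id))

def submit_form3(conn, form_id, age):
    with conn:  # fills in the NULL Age of the existing row
        conn.execute("UPDATE lot_form SET Age = ? WHERE Form_ID = ?", (age, form_id))

submit_form1(conn, 1, 1, "John", "Evans")
submit_form2(conn, 1, "2017-07-06")
submit_form3(conn, 1, 22)
print(conn.execute("SELECT * FROM lot_form").fetchall())
# [(1, 1, 'John', 'Evans', '2017-07-06', 22)]
```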

Database Design - Items and Regions

I have a database design problem.
The goal is a database of the offers our clients make in particular geographical regions.
Each offer can be offered in many regions.
The regions form a hierarchy, for example:
subregion_1
    subregion_11
        region_111
        region_112
    subregion_12
        region_121
        region_122
subregion_2
    subregion_21
        region_221
Now I want to store offer_1 and its regions in the database. Here are three examples of what I need to achieve:
When offer_1 is stored in region_111, the offer should be displayed when a user is browsing subregion_1, subregion_11 and region_111.
If offer_1 is stored in regions subregion_11 and region_121, the offer should be displayed when a user is browsing subregion_1, subregion_11 and every region under subregion_11, subregion_12 and region_121.
When offer_1 is stored in subregion_1, the offer is displayed on the subregion_1 page and on every region under subregion_1.
I also need a way to calculate the number of different offers in each region dynamically and very fast.
Does anybody have advice on how to approach this design?
Here is what I have so far.
Regions
------------------------------------------------------------
| id | level1 | level2 | level3 | name | level |
------------------------------------------------------------
| 02 | 02 | null | null | subregion_1 | 1 |
| 0201 | 02 | 01 | null | subregion_11 | 2 |
| 020103 | 02 | 01 | 03 | region_111 | 3 |
------------------------------------------------------------
Offers to regions
------------------------
| offer_id | region_id |
------------------------
| 1 | 020103 |
| 1 | 0202 |
------------------------
I created the region id by concatenating level1, level2 and level3. In the Offers to regions table I store the offer and the region. Here offer 1 has a region on level 3 (020103) and a region on level 2 (0202).
With this design I don't know how to query the number of different offers per region, or how to query the offers for regions on level 1, level 2 and level 3.
Well, there is the obvious way (an adjacency list), which uses an id to point to the parent, like this:
CREATE TABLE Regions (
region_id INT AUTO_INCREMENT PRIMARY KEY,
parent_id INT,
region_name VARCHAR(100) NOT NULL,
FOREIGN KEY (parent_id) REFERENCES Regions(region_id)
);
But in your situation this could be considered an anti-pattern, since it is not easy to query through the hierarchy (especially if the number of levels changes).
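For comparison, here is what walking that parent_id hierarchy looks like with a recursive common table expression, which SQLite, PostgreSQL and MySQL 8.0+ support. The sketch uses SQLite (INTEGER PRIMARY KEY in place of MySQL's AUTO_INCREMENT) and a few of the question's regions with hand-assigned ids. It handles any depth, but every lookup is a recursive query, which is the kind of cost the answer is pointing at.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Regions (
        region_id   INTEGER PRIMARY KEY,
        parent_id   INTEGER REFERENCES Regions (region_id),
        region_name TEXT NOT NULL
    );
    INSERT INTO Regions VALUES
        (1, NULL, 'subregion_1'), (2, 1, 'subregion_11'),
        (3, 2, 'region_111'),     (4, 2, 'region_112'),
        (5, 1, 'subregion_12'),   (6, 5, 'region_121');
""")

def region_and_descendants(conn, region_id):
    """Ids of a region and everything below it, at any depth."""
    rows = conn.execute("""
        WITH RECURSIVE subtree(region_id) AS (
            SELECT region_id FROM Regions WHERE region_id = ?
            UNION ALL
            SELECT r.region_id FROM Regions AS r
            JOIN subtree AS s ON r.parent_id = s.region_id
        )
        SELECT region_id FROM subtree ORDER BY region_id
    """, (region_id,)).fetchall()
    return [r[0] for r in rows]

print(region_and_descendants(conn, 2))  # [2, 3, 4]
```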
Another approach is Path Enumeration, where you store the hierarchy path, similar to Unix file paths. For example:
CREATE TABLE Regions (
region_id INT AUTO_INCREMENT PRIMARY KEY,
path VARCHAR(100),
region_name VARCHAR(100) NOT NULL
);
This will allow you to store your hierarchy like this
---------------------------------------------
| region_id | path | region_name |
---------------------------------------------
| 1 | 1/ | subregion_1 |
| 2 | 1/2/ | subregion_11 |
| 3 | 1/2/3/ | region_111 |
| 4 | 1/2/4/ | region_112 |
---------------------------------------------
This way, when querying your Offers table (where each offer has a reference to a region_id) while browsing, say, the offers for subregion_1 (with id 1), your query can look something like this:
select Offers.SOME_COLUMN, ......
from Offers, Regions
where Offers.region_id = Regions.region_id
and Regions.path like '1/%'
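A runnable version of that query, assuming SQLite and a minimal hypothetical Offers table with one region_id per offer, as the answer describes before its edit. It also computes the per-region count the question asks for: each region's path is used as a LIKE prefix, so an offer is counted in its own region and in every ancestor. The trailing slash in each path matters, since it stops '1/%' from also matching a path such as '11/'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Regions (
        region_id   INTEGER PRIMARY KEY,
        path        TEXT NOT NULL,
        region_name TEXT NOT NULL
    );
    CREATE TABLE Offers (
        offer_id  INTEGER PRIMARY KEY,
        region_id INTEGER NOT NULL REFERENCES Regions (region_id)
    );
    INSERT INTO Regions VALUES
        (1, '1/', 'subregion_1'),    (2, '1/2/', 'subregion_11'),
        (3, '1/2/3/', 'region_111'), (4, '1/2/4/', 'region_112');
    INSERT INTO Offers VALUES (10, 3), (11, 4), (12, 1);
""")

# Offers located in subregion_1 or anywhere beneath it.
browse = conn.execute("""
    SELECT o.offer_id
    FROM Offers AS o
    JOIN Regions AS r ON o.region_id = r.region_id
    WHERE r.path LIKE ? || '%'
    ORDER BY o.offer_id
""", ("1/",)).fetchall()

# Number of distinct offers in each region's subtree.
counts = conn.execute("""
    SELECT parent.region_name, COUNT(DISTINCT o.offer_id)
    FROM Regions AS parent
    JOIN Regions AS child ON child.path LIKE parent.path || '%'
    LEFT JOIN Offers AS o ON o.region_id = child.region_id
    GROUP BY parent.region_id
    ORDER BY parent.region_id
""").fetchall()
print(browse, counts)
```

The question's third example (an offer stored in subregion_1 showing on every page beneath it) needs the reverse test: match offers whose region path is a prefix of the browsed path, i.e. ? LIKE r.path || '%'.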
There are other patterns for modeling hierarchical data, such as Nested Sets and Closure Table (possibly relevant here), which you might want to look into as well. Each has different pros and cons in terms of select/insert/delete performance.
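Of those, a closure table is sketched below in SQLite (the table name region_closure and helper add_region are hypothetical). It stores one row for every ancestor/descendant pair, including each region paired with itself at depth 0, so both "everything under X" and "everything above X" become plain lookups with no string matching or recursion. The price is extra rows to maintain on every insert or move.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Regions (region_id INTEGER PRIMARY KEY, region_name TEXT NOT NULL);
    CREATE TABLE region_closure (
        ancestor   INTEGER NOT NULL REFERENCES Regions (region_id),
        descendant INTEGER NOT NULL REFERENCES Regions (region_id),
        depth      INTEGER NOT NULL,
        PRIMARY KEY (ancestor, descendant)
    );
""")

def add_region(conn, region_id, name, parent_id=None):
    """Insert a region plus its closure rows: itself, then one row per ancestor."""
    with conn:
        conn.execute("INSERT INTO Regions VALUES (?, ?)", (region_id, name))
        conn.execute("INSERT INTO region_closure VALUES (?, ?, 0)",
                     (region_id, region_id))
        if parent_id is not None:
            # Every ancestor of the parent (including the parent itself)
            # becomes an ancestor of the new region, one level deeper.
            conn.execute("""
                INSERT INTO region_closure (ancestor, descendant, depth)
                SELECT ancestor, ?, depth + 1
                FROM region_closure WHERE descendant = ?
            """, (region_id, parent_id))

add_region(conn, 1, "subregion_1")
add_region(conn, 2, "subregion_11", 1)
add_region(conn, 3, "region_111", 2)
add_region(conn, 5, "subregion_12", 1)

ancestors_of_3 = [r[0] for r in conn.execute(
    "SELECT ancestor FROM region_closure WHERE descendant = 3 ORDER BY depth")]
descendants_of_1 = [r[0] for r in conn.execute(
    "SELECT descendant FROM region_closure WHERE ancestor = 1 ORDER BY descendant")]
print(ancestors_of_3, descendants_of_1)  # [3, 2, 1] [1, 2, 3, 5]
```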
EDIT:
I just noticed you edited your question, and that offers can belong to more than one region. The above might need adjustments to support assigning more than one region, but the basic idea can still be applied.
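One possible adjustment for the path-enumeration layout, sketched in SQLite with hypothetical names and data: replace the single Offers.region_id with an offer_regions junction table (essentially the question's own "Offers to regions" table, keyed by region_id) and count with COUNT(DISTINCT ...), so an offer assigned to two regions inside the same subtree is counted only once there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Regions (
        region_id   INTEGER PRIMARY KEY,
        path        TEXT NOT NULL,
        region_name TEXT NOT NULL
    );
    -- Junction table: one row per (offer, region) assignment.
    CREATE TABLE offer_regions (
        offer_id  INTEGER NOT NULL,
        region_id INTEGER NOT NULL REFERENCES Regions (region_id),
        PRIMARY KEY (offer_id, region_id)
    );
    INSERT INTO Regions VALUES
        (1, '1/', 'subregion_1'),    (2, '1/2/', 'subregion_11'),
        (3, '1/2/3/', 'region_111'), (4, '1/2/4/', 'region_112'),
        (5, '1/5/', 'subregion_12');
    -- Offer 1 sits in two branches of subregion_1; offer 2 in one.
    INSERT INTO offer_regions VALUES (1, 3), (1, 5), (2, 4);
""")

counts = dict(conn.execute("""
    SELECT parent.region_name, COUNT(DISTINCT o.offer_id)
    FROM Regions AS parent
    JOIN Regions AS child ON child.path LIKE parent.path || '%'
    LEFT JOIN offer_regions AS o ON o.region_id = child.region_id
    GROUP BY parent.region_id
""").fetchall())
print(counts["subregion_1"])  # 2: offer 1 appears twice below it but is counted once
```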