Postgres: n:m intermediate table with type - postgresql

I have a table called "Tag" which consists of an Id, Name and Description column.
Now lets say I have the tables Character (C), Movie (M), Series (S) etc..
And I want to be able to tag entries in C, M, S with multiple tags and one tag may be used for multiple entries.
So I could realize it like this:
T -> TC <- C
T -> TM <- M
T -> TS <- S
Where TC, TM, TS are the intermediate tables.
I was wondering if I could combine TC, TM, TS into one table with a type column added and still use foreign keys.
As of yet I haven't found a way to do it.
Or is this something I shouldn't be doing?

As the comments above suggested you can't combine multiple table into a single one. If you want to have a single view of the "tag relationships" you can pull the needed information into a View. This way, you only need to write a longer query once and are able to use like a single table. Keep in mind that you can't insert data into a view (there are possibilities to do so, but they are a little advanced)

Related

Postgres: Query Values in nested jsonb-structure with unknown keys

I am quite new in working with psql.
Goal is to get values from a nested jsonb-structure where the last key has so many different characteristics it is not possible to query them explicitely.
The jsonb-structure in any row is as follows:
TABLE_Products
{'products':[{'product1':['TYPE'], 'product2':['TYPE2','TYPE3'], 'productN':['TYPE_N']}]}
I want to get the values (TYPE1, etc.) assigned to each product-key (product1, etc.). The product-keys are the unknown, because of too many different names.
My work so far achieves to pull out a tuple for each key:value-pair on the last level. To illustrate this here you can see my code and the results from the previously described structure.
My Code:
select url, jsonb_each(pro)
from (
select id , jsonb_array_elements(data #> '{products}') as pro
from TABLE_Products
where data is not null
) z
My result:
("product2","[""TYPE2""]")
("product2","[""TYPE3""]")
My questions:
Is there a way to split this tuple on two columns?
Or how can I query the values kind of 'unsupervised', so without knowing the exact names of 'product1 ... n'

Reference foreign keys using SSIS-Lookup

I am asking for help on the following topic. I am trying to create an ETL process with two Excel data sources (S1 ~300 rows and S2 ~7000 rows). S1 contains project information and employee details and S2 contains the amount of hours, which each employee worked in which project at a timestamp.
I want to insert the amount of hours, which each employee worked in each project at a timestamp, into the fact table by referencing to the existing primary keys in the dimension tables. If an entry is not present in the dimension tables already, i want to add a new entry first and use the newly generated id. The destination table structure looks as follows (Data Warehouse, Star Schema):Destination Table Structure
In SSIS, i created three Data Flow tasks for filling the Dimension Tables (project, employee and time) with distinct values (using group by, as S1 and S2 contain a lot of duplicate rows)first, and a fourth data flow task (see image below) to insert the FactTable data, and this is where I'm running into problems:
Data Flow Task FactTable
I am using three LookUp functions to retrieve the foreignKeys project_id, employee_id and time_id from the Dimension tables (using project name, employee number and timestamp). If the id is found, it is passed on all the way to Merge Join 1, if not, a new Dimension Entry is created (lets say project) and the generated project_id passed on instead. Same goes for employee and time respectively.
There is two issues with this:
1) The "amount of hours" (passed by Multicast four, see image above) is not matched in the final result (No Match)
2) The amount of rows being inserted keeps increasing forever (Endless Join, I belive due to the Merge joins).
What I've tried:
I have used one UNION instead of three Merge Joins before, but this resulted in the foreign keys being in seperate rows each, instead of merged together.
I used Merge (instead of Merge Join) and combined the join as well as sort conditions in as I fell all possible ways.
I understand that this scenario might be confusing for everybody else, but thank your for taking time looking at it! Any help is greatly appreciated.
Solved it
For anybody having similar issues:
Seperate Data Flows for filling Dimension Tables with those filling Fact Tables will do the trick.
Its a clean solution and easier to debug.
Also: Dont run the LookUp Functions in parallel, but rather one after each other and pass on the attributes. Saves unnecessary Merges as well.
So as a Sum Up:
Four Data Flow Tasks, three for filling dimension tables ONLY and one for filling fact tables ONLY.
Loading Multiple Tables using SSIS keeping foreign key relationships
The answer posted by onupdatecascade is basically it.
Good luck!

what's the utility of array type?

I'm totally newbie with postgresql but I have a good experience with mysql. I was reading the documentation and I've discovered that postgresql has an array type. I'm quite confused since I can't understand in which context this type can be useful within a rdbms. Why would I have to choose this type instead of using a classical one to many relationship?
Thanks in advance.
I've used them to make working with trees (such as comment threads) easier. You can store the path from the tree's root to a single node in an array, each number in the array is the branch number for that node. Then, you can do things like this:
SELECT id, content
FROM nodes
WHERE tree = X
ORDER BY path -- The array is here.
PostgreSQL will compare arrays element by element in the natural fashion so ORDER BY path will dump the tree in a sensible linear display order; then, you check the length of path to figure out a node's depth and that gives you the indentation to get the rendering right.
The above approach gets you from the database to the rendered page with one pass through the data.
PostgreSQL also has geometric types, simple key/value types, and supports the construction of various other composite types.
Usually it is better to use traditional association tables but there's nothing wrong with having more tools in your toolbox.
One SO user is using it for what appears to be machine-aided translation. The comments to a follow-up question might be helpful in understanding his approach.
I've been using them successfully to aggregate recursive tree references using triggers.
For instance, suppose you've a tree of categories, and you want to find products in any of categories (1,2,3) or any of their subcategories.
One way to do it is to use an ugly with recursive statement. Doing so will output a plan stuffed with merge/hash joins on entire tables and an occasional materialize.
with recursive categories as (
select id
from categories
where id in (1,2,3)
union all
...
)
select products.*
from products
join product2category on...
join categories on ...
group by products.id, ...
order by ... limit 10;
Another is to pre-aggregate the needed data:
categories (
id int,
parents int[] -- (array_agg(parent_id) from parents) || id
)
products (
id int,
categories int[] -- array_agg(category_id) from product2category
)
index on categories using gin (parents)
index on products using gin (categories)
select products.*
from products
where categories && array(
select id from categories where parents && array[1,2,3]
)
order by ... limit 10;
One issue with the above approach is that row estimates for the && operator are junk. (The selectivity is a stub function that has yet to be written, and results in something like 1/200 rows irrespective of the values in your aggregates.) Put another way, you may very well end up with an index scan where a seq scan would be correct.
To work around it, I increased the statistics on the gin-indexed column and I periodically look into pg_stats to extract more appropriate stats. When a cursory look at those stats reveal that using && for the specified values will return an incorrect plan, I rewrite applicable occurrences of && with arrayoverlap() (the latter has a stub selectivity of 1/3), e.g.:
select products.*
from products
where arrayoverlap(cat_id, array(
select id from categories where arrayoverlap(parents, array[1,2,3])
))
order by ... limit 10;
(The same goes for the <# operator...)

Better performance for SQLite Select Statement

I'm developing an Iphone App where the user types in any string into a searchbar and presses the search button. After that a result list should appear.
In my SQLite I have four columns a, b, c, d. Let's say they have the following Values:
Dataset 1:
a: code1
b: report1
c: description1_1
d: description1_2
Dataset 2:
a: code2
b: report2
c: description2_1
d: description2_2
So if the user enters a value of: "1_1" then the first dataset will be selected because of clumn c.
If the user enters a value of: "report" then the first and second dataset will be selected.
As I'm using a database with nearly 60.000 Datasets searching for a part-string is really killing the performance.
Setting an index at all 4 columns will make the size of the SQLite database much too huge.
So I didn't use an index at all.
My Select Statement looks like this:
NSString *sql = [NSString stringWithFormat:#"SELECT * FROM scode WHERE a LIKE '%#%#%#' OR c LIKE '%#%#%#' OR d LIKE '%#%#%#'", wildcard, searchBar.text, wildcard, wildcard, searchBar.text, wildcard, wildcard, searchBar.text, wildcard, wildcard, searchBar.text, wildcard];
Is there any good way to enhance the performance of searching for a part-string in all columns?
Thank you and kind regards,
Daniel
You're after Full Text Searching, which SQLite doesn't natively support. I don't have any experience with 3rd party support, but based on results there are a few options.
You answered your own question: Do the index on all four columns. And measure the size difference. Considering the storage capacity of the iPhone, you're probably out of balance trying to reduce storage.
The rule of thumb with SQLite performance is not to doa query that isn't indexed.
You can see what SQLite is actually doing by creating your database on the Mac using the same schema and EXPLAIN QUERY PLAN. (There's also EXPLAIN, which is more detailed but less obvious.)
You can create a separate table, with two columns: a pattern string and a key value (which is used to refer to your data tables). Lets call this table "search_index".
Then, on any change to your data table entries, you update the "search_index" table:
remove rows with keys of changed data table rows
for each column in data table, use the first X characters of the data, and add them to search_index with the key
You can work out the details yourself, but in this way, you just build your own (partial) search index.
When querying, you can use up to X characters to search in the search_index table alone. If the user types more than X characters you at least have a limited set of data table rows to search in. So you can search those 60k rows easily.
Find a good value for X to balance storage requirements and usability and performance.
EDIT: Looks like you do not want to search only the beginning of the words? Well, then you should not just use the "first X characters", but you should split the data into single words, and use the full words in search_index. Though in practice you will still have around a fourth of the index storage requirements compared to giving all columns an index. So, its still a good thing to build your own "search_index".

Relational Data to Flat File

I hope you can help find an answer to a problem that will become a recurring theme at work. This involves denormalising data from RDBMS tables to flat file formats with repeating groups (sharing domain and meaning) across columns. Unfortunately this is unavoidable.
Here's a very simplified example of the transformation I'd require:
TABLE A TABLE B
------------------- 1 -> MANY ----------------------------
A_KEY FIELD_A B_KEY A_KEY FIELD_B
A_KEY_01 A_VALUE_01 B_KEY_01 A_KEY_01 B_VALUE_01
A_KEY_02 A_VALUE_02 B_KEY_02 A_KEY_01 B_VALUE_02
B_KEY_03 A_KEY_02 B_VALUE_03
This will become:
A_KEY FIELD_A B_KEY1 FIELD_B1 B_KEY2 FIELD_B2
A_KEY_01 A_VALUE_01 B_KEY_01 B_VALUE_01 B_KEY_02 B_VALUE_02
A_KEY_02 A_VALUE_02 B_KEY_03 B_VALUE_03
Each entry from TABLE A will have one row in the output flat file with one column per related field from TABLE B. Columns in the output file can have empty values for fields obtained from TABLE B.
I realise this will create an extremely wide file, but this is a requirement. I've had a look at MapForce and Apatar, but I think this problem is too bizarre or I can't use them correctly.
My question: is there already a tool that will accomplish this or should I develop one from scratch (I don't want to reinvent the wheel)?
I'm pretty sure you can't solve this in plain SQL, but depending on your RDBMS, it may be possible to create a stored procedure or some such thing. Otherwise it's a fairly easy thing to do in a scripting language. Which technology are you using?
Does this help?
using-pivot-in-sql-server-2008
Thanks for all your help. As it turns out the relationship is ONE -> MAX of 3 and this constraint will not change as the data is now static so the following run-of-the-mill SQL works:
select A.A_KEY, A.FIELD_A, B.B_KEY, B.FIELD_B, B2.B_KEY, B2.FIELD_B, B3.B_KEY,
B3.FIELD_B
from
A left join B on (A.A_KEY = B.A_KEY)
left join B B2 on (A.A_KEY = B2.A_KEY and B2.B_KEY != B.B_KEY)
left join B B3 on (A.A_KEY = B3.A_KEY and B3.B_KEY != B.B_KEY
and B3.B_KEY != B2.B_KEY)
group by A.A_KEY
order by A.A_KEY