PostgreSQL non-uniformly distributed partitions

Let's say I have a table containing people and their favourite fruit.
Name    Favourite fruit
John    Apple
James   Orange
Robert  Banana
Now let's say I partition the table by favourite fruit, but it turns out that 90% of people's favourite fruit is apple, so I end up with a non-uniform access pattern and partitions of very different sizes.
Can I somehow make Postgres partition further, so that I end up with tables of roughly the same size, each still containing only users who like the same fruit?

Related

Aggregating logs label’s values in one line

I’m trying to create a table with an aggregation of values in the same field but without any calculation function.
I have a Loki LogQL query that transforms to a table with three labels. I want to do a “group by” with two of the labels, and the third one will be aggregated to have all the values in the same line together. For example:
The log lines from the query are:
Apple Buy 20
Apple Buy 20
Apple Sell 45
Apple Sell 45
Banana Buy 30
Banana Buy 30
Banana Buy 20
Banana Buy 20
Banana Sell 45
Banana Sell 45
And after transformations (Labels to fields, Filter by name, Group by - on all three labels, Organize fields), the table looks like this:
Table first example
I couldn't add a picture yet
And I want it to become like this:
Table after wanted change
So for Type Banana with Process Buy, all the values are aggregated together like in a list or vector (We can have the values ordered, but it’s not necessary, the Value is a string of a number).
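Since the pictures are missing, the target shape can be pinned down with a small sketch. This is plain Python, not a Grafana transformation, and it only illustrates the grouping described above. It assumes duplicate values collapse (because the first table was already grouped on all three labels) and that values keep their first-seen order; the function name `aggregate_values` is made up for illustration.

```python
from collections import defaultdict

# Log lines copied from the question: Type, Process, Value.
log_lines = [
    "Apple Buy 20", "Apple Buy 20",
    "Apple Sell 45", "Apple Sell 45",
    "Banana Buy 30", "Banana Buy 30",
    "Banana Buy 20", "Banana Buy 20",
    "Banana Sell 45", "Banana Sell 45",
]

def aggregate_values(lines):
    """Group by (Type, Process) and collect the distinct Values in first-seen order."""
    grouped = defaultdict(list)
    for line in lines:
        type_, process, value = line.split()
        # Assumption: repeated values are listed once per group.
        if value not in grouped[(type_, process)]:
            grouped[(type_, process)].append(value)
    return dict(grouped)

result = aggregate_values(log_lines)
for (type_, process), values in result.items():
    # Values stay strings, as in the question; they are joined into one cell.
    print(type_, process, ", ".join(values))
```

Under these assumptions the Banana/Buy row carries both values (30 and 20) in a single cell, while every other group has one value.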
I have been trying to get from the first table to the second but have run into difficulties.
Any help would be much appreciated.
I have also posted this in the Grafana community and am asking here as well, in the hope of a wider reach and finding an answer.

Extracting single words from entries with multiple words from a column in Postgresql

I have a table with multiple columns, one of which is named categories. Each entry in that column consists of multiple words, and each row has a different number of words. How can I create another table with one column in which each entry is a single word from the categories column of the previous table?
The table I have and the column I would like to target:
categories
Golf, Active Life
Specialty Food, Restaurants, Dim Sum, Imported Food, Food, Chinese, Ethnic Food, Seafood
I would like another table with one column:
categories
Golf
Active Life
Specialty Food
Restaurants
Dim Sum
...
EDIT: For anyone in the future, I needed a permanent table. Check the comments for full answer.
This is easy to implement in PostgreSQL: use the string_to_array and unnest functions, as below:
select
  unnest(string_to_array(categories, ', ')) as word  -- split on comma + space so words carry no leading blank
into my_new_table
from
  my_table;  -- my_table stands in for your table name
For example:
select unnest(string_to_array('Specialty Food, Restaurants, Dim Sum, Imported Food, Food, Chinese, Ethnic Food', ', '));
unnest
----------------
Specialty Food
Restaurants
Dim Sum
Imported Food
Food
Chinese
Ethnic Food
(7 rows)
Hope it works for you.
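The same split-and-flatten step can be sketched outside the database. The snippet below is only an illustration in Python of what string_to_array plus unnest do to the two sample rows from the question; it is not part of the original answer, and the name `split_categories` is hypothetical. It strips whitespace around each word, so it also tolerates inconsistent spacing after the commas.

```python
# The two sample entries from the categories column in the question.
rows = [
    "Golf, Active Life",
    "Specialty Food, Restaurants, Dim Sum, Imported Food, Food, Chinese, Ethnic Food, Seafood",
]

def split_categories(rows):
    """Split each comma-separated entry into words and strip surrounding whitespace."""
    return [
        word.strip()
        for row in rows
        for word in row.split(",")
        if word.strip()  # skip empty pieces such as a trailing comma would produce
    ]

words = split_categories(rows)
for word in words:
    print(word)
```

Each printed line corresponds to one row of the single-column table the question asks for.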

Is this table in first normal form?

I am currently studying SQL normal forms.
Let's say I have the following table, where the primary key is userid:
userid FirstName LastName Phone
1 John Smith 555-555
1 Tim Jack 432-213
2 Sarah Mit 454-541
3 Tom jones 987-125
The book I'm reading states the following conditions must be true in order for a table to be in 1st normal form.
1. Rows contain data about an entity.
2. Columns contain data about attributes of the entities.
3. All entries in a column are of the same kind.
4. Each column has a unique name.
5. Cells of the table hold a single value.
6. The order of the columns is unimportant.
7. The order of the rows is unimportant.
8. No two rows may be identical.
9. A primary key must be assigned.
I'm not sure whether my table violates the 8th rule, "No two rows may be identical".
The first two records in my table
1 John Smith 555-555
1 Tim Jack 432-213
share the same userid. Does that mean that they are considered duplicate rows?
Or do duplicate records mean that every piece of data in the row has to be the same for the record to be considered a duplicate row, as in the example below?
1 John Smith 555-555
1 John Smith 555-555
EDIT1: Sorry for the confusion
The question I was trying to ask is simple
Is this table below in 1st normal form?
userid FirstName LastName Phone
1 John Smith 555-555
1 Tim Jack 432-213
2 Sarah Mit 454-541
3 Tom jones 987-125
Based on the 9 rules given in the textbook I think it is, but I wasn't sure whether rule 8, "No two rows may be identical", was being violated because two records use the same primary key.
The class textbook and professor aren't really that clear on this subject, which is why I am asking this question.
"Or do duplicate records mean that every piece of data in the row has to be the same for the record to be considered a duplicate row, as in the example below?"
They mean the latter of your choices. Entire rows are what must be "identical". It's ok if two rows share the same values for one or more columns as long as one or more columns differ.
That's because a relation holds a set of values that are tuples/rows/records, and a set is a collection of values that are all different.
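A minimal sketch of that set view, using the rows from the question as Python tuples. The helper names `has_identical_rows` and `key_is_unique` are made up for illustration. The second check is a separate matter from rule 8: it only tests whether the column proposed as the key has repeated values.

```python
# The four rows from the question, one tuple per row.
rows = [
    (1, "John", "Smith", "555-555"),
    (1, "Tim", "Jack", "432-213"),
    (2, "Sarah", "Mit", "454-541"),
    (3, "Tom", "jones", "987-125"),
]

def has_identical_rows(rows):
    """True if at least two rows agree in every column (a set would merge them)."""
    return len(set(rows)) != len(rows)

def key_is_unique(rows, key_index=0):
    """True if no value repeats in the column at key_index."""
    keys = [row[key_index] for row in rows]
    return len(set(keys)) == len(keys)

# Sharing a userid does not make two rows identical.
print(has_identical_rows(rows))
# Appending a full copy of the first row does create identical rows.
print(has_identical_rows(rows + [rows[0]]))
# Separate check: does userid repeat? A repeating column cannot act as a key.
print(key_is_unique(rows))
```

The first two rows differ in three columns, so they are distinct elements of the set even though they share a userid.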
But SQL & some relational algebras have different notions of "identical" in the case of NULLs compared to the relational model without NULLs. You should read what your textbook says about it if you want to know exactly what they mean by it. For example, a standard SQL UNIQUE constraint considers two rows that have NULL in the same column to be different, whereas SELECT DISTINCT treats them as duplicates. (Point 9 might be summarizing something involving NULLs. Depends on the explanation in the book.)
PS
There's no single notion of what a relation is. There is no single notion of "identical". There is no single notion of 1NF.
Points 3-8 are better described as (poor) ways of restricting how to interpret a picture of a table to get a relation. Your textbook seems to be (strangely) making "1NF" a property of such an interpretation of a picture of a table. Normally we simply define a relation to be a certain thing, so if you have one then it has to have the defined properties. Then "in 1NF" applies to a relation & either means "is a relation" & isn't further used, or it means certain further restrictions hold. A relation is a set of tuples/rows/records. In the kind of relation your points 3-8 describe, they are sets of attribute/column/field name-value pairs, & the values paired with a name have to be of the type paired with that name in some schema/heading, which is a set of name-type pairs defined either as part of the relation or external to it.
Your textbook doesn't seem to present things clearly. Its definition of "1NF" is also idiosyncratic in that although 3-8 are mathematical, 1 & 2 are informal/heuristic (& 9 could be either or both).

Dimension Table Usage when we have a loaded fact table

I am new to data warehousing. If all the foreign key data is copied into the fact table, why do we still use dimension tables, given that all the data is already present in the fact table? Can someone please guide me?
Short answer: a typical dimension has more attributes than just a key. Your fact table has a foreign key to a dimension where additional info is available and where even grouping is possible.
Recommended reading: "The Data Warehouse Toolkit" by Ralph Kimball
A fact table should only store 1) the business metric that it models (e.g. a sales order/transaction, or some other business transaction that you are measuring); 2) foreign keys to the related dimensions.
A dimension table should only store the context/qualitative data that is necessary to understand your business transactions (your facts).
Let's say, for example, that you are modelling sales on retail stores; a very simplified dimensional model for this would be something like:
Store Dimension: name, street address, city, county, etc
Product Dimension: name, brand, description, sku, etc
Date Dimension: year, month, day, etc
Sales Fact table: fkStore, fkProduct, fkDate, unitsSold, salesAmount
So, the fact table only holds the metrics/measures and foreign keys, but business users need to use the information stored in dimensions to be able to explore the facts. That's how you enable them to explore unitsSold or salesAmount according to a specific product, or on a specific store/location, or on a specific date.
The fact table by itself only provides quantitative data ("amount sold") while the dimensions provide the context that a business user needs to interpret that metric ("amount of product X sold in store Y in 2017").
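As a rough sketch of that idea, the model above can be tried out with an in-memory SQLite database from Python. SQLite is used here only so the example is self-contained; the table and column names (store_dim, sales_fact, fk_store and so on) and all sample rows are invented for illustration and are not from the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions: descriptive context, one row per store / product / date.
CREATE TABLE store_dim   (store_id   INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, name TEXT, brand TEXT);
CREATE TABLE date_dim    (date_id    INTEGER PRIMARY KEY, year INTEGER, month INTEGER, day INTEGER);

-- Fact: only foreign keys plus the measures.
CREATE TABLE sales_fact (
    fk_store     INTEGER REFERENCES store_dim(store_id),
    fk_product   INTEGER REFERENCES product_dim(product_id),
    fk_date      INTEGER REFERENCES date_dim(date_id),
    units_sold   INTEGER,
    sales_amount REAL
);

-- Made-up sample data.
INSERT INTO store_dim   VALUES (1, 'Store Y', 'Springfield');
INSERT INTO product_dim VALUES (1, 'Product X', 'Acme'), (2, 'Product Z', 'Acme');
INSERT INTO date_dim    VALUES (1, 2017, 3, 14), (2, 2018, 1, 5);
INSERT INTO sales_fact  VALUES (1, 1, 1, 4, 40.0), (1, 1, 1, 6, 60.0),
                               (1, 2, 1, 1, 5.0),  (1, 1, 2, 2, 20.0);
""")

# "Amount of each product sold per store in 2017": the names and the year
# come from the dimensions, the summed measure comes from the fact table.
query = """
SELECT p.name, s.name, d.year, SUM(f.sales_amount)
FROM sales_fact f
JOIN product_dim p ON p.product_id = f.fk_product
JOIN store_dim   s ON s.store_id   = f.fk_store
JOIN date_dim    d ON d.date_id    = f.fk_date
WHERE d.year = 2017
GROUP BY p.name, s.name, d.year
ORDER BY p.name
"""
totals = conn.execute(query).fetchall()
for row in totals:
    print(row)
```

Without the dimension tables the fact rows would only be bare key numbers and amounts; the joins are what turn them into readable product, store and year labels.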
The decision on what falls into dimension or fact data is not clear-cut in many cases. Typically, data that is reusable (that is, meaningful in relation to other fact data) can be considered dimension data.
Fact data is often the most changeable over time. Fact tables contain the history of these records changing over time, such as daily sales numbers, nightly end-of-day results, etc.
These are often of numeric type, i.e. quantitative measures. Data warehouse analysis then consists of bucketing (summing/grouping) these numerics so they carry the narrative of a trend over time at varying levels of granularity.
Dimension data, by contrast, is of a more 'static' nature, like trade, customer and product details.
I recommend reading:
https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/

Design Database schema for billing system

I want to design a database for a billing system. In one bill a customer might have purchased multiple different items; for example, for bill ID 1 the customer purchased 2 apples, 3 bananas and 1 watermelon. I want to know how I can normalize this database.
This is a pretty standard, basic normalization exercise with a pretty standard solution. The usual approach is to have an orders table containing order ID, customer ID, order date &c., and an order_items table with a record for each line item on the order.
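A minimal sketch of that two-table layout, run against an in-memory SQLite database from Python so it is self-contained. The column names, the customer ID 42 and the order date are assumptions made for illustration; only the three line items come from the question. A fuller design would probably also have customers and items tables that these rows reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One row per bill/order.
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    order_date  TEXT    NOT NULL
);

-- One row per line item on an order.
CREATE TABLE order_items (
    order_id INTEGER NOT NULL REFERENCES orders(order_id),
    item     TEXT    NOT NULL,
    quantity INTEGER NOT NULL,
    PRIMARY KEY (order_id, item)
);

-- Bill 1 from the question; customer ID and date are invented.
INSERT INTO orders VALUES (1, 42, '2024-01-01');
INSERT INTO order_items VALUES (1, 'apple', 2), (1, 'banana', 3), (1, 'watermelon', 1);
""")

# All line items of bill 1.
lines = conn.execute(
    "SELECT item, quantity FROM order_items WHERE order_id = ? ORDER BY item",
    (1,),
).fetchall()
for item, quantity in lines:
    print(item, quantity)
```

The composite primary key (order_id, item) keeps each item to one line per bill, so buying more of the same item means updating the quantity rather than adding a second row.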