PostgreSQL: Insert into new table N times based on column value from origin table

I have been tasked with remodelling a PostgreSQL database, and I am a bit stumped.
I have a table that looks similar to this:
| id | item_name | quantity | user_id |
| -- | --------- | -------- | ------- |
| 1 | item1 | 1 | 1 |
| 2 | item2 | 3 | 6 |
| 4 | item1 | 4 | 3 |
| 3 | item3 | 5 | 6 |
What I need to do is insert these rows into another table, expanded so that each row appears once per unit of quantity.
Needed table structure based on above values:
| id | item_name | user_id |
| -- | --------- | ------- |
| 1 | item1 | 1 |
| 2 | item2 | 6 |
| 3 | item2 | 6 |
| 4 | item2 | 6 |
| 5 | item1 | 3 |
| 6 | item1 | 3 |
| 7 | item1 | 3 |
| 8 | item1 | 3 |
| 9 | item3 | 6 |
| 10 | item3 | 6 |
| 11 | item3 | 6 |
| 12 | item3 | 6 |
| 13 | item3 | 6 |
I could do this pretty easily with a Python script, but it is required to be done in Postgres, and I lack the experience for this specific scenario.

Use the generate_series() function to repeat each row a number of times taken from a column value.
Signature: generate_series(start, stop, step)
-- PostgreSQL
select t.id, t.item_name, t.user_id
from (select *, generate_series(1, quantity, 1) as no_of_times
      from test) t;
See this fiddle: https://dbfiddle.uk/?rdbms=postgres_11&fiddle=351d8b5828ecd1050b78ac516329816b
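To actually populate the new table end to end, here is a minimal sketch. The target table name target_items is an assumption for illustration; the source table test matches the fiddle above.

-- Hypothetical target table; adjust names and types to your schema.
create table target_items (
    id        serial primary key,
    item_name text,
    user_id   int
);

-- generate_series(1, quantity) is a set-returning function, so each
-- source row is expanded into quantity copies before the insert.
insert into target_items (item_name, user_id)
select item_name, user_id
from (
    select item_name, user_id, generate_series(1, quantity) as n
    from test
) t;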

Related

Facet a multi-value (MVA) type field in Sphinx

I have executed the below query in Sphinx:
select MVA_FIELD from mySphinxIndex facet MVA_FIELD order by count(*) desc;
What I got looks like this:
+----------------------------+----------+
| MVA_FIELD | count(*) |
+----------------------------+----------+
| | 664 |
| 0 | 536 |
| 13 | 439 |
| 4,13 | 8 |
| 19,13 | 8 |
| 18,13,20 | 8 |
| 8,17,18 | 8 |
| 8,18,13 | 8 |
| 8,15,18 | 8 |
| 8,13,20 | 7 |
| 17,13 | 7 |
| 18,19,20 | 7 |
| 8,17 | 7 |
| 13,17,19 | 7 |
| 11,6 | 7 |
| 6,11,13 | 7 |
| 15,18 | 7 |
| 11,13,20 | 7 |
| 11,13,17 | 7 |
| 6,18,19 | 6 |
| 7,20 | 6 |
| 8,11,13 | 6 |
| 13,17,20 | 6 |
I want to get the count of each id in MVA_FIELD. For example, I just want the count of 0, 4, 13, ... for each id separately. How can I achieve this?
Honestly, I don't know how to do it with the FACET sugar, but with a normal GROUP BY query you would just use the GROUPBY() function when grouping by an MVA attribute:
SELECT GROUPBY() AS value, COUNT(*) FROM mySphinxIndex GROUP BY MVA_FIELD ORDER BY COUNT(*) DESC;
From the docs:
A special GROUPBY() function is also supported. It returns the GROUP BY key. That is particularly useful when grouping by an MVA value, in order to pick the specific value that was used to create the current group.

Create a PostgreSQL function that becomes a formula field of a table retrieving related data from other table

The example below can be done on SQL Server. It is a function that performs a calculation on another table, using the current table's Id field to look up related data and return a single value.
Question: how can the exact same thing be done in PostgreSQL?
SELECT TOP(5) * FROM Artists;
+------------+------------------+--------------+-------------+
| ArtistId | ArtistName | ActiveFrom | CountryId |
|------------+------------------+--------------+-------------|
| 1 | Iron Maiden | 1975-12-25 | 3 |
| 2 | AC/DC | 1973-01-11 | 2 |
| 3 | Allan Holdsworth | 1969-01-01 | 3 |
| 4 | Buddy Rich | 1919-01-01 | 6 |
| 5 | Devin Townsend | 1993-01-01 | 8 |
+------------+------------------+--------------+-------------+
SELECT TOP(5) * FROM Albums;
+-----------+------------------------+---------------+------------+-----------+
| AlbumId | AlbumName | ReleaseDate | ArtistId | GenreId |
|-----------+------------------------+---------------+------------+-----------|
| 1 | Powerslave | 1984-09-03 | 1 | 1 |
| 2 | Powerage | 1978-05-05 | 2 | 1 |
| 3 | Singing Down the Lane | 1956-01-01 | 6 | 3 |
| 4 | Ziltoid the Omniscient | 2007-05-21 | 5 | 1 |
| 5 | Casualties of Cool | 2014-05-14 | 5 | 1 |
+-----------+------------------------+---------------+------------+-----------+
The function
CREATE FUNCTION [dbo].[ufn_AlbumCount] (@ArtistId int)
RETURNS smallint
AS
BEGIN
    DECLARE @AlbumCount int;
    SELECT @AlbumCount = COUNT(AlbumId)
    FROM Albums
    WHERE ArtistId = @ArtistId;
    RETURN @AlbumCount;
END;
GO
Now (in SQL Server), after extending the first table with ALTER TABLE Artists ADD AlbumCount AS dbo.ufn_AlbumCount(ArtistId); we can list it and get the following result.
+------------+------------------+--------------+-------------+--------------+
| ArtistId | ArtistName | ActiveFrom | CountryId | AlbumCount |
|------------+------------------+--------------+-------------+--------------|
| 1 | Iron Maiden | 1975-12-25 | 3 | 5 |
| 2 | AC/DC | 1973-01-11 | 2 | 3 |
| 3 | Allan Holdsworth | 1969-01-01 | 3 | 2 |
| 4 | Buddy Rich | 1919-01-01 | 6 | 1 |
| 5 | Devin Townsend | 1993-01-01 | 8 | 3 |
| 6 | Jim Reeves | 1948-01-01 | 6 | 1 |
| 7 | Tom Jones | 1963-01-01 | 4 | 3 |
| 8 | Maroon 5 | 1994-01-01 | 6 | 0 |
| 9 | The Script | 2001-01-01 | 5 | 1 |
| 10 | Lit | 1988-06-26 | 6 | 0 |
+------------+------------------+--------------+-------------+--------------+
But how can this be achieved in PostgreSQL?
Postgres doesn't support "virtual" computed columns (i.e. computed columns that are generated at runtime), so there is no exact equivalent. The most efficient solution is a view that does the counting:
create view artists_with_counts
as
select a.*,
       coalesce(t.album_count, 0) as album_count
from artists a
  left join (
    select artist_id, count(*) as album_count
    from albums
    group by artist_id
  ) t on a.artist_id = t.artist_id;
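Querying the view then works like querying a plain table. A usage sketch (the column artist_name is assumed, following the snake_case naming used in the view's code):

select artist_name, album_count
from artists_with_counts
order by album_count desc;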
Another option is to create a function that can be used as a "virtual column" in a select. But as this is evaluated row by row, it will be substantially slower than the view.
create function album_count(p_artist artists)
returns bigint
as
$$
select count(*)
from albums a
where a.artist_id = p_artist.artist_id;
$$
language sql
stable;
Then you can include this as a column:
select a.*, a.album_count
from artists a;
Using the function like that requires prefixing the function reference with the table alias (alternatively, you can call it as album_count(a)).
Online example

Is there a V-lookup effect in Microsoft Access?

I am a novice, teaching myself Microsoft Access.
I have an MS Access database with a table of students (Table1).
Table1
+----+-----------+----------+------------+------------+
| id | firstname | lastname | Year_Group | Form_Group |
+----+-----------+----------+------------+------------+
| 2 | mnb | nbgfv | 7 | 1 |
| 3 | jhg | uhgf | 8 | 2 |
| 4 | poi | ijuy | 9 | 2 |
| 5 | tgf | tgfd | 10 | 2 |
| 6 | wer | qwes | 11 | 2 |
+----+-----------+----------+------------+------------+
Every day, each student's day is recorded, sort of like in Table2.
Table2
+----------+----+-----------+----------+------------+--------+-----------+----------+
| Date | id | firstname | lastname | Year_Group | Effort | Behaviour | Homework |
+----------+----+-----------+----------+------------+--------+-----------+----------+
| 28/02/19 | 2 | mnb | nbgfv | 7 | Good | Good | Y |
| 28/02/19 | 3 | jhg | uhgf | 8 | OK | OK | Y |
| 28/02/19 | 4 | poi | ijuy | 9 | Bad | Bad | N |
| 01/03/19 | 5 | tgf | tgfd | 10 | Good | OK | Y |
| 01/03/19 | 6 | wer | qwes | 11 | Good | Good | Y |
+----------+----+-----------+----------+------------+--------+-----------+----------+
Is there a way (when using a list box or combo box) to select a student from Table1 so that their information is used for the corresponding columns in Table2?
Or is there a more efficient way to do this?
Firstly, you should normalise your data.
Currently, you are repeating the firstname, lastname, and Year_Group data in two separate tables, which not only bloats your database, but also means that such data must be maintained in two separate places, potentially leading to inconsistencies and then uncertainty as to which is the master.
Instead, I would suggest that your Students table should contain all information pertaining to the characteristics of a student:
Students
+----+-----------+----------+------------+------------+
| id | firstname | lastname | Year_Group | Form_Group |
+----+-----------+----------+------------+------------+
| 2 | mnb | nbgfv | 7 | 1 |
| 3 | jhg | uhgf | 8 | 2 |
| 4 | poi | ijuy | 9 | 2 |
| 5 | tgf | tgfd | 10 | 2 |
| 6 | wer | qwes | 11 | 2 |
+----+-----------+----------+------------+------------+
And the information pertaining to each school day should only reference the student ID in the Students table:
SchoolDays
+----------+----+--------+-----------+----------+
| Date | id | Effort | Behaviour | Homework |
+----------+----+--------+-----------+----------+
| 28/02/19 | 2 | Good | Good | Y |
| 28/02/19 | 3 | OK | OK | Y |
| 28/02/19 | 4 | Bad | Bad | N |
| 01/03/19 | 5 | Good | OK | Y |
| 01/03/19 | 6 | Good | Good | Y |
+----------+----+--------+-----------+----------+
Then, if you want to display the data in its entirety, you would use a query which joins the two tables, e.g.:
select
    t2.date,
    t1.firstname,
    t1.lastname,
    t1.year_group,
    t2.effort,
    t2.behaviour,
    t2.homework
from
    students t1
    inner join schooldays t2 on t1.id = t2.id;
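Regarding the combo box part of the question: on a form bound to the SchoolDays table, the student combo box would typically get a SQL row source along these lines (a sketch; in the property sheet, set Bound Column to 1 and Column Count to 3 so the id is stored while the names are displayed):

SELECT id, firstname, lastname
FROM Students
ORDER BY lastname, firstname;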

count continuously postgresql data

I need help with counting some data.
This is what I want:
| user_id | action_id | count |
| ------- | --------- | ----- |
| 1       | 1         | 1     |
| 2       | 2         | 1     |
| 3       | 2         | 2     |
| 4       | 3         | 1     |
| 5       | 3         | 2     |
| 6       | 3         | 3     |
| 7       | 4         | 1     |
| 8       | 5         | 1     |
| 9       | 5         | 2     |
| 10      | 6         | 1     |
This is what I have:
| user_id | action_id | count |
| ------- | --------- | ----- |
| 1       | 1         | 1     |
| 2       | 2         | 1     |
| 3       | 2         | 1     |
| 4       | 3         | 1     |
| 5       | 3         | 1     |
| 6       | 3         | 1     |
| 7       | 4         | 1     |
| 8       | 5         | 1     |
| 9       | 5         | 1     |
| 10      | 6         | 1     |
I really need it to do some research about users' second actions.
How do I do it?
Thank you.
Using ROW_NUMBER should work here:
SELECT
    user_id,
    action_id,
    ROW_NUMBER() OVER (PARTITION BY action_id ORDER BY user_id) AS count
FROM yourTable
ORDER BY user_id;
Demo
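Since the stated goal is to research users' second actions, a follow-up sketch wraps the query above and keeps only each action's second row:

SELECT user_id, action_id
FROM (
    SELECT
        user_id,
        action_id,
        ROW_NUMBER() OVER (PARTITION BY action_id ORDER BY user_id) AS rn
    FROM yourTable
) t
WHERE rn = 2;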

Redshift Distribution By Child Columns

My Situation
I have some tables in my Redshift cluster that all break down into either an order_id, shipment_id, or shipment_item_id, depending on how granular the table is. order_id is one-to-many with shipment_id, and shipment_id is one-to-many with shipment_item_id.
My Question
I distribute on order_id, so all shipment_id and shipment_item_id records should be on the same nodes across the tables, since they are grouped by order_id. My question is: when I have to join on shipment_id or shipment_item_id, will Redshift know that the records are on the same nodes, or will it still broadcast the tables since they aren't joined on order_id?
Example Tables
unified_order
+----------+-------------+------------------+
| order_id | shipment_id | shipment_item_id |
+----------+-------------+------------------+
| 1        | 1           | 1                |
| 1        | 1           | 2                |
| 1        | 1           | 3                |
| 1        | 2           | 4                |
| 1        | 2           | 5                |
| 1        | 3           | 6                |
| 2        | 4           | 7                |
| 2        | 4           | 8                |
| 3        | 5           | 9                |
| 3        | 5           | 10               |
| 4        | 6           | 11               |
| 5        | 7           | 12               |
| 5        | 7           | 13               |
+----------+-------------+------------------+
shipment_details
+-------------+-----------+--------------+
| shipment_id | ship_day  | ship_details |
+-------------+-----------+--------------+
| 1           | 1/1/2017  | stuff        |
| 2           | 5/1/2017  | other stuff  |
| 3           | 6/14/2017 | more stuff   |
| 4           | 5/13/2017 | less stuff   |
| 5           | 6/19/2017 | that stuff   |
| 6           | 7/31/2017 | what stuff   |
| 7           | 2/5/2017  | things       |
+-------------+-----------+--------------+
Distribution
distribution_by_node
+------+----------+-------------+------------------+
| node | order_id | shipment_id | shipment_item_id |
+------+----------+-------------+------------------+
| 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 2 |
| 1 | 1 | 1 | 3 |
| 1 | 1 | 2 | 4 |
| 1 | 1 | 2 | 5 |
| 1 | 1 | 3 | 6 |
| 1 | 5 | 7 | 12 |
| 1 | 5 | 7 | 13 |
| 2 | 2 | 4 | 7 |
| 2 | 2 | 4 | 8 |
| 3 | 3 | 5 | 9 |
| 3 | 3 | 5 | 10 |
| 4 | 4 | 6 | 11 |
+------+----------+-------------+------------------+
The Amazon Redshift documentation does not go into detail about how information is shared between nodes, but it is doubtful that it simply "broadcasts the tables".
Rather, information is probably sent between nodes based on need: only the relevant columns would be shared, and possibly only sub-ranges of the data.
Rather than worrying too much about the internal implementation, you should test various DISTKEY and SORTKEY strategies against real queries to determine performance.
Follow the recommendations from Choose the Best Distribution Style to minimize the amount of data that needs to be sent between nodes and consult Amazon Redshift Best Practices for Designing Queries to improve queries.
You can EXPLAIN your query to see how data will be distributed (or not) during execution. This doc explains how to read the query plan:
Evaluating the Query Plan
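As a rough sketch using the example tables above, running EXPLAIN on the shipment_id join shows the chosen strategy; labels such as DS_DIST_NONE (rows already co-located) versus DS_BCAST_INNER or DS_DIST_BOTH (data broadcast or redistributed) appear on the join steps:

EXPLAIN
SELECT u.order_id, s.ship_day
FROM unified_order u
JOIN shipment_details s ON s.shipment_id = u.shipment_id;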