How to convert rows into columns while inserting into a temp table? - postgresql

I want to convert rows into columns while inserting into a temp table. I read somewhere about crosstab, but I'm not exactly sure how to use it for my purpose.
So suppose I first run a query like:
select cat_id, category_name from category_tab;

cat_id | category_name
111    | books
112    | pen
113    | copy
Now I want a temp table which should look like:

s_no | books | pen | copy
1    |       |     |
2    |       |     |
3    |       |     |

s_no will be a serial number, and the rows of the previous query become the columns of the new temp table.
How can I achieve this in Postgres?
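Since the column names have to come from the data, one option is dynamic SQL. A minimal sketch of that idea (the temp table name, the int column type, and the DO-block approach are my assumptions, not from the question):

-- build the temp table's column list from category_tab and create it with EXECUTE;
-- the int column type is assumed, adjust it to whatever the pivoted values should hold
do $$
declare
    cols text;
begin
    select string_agg(quote_ident(category_name) || ' int', ', ')
    into cols
    from category_tab;

    execute format('create temp table category_pivot (s_no serial, %s)', cols);
end $$;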

Related

How to dynamically transpose rows to columns in PostgreSQL

I have a table as in the following picture, and we need to take only one column out and create a new table with multiple columns (using the rows from column3), as below. PostgreSQL. [1]: https://i.stack.imgur.com/IUeTW.png
Please check the link for an image of the tables.
Result -
New_Table
Cat | Dog | Gold | Pen
Note: This needs to be dynamic, as the data may change (we should be able to add new rows to column3 and get them in New_Table as columns).
[1]: https://i.stack.imgur.com/gATsO.png
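With the tablefunc extension, crosstab() can do this kind of pivot, though the output column list still has to be declared, so a fully dynamic version needs dynamic SQL built around it. A rough sketch, using source_table and column3 as hypothetical names since the screenshots are not reproduced here:

-- pivot the distinct values of column3 into columns of the result
create extension if not exists tablefunc;

select *
from crosstab(
    $$ select 1 as row_id, column3, 1 from source_table order by 1, 2 $$,
    $$ select distinct column3 from source_table order by 1 $$
) as new_table (row_id int, "Cat" int, "Dog" int, "Gold" int, "Pen" int);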

How to efficiently join two huge tables by nearest timestamp?

I have two huge tables, A and B. Table A has around 500 million rows of time-series data. Table B has around 10 million rows of time-series data. To simplify, we can assume they consist of the following columns:
Table A
factory | machine | timestamp_1         | part | suplement
1       | 1       | 2022-01-01 23:54:01 | 1    | 1
1       | 1       | 2022-01-01 23:54:05 | 1    | 2
1       | 1       | 2022-01-01 23:54:10 | 1    | 3
...     | ...     | ...                 | ...  | ...
Table B
machine | timestamp_2         | measure
1       | 2022-01-01 23:54:00 | 0
1       | 2022-01-01 23:54:07 | 10
1       | 2022-01-01 23:54:08 | 0
...     | ...                 | ...
I want to create a table C that results from "joining" both tables by matching each value of timestamp_1 in table A to the nearest value of timestamp_2 in table B whose measure is 0, for the same factory and machine. I also only need this for the part = 1 rows of table A. For the small example above, the resulting table C would have the same number of rows as A and would look like:
Table C
machine | timestamp_1         | time_since_measure_0
1       | 2022-01-01 23:54:01 | 1
1       | 2022-01-01 23:54:05 | 5
1       | 2022-01-01 23:54:10 | 2
...     | ...                 | ...
Some things that are also important to consider are:
Table A has an index on columns (factory, machine, timestamp_1, part, suplement). That index is essential and works great for other queries not related to this one. Table B has indexes on columns (machine, timestamp_2, measure).
Table A is a compressed TimescaleDB table partitioned by (factory, timestamp_1). This is also because of other queries. Table B is a vanilla PostgreSQL table.
I used the following statement to create table C:
create table C (
    machine int4 not null,
    timestamp_1 timestamptz,
    time_since_measure_0 interval,
    constraint C primary key (machine, timestamp_1)
);
I then tried this code to select and insert data into table C:
insert into C (
    select
        machine,
        timestamp_1,
        timestamp_1 - (
            select timestamp_2
            from B
            where
                A.machine = B.machine
                and B.measure = 0
                and B.timestamp_2 <= A.timestamp_1
            order by B.timestamp_2 desc
            limit 1
        ) as "time_since_measure_0"
    from A
    where A.part = 1
);
However, this seems to take a very long time. I know I am dealing with very big tables, but is there something I am missing, or how could I optimize this?
Because we don't have access to your tables and you haven't posted a query plan, it's difficult to do more than make some general observations. The indexes you describe as being in place do not appear to be useful to this query. Looking at your query, it appears to me that you need to add the following indexes:
Table A
Index on (machine, timestamp_1)
Table B
Index on (machine, measure, timestamp_2)
Give that a shot and see what happens.
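Spelled out, that would be roughly the following (using the simplified table names from the question):

-- suggested supporting indexes
create index on A (machine, timestamp_1);
create index on B (machine, measure, timestamp_2);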
What you want is called an "as-of join": it joins each timestamp to the nearest value in the other table.
Some time-series databases, like ClickHouse, support this directly. This is the only way to make it fast. It is quite similar to a merge join, with a few modifications: the engine must scan both tables in timestamp order and join to the nearest-value row instead of the equal-value row.
I've looked into it briefly, and it doesn't look like TimescaleDB supports it, but this post shows a workaround using a lateral join and a covering index. This is likely to have similar performance to your query, because it will use a nested loop and an index-only scan to pick the nearest value for each row.
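The lateral-join shape would look roughly like this (a sketch of the general technique, not the linked post's exact code):

-- for each qualifying row of A, pick the latest measure-0 row of B
-- at or before timestamp_1 on the same machine
select a.machine,
       a.timestamp_1,
       a.timestamp_1 - b.timestamp_2 as time_since_measure_0
from A a
cross join lateral (
    select timestamp_2
    from B
    where B.machine = a.machine
      and B.measure = 0
      and B.timestamp_2 <= a.timestamp_1
    order by B.timestamp_2 desc
    limit 1
) b
where a.part = 1;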

Postgres column to have autoincrement feature based on values of another key column

I have a requirement where I need to design a table in Postgres where one of the columns needs to have an autoincrement feature, but the autoincrement should be based on the value of another column.
Table
Column A | Column B
100      | 1
101      | 1
102      | 1
102      | 2
102      | 3
Column A and Column B are the keys of the table. Now if I insert another row with Column A equal to 100, then Column B needs to auto-populate as 2. If I insert the value 102 into Column A, then Column B needs to populate on its own as 4.
Is there a way I can set an attribute on Column B during table creation?
Thanks
Sadiq
@AdrianKlaver is correct. You should use a timestamp and actually record when the version was saved. If you want the version number, you can then generate it with the window function row_number() when querying.
select column_a
     , row_number() over (partition by column_a
                          order by column_a, column_a_ver_ts) as column_b
from table_name;
Alternatively, you could use the above query and create a view. See Fiddle.
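A minimal sketch of that approach, using the answer's column names (the exact schema is my assumption): store a timestamp for each row and derive column_b on read.

-- column_a_ver_ts records when each version was saved;
-- column_b is computed per column_a partition at query time
create table table_name (
    column_a        int not null,
    column_a_ver_ts timestamptz not null default now(),
    primary key (column_a, column_a_ver_ts)
);

create view table_name_versions as
select column_a
     , row_number() over (partition by column_a
                          order by column_a_ver_ts) as column_b
from table_name;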

Convert certain columns that are recurring to rows in postgres

I have a table in Postgres that has ~50 million rows. I need to convert certain columns to rows.
I need to unpivot the columns that repeat per individual into a single set of individual columns, repeating the non-individual variables against the respective ID.
The following is the output I need:
I'd appreciate any help on this.
50 million rows is not a big deal in Greenplum, but returning that many rows to a client is kind of pointless. I'm guessing you want to create a new table for this output. You are also going to be creating a table that is 2x larger, because you are taking a single row and turning it into two.
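-- note: "distributed by (id)" below is Greenplum syntax;
-- omit that clause on vanilla PostgreSQL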
create table new_table as
select id, mid_1 as mid, name_1 as name, age_1 as age, location
from your_table
union all
select id, mid_2 as mid, name_2 as name, age_2 as age, location
from your_table
distributed by (id);

SQL (Redshift) to get the intersect of multiple tables

I'm using Redshift and have 6 tables of IDs. I want to get the intersection between each pair of tables.
So my final output would look something like this:
Table 1 & Table 2 have 10% common IDs
Table 1 & Table 3 have 50% common IDs
.....
.....
Table 6 & Table 4 have 20% common IDs
Table 6 & Table 5 have 3% common IDs
I can easily get the data, but it would mean a lot of repetition of the same SQL, so I've tried to create some tables of all the IDs and the tables they appear in, but I'm stuck on how to get the data in one or two SQL statements.
Any ideas welcome!
You could try to full join all of these tables by ID in a subquery and then use conditional aggregation, so that "Table 1 & Table 2 have 10% common IDs" would be expressed as
100.0 * sum(case when id1 is not null and id2 is not null then 1 end) / count(id1)
(taking Table 1's row count as the denominator).
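A minimal sketch of that idea, assuming hypothetical tables t1 through t6 that each have an id column (only the pairs involving Table 1 are shown; the remaining pairs follow the same pattern):

-- full join the id lists once, then compute each pairwise overlap
-- with conditional aggregation
select
    100.0 * sum(case when id1 is not null and id2 is not null then 1 end) / count(id1) as pct_t1_t2,
    100.0 * sum(case when id1 is not null and id3 is not null then 1 end) / count(id1) as pct_t1_t3
from (
    select t1.id as id1, t2.id as id2, t3.id as id3
    from t1
    full join t2 on t2.id = t1.id
    full join t3 on t3.id = coalesce(t1.id, t2.id)
) ids;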