I have a table with a timestamp column,
and I want to partition it into 2-second windows (I already know how to do it per minute and per second).
Here is an example of the table:
create table data_t(id integer, time_t timestamp without time zone, data_t integer );
insert into data_t(id,time_t,data_t) values(1,'1999-01-08 04:05:06',248),
(2,'1999-01-08 04:05:06.03',45),
(3,'1999-01-08 04:05:06.035',98),
(4,'1999-01-08 04:05:06.9',57),
(5,'1999-01-08 04:05:07',86),
(6,'1999-01-08 04:05:08',84),
(7,'1999-01-08 04:05:08.5',832),
(8,'1999-01-08 04:05:08.7',86),
(9,'1999-01-08 04:05:08.9',863),
(10,'1999-01-08 04:05:09',866),
(11,'1999-01-08 04:05:10',862),
(12,'1999-01-08 04:05:10.5',863),
(13,'1999-01-08 04:05:10.55',826),
(14,'1999-01-08 04:05:11',816),
(15,'1999-01-08 04:05:11.7',186),
(16,'1999-01-08 04:05:12',862),
(17,'1999-01-08 04:05:12.5',826)
;
with t as (
    select id,
           time_t,
           date_trunc('second', time_t) as time_t_1,  -- drop fractional seconds
           data_t
    from data_t
), t1 as (
    select *,
           extract(hour from time_t_1) as h,
           extract(minute from time_t_1) as m,
           extract(second from time_t_1) as s
    from t
)
select *,
       -- order by the full timestamp so numbering within a second is deterministic
       row_number() over (partition by h, m, s order by time_t) as t_sequence
from t1;
the output of this is:
| id | time_t | time_t_1 | data_t | h | m | s | t_sequence |
|----|--------------------------|----------------------|--------|---|---|----|------------|
| 1 | 1999-01-08T04:05:06Z | 1999-01-08T04:05:06Z | 248 | 4 | 5 | 6 | 1 |
| 2 | 1999-01-08T04:05:06.03Z | 1999-01-08T04:05:06Z | 45 | 4 | 5 | 6 | 2 |
| 3 | 1999-01-08T04:05:06.035Z | 1999-01-08T04:05:06Z | 98 | 4 | 5 | 6 | 3 |
| 4 | 1999-01-08T04:05:06.9Z | 1999-01-08T04:05:06Z | 57 | 4 | 5 | 6 | 4 |
| 5 | 1999-01-08T04:05:07Z | 1999-01-08T04:05:07Z | 86 | 4 | 5 | 7 | 1 |
| 6 | 1999-01-08T04:05:08Z | 1999-01-08T04:05:08Z | 84 | 4 | 5 | 8 | 1 |
| 7 | 1999-01-08T04:05:08.5Z | 1999-01-08T04:05:08Z | 832 | 4 | 5 | 8 | 2 |
| 8 | 1999-01-08T04:05:08.7Z | 1999-01-08T04:05:08Z | 86 | 4 | 5 | 8 | 3 |
| 9 | 1999-01-08T04:05:08.9Z | 1999-01-08T04:05:08Z | 863 | 4 | 5 | 8 | 4 |
| 10 | 1999-01-08T04:05:09Z | 1999-01-08T04:05:09Z | 866 | 4 | 5 | 9 | 1 |
| 11 | 1999-01-08T04:05:10Z | 1999-01-08T04:05:10Z | 862 | 4 | 5 | 10 | 1 |
| 12 | 1999-01-08T04:05:10.5Z | 1999-01-08T04:05:10Z | 863 | 4 | 5 | 10 | 2 |
| 13 | 1999-01-08T04:05:10.55Z | 1999-01-08T04:05:10Z | 826 | 4 | 5 | 10 | 3 |
| 14 | 1999-01-08T04:05:11Z | 1999-01-08T04:05:11Z | 816 | 4 | 5 | 11 | 1 |
| 15 | 1999-01-08T04:05:11.7Z | 1999-01-08T04:05:11Z | 186 | 4 | 5 | 11 | 2 |
| 16 | 1999-01-08T04:05:12Z | 1999-01-08T04:05:12Z | 862 | 4 | 5 | 12 | 1 |
| 17 | 1999-01-08T04:05:12.5Z | 1999-01-08T04:05:12Z | 826 | 4 | 5 | 12 | 2 |
As you can see, t_sequence starts over every second, but I want it to start over every 2 seconds.
Is there a way to do it?
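One possible approach, sketched here on the assumption that the database is PostgreSQL (which the date_trunc and extract calls suggest): convert the timestamp to epoch seconds, divide by 2, and take the floor, so each pair of consecutive seconds shares one bucket value. The column name bucket_2s is made up for illustration.

```sql
-- Sketch only: assumes PostgreSQL and the data_t table defined above.
select id,
       time_t,
       data_t,
       -- hypothetical helper column: same value for every row in a 2-second window
       floor(extract(epoch from time_t) / 2) as bucket_2s,
       row_number() over (
           partition by floor(extract(epoch from time_t) / 2)
           order by time_t, id
       ) as t_sequence
from data_t
order by time_t, id;
```

Buckets start on even epoch seconds, so with the sample data the rows from 04:05:06 through 04:05:07.x should share one window, the rows from 04:05:08 through 04:05:09.x the next, and so on. Changing the divisor gives other window sizes.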
For a given product, I need to select the first 2 rows of each store whose store_name differs from a given value.
id | store_name | prod_name
----+------------+------
1 | 1 | A
2 | 1 | B
3 | 1 | C
4 | 1 | A
5 | 2 | E
6 | 2 | A
7 | 3 | G
8 | 2 | A
9 | 1 | A
10 | 3 | A
(10 rows)
For store_name <> 3 AND prod_name = 'A', the result should be:
id | store_name | prod_name
----+------------+------
1 | 1 | A
4 | 1 | A
6 | 2 | A
8 | 2 | A
Use the row_number() window function to accomplish this.
with first_two as (
select *,
row_number() over (partition by store_name
order by id) as rn
from store_product
where store_name <> 3
and prod_name = 'A'
)
select id, store_name, prod_name
from first_two
where rn <= 2;
| id | store_name | prod_name |
| --- | ---------- | --------- |
| 1 | 1 | A |
| 4 | 1 | A |
| 6 | 2 | A |
| 8 | 2 | A |
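To reproduce the result, a setup along these lines should work. The table name comes from the query; the column types are assumptions inferred from the sample data (store_name is compared with the number 3, so it is taken to be an integer).

```sql
-- Hypothetical setup: column types are assumed, not given in the question.
create table store_product (
    id         integer primary key,
    store_name integer,
    prod_name  text
);

-- The ten sample rows from the question.
insert into store_product (id, store_name, prod_name) values
    (1, 1, 'A'), (2, 1, 'B'), (3, 1, 'C'), (4, 1, 'A'), (5, 2, 'E'),
    (6, 2, 'A'), (7, 3, 'G'), (8, 2, 'A'), (9, 1, 'A'), (10, 3, 'A');
```

Row 9 also matches the filter, but it is the third 'A' row for store 1, so rn <= 2 drops it.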
My Situation
I have some tables in my Redshift cluster that all break down to an order_id, shipment_id, or shipment_item_id, depending on how granular the table is. order_id has a one-to-many relationship with shipment_id, and shipment_id has a one-to-many relationship with shipment_item_id.
My Question
I distribute on order_id, so all shipment_id and shipment_item_id records should be on the same nodes across the tables, since they are grouped by order_id. When I join on shipment_id or shipment_item_id, will Redshift know that the records are on the same nodes, or will it still broadcast the tables because they aren't joined on order_id?
Example Tables
unified_order shipment_details
+----------+-------------+------------------+ +-------------+-----------+--------------+
| order_id | shipment_id | shipment_item_id | | shipment_id | ship_day | ship_details |
+----------+-------------+------------------+ +-------------+-----------+--------------+
| 1 | 1 | 1 | | 1 | 1/1/2017 | stuff |
| 1 | 1 | 2 | | 2 | 5/1/2017 | other stuff |
| 1 | 1 | 3 | | 3 | 6/14/2017 | more stuff |
| 1 | 2 | 4 | | 4 | 5/13/2017 | less stuff |
| 1 | 2 | 5 | | 5 | 6/19/2017 | that stuff |
| 1 | 3 | 6 | | 6 | 7/31/2017 | what stuff |
| 2 | 4 | 7 | | 7 | 2/5/2017 | things |
| 2 | 4 | 8 | +-------------+-----------+--------------+
| 3 | 5 | 9 |
| 3 | 5 | 10 |
| 4 | 6 | 11 |
| 5 | 7 | 12 |
| 5 | 7 | 13 |
+----------+-------------+------------------+
Distribution
distribution_by_node
+------+----------+-------------+------------------+
| node | order_id | shipment_id | shipment_item_id |
+------+----------+-------------+------------------+
| 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 2 |
| 1 | 1 | 1 | 3 |
| 1 | 1 | 2 | 4 |
| 1 | 1 | 2 | 5 |
| 1 | 1 | 3 | 6 |
| 1 | 5 | 7 | 12 |
| 1 | 5 | 7 | 13 |
| 2 | 2 | 4 | 7 |
| 2 | 2 | 4 | 8 |
| 3 | 3 | 5 | 9 |
| 3 | 3 | 5 | 10 |
| 4 | 4 | 6 | 11 |
+------+----------+-------------+------------------+
The Amazon Redshift documentation does not go into much detail about how data moves between nodes. Redshift only treats rows as co-located when the join is on the distribution key, so a join on shipment_id alone may still redistribute or broadcast one of the tables.
Even then, data is probably sent between nodes based on need: only the relevant columns would be shared, and possibly only sub-ranges of the data.
Rather than worrying too much about the internal implementation, you should test various DISTKEY and SORTKEY strategies against real queries to determine performance.
Follow the recommendations from Choose the Best Distribution Style to minimize the amount of data that needs to be sent between nodes and consult Amazon Redshift Best Practices for Designing Queries to improve queries.
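As one candidate to test, the two example tables could be declared roughly as follows. This is a sketch, not a recommendation: the column types are assumptions, and using DISTSTYLE ALL for shipment_details only makes sense if that table is small enough to copy to every node.

```sql
-- Sketch of one DISTKEY/SORTKEY strategy; column types are assumed.
create table unified_order (
    order_id         bigint not null,
    shipment_id      bigint not null,
    shipment_item_id bigint not null
)
diststyle key
distkey (order_id)
sortkey (order_id, shipment_id);

-- A full copy on every node means a join to this table needs no redistribution.
create table shipment_details (
    shipment_id  bigint not null,
    ship_day     date,
    ship_details varchar(256)
)
diststyle all
sortkey (shipment_id);
```

Load representative data into each variant and compare query plans and run times before settling on one.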
You can EXPLAIN your query to see how data will be distributed (or not) during the execution. In this doc you'll see how to read the query plan:
Evaluating the Query Plan
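For the tables in the question, a check could look like this (a sketch; the selected columns are chosen only for illustration):

```sql
-- Show the plan without running the query.
explain
select o.order_id, o.shipment_id, d.ship_day
from unified_order o
join shipment_details d on d.shipment_id = o.shipment_id;
```

Each join step in the plan carries a distribution label. DS_DIST_NONE means no rows had to move, DS_BCAST_INNER means the inner table was broadcast to every node, and DS_DIST_INNER or DS_DIST_BOTH mean one or both tables were redistributed for the join.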
I have a table Members(id, name, parent_id), where parent_id is the member's parent (itself a member, which can have its own parent). For example:
id | name | parent_id
----------------------
1 | John | NULL
2 | Smith| 1
3 | Andy | 1
4 | Joe | 2
5 | Rick | 2
6 | Craig| 5
7 | Greg | NULL
8 | Bob | 5
9 | Mike | 8
I'd like to select from members and get:
id | name | parent_id | root_parent_id
--------------------------------------
1 | John | NULL | NULL
2 | Smith| 1 | 1
3 | Andy | 1 | 1
4 | Joe | 2 | 1
5 | Rick | 2 | 1
6 | Craig| 5 | 1
7 | Greg | NULL | NULL
8 | Bob | 5 | 1
9 | Mike | 8 | 1
I want to find the root_parent_id for every member, however deep the hierarchy goes. Please help.
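with recursive recursive_members as (
    -- anchor: every member starts as its own root
    select *, id as root_id, 1 as depth
    from members
    union all
    -- step: replace the current root with its parent, one level per iteration
    select r.id, r.name, r.parent_id, m.parent_id, r.depth + 1
    from recursive_members r
    join members m on r.root_id = m.id
    where m.parent_id is not null
)
-- keep the deepest row per member, which holds the topmost ancestor
select distinct on (id) *
from recursive_members
order by id, depth desc;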
id | name | parent_id | root_id | depth
----+-------+-----------+---------+-------
1 | John | | 1 | 1
2 | Smith | 1 | 1 | 2
3 | Andy | 1 | 1 | 2
4 | Joe | 2 | 1 | 3
5 | Rick | 2 | 1 | 3
6 | Craig | 5 | 1 | 4
7 | Greg | | 7 | 1
8 | Bob | 5 | 1 | 4
9 | Mike | 8 | 1 | 5
(9 rows)
Read about recursive WITH queries.
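The query above reports a top-level member as its own root (root_id 1 for John, 7 for Greg), whereas the question asks for NULL there. A possible variant, assuming PostgreSQL and the same members table, wraps the result in nullif:

```sql
with recursive recursive_members as (
    select id, name, parent_id, id as root_id, 1 as depth
    from members
    union all
    select r.id, r.name, r.parent_id, m.parent_id, r.depth + 1
    from recursive_members r
    join members m on r.root_id = m.id
    where m.parent_id is not null
)
select distinct on (id)
       id,
       name,
       parent_id,
       -- a member that is its own root has no root parent, so report NULL
       nullif(root_id, id) as root_parent_id
from recursive_members
order by id, depth desc;
```

nullif returns NULL only when root_id equals id, which happens only for members without a parent, so every other row keeps its topmost ancestor.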
I would like to use the current row number of my Org table in cell calculations, either relative to the table as a whole or relative to an hline.
If I have the following table:
|---+---+---|
| x | y | z |
|---+---+---|
| 2 | 4 | 8 |
| 2 | 4 | 8 |
| 2 | 4 | 8 |
| 2 | 4 | 8 |
| 2 | 4 | 8 |
| 2 | 4 | 8 |
| 2 | 4 | 8 |
| 2 | 4 | 8 |
| 2 | 4 | 8 |
|---+---+---|
#+TBLFM: @II..@III$1=2::$2=4::$3=$1*$2
How do I change it so that each cell in the y column equals its table row number, as shown when grid mode is turned on in Org? The resulting table would look like:
|---+----+----|
| x | y | z |
|---+----+----|
| 2 | 2 | 4 |
| 2 | 3 | 6 |
| 2 | 4 | 8 |
| 2 | 5 | 10 |
| 2 | 6 | 12 |
| 2 | 7 | 14 |
| 2 | 8 | 16 |
| 2 | 9 | 18 |
| 2 | 10 | 20 |
|---+----+----|
(defmath passIndex (x)
x
)
Number rows:
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
#+TBLFM: $1=passIndex(@#)
Number columns:
| 1 | 2 | 3 | 4 | 5 |
#+TBLFM: @1=passIndex($#)
Number rows with header row:
| header |
|--------|
| 2 |
| 3 |
| 4 |
| 5 |
#+TBLFM: $1=passIndex(@#)