How to increase value when source value is changed?
I have tried rank, dense_rank, row_number without success =(
id | src | how to get this?
--------
1 | 1 | 1
2 | 1 | 1
3 | 7 | 2
4 | 1 | 3
5 | 3 | 4
6 | 3 | 4
7 | 1 | 5
Note: src is guaranteed to appear in the order shown.
Is there a simple way to do this?
You can achieve this by nesting two window functions - the first to get whether the src value changed from the previous row, the second to sum the number of changes. Unfortunately Postgres doesn't allow nesting window functions directly, but you can work around that with a subquery:
SELECT
    id,
    src,
    sum(incr) OVER (ORDER BY id)
FROM (
    SELECT
        *,
        (lag(src) OVER (ORDER BY id) IS DISTINCT FROM src)::int AS incr
    FROM example
) AS _;
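To make the two steps visible, here is a minimal sketch that inlines the question's sample rows as a CTE (so no real table is needed) and keeps the intermediate incr column in the output. The CTE name flagged and the output alias grp are illustrative names, not part of the original answer.

```sql
-- Sample rows from the question, inlined so the query runs without a table.
WITH example (id, src) AS (
    VALUES (1, 1), (2, 1), (3, 7), (4, 1), (5, 3), (6, 3), (7, 1)
),
flagged AS (
    SELECT
        id,
        src,
        -- 1 when src differs from the previous row (or there is no
        -- previous row, because lag() then returns NULL), otherwise 0
        (lag(src) OVER (ORDER BY id) IS DISTINCT FROM src)::int AS incr
    FROM example
)
SELECT
    id,
    src,
    incr,
    -- running total of the change flags = group number
    sum(incr) OVER (ORDER BY id) AS grp
FROM flagged
ORDER BY id;
```

For the sample data, incr comes out as 1, 0, 1, 1, 1, 0, 1 and grp as 1, 1, 2, 3, 4, 4, 5, which is the column asked for in the question.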
If I have a table:
id | status
----+--------
2 | 200
1 | 0
4 | 100
3 | 200
5 | 200
I want to count the number of occurrences of each status. I have tried to use the COUNT/OVER function
SELECT status, COUNT(*) OVER () AS all, COUNT(*) OVER (PARTITION by status) as count FROM my_table;
The results are what is expected per the Postgres docs on window functions: "However, window functions do not cause rows to become grouped into a single output row like non-window aggregate calls would. Instead, the rows retain their separate identities."
status | all | count
--------+-------+-------
0 | 5 | 1
100 | 5 | 1
200 | 5 | 3
200 | 5 | 3
200 | 5 | 3
How can I instead get an output that combines the rows, so that I get only one row per unique status, if the partition is required?
status | all | count
--------+-------+-------
0 | 5 | 1
100 | 5 | 1
200 | 5 | 3
No window function is necessary in the first stage of the query, i.e. getting the counts per status. Window functions operate on the result of the grouped (non-window) part of the query, so a window function can refer to both the aggregates and the grouping columns of that query. To get all_count, it is sufficient to SUM the per-status counts over all the rows.
SELECT
    status
  , COUNT(*) status_count
  , SUM(COUNT(*)) OVER () all_count
FROM my_table
GROUP BY status;
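If the PARTITION BY form from the question really has to stay, one alternative (not part of the answer above) is to keep the window functions and collapse the duplicate rows with DISTINCT, which PostgreSQL applies after the window functions have been evaluated. A minimal sketch, with the question's rows inlined as a CTE so it runs on its own:

```sql
-- Sample rows from the question, inlined so the query runs without a table.
WITH my_table (id, status) AS (
    VALUES (2, 200), (1, 0), (4, 100), (3, 200), (5, 200)
)
SELECT DISTINCT
    status,
    COUNT(*) OVER ()                    AS "all",   -- total row count
    COUNT(*) OVER (PARTITION BY status) AS count    -- rows per status
FROM my_table
ORDER BY status;
```

This returns the three rows (0, 5, 1), (100, 5, 1) and (200, 5, 3). The GROUP BY version is usually the simpler choice, since it avoids computing the window values for every input row before discarding the duplicates.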
I have a table called example that looks as follows:
ID | MIN | MAX |
1 | 1 | 5 |
2 | 34 | 38 |
I need to take each ID and loop from its min to its max, incrementing by 2, and get the following in a single SELECT, without using INSERT statements:
ID | INDEX | VALUE
1 | 1 | 1
1 | 2 | 3
1 | 3 | 5
2 | 1 | 34
2 | 2 | 36
2 | 3 | 38
Any ideas of how to do this?
The set-returning function generate_series does exactly that:
SELECT
    id,
    generate_series(1, (max-min)/2+1) AS index,
    generate_series(min, max, 2) AS value
FROM
    example;
The index can alternatively be generated with RANK() (see also @a_horse_with_no_name's answer) if you don't want to rely on the two set-returning functions being evaluated in parallel.
Use generate_series() to generate the numbers and a window function to calculate the index:
select e.id,
row_number() over (partition by e.id order by g.value) as index,
g.value
from example e
cross join generate_series(e.min, e.max, 2) as g(value);
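A third variant, sketched here under the assumption of PostgreSQL 9.4 or later, lets the set-returning function number its own rows with WITH ORDINALITY, so neither a parallel series nor a window function is needed. The sample rows are inlined as a CTE, and the alias idx is an illustrative name.

```sql
-- Sample rows from the question, inlined so the query runs without a table.
WITH example (id, min, max) AS (
    VALUES (1, 1, 5), (2, 34, 38)
)
SELECT
    e.id,
    g.idx   AS index,   -- 1-based position within each generated series
    g.value
FROM example e
CROSS JOIN LATERAL
    generate_series(e.min, e.max, 2) WITH ORDINALITY AS g(value, idx)
ORDER BY e.id, g.idx;
```

For the sample data this yields (1, 1, 1), (1, 2, 3), (1, 3, 5), (2, 1, 34), (2, 2, 36), (2, 3, 38).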
I have a PostgreSQL database (technically Greenplum) with data on individuals over time. The table has three fields: user_id, monthly_date, and account_value. When I run a query, I have to download the results from a remote server, so bandwidth is an issue. Since the user_id field is a very long string (around 50 characters), I'd like to return a numerical value that corresponds 1:1 with each value of user_id, since this will take up less space.
For example, the database might have sample data like this:
63a9364385350b13473279 Jan-2000
63a9364385350b13473279 Feb-2000
2066937e2887w206010393 Apr-2001
036686037e507d01764237 Mar-2003
036686037e507d01764237 Jun-2003
036686037e507d01764237 Jul-2003
036686037e507d01764237 Dec-2003
90829x098327549n286418 Apr-2004
90829x098327549n286418 Sep-2004
67518x834512306933u500 Nov-2000
and I'm trying to work out a query using ROW_NUMBER() and window clauses like PARTITION BY to get results like this:
1 Jan-2000
1 Feb-2000
2 Apr-2001
3 Mar-2003
3 Jun-2003
3 Jul-2003
3 Dec-2003
4 Apr-2004
4 Sep-2004
5 Nov-2000
I know these aren't actual database formats, but I'm just using them as example data. Is this possible? I don't care (although it would be nice and very neat to see) if, for example, 63a9364385350b13473279 maps to 1 in one query and 2 in the next, but in any given query, 63a9364385350b13473279 should always map to the same value regardless of date. The mapped numbers don't need to be in sequence or have any meaningful value besides being unique.
If you just need a unique number, this will do the trick:
SELECT
id,
split_part(t.d, '-', 2),
row_number() OVER all_window - row_number() OVER group_window AS a_unique_number_by_id
FROM (
VALUES
('63a9364385350b13473279','Jan-2000'),
('63a9364385350b13473279','Feb-2000'),
('2066937e2887w206010393','Apr-2001'),
('036686037e507d01764237','Mar-2003'),
('036686037e507d01764237','Jun-2003'),
('036686037e507d01764237','Jul-2003'),
('036686037e507d01764237','Dec-2003'),
('90829x098327549n286418','Apr-2004'),
('90829x098327549n286418','Sep-2004'),
('67518x834512306933u500','Nov-2000')
) as t(id, d)
WINDOW group_window AS (
PARTITION BY id
ORDER BY split_part(t.d, '-', 2)
), all_window AS (
ORDER BY split_part(t.d, '-', 2)
);
Here is the result:
id | split_part | a_unique_number_by_id
------------------------+------------+-----------------------
63a9364385350b13473279 | 2000 | 0
63a9364385350b13473279 | 2000 | 0
67518x834512306933u500 | 2000 | 2
2066937e2887w206010393 | 2001 | 3
036686037e507d01764237 | 2003 | 4
036686037e507d01764237 | 2003 | 4
036686037e507d01764237 | 2003 | 4
036686037e507d01764237 | 2003 | 4
90829x098327549n286418 | 2004 | 8
90829x098327549n286418 | 2004 | 8
(10 rows)
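This relies on rows with the same id being adjacent, and numbered in the same order, in both windows, which the sample data happens to satisfy but the query does not guarantee. You should also re-order the result by another column if you need to keep the original ordering.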
I think you are looking for dense_rank().
create table sample_data
(userid varchar(50) not null,
monthly_date date not null)
distributed by (userid);
insert into sample_data (userid, monthly_date) values
('63a9364385350b13473279','2000-01-01'),
('63a9364385350b13473279','2000-02-01'),
('2066937e2887w206010393','2001-04-01'),
('036686037e507d01764237','2003-03-01'),
('036686037e507d01764237','2003-06-01'),
('036686037e507d01764237','2003-07-01'),
('036686037e507d01764237','2003-12-01'),
('90829x098327549n286418','2004-04-01'),
('90829x098327549n286418','2004-09-01'),
('67518x834512306933u500','2000-11-01');
select dense_rank() over(order by userid) as new_userid, userid, monthly_date
from sample_data
order by 2;
new_userid | userid | monthly_date
------------+------------------------+--------------
1 | 036686037e507d01764237 | 2003-06-01
1 | 036686037e507d01764237 | 2003-07-01
1 | 036686037e507d01764237 | 2003-12-01
1 | 036686037e507d01764237 | 2003-03-01
2 | 2066937e2887w206010393 | 2001-04-01
3 | 63a9364385350b13473279 | 2000-02-01
3 | 63a9364385350b13473279 | 2000-01-01
4 | 67518x834512306933u500 | 2000-11-01
5 | 90829x098327549n286418 | 2004-09-01
5 | 90829x098327549n286418 | 2004-04-01
(10 rows)
Try the script below:
create table test_schema.source_data (id varchar(50), dt varchar(50));
insert into test_schema.source_data
values ('63a9364385350b13473279','Jan-2000'),
('63a9364385350b13473279','Feb-2000'),
('2066937e2887w206010393','Apr-2001'),
('036686037e507d01764237','Mar-2003'),
('036686037e507d01764237','Jun-2003'),
('036686037e507d01764237','Jul-2003'),
('036686037e507d01764237','Dec-2003'),
('90829x098327549n286418','Apr-2004'),
('90829x098327549n286418','Sep-2004'),
('67518x834512306933u500','Nov-2000');
create temporary table id_mapping
as
select t1.id, row_number() over(order by t1.id) rownum
from (
SELECT distinct id
FROM test_schema.source_data
) t1;
select t1.id, t1.dt, t2.rownum
from
test_schema.source_data t1
join id_mapping t2
on t1.id = t2.id;
And here is the result:
id | dt | rownum
------------------------+----------+--------
036686037e507d01764237 | Dec-2003 | 1
036686037e507d01764237 | Jul-2003 | 1
036686037e507d01764237 | Jun-2003 | 1
036686037e507d01764237 | Mar-2003 | 1
2066937e2887w206010393 | Apr-2001 | 2
63a9364385350b13473279 | Feb-2000 | 3
63a9364385350b13473279 | Jan-2000 | 3
67518x834512306933u500 | Nov-2000 | 4
90829x098327549n286418 | Sep-2004 | 5
90829x098327549n286418 | Apr-2004 | 5
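The same mapping can probably be done in one statement, without a temporary table, by moving the lookup into a CTE and returning only the short number instead of the long id. This is a sketch under the assumption that CTEs are available on the server in question. Only a few of the sample rows are inlined, and the aliases s, m and d are illustrative.

```sql
-- A few of the sample rows, inlined so the query runs without a table.
WITH source_data (id, dt) AS (
    VALUES
        ('63a9364385350b13473279', 'Jan-2000'),
        ('63a9364385350b13473279', 'Feb-2000'),
        ('2066937e2887w206010393', 'Apr-2001'),
        ('036686037e507d01764237', 'Mar-2003')
),
id_mapping AS (
    -- one row per distinct id, numbered 1..n
    SELECT d.id, row_number() OVER (ORDER BY d.id) AS rownum
    FROM (SELECT DISTINCT id FROM source_data) AS d
)
SELECT
    m.rownum,   -- short numeric stand-in for the long id
    s.dt
FROM source_data s
JOIN id_mapping m ON m.id = s.id
ORDER BY m.rownum;
```

Because the long id is left out of the select list, only the number and the date travel over the network, which is what the question was after.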
I'm trying to figure out how to show distinct records in groups in Crystal Reports. The view I wrote returns something like this:
Field 1 | Field 2 | Field 3
----------------------------------
10 | 111 | Record Info 1
10 | 111 | Record Info 1
10 | 222 | Record Info 2
20 | 111 | Record Info 1
20 | 222 | Record Info 2
The report groups are based on Field 1, and I want distinct values of Fields 2 and 3 for each group:
Field 1 | Field 2 | Field 3
----------------------------------
10 | 111 | Record Info 1
10 | 222 | Record Info 2
20 | 111 | Record Info 1
20 | 222 | Record Info 2
Fields 2 and 3 are always the same; Field 1 acts as an FK reference to any entries in the view. Selecting distinct xxx in the view isn't really viable due to the huge number of columns being brought in.
Can this be done in CR?
Cheers
Create a group for field1, field2
Hide Details area, field1 group area header and field1 group footer
Drop all the columns you want to show in the field2 group area header/footer.
Good luck!
You might also consider using Database | Select Distinct Records.