First column value from multiple rows in postgres - postgresql

I am using postgres 11, and have below table
+----------+-------------------+-----------+-----------------------------+
| username | filters | flag | time |
+----------+-------------------+-----------+-----------------------------+
| user1 | filter001 | 0 |2022-06-16 05:35:19.593000 |
| user1 | filter001 | 0 |2022-06-16 05:35:19.603000 |
| user1 | filter001 | 1 |2022-06-16 05:35:19.753000 |
| user1 | filter001 | 1 |2022-06-16 05:35:19.763000 |
| user1 | filter001 | 1 |2022-06-16 05:35:19.773000 |
| user1 | filter001 | 0 |2022-06-16 05:35:19.793000 |
| user1 | filter002 | 1 |2022-06-16 05:35:19.793000 |
| user1 | filter002 | 1 |2022-06-16 05:35:19.813000 |
| user1 | filter002 | 0 |2022-06-16 05:35:19.823000 |
| user1 | filter002 | 0 |2022-06-16 05:35:19.833000 |
| user1 | filter002 | 1 |2022-06-16 05:35:19.843000 |
| user1 | filter002 | 1 |2022-06-16 05:35:19.853000 |
| user1 | filter003 | 1 |2022-06-16 05:35:19.863000 |
| user1 | filter003 | 0 |2022-06-16 05:35:19.873000 |
| user1 | filter003 | 0 |2022-06-16 05:35:19.883000 |
| user1 | filter003 | 0 |2022-06-16 05:35:19.893000 |
| user1 | filter003 | 1 |2022-06-16 05:35:19.903000 |
| user1 | filter003 | 1 |2022-06-16 05:35:19.913000 |
| user1 | filter003 | 0 |2022-06-16 05:35:19.923000 |
| user1 | filter004 | 0 |2022-06-16 05:35:19.933000 |
| user1 | filter004 | 1 |2022-06-16 05:35:19.943000 |
| user1 | filter004 | 0 |2022-06-16 05:35:19.953000 |
| user1 | filter004 | 0 |2022-06-16 05:35:19.963000 |
| user1 | filter004 | 0 |2022-06-16 05:35:19.973000 |
+----------+-------------------+-----------+-----------------------------+
I'm trying to get below result from distinct filter value, along with count calculation
+-----------+----------+---------+---------------+----------------------------+
| filters | total_0 | total_1 | total_0_and_1 | time |
+-----------+----------+---------+---------------+----------------------------+
| filter001 | 3 | 3 | 6 |2022-06-16 05:35:19.593000 |
| filter002 | 2 | 4 | 6 |2022-06-16 05:35:19.793000 |
| filter003 | 4 | 3 | 7 |2022-06-16 05:35:19.863000 |
| filter004 | 4 | 1 | 5 |2022-06-16 05:35:19.933000 |
+-----------+----------+---------+---------------+----------------------------+
I tried the query below, which gives the desired result except for the time values: I am unable to attach a time to each unique filter. The time value should be taken from the first record of each filter.
Is there a way to optimize the query and include the time of each filter's first record?
select filters,
       count(flag) filter (where flag = 0) as total_0,
       count(flag) filter (where flag = 1) as total_1,
       count(time) as total_0_and_1
from My_Table
where username = 'user1'
group by filters;

We can use COUNT as an analytic function along with DISTINCT ON here:
WITH cte AS (
    SELECT COUNT(flag) FILTER (WHERE flag = 0) OVER (PARTITION BY filters) AS total_0,
           COUNT(flag) FILTER (WHERE flag = 1) OVER (PARTITION BY filters) AS total_1,
           COUNT(time) OVER (PARTITION BY filters) AS total_0_and_1,
           filters, time
    FROM My_Table
    WHERE username = 'user1'
)
SELECT DISTINCT ON (filters) filters, total_0, total_1, total_0_and_1, time
FROM cte
ORDER BY filters, time;
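Alternatively, a plain GROUP BY with MIN(time) produces the same result in one pass. A runnable sketch using Python's sqlite3 as a stand-in for Postgres (sample rows abbreviated from the question; SUM(CASE ...) replaces the Postgres FILTER clause, which older SQLite builds lack):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE My_Table (username TEXT, filters TEXT, flag INTEGER, time TEXT)")
conn.executemany(
    "INSERT INTO My_Table VALUES (?, ?, ?, ?)",
    [("user1", "filter001", 0, "2022-06-16 05:35:19.593000"),
     ("user1", "filter001", 1, "2022-06-16 05:35:19.753000"),
     ("user1", "filter002", 1, "2022-06-16 05:35:19.793000"),
     ("user1", "filter002", 0, "2022-06-16 05:35:19.823000")],
)

# GROUP BY collapses each filter to one row; MIN(time) picks the
# earliest (first) timestamp within each filter.
query = """
SELECT filters,
       SUM(CASE WHEN flag = 0 THEN 1 ELSE 0 END) AS total_0,
       SUM(CASE WHEN flag = 1 THEN 1 ELSE 0 END) AS total_1,
       COUNT(*) AS total_0_and_1,
       MIN(time) AS time
FROM My_Table
WHERE username = 'user1'
GROUP BY filters
ORDER BY filters
"""
for row in conn.execute(query):
    print(row)
```

This avoids the window-function pass plus the DISTINCT ON step entirely when only the first timestamp per filter is needed.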

Related

tableau calculate cumulative value with condition

I have a tableau table with columns like this:
| ID | ww | count_flag |
| 1 | ww1 | 0 |
| 1 | ww2 | 1 |
| 1 | ww3 | 1 |
| 1 | ww4 | 0 |
| 1 | ww5 | 1 |
| 2 | ww1 | 1 |
| 2 | ww2 | 1 |
| 2 | ww3 | 1 |
| 2 | ww4 | 0 |
| 2 | ww5 | 1 |
...
Now I'd like to add a new column showing a consistent status for each ID across all the ww (workweek) values. The consistent status resets every time count_flag is 0 or the ID changes, so it should look like this:
|ID | ww | count_flag | consistent status|
| 1 | ww1 | 0 | 0 |
| 1 | ww2 | 1 | 1 |
| 1 | ww3 | 1 | 2 |
| 1 | ww4 | 0 | 0 |
| 1 | ww5 | 1 | 1 |
| 2 | ww1 | 1 | 1 |
| 2 | ww2 | 1 | 2 |
| 2 | ww3 | 1 | 3 |
| 2 | ww4 | 0 | 0 |
| 2 | ww5 | 1 | 1 |
...
How should I create the calculated field to add such a column to the table?
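Tableau aside, the underlying logic — a running count that resets to zero whenever count_flag is 0 or the ID changes — can be sketched in plain Python on the sample rows:

```python
rows = [
    (1, "ww1", 0), (1, "ww2", 1), (1, "ww3", 1), (1, "ww4", 0), (1, "ww5", 1),
    (2, "ww1", 1), (2, "ww2", 1), (2, "ww3", 1), (2, "ww4", 0), (2, "ww5", 1),
]

result = []
streak = 0
prev_id = None
for id_, ww, flag in rows:
    if id_ != prev_id:                       # ID changed: reset the streak
        streak = 0
    streak = streak + 1 if flag == 1 else 0  # a 0 flag also resets it
    result.append((id_, ww, flag, streak))
    prev_id = id_

for r in result:
    print(r)
```

In Tableau the same effect would need a table calculation (e.g. RUNNING_SUM restarting per ID) or a PREVIOUS_VALUE-based calculated field; the Python version just makes the reset rule explicit.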

ERROR: column "table.column_name" must appear in the GROUP BY clause or be used in an aggregate function

I have the following table:
SELECT * FROM trips_motion_xtics
+------------+---------+-------------+------------+-------------+------------+-----+------------------+------------------+-------+--------------+-------+-------------+
| session_id | trip_id | lat_start | lat_end | lon_start | lon_end | alt | distance | segments_length | speed | acceleration | track | travel_mode |
+------------+---------+-------------+------------+-------------+------------+-----+------------------+------------------+-------+--------------+-------+-------------+
| 652 | 303633 | 41.1523521 | 41.1524966 | -8.6097233 | -8.6096833 | 0 | 42.7424443438547 | 28.0353622436523 | 0 | 74.208 | 0 | foot |
| 652 | 303633 | 41.1523521 | 41.1524966 | -8.6097233 | -8.6096833 | 0 | 42.7424443438547 | 28.0353622436523 | 0 | 74.154 | 0 | foot |
| 652 | 303633 | 41.1523521 | 41.1524966 | -8.6097233 | -8.6096833 | 0 | 42.7424443438547 | 28.0353622436523 | 0 | 68.226 | 0 | foot |
| 656 | 303637 | 41.14454009 | 41.1631127 | -8.56292593 | -8.5870161 | 0 | 5921.07030809987 | 2785.6088546142 | 0 | 99.028 | 0 | car |
| 656 | 303637 | 41.14454009 | 41.1631127 | -8.56292593 | -8.5870161 | 0 | 5921.07030809987 | 2785.6088546142 | 0 | 109.992 | 0 | car |
+------------+---------+-------------+------------+-------------+------------+-----+------------------+------------------+-------+--------------+-------+-------------+
Now I would like to compute the average value of the columns alt, distance, speed ... for each unique combination of session_id, trip_id, lat_start, ...
Query:
SELECT DISTINCT(session_id, trip_id, lat_start, lat_end, lon_start, lon_end, travel_mode), AVG(alt) AS avg_alt, AVG(distance) AS avg_disntance, AVG(speed) AS avg_speed, AVG(acceleration) AS avg_acc FROM akil.trips_motion_xtics;
ERROR: column "trips_motion_xtics.session_id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT DISTINCT(session_id, trip_id, lat_start, lat_end, lon...
Required result:
+------------+---------+-------------+------------+-------------+------------+-----+------------------+------------------+-------+--------------+-------+-------------+
| session_id | trip_id | lat_start | lat_end | lon_start | lon_end | alt | distance | segments_length | speed | acceleration | track | travel_mode |
+------------+---------+-------------+------------+-------------+------------+-----+------------------+------------------+-------+--------------+-------+-------------+
| 652 | 303633 | 41.1523521 | 41.1524966 | -8.6097233 | -8.6096833 | 0 | 42.7424443438547 | 28.0353622436523 | 0 | 72.196 | 0 | foot |
| 656 | 303637 | 41.14454009 | 41.1631127 | -8.56292593 | -8.5870161 | 0 | 5921.07030809987 | 2785.6088546142 | 0 | 104.51 | 0 | car |
+------------+---------+-------------+------------+-------------+------------+-----+------------------+------------------+-------+--------------+-------+-------------+
You want aggregation: you get one row for each combination of the columns listed in the GROUP BY clause, and you can apply aggregate functions (such as AVG()) to the other columns:
SELECT
session_id,
trip_id,
lat_start,
lat_end,
lon_start,
lon_end,
travel_mode,
AVG(alt) AS avg_alt,
AVG(distance) AS avg_disntance,
AVG(speed) AS avg_speed,
AVG(acceleration) AS avg_acc
FROM akil.trips_motion_xtics
GROUP BY
session_id,
trip_id,
lat_start,
lat_end,
lon_start,
lon_end,
travel_mode;
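A minimal runnable check of the same pattern, using Python's sqlite3 and a cut-down version of the table (column list abbreviated to one averaged column):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE trips_motion_xtics
                (session_id INT, trip_id INT, acceleration REAL, travel_mode TEXT)""")
conn.executemany(
    "INSERT INTO trips_motion_xtics VALUES (?, ?, ?, ?)",
    [(652, 303633, 74.208, "foot"),
     (652, 303633, 74.154, "foot"),
     (652, 303633, 68.226, "foot"),
     (656, 303637, 99.028, "car"),
     (656, 303637, 109.992, "car")],
)

# Every non-aggregated column in the SELECT list must appear in GROUP BY;
# AVG() then runs once per group.
query = """
SELECT session_id, trip_id, travel_mode, AVG(acceleration) AS avg_acc
FROM trips_motion_xtics
GROUP BY session_id, trip_id, travel_mode
ORDER BY session_id
"""
for row in conn.execute(query):
    print(row)
```

The averages come out as 72.196 for session 652 and 104.51 for session 656, matching the required result in the question.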

Replace null by negative id number in not consecutive rows in hive

I have this table in my database:
| id | desc |
|-------------|
| 1 | A |
| 2 | B |
| NULL | C |
| 3 | D |
| NULL | D |
| NULL | E |
| 4 | F |
---------------
And I want to transform this table into one that replaces the nulls with consecutive negative ids:
| id | desc |
|-------------|
| 1 | A |
| 2 | B |
| -1 | C |
| 3 | D |
| -2 | D |
| -3 | E |
| 4 | F |
---------------
Does anyone know how I can do this in Hive?
The approach below works. Note that desc is a reserved keyword in Hive, so it needs to be backtick-quoted:
select coalesce(id, concat('-', row_number() over (partition by id))) as id,
       `desc`
from database_name.table_name;
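Hive aside, the same idea can be verified with Python's sqlite3 (which also supports ROW_NUMBER). Negating the row number directly avoids the string concat, and an ORDER BY inside the window makes the numbering deterministic — both are small departures from the Hive answer above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "desc" is a reserved word, so it is quoted here
conn.execute('CREATE TABLE t (id INT, "desc" TEXT)')
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "A"), (2, "B"), (None, "C"), (3, "D"),
     (None, "D"), (None, "E"), (4, "F")],
)

# All NULL ids fall into one window partition; numbering them
# 1, 2, 3 and negating yields -1, -2, -3.
query = """
SELECT COALESCE(id,
                -ROW_NUMBER() OVER (PARTITION BY id ORDER BY "desc")) AS id,
       "desc"
FROM t
ORDER BY "desc", id
"""
for row in conn.execute(query):
    print(row)
```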

Count occurrences of value in field for a particular ID using Redshift

I want to count the occurrences of particular values in a certain field for an ID. So what I have is this:
| Location ID | Group |
|:----------- |:---------|
| 1 | Group A |
| 2 | Group B |
| 3 | Group C |
| 4 | Group A |
| 4 | Group B |
| 4 | Group C |
| 3 | Group A |
| 2 | Group B |
| 1 | Group C |
| 2 | Group A |
And what I would hope to yield through some computer magic is this:
| Location ID | Group A Count | Group B Count | Group C count|
|:----------- |:--------------|:--------------|:-------------|
| 1 | 1 | 0 | 1 |
| 2 | 1 | 2 | 0 |
| 3 | 1 | 0 | 1 |
| 4 | 1 | 1 | 1 |
Is there some sort of pivoting function I can use in Redshift to achieve this?
This can be done with conditional aggregation: a CASE expression inside SUM(), combined with a GROUP BY clause, as in this example.
SELECT l_id,
       SUM(CASE WHEN l_group = 'Group A' THEN 1 ELSE 0 END) AS a,
       SUM(CASE WHEN l_group = 'Group B' THEN 1 ELSE 0 END) AS b  -- and so on
FROM location
GROUP BY l_id;
This should give you a result like this:
| l_id | a | b |
|------|---|---|
| 4 | 1 | 1 |
| 1 | 1 | 0 |
| 3 | 1 | 0 |
| 2 | 1 | 2 |
You can play with it on this SQL Fiddle.
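The same conditional-aggregation pivot, runnable with Python's sqlite3 on the question's data (Group C column added for completeness; table and column names follow the answer above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE location (l_id INT, l_group TEXT)")
conn.executemany(
    "INSERT INTO location VALUES (?, ?)",
    [(1, "Group A"), (2, "Group B"), (3, "Group C"),
     (4, "Group A"), (4, "Group B"), (4, "Group C"),
     (3, "Group A"), (2, "Group B"), (1, "Group C"), (2, "Group A")],
)

# One SUM(CASE ...) per pivoted value turns rows into columns.
query = """
SELECT l_id,
       SUM(CASE WHEN l_group = 'Group A' THEN 1 ELSE 0 END) AS a,
       SUM(CASE WHEN l_group = 'Group B' THEN 1 ELSE 0 END) AS b,
       SUM(CASE WHEN l_group = 'Group C' THEN 1 ELSE 0 END) AS c
FROM location
GROUP BY l_id
ORDER BY l_id
"""
for row in conn.execute(query):
    print(row)
```

This reproduces the hoped-for table from the question exactly, one count column per group.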

Ordering and grouping MySQL data

+----+------------------+-----------------+
| id | template_type_id | url |
+----+------------------+-----------------+
| 1 | 1 | text |
| 2 | 2 | text |
| 3 | 1 | text |
| 4 | 1 | text |
| 5 | 1 | text |
| 6 | 1 | text |
| 7 | 1 | text |
| 8 | 1 | text |
| 9 | 2 | text |
| 10 | 2 | text |
+----+------------------+-----------------+
As I am using a 1-page template and a 2-page template, I need to reorder the result above by 1-page and 2-page as below:
+----+------------------+-----------------+
| id | template_type_id | url |
+----+------------------+-----------------+
| 1 | 1 | text |
| 3 | 1 | text |
| 2 | 2 | text |
| 4 | 1 | text |
| 5 | 1 | text |
| 6 | 1 | text |
| 7 | 1 | text |
| 9 | 2 | text |
| 10 | 2 | text |
| 8 | 1 | text |
+----+------------------+-----------------+
[diagram: two empty page-layout boxes illustrating the 1-page and 2-page templates]
Assuming there is a publish_date column in the table (not shown) whose values are consistent with the ordering of the records in examples 1 and 2, I suggest:
order by publish_date, template_type_id
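A quick check of that idea with Python's sqlite3, using a hypothetical publish_date column (the question's table does not show one, so the dates below are invented to match the desired ordering; id is appended as a final tiebreaker to keep the result deterministic):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE pages
                (id INT, template_type_id INT, url TEXT, publish_date TEXT)""")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, 'text', ?)",
    [(1, 1, "2024-01-01"), (2, 2, "2024-01-01"), (3, 1, "2024-01-01"),
     (4, 1, "2024-01-02"), (5, 1, "2024-01-02"), (6, 1, "2024-01-02"),
     (7, 1, "2024-01-02"), (8, 1, "2024-01-03"), (9, 2, "2024-01-02"),
     (10, 2, "2024-01-02")],
)

# Within each publish_date, 1-page templates come before 2-page ones.
query = """
SELECT id, template_type_id
FROM pages
ORDER BY publish_date, template_type_id, id
"""
ordered = [row[0] for row in conn.execute(query)]
print(ordered)
```

With these assumed dates the ids come out as 1, 3, 2, 4, 5, 6, 7, 9, 10, 8 — the ordering shown in the second table.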