I've created a view for daily totals. I need to exclude rows to get more accurate figures.
There are two columns (among others) in the view named PaymentCode and CustomerID. I need to exclude rows where PaymentCode = 'ACCOUNT' and CustomerID = 'CASHLIG'.
I tried using <>, but that doesn't work: it takes out everything where PaymentCode = 'ACCOUNT', which is not what I need.
Sample data:
| PAYMENTCODE | CUSTOMERID |
|-------------|------------|
| CASH        | CASHLIG    |
| CCARD       | CASHLIG    |
| ACCOUNT     | 10VICT003  |
| ACCOUNT     | CASHLIG    |
| CCARD       | CASHLIG    |
| ACCOUNT     | CASHLIG    |
Any suggestions? I tried searching for an answer to this but wasn't sure how to phrase it.
Your help will be greatly appreciated.
UPDATED: Try
CREATE VIEW vw_DailyTotals AS
SELECT ...
FROM ...
WHERE PaymentCode <> 'ACCOUNT'
OR CustomerID <> 'CASHLIG'
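This works because, by De Morgan's laws, NOT (A AND B) is equivalent to (NOT A) OR (NOT B), so the same filter can also be written as:
WHERE NOT (PaymentCode = 'ACCOUNT' AND CustomerID = 'CASHLIG')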
Sample output:
| PAYMENTCODE | CUSTOMERID |
|-------------|------------|
| CASH        | CASHLIG    |
| CCARD       | CASHLIG    |
| ACCOUNT     | 10VICT003  |
| CCARD       | CASHLIG    |
Here is a SQLFiddle demo.
Related
Note: I've already gone over related questions like the following, which don't address my query:
SQL: how to pick one row for each set of rows with duplicate value in one column?
Fill missing values with first non-null following value in Redshift
I have a sparse, unclean dataset like this:
| id | operation | title | channel_type | mode |
|-----|-----------|----------|--------------|------|
| abc | Start | | | |
| abc | Start | recovery | | Link |
| abc | Start | recovery | SMS | |
| abc | Set | | Email | |
| abc | Verify | | Email | |
| pqr | Start | | | OTP |
| pqr | Verfiy | sign_in | Push | |
| pqr | Verify | | | |
| xyz | Start | sign_up | | Link |
and I need to fill the empty fields of each id with the non-empty data available from its other rows:
| id | operation | title | channel_type | mode |
|-----|-----------|----------|--------------|------|
| abc | Start | recovery | SMS | Link |
| abc | Start | recovery | SMS | Link |
| abc | Start | recovery | SMS | Link |
| abc | Set | recovery | Email | Link |
| abc | Verify | recovery | Email | Link |
| pqr | Start | sign_in | Push | OTP |
| pqr | Verfiy | sign_in | Push | OTP |
| pqr | Verify | sign_in | Push | OTP |
| xyz | Start | sign_up | | Link |
Notes:
some ids can have a certain field empty in all rows
while most ids will have the same non-empty value for each field, edge cases could have different values; for such groups, filling any non-empty value into all rows is acceptable [this is too rare in my dataset and can be ignored]
another pattern is that certain fields are mostly present only on rows of certain operations, e.g. mode is only present on operation='Start' rows
I've tried grouping rows by id while performing listagg over title, channel_type and mode columns, followed by coalesce, something along the lines of this:
WITH my_data AS (
SELECT
id,
operation,
title,
channel_type,
mode
FROM
my_db.my_table
),
list_aggregated_data AS (
SELECT
id,
listagg(title) AS titles,
listagg(channel_type) AS channel_types,
listagg(mode) AS modes
FROM
my_data
GROUP BY
id
),
coalesced_data AS (
SELECT DISTINCT
id,
coalesce(titles) AS title,
coalesce(channel_types) AS channel_type,
coalesce(modes) AS mode
FROM
list_aggregated_data
),
joined_data AS (
SELECT
md.id,
md.operation,
cd.title,
cd.channel_type,
cd.mode
FROM
my_data AS md
LEFT JOIN
coalesced_data AS cd ON cd.id = md.id
)
SELECT
*
FROM
joined_data
ORDER BY
id,
operation
But for some reason this results in concatenation of values (presumably from the coalesce operation), where I get:
| id | operation | title | channel_type | mode |
|-----|-----------|------------------|--------------|------|
| abc | Start | recoveryrecovery | SMS | Link |
| abc | Start | recoveryrecovery | SMS | Link |
| abc | Start | recoveryrecovery | SMS | Link |
| abc | Set | recoveryrecovery | Email | Link |
| abc | Verify | recoveryrecovery | Email | Link |
| pqr | Start | sign_in | Push | OTP |
| pqr | Verfiy | sign_in | Push | OTP |
| pqr | Verify | sign_in | Push | OTP |
| xyz | Start | sign_up | | Link |
What's the correct way to approach this problem?
I'd start with the first_value() window function with the ignore nulls option. You will partition by the first 2 columns and will need to work out the edge cases with some data massaging, likely in the order by clause of the window function.
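A minimal sketch of that approach (Redshift syntax, untested; it assumes the blank cells are NULL — if they are empty strings, wrap each column in NULLIF(col, '') first). It fills within (id, operation) first, per the suggested partitioning, then patches cross-operation gaps such as mode:
WITH by_operation AS (
    -- first pass: fill within (id, operation)
    SELECT
        id,
        operation,
        first_value(title) ignore nulls over (
            partition by id, operation order by operation
            rows between unbounded preceding and unbounded following) AS title,
        first_value(channel_type) ignore nulls over (
            partition by id, operation order by operation
            rows between unbounded preceding and unbounded following) AS channel_type,
        first_value(mode) ignore nulls over (
            partition by id, operation order by operation
            rows between unbounded preceding and unbounded following) AS mode
    FROM my_db.my_table
)
-- second pass: patch fields that only occur on other operations' rows
-- (e.g. mode, which is only present on operation='Start')
SELECT
    id,
    operation,
    coalesce(title, first_value(title) ignore nulls over (
        partition by id order by operation
        rows between unbounded preceding and unbounded following)) AS title,
    coalesce(channel_type, first_value(channel_type) ignore nulls over (
        partition by id order by operation
        rows between unbounded preceding and unbounded following)) AS channel_type,
    coalesce(mode, first_value(mode) ignore nulls over (
        partition by id order by operation
        rows between unbounded preceding and unbounded following)) AS mode
FROM by_operation
ORDER BY id, operation;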
I have two tables, products and product_attributs. One product can have one or many attributes, and these are filled in by a dynamic web form (name and value inputs) that the user adds to as needed. For example, for a drill the user could decide to add two attributes: color=blue and power=100 watts. Another product could have three or more different attributes, and another could have no special attributes.
products
| id | name        | identifier | identifier_type | active |
|----|-------------|------------|-----------------|--------|
| 1  | Drill       | AD44       | barcode         | true   |
| 2  | Polisher    | AP211C     | barcode         | true   |
| 3  | Jackhammer  | AJ2133     | barcode         | false  |
| 4  | Screwdriver | AS4778     | RFID            | true   |
product_attributs
| id | name   | value      | product_id |
|----|--------|------------|------------|
| 1  | color  | blue       | 1          |
| 2  | power  | 100 watts  | 1          |
| 3  | size   | 40 cm      | 2          |
| 4  | energy | electrical | 3          |
| 5  | price  | 35€        | 3          |
So attributes can be anything; they are set dynamically by the user. I need to generate a CSV report containing all products with their attributes. Without much experience in SQL, I came up with the following basic query:
SELECT pr.name, pr.identifier_type, pr.identifier, pr.active, att.name, att.value
FROM products as pr
LEFT JOIN product_attributs att ON pr.id = att.product_id
As you know, the result will contain as many rows per product as it has attributes, which is not ideal for reporting. The ideal would be this:
| name        | identifier_type | identifier | active | name   | value | name  | value |
|-------------|-----------------|------------|--------|--------|-------|-------|-------|
| Drill       | barcode         | AD44       | true   | color  | blue  | power | 100 w |
| Polisher    | barcode         | AP211C     | true   | size   | 40 cm | null  | null  |
| Jackhammer  | barcode         | AJ2133     | false  | energy | elect | price | 35 €  |
| Screwdriver | RFID            | AS4778     | true   | null   | null  | null  | null  |
Here I only showed a maximum of two attributes per product, but there could be more if needed. I did some research and came across pivoting with the crosstab function in Postgres, but the problem is that it requires a static column list, which does not match my need.
Thanks a lot for your help, and sorry for any duplicates.
Thanks Laurenz Albe for your help. array_agg solved my problem. Here is the query in case someone is interested:
SELECT
pr.name, pr.description, pr.identifier_type, pr.identifier,
pr.internal_identifier, pr.active,
ARRAY_TO_STRING(ARRAY_AGG (oa.name || ' = ' || oa.value),', ') attributs
FROM
products pr
LEFT JOIN product_attributs oa ON pr.id = oa.product_id
GROUP BY
pr.name, pr.description, pr.identifier_type, pr.identifier,
pr.internal_identifier, pr.active
ORDER BY
pr.name;
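One refinement: ARRAY_AGG returns elements in no guaranteed order, so for a stable CSV you can order inside the aggregate (the ORDER BY inside an aggregate is standard PostgreSQL):
ARRAY_TO_STRING(ARRAY_AGG(oa.name || ' = ' || oa.value ORDER BY oa.name), ', ') attributs
With the sample data above, the Drill row's attributs column would then read: color = blue, power = 100 watts.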
I am trying to understand how to pivot data within T-SQL but can't seem to get it working. I have the following table structure
+-------------------+-----------------------+
| Name | Value |
+-------------------+-----------------------+
| TaskId | 12417 |
| TaskUid | XX00044497 |
| TaskDefId | 23 |
| TaskStatusId | 4 |
| Notes | |
| TaskActivityIndex | 0 |
| ModifiedBy | Orange |
| Modified | /Date(1554540200000)/ |
| CreatedBy | Apple |
| Created | /Date(2121212100000)/ |
| TaskPriorityId | 40 |
| OId | 2 |
+-------------------+-----------------------+
I want to pivot the Name column so its values become columns. Expected output:
+--------+------------+-----------+--------------+-------+-------------------+------------+-----------------------+-----------+-----------------------+----------------+-----+
| TASKID | TASKUID    | TASKDEFID | TASKSTATUSID | NOTES | TASKACTIVITYINDEX | MODIFIEDBY | MODIFIED              | CREATEDBY | CREATED               | TASKPRIORITYID | OID |
+--------+------------+-----------+--------------+-------+-------------------+------------+-----------------------+-----------+-----------------------+----------------+-----+
| 12417  | XX00044497 | 23        | 4            |       | 0                 | Orange     | /Date(1554540200000)/ | Apple     | /Date(2121212100000)/ | 40             | 2   |
+--------+------------+-----------+--------------+-------+-------------------+------------+-----------------------+-----------+-----------------------+----------------+-----+
Is there an easy way of doing it? The columns are fixed (not dynamic).
Any help appreciated.
Try this:
select * from yourtable
pivot
(
min(value)
for Name in ([TaskID],[TaskUID],[TaskDefID]......)
) as pivotable
You can also use case statements.
You must use an aggregate function in the PIVOT clause.
If you want to learn more, here is the reference:
https://learn.microsoft.com/en-us/sql/t-sql/queries/from-using-pivot-and-unpivot?view=sql-server-2017
Output (I only tried three columns):
DB<>Fiddle
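For completeness, the CASE-expression alternative mentioned above could look like this (a minimal sketch against the same yourtable(Name, Value) shape; add one MIN(CASE ...) per remaining name):
SELECT
    MIN(CASE WHEN Name = 'TaskId'    THEN Value END) AS TaskId,
    MIN(CASE WHEN Name = 'TaskUid'   THEN Value END) AS TaskUid,
    MIN(CASE WHEN Name = 'TaskDefId' THEN Value END) AS TaskDefId
    -- ...and so on for the other Name values
FROM yourtable;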
I have a table "Listing" that looks like this:
| listing_id | amenities |
|------------|--------------------------------------------------|
| 5629709 | {"Air conditioning",Heating, Essentials,Shampoo} |
| 4156372 | {"Wireless Internet",Kitchen,"Pets allowed"} |
And another table "Amenity" like this:
| amenity_id | amenities |
|------------|--------------------------------------------------|
| 1 | Air conditioning |
| 2 | Kitchen |
| 3 | Heating |
Is there a way to join the two tables into a new one "Listing_Amenity" like this:
| listing_id | amenities |
|------------|-----------|
| 5629709 | 1 |
| 5629709 | 3 |
| 4156372 | 2 |
You could use unnest:
CREATE TABLE Listing_Amenity
AS
SELECT l.listing_id, a.amenity_id
FROM Listing l
   , unnest(l.amenities) sub(elem)  -- one row per array element (an implicit LATERAL join)
JOIN Amenity a
  ON a.amenities = sub.elem;        -- match each element back to its amenity_id
db<>fiddle demo
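With the sample rows above, Listing_Amenity would contain (5629709, 1), (5629709, 3) and (4156372, 2); array elements with no match in Amenity (Essentials, Shampoo, ...) are simply dropped by the inner join.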
What we are trying to achieve is some level of cleansing on the table. This is what we currently have (this is a subset of the table):
+-----------+------------+-----------+------------+--------+-----------+------------+-------+
| STUDENTNO | LASTNAME | FIRSTNAME | PREFERNAME | GENDER | COURSE | YEAR | MAJOR |
+-----------+------------+-----------+------------+--------+-----------+------------+-------+
| auaw64 | Drury | Janet | Jane | f | DIPLOMA | 29/10/2011 | NO |
| auaw64 | Drury | Janet | Jane | f | BACHELORS | 29/09/2013 | YES |
| auqn70 | Givens | Jason | | m | DIPLOMA | 29/10/2011 | NO |
| auqn70 | Givens | Jason | | m | BACHELORS | 10/10/2012 | YES |
| mrpd90 | Blackstock | Williams | Bill | m | DIPLOMA | 29/10/2011 | NO |
| mrpd90 | Blackstock | Williams | Bill | m | BACHELORS | 29/09/2013 | YES |
| pyts84 | Peters | Theresa | | f | BACHELORS | 29/09/2013 | YES |
| qjgp97 | Aaron | Felina | | f | DIPLOMA | 29/10/2013 | NO |
| qzhs28 | Gyeong | Ma | | f | DIPLOMA | 29/10/2011 | NO |
| qzhs28 | Gyeong | Ma | | f | BACHELORS | 29/09/2013 | YES |
| uwnv95 | Anholt | Wilhemina | | f | MASTERS | 29/10/2011 | NO |
| uwnv95 | Anholt | Wilhemina | | f | BACHELORS | 10/10/2012 | YES |
| jaiw67 | Muguruza | David | Dave | m | MASTERS | 28/09/2014 | YES |
+-----------+------------+-----------+------------+--------+-----------+------------+-------+
But we need to reshape this table with studentno as the new primary key and then perform an update/upsert (which is where the new columns come from):
+-----------+------------+-----------+------------+--------+--------------+------------+--------------+------------+
| STUDENTNO | LASTNAME | FIRSTNAME | PREFERNAME | GENDER | MAJOR_COURSE | MAJOR_YEAR | MINOR_COURSE | MINOR_YEAR |
+-----------+------------+-----------+------------+--------+--------------+------------+--------------+------------+
| auaw64 | Drury | Janet | Jane | f | BACHELORS | 29/09/2013 | DIPLOMA | 29/10/2011 |
| auqn70 | Givens | Jason | | m | BACHELORS | 10/10/2012 | DIPLOMA | 29/10/2011 |
| mrpd90 | Blackstock | Williams | Bill | m | BACHELORS | 29/09/2013 | DIPLOMA | 29/10/2011 |
| pyts84 | Peters | Theresa | | f | BACHELORS | 29/09/2013 | null | null |
| qjgp97 | Aaron | Felina | | f | DIPLOMA | 29/10/2013 | null | null |
| qzhs28 | Gyeong | Ma | | f | BACHELORS | 29/09/2013 | DIPLOMA | 29/10/2011 |
| uwnv95    | Anholt     | Wilhemina |            | f      | BACHELORS    | 10/10/2012 | MASTERS      | 29/10/2011 |
| jaiw67 | Muguruza | David | Dave | m | MASTERS | 28/09/2014 | null | null |
+-----------+------------+-----------+------------+--------+--------------+------------+--------------+------------+
Essentially we want to convert the first table into the second using the functionality inside PostgreSQL, turning studentno into a unique key. We can then write the new table into another one for another app to read or use; the new key will offer a proper query handle, unlike the original table.
First create the course table
CREATE TABLE Courses (
id SERIAL,
course varchar);
Then insert all courses into the new table
insert into Courses(course)
select course from Participants GROUP BY course
Then create the n-n relation table
CREATE TABLE Participants_Courses (
id SERIAL,
studentno varchar,
course integer,
year date,
major boolean);
Then insert the values from Participants into the n-n table
INSERT INTO Participants_Courses (studentno, course, year, major)
SELECT p.studentno,
       (SELECT c.id FROM Courses AS c WHERE c.course = p.course),
       p.year,
       p.major
FROM Participants AS p;
Finally drop the unnecessary columns from the Participants table
ALTER TABLE Participants DROP COLUMN course;
ALTER TABLE Participants DROP COLUMN year;
ALTER TABLE Participants DROP COLUMN major;
This is normalized; I would not recommend inserting two courses into the same row of the table, because it would limit you. With this design, a person can have many courses.
This is third normal form; see https://en.wikipedia.org/wiki/Third_normal_form
I did not test the SQL, so there may be syntax errors ;)
Disclaimer:
Doing this is completely backwards and wrong. It's doubling down on a bad table design by making it into a worse one. Do not use wide, denormalized tables as your primary data store.
Read:
https://en.wikipedia.org/wiki/Normalization
https://en.wikipedia.org/wiki/Third_normal_form
and get familiar with using joins.
If you use this solution then I am sorry to whoever has to maintain the code later. You appear to be determined to do it, and there are legitimate reasons you might want to make this transformation for things like a view or report, so here's how.
Use a self-left-join, treating studentno as the key. Something like:
select
l."STUDENTNO", l."LASTNAME", l."FIRSTNAME", l."PREFERNAME", l."GENDER",
l."COURSE" AS "MAJOR_COURSE", l."YEAR" AS "MAJOR_YEAR",
r."COURSE" AS "MINOR_COURSE", r."YEAR" AS "MINOR_YEAR"
FROM table1 l
LEFT OUTER JOIN table1 r
ON (l."STUDENTNO" = r."STUDENTNO"
AND (l."COURSE", l."YEAR") <> (r."COURSE", r."YEAR"))
WHERE (l."COURSE", l."YEAR") < (r."COURSE", r."YEAR")
OR r."COURSE" IS NULL;
A proper solution wouldn't rely on the lexical sort of BACHELORS before DIPLOMA and would instead use a CASE statement or a function to produce proper ordering, but since we're already piling wrong on top of wrong, it doesn't matter much.
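For example (the ranks below are illustrative assumptions, not taken from the question), the final WHERE could compare an explicit ranking instead of the raw course names:
WHERE (CASE l."COURSE" WHEN 'BACHELORS' THEN 1 WHEN 'MASTERS' THEN 2 WHEN 'DIPLOMA' THEN 3 END, l."YEAR")
    < (CASE r."COURSE" WHEN 'BACHELORS' THEN 1 WHEN 'MASTERS' THEN 2 WHEN 'DIPLOMA' THEN 3 END, r."YEAR")
   OR r."COURSE" IS NULL;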
SQLfiddle: http://sqlfiddle.com/#!15/2f49f/1
Now. Delete this, and go do it properly. Create a STUDENT table and COURSE table, and a STUDENT_COURSE table that joins them and records which students took which courses in which years. It's relational databases 101.
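For reference, a minimal sketch of that design (the names and types here are my assumptions):
CREATE TABLE student (
    studentno  varchar PRIMARY KEY,
    lastname   varchar,
    firstname  varchar,
    prefername varchar,
    gender     char(1)
);

CREATE TABLE course (
    course_id serial PRIMARY KEY,
    name      varchar UNIQUE  -- e.g. DIPLOMA, BACHELORS, MASTERS
);

CREATE TABLE student_course (
    studentno varchar REFERENCES student,
    course_id integer REFERENCES course,
    year      date,
    major     boolean,
    PRIMARY KEY (studentno, course_id)
);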