Postgres control flow: multiple return values

I have a bunch of queries on a business database with addresses and I often want to reclassify those as either inside or outside a given (known) area.
The SELECT CASE construct is great for this purpose, but I often want more than one return value from the same tested condition. For example, if the business is in a certain area, I classify it as "inside", but I may also want to set another value or flag, weight the observation differently, and so on, all conditional on the same CASE criteria and preferably in the same CASE block.
What is the best/easiest way to leverage long condition statements and get multiple return values at the same time? Is that the domain of plpgsql only?
EDIT: added mock data, below. This does the categorization, but if I wanted to weight employment for each establishment, I would need a separate CASE block with the same criteria. That is what I am trying to get around.
SELECT
City, CASE WHEN City =ANY (ARRAY['San Francisco', 'San Mateo','Oakland','Marin','Santa Clara'])
THEN City ELSE 'outside'::text END as area,EstabEmployees
FROM (VALUES
('San Francisco', 14),
('San Mateo', 23),
('San Mateo', 3),
('San Francisco', 34),
('Visalia', 65),
('Juneau', 23),
('Mendocino', 5),
('Santa Clara', 1),
('Los Angeles', 56),
('San Mateo', 11),
('Los Angeles', 30),
('Marin', 33),
('Oakland', 14),
('Oakland', 2)
) AS t (City, EstabEmployees)
;

The easiest way to do this is to leverage another table or two defining the relationship.
See: http://sqlfiddle.com/#!1/ef0bb/6
To preserve it for future use, here's the "Schema" from the sqlfiddle (a combination of DDL and DML):
create table metros(
id serial primary key,
name varchar(100),
data varchar (100)
);
create table metromappings (
id serial primary key,
metroid int references metros(id),
cityname varchar(100) not null
);
insert into metros(name) values ('San Francisco Area');
insert into metromappings(metroid,cityname)
select currval(pg_get_serial_sequence('metros', 'id')), name
from (values ('San Francisco'),
('San Mateo'),
('Oakland'),
('Marin'),
('Santa Clara')) as t(name);
And here's my introduction of how to use it:
select
case when m.id IS NULL THEN 'outside'::text ELSE t.City END AS area,EstabEmployees
from
(metros m inner join metromappings mm
on m.id = mm.metroid and m.name =ANY (ARRAY['San Francisco Area'])) -- add more metro names here
full outer join
(VALUES
('San Francisco', 14),
('San Mateo', 23),
('San Mateo', 3),
('San Francisco', 34),
('Visalia', 65),
('Juneau', 23),
('Mendocino', 5),
('Santa Clara', 1),
('Los Angeles', 56),
('San Mateo', 11),
('Los Angeles', 30),
('Marin', 33),
('Oakland', 14),
('Oakland', 2)
) AS t (City, EstabEmployees) on t.City = mm.cityname
order by area, EstabEmployees;
Please note, you might want a unique index on (metroid, cityname), if only to remove the possibility of adding the same city to the same area twice. Alternatively, define the pair as the primary key and keep id as a separate unique column; I'm not sure which is best.
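To make the lookup-table idea concrete, here is a minimal sketch rather than the exact schema above. It assumes Python's built-in sqlite3 as a stand-in for Postgres so that it runs anywhere, and the `weight` column is a hypothetical second value added purely for illustration. A single LEFT JOIN supplies both the area label and the weight, so the city list is written only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical lookup table: one row per city inside the area.
    CREATE TABLE metromappings (
        cityname TEXT PRIMARY KEY,
        weight   REAL NOT NULL      -- assumed per-city weighting factor
    );
    INSERT INTO metromappings VALUES
        ('San Francisco', 1.0), ('San Mateo', 0.5), ('Oakland', 0.8);

    CREATE TABLE businesses (city TEXT, estab_employees INTEGER);
    INSERT INTO businesses VALUES
        ('San Francisco', 14), ('San Mateo', 23), ('Visalia', 65), ('Oakland', 2);
""")

# One join yields two derived values per row: the area and the weighted count.
rows = conn.execute("""
    SELECT b.city,
           COALESCE(mm.cityname, 'outside')           AS area,
           b.estab_employees * COALESCE(mm.weight, 0) AS weighted_employees
    FROM businesses AS b
    LEFT JOIN metromappings AS mm ON mm.cityname = b.city
    ORDER BY b.city
""").fetchall()

for row in rows:
    print(row)
```

Cities with no mapping row come back as 'outside' with a weight of 0, so any further per-area values only need another column in the lookup table.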


Creating a table from literal values in Ibis

I'd like to use Ibis to create a table from literal values instead of a table.
In BigQuery SQL, I might do this with a combination of the array and struct data types. See this example from the BigQuery docs.
WITH races AS (
SELECT "800M" AS race,
[STRUCT("Rudisha" as name, [23.4, 26.3, 26.4, 26.1] as laps),
STRUCT("Makhloufi" as name, [24.5, 25.4, 26.6, 26.1] as laps),
STRUCT("Murphy" as name, [23.9, 26.0, 27.0, 26.0] as laps),
STRUCT("Bosse" as name, [23.6, 26.2, 26.5, 27.1] as laps),
STRUCT("Rotich" as name, [24.7, 25.6, 26.9, 26.4] as laps),
STRUCT("Lewandowski" as name, [25.0, 25.7, 26.3, 27.2] as laps),
STRUCT("Kipketer" as name, [23.2, 26.1, 27.3, 29.4] as laps),
STRUCT("Berian" as name, [23.7, 26.1, 27.0, 29.3] as laps)]
AS participants)
SELECT
race,
participant
FROM races r
CROSS JOIN UNNEST(r.participants) as participant;
The ibis.table() method only constructs an empty table with a given schema, so I'm not sure how one might go from such a table to one with literal values. Also, the fact that the table is unbound makes it difficult to use in many backends.
This is now available via ibis.memtable -- there's a brief introduction in the tutorial available here: https://ibis-project.org/docs/4.1.0/tutorial/05-IO-Create-Insert-External-Data/#creating-new-tables-from-in-memory-pandas-dataframes

Summation graph over tree in PostgreSQL

I have a salaries table and a tree of departments (child, parent). I need to walk the graph and calculate the sum at every vertex, so that the output for each node is the sum of all its child nodes. The issue is that the query appears to double count values.
Test data:
CREATE TABLE deps (
id serial,
child varchar,
parent varchar);
CREATE TABLE salaries (
name varchar,
salary numeric);
INSERT INTO salaries(name, salary) VALUES
('manager1', 100),
('manager2', 100),
('manager3', 100),
('manager4', 100),
('manager5', 100),
('manager6', 100),
('manager7', 100),
('manager8', 100),
('manager9', 100),
('engeneer1', 100),
('engeneer2', 100),
('engeneer3', 100),
('engeneer4', 100),
('engeneer5', 100),
('engeneer6', 100),
('engeneer7', 100),
('engeneer8', 100),
('engeneer9', 100),
('engeneer10', 100),
('accountant1', 100),
('accountant2', 100),
('accountant3', 100),
('accountant4', 100);
insert INTO deps(child, parent) VALUES
('manager1', 'management'),
('manager2', 'management'),
('manager3', 'management'),
('manager4', 'management'),
('management_team1', 'management'),
('management_team1_1', 'management_team1'),
('management_team1_2', 'management_team1'),
('manager5', 'management_team1_1'),
('manager6', 'management_team1_1'),
('manager7', 'management_team1'),
('manager8', 'management_team1_2'),
('manager9', 'management_team1_2'),
('engeneer1', 'it'),
('engeneer2', 'it'),
('engeneer3', 'it'),
('engeneer4', 'it'),
('it_dep1', 'it'),
('it_dep2', 'it'),
('engeneer5', 'it_dep1'),
('engeneer6', 'it_dep1'),
('engeneer7', 'it_dep2'),
('engeneer8', 'it_dep2'),
('it_dep3', 'it_dep2'),
('engeneer9', 'it_dep3'),
('engeneer10', 'it_dep3'),
('accountant1', 'accounts'),
('accountant2', 'accounts'),
('accountant3', 'accounts'),
('accountant4', 'accounts'),
('management', NULL),
('accounts', NULL),
('it', NULL);
Request:
WITH RECURSIVE tree ("depth", parent, child) as(
SELECT
0,
parent,
child
FROM deps
WHERE parent IS NULL
UNION
SELECT
"depth" + 1,
tree.child,
deps.child
FROM tree JOIN deps ON tree.child = deps.parent
),
graph (child, parent, "depth", value) as(
-- non recursive
SELECT
tree.child,
tree.parent,
tree."depth",
salaries.salary -- outbound amount from the node, which is equal to the salary at this depth
FROM
tree JOIN salaries ON salaries.name = tree.child
WHERE tree."depth" = (SELECT max("depth") FROM tree) -- we start from the deepest level of the hierarchy
UNION
-- recursive
SELECT
current_tree.child,
current_tree.parent,
current_tree."depth",
COALESCE(current_tree.salary,
sum(graph.value) OVER (PARTITION BY current_tree.child)
) -- outbound amount from the node, which is equal to the sum of all incoming amounts
FROM
graph,
LATERAL (SELECT * FROM tree LEFT JOIN salaries ON salaries.name = tree.child WHERE tree."depth" = graph."depth" - 1) AS current_tree
LEFT JOIN LATERAL (SELECT * FROM tree WHERE tree."depth" = graph."depth") AS previous_tree
ON current_tree.child = previous_tree.parent
-- WHERE graph."depth" = (SELECT max("depth") FROM graph)
)
SELECT * FROM graph
WHERE graph."depth" = (SELECT max("depth") FROM graph)
This gives an error, since referencing the graph table twice is not allowed.
DB Fiddle sample
I expect to see 32 rows, one per relation in the initial tree, each with the sum of all its child nodes as its value.
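One possible way to avoid referencing the recursive CTE twice is to expand the tree into (ancestor, descendant) pairs first and aggregate outside the recursion. This is only a sketch under assumptions: SQLite via Python's sqlite3 stands in for PostgreSQL, the data is a cut-down subset of the test data above, and the `closure` CTE name is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE deps (child TEXT, parent TEXT);
    CREATE TABLE salaries (name TEXT, salary NUMERIC);
    INSERT INTO deps VALUES
        ('it', NULL), ('engeneer1', 'it'), ('it_dep2', 'it'),
        ('engeneer7', 'it_dep2'), ('it_dep3', 'it_dep2'), ('engeneer9', 'it_dep3');
    INSERT INTO salaries VALUES
        ('engeneer1', 100), ('engeneer7', 100), ('engeneer9', 100);
""")

rows = conn.execute("""
    WITH RECURSIVE closure (ancestor, descendant) AS (
        -- every node counts as its own ancestor
        SELECT child, child FROM deps
        UNION ALL
        -- extend each pair one level further down the tree
        SELECT c.ancestor, d.child
        FROM closure AS c
        JOIN deps AS d ON d.parent = c.descendant
    )
    -- the aggregation happens outside the recursion, once per ancestor
    SELECT c.ancestor, SUM(s.salary) AS total
    FROM closure AS c
    JOIN salaries AS s ON s.name = c.descendant
    GROUP BY c.ancestor
    ORDER BY c.ancestor
""").fetchall()

for row in rows:
    print(row)
```

Because each node has a single path to each of its ancestors, every salary is counted exactly once per ancestor, which should sidestep the double counting. Nodes with no salaried descendants drop out of the inner join; a LEFT JOIN would keep them.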

I'm having problems getting sql data in a table to pivot. I'm getting incorrect values

I have the following data that I need to pivot:
There is more data, but this is a good representation. There are several samples. The results column is a derived field.
I have tried pivot (max(results) for question in ([OFFSET DIRECTION],[OFFSET DISTANCE],[REFERENCE LINE],[STATION NUMBER],[THICKNESS])).
I get pivoted data. The first set of rows (each with a new sample) has good data, but the later rows, starting with scmn (specimen number), pull data from other samples. I have tried various forms of row_number() over (partition by sample, scmn order by sample, scmn) as [control], but nothing is working.
What I need is:
I have spent many days on this and am hitting a wall. Any help will be greatly appreciated.
Here is sample code:
drop table if exists smpl_rslt;
CREATE TABLE SMPL_RSLT
([SAMPLE] INT, QUESTION VARCHAR(100),VAL_NUM decimal(13,5), VAL_TXT VARCHAR(10),TST_STEP INT, SCMN INT)
INSERT INTO SMPL_RSLT VALUES
(732171,'Offset Direction',null, 'L',11,1),
(732171,'Offset Direction', null, 'L', 11, 2),
(732171,'Offset Direction',null,'L', 11,3),
(732171,'Offset Distance', 0.0000000, null, 13,1),
(732171,'Offset Distance', 0.0000000, null, 13,2),
(732171,'Offset Distance', 0.0000000, null, 13,3),
(732171,'Refence Line', null,'Centerline', 10,1),
(732171,'Refence Line', null,'Centerline', 10,2),
(732171,'Refence Line', null,'Centerline', 10,3),
(732171,'Station Number', null,'101+00', 5,1),
(732171,'Station Number', null,'101+05', 5,2),
(732171,'Station Number', null,'101+10', 5,3),
(732171,'Thickness', 6.500000,null, 14,1),
(732171,'Thickness', 6.500000,null, 14,2),
(732171,'Thickness', 6.500000,null, 14,3),
(732172,'Offset Direction',null, 'R',11,1),
(732172,'Offset Direction', null, 'R', 11, 2),
(732172,'Offset Direction',null,'R', 11,3),
(732172,'Offset Distance', 0.0000000, null, 13,1),
(732172,'Offset Distance', 0.0000000, null, 13,2),
(732172,'Offset Distance', 0.0000000, null, 13,3),
(732172,'Refence Line', null,'Right Edge', 10,1),
(732172,'Refence Line', null,'Right Edge', 10,2),
(732172,'Refence Line', null,'Right Edge', 10,3),
(732172,'Station Number', null,'210+00', 5,1),
(732172,'Station Number', null,'210+00', 5,2),
(732172,'Station Number', null,'210+00', 5,3),
(732172,'Thickness', 10.500000,null, 14,1),
(732172,'Thickness', 10.200000,null, 14,2),
(732172,'Thickness', 10.000000,null, 14,3);
select * from SMPL_RSLT
use test;
select [sample],[station number], [REFERENCE LINE],[OFFSET DIRECTION], [OFFSET DISTANCE],[THICKNESS]
from (
SELECT [sample],question,IIF(COALESCE(CAST(VAL_NUM AS VARCHAR(10)), VAL_TXT) = '','TEST IN PROGRESS',COALESCE(CAST(VAL_NUM AS VARCHAR(10)), VAL_TXT))AS RESULT
,row_number() over (partition by [sample],question order BY [sample], question) spcmn
from SMPL_RSLT) t
pivot (max(result) for question in
([station number], [REFERENCE LINE],[OFFSET DIRECTION], [OFFSET DISTANCE],[THICKNESS])) p
Reference line is not showing values
The PIVOT column list uses a value that does not exist in the data, which is why Reference Line comes back as NULL. The data spells the question 'Refence Line', while the PIVOT asks for [REFERENCE LINE]. PIVOT matches on row values, so copy and paste the values into the PIVOT list rather than retyping them.
1st, with the column names as they appear in the data (not capitalized):
select [sample],[Station Number], [Refence Line],[Offset Direction], [Offset Distance] ,[Thickness]
from (
SELECT
[sample]
,question,IIF(COALESCE(CAST(VAL_NUM AS VARCHAR(10)), VAL_TXT) = '','TEST IN PROGRESS',COALESCE(CAST(VAL_NUM AS VARCHAR(10)), VAL_TXT))AS RESULT
,row_number() over (partition by [sample],question order BY [sample], question) spcmn
from SMPL_RSLT
) t
pivot
(
max(result)
for question in ([Station Number], [Refence Line],[Offset Direction], [Offset Distance] ,[Thickness])
) p
ORDER BY P.[Refence Line]
2nd, if you need capitalized (block-capital) column names in the PIVOT list:
select [sample],[Station Number], [Refence Line],[Offset Direction], [Offset Distance] ,[Thickness]
from (
SELECT
[sample]
,question AS [question]
,IIF(COALESCE(CAST(VAL_NUM AS VARCHAR(10)), VAL_TXT) = '','TEST IN PROGRESS',COALESCE(CAST(VAL_NUM AS VARCHAR(10)), VAL_TXT))AS RESULT
,row_number() over (partition by [sample],question order BY [sample], question) spcmn
from SMPL_RSLT
) t
pivot
(
max(result)
for question in ([STATION NUMBER], [REFENCE LINE],[OFFSET DIRECTION], [OFFSET DISTANCE] ,[THICKNESS])
) p
ORDER BY P.[Refence Line]
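The effect of a retyped label can be reproduced in a few lines. This is a sketch under assumptions, not the query above: SQLite (via Python's sqlite3) has no PIVOT, so conditional aggregation is swapped in, the table is a cut-down subset of the sample data, and the output column names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE smpl_rslt (sample INT, question TEXT, val_txt TEXT, scmn INT);
    INSERT INTO smpl_rslt VALUES
        (732171, 'Refence Line',   'Centerline', 1),
        (732171, 'Station Number', '101+00',     1),
        (732171, 'Refence Line',   'Centerline', 2),
        (732171, 'Station Number', '101+05',     2);
""")

rows = conn.execute("""
    SELECT sample, scmn,
           MAX(CASE WHEN question = 'Station Number' THEN val_txt END) AS station_number,
           -- label exactly as stored in the data (misspelled there): matches
           MAX(CASE WHEN question = 'Refence Line'   THEN val_txt END) AS as_stored,
           -- correctly spelled label: matches no row, so the column is NULL
           MAX(CASE WHEN question = 'Reference Line' THEN val_txt END) AS as_retyped
    FROM smpl_rslt
    GROUP BY sample, scmn
    ORDER BY sample, scmn
""").fetchall()

for row in rows:
    print(row)
```

Grouping by the stored SCMN column (rather than a generated row_number) also keeps each specimen's values on its own row.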

Invalid length parameter passed to the LEFT or SUBSTRING function

I have the following description: 'Sample Product Maker Product Name XYZ - Size' and I would like to only get the value 'Product Name XYZ' from this. If this were just one row I'd have no issue just using SUBSTRING but I have thousands of records and although the initial value Sample Product Maker is the same for all products the Product Name could be different and I don't want anything after the hyphen.
What I have so far has generated the error in the header of this question.
SELECT i.Itemid,
RTRIM(LTRIM(SUBSTRING(i.ShortDescription, 25, (SUBSTRING(i.ShortDescription, 25, CHARINDEX('-', i.ShortDescription, 25)))))) AS ProductDescriptionAbbrev,
CHARINDEX('-', i.ShortDescription, 0) - 25 as charindexpos
FROM t_items i
I am getting 'Argument data type varchar is invalid for argument 3 of substring function'
As you can see, the last line of the SQL statement returns the value I need, but when I try to plug that into the SUBSTRING function I get various issues.
Chances are good you have rows where the '-' is missing, which is causing your error.
Try this...
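SELECT i.Itemid,
-- appending '-' guarantees CHARINDEX finds a hyphen; subtracting the start position turns it into a length
RTRIM(SUBSTRING(i.ShortDescription, 22, CHARINDEX('-', i.ShortDescription+'-', 22) - 22)) AS ProductDescriptionAbbrev
FROM t_items i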
You could also strip out the Sample Product Maker text and go from there:
SELECT RTRIM(LEFT(
LTRIM(REPLACE(i.ShortDescription, 'Sample Product Maker', '')),
CHARINDEX('-', LTRIM(REPLACE(i.ShortDescription, 'Sample Product Maker',
'' ))) - 1))
AS ShortDescription
FROM t_items i
Your first call to SUBSTRING specifies a length of SUBSTRING(i.ShortDescription, 25, CHARINDEX('-', i.ShortDescription, 25)), which is a string rather than an integer.
You might try:
declare @t_items as Table ( ItemId Int Identity, ShortDescription VarChar(100) )
insert into @t_items ( ShortDescription ) values
( 'Sample Product Maker Product Name XYZ - Size' )
declare @SkipLength as Int = Len( 'Sample Product Maker' )
select ItemId,
RTrim( LTrim( Substring( ShortDescription, @SkipLength + 1, CharIndex( '-', ShortDescription, @SkipLength ) - @SkipLength - 1 ) ) ) as ProductDescriptionAbbrev
from @t_items
The problem is that your outer call to SUBSTRING is being passed a character data type from the inner SUBSTRING call in the third parameter.
SELECT i.Itemid,
-- The inner SUBSTRING(...) passed as the third argument below does not return an integer type
RTRIM(LTRIM(SUBSTRING(i.ShortDescription, 25, (SUBSTRING(i.ShortDescription, 25, CHARINDEX('-', i.ShortDescription, 25)))))) AS ProductDescriptionAbbrev,
CHARINDEX('-', i.ShortDescription, 0) - 25 as charindexpos
FROM t_items i
The third parameter must evaluate to the length that you want. Perhaps you meant LEN(SUBSTRING(...))?
Seems like you want something like this (22, not 25):
SELECT i.Itemid,
RTRIM(LTRIM(SUBSTRING(i.ShortDescription, 22, CHARINDEX('-', i.ShortDescription)-22))) AS ProductDescriptionAbbrev,
CHARINDEX('-', i.ShortDescription)-22 as charindexpos
FROM t_items i
You want:
LEFT(i.ShortDescription, isnull(nullif(CHARINDEX('-', i.ShortDescription),0) - 1, 8000))
Note that a good practice is to wrap charindex(...)'s and patindex(...)'s with nullif(...,0), and then handle the null case if desired (sometimes null is the right result, in this case we want all the text so we isnull(...,8000) for the length we want).
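The "missing hyphen" case is easy to check on a couple of rows. Below is a rough sketch, assuming SQLite through Python's sqlite3 as a stand-in for SQL Server: `instr` and `substr` play the roles of CHARINDEX and SUBSTRING, and the second test row is invented to exercise the no-hyphen path.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_items (itemid INTEGER PRIMARY KEY, shortdescription TEXT);
    INSERT INTO t_items (shortdescription) VALUES
        ('Sample Product Maker Product Name XYZ - Size'),
        ('Sample Product Maker Widget');   -- invented row with no hyphen at all
""")

rows = conn.execute("""
    SELECT itemid,
           -- appending '-' guarantees instr() finds one, so the length is never negative
           rtrim(substr(rest, 1, instr(rest || '-', '-') - 1)) AS product
    FROM (
        -- strip the constant prefix first, then work on what remains
        SELECT itemid,
               ltrim(replace(shortdescription, 'Sample Product Maker', '')) AS rest
        FROM t_items
    )
    ORDER BY itemid
""").fetchall()

for row in rows:
    print(row)
```

The key point is that the length argument is always a non-negative integer: the position of the first hyphen minus one, or the full remaining length when no hyphen exists.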

SQL: PIVOTting Count & Percentage against a column

I'm trying to produce a report that shows, for each Part No, the results of tests on those parts in terms of the numbers passed and failed, and the percentages passed and failed.
So far, I have the following:
SELECT r2.PartNo, [Pass] AS Passed, [Fail] as Failed
FROM
(SELECT ResultID, PartNo, Result FROM Results) r1
PIVOT (Count(ResultID) FOR Result IN ([Pass], [Fail])) AS r2
ORDER By r2.PartNo
This is half of the solution (the totals for passes and fails); the question is, how do I push on and include percentages?
I haven't tried yet, but I imagine that I can start again from scratch, and build up a series of subqueries, but this is more a learning exercise - I want to know the 'best' (most elegant or most efficient) solution, so I thought I'd seek advice.
Can I extend this PIVOT query, or should I take a different approach?
DDL:
CREATE TABLE RESULTS (
[ResultID] [int] NOT NULL,
[SerialNo] [int] NOT NULL,
[PartNo] [varchar](10) NOT NULL,
[Result] [varchar](10) NOT NULL);
DML:
INSERT INTO Results VALUES (1, '100', 'ABC', 'Pass')
INSERT INTO Results VALUES (2, '101', 'DEF', 'Pass')
INSERT INTO Results VALUES (3, '100', 'ABC', 'Fail')
INSERT INTO Results VALUES (4, '102', 'DEF', 'Pass')
INSERT INTO Results VALUES (5, '102', 'DEF', 'Pass')
INSERT INTO Results VALUES (6, '102', 'DEF', 'Fail')
INSERT INTO Results VALUES (7, '101', 'DEF', 'Fail')
UPDATE:
My solution, based on bluefeet's answer is:
SELECT r2.PartNo,
[Pass] AS Passed,
[Fail] as Failed,
ROUND(([Fail] / CAST(([Pass] + [Fail]) AS REAL)) * 100, 2) AS PercentFailed
FROM
(SELECT ResultID, PartNo, Result FROM Results) r1
PIVOT (Count(ResultID) FOR Result IN ([Pass], [Fail])) AS r2
ORDER By r2.PartNo
I've ROUNDed a FLOAT (rather than CAST to DECIMAL twice) because it's a tiny bit more efficient, and I've also decided that we only really need the failure percentage.
It sounds like you just need to add a column for Percent Passed and Percent Failed. You can calculate those columns on your PIVOT.
SELECT r2.PartNo
, [Pass] AS Passed
, [Fail] as Failed
, ([Pass] / Cast(([Pass] + [Fail]) as decimal(5, 2))) * 100 as PercentPassed
, ([Fail] / Cast(([Pass] + [Fail]) as decimal(5, 2))) * 100 as PercentFailed
FROM
(
SELECT ResultID, PartNo, Result
FROM Results
) r1
PIVOT
(
Count(ResultID)
FOR Result IN ([Pass], [Fail])
) AS r2
ORDER By r2.PartNo
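The CAST matters because the counts are integers, and integer division truncates. Here is a small sketch of that effect, with assumptions stated up front: it uses SQLite via Python's sqlite3 instead of SQL Server, and since SQLite has no PIVOT, conditional aggregation is swapped in for it. The `truncated_pct` column exists only to show what goes wrong without the conversion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE results (resultid INT, serialno INT, partno TEXT, result TEXT);
    INSERT INTO results VALUES
        (1, 100, 'ABC', 'Pass'), (2, 101, 'DEF', 'Pass'), (3, 100, 'ABC', 'Fail'),
        (4, 102, 'DEF', 'Pass'), (5, 102, 'DEF', 'Pass'), (6, 102, 'DEF', 'Fail'),
        (7, 101, 'DEF', 'Fail');
""")

rows = conn.execute("""
    SELECT partno, passed, failed,
           -- integer / integer truncates, so this is 0 unless every test failed
           failed / (passed + failed) * 100 AS truncated_pct,
           -- multiplying by 100.0 first forces floating-point arithmetic
           ROUND(100.0 * failed / (passed + failed), 2) AS percent_failed
    FROM (
        SELECT partno,
               SUM(CASE WHEN result = 'Pass' THEN 1 ELSE 0 END) AS passed,
               SUM(CASE WHEN result = 'Fail' THEN 1 ELSE 0 END) AS failed
        FROM results
        GROUP BY partno
    )
    ORDER BY partno
""").fetchall()

for row in rows:
    print(row)
```

With the sample data, part ABC has one pass and one fail, and part DEF has three passes and two fails, so the truncated column is 0 for both while the floating-point version gives the real percentages.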