Concatenate all values up to current date - qliksense

Say we have a table of products and the year in which each product was sold. Using Qlik Sense, I would like to add a column that concatenates all products sold up to and including the respective year.
year|product
----+-------
2020|A
2021|B
2022|C
Desired outcome would be:
year|product|concatenate_products
----+-------+--------------------
2020|A      |A
2021|B      |A-B
2022|C      |A-B-C
See below for what I tried so far. It gave the error "invalid expression".
Load
*,
if(RowNo()=1, 'A', Concat(distinct Peek(concatenate_products), product)) as concatenate_products
resident Table;

It might be easier to create the new field in a separate table and then join it back to the source table:
RawData:
Load * Inline [
year, product
2020, A
2021, B
2022, C
];
NoConcatenate
// get list of only distinct year <-> product values
DistinctValues:
Load
distinct
year,
product
Resident
RawData
;
// calculate the combined product field
// and join back to the source table.
// the join will be performed on 2 fields: year and product
join (RawData)
Load
*,
if(RowNo() = 1, product, peek(concatenate_products) & '-' & product) as concatenate_products
Resident
DistinctValues
;
Drop Table DistinctValues;
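The Peek-based accumulation above can also be sketched outside Qlik. The following is a minimal Python illustration, not Qlik script: `running_concat` is a hypothetical helper, and unlike the load script it sorts the distinct rows by year before accumulating, which is an assumption about the intended order.

```python
def running_concat(rows, sep="-"):
    """Return (year, product, concatenate_products) for each distinct row.

    rows: iterable of (year, product) tuples.
    """
    distinct_rows = sorted(set(rows))  # like LOAD DISTINCT, then ordered by year
    result = []
    previous = None  # plays the role of Peek(concatenate_products)
    for year, product in distinct_rows:
        # First row: just the product. Later rows: previous value + separator + product.
        current = product if previous is None else previous + sep + product
        result.append((year, product, current))
        previous = current
    return result


rows = [(2020, "A"), (2021, "B"), (2022, "C")]
for row in running_concat(rows):
    print(row)
```

Each output row carries the value built for the row before it, which is the same dependency that Peek expresses in the load script.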


subquery problem - need to get avg of a sum

I have 2 tables
sales table
weekly sales, store, date
store table
store, type, size
My sales table has multiple years, multiple stores and multiple types. I'm trying to get the average sales per sqft for each store type per year. I have a subquery that shows the sales per sqft for each store, but I'm having trouble rolling it up into my main query to get the average by type.
Does anything jump out in my final query?
SELECT
date_part('year', sales.date) AS year,
stores.type,
AVG(sales_by_sqft)
FROM
(SELECT
SUM((sales.weekly_sales)/stores.size) AS sales_by_sqft
FROM SALES
INNER JOIN stores ON sales.store = stores.store
GROUP BY sales.store) AS sq
FROM sales
INNER JOIN stores ON sales.store = stores.store
WHERE date_part('year', date) = 2012
GROUP BY year, stores.type;
getting a syntax error on the second FROM statement
I figured it out. AVG doesn't work on money. Once I changed that data type to integer, it all fell into place:
SELECT
year,
type,
ROUND(AVG(sales_by_sqft),2)AS avg_sales_by_sqft
FROM
(SELECT
date_part('year', sales.date) AS year,
stores.type,
sales.store,
stores.size,
SUM(sales.weekly_sales) AS total_sales,
SUM(sales.weekly_sales)/ AVG(stores.size) AS sales_by_sqft
FROM sales
INNER JOIN stores ON sales.store = stores.store
GROUP BY year, stores.type, sales.store, stores.size) AS sq
GROUP BY 1,2
ORDER BY 1,3 DESC;
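The working query aggregates in two stages: first total sales per store and year divided by the store size, then the average of those per-store figures by store type. A minimal Python sketch of the same two stages follows; the function name and the sample numbers are made up for illustration and are not from the original data.

```python
from collections import defaultdict
from statistics import mean


def avg_sales_per_sqft_by_type(sales, stores):
    """sales: list of (store, year, weekly_sales); stores: dict store -> (type, size)."""
    # Stage 1: total sales per (year, store), like the inner GROUP BY.
    totals = defaultdict(float)
    for store, year, weekly_sales in sales:
        totals[(year, store)] += weekly_sales

    # Stage 2: per-store sales per sqft, collected by (year, type).
    per_type = defaultdict(list)
    for (year, store), total in totals.items():
        store_type, size = stores[store]
        per_type[(year, store_type)].append(total / size)

    # Average over stores within each (year, type), like the outer GROUP BY.
    return {key: round(mean(values), 2) for key, values in per_type.items()}


# Hypothetical sample data.
stores = {1: ("A", 100), 2: ("A", 200), 3: ("B", 50)}
sales = [(1, 2012, 1000.0), (1, 2012, 500.0), (2, 2012, 1000.0), (3, 2012, 250.0)]
print(avg_sales_per_sqft_by_type(sales, stores))
```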

How to join on closest date in Postgresql

Suppose, I have following tables
product_prices
product|price|date
-------+-----+----------
apple |10 |2014-03-01
-------+-----+----------
apple |20 |2014-05-02
-------+-----+----------
egg |2 |2014-03-03
-------+-----+----------
egg |4 |2015-10-12
purchases:
user|product|date
----+-------+----------
John|apple |2014-03-02
----+-------+----------
John|apple |2014-06-03
----+-------+----------
John|egg |2014-08-13
----+-------+----------
John|egg |2016-08-13
What I need is table similar to this:
name|product|purchase date |price date|price
----+-------+--------------+----------+-----
John|apple |2014-03-02 |2014-03-01|10
----+-------+--------------+----------+-----
John|apple |2014-06-03 |2014-05-02|20
----+-------+--------------+----------+-----
John|egg |2014-08-13 |2014-03-03|2
----+-------+--------------+----------+-----
John|egg |2016-08-13 |2015-10-12|4
In other words: "what was the price of the product on this day?", where the price is looked up by date in the product_prices table.
On real DB I tried to use something similar to:
SELECT name, product, pu.date, pp.date, pp.price
FROM purchases AS pu
LEFT JOIN product_prices AS pp
ON pu.date = (
SELECT date
FROM product_prices
ORDER BY date DESC LIMIT 1);
But I keep getting either only the left part of the table (with (null) instead of price dates and prices) or many rows with all the combinations of prices and dates.
I would suggest changing product_prices table to use a daterange column instead (or at least a start_date and an end_date).
You can use an exclusion constraint to make sure you never have overlapping ranges for one product and an insert trigger that "closes" the "current" prices and creates a new unbounded range for the newly inserted price.
A daterange can efficiently be indexed and with that in place the query gets as easy as:
SELECT name, product, pu.date, pp.valid_during, pp.price
FROM purchases AS pu
LEFT JOIN product_prices AS pp ON pu.date <@ pp.valid_during
(assuming the range column is named valid_during)
The exclusion constraint would only work however if the product was an integer (not a varchar) - but I guess your real product_purchases table uses a foreign key to some product table anyway (which is an integer).
The new table definitions could look something like this:
create table purchase_prices
(
product_id integer not null references products,
price numeric(16,4) not null,
valid_during daterange not null
);
And the constraint that prevents overlapping ranges:
alter table purchase_prices
add constraint check_price_range
exclude using gist (product_id with =, valid_during with &&);
The constraint needs the btree_gist extension.
As always improving query speed comes with a price and in this case it's the higher maintenance costs for the GiST index. You would need to run some tests to see if the easier (and most probably much faster) query outweighs the slower insert performance on purchase_prices.
Look at your scalar sub-query very closely. It is not correlated back to the outer query. In other words, it will return the same result every time: the latest date in the product_prices table. Period. Think about the query out of context:
SELECT date
FROM product_prices
ORDER BY date DESC LIMIT 1
There are two problems with it:
It will return 2015-10-12 for every row in the join and ultimately, nothing was purchased on that date, hence, null.
Your approximation of closest is that the dates are equal. Unless you have a product_prices row for every product for every single date, you'll always have misses. "Closest" implies distance and ranking.
WITH close_prices_by_purchase AS (
SELECT
p."user",
p.product,
p.date AS purchase_date,
pp.date AS price_date,
pp.price,
row_number() over (partition by p."user", p.product, p.date order by pp.date desc) as distance -- rank earlier price dates per purchase, nearest first
FROM purchases AS p
INNER JOIN product_prices AS pp on pp.product = p.product
WHERE pp.date < p.date
)
SELECT "user" as name, product, purchase_date, price_date, price
FROM close_prices_by_purchase AS cpbp
WHERE distance = 1; -- shortest distance
You can try something like this, although I am sure there's a better way:
with diffs as (
select
a.*,
b."date" as bdate,
b.price,
b."date" - a."date" as diffdays,
row_number() over (
partition by "user", a."product", a."date"
order by "user", a."product", a."date", b."date" - a."date" desc
) as sr
from purchases a
inner join product_prices b on a.product = b.product
where b."date" - a."date" < 1
)
select
"user" as "name",
product,
"date" as "purchase date",
bdate as "price date",
price
from diffs
where sr = 1
Example: https://www.db-fiddle.com/f/dwQ9EXmp1SdpNpxyV1wc6M/0
Explanation
I attempted to join both tables and find the difference between dates of purchase and price, and ranked them by closest date prior to the purchase. Rank of 1 will go to the closest date. Then, data with rank of 1 was extracted.
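The "closest earlier date" lookup can also be sketched outside SQL. The snippet below is a minimal Python illustration rather than part of any answer here: `price_as_of` is a hypothetical helper that binary-searches the sorted price dates of one product, and the data is copied from the question. It treats a price dated on the purchase day itself as applicable, which is an assumption.

```python
from bisect import bisect_right
from datetime import date


def price_as_of(price_history, product, day):
    """Return (price_date, price) of the latest price dated on or before `day`, or None."""
    history = sorted(price_history.get(product, []))
    dates = [price_date for price_date, _ in history]
    idx = bisect_right(dates, day)  # number of price dates <= day
    return history[idx - 1] if idx else None


prices = {
    "apple": [(date(2014, 3, 1), 10), (date(2014, 5, 2), 20)],
    "egg": [(date(2014, 3, 3), 2), (date(2015, 10, 12), 4)],
}
purchases = [
    ("John", "apple", date(2014, 3, 2)),
    ("John", "apple", date(2014, 6, 3)),
    ("John", "egg", date(2014, 8, 13)),
    ("John", "egg", date(2016, 8, 13)),
]
for name, product, day in purchases:
    print(name, product, day, price_as_of(prices, product, day))
```

A purchase dated before the first known price yields None, which corresponds to the NULLs a LEFT JOIN would produce.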
This is a great place to use date ranges! We know the start date of the price range and we can use a window function to get the next date. At that point, it's really easy to figure out the price on any day.
with price_ranges as
(select product,
price,
date as price_date,
daterange(date, lead(date, 1)
OVER (partition by product order by date), '[)'
) as valid_price_range from product_prices
)
select "user" as name,
purchases.product,
purchases.date,
price_date,
price
from purchases
join price_ranges on purchases.product = price_ranges.product
and purchases.date <@ price_ranges.valid_price_range
order by purchases.date;
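To make the range construction concrete, here is a minimal Python sketch of the same idea: each price is valid from its own date (inclusive) until the next price date of the same product (exclusive), and the last price stays open-ended. The helper names `build_price_ranges` and `price_on` are hypothetical; the data is taken from the question.

```python
from datetime import date


def build_price_ranges(price_rows):
    """price_rows: list of (product, price, date).

    Returns (product, price, start, end) tuples; `end` is exclusive and
    None means the range is still open, like lead() returning NULL.
    """
    by_product = {}
    for product, price, day in price_rows:
        by_product.setdefault(product, []).append((day, price))
    ranges = []
    for product, entries in by_product.items():
        entries.sort()
        for i, (start, price) in enumerate(entries):
            end = entries[i + 1][0] if i + 1 < len(entries) else None
            ranges.append((product, price, start, end))
    return ranges


def price_on(ranges, product, day):
    """Return the price whose half-open range [start, end) contains `day`, or None."""
    for prod, price, start, end in ranges:
        if prod == product and start <= day and (end is None or day < end):
            return price
    return None


ranges = build_price_ranges([
    ("apple", 10, date(2014, 3, 1)),
    ("apple", 20, date(2014, 5, 2)),
    ("egg", 2, date(2014, 3, 3)),
    ("egg", 4, date(2015, 10, 12)),
])
print(price_on(ranges, "apple", date(2014, 6, 3)))
```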

Join 2 tables where two sets of numbers overlap within the joining columns

I need to join 2 tables with postgresql where two sets of numbers overlap within the joining columns.
The image below explains it. I need to take a table of congresspeople and their party affiliation and join it with a table of districts (based on when the districts were drawn or redrawn). The result should be the rows showing the dates during which the district, state and congressperson were the same. Wherever the district dates are known and the congressperson dates are unknown, the known district dates are filled in for that portion and the congressperson dates are left blank, and vice versa.
For example, for the first rows in the tables:
Congressperson Table:
Arkansas, District 5, Republican: 1940-1945
District Table:
Arkansas, District 5: 1942-1963
Results in the following combinations (Start_Comb and End_Comb):
1940-1942
1942-1945
And for the combination where the district is unknown (1940-1942), the district dates are left blank.
The final set of date columns (gray) is simply the combinations that are only for the district (this is super easy).
In case you're wondering what this is for, I am creating an animated map, kind of like this, but for congressional districts over time:
https://www.youtube.com/watch?v=vQDyn04vtf8
I'll end up with something where there is a map where for every known district, there is a known or unknown party.
Haven't got very far, this is what I did:
SELECT *
FROM congressperson
JOIN districts
ON Start_Dist BETWEEN Start_Cong AND End_Cong
WHERE district.A = district.B
OR End_Dist BETWEEN Start_Cong AND Start_Dist
OR Start_Cong = Start_Dist OR End_Cong= End_Dist;
The idea is to first build a list of the unique dates from both tables. Then, for each such date, find the next date (in this particular case the dates are grouped by state and district, so the next date is looked up within the same state and district).
That gives us the list of ranges we are looking for. Now we can join the other tables (for this particular task, with a left join) on the required conditions:
select
r.state,
c.start_cong,
c.end_cong,
c.party,
coalesce(c.district, d.district) district,
d.start_dist,
d.end_dist,
start_comb,
end_comb,
case when d.district is not null then start_comb end final_start,
case when d.district is not null then end_comb end final_end
from (
with dates as (
select
*
from (
SELECT
c.state,
c.district,
start_cong date
FROM congressperson c
union
SELECT
c.state,
c.district,
end_cong
FROM congressperson c
union
SELECT
d.state,
d.district,
start_dist
FROM district d
union
SELECT
d.state,
d.district,
end_dist
FROM district d
) DATES
group by
state,
district,
date
order by
state,
district,
date
)
select
dates.state,
dates.district,
dates.date start_comb,
(select
d.date
from
dates d
where
d.state = dates.state and
d.district = dates.district and
d.date > dates.date
order by
d.date
limit 1
) end_comb
from
dates) r
left join congressperson c on
c.state = r.state and
c.district = r.district and
start_comb between c.start_cong and c.end_cong and
end_comb between c.start_cong and c.end_cong
left join district d on
d.state = r.state and
d.district = r.district and
start_comb between d.start_dist and d.end_dist and
end_comb between d.start_dist and d.end_dist
where
end_comb is not null
order by
r.state, coalesce(c.district, d.district), start_comb, end_comb, start_cong, end_cong
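The core of the query above is splitting the timeline at every known boundary and then checking which source row covers each piece. A minimal Python sketch of that step, using the Arkansas District 5 example from the question, could look as follows. The helper names are hypothetical, and years are plain integers here rather than dates, which is a simplification.

```python
def elementary_intervals(*interval_lists):
    """Split the timeline at every start/end boundary found in the inputs."""
    boundaries = sorted(
        {point for intervals in interval_lists for pair in intervals for point in pair}
    )
    # Pair each boundary with the next one, like the "find next date" subquery.
    return list(zip(boundaries, boundaries[1:]))


def covering(intervals, start, end):
    """Return the first input interval that fully contains [start, end], or None."""
    for lo, hi in intervals:
        if lo <= start and end <= hi:
            return (lo, hi)
    return None


congress = [(1940, 1945)]  # Arkansas, District 5, Republican
district = [(1942, 1963)]  # Arkansas, District 5

for start, end in elementary_intervals(congress, district):
    print(start, end, covering(congress, start, end), covering(district, start, end))
```

A None in the last two columns corresponds to the blank congressperson or district dates produced by the left joins.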

Change the relation between two tables to outer join

I have a fact table (table1) with the columns (products, start, end, value1, month [calculated column]), where start and end are timestamps.
What I am trying to build is a table and a bar chart that show the sum of value1 for each month divided by a factor specific to that month. (The report is on a yearly basis, that is, I load one year of data into Qlik Sense.)
I used start and end to generate an autoCalendar as a timestamp field in the Qlik Sense data manager. Then I get the month from start and store it in the calculated column "month" in table1 using the autoCalendar feature (Month(start.autoCalendar.Month)).
After that, I created another table with two columns (month, value2). The value2 column is the factor by which I need to divide value1 for each month, that is, sum(value1) / 1520 for January, sum(value1) / 650 for February, and so on. The two month columns link the tables in Qlik Sense, so in my expression I can calculate sum(value1) and pick up the value2 from table2 that matches the month.
The calculation works correctly, but one thing is still missing. The products do not have a value (value1) in every month. For example, say I have products (p1, p2, ...). In table1 I have data for p1 for (Jun, Feb, Nov) and for p2 for (Mar, Apr, May, Dec). Hence, when the data is presented in a Qlik Sense table or a bar chart, I only see the months that have values in the fact table. The Qlik Sense table contains two dimensions, [products] and [month], and the measure m1 [sum(value1)/value2].
What I want is a yearly report showing all 12 months. In my example I see only 3 months for p1 and 4 months for p2. When there is no data, the measure m1 should be 0, and I want that 0 to appear in my table and chart.
I think a solution might be to show the data of the Qlik Sense table as a right outer join of my relationship (table1.month >> table2.month). So, is it possible in Qlik Sense to have an outer join in such an example? Or is there a better solution to my problem?
Update
Got it. Not sure if this is the best approach, but in these cases I usually fill in the missing records during the script load.
// Main table
Sales:
Load
*,
ProductId & '-' & Month as Key_Product_Month
;
Load * Inline [
ProductId, Month, SalesAmount
P1 , 1 , 10
P1 , 2 , 20
P1 , 3 , 30
P2 , 1 , 40
P2 , 2 , 50
];
// Get distinct products and assign 0 as SalesAmount
Products_Temp:
Load
distinct ProductId,
0 as SalesAmount
Resident
Sales
;
join (Products_Temp) // Cross join in this case
Load
distinct Month
Resident
Sales
;
// After the cross join Products_Temp table contains
// all possible combinations between ProductId and Month
// and for each combination SalesAmount = 0
Products_Temp_1:
Load
*,
ProductId & '-' & Month as Key_Product_Month1 // Generate the unique id
Resident
Products_Temp
;
Drop Table Products_Temp; // we dont need this anymore
Concatenate (Sales)
// Concatenate to the main table only the ProductId-Month
// combinations that are missing
Load
*
Resident
Products_Temp_1
Where
Not Exists(Key_Product_Month, Key_Product_Month1)
;
Drop Table Products_Temp_1; // not needed any more
Drop Fields Key_Product_Month1, Key_Product_Month; // not needed any more
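The script above boils down to: build every ProductId-Month combination, then add a zero row for each combination that has no sales. A minimal Python sketch of that logic, with the same inline data, is shown here for illustration only; `fill_missing` is a hypothetical name and this is not Qlik script.

```python
from itertools import product


def fill_missing(sales):
    """sales: dict (product_id, month) -> amount.

    Returns a dict covering every product/month combination,
    with 0 for combinations that had no sales row.
    """
    product_ids = sorted({p for p, _ in sales})
    months = sorted({m for _, m in sales})
    # Cross join of distinct products and months, keeping existing amounts.
    return {(p, m): sales.get((p, m), 0) for p, m in product(product_ids, months)}


sales = {("P1", 1): 10, ("P1", 2): 20, ("P1", 3): 30, ("P2", 1): 40, ("P2", 2): 50}
for key, amount in fill_missing(sales).items():
    print(key, amount)
```

With this data the only added row is P2 in month 3, with an amount of 0.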
The table link in Qlik Sense (and QlikView) behaves more like a full outer join. If you want to show the id values from only one table (and not all of them), you can create an additional field in that table and then perform your calculations on that field instead of the linked one. For example:
Table1:
Load
id,
value1
From
MyQVD1.qvd (qvd)
;
Table2:
Load
id,
id as MyRightId,
value2
From
MyQVD2.qvd (qvd)
;
In the example above both tables will still be linked on id field but if you want to count only the id values in the right table (Table2) you just need to type
count( MyRightId )
I know this question has been answered, and I quite like Stefan's approach, but I hope my answer will help other users. I recently ran into something similar and used slightly different logic, with the following script:
// Main table
Sales:
Load * Inline [
ProductId, Month, SalesAmount
P1 , 1 , 10
P1 , 2 , 20
P1 , 3 , 30
P2 , 1 , 40
P2 , 2 , 50
];
Cartesian:
//Create a combination of all ProductId and Month and then load the existing data into this table
NoConcatenate Load distinct ProductId Resident Sales;
Join
Load Distinct Month Resident Sales;
Join Load ProductId, Month, SalesAmount Resident Sales; //Existing data loaded
Drop Table Sales;
This results in the following output table:
The Null value in the new (bottom-most) row can stay like that, but if you prefer to replace it, use a Map ... Using statement.

Show data from quarterly records in a single row

Each quarter's sales data is contained in a row in the data source.
Account 1's 4 quarters of sales data would be in 4 separate records, each containing the account name, quarter number, and count of items purchased.
The report should show, in each detail row: account name, q1 count, q2 count, q3 count, q4 count, total year count.
I'm new to Crystal, but it seems like this should be easy; how would I do this?
I'd probably create the result list using some slightly complex SQL and then just display it on the Crystal report... but if you want to accomplish this entirely inside Crystal, take a look at http://aspalliance.com/1041_Creating_a_Crosstab_Report_in_Visual_Studio_2005_Using_Crystal_Reports.all.
Here's a stab at the SQL that would be required...
select
accountName,
sum(case when quarterName = 'q1' then itemCount else 0 end) as q1Count,
sum(case when quarterName = 'q2' then itemCount else 0 end) as q2Count,
sum(case when quarterName = 'q3' then itemCount else 0 end) as q3Count,
sum(case when quarterName = 'q4' then itemCount else 0 end) as q4Count,
sum(itemCount) as yearCount
from myTable
group by accountName ;
If your data source has the sales date in it (and I assume it would), you can create a formula called #SalesQuarter:
if month({TableName.SalesQuarter}) in [1,2,3] then '1' else
if month({TableName.SalesQuarter}) in [4,5,6] then '2' else
if month({TableName.SalesQuarter}) in [7,8,9] then '3'
else '4'
You can then add a cross-tab to your report, and use the new #SalesQuarter field as the column header of your cross-tab.
This assumes your sales are all within the same year.
Add a group on {account}
In the group footer add a Running total for each quarter.
For each quarter, create a running total with following settings:
Running Total Name: create a unique name for each formula, for example Q1,Q2,Q3,Q4
Field to summarize: {items purchased}
Type of summary: sum
Evaluate: Use a formula - {quarter number}= --should be 1,2,3, or 4, depending on which quarter you are summing
Reset: On Change of Group {account}
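The running-total setup above amounts to one conditional sum per quarter, reset for each account. As a rough illustration of the resulting row layout, here is a minimal Python sketch; it is not Crystal syntax, `pivot_quarters` is a hypothetical name, and the sample records are made up.

```python
def pivot_quarters(records):
    """records: iterable of (account, quarter, count) with quarter in 1..4.

    Returns one row per account: (account, q1, q2, q3, q4, year_total).
    """
    rows = {}
    for account, quarter, count in records:
        counts = rows.setdefault(account, [0, 0, 0, 0])  # reset per account
        counts[quarter - 1] += count                     # sum only the matching quarter
    return [(account, *counts, sum(counts)) for account, counts in sorted(rows.items())]


# Hypothetical sample records.
records = [("Acme", 1, 5), ("Acme", 2, 7), ("Acme", 3, 0), ("Acme", 4, 3), ("Bolt", 2, 4)]
for row in pivot_quarters(records):
    print(row)
```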