Show first and last value in table - Qlik Sense

I have an Excel file with customers' purchasing details (sorted by date).
For example:
customer_id   date     $_Total_purchase
A             1/2/23   5
A             1/3/23   20
A             1/4/23   10
I want to show one row per customer in the table, so the final table will be:
customer_id   date     purchase_counter   amount_of_last_purchase   amount_of_first_purchase
A             1/4/23   3                  10                        5
In my table, customer_id is a dimension.
For extracting the date, I use max(date) as a measure.
For purchase_counter I use count(customer_id).
For extracting 'amount_of_first_purchase', I use firstSortedValue('$_Total_purchase', date).
How can I extract 'amount_of_last_purchase'? Is there maybe an aggregation function I can use?
Thanks in advance :)

The simple answer is that you can use -date in your expression and this will return the last record:
FirstSortedValue('$_Total_purchase', -date)
The above will work for the provided data example. When there is more than one customer, the Aggr function can help:
First: FirstSortedValue(aggr(sum($_Total_purchase), customer_id, date), date)
Last: FirstSortedValue(aggr(sum($_Total_purchase), customer_id, date), -date)
Another approach (if applicable to your case/data) is to flag the first and last records during the data load and use the flags in the measures.
An example script:
RawData:
Load * Inline [
customer_id, date, $_Total_purchase
A, 2/1/23, 5
A, 3/1/23, 20
A, 4/1/23, 10
B, 5/1/23, 35
B, 6/1/23, 40
B, 7/1/23, 50
];
Temp0:
Load
customer_id,
date,
// flag the first record:
// if the current row is the beginning of the table, flag it with isFirst = 1
// if the customer_id of the current row is different from the previously loaded
// customer_id, also flag it with isFirst = 1
if(RowNo() = 1 or customer_id <> peek(customer_id), 1, null()) as isFirst,
// getting the last record is a bit more tricky
// similar logic: if the current and previous customer_id are different, the previous
// row was the last one for its customer, so take peek(customer_id) and peek(date)
// and combine them, separated with '|'; at the end of the table take the current
// row's customer_id and date instead; ELSE write 0.
// for example: A|4/1/23 or B|7/1/23
if(customer_id <> peek(customer_id) and RowNo() <> 1, peek(customer_id) & '|' & peek(date),
if(RowNo() = NoOfRows('RawData'), customer_id & '|' & date, 0
)) as isLastTemp
Resident
RawData
;
// Get all the rows from Temp0 for which isLastTemp is not equal to 0
// split isLastTemp by '|' -> the first value is customer_id and the second is date
// join the result back to the original table
join (RawData)
Load
SubField(isLastTemp, '|', 1) as customer_id,
SubField(isLastTemp, '|', 2) as date,
1 as isLast
Resident
Temp0
Where
isLastTemp <> 0
;
// join Temp0 to the original table
// but only grab the isFirst flag
join(RawData)
Load
customer_id,
date,
isFirst
Resident
Temp0
;
// this table is no longer needed
Drop Table Temp0;
Once the above script is reloaded, the RawData table will have two more columns: isFirst and isLast.
The expressions are then simpler:
First: sum( {< isFirst = {1} >} $_Total_purchase)
Last: sum( {< isLast = {1} >} $_Total_purchase)

You can also do this with pandas:
import pandas as pd
# read the Excel file
df = pd.read_excel('customer_purchases.xlsx')
# first row of the (date-sorted) data
first_value = df.head(1)
# last row
last_value = df.tail(1)
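If the goal is the one-row-per-customer table from the question rather than the overall first and last row, a grouped version along these lines might be closer. This is only a rough sketch: the file name, the column names, and the assumption that rows are sorted by date within each customer are all taken from the question's example.

import pandas as pd

# read the Excel file; rows are assumed to be sorted by date within each customer
df = pd.read_excel('customer_purchases.xlsx')

# one row per customer: last date, number of purchases, first and last purchase amount
summary = (
    df.groupby('customer_id', as_index=False)
      .agg(
          date=('date', 'max'),
          purchase_counter=('date', 'count'),
          amount_of_last_purchase=('$_Total_purchase', 'last'),
          amount_of_first_purchase=('$_Total_purchase', 'first'),
      )
)
print(summary)

Here 'first' and 'last' rely on the sort order of the file, mirroring the FirstSortedValue approach above.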

Related

Fast new row insertion if a value of a column depends on previous value in existing row

I have a table cusers with a primary key:
primary key(uid, lid, cnt)
And I try to insert some values into the table:
insert into cusers (uid, lid, cnt, dyn, ts)
values
(A, B, C, (
select C - cnt
from cusers
where uid = A and lid = B
order by ts desc
limit 1
), now())
on conflict do nothing
Quite often (with a probability of 98%) a row cannot be inserted into cusers because it violates the primary key constraint, so the heavy select queries do not need to be executed at all. But as far as I can see, PostgreSQL first evaluates the select query for the dyn column and only then rejects the row because of the (uid, lid, cnt) violation.
What is the best way to insert rows quickly in such situation?
Another explanation
I have a system where one row depends on another. Here is an example:
(x, x, 2, 2, <timestamp>)
(x, x, 5, 3, <timestamp>)
Two columns contain an absolute value (2 and 5) and a relative value (2, and 5 - 2). Each time I insert a new row it should:
avoid duplicate rows (see the primary key constraint)
if the new row differs, compute the difference and put it into the dyn column (so I take the last inserted row for the user, according to the timestamp, and subtract the values).
Another solution I've found is to use returning uid, lid, ts for the inserts and get the user ids that were really inserted - this is how I know they differ from existing rows. Then I update the inserted values:
update cusers
set dyn = (
select max(cnt) - min(cnt)
from (
select cnt
from cusers
where uid = A and lid = B
order by ts desc
limit 2) Table
)
where uid = A and lid = B and ts = TS
But it is not a fast approach either, as it scans the ts column to find the two last inserted rows for each user. I need a fast insert query, as I insert millions of rows at a time (but I do not write duplicates).
What could the solution be? Maybe I need a new index for this? Thanks in advance.

Qlik - create calculated dimension to show bar chart from month start until filter date (5-4-4 calendar)

I have a master calendar (5-4-4) that looks like the image below.
I have a Date column in the Sales table. I'm using that Date column in the selection pane (filter).
Example scenario:
If I select the date '15-10-2020' in the selection pane, the bar chart shows only the sales of '15-10-2020'. But I need to look up the master calendar and show the dimension from the start_month date up to the selected date.
Expected output: the bar chart needs to show dimensions from '28-09-2020' to '15-10-2020'.
The approach is to generate dates from Start_Month to TransDate (a made-up field name).
Let's say that this is the data we have:
Transactions:
Load * inline [
TransDate , Value
15-10-2020, 100
27-07-2021, 50
];
MasterCalendar_Temp:
Load * inline [
Start_Month, End_Month , Month_number
28-09-2020 , 01-11-2020, 1
02-11-2020 , 29-11-2020, 2
30-11-2020 , 27-12-2020, 3
28-12-2020 , 31-01-2021, 4
01-02-2021 , 28-02-2021, 5
01-03-2021 , 28-03-2021, 6
29-03-2021 , 02-05-2021, 7
03-05-2021 , 30-05-2021, 8
31-05-2021 , 27-06-2021, 9
28-06-2021 , 01-08-2021, 10
02-08-2021 , 29-08-2021, 11
30-08-2021 , 26-09-2021, 12
];
The first step is to find which interval each TransDate falls into. For this we'll use the IntervalMatch function:
Inner Join
IntervalMatch ( TransDate )
Load
Start_Month,
End_Month
Resident
MasterCalendar_Temp
;
At this point the MasterCalendar_Temp table holds, for each TransDate, the Start_Month and End_Month of its period, so we now know the period each TransDate belongs to.
The next step is to load the MasterCalendar_Temp data into a separate table, concatenating Start_Month and TransDate into one field:
NoConcatenate
MasterCalendar:
Load
Start_Month,
End_Month,
Start_Month & '_' & TransDate as Start_TransDate_Temp
Resident MasterCalendar_Temp;
// we don't need this table anymore
Drop Table MasterCalendar_Temp;
Once we have it, we can start creating our dates:
// loop through each value in Start_TransDate_Temp field
// for each step extract Start_Month and TransDate values
// use these two values to generate the dates between them
for i = 1 to FieldValueCount('Start_TransDate_Temp')
let value = FieldValue('Start_TransDate_Temp', $(i));
let startDate = num(SubField('$(value)', '_', 1));
let transDate = num(SubField('$(value)', '_', 2));
Dates:
LOAD
date('$(transDate)', 'DD-MM-YYYY') as TransDate,
date($(startDate) + IterNo() - 1, 'DD-MM-YYYY') AS DisplayDates
AUTOGENERATE (1)
WHILE
$(startDate) + IterNo() -1 <= $(transDate)
;
next
// we don't need this table anymore
Drop Table MasterCalendar;
And that's it!
After the script is reloaded we'll have two tables: the Transactions table is untouched, and the Dates table holds, for each TransDate, the range of dates from the corresponding Start_Month up to the TransDate.
If we construct a simple bar chart (with DisplayDates as dimension and sum(Value) as measure) and do not apply any selections, all dates are shown; and if we select one TransDate, only the dates from its Start_Month up to that TransDate remain.
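For comparison only, here is a minimal pandas sketch of the same idea: match each TransDate to its 5-4-4 period, then generate every date from Start_Month up to the TransDate. The two calendar rows below are a made-up subset of the periods listed above, not part of the Qlik solution.

import pandas as pd

# a made-up subset of the Transactions and MasterCalendar_Temp data above
trans = pd.DataFrame({"TransDate": pd.to_datetime(["2020-10-15", "2021-07-27"])})
calendar = pd.DataFrame({
    "Start_Month": pd.to_datetime(["2020-09-28", "2021-06-28"]),
    "End_Month":   pd.to_datetime(["2020-11-01", "2021-08-01"]),
})

# "interval match": keep the calendar row whose period contains each TransDate
matched = trans.merge(calendar, how="cross")
matched = matched[(matched["TransDate"] >= matched["Start_Month"])
                  & (matched["TransDate"] <= matched["End_Month"])]

# expand each match into one row per date from Start_Month up to the TransDate
dates = (
    matched.assign(DisplayDates=[pd.date_range(s, t)
                                 for s, t in zip(matched["Start_Month"], matched["TransDate"])])
           .explode("DisplayDates")[["TransDate", "DisplayDates"]]
)
print(dates)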

Query function to contain date operations [DATE + X number of months]

I am querying a Google spreadsheet, using a relatively simple expression:
=QUERY(Sheet1!A1:J200, "Select A, J", 1)
This query produces a list of Offices and Last N dates in columns L and M - see the picture below.
What I do next is:
add 6 months to each of the Last N dates;
=IF(M2="","",DATE(YEAR(M2)+0,MONTH(M2)+6,DAY(M2)+0))
check whether TODAY() is equal to or greater than each resultant date;
if YES, place "ALARM" into column O, which is then used as a marker elsewhere by filtering the rows with this value as an identifier.
=IF(today()>=X2,"ALARM","")
I was wondering if it is possible to create a query where 6 months would already be added to the values in column J and, possibly, the resultant list filtered IF value[i] in column J is greater than or equal to TODAY(). By achieving this, column J would contain only Last N dates + 6 months AND >= TODAY().
All examples I have checked seem to operate with dates as filters.
=QUERY({Sheet1!A1:A,
ARRAYFORMULA(DATE(YEAR(Sheet1!J1:J), MONTH(Sheet1!J1:J)+6, DAY(Sheet1!J1:J)))},
"select Col1,Col2,'ALARM'
where Col1 is not null
and Col2 >=date '"&TEXT(TODAY(), "yyyy-mm-dd")&"'
label Col2'ABCD', 'ALARM''alarm'
format Col2 'dd-mmm-yyyy'", 1)
=QUERY({FleetStatus!A1:D, ARRAYFORMULA(
DATE(YEAR(FleetStatus!J1:J), MONTH(FleetStatus!J1:J)+6, DAY(FleetStatus!J1:J)))},
"select Col1,Col5,'ALARM'
where Col1 is not null
and Col1 !='IVAN GUBKIN'
and Col1 !='VYACHESLAV TIKHONOV'
and Col4 != 'L'
and Col5 <=date '"&TEXT(TODAY(), "yyyy-mm-dd")&"'
label Col5'+6M', 'ALARM''Alarm'
format Col5 'dd-mmm-yyyy'", 1)
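As a side note, the same "+6 months, compare to today" logic is easy to prototype outside Sheets. The sketch below is purely illustrative and uses hypothetical stand-ins for columns A (Office) and J (Last N date); it is not part of the QUERY solution above.

import pandas as pd

# hypothetical stand-ins for columns A (Office) and J (Last N date)
df = pd.DataFrame({
    "Office": ["Office 1", "Office 2"],
    "LastN":  pd.to_datetime(["2020-01-15", "2030-01-15"]),
})

# add 6 months to each Last N date, then flag the rows whose shifted date has already passed
df["LastN_plus_6m"] = df["LastN"] + pd.DateOffset(months=6)
df["Alarm"] = (pd.Timestamp.today() >= df["LastN_plus_6m"]).map({True: "ALARM", False: ""})
print(df)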

How to accumulate values in T-SQL

I have to solve a problem and don't know how to do it. I'm using SQL Server 2012.
I have data with this schema:
-----------------------------------------------------------------------------------
DriverId | BeginDate | EndDate | NextBegin | Rest in | Drive Time | Drive
| | | Date | Hours | in Minutes | KM
-----------------------------------------------------------------------------------
integer datetime datetime datetime integer integer decimal(10,3)
Rest in Hours = NextBeginDate - EndDate
Drive Time in Minutes = EndDate - BeginDate
I have to search for the first rest >= 36 hours, then:
Do
    Compute how many days there are
    SUM(DriveTime)
    SUM(TotalKM)
    until the next rest >= 36 hours
    IF no more rests, EXIT DO
Loop
Everything from the beginning to the first rest is discarded.
Everything from the last rest to the end is discarded.
I have the data in an Excel sheet you can download from here: Download Excel with data example
I'm sorry for my English; I hope you can understand and help me. Thank you in advance.
There are several parts to the query. The first part pulls out the rows where Rest is >= 36 and assigns a row number. The result is stored in a CTE called BigRest.
with BigRest(RowNumber, DriverId, BeginDate, EndDate)
as
(
select ROW_NUMBER() over(partition by d.DriverId order by d.DriverId, d.BeginDate) RowNumber,d.DriverId, d.BeginDate, d.EndDate
from Drive d
where d.Rest >= 36
)
Then I assign the row number from BigRest to each row in Drive (which is what I'm calling the table that has all the data in it) based on the BeginDate. So the data is effectively segmented by the days where Rest >= 36. Each segment gets a number called DriveGroup.
;with Grouped(DriverId, BeginDate, EndDate, DriveTime, DriveKM, DriveGroup)
as
(
select d.DriverId, d.BeginDate, d.EndDate, d.Drivetime, d.DriveKM, (select Top 1 RowNumber from BigRest b where b.DriverId = d.DriverId and b.BeginDate >= d.BeginDate order by b.BeginDate)
from Drive d
)
Finally, I select the data from Grouped, cross applying it with some aggregate data from itself. We can filter out the rows where the DriveGroup is 1 or null because those represent the beginning and end rows that don't matter (the "do nothing" rows).
select distinct DriverId, MinBeginDate BeginDate, MaxEndDate EndDate, DATEDIFF(D, MinBeginDate, MaxEndDate)+1 Days, DriveTimeSum Drive, DriveKMSum KM
from
(
select g.DriverId, g.BeginDate, g.EndDate, g.DriveGroup, g.DriveTime, c.DriveTimeSum, c.DriveKMSum, c.MinBeginDate, c.MaxEndDate
from Grouped g
cross apply(select SUM(g2.DriveTime) DriveTimeSum,
SUM(g2.DriveKM) DriveKMSum,
MIN(g2.BeginDate) MinBeginDate,
MAX(g2.EndDate) MaxEndDate
from Grouped g2
where g2.DriverId = g.DriverId
and g2.DriveGroup = g.DriveGroup) as c
where g.DriveGroup is not null
and g.DriveGroup > 1
) x
Here's a SQL Fiddle
I'd encourage you to look at the results at each step of the query to see what's actually going on.
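If it helps to cross-check the SQL, the same segmentation can be prototyped in pandas. This is only a rough sketch with made-up rows: a backward-filled group number plays the role of the DriveGroup assigned from the BigRest row numbers above.

import pandas as pd

# made-up rows standing in for the Drive table used above
drive = pd.DataFrame({
    "DriverId":  [1, 1, 1, 1, 1],
    "BeginDate": pd.to_datetime(["2023-01-01", "2023-01-03", "2023-01-05",
                                 "2023-01-07", "2023-01-09"]),
    "EndDate":   pd.to_datetime(["2023-01-02", "2023-01-04", "2023-01-06",
                                 "2023-01-08", "2023-01-10"]),
    "Rest":      [40, 10, 12, 40, 8],
    "DriveTime": [300, 200, 250, 180, 90],
    "DriveKM":   [500.0, 320.0, 410.0, 280.0, 150.0],
}).sort_values(["DriverId", "BeginDate"])

# number the >= 36 h rests per driver and fill that number backwards, so every row
# knows which upcoming long rest closes its segment (the analogue of DriveGroup)
is_big = drive["Rest"] >= 36
big_rank = is_big.groupby(drive["DriverId"]).cumsum()
drive["DriveGroup"] = big_rank.where(is_big).groupby(drive["DriverId"]).bfill()

# drop everything up to and including the first long rest (group 1) and everything
# after the last one (NaN), then aggregate per segment
result = (
    drive[drive["DriveGroup"].notna() & (drive["DriveGroup"] > 1)]
    .groupby(["DriverId", "DriveGroup"], as_index=False)
    .agg(BeginDate=("BeginDate", "min"), EndDate=("EndDate", "max"),
         Drive=("DriveTime", "sum"), KM=("DriveKM", "sum"))
)
result["Days"] = (result["EndDate"] - result["BeginDate"]).dt.days + 1
print(result)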

Perl + PostgreSQL-- Selective Column to Row Transpose

I'm trying to find a way to use Perl to further process PostgreSQL output. If there's a better way to do this via PostgreSQL, please let me know. I basically need to take certain columns (Realtime, Value) and concatenate their values into one row per group while keeping ID and CAT.
First time posting, so please let me know if I missed anything.
Input:
ID CAT Realtime Value
A 1 time1 55
A 1 time2 57
B 1 time3 75
C 2 time4 60
C 3 time5 66
C 3 time6 67
Output:
ID CAT Time Values
A 1 time1,time2 55,57
B 1 time3 75
C 2 time4 60
C 3 time5,time6 66,67
You could do this most simply in Postgres like so (using array columns):
CREATE TEMP TABLE output AS SELECT
id, cat, ARRAY_AGG(realtime) as time, ARRAY_AGG(value) as values
FROM input GROUP BY id, cat;
Then select whatever you want out of the output table.
SELECT id
, cat
, string_agg(realtime, ',') AS realtimes
, string_agg(value, ',') AS values
FROM input
GROUP BY 1, 2
ORDER BY 1, 2;
string_agg() requires PostgreSQL 9.0 or later and concatenates all values into a delimiter-separated string, while array_agg() (v8.4+) creates an array out of the input values.
About 1, 2 - I quote the manual on the SELECT command:
GROUP BY clause
expression can be an input column name, or the name or ordinal number
of an output column (SELECT list item), or ...
ORDER BY clause
Each expression can be the name or ordinal number of an output column
(SELECT list item), or
Emphasis mine. So that's just notational convenience. Especially handy with complex expressions in the SELECT list.
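For what it's worth, the same grouping can also be reproduced outside the database. Here is a minimal pandas sketch (not the Perl route the question mentions) using the sample data from the question:

import pandas as pd

# the sample input from the question
df = pd.DataFrame({
    "ID":       ["A", "A", "B", "C", "C", "C"],
    "CAT":      [1, 1, 1, 2, 3, 3],
    "Realtime": ["time1", "time2", "time3", "time4", "time5", "time6"],
    "Value":    [55, 57, 75, 60, 66, 67],
})

# the equivalent of string_agg(): comma-separated values per (ID, CAT) group
out = (
    df.astype({"Value": str})
      .groupby(["ID", "CAT"], as_index=False)
      .agg(Time=("Realtime", ",".join), Values=("Value", ",".join))
)
print(out)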