Aggregate the columns based on a condition in SPSS Modeler

I would like to add up the values based on a column.
For example, the input table looks like this:
USERS  Order_date  Number_of_orders
-----  ----------  ----------------
alice  01-01-2014  2
alice  19-01-2014  5
alice  20-05-2014  8
bob    03-01-2014  1
bob    08-04-2014  9
The output should be like:
USERS  Order_date  Number_of_orders(NEW)
-----  ----------  ---------------------
alice  01-01-2014  2
alice  19-01-2014  7
alice  20-05-2014  15
bob    03-01-2014  1
bob    08-04-2014  10
Number_of_orders(NEW) is the total number of orders on that day plus the total number of the user's previous orders, i.e. a running (cumulative) total per user.
Please let me know how to do this in SPSS Modeler.

I know this is fairly late, but what I would do is:
1) Create a conditional Derive node: if USERS /= @OFFSET(USERS,1) or @NULL(@OFFSET(USERS,1)) then Number_of_orders else undef
2) Create a Filler node on the derived field (Derive325) with the condition @NULL(Derive325) and the replacement @OFFSET(Derive325,1) + Number_of_orders
Just FYI: if I were searching for this, I would use the word cumulative, not aggregate.
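For reference only: if the same data sat in a SQL database that supports window functions, the running total could be computed in a single step. A minimal sketch, assuming a table named orders with the columns from the question:

SELECT USERS,
       Order_date,
       SUM(Number_of_orders) OVER (
           PARTITION BY USERS
           ORDER BY Order_date
       ) AS Number_of_orders_new   -- running total per user; orders on the same date sum together
FROM orders;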

Related

How to create an inline table from an existing table

I have a table in Qlik Sense loaded from the database.
Example:
ID  FRUIT    VEG      COUNT
--  -------  -------  -----
1   Apple             5
2   Figs              10
3            Carrots  20
4   Oranges           12
5            Corn     10
From this I need to make a filter that will display all the Fruit/Veg records along with records from other joined tables, when selected.
The filter needs to be something like this:
|FRUIT_XXX|
|VEG_XXX |
Any help will be appreciated.
I do not know how to do it in Qlik Sense, but in SQL it would be like this:
SELECT
    ID,
    CASE WHEN FRUIT IS NULL THEN VEG ELSE FRUIT END AS FruitOrVeg,
    COUNT
FROM tablename
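If the goal is the prefixed FRUIT_/VEG_ list from the question rather than a single combined column, the same idea could be sketched with a union (table name taken from the query above; the string concatenation operator varies by database):

SELECT ID, 'FRUIT_' || FRUIT AS FruitOrVeg, COUNT FROM tablename WHERE FRUIT IS NOT NULL
UNION ALL
SELECT ID, 'VEG_' || VEG, COUNT FROM tablename WHERE VEG IS NOT NULL;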
I'm not sure if it's possible to make it dynamic.
Usually I solve these by creating a new field that combines the values from both fields into one field:
RawData:
Load * Inline [
ID , FRUIT ,VEG , COUNT
1 , Apple , , 5
2 , Figs , , 10
3 , ,Carrots , 20
4 , Oranges , , 12
5 , ,Corn , 10
];
Combined:
Load
ID,
'FRUIT_' & FRUIT as Combined
Resident
RawData
Where
FRUIT <> ''
;
Concatenate
Load
ID,
'VEG_' & VEG as Combined
Resident
RawData
Where
VEG <> ''
;
This will create a new table (Combined) that is linked to the main table by the ID field. The new Combined field will hold values such as FRUIT_Apple, FRUIT_Figs and VEG_Carrots, which can then be used as the filter list in the UI.
P.S. If further processing is needed you can join the Combined table to the RawData table. This way the Combined field will become part of the RawData table. To achieve this just extend the script a bit:
join (RawData)
Load * Resident Combined;
Drop Table Combined;

How to update the cheapest item owned by someone in postgres?

Let's say I have the following table in Postgres:
fruit
fruit_id  owner_id  fruit_price  notes
--------------------------------------
1         5         15
2         5         30
3         5         20
4         8         10
5         8         80
I am looking for a way to update the cheapest fruit owned by someone.
That is, I am looking for an operation that would allow me to set the notes column for the cheapest fruit owned by an individual. So this should only ever update one row (updating multiple rows is fine if there are several ties for the smallest value).
For example (pseudocode):
UPDATE fruit SET notes = 'wow cheap' WHERE owner_id = 5 AND fruit_price IS cheapest;
And this would update the first row in the above example data, because fruit_id of 1 is the cheapest fruit owned by user 5.
One possible way is simply to use a correlated subquery:
update fruit
set notes = 'some notes'
where owner_id = 5
  and fruit_price = (
      select min(fruit_price)
      from fruit f2
      where f2.owner_id = fruit.owner_id
  );
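If you prefer to avoid the correlated subquery, a window function works too. A sketch, assuming fruit_id is the primary key; rank() keeps every row tied for the lowest price, which matches the requirement above:

with ranked as (
    select fruit_id,
           rank() over (partition by owner_id order by fruit_price) as price_rank
    from fruit
    where owner_id = 5
)
update fruit
set notes = 'wow cheap'
from ranked
where fruit.fruit_id = ranked.fruit_id
  and ranked.price_rank = 1;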

Best way to maintain an ordered list in PostgreSQL?

Say I have a table called list, where there are items like these (the ids are random uuids):
id rank text
--- ----- -----
x 0 Hello
x 1 World
x 2 Foo
x 3 Bar
x 4 Baz
I want to maintain the property that the rank column always goes from 0 to n-1 (n being the number of rows). If a client asks to insert an item with rank = 3, then the pg server should push the current ranks 3 and 4 up to 4 and 5, respectively:
id rank text
--- ----- -----
x 0 Hello
x 1 World
x 2 Foo
x 3 New Item!
x 4 Bar
x 5 Baz
My current strategy is to have a dedicated insertion function add_item(item) that scans through the table, filters out the items with a rank equal to or greater than that of the item being inserted, and increments those ranks by one. However, I think this approach will run into all sorts of problems, such as race conditions.
Is there a more standard practice or a more robust approach?
Note: The rank column is completely independent of the rest of the columns, and insertion is not the only operation I need to support. Think of it as the back-end of a sortable to-do list, where the user can add/delete/reorder items on the fly.
Doing verbatim what you suggest might be difficult or not possible at all, but I can suggest a workaround. Maintain a new column ts which stores the time a record is inserted. Then insert the current time along with the rest of the record, e.g.
id rank text ts
--- ----- ----- --------------------
x 0 Hello 2017-12-01 12:34:23
x 1 World 2017-12-03 04:20:01
x 2 Foo ...
x 3 New Item! 2017-12-12 11:26:32
x 3 Bar 2017-12-10 14:05:43
x 4 Baz ...
Now we can easily generate the ordering you want via a query:
SELECT id, rank, text,
ROW_NUMBER() OVER (ORDER BY rank, ts DESC) new_rank
FROM yourTable;
This would generate ranks 1 to 6 for the sample table above (subtract 1 from ROW_NUMBER() if you need 0-based ranks). The basic idea is to keep using the existing rank column, but to let the timestamp break the tie in ordering should the same rank appear more than once.
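The insert side of this scheme stays a plain insert that just records the timestamp. A sketch using the names above (yourTable and now() are assumptions; the tie on rank 3 is resolved by ts in the query):

-- the new row reuses rank 3; ROW_NUMBER() above breaks the tie by ts
INSERT INTO yourTable (id, rank, text, ts)
VALUES ('x', 3, 'New Item!', now());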
You can wrap it up in a function if you think it's worth it:
t=# with u as (
update r set rank = rank + 1 where rank >= 3
)
insert into r values('x',3,'New val!')
;
INSERT 0 1
the result:
t=# select * from r;
id | rank | text
----+------+----------
x | 0 | Hello
x | 1 | World
x | 2 | Foo
x | 3 | New val!
x | 4 | Bar
x | 5 | Baz
(6 rows)
Also worth mentioning: you might run into concurrency ("race condition") problems on highly loaded systems. The code above is just a sample.
You can have a "computed rank", which is a double precision value, and a "displayed rank", which is an integer computed using the row_number window function on output.
When a row is inserted that should rank between two existing rows, compute its new rank as the arithmetic mean of their two ranks.
The advantage is that you don't have to update existing rows.
The downside is that you have to calculate the displayed ranks before you can insert a new row, so that you know where to insert it.
This solution (like all the others) is subject to race conditions.
To deal with these, you can either use table locks or serializable transactions.
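A minimal sketch of that idea, assuming the rank column is (or is migrated to) double precision:

-- insert between the rows currently holding ranks 2 and 3: take the midpoint
INSERT INTO list (id, rank, text)
VALUES ('x', (2.0 + 3.0) / 2.0, 'New Item!');  -- stored rank becomes 2.5

-- the integer rank shown to clients is computed on output
SELECT id, text,
       row_number() OVER (ORDER BY rank) - 1 AS display_rank
FROM list;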
The only way to prevent a race condition would be to lock the table:
https://www.postgresql.org/docs/current/sql-lock.html
Of course this would slow you down if there are lots of updates and inserts.
If you can somehow limit the scope of your updates, then you can do a SELECT ... FOR UPDATE on that scope. For example, if the records have a parent_id, you can do a select for update on the parent record first, and any other insert that does the same select for update will have to wait until your transaction is done.
https://www.postgresql.org/docs/current/explicit-locking.html#:~:text=5.-,Advisory%20Locks,application%20to%20use%20them%20correctly.
Read the section on advisory locks to see if you can use those in your application. They are not enforced by the system so you'll need to be careful of how you write your application.
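A hypothetical sketch of serializing writers this way with a transaction-scoped advisory lock (deriving the lock key from the list id via hashtext is an assumption):

BEGIN;
-- every writer for list 'x' takes the same advisory lock, so reorder/insert steps serialize
SELECT pg_advisory_xact_lock(hashtext('x'));
UPDATE list SET rank = rank + 1 WHERE id = 'x' AND rank >= 3;
INSERT INTO list (id, rank, text) VALUES ('x', 3, 'New Item!');
COMMIT;  -- the advisory lock is released automatically at commit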

Calculate value based on existence of records matching given criteria - FileMaker Pro 13

How can I write a calculation field in a table that outputs '1' if there are other (related) records in the same table that meet a given set of criteria and '0' otherwise?
Here's my problem explained in more detail:
I have a table containing 'students' and another containing 'exam results'. The 'exam results' table looks like this:
StudentID  SubjectID  Level  Result
3234       1          2      A-
3234       2          4      B+
4739       1          4      C+
A student can only pass a Level 4 exam in subject 2 if they have also passed a Level 2 exam in subject 1 with a B+ or higher. I want to define a field in the 'students' table that contains a '1' if there exists an exam result belonging to the right student that meets these criteria and a '0' otherwise.
What would be the best way to do this?
Let us take an example of a Results table where the results are also calculated as a numeric value, e.g.
StudentID  SubjectID  Level  Result  cResultNum
3234       1          2      A-      95
3234       2          4      B+      85
4739       1          4      C+      75
and an Exams table with the following fields (among others):
RequiredSubjectID
RequiredLevel
RequiredResultNum
Given these, you can construct a relationship between Exams and (another occurrence of) Results as:
Exams::RequiredSubjectID = Results 2::SubjectID
AND
Exams::RequiredLevel = Results 2::Level
AND
Exams::RequiredResultNum ≤ Results 2::cResultNum
This allows each exam record to calculate a list of students that are eligible to take that exam as =
List ( Results 2::StudentID )
I want to define a field in the 'students' table that contains a '1'
if there exists an exam result belonging to the right student that
meets these criteria and a '0' otherwise.
This request is unclear, because there are many exams a student may want to take, and a field in the Students table can calculate only one result.
You need to do a self-join in the table for the field you want to check, for example:
Exam::Level = Exam2::Level
Exam::Student = Exam2::Student
And for the "was passed" criteria I think you could do an "If" on the calculation like this:
If ( Last(Exam2::Result) = "D" and ...(all the pass values) ; 1 ; 0 )
Edit:
It could be done with just the non-passing value (I missed that); it would be like this:
If ( Last(Exam2::Result) = "F" ; 0 ; 1 )
I hope this helps you.
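For comparison only, here is the existence test behind that 1/0 flag expressed as SQL rather than a FileMaker calculation (the students/exam_results table names and the numeric grade_points column, with B+ taken as 85 like cResultNum above, are assumptions):

SELECT s.StudentID,
       CASE WHEN EXISTS (
           SELECT 1
           FROM exam_results r
           WHERE r.StudentID = s.StudentID
             AND r.SubjectID = 1
             AND r.Level = 2
             AND r.grade_points >= 85   -- B+ or higher
       ) THEN 1 ELSE 0 END AS eligible
FROM students s;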

Get entire record for Max value (date) in one column

I am trying to query transaction data. For each item I want multiple columns from the latest-dated transaction: PONumber, Vendor, Price, and the last time it was purchased. For example:
Data:
PONumber  Item  Vendor     Price  DateOrdered
1         ABC   Wal-Mart   1.00   10/29/12
2         ABC   BestBuy    1.25   10/20/12
3         XYZ   Wal-Mart   2.00   10/30/12
4         XYZ   HomeDepot  2.50   9/14/12
Desired Result Set:
PONumber  Item  Vendor    Price  DateOrdered
1         ABC   Wal-Mart  1.00   10/29/12
3         XYZ   Wal-Mart  2.00   10/30/12
I am trying to use the Max function on DateOrdered, but when I include the vendor I get the last purchase for each vendor and item (too many rows). I need one record for each item. Any ideas on how to accomplish this? I am using MS Access 2007 with ODBC to Oracle tables. Thanks in advance.
How about:
SELECT
    tran.PONumber,
    tran.Item,
    tran.Vendor,
    tran.Price,
    tran.DateOrdered
FROM tran
WHERE tran.DateOrdered = (
    SELECT Max(DateOrdered)
    FROM tran t
    WHERE t.Item = tran.Item
);
Where tran is your table.
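An equivalent join-based form, in case you prefer to avoid the correlated subquery. A sketch only: Access 2007 generally accepts a derived table like this, but the syntax may need minor adjustment:

SELECT t.PONumber, t.Item, t.Vendor, t.Price, t.DateOrdered
FROM tran AS t
INNER JOIN (
    SELECT Item, Max(DateOrdered) AS MaxDate
    FROM tran
    GROUP BY Item
) AS m
ON t.Item = m.Item AND t.DateOrdered = m.MaxDate;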