Select cases if value is greater than mean of group

Is there a way to include means of entire variables in Select Cases If syntax?
I have a dataset with three groups of n=20 each (a grouping variable grp with values 1, 2, or 3) and the results of a pre and post evaluation (variables pre and post). For every group I want to select only the 10 cases where the pre value is higher than the mean of that value within the group.
In pseudocode:
select if pre-value > mean(grp)
So if the mean in group 1 is 15, that's what all values from group one cases should be compared to. But at the same time if group 2's mean is 20, that is what values from cases in group 2 should be compared to.
Right now I only see the MEAN(arg1,arg2,...) function in the Select Cases If window, but no possibility to get the mean of an entire variable, much less with an additional condition (like group).
Is there a way to do this with Select Cases If syntax, or otherwise?

You need to create a new variable that will contain the mean of the group (so all cases in each group will have the same value in this variable: the group mean). You can then compare each case to this value.
First I'll create some example data to demonstrate on:
data list list /grp pre_value.
begin data
1 3
1 6
1 8
2 1
2 4
2 9
3 55
3 43
3 76
end data.
Now you can calculate the group mean and select:
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK=grp /GrpMean=MEAN(pre_value).
select if pre_value > GrpMean.

Related

SPSS aggregate on 2 variables

I am trying to compute an N_BREAK that has to satisfy a condition. I have a variable which indicates 1 or 0; let's call that variable HT. Every lopnr is also repeated across multiple rows, so the first 10 rows can be ID nr 1, the next 20 can be ID nr 2, and so on.
My question is: how do I create an N_BREAK with lopnr as the break variable that only counts rows where HT=1? I am not allowed to select only the 1s on variable HT beforehand, since I need the 0s in the file.
A few simple ways to do this:
1 - USE FILTER
filter cases by HT.
aggregate ....
when you get back to the original dataset, use:
filter off.
use all.
2 - COPY DATASET
dataset name orig.
dataset copy foragg.
dataset activate foragg.
select if HT.
aggregate....
3 - TEMPORARY SELECTION
temporary.
select if HT.
aggregate....

How to dynamically pivot based on rows data and parameter value?

I am trying to pivot using the crosstab function and am unable to achieve the requirement. Is there a way to perform a crosstab dynamically, and also get a dynamic result set?
I have tried using the crosstab built-in function but was unable to meet my requirement.
select * from crosstab ('select item,cd, type, parts, part, cnt
from item
order by 1,2')
AS results (item text,cd text, SUM NUMERIC, AVG NUMERIC);
Sample Data:
ITEM CD TYPE PARTS PART CNT
Item 1 A AVG 4 1 10
Item 1 B AVG 4 2 20
Item 1 C AVG 4 3 30
Item 1 D AVG 4 4 40
Item 1 A SUM 4 1 10
Item 1 B SUM 4 2 20
Item 1 C SUM 4 3 30
Item 1 D SUM 4 4 40
Expected Results:
ITEM CD PARTS TYPE_1 CNT_1 TYPE_1 CNT_1 TYPE_2 CNT_2 TYPE_2 CNT_2 TYPE_3 CNT_3 TYPE_3 CNT_3 TYPE_4 CNT_4 TYPE_4 CNT_4
Item 1 A 4 AVG 10 SUM 10 AVG 20 SUM 20 AVG 30 SUM 30 AVG 40 SUM 40
The PARTS value is based on a parameter passed by the user. If the user passes 2, for example, there will be 4 rows in the result set (2 parts for AVG and 2 parts for SUM).
Can I achieve this requirement using the CROSSTAB function, or is there a custom SQL statement that needs to be developed?
I'm not following your data, so I can't offer examples based on it. But I have been looking at pivot/cross-tab features over the past few days, and I was looking at dynamic cross tabs just before seeing your post. I'm hoping that your question gets some good answers; I'll start off with a bit of background.
You can use the crosstab extension for standard cross tabs; what went wrong when you tried it? Here's an example I wrote for myself the other day with a bunch of comments and aliases for clarity. The pivot is looking at item scans to see where the scans were "to", like the warehouse or the floor.
/* Basic cross-tab example for crosstab (text) format of pivot command.
Notice that the embedded query has to return three columns, see the aliases.
#1 is the row label, it shows up in the output.
#2 is the category, what determines how many columns there are. *You have to work this out in advance to declare them in the return.*
#3 is the cell data, what goes in the cross tabs. Note that this form of the crosstab command may return NULL, and coalesce does not work.
To get rid of the null count/sums/whatever, you need crosstab (text, text).
*/
select *
from crosstab ('select
specialty_name as row_label,
scanned_to as column_splitter,
count(num_inst)::numeric as cell_data
from scan_table
group by 1,2
order by 1,2')
as scan_pivot (
row_label citext,
"Assembly" numeric,
"Warehouse" numeric,
"Floor" numeric,
"QA" numeric);
As a manual alternative, you can use a series of FILTER statements. Here's an example that summarizes error_log records by day of the week. The "down" is the error name, the "across" (columns) are the days of the week.
select "error_name",
count(*) as "Overall",
count(*) filter (where extract(dow from "updated_dts") = 0) as "Sun",
count(*) filter (where extract(dow from "updated_dts") = 1) as "Mon",
count(*) filter (where extract(dow from "updated_dts") = 2) as "Tue",
count(*) filter (where extract(dow from "updated_dts") = 3) as "Wed",
count(*) filter (where extract(dow from "updated_dts") = 4) as "Thu",
count(*) filter (where extract(dow from "updated_dts") = 5) as "Fri",
count(*) filter (where extract(dow from "updated_dts") = 6) as "Sat"
from error_log
where "error_name" is not null
group by "error_name"
order by 1;
You can do the same thing with CASE, but FILTER is easier to write.
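As a quick illustration of that equivalence, here is the "Sun" column rewritten with CASE (the other days follow the same pattern); count() skips NULLs, so rows outside the condition simply don't contribute:
select "error_name",
       count(*) as "Overall",
       count(case when extract(dow from "updated_dts") = 0 then 1 end) as "Sun"
from error_log
where "error_name" is not null
group by "error_name"
order by 1;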
It looks like you want something basic, maybe the FILTER solution appeals? It's easier to read than calls to crosstab(), since that was giving you trouble.
FILTER may be slower than crosstab. Probably. (The crosstab extension is written in C, and I'm not sure how smart FILTER is about reading off indexes.) But I'm not sure as I haven't tested it out yet. (It's on my to do list, but I haven't had time yet.) I'd be super interested if anyone can offer results. We're on 11.4.
I wrote a client-side tool to build FILTER-based pivots over the past few days. You have to supply the down and across fields, an aggregate formula and the tool spits out the SQL. With support for coalesce for folks who don't want NULL, ROLLUP, TABLESAMPLE, view creation, and some other stuff. It was a fun project. Why go to that effort? (Apart from the fun part.) Because I haven't found a way to do dynamic pivots that I actually understand. I love this quote:
"Dynamic crosstab queries in Postgres has been asked many times on SO all involving advanced level functions/types. Consider building your needed query in application layer (Java, Python, PHP, etc.) and pass it in a Postgres connected query call. Recall SQL is a special-purpose, declarative type while app layers are general-purpose, imperative types." – Parfait
So, I wrote a tool to pre-calculate and declare the output columns. But I'm still curious about dynamic options in SQL. If that's of interest to you, have a look at these two items:
https://postgresql.verite.pro/blog/2018/06/19/crosstab-pivot.html
Flatten aggregated key/value pairs from a JSONB field?
Deep magic in both.

Increment, display, and print a number of pages up to a Parameter maximum

I am trying to set up a Crystal Report to generate pallet labels which include a "Pallet N of M" line on each page.
The order in the database only has one entry, which contains all of the other information for the label(s). My issue is that the number of pallets isn't known until print time, and each label needs the same order information as well as a pallet ID. What I am looking for is for the user to enter a parameter for the total number of pallets, then have Crystal generate the correct number of unique labels.
For example, if the order has 4 pallets then Crystal should generate 4 copies of the label, the only difference on each copy being the last line which should read 'Pallet 1 of 4', 'Pallet 2 of 4', 'Pallet 3 of 4', and 'Pallet 4 of 4' respectively.
The only solution I can think of would be to create a number of sections equal to the maximum number of pallets, all containing my real data as a sub-report, then suppress any which are greater than the entered parameter. I am trying to avoid this because we may have greater than 50 pallets on a given order, so that would be tedious to create and is just not a very clean solution.
-= EDIT =-
Big thanks to Aliqux for the solution. In the end I needed to change the syntax to be Oracle compatible and also include the LINKVALUE field in the report, but the answer was 99% there, especially considering I had not mentioned that my data source was Oracle. The Command syntax I ended up using (with my parameter called TotalPallets) was:
SELECT
-1 AS "LINKVALUE"
FROM table
WHERE rownum <= {?TotalPallets}
You will need to add a command to your tables.
1) Under Database Expert, in the database you used, select Add Command.
2) Click Create to create a parameter.
3) Fill in the requested values; you can leave the Default Value blank if you want, and change the Value Type to Number. (I will be calling my parameter Counter.)
4) Put the following in the SQL query where table is the main table you are using
SELECT TOP {?Counter}
-1 AS LINKVALUE
FROM table
5) Click OK and enter a value if prompted.
6) In Links, link a number field that will never go negative (we are using -1 as the base) to LINKVALUE in the Command table.
7) Link Options are as follows:
Join Type: Inner Join, Enforce Join: Not Enforced, Link Type: !=
8) In your report, create a formula with a shared number variable that starts from one.
// Running counter shared across label copies; reset to 0 once it reaches
// the parameter value so the next set of labels starts again at 1.
shared numbervar counters;
if counters = {?Counter} then counters := 0;
counters := counters + 1
9) You can use that formula and the Counter Parameter for your labels
totext({#Counts},0,"")+" of "+totext({?Counter},0,"")
Credit reference: https://blogs.sap.com/2014/01/24/duplicating-data-details-records-n-times-eg-repeating-labels-n-times/

Get consecutive sequence number in iReport

I need to display the row number sequence of each group.
I have used $V{PAGE_COUNT} with the evaluation time set to Now.
The report data that I am getting is
Group A
1.
2
3
4
...........
page ends ......
Group A
1
2
3
4
page ends ---------
Group B
1
2
3
4
5
page ends....
But my requirement is
Group A
1.
2
3
4
...........
page ends
Group A
5
6
7
8
9
page ends .......
Group B
1
2
3
4
5
page ends....
I need all rows of the same group to form a continuous sequence, and the sequence to restart from 1 when the group changes.
You should use the GroupName_COUNT variable in this case.
A quote from the JasperReports Ultimate Guide:
When declaring a report group, the engine automatically creates a count variable that
calculates the number of records that make up the current group (that is, the number of
records processed between group ruptures).
The name of this variable is derived from the name of the group it corresponds to,
suffixed with the _COUNT sequence. It can be used like any other report variable, in any
report expression, even in the current group expression (as shown in the BreakGroup
group of the /demo/samples/jasper sample).
More info is here: Data Grouping

How to pick items from warehouse to minimise travel in TSQL?

I am looking at this problem from a TSQL point of view; however, any advice would be appreciated.
Scenario
I have 2 sets of criteria which identify items in a warehouse to be selected.
Query 1 returns 100 items
Query 2 returns 100 items
I need to pick any 25 of the 100 items returned in query 1.
I need to pick any 25 of the 100 items returned in query 2.
- The items in query 1/2 will not be the same, ever.
Each item is stored in a segment of the warehouse.
A segment of the warehouse may contain numerous items.
I wish to select the 50 items (25 from each query) in such a way as to reduce the number of segments I must visit to select the items.
Suggested Approach
My initial idea has been to combine the 2 result sets and produce a list of
Segment ID, NumberOfItemsRequiredInSegment
I would then select 25 items from each query, giving preference to those in segments with the highest NumberOfItemsRequiredInSegment. I know this would not be optimal, but it would be an easy-to-implement heuristic.
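A minimal sketch of that combined tally, assuming the two result sets have been materialized as temp tables #q1 and #q2 with ItemID and SegmentID columns (names are illustrative, not from the actual schema):
-- How many required items (across both queries) sit in each segment.
SELECT SegmentID, COUNT(*) AS NumberOfItemsRequiredInSegment
FROM (SELECT ItemID, SegmentID FROM #q1
      UNION ALL
      SELECT ItemID, SegmentID FROM #q2) AS combined
GROUP BY SegmentID
ORDER BY NumberOfItemsRequiredInSegment DESC;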
Questions
1) I suspect this is a standard combinatorial problem, but I don't recognise it. Perhaps multiple knapsack? Does anyone recognise it?
2) Is there a better (easy-ish to implement) heuristic or solution, ideally in TSQL?
Many thanks.
This might also not be optimal, but I think it would at least perform fairly well.
Calculate this set for query 1:
Segment ID, NumberOfItemsRequiredInSegment
Take the top 25, just by sorting by NumberOfItemsRequiredInSegment. Call this subset A.
Take the top 25 from query 2 by joining to A and sorting by "case when A.segmentID is not null then 1 else 0 end", then by NumberOfItemsRequiredInSegmentFromQuery2.
Repeat this, but take the top 25 from query 2 first. Return the better-performing of the 2 sets.
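A rough T-SQL sketch of that greedy pass, assuming the two criteria queries have been materialized as temp tables #q1 and #q2 with ItemID and SegmentID columns (all names here are illustrative):
-- Pass 1: pick 25 items from query 1, preferring its busiest segments.
WITH q1_counts AS (
    SELECT SegmentID, COUNT(*) AS ItemsInSegment
    FROM #q1
    GROUP BY SegmentID
)
SELECT TOP (25) i.ItemID, i.SegmentID
INTO #pick1
FROM #q1 AS i
JOIN q1_counts AS c ON c.SegmentID = i.SegmentID
ORDER BY c.ItemsInSegment DESC, i.SegmentID, i.ItemID;

-- Pass 2: pick 25 items from query 2, preferring segments already visited by pass 1,
-- then query 2's busiest segments.
WITH q2_counts AS (
    SELECT SegmentID, COUNT(*) AS ItemsInSegment
    FROM #q2
    GROUP BY SegmentID
)
SELECT TOP (25) i.ItemID, i.SegmentID
INTO #pick2
FROM #q2 AS i
JOIN q2_counts AS c ON c.SegmentID = i.SegmentID
LEFT JOIN (SELECT DISTINCT SegmentID FROM #pick1) AS a ON a.SegmentID = i.SegmentID
ORDER BY CASE WHEN a.SegmentID IS NOT NULL THEN 1 ELSE 0 END DESC,
         c.ItemsInSegment DESC, i.SegmentID, i.ItemID;

-- Score this ordering; run the same two passes with the queries swapped and keep
-- whichever pair of picks touches fewer distinct segments.
SELECT COUNT(DISTINCT SegmentID) AS SegmentsVisited
FROM (SELECT SegmentID FROM #pick1
      UNION ALL
      SELECT SegmentID FROM #pick2) AS s;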
The one scenario where I think this fails would be if you got something like this:
Segment Count Query 1 Count Query 2
A 10 1
B 5 1
C 5 1
D 5 4
E 5 4
F 4 4
G 4 5
H 1 5
J 1 5
K 1 10
You need to make sure you choose A, D, and E when choosing the best segments from query 1. To deal with this you'd almost still need to join to query two, so you can get the count from there to use as a tie-breaker.