SPSS group by rows and concatenate string into one variable - aggregate

I'm trying to export SPSS metadata to a custom format using SPSS syntax. The dataset with value labels contains one or more labels for the variables.
However, now I want to concatenate the value labels into one string per variable. For example, for the variable SEX, combine or group the rows F/Female and M/Male into one value, F=Female;M=Male;. I have already concatenated the code and label into a new variable using Compute CodeValueLabel = concat(Code,'=',ValueLabel).
So the starting point for the source dataset looks like this:
+--------------+------+----------------+------------------+
| VarName      | Code | ValueLabel     | CodeValueLabel   |
+--------------+------+----------------+------------------+
| SEX          | F    | Female         | F=Female         |
| SEX          | M    | Male           | M=Male           |
| ICFORM       | 1    | Yes            | 1=Yes            |
| LIMIT_DETECT | 0    | Too low        | 0=Too low        |
| LIMIT_DETECT | 1    | Normal         | 1=Normal         |
| LIMIT_DETECT | 2    | Too high       | 2=Too high       |
| LIMIT_DETECT | 9    | Not applicable | 9=Not applicable |
+--------------+------+----------------+------------------+
The goal is to get a dataset something like this:
+--------------+-------------------------------------------------+
| VarName      | group_and_concatenate                           |
+--------------+-------------------------------------------------+
| SEX          | F=Female;M=Male;                                |
| ICFORM       | 1=Yes;                                          |
| LIMIT_DETECT | 0=Too low;1=Normal;2=Too high;9=Not applicable; |
+--------------+-------------------------------------------------+
I tried using CASESTOVARS, but that creates several separate variables instead of one single string variable. I'm starting to suspect that I'm running up against the limits of what SPSS can do, although maybe it's possible using some AGGREGATE or OMS trickery. Any ideas on how to do this?

First, I recreate your example data to demonstrate on:
data list list/varName CodeValueLabel (2a30).
begin data
"SEX" "F=Female"
"SEX" "M=Male"
"ICFORM" "1=Yes"
"LIMIT_DETECT" "0=Too low"
"LIMIT_DETECT" "1=Normal"
"LIMIT_DETECT" "2=Too high"
"LIMIT_DETECT" "9=Not applicable"
end data.
Now to work:
* sorting to make sure all labels are bunched together.
sort cases by varName CodeValueLabel.
string combineall (a300).
* adding ";" .
compute combineall=concat(rtrim(CodeValueLabel), ";").
* if this is the same varname as last row, attach the two together.
* (drop the " " argument to get "F=Female;M=Male;" with no space in between).
if $casenum>1 and varName=lag(varName)
  combineall=concat(rtrim(lag(combineall)), " ", rtrim(combineall)).
exe.
*now to select only relevant lines - first I identify them.
match files /file=* /last=selectthis /by varName.
*now we can delete the rest.
select if selectthis=1.
exe.
NOTE: make combineall wide enough to contain all the values of your most populated variable.
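
If you want to check the expected result outside SPSS, the same sort-then-concatenate idea can be sketched in plain Python. This is only an illustration, not part of the SPSS solution: the function name group_and_concatenate is made up here (borrowed from the column header in the question), and the labels are joined with ";" only, as in the target table.

```python
from itertools import groupby

# Rows copied from the example dataset: (VarName, CodeValueLabel).
rows = [
    ("SEX", "F=Female"),
    ("SEX", "M=Male"),
    ("ICFORM", "1=Yes"),
    ("LIMIT_DETECT", "0=Too low"),
    ("LIMIT_DETECT", "1=Normal"),
    ("LIMIT_DETECT", "2=Too high"),
    ("LIMIT_DETECT", "9=Not applicable"),
]


def group_and_concatenate(pairs):
    """Return {VarName: 'code=label;code=label;...'} for (VarName, label) pairs."""
    result = {}
    # Sorting first mirrors SORT CASES: groupby only merges adjacent rows.
    for var_name, group in groupby(sorted(pairs), key=lambda pair: pair[0]):
        result[var_name] = "".join(label + ";" for _, label in group)
    return result


combined = group_and_concatenate(rows)
for var_name, labels in combined.items():
    print(var_name, labels)
```

As with the SPSS syntax, the groups come out in sorted VarName order rather than in the original row order.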

Related

How do I make a specific query in a PSQL database

I need help with a project. Right now, I have a giant PostgreSQL database (~6 million rows, 25 columns) and I need to figure out how to get the following information:
For one specific range for one attribute, "20<np<400":
Find local minimum values in another variable, "itt"
Record whole row of data for that local minimum
Add extra column and add to this the next local maximum of itt
Edit: the database has this schema: id | at | itt | engine_torque | ng | np | fuel_flow | engine_oil_pressure | engine_oil_temp | airspeed | altitude | total_air_temp | weight_on_wheels | p25_p3 | bypass | chip_counter | number_of_flight_counter | number_of_engine_run | run_id | ectm_file_id | aircraft_sn | engine_sn | gas_generator_sn | power_section_sn | tail_number
By "local minima and maxima" I mean that, within a certain range of np, I'm looking for the highest and lowest values of itt in close proximity with respect to time, or "at".
Thanks in advance!!!

Combine multiple rows into single row in Google Data Prep

I have a table which has multiple payload values in separate rows. I want to combine those rows into a single row to have all the data together. The table looks something like this.
+------------+--------------+------+----+----+----+----+
| Date       | Time         | User | D1 | D2 | D3 | D4 |
+------------+--------------+------+----+----+----+----+
| 2020-04-15 | 05:39:45 UTC | A    | 2  |    |    |    |
| 2020-04-15 | 05:39:45 UTC | A    |    | 5  |    |    |
| 2020-04-15 | 05:39:45 UTC | A    |    |    | 8  |    |
| 2020-04-15 | 05:39:45 UTC | A    |    |    |    | 7  |
+------------+--------------+------+----+----+----+----+
And I want to convert it to something like this.
+------------+--------------+------+----+----+----+----+
| Date       | Time         | User | D1 | D2 | D3 | D4 |
+------------+--------------+------+----+----+----+----+
| 2020-04-15 | 05:39:45 UTC | A    | 2  | 5  | 8  | 7  |
+------------+--------------+------+----+----+----+----+
I tried "set" and "aggregate" but they didn't work as I wanted them to and I am not sure how to go forward.
Any help would be appreciated.
Thanks.
tl;dr:
Use the fill() function to fill all empty values in each of the D1-D4 columns within the wanted group (that is, the columns Date + Time + User), then dedup or aggregate to your heart's content.
Long version:
The quickest way to do this is with a window function called "fill()".
For each field in a column, this function tells it:
"Look down. Look up. Find the closest non-empty value, and copy it!"
You can of course limit its sight (look only 3 rows above, for example), but this example doesn't need the limitation, so your fill function will look like this:
FILL($col, -1, -1)
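Here "$col" references all the chosen columns, and "-1" means "unlimited sight".
Finally, the "~" in the column range D1~D4 says "from column D1 to column D4".
So the function will look like this:
[screenshot: the fill transform applied to columns D1~D4]
Which in turn will make your columns look like this:
[screenshot: columns D1-D4 after the fill]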
Now you can use the "dedup" transformation to remove any duplicates, so that only one copy of each "group" remains.
Alternatively, if you still want to use "group by", you can do that as well.
Hope this helps =]
P.S.
There are more ways to do this, which involve the "pivot" transformation and array unnesting. But in the process you'll lose your columns' names and will need to rename them.
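
To make the fill-then-dedup idea concrete outside Dataprep, here is a small plain-Python sketch. It follows the description above (copy the closest non-empty value within the Date + Time + User group, then drop duplicate rows) and is not a model of Dataprep's exact FILL semantics; the names fill_and_dedup, GROUP_KEYS and VALUE_COLUMNS are made up for this example.

```python
from itertools import groupby

# Rows from the question; None marks an empty cell.
rows = [
    {"Date": "2020-04-15", "Time": "05:39:45 UTC", "User": "A", "D1": 2, "D2": None, "D3": None, "D4": None},
    {"Date": "2020-04-15", "Time": "05:39:45 UTC", "User": "A", "D1": None, "D2": 5, "D3": None, "D4": None},
    {"Date": "2020-04-15", "Time": "05:39:45 UTC", "User": "A", "D1": None, "D2": None, "D3": 8, "D4": None},
    {"Date": "2020-04-15", "Time": "05:39:45 UTC", "User": "A", "D1": None, "D2": None, "D3": None, "D4": 7},
]

GROUP_KEYS = ("Date", "Time", "User")
VALUE_COLUMNS = ("D1", "D2", "D3", "D4")


def group_key(record):
    return tuple(record[key] for key in GROUP_KEYS)


def fill_and_dedup(records):
    """Fill empty cells from the nearest non-empty row in the same group, then dedup."""
    filled = []
    # groupby only merges adjacent rows, so sort by the group key first.
    for _, group in groupby(sorted(records, key=group_key), key=group_key):
        group = [dict(record) for record in group]  # copy so the input stays untouched
        for column in VALUE_COLUMNS:
            original = [record[column] for record in group]
            known = [i for i, value in enumerate(original) if value is not None]
            for i, record in enumerate(group):
                if original[i] is None and known:
                    # "Look down, look up": take the closest non-empty row.
                    nearest = min(known, key=lambda j: abs(j - i))
                    record[column] = original[nearest]
        filled.extend(group)
    # Dedup while keeping the first occurrence of each row.
    deduped = []
    for record in filled:
        if record not in deduped:
            deduped.append(record)
    return deduped


result = fill_and_dedup(rows)
print(result)
```

After the fill, all four rows of the group are identical, so the dedup step leaves a single row holding D1 to D4 together.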

SQL parameter table

I suspect this question is already well-answered but perhaps due to limited SQL vocabulary I have not managed to find what I need. I have a database with many code:description mappings in a single 'parameter' table. I would like to define a query or procedure to return the descriptions for all (or an arbitrary list of) coded values in a given 'content' table with their descriptions from the parameter table. I don't want to alter the original data, I just want to display friendly results.
Is there a standard way to do this?
Can it be accomplished with SELECT or are other statements required?
Here is a sample query for a single coded field:
SELECT TOP (5)
newid() as id,
B.BRIDGE_STATUS,
P.SHORTDESC
FROM
BRIDGE B
LEFT JOIN PARAMTRS P ON P.TABLE_NAME = 'BRIDGE'
AND P.FIELD_NAME = 'BRIDGE_STATUS'
AND P.PARMVALUE = B.BRIDGE_STATUS
ORDER BY
id
I want to produce 'decoded' results like:
| id                                   | BRIDGE_STATUS |
|--------------------------------------|---------------|
| BABCEC1E-5FE2-46FA-9763-000131F2F688 | Active        |
| 758F5201-4742-43C6-8550-000571875265 | Active        |
| 5E51634C-4DD9-4B0A-BBF5-00087DF71C8B | Active        |
| 0A4EA521-DE70-4D04-93B8-000CD12B7F55 | Inactive      |
| 815C6C66-8995-4893-9A1B-000F00F839A4 | Proposed      |
Rather than original, coded data like:
| id                                   | BRIDGE_STATUS |
|--------------------------------------|---------------|
| F50214D7-F726-4996-9C0C-00021BD681A4 | 3             |
| 4F173E40-54DC-495E-9B84-000B446F09C3 | 3             |
| F9C216CD-0453-434B-AFA0-000C39EFA0FB | 3             |
| 5D09554E-201D-4208-A786-000C537759A1 | 1             |
| F0BDB9A4-E796-4786-8781-000FC60E200C | 4             |
but for an arbitrary number of columns.

Cross tab with a list of values instead of summation

I want a Cross tab that lists field values and counts them, instead of just giving a count for the summation. I know I could make this with groups, but I can't list the values vertically that way. From my research I believe I have to use a Display String Formula.
SQL Field Data
------------------------------------------------
| Play # | Formation | Back Set | R/P | PLAY   |
------------------------------------------------
| 1      | TREY      | FG       | R   | TRUCK  |
------------------------------------------------
| 2      | T         | FG       | R   | RHINO  |
------------------------------------------------
| 3      | D         | FG       | P   | 5 STEP |
------------------------------------------------
| 4      | D         | FG       | P   | 5 STEP |
------------------------------------------------
| 5      | K JET     | NG       | R   | DOG    |
------------------------------------------------
Desired report structure:
---------------------------------------------
| Back Set & Formation | Run     | Pass     |
---------------------------------------------
| NG K JET             | BULLA 1 |          |
|                      | HELL 3  |          |
---------------------------------------------
| FG D                 |         | 5 STEP 2 |
---------------------------------------------
| NG K JET             | DOG     |          |
---------------------------------------------
| FG T                 | RHINO   |          |
---------------------------------------------
I don't see why a Crosstab is necessary for this, especially if the entire body of the report is just that table.
Group your records by Back Set and Formation. If that's not something natively configured in your table, make a new Formula field and group on that.
Drop the 3 relevant fields into whichever section you need to display. (It might be a Footer, based on whether or not you want repeats.)
Write a formula to determine whether or not Run or Pass are displayed, and place it in their suppression field. (Good luck getting a Crosstab to do that for you! It tends to prefer 0s over blanks.)
If there's more to the report than just this table, you can cheat the system by placing your "table" into a subreport. And of course you can stretch Line objects across the sections and they will stretch to form the table outlines.
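
The grouping logic behind those steps (one group per Back Set + Formation, with each play listed and counted under Run or Pass) can be sketched in plain Python. This only illustrates the data shaping, not how Crystal Reports does it; build_report is a made-up name, and the rows are the five sample plays from the question.

```python
from collections import Counter, defaultdict

# (play number, formation, back set, run/pass, play) from the question's sample table.
plays = [
    (1, "TREY", "FG", "R", "TRUCK"),
    (2, "T", "FG", "R", "RHINO"),
    (3, "D", "FG", "P", "5 STEP"),
    (4, "D", "FG", "P", "5 STEP"),
    (5, "K JET", "NG", "R", "DOG"),
]


def build_report(records):
    """Map 'BackSet Formation' to play counts split into 'R' (run) and 'P' (pass)."""
    report = defaultdict(lambda: {"R": Counter(), "P": Counter()})
    for _, formation, back_set, run_or_pass, play in records:
        report[f"{back_set} {formation}"][run_or_pass][play] += 1
    return report


report = build_report(plays)
for group_name, columns in report.items():
    # An empty Counter yields an empty string, i.e. a blank cell instead of a 0.
    run = ", ".join(f"{play} {count}" for play, count in columns["R"].items())
    passing = ", ".join(f"{play} {count}" for play, count in columns["P"].items())
    print(f"{group_name:<10} | {run:<10} | {passing}")
```

The blank Run or Pass cell for a group plays the role of the suppression formula in the steps above.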

LibreOffice - RANDBETWEEN return a name

I have a two-column list like this:
+----+-------+
| Nr | Name  |
+----+-------+
| 1  | Alice |
| 2  | Bob   |
| 3  | Joe   |
| 4  | Ann   |
| 5  | Jane  |
+----+-------+
I would like to pick a random name from this list.
For now I am only able to randomly select a number and then manually pick out the corresponding name, using this function: =RANDBETWEEN(A2;A10). How can I pick out the name instead?
Assuming that the data of your table are in cells E7:F11, the following formula can do what you need:
=VLOOKUP(RANDBETWEEN(1;5);E7:F11;2)
Further, in case you need to create a random permutation of the names you may also use the Calc extension Permutate at https://sourceforge.net/projects/permutate/.
Hope that helps.
Assuming your data starts with the Nr header in A1, I suggest:
=INDEX(B$2:B$6;RANDBETWEEN(1;5))
then there is no need for the Nr column in making the selection.
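
For comparison, the same "random position, then look up the name" logic as the INDEX formula looks like this in Python. It is just a sketch of the idea; pick_random_name is a hypothetical helper, and the list stands in for cells B2:B6.

```python
import random

# The Name column from the question (cells B2:B6 in the INDEX answer).
names = ["Alice", "Bob", "Joe", "Ann", "Jane"]


def pick_random_name(names_list, rng=random):
    """Pick one name the way INDEX(B$2:B$6;RANDBETWEEN(1;5)) does."""
    # RANDBETWEEN(1;5) is inclusive on both ends, like randint.
    position = rng.randint(1, len(names_list))
    # INDEX is 1-based, Python lists are 0-based.
    return names_list[position - 1]


picked = pick_random_name(names)
print(picked)
```

As in the spreadsheet, only the position is random, so the Nr column is not needed for the selection.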