Creating an average score variable

I have a dataset that goes as follows:
an ID, then his/her store number, then the number of units they bought there.
What I need from that dataset is a variable that states the following: the average number of buys per ID for a specific store (this score would thus be the same for different IDs with the same store number). Example:
IDs 1 and 2 have respectively 3 and 5 buys at store 1, so the variable I want would be (3+5)/2 = 4, and both IDs would get 4 as the average buys per ID for store 1.
I just cannot get the above done in SPSS.

This is to create some fake data to play with:
data list list/storeNum IDinstore NumPurchased.
begin data
1 1 3
1 2 5
1 3 7
1 4 9
2 1 4
2 2 8
2 3 12
end data.
Assuming what you want is the average number of units bought in each store, what you should do is aggregate (MODE=ADDVARIABLES adds the store mean back onto every case in the active dataset):
AGGREGATE
/OUTFILE=* MODE=ADDVARIABLES
/BREAK=storeNum
/NumPurchased_mean=MEAN(NumPurchased).
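With the fake data above, this should add a NumPurchased_mean variable holding 6 for every store 1 case ((3+5+7+9)/4) and 8 for every store 2 case ((4+8+12)/3):
storeNum IDinstore NumPurchased NumPurchased_mean
1        1         3            6
1        2         5            6
1        3         7            6
1        4         9            6
2        1         4            8
2        2         8            8
2        3         12           8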

Related

How to add an iterative id column which goes up when a value in another column resets to 1 in Postgresql

I have a SQL table with two columns, seq and sub_seq. I would like to add a third column, id, which goes up by 1 every time sub_seq starts again at 1, as shown in the table below.
seq  sub_seq  id
1    1        1
2    2        1
3    3        1
4    4        1
5    5        1
6    1        2
7    2        2
8    3        2
9    1        3
10   2        3
11   3        3
12   4        3
13   5        3
14   6        3
15   7        3
I could write a solution using PL/pgSQL; however, I would like to know if there is a way of doing this in standard SQL. Any help would be greatly appreciated.
If sub_seq is always a running sequence, then you can use the DENSE_RANK function and order over the difference of the two columns, assuming the pattern is consistently uniform.
SELECT seq, sub_Seq, DENSE_RANK() OVER (ORDER BY seq-sub_Seq) AS id
FROM tableDemo
This solution is based on the sample data you have provided, I think more sample data would be helpful to check the whole scenario.
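To see why this works, it can help to look at the value DENSE_RANK orders over; a quick check against the sample data (tableDemo as in the query above):
SELECT seq, sub_seq, seq - sub_seq AS gap   -- constant within a run, jumps when sub_seq resets
FROM tableDemo
ORDER BY seq;
-- For the sample data the gap is 0 for seq 1-5, 5 for seq 6-8 and 8 for seq 9-15,
-- so DENSE_RANK() over it yields id = 1, 2, 3 respectively.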

Running total using two columns

Given a table with data like:
A      B
Qty.   Running Total
5      5
5      10
5      15
I can create the running total using the formula =SUM($A$2:A2) and then drag down to get the running total after each quantity (here Qty.)
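Spelled out with the sample above (assuming the data starts in row 2), that fill-down gives:
B2: =SUM($A$2:A2)  ->  5
B3: =SUM($A$2:A3)  ->  10
B4: =SUM($A$2:A4)  ->  15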
What can I do to calculate the running total using two columns which may or may not be consecutive, as shown below:
A       B      C       D
Qty. 1  Other  Qty. 2  RT
2       blah   2       4
2       phew   2       8
3       xyz    2       13
Place the formula =SUM(A2,C2,D1) in cell D2. Do not worry that the formula refers to the non-numeric cell D1: the SUM() function will not break, unlike ordinary addition (=A2+C2+D1). Now just fill the formula down.
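Filled down, the references shift one row at a time, so each cell adds the two new quantities to the previous running total; with the sample above:
D2: =SUM(A2,C2,D1)  ->  2 + 2      =  4
D3: =SUM(A3,C3,D2)  ->  2 + 2 + 4  =  8
D4: =SUM(A4,C4,D3)  ->  3 + 2 + 8  = 13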

Calculating group means with own group excluded in MATLAB

To be generic, the issue is: I need to create group means that exclude own-group observations before calculating the mean.
As an example: let's say I have firms, products and product characteristics. Each firm (f=1,...,F) produces several products (i=1,...,I). I would like to create a group mean for a certain characteristic of the product i of firm f, using all products of all firms, excluding firm f product observations.
So I could have a dataset like this:
firm prod width
1 1 30
1 2 10
1 3 20
2 1 25
2 2 15
2 4 40
3 2 10
3 4 35
To reproduce the table:
firm=[1,1,1,2,2,2,3,3]
prod=[1,2,3,1,2,4,2,4]
hp=[30,10,20,25,15,40,10,35]
x=[firm' prod' hp']
Then I want to estimate a mean which will use values of all products of all other firms, that is excluding all firm 1 products. In this case, my grouping is at the firm level. (This mean is to be used as an instrumental variable for the width of all products in firm 1.)
So, the mean that I should find is: (25+15+40+10+35)/5=25
Then repeat the process for other firms.
firm prod width mean_desired
1 1 30 25
1 2 10 25
1 3 20 25
2 1 25
2 2 15
2 4 40
3 2 10
3 4 35
I guess my biggest difficulty is to exclude the own firm values.
This question is related to this page: Calculating group mean/medians in MATLAB where group ID is in a separate column. But there, the own group is not excluded.
p.s.: just out of curiosity if anyone works in economics, I am actually trying to construct Hausman or BLP instruments.
Here's a way that avoids loops, but may be memory-expensive. Let x denote your three-column data matrix.
m = bsxfun(@ne, x(:,1).', unique(x(:,1))); % or m = ~sparse(x(:,1), 1:size(x,1), true);
result = m*x(:,3);
result = result./sum(m,2);
This creates a zero-one matrix m such that each row of m, multiplied by the width column of x (second line of code), gives the sum of widths over all other firms. m is built by comparing each entry in the firm column of x with the unique values of that column (first line). Then, dividing by the respective number of observations in the other firms (third line) gives the desired result.
If you need the results repeated as per the original firm column, use result(x(:,1))
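As a quick check, here is a sketch that runs the above on the sample data built earlier in the question (variable names as in that snippet):
firm = [1,1,1,2,2,2,3,3];
prod = [1,2,3,1,2,4,2,4];
hp   = [30,10,20,25,15,40,10,35];
x = [firm' prod' hp'];
m = bsxfun(@ne, x(:,1).', unique(x(:,1)));   % 3-by-8 logical: m(g,j) is true when row j is NOT in firm g
result = (m*x(:,3)) ./ sum(m,2);             % leave-own-firm-out means: [25; 21; 23.333]
mean_desired = result(x(:,1));               % repeated per row, e.g. 25 for every firm 1 row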

How to group a column in tableau based on value of another column

I am new to Tableau and need help figuring this out. I have a dataset in the below format:
hid: id of the house the customer belongs to
cid: customer id
hID CustomerID
1 A
1 B
1 C
2 D
2 E
3 F
3 G
3 H
3 I
4 J
5 K
5 L
5 M
5 N
5 O
So A, B, and C belong to house 1, so the count for hid '1' is 3:
hid count of members
1 3
2 2
3 3
4 1
5 5
I want to show a graph in Tableau where the X-axis is the size of the house and the Y-axis is the number of houses with that size, so for the above data the values are as below:
Size of house no of house
1 1
2 1
3 2
4 0
5 1
The final graph should plot these values (number of houses against house size).
In Tableau jargon, you're looking to bin based upon an aggregate value. Take a look at the following blog post for a more detailed description/walk-through.
One way to accomplish this is by leveraging Tableau's level of detail (LOD) calculations. Create a calculated field along the lines of:
{FIXED [hID] : COUNTD([CustomerID])}
You can then create a bin field by right clicking on the new field and binning based on a parameter, or a static size (1?) of your choosing.
To create the visual, place this bin field on the row shelf; on the column shelf, drag the hID dimension and right click it to convert it to a measure by selecting Count Distinct.
As a side note, depending on whether you set your bin field as continuous or discrete, the size-4 bin in your sample data (which has a count of 0) will or will not appear.

Count the number of membership changes across multiple vectors

I have 2 vectors in which elements with the same value are considered to be in the same group, something like this:
V1 V2
1 7
1 8
1 8
1 8
1 9
2 10
3 11
3 11
3 11
3 12
4 12
4 12
In this example, V1 has 4 groups: group 1 has the first 5 elements, group 2 has the next element, group 3 has the next 4 elements, and group 4 has the last 2 elements. V2 has 5 groups: group 1 has the first element, group 2 has the next 3 elements, and so on.
Now, I would like to count the number of times an element switches groups, using V1 as the reference. Let's consider group 1 in V1. The first 5 elements are in this same group. In V2, that's no longer the case, because V2(1,1) and V2(5,1) do not have the same value as the remaining elements and are thus considered to have switched/changed membership. Applying the same principle, there is no switch for group 2 (i.e., V1(6,1) and V2(6,1)), one switch for group 3, and no switch for group 4. The total is 3 switches.
At first I thought this would be a simple calculation with no. of switches = numel(unique(V1)) - numel(unique(V2)). However, as you can see, this underestimates the number of switches. Does anyone have a solution to this?
I also welcome a solution to a simpler problem in which V1 contains only one group, like this:
V1 V2
2 7
2 8
2 8
2 8
2 8
2 8
2 8
2 9
2 8
2 10
2 10
2 8
In this second case, the count is 4 nodes that switch: V2(1,1), V2(8,1), V2(10,1), V2(11,1).
Side note: this is actually a network problem: V1 and V2 are partitions, and I'm trying to count the number of times a node switches membership.
Here is a solution using unique and accumarray
u = unique([V1 V2],'rows');
switches = accumarray(u(:,1), 1, [], @numel) - 1;
total_switches = sum(switches)
or you can use histcounts
u = unique([V1 V2],'rows');
switches = histcounts(u(:,1), [unique(u(:,1)); u(end,1)+1]) - 1;   % +1 so the last group falls inside the final bin
total_switches = sum(switches)
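As a sanity check, here is a sketch running the accumarray version on the first example above (expected total: 3 switches):
V1 = [1 1 1 1 1 2 3 3 3 3 4 4]';
V2 = [7 8 8 8 9 10 11 11 11 12 12 12]';
u = unique([V1 V2], 'rows');                        % distinct (V1 group, V2 group) pairs
switches = accumarray(u(:,1), 1, [], @numel) - 1;   % extra V2 groups per V1 group: [2; 0; 1; 0]
total_switches = sum(switches)                      % 3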