SPSS - Create dummy for top volume months within customer grouping - date

I need to create a dummy for the top purchase months within each customer ID. That is, if a month is one of the four months within the year in which the customer purchased the most, it is coded 1, otherwise 0.
Example of data, cust id, order date, volume and new variable dummy:
This code creates some sample data:
data list free/ID volume (2f4).
begin data
1 100 1 500 2 1 2 2 2 3 2 90 1 600 1 90 1 870 2 9 2 8 2 10
end data.

Using the sample data in the question, this code will create a new variable containing the dummy according to your definition:
* (D) ranks in descending order, so the largest volume within each ID gets rank 1.
RANK VARIABLES=volume (D) BY ID /RANK.
compute high4=(Rvolume<=4).
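
For readers who want to check the logic outside SPSS, here is a minimal Python sketch of the same idea, using the sample (ID, volume) pairs from above. The function name `flag_top_n` is hypothetical, and ties at the cutoff are all flagged, which may differ from how SPSS assigns ranks to tied values.

```python
from collections import defaultdict

# (ID, volume) pairs taken from the sample data above.
rows = [(1, 100), (1, 500), (2, 1), (2, 2), (2, 3), (2, 90),
        (1, 600), (1, 90), (1, 870), (2, 9), (2, 8), (2, 10)]

def flag_top_n(rows, n=4):
    """Return (id, volume, flag) with flag=1 for the n largest volumes per id."""
    by_id = defaultdict(list)
    for cust_id, volume in rows:
        by_id[cust_id].append(volume)
    # The cutoff is the n-th largest volume within each customer
    # (or the smallest volume if the customer has fewer than n rows).
    cutoff = {cid: sorted(vols, reverse=True)[:n][-1]
              for cid, vols in by_id.items()}
    return [(cid, vol, int(vol >= cutoff[cid])) for cid, vol in rows]

flagged = flag_top_n(rows)
for cust_id, volume, high4 in flagged:
    print(cust_id, volume, high4)
```

With real data, the grouping key would presumably be customer and year, and volume would first be aggregated per month.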

How can I merge two data sets when the using data set is a subset of the master data set? Stata

I have three data sets I would like to merge: one contains data on all members of a household and the other two contain more detailed data on women and men respectively.
I would like to link my detailed data to the household data. See an example of the data sets below. I need to use household ID (HH ID) and individual ID (IND ID) to link to bolded observations. Once the women and men data are merged there will still be some missing observations that do appear in the household data (which I would like to keep).
Household data set
HH ID  IND ID
1      1
1      2
1      3
1      4
2      1
2      2
2      3
3      1
Women's data set
HH ID  IND ID
1      1
1      3
2      2
3      1
I have tried the following 1:1 merge
merge 1:1 hhid indid using "womens_data.dta"
and got the following error message,
variables hhid indid do not uniquely identify observations in the master data
r(459);
What should I be doing instead?
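
The error says that some (hhid, indid) pair occurs more than once in the master data. The excerpt shown above has no repeated pairs, so the duplicates are presumably elsewhere in the full file. As a rough illustration of the two steps involved (find the repeated keys, then join while keeping every household row), here is a Python sketch. It is not Stata code, and the helper names `duplicate_keys` and `left_join` are hypothetical.

```python
from collections import Counter

def duplicate_keys(rows):
    """Return the (hhid, indid) pairs that occur more than once."""
    counts = Counter((r["hhid"], r["indid"]) for r in rows)
    return sorted(key for key, n in counts.items() if n > 1)

def left_join(master, using):
    """Keep every master row and flag whether its key appears in 'using'."""
    keys_in_using = {(r["hhid"], r["indid"]) for r in using}
    return [{**row, "in_women": (row["hhid"], row["indid"]) in keys_in_using}
            for row in master]

# Keys copied from the example tables in the question.
household = [{"hhid": h, "indid": i} for h, i in
             [(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 2), (2, 3), (3, 1)]]
women = [{"hhid": h, "indid": i} for h, i in [(1, 1), (1, 3), (2, 2), (3, 1)]]

print(duplicate_keys(household))  # keys that would block a 1:1 merge
merged = left_join(household, women)
print(sum(r["in_women"] for r in merged), "of", len(merged), "rows matched")
```

Once the repeated keys are identified and resolved, a 1:1 merge on the two ID variables should keep the unmatched household rows as well.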

Recursive CTE in Postgres - SUM at each parent node

I'm storing a hierarchical product adjacency list and a separate table for all the sales on those products.
Currently, I'm trying to present to the user a "Sales Report" with the total quantity and sum sold for each product, and for each parent level.
In the sales table, I do not have info on the sales per group, thus, I have info available only at the level of the child. From what I have read I need to use recursive CTE, and I tried creating some queries, without any success.
Example of my dataset:
F - folders
P - products
Products table:
id  name  parentid
1   F1    NULL
2   F2    1
3   P1    2
4   P2    2
Sales table:
id  quant  sum
3   3      90
4   2      100
What I need to obtain in the report:
id  name  parentid  quant  sum
1   F1    NULL      5      190
2   F2    1         5      190
3   P1    2         3      90
4   P2    2         2      100
Logically, I understand that I need to fetch each row, and recursively go through all its children in order to SUM the quant and the sum, however I have no clue how to write it.
I'd be thankful for any guidance on where I could read more about recursive CTEs, or anything that can help my situation.
Cheers!
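
To make the intended result concrete, here is a small Python sketch of the roll-up logic, not a Postgres query. Each sale is credited to its own product and then to every ancestor by following parentid, which is the same ancestor walk a recursive CTE would perform. The function name `rollup` is hypothetical, and the sketch assumes the hierarchy has no cycles.

```python
# (id, name, parentid) and (id, quant, sum) rows from the example tables.
products = [(1, "F1", None), (2, "F2", 1), (3, "P1", 2), (4, "P2", 2)]
sales = [(3, 3, 90), (4, 2, 100)]

def rollup(products, sales):
    """Return (id, name, parentid, quant, sum) with totals rolled up to parents."""
    parent = {pid: par for pid, _, par in products}
    totals = {pid: [0, 0] for pid in parent}
    for pid, quant, amount in sales:
        node = pid
        # Credit the sale to the product itself and to each ancestor in turn.
        while node is not None:
            totals[node][0] += quant
            totals[node][1] += amount
            node = parent[node]
    return [(pid, name, par, *totals[pid]) for pid, name, par in products]

report = rollup(products, sales)
for row in report:
    print(row)
```

The rows printed match the desired report in the question: folders F1 and F2 each total 5 and 190.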

How would I configure analyze threshold for a table where the data is categorically different every couple months?

We host data for an auditing service. Every few months, a new audit comes out with similar questions to previous audits of the same category. Since questions can change verbiage and number, we store each question in each audit separately (we do link them through a "related_questions" table).
audits
id  name     passing_score
1   audit_1  100
2   audit_2  150
questions
id  audit_id  text
1   1         q1
2   1         q2
3   2         q1
4   2         q2
We then have a surveys and responses table. Surveys are the overall response to an audit, while responses store the individual responses to each question.
surveys
id  audit_id  overall_score  pass
1   1         120            true
2   1         95             false
3   2         200            true
4   2         100            false
responses
id  survey_id  question_id  score
1   1          1            60
2   1          2            60
3   2          1            60
4   2          2            35
5   3          3            100
6   3          4            100
7   4          3            50
8   4          4            50
The analyze threshold is base threshold + scale factor * number of tuples. The problem with this is that once an audit has finished (after a few months), we'll never receive new surveys or responses for that category of data. The new data that comes in is conceptually all that needs to be analyzed. All data is queried, but the new data has the most traffic.
If 10% is the ideal scale factor for today and analyze autoruns once every week, a couple years from now analyze may autorun once every 4 months due to the number of tuples. This is problematic when the past 3 months of data is for questions that the analyzer has never seen and so there are no helpful stats for the query planner on this data.
We could set the scale factor extremely low for this table, but that seems like a cheap solution that could cause issues in the future.
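If you have a constant data modification rate, setting autovacuum_analyze_scale_factor to 0 for that table and relying only on autovacuum_analyze_threshold is a good idea. Autoanalyze then triggers after a fixed number of changed rows, no matter how large the table has grown.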
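
A quick arithmetic sketch of the formula from the question shows why the interval stretches as the table grows and why a zero scale factor keeps it constant. The defaults of 50 and 0.1 are the PostgreSQL defaults for autovacuum_analyze_threshold and autovacuum_analyze_scale_factor; the table sizes, the weekly change rate, and the fixed threshold of 10,000 are made-up numbers for illustration.

```python
def analyze_trigger(reltuples, base_threshold=50, scale_factor=0.1):
    """Changed rows needed before autoanalyze runs: base + scale_factor * tuples."""
    return base_threshold + scale_factor * reltuples

ROWS_CHANGED_PER_WEEK = 10_000  # hypothetical, constant modification rate

for table_rows in (100_000, 1_000_000, 5_000_000):
    default = analyze_trigger(table_rows)
    fixed = analyze_trigger(table_rows, base_threshold=10_000, scale_factor=0.0)
    print(f"{table_rows:>9} rows: "
          f"default every {default / ROWS_CHANGED_PER_WEEK:.1f} weeks, "
          f"fixed threshold every {fixed / ROWS_CHANGED_PER_WEEK:.1f} weeks")
```

With the scale factor at 0 the trigger point no longer depends on the row count, so the analyze interval tracks the modification rate alone.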

Number of Order From Customers

This seems like a simple question, but I'm having trouble building my graph. I'm trying to get the number of customers who made 1 order, 2 orders, 3 orders, and so on.
Sample Data Source:
Customer ID| Order ID| Date Ordered
A 10 06/01/2019
A 11 06/02/2019
A 12 06/02/2019
B 15 06/05/2019
B 16 06/05/2019
B 17 06/05/2019
C 20 06/06/2019
C 21 06/06/2019
I can easily get the graph to show that Customer A made 3 orders, Customer B made 3 orders, and Customer C made 2 orders.
What I'm trying to show is how many customers placed a given number of orders. In our sample data: 1 Order = 0, 2 Orders = 1, 3 Orders = 2. So on the X axis I'm trying to show (1 Order, 2 Orders, 3 Orders, 4 Orders, etc.).
I tried doing calculations such as IF COUNT([CustomerID]) > 2 then '1 order' but I can't seem to get it right. Any advice will be helpful. Thanks in advance.
Maybe you can try using LOD expressions and create a new calculated field like this:
{INCLUDE [Customer ID]: COUNTD([Order ID])}
And then use that field to show that info.
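
The underlying computation is a count of counts: first the number of distinct orders per customer, then the number of customers at each order count. Here is a minimal Python sketch with the sample data, just to show the expected numbers; it is not Tableau syntax, and the function name is hypothetical.

```python
from collections import Counter

# (Customer ID, Order ID) pairs from the sample data source.
orders = [("A", 10), ("A", 11), ("A", 12),
          ("B", 15), ("B", 16), ("B", 17),
          ("C", 20), ("C", 21)]

def customers_per_order_count(orders):
    """Map 'number of orders' -> 'number of customers with that many orders'."""
    # Step 1: distinct orders per customer (the per-customer count).
    orders_per_customer = Counter(cust for cust, _ in set(orders))
    # Step 2: how many customers share each order count.
    return Counter(orders_per_customer.values())

histogram = customers_per_order_count(orders)
for n_orders in range(1, 5):
    print(f"{n_orders} order(s): {histogram[n_orders]} customer(s)")
```

In the chart, the per-customer count plays the role of the dimension on the X axis, and the number of customers is the measure.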

Get consecutive sequence number in ireport

I need to display row number sequence of each group.
I have used $V{PAGE_COUNT} and evaluation time as now
The report data that I am getting is
Group A
1.
2
3
4
...........
page ends ......
Group A
1
2
3
4
page ends ---------
Group B
1
2
3
4
5
page ends....
But my requirement is
Group A
1.
2
3
4
...........
page ends
Group A
5
6
7
8
9
page ends .......
Group B
1
2
3
4
5
page ends....
I need all rows of the same group to form one continuous sequence across pages, and the sequence should restart from 1 when the group changes.
You should use the GroupName_COUNT variable in this case.
A quote from the JasperReports Ultimate Guide:
When declaring a report group, the engine automatically creates a count variable that
calculates the number of records that make up the current group (that is, the number of
records processed between group ruptures).
The name of this variable is derived from the name of the group it corresponds to,
suffixed with the _COUNT sequence. It can be used like any other report variable, in any
report expression, even in the current group expression, as shown in the BreakGroup
group of the /demo/samples/jasper sample.
More info is here: Data Grouping