How to make a new column in Orange Python Script

I have a column "Price". This column is a continuous variable.
For some reason I want to make a new column called "Price_Category". This column is a discrete variable whose values are "cheap", "moderate", and "expensive".
For example:
if 1.45 < price < 1.99 then price_category = cheap
if 1.99 < price < 2.00 then price_category = moderate
if 2.00 < price < 5.00 then price_category = expensive
How can I do that in the Python Script widget in the Orange Data Mining software?

I don't know about the Python Script widget, but you can use the Feature Constructor widget to make a discrete variable with code such as:
0 if price <= 1.99 else 1 if price <= 2 else 2
and values:
cheap, moderate, expensive
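If you specifically want the Python Script widget, here is a minimal untested sketch. It assumes the table arrives on the widget's in_data input, that "Price" is an ordinary (non-meta) continuous attribute, and it reuses the same thresholds as the Feature Constructor expression above:
import numpy as np
from Orange.data import DiscreteVariable, Domain, Table

# values of the "Price" column (assumes it is a regular attribute, not a meta)
price = in_data[:, "Price"].X.ravel()
# 0 = cheap, 1 = moderate, 2 = expensive
category = DiscreteVariable("Price_Category",
                            values=("cheap", "moderate", "expensive"))
codes = np.select([price <= 1.99, price <= 2.00], [0, 1], default=2)
# rebuild the domain with the new discrete attribute appended
domain = Domain(in_data.domain.attributes + (category,),
                in_data.domain.class_vars, in_data.domain.metas)
out_data = Table.from_numpy(domain, np.column_stack([in_data.X, codes]),
                            in_data.Y, in_data.metas)
Connect the widget's out_data output to a Data Table widget to inspect the new column.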

Related

If-Else-Then with today()

I am currently trying to write some code that goes through my data and marks a number 0-12 based on the date in the "Week" column. This number appears in a new column called group, which is created by the code you see below. The problem is that this column is periods (missing values) all the way down, not numbers. There are no error messages in the log, so I don't know where I went wrong (I'm fairly new to SAS). P.S. The dates range from 6/17 to 9/9.
data have;
  set have;
  if today()+84 = Week > today()+79 then group=12;
  else if today()+77 = Week > today()+72 then group=11;
  else if today()+70 = Week > today()+65 then group=10;
  else if today()+63 = Week > today()+58 then group=9;
  else if today()+56 = Week > today()+51 then group=8;
  else if today()+49 = Week > today()+45 then group=7;
  else if today()+42 = Week > today()+37 then group=6;
  else if today()+35 = Week > today()+30 then group=5;
  else if today()+28 = Week > today()+23 then group=4;
  else if today()+21 = Week > today()+16 then group=3;
  else if today()+14 = Week > today()+11 then group=2;
  else if today()+7 = Week > today()+2 then group=1;
  else if today() = Week > today()-5 then group=0;
run;
Update:
The first column is called week and is a Monday date that goes 12 weeks into the future. The rest of the columns are variables that I will end up summing based on the group that row is in.
ex:
week ID var2 ... var18
17jun2019 1 x x
24jun2019 1 x x
and it continues until 09sep2019. It does this for each ID (roughly 10,000 of them), but not every ID goes 12 weeks out; that's why I am using the else-ifs.
I would like it to look like:
week ID var2 ... var18 group
17jun2019 1 x x 0
24jun2019 1 x x 1
01july2019 1 x x 2
A full reference to SAS operators can be found in SAS Help by searching for "SAS Operators in Expressions". SAS expressions can use some operators that are relatively unique across the spectrum of coding languages. Here are some that are not typically found in newly coded SAS (at the time of this post):
<> MAX operator
>< MIN operator
implied AND operator
Two comparisons with a common variable linked by AND can be condensed with an implied AND.
So uninitiated readers of the question may misunderstand
…
if today()+35 = Week > today()+30 then group=5;
…
as incorrect, instead of recognizing it as an implied AND
…
if today()+35 = Week AND Week > today()+30 then group=5;
…
While syntactically correct, the = in the implied AND causes the expression to be true only on equality. A Week value in the open interval (today()+30, today()+35) will never evaluate as true in the above expression. This is the likely cause of the missing values (.) you are seeing.
Why does the code deviate from a constant delta of 7 in the sequence 30, 23, 16, 11, 2, -5? Should it be 30, 23, 16, 9, 2, -5?
In other words, why is group 1 apparently shooting for a 5-day range [+7, +2) when all the others are 3-day ranges, such as [+14, +11)?
Why are there 2-day domains, presumed weekends, in which group is not assigned and would thus be missing (.)?
This type of wallpaper code is often better represented by an arithmetic expression.
For example, presuming integer SAS date values:
group = ifn( mod(week - today(), 7) in (1, 2)
           , .
           , ceil((week - today()) / 7)
           );
if not (0 <= group <= 12) then group = .; * probably don't want this, but it makes it compliant with the OP;
Note the parentheses around week - today() before dividing by 7; without them the division binds first. As a worked example, a week 10 days out has mod(10, 7) = 3, so it gets group = ceil(10/7) = 2, while a week 8 or 9 days out falls in the presumed weekend gap and gets missing.
Tomorrow the group value could be 'wrong' because it is today()-based. Consider coding a view instead of creating a permanent data set -- OR -- place meta information in the variable name: group_on_20190622 = …
If you insist on wallpaper, consider using a select statement, which is less prone to the typing errors that can happen with errant semicolons or missing elses; see the sketch below.
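A hedged sketch of that select form (untested; the bounds are illustrative week windows rather than the OP's exact ones, and the chained comparisons use the implied AND correctly this time):
data want;
  set have;
  select;
    when (today()    <= week < today()+7 ) group = 0;
    when (today()+7  <= week < today()+14) group = 1;
    when (today()+14 <= week < today()+21) group = 2;
    /* ... continue the pattern through group = 12 ... */
    otherwise group = .;
  end;
run;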
It is not at all clear what you are trying to do. It sounds a little like you want to group observations based on how many weeks the date variable (called WEEK) is away from today's date. It might be easiest to just use the INTCK() function, which counts how many week boundaries are crossed between the two dates.
data have;
  input id week :date9.;
  format week date9.;
cards;
1 17jun2019
1 24jun2019
1 01jul2019
2 24jun2019
2 01jul2019
2 08jul2019
;

data want;
  set have;
  group = intck('week', today(), week);
run;
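One caveat worth checking (my addition, not part of the original answer): INTCK's default 'week' interval starts on Sunday, so if the Week values are Mondays, intck('week.2', today(), week) may match the intended grouping better, since week.2 intervals begin on Monday.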
You can then summarize the number of IDs per group.
proc freq data=want;
  tables group;
run;
Results:

The FREQ Procedure

                                    Cumulative    Cumulative
group    Frequency     Percent      Frequency       Percent
------------------------------------------------------------
   -1            1       16.67              1         16.67
    0            2       33.33              3         50.00
    1            2       33.33              5         83.33
    2            1       16.67              6        100.00
Assuming week is a date and not a datetime:
data test;
  do i = 1 to 30;
    dt = intnx('day', today(), i);
    output;
  end;
  format dt date9.;
run;

data test2;
  set test;
  if dt ge today() and dt le today()+7 then dt2 = 1;
  else if dt ge today()+8 and dt le today()+14 then dt2 = 2;
  else if dt ge today()+15 and dt le today()+21 then dt2 = 3;
  else if dt ge today()+22 and dt le today()+28 then dt2 = 4;
  else if dt ge today()+29 and dt le today()+35 then dt2 = 5;
  /* another way */
  dt3 = ceil(intck('day', today(), dt)/7);
run;

Manipulating last two rows if there's data based on a Cut date

This question is a slightly varied version of this one...
Now I'm using measures instead of calculated columns, and the date is static instead of being based on a dropdown list.
Here's the Power BI test .pbix file:
https://drive.google.com/open?id=1OG7keqhdvDUDYkFQFMHyxcpi9Zi6Pn3d
This screenshot describes what I'm trying to accomplish:
Basically the date in the P6 Update table is used as a cut date and will be fixed/static. It's imported from an Excel sheet where the user can customize it however they want.
Here's what should happen when a matching row in the Test data table is found for the P6 Update date:
column Earned Daily - must have its value summed with the next row if there's one;
column Earned Cum - must grab the next row's value;
all the previous rows should remain intact, that is, their values won't change;
all subsequent rows must have their values assigned 0.
So for example:
If P6 Update is 1-May-2018, this is the expected result:
1-May 7,498 52,106
2-May 0 0
If P6 Update is 30-Apr-2018, this is the expected result:
30-Apr 13,173 50,699
1-May 0 0
2-May 0 0
If P6 Update is 29-Apr-2018, this is the expected result:
29-Apr 11,906 44,608
30-Apr 0 0
1-May 0 0
2-May 0 0
and so on...
Hope this makes sense.
This is easier in Excel, but trying to do this in Power BI is making me go nuts.
I will ignore previously asked related questions and start from scratch.
First, create a measure:
Current Earn =
CALCULATE (
    SUM ( 'Test data'[Value] ),
    'Test data'[Act Rem] = "Actual Units",
    'Test data'[Type] = "Current"
)
This measure will be used in other measures, to save you from typing all these conditions ("Actual Units" and "Current") again and again. It's good practice to re-use measures in other measures - it saves work and makes the code cleaner and easier to refactor.
Create another measure:
Cut Date = SELECTEDVALUE('P6 Update'[Date])
We will use this measure whenever we need a cut-off date. Note that it does not have to be hard-coded - if the P6 table contains a list of dates, you can create a pull-down slicer from the dates and choose the cut-off date dynamically. The formula will work properly.
Create a third measure:
Next Earn =
VAR Cut_Date = [Cut Date]
VAR Current_Date = MAX ( 'Test data'[Date] )
VAR Next_Date = Current_Date + 1
VAR Current_Earn = [Current Earn]
VAR Next_Earn = CALCULATE ( [Current Earn], 'Test data'[Date] = Next_Date )
RETURN
    SWITCH (
        TRUE (),
        Current_Date < Cut_Date, Current_Earn,
        Current_Date = Cut_Date, Current_Earn + Next_Earn,
        BLANK ()
    )
I am not sure if "Next Earn" is a good name for it; hopefully you will find a more intuitive one. The way it works: we save all necessary inputs into variables and then use the SWITCH function to define the results. Hopefully it's self-explanatory. (Note: if you need 0 above the cut date, replace BLANK() with 0.)
Finally, we define a measure for the cumulative earn. It does not require any special logic, because the previous measure takes care of it properly:
Cum Earn =
VAR Current_Date = MAX ( 'Test data'[Date] )
RETURN
    CALCULATE (
        [Next Earn],
        FILTER ( ALL ( 'Test data'[Date] ), 'Test data'[Date] <= Current_Date )
    )
Result: (screenshot omitted)

Tableau - Bins based on Fixed LOD Customer Units

What I'm trying to do: create a histogram showing the distribution of customers based on their annual ordering size (1-10 units, 11-50, and so on), based on a combined field (Child + Zip Code, which is our definition of a customer).
Problem: I cannot figure out a way to calculate the different bins correctly. I've seen plenty of posts about using bins in Tableau, but none calculated based on a unique ID like mine. It seems the customers are being put in every category (1-10, 11-20, etc...) instead of a single category once their unit sales pass the <= threshold. Perhaps I'm misunderstanding FIXED LOD calcs.
End goal: Get a count of the customers in these different ordering ranges to display on a histogram.
Having no luck with this formula:
IF { FIXED [UID_Cust] : SUM([Units]) } <= 10 THEN '1-10'
ELSEIF { FIXED [UID_Cust] : SUM([Units]) } <= 20 THEN '11-20'
ELSEIF { FIXED [UID_Cust] : SUM([Units]) } <= 50 THEN '21-50'
ELSEIF { FIXED [UID_Cust] : SUM([Units]) } <= 250 THEN '51-250'
ELSE '>250'
END
Here is a picture of what I'm currently getting. Everything would be perfect if I could replace those little blocks with just one number: the count of the customers in that range.
Turns out the problem was the LOD calc. I needed to add the year, since I forgot that a FIXED LOD ignores worksheet filters.
{ FIXED [UID_Cust], YEAR([Order_Date]) = 2017 : SUM([Units]) }
Then I saved this as its own calculated field, "UID_Sales":
IF [UID_Sales] <= 10 THEN '1-10'
ELSEIF [UID_Sales] <= 20 THEN '11-20'
ELSEIF [UID_Sales] <= 50 THEN '21-50'
ELSEIF [UID_Sales] <= 250 THEN '51-250'
ELSE '>250'
END
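To collapse each range to the single number asked for, one hedged option (assuming [UID_Cust] is the combined customer field from above, and that the data source supports COUNTD) is to use the binning field as the dimension and count distinct customers against it:
// counts distinct customers within each bin
COUNTD([UID_Cust])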

Reshaping and merging simulations in Stata

I have a dataset, which consists of 1000 simulations. The output of each simulation is saved as a row of data. There are variables alpha, beta and simulationid.
Here's a sample dataset:
simulationid beta alpha
1 0.025840106 20.59671241
2 0.019850549 18.72183088
3 0.022440886 21.02298228
4 0.018124857 20.38965861
5 0.024134726 22.08678021
6 0.023619479 20.67689981
7 0.016907209 17.69609466
8 0.020036455 24.6443037
9 0.017203175 24.32682682
10 0.020273349 19.1513272
I want to estimate a new value - let's call it new - which depends on alpha and beta as well as different levels of two other variables which we'll call risk and price. Values of risk range from 0 to 100, price from 0 to 500 in steps of 5.
What I want to achieve is a dataset that consists of values representing the probability that (across the simulations) new is greater than 0 for combinations of risk and price.
I can achieve this using the code below. However, the reshape process takes more hours than I'd like. And it seems to me to be something that could be completed a lot quicker.
So, my question is either:
i) is there an efficient way to generate multiple datasets from a single row of data without multiple reshape, or
ii) am I going about this in totally the wrong way?
set maxvar 15000
/* Input sample data */
input simulationid beta alpha
1 0.025840106 20.59671241
2 0.019850549 18.72183088
3 0.022440886 21.02298228
4 0.018124857 20.38965861
5 0.024134726 22.08678021
6 0.023619479 20.67689981
7 0.016907209 17.69609466
8 0.020036455 24.6443037
9 0.017203175 24.32682682
10 0.020273349 19.1513272
end
forvalues risk = 0(1)100 {
    forvalues price = 0(5)500 {
        gen new_r`risk'_p`price' = `price' * (`risk'/200) * beta - alpha
        gen probnew_r`risk'_p`price' = 0
        replace probnew_r`risk'_p`price' = 1 if new_r`risk'_p`price' > 0
        sum probnew_r`risk'_p`price', mean
        gen mnew_r`risk'_p`price' = r(mean)
        drop new_r`risk'_p`price' probnew_r`risk'_p`price'
    }
}
drop if simulationid > 1
save simresults.dta, replace
forvalues risk = 0(1)100 {
    clear
    use simresults.dta
    reshape long mnew_r`risk'_p, i(simulationid) j(price)
    keep simulationid price mnew_r`risk'_p
    rename mnew_r`risk'_p risk`risk'
    save risk`risk'.dta, replace
}
clear
use risk0.dta
forvalues risk = 1(1)100 {
    merge m:m price using risk`risk'.dta, nogen
    save merged.dta, replace
}
Here's a start on your problem.
So far as I can see, you don't need more than one dataset.
The various reshapes and merges just rearrange what was first generated, and that can be done within one dataset.
The code here in the first instance is for just one pair of values of alpha and beta. To simulate 1000 such pairs, you would need 1000 times more observations, i.e. about 10 million, which is not usually a problem, and to loop over the alphas and betas. But the loop can be tacit; we'll get to that.
This code has been run and is legal. It is limited to one alpha, beta pair.
clear
input simulationid beta alpha
1 0.025840106 20.59671241
2 0.019850549 18.72183088
3 0.022440886 21.02298228
4 0.018124857 20.38965861
5 0.024134726 22.08678021
6 0.023619479 20.67689981
7 0.016907209 17.69609466
8 0.020036455 24.6443037
9 0.017203175 24.32682682
10 0.020273349 19.1513272
end
local N = 101 * 101
set obs `N'
egen risk = seq(), block(101)
replace risk = risk - 1
egen price = seq(), from(0) to(100)
replace price = 5 * price
gen result = (price * (risk/200)* beta[1] - alpha[1]) > 0
bysort price risk: gen mean = sum(result)
by price risk: replace mean = mean[_N]/_N
Assuming now that you first read in 1000 values, here is a sketch of how to get the whole thing. This code has not been tested. That is, your dataset starts with 1000 observations; you then enlarge it to 10 million or so, and get your results. The tricksy part is using an expression for the subscript to ensure that each block of results is for a distinct alpha, beta pair. That's not compulsory; you could do it in a loop, but then you would need to generate outside the loop and replace within it.
local N = 101 * 101 * 1000
set obs `N'
egen risk = seq(), block(101)
replace risk = risk - 1
egen price = seq(), from(0) to(100)
replace price = 5 * price
egen sim = seq(), block(10201)
gen result = (price * (risk/200)* beta[ceil(_n/10201)] - alpha[ceil(_n/10201)]) > 0
bysort sim price risk: gen mean = sum(result)
by sim price risk: replace mean = mean[_N]/_N
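The question's end product is, for each risk-price pair, the share of simulations with a positive result. As a hedged final step on top of the sketch (standard collapse; prob_pos is my name, not the OP's), that could be:
collapse (mean) prob_pos = result, by(risk price)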
Other devices used: egen to set up in blocks; getting the mean without repeated calls to summarize; using a true-or-false expression directly.
NB: I haven't tried to understand what you are doing, but it seems to me that the price-risk-simulation conditions define single values, so calculating a mean looks redundant. But perhaps that is in the code because you wish to add further detail to the code once you have it working.
NB2: This seems a purely deterministic calculation. Not sure that you need this code at all.

Create Accumulate function on Drools Decision table

I am trying to create an accumulate condition in a decision table. Please help me with how to express this in a decision table.
My accumulate rule is:
when
    $i : Double( doubleValue > 1000 ) from accumulate( Product( $productQty : quantity ), sum( $productQty ) )
then
    System.out.println( "The quantity is exceeded more than 1000 and the total value is " + $i );
You can create a column laid out like this (row n holds the keyword, and the following rows hold the pattern, the constraint template, the column description, and the value):

row    cell content
n      CONDITION
n+1    $i : Double() from accumulate( Product($productQty:quantity), sum($productQty) )
n+2    doubleValue > $param
n+3    add quantities and check
n+4    1000
Two comments.
This is not well-suited for decision tables unless you plan to check for different ranges of the accumulated value.
Why do you use Double for counting what are, most likely, integral quantities? I'd be surprised if the accumulated stock exceeded Integer.MAX_VALUE. In short: use Number in the pattern and intValue in the constraint.
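As a sketch of that suggestion, the pattern row (n+1) and constraint row (n+2) in the layout above would become:
$i : Number() from accumulate( Product($productQty:quantity), sum($productQty) )
intValue > $param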