How to split an age column that currently holds ranges into age categories in Python - data-cleaning

I am currently using the suicide rates overview.csv data from Kaggle.
It has an age column whose values are ranges rather than single ages.
I want to convert them into age categories that I can use in data pre-processing.
dataset['age'] -
age
15-24
35-54
55-74
75+
Can anyone tell me how to handle these age groups so that I stop getting the error '<' not supported between instances of 'int' and 'str'?
When I use StandardScaler, it throws a different error: could not convert string to float: '25-34'
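Both errors come from the age values being strings, so one possible approach is to turn each range label into an ordered integer code before scaling. The following is a minimal sketch under stated assumptions, not a definitive solution: the helper names parse_age_range and build_age_codes are hypothetical, and the labels are assumed to look like '15-24' or '75+' (any trailing text such as ' years' is ignored).

```python
import re


def parse_age_range(label):
    """Return (lower, upper) for labels like '15-24' or '75+'.

    upper is None for open-ended labels such as '75+'.
    (Hypothetical helper, written for this illustration.)
    """
    text = label.strip()
    closed = re.match(r"(\d+)\s*-\s*(\d+)", text)
    if closed:
        return int(closed.group(1)), int(closed.group(2))
    open_ended = re.match(r"(\d+)\s*\+", text)
    if open_ended:
        return int(open_ended.group(1)), None
    raise ValueError(f"unrecognised age label: {label!r}")


def build_age_codes(labels):
    """Map each distinct label to an integer rank, ordered by lower bound."""
    distinct = sorted(set(labels), key=lambda s: parse_age_range(s)[0])
    return {label: rank for rank, label in enumerate(distinct)}


# Sample values taken from the question.
ages = ["15-24", "35-54", "55-74", "75+", "25-34"]
age_codes = build_age_codes(ages)
encoded = [age_codes[a] for a in ages]
print(age_codes)
print(encoded)
```

With pandas, the same mapping could be applied as dataset['age_code'] = dataset['age'].map(age_codes), and the numeric age_code column passed to StandardScaler instead of the original strings. Whether an ordinal code or one-hot columns suit the model better depends on the model.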

Related

Using CAPdiscrim and receiving Error missing value where TRUE/FALSE needed

I have a dataset where I am running CAP on different variables. I want to compare age to each of the variables, so I have merged Age and Reprostat into its own column. But when I run CAP on this new column, it gives me this error:
Error in if (result1$percent > correct) { : missing value where TRUE/FALSE needed
This is the code for merging the columns and the first line of CAP which gives me the error
smetadat$AgeReprostat<-as.factor(paste(smetadat$Age,smetadat$ReproStat,sep=""))
Ordination.model2<-CAPdiscrim (samples~AgeReprostat, data = smetadat,dist="euclid",
axes=0,m=0,permutations=0,add=T)
I've checked and there aren't any NA values. This also happens when I merge age and region, but not when I merge age and sex.
Any ideas why I can't run CAP on only some of the merged data columns? Thank you!

Is it possible to combine multiple input files with different schemas using Schema Drift / Dynamic Columns

I have around 20 tab-separated input files. They have in the region of 500 columns, but each will be slightly different.
The sink output schema is known and will contain all the possible input columns.
As a simplified example:
File 1
Name  Age  DOB         Nationality
Bob   21   01/01/1972  British
File 2
Name  Nationality  NINO
Joe   British      AA995654A
File 3
Name  DOB         Nationality
Sam   01/01/1990  British
Is it possible to have one DataFlow with multiple inputs, where the schema is not known until runtime, that would cope with changes in the input files and in this case would output:
Name  Age   DOB         NINO       Nationality
Bob   21    01/01/1972  NULL       British
Joe   NULL  NULL        AA995654A  British
Sam   NULL  01/01/1990  NULL       British
I have looked at column pattern matching and schema drift, but don't see how/if it is possible to achieve this.
You can build a logical model in your data flow: use a Derived Column to define the common model that you want your input data to conform to. This video shows an example of achieving this: https://www.youtube.com/watch?v=K5tgzLjEE9Q
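To make the idea of conforming every input to one common model concrete, here is a small sketch of the same logic outside Data Flow, in plain Python. It is only an illustration of the target behaviour, not how the Derived Column transformation is configured; the names TARGET_COLUMNS, read_tsv and conform are hypothetical, and the inline strings stand in for the three example files.

```python
import csv
import io

# The known sink schema from the question.
TARGET_COLUMNS = ["Name", "Age", "DOB", "NINO", "Nationality"]


def read_tsv(text):
    """Parse tab-separated text into a list of dicts keyed by header name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def conform(rows, target_columns):
    """Project each row onto the target schema, using None for missing columns."""
    return [{col: row.get(col) for col in target_columns} for row in rows]


# Stand-ins for the three example input files.
file1 = "Name\tAge\tDOB\tNationality\nBob\t21\t01/01/1972\tBritish\n"
file2 = "Name\tNationality\tNINO\nJoe\tBritish\tAA995654A\n"
file3 = "Name\tDOB\tNationality\nSam\t01/01/1990\tBritish\n"

combined = []
for text in (file1, file2, file3):
    combined.extend(conform(read_tsv(text), TARGET_COLUMNS))

for row in combined:
    print(row)
```

Each input only needs to be read with whatever columns it happens to have; the projection onto the fixed column list is what fills the gaps with nulls.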

How to stop date when someone dies

I have an Access database for my dairy farm. I have one field named DateBorn, a module function fAge, and an unbound field named AgeNow. For this I have the expression:
=IIf(IsNull([DateBorn]),"",fAge([DateBorn]),Date()))
With this expression, whenever I type in a value for DateBorn, it calculates age for me in years, months, and days. It has worked fine thus far.
Now I want to add another field named DateDied. I want an expression that stops calculating the age for a record as soon as I enter a value in DateDied.
I'm not sure if you made a mistake in your sample regarding calling the function fAge().
I expect it needs two date parameters.
That expression always calculates the age, but for people who died it uses DateDied instead of Date():
=IIf(IsNull([DateBorn]),"",fAge([DateBorn],NZ([DateDied],Date())))
If no age should be shown at all once DateDied has a value, use this:
=IIf(IsNull([DateBorn]) Or Not IsNull([DateDied]),"",fAge([DateBorn],Date()))
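The logic of the first expression (count up to DateDied when it is set, otherwise up to today) can be sketched in Python as follows. This is an illustration only: the function age_in_years is hypothetical, and it returns whole years, whereas the original fAge returns years, months, and days and its implementation is not shown.

```python
from datetime import date


def age_in_years(date_born, date_died=None, today=None):
    """Whole years from date_born to date_died if given, otherwise to today.

    Returns None when date_born is missing, mirroring the IsNull check.
    """
    if date_born is None:
        return None
    # Same idea as NZ([DateDied], Date()): fall back to today when not dead.
    end = date_died if date_died is not None else (today or date.today())
    years = end.year - date_born.year
    # Subtract one year if the birthday has not yet occurred in the end year.
    if (end.month, end.day) < (date_born.month, date_born.day):
        years -= 1
    return years


born = date(2015, 6, 1)
fixed_today = date(2024, 1, 1)  # fixed date so the example is repeatable
print(age_in_years(born, today=fixed_today))
print(age_in_years(born, date_died=date(2020, 5, 31), today=fixed_today))
```

Once a death date is supplied, the result no longer depends on the current date, which is the "stop calculating" behaviour asked for.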

Is there a way to have number type parameters blank?

I'm creating a demographics report where users can choose what demographics they want to include on the report. One of the options is age range where the user can type in their own figures. Everything works except when the age range parameters are not used. The others can be blank as they are string types but the age range parameters are number types because I need to use them to do counting. Is there a way to use number type parameters but allow them to be blank if necessary?
I tried changing the type to string and then running a ToNumber formula so I could use the field for my needs, but I get the same error. I also tried suppressing any fields that use the number parameters with a suppression condition of Not HasValue, but that didn't work either.

MDX for age range

I have the following MDX query:
SELECT {[Measures].[PARTICIPANT ID]} ON columns,
{[GENDER].[Female Gender]} ON rows
FROM [Dystonia DS]
I have a dimension called AGE IN YEARS and I want to filter PARTICIPANT ID by age range, e.g. PARTICIPANT IDs with an age between 20 and 54.
I found a solution for date ranges on this forum, but I was unable to adapt that date range MDX to an age range.
Any help is greatly appreciated.
If it is really another dimension, and you don't want to display it, can't you just add it to the WHERE clause?
SELECT {[Measures].[PARTICIPANT ID]} ON columns,
{[GENDER].[Female Gender]} ON rows
FROM [Dystonia DS]
WHERE {[Age Range].&[20]:[Age Range].&[54]}
And if you need to see it, add it to the tuple on the ROWS axis.
SELECT {[Measures].[PARTICIPANT ID]} ON columns,
([GENDER].[Female Gender],
{[Age Range].&[20]:[Age Range].&[54]}) ON rows
FROM [Dystonia DS]