Resolving duplicate rows after a merge in Stata

I have a dataset composed of two merged datasets: official unemployment figures from national statistical offices, and unemployment figures from the ILO's modelled estimates.
Some countries are coded differently in the two sources. Jordan, for example, is named JOR in one dataset but JOR_total in the other, although both measure the same demographic group. This results in two rows per gender for the same year, whereas I want to keep one row per country, gender, and year.
This is what the dataset currently looks like:
year gender country unemployment_official unemployment_ilo
2019 Female JOR 19 .
2019 Male JOR 8 .
2019 Female JOR_total . 17.3
2019 Male JOR_total . 7.4
I would like the dataset to look something like this:
year gender country unemployment_official unemployment_ilo
2019 Female JOR 19 17.3
2019 Male JOR 8 7.4
2019 Female EGY 17 22
2019 Male EGY 5 9.4

I solved this before merging the two datasets, with the following:
// Removing countries with duplicate entries
drop if country=="JOR"
replace country="JOR" if country=="JOR_total"
The merged dataset now looks like this:
year gender country unemployment_official unemployment_ilo
2019 Female JOR 19 17.3
2019 Male JOR 8 7.4
2019 Female EGY 17 22
2019 Male EGY 5 9.4
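
For readers who want to see the idea outside Stata, here is a minimal Python sketch of the same approach: map the alternate country code to the canonical one before combining rows on (year, gender, country). It assumes the official source uses JOR and the ILO source uses JOR_total, as the desired output suggests. The `ALIASES` table and the `merge_rows` helper are hypothetical names introduced for illustration.

```python
# Assumption: the official source codes Jordan as "JOR", the ILO source as "JOR_total".
official = [
    {"year": 2019, "gender": "Female", "country": "JOR", "unemployment_official": 19},
    {"year": 2019, "gender": "Male", "country": "JOR", "unemployment_official": 8},
]
ilo = [
    {"year": 2019, "gender": "Female", "country": "JOR_total", "unemployment_ilo": 17.3},
    {"year": 2019, "gender": "Male", "country": "JOR_total", "unemployment_ilo": 7.4},
]

# Hypothetical alias table: alternate code -> canonical code.
ALIASES = {"JOR_total": "JOR"}


def merge_rows(*datasets):
    """Combine rows that share (year, gender, canonical country) into one row."""
    merged = {}
    for rows in datasets:
        for row in rows:
            country = ALIASES.get(row["country"], row["country"])
            key = (row["year"], row["gender"], country)
            # Later datasets add their columns to the row created by earlier ones.
            merged.setdefault(key, {}).update({**row, "country": country})
    return list(merged.values())


merged = merge_rows(official, ilo)
```

Normalizing the key before the merge, rather than cleaning up afterwards, means duplicate rows are never created in the first place.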

Is there a way to apply week-over-week growth rates from prior year to current year data in Tableau?

I have a dataset in Tableau with the fields Date, City, and Sales. The data covers all days of 2022.
I created week-over-week growth rates for 2022. My worksheet looks like this:
Iso-Year (Date)  Iso Week (Date)  City       SUM(Sales)  2022 Growth %
2022             W01              New York   2500        0
2022             W01              Boston     400         0
2022             W01              Minnesota  1300        0
2022             W02              New York   2600        4%
2022             W02              Boston     200         -50%
2022             W02              Minnesota  1340        3.07%
2022             W03              New York   2400        -7.69%
2022             W03              Boston     800         200%
2022             W03              Minnesota  1200        -10.44%
The calculated field I used to get the 2022 W-o-W growth %:
(SUM([Sales]) - LOOKUP(SUM([Sales]), -1)) / LOOKUP(SUM([Sales]), -1)
I want to apply the 2022 growth rate for a specific City and Iso-Week combination to the same City and Iso-Week combination in 2023. As an example,
New York grew by 4% from W01 to W02 of 2022, so I'm looking to apply the same 4% to calculate Sales in W02 of 2023 (W01 2023 Sales * 1.04), and so on. In this way, I want to apply the 2022 growth rates for each week and city combination to calculate Sales for all future weeks of 2023. I already have my W01 2023 data to start with.
Iso-Year (Date)  Iso Week (Date)  City      SUM(Sales)*Growth  New Sales
2023             W01              New York  4000               4000
2023             W02              New York  4000*(1+4%)        4160
2023             W03              New York  4160*(1-7.69%)     3840
I cannot find a way to do this with my dataset. I cannot use a FIXED expression because I've used a table calculation in my calculated field for the growth rate.
My struggle is that I'm using a dataset provided to me by my organization and cannot modify the table logic. After I created the calculated field, I cannot use the 2022 growth % when the view is filtered to 2023 data.
I have the option to create a custom data source, but I do not think it is worth the effort.
Does anyone have an idea of how to make this possible?
I can provide more information if the description is not clear enough. Thanks a lot!
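
To make the intended arithmetic explicit, here is a minimal Python sketch (outside Tableau) that derives week-over-week growth from one city's 2022 sales and compounds it forward from a known 2023 starting week. It uses the New York figures from the tables above. The function names `wow_growth` and `project` are hypothetical, and the sketch assumes zero-padded week labels such as W01 so that they sort correctly as strings.

```python
def wow_growth(sales_by_week):
    """Growth of each week relative to the previous week: current / previous - 1."""
    weeks = sorted(sales_by_week)
    return {
        cur: sales_by_week[cur] / sales_by_week[prev] - 1
        for prev, cur in zip(weeks, weeks[1:])
    }


def project(start_week, start_sales, growth):
    """Compound last year's weekly growth rates forward from a known starting week."""
    projected = {start_week: start_sales}
    level = start_sales
    for week in sorted(growth):
        if week <= start_week:
            continue  # only weeks after the starting week are projected
        level *= 1 + growth[week]
        projected[week] = level
    return projected


# New York figures from the question.
sales_2022 = {"W01": 2500, "W02": 2600, "W03": 2400}
growth_2022 = wow_growth(sales_2022)
sales_2023 = project("W01", 4000, growth_2022)
```

With these inputs the projected W02 and W03 values come out at roughly 4160 and 3840, matching the New Sales column. The point of the sketch is that each projected week depends on the previous projected week, which is why a single row-level calculation is not enough.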

Query that references itself

I want to create a self-referencing, shifting basket of stocks (similar to the methodology S&P uses for the S&P 500). The goal is an index whose composition changes every month. The first spot is guaranteed to the top stock by market cap (rank 1). The second spot goes to a stock that was in the previous month's lineup and ranks 2 or 3 this month. If that stock ranks lower than 3, it is excluded and a new stock slots into its place. Otherwise, the next-closest stock is chosen.
Given the table below, the index would include the following stocks:
2020-01-01 – AAPL + MSFT
2020-02-01 – AAPL + MSFT
2020-03-01 – AAPL + GOOG
In my real data, I obviously have many more stocks and many more months. I am having a very hard time modeling the second case in Postgres, since it requires me to maintain a continuously updated "previous month" table that I need to reference when checking the current month. Any idea how to do this in PostgreSQL? I tried recursive CTEs and those didn't work (due to inner join requirements).
The table has the structure below.
date        stock  rank
2020-01-01  AAPL   1
2020-01-01  MSFT   2
2020-01-01  META   3
2020-01-01  GOOG   4
2020-02-01  AAPL   1
2020-02-01  MSFT   3
2020-02-01  META   2
2020-02-01  GOOG   4
2020-03-01  AAPL   1
2020-03-01  MSFT   4
2020-03-01  META   3
2020-03-01  GOOG   2
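
The selection rule is easier to reason about as a month-by-month loop, because each month's lineup depends on the lineup chosen for the month before. Below is a minimal Python sketch of one reading of the rule, not a PostgreSQL solution: the rank 1 stock always gets the first spot, a stock from the previous lineup keeps the second spot while it ranks 3 or better, and otherwise the best-ranked remaining stock takes it. The name `build_index` and the `buffer_rank` parameter are hypothetical. The sketch assumes ISO-formatted date strings, at least two stocks per month, and the table with GOOG assigned to each of the three months.

```python
from collections import defaultdict


def build_index(rows, buffer_rank=3):
    """Return {date: [first_stock, second_stock]} under the assumed rule."""
    by_month = defaultdict(dict)
    for date, stock, rank in rows:
        by_month[date][stock] = rank

    lineups = {}
    previous = set()  # lineup chosen for the preceding month
    for date in sorted(by_month):  # ISO date strings sort chronologically
        ranks = by_month[date]
        ordered = sorted(ranks, key=ranks.get)
        leader = ordered[0]
        challengers = [s for s in ordered if s != leader]
        # A stock from last month's lineup keeps its spot while it ranks well enough.
        incumbents = [
            s for s in challengers if s in previous and ranks[s] <= buffer_rank
        ]
        second = incumbents[0] if incumbents else challengers[0]
        lineups[date] = [leader, second]
        previous = {leader, second}
    return lineups


rows = [
    ("2020-01-01", "AAPL", 1), ("2020-01-01", "MSFT", 2),
    ("2020-01-01", "META", 3), ("2020-01-01", "GOOG", 4),
    ("2020-02-01", "AAPL", 1), ("2020-02-01", "MSFT", 3),
    ("2020-02-01", "META", 2), ("2020-02-01", "GOOG", 4),
    ("2020-03-01", "AAPL", 1), ("2020-03-01", "MSFT", 4),
    ("2020-03-01", "META", 3), ("2020-03-01", "GOOG", 2),
]
lineups = build_index(rows)
```

On the sample data this reproduces the three lineups listed in the question. The same loop could be ported to a procedural function on the database side or run in application code. Either way, the state carried from month to month is just the previous lineup.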

calculation to determine average per event by year

I have a very large table of data containing cricket information. At the moment I am trying to get the average number of runs per match for Australia (and other countries) in the years 2013, 2014, and 2015. I was able to get the total runs per year into a graph, and currently I have a bar chart that looks like this:
year 2013 | 2014 | 2015
total runs 1037 | 1835 | 177
but I would like one that divides that total by the number of matches per year (6, 13, and 1 respectively) and looks like this:
year 2013 | 2014 | 2015
avg runs per match 173 | 141 | 177
but I don't know how to perform a calculation that divides the total by the number of games played. There is a column in my set called 'MID' for Match ID. Obviously, counting the distinct MID values for 2013 would give me the needed number, 6.
Ideally, I would divide the total number of runs by the number of unique items in the MID column, but I do not know how to do this. If this makes any sense at all, would anyone have a simple way of doing this? I would really appreciate it, as I'm essentially experimenting with this on my own and falling way behind on my other projects.
Assuming you have a column named "Runs" and a column named "MID", then a calculation for Runs per Match would be as follows:
SUM(Runs) / COUNTD(MID)
This gives total runs divided by distinct count of Match ID.
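
As an illustration of what that calculation does, here is a minimal Python sketch of the same arithmetic outside Tableau. The rows are made-up values, not the asker's data, and `runs_per_match` is a hypothetical helper. The point is only that the denominator is the number of distinct match IDs per year, not the number of rows.

```python
from collections import defaultdict

# Hypothetical rows: (year, match_id, runs). A match may span several rows,
# so the row count is not the same as the match count.
rows = [
    (2013, "M1", 90),
    (2013, "M1", 60),
    (2013, "M2", 120),
    (2014, "M3", 200),
]


def runs_per_match(rows):
    """Total runs per year divided by the number of distinct match IDs in that year."""
    totals = defaultdict(int)
    matches = defaultdict(set)
    for year, mid, runs in rows:
        totals[year] += runs
        matches[year].add(mid)
    return {year: totals[year] / len(matches[year]) for year in totals}


averages = runs_per_match(rows)
```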

How to include Percentage in ToolTip in Tableau Public

I created the view below, which shows the number of students who got PASS/FAIL in each subject. The tooltip gives me some default options.
But I would like to have the percentage in the tooltip.
Basically, I need a Percentage field in the tooltip that says 50% for the screen below.
PASS Percentage 50%
FAIL Percentage 50%
This Percentage field needs to vary by subject and by grade among students.
Could somebody help me with the steps to include the percentage in the tooltip?
Sample dataset:
id name age gender subject grade
100 Steve 14 MALE ENGLISH PASS
100 Steve 14 MALE PHYSICS PASS
100 Steve 14 MALE CHEMISTRY PASS
101 Edward 15 MALE ENGLISH FAIL
101 Edward 15 MALE PHYSICS FAIL
101 Edward 15 MALE CHEMISTRY FAIL
102 Andy 15 FEMALE ENGLISH PASS
102 Andy 15 FEMALE PHYSICS PASS
102 Andy 15 FEMALE CHEMISTRY FAIL
103 Kim 16 FEMALE ENGLISH FAIL
103 Kim 16 FEMALE PHYSICS FAIL
103 Kim 16 FEMALE CHEMISTRY PASS
Table calcs let you calculate percent of totals without creating new calculated fields.
Put SUM(Number of Records) on the Tooltip shelf. Then click on it and choose Quick Table Calc->Percent of Total. You will see a triangle icon next to the field, indicating it is now a table calculation.
Experiment with changing the Compute Using setting for the field. I believe Compute Using Grade is probably the one you want.
I did this with a few calcs. First, get the PASS count.
IF [Grade] = 'PASS' THEN 1 END
Then create a Pass % calc.
sum([Pass count]) / total(countd([Id]))
Now you can place this field in the Tooltip. Repeat for FAIL as well and place that in the Tooltip.
Then I updated the tooltip as follows:
Number of students <CNTD(Id)> (<AGG(pass %)> <AGG(fail %)>) who got a <Grade> in <Subject>
See the sample workbook here for details: https://dl.dropboxusercontent.com/u/60455118/160326%20stack%20question.twbx
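
To check what the tooltip should display, the percentages can be computed directly from the sample dataset. The Python sketch below is only a cross-check of the arithmetic, not a Tableau feature. It assumes the percentage wanted is the share of each grade within a subject, and `grade_percentages` is a hypothetical helper name.

```python
from collections import Counter

# (subject, grade) pairs taken from the sample dataset in the question.
records = [
    ("ENGLISH", "PASS"), ("PHYSICS", "PASS"), ("CHEMISTRY", "PASS"),  # Steve
    ("ENGLISH", "FAIL"), ("PHYSICS", "FAIL"), ("CHEMISTRY", "FAIL"),  # Edward
    ("ENGLISH", "PASS"), ("PHYSICS", "PASS"), ("CHEMISTRY", "FAIL"),  # Andy
    ("ENGLISH", "FAIL"), ("PHYSICS", "FAIL"), ("CHEMISTRY", "PASS"),  # Kim
]


def grade_percentages(records):
    """Share of each grade within its subject, as a percentage."""
    per_subject = Counter(subject for subject, _ in records)
    per_pair = Counter(records)
    return {
        (subject, grade): 100 * count / per_subject[subject]
        for (subject, grade), count in per_pair.items()
    }


shares = grade_percentages(records)
```

In this sample every subject has two passes and two fails, so each (subject, grade) pair comes out at 50%, which is the value the asker expects to see.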

Cumulative Days in Overlapping Date Ranges

I use Crystal Reports 11.
What I'd like to do is get a count of the unique days a student was enrolled in one of our many programs. If a student was enrolled in 3 programs in which the dates overlapped, I'd just want to count each day once and get a number.
Example using a student:
Algebra Jan 1 to Jan 10: 10 days
Science Jan 4 to Jan 11: 8 days
English Jan 9 to Jan 13: 5 days
I'd want the answer to be 13.
Good point. If the date ranges always overlap, then this will work:
Create a formula that finds the maximum end date and the minimum start date for each patient, then subtract the minimum from the maximum.
i.e.: Maximum({xxx.enddate}, {xxx.patient}) - Minimum({xxx.startdate}, {xxx.patient}) + 1
The + 1 counts both the first and the last day, so Jan 1 to Jan 13 gives 13 rather than 12.
If there are gaps between program dates, this won't work because it will include them.
Grouping by the patient name and using DistinctCount() may also be helpful.
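
For the general case, including gaps between programs, the usual approach is to sort the ranges, merge the ones that overlap, and add up the inclusive length of each merged block. The Python sketch below illustrates that logic outside Crystal Reports. The `unique_days` helper is a hypothetical name, and the year 2024 is an arbitrary assumption because the question gives only month and day.

```python
from datetime import date


def unique_days(ranges):
    """Count distinct calendar days covered by inclusive (start, end) date ranges."""
    total = 0
    block_start = block_end = None
    for start, end in sorted(ranges):
        if block_end is None or start > block_end:
            # The current range starts after the open block ends: close that block.
            if block_end is not None:
                total += (block_end - block_start).days + 1
            block_start, block_end = start, end
        else:
            # Overlapping range: extend the open block if it reaches further.
            block_end = max(block_end, end)
    if block_end is not None:
        total += (block_end - block_start).days + 1
    return total


# The student's three programs from the question (year chosen arbitrarily).
enrollments = [
    (date(2024, 1, 1), date(2024, 1, 10)),   # Algebra
    (date(2024, 1, 4), date(2024, 1, 11)),   # Science
    (date(2024, 1, 9), date(2024, 1, 13)),   # English
]
days = unique_days(enrollments)
```

For the example student this gives 13. Unlike the maximum-minus-minimum formula, days that fall in a gap between programs are not counted.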