Problems with an m:m Merge (Stata) - merge

I am trying to merge two datasets that have unemployment rates from different sources. The first is structured as below:
It has over 30 variables but I am only listing this as an example. In addition, each observation is measured at one year only, for Egypt it is 2005.
year country Gender Unemployment
2005 EGY Female 7.6
2005 EGY Male 9.2
2005 EGY Total .
2006 EGY Female 7.6
2006 EGY Male 9
2006 EGY Total .
The second is structured as below, but it comes from an annual survey, so each country has three entries per year (total, male, female). Each country has data from 1995 to 2019.
country Gender year Unemployment
EGY Total 2005 12
EGY Male 2005 7
EGY Female 2005 17.5
Therefore, I tried to merge the two datasets with 1:1 and 1:m merge, and for both I get:
"variables country year do not uniquely identify observations in the master data"
However, the merge worked with a m:m as in below,
merge m:m country year using "Documents\LMI.dta"
Thanks to Nick's advice, I merged with the triples:
merge 1:1 country year Gender using "Documents\LMI.dta"
And it worked well!

On the face of it your datasets are identified by triples of country year Gender and so qualify for merge 1:1 with those variables. So, the downside of an m:m merge appears to be that it is quite wrong.
That statement doesn't solve any of the problems that come next:
Unemployment is so named in both sets, so what do you expect or want Stata to do?
In your data example, the values of Unemployment are different in different datasets, although perhaps this is not true of the real data.
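To make the point about identifiers concrete, here is a minimal sketch in Python (not Stata) that counts how often each key occurs in the rows from the question. The helper name `duplicated_keys` is made up for illustration. In Stata itself, `isid country year Gender` performs this kind of check.

```python
from collections import Counter

# Rows copied from the question's first dataset:
# (year, country, Gender, Unemployment). None stands in for Stata's missing ".".
master = [
    (2005, "EGY", "Female", 7.6),
    (2005, "EGY", "Male", 9.2),
    (2005, "EGY", "Total", None),
    (2006, "EGY", "Female", 7.6),
    (2006, "EGY", "Male", 9.0),
    (2006, "EGY", "Total", None),
]


def duplicated_keys(rows, key_positions):
    """Return the key tuples that occur more than once in rows (hypothetical helper)."""
    counts = Counter(tuple(row[i] for i in key_positions) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Key (country, year): every pair occurs three times, so it cannot identify rows.
dups_country_year = duplicated_keys(master, key_positions=(1, 0))

# Key (country, year, Gender): no key repeats, so a 1:1 match is possible.
dups_triple = duplicated_keys(master, key_positions=(1, 0, 2))

print(dups_country_year)
print(dups_triple)
```

The first list is non-empty, which is the situation the "do not uniquely identify observations" message describes; the second is empty.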

Related

PostgreSQL LAG records one year apart by partitioning

I have a db table with a bunch of records stored in snapshot fashion, i.e. daily captures of product unit availability over many years:
product units category expire_date report_date
pineapple 10 common 12/25/2021 12/01/2021
pineapple 8 common 12/25/2021 12/02/2021
pineapple 8 deluxe 12/28/2021 12/02/2021
grapes 45 deluxe 11/30/2022 12/01/2021
...
pineapple 21 common 12/12/2022 12/01/2022
...
What I'm trying to get from that data is something like this "lagged" version, partitioning by product and category:
product units category report_date prev_year_units_atreportdate
pineapple 10 common 12/01/2021 NULL
pineapple 21 common 12/01/2022 10
pineapple 16 common 12/01/2023 21
...
It's important to know that from time to time the cron snapshot task fails and no records are stored for days. This leads to a different number of records by product.
I've been using LAG() to no avail, since I can only get the previous day/month when partitioning by product, category.
Can anyone help me on this?
I think I would use a subselect rather than a window function.
select *,
(
select units from t t2
where t2.report_date=t1.report_date-interval '1 year' and t2.product=t1.product and t2.category=t1.category
) lagged_units
from t as t1
I'm not sure what you want to happen in a leap year, though, or the year after one.
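For anyone who wants to try the idea without a PostgreSQL server, here is a rough sketch using Python's built-in `sqlite3`. It is an adaptation, not the query above verbatim: dates are stored as ISO strings, SQLite's `date(x, '-1 year')` replaces the interval arithmetic, and `expire_date` is left out. It also assumes at most one row per product, category and report_date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (product TEXT, units INTEGER, category TEXT, report_date TEXT)"
)
# A few rows from the question, with dates rewritten in ISO format.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("pineapple", 10, "common", "2021-12-01"),
        ("pineapple", 8, "common", "2021-12-02"),
        ("pineapple", 8, "deluxe", "2021-12-02"),
        ("pineapple", 21, "common", "2022-12-01"),
    ],
)

# Correlated subselect: look up the row for the same product and category
# exactly one year earlier. A missing snapshot simply yields NULL (None).
rows = conn.execute(
    """
    SELECT t1.product, t1.units, t1.category, t1.report_date,
           (SELECT t2.units
              FROM t AS t2
             WHERE t2.report_date = date(t1.report_date, '-1 year')
               AND t2.product = t1.product
               AND t2.category = t1.category) AS lagged_units
      FROM t AS t1
     ORDER BY t1.report_date, t1.category
    """
).fetchall()

for row in rows:
    print(row)
```

Only the 2022-12-01 row finds a match one year back. Leap-day handling may differ between database engines, so that case is worth checking separately.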

How to group previously-denormalized-data from a row

I have a table containing courses run by teachers. I want to grab the number of taught days and split these by year and by teacher status.
The table contains the following fields:
id teacher_id course_name course_date course_duration teacher_status
--------------------------------------------------------------------------
1 Teacher_01 Course_AA 2012-02-01 2 volunteer
2 Teacher_02 Course_BB 2012-02-01 7 employee
3 Teacher_03 Course_BB 2013-02-01 7 contractor
4 Teacher_01 Course_AA 2014-02-01 2 paid volunteer
5 Teacher_04 Course_AA 2014-06-01 2 paid volunteer
Teachers may run a course under various statuses: volunteer, paid volunteer, contractor, employee, etc. The status of a given teacher can change through time. The duration of a course is expressed in days.
I can already gather the sum of taught days by teachers, split by status. This is done by
SELECT
teacher_status,
sum(course_duration) AS "Taught days"
FROM
my_table
GROUP BY
teacher_status
;
But data is not normalized and different families of statuses have been mixed. So I want to gather the same info (number of taught days) split:
by 3 statuses: volunteer, paid volunteer, all other statuses,
and by years.
What is expected is:
Year Teacher_status Taught_days
---------------------------------------
2012 volunteer 2
2012 employee 7
2013 contractor 7
2014 paid volunteer 4
I've tried various combinations of aggregate functions, GROUP BY / HAVING / ROLLUP statements but without success. How should I achieve this?
You'll want to select a complex expression and then GROUP BY that, not just by a raw column value. You could either repeat the expression or, in Postgres, also refer to the column alias:
SELECT
EXTRACT(year FROM course_date) as year,
(CASE teacher_status
WHEN 'volunteer' THEN 'volunteer'
WHEN 'paid volunteer' THEN 'paid'
ELSE 'other'
END) AS status,
SUM(course_duration) AS "Taught days"
FROM
my_table
GROUP BY
year,
status;
To get your example result, I have this query:
SELECT extract (year from course_date),
teacher_status,
sum(course_duration) AS "Taught days"
FROM
my_table
GROUP BY
extract (year from course_date),
teacher_status;
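As a quick check of the grouped-expression approach, here is a small sketch that runs it against the sample rows with Python's built-in `sqlite3`. This is an adaptation rather than the PostgreSQL queries above: `strftime('%Y', ...)` stands in for `EXTRACT(year FROM ...)`, and the aliases `course_year` and `status_group` are names chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER, teacher_id TEXT, course_name TEXT, "
    "course_date TEXT, course_duration INTEGER, teacher_status TEXT)"
)
# Sample rows from the question.
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Teacher_01", "Course_AA", "2012-02-01", 2, "volunteer"),
        (2, "Teacher_02", "Course_BB", "2012-02-01", 7, "employee"),
        (3, "Teacher_03", "Course_BB", "2013-02-01", 7, "contractor"),
        (4, "Teacher_01", "Course_AA", "2014-02-01", 2, "paid volunteer"),
        (5, "Teacher_04", "Course_AA", "2014-06-01", 2, "paid volunteer"),
    ],
)

# Collapse every status except the two volunteer ones into 'other',
# then group by the derived year and the derived status.
rows = conn.execute(
    """
    SELECT strftime('%Y', course_date) AS course_year,
           CASE teacher_status
               WHEN 'volunteer' THEN 'volunteer'
               WHEN 'paid volunteer' THEN 'paid volunteer'
               ELSE 'other'
           END AS status_group,
           SUM(course_duration) AS taught_days
      FROM my_table
     GROUP BY course_year, status_group
     ORDER BY course_year, status_group
    """
).fetchall()

for row in rows:
    print(row)
```

With the three-way split, employee and contractor both land in 'other', and the two 2014 paid volunteer courses add up to 4 days.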

TIBCO Spotfire: Cars in service since 3 weeks or more

I have a table of vehicles at service locations showing columns such as DAY, LICENSE, BOROUGH etc. I'd like to add a cross table showing the number of vehicles that have been serviced for 3 weeks or more. I'm not sure what custom expression to use.
Sample data:
I hope your sample data doesn't contain a bunch of legitimate license plates. It's not the most compromising data, but I would recommend blacking them out or replacing them with test data if you haven't already.
Anyway, you're looking for the DateDiff() function. For example:
If(DateDiff('day', [Date], Date(DateTimeNow())) >= 21, "21 days or more", "less than 21 days")

How to set the SQL Query for a Report?

Crystal Reports 2011.
Database is MS Access 2003
I have the following tables:
Calendar
has Date entries for the current and next Year, for every Day of the Year, plus some Status Fields marking certain days as "Special". (I am joining this table so that I have a record for Days with no activity.)
Staff table
StaffNo
Name
.
.
.
DayResults
Date
StaffNo
Status
.
.
.
The DayResults table has one entry per Day and Staff.
Entries are only made when the staff gets an entry by the Program for Status or other events. Staff that does not log in the system has no entry for this Day.
So, in case of John not showing up on July 2nd, I have no entry for him for this Day. But I need an entry for my report!
I need to create a Report that fetches Data from the DayResults Table and makes calculations on the Parameters there, in order to calculate a Daily as well as a Period Bonus.
The rules for this Bonus require that a Day without activity (i.e. No Show) results in a negative bonus amount.
Therefore I need to have a select statement which creates an entry FOR EACH DAY FOR EACH STAFF.
This should look like this:
Date StaffNo Name Status
2016/07/01 1 Jim 1
2016/07/01 2 John 2
2016/07/02 1 Jim 2
2016/07/02 2 John NULL
(John did not show up on 2016/07/02 ...)
SELECT Calendar.Date, Staff.StaffNo, Staff.NickName, DayResults.Status
FROM Staff LEFT JOIN (Calendar RIGHT JOIN DayResults ON Calendar.Date = DayResults.Date) ON Staff.StaffNo = DayResults.StaffNo;
Unfortunately, this gives no entry for John on July 2nd.
Any idea how to proceed?
Manfred
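One common way to get a row for every day and every staff member is to build all Calendar x Staff combinations first and only then left join DayResults onto them. Here is a minimal sketch of that idea using Python's built-in `sqlite3` with the sample rows above. It is not Access or Crystal Reports syntax, so the query would need adapting to that dialect, and the column `Name` is taken from the table description rather than the `NickName` used in the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Calendar (Date TEXT);
    CREATE TABLE Staff (StaffNo INTEGER, Name TEXT);
    CREATE TABLE DayResults (Date TEXT, StaffNo INTEGER, Status INTEGER);

    INSERT INTO Calendar VALUES ('2016-07-01'), ('2016-07-02');
    INSERT INTO Staff VALUES (1, 'Jim'), (2, 'John');
    -- John has no DayResults row for 2016-07-02 (he did not show up).
    INSERT INTO DayResults VALUES
        ('2016-07-01', 1, 1), ('2016-07-01', 2, 2), ('2016-07-02', 1, 2);
    """
)

# Step 1: CROSS JOIN produces one row per (day, staff member).
# Step 2: LEFT JOIN keeps those rows even when DayResults has no match.
rows = conn.execute(
    """
    SELECT c.Date, s.StaffNo, s.Name, d.Status
      FROM Calendar AS c
     CROSS JOIN Staff AS s
      LEFT JOIN DayResults AS d
        ON d.Date = c.Date AND d.StaffNo = s.StaffNo
     ORDER BY c.Date, s.StaffNo
    """
).fetchall()

for row in rows:
    print(row)
```

The last row is John on 2016-07-02 with an empty Status, which is the entry the bonus calculation needs.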

LOD Workaround with Tableau 8.3

I'm new to Tableau. I have a customer-event table to show which customers attended which events (like webinars, etc). One of the fields is sales, which is the sales for that customer in the 30 days from the date of the event.
custid eventid eventdt 30daysales
1 aa jan 1 $100
1 ab jan 1 $100
2 aa jan 2 $150
Note that customer 1 attended 2 events on the same day. So the sales number is duplicated. If I were building a report for a single event, it's no problem. But when I build a monthly report, I want sum(Sales) = $250 and not $350.
My report sample:
Month eventcount customercount 30daysales
Jan 2 2 $250
With Tableau 9, I read that using an LOD formula would allow me to sum sales on a per customer basis. But I'm on Tableau 8.3 and I'm wondering what the manual workaround is.
How do I write the calculated field to compute the 30daysales without duplicating?
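To pin down the arithmetic I am after (this is plain Python, not a Tableau calculated field), here is a small sketch with the three sample rows. It assumes the sales figure is identical on every row for the same customer and event date, so keeping one value per (custid, eventdt) pair is safe.

```python
# Sample rows from the table above: (custid, eventid, eventdt, sales in dollars).
rows = [
    (1, "aa", "jan 1", 100),
    (1, "ab", "jan 1", 100),
    (2, "aa", "jan 2", 150),
]

# Naive sum counts customer 1's sales twice.
naive_total = sum(sales for _, _, _, sales in rows)

# Keep a single sales value per (custid, eventdt) before summing.
sales_per_customer_day = {
    (custid, eventdt): sales for custid, _, eventdt, sales in rows
}
deduped_total = sum(sales_per_customer_day.values())

# The other two columns of the monthly report are distinct counts.
event_count = len({eventid for _, eventid, _, _ in rows})
customer_count = len({custid for custid, _, _, _ in rows})

print(naive_total, deduped_total, event_count, customer_count)
```

The naive sum is the $350 I want to avoid; the de-duplicated sum is the $250 shown in the report sample.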