Exact matching on >= 1 variable and fuzzy matching on >=1 variable stata - merge

In Stata, how can I do exact matching on at least one variable as well as fuzzy matching on at least one variable?
For instance, say that I want to do exact matching on org and year and fuzzy matching on firstname and lastname. In other words, in order for it to even consider fuzzy matching on firstname and lastname, org and year must be exact matches.
Here is an example dataset:
*dataset a
clear all
input str1 org year str10 firstname str12 lastname
"A" 2010 "susan" "robertson"
"A" 2011 "bob" "miller"
"B" 2010 "albert" "smith"
"B" 2011 "sue" "washington"
end
tempfile a
save `a'
And the other one, to be merged:
*dataset b
clear all
input str1 org year str10 firstname str12 lastname
"A" 2010 "Susan A" "Robertson"
"A" 2011 "bob" "Miller"
"A" 2012 "francisco" "ramirez"
"B" 2010 "mike" "doorpen"
"B" 2011 "sue h" "washnngton"
end
tempfile b
save `b'
How can I accomplish what I want?
The best I can think of is to use matchit after combining firstname and lastname together into one variable, say name. Then keep only the fuzzy matched results above some threshold for the observations that have the same org and year. But this seems pretty clunky. Is there a better way? Open to all approaches.

Someone told me the answer on Twitter: use reclink (https://fmwww.bc.edu/repec/bocode/r/reclink.html) with the required option.
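Independent of Stata, the idea described above (require exact agreement on org and year, and only then compare names fuzzily) can be sketched in Python. This only illustrates the logic; it is not reclink's algorithm. The use of difflib's SequenceMatcher, the 0.8 threshold, and all helper names are assumptions made for this sketch.

```python
from difflib import SequenceMatcher

# Same toy records as datasets a and b above: (org, year, firstname, lastname).
a = [
    ("A", 2010, "susan", "robertson"),
    ("A", 2011, "bob", "miller"),
    ("B", 2010, "albert", "smith"),
    ("B", 2011, "sue", "washington"),
]
b = [
    ("A", 2010, "Susan A", "Robertson"),
    ("A", 2011, "bob", "Miller"),
    ("A", 2012, "francisco", "ramirez"),
    ("B", 2010, "mike", "doorpen"),
    ("B", 2011, "sue h", "washnngton"),
]

THRESHOLD = 0.8  # hypothetical cutoff; tune it on real data


def full_name(rec):
    """Combine first and last name into one lower-cased string."""
    return f"{rec[2]} {rec[3]}".lower()


def similarity(x, y):
    """Similarity ratio between 0 and 1 from the standard library."""
    return SequenceMatcher(None, x, y).ratio()


def blocked_fuzzy_match(left, right, threshold=THRESHOLD):
    # Group the right-hand records by the exact-match key (org, year).
    blocks = {}
    for rec in right:
        blocks.setdefault(rec[:2], []).append(rec)

    matches = []
    for rec in left:
        # Only records with identical org and year are ever compared.
        candidates = blocks.get(rec[:2], [])
        if not candidates:
            continue
        best = max(candidates, key=lambda c: similarity(full_name(rec), full_name(c)))
        score = similarity(full_name(rec), full_name(best))
        if score >= threshold:
            matches.append((rec, best, score))
    return matches


matches = blocked_fuzzy_match(a, b)
for left_rec, right_rec, score in matches:
    print(left_rec, "<->", right_rec, round(score, 2))
```

With this threshold, "albert smith" stays unmatched because the only candidate in its block is "mike doorpen".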


String match in Postgresql

I am trying to make separate columns in my query result for values stored in a single column. It is a string field that contains a variety of similar values stored like this:
["john"] or ["john", "jane"] or ["john", "john smith", "jane"], etc., where each of the values in quotes is a distinct value. I cannot seem to isolate just ["john"] in a way that will return john and not john smith. The john smith value would be in a separate column: essentially, one column for each value in quotes. Note that I would like the results to contain neither the quotes nor the brackets.
I started with:
Select name
From namestbl
Where name like '%["john"]%';
I think this is heading in the wrong direction. I think this should be in select instead of where.
Your data examples are valid JSON array syntax, so cast them to jsonb and access individual elements by position (JSON arrays are zero-based). The t CTE mimics the real data. In the illustration below the number of columns is limited to 6.
with t(s) as
(
values
('["john", "john smith", "jane"]'),
('["john", "jane"]'),
('["john"]')
)
select s::jsonb->>0 name1, s::jsonb->>1 name2, s::jsonb->>2 name3,
s::jsonb->>3 name4, s::jsonb->>4 name5, s::jsonb->>5 name6
from t;
Here is the result.
name1 | name2      | name3 | name4 | name5 | name6
------+------------+-------+-------+-------+------
john  | john smith | jane  |       |       |
john  | jane       |       |       |       |
john  |            |       |       |       |
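The same positional idea can be sketched outside the database: parse each string as a JSON array and pad it to a fixed number of columns. This is only an illustration in Python; the function name split_names and the constant MAX_COLS are made up for the sketch, and missing positions become None, which mirrors the NULLs returned by ->> for out-of-range indexes.

```python
import json

# Sample values in the same form as the t CTE above.
rows = [
    '["john", "john smith", "jane"]',
    '["john", "jane"]',
    '["john"]',
]

MAX_COLS = 6  # same column limit as the SQL illustration


def split_names(raw, width=MAX_COLS):
    """Parse a JSON array string and pad or truncate it to `width` entries."""
    names = json.loads(raw)
    return names[:width] + [None] * (width - len(names))


for raw in rows:
    print(split_names(raw))
```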

How to convert a symbol to a string in kdb+?

For example, if I have a list of symbols, e.g. (`A.ABC;`B.DEF;`C.GHI) or (`A;`B;`C), how could I convert each item in the list to a string?
string will convert them. It's an atomic function, so it applies to each item in the list:
q)string (`A.ABC;`B.DEF;`C.GHI)
"A.ABC"
"B.DEF"
"C.GHI"
You can use the keyword string to do this:
q)lst:(`A;`B;`C)
// convert to list of strings
q)string lst
,"A"
,"B"
,"C"
As the others have mentioned, string is what you're after. In your example if you're interested in separating the prefix and suffix separated by the . you can do
q)a:(`A.ABC;`B.DEF;`C.GHI)
q)` vs' a
A ABC
B DEF
C GHI
and if you want to convert these to strings you can just use string again on the above.
q)string each (`A.ABC;`B.DEF;`C.GHI)
"A.ABC"
"B.DEF"
"C.GHI"
Thanks all, useful answers! While I was trying to solve this on my own in parallel, I came across ($) that appears to work as well.
q)example:(`A;`B;`C)
q)updatedExample:($)example;
q)updatedExample
enlist "A"
enlist "B"
enlist "C"
Use the string function (lowercase in q), for example on a table column:
q)d
employeeID firstName lastName
-----------------------------------------------------
1001 Employee 1 First Name Employee 1 Last Name
1002 Employee 2 First Name Employee 2 Last Name
q)update firstName:string(firstName) from `d
`d
q)d
employeeID firstName lastName
-------------------------------------------------------
1001 "Employee 1 First Name" Employee 1 Last Name
1002 "Employee 2 First Name" Employee 2 Last Name

Is there a way to filter one database field based on another?

I have one field which is {customer.id} and another which is {side} (that has the values "L" and "R") and I need to count the {customer.id} if {side} is either "L" or "R" but not both(xor) for the specific id.
Example:
{customer.id} {side}
id1 L
id1 R
id2 L
id2 R
id3 L
id4 R
id4 R
So I would like the result of the distinctcount for this example to be 2 (id3 and id4).
Is it possible to achieve this filtering somehow in crystal reports?
Create two running totals, one for "L" and one for "R". For the "L" running total:
Fields to summarize: {customer.id}
Type of Summary: Count
Evaluate: Check Use a Formula and press the formula button.
In the formula box enter {side} = "L"
Reset: Check Never
Set up the "R" running total the same way, with {side} = "R" in the formula box.
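The counting rule from the question (an id counts only if all of its rows have the same side) can be sketched outside Crystal Reports. The Python below is not the running-total technique itself, just the target logic on the example data; the function name count_single_sided is made up for this sketch.

```python
from collections import defaultdict

# Example rows from the question: (customer id, side).
records = [
    ("id1", "L"), ("id1", "R"),
    ("id2", "L"), ("id2", "R"),
    ("id3", "L"),
    ("id4", "R"), ("id4", "R"),
]


def count_single_sided(rows):
    """Count ids whose rows use exactly one distinct side (L xor R)."""
    sides_by_id = defaultdict(set)
    for customer_id, side in rows:
        sides_by_id[customer_id].add(side)
    return sum(1 for sides in sides_by_id.values() if len(sides) == 1)


print(count_single_sided(records))  # 2 (id3 and id4)
```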

Filemaker: If statement multiples

I am using FileMaker Pro 13 on a Windows 10 computer. The database is used from both Mac and Windows computers.
I have a calculation field that detects the string-value of another field to define itself.
Field 1: Name Field2: Position
If ( Field1 = "Bob" or "Joe" or "Carl" ; "Tech Assist")
If ( Field1 = "Susan" or "Hank" or "Alex" ; "Employee")
Filemaker only picks up on the first value of the If statement, in this example, "Bob" and "Susan". All others are left blank.
Like so:
Name: Bob Position: Tech Assist
Name: Joe Position: ___________
How do I get Filemaker to view all possibilities?
There are (at least) two problems with your attempt:
1. The test Field1 = "Bob" or "Joe" or "Carl" is evaluated as if you have written (Field1 = "Bob") or ("Joe") or ("Carl"), so it will only ever return true when Field1 = "Bob".
2. You cannot string two If() instructions together like that. Although you could nest them, you really should be using the Case() function here, say:
Case (
Field1 = "Bob" or Field1 = "Joe" or Field1 = "Carl" ; "Tech Assist" ;
Field1 = "Susan" or Field1 = "Hank" or Field1 = "Alex" ; "Employee"
)
The more serious problem here is that you are hard-coding data that will almost certainly change at some point into a calculation formula.
What you should have is a related table of Staff with fields for StaffID, Name and Position. Then, when you fill Field1 in this table with the appointed staff member's StaffID, the corresponding position will be taken from that staff member's record in the Staff table via a relationship.
After much fiddling, Big Boss and I figured it out. Seems a bit redundant:
If ( Field1 = "Bob" or Field1 = "Joe" or Field1 = "Carl" ; "Tech Assist")
Alternatively...
Case(
Not IsEmpty( FilterValues( List( "Bob" ; "Joe" ; "Carl" ) ; Field1 ) ) ;
"Tech Assist" ;
Not IsEmpty( FilterValues( List( "Susan" ; "Hank" ; "Alex" ) ; Field1 ) ) ;
"Employee"
)
A longer term solution would be to have a table of names and departments. See the image in case it is not clear.
Staff Table Sample
Then you will build a relationship between Field1 in your current table and the "Name" field in the sample image. From there, the relationship will output the name of the department so that you do not need if, else, or case. Otherwise, you will find yourself updating this script every time there is a change to your roster or their roles in your company. The relationship will also allow you to easily add new departments without the need to update this calculation.
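The difference between hard-coded conditions and a lookup through related data can be sketched in a general-purpose language. In the Python below the dictionary stands in for the Staff table; the names STAFF_POSITIONS and position_for are made up for this illustration, and the empty-string default mirrors the blank result a Case() without a default returns.

```python
# Stand-in for the related Staff table: name -> position.
# In FileMaker this data would live in records, not in a formula.
STAFF_POSITIONS = {
    "Bob": "Tech Assist",
    "Joe": "Tech Assist",
    "Carl": "Tech Assist",
    "Susan": "Employee",
    "Hank": "Employee",
    "Alex": "Employee",
}


def position_for(name):
    """Look the position up instead of testing each name in a formula."""
    return STAFF_POSITIONS.get(name, "")


print(position_for("Joe"))    # Tech Assist
print(position_for("Susan"))  # Employee
```

Adding a staff member or changing a role then means editing the data, not the lookup code.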

Get substring into a new column

I have a table that contains a column with data in the following format. Let's call the column "title" and the table "s".
title
ab.123
ab.321
cde.456
cde.654
fghi.789
fghi.987
I am trying to get a unique list of the characters that come before the ".", so that I end up with this:
ab
cde
fghi
I have tried selecting the initial column into a table and then doing an update to create a new column that holds the position of the dot, using "ss".
Something like this:
t: select title from s
update thedot: (title ss `.)[0] from t
I was then going to add a third column containing the first N characters of "title", where N is the value stored in the "thedot" column.
All I get when I try the update is a "type" error.
Any ideas? I am very new to kdb, so no doubt I am doing something simple in a very silly way.
The reason you get the type error is that ss only works on strings, not symbols. Also, ss is not a vector-based function, so you need to combine it with each (').
q)update thedot:string[title] ss' "." from t
title thedot
---------------
ab.123 2
ab.321 2
cde.456 3
cde.654 3
fghi.789 4
There are a few ways to solve your problem:
q)select distinct(`$"." vs' string title)[;0] from t
x
----
ab
cde
fghi
q)select distinct(` vs' title)[;0] from t
x
----
ab
cde
fghi
You can read here for more info: http://code.kx.com/q/ref/casting/#vs
An alternative is to make use of the 0: operator, to parse around the "." delimiter. This operator is especially useful if you have a fixed number of 'columns' like in a csv file. In this case where there is a fixed number of columns and we only want the first, a list of distinct characters before the "." can be returned with:
exec distinct raze("S ";".")0:string title from t
`ab`cde`fghi
OR:
distinct raze("S ";".")0:string t`title
`ab`cde`fghi
Here "S " defines the type of each column (a symbol for the first field, with the space skipping the second) and "." is the field delimiter. For records with a differing number of columns it would be better to use the vs operator.
A variation of WooiKent's answer using each-right (/:):
q)exec distinct (` vs/:title)[;0] from t
`ab`cde`fghi
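For comparison, the same "split on the dot, keep the first part, de-duplicate" logic can be sketched in Python. This is only an illustration of the idea behind the q answers above; the function name distinct_prefixes is made up, and first-seen order is preserved, as distinct does in q.

```python
# Same values as the title column in the question.
titles = ["ab.123", "ab.321", "cde.456", "cde.654", "fghi.789", "fghi.987"]


def distinct_prefixes(values, sep="."):
    """Return the unique parts before the first separator, in first-seen order."""
    seen = {}  # dict keys keep insertion order
    for value in values:
        prefix = value.split(sep, 1)[0]
        seen.setdefault(prefix, None)
    return list(seen)


print(distinct_prefixes(titles))  # ['ab', 'cde', 'fghi']
```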