How to search for names in a text field in SQL Server - tsql

I need to redact proper names from text fields in SQL Server. Let's say I have the following table:
PersonTable
FirstName
LastName
Notes
I could do this:
UPDATE PersonTable
SET Notes = REPLACE(REPLACE(Notes, FirstName, 'REDACTED'), LastName, 'REDACTED')
That should work fine for the exact match condition, but what if someone has misspelled first or last name in the Notes field, or worse yet, used a nick-name like Jim?
I think Full Text searching using Contains is good for this sort of thing where the deviation is meaning or derivation-based, but will it work for names? Even if it worked for finding rows where Notes contained a name, I don't think it works with the Replace scenario.
I have also considered SOUNDEX, but I am also not seeing how to do this using Replace for a text field. The only way I can see using Soundex or something like that would be to split the text field into words and do a comparison on each word. I have to do this on many text fields in very heavily populated tables, so I'm not excited about doing that if there's a better way.
Does anyone have experience doing something like this?
Thanks

Related

Mysql to Sphinx query conversion

how can i write this query on sphinx select * from vehicle_details where make LIKE "%john%" OR id IN (1,2,3,4), can anyone help me? I've search a lot and i can't find the answer. please help
Well if you really want to use sphinx, could perhaps make id into a fake keyword, so can use it in the MATCH, eg
sql_query = SELECT id, CONCAT('id',id) as _id, make, description FROM ...
Now you have a id based keyword you can match on.
SELECT * FROM index WHERE MATCH('(#make *john*) | (#_id id1|id2|id3|id4)')
But do read up on sphinx keyword matching, as sphinx by default only matches whole words, you need to enable part word matching with wildcards, (eg with min_infix_len) so you can get close to a simple LIKE %..% match (which doesnt take into account words)
Actually pretty hard to do, becuase you mixing a string search (the LIKE which will be a MATCH) - with an attribute filter.
Would suggest two seperate queries, one to sphinx for the text filter. And the IN filter just do directly in database (mysql?). Merge the results in the application.

FullText Index - Searching values from another table

Is it possible, in SQL Server 2008, using the full text index syntax, to run a query such as this one?
SELECT *
FROM TABLE_TO_SEARCH S,
TABLE_WITH_STRINGS_TO_SEARCH SS
WHERE
CONTAINS(S.WHOLE_NAME,SS.FIRST_NAME)
OR CONTAINS(S.WHOLE_NAME,SS.LAST_NAME)
I need to search for the FIRST_NAME in table TABLE_TO_SEARCH, column WHOLE_NAME that has an full text index on it. It doesn't seem to be a valid query though... Is there any workaround to it by using the full text index search?
LATER EDIT:
Here is the business case: each night I am downloading from several websites information about "blacklisted" individuals and insert it into a table in this format: WholeName, LastName, FirstName, MiddleName. But the data is chaotic as WholeName does not necessarily contain either the last, first or middle name or the WholeName is null while the other 3 fields have values, or every of these 4 fields is null and so on. Also, the data may repeat itself as one blacklisted individual may come from 2+ of these websites. What I need to do is to compare this data, as chaotic as it is, against our customer data based on our customer's First and Last name and give it a matching score (rank) against the files we download from these websites.
First I tried with charindex or like operators but I couldn't create a scoring algorithm based on this and also it took 6 hours to compare just our customer's first and last name with only the WholeName column from the TABLE_TO_SEARCH table. I thought that perhaps implementing the full_text index it would get easier and faster but ... apparently I was wrong.
Has anyone dealt with a task like this? And if so, what was the best approach?
After skimming http://technet.microsoft.com/en-us/library/ms187787.aspx and http://technet.microsoft.com/en-us/library/ms142571.aspx I don't think it is possible to do your search in this way. Not only that, but it seems this type of index wouldn't work well with names anyway.
If you care about checking one name then all you have to do is set those values to variables. This method would allow you to use the full-text index.
Otherwise, I would suggest splitting the WHOLE_NAME column (if there is a space or unique character between the first and last name) and comparing each part to those other columns. If you are working with a huge data set, you may want to experiment with doing this at a temp table level and creating an index.
Good luck!

Access Form - Syntax error (missing operator) in query expression

I am receiving a syntax error in a form that I have created over a query. I created the form to restrict access to changing records. While trying to set filters on the form, I receive syntax errors for all attributes I try to filter on. I believe this has something to do with the lack of () around the inner join within the query code, but what is odd to me is that I can filter the query with no problem. Below is the query code:
SELECT CUSTOMER.[Product Number], SALESPERSON.[Salesperson Number],
SALESPERSON.[Salesperson Name], SALESPERSON.[Email Address]
FROM SALESPERSON INNER JOIN CUSTOMER ON
SALESPERSON.[Salesperson Number] = CUSTOMER.[Salesperson Number];
Any ideas why only the form would generate the syntax error, or how to fix this?
I was able to quickly fix it by going into Design View of the Form and putting [] around any field names that had spaces. I am now able to use the built in filters without the annoying popup about syntax problems.
I had this same problem.
As Dedren says, the problem is not the query, but the form object's control source. Put [] around each objects Control Source. eg: Contol Source: [Product number], Control Source: Salesperson.[Salesperson number], etc.
Makita recomends going to the original table that you are referencing in your query and rename the field so that there are no spaces eg: SalesPersonNumber, ProductNumber, etc. This will solve many future problems as well. Best of Luck!
Try making the field names legal by removing spaces. It's a long shot but it has actually helped me before.
No, no, no.
These answers are all wrong. There is a fundamental absence of knowledge in your brain that I'm going to remedy right now.
Your major issue here is your naming scheme. It's verbose, contains undesirable characters, and is horribly inconsistent.
First: A table that is called Salesperson does not need to have each field in the table called Salesperson.Salesperson number, Salesperson.Salesperson email. You're already in the table Salesperson. Everything in this table relates to Salesperson. You don't have to keep saying it.
Instead use ID, Email. Don't use Number because that's probably a reserved word. Do you really endeavour to type [] around every field name for the lifespan of your database?
Primary keys on a table called Student can either be ID or StudentID but be consistent. Foreign keys should only be named by the table it points to followed by ID. For example: Student.ID and Appointment.StudentID. ID is always capitalized. I don't care if your IDE tells you not to because everywhere but your IDE will be ID. Even Access likes ID.
Second: Name all your fields without spaces or special characters and keep them as short as possible and if they conflict with a reserved word, find another word.
Instead of: phone number use PhoneNumber or even better, simply, Phone. If you choose what time user made the withdrawal, you're going to have to type that in every single time.
Third: And this one is the most important one: Always be consistent in whatever naming scheme you choose. You should be able to say, "I need the postal code from that table; its name is going to be PostalCode." You should know that without even having to look it up because you were consistent in your naming convention.
Recap: Terse, not verbose. Keep names short with no spaces, don't repeat the table name, don't use reserved words, and capitalize each word. Above all, be consistent.
I hope you take my advice. This is the right way to do it. My answer is the right one. You should be extremely pedantic with your naming scheme to the point of absolute obsession for the rest of your lives on this planet.
NOTE:You actually have to change the field name in the design view of the table and in the query.
Put [] around any field names that had spaces (as Dreden says) and save your query, close it and reopen it.
Using Access 2016, I still had the error message on new queries after I added [] around any field names... until the Query was saved.
Once the Query is saved (and visible in the Objects' List), closed and reopened, the error message disappears. This seems to be a bug from Access.
I did quickly fix it by going into "Design View" of the main Table of same Form and putting underline (_) between any field names that had spaces. I am now able to use the built in filters without the annoying popup about syntax problems.
Extra ( ) brackets may create problems in else if flow. This also creates Syntax error (missing operator) in query expression.
I had this on a form where the Recordsource is dynamic.
The Sql was fine, answer is to trap the error!
Private Sub Form_Error(DataErr As Integer, Response As Integer)
' Debug.Print DataErr
If DataErr = 3075 Then
Response = acDataErrContinue
End If
End Sub

Openedge SDO -> smart data browser - I want to filter the query results

I have an SDO supplying data to a read-only browser. The SDO query joins several tables and has calculated fields as well as natural data fields.
The users now want a search facility so the browser will only show rows where the search word appears in ANY of the text fields.
For example they want to see rows where
customer.name matches "*bob*" OR
customer.address1 matches "*bob*" OR
product.description matches "*bob*" OR
calc_field_1 matches "*bob*" OR
calc_field_2 matches "*bob*" OR ...
Ideally the answer will filter the SDO output as it is created - but I am also happy to filter the data on the way to the smartbrowser or in the smartbrowser.
The business problem you're trying to solve in fraught with performance issues if you implement it as written. I'd suggest
adding another character column to the table or db,
putting all the words from the other columns in it,
applying a word-index to the new column,
doing a search on that column, and then linking back to the source tables.
It'll be much faster and easier to use.
I used a very simple solution in the end. Users can enter a string they are looking for. If the string is in a cell in the browser then the cell is highlighted in yellow.
Before this the users had to scroll up and down trying to spot the cells of interest in hundreds of rows. We did not have the time or budget for anything fancier.
The important bit of code in the smartbrowser is like this...
on row-display of br_table in frame f-main
do:
if rowObject.field1 matches "*BOB*" then
rowObject.field1:BGCOLOR in browse br_table = 14.
if rowObject.field2 matches "*BOB*" then
rowObject.field2:BGCOLOR in browse br_table = 14.
if rowObject.field3 matches "*BOB*" then
rowObject.field3:BGCOLOR in browse br_table = 14.
... etc ...
it's not hard-coded to only look for Bob - but you should get the idea.

Searching Core Data when field contains ' (single quote) character

Please help because I think I'm going mad.
I have a core data collection that contains many thousand records.
Some of the fields in these records contain terms that have a single
quote. When building the database from XML, we created two fields - a
NAME field and a SORTFIELD field.
The NAME field contains the full term and is used for display purposes
in our application.
The SORTFIELD field contains the same content as the NAME field but is
converted to lowercase and has all single quote characters removed
using the following.
[NSString stringByReplacingOccurrencesOfString:#"'" withString:#""];
When the user enters the string for which they wish to search, we
remove single quotes from the search string entered and convert to
lowercase using the same method as when creating the data. However it
does not find any results.
If the user enters Michael's, the system fails to find any records
The predicate is "SORTFIELD like 'michaels'"
If the user enters Michael' (single quote no 's' character), the
system finds all records starting Michael's
The predicate is "SORTFIELD like 'michael'"
I am sure we are doing something very stupid and appreciate any
pointers on how to resolve this issue as we have gone round and round
in circles.
We have placed many NSLog entries to track actual content of the
search fields and record fields and all display as expected.
Thanks
Mike
I finally resolved the issue on finding that the fields being searched
had a non-displayable character when a single quote existed. For
example Michael's displayed but Michaelx's where x= an extended ascii
character, was the actual contents.
By removing this from the fields we were able to perform the searching
as described in our original posting.
However, I still feel that you should be able to search a Core Data collection for strings that contain single quotes without having to do the 'hack' described.
Welcome any thoughts from others on this.
Thanks
Mike