How do I deconflict multiple ABRecord when the uniqueid fails and I have people with the same first and last name? - iphone

This may be down in the weeds, but thought I'd throw this out there and ask. So I'm building an app where I need to reference information in the address book, say a person's phone number, but do not want to store the info separately. Basically, every time the user loads the app, I'll go check the address book and get the info just in case they've updated it.
Anyway, I've read this in the iPhone Programming Doc:
"Every record in the Address Book database has a unique record identifier. This identifier always refers to the same record, unless that record is deleted or the MobileMe sync data is reset. Record identifiers can be safely passed between threads. They are not guaranteed to remain the same across devices.
The recommended way to keep a long-term reference to a particular record is to store the first and last name, or a hash of the first and last name, in addition to the identifier. When you look up a record by ID, compare the record’s name to your stored name. If they don’t match, use the stored name to find the record, and store the new ID for the record."
So what I'm worried about or curious about is let's say I've stored the uids and the first and last name. Then let's say upon sync or device transfer or whatever my uids get hosed. Now let's say my address book contains two entries for user "Bob Smith".
How do you deconflict this given the uids no longer match and first/last name are the same? My guess is to end up storing other info (e.g. phone number, email, etc...) but that puts me back into the situation of not really wanting to store more info than is necessary. I'm realizing this could be a .0001% time problem, but thought I'd throw this out there to see what you all thought.
Thanks for any suggestions!

I’m trying to solve the same problem. But with my solution, I can (or actually need to) store everything for multi-platform/multi-device issues. But if you can afford to pick up one or two additional properties to really find the difference between the two people (lets say e-mail address and birthday date), you should be able to recognize them at most times. If not, I would wonder if they aren’t really the same person (same name, same e-mail, same birthday… walks like a duck, quacks like a duck, is duck enough).
And I would say, it’s not .0001 % time, ie. me and my father have the very same name (and that happens to a lot of people in my country). But we definitely don’t have the same e-mail address or birthday date.

Related

Dealing with newlines or multi line responses from a user with a chatbot?

Is there a known way to deal with users writing responses over multiple lines? - is it best to handle this case on the client level? as in checking if the user is still typing and have a delay between responses, or can this be handled on Watson somehow?
An example would be:
Bot:
What's Your Name?
User:
My name is
Nour
Those are two independent messages by the user over 2 lines.
It is best to always send the full "utterance" to Assistant in one request, because the processing does not work across multiple split calls to Assistant. Otherwise you would need to do some complex logic with context variables, or ask back the user their name if they uttered "my name is" without an actual name.
Generally the client side UI would wait for the user to press Enter before sending the utterance to Assistant. So you can be sure they have entered the full utterance.
But perhaps if they do utter "my name is" you could have an intent which checks for a name and an entity that extracts the name, and a dialog node which if the intent is found has a slot which ensures the entity is also found. In that way, if they do say "my name is" and no name, the bot will ask them for their name.
DSeager's approach is potentially the right way for the example you gave. The reason being is that within your overall question you have an entity.
What about where there is on real entity? For example:
Two Intents.
Pay my speeding fine
Pay my parking fine
Here you are using the intents to understand the answer without the need of an entity. Some will argue the Intent->Entity approach, but depending on your solution it can generally not scale as well versus just an intent answer.
So now your user enters the following:
How do I pay my
parking fine
Entity solution doesn't really work here as you have no context that they want to pay.
So one approach.
1. Send "How do I pay my" to WA. Assuming you have trained the system well it should come back with a low confidence or irrelevant.
2. Before you respond to the user, see if another utterance has been cached to send. If it has, then append with some kind of marker and send. For example:
How do I pay my !! parking fine
This will return the correct answer.
But wait a second, what if they do this?
How do I pay my parking fine?
Where do I pay it?
Both are valid questions, but the second one will fail and you can't append it to the previous.
In this instance, when an answer displays have it set a $anaphora context variable. Then if you get a low confidence/irrelevant response try reasking with the $anaphora value appended.
For example:
Q: How do I pay my parking fine?
A: <Answer> $anaphora = "parking fine"
Q: Where do I pay it?
A: <Irrelevant>
Q: parking fine !! Where do I pay it?
Both of these require some work at the application layer.

Is the MODSEQ value of an email unique to the entire mail account?

As far as I can see in the IMAP RFC, it appears to say that a MODSEQ value is unique to a folder and will never be repeated unless UIDValidity changes. However, I can't see it saying anything about the account as a whole, rather just folders.
My question is, can I use an emails MODSEQ value as a unique value across the entire inbox, or need I define my own unique value, likely something similar to:
let uid = path + MODSEQ
There are no guarantee about uniqueness across folders. This is because some servers don't know much about other folders than the ones they have open at the moment, and it was considered important to make MODSEQ easy to implement for servers.
Yes, you need your own uniqueness value.

How do I generate a unique id from an auto incremented integer?

I have an auto incremented id (an int) that I want to convert in to something less "mine-able". Basically I don't want people to be able to access data/0, data/1, data/2, etc. and rip through our entire database. I was thinking of just hashing the ID but I wasn't sure if I could guarantee uniqueness.
Let's say the value range is from 1 to a couple hundred million. It may be that one of the hash algorithms can guarantee uniqueness within those parameters.
If not, what would be a good approach to take?
I did consider hashing and then appending the ID.
I'm trying to avoid using a GUID because it would require a lot of changes to existing code so I'd prefer to transform the data I have.
EDIT:
To further explain the situation - these are static resources that are being hit. I don't have to go to a database and reverse it or look it up against something else. Imagine a listing of products - a user might have a link to a specific page but I don't want them to be able to programatically go through every page so I need an non incrementing ID.
As far as I know hashing is intended to create unique ID based on some concrete data (e.g. name, surname etc.). Hashing auto incremented ID wont help you much. If someone searches through your database by entering an auto incremented ID, that ID will be passed to hash function as parameter and he will still get the data he wants. So I think that better solution would be to hash some other data in order to get a unique ID. If you do so then a person who searches through you database would have to know exact data that is stored in there (e.g. He would have to know exact name of you employee, or his SSN).
Hope that helps!
Use something pseudo-random to salt the value before hashing if there is no need for reverse lookup.

CQRS events do not contain details needed for updating read model

There is one thing about CQRS I do not get: How to update the read model when the raised event does not contain the details needed for updating the read model.
Unfortunately, this is a quite common scenario.
Example: I add a user to a group, so I send a addUserToGroup(userId, groupId) command. This is received, handled by the command handler, the userAddedToGroup event is created, stored and published.
Now, an event handler receives this event and the both IDs. Now there shall be a view that lists all users with the names of the groups they're in. To update the read model for that view, we do need the user id (which we have) and the group name (which we don't have, we only have its id).
So the question is: How do I handle this scenario?
Currently, four options come to my mind, all with their specific disadvantages:
The read model asks the domain. => Forbidden, and not even possible, as the domain only has behavior, no (public) state.
The read model reads the group name from another table in the read model. => Works, but what if there is no matching table?
Add the neccessary data to the event. => Does not work, as this means that I had to update all previous events as well, and I cannot foresee which data I may need one day.
Do not handle the event via a "usual" event handler, but start an ETL process in the background that deals with the event store, creates the neccessary data and writes the read model. => Works, but to me this seems a little bit of way too much overhead for such a simple scenario.
So, the question is: How do I deal with this scenario correctly?
There are two common solutions.
1) "Event Enrichment" is where you indeed put information on the event that reflects the information you are mentioning, e.g. the group name. Doing this is somewhere between modeling your domain properly and cheating. If you know, for instance, that group names change, emitting the name at the moment of the change is not a bad idea. Imagine when you create a line item on a quote or invoice, you want to emit the price of the good sold on the invoice created event. This is because you must honor that price, even if it changes later.
2) Project several streams at once. Write a projector which watches information from the various streams and joins them together. You might watch user and group events as well as your user added to group event. Depending on the ordering of events in your system, you may know that a user is in a group before you know the name of the group, but you should know the general properties of your event store before you get going.
Events don't necessarily represent a one-to-one mapping of the commands that have initiated the process in the first place. For instance, if you have a command:
SubmitPurchaseOrder
Shopping Cart Id
Shipping Address
Billing Address
The resulting event might look like the following:
PurchaseOrderSubmitted
Items (Id, Name, Amount, Price)
Shipping Address
Shipping Provider
Our Shipping Cost
Shipping Cost billed to Customer
Billing Address
VAT %
VAT Amount
First Time Customer
...
Usually the information is available to the domain model (either by being provided by the command or as being known internal state of the concerned aggregate or by being calculated as part of processing.)
Additionally the event can be enriched by querying the read model or even a different BC (e.g. to retrieve the actual VAT % depending on state) during processing.
You're correctly assuming that events can (and probably will) change over time. This basically doesn't matter at all if you employ versioning: Add the new event (e.g. SubmitPurchaseOrderV2) and add an appropriate event handler to all the classes that are supposed to consume it. No need to change the old event, it can still be consumed since you don't modify the interface, you extend it. This basically comes down to a very good example of the Open/Closed Principle in practice.
Option 2 would be fine, your question about "what about the mismatching in the groups' name read-model table" wouldn´t apply. no data should be deleted, should invalidated when a previous event (say delete group) was emmited. In the end the row in the groups table is there effectively and you can read the group name without problem at all. The only apparent problem could be speed inconsistency, but thats another issue, events should be orderly processed no matter speed they are being processed.

Am i violating any NF RULE on my database design?

Good day!
I am a newbie on creating database... I need to create a db for my recruitment web application.
My database schema is as follows:
NOTE: I included the applicant_id on other tables... e.g. exam, interview, exam type.
Am i violating any normalization rule? If i do, what do you recommend to improve my design? Thank you
Overall looks good. A few minor points to consider:
Interviewer is also a person. You will need to use program logic to prevent different / misspellings.
The longest real life email address I've seen was 62 characters.
In exam you use the reserved word date for a column name
(subjective) I would rename applicant_date to applied_at
I don't see a postal / zip code for the applicant
All result columns are VARCHAR(4). If they use the same values, can they be normalized?
Birthdate is better to store then age. You don't want to schedule someone for an interview on their birthdate (or if you're cruel by nature, you do want that :) ). Age can be derived from it and will also be correct at all times.
EDIT:
Given that result is PASS or FAIL, simply declare the field a boolean and name it 'passed'. A lot faster.
One area where I could see a potential problem is the Interviewer being integrated in interview. Also I would like to point at the source channel in applicant, which could potentially get blobbed (depending of what you're going to store in there).
You don't seem to be violating any normalization rules upon first glance. It's not clear from your schema design, however, that the applicant_id is a referencing the applicant table. Make sure you declare it as a foreign key that references the applicant table when actually implementing the scehma.
Not to make any assumptions on your data, but can the result of a screening be stored in 4 characters?
Age and gender are generally illegal questions to ask in interviews so you may not want to record such things. You might want a separate interviewer table. You also might want a separate table that stores qualifications so you can search for people you have interviewed with C# knowledge when the next opening comes up. I'd probably do something like a Qualifications table that is the lookup for quals you want to add to the applicant qualfications table. Then you'd need the qualification id, applicantId, years, skill level in the Applicant Qualification table.
I notice results is a varchar 4 field, I assume you are planning to put Pass/fail in it. I would consider having a numeric score as well. The guy who got 80% of the questions right passed but the guy who got 100% of them right might be the better candidate. In fact for interviews I might have interview questions and results tables. Then you can record the score and any comments about each question which can help later in evaluation of a lot of candidates. We did this manually in paper spreadsheets once when we were interviewing several hundred people (we had over a hundred openings at the time and this was way before personal computers) and found it most helpful to be able to compare answers to questions. It's hard to remember 200 people you interviewed and who said what. It might help later when you have a new opening to find the people who were strong onthe questions most pertinent to the new job who might not have been given a job at the time of the interview(5 excellent candidates, 1 job for instance).
I might also consider a field to mark if the candidate is unaccepatble for ever hiring for some reason. Such as he committed a felony or he lied on the resume and you caught him or he was just totally clueless in the interview. This can make it easy to prevent this person from being considered repeatedly.
I think that your DB structure has a lot of limitation for future usage. For example you can even have a description of the exam because this stable store the score and exam date. It may by that this kind of information are already stored in another system and you have to design only the result container. But even then the exam, screen and interview are just a form of test, that why the information about should be stored in one table and distinguished by some type id. If you decide to this approach you have to create another table to store the information about result
So the definition of that should look more like this:
TEST
TEST_ID
TEST_TYPE_ID ref TEST_TYPE - Table that define the test type
TEST_REQUIRED_SCORE - The value of the score that need to be reach to pass the exam.
... - Many others properties of TEST like duration, expire date, active inactive etc.
APPLICANT_RESULTS
APPLICANT_ID ref APPLICANT
TEST_ID = ref TEST
TESTS_DATE - The day of exam
TEST_START - The time when the test has started
TEST_FINISH - The time when the test has ended
APPLICANT_RESULT - The applicant result of taken test.
This kind of structure is more flexible and give the easy way to specify the requirements between the test in table like this
TEST_REQUIREMENTS - Table that specify the test hierarchy and limitation
TEST_ID ref TEST
REQUIRED_TEST ref TEST
ORDER - the order of exams
Another scenario is that in the future your employer will want to switch to an e-exam system. In that case only think what you will need are:
Create table that will store the question definition (one question can be used in exam, screen or interview)
Crate table that will store the question answers.
Create table that will store the information about the test question.
Create table for storing the answer for each question given from applicant.
A trigger that will update the over all score of test.