I am trying to figure out how to parse an address using T-SQL, and I'm not great at T-SQL. My challenge is this:
I have a table called Locations defined as follows:
- City [varchar(100)]
- State [char(2)]
- PostalCode [char(5)]
My UI has a text box in which a user can enter an address. This address could be in essentially any form (yuck, I know). Unfortunately, I cannot change this UI either. Anyway, the value of the text box is passed into the stored procedure that is responsible for parsing the address. I need to take what the person enters and get the PostalCode associated with their input from the Locations table. For the life of me, I cannot figure out how to do this. There are so many cases. For instance, the user could enter any of the following:
Chicago, IL
Chicago, IL 60601
Chicago, IL, 60601
Chicago, IL 60601 USA
Chicago, IL, 60601 USA
Chicago IL 60601 USA
New York NY 10001 USA
New York, NY 10001, USA
You get the idea. There are a lot of cases. I can't find any parsers online either; I must not be looking in the right places. Can someone please point me to a parser online or explain how to do this? I'm willing to pay for a solution to this problem, but I'm shocked that I can't find anything.
Perhaps a CLR function would be a better choice than T-SQL. Check out http://msdn.microsoft.com/en-us/magazine/cc163473.aspx for an example of using regular expressions to parse some fairly complex string inputs into table-valued results. You can be as creative as you like with your regex matching, but the following regex should get you started:
(.*?)([A-Z]{2}),? (\d+)( USA)?$
If you're reluctant to use CLR functions, perhaps the calling system (ASP.NET or PHP, for example) has regex functionality you can use instead.
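As a rough illustration of doing the matching in the calling application, here is a minimal Python sketch (Python is only for illustration; the same pattern should port to .NET or PHP regex). The starter regex above requires a ZIP code and does not allow a comma before USA, so this variant makes the ZIP optional and accepts the separators seen in the question's examples. It assumes uppercase two-letter state codes and five-digit ZIPs; the names ADDRESS_RE and parse_address are hypothetical.

```python
import re
from typing import Optional, Tuple

# Permissive variant of the starter regex: the ZIP is optional, and a comma may
# appear before the state, before the ZIP, and before a trailing "USA".
ADDRESS_RE = re.compile(
    r"^\s*(?P<city>.+?),?\s+(?P<state>[A-Z]{2})"  # lazy city, then a two-letter state
    r"(?:,?\s+(?P<zip>\d{5}))?"                   # optional five-digit ZIP
    r"(?:,?\s+USA)?\s*$"                          # optional trailing country
)


def parse_address(text: str) -> Optional[Tuple[str, str, Optional[str]]]:
    """Return (city, state, zip), or None if the input doesn't fit the pattern."""
    match = ADDRESS_RE.match(text)
    if match is None:
        return None
    return match.group("city"), match.group("state"), match.group("zip")


for sample in ["Chicago, IL", "Chicago IL 60601 USA", "New York, NY 10001, USA"]:
    print(sample, "->", parse_address(sample))
```

When the ZIP is missing (as in "Chicago, IL"), the parsed city and state could be used to look up PostalCode in the Locations table; inputs outside these patterns return None and would need separate handling.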
I have training data in an .arff file, and I want to convert it to test data.
This is my training data:
@relation fix_labeled_tweet
@attribute Text string
@attribute class-att {relevant,not_relevant,additional}
@data
'pvj dengan ciwalk masih tetap jadi tempat fav untuk belanja;',additional
'deta di bandung trade centre btc fashion mall;',additional
'promo hotel bandung ibis trans studio enjoy our special price akan your wonderful weekend periode s di 27 desember;',not_relevant
'indri theressa di cihampelas walk ciwalk;',additional
'beiga we di jatinangor town square jatos;',additional
'nonton di paris van java my husband;',relevant
'mainya seringnya ke paris van java mall miko mall mana;',not_relevant
'double date yeahhhh di braga city walk;',relevant
'sinta di jatinangor town square jatos;',additional
'terimakasih tas dompet teguh di cihampelas walk ciwalk;',additional
'malam minggu miko the movie di cinema 21 mall panakukang;',additional
'karaokean sekalian dugem patriot handrian di inul vista paskal hypersquare;',relevant
'makan di mujigae korean resto ciwalk;',relevant
'just posted a photo bandung trade center;',additional
What I've tried is removing the labels (additional, relevant, not_relevant) from the data and saving the file under a different name, but it's not working: Weka says that the train and test sets are not compatible.
They are incompatible because the structures of the training set and the test set differ.
If you made a copy of the document (say, Testing.arff) and supplied it as the test set, the classifier would accept the file without complaint. If, however, you remove any of the attributes the classifier uses from the test file, the document cannot be used, because some of the inputs (for classification) or outputs (for evaluation) are missing.
I was able to reproduce your issue by removing the class output, but when I used an unmodified copy of the document, the test set worked as expected.
Hope this helps!
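To build an unlabeled test set, a common approach is to keep the header exactly as in the training file and replace each class value with ?, which ARFF uses to mark a missing value. A minimal Python sketch under stated assumptions: the file uses the standard @data marker, each instance is on one line, and the class is the last comma-separated field (true for the two-attribute file in the question). The function name to_unlabeled_arff is hypothetical.

```python
def to_unlabeled_arff(train_text: str) -> str:
    """Copy an ARFF file, replacing the class value on every data row with '?'.

    Assumes one instance per line and that the class is the LAST field.
    """
    out_lines = []
    in_data = False
    for line in train_text.splitlines():
        stripped = line.strip()
        if not in_data:
            out_lines.append(line)  # header copied unchanged so the structure stays compatible
            if stripped.lower() == "@data":
                in_data = True
            continue
        if not stripped or stripped.startswith("%"):
            out_lines.append(line)  # keep blank lines and ARFF comments as they are
            continue
        # rpartition splits at the LAST comma, so commas inside the quoted text survive
        features, _, _label = line.rpartition(",")
        out_lines.append(features + ",?")
    return "\n".join(out_lines) + "\n"
```

The result can be written to a new file (for example Testing.arff) and supplied as the test set; the attribute declarations still match the training file, so only the class values are unknown.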
Hello everyone, I am trying to get the prefix of phone numbers so that I can get the actual phone number without the country dialing code. How can I achieve this?
Please note that the phone numbers can look like any of these:
123456789
0099123456789
+9912345678
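or any other format with a country code, area code, etc.
If you try something like this, it may help somewhat, but I'm not sure:
NSUInteger len = [PhoneNumber length];
NSString *str = (len > 10) ? [PhoneNumber substringToIndex:len - 10] : @""; // prefix = everything before the last 10 characters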
Looking at the number of different prefixes you can have (see List of country calling codes [wikipedia] and International dialing prefix [wikipedia]), one could conclude that without narrowing the region down, you probably won't get very far with this.
If, however, you'll be handling phone numbers from one specific region to another specific region, you might be able to come up with something.
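For numbers from a known set of regions, one possible approach is to strip the international prefix (+ or 00) and then match the start of the remaining digits against the country codes you expect. A minimal Python sketch; it assumes 00 is the exit prefix (true in many but not all countries), treats numbers without + or 00 as national numbers, and the function name split_country_code and the 99 code used in the checks (taken from the question's sample numbers) are placeholders.

```python
from typing import List, Optional, Tuple


def split_country_code(number: str, known_codes: List[str]) -> Tuple[Optional[str], str]:
    """Split a phone number into (country_code, rest) for a known set of codes.

    Assumes international numbers start with '+' or the '00' exit prefix;
    anything else is treated as a national number with no country code.
    """
    digits = number.strip().replace(" ", "").replace("-", "")
    if digits.startswith("+"):
        rest = digits[1:]
    elif digits.startswith("00"):
        rest = digits[2:]
    else:
        return None, digits
    # Check longer codes first in case the supplied list has overlapping prefixes.
    for code in sorted(known_codes, key=len, reverse=True):
        if rest.startswith(code):
            return code, rest[len(code):]
    return None, rest  # international prefix present, but the code is not in our list
```

Restricting known_codes to the regions you actually handle is what keeps this tractable; with the full list of country codes, separating the area code from the subscriber number still needs per-country rules.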
I need to extract names (including uncommon names) from blocks of text using Perl. I've looked into this module for extracting names, but it only has the top 1000 popular names and surnames in the US dating back to 1990; I need something a bit more comprehensive.
I've considered using the Social Security Index to build a database for comparison, but this seems very tedious and processing-intensive. Is there another way to pull names out of text with Perl?
Example of text to parse:
LADNIER Louis Anthony Ladnier, [Louie] age 48, of Mobile, Alabama died at home Friday, November 16, 2012. Louie was born January 9, 1964 in Mobile, Alabama. He was the son of John E. Ladnier, Sr. and Gloria Bosarge Ladnier. He was a graduate of McGill-Toolen High School and attended University of South Alabama. He was employed up until his medical retirement as Communi-cations Supervisor with the Bayou La Batre Police Department. He is preceded in death by his father, John. Survived by his mother, Gloria, nephews, Dominic Ladnier and Christian Rubio, whom he loved and help raise as his own sons, sisters, Marj Ladnier and Morgan Gordy [Julian], and brother Eddie Ladnier [Cindy], and nephews, Jamie, Joey, Eddie, Will, Ben and nieces, Anna and Elisabeth. Memorial service will be held at St. Dominic's Catholic Church in Mobile on Wednesday at 1pm. Serenity Funeral Home is in charge of arrangements. In lieu of flowers, memorials may be sent to St. Dominic School, 4160 Burma Road Mobile, AL 36693, education fund for Christian Rubio and McGill-Toolen High School, 1501 Old Shell Road Mobile, AL 36604, education Fund for Dominic Ladnier. The family is grateful for all the prayers and support during this time. Louie was a rock and a joy to us all.
Use Stanford's NER (GPL). Demo:
http://nlp.stanford.edu:8080/ner/process
There is no surefire way to do this, due to the nature of the English language. You either need lists to (fuzzily) compare against, or you will have to accept significant accuracy penalties.
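A minimal Python sketch of the list-based approach, using the standard library's difflib for the fuzzy comparison. The tiny known_names list, the 0.75 cutoff, and the rule that any capitalized word is a candidate are all illustrative assumptions; the function name find_name_candidates is hypothetical, and a real name list would be far larger.

```python
import difflib
import re
from typing import List


def find_name_candidates(text: str, known_names: List[str], cutoff: float = 0.75) -> List[str]:
    """Return capitalized words in `text` that fuzzily match an entry in `known_names`."""
    found = []
    for word in re.findall(r"\b[A-Z][a-z]+\b", text):
        # get_close_matches scores candidates with SequenceMatcher.ratio()
        if difflib.get_close_matches(word, known_names, n=1, cutoff=cutoff):
            found.append(word)
    return found


sample = "Louie was born January 9, 1964 in Mobile, Alabama. He was the son of John E. Ladnier"
print(find_name_candidates(sample, ["Louis", "John", "Gloria"]))
```

With this sample list, Louie is found only because it is similar to Louis, while Ladnier is missed entirely; that trade-off between list coverage and fuzzy false positives is the accuracy penalty described above.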
The Apache Software Foundation has a few projects that cover entity extraction, with specific pre-trained models for English names (nameFinder). I would recommend OpenNLP or Stanbol. In the meantime, if you have just a few queries, I have an NLP tool I've implemented in C# in the apps section of my site: http://www.augmentedintel.com/apps/csharpnlp/extract-names-from-text.aspx.
Best,
Don
You're trying to implement named-entity recognition. The bad news is that it's really hard.
You could try Lingua::EN::NamedEntity, however:
$ perl -MLingua::EN::NamedEntity -nE 'say $_ for map { $_->{class} eq "person" ? $_->{entity} : () } extract_entities($_)' names.txt
Louie
Louis Anthony Ladnier
Louie
John E
Bayou La Batre Police Department
Gloria
Julian
Cindy
Eddie Ladnier
Eddie
John
Catholic Church
Christian Rubio
Dominic Ladnier
Burma Road Mobile
Louie
You can also use Calais, a Reuters web service for natural language processing, which gives much better results.
I think you want to Google something like:
perl part of speech tagging
Question: How would one write a function that checks whether a string (NSString) contains a zip code that is valid anywhere in the world?
Additional info: I am aware of regular expressions in iOS, but I am not very fluent with them. Please keep in mind that this should accept anything that is valid in any country.
Examples
US - "10200"
US - "33701-4313"
Canada - "K8N 5W6"
UK - "3252-322"
etc.
Edit: Those who voted down or to close the question, please do mention why. Thank you.
^[ABCEGHJKLMNPRSTVXY]\d[A-Z][- ]*\d[A-Z]\d$
Matches Canadian PostalCode formats with or without spaces (e.g., "T2X 1V4" or "T2X1V4")
^\d{5}(-\d{4})?$
Matches both US ZIP code formats (e.g., "94105-0011" or "94105")
(^\d{5}(-\d{4})?$)|(^[ABCEGHJKLMNPRSTVXY]\d[A-Z][- ]*\d[A-Z]\d$)
Matches US or Canadian codes in above formats.
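A quick way to sanity-check the combined pattern is to run it against the examples from the question. This sketch uses Python's re module for illustration; the pattern itself is unchanged and should also work with NSRegularExpression on iOS. Note that it is case-sensitive, so lowercase Canadian codes are rejected. The name is_us_or_canadian_code is hypothetical.

```python
import re

# Combined US ZIP / Canadian postal code pattern from the answer above (case-sensitive).
US_OR_CA = re.compile(r"(^\d{5}(-\d{4})?$)|(^[ABCEGHJKLMNPRSTVXY]\d[A-Z][- ]*\d[A-Z]\d$)")


def is_us_or_canadian_code(code: str) -> bool:
    """True if `code` matches either the US ZIP or the Canadian postal code format."""
    return US_OR_CA.match(code) is not None


for code in ["10200", "33701-4313", "K8N 5W6", "T2X1V4", "3252-322"]:
    print(code, is_us_or_canadian_code(code))
```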
UK codes are more complicated than you think: http://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom
I suggest you don't do this. I've seen many websites that try to enforce ZIP codes, but I've never seen one get it right. Even the term "zip code" is specific to the US.
In other words:
- (BOOL)isValidZipCode: (NSString *)zip {
return YES;
}
I was originally going to write [zip length] > 0, but of course even that isn't guaranteed.
Each country that uses postcodes/zip codes usually has its own format. You will be hard-pressed to find a regular expression that matches every code worldwide!
You're better off adding a country picker that determines the regular expression (if any) to be used to validate the zip code.
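As an aside, the postcode you gave as a UK example is not correct. A decent UK regex (written with the doubled backslashes of an Objective-C string literal, and meant to be matched case-insensitively, since its character classes are lowercase) is: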
^(^gir\\s0aa$)|(^[a-pr-uwyz]((\\d{1,2})|([a-hk-y]\\d{1,2})|(\\d[a-hjks-uw])|([a-hk-y]\\d[abehmnprv-y]))\\s\\d[abd-hjlnp-uw-z]{2}$)$
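The country-picker idea can be sketched as a lookup table from country to pattern. This is a minimal Python sketch with assumed country keys chosen for illustration: the US and Canadian patterns are the ones from the earlier answer, the UK pattern is the one above with the Objective-C escaping removed and case-insensitive matching turned on, and countries without an entry are accepted rather than rejected, in the spirit of the "return YES" answer. POSTCODE_PATTERNS and is_valid_postcode are hypothetical names.

```python
import re

# Hypothetical country-picker table: country key -> compiled postcode pattern.
POSTCODE_PATTERNS = {
    "US": re.compile(r"^\d{5}(-\d{4})?$"),
    "CA": re.compile(r"^[ABCEGHJKLMNPRSTVXY]\d[A-Z][- ]*\d[A-Z]\d$"),
    # UK pattern from the answer above, single backslashes, case-insensitive.
    "GB": re.compile(
        r"^(^gir\s0aa$)|(^[a-pr-uwyz]((\d{1,2})|([a-hk-y]\d{1,2})|(\d[a-hjks-uw])"
        r"|([a-hk-y]\d[abehmnprv-y]))\s\d[abd-hjlnp-uw-z]{2}$)$",
        re.IGNORECASE,
    ),
}


def is_valid_postcode(country: str, code: str) -> bool:
    """Validate `code` with the pattern for `country`; accept if there is no rule."""
    pattern = POSTCODE_PATTERNS.get(country)
    if pattern is None:
        return True  # no rule for this country: don't reject possibly valid input
    return pattern.match(code) is not None


print(is_valid_postcode("GB", "SW1A 1AA"), is_valid_postcode("GB", "3252-322"))
```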
I want to store a long description, like the text below, in an SQLite database for an iPhone app, using an SQLite database manager.
"The Golden Temple: The Golden Temple, popular as Sri Harmandir Sahib or Sri Darbar Sahib, is the sacred seat of Sikhism. Bathed in a quintessential golden hue that dazzles in the serene waters of the Amrit Sarovar that lace around it, the swarn mandir (Golden temple) is one that internalizes in the mindscape of its visitors, no matter what religion or creed, as one of the most magnificent House of Worship. On a jewel-studded platform is the Adi Grantha or the sacred scripture of Sikhs wherein are enshrined holy inscriptions by the ten Sikh gurus and various Hindu and Moslem saints. While visiting the Golden Temple you need to cover your head. Street sellers sell bandanas outside the temple at cheap prices."
I declared the description column as VARCHAR(5000), but when I execute a query, it shows only half of the text followed by dots (....), like this: http://i.stack.imgur.com/gyMqi.png
Thanks
The dots (...) almost certainly indicate that the full text is present in the database, and that "SQLite Database Browser" truncates the display past a certain length:
m_textWidthMarkSize = s.value("prefs/sqleditor/textWidthMarkSpinBox", 60).toInt();
Is there a way to change the settings?
Edit
You can verify that the text is fully saved with the following query (replace theTable with the correct table name):
select length(description) from theTable;
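To check the same thing outside the GUI, here is a minimal sketch using Python's built-in sqlite3 module. It relies on the fact that SQLite does not enforce the length in a VARCHAR(n) declaration (such a column simply gets TEXT affinity), so length() reports the full stored string. The table name theTable follows the query above; the declared VARCHAR(10), the sample text, and the function name stored_length are placeholders.

```python
import sqlite3


def stored_length(text: str) -> int:
    """Insert `text` into an in-memory table and return length() as SQLite reports it."""
    conn = sqlite3.connect(":memory:")
    try:
        # VARCHAR(10) is deliberately tiny: SQLite does not truncate values to it.
        conn.execute("CREATE TABLE theTable (description VARCHAR(10))")
        conn.execute("INSERT INTO theTable (description) VALUES (?)", (text,))
        (length,) = conn.execute("SELECT length(description) FROM theTable").fetchone()
        return length
    finally:
        conn.close()


description = "The Golden Temple: " + "x" * 1000
print(stored_length(description))  # prints 1019: the full string, not 10 characters
```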