I'd like to transliterate some Unicode characters in the most generic way possible, but I'm stuck on the generic currency sign, "¤".
I was thinking of transliterating other currency symbols to their ISO 4217 three-letter codes, for example:
€ => EUR
¥ => JPY
etc.
There are 2 codes that could correspond to "¤":
XTS: "Codes specifically reserved for testing purposes"
XXX: "The codes assigned for transactions where no currency is involved"
However, I don't know which one fits best.
Any idea?
Source: ISO 4217
I think I have found my answer: XTS.
Indeed, the French version of the Wikipedia page on ISO 4217 gives more details:
XTS : code réservé pour effectuer des essais (aucune transaction contractuelle effective, devise inconvertible, aucune opération de change autorisée, aucun prélèvement de frais de transaction) ;
XXX : code réservé pour des transactions contractuelles effectuées sans devise associée (par exemple transfert d’informations sur les caractéristiques non monétaires d’un compte, d’un contrat ou d’une transaction, taux de change nul, mais prélèvement de frais de transaction associés possible dans une autre devise).
which can be translated to:
XTS: code reserved for testing purposes (no effective contractual transaction, inconvertible currency, no foreign exchange operation allowed, no direct debit of transaction costs);
XXX: code reserved for contractual transactions where no currency is involved (for example, transfer of information about the non-monetary characteristics of an account, a contract or a transaction; zero exchange rate, but possible direct debit of associated transaction costs in another currency).
Moreover, the Wikipedia page about "¤" says that it is "used to denote an unspecified currency", and if the currency is unspecified, the exchange rate is unknown, so you can't convert it.
Since XXX seems to denote a real transaction, but without currency, while XTS seems to denote a fake transaction with a fake currency, I think the latter is closer to "¤" than the former.
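As a minimal sketch of the transliteration described above, assuming a plain character-to-code lookup is enough: the table name CURRENCY_CODES and the function transliterate_currencies are made up for illustration, and the table only contains the symbols mentioned in this thread.

```python
# Hypothetical lookup table: currency symbol -> ISO 4217 code.
CURRENCY_CODES = {
    "€": "EUR",
    "¥": "JPY",  # as in the question; the same sign is also used for the Chinese yuan
    "¤": "XTS",  # generic currency sign, following the reasoning above
}


def transliterate_currencies(text):
    """Replace each known currency symbol with its three-letter code."""
    return "".join(CURRENCY_CODES.get(ch, ch) for ch in text)


print(transliterate_currencies("10 € or 5 ¤"))
```

Switching "¤" to XXX instead would only mean changing one entry in the table.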
I'm using api.ai to build a chatbot for my city (Toulouse, France) that tells you when garbage is collected at your address.
It works fine in every case except one.
When I type:
Quand passent les poubelles au 38 allées Jean Jaurès Toulouse ?
The resolved value for my parameter $address is just "Jean Jaurès" (it should be "38 allées Jean Jaurès Toulouse").
If I change "allées" to "rue" or "avenue", it works and I get the full address. With "allées", it does not.
Oddly, the training panel shows the correct resolved value.
Do you have any idea how I can fix this?
Thanks
Sample Script:
DECLARE Name,TEST;
"Peter"->Name;
"der Groot"->Name;
"Robert"->Name;
"de Leew"->Name;
"O'Sullivan"->Name;
STRING s;
STRINGLIST slist;
Name{-> MATCHEDTEXT(s), ADD(slist,s),LOG(s)};
ANY+ {INLIST(slist)->MARK(TEST)};
Received Output:
Peter
Robert
Expected Output:
Peter
der Groot
Robert
de Leew
O'Sullivan
Sample Input:
Peter
der Groot
Robert
de Leew
O'Sullivan
I've tried to mark the stringlist values with an annotation type, but the received output differs from the expected output.
The condition on the rule element ANY+ is validated against every single ANY, so it fails at the first one that is not in the list and can only match single tokens.
Should the last rule annotate only positions directly after Name annotations?
If not, then you can do something like:
Name{-> MATCHEDTEXT(s), ADD(slist,s)};
MARKFAST(TEST, slist);
If yes, the situation gets more complicated because you do not have candidates with the correct span. You cannot solve this with a combination of ANY and INLIST; you need either a correct span or fragments in the list. I'd rather recommend an additional fixing rule:
Name{-> MATCHEDTEXT(s), ADD(slist,s)};
MARKFAST(TEST, slist);
ANY{-ENDSWITH(Name)} #TEST{-> UNMARK(TEST)};
DISCLAIMER: I am a developer of UIMA Ruta
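The difference between testing a list condition on each single token and looking up whole list entries in the text can be illustrated outside Ruta. The following Python sketch is only an analogy, not how Ruta is implemented: tokenize is a rough stand-in for a tokenizer, and both matcher functions are hypothetical names.

```python
import re

NAMES = ["Peter", "der Groot", "Robert", "de Leew", "O'Sullivan"]
TEXT = "Peter\nder Groot\nRobert\nde Leew\nO'Sullivan"


def tokenize(text):
    # Rough stand-in for a tokenizer: words and punctuation are separate tokens.
    return re.findall(r"\w+|[^\w\s]", text)


def match_per_token(text, names):
    # Analogue of checking the list condition on every single token.
    return [tok for tok in tokenize(text) if tok in names]


def match_whole_entries(text, names):
    # Analogue of a dictionary lookup over the text: each entry is searched as a
    # whole span (no word-boundary checks, so this is deliberately simplistic).
    found = []
    for name in names:
        for m in re.finditer(re.escape(name), text):
            found.append((m.start(), name))
    return [name for _, name in sorted(found)]


print(match_per_token(TEXT, NAMES))
print(match_whole_entries(TEXT, NAMES))
```

The per-token version can only ever return the single-token names, which mirrors the received output in the question; the whole-entry lookup returns all five.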
I have a batch of text files from which I am trying to remove HTML tags. The text that I want preserved in each file is between <TEXT> and </TEXT>. In some of these files, there is a second instance of <TEXT> and </TEXT> in the bottom half of the document that I want preserved as well.
HTML::Restrict works great for preserving all relevant text in the first instance, but it doesn't seem to preserve the text between the second instance of <TEXT> and </TEXT>.
My code is:
$hr = HTML::Restrict->new() ;
$processed = $hr->process($doc) ;
I can't discern any options within the HTML::Restrict module that I can tweak to ensure that the second part of the text file is preserved. Do such options exist, or is there a better way to accomplish this task? I've tried some regex, but so far I've run into a similar problem with that as well.
Below is the original file. The resulting output is everything between the first instance of <TEXT> (immediately above "UNITED STATES") and the first instance of </TEXT>; the second <TEXT> block near the bottom is lost.
-----BEGIN PRIVACY-ENHANCED MESSAGE-----
Proc-Type: 2001,MIC-CLEAR
Originator-Name: webmaster@www.sec.gov
Originator-Key-Asymmetric:
MFgwCgYEVQgBAQICAf8DSgAwRwJAW2sNKK9AVtBzYZmr6aGjlWyK3XmZv3dTINen
TWSM7vrzLADbmYQaionwg5sDW3P6oaM5D3tdezXMm7z1T+B+twIDAQAB
MIC-Info: RSA-MD5,RSA,
VlTZCBM7TRNLONv/I0OgPsjKD23uR2Zn9/jJ4XrBQY8DlPxfH2+iX+W5TZjhZEQY
shGRyuAw29phAaxb1IPhgQ==
<SEC-DOCUMENT>0001157523-06-001366.txt : 20060209
<SEC-HEADER>0001157523-06-001366.hdr.sgml : 20060209
<ACCEPTANCE-DATETIME>20060209161745
ACCESSION NUMBER: 0001157523-06-001366
CONFORMED SUBMISSION TYPE: 8-K
PUBLIC DOCUMENT COUNT: 2
CONFORMED PERIOD OF REPORT: 20060209
ITEM INFORMATION: Results of Operations and Financial Condition
ITEM INFORMATION: Financial Statements and Exhibits
FILED AS OF DATE: 20060209
DATE AS OF CHANGE: 20060209
FILER:
COMPANY DATA:
COMPANY CONFORMED NAME: ANALOG DEVICES INC
CENTRAL INDEX KEY: 0000006281
STANDARD INDUSTRIAL CLASSIFICATION: SEMICONDUCTORS & RELATED DEVICES [3674]
IRS NUMBER: 042348234
STATE OF INCORPORATION: MA
FISCAL YEAR END: 1205
FILING VALUES:
FORM TYPE: 8-K
SEC ACT: 1934 Act"
SEC FILE NUMBER: 001-07819
FILM NUMBER: 06593279
BUSINESS ADDRESS:
STREET 1: ONE TECHNOLOGY WAY
CITY: NORWOOD
STATE: MA
ZIP: 02062
BUSINESS PHONE: 7813294700
MAIL ADDRESS:
STREET 1: ONE TECHNOLOGY WAY
CITY: NORWOOD
STATE: MA
ZIP: 02062
</SEC-HEADER>
<DOCUMENT>
<TYPE>8-K
<SEQUENCE>1
<FILENAME>a5077045.txt
<DESCRIPTION>ANALOG DEVICES, INC., 8-K
<TEXT>
UNITED STATES
SECURITIES AND EXCHANGE COMMISSION
Washington, D.C. 20549
FORM 8-K
CURRENT REPORT
Pursuant to Section 13 OR 15(d) of The Securities Exchange Act of 1934
Date of Report (Date of earliest event reported): February 9, 2006
Analog Devices, Inc.
- --------------------------------------------------------------------------------
(Exact name of registrant as specified in its charter)
Massachusetts 1-7819 04-2348234
- --------------------------------------------------------------------------------
(State or other juris- (Commission (IRS Employer
diction of incorporation File Number) Identification No.)
One Technology Way, Norwood, MA 02062
- --------------------------------------------------------------------------------
(Address of principal executive offices) (Zip Code)
Registrant's telephone number, including area code: (781) 329-4700
- --------------------------------------------------------------------------------
(Former name or former address, if changed since last report)
Check the appropriate box below if the Form 8-K filing is intended to
simultaneously satisfy the filing obligation of the registrant under any of the
following provisions (see General Instruction A.2. below):
|_| Written communications pursuant to Rule 425 under the Securities Act (17
CFR 230.425)
|_| Soliciting material pursuant to Rule 14a-12 under the Exchange Act (17 CFR
240.14a-12)
|_| Pre-commencement communications pursuant to Rule 14d-2(b) under the
Exchange Act (17 CFR 240.14d-2(b))
|_| Pre-commencement communications pursuant to Rule 13e-4(c) under the
Exchange Act (17 CFR 240.13e-4(c))
<PAGE>
Item 2.02. Results of Operations and Financial Condition
On February 9, 2006, Analog Devices, Inc. announced its financial results
for the quarter ended January 28, 2006. The full text of the press release
issued in connection with the announcement is attached as Exhibit 99.1 to this
Current Report on Form 8-K.
The information in this Form 8-K and the exhibit attached hereto shall not
be deemed "filed" for purposes of Section 18 of the Securities Exchange Act of
1934 (the "Exchange Act") or otherwise subject to the liabilities of that
section, nor shall it be deemed incorporated by reference in any filing under
the Securities Act of 1933 or the Exchange Act, except as expressly set forth by
specific reference in such a filing.
EXHIBIT INDEX
Exhibit No. Description
- ----------- -----------
99.1 Press release dated February 9, 2006 issued by Analog
Devices, Inc.
</TEXT>
</DOCUMENT>
<DOCUMENT>
<TYPE>EX-99.1
<SEQUENCE>2
<FILENAME>a5077045ex99_1.txt
<DESCRIPTION>EXHIBIT 99.1
<TEXT>
Exhibit 99.1
Analog Devices Reports Results for the
First Quarter of Fiscal Year 2006
NORWOOD, Mass.--(BUSINESS WIRE)--Feb. 9, 2006--Analog Devices,
Inc. (NYSE: ADI):
-- Board of Directors declares dividend of $0.12 per share for
the quarter.
-- Financial results for the first quarter and guidance for the
second quarter to be discussed on conference call today at
4:30 pm.
Analog Devices, Inc. (NYSE: ADI), a global leader in
high-performance semiconductors for signal processing applications,
today announced revenue of $621.3 million for the first quarter of
fiscal 2006, an increase of 7% compared to the same period one year
ago and approximately even with the immediately prior quarter's $622.1
million in revenue.
CONTACT: Analog Devices, Inc.
Maria Tagliaferro,781-461-3282
Director of Corporate Communications,
781-461-3491 (fax)
investor.relations@analog.com
</TEXT>
</DOCUMENT>
</SEC-DOCUMENT>
-----END PRIVACY-ENHANCED MESSAGE-----
Since you don't really have an HTML document, you want a parser that is not thrown off by various crap thrown at it.
In the example below, I put the sample text above in the __DATA__ section of my script for convenience. In the real world, you should open the file with the appropriate encoding.
#!/usr/bin/env perl
use strict;
use warnings;
use HTML::TokeParser::Simple;
my $parser = HTML::TokeParser::Simple->new(handle => \*DATA);
my @text;
while (my $token = $parser->get_token) {
    if ($token->is_start_tag('text')) {
        push @text, $parser->get_text('/text');
    }
}
print "[[[>>>$_<<<]]]\n\n" for @text;
__DATA__
This should give you all matches (tested myself):
my @text = $doc =~ /<TEXT>(.*?)<\/TEXT>/gs;
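For readers who want to try the same idea outside Perl, here is a small Python sketch of the non-greedy, multi-line match. The function name extract_text_blocks and the shortened sample document are assumptions for illustration; re.DOTALL plays the role of Perl's /s flag, and findall the role of /g in list context.

```python
import re

# Non-greedy match so each <TEXT>...</TEXT> pair is captured separately;
# DOTALL lets "." span line breaks.
TEXT_BLOCK_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL)


def extract_text_blocks(doc):
    """Return the body of every <TEXT> block, in document order."""
    return [block.strip() for block in TEXT_BLOCK_RE.findall(doc)]


# Shortened, made-up stand-in for a filing with two documents.
sample = (
    "<DOCUMENT>\n<TYPE>8-K\n<TEXT>\nfirst body\n</TEXT>\n</DOCUMENT>\n"
    "<DOCUMENT>\n<TYPE>EX-99.1\n<TEXT>\nsecond body\n</TEXT>\n</DOCUMENT>\n"
)

for block in extract_text_blocks(sample):
    print(block)
```

With a greedy `.*` instead of `.*?`, the two blocks would be returned as one match that also contains the markup between them.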
I need to extract names (including uncommon names) from blocks of text using Perl. I've looked into this module for extracting names, but it only has the top 1000 popular names and surnames in the US dating back to 1990; I need something a bit more comprehensive.
I've considered using the Social Security Index to build a database for comparison, but this seems very tedious and processing-intensive. Is there another way to pull names out of text using Perl?
Example of text to parse:
LADNIER Louis Anthony Ladnier, [Louie] age 48, of Mobile, Alabama died at home Friday, November 16, 2012. Louie was born January 9, 1964 in Mobile, Alabama. He was the son of John E. Ladnier, Sr. and Gloria Bosarge Ladnier. He was a graduate of McGill-Toolen High School and attended University of South Alabama. He was employed up until his medical retirement as Communi-cations Supervisor with the Bayou La Batre Police Department. He is preceded in death by his father, John. Survived by his mother, Gloria, nephews, Dominic Ladnier and Christian Rubio, whom he loved and help raise as his own sons, sisters, Marj Ladnier and Morgan Gordy [Julian], and brother Eddie Ladnier [Cindy], and nephews, Jamie, Joey, Eddie, Will, Ben and nieces, Anna and Elisabeth. Memorial service will be held at St. Dominic's Catholic Church in Mobile on Wednesday at 1pm. Serenity Funeral Home is in charge of arrangements. In lieu of flowers, memorials may be sent to St. Dominic School, 4160 Burma Road Mobile, AL 36693, education fund for Christian Rubio and McGill-Toolen High School, 1501 Old Shell Road Mobile, AL 36604, education Fund for Dominic Ladnier. The family is grateful for all the prayers and support during this time. Louie was a rock and a joy to us all.
Use Stanford's NER (GPL). Demo:
http://nlp.stanford.edu:8080/ner/process
There is no surefire way to do this, due to the nature of the English language. You either need lists to (fuzzy) compare with, or you will have to settle for significant accuracy penalties.
The Apache Foundation has a few projects that cover entity extraction with pre-trained models for English names (nameFinder). I would recommend OpenNLP or Stanbol. In the meantime, if you have just a few queries, I have an NLP tool I've implemented in C# in my apps section at http://www.augmentedintel.com/apps/csharpnlp/extract-names-from-text.aspx.
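To make the list-based approach concrete, here is a rough Python sketch that fuzzy-compares capitalized words against a name list using the standard library's difflib. The list KNOWN_NAMES, the function find_names and the 0.85 cutoff are all assumptions for illustration; a real list would be far larger, and the capitalized-word heuristic will still pick up non-names that happen to resemble an entry.

```python
import difflib
import re

# Hypothetical, tiny name list; in practice this would come from a large database.
KNOWN_NAMES = ["Louis", "Anthony", "Gloria", "Elizabeth", "Dominic"]


def find_names(text, known_names, cutoff=0.85):
    """Return (word, closest known name) pairs for capitalized words in text."""
    hits = []
    for word in re.findall(r"\b[A-Z][a-z]+\b", text):
        close = difflib.get_close_matches(word, known_names, n=1, cutoff=cutoff)
        if close:
            hits.append((word, close[0]))
    return hits


sample = "nieces, Anna and Elisabeth. Survived by his mother, Gloria"
print(find_names(sample, KNOWN_NAMES))
```

Lowering the cutoff catches more spelling variants at the price of more false positives, which is the accuracy trade-off mentioned above.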
Best,
Don
You're trying to implement named-entity recognition. The bad news is that it's really hard.
You could try Lingua::EN::NamedEntity, however:
$ perl -MLingua::EN::NamedEntity -nE 'say $_ for map { $_->{class} eq "person" ? $_->{entity} : () } extract_entities($_)' names.txt
Louie
Louis Anthony Ladnier
Louie
John E
Bayou La Batre Police Department
Gloria
Julian
Cindy
Eddie Ladnier
Eddie
John
Catholic Church
Christian Rubio
Dominic Ladnier
Burma Road Mobile
Louie
You can also use Calais, a Reuters web service for natural language processing, which gives much better results.
I think you want to Google something like:
perl part of speech tagging
I am trying to figure out how to parse an address using T-SQL and I suck at T-SQL. My challenge is this:
I have a table called Locations, defined as follows:
- City [varchar(100)]
- State [char(2)]
- PostalCode [char(5)]
My UI has a text box in which a user can enter an address. This address could be in essentially any form (yuck, I know). Unfortunately, I cannot change this UI either. Anyway, the value of the text box is passed into the stored procedure that is responsible for parsing the address. I need to take what the person enters and get the PostalCode from the Locations table associated with their input. For the life of me, I cannot figure out how to do this. There are so many cases. For instance, the user could enter one of the following:
Chicago, IL
Chicago, IL 60601
Chicago, IL, 60601
Chicago, IL 60601 USA
Chicago, IL, 60601 USA
Chicago IL 60601 USA
New York NY 10001 USA
New York, NY 10001, USA
You get the idea. There are a lot of cases. I can't find any parsers online either. I must not be looking correctly. Can someone please point me to a parser online or explain how to do this? I'm willing to pay for a solution to this problem, but I can't find anything, which shocks me.
Perhaps a CLR function would be a better choice than T-SQL. Check out http://msdn.microsoft.com/en-us/magazine/cc163473.aspx for an example of using regular expressions to parse some pretty complex string inputs into table-valued results. You can be as creative as you please with your regex matching, but the following regex should get you started:
(.*?)([A-Z]{2}),? (\d+)( USA)?$
If you're reluctant to use CLR functions, perhaps you have regex functionality in the calling system, such as ASP.NET or PHP.
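As a sketch of the regex approach in a calling layer, here is a Python version that uses a somewhat more tolerant pattern than the starter regex above, so that the postal code and "USA" are both optional and may be preceded by a comma. The pattern, the function name parse_address and the returned keys are assumptions for illustration; it assumes US-style input with a two-letter state and a five-digit postal code.

```python
import re

ADDRESS_RE = re.compile(
    r"""^\s*
        (?P<city>.+?)                 # city, matched lazily
        ,?\s+
        (?P<state>[A-Z]{2})           # two-letter state code
        (?:,?\s+(?P<zip>\d{5}))?      # optional five-digit postal code
        (?:,?\s+USA)?                 # optional country
        \s*$""",
    re.VERBOSE,
)


def parse_address(text):
    """Return city, state and postal code, or None if the input does not match."""
    m = ADDRESS_RE.match(text)
    if m is None:
        return None
    return {
        "city": m.group("city"),
        "state": m.group("state"),
        "postal_code": m.group("zip"),  # None when the user typed no postal code
    }


for line in ["Chicago, IL", "Chicago, IL, 60601 USA", "New York NY 10001 USA"]:
    print(parse_address(line))
```

When postal_code comes back as None, the city and state can still be used to look up candidate rows in the Locations table.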