iPhone-SDK: Remove unnecessary white spaces from big paragraph string? - iphone

I want to remove unnecessary white spaces from the big paragraph string mentioned below.
I tried removing them with stringByTrimmingCharactersInSet and replaceOccurrencesOfString, without success. Can someone please look at my paragraph string and provide a code snippet that removes all the unnecessary white spaces and makes it readable?
Paragraph string starts below ------------------------------------------
World wealth down 11 pct, fewer millionaires - report
Top News
World wealth down 11 pct, fewer millionaires - report
11:29 AM IST
By Joe Rauch
NEW YORK (Reuters) - The 2008 global recession caused the first worldwide contraction in assets under management in nearly a decade, according to a study that found wealth dropped 11.7 percent to $92.4 trillion.
A return to 2007 levels of wealth will take six years, according to a Boston Consulting Group study that examined assets overseen by the asset management industry.
North America, particularly the United States, was the hardest hit region, reporting a 21.8 percent decline in wealth firms' assets under management to $29.3 trillion, primarily because of the beating U.S. equities investments took in 2008.
Also hit hard were off-shore wealth centers, like Switzerland and the Caribbean, where assets declined to $6.7 trillion in 2008 from $7.3 trillion in 2007, an 8 percent drop.
The downturn has "shattered confidence in a way we have not seen in a long time," said Bruce Holley, senior partner and managing director at BCG's New York office.
The study forecasts that wealth management firms' assets under management will not return to 2007 levels, $108.5 trillion, until 2013, a six-year rebound.
Europe posted a slightly higher $32.7 trillion of assets under management, edging out North America for the wealthiest region, though the total wealth in region dropped 5.8 percent.
Latin America was the only region to report a gain in assets under management, posting a 3 percent uptick from $2.4 trillion in 2007 to $2.5 trillion in 2008.
MILLIONAIRE ... NOT
The economy's retreat also pounded millionaires who made risky investments during the economic boom.
The number of millionaires worldwide shrank 17.8 percent to 9 million, the BCG study found.
Europe and North America were hardest hit in that regard, posting 22 percent declines. The United States still boasts 3.9 million millionaires, the highest population on the globe.
Singapore had the highest density of millionaires at 8.5 percent of the population. Other countries included Switzerland, at 6.6 percent, Kuwait, at 5.1 percent, United Arab Emirates, at 4.5 percent, and the United States, at 3.5 percent.
----------------- Paragraph string ends just above ------------------------------------

Please see my answer to your identical previous question:
iPhone-SDK:Remove white spaces from a paragraph string?
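A minimal sketch of one way to collapse the extra white space, written in Python for illustration rather than in Objective-C. The function name collapse_whitespace is hypothetical, and the sketch assumes the goal is to squeeze runs of spaces and tabs into a single space and to drop empty lines. Note that stringByTrimmingCharactersInSet: only strips characters from the two ends of a string, so on its own it cannot remove gaps in the middle of a paragraph.

```python
def collapse_whitespace(paragraphs: str) -> str:
    """Squeeze runs of white space to one space and drop blank lines."""
    cleaned = []
    for line in paragraphs.splitlines():
        # split() with no argument splits on any run of white space
        # and discards leading and trailing white space.
        line = " ".join(line.split())
        if line:  # skip lines that were empty or white space only
            cleaned.append(line)
    return "\n".join(cleaned)


if __name__ == "__main__":
    raw = "  World   wealth \t down 11 pct \n\n\n   Top News  \n"
    print(collapse_whitespace(raw))
```

The same split-and-rejoin idea should carry over to other string libraries, provided empty fragments between consecutive separators are filtered out before rejoining.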

Related

Compute capability of a small (1mm^2) ASIC

I was watching a recent ACM Turing Lecture by Hennessy and Patterson and was intrigued by a stat they cited on the cost of small chip tape-outs. They claimed that you can tape-out 100 1 mm x 1mm chips at 28 nm process node for $14,000, presumably on a test shuttle.
My question is, if I wanted to fill this chip area with MAC units (say 16 or 32 bit), how many simultaneous MACs could I do per cycle?
Just as a back of the envelope calculation, this paper describes a 32x32->64 multiplier as being 435um*482um in Synopsys' 90nm educational technology. If you just trivially scale to 28nm, you get 0.02mm^2 per instance. That's probably within an order of magnitude, which is good enough because "multipliers per mm" isn't really a meaningful metric: the interesting part is how to get data into and out of such a multiplier array, and that logic will take up more area than the multipliers themselves.
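The scaling arithmetic above can be sketched as follows. This assumes ideal quadratic area scaling between nodes, which is optimistic, and it counts bare multipliers only: accumulators, registers, routing, and I/O are ignored. The function name is made up for the example.

```python
def scaled_area_mm2(width_um: float, height_um: float,
                    from_nm: float, to_nm: float) -> float:
    """Area in mm^2 after naive (ideal) scaling from one node to another."""
    area_mm2 = (width_um * height_um) / 1e6   # um^2 -> mm^2
    return area_mm2 * (to_nm / from_nm) ** 2  # area shrinks with the square


# 32x32->64 multiplier figures quoted above (90nm educational library).
area = scaled_area_mm2(435, 482, from_nm=90, to_nm=28)
multipliers_per_mm2 = int(1.0 / area)

if __name__ == "__main__":
    print(f"area per multiplier: {area:.4f} mm^2")
    print(f"upper bound on multipliers in 1 mm^2: {multipliers_per_mm2}")
```

Under these assumptions the result is roughly 0.02 mm^2 per multiplier, so a few tens of multipliers per mm^2 is an upper bound before any data-movement logic is added.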
For another reference, the FU540-C000 is 30mm^2 in TSMC's 28nm HPC process. Yunsup's HotChips presentation from last year shows a fairly detailed die plot on page 17, from which you can calculate what 1mm^2 gets you on a modern technology -- it's quite a bit of SRAM/logic, but not many pads.

Where can I download Dundee Corpus?

Dundee Corpus (Kennedy et al., 2003) is an open eye-tracking corpus with tokenization and measures similar to the Dundee Treebank (Barrett et al., 2015). The corpus contains eye-tracking recordings of ten native English-speaking subjects reading 20 newspaper articles from The Independent.
But I cannot find this data on the Internet. Can anybody tell me where I can download this dataset, or share it with me?
[Kennedy et al., 2003] Alan Kennedy, Robin Hill, and Joël Pynte. 2003. The Dundee Corpus. In Proceedings of the 12th European Conference on Eye Movement.
[Barrett et al., 2015] Maria Barrett, Željko Agić, and Anders Søgaard. 2015. The Dundee Treebank. In The 14th International Workshop on Treebanks and Linguistic Theories (TLT 14).
Because of licensing restrictions, I don't think it's freely available. As a close approximation, you can download syntactic trees built off of it, like: http://www.ling.ohio-state.edu/golddundee/#kennedy

Encoding of the Canadian PostBar barcodes

I am working on software to encode postal addresses using the PostBar barcode symbology in use in Canada.
I can't find the relevant information for these codes. Wikipedia does describe PostBar, but with a caveat saying that the article is about the D12 type, whereas Canada Post actually uses the types D52.01/D82.01/S52.40 and S82.39, which are different and undocumented. (I also know the "CANADA POST CORPORATION 4-STATE BAR CODE HANDBOOK" document, which doesn't help.)
I need the specifics of the encoding of the fields (DCI, Postal Code, Address Locator...) and the parameters of Reed-Solomon parity bits.
I am not after an implementation, which I am able to craft myself. Thank you in advance for any tip.
This is the only thing I could find on the subject. It is not much, I'm afraid:
https://en.wikipedia.org/wiki/Canada_Post#Barcodes
Canada Post uses a 13 character barcode for their pre-printed labels. Bar codes consist of two letters, followed by eight sequence digits, and a ninth digit which is the check digit. The last two characters are the letters CA. The check digit seems to ignore the letters and only concern itself with the first 8 numeric digits. The scheme is to multiply each of those 8 digits by a different weighting factor, (8 6 4 2 3 5 9 7). Add up the total of all of these multiplications and divide by 11. The remainder after dividing by 11 gives a number from 0 to 10. Subtracting this from 11 gives a number from 1 to 11. That result is the check digit, except in the two cases where it is 10 or 11. If 10 it is then changed to a 0, and if 11 then it is changed to a 5. The check digit may be used to verify if a barcode scan is correct, or if a manual entry of the barcode is correct.
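The check digit rule quoted above can be sketched in Python as follows. This covers only the 13-character label barcode described in that paragraph, not the 4-state PostBar encoding asked about, and the function names are made up for the example.

```python
WEIGHTS = (8, 6, 4, 2, 3, 5, 9, 7)
DIGITS = "0123456789"


def check_digit(serial: str) -> int:
    """Check digit for the 8 sequence digits, per the rule described above."""
    if len(serial) != 8 or any(c not in DIGITS for c in serial):
        raise ValueError("expected exactly 8 decimal digits")
    total = sum(int(d) * w for d, w in zip(serial, WEIGHTS))
    result = 11 - (total % 11)  # a value from 1 to 11
    if result == 10:
        return 0
    if result == 11:
        return 5
    return result


def is_valid(barcode: str) -> bool:
    """Validate: two letters, 8 digits, check digit, then the letters CA."""
    if len(barcode) != 13:
        return False
    prefix, serial, check, suffix = (
        barcode[:2], barcode[2:10], barcode[10], barcode[11:]
    )
    if not prefix.isalpha() or suffix != "CA" or check not in DIGITS:
        return False
    if any(c not in DIGITS for c in serial):
        return False
    return check_digit(serial) == int(check)


if __name__ == "__main__":
    print(check_digit("47312482"))
    print(is_valid("RA473124829CA"))
```

The letters are ignored when computing the check digit, as the description states, so only the eight sequence digits enter the weighted sum.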
And as a bonus, an explanation of the barcodes, in Dutch:
https://www.postnl.nl/Images/Brochure-KIX-code-van-PostNL_tcm10-10210.pdf
I don't think we (Canada Post) use PostBar anymore. Management made adoption too much of a pain for mailers, so it died. I haven't seen one on an envelope in years. Now that OCR tech is so good, including a PostBar wouldn't help much anyway.
What they should have done is give away software that printed the address labels in alphanumeric order of postal code and printed a set of positional marks on the top fold of the envelope based on that same postal code. That way a postal clerk would not even need to take the mail out of the box to see where it should be shipped. LVMs (large volume mailers) would do this for a rebate on their bill.
As for smaller businesses or the general public, we should have just sold them prepaid envelopes in 2 or 3 standard sizes for a dime less than the cost of a stamp alone. A standard envelope can have a dedicated spot for a machine-readable postal code. I would have gone with good old public-domain Braille, printed or in Sharpie :-) Oh well, I'm rambling now, so I'll stop.

Same feature value for all corresponding feature

I have tried to set the containing sentence as a feature on each ME_UNITSPACING annotation, but I'm getting the same sentence value for every occurrence of ME_UNITSPACING.
Sample Code:
DECLARE LOWERCAMELCASE,UPPERCAMELCASE;
DECLARE ME_UNITSPACING(STRING sentence, STRING replace,STRING description);
Document{-> RETAINTYPE(SPACE)};
SW CW{->MARK(LOWERCAMELCASE,1,2)};
CW CW{->MARK(UPPERCAMELCASE,1,2)};
Document{-> RETAINTYPE};
LOWERCAMELCASE{REGEXP("mmHg")->MARK(ME_UNITSPACING)};
UPPERCAMELCASE{REGEXP("MmHg")->MARK(ME_UNITSPACING)};
W{REGEXP("Mmhg",true)->MARK(ME_UNITSPACING)};
DECLARE UnitspacingSENTENCE;
SENTENCE{CONTAINS(ME_UNITSPACING)->UnitspacingSENTENCE};
STRING unitspacingsent;
UnitspacingSENTENCE{->MATCHEDTEXT(unitspacingsent)};
ME_UNITSPACING{->ME_UNITSPACING.sentence=unitspacingsent};
Sample Input:
A number of psychological and mmHg psychiatric correlates have been found
implicated in the onset and/or repetition of NSSI behavior. Nock et al. 14
reported 9 k that more than half of the clinical adolescents they studied
met the DSM-IV criteria for an internalizing disorder, an externalizing
disorder, or a substance-related disorder, with a prevalence mmHg rate of
psychiatric pathologies estimated to be as high as 87%. In a large
community-based sample of 12,068 adolescents from 11 countries, Brunner et
al. (2014) found significant associations mmHg with symptoms of depression
and anxiety in adolescents who engaged in self-harming behavior 6, and they
emphasized that self-injury is strongly indicative of psychological
problems that require professional attention. Their results are consistent
with previous reports of a significantly higher rate of depressive and
anxious symptoms in self-injurers.5,15,16,17,18,19 The onset of NSSI
behavior in teenagers with depression is mainly attributable to the
function of NSSI as a way to seek relief from the depressive symptoms. 20
The literature generally stresses the broad variety of psychiatric problems
seen in mmHg teenagers with history of NSSI. Cluster B personality
disorders are often identified, especially in self-cutting adolescent
females, and so are eating disorders; approximately one in three
adolescents with eating disorders are also self-injurers, the NSSI
frequently coinciding with or following the eating disorder 21, 22.
DECLARE LOWERCAMELCASE,UPPERCAMELCASE;
DECLARE ME_UNITSPACING(STRING sentence, STRING replace,STRING description);
Document{-> RETAINTYPE(SPACE)};
SW CW{->MARK(LOWERCAMELCASE,1,2)};
CW CW{->MARK(UPPERCAMELCASE,1,2)};
Document{-> RETAINTYPE};
LOWERCAMELCASE{REGEXP("mmHg")->MARK(ME_UNITSPACING)};
UPPERCAMELCASE{REGEXP("MmHg")->MARK(ME_UNITSPACING)};
W{REGEXP("Mmhg",true)->MARK(ME_UNITSPACING)};
DECLARE UnitspacingSENTENCE;
SENTENCE{CONTAINS(ME_UNITSPACING)->UnitspacingSENTENCE};
BLOCK(foreach)UnitspacingSENTENCE{}
{
STRING unitspacingsent;
UnitspacingSENTENCE{->MATCHEDTEXT(unitspacingsent)};
ME_UNITSPACING{->ME_UNITSPACING.sentence=unitspacingsent};
}
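To illustrate why the two scripts behave differently, here is a rough Python analogy, not Ruta itself. It assumes, as I understand Ruta's execution model, that each top-level rule is applied across the whole document before the next rule starts, whereas the rules inside BLOCK(foreach) run once per matched sentence. All names below are hypothetical.

```python
def assign_globally(sentences, marker):
    """Analogy for the first script: one rule finishes before the next starts."""
    last_sentence = None
    # Rule 1: MATCHEDTEXT overwrites the single variable for every sentence,
    # so only the last matching sentence survives.
    for sentence in sentences:
        if marker in sentence:
            last_sentence = sentence
    # Rule 2: every marker annotation receives that one leftover value.
    return [last_sentence
            for sentence in sentences
            for _ in range(sentence.count(marker))]


def assign_per_sentence(sentences, marker):
    """Analogy for BLOCK(foreach): both rules run inside each sentence."""
    features = []
    for sentence in sentences:
        if marker in sentence:
            current = sentence  # variable is set for this sentence only
            features.extend(current for _ in range(sentence.count(marker)))
    return features


if __name__ == "__main__":
    doc = ["a mmHg b.", "no unit here.", "c mmHg d mmHg."]
    print(assign_globally(doc, "mmHg"))
    print(assign_per_sentence(doc, "mmHg"))
```

In the first function every annotation ends up with the text of the last matching sentence, which matches the symptom described in the question.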

error while importing txt file into mallet

I have been having trouble converting some txt files to mallet. I keep getting:
Exception in thread "main" java.lang.IllegalStateException: Line #39843 does not match regex:
and line #39843 reads:
24393584 |Title Validation of a Danish version of the Toronto Extremity Salvage Score questionnaire for 
patients with sarcoma in the extremities.The Toronto Extremity Salvage Score (TESS) questionnaire is a selfadministered questionnaire designed to assess physical disability in patients having undergone surgery of the extremities. The aim of this study was to validate a Danish translation of the TESS. The TESS was translated according to international guidelines. A total of 22 consecutive patients attending the regular outpatient control programme were recruited for the study. To test their understanding of the questionnaires, they were asked to describe the meaning of five randomly selected questions from the TESS. The psychometric properties of the Danish version of TESS were tested for validity and reliability. To assess the testretest reliability, the patients filled in an extra TESS questionnaire one week after they had completed the first one. Patients showed good understanding of the questionnaire. There was a good internal consistency for both the upper and lower questionnaire measured by Cronbach's alpha. A BlandAltman plot showed acceptable limits of agreement for both questionnaires in the testretest. There was also good intraclass correlation coefficients for both questionnaires. The validity expressed as Spearman's rank correlation coefficient comparing the TESS with the QLQC30 was 0.89 and 0.90 for the questionnaire on upper and lower extremities, respectively. The psychometric properties of the Danish TESS showed good validity and reliability. not relevant.not relevant.
This happens for quite a few of the lines, and when I remove the offending line, the rest of the file is imported into mallet. What in this line could be causing the regex match to fail?
thanks,
Priya
Mallet has problems handling certain machine symbols, because of bad programming. Try running
tr -dc '[:alnum:][ ,.]\n' < ./inputfile.txt > ./inputfilefixed.txt
before running mallet. This deletes every character except letters, digits, spaces, commas, periods, square brackets, and newlines, which usually solves the problem for me.
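If tr is not available, the same kind of filter can be sketched in Python. This is an approximation under stated assumptions: the kept set is ASCII letters and digits plus space, comma, period, square brackets, and newline; the input is assumed to be UTF-8 with undecodable bytes dropped; and the function names are made up for the example.

```python
import string

# Characters that survive the filter; everything else is deleted.
KEEP = set(string.ascii_letters + string.digits + " ,.[]\n")


def clean_text(text: str) -> str:
    """Delete every character that is not in KEEP."""
    return "".join(ch for ch in text if ch in KEEP)


def clean_file(src: str, dst: str) -> None:
    """Write a filtered copy of src to dst, line by line."""
    # newline="" turns off newline translation so that stray carriage
    # returns reach clean_text and are removed instead of becoming "\n".
    with open(src, encoding="utf-8", errors="ignore", newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        for line in fin:
            fout.write(clean_text(line))


if __name__ == "__main__":
    print(clean_text("24393584 |Title Cronbach's alpha\u0085 0.89\r\n"), end="")
```

Note that this filter also removes apostrophes, hyphens, and the "|" separator, so check that the cleaned lines still carry the fields mallet expects.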