Where can I download Dundee Corpus? - corpus

The Dundee Corpus (Kennedy et al., 2003) is an eye-tracking corpus whose tokenization and measures are similar to those of the Dundee Treebank (Barrett et al., 2015). The corpus contains eye-tracking recordings of ten native English-speaking subjects reading 20 newspaper articles from The Independent.
However, I cannot find this data on the Internet. Can anybody tell me where I can download the dataset, or share it with me?
[Kennedy et al., 2003] Alan Kennedy, Robin Hill, and Joël Pynte. 2003. The Dundee corpus. In Proceedings of the 12th European Conference on Eye Movement.
[Barrett et al., 2015] Maria Barrett, Željko Agić, and Anders Søgaard. 2015. The Dundee treebank. In The 14th International Workshop on Treebanks and Linguistic Theories (TLT 14).

Because of licensing restrictions, I don't think it's freely available. As a close approximation, you can download syntactic trees built off of it, like: http://www.ling.ohio-state.edu/golddundee/#kennedy

Related

high-performance computing (HPC) for language technology in EU

I am looking to compile a list of HPC systems open to researchers in the field of language technology.
Can you please point me to HPC systems that are available to researchers and to small and medium-sized enterprises?
For example,
LEONARDO
https://www.lumi-supercomputer.eu/

What is the algorithm under the function seqefsub?

I wonder what the underlying algorithm implemented in the function seqefsub is. In the book chapter "Exploratory mining of life event histories", I found this:
Efficient algorithms for extracting frequent subsequences have been proposed in the literature among which the prominent ones are those of Bettini et al. (1996), Srikant and Agrawal (1996), Mannila et al. (1997) and Zaki (2001). The algorithm implemented in TraMineR is an adaptation of the prefix-tree-based search described in Masseglia (2002).
However, the last reference is a PhD thesis in French. Is there any reference (in English) about this algorithm?
Thanks!
Victor
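For readers unfamiliar with the general idea, here is a minimal Python sketch of frequent-subsequence mining by depth-first prefix extension. It is not TraMineR's implementation and does not reproduce Masseglia (2002). The function names, the toy event sequences, and the simple support count (the number of sequences that contain the pattern) are assumptions made for illustration only.

```python
def is_subsequence(pattern, sequence):
    """Return True if pattern occurs in sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    # Each `item in remaining` consumes the iterator up to the first match,
    # so later items must be found after earlier ones.
    return all(item in remaining for item in pattern)


def frequent_subsequences(sequences, min_support):
    """Enumerate subsequences contained in at least min_support sequences.

    Illustrative sketch only: patterns are grown one event at a time from
    a prefix, and a prefix is extended only while it stays frequent.
    """
    if min_support < 1:
        raise ValueError("min_support must be at least 1")
    alphabet = sorted({event for seq in sequences for event in seq})
    found = {}

    def extend(prefix):
        for event in alphabet:
            candidate = prefix + (event,)
            support = sum(is_subsequence(candidate, seq) for seq in sequences)
            if support >= min_support:
                found[candidate] = support
                # Any frequent longer pattern must start with a frequent
                # prefix, so infrequent candidates are never extended.
                extend(candidate)

    extend(())
    return found


# Hypothetical life-event sequences, invented for this example.
toy_sequences = [
    ("A", "B", "C"),
    ("A", "C"),
    ("B", "C"),
]
patterns = frequent_subsequences(toy_sequences, min_support=2)
```

The pruning step relies on the fact that a pattern can never be more frequent than any of its prefixes, which is what keeps a prefix-based search from enumerating every possible subsequence.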

Same feature value for all corresponding feature

I have tried to set the containing sentence as a feature of each ME_UNITSPACING annotation, but I'm getting the same sentence value for all occurrences of ME_UNITSPACING.
Sample Code:
DECLARE LOWERCAMELCASE,UPPERCAMELCASE;
DECLARE ME_UNITSPACING(STRING sentence, STRING replace,STRING description);
Document{-> RETAINTYPE(SPACE)};
SW CW{->MARK(LOWERCAMELCASE,1,2)};
CW CW{->MARK(UPPERCAMELCASE,1,2)};
Document{-> RETAINTYPE};
LOWERCAMELCASE{REGEXP("mmHg")->MARK(ME_UNITSPACING)};
UPPERCAMELCASE{REGEXP("MmHg")->MARK(ME_UNITSPACING)};
W{REGEXP("Mmhg",true)->MARK(ME_UNITSPACING)};
DECLARE UnitspacingSENTENCE;
SENTENCE{CONTAINS(ME_UNITSPACING)->UnitspacingSENTENCE};
STRING unitspacingsent;
UnitspacingSENTENCE{->MATCHEDTEXT(unitspacingsent)};
ME_UNITSPACING{->ME_UNITSPACING.sentence=unitspacingsent};
Sample Input:
A number of psychological and mmHg psychiatric correlates have been found
implicated in the onset and/or repetition of NSSI behavior. Nock et al. 14
reported 9 k that more than half of the clinical adolescents they studied
met the DSM-IV criteria for an internalizing disorder, an externalizing
disorder, or a substance-related disorder, with a prevalence mmHg rate of
psychiatric pathologies estimated to be as high as 87%. In a large
community-based sample of 12,068 adolescents from 11 countries, Brunner et
al. (2014) found significant associations mmHg with symptoms of depression
and anxiety in adolescents who engaged in self-harming behavior 6, and they
emphasized that self-injury is strongly indicative of psychological
problems that require professional attention. Their results are consistent
with previous reports of a significantly higher rate of depressive and
anxious symptoms in self-injurers.5,15,16,17,18,19 The onset of NSSI
behavior in teenagers with depression is mainly attributable to the
function of NSSI as a way to seek relief from the depressive symptoms. 20
The literature generally stresses the broad variety of psychiatric problems
seen in mmHg teenagers with history of NSSI. Cluster B personality
disorders are often identified, especially in self-cutting adolescent
females, and so are eating disorders; approximately one in three
adolescents with eating disorders are also self-injurers, the NSSI
frequently coinciding with or following the eating disorder 21, 22.
DECLARE LOWERCAMELCASE,UPPERCAMELCASE;
DECLARE ME_UNITSPACING(STRING sentence, STRING replace,STRING description);
Document{-> RETAINTYPE(SPACE)};
SW CW{->MARK(LOWERCAMELCASE,1,2)};
CW CW{->MARK(UPPERCAMELCASE,1,2)};
Document{-> RETAINTYPE};
LOWERCAMELCASE{REGEXP("mmHg")->MARK(ME_UNITSPACING)};
UPPERCAMELCASE{REGEXP("MmHg")->MARK(ME_UNITSPACING)};
W{REGEXP("Mmhg",true)->MARK(ME_UNITSPACING)};
DECLARE UnitspacingSENTENCE;
SENTENCE{CONTAINS(ME_UNITSPACING)->UnitspacingSENTENCE};
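// Process each sentence separately, so the variable holds that sentence's
// text only while the ME_UNITSPACING annotations inside it are updated.
BLOCK(foreach)UnitspacingSENTENCE{}
{
    STRING unitspacingsent;
    UnitspacingSENTENCE{->MATCHEDTEXT(unitspacingsent)};
    ME_UNITSPACING{->ME_UNITSPACING.sentence=unitspacingsent};
}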

Recommendation algorithm for suggesting jobs to workers (crowdsourcing platform)

I have crawled the MTurk website and have 260 HITs as a dataset. A number of users have selected HITs from this dataset and assigned a rating to each HIT they selected. Now I want to give recommendations to these users on the basis of their selections. How can this be done? Can anyone suggest a suitable recommendation algorithm?
It sounds like you should go for one of the Collaborative Filtering (CF) algorithms, as your users give explicit feedback in the form of ratings. First, I would suggest implementing a simple item-based or user-based k-Nearest Neighbours algorithm. If the results do not satisfy you, perhaps because your data is very sparse, matrix factorization techniques should probably do the trick. A good recent survey that I read is [1]; it compares the different methods under different data settings.
Once you feel comfortable with this, if you realize that what you actually need is a ranked list of Top-N predictions rather than ratings, I would suggest reading about, e.g., Bayesian Personalized Ranking [2].
And the best part is that these algorithms are really well known, and most of them are available for almost every programming language, e.g. for Python: https://github.com/Mendeley/mrec/
[1] J. Lee, M. Sun, and G. Lebanon, “A Comparative Study of Collaborative Filtering Algorithms,” arXiv, pp. 1–27, 2012.
[2] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme, “BPR: Bayesian Personalized Ranking from Implicit Feedback,” in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, 2009, pp. 452–461.
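As a starting point for the item-based k-Nearest Neighbours suggestion, here is a minimal sketch in plain Python. The worker IDs, HIT names, and ratings are made up for illustration, and the cosine similarity and similarity-weighted average are just one common choice, not the only way to build such a recommender.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two sparse rating vectors (dict: user -> rating)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(ratings, user, k=2, top_n=3):
    """Rank unseen items for `user` with item-based kNN.

    ratings: dict user -> dict item -> rating.
    """
    # Invert to item -> user -> rating so items can be compared.
    by_item = {}
    for u, items in ratings.items():
        for item, r in items.items():
            by_item.setdefault(item, {})[u] = r

    seen = ratings[user]
    scores = {}
    for candidate, vector in by_item.items():
        if candidate in seen:
            continue
        # The k items rated by the user that are most similar to the candidate.
        neighbours = sorted(
            ((cosine(vector, by_item[item]), r) for item, r in seen.items()),
            reverse=True,
        )[:k]
        total_sim = sum(sim for sim, _ in neighbours)
        if total_sim > 0:
            # Predicted rating: similarity-weighted average of the user's ratings.
            scores[candidate] = sum(sim * r for sim, r in neighbours) / total_sim
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical data: four workers rating four HITs on a 1-5 scale.
ratings = {
    "w1": {"hit_a": 5, "hit_b": 4},
    "w2": {"hit_a": 5, "hit_b": 5, "hit_c": 4},
    "w3": {"hit_a": 1, "hit_d": 5},
    "w4": {"hit_c": 2, "hit_d": 4},
}
suggestions = recommend(ratings, "w1")
```

With only 260 HITs, a brute-force comparison like this should be fast enough. If most worker-HIT pairs are unrated, the predictions rest on very few neighbours, which is where matrix factorization may start to pay off.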

iPhone-SDK: Remove unnecessary white spaces from big paragraph string?

I want to remove unnecessary white space from the large paragraph string shown below.
I tried stringByTrimmingCharactersInSet and replaceOccurrencesOfString, among other things, without success. Can someone please look at my paragraph string and provide a code snippet that removes all the unnecessary white space and makes the text readable?
Paragraphs String starts from below ------------------------------------------
World wealth down 11 pct, fewer millionaires - report
Top News
World wealth down 11 pct, fewer millionaires - report
11:29 AM IST
By Joe Rauch
NEW YORK (Reuters) - The 2008 global recession caused the first worldwide contraction in assets under management in nearly a decade, according to a study that found wealth dropped 11.7 percent to $92.4 trillion.
A return to 2007 levels of wealth will take six years, according to a Boston Consulting Group study that examined assets overseen by the asset management industry.
North America, particularly the United States, was the hardest hit region, reporting a 21.8 percent decline in wealth firms' assets under management to $29.3 trillion, primarily because of the beating U.S. equities investments took in 2008.
Also hit hard were off-shore wealth centers, like Switzerland and the Caribbean, where assets declined to $6.7 trillion in 2008 from $7.3 trillion in 2007, an 8 percent drop.
The downturn has "shattered confidence in a way we have not seen in a long time," said Bruce Holley, senior partner and managing director at BCG's New York office.
The study forecasts that wealth management firms' assets under management will not return to 2007 levels, $108.5 trillion, until 2013, a six-year rebound.
Europe posted a slightly higher $32.7 trillion of assets under management, edging out North America for the wealthiest region, though the total wealth in region dropped 5.8 percent.
Latin America was the only region to report a gain in assets under management, posting a 3 percent uptick from $2.4 trillion in 2007 to $2.5 trillion in 2008.
MILLIONAIRE ... NOT
The economy's retreat also pounded millionaires who made risky investments during the economic boom.
The number of millionaires worldwide shrank 17.8 percent to 9 million, the BCG study found.
Europe and North America were hardest hit in that regard, posting 22 percent declines. The United States still boasts 3.9 million millionaires, the highest population on the globe.
Singapore had the highest density of millionaires at 8.5 percent of the population. Other countries included Switzerland, at 6.6 percent, Kuwait, at 5.1 percent, United Arab Emirates, at 4.5 percent, and the United States, at 3.5 percent.
----------------- Paragraph string ends just above ------------------------------------
Please see my answer to your identical previous question:
iPhone-SDK:Remove white spaces from a paragraph string?
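Independently of the linked answer, the usual technique is to split the text on runs of whitespace and join the pieces back with single spaces. Below is a language-neutral illustration in Python; the function name and the rule that a blank line separates paragraphs are assumptions. In Objective-C the same split-and-join idea can be expressed with componentsSeparatedByCharactersInSet: and componentsJoinedByString:, dropping the empty components in between.

```python
import re


def collapse_whitespace(text):
    """Collapse runs of spaces, tabs and newlines, keeping paragraph breaks."""
    # Assumption: one or more blank lines mark a paragraph boundary.
    paragraphs = re.split(r"\n\s*\n", text)
    # str.split() with no argument splits on any whitespace run and drops
    # empty pieces, so joining with " " leaves exactly one space per gap.
    cleaned = (" ".join(p.split()) for p in paragraphs)
    return "\n\n".join(p for p in cleaned if p)


messy = "  World   wealth \t down\n11 pct \n\n\n  Top   News  "
tidy = collapse_whitespace(messy)
```

Trimming alone only strips the ends of a string, which is why stringByTrimmingCharactersInSet leaves the interior whitespace untouched.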