What's a Good package for Phonetic Representation for Various Human Languages? - perl

I'm currently working on a project for which I think being able to come up with phonetic representations of words in various languages would be really helpful. I know Aspell does this pretty well, but I don't think there's a very easy way to get at their phonetic representations, so I ask: is there some other good package for getting the phonetic representation of a word given the word and the language/dialect/accent/whatever it's coming from?
This doesn't need to be in any particular language, but if it were Perl, that would be best.
I've already tried Soundex, Metaphone, DoubleMetaphone, and everything else in Text::Phonetic, and none of that stuff was very good – definitely nowhere near as good as the stuff in Aspell.

The first thing that springs to mind is Soundex. Of course, there is a Perl module for it, Text::Soundex. While it is designed to generate a soundex "key" from its input, it might be useful for mapping different variants to a common key.
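For reference, a minimal sketch of mapping words to a common key with Text::Soundex (the sample words are only illustrative):

#!/usr/bin/perl
use strict;
use warnings;
use Text::Soundex;    # exports soundex()

# Words that sound alike should collapse to the same four-character key.
for my $word (qw(Robert Rupert Smith Smythe)) {
    printf "%-8s => %s\n", $word, soundex($word);
}
# Robert and Rupert both map to R163; Smith and Smythe both map to S530.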

There is a package Text::Aspell in CPAN. Might be useful.
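A quick sketch of what Text::Aspell exposes: it wraps GNU Aspell's checking and suggestion interface rather than its internal phonetic codes, so it may only get you part of the way (the language option and test word below are just examples):

#!/usr/bin/perl
use strict;
use warnings;
use Text::Aspell;    # needs the aspell C library and a dictionary installed

my $speller = Text::Aspell->new or die "Could not create speller";
$speller->set_option( 'lang', 'en_US' );

my $word = 'fonetik';
if ( $speller->check($word) ) {
    print "'$word' is spelled correctly\n";
}
else {
    # Aspell's suggestions are driven in part by its phonetic matching.
    print "Suggestions for '$word': ",
          join( ', ', $speller->suggest($word) ), "\n";
}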

If you are trying to make a Google-style suggestion/correction system, note that it's not based on phonetics or AI alone, but on a massive amount of user input. When a user makes a search, doesn't click on any link, but corrects the input and searches again, that gives Google far more data about "correct" writing than a phonetics test or dictionary matching ever could.
The main problem is human language itself: people don't speak or write in a deterministic way, let alone across multiple languages.
Of course, I might be wrong, but if you need a library that lets you do this:
getLanguage(string);
I want to see that working, really.

Related

Perl module for text comparison

Can anyone suggest a Perl module which can compare two strings and return a degree to which they match? I searched CPAN extensively, and although there are similar modules like String::Approx and Data::Compare, they are not what I am looking for. Suppose I have two strings: "I love you" and "I boht you". I want functionality which will compare these two strings, taking into account numerous parameters: the matching of words in the correct order (love as the first word in one string should not "match" love as the 4th word in the other, even though both strings contain that word), words not matching exactly but spelt almost similarly (say, love and loge), number of words, etc., and return an index, say a number from 0 to 1, representing the degree of similarity between the two strings. Is there any such Perl module?
There are many such modules. Often, though, you'll have to make use of them in some special way to account for your own assumptions. Most of the string comparison tools like this just implement some algorithm for comparing one string to another. Most assume that if you have specific policy decisions to make, you'll code them yourself.
Personally, I am not sure I'd recommend Text::Levenshtein because of bugs and its lack of UTF-8 support. I don't have a better recommendation either, though.
However, these searches will reveal lots of potential modules you could look into and determine what works best for your purpose (based on the names of common algorithms for doing this sort of thing):
https://metacpan.org/search?q=levenshtein
https://metacpan.org/search?q=wagner+fischer
https://metacpan.org/search?q=edit+distance
If you're interested in spoken similarities, you can also look into phonetic comparisons:
https://metacpan.org/search?q=phonetic
https://metacpan.org/search?q=soundex
https://metacpan.org/search?q=metaphone
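If you end up rolling your own 0-to-1 index, here is a minimal sketch of the character-level part, assuming Text::Levenshtein's distance() function (see the caveats above); word-order and word-count policies would still have to be layered on top:

#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(max);
use Text::Levenshtein qw(distance);

# Very rough 0..1 similarity: 1 minus the edit distance normalized by
# the length of the longer string.
sub similarity {
    my ( $s1, $s2 ) = @_;
    my $len = max( length $s1, length $s2 ) or return 1;
    return 1 - distance( $s1, $s2 ) / $len;
}

printf "%.2f\n", similarity( 'I love you', 'I boht you' );    # prints 0.70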

How can ported code be detected?

If you port code over from one language to another, how can this be detected?
Say you were porting code from c++ to Java, how could you tell?
What would be the difference between a program designed and implemented in Java, and a near identical program ported over to Java?
If the porting is done properly (by people expert in both languages and ready to translate the source language's idioms into the best similar idioms of the target language), there's no way you can tell that any porting has taken place.
If the porting is done incompetently, you can sometimes recognize goofily-transliterated idioms... but that can be hard to distinguish from people writing a new program in a language they barely know, just goofily transliterating the idioms from the language they do know ;-).
Depending on how much effort was put into hiding the porting, it could be anywhere from very easy to impossible to detect.
I would use pattern recognition for this task. Think about the "features" which would indicate code similarities, extract those features from each codebase, and compare them.
e.g.:
One feature could be similar symbol names. Extract all symbols using ctags or regular expressions, make them all lower-case, sort and de-duplicate both lists, and compare them (see the sketch after this list).
Another possible feature:
List of classes + number of members, e.g.:
MyClass1 10
...
List of methods + sequence of control blocks, e.g.:
doSth() if, while, if, ix, case
...
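A minimal sketch of the symbol-name feature, assuming two input files whose names are made up and a deliberately rough identifier regex:

#!/usr/bin/perl
use strict;
use warnings;

# Extract lower-cased, de-duplicated identifiers from a source file.
sub symbols {
    my ($file) = @_;
    open my $fh, '<', $file or die "Can't read $file: $!";
    my %seen;
    while ( my $line = <$fh> ) {
        $seen{ lc $1 } = 1 while $line =~ /\b([A-Za-z_]\w*)\b/g;
    }
    return \%seen;
}

my $orig = symbols('original.cpp');    # hypothetical input files
my $port = symbols('ported.java');

# Jaccard overlap of the two symbol sets as a crude similarity score.
my $common = grep { $port->{$_} } keys %$orig;
my %union  = ( %$orig, %$port );
my $total  = keys %union;
printf "symbol overlap: %.2f\n", $total ? $common / $total : 0;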
Another easy way is to represent the code as a picture - e.g. load the code as text in Word and set the font size to 1. Human beings are very good at comparing pictures. For other ideas on code visualization you may check http://www.se-radio.net/2009/03/episode-130-code-visualization-with-michele-lanza/

Can aptitude for learning Programming paradigms be influenced by culture or native language's grammar? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
It is well known that different people have different aptitudes regarding various programming paradigms (e.g. some people have trouble learning non-procedural, especially functional languages. Some people have trouble understanding pointers - see Joel Spolsky's blog for musings on that. Some people have trouble grasping recursion).
I was recently reading about a study that looked at how the grammar of someone's native language affected their speed of learning math. Can't find that article now but a quick googling found this reference.
That led me to wondering whether someone's native culture or first language might affect their aptitude towards various programming paradigms. I'm more curious about positive influences - e.g. some trait that make it easier/faster for someone to learn a particular paradigm, for example native language grammar being very recursion-oriented.
To be clear, I'm looking for how culture or a language's grammar may affect the difference in the same person's aptitude towards various paradigms, as opposed to how it affects overall programming aptitude between different persons.
Important: the only answers I'm interested in are either references to scientific studies, or personal observations from someone intimately familiar with a particular culture/language, including from their own experience.
E.g. I'm not interested in your opinion of how Chinese being your first language affects anything unless you speak Chinese or have worked extensively with an extremely large set of Chinese-native programmers.
I'm OK with your guesstimates not based on scientific studies, but please be sure to supply your reasoning about plausible causes of your observation.
I'm not interested in culture-bashing (any such comments will be deleted or flagged for deletion).
I'm also not particularly interested in culture-building - we all know Linus is from Finland and Tetris was written in Russia and Larry Wall is an American. Any culture/nation can produce a brilliant mind in any discipline. I'm interested in averages.
Disclaimer: I was a Cultural Anthropologist before I got into programming, so you know I'm going to be on a high horse, here.
Obviously, a person's history will have an impact on their aptitude for any particular task, but I think this has less to do with the structure or grammar of a person's language than it does with the particular material conditions of the culture in which that language is spoken.
For example, a pair of Anthropologists in the 60's went to various African communities and tested people's susceptibility to various optical illusions. Here is a classic one:
In this illusion, the bottom line looks longer, because the angled lines connecting it make it appear to be off in the distance.
These Anthropologists found that in many African cultures, the illusion doesn't work at all - people consider the lines to be the same length. By refining their study, they found that the only people who were susceptible to the illusion were people who had grown up in an urban environment. They hypothesized that the illusion did not work on people from remote jungle environments, because these people had little or no experience with right angles and seeing things at very long distances.
My point with this is that even if you successfully found a correlation between programmers' native languages and their abilities with certain aspects of programming, you couldn't be sure that the correlation wasn't spurious. For example, you might think that Asians tend to be bad drivers, and you might even be able to demonstrate this statistically. If you then concluded, however, that "bad driving" is some sort of fundamental characteristic of Asian-ness, you would be ignoring the fact that Asians are more likely to be from Asia, and thus to have had much less experience driving cars (or even being in cars) while growing up than Westerners (and especially Americans) have had.
With programming, we might think that a particular language inhibits programming ability, and not take note of the fact that the society in which that language is spoken has much less access to computers, and thus people growing up with that language appear to have less programming aptitude or ability to understand certain programming concepts.
In short, I wouldn't give much credence to the idea that language inhibits anyone's ability to understand anything in particular. The human mind is much too flexible and adaptable for that to be true.
This seems analogous to the Sapir-Whorf Hypothesis - that the facilities of a language affect the ease with which one can cogitate about certain subjects, or in the words of the Wikipedia article:
"The linguistic relativity principle (also known as the Sapir-Whorf Hypothesis) is the idea that the varying cultural concepts and categories inherent in different languages affect the cognitive classification of the experienced world in such a way that speakers of different languages think and behave differently because of it."
( http://en.wikipedia.org/wiki/Linguistic_relativity )
While there appears to be little definitive information here, the discussions appear to be relevant to the question, and perhaps worthy of further exploration.
Just a few random thoughts. I think the influence is generally very weak and can most of the time be neglected but they do exist and sometimes they can make us feel them.
In Chinese grammar, for example, we don't quite distinguish between plural and singular forms, but I wouldn't think we Chinese have any noticeable difficulty understanding the concepts of scalar and array in Perl. The reason might be this: although we generally don't need particular suffixes or changes in form to indicate whether something is singular or plural, we do have the concepts of plural and singular and we mostly depend upon the context to tell them apart. Grammar-wise, the context in Chinese may possibly be way more important than that in those languages belonging to indo-european family. We omit a lot of things sometimes when they have already been mentioned and sometimes when we just presume that these things can be implicitly well understood by the listener. In either case, we don't need those indefinite and definite articles (a, an, the) or those relative pronouns like, that, which and who, to indicate whether they're being mentioned for the first time or yet another time again. Maybe that's partially why I feel very comfortable with Perl's default variable "$". print; chomp; split; all act upon $, which has never ever been mentioned. But this is quite subjective.
I think the Chinese language is characterized more by implicitness and fuzziness than Indo-European languages are. For example, we never pay attention to subject-verb agreement and we never conjugate verbs to denote tenses. This could mean that the Chinese are inclined to use a not-quite-so-logical mode of thinking. One of my teachers once used an example to try to generalize (or maybe over-generalize) the difference between the Chinese non-logical mode of thinking and the American logical mode of thinking.
If the American version of quarrelling should be this:
“I can lick you.”
“No, you can’t.”
“Yes, I can.”
“No, you can’t.”
“I can.”
“you can’t.”
“Can!”
“Can’t!”
The Chinese version (translated in English) would be something like this:
I can lick you.
How dare you!
What if I dare?
Then you try.
Try? Hm, you wait and see.
Wait and see? I’m not afraid.
Not afraid? OK. You don’t run away.
Who runs away? Come on and lick
Well, I agree that there may be some differences between the Chinese way of thinking and that of other countries, but the example looks like a stereotype, because the Chinese may easily switch to the American version. Back to the question: I think language and culture may indeed influence a programmer's learning process in one way or another, but this influence is definitely not decisive. Maybe the culture you're exposed to makes you feel a little bit uncomfortable getting used to some notions in some programming language, recursion or whatever, but time will solve it.
I was recently reading about a study that looked at how the grammar of someone's native language affected their speed of learning math. ... Important: the only answers I'm interested in are either references to scientific studies, or personal observations from someone intimately familiar with a particular culture/language, including from their own experience.
I learned a lot of maths before I started programming (enough to count as "intimately familiar"), and IMO programming is relatively easy: more tangible.
Sometimes I've wondered whether it's beneficial to know more than one human language: if you only know one language, then you might think of the words "cat" and "dog" as being values, i.e. synonymous with cat and dog objects; but if you're fluent in more than one language, then "cat" and "dog" become pointers: because for example the French words "chat" and "chien" are referring/pointing to the same objects as "cat" and "dog", and so clearly there's a distinction between the word and the object.
It's disappointing that you post the question without linking to the article which inspired it. I thought of "reverse polish notation" and wondered whether that was at all the kind of differences in "grammar" that were considered in the original study.
The reference you cite seems to rest on the assumption that making things easier helps with learning. In my understanding, there is a countereffect: without enough challenge, you're not learning enough.
There are theories/studies (anyone with a link?) that development of language created crucial pressure on expanding the cerebral cortex and thus "made us human". (in very darwinistic terms: more grey matter ==> better language capabilities ==> better teamwork ==> better survival as a group). So language complexity can't be all bad for learning.
(My only qualification is being an eager follower of The Frontal Cortex blog, so take this with a grain of salt.)
In German we have a strange ordering of numbers: the 10^0 and 10^1 positions are switched, but the others are normal (e.g. 25 is 'five and twenty', 125 is 'one hundred five and twenty'). It's been claimed that this makes learning numbers harder, and thus German should adopt a more intuitive ordering.
I guess that it helps a lot with doing additions in your head - at least if you stay below 100 or 200 - You can first add the 10^0 position and already say it / write it down while taking any carry into account for the 10^1 position.
(That doesn't continue for 10^2, I guess that would be done in writing by the majority anyway)
Also: abstractions. There are languages where numbers aren't abstracted from objects, "two coconuts" and "two sabretooth tigers" don't share a common "two" word / concept. Such a language would probably be very bad for developing math skills. Here the abstraction (separating number and object) in language is important.
Generally, I'd say the language has a strong effect on shaping a developing mind, and I see no reason why this should not extend to culture.
Of course it's still open what would be the "right kind of complexity" - for what, and how particular language features affect general improvement vs. establishment of an elite (i.e. "sharpening the skills of the gifted, while hampering the rest").
Interesting Question, no doubt - looking forward to other replies.

The future of Perl? (Perl 6, employability)

I've found a few related questions, like Python vs. Perl (now deleted) and Is Perl Worth it? (now deleted), but I can't seem to find anything that directly addresses this question.
Is there a legitimate future in Perl? I work in a Perl shop right now, and I came from PHP so I see some of the advantages of an arguably "lower" level language when doing things on the server-level, but it seems to me a lot of the tasks in Perl can be performed more quickly in PHP, and SOME ARGUE (subjective, not my opinion) that Python does these tasks in a more explicit way that's easier to maintain.
Is having this job on my resume ultimately going to make me less employable, especially if the language no longer grows?
A few notes:
I love Perl, so don't think I'm bashing the language. It's fun to use and we use a fairly verbose syntax that is relatively easy to maintain.
I realize that "Vaporware" is a buzzword that isn't necessarily applicable to this situation, because Perl doesn't have a marketing department and they're not "promising" Perl 6 by any date.
I realize that CPAN keeps the community going, so whether Perl 6 comes out or not people continue to build modules that increase possibilities in the language, but that doesn't mean that industry shops realize this, and switch to "more supported" languages that keep coming out with revised versions of the language like Python and (especially) PHP.*
EDIT {CLARIFICATION}
Cade Roux and Telemachus both brought up good points about whether or not your future can be defined by your resume.
To be honest, this was brought up when one of my former employers said "I don't hire anyone with Perl as their last job. That's OLD technology." This was a PHP shop, so take all that with a grain of salt.
Now without defaming my former employer, she's not a tech person AT ALL, so she was really expressing an opinion of a layperson, and in this case my question was more along the lines of "Is there a stigma on this particular technology placed on it by people who don't utilize it?", specifically more along the lines of people who may have had past experience with similar employers. I'm not asking you to look into the future with a magic glass to assume what the next "hot" language would be, but rather if this particular language (which is accused of stunted growth, again by laypeople) has negative connotations placed upon it.
I hope that makes a little more sense.
Plenty of shops - including on Wall Street - heavily use Perl and will continue to do so.
However, I have never seen PHP or Python used in this industry (not saying they aren't used, just that I never encountered them; purely personal anecdote). Nor have I EVER heard any conversation along the lines of "Perl can not do X that Python can, let's use Python".
Perl 6 is irrelevant to the job picture.
Many shops are still on 5.8 or, G-d forbid, 5.6.
More importantly, Perl 5 continues to evolve, including with features/ideas from Perl 6. See Perl 5.10 and 5.11.
Plus, that evolution includes really cool frameworks like Moose, etc.
I can probably come up with more bullets later, but the summary is that no, having a Perl job will in no way negatively affect your career prospects.
However, knowing nothing but Perl may affect it negatively, so make sure you know Java, C#, C++ or something besides dynamic interpreted languages. Not many shops would hire "Perl Only" developer, even if they gladly hire "Perl + other stuff" ones.
See Tim Bunce's Perl Myths slides on slide share.
In short, Perl is not dead and has lots of jobs available.
Anyone who actually watches the development of Perl would know that there has perhaps been more work on the Perl language in the past decade than in the previous decade.
This has been spurred on by the introduction of Perl6.
The introduction of Perl 6 spurred on the now deeply ingrained testing culture.
Just look at how much the Rakudo implementation of Perl 6 is tested:
Rakudo Progress http://rakudo.de/progress.png
There has also been a lot of back-porting of Perl 6 features into Perl 5.
For example, the Perl 6 "switch" statement
#!/usr/bin/perl
use strict;
use warnings;

use 5.10.1;
# or
use feature qw'switch say';

my $str = "testing 123";

given ($str) {
    when (/(\d+)/) {
        say $1;
    }
    when ( [ 0 .. 10 ] ) {
        say $_, ' is equal to some number between 0 and 10';
        # given() sets the current topic "$_"
    }
}
There are few languages I would tie my career to. Perl will always be there and it will always be the best tool for certain kinds of jobs. But this is true for many languages. However, there are also languages which have more competition in some of the spaces where they are used. Perl is one language that has a lot more strong niches.
Still, you wouldn't restrict yourself to using just one language for your entire life - or even in one project if there are better options to solve a problem.
Career-wise, there are basic technologies which are fairly universally used, and of these I think a few of the most valuable are: relational database concepts and SQL, XML/HTML/HTTP/DOM, regular expressions. These are all basically independent of any particular vendor or language, and if you are strong in these areas, choice of language and platform are going to be informed by the problem being addressed.
Perl is, and always will be, a practical language for manipulating large amounts of data. I work in an industry where moving, converting, and parsing large amounts of text and image data is what we do, and I couldn't live without Perl.
Likewise, if you're a sysadmin (especially a Unix one), Perl is a necessary tool. There are tons of places where you need to be able to whip up a quick and dirty application that runs right along with the shell functions.
Languages have niches. Perl has a big stable niche, in many ways much more stable than fad-driven web languages. PHP, for example, is a nice little web language, but its saving grace is that it's quick and easy to develop in, not that it is a particularly great language. I'll tend to use PHP over Perl for web applications (though I use Python over PHP, if I have time), but 90% of the stuff I do in my day-to-day would be nearly impossible in PHP, and is flat trivial in Perl.
#Nate: I love Python. LOVE it. I actually worry that I love it too much, and I'm being irrational about it. PHP is a nice tool, but when your main selling point is "Quick and Easy" then you're running a risk. That was the big push behind original Visual Basic, and we all know how that worked out.
I'd discourage you from putting Perl on your resume - there's already too many people in the perl market and we don't want any more! ... just kidding.
The past is supposedly no guide to the future, but, despite having plenty of C (etc.) and Java in my 'skills toolbag' I've seen more gainful employ from my Perl than anything else over the last decade.
I suspect that offshore-perl-new-build may not be the biggest market in the future, but there's certainly active development in the city and media industries in the UK.
Otherwise, I'd just agree with the points above. Technicians with diverse skills are more able to pick the right tools, and less inclined to 'get religious' about language choice.
If you're looking at a post where the non-technical management have a strong point of view about what technology should and shouldn't be used - I'd place that one in the 'avoid' pile.
To add another separate answer - as you have noted - there is a very real danger when dealing with recruiters and others that your resume will be interpreted and things inferred that are not necessarily how you see yourself, and you might get pigeon-holed.
This WILL happen both ways - too much variation and you aren't an expert in anything OR too little variation and you are only good at one thing.
I don't have a simple answer for combatting that, except to ensure that you emphasize portable skills and also achievements which are independent of technology - making the company more money, landing new business, making new markets, etc.
Perl is another tool in your toolbox. If I have an opening and one person is narrowly focused on a specific technology while another has a broad range of skills, I would be more inclined to hire the one with the wider range of skills even if they might not be quite as deeply knowledgeable. Someone who has a wide range of skills on a range of platforms is someone who can think, innovate and adapt.
I don't understand the point of this question. You have a job and you already know Perl. You can ask whether or not to learn new languages and which ones to learn (please don't, but you could), but none of us can or should predict whether or not you're going to get another job using Perl.
You ask, "Is having this job on my resume ultimately going to make me less employable, especially if the language no longer grows?"
Well, it's better than a blank resume, and you can't change your past, so really what are we talking about here?

How can I combine Catalyst and ngettext?

I'm trying to get my head around i18n with Catalyst. As far as I understood the matter, there are two ways to make translations with Perl: Maketext and Gettext. However, I have a requirement to support gettext's .po format so basically I'm going with gettext.
Now, I've found Catalyst::Plugin::I18n and thus Locale::Maketext::Lexicon, which does what I want most of the time. However, it doesn't generate proper pluralization forms, i.e. properly writing msgid_plural and msgstr[x] into the .pot file. This probably happens because Maketext depends on its bracket notation [quant,_1...] and thus has to have the same notation in the translation.
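For context, this is roughly how plurals look on the Maketext side once a .po lexicon is loaded through Locale::Maketext::Lexicon (the package name and file path below are made up); the [quant,...] bracket call is what ends up in the extracted strings, rather than msgid_plural entries:

package MyApp::I18N;
use strict;
use warnings;
use base 'Locale::Maketext';
use Locale::Maketext::Lexicon {
    de    => [ Gettext => 'lib/MyApp/I18N/de.po' ],    # hypothetical .po file
    _auto => 1,    # fall back to the msgid if a translation is missing
};

package main;

# Plural handling goes through Maketext's bracket notation, not msgid_plural:
my $lh = MyApp::I18N->get_handle('de') or die 'No language handle';
print $lh->maketext('You have [quant,_1,message,messages]', 3), "\n";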
Yet another solution might be using some direct gettext port like Locale::Messages, however this would mean rewriting C::P::I18n.
Does anybody have a proper solution for this problem apart from rewriting several modules? Anything that combines proper gettext with all its features and Catalyst will do.
You will probably get a better answer on the mailing list:
http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst
I assume you've also read this:
http://www.catalystframework.org/calendar/2006/18