How can ported code be detected? - porting

If you port code over from one language to another, how can this be detected?
Say you were porting code from c++ to Java, how could you tell?
What would be the difference between a program designed and implemented in Java, and a near identical program ported over to Java?

If the porting is done properly (by people expert in both languages and ready to translate the source language's idioms into the best similar idioms of the target language), there's no way you can tell that any porting has taken place.
If the porting is done incompetently, you can sometimes recognize goofily-transliterated idioms... but that can be hard to distinguish from people writing a new program in a language they know little just goofily transliterating the idioms from the language they do know;-).

Depending on how much effort was put into the intention to hide the porting it could be very easy to impossible to detect.
I would use pattern recognition for this task. Think about the "features" which would indicate code-similarities. Extract these feature from each code and compare them.
e.g:
One feature could be similar symbol names. Extract all symbols using ctags or regular expressions, make all lower-case, make uniq sort of both lists and compare them.
Another possible feature:
List of class + number of members e.g:
MyClass1 10
...
List of method + sequence of controll blocks. e.g:
doSth() if, while, if, ix, case
...
Another easy way, is to represent the code as a picture - e.g. load the code as text in Word and set the font size to 1. Human beings are very good on comparing pictures. For another Ideas of code Visualization you may check http://www.se-radio.net/2009/03/episode-130-code-visualization-with-michele-lanza/

Related

Racket - is it possible to define and use the same language in a single file?

I like to write code using org-mode and ob-racket. It works quite well for my purposes (mostly pragmatic scripting and solving code challenges as I slowly learn the patterns of this cool language). I am trying to understand language creation and a limitation is that ob-racket as is will only ever generate a single file for me, yet every example of creating a custom language I can find (largely just Beautiful Racket) uses separate files for the language definition and its usage.
I would assume - this being lisp after all - that it is possible to define a custom reader and/or a custom extender and then use them all in the same file. Can someone give me an example of how to do this or is it not doable?

Implementing language translators in racket

I am implementing an interpreter that codegen to another language using Racket. As a novice I'm trying to avoid macros to the extent that I can ;) Hence I came up with the following "interpreter":
(define op (open-output-bytes))
(define (interpret arg)
(define r
(syntax-case arg (if)
[(if a b) #'(fprintf op "if (~a) {~a}" a b)]))
; other cases here
(eval r))
This looks a bit clumsy to me. Is there a "best practice" for doing this? Am I doing a totally crazy thing here?
Short answer: yes, this is a reasonable thing to do. The way in which you do it is going to depend a lot on the specifics of your situation, though.
You're absolutely right to observe that generating programs as strings is an error-prone and fragile way to do it. Avoiding this, though, requires being able to express the target language at a higher level, in order to circumvent that language's parser.
Again, it really has a lot to do with the language that you're targeting, and how good a job you want to do. I've hacked together things like this for generating Python myself, in a situation where I knew I didn't have time to do things right.
EDIT: oh, you're doing Python too? Bleah! :)
You have a number of different choices. Your cleanest choice is to generate a representation of Python AST nodes, so you can either inject them directly or use existing serialization. You're going to ask me whether there are libraries for this, and ... I fergits. I do believe that the current Python architecture includes ... okay, yes, I went and looked, and you're in good shape. Python's "Parser" module generates ASTs, and it looks like the AST module can be constructed directly.
https://docs.python.org/3/library/ast.html#module-ast
I'm guessing your cleanest path would be to generate JSON that represents these AST modules, then write a Python stub that translates these to Python ASTs.
All of this assumes that you want to take the high road; there's a broad spectrum of in-between approaches involving simple generalizations of python syntax (e.g.: oh, it looks like this kind of statement has a colon followed by an indented block of code, etc.).
If your source language shares syntax with Racket, then use read-syntax to produce a syntax-object representing the input program. Then use recursive descent using syntax-case or syntax-parse to discern between the various constructs.
Instead of printing directly to an output port, I recommend building a tree of elements (strings, numbers, symbols etc). The last step is then to print all the elements of the tree. Representing the output using a tree is very flexible and allows you to handle sub expressions out of order. It also allows you to efficiently concatenate output from different sources.
Macros are not needed.

What's a Good package for Phonetic Representation for Various Human Languages?

I'm currently working on a project for which I think being able to come up with phonetic representations of words in various languages would be really helpful. I know Aspell does this pretty well, but I don't think there's a very easy way to get at their phonetic representations, so I ask: is there some other good package for getting the phonetic representation of a word given the word and the language/dialect/accent/whatever it's coming from?
This doesn't need to be in any particular language, but if it were Perl, that would be best.
I've already tried Soundex, Metaphone, DoubleMetaphone, and everything else in Text::Phonetic, and none of that stuff was very good – definitely nowhere near as good as the stuff in Aspell.
The first thing that springs to mind is Soundex. Of course, there is a Perl module Soundex, too. While this is designed to generate a soundex "key" from input it might be useful in mapping different variants to a common key.
There is a package Text::Aspell in CPAN. Might be useful.
I you are trying to make a google style suggestion/correction system, it's not based on just phonetics or AI, but on a massive amount of user input. When a user makes a search, and doesn't click in any link but corrects the input and searches again, it gives google a lot of data about "correct" writing than a phonetics test or dictionary matching.
The main problem is in human language itself, it's not that people speak or write in a deterministic way, let alone in multiple languages.
Of course , i might be wrong, but if you need a library that let's you do this:
getLanguage(string);
I want to see that working, really.

Uses for both static strong typed languages like Haskell and dynamic (strong) languages like Common LIsp

I was working with a Lisp dialect but also learning some Haskell as well. They share some similarities but the main difference in Common Lisp seems to be that you don't have to define a type for each function, argument, etc. whereas in Haskell you do. Also, Haskell is mostly a compiled language. Run the compiler to generate the executable.
My question is this, are there different applications or uses where a language like Haskell may make more sense than a more dynamic language like Common Lisp. For example, it seems that Lisp could be used for more bottom programming, like in building websites or GUIs, where Haskell could be used where compile time checks are more needed like in building TCP/IP servers or code parsers.
Popular Lisp applications:
Emacs
Popular Haskell applications:
PUGS
Darcs
Do you agree, and are there any studies on this?
Programming languages are tools for thinking with. You can express any program in any language, if you're willing to work hard enough. The chief value provided by one programming language over another is how much support it gives you for thinking about problems in different ways.
For example, Haskell is a language that emphasizes thinking about your problem in terms of types. If there's a convenient way to express your problem in terms of Haskell's data types, you'll probably find that it's a convenient language to write your program in.
Common Lisp's strengths (which are numerous) lie in its dynamic nature and its homoiconicity (that is, Lisp programs are very easy to represent and manipulate as Lisp data) -- Lisp is a "programmable programming language". If your program is most easily expressed in a new domain-specific language, for example, Lisp makes it very easy to do that. Lisp (and other dynamic languages) are a good fit if your problem description deals with data whose type is poorly specified or might change as development progresses.
Language choice is often as much an aesthetic decision as anything. If your project requirements don't limit you to specific languages for compatibility, dependency, or performance reasons, you might as well pick the one you feel the best about.
You're opening multiple cans of very wriggly worms. First off, the whole strongly vs weakly typed languages can. Second, the functional vs imperative language can.
(Actually, I'm curious: by "lisp dialect" do you mean Clojure by any chance? Because it's largely functional and closer in some ways to Haskell.)
Okay, so. First off, you can write pretty much any program in pretty much any normal language, with more or less effort. The purported advantage to strong typing is that a large class of errors can be detected at compile time. On the other hand, less typeful languages can be easier to code in. Common Lisp is interesting because it's a dynamic language with the option of declaring and using stronger types, which gives the CL compiler hints on how to optimize. (Oh, and real Common Lisp is usually implemented with a compiler, giving you the option of compiling or sticking with interpreted code.)
There are a number of studies about comparing untyped, weakly typed, and strongly typed languages. These studies invariably either say one of them is better, or say there's no observable difference. There is, however, little agreement among the studies.
The biggest area in which there may be some clear advantage is in dealing with complicated specifications for mathematical problems. In those cases (cryptographic algorithms are one example) a functional language like Haskell has advantages because it is easier to verify the correspondence between the Haskell code and the underlying algorithm.
I come mostly from a Common Lisp perspective, and as far as I can see, Common Lisp is suited for any application.
Yes, the default is dynamic typing (i.e. type detection at runtime), but you can declare types anyway for optimization (as a side note for other readers: CL is strongly typed; don't confuse weak/strong with static/dynamic!).
I could imagine that Haskell could be a bit better suited as a replacement for Ada in the avionics sector, since it forces at least all type checks at compile time.
I do not see how CL should not be as useful as Haskell for TCP/IP servers or code parsers -- rather the opposite, but my contacts with Haskell have been brief so far.
Haskell is a pure functional language. While it does allow imperative constructs (using monads), it generally forces the programmer to think the problem in a rather different way, using a more mathematical-oriented approach. You can't reassign another value to a variable, for example.
It is claimed that this reduces the probability of making some types of mistakes. Moreover, programs written in Haskell tend to be shorter and more concise than those written in typical programming languages. Haskell also makes heavy use of non-strict, lazy evaluation, which could theoretically allow the compiler to make optimizations not otherwise possible (along with the no-side-effects paradigm).
Since you asked about it, I believe Haskell's typing system is quite nice and useful. Not only it catches common errors, but it can also make code more concise (!) and can effectively replace object-oriented constructs from common OO languages.
Some Haskell development kits, like GHC, also feature interactive environments.
The best use for dynamic typing that I've found is when you depend on things that you have no control over so it could as well be used dynamically. For example getting information from XML document we could do something like this:
var volume = parseXML("mydoc.xml").speaker.volume()
Not using duck typing would lead to something like this:
var volume = parseXML("mydoc.xml").getAttrib["speaker"].getAttrib["volume"].ToString()
The benefit of Haskell on the other hand is in safety. You can for example make sure, using types, that degrees in Fahrenheit and Celsius are never mixed unintentionally. Besides that I find that statically typed languages have better IDEs.

Keeping CL and Scheme straight in your head

Depending on my mood I seem to waffle back and forth between wanting a Lisp-1 and a Lisp-2. Unfortunately beyond the obvious name space differences, this leaves all kinds of amusing function name/etc problems you run into. Case in point, trying to write some code tonight I tried to do (map #'function listvar) which, of course, doesn't work in CL, at all. Took me a bit to remember I wanted mapcar, not map. Of course it doesn't help when slime/emacs shows map IS defined as something, though obviously not the same function at all.
So, pointers on how to minimize this short of picking one or the other and sticking with it?
Map is more general than mapcar, for example you could do the following rather than using mapcar:
(map 'list #'function listvar)
How do I keep scheme and CL separate in my head? I guess when you know both languages well enough you just know what works in one and not the other. Despite the syntactic similarities they are quite different languages in terms of style.
Well, I think that as soon you get enough experience in both languages this becomes a non-issue (just with similar natural languages, like Italian and Spanish). If you usually program in one language and switch to the other only occasionally, then unfortunately you are doomed to write Common Lisp in Scheme or vice versa ;)
One thing that helps is to have a distinct visual environment for both languages, using syntax highlighting in some other colors etc. Then at least you will always know whether you are in Common Lisp or Scheme mode.
I'm definitely aware that there are syntactic differences, though I'm certainly not fluent enough yet to automatically use them, making the code look much more similar currently ;-).
And I had a feeling your answer would be the case, but can always hope for a shortcut <_<.
The easiest way to keep both languages straight is to do your thinking and code writing in Common Lisp. Common Lisp code can be converted into Scheme code with relative ease; however, going from Scheme to Common Lisp can cause a few headaches. I remember once where I was using a letrec in Scheme to store both variables and functions and had to split it up into the separate CL functions for the variable and function namespaces respectively.
In all practicality though I don't make a habit of writing CL code, which makes the times that I do have to all the more painful.