EDIT: I completely re-wrote the question since it seems like I was not clear enough in my first two versions. Thanks for the suggestions so far.
I would like to internationalize the source code for a tutorial project (please notice, not the runtime application). Here is an example (in Java):
/** A comment */
public void doSomething() {
    System.out.println("Something was done successfully");
}
in English, and then have the French version be something like:
/** Un commentaire */
public void faitQuelqueChose() {
    System.out.println("Quelque chose a été fait avec succès.");
}
and so on. And then have something like a properties file somewhere to edit these translations with usual tools, such as:
com.foo.class.comment1=A comment
com.foo.class.method1=doSomething
com.foo.class.string1=Something was done successfully
and for other languages:
com.foo.class.comment1=Un commentaire
com.foo.class.method1=faitQuelqueChose
com.foo.class.string1=Quelque chose a été fait avec succès.
I am trying to find the easiest, most efficient and unobtrusive way to do this with the least amount of manual grunt work (other than obviously translating the actual text). Preferably working under Eclipse. For example, the original code would be written in English, then externalized (to properties, preferably leaving the original source untouched), translated (humanly) and then re-generated (as a separate source file / project).
Some trails I have found (other than what AlexS suggested):
ANTLR, a language parser / generator. There seems to be a supporting Eclipse plugin.
Using Eclipse's AST (Abstract Syntax Tree) and, I guess, building some kind of plugin.
I am just surprised there isn't a tool out there that does this already.
I'd use unique strings as method names (or as anything else you want replaced by localized versions).
public void m37hod_1() {
    System.out.println(m355a6e_1);
}
Then I'd define a properties file for each language like this:
m37hod_1=doSomething
m355a6e_1="Something was done successfully"
And then I'd write a small program that parses the source files and replaces the strings. So everything happens outside Eclipse.
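A minimal sketch of such a replacement program, assuming the tokens are unique enough that plain text substitution is safe. The class name TokenLocalizer and its method are hypothetical, not part of any existing tool. Longer tokens are replaced first so that a token such as m37hod_1 cannot clobber m37hod_10.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper: swaps placeholder tokens for their translations. */
class TokenLocalizer {

    /** Returns the source text with every token replaced by its property value. */
    static String localize(String source, Properties translations) {
        List<String> tokens = new ArrayList<>(translations.stringPropertyNames());
        // Replace longer tokens first so a short token never matches inside a longer one.
        tokens.sort(Comparator.comparingInt((String s) -> s.length()).reversed());
        String result = source;
        for (String token : tokens) {
            result = result.replace(token, translations.getProperty(token));
        }
        return result;
    }
}
```

Loading the properties file shown above with Properties.load keeps the double quotes around the message as part of the value, so the generated source ends up with a proper string literal.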
Or I'd use the Ant Replace task with property files instead of a standalone translation program.
Something like this:
<replace
    dir="${src}"
    includes="**/*.java"
    value="defaultvalue"
    propertyFile="${language}.properties">
  <replacefilter
      token="m37hod_1"
      property="m37hod_1"/>
  <replacefilter
      token="m355a6e_1"
      property="m355a6e_1"/>
</replace>
Using one of these methods, you won't have to explain anything about localization in your tutorials (unless you want to), and you can concentrate on your real topic.
What you want is a massive code change engine.
ANTLR won't do the trick; ASTs are necessary but not sufficient. See my essay on Life After Parsing. Eclipse's "AST" may be better, if the Eclipse package provides some support for name and type resolution; otherwise you'll never be able to figure out how to replace each "doSomething" (might be overloaded or local), unless you are willing to replace them all identically (and you likely can't do that, because some symbols refer to Java library elements).
Our DMS Software Reengineering Toolkit could be used to accomplish your task. DMS can parse Java to ASTs (including comment capture), traverse the ASTs in arbitrary ways, analyze/change ASTs, and then export the modified ASTs as valid source code (including the comments).
Basically you want to enumerate all comments, strings, and declarations of identifiers, export them to an external "database" to be mapped (manually? by Google Translate?) to an equivalent. In each case you want to note not only the item of interest, but its precise location (source file, line, even column) because items that are spelled identically in the original text may need different spellings in the modified text.
Enumeration of strings is pretty easy if you have the AST; simply crawl the tree and look for tree nodes containing string literals. (ANTLR and Eclipse can surely do this, too).
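As a rough illustration of that crawl, here is a sketch that uses the JDK's own Compiler Tree API (com.sun.source, shipped with the JDK in the jdk.compiler module) rather than DMS, ANTLR, or Eclipse. It needs to run on a full JDK, and the class StringLiteralLister and its Found holder are hypothetical names. It records each string literal together with its line and column.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.LiteralTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.SourcePositions;
import com.sun.source.util.TreeScanner;
import com.sun.source.util.Trees;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

/** Hypothetical helper: lists string literals with their source positions. */
class StringLiteralLister {

    /** A literal's value plus its 1-based line and column. */
    static final class Found {
        final String text;
        final long line;
        final long column;
        Found(String text, long line, long column) {
            this.text = text;
            this.line = line;
            this.column = column;
        }
    }

    static List<Found> list(String fileName, String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // null on a bare JRE
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///" + fileName), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) compiler.getTask(null, null, null, null, null, List.of(file));
        SourcePositions positions = Trees.instance(task).getSourcePositions();
        List<Found> found = new ArrayList<>();
        try {
            // Parse only; no name or type resolution happens here.
            for (CompilationUnitTree unit : task.parse()) {
                new TreeScanner<Void, Void>() {
                    @Override
                    public Void visitLiteral(LiteralTree node, Void unused) {
                        if (node.getKind() == Tree.Kind.STRING_LITERAL) {
                            long pos = positions.getStartPosition(unit, node);
                            found.add(new Found((String) node.getValue(),
                                    unit.getLineMap().getLineNumber(pos),
                                    unit.getLineMap().getColumnNumber(pos)));
                        }
                        return super.visitLiteral(node, unused);
                    }
                }.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }
}
```

This only covers the easy part (string literals); it does nothing about comments or about resolving which declaration an identifier refers to.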
Enumeration of comments is also straightforward if the parser you have captures comments. DMS does. I'm not quite sure if ANTLR's Java grammar does, or the Eclipse AST engine; I suspect they are both capable.
Enumeration of declarations (classes, methods, fields, locals) is relatively straightforward; there are rather more cases to worry about (e.g., anonymous classes containing extensions to base classes). You can code a procedure to walk the AST and match the tree structures, but here's the place where DMS starts to make a difference: you can write surface-syntax patterns that look like the source code you want to match. For instance:
pattern local_for_loop_index(i: IDENTIFIER, t: type, e: expression, e2: expression, e3:expression): for_loop_header
= "for (\t \i = \e,\e2,\e3)"
will match declarations of local for loop variables, and return subtrees for the IDENTIFIER, the type, and the various expressions; you'd want to capture just the identifier (and its location, easily done by taking it from the source position information that DMS stamps on every tree node). You'd probably need 10-20 such patterns to cover the cases of all the different kinds of identifiers.
Capture step completed, something needs to translate all the captured entities to your target language. I'll leave that to you; what's left is to put the translated entities back.
The key to this is the precise source location. A line number isn't good enough in practice; you may have several translated entities on the same line, in the worst case some with different scopes (imagine nested for loops, for example). The replacement process for comments, strings, and declarations is straightforward: rescan the tree for nodes that match any of the identified locations, and replace the entity found there with its translation. (You can do this with DMS and ANTLR. I think Eclipse ADT requires you to generate a "patch", but I guess that would work.)
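To show why exact positions matter, here is a simplified stand-in that works on character offsets in the source text instead of on tree nodes (so it is not how DMS does it). OffsetRewriter and Edit are hypothetical names. Edits are applied from the end of the text backwards so that earlier offsets stay valid, and two identical literals on the same line can receive different replacements.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper: applies position-keyed replacements to source text. */
class OffsetRewriter {

    /** Replace the characters in [start, end) with the given text. */
    static final class Edit {
        final int start;
        final int end;
        final String replacement;
        Edit(int start, int end, String replacement) {
            this.start = start;
            this.end = end;
            this.replacement = replacement;
        }
    }

    /** Assumes the edits do not overlap. */
    static String apply(String source, List<Edit> edits) {
        List<Edit> sorted = new ArrayList<>(edits);
        // Work back to front: changing the tail never shifts offsets nearer the start.
        sorted.sort(Comparator.comparingInt((Edit e) -> e.start).reversed());
        StringBuilder out = new StringBuilder(source);
        for (Edit e : sorted) {
            out.replace(e.start, e.end, e.replacement);
        }
        return out.toString();
    }
}
```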
The fun part comes in replacing the identifier uses. For this, you need to know two things:
For any use of an identifier, which declaration does it use? If you know this, you can replace it with the new name for the declaration; DMS provides full name and type resolution as well as a usage list, making this pretty easy.
Do renamed identifiers shadow one another in scopes differently than the originals? This is harder to do in general. However, for the Java language, we have a "shadowing" check, so you can at least decide after renaming that you have an issue. (There's even a renaming procedure that can be used to resolve such shadowing conflicts.)
After patching the trees, you simply rewrite the patched tree back out as a source file using DMS's built-in prettyprinter. I think Eclipse AST can write out its tree plus patches. I'm not sure ANTLR provides any facilities for regenerating source code from ASTs, although somebody may have coded one for the Java grammar. This is harder to do than it sounds, because of all the picky detail. YMMV.
Given your goal, I'm a little surprised that you don't want a sourcefile "foo.java" containing "class foo { ... }" to get renamed to .java. This would require not only writing the transformed tree to the translated file name (pretty easy) but perhaps even reconstructing the directory tree (DMS provides facilities for doing directory construction and file copies, too).
If you want to do this for many languages, you'd need to run the process once per language. If you wanted to do this just for strings (the classic internationalization case), you'd replace each string (that needs changing, not all of them do) by a call on a resource access with a unique resource id; a runtime table would hold the various strings.
One approach would be to finish the code in one language, then translate to others.
You could use Eclipse to help you.
Copy the finished code to language-specific projects.
Then:
Identifiers: In the Outline view (Window>Show View>Outline), select each item and Refactor>Rename (Alt+Shift+R). This takes care of renaming the identifier wherever it's used.
Comments: Use Search>File to find all instances of "/*" or "//". Click on each and modify.
Strings:
Use Source>Externalize strings to find all of the literal strings.
Search>File for "Messages.getString()".
Click on each result and modify.
On each file, Edit>Find/Replace, replacing "//\$NON-NLS-.*\$" with empty string.
For the printed/logged strings, Java has some internationalization functionality, namely ResourceBundle. There is a tutorial about this on the Oracle site.
Eclipse also has a feature for this ("Externalize Strings", as I recall).
For the function names, I don't think there is anything out there, since this would require you to maintain the source code in many versions...
regards
Use .properties file, like:
Locale locale = new Locale(language, country);
ResourceBundle captions= ResourceBundle.getBundle("Messages",locale);
This way, Java picks the Messages properties file that matches the given locale (if you don't pass one, the default locale is used, which is taken from the operating system or the Java locale settings).
The file should be on the classpath, called Messages.properties (the default one), or Messages_de.properties for German, etc.
See this for a complete tutorial:
http://docs.oracle.com/javase/tutorial/i18n/intro/steps.html
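A small self-contained sketch of the lookup idea. In a real project ResourceBundle.getBundle("Messages", locale) would find Messages.properties and Messages_fr.properties on the classpath; to keep this runnable without any files, the bundles are built from in-memory text with PropertyResourceBundle, and the fallback to the default bundle is done by hand. The class MessageLookup and the key "done" are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

/** Hypothetical helper: per-language bundles with a default fallback. */
class MessageLookup {
    private final Map<String, ResourceBundle> byLanguage = new HashMap<>();
    private final ResourceBundle fallback;

    /** defaultText plays the role of Messages.properties. */
    MessageLookup(String defaultText) {
        this.fallback = parse(defaultText);
    }

    /** text plays the role of Messages_<language>.properties. */
    void add(String language, String text) {
        byLanguage.put(language, parse(text));
    }

    /** Looks in the locale's bundle first, then in the default bundle. */
    String get(Locale locale, String key) {
        ResourceBundle bundle = byLanguage.getOrDefault(locale.getLanguage(), fallback);
        return bundle.containsKey(key) ? bundle.getString(key) : fallback.getString(key);
    }

    private static ResourceBundle parse(String text) {
        try {
            return new PropertyResourceBundle(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```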
As far as the source code goes, I'd strongly recommend staying with English. Method names like getUnternehmen() are worse for the average developer than plain English ones.
If you need to familiarize foreign developers with your code, write proper developer documentation in their language.
If you'd like to have Javadoc in both English and other languages, see this SO thread.
You could write your code using freemarker templates (or another templating language such as velocity).
doSomething.tml
/** ${lang['doSomething.comment']} */
public String ${lang['doSomething.methodName']}() {
System.out.println("${lang['doSomething.message']}");
}
lang_en.prop
doSomething.comment=A comment
doSomething.methodName=doSomething
doSomething.message=Something was done successfully
And then merge the template with each language prop file during your build (using Ant / Gradle / Maven etc.)
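To make the merge step concrete without pulling in FreeMarker itself, here is a tiny stand-in that only understands the ${lang['key']} placeholders used above. It is not FreeMarker's API; TemplateMerger is a hypothetical name, and a real build would call the template engine instead.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for a template engine: fills ${lang['key']} placeholders. */
class TemplateMerger {
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{lang\\['([^']+)'\\]\\}");

    static String merge(String template, Properties lang) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = lang.getProperty(m.group(1));
            if (value == null) {
                // Fail loudly rather than emit source with a hole in it.
                throw new IllegalArgumentException("No translation for key: " + m.group(1));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Running merge once per language file gives one generated source tree per language.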
I've been given the task of researching whether one can use Powershell to automate the managing of References in a VB6 application and then compile its projects afterwards.
There are 3 projects. The requirement is to remove a specific reference in each project, then compile the projects from the bottom up (server > client > interface) and add the references back in along the way. (remove references, compile server.dll > add client reference to server.dll, compile client.dll > add interface reference to client.dll, compile interface.exe)
I'm thinking no, but I was still given the task of finding out for sure. Of course, where does one go to find this out? Why here of course, StackOverflow.
References are stored in the project .VBP files which are just text files. A given reference takes up exactly one line of the file.
For example, here is a reference to DAO database components:
Reference=*\G{00025E01-0000-0000-C000-000000000046}#5.0#0#C:\WINDOWS\SysWow64\dao360.dll#Microsoft DAO 3.6 Object Library
The most important info is everything to the left of the path which contains the GUID (i.e., the unique identifier of the library, more or less). The filespec and description text are unimportant as VB6 will update that to whatever it finds in the registry for the referenced DLL.
An alternate form of reference is for GUI controls, such as:
Object={BDC217C8-ED16-11CD-956C-0000C04E4C0A}#1.1#0; tabctl32.ocx
which for whatever reason never seem to have a path anyway. Most likely you will not need to modify this type of reference, because it would almost certainly break forms in the project which rely on them.
So in your Powershell script, the key task would be to either add or remove the individual reference lines mentioned in the question. Unless you are using no form of binary compatibility, the GUID will remain stable. Therefore, you could essentially hardcode the strings you need to add/remove.
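The edit itself is just a line filter. Here is the idea sketched in Java rather than Powershell (the same logic carries over directly); VbpReferences is a hypothetical helper, and it only touches Reference= lines, never Object= lines.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: drops a Reference= line from .VBP content by GUID. */
class VbpReferences {

    /** Returns the lines without any Reference= line that mentions the GUID. */
    static List<String> removeReference(List<String> vbpLines, String guid) {
        String needle = guid.toUpperCase();
        return vbpLines.stream()
                .filter(line -> !(line.startsWith("Reference=")
                        && line.toUpperCase().contains(needle)))
                .collect(Collectors.toList());
    }
}
```

Adding the reference back is the reverse: keep the removed line around and write it into the file again before the next compile.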
Aside from all that, it's worth thinking through why you need to take this approach at all. Normally, to build a VB6 solution, it is totally unnecessary to add/remove references along the way. Also, depending on your choice of deployment techniques, you are probably using either project or binary compatibility, which tends to keep the references stable.
Lastly, I'll mention that there are existing tools such as Kinook's Visual Build Pro which already know how to build groups of VB6 projects and if using a 3rd party tool like that is an option, could save you a lot of work.
I'm currently making good use of GWT's ClientBundles in my app. It works fine, but I have a large number of resources and it becomes tedious to manually create Java interfaces for each file:
@ClientBundle.Source("world_war_ii.txt")
public ExternalTextResource worldWarII();
@ClientBundle.Source("spain.txt")
public ExternalTextResource spain();
@ClientBundle.Source("france.txt")
public ExternalTextResource france();
I'd like to be able to (perhaps at compile time) dynamically list every *.txt file in a given directory, and then have run-time access to them, perhaps as an array ExternalTextResource[], rather than having to explicitly list them in my code. There may be hundreds of such resources, and enumerating them manually as code would be very painful and unmaintainable.
The ClientBundle documentation explicitly says that "to provide a file-system abstraction" is a non-goal, so unfortunately this seems to disallow what I'm trying to do.
What's the best way to deal with a large number of external resources that must be available at run-time? Would a generator help?
There's an automatic generator for CssResource - maybe you could look at its code and modify it to your needs?
I ended up following this advice: perform the file operations on the server, and then return a list of the file (meta)data via an RPC call.
This turns out to be fairly simple, and also allows me to return lightweight references (filenames) in the list, which I use to populate a Tree client-side; when the user clicks on a TreeItem the actual text contents are downloaded.
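The server-side half of that can be sketched with plain java.nio, leaving out the GWT RPC plumbing entirely. ResourceLister is a hypothetical name; an RPC service method would simply return this list to the client.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical server-side helper: lists the *.txt resources in a directory. */
class ResourceLister {

    /** Returns the file names (not the contents) in sorted order. */
    static List<String> listTextFiles(Path dir) {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.txt")) {
            for (Path file : stream) {
                names.add(file.getFileName().toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(names); // directory order is not guaranteed
        return names;
    }
}
```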
Do you use a table of contents to list all the functions (and maybe variables) of a class at the beginning of a big source code file? I know that the alternative to that kind of listing would be to split big files into smaller classes/files, so that their class declarations would be self-explanatory enough, but some complex tasks require a lot of code. I'm not sure whether it is really worth spending time subdividing the implementation into multiple files. Or is it OK to create an index listing in addition to the class/interface declaration?
EDIT:
To better illustrate how I use table-of-contents this is an example from my hobby project. It's actually not listing functions, but code blocks inside a function.. but you can probably get the idea anyway..
/*
CONTENTS
Order_mouse_from_to_points
Lines_intersecting_with_upper_point
Lines_intersecting_with_both_points
Lines_not_intersecting
Lines_intersecting_bottom_points
Update_intersection_range_indices
Rough_method
Normal_method
First_selected_item
Last_selected_item
Other_selected_item
*/
void SelectionManager::FindSelection()
{
    // Order_mouse_from_to_points
    ...
    // Lines_intersecting_with_upper_point
    ...
    // Lines_intersecting_with_both_points
    ...
    // Lines_not_intersecting
    ...
    // Lines_intersecting_bottom_points
    ...
    // Update_intersection_range_indices
    for(...)
    {
        // Rough_method
        ....
        // Normal_method
        if(...)
        {
            // First_selected_item
            ...
            // Last_selected_item
            ...
            // Other_selected_item
            ...
        }
    }
}
Notice that the index items don't have spaces. Because of this I can click on one of them and press F4 to jump to the item usage, and F2 to jump back (simple Visual Studio find-next/previous shortcuts).
EDIT:
Another alternative to this indexing is using collapsed C# regions. You can configure Visual Studio to show only region names and hide all the code. Of course, keyboard support for that kind of source code navigation is pretty cumbersome...
I know that alternative to that kind of listing would be to split up big files into smaller classes/files, so that their class declaration would be self-explanatory enough.
Correct.
but some complex tasks require a lot of code
Incorrect. While a "lot" of code may be required, long runs of code (over 25 lines) are a really bad idea.
actually not listing functions, but code blocks inside a function
Worse. A function that needs a table of contents must be decomposed into smaller functions.
I'm not sure is it really worth it spending your time subdividing implementation into multiple of files?
It is absolutely mandatory that you split things into smaller files. The folks that maintain, adapt and reuse your code need all the help they can get.
is it ok to create an index-listing additionally to the class/interface declaration?
No.
If you have to resort to this kind of trick, it's too big.
Also, many languages have tools to generate API docs from the code. Java, Python, C, C++ have documentation tools. Even with Javadoc, epydoc or Doxygen you still have to design things so that they are broken into intellectually manageable pieces.
Make things simpler.
Use a tool to create an index.
If you create a big index you'll have to maintain it as you change your code. Most modern IDEs create a list of class members anyway. It seems like a waste of time to create such an index.
I would never ever do this sort of busy-work in my code. The most I would do manually is insert a few lines at the top of the file/class explaining what this module did and how it is intended to be used.
If a list of methods and their interfaces would be useful, I generate them automatically, through a tool such as Doxygen.
I've done things like this. Not whole tables of contents, but a similar principle -- just ad-hoc links between comments and the exact piece of code in question. Also to link pieces of code that make the same simplifying assumptions that I suspect may need fixing up later.
You can use Visual Studio's task list to get a listing of certain types of comment. The format of the comments can be configured in Tools|Options, Environment\Task List. This isn't something I ended up using myself but it looks like it might help with navigating the code if you use this system a lot.
If you can split your method like that, you should probably write more methods. After this is done, you can use an IDE to give you the static call stack from the initial method.
EDIT: You can use Eclipse's 'Show Call Hierarchy' feature while programming.
What is a good strategy for dealing with changing product and feature names in source code? Here's the situation I find myself in over and over again (most of you can relate?)...
Product name starts off as "DaBomb"
Major features are "Exploder", "Lantern" and "Flag".
Time passes, and the Feature names are changed to "Boom", "Lighthouse" and "MarkMan"
Time passes, and the product name changes to "DaChronic"
...
...
Blah, blah, blah...over and over and over
And now we have a large code base with 50 different names sprinkled around the directory tree and source files, most of which are obsolete. Only the veterans remember what each name means, the full etymological history, etc.
What is the solution to this mess?
Clarification: I don't mean the names that customers see, I mean the names of directories, source files, classes, variables, etc. that the developers see where the changing product and feature names get woven into.
Given your clarification that you "don't mean the names that customers see, [you] mean the names of directories, source files, classes, variables, etc. that the developers see", yeah, this can be an annoying problem.
The teams I've been on have coped best when we've had a policy of always using only one name for each thing in the code base. If the name changes later on, we either stay with the old name in the code, or we migrate all instances of the old name to the new name. The important thing is to never start using the new name in the code unless all instances of the old name have been migrated. That way you only ever have to keep 2 names for something in your head: the "old name", used in the code, and the name everyone else uses.
We've also often chosen a very generic/descriptive name for things when starting out if we know the "brand name" is likely to change.
I consider renaming to better naming conventions just another form of refactoring. Create a branch, perform the renames, run unit/integration tests, commit, merge, repeat. It's all about process control to keep consistency in the project.
The solution to the mess is to not create it in the first place. Once a code path is named, there's rarely a good reason to change it and never a good reason to use a new name alongside the old one. When "Exploder" becomes "Boom", you have two choices: Either keep using Exploder exclusively, and never mention Boom anywhere, or change all instances of Exploder to Boom and then continue on using Boom exclusively and never mention Exploder again.
If you're using both Exploder and Boom in the same code base, you're doing it wrong.
Also, I know you clarified that you're not talking about the user-visible names, but, if you start out working with your own internal names which are relevant to what the code does and completely independent of what marketing wants to call the product/feature, then this is much less likely to become an issue. If you're already referring to Exploder internally as TNT, then what difference does it make if Exploder gets changed to Boom?
How do you deal with Localization? Same thing; same method.
We use an internal and an external name. It could be as simple as a static variable definition like
public static final String EXPLODER = "Boom";
And in code you'll always use the reference to EXPLODER. Same for path names and the like: hard-coding those paths in different places is a no-go anyway. If someone starts digging through internal stuff (like JS sources or ini files or whatever), who cares if they discover Exploder?
Just use internal names, and ignore changes to marketing/official names: https://softwareengineering.stackexchange.com/a/208578/55472.
I'm a big fan of GhostDoc's automatic comment generation in Visual Studio, so I am looking for a plugin that does the same job with my Java code in Eclipse. Any recommendations?
You can check JAutodoc (http://jautodoc.sourceforge.net/)
From the author:
JAutodoc is an Eclipse Plugin for automatically adding Javadoc and file headers to your source code. It optionally generates initial comments from element name by using Velocity templates for Javadoc and file headers.
This one is the one I've found closest to GhostDoc.
It is basically the equivalent of Javadoc, which can be generated in Eclipse with the shortcut:
ALT+Shift+J
(when you are within the Java function you wish to add javadoc for)
From there, if you really want XML format, you can try and use a JELDoclet
GhostDoc has a nice extra feature that infers a description of what the method does by parsing the method name and providing this as skeletal documentation. For example, using GhostDoc on a method named GetDocumentName() might return the phrase "Gets the document name". While this is hardly more information than provided by the method name, it adds method documentation where previously none existed. Some might argue that this is barely useful. I argue to the contrary because it supports generating documentation from the source code (e.g., for tools like NDoc or SandCastle).
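The name-to-sentence idea can be sketched in a few lines of Java. This is a naive illustration, not GhostDoc's or JAutodoc's actual algorithm: it assumes the first camel-case word is a verb that takes a plain "s" and that the rest is a noun phrase. CommentInferrer is a hypothetical name.

```java
/** Hypothetical helper: guesses a one-line summary from a method name. */
class CommentInferrer {

    static String describe(String methodName) {
        // Split at each lowercase-to-uppercase boundary, e.g. Get | Document | Name.
        String[] words = methodName.split("(?<=[a-z0-9])(?=[A-Z])");
        if (words[0].isEmpty()) {
            return "";
        }
        String verb = words[0];
        StringBuilder sentence = new StringBuilder();
        // Naive third person: capitalize the verb and append "s".
        sentence.append(Character.toUpperCase(verb.charAt(0)))
                .append(verb.substring(1).toLowerCase())
                .append('s');
        if (words.length > 1) {
            sentence.append(" the");
        }
        for (int i = 1; i < words.length; i++) {
            sentence.append(' ').append(words[i].toLowerCase());
        }
        return sentence.append('.').toString();
    }
}
```

With this sketch, describe("GetDocumentName") gives "Gets the document name." Names that don't start with a simple verb will produce clumsy text, which is where a real tool earns its keep.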
In my opinion the greatest benefit of GhostDoc over Eclipse's "Generate Element Comment" is that it encourages programmers to begin adding documentation comments by giving them an extremely fast and reliable way to create them. The programmer can accept the inferred text (suitable in 50 - 80% of cases), or expand on it for more complex methods. For the junior programmer who is not as familiar with how documentation comments are used, this can quickly shorten the learning curve and encourage good programming practices.
Javadoc is not like GhostDoc my friend. Javadoc only creates the structure so one can write the documentation from scratch. GhostDoc actually fills up the information according to the Method/Property name.
Example:
/// <summary>
/// Gets the user from id.
/// </summary>
/// <param name="id">The id.</param>
/// <returns></returns>
private string GetUserFromId(string id);
JAutoDoc is the closest I've found so far but it's not as magical as GhostDoc.
Never used GhostDoc, so not sure what extra functionality it gives, but if it's about generating type and method comments based on the name, parameters, return type etc. then eclipse has it built in, so no extensions needed.