I am asking about gplex, but the solution may well work for other lex-derived tools too.
I wrote all the rules, and everything is fine with one exception. The return type of the generated scanner's scan method is int, and I would like it to be MySymbol (which would consist of the token id, such as INT, STR, PLUS and so on, its value, and possibly its location in the file).
I checked the samples (there are not many), but they are very simplistic and just print the fact that a rule was matched. I've read the manual, but it starts from the parser's perspective, and for now I am a bit lost.
One of my rules in the lex file:
while { return new MySymbol(MyTokens.WHILE); }
All I have now is the scanning phase; I have to finish it, and then I will think about the parser.
Yacc and yacc-like tools (here gplex) rely on side effects. Normally you would think of returning the data, but here you return only the token id, and any extra data has to be "passed" via special variables such as yylval (and yylloc for the location).
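To make the side-effect style concrete, here is a small Java sketch. gplex itself generates C#, so this shows only the pattern, not gplex's API. The scanning method, called `yylex()` here after the lex convention, returns nothing but an int token id and leaves the value and line in the fields `yylval` and `yyline`; a thin `nextSymbol()` wrapper then packs everything into the `MySymbol` the question asks for. `MyTokens`, `MySymbol`, `TinyScanner` and `nextSymbol()` are hypothetical, and the hand-written scanner stands in for generated code.

```java
import java.util.ArrayList;
import java.util.List;

public class SideChannelDemo {
    // Hypothetical token ids, standing in for the question's MyTokens.
    static final class MyTokens {
        static final int EOF = 0, INT = 1, PLUS = 2, WHILE = 3;
    }

    // The richer token the question asks for: id, value and location.
    record MySymbol(int id, Object value, int line) {}

    // Hand-written stand-in for a generated scanner: yylex() returns only the
    // token id and leaves extra data in fields as a side effect.
    static final class TinyScanner {
        Object yylval;   // semantic value of the last token (null if none)
        int yyline = 1;  // line on which the last token started
        private final String src;
        private int pos;

        TinyScanner(String src) { this.src = src; }

        int yylex() {
            // Skip whitespace, counting newlines for the location.
            while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) {
                if (src.charAt(pos) == '\n') yyline++;
                pos++;
            }
            yylval = null;
            if (pos >= src.length()) return MyTokens.EOF;
            char c = src.charAt(pos);
            if (Character.isDigit(c)) {
                int start = pos;
                while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
                yylval = Integer.parseInt(src.substring(start, pos)); // value via side channel
                return MyTokens.INT;
            }
            if (c == '+') { pos++; return MyTokens.PLUS; }
            if (src.startsWith("while", pos)) { pos += 5; return MyTokens.WHILE; }
            throw new IllegalStateException("unexpected '" + c + "' on line " + yyline);
        }

        // Wrapper that packs the token id and the side-channel fields into one object.
        MySymbol nextSymbol() {
            int id = yylex();
            return new MySymbol(id, yylval, yyline);
        }
    }

    public static void main(String[] args) {
        TinyScanner scanner = new TinyScanner("while 12 +\n 7");
        List<MySymbol> symbols = new ArrayList<>();
        MySymbol sym;
        do {
            sym = scanner.nextSymbol();
            symbols.add(sym);
        } while (sym.id() != MyTokens.EOF);
        symbols.forEach(System.out::println);
    }
}
```

In a real gplex project the same idea would presumably mean keeping the generated int-returning method as is and building the symbol object in a wrapper (or in the parser) from the token id plus the value and location variables.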
I always get confused when I deal with classes and objects. While trying to understand the Spreadsheet::ParseExcel module, I have some doubts about its classes and objects:
My doubt is:
With $parser = Spreadsheet::ParseExcel->new(); we create a Spreadsheet::ParseExcel object, and after that we create a Spreadsheet::ParseExcel::Workbook object (via $parser->parse()).
Why can't we create a Spreadsheet::ParseExcel::Workbook object directly and start parsing?
Thanks
Why can't we create a Spreadsheet::ParseExcel::Workbook object directly and start parsing?
That is a reasonable question and in older versions of Spreadsheet::ParseExcel there was a Spreadsheet::ParseExcel::Workbook->Parse() method that did just that. (*)
Users tend to see an Excel file only as a workbook. However, the file format also contains data such as metadata (author, creation date, etc.) and VBA macros that are separate from the workbook data.
As such, the logical division of the parser from the workbook probably arose from the physical division of the data in the file.
Or it may have been to allow reporting of file parsing errors rather than just returning an undefined workbook object.
Either way, other people may have chosen to model the interface differently, but that is what the original author chose. It is not completely intuitive, but it works.
(*) This method is now deprecated since it doesn't allow error checking on the file.
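To make that design rationale concrete, here is a minimal Java analogue using hypothetical classes and a toy `author:...;sheets:...` text format (not the Perl module's code, and not the real Excel layout). It mirrors the documented Perl usage, where `$parser->parse()` returns an undefined workbook on failure and `$parser->error()` explains why: the parser object owns the parsing step and the error state, while the workbook object holds only the parsed content.

```java
public class ParserWorkbookDemo {
    // Only the content a user thinks of as "the workbook".
    record Workbook(String author, int sheetCount) {}

    // The parser owns the parsing step and remembers why the last parse failed.
    static final class Parser {
        private String error = "";

        String error() { return error; }

        /** Returns a Workbook, or null with error() describing the problem. */
        Workbook parse(String content) {
            error = "";
            if (content == null || content.isBlank()) {
                error = "File is empty";
                return null;
            }
            String author = null;
            Integer sheets = null;
            // Toy format "author:NAME;sheets:N", standing in for the real file layout.
            for (String field : content.split(";")) {
                String[] kv = field.split(":", 2);
                if (kv.length != 2) {
                    error = "Malformed field: " + field;
                    return null;
                }
                String key = kv[0].trim(), value = kv[1].trim();
                if (key.equals("author")) {
                    author = value;
                } else if (key.equals("sheets")) {
                    try {
                        sheets = Integer.parseInt(value);
                    } catch (NumberFormatException e) {
                        error = "Bad sheet count: " + value;
                        return null;
                    }
                } // other metadata is skipped
            }
            if (sheets == null) {
                error = "No sheet count";
                return null;
            }
            return new Workbook(author, sheets);
        }
    }

    public static void main(String[] args) {
        Parser parser = new Parser();
        System.out.println(parser.parse("author:Alice;sheets:3"));
        if (parser.parse("") == null) {
            System.out.println("error: " + parser.error());
        }
    }
}
```

Keeping `error()` on the parser rather than on the workbook is what lets a failed parse return no workbook at all while still explaining what went wrong.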
Think of Spreadsheet::ParseExcel and Spreadsheet::ParseExcel::Workbook as simply being of different types, like integer and string: both are scalars, but they represent different things, although they can interact in some cases. For example, length() applied to a string gives you the string's length as an integer. In the same way, calling parse() on a Spreadsheet::ParseExcel object gives you a Spreadsheet::ParseExcel::Workbook. They share a common namespace, but they are completely different: Spreadsheet::ParseExcel is a parser and Spreadsheet::ParseExcel::Workbook is a workbook.
EDIT: I completely re-wrote the question since it seems like I was not clear enough in my first two versions. Thanks for the suggestions so far.
I would like to internationalize the source code of a tutorial project (please note: not the runtime application). Here is an example (in Java):
/** A comment */
public void doSomething() {
    System.out.println("Something was done successfully");
}
in English, and then have the French version be something like:
/** Un commentaire */
public void faitQuelqueChose() {
    System.out.println("Quelque chose a été fait avec succès.");
}
and so on. Then there would be something like a properties file for editing these translations with the usual tools, such as:
com.foo.class.comment1=A comment
com.foo.class.method1=doSomething
com.foo.class.string1=Something was done successfully
and for other languages:
com.foo.class.comment1=Un commentaire
com.foo.class.method1=faitQuelqueChose
com.foo.class.string1=Quelque chose a été fait avec succès.
I am trying to find the easiest, most efficient and unobtrusive way to do this with the least amount of manual grunt work (other than obviously translating the actual text). Preferably working under Eclipse. For example, the original code would be written in English, then externalized (to properties, preferably leaving the original source untouched), translated (humanly) and then re-generated (as a separate source file / project).
Some trails I have found (other than what AlexS suggested):
ANTLR, a parser generator. There seems to be a supporting Eclipse plugin.
Using Eclipse's AST (Abstract Syntax Tree) and I guess building some kind of plugin.
I am just surprised there isn't a tool out there that does this already.
I'd use unique strings as method names (or for anything else you want to be replaced by localized versions):
public void m37hod_1() {
    System.out.println(m355a6e_1);
}
Then I'd define a properties file for each language like this:
m37hod_1=doSomething
m355a6e_1="Something was done successfully"
Then I'd write a small program that parses the source files and replaces the strings, so everything happens outside Eclipse.
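A minimal sketch of that small program in Java, assuming the placeholder tokens are unique and never occur inside one another's translated values. `TokenTranslator` and `translate()` are hypothetical names, and the properties are loaded from a string here instead of from per-language files.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.List;
import java.util.Properties;

public class TokenTranslator {
    /** Replaces every property key that occurs in the source with its translated value. */
    static String translate(String source, Properties lang) {
        // Longest keys first, so a token such as m37hod_10 is not
        // partially overwritten by the shorter token m37hod_1.
        List<String> keys = lang.stringPropertyNames().stream()
                .sorted((a, b) -> Integer.compare(b.length(), a.length()))
                .toList();
        String result = source;
        for (String key : keys) {
            result = result.replace(key, lang.getProperty(key));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String source = "public void m37hod_1() {\n"
                + "    System.out.println(m355a6e_1);\n"
                + "}\n";
        Properties en = new Properties();
        // Values keep their quotes, so the replaced token becomes a Java string literal.
        en.load(new StringReader("m37hod_1=doSomething\n"
                + "m355a6e_1=\"Something was done successfully\"\n"));
        System.out.print(translate(source, en));
    }
}
```

Loading each language file through a `Reader` matters for the French version: `Properties.load(InputStream)` assumes ISO-8859-1, whereas a `Reader` with the right charset keeps accented text intact.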
Alternatively, I'd use Ant's replace task, again with properties files, instead of a standalone translation program.
Something like this:
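<!-- replace edits files in place, so point src at a per-language copy of the sources -->
<replace
dir="${src}"
includes="**/*.java"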
value="defaultvalue"
propertyFile="${language}.properties">
<replacefilter
token="m37hod_1"
property="m37hod_1"/>
<replacefilter
token="m355a6e_1"
property="m355a6e_1"/>
</replace>
Using one of these methods, you won't have to explain anything about localization in your tutorials (unless you want to) and can concentrate on your real topic.
What you want is a massive code change engine.
ANTLR won't do the trick; ASTs are necessary but not sufficient. See my essay on Life After Parsing. Eclipse's "AST" may be better, if the Eclipse package provides some support for name and type resolution; otherwise you'll never be able to figure out how to replace each "doSomething" (might be overloaded or local), unless you are willing to replace them all identically (and you likely can't do that, because some symbols refer to Java library elements).
Our DMS Software Reengineering Toolkit could be used to accomplish your task. DMS can parse Java to ASTs (including comment capture), traverse the ASTs in arbitrary ways, analyze/change ASTs, and then export the modified ASTs as valid source code (including the comments).
Basically you want to enumerate all comments, strings, and declarations of identifiers, export them to an external "database" to be mapped (manually? by Google Translate?) to an equivalent. In each case you want to note not only the item of interest, but its precise location (source file, line, even column) because items that are spelled identically in the original text may need different spellings in the modified text.
Enumeration of strings is pretty easy if you have the AST; simply crawl the tree and look for tree nodes containing string literals. (ANTLR and Eclipse can surely do this, too).
Enumeration of comments is also straightforward if the parser you have captures comments. DMS does. I'm not quite sure if ANTLR's Java grammar does, or the Eclipse AST engine; I suspect they are both capable.
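As a much cruder stand-in for a real AST walk (a character-level scan, not DMS, ANTLR or the Eclipse AST), the following Java sketch enumerates string literals and comments together with their line, column and offset, which is the location data the answer insists on. `LiteralFinder` and `Item` are hypothetical names; the scan skips char literals and does not understand text blocks or unicode escapes, so treat its output as approximate.

```java
import java.util.ArrayList;
import java.util.List;

public class LiteralFinder {
    enum Kind { STRING, COMMENT }

    /** One captured item: kind, exact source text, 1-based line/column, 0-based offset. */
    record Item(Kind kind, String text, int line, int column, int offset) {}

    static List<Item> find(String src) {
        List<Item> items = new ArrayList<>();
        int line = 1, col = 1, i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            char next = i + 1 < src.length() ? src.charAt(i + 1) : '\0';
            Kind kind = null;
            int end;
            if (c == '/' && next == '/') {              // line comment, up to the newline
                end = src.indexOf('\n', i);
                if (end < 0) end = src.length();
                kind = Kind.COMMENT;
            } else if (c == '/' && next == '*') {       // block or Javadoc comment
                int close = src.indexOf("*/", i + 2);
                end = close < 0 ? src.length() : close + 2;
                kind = Kind.COMMENT;
            } else if (c == '"' || c == '\'') {         // string or char literal
                end = i + 1;
                while (end < src.length() && src.charAt(end) != c) {
                    if (src.charAt(end) == '\\') end++; // skip the escaped character
                    end++;
                }
                end = Math.min(end + 1, src.length());  // include the closing quote
                if (c == '"') kind = Kind.STRING;       // char literals are not reported
            } else {
                end = i + 1;                            // any other character
            }
            if (kind != null) {
                items.add(new Item(kind, src.substring(i, end), line, col, i));
            }
            for (int k = i; k < end; k++) {             // keep line/column in sync
                if (src.charAt(k) == '\n') { line++; col = 1; } else { col++; }
            }
            i = end;
        }
        return items;
    }

    public static void main(String[] args) {
        String src = "/** A comment */\n"
                + "void doSomething() { // prints a message\n"
                + "    System.out.println(\"Something was done successfully\");\n"
                + "}\n";
        find(src).forEach(System.out::println);
    }
}
```

Recording the offset as well as line and column is deliberate: as the answer goes on to explain, a line number alone cannot tell apart two items on the same line.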
Enumeration of declarations (classes, methods, fields, locals) is relatively straightforward; there are rather more cases to worry about (e.g., anonymous classes containing extensions to base classes). You can code a procedure to walk the AST and match the tree structures, but here's the place where DMS starts to make a difference: you can write surface-syntax patterns that look like the source code you want to match. For instance:
pattern local_for_loop_index(i: IDENTIFIER, t: type, e: expression, e2: expression, e3:expression): for_loop_header
= "for (\t \i = \e; \e2; \e3)"
will match declarations of local for loop variables, and return subtrees for the IDENTIFIER, the type, and the various expressions; you'd want to capture just the identifier (and its location, easily done by taking it from the source position information that DMS stamps on every tree node). You'd probably need 10-20 such patterns to cover the cases of all the different kinds of identifiers.
Once the capture step is complete, something needs to translate all the captured entities into your target language. I'll leave that to you; what's left is to put the translated entities back.
The key to this is the precise source location. A line number isn't good enough in practice; you may have several translated entities on the same line, in the worst case some with different scopes (imagine nested for loops, for example). The replacement process for comments, strings and declarations is straightforward: rescan the tree for nodes that match any of the identified locations, and replace the entity found there with its translation. (You can do this with DMS and ANTLR. I think the Eclipse AST requires you to generate a "patch", but I guess that would work.)
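Why the locations must be precise shows up as soon as the replacements are applied. A minimal Java sketch (hypothetical `PositionalRewriter` and `Edit` names, working on plain text rather than on an AST) takes edits that are all expressed as offsets into the original source, applies them back to front so that earlier replacements never shift later positions, and rejects overlapping edits.

```java
import java.util.ArrayList;
import java.util.List;

public class PositionalRewriter {
    /** Replace original[start, end) with newText; offsets refer to the ORIGINAL source. */
    record Edit(int start, int end, String newText) {}

    static String apply(String src, List<Edit> edits) {
        List<Edit> sorted = new ArrayList<>(edits);
        // Work from the end of the file backwards, so replacing one span
        // never shifts the offsets of spans that are still pending.
        sorted.sort((a, b) -> Integer.compare(b.start(), a.start()));
        StringBuilder out = new StringBuilder(src);
        int nextStart = src.length(); // start of the previously applied (later) edit
        for (Edit e : sorted) {
            if (e.start() < 0 || e.start() > e.end() || e.end() > nextStart) {
                throw new IllegalArgumentException("invalid or overlapping edit: " + e);
            }
            out.replace(e.start(), e.end(), e.newText());
            nextStart = e.start();
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Two uses of the same spelling on one line that must be translated differently.
        String src = "a.doSomething(); b.doSomething();";
        String name = "doSomething";
        int first = src.indexOf(name);
        int second = src.indexOf(name, first + 1);
        List<Edit> edits = List.of(
                new Edit(first, first + name.length(), "faitQuelqueChose"),
                new Edit(second, second + name.length(), "faitAutreChose"));
        System.out.println(apply(src, edits));
    }
}
```

The example deliberately puts both uses on one line: only their offsets distinguish them, which is the situation the answer warns about.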
The fun part comes in replacing the identifier uses. For this, you need to know two things:
for any use of an identifier, which declaration it refers to; if you know this, you can replace it with the new name for the declaration. DMS provides full name and type resolution as well as a usage list, making this pretty easy, and
Do renamed identifiers shadow one another in scopes differently than the originals did? This is harder to do in general. However, for the Java language, we have a "shadowing" check, so you can at least detect after renaming that you have an issue. (There's even a renaming procedure that can be used to resolve such shadowing conflicts.)
After patching the trees, you simply rewrite the patched tree back out as a source file using DMS's built-in prettyprinter. I think Eclipse AST can write out its tree plus patches. I'm not sure ANTLR provides any facilities for regenerating source code from ASTs, although somebody may have coded one for the Java grammar. This is harder to do than it sounds, because of all the picky detail. YMMV.
Given your goal, I'm a little surprised that you don't want a source file "foo.java" containing "class foo { ... }" to get renamed to .java. This would require not only writing the transformed tree to the translated file name (pretty easy) but perhaps even reconstructing the directory tree (DMS provides facilities for doing directory construction and file copies, too).
If you want to do this for many languages, you'd need to run the process once per language. If you wanted to do this just for strings (the classic internationalization case), you'd replace each string (that needs changing, not all of them do) by a call on a resource access with a unique resource id; a runtime table would hold the various strings.
One approach would be to finish the code in one language, then translate to others.
You could use Eclipse to help you.
Copy the finished code to language-specific projects.
Then:
Identifiers: In the Outline view (Window>Show View>Outline), select each item and Refactor>Rename (Alt+Shift+R). This takes care of renaming the identifier wherever it's used.
Comments: Use Search>File to find all instances of "/*" or "//". Click on each and modify.
Strings:
Use Source>Externalize strings to find all of the literal strings.
Search>File for "Messages.getString(" to find every externalized string.
Click on each result and modify.
On each file, use Edit>Find/Replace with Regular expressions enabled, replacing "//\$NON-NLS-.*\$" with an empty string.
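If you would rather script that last cleanup than run Find/Replace file by file, a minimal Java sketch (the class name is hypothetical) does the same thing with `java.util.regex`. It matches one numbered marker at a time, so it also copes with several markers on one line, and it removes the spaces or tabs in front of each marker.

```java
import java.util.regex.Pattern;

public class NonNlsStripper {
    // One //$NON-NLS-n$ marker plus the spaces or tabs in front of it.
    private static final Pattern MARKER = Pattern.compile("[ \\t]*//\\$NON-NLS-\\d+\\$");

    static String strip(String source) {
        return MARKER.matcher(source).replaceAll("");
    }

    public static void main(String[] args) {
        String line = "String s = Messages.getString(\"Foo.0\"); //$NON-NLS-1$ //$NON-NLS-2$";
        System.out.println(strip(line));
    }
}
```

Ordinary `//` comments are left alone, because the pattern requires the literal `$NON-NLS-` prefix and a number.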
For printed/logged strings, Java has built-in internationalization support, namely ResourceBundle. There is a tutorial about this on the Oracle site.
Eclipse also has a feature for this ("Externalize Strings", as I recall).
For the method names, I don't think there is anything available, since this would require you to maintain the source code in many versions...
regards
Use .properties files, for example:
Locale locale = new Locale(language, country);
ResourceBundle captions= ResourceBundle.getBundle("Messages",locale);
This way, Java picks the Messages properties file that matches the given locale; with the one-argument ResourceBundle.getBundle("Messages"), it uses the default locale, which is acquired from the operating system or the Java locale settings.
The file should be on the classpath, called Messages.properties (the default one), or Messages_de.properties for German, etc.
See this for a complete tutorial:
http://docs.oracle.com/javase/tutorial/i18n/intro/steps.html
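The file-naming rule can be checked directly: `ResourceBundle.Control` exposes the candidate locales and bundle names that the default lookup walks through, and for each name it tries a class and then a `.properties` file. This sketch (the class name is hypothetical) lists the names for German/Germany. It does not show that the real `getBundle` may also consult the default locale (via `Control.getFallbackLocale`) before settling on the base bundle.

```java
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

public class BundleLookupDemo {
    /** Bundle names the default lookup tries for one locale, most specific first. */
    static List<String> candidateNames(String baseName, Locale locale) {
        ResourceBundle.Control control =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);
        return control.getCandidateLocales(baseName, locale).stream()
                .map(candidate -> control.toBundleName(baseName, candidate))
                .toList();
    }

    public static void main(String[] args) {
        System.out.println(candidateNames("Messages", Locale.GERMANY));
        // prints [Messages_de_DE, Messages_de, Messages]
    }
}
```

So a German user without a country-specific file is served Messages_de.properties, and Messages.properties is the last resort.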
As far as the source code goes, I'd strongly recommend staying with English. Method names like getUnternehmen() are harder for the average developer to work with than plain English ones.
If you need to familiarize foreign developers with your code, write proper developer documentation in their language.
If you'd like to have Javadoc in both English and other languages, see this SO thread.
You could write your code using FreeMarker templates (or another templating language such as Velocity).
doSomething.tml
/** ${lang['doSomething.comment']} */
public void ${lang['doSomething.methodName']}() {
System.out.println("${lang['doSomething.message']}");
}
lang_en.prop
doSomething.comment=A comment
doSomething.methodName=doSomething
doSomething.message=Something was done successfully
And then merge the template with each language prop file during your build (using Ant / Gradle / Maven etc.)
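FreeMarker performs the merge itself. The sketch below is only a dependency-free Java stand-in (not FreeMarker, and it understands nothing but the `${lang['key']}` form used in the template above) that shows what the build step produces for one language file, failing loudly on a missing key. `TemplateMerge` and `render()` are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TemplateMerge {
    // Matches ${lang['some.key']} and captures some.key.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{lang\\['([^']+)'\\]\\}");

    /** Fills every placeholder from one language file; a missing key is an error. */
    static String render(String template, Properties lang) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = lang.getProperty(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("missing key: " + m.group(1));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        String template = "/** ${lang['doSomething.comment']} */\n"
                + "public void ${lang['doSomething.methodName']}() {\n"
                + "    System.out.println(\"${lang['doSomething.message']}\");\n"
                + "}\n";
        Properties fr = new Properties();
        fr.load(new StringReader("doSomething.comment=Un commentaire\n"
                + "doSomething.methodName=faitQuelqueChose\n"
                + "doSomething.message=Quelque chose a \u00e9t\u00e9 fait avec succ\u00e8s.\n"));
        System.out.print(render(template, fr));
    }
}
```

A build would call `render()` once per language file and write each result into its own output source tree; failing on a missing key keeps an incomplete translation from silently producing broken code.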
I am desperately searching for an ASN.1 compiler that will successfully parse a predefined ASN.1 definition I got from a customer.
None of the products (free or commercial) I have tried so far was able to parse the definition, which is XER-based and has a transitive RXER dependency. When I add the relevant ASN.1 definitions manually (taken from the RFCs), the compilers emit parse errors as well.
The relevant code line is (simplified):
MYMSG ::= SEQUENCE
{
msgID [ATTRIBUTE] [250] UTF8String OPTIONAL,
msgType UTF8String
}
Every compiler complains about the [ATTRIBUTE] token. I found out that this is part of the ASN.X specification defined in RFC 4912, which also depends on RXER support as defined in RFC 4910 (see also X.680-1).
The problem is that none of the compilers I tried seems to support these encoding schemas out of the box, and they fail to parse the definitions given in the RFCs. For example, for RXER and all definitions that use this encoding I get:
"RXER.asn", line 20
(AdditionalBasicDefinitions): A1139W: The default encoding reference 'RXER' for the module 'AdditionalBasicDefinitions' is unknown and all encoding instructions within the module identified by this encoding reference will be ignored.
RXER INSTRUCTIONS
(Note: all dependent modules, such as ASN.X, include the instruction "RXER INSTRUCTIONS" immediately after the DEFINITIONS keyword, which none of the compilers I tried understands.)
I tried openasn1 (www.openasn1.org) (funnily enough, the code I received includes some old and partially functional Java mapping objects that were generated by openasn1!), the online compiler at http://lionet.info/asn1c/asn1c.cgi, and various commercial tools such as Objective Systems ASN1C v6.4.1 at http://www.obj-sys.com/Cnge641Dwld/acv64kits.php (they even have a current Eclipse plugin), Marben (http://www.marben-products.com/asn.1/tce_java.html) and Unigone (http://www.unigone.com/en/products/Asn1Compiler/description).
I always get an error similar to this:
ASN.1 grammar parse error near line 13 (token "ATTRIBUTE"): parse error, unexpected TOK_capitalreference, expecting TOK_number
Am I missing something obvious like IMPORTs or other definitions/compiler flags?
I managed to compile the schema and generate the needed Java mapping classes. I had to use the commercial OSS Nokalva compiler, as all free tools I tried failed on (E)XER encoded schemata.
There were also some errors in the schema I received, so here is what I had to do:
First, I added the XER instructions to the DEFINITIONS line, along with the usual AUTOMATIC TAGS directive:
DEFINITIONS XER INSTRUCTIONS AUTOMATIC TAGS ::=
In the footer, I added the XER encoding control directive:
ENCODING-CONTROL XER GLOBAL-DEFAULTS
MODIFIED-ENCODINGS
When using XER encoding, you have to specify tags explicitly with the TAG encoding reference to avoid syntactic ambiguities:
[ATTRIBUTE] [TAG: 0]
instead of the ambiguous definition
[ATTRIBUTE] [0]
All these problems were resolved by consulting this nice write-up on EXER encoding; it is definitely a good read, and thanks to OSS Nokalva for this helpful documentation!
For my work, I sometimes have to deal with logfiles from a binary protocol (the logfiles contain hexdumps of the messages). I want to write a Perl script that can interpret the binary data for me and print the contents in a more friendly format.
I have a (machine-readable) description of the protocol messages in a proprietary format, and I have (mostly) figured out how to parse that format (the parts I can't fully understand are not related to my goal, so I can just ignore them), so I can convert the description into a data structure for use in my script.
Because the protocol description only rarely changes, it seems a waste to re-parse the protocol description each time I want to analyse a logfile, but on the other hand, if the description does change or if I accidentally throw away my pre-parsed form of the description, then I would like my script to automatically trigger a re-parsing of the description.
What is the best way to realise this?
Assuming that the protocol description lives in a file accessible to the script, have a function to read in the parsed data which caches the parsed results in an intermediate file. The logic is very simple, but the steps look verbose because I tried to write out the full spec; in reality it should take fewer than 10 lines of Perl code.
1. Check if the intermediate file exists. If it does not (or cannot be read), skip to the proprietary parsing step (#4).
2. If you can read in the intermediate cache file, read its "protocol description timestamp" field (described below). Then find the modification time of the "protocol description" file via stat() and compare. If it differs from the timestamp stored in the cache (i.e. the description changed after the cache was built), skip to the proprietary parsing step (#4).
3. Otherwise (the modification time matches the stored timestamp), read the intermediate cache file data via Data::Dumper or Storable. End.
4. If you need to re-parse because of the logic in #1 or #2, read in the "protocol description" file and parse it into your data structure.
5. Then create a hash with 2 keys: "protocol_description_timestamp" (with the value being the modification time of the protocol description file from the stat() call) and "data", with the value being a reference to the data structure you just produced by parsing.
6. Then save that hash into the intermediate cache file using Storable, Data::Dumper, or any other method of your choice for storing Perl data structures.
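The steps above, sketched in Java rather than Perl: Java serialization stands in for Storable or Data::Dumper, and `ParsedCache`, `load()` and the `parser` callback are hypothetical names. The cache stores the description's modification time next to the data and is trusted only when that stored time equals the file's current modification time, so any change to the description (even restoring an older copy) triggers a re-parse, as does a missing or corrupt cache file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;

public class ParsedCache {
    /**
     * Returns the parsed description, re-parsing only when the cache is missing,
     * unreadable, or was built from a description with a different modification time.
     */
    static <T extends Serializable> T load(Path description, Path cache,
                                           Function<Path, T> parser) throws IOException {
        long descTime = Files.getLastModifiedTime(description).toMillis();
        if (Files.isReadable(cache)) {                        // steps 1-3
            try (InputStream raw = Files.newInputStream(cache);
                 ObjectInputStream in = new ObjectInputStream(raw)) {
                long storedTime = in.readLong();              // "protocol_description_timestamp"
                if (storedTime == descTime) {
                    @SuppressWarnings("unchecked")
                    T data = (T) in.readObject();             // "data"
                    return data;
                }
            } catch (IOException | ClassNotFoundException e) {
                // Corrupt, truncated or incompatible cache: fall through and re-parse.
            }
        }
        T data = parser.apply(description);                   // step 4: the expensive parse
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(cache))) {
            out.writeLong(descTime);                          // steps 5-6
            out.writeObject(data);
        }
        return data;
    }

    public static void main(String[] args) throws IOException {
        Path desc = Files.createTempFile("protocol", ".desc");
        Path cache = Files.createTempFile("protocol", ".cache"); // empty, so not a valid cache yet
        desc.toFile().deleteOnExit();
        cache.toFile().deleteOnExit();
        Files.writeString(desc, "MSG 1 HELLO");
        Function<Path, String> parse = p -> {
            System.out.println("parsing " + p.getFileName());
            try {
                return Files.readString(p);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };
        System.out.println(load(desc, cache, parse)); // parses, then writes the cache
        System.out.println(load(desc, cache, parse)); // served from the cache
    }
}
```

Comparing for equality rather than "newer than" matters because the stored value is the description's own modification time, not the time the cache was written.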
You can use a Makefile for this. Make the file holding your pre-parsed data structure a Makefile target that depends on the protocol description.
When make notices that the protocol description was modified more recently than that target file (or that the target file is missing), it will run the commands you specify to recreate your data.