Spreadsheet::ParseExcel module in Perl

I always get confused when I deal with classes and objects. While trying to understand the Spreadsheet::ParseExcel module, I have a question about its classes and objects.
With $parser = Spreadsheet::ParseExcel->new();, we create a Spreadsheet::ParseExcel object, and only after that do we get a Spreadsheet::ParseExcel::Workbook object.
Why can't we create a Spreadsheet::ParseExcel::Workbook object directly and start parsing?
Thanks

Why can not we create the object directly for Spreadsheet::ParseExcel::Workbook and start parsing
That is a reasonable question and in older versions of Spreadsheet::ParseExcel there was a Spreadsheet::ParseExcel::Workbook->Parse() method that did just that. (*)
Users tend to see an Excel file only as a workbook. However, the file format also contains data such as metadata (author, creation date, etc.) and VBA macros that are separate from the workbook data.
As such the logical division of the parser from the workbook probably occurred due to the physical division of the data in the file.
Or it may have been to allow reporting of file parsing errors rather than just returning an undefined workbook object.
Either way, other people may have chosen to model the interface differently but that is what the original author chose. It is not completely intuitive but it works.
(*) This method is now deprecated since it doesn't allow error checking on the file.

Think of Spreadsheet::ParseExcel and Spreadsheet::ParseExcel::Workbook as two different types, like integer and string. Both are scalars, but you cannot, say, multiply them, although they can interact in some cases: length() applied to a string gives you the integer length of that string. In the same way, calling parse() on a Spreadsheet::ParseExcel object gives you a Spreadsheet::ParseExcel::Workbook. They share a common namespace, but they are completely different things: Spreadsheet::ParseExcel is a parser and Spreadsheet::ParseExcel::Workbook is a workbook.

Most efficient way to change the value of a specific tag in a DICOM file using GDCM

I need to go through a set of DICOM files and modify certain tags so that they match the data maintained in the database of an external system. I would like to use GDCM, but I am new to it. A search through Stack Overflow posts shows that the Anonymizer class can be used to change tag values.
Generating a simple CT DICOM image using GDCM
My question is whether this is the best use of the GDCM API or whether there is a better approach for changing the values of individual tags such as patient name or accession number. I am unfamiliar with all of the API options but have a link to the API documentation. It looks like the DataElement SetValue member could be used, but there doesn't appear to be a suitable constructor in the Value class for doing this. Any assistance would be appreciated. This is my current approach:
Anonymizer anon = new Anonymizer();
anon.SetFile(myFile);
anon.Replace(new Tag(0x0010, 0x0010), "BUGS^BUNNY");
Quite late, but maybe this will still be useful. You have not mentioned whether you are writing C++ or C#, but I assume the latter, as you do not use pointers. Generally, your approach is correct (unless you use System.IO.File instead of gdcm.File). The value (the second parameter of the Replace function) has to be a plain string, so no special constructor is needed. You should probably start with the doxygen documentation of GDCM, which includes one complete example in particular. It is in C++, but translating it should not be a problem.
There are two different ways to change DICOM tags:
Anonymizer
gdcm::Anonymizer anon;
anon.SetFile(file);
anon.Replace(gdcm::Tag(0x0002, 0x0013), "Implementation Version Name");
//Implementation Version Name
DataElement
gdcm::Attribute<0x0018, 0x0088> ss;
ss.SetValue(10.0);
ds.Insert(ss.GetAsDataElement());

Can ItemReaders just pass in the record read and not need a lineMapper to convert to an object?

I'm asking if I can pass into the ItemProcessors the entire delimited record read in the ItemReader as one long string.
I have situations with unpredictable data. The file is pipe-delimited, but even so, a single double-quote causes a parse error in Spring Batch's ItemReader.
In a standalone java application I wrote code using Spring's StringUtils class. I read in the full delimited record as a String (BufferedReader), then call Spring's StringUtils.delimitedListToStringArray(...,...). This gets all the characters whether valid or not, and then I can do a search/replace to get things like any single double-quote or commas in the fields.
My standalone Java program is a down-n-dirty solution. I'm turning it into a Spring Batch job for the long term solution. It's a monthly process, and it's an impractical, if not impossible, task to get SAP users to keep trash out of data fields (i.e. fat-finger city).
It appears that I have to have a domain object for the input record to be mapped into. Is this correct, or can I do a pass-through scenario and handle the parsing myself using StringUtils?
The pipe-delimited records turn into comma-delimited records. There's really no need to create a domain object and do all the field set mapping.
I am happy to hear ideas if I'm approaching this the wrong way.
Thank you in advance.
Thanks,
Michael
EDIT:
This is the error, and the record. The lone double-quote in column 6 is the problem. I can't control the input, so I'm scrubbing each field (all Strings) for unwanted characters. So my solution was to skip the line mapping and use StringUtils to do it myself, as described earlier.
Caused by: org.springframework.batch.item.file.FlatFileParseException: Parsing error at line: 33526 in resource=[URL [file:/temp/comptroller/myfile.txt]], input=[xxx|xxx|xxx|xxx|xxx|xxx x xxx xxxxxxx xxxx xxxx "x|xxx|xxx|xxxxx|xx|xxxxxxxxxxxxx|xxxxxxx|xxx|xx |xxx ]
at org.springframework.batch.item.file.FlatFileItemReader.doRead(FlatFileItemReader.java:182)
at org.springframework.batch.item.support.AbstractItemCountingItemStreamItemReader.read(AbstractItemCountingItemStreamItemReader.java:85)
at org.springframework.batch.core.step.item.SimpleChunkProvider.doRead(SimpleChunkProvider.java:90)
at org.springframework.batch.core.step.item.FaultTolerantChunkProvider.read(FaultTolerantChunkProvider.java:87)
... 27 more
Caused by: org.springframework.batch.item.file.transform.IncorrectTokenCountException: Incorrect number of tokens found in record: expected 15 actual 6
Since the domain objects you read from ItemReaders, write to ItemWriters, and optionally process with ItemProcessors can be any Object, they can be Strings.
So the short answer is yes, you should be able to use a FlatFileItemReader to read one line at a time, pass it to SomeItemProcessor<String,String>, which replaces your pipes with commas (and handles existing commas) with whatever code you want, and sends those converted lines to a FlatFileItemWriter. Spring Batch includes common implementations of the LineTokenizer and LineAggregator classes which could help.
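A minimal sketch of the per-line work such a processor would do, written in Python rather than Java so that it stays independent of any Spring Batch setup. On the reader side, Spring Batch ships a PassThroughLineMapper that hands each raw line on as a String. The function name pipe_line_to_csv and the expected_fields parameter are made up for illustration, and the scrubbing rule (drop double quotes, trim whitespace) is only an assumption about what counts as an unwanted character.

```python
import csv
import io


def pipe_line_to_csv(line, expected_fields=None):
    """Convert one raw pipe-delimited record into one CSV line."""
    # Split on the delimiter only; a stray double quote is ordinary data here.
    fields = line.rstrip("\r\n").split("|")
    if expected_fields is not None and len(fields) != expected_fields:
        raise ValueError(
            f"expected {expected_fields} fields, got {len(fields)}"
        )
    # Scrub each field: drop double quotes and trim surrounding whitespace.
    cleaned = [field.replace('"', "").strip() for field in fields]
    # csv.writer quotes any field that already contains a comma.
    out = io.StringIO()
    csv.writer(out).writerow(cleaned)
    return out.getvalue().rstrip("\r\n")
```

Because the split never interprets quote characters, a lone double quote cannot change the token count; the count check is optional and only there to flag records that are genuinely short.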
In this scenario, Spring Batch would be acting like a glorified search replace tool, with saner failure handling. To answer the bigger question of whether you should be using domain objects, or at least beans, think about whether you want to perform other tasks in the conversion process, like validation.
P.S. I wasn't aware that FlatFileItemReader blows up on a single double-quote; you might want to file that as a bug.

How to internationalize java source code?

EDIT: I completely re-wrote the question since it seems like I was not clear enough in my first two versions. Thanks for the suggestions so far.
I would like to internationalize the source code for a tutorial project (please notice, not the runtime application). Here is an example (in Java):
/** A comment */
public void doSomething() {
    System.out.println("Something was done successfully");
}
in English, and then have the French version be something like:
/** Un commentaire */
public void faitQuelqueChose() {
    System.out.println("Quelque chose a été fait avec succès.");
}
and so on. And then have something like a properties file somewhere to edit these translations with usual tools, such as:
com.foo.class.comment1=A comment
com.foo.class.method1=doSomething
com.foo.class.string1=Something was done successfully
and for other languages:
com.foo.class.comment1=Un commentaire
com.foo.class.method1=faitQuelqueChose
com.foo.class.string1=Quelque chose a été fait avec succès.
I am trying to find the easiest, most efficient and unobtrusive way to do this with the least amount of manual grunt work (other than obviously translating the actual text). Preferably working under Eclipse. For example, the original code would be written in English, then externalized (to properties, preferably leaving the original source untouched), translated (humanly) and then re-generated (as a separate source file / project).
Some leads I have found (other than what AlexS suggested):
ANTLR, a language parser / generator. There seems to be a supporting Eclipse plugin.
Using Eclipse's AST (Abstract Syntax Tree) and, I guess, building some kind of plugin.
I am just surprised there isn't a tool out there that does this already.
I'd use unique strings as method names (or as anything else you want to be replaced by localized versions).
public void m37hod_1() {
    System.out.println(m355a6e_1);
}
Then I'd define a properties file for each language like this:
m37hod_1=doSomething
m355a6e_1="Something was done successfully"
Then I'd write a small program that parses the source files and replaces the strings, so everything happens outside Eclipse.
Alternatively, I'd use the Ant Replace task together with the properties files instead of a standalone translation program.
Something like that:
<replace
    dir="${src}"
    includes="**/*.java"
    value="defaultvalue"
    propertyFile="${language}.properties">
  <replacefilter
      token="m37hod_1"
      property="m37hod_1"/>
  <replacefilter
      token="m355a6e_1"
      property="m355a6e_1"/>
</replace>
Using one of these methods you won't have to explain anything about localization in your tutorials (unless you want to), and can concentrate on your real topic.
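The standalone replacement program mentioned above could look roughly like the following sketch. It is written in Python purely as a build-time helper; parse_properties and localize_source are hypothetical names, and the parser handles only plain key=value lines, not the full Java .properties syntax (no escapes, no ':' separators, no line continuations).

```python
import re


def parse_properties(text):
    """Parse simple key=value lines; blank lines and '#' comments are skipped."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def localize_source(source, props):
    """Replace every placeholder token in source with its translation."""
    if not props:
        return source
    # Match whole tokens only, so m37hod_1 does not match inside m37hod_10.
    alternatives = "|".join(re.escape(token) for token in props)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b")
    # A function replacement inserts the value literally (no escape handling).
    return pattern.sub(lambda match: props[match.group(0)], source)
```

Running localize_source once per language file over the same placeholder source yields one generated source tree per language, which is the same job the Ant task does.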
What you want is a massive code change engine.
ANTLR won't do the trick; ASTs are necessary but not sufficient. See my essay on Life After Parsing. Eclipse's "AST" may be better, if the Eclipse package provides some support for name and type resolution; otherwise you'll never be able to figure out how to replace each "doSomething" (might be overloaded or local), unless you are willing to replace them all identically (and you likely can't do that, because some symbols refer to Java library elements).
Our DMS Software Reengineering Toolkit could be used to accomplish your task. DMS can parse Java to ASTs (including comment capture), traverse the ASTs in arbitrary ways, analyze/change ASTs, and then export the modified ASTs as valid source code (including the comments).
Basically you want to enumerate all comments, strings, and declarations of identifiers, export them to an external "database" to be mapped (manually? by Google Translate?) to an equivalent. In each case you want to note not only the item of interest, but its precise location (source file, line, even column) because items that are spelled identically in the original text may need different spellings in the modified text.
Enumeration of strings is pretty easy if you have the AST; simply crawl the tree and look for tree nodes containing string literals. (ANTLR and Eclipse can surely do this, too).
Enumeration of comments is also straightforward if the parser you have captures comments. DMS does. I'm not quite sure if ANTLR's Java grammar does, or the Eclipse AST engine; I suspect they are both capable.
Enumeration of declarations (classes, methods, fields, locals) is relatively straightforward, although there are rather more cases to worry about (e.g., anonymous classes containing extensions to base classes). You can code a procedure to walk the AST and match the tree structures, but this is where DMS starts to make a difference: you can write surface-syntax patterns that look like the source code you want to match. For instance:
pattern local_for_loop_index(i: IDENTIFIER, t: type, e: expression, e2: expression, e3:expression): for_loop_header
= "for (\t \i = \e,\e2,\e3)"
will match declarations of local for-loop variables, and return subtrees for the IDENTIFIER, the type, and the various expressions; you'd want to capture just the identifier (and its location, easily done by taking it from the source position information that DMS stamps on every tree node). You'd probably need 10-20 such patterns to cover all the different kinds of identifiers.
Capture step completed, something needs to translate all the captured entities to your target language. I'll leave that to you; what's left is to put the translated entities back.
The key to this is the precise source location. A line number isn't good enough in practice; you may have several translated entities on the same line and, in the worst case, some with different scopes (imagine nested for loops, for example). The replacement process for comments, strings and declarations is straightforward: rescan the tree for nodes that match any of the identified locations, and replace the entity found there with its translation. (You can do this with DMS and ANTLR. I think Eclipse ADT requires you to generate a "patch", but I guess that would work.)
The fun part comes in replacing the identifier uses. For this, you need to know two things:
For any use of an identifier, which declaration does it refer to? If you know this, you can replace it with the new name for the declaration. DMS provides full name and type resolution as well as a usage list, making this pretty easy.
Do renamed identifiers shadow one another in scopes differently than the originals? This is harder to do in general. However, for the Java language, we have a "shadowing" check, so you can at least decide after renaming whether you have an issue. (There's even a renaming procedure that can be used to resolve such shadowing conflicts.)
After patching the trees, you simply rewrite the patched tree back out as a source file using DMS's built-in prettyprinter. I think Eclipse AST can write out its tree plus patches. I'm not sure ANTLR provides any facilities for regenerating source code from ASTs, although somebody may have coded one for the Java grammar. This is harder to do than it sounds, because of all the picky detail. YMMV.
Given your goal, I'm a little surprised that you don't want a source file "foo.java" containing "class foo { ... }" to be renamed to match the translated class name. This would require not only writing the transformed tree to the translated file name (pretty easy) but perhaps even reconstructing the directory tree (DMS provides facilities for doing directory construction and file copies, too).
If you want to do this for many languages, you'd need to run the process once per language. If you wanted to do this just for strings (the classic internationalization case), you'd replace each string (that needs changing, not all of them do) by a call on a resource access with a unique resource id; a runtime table would hold the various strings.
One approach would be to finish the code in one language, then translate to others.
You could use Eclipse to help you.
Copy the finished code to language-specific projects.
Then:
Identifiers: In the Outline view (Window>Show View>Outline), select each item and Refactor>Rename (Alt+Shift+R). This takes care of renaming the identifier wherever it's used.
Comments: Use Search>File to find all instances of "/*" or "//". Click on each and modify.
Strings:
Use Source>Externalize strings to find all of the literal strings.
Search>File for "Messages.getString()".
Click on each result and modify.
On each file, use Edit>Find/Replace, replacing "//\$NON-NLS-.*\$" with an empty string.
For the printed/logged strings, Java has some internationalization functionality, namely ResourceBundle. There is a tutorial about this on the Oracle site.
Eclipse also has a feature for this ("Externalize Strings", as I recall).
For the function names, I don't think there is anything out there, since this would require you to maintain the source code in many versions...
regards
Use .properties file, like:
Locale locale = new Locale(language, country);
ResourceBundle captions= ResourceBundle.getBundle("Messages",locale);
This way, Java picks the Messages properties file that matches the given locale. If you pass Locale.getDefault() instead, the locale is taken from the operating system or the Java locale settings.
The file should be on the classpath, called Messages.properties (the default one), or Messages_de.properties for German, etc.
See this for a complete tutorial:
http://docs.oracle.com/javase/tutorial/i18n/intro/steps.html
As far as the source code goes, I'd strongly recommend staying with English. Method names like getUnternehmen() are worse for the average developer than plain English ones.
If you need to familiarize foreign developers with your code, write proper developer documentation in their language.
If you'd like to have Javadoc in both English and other languages, see this SO thread.
You could write your code using FreeMarker templates (or another templating language such as Velocity).
doSomething.tml
/** ${lang['doSomething.comment']} */
public void ${lang['doSomething.methodName']}() {
    System.out.println("${lang['doSomething.message']}");
}
lang_en.prop
doSomething.comment=A comment
doSomething.methodName=doSomething
doSomething.message=Something was done successfully
And then merge the template with each language prop file during your build (using Ant / Gradle / Maven etc.)

Good practices for formatting simulation output

This is almost a programming question, but geared towards physicists.
Suppose I am writing a piece of software that takes some system parameters as input and then calculates something from it, in my case a spectral function $A(k,\omega)$.
When I want to just take the output and feed it to gnuplot, I should make the program output a simple table with one column for the $k$-values, one for $\omega$ and one for $A(k,\omega)$.
But then I cannot store there all the additional information, such as what parameters were used. And maybe I want to store in that output some additional debugging information such as intermediate quantities. In my example, the spectral function is obtained from the self energy, so in some situations I might want to look at the self energy directly.
I do not want to constantly hack the source code depending on what output I want. It would be nicer if all the relevant data of a "run" were present in a single file/entity, but in such a way that it is still easy to extract tables I can feed to gnuplot.
Not wanting to reinvent the wheel and develop a full-blown file format, are there some "standards" around that are best used when creating, processing and storing data from calculations or simulations? Maybe even in an SQL database format?
There are dozens of methods, and none is too good; I'll share two of mine:
If the program is worth it, I add a small parser for config files. Then I just write a config, say SimA.in, and the simulator produces a set of files with the corresponding data: SimA.paths, SimA.stats, SimA.log, etc. As long as the names are unique and I write the version of the code to the log, this makes the results fully reproducible and the simulation itself portable enough to be easily manageable.
If not, I wrap the code a bit and use R as a host. Then I just return all the arrays and scalars (R data structures are very flexible, and it is easy to cast between native R and C structs) and use R to manage, save/load, and of course visualize and analyse the data. Moreover, with Sweave and cacheSweave the whole execution, analysis and reporting can be bundled into one elegant package, fully reproducible with one command.
If you want an "enterprise" solution, try NetCDF or HDF5. But I feel that may be overkill here.
And of course a version control of the simulator code is a must. But that's obvious =)
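The config-file approach described above can be sketched as follows. This is only an illustration under stated assumptions: the key = value config syntax, the .dat and .log suffixes, the CODE_VERSION constant and the compute callback are all hypothetical choices, not part of any existing tool. The data file starts with a '#' line because gnuplot treats such lines as comments, so the table can be plotted as is.

```python
from pathlib import Path

CODE_VERSION = "0.1.0"  # hypothetical; in practice taken from version control


def read_config(path):
    """Read 'key = value' lines into a dict of floats; '#' starts a comment."""
    params = {}
    for line in Path(path).read_text().splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            key, _, value = line.partition("=")
            params[key.strip()] = float(value)
    return params


def run(config_path, compute):
    """Run one simulation; write <base>.dat and <base>.log next to the config."""
    config_path = Path(config_path)
    params = read_config(config_path)
    rows = compute(params)  # iterable of (k, omega, A) tuples

    data_path = config_path.with_suffix(".dat")
    with data_path.open("w") as f:
        f.write("# k omega A\n")  # column names, skipped by gnuplot
        for k, omega, a in rows:
            f.write(f"{k:g} {omega:g} {a:g}\n")

    log_path = config_path.with_suffix(".log")
    with log_path.open("w") as f:
        f.write(f"code_version = {CODE_VERSION}\n")
        for key, value in sorted(params.items()):
            f.write(f"{key} = {value:g}\n")
    return data_path, log_path
```

Calling run("SimA.in", my_spectral_function) would leave SimA.dat and SimA.log beside the config, so the three files together document one run. Intermediate quantities such as the self energy could be written to further sibling files in the same way.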
For a project I'm currently working on that uses Python and C++ (via SWIG), I'm planning to use a short python script as input file. So, in a way, I'll be 'hacking the source' to change parameters, but in an interpreted language, not a compiled one.
Currently, I plan to have an input file like parameters.py, and use it like from parameters import params. But that might be too dependent on correct syntax.
params = {
"foods" : ["spam", "beans", "eggs"],
"costs" : [199, 4, 1],
"customerAge" : 23,
}
Another option might be to just define the variables at the script level in parameters2.py. This loses the nice dictionary packaging, but makes it a little harder for the user to mess it up. And it probably wouldn't be too hard to write a 'parser' that puts those script-level variables into a nice dictionary. A plus of this method is that the user could parameterize things that weren't originally considered: from parameters2 import * would overwrite previous definitions of those parameters. Of course, this might be bad if the user overwrites something important.
foods = ["spam", "beans", "eggs"]
costs = [199, 4, 1]
customerAge = 23
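The 'parser' mentioned above could be as small as the following sketch, which executes the script's source in a fresh namespace and keeps the public, non-module names. The function name load_params is made up for illustration. Note that exec runs arbitrary code, so this assumes the parameter file is as trusted as anything you would import.

```python
import types


def load_params(source):
    """Execute parameter-script source and return its variables as a dict."""
    namespace = {}
    exec(source, namespace)
    return {
        name: value
        for name, value in namespace.items()
        # Skip __builtins__ and other private names, and any imported modules.
        if not name.startswith("_")
        and not isinstance(value, types.ModuleType)
    }
```

For a file on disk, the call would be load_params(open("parameters2.py").read()). Because the result is a plain dictionary, nothing in the importing script's own namespace gets overwritten.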
parameters3.py would use a class, though it is contraindicated by Python's persnicketiness about indentation. from parameters3 import params:
class params:
    foods = ["spam", "beans", "eggs"]
    costs = [199, 4, 1]
    customerAge = 23
I should also mention, for completeness, that our C++ code also defines a parameters class. That is, in our actual project, parameters.py is a SWIG wrapper for a corresponding C++ class. You'd use it like from parameters4 import params. However, this allows only parameters that are already declared in the C++ class.
import parameters
params = parameters.Parameters()
params.foods = ["spam", "beans", "eggs"]
params.costs = [199, 4, 1]
params.customerAge = 23

Rose::DB::Object::Manager and HTML Template

I am using Rose::DB::Object::Manager (get/iterate methods) to source data from a database and HTML::Template for reporting.
The HTML report requires a TMPL_LOOP to display entries in a database.
My question is: how do I create an array reference with the get/iterate methods of RDBOM and pass it to HTML::Template? Thank you.
Rose::DB::Object::Manager's get/iterate methods will give you objects, but HTML::Template wants plain Perl data structures and values. To bridge the gap, use one or more of the methods in the Rose::DB::Object::Helpers module. The as_tree or column_value_pairs methods are probably your best bets.