Best way to represent format for presentation of cells in a grid? - tsql

I am building a dynamic reporting feature for a client. They want to create new stored procedures, and have them correspond to new reports. We are using T-SQL and each cell in a grid/report can have its own formatting and/or functionality.
I'm looking for a format specification to identify presentation, color and conditionals for data... for instance, I am thinking of something like this:
{data}|{format}
123.56|$#,##0.00
Results in ... $123.56
I am looking for standard ways to represent the formatting field, with the potential for colors and conditionals. Is there some standard out there already?

It all depends on what you're looking for. You have to ask yourself what types of formatting you wish to apply. Here are some cases you might want to consider:
In-line formatting
Do you want to have a cell that contains mixed formatting (e.g. "1234.567" shows bold, regular and italic in a single cell)?
Multi-column based output
Do you want to output a value in a cell that's based on multiple cells?
Cell1="1234"
Cell2="56"
Cell3={Cell1}.{Cell2}
---> which would output "1234.56"
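As a rough illustration of multi-column output, here is a minimal Python sketch. It assumes each row arrives as a dictionary keyed by cell name; `render_template` is a hypothetical helper, not part of any reporting library.

```python
def render_template(template, row):
    """Fill {CellName} placeholders in a template from a row dictionary."""
    # str.format looks up each named placeholder in the row.
    return template.format(**row)


row = {"Cell1": "1234", "Cell2": "56"}
print(render_template("{Cell1}.{Cell2}", row))
```

A missing cell name raises KeyError, which is probably what you want while a report definition is still being debugged.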
If you don't need either of those things, then all you want to do is provide a single format for the entire cell. Let's divide it into two formatting elements: transformations and visual effects.
Formatting "1234.5678" into "1234.56" is a transformation. It has to be done by code that knows how to interpret the value as a number, and how to turn that number into the textual string of digit-characters.
Making a cell blue, or the text red, or bold - these are all visual transformations that are merely a set of attributes regarding the display of data in a cell. We don't care here about the type of data in the cell, since we just have to put pixels on a screen.
So, to bottom-line this: it's all about what you want to happen. If you're producing HTML reports, then HTML & CSS are very convenient methods for describing the visual-effects formatting of the cell, since you won't have to convert it twice.
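To make the HTML & CSS suggestion concrete, here is a small Python sketch that keeps the visual attributes as a plain CSS string attached to the cell. The function names and the "negative numbers in bold red" rule are assumptions made up for illustration.

```python
from html import escape


def style_for(value):
    """Hypothetical conditional rule: highlight negative numbers."""
    return "color: red; font-weight: bold" if value < 0 else None


def render_td(text, style=None):
    """Render one table cell, escaping both the text and the style attribute."""
    attr = f' style="{escape(style, quote=True)}"' if style else ""
    return f"<td{attr}>{escape(text)}</td>"


print(render_td("$123.56", style_for(123.56)))
print(render_td("-5.00", style_for(-5.0)))
```

Because the style is already CSS, nothing has to be translated a second time when the report is written out as HTML.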
As far as I know, there's only a couple of standards for encoding visual-effects display, and they are similar to SGML - TeX, HTML, PostScript, etc; they all have "tags" (sometimes with "attributes") to modify the display of the content within the tag.
Which leaves us the transformational formatting. There are two common approaches to this. The first is procedural: you list a set of transformations to apply to the data to turn it into text. The second, more common nowadays, is a substitution mask, like the $#,##0.00 in your example, or sprintf's %.2f.
Again, just choose a formatting specifier that is the simplest to use in your environment. If you're coding in a language that accepts a certain format, then use it!
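For the {data}|{format} idea from the question, here is a minimal Python sketch. It handles only a tiny subset of spreadsheet-style masks (a literal prefix, an optional thousands separator, and a fixed number of decimals); `apply_mask` and `render_cell` are hypothetical names, and a real mask language has many more cases.

```python
def apply_mask(value, mask):
    """Apply a very small subset of masks such as '$#,##0.00'."""
    # Everything before the first '#' or '0' is treated as a literal prefix.
    first = min((i for i, ch in enumerate(mask) if ch in "#0"), default=len(mask))
    prefix, pattern = mask[:first], mask[first:]
    # Digits after the decimal point in the mask set the precision.
    decimals = len(pattern.split(".")[1]) if "." in pattern else 0
    # A comma anywhere in the mask switches on thousands grouping.
    grouping = "," if "," in pattern else ""
    return prefix + format(float(value), f"{grouping}.{decimals}f")


def render_cell(cell):
    """Split a 'data|mask' cell and format the data; pass through if no mask."""
    data, _, mask = cell.partition("|")
    return apply_mask(data, mask) if mask else data


print(render_cell("123.56|$#,##0.00"))
```

If your environment already has a mask interpreter (a reporting tool or a spreadsheet library), using its native mask syntax avoids maintaining a parser like this.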

Related

iTextSharp Flattening Form Removes Indentation, Spacing

I am having an issue using Adobe LiveCycle Designer in conjunction with iTextSharp. I have a multi-line text field that I'm stamping that looks like this:
Blah blah text here: _________________
______________________________________
______________________________________
In LiveCycle Designer, I have a single field that encapsulates all 3 lines (including the static text). I've set the font/paragraph settings so that the first line indents over to where the field starts, the field aligns vertically, and the lines are spaced properly.
When I use PdfStamper to set the fields (without flattening the form), it looks fine in Adobe (though Chrome and Firefox default plugins don't seem to support AcroForms very well). When I flatten the form, though, I lose everything but the font.
Does iTextSharp just not support the ability to do this? Is there some better way I should be doing this? I'm trying to build a generic form builder for my application, so a one-off fix won't really be useful for me.
The only alternative I've thought of is to break it into 3 fields on the PDF and use some clever grouping and MeasureString() (UGH) to determine how much of my string can fit in each field. Can anyone think of anything better?

What's a good method for writing fixed width field files?

I need to write a file that is probably being interpreted by something like RPG IV on an AS/400 (but I don't know that). The file will be created by reading data from our MySQL database and then writing it in the specified format. It could be quite large (potentially measured in GB, but I haven't determined that yet). Right now I'm thinking Perl's built-in format might actually be my best bet, because things like Xslate and Template Toolkit are designed more for output that isn't fixed width (HTML). My only concern is that format doesn't appear to have conditionals, and it looks like I may need them (I found a field that is left-justified if field A is set, and right-justified and padded if not).
Other possibilities that come to mind are pack and the sprintf family of functions.
I don't think pack supports right-justified text, so that wouldn't be an option.
That leaves (s)printf. You can build format specifiers programmatically to support your conditional logic for justification.
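The same printf-style specifiers exist in Python, so the idea can be sketched there. The record layout and the justification rule below are assumptions based on the question (field "a" decides how "name" is justified), not a real file specification.

```python
def field_spec(width, left_justify):
    """Build a printf-style specifier that pads and truncates to `width`."""
    flag = "-" if left_justify else ""
    # '%-10.10s' pads to 10 on the right; '%10.10s' pads on the left.
    return f"%{flag}{width}.{width}s"


def format_record(record):
    """Format one record as a 15-character fixed-width line."""
    # Hypothetical rule: 'name' is left-justified only when 'a' is set.
    left = bool(record.get("a"))
    return (field_spec(5, True) % (record.get("a", ""),)
            + field_spec(10, left) % (record["name"],))


print(repr(format_record({"a": "X", "name": "Bob"})))
print(repr(format_record({"name": "Bob"})))
```

The precision part of the specifier (.10) truncates over-long values, so every record keeps the same length.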
Template Toolkit can do a serviceable job at creating fixed width formatted files. The trick is to use the templates to describe the file and record structure, but have a Perl function format the data for each field.
It may be easier to skip the templates and do all the formatting in Perl. Either way, you need to consider how your fields must be formatted. In my experience sprintf is better at handling more of the formatting cases required by fixed width formatted files. You will probably still need to implement a few helper functions to handle oddities (like EBCDIC/COBOL signed numbers encoded in ASCII, if you're unlucky enough).
There are a thousand odd special cases in legacy fixed width formatted files; it's almost enough to make me like XML data files. Typically it's the oddest special case that ends up determining the best method for formatting the file.
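As an example of such a helper, here is a Python sketch of one common convention for signed zoned-decimal numbers carried in ASCII ("overpunch"), where the last digit also encodes the sign. The letter tables below are the commonly documented mapping, but treat them as an assumption and check them against the receiving system's specification.

```python
# The last digit carries the sign: '{' and A-I mean +0 to +9,
# '}' and J-R mean -0 to -9 (assumed mapping, verify against your spec).
POSITIVE = "{ABCDEFGHI"
NEGATIVE = "}JKLMNOPQR"


def overpunch(value, width):
    """Encode an integer as a zero-padded, sign-overpunched field."""
    digits = str(abs(value)).rjust(width, "0")
    if len(digits) > width:
        raise ValueError(f"{value} does not fit in {width} digits")
    table = NEGATIVE if value < 0 else POSITIVE
    return digits[:-1] + table[int(digits[-1])]


print(overpunch(123, 5))
print(overpunch(-45, 4))
```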

How to prevent line breaks with jasper-reports HTML export when using textfield truncation?

Using iReport 4.5.0, I'm setting these two properties and values:
net.sf.jasperreports.text.truncate.at.char=true
net.sf.jasperreports.text.truncate.suffix=...
The intent is to add "..." to the end of textfields whenever they must be truncated, and that the truncation determination happens at the character level, rather than at the word level. This works as expected when exporting to PDF. However, when exporting to HTML, the last truncated token (with the suffix appended) will often, though not always, wrap incorrectly. (It does this even though StretchType is set to No Stretch.) Example:
If I change net.sf.jasperreports.text.truncate.at.char=false (so that it breaks on words instead of characters) it seems to work more often, but only because word breaks usually leave more space for the suffix. The unexpected line wrapping still occurs with word breaks, especially if I increase the length of the given suffix.
My best guess is that the HTML exporter measurement isn't precisely calculating the width required by the given suffix (if it's calculating it at all).
Can anyone confirm?
Any suggestions as to a workaround?
It seems like with StretchType set to No Stretch, that the HTML exporter should probably also set white-space:nowrap. However, although that would prevent the line from wrapping, the end of the suffix would be partially hidden (due to overflow:hidden styling).
"My best guess is that the HTML exporter measurement isn't precisely calculating the width required by the given suffix (if it's calculating it at all)."
I confirm that this is surely the reason.
But there's not really a simple workaround. Your PDF is good, so you're doing something right. Well... you're doing lots of things right. ;-)
In HTML you don't know--in a very fundamental way--the precise details of the font that will render the text. You can certainly specify the font. But the client machine might not have it. Or it might have one that is the same... but not quite the same. Or the client might choose to use a different font or different size via various client-side override mechanisms.
If you try different fonts, you should notice slightly different results. You may be able to find one that works better more often. (Clearly, this isn't 100% perfect.)
If you aren't using Font Extensions, then you should. If you are using Font Extensions, then you can specify the list of fonts in descending preference that ought to be used in the HTML. This should give you enough control to get behavior that is good in a large number of cases. Often you can make it perfect in all of the cases that you care about.

Website localization for multibyte languages

I have started to code a multi-language feature for a medium-sized website with a lot of hardcoded text. As the website is supposed to be translated into Japanese and Korean (multibyte character set) I am considering the following:
If I use string externalization, do the strings for Japanese or Korean need to be in unicode form within the locale file (i.e. &#21488;&#21271; instead of 台北 as string value)?
Would it make more sense to store the localization in a DB (i.e. MySQL) and retrieve the respective values via a localization function in PHP?
Your input is much appreciated.
Best regards
$0.02 from someone who has some experience with i18n...
Keep your translations in human-readable form, as it will likely be translators and not coders managing these resources.
If this text (hard-coded, you say) is not subject to frequent change, then you may wish to store these resources as files that you read in at runtime.
If this text is subject to frequent change, then you may wish to explore other alternatives for storing resources, such as databases or in-memory key-value stores.
Depending upon your requirements, you may want to consider a mixture of the above.
But I strongly suggest that you avoid mixing code (the HTML character entities) with your translation resources. Most translators will not understand what they mean and may break them when they are translating. And on the flip-side, a programmer may not understand how to insert code or formatting into the translation resources properly, unless they actually understand that language.
tl;dr
- use UTF-8
- don't mix any code/formatting into the translations themselves
- how you store the translations depends upon your requirements
I doubt that string externalization would be your biggest problem. But let me give you some advice.
String externalization
Of course you would need to separate translatable strings from the code. I would recommend storing translation in plain text, UTF-8 encoded file containing key-value pairs:
some.key=some translation
Of course you would need to write a helper script to resolve this at runtime. The script would need to detect end-user's language.
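A helper of that kind can be quite small. The Python sketch below assumes the key=value format shown above, with blank lines and lines starting with # ignored; `parse_resources` and `translate` are hypothetical names. In practice the text would come from a file opened with encoding="utf-8".

```python
def parse_resources(text):
    """Parse 'key=value' lines into a dictionary, skipping blanks and # comments."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without '='
            table[key.strip()] = value.strip()
    return table


def translate(key, table, fallback):
    """Look up a key, falling back to the default language, then to the key itself."""
    return table.get(key, fallback.get(key, key))


en = parse_resources("greeting=Hello\nfarewell=Goodbye")
ja = parse_resources("# Japanese\ngreeting=こんにちは")
print(translate("greeting", ja, en))
print(translate("farewell", ja, en))
```

Falling back to the default language keeps a half-translated site usable while translators catch up.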
Language detection
Web browsers are kind enough to send an Accept-Language header with each request. What you need to do is read the content of this header and check whether you support any of the languages the user has listed. If so, read the resource file (as defined above) and return strings for the given language; otherwise, return your default language. The code example below will give you the most desired language (which is not necessarily one you support):
<?php
$locale = Locale::acceptFromHttp($_SERVER['HTTP_ACCEPT_LANGUAGE']);
echo $locale;
?>
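The snippet above only reports the user's top preference. Matching it against the languages you actually support could look roughly like the Python sketch below; `pick_language` is a hypothetical helper, and the parsing is simplified (it reads only the q parameter and falls back from a regional tag such as en-US to its primary language).

```python
def pick_language(header, supported, default="en"):
    """Return the best supported language for an Accept-Language header value."""
    prefs = []
    for part in header.split(","):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0  # a missing q parameter means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        prefs.append((quality, tag.strip().lower()))
    # Highest quality first; the sort is stable, so header order breaks ties.
    prefs.sort(key=lambda pref: -pref[0])
    for quality, tag in prefs:
        if quality <= 0:
            continue
        if tag in supported:
            return tag
        primary = tag.split("-")[0]
        if primary in supported:
            return primary
    return default


print(pick_language("fr-CH, fr;q=0.9, ko;q=0.8", {"en", "ja", "ko"}))
```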
This is still not the biggest of your challenges.
Styles and style sheets
The real problem with multilingual web sites or web applications is styles. People tend to put style definitions in-line, which is problematic to say the least. Also, designers tend to think that Arial is the best font for the entire universe, and that emphasis always has to come with a bold font. The only problem is that the font might be unreadable under some circumstances.
I must admit I don't know why it happens, but most of the time web browsers tend to ignore the bold attribute for Asian scripts (which is good). Sometimes they do not, though, and that could become a major challenge for end users if your font definition is, say, font-family:Arial; font-size:10px;.
The other problem could be colors. Depending on your web site design, some colors used might be inappropriate for target customers. That is because we all tend to assign meaning to colors based on our cultural background.
Images containing localizable text could also give you a headache: you would need to either externalize such text (and write it out just like any other HTML element), or prepare a multilingual resource structure (i.e. put all images into directories named after the language code: "en", "ja", "ko").
The real challenge, however, is hard-coded formatting tags like <b>, <i>, <u>, <strong>, etc. Nobody should use them nowadays; style classes should be used instead, but common practice is different. You would probably need to replace them with style classes; each element can have more than one style class, which to my surprise is not common knowledge (for example <p class="main boldText">).
OK, once you have your styles externalized, you would probably be forced to implement some sort of CSS Localization Mechanism. This is needed in light of what I wrote above. The easiest way to do that is to create a directory structure similar to the one I mentioned before: "en" for the English base CSS files, "ja" for Japanese and "ko" for Korean, so each language has its own separate set of CSS files. This is similar to UI skins, only in this case the user won't be able to choose the skin; you decide which CSS to present, since you detect the language anyway.
As for in-line style definitions (<p style="whatever">), after you define the CSS L10n Mechanism, you could override any style by forcing it with the !important keyword. That is, unless somebody in his very wrong mind put this keyword into an in-line style definition.
Concatenations
Well, this is your biggest challenge. Even people who understand the need of string externalization tend to concatenate the strings like this:
$result = $label . ": " . $product;
$message = $your_basket_is . $basket_status . ".";
This poses a serious problem for Internationalization (and, if it is not resolved, for Localization as well). That is because the word order of a sentence tends to be different after translating the text into another language (this applies especially to Korean). Also, I showed you hard-coded punctuation, which is not necessarily correct for Asian languages. That is what I have to go through on a daily basis :/
What you would probably need to do is remove such concatenations, or use some means of message formatting. The PHP example (taken directly from the web page I am referencing) would be:
<?php
$fmt = new MessageFormatter("en_US", "{0,number,integer} monkeys on {1,number,integer} trees make {2,number} monkeys per tree");
echo $fmt->format(array(4560, 123, 4560/123));
$fmt = new MessageFormatter("de", "{0,number,integer} Affen auf {1,number,integer} Bäumen sind {2,number} Affen pro Baum");
echo $fmt->format(array(4560, 123, 4560/123));
?>
As you can see in this example, numbers are also formatted to match the locale style. This leads us to:
Locale aware formatting
Dates, times, numbers, currencies and similar information need to be formatted according to the user's detected Locale. There is a slight difference here: you should attempt to do that even if you do not support the related language resources (do not have translations). Of course, for the currency symbol you would use whatever your real currency is, not the user's default, but the format should respect the end user's cultural background.
Summary
I have just presented you with a short introduction to multilingual web site design with a focus on the Japanese and Korean target markets. If at some point you need to support Simplified Chinese as well, support for the GB18030 encoding would probably be needed too. This would be very challenging...
You do not want to store all your text as HTML entities. It'll drive you mad. The only reason to do this is if you need to serve your document in an ASCII encoding and cannot embed the characters directly. But in this day and age there's no reason for that; serve your document as UTF-8 and write and store your contents in UTF-8 and be done with it.
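If some existing resources are already stored as numeric character references, they can be converted to real characters once and saved as UTF-8 from then on. Here is a minimal Python sketch using the standard library's html.unescape; the sample string is only an illustration.

```python
import html


def entities_to_text(value):
    """Decode HTML character references such as &#21488; into real characters."""
    return html.unescape(value)


legacy = "&#21488;&#21271;"
text = entities_to_text(legacy)
print(text)
print(text.encode("utf-8"))
```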
Whether or not to store translations in the database depends on many factors, including performance, caching, whether you need to be able to search for the text, whether the text should be editable by non-programmers etc. Usually .mo/.po translation files with gettext are a good way to go unless proven otherwise.

Where to get a reference image for any unicode code point?

I am looking for an online service (or collection of images) that can return an image for any unicode code point.
Unicode.org does not have an image for each one, consider for example
http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=31cf
EDIT: I need to use these images programmatically, so the code chart PDFs provided at unicode.org are not useful.
The images in the PDF are copyrighted, so there are legal issues around extracting them. (I am not a lawyer.) I suspect that those legal issues prevent a simple solution from being provided, unless someone wants to go to the trouble of drawing all of those images. It might happen, but seems unlikely.
Your best bet is to download a selection of fonts that collectively cover the entire range of characters, and display the characters using those fonts. There are two difficulties with this approach: combining characters and invisible characters.
The combining characters can easily be detected from the Unicode database, and you can supply a base character (such as NBSP) to use for displaying them. (There is a special code point intended for this purpose, but I can't find it at the moment.)
Invisible characters could be displayed with a dotted square box containing the abbreviation for the character. Those you may have to locate manually and construct the necessary abbreviations. I am not aware of any shortcuts for that.
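The detection part can be sketched with Python's unicodedata module. Assumptions here: NBSP is used as the base for combining marks, and the full character name stands in for an abbreviation on control, format and space characters. `display_form` is a hypothetical helper, and the result still has to be rendered with a font that covers the character.

```python
import unicodedata

NBSP = "\u00a0"


def display_form(code_point):
    """Return a string suitable for rendering a single code point."""
    ch = chr(code_point)
    category = unicodedata.category(ch)
    if category.startswith("M"):
        # Combining marks (Mn, Mc, Me) need a base character to attach to.
        return NBSP + ch
    if category in ("Cc", "Cf", "Zs", "Zl", "Zp"):
        # Invisible characters: show a label instead of the glyph.
        label = unicodedata.name(ch, f"U+{code_point:04X}")
        return f"[{label}]"
    return ch


print(display_form(0x200B))
```

Unnamed control characters fall back to their U+XXXX notation, which could be replaced by a hand-made abbreviation table.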