Interpretation of ambiguous dates by Microsoft Access textbox - forms

I've been searching around, without any luck, for an MSDN article or any other official specification describing how two-digit years are interpreted in a date-format textbox. That is, when data is manually entered into a textbox on a form whose format is set to Short Date. (My current locale defines dates as yyyy/MM/dd.)
A few random observations (conversion from entered date)
29/12/31 --> 2029/12/31
30/1/1 --> 1930/01/01
So far it makes sense, the range for 2 digit dates is 1930 to 2029. Then as we go on,
1/2/32 --> 1932/01/02 (interpreted as M/d/yy)
15/2/28 --> 2015/02/28 (interpreted as yy/M/dd)
15/2/29 --> 2029/02/15 (interpreted as d/M/yy)
2/28/16 --> 2016/02/28 (interpreted as M/dd/yy)
2/29/15 --> 2029/02/15 (interpreted as M/yy/dd)
Access seems to juggle otherwise-invalid dates until they are valid in some format, but in doing so it appears to ignore the system locale setting for dates. Only entries that are invalid in every format (like 0/0/1) generate an error. Is this behavior documented somewhere?
(I only want to refer the end user to this documentation, I have no problem with the actual behavior)

The 29/30 split was settled this way for Access 2.0 as of 1999-12-17, in the Acc2Date.exe readme file that shipped as part of the last Y2K update:
Introduction
The Acc2Date.exe file contains three updated files that modify the way
Microsoft Access 2.0 interprets two-digit years. By default, Access
2.0 interprets all dates that are entered by the user or imported from a text file to fall within the 1900s. After you apply the updated
files, Access 2.0 will treat two-digit dates that are imported from
text in the following manner:
00 to 29 - resolve to the years 2000 to 2029
30 to 99 - resolve to the years 1930 to 1999
Years that are entered into object property sheets, the query design
grid, or expressions in Access modules will be interpreted based on a
100-year sliding date window as defined in the Win.ini on the computer
that is running Access 2.0.
The Acc2Date.exe file contains the following files:
File name Version Description
---------------------------------------------------------------------
MSABC200.DLL 2.03 The Updated Access Basic file
MSAJT200.DLL 2.50.2825 The Updated Access Jet Engine Library file
MSAJU200.DLL 2.50.2819 The Updated Access Jet Utilities file
Readme.txt n/a This readme file
For more information about the specific issues solved by this update,
see the following articles in the Microsoft Knowledge Base:
Article ID: Q75455
Title : ACC2: Years between 00 and 29 Are Interpreted as 1900 to 1929
That article can be found here as KB75455 (delayed page load):
ACC2: Years Between 00 and 29 Are Interpreted as 1900 to 1929
As for 2/29/15: it is not accepted here, where the system default is dd-mm-yyyy, so there are limits to how much creativity Access/VBA puts into interpreting date expressions.
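The documented split itself is easy to state in code. A minimal sketch in Java (the `expand` helper is hypothetical, purely to illustrate the 00-29/30-99 pivot; Access obviously does more than this when parsing a full date):

```java
public class TwoDigitYear {

    // Expand a two-digit year the way the Acc2Date update documents it:
    // 00-29 resolve to 2000-2029, 30-99 resolve to 1930-1999.
    static int expand(int yy) {
        if (yy < 0 || yy > 99) {
            throw new IllegalArgumentException("not a two-digit year: " + yy);
        }
        return (yy <= 29) ? 2000 + yy : 1900 + yy;
    }
}
```

This matches the observations above: 29 lands in 2029 and 30 lands in 1930.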


How to find the missing components of Chinese characters within Unicode?

I am currently working on the decomposition of Chinese characters (Japanese kanji, to be more exact) and I have found a few components that are seemingly either not included in the Unihan database or cannot be properly displayed with any font I am aware of. Is there some way to locate these characters within UTF-8 or UTF-16 and have them properly displayed in their character form? The list of components is provided below:
渋 ---> 氵+ 止 + ??? ... I have not managed to find those four dots in the Unihan database ... even here the authors had to encode the component ... the same issue appears in kanji 楽 and 摂 and 率
龍 ---> 𦚏 + ??? ... the component on the right-hand side does not seem to be in Unicode ... the same goes for 拝 or 継
制 ---> ??? + 刂 ... the left component does not seem to be in Unicode (the closest is probably 韦) ... the same goes for the kanji 段 ---> ??? + 殳
祭 ---> ??? + 示
留 ---> ??? + 田 (it is possible to decompose into three components 𠫔 + 刀 + 田, but two would be better)
Thank you very much for your advice :-)
I went through the whole Unihan database (over 90,000 characters) and did not manage to find the missing components. I tried installing various fonts (BabelStone Han, simch5100, etc.), but their coverage of Unicode is not 100%. Nevertheless, I am afraid that some of these components are not included in Unicode by themselves and can be displayed only as part of another character.
You may want to have a look at the IDS.TXT data file maintained by Andrew West (BabelStone), which provides Ideographic Description Sequences (IDS) for all the 97,058 CJK unified ideographs defined in Unicode version 15.0.
It makes use of about 120 "numbered components" which are characters not yet defined in Unicode (although it seems they may be added later on, according to some official proposal). They are currently represented by glyphs found in an associated Private Use Area (PUA) font named BabelStone Han PUA, which can be freely downloaded from the bottom of the page.
There is also one open-source application making extensive use of this data in a graphical way, called Unicopedia Sinica, available on GitHub.
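If you end up processing such decompositions programmatically, note that the description operators themselves (⿰, ⿱, and friends) are ordinary Unicode characters from the Ideographic Description Characters block, U+2FF0 through U+2FFB. A minimal Java sketch (the decomposition ⿰女子 for 好 is a standard textbook example, deliberately not one of the problematic characters above):

```java
public class IdsSketch {

    // The IDS operators (Ideographic Description Characters) live at U+2FF0..U+2FFB.
    static boolean isIdsOperator(int codePoint) {
        return codePoint >= 0x2FF0 && codePoint <= 0x2FFB;
    }

    // A well-known decomposition: 好 = ⿰女子 (left-right: 女 + 子).
    static String exampleIds() {
        return "\u2FF0\u5973\u5B50"; // ⿰女子
    }
}
```

An IDS from IDS.TXT is just such a string; the "numbered components" mentioned above are the cases where one of the operands has to come from the PUA font instead of a standard code point.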

`uuuu` versus `yyyy` in `DateTimeFormatter` formatting pattern codes in Java?

The DateTimeFormatter class documentation says about its formatting codes for the year:
u year year 2004; 04
y year-of-era year 2004; 04
…
Year: The count of letters determines the minimum field width below which padding is used. If the count of letters is two, then a reduced two digit form is used. For printing, this outputs the rightmost two digits. For parsing, this will parse using the base value of 2000, resulting in a year within the range 2000 to 2099 inclusive. If the count of letters is less than four (but not two), then the sign is only output for negative years as per SignStyle.NORMAL. Otherwise, the sign is output if the pad width is exceeded, as per SignStyle.EXCEEDS_PAD.
No other mention of “era”.
So what is the difference between these two codes, u versus y, year versus year-of-era?
When should I use something like this pattern uuuu-MM-dd and when yyyy-MM-dd when working with dates in Java?
Seems that example code written by those in the know use uuuu, but why?
Other formatting classes, such as the legacy SimpleDateFormat, have only yyyy, so I am confused about why java.time brings in this uuuu.
Within the scope of java.time-package, we can say:
It is safer to use "u" instead of "y" because DateTimeFormatter will otherwise insist on having an era in combination with "y" (= year-of-era). So using "u" would avoid some possible unexpected exceptions in strict formatting/parsing. See also this SO-post. Another minor thing which is improved by "u"-symbol compared with "y" is printing/parsing negative gregorian years (in far past).
Otherwise we can clearly state that using "u" instead of "y" breaks long-standing habits in Java-programming. It is also not intuitively clear that "u" denotes any kind of year because a) the first letter of the English word "year" is not in agreement with this symbol and b) SimpleDateFormat has used "u" for a different purpose since Java-7 (ISO-day-number-of-week). Confusion is guaranteed - for ever?
We should also see that using eras (symbol "G") in context of ISO is in general dangerous if we consider historic dates. If "G" is used with "u" then both fields are unrelated to each other. And if "G" is used with "y" then the formatter is satisfied but still uses proleptic gregorian calendar when the historic date mandates different calendars and date-handling.
Background information:
When developing and integrating the JSR 310 (java.time-packages) the designers decided to use Common Locale Data Repository (CLDR)/LDML-spec as the base of pattern symbols in DateTimeFormatter. The symbol "u" was already defined in CLDR as proleptic gregorian year, so this meaning was adopted to new upcoming JSR-310 (but not to SimpleDateFormat because of backwards compatibility reasons).
However, this decision to follow CLDR was not quite consistent, because JSR-310 also introduced new pattern symbols which didn't and still don't exist in CLDR; see also this old CLDR-ticket. The suggested symbol "I" was changed by CLDR to "VV" and finally adopted by JSR-310, together with the new symbols "x" and "X". But "n" and "N" still don't exist in CLDR, and since this old ticket is closed, it is not clear at all whether CLDR will ever support them in the sense of JSR-310. Furthermore, the ticket does not mention the symbol "p" (a padding instruction in JSR-310, not defined in CLDR). So we still have no perfect agreement between pattern definitions across different libraries and languages.
And about "y": We should also not overlook the fact that CLDR associates this year-of-era with at least some kind of mixed Julian/Gregorian year and not with the proleptic gregorian year as JSR-310 does (leaving the oddity of negative years aside). So no perfect agreement between CLDR and JSR-310 here, too.
In the javadoc section Patterns for Formatting and Parsing for DateTimeFormatter it lists the following 3 relevant symbols:
Symbol Meaning Presentation Examples
------ ------- ------------ -------
G era text AD; Anno Domini; A
u year year 2004; 04
y year-of-era year 2004; 04
Just for comparison, these other symbols are easy enough to understand:
D day-of-year number 189
d day-of-month number 10
E day-of-week text Tue; Tuesday; T
The day-of-year, day-of-month, and day-of-week are obviously the day within the given scope (year, month, week).
So, year-of-era means the year within the given scope (era), and right above it era is shown with an example value of AD (the other value of course being BC).
year is the signed year, where year 0 is 1 BC, year -1 is 2 BC, and so forth.
To illustrate: When was Julius Caesar assassinated?
March 15, 44 BC (using pattern MMMM d, y GG)
March 15, -43 (using pattern MMMM d, u)
The distinction will of course only matter if year is zero or negative, and since that is rare, most people don't care, even though they should.
Conclusion: If you use y you should also use G. Since G is rarely used, the correct year symbol is u, not y, otherwise a non-positive year will show incorrectly.
This is known as defensive programming:
Defensive programming is a form of defensive design intended to ensure the continuing function of a piece of software under unforeseen circumstances.
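The Caesar illustration above can be reproduced with a short sketch (pinning the locale so the month and era names come out in English):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class EraDemo {

    // Proleptic ISO year -43 is the year historians call 44 BC.
    static String withEra(LocalDate date) {
        return DateTimeFormatter.ofPattern("MMMM d, y GG", Locale.ENGLISH).format(date);
    }

    static String proleptic(LocalDate date) {
        return DateTimeFormatter.ofPattern("MMMM d, u", Locale.ENGLISH).format(date);
    }

    public static void main(String[] args) {
        LocalDate ides = LocalDate.of(-43, 3, 15);
        System.out.println(withEra(ides));   // March 15, 44 BC
        System.out.println(proleptic(ides)); // March 15, -43
    }
}
```

Swap y for u in the first pattern and the "44" silently becomes "-43" while "BC" still prints, which is exactly the kind of mismatch the conclusion warns about.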
Note that DateTimeFormatter is consistent with SimpleDateFormat:
Letter Date or Time Component Presentation Examples
------ ---------------------- ------------ --------
G Era designator Text AD
y Year Year 1996; 96
Negative years have always been a problem, and java.time has now fixed this by adding u.
Long story short
For 99 % of purposes you can toss a coin, it will make no difference whether you use yyyy or uuuu (or whether you use yy or uu for 2-digit year).
It depends on what you want to happen in case a year earlier than 1 CE (1 AD) occurs. The point being that in 99 % of programs such a year will never occur.
Two other answers have already presented the facts of how u and y work very nicely, but I still felt something was missing, so I am contributing the slightly more opinion-based answer.
For formatting
Assuming that you don’t expect a year before 1 CE to be formatted, the best thing you can do is to check this assumption and react appropriately in case it breaks. For example, depending on circumstances and requirements, you may print an error message or throw an exception. One very soft failure path might be to use a pattern with y (year of era) and G (era) in this case and a pattern with either u or y in the normal, current era case. Note that if you are printing the current date or the date your program was compiled, you can be sure that it is in the common era and may opt to skip the check.
For parsing
In many (most?) cases parsing also means validating: you have no guarantees about what your input string looks like. Typically it comes from the user or from another system. An example: a date string arrives as 2018-09-29. Here the choice between uuuu and yyyy should depend on what you want to happen when the string contains a year that is zero or negative (e.g., 0000-08-17 or -012-11-13). Assuming that this would be an error, the immediate answer is: use yyyy so that an exception is thrown in this case. Finer still: use uuuu and perform a range check on the parsed date afterwards. The latter approach allows both finer validation and a better error message when validation fails.
Special case (already mentioned by Meno Hochschild): if your formatter uses strict resolver style and contains y without G, parsing will always fail, because strictly speaking year-of-era is ambiguous without an era: 1950 might mean 1950 CE or 1950 BCE (1950 BC). So in this case you need u (or you can supply a default era, which is possible through a DateTimeFormatterBuilder).
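A sketch of the "use uuuu plus an explicit range check" approach described above (the parseCommonEra helper and its error message are my own invention, purely for illustration):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.ResolverStyle;

public class ParseCheck {

    static final DateTimeFormatter F =
            DateTimeFormatter.ofPattern("uuuu-MM-dd").withResolverStyle(ResolverStyle.STRICT);

    // Parse strictly, then reject years before 1 CE with a clear message.
    static LocalDate parseCommonEra(String text) {
        LocalDate date = LocalDate.parse(text, F);
        if (date.getYear() < 1) {
            throw new IllegalArgumentException("year before 1 CE: " + text);
        }
        return date;
    }
}
```

Because the pattern uses u, strict resolution succeeds without an era field, and the range check then gives a far more useful error than a generic DateTimeParseException would.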
Long story short again
Explicit range check of your dates, specifically your years, is better than relying on the choice between uuuu and yyyy for catching unexpected very early years.
Short comparison, if you need strict parsing:
Examples with invalid Date 31.02.2022
System.out.println(DateTimeFormatter.ofPattern("dd.MM.yyyy").withResolverStyle(ResolverStyle.STRICT).parse("31.02.2022"));
prints "{MonthOfYear=2, DayOfMonth=31, YearOfEra=2022},ISO"
System.out.println(DateTimeFormatter.ofPattern("dd.MM.uuuu").withResolverStyle(ResolverStyle.STRICT).parse("31.02.2022"));
throws java.time.DateTimeException: Invalid date 'FEBRUARY 31'
So you must use 'dd.MM.uuuu' to get the expected behaviour.

Raw Excel Data contains different Date formats

I have huge amounts of raw data separated into columns. All is well when I import these into Matlab, except that I just noticed the Excel files contain different formats for the dates.
One series (i.e. 3 days, one row per hour, so 3×24 rows) has its dates in the format "mm/dd/yyyy", which neither Excel nor Matlab recognizes as proper dates.
I've tried solving this problem in different ways. First I tried highlighting the cells and using the Format Cells function, but this didn't work since Excel doesn't see them as dates but rather as text.
Then I tried the Text to Columns function, which didn't work either (delimited or fixed width).
I'm really stuck and would appreciate some help with this.
In Excel:
If cell A1 has a string like mm/dd/yyyy then try this:
=DATE(RIGHT(A1,4), LEFT(A1,2), MID(A1,4,2))
In Matlab:
datenum(yourDateString, 'mm/dd/yyyy')
Select the desired range to fix and use this script:
Sub bulk_Date_fix()
    On Error Resume Next
    Set d_ranged = Selection
    For Each a In d_ranged
        a.Value = Split(a.Value, "/")(0) & "/" & Split(a.Value, "/")(1) & "/" & Split(a.Value, "/")(2)
    Next
    On Error GoTo 0
End Sub
How it works: the script loops through all the cells in the selected area, splits each value into its date parts on the "/" symbol, and writes the reassembled value back; re-entering the value this way prompts Excel to re-parse it as a date.
I examined your file and you will need to go back to the source data to straighten this out. Instead of "opening" the file in Excel, you will need to IMPORT the file. When you do that, the text import wizard will open, and you can then designate the date column as being of the format DMY (or whatever format is being generated by the source).
The problem is that there is a mismatch between the format of the file, and your Windows Regional Short date format. If you look at Row 229, you will see that real dates are present, but out of sequence with the rest.
Excel parses dates being input according to the Windows Regional Short Date settings. If the date doesn't make sense, according to that parsing (e.g. month > 12) , Excel will interpret the date as a string; if the date does make sense, it will be interpreted as a date in accordance with that windows regional date component order, and this will be different from what is intended.
So the first few hundred dates are strings, but at line 229, the date, which is probably meant to be 12 OCT 2014, gets changed to 10 DEC 2014. This will happen in other areas where that value in the 2nd position is 12 or less.
EDIT: I am not certain how this varies when using Excel on the Mac. In the Windows version of Excel, the text import feature is on the Data ribbon, in the Get External Data group:
When you click on that and open a text file, you will see the Text Import Wizard, and when you get to Step 3, you will be able to specify the text format of the data to be imported:

Convert to alphanumeric a sequential file generated by COBOL with compacted data

I have a COBOL program which generates a sequential file with this structure:
FD ALUMNOS-FILE.
01 ALUMNOS-DATA.
88 EOF VALUE HIGH-VALUES.
05 STUDENTID PIC 9(7).
05 STUDENTNAME PIC X(10).
05 FILLER PIC X(8).
05 COURSECODE PIC X(4).
05 FOO PIC S9(7)V USAGE COMP-3.
If I open the file in Notepad++, I see strange symbols which are difficult to read, caused by the COMP-3 variable. Something similar to the image below (the image is from another file):
Is there any way, without using COBOL, to rewrite this sequential file so that it is readable? Maybe using a scripting language like VBS? Any tip or advice will be appreciated, and if you need more info let me know and I'll edit the post.
I would suggest having a look at the Last Cobol Questions.
The RecordEditor will let you view and edit Cobol files using a Cobol copybook. In the RecordEditor you can export the file as CSV or XML if you want.
As mentioned in the Last Cobol Questions, there are several solutions for reading Cobol files in Java, and probably some in other languages.
To import the Cobol Copybook into the RecordEditor,
Select: Record Layout >>> Import Cobol Copybook
The File-Structure controls how the file is read. Use a File-Structure of Fixed Length Binary if all the records are of the same length (no carriage return).
Other structures supported include:
Standard text files (the ..Text.. File Structures)
Cobol variable record length files. These typically have a record length followed by the data. There are versions for Mainframes, Open-Cobol, and Fujitsu.
The Default File structure will choose the most likely File-Structure based on the Record-Definition. In your case it should choose Fixed length Binary because there is a binary Field in the Definition.
Note: From Record Editor 0.94.4, with a File-Structure of Fixed Length Binary you can edit Fixed Length Text files in a basic Text Editor if you want.
Note: I am the author of RecordEditor
Answer Updates 2017/08/09
Conversion Utilities
For simple Cobol files these conversion utilities (based on JRecord) could be used:
CobolToCsv
CobolToXml
Cobol To Json
RecordEditor
The RecordEditor has a Generate option for generating Java / JRecord code.
See the RecordEditor Code Generation notes.
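If you do want to script the conversion yourself rather than use a tool, the only non-trivial part of the record above is the COMP-3 field: packed decimal stores two digits per byte, with the final nibble holding the sign (0xC or 0xF positive, 0xD negative). A hedged Java sketch of just that decoding step (splitting the record into fields and handling the FD layout is up to you):

```java
public class Comp3 {

    // Decode a COMP-3 (packed decimal) field: two digits per byte,
    // except the last byte, which holds one digit plus the sign nibble.
    static long decodeComp3(byte[] field) {
        long value = 0;
        for (int i = 0; i < field.length; i++) {
            int hi = (field[i] >> 4) & 0x0F;
            int lo = field[i] & 0x0F;
            if (i < field.length - 1) {
                value = value * 100 + hi * 10 + lo;
            } else {
                value = value * 10 + hi;        // last digit
                if (lo == 0x0D) {               // 0xD = negative; 0xC/0xF = positive
                    value = -value;
                }
            }
        }
        return value;
    }
}
```

For the layout above, FOO (PIC S9(7)V COMP-3) occupies 4 bytes (7 digits plus the sign nibble) at a fixed offset. If the file came from a mainframe, the PIC X fields will additionally need EBCDIC-to-ASCII translation.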

My dollar signs are now little boxes

This week we upgraded to JasperReports Server 4.7 (Professional) and iReport 4.7. I have several reports that I created in iReport 4.5.1 and successfully used in JasperReports Server 4.5.1.
After the upgrade, all of my dollar signs are now little boxes. The pattern for my currency fields is ¤ #,##0.00. JasperReports Server is not replacing the box with a dollar sign when the report is generated. Everything looks ok in the pattern sample. My percentage symbols are all still working. I tried removing and applying the currency pattern to the fields again, but this didn't fix the problem.
Any thoughts on how I can fix this?
This is Java operating as intended... but not as you want it to operate. Your locale does not specify a currency, so you get that "¤" symbol.
You could work around it by changing your locale from "en" to "en_US". I did just that last week. As a side note, I found one tweak that I needed to make: after changing the locale to en_US I needed to copy one file, like this:
cp .../jasperserver-pro/scripts/jquery/js/jquery.ui.datepicker-en.js .../jasperserver-pro/scripts/jquery/js/jquery.ui.datepicker-en-US.js
Alternatively, I usually find it's better to work around it by setting your format mask to use a hard-coded dollar sign. If you are displaying "$50.00" to a user in the United States, it would be nonsensical to display "€50,00" to a European user or "¥50.00" to a Japanese user for the same value. There are lots of times when the hard-coded currency symbol is more appropriate.
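The behavior described here can be seen outside JasperReports too: the ¤ placeholder in a DecimalFormat pattern is replaced by the locale's currency symbol, and a bare language-only locale such as "en" has no country and therefore no currency, so Java falls back to the literal ¤. A small sketch (the format helper is hypothetical):

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class CurrencyDemo {

    // Apply the report's currency pattern under a given locale.
    static String format(Locale locale, double amount) {
        DecimalFormat df = new DecimalFormat("\u00A4 #,##0.00",
                DecimalFormatSymbols.getInstance(locale));
        return df.format(amount);
    }

    public static void main(String[] args) {
        System.out.println(format(Locale.US, 50));        // $ 50.00
        System.out.println(format(new Locale("en"), 50)); // the literal ¤ appears: no currency for "en"
    }
}
```

Which illustrates both fixes: either give the formatter a locale with a country (en_US) or hard-code the symbol into the pattern so the locale no longer matters.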