NPOI reads numbers in the wrong format - number-formatting

I have an xlsx file and I am trying to read numbers from it and write them to another file. The problem is that some numbers are read incorrectly and I have no idea why. For example:
Number in Excel | Number read
-----------------------------
139,8           | 1,398E+16
2,2             | 2,2E+16
The interesting thing is that this problem happens only with some numbers. The formatting is the same for all of them. NPOI reads the exact value from Excel, not the formatted one, so I checked the stored values, but they are all the same as the formatted ones.
OK, I think I found the problem. Now I just need to find a solution. I extracted the xlsx file and checked the real values stored in the cells. The problem is that when I have the value 139.80000000000001, it is read as 1,398E+16, so I guess NPOI interprets the formatting wrong: it treats the . (dot) as a thousands separator, while it is actually the decimal separator.
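To illustrate the arithmetic (this is just a small Go sketch, not NPOI code): if the dot in the stored value is dropped as if it were a thousands separator, you end up with exactly the kind of number I'm seeing.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

func main() {
	raw := "139.80000000000001" // the value actually stored in the xlsx

	// Correct interpretation: the dot is the decimal separator.
	correct, _ := strconv.ParseFloat(raw, 64)
	fmt.Println(correct) // 139.8

	// Wrong interpretation: the dot is treated as a thousands separator
	// and dropped, turning the value into 13980000000000001 ≈ 1,398E+16.
	wrong, _ := strconv.ParseFloat(strings.Replace(raw, ".", "", -1), 64)
	fmt.Printf("%E\n", wrong) // 1.398000E+16
}
```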

Just for the record, I just updated from Alpha to Beta, and it worked. Now I get the exact value that is in the cell.
The beta can be found here.

Looks like this is a known issue, and there's a planned fix in an upcoming NPOI 2.0 beta 1 release:
RELEASE NOTES
...
fix decimal seperated by comma instead of dot

It looks to be a bug in NPOI 2.0 alpha. Please try NPOI 2.0 beta 1; if the problem still exists, we plan to fix it in the 2.0 final release.

Related

"Number is not compatible with column definition or is not available for a not nullable column"

I have a problem when importing a txt file into an Oracle table.
For some reason, SQL Developer doesn't allow the decimals to be imported, as the image shows, and I believe the data type is correct, as it worked for me before.
Please help, and thanks a lot.
Please check that the decimal separator on your system has not changed since the last time this worked. To test this, replace all dots (.) with commas (,) in the text file and check if the error goes away.
I have reproduced your error on my own system, where comma is the decimal separator, using a file that uses dot as the decimal separator.
Not sure if the scale, -127, will cause any issues for you. NUMBER columns defined without precision and scale get the scale -127 in the SQL Developer import engine. I have never noticed this before, but it is present in version 20.2.0.175.
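If it helps, here is a minimal Go sketch of that separator-swap test (the file names are just placeholders); it only makes sense when dots occur in the file purely as decimal separators.

```go
package main

import (
	"log"
	"os"
	"strings"
)

func main() {
	// Read the original export (file names here are hypothetical).
	data, err := os.ReadFile("input.txt")
	if err != nil {
		log.Fatal(err)
	}

	// Swap the decimal separator from dot to comma for the test import.
	converted := strings.Replace(string(data), ".", ",", -1)

	if err := os.WriteFile("input_comma.txt", []byte(converted), 0644); err != nil {
		log.Fatal(err)
	}
}
```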

Why can not Google Dataprep handle the encoding in my log files?

We are receiving big log files each month. Before loading them into Google BigQuery they need to be converted from fixed width to delimited. I found a good article on how to do that in Google Dataprep. However, there seems to be something wrong with the encoding.
Each time a Swedish Character appears in the log file, the Split function seems to add another space. This messes up the rest of the columns, as can be seen in the attached screenshot.
I can't determine the correct encoding of the log files, but I know they are being created by pretty old Windows servers in Poland.
Can anyone advise on how to solve this challenge?
Screenshot of the issue in Google Dataprep.
What is the exact recipe you are using? Do you use (split every x)?
When I used an ISO Latin-1 text in a test case and ingested it as ISO 8859-1, the output was as expected and only the display was off.
Can you try the same?
Would it be possible to share an example input file with one or two rows ?
As a workaround you can use a regex split, which should work.
It's unfortunately a bit more complex, because you would have to use multiple regex splits. Here's an example for the first two splits, after 10 characters each: /.{10}/ and split on //.
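Not a Dataprep recipe, but a small Go sketch of why a single non-ASCII character can shift every following column when a fixed-width split counts bytes instead of characters (the field widths and sample record are made up):

```go
package main

import "fmt"

func main() {
	// Hypothetical fixed-width record containing a non-ASCII character ("ö").
	line := "Göteborg  12345     SE        "
	widths := []int{10, 10, 10}

	// Rune-based splitting keeps the columns aligned.
	runes := []rune(line)
	pos := 0
	for _, w := range widths {
		fmt.Printf("rune split: %q\n", string(runes[pos:pos+w]))
		pos += w
	}

	// Byte-based splitting drifts: "ö" is two bytes in UTF-8, so every
	// field after it starts one position too early.
	pos = 0
	for _, w := range widths {
		fmt.Printf("byte split: %q\n", line[pos:pos+w])
		pos += w
	}
}
```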

How to interpret emoji in Web API

I am trying to intercept and replace emoji with corresponding text. I left the default encoding on the Web API (UTF-8 / UTF-16 respectively).
How can I convert an emoji like 😉 to something like U+1F609?
Here is something that helped me out, although it is in Perl. But you can encode and decode. This should be what you're looking for: https://metacpan.org/pod/Encode::JP::Emoji
This is quite an old post, and even though I'm not on the project anymore, I still want to answer with my findings for future reference, in case someone else has the same problem.
What I ended up doing was creating a dictionary with the UTF code point combination of the emoji as the key and the replacement text as the value. One piece of advice: I made sure the longest combinations, since some emoji consist of 4 or even 5 code points, were matched first, because otherwise some emoji were never reached. It's totally not the perfect and future-proof solution I was hoping for, but it worked for us and shipped into production, where it has been working since 2017.
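For future readers, here is a minimal Go sketch of both points, the U+XXXX notation and the longest-first dictionary replacement; the dictionary entries are just made-up examples, not the mapping we shipped:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

func main() {
	// Converting an emoji rune to the U+1F609 notation.
	for _, r := range "😉" {
		fmt.Printf("U+%04X\n", r) // U+1F609
	}

	// Hypothetical emoji-to-text dictionary; multi-code-point sequences
	// (e.g. a flag built from two regional indicators) must be tried first.
	dict := map[string]string{
		"😉":  ":wink:",
		"🇸🇪": ":flag_sweden:",
	}
	keys := make([]string, 0, len(dict))
	for k := range dict {
		keys = append(keys, k)
	}
	// Longest key first, so multi-code-point emoji are matched before
	// any of their individual parts.
	sort.Slice(keys, func(i, j int) bool { return len(keys[i]) > len(keys[j]) })

	msg := "Hello 🇸🇪 😉"
	for _, k := range keys {
		msg = strings.Replace(msg, k, dict[k], -1)
	}
	fmt.Println(msg) // Hello :flag_sweden: :wink:
}
```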

Formatting dates as dd_mm_yyyy in Go gives strange values

So as in the title, I'm trying to format a date in dd_mm_yyyy format using time.Now().Format("02_01_2006") as shown in this playground session:
http://play.golang.org/p/alAj-OcRZt
The first problem: dd_mm_yyyy isn't an acceptable format, only dd_mm_yy is, which is fine; I can manipulate the returned string myself.
The problem I have for you is to help me figure out what Go is even trying to do with this input.
You should notice the result you get is:
10_1110009
A good few thousand years off, and it has lost the underscore, which it only does for _2. Does this character sequence represent something special here?
Replacing the last underscore with a hyphen or space returns a valid result. dd_mm_yy works fine. Just this particular case seems to completely fly off the handle.
On my local machine (Go playground is on a specific date) the result for today (the 5th) is:
05_01 5016
Which is equally strange, if not more so, as it has substituted in a space, which seems to be an ANSIC thing.
This is very likely due to the following bug: https://github.com/golang/go/issues/11334
This has been fixed in Go 1.6beta1
Found an issue on their GitHub:
https://github.com/golang/go/issues/11334
Basically, _2 takes the 2 as the day value from the reference time and then tries to parse the rest (006), which it doesn't recognise, so it all goes wrong from there.
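For anyone stuck on an affected Go version, a simple workaround (building on the observation above that a hyphen or space works) is to format with a separator that cannot form the _2 token and swap it back afterwards; a minimal sketch:

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

func main() {
	// "02_01_2006" contains "_2" (the space-padded-day token) right before
	// "006", which is what derails the formatter on affected versions.
	formatted := time.Now().Format("02-01-2006")          // hyphens avoid the _2 token
	formatted = strings.Replace(formatted, "-", "_", -1)  // restore the underscores
	fmt.Println(formatted) // e.g. 05_01_2016
}
```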

CSV in bad Encoding

We have uploaded a file with bad encoding, and now when downloading it again all the "strange" French characters are mixed up.
Example of the bad text:
R�union
Now when opening the CSV with OpenOffice, we tried all of the encodings in the dropdown; none of them seem to work.
Does anyone have a way to fix the encoding to the correct one so that we can view the characters?
Link to the file: https://drive.google.com/file/d/0BwgeuQK3LAFRWkJuNHd2TlF2WjQ/view?usp=sharing
Kr.
Sadly there is no way to automatically fix the linked file. Consider the two words afectación and sécurité. In the file they have been converted incorrectly to afectaci?n and s?curit?. There is no way to convert the question marks back because sometimes they're ó and other times é.
(Actually instead of question marks the file uses the unicode replacement character, but that doesn't change the problem).
Hopefully you have an earlier version of the file that has not been converted incorrectly.
Next time try to use a consistent encoding. This question gives some suggestions for how to do this.
If the original data cannot be obtained, there is one thing that could be done short of retyping the whole thing: it is possible to use dictionary lookups to guess the missing words. However, this would be a difficult project, and there would be mistakes where incorrect guesses were made. It's probably not worth it.
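If it is useful, here is a small diagnostic sketch (not a fix): it scans the CSV for the Unicode replacement character U+FFFD to show which lines have already lost their original bytes. The file name is hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strings"
)

func main() {
	f, err := os.Open("export.csv") // hypothetical file name
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Report every line containing U+FFFD; those characters mark places
	// where the original bytes were discarded and cannot be recovered.
	scanner := bufio.NewScanner(f)
	lineNo := 0
	for scanner.Scan() {
		lineNo++
		if strings.ContainsRune(scanner.Text(), '\uFFFD') {
			fmt.Printf("line %d is damaged: %s\n", lineNo, scanner.Text())
		}
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
}
```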