I am wondering about Java: is there a file extension separator?
Like *.doc, the "." being the question.
I know there is a line.separator system property. I'd just like my app to be portable, so I need to know.
Thank you.
The . separating the filename from the file extension is an artifact of DOS and its 8.3 filename limitations. On Windows, Mac OS X, Linux, etc., that's no longer the case: . is just another character (although a leading . on Linux/Unix filesystems indicates a hidden file).
Windows systems still use the convention (even though you can create a filename with as many periods as you wish), as the extension is still used to determine file type and program association. Linux/Unix/Mac OS X tend to rely on magic numbers more than file extensions, although there are conventions there too (e.g. `.pl` for Perl files, `.sh` for shell scripts and so forth) but, unlike Windows, these are just conventions that have no OS meaning.
So basically there is no concept of a "file extension separator", not in a universal sense anyway.
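Since there is nothing to query, the portable approach is to treat the last `.` in the file name itself as the conventional separator. A minimal sketch (the `extensionOf` helper is my own, not a standard API):

```java
import java.io.File;

public class ExtensionUtil {
    /** Returns the extension of the given file, or "" if there is none. */
    static String extensionOf(File file) {
        String name = file.getName();      // strip any directory part first
        int dot = name.lastIndexOf('.');
        // dot <= 0 also covers hidden Unix files like ".profile",
        // which have no extension by convention
        return (dot <= 0) ? "" : name.substring(dot + 1);
    }

    public static void main(String[] args) {
        System.out.println(extensionOf(new File("report.doc"))); // doc
        System.out.println(extensionOf(new File(".profile")));   // (empty)
    }
}
```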
The symbol in question is U+F028; depending on the font it renders as a blank box, a space, or nothing at all.
It crops up at the end of folder names sometimes on our filesystem. The people who put the folders there swear that they couldn't see it. I think it might be produced by one of our internal asset-building tools, perhaps in response to something copied in from a Google Docs spreadsheet?
Sorry that's so vague...
It looks like the culprit could be Windows CHKDSK (analogous to fsck). From https://github.com/dw/scratch/blob/master/ntfs-3g-chkdsk-unicode-fix.py:
# … Windows
# CHKDSK, which will immediately replace all the invalid characters with
# Unicode ordinals in the private use plane.
This symbol is used by OS X if you add a trailing space to a folder name on a FAT file system.
Why? Because trailing spaces are forbidden in folder names on Windows. As OS X allows trailing spaces in general, it somewhat extends this behavior to FAT drives, and the private-use Unicode character is used as a means to keep the file system sane at the same time.
While you're on the Mac, you barely notice the additional character, as it is rendered as a space there. Only after moving the drive to Windows or Linux does the weird Unicode character show up as unprintable.
This is the Unicode character U+F028, which falls in the Private Use Area of U+E000–U+F8FF. This means its use is application-specific. You are probably right that some tool produces this character and fails to remove/replace it when copying. Wikipedia lists some vendor usage examples, but I doubt that list will help much.
To find the cause of the problem, I would first ask the users who created the folders which operating systems they use and which applications they copied the text from. If you think one of your build tools is the cause, you could try searching for that character in all log files of those tools and in any databases the tools use.
As for the actual question in the title, this character does not have a defined symbol. Some applications show a default symbol while others may just ignore it and show nothing.
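If you want to hunt for affected names programmatically, a check against Java's `Character.UnicodeBlock.PRIVATE_USE_AREA` will flag U+F028 and its neighbours. A small sketch (class and method names are mine):

```java
public class PuaScanner {
    /** True if the name contains any Private Use Area code point. */
    static boolean containsPrivateUse(String name) {
        return name.codePoints().anyMatch(cp ->
            Character.UnicodeBlock.of(cp) == Character.UnicodeBlock.PRIVATE_USE_AREA);
    }

    public static void main(String[] args) {
        System.out.println(containsPrivateUse("folder\uF028")); // true
        System.out.println(containsPrivateUse("folder"));       // false
    }
}
```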
How to read ASCII files with mixed line endings (Windows and Unix) and UTF-16 Big Endian files in SAP?
Background: our ABAP application must read some of our configuration files. Most of them are ASCII files (normal text files) and one is UTF-16 Big Endian. So far, the files were read in ASCII mode and things were fine during our tests.
However, the following happened at customer sites: the configuration files are located on a Linux system, so they have Unix line endings. People fetch the configuration files via FTP or similar and transfer them to a Windows machine. On the Windows machine, they adapt some of the settings. Depending on the editor, our customers now have mixed line endings.
Those mixed line endings cause trouble when reading the file in ASCII mode in ABAP: the file is read up to the point where the line endings change, plus a bit more, but not the whole file.
I suggested reading the file in BINARY mode, removing all the CRs, then replacing all the remaining LFs with CR LF. That worked fine, except for the UTF-16 BE file, for which this approach results in a mess. So the whole thing was reverted.
I'm not an ABAP developer, I just have to test this. With my background in other programming languages I must assume there is a solution and I tend to decline a "CAN'T FIX" resolution of this bug.
You can use CL_ABAP_FILE_UTILITIES=>CHECK_FOR_BOM to determine which encoding the file has and then use the constants of class CL_ABAP_CHAR_UTILITIES to process it further.
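I can't give you the ABAP, but here is the same idea sketched in Java, assuming (as in your setup) that each file is either plain ASCII or UTF-16 BE with a byte-order mark: check the first two bytes, decode accordingly, then normalize every kind of line break in one pass:

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class LineEndingFix {
    static String readNormalized(Path path) throws IOException {
        byte[] raw = Files.readAllBytes(path);
        Charset cs = StandardCharsets.US_ASCII;
        int skip = 0;
        // The UTF-16 BE byte-order mark is 0xFE 0xFF
        if (raw.length >= 2 && (raw[0] & 0xFF) == 0xFE && (raw[1] & 0xFF) == 0xFF) {
            cs = StandardCharsets.UTF_16BE;
            skip = 2;
        }
        String text = new String(raw, skip, raw.length - skip, cs);
        // \R matches any line break (CRLF, lone LF, lone CR);
        // rewriting them all as CRLF handles the mixed-endings case
        return text.replaceAll("\\R", "\r\n");
    }
}
```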
I found this statement under another SO question concerning Unicode and I'd like to ask for further elaboration of this rather surprising fact.
Code that believes once you successfully create a file by a given name, that when you run ls or readdir on its enclosing directory, you'll actually find that file with the name you created it under, is buggy, broken, and wrong. Stop being surprised by this!
When does this happen and what to do about it?
The first example which comes to my mind: if you create a file under OS X that is named é (a single U+00E9 code point), the OS will actually store it as U+0065 U+0301 (Unicode decomposition). The file will still be accessible under the original name, but listed in its decomposed form.
How to avoid: don't look up your files manually unless you are sure their names are pure ASCII; otherwise compare names under normalization, as in the sketch below.
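In Java you can make such comparisons robust by normalizing both sides first, e.g. with `java.text.Normalizer`. A sketch (the `sameName` helper is mine):

```java
import java.text.Normalizer;

public class NameCompare {
    /** Compares names after normalizing both to the composed form (NFC). */
    static boolean sameName(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        // U+00E9 vs. the decomposed U+0065 U+0301 that HFS+ stores
        System.out.println(sameName("\u00E9", "e\u0301")); // true
    }
}
```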
Second: on Windows, if you have a file called e and you create (with overwriting enabled) a file called E, the OS will still list a file called e. If e didn't exist beforehand, a file called E would be created.
How to avoid: don't look up your files manually unless you are sure their names are pure ASCII, and take case into account. Try using a consistent capitalisation style; I suggest going all lowercase, as in the sketch below.
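That advice is trivial to encode; in this sketch, `Locale.ROOT` avoids locale-dependent surprises such as the Turkish dotless i:

```java
import java.util.Locale;

public class CaseSafeName {
    /** Canonical form used everywhere files are created or looked up. */
    static String canonical(String name) {
        return name.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(canonical("Report.TXT")); // report.txt
    }
}
```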
Third: on Windows, if for example you have Windows 1250 as your system encoding, and you want to create a file named ê via the narrow, char-based API, a file called e will be created instead. This of course is easy to avoid, but this exact problem bit me once: WinRAR extracted files ê.png, è.png and e.png all into e.png, overwriting data. Similar problems can happen with other encoding mixups, too.
How to avoid: don't use APIs that take the filename as a char* on Windows.
I have a bunch of text files that need cleaning up. Example:
E..4B?#.#...
..9J5.....P0.z.n9.9.. ........
.k#a..5
E...y^#.r...J5..
E...y_#.r...J5..
..9.P..n9..0.z............
….2..3..9…n7…..#.yr
Is there any way sed can do this? Like notice weird patterns?
For this answer, I will assume that you have access to standard unix/linux tools.
Your file might be in some word-processor format. If so, the best way to get rid of the junk is to open it with that program. You may be able to find out which one with file:
$ file mysteryfile
mysteryfile: Composite Document File V2 Document, Little Endian, Os: Windows, Version 6.1 ....
If that doesn't work, there is a standard unix utility for extracting text from binary files. It is called strings:
$ strings mysteryfile
Some
Recovered Text
...
The behavior of strings can be fine-tuned with several options. See man strings.
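By default strings prints runs of four or more printable characters; if you're curious, roughly the same behaviour fits in a few lines of Java (a toy sketch, not a replacement for the real tool):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Strings {
    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        StringBuilder run = new StringBuilder();
        for (byte b : data) {
            if (b >= 0x20 && b < 0x7F) {   // printable ASCII range
                run.append((char) b);
            } else {
                // non-printable byte ends the current run
                if (run.length() >= 4) System.out.println(run);
                run.setLength(0);
            }
        }
        if (run.length() >= 4) System.out.println(run);
    }
}
```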
I have an exe file (e.g. naming.exe) on my Solaris server.
I want to see the contents of the exe file.
Is it possible?
See the strings command, which will extract readable text from a file. See the Wikipedia article for more about it.
Although Solaris, and Unix in general, doesn't care that much about suffixes, especially for executables: ".exe" isn't a common file suffix there; it looks like a Windows thing to me.
Start by running file naming.exe to get an idea about what kind of file it is.
Often data beyond simple strings is packed in too (for example, software installers sometimes have useful cross-platform data files embedded within executable files). On Linux you can extract this using cabextract. I can't see any reason why porting it to Solaris would be hard, if it isn't already working there.
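For what it's worth, file itself works by checking well-known magic numbers at the start of the file. A toy illustration of the idea (the real file knows thousands of formats; this sketch only two):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Magic {
    public static void main(String[] args) throws IOException {
        byte[] head = new byte[4];
        int n;
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            n = in.read(head);   // only the first few bytes are needed
        }
        if (n >= 4 && head[0] == 0x7F && head[1] == 'E'
                && head[2] == 'L' && head[3] == 'F') {
            System.out.println("ELF executable (native on Solaris/Linux)");
        } else if (n >= 2 && head[0] == 'M' && head[1] == 'Z') {
            System.out.println("MZ header: a DOS/Windows executable");
        } else {
            System.out.println("unknown format; try file(1) for a real answer");
        }
    }
}
```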