What is this symbol ``, and why can't some people see it? - unicode

The symbol is this one: 
It crops up at the end of folder names sometimes on our filesystem. The people who put the folder there swear that they couldn't see it. I think it might be being produced by one of our internal asset building tools, perhaps in response to something copied in from a Google docs spreadsheet?
Sorry that's so vague...

It looks like the culprit could be Windows CHKDSK (analogous to fsck) …
https://github.com/dw/scratch/blob/master/ntfs-3g-chkdsk-unicode-fix.py →
# … Windows
# CHKDSK, which will immediately replace all the invalid characters with
# Unicode ordinals in the private use plane.

This symbol is used by OS X if you add a trailing space to a folder name on a FAT file system.
Why? Because it is forbidden to have trailing spaces in folder names on Windows. As OS X allows trailing spaces in general, it somewhat extends this behavior to FAT drives. The private use unicode character is used as a means to keep the file system sane at the same time.
While you're on the Mac, you barely see the additional character as it is rendered as space there. Only after moving the drive to Windows or Linux, the weird unicode character shows up as unprintable.

This is the unicode character U+F028 which falls in the private use area of U+E000–U+F8FF. This means it's use is application specific. You are probably right that some tool produces this character and fails to remove/replace that character when copying. Wikipedia lists some vendor usage examples but I doubt that this list will help much.
To find the cause of the problem, I would first try asking the users who created the folders about which operating systems they use and which applications they copied the text from. If you think one of your build tool is the cause, you could try to search for that character in all log files of those tools and any databases that the tools use.
As for the actual question in the title, this character does not have a defined symbol. Some applications show a default symbol while other application may just ignore it and show nothing.

Related

When and how might the operating system store a file under a different name than I gave it?

I found this statement under another SO question concerning Unicode and I'd like to ask for further elaboration of this rather surprising fact.
Code that believes once you successfully create a file by a given name, that when you run ls or readdir on its enclosing directory,
you'll actually find that file with the name you created it under is
buggy, broken, and wrong. Stop being surprised by this!
When does this happen and what to do about it?
The first example which comes to my mind: If you create a file under OSX that is named é (single U+00E9 codepoint), the OS will store it actually as U+0065 U+0301 (Unicode decomposition). The file will be still accessible under the original name, but listed as decomposed.
How to avoid: don't lookup your files manually unless you are sure their names are pure ASCII.
Second: On Windows, if you have a file called e, try creating (with overwriting enabled) a file called E, the OS will still list a file called e. If e didn't exists beforehand, a file called E would be created.
How to avoid: don't lookup your files manually unless you are sure their names are pure ASCII, and take case into account. Try using a consistent capitalisation style. I suggest going all lowercase.
Third: on Windows, if for example you have Windows 1250 as your system encoding, and you want to create a file named ê via the narrow, char-based API, a file called e will be created instead. This of course is easy to avoid, but this exact problem bit me once: WinRAR extracted files ê.png, è.png and e.png all into e.png, overwriting data. Similar problems can happen with other encoding mixups, too.
How to avoid: don't use API's that take the filename as a char* on Windows.

Windows Installer Automation and Installshield Basic MSI: Mystery String during Chained MSI

EDIT: Turns out the mystery string was a simple MD5 hash of the name of the file (including the extension and capitalization).
I'm attempting to automate the process of creating a Chained MSI through InstallShield. In the GUI, this involves going to Releases, adding a chained package, linking to the MSI and streaming the file into the project.
I've reverse engineered what exactly happens behind the scenes by analyzing the project file as XML. It essentially just comes down to table edits. I understand you can use Windows Installer Automation to open an *.ism file and access the database tables (LINK).
Yet, there is a single field in the ISChainPackageData table which I cannot seem to generate or figure out how it was calculated. It is the column titled, File. It is a 32 character hex string preceded by an underscore. I have discovered that the only attribute that determines this field is the name of the MSI file being streamed. For example:
Linking to a chained MSI by the name of Test.msi, yields _29B31F67F21C9EE77CBF8C4C5D24ACE9.
Changing the name would change this. Changing the file, including replacing it with an empty file of the same name, does not.
I believe it is some kind of simple hash of the name, but I haven't had any luck guessing it.
Does anyone have any insight on what they might be using here?
Thanks!
Close. It's a hash-based GUID of a combination of a few things. I'd have to trudge up the code to find out exactly what, but it's at least the relative path and filename, and possibly something related to the package in question (probably its primary key value).
This is used to generate a unique key for each file you include with a package, without allowing duplicate files. (Windows Installer doesn't like backslashes in its primary keys.) The actual value here isn't meaningful; if you're careful to avoid duplicate keys and don't overlap file path and name combinations, you can probably put in any valid key value you like. However that may prevent the IDE from detecting duplicates itself.

how can we identify notepad file?

how can we identify notepad files which is created in two computer, is there a any way to get any information about in which computer it was created.Or whether it is build in xp or linux.
If you right click on the file, you should be able to see the permissions and attributes of the file.
Check at the end of the line. Under GNU/Linux lines end with \n (ascii: 0x0A) while under Miscrosoft W$ndos it is \r\n (ascii: 0x0D 0x0A).
Wikipedia: https://en.wikipedia.org/wiki/Newline
found this: http://bit.ly/J258Mr
for identifying a word document but some of the info is relevant
To see on which computer the document had been created, open the Word
document in a hex editor and look for "PID_GUID". This is followed by
a globally unique identifier that, depending upon the version of Word
used, may contain the MAC address of the system on which the file was
created.
Checking the user properties (as already mentioned) is a good way to
see who the creator of the original file was...so, if the document was
not created from scratch and was instead originally created on another
system, then the user information will be for the original file.
Another way to locate the "culprit" in this case is to parse the
contents of the NTUSER.DAT files for each user on each computer. While
this sounds like a lot of work, it really isn't...b/c you're only
looking for a couple of pieces of information. Specifically, you're
interested in the MRU keys for the version of Word being used, as well
as perhaps the RecentDocs keys."
The one thing I can think on the top of my mind is inspecting the newline characters on your file - I'm assuming your files do have multiple lines. If the file was generated using Windows then a newline would be characterized by the combination of carriage return and line feed characters (CR+LF) whereas a simple line feed (LF) would be a hint that the file was generated in a Linux machine.
Right click one the file--> Details . You can see the computer name where it was created and the date.

Java Line separator File .ext separator

i am wondering regarding Java: is there a file extension separator?
like *.doc, the "." being the question.
i know there is a line.separator. just would like my app to be portable so i need to know.
thank you.
The . being a filename from file extension separator is an artifact of DOS and it's 8.3 filename limitations. On Windows, MacOS X, Linus, etc that's no longer that case. . is just any other character (although a leading . on Linux/Unix filesystems indicates a hidden file).
Windows systems still use the convention (even though you can create a filename with as many periods as you wish) as the extension is still used to file type and association. Linux/Unix/MacOS X tend to rely on magic numbers more than file extensions although there are conventions used there too (eg .pl' for Perl files,.sh` for Shell scripts and so forth) but, unlike Windows, these are just conventions that have no OS meaning.
So basically there is no concept of "file separator". Not in a universal sense anyway.

Microsoft Symbol Server / Local Cache Hash Algorithm

I am trying to figure out what hashing algorithm is used for the Microsoft Symbol Local Cache directory.
For example, the local cache can be something like the following
L:\Symbols
\browseui.dll
\44FBC679fe000
browsue.dll
\browseui.pdb
\44F402F62
browseui.pdb
\explorer.exe
\3EBF1F14f7000
explorer.exe
\explorer.pdb
\3EBF1F141
explorer.pdb
\msvcr71.pdb
\60D915C6AB6A4F3586E9096E2F8856482
msvcr71.pdb
There seems to be some sort of correspondence between a file and its debug database. Other than that, I can’t figure out how the names of these (presumably) hexadecimal string folders are being generated.
Some of them are 9 digits, some 13 digits, and others are 33 digits. It looks like an actual, live-file (which for some reason is stored in the symbol cache) has a 13-digit hash while its (nearly similar) debug database gets a 9-digit hash. Some debug databases get a 13-digit hash; can’t figure out what makes these ones special, although they don’t have a corresponding live-file.
I’ve tried hashing the files with every kind of hash algorithm that I know of (39 of them) and none match in any way (straight up, reversed, alternate endian’d, etc.)
Any ideas?
Update
I think I finally found it. From Symbol Storage Format:
SymStore uses the file system itself as a database. It creates a large tree of directories, with directory names based on such things as the symbol file time stamps, signatures, age, and other data.
Edit
Dang, unfortunately it only mentions that the directory name is derived from various aspects (not quite a hash I guess), but does not say exactly how. The search continues… :-(
This page has info on calculating the IDs for the symbol files as well as executables/DLLs.
Basically, for executables and DLLs, you extract the timestamp and filesize from the PE header as listed in the page that Griff linked to. For PDB files however, you will need the DBH command from the Windows Debugging Tools. Simply load the PDB file into DBH and use the INFO command to get the PdbSig/PdbSig70 and PdbAge. Bam! That’s it.
I just created the appropriate folders for the PDB files that I had in my SYSTEM32 folder for some reason, and finally moved them to the local symbol store.
Try looking at this page: Symbol Server Callback Function
EXE/DLL directory name is created by concatenating hex string of the "file modified" time-stamp and "SizeOfImage" from IMAGE_OPTIONAL_HEADER
Finding PE files
The format for the path to a PE file in a symbol server share is:
"%s\%s\%08X%x\%s" % (serverName, peName, timeStamp, imageSize, peName)
Example:
https://msdl.microsoft.com/download/symbols/ntdll.dll/B29ECF521f0000/ntdll.dll
Finding PDB files
The format for the path to a PDB file in a symbol server share is:
"%s\%s\%s%x\%s" % (serverPath, pdbName, guid, age, pdbName)
Example:
https://msdl.microsoft.com/download/symbols/ntdll.pdb/4BC147AE72E8D05022366D6570A8E3461/ntdll.pdb
Source: Symbols the Microsoft Way by Bruce Dawson.
You can find the answer,
SYMBOL RETRIEVER SHELL EXTENSION
; http://www.vitoplantamura.com/index.aspx?page=symretriever
DebugDir.cpp
; http://www.debuginfo.com/examples/src/DebugDir.cpp
PDB File Internals
; http://www.informit.com/articles/article.aspx?p=22685