Microsoft Symbol Server / Local Cache Hash Algorithm - hash

I am trying to figure out what hashing algorithm is used for the Microsoft Symbol Local Cache directory.
For example, the local cache can be something like the following
L:\Symbols
\browseui.dll
\44FBC679fe000
browsue.dll
\browseui.pdb
\44F402F62
browseui.pdb
\explorer.exe
\3EBF1F14f7000
explorer.exe
\explorer.pdb
\3EBF1F141
explorer.pdb
\msvcr71.pdb
\60D915C6AB6A4F3586E9096E2F8856482
msvcr71.pdb
There seems to be some sort of correspondence between a file and its debug database. Other than that, I can’t figure out how the names of these (presumably) hexadecimal string folders are being generated.
Some of them are 9 digits, some 13 digits, and others are 33 digits. It looks like an actual, live-file (which for some reason is stored in the symbol cache) has a 13-digit hash while its (nearly similar) debug database gets a 9-digit hash. Some debug databases get a 13-digit hash; can’t figure out what makes these ones special, although they don’t have a corresponding live-file.
I’ve tried hashing the files with every kind of hash algorithm that I know of (39 of them) and none match in any way (straight up, reversed, alternate endian’d, etc.)
Any ideas?
Update
I think I finally found it. From Symbol Storage Format:
SymStore uses the file system itself as a database. It creates a large tree of directories, with directory names based on such things as the symbol file time stamps, signatures, age, and other data.
Edit
Dang, unfortunately it only mentions that the directory name is derived from various aspects (not quite a hash I guess), but does not say exactly how. The search continues… :-(

This page has info on calculating the IDs for the symbol files as well as executables/DLLs.
Basically, for executables and DLLs, you extract the timestamp and filesize from the PE header as listed in the page that Griff linked to. For PDB files however, you will need the DBH command from the Windows Debugging Tools. Simply load the PDB file into DBH and use the INFO command to get the PdbSig/PdbSig70 and PdbAge. Bam! That’s it.
I just created the appropriate folders for the PDB files that I had in my SYSTEM32 folder for some reason, and finally moved them to the local symbol store.

Try looking at this page: Symbol Server Callback Function

EXE/DLL directory name is created by concatenating hex string of the "file modified" time-stamp and "SizeOfImage" from IMAGE_OPTIONAL_HEADER

Finding PE files
The format for the path to a PE file in a symbol server share is:
"%s\%s\%08X%x\%s" % (serverName, peName, timeStamp, imageSize, peName)
Example:
https://msdl.microsoft.com/download/symbols/ntdll.dll/B29ECF521f0000/ntdll.dll
Finding PDB files
The format for the path to a PDB file in a symbol server share is:
"%s\%s\%s%x\%s" % (serverPath, pdbName, guid, age, pdbName)
Example:
https://msdl.microsoft.com/download/symbols/ntdll.pdb/4BC147AE72E8D05022366D6570A8E3461/ntdll.pdb
Source: Symbols the Microsoft Way by Bruce Dawson.

You can find the answer,
SYMBOL RETRIEVER SHELL EXTENSION
; http://www.vitoplantamura.com/index.aspx?page=symretriever
DebugDir.cpp
; http://www.debuginfo.com/examples/src/DebugDir.cpp
PDB File Internals
; http://www.informit.com/articles/article.aspx?p=22685

Related

2 files with same hash, but 1 is corrupted and 1 isn't

I found something very weird on a project.
I have 2 files :
One is the input file, it's a .bip file which you can open with GIS software like QGIS
here's the input. this file is provided by the CCSDS and accessible here
The other is the output after been compressed and decompressed by a lossless compression algorithm (CCSDS 123 by ESA)
Those 2 files shares the exact same sha256 and sha1 hash, so they are identical.
3226009de97d66589fc58cdc9af377e6315ccc69a7095bec8dc04447bf3cea2e test_ptn_x100y36z17_16u.bip
3226009de97d66589fc58cdc9af377e6315ccc69a7095bec8dc04447bf3cea2e test_ptn_decomp.bip (sha256 shown here).
The thing is, if the entry is showed by QGIS, the second one displays a message and refuses to open it shows this message (translated : the file test_ptn_decomp.bip is not a recognized or valid data source)
Is there something i don't understand with hashes ? i've tried moving files to other directories and renaming but nothing changes QGIS wise.
It is highly unlikely you got a different content with same sha256 hash by chance. So I'll assume the files are identical. Anyway, it is easy to use any diff program to compare.
So there should be some other differences, things that come to mind:
file name might contain some meaningful information needed by QGIS. Try renaming decompressed file e.g. decomp_ptn_x100y36z17_16u.bip, maybe x100.. is essential?
There are some additional files, that must have matching names. Do you have a .hdr file, as explained in QGIS tutorials?
https://www.qgistutorials.com/en/docs/open_bil_bip_bsq_files.html

BSP Application upload generates hashed filenames without folder structure

After the upload the a sapui5 application on the SAP system has a strange structure. The files are not in the same structure as they were on my machine and the filenames are hashed, except MIMEs. So I am not able to find e.g. a specific "controller.js". The application is still fully working.
In this specific case the SAP Program "/UI5/UI5_REPOSITORY_LOAD" was used to upload the application. The upload protocol looks fine, no hint about renaming or similar. So I am not sure if the problem is with the system or the program.
All the hash files name should be normal naming and should be in sub-folders components. Even the "index.html" file has a hash, this cases a problem when click on "test application", because it opens the hash in the URL. The hash, which is the path and the filename cannot be opened, but if I replace the hash with the original path -> it works
http://scn.sap.com/thread/3809662
A work colleague found the issue in the sap system. It seems it is not allowed to have path + filename longer than 70 characters. If it is longer it hashes the path and filename to place it under the project root folder.
The german comment seems very strange, though ... "Name length should not be a
problem. Do we have anyhow a max length?"
It also creates a file containing the mapping from filepath + filename to hash.
You should not use file names longer than 70 characters.
Also you should not use '-' in your file names.
As far as I know the only allowed characters for valid BSP paths are:
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789./_"

When and how might the operating system store a file under a different name than I gave it?

I found this statement under another SO question concerning Unicode and I'd like to ask for further elaboration of this rather surprising fact.
Code that believes once you successfully create a file by a given name, that when you run ls or readdir on its enclosing directory,
you'll actually find that file with the name you created it under is
buggy, broken, and wrong. Stop being surprised by this!
When does this happen and what to do about it?
The first example which comes to my mind: If you create a file under OSX that is named é (single U+00E9 codepoint), the OS will store it actually as U+0065 U+0301 (Unicode decomposition). The file will be still accessible under the original name, but listed as decomposed.
How to avoid: don't lookup your files manually unless you are sure their names are pure ASCII.
Second: On Windows, if you have a file called e, try creating (with overwriting enabled) a file called E, the OS will still list a file called e. If e didn't exists beforehand, a file called E would be created.
How to avoid: don't lookup your files manually unless you are sure their names are pure ASCII, and take case into account. Try using a consistent capitalisation style. I suggest going all lowercase.
Third: on Windows, if for example you have Windows 1250 as your system encoding, and you want to create a file named ê via the narrow, char-based API, a file called e will be created instead. This of course is easy to avoid, but this exact problem bit me once: WinRAR extracted files ê.png, è.png and e.png all into e.png, overwriting data. Similar problems can happen with other encoding mixups, too.
How to avoid: don't use API's that take the filename as a char* on Windows.

how can we identify notepad file?

how can we identify notepad files which is created in two computer, is there a any way to get any information about in which computer it was created.Or whether it is build in xp or linux.
If you right click on the file, you should be able to see the permissions and attributes of the file.
Check at the end of the line. Under GNU/Linux lines end with \n (ascii: 0x0A) while under Miscrosoft W$ndos it is \r\n (ascii: 0x0D 0x0A).
Wikipedia: https://en.wikipedia.org/wiki/Newline
found this: http://bit.ly/J258Mr
for identifying a word document but some of the info is relevant
To see on which computer the document had been created, open the Word
document in a hex editor and look for "PID_GUID". This is followed by
a globally unique identifier that, depending upon the version of Word
used, may contain the MAC address of the system on which the file was
created.
Checking the user properties (as already mentioned) is a good way to
see who the creator of the original file was...so, if the document was
not created from scratch and was instead originally created on another
system, then the user information will be for the original file.
Another way to locate the "culprit" in this case is to parse the
contents of the NTUSER.DAT files for each user on each computer. While
this sounds like a lot of work, it really isn't...b/c you're only
looking for a couple of pieces of information. Specifically, you're
interested in the MRU keys for the version of Word being used, as well
as perhaps the RecentDocs keys."
The one thing I can think on the top of my mind is inspecting the newline characters on your file - I'm assuming your files do have multiple lines. If the file was generated using Windows then a newline would be characterized by the combination of carriage return and line feed characters (CR+LF) whereas a simple line feed (LF) would be a hint that the file was generated in a Linux machine.
Right click one the file--> Details . You can see the computer name where it was created and the date.

_NT_SYMBOL_PATH format

I'm trying to use windbg more, and I keep having problems with the symbol cache. It isn't clear to me what the format of the string is supposed to be.
I have a few requirements:
use Microsoft's server http://msdl.microsoft.com/download/symbols
use symbols from our software that are archived at \\foo\Build1234
use a local cache at c:\dev\symbols
The archive of symbols from our distributed build at \\foo\Build1234 are not organized as a symbol server. If I understand it correctly, I need to use the cache keyword.
Given these requirements, does this look like a properly formatted srvpath:
cache*\\foo\Build1234;srv*c:\dev\symbols*http://msdl.microsoft.com/download/symbols
Edit:
I just started reading Advanced Windows Debugging and I had misinterpreted how the cache keyword works. I thought it was a way of telling the debugger that the folder is just a folder of files and not a symbol server. After Michael left his comment, I reread the section and see that it indeed works as Michael described.
Now I'm confused by when you use a ; or a * to separate paths/URLs. And when you need the srv* prefix. In the online help for windbg they give this example:
\\someshare\that\cachestar\ignores;srv*c:\mysymbols*http://msdl.microsoft.com/download/symbols;cache*c:\mysymbols;\\anothershare\that\gets\cached
The symbols from \\someshare are not cached, symbols from Microsoft are cached in c:\mysymbols, and c:\mysymbols is used as the cache for any other paths to the right of the cache* directive.
The occasional use of srv* is confusing me -- I don't understand why the first and last paths aren't prefixed with srv*.
Edit 2:
This is slowly starting to make sense to me. The srv directive is used for symbol servers, and not for normal symbol directories. So, I believe the answer to my original question is this:
\\foo\Build1234;cache*c:\dev\symbols;srv*http://msdl.microsoft.com/download/symbols
SRV*C:\dev\symbols*http://msdl.microsoft.com/download/symbols;\\foo\build1234
Should work fine, if \\foo\build1234 is just flat PDB's. Cache isn't needed here; you just need to add the directory to your symbol path.
The cache keyword specifies where you want to cache your symbol files, and is useful for caching symbols locally from non-indexed shares (like \\foo\build1234)
cache*C:\dev\symbols;SRV*C:\dev\symbols*http://msdl.microsoft.com/download/symbols;\\foo\build1234
The above path would store symbols from MS's symbol server and your symbol share to your local machine in C:\dev\symbols.
To debug symbol issues using windbg, do
!sym noisy
.reload <some exe or DLL in your session>
And then do some action that would force the PDB to be loaded. You'll see where windbg is looking for files, and if it rejects a PDB why it did so.
!sym quiet
Will then suppress symbol prompts.
Here's a detailed post on debugging issues with symbols loading.
Loading symbols in Windbg