Windows Installer Automation and Installshield Basic MSI: Mystery String during Chained MSI

EDIT: Turns out the mystery string was a simple MD5 hash of the name of the file (including the extension and capitalization).
I'm attempting to automate the process of creating a Chained MSI through InstallShield. In the GUI, this involves going to Releases, adding a chained package, linking to the MSI and streaming the file into the project.
I've reverse engineered what exactly happens behind the scenes by analyzing the project file as XML. It essentially comes down to table edits. I understand you can use Windows Installer Automation to open an *.ism file and access its database tables.
Yet there is a single field in the ISChainPackageData table that I cannot generate or figure out how it was calculated: the column titled File. It is a 32-character hex string preceded by an underscore. I have discovered that the only attribute that determines this field is the name of the MSI file being streamed. For example:
Linking to a chained MSI by the name of Test.msi, yields _29B31F67F21C9EE77CBF8C4C5D24ACE9.
Changing the name would change this. Changing the file, including replacing it with an empty file of the same name, does not.
I believe it is some kind of simple hash of the name, but I haven't had any luck guessing it.
Does anyone have any insight on what they might be using here?
Thanks!
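Going by the EDIT at the top, the key can be sketched in a few lines of Python. The helper name is hypothetical, and the byte encoding is an assumption: the EDIT doesn't say whether the name is hashed as single-byte text or as UTF-16, so compare the output for Test.msi against _29B31F67F21C9EE77CBF8C4C5D24ACE9 before relying on it.

```python
import hashlib


def chain_package_file_key(file_name: str) -> str:
    """Hypothetical helper for the ISChainPackageData.File column.

    Assumes the value is "_" + uppercase hex MD5 of the file name exactly as
    given (extension and capitalization included). Hashing the UTF-8 bytes
    is an unverified assumption about how InstallShield encodes the name.
    """
    digest = hashlib.md5(file_name.encode("utf-8")).hexdigest()
    return "_" + digest.upper()


if __name__ == "__main__":
    # Check this against the value InstallShield wrote for Test.msi.
    print(chain_package_file_key("Test.msi"))
```

Because capitalization is part of the input, Test.msi and test.msi produce different keys, which matches the observation that only the name matters.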

Close. It's a hash-based GUID of a combination of a few things. I'd have to dig up the code to find out exactly what, but it's at least the relative path and filename, and possibly something related to the package in question (probably its primary key value).
This is used to generate a unique key for each file you include with a package, without allowing duplicate files. (Windows Installer doesn't like backslashes in its primary keys.) The actual value here isn't meaningful; if you're careful to avoid duplicate keys and don't overlap file path and name combinations, you can probably put in any valid key value you like. However, that may prevent the IDE from detecting duplicates itself.

Related

Powershell: dealing with / in Registry property names

Given this (real) Registry path...
HKEY_CLASSES_ROOT\Local Settings\MrtCache\C:%5CProgram Files%5CWindowsApps%5C89006A2E.AutodeskSketchBook_1.6.0.0_x64__tf1gferkr813w%5Cresources.pri\1d3438f5876f755\6dfb7f2f\#{89006A2E.AutodeskSketchBook_1.6.0.0_x64__tf1gferkr813w?ms-resource://89006A2E.AutodeskSketchBook/Files/Assets/AppLogo/Orion_Tiny.png}
where the property name is...
#{Microsoft.Office.OneNote_17.8625.20901.0_x64__8wekyb3d8bbwe?ms-resource://Microsoft.Office.OneNote/Files/images/OneNoteAppList.png}
I am trying to figure out how to properly deal with extracting the data value. I need to differentiate the path to the containing key from the property name, but because the property name contains / and Split-Path converts those to \ and treats them as key delimiters, I get bad data out of that Cmdlet. From a programming standpoint the solution is to not start with a single path. However, I am somewhat constrained by existing data in XML files that provides only a single path. For 99.9% of cases, including drive and UNC file & folder paths, registry Key & Property paths as well as URL paths, Split-Path works. But for this very specific situation it fails. Is there a .NET solution that can be depended on? Or is this a case where there is no solution other than to break up the data and curse Microsoft for their inconsistency and incomplete solutions?
I get that this example is probably a situation I will never actually run into, but I have been burned before with things like assuming anything with an extension is a file and then finding someone (usually Autodesk) has decided to name a bunch of folders with a . in the name, causing code with that assumption about naming conventions to fail. So I am looking for a consistent way to deal with this, if one exists. Ideally in PS 5.1, not PS Core, as I cannot and will not demand that all my users upgrade to a new version of PowerShell to address such an edge case.
EDIT: I should also mention that a similar issue arises when there is a / in a key name and I want to verify that the key exists. HKLM\Software\Key/Name is a perfectly valid path, yet Split-Path fails every time and comes back with Name as the leaf, not Key/Name. It seems Split-Path doesn't actually understand what's valid in a registry key name.
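One way around Split-Path is to split only at the last backslash. Registry key names can contain "/" but never "\", so the last backslash always separates a key from its parent. Value names, however, are allowed to contain backslashes, so for a value path this is a heuristic that only works when the value name has none (true for the #{...} example above). A minimal Python sketch, with a hypothetical function name:

```python
from typing import Tuple


def split_registry_path(path: str) -> Tuple[str, str]:
    """Split a registry path into (parent key, leaf) at the LAST backslash only.

    "/" is treated as an ordinary character, so "Key/Name" or
    "#{...ms-resource://...}" stays a single leaf. Assumption: the leaf
    contains no backslash (always true for key names, not guaranteed for
    value names).
    """
    parent, sep, leaf = path.rpartition("\\")
    if not sep or not parent or not leaf:
        raise ValueError(f"cannot split registry path: {path!r}")
    return parent, leaf


if __name__ == "__main__":
    print(split_registry_path(r"HKLM\Software\Key/Name"))
    # ('HKLM\\Software', 'Key/Name')
```

In PowerShell 5.1 the same split can be done with the .NET String.LastIndexOf and Substring methods instead of Split-Path.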

What is this symbol ``, and why can't some people see it?

The symbol is this one: 
It crops up at the end of folder names sometimes on our filesystem. The people who put the folder there swear that they couldn't see it. I think it might be being produced by one of our internal asset building tools, perhaps in response to something copied in from a Google docs spreadsheet?
Sorry that's so vague...

It looks like the culprit could be Windows CHKDSK (analogous to fsck). From https://github.com/dw/scratch/blob/master/ntfs-3g-chkdsk-unicode-fix.py:
# … Windows
# CHKDSK, which will immediately replace all the invalid characters with
# Unicode ordinals in the private use plane.

This symbol is used by OS X if you add a trailing space to a folder name on a FAT file system.
Why? Because trailing spaces are forbidden in folder names on Windows. Since OS X allows trailing spaces in general, it extends this behavior to FAT drives, and the private-use Unicode character keeps the file system valid at the same time.
While you're on the Mac, you barely notice the extra character because it is rendered as a space there. Only after the drive is moved to Windows or Linux does the odd Unicode character show up as unprintable.
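The substitution this answer describes can be sketched as a pair of Python helpers. The function names are hypothetical, and mapping each trailing space to one U+F028 (and only trailing spaces) is an assumption based on the description above, not a documented OS X algorithm.

```python
PRIVATE_SPACE = "\uF028"  # the private-use character discussed above


def to_fat_name(name: str) -> str:
    """Replace each trailing space with U+F028 (assumed one-for-one)."""
    stripped = name.rstrip(" ")
    return stripped + PRIVATE_SPACE * (len(name) - len(stripped))


def from_fat_name(name: str) -> str:
    """Undo to_fat_name: turn trailing U+F028 characters back into spaces."""
    stripped = name.rstrip(PRIVATE_SPACE)
    return stripped + " " * (len(name) - len(stripped))
```

Spaces in the middle of a name are left alone, which is why the odd character only ever shows up at the end of folder names.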

This is the Unicode character U+F028, which falls in the Private Use Area U+E000–U+F8FF. This means its use is application-specific. You are probably right that some tool produces this character and fails to remove or replace it when copying. Wikipedia lists some vendor usage examples, but I doubt that list will help much.
To find the cause of the problem, I would first ask the users who created the folders which operating systems they use and which applications they copied the text from. If you think one of your build tools is the cause, you could search for that character in all log files of those tools and in any databases the tools use.
As for the actual question in the title, this character does not have a defined symbol. Some applications show a default symbol, while other applications may just ignore it and show nothing.
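To find every affected folder (or file) before hunting for the tool responsible, a short Python walk over the tree can flag names containing any Private Use Area code point from the range above. This is a sketch; the function names are hypothetical.

```python
import os
import sys

PUA_START, PUA_END = 0xE000, 0xF8FF  # BMP Private Use Area


def private_use_chars(name: str) -> list:
    """Return the private-use code points in `name`, formatted as U+XXXX."""
    return [f"U+{ord(c):04X}" for c in name if PUA_START <= ord(c) <= PUA_END]


def find_private_use_names(root: str) -> list:
    """Walk `root` and return (path, code points) for every offending name."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            chars = private_use_chars(name)
            if chars:
                hits.append((os.path.join(dirpath, name), chars))
    return hits


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, chars in find_private_use_names(root):
        # ascii() makes the otherwise invisible character show up as \uf028.
        print(ascii(path), chars)
```

The modification times of the flagged folders can then be matched against the logs of the suspected build tools.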

When and how might the operating system store a file under a different name than I gave it?

I found this statement under another SO question concerning Unicode and I'd like to ask for further elaboration of this rather surprising fact.
Code that believes once you successfully create a file by a given name, that when you run ls or readdir on its enclosing directory, you'll actually find that file with the name you created it under is buggy, broken, and wrong. Stop being surprised by this!
When does this happen and what to do about it?

The first example that comes to mind: if you create a file under OS X named é (the single code point U+00E9), the OS will actually store it as U+0065 U+0301 (Unicode decomposition). The file will still be accessible under the original name, but it will be listed in decomposed form.
How to avoid: don't look up your files manually unless you are sure their names are pure ASCII.
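If you do need to match non-ASCII names, another option is to compare them after Unicode normalization, so the precomposed é and the decomposed e + U+0301 are treated as equal. A Python sketch using the standard unicodedata module (the function name is hypothetical):

```python
import os
import unicodedata


def find_entry(directory: str, wanted: str):
    """Return the name as actually stored in `directory` that matches `wanted`
    after NFC normalization, or None if there is no such entry."""
    target = unicodedata.normalize("NFC", wanted)
    for entry in os.listdir(directory):
        if unicodedata.normalize("NFC", entry) == target:
            return entry
    return None
```

Returning the stored spelling, rather than a boolean, lets later code open or compare the file using whatever form the file system actually kept.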
Second: on Windows, if you have a file called e and try creating (with overwriting enabled) a file called E, the OS will still list a file called e. If e didn't exist beforehand, a file called E would be created.
How to avoid: don't look up your files manually unless you are sure their names are pure ASCII, and take case into account. Try using a consistent capitalisation style. I suggest going all lowercase.
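Taking case into account can also be done at lookup time. The Python sketch below (hypothetical function name) matches names ignoring both case and normalization form. str.casefold() is Python's Unicode case folding, which is only an approximation of the case table NTFS uses, and the function returns a list because a case-sensitive file system can hold both e and E.

```python
import os
import unicodedata


def find_entries_ci(directory: str, wanted: str) -> list:
    """Return every entry in `directory` whose name equals `wanted`
    ignoring case and Unicode normalization form."""
    def fold(name: str) -> str:
        return unicodedata.normalize("NFC", name).casefold()

    target = fold(wanted)
    return sorted(e for e in os.listdir(directory) if fold(e) == target)
```

An empty result means the name is free; more than one result means the directory would be ambiguous on Windows.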
Third: on Windows, if, for example, your system encoding is Windows-1250 and you create a file named ê via the narrow, char-based API, a file called e will be created instead. This is easy to avoid, but this exact problem bit me once: WinRAR extracted files ê.png, è.png and e.png all into e.png, overwriting data. Similar problems can happen with other encoding mix-ups, too.
How to avoid: don't use APIs that take the filename as a char* on Windows.
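The WinRAR collision can be reproduced roughly in Python. Windows-1250 has no ê or è (but does have é), so a narrow API has to substitute something. Python has no function that reproduces Windows' actual best-fit tables, so the sketch below uses a swapped-in approximation: unrepresentable characters are reduced to their NFKD base letters, or "?" if even that fails. The function names are hypothetical.

```python
import unicodedata


def approx_narrow_name(name: str, codepage: str = "cp1250") -> str:
    """Rough stand-in (NOT Windows' real best-fit table) for a lossy
    wide-to-narrow conversion of a file name."""
    out = []
    for ch in name:
        try:
            ch.encode(codepage)
            out.append(ch)
        except UnicodeEncodeError:
            # Drop combining marks: "ê" -> "e" + U+0302 -> "e".
            base = "".join(c for c in unicodedata.normalize("NFKD", ch)
                           if not unicodedata.combining(c))
            try:
                base.encode(codepage)
                out.append(base or "?")
            except UnicodeEncodeError:
                out.append("?")
    return "".join(out)


def narrow_collisions(names, codepage: str = "cp1250") -> dict:
    """Group names that would map to the same narrow name."""
    groups = {}
    for n in names:
        groups.setdefault(approx_narrow_name(n, codepage), []).append(n)
    return {k: v for k, v in groups.items() if len(v) > 1}
```

For example, narrow_collisions(["ê.png", "è.png", "e.png"]) groups all three under e.png, the same outcome as the WinRAR extraction.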

Configuration Key Value Store

I'm in the planning stages of a script/app that I'm going to need to write soon. In short, I'm going to have a configuration file that stores multiple key/value pairs for a system configuration. Various applications, including Python, shell, and rc scripts, will talk to this file.
One example case would be that when the system boots, it pulls the static IP to assign to itself from that file. This means it would be nice to quickly grab a key/value from this file in a shell/rc script (ifconfig `evalconffile main_interface` `evalconffile primary_ip` up), where evalconffile is the script that fetches the value when provided with a key.
I'm looking for suggestions on the best way to approach this. I've tossed around the idea of using a plain text file and Perl to retrieve the value. I've also considered using YAML for the configuration file, since there may end up being a use case where we need multiple values for a key, plus general expansion. I know YAML would make it accessible from Python and Perl, but I'm not sure what the best way to access it quickly from a shell/rc script would be.
Am I headed in the right direction?

One approach would be to simply do the YAML as you wanted, and then when a shell/RC script wants a key/value pair, it would call a small Perl script (the evalconffile in your example) that would parse the YAML on the shell script's behalf and print out the value(s).
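The same "small helper prints the value" idea can be sketched in Python. This version swaps YAML for the plain-text option from the question, because Python's standard library has no YAML parser; the key = value format, the /etc/system.conf default and the EVALCONF_FILE variable are all hypothetical.

```python
import os
import sys

# Hypothetical default location; override with the (also hypothetical)
# EVALCONF_FILE environment variable.
CONF_PATH = os.environ.get("EVALCONF_FILE", "/etc/system.conf")


def load_conf(path: str) -> dict:
    """Parse 'key = value' lines; blank lines and '#' comments are skipped."""
    values = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue
            key, sep, value = line.partition("=")
            if sep:
                values[key.strip()] = value.strip()
    return values


def main(argv, path=CONF_PATH) -> int:
    if len(argv) != 2:
        print("usage: evalconffile KEY", file=sys.stderr)
        return 2
    value = load_conf(path).get(argv[1])
    if value is None:
        return 1  # unknown key: print nothing and fail so the rc script can notice
    print(value)
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Installed as evalconffile, the shell side stays exactly as in the question, and the non-zero exit status for a missing key lets an rc script detect a misconfiguration.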

SQLite will give you the greatest flexibility, since you don't seem to know the scope of what will be stored in there. It appears to be supported in all the scripting languages you mentioned.
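A minimal sketch of the SQLite route using Python's built-in sqlite3 module. The table layout and function names are hypothetical; a composite primary key on (key, value) is one way to allow several values per key, which the question anticipates needing.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS config (
    key   TEXT NOT NULL,
    value TEXT NOT NULL,
    PRIMARY KEY (key, value)
)
"""


def open_store(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def add_value(conn: sqlite3.Connection, key: str, value: str) -> None:
    # "with conn" commits the insert; duplicates of (key, value) are ignored.
    with conn:
        conn.execute("INSERT OR IGNORE INTO config (key, value) VALUES (?, ?)",
                     (key, value))


def get_values(conn: sqlite3.Connection, key: str) -> list:
    rows = conn.execute("SELECT value FROM config WHERE key = ? ORDER BY rowid",
                        (key,))
    return [r[0] for r in rows]
```

From a shell or rc script, the sqlite3 command-line tool can read the same file, for example sqlite3 /path/to/config.db "SELECT value FROM config WHERE key = 'primary_ip'" (the path is hypothetical), which prints one value per line.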

Microsoft Symbol Server / Local Cache Hash Algorithm

I am trying to figure out what hashing algorithm is used for the Microsoft Symbol Local Cache directory.
For example, the local cache can look something like the following:
L:\Symbols
    \browseui.dll
        \44FBC679fe000
            browseui.dll
    \browseui.pdb
        \44F402F62
            browseui.pdb
    \explorer.exe
        \3EBF1F14f7000
            explorer.exe
    \explorer.pdb
        \3EBF1F141
            explorer.pdb
    \msvcr71.pdb
        \60D915C6AB6A4F3586E9096E2F8856482
            msvcr71.pdb
There seems to be some sort of correspondence between a file and its debug database. Other than that, I can’t figure out how the names of these (presumably) hexadecimal string folders are being generated.
Some of them are 9 digits, some 13 digits, and others 33 digits. It looks like an actual live file (which for some reason is stored in the symbol cache) has a 13-digit hash, while its (nearly similar) debug database gets a 9-digit hash. Some debug databases get a 33-digit hash; I can't figure out what makes these ones special, although they don't have a corresponding live file.
I’ve tried hashing the files with every kind of hash algorithm that I know of (39 of them) and none match in any way (straight up, reversed, alternate endian’d, etc.)
Any ideas?
Update
I think I finally found it. From Symbol Storage Format:
SymStore uses the file system itself as a database. It creates a large tree of directories, with directory names based on such things as the symbol file time stamps, signatures, age, and other data.
Edit
Dang, unfortunately it only mentions that the directory name is derived from various aspects (not quite a hash I guess), but does not say exactly how. The search continues… :-(

This page has info on calculating the IDs for the symbol files as well as executables/DLLs.
Basically, for executables and DLLs, you extract the timestamp and the image size (SizeOfImage) from the PE header, as listed in the page that Griff linked to. For PDB files, however, you will need the DBH command from the Windows Debugging Tools. Simply load the PDB file into DBH and use the INFO command to get the PdbSig/PdbSig70 and PdbAge. Bam! That’s it.
I just created the appropriate folders for the PDB files that I had in my SYSTEM32 folder for some reason, and finally moved them to the local symbol store.
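Once DBH has given you the signature and age, the PDB directory name can be formatted as below. The 7.0 form (GUID then age) matches the 33-digit msvcr71.pdb entry in the question. The 2.0 form (32-bit PdbSig as %08X, then age) is inferred from the 9-digit names such as 3EBF1F141, so treat it as an assumption. Function names are hypothetical.

```python
import uuid


def pdb70_index_id(guid: uuid.UUID, age: int) -> str:
    """PDB 7.0: GUID as 32 uppercase hex digits, then the age in lowercase hex."""
    return guid.hex.upper() + format(age, "x")


def pdb20_index_id(signature: int, age: int) -> str:
    """Older PDB 2.0 (assumed): 32-bit signature as %08X, then the age as %x."""
    return "%08X%x" % (signature, age)


def guid_from_pdb_bytes(raw: bytes) -> uuid.UUID:
    """A GUID read straight from a file stores its first three fields
    little-endian, which is what uuid.UUID(bytes_le=...) expects."""
    return uuid.UUID(bytes_le=raw)
```

For example, pdb70_index_id(uuid.UUID("60D915C6-AB6A-4F35-86E9-096E2F885648"), 2) gives 60D915C6AB6A4F3586E9096E2F8856482, the msvcr71.pdb folder above.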

Try looking at this page: Symbol Server Callback Function

The EXE/DLL directory name is created by concatenating the hex strings of the TimeDateStamp from IMAGE_FILE_HEADER (the link-time stamp, not the "file modified" time) and SizeOfImage from IMAGE_OPTIONAL_HEADER.
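Both fields can be read directly from the PE headers without extra tools. The Python sketch below (hypothetical function name) follows the standard PE layout: e_lfanew at 0x3C, TimeDateStamp 8 bytes past the "PE\0\0" signature, and SizeOfImage at offset 56 of the optional header in both PE32 and PE32+. It then applies the %08X%x formatting that appears in the path format below.

```python
import struct


def pe_index_id(image: bytes) -> str:
    """Return TimeDateStamp (%08X) followed by SizeOfImage (%x) for a PE image."""
    if len(image) < 0x40 or image[:2] != b"MZ":
        raise ValueError("not an MZ/PE image")
    # e_lfanew (offset of the "PE\0\0" signature) is at 0x3C in the DOS header.
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # IMAGE_FILE_HEADER follows the signature: Machine (2), NumberOfSections (2),
    # then TimeDateStamp (4), i.e. 8 bytes past the start of the signature.
    (time_date_stamp,) = struct.unpack_from("<I", image, e_lfanew + 8)
    # IMAGE_OPTIONAL_HEADER starts after the 4-byte signature and the 20-byte
    # file header; SizeOfImage sits at offset 56 in both PE32 and PE32+.
    (size_of_image,) = struct.unpack_from("<I", image, e_lfanew + 24 + 56)
    return "%08X%x" % (time_date_stamp, size_of_image)


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as fh:
        print(pe_index_id(fh.read()))
```

Run against the browseui.dll from the question, this should reproduce its 44FBC679fe000 folder name if the layout above is right.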

Finding PE files
The format for the path to a PE file in a symbol server share is:
"%s\%s\%08X%x\%s" % (serverName, peName, timeStamp, imageSize, peName)
Example:
https://msdl.microsoft.com/download/symbols/ntdll.dll/B29ECF521f0000/ntdll.dll
Finding PDB files
The format for the path to a PDB file in a symbol server share is:
"%s\%s\%s%x\%s" % (serverPath, pdbName, guid, age, pdbName)
Example:
https://msdl.microsoft.com/download/symbols/ntdll.pdb/4BC147AE72E8D05022366D6570A8E3461/ntdll.pdb
Source: Symbols the Microsoft Way by Bruce Dawson.
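Given an index ID (from the PE header or from DBH), the two path formats above reduce to the same <file>/<id>/<file> layout. A small Python sketch with hypothetical function names, using ntpath so the local-cache variant produces Windows-style paths on any OS:

```python
import ntpath


def symbol_url(server: str, file_name: str, index_id: str) -> str:
    """HTTP symbol server layout: <server>/<file>/<id>/<file>."""
    return "/".join([server.rstrip("/"), file_name, index_id, file_name])


def local_cache_path(cache_root: str, file_name: str, index_id: str) -> str:
    """Same layout under a local Windows cache directory such as L:\\Symbols."""
    return ntpath.join(cache_root, file_name, index_id, file_name)
```

For instance, symbol_url("https://msdl.microsoft.com/download/symbols", "ntdll.dll", "B29ECF521f0000") rebuilds the ntdll.dll example URL above.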

You can find the answer in:
Symbol Retriever shell extension: http://www.vitoplantamura.com/index.aspx?page=symretriever
DebugDir.cpp: http://www.debuginfo.com/examples/src/DebugDir.cpp
PDB File Internals: http://www.informit.com/articles/article.aspx?p=22685
