2 files with same hash, but 1 is corrupted and 1 isn't

I found something very weird on a project.
I have 2 files:
One is the input file, a .bip file which you can open with GIS software like QGIS.
Here's the input. This file is provided by the CCSDS and accessible here.
The other is the output after being compressed and decompressed by a lossless compression algorithm (CCSDS 123 by ESA).
Those 2 files share the exact same sha256 and sha1 hashes, so they are identical.
3226009de97d66589fc58cdc9af377e6315ccc69a7095bec8dc04447bf3cea2e test_ptn_x100y36z17_16u.bip
3226009de97d66589fc58cdc9af377e6315ccc69a7095bec8dc04447bf3cea2e test_ptn_decomp.bip (sha256 shown here).
The thing is, QGIS displays the input file fine, but it refuses to open the second one and shows this message (translated: the file test_ptn_decomp.bip is not a recognized or valid data source).
Is there something I don't understand about hashes? I've tried moving the files to other directories and renaming them, but nothing changes QGIS-wise.

It is highly unlikely that you got different content with the same sha256 hash by chance, so I'll assume the files are identical. In any case, it is easy to confirm this with any diff program.
So there must be some other difference. Things that come to mind:
The file name might contain some meaningful information needed by QGIS. Try renaming the decompressed file to e.g. decomp_ptn_x100y36z17_16u.bip; maybe the x100.. part is essential?
There may be additional files that must have matching names. Do you have a .hdr file, as explained in the QGIS tutorials?
https://www.qgistutorials.com/en/docs/open_bil_bip_bsq_files.html
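To check both ideas at once, a short script can confirm that the two files really are byte-identical and list any companion files sitting next to each of them. This is only a sketch: the helper names are made up for illustration, and whether QGIS needs a same-named .hdr file for this particular data set is an assumption based on the tutorial above.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sidecar_files(path):
    """List sibling files that share the base name of `path` (e.g. a .hdr)."""
    target = Path(path)
    return sorted(
        sibling.name
        for sibling in target.parent.iterdir()
        if sibling.is_file() and sibling.stem == target.stem and sibling != target
    )


def compare(path_a, path_b):
    """Summarize content equality and the companion files of both inputs."""
    return {
        "same_content": sha256_of(path_a) == sha256_of(path_b),
        "sidecars_a": sidecar_files(path_a),
        "sidecars_b": sidecar_files(path_b),
    }
```

Run on test_ptn_x100y36z17_16u.bip and test_ptn_decomp.bip, matching content together with a companion file that exists for only one of them would point at a missing header rather than at the data itself.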


How to load .mat files onto Matlab? Basically what's wrong with my code?

For this project we have been given code, and will be changing some inputs and assumptions. I already have the original code, but just changing all the creator's file paths to match my own computer is giving me a lot of trouble. The following, and many variations of it, continually yield errors.
load \Users\myname\Library\Documents\...
The error is
Error using load
Unable to read file '\Users\myname\Library\Documents...'.
No such file or directory.
My files are stored in my Documents folder. Another person in my group, who is on Windows, has used
load C:\Users\hisname\Desktop\...
Is there something I'm missing in my line, similar to the C drive but on Mac? Or is my code just completely wrong? I'm able to load files in R quite easily, but Matlab is posing a huge hurdle. I have no experience with Matlab and have been asked simply to run this code.
On the Mac, path components are separated by /, not \. Thus, you should type
load /Users/myname/Documents/filename.mat
You can use the location bar at the top of the command window to change to the directory where your file is located, and then you can type
load filename
to load filename.mat.
Also, are you sure you have a Documents directory under Library? Why?
To run code from a file called "my_file.m", just open Matlab and type run my_file.m. This will run your script in the Command Window.
The load function is what you use to load a .mat file. These are normally files in which variables from your workspace are stored.

System.IO - Does BinaryReader/Writer read/write exactly what a file contains? (abstract concept)

I'm relatively new to C# and am attempting to adapt a text encryption algorithm I designed in wxMaxima into a Binary encryption program in C# using Visual Studio forms. Because I am new to reading/writing binary files, I am lacking in knowledge regarding what happens when I try to read or write to a filestream.
For example, instead of encrypting a text file as I've done in the past, say I want to encrypt an executable or any other form of binary file.
Here are a few things I don't understand:
When I open a file stream and use BinaryReader, will it read in an exact duplicate of absolutely everything in the file? I want to be able to, for example, read in an entire file, delete the original file, then create a new file with the old name and write the entire binary stream back. Will this reproduce the original file exactly, or will there be some sort of corruption that must otherwise be accounted for?
Because it's an encryption program, I was hoping to add a feature that would low-level "format" the original file before deleting it, so that it could not, in theory, be recovered by combing the physical data of a hard disk. If I use BinaryWriter to overwrite parts of the original file with gibberish, will it be put on the same spot on the hard disk, or will the file become fragmented and actually just redirect via the FAT to some other portion of the hard disk? Obviously there's no point in overwriting the original file with gibberish if it's not overwriting the original cluster on the hard disk.
For your first question: A BinaryReader is not what you want. The name is a bit misleading: it "Reads primitive data types as binary values in a specific encoding." You probably want a FileStream.
Regarding the second question: That will not be easy: please see the "How SDelete Works" section of SDelete for an explanation. Brief extract in case that link breaks in the future:
"Securely deleting a file that has no special attributes is relatively straight-forward: the secure delete program simply overwrites the file with the secure delete pattern. What is more tricky is securely deleting Windows NT/2K compressed, encrypted and sparse files, and securely cleansing disk free spaces.
Compressed, encrypted and sparse are managed by NTFS in 16-cluster blocks. If a program writes to an existing portion of such a file NTFS allocates new space on the disk to store the new data and after the new data has been written, deallocates the clusters previously occupied by the file."
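On the first question, reading a file as raw bytes and writing those same bytes back does give an identical file, as long as no text decoding gets in between. The sketch below illustrates the idea in Python rather than C# (in .NET, File.ReadAllBytes and File.WriteAllBytes play the same role); the function name is hypothetical.

```python
import hashlib


def round_trip(src_path, dst_path):
    """Copy src_path to dst_path through memory and report whether they match."""
    # "rb"/"wb" open the files in binary mode: no newline translation and
    # no character decoding is applied to the data.
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(data)
    # Re-read the copy and compare digests to confirm nothing changed.
    with open(dst_path, "rb") as check:
        copied = check.read()
    return hashlib.sha256(data).digest() == hashlib.sha256(copied).digest()
```

This says nothing about where on the disk the new bytes end up, which is the harder part covered by the SDelete notes above.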

Extracting file names from an online data server in Matlab

I am trying to write a script that will allow me to download numerous (1000s of) data files from a data server (e.g., http://hydro1.sci.gsfc.nasa.gov/thredds/catalog/GLDAS_NOAH10SUBP_3H/2011/345/). Unfortunately, the names of the files in each directory are not formatted consistently (the time at which each file was created is appended to the end of its name). I need to be able to specify the file name to subset the data (I have a special tool for these data types) and download it. I cannot find a function in Matlab that will extract the file names.
I have looked at URLREAD, but it downloads everything, including the HTML code.
Thanks for your help!
You can easily parse the links out of the page.
x=urlread(url)
links=regexp(x,'<a href=''([^>]+)''>','tokens')
This reads every link; you then have to filter out the unwanted ones.
For example, this gets all .grb files (the dot is escaped so that it matches a literal "."):
a=regexp(x,'<a href=''([^>]+\.grb)''>','tokens')
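If the regular expression turns out to be too brittle (for instance when the page uses double quotes or puts extra attributes on the links), an HTML parser is an alternative. Here is a rough sketch of the same idea in Python rather than Matlab; the class and function names are made up for illustration, and the page layout of the server is an assumption.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(value)


def extract_links(html, suffix=""):
    """Return all link targets in `html` that end with `suffix`."""
    parser = LinkCollector()
    parser.feed(html)
    return [link for link in parser.links if link.endswith(suffix)]
```

The page text itself could be fetched with urllib.request.urlopen(url).read().decode() and then passed to extract_links(page, ".grb").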

Microsoft Symbol Server / Local Cache Hash Algorithm

I am trying to figure out what hashing algorithm is used for the Microsoft Symbol Local Cache directory.
For example, the local cache can be something like the following
L:\Symbols
    \browseui.dll
        \44FBC679fe000
            browseui.dll
    \browseui.pdb
        \44F402F62
            browseui.pdb
    \explorer.exe
        \3EBF1F14f7000
            explorer.exe
    \explorer.pdb
        \3EBF1F141
            explorer.pdb
    \msvcr71.pdb
        \60D915C6AB6A4F3586E9096E2F8856482
            msvcr71.pdb
There seems to be some sort of correspondence between a file and its debug database. Other than that, I can’t figure out how the names of these (presumably) hexadecimal string folders are being generated.
Some of them are 9 digits, some 13 digits, and others are 33 digits. It looks like an actual, live-file (which for some reason is stored in the symbol cache) has a 13-digit hash while its (nearly similar) debug database gets a 9-digit hash. Some debug databases get a 13-digit hash; can’t figure out what makes these ones special, although they don’t have a corresponding live-file.
I’ve tried hashing the files with every kind of hash algorithm that I know of (39 of them) and none match in any way (straight up, reversed, alternate endian’d, etc.)
Any ideas?
Update
I think I finally found it. From Symbol Storage Format:
SymStore uses the file system itself as a database. It creates a large tree of directories, with directory names based on such things as the symbol file time stamps, signatures, age, and other data.
Edit
Dang, unfortunately it only mentions that the directory name is derived from various aspects (not quite a hash I guess), but does not say exactly how. The search continues… :-(
This page has info on calculating the IDs for the symbol files as well as executables/DLLs.
Basically, for executables and DLLs, you extract the timestamp and filesize from the PE header as listed in the page that Griff linked to. For PDB files however, you will need the DBH command from the Windows Debugging Tools. Simply load the PDB file into DBH and use the INFO command to get the PdbSig/PdbSig70 and PdbAge. Bam! That’s it.
I just created the appropriate folders for the PDB files that I had in my SYSTEM32 folder for some reason, and finally moved them to the local symbol store.
Try looking at this page: Symbol Server Callback Function
The EXE/DLL directory name is created by concatenating the hex string of the time stamp (the TimeDateStamp field stored in the PE file header, not the file system's "modified" time) and the "SizeOfImage" field from IMAGE_OPTIONAL_HEADER.
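For what it's worth, those two fields can be read straight out of the file with a few lines of Python. This is a sketch, not the symbol server's own code: the function name is hypothetical, and the offsets are the ones I understand the PE/COFF layout to use (e_lfanew at 0x3C, TimeDateStamp 8 bytes after the start of the "PE\0\0" signature, SizeOfImage 56 bytes into the optional header).

```python
import struct


def pe_symbol_key(data):
    """Build the <timestamp><image size> folder name from raw PE file bytes."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")
    # e_lfanew: offset of the PE signature, stored at 0x3C in the DOS header.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file: missing PE signature")
    # IMAGE_FILE_HEADER follows the 4-byte signature; TimeDateStamp is its
    # third field (after Machine and NumberOfSections, 2 bytes each).
    (timestamp,) = struct.unpack_from("<I", data, pe_offset + 8)
    # The optional header starts after the 20-byte file header.
    (size_of_image,) = struct.unpack_from("<I", data, pe_offset + 24 + 56)
    # Upper-case, zero-padded timestamp followed by lower-case size.
    return "%08X%x" % (timestamp, size_of_image)
```

Usage would be something like pe_symbol_key(open("explorer.exe", "rb").read()), which should give the folder name seen in the cache if the assumptions above hold.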
Finding PE files
The format for the path to a PE file in a symbol server share is:
"%s\%s\%08X%x\%s" % (serverName, peName, timeStamp, imageSize, peName)
Example:
https://msdl.microsoft.com/download/symbols/ntdll.dll/B29ECF521f0000/ntdll.dll
Finding PDB files
The format for the path to a PDB file in a symbol server share is:
"%s\%s\%s%x\%s" % (serverPath, pdbName, guid, age, pdbName)
Example:
https://msdl.microsoft.com/download/symbols/ntdll.pdb/4BC147AE72E8D05022366D6570A8E3461/ntdll.pdb
Source: Symbols the Microsoft Way by Bruce Dawson.
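The two format strings above can be checked against the example URLs with a small script. The function names and the split of each example into its parts (timestamp 0xB29ECF52 and image size 0x1f0000 for the DLL, age 1 for the PDB) are my own reading of those examples; forward slashes are used because the examples are URLs.

```python
def pe_store_path(server, pe_name, timestamp, image_size):
    """Path of a PE file: <server>/<name>/<TIMESTAMP><size>/<name>."""
    return "%s/%s/%08X%x/%s" % (server, pe_name, timestamp, image_size, pe_name)


def pdb_store_path(server, pdb_name, guid, age):
    """Path of a PDB file: <server>/<name>/<GUID><age>/<name>."""
    return "%s/%s/%s%x/%s" % (server, pdb_name, guid, age, pdb_name)


SERVER = "https://msdl.microsoft.com/download/symbols"

dll_url = pe_store_path(SERVER, "ntdll.dll", 0xB29ECF52, 0x1F0000)
pdb_url = pdb_store_path(
    SERVER, "ntdll.pdb", "4BC147AE72E8D05022366D6570A8E346", 1
)
```

Note that the timestamp is upper-case and zero-padded to 8 digits while the size and age are plain lower-case hex, which explains the mixed case in folder names like 44FBC679fe000.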
You can find the answer in these resources:
SYMBOL RETRIEVER SHELL EXTENSION: http://www.vitoplantamura.com/index.aspx?page=symretriever
DebugDir.cpp: http://www.debuginfo.com/examples/src/DebugDir.cpp
PDB File Internals: http://www.informit.com/articles/article.aspx?p=22685

How do you compare the content of two archive files programmatically?

I'm doing some testing to ensure that the all-in-one zip file that I created using a script will have the same content as a few zip files that I must manually click and create via a web interface. The zips will therefore have different folder structures.
Of course I can manually extract them and use my powerful eyeball technique to scan them, or, even lazier, I can write a script to do that. But before I invest more time and get accused by my boss of company time robbery, I'm asking if there's a better way to do this.
I'm using a Perl LAMP stack, by the way.
Thanks.
You can use Perl's Archive::Zip or Python's zipfile to extract the filenames, sizes and CRC checksums of the files in the archives. Create a file which contains the results sorted by file name (ignore the path).
For your smaller ZIPs, merge the results of the script (cat list1 list2 list3 | sort).
Now, you can use diff to compare the results.
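With Python's zipfile, the whole comparison can also be done in one script without calling diff. This is a minimal sketch under the assumptions above (paths are ignored and base names are unique across the archives); the function names are made up for illustration.

```python
import zipfile
from pathlib import PurePosixPath


def zip_manifest(source):
    """Return sorted (base name, size, CRC) tuples for the files in an archive."""
    with zipfile.ZipFile(source) as archive:
        return sorted(
            # Names inside a zip always use "/" as separator; keep only the
            # last component so that the folder structure is ignored.
            (PurePosixPath(info.filename).name, info.file_size, info.CRC)
            for info in archive.infolist()
            if not info.is_dir()
        )


def same_contents(big_zip, small_zips):
    """True if the small archives together hold the same files as big_zip."""
    merged = sorted(
        entry for small in small_zips for entry in zip_manifest(small)
    )
    return merged == zip_manifest(big_zip)
```

The CRC values come from the archive's own directory, so nothing has to be extracted. Something like same_contents("all.zip", ["part1.zip", "part2.zip"]) would then do the check, with those file names being placeholders.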
I can wholeheartedly recommend Beyond Compare. Unless you're really getting underpaid, it's the biggest bang for your (boss's) buck.
[Edit] I seem to have skimmed over the different folder structure, sorry about that. Beyond Compare can compare all files in folders with the same folder structure. It does not have (I believe) the intelligence to go searching for matches in files in different folders.
Regards,
Lieven
Create a CRC checksum for your files.
If the checksum is the same for the original files and the unzipped files, you can be reasonably sure the files are the same. This even works for non-text data.
A checksum can easily be created with an external program such as "SFV Checker" or programmatically (.NET and Java, for example, include libraries to do this).
Taking a cue from Carra's answer: if A.zip is your single big archive and B.zip is the archive generated through the web, then use the following algorithm.
Extract all files from A.zip and recursively (w.r.t folders) compute the checksum of the files present in the folder (using cksum, md5sum etc) where the contents were extracted and save this information after sorting it (pipe it through sort) to a file (say A.txt)
Do the same for B.zip and generate B.txt
Compare A.txt with B.txt; they should be exactly the same.
OR
Use unzip -l to get file/directory lists for both zip archives, then flatten the hierarchy of the user-generated zip file and compare it with the contents of your script-generated zip file using something like diff. By flattening the hierarchy I mean that you may need to do some kind of pre-processing on one or both lists before you can do a meaningful comparison with diff.