Does metadata alter hash for a file? - hash

We know that the hash value of a file is independent of its filename.
I did some experiments, and they showed that on macOS, changing a file's label (red, ...), keywords, or description (in OpenMeta) does not alter its hash value.
But changing the metadata inside a JPEG does change the hash.
So I started to wonder why this holds. Any clue or helpful tutorial?

The tool that you used apparently hashed what the OS considers the file contents, which in the case of a JPEG includes some metadata defined in the JPEG standard. Keywords, description, etc. are stored by the filesystem outside the file contents proper.
(What is considered data and what metadata can be rather arbitrary and dependent on the context, e.g. the processing application and platform.)
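To make the distinction concrete, here is a minimal Python sketch (standard library only) of metadata that lives inside a JPEG's bytes: it splices a COM (comment) segment, marker FF FE, in right after the SOI marker FF D8. The input is a made-up stand-in byte string, not a decodable image, and real tools usually place such segments after any APPn (JFIF/Exif) segments; the point is only that the byte stream changes, so any content hash changes too. Metadata kept outside the stream (e.g. in extended attributes or alternate data streams) never reaches hashlib at all.

```python
import hashlib

SOI = b"\xff\xd8"  # JPEG Start Of Image marker
COM = b"\xff\xfe"  # JPEG comment segment marker


def add_jpeg_comment(jpeg: bytes, comment: bytes) -> bytes:
    """Insert a COM segment right after SOI (simplified placement)."""
    if not jpeg.startswith(SOI):
        raise ValueError("not a JPEG stream (missing SOI marker)")
    length = len(comment) + 2  # the 2-byte big-endian length field counts itself
    if length > 0xFFFF:
        raise ValueError("comment too long for a single COM segment")
    segment = COM + length.to_bytes(2, "big") + comment
    return SOI + segment + jpeg[len(SOI):]


def sha256_hex(data: bytes) -> str:
    # Hashes exactly the bytes it is given: the "file contents".
    return hashlib.sha256(data).hexdigest()


# Stand-in bytes: SOI, fake payload, EOI marker. Not a real image.
original = SOI + b"...pretend image data..." + b"\xff\xd9"
tagged = add_jpeg_comment(original, b"keywords: beach, 2015")
print(sha256_hex(original) == sha256_hex(tagged))  # False
```

Renaming a file or tagging it in the Finder leaves these bytes untouched, which is why those changes did not affect the hash in the experiment.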

There are different ways that metadata is stored.
For structured storage files created by COM applications, the document properties are embedded directly in the file data, so changing them would change the file's hash. On volumes formatted with NTFS 5 (Windows 2000 and later), document properties can be added to any type of file and are stored in alternate data streams. I assume something similar is true for the OS X filesystem.

Related

Does seafile store synced files anywhere?

I'm using Seafile (on Docker) to sync some files to a Synology NAS, and it is all working correctly. I've created an external folder that points to the /shared folder in the container.
I think I already know the answer, but are the files synced to the server stored 'normally' somewhere? For example, if I sync a folder called 'photos' that contains 'a.jpg', will I be able to find that file on the Seafile server?
The reason for the question is that I would like to back up the original synced files, rather than having to back up the Seafile DB, etc.
(I am aware that syncthing does what I want, so I may choose to use that instead, just want to confirm my understanding)
TL;DR:
No, you won't find your a.jpg file on the server. Your files are turned into blocks of bytes.
To understand why,
take a look at this part of the data model documentation:
FS
There are two types of FS objects, SeafDir Object and Seafile Object. SeafDir Object represents a directory, and Seafile Object represents a file.
Block
A file is further divided into blocks with variable lengths. We use Content Defined Chunking algorithm to divide file into blocks. A clear overview of this algorithm can be found at http://pdos.csail.mit.edu/papers/lbfs:sosp01/lbfs.pdf. On average, a block's size is around 1MB.
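The chunking idea quoted above can be sketched in a few lines of Python. This is a simplified content-defined chunker using a Rabin-Karp style polynomial rolling hash over a 48-byte window, not Seafile's or LBFS's actual code; BASE, MOD, TARGET, and the size limits are illustrative assumptions (the defaults give chunks of roughly 1.3 KB, far smaller than Seafile's ~1 MB). A boundary is cut where the hash of the last `window` bytes matches a target, so boundaries depend on content rather than on byte offsets. Inserting bytes near the start of a file therefore leaves most later chunks, and their content-derived block IDs, unchanged.

```python
import hashlib
import random

BASE = 257            # polynomial base for the rolling hash (illustrative)
MOD = (1 << 61) - 1   # large Mersenne prime modulus (illustrative)
TARGET = 0x2A         # cut when the masked hash equals this value (illustrative)


def cdc_chunks(data: bytes, window: int = 48, mask_bits: int = 10,
               min_size: int = 256, max_size: int = 8192) -> list[bytes]:
    """Split data at content-defined boundaries using a rolling hash."""
    mask = (1 << mask_bits) - 1
    drop = pow(BASE, window - 1, MOD)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i - start >= window:
            # Remove the oldest byte so h covers only the last `window` bytes.
            h = (h - data[i - window] * drop) % MOD
        h = (h * BASE + byte) % MOD
        size = i - start + 1
        # Cut where the content says so (after min_size), or force a cut at max_size.
        if (size >= min_size and (h & mask) == TARGET) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def block_ids(chunks: list[bytes]) -> list[str]:
    # Content-addressed IDs: identical chunks get identical IDs.
    return [hashlib.sha1(c).hexdigest() for c in chunks]


rng = random.Random(1)
data = rng.randbytes(200_000)
edited = b"inserted!!" + data  # 10 bytes inserted at the front
chunks_a, chunks_b = cdc_chunks(data), cdc_chunks(edited)
shared = len(set(block_ids(chunks_a)) & set(block_ids(chunks_b)))
print(f"{shared} of {len(chunks_a)} blocks unchanged after the insert")
```

This is also why the server holds blocks rather than a.jpg: a file is only a list of block IDs, and its bytes are reassembled on demand.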
So backing up files won't be as easy as making a raw copy of the Seafile drive. As mentioned by @JensV, you may still achieve something along those lines using the Seafile drive client.

Why moodledata directory has this structure?

I know Moodle's internal files, such as uploaded images, are stored in the moodledata directory.
Inside, there are several directories:
moodledata/filedir/1c/01/1c01d0b6691ace075042a14416a8db98843b0856
moodledata/filedir/63/
moodledata/filedir/63/89/
moodledata/filedir/63/89/63895ece79c4a91666312d0a24db82fe3017f54d
moodledata/filedir/63/3c/
moodledata/filedir/63/37/
moodledata/filedir/63/a7/
What are these hashes?
What are the reasons behind this design, as opposed to, for example, WordPress's /year/month/file.jpg structure?
https://docs.moodle.org/dev/File_API_internals#File_storage_on_disk
Simple answer - files are stored based on the hash of their content (inspired by the way Git stores files internally).
This means that if you have the same file in multiple places (e.g. the same PDF or image in multiple courses), it is stored only once on the disk, even if the original filename is different.
On real sites this can involve a huge reduction in disk usage (obviously dependent on how much duplication there is on your site).
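A minimal Python sketch of this kind of content-addressed layout, standard library only: the SHA-1 of the content becomes the file name, and its first two pairs of hex digits become the two directory levels, matching the moodledata/filedir/1c/01/... entries listed above. The function store_content and the refs mapping (standing in for the database records that map a course and filename to a hash) are hypothetical; this is not Moodle's PHP implementation. Splitting on hash prefixes also keeps any single directory from accumulating too many entries, though that motivation is an assumption here.

```python
import hashlib
import os
import tempfile


def store_content(filedir: str, content: bytes) -> str:
    """Store content under filedir/<h[0:2]>/<h[2:4]>/<h>; return the SHA-1 hex digest."""
    digest = hashlib.sha1(content).hexdigest()
    target_dir = os.path.join(filedir, digest[:2], digest[2:4])
    target = os.path.join(target_dir, digest)
    if not os.path.exists(target):  # identical content is written only once
        os.makedirs(target_dir, exist_ok=True)
        with open(target, "wb") as f:
            f.write(content)
    return digest


def demo() -> tuple[dict, list[str]]:
    with tempfile.TemporaryDirectory() as filedir:
        pdf = b"%PDF-1.4 pretend this is the same handout"
        # Hypothetical records: the same file attached in two courses under two names.
        refs = {
            ("course101", "syllabus.pdf"): store_content(filedir, pdf),
            ("course202", "handout.pdf"): store_content(filedir, pdf),
        }
        stored = [os.path.relpath(os.path.join(root, name), filedir)
                  for root, _dirs, files in os.walk(filedir) for name in files]
    return refs, stored


refs, stored = demo()
print(refs)
print(stored)  # a single file despite two references
```

The original filenames live only in the records, which is why renaming or re-uploading the same bytes costs no extra disk space.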
Moodledata files are stored according to the SHA-1 hash of their content, in order to prevent duplication of content (for example, when the same file is uploaded twice with a different name).
For further explanation of how to handle such files, you can read the official documentation of the File API:
https://docs.moodle.org/dev/File_API_internals
especially the "File storage on disk" section.

Where are Filenames and other File Properties stored in OS's?

Some time ago I started wondering where filenames and modification dates are stored by the operating system.
For instance, when you create a text file in Windows and give it a name, then look at its binary form using a tool like Frhed, there is nothing there besides the text content.
Is there a folder with all file names and dates?
Supposing your friend sends you a text file, how do you get the filename (and other file properties) on your computer?
A complete description of what you are asking cannot be covered in a single SO answer. If you really want to understand the details, I suggest you pick up a good operating systems book and read the file management section.
A very simple and general description is as follows.
At the very basic level the operating system (file system to be specific) will use two types of data structures to store your file.
• Data structure to store information related to the file (metadata)
• Data structure to store the actual data of the file that you see (text, image, sound)
In the UNIX world the first data structure is called an inode. It contains information related to the file such as owner, permissions, timestamps (last access, last modification, last status change), size, and pointers to the data blocks that store the actual data of the file.
Every file has its own inode, which contains the metadata associated with that file. Note that the inode doesn't contain the actual file data; that is stored in data blocks.
It doesn't contain the file's name either: names are stored in directory entries, which map each name to an inode number.
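On a POSIX system you can see this split from Python's standard library: os.stat reads the inode's metadata, while os.scandir walks the directory entries, each of which pairs a name with an inode number. The sketch below (file names are made up; it assumes a filesystem that supports hard links) gives one inode two names with os.link, which shows that the name lives in the directory rather than in the inode.

```python
import os
import tempfile


def inspect_hard_link():
    """Create one file with two names and return (name -> inode map, stat result)."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "notes.txt")
        with open(original, "w") as f:
            f.write("hello")
        # A hard link adds a second directory entry for the same inode.
        os.link(original, os.path.join(d, "alias.txt"))
        # Directory entries: each one is just a name plus an inode number.
        entries = {entry.name: entry.inode() for entry in os.scandir(d)}
        # Inode metadata: size, link count, timestamps, owner, mode, ...
        info = os.stat(original)
    return entries, info


entries, info = inspect_hard_link()
print(entries)  # both names map to the same inode number
print(info.st_size, info.st_nlink, info.st_mtime)
```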
So in summary for every file you create, operating system will create a data structure which will contain all the related data.
The operating system stores the attributes of the file on the disk. The actual disk structure depends upon the operating system.
Is there a folder with all files names and dates?
The Windows file system is NTFS. It has a master file table (MFT) with information about all the files on the disk.
There are effectively two structures that work cooperatively: directories define the tree structure holding files, and the master file table describes all the files. It is not a folder containing all the files but rather an internal data structure. Generally, users cannot see the MFT.
If the disk gets corrupted, recovery software will go to the master file table. That can allow the files to be restored even when their location within the directory structure is lost.
Supposing your friend sends you a text file, how do you get the filename (and other file properties) in your computer?
That is something entirely different from the first question. Email messages encode the file name of each attachment. Your mail program uses that name to create a local copy of the file.
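As a sketch using Python's standard email package (the addresses, subject, and file name are made up), the attachment's name travels in the MIME Content-Disposition header next to the encoded bytes, and the receiving side reads it back with get_filename():

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def build_message(filename: str, payload: bytes) -> bytes:
    """Sender side: attach the file; its name goes into Content-Disposition."""
    msg = EmailMessage()
    msg["From"] = "friend@example.com"
    msg["To"] = "you@example.com"
    msg["Subject"] = "the file"
    msg.set_content("See attached.")
    msg.add_attachment(payload, maintype="text", subtype="plain", filename=filename)
    return msg.as_bytes()


def attachments(raw: bytes) -> dict[str, bytes]:
    """Receiver side: recover each attachment's name and decoded bytes."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    return {part.get_filename(): part.get_payload(decode=True)
            for part in msg.iter_attachments()}


raw = build_message("notes.txt", b"hello\n")
print(attachments(raw))  # {'notes.txt': b'hello\n'}
```

The mail client then creates a new local file with that name, so the local file gets a fresh inode or MFT record and its own timestamps.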

Saving image in database

Is it good to save an image in a database as a BLOB,
or to save only the path and copy the image into a specific directory?
Which way is best (I mean good performance for both the database and the application), and why?
What are your requirements?
In the vast majority of cases saving the path will be better, simply because of the sheer size of the files compared to the rest of the data (images can bloat the DB by gigabytes). Consider adding an indirection, e.g. save the path as a name plus a reference to a storage resource (e.g. a storage_id referencing a row in a storages table), with the base path attached to the 'storage'. This way you can easily move files (copy all files, then update the storage path once, rather than update a million individual paths).
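A minimal sketch of that indirection, using SQLite from Python's standard library purely for illustration; the table and column names (storages, images, base_path, name) are hypothetical. After the files have been copied, moving every image in a storage costs a single UPDATE:

```python
import posixpath
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE storages (
            id INTEGER PRIMARY KEY,
            base_path TEXT NOT NULL          -- where this storage currently lives
        );
        CREATE TABLE images (
            id INTEGER PRIMARY KEY,
            storage_id INTEGER NOT NULL REFERENCES storages(id),
            name TEXT NOT NULL               -- path relative to the storage
        );
    """)
    return conn


def full_path(conn: sqlite3.Connection, image_id: int) -> str:
    base, name = conn.execute(
        "SELECT s.base_path, i.name FROM images AS i "
        "JOIN storages AS s ON s.id = i.storage_id WHERE i.id = ?",
        (image_id,),
    ).fetchone()
    return posixpath.join(base, name)


def move_storage(conn: sqlite3.Connection, storage_id: int, new_base: str) -> None:
    # After copying the files, one UPDATE repoints every image in this storage.
    with conn:
        conn.execute("UPDATE storages SET base_path = ? WHERE id = ?",
                     (new_base, storage_id))


conn = make_db()
with conn:
    conn.execute("INSERT INTO storages (id, base_path) VALUES (1, '/mnt/disk1/images')")
    cat_id = conn.execute(
        "INSERT INTO images (storage_id, name) VALUES (1, 'cat.jpg')").lastrowid
    conn.executemany("INSERT INTO images (storage_id, name) VALUES (1, ?)",
                     [("dog.jpg",), ("owl.jpg",)])
move_storage(conn, 1, "/mnt/disk2/images")
print(full_path(conn, cat_id))  # /mnt/disk2/images/cat.jpg
```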
However, if your requirements include consistent backup/restore and/or disaster recoverability, it is often better to store images in the DB. It is neither easier nor more convenient, but it is simply going to be required. Each DB has its own way of dealing with this problem; e.g. in SQL Server you would use FILESTREAM storage, which also allows access via file access APIs. See FILESTREAM MVC: Download and Upload images from SQL Server for an example.
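SQLite has nothing like FILESTREAM, but a small sketch (standard-library sqlite3 only; the table, column, and function names are hypothetical) shows the consistency argument: when the bytes live in a BLOB column, one database backup captures images and metadata together, with no separate file tree to keep in sync. Connection.backup is the sqlite3 module's online-backup call.

```python
import sqlite3

IMAGE_BYTES = b"\xff\xd8 stand-in image bytes \xff\xd9"  # not a real JPEG


def save_image(conn: sqlite3.Connection, name: str, data: bytes) -> int:
    with conn:  # name and bytes are committed in the same transaction
        cur = conn.execute("INSERT INTO images (name, data) VALUES (?, ?)", (name, data))
    return cur.lastrowid


def load_image(conn: sqlite3.Connection, image_id: int) -> bytes:
    (data,) = conn.execute("SELECT data FROM images WHERE id = ?", (image_id,)).fetchone()
    return data


src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, name TEXT NOT NULL, data BLOB NOT NULL)")
image_id = save_image(src, "cat.jpg", IMAGE_BYTES)

# One backup call copies metadata and image bytes together.
dst = sqlite3.connect(":memory:")
src.backup(dst)
print(load_image(dst, image_id) == IMAGE_BYTES)  # True
```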
Also, a somewhat dated but nonetheless interesting paper on the topic: To BLOB or Not to BLOB.

Can a FAT filesystem support multiple references to a file?

Can a FAT based file system be modified to support multiple references to a file (i.e. aliases) by using the same FAT block sequence in directory table entries?
No, because when any one reference was deleted, the file's clusters would be added to free space and possibly reused. Two different files would then share space, and any write to one would corrupt the other.
This could work if the file system was immutable. For example if it was written to an unwritable medium.
Sure, you can have directory entries point to the same FAT records, but there are two things you should keep in mind:
1) Never run a standard check-disk utility, or it will treat the shared chains as errors (cross-linked files).
2) You have to implement your own delete operation that also removes every other directory entry pointing to the item you delete.
Update: this answer assumes the question's 'can be modified' premise.
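A toy model in Python makes both answers concrete. It is not FAT's real on-disk encoding (real FAT numbers data clusters from 2 and uses reserved end-of-chain values); FREE, EOC, and the ToyFat class are hypothetical simplifications. A naive delete frees the shared chain while an alias still points at it, so the next allocation reuses those clusters; the alias-aware delete first removes every directory entry for the chain, as the second answer suggests.

```python
FREE = None  # toy marker for an unallocated cluster
EOC = -1     # toy end-of-chain marker (real FAT uses reserved high values)


class ToyFat:
    def __init__(self, n_clusters: int):
        self.fat = [FREE] * n_clusters  # fat[c] = next cluster in the chain
        self.directory = {}             # name -> first cluster

    def create(self, name: str, n: int) -> None:
        free = [c for c, nxt in enumerate(self.fat) if nxt is FREE][:n]
        if n < 1 or len(free) < n:
            raise OSError("cannot allocate")
        for cur, nxt in zip(free, free[1:] + [EOC]):
            self.fat[cur] = nxt
        self.directory[name] = free[0]

    def alias(self, new_name: str, existing: str) -> None:
        # A second directory entry pointing at the same cluster chain.
        self.directory[new_name] = self.directory[existing]

    def chain(self, start: int) -> list[int]:
        clusters, c = [], start
        while c != EOC:
            if self.fat[c] is FREE:
                raise ValueError(f"cluster {c} is free: dangling directory entry")
            clusters.append(c)
            c = self.fat[c]
        return clusters

    def naive_delete(self, name: str) -> None:
        # Standard FAT behaviour: free the chain, ignoring any other entries.
        for c in self.chain(self.directory.pop(name)):
            self.fat[c] = FREE

    def delete_with_aliases(self, name: str) -> None:
        # Remove every directory entry that shares this chain, then free it.
        start = self.directory[name]
        for other in [n for n, s in self.directory.items() if s == start]:
            del self.directory[other]
        for c in self.chain(start):
            self.fat[c] = FREE


fs = ToyFat(8)
fs.create("a", 3)
fs.alias("b", "a")
fs.naive_delete("a")  # "b" now points into free space
fs.create("c", 3)     # the freed clusters are reused for "c"
shared_clusters = set(fs.chain(fs.directory["b"])) & set(fs.chain(fs.directory["c"]))
print(sorted(shared_clusters))  # [0, 1, 2]: writes to "c" corrupt "b"

safe = ToyFat(8)
safe.create("a", 3)
safe.alias("b", "a")
safe.delete_with_aliases("b")
print(safe.directory, all(v is FREE for v in safe.fat))  # {} True
```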
The FAT File System stores all information about a file in a single structure inside a directory, except the addresses of disk blocks that contain file data. Disk block numbers of all files are kept in a File Allocation Table (FAT).
Since the link information and the file container information are bound together in a single structure, the FAT file system does not support multiple links to a single file. It does not support symbolic links either, though it could have. However, Windows supports shortcuts, which are similar to symbolic links.