What is the difference between File System and File Management in an operating system? - operating-system

I’ve found explanations of file management, explanations of file systems, and explanations saying that "the file system is part of file management". But are they the same thing or two different things? I cannot seem to find an article that compares them.

A modern Operating System, to be portable, must be file system independent; that is, it should not matter what storage format a given media device contains. At the same time, a media device must contain a specific storage format to hold files and folders, and that format should itself be Operating System independent.
For example, an OS should be able to handle any file locally, leaving the actual transfer of those files between physical media and the OS (and vice versa) to be managed by the file system manager. The OS can therefore be completely independent of how the file is stored on the media.
With this in mind, there are at least two layers of management, usually more, between the file being viewed and the file on the physical media. Here is a (simple) list of layers that might be used, from the top down.
1. OS App viewing the file
2. OS File Manager
3. OS File System Manager (allowing multiple file systems)
4. Specific File System Driver
5. Media Device Driver
When a call to read a file is made, the app (1) calls the OS File Manager (2), which in turn (knowing from the open which file system the file lives on) calls the correct OS File System Manager (3), which then calls the Specific File System Driver (4), which then calls the Media Device Driver (5) for the actual access.
Please note that any or all of these layers could have a working cache manager, which means calls can be processed and returned without calling the lower layers, e.g. each layer may read more than requested in anticipation of another read.
By having multiple layers like this, you can have any (physical) file system and/or media device you wish and the OS will be none the wiser. All you need is a media driver for the specific physical device and a file system driver for the physical format of the media's contents. As long as these layers all support the common service calls, any media format, and any content format on that media, can be used by the OS.
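A toy sketch of this layering (the class and method names are made up for illustration and correspond to nothing in any real kernel): each layer calls only the layer directly below it, so the file system driver or the media driver can be swapped without the layers above noticing.

class MediaDeviceDriver:
    """Layer 5: knows how to read raw blocks from a specific device."""
    def read_blocks(self, lba, count):
        return b"\x00" * (count * 512)  # pretend we read from hardware

class FileSystemDriver:
    """Layer 4: knows the on-disk format (e.g. FAT32, ext4, NTFS)."""
    def __init__(self, device):
        self.device = device
    def read_file(self, path, offset, size):
        # Translate path/offset into block numbers for this specific format.
        return self.device.read_blocks(lba=0, count=1)[:size]

class FileSystemManager:
    """Layer 3: routes a path to the driver that owns its mount point."""
    def __init__(self):
        self.mounts = {}  # mount point -> FileSystemDriver
    def mount(self, point, driver):
        self.mounts[point] = driver
    def read(self, path, offset, size):
        mount = max((m for m in self.mounts if path.startswith(m)), key=len)
        return self.mounts[mount].read_file(path, offset, size)

class FileManager:
    """Layer 2: what applications (layer 1) actually call."""
    def __init__(self, fs_manager):
        self.fs_manager = fs_manager
    def read(self, path, offset=0, size=4096):
        return self.fs_manager.read(path, offset, size)

# Layer 1: an application reading a file; it never sees blocks or formats.
fs_manager = FileSystemManager()
fs_manager.mount("/", FileSystemDriver(MediaDeviceDriver()))
app_view = FileManager(fs_manager).read("/etc/hosts")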

Related

How is a program loaded in an OS

I am reading about logical and physical addressing. I am confused: when a binary file is run, does it first pass through the CPU, where the logical addresses are generated, or is it copied directly to physical memory?
when a binary file is run, does it first pass through the CPU, where the logical addresses are generated, or is it copied directly to physical memory?
Typically some code somewhere loads the executable file's headers into memory, and then uses information from the headers to figure out where various pieces of the file (sections - e.g. .text, .data, etc) should be in virtual memory and what each virtual page's virtual permissions should be (if writes are allowed, if execution is allowed).
After this, areas of the virtual address space are set up. Often this is done by memory mapping the relevant parts of the file into the virtual address space without actually loading them into physical memory. In this case each page's actual permissions don't reflect the page's virtual permissions. For example, a "read/write" page might be "not present" initially; when software tries to read from the page you get a page fault, and the page fault handler might fetch the page from disk and mark it "present, read only". Later, when software tries to write to the page, you might get a second page fault, and the page fault handler might do a "copy on write" so that anything else using the same physical page isn't affected, and then make the new copy "read/write" so that it matches the original virtual permissions.
While this is happening, the OS could (depending on the amount of free physical RAM and whether the storage devices have more important data to transfer) be prefetching the file data from disk (e.g. into the VFS cache), and could be "opportunistically" updating the process's page tables to avoid the overhead of page faults for pages that have already been prefetched.
However, if the OS knows that the file is on unreliable and/or removable media, it may decide that using memory mapped files is a bad idea and may actually load the needed executable sections into memory before executing it. An OS could also have other features that cause the file to be loaded into RAM before it's executed; e.g. if the OS checks that an executable file's digital signature is correct before allowing the file to be executed, then the entire file probably needs to be loaded into memory so that the digital signature can be checked, and in that case the entire file is likely to still be in memory when the virtual address space is set up.
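To make the memory-mapping idea concrete, here is a small sketch using Python's mmap module: mapping a file reserves virtual address space without reading the whole file up front, and pages are faulted in when first touched. This shows only the mechanism; a real loader maps specific sections with specific permissions, and the file path below is just a placeholder.

import mmap
import os

# Map a file into the virtual address space without reading it all up front.
# Pages are brought in by the kernel on first access (demand paging), much
# like an OS maps an executable's sections when a program starts.
path = "/bin/ls"  # placeholder; any reasonably large, readable file will do
fd = os.open(path, os.O_RDONLY)
size = os.fstat(fd).st_size

mapping = mmap.mmap(fd, size, prot=mmap.PROT_READ)
# Nothing has necessarily been read from disk yet. Touching a byte in a
# page triggers a page fault, and the kernel loads just that page.
first_byte = mapping[0]           # faults in the first page
middle_byte = mapping[size // 2]  # faults in another page on demand
print(hex(first_byte), hex(middle_byte))

mapping.close()
os.close(fd)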
You need to read an entire book on these topics and spend several weeks on them.
But Operating Systems: Three Easy Pieces is a good book, and it is freely downloadable.
Once you have read it, perhaps also look into osdev.org for practical matters. And don't forget free software OSes such as Linux, e.g.
https://kernelnewbies.org/
Be aware of copy-on-write and virtual address space...
An executable file is essentially an interpreted program itself, one that is executed by the loader. The executable contains instructions that tell the loader how the program should exist in VIRTUAL memory; that is, the instructions in the executable define the initial VIRTUAL representation of the process address space.
So when the executable starts, there is only a virtual representation of the address space in secondary storage. As the program executes, it starts page faulting repeatedly to load pages into memory. After the initial load, the page fault rate dies down.
The executable NORMALLY contains only logical addresses.
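As a rough illustration of "the executable contains only logical addresses", the following sketch reads the entry point and the virtual addresses of the loadable segments straight out of a 64-bit little-endian ELF file's headers (the offsets follow the published ELF64 layout; the path is a placeholder):

import struct

# Read the virtual (logical) addresses stored in a 64-bit little-endian ELF
# executable's headers. These are where segments should appear in the
# process's VIRTUAL address space, not physical RAM addresses.
path = "/bin/ls"  # placeholder; any ELF64 executable works

with open(path, "rb") as f:
    ehdr = f.read(64)                       # Elf64_Ehdr is 64 bytes
    assert ehdr[:4] == b"\x7fELF", "not an ELF file"
    e_entry, e_phoff = struct.unpack_from("<QQ", ehdr, 24)
    e_phentsize, e_phnum = struct.unpack_from("<HH", ehdr, 54)

    print(f"entry point (virtual address): {e_entry:#x}")
    f.seek(e_phoff)
    for _ in range(e_phnum):
        phdr = f.read(e_phentsize)
        p_type, p_flags = struct.unpack_from("<II", phdr, 0)
        p_vaddr, = struct.unpack_from("<Q", phdr, 16)
        if p_type == 1:  # PT_LOAD: a segment the loader maps into memory
            print(f"loadable segment at virtual address {p_vaddr:#x}")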

akka event sourcing with distributed and/or large files

I am using the book "Mastering Akka" by Christian Baxter.
Now I am trying to build a new project with Akka as an event sourced system.
I have an object like Folder. In this Folder there can be a number of Files. Real files (java.io.File). For a local system this is no problem, but I am trying to build a distributed system. User A sets up a database and gets access to it. Where are the files? User A is not sitting at his desktop PC (where he saved the Folder); he is sitting at a notebook in his home office. Now he needs the files inside the Folder.
First I thought of saving the files as Array[Byte]. But what about the situation where a file is 150 MB? And maybe there are 20 files inside the Folder, all larger than 150 MB? I don't think my RAM will cope with that for long without crashing.
There is no HTTP server. Maybe I could ask the server to deliver the files as a stream? But that needs setting up. Is that the best way?
What is the best practice for handling multiple distributed and/or large files with an event sourced actor?
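To make the "deliver the files as a stream" idea from the question concrete, here is a framework-agnostic sketch (the function names are illustrative, not any Akka API): moving the bytes in fixed-size chunks keeps memory use bounded no matter how large each file is.

# Framework-agnostic sketch of the "stream instead of Array[Byte]" idea from
# the question: the event-sourced state keeps only a reference to the file,
# and the bytes are moved in fixed-size chunks so memory stays bounded no
# matter how large the file is. Function names here are illustrative.

def stream_file(path, chunk_size=64 * 1024):
    """Yield a large file as a sequence of small chunks."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk

def send_file(path, send_chunk):
    """Transfer a file by pushing one chunk at a time to the consumer."""
    for chunk in stream_file(path):
        send_chunk(chunk)  # e.g. write to a socket or an HTTP chunked body

# Only ~64 KB is in memory at any moment, even for a 150 MB file.
# send_file("/data/folder/big-file.bin", my_socket.sendall)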

How does my operating system get information about disk size, RAM size, CPU frequency, etc

From my OS I can see information about my hard disk, RAM and CPU, but I've never told my OS this information.
How does my OS know it?
Is there some place in the hard disk or CPU or RAM that stores this kind of information?
Is there some standard about the format of this kind of information?
SMBIOS (formerly known as DMI) contains much of this information. SMBIOS is a data structure/API that is part of the BIOS/UEFI firmware and contains info like the brand and model of the computer, etc.
The rest is gathered by the OS querying hardware directly.
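On Linux, for example, much of the SMBIOS/DMI data is re-exported by the kernel under /sys/class/dmi/id/, so it can be read like ordinary files (a sketch; which fields exist varies by machine, and some need root):

import pathlib

# Read a few SMBIOS/DMI fields that the Linux kernel exposes under sysfs.
# Availability varies by machine/firmware, and some files require root.
dmi = pathlib.Path("/sys/class/dmi/id")
for field in ("sys_vendor", "product_name", "board_name", "bios_version"):
    path = dmi / field
    try:
        print(f"{field}: {path.read_text().strip()}")
    except (FileNotFoundError, PermissionError):
        print(f"{field}: <not available>")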
Answer grabbed from superuser by Mokubai.
You don't need to tell it because each device already knows (or has a way) to identify itself.
If you accept that every device is accessed via address and data lines, and in some cases only data lines, then you come to the realisation that on those data lines you need some kind of "protocol" that determines just how you talk to those devices.
In amongst that protocol you have commands that say "read this" and "send that" or "put this over there". It is also relatively easy to have a command that says "identify yourself" which, rather than reading a block of disk or memory or painting a pixel a particular colour, returns a premade string or set of strings that tell the driver or operating system what that device is. Using a series of identity commands you could discover a device's type, its capabilities and what driver might be able to work with it.
You don't need to tell a device what it is, because it already knows. And you don't need to tell the operating system what it is because it can ask the device itself.
You don't tell people what they're called and how they talk, you ask them.
Each device has its own protocol for these messages, and devices don't store the details of other devices, because doing so would be insane and near useless given that you can remove any device at any time. Your hard drive doesn't need to store information about your memory or graphics card, except for the driver that the operating system uses to talk to it.
The PC UEFI specification would define a core set of system specifications that every computer has, allowing the processor to be powered up and a program stored in an EEPROM to begin the absolute basic system probing necessary to determine the processor, set up the RAM, find a disk and a display, and thus continue booting the computer.
From there the UEFI system would hand over to the operating system which would have more detailed probing and identification procedures, but it all starts at the most basic "I have a processor, what is around me?" situation.
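To make the "ask the hardware" idea concrete: on Linux, the results of the OS's own probing of the CPU and RAM are published in /proc, so reading them is just reading a file (a Linux-specific sketch; the "model name" field is an x86 convention):

# Linux-specific sketch: the kernel publishes what it learned from probing
# the CPU and RAM in /proc, so "asking" is just reading a file.

def first_match(path, prefix):
    with open(path) as f:
        for line in f:
            if line.startswith(prefix):
                return line.split(":", 1)[1].strip()
    return None

print("CPU model :", first_match("/proc/cpuinfo", "model name"))
print("Total RAM :", first_match("/proc/meminfo", "MemTotal"))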

Synchronize Directory of Files Between Server and iOS Application

I am building an internal iOS application (so - it won't ever be in the app store), and I need to keep a directory of content synchronized between a server and each of the instances of the iOS application. This would be easy enough if I just wanted to delete and re-download this content each time, but I would rather use something similar to rsync to only download the elements that have changed.
I haven't found any good way to utilize rsync. I considered looking at Objective-Git as a possibility here, but at a quick glance it looked like there is still a lot of the support for remote repositories that isn't supported yet.
As a final note: while this won't be in the app store, I will not be jailbreaking these devices and I would prefer not to rely on any private APIs (although if there were an elegant solution that used private APIs I might consider it).
Thoughts?
ADDITIONAL NOTE: This needs to be an isolated solution. I won't be relying on outside services (like Dropbox, Box.net, etc...). This needs to work solely between the device and the server (which is on a local network with the device).
Use HTTP to list the contents of each folder on the server.
Compare last modification time of each file with those on the device, and identify added/removed files.
Get added and modified files, remove deleted files.
It sounds like you may be asking for a library that already does this, but if you don't find one it's moderately easy to write it from the ground up using stat(2) on the server and the same, or a higher-level equivalent, on the iOS devices. Have the iPhone send a tree of files with their modification dates to the server and get back a list of insert/delete/update operations to perform, with the URL (or whatever) for each one, so you can apply them incrementally on a background thread. Have the information from the server for new/updated files include the modification date the server has, so you can set the same date on the iOS device and send it back when asking the server for the status of each file (kind of a hack, using the file system to store that, but it works).
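A sketch of the comparison step described above, assuming each side can produce a map of relative paths to modification times (the function name and the tolerance are illustrative):

# Sketch of the comparison step: given {relative_path: mtime} maps from the
# server and the device, work out what to download, update, and delete.
# The one-second tolerance is illustrative (some filesystems round mtimes).

def plan_sync(server_files, local_files, tolerance=1.0):
    to_add    = [p for p in server_files if p not in local_files]
    to_update = [p for p in server_files
                 if p in local_files
                 and server_files[p] - local_files[p] > tolerance]
    to_delete = [p for p in local_files if p not in server_files]
    return to_add, to_update, to_delete

# Example:
# server = {"docs/a.pdf": 1700000000, "docs/b.pdf": 1700000500}
# local  = {"docs/a.pdf": 1700000000, "docs/old.pdf": 1600000000}
# plan_sync(server, local) -> (["docs/b.pdf"], [], ["docs/old.pdf"])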
Why not just set up a RESTful interface and do it over HTTP? That way you could query the modification times easily enough to determine whether client or server files need to be updated. You might also want to keep track of which files on the client have been synced, so you can easily know which files to add or delete. This can be done with a simple .sync file or with a plist / sqlite / etc.
If you'll consider FTP, there are some pretty advanced client libraries available.
For example, the iOS Chilkat bundle includes an FTP client library that supports synchronization in both directions. It's not free, but it's pretty cheap -- and you get a ton of other stuff that will likely prove useful someday. Here's an example of iOS pulling down all additions and changes (mode 2):
http://www.example-code.com/ios/ftp_syncLocalTree.asp
One caveat -- judging solely from the example, it doesn't appear to synchronize deletions. If this is a requirement, you could do it yourself without too much effort immediately following a sync.
acrosync (see https://acrosync.com/library.html) seems like a good fit given the initial question; however, I haven't used it myself yet.

Detect a file in transit?

I'm writing an application that monitors a directory for new input files by polling the directory every few seconds. New files may often be several megabytes, and so take some time to fully arrive in the input directory (eg: on copy from a remote share).
Is there a simple way to detect whether a file is currently in the process of being copied? Ideally any method would be platform and filesystem agnostic, but failing that specific strategies might be required for different platforms.
I've already considered taking two directory listings separated by a few seconds and comparing file sizes, but this introduces a time/reliability trade-off that my superiors aren't happy with unless there is no alternative.
For background, the application is being written as a set of Matlab M-files, so no JRE/CLR tricks I'm afraid...
Edit: files are arriving in the input directory by a straight move/copy operation, either from a network drive or from another location on a local filesystem. This copy operation will probably be initiated by a human user rather than another application.
As a result, it's pretty difficult to place any responsibility on the file provider to add control files or use an intermediate staging area...
Conclusion: it seems like there's no easy way to do this, so I've settled for a belt-and-braces approach - a file is ready for processing if:
its size doesn't change in a certain period of time, and
it's possible to open the file in read-only mode (some copying processes place a lock on the file).
Thanks to everyone for their responses!
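A minimal sketch of the belt-and-braces check described in the conclusion above (the function name and settle time are illustrative; tune the settle time for your environment):

import os
import time

# A file is treated as "ready" only if its size has stopped changing and it
# can be opened for reading (some copying processes hold a lock until done).

def looks_complete(path, settle_seconds=5):
    try:
        size_before = os.path.getsize(path)
        time.sleep(settle_seconds)
        size_after = os.path.getsize(path)
        if size_before != size_after:
            return False          # still growing
        with open(path, "rb"):
            pass                  # fails if the copier holds an exclusive lock
        return True
    except OSError:
        return False              # vanished, locked, or not readable yet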
The safest method is to have the application(s) that put files in the directory first put them in a different, temporary directory, and then move them to the real one (which should be an atomic operation even when using FTP or file shares). You could also use naming conventions to achieve the same result within one directory.
Edit:
It really depends on the filesystem and on whether its copy functionality even has the concept of a "completed file". I don't know the SMB protocol well, but if it has that concept, you could write an app that exposes an SMB interface (or patch Samba) and an API to get notified of completed file copies. Probably a lot of work, though.
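A writer-side sketch of the "write to a temporary location, then move" approach from the first answer above, assuming the staging and destination directories are on the same filesystem so the final rename is atomic (all paths here are placeholders):

import os

# Write into a staging directory on the SAME filesystem, then rename into the
# watched directory. The rename is atomic, so watchers never see a
# half-written file.

STAGING = "/data/incoming/.staging"
WATCHED = "/data/incoming"

def publish(filename, data):
    os.makedirs(STAGING, exist_ok=True)
    tmp_path = os.path.join(STAGING, filename)
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())          # make sure the bytes are on disk
    os.replace(tmp_path, os.path.join(WATCHED, filename))  # atomic move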
This is a middleware problem as old as the hills, and the short answer is: no.
The two 'solutions' put the onus on the file uploader: (1) upload the file to a staging directory and then move it into the destination directory, or (2) upload the file, and then create/upload a 'ready' file that indicates the state of the content file.
The first is the better of the two, but both are inelegant. The truth is that better communication media exist than the filesystem. Consider using some IPC that involves only a push or a pull (and not both, as the filesystem does), such as an HTTP POST, a JMS or MSMQ queue, etc. Furthermore, this can also be synchronous, allowing the process receiving the file to acknowledge the content, even check it for worthiness, and hand the client a receipt; this is the righteous road to non-repudiation. Follow it, and you will never suffer arguments over whether a file was or was not delivered to your server for processing.
M.
One simple possibility would be to poll at a fairly large interval (2 to 5 minutes) and only acknowledge the new file the second time you see it.
I don't know of a way in any OS to determine whether a file is still being copied, other than maybe checking if the file is locked.
How are the files getting there? Can you set an attribute on them as they are written and then change the attribute when write is complete? This would need to be done by the thing doing the writing ... which sounds like it isn't an option.
Otherwise, caching the listing and treating a file as new if it has the same file size for two consecutive listings is the best way I can think of.
Alternatively, you could use the modified time on the file - the file has to be new and have a modified time that is at least x in the past. But I think this will be about equivalent to caching the listing.
If you are polling the folder every few seconds, it's not much of a time penalty, is it? And it's platform agnostic.
Also, linux only: http://www.linux.com/feature/144666
Like cron but for files. Not sure how it deals with your specific problem - but may be of use?
What is your OS? On Unix you can use the "lsof" utility to determine whether a user has the file open for writing. Apparently Process Explorer on MS Windows has the same functionality.
Alternatively, you could just try an exclusive open on the file and bail out if this fails. But this can be a little unreliable, and it's easy to tread on your own toes.
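A sketch of the "try an exclusive open" idea on Unix-like systems, using a non-blocking advisory lock; note that it only detects writers that also take advisory locks, which is part of why the approach is unreliable:

import fcntl

# Unix-only sketch of "try an exclusive open and bail out if it fails":
# attempt a non-blocking exclusive advisory lock on the file.

def appears_free(path):
    try:
        with open(path, "rb") as f:
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
            fcntl.flock(f, fcntl.LOCK_UN)
        return True
    except OSError:
        return False   # locked by someone else, or not openable yet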