FSCTL_MOVE_FILE on Windows XP, System volume, FAT32 - windows-xp

I am having a problem defragmenting files on a Windows XP FAT32 system volume. I am not writing a defragmenter; rather, part of my solution requires a certain set of files to be laid out contiguously on the disk. To ensure this, I am using the FSCTL_MOVE_FILE ioctl to move the file's extents into a single free-space extent of sufficient size on the volume. The process goes as follows:
1) Create a file:
return m_file.Create(path,
GENERIC_READ | GENERIC_WRITE,
0, NULL,
CREATE_ALWAYS,
FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH |
FILE_ATTRIBUTE_SYSTEM | FILE_ATTRIBUTE_HIDDEN,
NULL);
2) Fill file with zeroes.
3) Check that file is fragmented, if it is, acquire volume bitmap with FSCTL_GET_VOLUME_BITMAP, find free cluster chain of sufficient size.
4) Use FSCTL_MOVE_FILE to defragment the file into found extent as such:
MOVE_FILE_DATA input;
input.FileHandle = fileHandle;
input.StartingVcn.QuadPart = 0;
input.StartingLcn.QuadPart = freeExtent.lcn;
input.ClusterCount = totalFileClusters;
DWORD bytesReturned = 0; // unused
::DeviceIoControl(
volumeHandle,
FSCTL_MOVE_FILE,
&input,
sizeof(input),
NULL,
0,
&bytesReturned,
NULL);
That last call works fine on NTFS volumes, both system and regular. Non-system volumes on XP also present no problem. However, on FAT32 system volumes on XP I almost always get ERROR_INVALID_PARAMETER (87). The files are quite large, about 700 MB. The volume has about 10 GB of free space. After the failed FSCTL, I can see that part of the file was actually moved before the error occurred. So far I have made 50 attempts and all of them failed. I am aware that moving a large file this way can fail when a previously free cluster further along the target extent becomes occupied by something else on the volume, especially if the volume has a lot of activity (as system volumes usually do). But I have no idea how to mitigate this, given that I have no kernel presence. What am I doing wrong, and/or how can I do better?
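For reference, step 3 (searching the volume bitmap for a free run) could look roughly like the sketch below. It is not the asker's actual code: findFreeRun is a hypothetical helper, the bitmap is passed as a plain byte vector rather than a VOLUME_BITMAP_BUFFER, and it assumes that a set bit means "cluster in use" with bits ordered least significant first within each byte.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: scans a cluster bitmap for the first run of at least
// `needed` consecutive free clusters.
// Assumptions: bit i lives in byte i / 8 at bit position i % 8 (LSB first),
// and a set bit marks a cluster that is in use.
// Returns the index of the first cluster of the run, relative to the start
// of the bitmap, or -1 if no such run exists.
std::int64_t findFreeRun(const std::vector<std::uint8_t>& bitmap,
                         std::uint64_t clusterCount,
                         std::uint64_t needed)
{
    assert(clusterCount <= bitmap.size() * 8ULL);
    if (needed == 0) {
        return -1;
    }

    std::uint64_t runStart = 0;
    std::uint64_t runLength = 0;
    for (std::uint64_t i = 0; i < clusterCount; ++i) {
        const bool inUse = ((bitmap[i / 8] >> (i % 8)) & 1u) != 0;
        if (inUse) {
            runLength = 0;          // run broken, start over
            continue;
        }
        if (runLength == 0) {
            runStart = i;           // first free cluster of a new run
        }
        ++runLength;
        if (runLength >= needed) {
            return static_cast<std::int64_t>(runStart);
        }
    }
    return -1;
}
```

With a real FSCTL_GET_VOLUME_BITMAP result, the returned index would still have to be added to the buffer's StartingLcn, and the run can stop being free at any moment after the bitmap is read.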

The short answer here is no, you cannot get there from user mode. You'll always be racing with the rest of the applications/operating system processes that are manipulating the filesystem.
If you really want to go down this path, you're going to need to write a fairly complex kernel driver (File System Minifilter) to assist with synchronizing this operation. This can get tricky, particularly for moves on volumes that host page/swap files.
Good luck!

Two problems. First, on a live system you are always racing against other activity (as the previous answerer suggested). As an example, once you have retrieved the volume bitmap, it may already be out of date due to operations in other processes that you don't see.
Second, there is probably a limit in FAT32 on how much you can move at a time. Perhaps you should consider moving files in chunks of 256 KB (i.e., the cache manager's mapping view size).
If you then encounter an error, such as hitting space that was formerly free, you have much less work to reason about.
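As a rough sketch of that chunking idea, the helper below splits one large move into a list of small ones. MoveChunk and planChunkedMove are hypothetical names, not part of any Windows API, and the 256 KB default is simply the figure suggested above; each resulting entry would correspond to one MOVE_FILE_DATA passed to its own FSCTL_MOVE_FILE call.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one small move: which part of the file (VCN)
// goes to which place on the volume (LCN), and how many clusters.
struct MoveChunk {
    std::uint64_t startingVcn;
    std::uint64_t startingLcn;
    std::uint32_t clusterCount;
};

// Splits a move of `totalClusters` clusters into chunks of at most
// `chunkBytes` bytes each, laid out back to back starting at `targetLcn`.
std::vector<MoveChunk> planChunkedMove(std::uint64_t totalClusters,
                                       std::uint64_t targetLcn,
                                       std::uint32_t bytesPerCluster,
                                       std::uint32_t chunkBytes = 256 * 1024)
{
    assert(bytesPerCluster > 0 && chunkBytes >= bytesPerCluster);
    const std::uint32_t clustersPerChunk = chunkBytes / bytesPerCluster;

    std::vector<MoveChunk> plan;
    for (std::uint64_t vcn = 0; vcn < totalClusters; vcn += clustersPerChunk) {
        const std::uint64_t remaining = totalClusters - vcn;
        const std::uint32_t count = remaining < clustersPerChunk
            ? static_cast<std::uint32_t>(remaining)
            : clustersPerChunk;
        // The target offset equals the file offset, so the chunks end up
        // contiguous if every move succeeds.
        plan.push_back({vcn, targetLcn + vcn, count});
    }
    return plan;
}
```

If one chunk fails, only that chunk has to be examined: the caller can re-read the volume bitmap, pick a new target extent if needed, and resume from the failing VCN instead of restarting the whole file.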
Finally, if you have a 700 MB file and it exists as 700 chunks of 1 MB each, I doubt that you'd see any significant performance improvement from turning it into a single 700 MB chunk.

Related

INode File System, what is the extra space in a data block used for?

So, I am currently learning about the INode file system and am asked to write a simple file system using Inodes.
So far, I understand that there is an INode table that has a mapping from INode-> Data blocks through direct/indirect pointers.
Let's assume data gets written to a file and is stored in two blocks. Say each block is 512 bytes, and the file fills one block completely but uses only 200 bytes of the second. What happens to the rest of the space in that data block? Is it reserved for that file only, or can other files use it?
It depends on the file system, but in most cases this space is simply lost (it is often called slack space). I think the Reiser File System actually reclaimed this area, but I could be wrong.
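To put numbers on the example from the question, here is a small sketch of the arithmetic. BlockUsage and blockUsage are hypothetical names, and the code assumes the simple case where a data block is never shared between files.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: how many whole blocks a file occupies and how
// many bytes of the last block are left unused.
struct BlockUsage {
    std::uint64_t blocks;
    std::uint64_t slackBytes;
};

// Assumes blocks are not shared between files, so the file size is rounded
// up to a whole number of blocks.
BlockUsage blockUsage(std::uint64_t fileSize, std::uint64_t blockSize)
{
    assert(blockSize > 0);
    const std::uint64_t blocks = (fileSize + blockSize - 1) / blockSize;
    return {blocks, blocks * blockSize - fileSize};
}
```

For the file in the question (512 + 200 = 712 bytes with 512-byte blocks), this gives 2 blocks and 312 unused bytes in the second block.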
Creating your own File System can be a challenging experience, but also an enjoyable experience. I have created a few myself and worked on another. If you are creating your own file system, you can have it do whatever you wish.
Look at the bottom of this page for a few that I am working on/with. The LeanFS in particular, uses Inodes as well. The SFS is a very simple file system. Each is well documented so that you can research and decide what you would like to do.

Is WinDbg supposed to be so excruciatingly slow?

I'm trying to analyze some mini crash dumps. I'm using Windows 10 Pro Build 1607 and WinDbg 10.0.14321.1024. I have my symbol file path set to
SRV*C:\SymCache*https://msdl.microsoft.com/download/symbols
Basically, whenever I load up a minidump (all < 1 MB .dmp files), it takes WinDbg forever to actually analyze them. I understand the first run can take long, but it took mine almost 12 hours before it would let me enter a command. I assumed that, since the symbols were cached, it wouldn't take long at all to re-open the same .dmp. This is not the case. It loads up, goes pretty much instantaneously to "Loading Kernel Symbols", then takes another 30 minutes before it prints the "BugCheck" line. It's been another 30 minutes, and I still can't enter commands into it.
My PC has a 512 GB SSD, 8 GB of RAM, and an i5-4590. I don't think it should be this slow.
What am I doing wrong?
This kind of complaint seems to occur more often lately, and I can reproduce it on my PC. This is not your fault but some issue with the Internet or the symbol server on Microsoft's side.
Monitoring the traffic with Wireshark and watching how the symbol cache on my disk gets populated, I can say:
only one file is being downloaded at a time.
the problem also occurs with older WinDbg versions (6.2.9200)
the problem occurs with HTTP and HTTPS
when symbols are found, the transfer speed is very slow, then increasing. The effective transfer rate is down at 11 kb/s to 20 kb/s (on a line which can handle 6500 kb/s)
there's quite a high number of packets out of order, duplicate packets etc., especially during the "lookup phase" where no file is downloaded yet. Such a lookup phase can easily take 8 minutes.
even if the file already exists on disk, the "lookup phase" is performed.
the HTTP roundtrip time (request to response) is 8 to 9 seconds
This is the symbol server being really slow. Others have noticed as well: https://twitter.com/BruceDawson0xB/status/772586358556667904
Your symbol path contains a local cache, so it should load faster next time around, but it seems that the cache is not effective. I can't really tell why (I suspect the downloaded symbols are not a perfect match and they are being downloaded again, every time).
I would recommend modifying _NT_SYMBOL_PATH (or whatever way your sympath is initialized) to SRV*C:\SymCache only, i.e., do not attempt to download automatically, just use the symbols you already have cached locally. The image should open fairly fast. Only enable the symbol server if you discover missing symbols.
I ran into the same problem (extremely slow WinDbg), but loading/reloading/fixing/caching symbols did not help. By accident, I figured out that the problem appears when I try to print memory at an address taken from a register, like
db rax
The rule of thumb is to always use @ with the register name.
db @rax
Without this prefix, the debugger considers rax to be a symbol name, looks for it for some time (depending on the number of symbols you have loaded), eventually fails to find it, and only then falls back to treating it as a register name. Printing memory from a register with the @ prefix works instantly, even if you have gigabytes of symbols loaded in memory. As you can see, this problem is also symbol-related, but in a different way.

Matlab not able to read in large file?

I have a data file (6.3GB) that I'm attempting to work on in MATLAB, but I'm unable to get it to load, and I think it may be a memory issue. I've tried loading in a smaller "sample" file (39MB) and that seems to work, but my actual file won't load at all. Here's my code:
filename = 'C://Users/Andrew/Documents/filename.mat';
load(filename);
??? Error using ==> load
Can't read file C://Users/Andrew/Documents/filename.mat.
exist(filename);
EDU>> ans = 2
Well, at least the file exists. When I check the memory...
memory
Maximum possible array: 2046 MB (2.146e+009 bytes) *
Memory available for all arrays: 3442 MB (3.609e+009 bytes) **
Memory used by MATLAB: 296 MB (3.103e+008 bytes)
Physical Memory (RAM): 8175 MB (8.572e+009 bytes)
* Limited by contiguous virtual address space available.
** Limited by virtual address space available.
So since I have enough RAM, do I need to increase the maximum possible array size? If so, how can I do that without adding more RAM?
System specifics: I'm running 64-bit Windows, 8GB of RAM, MATLAB Version 7.10.0.499 (R2010a). I think I can't update to a newer version since I'm on a student license.
As the size might be the issue, you could try load('fileName.mat', 'var1'); load('fileName.mat', 'var2'); etc. For this, you'll have to know the variable names though.
An option would be to use the matfile object to load/index directly into the file instead of loading into ram.
doc matfile
But one limitation is that you cannot index directly into a struct. So you would need to find a friend to convert the struct in your MAT-file and save it with the version option
save(filename, variables, '-v7.3')
Maybe you can load your data part by part and do your work on each part, using the approach for loading part of a variable from a MAT-file. You must have MATLAB 7.3 or newer.
From your file path I can see you are using Windows. MATLAB is only 32-bit on Windows and Linux (there is no 64-bit version for these OSes, at least for older releases; please see my edit), which means you are limited to less than 4 GB of RAM in total for a single application, no matter how much you have in your system. This is a limitation of 32-bit applications, so there is nothing you can do to remedy it. Interestingly, the Mac version is 64-bit and you can use as much RAM as you want (in my computer vision class we often used my Mac for our big video projects because the Windows machines would just say "out of memory").
As you can see from your memory output, you can only have ~3.4 GB in total for matrix storage, which is far less than the 6.3 GB file. You'll also notice that you can only use ~2 GB for any one matrix (that number changes as you use more memory).
Typically, when working with large files, you can read the file line by line rather than loading the entire file into memory. But since this is a .mat file, that likely wouldn't work. If the file contains multiple variables, maybe separate them into their own individual files that are small enough to load.
The take-home message here is that you can't read the entire file at once unless you hop onto a Mac with enough RAM. Even then, the size limit for a single matrix is still likely less than 6.3 GB.
EDIT
Current MATLAB student versions can be purchased in 64-bit for all OSes as of 2014, so a newer release of MATLAB might allow you to read the entire file at once. I should also add that there was a 64-bit version before 2014, but not for the student license.

MATLAB used up all my disk space! How can I get it back?

I left MATLAB running a simple ode45 + plot, and when I came back I saw that the 5 GB of free space I had on my drive (C:) was gone! MATLAB had stopped due to "no memory".
Can someone please tell me what happened and how I can get my space back???
Thank You.
You can visually inspect hard disk usage and find folders and files which take up a lot of space with a tool such as TreeSize Free.
P.S. You can also try clearing temporary folders, either through the built-in disk cleaner or with other tools such as CCleaner.
MATLAB is one of those applications that ships with a whole world of computing science when you may only want to work in one small island of it, and its Help folder is huge. Here are some things you can do to make it slimmer on disk:
1) Install only the packages you need.
2) Use JPEGMini to compress the JPEG collection in the huge help folder.
3) Use Pngyu to compress the huge collection of PNG files to 8-bit depth.
Steps 2 and 3 will get you back about a gigabyte, if not more.
4) Use NTFS compression on the MATLAB folder.
It will get you back another 2 gigabytes.
Both steps 2 and 3 must be done with admin privileges. The drag and drop of the folder onto these tools must also be done from another app running with admin privileges; you can use Explorer++ as an alternative to Windows File Explorer.

MATLAB slowing down on long debugging sessions

I have noticed that MATLAB (R2011b on Windows 7, 64 bit) tends to slow down if I am in debugging mode for a long period of time (e.g. 3 hours). I don't recall this happening on previous versions of MATLAB.
The slowdown is small, but significant enough to have an impact on my productivity (sometimes MATLAB needs to wait for up to 1 sec before I can type on the command line or in the editor).
I usually spend hours in debugging mode (e.g. after stopping at a keyboard statement), coding full projects in this mode. I find working in debugging mode convenient for growing my code organically while being able to inspect it at any point during execution.
The odd thing is that my machine has 16 GB of RAM and the total size of all workspaces while in debugging mode is usually less than 4 GB. I don't have any other large process running in the background, and my system reports ~8 GB of free RAM.
Also, unfortunately MATLAB does not let me call pack from debugging mode; it complains with:
Warning: PACK can only be used from the MATLAB command line.
I have reproduced this behavior after restarting MATLAB, after rebooting my system, and on different days. With this, my questions are:
Has anybody else noticed this? Is there anything I could do to prevent this slowdown without exiting debugging mode?
Are there any technical notes or statements from Mathworks addressing this issue?
In case it matters, my code is on a network drive, so I added the following on my startup.m file, which should alleviate any impact on performance resulting from it:
system_dependent('RemoteCWDPolicy', 'None');
system_dependent('RemotePathPolicy', 'None');
system_dependent('DirChangeHandleWarn','Never');
I have experienced some similar issues. The problem ended up being that MathWorks changed how MATLAB caches files. For some users, it now stores data in the TMP folder as defined by the environment variables. This folder was being scanned by antivirus software, which caused a lot of performance problems. Of course, IT wouldn't let us exclude the TMP folder from scans. So we added a line to our startup script that changes the TMP environment variable to some other location within an excluded folder.
You don't have to worry about changing the variable back or messing up other programs. When applications launch, they copy the environment variables into their own local instance of them. Any changes made to them only change the local copy of those variables, not the system copy.
Here is the function you will need.
setenv('TEMP', 'C:\TEMP');
I'm not sure if it was TMP or TEMP. Check your environment variables to be sure.
I am using MATLAB R2011 on linux 10, windows 7 (32 bit).
I experienced MATLAB slowing down while printing simple variables in the command window.
It turned out that there was one .m file loaded in my Editor.
It was a big file with 10000 lines. These lines were simple data that should have been saved as a MAT-file. When I closed this file, the editor was back to its normal speed.