When writing a large array directly to disk in MATLAB, is there any need to preallocate? - matlab

I need to write an array that is too large to fit into memory to a .mat binary file. This can be accomplished with the matfile function, which allows random access to a .mat file on disk.
Normally, the accepted advice is to preallocate arrays, because expanding them on every iteration of a loop is slow. However, when I was asking how to do this, it occurred to me that this may not be good advice when writing to disk rather than RAM.
Will the same performance hit from growing the array apply, and if so, will it be significant when compared to the time it takes to write to disk anyway?
(Assume that the whole file will be written in one session, so the risk of serious file fragmentation is low.)

Q: Will the same performance hit from growing the array apply, and if so will it be significant when compared to the time it takes to write to disk anyway?
A: Yes, performance will suffer if you significantly grow a file on disk without pre-allocating. The performance hit will be a consequence of fragmentation. As you mentioned, fragmentation is less of a risk if the file is written in one session, but will cause problems if the file grows significantly.
A related question was raised on the MathWorks website, and the accepted answer was to pre-allocate when possible.
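As an illustration of that advice, a matfile variable can be grown to its final size up front by assigning to its last element, which reserves the whole array in the file before the loop starts. A minimal sketch (file name and dimensions are placeholders, not taken from the question):
% Reserve the full array on disk before writing any data
nRows = 1e4;  nCols = 1e3;                      % placeholder dimensions
m = matfile('bigArray.mat', 'Writable', true);
m.A(nRows, nCols) = 0;                          % grows A to its final size in one step; unset elements default to zero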
If you don't pre-allocate, then the extent of your performance problems will depend on:
your filesystem (how data are stored on disk, the cluster-size),
your hardware (HDD seek time, or SSD access times),
the size of your mat file (whether it moves into non-contiguous space),
and the current state of your storage (existing fragmentation / free space).
Let's pretend that you're running a recent Windows OS, and so are using the NTFS file-system. Let's further assume that it has been set up with the default 4 kB cluster size. So, space on disk gets allocated in 4 kB chunks, and the locations of these are indexed in the Master File Table (MFT). If the file grows and contiguous space is not available, then there are only two choices:
Re-write the entire file to a new part of the disk, where there is sufficient free space.
Fragment the file, storing the additional data at a different physical location on disk.
The file system chooses to do the least-bad option, #2, and updates the MFT record to indicate where the new clusters will be on disk.
Now, the hard disk needs to physically move the read head in order to read or write the new clusters, and this is a (relatively) slow process. In terms of moving the head, and waiting for the right area of disk to spin underneath it ... you're likely to be looking at a seek time of about 10 ms. So every time you hit a fragment, there will be an additional 10 ms delay whilst the HDD moves to access the new data. SSDs have much shorter seek times (no moving parts). For the sake of simplicity, we're ignoring multi-platter systems and RAID arrays!
If you keep growing the file at different times, then you may experience a lot of fragmentation. This really depends on when / how much the file is growing by, and how else you are using the hard disk. The performance hit that you experience will also depend on how often you are reading the file, and how frequently you encounter the fragments.
MATLAB stores data in Column-major order, and from the comments it seems that you're interested in performing column-wise operations (sums, averages) on the dataset. If the columns become non-contiguous on disk then you're going to hit lots of fragments on every operation!
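For example, a sketch of a column-wise pass over the on-disk variable (assuming the matfile object m, the variable A and nCols from the earlier snippet) reads one column per access, which stays cheap only while the columns are contiguous in the file:
% Column-wise pass over the on-disk variable A (names assumed from the sketch above)
colMeans = zeros(1, nCols);
for j = 1:nCols
    colMeans(j) = mean(m.A(:, j));   % one column per read
end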
As mentioned in the comments, both read and write actions will be performed via a buffer. As @user3666197 points out, the OS can speculatively read ahead of the current data on disk, on the basis that you're likely to want that data next. This behaviour is especially useful if the hard disk would otherwise be sitting idle at times - keeping it operating at maximum capacity and working with small parts of the data in buffer memory can greatly improve read and write performance. However, from your question it sounds as though you want to perform large operations on a huge (too big for memory) .mat file. Given your use-case, the hard disk is going to be working at capacity anyway, and the data file is too big to fit in the buffer - so these particular tricks won't solve your problem.
So ... yes, you should pre-allocate. Yes, a performance hit from growing the array on disk will apply. Yes, it will probably be significant (it depends on specifics like the amount of growth, fragmentation, etc.). And if you're really going to get into the HPC spirit of things, then stop what you're doing, throw away MATLAB, shard your data and try something like Apache Spark! But that's another story.
Does that answer your question?
P.S. Corrections / amendments welcome! I was brought up on POSIX inodes, so sincere apologies if there are any inaccuracies in here...

Preallocating a variable in RAM and preallocating on the disk don't solve the same problem.
In RAM
To expand a matrix in RAM, MATLAB creates a new matrix with the new size, copies the values of the old matrix into the new one and deletes the old one. This is expensive.
If you preallocate the matrix, its size does not change, so there is no reason for MATLAB to do this copying anymore.
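A minimal sketch of the contrast (the sizes are arbitrary):
% Growing x on every iteration forces repeated reallocation and copying
x = [];
for i = 1:1e5
    x(i) = i^2;          %#ok<SAGROW>
end
% Preallocating y fixes its size, so no copying is needed inside the loop
y = zeros(1, 1e5);
for i = 1:1e5
    y(i) = i^2;
end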
On the hard-disk
The problem on the hard-disk is fragmentation as GnomeDePlume said. Fragmentation will still be a problem, even if the file is written in one session.
Here is why: the hard disk will generally be a little fragmented already. Imagine
# to be memory blocks on the hard disk that are full
M to be memory blocks on the hard disk that will be used to save data of your matrix
- to be free memory blocks on the hard disk
Now the hard disk could look like this before you write the matrix onto it:
###--##----#--#---#--------------------##-#---------#---#----#------
When you write parts of the matrix (e.g. MMM blocks) you could imagine the process looking like this (I give an example where the file system just goes from left to right and uses the first free space that is big enough - real file systems are different):
First matrix part:
###--##MMM-#--#---#--------------------##-#---------#---#----#------
Second matrix part:
###--##MMM-#--#MMM#--------------------##-#---------#---#----#------
Third matrix part:
###--##MMM-#--#MMM#MMM-----------------##-#---------#---#----#------
And so on ...
Clearly the matrix file on the hard disk is fragmented although we wrote it without doing anything else in the meantime.
This can be avoided if the matrix file is preallocated. In other words, we tell the file system how big our file will be, or in this example, how many memory blocks we want to reserve for it.
Imagine the matrix needs 12 blocks: MMMMMMMMMMMM. We tell the file system that we need this much by preallocating, and it will try to accommodate our needs as well as it can. In this example we are lucky: there is a free region of at least 12 contiguous memory blocks.
Preallocating (We need 12 memory blocks):
###--##----#--#---# (------------) --------##-#---------#---#----#------
The file system reserves the space between the parentheses for our matrix and will write into there.
First matrix part:
###--##----#--#---# (MMM---------) --------##-#---------#---#----#------
Second matrix part:
###--##----#--#---# (MMMMMM------) --------##-#---------#---#----#------
Third matrix part:
###--##----#--#---# (MMMMMMMMM---) --------##-#---------#---#----#------
Fourth and last part of the matrix:
###--##----#--#---# (MMMMMMMMMMMM) --------##-#---------#---#----#------
Voilà, no fragmentation!
Analogy
Generally you could imagine this process as buying cinema tickets for a large group. You would like to sit together as a group, but there are already some seats in the theatre reserved by other people. For the cashier to be able to accommodate your request (a large group that wants to sit together), he/she needs to know how big your group is (preallocating).

A quick answer to the whole discussion (in case you do not have the time to follow or the technical understanding):
Pre-allocation in Matlab is relevant for operations in RAM. Matlab does not give low-level access to I/O operations and thus we cannot talk about pre-allocating something on disk.
When writing a big amount of data to disk, it has been observed that the fewer the writes, the faster the task executes and the smaller the fragmentation on disk.
Thus, if you cannot write everything in one go, split the writes into big chunks, as in the sketch below.
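A minimal sketch of what "big chunks" could look like when writing through matfile (variable names and sizes are illustrative only, not from the question):
% Write in large row-blocks instead of one row (or one element) at a time
nRows = 1e5;  nCols = 1e3;  chunkRows = 1e4;    % placeholder sizes
m = matfile('bigdata.mat', 'Writable', true);
for r = 1:chunkRows:nRows
    idx = r : min(r + chunkRows - 1, nRows);
    m.A(idx, 1:nCols) = rand(numel(idx), nCols);  % rand() stands in for the real data
end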

Prologue
This answer is based on both the original post and the clarifications (both of them) provided by the author during the past week.
The question of the adverse performance hit(s) introduced by low-level, physical-media-dependent "fragmentation", introduced by both the file-system and file-access layers, is confronted below, both in terms of its TimeDOMAIN magnitudes and in terms of its ComputingDOMAIN repetitiveness, with the real problems of such an approach.
Finally, a state-of-the-art, principally fastest possible solution to the given task is proposed, so as to minimise the damage from both wasted effort and mis-interpretation errors stemming from idealised or otherwise invalid assumptions, such as the assumption that the risk of "serious file fragmentation is low" because the whole file will be written in one session (which is simply, principally, not possible under the many multi-core / multi-process operations of a contemporary O/S acting in real time over the time of creation, and over a sequence of extensive modifications (ref. the MATLAB size limits), of a TB-sized BLOB file-object inside contemporary COTS file systems).
One may hate the facts, however the facts remain true out there until a faster & better method moves in
First, before considering performance, realise the gaps in the concept
The real performance adverse hit is not caused by HDD-IO or related to the file fragmentation
RAM is not an alternative for the semi-permanent storage of the .mat file
Additional operating-system limits and interventions, plus additional driver and hardware-based abstractions, were left out of the assumptions about unavoidable overheads
The said computational scheme was omitted from the review of what will have the biggest impact / influence on the resulting performance
Given:
The whole processing is intended to be run just once, no optimisation / iterations, no continuous processing
Data have 1E6 double Float-values x 1E5 columns = about 0.8 TB (+HDF5 overhead)
In spite of the original post, there is no random IO associated with the processing
The data acquisition phase communicates with a .NET component to receive DataELEMENTs into MATLAB
That means, since v7.4,
a 1.6 GB limit on MATLAB WorkSpace in a 32bit Win ( 2.7 GB with a 3GB switch )
a 1.1 GB limit on MATLAB biggest Matrix in wXP / 1.4 GB wV / 1.5 GB
a bit "released" 2.6 GB limit on MATLAB WorkSpace + 2.3 GB limit on a biggest Matrix in a 32bit Linux O/S.
Having a 64bit O/S will not help any kind of a 32bit MATLAB 7.4 implementation and will fail to work due to another limit, the maximum number of cells in array, which will not cover the 1E12 requested here.
The only chance is to have both
both a 64bit O/S ( wXP, Linux, Solaris )
and a 64bit MATLAB 7.5+
The MathWorks source cited above is for R2007a; for newer MATLAB (R2013a) you need a user account there
The data storage phase assumes block-writes of row-ordered data blocks ( a collection of row-ordered data blocks ) into a MAT-file on an HDD-device
The data processing phase assumes re-processing of the data in a MAT-file on an HDD-device, after all inputs have been acquired and marshalled to a file-based off-RAM storage, but in a column-ordered manner
just column-wise mean()-s / max()-es are needed to calculate ( nothing more complex )
Facts:
MATLAB uses a "restricted" implementation of an HDF5 file-structure for binary files.
Review performance measurements on real data & real hardware ( HDD + SSD ) to get a feeling for the scale of its unavoidable weaknesses
The Hierarchical Data Format (HDF) was born in 1987 at the National Center for Supercomputing Applications (NCSA), some 20 years ago. Yes, that old. The goal was to develop a file format that combines flexibility and efficiency to deal with extremely large datasets. Somehow HDF files were not used in the mainstream, as just a few industries were indeed able to really make use of their terrifying capacities or simply did not need them.
FLEXIBILITY means that the file structure bears some overhead that you need not use if the content of the array is not changing (you pay the cost without consuming any benefit of it). And the assumption that HDF5's limits on the overall size of the data it can contain somehow help and save the MATLAB side of the problem is not correct.
MAT-files are good in principle, as they avoid an otherwise persistent need to load a whole file into RAM to be able to work with it.
Nevertheless, MAT-files do not serve well the simple task as it was defined and clarified here. An attempt to do so will result in just poor performance, and the HDD-IO file fragmentation (adding a few tens of milliseconds during write-throughs and somewhat less than that on read-aheads during the calculations) will not help at all in judging the core reason for the overall poor performance.
A professional solution approach
Rather than moving the whole gigantic set of 1E12 DataELEMENTs into a MATLAB in-memory proxy data array that is merely scheduled for the next sequenced stream of HDF5 / MAT-file HDD-device IOs (write-throughs and O/S-vs-hardware-device-chain conflicting / sub-optimised read-aheads), just so that all this immense work is "just [married] ready" for a few trivially simple calls of the mean() / max() MATLAB functions (which will do their best to revamp each of the 1E12 DataELEMENTs in just another order, and even TWICE -- yes, another circus right after the first job-processing nightmare has made its way down through all the HDD-IO bottlenecks -- back into MATLAB in-RAM objects), redesign this very step into a pipe-lined BigDATA processing from the very beginning.
while true                                        % ref. comment Simon W Oct 1 at 11:29
    [ isStillProcessingDotNET, ...                % a FLAG from the .NET reader function
      aDotNET_RowOfVALUEs ...                     % a ROW  from the .NET reader function
      ] = GetDataFromDotNET( aDtPT );             % .NET reader
    if ( isStillProcessingDotNET )                % Yes, more rows are still to come ...
        aRowCOUNT = aRowCOUNT + 1;                % keep .INC for aRowCOUNT ( mean() )
        for i = 1:size( aDotNET_RowOfVALUEs, 2 )  % stepping across each column
            aValue = aDotNET_RowOfVALUEs(i);      %
            anIncrementalSumInCOLUMN(i) = ...
            anIncrementalSumInCOLUMN(i) + aValue; % keep .SUM for each column ( mean() )
            if ( aMaxInCOLUMN(i) < aValue )       % retest for a "max.update()"
                aMaxInCOLUMN(i) = aValue;         % .STO a just-found "new" max
            end
        end
        continue                                  % force re-loop
    else
        break
    end
end
%-------------------------------------------------------------------------------------------
% FINALLY:
% all results are pre-calculated right at the end of .NET reading phase:
%
% -------------------------------
% BILL OF ALL COMPUTATIONAL COSTS ( for given scales of 1E5 columns x 1E6 rows ):
% -------------------------------
% HDD.IO: **ZERO**
% IN-RAM STORAGE:
% Attr Name                        Size          Bytes   Class
% ==== ====                        ====          =====   =====
%      aMaxInCOLUMN                1x100000      800000   double
%      anIncrementalSumInCOLUMN    1x100000      800000   double
%      aRowCOUNT                   1x1                8   double
%
% DATA PROCESSING:
%
% 1.000.000x .NET row-oriented reads ( same for both the OP and this, smarter BigDATA approach )
% 1x INT in aRowCOUNT, %% 1E6 .INC-s
% 100.000x FLOATs in aMaxInCOLUMN[] %% 1E5 * 1E6 .CMP-s
% 100.000x FLOATs in anIncrementalSumInCOLUMN[] %% 1E5 * 1E6 .ADD-s
% -----------------
% about 15 sec per COLUMN of 1E6 rows
% -----------------
% --> mean()s are anIncrementalSumInCOLUMN./aRowCOUNT
%-------------------------------------------------------------------------------------------
% PIPE-LINE-d processing takes in TimeDOMAIN "nothing" more than the .NET-reader process
%-------------------------------------------------------------------------------------------
Your pipe-lined BigDATA computation strategy will, in a smart way, principally avoid interim storage buffering in MATLAB, as it progressively calculates the results in no more than about 3 x 1E6 ADD/CMP registers, all with a static layout. It avoids proxy storage into an HDF5 / MAT-file, absolutely avoids all HDD-IO-related bottlenecks and the low sustained BigDATA read speeds (not speaking at all about the interim BigDATA sustained writes ...), and also avoids ill-performing memory-mapped use just for counting means and maxes.
Epilogue
The pipeline processing is nothing new under the Sun.
It re-uses what speed-oriented HPC solutions have already been using for decades
[ generations before BigDATA tag has been "invented" in Marketing Dept's. ]
Forget about zillions of HDD-IO blocking operations & go into a pipelined distributed process-to-process solution.
There is nothing faster than this
If it were, all FX business and HFT Hedge Fund Monsters would already be there...

Related

How to 'copy' matrix without creating a temporary matrix in memory that caused memory overflow?

By assigning a matrix into a much bigger preallocated array, MATLAB will somehow duplicate it while 'copying' it, and if the matrix to be copied is large enough, there will be a memory overflow. This is the sample code:
main_mat=zeros(500,500,2000);
n=500;
slice_matrix=zeros(500,500,n);
for k=1:4
parfor i=1:n
slice_matrix(:,:,i)=gather(gpuArray(rand(500,500)));
end
main_mat(:,:,1+(k-1)*n:1+(k-1)*n+n-1)=slice_matrix; %This is where the memory will likely overflow
end
Any way to just 'smash' the slice_matrix onto the main_mat without the overhead? Thanks in advance.
EDIT:
The overflow occurs when main_mat is allocated beforehand. If main_mat is initialized with main_mat=zeros(500,500,1); (smaller size), the overflow does not occur, but it slows down, because the allocation is not done before the matrix is assigned into it. This significantly reduces performance as the range of k increases.
The main issue is that numbers take more space than zeros.
main_mat=zeros(500,500,2000); takes little RAM while main_mat = rand(500,500,2000); takes a lot, no matter if you use the GPU or parfor (in fact, parfor will make you use more RAM). So this is not an unnatural swelling of memory. Following Daniel's link below, it seems that the assignment of zeros only creates pointers to memory, and the physical memory is filled only when you use the matrix for "numbers". This is managed by the operating system, and it is the expected behaviour on Windows, Mac and Linux, whether you use MATLAB or other languages such as C.
Removing parfor will likely fix your problem.
parfor is not useful there. MATLAB's parfor does not use shared-memory parallelism (i.e. it doesn't start new threads) but rather distributed-memory parallelism (it starts new processes). It is designed to distribute work over a set of worker nodes. And though it also works within one node (or a single desktop computer) to distribute work over multiple cores, it is not an optimal way of doing parallelism within one node.
This means that each of the processes started by parfor needs to have its own copy of slice_matrix, which is the cause of the large amount of memory used by your program.
See "Decide When to Use parfor" in the MATLAB documentation to learn more about parfor and when to use it.
I assume that your code is just sample code and that rand() represents a custom function in your MVE (minimal verifiable example). So here are a few hints and tricks for memory usage in MATLAB.
There is a snippet from The MathWorks training handbooks:
When assigning one variable to another in MATLAB, as occurs when passing parameters into a function, MATLAB transparently creates a reference to that variable. MATLAB breaks the reference, and creates a copy of that variable, only when code modifies one or more of the values. This behavior, known as copy-on-write, or lazy copying, defers the cost of copying large data sets until the code modifies a value. Therefore, if the code performs no modifications, there is no need for extra memory space and execution time to copy variables.
The first thing to do would be to check the (memory) efficiency of your code. Even the code of excellent programmers can be further optimized with (a little) brain power. Here are a few hints regarding memory efficiency:
make use of the native vectorization of MATLAB, e.g. sum(X,2), mean(X,2), std(X,[],2)
make sure that MATLAB does not have to expand matrices (implicit expansion was changed recently). It might be more efficient to use bsxfun
use in-place operations, e.g. x = 2*x+3 rather than y = 2*x+3
...
Be aware that the optimum regarding memory usage is not the same as the optimum for computation time. Therefore, you might want to consider reducing the number of workers or refraining from using the parfor-loop at all. (As parfor cannot use shared memory, there is no copy-on-write feature when using the Parallel Computing Toolbox.)
If you want to have a closer look at your memory, what is available and what can be used by MATLAB, check out feature('memstats'). What is interesting for you is the virtual memory, that is
Total and available memory associated with the whole MATLAB process. It is limited by processor architecture and operating system.
or use this command [user,sys] = memory.
Quick side note: MATLAB stores matrices contiguously in memory. You need to have a large block of free RAM for large matrices. That is also the reason why you want to preallocate variables: changing them dynamically forces MATLAB to copy the entire matrix to a larger spot in the RAM every time it outgrows the current spot.
If you really have memory issues, you might just want to dig into the art of data types -- as is required in lower-level languages. E.g. you can cut your memory usage in half by using single precision directly from the start: main_mat=zeros(500,500,2000,'single'); -- btw, this also works with rand(...,'single') and other native functions -- although a few of the more sophisticated MATLAB functions require input of type double, which you can upcast again.
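A quick way to see the factor-of-two difference that whos reports (sizes follow the example in the question):
% whos reports ~4 GB for the double array and ~2 GB for the single array
d = zeros(500, 500, 2000);             % double: 8 bytes per element
s = zeros(500, 500, 2000, 'single');   % single: 4 bytes per element
whos d s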
If I understand correctly, your main issue is that parfor does not allow sharing memory. Think of every parfor worker as almost a separate MATLAB instance.
There is basically just one workaround for this that I know of (and that I have never tried), namely 'shared matrix' on the File Exchange: https://ch.mathworks.com/matlabcentral/fileexchange/28572-sharedmatrix
More solutions: as others suggested, removing parfor is certainly one solution; get more RAM; use tall arrays (which use hard drives when RAM runs full, read here); divide operations into smaller chunks; last but not least, consider an alternative other than MATLAB.
You may use the following code. You actually don't need slice_matrix at all.
main_mat = zeros(500,500,2000);
n = 500;
for k = 1:4
    offset = (k-1)*n;              % constant within each parfor run
    parfor i = 1:n
        main_mat(:,:,offset + i) = gather(gpuArray(rand(500,500)));
    end
    % no slice_matrix and no bulk copy into main_mat needed anymore
end

Optimizing compression using HDF5/H5 in Matlab

Using Matlab, I am going to generate several data files and store them in H5 format as 20x1500xN, where N is an integer that can vary, but typically around 2300. Each file will have 4 different data sets with equal structure. Thus, I will quickly achieve a storage problem. My two questions:
Is there any reason not to split the 4 different data sets, and just save them as 4x20x1500xN instead? I would prefer having them split, since they are different signal modalities, but if there is any computational/compression advantage to not having them separated, I will join them.
Using Matlab's built-in compression, I set deflate=9 (and DataType=single). However, I have now realized that using deflate multiplies my computational time by 5. I realize this could have something to do with my ChunkSize, which I just set to 20x1500x5 - without any reasoning behind it. Is there a strategic way to optimize computational load w.r.t. deflation and compression time?
Thank you.
1- Splitting or merging? It won't make a difference in the compression procedure, since it is performed in blocks.
2- Your choice of chunkshape does indeed seem bad. The chunksize determines the shape and size of each block that will be compressed independently. The problem is that each of your chunks is 600 kB, which is much larger than the L2 cache, so your CPU is likely twiddling its thumbs waiting for data to come in. Depending on the nature of your data and the usage pattern you will use most (read the whole array at once, random reads, sequential reads...), you may want to target the L1 or L2 cache sizes, or something in between. Here are some experiments done with a Python library that may serve you as a guide.
Once you have selected your chunksize (how many bytes your compression blocks will have), you have to choose a chunkshape. I'd recommend the shape that most closely fits your reading pattern if you are doing partial reads, or filling in fastest-axis-first if you want to read the whole array at once. In your case, this will be something like 1x1500x10, I think (the second axis being the fastest, the last one the second fastest, and the first the slowest; correct me if I am mistaken).
Lastly, keep in mind that the details are quite dependent on the specific machine you run it on: the CPU, the quality and load of the hard drive or SSD, the speed of the RAM... so the fine tuning will always require some experimentation.
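For reference, this is roughly what experimenting with chunk shape and deflate level looks like from MATLAB; the file name, dataset name, chunk shape and deflate level below are illustrative starting points, not a recommendation:
% One dataset of 20x1500xN singles, chunked as 1x1500x10 (~60 kB per chunk)
N = 2300;
h5create('signals.h5', '/set1', [20 1500 N], ...
         'Datatype',  'single', ...
         'ChunkSize', [1 1500 10], ...
         'Deflate',   4);                       % a lower level than 9 is much cheaper to compute
data = rand(20, 1500, N, 'single');             % stand-in for the real data
h5write('signals.h5', '/set1', data);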

What are the maximum number of columns in the input data in MATLAB

I must import a big data file into MATLAB; its size is about 300 MB.
Now I want to know the maximum number of columns that I can import into MATLAB, so I can divide that file into several smaller files.
Please help me.
There are no "maximum" number of columns that you can create for a matrix. What's the limiting factor is your RAM (à la knedlsepp), the data type of the matrix (which is also important... a lot of people overlook this), your operating system, and also what version of MATLAB you're using - specifically whether it's 32 or 64 bit.
If you want a more definitive answer, here's a comprehensive chart from MathWorks forums on what you can allocate given your OS version, MATLAB version and the data type of the matrix you want to create:
The link to this post is here: http://www.mathworks.com/matlabcentral/answers/91711-what-is-the-maximum-matrix-size-for-each-platform
Even though the above chart is for MATLAB R2007a, the sizes will most likely not have changed over the evolution of the software.
There are a few caveats with the above figure that you need to take into account:
The above table also takes your workspace size into account. As such, if you have other variables in memory and you are trying to allocate a matrix that approaches the limit seen in the chart, you will not be successful in its allocation.
The above table assumes that MATLAB has just been launched with no major processing carried out in a startup.m file.
The above table assumes that there is unlimited system memory, that is, that RAM plus any virtual memory or swap file is available.
The above table's actual limits will be less if there is insufficient system memory available, usually due to the swap file being too small.
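If you want to check what your particular machine will allow before splitting the file, the memory function (available on Windows) reports the current limit directly; a minimal sketch:
% Largest array MATLAB could currently allocate (Windows only)
[userview, systemview] = memory;
maxBytes   = userview.MaxPossibleArrayBytes;
maxDoubles = floor(maxBytes / 8);        % 8 bytes per double element
fprintf('Largest contiguous double array: about %d elements\n', maxDoubles);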

What is the difference between compaction and defragmentation?

My operating systems textbook says that compaction is a process that rearranges disk blocks such that all free disk blocks form a contiguous "chunk" of free disk space.
But I always thought that was what defragmentation does? Are these two terms the same? Or am I missing something?
Compaction: moving the "in-use" memory areas to eliminate holes caused by terminated processes. Suppose we have five processes A, B, C, D, E, allocated as |A|B|C|D|E| in memory. After some time processes B and D terminate. Now the memory layout is |A| |C| |E|. After applying compaction we have |A|C|E| | |, i.e. instead of two one-block free memory units we have one two-block free memory unit.
Defragmentation: storing a complete file in the smallest number of contiguous regions.
That is, it tries to store a file as one complete unit if a contiguous region of that size is available. Suppose file A has fragments A1, A2, A3 and file B has fragments B1, B2. Now suppose the disk layout is |A1|B1|A2|A3|B2|; after defragmentation we have |A1|A2|A3|B1|B2|. Defragmentation can also contribute to compaction.
In modern disk operating systems, files are subdivided into blocks, each of which may be stored at an arbitrary location on the disk. Files can be read most quickly from a physical disk if the blocks are stored consecutively, but I think every OS that was created since the mid-1980's can, without difficulty, create a file which is larger than the single largest consecutive free area on the disk, provided that the total size of all free areas is sufficient to hold the file. Such a file will end up with different pieces stored in different formerly-free parts of the disk, and thus accessing it will often not be as fast as if the entire file had been stored consecutively.
Conceptually, an "ideal" disk arrangement would have the contents of every file stored consecutively, with all files stored "back-to-back", so all the unused blocks were in a consecutive range. Such an arrangement would be both "compacted" and "defragmented". In general, though, the amount of effort to arrange everything perfectly is seldom worthwhile (with an obvious exception being a disk that is written all at once, and will never be modified, as would typically be the case with e.g. a CD-ROM). Defragmenting a disk will move all the blocks that make up each file to a consecutive sequence of blocks on the disk, but will not necessarily attempt to eliminate free areas between files. Compacting a disk will consolidate all of the free areas by moving data from later parts of the disk to unused locations in earlier parts, but may cause fragmentation of existing files.
Generally, software that performs defragmentation will try to avoid creating too many scattered free areas, and software that performs compaction will try to avoid causing needless fragmentation, but depending upon what the software is trying to do (e.g. maximize efficiency for existing files, versus preparing a large contiguous area of space in preparation for a large data-acquisition operation that needs to run smoothly) the software may focus on one kind of operation at the expense of the other.

is kdb fast solely due to processing in memory

I've heard people talk quite a few times about KDB dealing with millions of rows in nearly no time. Why is it that fast? Is that solely because the data is all organized in memory?
Another thing: are there alternatives to this? Do any big database vendors provide in-memory databases?
A quick Google search came up with the answer:
Many operations are more efficient with a column-oriented approach. In particular, operations that need to access a sequence of values from a particular column are much faster. If all the values in a column have the same size (which is true, by design, in kdb), things get even better. This type of access pattern is typical of the applications for which q and kdb are used.
To make this concrete, let's examine a column of 64-bit, floating point numbers:
q).Q.w[] `used
108464j
q)t: ([] f: 1000000 ? 1.0)
q).Q.w[] `used
8497328j
q)
As you can see, the memory needed to hold one million 8-byte values is only a little over 8MB. That's because the data are being stored sequentially in an array. To clarify, let's create another table:
q)u: update g: 1000000 ? 5.0 from t
q).Q.w[] `used
16885952j
q)
Both t and u are sharing the column f. If q organized its data in rows, the memory usage would have gone up another 8MB. Another way to confirm this is to take a look at k.h.
Now let's see what happens when we write the table to disk:
q)`:t/ set t
`:t/
q)\ls -l t
"total 15632"
"-rw-r--r-- 1 kdbfaq staff 8000016 May 29 19:57 f"
q)
16 bytes of overhead. Clearly, all of the numbers are being stored sequentially on disk. Efficiency is about avoiding unnecessary work, and here we see that q does exactly what needs to be done when reading and writing a column - no more, no less.
OK, so this approach is space efficient. How does this data layout translate into speed?
If we ask q to sum all 1 million numbers, having the entire list packed tightly together in memory is a tremendous advantage over a row-oriented organization, because we'll encounter fewer misses at every stage of the memory hierarchy. Avoiding cache misses and page faults is essential to getting performance out of your machine.
Moreover, doing math on a long list of numbers that are all together in memory is a problem that modern CPU instruction sets have special features to handle, including instructions to prefetch array elements that will be needed in the near future. Although those features were originally created to improve PC multimedia performance, they turned out to be great for statistics as well. In addition, the same synergy of locality and CPU features enables column-oriented systems to perform linear searches (e.g., in where clauses on unindexed columns) faster than indexed searches (with their attendant branch prediction failures) up to astonishing row counts.
Source(s): http://www.kdbfaq.com/kdb-faq/tag/why-kdb-fast
As for speed, the memory thing does play a big part, but there are several other things: fast reads from disk for the HDB, splaying, etc. From personal experience I can say you can get pretty good speeds from C++, provided you are willing to write that much code. With kdb you get all that and some more.
Another aspect of speed is the speed of coding. It's a steep learning curve, but once you get it, complex problems can be coded in minutes.
For alternatives you can look at OneTick, or google "in-memory databases".
kdb is fast but really expensive. Plus, it's a pain to learn Q. There are a few alternatives such as DolphinDB, Quasardb, etc.