How to append data to a compressed file? - swift

I am getting a lot of data from a WebSocket stream and I want to store it on disk. The amount of data received is ~300 MB per hour, and I want to store this data long term (months, years).
In .NET there is a way to read/write from/to zipped files using compressed streams. Is there a way to write directly to a compressed file in Swift?
This is a macOS (OS X) question.
Edit:
Stream compression here might be a solution, but I am not used to working with unsafe pointers and don't even know whether it can be used to write to a compressed file... I have been stuck on this for a few hours now. A code sample or directions on how to approach it would help. A CocoaPods wrapper for stream compression would be even better.

gzlog does what you're looking for. It is written in C and uses the zlib library. zlib is available on macOS, and you can link to C code from Swift.
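If you only need the append part, the core idea gzlog builds on can be sketched directly in Swift. A minimal sketch, assuming zlib has been exposed to Swift (for example via a bridging header that #includes <zlib.h>, with libz linked); the function name is made up for illustration:

    import Foundation

    func appendCompressed(_ data: Data, toFileAtPath path: String) throws {
        // "ab" opens the gzip file for appending; each call writes a new
        // gzip member, and gunzip decompresses concatenated members as
        // one continuous stream.
        guard let file = gzopen(path, "ab") else {
            throw POSIXError(.EIO)
        }
        defer { gzclose(file) }
        let written = data.withUnsafeBytes { buffer in
            gzwrite(file, buffer.baseAddress, UInt32(buffer.count))
        }
        guard written == Int32(data.count) else {
            throw POSIXError(.EIO)
        }
    }

Note this is the naive version of the idea; gzlog itself adds crash robustness on top (it can recover the log if the process dies mid-append), which is why it is the better fit for long-term storage.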

Related

Store binary file in OrientDB using pyorient

I am trying to store a binary file in OrientDB. I am using pyorient. The file can be large (more than 50 MB), and I could not find a way except for storing its hex representation as a list of strings, which takes a long time. Is there a more elegant way to do it, and faster?
Unfortunately I think not; I spent a bunch of time on this and OrientDB's ability to store binary files isn't exposed in PyOrient as far as I could tell.
As you may know, PyOrient hasn't received an update since 2017 (I assume it is no longer officially supported), and none of the more recent features in the database are available via the PyOrient driver either.
Personally, I've reached the conclusion that OrientDB is no longer a viable choice for a Python-based solution even without the binary-file limitation, unless you have the time and energy to dig into the driver and bring it up to spec.

How to read/write to raw device with PowerShell?

I have to read and write data (up to 512 bytes) to/from raw disks (first sector on disk, and first sector on partitions).
I'd like to use PowerShell for that, but I have failed to find any reference on accessing raw disks and raw partitions.
What is the way (or ways) to do that?
You can do in PowerShell most of the things you can do in .NET (with C# or another language). You will find the way to do it in C# in CCS LABS C#: Low Level Disk Access, but frankly I'm not sure it's a good idea to do that from a scripting language.

How to use MongoDB or another document database to keep video files, with the ability to append to existing binary files and read/write in parallel

I'm working on a video server, and I want to use a database to keep video files.
Since I only need to store simple video files with metadata I tried to use MongoDB in Java, via its GridFS mechanism to store the video files and their metadata.
However, there are two major features I need, and that I couldn't manage using MongoDB:
I want to be able to add to a previously saved video, since saving a video might be performed in chunks. I don't want to delete the binary I have so far, just append bytes at the end of an item.
I want to be able to read from a video item while it is being written. "Thread A" will update the video item, adding more and more bytes, while "Thread B" will read from the item, receiving all the bytes written by "Thread A" as soon as they are written/flushed.
I tried writing the straightforward code to do that, but it failed. It seems MongoDB doesn't allow multi-threaded access to the binary (even if only one thread is doing the writing), nor could I find a way to append to a binary file - the Java GridFS API only gives an InputStream from an already existing GridFSDBFile; I cannot get an OutputStream to write to it.
Is this possible via MongoDB, and if so how?
If not, do you know of any other DB that might allow this (preferably nothing too complex such as a full relational DB)?
Would I be better off using MongoDB to keep only the metadata of the video files, and manually handle reading and writing the binary data from the filesystem, so I can implement the above requirements on my own?
Thanks,
Al
I've used Mongo's GridFS to store media files for a messaging system we built, so I can share what we ran into.
Before I get into this: for your use-case scenario I would recommend not using GridFS, and instead using something like Amazon S3 (which has excellent REST APIs for multipart uploads) while storing the metadata in Mongo. This is the approach we settled on in our project after first implementing with GridFS. It's not that GridFS isn't great; it's just not well suited to chunking/appending and rewriting small portions of files. For more info, here's a quick rundown on what GridFS is and isn't good for:
http://www.mongodb.org/display/DOCS/When+to+use+GridFS
Now, if you are bent on using GridFS, you need to understand how the driver and read/write concurrency work.
In Mongo (2.2) you have one write lock per schema/db. This means that while you are writing, other threads are essentially locked out of performing operations. In real-life usage this is still very fast, because the lock yields when a chunk (256 KB) is written, so your reader thread can get some data back. Please look at this concurrency video/presentation for more details:
http://www.10gen.com/presentations/concurrency-internals-mongodb-2-2
So if you look at my two links, we can essentially say question 2 is answered. You should also understand a little bit about how Mongo writes large data sets and how page faults give reader threads a chance to get information.
Now let's tackle your first question. The Mongo driver does not provide a way to append data to a GridFS file; a write is meant to be a fire-and-forget, atomic-style operation. However, if you understand how the data is stored in chunks and how the checksum is calculated, you can do it manually by manipulating the fs.files and fs.chunks collections, as this poster describes here:
Append data to existing gridfs file
Going through those, you can see that it is possible to do what you want, but my general recommendation is to use a service designed for this type of interaction (such as Amazon S3) instead of doing extra work to make Mongo fit your needs. Of course you can also go to the filesystem directly, which is the poor man's choice, but then you lose the redundancy, sharding, replication, etc. that you get with GridFS or S3.
Hope that helps.
-Prasith

Can I store an SQLite DB as a zip file in an iPhone application?

My SQLite file has a size of 7 MB and I want to reduce its size. How can I do that? When I simply compress it, it comes to only around 1.2 MB. Can I compress my mydb.sqlite to a zip file? If that is not possible, is there any other way to reduce the size of my SQLite file?
It is possible to compress it beforehand, but doing so is largely redundant. Your binary is compressed for distribution anyway: Apple distributes your app through the store in compressed form, and compressing an already-compressed file is fruitless. Thus, any work you do to compress beforehand should not have much effect on the resulting size of your application.
Without details of what you are storing in the DB, it's hard to give specific advice. The usual generalities of DB design apply: normalise your database. For example:
Reduce/remove repeating data. If you have text/data that is repeated, store it once and use a key to reference it.
If you are storing large chunks of data, you might be able to zip and unzip these in and out of the database in your app code, rather than trying to zip the DB itself.
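If you go that route, here is a minimal sketch of the idea using NSData's built-in zlib codec (available from iOS 13 / macOS 10.15); the helper names are hypothetical:

    import Foundation

    // Hypothetical helpers: compress a large blob before INSERTing it into
    // SQLite, and decompress it again after SELECTing it back.
    func compressedBlob(_ data: Data) throws -> Data {
        try (data as NSData).compressed(using: .zlib) as Data
    }

    func decompressedBlob(_ blob: Data) throws -> Data {
        try (blob as NSData).decompressed(using: .zlib) as Data
    }

This keeps the database itself uncompressed and queryable while the bulky payloads stay small.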

How to efficiently process 300+ files concurrently in Scala

I'm going to work on comparing around 300 binary files using Scala, byte by byte, 4 MB each. However, judging from what I've already done, processing 15 files at the same time using java.io.BufferedInputStream took around 90 seconds on my machine, so I don't think my solution would scale well to a large number of files.
Ideas and suggestions are highly appreciated.
EDIT: The actual task is not just comparing the files for differences, but processing them in the same sequential order. Let's say I have to look at the i-th byte in every file at the same time, and then move on to byte (i + 1).
Did you notice your hard drive slowly evaporating as you read the files? Reading that many files in parallel is not something mechanical hard drives are designed to do at full speed.
If the files will always be this small (4MB is plenty small enough), I would read the entire first file into memory, and then compare each file with it in series.
I can't comment on solid-state drives, as I have no first-hand experience with their performance.
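For illustration, the shape of that first-file-in-memory approach as a sketch (in Swift rather than Scala, to match the rest of this page; treat it as pseudocode):

    import Foundation

    // Load the reference file once, then compare the others against it
    // one at a time, so only one file is being read from disk at any moment.
    func compareAgainstFirst(_ urls: [URL]) throws -> [Bool] {
        guard let first = urls.first else { return [] }
        let reference = try Data(contentsOf: first)
        return try urls.dropFirst().map { try Data(contentsOf: $0) == reference }
    }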
You are quite screwed, indeed.
Let's see... 300 * 4 MB = 1.2 GB. Does that fit your memory budget? If it does, by all means read them all into memory. But, to speed things up, you might try the following:
Read 512 KB of every file, sequentially. You might try reading from 2 to 8 files at the same time -- perhaps through Futures -- and see how well it scales. Depending on your I/O system, you may gain some speed by reading a few files at the same time, but I do not expect it to scale much. EXPERIMENT! BENCHMARK!
Process those 512 KB using Futures.
Go back to step 1, unless you are finished with the files.
Get the result back from the processing Futures.
On step 1, by limiting the parallel reads you avoid thrashing your I/O subsystem. Push it as hard as you can, maybe a bit less than that, but definitely no more than that.
By not reading all of the files in step 1, you overlap the time spent reading with useful CPU work. You may experiment with lowering the number of bytes read in step 1 as well. A rough sketch of the loop follows below.
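Here is that sketch (again in Swift rather than Scala, purely to show the shape; the chunk size and the process callback are placeholders to tune):

    import Foundation

    let chunkSize = 512 * 1024  // step 1: 512 KB per file per pass

    func processInLockStep(_ urls: [URL], process: ([Data]) -> Void) throws {
        let handles = try urls.map { try FileHandle(forReadingFrom: $0) }
        defer { handles.forEach { $0.closeFile() } }
        while true {
            // Read the next chunk of every file. This version reads them
            // sequentially; a real one might read 2-8 files at a time.
            let chunks = handles.map { $0.readData(ofLength: chunkSize) }
            guard chunks.contains(where: { !$0.isEmpty }) else { break }
            process(chunks)  // step 2: hand the batch off for processing
        }
    }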
Are the files exactly the same number of bytes? If they are not, the files can be compared simply via the File.length() method to get a first-order guess at equality.
Of course, you may want to do a much deeper comparison than just "are these files the same?"
If you are just looking to see whether they are the same, I would suggest using a hashing algorithm like SHA-1 to see if they match.
Here is some Java source to make that happen
Many large systems that handle data use SHA-1, including the NSA and Git.
It's simply more efficient to use a hash instead of a byte-by-byte compare. The hashes can also be stored for later, to see whether the data has been altered.
Here is a talk by Linus Torvalds, specifically about Git, which also mentions why he uses SHA-1.
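To illustrate the hash-then-compare idea (sketched in Swift with CryptoKit to match the rest of this page; in Scala/Java, java.security.MessageDigest plays the same role):

    import CryptoKit
    import Foundation

    // Hash each file once and compare the small digests instead of the
    // full contents; digests can also be stored to detect later changes.
    func sha1Digest(of url: URL) throws -> Insecure.SHA1Digest {
        Insecure.SHA1.hash(data: try Data(contentsOf: url))
    }

    func allFilesIdentical(_ urls: [URL]) throws -> Bool {
        guard let first = urls.first else { return true }
        let reference = try sha1Digest(of: first)
        return try urls.dropFirst().allSatisfy { try sha1Digest(of: $0) == reference }
    }

Bear in mind that matching hashes make equality overwhelmingly likely but are not a proof, and hashing answers only the "are they identical" sub-question, not the asker's lock-step processing requirement.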
I would suggest using NIO if possible. Introduction To Java NIO and NIO2 seems like a decent guide to NIO if you are not familiar with it. I would not suggest reading a file and comparing it byte by byte, if that is what you are currently doing; you can create a ByteBuffer to read chunks of data from a file and then do your comparisons against that.