Lazy file/url iterator in Scala

As far as I know, iterators over files/URLs in Scala are lazy, i.e.
scala.io.Source.fromFile("c:/tmp.csv") getLines()
should return an Iterator[String] which has not yet read the file in and is simply pointing at the first line of that file. Yet if I debug this code, stop on the next line, go and physically change the file on the HDD, the values returned by this iterator correspond to the file from before the update. Why is that the case?
This is what I would expect from a Java iterator that prefetches the whole file into memory.

Obviously, reads from the file are buffered (to increase performance, since disk access is slow). So when you start reading from a file, some part of it is read into a buffer right away (for example, 4 KB), so when you edit that already-read part, the change is not visible in your program.
I tried this with a 7 MB file - I opened the file, edited the last line, and the edit was properly reflected in the code. On the contrary, when I did the same trick with a 4 KB file, I got the behavior you describe.
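For illustration, here is a minimal Scala sketch (the file path is taken from the question, the rest is made up) showing that getLines() itself is lazy, while the underlying BufferedSource may already have pulled part of the file into its buffer:
import scala.io.Source

object LazyReadDemo {
  def main(args: Array[String]): Unit = {
    val source = Source.fromFile("c:/tmp.csv") // the BufferedSource may fill its internal buffer here or on first access
    val lines  = source.getLines()             // Iterator[String]; no line has been consumed yet
    // If you pause here and edit the file on disk, changes to the part that is
    // already sitting in the buffer will not be visible in the output below.
    lines.foreach(println)
    source.close()
  }
}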
EDIT: I suspect that the actual buffering happens somewhere around these lines (I love the comments there :) ).
EDIT2: Actually, I feel that you found some funny bug - I've been looking at the source for half an hour now, and I still don't see the place where the buffering could take place before the call to getLines.

Related

Can I force visdiff to display more than the first 2000 bytes?

I have two binary files that I'm trying to compare using MATLAB's built-in function visdiff, but it only displays the first 2000 bytes by default. Is there any way to force the comparison tool to display the entire contents of both files side by side?
Edit the file matlabroot\toolbox\shared\comparisons\private\bindiff.m, where matlabroot is your MATLAB installation directory. On line 149, you'll see it sets the variable MAXLEN to 2000. Change this to something bigger (even Inf seems to work).
You may need to type rehash toolboxcache after making this change, in order to get MATLAB to notice.
Please note:
As you're making a change to the MATLAB source, this is at your own risk (it seems fine to me though). Keep a backup of the file you've edited.
That truncation at 2000 bytes is there for a reason - comparing the whole of larger binary files does seem to take quite a while, so be patient. Maybe try gradually increasing MAXLEN, rather than going straight to Inf.
I only have R2011b available to me right now, so if you're on a newer version the file path and line number I mentioned above may have changed. It was very easy to trace through the code from visdiff to comparisons_private to bindiff though, so unless they've changed the deeper structure of the Comparisons Tool between 11b and now, it will probably be very similar.

How to replace a file inside a zip on iOS?

I need to replace a file inside a zip on iOS. I tried many libraries with no results. The only one that kind of did the trick was zipzap (https://github.com/pixelglow/zipzap), but it is no good for me: what it really does is re-zip the file again with the change, and besides this process being too slow for me, it also loads the whole file into memory and makes my application crash.
PS: If this is not possible or way too complicated, I can settle for renaming or deleting a specific file.
You need to find a framework where you can modify how data is read and written. You would then use some form of mmap to essentially read and write small chunks. Searching on NSData and mmap turned up this post; however, you can use mmap from the POSIX level too. P.S. It will be slower than working purely in memory - no way around that.
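As a rough sketch of the POSIX-level approach (plain C, callable from Objective-C; the file name is hypothetical and error handling is mostly omitted):
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

static void patchFileInPlace(const char *path) {
    int fd = open(path, O_RDWR);               /* open the existing file read/write */
    struct stat st;
    fstat(fd, &st);
    void *map = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map != MAP_FAILED) {
        /* ... read or rewrite small regions of the mapping here ... */
        msync(map, st.st_size, MS_SYNC);       /* flush the changes back to disk */
        munmap(map, st.st_size);
    }
    close(fd);
}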
Got it WORKING!! JXZip (https://github.com/JanX2/JXZip) does exactly what I need: it links against libzip (http://www.nih.at/libzip/), which is a fully equipped library for working with ZIP files, and JXZip has all the necessary Objective-C wrapper code. Thanks for all the replies.
For archive purposes, as the author of zipzap:
Actually zipzap does exactly what you want. If you replace an entry within a zip file, zipzap will do the minimum necessary to update it: it will skip writing all entries before the replaced entry, then write out the entry, then write out all entries after the replaced entry without recompressing. At the moment, it does require sufficient memory for the entries after the replaced entry though.

How to read a big file (several gigabytes) quickly in Perl

We are currently reading the file line by line, which is slow and delays completion.
We need to read the file quickly and proceed with our commands.
The approach I tried using fork and an array only displays the first set of lines and does not proceed with the other sets.
Please help.
Reading a large file takes a fair bit of time - disks are slow, after all. Before you start looking at Perl, first try (assuming you're on a unix-type system):
time cat /path/to/your/large/file >/dev/null
The output will tell you how long it takes to just read that file from disk without doing anything to it. Alternately, open the file in your favorite text editor and time how long it takes to load. Once you have that time, compare it to how long your Perl program takes to read the file. Unless the Perl program takes significantly longer, you're not likely to be able to do anything about it because the time is being spent on getting the data from disk rather than on processing it.
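For comparison, here is a minimal Perl read loop (using the same placeholder path as above) that does nothing but read, so you can time it against the cat baseline:
#!/usr/bin/perl
use strict;
use warnings;

# Read the file line by line, doing nothing else, to measure pure read time.
open my $fh, '<', '/path/to/your/large/file' or die "Cannot open file: $!";
my $count = 0;
while (my $line = <$fh>) {
    $count++;    # replace this with your real processing later
}
close $fh;
print "Read $count lines\n";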
Of course, that's assuming that you actually do need to read the entire file. If you can get by with only reading specific parts of it, then you could create an index file and use that to jump directly to the part that's of interest, but you haven't provided enough information for us to tell whether that would apply to your case or not.
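As a sketch of the index idea (the byte offset here is hypothetical, e.g. something you stored in an index file during an earlier pass), you would jump straight to the interesting part with seek instead of reading everything before it:
use strict;
use warnings;

my $offset = 1_048_576;    # hypothetical byte offset, e.g. looked up in an index file

open my $fh, '<', '/path/to/your/large/file' or die "Cannot open file: $!";
seek $fh, $offset, 0 or die "Cannot seek: $!";    # 0 = SEEK_SET: offset from the start of the file
my $line = <$fh>;          # read from that point onwards
print $line if defined $line;
close $fh;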
If you need more specific help, please provide a better description of what you mean to accomplish and a small, runnable piece of Perl code which shows how you're currently reading and processing the file so that we can see whether you're doing anything particularly inefficient that can be improved on.

A rotating log file in Perl

I have implemented a log file that stores the CPU and memory state of a process every minute. I have limited the maximum size of the file to 3 MB (that's enough for my purpose).
The script is called by a cron job every minute; it logs the details for that minute and renames the file to "Log_.log".
When the size reaches "3 MB - 100 bytes", I reset the file pointer to point to the beginning, overwrite the first entry in the log file, and rename the file to "Log_<0+some offset>.log".
As I am renaming the file every minute to track the file pointer position, is this a good/efficient way to do it?
I do not want to maintain more than one log file for this purpose.
Another option would be to maintain the file pointer position in a file, but... another file!! I'm not interested in maintaining one if the current approach is good :)
Thanks in advance.
Are you an engineer? This is a nice example of some simple task, solved by a perfectly working but overly complex solution.
Unless the content you put in takes exactly as many bytes as the content you take out, writing "in" the middle of a file will actually cause everything after your writing position to be rewritten to disk. Appending is much cheaper.
Renaming the file to store the pointer works, but it's not very elegant, and it makes things more complex (for one, your process needs write permission on the directory in which the file resides, whereas otherwise write access to the files themselves would be sufficient).
Unless disk space is an issue (and really, it rarely is), your approach is less efficient than, say, appending everything to a file and rotating the file when it reaches its maximum size. This way you always have the last 3 MB of logs available, plus at most 3 MB more in your current file. It will also make parsing the file a lot easier, with no need to recalculate the pointer position at all.
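A minimal sketch of that append-and-rotate approach (the 3 MB limit comes from the question; the file names and the sample entry are made up):
use strict;
use warnings;

my $log_file = 'Log.log';
my $max_size = 3 * 1024 * 1024;    # 3 MB, as in the question

# Rotate first if the current log has reached its maximum size.
if (-e $log_file && -s $log_file >= $max_size) {
    rename $log_file, 'Log.old' or die "Cannot rotate log: $!";
}

# Then simply append the new entry; no file pointer bookkeeping needed.
open my $fh, '>>', $log_file or die "Cannot open log: $!";
print {$fh} scalar(localtime), " cpu=42% mem=123MB\n";    # hypothetical sample entry
close $fh;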
Update to answer your comment:
Renaming a file every minute (or even every second) shouldn't slow down your system significantly, don't worry about that.
Our concerns are mainly with "why you think you need to rename the file". It's not better technically, it's not better from a logical point of view, and it makes a lot of other (future) tasks harder. You could store the file pointer in a separate file, or at the end of your file, and there are better^H^H^H^H^H^H simpler solutions that don't require the file pointer at all.
I'm confused why you would rename your file. What does this accomplish?
Are the log entries fixed size? Or variable size?
If the entries are fixed size, then there is no trouble in re-writing the existing file from the start: you won't ever have incomplete entries in your file, and if you are writing a counter or timestamps to the file, it should be clear where the 'cursor' is located.
If the entries are variable size, then you should probably not begin re-writing the file from the beginning without somehow making it clear where the 'cursor' is located in the file, and write code that is resilient to reading truncated log entries.
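Going back to the fixed-size case, here is a small Perl sketch (the record length, slot number and sample entry are all hypothetical) of overwriting one entry in place:
use strict;
use warnings;

my $rec_len = 64;    # hypothetical fixed entry size in bytes, newline included
my $slot    = 17;    # hypothetical entry number to overwrite

open my $fh, '+<', 'Log.log' or die "Cannot open log: $!";
seek $fh, $slot * $rec_len, 0 or die "Cannot seek: $!";
# Pad (or truncate) the entry so it occupies exactly $rec_len bytes.
my $entry = sprintf "%-*.*s", $rec_len - 1, $rec_len - 1, scalar(localtime) . " cpu=42% mem=123MB";
print {$fh} $entry, "\n";
close $fh;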
Can you re-use existing tools such as RRDtool?

ExtAudioFileSeek and ExtAudioFileWrite together on the same file

I have a situation where I can save a post-processing pass through the audio by taking some manipulated buffers from the end of the track and writing them to the beginning of my output file.
I originally thought I could do this by resetting the write pointer using ExtAudioFileSeek, and was about to implement it when I saw this line in the docs
Ensure that the file you are seeking in is open for reading only. This function’s behavior with files open for writing is undefined.
Now I know I could close the file for writing then reopen it, but the process is a little more complicated than that. Part of the manipulation I am doing is reading from buffers that are in the file I am writing to. The overall process looks like this:
1. Read buffers from the end of the read file
2. Read buffers from the beginning of the write file
3. Process the buffers
4. Write the buffers back to the beginning of the write file, overwriting the buffers I read in step 2
Logically, this can be done in one pass, no problem. Programmatically, how can I achieve the same thing without corrupting my data, becoming less efficient (the opposite of my goal), or potentially imploding the universe?
Yes, using a single audio file for both reading and writing may, as you put it, implode the universe, or at least lead to other nastiness. I think that the key to solving this problem is in step 4, where you should write the output to a new file instead of trying to "recycle" the initial write file. After your processing is complete, you can simply scrap the intermediate write file.
Or have I misunderstood the problem?
Oh, and also, you should use ExtAudioFileWriteAsync instead of ExtAudioFileWrite for your writes if you are doing this in realtime. Otherwise the I/O load will cause audio dropouts.