Random file access in C --- shrink file? (fopen)

Easy question.
I want to create a set of records ("database" would be an overstatement) on disk. I can open the file with "rb+", move to a random location with fseek, and then fread and fwrite, all interspersed with fflush. All good.
Now I want to delete one record. Easy: move the last record into the spot of the record I want to delete, and then shorten the file by one record.
But... how do I shorten my existing file?
/iaw

Copy the contents before and after the record to be deleted to a temporary file.
Delete the original file (which contained the record to be deleted).
Rename the temporary file as the original file.
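In C, that could look roughly like this untested sketch (the fixed record size, the temp-file name, and the function name are all invented for illustration):

#include <stdio.h>

/* Hypothetical fixed-size record; adjust to your actual layout. */
struct record { char data[64]; };

/* Copy every record except number `skip` from `path` into a temp file,
 * delete the original, and rename the copy into its place. */
int delete_record_by_copy(const char *path, long skip)
{
    FILE *in = fopen(path, "rb");
    FILE *out = fopen("records.tmp", "wb");
    struct record r;
    long i = 0;
    int ok = (in != NULL && out != NULL);

    while (ok && fread(&r, sizeof r, 1, in) == 1) {
        if (i++ == skip)
            continue;                        /* drop the unwanted record */
        ok = (fwrite(&r, sizeof r, 1, out) == 1);
    }
    if (in) fclose(in);
    if (out && fclose(out) != 0) ok = 0;

    if (!ok) { remove("records.tmp"); return -1; }
    remove(path);                            /* delete the original file */
    return rename("records.tmp", path);      /* temp becomes the original */
}

Note the remove() before rename(): ISO C leaves renaming onto an existing file implementation-defined, so deleting the original first is the portable order of operations.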

To truncate a file that you have open, you would seek to the desired file size and then set a new end-of-file marker at that position. The problem is that standard C has no function for this purpose; you have to use platform-specific APIs, such as SetEndOfFile() on Windows.
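As a rough illustration only (untested; truncate_stream is a made-up helper name), the Windows calls would be wired up something like this:

#include <stdio.h>
#include <io.h>        /* _fileno, _get_osfhandle */
#include <windows.h>

/* Truncate an open "rb+" stream to new_size bytes (Windows-specific). */
int truncate_stream(FILE *fp, long long new_size)
{
    HANDLE h = (HANDLE)_get_osfhandle(_fileno(fp));
    LARGE_INTEGER pos;
    pos.QuadPart = new_size;

    if (fflush(fp) != 0) return -1;                   /* push stdio buffers first */
    if (!SetFilePointerEx(h, pos, NULL, FILE_BEGIN)) return -1;
    return SetEndOfFile(h) ? 0 : -1;                  /* set the new EOF here */
}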

There is no ANSI C solution to this problem, short of copying the whole file anew.
There are OS-specific solutions only; see the comments under the answers.
Windows: SetEndOfFile()
POSIX: ftruncate() and truncate()
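For example, on POSIX the whole delete-by-swapping scheme from the question might be sketched like this (untested; struct record, delete_record, and the 64-byte record size are invented):

#include <stdio.h>
#include <unistd.h>    /* ftruncate (POSIX) */

/* Hypothetical fixed-size record; adjust to your actual layout. */
struct record { char data[64]; };

/* Delete record `idx` from a stream opened "rb+" holding `count` records:
 * overwrite it with the last record, then cut the file by one record. */
int delete_record(FILE *fp, long idx, long count)
{
    struct record last;

    /* Read the last record. */
    if (fseek(fp, (count - 1) * (long)sizeof last, SEEK_SET) != 0) return -1;
    if (fread(&last, sizeof last, 1, fp) != 1) return -1;

    /* Overwrite the victim (a harmless self-copy if idx is the last one). */
    if (fseek(fp, idx * (long)sizeof last, SEEK_SET) != 0) return -1;
    if (fwrite(&last, sizeof last, 1, fp) != 1) return -1;
    if (fflush(fp) != 0) return -1;          /* flush before truncating */

    /* Shorten the file by one record; on Windows use SetEndOfFile instead. */
    return ftruncate(fileno(fp), (off_t)(count - 1) * (off_t)sizeof last);
}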

Related

How to move a file after reading it in IBM DataStage

I have one folder containing 4 files: sales_jan, sales_feb, debt_jan, debt_feb. I created a specific job for each of sales and debt. The thing is, if I have already run the job for sales_jan and sales_feb arrives afterwards, I don't want to read sales_jan again; I only want to read the newest file that hasn't been processed yet. To read the files, I pass the pattern of the specific file (e.g. sales_*), but used like that, the stage will reprocess sales_jan even though it has already been read. I want to move files that have already been read into another folder. How exactly do I do that in IBM DataStage? If there's no way to do it, what would you suggest for my problem? Any ideas would be appreciated.
The easiest solution is to use an after-job subroutine (ExecSH on Linux/UNIX, ExecDOS on Windows) to move the file to a different location.
Since you're using wildcards for the Sequential File stage, you're going to have to be a bit more clever in handling a situation where your job processes only some of the files. I would prefer to write this using a loop in a sequence, processing one file at a time, so that the move can be handled per-file.
You might keep a flag for every file your job has already read. For example, add a max-date field for each file; when the first file's max date is less than that of the second (new) file, read the latest file. This can be done with a simple Linux command in a sequence or a Transformer, just as Ray mentioned before.

How to recover files that were moved to a single file?

I tried to move multiple files into a folder, but there was a mistake in my MATLAB code: I didn't create the folder first. Now all the files have been moved onto a single file, which cannot be opened or edited. How can I recover these files?
Example of the mistake:
a=strcat('C:\Users\foldername'); % name and directory of the folder
fname=a;
% mkdir(fname); % so this command wasn't executed...
movefile('file1',fname);
movefile('file2',fname);
So now file1 and file2 have been merged into the file 'fname' instead of into a folder named 'fname'. How do I get file1 and file2 back?
Thanks in advance!
Unfortunately, the odds may be stacked against you getting back any of the files except the last one. The reason is that movefile doesn't append to an existing destination file; it overwrites it. The following will give you back your last file (by simply renaming fname):
movefile(fname, 'file2');
If you're lucky, your operating system will have options to restore previous versions of your files/folders. Your best bet may be to check whether the folder containing your original files has any previous versions you can open or restore to get back earlier versions of 'file1' and 'file2'. For example, on a Windows machine you can right-click the folder, select "Properties", and then select the "Previous Versions" tab.
If any versions are listed there, you can open them and copy files out in case you've inadvertently deleted or overwritten anything recently. Good luck!

Talend tWaitForFile insufficiency

We have a producer process that writes files into a specific folder and runs continuously. We have to read the files one by one using Talend, and there are two issues:
First: tWaitForFile only sees files that existed before it started, so files created after the component starts are not visible to it.
Second: there is no way to know whether a file has been released by the producer process, so it may be read while it is not yet completely written. The _wait_release_ parameter of tWaitForFile does not work on Linux systems!
So how can I make Talend read completely written files from a directory whose file count keeps increasing?
I'm not sure what you mean by your first issue. tWaitForFile has options to trigger when files are created, modified or deleted in a folder.
As for the second issue, your best bet here is for the file producer to create an OK or control file, a 0-byte touch file, when it has finished writing the file you want.
In this case you simply look for the appearance of the OK file and then pick up the corresponding completed file. If you name the two files the same but with different extensions (the OK file typically has the extension ".OK"), this is easy enough: set your tWaitForFile to look for "*.OK" files, connect it via an Iterate link to a tFileInputDelimited (in the case where you want to pick up a delimited text file), and declare the file name as
((String)globalMap.get("tWaitForFile_1_CREATED_FILE")).substring(0,((String)globalMap.get("tWaitForFile_1_CREATED_FILE")).length()-3) + ".txt"

Save filename.m deleted my original filename.m file and replaced it with a new one. How do I get the old one back?

I tried to save my file using the command window in MATLAB. Unfortunately it replaced my file with a new file, and now I am unable to get the old one back.
It's probably easy, but I am new to using the command window in MATLAB.
You are out of luck on this one. Saving to a file will irretrievably overwrite any existing file with that name unless you specify otherwise with the -append option. In the future, if you have a data set that is important, either because it is not reproducible or because it takes a long time to generate, I would recommend either backing it up or saving it with a timestamp. Here is one example:
function save_t(name, varargin)
% Append a yyyymmddHHMM timestamp to the file name, then save.
stamp = clock*[1e8 1e6 1e4 1e2 1 0].';
save(sprintf('%s-%d', name, stamp), varargin{:});
end
Save this to a file named "save_t.m" somewhere on your MATLAB path, and then you can call it just as you would call the save function, except that it appends a timestamp:
save_t filename
This will help to ensure that you don't accidentally overwrite an existing file.

A rotating log file in Perl

I have implemented a log file that stores the CPU and memory state of a process every minute. I have limited the maximum size of the file to 3 MB (that's enough for my purpose).
The script is called by a cron job every minute; it logs the details for that minute and renames the file to "Log_<offset>.log", where the offset records the current file-pointer position.
When the size reaches 3 MB minus 100 bytes, I reset the file pointer to the beginning, overwrite the first entry in the log file, and rename the file to "Log_<0+some offset>.log".
As I am renaming the file every minute to track the file-pointer position, is this a good/efficient approach?
I do not want to maintain more than one log file for this purpose.
Another option for me is to maintain the file-pointer position in a file, but... another file! I am not interested in maintaining one if the current approach is good :)
Thanks in advance.
Are you an engineer? This is a nice example of a simple task solved by a perfectly working but overly complex solution.
Unless the content you put in takes exactly as many bytes as the content you take out, writing into the middle of a file means the whole part after your writing position has to be rewritten to disk. Appending is much cheaper.
Renaming the file to store the pointer works, but it's not very elegant, and it makes things more complex (for one, your process needs write permission on the directory in which the file resides; otherwise write access to the two files would be sufficient).
Unless disk space is an issue (and it really rarely is), your approach is less efficient than, say, appending everything to a file and rotating the file when it reaches its maximum size. This way you always have the last 3 MB of logs available, plus at most 3 MB more in your current file. It will also make parsing the file a lot easier, instead of all the file-pointer recalculation.
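For what it's worth, the append-and-rotate idea is only a few lines; here is a rough C sketch (untested; the file names and helper name are invented, the 3 MB cap is from the question):

#include <stdio.h>

#define MAX_LOG_BYTES (3L * 1024 * 1024)   /* the 3 MB cap */

/* Append one line; once the log exceeds the cap, rotate it:
 * the full log becomes "proc.log.1" and a fresh "proc.log" starts. */
int log_line(const char *line)
{
    long size;
    FILE *fp = fopen("proc.log", "a");
    if (!fp) return -1;

    fprintf(fp, "%s\n", line);
    size = ftell(fp);                       /* position == size after append */
    fclose(fp);

    if (size >= MAX_LOG_BYTES) {
        remove("proc.log.1");               /* drop the oldest 3 MB */
        rename("proc.log", "proc.log.1");   /* keep the newest full log */
    }
    return 0;
}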
Update to answer your comment:
Renaming a file every minute (or even every second) shouldn't slow down your system significantly; don't worry about that.
Our concern is mainly with why you think you need to rename the file. It's not better technically, it's not better from a logical point of view, and it makes a lot of other (future) tasks harder. You could store the file pointer in a separate file, or at the end of your file, and there are better^H^H^H^H^H^H simpler solutions that don't require the file pointer at all.
I'm confused why you would rename your file. What does this accomplish?
Are the log entries fixed size? Or variable size?
If the entries are fixed size, then there is no trouble in re-writing the existing file from the start: you won't ever have incomplete entries in your file, and if you are writing a counter or timestamps to the file, it should be clear where the 'cursor' is located.
If the entries are variable size, then you should not rewrite the file from the beginning without somehow marking where the 'cursor' is located in the file, and you should write code that is resilient to reading truncated log entries.
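To illustrate the fixed-size case: if each entry carries an increasing sequence number, recovering the 'cursor' after a restart is a single scan. A rough C sketch (untested; the entry layout is invented):

#include <stdio.h>

/* Hypothetical fixed-size entry carrying an increasing sequence number. */
struct entry { unsigned long seq; char text[56]; };

/* In a circular log, the write cursor sits just after the entry with the
 * highest sequence number. Returns that entry index. */
long find_cursor(FILE *fp)
{
    struct entry e;
    unsigned long best_seq = 0;
    long pos = 0, cursor = 0;

    rewind(fp);
    while (fread(&e, sizeof e, 1, fp) == 1) {
        if (e.seq >= best_seq) {
            best_seq = e.seq;
            cursor = pos + 1;   /* next write goes here (wrap at capacity) */
        }
        pos++;
    }
    return cursor;
}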
Can you re-use existing tools such as RRDtool?