How to read a big multi-gigabyte file quickly in Perl

We are currently reading the file line by line, which is slow to complete for the whole file.
We need to read the file quickly and then proceed with our commands.
What I tried using fork and an array only displays the first set of lines and does not proceed with the other sets.
Please help with this.

Reading a large file takes a fair bit of time - disks are slow, after all. Before you start looking at Perl, first try (assuming you're on a unix-type system):
time cat /path/to/your/large/file >/dev/null
The output will tell you how long it takes to just read that file from disk without doing anything to it. Alternatively, open the file in your favorite text editor and time how long it takes to load. Once you have that time, compare it to how long your Perl program takes to read the file. Unless the Perl program takes significantly longer, you're not likely to be able to do anything about it, because the time is being spent on getting the data from disk rather than on processing it.
Of course, that's assuming that you actually do need to read the entire file. If you can get by with only reading specific parts of it, then you could create an index file and use that to jump directly to the part that's of interest, but you haven't provided enough information for us to tell whether that would apply to your case or not.
If you need more specific help, please provide a better description of what you mean to accomplish and a small, runnable piece of Perl code which shows how you're currently reading and processing the file so that we can see whether you're doing anything particularly inefficient that can be improved on.
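If profiling shows that per-line overhead in Perl really is the bottleneck, one common alternative is to read in large fixed-size chunks rather than line by line. A minimal sketch, with the path and chunk size as placeholders to adapt:

use strict;
use warnings;

my $chunk_size = 4 * 1024 * 1024;    # 4 MB per read; tune for your system
open my $fh, '<', '/path/to/your/large/file' or die "open failed: $!";
binmode $fh;                         # raw reads, no line-ending translation
while (read($fh, my $buffer, $chunk_size)) {
    # process $buffer here; a chunk may end mid-line, so carry any
    # partial trailing line over to the next iteration if that matters
}
close $fh;

Note that this only helps if your processing can work on arbitrary chunks; it does nothing about the raw disk time measured above.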

Related

How to replace a value in a txt file with PowerShell, using a file from GitHub

I want to build a simple script that may be useful for others as well, but I have only very basic programming knowledge and can't do it myself without learning how to write PowerShell scripts from scratch.
What this script is supposed to do is: open an INI file (really just a txt), look for a variable with an assigned value, replace that value with the contents of a txt hosted on GitHub, save, and then run a program.
This is for the tracker list of qBittorrent, since that feature still hasn't been implemented. The only other script I could find that does this is for Linux and Mac; there seems to be none for Windows.
The basic idea is this:
get-content "c:\users\[user]\appdata\roaming\qbittorrent\qbittorrent.ini"
# This is where pseudo code starts
get file from "[github-link.txt]"
save file to cache # keeping it is useless as it gets updated daily
find variable "Session\AdditionalTrackers=" in qbittorrent.ini
replace value of variable with content of cached file # this is what I struggle with most when looking for example code. Everything I could find specified the exact string that needed replacing, which in this case is quite long and may change with every update of the file.
overwrite original file
launch program qbittorrent.exe
end script
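For reference, a rough PowerShell sketch of the pseudocode above (untested; the GitHub URL is a placeholder, and the assumption that the entries are joined with a literal \n on that one INI line should be checked against your own qBittorrent.ini):

$ini = "$env:APPDATA\qBittorrent\qBittorrent.ini"

# Download the tracker list to a variable; no need to keep it, it changes daily
# (placeholder URL - point it at the raw view of the real list)
$trackers = (Invoke-WebRequest -Uri "https://raw.githubusercontent.com/<user>/<repo>/master/trackers.txt" -UseBasicParsing).Content.Trim()

# Assumption: qBittorrent separates entries with a literal \n on one line
$joined = ($trackers -split "`r?`n" | Where-Object { $_ }) -join '\n'

# Replace whatever currently follows "Session\AdditionalTrackers="
(Get-Content $ini) -replace '^(Session\\AdditionalTrackers=).*$', "`$1$joined" |
    Set-Content $ini

Start-Process "qbittorrent.exe"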
Conveniently, or most likely deliberately, most of the tracker lists on GitHub are already formatted in a way that lets them be pasted directly into the file without having to worry about formatting. Example.
I can totally understand if nobody wants to do the work, but I would greatly appreciate it and possibly others that are looking for a stopgap for the lacking feature.
If this already exists, go ahead and call me an idiot and while you're at it drop a link ;)
I just found a little tool called Power Automate and it pretty much does what I was looking for. It's not quite as elegant as a single-click script, but it does the job. Sadly I can't share the "flow" I built because, well, there is no option for it - thanks Microsoft. So I'll try my best to write it out.
Not quite a "solution", but pretty close to it.
Here is the "flow":
get file from web // from github for example
read text from file // read downloaded .txt file
read text from file // read qBittorrent.ini
crop text // crop between flags in qBittorrent.ini use "Session\AdditionalTrackers=" as start and "Session\GlobalMaxRatio=" as end and save to cropVar2
crop text // crop before flag use "Session\AdditionalTrackers=" as flag and save to cropVar1
crop text // crop after flag use cropVar2 as flag and save to cropVar3
replace text // replace cropVar2 with content of downloaded file and save to cropVar2
write text to file // write cropVar1,cropVar2,cropVar3
end flow
Keep in mind that any changes to the qBittorrent.ini may change the order of the entries, which means you have to check that it's still correct after every update and after every change you make in the options. This is a massive kludge, after all...
You can build in failsafes so that you won't break anything if the order changes.

Inode file system: what is the extra space in a data block used for?

So, I am currently learning about the inode file system and have been asked to write a simple file system using inodes.
So far, I understand that there is an inode table that maps inodes to data blocks through direct/indirect pointers.
Let's assume data gets written into a file and ends up stored in two blocks. Say each block is 512 bytes; the file fills one whole block and only 200 bytes of the second block. What happens with the rest of the space in that data block? Is it reserved for that file only, or do other files use it?
For most file systems, this leftover space is simply lost to internal fragmentation: the tail of the block is allocated to that file alone, and no other file can use it. I think the Reiser file system actually reclaimed this area (tail packing), but I could be wrong.
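The arithmetic behind that waste is simple; here is a tiny C sketch for the 512-byte example above:

#include <stdio.h>

/* Internal fragmentation for the example above: a 712-byte file
   stored in fixed 512-byte blocks. */
int main(void) {
    const unsigned block_size = 512;
    const unsigned file_size  = 512 + 200;
    unsigned blocks = (file_size + block_size - 1) / block_size;  /* ceiling */
    unsigned slack  = blocks * block_size - file_size;
    printf("%u blocks allocated, %u bytes unusable in the last block\n",
           blocks, slack);  /* 2 blocks, 312 bytes */
    return 0;
}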
Creating your own file system can be a challenging but also enjoyable experience. I have created a few myself and worked on another. If you are creating your own file system, you can have it do whatever you wish.
Look at the bottom of this page for a few that I am working on/with. The LeanFS in particular uses inodes as well. The SFS is a very simple file system. Each is well documented so that you can research and decide what you would like to do.

Error when using an Abaqus subroutine to read a file with multiple processors (CPUs)

I get an error when I use an Abaqus subroutine to read a file with multiple processors (CPUs). Could you help me deal with it? Thanks a lot.
I want to read variables from a file. When one CPU is used, everything is OK,
but when more than one CPU is used there is a problem: it seems that every CPU repeats the same commands.
For example, the following are the contents of the file to read from; the file name is data.dat:
*matID ,2,1
131000.000, 8880.000, 8180.000
0.324, 0.324, 0.300
3990.000, 5320.000, 5320.000
1871.000, 59.700, 59.700
1291.000, 215.000, 215.000
90.000, 102.000, 102.000
My subroutine is shown as follows:
      character*12 check1
      integer check2, error

      open(10, file='data.dat', status='old', iostat=error)
      if (error .eq. 0) then
         read(10,*,iostat=error) check1, Nm
      end if
      close(10)

      print *, 'Nm=', nm, error
      print *, '**'
When I use 2 CPUs, the printed results are:
Nm= 2 0
Nm= 8880 0
**
**
Depending on the reason for reading in data from a file, there are a couple of ways to avoid this problem:
If you only need to access the data once:
Read in the data in a subroutine that is always called in serial. UEXTERNALDB is a good example and can be used so that the file open only happens at the beginning of an analysis or the beginning of an increment as needed. You can then carefully store the information in common blocks. Reading from a common block in parallel should work fine, but do not write to them from the parallel subroutines.
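A minimal sketch of that pattern, assuming the data.dat layout shown in the question (untested; check the exact UEXTERNALDB argument list and the parallel calling conventions against your Abaqus version's documentation):

      subroutine uexternaldb(lop, lrestart, time, dtime, kstep, kinc)
      include 'aba_param.inc'
      dimension time(2)
c     shared storage: filled once in serial, read-only everywhere else
      common /matdata/ props(6,3), nm
      character*12 check1
      integer error
c
      if (lop .eq. 0) then
c        lop = 0: start of the analysis, before any parallel work
         open(10, file='data.dat', status='old', iostat=error)
         if (error .eq. 0) then
            read(10,*,iostat=error) check1, nm
            do k = 1, 6
               read(10,*,iostat=error) (props(k,j), j=1,3)
            end do
         end if
         close(10)
      end if
      return
      end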
Another way to get in a smaller amount of data is to define solution variables in your input file instead.
If you really need to open this file locally within each parallel thread (can't see why but open to correction), you can use GETNUMCPUS and GETRANK to open different copies of the files within each thread. GETRANK returns an integer giving you the rank/id of the process. I would advise against this method though. If your problem is large enough to warrant using parallel, then you should avoid slowing it down with file reads.
For more info see sections 1.1.31 and 2.1.4 of the Abaqus 6.14 docs.

A rotating log file in Perl

I have implemented a log file that stores the CPU and memory state of a process every minute. I have limited the maximum size of the file to 3MB (that's enough for my purpose).
The script is called by a cron job every minute; it logs the details for that minute and renames the file to "Log_<pointer position>.log".
When the size reaches "3MB - 100 bytes", I reset the file pointer to the beginning, overwrite the first entry in the log file, and rename the file to "Log_<0+some offset>.log".
As I am renaming the file every minute to record the file-pointer position, is this a good/efficient way to do it?
I do not want to maintain more than one log file for this purpose.
Another option would be to maintain the file-pointer position in a file, but... another file!! Not interested in maintaining one if this option is good :)
Thanks in advance.
Are you an engineer? This is a nice example of a simple task solved by a perfectly working but overly complex solution.
Unless the content you put in takes exactly as many bytes as the content you take out, writing into the middle of a file means you have to rewrite everything after your writing position yourself. Appending is much cheaper.
Renaming the file to store the pointer works, but it's not very elegant, and it makes things more complex (for one, your process needs write permission on the directory in which the file resides, rather than just write access to the files themselves).
Unless disk space is an issue (and really, it rarely is), your approach is less efficient than, say, appending everything to a file and rotating the file when it reaches its maximum size. This way you always have the last 3MB of logs available, plus at most 3MB more in your current file. It will make parsing the files a lot easier too, with none of the pointer-position bookkeeping.
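A minimal sketch of that append-and-rotate approach (file names are placeholders, and the cpu/mem fields stand in for whatever the cron job actually collects):

use strict;
use warnings;

my $log = 'Log.log';
my $max = 3 * 1024 * 1024;    # 3MB cap, as in the question

# rotate first if the current file is full; keeps exactly one older file
if (-e $log && -s $log >= $max) {
    rename $log, 'Log.old.log' or die "rename failed: $!";
}

# append this minute's entry; no file-pointer bookkeeping needed
open my $fh, '>>', $log or die "open failed: $!";
print {$fh} scalar(localtime), " cpu=... mem=...\n";    # placeholder entry
close $fh;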
Update to answer your comment:
Renaming a file every minute (or even every second) shouldn't slow down your system significantly, don't worry about that.
Our concerns are mainly with why you think you need to rename the file. It's not better technically, it's not better from a logical point of view, and it makes a lot of other (future) tasks harder. You could store the file pointer in a separate file, or at the end of your file, and there are better^H^H^H^H^H^H simpler solutions that don't require the file pointer at all.
I'm confused why you would rename your file. What does this accomplish?
Are the log entries fixed size? Or variable size?
If the entries are fixed size, then there is no trouble in re-writing the existing file from the start: you won't ever have incomplete entries in your file, and if you are writing a counter or timestamps to the file, it should be clear where the 'cursor' is located.
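A small sketch of that fixed-size-record scheme (the record length and file name are assumed, and the file must already exist for the '+<' open mode to succeed):

use strict;
use warnings;

my $reclen = 64;                              # assumed fixed record length
my $slots  = int(3 * 1024 * 1024 / $reclen);  # records that fit in 3MB
my $slot   = int(time() / 60) % $slots;       # this minute's slot, wraps at 3MB

open my $fh, '+<', 'Log.log' or die "open failed: $!";
seek $fh, $slot * $reclen, 0 or die "seek failed: $!";
my $entry = substr(scalar(localtime) . " cpu=... mem=...", 0, $reclen - 1);
printf {$fh} "%-*s\n", $reclen - 1, $entry;   # pad to exactly $reclen bytes
close $fh;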
If the entries are variable size, then you should probably not begin re-writing the file from the beginning without somehow making it clear where the 'cursor' is located in the file, and you should write code that is resilient to reading truncated log entries.
Can you re-use existing tools such as RRDtool?

ExtAudioFileSeek and ExtAudioFileWrite together on the same file

I have a situation where I can save a post-processing pass through the audio by taking some manipulated buffers from the end of the track and writing them to the beginning of my output file.
I originally thought I could do this by resetting the write pointer using ExtAudioFileSeek, and was about to implement it when I saw this line in the docs:
Ensure that the file you are seeking in is open for reading only. This function’s behavior with files open for writing is undefined.
Now I know I could close the file for writing then reopen it, but the process is a little more complicated than that. Part of the manipulation I am doing is reading from buffers that are in the file I am writing to. The overall process looks like this:
Read buffers from the end of the read file
Read buffers from the beginning of the write file
Process the buffers
Write the buffers back to the beginning of the write file, overwriting the buffers I read in step 2
Logically, this can be done in 1 pass no problem. Programmatically, how can I achieve the same thing without corrupting my data, becoming less-efficient (opposite of my goal) or potentially imploding the universe?
Yes, using a single audio file for both reading and writing may, as you put it, implode the universe, or at least lead to other nastiness. I think that the key to solving this problem is in step 4, where you should write the output to a new file instead of trying to "recycle" the initial write file. After your processing is complete, you can simply scrap the intermediate write file.
Or have I misunderstood the problem?
Oh, and also, you should use ExtAudioFileWriteAsync instead of ExtAudioFileWrite for your writes if you are doing this in realtime. Otherwise the I/O load will cause audio dropouts.
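For what it's worth, here is a rough sketch of that two-file pattern with the ExtAudioFile API (offline processing, hence plain ExtAudioFileWrite; client-format setup and error checking are elided, and the function and variable names are placeholders):

#include <AudioToolbox/AudioToolbox.h>

// Read from the source, process, and append to a brand-new output file,
// instead of seeking around in a file that is open for writing.
static void processToNewFile(CFURLRef srcURL, CFURLRef dstURL,
                             AudioStreamBasicDescription *fmt) {
    ExtAudioFileRef src = NULL, dst = NULL;
    ExtAudioFileOpenURL(srcURL, &src);               // read-only: seeking is safe
    ExtAudioFileCreateWithURL(dstURL, kAudioFileCAFType, fmt, NULL,
                              kAudioFileFlags_EraseFile, &dst);

    enum { kFrames = 4096 };
    float samples[kFrames];                          // mono float, for brevity
    AudioBufferList buf = { .mNumberBuffers = 1 };

    for (;;) {
        buf.mBuffers[0].mNumberChannels = 1;
        buf.mBuffers[0].mDataByteSize   = sizeof samples;
        buf.mBuffers[0].mData           = samples;
        UInt32 frames = kFrames;
        ExtAudioFileRead(src, &frames, &buf);
        if (frames == 0) break;                      // end of input
        // ... manipulate samples here ...
        ExtAudioFileWrite(dst, frames, &buf);        // always appends
    }
    ExtAudioFileDispose(src);
    ExtAudioFileDispose(dst);
}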