File cache for large files in Perl

I am looking for a cache library in Perl, but the ones I have found so far, like Cache::Cache and CHI, all seem to assume you want to read the file into a Perl data structure. I am only interested in caching the files to disk, without ever reading the file content into Perl.
The files I am dealing with are around 200 MB and will be downloaded from the net. I want a size limit on the cache and an expiry time for the cached files.
Any suggestions?
Edit: As I did not find any ready-made library for this, I have now implemented it myself. But if anyone can point one out anyway, it would of course still be interesting.

Solve the problem with one layer of indirection: store references to the files, not the files themselves, in the cache. What exactly such a reference looks like depends on your use case.
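For example, here is a minimal sketch of that indirection, assuming CHI's File driver and a cache keyed by URL. The paths and durations are placeholders, and download_to_disk is a hypothetical helper standing in for however you fetch the file; CHI's expiry covers the time limit, but an overall size limit would still need to be enforced separately.

use CHI;

# Index cache: it stores only the local path of each downloaded file,
# never the file content itself.
my $cache = CHI->new(
    driver     => 'File',
    root_dir   => '/tmp/download-cache-index',
    expires_in => '12 hours',
);

sub cached_path {
    my ($url) = @_;
    my $path = $cache->get($url);
    return $path if defined $path && -e $path;   # hit, and the file is still on disk
    $path = download_to_disk($url);              # hypothetical helper: fetches the file, returns its local path
    $cache->set( $url, $path );
    return $path;
}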

Try the Cache::File module from CPAN

Related

How to add logging information to perl legacy code

I have a medium to large system built in Perl that has been developed over the last 15 years and consists of many scripts and .pm files.
In order to improve the system I need more data, and the easiest way I can see to get that data is to have every function in the code print its start and end times to some log, so it becomes possible to understand what is taking the most time.
However, this is an old system, some parts are less maintainable than others, and on top of that it needs to keep running, which means that to get real data I need it to print this out from production.
What I want to do is override the function declarations in some way, so that each function logs a line on entry like
NAME start STARTTIME PARAMS
and when it leaves the function
NAME ended STARTTIME PARAMS
Can anybody point me in the right direction?
Thanks
Take a look at Devel::NYTProf. It can profile the amount of time that all of your subs are taking (and do a lot more). It doesn't involve a lot of messy code modification; instead you just run your script with it:
perl -d:NYTProf your_script.pl
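Once the run finishes, Devel::NYTProf leaves an nytprof.out file in the current directory; the bundled nytprofhtml tool turns it into a browsable report:
nytprofhtml
Then open nytprof/index.html in a browser.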
Previous answers are spot on (I especially recommend Devel::NYTProf). However, a more general technique you could apply to gather data about your subroutines' behaviour is fiddling with the symbol table, "appending" (or prepending) code to the actual sub's code.
A couple of pointers:
In Perl, can I call a method before executing every function in a package? (this answer shows a code example you could adapt to your particular situation)
Hook::LexWrap is a module that lets you augment subroutine behaviour in several ways, without touching the original code
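As a rough sketch of the Hook::LexWrap approach (MyApp::some_function is a placeholder for one of your existing subs; Hook::LexWrap passes the wrapped sub's arguments to the pre/post handlers plus one extra trailing element, hence the @_[ 0 .. $#_ - 1 ] slice):

use Hook::LexWrap;
use Time::HiRes qw(time);

# Called in void context, the wrapping stays installed for the rest of the program.
wrap 'MyApp::some_function',
    pre  => sub { printf STDERR "some_function start %.6f params: %s\n", time, join ',', @_[ 0 .. $#_ - 1 ] },
    post => sub { printf STDERR "some_function ended %.6f\n", time };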
HTH
Sounds like you need to use a profiler:
http://www.perl.org/about/whitepapers/perl-profiling.html
Perl profilers usually have a huge impact on the program performance, so using them in production may not be a great idea.
You can try Devel::ContinuousProfiler, which claims to have very low impact (I myself have never used it; I just discovered it this morning!).

MD5 hash of a file in ABAP

I want to generate an MD5 hash of a text file in ABAP. I have not found any standard solution for generating it for a very big file. The function module CALCULATE_HASH_FOR_CHAR does not meet my requirements because it takes a string as an input parameter. Although it works for smaller files, in the case of, say, a 4 GB file one cannot construct such a big string.
Does anybody know whether there is a standard piece of coding for doing that (my google efforts did not bring me anything) or maybe someone has an MD5 algorithm in ABAP that calculates the hash of a file?
It looks like implementing this algorithm in ABAP is impossible, because the language does not allow arithmetic overflows during the calculations. This should also answer the question of why it has not been implemented in the SAP system so far. Either way, it looks like there is no option other than calling an external tool, which of course is, regrettably, hardly platform independent.
EDIT: OK! So with the great help of René and the code of the Fast MD5 Implementation in Java, I created an implementation of the MD5 algorithm in ABAP. This implementation allows the calculated hash to be updated with more bytes, which of course might be coming from different sources.
There is no method that takes a file yet, but most of the work has been done anyway.
Some simple ABAP Unit tests are included in the code, which also document how to use it.
Perhaps you could read the file in data blocks of a couple of megabytes, create a hash list of those using the suggested function, and then create a single top hash over the generated hash list.
The SDN is usually a very good starting point for finding ABAP-related solutions. I was able to find this post: http://scn.sap.com/thread/1483479
The author suggests:
Upload the .txt file BUT as BIN.
Calculate the hash code using function MD5_CALCULATE_HASH_FOR_RAW
Are you able to get your file in binary format and use MD5_CALCULATE_HASH_FOR_RAW?
Edit: This post even has a more detailed answer using CALCULATE_HASH_FOR_RAW: http://scn.sap.com/thread/1298723
Quote of Shivanand Kalagi's answer:
STR_LEN = XSTRLEN( DATA ).

CALL FUNCTION 'CALCULATE_HASH_FOR_RAW'
  EXPORTING
    ALG    = 'MD5'
    DATA   = DATA
    LENGTH = STR_LEN
  IMPORTING
    HASH   = L_MD5_HASH.

Is it naughty to have a large utility file?

In my C project I have quite a large utils.c file. It is really full of many utilities of different sorts. I feel a bit naughty just stuffing different miscellaneous functions in there. For example, it has some utilities related to low-level stuff, such as a lowercase() function, and it also has some quite sophisticated utilities, such as converting to/from different colour formats.
My question is, is it very naughty to have such a large utils.c with many different types of utilities in it? Should I break it up into many different kinds of utility files, such as graphics_utils.c and so on? What do you think?
Breaking them up into separate files based on categories (i.e. graphics, strings, etc.) will lead to better organization, making it easier to locate certain pieces of code and giving you smaller files to go through instead of just one large file.
You want to break it up, not just for organizational reasons, but because you will have many other files that depend on this one. Because everything will depend on this file, it makes this one file difficult to change because it might cause widespread breakage.
http://ifacethoughts.net/2006/04/15/stable-dependencies-principle/
If it's just you that will EVER maintain the stuff, it's a matter of when the complexity gets to the point where you find yourself searching for things. That would be the time to refactor and reorganize (there's a cost to reorganize, just as there's a cost to not reorganize).
If it's POSSIBLE that anyone else will maintain a project that includes your utils, you have to consider THEIR pain point when deciding when to reorganize. Theirs is MUCH lower than yours.
I tend to break them up into various sub-utils as you say (graphics_utils) when it becomes appropriate.
Break it up. Stuff will be easier to find, easier to reuse, easier to refactor, easier to unit test. I recently needed to get a set of ISO-8601 date handling methods out of a ginormous Java utility class of static methods, and it was really hard to find the 5% of the code I needed.
It is definitely not kosher, because the next guy coming through your code won't know where to look for anything. Break it up by function, and your coworkers will thank you!
Another advantage that comes from breaking up the file into separate ones is that when you place them under source control, you can have finer-grained control. This is really useful if you have bits that are tweaked/extended/specialised frequently and other bits that are relatively stable.
Another point: you should organize your code, i.e. break it up into smaller modules and categorize it, because at some point you will end up writing a second and third function for the same thing, simply because you can't find the function that you knew was there but whose name you don't remember.
I've got a (rather large) project with such a module, and there is logic in it for which there are up to 5-6 implementations of the same thing.
Like everyone else I would break them up. But I tend to use Extension Methods now, so I would have one class (and one file) per class being extended (e.g. StringExtensions, SqlDataReaderExtensions, etc). I find this tends to break up the utility methods nicely.

Does it matter if there are unused functions I put into a big CoolFunctions.h / CoolFunctions.m file that's included everywhere in my project?

I want to create a big file for all the cool functions I find somehow reusable and useful, and put them all into that single file. Well, for now I don't have many, so it's not worth thinking much about splitting them into several files, I guess. I would use pragma marks to separate them visually.
But the question: Would those unused methods bother in any way? Would my application explode or have less performance? Or is the compiler / linker clever enough to know that function A and B are not needed, and thus does not copy their "code" into my resulting app?
This sounds like an absolute architectural and maintenance nightmare. As a matter of practice, you should never make a huge blob file with a random set of methods you find useful. Add the methods to the appropriate classes or categories. See here for information on the blob anti-pattern, which is what you are doing here.
To directly answer your question: no, methods that are never called will not affect the performance of your app.
No, they won't directly affect your app. Keep in mind though, all that unused code is going to make your functions file harder to read and maintain. Plus, writing functions you're not actually using at the moment makes it easy to introduce bugs that aren't going to become apparent until much later on when you start using those functions, which can be very confusing because you've forgotten how they're written and will probably assume they're correct because you haven't touched them in so long.
Also, in an object oriented language like Objective-C global functions should really only be used for exceptional, very reusable cases. In most instances, you should be writing methods in classes instead. I might have one or two global functions in my apps, usually related to debugging, but typically nothing else.
So no, it's not going to hurt anything, but I'd still avoid it and focus on writing the code you need now, at this very moment.
The code would still be compiled and linked into the project, it just wouldn't be used by your code, meaning your resultant executable will be larger.
I'd probably split the functions into separate files, depending on the common areas they address, so I'd have a library of image functions separate from a library of string manipulation functions, then include whichever are pertinent to the project at hand.
I don't think having unused functions in the .h file will hurt you in any way. If you compile all the corresponding .m files containing the unused functions in your build target, then you will end up making a bigger executable than is required. Same goes for if you include the code via static libraries.
If you do use a function but you didn't include the right .m file or library, then you'll get a link error.

Testing strategies: generating an XML file

I'm writing a couple of classes that generate an XML file. (The details are probably not important at the moment.)
I'm wondering what the best testing strategy is.
I don't want to re-write the XML generation code just to compare the output, when I could write the file to disk and compare it at certain milestones (the XML spec won't change often, maybe once or twice every couple of years).
I'm more interested in testing the behaviour of the architecture rather than the getters & setters.
Options that come to mind:
rebuilding the XML file in the testing environment and comparing the string representations
manually checking the result (writing to a file, etc.)
rebuilding the XML file in memory in the testing environment and comparing the in-memory elements
Virtual Bonus if you know any libraries for C++ and/or Google Test.
Ideas?
Have you considered using XSDs and validating your XML against the XSD? You didn't mention whether it was content or structure you were testing for (probably both).
If it validates, that confirms the XML conforms to the required structure.
In the past I've approached this two ways:
Compare the XML file against the expected result stored as a string in the test file. This is easy to implement, and unless you want to generate variations of the XML file for testing purposes, the string comparison method works fine.
In the case where you have an XML file writer and a reader, you can compare the original with the round-trip result.
I agree with you that you shouldn't replicate the logic to generate the file in the test function just for the purpose of testing. Also, I would try to avoid the need to write to the file system -- this is an unnecessary dependency on the file system and would probably result in slower-running tests.
You might consider using XMLUnit: http://xmlunit.sourceforge.net. It provides JUnit extension classes which can be used to assert equality of XML files.
You might consider an XML diff tool. There is a free one available on MSDN: XML Diff and Patch Tool.
I see you are looking for C++ tools. In that case, libxmldiff might be more suitable.