(Perl's) GD::Graph - Limit of plotted data?

I haven't had many good experiences with GD::Graph when trying to plot larger data arrays.
What I have is two arrays: one holds 2 million float/integer values, the other varies in length but is always under 2 million. I'm trying to plot them on the same line graph (I create a 0..2000000 index array for the x axis). Everything worked when tested with 1 million values.
Larger array sizes throw up:
Not a GD::Image object at /usr/local/lib/perl5/site_perl/5.8.9/GD/Graph.pm line 182
I'm not even sure where in my script it fails; there are no other errors.
I did not find anything in the official documentation about memory or data limits for GD::Graph.
Additional info that might help:
my script attempts to save the graphs to a file (.gif)
I'm pretty sure this is not my web server's memory limit (that would show a message about a killed perl process)
Thanks

Could you maybe post the code in question so we can give it an inspection and see what's up? At first guess it does sound like a memory issue: the allocation is returning a null pointer in the underlying system, so Perl can't actually create the GD object, since with 2,000,000 64-bit ints/floats (assuming you're on a 64-bit host) plus Perl's per-scalar overhead you're trying to allocate somewhere in the range of 125 MB off the heap. But it could just be something syntactical.
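In the meantime, here is a minimal sketch of the kind of script described (the array sizes, data, and output path are assumptions); explicitly checking the return value of plot() and printing $graph->error should at least pinpoint where it fails:

use strict;
use warnings;
use GD::Graph::lines;

my $n  = 2_000_000;                 # size mentioned in the question
my @x  = (0 .. $n - 1);             # index array for the x axis
my @y1 = map { rand() } 1 .. $n;    # stand-ins for the real data series
my @y2 = map { rand() } 1 .. $n;

my $graph = GD::Graph::lines->new(1200, 600)
    or die "could not create GD::Graph object\n";

# plot() returns a GD::Image object on success, undef on failure
my $gd = $graph->plot([ \@x, \@y1, \@y2 ])
    or die 'plot() failed: ' . $graph->error . "\n";

open my $fh, '>', 'graph.gif' or die "open: $!";
binmode $fh;
print {$fh} $gd->gif;    # requires GIF support in libgd; use $gd->png otherwise
close $fh;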

Related

MongoDB - Intermittent slowdowns involving sorting

A couple of buddies and I have been having a serious issue with MongoDB. Before I get into this, let's get some stats out of the way.
Running on a Dedicated VPS with Softlayer
-Ubuntu 13.04 inside an ESX VM
-4 Virtual cores running at 2.0 GHz each
-48GB of RAM
-All on an SSD
-MongoDB Version 2.6.5 (and issue happening on the last 6 releases)
We're doing polygon geo queries using a 2dsphere index, along with a few other fields holding ints and booleans. There are about 7 keys in total that we query at a time, and we then sort the results depending on what the user requests to sort by (price, closest point to the location, etc.).
Now here's where things get stupidly complicated. At random times during the day, specific locations stop returning queries in a reasonable time: instead of the average 300 to 2000 milliseconds, they take something like 30 to 150 seconds, and this happens ONLY for specific locations. Searching for New York City will return quickly, but searching for London, England will take ages.
The second we change which key we're sorting by (we've even tried sorting by _id), everything goes back to normal. Then later on, that sort key will break for random locations, and flipping it again repairs things. Another fix we found is completely reindexing the key we're sorting by (deleting and recreating the index), but then our search is down for however long it takes to rebuild.
This method works, sure, but changing the sort key defeats the purpose, and reindexing is just more and more downtime for something that shouldn't be happening.
This issue could be a bunch of things, so let me quickly go through the ones we've checked and ruled out:
-Does not relate to the number of users accessing the database (could be 0, could be 1000).
-Lock is averaged at 2.5% or below.
-Btree is normal between 0 and 200.
-Happens even when not writing to the database (and we haven't written in hours)
-CPU usage is minimal
-RAM usage is only 60% average.
-Data in listings are valid.
-All keys within the indexes are the proper types. No strings in int indexes, no datetimes in float indexes, etc.
-Has nothing to do with how many results are returned. The issue happens in locations with 10,000 results or with just 5.
-No errors in logs, no errors in MMS, no errors anywhere... Just slow.
I'm out of ideas here; I simply can't figure out what it may be. Has anyone ever come across something like this, or have any ideas about what it might be? Anything is appreciated. If I've forgotten to mention something or need to explain it differently, please just ask; I have no issue rewriting. Thanks again.
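For reference, the query/sort pattern being described looks roughly like the following, shown here through the Perl MongoDB driver purely for illustration (the collection name, field names, and polygon coordinates are all made up):

use strict;
use warnings;
use MongoDB;

# Everything named here (database, collection, fields, polygon) is a stand-in.
my $client = MongoDB::MongoClient->new(host => 'mongodb://localhost:27017');
my $coll   = $client->get_database('app')->get_collection('listings');

my $cursor = $coll->find({
    # polygon geo query against a 2dsphere-indexed location field
    location => {
        '$geoWithin' => {
            '$geometry' => {
                type        => 'Polygon',
                coordinates => [[
                    [ -0.51, 51.28 ], [ 0.33, 51.28 ],
                    [ 0.33, 51.69 ],  [ -0.51, 51.69 ],
                    [ -0.51, 51.28 ],   # ring must close on the first point
                ]],
            },
        },
    },
    # a few of the other int/boolean keys mentioned in the question
    active   => 1,
    bedrooms => { '$gte' => 2 },
})->sort({ price => 1 });    # or distance, etc., per the user's choice

my @results = $cursor->all;

Only the argument to sort() changes when the user picks a different ordering, and per the description above that is the part that flips a location between fast and slow.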

NetLogo BehaviorSpace memory size constraint

In my model I'm using BehaviorSpace to carry out a number of runs, with variables changing for each run and the output being stored in a *.csv for later analysis. The model runs fine for the first few iterations, but quickly slows as the data grows. My question is: will file-flush, when used in BehaviorSpace, help with this? Or is there a way around it?
Cheers
Simon
Make sure you are using table output format and that spreadsheet format is disabled. At http://ccl.northwestern.edu/netlogo/docs/behaviorspace.html we read:
Note however that spreadsheet data is not written to the results file until the experiment finishes. Since spreadsheet data is stored in memory until the experiment is done, very large experiments could run out of memory. So you should disable spreadsheet output unless you really want it.
Note also:
doing runs in parallel will multiply the experiment's memory requirements accordingly. You may need to increase NetLogo's memory ceiling (see this FAQ entry).
where the linked FAQ entry is http://ccl.northwestern.edu/netlogo/docs/faq.html#howbig
Using file-flush will not help. It flushes any buffered data to disk, but only for a file you opened yourself with file-open, and anyway, the buffer associated with a file is fixed-size, not something that grows over time. file-flush is really only useful if you're reading from the same file from another process during a run.

Data management in matlab versus other common analysis packages

Background:
I am analyzing large amounts of data using an object-oriented composition structure for sanity and easy analysis. Often the highest level of my object hierarchy is about 2 GB when saved. Loading the data into memory is generally not an issue, and populating sub-objects and then higher-level objects based on their content is much more Java-memory efficient than just loading in a lot of .mat files directly.
The Problem:
Saving these objects when they are > 2 GB will often fail. It is a somewhat well-known problem that I have worked around by deleting sub-objects until the total size is below 2-3 GB. This happens regardless of how powerful the computer is; a machine with 16 GB of RAM and 8 cores will still fail to save the objects correctly. Back-versioning the save format also does not help.
Questions:
Is this a problem that others have solved somehow in MATLAB? Is there an alternative that I should look into that still has a lot of high level analysis and will NOT have this problem?
Questions welcome, thanks.
I am not sure this will help, but: do you make sure to use a recent version of the MAT-file format? See, for instance, the documentation for save. Quoting from that page:
'-v7.3': 7.3 (R2006b) or later. Version 7.0 features plus support for data items greater than or equal to 2 GB on 64-bit systems.
'-v7': 7.0 (R14) or later. Version 6 features plus data compression and Unicode character encoding. Unicode encoding enables file sharing between systems that use different default character encoding schemes.
Also, could your object by any chance be or contain a graphics handle object? In that case, it is wise to use hgsave.

What is the maximum NUM_OF_PARAMS in Perl DBI placeholders

What is the maximum number of placeholders allowed in a single statement, i.e. the upper limit of the attribute NUM_OF_PARAMS?
I'm experiencing an odd issue while trying to tune the maximum number of rows in a multi-row insert: setting the number to 20,000 gives me an error because $sth->{NUM_OF_PARAMS} becomes negative.
Reducing the maximum to 5,000 works fine.
Thanks.
As far as I am aware, the only limitation in DBI itself is that the value is stored in a Perl scalar, so the limit is whatever a scalar can hold. For the DBDs it is totally different, and I doubt many databases, if any, support 20,000 parameters. BTW, NUM_OF_PARAMS is read-only, so I've no idea what you mean by "set the number to 20,000"; I presume you mean that you create a SQL statement with 20,000 parameters, then read NUM_OF_PARAMS and it gives you a negative value. If so, I suggest you report that (with an example) on rt.cpan.org, as it does not sound right at all.
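A minimal example along those lines might look like the following (the DSN, credentials, and the throwaway SELECT are placeholders; any DBD you can prepare against will do for reading the attribute):

use strict;
use warnings;
use DBI;

# Placeholder connection details; adjust for your environment.
my $dbh = DBI->connect('dbi:DB2:sample', 'user', 'pass', { RaiseError => 1 });

for my $n (5_000, 20_000) {
    my $in  = join ', ', ('?') x $n;
    my $sth = $dbh->prepare("SELECT 1 FROM sysibm.sysdummy1 WHERE 1 IN ($in)");
    # NUM_OF_PARAMS should equal $n; a negative value here is what to report
    print "$n placeholders -> NUM_OF_PARAMS = $sth->{NUM_OF_PARAMS}\n";
}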
I cannot imagine a SQL statement with 20,000 parameters being very efficient in any database; far better to reduce that to a range or something similar if you can. In ODBC, 20,000 parameters would mean 20,000 IPDs and APDs, which are quite big structures. Since the DB2 CLI library is very like ODBC, I would imagine you are going to eat up loads of memory.
Given that 20,000 causes negative problems and 5,000 doesn't, there's a signed 16-bit integer somewhere in the system, and the upper bound is therefore approximately 16383.
However, the limit depends on the underlying DBMS and the API used by the DBD module for the DBMS (and possibly the DBD code itself); it is not affected by DBI.
Are you sure that's the best way to deal with your problem?
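If the end goal is a fast multi-row insert rather than one gigantic statement, a hedged sketch of chunking the rows (the table, columns, DSN, and chunk size are made up for illustration) could look like this:

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:DB2:sample', 'user', 'pass',
                       { RaiseError => 1, AutoCommit => 0 });

my @rows       = map { [ $_, "value_$_" ] } 1 .. 20_000;   # fake data
my $chunk_size = 1_000;

while (my @chunk = splice @rows, 0, $chunk_size) {
    # one "(?, ?)" group per row in this chunk
    my $values = join ', ', ('(?, ?)') x @chunk;
    my $sth    = $dbh->prepare("INSERT INTO my_table (id, val) VALUES $values");

    # 2 placeholders per row, so at most 2,000 per statement here
    $sth->execute(map { @$_ } @chunk);
}
$dbh->commit;

Each statement stays well below whatever 16-bit boundary is being hit, and keeping the whole thing in one transaction preserves most of the bulk-insert speed.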

Memory Efficient and Speedy iPhone/Android Dictionary Storage/Access

I'm having trouble with memory on older-generation iPhones (iPod touch 1st gen, 2nd gen, etc.). This is due to the amount of memory allocated when I load and store a 170k-word dictionary.
This is the code (very simple):
string[] words = dictionaryRef.text.Split("\n"[0]);
_words = new List<string>(words);
This allocates around 12 MB at startup; the iPhone has around 43 MB available, I think. So with that plus textures, sounds, and the OS, it tends to break.
Speed-wise, access via binary search is fine. The problem is storing it in memory more efficiently (and loading it more efficiently).
The text.Split call appears to take up a lot of heap memory.
Any advice?
You can't count too much on how much memory these pre-3.0 devices have available on startup; 43 MB is rather optimistic. Is your app just checking whether a word is in the list or not? You might want to roll your own hash table instead of using a binary search. I'd search the literature and Stack Overflow for efficient ways to store a large dictionary with the particular word sizes you have; a search on hash table implementations might turn up something better than what you have now.
Use SQLite. It will use less memory and be faster. Create an index on your words column and voila, you have binary search, without having the whole dictionary loaded in memory.
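To make that concrete, the word list can be turned into an indexed SQLite database ahead of time, off the device. Here is a minimal sketch of such a one-off conversion script, written in Perl with DBD::SQLite (the file, table, and column names are assumptions):

use strict;
use warnings;
use DBI;

# Builds words.db from a newline-separated word list; run once at build time.
my $dbh = DBI->connect('dbi:SQLite:dbname=words.db', '', '',
                       { RaiseError => 1, AutoCommit => 0 });

$dbh->do('CREATE TABLE IF NOT EXISTS words (word TEXT NOT NULL)');

open my $in, '<', 'dictionary.txt' or die "open: $!";
my $insert = $dbh->prepare('INSERT INTO words (word) VALUES (?)');
while (my $word = <$in>) {
    chomp $word;
    $insert->execute($word) if length $word;
}
close $in;

# The index is what makes lookups cheap without loading the list into memory.
$dbh->do('CREATE INDEX IF NOT EXISTS idx_words_word ON words (word)');
$dbh->commit;

On the device, a membership check then becomes a single indexed SELECT (e.g. SELECT 1 FROM words WHERE word = ?), so the 170k entries never need to sit in a List in memory.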
First, if dictionaryRef.text is a string (and it looks like it is), then you already have something huge being allocated (2 bytes per character). Check this; it may well account for a large amount (nearly half) of the total memory being allocated. You should think about caching this (the database idea is a good one, but a plain file would also do, so you could use File.ReadAllLines on future executions).
Next, you can try to do a bit better than Mono's Split method. It creates a List and then turns it into an array at the end, and you then create yet another List from that array. Since your requirement (splitting only on '\n') is fairly basic, I suggest you roll your own Split method (or copy/paste/trim down the one from Mono) and avoid the temporary allocations.
In any case, take a lot of (memory) measurements, since allocations, even more so for strings, often occur where we don't look ;-)
I would have to agree with Morningstar that using a SQLite backend for your word storage sounds like the best solution to what you are trying to do.
However, if you insist on using a word list, here's a suggestion:
It looks to me like dictionaryRef.text is constructed by reading a text file in its entirety (File.ReadAllText() or some such).
Instead of doing that, why not use TextReader.ReadLine() to read one word at a time from the file into a List, thus avoiding the need for String.Split() and the tons of temporary storage it uses?
Ultimately that seems to be what you want anyway... and ReadLine() will "split" on \n for you.