memcached.exe -h
-b run a managed instanced (mnemonic: buckets)
What the heck does this mean? Googling did not help :(
From the source code, this appears to mean that memcached can host multiple independent buckets.
This is probably for security or accounting reasons: it allows multiple concurrent users of one memcached instance without one user seeing values that a different user cached.
http://src.opensolaris.org/source/xref/webstack/memcached-incubator/branches/performance/server/memcached.h#51
I have a bucket containing many millions of blobs that I want to delete; however, I can't simply delete the bucket itself. This is the best method I have come up with to delete millions of blobs in the quickest possible time:
gsutil ls gs://bucket/path/to/dir/ | xargs gsutil -m rm -r
For what I want to do (removing about 30 million blobs) it still takes many hours to run, partly, I guess, because it's at the mercy of the speed of my broadband connection.
Does anyone know a quicker way of achieving this? I had kinda hoped it'd be an instantaneous operation, with the backend simply marking the location as deleted - clearly not.
Google recommends using the console for this:
The Cloud Console can bulk delete up to several million objects and does so in the background. The Cloud Console can also be used to bulk delete only those objects that share a common prefix, which appear as part of a folder when using the Cloud Console.
https://cloud.google.com/storage/docs/best-practices#deleting
That said (personal opinion here), using the console might be quicker, but you have no idea how far it has got. At least with the CLI option you do.
Another alternative is using lifecycle management to delete based on rules:
Delete objects in bulk
If you want to bulk delete a hundred thousand or more objects, avoid using gsutil, as the process takes a long time to complete. Instead, use the Google Cloud console, which can delete up to several million objects, or Object Lifecycle Management, which can delete any number of objects.
To bulk delete objects in your bucket using Object Lifecycle Management, set a lifecycle configuration rule on your bucket where the condition has Age set to 0 days, and the action is set to delete.
From: https://cloud.google.com/storage/docs/deleting-objects#delete-objects-in-bulk
However, this won't work if you're in a rush:
After you have added or edited a rule, it may take up to 24 hours to take effect.
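For reference, a minimal lifecycle configuration along these lines should do it (the bucket name is a placeholder; double-check the schema against the current docs):
cat > lifecycle.json <<'EOF'
{
  "rule": [
    {
      "action": {"type": "Delete"},
      "condition": {"age": 0}
    }
  ]
}
EOF
gsutil lifecycle set lifecycle.json gs://bucket
Don't forget to remove the rule again once the bucket is empty, otherwise anything you upload later will be deleted as well.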
I'm about to upgrade a quite large PostgreSQL cluster from 9.3 to 11.
The upgrade
The cluster is approximately 1.2 TB in size. The database server has a disk system consisting of a fast HW RAID 10 array of 8 DC-edition SSDs, along with 192 GB of RAM and 64 cores. I am performing the upgrade by first replicating the data to a new server with streaming replication, then upgrading that one to 11.
I tested the upgrade using pg_upgrade with the --link option; this takes less than a minute. I also tested a regular upgrade (without --link) with many jobs; that takes several hours (4+).
Questions
Now the obvious choice for me is of course to use the --link option; however, all this makes me wonder - are there any downsides (performance- or functionality-wise) to using it over the regular, slower method? I do not know the internal workings of PostgreSQL's data structures, but I have a feeling there could be a performance difference after the upgrade between rewriting the data entirely and just using hard links - whatever that means?
Considerations
The only drawback of --link I can find in the documentation is not being able to access the old data directory after the upgrade is performed (https://www.postgresql.org/docs/11/pgupgrade.html). However, that is only a safety concern, not a performance drawback, and it doesn't really apply in my case since I replicate the data first.
The only other thing I can think of is reclaiming space, with whatever performance upsides that might have. However, as I understand it, that can also be achieved by running a VACUUM FULL (or CLUSTER?) command after the --link-upgraded database has been upgraded? Also, as I understand it, reclaiming space does not matter much performance-wise on an SSD.
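To be concrete, the space-reclaiming step I have in mind would be something like this, run after the upgrade (the database, table, and index names are just examples):
psql -d mydb -c 'VACUUM FULL;'                        # rewrite every table, reclaiming space
psql -d mydb -c 'CLUSTER mytable USING mytable_pkey;' # rewrite one table, ordered by an index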
I'd appreciate it if anyone can help shed some light on this.
There is absolutely no downside to using hard links (with the exception you noted, that the old cluster is dead and has to be removed).
A hard link is in no way different from a normal file.
A “file” in UNIX is in reality an “inode”, a structure containing file metadata. An entry in a directory is a (hard) link to that inode.
If you create another hard link to the inode, the same file will be in two different directories, but that has no impact whatsoever on the behavior of the file.
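You can verify this yourself in a couple of lines (GNU coreutils shown; stat's options differ on other systems):
echo hello > a
ln a b               # create a second hard link to the same inode
stat -c '%i %h' a b  # both names report the same inode number and a link count of 2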
Of course you must make sure that you don't start both the old and the new server. Instant data corruption would ensue. That's why you should remove the old cluster as soon as possible.
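For reference, the link-based upgrade itself is a single invocation along these lines (the binary and data directory paths and the job count are placeholders for your setup):
pg_upgrade --link --jobs=16 \
    -b /usr/lib/postgresql/9.3/bin -B /usr/lib/postgresql/11/bin \
    -d /var/lib/postgresql/9.3/main -D /var/lib/postgresql/11/main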
In my Python app I need to share a key/value store among a few processes - one updates the data, the others only retrieve it at random times. Persistence is not required.
My initial idea was to use memcached, but it seems to have some LRU mechanism that removes old data when it's short on RAM. I'd much prefer to get an error in such a case.
Obviously, memcached was optimized to be a cache system, while what I need is simply a network-accessible hash table. I could implement something simple from scratch, but why reinvent the wheel?
Run memcached with the -M option.
-M return error on memory exhausted (rather than removing items)
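With -M, writes fail with an out-of-memory error once the limit set by -m is reached, instead of silently evicting older entries. A minimal invocation could look like this (the memory limit and port are just examples):
memcached -M -m 64 -p 11211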
How can a dump and load be done faster in Progress?
I need to automate the dump and load process so that I can run it on a weekly basis.
Generally one wouldn't need to do a weekly D&L, as the server engine does a decent job of managing its data. A D&L should only be done when there's an evident concern about performance, when changing versions, or when making a significant organizational change in the data extents.
Having said that, a binary D&L is usually the fastest, particularly if you can make it multi-threaded.
OK, dumping and loading across platforms to build a training system is probably a legitimate use case. (If it were Linux to Linux you could just back up and restore -- you may be able to do that Linux to UNIX if the byte ordering is the same...)
The binary format is portable across platforms and versions of Progress. You can binary dump a Progress version 8 HPUX database and load it into a Windows OpenEdge 11 db if you'd like.
To do a binary dump use:
proutil dbname -C dump tablename
That will create tablename.bd. You can then load that table with:
proutil dbname -C load tablename
Once all of the data has been loaded you need to remember to rebuild the indexes:
proutil dbname -C idxbuild all
You can run many simultaneous proutil commands. There is no need to go one table at a time. You just need to have the db up and running in multi-user mode. Take a look at this for a longer explanation: http://www.greenfieldtech.com/downloads/files/DB-20_Bascom%20D+L.ppt
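As a sketch, a multi-threaded binary dump from the shell could look like this (the table names and dump directory are placeholders):
for t in customer order orderline; do
  proutil dbname -C dump $t /dumpdir &   # one background dump per table
done
wait                                     # block until all dumps have finished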
It is helpful to split your database up into multiple storage areas (and they should be type 2 areas) for best results. Check out: http://dbappraise.com/ppt/sos.pptx for some ideas on that.
There are a lot of tuning options available for binary dump & load. Details depend on what version of Progress you are running. Many of them probably aren't really useful anyway but you should look at the presentations above and the documentation and ask questions.
What is the optimal way to introduce a data persistence system in Gearman with performance in mind?
I'm asking because we are thinking of moving away from our MySQL-based queue system to Gearman. It seems rather odd to use a relational database again for persisting the queue data, so we are looking at other possibilities.
I know of libdrizzle, libsqlite, etc ... but I'm leaning more towards NoSQL. What are good, proven, and stable solutions?
If you do
$ gearmand -h
it should show you what queue options you have.
I think the only NoSQL option available by default is memcached.
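For example, to start gearmand with the memcached-backed queue (the server address is a placeholder, and the exact option names can vary between gearmand versions and builds, so check gearmand -h on yours):
gearmand -q libmemcached --libmemcached-servers=127.0.0.1:11211
Keep in mind that memcached itself is not durable, so this survives a gearmand restart but not a memcached restart.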