I'm quite new to PostgreSQL. I'm implementing a C program to transfer a large amount of data into a PostgreSQL database.
To develop the program further I have to test a lot (especially for performance), which means performing several runs of the import in succession.
I want to make sure that the caches are clean again each time I start the program.
Which items do I have to keep in mind, besides shared buffers, in order to achieve this?
Thanks a lot in advance
EDIT:
We are using SUSE Linux Enterprise 12 and PostgreSQL 9.4.
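For reference, a common way to get a cold-cache starting point on Linux is to restart PostgreSQL (restarting is the only way to empty shared_buffers) and then drop the kernel page cache, which PostgreSQL also relies on heavily. A minimal sketch, assuming root access and a systemd service named postgresql (adjust the service name for your SLES 12 / 9.4 install):

#!/bin/sh
# Reset both PostgreSQL's buffer cache and the OS page cache between test runs.
systemctl stop postgresql           # restarting clears shared_buffers
sync                                # flush dirty pages to disk first
echo 3 > /proc/sys/vm/drop_caches   # drop page cache, dentries and inodes (needs root)
systemctl start postgresql

Beyond that, keep in mind any caching your own import program does, plus hardware-level caches (RAID controller, disks), which this script does not touch.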
We have 4 x NetApp filers, each with around 50 volumes. We've been experiencing performance issues and tracked them down to how fragmented the data is. We've run some measurements (all coming back at 7 or higher) and have been gradually running the WAFL reallocates manually (starting with our VM stores), which is improving the fragmentation level to around 3 or 4.
As ever, time is short, and I was wondering if anyone had a script which could handle this process? Preferably PowerShell or VBScript.
(We have the Data ONTAP cmdlets installed and enabled.)
I know you can schedule scans, but you can't seem to tell the filer to only run one at a time.
I'd ideally like a script which would:
+ Pull a CSV of volumes
+ Measure each volume sequentially, only starting the next measurement when the previous one has completed, recording the score
+ Then reallocate each volume sequentially, only starting the next reallocation when the previous one has completed, recording the new score
For your reference:
https://library.netapp.com/ecmdocs/ECMP1196890/html/man1/na_reallocate.1.html
Any help / guidance in this matter would be very much appreciated!
Are you using 7-mode or cDOT?
Anyway, I only know PowerShell. The script shouldn't be long, and it would go something like this:
Connect to the NetApp (using Connect-NaController / Connect-NcController)
Get all the volumes (using Get-NaVol / Get-NcVol)
Get the measurement for each volume (either using a foreach loop, or perhaps the command can be run once and return the information for all the volumes)
Export the output to CSV (using Export-Csv)
A foreach loop iterating over all the volumes:
- if the volume is fragmented beyond a given threshold
- run the reallocation (I do not know which command needs to be used)
If you want this thing to run forever, just put it all inside a while loop; if you are going to schedule it, you should rerun the checks to get a new CSV with the new measurements.
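A rough sketch of those steps, heavily hedged: Connect-NaController, Get-NaVol, Invoke-NaSsh and Export-Csv are 7-Mode Data ONTAP PowerShell Toolkit cmdlets, and the reallocate CLI syntax comes from the na_reallocate man page linked above, but the "active scan" status text and where the optimization score ends up are assumptions you would need to verify against your filer's actual output:

Import-Module DataONTAP
$cred = Get-Credential
Connect-NaController filer01 -Credential $cred | Out-Null    # hypothetical filer name

function Wait-ReallocIdle([string]$path) {
    # Poll until the scan on $path no longer reports an active state.
    # The 'Reallocating|Checking|Measuring' match is an assumption - check the real wording.
    do {
        Start-Sleep -Seconds 60
        $status = (Invoke-NaSsh "reallocate status -v $path" | Out-String)
    } while ($status -match 'Reallocating|Checking|Measuring')
    return $status
}

$report = foreach ($vol in Get-NaVol) {
    $path = "/vol/$($vol.Name)"

    Invoke-NaSsh "reallocate measure -o $path" | Out-Null    # '-o' = measure once
    $before = Wait-ReallocIdle $path

    Invoke-NaSsh "reallocate start -f $path" | Out-Null      # one-time full reallocation
    $null = Wait-ReallocIdle $path

    Invoke-NaSsh "reallocate measure -o $path" | Out-Null
    $after = Wait-ReallocIdle $path

    # The numeric optimization score typically shows up in the filer's logs (or the '-l' logfile
    # for measure), so Before/After here are just raw status text; adapt the parsing once you
    # see the real format.
    [pscustomobject]@{ Volume = $vol.Name; Before = $before.Trim(); After = $after.Trim() }
}

$report | Export-Csv .\reallocate-report.csv -NoTypeInformation

The threshold check from the list above (only reallocate when the measured score is over, say, 7) is left out on purpose, because extracting the numeric score depends on your ONTAP version; once you know what the output looks like, it is a one-line match away.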
Disclaimer:
I am not familiar with the reallocation process or with its PowerShell command behaviour. The post should give you pretty much the things to be done, but I was only using common sense.
Perhaps the reallocation command only starts the reallocation process and lets it run in the background, resulting in all of the reallocations running simultaneously. If so, a while loop is needed inside the if statement, using another command to report the status until it is completed.
You should try running this on a single volume, and then on a list of a few volumes, to make sure it runs the way you want it to.
I'm trying to do some quick and dirty querying of my MongoDB database using IPython Notebook.
I have several cells, each with its own query. Since MongoDB can support several connections, I would like to run the queries in parallel. I thought an ideal way would be to just do something like:
%%script --bg python
query = pymongo.find(blahbalhba)
You can imagine several cells, each with its own query. However, I'm not able to access the results returned by pymongo.find.
I understand that this is a subprocess run in a separate thread, but I have no idea how to access the data, since the process is quickly destroyed and the namespace goes away.
I found a similar post for %%bash here, but I'm having trouble translating it to a Python namespace.
%%script is just a convenience magic; it will not replace writing a full-blown magic.
The only thing I can see is to write your own magic. Basically, if you can do it with a function that takes a string as a parameter, you know how to write a magic.
So how would you (like to) write it in pure Python? (Futures, multiprocessing, a queuing library?) ... then move it to a magic.
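For what it's worth, here is how I would first sketch it in pure Python with concurrent.futures and threads (the database/collection names and filter documents below are just placeholders); once something like this works in a single cell, turning it into a custom cell magic is the mechanical part:

from concurrent.futures import ThreadPoolExecutor
import pymongo

client = pymongo.MongoClient()             # adjust host/port as needed
coll = client["mydb"]["mycollection"]      # placeholder database / collection names

queries = {                                # one entry per "cell" you wanted to run in the background
    "recent": {"ts": {"$gt": 0}},          # placeholder filter documents
    "flagged": {"flag": True},
}

def run_query(filt):
    # find() returns a lazy cursor, so materialise it inside the worker thread
    return list(coll.find(filt))

with ThreadPoolExecutor(max_workers=len(queries)) as pool:
    futures = {name: pool.submit(run_query, f) for name, f in queries.items()}

results = {name: fut.result() for name, fut in futures.items()}
# unlike %%script --bg, `results` now lives in the notebook's own namespace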
We have a policy requirement to use tools that rely on wtmp, such as the 'last' command or the GDM last-login details. We've discovered that these will have gaps depending on when wtmp was last rotated, and we need to try to work around this.
Because these gaps have been determined to be unacceptable, and keeping wtmp data in a single active logfile forever without splitting off the old data into archives is not really viable, I'm looking for a way to roll over / age out old wtmp entries while still keeping the more recent ones.
From some initial research I've seen this problem addressed in the Unix (AIX, SunOS) world with the use of 'fwtmp' and some pre/post logrotate scripts. Has this been addressed in the Linux world and I've just missed it?
As far as I can tell, 'fwtmp' is a Unix utility that hasn't made it into RHEL 5 & 6, per searching the RHEL customer portal and some 'yum whatprovides' searches on my test boxes.
Many thanks in advance!
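Not a full answer, but the closest Linux equivalent to fwtmp that I know of is utmpdump from util-linux, which converts wtmp to text and back again with -r. Whether it is available on RHEL 5/6 and what its text format looks like there is something to verify, so treat the filter step below as a placeholder:

# dump binary wtmp to text, keep only the records you want, rebuild a binary file
utmpdump /var/log/wtmp > /tmp/wtmp.txt               # binary -> text, fwtmp-style
tail -n 5000 /tmp/wtmp.txt > /tmp/wtmp.recent        # crude filter: keep the newest 5000 records
utmpdump -r < /tmp/wtmp.recent > /var/log/wtmp.new   # text -> binary

Wired into a logrotate prerotate/postrotate script (with the trimmed file moved into place under the original ownership and permissions), that would give roughly the same effect as the fwtmp recipes on AIX/SunOS.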
I am wondering how to get a process run at the command line to use less processing power. The problem I'm having is that the process is basically taking over the CPU and taking MySQL and the rest of the server with it. Everything is becoming very slow.
I have used nice before but haven't had much luck with it. If it is the answer, how would you use it?
I have also thought of putting in sleep commands, but it'll still be using up memory so it's not the best option.
Is there another solution?
It doesn't matter to me how long it runs for, within reason.
If it makes a difference, the script is a PHP script, but I'm running it at the command line as it already takes 30+ minutes to run.
Edit: the process is a migration script, so I really don't want to spend too much time optimizing it, as it only needs to be run for testing purposes and once to go live. Just for testing, though, it keeps bringing the server to pretty much a halt... and it's a shared server.
The best you can really do without modifying the program is to change the nice value to the maximum using nice or renice. Your best bet is probably to profile the program to find out where it is spending most of its time and using most of its memory, and then try to find a more efficient algorithm for what you are trying to do. For example, if you are operating on a large result set from MySQL, you may want to process records one at a time instead of loading the entire result set into memory, or perhaps you can optimize your queries or the processing performed on the results.
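For the renice part, it works on the PID of the already-running process (12345 below is just an example; get the real one from ps or top):

renice 19 -p 12345    # drop the running script to the lowest CPU priority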
You should use nice with a "niceness" of 19; this makes the process very unlikely to run if there are other processes waiting for the CPU.
nice -n 19 <command>
Be sure that the program does not have busy waits, and also check the I/O wait time.
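To check the I/O wait, the stock tools are enough, for example:

vmstat 5        # watch the 'wa' column for time spent waiting on I/O
iostat -x 5     # per-device utilisation and %iowait (iostat is in the sysstat package)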
Which process is actually taking up the CPU? PHP or MySQL? If it's MySQL, 'nice' won't help at all (since the server is not 'nice'd up).
If it's MySQL, you generally have to look at your queries and MySQL tuning to see why those queries are slamming the server.
Slamming your MySQL server process can show up as "the whole system being slow" if your primary view of the system is through MySQL.
You should also consider whether the command-line process is I/O intensive. That can be adjusted on some Linux distros using the 'ionice' command, though its usage is not quite as simple as the CPU 'nice' command.
Basic usage:
ionice -n7 cmd
will run 'cmd' using the 'best effort' scheduler at the lowest priority. See the man page for more usage details.
A process using CPU cycles alone shouldn't take over the rest of the system. You can show this by doing:
while true; do :; done
This is an infinite loop and will use as many CPU cycles as it can get (stop it with ^C). You can use top to verify that it is doing its job. I am quite sure this won't significantly affect the overall performance of your system to the point where MySQL dies.
However, if your PHP script is allocating a lot of memory, that certainly can make a difference. Linux has a tendency to go around killing processes when the system starts to run out of memory.
I would narrow down the problem and be sure of the cause, before looking for a solution.
You could mount your server's interesting directory/filesystem/whatever on another machine via NFS and run the script there (I know, this means avoiding the problem and is not really practical :| ).