How to prune iostat command output to get a single numeric value

I am trying to use iostat command output to plot a graph. For this purpose I need a single numeric value.
When I run the command "iostat -d -z device sdb --human 2", I get tabular output:
Device     tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sdb       0.20         0.1k        36.9k      36.6M      24.9G
How can I prune the output to retrieve only the kB_wrtn/s value?
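One possible approach (a sketch, not from the original thread): drop --human so the value stays purely numeric, and pipe the output through awk, printing the fourth field of the sdb line, which is kB_wrtn/s in the default iostat -d column order shown above:
# Print only the kB_wrtn/s value for sdb, refreshed every 2 seconds.
# $4 is kB_wrtn/s in the default "iostat -d" column layout; note the
# first report covers stats since boot, subsequent ones the 2s interval.
iostat -d -z sdb 2 | awk '/^sdb/ { print $4 }'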

Related

How to capture "ANALYZE VERBOSE tablename" output into a logfile

I would like to capture the output of "ANALYZE VERBOSE TableName" into an output file when it is executed from a shell script.
I want to capture this:
INFO: analyzing "tablename"
INFO: "tablename": scanned 1 of 1 pages, containing 7 live rows and 2 dead rows; 7 rows in sample, 7 estimated total rows
I am using this command:
psql -h $DB_HOST_NAME -U $DB_USER $DB_NAME -f query.txt --echo-errors --echo-queries >> output.log
But it is only capturing the text "ANALYZE", not the entire output.
Please suggest how to print the entire output into the file.
The output you are trying to capture is directed through stderr, so you need to capture/redirect that output at the shell level. This can be temperamental depending on your OS and shell versions. On OS X with Bash 3.2.57, you would need to use:
psql -h $DB_HOST_NAME -U $DB_USER $DB_NAME -f query.txt --echo-queries >> output.log 2>&1 ;
If that doesn't work, try looking up the specifics for whatever combination of OS/shell you are using.
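For context, the INFO lines come from the server as notices, which psql writes to stderr; the 2>&1 above merges that stream into the log. A small sketch of a variant (file names are just examples) that keeps the notices in their own file instead:
# Append queries/results to output.log, but collect the INFO notices separately:
psql -h "$DB_HOST_NAME" -U "$DB_USER" "$DB_NAME" -f query.txt --echo-queries >> output.log 2>> notices.log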

Mahout clustering: How to retrieve the name of a named vector

I want to cluster multiple documents using Mahout. The clustering works fine, but I have no idea how to find out which documents are located in each cluster.
I read that you can use the option --namedVector when creating the sparse files, but where does it take the ID from, and how can I retrieve this ID after the clustering is completed?
Right now I am doing the following steps:
I have a directory with a file for each document. The files are in the following format with the ID of the document as filename:
filename: documentID.txt
[TITLE]
[CONTENT]
I create a sparse directory with namedVectors using:
./mahout seqdirectory -i tmp/es-out -o tmp/es-out-seqdir -c UTF-8 -chunk 64 -xm sequential
./mahout seq2sparse -i tmp/es-out-seqdir -o tmp/es-out-sparse --maxDFPercent 85 --namedVector
Then I can cluster the results and create a dump:
./mahout kmeans -i tmp/es-out-sparse/tfidf-vectors -c tmp/es-kmeans-clusters -o tmp/es-kmeans -dm org.apache.mahout.common.distance.EuclideanDistanceMeasure -x 10 -k 20 -ow --clustering
./mahout clusterdump -i tmp/es-kmeans/clusters-10-final -o tmp/clusterdump -d tmp/es-out-sparse/dictionary.file-0 -dt sequencefile -b 100 -n 20 --evaluate -dm org.apache.mahout.common.distance.EuclideanDistanceMeasure -sp 0 --pointsDir tmp/es-kmeans/clusteredPoints
The dump looks like this:
:VL-190{n=1 c=[1:3.407, 110:6.193, 2007:3.736, about:1.762, according:2.948, account:3.507, acting:6.
Top Terms:
epa => 13.471728324890137
mountaintop => 11.364262580871582
mine => 10.942587852478027
Weight : [props - optional]: Point:
[...]
k-means in Mahout is only a toy.
You can use it for howtos and tutorials, but for real use it is too slow, too limited, too hard to use. (Also, k-means results are not half as good as people think... most of the time they are dogfood.)
Benchmark other tools, and you'll be surprised big time.
I found a way. You can use the seqdumper to extract the cluster mapping:
./mahout seqdumper -i /tmp/es-kmeans/clusteredPoints/part-m-00000 -o /tmp/cluster-points.txt
Then you can use a regex to extract the mapping of the vector IDs to cluster IDs.
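As a sketch (the exact line format seqdumper emits varies by Mahout version, so treat the pattern as hypothetical and adjust it to your dump), you can pull out "clusterID documentID" pairs with sed:
# Assuming dump lines roughly like: Key: 190: Value: ... /documentID.txt ...
# this prints "190 documentID" for each clustered point.
sed -nE 's|^Key: ([0-9]+): Value: .*/([^/ ]+)\.txt.*|\1 \2|p' /tmp/cluster-points.txt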

Running NetLogo on HPC machine: how to specify the number of cores to be used?

$ wget https://ccl.northwestern.edu/netlogo/5.1.0/netlogo-5.1.0.tar.gz
$ tar -xzf netlogo-5.1.0.tar.gz
$ ~/netlogo-5.1.0/netlogo-headless.sh \
--model ~/myproject/MyModel.nlogo \
--experiment MyExperiment \
--table ~/myproject/MyNewOutputData.csv
I am using the above commands to run NetLogo headless on an HPC machine. The problem is: how do I specify the number of cores to be used, or does it by default take the maximum available?
A look at http://ccl.northwestern.edu/netlogo/5.1.0/docs/behaviorspace.html#advanced reveals:
--threads <number>: use this many threads to do model runs in parallel, or 1 to disable parallel runs. defaults to one thread per processor.
This is equivalent to the same setting in the BehaviorSpace GUI.
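So, building on the command from the question, capping a run at, say, 8 threads would look like this (the thread count is just an example):
~/netlogo-5.1.0/netlogo-headless.sh \
--model ~/myproject/MyModel.nlogo \
--experiment MyExperiment \
--table ~/myproject/MyNewOutputData.csv \
--threads 8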

How do I get a core dump on OS X Lion?

I am working on a PostgreSQL extension in C that segfaults, so I want to look at the core dump file on my OS X Lion box. However, there are no core files in /cores or anywhere else that I can find. It appears that they are enabled in the system but are limited to a size of 0:
> sysctl kern.coredump
kern.coredump: 1
> ulimit -c
0
I tried setting ulimit -c unlimited in the shell session I'm using to start and stop PostgreSQL, and it seems to stick:
> ulimit -c
unlimited
And yet no matter what I do, no core files. I am starting PostgreSQL with pg_ctl -c, where the -c tells PostgreSQL to generate core dumps. But the system has nothing. How can I get Lion to dump core files?
The /cores/ directory is not necessarily there in Lion, and if it's not there, you won't get cores. You should be able to set the ulimit (as you have), run a program like cat(1), quit with SIGQUIT (control-backslash), and get a core dump:
lion:~ user$ ulimit -c unlimited
lion:~ user$ cat
^\
^\
Quit: 3 (core dumped)
lion:~ user$ ls -l /cores/
total 716584
-r-------- 1 user user 366891008 Jun 21 23:35 core.1263
lion:~ user$
Technical Note TN2124 (http://developer.apple.com/library/mac/#technotes/tn2124/), as suggested by Yuji in https://stackoverflow.com/a/3783403/225077, is helpful.
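If /cores is indeed missing, one possible fix (a sketch, assuming admin rights; not from the original answer) is to create it with permissions that let any process write its core file there:
# Create /cores, world-writable with the sticky bit set.
sudo mkdir /cores
sudo chmod 1777 /cores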

K-means on Mahout returning non-exclusive clusters

In my data I have users with a list of likes. I've dumped these likes into individual files for each user and would like to cluster them. Everything is working, except that the output has the same likes in multiple clusters. My understanding is that k-means should be exclusive. I figure the problem is perhaps with how I am dumping the data. I have also dumped all of the likes without spaces for the time being, until I can write a custom tokenizer. Here's what I'm running (from a Ruby script).
system("#{MAHOUT_CMD} seqdirectory -c UTF-8 -i data/users -o data/kmeans/converted")
system("#{MAHOUT_CMD} seq2sparse -i data/kmeans/converted -o data/kmeans/vectors")
system("#{MAHOUT_CMD} kmeans -i data/kmeans/vectors/tfidf-vectors -c data/kmeans/initial_clusters -o data/kmeans/kmeans_clusters -dm org.apache.mahout.common.distance.EuclideanDistanceMeasure -cd 0.1 -k 20 -x 20")
last_cluster_folder = Dir["data/kmeans/kmeans_clusters/*"].last.gsub("data/kmeans/kmeans_clusters/", "")
system("#{MAHOUT_CMD} clusterdump -s data/kmeans/kmeans_clusters/#{last_cluster_folder}/ -d data/kmeans/vectors/dictionary.file-0 -dt sequencefile -o data/kmeans/clusters.txt -n 1000")
The output lists the "top terms" in each cluster; however, many of the likes occur in every cluster (though with different weights). Is this the normal output for clusterdump? Do I need to find out which cluster each word belongs to by its weight?
Thanks
Mahout is probably only doing approximate k-means. Plus, there might be objects that have the same distance to more than one cluster.
You should, however, be able to just take the k means and then do a 1-nearest-neighbor classification to get a unique result for each object (this is trivial to parallelize and very fast).
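Note that Mahout's own tooling can do this nearest-cluster assignment for you, as in the named-vector question earlier on this page: rerun kmeans with --clustering and dump the clusteredPoints output, which records exactly one cluster per point. A sketch reusing the paths from the question:
# --clustering assigns each point to its single nearest cluster;
# the point-to-cluster mapping lands under .../clusteredPoints.
./mahout kmeans -i data/kmeans/vectors/tfidf-vectors -c data/kmeans/initial_clusters -o data/kmeans/kmeans_clusters -dm org.apache.mahout.common.distance.EuclideanDistanceMeasure -cd 0.1 -k 20 -x 20 -ow --clustering
./mahout seqdumper -i data/kmeans/kmeans_clusters/clusteredPoints/part-m-00000 -o data/kmeans/cluster-points.txt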