How to get data from Ganglia's database

I want to use Ganglia's data to analyze our cluster, but I don't know where Ganglia's database is.
Has anyone done this before?

Ganglia stores metric data in RRD files on the gmetad host. The default path is usually /var/lib/ganglia/rrds/<cluster-name>/<node-name>/, where each metric is stored in its own RRD file, e.g. bytes_in.rrd.
Please refer to the rrdfetch command or this question to see how to fetch data from an RRD file (this is a pure rrdtool question, outside of Ganglia itself).
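For example, a quick way to read raw data points back out of a single metric's RRD (the time range here is just an illustration, not from the question) is:
rrdtool fetch bytes_in.rrd AVERAGE --start -1h
This prints one line per step with a timestamp and the averaged value for that interval.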

To get data out of Ganglia's .rrd files and save it to an .xml file, you can use rrdtool.
rrdtool dump filename.rrd [filename.xml] [--header|-h {none,xsd,dtd}] [--no-header] [--daemon address]
example:
sudo rrdtool dump bytes_out.rrd > data.xml

Related

Best practice for importing bulk data to AWS RDS PostgreSQL database

I have a big AWS RDS database that needs to be updated with data on a periodic basis. The data is in JSON files stored in S3 buckets.
This is my current flow:
Download all the JSON files locally
Run a ruby script to parse the JSON files to generate a CSV file matching the table in the database
Connect to RDS using psql
Use \copy command to append the data to the table
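For context, that \copy step typically looks something like this (table name, file name and CSV options are placeholders, not from the question):
\copy my_table FROM 'data.csv' WITH (FORMAT csv, HEADER)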
I would like to switch this to an automated approach (maybe using an AWS Lambda). What would be the best practices?
Approach 1:
Run a script (Ruby/JS) that parses all folders from the past period (e.g., a week) and, while parsing each file, connects to the RDS database and executes an INSERT command. I feel this would be a very slow process with constant writes to the database and wouldn't be optimal.
Approach 2:
I already have a Ruby script that parses local files to generate a single CSV. I can modify it to parse the S3 folders directly and create a temporary CSV file in S3. The question is - how do I then use this temporary file to do a bulk import?
Are there any other approaches that I have missed and might be better suited for my requirement?
Thanks.
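One possible mechanism for the bulk-import step in Approach 2 (an assumption, not something the question confirms is available) is the aws_s3 extension on RDS PostgreSQL, which can load a CSV object directly from S3 into a table, assuming the RDS instance has an IAM role allowing S3 access; the table, bucket, key and region below are placeholders:
CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;
SELECT aws_s3.table_import_from_s3(
    'my_table',              -- target table (placeholder)
    '',                      -- column list ('' = all columns)
    '(FORMAT csv, HEADER)',  -- COPY options
    aws_commons.create_s3_uri('my-bucket', 'exports/data.csv', 'us-east-1')
);
This keeps the heavy lifting inside the database instead of streaming rows through a Lambda.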

How to dump a postgres DB into a .sql file

I have a "cinema" DB in postgres and I want to dump all its tables and data in a cinema.sql file.
This file will contain all the sql code for re-creating the schema, tables and filling them with the data.
I already have a bank.sql file (for the "bank" DB) which I can execute via the PSQL console in pgAdmin III and import using the command
\i *path to my bank.sql file*
Now, I want to produce a cinema.sql file like bank.sql, but I don't know how to do it.
It's not the backup/restore feature of course, because it produces a .backup file.
I've also tried
pg dump > cinema.dump
in the PSQL console, but I can't find a .sql file anywhere, so I don't think that's what I'm looking for either.
Unfortunately I couldn't find anything useful for what I need in the Postgres documentation, so I hope you can help me because I'm just a beginner.
Thanks.
As mentioned in the comments and the documentation, you should use the pg_dump command line tool.
pg_dump cinema > cinema.sql
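If you also want the dump to contain a CREATE DATABASE statement, or prefer writing to a file without shell redirection, pg_dump has flags for both; a minimal sketch using the database name from the question:
pg_dump --create --file=cinema.sql cinema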
I've managed it! I don't know if it's 100% the right way to do it, but I think it is.
I'll report it here just in case someone else needs this in the future and it can be of help.
I selected the DB I wanted to dump to a .sql file, then right click -> Backup.
Here, as the format, I chose Plain instead of Custom, and "mydbdump.sql" as the file name.
In Dump Options #1 and Dump Options #2 I checked the checkboxes to include everything I needed (e.g. "include CREATE DATABASE statement").
I compared this newly created .sql dump with the one I already had (using Notepad++) and they look the same (even though, of course, they are from different DBs).
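For the record, loading such a plain-format dump into another server is just a matter of feeding it to psql, equivalent to the \i approach from the question (the target database name here is only an example):
psql -d cinema_copy -f cinema.sql
If the dump was created with a CREATE DATABASE statement included, connect to an existing database such as postgres instead, and the script will create and switch to the new database itself.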

Analyse MongoDB's diagnostic.data files

My MongoDB crashed and I am trying to understand why. On Ubuntu MongoDB produces files in /var/lib/mongodb/diagnostic.data. Those files, e.g. metrics.2016-03-08T17-15-01Z0, are binary files.
What tool should I use to analyse MongoDB diagnostic files? What data do the diagnostic files have?
You can see the data contained in the metrics files using the bsondump tool, which is included in every MongoDB installation.
Just execute bsondump metrics.2016-03-08T17-15-01Z0 and it will print out the decoded content of the file.
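If you want to keep the decoded output for later inspection, redirecting it to a file works as well (the output file name is just an example):
bsondump metrics.2016-03-08T17-15-01Z0 > metrics.json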
I believe at the moment there is no tool from MongoDB to view this.
Please see this comment from a MongoDB engineer.
The data collected, as per the latest version, includes serverStatus, replSetGetStatus, collStats of local.oplog.rs.stats, buildInfo, getCmdLineOpts and hostInfo.
To understand the data being collected, please head over to the MongoDB source code.
MongoDB 3.2 collects server statistics every second (the default interval) into the diagnostic files inside the diagnostic.data directory. This data is collected so MongoDB engineers can analyse the server's behaviour. I think no tool or documentation has been released yet for the public to analyse the captured data.

Data Migration from Java Hibernate SQL Server to Python Mongo Stack

I have a live website with around 30K active users, and each of them has their own configuration to render their homepage. The current stack of the portal is Java Spring Hibernate with SQL Server. We have now rewritten the code in a Python MongoDB stack and want to migrate our users to the new system. The issue is that the old and new code will be deployed on separate machines, and we want to run this migration for a few users as part of beta testing. Once the beta testing is done, we will migrate all the users.
What would be the best approach to achieve this? We are thinking about dumping the data in a file format like XML/JSON on a remote server and then reading it in the new code.
Please suggest the best way to accomplish this task.
Import CSV, TSV or JSON data into MongoDB.
It will be faster and more efficient to dump the data to a file in a format like JSON, TSV or CSV, copy it to the new server, and then import it using mongoimport from the command-line shell.
Example
mongoimport -d databasename -c collectionname < users.json
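If you dump to CSV instead of JSON, mongoimport can handle that too; a sketch assuming the file has a header row (database, collection and file names are placeholders):
mongoimport -d databasename -c collectionname --type csv --headerline --file users.csv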
Refer to the link below for more information on mongoimport if you need it:
http://docs.mongodb.org/manual/reference/mongoimport/

HBase Export/Import: Unable to find output directory

I am using HBase for my application and I am trying to export the data using org.apache.hadoop.hbase.mapreduce.Export as directed here. The issue I am facing is that once the command is executed, there are no errors while creating the export, but the specified output directory does not appear where expected. The command I used was
$ bin/hbase org.apache.hadoop.hbase.mapreduce.Export table_name db_dump/
I found the solution, hence I am posting my own answer.
You must have the following two lines in hadoop-env.sh in the conf directory of Hadoop:
export HBASE_HOME=/home/sitepulsedev/hbase/hbase-0.90.4
export HADOOP_CLASSPATH=$HBASE_HOME/hbase-0.90.4.jar:$HBASE_HOME/conf:$HBASE_HOME/hbase-0.90.4-test.jar:$HBASE_HOME/lib/zookeeper-3.3.2.jar:$HBASE_HOME
Save it and restart MapReduce with ./stop-mapred.sh and ./start-mapred.sh.
Now run, in the bin directory of Hadoop:
./hadoop jar ~/hbase/hbase-0.90.4/hbase-0.90.4.jar export your_table /export/your_table
Now you can verify the dump by hitting
./hadoop fs -ls /export
Finally, you need to copy the whole thing to your local file system; to do that, run
./hadoop fs -copyToLocal /export/your_table ~/local_dump/your_table
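For the reverse direction, the same driver jar also exposes an import program; a sketch assuming the paths from this answer and that the target table already exists:
./hadoop jar ~/hbase/hbase-0.90.4/hbase-0.90.4.jar import your_table /export/your_table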
Here are the references that helped me out with export/import and with the Hadoop shell commands.
Hope this one helps you out!!
As you noticed, the HBase export tool creates the backup in HDFS; if you instead want the output to be written to your local FS, you can use a file URI. In your example it would be something similar to:
bin/hbase org.apache.hadoop.hbase.mapreduce.Export table_name file:///tmp/db_dump/
Related to your own answer, this would also avoid going through HDFS. Just be very careful if you are running this in a cluster of servers, because each server will write the result files to its own local file system.
This is true for HBase 0.94.6 at least.
Hope this helps
I think the previous answer needs some modification:
Platform: AWS EC2
OS: Amazon Linux
HBase version: 0.96.1.1
Hadoop distribution: Cloudera CDH 5.0.1
MR engine: MRv1
To export data from an HBase table to the local filesystem:
sudo -u hdfs /usr/bin/hbase org.apache.hadoop.hbase.mapreduce.Export -Dmapred.job.tracker=local "table_name" "file:///backups/"
This command will dump the data in HFile format, with the number of output files equal to the number of regions of that table in HBase.