Apache Druid: My VM hangs when I try to load quickstart data

I'm new to Apache Druid. I used an Azure VM (Standard B2s: 2 vCPUs, 4 GiB memory) to install Apache Druid and then tried to load the quick-start tutorial JSON data (wikiticker-2015-09-12-sampled.json.gz) through the console.
I followed all the instructions in the Druid tutorial on the official site. I tried multiple times, but each time the VM hangs and becomes unresponsive. Am I missing anything, or do I need to make any configuration changes before loading the data?
Thanks.

Druid comes with several startup configuration profiles for a range of machine sizes.
Single-server reference configurations:
Nano-Quickstart: 1 CPU, 4GB RAM
Micro-Quickstart: 4 CPU, 16GB RAM
Small: 8 CPU, 64GB RAM (~i3.2xlarge)
Medium: 16 CPU, 128GB RAM (~i3.4xlarge)
Large: 32 CPU, 256GB RAM (~i3.8xlarge)
X-Large: 64 CPU, 512GB RAM (~i3.16xlarge)
To start the Druid services, I was using the Micro-Quickstart configuration profile:
./bin/start-micro-quickstart
However, my machine, as described above, is closer to the Nano configuration, so I should have been using the command below to start the Druid services:
./bin/start-nano-quickstart
I was now able to successfully load and query the data file.
Please check your machine configuration before running the service start command.
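For reference, one quick way to check a Linux VM's CPU and memory before picking a profile (these are standard Linux commands, not part of the Druid tutorial, and the example values are illustrative):
# How many CPUs and how much RAM does this VM actually have?
nproc      # e.g. 2 -> closer to Nano (1 CPU, 4GB) than Micro (4 CPU, 16GB)
free -h    # e.g. ~3.8Gi total -> use ./bin/start-nano-quickstart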
Regards,
Udayan

Related

MongoDB on the cloud

I'm preparing my production environment on the Hetzner cloud, but I have some doubts (I'm more of a developer than a devops person).
I will get 3 servers for the replica set, each with 8 cores, 32 GB RAM and a 240 GB SSD. I'm a bit worried about the size of the SSD the servers come with; Hetzner offers the possibility to create volumes that can be attached to the servers. Since MongoDB uses a single folder for the database data, I was wondering how I can use the 240 GB that comes with the server in combination with external volumes. At the beginning I can use the 240 GB, but then I will have to move the data folder to a volume when it reaches capacity. I'm fine with this, but it seems that once I move to volumes, the 240 GB will not be used anymore (although I could use it to store the MongoDB journal, since they suggest keeping it on a separate partition).
So, my noob question is: how can I use both the disk that comes with the server and the external volumes?
Thank you
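One possible layout, sketched below under stated assumptions: the attached Hetzner volume appears as /dev/sdb, MongoDB was installed from the distribution packages with its data in /var/lib/mongodb, and the journal directory on the local SSD is a path chosen for illustration. None of these values come from the question.
# Format and mount the attached volume (assumes it shows up as /dev/sdb)
sudo mkfs.ext4 /dev/sdb
sudo mkdir -p /mnt/mongo-volume
sudo mount /dev/sdb /mnt/mongo-volume
# Move the data files onto the volume while MongoDB is stopped
sudo systemctl stop mongod
sudo rsync -a /var/lib/mongodb/ /mnt/mongo-volume/
# Keep the journal on the local 240 GB SSD by symlinking the journal
# directory out of the dbPath (a common way to put it on a separate disk)
sudo mkdir -p /var/lib/mongodb-journal
sudo rm -rf /mnt/mongo-volume/journal
sudo ln -s /var/lib/mongodb-journal /mnt/mongo-volume/journal
sudo chown -R mongodb:mongodb /mnt/mongo-volume /var/lib/mongodb-journal
# Point storage.dbPath in /etc/mongod.conf at /mnt/mongo-volume, then restart
sudo systemctl start mongod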

Evaluation of thousands of metrics in InfluxDB

I'm trying to evaluate thousands of metrics using checks, but my computer can't handle the computation. I tried tasks too.
PC: notebook with a Core i5 (8 threads) and 16 GB RAM.
I'm running InfluxDB in Docker (6 threads and 8 GB RAM allowed).
Do you have any idea where the problem is?
Or can InfluxDB even compute that many metrics?
Thanks!
I solved it on the InfluxDB community forum: https://community.influxdata.com/t/evaluation-of-thousands-metrics/19422
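For reference, resource limits like the ones described in the question would look roughly like this when starting the container (the image tag and flag values are assumptions, not taken from the post):
# Run InfluxDB in Docker limited to 6 CPUs and 8 GB of memory
docker run -d --name influxdb --cpus=6 --memory=8g -p 8086:8086 influxdb:2.0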

Presto on Preemptible GCE instances

I am running an instance group of 20 preemptible GCE instances to read ORC files on Google Storage. The data is partitioned by hour, with about 2GB per hour.
What type of instances should I use?
How much of the RAM should be allocated to the JVM?
I am using an autoscaling configuration of 80% CPU and a 10-minute cooldown. Is there a more suitable configuration for Presto?
Is there a solution for server shutdowns due to lack of resources?
Partial responses will be appreciated as well.
As of PrestoDB version 0.199, there is no Google Cloud Storage connector for Presto, which makes it impossible to query GCS data.
Regarding hardware requirements, I'll cite the Teradata doc here.
Memory
You should allocate a minimum of 16GB of RAM per node for Presto, but 64GB is recommended for most production workloads.
Network Bandwidth
It is recommended to have 10 Gigabit Ethernet between all the nodes in
the cluster.
Other Recommendations
Presto can be installed on any normally configured Hadoop cluster.
YARN should be configured to account for resources dedicated to
Presto. For example, if a node has 64GB of RAM, perhaps you would
normally allocate 60GB to YARN. If you install Presto on that node and
give Presto 32GB of RAM, then you should subtract 32GB from the 60GB
and let YARN only allocate 28GB per node. An optimized configuration
might choose to have separate Presto and Hadoop nodes. The optimized
configuration allows you to give more memory to Presto, and thus
perform larger join queries, for example.
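To make the arithmetic above concrete, here is a rough sketch of how a 32GB allocation for Presto might be reflected in a worker's etc/jvm.config and etc/config.properties. The heap size, memory limits, port, and discovery URI below are illustrative assumptions, not official Presto guidance:
# Illustrative only: write per-node Presto config reflecting ~32GB given to Presto
cat > etc/jvm.config <<'EOF'
-server
-Xmx30G
-XX:+UseG1GC
EOF
cat > etc/config.properties <<'EOF'
coordinator=false
http-server.http.port=8080
# keep the per-node query memory limit comfortably below the JVM heap
query.max-memory-per-node=16GB
query.max-memory=100GB
discovery.uri=http://presto-coordinator.example:8080
EOF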

Spark Configuration: SPARK_MEM vs. SPARK_WORKER_MEMORY

In spark-env.sh, it's possible to configure the following environment variables:
# - SPARK_WORKER_MEMORY, to set how much memory to use (e.g. 1000m, 2g)
export SPARK_WORKER_MEMORY=22g
[...]
# - SPARK_MEM, to change the amount of memory used per node (this should
# be in the same format as the JVM's -Xmx option, e.g. 300m or 1g)
export SPARK_MEM=3g
If I start a standalone cluster with this:
$SPARK_HOME/bin/start-all.sh
I can see on the Spark Master web UI that all the workers start with only 3GB of RAM in use:
-- Workers Memory Column --
22.0 GB (3.0 GB Used)
22.0 GB (3.0 GB Used)
22.0 GB (3.0 GB Used)
[...]
However, I specified 22g as SPARK_WORKER_MEMORY in spark-env.sh.
I'm somewhat confused by this. Probably I don't understand the difference between "node" and "worker".
Can someone explain the difference between the two memory settings and what I might have done wrong?
I'm using spark-0.7.0. See also here for more configuration info.
A standalone cluster can host multiple Spark clusters (each "cluster" is tied to a particular SparkContext); i.e. you can have one cluster running k-means, one cluster running Shark, and another one running some interactive data mining.
In this case, the 22GB is the total amount of memory you allocated to the Spark standalone cluster, and your particular instance of SparkContext is using 3GB per node. So you can create 6 more SparkContexts, bringing total usage up to 21GB.
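A minimal sketch of how the two settings interact in this setup, assuming a Spark 0.7-era standalone cluster; the job class and master URL are placeholders:
# spark-env.sh: SPARK_WORKER_MEMORY caps the total memory each worker machine
# can hand out across all applications running on the cluster
export SPARK_WORKER_MEMORY=22g
# SPARK_MEM is how much of that pool a single application (one SparkContext)
# asks for per node; setting it in the driver's environment at launch time
# is one way to override it per job (behaviour assumed for Spark 0.7)
SPARK_MEM=10g $SPARK_HOME/run your.package.YourJob spark://master:7077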

MongoDB cloud hosting, benefits

I'm still fighting with MongoDB, and I think this war will not end soon.
My database has a size of 15.95 GB:
Objects - 9,963,099
Data Size - 4.65 GB
Storage Size - 7.21 GB
Extents - 269
Indexes - 19
Index Size - 1.68 GB
Powered by:
Quad-core Xeon E3-1220, 4 × 3.10 GHz / 8 GB RAM
Paying for a dedicated server is too expensive for me.
On a VPS with 6 GB of memory, the database cannot be imported.
Should I migrate to a cloud service?
https://www.dotcloud.com/pricing.html
I tried to pick a plan, but the maximum there is 4 GB of memory for MongoDB (USD 552.96/month o_0); I can't even import my database with that, there is not enough memory.
Or is there something I don't know about cloud services (I have no experience with them)?
Are cloud services not suitable for a large MongoDB database?
2 x Xeon 3.60 GHz, 2M Cache, 800 MHz FSB / 12Gb
http://support.dell.com/support/edocs/systems/pe1850/en/UG/p1295aa.htm
Will my database work on that server?
This is all fun and good development experience, of course, but it is already beginning to pall... =]
You shouldn't have an issue with a DB of this size. We were running a MongoDB instance on Dotcloud with hundreds of GB of data. It may just be because Dotcloud only allows 10GB of HDD space by default per service.
We were able to back up and restore that instance with 4GB of RAM, although it took several hours.
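For reference, a minimal sketch of that kind of dump-and-restore, with host names, database name, and paths as placeholders rather than details from the original setup:
# Dump the database to local disk, then restore it on the target host
mongodump --host old-host --db mydb --out /backup/dump
mongorestore --host new-host --db mydb /backup/dump/mydb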
I would suggest you email them directly at support@dotcloud.com to get help increasing the HDD allocation of your instance.
You can also consider using ObjectRocket, which is MongoDB as a service. For a 20GB database the price is $149 per month - http://www.objectrocket.com/pricing