What are the minimum requirements of a Hadoop cluster for testing? - distributed-computing

I am new to Hadoop. I am trying to build a Hadoop cluster to test the performance of Hadoop. I want to know the minimum cluster size, memory, disk space, and number of cores for each node (master and slave), and I also want to know what the size of the test file should be. I am trying to process a text file.

For Hortonworks:
Runs on 32-bit and 64-bit OS (Windows 7, Windows 8 and Mac OSX and LINUX)
Your machine should have a minimum of 10 GB of RAM to be able to run the VM, which is allocated 8 GB
Virtualization enabled in the BIOS (only if you're running it in a VM)
Browser: Chrome 25+, IE 9+, Safari 6+, Firefox 18+ recommended. (Sandbox will not run on IE 10)
Just go to their download page http://hortonworks.com/hdp/downloads/
If you look at the Cloudera requirements, you'll find the following:
RAM - 4 GB
IPv6 must be disabled.
No blocking by iptables or firewalls; port 7180 must be open because it is used to access Cloudera Manager after installation. Cloudera Manager communicates using specific ports, which must be open.
ext3: This is the most tested underlying filesystem for HDFS.
CPU - the more the better
JDK 1.7 at least
For more information, you can check the following link:
http://www.cloudera.com/documentation/enterprise/latest/topics/cm_ig_cm_requirements.html#cmig_topic_4_1_unique_1
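As a rough sketch (assuming a typical Linux host; exact paths and commands vary by distribution), you can verify a few of these prerequisites from a shell before installing Cloudera Manager:

# Java version should report 1.7 or later
java -version
# 1 means IPv6 is disabled for all interfaces
cat /proc/sys/net/ipv6/conf/all/disable_ipv6
# Confirm at least 4 GB of RAM is available
free -h
# Review iptables rules to make sure nothing will block port 7180
sudo iptables -L -n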

Related

Confluent Control Center no clusters found

I am following this tutorial:
https://docs.confluent.io/platform/current/platform-quickstart.html
At step 3 when I click on "Connect" I see no option to add connector.
How do I add a connector?
For reference, I am using an M1 MacBook Air and Docker v4.12.0.
You'll only be able to add a connector if you are running a Kafka Connect server and have properly configured Control Center to use it.
On Mac: Docker needs a minimum of 6 GB of memory. When using Docker Desktop for Mac, the default memory allocation is 2 GB; change it to 6 GB in the Docker Desktop app by navigating to Preferences > Resources > Advanced.
Assuming you already did that, you then need to look at the output of docker-compose ps and docker-compose logs connect to determine whether the Connect container is healthy and running.
Personally, I don't use Control Center since I prefer to manage connectors as config files rather than copy/paste or click through UI fields. In other words, if the Connect container is healthy, try using its HTTP endpoints directly with curl, Postman, etc.
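For example (a sketch assuming the Connect REST API is exposed on its default port 8083, as in the Confluent quickstart docker-compose file; connector.json and my-connector are hypothetical names), you can manage connectors with curl:

# List connectors registered with the Connect worker
curl http://localhost:8083/connectors
# Create a connector from a JSON config file
curl -X POST -H "Content-Type: application/json" --data @connector.json http://localhost:8083/connectors
# Check the status of a specific connector
curl http://localhost:8083/connectors/my-connector/status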
I had exactly the same issue with there being no way to add a Connector.
Updating the container version from my old version 6.2.1 to 7.3.0 solved it.

How is storage space allocated in Minikube?

I am using Minikube to bootstrap a Kubernetes cluster on my local machine (for learning purposes). I am on the Windows platform. Minikube is installed on the C drive, which is actually low on disk space due to some personal files and other software. According to the Minikube documentation, it requires 20 GB of disk space for its VM. However, when I try to bootstrap the Kubernetes cluster, booting up sometimes fails, reporting low disk space, even though disk space is available on my other drives.
By default, on which drive does Minikube allocate its space? The installed location? Is there any way to specify on which drive Minikube allocates its 20 GB of space?
As pointed out in the comments, disk allocation is done by the driver that is used to create the VM. In my case I was using Hyper-V as my VM driver, so I used the following steps. (Your steps may vary slightly according to your Windows OS version; I am using Windows 10.)
Start ---> Hyper-V manager ---> Hyper-V settings ---> Change the default folder to store virtual hard disk files
You can find a detailed illustration here.
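Alternatively (a sketch assuming a recent Minikube release; the flag was called --vm-driver in older versions), you can recreate the cluster from the command line and set both the driver and the disk size explicitly:

# Delete the existing cluster so the VM is recreated with new settings
minikube delete
# Recreate it with the Hyper-V driver and an explicit disk size
minikube start --driver=hyperv --disk-size=20g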

Apache Druid: My VM hangs when I try to load quickstart data

I'm new to Apache Druid. I used an Azure VM (Standard B2s: 2 vCPUs, 4 GiB memory) to install Apache Druid and then tried to load the quickstart tutorial JSON data (wikiticker-2015-09-12-sampled.json.gz) using the console.
I followed all the instructions in the Druid tutorial on their official site. I tried multiple times, but each time the VM hangs and becomes unresponsive. Am I missing anything, or do I need to make any configuration changes for the task to execute before loading the data?
Thanks.
Druid comes with several startup configuration profiles for a range of machine sizes.
Single server reference configurations:
Nano-Quickstart: 1 CPU, 4GB RAM
Micro-Quickstart: 4 CPU, 16GB RAM
Small: 8 CPU, 64GB RAM (~i3.2xlarge)
Medium: 16 CPU, 128GB RAM (~i3.4xlarge)
Large: 32 CPU, 256GB RAM (~i3.8xlarge)
X-Large: 64 CPU, 512GB RAM (~i3.16xlarge)
To start the Druid services I was using the micro configuration profile:
./bin/start-micro-quickstart
However, my machine, as mentioned above, is more of a Nano configuration, and hence I should be using the command below to start the Druid services:
./bin/start-nano-quickstart
I was now able to successfully load and query the data file.
Please check your machine configuration before running the service start command.
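As a quick sketch (assuming a standard Linux VM), you can check the core count and memory first and then pick the matching profile:

# Number of CPU cores
nproc
# Total and available memory
free -h
# With 1-2 cores and about 4 GB of RAM, the nano profile is the safest choice
./bin/start-nano-quickstart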
Regards,
Udayan

Server Configuration for WSO2 IoT Server with PostgreSQL

I have an AWS Ubuntu server with 4 GB of RAM and 2 GB of internal memory. I want to run the WSO2 IoT server with a PostgreSQL configuration. What kind of configuration is needed on the AWS Ubuntu server for this requirement? As per the WSO2 IoT documentation (4 GB RAM and 1 GB), I have configured it with that configuration, which is not working well right now. Could anyone please tell me what kind of server optimisation is needed for my requirement?
When dealing with WSO2 modules, I have found that they only work for me when deployed as individual servers. I was using local VirtualBox VMs so that I had Data Services on one VM, Enterprise Service Bus on another, etc. Any attempt to combine them in the installer would result in Java dependency hell.

PostgreSQL RAM settings for a 16 GB MacBook

I'm in the development stage of an application, and it happens that I have to use PostgreSQL. The README file contains the following instructions...
On a MacBook Pro with 2GB of RAM, the author's sysctl.conf contains:
kern.sysv.shmmax=1610612736
kern.sysv.shmall=393216
kern.sysv.shmmin=1
kern.sysv.shmmni=32
kern.sysv.shmseg=8
kern.maxprocperuid=512
kern.maxproc=2048
Note that (kern.sysv.shmall * 4096) should be greater than or equal to
kern.sysv.shmmax. kern.sysv.shmmax must also be a multiple of 4096.
I'm guessing that for development I wouldn't have a problem leaving the default settings on my Mac; however, there are occasions when I'm running some Python scripts to do data science and I would like to take advantage of all the resources available (RAM). What would be the correct configuration for 16 GB of RAM?
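As a sketch only (assuming you want to dedicate roughly 4 GB of the 16 GB to System V shared memory, and following the README's rule that kern.sysv.shmall * 4096 must be greater than or equal to kern.sysv.shmmax and that shmmax is a multiple of 4096), a scaled-up sysctl.conf might look like:

# 4 GB maximum shared memory segment (4 * 1024^3 = 4294967296, a multiple of 4096)
kern.sysv.shmmax=4294967296
# shmall is measured in 4096-byte pages: 4294967296 / 4096 = 1048576
kern.sysv.shmall=1048576
kern.sysv.shmmin=1
kern.sysv.shmmni=32
kern.sysv.shmseg=8
kern.maxprocperuid=512
kern.maxproc=2048

The remaining RAM stays available to macOS, to PostgreSQL's per-query memory, and to your Python scripts; shmmax only caps the size of a single shared memory segment.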