Can't launch DataLab on Google DataProc after a while - google-cloud-dataproc

I created a cluster on DataProc with Datalab installed. I used the following commands to access to dataLab:
export ZONE=us-central1-b;export CLUSTER_NAME=test;
gcloud compute ssh ${CLUSTER_NAME}-m --zone=${ZONE} --ssh-flag='-D 10001' --ssh-flag='-N' --ssh-flag='-n'
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
"http://${CLUSTER_NAME}-m:8080" \
--proxy-server='socks5://localhost:10001' \
--host-resolver-rules='MAP * 0.0.0.0 , EXCLUDE localhost' \
--user-data-dir='/tmp'
And it works for a while. I didn't change anything at all, but after like 2-3 hours I ran the same commands above, I can't access to dataLab again, and get the following error:
ERROR: (gcloud.compute.ssh) Instance [test-m] in zone [us-central1-b] has not been allocated an external IP address yet. Try rerunning this command later.
I tried many times later on and can never success from the first error. This happens to every cluster I created (i.e. not able to access dataLab of the cluster after a while). Can anyone please help me with this? Thank you.

Assuming it's not just in a narrow window of time at instance startup where the address hasn't been allocated yet, at runtime you shouldn't have to worry about external IP getting deallocated so it's likely a false error.
Usually this occurs erroneously when an instance is in a TERMINATED state. This is in contrast to instances where you configure to not use external IP at all where, you'd otherwise get a message like Instance [foo] in zone [bar] does not have an external IP address. This is because in a TERMINATED instance, there is no active VM resource, but the config metadata must still contain a networkInterface config to preserve the full configuration metadata of the instance, and the gcloud compute logic currently assumes that if networkInterfaces.accessConfigs is defined that it is expected to "eventually" have the natIP field.
Check to make sure someone didn't click STOP on your VM while you were away. Starting the VM back up should get it working again.

Related

Kubernetes error code 403: user anonymous cannot get path

I been trying to install kubernetes for the first time, after the initial setup, I was finally able to execute the kube-up.bash script:
kubernetes/cluster/kube-up.bash
Everything goes well and I even see the list of cluster services installed:
Then the book I am checking says:
"Go to: https//your_master_ip/ui/"
But when I try I can only see the following:
I assumed I did not performed a proper setup on during the auth process, so I did:
gcloud auth list
But my active account is there with my Google email, so I am not sure what I am doing wrong.
I am able to access the Google Cloud Platfrom and see the project I created for this, I can see the traffic also.
Also, the kubectl commands are not working on my system, it throws a bash error that It was not able to locate the order.
Can someone please assist me?
Regards

Apache CloudStack: No templates showing when adding instance

I have setup the apache cloudstack on CentOS 6.8 machine following quick installation guide. The management server and KVM are setup on the same machine. The management server is running without problems. I was able to add zone, pod, cluster, primary and secondary storage from the web interface. But when I tried to add an instance it is not showing any templates in the second stage as you can see in the screenshot
However, I am able to see two templates under Templates link in web UI.
But when I select the template and navigate to Zone tab, I see Timeout waiting for response from storage host and Ready field shows no.
When I check the management server logs, it seems there is an error when cloudstack tries to mount secondary storage for use. The below segment from cloudstack-management.log file describes this error.
2017-03-09 23:26:43,207 DEBUG [c.c.a.t.Request] (AgentManager-Handler-
14:null) (logid:) Seq 2-7686800138991304712: Processing: { Ans: , MgmtId:
279278805450918, via: 2, Ver: v1, Flags: 10, [{"com.cloud.agent.api.Answer":
{"result":false,"details":"com.cloud.utils.exception.CloudRuntimeException:
GetRootDir for nfs://172.16.10.2/export/secondary failed due to
com.cloud.utils.exception.CloudRuntimeException: Unable to mount
172.16.10.2:/export/secondary at /mnt/SecStorage/6e26529d-c659-3053-8acb-
817a77b6cfc6 due to mount.nfs: Connection timed out\n\tat
org.apache.cloudstack.storage.resource.NfsSecondaryStorageResource.getRootDir(Nf
sSecondaryStorageResource.java:2080)\n\tat
org.apache.cloudstack.storage.resource.NfsSecondaryStorageResource.execute(NfsSe
condaryStorageResource.java:1829)\n\tat
org.apache.cloudstack.storage.resource.NfsSecondaryStorageResource.executeReques
t(NfsSecondaryStorageResource.java:265)\n\tat
com.cloud.agent.Agent.processRequest(Agent.java:525)\n\tat
com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:833)\n\tat
com.cloud.utils.nio.Task.call(Task.java:83)\n\tat
com.cloud.utils.nio.Task.call(Task.java:29)\n\tat
java.util.concurrent.FutureTask.run(FutureTask.java:262)\n\tat
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)\
n\tat
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)\
n\tat java.lang.Thread.run(Thread.java:745)\n","wait":0}}] }
Can anyone please guide me how to resolve this issue? I have been trying to figure it out for some hours now and don't know how to proceed further.
Edit 1: Please note that my LAN address was 10.103.72.50 which I assume is not /24 address. I tried to give CentOs a static IP by making the following settings in ifcg-eth0 file
DEVICE=eth0
HWADDR=52:54:00:B9:A6:C0
NM_CONTROLLED=no
ONBOOT=yes
BOOTPROTO=none
IPADDR=172.16.10.2
NETMASK=255.255.255.0
GATEWAY=172.16.10.1
DNS1=8.8.8.8
DNS2=8.8.4.4
But doing this would stop my internet. As a workaround, I reverted these changes and installed all the packages first. Then I changed the IP to static by the same configuration settings as above and ran the cloudstack management. Everything worked fine untill I bumped into this template thing. Please help me figure out what might have went wrong
I know I'm late, but for people trying out in the future, here it goes:
I hope you have successfully added a host as mentioned in Quick Install Guide before you changed your IP to static as it autoconfigures VLANs for different traffic and creates two bridges - generally with names 'cloud' or 'cloudbr'. Cloudstack uses the Secondary Storage System VM for doing all the storage-related operations in each Zone and Cluster. What seems to be the problem is that secondary storage system vm (SSVM) is not able to communicate with the management server at port 8250. If not, try manually mounting the NFS server's mount points in the SSVM shell. You can ssh into the SSVM using the below command:
ssh -i /var/cloudstack/management/.ssh/id_rsa -p 3922 root#<Private or Link local Ip address of SSVM>
I suggest you run the /usr/local/cloud/systemvm/ssvm-check.sh after doing ssh into the secondary storage system VM (assuming it is running) and has it's private, public and link local IP address. If that doesn't help you much, take a look at the secondary storage troubleshooting docs at Cloudstack.
I would further recommend, if anyone in future runs into similar issues, check if the SSVM is running and is in "Up" state in the System VMs section of Infrastructure tab and that you are able to open up a console session of it from the browser. If that is working go on to run the ssvm-check.sh script mentioned above which systematically checks each and every point of operation that SSVM executes. Even if console session cannot be opened up, you can still ssh using the link local IP address of SSVM which can be accessed by opening up details of SSVM and than execute the script. If it says, it cannot communicate with Management Server at port 8250, I recommend you check the iptables rules of management server and make sure all traffic is allowed at port 8250. A custom command to check the same is nc -v <mngmnt-server-ip> 8250. You can do a simple search and learn how to add port 8250 in your iptables rules if that is not opened. Next, you mentioned you used CentOS 6.8, so it probably uses older versions of nfs, so execute exportfs -a in your NFS server to make sure all the NFS shares are properly exported and there are no errors. I would recommend that you wait for the downloading status of CentOS 5.5 no GUI kvm template to be complete and its Ready status shown as 'Yes' before you start importing your own templates and ISOs to execute on VMs. Finally, if your ssvm-check.sh script shows everything is good and the download still does not start, you can run the command: service cloud restart and actually check if the service has gotten a PID using service cloud status as the older versions of system vm templates sometimes need us to manually start the cloud service using service cloud start even after the restart command. Restarting the cloud service in SSVM triggers the restart of downloading of all remaining templates and ISOs. Side note: the system VMs uses a Debian kernel if you want to do some more troubleshooting. Hope this helps.

Set ACE target launcher's internal server port to Bluemix's random port number

I am currently trying to deploy and run an Ace Target on an IBM Bluemix CloudFoundry Java/Liberty buildpack but without much success.
Symptoms:
During the deploy/re-stage procedure, the ACE Launcher's internal server is started with a preset port number (default or set manually via cfg) whilst the Bluemix container is dynamically assigned a random port number. Port binding between both entities times-out and launch procedure fails.
Option:
The Bluemix random port is accessible via a sys. env. variable $PORT.
Question:
What would be the best/simplest approach to assign the freshly generated Bluemix's random port number to ACE Launcher's internal server?
You can start the ACE launcher like this:
java -jar org.apache.ace.agent.launcher.felix.jar -v -s http://server:${PORT}
Where:
-v -- verbose, mainly so you can better diagnose what is going on
-s URL -- provides the launcher with the URL (which includes the port) of the server
It depends on how ACE takes arguments. The documentation for the Java Buildpack explains how to provide custom JVM arguments that may be able to provide ACE with what it needs (perhaps -s http://localhost:$PORT as suggested by others).

Proxy setting in gsutil tool

I use gsutil tool for download archives from Google Storage.
I use next CMD command:
python c:\gsutil\gsutil cp gs://pubsite_prod_rev_XXXXXXXXXXXXX/YYYYY/*.zip C:\Tmp\gs
Everything works fine, but if I try to run that command from corporate proxy, I receive error:
Caught socket error, retrying: [Errno 10051] A socket operation was attempted to an unreachable network
I tried several times to set the proxy settings in .boto file, but all to no avail.
Someone faced with such a problem?
Thanks!
Please see the section "I'm connecting through a proxy server, what do I need to do?" at https://developers.google.com/storage/docs/faq#troubleshooting
Basically, you need to configure the proxy settings in your .boto file, and you need to ensure that your proxy allows traffic to accounts.google.com as well as to *.storage.googleapis.com.
A change was just merged into github yesterday that fixes some of the proxy support. Please try it out, or specifically, overwrite this file with your current copy:
https://github.com/GoogleCloudPlatform/gsutil/blob/master/gslib/util.py
I believe I am having the same problem with the proxy settings being ignored under Linux (Ubuntu 12.04.4 LTS) and gsutils 4.2 (downloaded today).
I've been watching tcpdump on the host to confirm that gsutils is attempting to directly route to Google IPs instead of to my proxy server.
It seems that on the first execution of a simple command like "gsutil -d ls" it will use my proxy settings specified .boto for the first POST and then switch back to attempting to route directly to Google instead of my proxy server.
Then if I CTRL-C and re-run the exact same command, the proxy setting is no longer used at all. This difference in behaviour baffles me. If I wait long enough, I think it will work for the initial request again so this suggests some form on caching taking place. I'm not 100% of this behaviour yet because I haven't been able to predict when it occurs.
I also noticed that it always first tries to connect to 169.254.169.254 on port 80 regardless of proxy settings. A grep shows that it's hardcoded into oauth2_client.py, test_utils.py, layer1.py, and utils.py (under different subdirectories of the gsutils root).
I've tried setting the http_proxy environment variable but it appears that there is code that unsets this.

pactl called from systemd service always reports "pa_context_connect() failed connection refused"

I've setup a systemd service file to perform some pactl operations at system startup for a test process. While the commands work fine when performed from a terminal I always get "pa_context_connect() failed connection refused" when running the same script from the systemd service by starting the service. I'm also using the 'User=' directive in the service file to ensure that the auto-login user matches the user used to run the service commands.
I've read that this is somehow related to the pulseaudio session not being valid in the environmentless context of the systemd service but I haven't been able to figure that out further.
Although it might be a bit late for whatever project you might have been be working on, here's what I found out.
The regular systemctl, the PID 1, indeed cannot access the environement variables of the current user when launching a service. Since pactl relies on those variables to find what instance of pulseaudio it needs to connect to, it is unable to do so when launched though a service. I'm sure there's a fairly dirty workaround for this, but I found something better.
Most systems have a second instance of systemd running in userspace (accessible through systemctl --user while not connected as root). This instance indeed can access all the userspace environment variables and I found that pactl doesn't return any errors when being called either directly or through a script.
All you need to do is put your services in either /usr/lib/systemd/user/, /etc/systemd/user/, or ~/.config/systemd/user/, remove the User= directive from your service file and run systemctl --user daemon-reload as a regular user to make sure they've been detected.