I have created an orion context broker instance (FIWARE cloud portal image) that seems like with pepProxy installed. When I run "service pepProxy start" here is the feedback from the terminal:
Starting...
pepProxy dead but pid file exists
Starting pepProxy... Success
When check the status with "service pepProxy status", it says:
pepProxy dead but pid file exists
What can be done?
It seems something is preventing the PEP Proxy from starting. Have you checked "/var/log/pepProxy"? Please, also check what port the PEP Proxy is trying to bind to (usually 1026) and whether there is any other process actually running on that port (maybe the Context Broker is already running in that standard port).
In case the problem is a port conflict, you should change the Context Broker port in /etc/sysconfig/contextBroker or the one of the PEP Proxy in /etc/sysconfig/pepProxy.
If that's not the problem we would need some more information in order to help you.
Related
I have a on prem Service Fabric 3 Node cluster running 8.2.1571.9590. This has been running for months without any problems.
The cluster node were rebooted overnight, as part of operating system patching, and the cluster will now not establish connections.
If I run connect-servicefabriccluster -verbose, I get the timeout error
System.Fabric.FabricTransientException: Could not ping any of the provided Service Fabric gateway endpoints.
Looking at the processes running I can see all the expected processes start and are stable with the exception of FabricGateway.exe which goes into a boot loop cycle.
I have confirmed that
I can do a TCP-IP Ping between the nodes in the cluster
I can do a PowerShell remote session between the nodes in the cluster
No cluster certs have expired.
Any suggestions as to how to debug this issue?
Actual Problem
On checking the Windows event logs Admin Tools > Event Viewer > Application & Service Logs > Microsoft Service Fabric > Admin I could see errors related to the startup of the FabricGateway process. The errors and warnings come in repeated sets with the following basic order
CreateSecurityDescriptor: failed to convert mydomain\admin-old to SID: NotFound
failed to set security settings to { provider=Negotiate protection=EncryptAndSign remoteSpn='' remote='mydomain\admin-old, mydomain\admin-new, mydomain\sftestuser, ServiceFabricAdministrators, ServiceFabricAllowedUsers' isClientRoleInEffect=true adminClientIdentities='mydomain\admin-old, mydomain\admin-new, ServiceFabricAdministrators' claimBasedClientAuthEnabled=false }: NotFound
Failed to initialize external channel error : NotFound
EntreeService proxy open failed : NotFound
FabricGateway failed with error code = S_OK
client-sfweb1:19000/[::1]:19000: error = 2147943625, failureCount=9082. This is conclusive that there is no listener. Connection failure is expected if listener was never started, or listener / its process was stopped before / during connecting. Filter by (type~Transport.St && ~"(?i)sfweb1:19000") on the other node to get listener lifecycle, or (type~Transport.GlobalTransportData) for all listeners
Using Windows Task Manager (or similar tool) you would see the Fabricgateway.exe process was starting and terminating every few seconds.
The net effect of this was the Service Fabric cluster communication could not be established.
Solution
The problem was the domain account mydomain\admin-old (an old historic account, not use for a long period) had been deleted in the Active Directory, so no SID for the account could be found. This failure was causing then loop, even though the admin accounts were valid.
The fix was to remove this deleted ID from the cluster nodes current active setting.xml file. The process I used was
RDP onto a cluster node VM
Stop the service fabric service
Find the current service fabric cluster configuration e.g. the newest folder on the form D:\SvcFab\VM0\Fabric\Fabric.Config.4.123456
Edit the settings.xml and remove the deleted account mydomain\admin-old from the AdminClientIdentities block, so I ended up with
<Section Name="Security">
<Parameter Name="AdminClientIdentities" Value="mydomain\admin-new" />
...
Once the file is saved, restart the service fabric service, it should start as normal. Remember,it will take a minute or two startup
Repeat the process on the other nodes in the cluster.
Once completed the cluster starts and operates as expected
I have some dotnet core applications running as microservices into GKE (google kubernetes engine).
Usually everything work right, but sometimes, if my microservice isn't in use, something happen that my application shutdown (same behavior as CTRL + C on terminal).
I know that it is a behavior of kubernetes, but if i request application that is not running, my first request return the error: "No such Device or Address" or timeout error.
I will post some logs and setups:
The key to what's happening is this logged error:
TNS: Connect timeout occured ---> OracleInternal.Network....
Since your application is not used, the Oracle database just shuts down it's idle connection. To solve this problem, you can do two things:
Handle the disconnection inside your application to just reconnect.
Define a livenessProbe to restart the pod automatically once the application is down.
Make your application do something with the connection from time to time -> this can be done with a probe too.
Configure your Oracle database not to close idle connections.
Is there a method for specific IP address while setting up Orion Context Broker using any of those methods mentioned here? Now I'm running it as a docker container simultaneously with mongodb. I tried modifying docker-compose file, however couldn't find any network settings for orion.
I recently came across many difficulties with Freeboard and OCB connection and it may be because of OCB running on default loopback interface. It was the same deal when fiware's accumulator server started on that interface and after change to other available the connection was established.
You can use the -localIp CLI option in order to specify on which IP interface the broker listens to. By default it listens to all the interfaces.
I have installed the latest mongodb mms agent (6.5.0.456) on ubuntu 16.04 and initialised the replicaset. Hence I am running a single node replicaset with the monitoring agent enabled. The agent works fine, however it does not seem to actually find the replicaset member:
[2018/05/26 18:30:30.222] [agent.info] [components/agent.go:Iterate:170] Received new configuration: Primary agent, Assigned 0 out of 0 plus 0 chunk monitor(s)
[2018/05/26 18:30:30.222] [agent.info] [components/agent.go:Iterate:182] Nothing to do. Either the server detected the possibility of another monitoring agent running, or no Hosts are configured on the Group.
[2018/05/26 18:30:30.222] [agent.info] [components/agent.go:Run:199] Done. Sleeping for 55s...
[2018/05/26 18:30:30.222] [discovery.monitor.info] [components/discovery.go:discover:746] Performing discovery with 0 hosts
[2018/05/26 18:30:30.222] [discovery.monitor.info] [components/discovery.go:discover:803] Received discovery responses from 0/0 requests after 891ns
I can see two processes for monitor agents:
/bin/sh -c /usr/bin/mongodb-mms-monitoring-agent -conf /etc/mongodb-mms/monitoring-agent.config >> /var/log/mongodb-mms/monitoring-agent.log 2>&1
/usr/bin/mongodb-mms-monitoring-agent -conf /etc/mongodb-mms/monitoring-agent.config
However if I terminate one, it also tears down the other, so I do not think that is the problem.
So, question is what is the Group that the agent is referring to. Where is that configured? Or how do I find out which Group the agent refers to and how do I check if the group is configured correctly.
The rs.config() looks fine, with one replicaset member, which has a host field, which looks just fine. I can use that value to connect to the instance using the mongo command. no auth is configured.
EDIT
It kind of looks that the cloud manager now needs to be configured with the seed host. Then it starts to discover all the other nodes in the replicaset. This seems to be different to pre-cloud-manager days, where the agent was able to track the rs - if I remember correctly... Probably there still is a way to get this done easier, so I am leaving this question open for now...
So, question is what is the Group that the agent is referring to. Where is that configured? Or how do I find out which Group the agent refers to and how do I check if the group is configured correctly.
Configuration values for the Cloud Manager agent (such as mmsGroupId and mmsApiKey) are set in the config file, which is /etc/mongodb-mms/monitoring-agent.config by default. The agent needs this information in order to communicate with the Cloud Manager servers.
For more details, see Install or Update the Monitoring Agent and Monitoring Agent Configuration in the Cloud Manager documentation.
It kind of looks that the cloud manager now needs to be configured with the seed host. Then it starts to discover all the other nodes in the replicaset.
Unless a MongoDB process is already managed by Cloud Manager automation, I believe it has always been the case that you need to add an existing MongoDB process to monitoring to start the process of initial topology discovery. Once a deployment is monitored, any changes in deployment membership should automatically be discovered by the Cloud Manager agent.
Production employments should have authentication and access control enabled, so in addition to adding a seed hostname and port via the Cloud Manager UI you usually need to provide appropriate credentials.
I connect three servers to form an HPC cluster using condor as a middleware when I run the command condor_status from the central manager it does not shows the other nodes I can run jobs in the central manager and connect to the other nodes via SSH but it seems that there is something missing in condor configuration files where I set the central manager as condor host and allows writing and reading for everyone. I keep the daemon MASTER, STARTD in the daemon list for the worker nodes.
When I run condor_status in the central manager it just show the central manager and when I run it on the compute node it give me the error "CEDAR:6001:Failed to connect to" followed by the central manager IP and port number.
I manage to solve it. The problem was in the central manager's firewall (in my case it was iptables) which was running.
So, when I stopped the firewall (su -c "service iptables stop") all nodes appeared normally, typing condor_status".
The firewall status can be checked using "service iptables status".
There are a number of things that could be going on here. I'd suggest you follow this tutorial and see if it resolves your problems -
http://spinningmatt.wordpress.com/2011/06/12/getting-started-creating-a-multiple-node-condor-pool/
In my case the service "condor.exe" was not running on the server. I had stopped manually. I just start it and every thing went fine.