jgroups-kubernetes BindException

I use Infinispan in a Spring Boot project, and I know that JGroups handles node communication and discovery in Infinispan. Multi-instance mutual discovery already works on Docker; the problem is on Kubernetes.
I took the JGroups configuration file default-jgroups-kubernetes.xml from the official Infinispan package and only modified tcp.port and the KUBE_PING element:
<config xmlns="urn:org:jgroups"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups-4.0.xsd">
<TCP bind_addr="${jgroups.tcp.address:match-interface:eth.*}"
bind_port="${jgroups.tcp.port:30001}"
enable_diagnostics="false"
thread_naming_pattern="pl"
send_buf_size="640k"
sock_conn_timeout="300"
bundler_type="no-bundler"
logical_addr_cache_expiration="360000"
thread_pool.min_threads="${jgroups.thread_pool.min_threads:0}"
thread_pool.max_threads="${jgroups.thread_pool.max_threads:200}"
thread_pool.keep_alive_time="60000"
/>
<kubernetes.KUBE_PING
port_range="3000"
namespace="default"
masterProtocol="https"
masterHost="https://192.1.5.110:32305"
masterPort="32305"
/>
<MERGE3 min_interval="10000"
max_interval="30000"
/>
<FD_SOCK />
<!-- Suspect node `timeout` to `timeout + timeout_check_interval` millis after the last heartbeat -->
<FD_ALL timeout="10000"
interval="2000"
timeout_check_interval="1000"
/>
<VERIFY_SUSPECT timeout="1000"/>
<pbcast.NAKACK2 use_mcast_xmit="false"
xmit_interval="100"
xmit_table_num_rows="50"
xmit_table_msgs_per_row="1024"
xmit_table_max_compaction_time="30000"
resend_last_seqno="true"
/>
<UNICAST3 xmit_interval="100"
xmit_table_num_rows="50"
xmit_table_msgs_per_row="1024"
xmit_table_max_compaction_time="30000"
/>
<pbcast.STABLE stability_delay="500"
desired_avg_gossip="5000"
max_bytes="1M"
/>
<pbcast.GMS print_local_addr="false"
join_timeout="${jgroups.join_timeout:5000}"
/>
<MFC max_credits="2m"
min_threshold="0.40"
/>
<FRAG3/>
</config>
I have also added the jgroups-kubernetes dependency.
With the above configuration, I get the following exception:
...
Caused by: java.net.BindException: no port available in range [30001 .. 30051] (bind_addr=xxxxx%eth0)
at org.jgroups.util.Util.createServerSocket(Util.java:3512)
...
where xxxxx is an IPv6 address.
I don't know how to solve this. I tried changing the TCP port to 7800 while leaving the range unchanged; I still get the same exception, except that the message now shows [7800 .. 7850].
Looking forward to your help. Thanks!

Try to use either IPv4 or IPv6, but not a mix of both. IPv4 can be forced by setting the system property java.net.preferIPv4Stack to true (for example with -Djava.net.preferIPv4Stack=true on the JVM command line).
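To see which addresses a pattern like match-interface:eth.* has to choose from, a small diagnostic can list the IPv4 and IPv6 addresses of the matching interfaces. This is a minimal sketch using only java.net.NetworkInterface; the class and method names are hypothetical, and it only approximates JGroups' own interface matching.

```java
import java.io.UncheckedIOException;
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.regex.Pattern;

public class InterfaceAddresses {

    // Returns the addresses of all interfaces whose name fully matches nameRegex.
    // With ipv4Only=true, IPv6 addresses are filtered out.
    static List<InetAddress> addressesOf(String nameRegex, boolean ipv4Only) {
        Pattern pattern = Pattern.compile(nameRegex);
        List<InetAddress> result = new ArrayList<>();
        try {
            Enumeration<NetworkInterface> nifs = NetworkInterface.getNetworkInterfaces();
            if (nifs == null) {
                return result; // some JDKs return null when no interfaces are found
            }
            for (NetworkInterface nif : Collections.list(nifs)) {
                if (!pattern.matcher(nif.getName()).matches()) {
                    continue;
                }
                for (InetAddress addr : Collections.list(nif.getInetAddresses())) {
                    if (!ipv4Only || addr instanceof Inet4Address) {
                        result.add(addr);
                    }
                }
            }
        } catch (SocketException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("java.net.preferIPv4Stack=" + System.getProperty("java.net.preferIPv4Stack"));
        for (InetAddress addr : addressesOf("eth.*", false)) {
            String family = addr instanceof Inet4Address ? "IPv4" : "IPv6";
            System.out.println(family + " " + addr.getHostAddress());
        }
    }
}
```

Running it inside the pod shows whether eth0 carries both IPv4 and IPv6 addresses, which is the mixed setup the answer warns about.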
A port_range of 3000 doesn't make sense; use a smaller number such as 10, depending on how many containers you're running in the same pod (1 might work, too).
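The cost of a large port_range shows up with a little arithmetic. The sketch below assumes that discovery probes every pod IP on each port from bind_port up to bind_port + port_range inclusive (the same inclusive convention as the TCP range [30001 .. 30051] in the error message); the class name and pod IPs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class DiscoveryProbes {

    // Assumed model: each pod IP is probed on bindPort .. bindPort + portRange (inclusive).
    static List<String> probeTargets(List<String> podIps, int bindPort, int portRange) {
        List<String> targets = new ArrayList<>();
        for (String ip : podIps) {
            for (int port = bindPort; port <= bindPort + portRange; port++) {
                targets.add(ip + ":" + port);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        List<String> pods = List.of("10.1.0.11", "10.1.0.12", "10.1.0.13");
        System.out.println("port_range=3000: " + probeTargets(pods, 30001, 3000).size() + " probes");
        System.out.println("port_range=1:    " + probeTargets(pods, 30001, 1).size() + " probes");
    }
}
```

With three pods, this model gives 9003 probe targets per discovery round for port_range=3000, versus 6 for port_range=1.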
In masterHost you're also setting the protocol and the port; that's not needed and may actually fail. In fact, none of the three master* properties needs to be set, as Kubernetes provides these values itself.
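Inside a pod, Kubernetes exposes the API server location through the KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT environment variables. The sketch below only shows how host, port and scheme fit together from those variables (the masterHost value in the question packs all three into one attribute). The class name is hypothetical, and the assumption that KUBE_PING reads these same variables by default should be checked against the jgroups-kubernetes documentation for your version.

```java
import java.util.Map;

public class KubeMasterUrl {

    // Builds the API server URL from the variables Kubernetes injects into every pod.
    static String masterUrl(Map<String, String> env) {
        String host = env.get("KUBERNETES_SERVICE_HOST");
        String port = env.get("KUBERNETES_SERVICE_PORT");
        if (host == null || port == null) {
            throw new IllegalStateException("KUBERNETES_SERVICE_HOST/PORT not set (not inside a pod?)");
        }
        // IPv6 literals must be wrapped in brackets inside a URL.
        String urlHost = host.contains(":") ? "[" + host + "]" : host;
        return "https://" + urlHost + ":" + port;
    }

    public static void main(String[] args) {
        try {
            System.out.println(masterUrl(System.getenv()));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```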

Related

Getting 502 http status code on a Service Fabric stateless service deployed on lesser node than configured VM Scaleset nodes

We have deployed various stateless services on a 5-node cluster with an instance count of -1 and the Singleton partition scheme. Recently, we decided to deploy a few of the stateless services on only 3 of the 5 nodes by setting the instance count to 3.
After deployment, the stateless services with an instance count of -1 work and respond with HTTP status 200 OK. However, the stateless service deployed with an instance count of 3 intermittently responds with HTTP status 502 and the following error (from Fiddler):
The connection to 'someservername.centralus.cloudapp.azure.com' failed.
System.Security.SecurityException Failed to negotiate HTTPS connection with server.fiddler.network.https> HTTPS handshake to someservername.centralus.cloudapp.azure.com failed. System.IO.IOException Authentication failed because the remote party has closed the transport stream.
Below is the application manifest of the deployed application for reference:
<ApplicationManifest xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ApplicationTypeName="MyService.ServiceFabricType" ApplicationTypeVersion="1.0.0.1.1" ManifestId="8747c387-a7fc-4b05-b189-b1c01958f066" xmlns="http://schemas.microsoft.com/2011/01/fabric">
<Parameters>
<Parameter Name="My_Service_ASPNETCORE_ENVIRONMENT" DefaultValue="" />
<Parameter Name="My_Service_InstanceCount" DefaultValue="3" />
</Parameters>
<ServiceManifestImport>
<ServiceManifestRef ServiceManifestName="MyServicePkg" ServiceManifestVersion="1.0.0.1.1" />
<ConfigOverrides />
<EnvironmentOverrides CodePackageRef="code">
<EnvironmentVariable Name="ASPNETCORE_ENVIRONMENT" Value="[My_Service_ASPNETCORE_ENVIRONMENT]" />
</EnvironmentOverrides>
</ServiceManifestImport>
<DefaultServices>
<Service Name="MyService" ServicePackageActivationMode="ExclusiveProcess">
<StatelessService ServiceTypeName="MyServiceType" InstanceCount="[My_Service_InstanceCount]">
<SingletonPartition />
</StatelessService>
</Service>
</DefaultServices>
</ApplicationManifest>
and the service manifest:
<ServiceManifest xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ManifestId="59ea463b-5e4c-44f5-8982-5658b35d6c89" Name="MyServicePkg" Version="1.0.0.1.1" xmlns="http://schemas.microsoft.com/2011/01/fabric">
<ServiceTypes>
<StatelessServiceType ServiceTypeName="MyService" />
</ServiceTypes>
<CodePackage Name="Code" Version="1.0.0.1.1">
<EntryPoint>
<ExeHost>
<Program>MyService.exe</Program>
<WorkingFolder>CodePackage</WorkingFolder>
</ExeHost>
</EntryPoint>
<EnvironmentVariables>
<EnvironmentVariable Name="ASPNETCORE_ENVIRONMENT" Value="" />
</EnvironmentVariables>
</CodePackage>
<ConfigPackage Name="Config" Version="1.0.0.1.1" />
<Resources>
<Endpoints>
<Endpoint Name="ServiceEndpoint" Protocol="https" Type="Input" Port="9226" />
</Endpoints>
</Resources>
</ServiceManifest>
Is it mandatory to deploy a stateless service on all nodes in Service Fabric?
If not, how can the above scenario be configured?
Note: Service Fabric is currently configured with the Silver durability tier and with the reverse proxy disabled. I also did not find a relevant solution in this Azure documentation.

"Failed to create endpoint [XXX] on network because of a duplicate name" on local 5-node Service Fabric cluster

I have a local Service Fabric cluster of 5 nodes and I have a problem deploying my application on all nodes. When set to "1 Node", the cluster works fine. When set to "5 Nodes" it gives an error on all nodes but one. This is the error/warning message:
Error event: SourceId='System.Hosting', Property='CodePackageActivation:Code:EntryPoint:131919316927686034'.
There was an error during CodePackage activation.System.Fabric.FabricException (-2147017731)
Failed to start Container. ContainerName=sf-0-28e0002f-fd7d-412c-81b8-b78ca5339ce4_865991cc-9c36-493f-9b3d-95f6eba43851, ApplicationId=Proton.SFType_App0, ApplicationName=fabric:/Proton.SF.
DockerRequest returned StatusCode=InternalServerError with ResponseBody={"message":"failed to create endpoint sf-0-28e0002f-fd7d-412c-81b8-b78ca5339ce4_865991cc-9c36-493f-9b3d-95f6eba43851 on network nat: HNS failed with error : You were not connected because a duplicate name
The error message looks truncated. The application loads up fine, but on one node only. Am I missing something in 5-node configuration? The application we are deploying is a container which runs a .NET Core app. I've attached a screenshot of the error in Service Fabric Explorer.
ServiceManifest.xml
<?xml version="1.0" encoding="utf-8"?>
<ServiceManifest Name="Proton.TestingPkg"
Version="1.0.0"
xmlns="http://schemas.microsoft.com/2011/01/fabric"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<ServiceTypes>
<!-- This is the name of your ServiceType.
The UseImplicitHost attribute indicates this is a guest service. -->
<StatelessServiceType ServiceTypeName="Proton.TestingType" UseImplicitHost="true" />
</ServiceTypes>
<!-- Code package is your service executable. -->
<CodePackage Name="Code" Version="1.0.0">
<EntryPoint>
<!-- Follow this link for more information about deploying Windows containers to Service Fabric: https://aka.ms/sfguestcontainers -->
<ContainerHost>
<ImageName>proton.azurecr.io/protontesting:latest</ImageName>
</ContainerHost>
</EntryPoint>
<!-- Pass environment variables to your container: -->
<!--
<EnvironmentVariables>
<EnvironmentVariable Name="VariableName" Value="VariableValue"/>
</EnvironmentVariables>
-->
</CodePackage>
<!-- Config package is the contents of the Config directory under PackageRoot that contains an
independently-updateable and versioned set of custom configuration settings for your service. -->
<ConfigPackage Name="Config" Version="1.0.0" />
<Resources>
<Endpoints>
<!-- This endpoint is used by the communication listener to obtain the port on which to
listen. Please note that if your service is partitioned, this port is shared with
replicas of different partitions that are placed in your code. -->
<Endpoint Name="Proton.TestingTypeEndpoint" Port="8001" />
</Endpoints>
</Resources>
</ServiceManifest>
ApplicationManifest.xml
<?xml version="1.0" encoding="utf-8"?>
<ApplicationManifest ApplicationTypeName="Proton.SFType"
ApplicationTypeVersion="1.0.0"
xmlns="http://schemas.microsoft.com/2011/01/fabric"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Parameters>
<Parameter Name="Proton.Testing_InstanceCount" DefaultValue="-1" />
</Parameters>
<!-- Import the ServiceManifest from the ServicePackage. The ServiceManifestName and ServiceManifestVersion
should match the Name and Version attributes of the ServiceManifest element defined in the
ServiceManifest.xml file. -->
<ServiceManifestImport>
<ServiceManifestRef ServiceManifestName="Proton.TestingPkg" ServiceManifestVersion="1.0.0" />
<ConfigOverrides />
<Policies>
<ContainerHostPolicies CodePackageRef="Code">
<!-- See https://aka.ms/I7z0p9 for how to encrypt your repository password -->
<RepositoryCredentials AccountName="ProtonCluster" Password="XXX" PasswordEncrypted="false" />
<PortBinding ContainerPort="80" EndpointRef="Proton.TestingTypeEndpoint" />
</ContainerHostPolicies>
</Policies>
</ServiceManifestImport>
<DefaultServices>
<!-- The section below creates instances of service types, when an instance of this
application type is created. You can also create one or more instances of service type using the
ServiceFabric PowerShell module.
The attribute ServiceTypeName below must match the name defined in the imported ServiceManifest.xml file. -->
<Service Name="Proton.Testing" ServicePackageActivationMode="ExclusiveProcess">
<StatelessService ServiceTypeName="Proton.TestingType" InstanceCount="[Proton.Testing_InstanceCount]">
<SingletonPartition />
</StatelessService>
</Service>
</DefaultServices>
</ApplicationManifest>

Hibernate Search + Infinispan + JGroups backend: slave lock issue

I am new to Hibernate Search. We decided to use Hibernate Search for our application and chose JGroups as the backend. Here is my configuration file:
<?xml version="1.0" encoding="UTF-8"?>
<infinispan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:infinispan:config:7.0
http://www.infinispan.org/schemas/infinispan-config-7.0.xsd
urn:infinispan:config:store:jdbc:7.0 http://www.infinispan.org/schemas/infinispan-cachestore-jdbc-config-7.0.xsd"
xmlns="urn:infinispan:config:7.0"
xmlns:jdbc="urn:infinispan:config:store:jdbc:7.0">
<!-- *************************** -->
<!-- System-wide global settings -->
<!-- *************************** -->
<jgroups>
<!-- Note that the JGroups transport uses sensible defaults if no configuration
property is defined. See the JGroupsTransport javadocs for more flags.
jgroups-udp.xml is the default stack bundled in the Infinispan core jar: integration
and tuning are tested by Infinispan. -->
<stack-file name="default-jgroups-tcp" path="proform-jgroups.xml" />
</jgroups>
<cache-container name="HibernateSearch" default-cache="default" statistics="false" shutdown-hook="DONT_REGISTER">
<transport stack="default-jgroups-tcp" cluster="venkatcluster"/>
<!-- Duplicate domains are allowed so that multiple deployments with default configuration
of Hibernate Search applications work - if possible it would be better to use JNDI to share
the CacheManager across applications -->
<jmx duplicate-domains="true" />
<!-- *************************************** -->
<!-- Cache to store Lucene's file metadata -->
<!-- *************************************** -->
<replicated-cache name="LuceneIndexesMetadata" mode="SYNC" remote-timeout="25000">
<transaction mode="NONE"/>
<state-transfer enabled="true" timeout="480000" await-initial-transfer="true" />
<indexing index="NONE" />
<eviction max-entries="-1" strategy="NONE"/>
<expiration max-idle="-1"/>
<persistence passivation="false">
<jdbc:string-keyed-jdbc-store preload="true" fetch-state="true" read-only="false" purge="false">
<property name="key2StringMapper">org.infinispan.lucene.LuceneKey2StringMapper</property>
<jdbc:connection-pool connection-url="jdbc:mysql://localhost:3306/entityindex" driver="com.mysql.jdbc.Driver" password="pf_user1!" username="pf_user"></jdbc:connection-pool>
<jdbc:string-keyed-table drop-on-exit="false" create-on-start="true" prefix="ISPN_STRING_TABLE">
<jdbc:id-column name="ID" type="VARCHAR(255)"/>
<jdbc:data-column name="DATA" type="BLOB"/>
<jdbc:timestamp-column name="TIMESTAMP" type="BIGINT"/>
</jdbc:string-keyed-table>
</jdbc:string-keyed-jdbc-store>
</persistence>
</replicated-cache>
<!-- **************************** -->
<!-- Cache to store Lucene data -->
<!-- **************************** -->
<distributed-cache name="LuceneIndexesData" mode="SYNC" remote-timeout="25000">
<transaction mode="NONE"/>
<state-transfer enabled="true" timeout="480000" await-initial-transfer="true" />
<indexing index="NONE" />
<eviction max-entries="-1" strategy="NONE"/>
<expiration max-idle="-1"/>
<persistence passivation="false">
<jdbc:string-keyed-jdbc-store preload="true" fetch-state="true" read-only="false" purge="false">
<property name="key2StringMapper">org.infinispan.lucene.LuceneKey2StringMapper</property>
<jdbc:connection-pool connection-url="jdbc:mysql://localhost:3306/entityindex" driver="com.mysql.jdbc.Driver" password="pf_user1!" username="pf_user"></jdbc:connection-pool>
<jdbc:string-keyed-table drop-on-exit="false" create-on-start="true" prefix="ISPN_STRING_TABLE">
<jdbc:id-column name="ID" type="VARCHAR(255)"/>
<jdbc:data-column name="DATA" type="BLOB"/>
<jdbc:timestamp-column name="TIMESTAMP" type="BIGINT"/>
</jdbc:string-keyed-table>
</jdbc:string-keyed-jdbc-store>
</persistence>
</distributed-cache>
<!-- ***************************** -->
<!-- Cache to store Lucene locks -->
<!-- ***************************** -->
<replicated-cache name="LuceneIndexesLocking" mode="SYNC" remote-timeout="25000">
<transaction mode="NONE"/>
<state-transfer enabled="true" timeout="480000" await-initial-transfer="true" />
<indexing index="NONE" />
<eviction max-entries="-1" strategy="NONE"/>
<expiration max-idle="-1"/>
<persistence passivation="false">
<jdbc:string-keyed-jdbc-store preload="true" fetch-state="true" read-only="false" purge="false">
<property name="key2StringMapper">org.infinispan.lucene.LuceneKey2StringMapper</property>
<jdbc:connection-pool connection-url="jdbc:mysql://localhost:3306/entityindex" driver="com.mysql.jdbc.Driver" password="pf_user1!" username="pf_user"></jdbc:connection-pool>
<jdbc:string-keyed-table drop-on-exit="false" create-on-start="true" prefix="ISPN_STRING_TABLE">
<jdbc:id-column name="ID" type="VARCHAR(255)"/>
<jdbc:data-column name="DATA" type="BLOB"/>
<jdbc:timestamp-column name="TIMESTAMP" type="BIGINT"/>
</jdbc:string-keyed-table>
</jdbc:string-keyed-jdbc-store>
</persistence>
</replicated-cache>
</cache-container>
</infinispan>
This is my jgroups-file:
<config xmlns="urn:org:jgroups"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:org:jgroups
http://www.jgroups.org/schema/JGroups-3.6.xsd">
<TCP bind_addr="${jgroups.tcp.address:127.0.0.1}"
bind_port="${jgroups.tcp.port:7801}"
enable_diagnostics="false"
thread_naming_pattern="pl"
send_buf_size="640k"
sock_conn_timeout="300"
thread_pool.min_threads="${jgroups.thread_pool.min_threads:2}"
thread_pool.max_threads="${jgroups.thread_pool.max_threads:30}"
thread_pool.keep_alive_time="60000"
thread_pool.queue_enabled="false"
internal_thread_pool.min_threads="${jgroups.internal_thread_pool.min_threads:5}"
internal_thread_pool.max_threads="${jgroups.internal_thread_pool.max_threads:20}"
internal_thread_pool.keep_alive_time="60000"
internal_thread_pool.queue_enabled="true"
internal_thread_pool.queue_max_size="500"
oob_thread_pool.min_threads="${jgroups.oob_thread_pool.min_threads:20}"
oob_thread_pool.max_threads="${jgroups.oob_thread_pool.max_threads:200}"
oob_thread_pool.keep_alive_time="60000"
oob_thread_pool.queue_enabled="false"
/>
<S3_PING access_key=""
secret_access_key=""
location="mybucket"
/>
<MERGE3 min_interval="10000"
max_interval="30000"
/>
<FD_SOCK />
<FD_ALL timeout="60000"
interval="15000"
timeout_check_interval="5000"
/>
<VERIFY_SUSPECT timeout="5000" />
<pbcast.NAKACK2 use_mcast_xmit="false"
xmit_interval="1000"
xmit_table_num_rows="50"
xmit_table_msgs_per_row="1024"
xmit_table_max_compaction_time="30000"
max_msg_batch_size="100"
resend_last_seqno="true"
/>
<UNICAST3 xmit_interval="500"
xmit_table_num_rows="50"
xmit_table_msgs_per_row="1024"
xmit_table_max_compaction_time="30000"
max_msg_batch_size="100"
conn_expiry_timeout="0"
/>
<pbcast.STABLE stability_delay="500"
desired_avg_gossip="5000"
max_bytes="1M"
/>
<pbcast.GMS print_local_addr="false"
join_timeout="15000"
/>
<MFC max_credits="2m"
min_threshold="0.40"
/>
<FRAG2 />
</config>
This is my flush-tcp file:
<config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="urn:org:jgroups"
xsi:schemaLocation="urn:org:jgroups
http://www.jgroups.org/schema/jgroups.xsd">
<TCP bind_port="7801"/>
<S3_PING access_key=""
secret_access_key=""
location=""
/>
<MERGE3/>
<FD_SOCK/>
<FD/>
<VERIFY_SUSPECT/>
<pbcast.NAKACK2 use_mcast_xmit="false"/>
<UNICAST3/>
<pbcast.STABLE/>
<pbcast.GMS/>
<MFC/>
<FRAG2/>
<pbcast.STATE_TRANSFER/>
<pbcast.FLUSH timeout="0"/>
</config>
These are the Hibernate settings:
propertyMap.put("hibernate.search.default.directory_provider",
"infinispan");
propertyMap.put("hibernate.search.lucene_version",
KeywordUtil.LUCENE_4_10_4);
propertyMap.put("hibernate.search.infinispan.configuration_resourcename",
"hibernate-search-infinispan-config.xml");
propertyMap.put("hibernate.search.default.worker.execution","sync");
propertyMap.put("hibernate.search.default.worker.backend","jgroups");
propertyMap.put("hibernate.search.services.jgroups.configurationFile",
"flush-tcp.xml");
propertyMap.put("hibernate.search.default.exclusive_index_use","true");
Initially we start the cluster with one node using the above configuration. Depending on the load, we add nodes to the cluster. This is our architecture.
Assume we start the cluster at 10:00 AM. The only node becomes the master node, and everything is fine.
At 10:10 AM we add one more node to the cluster with a slight configuration change. Here is the change:
propertyMap.put("hibernate.search.default.exclusive_index_use","false");
I created an index through the second node, and then the locking error appeared. Here is the error:
org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out:
org.infinispan.lucene.locking.BaseLuceneLock#46578a74
Problem: in theory, the second node should become a slave and should never acquire the lock on the index. It should tell the master node to create the index through the JGroups channel, but this is not happening. Can someone please help me with this? Our production system is affected.
Problem:- In theory second node should become slave and it should
never acquire lock on the index. It should indicate the master node to
create the index through jgroups channel.
There may be two problems here.
1. Using different values for exclusive_index_use
Maybe someone else can confirm, but unless your new node only deals with a completely separate persistence unit with completely different indexes, I doubt it's a good idea to use different values for exclusive_index_use on different nodes.
exclusive_index_use is not about not acquiring locks; it's about when they are released: with it disabled, locks are released as soon as possible (after each changeset). If your other nodes work in exclusive mode, they will never release the lock, and your new node will time out waiting for it.
Also note that disabling exclusive_index_use is a sure way to decrease write performance, because it requires constantly closing and reopening index writers. Use with caution.
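The timeout described above can be reproduced in miniature with a plain java.util.concurrent lock standing in for the Lucene index lock. This is a simplified model, not Hibernate Search or Infinispan code: thread "node A" writes one changeset and then either keeps the lock (exclusive mode) or releases it, while "node B" tries to acquire the lock with a timeout. All names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class IndexLockDemo {

    // Returns true if node B obtains the index lock within its timeout.
    static boolean nodeBAcquires(boolean nodeAExclusive) {
        ReentrantLock indexLock = new ReentrantLock();          // models the shared index lock
        CountDownLatch changesetWritten = new CountDownLatch(1);
        CountDownLatch shutdown = new CountDownLatch(1);

        Thread nodeA = new Thread(() -> {
            indexLock.lock();
            try {
                changesetWritten.countDown();                    // node A applied one changeset
                if (nodeAExclusive) {
                    shutdown.await();                            // exclusive: keep the writer (and lock) open
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                indexLock.unlock();                              // non-exclusive: released right after the changeset
            }
        });
        nodeA.setDaemon(true);
        nodeA.start();

        try {
            changesetWritten.await();
            // Node B waits a bounded time, like a lock-obtain timeout.
            boolean acquired = indexLock.tryLock(500, TimeUnit.MILLISECONDS);
            if (acquired) {
                indexLock.unlock();
            }
            shutdown.countDown();                                // let node A shut down
            nodeA.join();
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("A exclusive, B acquires:     " + nodeBAcquires(true));
        System.out.println("A non-exclusive, B acquires: " + nodeBAcquires(false));
    }
}
```

In the exclusive case node B gives up after its timeout, which is the same shape of failure as the LockObtainFailedException in the question.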
And finally, as you pointed out, only one node should write to the index at any given time (the JGroups master), so disabling exclusive_index_use should not be needed in your case. There must be another problem...
2. Master/slave election
If I remember correctly, the default master/slave election strategy elects a new master when you add a new node. Also, we fixed a few bugs related to dynamic master election in the latest Hibernate Search version (not released yet), so you may be affected by one of those.
You can try using the jgroupsMaster backend on your first node and jgroupsSlave on your second node. There won't be any automatic master election anymore, so you'll lose the ability to maintain service when the master node fails, but from what I understand your main concern is scaling up, so it might give you a temporary solution.
On the master node:
propertyMap.put("hibernate.search.default.worker.backend","jgroupsMaster");
On the slave node:
propertyMap.put("hibernate.search.default.worker.backend","jgroupsSlave");
WARNING: you will need a full restart! Keeping the current jgroups backend on your master while adding another node with the jgroupsSlave backend will result in trouble!
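One way to keep the per-node settings in one place, so that only the backend value differs between master and slave, is a small helper like the one below. It is a hypothetical sketch, not part of Hibernate Search; the property keys and values are the ones used in the question and answer. It also rejects keys containing whitespace or the invisible zero-width space (U+200B), a character that easily sneaks into keys copied from web pages and would presumably prevent Hibernate Search from recognizing the key.

```java
import java.util.HashMap;
import java.util.Map;

public class SearchBackendProps {

    enum Role { MASTER, SLAVE }

    // Hypothetical helper: settings shared by all nodes plus the role-specific backend.
    static Map<String, String> propertiesFor(Role role) {
        Map<String, String> props = new HashMap<>();
        props.put("hibernate.search.default.directory_provider", "infinispan");
        props.put("hibernate.search.infinispan.configuration_resourcename",
                "hibernate-search-infinispan-config.xml");
        props.put("hibernate.search.default.worker.execution", "sync");
        props.put("hibernate.search.services.jgroups.configurationFile", "flush-tcp.xml");
        // Keep the same value on every node.
        props.put("hibernate.search.default.exclusive_index_use", "true");
        props.put("hibernate.search.default.worker.backend",
                role == Role.MASTER ? "jgroupsMaster" : "jgroupsSlave");
        checkKeys(props);
        return props;
    }

    // Throws if any key contains whitespace or a zero-width space (U+200B).
    static void checkKeys(Map<String, String> props) {
        for (String key : props.keySet()) {
            boolean suspicious = key.chars()
                    .anyMatch(c -> c == 0x200B || Character.isWhitespace(c));
            if (suspicious) {
                throw new IllegalArgumentException("Invisible or whitespace character in key: " + key);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(propertiesFor(Role.MASTER).get("hibernate.search.default.worker.backend"));
        System.out.println(propertiesFor(Role.SLAVE).get("hibernate.search.default.worker.backend"));
    }
}
```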
You may also need some configuration changes regarding your Infinispan directory, but I'm not familiar with this directory. You can check the documentation: https://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#jgroups-backend

What do the EndPoints configure in the ServiceManifest of a Service Fabric Service?

We have a Service Fabric Service project with multiple services: Actors, Stateful services and Stateless services combined into one ServiceManifest.
Two stateful services did not work: the constructors were called and the communication listeners (through remoting) were created, but the RunAsync method was not called.
After removing the endpoint listing from the ServiceManifest.xml the services started working again. But now we are left wondering why and how this works. Could someone explain?
To illustrate, the relevant section was:
<Resources>
<Endpoints>
<Endpoint Name="WebServiceEndpoint" Type="Input" Protocol="http" Port="80" />
<Endpoint Name="StatelessServiceEndpoint1" Type="Input" Protocol="http" Port="10101" />
<Endpoint Name="ActorServiceEndpoint1" />
<Endpoint Name="ActorServiceReplicatorEndpoint1" />
<Endpoint Name="ActorServiceEndpoint2" />
<Endpoint Name="ActorServiceReplicatorEndpoint2" />
<Endpoint Name="ActorServiceEndpoint3" />
<Endpoint Name="ActorServiceReplicatorEndpoint3" />
<Endpoint Name="ActorServiceEndpoint4" />
<Endpoint Name="ActorServiceReplicatorEndpoint4" />
<Endpoint Name="StatefulServiceEndpoint1" Type="Input" Protocol="http" />
<Endpoint Name="StatefulServiceReplicatorEndpoint1" />
<Endpoint Name="StatefulServiceEndpoint2" Type="Input" Protocol="http" />
<Endpoint Name="StatefulServiceReplicatorEndpoint2" />
<Endpoint Name="StatelessServiceEndPoint2" Type="Input" Protocol="http" />
</Endpoints>
</Resources>
After changing it to this:
<Resources>
<Endpoints>
<Endpoint Name="WebServiceEndpoint" Type="Input" Protocol="http" Port="80" />
<Endpoint Name="StatelessServiceEndpoint1" Protocol="http" />
<Endpoint Name="ActorServiceReplicatorEndpoint1" />
<Endpoint Name="ActorServiceReplicatorEndpoint2" />
<Endpoint Name="ActorServiceReplicatorEndpoint3" />
<Endpoint Name="ActorServiceReplicatorEndpoint4" />
<Endpoint Name="StatefulServiceReplicatorEndpoint1" />
<Endpoint Name="StatefulServiceReplicatorEndpoint2" />
</Endpoints>
</Resources>
everything worked. But why?
EDIT
The complete ServiceManifest is this:
<?xml version="1.0" encoding="utf-8"?>
<ServiceManifest Name="Service" Version="1.0.0"
xmlns="http://schemas.microsoft.com/2011/01/fabric"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<ServiceTypes>
<StatefulServiceType ServiceTypeName="ActorService1Type" />
<StatefulServiceType ServiceTypeName="ActorService1Type" HasPersistedState="true" />
<StatefulServiceType ServiceTypeName="ActorService3Type" />
<StatefulServiceType ServiceTypeName="ActorService4Type" HasPersistedState="true" />
<StatefulServiceType ServiceTypeName="StatefulService1Type" HasPersistedState="true" />
<StatefulServiceType ServiceTypeName="StatefulService2Type" HasPersistedState="true" />
<StatelessServiceType ServiceTypeName="StatelessService1Type" />
<StatelessServiceType ServiceTypeName="StatelessService2Type" />
<StatelessServiceType ServiceTypeName="WebServiceType" />
</ServiceTypes>
<CodePackage Name="Code" Version="1.0.0">
<SetupEntryPoint>
<ExeHost>
<Program>Setup.exe</Program>
</ExeHost>
</SetupEntryPoint>
<EntryPoint>
<ExeHost>
<Program>Service.exe</Program>
</ExeHost>
</EntryPoint>
</CodePackage>
<ConfigPackage Name="Config" Version="1.0.0" />
<Resources>
<Endpoints>
<Endpoint Name="WebServiceEndpoint" Type="Input" Protocol="http" Port="80" />
<Endpoint Name="StatelessServiceEndpoint1" Protocol="http" />
<Endpoint Name="ActorServiceReplicatorEndpoint1" />
<Endpoint Name="ActorServiceReplicatorEndpoint2" />
<Endpoint Name="ActorServiceReplicatorEndpoint3" />
<Endpoint Name="ActorServiceReplicatorEndpoint4" />
<Endpoint Name="StatefulServiceReplicatorEndpoint1" />
<Endpoint Name="StatefulServiceReplicatorEndpoint2" />
</Endpoints>
</Resources>
</ServiceManifest>
It's hard to know what happened in your initially reported case, since there's no specific error message to work from, but usually this is caused by port conflicts (ending up sharing ports that you don't really want to share, or that can't be shared) or by port exhaustion.
The endpoint resource in your service manifest is mainly for times when:
you want SF to help with allocating communication resources like ports for your services
you want SF to help configure those resources:
Allocating some port and consistently assigning it to some set of workloads
Punching a hole in the local firewall
Setting up a URLACL (relevant to http on windows through http.sys only)
Setting up and configuring certs to enable secure communication (same caveat)
In general you're free to ignore the endpoint resource if you don't need or want the help, since SF really expects the service code to do its own setup. In cases where you're not really using SF's programming models, the endpoint resource is more important, since it's how you tell SF what your endpoints are.
The behavior you get really depends on the transport you're using, the OS's dynamic port range and the application port range you've defined, and what the service code actually does.
https://learn.microsoft.com/en-us/azure/service-fabric/service-fabric-cluster-fabric-settings#section-name-fabricnode
Let's say you're setting up an http communication listener in your service like so, and walk through a few examples of what happens when you do or don't define an endpoint in your manifest.
1) Let's say you put nothing in your service manifest about endpoints. This means that effectively you're specifying 0 as the port in code. In this case SF's not doing any allocation or management. The port is getting assigned by the OS from the OS dynamic port range. The port that actually gets assigned will be different for each service instance listener. This should work as a reasonable default choice in most scenarios.
2) Let's say you're specifying an endpoint in the manifest and not specifying any port at all, i.e.:
<Endpoint Name="HealthServiceEndpoint"/>
In this case, the port that is assigned will come from the SF application port range. It will be the same for any service instances hosted in the same process, but different across processes (so it matters whether you are using the Exclusive or Shared process hosting model). This also presumes that reusing the port is supported by your transport. Most transports don't support it (like HTTP via Kestrel in .NET, or TCP in most cases), but there are some notable exceptions (http.sys-based HTTP transports on Windows like WebListener/HttpSys, TCP via net.tcp in WCF, and probably a few others).
3) Let's say you're specifying an endpoint in the service fabric manifest and explicitly specifying 0 for that port i.e.:
<Endpoint Name="HealthServiceEndpoint" Port="0" Protocol="http"/>
In this case the port that gets assigned will be from the OS dynamic port range, and it will end up the same/shared for any service instances hosted in the same process that use that endpoint. The port will be different across processes. (So again it matters if you are using the Exclusive or Shared process hosting model)
4) Naturally if the endpoint is specified and a specific port is specified, that port will be used for all service instances both within and across processes. This somewhat implicitly assumes that such sharing is going to work, which again depends on your transport and platform, or that you're never planning on running more than one instance of the service on this node.
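Cases 1 and 4 can be observed with plain Java sockets, independent of Service Fabric: binding to port 0 lets the OS pick a port from its dynamic range, while a second plain TCP listener on a port that is already in use fails. This is a minimal sketch under the assumption that no port-sharing transport (such as http.sys) is involved; the class name is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class PortAssignmentDemo {

    // Case 1: port 0 means "let the OS choose"; the port comes from its dynamic range.
    static int osAssignedPort() {
        try (ServerSocket socket = new ServerSocket()) {
            socket.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Case 4: two plain TCP listeners on the same fixed port. The second bind fails,
    // because a plain ServerSocket cannot share a port that is already listening.
    static boolean secondListenerCanBind() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket first = new ServerSocket()) {
            first.bind(new InetSocketAddress(loopback, 0));
            int fixedPort = first.getLocalPort(); // stands in for a fixed Port="..." in the manifest
            try (ServerSocket second = new ServerSocket()) {
                second.bind(new InetSocketAddress(loopback, fixedPort));
                return true;
            } catch (BindException e) {
                return false;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("OS-assigned port: " + osAssignedPort());
        System.out.println("second listener on same port: " + secondListenerCanBind());
    }
}
```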
Other trivia:
the Protocol attribute (the transport) mainly determines whether SF registers your URL with http.sys on Windows or configures certificates to secure traffic (most of this can be done within your service code or a SetupEntryPoint).
as of this writing Type is ignored (this is a holdover from an older version of SF)
PathSuffix is used to create a default URI path fragment that gets appended to the IP and port assigned by the platform. This is used in cases where code that isn't using SF's listener APIs sets up a listener on a different path, like /api/value, as the code inside a container might do.
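As an illustration of how a PathSuffix combines with the platform-assigned address, a URI can be assembled from scheme, IP, port and suffix with java.net.URI. This sketches the composition only, not how Service Fabric builds the address internally; the IP, port and suffix values are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class EndpointUri {

    // Appends a PathSuffix-style path to the IP and port assigned by the platform.
    static URI endpointUri(String scheme, String ip, int port, String pathSuffix) {
        String path = pathSuffix.startsWith("/") ? pathSuffix : "/" + pathSuffix;
        try {
            return new URI(scheme, null, ip, port, path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URI", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(endpointUri("http", "10.0.0.4", 8081, "api/values"));
    }
}
```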

How do I configure the local cluster for additional node types?

I have a cluster configuration with two Node Types specified in the ServiceManifest.xml
<?xml version="1.0" encoding="utf-8"?>
<ServiceManifest Name="MKopa.M2M.ConfigurationPkg"
Version="1.0.0"
xmlns="http://schemas.microsoft.com/2011/01/fabric"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<ServiceTypes>
<!-- This is the name of your ServiceType.
This name must match the string used in RegisterServiceType call in Program.cs. -->
<StatelessServiceType ServiceTypeName="ConfigurationType">
<PlacementConstraints>(NodeType == Internal)</PlacementConstraints>
</StatelessServiceType>
</ServiceTypes>
<!-- Code package is your service executable. -->
<CodePackage Name="Code" Version="1.0.0">
<EntryPoint>
<ExeHost>
<Program>MKopa.M2M.Configuration.Service.exe</Program>
</ExeHost>
</EntryPoint>
</CodePackage>
<!-- Config package is the contents of the Config directory under PackageRoot that contains an
independently-updateable and versioned set of custom configuration settings for your service. -->
<ConfigPackage Name="Config" Version="1.0.0" />
<Resources>
<Endpoints>
<!-- This endpoint is used by the communication listener to obtain the port on which to
listen. Please note that if your service is partitioned, this port is shared with
replicas of different partitions that are placed in your code. -->
<Endpoint Name="ServiceEndpoint" />
<Endpoint Name="HttpEndpoint" Protocol="http" Port="8081"/>
</Endpoints>
</Resources>
</ServiceManifest>
My issue is that this causes the deployment to the local cluster to fail, as this NodeType doesn't exist in the local cluster.
I've seen mention of the cluster.xml file and I've found it, but making changes to it doesn't seem to have any effect. I've tried a reset, start and stop, but the reset overrides the changes.
Here's hoping that the answer is not starting the services dynamically :-)
I don't know how it works while the cluster is running, but I was able to do it by re-installing the local cluster. These were my steps:
1. Go to C:\Program Files\Microsoft SDKs\Service Fabric\ClusterSetup\
2. Uninstall the existing cluster by calling .\CleanCluster.ps1
3. Create a backup of the file C:\Program Files\Microsoft SDKs\Service Fabric\ClusterSetup\NonSecure\ClusterManifestTemplate.xml
4. Now you can adjust this file and add placement properties to every node:
<NodeType ...>
<Endpoints>...</Endpoints>
<PlacementProperties>
<Property Name="NodeType" Value="Internal" />
</PlacementProperties>
</NodeType>
5. Re-create the cluster by calling .\DevClusterSetup.ps1
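The failure in the question comes down to the constraint (NodeType == Internal) matching no node in the local cluster. A toy evaluator makes that concrete. It understands only a single (Name == Value) comparison, far less than Service Fabric's real constraint syntax, and the node property values below are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlacementCheck {

    // Toy parser for constraints of the exact form "(Name == Value)".
    private static final Pattern EQUALS =
            Pattern.compile("\\(\\s*(\\w+)\\s*==\\s*(\\w+)\\s*\\)");

    static boolean satisfies(String constraint, Map<String, String> nodeProperties) {
        Matcher m = EQUALS.matcher(constraint.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported constraint: " + constraint);
        }
        return m.group(2).equals(nodeProperties.get(m.group(1)));
    }

    static long eligibleNodes(String constraint, List<Map<String, String>> nodes) {
        return nodes.stream().filter(node -> satisfies(constraint, node)).count();
    }

    public static void main(String[] args) {
        String constraint = "(NodeType == Internal)";
        // Hypothetical local nodes before and after adding the placement property.
        List<Map<String, String>> before = List.of(
                Map.of("NodeType", "NodeType0"), Map.of("NodeType", "NodeType1"));
        List<Map<String, String>> after = List.of(
                Map.of("NodeType", "Internal"), Map.of("NodeType", "Internal"));
        System.out.println("eligible before: " + eligibleNodes(constraint, before));
        System.out.println("eligible after:  " + eligibleNodes(constraint, after));
    }
}
```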