Platform: 64-bit Windows OS, spymemcached-2.7.3.jar, J2EE
We want to use two memcached/Membase servers for our caching solution. We want to allocate 1 GB of memory to each memcached/Membase server, so in total we can cache 2 GB of data.
We are using the spymemcached Java client for setting and getting data from memcached. We are not using any replication between the two Membase servers.
We load the MemcachedClient object at the time of our J2EE application startup.
URI server1 = new URI("http://192.168.100.111:8091/pools");
URI server2 = new URI("http://127.0.0.1:8091/pools");
ArrayList<URI> serverList = new ArrayList<URI>();
serverList.add(server1);
serverList.add(server2);
client = new MemcachedClient(serverList, "default", "");
After that we use the MemcachedClient to get and set values in the memcached/Membase servers.
Object obj = client.get("spoon");
client.set("spoon", 50, "Hello World!");
It looks like the MemcachedClient is setting and getting values only from server1.
If we stop server1, it fails to get/set values. Should it not use server2 when server1 is down? Please let me know if we are doing anything wrong here...
The spymemcached Java client does not handle Membase failover for a particular node.
Ref: https://blog.serverdensity.com/handling-memcached-failover/
We need to handle it manually (in our own code).
We can do this by using a ConnectionObserver.
Here is my code:
public static void main(String[] a) throws InterruptedException {
    try {
        URI server1 = new URI("http://192.168.100.111:8091/pools");
        URI server2 = new URI("http://127.0.0.1:8091/pools");
        final ArrayList<URI> serverList = new ArrayList<URI>();
        serverList.add(server1);
        serverList.add(server2);
        final MemcachedClient client = new MemcachedClient(serverList, "bucketName", "");
        client.addObserver(new ConnectionObserver() {
            @Override
            public void connectionLost(SocketAddress arg0) {
                // called when a connection is lost
                for (MemcachedNode node : client.getNodeLocator().getAll()) {
                    if (!node.isActive()) {
                        client.shutdown();
                        // re-init your client here; after re-init it will connect to your secondary node
                        break;
                    }
                }
            }

            @Override
            public void connectionEstablished(SocketAddress arg0, int arg1) {
                // called when a connection is established
            }
        });
        Object obj = client.get("spoon");
        client.set("spoon", 50, "Hello World!");
    } catch (Exception e) {
        e.printStackTrace(); // don't swallow exceptions silently
    }
}
client.get() uses the first available node, and therefore your value is stored/updated on one node only.
You seem to be contradicting yourself a bit in your requirements: first you say 'we want to allocate 1 GB memory to each memcache/membase server so total we can cache 2 GB data', which implies a distributed cache model (a particular key is stored on exactly one node in the cache farm), and then you expect to fetch it when that node is down, which obviously won't happen.
If you need your cache farm to survive a node failure without losing the data cached on that node, you should use replication, which is available in Membase, but you would obviously pay the price of storing the same values multiple times, so your desired '1 GB per server ... total 2 GB of cache' won't be possible.
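If you want to see the distributed model in action, the client can tell you which node owns a given key. A minimal sketch (method names taken from the spymemcached NodeLocator/MemcachedNode interfaces; verify them against your client version):

MemcachedNode primary = client.getNodeLocator().getPrimary("spoon");
System.out.println("\"spoon\" is stored on " + primary.getSocketAddress());
// With no replication, if that node goes down the cached value is simply unavailable.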
I have the following architecture in use:
- [Client] - The end user connecting to our service.
- [GameServer] - The game server on which the game is running.
- [GameLobby] - A server that is responsible for matching Clients with a GameServer.
If we have, for example, 4 Clients that want to play a game and get matched by the GameLobby, the first time all of these connections succeed properly.
However, when they decide to rematch, one of the Clients will not connect properly.
The connection between all the Clients and the GameServer happens simultaneously.
Clients that rematch first remove their current connection with the GameServer and head into the lobby again.
This connection succeeds and no errors are thrown. Even the ChannelFuture shows that the client connection was made properly; the following values are retrieved to show that the client thinks the connection was correct:
- ChannelFuture.isSuccess() = True
- ChannelFuture.isDone() = True
- ChannelFuture.cause() = Null
- ChannelFuture.isCancelled() = False
- Channel.isOpen() = True
- Channel.isActive() = True
- Channel.isRegistered() = True
- Channel.isWritable() = True
Thus, according to the Client, the connection was made properly. However, on the GameServer's SimpleChannelInboundHandler, channelRegistered/channelActive is never called for that specific Client, only for the other 3 Clients.
All 4 Clients, the GameServer, and the Lobby are running on the same IP address.
Since it only happens when (re)connecting to the GameServer, I thought it had to do with not properly closing the connection. Currently this is done through:
try {
group.shutdownGracefully();
channel.closeFuture().sync();
} catch (InterruptedException e) {
e.printStackTrace();
}
On the GameServer, channelUnregistered is called, so this part works and the connection is destroyed.
I have tried adding listeners to the ChannelFuture of the malfunctioning channel connection; however, according to the ChannelFuture everything works, which is not the case.
I tried adding ChannelOptions to allow more Clients to be queued by the server.
GameServer
The GameServer is initialized as follows:
// Create the bootstrap to make this act like a server.
ServerBootstrap serverBootstrap = new ServerBootstrap();
serverBootstrap.group(bossGroup)
.channel(NioServerSocketChannel.class)
.childHandler(new ChannelInitialisation(new ClientInputReader(gameThread)))
.option(ChannelOption.SO_BACKLOG, 1000)
.childOption(ChannelOption.SO_KEEPALIVE, true)
.childOption(ChannelOption.TCP_NODELAY, true);
bossGroup.execute(gameThread); // Executing the thread that handles all games on this GameServer.
// Launch the server with the specific port.
serverBootstrap.bind(port).sync();
The GameServer ClientInputReader
@ChannelHandler.Sharable
public class ClientInputReader extends SimpleChannelInboundHandler<Packet> {
private ServerMainThread serverMainThread;
public ClientInputReader(ServerMainThread serverMainThread) {
this.serverMainThread = serverMainThread;
}
@Override
public void channelRegistered(ChannelHandlerContext ctx) throws Exception {
System.out.println("[Connection: " + ctx.channel().id() + "] Channel registered");
super.channelRegistered(ctx);
}
@Override
protected void channelRead0(ChannelHandlerContext ctx, Packet packet) {
// Packet handling
}
}
The malfunctioning connection does not trigger anything in the SimpleChannelInboundHandler, not even exceptionCaught.
The GameServer ChannelInitialisation
public class ChannelInitialisation extends ChannelInitializer<SocketChannel> {
private SimpleChannelInboundHandler channelInputReader;
public ChannelInitialisation(SimpleChannelInboundHandler channelInputReader) {
this.channelInputReader = channelInputReader;
}
@Override
protected void initChannel(SocketChannel ch) throws Exception {
ChannelPipeline pipeline = ch.pipeline();
// every packet is prefixed with the amount of bytes that will follow
pipeline.addLast(new LengthFieldBasedFrameDecoder(Integer.MAX_VALUE, 0, 4, 0, 4));
pipeline.addLast(new LengthFieldPrepender(4));
pipeline.addLast(new PacketEncoder(), new PacketDecoder(), channelInputReader);
}
}
Client
Client creating a GameServer connection:
// Configure the client.
group = new NioEventLoopGroup();
Bootstrap b = new Bootstrap();
b.group(group)
.channel(NioSocketChannel.class)
.option(ChannelOption.TCP_NODELAY, true)
.handler(new ChannelInitialisation(channelHandler));
// Start the client.
channel = b.connect(address, port).await().channel();
/* At this point, the client thinks that the connection was successful, as the channel is active, open, registered and writable... */
ClientInitialisation:
public class ChannelInitialisation extends ChannelInitializer<SocketChannel> {
private SimpleChannelInboundHandler<Packet> channelHandler;
ChannelInitialisation(SimpleChannelInboundHandler<Packet> channelHandler) {
this.channelHandler = channelHandler;
}
@Override
public void initChannel(SocketChannel ch) throws Exception {
// prefix messages by the length
ch.pipeline().addLast(new LengthFieldBasedFrameDecoder(Integer.MAX_VALUE, 0, 4, 0, 4));
ch.pipeline().addLast(new LengthFieldPrepender(4));
// our encoder, decoder and handler
ch.pipeline().addLast(new PacketEncoder(), new PacketDecoder(), channelHandler);
}
}
ClientHandler:
public class ClientPacketHandler extends SimpleChannelInboundHandler<Packet> {
@Override
public void channelActive(ChannelHandlerContext ctx) throws Exception {
super.channelActive(ctx);
System.out.println("Channel active: " + ctx.channel().id());
ctx.channel().writeAndFlush(new PacketSetupClientToGameServer());
System.out.println("Sending setup packet to the GameServer: " + ctx.channel().id());
// This is successfully called, as the client thinks the connection was properly made.
}
@Override
protected void channelRead0(ChannelHandlerContext ctx, Packet packet) {
// Reading packets.
}
}
I expect the Client to be able to connect properly to the server, since the other Clients connect properly and this Client could previously connect just fine.
TL;DR: When multiple Clients try to create a new match, there is a possibility that one or more Clients will not connect properly to the server after the previous connection was closed.
For anyone who struggles with this issue in one way or another:
I built a workaround that allows me to continue even though there is still a bug somewhere (inside the Netty framework, as far as I can tell). The workaround is quite simple: just create a connection pool.
My solution uses a maximum of five connections inside the connection pool. If one of the connections gets no reply from the GameServer, it is not that big of a deal, since there are four others with a high chance of succeeding. I know this is a crude workaround, but I could not find any information on this issue. It works and adds at most a 5-second delay (each retry takes a second). A sketch of the retry idea follows.
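Here is a minimal sketch of that retry idea (the class and helper names are hypothetical, not taken from my actual code): the client only treats a channel as usable once the GameServer has answered the application-level setup packet, which is signalled through a Netty Promise stored as a channel attribute and completed by the inbound handler. Note that in this sketch the setup packet is sent from the retry loop rather than from channelActive, so each attempt can be timed individually.

import io.netty.bootstrap.Bootstrap;
import io.netty.channel.Channel;
import io.netty.util.AttributeKey;
import io.netty.util.concurrent.Promise;

public final class RetryingConnector {
    // The client's inbound handler is assumed to complete this promise when the GameServer
    // replies to the setup packet, e.g. in channelRead0:
    //     ctx.channel().attr(RetryingConnector.HANDSHAKE).get().trySuccess(null);
    public static final AttributeKey<Promise<Void>> HANDSHAKE = AttributeKey.valueOf("handshake");

    public static Channel connectWithRetry(Bootstrap bootstrap, String host, int port)
            throws InterruptedException {
        for (int attempt = 0; attempt < 5; attempt++) {
            Channel channel = bootstrap.connect(host, port).sync().channel();
            Promise<Void> handshake = channel.eventLoop().newPromise();
            channel.attr(HANDSHAKE).set(handshake);
            channel.writeAndFlush(new PacketSetupClientToGameServer());
            if (handshake.await(1000) && handshake.isSuccess()) {
                return channel;                // the server really processed our setup packet
            }
            channel.close().sync();            // "connected" but the server never reacted; retry
        }
        throw new IllegalStateException("no working connection after 5 attempts");
    }
}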
Background:
Diagram: state machine UML state diagram
We have a normal state machine, as depicted in the diagram, that monitors a Spring Batch microservice (deployed using the streams source/processor/sink design), one machine per batch that is started.
We receive a sequence of REST calls that internally fire events per batch id on the respective batch's machine object, i.e. a new state machine object is created per batch id.
Each machine has n parallel regions (representing Spring Batch's chunks), as also shown in the diagram.
The REST calls are made in a multi-threaded environment, where 2 simultaneous calls for the same batchId may come in for different region ids of the BATCHPROCESSING state.
Up until now we had a single node (single installation) of this state machine microservice running, but now we want to deploy it on multiple instances to receive REST calls.
For this, we want to introduce the distributed state machine. We have the configuration below in place for running a distributed state machine.
@Configuration
@EnableStateMachine
public class StateMachineUMLWayConfiguration extends StateMachineConfigurerAdapter<String, String> {
..
..
@Override
public void configure(StateMachineModelConfigurer<String, String> model) throws Exception {
model
.withModel()
.factory(stateMachineModelFactory());
}
@Bean
public StateMachineModelFactory<String, String> stateMachineModelFactory() {
    StorehubBatchUmlStateMachineModelFactory factory = null;
    try {
        factory = new StorehubBatchUmlStateMachineModelFactory(templateUMLInClasspath, stateMachineEnsemble());
    } catch (Exception e) {
        LOGGER.info("Config's State machine factory got exception", e);
    }
    LOGGER.info("Config's State machine factory method called: " + factory);
    factory.setStateMachineComponentResolver(stateMachineComponentResolver());
    return factory;
}
@Override
public void configure(StateMachineConfigurationConfigurer<String, String> config) throws Exception {
config
.withDistributed()
.ensemble(stateMachineEnsemble());
}
@Bean
public StateMachineEnsemble<String, String> stateMachineEnsemble() throws Exception {
return new ZookeeperStateMachineEnsemble<String, String>(curatorClient(), "/batchfoo1", true, 512);
}
@Bean
public CuratorFramework curatorClient() throws Exception {
    CuratorFramework client = CuratorFrameworkFactory.builder()
            .defaultData(new byte[0])
            .retryPolicy(new ExponentialBackoffRetry(1000, 3))
            .connectString("localhost:2181")
            .build();
client.start();
return client;
}
StorehubBatchUmlStateMachineModelFactory's build method:
@Override
public StateMachineModel<String, String> build(String batchChunkId) {
Model model = null;
try {
model = UmlUtils.getModel(getResourceUri(resolveResource(batchChunkId)).getPath());
} catch (IOException e) {
throw new IllegalArgumentException("Cannot build model from resource " + resource + " or location " + location, e);
}
UmlModelParser parser = new UmlModelParser(model, this);
DataHolder dataHolder = parser.parseModel();
ConfigurationData<String, String> configurationData = new ConfigurationData<String, String>(
        null, new SyncTaskExecutor(), new ConcurrentTaskScheduler(), false, stateMachineEnsemble,
        new ArrayList<StateMachineListener<String, String>>(), false,
        null, null, null, null, false,
        null, batchChunkId, null, null);
return new DefaultStateMachineModel<String, String>(configurationData, dataHolder.getStatesData(), dataHolder.getTransitionsData());
}
We created a new custom service-level method in place of DefaultStateMachineService.acquireStateMachine(machineId):
@Override
public StateMachine<String, String> acquireDistributedStateMachine(String machineId, boolean start) {
synchronized (distributedMachines) {
DistributedStateMachine<String,String> distributedStateMachine = distributedMachines.get(machineId);
StateMachine<String,String> distMachineDelegateX = null;
if (distributedStateMachine == null) {
StateMachine<String, String> machine = stateMachineFactory.getStateMachine(machineId);
distributedStateMachine = (DistributedStateMachine<String, String>) machine;
}
distributedMachines.put(machineId, distributedStateMachine);
return handleStart(distributedStateMachine, start);
}
}
Problem:
The microservice deployed on a single instance runs successfully, even when the events it receives come from a multi-threaded environment where one thread makes the event REST call belonging to region 1 and, simultaneously, another thread makes the call for region 2 of the same batch. The machine progresses in sync, with successful processing of the parallel regions, until its last state, i.e. BATCHCOMPLETED.
We also checked on the ZooKeeper side that, in the end, the BATCHCOMPLETED state was recorded in the znode's current version.
But when, in addition to the 1st instance, we deploy the same microservice app jar in another location to act as a 2nd instance that also accepts event REST calls (say, listening on another Tomcat port, 9002), it fails somewhere in the middle, at random. This failure happens after any one of the events among the parallel regions is fired, when ensemble.setState() is called internally on the state change for that event.
It gives the following error:
o.s.s.support.AbstractStateMachine : Interceptors threw exception, skipping state change
org.springframework.statemachine.StateMachineException: Error persisting data; nested exception is org.springframework.statemachine.StateMachineException: Error persisting data; nested exception is org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion
at org.springframework.statemachine.zookeeper.ZookeeperStateMachineEnsemble.setState(ZookeeperStateMachineEnsemble.java:241) ~[spring-statemachine-zookeeper-2.0.1.RELEASE.jar!/:2.0.1.RELEASE]
at org.springframework.statemachine.ensemble.DistributedStateMachine$LocalStateMachineInterceptor.preStateChange(DistributedStateMachine.java:209) ~[spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
at org.springframework.statemachine.support.StateMachineInterceptorList.preStateChange(StateMachineInterceptorList.java:101) ~[spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
at org.springframework.statemachine.support.AbstractStateMachine.callPreStateChangeInterceptors(AbstractStateMachine.java:859) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
at org.springframework.statemachine.support.AbstractStateMachine.switchToState(AbstractStateMachine.java:880) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
at org.springframework.statemachine.support.AbstractStateMachine.access$500(AbstractStateMachine.java:81) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
at org.springframework.statemachine.support.AbstractStateMachine$3.transit(AbstractStateMachine.java:335) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
at org.springframework.statemachine.support.DefaultStateMachineExecutor.handleTriggerTrans(DefaultStateMachineExecutor.java:286) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
at org.springframework.statemachine.support.DefaultStateMachineExecutor.handleTriggerTrans(DefaultStateMachineExecutor.java:211) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
at org.springframework.statemachine.support.DefaultStateMachineExecutor.processTriggerQueue(DefaultStateMachineExecutor.java:449) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
at org.springframework.statemachine.support.DefaultStateMachineExecutor.access$200(DefaultStateMachineExecutor.java:65) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
at org.springframework.statemachine.support.DefaultStateMachineExecutor$1.run(DefaultStateMachineExecutor.java:323) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
at org.springframework.core.task.SyncTaskExecutor.execute(SyncTaskExecutor.java:50) [spring-core-4.3.13.RELEASE.jar!/:4.3.13.RELEASE]
at org.springframework.statemachine.support.DefaultStateMachineExecutor.scheduleEventQueueProcessing(DefaultStateMachineExecutor.java:352) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
at org.springframework.statemachine.support.DefaultStateMachineExecutor.execute(DefaultStateMachineExecutor.java:163) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
at org.springframework.statemachine.support.AbstractStateMachine.sendEventInternal(AbstractStateMachine.java:603) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
at org.springframework.statemachine.support.AbstractStateMachine.sendEvent(AbstractStateMachine.java:218) [spring-statemachine-core-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
at org.springframework.statemachine.ensemble.DistributedStateMachine.sendEvent(DistributedStateMachine.java:108)
..skipping Lines....
Caused by: org.springframework.statemachine.StateMachineException: Error persisting data; nested exception is org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion
at org.springframework.statemachine.zookeeper.ZookeeperStateMachinePersist.write(ZookeeperStateMachinePersist.java:113) ~[spring-statemachine-zookeeper-2.0.1.RELEASE.jar!/:2.0.1.RELEASE]
at org.springframework.statemachine.zookeeper.ZookeeperStateMachinePersist.write(ZookeeperStateMachinePersist.java:50) ~[spring-statemachine-zookeeper-2.0.1.RELEASE.jar!/:2.0.1.RELEASE]
at org.springframework.statemachine.zookeeper.ZookeeperStateMachineEnsemble.setState(ZookeeperStateMachineEnsemble.java:235) ~[spring-statemachine-zookeeper-2.0.1.RELEASE.jar!/:2.0.1.RELEASE]
... 73 common frames omitted
Caused by: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion
at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) ~[zookeeper-3.4.8.jar!/:3.4.8--1]
at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1006) ~[zookeeper-3.4.8.jar!/:3.4.8--1]
at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:910) ~[zookeeper-3.4.8.jar!/:3.4.8--1]
at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
Questions:
1. Does the configuration mentioned above need something more to be configured in order to avoid the exception shown above?
Both state-machine microservice instances were tested in the case where they both connect to the same ZooKeeper instance, i.e. the same .connectString("localhost:2181").build(), and in the case where they connect to different ZooKeeper instances (i.e. 'localhost:2181', 'localhost:2182').
The same BadVersion exception occurs during the state machine ensemble's processing in both cases.
2. If batches run in parallel, their respective machines need to be created and run in parallel on the state-machine microservice side.
So, technically, we need a new state machine per new batchId, running simultaneously.
But looking at ZookeeperStateMachineEnsemble, one znode path seems to be associated with one ensemble, since the ensemble object is instantiated once in the main config class (StateMachineUMLWayConfiguration).
So is it expected that only that singleton ensemble instance is used? Can't multiple ensembles, referencing different znode paths, be created at run time and run in parallel, each logging its distributed state machine's states to its own znode path?
a. Batches running in parallel would need separate znode paths to be created. Thus, in our attempt to keep a separate znode path per batch, we need a separate ensemble to be instantiated per batch's machine. But that seems to run into a lock condition while getting the connection to the znode through the Curator client.
b. The REST call fired to trigger an event does not complete, as the machine it acquired is stuck waiting for the ensemble to connect.
Thanks in advance.
After developing and executing my Storm (1.0.1) topology with a KafkaSpout and a couple of Bolts, I noticed huge network traffic even when the topology is idle (no messages on Kafka, no processing done in the bolts). So I started commenting out my topology piece by piece to find the cause, and now I have only the KafkaSpout in my main:
....
final SpoutConfig spoutConfig = new SpoutConfig(
new ZkHosts(zkHosts, "/brokers"),
"files-topic", // topic
"/kafka", // ZK chroot
"consumer-group-name");
spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
spoutConfig.startOffsetTime = OffsetRequest.LatestTime();
topologyBuilder.setSpout(
        "kafka-spout-id",
        new KafkaSpout(spoutConfig),
        1);
....
When this (useless) topology executes, even in local mode, even the very first time, the network traffic always grows a lot. I see (in my Activity Monitor):
- An average of 432 KB of data received/sec
- After the topology has been running (idle) for a couple of hours, data received is 1.26 GB and data sent is 1 GB
(Important: Kafka is not running as a cluster; it is a single instance running on the same machine, with a single topic and a single partition. I just downloaded Kafka to my machine, started it and created a simple topic. When I put a message on the topic, everything in the topology works without any problem at all.)
Obviously the cause is in the KafkaSpout.nextTuple() method (below), but I don't understand why I should have such traffic without any messages in Kafka. Is there something I didn't consider? Is this the expected behaviour? I had a look at the Kafka logs and ZK logs: nothing. I cleaned up the Kafka and ZK data: nothing, still the same behaviour.
@Override
public void nextTuple() {
List<PartitionManager> managers = _coordinator.getMyManagedPartitions();
for (int i = 0; i < managers.size(); i++) {
try {
// in case the number of managers decreased
_currPartitionIndex = _currPartitionIndex % managers.size();
EmitState state = managers.get(_currPartitionIndex).next(_collector);
if (state != EmitState.EMITTED_MORE_LEFT) {
_currPartitionIndex = (_currPartitionIndex + 1) % managers.size();
}
if (state != EmitState.NO_EMITTED) {
break;
}
} catch (FailedFetchException e) {
LOG.warn("Fetch failed", e);
_coordinator.refresh();
}
}
long diffWithNow = System.currentTimeMillis() - _lastUpdateMs;
/*
As far as the System.currentTimeMillis() is dependent on System clock,
additional check on negative value of diffWithNow in case of external changes.
*/
if (diffWithNow > _spoutConfig.stateUpdateIntervalMs || diffWithNow < 0) {
commit();
}
}
Put a sleep of one second (1000 ms) in the nextTuple() method and observe the traffic now. For example:
@Override
public void nextTuple() {
    try {
        Thread.sleep(1000);
    } catch (Exception ex) {
        log.error("Exception while sleeping...", ex);
    }
List<PartitionManager> managers = _coordinator.getMyManagedPartitions();
for (int i = 0; i < managers.size(); i++) {
...
...
...
...
}
The reason is that the Kafka consumer works on a pull model, which means consumers pull data from the Kafka brokers. So the consumer (the KafkaSpout) continuously issues fetch requests to the Kafka broker, and each of those is a TCP network request. That is why you see such large numbers for data packets sent/received. Even though the consumer doesn't consume any messages, the pull requests and the empty responses are still counted in the sent/received network statistics. Your network traffic will be lower if your sleep time is longer. There are also some network-related configurations for the brokers and for the consumer; researching those configurations may help you. Hope it helps.
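As an aside (not part of the answer above, just a sketch): instead of hard-coding a Thread.sleep() inside nextTuple(), Storm's own spout wait strategy already sleeps whenever nextTuple() emits nothing, and you can raise that interval through topology configuration. The constant below exists in Storm 1.x; double-check the name against your version.

import org.apache.storm.Config;

Config conf = new Config();
// How long Storm's spout wait strategy sleeps when nextTuple() emits nothing (default: 1 ms).
conf.put(Config.TOPOLOGY_SLEEP_SPOUT_WAIT_STRATEGY_TIME_MS, 1000);
// Pass conf to StormSubmitter.submitTopology(...) or LocalCluster.submitTopology(...) as usual.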
Is your bolt receiving messages? Does your bolt inherit from BaseRichBolt?
Comment out the line m.fail(id.offset) in KafkaSpout and check it out. If your bolt doesn't ack, your spout assumes the message failed and tries to replay the same message.
public void fail(Object msgId) {
    KafkaMessageId id = (KafkaMessageId) msgId;
    PartitionManager m = _coordinator.getManager(id.partition);
    if (m != null) {
        //m.fail(id.offset);
    }
}
Also try halting nextTuple() for a few millis and check it out.
Let me know if it helps.
Let's say I have P processes running some business logic on N physical machines. These processes call some web service S. I want to ensure that no more than X calls per second are made to service S by all P processes combined.
How can such a solution be implemented?
Google Guava's RateLimiter works well for processes running on a single box, but not in a distributed setup.
Are there any standard, ready-to-use solutions available for Java? (Maybe based on ZooKeeper.)
Thanks!
Bucket4j is a Java implementation of the token-bucket rate-limiting algorithm. It works both locally and distributed (on top of JCache). For the distributed use case you are free to choose any JCache implementation, such as Hazelcast or Apache Ignite. See this example of using Bucket4j in a cluster.
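For a feel of the API, here is a minimal local (non-JCache) sketch; the builder and method names below follow the Bucket4j 4.x line, so verify them against the version you pick, and in a distributed setup you would build the bucket through the JCache extension instead.

import io.github.bucket4j.Bandwidth;
import io.github.bucket4j.Bucket;
import io.github.bucket4j.Bucket4j;
import java.time.Duration;

// Allow at most X = 100 calls per second to service S.
Bucket bucket = Bucket4j.builder()
        .addLimit(Bandwidth.simple(100, Duration.ofSeconds(1)))
        .build();

if (bucket.tryConsume(1)) {
    // proceed with the call to service S
} else {
    // over the limit: reject, queue, or back off
}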
I have been working on an open-source solution for these kinds of problems.
Limitd is a "server" for limits. The limits are implemented using the Token Bucket Algorithm.
Basically you define limits in the service configuration like this:
buckets:
"request to service a":
per_minute: 10
"request to service b":
per_minute: 5
The service is run as a daemon listening on a TCP/IP port.
Then your application does something along these lines:
var limitd = new Limitd('limitd://my-limitd-address');
limitd.take('request to service a', 'app1', 1, function (err, result) {
if (result.conformant) {
console.log('everything is okay - this should be allowed');
} else {
console.error('too many calls to this thing');
}
});
We are currently using this for rate-limiting and debouncing some application events.
The server is on:
https://github.com/auth0/limitd
We are planning to work on several SDKs, but for now we only have Node.js and a partially implemented Go client:
https://github.com/limitd
https://github.com/jdwyah/ratelimit-java provides distributed rate limits that should do just this. You can configure your limit as S per second/minute etc. and choose the burst size / refill rate of the leaky bucket that is under the covers.
Here is simple rate limiting in Java that allows a concurrency of 3 transactions every 3 seconds. If you want to centralize this, store the tokens array in ElastiCache or any database, and in place of the synchronized block you will also have to implement a lock flag.
import java.util.Date;
public class RateLimiter implements Runnable {
private long[] tokens = new long[3];
public static void main(String[] args) {
// Start 20 competing threads against the same rate limiter instance.
RateLimiter rateLimiter = new RateLimiter();
for (int i=0; i<20; i++) {
Thread thread = new Thread(rateLimiter,"Thread-"+i );
thread.start();
}
}
@Override
public void run() {
long currentStartTime = System.currentTimeMillis();
while(true) {
if(System.currentTimeMillis() - currentStartTime > 100000 ) {
throw new RuntimeException("timed out");
}else {
if(getToken()) {
System.out.println(Thread.currentThread().getName() +
" at " +
new Date(System.currentTimeMillis()) + " says hello");
break;
}
}
}
}
synchronized private boolean getToken() {
    // A slot is available if it was never used (0) or was last used more than 3 seconds ago.
for (int i = 0; i< 3; i++) {
if(tokens[i] == 0 || System.currentTimeMillis() - tokens[i] > 3000) {
tokens[i] = System.currentTimeMillis();
return true;
}
}
return false;
}
}
As with any distributed rate-limiting architecture, you'll need a single backend store that acts as the single source of truth to track the number of requests. You can use ZooKeeper as an in-memory datastore for this out of convenience, although there are better choices such as Redis.
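For example, a fixed-window counter on Redis is only a few lines. This is a hypothetical sketch using the Jedis client (class, key, and host names are mine); production code would use a JedisPool and probably a sliding window or a Lua script instead.

import redis.clients.jedis.Jedis;

public final class RedisRateLimiter {
    // Assumed Redis location; Jedis is not thread-safe, so pool connections in real code.
    private final Jedis jedis = new Jedis("localhost", 6379);
    private final int maxPerSecond;

    public RedisRateLimiter(int maxPerSecond) {
        this.maxPerSecond = maxPerSecond;
    }

    /** Returns true if this call is within the global per-second budget for the given service. */
    public boolean tryAcquire(String service) {
        String key = "ratelimit:" + service + ":" + (System.currentTimeMillis() / 1000);
        long count = jedis.incr(key);   // atomic across all P processes on all N machines
        if (count == 1) {
            jedis.expire(key, 2);       // let old one-second windows expire on their own
        }
        return count <= maxPerSecond;
    }
}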
We require programmatic access to a SQL Server Express service as part of our application. Depending on what the user is trying to do, we may have to attach a database, detach a database, back one up, etc. Sometimes the service might not be started before we attempt these operations. So we need to ensure the service is started. Here is where we are running into problems. Apparently the ServiceController.WaitForStatus(ServiceControllerStatus.Running) returns prematurely for SQL Server Express. What is really puzzling is that the master database seems to be immediately available, but not other databases. Here is a console application to demonstrate what I am talking about:
namespace ServiceTest
{
using System;
using System.Data.SqlClient;
using System.Diagnostics;
using System.ServiceProcess;
using System.Threading;
class Program
{
private static readonly ServiceController controller = new ServiceController("MSSQL$SQLEXPRESS");
private static readonly Stopwatch stopWatch = new Stopwatch();
static void Main(string[] args)
{
stopWatch.Start();
EnsureStop();
Start();
OpenAndClose("master");
EnsureStop();
Start();
OpenAndClose("AdventureWorksLT");
Console.ReadLine();
}
private static void EnsureStop()
{
Console.WriteLine("EnsureStop enter, {0:N0}", stopWatch.ElapsedMilliseconds);
if (controller.Status != ServiceControllerStatus.Stopped)
{
controller.Stop();
controller.WaitForStatus(ServiceControllerStatus.Stopped);
Thread.Sleep(5000); // really, really make sure it stopped ... this has a problem too.
}
Console.WriteLine("EnsureStop exit, {0:N0}", stopWatch.ElapsedMilliseconds);
}
private static void Start()
{
Console.WriteLine("Start enter, {0:N0}", stopWatch.ElapsedMilliseconds);
controller.Start();
controller.WaitForStatus(ServiceControllerStatus.Running);
// Thread.Sleep(5000);
Console.WriteLine("Start exit, {0:N0}", stopWatch.ElapsedMilliseconds);
}
private static void OpenAndClose(string database)
{
Console.WriteLine("OpenAndClose enter, {0:N0}", stopWatch.ElapsedMilliseconds);
var connection = new SqlConnection(string.Format(@"Data Source=.\SQLEXPRESS;initial catalog={0};integrated security=SSPI", database));
connection.Open();
connection.Close();
Console.WriteLine("OpenAndClose exit, {0:N0}", stopWatch.ElapsedMilliseconds);
}
}
}
On my machine, this consistently fails as written. Notice that the connection to "master" has no problems; only the connection to the other database fails. (You can reverse the order of the connections to verify this.) If you uncomment the Thread.Sleep in the Start() method, it works fine.
Obviously I want to avoid an arbitrary Thread.Sleep(). Besides the rank code smell, what arbitrary value would I put there? The only thing we can think of is to make dummy connections to our target database in a while loop, catching the SqlException thrown and trying again until it works. But I'm thinking there must be a more elegant solution out there to know when the service is really ready to be used. Any ideas?
EDIT: Based on feedback provided below, I added a check on the status of the database. However, it is still failing. It looks like even the state is not reliable. Here is the function I am calling before OpenAndClose(string):
private static void WaitForOnline(string database)
{
Console.WriteLine("WaitForOnline start, {0:N0}", stopWatch.ElapsedMilliseconds);
using (var connection = new SqlConnection(@"Data Source=.\SQLEXPRESS;Initial Catalog=master;Integrated Security=SSPI"))
using (var command = connection.CreateCommand())
{
connection.Open();
try
{
command.CommandText = "SELECT [state] FROM sys.databases WHERE [name] = @DatabaseName";
command.Parameters.AddWithValue("@DatabaseName", database);
byte databaseState = (byte)command.ExecuteScalar();
Console.WriteLine("databaseState = {0}", databaseState);
while (databaseState != OnlineState)
{
Thread.Sleep(500);
databaseState = (byte)command.ExecuteScalar();
Console.WriteLine("databaseState = {0}", databaseState);
}
}
finally
{
connection.Close();
}
}
Console.WriteLine("WaitForOnline exit, {0:N0}", stopWatch.ElapsedMilliseconds);
}
I found another discussion dealing with a similar problem. Apparently the solution is to check the sys.database_files of the database in question. But that, of course, is a chicken-and-egg problem. Any other ideas?
Service start != database start.
The service is started when the SQL Server process is running and has responded to the SCM that it is 'alive'. After that, the server starts putting user databases online. As part of this process it runs recovery on each database to ensure transactional consistency. Recovery of a database can last anywhere from microseconds to whole days; it depends on the amount of log to be redone and the speed of the disk(s).
After the SCM reports that the service is running, you should connect to 'master' and check your database's status in sys.databases. Only when the status is ONLINE can you proceed to open it.