Debezium: how to commit offset manually

I use Debezium to sync data from Postgres to Flink, and I create the engine with this code:
this.engine = DebeziumEngine.create(Connect.class)
        .using(properties)
        .notifying(debeziumConsumer)
        .using((success, message, error) -> {
            if (!success && error != null) {
                this.reportError(error);
            }
        })
        .build();
I want to call ChangeEventSourceCoordinator#commitOffset when Flink performs a checkpoint, but coordinator is private in BaseSourceTask and task is private in EmbeddedEngine, so I can't call commitOffset from my code. Is there any other way to achieve a manual commit?
public final class EmbeddedEngine implements DebeziumEngine<SourceRecord> {
    private SourceTask task;
}
public abstract class BaseSourceTask extends SourceTask {
    private ChangeEventSourceCoordinator coordinator;
}
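For reference (this is not part of the original question): the embedded engine itself exposes two hooks that affect when offsets become committable, without reaching into the private coordinator: an OffsetCommitPolicy passed to the builder, and a batching ChangeConsumer whose RecordCommitter decides which records count as processed. A minimal sketch, assuming the standard io.debezium.engine API; handOverToFlink and the buffering strategy are illustrative placeholders:
import java.util.List;
import java.util.Properties;

import org.apache.kafka.connect.source.SourceRecord;

import io.debezium.engine.DebeziumEngine;
import io.debezium.engine.format.Connect;
import io.debezium.engine.spi.OffsetCommitPolicy;

public class ManualCommitSketch {

    public DebeziumEngine<SourceRecord> buildEngine(Properties properties) {
        return DebeziumEngine.create(Connect.class)
                .using(properties)
                // Flush offsets as soon as records are marked processed, not on a timer.
                .using(OffsetCommitPolicy.always())
                .notifying((List<SourceRecord> records,
                            DebeziumEngine.RecordCommitter<SourceRecord> committer) -> {
                    for (SourceRecord record : records) {
                        handOverToFlink(record);         // hypothetical hand-off to the Flink source
                        committer.markProcessed(record); // only records marked here become committable
                    }
                    committer.markBatchFinished();
                })
                .build();
    }

    private void handOverToFlink(SourceRecord record) {
        // Buffer the record until the next Flink checkpoint (illustrative placeholder).
    }
}
Truly aligning offset flushes with Flink checkpoints would still require buffering records and delaying markProcessed until the checkpoint completes; the hooks above only control when the engine records progress.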

Related

Deleting element from Store after ExpirePolicy

Environment: I am running Apache Ignite v2.13.0 for the cache, and the cache store persists to MongoDB v3.6.0. I am also using Spring Boot (Java).
Question: When I have an expiration policy set, how do I remove the corresponding data from my persistent database?
What I have attempted: I have attempted to utilize CacheEntryExpiredListener but my print statement is not getting triggered. Is this the proper way to solve the problem?
Here is a sample bit of code:
@Service
public class CacheRemovalListener implements CacheEntryExpiredListener<Long, Metrics> {
    @Override
    public void onExpired(Iterable<CacheEntryEvent<? extends Long, ? extends Metrics>> events) throws CacheEntryListenerException {
        for (CacheEntryEvent<? extends Long, ? extends Metrics> event : events) {
            System.out.println("Received a " + event);
        }
    }
}
Use Continuous Query to get notifications about Ignite data changes.
ExecutorService mongoUpdateExecutor = Executors.newSingleThreadExecutor();

CacheEntryUpdatedListener<Integer, Integer> lsnr = new CacheEntryUpdatedListener<Integer, Integer>() {
    @Override
    public void onUpdated(Iterable<CacheEntryEvent<? extends Integer, ? extends Integer>> evts) {
        for (CacheEntryEvent<?, ?> e : evts) {
            if (e.getEventType() == EventType.EXPIRED) {
                // Use a separate executor to avoid blocking Ignite threads
                mongoUpdateExecutor.submit(() -> removeFromMongo(e.getKey()));
            }
        }
    }
};

var qry = new ContinuousQuery<Integer, Integer>()
    .setLocalListener(lsnr)
    .setIncludeExpired(true);

// Start receiving updates.
var cursor = cache.query(qry);

// Stop receiving updates.
cursor.close();
Note 1: EXPIRED events should be enabled explicitly with ContinuousQuery#setIncludeExpired.
Note 2: Query listeners should not perform any heavy/blocking operations. Offload that work to a separate thread/executor.
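The removeFromMongo call above is not shown in the answer; a minimal sketch of what it might look like, assuming the MongoDB sync Java driver and documents keyed by the cache key as _id (class, constructor, and parameter names are illustrative):
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;

public class MongoCleanup {

    private final MongoCollection<Document> collection;

    public MongoCleanup(String uri, String database, String collectionName) {
        MongoClient client = MongoClients.create(uri);
        this.collection = client.getDatabase(database).getCollection(collectionName);
    }

    // Delete the document whose _id matches the expired cache key.
    public void removeFromMongo(Object key) {
        collection.deleteOne(Filters.eq("_id", key));
    }
}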

Spring data MongoDB change stream with multiple application instances

I have a Spring Boot application with Spring Data MongoDB, where I connect to a Mongo change stream to save changes to an audit collection. My application runs multiple instances (currently 2) and will be scaled up to n instances when load increases. When records are created in the original collection ("my-collection"), the listeners are triggered in all running instances and create duplicate records. Following is my setup.
build.gradle
…
// spring data mongodb version 3.1.5
implementation 'org.springframework.boot:spring-boot-starter-data-mongodb'
…
Listener config
@Configuration
@Slf4j
public class MongoChangeStreamListenerConfig {

    @Bean
    MessageListenerContainer changeStreamListenerContainer(
            MongoTemplate template,
            MyEntityAuditListener auditListener,
            ErrorHandler errorHandler) {
        MessageListenerContainer messageListenerContainer =
                new MongoStreamListenerContainer(template, errorHandler);
        ChangeStreamRequest<MyEntity> request =
                ChangeStreamRequest.builder(auditListener)
                        .collection("my-collection")
                        .filter(newAggregation(match(where("operationType").in("insert", "update", "replace"))))
                        .fullDocumentLookup(FullDocument.UPDATE_LOOKUP)
                        .build();
        messageListenerContainer.register(request, MyEntity.class, errorHandler);
        log.info("mongo stream listener is registered");
        return messageListenerContainer;
    }

    @Bean
    ErrorHandler getLoggingErrorHandler() {
        return new ErrorHandler() {
            @Override
            public void handleError(Throwable throwable) {
                log.error("error in creating audit records", throwable);
            }
        };
    }
}
Listener container
public class MongoStreamListenerContainer extends DefaultMessageListenerContainer {

    public MongoStreamListenerContainer(MongoTemplate template, ErrorHandler errorHandler) {
        super(template, Executors.newFixedThreadPool(15), errorHandler);
    }

    @Override
    public boolean isAutoStartup() {
        return true;
    }
}
ChangeListener
/**
 * This class listens to the MongoDB change stream and processes changes. onMessage is triggered
 * when a record is added, updated, or replaced in MongoDB.
 */
@Component
@Slf4j
@RequiredArgsConstructor
public class MyEntityAuditListener
        implements MessageListener<ChangeStreamDocument<Document>, MyEntity> {

    @Override
    public void onMessage(Message<ChangeStreamDocument<Document>, MyEntity> message) {
        var update = message.getBody();
        log.info("db change event received");
        if (update != null) {
            log.info("creating audit entries for id {}", update.getId());
            // This executes in all instances, creating duplicate records
        }
    }
}
Is there a way to restrict execution to one instance at a given time and share the load between nodes? It would be really nice to know if there is a configuration option in Spring Data MongoDB to control this.
Also, I have checked the following Stack Overflow post, and I am not sure how to apply it with Spring Data:
Mongo Change Streams running multiple times (kind of): Node app running multiple instances
Any help or tip to resolve this issue is highly appreciated. Thank you very much in advance.

Solr custom query component does not return correct facet counts

I have a simple Solr query component as follows:
public class QueryPreprocessingComponent extends QueryComponent implements PluginInfoInitialized {

    private static final Logger LOG = LoggerFactory.getLogger(QueryPreprocessingComponent.class);

    private ExactMatchQueryProcessor exactMatchQueryProcessor;

    public void init(PluginInfo info) {
        initializeProcessors(info);
    }

    private void initializeProcessors(PluginInfo info) {
        List<PluginInfo> queryPreProcessors = info.getChildren("queryPreProcessors")
                .get(0).getChildren("queryPreProcessor");
        for (PluginInfo queryProcessor : queryPreProcessors) {
            initializeProcessor(queryProcessor);
        }
    }

    private void initializeProcessor(PluginInfo queryProcessor) {
        QueryProcessorParam processorName = QueryProcessorParam.valueOf(queryProcessor.name);
        switch (processorName) {
            case ExactMatchQueryProcessor:
                exactMatchQueryProcessor = new ExactMatchQueryProcessor(queryProcessor.initArgs);
                LOG.info("ExactMatchQueryProcessor initialized...");
                break;
            default: throw new AssertionError();
        }
    }

    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        if (exactMatchQueryProcessor != null) {
            exactMatchQueryProcessor.modifyForExactMatch(rb);
        }
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        // do nothing - needed so we don't execute the query here.
    }
}
This works as expected functionally, except that when I use it in a distributed request, there is an issue with the facet counts returned: it doubles the facet counts.
Note that I am not doing anything related to faceting in the plugin. exactMatchQueryProcessor.modifyForExactMatch(rb); does very minimal processing if the query is quoted; otherwise it does nothing. Even if the incoming query is not quoted, the facet count issue is there. Even if I comment out everything inside the prepare function, the issue persists.
Note that this component is declared as first-components in solrconfig.xml.
I resolved this issue by extending SearchComponent instead of QueryComponent. It seems that SearchComponent sits at a higher level of abstraction than QueryComponent and is useful when you want to work on a layer above shards.
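A minimal sketch of the reworked component, assuming the same initialization logic as the QueryComponent version above (imports omitted as in the original snippet); extending SearchComponent additionally requires implementing getDescription(), and the description string here is illustrative:
public class QueryPreprocessingComponent extends SearchComponent implements PluginInfoInitialized {

    private ExactMatchQueryProcessor exactMatchQueryProcessor;

    @Override
    public void init(PluginInfo info) {
        // same processor initialization as in the QueryComponent version above
        initializeProcessors(info);
    }

    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        if (exactMatchQueryProcessor != null) {
            exactMatchQueryProcessor.modifyForExactMatch(rb);
        }
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        // no-op: the standard QueryComponent configured in solrconfig.xml executes the query
    }

    @Override
    public String getDescription() {
        return "query preprocessing component";
    }
}
With this change the component no longer inherits QueryComponent's distributed-stage handling, so only the stock QueryComponent executes the query and merges facets, which is presumably why the doubled counts disappear.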

Use postgresql as flink sink, And the connection can't serialize PGConnection by kyro?

I wrote a PostgreSQL sink:
class PGTwoPhaseCommitSinkFunction
    extends TwoPhaseCommitSinkFunction[Row, PgConnection, Void](
        new KryoSerializer[PgConnection](classOf[PgConnection], new ExecutionConfig),
        VoidSerializer.INSTANCE)
But when I use it, I find that PgConnection can't be serialized,
and the exception is:
com.esotericsoftware.kryo.KryoException: Error constructing instance of class: sun.nio.cs.UTF_8
How can I handle it? Thanks.
The transaction object specified as the second generic parameter should first of all be Serializable, and I don't think it's right to use PgConnection for it. Instead, it should be some custom lightweight transaction state object holding transaction metadata such as an id. For example, below is the FlinkKafkaProducer transaction state:
/**
 * State for handling transactions.
 */
@VisibleForTesting
@Internal
static class KafkaTransactionState {

    private final transient FlinkKafkaInternalProducer<byte[], byte[]> producer;

    @Nullable
    final String transactionalId;

    final long producerId;

    final short epoch;

    KafkaTransactionState(String transactionalId, FlinkKafkaInternalProducer<byte[], byte[]> producer) {
        this(transactionalId, producer.getProducerId(), producer.getEpoch(), producer);
    }

    KafkaTransactionState(FlinkKafkaInternalProducer<byte[], byte[]> producer) {
        this(null, -1, (short) -1, producer);
    }

    KafkaTransactionState(
            @Nullable String transactionalId,
            long producerId,
            short epoch,
            FlinkKafkaInternalProducer<byte[], byte[]> producer) {
        this.transactionalId = transactionalId;
        this.producerId = producerId;
        this.epoch = epoch;
        this.producer = producer;
    }

    boolean isTransactional() {
        return transactionalId != null;
    }
    ...
However, as is rightly pointed out in this thread, writing and maintaining a two-phase commit sink is a tricky and very difficult task, and it's better to use the Table API with the JDBC connector plus the Postgres driver.
Here is an example of a pipeline writing data into Postgres.
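The linked example is not reproduced here, but a minimal sketch of that approach might look as follows, assuming the flink-connector-jdbc artifact and the Postgres driver are on the classpath; the table name, schema, URL, and credentials are placeholders:
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class PostgresTableSinkSketch {

    public static void main(String[] args) throws Exception {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // Declare a Postgres-backed sink table via the JDBC connector.
        tEnv.executeSql(
            "CREATE TABLE pg_sink (" +
            "  id BIGINT," +
            "  payload STRING," +
            "  PRIMARY KEY (id) NOT ENFORCED" +
            ") WITH (" +
            "  'connector' = 'jdbc'," +
            "  'url' = 'jdbc:postgresql://localhost:5432/mydb'," +
            "  'table-name' = 'my_table'," +
            "  'username' = 'user'," +
            "  'password' = 'secret'" +
            ")");

        // In a real pipeline the INSERT would select from a source table;
        // literal values are used here only to keep the sketch self-contained.
        tEnv.executeSql("INSERT INTO pg_sink VALUES (CAST(1 AS BIGINT), 'hello')").await();
    }
}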

Disconnect client from IHubContext<THub>

I can call InvokeAsync from server code using the IHubContext interface, but sometimes I want to force these clients to disconnect.
So, is there any way to disconnect clients from server code that references the IHubContext interface?
Step 1:
using Microsoft.AspNetCore.Connections.Features;
using System.Collections.Generic;
using Microsoft.AspNetCore.SignalR;

public class ErrorService
{
    readonly HashSet<string> PendingConnections = new HashSet<string>();
    readonly object PendingConnectionsLock = new object();

    public void KickClient(string ConnectionId)
    {
        //TODO: log
        if (!PendingConnections.Contains(ConnectionId))
        {
            lock (PendingConnectionsLock)
            {
                PendingConnections.Add(ConnectionId);
            }
        }
    }

    public void InitConnectionMonitoring(HubCallerContext Context)
    {
        var feature = Context.Features.Get<IConnectionHeartbeatFeature>();
        feature.OnHeartbeat(state =>
        {
            if (PendingConnections.Contains(Context.ConnectionId))
            {
                Context.Abort();
                lock (PendingConnectionsLock)
                {
                    PendingConnections.Remove(Context.ConnectionId);
                }
            }
        }, Context.ConnectionId);
    }
}
Step 2:
public void ConfigureServices(IServiceCollection services)
{
    ...
    services.AddSingleton<ErrorService>();
    ...
}
Step 3:
[Authorize(Policy = "Client")]
public class ClientHub : Hub
{
    ErrorService errorService;

    public ClientHub(ErrorService errorService)
    {
        this.errorService = errorService;
    }

    public async override Task OnConnectedAsync()
    {
        errorService.InitConnectionMonitoring(Context);
        await base.OnConnectedAsync();
    }
    ....
Disconnecting without Abort() method:
public class TestService
{
    public TestService(..., ErrorService errorService)
    {
        string ConnectionId = ...;
        errorService.KickClient(ConnectionId);
In alpha 2 there is Abort() on HubConnectionContext, which you could use to terminate a connection. I don't see, however, an easy way to access it from outside the hub.
Because you control the clients, you could just invoke a client method and tell the client to disconnect. The advantage is that the client disconnects gracefully. The disadvantage is that it requires sending a message to the client instead of disconnecting the client solely on the server side.