Remote Partitioning : Worker side serialization issue - spring-batch

When running the worker I get the exception below.
I have used org.springframework.kafka.support.serializer.JsonDeserializer and org.apache.kafka.common.serialization.ByteArrayDeserializer, which work correctly on the master side, but I get an exception on the worker side. At the end of the stack trace I get Caused by: java.lang.StackOverflowError: null
org.springframework.kafka.KafkaException: Seek to current after exception; nested exception is org.springframework.kafka.listener.ListenerExecutionFailedException: Listener failed; nested exception is org.springframework.messaging.MessageHandlingException: error occurred in message handler [bean 'handler'; defined in: 'batch.configuration.WorkerConfiguration'; from source: 'org.springframework.core.type.StandardMethodMetadata#7f811d00']; nested exception is org.apache.kafka.common.errors.SerializationException: Can't serialize data [StepExecution: id=5, version=93, name=workerStep:partition4, status=COMPLETED, exitStatus=COMPLETED, readCount=0, filterCount=0, writeCount=0 readSkipCount=0, writeSkipCount=0, processSkipCount=0, commitCount=31, rollbackCount=0, exitDescription=] for topic [reply], failedMessage=GenericMessage [payload=StepExecution: id=5, version=93, name=workerStep:partition4, status=COMPLETED, exitStatus=COMPLETED, readCount=0, filterCount=0, writeCount=0 readSkipCount=0, writeSkipCount=0, processSkipCount=0, commitCount=31, rollbackCount=0, exitDescription=, headers={sequenceNumber=1, errorChannel=org.springframework.messaging.core.GenericMessagingTemplate$TemporaryReplyChannel#454b7395, sequenceSize=10, kafka_timestampType=CREATE_TIME, kafka_replyTopic=reply, kafka_receivedTopic=requests, replyChannel=org.springframework.messaging.core.GenericMessagingTemplate$TemporaryReplyChannel#454b7395, kafka_offset=5, kafka_consumer=org.apache.kafka.clients.consumer.KafkaConsumer#1a6c92e3, kafka_correlationId=[B#ef1f7b2, correlationId=1:workerStep, id=49d312f5-618f-5085-fad1-7868e8e06aa0, kafka_receivedPartitionId=0, kafka_receivedTimestamp=1603368744168, kafka_acknowledgment=Acknowledgment for ConsumerRecord(topic = requests, partition = 0, leaderEpoch = 0, offset = 5, CreateTime = 1603368744168, serialized key size = -1, serialized value size = 64, headers = RecordHeaders(headers = [RecordHeader(key = sequenceNumber, value = [49]), RecordHeader(key = sequenceSize, value = [49, 48]), RecordHeader(key = correlationId, value = [49, 58, 119, 111, 114, 107, 101, 114, 83, 116, 101, 112]), RecordHeader(key = kafka_replyTopic, value = [114, 101, 112, 108, 121]), RecordHeader(key = spring_json_header_types, value = [123, 34, 115, 101, 113, 117, 101, 110, 99, 101, 78, 117, 109, 98, 101, 114, 34, 58, 34, 106, 97, 118, 97, 46, 108, 97, 110, 103, 46, 73, 110, 116, 101, 103, 101, 114, 34, 44, 34, 115, 101, 113, 117, 101, 110, 99, 101, 83, 105, 122, 101, 34, 58, 34, 106, 97, 118, 97, 46, 108, 97, 110, 103, 46, 73, 110, 116, 101, 103, 101, 114, 34, 44, 34, 99, 111, 114, 114, 101, 108, 97, 116, 105, 111, 110, 73, 100, 34, 58, 34, 106, 97, 118, 97, 46, 108, 97, 110, 103, 46, 83, 116, 114, 105, 110, 103, 34, 44, 34, 107, 97, 102, 107, 97, 95, 114, 101, 112, 108, 121, 84, 111, 112, 105, 99, 34, 58, 34, 106, 97, 118, 97, 46, 108, 97, 110, 103, 46, 83, 116, 114, 105, 110, 103, 34, 125]), RecordHeader(key = kafka_replyTopic, value = [114, 101, 112, 108, 121]), RecordHeader(key = kafka_correlationId, value = [-127, -49, 126, 33, 100, 99, 79, -14, -81, -35, -70, -88, -61, 37, 61, -63])], isReadOnly = false), key = null, value = StepExecutionRequest: [jobExecutionId=1, stepExecutionId=5, stepName=workerStep]), kafka_groupId=repliesGroup, timestamp=1603379041885}]
at org.springframework.kafka.listener.SeekUtils.seekOrRecover(SeekUtils.java:157) ~[spring-kafka-2.5.2.RELEASE.jar:2.5.2.RELEASE]
at org.springframework.kafka.listener.SeekToCurrentErrorHandler.handle(SeekToCurrentErrorHandler.java:103) ~[spring-kafka-2.5.2.RELEASE.jar:2.5.2.RELEASE]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.invokeErrorHandler(KafkaMessageListenerContainer.java:1887) [spring-kafka-2.5.2.RELEASE.jar:2.5.2.RELEASE]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.doInvokeRecordListener(KafkaMessageListenerContainer.java:1792) [spring-kafka-2.5.2.RELEASE.jar:2.5.2.RELEASE]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.doInvokeWithRecords(KafkaMessageListenerContainer.java:1719) [spring-kafka-2.5.2.RELEASE.jar:2.5.2.RELEASE]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.invokeRecordListener(KafkaMessageListenerContainer.java:1617) [spring-kafka-2.5.2.RELEASE.jar:2.5.2.RELEASE]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.invokeListener(KafkaMessageListenerContainer.java:1348) [spring-kafka-2.5.2.RELEASE.jar:2.5.2.RELEASE]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.pollAndInvoke(KafkaMessageListenerContainer.java:1064) [spring-kafka-2.5.2.RELEASE.jar:2.5.2.RELEASE]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:972) [spring-kafka-2.5.2.RELEASE.jar:2.5.2.RELEASE]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_191]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_191]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_191]
Caused by: org.springframework.kafka.listener.ListenerExecutionFailedException: Listener failed; nested exception is org.springframework.messaging.MessageHandlingException: error occurred in message handler [bean 'handler'; defined in: 'batch.configuration.WorkerConfiguration'; from source: 'org.springframework.core.type.StandardMethodMetadata#7f811d00']; nested exception is org.apache.kafka.common.errors.SerializationException: Can't serialize data [StepExecution: id=5, version=93, name=workerStep:partition4, status=COMPLETED, exitStatus=COMPLETED, readCount=0, filterCount=0, writeCount=0 readSkipCount=0, writeSkipCount=0, processSkipCount=0, commitCount=31, rollbackCount=0, exitDescription=] for topic [reply], failedMessage=GenericMessage [payload=StepExecution: id=5, version=93, name=workerStep:partition4, status=COMPLETED, exitStatus=COMPLETED, readCount=0, filterCount=0, writeCount=0 readSkipCount=0, writeSkipCount=0, processSkipCount=0, commitCount=31, rollbackCount=0, exitDescription=, headers={sequenceNumber=1, errorChannel=org.springframework.messaging.core.GenericMessagingTemplate$TemporaryReplyChannel#454b7395, sequenceSize=10, kafka_timestampType=CREATE_TIME, kafka_replyTopic=reply, kafka_receivedTopic=requests, replyChannel=org.springframework.messaging.core.GenericMessagingTemplate$TemporaryReplyChannel#454b7395, kafka_offset=5, kafka_consumer=org.apache.kafka.clients.consumer.KafkaConsumer#1a6c92e3, kafka_correlationId=[B#ef1f7b2, correlationId=1:workerStep, id=49d312f5-618f-5085-fad1-7868e8e06aa0, kafka_receivedPartitionId=0, kafka_receivedTimestamp=1603368744168, kafka_acknowledgment=Acknowledgment for ConsumerRecord(topic = requests, partition = 0, leaderEpoch = 0, offset = 5, CreateTime = 1603368744168, serialized key size = -1, serialized value size = 64, headers = RecordHeaders(headers = [RecordHeader(key = sequenceNumber, value = [49]), RecordHeader(key = sequenceSize, value = [49, 48]), RecordHeader(key = correlationId, value = [49, 58, 119, 111, 114, 107, 101, 114, 83, 116, 101, 112]), RecordHeader(key = kafka_replyTopic, value = [114, 101, 112, 108, 121]), RecordHeader(key = spring_json_header_types, value = [123, 34, 115, 101, 113, 117, 101, 110, 99, 101, 78, 117, 109, 98, 101, 114, 34, 58, 34, 106, 97, 118, 97, 46, 108, 97, 110, 103, 46, 73, 110, 116, 101, 103, 101, 114, 34, 44, 34, 115, 101, 113, 117, 101, 110, 99, 101, 83, 105, 122, 101, 34, 58, 34, 106, 97, 118, 97, 46, 108, 97, 110, 103, 46, 73, 110, 116, 101, 103, 101, 114, 34, 44, 34, 99, 111, 114, 114, 101, 108, 97, 116, 105, 111, 110, 73, 100, 34, 58, 34, 106, 97, 118, 97, 46, 108, 97, 110, 103, 46, 83, 116, 114, 105, 110, 103, 34, 44, 34, 107, 97, 102, 107, 97, 95, 114, 101, 112, 108, 121, 84, 111, 112, 105, 99, 34, 58, 34, 106, 97, 118, 97, 46, 108, 97, 110, 103, 46, 83, 116, 114, 105, 110, 103, 34, 125]), RecordHeader(key = kafka_replyTopic, value = [114, 101, 112, 108, 121]), RecordHeader(key = kafka_correlationId, value = [-127, -49, 126, 33, 100, 99, 79, -14, -81, -35, -70, -88, -61, 37, 61, -63])], isReadOnly = false), key = null, value = StepExecutionRequest: [jobExecutionId=1, stepExecutionId=5, stepName=workerStep]), kafka_groupId=repliesGroup, timestamp=1603379041885}]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.decorateException(KafkaMessageListenerContainer.java:1902) [spring-kafka-2.5.2.RELEASE.jar:2.5.2.RELEASE]
... 10 common frames omitted
Caused by: org.springframework.messaging.MessageHandlingException: error occurred in message handler [bean 'handler'; defined in: 'batch.configuration.WorkerConfiguration'; from source: 'org.springframework.core.type.StandardMethodMetadata#7f811d00']; nested exception is org.apache.kafka.common.errors.SerializationException: Can't serialize data [StepExecution: id=5, version=93, name=workerStep:partition4, status=COMPLETED, exitStatus=COMPLETED, readCount=0, filterCount=0, writeCount=0 readSkipCount=0, writeSkipCount=0, processSkipCount=0, commitCount=31, rollbackCount=0, exitDescription=] for topic [reply]
at org.springframework.integration.support.utils.IntegrationUtils.wrapInHandlingExceptionIfNecessary(IntegrationUtils.java:192) ~[spring-integration-core-5.3.1.RELEASE.jar:5.3.1.RELEASE]
at org.springframework.integration.handler.AbstractMessageHandler.handleMessage(AbstractMessageHandler.java:79) ~[spring-integration-core-5.3.1.RELEASE.jar:5.3.1.RELEASE]
at org.springframework.integration.dispatcher.AbstractDispatcher.tryOptimizedDispatch(AbstractDispatcher.java:115) ~[spring-integration-core-5.3.1.RELEASE.jar:5.3.1.RELEASE]
at org.springframework.integration.dispatcher.UnicastingDispatcher.doDispatch(UnicastingDispatcher.java:133) ~[spring-integration-core-5.3.1.RELEASE.jar:5.3.1.RELEASE]
at org.
Worker code:
@SpringBootApplication
@EnableBatchProcessing
@EnableBatchIntegration
@EnableIntegration
@ImportResource("context.xml")
public class WorkerConfiguration {

    private final RemotePartitioningWorkerStepBuilderFactory workerStepBuilderFactory;
    private final JobBuilderFactory jobBuilderFactory;

    @Autowired
    ProducerFactory<String, String> producerFactory;

    @Autowired
    public JobExplorer jobExplorer;

    @Autowired
    private KafkaTemplate<String, String> kafkaTemplate;

    public WorkerConfiguration(JobBuilderFactory jobBuilderFactory, RemotePartitioningWorkerStepBuilderFactory workerStepBuilderFactory) {
        this.jobBuilderFactory = jobBuilderFactory;
        this.workerStepBuilderFactory = workerStepBuilderFactory;
    }

    /*
     * Configure inbound flow (requests coming from the master)
     */
    @Bean
    public DirectChannel requests() {
        return new DirectChannel();
    }

    /*
     * Configure outbound flow (replies going to the master)
     */
    @Bean
    public DirectChannel replies() {
        return new DirectChannel();
    }

    /*
    protected JobRepository createMyJobRepository() throws Exception {
        JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
        factory.setTransactionManager(new ResourcelessTransactionManager());
        factory.setDataSource(createDataSourceForRepository());
        factory.setDatabaseType("HSQL");
        return factory.getObject();
    }

    public DataSource createDataSourceForRepository() {
        return DataSourceBuilder.create()
                .url("jdbc:hsqldb:file:src/main/resources/hsqldb/batchcore.db;hsqldb.lock_file=false;shutdown=true;")
                .driverClassName("org.hsqldb.jdbcDriver")
                .username("sa")
                .password("")
                .build();
    }

    public DataSource createDataSourceForRepository() {
        return DataSourceBuilder.create()
                .url("jdbc:postgresql://localhost:5432/bdauser")
                .driverClassName("org.postgresql.Driver")
                .username("bdauser")
                .password("bdauser")
                .build();
    }

    @Bean
    public BatchConfigurer batchConfigurer() {
        return new DefaultBatchConfigurer(createDataSourceForRepository()) {
            @Override
            public JobRepository getJobRepository() {
                JobRepository jobRepository = null;
                try {
                    jobRepository = createMyJobRepository();
                } catch (Exception e) {
                    e.printStackTrace();
                }
                System.out.println("************************WORKER replying Inside batchConfigurer ****************");
                return jobRepository;
            }
        };
    }
    */

    @Bean
    @ServiceActivator(inputChannel = "replies")
    public MessageHandler handler() throws Exception {
        System.out.println("************************ worker inside serviceactivator **********************");
        KafkaProducerMessageHandler<String, String> handler = new KafkaProducerMessageHandler<>(kafkaTemplate());
        handler.setTopicExpression(new LiteralExpression("reply"));
        return handler;
    }

    @Bean
    public KafkaTemplate<String, String> kafkaTemplate() {
        return new KafkaTemplate<>(producerFactory);
    }

    @Bean
    public KafkaMessageListenerContainer<String, String> replyContainer(ConsumerFactory<String, String> cf) {
        ContainerProperties containerProperties = new ContainerProperties(new TopicPartitionOffset("nullChannel", 0));
        System.out.println("************************** WORKER ContainerProperties *****************************");
        containerProperties.setAckMode(ContainerProperties.AckMode.MANUAL_IMMEDIATE);
        return new KafkaMessageListenerContainer<>(cf, containerProperties);
    }

    @Primary
    @Bean
    public ReplyingKafkaTemplate<String, String, String> replyingTemplate(ProducerFactory<String, String> producerFactory, KafkaMessageListenerContainer<String, String> repliesContainer) {
        System.out.println("**************************WORKER replying Template Templet *****************************");
        ReplyingKafkaTemplate<String, String, String> replyingKafkaTemplate = new ReplyingKafkaTemplate<>(producerFactory, repliesContainer);
        replyingKafkaTemplate.setSharedReplyTopic(true);
        Duration d = Duration.ofSeconds(50);
        replyingKafkaTemplate.setDefaultReplyTimeout(d);
        return replyingKafkaTemplate;
    }

    @Bean
    public ConcurrentMessageListenerContainer<String, String> repliesContainer(ConcurrentKafkaListenerContainerFactory<String, String> containerFactory) {
        System.out.println("**************** worker node ConcurrentMessageListenerContainer *****************************");
        ConcurrentMessageListenerContainer<String, String> repliesContainer = containerFactory.createContainer("requests");
        repliesContainer.getContainerProperties().setGroupId("repliesGroup");
        repliesContainer.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL_IMMEDIATE);
        repliesContainer.setAutoStartup(false);
        return repliesContainer;
    }

    @Bean
    public IntegrationFlow serverGateway(ConcurrentMessageListenerContainer<String, String> container, KafkaTemplate<String, String> template) {
        return IntegrationFlows
                .from(Kafka.inboundGateway(container, template)
                        .replyTimeout(3000000))
                .channel(requests())
                .get();
    }

    @Bean
    public Job remotePartitioningJob() {
        System.out.println("******************* inside remotePartitioningJob **************************");
        return this.jobBuilderFactory.get("remotePartitioningJobMy")
                .incrementer(new RunIdIncrementer())
                .start(workerStep())
                .build();
    }

    /*
     * Configure the worker step
     */
    @Bean
    public Step workerStep() {
        System.out.println("******************* inside worker step **************************");
        return this.workerStepBuilderFactory.get("workerStep")
                .inputChannel(requests())
                .outputChannel(replies())
                .tasklet(tasklet(null))
                .build();
    }

    @Bean
    @StepScope
    public Tasklet tasklet(@Value("#{stepExecutionContext['partition']}") String partition) {
        return (contribution, chunkContext) -> {
            System.out.println("processing " + partition);
            return RepeatStatus.FINISHED;
        };
    }

    @Bean
    public PartitionHandler partitionHandler() throws Exception {
        MessageChannelPartitionHandler partitionHandler = new MessageChannelPartitionHandler();
        partitionHandler.setStepName("slaveStep");
        partitionHandler.setGridSize(10);
        partitionHandler.setMessagingOperations(messageTemplate());
        partitionHandler.setPollInterval(5000l);
        partitionHandler.setJobExplorer(this.jobExplorer);
        partitionHandler.afterPropertiesSet();
        return partitionHandler;
    }

    @Bean
    public MessagingTemplate messageTemplate() {
        MessagingTemplate messagingTemplate = new MessagingTemplate(outboundRequests());
        messagingTemplate.setReceiveTimeout(60000000l);
        return messagingTemplate;
    }

    @Bean
    public ExecutorChannel outboundRequests() {
        return MessageChannels.executor("outboundRequests", new SimpleAsyncTaskExecutor()).get();
    }

    public static void main(String[] args) {
        System.out.println("************************WORKER CONFIGURATION ****************");
        SpringApplication.run(WorkerConfiguration.class, args);
    }
}
Property file:
spring.main.allow-bean-definition-overriding=true
spring.kafka.bootstrap-servers=cdh5161-e2e-test-1.eaas.amdocs.com:9092,cdh5161-e2e-test-2.eaas.amdocs.com:9092,cdh5161-e2e-test-3.eaas.amdocs.com:9092,cdh5161-e2e-test-4.eaas.amdocs.com:9092,cdh5161-e2e-test-5.eaas.amdocs.com:9092,cdh5161-e2e-test-6.eaas.amdocs.com:9092,cdh5161-e2e-test-7.eaas.amdocs.com:9092,cdh5161-e2e-test-8.eaas.amdocs.com:9092
spring.kafka.consumer.group-id=remotePartitioningConsuerGroup
spring.kafka.producer.key-serializer=org.apache.kafka.common.serialization.ByteArraySerializer
spring.kafka.producer.value-serializer=org.springframework.kafka.support.serializer.JsonSerializer
spring.kafka.consumer.key-deserializer=org.apache.kafka.common.serialization.ByteArrayDeserializer
spring.kafka.consumer.value-deserializer=org.springframework.kafka.support.serializer.JsonDeserializer
spring.kafka.producer.properties.spring.json.add.type.headers=false
spring.kafka.consumer.properties.spring.json.trusted.packages=*
spring.kafka.producer.properties.spring.json.trusted.packages=*
server.port=8050
## PostgreSQL
spring.datasource.url=jdbc:postgresql://localhost:5432/bdauser
spring.datasource.username=bdauser
spring.datasource.password=bdauser
#drop n create table again, good for testing, comment this in production
spring.jpa.hibernate.ddl-auto=create

Because you're using JSON: the StepExecution and JobExecution classes have a bi-directional relationship, so it causes infinite recursion while trying to serialize the StepExecution object to send back to the reply channel.
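One way to break that cycle is to serialize the reply with an ObjectMapper that ignores the back-reference. The sketch below is only an illustration of that idea, not the configuration from this post; the mix-in class, the bean names, and the producerConfigs map are assumptions introduced for the example.

import java.util.Map;

import com.fasterxml.jackson.annotation.JsonIgnore;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.StepExecution;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.core.ProducerFactory;
import org.springframework.kafka.support.serializer.JsonSerializer;

@Configuration
public class ReplySerializationConfig {

    // Jackson mix-in that hides the back-reference causing the recursion.
    abstract static class StepExecutionMixIn {
        @JsonIgnore
        abstract JobExecution getJobExecution();
    }

    @Bean
    public ProducerFactory<String, StepExecution> replyProducerFactory(
            Map<String, Object> producerConfigs) { // assumed helper bean with bootstrap.servers etc.
        ObjectMapper mapper = new ObjectMapper();
        mapper.addMixIn(StepExecution.class, StepExecutionMixIn.class);
        JsonSerializer<StepExecution> valueSerializer = new JsonSerializer<>(mapper);
        return new DefaultKafkaProducerFactory<>(producerConfigs, new StringSerializer(), valueSerializer);
    }

    @Bean
    public KafkaTemplate<String, StepExecution> replyKafkaTemplate(
            ProducerFactory<String, StepExecution> replyProducerFactory) {
        return new KafkaTemplate<>(replyProducerFactory);
    }
}

A template built this way can then back the KafkaProducerMessageHandler on the replies channel, so the outgoing StepExecution is serialized without walking back into its JobExecution.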

Related

Caused by: org.apache.kafka.common.errors.SerializationException: Can't deserialize data [

I'm working with Spring Batch and an Apache Kafka stream, taking inspiration from https://www.youtube.com/watch?v=UJesCn731G4. In this example, I am simply trying to read the customers stream and write that data into a CSV file.
Error:
org.apache.kafka.common.errors.SerializationException: Error deserializing key/value for partition customers-0 at offset 0. If needed, please seek past the record to continue consumption.
Caused by: org.apache.kafka.common.errors.SerializationException: Can't deserialize data [[123, 34, 105, 100, 34, 58, 49, 44, 34, 102, 105, 114, 115, 116, 78, 97, 109, 101, 34, 58, 34, 32, 74, 111, 104, 110, 34, 44, 34, 108, 97, 115, 116, 78, 97, 109, 101, 34, 58, 34, 32, 68, 111, 101, 34, 44, 34, 98, 105, 114, 116, 104, 100, 97, 116, 101, 34, 58, 123, 34, 109, 111, 110, 116, 104, 34, 58, 34, 79, 67, 84, 79, 66, 69, 82, 34, 44, 34, 121, 101, 97, 114, 34, 58, 49, 57, 53, 50, 44, 34, 100, 97, 121, 79, 102, 89, 101, 97, 114, 34, 58, 50, 56, 52, 44, 34, 100, 97, 121, 79, 102, 77, 111, 110, 116, 104, 34, 58, 49, 48, 44, 34, 100, 97, 121, 79, 102, 87, 101, 101, 107, 34, 58, 34, 70, 82, 73, 68, 65, 89, 34, 44, 34, 104, 111, 117, 114, 34, 58, 49, 48, 44, 34, 109, 105, 110, 117, 116, 101, 34, 58, 49, 48, 44, 34, 109, 111, 110, 116, 104, 86, 97, 108, 117, 101, 34, 58, 49, 48, 44, 34, 110, 97, 110, 111, 34, 58, 48, 44, 34, 115, 101, 99, 111, 110, 100, 34, 58, 49, 48, 44, 34, 99, 104, 114, 111, 110, 111, 108, 111, 103, 121, 34, 58, 123, 34, 105, 100, 34, 58, 34, 73, 83, 79, 34, 44, 34, 99, 97, 108, 101, 110, 100, 97, 114, 84, 121, 112, 101, 34, 58, 34, 105, 115, 111, 56, 54, 48, 49, 34, 125, 125, 125]] from topic [customers]
Caused by: com.fasterxml.jackson.databind.exc.InvalidDefinitionException: Cannot construct instance of `java.time.LocalDateTime` (no Creators, like default construct, exist): cannot deserialize from Object value (no delegate- or property-based Creator)
at [Source: (byte[])"{"id":1,"firstName":" John","lastName":" Doe","birthdate":{"month":"OCTOBER","year":1952,"dayOfYear":284,"dayOfMonth":10,"dayOfWeek":"FRIDAY","hour":10,"minute":10,"monthValue":10,"nano":0,"second":10,"chronology":{"id":"ISO","calendarType":"iso8601"}}}"; line: 1, column: 60] (through reference chain: com.example.demo.model.Customer["birthdate"])
at com.fasterxml.jackson.databind.exc.InvalidDefinitionException.from(InvalidDefinitionException.java:67) ~[jackson-databind-2.10.3.jar:2.10.3]
at com.fasterxml.jackson.databind.DeserializationContext.reportBadDefinition(DeserializationContext.java:1592) ~[jackson-databind-2.10.3.jar:2.10.3]
at com.fasterxml.jackson.databind.DeserializationContext.handleMissingInstantiator(DeserializationContext.java:1058) ~[jackson-databind-2.10.3.jar:2.10.3]
at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1297) ~[jackson-databind-2.10.3.jar:2.10.3]
at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:326) ~[jackson-databind-2.10.3.jar:2.10.3]
at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:159) ~[jackson-databind-2.10.3.jar:2.10.3]
at com.fasterxml.jackson.databind.deser.impl.MethodProperty.deserializeAndSet(MethodProperty.java:129) ~[jackson-databind-2.10.3.jar:2.10.3]
at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:369) ~[jackson-databind-2.10.3.jar:2.10.3]
at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:159) ~[jackson-databind-2.10.3.jar:2.10.3]
at com.fasterxml.jackson.databind.ObjectReader._bindAndClose(ObjectReader.java:1719) ~[jackson-databind-2.10.3.jar:2.10.3]
at com.fasterxml.jackson.databind.ObjectReader.readValue(ObjectReader.java:1282) ~[jackson-databind-2.10.3.jar:2.10.3]
at org.springframework.kafka.support.serializer.JsonDeserializer.deserialize(JsonDeserializer.java:438) ~[spring-kafka-2.3.7.RELEASE.jar:2.3.7.RELEASE]
at org.apache.kafka.clients.consumer.internals.Fetcher.parseRecord(Fetcher.java:1268) ~[kafka-clients-2.3.1.jar:na]
at org.apache.kafka.clients.consumer.internals.Fetcher.access$3600(Fetcher.java:124) ~[kafka-clients-2.3.1.jar:na]
at org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.fetchRecords(Fetcher.java:1492) ~[kafka-clients-2.3.1.jar:na]
at org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.access$1600(Fetcher.java:1332) ~[kafka-clients-2.3.1.jar:na]
at org.apache.kafka.clients.consumer.internals.Fetcher.fetchRecords(Fetcher.java:645) ~[kafka-clients-2.3.1.jar:na]
at org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:606) ~[kafka-clients-2.3.1.jar:na]
at org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1294) ~[kafka-clients-2.3.1.jar:na]
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1225) ~[kafka-clients-2.3.1.jar:na]
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201) ~[kafka-clients-2.3.1.jar:na]
at org.springframework.batch.item.kafka.KafkaItemReader.read(KafkaItemReader.java:164) ~[spring-batch-infrastructure-4.2.1.RELEASE.jar:4.2.1.RELEASE]
at org.springframework.batch.core.step.item.SimpleChunkProvider.doRead(SimpleChunkProvider.java:99) ~[spring-batch-core-4.2.1.RELEASE.jar:4.2.1.RELEASE]
at org.springframework.batch.core.step.item.SimpleChunkProvider.read(SimpleChunkProvider.java:180) ~[spring-batch-core-4.2.1.RELEASE.jar:4.2.1.RELEASE]
at org.springframework.batch.core.step.item.SimpleChunkProvider$1.doInIteration(SimpleChunkProvider.java:126) ~[spring-batch-core-4.2.1.RELEASE.jar:4.2.1.RELEASE]
at org.springframework.batch.repeat.support.RepeatTemplate.getNextResult(RepeatTemplate.java:375) ~[spring-batch-infrastructure-4.2.1.RELEASE.jar:4.2.1.RELEASE]
at org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:215) ~[spring-batch-infrastructure-4.2.1.RELEASE.jar:4.2.1.RELEASE]
at org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:145) ~[spring-batch-infrastructure-4.2.1.RELEASE.jar:4.2.1.RELEASE]
at org.springframework.batch.core.step.item.SimpleChunkProvider.provide(SimpleChunkProvider.java:118) ~[spring-batch-core-4.2.1.RELEASE.jar:4.2.1.RELEASE]
at org.springframework.batch.core.step.item.ChunkOrientedTasklet.execute(ChunkOrientedTasklet.java:71) ~[spring-batch-core-4.2.1.RELEASE.jar:4.2.1.RELEASE]
at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:407) ~[spring-batch-core-4.2.1.RELEASE.jar:4.2.1.RELEASE]
at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:331) ~[spring-batch-core-4.2.1.RELEASE.jar:4.2.1.RELEASE]
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:140) ~[spring-tx-5.2.5.RELEASE.jar:5.2.5.RELEASE]
at org.springframework.batch.core.step.tasklet.TaskletStep$2.doInChunkContext(TaskletStep.java:273) ~[spring-batch-core-4.2.1.RELEASE.jar:4.2.1.RELEASE]
at org.springframework.batch.core.scope.context.StepContextRepeatCallback.doInIteration(StepContextRepeatCallback.java:82) ~[spring-batch-core-4.2.1.RELEASE.jar:4.2.1.RELEASE]
at org.springframework.batch.repeat.support.RepeatTemplate.getNextResult(RepeatTemplate.java:375) ~[spring-batch-infrastructure-4.2.1.RELEASE.jar:4.2.1.RELEASE]
at org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:215) ~[spring-batch-infrastructure-4.2.1.RELEASE.jar:4.2.1.RELEASE]
at org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:145) ~[spring-batch-infrastructure-4.2.1.RELEASE.jar:4.2.1.RELEASE]
at org.springframework.batch.core.step.tasklet.TaskletStep.doExecute(TaskletStep.java:258) ~[spring-batch-core-4.2.1.RELEASE.jar:4.2.1.RELEASE]
at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:208) ~[spring-batch-core-4.2.1.RELEASE.jar:4.2.1.RELEASE]
at org.springframework.batch.core.job.SimpleStepHandler.handleStep(SimpleStepHandler.java:148) [spring-batch-core-4.2.1.RELEASE.jar:4.2.1.RELEASE]
at org.springframework.batch.core.job.AbstractJob.handleStep(AbstractJob.java:410) [spring-batch-core-4.2.1.RELEASE.jar:4.2.1.RELEASE]
at org.springframework.batch.core.job.SimpleJob.doExecute(SimpleJob.java:136) [spring-batch-core-4.2.1.RELEASE.jar:4.2.1.RELEASE]
at org.springframework.batch.core.job.AbstractJob.execute(AbstractJob.java:319) [spring-batch-core-4.2.1.RELEASE.jar:4.2.1.RELEASE]
at org.springframework.batch.core.launch.support.SimpleJobLauncher$1.run(SimpleJobLauncher.java:147) [spring-batch-core-4.2.1.RELEASE.jar:4.2.1.RELEASE]
at org.springframework.core.task.SyncTaskExecutor.execute(SyncTaskExecutor.java:50) [spring-core-5.2.5.RELEASE.jar:5.2.5.RELEASE]
at org.springframework.batch.core.launch.support.SimpleJobLauncher.run(SimpleJobLauncher.java:140) [spring-batch-core-4.2.1.RELEASE.jar:4.2.1.RELEASE]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_151]
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) ~[na:1.8.0_151]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) ~[na:1.8.0_151]
at java.lang.reflect.Method.invoke(Unknown Source) ~[na:1.8.0_151]
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344) [spring-aop-5.2.5.RELEASE.jar:5.2.5.RELEASE]
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198) [spring-aop-5.2.5.RELEASE.jar:5.2.5.RELEASE]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163) [spring-aop-5.2.5.RELEASE.jar:5.2.5.RELEASE]
at org.springframework.batch.core.configuration.annotation.SimpleBatchConfiguration$PassthruAdvice.invoke(SimpleBatchConfiguration.java:127) [spring-batch-core-4.2.1.RELEASE.jar:4.2.1.RELEASE]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) [spring-aop-5.2.5.RELEASE.jar:5.2.5.RELEASE]
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:212) [spring-aop-5.2.5.RELEASE.jar:5.2.5.RELEASE]
at com.sun.proxy.$Proxy63.run(Unknown Source) [na:na]
at com.example.demo.SpringBatchKafkaReaderApplication.run(SpringBatchKafkaReaderApplication.java:37) [classes/:na]
at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:784) [spring-boot-2.2.6.RELEASE.jar:2.2.6.RELEASE]
at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:768) [spring-boot-2.2.6.RELEASE.jar:2.2.6.RELEASE]
at org.springframework.boot.SpringApplication.run(SpringApplication.java:322) [spring-boot-2.2.6.RELEASE.jar:2.2.6.RELEASE]
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1226) [spring-boot-2.2.6.RELEASE.jar:2.2.6.RELEASE]
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1215) [spring-boot-2.2.6.RELEASE.jar:2.2.6.RELEASE]
at com.example.demo.SpringBatchKafkaReaderApplication.main(SpringBatchKafkaReaderApplication.java:27) [classes/:na]
CustomerRowMapper.java
public class CustomerRowMapper implements RowMapper<Customer> {

    private static final DateTimeFormatter DT_FORMAT = DateTimeFormatter.ofPattern("dd-MM-yyyy HH:mm:ss");

    @Override
    public Customer mapRow(ResultSet rs, int rowNum) throws SQLException {
        return Customer.builder()
                .id(rs.getLong("id"))
                .firstName(rs.getString("firstName"))
                .lastName(rs.getString("lastName"))
                .birthdate(LocalDateTime.parse(rs.getString("birthdate"), DT_FORMAT))
                .build();
    }
}
Customer.java
@Data
@AllArgsConstructor
@Builder
@NoArgsConstructor
public class Customer {

    private Long id;
    private String firstName;
    private String lastName;
    private LocalDateTime birthdate;
}
CustomerLineAggregator.java
public class CustomerLineAggregator implements LineAggregator<Customer> {

    private ObjectMapper objectMapper = new ObjectMapper();

    @Override
    public String aggregate(Customer item) {
        try {
            return objectMapper.writeValueAsString(item);
        } catch (Exception e) {
            throw new RuntimeException("Unable to Serialized Customer", e);
        }
    }
}
JobConfiguration.java
@Configuration
public class JobConfiguration {

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Autowired
    private KafkaProperties properties;

    @Bean
    public KafkaItemReader<Long, Customer> kafkaItemReader() {
        Properties props = new Properties();
        props.putAll(this.properties.buildConsumerProperties());
        return new KafkaItemReaderBuilder<Long, Customer>()
                .partitions(0)
                .consumerProperties(props)
                .name("customers-reader")
                .saveState(true)
                .topic("customers")
                .build();
    }

    @Bean
    public FlatFileItemWriter<Customer> customerItemWriter() throws Exception {
        String customerOutputPath = File.createTempFile("customerOutput", ".out").getAbsolutePath();
        System.out.println(">> Output Path = " + customerOutputPath);
        FlatFileItemWriter<Customer> itemWriter = new FlatFileItemWriter<>();
        // A LineAggregator implementation that simply calls Object.toString() on the given object
        // itemWriter.setLineAggregator(new PassThroughLineAggregator<>());
        // Alternate way:
        itemWriter.setLineAggregator(new CustomerLineAggregator());
        itemWriter.setResource(new FileSystemResource(customerOutputPath));
        itemWriter.afterPropertiesSet();
        return itemWriter;
    }

    @Bean
    public Step step1() throws Exception {
        return stepBuilderFactory.get("step1")
                .<Customer, Customer>chunk(100)
                .reader(kafkaItemReader())
                .writer(customerItemWriter())
                .build();
    }

    @Bean
    public Job job() throws Exception {
        return jobBuilderFactory.get("job").incrementer(new RunIdIncrementer())
                .start(step1())
                .build();
    }
}
application.properties
##
spring.kafka.consumer.key-deserializer=org.apache.kafka.common.serialization.LongDeserializer
spring.kafka.consumer.value-deserializer=org.springframework.kafka.support.serializer.JsonDeserializer
spring.kafka.consumer.group-id=customers-group
##
spring.kafka.producer.key-serializer=org.apache.kafka.common.serialization.LongSerializer
spring.kafka.producer.value-serializer=org.springframework.kafka.support.serializer.JsonSerializer
spring.kafka.producer.client-id=customers-client
##
spring.kafka.consumer.properties.spring.json.trusted.packages=*
##
spring.kafka.template.default-topic=customers
spring.batch.job.enabled=false
The data is stored in Kafka like this. What do I need to do?
The serializer needs a custom ObjectMapper with a LocalDateTime serializer.
In order to do that you need to create the serializer as an object instead of via the properties.
The Kafka binder does not currently support that.
To work around it, create your own serializer that wraps or subclasses a JsonSerializer with an ObjectMapper configured with the JavaTime module; see https://github.com/FasterXML/jackson-modules-java8
This is automatically handled in spring-kafka 2.3 and later.
at org.springframework.kafka.support.serializer.JsonDeserializer.deserialize(JsonDeserializer.java:438) ~[spring-kafka-2.3.7.RELEASE.jar:2.3.7.RELEASE]
Since you are already using Spring Kafka 2.3, you simply need to add the module jar to the classpath.
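For completeness, here is a minimal sketch of the custom-ObjectMapper approach, assuming the jackson-datatype-jsr310 module (JavaTimeModule) is on the classpath; the class and method names below are placeholders, not code from the original project.

import com.example.demo.model.Customer;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.datatype.jsr310.JavaTimeModule;
import org.springframework.kafka.support.serializer.JsonDeserializer;
import org.springframework.kafka.support.serializer.JsonSerializer;

public class CustomerJsonSupport {

    // ObjectMapper that knows how to handle java.time types such as LocalDateTime.
    static ObjectMapper javaTimeMapper() {
        ObjectMapper mapper = new ObjectMapper();
        mapper.registerModule(new JavaTimeModule());
        return mapper;
    }

    // Consumer side: value deserializer for Customer records (what the KafkaItemReader consumes).
    static JsonDeserializer<Customer> customerDeserializer() {
        return new JsonDeserializer<>(Customer.class, javaTimeMapper());
    }

    // Producer side: value serializer, if you also publish Customer records.
    static JsonSerializer<Customer> customerSerializer() {
        return new JsonSerializer<>(javaTimeMapper());
    }
}

With property-based configuration as in this project, simply adding the jackson-datatype-jsr310 jar to the classpath should be enough on Spring Kafka 2.3+, as noted above.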

I can't connect tb-gateway with ThingsBoard on an Ubuntu server

I need your help...
I configured my Ubuntu 16.04.5 server as described at https://thingsboard.io/docs/getting-started-guides/helloworld/ and https://thingsboard.io/docs/iot-gateway/getting-started/, but it doesn't really work. The tb-gateway.log says:
2018-08-02 08:12:17,065 [pool-3-thread-1] INFO o.t.g.service.MqttMessageSender - Sending message [MqttPersistentMessage(id=3156e095-46d0-4e26-a33d-cda11ca330cc, timestamp=0, deviceId=GATEWAY, messageId=905, topic=v1/devices/me/telemetry, payload=[123, 34, 116, 115, 34, 58, 49, 53, 51, 51, 49, 57, 48, 51, 51, 54, 50, 55, 51, 44, 34, 118, 97, 108, 117, 101, 115, 34, 58, 123, 34, 100, 101, 118, 105, 99, 101, 115, 79, 110, 108, 105, 110, 101, 34, 58, 48, 44, 34, 97, 116, 116, 114, 105, 98, 117, 116, 101, 115, 85, 112, 108, 111, 97, 100, 101, 100, 34, 58, 48, 44, 34, 116, 101, 108, 101, 109, 101, 116, 114, 121, 85, 112, 108, 111, 97, 100, 101, 100, 34, 58, 48, 125, 125])]
2018-08-02 08:12:17,065 [pool-3-thread-1] INFO o.t.g.service.MqttMessageSender - Outgoing queue is not empty. [1] messages are still in progress
2018-08-02 08:12:17,065 [pool-3-thread-1] INFO o.t.g.service.MqttMessageSender - Waiting until all messages are sent before going to the next bucket
2018-08-02 08:12:17,068 [pool-4-thread-1] INFO o.t.g.s.gateway.MqttGatewayService - Gateway statistics {"ts":1533190336273,"values":{"devicesOnline":0,"attributesUploaded":0,"telemetryUploaded":0}} reported!
And it repeats this every minute, so it seems to work. But ThingsBoard can't read the message. The thingsboard.log file says:
2018-08-02 07:54:38,405 [Akka-akka.actor.default-dispatcher-2375] WARN o.t.s.a.ruleChain.RuleChainActor - Unknown message: org.thingsboard.server.actors.stats.StatsPersistTick#54663f30!
2018-08-02 07:54:38,435 [Akka-akka.actor.default-dispatcher-2375] WARN o.t.s.actors.ruleChain.RuleNodeActor - Unknown message: org.thingsboard.server.actors.stats.StatsPersistTick#15327281!
2018-08-02 07:54:38,436 [Akka-akka.actor.default-dispatcher-2375] WARN o.t.s.actors.ruleChain.RuleNodeActor - Unknown message: org.thingsboard.server.actors.stats.StatsPersistTick#29bb19d1!
2018-08-02 07:54:38,475 [Akka-akka.actor.default-dispatcher-2375] WARN o.t.s.actors.ruleChain.RuleNodeActor - Unknown message: org.thingsboard.server.actors.stats.StatsPersistTick#26ee903e!
2018-08-02 07:54:42,226 [Akka-akka.actor.default-dispatcher-2375] WARN o.t.s.actors.ruleChain.RuleNodeActor - Unknown message: org.thingsboard.server.actors.stats.StatsPersistTick#78d307d8!
2018-08-02 07:54:42,295 [Akka-akka.actor.default-dispatcher-2375] WARN o.t.s.actors.ruleChain.RuleNodeActor - Unknown message: org.thingsboard.server.actors.stats.StatsPersistTick#c357f50!
2018-08-02 07:54:42,695 [Akka-akka.actor.default-dispatcher-2375] WARN o.t.s.actors.ruleChain.RuleNodeActor - Unknown message: org.thingsboard.server.actors.stats.StatsPersistTick#58f174d1!
And it repeats this every hour, so the connection isn't working. On the ThingsBoard dashboard the gateway can be configured (e.g. extensions), but "Last Telemetry" doesn't show anything.
Please feel free to ask any question if you need more information.
Best regards

Unable to use non-Lagom static project services with the service-locator-dns library

We successfully integrated 'service-locator-dns' in Lagom and deployed it in Kubernetes. All services in the Lagom project resolve properly through Kubernetes SRV requests.
But even non-Lagom services that are statically defined (in build.sbt) also go through the name-translators and srv-translators and ultimately fail to resolve.
I have raised an issue for this on GitHub: https://github.com/lightbend/service-locator-dns/issues/29
Can we avoid this with changes to the name-translators themselves, or do we need to make additional changes?
It would be very helpful if you could provide guidance or point us to any relevant documentation.
Log in Kubernetes:
Resolving: premium-calculator
Translated premium-calculator to _http-lagom-api._tcp.premium-calculator.staging.svc.cluster.local
Resolving _http-lagom-api._tcp.premium-calculator.staging.svc.cluster.local (SRV)
Message to /10.114.0.10:53: Message(16,<QUERY,RD,SUCCESS>,List(Question(_http-lagom-api._tcp.premium-calculator.staging.svc.cluster.local,SRV,IN)),List(),List(),List())
Received message from /10.114.0.10:53: ByteString(0, 16, -127, -125, 0, 1, 0, 0, 0, 1, 0, 0, 15, 95, 104, 116, 116, 112, 45, 108, 97, 103, 111, 109, 45, 97, 112, 105, 4, 95, 116, 99, 112, 18, 112, 114, 101, 109, 105, 117, 109, 45, 99, 97, 108, 99, 117, 108, 97, 116, 111, 114, 7, 115, 116, 97, 103, 105, 110, 103, 3, 115, 118, 99, 7, 99, 108, 117, 115, 116, 101, 114, 5, 108, 111, 99, 97, 108, 0, 0, 33, 0, 1, 7, 99, 108, 117, 115, 116, 101, 114, 5, 108, 111, 99, 97, 108, 0, 0, 6)... and [76] more
Decoded: Message(16,<AN,QUERY,RD,RA,NAME_ERROR>,Vector(Question(_http-lagom-api._tcp.premium-calculator.staging.svc.cluster.local,SRV,IN)),Vector(),Vector(UnknownRecord(cluster.local,60,6,1,ByteString(2, 110, 115, 3, 100, 110, 115, 7, 99, 108, 117, 115, 116, 101, 114, 5, 108, 111, 99, 97, 108, 0, 10, 104, 111, 115, 116, 109, 97, 115, 116, 101, 114, 7, 99, 108, 117, 115, 116, 101, 114, 5, 108, 111, 99, 97, 108, 0, 90, -80, -107, 80, 0, 0, 112, -128, 0, 0, 28, 32, 0, 9, 58, -128, 0, 0, 0, 60))),Vector())
Resolved: Vector()
java.lang.IllegalStateException: Service premium-calculator was not found by service locator
Service trait:
trait PremiumCalculator extends Service {

  def getPremiums(channelName: String): ServiceCall[JsValue, JsValue]

  override final def descriptor = {
    import Service._
    named("premium-calculator")
      .withCalls(
        restCall(Method.POST, "/api/v2/premium/:channelName", getPremiums _))
      .withAutoAcl(true)
  }
}
In build.sbt:
lagomUnmanagedServices in ThisBuild := Map(
"premium-calculator" -> "https://test.in",
)
For locating a non-Lagom/third-party service in Lagom on Kubernetes, we have to use Lagom's service locator, like this:
lagom.services {
  "premium-calculator" = "https://test.in"
}
Also, we have to use ConfigurationServiceLocator to locate the service:
if (environment.isProd()) {
    bind(ServiceLocator.class).to(ConfigurationServiceLocator.class);
}
Here ConfigurationServiceLocator locates the service via configuration (as the name suggests).
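For reference, that binding usually lives in a Guice module along these lines. This is a hedged sketch of the standard Lagom Java pattern; the module name and constructor arguments are placeholders rather than code from this project.

import com.google.inject.AbstractModule;
import com.lightbend.lagom.javadsl.api.ServiceLocator;
import com.lightbend.lagom.javadsl.client.ConfigurationServiceLocator;
import com.typesafe.config.Config;
import play.Environment;

public class ServiceLocatorModule extends AbstractModule {

    private final Environment environment;

    // Play passes (Environment, Config) to module constructors; config is unused here.
    public ServiceLocatorModule(Environment environment, Config config) {
        this.environment = environment;
    }

    @Override
    protected void configure() {
        // In production, resolve services from the lagom.services configuration block
        // instead of DNS-based lookup.
        if (environment.isProd()) {
            bind(ServiceLocator.class).to(ConfigurationServiceLocator.class);
        }
    }
}

The module then has to be enabled (for example via play.modules.enabled in application.conf) so that it is picked up at startup.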
I hope this helps!

Spark Scala TF-IDF value sorted vectors

So far I have been able to tokenize all of my documents, and use CountVectorizer and IDF from Spark's MLLib. I am trying to get the top 50 words from each document, but I am not sure how to sort the output of IDF.
onePer is a dataframe of document IDs and tokenized documents.
val tf = new CountVectorizer()
.setInputCol("text")
.setOutputCol("features").fit(onePer)
.transform(onePer).select("features").rdd
.map{x:Row => x.getAs[Vector](0)}
tf.cache()
val idf = new IDF().fit(tf)
val tfidf: RDD[Vector] = idf.transform(tf)
This is what my output looks like (vocabulary size, array of word IDs, array of word scores). I would like to sort by score and get the top k:
(440,[0,2,3,4,5,6,7,8,9,10,12,15,17,18,19,22,23,24,25,26,27,28,30,31,32,33,34,35,39,41,43,45,47,49,51,52,53,55,57,63,66,69,70,71,74,76,79,80,83,84,85,88,94,95,96,97,99,102,106,107,109,111,117,120,121,124,127,128,129,138,142,145,146,149,154,156,164,166,167,170,171,176,187,189,199,203,204,217,218,219,232,234,236,237,238,240,248,250,251,254,259,263,265,267,280,291,296,302,304,309,319,322,328,333,347,361,364,371,375,384,388,393,395,401,403,433,438,439],[1.3559553712291716,3.9422868018213513,0.6369074622370692,7.795697904781566,3.153829441457081,0.0,5.519201522549892,0.3184537311185346,0.3184537311185346,1.3559553712291716,0.4519851237430572,0.4519851237430572,0.6061358035703155,1.0116009116784799,0.4519851237430572,0.7884573603642703,0.4519851237430572,2.0232018233569597,0.7884573603642703,8.523740461192126,0.6061358035703155,0.6061358035703155,0.6061358035703155,0.6061358035703155,0.7884573603642703,0.6061358035703155,0.6061358035703155,0.6061358035703155,0.7884573603642703,0.7884573603642703,1.0116009116784799,1.0116009116784799,2.0232018233569597,0.7884573603642703,0.7884573603642703,3.897848952390783,0.7884573603642703,0.7884573603642703,1.0116009116784799,5.114244276715276,1.0116009116784799,1.0116009116784799,2.5985659682605218,1.2992829841302609,1.2992829841302609,1.0116009116784799,1.0116009116784799,1.0116009116784799,1.0116009116784799,1.0116009116784799,2.5985659682605218,1.0116009116784799,1.2992829841302609,1.2992829841302609,1.2992829841302609,1.2992829841302609,1.2992829841302609,1.2992829841302609,1.2992829841302609,1.2992829841302609,3.4094961844768505,1.2992829841302609,1.2992829841302609,1.2992829841302609,1.2992829841302609,3.4094961844768505,1.2992829841302609,1.2992829841302609,1.2992829841302609,3.4094961844768505,1.2992829841302609,1.2992829841302609,1.2992829841302609,1.2992829841302609,1.2992829841302609,1.2992829841302609,1.2992829841302609,1.2992829841302609,1.2992829841302609,1.2992829841302609,1.2992829841302609,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253])
Update
I was able to get this working by doing the following:
tfidf.map(x => x.toSparse).map { x =>
  x.indices.zip(x.values)
    .sortBy(-_._2)
    .take(10)
    .map(_._1)
}
This might help:
scala> val x = (440,Array[Int](0,2,3,4,5,6,7,8,9,10,12,15,17,18,19,22,23,24,25,26,27,28,30,31,32,33,34,35,39,41,43,45,47,49,51,52,53,55,57,63,66,69,70,71,74,76,79,80,83,84,85,88,94,95,96,97,99,102,106,107,109,111,117,120,121,124,127,128,129,138,142,145,146,149,154,156,164,166,167,170,171,176,187,189,199,203,204,217,218,219,232,234,236,237,238,240,248,250,251,254,259,263,265,267,280,291,296,302,304,309,319,322,328,333,347,361,364,371,375,384,388,393,395,401,403,433,438,439),Array[Double](1.3559553712291716,3.9422868018213513,0.6369074622370692,7.795697904781566,3.153829441457081,0.0,5.519201522549892,0.3184537311185346,0.3184537311185346,1.3559553712291716,0.4519851237430572,0.4519851237430572,0.6061358035703155,1.0116009116784799,0.4519851237430572,0.7884573603642703,0.4519851237430572,2.0232018233569597,0.7884573603642703,8.523740461192126,0.6061358035703155,0.6061358035703155,0.6061358035703155,0.6061358035703155,0.7884573603642703,0.6061358035703155,0.6061358035703155,0.6061358035703155,0.7884573603642703,0.7884573603642703,1.0116009116784799,1.0116009116784799,2.0232018233569597,0.7884573603642703,0.7884573603642703,3.897848952390783,0.7884573603642703,0.7884573603642703,1.0116009116784799,5.114244276715276,1.0116009116784799,1.0116009116784799,2.5985659682605218,1.2992829841302609,1.2992829841302609,1.0116009116784799,1.0116009116784799,1.0116009116784799,1.0116009116784799,1.0116009116784799,2.5985659682605218,1.0116009116784799,1.2992829841302609,1.2992829841302609,1.2992829841302609,1.2992829841302609,1.2992829841302609,1.2992829841302609,1.2992829841302609,1.2992829841302609,3.4094961844768505,1.2992829841302609,1.2992829841302609,1.2992829841302609,1.2992829841302609,3.4094961844768505,1.2992829841302609,1.2992829841302609,1.2992829841302609,3.4094961844768505,1.2992829841302609,1.2992829841302609,1.2992829841302609,1.2992829841302609,1.2992829841302609,1.2992829841302609,1.2992829841302609,1.2992829841302609,1.2992829841302609,1.2992829841302609,1.2992829841302609,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253,1.7047480922384253))
scala> val (r, indices, values) = x
r: Int = 440
indices: Array[Int] = Array(0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 17, 18, 19, 22, 23, 24, 25, 26, 27, 28, 30, 31, 32, 33, 34, 35, 39, 41, 43, 45, 47, 49, 51, 52, 53, 55, 57, 63, 66, 69, 70, 71, 74, 76, 79, 80, 83, 84, 85, 88, 94, 95, 96, 97, 99, 102, 106, 107, 109, 111, 117, 120, 121, 124, 127, 128, 129, 138, 142, 145, 146, 149, 154, 156, 164, 166, 167, 170, 171, 176, 187, 189, 199, 203, 204, 217, 218, 219, 232, 234, 236, 237, 238, 240, 248, 250, 251, 254, 259, 263, 265, 267, 280, 291, 296, 302, 304, 309, 319, 322, 328, 333, 347, 361, 364, 371, 375, 384, 388, 393, 395, 401, 403, 433, 438, 439)
values: Array[Double] = Array(1.3559553712291716, 3.9422868018213513, 0.6369074622370692, 7.795697904781566, 3.153829441457081, 0.0, 5.519201522549892, 0.3184537311185346, 0.31845373...
scala> val topTermIds = indices.zip(values).sortBy( - _._2).take(50).map(_._1)
topTermIds: Array[Int] = Array(26, 4, 7, 63, 2, 52, 109, 124, 138, 5, 70, 85, 24, 47, 176, 187, 189, 199, 203, 204, 217, 218, 219, 232, 234, 236, 237, 238, 240, 248, 250, 251, 254, 259, 263, 265, 267, 280, 291, 296, 302, 304, 309, 319, 322, 328, 333, 347, 361, 364)
Now you need to plug the above code into a closure over your RDD of vectors, something like:
val topTermsByScore = rdd.map(_.toSparse).map { v =>
  // sort by decreasing score (note the minus sign), then keep the indices of the top 50 terms
  v.indices.zip(v.values).sortBy(-_._2).take(50).map(_._1)
}

E11000 duplicate key error index: MongoDb unusual error

I have a simple "users" collection in which I currently have only 2 documents:
{
"_id": ObjectId("4ef8e1e41d41c87069000074"),
"email_id": {
"0": 109,
"1": 101,
"2": 64,
"3": 97,
{
"_id": ObjectId("4ef6d2641d41c83bdd000001"),
"email_id": {
"0": 109,
"1": 97,
"2": 105,
"3": 108,
Now if I try to create a new index with {unique: true} on the email_id field, MongoDB complains with "E11000 duplicate key error index: db.users.$email_id dup key: { : 46 }". I get the same error even after specifying {dropDups: true}; however, I don't think that is the issue here, as both documents have different email IDs stored.
I am not sure what's going on here; any pointers will be greatly appreciated.
Edit: Full view of documents:
{
"_id": ObjectId("4ef8e1e41d41c87069000074"),
"email_id": {
"0": 109,
"1": 101,
"2": 64,
"3": 97,
"4": 98,
"5": 104,
"6": 105,
"7": 110,
"8": 97,
"9": 118,
"10": 115,
"11": 105,
"12": 110,
"13": 103,
"14": 104,
"15": 46,
"16": 99,
"17": 111,
"18": 109
}
}
and
{
"_id": ObjectId("4ef6d2641d41c83bdd000001"),
"email_id": {
"0": 109,
"1": 97,
"2": 105,
"3": 108,
"4": 115,
"5": 102,
"6": 111,
"7": 114,
"8": 97,
"9": 98,
"10": 104,
"11": 105,
"12": 110,
"13": 97,
"14": 118,
"15": 64,
"16": 103,
"17": 109,
"18": 97,
"19": 105,
"20": 108,
"21": 46,
"22": 99,
"23": 111,
"24": 109
}
}
There are a couple more fields like "display_name", "registered_since", etc., which I have omitted from the display above (I don't think they play any role in the error thrown; if you still need them I can paste the entire documents here).
I am using the Erlang MongoDB driver to communicate with my Mongo instance. As you can see, all fields are saved as binary bytes, which is why the email_id in the documents looks so odd.
Note: the binary byte format is not forced by my code logic; I pass a string email_id in my BSON documents, but I always end up seeing my data as binary bytes. (Probably because of how the Erlang MongoDB driver is written; I didn't really investigate this, since my find(), find_one() and other queries work as expected even with fields saved as binary bytes.)
Edit: > db.users.findOne()
{
"_id" : ObjectId("4ef6d2641d41c83bdd000001"),
"email_id" : [
109,
97,
105,
108,
115,
102,
111,
114,
97,
98,
104,
105,
110,
97,
118,
64,
103,
109,
97,
105,
108,
46,
99,
111,
109
],
"display_name" : [
65,
98,
104,
105,
110,
97,
118,
43,
83,
105,
110,
103,
104
],
"provider" : [
106,
97,
120,
108,
46,
105,
109
],
"provider_id" : [ ]
}
When MongoDB indexes an array field, it actually indexes the individual elements in the array. This is to efficiently support queries looking for a particular element of an array, like:
db.users.find({email_id: 46})
Since this email_id element (46, the byte for '.') exists in both documents, there are duplicate keys in your unique index.
I'm not sure why you would get this error if you have dropDups: true set... can you show a code sample with how you're invoking createIndex? You should also try dropDups: 1, as MongoDB erroneously treats 1 and true differently in this context (see https://jira.mongodb.org/browse/SERVER-4562).
For others having this problem, check your mongo version with db.version(). If you are running Mongo 3 and are trying to use dropDups to clear duplicates, it will fail and give you this error.