Compose Transporter throws an error when collection_filters is set to sync the current day's data from DocumentDB/MongoDB to file/Elasticsearch - mongodb

I am using Compose Transporter to sync data from DocumentDB to an Elasticsearch instance in AWS. After a one-time sync, I added the following collection_filters to pipeline.js to sync incremental data daily:
// pipeline.js
var source = mongodb({
  "uri": "mongodb <URI>",
  "ssl": true,
  "collection_filters": '{ "mycollection": { "createdDate": { "$gt": new Date(Date.now() - 24*60*60*1000) } }}',
})
var sink = file({
  "uri": "file://mongo_dump.json"
})
t.Source("source", source, "^mycollection$").Save("sink", sink, "/.*/")
I get the following error:
$ transporter run pipeline.js
panic: malformed collection_filters [recovered]
panic: Panic at 32: malformed collection_filters [recovered]
panic: Panic at 32: malformed collection_filters
goroutine 1 [running]:
github.com/compose/transporter/vendor/github.com/dop251/goja.(*Runtime).RunProgram.func1(0xc420101d98)
/Users/JP/gocode/src/github.com/compose/transporter/vendor/github.com/dop251/goja/runtime.go:779 +0x98
When I change collection_filters so that the value of the "$gt" key is a single string token (see below), the malformed error vanishes, but it doesn't fetch any documents:
'{ "mycollection": { "createdDate": { "$gt": "new Date(Date.now() - 24*60*60 * 1000)" } }}',
To check whether something is fundamentally wrong with the way I am querying, I tried a simple string filter, and that works well:
"collection_filters": '{ "articles": { "createdBy": "author name" }}',
I tried various ways to pass the createdDate filter, but I either get the malformed error or no data. However, the same filter in the mongo shell gives me the expected output. Note that I tried both Elasticsearch and file as the sink before asking here.
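For what it's worth, the "malformed collection_filters" panic is consistent with the filter value being parsed as JSON: new Date(...) is a JavaScript expression, not valid JSON, and quoting it just compares createdDate against that literal string, which never matches a date field. One possible workaround is to compute the cutoff in pipeline.js itself (which is evaluated as JavaScript) and interpolate it into the filter string, so the string stays valid JSON. The sketch below assumes the mongodb adaptor understands an extended-JSON style {"$date": ...} value in filters, which is worth verifying against your transporter version before relying on it:

// pipeline.js (sketch): build the cutoff timestamp in JavaScript, outside the JSON string
var cutoff = new Date(Date.now() - 24*60*60*1000).toISOString();
var source = mongodb({
  "uri": "mongodb <URI>",
  "ssl": true,
  // The filter string is now plain JSON; whether "$date" is converted to a BSON date
  // is an assumption about the adaptor - verify on your transporter version.
  "collection_filters": '{ "mycollection": { "createdDate": { "$gt": { "$date": "' + cutoff + '" } } } }',
})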

Related

How to parse a protobuf object in a Kafka message with Trino

I run Trino in a Docker container using this command and passing configuration files:
sudo docker run -p 8080:8080 -v /mnt/c/trino-config-test/catalog/kafka.properties:/etc/trino/catalog/kafka.properties -v /mnt/c/trino-config-test/topic_description/kafka.mario.test.json:/etc/kafka/kafka.mario.test.json trinodb/trino
kafka.properties
connector.name=kafka
kafka.nodes=(hidden):9092
kafka.table-names=mario.test
kafka.hide-internal-columns=false
kafka.mario.test.json
{
  "tableName": "test",
  "schemaName": "mario",
  "topicName": "mario.test",
  "key": {
    "dataFormat": "raw",
    "fields": [
      {
        "name": "kafka_key",
        "dataFormat": "LONG",
        "type": "BIGINT",
        "hidden": "false"
      }
    ]
  }
}
I checked inside the container, and the files are copied successfully.
I successfully managed to connect Trino to Kafka; in fact, I can run queries on the topic "mario.test" using the CLI.
The topic has the following structure:
The key is of type String, but its actual value is an integer, e.g. "12345"
The value is a Protobuf object
I have two problems:
kafka.mario.test.json is not picked up for whatever reason; in fact, the column "kafka_key" is not created. It's not relevant for the moment, but I wanted to mention it anyway.
When I try to parse the Protobuf object, I get an error message.
This is the code for reference:
try (
    Connection conn = DriverManager.getConnection(url, properties);
    Statement stmt = conn.createStatement();
    ResultSet rs = stmt.executeQuery("SELECT _key as key, _partition_id as partition, _message as value FROM kafka.mario.test LIMIT 1");
) {
    log.info("Iterating on result set...");
    // Extract data from result set
    while (rs.next()) {
        log.info("{}", rs.getBigDecimal("partition"));
        log.info("{}", rs.getString("key"));
        final String valueString = rs.getString("value");
        final ScheduledJob value = ScheduledJob.parseFrom(valueString.getBytes(StandardCharsets.UTF_8)); // EXCEPTION THROWN
        log.info("{}", value);
    }
} catch (SQLException e) {
    e.printStackTrace();
}
The error message is:
Exception in thread "main" com.google.protobuf.InvalidProtocolBufferException: Protocol message contained an invalid tag (zero).
I can't really find a way to deserialize that object properly, and Trino doesn't offer any support for Protobuf as far as I could read. I hope someone is able to help me.
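The exception makes sense given the round trip the bytes take: the Kafka connector exposes _message as a UTF-8 encoded string, and arbitrary Protobuf bytes are not valid UTF-8, so converting that string back with getBytes(StandardCharsets.UTF_8) no longer reproduces the original wire format, which is why the parser reports an invalid tag. Until the payload can be read without a string conversion (newer Trino releases have since added a protobuf decoder to the Kafka connector, worth checking for your version), one workaround is to bypass Trino for the value and read the raw bytes with a plain Kafka consumer. A sketch, assuming the broker address (hidden in the question), the mario.test topic, and the generated ScheduledJob class from the original code:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class RawProtobufReader {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "BROKER:9092"); // assumption: same broker as kafka.nodes
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "protobuf-debug");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // ByteArrayDeserializer hands back the message bytes untouched.
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());

        try (KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("mario.test"));
            ConsumerRecords<String, byte[]> records = consumer.poll(Duration.ofSeconds(10));
            for (ConsumerRecord<String, byte[]> record : records) {
                // Parse the raw wire bytes with the generated Protobuf class.
                ScheduledJob value = ScheduledJob.parseFrom(record.value());
                System.out.println(value);
            }
        }
    }
}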

MongoDB Kafka connect ChangeStreamHandler does not support truncatedArrays

I am using ChangeStreamHandler in the MongoDB Kafka sink connector to stream changes from a Mongo source to a sink collection:
"change.data.capture.handler": "com.mongodb.kafka.connect.sink.cdc.mongodb.ChangeStreamHandler"
On update events from the source MongoDB collection, the change stream handler fails with the exception:
ERROR Unable to process record SinkRecord{kafkaOffset=3, timestampType=CreateTime} ConnectRecord{topic='quickstart.sampleData', kafkaPartition=0, key={"_id": {"_data": "8262A5CD4B000000012B022C0100296E5A1004B80560BF7F114B04962A5F523CEAB5D046645F6964006462A5CC9B84956FD488691BF10004"}}, keySchema=Schema{STRING}, value={"_id": {"_data": "8262A5CD4B000000012B022C0100296E5A1004B80560BF7F114B04962A5F523CEAB5D046645F6964006462A5CC9B84956FD488691BF10004"}, "operationType": "update", "clusterTime": {"$timestamp": {"t": 1655033163, "i": 1}}, "ns": {"db": "quickstart", "coll": "sampleData"}, "documentKey": {"_id": {"$oid": "62a5cc9b84956fd488691bf1"}}, "updateDescription": {"updatedFields": {"hello": "moto"}, "removedFields": [], "truncatedArrays": []}}, valueSchema=Schema{STRING}, timestamp=1655033166742, headers=ConnectHeaders(headers=)} (com.mongodb.kafka.connect.sink.MongoProcessedSinkRecordData)
org.apache.kafka.connect.errors.DataException: Warning unexpected field(s) in updateDescription [truncatedArrays]. {"updatedFields": {"hello": "moto"}, "removedFields": [], "truncatedArrays": []}. Cannot process due to risk of data loss.
at com.mongodb.kafka.connect.sink.cdc.mongodb.operations.OperationHelper.getUpdateDocument(OperationHelper.java:99)
at com.mongodb.kafka.connect.sink.cdc.mongodb.operations.Update.perform(Update.java:57)
at com.mongodb.kafka.connect.sink.cdc.mongodb.ChangeStreamHandler.handle(ChangeStreamHandler.java:84)
at com.mongodb.kafka.connect.sink.MongoProcessedSinkRecordData.lambda$buildWriteModelCDC$3(MongoProcessedSinkRecordData.java:99)
at java.base/java.util.Optional.flatMap(Optional.java:294)
Below is the change stream event received on the sink side:
{"schema":{"type":"string","optional":false},"payload":"{\"_id\": {\"_data\": \"8262A5CD4B000000012B022C0100296E5A1004B80560BF7F114B04962A5F523CEAB5D046645F6964006462A5CC9B84956FD488691BF10004\"}, \"operationType\": \"update\", \"clusterTime\": {\"$timestamp\": {\"t\": 1655033163, \"i\": 1}}, \"ns\": {\"db\": \"quickstart\", \"coll\": \"sampleData\"}, \"documentKey\": {\"_id\": {\"$oid\": \"62a5cc9b84956fd488691bf1\"}}, \"updateDescription\": {\"updatedFields\": {\"hello\": \"moto\"}, \"removedFields\": [], \"truncatedArrays\": []}}"}
On looking at the code in class
com.mongodb.kafka.connect.sink.cdc.mongodb.operations.OperationHelper.getUpdateDocument(OperationHelper.java:99)
It shows that the updateDescription handling only covers updatedFields and removedFields; support for truncatedArrays is not present.
Is this a bug, or do I need to tune my source connector to somehow stop sending truncatedArrays in change events?
I had the same issue, and I was able to solve it by setting the following configuration on the source connector:
"change.stream.full.document": "updateLookup"
A full example:
{
  "name": "mongo-simple-source",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
    "connection.uri": "yourMongodbUri",
    "database": "yourDataBase",
    "collection": "yourCollection",
    "change.stream.full.document": "updateLookup"
  }
}
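For completeness, one way to register that source config, assuming a Kafka Connect worker with its REST API on the default localhost:8083 and the JSON above saved as mongo-simple-source.json:

curl -X POST -H "Content-Type: application/json" \
  --data @mongo-simple-source.json \
  http://localhost:8083/connectors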

OpsManager mongodb deployment issue adding PLAIN auth

I'm trying to enable PLAIN authentication over a MongoDB replica set shard managed with Ops Manager, following their documentation: https://docs.opsmanager.mongodb.com/v4.0/tutorial/enable-ldap-authentication-for-group/ .
The issue I'm facing is that the automation agent fails to get the mongos status while restarting after enabling security. Please see the error output below:
<mongos_5> [09:18:19.711] Failed to compute states :
<mongos_5> [09:18:19.711] Error calling ComputeState : <mongos_5> [09:18:19.632] Error getting current config from running mongo using conn params = mongos01:27017 (local=false) :
<mongos_5> [09:18:19.632] Error getting pid for mongos01:27017 (local=false) :
<mongos_5> [09:18:19.632] Error running command for runCommandWithTimeout(dbName=admin, cmd=[{serverStatus 1} {locks false} {recordStats false}]) :
result={"$clusterTime":{"clusterTime":6808443558471663617,"signature": {"hash":"e44BxV30B7dTpampo4VZsVuio7E=","keyId":6808441655801151517}},"code":13,"codeName":"Unauthorized",
"errmsg":"command serverStatus requires authentication","ok":0,"operationTime":6808443558471663617} connection=&{mongos01:27017 (local=false) 2 true 0xc4207b21a0 2020-03-26 09:18:19.627337419 +0000 UTC 0xc4207bdef0 <nil> }
identityUsed= : command serverStatus requires authentication
I noticed that even though Ops Manager is not able to get the status, security was enabled successfully and the PLAIN authentication mechanism works, but the status hangs at
Start the process ... Start MongoDB process
I tried this over the API following the mongodb-labs repo https://github.com/mongodb-labs/mms-api-examples/blob/master/automation/api_usage_example/configs/security_ldap_cluster.json and also manually following the MongoDB docs, but every time I face the same error.
In the end, I enabled LDAP (PLAIN) only for mongod in the mongo config (see the Ops Manager API snippet below) and avoided enabling it in Ops Manager for the agents as well.
{
  "args2_6": {
    "net": {
      "port": 28001
    },
    "replication": {
      "replSetName": "rs0"
    },
    "storage": {
      "dbPath": "/data/mongo"
    },
    "systemLog": {
      "destination": "file",
      "path": "/data/mongo/mongodb.log"
    },
    "security": {
      "authorization": "enabled"
    },
    "setParameter": {
      "saslauthdPath": "",
      "authenticationMechanisms": "PLAIN,MONGO-CR,SCRAM-SHA-256"
    }
  }, ...
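As a side note, one way to confirm that PLAIN (LDAP) logins work against mongod is from the mongo shell; PLAIN users authenticate against the $external database. A sketch with placeholder credentials:

db.getSiblingDB("$external").auth({
  mechanism: "PLAIN",
  user: "ldapUser",        // placeholder - an actual LDAP username
  pwd: "ldapPassword",     // placeholder
  digestPassword: false
})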

resource type error while trying to use cloudformation

I tried to use the exact same example provided in the user guide linked below. It works from the console but fails to create the stack using the CLI.
https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-athena-namedquery.html
I got an error while trying to execute the following:
{
  "Resources": {
    "AthenaNamedQuery": {
      "Type": "AWS::Athena::NamedQuery",
      "Properties": {
        "Database": "swfnetadata",
        "Description": "A query that selects all aggregated data",
        "Name": "MostExpensiveWorkflow",
        "QueryString": "SELECT workflowname, AVG(activitytaskstarted) AS AverageWorkflow FROM swfmetadata WHERE year='17' AND GROUP BY workflowname ORDER BY AverageWorkflow DESC LIMIT 10"
      }
    }
  }
}
Is the "create-stack" parameter of cloudformation correct?
aws cloudformation create-stack --stack-name dnd --template-body file://final.json
Why am I getting a resource type error like this?
An error occurred (ValidationError) when calling the CreateStack operation: Template format error: Unrecognized resource types: [AWS::Athena::NamedQuery]
It worked when I updated my CLI version as suggested in the comment. This issue is now closed.
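Since the fix turned out to be simply updating the CLI, a quick way to check the installed version and upgrade is shown below (a sketch; the upgrade command depends on how the CLI was installed, pip shown for a v1 install):

aws --version
# Upgrade a pip-based AWS CLI v1 install; use the bundled/v2 installer otherwise.
pip install --upgrade awscli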

Fiware-Cygnus ERROR: invalid input syntax for type point

I have an attribute in Fiware-Orion of type geo:point, as follows:
{
  "name": "Coords",
  "type": "geo:point",
  "value": "LATITUDE,LONGITUDE"
}
Cygnus is subscribed to Orion. When the Cygnus PostgreSQLSink receives the event and tries to store it in the PostgreSQL database, I get the following error:
WARN sinks.OrionSink: Bad context data (ERROR: invalid input syntax for type point: "[]"
What should I use in Fiware-Orion?
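For reference, Orion expects a geo:point value to be a string of numeric coordinates in "latitude, longitude" order; the empty point "[]" in the error suggests the value reaching Cygnus is not in that form. A sketch with arbitrary example coordinates:

{
  "name": "Coords",
  "type": "geo:point",
  "value": "40.418889, -3.691944"
}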