I have a bunch of documents in a MongoDB collection, each with a fingerprint made up of 8 booleans in a sequence. I would like to construct a query that returns the documents where at least 7 of the 8 booleans match my query at the same positions.
So my query would be something along the lines of: find all documents whose
fingerPrint = ["true", "false", "true", "true", "true", "true", "false", "true" ]
The query should return both of the documents below, since the first document matches all 8 booleans in the sequence and the second document matches 7 of the 8.
{
"_id" : ObjectId("5538e75c3cea103b25ff94a3"),
"name" : "document1",
"fingerPrint" : [
"true",
"false",
"true",
"true",
"true",
"true",
"false",
"true"
]
},
{
"_id" : ObjectId("5538e75c3cea103b25ff94a4"),
"name" : "document2",
"fingerPrint" : [
"true",
"false",
"true",
"true",
"false",
"true",
"false",
"true"
]
}
How would I go about doing this?
Alternatively: is there a better way of storing the bit array so that the collection can be queried more efficiently?
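One possible approach is an aggregation pipeline (MongoDB 3.4+) that counts the positions where the stored fingerPrint agrees with the query array and keeps documents with at least 7 matches. A minimal sketch, assuming the values stay as the strings "true"/"false" shown above and using a placeholder collection name:

var query = ["true", "false", "true", "true", "true", "true", "false", "true"];

db.collection.aggregate([
    {
        // count how many positions of fingerPrint agree with the query array
        $addFields: {
            matches: {
                $size: {
                    $filter: {
                        input: { $zip: { inputs: ["$fingerPrint", query] } },
                        as: "pair",
                        cond: {
                            $eq: [
                                { $arrayElemAt: ["$$pair", 0] },
                                { $arrayElemAt: ["$$pair", 1] }
                            ]
                        }
                    }
                }
            }
        }
    },
    // keep documents with at least 7 matching positions
    { $match: { matches: { $gte: 7 } } }
])

As for the alternative storage question: packing the 8 booleans into a single integer bitmask would let you use bitwise query operators such as $bitsAllSet, but counting "7 of 8" matching positions would still require an aggregation or an application-side step.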
We are trying to move all the records from MongoDB to Kafka using the com.mongodb.kafka.connect.MongoSourceConnector. The connector is configured with the following settings:
{
"name": "mongo-source",
"config": {
"connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
"connection.uri": "mongodb:***:27017/?authSource=admin&replicaSet=myMongoCluster&authMechanism=SCRAM-SHA-256",
"database": "someDb",
"collection": "someCollection",
"output.format.value":"json",
"output.format.key":"json",
"key.converter.schemas.enable":"false",
"value.converter.schemas.enable":"false",
"key.converter":"org.apache.kafka.connect.storage.StringConverter",
"value.converter":"org.apache.kafka.connect.storage.StringConverter",
"publish.full.document.only": "true",
"change.stream.full.document":"updateLookup",
"copy.existing": "true"
}
}
When the documents are initially copied from MongoDB to Kafka, the "key" corresponds to the "_id" of the Mongo document:
{"_id": {"_id": {"$oid": "5e54fd0fbb5b5a7d35737232"}, "copyingData": true}}
But when a document in MongoDB is updated, the update arrives in Kafka with a different "key":
{"_id": {"_data": "82627B2EF6000000022B022C0100296E5A1004A47945EC361D42A083988C14D982069C46645F696400645F0FED2B3A35686E505F5ECA0004"}}
As a result, the consumer cannot match the update to the initially uploaded document.
Please help me find which settings on the Kafka, connector, or MongoDB side are responsible for this, and how I can make the "key" in Kafka the same as it is during the initial upload.
We were facing the same issue, and after some searching we started using the following config. We defined an Avro schema for output.schema.key to extract the key. The key is now generated consistently and looks like:
Struct{fullDocument._id=e2ce4bfe-d03a-4192-830d-895df5a4b095}
Here "e2ce4bfe-d03a-4192-830d-895df5a4b095" is the document id.
{
"change.stream.full.document" : "updateLookup",
"connection.uri" : "<connection_uri>",
"connector.class" : "com.mongodb.kafka.connect.MongoSourceConnector",
"collection": "someCollection",
"copy.existing" : "true",
"database" : "someDb",
"key.converter" : "org.apache.kafka.connect.storage.StringConverter",
"key.converter.schemas.enable" : "false",
"key.serializer" : "org.apache.kafka.connect.storage.StringConverter",
"name" : "mongo-source",
"output.format.key" : "schema",
"output.json.formatter" : "com.mongodb.kafka.connect.source.json.formatter.SimplifiedJson"
"publish.full.document.only" : "true",
"output.schema.key": "{\"type\":\"record\",\"name\":\"keySchema\",\"fields\":[{\"name\":\"fullDocument._id\",\"type\":\"string\"}]}",
"tasks.max" : "1",
"value.converter" : "org.apache.kafka.connect.storage.StringConverter",
"value.converter.schemas.enable" : "false"
}
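As far as I can tell, the key stays consistent across both phases because the copy.existing snapshot events and the updateLookup change events both carry the document in fullDocument, so the key schema extracts the same fullDocument._id either way.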
I have stored the following object in an hstore column in Postgres:
"verified": {
"dob": "true",
"name": "false",
"email": "true"
},
How can I query for rows where "dob" = "true"?
You can write the clause as a string:
this.yourEntityRepository.find({ where: "verified -> 'dob' = 'true'" })
or
this.yourEntityRepository.createQueryBuilder().where("verified -> 'dob' = 'true'").getMany()
(Note that the hstore -> operator returns text, so the comparison is against the string 'true' rather than a boolean.)
How do I retrieve the incoming headers of a Kafka message in Kafka Connect, so that I can store them as additional data fields in MongoDB with the MongoDB Sink Connector?
I have a kafka topic "PROJECT_EXAMPLE_TOPIC".
As you can see from my configuration below, I am already able to save the message timestamp, the incoming message data, and the Mongo document created/updated dates.
I guess there is a function to extract the headers somewhere.
Example kafka value
// incoming kafka value
{
"msgId" : "exampleId"
}
How do I get the original header header_foo?
//expected example
{
"_id" : ObjectId("5f83869c1ad2db246fa25a5a"),
"_insertedTS" : ISODate("2020-10-11T22:26:36.051Z"),
"_modifiedTS" : ISODate("2020-10-11T22:26:36.051Z"),
"message_source" : "mongo_connector",
"message_timestamp" : ISODate("2020-09-28T21:50:54.940Z"),
"message_topic" : "PROJECT_EXAMPLE_TOPIC",
"msgId" : "exampleId",
"message_header_foo" : "header_foo_value"
}
How do I get all Kafka headers?
//expected example
{
"_id" : ObjectId("5f83869c1ad2db246fa25a5a"),
"_insertedTS" : ISODate("2020-10-11T22:26:36.051Z"),
"_modifiedTS" : ISODate("2020-10-11T22:26:36.051Z"),
"message_source" : "mongo_connector",
"message_timestamp" : ISODate("2020-09-28T21:50:54.940Z"),
"message_topic" : "PROJECT_EXAMPLE_TOPIC",
"msgId" : "exampleId",
"message_headers" : {
"header_001" : "header_001_value",
"header_002" : "header_002_value",
...
"header_x" : "header_x_value"
}
}
Here is my configuration:
{
"name": "sink-mongo-PROJECT-EXAMPLE",
"config": {
"topics": "PROJECT_EXAMPLE_TOPIC",
"connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
"tasks.max": "1",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"key.converter.schema.registry.url": "SCHEMA_REGISTRY_URL",
"key.converter.schemas.enable": "false",
"key.converter.basic.auth.credentials.source": "USER_INFO",
"key.converter.basic.auth.user.info": "SCHEMA_REGISTRY_API_KEY_AND_SECRET",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "SCHEMA_REGISTRY_URL",
"value.converter.schemas.enable": "false",
"value.converter.basic.auth.credentials.source": "USER_INFO",
"value.converter.basic.auth.user.info": "SCHEMA_REGISTRY_API_KEY_AND_SECRET",
"connection.uri": "PROJECT_REFERENTIAL_MONGO_URL",
"database": "PROJECT_DB_NAME",
"collection": "EXAMPLE",
"max.num.retries": "3",
"retries.defer.timeout": "5000",
"key.projection.type": "none",
"key.projection.list": "",
"field.renamer.mapping": "[]",
"field.renamer.regex": "[]",
"document.id.strategy": "com.mongodb.kafka.connect.sink.processor.id.strategy.BsonOidStrategy",
"post.processor.chain": "com.mongodb.kafka.connect.sink.processor.DocumentIdAdder",
"value.projection.list": "msgId",
"value.projection.type": "whitelist",
"writemodel.strategy": "com.mongodb.kafka.connect.sink.writemodel.strategy.UpdateOneTimestampsStrategy",
"delete.on.null.values": "false",
"max.batch.size": "0",
"rate.limiting.timeout": "0",
"rate.limiting.every.n": "0",
"change.data.capture.handler": "",
"errors.tolerance": "all",
"errors.log.enable":true,
"errors.log.include.messages":true,
"transforms": "InsertSource,InsertTopic,InsertTimestamp",
"transforms.InsertSource.type": "org.apache.kafka.connect.transforms.InsertField$Value",
"transforms.InsertSource.static.field": "message_source",
"transforms.InsertSource.static.value": "mongo_connector",
"transforms.InsertTopic.type": "org.apache.kafka.connect.transforms.InsertField$Value",
"transforms.InsertTopic.topic.field": "message_topic",
"transforms.InsertTimestamp.type": "org.apache.kafka.connect.transforms.InsertField$Value",
"transforms.InsertTimestamp.timestamp.field": "message_timestamp"
}
}
This is a bit of an old question, but there is a third-party message transform that can convert headers to fields on either the key or the value:
https://jcustenborder.github.io/kafka-connect-documentation/projects/kafka-connect-transform-common/transformations/HeaderToField.html
This won't allow you to grab all headers though; you need to specify by name the ones you want to extract, along with their types.
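A minimal sketch of how that transform might be wired into the sink configuration above. The class name and the header.mappings option come from the linked documentation; the mapping below (a string header named header_foo copied into a field message_header_foo) is only an illustration, so double-check the exact mapping syntax against those docs:

"transforms": "InsertSource,InsertTopic,InsertTimestamp,HeaderToField",
"transforms.HeaderToField.type": "com.github.jcustenborder.kafka.connect.transform.common.HeaderToField$Value",
"transforms.HeaderToField.header.mappings": "header_foo:STRING:message_header_foo"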
I have a Postgres database with a table "product" that is connected 1-to-n with "sales_Channel", so one Product can have multiple SalesChannels. Now I want to transfer this to ES and keep it up to date, so I am using Debezium and Kafka. Transferring the individual tables to ES is no problem; I can query for SalesChannels and Products. But I need Products with all their SalesChannels attached as a result. How do I get Debezium to transfer this?
mapping for Product
{
"settings": {
"number_of_shards": 1
},
"mappings": {
"_doc": {
"properties": {
"id": {
"type": "integer"
}
}
}
}
}
sink for Product
{
"name": "es-sink-product",
"config": {
"connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
"tasks.max": "1",
"topics": "product",
"connection.url": "http://elasticsearch:9200",
"transforms": "unwrap,key",
"transforms.unwrap.type": "io.debezium.transforms.UnwrapFromEnvelope",
"transforms.unwrap.drop.tombstones": "false",
"transforms.unwrap.drop.deletes": "false",
"transforms.key.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.key.field": "id",
"key.ignore": "false",
"type.name": "_doc",
"behavior.on.null.values": "delete"
}
}
You either need to use the Outbox pattern, see https://debezium.io/documentation/reference/1.2/configuration/outbox-event-router.html
or you can use aggregate objects, see
https://github.com/debezium/debezium-examples/tree/master/jpa-aggregations
https://github.com/debezium/debezium-examples/tree/master/kstreams-fk-join
Below is a document from my MongoDB collection named business:
{
"_id": ObjectId("5be8e24a6600321ead321466"),
"business_id": "r89Re4FNgVWHgBfjCVZyVw",
"name": "Harlow",
"neighborhood": "Ville-Marie",
"address": "438 Place Jacques Cartier",
"city": "Montréal",
"state": "QC",
"postal_code": "H2Y 3B3",
"stars": 3.5,
"attributes": {
"Alcohol": "full_bar",
"BikeParking": "True",
"BusinessAcceptsCreditCards": "True",
"BusinessParking": "{'garage': False, 'street': False, 'validated': False, 'lot': False, 'valet': False}",
"Caters": "False",
"GoodForMeal": "{'dessert': False, 'latenight': False, 'lunch': False, 'dinner': False, 'breakfast': False, 'brunch': False}",
"RestaurantsDelivery": "False",
"RestaurantsGoodForGroups": "True",
},
"categories": "Nightlife, Bars, American (Traditional), Tapas/Small Plates, Poutineries, Supper Clubs, Restaurants, Tapas Bars",
}
Question: In the above collection named business, I need to find all restaurants which provide a meal for lunch (i.e. check attributes -> GoodForMeal -> lunch). The value is nested. Please suggest how this can be done with MongoDB.
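One thing to note from the document above: attributes.GoodForMeal is not a nested array or subdocument, it is a single string that happens to look like a Python dict. A minimal sketch of one way to match the lunch places, assuming every document stores the value in that same string form (if it were a real subdocument, the query would simply be { "attributes.GoodForMeal.lunch": true }):

// match documents whose GoodForMeal string contains "'lunch': True"
db.business.find({ "attributes.GoodForMeal": { $regex: /'lunch': True/ } })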