com.mongodb.MongoInternalException: The reply message length is less than the maximum message length 4194304 - azure-cosmosdb-mongoapi

I am trying to read the documents in a MongoDB collection using a Databricks script in PySpark, fetching the data one day at a time.
The script works fine for some days, but for other days it throws the following error:
com.mongodb.MongoInternalException: The reply message length 14484499 is less than the maximum message length 4194304.
I am not sure what this error means or how to resolve it. Any help is appreciated.
This is the sample code I am running:
pipeline = [{'$match': {'$and': [{'UpdatedTimestamp': {'$gte': 1555891200000}},
                                 {'UpdatedTimestamp': {'$lt': 1555977600000}}]}}]
READ_MSG = (spark.read.format("com.mongodb.spark.sql.DefaultSource")
            .option("uri", connectionstring)
            .option("pipeline", pipeline)
            .load())
The datetime values are provided as epoch milliseconds.

This is more a comment than an answer (I don't have enough reputation to post comments).
I have the same issue.
After some research I found that my nested field "survey", which has more than one sublevel, was causing the problem: I was able to read the database by selecting all the other fields except this one:
root
|-- _id: string (nullable = true)
|-- _t: array (nullable = true)
| |-- element: string (containsNull = true)
|-- address: struct (nullable = true)
| |-- streetAddress1: string (nullable = true)
|-- survey: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- SurveyQuestionId: string (nullable = true)
| | |-- updated: array (nullable = true)
| | | |-- element: long (containsNull = true)
| | |-- value: string (nullable = true)
Has anyone gotten closer to a workaround for what seems to be a bug in the MongoDB Spark connector?

After adding appName to the MongoDB connection string, the issue seems to be resolved. I am not getting this error any more.
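For reference, a Cosmos DB Mongo API connection string with the appName parameter looks roughly like the following; the account name and key are placeholders, and the exact host and query parameters come from the connection string shown in the Azure portal:
mongodb://<account>:<primary-key>@<account>.documents.azure.com:10255/?ssl=true&replicaSet=globaldb&appName=@<account>@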


MongoDB doesn't save document when timestamps are initialized in struct

I am having a very weird issue when trying to save data in MongoDB with the Rust driver. This is my struct:
#[derive(Deserialize, Serialize, Debug)]
struct Info {
#[serde(rename = "_id", skip_serializing_if = "Option::is_none")]
id: Option<bson::oid::ObjectId>,
name: String,
created_at: i64,
updated_at: i64,
}
And this is my actix route handler function
async fn post_request(info: web::Json<Info>, data: web::Data<State>) -> impl Responder {
let name: &str = &info.name;
let document = Info {
id: None,
name: name.to_string(),
created_at: Utc::now().timestamp_millis(),
updated_at: Utc::now().timestamp_millis(),
};
// Convert to a Bson instance:
let serialized_doc = bson::to_bson(&document).unwrap();
let doc = serialized_doc.as_document().unwrap();
let collection = data.client.database("test1").collection("users");
let result = collection.insert_one(doc.to_owned(), None).await.unwrap();
HttpResponse::Ok().json(result).await
}
I am getting the Utc struct from the chrono crate.
When I try to save the data in MongoDB by hitting the route, it doesn't get saved. Oddly, when I comment out created_at and updated_at in the struct and the handler, it does get saved. It also gets saved if I skip the struct and insert a raw document, storing created_at and updated_at in variables first; it only fails when I go through the struct. I am new to Rust, so maybe I am doing something wrong. Please help.

How to query the additional column (other than siblings) added to pivot table?

Here is my RecipeIngredientPivot table schema, to which I added an additional quantity column.
-------------------------------------------
| id | ingredientID | recipeID | quantity |
-------------------------------------------
| 1  | A001         | R001     | 100      |
| 2  | A002         | R001     | 50       |
| 3  | C004         | R001     | 23       |
| 4  | A001         | R002     | 75       |
-------------------------------------------
I was able to add the quantity column to the pivot table (which has ingredient_id and recipe_id as the sibling keys) by conforming to Pivot instead of ModifiablePivot.
extension RecipeIngredientPivot: Pivot {
init(_ recipe: Recipe, _ ingredient: Ingredient, _ quantity: Double) throws {
self.recipeID = try recipe.requireID()
self.ingredientID = try ingredient.requireID()
self.quantity = quantity
}
}
However, I am not sure how to query and get quantities of a recipe.
extension Recipe {
var ingredients: Siblings<Recipe, Ingredient, RecipeIngredientPivot> {
return siblings()
}
var quantities: [Double] {
// Not sure how to get this.
}
}
Any help is much appreciated.
I figured it out. I had to call query(on: req) and pivots(on: req) on the siblings() relation and wait for both futures to complete before mapping over the results.
let ingredientFuture = try recipe.ingredients.query(on: req).all()
let pivotFuture = try recipe.ingredients.pivots(on: req).all()
return ingredientFuture.and(pivotFuture).map { ingredients, pivots in
    for (ingredient, pivot) in zip(ingredients, pivots) {
        print("\(ingredient) - \(pivot.quantity)")
    }
}

How to pass HashSet to server to test API from postman?

I created an API which I want to test using Postman. My API accepts many parameters, and one of them is a HashSet. I don't know how to pass the HashSet parameter from Postman. Please help me. Thanks in advance.
Here is my code:
@PutMapping
@ApiOperation(value = "collectMultiInvoices", nickname = "collectMultiInvoices")
public BaseResponse collectAmountMultipleInvoices(@RequestParam(value = "invoice_id") HashSet<Integer> invoiceIds,
        @RequestParam("date") String _date,
        @RequestParam(value = "cash", required = false) Float cashAmount,
        @RequestParam(value = "chequeAmount", required = false) Float chequeAmount,
        @RequestParam(value = "chequeNumber", required = false) String chequeNumber,
        @RequestParam(value = "chequeDate", required = false) String _chequeDate,
        @RequestParam(value = "chequeImage", required = false) MultipartFile chequeImage,
        @RequestParam(value = "chequeBankName", required = false) String chequeBankName,
        @RequestParam(value = "chequeBankBranch", required = false) String chequeBankBranch,
        @RequestParam(value = "otherPaymentAmount", required = false) Float otherPaymentAmount,
        @RequestParam(value = "otherPaymentType", required = false) Integer otherPaymentType,
        @RequestParam(value = "otherPaymentTransactionId", required = false) String otherPaymentTransactionId,
        @RequestParam(value = "discountPercentorAmount", required = false) String discountPercentorAmount,
        @RequestParam(value = "discountId", required = false) String discountId) throws AppException.RequestFieldError, AppException.CollectionAmountMoreThanOutstanding {
    //method implementation
}
//method implementation
}
A Set or HashSet is a Java concept. There is no such thing as a Set from the HTTP perspective, and there is no such thing as a Set in Postman. So from Postman, you need to send the invoice_ids in a format that Spring's parsing library can convert to a HashSet. As @Michael pointed out in the comments, one way to do this is to comma-separate the invoice_ids like this: invoice_id=id1,id2,id3. When Spring processes this request, it will see that you are expecting data in the form of a HashSet, so it will attempt to convert id1,id2,id3 into a HashSet<Integer>, which it knows how to do automatically.
Side note: unless you specifically need a HashSet, it is considered good practice to declare your type using the interface instead of an implementing subclass. So in this situation I would recommend changing your method signature to accept a Set<Integer> instead of a HashSet<Integer>.
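For illustration, assuming the controller is mapped at /invoices (the path and values here are made up), either of these requests should bind to the HashSet<Integer>, since Spring accepts both a comma-separated value and a repeated parameter for collections:
PUT http://localhost:8080/invoices?invoice_id=101,102,103&date=2019-04-22
PUT http://localhost:8080/invoices?invoice_id=101&invoice_id=102&invoice_id=103&date=2019-04-22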

Conversion Flat File to Cassandra Data Model using Spark and Scala

I am using Spark (DataFrames) and Scala to transform a flat file into the Cassandra data model for storage. The Cassandra data model has a lot of frozen types nested inside each column, and it is difficult to build them.
I have tried multiple options using DataFrames/Datasets and nothing has worked out; I do not want to do this with RDDs.
Cassandra Data Model
Transfer Table
transferNumber - String
orderList - Frozen List -forder
forder Frozen
orderNumber TEXT
lineItem Frozen List -flineitem
flineitem Frozen
lineitemid INT
trackingnumber Frozen List -ftrackingnumber
ftrackingnumber Frozen
trackingnumber TEXT
expectedquantity INT
xList List Text
yList LIST TEXT
DataFrame Output
[order1,10,tracking1,WrappedArray(xlist12,xlist13),null,transfer1]
[order1,20,tracking1,null,WrappedArray(ylist14),transfer1]
[order2,10,tracking2,null,WrappedArray(ylist15),transfer1]
Data Frame Schema
root
 |-- orderNumber: string (nullable = true)
 |-- lineItemId: integer (nullable = true)
 |-- trackingNumber: string (nullable = true)
 |-- xList: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- yList: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- transferNumber: string (nullable = true)
Tried code
val groupByTransferNumber = lineItem.groupBy("transferNumber").agg(collect_set($"orderNumber".alias("order")))
Output:
root
 |-- transferNumber: string (nullable = true)
 |-- collect_set: array (nullable = true)
 |    |-- element: long (containsNull = true)
[transfer1,WrappedArray(order1,order2)]
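There is no accepted answer here, but one approach that usually works for this kind of shape is to build the nested structure from the inside out with struct and collect_list. A minimal sketch, assuming the target nesting is transfer -> orders -> line items and starting from the flat lineItem DataFrame above (the intermediate column names are made up):
import org.apache.spark.sql.functions.{col, collect_list, struct}

// Starting point: the flat DataFrame `lineItem` shown above, one row per
// (transferNumber, orderNumber, lineItemId, trackingNumber, xList, yList).

// 1. Wrap the line-item columns into a single struct column.
val withLineItemStruct = lineItem.withColumn(
  "lineItemStruct",
  struct(col("lineItemId"), col("trackingNumber"), col("xList"), col("yList")))

// 2. Roll the line items up into one row per order.
val orders = withLineItemStruct
  .groupBy(col("transferNumber"), col("orderNumber"))
  .agg(collect_list(col("lineItemStruct")).as("lineItems"))
  .withColumn("order", struct(col("orderNumber"), col("lineItems")))

// 3. Roll the orders up into one row per transfer, mirroring the
//    transferNumber + orderList shape of the Cassandra table.
val transfers = orders
  .groupBy(col("transferNumber"))
  .agg(collect_list(col("order")).as("orderList"))

transfers.printSchema()
As far as I know, the Spark Cassandra connector can write an array<struct<...>> column into a frozen list of a UDT as long as the struct field names match the UDT field names, so most of the work is getting the struct shapes to line up with the frozen types.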

YAML data format for mongodb collections and referenced entities

I want to load test data in my Scala Play application from a data.yml file, which is in YAML format.
My entities look like:
@Entity("users")
class User(@Required val uname: String, val isAdmin: Boolean = false) {
    @Id var id: ObjectId = _
    @Reference val accounts = new ArrayList[Account]
}
@Entity("account")
class Account {
    @Id var id: ObjectId = _
    @Embedded val addresses = new ArrayList[Address]
    @Reference val departments = new ArrayList[Department]
    var description: String = _
}
class Address {
    var street: String = _
    var city: String = _
}
@Entity("department")
class Department {
    @Id var id: ObjectId = _
    var principal: String = _
}
This is what an almost blank data.yml looks like:
User(foo):
    uname: Foo
    accounts:
I want to load one user with two accounts. One account has just one address and one department; the other account has two addresses and one department, to keep things as simple as possible. So what would the complete YAML data look like to achieve this?
Why can't you just use lists with keys, using the '- key' notation or '[key1, key2]'? Example:
Department(dep1):
    ..
Address(address1):
    ..
Address(address2):
    ..
Account(account1):
    ..
    addresses:
        - address1
    departments:
        - dep1
Account(account2):
    ..
    addresses:
        - address1
        - address2
    departments:
        - dep1
User(user1):
    ..
    accounts:
        - account1
        - account2
Check http://en.wikipedia.org/wiki/Yaml#Lists
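To make that concrete, a sketch of what the complete data.yml could look like for the scenario in the question (one user, two accounts, one and two addresses, one department); the field values are invented, and the exact fixture syntax depends on the YAML loader you use:
Department(dep1):
    principal: Some Principal
Address(address1):
    street: 1 Main Street
    city: Springfield
Address(address2):
    street: 2 Side Street
    city: Springfield
Account(account1):
    description: first account
    addresses:
        - address1
    departments:
        - dep1
Account(account2):
    description: second account
    addresses:
        - address1
        - address2
    departments:
        - dep1
User(foo):
    uname: Foo
    accounts:
        - account1
        - account2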