Straightforward way to store a protobuf Java object in MongoDB?

When a protobuf3 message comes in as a Java instance (of the generated Java class), the best option is to store the object itself and later read it back from the database.
My use case is to store such messages in MongoDB.
While investigating this I couldn't find a way, so I decided to ask here.

You could transform the protobuf Java object to JSON, turn the JSON into an org.bson.Document, write that document, and reverse the transform when reading.
See JsonFormat for more details.
Here's a simple example on the write side:
import com.google.protobuf.util.JsonFormat; // JsonFormat lives in the protobuf-java-util artifact
import org.bson.Document;

YourProtobufsClass anInstance = YourProtobufsClass.getDefaultInstance();
String json = JsonFormat.printer().print(anInstance);
Document document = Document.parse(json);
mongoClient.getDatabase(...).getCollection(...).insertOne(document);
Here's a simple example on the read side:
JsonFormat.Parser parser = JsonFormat.parser();
FindIterable<Document> documents = collection.find(...);
for (Document document : documents) {
    YourProtobufsClass.Builder builder = YourProtobufsClass.newBuilder();
    parser.merge(document.toJson(), builder);
    // now you can build an instance of your protobufs class which
    // has been populated from the retrieved JSON
    YourProtobufsClass anInstance = builder.build();
}

Related

Unmarshal map[string]types.AttributeValue to specific business model / struct

I am trying to use AWS SDK Go v2: https://github.com/aws/aws-sdk-go-v2
I seem to have a hard time unmarshalling the dynamodb.GetItemOutput's Item attribute, which is of type map[string]types.AttributeValue.
In AWS SDK Go v1 it's easy to call dynamodbattribute.UnmarshalMap(result.Item, &data) to unmarshal the result, but in v2 I can't find any way to do this.
Does anyone have an idea how to do it?
I was able to find an answer, thanks to Sean McGrail, one of the contributors of the aws-sdk-go-v2 project.
The attributevalue library has methods to unmarshal and marshal the query results to your specific business model/struct:
https://pkg.go.dev/github.com/aws/aws-sdk-go-v2/feature/dynamodb/attributevalue
I just needed to fetch this module separately, since it isn't included when you download aws-sdk-go-v2:
go get github.com/aws/aws-sdk-go-v2/feature/dynamodb/attributevalue
Using github.com/aws/aws-sdk-go-v2/feature/dynamodb/attributevalue, you can unmarshal each result item returned by the Scan API into a Go struct.
var object struct {
    PropertyA string `json:"property_a"`
    PropertyB string `json:"property_b"`
}
_ = attributevalue.UnmarshalMap(<item>, &object)
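To tie this back to the GetItem call from the question, here is a self-contained sketch; the table name, key, and the Record fields are placeholders, and the dynamodbav tags are just one way to map attribute names:
package main

import (
    "context"
    "log"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/config"
    "github.com/aws/aws-sdk-go-v2/feature/dynamodb/attributevalue"
    "github.com/aws/aws-sdk-go-v2/service/dynamodb"
    "github.com/aws/aws-sdk-go-v2/service/dynamodb/types"
)

// Record is a placeholder for your business model/struct.
type Record struct {
    ID   string `dynamodbav:"id"`
    Name string `dynamodbav:"name"`
}

func main() {
    cfg, err := config.LoadDefaultConfig(context.TODO())
    if err != nil {
        log.Fatal(err)
    }
    client := dynamodb.NewFromConfig(cfg)

    // GetItem returns Item as map[string]types.AttributeValue, just like in the question.
    result, err := client.GetItem(context.TODO(), &dynamodb.GetItemInput{
        TableName: aws.String("my-table"), // placeholder table name
        Key: map[string]types.AttributeValue{
            "id": &types.AttributeValueMemberS{Value: "some-id"}, // placeholder key
        },
    })
    if err != nil {
        log.Fatal(err)
    }

    // The v2 equivalent of dynamodbattribute.UnmarshalMap from v1.
    var data Record
    if err := attributevalue.UnmarshalMap(result.Item, &data); err != nil {
        log.Fatal(err)
    }
    log.Printf("%+v", data)
}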
Whenever I have to unmarshal JSON data, which is the default output of most AWS SDK responses, I:
Barf out the entire JSON blob (fmt.Println(result))
Copy and paste that into https://mholt.github.io/json-to-go
Snarf the resulting struct into my code, say as MyStruct
Call json.Unmarshal([]byte(result), &myStruct), where myStruct is an instance of MyStruct

Testing Abstract Document Repository which interacts with MongoDB

Hey all,
I'm having trouble setting up a test case.
I have a plain Symfony 3 project connected to MongoDB. I have multiple documents, each of which needs an extra method to query the database. The method gets the last document inserted into the collection and is called findLatestInserted().
This specific method was duplicated in each document repository, so I decided to extract it and create a class BaseDocumentRepository which extends the default DocumentRepository. All of my document repositories still have their own repository class, say CpuInfoRepository and RamInfoRepository. These classes offer a few extra methods to query the MongoDB database, plus the one they have in common: findLatestInserted().
It all works fine, but just to be safe I wanted to write a unit test for the method findLatestInserted().
I have a test database called prototyping-test which is used to create a document, query it and check the result. Afterwards it clears itself so no documents stay behind. For each repository there is a specific URL to post data to in order to create a document in the database. To create a CpuInfo document you post data to http://localhost:8000/ServerInfo/CreateCpuInfo; to create a RamInfo document you post data to http://localhost:8000/ServerInfo/CreateRamInfo.
So here is my question: how would I write a test for the method findLatestInserted()?
This is what I've tried so far:
public function testFindLatestInserted()
{
    $client = self::createClient();
    $crawler = $client->request('POST',
        '/ServerInfo/CreateCpuInfo',
        [
            "hostname" => $this->hostname,
            "timestamp" => $this->timestamp,
            "cpuCores" => $this->cpuCores,
            "cpu1" => $this->cpu1,
            "cpu2" => $this->cpu2
        ]);
    $this->assertTrue($client->getResponse()->isSuccessful());

    $serializer = $this->container->get('jms_serializer');
    $cpuInfo = $serializer->deserialize($client->getResponse()->getContent(), 'AppBundle\Document\CpuInfo', 'json');
    $expected = $this->dm->getRepository("AppBundle:CpuInfo")->find($cpuInfo->getId());

    $stub = $this->getMockForAbstractClass('BaseDocumentRepository');
    $actual = $this->dm
        ->getRepository('AppBundle:CpuInfo')
        ->findLatestInserted();

    $this->assertNotNull($actual);
    $this->assertEquals($expected, $actual);
}
At the line $actual = $this->dm->getRepository('AppBundle:CpuInfo')->findLatestInserted(); I got stuck, as this would only test CpuInfo while there is RamInfo too (and some other classes not mentioned here). How would one approach this?
I specifically want to test the method findLatestInserted() at the level of the abstract class instead of the concrete classes.
Please help me out!
Instead of testing the whole stack, just concentrate on testing findLatestInserted() in the concrete classes.
Inject a MongoDB stub into AppBundle:CpuInfo and check whether findLatestInserted() returns the expected value.
Do the same for AppBundle:RamInfo.
Avoid testing abstract classes; always test concrete classes.
In the future you may decide not to inherit from BaseDocumentRepository and might not notice that the new implementation of findLatestInserted() fails.

How to change binary file into RDD or Dataframe?

http://spark.apache.org/docs/latest/sql-programming-guide.html#interoperating-with-rdds
The link shows how to turn a text file into an RDD and then into a DataFrame.
So how do I deal with a binary file?
I'm asking for an example, thank you very much.
There is a similar question without answer here : reading binary data into (py) spark DataFrame
In more detail, I don't know how to parse the binary file. For example, I can parse a text file into lines or words like this:
JavaRDD<Person> people = sc.textFile("examples/src/main/resources/people.txt").map(
    new Function<String, Person>() {
        public Person call(String line) throws Exception {
            String[] parts = line.split(",");
            Person person = new Person();
            person.setName(parts[0]);
            person.setAge(Integer.parseInt(parts[1].trim()));
            return person;
        }
    });
It seems that I just need an API that can parse the binary file or binary stream in a similar way:
JavaRDD<Person> people = sc.textFile("examples/src/main/resources/people.bin").map(
    new Function<String, Person>() {
        public Person call(/*stream or binary file*/) throws Exception {
            /*code to construct every row*/
            return person;
        }
    });
EDIT:
The binary file contains structured data (a table from a relational database; the database is home-made) and I know the metadata of that structured data. I plan to turn the structured data into RDD[Row].
I can control everything about the binary file, since I use the FileSystem API (http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html) to write the binary stream into HDFS, and the binary file is splittable. I don't have any idea how to parse the binary file in the style of the example code above, so I haven't been able to try anything so far.
There is a binary record reader already available for Spark (I believe since 1.3.1, at least in the Scala API):
sc.binaryRecords(path: String, recordLength: Int, conf: Configuration)
It's up to you, though, to convert those binary records into a format acceptable for processing.
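To connect this back to the Person example above, here is a minimal Java sketch. The record layout is hypothetical (a 20-byte, space-padded name followed by a 4-byte big-endian age, 24 bytes per record); adjust the parsing to whatever your own metadata describes, and note that sc is assumed to be a JavaSparkContext.
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import org.apache.spark.api.java.JavaRDD;

// Assumed layout: 20 bytes of name (space padded) + 4-byte int age = 24 bytes per record.
int recordLength = 24;
JavaRDD<byte[]> records = sc.binaryRecords("examples/src/main/resources/people.bin", recordLength);
JavaRDD<Person> people = records.map(bytes -> {
    ByteBuffer buffer = ByteBuffer.wrap(bytes);
    byte[] nameBytes = new byte[20];
    buffer.get(nameBytes);                        // first 20 bytes: the name
    Person person = new Person();
    person.setName(new String(nameBytes, StandardCharsets.UTF_8).trim());
    person.setAge(buffer.getInt());               // next 4 bytes: the age
    return person;
});
From there, sqlContext.createDataFrame(people, Person.class) turns the RDD into a DataFrame, as in the guide linked above.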

Getting JSON to a model object

I need to get JSON into my model but I'm having problems (I am a beginner).
My model:
https://gist.github.com/anonymous/1c2e88cb83cbeace6f34
I need to get the JSON into my model, i.e. a JSON-to-model conversion.
I use a web service to get the JSON.
Also, how can I map the object list of jobs onto my model?
My controller:
https://gist.github.com/anonymous/c526483b29be0b198bca
I need the objects so I can edit some details, and then I want to convert them back to JSON.
This is my current thinking, but I am open to new ideas.
Thanks...
It looks like you have nearly everything you need there. To convert builds in the JSON to a list of Build, you'd do this:
val js: JsValue = response.json
(js \ "builds").as[List[Build]]
To modify a field, you can do, for example:
val build = builds.head // get the first build
val modifiedBuild = build.copy(name = "new name")
Then to convert that back to JSON:
Json.toJson(modifiedBuild)
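Note that (js \ "builds").as[List[Build]] and Json.toJson(...) both need an implicit Reads/Writes for Build in scope. Assuming your model is a case class roughly like the following (every field except name is a placeholder here), a minimal sketch:
import play.api.libs.json._

// Hypothetical minimal model; substitute the fields from your gist.
case class Build(name: String, number: Int)

object Build {
  // Supplies both Reads and Writes, so .as[List[Build]] and Json.toJson(...) compile.
  implicit val buildFormat: Format[Build] = Json.format[Build]
}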

Combine two PDF/A documents using iTextSharp

Hoping that someone can see the flaw in my code to merge two PDF/A documents using iTextSharp. Currently it complains about missing metadata, which PDF/A requires.
Document document = new Document();
MemoryStream ms = new MemoryStream();
using (PdfACopy pdfaCopy = new PdfACopy(document, ms, PdfAConformanceLevel.PDF_A_1A))
{
    document.Open();
    using (PdfReader reader = new PdfReader("Doc1.pdf"))
    {
        pdfaCopy.AddDocument(reader);
    }
    using (PdfReader reader = new PdfReader("doc2.pdf"))
    {
        pdfaCopy.AddDocument(reader);
    }
}
The exact error received is
Unhandled Exception: iTextSharp.text.pdf.PdfAConformanceException: The document catalog dictionary of a PDF/A conforming file shall contain
the Metadata key
I was hoping that the 'document catalog dictionary' would be copied as well, but I guess the 'new Document()' creates an empty non-conforming document or something.
Thanks! Hope you can help
Wouter
You need to add this line:
pdfaCopy.CreateXmpMetadata();
This will create some default XMP metadata. Of course, if you want to create your own XMP file containing info about the documents you're about to merge, you can also use:
pdfaCopy.XmpMetadata = myMetaData;
where myMetaData is a byte array containing a correct XMP stream.
I hope you understand that iText can't automatically create the correct metadata. Providing metadata is something that needs human attention.
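For completeness, here is a sketch of how that call might slot into the snippet from the question; the key point is to create the XMP metadata before the document is opened and content is copied in:
Document document = new Document();
MemoryStream ms = new MemoryStream();
using (PdfACopy pdfaCopy = new PdfACopy(document, ms, PdfAConformanceLevel.PDF_A_1A))
{
    pdfaCopy.CreateXmpMetadata(); // adds the /Metadata entry the PDF/A check was complaining about
    document.Open();
    using (PdfReader reader = new PdfReader("Doc1.pdf"))
    {
        pdfaCopy.AddDocument(reader);
    }
    using (PdfReader reader = new PdfReader("doc2.pdf"))
    {
        pdfaCopy.AddDocument(reader);
    }
}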