Looking for a NoSQL database that stores JSON but doesn't care about the structure when writing the document - nosql

I am trying to write a JSON/CSV file that comes from a third-party site to a database. Occasionally the headers change, but the data definition stays the same. They might also add columns or new fields. I would love a database that lets me map the changes afterwards, so that I can pull the fields together for reporting. For example, if a field was called "Daily Sales" for a while and was then renamed to "Sales", I could still run a report on total sales for the year.
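To illustrate the "map the changes afterwards" idea, here is a minimal sketch, assuming the documents are stored exactly as received and the renames are applied only at report time. The alias table, the canonical name "sales", and the sample records are all made up for the example; they are not features of any particular database.

```python
# Hypothetical alias table: every header variant seen so far maps to one canonical name.
FIELD_ALIASES = {
    "Daily Sales": "sales",
    "Sales": "sales",
}

def canonical(record):
    """Return a copy of the record with known header variants renamed; other keys are kept."""
    return {FIELD_ALIASES.get(key, key): value for key, value in record.items()}

def total(records, field):
    """Sum one canonical field across records, counting 0 for records that lack it."""
    return sum(canonical(record).get(field, 0) for record in records)

# Invented sample documents, stored with whatever headers the feed used that day.
stored = [
    {"date": "2023-01-05", "Daily Sales": 120.0},
    {"date": "2023-06-10", "Sales": 80.5},
    {"date": "2023-06-11", "Sales": 40.0, "Region": "EU"},
]

yearly_sales = total(stored, "sales")
```

Because the raw documents are never rewritten, a new header variant only needs one more entry in the alias table, and older reports can be re-run unchanged.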

Related

REST V2 ingest of JSON events to a S3 bucket (avoid duplicates)

I would like to ask you for help.
I am trying to ingest JSON events from a source through a REST API (REST V2 connector) in raw format.
The source lets me pass the parameters "take" and "days" in the headers. "take" specifies how many records to fetch, and "days" specifies how far back the requested events go.
The job I created works fine for data ingestion into a database, where I map fields to columns in the database.
I have tried a million things. These are the two most recent problems I face when I try to ingest the data into a bucket or database in raw format:
For mass ingestion: there are no incremental job options available (for a REST V2 source), so I get duplicate records and the ingestion never stops.
Is there a way to stop mass ingestion once all records are ingested, and to avoid duplicates?
For data integration to a DB: each record/event I attempt to ingest has multiple fields. Since I DON'T want to split the records (I WANT entire documents in JSON), I pack all fields into an array. The problem is that when I request ten records (or N records), all of them get ingested into a single row of the table.
Here is what I mean:
TABLE DB:
ROW1: "array packed" JSON1, JSON2 ... JSON_N ... JSON10 "/array packed"
ROW2: empty
This is what I need (each record in a separate row in raw format)
TABLE DB:
ROW1: JSON1
ROW2: JSON2
ROWN: JSON_N
ROW10: JSON_10
I also tried to accomplish this with a Lambda function. The problem with Lambda is that I would have to make sure there are no duplicates myself (Informatica has a handy "upsert" option that lets me avoid duplicates).
At the end of the day, I don't care whether this is done with data integration, mass ingestion, or Lambda, or whether the data goes directly into a DB or into S3. For now, I am just trying to find a working solution.
If somebody can come up with some ideas, I will appreciate the help.
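For the Lambda route mentioned above, the two requirements (one raw JSON document per row, no duplicates on repeated pulls) could be sketched roughly as follows. This is only an illustration under assumptions: it uses an in-memory SQLite table instead of the real target, and it assumes every event carries a unique "id" field, which the question does not confirm. The table name raw_events and the function store_events are invented for the example.

```python
import json
import sqlite3

def store_events(conn, payload, key_field="id"):
    """Split a JSON array into one row per event; a re-sent event overwrites its old row."""
    events = json.loads(payload)
    # Assumption: each event has a unique key under key_field ("id" here).
    rows = [(str(event[key_field]), json.dumps(event, sort_keys=True)) for event in events]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO raw_events (event_id, body) VALUES (?, ?)", rows
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_events (event_id TEXT PRIMARY KEY, body TEXT NOT NULL)")

# Two overlapping responses, as happens when the same "days" window is requested twice.
store_events(conn, '[{"id": 1, "type": "login"}, {"id": 2, "type": "logout"}]')
store_events(conn, '[{"id": 2, "type": "logout"}, {"id": 3, "type": "login"}]')

row_count = conn.execute("SELECT COUNT(*) FROM raw_events").fetchone()[0]
```

The primary key does the deduplication, so the job can re-request an overlapping window without creating extra rows. If the events have no natural key, a hash of the serialized document could serve the same purpose.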

How to remove duplication with Talend Data Preparation?

I would like to remove duplicates with Talend Data Preparation. I have a column named HOURS, and I want to add those hours together and remove the duplicated emails and names. Here is an example of my table:
As you can see, many rows have the same user_name and email but different hours. I want to sum the hours per user_name and email, so that each user_name and email pair appears only once.
(I am not really into Data Prep, so perhaps there is an inside solution that I don't know of).
I don't think you can do a GROUP BY with a SUM in Talend Data Preparation: the tool is meant for correcting individual lines of data and can't perform aggregations.
You can sum your data with a tAggregateRow in Talend Data Integration, after exporting your corrected data from Data Prep.
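To make the intended aggregation concrete, here is what "group by user_name and email, sum the hours" amounts to, written as a small Python sketch rather than as a Talend job. The sample rows are invented, since the original table is not shown, and the function name sum_hours is made up.

```python
from collections import defaultdict

def sum_hours(rows):
    """Group rows by (user_name, email) and add up their hours."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["user_name"], row["email"])] += float(row["hours"])
    # One output row per distinct (user_name, email) pair, in first-seen order.
    return [
        {"user_name": name, "email": email, "hours": hours}
        for (name, email), hours in totals.items()
    ]

# Invented sample data standing in for the exported table.
rows = [
    {"user_name": "alice", "email": "alice@example.com", "hours": 2},
    {"user_name": "bob", "email": "bob@example.com", "hours": 1.5},
    {"user_name": "alice", "email": "alice@example.com", "hours": 3},
]

aggregated = sum_hours(rows)
```

In tAggregateRow the same thing would be configured with user_name and email as the group-by columns and a sum operation on the hours column.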

JasperReports and Database Design Concept

I have a fairly simple design question about how my reports should look when the program is complete. I use Java, with JasperReports for my reporting needs.
Correct me if I'm wrong, but JasperReports does not interleave rows from different tables by itself. In my case, I would like the sales and receipts of an item in a single report ordered by date (so sales and receipts overlap). I have a sales table and a receipt table in the database.
The question is: should I redesign my database so that both sales and receipts are stored in the same table, or is there a way JasperReports can merge both tables and show them interleaved in tabular form?
You can write an SQL query from which the table in JasperReports takes its data.
In this SQL, you can write a join that gets you the data from both tables.
So you can leave the design of your tables as it is now.
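As an illustration of letting the query do the merging, here is a small sketch run against an in-memory SQLite database. Note that it uses UNION ALL rather than the join mentioned above, because interleaving two kinds of rows by date is usually expressed that way; a join would be the choice if each sale had to be matched to a receipt. The table layouts, column names, and data are all assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schemas: the question does not show the real columns.
conn.executescript("""
    CREATE TABLE sales    (item TEXT, sold_on TEXT, qty INTEGER);
    CREATE TABLE receipts (item TEXT, received_on TEXT, qty INTEGER);
    INSERT INTO sales    VALUES ('widget', '2024-01-03', 4), ('widget', '2024-01-07', 2);
    INSERT INTO receipts VALUES ('widget', '2024-01-01', 10), ('widget', '2024-01-05', 5);
""")

# One result set with both kinds of rows, tagged by kind and ordered by date.
REPORT_QUERY = """
    SELECT sold_on AS event_date, 'sale' AS kind, item, qty FROM sales
    UNION ALL
    SELECT received_on, 'receipt', item, qty FROM receipts
    ORDER BY event_date
"""

report_rows = conn.execute(REPORT_QUERY).fetchall()
```

A query of this shape could be used as the report's main query, with the two tables left as they are.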

(JasperReports) Combine data from different datasources as columns of the same report row

I am evaluating JasperReports (CE) as a reporting solution for one of my clients.
So far I like it very much, and it looks like a pretty solid platform. One thing I cannot find information about is whether results of sub-queries made to different datasources can be combined in one report (not as drill-down sub-reports, but as different columns of the same row).
For example: some product info lives in one database (Firebird), but the sales info, actual stock, and purchase prices are stored in a different system that uses a different database (Microsoft SQL Server). In both databases, products are identified by the same unique product code. So I need to query the first database to obtain the "master recordset" that fills some report columns, and then query the second database for additional info on each product, combining the data from both datasources as different columns of the same report row.
Is it possible with JasperReports? If not, I'd appreciate your suggestions on other reporting solutions being able to fulfill my request.
Since your row data comes from different databases, you need to query the required tables in both of them, build a bean datasource from the result sets, and pass it to JasperReports.
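The merging step itself is independent of JasperReports and can be sketched in a few lines. The version below is Python for brevity and only shows the idea; in the Java application the combined rows would be beans. The field names (code, name, stock, purchase_price) and the sample rows are assumptions, standing in for the results of the Firebird and SQL Server queries.

```python
def combine_rows(products, details_by_code):
    """Attach second-source columns to each master row, matched on product code."""
    combined = []
    for product in products:
        # Products missing from the second source get empty (None) columns.
        details = details_by_code.get(product["code"], {})
        combined.append({
            "code": product["code"],
            "name": product["name"],
            "stock": details.get("stock"),
            "purchase_price": details.get("purchase_price"),
        })
    return combined

# Invented stand-ins for the two result sets.
products = [{"code": "P-1", "name": "Bolt"}, {"code": "P-2", "name": "Nut"}]
details_rows = [{"code": "P-1", "stock": 40, "purchase_price": 0.12}]

# Index the second result set by product code so each lookup is a dictionary access.
details_by_code = {row["code"]: row for row in details_rows}
report_rows = combine_rows(products, details_by_code)
```

Fetching the second source in one query and indexing it by code avoids issuing one query per product, which matters when the master recordset is large.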

Transferring data from Lotus Notes to DB2 using Agent

Before you say anything: I searched a lot but didn't find out how to do this.
I have a database in .NSF format for use in Lotus Notes. I need to write an agent (I know how to) so that data from that database is automatically transferred to a DB2 database.
Before I create the DB2 tables, how do I know which structure I need to use? How do I check exactly how the data in that .NSF file is stored?
Thanks
Notes documents are unstructured: there's no guarantee that any two documents in a database have the same structure. You will need to decide what data you want to transfer to a relational table, then check each document to see if it contains the corresponding fields (items). You didn't mention what language you're planning to use for your agent. In Java you would use Document.getItems() to enumerate all items in a document (in LotusScript, the NotesDocument.Items property).
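The "check each document" step can be pictured with a small sketch. It is plain Python over dictionaries, not Notes API code: each dictionary stands in for one document's items, and the item names and values are invented. An agent would gather the same information by enumerating the items of every document.

```python
from collections import Counter

def item_inventory(documents):
    """Count in how many documents each item name occurs."""
    counts = Counter()
    for doc in documents:
        counts.update(doc.keys())
    return counts

def missing_items(doc, required):
    """List the required item names that this document does not contain."""
    return [name for name in required if name not in doc]

# Invented stand-ins for documents read from the .NSF file.
documents = [
    {"Form": "Order", "Customer": "ACME", "Total": 120},
    {"Form": "Order", "Customer": "Initech"},
    {"Form": "Memo", "Subject": "Hello"},
]

inventory = item_inventory(documents)
gaps = missing_items(documents[1], ["Customer", "Total"])
```

Items that occur in every document of a given form are natural candidates for NOT NULL columns in DB2, while rarely used items may need nullable columns or a separate table.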
As mustaccio also said, since Notes/Domino is a NoSQL database, you don't have a schema.
You should talk to the developer of the application and get an understanding of what data is located where.
You could of course use the Design Synopsis function in Domino Designer to export the actual design, but documents can potentially contain data that does not show up in the design.
If you want to export the documents as XML, I have a tool I wrote available here: http://www.texasswede.com/home.nsf/Page/Notes%20XML%20Exporter
You can export all the documents and then look at the XML to see what data you have.