How to capture the schema of a JSON file using Talend

I'm trying to capture the schema of a JSON file that I am generating from a SQL database using Talend. I need to store this schema in a separate file. Does anyone know of a way to capture this?

In the Metadata section of the Repository, you can create a JSON file schema. There you can import an example JSON file; Talend will then generate a schema that you can reuse in the output of your job, for example in a tWriteJSONField component.
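Outside of Talend, the same idea can be sketched in Python: walk an example JSON document, record an inferred type for each field, and store that schema in a separate file. This is a minimal illustration, not Talend's internal schema format; the field names and output layout are assumptions.

```python
import json

def infer_type(value):
    """Map a JSON scalar to a simple type name."""
    if isinstance(value, bool):   # check bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    return type(value).__name__

def infer_schema(doc, prefix=""):
    """Flatten a JSON document into {dotted.path: type} pairs."""
    schema = {}
    if isinstance(doc, dict):
        for key, value in doc.items():
            path = f"{prefix}.{key}" if prefix else key
            schema.update(infer_schema(value, path))
    elif isinstance(doc, list):
        # Sample the first element to describe the array's items.
        if doc:
            schema.update(infer_schema(doc[0], prefix + "[]"))
        else:
            schema[prefix + "[]"] = "unknown"
    else:
        schema[prefix] = infer_type(doc)
    return schema

example = {"id": 1, "name": "Alice", "orders": [{"total": 9.99}]}
schema = infer_schema(example)
# Store the captured schema in a separate file.
with open("schema.json", "w") as f:
    json.dump(schema, f, indent=2)
```

A sketch like this only samples the first element of each array, so a real job would need to merge types across all records.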

Related

Azure data factory: Implementing the SCD2 on txt files

I have flat files in an ADLS source.
For the full load, we add two columns: an Insert flag and a datetime stamp.
For the change load, we need to look up against the full data: records already present in the full data should be marked as Update, and records not present should be marked as Insert and copied over.
Below is the approach I tried, but I'm unable to get it working.
Can anyone help me with this?
Thank you.
Currently, updating an existing flat file through an Azure Data Factory sink is not supported; you have to create a new flat file.
You can, however, use a Data Flow activity to read the full and incremental data and load the result into a new file in the sink transformation.
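The lookup step described in the question can be sketched outside ADF. This hedged Python sketch assumes rows keyed by a single `id` column; the column names (`flag`, `datetimestamp`) are illustrative, not ADF settings. Incremental rows whose key exists in the full load are tagged Update, the rest Insert.

```python
from datetime import datetime, timezone

def tag_changes(full_rows, incremental_rows, key="id"):
    """Tag each incremental row as Update (key already in the full load)
    or Insert (new key), and stamp it with a load datetime."""
    existing = {row[key] for row in full_rows}
    stamp = datetime.now(timezone.utc).isoformat()
    tagged = []
    for row in incremental_rows:
        flag = "Update" if row[key] in existing else "Insert"
        tagged.append({**row, "flag": flag, "datetimestamp": stamp})
    return tagged

full = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
incr = [{"id": 2, "name": "b2"}, {"id": 3, "name": "c"}]
tagged = tag_changes(full, incr)
```

In a Data Flow, the same comparison would be an Exists or Lookup transformation against the full dataset, with a Derived Column adding the flag and timestamp.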

Migrate from OrientDB to AWS Neptune

I need to migrate a database from OrientDB to Neptune. I have an exported JSON file from Orient that contains the schema (classes) and the records, and I now need to import this into Neptune. However, it seems that to import data into Neptune there must be one CSV file containing all the vertices and another file containing all the edges.
Are there any existing tools to help with this migration and converting to the required files/format?
If you are able to export the data as GraphML, then you can use the GraphML2CSV tool. It will create a CSV file for the nodes and another for the edges, with the appropriate header rows.
Keep in mind that GraphML is a lossy format (it cannot describe complex Java types the way GraphSON can), but you would not be able to import those into Neptune either.
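For a simple graph, the conversion such a tool performs can be sketched directly. This is a minimal illustration only (it ignores GraphML `<key>` declarations and property data, and uses a hard-coded default label): it splits a GraphML document into the two CSVs the Neptune bulk loader expects, with `~id`/`~label` headers for vertices and `~id`/`~from`/`~to`/`~label` for edges.

```python
import csv
import xml.etree.ElementTree as ET

NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

def graphml_to_neptune_csv(graphml_text, nodes_path, edges_path):
    """Split a GraphML document into the vertex and edge CSV files
    used by the Neptune bulk loader (properties not handled)."""
    root = ET.fromstring(graphml_text)
    with open(nodes_path, "w", newline="") as nf, \
         open(edges_path, "w", newline="") as ef:
        nodes = csv.writer(nf)
        edges = csv.writer(ef)
        nodes.writerow(["~id", "~label"])                   # vertex header
        edges.writerow(["~id", "~from", "~to", "~label"])   # edge header
        for node in root.iterfind(".//g:node", NS):
            nodes.writerow([node.get("id"), "vertex"])      # default label
        for i, edge in enumerate(root.iterfind(".//g:edge", NS)):
            edges.writerow([edge.get("id") or f"e{i}",
                            edge.get("source"), edge.get("target"), "edge"])

sample = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph edgedefault="directed">
    <node id="n0"/><node id="n1"/>
    <edge id="e0" source="n0" target="n1"/>
  </graph>
</graphml>"""
graphml_to_neptune_csv(sample, "nodes.csv", "edges.csv")
```

A real migration would also carry node/edge properties across as typed columns (e.g. `name:String`), which is exactly what GraphML2CSV handles for you.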

Generating GraphQL schema from external .graphqls file in scala (sangria)

In Sangria (GraphQL for Scala), can we parse and use a GraphQL schema supplied externally in a .graphqls file, so that I don't have to create case classes for each of the object types? I know this is possible in graphql-java, which picks up the schema from a .graphqls file without requiring POJOs for each object.
So basically, my requirement is this: we will have more than 100 tables, whose schemas will be stored in HBase as key-value pairs in JSON format. A program would read each schema from HBase and generate a .graphqls file to be used by my GraphQL server.

Is there a way to dynamically generate output schema in Data Fusion?

I am using Google Cloud Data Fusion to ingest a CSV file from AWS, apply directives, and store the resulting file in GCS. Is there a way to dynamically generate the output schema of the CSV file for Wrangler? I am passing the input path, directives, and output schema as macros to the Argument Setter. I can see options only to import and export the output schema.
Currently there is no way to generate the output schema dynamically, since the output schema can be wildly different depending on the parse directives applied.

Dynamically create table from csv

I am faced with a situation where we receive a lot of CSV files from different clients, but there is always some issue with the column count or column length that our target table expects.
What is the best way to handle frequently changing CSV files? My goal is to load these CSV files into a Postgres database.
I checked the \COPY command in Postgres, but it does not have an option to create a table.
You could try creating a pg_dump-compatible file instead, which includes the appropriate CREATE TABLE section, and use that to load your data.
I recommend using an external ETL tool like CloverETL, Talend Studio, or Pentaho Kettle for data loading when you have to massage different kinds of data.
\copy is really intended for importing well-formed data with a known structure.
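Since \copy itself won't create the table, one common workaround is to generate a CREATE TABLE statement from each file's header row using permissive text columns, load the data, and tighten the types later. A hedged Python sketch (the table name and sample file are illustrative):

```python
import csv
import io
import re

def create_table_sql(table, csv_file):
    """Build a permissive CREATE TABLE from a CSV header row, using
    text columns so varying column lengths can't fail the load."""
    header = next(csv.reader(csv_file))
    cols = []
    for name in header:
        # Normalise header names into safe lowercase identifiers.
        safe = re.sub(r"\W+", "_", name.strip().lower()).strip("_")
        cols.append(f"{safe} text")
    return f"CREATE TABLE {table} ({', '.join(cols)});"

sample = io.StringIO("Client ID,Order Total,Ship Date\n1,9.99,2024-01-01\n")
sql = create_table_sql("staging_orders", sample)
```

After executing the generated statement (e.g. via psycopg2), the file can be loaded with `\copy staging_orders FROM 'file.csv' CSV HEADER`, and typed columns added in a later transform step.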