I want to extract tables from a PDF file and insert that data into an output sink (CSV, Azure SQL, etc.).
I have tried the following:
Analyze the custom PDF document using the Form Recognizer General Document model, since I only want to scrape tables.
Call the "Get Analyze Result" REST API from ADF to get the table array.
Now I want to loop through every table and its cells and insert the data into an Azure SQL table.
How do I achieve this effectively?
One way I see is to use JSON parsing along with a looping mechanism in ADF to transform the Form Recognizer output row by row.
Note: I have already checked this post:
Extract PDF table data using Azure Form Recognizer
You should be able to achieve this using the Cognitive Services API with the External Call transformation: https://youtu.be/r22nthp-f4g?t=400
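If you do end up flattening the tables yourself, here is a minimal Python sketch (outside ADF) of walking the "Get Analyze Result" response: analyzeResult.tables is an array, and each table carries rowCount, columnCount, and a cells array with rowIndex/columnIndex/content. The endpoint, result ID, and key below are placeholders, and the SQL insert is left as a comment.

import requests

# Placeholder endpoint, result ID, and key - substitute your own values.
url = ("https://<resource>.cognitiveservices.azure.com/formrecognizer/documentModels/"
       "prebuilt-document/analyzeResults/<resultId>?api-version=2022-08-31")
result = requests.get(url, headers={"Ocp-Apim-Subscription-Key": "<key>"}).json()

for table_index, table in enumerate(result["analyzeResult"]["tables"]):
    # Rebuild each table as a rowCount x columnCount grid of cell text.
    grid = [[""] * table["columnCount"] for _ in range(table["rowCount"])]
    for cell in table["cells"]:
        grid[cell["rowIndex"]][cell["columnIndex"]] = cell["content"]
    for row in grid:
        print(table_index, row)  # INSERT the row into your Azure SQL table here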
So, I am creating a Copy activity that reads from a SQL Server table and has to send the data to an API endpoint with a PATCH request.
The API provider specified that the body must be in the form:
"updates":[{"key1":"value1","key2":"value2","key3":"value3" },
{"key1":"value1","key2":"value2","key3":"value3" }, ...
.... {"key1":"value1","key2":"value2","key3":"value3" }]
However, my SQL table maps to JSON this way (without the "updates" wrapper):
[{"key1":"value1","key2":"value2","key3":"value3" },
{"key1":"value1","key2":"value2","key3":"value3" }, ...
.... {"key1":"value1","key2":"value2","key3":"value3" }]
I use the Copy activity with a sink dataset of type REST.
How can we modify the mapping so that the schema gets wrapped in an "updates" object?
Using the Copy data activity, there may be no way to wrap the data (an array of objects) in an updates key.
To do this, I have instead used a Lookup activity to get the data, a Set variable activity to wrap the data in an updates object, and finally a Web activity with the PATCH method and the variable's value as the body.
I took some sample data in a SQL Server table.
Use a Lookup activity to select the data from this table using the table or query option (I used the query option) and check the debug output.
NOTE: If your data is not the same as in my sample table, try using the query option so that the Lookup output is an array of objects.
In the Set variable activity, I have used an array variable with the following dynamic content to wrap the above array of objects in an updates key:
@array(json(concat('{"updates":',string(activity('Lookup1').output.value),'}')))
Now in the Web activity, choose all the necessary settings (PATCH method, authorization, headers, URL, etc.) and give the body as follows (I used a fake REST API as a demo). Because the wrapped object is stored as the only element of the array variable, indexing it with [0] passes just that object as the body:
@variables('tp')[0]
Since I am using a fake REST API, the activity succeeds, and checking the Web activity's debug input confirms the body being passed to the REST API.
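For comparison, here is a minimal sketch of the same flow outside ADF, assuming a hypothetical endpoint URL; it shows the shape of the wrapping that the Set variable expression performs on the Lookup output.

import requests

# What the Lookup activity returns in activity('Lookup1').output.value.
rows = [
    {"key1": "value1", "key2": "value2", "key3": "value3"},
    {"key1": "value1", "key2": "value2", "key3": "value3"},
]

# The same wrapping the Set variable expression performs.
body = {"updates": rows}

# Hypothetical endpoint; the Web activity does the equivalent PATCH call.
response = requests.patch("https://example.com/api/endpoint", json=body)
response.raise_for_status()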
I have a question; hopefully someone in the forum can help. I am able to pull data from a SOAP API call into a SQL Server table (an xml data type column, actually) via a Copy Data activity. The pipeline that runs this process is metadata driven, so how could I write other parameters into the same SQL Server table during the same run? I am using a Copy Data activity to load the XML data into the SQL Server table, but in the Mapping tab I am not able to select other parameters in order to point them to other SQL table columns.
In addition, I am using a ForEach activity so that the Copy Data activity iterates over several values of one column of the SQL Server table.
I would appreciate any advice on this.
Thanks
David
Thank you for your interest; I will try to be more explicit. Given the current scenario, how could I pass the StoreId and CustomerNumber parameters to the table Stage.XmlDataTable,
taking into account that in the mapping step I am only able to map the XML data from the current API call and write it into the Stage.XmlDataTable XmlData column?
Thanks in advance, David
You can add your parameters using Additional Columns in the Copy data activity Source.
When you import the schema in the mapping, you can see the additional columns added to the source.
Refer to this MS document for more details on adding additional columns during the copy.
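As a rough sketch (not taken verbatim from the thread), the source section of the Copy activity in the pipeline JSON would carry the extra columns roughly like this, written here as a Python literal; the StoreId/CustomerNumber expressions and the XmlSource type are assumptions based on the scenario.

# Hedged sketch of the Copy activity "source" definition with additional
# columns; the names and the @item() expressions are assumptions.
copy_source = {
    "type": "XmlSource",
    "additionalColumns": [
        {"name": "StoreId",
         "value": {"value": "@item().StoreId", "type": "Expression"}},
        {"name": "CustomerNumber",
         "value": {"value": "@item().CustomerNumber", "type": "Expression"}},
    ],
}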
I have an Excel file with 5 sheets: Sheet1, Sheet2, Sheet3, Sheet4, Sheet5.
In the future, the user can add Sheet6, Sheet7 as well.
I want to create a pipeline that iterates over all the sheets in the Excel file and copies each sheet's data into a single table.
In my first approach, I created an Array variable assigned ["Sheet1", "Sheet2", "Sheet3", "Sheet4", "Sheet5"] and used a ForEach loop; inside the ForEach, I copy the sheet data to a single table.
In my second approach, I use a Lookup activity to fetch the sheet info from a SQL table and then a ForEach loop to copy each sheet's data into the table.
But in both approaches, whenever a user adds a new sheet, I need to either update my ADF pipeline (approach 1) or update the SQL table where the sheet info is kept (approach 2).
I don't want to update either the pipeline or the SQL table to pick up a newly added sheet. The pipeline should iterate dynamically and load all the sheets' data into a single table, always doing a truncate and load.
Currently, getting the sheet names dynamically in ADF is not possible.
So you would have to write custom logic to get the list of sheet names and then iterate over it with a ForEach.
For that you can leverage Azure Automation, Azure Functions, etc., and call them from ADF; see the sketch after the link below.
ADF - How to copy an Excel Sheet with Multiple Sheets into separate .csv files
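As a minimal sketch of that custom logic, an Azure Function in Python could read the workbook from Blob Storage and return its sheet names as a JSON array for a ForEach to iterate; the container name, the "file" query parameter, and the STORAGE_CONN app setting are all assumptions.

import io
import json
import os

import azure.functions as func
from azure.storage.blob import BlobServiceClient
from openpyxl import load_workbook

def main(req: func.HttpRequest) -> func.HttpResponse:
    # Assumed container/blob naming and a STORAGE_CONN app setting.
    blob = BlobServiceClient.from_connection_string(
        os.environ["STORAGE_CONN"]
    ).get_blob_client(container="input", blob=req.params.get("file", "book.xlsx"))
    workbook = load_workbook(io.BytesIO(blob.download_blob().readall()), read_only=True)
    # ADF can feed this JSON array straight into a ForEach activity.
    return func.HttpResponse(json.dumps(workbook.sheetnames), mimetype="application/json")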
I am afraid that this feature is not available at this point in time; as Excel is still relatively new for ADF v2, this feature might not be there yet, but you can submit feedback or create a feature request for it with Microsoft here.
To keep the job going, you will have to follow the same approach you are using and add the sheet names manually.
Alternatively, if you don't want to add the sheet names yourself, you can give the user access to update an ADF parameter (through a custom role that allows just that) and ask them to update the parameter list as soon as they add a new sheet.
Thanks!
I have a dataset based on a CSV file. This exposes data as follows:
Name,Age
John,23
I have an Azure SQL Server instance with a table named [People].
This has the columns Name and Age.
I am using the Copy Data activity and trying to copy data from the CSV dataset into the Azure table.
There is no option to indicate the target table name; instead, I have a space to input a stored procedure name.
How does this work? Where do I put the target table name?
You should DEFINITELY have a table name to write to; if you don't have a table, something is wrong with your setup. Make sure you have a table to write to and that the field names in your table match the fields in the CSV file. Then follow the steps outlined in the post below. There are several steps to click through, but all are pretty intuitive, so just follow the instructions step by step and you should be fine.
http://normalian.hatenablog.com/entry/2017/09/04/233320
You can add records into the SQL Database table directly, without stored procedures, by configuring the table on the sink dataset rather than on the Copy activity, which is where you are currently looking.
Have a look at the Table field within the dataset itself.
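For reference, here is a sketch of what such a sink dataset definition looks like, written as a Python literal; the dataset and linked service names are made up, and the exact typeProperties shape can vary by dataset version.

# Hedged sketch: the target table lives on the dataset, not the Copy activity.
sink_dataset = {
    "name": "PeopleSqlDataset",  # hypothetical name
    "properties": {
        "type": "AzureSqlTable",
        "linkedServiceName": {"referenceName": "AzureSqlLinkedService",
                              "type": "LinkedServiceReference"},
        "typeProperties": {"schema": "dbo", "table": "People"},
    },
}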
I have over 100 nested JSON files and I am trying to load them via Data Factory V2 into SQL Data Warehouse. I have created the Data Factory V2 and everything seems fine: the connection and the Data Preview both look fine.
When I run the Data Factory I get an error saying "All columns of the table must be specified...".
I am not sure what the issue is. I have tried re-creating the Data Factory several times.
The error message is clear enough when it says "All columns of the table must be specified...". This means the table in the data warehouse has more columns than what you see in the preview of the JSON file. You will need to create a table in the data warehouse with the same columns that are shown in the preview of the JSON files.
If you need to insert them into a table with more fields, create a "staging" table with the same columns as the JSON file, and then call a stored procedure to insert the content of this staging table into the corresponding table.
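A rough sketch of that staging pattern follows; every table, column, and connection detail here is hypothetical, and the SQL batch is what the stored procedure invoked after the copy might run.

import pyodbc

# Placeholder connection string - fill in your own server, database, and login.
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};Server=<server>;Database=<db>;"
    "Uid=<user>;Pwd=<password>;"
)
cur = conn.cursor()
# Move the JSON-shaped staging rows into the wider target table, then reset
# the staging table for the next load. All names here are hypothetical.
cur.execute("""
    INSERT INTO dbo.Target (Col1, Col2)
    SELECT Col1, Col2 FROM stg.JsonStaging;
    TRUNCATE TABLE stg.JsonStaging;
""")
conn.commit()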
Hope this helped!