Dataset Empty parameter value - azure-data-factory

I have an XML dataset and I want to parametrize the compression type so that .xml and .xml.gz files can be treated by the same pipeline.
When I put the value 'gzip' in the compression type it reads the .xml.gz file. I want to know what value I should put to read an uncompressed .xml file, because it does not accept an empty value. It is only able to read the .xml file when I delete the compression_type parameter.

You should pass "None" and it should work.

I feel "None" is more of a workaround in this particular case. "None" is still a string value, not empty.
In my scenario right now, I have an Excel dataset. I want to make every parameter as generic as possible, including the file path/name, sheet name, and the range. The value of "Range" under Connection tab allows empty value. However if I specify it as #dataset().DataRange and leave my parameter DataRange empty, I cannot preview the data or submit the pipeline because it complains that the value cannot be empty.

how to pass the outputs from Get metadata stage and use it for file name comparison in databricks notebook

I have 2 Get Metadata stages in ADF which fetch file names from 2 different folders. I need to use these outputs for a file name comparison in a Databricks notebook and return true if all the files are present.
How can I pass the output from the Get Metadata stages to Databricks, perform the string comparison, and return true if all files are present and false if even one file is missing?
Please find the below answer, which I explain with one Get Metadata stage; the same can be replicated for more than one as well.
Create an ADF pipeline with the below activities.
Now, in the Get Metadata activity, add childItems in the Field list as an argument, to pass the output of Get Metadata to the Notebook as shown below.
In the Databricks Notebook activity, add the below parameter as a Base Parameter, which will capture the output of Get Metadata and pass it as an input parameter to the Notebook. Generally this parameter will be of object datatype, but I converted it to string datatype to access the names of the files in the notebook, as shown below:
@string(activity('Get Metadata1').output.childItems)
Now we can access the Get Metadata output as a string in the notebook.
import ast
required_filenames = ['File1.csv','File2.csv','File3.csv']  # The expected files, for comparing with the output we get from the Get Metadata activity.
metadata_value = dbutils.widgets.get('metadata_output')  # Accessing the output from Get Metadata and storing it in a variable using Databricks widgets.
metadata_list = ast.literal_eval(metadata_value)  # Converting the above string into a list of dicts.
blob_output_list = []  # Creating an empty list to hold the names of the files we get from the Get Metadata activity.
for i in metadata_list:
    blob_output_list.append(i['name'])  # Adds the name of each file in blob storage to the list created above.
validateif = all(item in blob_output_list for item in required_filenames)  # True only if every required file name appears in the blob list.
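If the pipeline needs the result back (for example, to drive an If Condition activity), one option is to exit the notebook with the value as a string; a minimal sketch, assuming the widget setup above:

dbutils.notebook.exit(str(validateif))  # Returns 'True' or 'False' to ADF as the activity's runOutput

In ADF, an expression along the lines of @activity('Notebook1').output.runOutput could then read it (the activity name 'Notebook1' is an assumption).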
I tried it in the above way and was able to solve the provided requirement. Hope this helps. Please upvote the answer if it helps with your requirement.

Setting Default Value to 'None' for Parameter (Azure Data Factory)

I am currently trying to parametrize a dataset so that the compression type of a binary file can be set without creating a new dataset.
The issue that I am having is that I cannot seem to make the compression type default to 'None' while still having a parameter. I have tried typing in null, '', etc., but nothing seems to work. The pipeline will either not run, or it returns an error that the value cannot be null, is of invalid type "", etc. Any advice would be appreciated.
--Update
It appears this has something to do with how the Binary dataset is designed. Going through the Binary dataset code, I can see a difference in the usage of the compression method/code block (screenshots omitted). It indeed leads to an error when the default value is set to None, and likewise when the compression type is set to None.
You can reach out to support for an official response, or log an issue or share an idea on the feedback forums.
Also, it works fine when using the Default (None) option from the dropdown for the compression properties in the case of a Binary dataset.
However, I have tried this at both source and sink with csv files (Source.csv and Source.csv.gz, both ways): there you can set the parameter default value to None and it works just fine. The same would be expected for the Binary format dataset properties, but in vain 😕
Not an obvious solution, but you can add a parameter named "CompressionType" to your dataset and then edit the dataset JSON to add this under "typeProperties":
"@if(equals(dataset().CompressionType,'None'),'no_compression','compression')": {
    "type": "@dataset().CompressionType"
}
The trick is that the property name itself is an expression: when the parameter is 'None', the block is emitted under the unrecognized key 'no_compression' and is ignored, while any other value produces a normal 'compression' block of that type.

How do I prevent users from using a thousands separator in FileMaker Pro?

In FileMaker Pro, when using a number field, the user can choose whether or not to use a thousands separator. For example, if I have a database with a field for the price of an item, the user can enter either 1,000 or 1000.
I am using my database to generate an XML file that needs to be uploaded. The thing is that my XML schema dictates that only a value of 1000 is allowed, not 1,000. Therefore, I want to either automatically remove the comma, or (my preference in this case) alert the user when they try to enter a value with a thousands separator.
What I tried is the following.
For the field, I am setting Validation options. For example:
Require Strict data type: Numeric Only
Validated by calculation: Position ( Self ; "," ; 1 ; 1 ) = 0
Validated by calculation: Self = Substitute ( Self ; "," ; "" )
Auto-enter calculation: Filter( Self ; "0123456789." )
Unfortunately, none of these work. As the field is defined as a number (and I want to keep it like this, as I am also performing calculations based on this number), the Position and Substitute functions apparently ignore the thousands separator!
EDIT:
Note that I am generating my XML by concatenating a string, for example:
"<Products><Product><Name>" & Name & "</Name><Price>" & Price & "</Price></Product></Product>"
The reason is that what I am exporting is dependent on the values in my database. Therefore, I am not using the File > Export Records... function.
An auto-enter calculation will work, but you need to uncheck the box "Do not replace existing value of field" (which is checked by default).
I'd suggest using the calculation GetAsNumber ( Self ) as the auto-enter calc. If it should only contain integers, wrap that in a call to Int(), i.e. Int ( GetAsNumber ( Self ) ).
I am using my database to generate an XML file that needs to be uploaded. The thing is, that my XML scheme dictates that only a value of 1000 is allowed and not 1,000.
If this is only a problem when you export, why not handle it when exporting?
If you are exporting as XML using XSLT, you can add an instruction to your stylesheet to remove the comma from all number fields.
Alternatively, you can export from a layout where the field is formatted to display without the comma and select the "Apply current layout's data formatting to exported data" option when exporting.
Added:
Perhaps I should have clarified. I am not using the export function to generate the XML as there is some logic involved in how the XML should be formatted (dependent on the data that I want to export). What I do instead is that I make a string where I combine XML-tags and actual values from the database.
IMHO, you're making a mistake by not taking advantage of the built-in XML/XSLT export option. Any imaginable logic can be implemented this way, without burdening your solution with the fragile task of creating valid XML.
In any case, if you're using the field in a calculation, you can replace all references to it with:
GetAsNumber ( YourField )
to get an unformatted, numeric-only, value.
Your question puzzles me. As far as I know, FileMaker does not store the thousands separator, but rather offers it only as a display option.
That's also why those functions can't find it.
Are you sure you are exporting the raw data and not a "formatted as layout" variant?

Talend: How To Pass the Last Modified File Into tFileInputDelimited?

I have searched all over, and read this post.
But it doesn't seem complete and doesn't work.
The situation: I need to get the last modified file from a directory on the local machine. I then need to pass that file into the fileinputdelimited component.
I currently have:
tFileList --> iterate --> tIterateToFlow --> tSampleRow --> tFlowToIterate --> tFileInputDelimited --> tLogRow (just to make sure it's pulling the right file)
But it doesn't work. I have configured it so that tIterateToFlow has a column called
"FileName" with ((String)globalMap.get("CURRENT_FILE")) as the value,
"FileDirectory" with ((String)globalMap.get("CURRENT_FILEDIRECTORY")) as the value, and
"FileAndDirectory" with ((String)globalMap.get("CURRENT_FILEPATH")) as the value.
The tSampleRow is limited to "1".
The tFlowToIterate is set so that
"FileNameOnly" is the value of "FileName",
"FileDirectoryOnly" is "FileDirectory", and
"FilePathComplete" is "FileAndDirectory".
In the file location field of the tFileInputDelimited, I have ((String)globalMap.get("FilePathComplete")).
When it runs I get an error saying it cannot find the file or path. If I cut out the file input component and have it send straight to the tLogRow, it shows a single line of blank entry.
Any ideas?
I'm not sure if you've just slightly misconfigured the job here but it seems to work fine for me.
Here are a few screenshots showing my job design (omitted).
The only thing I can think of just by looking at your post is that you might have slightly messed up the key value pair combinations in the tFlowToIterate. I tend to find that the default settings there work fine pretty much all of the time and it makes it a little more obvious what it's doing as well.
EDIT: Actually, it looks like you might be using the wrong values in your tIterateToFlow. The tFileList will throw the values for the file paths etc. into the global map, but it will preface them with the unique component name (e.g. tFileList_1_CURRENT_FILE rather than CURRENT_FILE). If you hit Ctrl+Space in the value window it should prompt you with a list of the available values (these are also shown in the "Outline" tab of the Studio). It typically makes an implicit conversion to String, but for this you will need to convert explicitly, so use .toString() instead of the (String) cast.
Another way to get the last modified file is as below:
tFileList (sorted DESC by file modified date) --> tFixedFlowInput (schema: filename, filenumber) --> tHashOutput
Here, in the tFixedFlowInput:
filename = (String)globalMap.get("tFileList_1_CURRENT_FILEDIRECTORY")+"/"+(String)globalMap.get("tFileList_1_CURRENT_FILE")
filenumber = (Integer)globalMap.get("tFileList_1_NB_FILE")
What the above accomplishes is getting the list of all files in the directory with their number/rank, where the last modified file will have file number = 1, the next one 2, and so on.
Now, on OnSubjobOk of the above tFileList, you can have a tHashInput which will read from the above tHashOutput and filter only the row where filenumber == 1, which means the last modified file:
tHashInput (linked to tHashOutput) --> tFilterRow (filenumber == 1) --> tLogRow
One reason why you are getting null is probably that you used globalMap.get("CURRENT_FILEPATH") instead of globalMap.get("tFileList_1_CURRENT_FILEPATH").
The simple solution to the above problem could be as below:
tFileList (sorted ASC by file modified date) --> tIterateToFlow --> tJava (just to end the subjob)
Then, on OnSubjobOk:
tFileInputDelimited (use (String)globalMap.get("tFileList_1_CURRENT_FILE") or (String)globalMap.get("tFileList_1_CURRENT_FILEPATH") as the file name/file path)
Explanation:
Since tFileList iterates over all the files in ASC order of modified date, the latest file name is what remains stored in the globalMap after the last iteration. The list is only iterated up to tIterateToFlow, hence after this component (String)globalMap.get("tFileList_1_CURRENT_FILE") will always give the last file name from the iterated list, which is the latest file in our case.
(Screenshots of the main flow and component view omitted.)
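For comparison, outside Talend the same "pick the newest file" selection is just a max by modification time. A minimal Python sketch of that logic (the directory and pattern are hypothetical):

import glob
import os

files = glob.glob("/data/incoming/*.csv")  # hypothetical directory and file pattern
latest = max(files, key=os.path.getmtime)  # the file with the newest modification time
print(latest)  # this path is what you would hand to the delimited-file reader

This mirrors what sorting tFileList by file modified date achieves before the file name is read from the globalMap.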

Text input through SSRS parameter including a Field name

I have a SSRS "statement" type report that has general layout of text boxes and tables. For the main text box I want to let the user supply the value as a parameter so the text can be customized, i.e.
Parameters!MainText.Value = "Dear Mr.Doe, Here is your statement."
then I can set the text box value to be the value of the parameter:
=Parameters!MainText.Value
However, I need to be able to allow the incoming parameter value to include a dataset field, like so:
Parameters!MainText.Value = "Dear Mr.Doe, Here is your [Fields!RunDate.Value] statement"
so that my report output would look like:
"Dear Mr.Doe, Here is your November statement."
I know that you can define it to do this in the text box by supplying the static text and the field request, but I need SSRS to recognize that inside the parameter string there is a field request that needs to be escaped and bound.
Does anyone have any ideas for this? I am using SSRS 2008R2
Have you tried concatenating?
Parameters!MainText.Value = "Dear Mr.Doe, Here is your " & Fields!RunDate.Value & " statement"
There are a few dramatically different approaches. To know which is best for you will require more information:
1. Embedded code in the report. Probably the quickest to implement would be embedded code in the report that returns the parameter, but calls String.Replace() appropriately to substitute in dynamic values (see the sketch after this list). You'll need to establish a convention with the user for which strings will be replaced. Embedded code also gets you access to many objects in the report. For example:
Public Function TestGlobals(ByVal s As String) As String
    Return Report.Globals.ExecutionTime.ToString
End Function
will return the execution time. Other methods of accessing parameters for the report are shown here.
1.5. If this function is getting very large, look at using a custom assembly. Then you can have a better authoring experience with Visual Studio.
2. Modify the XML. Depending on where you use this, you could directly modify the .rdl/.rdlc XML.
3. Consider other tools. If you need to give the user more flexibility over report authoring, there are many tools built specifically for this purpose, such as SSRS's Report Builder.
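To make option 1 concrete, here is the placeholder-substitution logic sketched in Python for illustration (the real embedded code would be VB.NET, and the [Name] token format is an assumed convention, not an SSRS feature):

def fill_placeholders(template, values):
    # Replace each [Name] token with its value from the dictionary.
    for name, value in values.items():
        template = template.replace("[" + name + "]", str(value))
    return template

print(fill_placeholders("Dear Mr.Doe, Here is your [RunDate] statement.", {"RunDate": "November"}))
# Dear Mr.Doe, Here is your November statement.

The embedded-code version would do the same with String.Replace() on the parameter string.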
Here's another approach: Display the parameter string with the dataset value already filled in.
To do so, create a parameter named RunDate (for example), set its Default value to "Get values from a query", and select the first dataset and the value field (RunDate). Now the parameter will hold the RunDate field and you can use it elsewhere. Make this parameter hidden or internal and set the correct data type, e.g. Date/Time, so you can format its value later.
Now create the second parameter which will hold the default text you want:
Parameters!MainText.Value = "Dear Mr.Doe, Here is your [Parameters!RunDate.Value] statement"
Not sure if this syntax works but you get the idea. You can also do formatting here e.g. only the month of a Datetime:
="Dear Mr.Doe, Here is your " & Format(Parameters!RunDate.Value, "MMMM") & " statement"
This approach uses only built-in methods and avoids the need for a parser so the user doesn't have to learn the syntax for it.
There is of course one drawback: the user has complete control over the parameter contents and can supply a value that doesn't match the report content - but that is also the case with the String Replace method.
And just for the sake of completeness, there's also the simplistic option of appending multiple parameters: create 2 parameters named MainTextBeforeRunDate and MainTextAfterRunDate.
The Textbox value expression becomes:
=Parameters!MainTextBeforeRunDate.Value & Fields!RunDate.Value & Parameters!MainTextAfterRunDate.Value.
This should explain itself. The simplest solution is often the best, but in this case I have my doubts. At least this makes sure your RunDate ends up in the final report text.