I'm quite new to Data Factory and have a question regarding looping over a list of consecutive numbers.
I have been assigned a task to retrieve data from an API in Data Factory (based on a pre-existing template). The problem is that the API is split into multiple pages, and the link looks like ".../2020/entries?skippages=1&pagesize=1000".
In my pipeline I therefore need to loop over the page number (the number of iterations is 11).
I have looked a bit into ForEach and Until loops, but they seem a lot more complicated than necessary.
What is best practice for such a task?
Hopefully, this makes sense. If not, please let me know and I will elaborate.
Thanks in advance.
Azure Data Factory (ADF) and Synapse Pipelines have a number of functions you can use in your pipelines, including range which generates a range of numbers.
All you have to do is specify range in the Items section of a ForEach loop. A simple example: @range(1,11).
To explain the definition a bit further: all ADF pipeline expressions (not including Mapping Data Flows) start with the @ symbol, range is the function, 1 is the start index, and 11 is the count, i.e. the number of values to generate (so here the sequence runs from 1 to 11). See the help for the range function here.
In order to access the current number inside the loop, use the item() syntax; remember the @ at the start of the expression, i.e. @item().
As you are paging from a web API, you should have a good look at the Pagination section of the Copy activity which may offer an alternate and more dynamic approach depending on the capabilities of the API you are calling.
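For intuition, here is a minimal Python sketch of what the ForEach over @range(1, 11) amounts to: building one URL per page from the template in the question. The base path is left as the placeholder from the question, and the page count of 11 is taken from it as well.

```python
# Sketch (not ADF): build one URL per page, mirroring @range(1, 11).
BASE_URL = ".../2020/entries"  # placeholder path from the question

def page_urls(start=1, count=11, pagesize=1000):
    """Return one URL per page; range(start, start + count) plays the
    role of ADF's @range(start, count)."""
    return [
        f"{BASE_URL}?skippages={page}&pagesize={pagesize}"
        for page in range(start, start + count)
    ]

urls = page_urls()  # 11 URLs, skippages=1 through skippages=11
```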
Beginner question. I would like to have a value list display only the records in a found set.
For example, in a law firm database that has two tables, Clients and Cases, I can easily create a value list that displays all cases for clients.
But that is a lot of cases to pick from, and invites user mistakes. I would like the selection from the value list to be restricted to cases matched to a particular client.
I have tried this method https://support.claris.com/s/article/Creating-conditional-Value-Lists-1503692929150?language=en_US and it works up to a point, but it requires too much data entry and too many tables.
It seems like there ought to be a simpler method using the find function. Any help or ideas are greatly appreciated.
I have a ForEach activity where inside each iteration, I need to set a few iteration specific variables. I can achieve this by using variables defined for the pipeline (pipeline scope), but this forces me to run the loop in Sequential mode so that multiple iterations running in parallel will not update the same variable. What I really need is the ability to define these variables within each iteration (iteration scope) so that I can run the ForEach activity in parallel mode.
I've considered creating a SQL dataset where I could do a lookup for fake values (SELECT 1 AS var1, 2 AS var2) just to get a structure where I can set and use those values, but that seems really lame. I've also considered using an array variable type with the AppendVariable option, but that introduces a lot of custom parsing.
Would be nice if I could just have an InMemory dataset that doesn't have to be tied to a data source where I could use it as a structure inside my ForEach iteration. Does anyone have any other ideas about how to set iteration specific variables inside ForEach loop?
I agree, this is very annoying and irritating.
If the first part of Jason's answer is viable for your situation, then that's definitely the way to go (define the variables outside the loop).
But assuming that the variables are dynamically calculated per iteration, the only solution I know of is to define the body of the ForEach loop as its own pipeline. You can then define variables inside that inner pipeline, which are "scoped" to each separate execution of the inner pipeline.
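To illustrate the scoping idea outside ADF, here is a small Python sketch, with threads standing in for parallel iterations and a local variable standing in for the inner pipeline's variable. A pipeline-scoped variable would be one shared slot that parallel iterations overwrite; a variable local to each execution cannot be clobbered.

```python
import threading

results = []
lock = threading.Lock()

def iteration(i):
    # Analogue of an inner-pipeline variable: local to this iteration,
    # so parallel iterations cannot overwrite each other's value.
    # (A single module-level variable shared by all threads would be the
    # analogue of a pipeline-scoped variable, and would race.)
    local_var = i * 10
    with lock:
        results.append((i, local_var))

threads = [threading.Thread(target=iteration, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Regardless of scheduling order, each iteration's pair stays consistent, which is exactly the property the inner-pipeline workaround buys you.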
Quite a lot of ADF's pipeline limitations can be circumvented like this: nested Ifs/ForEaches, activity limits, etc.
Currently, about the best way to do this is to pull the values from an outer Lookup or Get Metadata activity if you can. Using an inner Lookup wouldn't be as cost- or performance-efficient, especially if you were iterating over hundreds or thousands of items. Of course, this only works if you can determine the values for each iteration ahead of time. If you can't, I would probably go for your Lookup approach, or, if you can get away from the variables entirely, just set the values with an expression using dynamic properties.
Another workaround that worked for me was to use a Filter Activity. It's not super pretty but can help.
Say you want to assign expr to a variable. Just add a filter activity and configure it like this:
Items: @array(expr)
Condition: @equals(1, 1)
Then instead of using a variable simply use the Filter Activity output like this:
@first(activity('<your filter activity name>').output.Value)
I've built a KNIME workflow that helps me analyse (sales) data from numerous channels. In the past I exported all orders manually and used an XLSX or CSV reader, but I want to do it via WooCommerce's REST API to reduce manual labor.
I would like to be able to receive all orders up until now from a single query. So far, I only get as many as the number I fill in for &per_page=X. But if I fill in something like 1000, it gives an error. This, plus my common sense, gives me the feeling I'm thinking about it the wrong way!
If it is not possible, is looping through all pages the second best thing?
I've managed to connect to the API via basic auth. The following query returns orders, but only 10:
I've tried increasing per_page, but I do not think this is the right way to get all orders in one table.
https://XXXX.nl/wp-json/wc/v3/orders?consumer_key=XXXX&consumer_secret=XXXX
Ideally, I would be able to receive all orders up until now from a single query, but it also feels like this is not the common way to do it. Is looping through all pages the second-best thing?
Thanks in advance for your responses. I am more of a data analyst than a data engineer or scientist, and I hope your answers will help me towards my goal of being more of a scientist :)
It's possible by passing the per_page param with the request:
per_page (integer): Maximum number of items to be returned in the result set. Default is 10.
Try -1 as the value
https://woocommerce.github.io/woocommerce-rest-api-docs/?php#list-all-orders
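If the API rejects large per_page values (the WordPress REST API typically caps per_page at 100), looping through the pages is the usual fallback. Here is a minimal Python sketch of that loop, with the HTTP call abstracted behind a fetch_page function so the paging logic stays visible; in practice fetch_page would wrap a requests.get call to /wp-json/wc/v3/orders with your auth. Function names here are illustrative.

```python
def fetch_all_orders(fetch_page, per_page=100):
    """Collect all orders by walking pages until a short page appears.

    fetch_page(page, per_page) -> list of orders for that page.
    Stops when a page returns fewer than per_page items, i.e. the
    last (possibly empty) page has been reached.
    """
    orders, page = [], 1
    while True:
        batch = fetch_page(page, per_page)
        orders.extend(batch)
        if len(batch) < per_page:
            return orders
        page += 1
```

With a real API you could also read the X-WP-Total / X-WP-TotalPages response headers instead of probing for a short page, but the loop above needs no extra state.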
Is there a way to dynamically create tables in wiki?
Use case: I'm trying to mimic something similar to SOAP Sonar in FitNesse. SOAP Sonar:
1. Once we import the WSDL, SOAP Sonar generates inputs for the operations in the WSDL.
2. Choose an operation, enter input, and then execute the operation.
3. In the case of arrays, we can select the size of the array and enter values in the respective array.
FitNesse:
1. I'm able to achieve point 1 using SoapUI jars.
2. Point 2 I'm able to achieve using the XmlHttpTest fixture.
I'm stuck on the 3rd point. Is there a way I can do this in FitNesse? (My idea is that from point 1 I can get a sample input for each operation, from which I will know whether there are arrays/complex types present in input.xml, but how do we represent this in the wiki dynamically?)
Thanks in advance
What I've done in the past is use ListFixture (and MapFixture) to dynamically fill a List (and Map/Hashes for each element's properties) and then use these as input values to a XmlHttpTest's feature to create the body to be sent using a FreeMarker template (which allows iteration over a list, which I use to create elements in the array based on the list).
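As a rough illustration of what that FreeMarker template does (iterate over a list, emitting one array element per entry), here is the same idea in Python; the element and wrapper names are made up for the example.

```python
def build_array_body(values, element="item", wrapper="items"):
    """Render an XML array fragment from a list: one <element> per entry,
    mirroring a template loop that creates array elements from list input."""
    inner = "".join(f"<{element}>{v}</{element}>" for v in values)
    return f"<{wrapper}>{inner}</{wrapper}>"

# build_array_body([1, 2]) -> "<items><item>1</item><item>2</item></items>"
```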
But this gets quite complex quickly. Is that level of flexibility truly required? I found that quite often hard coding the number of elements in arrays/lists in the wiki is simpler to do and makes the test far easier to understand/maintain.
In most cases I prefer to create a script (or scenario) with the right number of elements for the test case(s), with the request in the wiki page. The use of scenarios allows me to test with different values (but the same number of elements). A different element count gets its own script/scenario.
Being able to dynamically change the number of elements is only worthwhile if you need to test for many different counts, otherwise the added complexity of dynamically creating the body is just not worth it.
I have a Core Data application and I would like to get results from the db based on certain parameters. For example, if I want to grab only the events that occurred in the last week, and the events that occurred in the last month. Is it better to do a fetch for the whole entity and then work with that result array, creating arrays out of it for each situation, or is it better to use predicates and make multiple fetches?
The answer depends on a lot of factors. I'd recommend perusing the documentation's description of the various store types. If you use the SQLite store type, for example, it's far more efficient to make proper use of date range predicates and fetch only those in the given range.
Conversely, say you use a non-standard attribute, like searching for a substring in an encrypted string: you'll have to pull everything in, decrypt the strings, do your search, and note the matches.
On the far end of the spectrum, you have the binary store type, which means the whole thing will always be pulled into memory regardless of what kind of fetches you might do.
You'll need to describe your managed object model and the types of fetches you plan to do in order to get a more specific answer.
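As a non-Core-Data illustration of the SQLite-store point, here is the same idea with plain sqlite3 in Python: the date-range predicate is evaluated by the store, so only matching rows ever come back, instead of fetching everything and filtering the array in memory. Table and column names are made up.

```python
import sqlite3
from datetime import datetime, timedelta

# In-memory table standing in for an "events" entity.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (name TEXT, occurred_at TEXT)")

now = datetime(2024, 1, 31)
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("old", (now - timedelta(days=40)).isoformat()),
     ("recent", (now - timedelta(days=3)).isoformat())],
)

# Push the date-range predicate into the query (ISO-8601 strings compare
# correctly as text), analogous to a date predicate on an NSFetchRequest:
week_ago = (now - timedelta(days=7)).isoformat()
last_week = conn.execute(
    "SELECT name FROM events WHERE occurred_at >= ?", (week_ago,)
).fetchall()
```

Only the "recent" row is returned; the 40-day-old row never leaves the store, which is the efficiency win the answer describes for the SQLite store type.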