Azure Data Factory For Each Loop is importing all my CSV files per iteration instead of just the file name I *think* I've told it to

I could really do with some help with ADF. I recently started using it, thinking it would be similar to SSIS, but wow, am I having a hard time! Over the last few weeks I've built up a fairly complicated pipeline that reads a list of files from a folder and then, inside a For Each loop, is supposed to check where the data starts in each file and import it into a SQL table. I won't bore you with all the issues I've had so far. At the moment everything seems to work except the For Each part, which imports all the files in the folder on every iteration. The dataset configuration doesn't seem to recognise the file name for the current iteration: in the debug output I can see the pipeline pick up the list of files and set the DSFileName variable to the first of them, but the output of the data flow task contains both files. It seems I've missed a step somewhere. I've just spent the last 5 hours looking and could really do with some help :(
I believe I've followed the instructions here: https://www.sqlshack.com/how-to-use-iterations-and-conditions-activities-in-azure-data-factory/
Some pictures to show the debugging I've done:
This shows it picking up 2 files (after I filtered out folders and other items)
This shows only the first file name being passed into the first data flow
This shows the output, where it has somehow picked up both files and displays a count of 2 files
This shows the dataset setup, where I believe I have correctly set the variable as the file name to be used
I don't even know where to start now, to be honest. I believe I've checked everything I can see, and I'm not using any wildcards. I can see one file name being passed into that variable per iteration, yet on each iteration both files' counts go into the table and the output of each data flow task shows both file counts.
Does anybody have any ideas or know what I've missed?
EDIT 23/07/22: Pics of the source as requested:
Data Source Settings
Data Source Options

So it turns out that adding .name to item() in the dataset parameter (i.e. item().name) makes it use just the current file instead of all of them. I'm confused by this, as all the documentation I've read states that item() references the CURRENT item within the For Each. Did I misunderstand?
With .name added to the dataset parameter here, only the current file is now imported per loop iteration
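
One possible reading, assuming the file list comes from the childItems output of a Get Metadata activity: item() does refer to the current element, but each element there is an object with name and type fields rather than a bare string, so the file name has to be pulled out with item().name. The Python sketch below only mimics that shape; it is not ADF code, and the file names are made up for illustration.

```python
# Hypothetical stand-in for the childItems array returned by Get Metadata.
# The file names are invented; only the {"name", "type"} shape matters.
child_items = [
    {"name": "sales_2022_01.csv", "type": "File"},
    {"name": "sales_2022_02.csv", "type": "File"},
]


def file_names(items):
    """Return what item().name would yield on each For Each iteration."""
    return [item["name"] for item in items]


for item in child_items:
    # item() corresponds to the whole object; item().name is the string
    # that a file-name dataset parameter expects.
    print(type(item).__name__, "->", item["name"])
```

If the list came from somewhere other than childItems, the element shape may differ, so it is worth checking the debug output of the activity that feeds the loop.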

Related

How to do duplicate file check in DataStage?

For instance:
File A is loaded, then the next day
File B is loaded, then the next day
File A is received again. This time the sequence should abort.
Can anyone help me out with this?
Thanks

There are multiple ways to solve this, but please don't build in intentional aborts, as they are most likely to come back at you like boomerangs.
Keep track of file names and file hashes (such as an MD5 sum) in a table and compare against that list before loading. If the file is already known, handle or ignore it.
Just read the file again as if it were new or updated. Compare the old data with the new data using the Change Capture stage and handle the data as needed, e.g. write changed and new data to the target. (recommended)
I would not recommend writing a sequence that "should abort", as this is not the goal of an ETL process. If the file contains the very same content that is already known, just ignore it. If it has updated data, handle it as needed. Only abort if there is a technical issue, e.g. the given file is wrongly formatted. An abort of a job should indicate that something is wrong with the job. When you get a file twice, it's not the job that failed.
If an error is found in the data that needs to be fixed by others, write the information about it to a table. Have another, independent process monitor that table and tell the data producer about it (via dashboard, email, ...).
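
The first option (tracking file hashes in a table) can be sketched outside DataStage as follows. This is only an illustration in Python with SQLite; the table name loaded_files and the function names are assumptions made up for the example, not part of any DataStage API.

```python
import hashlib
import sqlite3


def file_md5(path):
    """Hash the file in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def open_registry(db_path=":memory:"):
    """Create the (hypothetical) tracking table if needed; return the connection."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS loaded_files ("
        "md5 TEXT PRIMARY KEY, file_name TEXT NOT NULL)"
    )
    return conn


def should_load(conn, path):
    """Return True and record the file if its content is new, else False."""
    checksum = file_md5(path)
    known = conn.execute(
        "SELECT 1 FROM loaded_files WHERE md5 = ?", (checksum,)
    ).fetchone()
    if known:
        return False  # Same content seen before: skip it, do not abort.
    conn.execute(
        "INSERT INTO loaded_files (md5, file_name) VALUES (?, ?)",
        (checksum, path),
    )
    conn.commit()
    return True
```

Keying on the content hash rather than the file name means a renamed copy of File A is still recognised, while a file with the same name but new content is loaded.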

Best Practice to Store Simulation Results

Dear Anylogic Community,
I am struggling to find the right approach for storing my simulation results. I have created datasets that keep track of every value I am interested in. They live in Main (see below).
My aim is to run a parameter variation experiment. In every run, I change the value of p_nDrones (see below).
After the experiment, I would like to store all the datasets in one Excel sheet.
However, when I run the parameter variation experiment and afterwards check the log of the datasets (datasets_log), the changed values do not even show up (2 is the value I set in the normal simulation).
Now my question: do I need to create another type of dataset if I want to track the values that are produced in the experiments? Why are they not stored after the experiment is executed?
I would really appreciate it if someone could share the best way to set up this export of experiment results. I would like to store the whole time series for every dataset.
Thank you!

The best option would be to write the outputs to some external file at the end of each model run.
You can use Excel if you want to, since it has a nice excelFile.writeDataSet() function, but I personally would not advise it.
I would rather write the data to a text file, as you will have much more control over the writing and over the file itself, it is thread-safe, and it is usable on many more platforms than Microsoft Excel.
See my example below:
Set up parameters of type TextFile in your model, to which you will write the data at the end of the model run. Here I used the model's On destroy code to write out the data from the datasets.
Here you can immediately see the benefit of using the text file! You can add the number of drones being simulated (or the scenario name, or any other parameter) in a column, whereas with Excel this would be a pain...
Now you can pass your specific text file to the model by adding it to the parameter variation page and providing it to the model through the parameters.
You will see that I also set up some headers for the text file in the Initial experiment setup part, and then, at the very end of the experiment, I close the text files in the After experiment section so that the text files can be used.
Here is the result if you simply right-click on the text files and open them in Excel. (Excel will always have a purpose, even if it is just to open text files ;-) )
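
The file layout described above (one header row, then one row per data point with the run's parameter value in its own column) can be sketched as follows. This is plain Python rather than AnyLogic code, and the column names and sample values are made up for illustration.

```python
import csv
import io


def write_run(writer, n_drones, series):
    """Append one run's (time, value) pairs, tagged with its parameter value."""
    for time, value in series:
        writer.writerow([n_drones, time, value])


def export_runs(runs, out):
    """Write all runs to one tab-separated stream with a single header row."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["n_drones", "time", "value"])  # header written once
    for n_drones, series in runs.items():
        write_run(writer, n_drones, series)


# Invented results for two parameter values.
runs = {2: [(0.0, 5.0), (1.0, 7.5)], 3: [(0.0, 4.0), (1.0, 6.0)]}
buffer = io.StringIO()
export_runs(runs, buffer)
print(buffer.getvalue())
```

Because every row carries its own parameter value, all runs can share one file and still be filtered or pivoted by run afterwards.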

where to store scheduler task related data in TYPO3

I'm working on an import script that loads data from a JSON feed. For this I need to save data at two moments:
1) When running the script, I import a number of entries broken down into smaller chunks. This number can vary depending on what the feed delivers. Since I don't want to fetch the feed for every chunk but only once per import, I would like to save the number of entries written per run so that it is available in the next run. Let's say I want to import 100 entries at 25 per run. That would make 4 runs. But suppose that in one run only 20 of the 25 are eligible for saving. Then I'm 5 short at the end of the import. I would need to save the number of saved entries so I can do more runs if needed.
2) To find out how many entries should be imported, I would like to save the id of the last imported item so I can check against this id in the next scheduler run of the import.
Where is the best place to safely save and access this data from Extbase? The DB (seems excessive?)? The extension configuration (I could not find a way to save data here, only read it)? Could I set custom temporary variables in $GLOBALS (at least for question 1)?
Thanks
EDIT: thanks to @Krzysztof Kasprzyca, this works:
// get the registry (backed by the sys_registry table)
$registry = \TYPO3\CMS\Core\Utility\GeneralUtility::makeInstance(\TYPO3\CMS\Core\Registry::class);
// get info stored by a previous run
$alreadyImported = $registry->get('tx_my_ext_name', 'numberImported');
// set info for the next run (set() has no return value, so nothing is assigned)
$registry->set('tx_my_ext_name', 'numberImported', $numberImported);

Consider the database table called sys_registry, which you can read and write through the \TYPO3\CMS\Core\Registry class.
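
The bookkeeping from the question (100 entries at 25 per run, with some entries not eligible) can be sketched as follows. This is a Python illustration only: the dict stands in for the registry's get/set calls, and the key names offset and lastImportedId as well as the function name are assumptions made up for the example.

```python
CHUNK_SIZE = 25   # entries examined per scheduler run
TARGET = 100      # entries that should end up imported


def run_import(registry, feed, is_eligible):
    """Process one chunk and persist progress for the next scheduler run.

    Returns False when there is nothing left to do.
    """
    offset = registry.get("offset", 0)
    imported = registry.get("numberImported", 0)
    if imported >= TARGET or offset >= len(feed):
        return False
    consumed = 0
    for entry in feed[offset:offset + CHUNK_SIZE]:
        if imported >= TARGET:
            break  # target reached mid-chunk: leave the rest untouched
        consumed += 1
        if is_eligible(entry):
            imported += 1
            registry["lastImportedId"] = entry
    registry["offset"] = offset + consumed
    registry["numberImported"] = imported
    return True


# Example: ids 1..130 in the feed, the first five are not eligible.
registry = {}
feed = list(range(1, 131))
runs = 0
while run_import(registry, feed, lambda entry: entry > 5):
    runs += 1
print(runs, registry)
```

With five ineligible entries the first run saves only 20, so a fifth run is needed to reach 100, which is the situation the stored counter is meant to detect.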

In which file is the _AppInfo data stored in Beckhoff TwinCAT 3 PLC

I'm looking for the 'AppTimeStamp' information so it can be used to verify that the code has not been updated or changed by service personnel.
In the question "Detect code changes on Beckhoff PLC using C#" I already found part of the information I need, but I was not able to add a comment there due to the 'new user' limitations.

You can find the AppTimestamp in the _AppInfo instance.
So just call _AppInfo.AppTimestamp in your program to know the time of the last application start.
Make sure you also check the number of online changes since last download with the OnlineChangeCnt counter which you will also find in the _AppInfo instance.

There are several places where this value could be stored. TwinCAT saves data to the C:\TwinCAT\3.1\Boot folder; the different files are explained here.
The ProjectName, for example, can be found in the configuration data (CurrentConfig.xml), near the end of the file (TcBootProject/ProjectInfo/ProjectName). The same file contains one date (<TcBootProject CreateTime="2019-06-10T13:14:17">), but it seems to be the time at which the boot project was built.
I couldn't find the AppTimestamp date in any file, but perhaps TwinCAT uses the creation time of the files in those folders? Or perhaps it's hidden somewhere in the binary.
When you update the software without updating the boot project, the file Port_851_act.tizip is updated, so you can check its timestamp. When you update the boot project too, Port_851_boot.tizip and other files are also updated.
So basically, to check whether someone has updated the code, check the modified dates of the files under the Boot directory. I suppose only the .bootdata files should change, as they contain saved persistent data. Of course, the dates can easily be changed with a third-party program, so a more robust option is to compare the contents of the Port_851.crc file, since it contains the CRC check value of the code. It will always change when the boot project is updated.
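
The comparison suggested above can be sketched as follows: fingerprint the contents of Port_851.crc once as a baseline, then compare later readings against it. This is a Python illustration (the linked question asks about C#, but the idea carries over); the function names are made up, and the internal format of the .crc file is not assumed, only that its bytes change when the boot project does.

```python
import hashlib
from pathlib import Path


def fingerprint(boot_dir, names=("Port_851.crc",)):
    """Map each watched file name to a SHA-256 of its contents (None if missing)."""
    result = {}
    for name in names:
        path = Path(boot_dir) / name
        result[name] = (
            hashlib.sha256(path.read_bytes()).hexdigest() if path.is_file() else None
        )
    return result


def changed_files(baseline, current):
    """Return the sorted names whose fingerprint differs from the baseline."""
    return sorted(name for name in current if baseline.get(name) != current[name])
```

A typical use would be to store fingerprint(r"C:\TwinCAT\3.1\Boot") after commissioning and later flag any name returned by changed_files(stored, fingerprint(r"C:\TwinCAT\3.1\Boot")). Comparing contents instead of modified dates avoids the date-tampering problem mentioned above.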

Symfony: getting form values before and after form handling

Hello, I want to be able to compare values before and after form handling, so that I can process them before flush.
What I do is collect the old values in an array before handleRequest().
I then compare the new values to the old values in the array.
It works perfectly on simple values, such as strings.
However, I want to work on uploaded files. I am able to get their full paths and names before handling the form, but when I get the values after checking that the form is valid, I still get the same old values.
I tried calling both $entity->getVar() and $form->getData()->getVar(), and I get the same output.

Hello, I actually found a solution. It is a departure from the strategy announced in my question, which I realize was somewhat truncated regarding my objective. That objective was to compare old file names and new file names (the names actually include the full path) for changes, so that I could unlink those old files that were no longer in the new list. Basically, I wanted to clean up after a file was uploaded to replace another without the first one being deleted first, and to save the webmaster the hassle of sorting uniqid-named files that are still used by the web site from those that are useless.
The problem is that my upload functions, which are very similar to the file upload examples shown on the official documentation pages, seemed to take effect at flush time.
So, since what I wanted to do with those files had nothing to do with database operations, I resorted to launching the step-two code after flush, which works fine.
However, I am intrigued by your solutions, as they are both strategies I hadn't thought of. Thank you for the suggestions.
That said, I am not sure that cloning the whole object will be as straightforward as comparing two arrays of file names.
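
The cleanup described above (delete the old files that are no longer in the new list, after flush) amounts to a set difference followed by an unlink. A minimal sketch in Python rather than PHP, with made-up function names:

```python
import os


def stale_files(old_paths, new_paths):
    """Return the old paths that no longer appear among the new ones."""
    return sorted(set(old_paths) - set(new_paths))


def cleanup(old_paths, new_paths):
    """Delete stale files that still exist on disk; return what was removed."""
    removed = []
    for path in stale_files(old_paths, new_paths):
        if os.path.isfile(path):
            os.remove(path)
            removed.append(path)
    return removed
```

Here old_paths would be collected before the form is handled and new_paths after flush, once the upload code has assigned the final names.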
