Where to store scheduler task related data in TYPO3

I'm working on an import script that loads data from a JSON feed. For this I need to save data at two points:
1) When running the script, I import a number of entries broken down into smaller chunks. This number can vary depending on what the feed delivers. Since I don't want to fetch the feed on every chunk but only once per import, I would like to save the number of written entries per run so it is available in the next run. Let's say I want to import 100 entries at 25 per run. That would make 4 runs. But now in one run only 20 out of the 25 are eligible for saving, so I'm 5 short at the end of the import. I would need to save the number of saved entries so I can do more runs if needed.
2) To find out how many entries should be imported, I would like to save the ID of the last imported item so I can check against this ID in the next scheduler run of the import.
Where is the best place to safely save and access this data from Extbase? The DB (seems excessive?)? The extension configuration (I could not find a way to save data there, only read it)? Could I set custom temporary variables in $GLOBALS (at least for question 1)?
Thanks
EDIT: thanks to @Krzysztof Kasprzyca, this works:
// get registry
$registry = \TYPO3\CMS\Core\Utility\GeneralUtility::makeInstance(\TYPO3\CMS\Core\Registry::class);
// get info
$alreadyImported = $registry->get('tx_my_ext_name', 'numberImported');
// set info
$registry->set('tx_my_ext_name', 'numberImported', $numberImported);

You could use the database table sys_registry, which is read and written through the \TYPO3\CMS\Core\Registry class.
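To make the bookkeeping from the question concrete, here is a minimal, language-neutral sketch in Python. It is only an illustration of the logic, not TYPO3 code: the dict `registry` stands in for the Registry get()/set() calls shown above, and the names `run_chunk`, `feedOffset`, `lastImportedId` and `is_eligible` are assumptions made up for this example.

```python
def run_chunk(feed, registry, target=100, chunk_size=25,
              is_eligible=lambda entry: True):
    """Process one scheduler run and persist the progress in the registry.

    Returns False when there is nothing left to do, True otherwise.
    """
    saved = registry.get("numberImported", 0)
    offset = registry.get("feedOffset", 0)
    if saved >= target or offset >= len(feed):
        return False

    consumed = 0
    for entry in feed[offset:offset + chunk_size]:
        consumed += 1
        if is_eligible(entry):
            saved += 1
            # Remember the last entry that was actually saved.
            registry["lastImportedId"] = entry["id"]
            if saved == target:
                break

    # Persist the counters so the next run can continue from here.
    registry["numberImported"] = saved
    registry["feedOffset"] = offset + consumed
    return True
```

Because the number of saved entries is stored rather than the number of runs, a run that only saves 20 of its 25 entries simply leads to one more run later on.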

Related

How to do a duplicate file check in DataStage?

For instance:
File A is loaded, then the next day
File B is loaded, then the next day
File A is received again. This time the sequence should abort.
Can anyone help me out with this?
Thanks
There are multiple ways to solve this, but please don't abort intentionally, as such aborts are most likely to come back at you like a boomerang.
1) Keep track of filenames and file hashes (such as an MD5 sum) in a table and compare against that list before loading. If the file is known, handle or ignore it.
2) Just read the file again as if it were new or updated. Compare the old data with the new data using the Change Capture stage and handle the data as needed, e.g. write changed and new data to the target. (recommended)
I would not recommend writing a sequence that "should abort", as this is not the goal of an ETL process. If the file contains the very same content that is already known, just ignore it. If it has updated data, handle it as needed. Only abort if there is a technical issue, e.g. the given file is incorrectly formatted. An abort of a job should indicate that something is wrong with the job. When you get a file twice, it's not the job that failed.
If an error was found in the data that needs to be fixed by others, write the information about it to a table. Have another, independent process monitor that table and tell the data producer about it (via dashboard, email, ...).
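The hash-tracking idea can be sketched outside DataStage as follows. This is only an illustration in Python with an SQLite table; the table name `loaded_files` and the function names are assumptions for this example, and in a real job the lookup would live in whatever database the sequence already uses.

```python
import hashlib
import sqlite3


def file_md5(path):
    """Return the MD5 hex digest of a file, read in blocks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def register_file(conn, path):
    """Return True if the file content is new, False if it was seen before."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS loaded_files "
        "(md5 TEXT PRIMARY KEY, filename TEXT)"
    )
    checksum = file_md5(path)
    known = conn.execute(
        "SELECT 1 FROM loaded_files WHERE md5 = ?", (checksum,)
    ).fetchone()
    if known:
        return False  # already loaded: skip or handle, but do not abort
    conn.execute(
        "INSERT INTO loaded_files (md5, filename) VALUES (?, ?)",
        (checksum, path),
    )
    conn.commit()
    return True
```

Checking the hash rather than only the filename also catches the same content arriving under a different name.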

Azure Data Factory For Each Loop is importing all my CSV files per iteration instead of just the file name I *think* I've told it to

I could really do with some help with ADF. I recently started using it, thinking it would be similar to SSIS, but I'm having a hard time. Over the last few weeks I've built up a fairly complicated pipeline that reads a list of files from a folder and, from within a For Each loop, is supposed to check where the data starts in each file and import it into a SQL table. I won't bore you with all the issues I've had so far; at the moment it seems to be working apart from the For Each part, which imports all the files in the folder on every iteration. The problem seems to be the dataset configuration not recognising the file name per iteration: in the debug output I can see it pick up the list of files and set the DSFileName variable to the first of them, but the output of the data flow task is both files. So it seems I've missed a step somewhere. I've just spent the last 5 hours looking and could really do with some help :(
I believe I followed the instructions here: https://www.sqlshack.com/how-to-use-iterations-and-conditions-activities-in-azure-data-factory/
Some pictures to show the debugging I've done:
Here it shows it's picking up 2 files (after I filtered out folders and stuff)
Here shows the first file name only being passed into the first data flow
Here shows the output from it, where it has picked up both files somehow and displays the count of 2 files
Here shows the Data Set set up where I believe to have correctly set the variable as the file name to be used
I don't even know where to start now, to be honest. I believe I have checked everything I can see, and I'm not using any wildcards. I can see it passing one file name per iteration into that variable, but on each iteration I see two file counts going into the table, and the output of each data flow task shows both file counts.
Does anybody have any ideas or know what I've missed?
EDIT 23/07/22: Pics of the source as requested:
Data Source Settings
Data Source Options
So it turns out that adding .name to item() in the dataset parameter makes it use just the current file instead of all of them. I'm confused by this, as all the documentation I've read states that item() references the CURRENT item within the For Each. Did I misunderstand?
Adding .name to the dataset here is now importing just the current file per loop iteration
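A likely explanation, assuming the file list comes from a Get Metadata activity's childItems (possibly passed through a Filter): item() does refer to the current item, but each item is an object with a name and a type rather than a plain string. The following Python sketch only mimics that shape; the file names are made up for illustration.

```python
# Assumed shape of the childItems array; the file names are hypothetical.
child_items = [
    {"name": "file1.csv", "type": "File"},
    {"name": "file2.csv", "type": "File"},
]


def dataset_file_names(items):
    """Return the value a file-name parameter should receive per iteration."""
    names = []
    for item in items:              # plays the role of the For Each activity
        # `item` corresponds to @item(): the whole object, not a string.
        # `item["name"]` corresponds to @item().name: just the file name.
        names.append(item["name"])
    return names
```

So the documentation and the observed behaviour do not contradict each other: the parameter needed the name property of the current item, not the current item itself.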

Logging a counter value to a batch name in Siemens TIA Portal

I need to create a program for a 1214 PLC in TIA Portal and a Comfort HMI that counts several products using a count-up counter and stores that value under a specific batch name.
For every new batch, the operator would enter a new batch name, and the counter will count the products for that specific batch.
The count needs to be displayed on the HMI screen along with the history of batches and the associated final count number.
So basically, I need a way to attach a name (batch_id) to a final count and log that pair for later reference.
Can someone give me some advice as to how I would do that?
To clarify, I need help with storing and displaying the counter value and batch names, not with the counting itself.
I appreciate any help you can provide.
There are a few ways to do this (yes, you can use PLC data logs and no they don't have to create a separate file for each batch), but I am posting here what I would do, because it's convenient for data backups, I have taken this approach before, and know it works.
Write the count value (generated in the PLC), the batch value and the timestamp to a CSV file on a USB drive inserted into the Comfort HMI, using VBScripts on the HMI.
Split the files regularly - e.g. daily, weekly or monthly, to minimize the risk of any single file becoming corrupt and you losing the data. More detail follows.
Data Storage:
Count is calculated in the PLC. Batch ID and timestamp can be stored in the PLC (if you want it to be retentive after a power cut), or in the HMI.
You will have Comfort HMI tags representing each of these three values. Once a batch is complete, call a VB script that writes these three values to a CSV file. There are application examples and forum entries on SIOS about this.
Data display as a table:
Read the CSV file values according to your filter criteria (day, time range, batch ID, batch ID range, etc) using a VB script. Write to internal HMI tags.
Display these internal HMI tags as IO fields on a Comfort panel screen. This is your custom-built table and yes it's the only way to do it unless you want to create a custom control and install it on the panel.
Backing up:
Disable logging and check USB is not in use using a script, e.g. this: https://support.industry.siemens.com/cs/document/89855157
Remove the USB, copy the files, re-insert it and activate logging again.
(you implement the 'disable' and 'activate' logging features, e.g. using an internal BOOL tag that prevents a script from executing).
There is a lot of info on SIOS about these topics, as Application Examples, FAQs and forum entries.
support.industry.siemens.com
The PLC log method works, but data backup and especially display can become a pain.
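The CSV approach above boils down to "append one row per finished batch, split files by day, read back with a filter". On the panel this would be done in VBScript; the following is only a logic sketch in Python, and the file naming scheme, column names and function names are assumptions for this example.

```python
import csv
from datetime import datetime
from pathlib import Path


def log_batch(folder, batch_id, count, when):
    """Append one finished batch to a per-day CSV file and return its path."""
    path = Path(folder) / f"batches_{when:%Y-%m-%d}.csv"
    is_new = not path.exists()
    with path.open("a", newline="") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(["timestamp", "batch_id", "count"])
        writer.writerow([when.isoformat(), batch_id, count])
    return path


def read_batches(folder, batch_id=None):
    """Read all logged batches, optionally filtered by batch ID."""
    rows = []
    for path in sorted(Path(folder).glob("batches_*.csv")):
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                if batch_id is None or row["batch_id"] == batch_id:
                    rows.append(row)
    return rows
```

Splitting by day keeps each file small, so a corrupt file costs at most one day of history.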

Can we change multiCapabilities while tests are running in Protractor?

I am using a protractor-cucumber framework (Protractor 5.2.2 and Cucumber 3.2.0).
I have a requirement like this: posting some details (from a DB) to an application with different user credentials.
Currently, I am doing this with a single login credential. In beforeLaunch() I call a function that creates a temporary table holding all the data to be entered for that user and splits the data into sets (say Set 1, Set 2 and Set 3). I then run the automation script on 3 nodes via Selenium Grid, passing the set number to the query that fetches data from the temporary table for that set.
I have a loop in my JS file to enter the data row by row, and I set getMultiCapabilities() dynamically (by dividing the total number of rows in the table for the given user by a constant).
I can run it successfully like this. But when I need to run it for multiple users, each node may have data for a different user. So I need to run it in a way that processes one user at a time across all threads, and then moves on to the next user.
Is it possible to do it like this? Thanks in advance.
You have a tricky way of running your tests. I'm sure it could be done in a way that is easier to understand.
But if it does not break your flow, I think you could achieve what you want by creating several config files, each holding the specific data for one user.
It is better to split the logic. The test spec files should contain nothing specific about the user, just something like const user = someClass.getUser(). Separately, you should have a class that manages these users. And, again separately, a class that fetches the data about user X from the DB, the filesystem, an API or whatever.
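The "one user at a time, all nodes working on that user" ordering can be sketched independently of Protractor. This is just the planning logic in Python; `plan_runs`, `rows_by_user` and `rows_per_set` are hypothetical names, and each planned user would correspond to one config file or one launch.

```python
def plan_runs(rows_by_user, rows_per_set):
    """Yield (user, sets) one user at a time.

    Each set is the slice of rows one node would process, so all nodes
    work on the same user before the next user is started.
    """
    for user, rows in rows_by_user.items():
        sets = [
            rows[start:start + rows_per_set]
            for start in range(0, len(rows), rows_per_set)
        ]
        yield user, sets
```

Keeping this planning in its own function means the spec files never need to know which user or set they are running for.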

How to increment a number from a CSV and write over it

I'm wondering how to increment a number "extracted" from a field in a CSV, and then rewrite the file with the incremented number.
I need this counter in a tMap.
Is the design below a good way to do it?
EDIT: I'm trying a new method. See the design of my subjob below. However, I get an error when I link the tJavaRow to my main tMap in the main job:
Exception in component tMap_1
java.lang.NullPointerException
at mod_file_02.file_02_0_1.FILE_02.tFileList_1Process(FILE_02.java:9157)
at mod_file_02.file_02_0_1.FILE_02.tRowGenerator_5Process(FILE_02.java:8226)
at mod_file_02.file_02_0_1.FILE_02.tFileInputDelimited_2Process(FILE_02.java:7340)
at mod_file_02.file_02_0_1.FILE_02.runJobInTOS(FILE_02.java:12170)
at mod_file_02.file_02_0_1.FILE_02.main(FILE_02.java:11954)
2014-08-07 12:43:35|bm9aSI|bm9aSI|bm9aSI|MOD_FILE_02|FILE_02|Default|6|Java Exception|tMap_1|java.lang.NullPointerException:null|1
[statistics] disconnected
You should be able to do this mid flow in a tMap or a tJavaRow.
Simply read the number in as an integer (or other numeric data type) and then add your increment to it.
A really simple example might look like this:
Here we have a tFixedFlowInput that has some hard coded values for the job:
And we run it through a tMap where we add 1 to the age column:
And finally, we output it to the console in a table:
EDIT:
As Gabriele B has pointed out, this doesn't exactly work when reading from and writing to the same flat file, as Talend claims an exclusive read-write lock on the file when reading and keeps it open throughout the job.
Instead, you would have to write the incremented data to some other place, such as a temporary file, a database or even just the buffer, and then read that data into a separate job, which would then output the file you want and clean up anything temporary.
The problem with that is that you can't do the output in the same process. I've just tried reading in the file in one child job, passing the data back to a parent job using a tBufferOutput, then passing that data to another child job as a context variable and trying to output to the file. Unfortunately the file lock remains, so you can't do this all in one self-contained job (even using a parent job and several child jobs).
If this sounds horrible to you (it is) and you absolutely need this to happen (I'd suggest a database table sounds like a better match for this functionality than a flat file) then you could raise a feature request on the Talend Jira for the tFileInputDelimited to not hold the file open or to not insist on an exclusive read-write lock on the file.
Once again, I strongly recommend that you move to using a database table for this because even without the file lock issue, this is definitely not the right use of a flat file and this use case perfectly fits a database, even something as lightweight as an embedded H2 database.
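For comparison, this is what the "write somewhere temporary, then replace the original" pattern looks like outside Talend. It is a minimal Python sketch, not a Talend job; the column name `counter` and the function name are assumptions for this example. The source file is closed before the temporary file is swapped in, which is exactly the step the file lock prevents inside a single Talend job.

```python
import csv
import os
import tempfile


def increment_counter(path, column="counter", step=1):
    """Rewrite the CSV with `column` incremented; return the new values."""
    # Read everything first so the source file is closed before writing.
    with open(path, newline="") as source:
        reader = csv.DictReader(source)
        fieldnames = reader.fieldnames
        rows = list(reader)

    for row in rows:
        row[column] = str(int(row[column]) + step)

    # Write to a temporary file in the same folder, then swap it in.
    folder = os.path.dirname(os.path.abspath(path))
    handle, temp_path = tempfile.mkstemp(dir=folder, suffix=".tmp")
    with os.fdopen(handle, "w", newline="") as target:
        writer = csv.DictWriter(target, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    os.replace(temp_path, path)
    return [int(row[column]) for row in rows]
```

A single-row database table updated with one statement would avoid the read, rewrite and replace steps entirely, which is why it remains the better fit here.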