Reducing Number of Files and Folders in Information Server - ibm-information-server

Because of the huge number of files under one folder in the Docstore, the tool loads indefinitely.
Does anyone know a way to delete/unmount folders/files only from the Docstore and not from the file system (the files must stay there for business reasons)?

The short answer to your query is no. Your choices are:
 
Periodically purging old, unnecessary doc store files is a performance best practice; see the link below. You may want to leverage the IBM MDMPIM DocStore Maintenance Script job and the IBM_MDMPIM_DocStore_Maintenance_Lookup lookup table.
 
https://www.ibm.com/support/knowledgecenter/en/SSWSR9_11.4.0/com.ibm.pim.adm.doc/sys_admin/pim_con_docstoremaintenance.html
 
If the inbound parameter is set to yes in $TOP/etc/default/docstore_mount.xml, then periodically moving the file system files to another backup location (using a cron job that moves files on a schedule) would be the ideal choice. You will still see the folder in the doc store, but it will only contain the files created since the last backup.
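As an illustration, a minimal sketch of such a mover script; the mount and backup paths are placeholders to adapt to your environment:

import shutil
import time
from pathlib import Path

MOUNT_DIR = Path("/data/docstore/inbound")  # hypothetical folder referenced by docstore_mount.xml
BACKUP_DIR = Path("/backup/docstore")       # hypothetical backup location
MAX_AGE_DAYS = 7                            # keep only the last week's files visible

cutoff = time.time() - MAX_AGE_DAYS * 24 * 3600
BACKUP_DIR.mkdir(parents=True, exist_ok=True)

for f in MOUNT_DIR.iterdir():
    # Move regular files older than the cutoff out of the mounted folder.
    if f.is_file() and f.stat().st_mtime < cutoff:
        shutil.move(str(f), str(BACKUP_DIR / f.name))

Scheduled from cron (for example daily), this keeps the mounted folder small while the files remain on the file system.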
 
If you don't want to see the folder in the doc store at all, remove the corresponding entry in $TOP/etc/default/docstore_mount.xml and recycle the MDM CE application.

Related

Nextcloud - mass removal of collaborative tags from files

Due to an oversight in a flow routine that was meant to tag certain folders on upload into the cloud, a huge number of unwanted files were also tagged in the process. Now there are thousands upon thousands of files that have the wrong tag and need to be untagged. Neither doing this by hand nor re-uploading with the correct flow routine is really a workable option. Is there a way to do the following:
Crawl through every entry in a folder
If it's a file, untag it; if it's a folder, don't
Everything I found about tags and Nextcloud concerned handling them when they were uploaded, never going over existing files to change their tags.
Is this possible?
Nextcloud stores this data in the configured database, so you can simply remove the assignments from the DB.
The assignments are stored in oc_systemtag_object_mapping, while the tags themselves are in oc_systemtag. Once you have found the ID of the tag to remove (let's say 4), you can remove all of its assignments from the DB:
DELETE FROM oc_systemtag_object_mapping WHERE systemtagid = 4;
If you would like to do this only for a specific folder, it doesn't get much more complicated. Files (including their folder structure!) are stored in oc_filecache, and oc_systemtag_object_mapping.objectid references oc_filecache.fileid. So with some joining and LIKE-ing, you can limit the rows to delete. If your tag is also used for non-files, your condition should include oc_systemtag_object_mapping.objecttype = 'files'.
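For example, a sketch of such a statement run from Python against a MariaDB/MySQL Nextcloud database; the connection details and the folder prefix files/photos/ are placeholders, and you should try this on a backup first:

import pymysql  # assumes a MySQL/MariaDB Nextcloud installation

conn = pymysql.connect(host="localhost", user="nextcloud",
                       password="secret", database="nextcloud")
sql = """
    DELETE m
    FROM oc_systemtag_object_mapping AS m
    JOIN oc_filecache AS f ON f.fileid = m.objectid
    WHERE m.systemtagid = 4             -- the tag to remove
      AND m.objecttype = 'files'        -- skip non-file assignments
      AND f.path LIKE 'files/photos/%'  -- hypothetical folder prefix
"""
with conn.cursor() as cur:
    deleted = cur.execute(sql)
    print(f"removed {deleted} tag assignments")
conn.commit()
conn.close()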

In which file is the _AppInfo data stored in Beckhoff TwinCAT 3 PLC

I'm looking for the 'AppTimeStamp' information so it can be used to verify that the code has not been updated/changed by service personnel.
Detect code changes on Beckhoff PLC using C#
That question already gave me part of the information, but I was not able to add a comment there due to the 'new user' limitations.
You can find the AppTimestamp in the _AppInfo instance.
So just call _AppInfo.AppTimestamp in your program to get the time of the last application start.
Make sure you also check the number of online changes since the last download with the OnlineChangeCnt counter, which you will also find in the _AppInfo instance.
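If you want to read these values from outside the PLC over ADS (the linked question does this in C#), here is a minimal sketch using the pyads Python library; the AMS NetId is a placeholder, and reading the DT value as a UDINT (seconds since 1970) is an assumption to verify against your setup:

import datetime
import pyads

# AMS NetId of the target PLC -- placeholder, substitute your own.
plc = pyads.Connection("192.168.0.10.1.1", pyads.PORT_TC3PLC1)  # port 851
plc.open()

# _AppInfo.AppTimestamp is a DT, assumed transported as seconds since 1970.
ts = plc.read_by_name("_AppInfo.AppTimestamp", pyads.PLCTYPE_UDINT)
changes = plc.read_by_name("_AppInfo.OnlineChangeCnt", pyads.PLCTYPE_UDINT)
plc.close()

print("application started:", datetime.datetime.utcfromtimestamp(ts))
print("online changes since download:", changes)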
There are several places where this value could be saved. TwinCAT saves data to the C:\TwinCAT\3.1\Boot folder; the different files there are explained here.
The ProjectName can be found, for example, in the configuration data (CurrentConfig.xml), at the end of the file (TcBootProject/ProjectInfo/ProjectName). The same file contains one date (<TcBootProject CreateTime="2019-06-10T13:14:17">), but it seems to be the build time of the created boot project.
I couldn't find the date of AppTimestamp in any of the files, but perhaps TwinCAT uses the creation time of the files in those folders? Or perhaps it's hidden in the binary somewhere.
When you update the software without updating the boot project, the file Port_851_act.tizip is updated, so you can check its timestamp. When you update the boot project too, Port_851_boot.tizip and other files are updated as well.
So basically, to check whether the code has been updated by someone, check the modification dates of the files under the Boot directory. Normally only the .bootdata files should update, as they contain saved persistent data. Of course, the dates can easily be changed with a 3rd-party program, so a more robust solution is to compare the contents of the Port_851.crc file, since it contains the CRC check value of the code and will always change when the boot project is updated.
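For example, a small sketch that records a baseline copy of Port_851.crc and later checks whether the code has changed; the baseline location is a placeholder:

from pathlib import Path

CRC_FILE = Path(r"C:\TwinCAT\3.1\Boot\Port_851.crc")
BASELINE = Path(r"C:\Audit\Port_851.crc.baseline")  # hypothetical baseline location

BASELINE.parent.mkdir(parents=True, exist_ok=True)
if not BASELINE.exists():
    # First run: record the CRC of the accepted boot project.
    BASELINE.write_bytes(CRC_FILE.read_bytes())
    print("baseline recorded")
elif CRC_FILE.read_bytes() != BASELINE.read_bytes():
    print("boot project CRC changed: the code was updated")
else:
    print("code unchanged")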

Is the age of an object in Google Cloud Storage affected by calls to set meta?

I'm trying to use Google Cloud Storage's lifecycle management features on a bucket, but I want to circumvent them for certain files (basically, auto-delete all files after 1 day, except for specific files that I want to keep). If I call the set metadata API endpoint, will that update the age of the object and prevent the delete from occurring?
Set metadata changes the last updated time, not the creation time. TTL is keyed off of creation time, so that will not prevent TTL cleanup.
However, you could do a copy operation and set the destination to be the same as the source. That updates the creation time, and it is a fast operation since the copy happens within the cloud.
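A sketch of that with the Python client library; the bucket and object names are placeholders:

from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-bucket")       # placeholder bucket name
blob = bucket.blob("keep/important.dat")  # placeholder object name

# Copying the object onto itself rewrites it in place and resets timeCreated,
# which is what the lifecycle Age condition is evaluated against.
bucket.copy_blob(blob, bucket, new_name=blob.name)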
That being said, it would probably be safer to just use a different bucket for these files. If the job that keeps touching the files goes down, they may get deleted.

Can watchman send why a file changed?

Is watchman capable of telling the configured command why it's sending a file to that command?
For example:
a file that is new to a folder would possibly carry a FILE_CREATE flag;
a file that is deleted would send the FILE_DELETE flag to the command;
a file that's modified would send a FILE_MOD flag, etc.
Perhaps even a deleted folder (and therefore the files under it) would send a FOLDER_DELETE parameter naming the folder, as well as FILE_DELETE for the files under it and FOLDER_DELETE for the folders under it.
Is there such a thing?
No, it can't do that. The reasons why are pretty fundamental to its design.
The TL;DR is that it is a lot more complicated than you might think for a client to correctly process those individual events, and in almost all cases you don't really want them.
Most file watching systems are abstractions that simply translate from the system-specific notification information into some common form. They don't deal well, or at all, with the notification queue overflowing, and they don't provide their clients with a way to reliably respond to that situation.
In addition to this, the filesystem can be subject to many and varied changes in a very short amount of time, and from multiple concurrent threads or processes. This makes this area extremely prone to TOCTOU issues that are difficult to manage. For example, creating and writing to a file typically results in a series of notifications about the file and its containing directory. If the file is removed immediately after this sequence (perhaps it was an intermediate file in a build step), by the time you see the notifications about the file creation there is a good chance that it has already been deleted.
Watchman takes the input stream of notifications and feeds it into its internal model of the filesystem: an ordered list of observed files. Each time a notification is received watchman treats it as a signal that it should go and look at the file that was reported as changed and then move the entry for that file to the most recent end of the ordered list.
When you ask Watchman for information about the filesystem it is possible or even likely that there may be pending notifications still due from the kernel. To minimize TOCTOU and ensure that its state is current, watchman generates a synchronization cookie and waits for that notification to be visible before it responds to your query.
The combination of the two things above means that watchman result data has two important properties:
You are guaranteed to have observed all notifications that happened before your query
You receive the most recent information for any given file only once in your query results (the change results are coalesced together)
Let's talk about the overflow case. If your system is unable to keep up with the rate at which files are changing (e.g. you have a big project, are very quickly creating and deleting files, and the system is heavily loaded), the OS can't fit all of the pending notifications in the buffer resources allocated to the watches. When that happens, it blows those buffers and sends an overflow signal. That means the client of the watching API has missed some number of events and is no longer in synchronization with the state of the filesystem. If that client maintains state about the filesystem, that state is no longer valid.
Watchman addresses this situation by re-examining the watched tree and synthetically marking all of the files as being changed. This causes the next query from the client to see everything in the tree. We call this a fresh instance result set because it is the same view you'd get when you are querying for the first time. We set a flag in the result so that the client knows that this has happened and can take appropriate steps to repair its own state. You can configure this behavior through query parameters.
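As an illustration with the official pywatchman client; the watched path, the stored clock value, and the two helper functions are placeholders:

import pywatchman

client = pywatchman.client()
watch = client.query("watch-project", "/path/to/repo")  # placeholder path
root = watch["watch"]

# 'since' is the clock returned by the previous query; omit it on the first run.
result = client.query("query", root, {
    "since": "c:123456789:42",  # placeholder clock value
    "fields": ["name", "exists", "new"],
})

if result.get("is_fresh_instance"):
    # Watchman could not answer incrementally (first query or overflow):
    # every file is reported, so rebuild local state from scratch.
    rebuild_state(result["files"])  # hypothetical helper
else:
    apply_changes(result["files"])  # hypothetical helper

next_clock = result["clock"]  # save this for the next 'since' query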
In these fresh instance result sets, we don't know whether any given file really changed or not (it's possible that it changed in such a way that we can't detect via lstat) and even if we can see that its metadata changed, we don't know the cause of that change.
There can be multiple events that contribute to why a given file appears in the results delivered by watchman. We don't record them individually because we can't track them with unbounded history; imagine a file that is incrementally written to once every second, all day long. Do we keep 86,400 change entries for it per day and deliver those to our clients? What if there are hundreds of thousands of files like this? We'd have to truncate that data, and at that point the loss in the data reduces how well you can reason about it.
At the end of all of this, it is very rare for a client to do much more than try to read a file or look at its metadata, and generally speaking, they want to do that only when the file has stopped changing. For this use case, watchman-wait, watchman-make, and trigger all have the concept of a settle period that delays delivery of the change notifications until the filesystem has stopped changing.

Trying to determine best design for this workflow - c# - 3.0

Input server - files of type jpg, tif, raw, png, mov come in via FTP
Each file needs to be watermarked, if applicable, and have metadata added to it
Then each file needs to be moved to an orders directory, where an order file is generated; the result is then packaged as a zip file and moved to the processing server.
The file names are of the form [orderid_userid_guid].[jpg|tif|mov|png...]
As I expect the volume to grow, I don't want to work on one file at a time and move it through the workflow. I would prefer a multi-threaded/asynchronous approach if possible.
I might set up a message queuing and processing system for this.
One process/thread/service will monitor the FTP server and, when new files appear, grab them and dump them into a queue (possibly MSMQ, or just a staging folder, etc.)
Another process monitors this queue, and when a file appears, it grabs it and does watermarking/metadata/etc., then drops it in another queue/folder.
Another process monitors this queue and grabs new files for zipping. After zipping, it drops them in another queue.
...and so on.
You can set up "work dispatchers" at the end of each queue to grab files and dispatch them to however many worker threads you want.
You don't necessarily have to split it out into this many separate processes and queues - that's up to you to decide. The "queues" can be implemented in a number of different ways as well. You could look at MSMQ as a start, but you might also consider just moving files between folders, etc. WCF and Windows Workflow Foundation might be good technologies to look at first.
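The question is tagged C#, but the shape of the staged pipeline is the same in any language; as an illustration, a minimal in-process sketch in Python using queues and worker threads (stage names, worker counts, and the sample file name are arbitrary; in C# the equivalent building blocks would be threads plus a concurrent queue, or MSMQ between processes):

import queue
import threading

watermark_q = queue.Queue()
zip_q = queue.Queue()

def watermark_worker():
    while True:
        path = watermark_q.get()
        # ... watermark the file and add metadata here ...
        zip_q.put(path)  # hand the file off to the next stage
        watermark_q.task_done()

def zip_worker():
    while True:
        path = zip_q.get()
        # ... generate the order file, zip it, move it to the processing server ...
        zip_q.task_done()

# "Work dispatchers": a small pool of workers per queue.
for _ in range(4):
    threading.Thread(target=watermark_worker, daemon=True).start()
for _ in range(2):
    threading.Thread(target=zip_worker, daemon=True).start()

# The FTP monitor would feed new files into the first queue:
watermark_q.put("123_456_0a1b2c3d.jpg")  # placeholder file name
watermark_q.join()
zip_q.join()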