Removing versions from versionStorage for deleted pages in AEM (Jackrabbit)

We have a large number of deleted pages in our application. However, the versions still exist in the version storage. Is there any way to delete them?
I tried traversing /jcr:system/jcr:versionStorage and identifying the deleted pages. However, when I try to delete a version, I get the following error:
javax.jcr.nodetype.ConstraintViolationException: Item is protected.
Also, if I try to purge the pages through code, then due to the high volume of deleted pages present in the repository I get errors like the following:
(org.apache.jackrabbit.oak.plugins.index.property.strategy.ContentMirrorStoreStrategy) - Traversed 10000 nodes (31911 index entries) using index cqParentPath with filter Filter(query=select [jcr:path], [jcr:score], * from [nt:version] as a where isdescendantnode(a,
Please help, as I am literally stuck with this issue.
So the basic question is: is there any way to delete the nodes containing the versions of deleted pages in Jackrabbit (AEM)?
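For reference, a rough sketch of the API-level route (not a drop-in solution): version nodes are protected, so removing them like ordinary nodes throws the ConstraintViolationException above, while VersionHistory.removeVersion() is the supported call. The sketch assumes an administrative JCR Session and that the repository returns nt:versionHistory nodes as VersionHistory instances (Jackrabbit/Oak do); batching, throttling and error handling are deliberately left out. Walking the version storage directly also avoids the query-traversal warnings shown above.

import javax.jcr.ItemNotFoundException;
import javax.jcr.Node;
import javax.jcr.NodeIterator;
import javax.jcr.RepositoryException;
import javax.jcr.Session;
import javax.jcr.version.Version;
import javax.jcr.version.VersionHistory;
import javax.jcr.version.VersionIterator;

public class OrphanedVersionCleaner {

    // Walk /jcr:system/jcr:versionStorage and purge versions whose
    // versionable node (the page) no longer exists in the workspace.
    public void clean(Session session) throws RepositoryException {
        walk(session, session.getNode("/jcr:system/jcr:versionStorage"));
    }

    private void walk(Session session, Node node) throws RepositoryException {
        for (NodeIterator it = node.getNodes(); it.hasNext(); ) {
            Node child = it.nextNode();
            if (child.isNodeType("nt:versionHistory")) {
                purgeIfOrphaned(session, (VersionHistory) child);
            } else {
                walk(session, child); // intermediate hash-bucket nodes
            }
        }
    }

    private void purgeIfOrphaned(Session session, VersionHistory history) throws RepositoryException {
        try {
            // Page still exists -> keep its versions.
            session.getNodeByIdentifier(history.getVersionableIdentifier());
            return;
        } catch (ItemNotFoundException deletedPage) {
            // Versionable node is gone -> purge the history below.
        }
        for (VersionIterator versions = history.getAllVersions(); versions.hasNext(); ) {
            Version version = versions.nextVersion();
            if (!"jcr:rootVersion".equals(version.getName())) {
                // Supported API call; a plain Node.remove() on the protected
                // version node is what triggers the ConstraintViolationException.
                history.removeVersion(version.getName());
            }
        }
        // The root version (and usually the emptied history itself) is cleaned
        // up by the repository; the details differ between Jackrabbit 2.x and Oak.
    }
}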

Related

TYPO3 always sets some extension database values to 1

I am using TYPO3 v11.5.3 and figured out that some pages are not working anymore. The reason was that in some entries of the extension's database tables the delete flag had been set.
If I set these values back to 0 with phpMyAdmin, the next day they are set to 1 again and the web pages stop working.
What's going on there?
How can I prevent TYPO3 from setting these values again?
Edit
Rudi, this extension has worked for about 1 year without any problem.
The extension has 3 tables: Album, Discs and Tracks. An album has one or more discs and a number of tracks. The extension collects this information in the backend (BE) and displays it in the frontend (FE).
Could it be that TYPO3 automatically reverts the changes I made with phpMyAdmin?
Edit
I tried several things, but they didn't solve the problem!
Finally, I deleted the affected tables and recreated them. That seems to have solved the problem.
In the end I figured out what the problem was:
I generated my extension via extension_builder, which created a list view for the frontend that I never used but also never deleted. This list view was found by Google, which then executed the delete action of that list view!

Nextcloud - mass removal of collaborative tags from files

Due to an oversight in a flow routine that was meant to tag certain folders on upload into the cloud, a huge number of unwanted files were also tagged in the process. Now there are thousands upon thousands of files that have the wrong tag and need to be untagged. Neither doing this by hand nor reuploading with the correct flow routine is really a workable option. Is there a way to do the following:
Crawl through every entry in a folder
If it's a file, untag it; if it's a folder, don't
Everything I found about tags and Nextcloud was concerned with handling them when files are uploaded, but never with going over existing files to change their tags.
Is this possible?
Nextcloud stores this data in the configured database, so you could simply remove the assignments from the DB.
The assignments are stored in oc_systemtag_object_mapping, while the tags themselves are in oc_systemtag. Once you have found the ID of the tag to remove (let's say 4), you can simply remove all its assignments from the DB:
DELETE FROM oc_systemtag_object_mapping WHERE systemtagid = 4;
If you would like to do this only for a specific folder, it doesn't get much more complicated. Files (including their folder structure!) are stored in oc_filecache, and oc_systemtag_object_mapping.objectid references oc_filecache.fileid. So with some joining and LIKEing, you can limit the rows to delete. If your tag is also used for non-files, your condition should include oc_systemtag_object_mapping.objecttype = 'files'.
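A hedged sketch of the folder-scoped variant as a small JDBC program, so the statement can be parameterised (assumptions: MySQL/MariaDB, the default oc_ table prefix, and made-up connection details and folder path; try it against a backup first):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class UntagFolder {

    public static void main(String[] args) throws Exception {
        int tagId = 4;                          // id from oc_systemtag
        String pathPattern = "files/Photos/%";  // hypothetical folder, adjust to yours

        // oc_filecache.path is relative to each storage (i.e. per user), so you
        // may additionally want to filter on f.storage. objectid is stored as a
        // string, so the join relies on MySQL's implicit cast; other databases
        // need an explicit CAST.
        String sql =
            "DELETE m FROM oc_systemtag_object_mapping m " +
            "JOIN oc_filecache f ON f.fileid = m.objectid " +
            "JOIN oc_mimetypes t ON t.id = f.mimetype " +
            "WHERE m.systemtagid = ? " +
            "  AND m.objecttype = 'files' " +
            "  AND f.path LIKE ? " +
            "  AND t.mimetype <> 'httpd/unix-directory'";  // skip folders, as asked

        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost/nextcloud", "nextcloud", "secret");
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setInt(1, tagId);
            ps.setString(2, pathPattern);
            System.out.println("Removed " + ps.executeUpdate() + " tag assignments");
        }
    }
}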

Cleaning up duplicate files in TYPO3

There are several duplicate files in my TYPO3 installation, and also some dupes in sys_file for the same file (different 'uid', same 'identifier' and 'storage').
These have several reasons:
First of all, this is an older site, so the previous behaviour (before FAL) resulted in duplicates anyway, which were then moved to _migrated. (I am not sure if the upgrade wizard at that point did some cleaning up as well.)
Editors sometimes just upload things more than once and lose track of existing files (in spite of the filemounts used and a sensible directory structure and thumbnails).
I don't know the exact reason for the dupes in sys_file, but they appear to be mostly related to the _migrated files.
What I would now like to do is create a script/extension to clean this up, or to assist editors in cleaning it up (e.g. by showing duplicates).
Files with the same content hash (but different filename/path) could be merged, which also means merging all references.
Duplicates in sys_file should also get merged.
I have a rough idea how this could be done, but would like to know if there are already tools, experiences or knowledge anyone could share.
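Not a complete answer, but for the "show duplicates" step a couple of plain queries over sys_file already produce candidate lists. A hedged JDBC sketch (assumptions: MySQL/MariaDB, made-up connection details, and that the file indexer has populated the sha1 column; merging files and references is deliberately left out):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class FindDuplicateFiles {

    public static void main(String[] args) throws Exception {
        // 1) Same content under different names/paths: group by the content hash.
        String byContent =
            "SELECT sha1, COUNT(*) AS cnt, GROUP_CONCAT(uid) AS uids " +
            "FROM sys_file WHERE missing = 0 AND sha1 <> '' " +
            "GROUP BY sha1 HAVING cnt > 1";

        // 2) Several sys_file rows for the very same file:
        //    same storage + identifier, different uid.
        String byRecord =
            "SELECT storage, identifier, COUNT(*) AS cnt, GROUP_CONCAT(uid) AS uids " +
            "FROM sys_file GROUP BY storage, identifier HAVING cnt > 1";

        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost/typo3", "typo3", "secret");
             Statement stmt = con.createStatement()) {
            for (String sql : new String[] {byContent, byRecord}) {
                try (ResultSet rs = stmt.executeQuery(sql)) {
                    while (rs.next()) {
                        System.out.println("uids " + rs.getString("uids")
                                + " (" + rs.getInt("cnt") + " records)");
                    }
                }
            }
        }
    }
}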

Google Search Appliance (GSA) feeds - unpredictable behavior

We have a metadata-and-url feed and a content feed in our project. The indexing behaviour of the documents submitted using either feed is completely unpredictable. For the content feed, the documents get removed from the index after a random interval every time. For the metadata-and-url feed, the additional metadata we add is ignored, again randomly. The documents themselves do remain in the index in the latter case - only our custom metadata gets removed. Basically, it looks like the feeds get "forgotten" by the GSA after some time. What could be the cause of this issue, and how do we go about debugging it?
Points to note:
1) Due to unavoidable reasons, our GSA index is always hovering around the license limit (+/- 1000 documents or so). Could this have an effect? Are feeds purged when nearing license limit? We do have "lock = true" set in the feed records though.
2) These fed documents are not linked to from pages and hence (I believe) would have low page rank. Are feeds automatically purged if not linked to from pages?
3) Our follow patterns include the fed documents.
4) We do not use action=delete with the same documents, so that possibility is ruled out. Also for the content feed we always post all the documents. So they are not removed through feeds.
When you hit the license limit, the GSA will start dropping documents from the index, so I'd say that's definitely your problem.

Swiftstack - Containers not getting removed

Even after deleting containers and objects directly from the file system, Swift still lists the containers when a GET command is executed. However, if we try to delete a container with a DELETE command, a 404: Not Found error is returned. Is something wrong, or is there some kind of cache involved?
I think the problem came from deleting the containers and/or objects directly from the file system.
Swift's methods for handling write requests for objects and containers have to be very careful to ensure all the distributed index information remains eventually consistent. Direct modification of the file system is not sufficient. It sounds like the container databases got removed before they had a chance to update the account database listings - perhaps manually unlinked before all of the object index information was removed?
Normally after a delete request the containers have to hang around for a while as "tombstones" to ensure the account database gets updated correctly.
As a workaround, you could recreate them (with a PUT) and then re-issue the DELETE, which should successfully allow the DELETE of the new empty containers and update the account database listing directly.
(Note: the container databases themselves, although empty, will still exist on disk as tombstones until the reclaim_age passes.)
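A minimal sketch of that workaround with java.net.http (Java 11+); the endpoint, account and token are placeholders, and it assumes the Swift v1 API where a container is (re)created with PUT and removed with DELETE:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RecreateAndDeleteContainer {

    public static void main(String[] args) throws Exception {
        // Placeholders: use your proxy endpoint, account and a valid auth token.
        String container = "https://swift.example.com/v1/AUTH_myaccount/broken-container";
        String token = "AUTH_tk_replace_me";

        HttpClient client = HttpClient.newHttpClient();

        // 1) Recreate the container so its database exists again.
        HttpRequest recreate = HttpRequest.newBuilder()
                .uri(URI.create(container))
                .header("X-Auth-Token", token)
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
        System.out.println("PUT    -> "
                + client.send(recreate, HttpResponse.BodyHandlers.discarding()).statusCode());

        // 2) Delete it again; this time the account listing gets updated properly.
        HttpRequest delete = HttpRequest.newBuilder()
                .uri(URI.create(container))
                .header("X-Auth-Token", token)
                .DELETE()
                .build();
        System.out.println("DELETE -> "
                + client.send(delete, HttpResponse.BodyHandlers.discarding()).statusCode());
    }
}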