Cleaning up duplicate files in TYPO3 - typo3

There are several duplicate files in my TYPO3 installation. Also some dupes in sys_file for the same file (different 'uid', same 'identifier' and 'storage').
There are several reasons for this:
First of all, this is an older site, so the pre-FAL behaviour produced duplicates anyway, which were then moved to _migrated. (I am not sure whether the upgrade wizard did some cleaning up at that point as well.)
Editors sometimes upload things more than once and lose track of existing files (in spite of filemounts, a sensible directory structure and thumbnails).
I don't know the exact reason for the dupes in sys_file, but they appear to be mostly related to the _migrated files.
What I would now like to do is create a script / extension to clean this up or assist editors to clean it up (e.g. show duplicates).
Files with the same content hash (but a different filename / path) could be merged, which also means merging all references to them.
Duplicates in sys_file should also get merged.
I have a rough idea how this could be done but would like to know if there are already tools, experiences or knowledge anyone could share.
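As a starting point for the "show duplicates" part, here is a minimal sketch in Python. It assumes that sys_file exposes the columns uid, storage, identifier and sha1, and it uses an in-memory SQLite table with made-up rows as a stand-in for the real database. The helper names duplicate_groups and load_sys_file are hypothetical, not part of TYPO3.

```python
import sqlite3
from collections import defaultdict


def duplicate_groups(rows, key_columns):
    """Group sys_file uids by the given columns; keep groups with more than one uid."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in key_columns)].append(row["uid"])
    return {key: sorted(uids) for key, uids in groups.items() if len(uids) > 1}


def load_sys_file(conn):
    """Fetch only the columns needed for duplicate detection."""
    conn.row_factory = sqlite3.Row
    return conn.execute(
        "SELECT uid, storage, identifier, sha1 FROM sys_file"
    ).fetchall()


# In-memory stand-in for the real table; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sys_file (uid INTEGER, storage INTEGER, identifier TEXT, sha1 TEXT)"
)
conn.executemany(
    "INSERT INTO sys_file VALUES (?, ?, ?, ?)",
    [
        (1, 1, "/user_upload/logo.png", "aaa"),
        (2, 1, "/_migrated/pics/logo.png", "aaa"),
        (3, 1, "/_migrated/pics/logo.png", "aaa"),
        (4, 1, "/user_upload/other.png", "bbb"),
    ],
)

rows = load_sys_file(conn)
# Same content under different paths (candidates for merging files).
same_content = duplicate_groups(rows, ["storage", "sha1"])
# Same storage and identifier under different uids (duplicate sys_file records).
same_record = duplicate_groups(rows, ["storage", "identifier"])
print(same_content)
print(same_record)
```

Merging a group would then presumably mean picking one surviving uid and repointing the references (sys_file_reference.uid_local) of the other uids to it before removing them. That step is deliberately left out here and should only be tried on a backup first.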

Related

Nextcloud - mass removal of collaborative tags from files

Due to an oversight in a flow routine that was meant to tag certain folders on upload into the cloud, a huge number of unwanted files were also tagged in the process. Now there are thousands upon thousands of files that have the wrong tag and need to be untagged. Neither doing this by hand nor re-uploading with the corrected flow routine is really a workable option. Is there a way to do the following:
Crawl through every entry in a folder
If it's a file, untag it; if it's a folder, don't
Everything I found about tags and Nextcloud was concerned with handling them at upload time, never with going over existing files and changing their tags.
Is this possible?
The cloud stores this data in the configured database, so you could simply remove the assignments from the db.
The assignments are stored in oc_systemtag_object_mapping, while the tags themselves are in oc_systemtag. Once you have found the ID of the tag to remove (let's say 4), you could simply remove all of its assignments from the db:
DELETE FROM oc_systemtag_object_mapping WHERE systemtagid = 4;
If you would like to do this only for a specific folder, it does not get much more complicated. Files (including their folder structure!) are stored in oc_filecache, and oc_systemtag_object_mapping.objectid references oc_filecache.fileid. So with some joining and LIKEing, you can limit the rows to delete. If your tag is also used for non-file objects, your condition should include oc_systemtag_object_mapping.objecttype = 'files'.
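The "joining and LIKEing" could be sketched as follows. This is an illustration against an in-memory SQLite stand-in, not a tested statement for a production Nextcloud database. It assumes that oc_filecache.mimetype is a numeric id into oc_mimetypes and that folders carry the mimetype 'httpd/unix-directory'. The folder prefix and all sample rows are made up, and the CAST syntax is SQLite's (MySQL would need CAST(... AS CHAR)).

```python
import sqlite3

TAG_ID = 4                         # tag id from the example above
FOLDER_PREFIX = "files/Projects/"  # hypothetical folder to clean up

# Remove the tag from files (not folders) below FOLDER_PREFIX only.
DELETE_SQL = """
DELETE FROM oc_systemtag_object_mapping
WHERE systemtagid = ?
  AND objecttype = 'files'
  AND objectid IN (
      SELECT CAST(fc.fileid AS TEXT)
      FROM oc_filecache fc
      JOIN oc_mimetypes mt ON mt.id = fc.mimetype
      WHERE fc.path LIKE ?
        AND mt.mimetype <> 'httpd/unix-directory'
  )
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE oc_mimetypes (id INTEGER, mimetype TEXT);
CREATE TABLE oc_filecache (fileid INTEGER, path TEXT, mimetype INTEGER);
CREATE TABLE oc_systemtag_object_mapping (
    objectid TEXT, objecttype TEXT, systemtagid INTEGER);
INSERT INTO oc_mimetypes VALUES (1, 'httpd/unix-directory'), (2, 'image/png');
INSERT INTO oc_filecache VALUES
    (10, 'files/Projects', 1),
    (11, 'files/Projects/a.png', 2),
    (12, 'files/Projects/sub', 1),
    (13, 'files/Projects/sub/b.png', 2),
    (14, 'files/Other/c.png', 2);
INSERT INTO oc_systemtag_object_mapping VALUES
    ('10', 'files', 4), ('11', 'files', 4), ('12', 'files', 4),
    ('13', 'files', 4), ('14', 'files', 4), ('11', 'files', 5);
""")

conn.execute(DELETE_SQL, (TAG_ID, FOLDER_PREFIX + "%"))
conn.commit()

remaining = sorted(
    conn.execute(
        "SELECT objectid, systemtagid FROM oc_systemtag_object_mapping"
    ).fetchall()
)
print(remaining)
```

In the sample data the two folders and the file outside the folder keep tag 4, and the unrelated tag 5 is untouched. Running the inner SELECT on its own first is a cheap way to check which rows would be affected.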

typo3 upgrade 4.7 => 6.2 is losing images

I've done quite a few upgrades from TYPO3 4.x to 6.x, but this time I have a persistent problem I cannot understand. After doing the upgrade (all upgrade wizards ran fine), I can see in the database that the image column of tt_content now holds the FAL index values and no longer the file names. The references to the FAL tables are OK as well. When I look at CEs like textpic, however, the image tab does not show any images. No images are shown in the frontend either.
I could try to fix this in TS, but I want to upgrade this install to 8 and think that if the first upgrade already needs such a crutch, further updates will be doomed right from the start.
[edit #1]
I'm 100% sure it worked before. But now, whatever I do (update ref index, ...), sys_file_reference stays empty.
[edit #2]
I now followed How to upgrade TYPO3 4.5 to 6.2 and it worked. Strange thing is that it's not really that different from how I did it all the time. Maybe it just needed me to try it 27 times :)
Your problem might be caused by custom content elements. If you have custom definitions that the upgrade wizard does not know about, these definitions stay unchanged and as a result your 'new' images (sys_file records) are not inserted correctly.
Custom CEs might need extra care at each upgrade.
After 6.2 FAL was stable and had no big changes. I would not expect the same amount of work for further upgrades.
In TYPO3 6.2 the fileadmin folder is represented by an automatically created storage record. This record contains a setting that makes the storage respect case-sensitive filenames.
If this setting is not enabled before the media files are migrated, all media files with upper-case characters in their names are written to the database in lower case and can then no longer be found in the file system.
So if no images at all show up on the page after the migration, I assume all of them had one or more upper-case characters in the filename.
If you have only a few images, you could correct the filenames in the database, specifically in the identifier column of the sys_file table. Otherwise it is best to repeat the whole process and enable the setting in the fileadmin storage record beforehand.
Storage records are located on the root page [uid=0] in the backend, where backend users are also stored.
Below is a partial screenshot of the database-table sys_file:
In my experience the MySQL SQL mode STRICT_TRANS_TABLES was at the root of the problem. Once I changed it, sys_file_reference started to be filled correctly.

Eclipse indexing - what do the various options do

When you right-click > index on a project there are a few options:
Rebuild
Freshen All Files
Update with Modified Files
Re-resolve Unresolved Includes
I've just been hitting Rebuild every time, but now I'm working on a huge project and can't afford to do that; when I modify a file, whether it's a .cpp or .h, I need to know which 'index' operation to run.
For each of the 'index' options:
What does it precisely do?
What is the cost (relative memory, CPU time)?
Documentation from Eclipse would be helpful, but I have already searched and didn't find any.
Rebuild can only be performed on the whole project. It throws away the project's entire index and rebuilds it from scratch, indexing each file in the project.
Since it starts by throwing away the previous index, cancelling a Rebuild will result in an empty or partially built index.
The other actions can be performed either on the whole project, or on a folder or file (or group of folders/files) in the project.
They all go through the files in the selection, and update some or all of them in the index. Unlike Rebuild, they do not start by clearing the index, so cancelling them is relatively safe.
Freshen All Files updates all files in the selection. If called on the project, the end result is comparable to Rebuild.
Update with Modified Files only updates those files in the selection which have changed since the last time they were updated in the index, as determined by their timestamp and a hash of their contents.
Re-Resolve Unresolved Includes only updates those files in the selection for which configuration info (such as specified include paths) has changed, and the change resulted in an include that was previously unresolved now being resolved.
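To illustrate the kind of check that Update with Modified Files is described as doing (timestamp plus content hash), here is a small sketch in Python. It is not CDT's implementation: the function names, the index layout (path mapped to a timestamp and a SHA-1) and the order of the two checks are all assumptions made for the example.

```python
import hashlib
import os
import tempfile


def content_hash(path):
    """SHA-1 of the file contents."""
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()


def record(path, index):
    """Remember timestamp and hash as of the last (pretend) indexing run."""
    index[path] = (os.stat(path).st_mtime_ns, content_hash(path))


def needs_update(path, index):
    """Decide whether a file would have to be re-indexed."""
    entry = index.get(path)
    if entry is None:
        return True                      # never indexed before
    if os.stat(path).st_mtime_ns == entry[0]:
        return False                     # timestamp unchanged: skip cheaply
    # Timestamp changed: only re-index if the contents really differ.
    return content_hash(path) != entry[1]


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "a.h")
    with open(path, "w") as f:
        f.write("int x;\n")

    index = {}
    new_file = needs_update(path, index)     # not in the index yet
    record(path, index)
    unchanged = needs_update(path, index)    # nothing happened since

    os.utime(path, ns=(1_000_000_000, 1_000_000_000))
    touched = needs_update(path, index)      # new timestamp, same contents

    with open(path, "w") as f:
        f.write("int y;\n")
    os.utime(path, ns=(2_000_000_000, 2_000_000_000))
    edited = needs_update(path, index)       # new timestamp, new contents

print(new_file, unchanged, touched, edited)
```

The point of combining both checks is that the timestamp comparison is nearly free, while the hash avoids re-indexing files that were merely touched, for example by a version control checkout.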
The performance characteristics can vary a lot depending on the project size and the kind of machine you're running on. I work on a very large project (millions of lines) for which a Rebuild can take 20-30 minutes on a relatively modern desktop. The operation is typically CPU-bound, but the indexer is currently single-threaded, so it will only use up one CPU core.
Finally, I'd like to mention again what I said in my comment on the question: if you configure the index to be updated automatically in Preferences | C/C++ | Indexer, you shouldn't need to manually invoke these commands at all, at least in theory. In practice, I find an occasional Rebuild is necessary (say once every few weeks), especially after a configuration change (e.g. adding a new include path).
Sources: this mailing list post, reading the implementation of the actions, and experience using CDT.

Symfony: getting form values before and after form handling

Hello, I want to be able to compare values before and after form handling, so that I can process them before flush.
What I do is collect the old values in an array before handleRequest.
I then compare the new values to the old values in the array.
It works perfectly on simple variables, like strings for instance.
However, I want to work on uploaded files. I am able to get their full paths and names before handling the form, but when I read the values after checking that the form is valid, I still get the same old values.
I tried calling both $entity->getVar() and $form->getData()->getVar() and I get the same output.
Hello, I actually found a solution. It is a departure from the strategy announced in my question, though, which I realize did not fully describe my objective. What I wanted was to compare the old file names with the new ones (the names include the full path), so that I could unlink every old file that is no longer in the list of new names. Basically, a cleanup after a file has been uploaded to replace another one without the first one being deleted first. It also saves the webmaster the hassle of sorting the uniqid-named files that are still used by the web site from those that are useless.
The problem is that my upload functions, which are very similar to the file upload examples in the official documentation, seemed to take effect only at flush time.
So, since what I wanted to do with those files had nothing to do with database operations, I resorted to running the second step after the flush, which works fine.
However, I am intrigued by your solutions, as they are both strategies I hadn't thought of. Thank you for the suggestions.
I am not sure, though, whether cloning the whole object will be as straightforward as comparing two arrays of file names.
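The cleanup described in this answer (collect the old paths, let the flush happen, then unlink whatever is no longer referenced) boils down to a set difference. Here is a language-neutral sketch in Python rather than PHP; the function name remove_orphans and the sample file names are made up for the illustration.

```python
import os
import tempfile


def remove_orphans(old_paths, new_paths):
    """Delete files that were referenced before the update but not after it."""
    orphans = set(old_paths) - set(new_paths)
    removed = []
    for path in sorted(orphans):
        if os.path.isfile(path):         # tolerate files that are already gone
            os.unlink(path)
            removed.append(path)
    return removed


with tempfile.TemporaryDirectory() as d:
    a, b, c = (os.path.join(d, name) for name in ("a.jpg", "b.jpg", "c.jpg"))
    for path in (a, b, c):
        with open(path, "w") as f:
            f.write("x")

    old_paths = [a, b]   # collected before handling the form
    new_paths = [b, c]   # read from the entity after the flush
    removed = remove_orphans(old_paths, new_paths)
    a_exists = os.path.exists(a)
    b_exists = os.path.exists(b)
    c_exists = os.path.exists(c)

print([os.path.basename(p) for p in removed])
```

Running this after the flush, as the answer does, matters because the new paths are only final once the upload handling has run.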

Merge two LLBLGEN 2 source files

I have two LLBLGen Pro 2.6 source files that I have to merge in my git repo (2 different branches). Due to the "professional" work of previous programmers on this project, the two projects have changes (the fork is 1 year old) that are not tracked in any documents.
What would be the least painful way to complete my merge?
Thanks.
In my experience, it's easier to simply ignore the merge conflicts in the LLBL generated code and just re-sync the project to the database and then regenerate the code completely post-merge.
Where this becomes a problem is when there are a lot of (or even a few) customizations made to the LLBL project file (e.g. renaming fields, creating typed lists). There isn't much you can do about these other than tracking them down one by one. The good news is that the compiler will complain if something is missing or renamed.