Nextcloud - mass removal of collaborative tags from files

Due to an oversight in a flow routine that was meant to tag certain folders on upload into the cloud, a huge number of unwanted files were also tagged in the process. Now there are thousands upon thousands of files that have the wrong tag and need to be untagged. Neither doing this by hand nor re-uploading with the corrected flow routine is really a workable option. Is there a way to do the following:
Crawl through every entry in a folder
If it's a file, untag it; if it's a folder, don't.
Everything I found about tags and Nextcloud was concerned with handling them at upload time, never with going over existing files and changing their tags.
Is this possible?

Nextcloud stores this data in the configured database, so you can simply remove the assignments from the DB.
The assignments are stored in oc_systemtag_object_mapping, while the tags themselves are in oc_systemtag. Once you have found the ID of the tag to remove (let's say 4), you can remove all of its assignments:
DELETE FROM oc_systemtag_object_mapping WHERE systemtagid = 4;
If you would like to do this only for a specific folder, it doesn't get much more complicated. Files (including their folder structure!) are stored in oc_filecache, and oc_systemtag_object_mapping.objectid references oc_filecache.fileid. So with some joining and LIKEing, you can limit the rows to delete; a sketch follows below. If the tag is also used for non-file objects, your condition should include oc_systemtag_object_mapping.objecttype = 'files'.
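A sketch of such a statement, assuming MySQL/MariaDB, the default oc_ table prefix, tag ID 4, and a hypothetical folder named photos (verify the names against your schema and back up the database first):
DELETE m
FROM oc_systemtag_object_mapping AS m
JOIN oc_filecache AS f ON f.fileid = m.objectid
JOIN oc_mimetypes AS mt ON mt.id = f.mimetype
WHERE m.systemtagid = 4                       -- the unwanted tag
  AND m.objecttype = 'files'                  -- ignore non-file objects
  AND f.path LIKE 'files/photos/%'            -- limit to one folder tree
  AND mt.mimetype <> 'httpd/unix-directory';  -- untag files, skip folders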

Related

Cleaning up duplicate files in TYPO3

There are several duplicate files in my TYPO3 installation, and also some dupes in sys_file for the same file (different uid, same identifier and storage).
These have several reasons:
First of all, this is an older site, so the previous behaviour (before FAL) resulted in duplicates anyway, which were then moved to _migrated. (I am not sure whether the upgrade wizard at that point did some cleaning up as well.)
Editors sometimes just upload things more than once and lose track of existing files (in spite of the file mounts used, a sensible directory structure, and thumbnails).
I don't know the exact reason for the dupes in sys_file, but they appear to be mostly related to the _migrated files.
What I would now like to do is create a script / extension to clean this up, or assist editors in cleaning it up (e.g. by showing duplicates; see the query sketch after this list).
Files with the same content hash (but different filename/path) could be merged, which also means merging all references.
Duplicates in sys_file should also get merged.
I have a rough idea of how this could be done, but would like to know whether there are existing tools, experiences, or knowledge anyone could share.
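As a starting point for the "show duplicates" part, sys_file stores a content hash in its sha1 column, so queries along these lines could feed such a script (a sketch; verify the column names against your TYPO3 version):
-- files with identical content but different name/path
SELECT sha1, COUNT(*) AS cnt
FROM sys_file
GROUP BY sha1
HAVING COUNT(*) > 1;

-- duplicate sys_file rows for the same identifier/storage
SELECT identifier, storage, COUNT(*) AS cnt
FROM sys_file
GROUP BY identifier, storage
HAVING COUNT(*) > 1;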

Symfony: getting form values before and after form handling

Hello, I want to be able to compare values before and after form handling, so that I can process them before flush.
What I do is collect the old values in an array before handleRequest().
I then compare new values to the old values in the array.
It works perfectly on simple variables, like strings for instance.
However, I want this to work with uploaded files. I am able to get their full paths and names before handling the form, but when I read the values after checking that the form is valid, I still get the same old value.
I tried calling both $entity->getVar() and $form->getData()->getVar(), and I get the same output.
Hello, I actually found a solution, though it is a departure from the strategy announced in my question, which I realize was somewhat truncated regarding my objective. That objective was to compare old file names and new file names (those names actually include the full path) for changes, so that I could unlink the old names that were no longer in the new name list. Basically, to perform a cleanup after a file was uploaded to replace another, without the first one being deleted beforehand, and to save the webmaster the hassle of sorting uniqid-named files that are still used by the website from those that are useless.
The problem is that my upload functions, which are very similar to the file upload examples shown on the official documentation pages, seemed to take effect at flush time.
So, since what I wanted to do with those files had nothing to do with database operations, I resorted to having the step-two code run after flush, which works fine.
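For illustration, a minimal sketch of that flow; $entity->getVar() is from the question, while $form, $em, and $request are hypothetical boilerplate, not the actual code used:
// Step one: collect the old file paths before the form touches the entity.
$oldPaths = [$entity->getVar()];

$form->handleRequest($request);
if ($form->isSubmitted() && $form->isValid()) {
    $em->flush(); // the upload only takes effect around flush in this setup

    // Step two, after flush: unlink old files that are no longer referenced.
    $newPaths = [$entity->getVar()];
    foreach (array_diff($oldPaths, $newPaths) as $stale) {
        if (is_file($stale)) {
            unlink($stale);
        }
    }
}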
However, I am intrigued by your solutions, as they are both strategies I hadn't thought of. Thank you for the suggestions.
I am not sure, though, whether cloning the whole object would be as straightforward as comparing two arrays of file names.

Split MS Access DB Needs Compact/Repair as well as Re-Link on Front End and Back End. Why?

I have an ACCDB that I split a while ago. It contains many forms with subforms (based on tables), over two hundred tables in the BE (almost all small lookup tables for vehicle objects), and 400+ queries. There also happens to be another ACCDB with a single 6.5M-row table of basic history info that the FE links to. The two back ends do not link to each other in any way. The FE is 14 MB, the BE is 1.2 GB, and the single-table DB is 900 MB, all with primary keys and indexes set up appropriately. The DB is 100% normalized. Both BEs grow 5% every month. The DB is currently slated to be migrated to an Oracle 11g environment later this year.
Question:
I found out recently that if I compact and repair the back end or the front end, none of the forms containing subforms open; the whole FE just freezes to white. Even if all three are repaired, I still have issues. BUT if I compact/repair all three and also relink the entire front end to the two back ends, the forms all of a sudden start working. This behavior began only recently.
Why do I have to relink to make the forms work again?
You should not have to re-link anything here at all after a C+R.
The only thing that comes to mind is that the user doing the C+R has some restricted rights in the folder or directory where the C+R occurs.
Remember, when the user does the C+R, a COPY of the file is created, and thus possible inheriting of the CURRENT user's rights can occur WHEN the NEW file is created. So it sounds like some permissions issue exists on the folder, or the user doing the C+R has some special (different) rights (perhaps some inherited rights due to membership in some security group).
Of course, you should ensure that you are using UNC path names, and of course the front end needs to be placed on each machine.
Perhaps, again, the user doing the C+R has "different" drive mappings, and the links to the back-end databases are wrong due to a different drive letter. So, as a general rule, I would STRONGLY avoid drive letters and use UNC path names (if you are not already).
If you are using UNC path names, then the likely issue is permissions.
There is also a possibility that the user doing the C+R is running the front end from a non-trusted location.
Also, the table of 6.5 million rows seems a bit large, and I assume the 1.2 GB size is RIGHT AFTER a C+R? (But that issue is for another post.)
This suggests a drive mapping issue, a permissions issue, or perhaps the user launching the application messing up references. I would shift-bypass into the application and ensure that the user doing the C+R can compile the application, and from the VBA editor take CAREFUL note that, say, Office 14 references are not being hijacked to Office 15 references.
You're reaching the "hassle-free" viable (as opposed to "documented") limits of Access as a database. Remember, the queries need to be compiled, which means resolving all the table links and verifying existing indexes and other metadata. It's possible that simply overwriting this information by manually using the Linked Table Manager, as you have, is more efficient.
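If you end up automating that relink rather than using the Linked Table Manager dialog, a minimal DAO sketch would be something like this (hypothetical; it assumes the existing connect strings already point at the correct UNC paths):
Sub RefreshAllLinks()
    ' Re-resolve every linked table after a compact/repair.
    Dim db As DAO.Database
    Dim tdf As DAO.TableDef
    Set db = CurrentDb
    For Each tdf In db.TableDefs
        If Len(tdf.Connect) > 0 Then ' only linked tables have a connect string
            tdf.RefreshLink          ' rebuild the cached link metadata
        End If
    Next tdf
End Sub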
Here are a few prescribed tips which might help you out:
http://office.microsoft.com/en-gb/access-help/improve-performance-of-an-access-database-HP005187453.aspx
And some more...
http://www.fmsinc.com/MicrosoftAccess/Performance.html#Linked%20Tables
And a related thread from this site:
Proper way to program a Microsoft Access Backend Database in a Multiuser Environment
Issues which may not be helping you:
queries which don't restrict the dataset sufficiently, particularly those running as a dynaset
back-end database files sitting too low in the Windows folder structure (the higher the better)
As the 2nd link suggests, the truth is there are so many variables at work that resolving this will require some tinkering, with trial & error playing a major part.
All that, or you can upsize to SQL Server Express :)
http://office.microsoft.com/en-gb/access-help/move-access-data-to-a-sql-server-database-by-using-the-upsizing-wizard-HA010275537.aspx

Mongo schema: Todo-list with groups

I want to learn mongo and decided to create a more complex todo-application for learning purpose.
The basic idea is a task list where tasks are grouped in folders. Users may have different access to those folders (read, write), and tasks may be moved to other folders. Usually (especially for syncing) tasks will be requested by folder and not individually.
Basically, I thought about three approaches and would like to hear your opinion of them. Maybe I missed some points or am just thinking about it the wrong way.
A - List of References
Collections: User, Folder, Task
Folders contain references to Users
Folders contain references to Tasks
Problem
When updating a Task, a reference to its Folder is needed. Either that reference is stored within the Task (redundancy) or it must be passed with each API call.
B - Subdocuments
Collections: User, Folder
Folders contain references to Users
Tasks are subdocuments within Folders
Problem
There is no way to update a Task without knowing its Folder. Both need to be transmitted as well, but compared to A there is no redundancy.
C - References
Collections: User, Folder, Task
Folders contain references to Users
Tasks keep a reference to their Folders
Problem
Requesting a folder means searching in a long list instead of having direct references (A) or just returning the folder (B).
If you don't need any metadata for the folder except the name, you could also go with:
Collections: User, Task
Task has a field folder
User has arrays read_access and write_access
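For illustration, documents under this schema might look like this (all values hypothetical):
db.user.insertOne( { _id: "alice", read_access: ["private", "work"], write_access: ["work"] } )
db.task.insertOne( { title: "write report", folder: "work", done: false } )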
Then
You can get a list of all folders with
db.task.distinct("folder")
The folders a specific user can access are automatically retrieved when you retrieve the user document, so those are basically known at login.
You can get all tasks a user can read with
db.task.find( { folder: { $in: read_access } } )
with read_access being the respective array you got from your user's document. The same goes for write_access.
You can find all tasks within a folder with a simple find query for the folder name.
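For example (folder name hypothetical):
db.task.find( { folder: "work" } )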
Renaming a folder can be achieved with one update query on each of the collections.
Creating a folder or moving a task to another folder can also be achieved simply.
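Sketched with the modern shell helpers (an older shell would use update with the multi option instead); the folder names and someTaskId are hypothetical:
// rename folder "work" to "office" in both collections
db.task.updateMany( { folder: "work" }, { $set: { folder: "office" } } )
db.user.updateMany( { read_access: "work" }, { $set: { "read_access.$": "office" } } )
db.user.updateMany( { write_access: "work" }, { $set: { "write_access.$": "office" } } )
// move a single task to another folder
db.task.updateOne( { _id: someTaskId }, { $set: { folder: "office" } } )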
So without metadata for folders, that is what I would do. If you need metadata for folders, it can become a little more complicated, but basically you could manage it independently of the tasks and users above, using a folder collection containing the metadata, with _id being the folder name referenced in user and task.
Edit:
Comparison of the different approaches
I stumbled over this link, which might be of interest to you. In it there is a discussion of transitioning from a relational database model to Mongo. The difference being that in a relational database you usually aim for third normal form, where one of the goals is to avoid bias toward any form of access pattern, whereas in MongoDB you can model your data to best fit your access patterns (while keeping in mind not to introduce possible data anomalies through redundancy).
So with that in mind:
Your model A is how you could do it in a relational database (each type of information in one table, referenced by ID).
Model B would be tailored to an access pattern where you always list a complete folder and tasks are only edited when the folder is opened (if you retrieve one folder, you have all the tasks without an additional query).
C would be a different relational model than A, and I think a little closer to third normal form (without knowing the exact tables).
My suggestion would not support folder access as optimally as B, but would make it easier to show and edit single tasks.
Problems that could come up with the schemas: since A and C are basically relational, you can get a problem with foreign keys, because MongoDB does not enforce foreign key constraints (e.g., in C you could delete a folder while tasks still reference it, or in A delete a task without deleting its reference in the folder). You could circumvent this by enforcing the constraints from the application. For B, the 16 MB document limit could become a problem, circumventable by allowing folders to split into multiple documents when they reach a certain task count.
So, new conclusion: I think A and C might not show you the advantages of MongoDB (and might even be more work to build in MongoDB than in SQL), since they are what you would do in a traditional relational database, which is what MongoDB was not designed for (e.g., the missing join statement, no foreign key constraints). In sum, B best matches your access pattern "Usually (especially for syncing) tasks will be requested by folder" while still allowing you to easily edit and move tasks once the folder is opened.

Lotus Notes application Document count and disk space

I am using Lotus Notes 8.5.2 and made a backup of my mail application in order to preserve everything in a specific folder before deleting its contents from my main application. The backup is a local copy, created by going to File --> Application --> New Copy, setting the Server to Local, and giving it a title and file name that I save in a folder. All of this works okay.
Once I have that, I go into All Documents and delete everything except the contents of the folder(s) I want this application to preserve. When finished, I can select all and see approximately 800 documents.
However, there are a couple of other things I have noticed. First, the document count: right-click on the newly created application, go to Properties, and select the "i" tab, which shows Disk Space and a document count. That document count doesn't match what is shown when you open the application and go to All Documents; the All Documents count matches the 800 I had after deleting all but the contents I wanted to preserve. Instead, the application properties say it has almost double that amount (1500+), with a fairly large file size.
I know about the unread document count, and in this particular application I checked "Don't maintain unread marks" on the last property tab. There is no red number in the application, but neither the document count nor the file size changed when that was selected. Compacting the application makes no difference.
I'm concerned that although I've trimmed this Lotus Notes application down to what I want to preserve, there's a lot of excess baggage with it. Also, since the document count appears to be inflated, I suspect the file size is too.
How do you make a backup copy of a Lotus Notes application, then keep only what you want, and have the document count and file size reflect what you have actually preserved? I would appreciate any help or advice.
Thanks!
This question might really belong on ServerFault or SuperUser, because it's more of an admin or user question than a development question, but I can give you an answer from a developer angle...
Open your mailbox in Domino Designer and look at the selection formula for the $All view. It should look something like this:
SELECT @IsNotMember("A"; ExcludeFromView) & IsMailStationery != 1 & Form != "Group" & Form != "Person"
That should tell you first of all that indeed, "All Documents" doesn't really mean all documents. If you take a closer look, you'll see that three types of documents are not included in All Documents.
Stationery documents
Person and Group documents (i.e., synchronized contacts)
Any other docs that any IBM, third-party, or local developer has decided to mark with an "A" in the ExcludeFromView field. (I think that repeating calendar appointment info probably falls into this category.)
One or more of those things is accounting for the difference in your document count.
If you want, you can create a view with the inverse of that selection formula by reversing each comparison and changing the Ands to Ors:
SELECT @IsMember("A"; ExcludeFromView) | (IsMailStationery = 1) | (Form = "Group" | Form = "Person")
Or, for that matter, you can get the same result by taking the original formula, surrounding it with parentheses, and prefixing it with a logical NOT:
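SELECT !(@IsNotMember("A"; ExcludeFromView) & IsMailStationery != 1 & Form != "Group" & Form != "Person")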
Either way, that view should show you everything that's not in AllDocuments, and you can delete anything there that you don't want.
For a procedure that doesn't involve mucking around with Domino Designer, I would suggest making a local replica instead of a local copy, and using the selective replication option to replicate only documents from specific folders (Space Savers under More Options). But that answer belongs on ServerFault or SuperUser, so if you have any questions about it, please ask a new question there.