Add metadata to a large number of files in SharePoint Online - powershell

We're migrating a large number of files from a document management system into SharePoint Online. Many of the files have important metadata associated with them. The export process renames each file to nnnnn_yyyyy_oldfilename, where nnnnn is a cabinet number and yyyyy is a folder number, and it also creates a file that maps all the existing metadata to those two identifiers. Is it practical to script renaming the files back to their original names while storing the two identifiers in new custom metadata fields (cabinet, ofolder) on each file? If we can save those two pieces of information, we'll then be able to use a similar script to push the saved metadata into other custom fields later on.
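For reference, a rough sketch of the idea, assuming the PnP.PowerShell module, that the cabinet and ofolder columns already exist, that the prefixes are purely numeric, and with placeholder site/library names:

# Connect to the target site (placeholder URL).
Connect-PnPOnline -Url "https://tenant.sharepoint.com/sites/migrated" -Interactive
# Walk every item in the library and split off the nnnnn_yyyyy_ prefix.
foreach ($item in Get-PnPListItem -List "Documents" -PageSize 500) {
    $name = $item.FieldValues["FileLeafRef"]
    if ($name -match '^(\d+)_(\d+)_(.+)$') {
        # Save the cabinet and folder numbers into the custom columns...
        Set-PnPListItem -List "Documents" -Identity $item.Id -Values @{
            cabinet = $Matches[1]; ofolder = $Matches[2]
        } | Out-Null
        # ...then restore the original file name.
        Rename-PnPFile -ServerRelativeUrl $item.FieldValues["FileRef"] -TargetFileName $Matches[3]
    }
}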

Related

Open source tools for reconciliation

Are there any open source tools available for data reconciliation? The key use cases are receiving data from different parties in custom formats, reconciling the datasets, and identifying any missing or mismatched rows. A nice-to-have use case is the ability to build this recon pipeline directly through a UI, where users can load sample files, mark key fields for matching, and define the output format for the recon.
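As a minimal illustration of the core matching step (not a full tool), here is a PowerShell sketch that reconciles two CSV extracts on a key field; the file names and the TradeId/Amount columns are hypothetical:

# Load both parties' extracts (hypothetical file names and columns).
$left  = Import-Csv -Path .\left.csv
$right = Import-Csv -Path .\right.csv
# Rows that are missing or mismatched on either side, compared on key and amount.
Compare-Object -ReferenceObject $left -DifferenceObject $right -Property TradeId, Amount |
    ForEach-Object {
        $side = if ($_.SideIndicator -eq '<=') { 'only in left' } else { 'only in right' }
        [pscustomobject]@{ TradeId = $_.TradeId; Amount = $_.Amount; Issue = $side }
    } |
    Export-Csv -Path .\recon-breaks.csv -NoTypeInformation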

Nextcloud - mass removal of collaborative tags from files

Due to an oversight in a flow-routine that was meant to tag certain folders on upload into the cloud, a huge number of unwanted files were also tagged in the process. Now there are thousands upon thousands of files that have the wrong tag and need to be untagged. Neither doing this by hand nor reuploading with the correct flow-routine is really a workable option. Is there a way to do the following:
Crawl through every entry in a folder
If it's a file, untag it; if it's a folder, don't
Everything I found about tags and Nextcloud dealt with handling them at upload time, never with going over the tags of existing files.
Is this possible?
Nextcloud stores this data in the configured database, so you can simply remove the assignments from the DB.
The assignments are stored in oc_systemtag_object_mapping, while the tags themselves are in oc_systemtag. Once you have found the ID of the tag to remove (let's say 4), you can remove all of its assignments from the DB:
DELETE FROM oc_systemtag_object_mapping WHERE systemtagid = 4;
If you want to do this only for a specific folder, it doesn't get much more complicated. Files (including their folder structure!) are stored in oc_filecache, and oc_systemtag_object_mapping.objectid references oc_filecache.fileid. So with some joining and LIKEing you can limit the rows to delete, e.g. by appending AND objectid IN (SELECT fileid FROM oc_filecache WHERE path LIKE 'files/your-folder/%') to the DELETE above (the path pattern is a placeholder; adjust it to your folder and storage layout). If your tag is also used for non-files, your condition should additionally include oc_systemtag_object_mapping.objecttype = 'files'.

Google Data Fusion: reading files from multiple subfolders in a bucket and placing them in another folder inside the same subfolder

Example
sameer/student/land/compressed files
sameer/student/pro/uncompressed files
sameer/employee/land/compressed files
sameer/employee/pro/uncompressed files
In the above example, I need to read the files from all the land folders in the different subdirectories, process them, and place them in the pro folders within the same subfolders.
For this I have used two GCS nodes, one as source and another as sink.
In the GCS source I provided the path gs://sameer/; it reads the files from all subfolders, merges them into one file, and places it in the sink path.
Expected output: all files should be placed in the subdirectories they were fetched from.
I can achieve the expected output by running the pipeline separately for each folder, but I am asking whether this is possible in a single pipeline run.
It seems like your use case is simply moving files. In that case, I would suggest using the Action plugin GCS Move or GCS Copy.
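If it really is just a move (leaving any decompression aside), the per-folder moves can also be scripted outside Data Fusion in one go; a sketch assuming the gsutil CLI is installed and authenticated, with the bucket layout from the question:

# One move per subfolder; the subfolder names come from the question's example.
foreach ($area in 'student', 'employee') {
    gsutil -m mv "gs://sameer/$area/land/*" "gs://sameer/$area/pro/"
}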
It seems like the task you are trying to carry out is not possible in a single Data Fusion pipeline, at least at the time of writing this.
In a pipeline, all the sources and sinks have to be connected. Otherwise you will get the following error:
'Invalid DAG. There is an island made up of stages ...'
This means it is not possible to parallelise several decompression tasks, one per folder of files, inside the same pipeline.
At the same time, if you were to connect several sources and several sinks in the same pipeline, the outputs would be aggregated and replicated over all of the sinks.
Finally, I would say that the only case in which you can parallelise a task between several sources and several sinks is when using multiple database tables. By means of the plugins (2) and (3) you can process data from multiple table inputs and export the output to multiple tables. If you would like to see all the available plugins for Data Fusion, please check the following link (4).

Reading a file from Google Drive with Talend

I need to read an uploaded file in Google Drive and perform X transformation on it. From what I've read, the only way to do it is to download the file to my local machine with the Talend component and then read it from there.
If that is correct, I cannot figure out what the file name would be, given that I don't want to hard-code the exact name of the file.
I found http://meowbi.com/2018/02/23/getting-google-sheet-gdrive-talend/ and it is exactly what I need: read from Google Drive, check the file name, and proceed if the file name is X. What is unclear to me is what they used in tJava.
The output schema of the tGoogleDriveList component's Main row contains a field, name, which holds the file name you're looking for. Using the Iterate row is less straightforward, as you need to extract values from the globalMap; in the article you cited they get the file name via the "tGoogleDriveList_1_TITLE" key, i.e. something like String fileName = (String) globalMap.get("tGoogleDriveList_1_TITLE"); in tJava.
Main row between tGoogleDriveList and tJava
For more details, please look into the Talend Reference for Google Drive components. The Listing files and folders in Google Drive section should be particularly relevant to your case.

The default value for a column is not set when copying a file using the REST API (SharePoint 2013 standalone)

I am trying to copy files from one folder to another using the SharePoint REST API. Some columns in the destination folder have a default value defined. Even though the files are copied successfully, some of them do not get the default values for those columns.
On closer inspection, I found that the newer Office document types (.docx, .xlsx, .pptx, etc.) get the default values, while the older Office document types (.doc, .xls, .ppt) do not.
Also, the older Office documents get the values only when they come from a source folder that already contains the destination folder's columns.
I am wondering why the older Office documents do not get the values and whether anything can be done about it.
Is it a bug in SharePoint Server or am I missing any configuration to make all files work?
My understanding is that this is expected. Because you are copying files, the copy includes not only the file itself but also its metadata. If the file in the source folder doesn't have values in those columns, it makes sense that after copying it to a destination folder, those same columns won't have values either. Now, why do some files (docx, pptx, etc.) have values in the destination? Probably because of the SharePoint document parser feature (Document Property Promotion and Demotion). So in your case, instead of copying the files, you can download/upload them, using for instance code like this.
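As one possible shape for that download/upload approach, a hedged sketch using the PnP PowerShell cmdlets Get-PnPFile and Add-PnPFile (the site URL, folder paths, and file name are placeholders; on SharePoint 2013 you would use the legacy 2013 PnP module):

# Connect to the on-premises site (placeholder URL).
Connect-PnPOnline -Url "http://sharepoint/sites/docs" -CurrentCredentials
# Download the file from the source folder to a temp location.
Get-PnPFile -Url "/sites/docs/Source/contract.doc" -Path $env:TEMP -FileName "contract.doc" -AsFile -Force
# Re-upload it to the destination folder; since this is a fresh upload rather
# than a copy, the destination's column default values are applied.
Add-PnPFile -Path (Join-Path $env:TEMP "contract.doc") -Folder "Destination"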