dspace command line import files not appearing in collection

I am running dspace 6.1 on Suse 12.2. I have used the command line import command to bring files into a collection, and the command prints out that it succeeded.
bin/dspace import --add --eperson=myemail@mycompany.com --collection=123456789/49 --source=/opt/dspace/import_dateien/test/ --mapfile=mapfile
Destination collections:
Owning Collection: 1982-11-10 Collection
Adding items from directory: /opt/dspace/import_dateien/test/
Generating mapfile: mapfile
Started: 1503332092481
Ended: 1503332095888
Elapsed time: 3 secs (3407 msecs)
I have added files without additional metadata, and no SAF zip file, because I want the extra files in the collection without their own metadata, only the existing set of metadata.
The name of the collection is correct in the success message. But I do not see the files in the collection. Are they there, but hidden? How can I get them to appear?

It appears that the command found zero items in the batch. You should see several lines of output for each item.
The first thing I would check is that the source directory "/opt/dspace/import_dateien/test/" conforms to SAF: it should contain numbered subdirectories, each containing one item. The importer seems to find no directories there.

I found it! I was using SAFBuilder, which takes a CSV file with one line per file, each file with its own metadata. But the SAFBuilder script creates a folder as well as a zip file, and in this folder are the files, a text file named contents, and the metadata as XML. One can simply add the extra files there, list them in the contents file, and re-zip the folder. I just tested it with bin/dspace import, and my extra files are now in the collection. As @Mark Wood said, this time my files were listed in the output from the script.
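A quick way to see what the importer will find is to check the source folder before running the import. The following is only a minimal sketch in Python, not part of DSpace: the function name check_saf_batch is made up for illustration, and it assumes each item folder holds a metadata file named dublin_core.xml next to the contents file, as in the Simple Archive Format.

```python
from pathlib import Path


def check_saf_batch(source):
    """Return (item_dirs, problems) for a Simple Archive Format batch folder."""
    source = Path(source)
    item_dirs = sorted(p for p in source.iterdir() if p.is_dir())
    problems = []
    if not item_dirs:
        problems.append(f"{source}: no item subdirectories found")
    for item in item_dirs:
        # Assumption: the item metadata file is named dublin_core.xml.
        if not (item / "dublin_core.xml").is_file():
            problems.append(f"{item.name}: missing dublin_core.xml")
        contents = item / "contents"
        if not contents.is_file():
            problems.append(f"{item.name}: missing contents file")
            continue
        for line in contents.read_text(encoding="utf-8").splitlines():
            # A contents line may carry tab-separated options after the file name.
            name = line.split("\t")[0].strip()
            if name and not (item / name).is_file():
                problems.append(f"{item.name}: '{name}' is listed but not present")
    return item_dirs, problems
```

An empty list of item folders would match the symptom in the question: the import "succeeds" but adds nothing.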

Related

How to copy file based on date in Azure Data Factory

I have a list of files in an ADLS container whose names contain a date, as shown below:
TestFile-Name-20221120.csv
TestFile-Name-20221119.csv
TestFile-Name-20221118.csv
I want to copy only the file that contains today's date, e.g. TestFile-Name-20221120.csv today, and so on.
I've used a Get Metadata activity to get the list of files, a For Each to iterate over each file, and a Set Variable to extract the date (like 20221120) from the file name, but I'm not sure how to proceed further.

We have something similar running. We check an SFTP folder for the existence of files, using the Get Metadata activity. In our case, there can be folders or files. We only want to process files, and very specific ones at that (i.e. we have one pipeline per file name we can process, as the different file names would contain different columns, data types, etc.).
Our pipeline consists of a Get Metadata activity followed by a For Each loop.
Within our Get Metadata activity, we filter for the name of the object we want. We only want files ending in .zip, so we added a file name filter with two parts.
In your case, the first part would be 'TestFile-Name-', and the second part would be '*.csv'.
We then have a For Each loop set up, to process anything (the child items) we retrieved in the Get Metadata step. Within the For Each we defined an If Condition to only process files, and not folders.
In our case, we use the following expression:
@equals(item().type, 'File')
In your case, you could use something like:
@endsWith(item().name, concat(<variable containing your date>, '.csv'))

Assuming all the file names start with TestFile-Name- and you want to copy the file with today's date, use a Get Metadata activity to check whether the file exists. The file name can be dynamic, like
@concat('TestFile-Name-', utcnow(), '.csv')
Note: you need to format utcnow() to match the file names, for example formatDateTime(utcnow(), 'yyyyMMdd').
If the file exists, proceed with the copy; otherwise ignore it.
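Outside Data Factory, the same matching logic can be tried in a few lines of Python. This is just an illustrative sketch of the idea in both answers (build today's expected name, then keep only the matching entry); the function names are made up, and the prefix and suffix come from the file names in the question.

```python
from datetime import datetime, timezone


def file_name_for(day, prefix="TestFile-Name-", suffix=".csv"):
    """Build the expected file name for a date, formatted as yyyyMMdd."""
    return f"{prefix}{day.strftime('%Y%m%d')}{suffix}"


def files_for_day(names, day):
    """Keep only the names that equal the expected file name for that day."""
    wanted = file_name_for(day)
    return [name for name in names if name == wanted]


if __name__ == "__main__":
    today = datetime.now(timezone.utc).date()  # UTC, like utcnow() in the pipeline
    listing = ["TestFile-Name-20221120.csv", "TestFile-Name-20221119.csv"]
    print(files_for_day(listing, today))
```

Note that using UTC can pick "tomorrow's" or "yesterday's" file near midnight if the files are named in local time.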

How to move files whose file name is not used in a set of text files?

I'm a PowerShell beginner and this is my first post on Stack Overflow. I can understand some simple pipelines, but the following challenge is too complicated for me at this point:
I have a folder with test data containing *.bmp files and their associated files. I want a PowerShell script to check which bmp files are still used. If a bmp file is not used, the script should move it and its associated files to another folder.
Details:
bmp files and associated files: for example, car01.bmp, car01.log, car01.file, car02.bmp, (...)
A bmp file is in use if its file name (e.g. car01.bmp) is mentioned in any of the (text/csv) files in at least one of 2 locations (incl. subfolders).
If the file name is not found in any of the text files, I want the script to move that file, and any file whose name differs only by file extension, to a designated folder.
Looking forward to your solutions!
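No answer was posted for this one. As a rough sketch of the logic, here it is in Python rather than PowerShell; the function names, the *.txt and *.csv patterns, and the plain substring match are all assumptions made for illustration. A substring match means car01.bmp also counts as "used" when a text file mentions racecar01.bmp.

```python
import glob
import shutil
from pathlib import Path


def referenced_text(search_dirs, patterns=("*.txt", "*.csv")):
    """Concatenate the lower-cased text of all matching files, subfolders included."""
    chunks = []
    for folder in search_dirs:
        for pattern in patterns:
            for path in Path(folder).rglob(pattern):
                chunks.append(path.read_text(encoding="utf-8", errors="ignore").lower())
    return "\n".join(chunks)


def move_unused(data_dir, search_dirs, target_dir):
    """Move every unused .bmp and its same-named companions; return the moved names."""
    data_dir, target_dir = Path(data_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    haystack = referenced_text(search_dirs)
    moved = []
    for bmp in sorted(data_dir.glob("*.bmp")):
        if bmp.name.lower() in haystack:
            continue  # still mentioned somewhere, so keep it
        # Companions share the stem and differ only in extension.
        for companion in sorted(data_dir.glob(glob.escape(bmp.stem) + ".*")):
            shutil.move(str(companion), str(target_dir / companion.name))
            moved.append(companion.name)
    return moved
```

It is worth running this on a copy of the test data first, since the move is not undone automatically.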

Tool/Script to Automate File Naming From Metadata

I have a folder containing 1000+ PDF files with generic names (e.g. DOC (1)) that I need to rename with the order number, which is listed inside the file itself. So far I've had to manually open each document, look for the order number, then change the file name.
I have a spreadsheet listing all of the order numbers contained in that folder. I've used the search function of the file explorer to look up each individual order number, then renamed whichever file comes up. This has reduced the task time substantially, but it's still very time-consuming.
Does anyone know of a macro or script that I can build to automate the task? Again, I would need something to grab the order numbers line by line from the spreadsheet, then search the folder for any file containing that specific term, and finally rename whichever file comes up.
Any and all relevant help is greatly appreciated.
Thank you in advance for your time!
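No answer was posted here either. One possible approach, sketched in Python under some loud assumptions: the text of each PDF has already been extracted (for example with a PDF library such as pypdf, which is not mentioned in the question, and scanned PDFs would need OCR instead), and the order numbers have been exported from the spreadsheet as a plain list. The function name plan_renames is made up for illustration.

```python
import re


def plan_renames(order_numbers, pdf_texts):
    """Map each file name to '<order number>.pdf' when exactly one order number matches.

    pdf_texts maps a file name to the text already extracted from that PDF.
    """
    plan, unresolved = {}, []
    for file_name, text in pdf_texts.items():
        # Match whole tokens only, so order 1002 does not match inside 10022.
        hits = [n for n in order_numbers
                if re.search(rf"(?<!\w){re.escape(n)}(?!\w)", text)]
        if len(hits) == 1:
            plan[file_name] = f"{hits[0]}.pdf"
        else:
            unresolved.append(file_name)  # none or several matches: check by hand
    return plan, unresolved
```

The sketch only produces a rename plan, so it can be reviewed before anything is renamed. It does not handle two files that contain the same order number, which would end up with the same target name.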

Update doxygen doc for sub directories

I am running doxygen for C/C++ documentation on a large codebase which has many different directories (d1, d2, d3, etc.). When I run doxygen with INPUT set to the top-level directory, it generates documentation for all directories.
Now, if only the documentation in one subdirectory has changed, how can I generate or update the documentation for only the modified directory? If I set INPUT to subdirectory d1, the generated index.html/main.html contains documentation for only that directory, losing the other directories' documentation.
Is there a way to update the documentation for only a particular directory?
-Thanks

I believe something like this would be in order. I haven't tried it myself, but it should help.
Divide the documentation into parts, then run a script that detects changes, either by checking a diff as in the first link or by looking at when files were last changed. The script can then pass the changed folder to doxygen as its target.
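The "looking at when the file was changed" variant could be sketched like this in Python. It is untested against a real doxygen setup: the function names and the docs/<subdir> output layout are assumptions, while INPUT and OUTPUT_DIRECTORY are regular Doxyfile settings.

```python
import os
from pathlib import Path


def changed_subdirs(root, since):
    """Return names of top-level subdirectories holding a file modified after `since`."""
    changed = []
    for sub in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for dirpath, _dirnames, filenames in os.walk(sub):
            if any(os.path.getmtime(os.path.join(dirpath, f)) > since for f in filenames):
                changed.append(sub.name)
                break  # one changed file is enough for this subdirectory
    return changed


def doxygen_overrides(subdir, out_root="docs"):
    """Config lines to append to a base Doxyfile for one subdirectory (layout is illustrative)."""
    return f"INPUT = {subdir}\nOUTPUT_DIRECTORY = {out_root}/{subdir}\n"
```

Here `since` would be the timestamp of the last documentation run, e.g. the modification time of a stamp file. Each changed directory then gets its own doxygen run with the base Doxyfile plus these overrides. Because every part is generated separately, links between the parts would probably need doxygen's tag file mechanism (GENERATE_TAGFILE and TAGFILES).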

Talend tWaitForFile insufficiency

We have a producer process that runs continuously and writes files into a specific folder. We have to read the files one by one using Talend, and there are 2 issues:
The 1st: tWaitForFile reads only files that exist before it starts, so files created after the component has started are not visible to it.
The 2nd: There is no way to know whether a file has been released by the producer process, so it may be read before it is completely written. The _wait_release_ parameter of tWaitForFile does not work on Linux systems!
So how can we make Talend read only completely written files from a directory that holds an increasing number of files?

I'm not sure what you mean by your first issue. tWaitForFile has options to trigger when files are created, modified or deleted in a folder.
As for the second issue, your best bet here is for the file producer to be creating an OK or control file which is a 0 byte touch when it has finished writing the file you want.
In this case you simply look for the appearance of the OK file and then pick up the relevant completed file. If you name the 2 files the same but with a different file extension (the OK file is typically called ".OK"), then this should be easy enough to look for. So you would set your tWaitForFile to look for "*.OK" files and then connect this via an iterate link to a tFileInputDelimited (in case you want to pick up a delimited text file) and then declare the file name as ((String)globalMap.get("tWaitForFile_1_CREATED_FILE")).substring(0,((String)globalMap.get("tWaitForFile_1_CREATED_FILE")).length()-3) + ".txt"
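To make the marker-file idea concrete outside Talend, here is a minimal Python sketch of the same lookup. The function name completed_files is made up, and the ".OK" and ".txt" extensions are taken from the expression above.

```python
from pathlib import Path


def completed_files(folder, marker_ext=".OK", data_ext=".txt"):
    """Return data files whose marker file exists, i.e. files the producer has finished."""
    ready = []
    for marker in sorted(Path(folder).glob("*" + marker_ext)):
        # Same idea as the substring expression: strip ".OK" and append ".txt".
        data_file = marker.with_name(marker.name[: -len(marker_ext)] + data_ext)
        if data_file.is_file():
            ready.append(data_file)
    return ready
```

A data file without a marker is simply skipped until the producer creates its ".OK" file, which is what protects against reading half-written files.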
