PropertyManager2 iterates the tree to INFINITE depth when deleting/moving an IFile - Eclipse

I have a scenario where there are around 10K files inside a folder named '_out', and '_out' in turn contains other folders, inner files, and subfolders.
I'm deleting around 6,000 files that sit directly under the _out folder.
I delete the files using the following snippet:
for (IFile iFile : files) {
    iFile.delete(true, new NullProgressMonitor());
}
During the delete of each file, the properties of the resource are deleted internally by Resource.deleteResource(boolean, MultiStatus), via a call to IPropertyManager.deleteResource(IResource). That API in turn calls PropertyManager2.deleteProperties(IResource, int depth) with INFINITE depth. As a result, every folder and subfolder under "workspace\.metadata\.plugins\org.eclipse.core.resources\.projects\projectName\.indexes\8f" is visited, the properties.index file in each folder is loaded, and each one is checked for a property of the given IFile. Here the '8f' folder is the bucket for the '_out' folder.
This operation is repeated for all 6,000 files, so deleting them takes around 15 minutes.
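In the meantime, here is a minimal sketch of a stopgap I am considering, assuming files is an IFile[] (the helper name is mine). Batching the deletes into one workspace operation does not avoid the INFINITE-depth property scan per file, but it takes the workspace lock and fires resource-change notifications once instead of 6,000 times:

import org.eclipse.core.resources.IFile;
import org.eclipse.core.resources.IWorkspace;
import org.eclipse.core.resources.ResourcesPlugin;
import org.eclipse.core.runtime.CoreException;
import org.eclipse.core.runtime.NullProgressMonitor;

void deleteAllAtOnce(IFile[] files) throws CoreException {
    IWorkspace workspace = ResourcesPlugin.getWorkspace();
    // One batched operation: a single workspace lock and a single
    // resource-change notification for all the files.
    workspace.delete(files, true, new NullProgressMonitor());
}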
Ideally, shouldn't the code just visit the properties.index file directly under '8f', delete the property for the given file, and return?
My assumption is that the properties of any file directly under the '_out' folder are stored in the properties.index file of the bucket corresponding to '_out' (in my scenario, the 8f folder).
Apologies if I have misunderstood this. Kindly clarify.
I have raised a bug for the same at link.
Thanks,
Palraj

Related

Azure Data Factory Cannot Read Metadata Folder

I hope you are all staying healthy and strong during the COVID-19 pandemic.
I have a question about Azure Data Factory. I have created a pipeline with a Get Metadata activity, set up as follows:
My files are organized in a folder with subfolders.
A first Get Metadata activity lists the child items of the folder and feeds a ForEach activity.
The Get Metadata activity is configured with the Last modified field (with this setting, it only reads the last modified subfolder).
After that I set a variable and use @item().Name to read each file in that folder.
After running it on a folder that contains subfolders, I get an error telling me that @item().Name cannot read a subfolder of that folder. Get Metadata succeeds for each file, but the activity fails on the subfolders.
Many big thanks for any answer. Thank you!
If you need to access the folder:
Create a clone of the same dataset and set up a folder parameter on it, leaving the file field empty.
If you need to access the files inside a directory, use the condition @equals(item().type,'Folder') to identify directories; then, inside that branch, use the dataset with parameters for the directory and file.
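To sketch how this fits together (the parameter names folderName and fileName are illustrative, not from the original answer): add folderName and fileName parameters to the cloned dataset, set its folder path to @dataset().folderName, and leave the file name blank. Inside the ForEach, an If Condition with the expression @equals(item().type,'Folder') routes folders to a second Get Metadata activity that uses the parameterized dataset (passing the subfolder name from @item().name), while items of type 'File' go straight to the existing file-level logic.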

Reading files from Apache Spark textFileStream

I'm trying to read/monitor txt files from a Hadoop file system directory. But I've noticed that all the txt files inside this directory are directories themselves, as shown in the example below:
/crawlerOutput/b6b95b75148cdac44cd55d93fe2bbaa76aa5cccecf3d723c5e47d361b28663be-1427922269.txt/_SUCCESS
/crawlerOutput/b6b95b75148cdac44cd55d93fe2bbaa76aa5cccecf3d723c5e47d361b28663be-1427922269.txt/part-00000
/crawlerOutput/b6b95b75148cdac44cd55d93fe2bbaa76aa5cccecf3d723c5e47d361b28663be-1427922269.txt/part-00001
I want to read all the data inside the part files. I'm trying to use the following code, as shown in this snippet:
val testData = ssc.textFileStream("/crawlerOutput/*/*")
But unfortunately it says /crawlerOutput/*/* doesn't exist. Doesn't textFileStream accept wildcards? What should I do to solve this problem?
textFileStream() is just a wrapper for fileStream() and does not support subdirectories (see https://spark.apache.org/docs/1.3.0/streaming-programming-guide.html).
You would need to list the specific directories to monitor. If you need to detect new directories, a StreamingListener could be used to check for them; then stop the streaming context and restart it with the new values.
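For instance, a minimal Java sketch of the listing approach, assuming a Spark 1.x JavaStreamingContext and that the subdirectories already exist when the job starts (the /crawlerOutput path is from the question; the class name is just for the sketch):

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class CrawlerOutputStreams {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("crawler-output");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

        // textFileStream() does not recurse, so resolve the wildcard by hand:
        // list the subdirectories and open one stream per directory.
        FileSystem fs = FileSystem.get(jssc.sparkContext().hadoopConfiguration());
        List<JavaDStream<String>> streams = new ArrayList<JavaDStream<String>>();
        for (FileStatus status : fs.listStatus(new Path("/crawlerOutput"))) {
            if (status.isDirectory()) {
                streams.add(jssc.textFileStream(status.getPath().toString()));
            }
        }

        // Union the per-directory streams into one DStream of lines
        // (assumes at least one subdirectory was found).
        JavaDStream<String> lines = jssc.union(streams.get(0), streams.subList(1, streams.size()));
        lines.print();

        jssc.start();
        jssc.awaitTermination();
    }
}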
Just thinking out loud... if you intend to process each subdirectory once and just want to detect these new directories, you could potentially key off another location that contains job info, or a token file that, once present, is consumed in the streaming context and triggers a call to the appropriate textFile() to ingest the new path.

How to exclude a directory in Stream mapping in p4v

I'm using the visual client for Perforce, and I want to exclude a directory from the workspace. Before streams, I would just navigate to my workspace, find the folder in the tree, and exclude it (and I've found this solution in a number of other related questions). However, now that I am using a stream, it won't let me do this; apparently I have to edit the stream mapping.
So I tried adding this line to the Remapped box when editing the stream:
-//NumberPlus/current/Library/... //nplus-mainline/current/Library/
However I just get an error:
Error in stream specification.
Error detected at line 24
Null directory (//) not allowed in '-//NumberPlus/current/Library/...'.
EDIT: I'm in Windows 8.1, for clarification.
If the folder you want to exclude is specific to your machine, setting P4IGNORE locally is the easiest way to exclude it from being added to the depot.
http://www.perforce.com/blog/120214/new-20121-p4ignore
You'd set P4IGNORE to some name like "p4ignore.txt", create a file with that name, and add "Libraries" to it -- subsequent "p4 add" commands will skip over paths found in the P4IGNORE file, so those files will never get added to the depot.
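For example, on Windows (per the edit above), something along these lines should work:

p4 set P4IGNORE=p4ignore.txt

Then create a p4ignore.txt at the workspace root with one pattern per line, e.g. a single line reading Library/ (the folder from the question). Any "p4 add" run after that will silently skip everything under Library/.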
If this is something that's going to be common to all workspaces of this stream (e.g. it's a build artifact that everyone is going to generate and nobody is supposed to check in), what you want to do is add an "exclude" to the stream's Paths (this will exclude it from both branch views and client views generated by that stream). E.g.:
Paths:
    share ...
    exclude Libraries/...
The "exclude Libraries/..." is basically the same thing as the exclusion line you would add to the client view, except you specify it as a relative path, you don't need to specify both sides of the mapping, and the "-" is implied by the "exclude" type. The "remap" type is if you want to keep those files but in a different depot location, which doesn't sound applicable here.
More information on defining stream views:
http://www.perforce.com/perforce/doc.current/manuals/p4v/streams_views.html
You can't just edit the mappings for your client workspace if it is switched to a particular stream. The whole point of streams is that your workspace mapping is directly generated from the stream definition. So that's a feature.
It's not totally clear whether:
you don't want the directory in the stream at all, or
it's valid to have the directory in the stream, but you don't want to sync it to your workstation, or
you want the directory synced to your workstation, but you want it to have different contents (say, from some other stream which has a different version of the library).
However, for all of these situations, I suspect the best path forward is to define a new child stream of your current stream.
You will want to define the path mappings using the "share", "exclude", "isolate", and "import" mapping types.
For example, if you just didn't want the Library/... directory at all, you'd "exclude" it from your parent.
Then that stream simply won't have that directory, and it (of course) won't be on your workstation when you sync to the stream, either.
If you wanted to have a different copy of the code in the Library/... directory, so that it became a point of intentional divergence from the parent, you'd "isolate" it from your parent to submit your own custom version, or "import" it from another stream to use that stream's Library/... directory instead.
In either case, the directory would be part of the stream, and would be sync'd to your workstation, but the contents of that directory would differ from the contents that are used in the parent stream (the exact way in which they'd differ is under your control, as you define the stream accordingly).
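For instance, a child stream spec along these lines (the stream paths are illustrative, modeled on the question):

Paths:
    share ...
    exclude current/Library/...

or, to keep the directory but diverge from the parent:

Paths:
    share ...
    isolate current/Library/...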
Documentation and some examples are here: http://www.perforce.com/perforce/doc.current/manuals/p4v/streams_views.html
and here:
http://www.perforce.com/sites/default/files/pdf/Streams-ebook.pdf
I believe I have solved this. To be clear, I wanted the folder to be completely ignored by version control. I'm using p4connect with Unity and it keeps wanting to include unnecessary stuff in my depot.
All I had to do was add this line to my parent stream in the Paths box:
exclude current/Library/...

How can I rename a file uploaded to S3 using the JavaScript API?

The 'pickAndStore' method allows me to specify the full path to the file, but I don't know its extension at that point (the file path has to be defined before the file is uploaded, so it's not possible to provide a path with the correct extension).
If I use 'pick' and then 'store', I end up with two files (because both methods upload the file to S3). I can delete the 'old' file, but that's not optimal and can be a pain (it can take ages) with really big files.
Is there any better solution? Ideally one that renames the existing file.
Currently, there is no workaround for renaming a file.
However, in our JavaScript API v2 we are planning to add a new callback function: onStart will be fired after the user picks a file but before the file is uploaded. There could be an option there to rename the file based on the original filename.
We will keep you updated.

addToFolder(): the copy of the file is deleted if the original is deleted

I started developing with Google Apps Script a few days ago and recently joined Stack Overflow. I have a problem with the addToFolder() function. I have the following piece of code, which copies my new spreadsheet into a folder (test/sheets) in my Google Drive:
var ss = SpreadsheetApp.create("test");
var ssID = ss.getId();
DocsList.getFileById(ssID).addToFolder(DocsList.getFolder("test/sheets"));
My problem is that I now have two versions of the same file (one in the root of my Google Drive and the other in the test/sheets folder); whenever I try to delete either of the copies, the other copy is deleted as well. Is there a way to delete the old file and keep the new one, OR is there a way to create the file in the desired folder in the first place?
EDIT:
Thanks for your quick response. I played with this for a couple of hours but still have a problem copying the file to the destination folder. The problem is that even when I use the makeCopy method of the file, addToFolder is still the only way to specify the folder. Again, this ends up with the tagged file name in the destination folder.
I had the same problem with the copy method.
Here is my new code:
var setLocationFile = "icompare/sheets/stocks";
var folder = DocsList.getFolder(setLocationFile); // a Folder object, not an ID
var file = DocsList.getFileById(ssID);            // a File object, not an ID
file.makeCopy("test3").addToFolder(folder);
Folders in Google Docs / Google Drive are actually tags. When you "add" a file to the folder "test/sheets", you do not make a copy of your file; you just attach the tag "test/sheets" to it. Now the same file is shown both in the "test/sheets" folder (i.e. in the list of all files with the tag "test/sheets") and in the root. If you wish to make a copy of the file, you should use the copy method. (Please let me know if I have just misunderstood your question.)
I realize this is an old question, but you can simply use .removeFromFolder(DocsList.getRootFolder()); to remove the file from the root folder.
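Putting the tag model and the answers above together, a minimal sketch (the test/sheets path is from the question; DocsList was the current API at the time):

var ss = SpreadsheetApp.create("test");
var file = DocsList.getFileById(ss.getId());
// "Move" the file: attach the target folder tag, then remove the root tag.
file.addToFolder(DocsList.getFolder("test/sheets"));
file.removeFromFolder(DocsList.getRootFolder());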
I would also like to know the answer to this question... it seems rather "weird" that the API does not even provide a way to create spreadsheets and place them in a certain folder. And no, I do not want a copy of the file; I want the file to be in one specific folder and in no other folder...