datajoint-matlab implementation of filepath#data datatype - datajoint

For datajoint-matlab, #327 seems to indicate that File external storage (#143, PR #197) should be implemented in the current version. I can create a table with the datatype filepath#data after defining the store 'data', but I get an error on insert:
Error using dj.Relvar/insert/makePlaceholder (line 244)
The field `fref` with datatype `filepath#data` is not yet supported.
Error in dj.Relvar/insert (line 334)
[v, placeholder] = makePlaceholder(i, tuple.(header.attributes(i).name));
Is this still not implemented, or is the error-checking here just stopping me from using it? Happy to provide more details on the test if needed.

Maintainer for DataJoint here. Looks like there is a bit of confusion, so let's see if I can help bring some clarity. I hope to use this discussion as a resource to improve the documentation.
DataJoint provides a few DataJoint-only datatypes. Of these, we identify the ones associated with external storage by an embedded # symbol, so each such type reads as <datatype>#<store>. Essentially, with these types the data (the datatype part) is stored remotely in an object store (the store part), with reference links kept in the relational database and access handled through the client configuration.
For datatype, there are currently 3 options:
blob: Equivalent to a blob type but for external stores. Currently, this type is supported both in datajoint-python and most recently in datajoint-matlab.
attach: A special type that captures file content as binary information but does not preserve any path information. Currently, this type is supported only in datajoint-python. Documentation is available in the File Attachment Datatype section.
filepath: A special type that captures file content as binary information and preserves the path (along with the file name). Currently, this type is in preview in datajoint-python and must be explicitly enabled. Documentation is available in the Filepath Datatype section; see the note there on enabling it.
For store, there is an External Store section in the documentation. Multiple stores can be configured as a mapping under the stores key in dj.config. For MATLAB, see help('dj.config') for examples; in Python, refer to the docs for attach and filepath above. A configuration sketch follows the list of protocols below.
Stores currently support 2 protocols:
s3: Stores objects in an S3 bucket. Currently, this store is supported both in datajoint-python and most recently in datajoint-matlab.
file: Stores files in a directory accessible via the filesystem on the client. Currently, this store is supported both in datajoint-python and most recently in datajoint-matlab.
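As an illustration, here is a minimal sketch of what a store named 'data' using the file protocol might look like in datajoint-python. The paths are placeholders, and the exact keys your version expects are listed in the External Store documentation:
import datajoint as dj

# hypothetical paths: 'location' is where externally stored objects live,
# 'stage' is the local staging directory used by filepath-type attributes
dj.config['stores'] = {
    'data': {
        'protocol': 'file',
        'location': '/mnt/dj-store/data',
        'stage': '/mnt/dj-stage/data',
    }
}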
Issue #143 and PRs #197 and #327 that you've mentioned refer to the effort to implement the file and s3 stores for the blob datatype in DataJoint MATLAB. The error you've experienced is expected: it is a simple placeholder until we support the other two datatypes in DataJoint MATLAB.


How to split openAPI/Swagger file into multiple valid sub-files?

Our service implements different levels of access, and we use one OpenAPI YAML file internally.
For external documentation purposes, we would like to create multiple OpenAPI files that are valid in themselves (self-contained) but contain only a subset of the global file, e.g. selected by path or by tags.
(The same path may be used in different split files, but I don't think that is a problem.)
Any idea on how to achieve that? Is there some tooling around for it?
You can use a valid URI in a JSON Pointer which points to another resource. The URI can be a path to a local file, a web resource, etc.:
paths:
  /user/{id}:
    summary: Get a user
    parameters:
      - $ref: "./path/to/file#/user_id"
    # And so on...
Reserved keys in the OpenAPI spec must be unique so I don't think you'd be able to create standalone OpenAPI specs without some third-party utility that could overcome that limitation.
However, you would be able to create valid standalone JSON objects defined across many files and reference them in the index document. There are many articles online providing examples:
https://davidgarcia.dev/posts/how-to-split-open-api-spec-into-multiple-files/
https://blog.pchudzik.com/202004/open-api-and-external-ref/
I ended up writing a Python script, which I have posted here.
Flow
Read the YAML File into a dictionary
Copy the dictionary to a new dictionary
Iterate through the original dictionary and
Remove items that are not tagged with the tag(s) you want to keep
Remove items that have some keyword you want to omit in the path
Write the resulting dictionary out to a new YAML file
The GIST is available here:
https://gist.github.com/erikm30/d1f7e1cea3f18ece207ccdcf9f12354e
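A minimal sketch of that flow (assuming PyYAML; the tag set, the keyword to omit, and the file names are placeholders):
import copy
import yaml

KEEP_TAGS = {"public"}        # hypothetical tags to keep
OMIT_KEYWORD = "internal"     # hypothetical keyword to drop from paths

with open("openapi.yaml") as f:
    spec = yaml.safe_load(f)

out = copy.deepcopy(spec)

for path, item in spec.get("paths", {}).items():
    # drop paths whose URL contains the keyword to omit
    if OMIT_KEYWORD in path:
        del out["paths"][path]
        continue
    # keep a path only if at least one of its operations carries a wanted tag
    tags = {t for op in item.values() if isinstance(op, dict)
            for t in op.get("tags", [])}
    if not tags & KEEP_TAGS:
        del out["paths"][path]

with open("openapi-public.yaml", "w") as f:
    yaml.safe_dump(out, f, sort_keys=False)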

Macros in Datafusion using Argument setter

I want to make my Data Fusion pipeline reusable by supplying parameter values through Argument Setter. As suggested in many other answers, I have tried to implement this using the reusable-pipeline example in the Google guide, but I was not able to pass the parameter JSON file. How do I create the API for that parameter JSON file stored in Google Cloud Storage? Please explain the values to be passed to Argument Setter, such as URL, request, response, etc., if any of you have implemented this in your projects.
Thank you.
The Argument Setter plugin reads from an HTTP endpoint, which must be publicly accessible, as described in the GCP documentation. Currently, there is no way to read from a non-public file stored in GCS. This limitation has been reported to CDAP for improvement through this ticket.
Can you please provide what you've tried so far and where you're stuck?
The URL field in argument setter would contain the API endpoint you're making a call to. Make sure you include any headers your call would need like Authorization, Accept etc.
If you're having issues with argument setter a good check is to use Curl or any other tool to make sure you're able to talk to the endpoint you're trying to use.
Here's some documentation about Argument setter: https://github.com/data-integrations/argument-setter
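For reference, the endpoint (or public GCS file) is expected to return name/value pairs as JSON. The sketch below writes such a file with Python; the argument names and values are placeholders, and the exact schema your plugin version expects is described in the README linked above:
import json

# hypothetical runtime arguments for the pipeline
payload = {
    "arguments": [
        {"name": "input.path", "type": "string", "value": "gs://my-bucket/input/"},
        {"name": "output.path", "type": "string", "value": "gs://my-bucket/output/"},
    ]
}

with open("pipeline-args.json", "w") as f:
    json.dump(payload, f, indent=2)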
Define a JSON file with appropriate name/value pairs. Upload it to a GCS bucket and make it public by changing its permissions (add "allUsers" to the permissions list). When you save it, the file will be labeled "Public to Internet".
Copy the https path to the file and use it in Argument Setter. If you're able to access this path from curl or your browser, Argument Setter will be able to as well.
There are other problems I've encountered while using Argument Setter, though: quite often the pipeline does not let runtime arguments supersede the default values provided via the URL, especially when the pipeline is duplicated.
To make the file public, you have to make your bucket public; currently there is no other way:
gsutil iam ch allUsers:objectViewer gs://BUCKET_NAME

Most efficient way to change the value of a specific tag in a DICOM file using GDCM

I have a need to go through a set of DICOM files and modify certain tags to be current with the data maintained in the database of an external system. I am looking to use GDCM. I am new to GDCM. A search through stack overflow posts demonstrates that the anonymizer class can be used to change tag values.
Generating a simple CT DICOM image using GDCM
My question is whether this is the best use of the GDCM API, or if there is a better approach for changing the values of individual tags such as patient name or accession number. I am unfamiliar with all of the API options but have a link to the API documentation. It looks like the DataElement SetValue member could be used, but it doesn't appear that there is a valid constructor for doing this in the Value class. Any assistance would be appreciated. This is my current approach:
Anonymizer anon = new Anonymizer();
anon.SetFile(myFile);
anon.Replace(new Tag(0x0010, 0x0010), "BUGS^BUNNY");
Quite late, but maybe it will still be useful. You have not mentioned whether you write in C++ or C#, but I assume the latter, as you do not use pointers. Generally, your approach is correct (as long as you use gdcm.File rather than System.IO.File). The value (the second parameter of Replace) is a plain string, so no special constructor is needed. You should probably start with the doxygen documentation of GDCM, where there is in particular one complete example. It is in C++, but there should be no problem translating it.
There are two different ways to set DICOM tag values:
Anonymizer
gdcm::Anonymizer anon;
anon.SetFile(file);
// (0002,0013) Implementation Version Name
anon.Replace(gdcm::Tag(0x0002, 0x0013), "Implementation Version Name");
DataElement
// ds is the gdcm::DataSet of the loaded file
gdcm::Attribute<0x0018, 0x0088> ss; // (0018,0088) Spacing Between Slices
ss.SetValue(10.0);
ds.Insert(ss.GetAsDataElement());
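If it helps, here is the same read/replace/write flow through GDCM's Python bindings, as a sketch only (assumes the python-gdcm package; file names are placeholders):
import gdcm

# read the original file
reader = gdcm.Reader()
reader.SetFileName("input.dcm")
if not reader.Read():
    raise RuntimeError("could not read input.dcm")

# replace (0010,0010) Patient's Name in the loaded dataset
anon = gdcm.Anonymizer()
anon.SetFile(reader.GetFile())
anon.Replace(gdcm.Tag(0x0010, 0x0010), "BUGS^BUNNY")

# write the modified dataset out
writer = gdcm.Writer()
writer.SetFileName("output.dcm")
writer.SetFile(reader.GetFile())
if not writer.Write():
    raise RuntimeError("could not write output.dcm")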

What is the fastest way (at run time) to have an Intersystems Cache database stored procedure return nothing but a BLOB?

This question is specifically about Intersystems-Cache databases.
I'm currently using $$$ResultSet("MySQLQueryText") to select the BLOB from a table, but this is probably writing the BLOB to the table, then reading out from the table, instead of writing directly to the output BLOB.
The .INT code compiles into code that creates a %Library.ProcedureContext object, then calls NewResultSet() on that object. However, the source for NewResultSet has a comment: "Used internally only, do not call directly".
Is there a supported way to efficiently create a result set that is nothing but a single record with a single, BLOB column? Ideally I'd like something like a stream object and write to that directly, and have that go straight to the ODBC (or other) driver without copying the stream. If there is a supported solution using another object that isn't exactly a stream that would also be great.
#psr - Based on the discussion in the comments, I believe that you should be able to use code something like the following:
/// Method that will take in various arguments and return a BLOB as an output argument
ClassMethod GetBLOB(
    arg1 As %String,
    arg2 As %String,
    ...
    Output blob As %Stream.TmpBinary) [ SqlProc ]
{
    // Do work to produce your BLOB
    Set blob = yourBLOB
    Quit
}
Actual support for the BLOB may depend on your client software and whether you are using ODBC or JDBC, but anything reasonably recent should not pose any problems.
You would invoke this stored procedure using syntax like:
CALL Your_Schema.YourClass_GetBLOB('arg1','arg2',?)
The actual method for retrieving the BLOB will then depend on your client software and access method. You can also control the stored procedure name (i.e. the piece after the schema) by adding SqlName = MyGetBLOB next to the SqlProc keyword.

What is the HTTP content type for binary plist?

I am modifying a rails server to handle binary plist from an iPhone client via POST and PUT requests. The content type for text plist is text/plist, as far as I can tell. I would like the server to handle both text and binary plists, so I would like to distinguish between the two forms. What is the content type for binary plist?
I believe that most binary formats are served under the application top-level type, so maybe application/plist.
See the bottom of RFC 1341.
Update
As Pumbaa80 mentioned, since application/plist is not a standard MIME type, it should be application/x-plist.
In RFC 2045 it explains this:
In the future, more top-level types may be defined only by a standards-track extension to this standard. If another top-level type is to be used for any reason, it must be given a name starting with "X-" to indicate its non-standard status and to avoid a potential conflict with a future official name.
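Whatever content type the client advertises, the two forms can also be told apart from the payload itself, since binary property lists start with the magic bytes bplist00. The server in the question is Rails, but as an illustration, here is a small Python sketch:
import plistlib

def parse_plist(body: bytes):
    # binary property lists begin with the magic bytes b"bplist00";
    # anything else is treated as the XML/text form here
    if body.startswith(b"bplist00"):
        return plistlib.loads(body, fmt=plistlib.FMT_BINARY)
    return plistlib.loads(body, fmt=plistlib.FMT_XML)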