DCMMKDIR creating new files instead of appending - command-line

I have a C# application that uses DCMMKDIR.exe to create a DICOMDIR file out of a bunch of DICOM files. The problem is that it does not put the information in the DICOMDIR file; instead it creates its own file, named something like Dda11008. My command line and args are as follows:
dcmmkdir +A +r +u +I -nb --input-directory " + studyLocation + " --output-file " + dicomdirlocation + @"\DICOMDIR"
These are the files created:
As you can see the DICOMDIR is 0kb in size and has nothing in it.
Please let me know if you need any other information.
Thanks in advance!

If you are creating Part 10 compliant DICOM media, each individual DICOM file should meet the Part 10 requirements. That means a 128-byte File Preamble, followed by the 4-byte DICOM prefix "DICM", followed by the File Meta Elements, such as the following:
File Meta Information Group Length (0002,0000)
File Meta Information Version (0002,0001)
Media Storage SOP Class UID (0002,0002)
Media Storage SOP Instance UID (0002,0003)
Transfer Syntax UID (0002,0010)
Implementation Class UID (0002,0012)
Follow the file naming and folder structure conventions defined in the standard. These include the following:
The DICOMDIR file should be located in the root directory.
Filenames should be limited to 8 characters, with no extension, drawn from a subset of the G0 repertoire of ISO 8859 (upper case A-Z, digits 0 to 9, plus the “_” (underscore)).
Directory structure: a file ID may contain one to eight components (sub-directories plus the filename), and each component should be limited to 8 characters from the same subset of the G0 repertoire.
Also make sure each level of the DICOM hierarchical model has its unique ID (Patient ID, Study Instance UID, Series Instance UID and SOP Instance UID).
Having the above information in place will help facilitate creation of a proper DICOMDIR.
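As a rough illustration of the checks above, here is a minimal Python sketch that tests a file for the Part 10 preamble and prefix, and tests a relative path against the naming rules. The function names are hypothetical, and the limit of 8 characters per component reflects my reading of the standard rather than anything dcmmkdir reports, so treat it as an assumption.

```python
import re

# One file ID component: 1 to 8 characters from A-Z, 0-9 and underscore
# (assumed limit, based on my reading of the Part 10 file ID rules).
_COMPONENT = re.compile(r"[A-Z0-9_]{1,8}")


def has_part10_header(data: bytes) -> bool:
    """True if the bytes start with a 128-byte preamble followed by 'DICM'."""
    return len(data) >= 132 and data[128:132] == b"DICM"


def is_valid_file_id(relative_path: str) -> bool:
    """True if the path has 1 to 8 components and each one fits the rules."""
    parts = [p for p in re.split(r"[\\/]", relative_path) if p]
    return 1 <= len(parts) <= 8 and all(_COMPONENT.fullmatch(p) for p in parts)
```

Running is_valid_file_id over every file under the study folder before calling dcmmkdir is one way to spot names (lower case letters, extensions, long names) that the tool may reject.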

Related

How to save urls of images from a text file?

I have a text file with many URLs and have imported them into MATLAB as a cell array with 1400 rows of URLs.
How do I take these urls of images and save them with a pre-determined size into a sub-folder?
urlsToImgs = importdata('ImageURLS.txt');
for i = 1:1
outfilename = websave(['PosImage: ', num2str(i)],['' urlsToImgs{i} '']);
end
This makes an error: The file name contains characters that are not contained in the filesystem encoding.
Certain operations may not work as expected.
I'm guessing that since I pasted it from the web it contains an invisible character. How can I delete it?
Remove the colon from the filename, i.e. change 'PosImage: ' to 'PosImage_'.
You may not be allowed to create files with special characters in their names.
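The same idea can be generalized: strip every character the file system may refuse, not just the colon. Below is a small Python sketch (the thread itself is about MATLAB, so this only illustrates the logic). The helper name safe_filename is hypothetical, and the character set is the one reserved on Windows plus ASCII control characters, which also covers some invisible characters pasted from the web.

```python
import re

# Characters reserved in Windows filenames, plus ASCII control characters.
_INVALID = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(name: str, replacement: str = "_") -> str:
    """Replace every reserved or control character with `replacement`."""
    return _INVALID.sub(replacement, name)


print(safe_filename("PosImage: 1"))  # PosImage_ 1
```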

Reading file names inside .zip file

I am familiar with the .zip file format, and able to read the internal file table content so far.
The problem occurs with non-english characters in the file name.
The specification states that file names use the OEM character set, yet sometimes I get a UTF-8 representation and sometimes an OEM representation.
The specification states the "version made by" field should be in range 0-20, yet I get versions 31 and 63 which may or may not affect the character set.
Another related problem: when I read the "extra field" there is "up" (Unicode path, id=0x7075), which is supposed to store the UTF-8 representation of the filename. It starts with 5 seemingly redundant bytes before the actual UTF-8 string (created by WinRAR), yet other software seems to read it correctly.
Any input about the issue?
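For what it is worth, the Info-ZIP Unicode Path extra field described in the ZIP application note begins with a 1-byte version and a 4-byte CRC-32 of the standard filename field, which would account for the five bytes in front of the UTF-8 string. A minimal Python sketch of reading it follows; the function name and the choice to return None on a CRC mismatch are my own assumptions, not something taken from the question.

```python
import struct
import zlib
from typing import Optional


def parse_unicode_path(extra: bytes, raw_name: bytes) -> Optional[str]:
    """Return the UTF-8 name from a 0x7075 extra field, or None if absent or stale.

    `extra` is the whole extra-field area of one entry; `raw_name` holds the
    bytes of the standard filename field of the same entry.
    """
    offset = 0
    while offset + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, offset)
        body = extra[offset + 4 : offset + 4 + size]
        offset += 4 + size
        if header_id != 0x7075 or len(body) < 5:
            continue
        # 1-byte version, then CRC-32 of the standard name, then the UTF-8 name.
        version, name_crc = struct.unpack_from("<BI", body, 0)
        if version == 1 and name_crc == zlib.crc32(raw_name):
            return body[5:].decode("utf-8")
    return None
```

The CRC comparison lets a reader ignore the field when another tool has renamed the entry without updating it.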

Google Cloud Storage not handling UTF-8 filenames

I am serving files from Google Cloud Storage and some of the filenames contain non-ASCII, UTF-8 encoded characters. For example, volvía.mp3.
If I request volvía.mp3, GCS throws an error.
If I percent encode the filename (í = %C3%AD) as volv%C3%ADa.mp3, it still fails.
If I percent encode the filename using the "combining acute accent" = %CC%81 as volvi%CC%81a.mp3, it succeeds.
Any ideas what is going on?
EDIT: The error it throws is an "Access Denied" error:
Anonymous users does not have storage.objects.get access to object. However, this seems to be the error one gets when requesting an object that's not found.
The problem is due to Mac OS's HFS+ file system, which enforces canonical decomposition (NFD) on filenames. This means it normalizes characters such as í into two code points (i + combining acute accent) rather than the single code point used in the "composed" form (i.e., NFC).
GCS treats these two forms as distinct filenames, despite the fact that they appear identical.
One solution is to convert NFD filenames to the more common NFC forms (using a utility such as convmv) before uploading to GCS. However, this can't be done on Mac OS because the file system itself enforces NFD.
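The difference between the two forms is easy to see by normalizing the name both ways and percent-encoding the result. This is a small Python sketch using only the standard library; the helper name encoded_forms is hypothetical.

```python
import unicodedata
from urllib.parse import quote


def encoded_forms(name: str) -> dict:
    """Percent-encode the NFC and NFD normalizations of a filename."""
    return {form: quote(unicodedata.normalize(form, name)) for form in ("NFC", "NFD")}


forms = encoded_forms("volv\u00eda.mp3")
print(forms["NFC"])  # volv%C3%ADa.mp3
print(forms["NFD"])  # volvi%CC%81a.mp3
```

Normalizing object names to NFC in the upload script, rather than on disk, is one way to sidestep the file system's behaviour.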
I was not able to reproduce your issue. I uploaded an object named volvía.mp3 and was able to retrieve it as both http://storage.googleapis.com/bucketname/volvía.mp3 and http://storage.googleapis.com/bucketname/volv%C3%ADa.mp3
I suspect that you actually created an object with the "combining acute accent" character instead. How did you upload your object?

ITEXT PDFReader not able to read PDF

I am not able to read a PDF file using the iText PdfReader. The PDF appears to be valid when I open it in a viewer.
URL Of PDF: http://www.fundslibrary.co.uk/FundsLibrary.DataRetrieval/Documents.aspx?type=fund_class_kiid&id=f096b13b-3d0e-4580-8d3d-87cf4d002650&user=fidelitydocumentreport
The PDF in question is encrypted.
According to the PDF specification,
Encryption applies to all strings and streams in the document's PDF file, with the following exceptions:
The values for the ID entry in the trailer
Any strings in an Encrypt dictionary
Any strings that are inside streams such as content streams and compressed object streams, which themselves are encrypted
Later on there is information on special cases in which the document-level metadata stream is not encrypted either, or in which only attachments are encrypted.
The Cross-Reference Stream Dictionary of the PDF looks like this:
<<
/Root 101 0 R
/Info 63 0 R
/XRef(stream)
/Encrypt 103 0 R
/ID[<D034DE62220E1CBC2642AC517F0FE9C7><D034DE62220E1CBC2642AC517F0FE9C7>]
/Type/XRef
/W[1 3 2]
/Index[0 107]
/Size 107
/Length 642
>>
As you can see, there is a non-encrypted string here, (stream), which is neither the value for the ID entry, nor in an Encrypt dictionary, nor inside a stream. Furthermore, the aforementioned special cases do not apply here either.
Thus, this file violates the PDF specification here. Therefore, this file is not a valid PDF.
Furthermore, according to the PDF specification
The last line of the file shall contain only the end-of-file marker, %%EOF.
The file at hand ends like this:
Thus, the last line of the file contains something other than the end-of-file marker (which is in the line before): a 0x06 and a 0x0c.
The file, therefore, violates the PDF specification here, too.
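A quick way to look for this kind of problem is to inspect whatever follows the last %%EOF marker. The Python sketch below is only an illustration: the function name is hypothetical, and it treats CR and LF as the only acceptable trailing bytes, which is an assumption about how strictly one wants to read the rule.

```python
def bytes_after_eof(data: bytes) -> bytes:
    """Return the bytes that follow the last %%EOF, ignoring CR/LF line ends."""
    idx = data.rfind(b"%%EOF")
    if idx < 0:
        raise ValueError("no %%EOF marker found")
    return data[idx + len(b"%%EOF"):].strip(b"\r\n")


print(bytes_after_eof(b"startxref\n116\n%%EOF\n\x06\x0c"))  # b'\x06\x0c'
```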

Determine whether file is a PDF in perl?

Using perl, what is the best way to determine whether a file is a PDF?
Apparently, not all PDFs start with %PDF. See the comments on this answer: https://stackoverflow.com/a/941962/327528
Detecting a PDF is not hard, but there are some corner cases to be aware of.
All conforming PDFs contain a one-line header identifying the PDF specification to which the file conforms. Usually it's %PDF-1.N where N is a digit between 0 and 7.
The third edition of the PDF Reference has an implementation note that Acrobat viewers require only that the header appears within the first 1024 bytes of a file. (I've seen some cases where a job control prefix was added to the start of a PDF file, so '%PDF-1.' wasn't the first seven bytes of the file.)
The subsequent implementation note from the third edition (PDF 1.4) states: Acrobat viewers will also accept a header of the form: %!PS-Adobe-N.n PDF-M.m but note that this isn't part of the ISO 32000-1:2008 (PDF 1.7) specification.
If the file doesn't begin immediately with %PDF-1.N, be careful: I've seen a case where a zip file containing a PDF was mistakenly identified as a PDF because that part of the embedded file wasn't compressed. So a check for the PDF file trailer is a good idea.
The end of a PDF will contain a line with '%%EOF'.
The third edition of the PDF Reference has an implementation note that Acrobat viewers require only that the %%EOF marker appears within the last 1024 bytes of a file.
Two lines above the %%EOF should be the 'startxref' token and the line in between should be a number for the byte offset from the start of the file to the last cross reference table.
In sum, read the first and last 1 KB of the file into a byte buffer and check that the relevant identifying byte string tokens are approximately where they are supposed to be. If they are, you have a reasonable expectation that you have a PDF file on your hands.
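The heuristic above is language-independent; here is a minimal sketch of it in Python rather than Perl, purely to show the logic. The function name is hypothetical, it does not handle the %!PS-Adobe-N.n PDF-M.m header variant, and a match is only a reasonable hint, not proof of a valid PDF.

```python
import re

_HEADER = re.compile(rb"%PDF-\d\.\d")
_TRAILER = re.compile(rb"startxref\s+\d+\s+%%EOF")


def looks_like_pdf(data: bytes) -> bool:
    """Check for a PDF header in the first 1 KB and a trailer in the last 1 KB."""
    head, tail = data[:1024], data[-1024:]
    return _HEADER.search(head) is not None and _TRAILER.search(tail) is not None
```

For a file on disk it is enough to read the first and last 1024 bytes instead of the whole content.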
The module PDF::Parse has a method called IsaPDF which
Returns true, if the file could be parsed and is a PDF-file.