Metadata-Extractor -- Missing List of Tags? - metadata

I'm using metadata-extractor to retrieve metadata from video files. I have it successfully retrieving the directories. Now I need to query the directories for specific info -- duration, height, etc.
The metadata-extractor docs give this example of how to query for a specific tag value:
// obtain the Exif directory
ExifSubIFDDirectory directory
= metadata.getFirstDirectoryOfType(ExifSubIFDDirectory.class);
// query the tag's value
Date date
= directory.getDate(ExifSubIFDDirectory.TAG_DATETIME_ORIGINAL);
So it appears I need to get a list of the relevant tags, such as TAG_DATETIME_ORIGINAL, for duration, height, etc.
This page in the metadata-extractor docs contains a link titled "the various tag values", but the page it goes to lists tags for still images only, not for video files.
Googling for Metadata-Extractor -- Complete List of All Tags does not seem to bring up a list of all tags.
Are the metadata-extractor docs really missing a list of tags, or am I approaching this the wrong way somehow?

I found a list of tags at:
https://developer.tizen.org/dev-guide/2.3.1/org.tizen.guides/html/native/multimedia/metadata_extractor_n.htm
However, those constants don't seem to be what's needed in actual code. Here's Java code that works:
import com.drew.imaging.ImageMetadataReader;
import com.drew.metadata.Directory;
import com.drew.metadata.Metadata;
import com.drew.metadata.Tag;
import com.drew.metadata.file.FileTypeDirectory;
import com.drew.metadata.mp4.Mp4Directory;
import com.drew.metadata.mp4.media.Mp4SoundDirectory;
import com.drew.metadata.mp4.media.Mp4VideoDirectory;
[.....]
Metadata theMetadata = null;
try {
InputStream stream = new URL(theVideoInfo.getLinkToVideo()).openStream();
theMetadata = ImageMetadataReader.readMetadata(stream);
}
} catch (java.lang.Exception exception) {
exception.printStackTrace();
}
Mp4SoundDirectory soundDirectory
= theMetadata.getFirstDirectoryOfType(Mp4SoundDirectory.class);
Mp4VideoDirectory videoDirectory
= theMetadata.getFirstDirectoryOfType(Mp4VideoDirectory.class);
Mp4Directory mp4Directory
= theMetadata.getFirstDirectoryOfType(Mp4Directory.class);
FileTypeDirectory fileTypeDirectory
= theMetadata.getFirstDirectoryOfType(FileTypeDirectory.class);
String numberOfAudioChannels
= soundDirectory.getString(Mp4SoundDirectory.TAG_NUMBER_OF_CHANNELS);
String duration = mp4Directory.getString(Mp4Directory.TAG_DURATION);
String frameRate = videoDirectory.getString(Mp4VideoDirectory.TAG_FRAME_RATE);
String height = videoDirectory.getString(Mp4VideoDirectory.TAG_HEIGHT);
String width = videoDirectory.getString(Mp4VideoDirectory.TAG_WIDTH);
String type = fileTypeDirectory.getString(FileTypeDirectory.TAG_DETECTED_FILE_MIME_TYPE);
I found the constants (TAG_HEIGHT, TAG_WIDTH, etc.) by directly examining the metadata-extractor objects in the debugger. For example, I'd type:
Mp4VideoDirectory.WIDTH
...and the debugger (IntelliJ) would auto-complete the available constants that had the text "WIDTH" in them.

Related

Email Pdf attachments to Google sheet

I am looking for something that allows me from a mail PDF attachment to get a data in a google sheet.
We all often get PDF attachments in our email and it will be great if we get the entire data in a google sheet.
DO let me know if there is anything like this
Explanation:
Your question is very broad and it is impossible to give a specific answer because that answer would depend on the pdf but also on the data you want to fetch from that, besides all the other details you skipped to mention.
Here I will provide a general code snippet which you can use to get the pdf from the gmail attachments and then convert it to text (string). For this text you can use some regular expressions (which have to be very specific on your use case) to get the desired information and then put it in your sheet.
Code snippet:
The main code will this one. You should only modify this code:
function myFunction() {
const queryString = "label:unread from example#gmail.com" // use your email query
const threads = GmailApp.search(queryString);
const threadsMessages = GmailApp.getMessagesForThreads(threads);
const ss = SpreadsheetApp.getActive();
for (thr = 0, thrs = threads.length; thr < thrs; ++thr) {
let messages = threads[thr].getMessages();
for (msg = 0, msgs = messages.length; msg < msgs; ++msg) {
let attachments = messages[msg].getAttachments();
for (att = 0, atts = attachments.length; att < atts; ++att) {
let attachment_Name = attachments[att].getName();
let filetext = pdfToText( attachments[att], {keepTextfile: false} );
Logger.log(filetext)
// do something with filetext
// build some regular expression that fetches the desired data from filetext
// put this data to the sheet
}}}
}
and pdfToText is a function implemented by Mogsdad which you can find here. Just copy paste the code snippet provided in that link together with myFunction I posted in this answer. Also you have some options which you can use that are very well explained in the link I provided. Important thing to note, to use this library you need to enable the Drive API from the resources.
This will get you started and if you face any issues down the road which you can't find the solution for, you should create a new post here with the specific details of the problem.

Import content to AEM

We have a lot content that need to be imported in AEM.
what is best way to import it? Is that any possibilities to import from excel file?
check this example
An good example for exporting is here /etc/importers/bulkeditor.html we can export the file with single "Properties / Columns" where i can define the Root Path and Properties.
I tried this packet but dos not contain what I like. https://helpx.adobe.com/experience-manager/using/creating-custom-excel-service-experience.html
http://localhost:4502/etc/importers/bulkeditor.html
documentation
I just made a test following the instructions on the link above and it worked.
My test:
Root path = /content/myapp-path/rootpage
Query parameters = "jcr:title": Title of pages that I what to include in the search
Content mode = unchecked
Properties / Columnos =
sling:resourceType and
jcr:title
Custom properties / Columns =
landingTags
Clicked Search... and worked.
Importing data to AEM can be done in lots of ways.
What is the "right" way for you, depends on your specific requirements.
Is this a one time import, or are you rather planning to write a
reusable import tool?
Is the import done by programmers or admins, or
rather by editors?
Do you need fault tolerance or a rollback? Are you
working on productive instances?
...
Here are a few of the more or less common ways (ordered from cheap/fast to extensive/comfortable) along with links to the documentation or examples:
Write a bash-script and post values with cURL requests (1)
Upload data in DAM, write EventHandler (2), (7) or a workflow (3) with a workflow launcher (4) and parse the data, afterwards change the repository.
Upload data in separate file in an own component with a file upload section (5), parse the data and change the repository.
For parsing excel data I would suggest you use apache poi (6), which is already included in AEM. But using formats like csv, json or xml will maybe save you lots of parsing efforts.
(1): http://www.aemcq5tutorials.com/tutorials/adobe-cq5-aem-curl-commands/
(2): https://osgi.org/javadoc/r4v42/index.html?org/osgi/service/event/EventHandler.html
(3): https://docs.adobe.com/docs/en/aem/6-1/develop/extending/workflows/wf-extending.html
(4): https://docs.adobe.com/docs/en/aem/6-1/administer/operations/workflows/wf-start.html
(5): https://helpx.adobe.com/experience-manager/using/uploading-files-aem1.html
(6): https://poi.apache.org/spreadsheet/index.html
(7): code example
#Service
#Component(immediate = true, policy = ConfigurationPolicy.OPTIONAL, description = "Listen to page modification events and track them.")
#Properties(value = { #Property(name = "event.topics", value = { PageEvent.EVENT_TOPIC, DamEvent.EVENT_TOPIC}, propertyPrivate = true),
#Property(name = JobConsumer.PROPERTY_TOPICS, value = ModificationEventHandler.JOB_TOPICS, propertyPrivate = true) })
public class ModificationEventHandler implements EventHandler, JobConsumer {
#Override public void handleEvent(Event event) {
logger.trace("Checking event.");
PageEvent pageEvent = PageEvent.fromEvent(event);
DamEvent damEvent = DamEvent.fromEvent(event);
Map<String, Object> properties = new HashMap<>();
if (damEvent != null) {
// DamEvent is not serializable, so we cannot add the complete event to the map.
logger.trace("Event on {} is a dam event ({}).", damEvent.getAssetPath(), damEvent.getType().name());
properties.put(DAM_EVENT_ASSET_PATH, damEvent.getAssetPath());
}
if (pageEvent != null) {
logger.trace("Event is a page event.");
properties.put(PAGE_EVENT, pageEvent);
}
logger.trace("Adding new job.");
jobManager.addJob(JOB_TOPICS, properties);
}
Content can be imported via the SlingPostServlet with :operation=import: https://sling.apache.org/documentation/bundles/manipulating-content-the-slingpostservlet-servlets-post.html#importing-content-structures. Here is an example adapted from the page:
curl -u admin:admin http://localhost:4502/content/mysite/mypage \
-F":operation=import" \
-F":contentType=json"
-F":name=sample" \
-F":content={ 'jcr:primaryType': 'nt:unstructured', 'propOne' : 'propOneValue', 'childOne' : { 'childPropOne' : true } }"
Another author-friendly option is the CSV Asset Importer included with the ACS AEM Tools package. You can save an Excel file to CSV so this should be the easy option.

How to edit pasted content using the Open XML SDK

I have a custom template in which I'd like to control (as best I can) the types of content that can exist in a document. To that end, I disable controls, and I also intercept pastes to remove some of those content types, e.g. charts. I am aware that this content can also be drag-and-dropped, so I also check for it later, but I'd prefer to stop or warn the user as soon as possible.
I have tried a few strategies:
RTF manipulation
Open XML manipulation
RTF manipulation is so far working fairly well, but I'd really prefer to use Open XML as I expect it to be more useful in the future. I just can't get it working.
Open XML Manipulation
The wonderfully-undocumented (as far as I can tell) "Embed Source" appears to contain a compound document object, which I can use to modify the copied content using the Open XML SDK. But I have been unable to put the modified content back into an object that lets it be pasted correctly.
The modification part seems to work fine. I can see, if I save the modified content to a temporary .docx file, that the changes are being made correctly. It's the return to the clipboard that seems to be giving me trouble.
I have tried assigning just the Embed Source object back to the clipboard (so that the other types such as RTF get wiped out), and in this case nothing at all gets pasted. I've also tried re-assigning the Embed Source object back to the clipboard's data object, so that the remaining data types are still there (but with mismatched content, probably), which results in an empty embedded document getting pasted.
Here's a sample of what I'm doing with Open XML:
using OpenMcdf;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
...
object dataObj = Forms.Clipboard.GetDataObject();
object embedSrcObj = dateObj.GetData("Embed Source");
if (embedSrcObj is Stream)
{
// read it with OpenMCDF
Stream stream = embedSrcObj as Stream;
CompoundFile cf = new CompoundFile(stream);
CFStream cfs = cf.RootStorage.GetStream("package");
byte[] bytes = cfs.GetData();
string savedDoc = Path.GetTempFileName() + ".docx";
File.WriteAllBytes(savedDoc, bytes);
// And then use the OpenXML SDK to read/edit the document:
using (WordprocessingDocument openDoc = WordprocessingDocument.Open(savedDoc, true))
{
OpenXmlElement body = openDoc.MainDocumentPart.RootElement.ChildElements[0];
foreach (OpenXmlElement ele in body.ChildElements)
{
if (ele is Paragraph)
{
Paragraph para = (Paragraph)ele;
if (para.ParagraphProperties != null && para.ParagraphProperties.ParagraphStyleId != null)
{
string styleName = para.ParagraphProperties.ParagraphStyleId.Val;
Run run = para.LastChild as Run; // I know I'm assuming things here but it's sufficient for a test case
run.RunProperties = new RunProperties();
run.RunProperties.AppendChild(new DocumentFormat.OpenXml.Wordprocessing.Text("test"));
}
}
// etc.
}
openDoc.MainDocumentPart.Document.Save(); // I think this is redundant in later versions than what I'm using
}
// repackage the document
bytes = File.ReadAllBytes(savedDoc);
cf.RootStorage.Delete("Package");
cfs = cf.RootStorage.AddStream("Package");
cfs.Append(bytes);
MemoryStream ms = new MemoryStream();
cf.Save(ms);
ms.Position = 0;
dataObj.SetData("Embed Source", ms);
// or,
// Clipboard.SetData("Embed Source", ms);
}
Question
What am I doing wrong? Is this just a bad/unworkable approach?

Reading sheet contents using SmartSheet Java SDK

I am trying to read data using the Smartsheet API from Java to create different formats, such as reports & labels with the data from one row.
I've set up my IDE (NetBeans) so that the API samples work for me, but they are all about creating new sheets etc and I can not figure out how to read the contents of an existing sheet.
I would have thought that I could read the entire sheet into a java object in one line of code, but it appears more complicated than that, and I can not find any applicable documentation anywhere. The Javadoc does not say where/how to get the relevant IDs, what any of the inclusion or exclusion objects actually do, or which are required or optional etc.
Are there any examples of reading the contents of a sheet from java available?
I know that this is a bit of a broad question, but I'm totally stumped.
Thanks Kim!
For others, here's what worked for me. This code gets a list of the sheets in my account and displays the contents those with names starting with "Specs - " :
import com.smartsheet.api.Smartsheet;
import com.smartsheet.api.SmartsheetBuilder;
import com.smartsheet.api.SmartsheetException;
import com.smartsheet.api.models.Cell;
import com.smartsheet.api.models.Column;
import com.smartsheet.api.models.PagedResult;
import com.smartsheet.api.models.Row;
import com.smartsheet.api.models.Sheet;
import com.smartsheet.api.oauth.Token;
import java.util.List;
import java.util.Objects;
import java.util.logging.Level;
import java.util.logging.Logger;
public class SampleCode {
/*
Show the contenst of all sheets whose name starts with "Specs - "
*/
public static void main(String[] args) {
final String delimiter = ", ";
// Create a Smartsheet object with our Access Token
Token token = new Token();
token.setAccessToken(Private.TOKEN);
Smartsheet smartsheet = new SmartsheetBuilder().setAccessToken(token.getAccessToken()).build();
//get a paged list of all Sheets, using null Source Inclusion & Pagination parameters
PagedResult<Sheet> homeSheets = new PagedResult<>();
try {
homeSheets = smartsheet.sheetResources().listSheets(null, null);
} catch (SmartsheetException ex) {
Logger.getLogger(SampleCode.class.getName()).log(Level.SEVERE, null, ex);
}
// get a Java List<Sheet> from the PagedResult<Sheet>
List<Sheet> sheetInfoList = homeSheets.getData();
// Loop through each sheet in the list
for (Sheet sheetInfo : sheetInfoList) {
String sheetName = sheetInfo.getName();
// Show data for all sheets with names that match our pattern
if (sheetName.startsWith("Specs - ")) {
// get the sheet object, with no optional includes or excludes
Sheet theSheet = null;
try {
theSheet = smartsheet.sheetResources().getSheet(sheetInfo.getId(), null, null, null, null, null, null, null);
} catch (SmartsheetException ex) {
Logger.getLogger(SampleCode.class.getName()).log(Level.SEVERE, null, ex);
}
// Print the sheets name
System.out.println("\nSheet: " + theSheet.getName() + "\n");
// Print the column titles as a delimited line of text.
List<Column> columnList = theSheet.getColumns();
String columnHeader = null;
for (Column col : columnList) {
columnHeader = columnHeader == null ? col.getTitle() : columnHeader + delimiter + col.getTitle();
}
System.out.println(columnHeader);
// Print each row as a delimited line of text.
List<Row> rowList = theSheet.getRows();
for (Row row : rowList) {
List<Cell> cellList = row.getCells();
String rowOutput = null;
for (Cell cell : cellList) {
String cellOutput = Objects.toString(cell.getValue() != null ? cell.getValue() : cell.getDisplayValue());
rowOutput = rowOutput == null ? cellOutput : rowOutput + delimiter + cellOutput;
}
System.out.println(rowOutput);
}
}
}
}
}
The Smartsheet API documentation contains sample code that shows how to use the Java SDK. The Java Sample Code section in the docs describes how to establish the connection, etc. Then, each operation within in the "API Reference" section shows sample code (in the "Java" tab of the panel on the right side of the page) for executing the operation using the Java SDK.
To get Sheet data you'll use the "Get Sheet" operation. As described in the API docs, here's the Java SDK sample code for that operation:
// Get sheet (omit all parameters).
smartsheet.sheetResources().getSheet(sheetId, null, null, null, null, null, null, null);
The "sheetId" parameter should be the ID of the Sheet which you want to retrieve. You can either get this ID programmatically (i.e., by using an operation like "List Sheets", for example) -- or you can get it manually via the Smartsheet UI as described in this help article. The other parameters (all set to "null" in the sample code) represent the 7 parameters described in the API documentation for this operation. I imagine intellisense in your IDE should indicate the sequence of those parameters expected by the getSheet function, as well as valid values for each parameter (but the API docs will explain the meaning of each parameter).

How do I retrieve a page number or page reference for an Outline destination in a PDF on iOS?

I've been reading through the adobe pdf spec, along with apple's quartz 2d documentation for pdf rendering and parsing. I've also downloaded Voyeur and inspected a local pdf with it to see it's internal data. At this point I'm able to get the document catalog, and then fetch the outlines dictionary from there. I can see that nested within the outlines dictionary dictionaries that there are named "/Dest" nodes with values such as:
G1.1025588
etc
I'm wondering if there is a way for me to use these values to get a reference to page to render using some methods I've seen github projects such as Reader, along with apple documented examples.
PDF processing is definitely a challenge, so any help would be appreciated.
The /Dest entry in an outline item dictionary can either be a name, a string, or an array.
The simplest case is if it's an array; then the first item is the page object the outline entry points to (a dictionary). To get the page number, you have to iterate over all pages in the document and see which one is equal (==) to the dictionary you have (CGPDFPageRefs are actually CGPDFDictionaryRefs). You could also traverse the page tree, which is a bit harder, but may be faster (not as much as you might expect, I wouldn't optimize prematurely here). The other items in the array are position on the page etc., search for "Explicit Destinations" in the PDF spec to learn more.
If the entry is a name or string, it is a named destination. You have to map the name to a destination from the document catalog's /Dests entry which is a dictionary that contains a name tree. A name tree is essentially a tree map that allows fast access to named values without requiring to read all the data at once (as with a plain dictionary). Unfortunately, there's no direct support for name trees in Quartz, so you'll have to do a little more work to parse this structure recursively (see "Name Trees" in the PDF spec).
Note that an outline item doesn't necessarily have a /Dest entry, it can also specify its destination via an /A (action) entry, which is a little bit more complex. In most cases, however, the action will be a "GoTo" action that is essentially a wrapper for a destination.
The mapping of names to destinations can also be stored as a plain dictionary. In that case, it's in the /Dests entry of the /Names dictionary in the document's catalog. I've rarely seen this though and it was deprecated after PDF 1.2 (current is 1.7).
You will definitely need the PDF spec for this: http://www.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/PDF32000_2008.pdf
Thanks to Omz, here is a piece of code to retreive a page number for an outline destination in a PDF file :
// Get Page Number from an array
- (int) getPageNumberFromArray:(CGPDFArrayRef)array ofPdfDoc:(CGPDFDocumentRef)pdfDoc withNumberOfPages:(int)numberOfPages
{
int pageNumber = -1;
// Page number reference is the first element of array (el 0)
CGPDFDictionaryRef pageDic;
CGPDFArrayGetDictionary(array, 0, &pageDic);
// page searching
for (int p=1; p<=numberOfPages; p++)
{
CGPDFPageRef page = CGPDFDocumentGetPage(pdfDoc, p);
if (CGPDFPageGetDictionary(page) == pageDic)
{
pageNumber = p;
break;
}
}
return pageNumber;
}
// Get page number from an outline. Only support "Dest" and "A" entries
- (int) getPageNumber:(CGPDFDictionaryRef)node ofPdfDoc:(CGPDFDocumentRef)pdfDoc withNumberOfPages:(int)numberOfPages
{
int pageNumber = -1;
CGPDFArrayRef destArray;
CGPDFDictionaryRef dicoActions;
if(CGPDFDictionaryGetArray(node, "Dest", &destArray))
{
pageNumber = [self getPageNumberFromArray:destArray ofPdfDoc:pdfDoc withNumberOfPages:numberOfPages];
}
else if(CGPDFDictionaryGetDictionary(node, "A", &dicoActions))
{
const char * typeOfActionConstChar;
CGPDFDictionaryGetName(dicoActions, "S", &typeOfActionConstChar);
NSString * typeOfAction = [NSString stringWithUTF8String:typeOfActionConstChar];
if([typeOfAction isEqualToString:#"GoTo"]) // only support "GoTo" entry. See PDF spec p653
{
CGPDFArrayRef dArray;
if(CGPDFDictionaryGetArray(dicoActions, "D", &dArray))
{
pageNumber = [self getPageNumberFromArray:dArray ofPdfDoc:pdfDoc withNumberOfPages:numberOfPages];
}
}
}
return pageNumber;
}