How to use a dynamic string variable in an XPath expression in MATLAB

I'm trying to find the children of a specific node (concept) of an XML document in MATLAB using XPath.
With the following code I get 5 children, which is correct.
expression = xpath.compile('//concept\[@name="con1"\]/\*');
Childs = expression.evaluate(xDoc, XPathConstants.NODESET);
But for my project I have to use the string values of the "name" attribute of each concept dynamically, so I stored them in a vector in order to call them one by one.
For example, ConceptName(1)="con1"; however, when I execute the following code, I get zero children:
expression = xpath.compile('//concept\[@name="ConceptName(1)"\]/\*');
Childs = expression.evaluate(xDoc, XPathConstants.NODESET);
If someone can help me pass the string variables into the XPath expression, I would be very grateful.
Thank you in advance.
Here is what my XML document looks like. My desired output would be a list of four concepts (the direct children of the concept named "con1"), but I must extract the name of the parent concept dynamically because the structure would be unknown.
<?xml version="1.0" encoding="UTF-8"?>
<taxonomy>
<concept name="con1">
<concept name="con11">
<concept name="con1033990258">
<concept name="con271874239">
<concept name="con1657241849">
<concept name="con1448945150">
<instance name="inst686829093"/>
<instance name="inst1379512917"/>
<instance name="inst2072196703"/>
</concept>
</concept>
</concept>
</concept>
<concept name="con12"> </concept>
<concept name="con13"></concept>
<concept name="con14"></concept>
</concept>
</taxonomy>
This is my code
% get the XPath mechanism into the workspace
import javax.xml.xpath.*
factory = XPathFactory.newInstance;
xpath = factory.newXPath;
% read the XML file
filedir = 'C:\Users\Asus\Documents\Asma\MatlabCode\Contribution2\WSC2009_XML'; % location of the file
files = dir(fullfile(filedir, '*.xml'));
% xmlread returns a Java object that represents the file's Document Object Model (DOM).
% It displays as "[#document: null]"; the "null" is simply what the
% org.apache.xerces.dom.DeferredDocumentImpl implementation of toString() prints in the MATLAB Command Window.
xDoc = xmlread(fullfile(filedir, files(1).name));
XDocInMatlab = xmlwrite(xDoc); % show the XML file
taxonomy = xDoc.getElementsByTagName('taxonomy'); % get the root element
concepts = xDoc.getElementsByTagName('concept'); % get the concept element nodes
concept_Matrix = strings(concepts.getLength,1);
for i = 0 : concepts.getLength-1
    conceptName = string(concepts.item(i).getAttribute('name'));
    concept_Matrix(i+1,1) = conceptName;
    if concepts.item(i).hasChildNodes
        expression = xpath.compile('//concept[@name=conceptName]/*');
        Childs = expression.evaluate(xDoc, XPathConstants.NODESET);
        % Iterate through the nodes that are returned.
        for j = 0:Childs.getLength-1
            ChildsName(j+1) = char(Childs.item(j).getAttribute('name'));
        end
    end
end

The expression @name="ConceptName(1)" doesn't select anything because you don't have any elements whose name attribute has the value "ConceptName(1)".
It's hard to know how to correct your code because you don't really tell us what you thought it might mean. You say you stored the attribute names "in a vector" but there's no such thing as a vector in XPath, so I really don't know what you did or what you are trying to achieve.

My guess is that you want to replace the following line
expression = xpath.compile('//concept\[@name="ConceptName(1)"\]/\*')
with something like this:
expression = xpath.compile('//concept\[@name="' + ConceptName(1) + '"\]/\*')
Note that this works only if ConceptName is a string array type (with double quotes), not a char vector (single quotes).
Note also that it is not necessary to escape square brackets and asterisks in strings:
expression = xpath.compile('//concept[@name="' + ConceptName(1) + '"]/*')
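As a rough end-to-end sketch of the idea above: the snippet below builds the XPath query from a variable and collects the child names. It uses sprintf instead of the + operator, so it should work whether the name is a string or a char vector. The in-memory sample XML and the names query and childNames are assumptions made for illustration; real code would obtain xDoc from xmlread.

```matlab
import javax.xml.xpath.*

% Build a small DOM in memory so the sketch is self-contained
% (assumption: in real code xDoc would come from xmlread(filename)).
xmlText = ['<taxonomy><concept name="con1">' ...
           '<concept name="con11"/><concept name="con12"/>' ...
           '<concept name="con13"/><concept name="con14"/>' ...
           '</concept></taxonomy>'];
dbFactory = javax.xml.parsers.DocumentBuilderFactory.newInstance;
builder = dbFactory.newDocumentBuilder;
xDoc = builder.parse(org.xml.sax.InputSource(java.io.StringReader(xmlText)));

factory = XPathFactory.newInstance;
xpath = factory.newXPath;

ConceptName = ["con1"; "con11"];   % hypothetical list of parent names

% Insert the variable's value into the query text, not its name.
query = sprintf('//concept[@name="%s"]/*', ConceptName(1));
expression = xpath.compile(query);
children = expression.evaluate(xDoc, XPathConstants.NODESET);

% Copy the name attribute of each returned node into a string array.
childNames = strings(children.getLength, 1);
for j = 0:children.getLength-1
    childNames(j+1) = string(children.item(j).getAttribute('name'));
end
disp(childNames)
```

Storing the names in a string array (rather than assigning char vectors into single elements) avoids size-mismatch errors when names have more than one character.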

Related

Having trouble conditionally moving files based on their names

I am trying to write a script that will auto sort files based on the 7th and 8th digit in their name. I get the following error: "Argument must be a string scalar or character vector". Error is coming from line 16:
Argument must be a string scalar or character vector.
Error in sort_files (line 16)
movefile (filelist(i), DirOut)
Here's the code:
DirIn = 'C:\Folder\Experiment' %set incoming directory
DirOut = 'C:\Folder\Experiment\1'
eval(['filelist=dir(''' DirIn '/*.wav'')']) %get file list
for i = 1:length(filelist);
    Filename = filelist(i).name
    name = strsplit(Filename, '_');
    newStr = extractBetween(name,7,8);
    if strcmp(newStr,'01')
        movefile (filelist(i), DirOut)
    end
end
Also, I am trying to make the file folder conditional so that if the 10-11 digits are 02 the file goes to DirOut/02 etc.
First, try to avoid the eval function; it is widely discouraged because it is slow and hard to understand, especially when it is used to create variables. Instead do this:
filelist = dir(fullfile(DirIn,'*.wav'));
Second, the passage:
name = strsplit(Filename, '_');
makes name a cell array, so you can access name{1} or possibly name{2}, each of which is a char array. But name itself is not a string; it is a cell array, so extractBetween(name,7,8) operates on every cell and returns a cell array rather than a single string. Note that if name were a string (which in MATLAB is traditionally a char array), you could simply have done:
newStr = name(7:8);
EDIT:
Since it has now been stated that the error occurs on movefile (filelist(i), DirOut), the likely cause is that filelist(i) is a struct, whereas a file name (char array) should have been given as input. The solution is to replace this line with:
movefile(fullfile(filelist(i).folder, filelist(i).name), DirOut)
Now, if you want to number the output folders too, you can do this (indexing the char array Filename, not the cell array name):
movefile(fullfile(filelist(i).folder, filelist(i).name), [DirOut,filesep,Filename(7:8)])
This will move a file to /DirOut/01. If you wanted /DirOut/1, you could do this:
movefile(fullfile(filelist(i).folder, filelist(i).name), [DirOut,filesep,int2str(str2num(Filename(7:8)))])
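The pieces above can be combined into one loop. The following is only a sketch under stated assumptions: it works in a scratch folder created with tempname, the two sample file names are invented, and it takes characters 7 and 8 of the file name as the folder code. It also creates the destination folder first, since movefile treats a destination that does not exist as a new file name rather than a folder.

```matlab
% Hypothetical setup: a scratch folder holding two empty .wav files.
DirIn = tempname;
mkdir(DirIn);
fclose(fopen(fullfile(DirIn, 'abcdef01_x.wav'), 'w'));
fclose(fopen(fullfile(DirIn, 'abcdef02_y.wav'), 'w'));

filelist = dir(fullfile(DirIn, '*.wav'));   % no eval needed
for i = 1:numel(filelist)
    fname = filelist(i).name;               % char array
    code = fname(7:8);                      % 7th and 8th characters, e.g. '01'
    target = fullfile(DirIn, code);         % e.g. <DirIn>/01

    % Create the destination folder if it is missing.
    if exist(target, 'dir') ~= 7
        mkdir(target);
    end

    % Pass a full path (char array), not the struct filelist(i).
    movefile(fullfile(filelist(i).folder, fname), target);
end
```

Because the folder name is derived from the file name, no per-code if/strcmp branch is needed.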

Saving figure without providing filename [duplicate]

This question is about MATLAB:
I'm running a loop, and in each iteration a new set of data is produced that I want to save in a new file each time. I also overwrite old files by changing the name. It looks like this:
name_each_iter = strrep(some_source,'.string.mat','string_new.(j).mat')
and what I'm struggling with here is the iteration, so that I obtain files:
...string_new.1.mat
...string_new.2.mat
etc.
I tried various combinations of () [] {} as well as 'string_new.'j'.mat' (which gave a syntax error).
How can it be done?
Strings are just vectors of characters. So if you want to iteratively create filenames here's an example of how you would do it:
for j = 1:10,
filename = ['string_new.' num2str(j) '.mat'];
disp(filename)
end
The above code will create the following output:
string_new.1.mat
string_new.2.mat
string_new.3.mat
string_new.4.mat
string_new.5.mat
string_new.6.mat
string_new.7.mat
string_new.8.mat
string_new.9.mat
string_new.10.mat
You could also generate all file names in advance using NUM2STR:
>> filenames = cellstr(num2str((1:10)','string_new.%02d.mat'))
filenames =
'string_new.01.mat'
'string_new.02.mat'
'string_new.03.mat'
'string_new.04.mat'
'string_new.05.mat'
'string_new.06.mat'
'string_new.07.mat'
'string_new.08.mat'
'string_new.09.mat'
'string_new.10.mat'
Now access the cell array contents as filenames{i} in each iteration
sprintf is very useful for this:
for ii=5:12
filename = sprintf('data_%02d.mat',ii)
end
This assigns the following strings to filename, one per iteration:
data_05.mat
data_06.mat
data_07.mat
data_08.mat
data_09.mat
data_10.mat
data_11.mat
data_12.mat
Notice the zero padding. In general, sprintf is useful whenever you want parameterized, formatted strings.
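Applied to the strrep call from the question, the same idea might look like the sketch below. The value of some_source is a made-up example, and the replacement text keeps the leading dot so the result has the form data.string_new.1.mat; adjust the pattern if a different layout is wanted.

```matlab
some_source = 'data.string.mat';   % hypothetical source file name

names = cell(3, 1);
for j = 1:3
    % sprintf formats the loop counter into the replacement text.
    replacement = sprintf('.string_new.%d.mat', j);
    names{j} = strrep(some_source, '.string.mat', replacement);
end
disp(names)
```

Each names{j} can then be passed to save or any other function that expects a file name.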
To create a name based on an already existing file, you can use regexp to detect the '_new.(number).mat' suffix and change the string depending on what regexp finds:
original_filename = 'data.string.mat';
im = regexp(original_filename,'_new\.\d+\.mat'); % dots escaped so they match literally
if isempty(im) % original file, no _new.(j) detected
    newname = [original_filename(1:end-4) '_new.1.mat'];
else
    num = str2double(original_filename(im(end)+5:end-4));
    newname = sprintf('%s_new.%d.mat',original_filename(1:im(end)-1),num+1);
end
This does exactly that, and produces:
data.string_new.1.mat
data.string_new.2.mat
data.string_new.3.mat
...
data.string_new.9.mat
data.string_new.10.mat
data.string_new.11.mat
when iterating the above function, starting with 'data.string.mat'

Qt5.5 QByteArray indexOf mid wrong result

I have an XML file in a QByteArray, and I am using the indexOf method to find a string in the array, but the position returned isn't correct. If I examine the data content using qDebug, I can see that the data has escape characters, which isn't a problem, but I don't think indexOf is counting the escape characters.
For example the result from:
qDebug() << arybytXML;
A snippet from the result of this is:
<?xml version="1.0" encoding="utf-8"?><!--\n Node: gui\n Attrbuttes: left, right, top and bottom defines the pixel white space to allow\n from the edge of the display\n\t\tlanguage, should be set to the appropriate country code, an XML file named using\n\t\tthe country code must exist, e.g. 44.xml\n//-->\n<gui id=\"root\" bottom=\"0\" left=\"0\" right=\"0\" top=\"24\" language=\"44\">
I use the code:
intOpenComment = arybytXML.indexOf("<!--");
The result is that intOpenComment is 39. If I then search for the end comment and try to extract the data I get the wrong result:
intClosingComment = arybytXML.indexOf("-->", intOpenComment);
QString strComment = arybytXML.mid(intOpenComment
,intClosingComment + strlen("-->"));
Result:
<!--\n Node: gui\n Attrbuttes: left, right, top and bottom defines the pixel white space to allow\n from the edge of the display\n\t\tlanguage, should be set to the appropriate country code, an XML file named using\n\t\tthe country code must exist, e.g. 44.xml\n//-->\n<gui id=\"root\" bottom=\"0\" left=\"0\" rig"
The result should stop after -->, why is there more data?
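The problem is that the second parameter of mid is a length (the number of bytes to return), not an end position, so intOpenComment needs to be subtracted from it: the length should be intClosingComment + strlen("-->") - intOpenComment.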

why strcat() doesn't return a string in Matlab?

I'm trying to access multiple files in a for loop, like this:
age = xlsread(strcat('Pipeline_BO_2013_',names(2),'_CDBU.xlsx'), 'Data', 'H:I')
It returns an error saying the filename must be a string. So I did the following test:
filename = strcat('Pipeline_BO_2013_',names(2),'_CDBU.xlsx')
filename =
'Pipeline_BO_2013_0107_CDBU.xlsx'
isstr(filename)
ans =
0
This is so weird. Could any one help me out? Thank you so much.
It looks like names is a cellstr and not a char array. If so, indexing in to it with parentheses like names(2) will return a 1-long cellstr array, not a char array. And when strcat is called with any of its arguments as a cellstr, it returns a cellstr. Then xlsread errors because it wants a char, not a cellstr.
Instead of just calling isstr or ischar on filename, do class(filename) and it'll tell you what it is.
Another clue is that filename is displayed with quotes. This is how cellstrs are displayed. If it were a char array, it would be displayed without quotes.
If this is the case, and names is a cellstr, you need to use {} indexing to "pop out" the cell contents.
filename = strcat('Pipeline_BO_2013_',names{2},'_CDBU.xlsx')
Or you can use sprintf, which you may find more readable, and will be more flexible once you start interpolating multiple arguments of different types.
filename = sprintf('Pipeline_BO_2013_%s_CDBU.xlsx', names{2})
% An example of more flexibility:
year = 2013;
filename = sprintf('Pipeline_BO_%04d_%s_CDBU.xlsx', year, names{2})
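To see the difference between the two kinds of indexing directly, a small check along these lines can help. The contents of names are invented for illustration.

```matlab
names = {'0106', '0107'};   % hypothetical cell array of char vectors

% Parentheses return a 1-by-1 cell, so strcat returns a cell array.
a = strcat('Pipeline_BO_2013_', names(2), '_CDBU.xlsx');

% Braces return the char vector inside the cell, so strcat returns char.
b = strcat('Pipeline_BO_2013_', names{2}, '_CDBU.xlsx');

disp(class(a))   % cell
disp(class(b))   % char
```

Only b can be passed to functions such as xlsread that expect a char file name.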

How to get Matlab to read correct amount of xml nodes

I'm reading a simple xml file using matlab's xmlread internal function.
<root>
<ref>
<requestor>John Doe</requestor>
<project>X</project>
</ref>
</root>
But when I call getChildren() of the ref element, it's telling me that it has 5 children.
It works fine IF I put all the XML in ONE line. Matlab tells me that ref element has 2 children.
It doesn't seem to like the spaces between elements.
Even if I run Canonicalize in oXygen XML editor, I still get the same results. Because Canonicalize still leaves spaces.
Matlab uses java and xerces for xml stuff.
Question:
What can I do so that I can keep my xml file in human readable format (not all in one line) but still have matlab correctly parse it?
Code Update:
filename='example01.xml';
docNode = xmlread(filename);
rootNode = docNode.getDocumentElement;
entries = rootNode.getChildNodes;
nEnt = entries.getLength
The XML parser behind the scenes creates #text nodes for all whitespace between the element nodes. Wherever there is a newline or indentation, it creates a #text node with the newline and the following indentation spaces in the data portion of the node. So in the XML example you provided, when it parses the child nodes of the "ref" element, it returns 5 nodes:
Node 1: #text with newline and indentation spaces
Node 2: "requestor" node which in turn has a #text child with "John Doe" in the data portion
Node 3: #text with newline and indentation spaces
Node 4: "project" node which in turn has a #text child with "X" in the data portion
Node 5: #text with newline and indentation spaces
This function removes all of these useless #text nodes for you. Note that if you intentionally have an xml element composed of nothing but whitespace then this function will remove it but for the 99.99% of xml cases this should work just fine.
function removeIndentNodes( childNodes )
    numNodes = childNodes.getLength;
    remList = [];
    for i = numNodes:-1:1
        theChild = childNodes.item(i-1);
        if (theChild.hasChildNodes)
            removeIndentNodes(theChild.getChildNodes);
        else
            if ( theChild.getNodeType == theChild.TEXT_NODE && ...
                    ~isempty(char(theChild.getData())) && ...
                    all(isspace(char(theChild.getData()))))
                remList(end+1) = i-1; % java indexing
            end
        end
    end
    for i = 1:length(remList)
        childNodes.removeChild(childNodes.item(remList(i)));
    end
end
Call it like this
tree = xmlread( xmlfile );
removeIndentNodes( tree.getChildNodes );
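If modifying the tree is not desired, another option is to leave the #text nodes in place and skip everything that is not an element while iterating. The sketch below is an illustration only: it parses the question's XML from an in-memory string (an assumption made to keep the example self-contained; xmlread on a file yields the same kind of DOM), and the names kids and elemNames are invented.

```matlab
% The question's XML, with newlines and indentation preserved.
xmlText = sprintf(['<root>\n  <ref>\n' ...
    '    <requestor>John Doe</requestor>\n' ...
    '    <project>X</project>\n  </ref>\n</root>']);
dbFactory = javax.xml.parsers.DocumentBuilderFactory.newInstance;
builder = dbFactory.newDocumentBuilder;
doc = builder.parse(org.xml.sax.InputSource(java.io.StringReader(xmlText)));

refList = doc.getElementsByTagName('ref');
kids = refList.item(0).getChildNodes;   % includes whitespace #text nodes

% Keep only element nodes; whitespace-only #text nodes are skipped.
elemNames = {};
for k = 0:kids.getLength-1
    node = kids.item(k);
    if node.getNodeType == node.ELEMENT_NODE
        elemNames{end+1} = char(node.getNodeName); %#ok<SAGROW>
    end
end

fprintf('%d child nodes, %d of them elements\n', ...
    kids.getLength, numel(elemNames));
```

This leaves the document untouched, at the cost of repeating the node-type check wherever children are walked.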
I felt that @cholland's answer was good, but I didn't like the extra XML work. So here is a solution that strips the whitespace, which is the root cause of the unwanted nodes, from a copy of the XML file.
fid = fopen('tmpCopy.xml','wt');
str = regexprep(fileread(filename),'[\n\r]+',' ');
str = regexprep(str,'>[\s]*<','><');
fprintf(fid,'%s', str);
fclose(fid);