Qt5.5 QByteArray indexOf mid wrong result - substring

I have an XML file in a QByteArray and I am using the indexOf method to find a string in the array, but the position returned isn't correct. If I examine the data using qDebug, I can see that it contains escape characters, which isn't a problem, but I don't think indexOf is counting the escape characters.
For example the result from:
qDebug() << arybytXML;
A snippet from the result of this is:
<?xml version="1.0" encoding="utf-8"?><!--\n Node: gui\n Attrbuttes: left, right, top and bottom defines the pixel white space to allow\n from the edge of the display\n\t\tlanguage, should be set to the appropriate country code, an XML file named using\n\t\tthe country code must exist, e.g. 44.xml\n//-->\n<gui id=\"root\" bottom=\"0\" left=\"0\" right=\"0\" top=\"24\" language=\"44\">
I use the code:
intOpenComment = arybytXML.indexOf("<!--");
The result is that intOpenComment is 39. If I then search for the end comment and try to extract the data I get the wrong result:
intClosingComment = arybytXML.indexOf("-->", intOpenComment);
QString strComment = arybytXML.mid(intOpenComment, intClosingComment + strlen("-->"));
Result:
<!--\n Node: gui\n Attrbuttes: left, right, top and bottom defines the pixel white space to allow\n from the edge of the display\n\t\tlanguage, should be set to the appropriate country code, an XML file named using\n\t\tthe country code must exist, e.g. 44.xml\n//-->\n<gui id=\"root\" bottom=\"0\" left=\"0\" rig"
The result should stop after "-->". Why is there more data?

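The problem is that the second parameter of mid() is the number of bytes to return, not an end position, so intOpenComment has to be subtracted from it. indexOf() itself is not the problem: sequences such as \n and \" in the qDebug output are just how qDebug prints single bytes, and indexOf() counts each of them as one byte. The corrected call is:
QString strComment = arybytXML.mid(intOpenComment, intClosingComment + strlen("-->") - intOpenComment);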
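For illustration, here is a minimal C++ sketch of the same fix using std::string instead of QByteArray (an assumption made so the example compiles without Qt). std::string::substr(pos, count), like QByteArray::mid(pos, len), takes a length as its second argument, so the end position has to be converted into a length first. The function name extractComment is hypothetical.

```cpp
#include <cassert>
#include <string>

// Returns the first "<!-- ... -->" comment in xml (including the delimiters),
// or an empty string if there is none.
std::string extractComment(const std::string& xml) {
    const std::string open = "<!--";
    const std::string close = "-->";
    const std::size_t start = xml.find(open);
    if (start == std::string::npos) return "";
    const std::size_t end = xml.find(close, start);
    if (end == std::string::npos) return "";
    assert(end >= start);  // the search for close began at start
    // substr() wants a length, not an end position: subtract start,
    // exactly as with QByteArray::mid().
    const std::size_t length = end + close.size() - start;
    return xml.substr(start, length);
}
```

Passing end + close.size() as the length, as in the question, makes substr() run past the comment, which is the extra data seen in the output above.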


Replace $ characters with zeros for a date field using SQL*Loader

A text file contains data like the following:
041522$$$$$$$$$NAPTTALIE REVERE #1621500025 OLD ST FUNNRHILL MA1530 273 000000$$$$$$$03#$$$##############$$$$$$$$$$$$$$$$$$Z$$$$$$$$$$$$$$$$$$$$$$###$$$$$$$$$$$$$$$$$$$$$#####$$$$$$$$$$$$$$$#$$$$$0$$$$$$$$$$$000000$$$$$$$$$$$$#$$#$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$##$$$$$$$$$$$$000000$$$$$$$$$$$$$A###Y$$$$$$$$$$$$$1##$$$$$$$$$$$$$$$$$$$##02$$$$$$$$$$$$$#$$$$$$$$$$$$$$$$$$$$$$##Y#######$$$$#################################
Control file:
LOAD DATA
CHARACTERSET "UTF8"
INFILE 'C:\bendex\MA_File38\fileout.txt'
BADFILE 'C:\bendex\MA_File38\baddata.bad'
DISCARDFILE 'C:\bendex\MA_File38\discdata.dsc'
APPEND
INTO TABLE "TMP_DATA_1220"
TRAILING NULLCOLS
(
SOURCE CONSTANT "TEST",
FILE_DTE "TRUNC(SYSDATE)",
AU_REGION POSITION (1:2),
AU_OFFICE POSITION (3:5),
AU_PGM_CATEGORY POSITION (6) ,
GRANTEE_SSN POSITION (7:15),
GRANTEE_NAME POSITION (16:38),
CAT_ELIG_IND POSITION (39),
PHONE POSITION (40:47),
ADDRESS POSITION (48:70),
CITY POSITION (71:83),
STATE POSITION (84:85),
ZIP POSITION (86:90),
CAN_NUM POSITION (91:95),
NET_INC POSITION (96:101) "TO_NUMBER(:NET_INC)",
START_DTE POSITION (102:107) "CASE WHEN :START_DTE ='$$$$$$' THEN TO_CHAR(REPLACE(:START_DTE, '$', '0')) ELSE DATE 'rrmmdd'",
LAST_UPDT_UID_NAM CONSTANT "LOADF38",
LAST_UPDT_TS "SYSTIMESTAMP"
)
Error:
Record 1: Rejected - Error on table "TMP_DATA_1220", column START_DTE.
ORA-01841: (full) year must be between -4713 and +9999, and not be 0
I have to read the data from the text file and load it into a table. I tried to replace '$' with '0' and convert positions 102 to 107 into a date field, but I am getting an error. I tried using REPLACE and DECODE; neither worked.
Any help is much appreciated. Thank you.
NOTE: The text file has full-length records, but I am reading only the first few fields using SQL*Loader.
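I believe you want START_DTE to be NULL when the field is all '$' characters, no? Replacing each '$' with '0' only produces '000000', which is not a valid date. The ELSE branch also needs TO_DATE to convert the field (DATE 'rrmmdd' is a date literal, not a conversion of the field), and the CASE needs its closing END:
START_DTE POSITION (102:107) "CASE WHEN :START_DTE = '$$$$$$' THEN NULL ELSE TO_DATE(:START_DTE, 'rrmmdd') END",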
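If you want to check the same rule before loading, a hypothetical C++ pre-check might look like the sketch below (parseStartDate and Date are illustrative names, not part of SQL*Loader). It shows that POSITION (102:107) is 1-based and inclusive, so the field is 6 characters starting at 0-based index 101. It applies a simplified RR rule that assumes the current year is between 2000 and 2049, and only a rough month/day range check.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical result type for a YYMMDD field.
struct Date {
    int year;
    int month;
    int day;
};

// Hypothetical pre-check mirroring the CASE expression for START_DTE.
// Returns std::nullopt for an all-'$' field (NULL in the table) and,
// unlike TO_DATE, also for malformed values instead of rejecting the record.
std::optional<Date> parseStartDate(const std::string& record) {
    // POSITION (102:107) is 1-based and inclusive: 0-based index 101, 6 chars.
    if (record.size() < 107) return std::nullopt;
    const std::string field = record.substr(101, 6);
    if (field == "$$$$$$") return std::nullopt;
    for (char c : field) {
        if (c < '0' || c > '9') return std::nullopt;
    }
    const int yy = std::stoi(field.substr(0, 2));
    const int mm = std::stoi(field.substr(2, 2));
    const int dd = std::stoi(field.substr(4, 2));
    // Simplified RR rule, assuming the current year is 2000-2049.
    const int year = yy < 50 ? 2000 + yy : 1900 + yy;
    // Simplified range check (ignores month lengths); "000000" fails here.
    if (mm < 1 || mm > 12 || dd < 1 || dd > 31) return std::nullopt;
    return Date{year, mm, dd};
}
```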

Printing a vector of strings in a static text box with new lines

I have a number of classes that I am iterating through, collecting the classes the student is failing. If the student fails a class, I add the name of the class to a vector called retake.
retake =[Math History Science]
I have line breaks, so when the classes are printed in the Command Window they show as:
retake=
Math
History
Science
However, I am trying to display retake in a static text box in GUIDE so that it looks like the above. Instead, the static text box shows:
MathHistoryScience
set(handles.text13,'String', retake) % this is what I tried
Can you please show me how to make it print:
Math
History
Science
It looks to me like you need to add line breaks (newline characters) between the names.
Assuming you have a cell array of strings (rather than strings concatenated with [], which gives you a single long line), you can do it as follows:
retake = {'Math', 'History', 'Science'};
rString = '';
for ii = 1:numel(retake)-1
rString = [rString sprintf('%s\n', retake{ii})];
end
rString = [rString retake{end}];
set(handles.text13, 'String', rString);
Notice the use of '' to denote strings, {} to denote a cell array, '\n' as the end-of-line character, and [a b] to do simple string concatenation.
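The loop above joins the names with newlines but leaves no newline after the last one. For comparison only, here is the same join-with-separator pattern sketched in C++ (joinLines is a hypothetical name). Unlike the MATLAB version, which fails on retake{end} when retake is empty, it also handles an empty list.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Joins items with '\n' between them and no trailing newline,
// like the MATLAB loop that appends retake{end} on its own.
std::string joinLines(const std::vector<std::string>& items) {
    std::string out;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i > 0) out += '\n';  // separator goes before every item but the first
        out += items[i];
    }
    return out;
}
```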

wxWidgets wrong substring

I am trying to extract a substring out of some html code in wxWidgets but I can't get my method working properly.
Content of to_parse:
<html><head></head><body><font face="Segue UI" size=2 .....<font face="Segoe UI"size="2" color="#000FFF"><font face="#DFKai-SB" ... <b><u> the text </u></b></font></font></font></body></html>
wxString to_parse = SOStream.GetString();
size_t spos = to_parse.find_last_of("<font face=",wxString::npos);
size_t epos = to_parse.find_first_of("</font>",wxString::npos);
wxString retstring(to_parse.Mid(spos,epos));
wxMessageBox(retstring); // Output: always ---> tml>
As there are several font face tags in the HTML in to_parse, I would like to find the position of the last "<font face=" and the position of the first "</font>" closing tag.
For some reason, I always get the same unexpected output: tml>
Can anyone spot the reason why?
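The methods find_{last,first}_of() don't do what you seem to think they do: they behave in the same way as the std::basic_string<> methods of the same name and find the last (or first) occurrence of any single character from the string you pass to them, see the documentation. With your HTML, find_last_of("<font face=") returns the position of the 't' in the final "</html>", because that is the last character in the document that also appears in "<font face=". find_first_of() starting at wxString::npos finds nothing and returns npos, so Mid(spos, npos) returns everything from that 't' to the end: "tml>".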
If you want to search for a substring, use find().
Thank you for the answer. Yes, you were right. I must have somehow been under the impression that Substring() / substr() / Mid() take two wxStrings as parameters, which isn't the case.
wxString to_parse = SOStream.GetString();
to_parse = to_parse.Mid(to_parse.find("<p ")); // discards everything before "<p "
to_parse = to_parse.Remove(to_parse.find("</p>")); // removes "</p>" and everything after it
wxMessageBox(to_parse); // so we are left with everything from "<p " up to, but not including, "</p>"
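To see the difference the answer describes, the sketch below uses std::string, whose find(), rfind(), find_first_of() and find_last_of() behave like the wxString methods of the same name. buggyExtract reproduces the question's "tml>" output on a small sample document, and innerFontText shows a substring-based version. Both names are hypothetical, and the tag handling assumes there is no '>' inside attribute values.

```cpp
#include <cassert>
#include <string>

// Reproduces the question's bug. find_last_of() returns the last character
// that matches ANY character in "<font face=", and find_first_of() called
// with pos == npos finds nothing and returns npos.
// (Assumes html contains at least one of those characters.)
std::string buggyExtract(const std::string& html) {
    const std::size_t spos = html.find_last_of("<font face=", std::string::npos);
    const std::size_t epos = html.find_first_of("</font>", std::string::npos);
    return html.substr(spos, epos);  // epos == npos: everything to the end
}

// Substring-based version: text between the last "<font face=...>" tag and
// the first "</font>" after it.
std::string innerFontText(const std::string& html) {
    const std::size_t tagStart = html.rfind("<font face=");
    if (tagStart == std::string::npos) return "";
    const std::size_t tagEnd = html.find('>', tagStart);
    if (tagEnd == std::string::npos) return "";
    const std::size_t contentStart = tagEnd + 1;
    const std::size_t contentEnd = html.find("</font>", contentStart);
    if (contentEnd == std::string::npos) return "";
    // substr() takes a length, so convert the end position into one.
    return html.substr(contentStart, contentEnd - contentStart);
}
```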

How to return next string without >> with stringstream?

Instead of:
stringstream szBuffer;
szBuffer>>string;
myFunc(string);
How do I do something like:
myFunc(szBuffer.NextString());
I don't want to create a temporary variable just to pass it to a function.
If you want to read the whole string in:
// .str() returns a string with the contents of szBuffer
myFunc(szBuffer.str());
// Once you've taken the string out, clear it
szBuffer.str("");
If you want to extract the next line (up to the next \n character), use istream::getline:
// There are better ways to do this, but for the purposes of this
// demonstration we'll assume the lines aren't longer than 255 bytes
char buf[ 256 ];
szBuffer.getline(buf, sizeof(buf));
myFunc(buf);
istream::getline() can also take a delimiter as a third parameter ('\n' by default), so you can, for example, pass ' ' to read it word by word.
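A minimal sketch of the helper the question asks for, assuming it is acceptable to wrap the extraction in a small function (nextString and nextLine are hypothetical names). Returning the std::string by value means the call site needs no named temporary, and std::getline with a std::string avoids the fixed-size buffer.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: extracts and returns the next whitespace-delimited
// word, so the caller needs no named temporary: myFunc(nextString(szBuffer)).
std::string nextString(std::istream& in) {
    std::string word;
    in >> word;  // word stays empty if extraction fails (e.g. at end of input)
    return word;
}

// Same idea for lines; std::getline with std::string has no fixed buffer size.
// The optional delimiter plays the same role as istream::getline()'s third parameter.
std::string nextLine(std::istream& in, char delim = '\n') {
    std::string line;
    std::getline(in, line, delim);
    return line;
}
```

For example, myFunc(nextString(szBuffer)); passes the next word straight to the function.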

How to get Matlab to read correct amount of xml nodes

I'm reading a simple XML file using MATLAB's built-in xmlread function.
<root>
<ref>
<requestor>John Doe</requestor>
<project>X</project>
</ref>
</root>
But when I call getChildNodes() on the ref element, it tells me that it has 5 children.
It works fine if I put all the XML on one line: MATLAB then tells me that the ref element has 2 children.
It doesn't seem to like the whitespace between elements.
Even if I run Canonicalize in the oXygen XML editor, I still get the same results, because Canonicalize still leaves the whitespace.
MATLAB uses Java and Xerces for XML parsing.
Question:
What can I do so that I can keep my XML file in a human-readable format (not all on one line) but still have MATLAB parse it correctly?
Code Update:
filename='example01.xml';
docNode = xmlread(filename);
rootNode = docNode.getDocumentElement;
entries = rootNode.getChildNodes;
nEnt = entries.getLength
The XML parser behind the scenes creates #text nodes for all whitespace between the elements. Wherever there is a newline or indentation, it creates a #text node with the newline and the following indentation spaces in the data portion of the node. So in the XML example you provided, parsing the child nodes of the "ref" element returns 5 nodes:
Node 1: #text with newline and indentation spaces
Node 2: "requestor" node which in turn has a #text child with "John Doe" in the data portion
Node 3: #text with newline and indentation spaces
Node 4: "project" node which in turn has a #text child with "X" in the data portion
Node 5: #text with newline and indentation spaces
This function removes all of these whitespace-only #text nodes for you. Note that if you intentionally have an XML element whose text is nothing but whitespace, this function will remove that text as well, but for most XML files this should work just fine.
function removeIndentNodes( childNodes )
    numNodes = childNodes.getLength;
    remList = [];
    for i = numNodes:-1:1
        theChild = childNodes.item(i-1);
        if (theChild.hasChildNodes)
            removeIndentNodes(theChild.getChildNodes);
        else
            if ( theChild.getNodeType == theChild.TEXT_NODE && ...
                    ~isempty(char(theChild.getData())) && ...
                    all(isspace(char(theChild.getData()))))
                remList(end+1) = i-1; % java indexing
            end
        end
    end
    for i = 1:length(remList)
        childNodes.removeChild(childNodes.item(remList(i)));
    end
end
Call it like this:
tree = xmlread( xmlfile );
removeIndentNodes( tree.getChildNodes );
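To make the recursion concrete outside MATLAB, here is a C++ sketch over a hypothetical minimal node type. Node, with "#text" as the name of text nodes as in the DOM nodeName, is an assumption for illustration and not the Xerces API. It removes non-empty, whitespace-only text nodes at every level, which turns the 5 children of ref into 2.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical minimal DOM node (not the Xerces API). Text nodes use the
// name "#text", as the DOM nodeName does, and keep their content in data.
struct Node {
    std::string name;
    std::string data;
    std::vector<Node> children;
};

// True for indentation-only text nodes: non-empty and all whitespace.
bool isIndentText(const Node& n) {
    return n.name == "#text" && !n.data.empty() &&
           std::all_of(n.data.begin(), n.data.end(),
                       [](unsigned char c) { return std::isspace(c) != 0; });
}

// Recursively drops whitespace-only text nodes, like removeIndentNodes.
void removeIndentNodes(Node& node) {
    auto& kids = node.children;
    kids.erase(std::remove_if(kids.begin(), kids.end(), isIndentText), kids.end());
    for (Node& child : kids) removeIndentNodes(child);
}
```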
I felt that @cholland's answer was good, but I didn't like the extra XML work. So here is a solution that strips the whitespace, which is the root cause of the unwanted elements, from a copy of the XML file:
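fid = fopen('tmpCopy.xml','wt');
str = regexprep(fileread(filename),'[\n\r]+',' '); % collapse line breaks into spaces
str = regexprep(str,'>[\s]*<','><'); % drop whitespace between tags
fprintf(fid,'%s', str);
fclose(fid);
docNode = xmlread('tmpCopy.xml'); % parse the cleaned copy instead of the original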
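The two regexprep calls can be mirrored in C++ with std::regex_replace as a quick way to check what the cleaned copy will look like (stripIndentation is a hypothetical name). Like the MATLAB version, it also turns newlines inside text content into spaces, so it is only safe when line breaks inside element text don't matter.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Mirrors the two regexprep calls: collapse runs of CR/LF into one space,
// then remove whitespace that sits between a '>' and the next '<'.
std::string stripIndentation(const std::string& xml) {
    const std::string oneLine =
        std::regex_replace(xml, std::regex("[\\n\\r]+"), " ");
    return std::regex_replace(oneLine, std::regex(">\\s*<"), "><");
}
```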