Malformed UTF-8 character (fatal) error while parsing XML using XML::LibXML

Malformed UTF-8 character (fatal) error while parsing XML using XML::LibXML - perl

I am parsing XML files using XML::LibXML. For the following XML entry I get the error:
Malformed UTF-8 character (fatal) at C:/Perl64/site/lib/XML/LibXML/Error.pm line 217
which is
$context=~s/[^\t]/ /g;
The entry in XML is the following
<MedlineCitation Owner="NLM" Status="MEDLINE">
<PMID Version="1">15177811</PMID>
<DateCreated>
<Year>2004</Year>
<Month>06</Month>
<Day>04</Day>
</DateCreated>
<DateCompleted>
<Year>2004</Year>
<Month>08</Month>
<Day>11</Day>
</DateCompleted>
<DateRevised>
<Year>2011</Year>
<Month>04</Month>
<Day>07</Day>
</DateRevised>
<Article PubModel="Print">
<Journal>
<ISSN IssnType="Print">0278-2626</ISSN>
<JournalIssue CitedMedium="Print">
<Volume>55</Volume>
<Issue>2</Issue>
<PubDate>
<Year>2004</Year>
<Month>Jul</Month>
</PubDate>
</JournalIssue>
<Title>Brain and cognition</Title>
<ISOAbbreviation>Brain Cogn</ISOAbbreviation>
</Journal>
<ArticleTitle>Efficiency of orientation channels in the striate cortex for distributed categorization process.</ArticleTitle>
<Pagination>
<MedlinePgn>352-4</MedlinePgn>
</Pagination>
<Affiliation>Cognitive Science Department, Université de Liège, Belgium. mmermillod#ulg.ac.be</Affiliation>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Mermillod</LastName>
<ForeName>Martial</ForeName>
<Initials>M</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Chauvin</LastName>
<ForeName>Alan</ForeName>
<Initials>A</Initials>
</Author>
<Author ValidYN="Y">
<LastName>Guyader</LastName>
<ForeName>Nathalie</ForeName>
<Initials>N</Initials>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType>Journal Article</PublicationType>
</PublicationTypeList>
</Article>
<MedlineJournalInfo>
<Country>United States</Country>
<MedlineTA>Brain Cogn</MedlineTA>
<NlmUniqueID>8218014</NlmUniqueID>
<ISSNLinking>0278-2626</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<CommentsCorrectionsList>
<CommentsCorrections RefType="ErratumIn">
<RefSource>Brain Cogn. 2005 Jul;58(2):245</RefSource>
</CommentsCorrections>
<CommentsCorrections RefType="RepublishedIn">
<RefSource>Brain Cogn. 2005 Jul;58(2):246-8</RefSource>
<PMID Version="1">16044513</PMID>
</CommentsCorrections>
</CommentsCorrectionsList>
<MeshHeadingList>
<MeshHeading>
<DescriptorName MajorTopicYN="Y">Neural Networks (Computer)</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Neurons</DescriptorName>
<QualifierName MajorTopicYN="N">physiology</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Orientation</DescriptorName>
<QualifierName MajorTopicYN="Y">physiology</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Pattern Recognition, Visual</DescriptorName>
<QualifierName MajorTopicYN="Y">physiology</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Visual Cortex</DescriptorName>
<QualifierName MajorTopicYN="Y">physiology</QualifierName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
But the things I want out of this entry is PMID, DateRevised, PubDate, ArticleTitle, CommentsCorrectionList, and MeshHeadingList. But, if I remove Affiliation which contains some other character this error is no more. How should I fix this error?

You could either convert the file to the specified encoding (UTF-8), or you can specify the encoding actually used for the file. (<?xml version="1.0" encoding="cp1252"?>).
Notepad can be used to convert to UTF-8, and so can Perl:
perl -pe"
BEGIN {
binmode STDIN, ':encoding(cp1252)';
binmode STDOUT, ':encoding(UTF-8)';
}
" < file.cp1252 > file.UTF-8
(You'll have to remove the line breaks I've added for readability.)

Related

Plain text to odt with dynamic elements

I am trying to figure out a way to create odt files with dynamic elements from a plain text file and avoid using any kind of scripts. While doing some testing, I can easily convert an odt file to plain text using libre office command line tool. However, when trying to convert from plain text to odt format, dynamic elements are lost. I searched the documentation but nothing specifies the expected format when converting.
I tried using the command libreoffice --convert-to odt sampleFile.txt but dynamic elements are not recognized.
Considering the following .txt file that contains
${Parent.child}
Can I convert it to a working .odt file ?
UPDATE:
I was able to do so using the XML format but it is not the desired way of doing this because the XML format requires/generate a lot of elements.
Sample file:
<?xml version="1.0" encoding="UTF-8"?>
<office:document xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0" xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0" xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0" xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0" xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0" xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0" xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0" xmlns:math="http://www.w3.org/1998/Math/MathML" xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0" xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0" xmlns:config="urn:oasis:names:tc:opendocument:xmlns:config:1.0" xmlns:ooo="http://openoffice.org/2004/office" xmlns:ooow="http://openoffice.org/2004/writer" xmlns:oooc="http://openoffice.org/2004/calc" xmlns:dom="http://www.w3.org/2001/xml-events" xmlns:xforms="http://www.w3.org/2002/xforms" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:rpt="http://openoffice.org/2005/report" xmlns:of="urn:oasis:names:tc:opendocument:xmlns:of:1.2" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:grddl="http://www.w3.org/2003/g/data-view#" xmlns:officeooo="http://openoffice.org/2009/office" xmlns:tableooo="http://openoffice.org/2009/table" xmlns:drawooo="http://openoffice.org/2010/draw" xmlns:calcext="urn:org:documentfoundation:names:experimental:calc:xmlns:calcext:1.0" xmlns:loext="urn:org:documentfoundation:names:experimental:office:xmlns:loext:1.0" xmlns:field="urn:openoffice:names:experimental:ooo-ms-interop:xmlns:field:1.0" xmlns:formx="urn:openoffice:names:experimental:ooxml-odf-interop:xmlns:form:1.0" xmlns:css3t="http://www.w3.org/TR/css3-text/" office:version="1.2" office:mimetype="application/vnd.oasis.opendocument.text">
<office:meta><meta:document-statistic meta:character-count="108" meta:image-count="0" meta:object-count="0" meta:page-count="1" meta:paragraph-count="5" meta:table-count="0" meta:word-count="5"/><dc:date>2021-09-02T10:44:35</dc:date><meta:editing-duration>PT00H00M00S</meta:editing-duration><meta:editing-cycles>1</meta:editing-cycles><meta:generator>LibreOffice/6.0.7.3$Linux_X86_64 LibreOffice_project/00m0$Build-3</meta:generator></office:meta>
<office:settings>
<config:config-item-set config:name="ooo:view-settings">
<config:config-item config:name="ViewAreaTop" config:type="long">0</config:config-item>
<config:config-item config:name="ViewAreaLeft" config:type="long">0</config:config-item>
<config:config-item config:name="ViewAreaWidth" config:type="long">0</config:config-item>
<config:config-item config:name="ViewAreaHeight" config:type="long">0</config:config-item>
<config:config-item config:name="ShowRedlineChanges" config:type="boolean">true</config:config-item>
<config:config-item config:name="InBrowseMode" config:type="boolean">false</config:config-item>
</config:config-item-set>
<config:config-item-set config:name="ooo:configuration-settings">
<config:config-item config:name="PrintPaperFromSetup" config:type="boolean">false</config:config-item>
<config:config-item config:name="PrintFaxName" config:type="string"/>
<config:config-item config:name="PrintSingleJobs" config:type="boolean">false</config:config-item>
<config:config-item config:name="PrintProspectRTL" config:type="boolean">false</config:config-item>
<config:config-item config:name="PrintProspect" config:type="boolean">false</config:config-item>
<!-- ... and more -->
</config:config-item-set>
</office:settings>
<office:scripts>
<office:script script:language="ooo:Basic">
<ooo:libraries xmlns:ooo="http://openoffice.org/2004/office" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</office:script>
</office:scripts>
<!-- ... truncated styles -->
<office:body>
<office:text>
<text:sequence-decls>
<text:sequence-decl text:display-outline-level="0" text:name="Illustration"/>
<text:sequence-decl text:display-outline-level="0" text:name="Table"/>
<text:sequence-decl text:display-outline-level="0" text:name="Text"/>
<text:sequence-decl text:display-outline-level="0" text:name="Drawing"/>
</text:sequence-decls>
<text:p text:style-name="Standard"><text:text-input text:description="jooscript">${Parent.child}</text:text-input></text:p>
</office:text>
</office:body>
</office:document>

How in Powershell see all XML Levels

I have a test xml and I want to get the value from this line ATTRIBUTE NAME="News- offers_OPT_EMAIL">F
so I can check for the value F or T
if I do below I can get the title but how do I get the above line value.
[xml]$xml = Get-Content testFile.xml
$xml
$xml.CUSTOMERS.CUSTOMER.NAME.TITLE
sample XML code
<?xml version="1.0" encoding="UTF-8"?>
<CUSTOMERS xml:lang="en">
<CUSTOMER CREATED_DATE="2018-01-01 05:18:53.0" GROUP_ID="" ID="95656565">
<NAME>
<TITLE>M</TITLE>
<FIRST_NAME>Joe</FIRST_NAME>
<LAST_NAME>Smith</LAST_NAME>
</NAME>
<GENDER/>
<DATE_OF_BIRTH/>
<ADDRESS>
<ADDRESS_LINE_1>1 White House</ADDRESS_LINE_1>
<ADDRESS_LINE_2>Old Ave</ADDRESS_LINE_2>
<ADDRESS_LINE_3/>
<TOWNCITY>LONDON</TOWNCITY>
<COUNTY/>
<POSTCODE>18659</POSTCODE>
<COUNTRY>France</COUNTRY>
</ADDRESS>
<ADDRESS>
<ADDRESS_LINE_1>175 avenue de la division Leclerc</ADDRESS_LINE_1>
<ADDRESS_LINE_2/>
<ADDRESS_LINE_3/>
<TOWNCITY>Antony</TOWNCITY>
<COUNTY/>
<POSTCODE>92160</POSTCODE>
<COUNTRY>France</COUNTRY>
</ADDRESS>
<CONTACT_DETAILS>
<TELEPHONE MARKETING_OPT_IN="F" TYPE="MOBILE">0123456789</TELEPHONE>
<EMAIL MARKETING_OPT_IN="F">johnsmith#gmail.com</EMAIL>
</CONTACT_DETAILS>
<ATTRIBUTE NAME="News- offers_OPT_EMAIL">F</ATTRIBUTE>
<NOTE>NA</NOTE>
</CUSTOMER>

You could use SelectSingleNode or SelectNodes with an XPath expression. There are several options to achieve what you want, depending on your intention, but this would be one way to do it:
# finde the nodes
$nodes = $xml.SelectNodes("//*[local-name()='ATTRIBUTE'][#NAME='News- offers_OPT_EMAIL']")
# get value
$nodes.InnerText
Or if the value of the attribute doesn't matter, simply do:
$xml.customers.customer.attribute.'#text'

FOP 2.1 cannot find Tahoma font

Im using FOP 2.1 to generate pdf files. It mostly works fine, but there are a number of warnings im trying to get rid of. First thing is to get rid of the missing font warnings:
2019-01-24 15:23:30.052 WARN 8772 --- [https-jsse-nio-8087-exec-6] org.apache.fop.apps.FOUserAgent : Font "Tahoma,normal,700" not found. Substituting with "any,normal,700".
2019-01-24 15:23:30.098 WARN 8772 --- [https-jsse-nio-8087-exec-6] org.apache.fop.apps.FOUserAgent : Font "Tahoma,normal,400" not found. Substituting with "any,normal,400".
Fop config is in fop.xml file and is as such:
<?xml version="1.0"?>
<fop>
<renderers>
<renderer mime="application/pdf">
<font kening="yes" embed-url="tahoma.ttf" sub-font="Tahoma">
<font-triplet name="Tahoma" style="normal" weight="normal"/>
</font>
<font kening="yes" embed-url="tahomabd.ttf" sub-font="Tahoma">
<font-triplet name="Tahoma" style="normal" weight="bold"/>
</font>
</renderer>
</renderers>
</fop>
Fop config is read in as follows:
InputStream inStr = this.getClass().getResourceAsStream("/fop.xml");
FopFactory fopFactory = FopFactory.newInstance(new java.net.URI("."), inStr);
Am i missing something or why it is not finding the fonts?

Magento 2 Currency symbol is not showing

I have 2 stores English and Arabic. Default Store is Arabic and default currency is SAR. Currency symbol is showing fine on English Store but not showing anywhere on Arabic store. It only showing price like this 44 on product listing page and on single product page.

I fixed it.
Posting my answer may be it can help someone.
edit this file.
vendor\magento\zendframework1\library\Zend\Locale\Data\ar_SA.xml
and remove following code.
<numbers>
<currencyFormats numberSystem="latn">
<currencyFormatLength>
<currencyFormat type="standard">
<pattern>¤#0.00</pattern>
</currencyFormat>
</currencyFormatLength>
</currencyFormats>
</numbers>
UPDATED ANSWER ========
I got better solution instead editing core file you can do it with observer.
Vendor/Module/etc/events.xml
<?xml version="1.0"?>
<config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="urn:magento:framework:Event/etc/events.xsd">
<event name="currency_display_options_forming">
<observer name="change_currency_position" instance="Vendor\Module\Model\Observer\ChangeCurrencyPosition" />
</event>
</config>
and observer File.
use Magento\Framework\Event\ObserverInterface;
class ChangeCurrencyPosition implements ObserverInterface
{
public function execute(\Magento\Framework\Event\Observer $observer)
{
$currencyOptions = $observer->getEvent()->getCurrencyOptions();
$currencyOptions->setData('position', \Magento\Framework\Currency::RIGHT);
return $this;
}
}
Need to change the 'position' to RIGHT.

Its a bug in 2.4.3:
src\vendor\magento\module-directory\Model\Currency.php
Comment out these lines:
if ($this->canUseNumberFormatter($options)) {
return $this->formatCurrency($price, $options);
}
Then:
php -dmemory_limit=6G bin/magento setup:upgrade
php -dmemory_limit=6G bin/magento setup:di:compile
php -dmemory_limit=6G bin/magento setup:static-content:deploy -f
php -dmemory_limit=6G bin/magento cache:flush
Editing core is bad, we can create preference.

If #Ask4Tec's solution is not working
Try this
Copy the currency and numbers block from en.xml file and paste it into the ar_SA.xml file
Clean and flush cache
Check after hard refresh

Update ar_SA.xml like below:
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE ldml SYSTEM "../../common/dtd/ldml.dtd">
<!-- Copyright © 1991-2013 Unicode, Inc.
CLDR data files are interpreted according to the LDML specification (http://unicode.org/reports/tr35/)
For terms of use, see http://www.unicode.org/copyright.html
-->
<ldml>
<identity>
<version number="$Revision: 9287 $"/>
<generation date="$Date: 2013-08-28 21:32:04 -0500 (Wed, 28 Aug 2013) $"/>
<language type="ar"/>
<territory type="SA"/>
</identity>
<dates>
<calendars>
<calendar type="islamic">
<dateTimeFormats>
<availableFormats>
<dateFormatItem id="Md" draft="contributed">M/d</dateFormatItem>
<dateFormatItem id="MEd" draft="contributed">E, M/d</dateFormatItem>
<dateFormatItem id="MMMd" draft="contributed">MMM d</dateFormatItem>
<dateFormatItem id="MMMEd" draft="contributed">E, MMM d</dateFormatItem>
</availableFormats>
</dateTimeFormats>
</calendar>
</calendars>
</dates>
<numbers>
<symbols numberSystem="latn">
<decimal>.</decimal>
<group>,</group>
<list>;</list>
<percentSign>%</percentSign>
<plusSign>+</plusSign>
<minusSign>-</minusSign>
<exponential>E</exponential>
<superscriptingExponent>×</superscriptingExponent>
<perMille>‰</perMille>
<infinity>∞</infinity>
<nan>NaN</nan>
</symbols>
<decimalFormats numberSystem="latn">
<decimalFormatLength>
<decimalFormat>
<pattern>#,##0.###</pattern>
</decimalFormat>
</decimalFormatLength>
</decimalFormats>
<currencyFormats numberSystem="latn">
<currencyFormatLength>
<currencyFormat type="standard">
<pattern>¤#,##0.00</pattern>
</currencyFormat>
<currencyFormat type="accounting">
<pattern>¤#,##0.00;(¤#,##0.00)</pattern>
</currencyFormat>
</currencyFormatLength>
<unitPattern count="one">{0} {1}</unitPattern>
<unitPattern count="other">{0} {1}</unitPattern>
</currencyFormats>
<currencies>
<currency type="USD">
<displayName>US Dollar</displayName>
<displayName count="one">US dollar</displayName>
<displayName count="other">US dollars</displayName>
<symbol>$</symbol>
</currency>
</currencies>
</numbers>
</ldml>

How to Parse XML File using xmlparsing on iphone?

How do I access following XML file using xmlparsing on an iphone?
<?xml version="1.0" encoding="iso-8859-1"?>
<chapter>
<TITLE>Title &plainEntity;</TITLE>
<para>
<informaltable>
<tgroup cols="3">
<tbody>
<row><entry>a1</entry><entry morerows="1">b1</entry><entry>c1</entry></row>
<row><entry>a2<<?xml version="1.0" encoding="iso-8859-1"?>
<ADDRESS>
<CONTACT>
<NAME FIRST="Fred" LAST="Bloggs" NICK="Bloggsie" TITLE="Mr" />
<STREET HOME="5 Any Street" WORK="Floor 24, BigShot Tower" />
<CITY HOME="Little Town" WORK="Anycity" />
<COUNTY HOME="Anyshire" WORK="Anyshire" />
<POSTAL HOME="as plain text" WORK="text" />
<COUNTRY HOME="UK" WORK="UK" />
<PHONE HOME="as text" WORK="text" />
<FAX HOME="none" WORK="555" />
<MOBILE HOME="444" WORK="333" />
<WEB HOME="www.codehelp.co.uk" WORK="" />
<COMPANY>Full name of company here</COMPANY>
<GENDER>male</GENDER>
<BDAY>Add tags for year, month and day to make this more useful</BDAY>
<ANNI>some date long forgotten :-)</ANNI>
<SPOUSE>angry</SPOUSE>
<CHILDREN>Make sure this tag is one of the ones allowed to repeat in the DTD</CHILDREN>
<COMMENT>comments here</COMMENT>
<EMAILONE>Either use fixed tags like this, or change to a repeating tag</EMAILONE>
<EMAILTWO>second email line</EMAILTWO>
<EMAILTHREE>third</EMAILTHREE>
<EMAILFOUR>fourth</EMAILFOUR>
</CONTACT>
</ADDRESS>/entry><entry>c2</entry></row>
<row><entry>a3</entry><entry>b3</entry><entry>c3</entry></row>
</tbody>
</tgroup>
</informaltable>
</para>
&systemEntity;
<section id="about">
<title>About this Document</title>
<para>
<!-- this is a comment -->
</para>
</section>
</chapter>

The SDK provides two ways to parse XML: libxml2 and NSXMLParser. libxml2 is the fastest. To use it in your project, go to the build settings for your iPhone App project and set the following:
Other linker flags = -lxml2
Header Search Paths: $(SDKROOT)/usr/include/libxml2
Then download XPathQuery.m and XPathQuery.h from this page: Using libxml2 for XML parsing and XPath queries in Cocoa, which also provides a tutorial on how to use it. That XPathQuery class is a simplified way to parse XML. I recommend it unless you want to write the same code yourself, which doesn't seem the case.
With that on place, do
NSString *string = nil; // put your html document here
NSData *htmlData = [string dataUsingEncoding:NSUTF8StringEncoding];
NSString *xpath = #"//row"; // any standard XPath expression
NSArray *nodesArray = PerformHTMLXPathQuery(htmlData, xpath);
NSDictionary *dict;
if (0<[nodesArray count]) {
dict = [nodesArray objectAtIndex:0];
}
At this point the elements from your document should be inside the dict dictionary.

Here are releated the SO post,
Parser XML with NSXMLParser
http://www.raywenderlich.com/553/how-to-chose-the-best-xml-parser-for-your-iphone-project
NSXMLParser example
below is the blog tutorial for using NSXMLParser.
http://markstruzinski.com/?p=47

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Malformed UTF-8 character (fatal) error while parsing XML using XML::LibXML - perl

Related

Plain text to odt with dynamic elements

How in Powershell see all XML Levels

FOP 2.1 cannot find Tahoma font

Magento 2 Currency symbol is not showing

How to Parse XML File using xmlparsing on iphone?

Categories

Resources