substring before function in XSLT 2.0 - substring

I need help with string manipulation.
Scenario:
<ADR_LINE>This is a sample text, to test adr-line.</ADR_LINE>
The requirement is to split ADR_LINE in two if its length is more than 20. When splitting, we need to make sure that the last character of the first part is a space, comma, or full stop.
In this example the total length is 40, and a plain split at 20 characters gives something like:
<ADR_LINE>This is a sample tex</ADR_LINE>
<ADR_LINE>t, to test adr-line.</ADR_LINE>
To avoid cutting the word 'text' in this example, I am trying to use the logic below:
<xsl:if test="not(substring($adrlnpart1, 20) = ' ' or '.' or ',' or '-')">
<xsl:variable name="adrln2trunc" select="substring-before($adrln2part1,' ')"/>
</xsl:if>
In the substring-before function, I am not sure whether we can specify multiple characters (space, comma, or full stop) to look for at the end of the string. In this example checking for a space may work, but other scenarios will require a comma or full stop.
Please suggest ideas for handling this. Thanks in advance.

Here is an example using xsl:analyze-string:
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output indent="yes"/>
    <!-- Identity template: copy everything that is not handled below -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <!-- Split long lines: at least 19 characters, then on to the next separator or the end -->
    <xsl:template match="ADR_LINE[string-length() > 20]">
        <xsl:analyze-string select="." regex=".{{19,}}?([., ]|$)">
            <xsl:matching-substring>
                <ADR_LINE>
                    <xsl:value-of select="."/>
                </ADR_LINE>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <ADR_LINE>
                    <xsl:value-of select="."/>
                </ADR_LINE>
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:template>
</xsl:transform>
Splits
<ADR_LINE>This is a sample text, to test adr-line.</ADR_LINE>
into
<ADR_LINE>This is a sample text,</ADR_LINE>
<ADR_LINE> to test adr-line.</ADR_LINE>
Online sample at http://xsltransform.net/pNmBy2a.
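To see how the regular expression in the stylesheet behaves outside XSLT, here is a minimal Python sketch of the same idea. It is not part of the original answer: split_address is a hypothetical helper, and it assumes Python's re module treats this particular pattern the same way the XSLT 2.0 regex engine does. The doubled braces in the stylesheet are only attribute-value-template escaping, so the pattern here uses single braces.

```python
import re

# At least 19 characters (lazily), then the first separator or end of string.
PATTERN = re.compile(r".{19,}?(?:[., ]|$)")

def split_address(text):
    """Return matching and non-matching substrings in document order,
    roughly what xsl:analyze-string hands to its two child instructions."""
    parts = []
    pos = 0
    for match in PATTERN.finditer(text):
        if match.start() > pos:
            parts.append(text[pos:match.start()])  # non-matching substring
        parts.append(match.group())                # matching substring
        pos = match.end()
    if pos < len(text):
        parts.append(text[pos:])                   # trailing non-matching part
    return parts

print(split_address("This is a sample text, to test adr-line."))
# ['This is a sample text,', ' to test adr-line.']
```

Note that the first piece comes out 22 characters long: the pattern takes at least 19 characters and then runs on to the next separator, so it breaks after the word rather than before it.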

How to remove text between a string and a space using SED

I have a file with repeating line in it like this;
<stack-block name="B" sub-type="SBL" type="ABM_BLOCK" level="2" parent-name="PBTYRD" geo-anchor-latitude="-34.96723069348281" geo-anchor-longitude="150.2157080161554" geo-anchor-orientation="72.35290364141252" z-index-min="1" />
<stack-block name="C" sub-type="SBL" type="ABM_BLOCK" level="2" parent-name="PBTYRD" geo-anchor-latitude="-34.967529872288864" geo-anchor-longitude="150.2145108805486" geo-anchor-orientation="72.35290364141252" z-index-min="1" />
...and so on...
I want to remove the geo-anchor-latitude="-34.96723069348281" section from the lines of a file including the geo-anchor-latitude phrase up to the second double quote.
I have tried sed -i 's/geo-anchor-latitude.*"//' filename with no luck as it strips everything from geo-anchor-latitude to the end of the line.
Any clues out there? Thanks.
Try the following:
sed -i 's/geo-anchor-latitude="[^"]*"//' filename
Output:
<stack-block name="B" sub-type="SBL" type="ABM_BLOCK" level="2" parent-name="PBTYRD" geo-anchor-longitude="150.2157080161554" geo-anchor-orientation="72.35290364141252" z-index-min="1" />
<stack-block name="C" sub-type="SBL" type="ABM_BLOCK" level="2" parent-name="PBTYRD" geo-anchor-longitude="150.2145108805486" geo-anchor-orientation="72.35290364141252" z-index-min="1" />
The regex geo-anchor-latitude="[^"]*" matches a substring made up of:
A literal string geo-anchor-latitude="
Followed by a sequence of any characters except for "
Followed by a double quote "
Then the matched substring above is removed by the s command.
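The difference between the greedy pattern from the question and the bounded one can be checked quickly in Python. This is only an illustrative sketch: the sample line is a shortened version of the one in the question, and Python's re.sub stands in for sed's s command.

```python
import re

# Shortened sample line (some attributes from the question are left out).
line = ('<stack-block name="B" level="2" '
        'geo-anchor-latitude="-34.96723069348281" '
        'geo-anchor-longitude="150.2157080161554" z-index-min="1" />')

# Greedy: .* runs on to the LAST double quote on the line.
greedy = re.sub(r'geo-anchor-latitude.*"', '', line)

# Bounded: [^"]* cannot cross a quote, so the match ends at the
# closing quote of the latitude value.
bounded = re.sub(r'geo-anchor-latitude="[^"]*"', '', line)

print(greedy)   # longitude and z-index-min are gone as well
print(bounded)  # only the latitude attribute is gone
```

Both patterns leave the space that preceded the attribute in place, so the result has two consecutive spaces at that spot.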
You can use extended regular expressions (-E) with sed to do this.
sed -Ei 's/geo-anchor-latitude="[-0-9]+[.][0-9]+"//' filename
This regex looks for the latitude attribute, followed by a decimal number with any number of digits.

Why this line stops Sphinx search?

I am using the sanitizing code from Barryhunter's example.
But when I use the line:
$q = preg_replace('/[^\w~\|\(\)\^\$\?"\/=-]+/',' ',trim(strtolower($q)));
then Russian search doesn't work! Only English does.
What is the reason? How should I sanitize the query?
This is my piece:
<HTML>
<BODY>
<form action="" method="get">
<input name="q" size="40" value="<?php echo @$_GET['q']; ?>" />
<input type="submit" value="Search" />
</form>
<?php
require ( 'sphinxapi.php' );
$sphinx = new SphinxClient;
$sphinx->SetServer('ununtu', 9312);
$sphinx->open();
$sphinx->SetMatchMode (SPH_MATCH_EXTENDED);
$sphinx->setFieldWeights(array(
'title' => 10,
'content' => 5
));
$sphinx->SetRankingMode(SPH_RANK_WORDCOUNT);
$sphinx->SetSortMode(SPH_SORT_RELEVANCE);
$sphinx->setLimits(0, 10, 200);
$sphinx->resetFilters();
$q = isset($_GET['q'])?$_GET['q']:'';
$q = preg_replace('/ OR /',' | ',$q);
// $q = preg_replace('/[^\w~\|\(\)\^\$\?"\/=-]+/',' ',trim(strtolower($q)));
if(isset($_GET['q']) and strlen($_GET['q']) > 1)
{
$result = $sphinx->query($sphinx->escapeString($q), '*');
...
Assuming your input string is UTF-8 encoded, your preg_replace call is running in non-Unicode mode. Add the 'u' modifier at the end of the pattern, e.g.:
$q = preg_replace('/[^\w~\|\(\)\^\$\?"\/=-]+/u',' ',trim(strtolower($q)));
Specifically, that regex strips anything that is not a 'word' character or one of a predefined list of syntax/punctuation characters.
The PCRE definition of a word character (the \w) is:
A "word" character is any letter or digit or the underscore character,
that is, any character which can be part of a Perl "word". The
definition of letters and digits is controlled by PCRE's character
tables, and may vary if locale-specific matching is taking place. For
example, in the "fr" (French) locale, some character codes greater
than 128 are used for accented letters, and these are matched by \w.
http://php.net/manual/en/regexp.reference.escape.php
So in an English locale (or another Western European one, for example), many Russian characters are probably not considered word characters and are stripped.
(If your pages are in UTF-8, you may also need the /u modifier, as mentioned in the other answer.)
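The effect of a non-Unicode \w can be reproduced in Python, which makes the problem easy to see. This is an analogy rather than PHP behaviour: Python's re.ASCII flag is used here as a stand-in for PCRE without the /u modifier, and sanitize is a hypothetical helper, not part of the Sphinx API.

```python
import re

# Same character class as the PHP pattern: keep word characters and a
# fixed list of query-syntax characters, replace every other run with a space.
NOT_ALLOWED = r'[^\w~|()^$?"/=-]+'

ascii_only = re.compile(NOT_ALLOWED, re.ASCII)  # \w is [a-zA-Z0-9_] only
unicode_aware = re.compile(NOT_ALLOWED)         # \w also matches Cyrillic letters

def sanitize(pattern, query):
    """Trim, lowercase, then replace disallowed runs with a single space."""
    return pattern.sub(' ', query.strip().lower())

query = 'Привет мир | hello'
print(repr(sanitize(ascii_only, query)))     # ' | hello'
print(repr(sanitize(unicode_aware, query)))  # 'привет мир | hello'
```

With the ASCII-only definition the Russian words are removed before the query ever reaches the search engine, which matches the symptom described in the question.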

Yang model for remainder operation (%)

I want to create a YANG model for an integer range, e.g. from 1000 to the maximum, where values must be entered in steps of 500. Is there any way to use a remainder (modulus) % operator in YANG, or a range function with steps like Python's?
Or do I just need to use a pattern with some regex?
Use a must constraint to further constrain an integer type value that is already constrained with a range.
module modulus {
  yang-version 1.1;
  namespace "org:so:modulus";
  prefix "som";

  leaf value {
    type int32 {
      range "1000..max";
    }
    must ". mod 500 = 0" {
      error-message "values must be entered in steps of 500";
    }
  }
}
The XPath specification provides the mod operator. For example, this instance document:
<?xml version="1.0" encoding="utf-8"?>
<data xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<som:value xmlns:som="org:so:modulus">1501</som:value>
</data>
Results in:
Error at (3:3): failed assert at "/nc:data/som:value": values must be entered in steps of 500
While
<?xml version="1.0" encoding="utf-8"?>
<data xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<som:value xmlns:som="org:so:modulus">2000</som:value>
</data>
is okay.
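Since the question mentions Python's stepped range, here is a small sketch of what the range plus must combination accepts. It only mirrors the logic of the model above; is_valid is a hypothetical name, and INT32_MAX is assumed to be what "max" resolves to for int32.

```python
INT32_MAX = 2**31 - 1  # assumed upper bound of YANG's int32, i.e. "max"

def is_valid(value: int) -> bool:
    """Mirror the two constraints on the leaf."""
    in_range = 1000 <= value <= INT32_MAX  # range "1000..max"
    on_step = value % 500 == 0             # must ". mod 500 = 0"
    return in_range and on_step

for candidate in (1501, 2000, 500):
    print(candidate, is_valid(candidate))
# 1501 False
# 2000 True
# 500 False
```

For integers this is the same set as range(1000, INT32_MAX + 1, 500), because the lower bound 1000 is itself a multiple of 500.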

Iterate all XML nodes and their children

What would be the most efficient way in VBScript to iterate through an XML file?
I am looking for a way to iterate all nodes in the XML file. I cannot use XQL queries, because I really do need to iterate all nodes to check all attributes in the file.
PS: Basically I am writing a script to replace references to file paths. The problem is that these file paths can be in a big number of places. (But that's for me to find out). I only need help with the XML iterating part.
While I suspect that putting some intelligence and XPath expressions into the search would increase efficiency, this
Option Explicit
Dim oXDoc : Set oXDoc = CreateObject( "Msxml2.DOMDocument" )
oXDoc.async = False
oXDoc.load "..\data\31677574.xml"
If 0 = oXDoc.ParseError Then
   WScript.Echo oXDoc.documentElement.xml
   walk oXDoc.documentElement, 0
Else
   WScript.Echo oXDoc.ParseError.Reason
End If
' Recursively print each element and its attributes, indented by depth
Sub walk(e, i)
   ' Skip text, comment and other non-element nodes: they have no tagName or attributes
   If e.nodeType <> 1 Then Exit Sub
   WScript.Echo Space(i), e.tagName
   Dim a
   For Each a In e.Attributes
      WScript.Echo Space(i + 1), a.name, a.value
   Next
   Dim c
   For Each c In e.childNodes
      walk c, i + 2
   Next
End Sub
output:
cscript 31694559.vbs
<Configuration>
<Add SourcePath="\\sample" ApplicationEdition="32">
<Product ID="SampleProductID">
<Language ID="en-us"/>
<Language ID="en-us"/>
</Product>
</Add>
</Configuration>
Configuration
Add
SourcePath \\sample
ApplicationEdition 32
Product
ID SampleProductID
Language
ID en-us
Language
ID en-us
will visit all elements and their attributes.
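For comparison, the same depth-first walk can be sketched in Python with the standard library's xml.etree.ElementTree. This is not VBScript and not part of the answer above. It is only meant to show the shape of the recursion, using the sample document from the output as inline test data.

```python
import xml.etree.ElementTree as ET

# Sample document taken from the output above (raw string keeps the backslashes).
XML = r"""<Configuration>
  <Add SourcePath="\\sample" ApplicationEdition="32">
    <Product ID="SampleProductID">
      <Language ID="en-us"/>
      <Language ID="en-us"/>
    </Product>
  </Add>
</Configuration>"""

def walk(element, depth=0):
    """Yield (depth, tag, attributes) for an element and all its descendants."""
    yield depth, element.tag, dict(element.attrib)
    for child in element:  # iterating an Element yields its child elements
        yield from walk(child, depth + 1)

root = ET.fromstring(XML)
for depth, tag, attributes in walk(root):
    print("  " * depth + tag, attributes)
```

Every attribute of every element passes through the loop, which is the place where a path replacement would be applied.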

sed/awk Capitalize everything between patterns and lowercase small words

I found a way to capitalize the whole document with both sed and awk, but how do I convert only the text inside certain patterns from ALL CAPS to Capitalized?
For example, I have an HTML file, and everything (multiple occurrences) between <b> and </b> has to be converted from TITLE to Title, if possible with small words (1 or 2 letters) in lowercase.
From This:
<div id="1">
<div class="p"><b>THIS IS A RANDOM TITLE</b></div>
<table class="hugetable">
...
</table>
<div class="p"><b>THIS IS ANOTHER RANDOM TITLE</b></div>
<table class="hugetable">
...
</table>
...
</div>
To this:
<div id="1">
<div class="p"><b>This is a Random Title</b></div>
<table class="hugetable">
...
</table>
<div class="p"><b>This is Another Random Title</b></div>
<table class="hugetable">
...
</table>
...
</div>
This is not the most beautiful solution, but I think it works (note that -r, \L and \u are GNU sed extensions):
sed -r -e '/<b>/ {s/( .)([^ ]*)/\1\L\2/g}' -e 's/<b>(.)/<b>\u\1/' -e '/<b>/ {s/(\b.{1,2}\b)/\L\1/g}' data
Explanation:
1st expression (-e): If a line contains <b>:
Then for each word which has a space in front of it, keep the space and the first (already capitalized) character (\1) and then convert all the following characters of the word to lower case (\L\2)
2nd expression (-e): The first word after <b> is still uncapitalized, so select the first character after the bold tag <b>(.) and replace it uppercased <b>\u\1
3rd expression (-e): Again if a line contains <b>:
Then select words of 1 or 2 characters in length \b.{1,2}\b and replace them lowercased \L\1
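The three steps above can also be written out in Python, which may be easier to follow than the sed one-liner. This is a sketch under assumptions, not a drop-in replacement: title_case and fix_bold_titles are hypothetical names, words are assumed to be separated by single spaces, and each <b>...</b> pair is assumed to sit on one line as in the sample.

```python
import re

def title_case(text):
    """Capitalize each word, but keep 1-2 letter words lowercase
    unless they are the first word of the title."""
    words = text.lower().split(" ")
    result = []
    for index, word in enumerate(words):
        if index == 0 or len(word) > 2:
            result.append(word.capitalize())
        else:
            result.append(word)
    return " ".join(result)

def fix_bold_titles(html):
    """Apply title_case to the text of every <b>...</b> pair."""
    return re.sub(r"<b>(.*?)</b>",
                  lambda m: "<b>" + title_case(m.group(1)) + "</b>",
                  html)

print(fix_bold_titles('<div class="p"><b>THIS IS A RANDOM TITLE</b></div>'))
# <div class="p"><b>This is a Random Title</b></div>
```

Unlike the sed version, this only touches the text between the tags, so attribute values elsewhere on the line are left alone.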