Advanced XML regular expression features [Part 1]

Regular expressions are patterns, written with special characters and character sequences, that match string values in text data. XPath 1.0 has no regular expression support; XPath 2.0 adds it. This lets you search for particular string patterns and return the nodes that match them.

XPath 2.0 functions

In XPath 2.0, three new functions provide support for regular expressions:

  • tokenize()
  • matches()
  • replace()

tokenize()

You use the tokenize() function to split a string into substrings. For example, you can use this function to split a sentence into separate words. The syntax for the tokenize() function is

fn:tokenize($input   as xs:string?,
            $pattern as xs:string,
            $flags   as xs:string) as xs:string*

The fn prefix identifies the function as belonging to the XPath 2.0 function namespace. The final Recommendation uses http://www.w3.org/2005/xpath-functions; the older URI http://www.w3.org/2003/11/xpath-functions comes from a working draft. The $input parameter is the string you want to split, and $pattern is the regular expression that matches the separators between the substrings. $flags is an optional parameter holding one or more predefined option letters that determine how the function operates, such as i for case-insensitive matching. The function returns a sequence of zero or more strings.
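The splitting behavior described above can be tried out without an XSLT 2.0 processor. The following is a minimal Python sketch that approximates tokenize() with the standard re module. The helper name and the flag mapping are assumptions of this sketch, and Python's regular expression dialect is close to, but not identical to, the one XPath uses.

```python
import re

# Subset of XPath flag letters mapped to Python re flags (an assumption of this
# sketch; the XPath "x" flag is left out because Python's verbose mode differs).
_FLAGS = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL}


def tokenize(text, pattern, flags=""):
    """Approximate fn:tokenize(): split text wherever pattern matches a separator."""
    re_flags = 0
    for letter in flags:
        re_flags |= _FLAGS[letter]
    # XPath raises an error when the pattern can match a zero-length string.
    if re.search(pattern, "", re_flags):
        raise ValueError("pattern must not match a zero-length string")
    # An empty or missing input yields an empty sequence.
    if not text:
        return []
    tokens, start = [], 0
    for match in re.finditer(pattern, text, re_flags):
        tokens.append(text[start:match.start()])
        start = match.end()
    tokens.append(text[start:])
    return tokens


print(tokenize("The quick  brown fox", r"\s+"))  # ['The', 'quick', 'brown', 'fox']
print(tokenize("red,green,,blue", ","))          # ['red', 'green', '', 'blue']
print(tokenize("oneXtwoxthree", "x", "i"))       # ['one', 'two', 'three']
```

Note that two adjacent separators produce a zero-length token, which is also how tokenize() behaves in XPath 2.0.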

matches()

You use the matches() function to test whether a string matches a regular expression. It returns a Boolean value: true if the pattern matches any part of the input string, and false otherwise. To require that the whole string match, anchor the pattern with ^ and $. The syntax for the matches() function is

fn:matches($input   as xs:string?,
           $pattern as xs:string,
           $flags   as xs:string) as xs:boolean

The following code example uses the matches() function in an XSL template to identify people with Boston area codes in a list containing customer details. It does this by using a regular expression to test whether a customer’s phone number has an area code of 617. It then outputs those customers’ names in a results table.

<xsl:template match="customer">
  <table>
    <xsl:if test="matches(address/phoneNumber, '617-[0-9]{3}-[0-9]{4}')">
      <tr>
        <td>
          <xsl:value-of select="name/firstName"/>
        </td>
        <td>
          <xsl:value-of select="name/lastName"/>
        </td>
      </tr>
    </xsl:if>
  </table>
</xsl:template>
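Because matches() succeeds when the pattern matches any part of the string, the phone number test can also accept numbers that merely contain a 617 sequence. The Python sketch below approximates this behavior with re.search and compares the pattern with an anchored variant. The helper name and the sample customers are hypothetical and are not part of the original example.

```python
import re


def matches(text, pattern):
    """Approximate fn:matches(): true if the pattern matches any part of text."""
    # An empty-sequence input is treated like a zero-length string.
    return re.search(pattern, text or "") is not None


# Hypothetical sample data, not taken from the article.
phone_numbers = {
    "Ann": "617-555-0142",
    "Bob": "212-555-0199",
    "Cy": "1-617-555-0107",
}

unanchored = "617-[0-9]{3}-[0-9]{4}"
anchored = "^617-[0-9]{3}-[0-9]{4}$"

for name, number in phone_numbers.items():
    print(name, matches(number, unanchored), matches(number, anchored))
# Ann True True
# Bob False False
# Cy True False
```

Whether the anchored form is preferable depends on how phone numbers are stored in the source data, for example with or without a country prefix.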

replace()

The replace() function enables you to search a string for substrings that match a regular expression and replace them. The syntax for this function is

fn:replace($input       as xs:string?,
           $pattern     as xs:string,
           $replacement as xs:string,
           $flags       as xs:string) as xs:string

The function takes an input string, $input, searches it for substrings that match $pattern, and replaces each one with the string you specify in the $replacement parameter. As with the tokenize() and matches() functions, $flags is an optional parameter that determines how the function operates.

For example, say you want to ensure that US addresses in customer lists are always displayed as “U.S.A.” rather than “USA” or “US”. The following XSL code searches for the alternative forms and replaces them with the “U.S.A.” form.

<xsl:for-each select="address/country">
  <!-- ^ and $ anchor the pattern so that only whole values such as "US" or "usa" are rewritten -->
  <xsl:value-of select="replace(., '^[Uu]\.?[Ss]\.?[Aa]?\.?$', 'U.S.A.')"/>
</xsl:for-each>
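A pattern that ends with $ but does not start with ^ matches any value that merely ends in “us” or “usa”, so anchoring both ends matters for this kind of replacement. The following Python sketch approximates replace() with re.sub to show the difference. The helper name and the country values are hypothetical, and the sketch only covers replacement strings without $ or backslash characters, which XPath and Python treat differently.

```python
import re


def replace(text, pattern, replacement):
    """Approximate fn:replace() for plain-text replacement strings."""
    return re.sub(pattern, replacement, text or "")


unanchored = r"[Uu]\.?[Ss]\.?[Aa]?\.?$"
anchored = "^" + unanchored

# Hypothetical country values, chosen to show intended and unintended matches.
for country in ["USA", "u.s.", "US", "Belarus"]:
    print(country, "->",
          replace(country, unanchored, "U.S.A."), "|",
          replace(country, anchored, "U.S.A."))
# USA -> U.S.A. | U.S.A.
# u.s. -> U.S.A. | U.S.A.
# US -> U.S.A. | U.S.A.
# Belarus -> BelarU.S.A. | Belarus
```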
