Package RDFClosure :: Module DatatypeHandling

Module DatatypeHandling

Most of the XSD datatypes are handled directly by RDFLib. However, in some cases, that is not good enough. There are two major reasons for this:

  1. Some datatypes required by OWL 2 RL and/or RDFS are missing from RDFLib.
  2. In other cases, though the datatype is present, RDFLib is fairly lax in checking the lexical value of those datatypes. A typical case is boolean.

Some of these deficiencies are handled by this module. All the functions convert the lexical value into a Python datatype (or return the original string if this is not possible), which is then used, e.g., for comparisons (equalities). If the lexical value constraints are not met, exceptions are raised.


Requires: RDFLib 4.0.0 or higher

License: This software is available for use under the W3C Software License

Organization: World Wide Web Consortium

Author: Ivan Herman

Contact: Ivan Herman, ivan@w3.org

Classes
  _namelessTZ
(Nameless) timezone object.
Functions
 
_returnTimeZone(incoming_v)
Almost all time/date related methods need to extract optional time zone information.
 
_strToBool(v)
The built-in conversion to boolean is way too lax.
 
_strToDecimal(v)
The built-in datatype handling of RDFLib maps a decimal number to float, but Python 2.4 and later also provide a Decimal type.
 
_strToAnyURI(v)
Rudimentary test for the AnyURI value.
 
_strToBase64Binary(v)
Rudimentary test for the base64Binary value.
 
_strToBoundNumeral(v, interval, conversion)
Test (and convert) a generic numerical type, with a check against a lower and upper limit.
 
_strToDouble(v)
Test and convert a double value into a Decimal or float.
 
_strToFloat(v)
Test and convert a float value into Decimal or (python) float.
 
_strToHexBinary(v)
Test (and convert) hexadecimal values.
datetime.datetime
_strToDateTimeAndStamp(incoming_v, timezone_required=False)
Test (and convert) datetime and date timestamp values.
 
_strToTime(incoming_v)
Test (and convert) time values.
 
_strToDate(incoming_v)
Test (and convert) date values.
 
_strTogYearMonth(v)
Test gYearMonth value
 
_strTogYear(v)
Test gYear value
 
_strTogMonthDay(v)
Test gMonthDay value
 
_strTogDay(v)
Test gDay value
 
_strTogMonth(v)
Test gMonth value
 
_strToXMLLiteral(v)
Test (and convert) XML Literal values.
 
_strToVal_Regexp(v, regexp, flag=0, excludeStart=[])
Test (and convert) a generic string type, with a check against a regular expression.
 
_strToToken(v)
Test (and convert) a string to a token.
 
_strToPlainLiteral(v)
Test (and convert) a plain literal
 
use_Alt_lexical_conversions()
Register the datatype conversions with RDFLib, i.e., bind the dictionary values.
 
use_RDFLib_lexical_conversions()
Restore the original (i.e., RDFLib) set of lexical conversion routines.
Variables
  __author__ = 'Ivan Herman'
  __license__ = u'W3C® SOFTWARE NOTICE AND LICENSE, http://www.w...
  _hexc = ['A', 'B', 'C', 'D', 'E', 'F', 'a', 'b', 'c', 'd', 'e'...
  _numb = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '0']
  _limits_unsignedByte = [-1, 256]
  _limits_byte = [-129, 128]
  _limits_unsignedInt = [-1, 4294967296]
  _limits_int = [-2147483649, 2147483648]
  _limits_unsignedShort = [-1, 65536]
  _limits_short = [-32769, 32768]
  _limits_unsignedLong = [-1, 18446744073709551616]
  _limits_long = [-9223372036854775809, 9223372036854775808]
  _limits_positiveInteger = [0, None]
  _limits_nonPositiveInteger = [None, 1]
  _limits_nonNegativeInteger = [-1, None]
  _limits_negativeInteger = [None, 0]
  _re_language = '[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*'
  _re_NMTOKEN = '[\\w:_.\\-]+'
  _re_Name_ex = ['.', '-', '1', '2', '3', '4', '5', '6', '7', '8...
  _re_NCName = '[\\w_.\\-]+'
  _re_NCName_ex = ['.', '-', '1', '2', '3', '4', '5', '6', '7', ...
  _re_token = '[^\n\t\r]+'
  AltXSDToPYTHON = {rdflib.term.URIRef(u'http://www.w3.org/1999/...
  __package__ = 'RDFClosure'

Imports: ns_rdf, XSDToPython, Literal, _toPythonMapping, ns_xsd, datetime, time, re, Decimal


Function Details

_returnTimeZone(incoming_v)

Almost all time/date related methods need to extract optional time zone information.

Parameters:
  • incoming_v - the time/date string
Returns: a (v, timezone) tuple; 'v' is the input string with the timezone info cut off, 'timezone' is a _namelessTZ instance or None
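
The idea of separating a trailing time zone from a date/time string can be sketched as follows. This is only an illustration written for Python 3: the name split_timezone, the regular expression, and the use of datetime.timezone in place of the module's own _namelessTZ class are all assumptions, not the module's actual code.

```python
import re
from datetime import timedelta, timezone

# Hypothetical pattern: a trailing 'Z' or a signed hh:mm offset.
_TZ_SUFFIX = re.compile(r"(Z|[+-]\d{2}:\d{2})$")


def split_timezone(value):
    """Return (value_without_timezone, tzinfo or None)."""
    match = _TZ_SUFFIX.search(value)
    if match is None:
        # No time zone suffix: the string is returned unchanged.
        return value, None
    suffix = match.group(1)
    stripped = value[:match.start()]
    if suffix == "Z":
        return stripped, timezone.utc
    sign = -1 if suffix[0] == "-" else 1
    hours, minutes = int(suffix[1:3]), int(suffix[4:6])
    # XSD limits offsets to the range -14:00 .. +14:00.
    if hours > 14 or minutes > 59 or (hours == 14 and minutes != 0):
        raise ValueError("invalid time zone offset: %r" % suffix)
    return stripped, timezone(sign * timedelta(hours=hours, minutes=minutes))
```

A date such as "2010-01-01" passes through unchanged, because its trailing "-01" has no colon and so does not look like an offset.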

_strToBool(v)

The built-in conversion to boolean is way too lax. The XSD specification requires that only true, false, 1 or 0 should be used...

Parameters:
  • v - the literal string defined as boolean
Returns: the corresponding boolean value
Raises:
  • ValueError - invalid boolean values
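
A strict check of this kind might look like the sketch below. The name strict_xsd_boolean is hypothetical, and the comparison is case-sensitive on the assumption that only the four XSD lexical forms should pass; the module itself may treat case differently.

```python
def strict_xsd_boolean(lexical):
    """Accept only the four lexical forms allowed for xsd:boolean."""
    if lexical in ("true", "1"):
        return True
    if lexical in ("false", "0"):
        return False
    # Anything else is rejected instead of being coerced.
    raise ValueError("invalid xsd:boolean lexical value: %r" % lexical)
```

For comparison, Python's own bool("false") is True, since any non-empty string is truthy; that is the kind of laxness a strict parser avoids.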

_strToDecimal(v)

The built-in datatype handling of RDFLib maps a decimal number to float, but Python 2.4 and later also provide a Decimal type. It is better to use that to handle very large numbers. However, there is also a big difference between Python's Decimal and XSD's decimal: the latter does not allow an exponential form (why???). Such values must be filtered out.

Parameters:
  • v - the literal string defined as decimal
Returns: Decimal
Raises:
  • ValueError - invalid decimal value
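
One way to filter out the exponential form before handing the string to Decimal is sketched below for Python 3. The function name and the regular expression are assumptions made for illustration; the module may perform the check differently.

```python
import re
from decimal import Decimal

# Hypothetical pattern for the xsd:decimal lexical space:
# optional sign, digits with an optional fraction, and no exponent.
_XSD_DECIMAL = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")


def parse_xsd_decimal(lexical):
    """Convert to Decimal, rejecting forms such as '1E3'."""
    if _XSD_DECIMAL.fullmatch(lexical) is None:
        raise ValueError("invalid xsd:decimal lexical value: %r" % lexical)
    return Decimal(lexical)
```

Without the pattern check, Decimal("1E3") would be accepted by Python even though it is not a valid xsd:decimal.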

_strToAnyURI(v)

Rudimentary test for the anyURI value. If it is a relative URI, then some tests are done to filter out mistakes. I am not sure this is the full implementation of the RFC, though; it may have to be checked at some point later.

Parameters:
  • v - the literal string defined as a URI
Returns: the incoming value
Raises:
  • ValueError - invalid URI value

_strToBase64Binary(v)

Rudimentary test for the base64Binary value. The problem is that the built-in base64 module functions ignore the fact that only a certain family of characters is allowed to appear in the lexical value, so this is checked first.

Parameters:
  • v - the literal string defined as a base64-encoded string
Returns: the decoded (binary) content
Raises:
  • ValueError - invalid base 64 binary value
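
The "check the characters first, then decode" approach can be sketched as follows for Python 3. The function name and the character pattern are assumptions, and for simplicity the sketch does not accept the embedded whitespace that XSD permits.

```python
import base64
import re

# Hypothetical whitelist: the base64 alphabet plus '=' padding.
_B64_CHARS = re.compile(r"[A-Za-z0-9+/]*={0,2}")


def parse_base64_binary(lexical):
    """Validate the characters, then decode to bytes."""
    if _B64_CHARS.fullmatch(lexical) is None:
        raise ValueError("invalid base64Binary value: %r" % lexical)
    # b64decode raises binascii.Error (a ValueError subclass in Python 3)
    # when the padding is wrong.
    return base64.b64decode(lexical)
```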

_strToBoundNumeral(v, interval, conversion)

Test (and convert) a generic numerical type, with a check against a lower and upper limit.

Parameters:
  • v - the literal string to be converted
  • interval - lower and upper bounds (exclusive). If a bound is None, no comparison is done against it
  • conversion - conversion function, e.g., int, long, etc.
Raises:
  • ValueError - invalid value
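
The exclusive-bounds check can be illustrated with the sketch below. The function name parse_bounded is hypothetical; the interval values in the usage note are taken from the _limits_byte and _limits_positiveInteger variables listed on this page.

```python
def parse_bounded(lexical, interval, conversion=int):
    """Convert a string and check it against exclusive bounds."""
    value = conversion(lexical)  # int() raises ValueError on bad input
    lower, upper = interval
    # A bound of None means "no limit on this side".
    if lower is not None and value <= lower:
        raise ValueError("%r is at or below the lower bound %r" % (value, lower))
    if upper is not None and value >= upper:
        raise ValueError("%r is at or above the upper bound %r" % (value, upper))
    return value
```

With the interval [-129, 128] this accepts -128 through 127, which is the xsd:byte range; with [0, None] it accepts any integer greater than zero.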

_strToDouble(v)

Test and convert a double value into a Decimal or float. Raises an exception if the number is outside the permitted range, i.e., 1.0E+310 and 1.0E-330. To be on the safe side (Python has no separate double type), Decimals are used if possible. The upper and lower limits required by XSD are checked (and these fixed values are the reason why Decimal is used).

Parameters:
  • v - the literal string defined as a double
Returns: Decimal
Raises:
  • ValueError - invalid value
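
A range check based on Decimal might be sketched as below (Python 3). The limits are the ones quoted in the description; treating them as bounds on the magnitude of non-zero values, and rejecting INF and NaN outright, are assumptions of this sketch rather than documented behaviour.

```python
from decimal import Decimal, InvalidOperation

# Limits quoted in the description; applying them to the magnitude
# of the value is an assumption of this sketch.
_UPPER = Decimal("1.0E+310")
_LOWER = Decimal("1.0E-330")


def parse_double(lexical):
    """Convert to Decimal and reject magnitudes outside the limits."""
    try:
        value = Decimal(lexical)
    except InvalidOperation:
        raise ValueError("not a number: %r" % lexical) from None
    if not value.is_finite():
        raise ValueError("special values are not handled in this sketch")
    magnitude = abs(value)
    if magnitude > _UPPER or (magnitude != 0 and magnitude < _LOWER):
        raise ValueError("value out of range: %r" % lexical)
    return value
```

Decimal is convenient here because limits such as 1.0E+310 cannot be represented as a finite Python float.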

_strToFloat(v)

Test and convert a float value into Decimal or (Python) float. Raises an exception if the number is outside the permitted range, i.e., 1.0E+40 and 1.0E-50. (These fixed values are the reason why Decimal is used.)

Parameters:
  • v - the literal string defined as a float
Returns: Decimal if the local Python version is >= 2.4, float otherwise
Raises:
  • ValueError - invalid value

_strToHexBinary(v)

Test (and convert) hexadecimal values. The number of characters must be even.

Parameters:
  • v - the literal string defined as a hexadecimal number
Returns: long value
Raises:
  • ValueError - invalid value
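
The two checks (even length, hexadecimal digits only) followed by the conversion can be sketched as follows. The function name is hypothetical, and the sketch returns a Python 3 int where the original documents a long.

```python
_HEX_DIGITS = set("0123456789abcdefABCDEF")


def parse_hex_binary(lexical):
    """Check length and characters, then convert to an integer."""
    if len(lexical) % 2 != 0:
        raise ValueError("odd number of hex digits: %r" % lexical)
    if not all(ch in _HEX_DIGITS for ch in lexical):
        raise ValueError("non-hexadecimal character in %r" % lexical)
    # int() raises ValueError for the empty string.
    return int(lexical, 16)
```

The explicit character check matters because int("0xFF", 16) is accepted by Python, while "0xFF" is not a valid hexBinary lexical value.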

_strToDateTimeAndStamp(incoming_v, timezone_required=False)

Test (and convert) datetime and date timestamp values.

Parameters:
  • incoming_v - the literal string defined as the date and time
  • timezone_required - whether the timezone is required (i.e., for date timestamp) or not
Returns: datetime.datetime
Raises:
  • ValueError - invalid datetime or date timestamp

_strToTime(incoming_v)

Test (and convert) time values.

Parameters:
  • incoming_v - the literal string defined as time value
Returns: datetime.time
Raises:
  • ValueError - invalid time value

_strToDate(incoming_v)

Test (and convert) date values.

Parameters:
  • incoming_v - the literal string defined as date (in ISO format)
Returns: datetime.date
Raises:
  • ValueError - invalid date value

_strTogYearMonth(v)

Test gYearMonth value

Parameters:
  • v - the literal string
Returns: v
Raises:
  • ValueError - invalid value

_strTogYear(v)

Test gYear value

Parameters:
  • v - the literal string
Returns: v
Raises:
  • ValueError - invalid value

_strTogMonthDay(v)

Test gMonthDay value

Parameters:
  • v - the literal string
Returns: v
Raises:
  • ValueError - invalid value

_strTogDay(v)

Test gDay value

Parameters:
  • v - the literal string
Returns: v
Raises:
  • ValueError - invalid value

_strTogMonth(v)

Test gMonth value

Parameters:
  • v - the literal string
Returns: v
Raises:
  • ValueError - invalid value

_strToXMLLiteral(v)

Test (and convert) XML Literal values.

Parameters:
  • v - the literal string defined as an XML literal
Returns: the canonical version of the same XML text
Raises:
  • ValueError - incorrect xml string

_strToVal_Regexp(v, regexp, flag=0, excludeStart=[])

Test (and convert) a generic string type, with a check against a regular expression.

Parameters:
  • v - the literal string to be converted
  • regexp - the regular expression to check against
  • flag - flags to be used in the regular expression
  • excludeStart - array of characters disallowed in the first position
Returns: original string
Raises:
  • ValueError - invalid value
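
A regular-expression check with a list of forbidden first characters can be sketched as below (Python 3). The function and parameter names are hypothetical; requiring the pattern to match the whole string is an assumption about how the check is applied.

```python
import re


def check_against_regexp(value, regexp, flags=0, exclude_start=()):
    """Return value if it fully matches regexp and starts acceptably."""
    if re.fullmatch(regexp, value, flags) is None:
        raise ValueError("%r does not match %r" % (value, regexp))
    if value and value[0] in exclude_start:
        raise ValueError("%r starts with a disallowed character" % value)
    return value
```

Using the _re_NCName pattern and the _re_NCName_ex list shown in the variables section, "foo-bar" would pass while "-foo" (bad first character) and "foo:bar" (colon not in the pattern) would be rejected.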

_strToToken(v)

Test (and convert) a string to a token.

Parameters:
  • v - the literal string to be converted
Returns: original string
Raises:
  • ValueError - invalid value
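
A token check might be sketched as follows (Python 3). The pattern is the _re_token value listed among the module variables; the additional whitespace rules (no leading or trailing space, no run of two or more spaces) come from the XSD definition of token, and whether the module enforces them in this way is an assumption.

```python
import re

# Same pattern as the _re_token variable: no newline, tab or carriage return.
_RE_TOKEN = r"[^\n\t\r]+"


def check_token(value):
    """Return value if it satisfies the xsd:token constraints sketched here."""
    if re.fullmatch(_RE_TOKEN, value) is None:
        raise ValueError("invalid character in token: %r" % value)
    # XSD tokens have no leading/trailing spaces and no double spaces.
    if value != value.strip(" ") or "  " in value:
        raise ValueError("invalid whitespace in token: %r" % value)
    return value
```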

_strToPlainLiteral(v)

Test (and convert) a plain literal

Parameters:
  • v - the literal to be converted
Returns: a new RDFLib Literal with language tag
Raises:
  • ValueError - invalid value

use_Alt_lexical_conversions()

Register the datatype conversions with RDFLib, i.e., bind the dictionary values. The 'bind' method of RDFLib adds extra datatypes to those already registered in RDFLib, though the table used here (i.e., AltXSDToPYTHON) actually overrides all of the default conversion routines. The method also adds a Decimal entry to the PythonToXSD array of RDFLib.


Variables Details

__license__

Value:
u'W3C® SOFTWARE NOTICE AND LICENSE, http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231'

_hexc

Value:
['A', 'B', 'C', 'D', 'E', 'F', 'a', 'b', 'c', 'd', 'e', 'f']

_re_Name_ex

Value:
['.', '-', '1', '2', '3', '4', '5', '6', '7', '8', '9', '0']

_re_NCName_ex

Value:
['.', '-', '1', '2', '3', '4', '5', '6', '7', '8', '9', '0']

AltXSDToPYTHON

Value:
{ns_xsd ["language"]: lambda v: _strToVal_Regexp(v, _re_language),
 ns_xsd ["NMTOKEN"]: lambda v: _strToVal_Regexp(v, _re_NMTOKEN, re.U),
 ns_xsd ["Name"]: lambda v: _strToVal_Regexp(v, _re_NMTOKEN, re.U, _re_Name_ex),
 ns_xsd ["NCName"]: lambda v: _strToVal_Regexp(v, _re_NCName, re.U, _re_NCName_ex),
 ns_xsd ["token"]: _strToToken,
 ns_rdf ["PlainLiteral"]: _strToPlainLiteral,
 ns_xsd ["boolean"]: _strToBool,
 ns_xsd ["decimal"]: _strToDecimal,
 ns_xsd ["anyURI"]: _strToAnyURI,
 ns_xsd ["base64Binary"]: _strToBase64Binary,
 ns_xsd ["double"]: _strToDouble,
 ns_xsd
...