Getting started with SAOPY and Semantic Sensory Data

Working Draft

This is a work in progress and as such is subject to change. Comments are very welcome, please send them to Dr. Sefki Kolozali.

  1. How to use SAOPY
  2. SPARQL Query Examples
  3. Validation of Annotated Data Samples
  4. (Simplified) KAT python library with examples

Appendices

  1. References

1. How to use saopy

saopy depends on:

  1. rdflib (http://rdflib.org/) v2.4.0 (easy_install)

saopy can be downloaded from the following link: SAOPY (v1.1.9). Since it is an ongoing work, please make sure that you are using the latest version. To install the SAOPY library, run the following command in the directory where you downloaded the SAOPY Python wheel.
$ pip install ./saopy-1.1.9-py2.py3-none-any.whl

then start python using
$ python

Now let's open the python interpreter and give it a go...

[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.56)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

Start by importing the saopy package:
>>> import saopy

To view all "saopy" classes:
>>> dir(saopy)
['DUL', 'PropertySet', 'RDFInterface', 'SaoInfo', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '__path__', 'ces', 'ct', 'exportRDFFile', 'exportRDFGraph', 'foaf', 'geo', 'geo1', 'importRDFFile', 'importRDFGraph', 'model', 'muo', 'owl', 'owlsg', 'owlss', 'owlssc', 'owlssp', 'owlssrp', 'prov', 'qoi', 'rdfs', 'sao', 'ssn', 'tl', 'tm', 'tzont']

To view all classes of the sao ontology from the saopy library:
>>> dir(saopy.sao)
['DiscreteCosineTransform', 'DiscreteFourierTransform', 'DiscreteWaveletTransform', 'KMeans', 'Mean', 'Median', 'MovingAverage', 'PiecewiseAggregateApproximation', 'Point', 'Segment', 'SensorSAX', 'StreamAnalysis', 'StreamData', 'StreamEvent', 'SymbolicAggregateApproximation', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '__path__', 'saopy']

Creating a sensor object:
>>> sensor124 = saopy.ssn.Sensor("http://example.org/x")

Why should we use saopy to annotate IoT data? saopy helps the user avoid various common problems, including the use of undefined properties and classes, poorly formed namespaces, problematic prefixes, invalid literal syntax and other issues covered by optional heuristic checks. For example:
>>> sensor124 = saopy.ssn.Senzor("http://example.org/x")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'Senzor'
>>> cityofaarhus = saopy.foaf.Organisation("http://example.org/cityofaarhus")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'Organisation'
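
If you are unsure of a class name, the same dir() call shown above can be used on the relevant sub-module to list what is actually available, for example:
>>> dir(saopy.foaf)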

To combine multiple sensory objects for serialisation, you can use the "SaoInfo" class. The following example shows how to create and export a sensor observation object with a value:
>>> trafficData124 = saopy.sao.StreamData("http://example.org/data1")
>>> trafficData124.value = "1234"
>>> saoOut = saopy.SaoInfo()
>>> saoOut.add(trafficData124)
>>> saopy.RDFInterface.exportRDFFile(saoOut, "example1.rdf", "n3")

Now that we know how to describe a simple observation, let's annotate a more detailed sensory observation. First, create provenance information for the sensory data:
>>> cityofaarhus = saopy.foaf.Organization("http://example.org/cityofaarhus")
>>> cityofaarhus = saopy.prov.Agent("http://example.org/cityofaarhus")

>>> trafficsensor158324 = saopy.ssn.Sensor("http://example.org/data158324")
>>> trafficsensor158324.actedOnBehalfOf = cityofaarhus

Creating properties of the sensory data:
>>> measuredTime = saopy.ssn.Property("http://unis/ics/property003")
>>> measuredTime.description = "Measured Time"
>>> estimatedTime = saopy.ssn.Property("http://unis/ics/property004")
>>> estimatedTime.description = "Estimated Time"
>>> avgSpeed = saopy.ssn.Property("http://unis/ics/property001")
>>> avgSpeed.description = "Average Speed"
>>> vcCount = saopy.ssn.Property("http://unis/ics/property002")
>>> vcCount.description = "Vehicle Count"
>>> trafficsensor158324.observes.add(vcCount)
>>> trafficsensor158324.observes.add(avgSpeed)
>>> trafficsensor158324.observes.add(estimatedTime)
>>> trafficsensor158324.observes.add(measuredTime)

SAOPY allows you to describe time instants and intervals. A time instant should be used with tl:at to describe the point in time at which the data was collected, whereas a time interval should specify both the beginning of the interval and its duration. Here we provide an example of each.
>>> universaltimeline = saopy.tl.PhysicalTimeLine("http://purl.org/NET/c4dm/timeline.owl#universaltimeline")
>>> instant = saopy.tl.Instant("http://unis/ics/timeinstant")
>>> interval = saopy.tl.Interval("http://unis/ics/timeinterval")
>>> instant.at = "2014-09-30T06:00:00"
>>> instant.onTimeLine = universaltimeline
>>> interval.beginsAtDateTime = "2014-09-30T06:00:00"
>>> interval.durationXSD = "PT5M"

The SAO ontology subsumes the measurement unit descriptions from the Measurement Unit Ontology (MUO). It therefore lets you describe the measurement unit of an observation as follows:
>>> unitseconds = saopy.muo.UnitOfMeasurement("http://unis/ics/unit1:seconds")
>>> unitkilometer = saopy.muo.UnitOfMeasurement("http://unis/ics/unit2:km-per-hour")

Now we can annotate sensor observations for two sensor properties, namely average speed and measured time:
>>> trafficData001 = saopy.sao.StreamData("http://unis/ics/trafficdataavgspeed001")
>>> trafficData001.value = "60"
>>> trafficData001.hasUnitOfMeasurement=unitkilometer
>>> trafficData001.observedProperty = avgSpeed
>>> trafficData001.observedBy = trafficsensor158324
>>> trafficData001.time = instant

>>> trafficData003 = saopy.sao.StreamData("http://unis/ics/trafficdataMeasuredTime001")
>>> trafficData003.value = "30"
>>> trafficData003.hasUnitOfMeasurement=unitseconds
>>> trafficData003.observedProperty = measuredTime
>>> trafficData003.observedBy = trafficsensor158324
>>> trafficData003.time = interval

Exporting the sensor data in N3 format:
>>> saoOut.add(trafficData001)
>>> saoOut.add(trafficData003)
>>> saoOut.add(cityofaarhus)
>>> saoOut.add(trafficsensor158324)
>>> saoOut.add(estimatedTime)
>>> saoOut.add(measuredTime)
>>> saoOut.add(avgSpeed)
>>> saoOut.add(vcCount)
>>> saoOut.add(unitkilometer)
>>> saoOut.add(unitseconds)
>>> saoOut.add(universaltimeline)
>>> saoOut.add(instant)
>>> saoOut.add(interval)
>>> saopy.RDFInterface.exportRDFFile(saoOut, "example2.rdf", "n3")
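
As a quick sanity check (this uses plain rdflib rather than saopy itself), the exported file can be loaded back and its triples counted:
>>> import rdflib
>>> g = rdflib.ConjunctiveGraph()
>>> g.parse("example2.rdf", format="n3")
>>> len(g)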

saopy also allows you to annotate quality values, such as correctness, frequency, age and completeness. The following example illustrates how to annotate quality features for a data segment of the traffic data stream. The correctness value has been obtained by an algorithm that examines the difference between two successively submitted traffic streams.
>>> segmentSample1 = saopy.sao.Segment("segment-sample-1")
>>> age = saopy.qoi.Age("age-sample-1")
>>> age.hasAge = "10"
>>> completeness = saopy.qoi.Completeness("completeness-sample-1")
>>> completeness.hasCompleteness = "1"
>>> correctness = saopy.qoi.Correctness("correctness-sample-1")
>>> correctness.hasCorrectness = "1"
>>> frequency = saopy.qoi.Frequency("frequency-sample-1")
>>> frequency.hasFrequency = "10"
>>> segmentSample1.hasQuality.add(frequency)
>>> segmentSample1.hasQuality.add(correctness)
>>> segmentSample1.hasQuality.add(completeness)
>>> segmentSample1.hasQuality.add(age)
>>> owner = saopy.prov.Person("MisterX")
>>> segmentSample1.hasProvenance = owner
>>> saoOut = saopy.SaoInfo()
>>> saoOut.add(segmentSample1)
>>> saoOut.add(age)
>>> saoOut.add(completeness)
>>> saoOut.add(correctness)
>>> saoOut.add(frequency)
>>> saoOut.add(owner)
>>> saopy.RDFInterface.exportRDFFile(saoOut, "example3.rdf", "n3")

2. SPARQL query examples

Once inside python or ipython, please download the servicerepository document so that it can be used in the query example.
>>> import rdflib
>>> from pprint import pprint
>>> graph = rdflib.ConjunctiveGraph()
>>> graph.parse('servicerepository.n3', format='n3')

We've now loaded the RDF graph into memory. We have to tell the parser that our format is "n3", as it assumes RDF/XML by default. We could load our query from a text file, but let's just re-type it in the terminal:

>>> query1='''
prefix ssn: <http://purl.oclc.org/NET/ssnx/ssn#>
prefix tl: <http://purl.org/NET/c4dm/timeline.owl#>
prefix sao: <http://purl.oclc.org/NET/UNIS/sao/sao#>
prefix ct: <http://www.insight-centre.org/ct#>
prefix prov: <http://www.w3.org/ns/prov#>

SELECT ?sensorid ?propertyid ?propertyname ?reportID
 WHERE {?sensorid a ssn:Sensor .
  ?sensorid prov:hadPrimarySource ?reportID .
  ?sensorid ssn:observes ?propertyid .
  ?propertyid a ?propertyname .}'''

With rdflib, you'll generally make your query and iterate over the results all at once:
>>> for res in graph.query(query1):
... pprint(res['sensorid'])
... pprint(res['propertyid'])
... pprint(res['propertyname'])
... pprint(res['reportID'])

Each result row is a tuple whose length and order correspond exactly to the SELECT clause of the query. Here we query an N3 document to obtain, for each sensor, its property IDs, property names and report IDs. The same .parse function we used to parse our local file can also parse URIs over the Web.
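
For example, if the same servicerepository document were published on the Web, it could be loaded directly from its URL; the address below is only a placeholder:
>>> graph2 = rdflib.ConjunctiveGraph()
>>> graph2.parse('http://example.org/servicerepository.n3', format='n3')  # placeholder URL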

The following shows how to use SPARQLWrapper to fetch annotated dynamic sensory data from one graph and match it with the static data available in another graph in a Virtuoso database, using the SPARQL query given below:

from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://iot.ee.surrey.ac.uk:8890/sparql")

query2='''
prefix g1: <http://iot.ee.surrey.ac.uk/citypulse/datasets/AarhusObservations>
prefix g2: <http://iot.ee.surrey.ac.uk/citypulse/datasets/servicerepository>
prefix ssn: <http://purl.oclc.org/NET/ssnx/ssn#>
prefix tl: <http://purl.org/NET/c4dm/timeline.owl#>
prefix sao: <http://purl.oclc.org/NET/UNIS/sao/sao#>
prefix ct: <http://www.insight-centre.org/ct#>

SELECT ?observation ?value ?observationTime ?property{
 {GRAPH g1: {
   ?observation a sao:Point .
   ?observation sao:value ?value .
   ?observation sao:time ?time .
   ?time tl:at ?observationTime .
   ?observation ssn:observedProperty ?property .
}} UNION {GRAPH g2: {
  ?property a ct:AverageSpeed .}}
} '''
# a default graph could be set on the wrapper, but here the graphs are selected in the query string via the GRAPH clauses
sparql.setQuery(query2)
sparql.setReturnFormat(JSON)
ret = sparql.query().convert() # with the JSON return format, convert() returns the results as a Python dict
for res in ret['results']['bindings']:
  pprint(res['observation'])
  pprint(res['value'])
  pprint(res['observationTime'])
  pprint(res['property'])

3. Validation Samples

As an illustration of the validation of annotated data, you can find some samples that contain some of the typical errors usually made by ontology engineers at the following link: validation samples. You can then validate them with the SSN Validation Tool, either by copying and pasting them into the text box or by uploading them individually through the browse button.

4. (Simplified) KAT python library

This library is a simplified version of the Knowledge Acquisition Tool (KAT) that allows researchers to aggregate sensory data from multiple CSV documents in one run and quickly obtain an annotated document. The libraries that the kat library depends on can be installed using the following commands:

  1. sudo pip install rdflib
  2. sudo pip install numpy
  3. sudo pip install scipy
  4. sudo pip install PyWavelets

The kat library can be downloaded from the following link: kat library (v1.0.0). Since it is an ongoing work, please make sure that you are using the latest version. To install the kat library, run the following command in the directory where you downloaded the kat Python wheel.
$ pip install ./kat-1.0.0-py2.py3-none-any.whl

then start python using
$ python

Now let's open the python interpreter and give it a go...

[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.56)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

Start by importing the kat package:
>>> import kat

To view all "kat" classes:
>>> dir(kat)
['DataAggregation', '__author__', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '__path__', 'dft', 'dwt', 'glob', 'kat', 'logic', 'main', 'paa', 'saopy', 'sax', 'sensorsax', 'test', 'utils']

Choose a directory in which you would like to run your examples and download the sample into that directory (i.e. data sample):

>>> path="/home/citypulse/Documents/ESWC/SAOPY-Examples/KAT-DataAggregation/"

Now we can write a one-line command that asks KAT to extract all the sensory data from the CSV documents in the given directory and analyse them based on the method name. The parameters that you need to run the KAT library are given below:

  1. path of the directory that you keep the csv documents
  2. output directory with the output file name
  3. method name for the data aggregation process
  4. desired output length of the aggregated sensory data
  5. serialisation method (i.e. n3)


The library automatically detects the CSV columns and extracts the data. So that the CSV documents can be read correctly, the timestamp, latitude and longitude columns should be labelled in the CSV headers as "TIMESTAMP", "LAT" and "LONG". The filenames are used as sensor IDs. An illustrative layout is sketched below.
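
As an illustrative sketch only (the extra column name and the values are made up, and the exact layout expected by KAT may differ), a CSV document with such headers could be created like this:
>>> import csv
>>> with open(path + "sensor158324.csv", "w") as f:   # the file name doubles as the sensor ID
...     writer = csv.writer(f)
...     writer.writerow(["TIMESTAMP", "LAT", "LONG", "avgSpeed"])   # "avgSpeed" is a hypothetical value column
...     writer.writerow(["2014-09-30T06:00:00", "56.15", "10.20", "60"])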

>>> kat.DataAggregation(path,path+"sax.rdf", "sax", 10, "n3")

The output is now saved in the output directory that you have given. You can also try the other aggregation methods, such as "dft", "dwt" or "paa", as sketched below.
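
For instance, a simple loop over the method names (the output file names here are only illustrative) runs them all in one go:
>>> for method in ["dft", "dwt", "paa"]:
...     kat.DataAggregation(path, path + method + ".rdf", method, 10, "n3")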

References

Sefki Kolozali, Maria Bermudez-Edo, Daniel Puschmann, Frieder Ganz, Payam Barnaghi, A Knowledge-based Approach for Real-Time IoT Data Stream Annotation and Processing. Proceedings of the 2014 IEEE International Conference on Internet of Things (iThings 2014), September 2014, Taipei, Taiwan.

Acknowledgements

This work is part of the EU FP7 CityPulse project at the Institute for Communication Systems, University of Surrey.