Getting started with SAOPY and Semantic Sensory Data

Working Draft

This is a work in progress and as such is subject to change. Comments are very welcome, please send them to Dr. Sefki Kolozali.

  1. How to use SAOPY
  2. SPARQL Query Examples
  3. Validation of Annotated Data Samples
  4. (Simplified) KAT python library with examples

Appendices

  1. References

1. How to use saopy

saopy depends on:

  1. rdflib (http://rdflib.org/) v2.4.0 (easy_install)

saopy can be downloaded from the following link: SAOPY (v1.1.9). Since it is an ongoing work, please make sure that you are using the latest version. To install the SAOPY library, run the following command in the directory where you downloaded the SAOPY Python wheel.
$ pip install ./saopy-1.1.9-py2.py3-none-any.whl

then start python using
$ python

Now let's open the python interpreter and give it a go...

[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.56)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

Start by importing the saopy package:
>>> import saopy

To view all "saopy" classes:
>>> dir(saopy)
['DUL', 'PropertySet', 'RDFInterface', 'SaoInfo', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '__path__', 'ces', 'ct', 'exportRDFFile', 'exportRDFGraph', 'foaf', 'geo', 'geo1', 'importRDFFile', 'importRDFGraph', 'model', 'muo', 'owl', 'owlsg', 'owlss', 'owlssc', 'owlssp', 'owlssrp', 'prov', 'qoi', 'rdfs', 'sao', 'ssn', 'tl', 'tm', 'tzont']

To view all classes of the sao ontology from the saopy library:
>>> dir(saopy.sao)
['DiscreteCosineTransform', 'DiscreteFourierTransform', 'DiscreteWaveletTransform', 'KMeans', 'Mean', 'Median', 'MovingAverage', 'PiecewiseAggregateApproximation', 'Point', 'Segment', 'SensorSAX', 'StreamAnalysis', 'StreamData', 'StreamEvent', 'SymbolicAggregateApproximation', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '__path__', 'saopy']

Creating a sensor object:
>>> sensor124 = saopy.ssn.Sensor("http://example.org/x")

Why should we use saopy to annotate IoT data? saopy helps the user avoid various common problems, including the use of undefined properties and classes, poorly formed namespaces, problematic prefixes, invalid literal syntax and other issues covered by optional heuristic checks. For example:
>>> sensor124 = saopy.ssn.Senzor("http://example.org/x")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'Senzor'
>>> cityofaarhus = saopy.foaf.Organisation("http://example.org/cityofaarhus")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'Organisation'
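
If you are unsure of a class name, the same dir() call shown above can be used on the relevant sub-module to list what is actually available, for example:
>>> dir(saopy.foaf)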

To combine multiple sensory objects for serialisation, you can use the "SaoInfo" class. The following example shows how to create and export a sensor observation object with a value:
>>> trafficData124 = saopy.sao.StreamData("http://example.org/data1")
>>> trafficData124.value = "1234"
>>> saoOut = saopy.SaoInfo()
>>> saoOut.add(trafficData124)
>>> saopy.RDFInterface.exportRDFFile(saoOut, "example1.rdf", "n3")

Now that we know how to describe a simple observation, let's annotate a more detailed sensory observation. First, create provenance information for the sensory data:
>>> cityofaarhus = saopy.foaf.Organization("http://example.org/cityofaarhus")
>>> cityofaarhus = saopy.prov.Agent("http://example.org/cityofaarhus")

>>> trafficsensor158324 = saopy.ssn.Sensor("http://example.org/data158324")
>>> trafficsensor158324.actedOnBehalfOf = cityofaarhus

Creating properties of the sensory data:
>>> measuredTime = saopy.ssn.Property("http://unis/ics/property003")
>>> measuredTime.description = "Measured Time"
>>> estimatedTime = saopy.ssn.Property("http://unis/ics/property004")
>>> estimatedTime.description = "Estimated Time"
>>> avgSpeed = saopy.ssn.Property("http://unis/ics/property001")
>>> avgSpeed.description = "Average Speed"
>>> vcCount = saopy.ssn.Property("http://unis/ics/property002")
>>> vcCount.description = "Vehicle Count"
>>> trafficsensor158324.observes.add(vcCount)
>>> trafficsensor158324.observes.add(avgSpeed)
>>> trafficsensor158324.observes.add(estimatedTime)
>>> trafficsensor158324.observes.add(measuredTime)

SAOPY allows you to describe time instants and intervals. A time instant should be used with tl:at to describe the point in time at which the data was collected, whereas a time interval should specify both the beginning of the interval and its duration. Here we provide an example of each.
>>> universaltimeline = saopy.tl.PhysicalTimeLine("http://purl.org/NET/c4dm/timeline.owl#universaltimeline")
>>> instant = saopy.tl.Instant("http://unis/ics/timeinstant")
>>> interval = saopy.tl.Interval("http://unis/ics/timeinterval")
>>> instant.at = "2014-09-30T06:00:00"
>>> instant.onTimeLine = universaltimeline
>>> interval.beginsAtDateTime = "2014-09-30T06:00:00"
>>> interval.durationXSD = "PT5M"

The SAO ontology subsumes the measurement unit descriptions from the Measurement Unit Ontology (MUO). It therefore lets you describe the measurement unit of an observation as follows:
>>> unitseconds = saopy.muo.UnitOfMeasurement("http://unis/ics/unit1:seconds")
>>> unitkilometer = saopy.muo.UnitOfMeasurement("http://unis/ics/unit2:km-per-hour")

Now we can annotate sensor observations for two sensor properties, namely average speed and measured time:
>>> trafficData001 = saopy.sao.StreamData("http://unis/ics/trafficdataavgspeed001")
>>> trafficData001.value = "60"
>>> trafficData001.hasUnitOfMeasurement=unitkilometer
>>> trafficData001.observedProperty = avgSpeed
>>> trafficData001.observedBy = trafficsensor158324
>>> trafficData001.time = instant

>>> trafficData003 = saopy.sao.StreamData("http://unis/ics/trafficdataMeasuredTime001")
>>> trafficData003.value = "30"
>>> trafficData003.hasUnitOfMeasurement=unitseconds
>>> trafficData003.observedProperty = measuredTime
>>> trafficData003.observedBy = trafficsensor158324
>>> trafficData003.time = interval

Exporting the sensor data in N3 format:
>>> saoOut.add(trafficData001)
>>> saoOut.add(trafficData003)
>>> saoOut.add(cityofaarhus)
>>> saoOut.add(trafficsensor158324)
>>> saoOut.add(estimatedTime)
>>> saoOut.add(measuredTime)
>>> saoOut.add(avgSpeed)
>>> saoOut.add(vcCount)
>>> saoOut.add(unitkilometer)
>>> saoOut.add(unitseconds)
>>> saoOut.add(universaltimeline)
>>> saoOut.add(instant)
>>> saoOut.add(interval)
>>> saopy.RDFInterface.exportRDFFile(saoOut, "example2.rdf", "n3")
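
As a quick sanity check (this uses plain rdflib rather than saopy itself), the exported file can be loaded back and its triples counted:
>>> import rdflib
>>> g = rdflib.ConjunctiveGraph()
>>> g.parse("example2.rdf", format="n3")
>>> len(g)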

saopy also allows you to annotate quality values, such as correctness, frequency, age and completeness. The following example illustrates how to annotate quality features for a data segment of the traffic data stream. The correctness value has been obtained by an algorithm that examines the difference between two successively submitted traffic streams.
>>> segmentSample1 = saopy.sao.Segment("segment-sample-1")
>>> age = saopy.qoi.Age("age-sample-1")
>>> age.hasAge = "10"
>>> completeness = saopy.qoi.Completeness("completeness-sample-1")
>>> completeness.hasCompleteness = "1"
>>> correctness = saopy.qoi.Correctness("correctness-sample-1")
>>> correctness.hasCorrectness = "1"
>>> frequency = saopy.qoi.Frequency("frequency-sample-1")
>>> frequency.hasFrequency = "10"
>>> segmentSample1.hasQuality.add(frequency)
>>> segmentSample1.hasQuality.add(correctness)
>>> segmentSample1.hasQuality.add(completeness)
>>> segmentSample1.hasQuality.add(age)
>>> owner = saopy.prov.Person("MisterX")
>>> segmentSample1.hasProvenance = owner
>>> saoOut = saopy.SaoInfo()
>>> saoOut.add(segmentSample1)
>>> saoOut.add(age)
>>> saoOut.add(completeness)
>>> saoOut.add(correctness)
>>> saoOut.add(frequency)
>>> saoOut.add(owner)
>>> saopy.RDFInterface.exportRDFFile(saoOut, "example3.rdf", "n3")

2. SPARQL query examples

Once inside python or ipython, please download the servicerepository document so that it can be used in the query example.
>>> import rdflib
>>> from pprint import pprint
>>> graph = rdflib.ConjunctiveGraph()
>>> graph.parse('servicerepository.n3', format='n3')

We've now loaded the RDF graph into memory. We have to tell the parser that our format is "n3", as it assumes RDF/XML by default. We could load our query from a text file, but let's just re-type it in the terminal:

>>> query1='''
prefix ssn: <http://purl.oclc.org/NET/ssnx/ssn#>
prefix tl: <http://purl.org/NET/c4dm/timeline.owl#>
prefix sao: <http://purl.oclc.org/NET/UNIS/sao/sao#>
prefix ct: <http://www.insight-centre.org/ct#>
prefix prov: <http://www.w3.org/ns/prov#>

SELECT ?sensorid ?propertyid ?propertyname ?reportID
 WHERE {?sensorid a ssn:Sensor .
  ?sensorid prov:hadPrimarySource ?reportID .
  ?sensorid ssn:observes ?propertyid .
  ?propertyid a ?propertyname .}'''

With rdflib, you'll generally make your query and iterate over the results all at once:
>>> for res in graph.query(query1):
... pprint(res['sensorid'])
... pprint(res['propertyid'])
... pprint(res['propertyname'])
... pprint(res['reportID'])

Each result row is a tuple whose length and order correspond exactly to the SELECT clause of the query. Here we query an N3 document to obtain, for each sensor, its property IDs, property names and report IDs. The same .parse function we used to parse our local file can also parse URIs over the Web.
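
For example, if the same servicerepository document were published on the Web, it could be loaded directly from its URL; the address below is only a placeholder:
>>> graph2 = rdflib.ConjunctiveGraph()
>>> graph2.parse('http://example.org/servicerepository.n3', format='n3')  # placeholder URL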

The following shows how to use SPARQLWrapper to fetch annotated dynamic sensory data from one graph and match it with the static data available in another graph in a Virtuoso database, using the SPARQL query given below:

from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://iot.ee.surrey.ac.uk:8890/sparql")

query2='''
prefix g1: <http://iot.ee.surrey.ac.uk/citypulse/datasets/AarhusObservations>
prefix g2: <http://iot.ee.surrey.ac.uk/citypulse/datasets/servicerepository>
prefix ssn: <http://purl.oclc.org/NET/ssnx/ssn#>
prefix tl: <http://purl.org/NET/c4dm/timeline.owl#>
prefix sao: <http://purl.oclc.org/NET/UNIS/sao/sao#>
prefix ct: <http://www.insight-centre.org/ct#>

SELECT ?observation ?value ?observationTime ?property{
 {GRAPH g1: {
   ?observation a sao:Point .
   ?observation sao:value ?value .
   ?observation sao:time ?time .
   ?time tl:at ?observationTime .
   ?observation ssn:observedProperty ?property .
}} UNION {GRAPH g2: {
  ?property a ct:AverageSpeed .}}
} '''
# a default graph could be set on the wrapper, but here the graphs are selected in the query string via the GRAPH clauses
sparql.setQuery(query2)
sparql.setReturnFormat(JSON)
ret = sparql.query().convert() # with the JSON return format, convert() returns the results as a Python dict
for res in ret['results']['bindings']:
  pprint(res['observation'])
  pprint(res['value'])
  pprint(res['observationTime'])
  pprint(res['property'])

3. Validation Samples

As an illustration of the validation of annotated data, you can find some samples that contain some of the typical errors usually made by ontology engineers at the following link: validation samples. You can then validate them with the SSN Validation Tool, either by copying and pasting them into the text box or by uploading them individually through the browse button.

4. (Simplified) KAT python library

This library is a simplified version of the Knowledge Acquisition Tool (KAT) that allows researchers to aggregate sensory data from multiple CSV documents in one run and quickly obtain an annotated document. The libraries that the kat library depends on can be installed using the following commands:

  1. sudo pip install rdflib
  2. sudo pip install numpy
  3. sudo pip install scipy
  4. sudo pip install PyWavelets

The kat library can be downloaded from the following link: kat library (v1.0.0). Since it is an ongoing work, please make sure that you are using the latest version. To install the kat library, run the following command in the directory where you downloaded the kat Python wheel.
$ pip install ./kat-1.0.0-py2.py3-none-any.whl

then start python using
$ python

Now let's open the python interpreter and give it a go...

[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.56)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

Start by importing the kat package:
>>> import kat

To view all "kat" classes:
>>> dir(kat)
['DataAggregation', '__author__', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '__path__', 'dft', 'dwt', 'glob', 'kat', 'logic', 'main', 'paa', 'saopy', 'sax', 'sensorsax', 'test', 'utils']

Choose a directory in which you would like to run your examples and download the sample into that directory (i.e. data sample):

>>> path="/home/citypulse/Documents/ESWC/SAOPY-Examples/KAT-DataAggregation/"

Now we can write a one-line command that asks KAT to extract all the sensory data from the CSV documents in the given directory and analyse them based on the method name. The parameters that you need to run the KAT library are given below:

  1. path of the directory that you keep the csv documents
  2. output directory with the output file name
  3. method name for the data aggregation process
  4. desired output length of the aggregated sensory data
  5. serialisation method (i.e. n3)


The library automatically detects the CSV columns and extracts the data. So that the CSV documents can be read correctly, the timestamp, latitude and longitude columns should be labelled in the CSV headers as "TIMESTAMP", "LAT" and "LONG". The filenames are used as sensor IDs. An illustrative layout is sketched below.
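
As an illustrative sketch only (the extra column name and the values are made up, and the exact layout expected by KAT may differ), a CSV document with such headers could be created like this:
>>> import csv
>>> with open(path + "sensor158324.csv", "w") as f:   # the file name doubles as the sensor ID
...     writer = csv.writer(f)
...     writer.writerow(["TIMESTAMP", "LAT", "LONG", "avgSpeed"])   # "avgSpeed" is a hypothetical value column
...     writer.writerow(["2014-09-30T06:00:00", "56.15", "10.20", "60"])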

>>> kat.DataAggregation(path,path+"sax.rdf", "sax", 10, "n3")

The output is now saved in the output directory that you have given. You can also try the other aggregation methods, such as "dft", "dwt" or "paa", as sketched below.
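
For instance, a simple loop over the method names (the output file names here are only illustrative) runs them all in one go:
>>> for method in ["dft", "dwt", "paa"]:
...     kat.DataAggregation(path, path + method + ".rdf", method, 10, "n3")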

References

Sefki Kolozali, Maria Bermudez-Edo, Daniel Puschmann, Frieder Ganz, Payam Barnaghi, A Knowledge-based Approach for Real-Time IoT Data Stream Annotation and Processing. Proceedings of the 2014 IEEE International Conference on Internet of Things (iThings 2014), September 2014, Taipei, Taiwan.

Acknowledgements

This work is part of the EU FP7 CityPulse project at the Institute for Communication Systems, University of Surrey.