T3.1 – Spatial knowledge mapping
One goal of this task is to develop algorithms and services to map spatial RDF data (datasets with geometric information) and non-spatial RDF data. Datasets often contain only implicit geographic references, such as city names, and this information alone is not sufficient to identify geographic entities unambiguously. A further goal is therefore to provide a tool-assisted way of supplying context information and parameterizing transformations for the generation of high-quality RDF datasets. The following components will be developed:
a component to lift implicit geographical references in datasets. Given an RDF dataset with implicit geographical references, the component will automatically detect potential types of geospatial information and their relations to existing vocabularies. For instance, a dataset containing address information could be automatically geocoded using a service such as Nominatim. The address information itself can be identified either by the vocabulary used or by crosschecking the values against reference lists of, e.g., country, city and street names. Additionally, this enables interlinking with spatial datasets, such as GeoNames, DBpedia or LinkedGeoData. The component will be manually evaluated (a) using information retrieval measures, such as precision and recall, and (b) by comparison with reference datasets. For large datasets, sampling methods will be applied.
a component to configure the transformation of data in conventional formats into RDF using existing vocabularies. Based on the detected implicit information, the user will be able to specify the type, format, granularity and amount of data to be transformed.
a distributed system for continuously processing large amounts of conventional data into an RDF representation based on the configuration. The challenge here is the expected amount of data to be processed. Since the original datasets are kept in their source form and evolve over time, it must be possible to quickly re-process large amounts of data into RDF. This component will be evaluated for its performance, for example by measuring the processing time for different datasets and dataset sizes.
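The crosschecking step of the first component and its evaluation with precision and recall can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the tiny `REFERENCE_LISTS` gazetteer, the `ex:` property names, the 0.8 match threshold, and the helper functions `detect_geo_properties` and `precision_recall` are hypothetical and not part of the planned components.

```python
from collections import defaultdict

# Hypothetical reference lists; a real system would load much larger
# gazetteers (e.g. derived from GeoNames).
REFERENCE_LISTS = {
    "city": {"leipzig", "berlin", "paris", "vienna"},
    "country": {"germany", "france", "austria"},
}


def detect_geo_properties(triples, threshold=0.8):
    """Return {property: geo_type} for properties whose literal values
    mostly match one of the reference lists.

    `triples` is an iterable of (subject, property, literal) string tuples.
    `threshold` is an assumed minimum share of matching values.
    """
    values_by_property = defaultdict(list)
    for _subject, prop, value in triples:
        values_by_property[prop].append(value.strip().lower())

    detected = {}
    for prop, values in values_by_property.items():
        best_type, best_ratio = None, 0.0
        for geo_type, names in REFERENCE_LISTS.items():
            # Share of this property's values found in the reference list.
            ratio = sum(v in names for v in values) / len(values)
            if ratio > best_ratio:
                best_type, best_ratio = geo_type, ratio
        if best_type is not None and best_ratio >= threshold:
            detected[prop] = best_type
    return detected


def precision_recall(predicted, expected):
    """Compute precision and recall of predicted items against a gold set."""
    predicted, expected = set(predicted), set(expected)
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Toy input: ex:label contains one city name by coincidence and
# should not be classified as a geographic property.
triples = [
    ("ex:a", "ex:locatedIn", "Leipzig"),
    ("ex:b", "ex:locatedIn", "Berlin"),
    ("ex:a", "ex:nation", "Germany"),
    ("ex:b", "ex:nation", "France"),
    ("ex:a", "ex:label", "Office A"),
    ("ex:b", "ex:label", "Paris"),
]
detected = detect_geo_properties(triples)

# Manually curated gold standard for the toy input.
gold = {"ex:locatedIn": "city", "ex:nation": "country"}
precision, recall = precision_recall(detected.items(), gold.items())
```

The threshold is what keeps a property such as `ex:label` from being misclassified when only a fraction of its values happen to be place names. Properties detected this way could then be passed to a geocoding service such as Nominatim or used as candidates for interlinking.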
Other Tasks in this Workpackage