Data Enrichment Service (DES)

What is the Data Enrichment Service?

The Data Enrichment Service (DES) is a scalable, web-based information system built on top of the GATE framework. It allows bespoke processing of data, but it can also be used in conjunction with the core DES enrichment system, which provides continually improving entity extraction with a focus on UK-based data.

If you are handling mainly unstructured text that you know has information hidden away inside it, then the Data Enrichment Service available within the OpenUp™ platform will help you to get at that information. It does this by ‘extracting’ useful items of information, such as people's names, organisations or places. The extracted information can be made available in a variety of formats to suit the requirements of the user.

One of the available formats is RDF/XML. Output in this format can be stored in the RDF store, making it available for querying. The combination of extraction and RDF store provides a powerful approach to maximising the information potential of documents.
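As a rough illustration of what working with an RDF/XML serialisation can look like, the sketch below reads a small RDF/XML document and collects the label attached to each resource. The sample document, the example.org URIs and the labels() helper are assumptions made for illustration; they are not actual DES output or part of any DES API, and a real RDF store would normally be queried with SPARQL rather than parsed by hand.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"

# Hypothetical RDF/XML document; real DES output will differ.
SAMPLE = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="http://example.org/id/place/norwich">
    <rdfs:label>Norwich</rdfs:label>
  </rdf:Description>
</rdf:RDF>
"""

def labels(rdf_xml):
    """Map each described resource URI to its rdfs:label, if it has one."""
    root = ET.fromstring(rdf_xml)
    result = {}
    for node in root.findall("{%s}Description" % RDF_NS):
        uri = node.get("{%s}about" % RDF_NS)
        label = node.findtext("{%s}label" % RDFS_NS)
        if uri and label:
            result[uri] = label
    return result

for uri, label in labels(SAMPLE).items():
    print(uri, label)
```

This only handles the simple rdf:Description form shown in the sample; RDF/XML allows several equivalent syntaxes, so a dedicated RDF library is the safer choice for real data.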

How can the DES help me and my organisation?

The Data Enrichment Service takes textual information from sources over which the user has limited control and adds value to the data to make it useful to other computer systems. The purpose of the DES is twofold:

  1. To identify key elements within unstructured and semi-structured data and introduce machine-readable markup
  2. To relate the markup to other data sources and create linked data
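The two steps above can be sketched with a toy gazetteer lookup: known names are found in plain text, wrapped in machine-readable markup, and related to a URI. This is a minimal sketch only. The GAZETTEER entries, the example.org URIs, the span markup and the enrich() function are all hypothetical and do not describe how the DES itself is implemented.

```python
import re

# Hypothetical gazetteer: surface form -> (entity type, illustrative URI).
# These entries are placeholders, not the data sets the DES actually uses.
GAZETTEER = {
    "Norwich": ("Place", "http://example.org/id/place/norwich"),
    "Ordnance Survey": ("Organisation", "http://example.org/id/org/ordnance-survey"),
}

def enrich(text):
    """Return (marked-up text, list of (surface form, type, URI) links)."""
    # Try longer names first so multi-word entries win over shorter ones.
    names = sorted(GAZETTEER, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    links = []

    def mark(match):
        surface = match.group(1)
        kind, uri = GAZETTEER[surface]
        links.append((surface, kind, uri))  # step 2: relate to another data source
        # step 1: introduce machine-readable markup around the match
        return '<span class="{}" about="{}">{}</span>'.format(kind, uri, surface)

    return pattern.sub(mark, text), links

marked_up, links = enrich("The report mentions Ordnance Survey and Norwich.")
print(marked_up)
for link in links:
    print(link)
```

Exact string matching is the simplest possible approach; it does not handle ambiguity or spelling variants.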

The core DES service is free to use within a certain limit. For data publishers needing to enrich more than 10,000 documents per day with a Service Level Agreement (SLA), we can offer the professional version of the Data Enrichment Service for a monthly fee.

Find out more

To find out more about how TSO can help you to create and enrich data, visit http://openup.tso.co.uk

We have also produced two Information Sheets explaining what DES is and how it can help your organisation:

OpenUp Information Sheet [305KB]

OpenUp DES Information Sheet [321KB]

To discuss your requirements with one of our experts, email opendata@tso.co.uk

How can I access the DES?

You can access the DES through our OpenUp platform at http://openup.tso.co.uk/developer/des

The service is free to use within a certain limit. For data publishers needing to enrich more than 10,000 documents per day with a Service Level Agreement (SLA), we can offer the professional version of the Data Enrichment Service for a monthly fee.

More information on the DES – Data sets used

Data.gov.uk

Much of the data that drives the DES will come from the data.gov.uk initiative. The following data sets are currently used:


In addition to these data sets, URIs for dates that reference http://reference.data.gov.uk are included where possible.

Legislation.gov.uk

www.legislation.gov.uk is the UK’s new website for legislation. It provides an API that can be used to access data across all UK enacted and consolidated legislation.

Currently the DES only makes use of primary legislation information.

Ordnance Survey

Certain data from Ordnance Survey is also used. This consists of:

  • British cities
  • British towns
  • British other settlements
  • British water features
  • Administrative London boroughs
  • Administrative counties

Wikipedia

Information has been acquired from Wikipedia for certain gazetteer lists. These are:

Other Data

Additional information on political parties is taken from http://openelectiondata.org/

In addition, some data from GeoNames is used for countries, and a small amount of data from DBpedia is also used.

Provenance

The provenance of information can be an important factor for certain users. To that end, the DES is starting to implement the Open Provenance Model Vocabulary (OPMV), which is being developed alongside ongoing work by data.gov.uk. More details on OPMV can be found at SourceForge.

Currently the DES adds provenance information to RDF/XML serialisations of documents. As the OPMV work is still ongoing, the provenance information generated is subject to change.