Developing standards and platforms for data quality and provenance, to ensure data can be applied effectively in diverse situations
Scientists, medics, businesses, consumers and electronic devices generate ever-increasing volumes of data. Historically, only a small fraction of that data was shared and reused; most was used once and then erased or archived. As data grows ever larger and more complex, we want to extract information from it to accelerate research and development, make predictions, avoid mistakes and optimise processes.
Robust data provenance standards will make data more understandable, reproducible and discoverable by providing information about its origin, lifecycle and meaning. We are defining best practice in measurement data reuse and traceability by developing metadata standards and data storage structures. Together with industry and academic partners, we are defining minimum metadata standards for life science imaging to structure, locate and interpret datasets and make them available for sharing, publication and data mining.
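To make the idea of a minimum metadata standard concrete, the sketch below checks a dataset's metadata record against a small set of required fields and appends a lifecycle event to it. The field names (`identifier`, `origin`, `created`, `description`, `units`, `history`) and both helper functions are hypothetical, chosen only to mirror the origin, lifecycle and meaning mentioned above; they are not taken from any published NPL or ISO standard.

```python
# Hypothetical minimum field set, for illustration only; not a published standard.
REQUIRED_FIELDS = ("identifier", "origin", "created", "description", "units")


def missing_fields(record):
    """Return the required fields that are absent or empty in a metadata record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


def add_lifecycle_event(record, step, agent):
    """Return a copy of the record with one processing step appended to its history."""
    history = list(record.get("history", []))
    history.append({"step": step, "agent": agent})
    # Build a new dict so the original record is left unchanged.
    return {**record, "history": history}


# Illustrative record with made-up values; "units" is deliberately left empty.
record = {
    "identifier": "example-image-0001",
    "origin": "example instrument",
    "created": "2020-01-01T00:00:00Z",
    "description": "Example imaging dataset",
    "units": "",
}

print(missing_fields(record))  # ['units']
updated = add_lifecycle_event(record, "background subtraction", "example analyst")
print(updated["history"])
```

A record that fails the check can be flagged before it is shared or published. Keeping the history as an append-only list is one simple way of recording how a dataset was processed after acquisition.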
We serve on an ISO Big Data Analytics working group, contributing our experience and benefiting from working with other practitioners. NPL is also working on model validation in big data and the data science lifecycle.
Find out more about NPL’s Data science case study