1. Introduction

The Open Geospatial Consortium (OGC) is releasing this Call for Participation (CFP) to solicit proposals for OGC Testbed-19. The Testbed-19 initiative will explore six tasks: Agile Reference Architecture, Analysis Ready Data, Geodatacubes, Geospatial in Space, High-Performance Computing, and Machine Learning: Transfer Learning for Geospatial Applications.

T19 Logo

1.1. Background

OGC Testbeds are annual research and development initiatives that explore geospatial technology from various angles. They take the OGC Baseline into account while exploring selected aspects with broad teams from industry, government, and academia to advance the Findable, Accessible, Interoperable, and Reusable (FAIR) principles and OGC's open standards capabilities. Testbeds integrate requirements and ideas from a group of sponsors, which leverages symbiotic effects and makes the overall initiative more attractive to both participants and sponsoring organizations.

The Open Geospatial Consortium (OGC) is a collective problem-solving community of more than 550 experts representing industry, government, research and academia, collaborating to make geospatial (location) information and services FAIR - Findable, Accessible, Interoperable, and Reusable. The global OGC Community engages in a mix of activities related to location-based technologies: developing consensus-based open standards and best-practices; collaborating on problem solving in agile innovation initiatives; participating in member meetings, events, and workshops; and more. OGC’s unique standards development process moves at the pace of innovation, with constant input from technology forecasting, practical prototyping, real-world testing, and community engagement.

OGC’s member-driven consensus process creates royalty free, publicly available, open geospatial standards. Existing at the cutting edge, OGC actively analyzes and anticipates emerging tech trends, and runs an agile, collaborative Research and Development (R&D) lab – the OGC Innovation and Collaborative Solution Program – that builds and tests innovative prototype solutions to members’ use cases.

1.2. OGC COSI Program Initiative

This initiative is being conducted under the OGC Collaborative Solutions and Innovation (COSI) Program. The OGC COSI Program aims to solve the biggest challenges in location. Together with OGC-members, the COSI Team is exploring the future of climate, disasters, defense and intelligence, and more.

The OGC COSI Program is a forum for OGC members to solve the latest and hardest geospatial challenges via a collaborative and agile process. OGC members (sponsors and technology implementers) come together to solve problems, produce prototypes, develop demonstrations, provide best practices, and advance the future of standards. Since 1999, more than 100 funded initiatives have been executed - from small interoperability experiments run by an OGC working group to multi-million dollar testbeds with more than three hundred OGC-member participants.

OGC COSI initiatives promote rapid prototyping, testing, and validation of technologies, such as location standards or architectures. Within an initiative, OGC Members test and validate draft specifications to address geospatial interoperability requirements in real-world scenarios, business cases, and applied research topics. This approach not only encourages rapid technology development, but also determines the technology maturity of potential solutions and increases the technology adoption in the marketplace.

1.3. Benefits of Participation

This initiative provides an outstanding opportunity to engage with the latest research on geospatial system design, concept development, and rapid prototyping with government organizations (Sponsors) across the globe. The initiative provides a business opportunity for stakeholders to mutually define, refine, and evolve service interfaces and protocols in the context of hands-on experience and feedback. The outcomes are expected to shape the future of geospatial software development and data publication. The Sponsors are supporting this vision with cost-sharing funds to partially offset the costs associated with development, engineering, and demonstration of these outcomes. This offers selected Participants a unique opportunity to recoup a portion of their initiative expenses. OGC COSI Program Participants benefit from:

  1. Access to funded research & development

  2. Reduced development costs, risks, and lead-time of new products or solutions

  3. Close relationships with potential customers

  4. First-to-market competitive advantage on the latest geospatial innovations

  5. Influence on the development of global standards

  6. Partnership opportunities within our community of experts

  7. Broader market reach via the recognition that OGC standards bring

1.4. Master Schedule

The following table details the major Initiative milestones and events. Dates are subject to change.

Table 1. Master schedule
Milestone | Date | Event
M01 | 24 February 2023 | Release of CFP.
M02 | 6 March 2023 | Questions from CFP Bidders for the Q&A Webinar due (submit here using the Additional Message textbox).
M03 | 8 March 2023 | Bidders Q&A Webinar, 10:00-11:00 EST (Recording Link).
M04 | 10 April 2023 | CFP Proposal Submission Deadline (11:59pm EST). Note: extended to 11 April 2023 (11:59pm EST).
M05 | 24 April 2023 | All Testbed Participation Agreements signed.
M06 | 10-11 May 2023 | Kickoff Workshop (in person at Cesium HQ, Philadelphia, PA).
M07 | 5-9 June 2023 | OGC Member Meeting, Huntsville, AL (optional).
M08 | 22 June 2023 | Initial Engineering Reports (IERs).
M09 | 29 September 2023 | Technology Integration Experiments (TIE) component implementations completed & tested; preliminary Draft Engineering Reports (DERs) completed & ready for internal reviews.
M10 | 31 October 2023 | Ad hoc TIE demonstrations & demo assets posted to Portal; near-final DERs ready for review; WG review requested.
M11 | 16 November 2023 | Final DERs (incorporating internal and WG feedback) posted to Pending to meet the three-week rule before the Technical Committee (TC) electronic vote for publication.
M12 | 6 December 2023 | Last deadline for final DER presentation in the relevant WG for the publication electronic vote.
M13 | 7 December 2023 | Last deadline for the TC electronic vote on publishing the final DER.
M14 | 29 December 2023 | Participants' final summary reports due.
M15 | January 2024 | Outreach presentations at an online demonstration event.

2. Technical Architecture

This section provides the technical architecture and identifies all requirements and corresponding work items. It references the OGC Standards Baseline, i.e., the complete set of member-approved Abstract Specifications, Standards (including Profiles and Extensions), and Community Practices where necessary.

Please note that some documents referenced below may not have been released to the public yet. These reports require a login to the OGC portal. If you don’t have a login, please contact OGC using the Additional Message textbox in the OGC COSI Program Contact Form.

The Testbed deliverables are organized in the following tasks:

  • Agile Reference Architecture

  • Analysis Ready Data

  • Geodatacubes

  • Geospatial in Space

  • High-Performance Computing

  • Machine Learning: Transfer Learning for Geospatial Applications

The above tasks will be grouped into common use cases or stories as use cases and data are finalized.

2.1. Geospatial in Space

Currently, most OGC Standards focus on data that is observed on the surface or directly above planet Earth. There has been less focus on extra-terrestrial space and the exact location of the remote sensors.

GeospatialInSpace

Testbed 18 evaluated current standards with respect to the exact positioning of sensors at any location within the solar system and their corresponding data streams. The next step is to evaluate Implementation Specifications. Use cases have been identified and this task seeks additional sponsor participation as well as sample data that should be realistic but does not have to be authentic.

2.1.1. Problem Statements and Research Questions

The Geospatial in Space task brings together the results of two Testbed-18 work items: 3D+, and Moving Features and Sensor Integration. The OGC Moving Features architectures developed through Testbed-18 have achieved a fairly high degree of maturity. The Connected Systems Standards Working Group (SWG) has been formed to take this work to the next level: formal standardization. Before taking that step, it is important to make sure that all potential uses of this technology are addressed.

Testbed 18 also explored the extension of existing OGC standards and technologies to support extra-terrestrial applications (3D+). This includes spatial-temporal services and data for both non-Terrestrial planetary and open space applications. Features in this environment are almost always Moving Features. It would be premature to advance new Moving Features standards without also addressing the 3D+ requirements. The Moving Features in 3D+ Dimensions work item addresses that issue.

To achieve this objective, the following research topics should be explored:

  1. Extend the architecture and draft standards for Moving Feature content developed through Testbed-18 to support 3D (six degrees of freedom) and 4D (spacetime) geometries.

  2. Develop ISO 19111-conformant definitions for non-Earth planetary Coordinate Reference Systems. Register those definitions in a Coordinate Reference System (CRS) registry. This should include, at a minimum, CRSs for the Moon and Mars.

  3. Develop a Spatial Reference System definition for Minkowski spacetime based on ISO 19111 and ISO 19108. Identify any required modifications to those standards. Register that definition in a CRS registry.

  4. Explore the ability for existing Moving Features standards and software to work with the non-Terrestrial CRS (#2) and spacetime CRS (#3). Identify shortfalls and propose solutions.

  5. Explore the ability for existing Moving Features standards and software to work with Moving Features traversing open space.

  6. Develop and prototype standards for implementing a graph of coordinate transformations as described in Testbed-18: Reference Frame Transformation Engineering Report.

  7. Research and prototype an effective approach to accommodate Lorentz space and time contractions within coordinate system transformations.

  8. Develop one or more versions of GeoTIFF for extraterrestrial use. This should include the case where the corner coordinates are at infinity.

  9. Develop and submit change requests for existing standards as needed.
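For reference on Topic #7: the Lorentz effects to be accommodated follow from the standard special-relativity boost. For two frames in relative motion with velocity v along the x-axis, coordinates transform as

```latex
x' = \gamma\,(x - vt), \qquad
t' = \gamma\left(t - \frac{vx}{c^2}\right), \qquad
\gamma = \frac{1}{\sqrt{1 - v^2/c^2}}
```

Lengths along the direction of motion contract by 1/γ and clock intervals dilate by γ, so a coordinate transformation graph as envisioned in Topic #6 would need to carry the relative velocity between frames in order to evaluate γ for each edge.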

2.1.2. Aim

To free OGC standards and technologies from Terrestrial constraints. Allow geospatial analytic tools and techniques to be used on other astronomical bodies as well as in deep space. Fully integrate the terrestrial and extraterrestrial analytic toolset and processes.

2.1.3. Previous Work

Definitions for non-Earth Planetary Coordinate Reference Systems and Coordinate Transformations

In the 3D+ Standards Framework Engineering Report (22-036), Testbed-18 analyzed current standards from ISO, such as ISO 19111: Geographic Information – Spatial referencing by coordinates, and the OGC GeoPose Standard. Neither is adequate for dealing with non-Earth geospatial data. Alternative approaches by geodetic and astronautic organizations, such as the Consultative Committee for Space Data Systems (CCSDS) Navigation Data — Definitions and Conventions, the International Earth Rotation and Reference Systems Service (IERS) conventions, or the NASA NAIF SPICE Toolkit, are also discussed in the Testbed-18 report. The resulting practices are often similar to each other but cannot be understood as defining a standard framework or approach.

The OGC Testbed-18 3D+ Data Space Object Engineering Report (23-011) begins with the application of ISO 19111: Geographic Information – Spatial referencing by coordinates to the reference frames of objects in space, such as celestial bodies or spacecraft in orbit. The Engineering Report sits between the 3D+ Standards Framework Engineering Report (OGC 22-036), which presents the theoretical foundations, and the OGC Testbed-18 Reference Frame Transformation Engineering Report (22-038), which applies ISO 19111 to coordinate operations between the above frames for objects in orbit around any celestial body or in free flight in our solar system. Leaving dynamic reference frames aside, ISO 19111:2019 distinguishes two types of coordinate operations: conversions and transformations. A conversion can include translation, rotation, change of units, etc., but its result is still associated with the same reference frame, for example the same spacecraft. By contrast, a transformation involves a change of reference frame, for example from one spacecraft to another.
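The conversion/transformation distinction can be illustrated with a deliberately simplified 2D sketch. This is not an ISO 19111 API: the function names and the flat, non-relativistic geometry are illustrative assumptions only.

```python
import math

def convert(coords, theta):
    """Conversion: re-express coordinates in a rotated axis set of the
    SAME reference frame (e.g., the same spacecraft's body axes)."""
    x, y = coords
    return (x * math.cos(theta) + y * math.sin(theta),
            -x * math.sin(theta) + y * math.cos(theta))

def transform(coords, frame_offset, theta):
    """Transformation: change to a DIFFERENT reference frame (e.g., from
    one spacecraft to another), here modeled as a translation followed by
    a rotation. Real transformations also carry datum and epoch metadata."""
    x, y = coords
    dx, dy = frame_offset
    return convert((x - dx, y - dy), theta)
```

In ISO 19111 terms, `convert` stays within one reference frame, while `transform` changes the frame; production implementations additionally track uncertainty introduced by the frame change.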

OGC 22-038 discusses OGC GeoPose in addition to ISO 19111. GeoPose can be used for describing a relationship between, for example, a spacecraft and a ground station. Most concepts defined in GeoPose can also be expressed using existing ISO 19111 constructs and shared as an OGC Geography Markup Language (GML) encoding.

Additional information is available in the Compatibility Study between ISO 18026 CD, Spatial Reference Model (SRM) and ISO 19111, Geographic information – Spatial referencing by coordinates. The study provides an assessment of the compatibility of the concepts and data elements described in ISO 18026 and 19111.

The International Federation of Surveyors, as part of their Volunteer Surveyor program, hosts a two-week hackathon-style event that focuses on allowing individuals to work on the development of a (hypothetical) Land Tenure Reform System for Mars. Though planned as a fun exercise, interesting CRS related concepts may come out of this effort.

GeoTIFF

Testbed-17 researched Cloud Optimized GeoTIFF (COG) with the aim of developing a specification that could be directly considered by the GeoTIFF SWG and put forward as an OGC Standard. It also compared COG with other solutions for multi-dimensional data in the cloud context, with a focus on Zarr. This Testbed-17 task produced the OGC Testbed-17: Cloud Optimized GeoTIFF Specification Engineering Report (21-025). COG enables efficient access to GeoTIFF data on the cloud.
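COG's efficiency comes from ordering the file so that a client can read it with a few HTTP range requests instead of downloading the whole file. The first such request fetches the fixed 8-byte TIFF header. A minimal sketch of parsing that header (classic TIFF only, ignoring the BigTIFF variant):

```python
import struct

def parse_tiff_header(first_bytes: bytes):
    """Parse the 8-byte classic TIFF header that a COG reader fetches
    first (e.g., via an HTTP Range request for bytes 0-7)."""
    byte_order = first_bytes[:2]
    if byte_order == b"II":
        fmt = "<"          # little-endian
    elif byte_order == b"MM":
        fmt = ">"          # big-endian
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(fmt + "HI", first_bytes[2:8])
    if magic != 42:
        raise ValueError("bad TIFF magic number")
    # A COG client would next request the bytes at ifd_offset to read the
    # Image File Directory, then fetch only the tiles it actually needs.
    return fmt, ifd_offset
```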

2.1.4. Work Items & Deliverables

The following diagram outlines all activities, work items, and deliverables in this task.

T19 Space Diagram

The following list identifies all deliverables that are part of this task. Detailed requirements are stated above. All participants are required to participate in all technical discussions and support the development of the Engineering Report(s) with their contributions.

  • D100-101 Moving Features Components – Components that implement Topics #1, #4, and #5 as defined in the Problem Statements and Research Questions section above.

  • D102-103 CRS & Transformation Components – Components that implement Topics #2, #3, #6, and #7 as defined in the Problem Statements and Research Questions section above.

  • D104 GeoTIFF Component – Component that implements Topics #8 and #9 as defined in the Problem Statements and Research Questions section above.

  • D001 Non-Terrestrial Geospatial Engineering Report – An Engineering Report which documents the approach, methodology and conclusions for Topics #1 through #7 above. The editor shall submit Change Requests to existing standards as needed.

  • D002 Extraterrestrial GeoTIFF – An Engineering Report which documents the approach, methodology and conclusions for Topics #8 and #9 above. The editor shall submit Change Requests to existing standards as needed.

2.2. Machine Learning: Transfer Learning For Geospatial Applications

New and revolutionary Artificial Intelligence (AI) and Machine Learning (ML) algorithms developed over the past 10 years have great potential to advance the processing and analysis of Earth Observation (EO) data. While comprehensive standards for this technology have yet to emerge, OGC has investigated opportunities for ML standards in EO through its COSI Program (the Machine Learning threads in Testbeds 14, 15, and 16; see the Engineering Reports available here). OGC has also developed a proposal for an ML Training Data standard through its TrainingDML-AI Standards Working Group and has provided analyses and recommendations on the proposed standard and its next steps in the Machine Learning thread of Testbed-18 (see the Testbed-18 Engineering Report). This work continues these beginnings.

MLTransferLearning

Among the most productive methods in applying ML to new domains has been the re-use of existing ML solutions for new problems. This is achieved through Transfer Learning, where a subset of the Domain Model produced by ML in a related domain is taken as the starting point for the new problem. The investment in previous ML tasks can be enormous, both in the Training Data required to support a good model and in the computing power required to refine the model to achieve good performance. With Transfer Learning, that investment can be made to pay off repeatedly. A major goal of this effort is to ascertain the degree to which Transfer Learning may be brought into an OGC standards regime.
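As a toy illustration of the mechanism (pure Python, with made-up weights; no claim about any particular framework), the sketch below freezes a "pretrained" feature extractor and fits only a small new head on target-task data:

```python
# Frozen "backbone": weights assumed to have been learned on a source task.
PRETRAINED_W = [[0.5, -0.2], [0.1, 0.9]]

def backbone(x):
    """Frozen feature extractor reused unchanged from the source task."""
    return [sum(w * v for w, v in zip(row, x)) for row in PRETRAINED_W]

def train_head(samples, labels, lr=0.1, epochs=200):
    """Transfer Learning step: fit only a new linear head on the target
    task, leaving PRETRAINED_W untouched."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            f = backbone(x)
            pred = sum(wi * fi for wi, fi in zip(w, f)) + b
            err = pred - y
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
            b -= lr * err
    return w, b
```

The saving comes from never retraining the backbone: only the tiny head is optimized on the new task's (usually much smaller) training set.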

Re-use depends on the new ML application being able to incorporate the results of previous ML applications. This means that the computational model, i.e., the ML architecture, of the earlier model has somehow to be aligned with that of the later ML application. Part of the work in this Testbed thread is to determine the data and information elements needed to succeed with Transfer Learning in the EO domain, e.g., how much information about the provenance of the ML model’s training data needs to be available? Is it important to have a representation of what is in-distribution vs what is out-of-distribution for the ML model? Do quality measures need to be conveyed for Transfer Learning to be effectively encouraged in the community? Are other elements required to support a standard regimen for building out and entering new Transfer Learning-based capabilities into the marketplace?

The strong need for alignment has also meant that, in practice, Transfer Learning has almost always been applied only within a single ML architecture, such as between earlier and later instances of TensorFlow. It would be of great benefit for cross-architecture Transfer Learning to be available such as, for instance, between instances of TensorFlow and PyTorch. This topic explores that possibility for the special case of geospatial applications.

The goal of Testbed-18 was to develop the foundation for future standardization of training data sets (TDS) for Earth Observation applications. The goal of this Testbed-19 task is to develop the foundation for future standardization of ML Models for Transfer Learning within geospatial, and especially Earth Observation, applications. The task shall evaluate the status quo of Transfer Learning, the metadata implications for geo-ML applications of Transfer Learning, and general questions of sharing and re-use. Several initiatives, such as Microsoft's ONNX effort, have developed implementations that could be used for future standardization.

2.2.1. Problem Statement and Research Questions

Reusable ML models are crucial for ML and AI applications, and re-use through Transfer Learning has been central to the rapid deployment of AI over the past decade. But the Transfer Learning process remains somewhat informal, hindering a clear understanding of system capabilities and opportunities for synergy. Moreover, Transfer Learning is in general bound to individually developed architectures and is therefore becoming a significant bottleneck in the more widespread and systematic application of AI/ML. The problems motivating this work include:

  • General lack and inaccessibility of appropriate high-quality ML Models;

  • Absence of standards, resulting in inconsistent and heterogeneous ML Models (source formats, architecture layouts, quality control, metadata, repositories, licenses, etc.);

  • Limited discoverability and interoperability of ML Models;

  • Lack of best-practices and guidelines for generating, structuring, describing and curating ML Models.

This task shall tackle the following research questions:

  1. How is an ML Model to be described to enable Findability of the model for Transfer Learning applications?

  2. How is an ML Model to be managed to enable Access of the model for Transfer Learning applications?

  3. Are there significant opportunities for interoperability among ML Models even if they derive from different architectures, e.g., by mapping to a canonical representation as in, e.g., the ONNX standard?

  4. How is an ML Model to be described to enable efficient re-use through Transfer Learning applications?

  5. What are the main characteristics of the ML Model itself that make it suitable for Transfer Learning, and what additional information needs to be provided to sufficiently understand the nature and usability of the model? E.g., is provenance of the model's domain required? Are quality measures required (and could these be automatically generated)?

  6. Does the relationship between an ML Model and the training data set it was ultimately derived from need to be represented (and if so, could that relationship be automatically generated)?

  7. Could/should the ML Models’ metadata describe a “performance envelope” based on the distribution of characteristics in the training data it was ultimately derived from (and could such a piece of information be automatically derived)?
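One way to make these questions concrete is to sketch what a findable, access-ready ML Model record might contain. The following fragment is purely hypothetical: every field name, identifier, and value below is an illustrative assumption, not an OGC or TrainingDML-AI construct.

```json
{
  "id": "sentinel2-landcover-resnet50-v1",
  "task": "scene classification",
  "architecture": "ResNet-50",
  "exchangeFormat": "ONNX",
  "trainingData": {
    "dataset": "hypothetical-eo-tds-001",
    "license": "CC-BY-4.0",
    "provenance": "Sentinel-2 L2A scenes, 2019-2021, Europe"
  },
  "performanceEnvelope": {
    "spatialResolution": "10 m",
    "spectralBands": ["B02", "B03", "B04", "B08"],
    "inDistribution": "temperate land cover; degrades on polar scenes"
  },
  "quality": { "overallAccuracy": 0.91 }
}
```

A record along these lines would address Findability (research question #1), the training-data relationship (#6), and the "performance envelope" (#7) in a machine-readable way.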

2.2.2. Aim

The objective of this task is to document current approaches and possible alternatives to lay out a path for future standardization of interoperable/ transferable Machine Learning Models for Earth Observation applications.

2.2.3. Previous Work

Machine Learning has been a subject of the last four OGC Testbed activities. The Testbed-15 to Testbed-17 reports are available here; the Testbed-18 report is currently in its final revision and directly available only to OGC members, but can be made available to interested parties on request.

2.2.4. Work Items & Deliverables

The following diagram outlines all activities, work items, and deliverables in this task.

T19 ML Diagram

The following list identifies all deliverables that are part of this task. Detailed requirements are stated above. All participants are required to participate in all technical discussions and support the development of the Engineering Report(s) with their contributions.

  • D003 Machine Learning Models Engineering Report - This Engineering Report captures all results of the ML task and can serve as a baseline for future standardization.

  • D106-107 Machine Learning Components - Demonstrations of transfer learning. Ideally, the transfer happens across software environments, e.g., from PyTorch to TensorFlow or similar. Earth observation use cases are preferred, such as satellite image classification or object detection tasks.

2.3. Geodatacubes

Over the past decade, Geospatial Data Cubes (GDCs) have been developed independently, resulting in a lack of interoperability between them. With improved interoperability, the vendor community can proceed with specific GDC variants while the consumer community interacts much more effectively with different instances. With the increase in available data products served as GDCs, it is becoming increasingly important to understand exactly what a GDC entails and how it was created.

DataCubesBlog Sizes

In recent years, the international community has invested significant resources in Geospatial Data Cubes (GDCs): Infrastructure that supports storage and use of multidimensional geospatial data in a structured way. While advances have been made to develop GDCs to support specific needs, it is necessary to gain a clear understanding on whether existing GDCs are interoperable to allow organizations to address their specific needs. There is also a need for reference implementations that make GDCs interoperable and exploitable within an Earth Observation (EO) exploitation environment. This project aims to develop solution prototypes based on existing EO GDCs. Project findings will contribute to the further development of GDCs if necessary.

The OGC Geodatacube Standards Working Group (GDC SWG) is currently forming and will become active within the next few weeks. The charter - available online here - identifies the following work items as priorities:

  1. Define an Application Programming Interface (API) that serves the core functionalities of GDCs. GDC users will then be able to handle different GDCs according to the same principles, achieving interoperability between GDCs.

  2. Define a metadata model, including provenance and data lineage information, to describe all details about a GDC.

  3. Identify formats to be used for data exchange. If existing formats do not meet the requirements, the SWG will extend its work to the development of a GDC exchange format.

This Testbed 19 task supports the work of the Standards Working Group, with which there will be close collaboration throughout the Testbed term. Testbed Participants will help define the GDC API and develop prototypes in an agile way to experiment with the API. The SWG charter has defined the following steps as in scope:

  • Identification of real-life use cases with industrial relevance

  • Definition of the GDC API (which may be a profile of (an) existing OGC API(s) or a new development)

  • Definition of exchange format recommendations, profiles, or new developments

  • Definition of the GDC metadata model (in particular information about how the GDC was built; similar to ARD concepts and vision around data provenance and lineage)

  • Support by the GDC API for, at a minimum, data access and processing

  • Analysis of the usability of existing standards
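Since the GDC API is yet to be defined, any sketch of it is speculative. The following illustrates what a space/time/variable subsetting request might look like if the API profiles the OGC API - Coverages building blocks; the path layout and parameter names are assumptions for illustration, not SWG decisions.

```python
from urllib.parse import urlencode

def gdc_subset_url(base, collection, bbox, time_range, properties):
    """Build a hypothetical GDC data-access request that slices a
    datacube by bounding box, time interval, and variable names."""
    params = {
        "bbox": ",".join(str(v) for v in bbox),
        "datetime": f"{time_range[0]}/{time_range[1]}",
        "properties": ",".join(properties),
    }
    return f"{base}/collections/{collection}/coverage?{urlencode(params)}"
```

For example, a client could request 2-metre temperature over central Europe for January 2023 from a hypothetical ECMWF-style collection with `gdc_subset_url("https://example.org", "era5", (5.0, 45.0, 15.0, 55.0), ("2023-01-01", "2023-01-31"), ["t2m"])`.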

2.3.1. Problem Statement & Research Questions

This task shall define the GDC API and metadata model; test the new API against a set of use cases, partly defined in this document and partly to be defined together with the sponsors during Testbed execution; develop implementations that allow further experimentation (at least two implementations shall be available as open source software under the Apache 2.0 License, with the OGC name used in lieu of the Apache Software Foundation name, or a compatible permissive open source license approved by the Open Source Initiative); and develop a number of client applications for both data access and advanced visualization. The following three use case scenarios have been defined already. It is possible that, as a result of the first ad hoc meeting of the Geodatacubes SWG at the 125th OGC Member Meeting (20-24 February 2023), additional use cases will be added or existing use cases revised. Please check the Corrigenda & Clarifications for any modifications before submitting your proposal.

Use Cases A

Develop a solution prototype to enable access to and exploitation of data and processors within distributed GDCs of different types.

  • Investigate and choose an existing open-source solution for the implementation of the prototype

  • Integration of at least one EO dataset and one other form of geospatial data (e.g., vector or Earth system prediction data)

  • Deploy and demonstrate the prototype in a cloud environment

  • Advances to open-source approaches developed through the project shall be incorporated into the appropriate open source initiatives by the participant

  • Document additional development effort required for the prototype to be implemented in an operational setting

Use Cases B

Use Cases C

2.3.2. Aim

This task aims to:

  • Understand usability of existing solutions to serve and process GDC data. The task shall provide a comparative analysis of similar standards to help identify gaps and guidance on which standard(s) to use for given GDC use cases.

  • Test specifications of the draft standard against operational requirements to determine the degree to which they meet real-world needs.

  • Develop the GDC API draft standard

  • Develop the GDC metadata model

  • Test both the API and the metadata model in real world scenarios, ideally by GDC operators/users

  • Develop open-source libraries that facilitate the experimentation with GDCs

  • Demonstrate advanced visualization concepts for GDC data

  • Demonstrate how GDC and Analysis Ready Data can align

  • Describe what needs to be done next

2.3.3. Previous Work

The OGC Testbed-17: Geo Data Cube API Engineering Report (21-027) is, like all COSI Program Engineering Reports, available from the OGC public engineering reports webpage. The OGC Testbed-17 Engineering Report (ER) documents the results and recommendations of the Geo Data Cube API task in 2021. The Testbed-17 Call for Participation provides additional insights into requirements and use cases. The ER defines a draft specification for an interoperable GDC API leveraging OGC API building blocks, details implementation of the draft API, and explores various aspects including data retrieval and discovery, cloud computing and Machine Learning. Implementations of the draft GDC API have been demonstrated with use cases including the integration of terrestrial and marine elevation data and forestry information for Canadian wetlands.

OGC Testbed-16: Data Access and Processing Engineering Report (20-016) summarizes the results of the 2020 Testbed-16 Data Access and Processing task. The task had the primary goal to develop methods and apparatus to simplify access to, processing of, and exchange of environmental and Earth Observation (EO) data from an end-user perspective.

The openEO initiative develops an open API to connect R, Python, JavaScript, and other clients to big Earth Observation cloud back-ends in a simple and unified way.

2.3.4. Work Items & Deliverables

The following diagram outlines all activities, work items, and deliverables in this task.

T19 Geodatacubes Diagram

The following list identifies all deliverables that are part of this task. Detailed requirements are stated above. All participants are required to participate in all technical discussions and support the development of the Engineering Report(s) with their contributions.

  • D111/D112 OGC API-GDC instance: Instances of the future GDC API, open source implementations

  • D171/D172 OGC API-GDC instance: Instances of the future GDC API, ECMWF use cases

  • D173/D174 Viz Client: Client instance with support for advanced visualization of GDC data, at least support for ECMWF use cases

  • D113/114 Data Client: Client instances optimized for GDC interaction with support for GDC metadata model, support for all use cases

  • D175/176 Usability Tests: All prototypes and specifications shall be tested for ease of use and technical usability. These tests are ideally executed by Geodatacube providers or users. In any case, Participants for these work items need to be different from the other Participants in this task.

  • D11 GDC Task Engineering Report: OGC Testbed-19 Geodatacubes Task Engineering Report, capturing all use cases, existing technology assessments, implementation descriptions, and recommendations for future activities together with all experiences made and lessons learned during the execution of the task.

  • D71 OGC API-GDC draft standard: Draft standard, jointly developed and under the aegis of the OGC GDC-SWG

2.4. Analysis Ready Data

As we work towards the FAIR principles of finding, accessing, interoperating, and reusing physical, social, and applied science data easily, Analysis Ready Data is a key component of this capability. It has been estimated that data analysts spend up to 80% of their time identifying, selecting, and preparing datasets in order to analyze and integrate them. Analysis readiness aims to reverse this proportion by preparing data in advance for reusability across a range of analytical tasks. We want to select data with only the necessary characteristics for our needs, select specific areas and columns of interest, select applicable layers, and rely on defined, documented quality and provenance.

Most of all, we want to be able to combine, concatenate, and intersect multiple datasets based on their compatible states of spatiotemporal referencing and phenomenon calibration. Additionally, we need visualization capabilities, both as map layers and immersive, especially for improved climate and disaster understanding, mitigation, and response. Many innovative tools and techniques are being developed to address these needs. Analysis readiness is a way of normalizing and standardizing the use of these innovations.

ARD

The Analysis Ready Data (ARD) SWG is currently being chartered to develop both a core ARD framework standard and multi-part standards for analysis readiness of specific geospatial data product families and products. Born from work undertaken and challenges identified in the OGC Disaster Pilot 2021 and OGC Testbed-16, the ARD SWG will develop a multi-part Standard for geospatial Analysis Ready Data in partnership with ISO/TC 211. The concept of ARD was initially developed by the Committee on Earth Observation Satellites (CEOS). CEOS defines ARD as “satellite data that have been processed to a minimum set of requirements and organized into a form that allows immediate analysis with a minimum of additional user effort and interoperability both through time and with other datasets.” CEOS has developed several ARD Product Family Specifications for optical and radar imagery. Through consultations with the community, CEOS has concluded that formal standardization of the CEOS-ARD concepts and specifications is needed to achieve broad uptake, particularly by the commercial sector. International ARD standardization will also help promote the broader concept and help avoid diverging interpretations of it. The OGC ARD SWG will build on the CEOS work to develop standards that both formalize that work and extend it to all geospatial data.

The goal of ARD is for data to be processed once and then easily used, and especially reused, for a variety of analyses. This task will support the work of the newly formed ARD SWG in pursuing the goal of reuse by creating and exploring a number of scenarios for generating, integrating, and applying ARD. Scenarios should examine the reuse of satellite-derived ARD and also consider the nature of analysis readiness for other geospatial datasets. Given that climate change and disaster resilience are major challenges of our time and of the ongoing digital transformation, scenarios in these domains are preferred. The scenarios should demonstrate how one can discover, access, and use reliable ARD, as well as how ARD systems can best incorporate trustworthy and high-quality workflows, interoperability and scalability, and mapping / visualization of analytical outcomes.

There is a strong link between the standardization of dimensional axes in ARD and the resource organization of Geodatacubes. Proposals that include both the ARD topic and the Geodatacube topic are encouraged.

2.4.1. Problem Statement and Research Questions

OGC is working to define foundational elements that allow different standards to be mixed and matched, in support of its mission of implementing the Findable, Accessible, Interoperable, and Reusable (FAIR) principles for scalable and repeatable use of data. The concept and implementation of analysis readiness can significantly address both climate and disaster resilience needs for information agility by improving access across interdisciplinary fields such as the natural, social, and applied sciences, engineering (civil, mechanical, etc.), public health, public administration, and other domains of analysis and application.

In this task, we are looking for participants to provide scenarios that take advantage of current CEOS-ARD specifications and products while advancing ARD capabilities towards ARD workflows that can be utilized by end users to significantly reduce processing time, repetitive user tasks, and use of computational resources, enabling faster, easier, and more imaginative analysis of data. OGC invites participants to propose scenarios that target as many of the following outcomes as possible:

  • Improve characterization of data - metadata and catalogs

  • Improve ease of integration for satellite and non-satellite data and services, especially through the medium of datacubes

  • Improve ARD interoperability of data, services and tools through open standards usage / development

  • Reduce uncertainty and risk levels of data usage

  • Improve data provenance and therefore trust by including lineage, source data, processes, validity, and data quality

  • Improve autonomous workflows and use of datacubes as workflow resources

  • Improve geolocation accuracy

  • Improve reproducible workflow characterizations

  • Refine requirements for ARD file format, tiling schemes, pixel alignment or anything related to improvement of repeatable ARD distribution, access, and integration.

OGC prefers that user scenarios be developed with a climate and/or disaster focus; however, any relevant scenario is acceptable.

2.4.2. ARD Exemplar Use Cases

Analytical Reuse

Oona develops an analytical workflow that processes different optical imagery bands to categorize land cover. She wishes to target input datasets conforming to an ARD specification so as to reduce the processing the workflow must perform and increase its reusability across different satellite data products.
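
As an illustration of this scenario, the following Python sketch (the band values, band pairing, and NDVI threshold are illustrative assumptions, not part of any ARD specification) shows how a workflow consuming ARD surface reflectance can begin directly at the analysis step, since atmospheric correction and co-registration are already done:

```python
# Hypothetical sketch: classifying pixels from ARD optical bands.

def ndvi(red, nir):
    """Normalized Difference Vegetation Index for one pixel."""
    if red + nir == 0:
        return 0.0
    return (nir - red) / (nir + red)

def classify(red_band, nir_band, threshold=0.4):
    """Label each pixel 'vegetation' or 'other' from surface reflectance.

    ARD inputs are already atmospherically corrected and co-registered,
    so the workflow starts directly at this step."""
    return [
        "vegetation" if ndvi(r, n) > threshold else "other"
        for r, n in zip(red_band, nir_band)
    ]

labels = classify([0.10, 0.30], [0.50, 0.32])  # ["vegetation", "other"]
```

Because the classification logic assumes nothing about the originating sensor beyond ARD conformance, the same workflow can be reused across different satellite data products.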

Longitudinal Interoperability

Ogden is investigating trends in surface temperature observations and uses datasets conforming to an ARD specification so as to maximize comparability of temperature estimates for a given location across multiple collections.

Analytical aggregation

Olson develops a workflow that integrates satellite collections in 3-day intervals to create cloudcover-minimized images. Use of collections conforming to an appropriate ARD specification will maximize the availability of cloud-free pixels for integration and maximize the positional accuracy of pixels in the resulting composite images.
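
The compositing step in Olson's scenario can be sketched as a per-pixel selection of the least cloudy observation. The data layout and the per-pixel cloud probability field below are assumptions for illustration:

```python
# Illustrative sketch: per-pixel cloud-minimized compositing.

def composite(scenes):
    """scenes: list of (pixels, cloud_probs) pairs, one per acquisition;
    pixels and cloud_probs are equal-length lists over the same grid.
    Returns, for each pixel, the value from the least cloudy acquisition."""
    n = len(scenes[0][0])
    out = []
    for i in range(n):
        # ARD-conformant inputs share pixel alignment, so index i refers
        # to the same ground location in every scene.
        best = min(scenes, key=lambda s: s[1][i])
        out.append(best[0][i])
    return out

# Two acquisitions within a 3-day window over a two-pixel grid.
image = composite([([1, 2], [0.9, 0.1]), ([3, 4], [0.2, 0.8])])  # [3, 2]
```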

2.4.3. Aim

Create, develop, identify, and implement where possible Analysis Ready Data (ARD) definitions and capabilities to advance provision of information to the right person in the right place at the right time. Additionally, increase ease of use of ARD through improved backend standardization and implementation of varied ARD application scenarios. This work should inform ARD implementers and users on standards and workflows to maximize ARD capabilities and operations. It should further support the ARD SWG in their standardization work. A close cooperation with the ARD SWG and associated activities is envisioned for this task.

2.4.4. Previous Work

This task is based on:

2.4.5. Work Items & Deliverables

The following diagram outlines all activities, work items, and deliverables in this task.

T19 ARD Diagram

The following list identifies all deliverables that are part of this task. Detailed requirements are stated above. All participants are required to participate in all technical discussions and support the development of the Engineering Report(s) with their contributions.

D051: ARD Task Engineering Report: An engineering report describing all the elements of an Analysis Ready Data system and its building block components. It will describe ARD requirements; identify the initial use case objective(s) and the components and elements needed to reach those objectives; describe how the testbed reached its objectives; and identify technology gaps or elements of future work where applicable. This ER should also address how this work could scale to other domains or be applied more broadly within the same domain. The report will be jointly developed by all task participants.

D151-153 ARD Demonstration Scenarios - Software implementations and demonstrations showing ARD in action. The demos shall be delivered as screen recordings and clearly describe the ARD scenario in detail, as well as how and to what extent the scenario implementation meets target outcomes of the task. Ideally, each scenario implementation is delivered as a repository and Docker container to allow future experimentation.

2.5. Agile Reference Architecture

The world has changed significantly since the Web 2.0 era around 2010: the importance of location-referenced information is increasingly about the utilization of data (and hence spatiotemporal streaming and query) rather than map presentation. The emerging technological landscape is dominated by scalable Cloud computing infrastructures, Internet of Things, and Edge computing technologies. Distributed physical system deployments enable AR/VR and Digital Twins to bridge physical and virtual environments via data integration and analysis capabilities drawing from multiple sources. The generation after this one is likely to be dominated by the semantic interoperability challenges of unleashing the opportunities of increasing computational power and data volumes.

ARA

There is an accelerating rate of change in a technological landscape that increasingly relies on APIs communicating through JSON-encoded messages rather than Web Services communicating through XML-encoded messages. YAML, which further simplifies but also extends JSON syntax, is likely to become more prevalent; in general, flexibility of encoding and data views will need to be supported through better documentation of the underlying semantics. This trend will serve to realize some of the core FAIR principles not well supported by current data exchange capabilities.

Increasingly, information and data services will need to be created, developed, and built so that they support secure, resilient data services enabling the flow of data through End-to-End (E2E) systems and networks. Interoperability of data over time spans greater than the lifetime of particular technical component configurations will support both large-scale cross-domain use of data and more dynamic environments where network availability and trust are short-term issues.

2.5.1. Problem Statement and Research Questions

We are seeking to understand how, over the next 2-3 years as we gain experience with new API technologies, information and data can operate across communications networks in a robust, FAIR manner. How can the flow of information and data be achieved so that client/server and/or federated/mesh networks can implement resilient data services for E2E systems? How can increasingly heterogeneous sources of data about the same real-world phenomena be integrated as sensors and AI provide greater potential insights? How can we scale up techniques for models of the world to interact, supporting or validating our understanding, as both modeling and data sources proliferate? In the future, information and data services are likely to be secure and digital by design, having adopted a Zero Trust Architecture approach. This will be supported by data that is secure, standardized, machine readable, and exploitable. It will be necessary to develop an understanding of how generation-after-next resilient data services will operate, and of the architecture and building block components that need to be implemented as foundations now.

Generation-after-next refers to an approach that does not yet exist and/or a contributing technology that is not yet fully understood. Its concepts will be ‘leap ahead’ ideas that challenge the boundaries of current and emerging understanding. [TRL 1-3]

Resilient data services are important because they incorporate data-centric decision making: relevant data, assured and delivered in the right way, to enable the right decisions to be made at the right place.

This work should consider the following aspects:

  1. The challenges of resilience, integration, and interoperability

  2. Universal access for discovery and assurance

  3. Transformation of heterogeneous data sources into locally-useful data forms, including reformatting and dimensional up and down-scaling.

  4. Continuous integration and testing (CI/CT)

  5. Network characteristics, incorporating intelligent monitoring to ensure Quality of Service

2.5.2. Aim

This task seeks to create, develop, and identify the architectural elements of agile reference architectures and to understand how these can be used to define different use cases allowing different implementations of API Building Blocks. This work seeks to inform how generation-after-next Resilient Data Services will operate.

2.5.3. Previous Work

  • Secure Registry and Metadata

  • Data Centric Security: authenticity, integrity and security of the data

  • Information architectures

  • Features and Geometry FG-JSON extension of GeoJSON

  • Analysis Ready Datasets

  • Machine Readable

  • RDF Knowledge graphs

  • Flexible SPARQL query

  • HTML and machine readable formats and links to implementation resources to support discovery of options

2.5.4. Work Items & Deliverables

OGC is defining the foundational elements that will allow for the mixing and matching of different standards and is embracing the Findable, Accessible, Interoperable, and Reusable (FAIR) concept for scalable and repeatable use of data. Agile reference architectures can be used to define different use cases that allow different implementations of Building Blocks. This work seeks to inform the foundations of this generation-after-next approach.

T19 ARA Diagram

Figure 01 – This illustrates processes (Collect, Process, Manage and Store, Disseminate, Query and Consume) which are important functions of a Reference Architecture.

The proposed Testbed task will focus on the design of an Agile Reference Architecture that enables and informs the generation-after-next implementation of Resilient Data Services over secure and degraded networks (Denied, Degraded, Intermittent, and Limited (DDIL)). For instance, how might ML be used to ensure and maintain quality of service if the flow of information and data services for critical E2E systems is degraded? How can autonomous systems enable the flow of information and data, ensuring data-centric decision making? It is possible that some elements have not yet been considered or designed, and this work is the first step to enable this. Engineering Report deliverables from the proposed task are listed below:

D021 ARA Task Engineering Report: An engineering report describing all the architectural elements of Agile Reference Architectures and the role of API Building Block components. It will identify the components and elements needed for the target Use Case and the responsibilities of each. Options for realizing these responsibilities with existing specifications will be considered, along with elements of future work where necessary.

D022 ARA Task Engineering Report DDIL Use Case: An engineering report describing the enterprise, engineering, information, technological, and computational viewpoints for an implementation of Resilient Data Services based on a DDIL use case.

D121: An integrated knowledge base linking machine readable specifications required to implement the target DDIL Use Case

D122. The Agile Reference Architecture represented in RDF/Turtle format (i.e. a description of how the components and specifications are related in the form of a reusable pattern that can be adapted to new circumstances)

D123. An instance of OGC API Processes that:

  • Implements the OGC API Processes Part 3: Workflows and Chaining candidate Standard to take a specification of a reference architecture as input and creates an application package for deployment in a containerisation or cloud computing environment.

  • Implements OGC API Processes Part 1: Core to conduct decision support operations that are supported by Machine Learning

  • Supports sufficient semantic annotation to identify necessary contextual information to support reuse.
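
The Part 1: Core interaction that D123 would implement follows a fixed request shape: a JSON body with an "inputs" object POSTed to /processes/{processId}/execution. The sketch below builds such a request; the process identifier and its inputs are hypothetical, and only the request shape follows the standard:

```python
import json

def build_execute_request(process_id, inputs):
    """Return (path, JSON body) for an OGC API Processes execute request.

    Only the request shape follows OGC API Processes Part 1: Core; the
    process id and inputs used below are invented for illustration."""
    path = f"/processes/{process_id}/execution"
    body = json.dumps({"inputs": inputs})
    return path, body

path, body = build_execute_request(
    "decision-support",  # hypothetical ML-backed process
    {"area_of_interest": {"bbox": [3.65472, 51.34007, 5.27169, 52.09142]},
     "model": "flood-risk"},
)
```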

D124. An instance of OGC API Features serving OpenStreetMap data represented according to the NSG Topographic Data Store schema

D125. An instance of OGC API Features (and/or STA and/or EDR) - supporting real-time observations of some phenomena relevant to the Use Case,

  • Supports sufficient semantic annotation to identify necessary contextual information to support reuse.

D126. An instance of OGC API processes to support transformation of observational data into the reusable form required by the main analytical workflow.

  • CI/CT support for specification of the transformation based on the machine-readable knowledge graph (deliverable D121).

D127. An instance of OGC API Tiles serving OS Open Zoomstack data.

D128. An instance of OGC API Records that provides metadata about all of the software components available in the system.

2.6. High Performance Computing

Notice: This topic has not yet been funded; all work proposed to the HPC topic should therefore be scoped as in-kind contributions. Some resource support will be available through OGC’s participation in the NSF-funded I-GUIDE project.

2.6.1. Problem Statement and Research Questions

Large-scale geospatial analytical computation is becoming a critical need for tackling a wide range of sustainability problems such as climate change, disaster management, and food and water security. Geospatial researchers and practitioners interested in and utilizing geospatial analytics come from a wide range of disciplines including geography, hydrology, public health, social sciences, etc. While advanced cyberinfrastructure and computer science expertise is what enables large-scale computational problem solving, experts in various geospatial-related domains cannot all be expected to have the in-depth technical expertise necessary to interact directly with high-performance computing (HPC) resources on advanced cyberinfrastructure and optimize the use of those resources for their particular computational challenges.

HPC

To bridge this gap, it will be necessary to design and implement tools as middleware between user frontend and HPC backends. Consequently, this requires serious efforts not only in terms of research and development but also in generalizing and standardizing all aspects of accessing, utilizing, and managing High-Performance Geospatial Computing (HPGC) resources that contribute to their effective use in geospatial domains.

2.6.2. Aim

Evaluate previous and current work in the application of HPC for geospatial analytics. Develop initial standards for HPGC resource definitions and processing interfaces.

2.6.3. Previous Work

T19 HPC Diagram
Requirements & Open Questions:
  • Identify key application drivers of HPGC

  • Specify essential components of HPGC workflows

  • Review and comment on the CyberGIS-Compute framework. Assess its applicability as a starting point for the standardization of HPGC

  • Review and comment on the internal working of CyberGIS-Compute’s architectural components. Suggest possible improvements to the framework

  • Identify data-intensive geospatial analytics that can leverage HPGC

  • Identify HPGC-based geospatial analytics that can be standardized and made accessible to the broader geospatial community

Open Research & Development Questions
  • Are there any (representative) geospatial libraries that can be initially adapted in HPC environments to support a wide range of geospatial workflows?

  • For data-intensive workloads, what is a standardized geospatial data model that can exploit heterogeneous HPC resources?

  • How can code be contributed to an open HPGC platform while avoiding potential misuse of HPC computational resources?

2.6.4. Work Items & Deliverables

D081: High Performance Geospatial Computing Summary ER describing the investigations, experimentation (e.g., with CyberGIS-Compute), and outcomes of the task, as well as answers to open questions and recommendations for future work.

D181: A prototype Open HPGC API Service (e.g. OGC API Processes for HPGC) to access an HPGC-enabling backend such as CyberGIS-Compute (User API, Computational task on a job queue, and/or Maintainer Workers that access specific HPGC resources).

D182: A generic Python package to access the D181 Open HPGC API that can be integrated with diverse HPGC application codes.

D183: One or more Jupyter Notebooks using D182 to access the D181 API and implement orchestration between HPGC resources and computational repositories/containers for representative use cases

D184: One or more computational repositories/containers, which may consist of Jupyter Notebooks, Application Packages, and/or other code resources that can be accessed from Notebooks, to be staged to HPGC resources and support the computational workflows orchestrated by D183.
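
The division of labour between D181 and D182 described above amounts to a thin client over a job queue. The sketch below illustrates that submit-and-poll pattern only; the endpoint names and the JSON job model are assumptions, not a published API:

```python
# Hypothetical D182-style client for a D181-style HPGC job service.

class HPGCClient:
    def __init__(self, transport):
        # transport: callable(method, path, payload) -> dict response.
        # Injected so the client stays independent of any HTTP library.
        self._send = transport

    def submit(self, application, parameters):
        """Queue a computational task; return its job identifier."""
        resp = self._send("POST", "/jobs", {"application": application,
                                            "parameters": parameters})
        return resp["job_id"]

    def status(self, job_id):
        """Poll the job queue for the current state of a task."""
        return self._send("GET", f"/jobs/{job_id}", None)["status"]

# In-memory stand-in for a real HPC backend, for demonstration only.
def fake_transport(method, path, payload):
    if method == "POST":
        return {"job_id": "job-1"}
    return {"status": "successful"}

client = HPGCClient(fake_transport)
job = client.submit("terrain-analysis", {"resolution": "30m"})
```

A notebook (D183) would use such a client to orchestrate the staging of computational repositories (D184) onto HPGC resources.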

3. Deliverables Summary

The following tables summarize the full set of Initiative deliverables. Technical details can be found in section Technical Architecture.

Please also note that not all work items were supported by sponsor funding at time of CFP publication. Negotiations with sponsors are ongoing, but there is no guarantee that every item will ultimately be funded.

Bidders are invited to submit proposals on all items of interest under the assumption that funding will eventually become available.

Table 2. CFP Deliverables - Grouped by Task

Task

ID and Name

Geospatial in Space

  • D100-101 Moving Features Components

  • D102-103 CRS & Transformation Components

  • D104-105 GeoTiff Components

  • D001 Non-Terrestrial Geospatial Engineering Report

  • D002 Extraterrestrial GeoTIFF

Machine Learning: Transfer Learning

  • D003 Machine Learning Models Engineering Report

  • D106-107 Machine Learning Components

Geodatacubes

  • D111/D112 OGC API-GDC instance

  • D171/D172 OGC API-GDC instance

  • D173/D174 Viz Client

  • D113/114 Data Client

  • D175/176 Usability Tests

  • D11 GDC Task Engineering Report

  • D71 OGC API-GDC draft standard

Analysis Ready Data

  • D051: ARD Task Engineering Report

  • D151-153 ARD Demonstration Scenarios

Agile Reference Architecture

  • D021 ARA Task Engineering Report

  • D022 ARA Task Engineering Report DDIL Use Case

  • D121 An integrated knowledge base

  • D122 Agile Reference Architecture represented in RDF/Turtle format

  • D123 An instance of OGC API Processes

  • D124 An instance of OGC API Features serving OpenStreetMap data

  • D125 An instance of OGC API Features (and/or STA and/or EDR)

  • D126 An instance of OGC API processes to support transformation of observational data into the reusable form

  • D127 An instance of OGC API Tiles serving OS Open Zoomstack data.

  • D128 An instance of OGC API Records that provides metadata about all of the software components available in the system.

High Performance Computing

  • D081: High Performance Geospatial Computing Summary ER

  • D181: An Open HPGC API Service

  • D182: Python package to access the Open HPGC API

  • D183: One or more Jupyter Notebooks using D182 to access the D181 API

  • D184: One or more computational repositories/containers

4. Miscellaneous

Call for Participation (CFP): The CFP includes a description of deliverables against which bidders may submit proposals. Several deliverables are more technical in nature, such as documents and component implementations. Others are more administrative, such as monthly reports and meeting attendance. The arrangement of deliverables on the timeline is presented in the Master Schedule.

Each proposal in response to the CFP should include the bidder’s technical solution(s), its cost-sharing request(s) for funding, and its proposed in-kind contribution(s) to the initiative. These inputs should all be entered on a per-deliverable basis, and proposal evaluations will take place on the same basis.

Once the original CFP has been published, ongoing updates and answers to questions can be tracked by monitoring the CFP Corrigenda Table and the CFP Clarifications Table. The HTML version of the CFP will be updated automatically and stored at the same URL as the original version. The PDF version will have to be re-downloaded with each revision.

Bidders may submit questions using the Additional Message textbox in the OGC COSI Program Contact Form. Question submitters will remain anonymous, and answers will be regularly compiled and published in the CFP clarifications.

A Bidders Q&A Webinar will be held on the date listed in the Master Schedule. The webinar is open to the public, but anyone wishing to attend must register using the provided link. Questions are due on the date listed in the Master Schedule.

Participant Selection and Agreements: Following the submission deadline, OGC will evaluate received proposals, review recommendations with Sponsors, and negotiate Participation Agreement (PA) contracts, including statements of work (SOWs). Participant selection will be complete once PA contracts have been signed with all Participants.

Kickoff: The Kickoff is a meeting where Participants, guided by the Initiative Architect, will refine the Initiative architecture and settle upon specific use cases and interface models to be used as a baseline for prototype component interoperability. Participants will be required to attend the Kickoff, including breakout sessions, and will be expected to use these breakouts to collaborate with other Participants and confirm intended Component Interface Designs.

Regular Telecons and Meetings After the Kickoff, participants will meet frequently via weekly telecons and in person at OGC Member Meetings.

Development of Deliverables: Development of Components, Engineering Reports, Change Requests, and other deliverables will commence during or immediately after Kickoff.

Under the Participation Agreement contracts, ALL Participants will be responsible for contributing content to the ERs, particularly regarding their component implementation experiences, findings, and future recommendations. But the ER Editor will be the primary author on the shared sections such as the Executive Summary.

More detailed deliverable descriptions appear under Types of Deliverables.

Final Summary Reports, Demonstration Event and Other Stakeholder Meetings: Participant Final Summary Reports will constitute the close of funded activity. Further development work might take place to prepare and refine assets to be shown at webinars, demonstration events, and other meetings.

Assurance of Service Availability: Participants selected to implement service components must maintain availability for a period of no less than six months after the Participant Final Summary Report milestone.

Appendix A: Geodatacubes Use Case B

A.1. A Destination Earth-Inspired GeoDataCube Use Case

A.1.1. B1: BACKGROUND

Following the recent introduction of the new EUMETSAT portfolio of Big Data Services that facilitate on-line access to data, visualizations, and post-processing capabilities, EUMETSAT has constantly been investigating methods for making data more efficiently accessible and exploitable by its users. Among the results of these efforts, research work in the area of data cubes and analysis-ready data has been performed (source 1, source 2). Continuing on this track, and also drawing inspiration from the services to be provided in the scope of Destination Earth (detailed in the next section), EUMETSAT is proposing the GeoDataCube Use Case detailed in the remainder of this document.

A.1.2. B2: INTRODUCTION TO DESTINATION EARTH

The objective of the European Commission’s Destination Earth (DestinE) initiative is to deploy several highly accurate digital replicas of the Earth (Digital Twins) in order to monitor and simulate natural as well as human activities and their interactions, to develop and test “what-if” scenarios that would enable more sustainable developments and support European environmental policies.

DestinE consists of three main components:

  • DestinE Core Service Platform (DESP) under ESA responsibility; a user-friendly platform that provides a large number of users with evidence-based policy and decision-making tools, applications and services, based on an open, flexible, scalable and evolvable secure cloud-based architecture. It will federate data, cloud and HPC infrastructures and integrate access to an increasing number of Digital Twins as they become gradually available via related European Commission and national efforts. The platform will employ novel digital technologies for providing data analytics, Earth-system monitoring, simulation and prediction capabilities to its users. At the same time, it will allow users to customize the platform, integrate their own data and develop their own applications;

  • Digital Twin Engine (DTE) under ECMWF responsibility, running simulation models. The initial two Digital Twins are the Climate Change Adaptation and the Weather-Induced and Geophysical Extremes Twins;

  • Destination Earth Data Lake (DEDL) under EUMETSAT responsibility; provides data storage, access and big data processing capabilities according to a defined and evolving DestinE data portfolio offered to DestinE users. It provides users with a seamless access to datasets via APIs, regardless of data type and location. The DEDL big data processing uses, when possible, near-data processing to maximize performance. The DestinE data lake federates with existing data holdings such as Copernicus DIAS, ESA, EUMETSAT and ECMWF as well as complementary data from diverse data spaces like in-situ or socio-economic data.

Among the data access and processing services, the DEDL will provide predefined and on-demand geospatial data cubes (GDCs) to users, removing unnecessary time spent searching, accessing, and processing data and allowing users to focus on the development and implementation of models and applications.

This document describes a typical GDC made of the datasets that will be accessible to DestinE users, as well as external sources. The goal is to benefit from the improved weather and climate variables provided by the Digital Twins for earth system modelling, considering physics-based models as well as machine learning. Rather than focusing on a specific application, the proposal draws inspiration from the analysis of some of the initial use cases to be considered for Destination Earth.

A.1.3. B3: PROPOSED GEODATACUBE STRUCTURE AND FORESEEN USAGE

Using geospatial and temporal data for modelling purposes may require a huge amount of time spent preparing input data: discovering, accessing, and retrieving data, co-registering datasets, and applying spatial and temporal upscaling/downscaling operations. A wide range of sources is needed: remote sensing time series, meteorological and climate data, soil data, land use/land cover maps, and socio-economic data, distributed by various providers in various file formats and spatio-temporal definitions. Geospatial data cubes aim to shorten the data preparation time and make access more efficient.

This proposal focuses on two applications of the GDC:

  • Applying a hydrological model at a kilometric scale, such as the LISFLOOD model. The list of required input data can be found online.

  • Gap filling of statistics with machine learning and study of climate change adaptation options using historical weather archive and climate projections

While users require the same input data, the requirements in terms of spatial scale, temporal coverage and time-steps would largely differ from one user to the other.

A.1.4. B3.1: Conformance Of The GDC To Existing Standards

It would be beneficial for the proposed data cube to comply with several standards to ensure its interoperability:

  • Using a standardized grid system, considering the ongoing work on the Discrete Global Grid Systems Specification, would be particularly relevant for DestinE given the various resolutions of input data and user requirements (link1, link2, link3)

  • Time and date definitions and timestamps should comply with ISO 8601, using UTC as a reference

  • A common definition of tiles can be foreseen. The WebMercatorQuad tiles concept could be used (see also here)

  • Finally, variable definitions, names, acronyms, and units should follow existing standards as closely as possible, e.g. WMO best practices for climate variables
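As an illustration of the WebMercatorQuad tiling convention mentioned above, the following minimal sketch computes the indices of the tile containing a WGS 84 coordinate at a given zoom level (service-specific details such as tile matrix identifiers are outside its scope):

```python
import math

def webmercatorquad_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Return the (col, row) indices of the WebMercatorQuad tile containing
    a WGS 84 longitude/latitude at the given zoom level (origin top-left)."""
    n = 2 ** zoom
    col = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    row = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp to the valid tile range for the zoom level
    return min(max(col, 0), n - 1), min(max(row, 0), n - 1)

# Tile containing the centre of the Rhine-Meuse area of interest at zoom 10
print(webmercatorquad_tile(4.46, 51.72, 10))
```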

A.1.5. B3.2: Data Cubes Dimensions

The GDC should be structured considering the following dimensions:

  • x, y (latitude, longitude),

  • t (time),

  • v (variables).

A.1.6. B3.3: Spatial Coverage

The proposal is to conduct an experiment for an area of interest centred on the Rhine-Meuse delta, in the Netherlands, an area exposed to a flood risk associated with high vulnerability in some places (see here; page 6). The region covers approximately 9,340 km². The coordinates of the bounding box, in the three most commonly used projections, are listed below.

Sample area of interest

| Projection                   | MINX    | MINY     | MAXX    | MAXY     |
|------------------------------|---------|----------|---------|----------|
| EPSG:4326 WGS 84             | 3.65472 | 51.34007 | 5.27169 | 52.09142 |
| EPSG:3857 Pseudo-Mercator    | 406841  | 6681670  | 586842  | 6816671  |
| EPSG:3035 ETRS89-ext. / LAEA | 3879479 | 3147235  | 3997194 | 3239098  |

The map projection used for the GDC is expected to be Pseudo-Mercator (EPSG:3857), but users might want to use other projections. For the two foreseen applications, two spatial scales are considered: a 1km grid for the hydrological model, and the finest administrative level available at the European scale, corresponding to Local Administrative Units (LAU). Note that in the case of the climate change adaptation study, including the gap filling of statistics, the application focuses on the extraction of a vector data cube in which the required data would be, for each administrative unit, vectors of several variables over time (one vector per administrative unit).
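For reference, the bounding-box coordinates listed above can be converted between EPSG:4326 and EPSG:3857 with the spherical Mercator formulas. In practice a library such as pyproj or GDAL would be used; the standard-library sketch below is for illustration only:

```python
import math

R = 6378137.0  # WGS 84 / Pseudo-Mercator sphere radius (metres)

def to_pseudo_mercator(lon: float, lat: float) -> tuple[float, float]:
    """Forward EPSG:3857 projection of WGS 84 degrees to metres."""
    x = math.radians(lon) * R
    y = math.log(math.tan(math.pi / 4 + math.radians(lat) / 2)) * R
    return x, y

def to_wgs84(x: float, y: float) -> tuple[float, float]:
    """Inverse EPSG:3857 projection back to WGS 84 degrees."""
    lon = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lon, lat

# Lower-left corner of the area of interest (EPSG:4326 -> EPSG:3857)
x, y = to_pseudo_mercator(3.65472, 51.34007)
print(round(x), round(y))
```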

A.1.7. B3.4: Temporal Coverage

Two distinct temporal coverages are used for the proposed GDC:

  • Long historical archives of weather data, complemented with climate projections. The goal is to cover a period of at least 30 years of historical data, plus climate projections for the next 30 years.

  • For the other datasets, and particularly remote sensing data, only the last two years of observations would be needed.

    Due to the Destination Earth development timeline, the implementation of this use case shall rely on already existing data sources. Destination Earth shall also offer access to long historical archives of weather data from Q3 2023 onwards.

A.1.8. B4: INPUT DATA

This section is divided according to the main categories of data identified:

  • Administrative and environmental spatial units (Vector maps), considered as static,

  • Static environmental data,

  • Land use/land cover maps,

  • Weather and climate data,

  • Remote sensing image time series,

  • Statistics (statistics are not included as layers in the data cube, but should be accessed by the user by querying statistical databases).

A.1.9. B4.1: Administrative and environmental spatial units

Vector maps of administrative units are intended to be used mainly for the spatial aggregation of the other data listed in this proposal, to query statistical databases, and to extract vector data cubes. The goal is to create, for each vector layer, a raster layer in the GDC with the unique identifiers of spatial units available.

| Layer name                                 | Year | Provider | URL |
|--------------------------------------------|------|----------|-----|
| FAO Global Administrative Areas            |      | FAO      | https://data.apps.fao.org/map/gsrv/gsrv1/gadm/wms - https://gadm.org/ |
| NUTS Administrative Units                  | 2021 | Eurostat | https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units/nuts |
| Local Administrative Units (LAU)           | 2019 | Eurostat | https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units/lau#lau19 |
| Grids - 1km (including population density) | 2021 | Eurostat | https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/grids/ |

* Eurostat data can also be reached following this url or using a REST API.
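The rasterization of unit identifiers described above can be sketched in pure Python. A production GDC would use a tool such as rasterio.features.rasterize or gdal_rasterize; the grid and polygons below are toy examples only:

```python
def point_in_polygon(px, py, ring):
    """Even-odd ray-casting test for a point against a polygon ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > py) != (y2 > py):
            if px < (x2 - x1) * (py - y1) / (y2 - y1) + x1:
                inside = not inside
    return inside

def rasterize_units(units, minx, miny, maxx, maxy, nx, ny, nodata=0):
    """Burn each unit's integer identifier into a grid of pixel centres.
    `units` maps an identifier to a polygon ring [(x, y), ...]."""
    dx, dy = (maxx - minx) / nx, (maxy - miny) / ny
    grid = [[nodata] * nx for _ in range(ny)]
    for row in range(ny):
        cy = maxy - (row + 0.5) * dy          # north-up raster
        for col in range(nx):
            cx = minx + (col + 0.5) * dx
            for uid, ring in units.items():
                if point_in_polygon(cx, cy, ring):
                    grid[row][col] = uid
                    break
    return grid

# Two toy "administrative units" covering a 4x4 grid
units = {1: [(0, 0), (2, 0), (2, 4), (0, 4)],   # western unit
         2: [(2, 0), (4, 0), (4, 4), (2, 4)]}   # eastern unit
grid = rasterize_units(units, 0, 0, 4, 4, 4, 4)
```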

A.1.10. B4.2: Static Environmental Data

A few static environmental datasets, with variables constant over time, are used:

| Layer             | Spatial resolution | Provider | URL |
|-------------------|--------------------|----------|-----|
| Copernicus EU-DEM | 25m                | CLMS     | https://land.copernicus.eu/imagery-in-situ/eu-dem/eu-dem-v1.1?tab=mapview |
| Soilgrids250m     | 250m               | ISRIC    | https://www.isric.org/explore/soilgrids |

* CLMS: Copernicus Land Monitoring Service.

Soilgrids250m data are available through a WCS or a WebDAV network disk (ISRIC - Index of /soilgrids/latest/data/). The variables needed are listed below, with their corresponding links. For each variable, the data are available over six soil depth intervals: 0-5cm, 5-15cm, 15-30cm, 30-60cm, 60-100cm, and 100-200cm. Only the mean prediction should be retrieved; thus, for each variable at each depth interval, the layer containing the suffix "_mean" (mean prediction) is retrieved. (Note that only a subset of variables or depth intervals may be selected for the experiment.)

| Variable name | Short description                                                      | WCS URL |
|---------------|------------------------------------------------------------------------|---------|
| bdod          | Bulk density of the fine earth fraction                                | https://maps.isric.org/mapserv?map=/map/bdod.map |
| cec           | Cation exchange capacity of the soil                                   | https://maps.isric.org/mapserv?map=/map/cec.map |
| cfvo          | Volumetric fraction of coarse fragments (> 2 mm)                       | https://maps.isric.org/mapserv?map=/map/cfvo.map |
| clay          | Proportion of clay particles (< 0.002 mm) in the fine earth fraction   | https://maps.isric.org/mapserv?map=/map/clay.map |
| nitrogen      | Total nitrogen (N)                                                     | https://maps.isric.org/mapserv?map=/map/nitrogen.map |
| phh2o         | Soil pH                                                                | https://maps.isric.org/mapserv?map=/map/phh2o.map |
| sand          | Proportion of sand particles (> 0.05 mm) in the fine earth fraction    | https://maps.isric.org/mapserv?map=/map/sand.map |
| silt          | Proportion of silt particles (≥ 0.002 mm and ≤ 0.05 mm) in the fine earth fraction | https://maps.isric.org/mapserv?map=/map/silt.map |
| soc           | Soil organic carbon content in the fine earth fraction                 | https://maps.isric.org/mapserv?map=/map/soc.map |
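A GetCoverage request against the ISRIC WCS endpoints above could be assembled as follows. The coverage identifier pattern (`<variable>_<depth>_mean`) and the output format name are assumptions to be verified against the service's GetCapabilities response:

```python
from urllib.parse import urlencode

ISRIC_WCS = "https://maps.isric.org/mapserv"

def soilgrids_getcoverage_url(variable: str, depth: str, bbox) -> str:
    """Build a WCS 2.0.1 GetCoverage URL for a SoilGrids mean-prediction
    layer. The coverage naming '<variable>_<depth>_mean' and the GeoTIFF
    format name are assumptions, to be checked against GetCapabilities."""
    minx, miny, maxx, maxy = bbox
    params = {
        "map": f"/map/{variable}.map",
        "SERVICE": "WCS",
        "VERSION": "2.0.1",
        "REQUEST": "GetCoverage",
        "COVERAGEID": f"{variable}_{depth}_mean",
        "FORMAT": "GEOTIFF_INT16",
        "SUBSET": [f"X({minx},{maxx})", f"Y({miny},{maxy})"],
    }
    return ISRIC_WCS + "?" + urlencode(params, doseq=True)

# Bulk density, 0-5 cm depth interval, for a sample bounding box
url = soilgrids_getcoverage_url("bdod", "0-5cm",
                                (406841, 6681670, 586842, 6816671))
```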

A.1.11. B4.3: Land Use And Land Cover Maps

Three distinct high-resolution datasets are used here:

  • The first is distributed by CLMS, called the high resolution layers, corresponding to global maps of specific land use classes at 10m resolution: 1. Imperviousness, 2. Tree Cover, 3. Grassland, 4. Water and wetness. Only the status maps for year 2018 are considered and can be retrieved.

  • The second is a global map of land use and land cover, Worldcover, distributed by the ESA through a WMTS mapping service.

  • The third data source is the Land Parcel Identification System (LPIS), identified in Dutch as BRP (Basisregistratie Gewaspercelen). These vector layers, available on a yearly basis, report the cultivated crop for each field. Data for 2021 and 2022 are distributed as geopackages.

The only relevant field associated with the maps is the crop name (gewas).

A.1.12. B4.4: Weather and Climate Data

Regarding weather and climate data, the Destination Earth Digital Twin outputs will only be available from Q3 2023 onward, which is incompatible with the timeline of the Testbed-19 implementation. We therefore propose alternatives based on the ERA5 weather reanalysis and a CMIP6 climate change scenario. Only data at ground level (single level) would be necessary.

| Dataset | Spatial resolution | Frequency | Temporal coverage | Provider | URL |
|---------|--------------------|-----------|-------------------|----------|-----|
| ERA5-Land hourly reanalysis - single level (surface) | 9km  | Hourly | 1982 to present          | CDS (Climate Data Store) | https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-land?tab=overview |
| Digital twins on climate change adaptation           | 9km  | Daily  | Present to next 30 years | ECMWF | Provisional; no guarantee the simulations can be provided on time |
| CMIP6 - SSP2-4.5 - EC-Earth3-CC model                | 80km | Daily  | 2000-2049                | CDS   | https://cds.climate.copernicus.eu/cdsapp#!/dataset/projections-cmip6?tab=form |

The table below lists the variables to be retrieved from ERA5 land. For the CMIP6 scenario, all variables would be needed (daily maximum near-surface air temperature, daily minimum near-surface air temperature, near-surface air temperature, near-surface specific humidity, precipitation, and sea level pressure).

| ShortName | Name                              | Temporal aggregation |
|-----------|-----------------------------------|----------------------|
| 10u       | 10m_u_component_of_wind           | mean                 |
| 10v       | 10m_v_component_of_wind           | mean                 |
| 2d        | 2m_dewpoint_temperature           | mean                 |
| 2t        | 2m_temperature                    | min, mean, max       |
| sp        | surface_pressure                  | mean                 |
| ssrd      | surface_solar_radiation_downwards | sum                  |
| tp        | total_precipitation               | sum                  |
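A retrieval of the ERA5-Land variables listed above might be expressed with the CDS API client (cdsapi) as sketched below; the request is illustrative, and the actual call (commented out) requires CDS credentials:

```python
# Sketch of an ERA5-Land retrieval request; dataset and variable names
# follow the CDS catalogue entry referenced above.
request = {
    "variable": [
        "10m_u_component_of_wind", "10m_v_component_of_wind",
        "2m_dewpoint_temperature", "2m_temperature",
        "surface_pressure", "surface_solar_radiation_downwards",
        "total_precipitation",
    ],
    "year": "2022",
    "month": [f"{m:02d}" for m in range(1, 13)],
    "day": [f"{d:02d}" for d in range(1, 32)],
    "time": [f"{h:02d}:00" for h in range(24)],
    # Area of interest: North, West, South, East (degrees)
    "area": [52.09142, 3.65472, 51.34007, 5.27169],
    "format": "netcdf",
}

# import cdsapi
# cdsapi.Client().retrieve("reanalysis-era5-land", request, "era5_land_2022.nc")
```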

A.1.13. B4.5: Remote sensing

Remote sensing data of different sensors and processing levels are used here. As mentioned before, only the last two years of data would be required.

Dataset Spatial resolution Frequency Provider URL

Global 10-daily Leaf Area Index

300m

10-daily

CLMS

https://land.copernicus.eu/global/products/lai

Daily Surface Soil Moisture (SSM)

1km

Daily

CLMS

https://land.copernicus.eu/global/products/ssm

Sentinel 2 High resolution vegetation indices (LAI and QFLAG2)

10m

5-daily

CLMS/Wekeo

https://land.copernicus.eu/pan-european/biophysical-parameters/high-resolution-vegetation-phenology-and-productivity/data-access-hr-vpp - https://www.wekeo.eu/data?view=viewer

SENTINEL-2 Level-2A MSI

10m

5-daily

COAH (Copernicus Open Access Hub)

https://scihub.copernicus.eu/dhus/#/home

A.1.14. B4.6: Statistics

A non-exhaustive list of data sources is presented here. Statistical databases are not spatialized; they can be incomplete, or can require a harmonization/homogenization process, filtering out erroneous data, or filling in missing data. No datasets need to be retrieved and used for this proposal. The databases listed here are only provided as supplementary information, potentially useful for defining an API for GDC interoperability. The main interest would be to ease access to the available statistics for a region of interest by defining a method within the GDC interoperable API that queries a database through an API provided by the statistical institutes, using the identifiers of the statistical units to retrieve data for the spatial coverage of the GDC (e.g. NUTS region identifier, country identifier). Statistics are often provided in the SDMX format, and their access would be facilitated if a standard were defined for the interaction between SDMX, OGC web mapping services, and GDCs.

A.1.15. B4.6.1: Eurostat

Eurostat is the main provider of statistics considered. Some of the services for retrieving statistics were decommissioned in January 2023. Information about the current API is available online.

A.1.16. B4.6.2: UN-COMTRADE

UN COMTRADE (global annual and monthly trade statistics database) is a potential candidate for some applications, e.g. emergency response in case of a disaster disrupting the availability of goods or services, hampering the transport of goods, impacts of weather extremes on production and exchanges of goods.

UN COMTRADE data are distributed through an API.

A.1.17. B4.6.3: CBS - Local Statistics Of The Netherlands

Member state statistical institutes are the best sources for local statistics. For the Netherlands, official statistics are collected and distributed by CBS (Centraal Bureau voor de Statistiek). Local statistics are available online and distributed through an OData REST API.

A.1.18. B5: PROCESSES AND WORKFLOWS

Consistent metadata should be maintained while users are working with the GDC, keeping track of the data preparation workflow specific to each input dataset and variable, from the input data source up to the fine-tuned user outputs, with a detailed description of each process, ideally including user-defined functions.
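A hypothetical provenance record for one derived layer might look like the following; the field names are illustrative only and are not drawn from any existing metadata standard:

```python
# Minimal, hypothetical provenance record for a derived GDC layer.
provenance = {
    "layer": "2m_temperature_daily_mean",
    "source": {"dataset": "reanalysis-era5-land", "provider": "CDS"},
    "steps": [
        {"process": "unit_conversion", "description": "K to degrees C"},
        {"process": "temporal_aggregation", "reducer": "mean",
         "from": "hourly", "to": "daily"},
    ],
    "generated": "2023-04-01T12:00:00Z",   # ISO 8601, UTC
}
```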

A.1.19. B5.1: Unit conversion

All input variables should be scaled on the fly, using the biases and offsets for remote sensing images and any scaling factors available. For meteorological variables, several unit conversions would be necessary. A non-exhaustive list of conversions includes: kelvin to degrees Celsius for air temperature, precipitation from metres to millimetres, and solar radiation from J.m-2 to W.m-2. Any conversion would ideally be stored as metadata.
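The conversions mentioned can be expressed as simple functions; note that ERA5 radiation is accumulated hourly, so the J.m-2 to W.m-2 conversion divides by the accumulation period:

```python
def kelvin_to_celsius(t_k: float) -> float:
    """Air temperature: kelvin to degrees Celsius."""
    return t_k - 273.15

def metres_to_millimetres(p_m: float) -> float:
    """Precipitation: ERA5 total precipitation is in metres of water."""
    return p_m * 1000.0

def joules_to_watts(flux_j_m2: float, seconds: float = 3600.0) -> float:
    """Radiation: accumulated flux (J m-2 over the accumulation period,
    hourly for ERA5) to mean power flux (W m-2)."""
    return flux_j_m2 / seconds
```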

A.1.20. B5.2: Environmental Static Data

One variable, the slope, would be derived from the Copernicus DEM. This can be achieved using GDAL or the RichDEM Python package. The soil data could be used to derive soil hydraulic properties, for example using the euptf2 R package.
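With GDAL, the slope can be computed with, e.g., `gdaldem slope dem.tif slope.tif`. The underlying computation is sketched here in pure Python using central differences (a simplification of the Horn method that GDAL applies):

```python
import math

def slope_degrees(dem, i, j, cellsize):
    """Slope at interior cell (i, j) of a row-major elevation grid, using
    central differences over the four edge-adjacent neighbours."""
    dz_dx = (dem[i][j + 1] - dem[i][j - 1]) / (2 * cellsize)
    dz_dy = (dem[i + 1][j] - dem[i - 1][j]) / (2 * cellsize)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))

# A plane rising 25 m per 25 m cell eastward has a 45-degree slope
dem = [[x * 25.0 for x in range(3)] for _ in range(3)]
print(slope_degrees(dem, 1, 1, 25.0))
```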

A.1.21. B5.3: Weather And Climate Data

The following operations would be applied on ERA5 data:

  • Deriving relative humidity from the dewpoint, temperature and surface pressure.

  • The 10m u and v components of wind should first be extrapolated to 2m u and v components, and then converted to wind speed (m/s) and azimuthal direction. These operations can be done using CDO (Climate Data Operators) or a Python UDF.
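The two derivations above can be sketched as follows, assuming the Magnus approximation for saturation vapour pressure and the FAO-56 logarithmic wind profile for the 10 m to 2 m extrapolation (the constants are commonly cited values, not prescribed by this proposal):

```python
import math

def _es(t_c: float) -> float:
    """Saturation vapour pressure (hPa), Magnus approximation."""
    return 6.1094 * math.exp(17.625 * t_c / (243.04 + t_c))

def relative_humidity(t2m_c: float, d2m_c: float) -> float:
    """Relative humidity (%) from 2 m temperature and dewpoint (deg C)."""
    return 100.0 * _es(d2m_c) / _es(t2m_c)

def wind_2m(u10: float, v10: float) -> tuple[float, float]:
    """2 m wind speed (m/s) and meteorological direction (degrees, the
    direction the wind blows FROM), using the FAO-56 log profile to
    bring the 10 m components down to 2 m."""
    factor = 4.87 / math.log(67.8 * 10.0 - 5.42)   # ~0.748
    u2, v2 = u10 * factor, v10 * factor
    speed = math.hypot(u2, v2)
    direction = math.degrees(math.atan2(-u2, -v2)) % 360.0
    return speed, direction
```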

All variables are then aggregated at the daily scale, using four different reducers: mean, min, max, and sum, as listed in the table of ERA5 input variables. The minimum and maximum reducers are only applied to temperature (to derive minimum and maximum daily temperatures).

From the daily aggregates, the potential evapotranspiration is calculated (Penman-Monteith or another equation). SAGA GIS can be used for this purpose (link 1, link 2). Ideally, to limit data transfer and avoid downloading hourly variables, the cdstoolbox could be used.

A.1.22. B5.4: Remote Sensing

Several processes could be applied to remote sensing image time series. As high-resolution Sentinel-2 time series can be incomplete (cloudy data), users would have to process them to obtain consistent time series. This can be done in two ways:

  • By creating 10-day or 15-day composites keeping the maximum value of a vegetation index. In this case, the number of valid observations for the period considered is generally kept in the metadata.

  • Missing data can also be estimated by interpolating values between two observations, creating daily time-series. More complex processes can be applied for this purpose, such as a neural network to simulate missing data.

As a next step, the image time series can be smoothed, for example with a moving average. Using the smoothed image time series, the user can finally proceed to a land use/land cover classification, using a neural network or a decision tree (considering that LPIS data are not available in real time). This would be achieved using full Sentinel-2 scenes.
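The gap-filling and smoothing steps described above can be sketched on a toy series, where None marks cloud-masked observations:

```python
def fill_gaps(series):
    """Linearly interpolate None gaps between valid observations
    (e.g. cloud-masked values in a Sentinel-2 vegetation index series)."""
    out = list(series)
    valid = [i for i, v in enumerate(out) if v is not None]
    for a, b in zip(valid, valid[1:]):
        step = (out[b] - out[a]) / (b - a)
        for i in range(a + 1, b):
            out[i] = out[a] + step * (i - a)
    return out

def moving_average(series, window=3):
    """Centred moving average; endpoints use the available neighbours."""
    half = window // 2
    return [sum(series[max(0, i - half):i + half + 1]) /
            len(series[max(0, i - half):i + half + 1])
            for i in range(len(series))]

lai = fill_gaps([1.0, None, None, 4.0, 5.0])   # -> [1.0, 2.0, 3.0, 4.0, 5.0]
smooth = moving_average(lai)
```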

A.1.23. B5.5: Spatial Aggregation By Land Use/Land Cover Class

The spatial aggregation processes applied to data can be simple, such as resampling all input variables to a desired resolution (e.g. from 10m to 1km).

More complex aggregations are also foreseen, such as aggregating (averaging) an indicator available at 10m resolution, for a specific land use/land cover class, to a 1km spatial resolution or at an administrative level. For instance, a user may want to aggregate data only over urban areas or wheat-cultivated fields, or may want to aggregate values for several land use/land cover classes.

The most common reducer is the mean, but various reducers could be used, such as the median, standard deviation, or interquartile range. Geopandas groupby operations are a good example of such processes.
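A minimal sketch of class-constrained aggregation: averaging a fine-resolution value raster to coarse blocks using only the pixels of one land use/land cover class (a toy 2x2 grid stands in for the 100x100 pixels of 10 m data per 1 km cell):

```python
from collections import defaultdict

def zonal_mean_by_class(values, classes, target_class, block=100):
    """Aggregate a fine-resolution value raster to coarse blocks,
    averaging only the pixels whose class raster matches target_class."""
    sums, counts = defaultdict(float), defaultdict(int)
    for i, row in enumerate(values):
        for j, v in enumerate(row):
            if classes[i][j] == target_class:
                key = (i // block, j // block)   # coarse block index
                sums[key] += v
                counts[key] += 1
    return {k: sums[k] / counts[k] for k in sums}

# Toy example: mean of "wheat" (class 1) pixels, block size 2 -> one cell
values = [[10.0, 20.0], [30.0, 40.0]]
classes = [[1, 0], [1, 0]]
wheat_mean = zonal_mean_by_class(values, classes, target_class=1, block=2)
```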

The LPIS data could be used to extract the mean LAI over wheat fields for each observation date, and this process could be part of an entire workflow that the user replicates in near real time as new observations arrive.

Spatial aggregation by land use/land cover class

A.1.24. B5.6: Access to web services, earth observation platforms, other data cubes

At this stage, or earlier in the workflow, the user should be able to query one of the statistical sources mentioned above and create a new layer of information in the data cube. Interactions with other data cubes and web mapping services should also be possible. Some examples of interactions foreseen are listed here:

  • Interact with various web mapping services (W*S servers); typical examples are the FAO CSW, or the layers distributed by the EEA through an ArcGIS REST API

  • Accessing STAC catalogues

  • Interact with other earth observation platforms (Google Earth Engine, JeoDPP), possibly through the openEO API or WPS services

  • Retrieve data from spatial databases (PostGIS, ArcGIS, etc.) owned by or accessible to the user

A.1.25. B5.7: Output File Format

To broaden interoperability, particularly with physics-based models, the user may need to export the prepared data in a specific file format, ready to be used to run a model. There are various reasons why a user would need to export the data: many models are programmed in Fortran, and/or the user does not have the source code, making adapting the model to the GDC API more time-consuming than simply exporting files.

Regarding output file formats, in the case of the implementation of a hydrological model, the user should be able to export Cloud Optimized GeoTIFF (COG), GRIB2, or NetCDF files, and choose the compression method (for GRIB2, JPEG 2000 compression would be preferred).

Regarding the second application, the study of adaptation to climate change after estimating missing statistics, users should be able to export a CSV file containing, as a header, the identifiers of the spatial units considered, together with their centroids (x, y), variable names, and dates. Ideally, the resulting data should be available to other users through a W*S service or any other method.

Appendix B: Geodatacubes Use Case C

B.1. Case Study: Meteorological Data Cubes

B.1.1. C1: INTRODUCTION to ECMWF DATA CUBES

ECMWF is an intergovernmental organisation whose main responsibility is operational numerical weather prediction (NWP). ECMWF runs 4 operational forecast cycles per day, consisting of one high-resolution deterministic forecast and a 51-member ensemble forecast. The high-resolution deterministic forecast provides the best guess of the future atmospheric conditions, and forecasts to 10 days ahead. The ensemble forecasts are perturbed forecasts which represent the uncertainty in the known state of the atmosphere and model parameterisation. They are used to create probabilistic forecasts, and typically forecast up to 15 days ahead.

The users of this data are national weather services, commercial users, and the general public. A large portion of ECMWF data is served as Open Data. All forecast data output is also archived in the Meteorological Archival and Retrieval System (MARS), which currently contains over 400PiB of meteorological data.

As well as operational forecasts, ECMWF also conducts research and is involved in projects such as Copernicus and Destination Earth, all of which produce data that is curated for semantic storage and access in the same systems.

All data at ECMWF, whether it is operational or research, is stored and accessed using a semantic data modelling language (the MARS language), and the entirety of the data is addressed as a single, multi-dimensional datacube. The granularity of the datacube is two-dimensional global fields, so latitude and longitude are not part of the datacube axes explicitly.

B.1.2. C1.1: Datacube Dimensionality

The dimensions of the ECMWF datacube include all the axes which describe our data. An example of a retrieval request using the MARS language is shown:

ecmwf1

This request highlights some key properties of the meteorological datacube, which are described below.

B.1.3. C1.2: Branching

The datacube does not have the same dimensionality in all directions. For example, choosing stream “enfo” for ensemble forecasts means that the user has selected a branch of the datacube which stores 51 ensemble members, and the “number” axis exists in this branch to label each ensemble member. For stream “oper”, for the high-resolution deterministic model, there is no “number” axis.

Likewise selecting levtype “sfc” (surface) instead of “pl” (pressure levels) would reduce the dimensionality of the datacube again, because there is only one surface level, and the “levelist” dimension disappears.

B.1.4. C1.3: Irregularity and Sparsity

Similarly, the datacube does not have the same length in all dimensions. For both deterministic (stream “oper”) and ensemble members (stream “enfo”) the step axis (the forecast timestep, or lead-time, in hours) exists, but it has a different length. The deterministic model forecasts to 10 days (step 240) but the ensemble forecasts forecast to 15 days (step 360) or more. Additionally, not every step is written: some forecasts will write output every 3rd step or every 6th step, and may change frequency at certain checkpoints. For example, the last 5 days of the ensemble forecasts produce data less frequently.

Fundamentally, the schema of the datacube is a tree, where the values specified for slower-moving axes recursively determine the structure (dimensionality and size) of the sub-datacubes. The overall datacube can be perceived as a tree of smaller datacubes, with different branches for different metadata axes; however, it is possible for the user to retrieve data from multiple branches at once, so this simplification does not help in conceptualizing an API. At the API level, we typically consider all axes part of the datacube regardless of how the data is stored.

B.1.5. C1.4: Latitude and Longitude

Our datacube stores two-dimensional global fields such that, currently, the granularity of the request does not include latitude and longitude. The main reason for this is that many of the global fields are not regular lat-lon cartesian grids, thus they cannot simply be expressed as cartesian dimensions in a datacube. The grids are typically octahedral grids with semi-unstructured topology:

ecmwf2

However, we support processing keys as part of the request which serve a similar purpose:

ecmwf3

These axes do not exist in the datacube itself, but the user does not need to care about the internal representation.

B.1.6. C1.5: Measurable, Countable and Metadata Axes

We make the distinction between measurable and non-measurable axes. A measurable axis is an axis where it physically makes sense to express a range of values. All of the temporal and spatial axes are measurable:

ecmwf4

Note that meteorological forecast data has two axes of time: the “date-time” combination, which specifies the start time of the forecast, and the “step”, which specifies the timestep of the model. The forecast produced on Monday at 00Z, at step 48, represents the same point in time as the forecast produced on Tuesday at 00Z, at step 24. Indeed, the same point in time is represented by many sequential cycles of the forecast. To fully specify the data, both time axes need to be given.
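The relationship between the two time axes can be expressed directly: the valid time is the forecast start time plus the lead time. A small sketch:

```python
from datetime import datetime, timedelta

def valid_time(base: datetime, step_hours: int) -> datetime:
    """Point in time represented by a forecast started at `base`
    (date-time axis) at lead time `step_hours` (step axis)."""
    return base + timedelta(hours=step_hours)

monday = datetime(2023, 5, 1, 0)    # a Monday, 00Z
tuesday = datetime(2023, 5, 2, 0)   # the following Tuesday, 00Z

# Monday 00Z at step 48 and Tuesday 00Z at step 24 are the same valid time
assert valid_time(monday, 48) == valid_time(tuesday, 24)
```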

Other axes are non-measurable but still accept a range of values. For example, the ensemble number axis is non-measurable, but it is countable (it can be ordered and numbered using natural numbers). We allow range-based requests on countable axes:

ecmwf5

The remaining axes are non-measurable and uncountable and can be considered as unordered axes. Some of these keys can only accept single-value queries, others can be multi-valued (where the structure of their sub-datacubes are equal) and they take a list of values. Measurable and countable axes also allow a list of values, as well as single values and ranges.

ecmwf6

Conceptually, unordered axes which only accept a single value can be considered metadata for navigating to a branch of the datacube, rather than part of the query. However, it is important that a datacube access standard allows querying to more than just the common three spatial axes and temporal axis; it must support querying across any arbitrary measurable, countable axis or multi-valued metadata axis, using ranges or a list of values.

B.1.7. C1.6: Special Values and Default Values

Measurable and countable axes both take range-based values with a syntax of start/to/end/by/step, but there are other special values that can be used. For example, the “ALL” value retrieves all discrete points in a dimension. Certain metadata axes also take special values, for example, the “date” axis takes negative integers which refer to the latest forecasts at any given moment.
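As an illustrative sketch only (keywords and exact syntax should be checked against ECMWF’s MARS documentation), ranges and special values might appear in a request as:

```
retrieve,
  class = od, stream = enfo, type = pf,
  date = -1,
  time = 0000,
  step = 0/to/240/by/6,
  number = 1/to/51,
  levtype = sfc,
  param = ALL
```

Here “date = -1” selects the latest forecast date, “0/to/240/by/6” requests every sixth step out to 10 days, “1/to/51” requests all ensemble members, and “ALL” retrieves every parameter in the branch.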

All axes also have sensible defaults, which will vary depending on the branch of the datacube selected (“enfo” defaults will be different to “oper” defaults). The following request is valid:

ecmwf7

B.1.8. C1.7: Return Types

The data is returned to the user as GRIB messages (1 message per field). A REST API with the ability to extract from a global lat-lon field may return in other formats in the future, and this should be standardised.

B.1.9. C1.8: Verbs

The requests shown above are data retrieval requests. Similar requests are used for archival; this is beyond the scope of this document, as users typically only retrieve data.

B.1.10. C2: NON-ORTHOGONAL DATA ACCESS

ECMWF has recently been developing new ways of interacting with our datacube to benefit our users. Our current API supports sub-setting our datacube by specifying ranges along orthogonal axes. The result of a retrieval request is always just a smaller datacube. For a user requiring data for a polygonal region, or a spatio-temporal corridor, delivering an orthogonal bounding datacube around the data of interest can be highly inefficient and places a significant post-processing burden on our users. As our forecast model resolution increases, this is posing a scalability challenge as well as a cognitive burden on our users.

We wish to support our users by allowing them to express exactly the data they need. We want to allow the user to provide any n-dimensional polygon (a “polytope”) which describes their query across any measurable or countable axis, and then use this as a stencil to cut out data from the datacube. This includes extracting data from non-cartesian lat-lon axes.

ecmwf8

The interface for this data access is not fully defined, and we would like to see if it could be defined as part of the GeoDatacubes standard. We are currently considering three variations on this API:

  • Low-level: the user provides raw n-dimensional convex polytope vertices which define the bounds of the request.

  • Mid-level: the user can construct requests using constructive geometry and simple shapes (Box, Sphere, Path, Polygon). The mid-level API converts the request into raw convex polytopes which are used with the low-level API.

  • High-level: the user can ask for common domain-specific features (time-series, vertical profiles, trajectories, frames). The high-level API converts the request into a request to the mid-level API.

This resultant data should be analysis-ready, with minimal further post-processing required by the user to integrate into their analysis. The return type should be some form of hierarchical data structure or point-cloud, and should be standardised and configurable by the user.

B.1.11. C3: USAGE EXAMPLES

For further context, it is useful to provide some example usage patterns of ECMWF meteorological data.

  • User A requests a vertical profile of multiple parameters for a specific area, from the latest forecast, for a particular step. This allows the user to see how the weather in their region is developing. This request crosses two axes (parameter and level) and also sub-sets in lat-lon.

  • User B requests multiple physical parameters from the latest forecast, for the entire forecast period, for a particular 3D region, in order to provide boundary conditions for a higher-resolution regional forecast. This request crosses three axes (parameter, step and level) and also sub-sets in lat-lon.

  • User C requests a rainfall parameter from the latest forecast, for 3 days ahead, from all ensemble members. The ensemble forecasts predict a range of weather possibilities to account for uncertainties in the forecasting process. The user can estimate the likelihood of rain from this data. Due to the volume of the data, the user is likely to specify that this data be re-gridded to 1-degree resolution using the “grid” key. This request crosses one axis (ensemble number) and includes a processing directive.

  • User D requests global fields of several parameters from a range of different past forecasts and forecast lead-times. This allows the user to evaluate how the weather forecast has been changing as the lead-time decreases. This request crosses three axes (date/time, step and parameter). Note that this request crosses both axes of time.

  • User E requests several parameters over time and space corresponding to the path of a tropical cyclone in the Atlantic. This request crosses multiple axes (date-time, parameter, level) and also sub-sets in lat-lon. The bounding box of this data is very large, so the request may have to be split, with different lat-lon regions for different date-times, corresponding to the location of the tropical cyclone over time. This user would be more efficiently served by an API which supports non-orthogonal requests.

B.1.12. C4: SUMMARY OF REQUIREMENTS FOR A GEODATACUBE API

Our current data access is not implemented as a REST service yet. Given a suitable standard, we would like to offer this. The following is a summary of the requirements of a standard which would support our current datacube access:

  • Must support querying across any arbitrary measurable, countable, or multi-valued metadata dimension, for example across datetime, step, ensemble number, or parameter. This includes querying across multiple axes of time. The names of these axes should not be hardcoded; they should be flexible to support different domains and different use-cases.

  • Must support axes with non-uniform spacing

  • Must take into account non-cartesian gridded data, which is significantly different to raster data

  • Must support processing directives for re-gridding, thinning, interpolating or other transformations of data

  • Must support domain-specific interpretation of special values in queries (e.g. “ALL”, “-1”), which apply some form of transformation to the input values.

  • Must scale to very large datasets, where it may take some time to serve the data. It is expected that this is through some form of asynchronous request design.

  • Must support optional keys, where the implementation can provide sensible defaults

  • Ideally, supports non-orthogonal data access requests

B.1.13. C5: PROPOSED TESTBED-19 USE-CASES

B.1.14. C5.1: Conventional Access

Given an n-dimensional datacube of meteorological data, which may be stored in a hierarchical object store such as FDB or some other datacube representation, we wish to serve data according to a user’s requests, which contain a combination of measurable axes, countable axes, and metadata. The MARS language for requests above can be used as a basis, but sensible simplifications may be possible (such as merging date and time into a datetime object). The usage examples (A to D) above may serve as good examples for this type of access.

B.1.15. C5.2: Non-Orthogonal Access

Given an n-dimensional datacube of meteorological data, which may be stored in a hierarchical object store such as FDB or some other datacube representation, we wish to serve data according to a user’s requests, which contain a combination of measurable axes, countable axes, metadata, and n-dimensional geometric stencils (polytopes or equivalent). The usage example (E) above may serve as a good example for this type of access.

Appendix C: Testbed Organization and Execution

C.1. Initiative Policies and Procedures

This initiative will be conducted within the policy framework of OGC’s Bylaws and Intellectual Property Rights Policy ("IPR Policy"), as agreed to in the OGC Membership Agreement, and in accordance with the OGC COSI Program Policies and Procedures and the OGC Principles of Conduct, the latter governing all related personal and public interactions.

Several key requirements are summarized below for ready reference:

  • Each selected Participant will agree to notify OGC staff if it is aware of any claims under any issued patents (or patent applications) which would likely impact an implementation of the specification or other work product which is the subject of the initiative. Participant need not be the inventor of such patent (or patent application) in order to provide notice, nor will Participant be held responsible for expressing a belief which turns out to be inaccurate. Specific requirements are described under the "Necessary Claims" clause of the IPR Policy.

  • Each selected Participant will agree to refrain from making any public representations that draft Engineering Report (ER) content has been endorsed by OGC before the ER has been approved in an OGC Technical Committee (TC) vote.

  • Each selected Participant will agree to provide more detailed requirements for its assigned deliverables, and to coordinate with other initiative Participants, at the Kickoff event.

C.2. Initiative Roles

The roles generally played in any OGC COSI Program initiative include Sponsors, Bidders, Participants, Observers, and the COSI Program Team. Explanations of the roles are provided in Tips for New Bidders.

The COSI Team for this Initiative will include an Initiative Director and an Initiative Architect. Unless otherwise stated, the Initiative Director will serve as the primary point of contact (POC) for the OGC.

The Initiative Architect will work with Participants and Sponsors to ensure that Initiative activities and deliverables are properly assigned and performed. They are responsible for scope and schedule control, and will provide timely escalation to the Initiative Director regarding any high-impact issues or risks that might arise during execution.

C.3. Types of Deliverables

All activities in this testbed will result in a Deliverable. These Deliverables generally take the form of Documents or Component Implementations.

C.3.1. Documents

Engineering Reports (ER) and Change Requests (CR) will be prepared in accordance with OGC published templates. Engineering Reports will be delivered by posting on the (members-only) OGC Pending directory when complete and the document has achieved a satisfactory level of consensus among interested participants, contributors and editors. Engineering Reports are the formal mechanism used to deliver results of the COSI Program to Sponsors and to the OGC Standards Program for consideration by way of Standards Working Groups and Domain Working Groups.

Tip

A common ER Template will be used as the starting point for each document. Various template files will contain requirements such as the following (from the 1-summary.adoc file):

The Executive Summary shall contain a business value statement that should describe the value of this Engineering Report to improve interoperability, advance location-based technologies or realize innovations.

Ideas for meeting this particular requirement can be found in the CFP Background as well as in previous ER content such as the business case in the SELFIE Executive Summary.

Document content should follow this OGC Document Editorial Guidance (scroll down to view PDF file content). File names for documents posted to Pending should follow this pattern (replacing the document name and deliverable ID): OGC Testbed-19: Aviation Engineering Report (D001). For ERs, the words Engineering Report should be spelled out in full.

C.3.2. Component Implementations

Component Implementations include services, clients, datasets, and tools. A service component is typically delivered by deploying an endpoint via an accessible URL. A client component typically exercises a service interface to demonstrate interoperability. Implementations should be developed and deployed in all threads for integration testing in support of the technical architecture.

Important

Under the Participation Agreement contracts, ALL Participants will be responsible for contributing content to the ERs, particularly regarding their component implementation experiences, findings, and future recommendations. But the ER Editor will be the primary author on the shared sections such as the Executive Summary.

Component implementations are often used as part of outreach demonstrations near the end of the timeline. To support these demos, component implementations are required to include Demo Assets. For clients, the most common approach to meet this requirement is to create a video recording of a user interaction with the client. These video recordings may optionally be included in a new YouTube Playlist such as this one for Testbed-15.

Tip

Videos to be included in the new YouTube Playlist should follow these instructions:

  • Upload the video recording to the designated Portal directory (to be provided), and

  • Include the following metadata in the Description field of the upload dialog box:

    • A Title that starts with "OGC Testbed-19:", keeping in mind that there is a 100-character limit [if no title is provided, we’ll insert the file name],

    • Abstract: [1-2 sentence high-level description of the content],

    • Author(s): [organization and/or individuals], and

    • Keywords: [for example, OGC, Testbed-19, machine learning, analysis ready data, etc.].

Since server components often do not have end-user interfaces, participants may instead support outreach by delivering static UML diagrams, wiring diagrams, screenshots, etc. In many cases, the images created for an ER will be sufficient as long as they are suitable for showing in outreach activities such as Member Meetings and public presentations. A server implementer may still choose to create a video recording to feature their organization more prominently in the new YouTube playlist. Another reason to record a video might be to show interactions with a "developer user" (since these interactions might not appear in a client recording for an "end user").

Tip

Demo-asset deliverables are slightly different from TIE testing deliverables. The latter don’t necessarily need to be recorded (though they often appear in a recording if the TIE testing is demonstrated as part of one of the recorded weekly telecons).

C.4. Proposal Evaluation

Proposals are expected to be brief, broken down by deliverable and precisely addressing the work items of interest to the bidder. Details of the proposal submission process are provided under the General Proposal Submission Guidelines.

Proposals will be evaluated based on criteria in two areas: technical and management/cost.

C.4.1. Technical Evaluation Criteria

  • Concise description of each proposed solution and how it contributes to achievement of the particular deliverable requirements described in the Technical Architecture,

  • Overall quality and suitability of each proposed solution, and

  • Where applicable, whether the proposed solution is OGC-compliant.

C.4.2. Management/Cost Evaluation Criteria

  • Willingness to share information and work in a collaborative environment,

  • Contribution toward Sponsor goals of enhancing availability of standards-based offerings in the marketplace,

  • Feasibility of each proposed solution using proposed resources, and

  • Proposed in-kind contribution in relation to proposed cost-share funding request.

Note that all Participants are required to provide at least some level of in-kind contribution (costs for which no cost-share compensation has been requested). As a rough guideline, a proposal should include at least one dollar of in-kind contribution for every dollar of cost-share compensation requested. All else being equal, higher levels of in-kind contributions will be considered more favorably during evaluation. Participation may also take place by purely in-kind contributions (no cost-share request at all).

Once the proposals have been evaluated and cost-share funding decisions have been made, the COSI Team will begin notifying Bidders of their selection to enter negotiations to become an initiative Participant. Each selected bidder will enter into a Participation Agreement (PA), which will include a Statement of Work (SOW) describing the assigned deliverables.

C.5. Reporting

Participants will be required to report the progress and status of their work; details will be provided during contract negotiation. Additional administrative details such as invoicing procedures will also be included in the contract.

C.5.1. Monthly Reporting

The COSI Team will provide monthly progress reports to Sponsors. Ad hoc notifications may also occasionally be provided for urgent matters. To support this reporting, each testbed participant must submit (1) a Monthly Technical Report and (2) a Monthly Business Report by the first working day on or after the 3rd of each month. Templates and instructions for both of these report types will be provided.

The purpose of the Monthly Business Report is to provide initiative management with a quick indicator of project health from each participant’s perspective. The COSI Team will review action item status on a weekly basis with assigned participants. Initiative participants must remain available for the duration of the timeline so these contacts can be made.

C.5.2. Participant Final Summary Reports

Each Participant should submit a Final Summary Report by the milestone indicated in the Master Schedule. These reports should include the following information:

  1. Briefly summarize Participant’s overall contribution to the testbed (for an executive audience),

  2. Describe, in detail, the work completed to fulfill the Participation Agreement Statement of Work (SOW) items (for a more technical audience), and

  3. Present recommendations on how we can better manage future OGC COSI Program initiatives.

This report may be in the form of email text or a more formal attachment (at the Participant’s discretion).

Appendix D: Proposal Submission

D.1. General Proposal Submission Guidelines

This section presents general guidelines for submitting a CFP proposal. Detailed instructions for submitting a response proposal using the Bid Submission Form web page can be found in the Step-by-Step Instructions below.

Important

Please note that the content of the "Proposed Contribution" text box in the Bid Submission Form will be accessible to all Stakeholders and should contain no confidential information such as labor rates.

Similarly, no sensitive information should be included in the Attached Document of Explanation.

Proposals must be submitted before the deadline indicated in the Master Schedule.

Bidders responding to this CFP must be organizational OGC members familiar with the OGC mission, organization, and process.

Proposals from non-members or individual members will be considered provided that a completed application for organizational membership (or a letter of intent) is submitted prior to or with the proposal.

Tip

Non-members or individual members should make a note regarding their intent to join OGC on the Organizational Background page of the Bid Submission Form and include their actual Letter of Intent as part of an Attached Document of Explanation.

The following screenshot shows the Organizational Background page:

organizational background page
Figure 1. Sample Organizational Background Page

Information submitted in response to this CFP will be accessible to OGC and Sponsor staff members. This information will remain in the control of these stakeholders and will not be used for other purposes without prior written consent of the Bidder. Once a Bidder has agreed to become a Participant, they will be required to release proposal content (excluding financial information) to all initiative stakeholders. Sensitive information other than labor-hour and cost-share estimates should not be submitted.

Bidders will be selected for cost share funds on the basis of adherence to the CFP requirements and the overall proposal quality. The general testbed objective is to inform future OGC standards development with findings and recommendations surrounding potential new specifications. Each proposed deliverable should formulate a path for (1) producing executable interoperable prototype implementations meeting the stated CFP requirements and (2) documenting the associated findings and recommendations. Bidders not selected for cost share funds may still request to participate on a purely in-kind basis.

Bidders should avoid attempts to use the initiative as a platform for introducing new requirements not included in the Technical Architecture. Any additional in-kind scope should be offered outside the formal bidding process, where an independent determination can be made as to whether it should be included in initiative scope or not. Out-of-scope items could potentially be included in another OGC COSI Program initiative.

Each selected Participant (even one not requesting any funding) will be required to enter into a Participation Agreement contract ("PA") with the OGC. The reason this requirement applies to purely in-kind Participants is that other Participants will likely be relying upon their delivery. Each PA will include a Statement of Work ("SOW") identifying specific Participant roles and responsibilities.

D.2. Questions and Clarifications

Once the original CFP has been published, ongoing updates and answers to questions can be tracked by monitoring the CFP Corrigenda Table and the CFP Clarifications Table.

Bidders may submit questions using the Additional Message textbox in the OGC COSI Program Contact Form. Question submitters will remain anonymous, and answers will be regularly compiled and published in the CFP clarifications.

A Bidders Q&A Webinar will be held on the date listed in the Master Schedule. The webinar is open to the public, but anyone wishing to attend must register using the provided link. Questions are due on the date listed in the Master Schedule.

D.3. Proposal Submission Procedures

The process for a Bidder to complete a proposal is essentially embodied in the online Bid Submission Form. Once this site is fully prepared to receive submissions (soon after the CFP release), it will include a series of web forms, one for each deliverable of interest. A summary is provided here for the reader’s convenience.

For any individual who has not used this form in the past, a new account will need to be created first. The user will be taken to a home page indicating the "Status of Your Proposal." If any defects in the form are discovered, this page includes a link for notifying OGC. The user can return to this page at any time by clicking the OGC logo in the upper left corner.

Any submitted bids will be treated as earnest submissions, even those submitted well before the response deadline. Be certain that you intend to submit your proposal before you click the Submit button on the Review page.

Important

Because the Bid Submission Form is still relatively new, it might contain some areas that are still brittle or in need of repair. Please notify OGC of any discovered defects. Periodic updates will be provided as needed.

Please consider making local backup copies of all inputs in case any need to be re-entered.

D.3.1. High-Level Overview

Clicking on the Propose link will navigate to the Bid Submission Form. The first time through, the user should provide organizational information on the Organizational Background Page and click Update and Continue.

This will navigate to an "Add Deliverable" page that will resemble the following:

proposal submission form AddDeliverable
Figure 2. Sample "Add Deliverables" Page

The user should complete this form for each proposed deliverable.

Tip

For component implementations having multiple identical instances of the same deliverable, the bidder need only propose one instance. For simplicity, each bidder should submit against the lowest-numbered deliverable ID. OGC will assign a unique deliverable ID to each selected Participant later (during negotiations).

On the far right, the Review link navigates to a page summarizing all the deliverables the Bidder is proposing. This Review tab won’t appear until the user has submitted at least one deliverable under the Propose tab.

Tip

Consider regularly creating printed output copies of this Review page at various points during proposal creation.

Once the Submit button is clicked, the user will receive an immediate confirmation on the website that their proposal has been received. The system will also send an email to the bidder and to OGC staff.

Tip

In general, up until the time that the user clicks this Submit button, the proposal may be edited as many times as the user wishes. However, this initial version of the form contains no "undo" capability, so please use caution in over-writing existing information.

The user is afforded an opportunity under Done Adding Deliverables at the bottom of this page to attach an optional Attached Document of Explanation.

proposal submission form attached doc
Figure 3. Sample Dialog for an "Attached Document of Explanation"
Important

No sensitive information (such as labor rates) should be included in the Attached Document of Explanation.

If this attachment is provided, it is limited to one per proposal and must be less than 5 MB.

This document could conceivably contain any specialized information that wasn’t suitable for entry into a Proposed Contribution field under an individual deliverable. It should be noted, however, that this additional documentation will only be read on a best-effort basis. There is no guarantee it will be used during evaluation to make selection decisions; rather, it could optionally be examined if the evaluation team feels that it might help in understanding any specialized (and particularly promising) contributions.

D.3.2. Step-by-Step Instructions

The Propose link takes the user to the first page of the proposal entry form. This form contains fields to be completed once per proposal such as names and contact information.

It also contains an optional Organizational Background field where Bidders (particularly those with no experience participating in an OGC initiative) may provide a description of their organization. It also contains a click-through check box where each Bidder will be required (before entering any data for individual deliverables) to acknowledge its understanding and acceptance of the requirements described in this appendix.

Clicking the Update and Continue button then navigates to the form for submitting deliverable-by-deliverable bids. On this page, existing deliverable bids can be modified or deleted by clicking the appropriate icon next to the deliverable name. Any attempt to delete a proposed deliverable will require scrolling down to click a Confirm Deletion button.

To add a new deliverable, the user would scroll down to the Add Deliverable section and click the Deliverable drop-down list to select the particular item.

The user would then enter the required information for each of the following fields (for this deliverable only). Required fields are indicated by an asterisk ("*"):

  • Estimated Projected Labor Hours* for this deliverable,

  • Funding Request*: total U.S. dollar cost-share amount being requested for this deliverable (to cover burdened labor only),

  • Estimated In-kind Labor Hours* to be contributed for this deliverable, and

  • Estimated In-Kind Contribution: total U.S. dollar estimate of the in-kind amount to be contributed for this deliverable (including all cost categories).

Tip

There’s no separate text box to enter a global in-kind contribution. Instead, please provide an approximate estimate on a per-deliverable basis.

Cost-sharing funds may only be used for the purpose of offsetting burdened labor costs of development, engineering, documentation, and demonstration related to the Participant’s assigned deliverables. By contrast, the costs used to formulate the Bidder’s in-kind contribution may be much broader, including supporting labor, travel, software licenses, data, IT infrastructure, and so on.

Theoretically there is no limit on the size of the Proposed Contribution for each deliverable (beyond the raw capacity of the underlying hardware and software). But bidders are encouraged to incorporate content by reference where possible (rather than inline copying and pasting) to avoid overloading the amount of material to be read in each proposal. There is also a textbox on a separate page of the submission form for inclusion of Organizational Background information, so there is no need to repeat this information for each deliverable.

Important

A breakdown (by cost category) of the "Inkind Contribution" may be included in the Proposed Contribution text box for each deliverable.

However, please note that the content of this text box will be accessible to all Stakeholders and should contain no confidential information such as labor rates.

Similarly, no sensitive information should be included in the Attached Document of Explanation.

The Proposed Contribution (Please include any proposed datasets) field should also be used to provide a succinct description of what the Bidder intends to deliver for this work item to meet the requirements expressed in the Technical Architecture. This language could briefly elaborate on how the proposed deliverable will contribute to advancing the OGC standards baseline, or how implementations enabled by the specification embodied in this deliverable could add specific value to end-user experiences.

A Bidder proposing to deliver a Service Component Implementation can also use this field to identify what suitable datasets would be contributed (or what data should be acquired from another identified source) to support the proposed service.

Tip

In general, please try to limit the length of each Proposed Contribution to about one text page per deliverable.

Note that images cannot be pasted into the Proposed Contribution textbox. Bidders should instead provide a link to a publicly available image.

A single bid may propose deliverables arising from any number of threads or tasks. To ensure that the full set of sponsored deliverables are made, OGC might negotiate with individual Bidders to drop and/or add selected deliverables from their proposals.

D.4. Tips for New Bidders

Bidders who are new to OGC initiatives are encouraged to review the following tips:

  • In general, the term "activity" describes work to be performed in an initiative, and the term "deliverable" describes an artifact to be developed and delivered for inspection and use.

  • The roles generally played in any OGC COSI Program initiative are defined in the OGC COSI Program Policies and Procedures, from which the following definitions are derived and extended:

    • Sponsors are OGC member organizations that contribute financial resources to steer Initiative requirements toward rapid development and delivery of proven candidate specifications to the OGC Standards Program. These requirements take the form of the deliverables described herein. Sponsor representatives help serve as "customers" during Initiative execution, helping ensure that requirements are being addressed and broader OGC interests are being served.

    • Bidders are organizations who submit proposals in response to this CFP. A Bidder selected to participate will become a Participant through the execution of a Participation Agreement contract with OGC. Most Bidders are expected to propose a combination of cost-sharing request and in-kind contribution (though solely in-kind contributions are also welcomed).

    • Participants are selected OGC member organizations that generate empirical information through the definition of interfaces, implementation of prototype components, and documentation of all related findings and recommendations in Engineering Reports, Change Requests and other artifacts. They might be receiving cost-share funding, but they can also make purely in-kind contributions. Participants assign business and technical representatives to represent their interests throughout Initiative execution.

    • Observers are individuals from OGC member organizations that have agreed to OGC intellectual property requirements in exchange for the privilege to access Initiative communications and intermediate work products. They may contribute recommendations and comments, but the COSI Team has the authority to table any of these contributions if there’s a risk of interfering with any primary Initiative activities.

    • Supporters are OGC member organizations who make in-kind contributions aside from the technical deliverables. For example, a member could donate the use of their facility for the Kickoff event.

    • The COSI Team is the management team that will oversee and coordinate the Initiative. This team is comprised of OGC staff, representatives from member organizations, and OGC consultants. The COSI Team communicates with Participants and other stakeholders during Initiative execution, provides Initiative scope and schedule control, and assists stakeholders in understanding OGC policies and procedures.

    • The term Stakeholders is a generic label that encompasses all Initiative actors, including representatives of Sponsors, Participants, and Observers, as well as the COSI Team.

    • Suppliers are organizations (not necessarily OGC members) who have offered to supply specialized resources such as cloud credits. OGC’s role is to assist in identifying an initial alignment of interests and performing introductions of potential consumers to these suppliers. Subsequent discussions would then take place directly between the parties.

  • Proposals from non-members or individual members will be considered provided that a completed application for organizational membership (or a letter of intent) is submitted prior to or with the proposal.

    • Non-members or individual members should make a note regarding their intent to join OGC on the Organizational Background page of the Bid Submission Form and include their actual Letter of Intent as part of an Attached Document of Explanation.

  • Any individual wishing to gain access to the Initiative’s intermediate work products in the restricted area of the Portal (or attend private working meetings / telecons) must be a member-approved user of the OGC Portal system.

  • Individuals from any OGC member organization that does not become an initiative Sponsor or Participant may still (as a benefit of membership) observe activities by registering as an Observer.

  • Prior initiative participation is not a direct bid evaluation criterion. However, prior participation could accelerate and deepen a Bidder’s understanding of the information presented in the CFP.

  • All else being equal, preference will be given to proposals that include a larger proportion of in-kind contribution.

  • All else being equal, preference will be given to proposed components that are certified OGC-compliant.

  • All else being equal, a proposal addressing all of a deliverable’s requirements will be favored over one addressing only a subset. Each Bidder is at liberty to control its own proposal, of course. But if it does choose to propose only a subset for any particular deliverable, it might help if the Bidder prominently and unambiguously states precisely what subset of the deliverable requirements are being proposed.

  • The Sponsor(s) will be given an opportunity to review selection results and offer advice, but ultimately the Participation Agreement (PA) contracts will be formed bilaterally between OGC and each Participant organization. No multilateral contracts will be formed. Beyond this, there are no restrictions regarding how a Participant chooses to accomplish its deliverable obligations so long as these obligations are met in a timely manner (whether a 3rd-party subcontractor provides assistance is up to the Participant).

  • In general, only one organization will be selected to receive cost-share funding per deliverable, and that organization will become the Assigned Participant upon which other Participants will rely for delivery. Optional in-kind contributions may be made provided that they don’t disrupt delivery of required, reliable contributions from the assigned Participants.

  • A Bidder may propose against any or all deliverables. Participants in past initiatives have often been assigned to make only a single deliverable. On the other hand, several Participants in prior initiatives were selected to make multiple deliverables.

  • In general, the Participant Agreements will not require delivery of any component source code to OGC.

    • What is delivered to OGC is the behavior of the component installed on the Participant’s machine, and the corresponding documentation of findings, recommendations, and technical artifacts contributed to Engineering Report(s).

    • In some instances, a Sponsor might expressly require a component to be developed under open-source licensing, in which case the source code would become publicly accessible outside the Initiative as a by-product of implementation.

  • Results of other recent OGC initiatives can be found in the OGC Public Engineering Report Repository.

Appendix E: Abbreviations

The following table lists all abbreviations used in this CFP.

AI       Artificial Intelligence
CFP      Call for Participation
COSI     Collaborative Solutions and Innovation Program
CR       Change Request
DDIL     Denied, Degraded, Intermittent, or Limited Bandwidth
DER      Draft Engineering Report
DWG      Domain Working Group
ER       Engineering Report
GPKG     GeoPackage
OGC      Open Geospatial Consortium
ORM      OGC Reference Model
OWS      OGC Web Services
NSG      National System for Geospatial Intelligence
PA       Participation Agreement
POC      Point of Contact
Q&A      Questions and Answers
RM-ODP   Reference Model for Open Distributed Processing
SIF      Sensor Integration Framework
SOW      Statement of Work
SWG      Standards Working Group
TBD      To Be Determined
TC       OGC Technical Committee
TEM      Technical Evaluation Meeting
TIE      Technology Integration / Technical Interoperability Experiment
URL      Uniform Resource Locator
WFS      Web Feature Service
WPS      Web Processing Service
WG       Working Group (SWG or DWG)

Appendix F: Corrigenda & Clarifications

F.1. Corrigenda Table

The following table identifies all corrections that have been applied to this CFP compared to the original release. Minor editorial changes (spelling, grammar, etc.) are not included.

Table 3. Corrigenda Table

Section | Description | Date of Change
1.4 Master Schedule | Added location of in-person kickoff meeting (USA) | March 8, 2023
1.4 Master Schedule | Added Q&A Webinar link. Updated date and location of kickoff meeting. | March 28, 2023
D.3 Proposal Submission Procedures | Updated link for Bid Submission Form | April 4, 2023
1.4 Master Schedule | Updated timeline for M13 to be performed in person and remote during OGC iDays on Dec 7 | May 5, 2023
2.5 Agile Reference Architecture & 3. Deliverables Summary | Added ARA component deliverables | May 5, 2023

F.2. Clarifications Table

The following table identifies all clarifications that have been provided in response to questions received from organizations interested in this CFP.

Please use this convenience link to navigate to the end of the table.

Table 4. Clarifications Table
Question Clarification

-- Pre-Release --

Q:

A:

F.3. End of Clarifications Table (convenience link)
