Demonstration Platform Documentation
An Orientation to Cloud Computing for Earth Scientists
Cloud computing is service-based computing that lets Earth observation (EO) experts focus their efforts on transforming EO data into useful information, rather than on configuring and managing computing systems.
Cloud computing services are commonly grouped into tiers:
- Infrastructure as a Service (IaaS) – e.g. Virtual Machine providing computing resources
- Platform as a Service (PaaS) – e.g. Virtual Machine pre-loaded with useful tools and features
- Software as a Service (SaaS) – e.g. Software accessed through a web system
- Data as a Service (DaaS) – e.g. EO data accessed through a web interface
A fundamental feature of cloud computing is elasticity – the ability to create a virtual machine on demand, scale computing resources up or down as needed, and pay only for what you use. As EO datasets grow larger and their processing becomes more computationally intensive, cloud computing provides the processing power and storage to complete a massive analysis when it is needed.
It is important to understand that simply moving your data and existing analytical approaches and tools to a virtual server in the cloud is not cloud computing. The benefits of cloud computing are realized by building cloud native systems and workflows from the ground up.
Important features of cloud native geospatial platforms include:
- Cloud native formats – these formats enable more efficient workflows on the cloud. For example, the Cloud Optimized GeoTIFF (COG) is a regular GeoTIFF file, intended to be hosted on an HTTP file server, with an internal organization that lets clients request just the portions of the file that they need.
- Object storage – removes the complexity and scalability challenges of a hierarchical file system with folders and directories. The pay-as-you-go model can be much more cost-effective, especially if satellite EO data are provided as DaaS by the platform. Users need to consider egress fees, which are charged when data are transferred out of the public cloud.
- Metadata and search – with conventional technology, a user who wants to find all the imagery for their area and time of interest cannot make just one search; they have to use different tools and connect to APIs that are similar but all slightly different. The SpatioTemporal Asset Catalog (STAC) specification makes this much easier by providing a common language and common metadata to describe a range of geospatial assets, so they can more easily be indexed and discovered.
- Scaling with containers and Kubernetes – Docker containers are a standard approach to bundling and running applications. Kubernetes provides a framework to manage the deployment and scaling of containers. Dask, a Python library for parallel computing, was developed to scale Python packages and the wider Python ecosystem to multi-core machines and distributed clusters.
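The partial-read idea behind cloud native formats such as the COG can be sketched in a few lines of Python. This is a minimal illustration, not the way any particular library reads a COG: the `TiledImage` class, both helper functions, and every number in the example are hypothetical. It assumes a single-band image stored as square tiles in row-major order, with the byte offset and size of each tile known up front (a real TIFF records these in its TileOffsets and TileByteCounts tags), and a pixel window that lies inside the image.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TiledImage:
    """Hypothetical description of a tiled raster; not a real library type."""
    width: int              # image width in pixels
    height: int             # image height in pixels
    tile_size: int          # tiles are tile_size x tile_size pixels
    tile_offsets: list      # byte offset of each tile, in row-major order
    tile_byte_counts: list  # stored size of each tile in bytes


def tiles_for_window(img, col_off, row_off, win_width, win_height):
    """Return the row-major indices of the tiles that intersect a pixel window."""
    tiles_across = -(-img.width // img.tile_size)  # ceiling division
    first_col = col_off // img.tile_size
    last_col = (col_off + win_width - 1) // img.tile_size
    first_row = row_off // img.tile_size
    last_row = (row_off + win_height - 1) // img.tile_size
    return [
        row * tiles_across + col
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


def range_headers(img, tile_indices):
    """Build one HTTP Range header value per tile (the end byte is inclusive)."""
    headers = []
    for i in tile_indices:
        start = img.tile_offsets[i]
        end = start + img.tile_byte_counts[i] - 1
        headers.append(f"bytes={start}-{end}")
    return headers


# Made-up example: a 1024 x 1024 image in 256-pixel tiles (4 x 4 = 16 tiles),
# each tile stored in 1000 bytes after a 2048-byte header.
img = TiledImage(
    width=1024,
    height=1024,
    tile_size=256,
    tile_offsets=[2048 + 1000 * i for i in range(16)],
    tile_byte_counts=[1000] * 16,
)

# A 300 x 100 pixel window whose top-left corner is at column 300, row 200.
tiles = tiles_for_window(img, col_off=300, row_off=200, win_width=300, win_height=100)
headers = range_headers(img, tiles)
requested_bytes = sum(img.tile_byte_counts[i] for i in tiles)

print(tiles)            # [1, 2, 5, 6]
print(headers[0])       # bytes=3048-4047
print(requested_bytes)  # 4000 of the 16000 bytes of tile data
```

In this made-up case the client needs only four of the sixteen tiles, so it would request a quarter of the tile data instead of downloading the whole file. A real reader would typically also combine adjacent ranges into fewer requests and decompress the tiles it receives.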