What we’re reading

Understanding Open Source Serverless Platforms: Design Considerations and Performance

Serverless computing is increasingly popular because of the promise of lower cost and the convenience it provides to users who do not need to focus on server management. This has resulted in the availability of a number of proprietary and open-source serverless solutions. We seek to understand how the performance of serverless computing depends on a number of design issues using several popular open-source serverless platforms. We identify the idiosyncrasies affecting performance (throughput and latency) for different open-source serverless platforms. Further, we observe that just having either resource-based (CPU and memory) or workload-based (requests per second (RPS) or concurrent requests) auto-scaling is inadequate to address the needs of the serverless platforms.

Cloud-Native Repositories for Big Scientific Data

Scientific data has traditionally been distributed via downloads from data server to local computer. This way of working suffers from limitations as scientific datasets grow towards the petabyte scale. A “cloud-native data repository,” as defined in this paper, offers several advantages over traditional data repositories—performance, reliability, cost-effectiveness, collaboration, reproducibility, creativity, downstream impacts, and access & inclusion. These objectives motivate a set of best practices for cloud-native data repositories: analysis-ready data, cloud-optimized (ARCO) formats, and loose coupling with data-proximate computing. The Pangeo Project has developed a prototype implementation of these principles by using open-source scientific Python tools. By providing an ARCO data catalog together with on-demand, scalable distributed computing, Pangeo enables users to process big data at rates exceeding 10 GB/s. Several challenges must be resolved in order to realize cloud computing’s full potential for scientific research, such as organizing funding, training users, and enforcing data privacy requirements.

Counting Trees in Africa’s Drylands

An international team of scientists has used artificial intelligence and commercial satellites to identify an unexpectedly large number of trees spread across arid and semi-arid areas.

USGS Landsat releases cloud-enabled Collection 2

USGS Landsat has just released Collection 2, a major upgrade to the Landsat archive that improves accuracy, improves cloud compatibility and usability, and expands access to standard higher-level products.

UK Parliament Report: Remote sensing and machine learning

There is increasing interest in using machine learning to automatically analyse remote sensing data and increase our understanding of complex environmental systems. While there are benefits from this approach, there are also some barriers to its use. This POSTnote examines the value of these approaches, and the technical and ethical challenges for wider implementation.

NASA Datasets Available in Cloud Optimized GeoTIFFs

In collaboration with the Amazon Web Services (AWS) Public Dataset Program, NASA has made the following datasets available in Cloud Optimized GeoTIFF (COG) format for the COVID-19 Space Apps Challenge.

GEO Virtual Symposium 2020

The Group on Earth Observations (GEO) Virtual Symposium 2020 will be held from June 15-19, 2020. The global GEO community will benefit from a series of interactive webinars that will provide in-depth discussions from experts on a range of issues relevant to the GEO Work Programme Flagships, Initiatives and Activities. Session topics will center on the first year of the 2020-2022 GEO Work Programme, with an emphasis on strengthening the capability of GEO Work Programme activities to implement their plans effectively.

Video: Landsat Data is Moving to the Cloud

The Landsat series of Earth-observing satellites has been continuously acquiring land surface imagery since 1972. Over 8.5 million Landsat scenes are currently available for download. Soon it will all be accessible from a cloud environment, in a cloud optimized format that gives you more flexible, customized access. In the past, users could spend 80% of their time downloading and processing files. With Landsat in the cloud, you get direct access to big data without the big files and big headaches.

NASA to launch 247 petabytes of data into AWS – but forgot about egress costs

Audit finds error could mean less data flows to users unless agency pays up for downloads

OGC Testbed 15: Innovating Geospatial Data Processing, Analysis, and Visualization

The Open Geospatial Consortium (OGC) has published the outcomes of 2019’s biggest research and development initiative, Testbed-15. The key outcomes, including detailed Engineering Reports, overview presentations, and videos, are freely available on the Testbed-15 webpage. Testbed-15 advanced research across the following technologies:
– Earth Observation (EO) data models, applications, catalogues, and process discovery.
– Data Security in a geospatial environment using encrypted containers.
– Federated Cloud Environments incorporating OGC Open Web Services.
– Secure Delta Updates to geospatial data in Denied, Disrupted, Intermittent, and Limited (DDIL) situations.
– An Open Portrayal Framework and APIs for sharing portrayals of geospatial content.
– Machine Learning models and outputs integrating with OGC Open Web Services.
