Hatfield’s satellite deep learning project to monitor the habitat of Canada’s Woodland Caribou is featured by the Canadian Space Agency as part of their Biodiversity Day spotlight.
Technological advances happen quick and now with cloud infrastructures we have the unprecedented means to make such deep integration possible. However, transforming an established operational setup, such as was developed and used for the Global Land Service over the years, to another completely new and technological challenging cloud computing environment is not a trivial job. Especially considering that many production chains need to be decomposed into modular bits and pieces which then have to be newly forged into a smooth and fully integrated machinery to provide the user with a transparent, yet integrated, set of tools. The scope of this report is to tackle exactly this: providing clear suggestions for an efficient ‘cloudification’ of the Copernicus global land production lines and user interfaces, and investigating if there is a tangible benefit and what would be the effort involved.
The core tools of science (data, software, and computers) are undergoing a rapid and historic evolution, changing what questions scientists ask and how they find answers. Earth science data are being transformed into new formats optimized for cloud storage that enable rapid analysis of multi-petabyte datasets. Datasets are moving from archive centers to vast cloud data storage, adjacent to massive server farms. Open source cloud-based data science platforms, accessed through a web-browser window, are enabling advanced, collaborative, interdisciplinary science to be performed wherever scientists can connect to the internet. Specialized software and hardware for machine learning and artificial intelligence (AI/ML) are being integrated into data science platforms, making them more accessible to average scientists. Increasing amounts of data and computational power in the cloud are unlocking new approaches for data-driven discovery. For the first time, it is truly feasible for scientists to bring their analysis to data in the cloud without specialized cloud computing knowledge. This shift in paradigm has the potential to lower the threshold for entry, expand the science community, and increase opportunities for collaboration while promoting scientific innovation, transparency, and reproducibility. Yet, we have all witnessed promising new tools which seem harmless and beneficial at the outset become damaging or limiting. What do we need to consider as this new way of doing science is evolving?
Trent Kershaw, Program Director at Digital Earth Australia, and Brian Killough, NASA Langley Research Center, CEOS Systems Engineering Office, discuss how the Open Data Cube came about and how it’s being used around the world.
The STAC Community’s plan to get to 1.0.0 in early 2021.
Serverless computing is increasingly popular because of the promise of lower cost and the convenience it provides to users who do not need to focus on server management. This has resulted in the availability of a number of proprietary and open-source serverless solutions. We seek to understand how the performance of serverless computing depends on a number of design issues using several popular open-source serverless platforms. We identify the idiosyncrasies affecting performance (throughput and latency) for different open-source serverless platforms. Further, we observe that just having either resource-based (CPU and memory) or workload-based (request per second (RPS) or concurrent requests) auto-scaling is inadequate to address the needs of the serverless platforms.
Scientific data has traditionally been distributed via downloads from data server to local computer. This way of working suffers from limitations as scientific datasets grow towards the petabyte scale. A “cloud-native data repository,” as defined in this paper, offers several advantages over traditional data repositories—performance, reliability, cost-effectiveness, collaboration, reproducibility, creativity, downstream impacts, and access & inclusion. These objectives motivate a set of best practices for cloud-native data repositories: analysis-ready data, cloud-optimized (ARCO) formats, and loose coupling with data-proximate computing. The Pangeo Project has developed a prototype implementation of these principles by using open-source scientific Python tools. By providing an ARCO data catalog together with on-demand, scalable distributed computing, Pangeo enables users to process big data at rates exceeding 10 GB/s. Several challenges must be resolved in order to realize cloud computing’s full potential for scientific research, such as organizing funding, training users, and enforcing data privacy requirements.
An international team of scientists has used artificial intelligence and commercial satellites to identify an unexpectedly large number of trees spread across arid and semi-arid areas.
USGS Landsat has just released Collection 2, a major upgrade to the Landsat archive that improves accuracy, improves cloud compatibility and usability, and expands access to standard higher-level products
GEO Analytics Canada has been featured in the September 2020 ’Association Québécoise de Télédétection (L’AQT) Bulletin.