ML4EO Training given by Radiant Earth. Designed to strengthen practitioners’ local capacity and skills in support of creating impactful machine learning applications
The SpatioTemporal Asset Catalog (STAC) community is pleased to announce the release of version 1.0.0
The rapid advance of digital technologies has significantly impacted education in recent years. It is evident that the increasing digitization of the economy and society will require students to become comfortable with technology to prepare for the future. In turn, this also requires teachers to be supported to develop the skills and knowledge required to fully utilize the capabilities of technology, whether in the classroom or in a hybridized model that utilizes distributed online learning
Hatfield’s satellite deep learning project to monitor the habitat of Canada’s Woodland Caribou is featured by the Canadian Space Agency as part of their Biodiversity Day spotlight.
Technological advances happen quick and now with cloud infrastructures we have the unprecedented means to make such deep integration possible. However, transforming an established operational setup, such as was developed and used for the Global Land Service over the years, to another completely new and technological challenging cloud computing environment is not a trivial job. Especially considering that many production chains need to be decomposed into modular bits and pieces which then have to be newly forged into a smooth and fully integrated machinery to provide the user with a transparent, yet integrated, set of tools. The scope of this report is to tackle exactly this: providing clear suggestions for an efficient ‘cloudification’ of the Copernicus global land production lines and user interfaces, and investigating if there is a tangible benefit and what would be the effort involved.
The core tools of science (data, software, and computers) are undergoing a rapid and historic evolution, changing what questions scientists ask and how they find answers. Earth science data are being transformed into new formats optimized for cloud storage that enable rapid analysis of multi-petabyte datasets. Datasets are moving from archive centers to vast cloud data storage, adjacent to massive server farms. Open source cloud-based data science platforms, accessed through a web-browser window, are enabling advanced, collaborative, interdisciplinary science to be performed wherever scientists can connect to the internet. Specialized software and hardware for machine learning and artificial intelligence (AI/ML) are being integrated into data science platforms, making them more accessible to average scientists. Increasing amounts of data and computational power in the cloud are unlocking new approaches for data-driven discovery. For the first time, it is truly feasible for scientists to bring their analysis to data in the cloud without specialized cloud computing knowledge. This shift in paradigm has the potential to lower the threshold for entry, expand the science community, and increase opportunities for collaboration while promoting scientific innovation, transparency, and reproducibility. Yet, we have all witnessed promising new tools which seem harmless and beneficial at the outset become damaging or limiting. What do we need to consider as this new way of doing science is evolving?
Trent Kershaw, Program Director at Digital Earth Australia, and Brian Killough, NASA Langley Research Center, CEOS Systems Engineering Office, discuss how the Open Data Cube came about and how it’s being used around the world.
The STAC Community’s plan to get to 1.0.0 in early 2021.
Serverless computing is increasingly popular because of the promise of lower cost and the convenience it provides to users who do not need to focus on server management. This has resulted in the availability of a number of proprietary and open-source serverless solutions. We seek to understand how the performance of serverless computing depends on a number of design issues using several popular open-source serverless platforms. We identify the idiosyncrasies affecting performance (throughput and latency) for different open-source serverless platforms. Further, we observe that just having either resource-based (CPU and memory) or workload-based (request per second (RPS) or concurrent requests) auto-scaling is inadequate to address the needs of the serverless platforms.
Scientific data has traditionally been distributed via downloads from data server to local computer. This way of working suffers from limitations as scientific datasets grow towards the petabyte scale. A “cloud-native data repository,” as defined in this paper, offers several advantages over traditional data repositories—performance, reliability, cost-effectiveness, collaboration, reproducibility, creativity, downstream impacts, and access & inclusion. These objectives motivate a set of best practices for cloud-native data repositories: analysis-ready data, cloud-optimized (ARCO) formats, and loose coupling with data-proximate computing. The Pangeo Project has developed a prototype implementation of these principles by using open-source scientific Python tools. By providing an ARCO data catalog together with on-demand, scalable distributed computing, Pangeo enables users to process big data at rates exceeding 10 GB/s. Several challenges must be resolved in order to realize cloud computing’s full potential for scientific research, such as organizing funding, training users, and enforcing data privacy requirements.