Data Analytics

Challenge 1: Unraveling Hidden Order and Dynamics in a Heterogeneous Ferroelectric System Using Machine Learning

Ferroelectrics are materials with a spontaneous electric polarization that can be reversed by applying an external electric field. Real materials are not pure and contain various defects. In the presence of such heterogeneities, the local order parameter shows additional manifestations beyond the global order parameter, and these lead to hidden order in the material.

We are specifically interested in discovering such ‘hidden’ order from molecular dynamics simulations and correlating it with the type of heterogeneities present in the simulation. We would like to ascertain how this order influences not just the memory function, but also its ‘dynamics’ under an externally applied field. Dynamic control of memory is the basis of ferroelectric-based neuromorphic materials.

Challenge 2: Finding Novel Links in COVID-19 Knowledge Graph

The scientific literature is expanding at incredible rates, which were recently estimated to be in the millions of new articles per year. Extracting information from such vast stores of knowledge is an urgent need, as exemplified by the recent open release of materials relevant to the current SARS-CoV-2 pandemic. In this context, this challenge seeks to develop algorithms for the analysis and mining of knowledge graphs. The main task in this challenge is to leverage a graph of biomedical concepts related to COVID-19 and the relations between them to try to discover novel, plausible relations between concepts. For this challenge, the participants will be provided with a graph dataset of biomedical concepts and relations between them extracted from scientific literature, along with all-pairs shortest path information between the concepts in the graph. They will be asked to analyze the data and use it to predict which concepts will form direct novel relations in the future. In addition, they will be asked to rank the predicted links according to the predicted importance of each relation.
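As an illustration of the kind of link prediction described above, the following minimal sketch ranks pairs of concepts that are not yet directly related but share at least one neighbor (shortest-path distance 2). It uses the Adamic-Adar index, a common baseline chosen here for illustration; it is not the challenge's reference method, and the function name `adamic_adar_ranking`, the edge-list input format, and the toy graph are assumptions.

```python
import math
from itertools import combinations


def adamic_adar_ranking(edges):
    """Rank non-adjacent node pairs by Adamic-Adar score (highest first).

    `edges` is an iterable of (u, v) pairs describing an undirected graph.
    This input format is an assumption, not the challenge's data format.
    """
    # Build an undirected adjacency map from the edge list.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    scored = []
    for u, v in combinations(sorted(neighbors), 2):
        if v in neighbors[u]:
            continue  # already directly related, so not a novel link
        shared = neighbors[u] & neighbors[v]
        if not shared:
            continue  # no common neighbor: shortest-path distance > 2
        # A shared neighbor with few relations is stronger evidence.
        # Every shared neighbor has degree >= 2, so log(...) > 0.
        score = sum(1.0 / math.log(len(neighbors[w])) for w in shared)
        scored.append((u, v, score))

    # Sort by descending score, then by node labels for a stable order.
    scored.sort(key=lambda t: (-t[2], t[0], t[1]))
    return scored


if __name__ == "__main__":
    # Toy graph with placeholder concept labels (hypothetical data).
    toy_edges = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "C"), ("C", "E")]
    for u, v, score in adamic_adar_ranking(toy_edges):
        print(f"{u} - {v}: {score:.3f}")
```

The sorted output doubles as the ranking that the challenge asks for. A real entry would likely use the provided shortest-path information to consider more distant pairs as well.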

Challenge 3: Synthetic-to-Real Domain Adaptation for Autonomous Driving

The ultimate ambition of autonomous driving is that it would be able to transport us safely to every corner of the world in any condition. However, a major roadblock to this ambition is the inability of machine learning models to perform well across a wide range of domains. For instance, a computer vision model trained on data from sunny California will have diminished performance when tested on data from snowy Michigan. The goal of this challenge is to develop innovative techniques to enhance domain adaptation capabilities. The accompanying dataset spans both synthetic and real-world domains to present a unique challenge to the participant.

Challenge 4: Analyzing Resource Utilization and User Behavior on Titan Supercomputer

Resource utilization statistics of jobs submitted to a supercomputer can help us understand how users from various scientific domains use HPC platforms and, in turn, design better job schedulers. We aim to generate insight into workload distribution and usage patterns across domains from the job scheduler trace, GPU failure information, and project-specific information collected from the Titan supercomputer. Furthermore, we want to know how scheduler performance varies over time and how users’ scheduling behavior changes following a system failure. These observations have the potential to provide valuable insight that helps in preparing for system failures. These practices will help us develop and apply novel machine learning algorithms for understanding system behavior and requirements and for better scheduling of HPC systems.

Challenge 5: Sustainable Cities: Socioeconomics, Building Types, and Urban Morphology

In urban environments, demographic and infrastructural characteristics co-evolve and together determine risks, vulnerability, and resilience. Infrastructure systems such as energy and water determine many environmental risks and provide access to various essential services. These risks and benefits are transferred across long distances and differentially across demographic and socioeconomic subgroups. Additionally, urban environments have significant effects on public health and population-level resilience, especially to extreme events such as heat waves. However, interactions among urban microclimate, urban morphology, socioeconomic heterogeneity, and anthropogenic activities are not well understood. To begin to understand these interactions, our team has developed three new datasets for the Las Vegas Metropolitan Statistical Area, and we challenge the participants to combine these datasets (and other relevant data of participants’ choosing) to answer our challenge questions.

Challenge 6: Where to go in the Atomic world

Scanning probe and electron microscopes are the tools that opened the atomic world for exploration by providing beautiful atomic-resolution images of metals, oxides, semiconductors, and superconductors. Currently, both the scanning probe microscopy (SPM) and scanning transmission electron microscopy (STEM) fields almost invariably rely on classical rectangular raster scans, in which the beam or the probe rapidly traverses the surface in the fast scan direction and slowly shifts in the perpendicular direction, forming the slow scan direction. This scanning mode is easy to implement and yields data in the form of 2D maps that can be readily interpreted by the human eye.

However, rectangular scanning is inefficient from an information theory point of view, since the interesting information is often concentrated in a small number of regions on the sample surface. Hence, moving beyond rectangular scanning becomes a key prerequisite for AE aimed at structural discovery, minimizing surface damage, or attempting controlled modification of the surface. The possible paradigms for AE in this case are summarized in Figure 1. Ultimately, we can envision freeform scanning approaches, where the direction and velocity of the probe motion are updated based on previously detected information. However, given the latencies of SPM and STEM imaging, this will necessitate the development of specialized lightweight algorithms and edge computations, as discussed below. On a shorter time frame, adapting the parameters of a predefined waveform, e.g., the pitch of a spiral or the line density in rectangular sub-scans, offers a more practical alternative.
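To make the idea of a predefined waveform with an adjustable pitch concrete, the sketch below generates probe positions along an Archimedean spiral whose radius grows by `pitch` on every full turn. The function names, the parameters `n_turns` and `points_per_turn`, and the arbitrary length units are assumptions for illustration; real scan generators must also respect instrument constraints that are not modeled here.

```python
import math


def spiral_scan(pitch, n_turns, points_per_turn=64):
    """Return (x, y) probe positions along an Archimedean spiral.

    The radius increases by `pitch` per full turn, so `pitch` plays the
    role of the line spacing in a raster scan. All names are hypothetical.
    """
    path = []
    for i in range(n_turns * points_per_turn + 1):
        theta = 2.0 * math.pi * i / points_per_turn
        r = pitch * i / points_per_turn  # radius grows linearly with angle
        path.append((r * math.cos(theta), r * math.sin(theta)))
    return path


def path_length(path):
    """Total distance travelled by the probe along the sampled path."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


if __name__ == "__main__":
    # Two spirals covering the same disc of radius 10 (arbitrary units).
    coarse = spiral_scan(pitch=2.0, n_turns=5)
    fine = spiral_scan(pitch=1.0, n_turns=10)
    print(f"coarse: {len(coarse)} points, length {path_length(coarse):.1f}")
    print(f"fine:   {len(fine)} points, length {path_length(fine):.1f}")
```

Halving the pitch roughly doubles the path length over the same area, which is the trade-off between coverage density and scan time or dose that an adaptive scheme would tune.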

Edge Computing

Challenge 7: Increased Image Spatial Resolution for Neutron Radiography

Neutron radiography (nR) is a technique used for a broad range of applications such as energy materials, engineering, geomaterials (rocks, plants, and soil), biology, and archeology. A sample placed in front of a 2D detector is illuminated with neutrons, and a shadowgraph is measured from the neutrons transmitted through the sample. As with most imaging techniques, domain applications require spatial resolution beyond what is routinely achievable today. New detector technologies offer the possibility of increasing spatial resolution by reducing the effective pixel size. More specifically, advanced analysis is used to precisely locate the position of impact of a detected neutron (also called a neutron event). The challenge here is to develop a novel method to better resolve the neutron position for the Timepix 3 detector, hence increasing the spatial resolution of the radiograph. Specifically, the goal of this challenge is to resolve features that are smaller than 25-50 µm.
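One simple baseline for locating a neutron event at sub-pixel precision is a weighted centroid over the cluster of pixels that fired for that event. The sketch below assumes that each hit is given as `(column, row, weight)`, where the weight could be a per-pixel signal such as time-over-threshold; this input format and the function name `weighted_centroid` are assumptions, and the challenge asks for methods that improve on such a baseline.

```python
def weighted_centroid(hits):
    """Estimate the sub-pixel impact position of one neutron event.

    `hits` is a sequence of (column, row, weight) tuples for the pixels in
    a single cluster. The tuple layout is a hypothetical input format.
    Returns (x, y) in pixel units.
    """
    total = sum(w for _, _, w in hits)
    if total <= 0:
        raise ValueError("cluster has no positive weight")
    x = sum(c * w for c, _, w in hits) / total
    y = sum(r * w for _, r, w in hits) / total
    return x, y


if __name__ == "__main__":
    # Hypothetical 2x2 cluster whose right-hand column carries more signal.
    cluster = [(10, 20, 1.0), (11, 20, 3.0), (10, 21, 1.0), (11, 21, 3.0)]
    print(weighted_centroid(cluster))  # (10.75, 20.5)
```

The estimate falls between pixel centers, which is how event-level analysis can yield an effective pixel size smaller than the physical one.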

Challenge 8: High Dimensional Active Learning for Microscopy of Nanoscale Materials

Atomic force microscopy (AFM) is a premier research tool used to explore material behavior at the nanoscale and has been a mainstay of nanoscience for the past three decades. It consists of a tip at the end of a cantilever that interacts with a sample to derive information on the sample properties and correlate the functional properties with microstructural features of the sample. Usually, in addition to regular raster-based scanning for high-resolution images of topography, the AFM enables individual point-based spectroscopy, where a stimulus is applied to the tip, sample, or environment and the response of the material is measured locally. Typically, the spectroscopy can be time-consuming, as each pixel can take from ~0.1 to ~10 s to acquire, and this has to be repeated across a grid of points to determine the response variability on spatially heterogeneous samples. One example is measuring the relaxation of piezoelectric response in ferroelectric materials, as shown in our recent work (Kelley et al. npj Computational Materials 6, 113 (2020)).
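The cost of grid-based spectroscopy follows directly from the per-pixel times quoted above. The short sketch below estimates the total acquisition time for a full grid; the 50 x 50 grid size and the function name are assumptions chosen only for illustration, and overheads such as tip movement between points are ignored.

```python
def grid_acquisition_hours(n_rows, n_cols, seconds_per_pixel):
    """Total time in hours to acquire one spectrum at every grid point."""
    return n_rows * n_cols * seconds_per_pixel / 3600.0


if __name__ == "__main__":
    # Hypothetical 50 x 50 grid at the two ends of the quoted range.
    for seconds in (0.1, 10.0):
        hours = grid_acquisition_hours(50, 50, seconds)
        print(f"{seconds:>4} s per pixel: {hours:.2f} h")
```

At the slow end of the range, even this modest grid takes several hours, which is why selecting measurement points adaptively rather than exhaustively is attractive.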

Challenges 2021