The 2018 Smoky Mountains Computational Sciences and Engineering Conference (SMC18) is hosting its second annual Data Challenge. For this event, we have enlisted research scientists from across Oak Ridge National Laboratory (ORNL) to be data sponsors and help create data analytics challenges for prominent data sets at the laboratory. The role of our data sponsors is to provide a significant data set and formulate 3 to 5 challenge questions associated with it. The challenge questions for each data set cover multiple difficulty levels: the first question in each challenge is suitable for a novice, each question thereafter increases in difficulty, and the series ends with an advanced/expert-level question. These challenges are intended to draw everyone from scientists and researchers at the beginning stages of incorporating data analytics into their workflows to data analytics experts interested in applying novel techniques to data sets of national importance. This year there are five data challenges, and a team of up to 4 members may take on any of these challenges.
To participate in this year’s SMC18 Data Challenge, register your team and select a challenge! The top two teams from each challenge will be selected to present at SMC18 where the overall winner will be selected.
To answer a challenge, please submit a paper of no more than 5 pages (including pictures) describing your solution, as well as a 3-minute narrated video describing your solution. Detailed instructions for submissions can be found on the submissions page.
The Challenge will be open from May 14th to July 31st. Papers and videos are due by 5:00 PM EDT on July 31st. The top teams will be notified by August 7, and winning teams will be asked to present a poster at SMC18.
Experts in neutron scattering methodologies, instrumentation, and analysis software aimed at exploring the properties of materials, this group of scientists enjoys peeling back the readily understood “outer” layer of information to reveal patterns in materials’ internal structures. Thomas Proffen, Garrett Granroth, Christina Hoffmann, Pete Peterson, and Ross Whitfield use diffuse scattering methodologies to discover how a material’s structure supports its properties and functions. The method is especially effective with less-ordered, “chaotic” materials marked by atomic vacancies, mismatches, and other irregularities. Examining the range of variability in a material’s structure is best facilitated by high-performance computing, the researchers believe, because neutron scattering instruments generate large multidimensional datasets spanning time, space, and temperature, and the possibilities for interpretation are almost infinite without the computational capacity to narrow the choices in smart and systematic ways. This is their second year as Data Challenge sponsors.
The group has donated a dataset developed in collaboration with scientists at Laboratoire Léon Brillouin, CEA Saclay, France. The dataset was compiled from characterizations of a magnetic superconducting material, Sr14Cu24O41 (14 strontium, 24 copper, and 41 oxygen atoms per formula unit), known informally as the “telephone number” compound. Discovered as a byproduct of cuprate synthesis and examined since the 1980s with x-ray, electron, and neutron scattering, as well as Raman and other microscopic methods, the compound is still mysterious. Its intertwined-lattice structure leads it to act as a charge density wave insulator or antiferromagnet in some circumstances and as a superconductor in others, and despite intense research activity there is still much to be learned. Understanding how its structural features affect one another and how features contribute to function can be challenging, however, because current methods for analyzing data generated with diffuse scattering technologies are slow and difficult. Through their participation in this year’s Data Challenge, Thomas, Garrett, Christina, Pete, and Ross hope to gain insight into new approaches to make data analysis quicker and easier. The researchers recognize additional benefits: The Data Challenge puts scientists in touch with skilled data analysts, and the dialogue spurred by this intersection of two increasingly interconnected camps helps scientists learn how to discuss their projects with a nontechnical audience. The researchers have implemented some of the ideas returned from last year’s Data Challengers.
Jibonananda (Jibo) Sanyal, Melissa Allen, and Joshua New have worked together on numerous projects, applying modeling and simulation techniques to questions that lie at the intersection of environment and urban infrastructure. Their backgrounds in computer science and climate science are complementary yet different enough that each can bring unique insights to solve project-related challenges. Their current collaboration, the Urban Exascale Computing Project, reflects the fact that urban dynamics necessarily involves a human element: People wake up in a building each day, take some mode of transportation to work and school, typically spend the day in another building, and then trace their way back home. Jibo, Melissa, and Josh apply methods such as building energy modeling, transportation simulations, and weather and atmospheric data analysis to examine how the needs of human beings shape the built environment and how the built environment affects weather and climate. Their investigations generate vast amounts of data, and using high-performance computing tools allows them to run simulations and develop observations much more quickly than with traditional data analysis.
The group’s dataset was generated under a Laboratory Directed Research and Development project aimed at developing an urban microclimate and energy planning tool. For the project, Melissa and Josh analyzed the relationships among climate, structures, land cover, and energy use to close capability gaps in neighborhood-level modeling and simulation, future energy use projections, and visualization tools. The researchers are excited to be first-time data sponsors for this year’s Data Challenge. Their dataset contains one month of weather data taken at 15-minute intervals in a section of downtown Chicago; the latitude/longitude location for each building in the study area, its 2D footprint, and height; and a year’s energy simulation output for each building. Jibo, Melissa, and Josh look forward to being presented with novel methods for interpreting and visualizing their data, and they hope that participants will find interesting outliers in the dataset. The group hopes participants enjoy the interdisciplinary nature of their dataset and the challenges reflected in it.
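One way to hunt for the interesting outliers the sponsors mention is a simple statistical screen over the per-building energy simulation output. The sketch below is purely illustrative, not the dataset's actual schema: building IDs, field names, energy values, and the z-score threshold are all invented to show the idea.

```python
# Hypothetical sketch: flag buildings whose simulated annual energy use is an
# outlier relative to the rest of the study area, using a simple z-score test.
# The records and the 2.0 threshold are illustrative, not from the real dataset.
from statistics import mean, stdev

buildings = [
    {"id": "B001", "annual_kwh": 410_000},
    {"id": "B002", "annual_kwh": 395_000},
    {"id": "B003", "annual_kwh": 402_000},
    {"id": "B004", "annual_kwh": 1_250_000},  # stands out from its neighbors
    {"id": "B005", "annual_kwh": 388_000},
    {"id": "B006", "annual_kwh": 405_000},
    {"id": "B007", "annual_kwh": 398_000},
    {"id": "B008", "annual_kwh": 412_000},
    {"id": "B009", "annual_kwh": 391_000},
]

def zscore_outliers(records, key, threshold=2.0):
    """Return records whose value for `key` lies more than `threshold`
    sample standard deviations from the mean."""
    values = [r[key] for r in records]
    mu, sigma = mean(values), stdev(values)
    return [r for r in records if abs(r[key] - mu) / sigma > threshold]

print([b["id"] for b in zscore_outliers(buildings, "annual_kwh")])  # → ['B004']
```

A real analysis would of course join this screen against the weather time series and building footprints to ask *why* a building deviates, but the same filter-then-explain pattern applies at scale.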
Since 2012, Alex Belianinov and Stephen Jesse have worked together on numerous projects using scanning probe microscopy. They both have an interest in utilizing data-intensive methods on materials data, building novel instrumentation, and developing techniques. The researchers recognize that today’s new characterization methods generate a large volume of data, and they believe that physical scientists need to embrace big, deep, smart data, using the power of high-performance computing to extract real physical insight from data, as opposed to simply generating data by taking measurements. Alex and Stephen are self-taught in data analysis techniques, but participation in the Data Challenge is exciting, from their perspective, because the contest allows them to interact with people who look at problems from a different, data-driven perspective. They are encouraged by the fact that Data Challengers can pick up a problem without having domain expertise of the underlying research and deliver real insights based on what the data reveal. The researchers also enjoy learning about other data sponsors’ datasets and challenges. This is Alex and Stephen’s second year being Data Challenge sponsors.
Alex and Stephen have donated a dataset related to a synthetic material called lithium niobate. Due to its structural qualities, this manmade material is used to manufacture components in cell phones, sound transducers, optics, and amplifiers. Studied for approximately 50 years, lithium niobate remains interesting for researchers seeking to understand why it behaves as it does and how it responds under various real-life conditions. Advancing this fundamental knowledge would pave the way to developing similar materials that are environmentally friendly, have better capabilities in industrial applications, and are easier to make. By submitting the lithium niobate dataset to this year’s Data Challenge, Alex and Stephen hope to gain perspective about new ways to analyze and visualize materials data in general and, specifically, to find out what answers the data might reveal about lithium niobate’s chemical behavior at the microscale.
Robert Patton’s publication-mining research empowers researchers to track trends within their fields by providing a machine learning-based tool that allows them to view published research on their topics of interest. With search results at their fingertips, researchers can reduce the time and effort they need to plan cutting-edge research using fresh methodologies, avoiding work that has already been done. Research-funding organizations, DOE Office of Science user facilities, and academic conference planners can also benefit from this tool because it could help them, for example, measure the impact of their research dollars or facility allocations or locate potential meeting presenters and attendees. Numerous algorithms enable the tool to identify a research paper’s subject area and find connections among papers, as well as to conduct a search quickly and at scale. As a computer scientist, Robert is a seasoned user of data analytics to solve research problems. He has been working in publication mining for several years. This is Robert’s second year as a data sponsor.
Robert’s Data Challenge utilizes Microsoft Academic Graph, a dataset that contains 166 million publications and 64 million connections among papers, such as common author, journal, research area, or conference. Last year, he particularly enjoyed watching the growth and learning among novice participants. Robert also looks forward to getting feedback from more advanced Data Challengers, who often present sponsors with completely new techniques. He hopes to identify potential collaborators among this year’s participants and is interested to see whether participants use the dataset for unique purposes. Robert believes creating algorithms to run searches on such a large dataset will be a great exercise in ingenuity.
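At 166 million publications, pairwise comparison of papers is out of the question, so searches over connections typically lean on an inverted index keyed by shared attributes. The toy sketch below is not the Microsoft Academic Graph schema; the paper records, attribute names, and helper functions are invented to illustrate the indexing idea.

```python
# Illustrative sketch (not the actual Microsoft Academic Graph schema):
# index papers by shared attributes (author, venue) so that "connected"
# papers can be retrieved without comparing every pair.
from collections import defaultdict

papers = [
    {"id": 1, "authors": {"Lee", "Patel"}, "venue": "SMC"},
    {"id": 2, "authors": {"Patel"},        "venue": "SC"},
    {"id": 3, "authors": {"Kim"},          "venue": "SMC"},
    {"id": 4, "authors": {"Kim", "Zhou"},  "venue": "IPDPS"},
]

def build_index(papers):
    """Map each attribute value (author or venue) to the set of paper ids
    carrying it; each connection lookup is then a handful of set unions."""
    index = defaultdict(set)
    for p in papers:
        for author in p["authors"]:
            index[("author", author)].add(p["id"])
        index[("venue", p["venue"])].add(p["id"])
    return index

def connected(paper, index):
    """Ids of papers sharing an author or venue with `paper`, excluding itself."""
    hits = set()
    for author in paper["authors"]:
        hits |= index[("author", author)]
    hits |= index[("venue", paper["venue"])]
    return hits - {paper["id"]}

index = build_index(papers)
print(sorted(connected(papers[0], index)))  # paper 1 shares Patel with 2, SMC with 3
```

On the real dataset the same structure would be sharded or backed by a database rather than an in-memory dict, but the index-once, query-cheaply pattern is the same.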
Vincent Paquit’s research focuses on developing a data analytics framework for qualification and certification of additively manufactured parts—a critical piece for ensuring broader adoption of this technology in industry. One component of this framework seeks to improve in situ quality control during the additive manufacturing process by bringing the power of data analytics to bear on the problem of part defects. Industry currently uses expensive and time-consuming methods such as computed tomography and mechanical testing to assess the fitness of 3D-printed parts. An alternative involves inspecting each layer of an object as it is being built to generate a material characteristics map, which describes the quality of the part. Through a Cooperative Research and Development Agreement with ARCAM, Vincent works with their Q10 electron beam 3D printer to develop process parameters, improve system monitoring, and optimize printing strategies, with the end goal of engineering a feedback loop control system that will provide detection and correction of defects in real time. He points out that, while basic research, such as modeling the melting process in an electron beam system to understand melt-pool behavior, can tolerate a relatively long problem-to-solution timeframe, applying data analysis techniques to manufacturing quality control processes demands quicker turnaround—even as fast as one second.
Vincent has donated a dataset of near infrared images taken by a camera located at the top of the ARCAM printer’s printing chamber. The camera captures an image of each layer of material deposited during the 3D printing process, and the images show defects in the material layers as spots or bright areas. Vincent attended last year’s Smoky Mountain Conference as an observer and recognized that his research could benefit from high-performance computing solutions. As a data sponsor for the 2018 Data Challenge, he looks forward to being presented with creative methods for applying data analytics to manufacturing-based quality control that focus on feedback speed, and he hopes to make connections with computer scientists and data analysts who may be completely new to the fields of material science and additive manufacturing. Vincent also aims to show industry partners that involvement in data challenges can generate insights that benefit their manufacturing processes.
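Since the dataset's defects appear as spots or bright areas in each layer image, one natural baseline is to threshold the intensity grid and count connected bright regions. The sketch below is a toy version of that idea under invented assumptions: the tiny intensity grid, the fixed threshold, and the function name are all illustrative, and real layer images would be far larger and likely need adaptive thresholding.

```python
# Hypothetical sketch of per-layer defect screening: threshold a (toy)
# near-infrared intensity grid and count 4-connected bright regions as
# candidate defects. Grid values and the threshold are illustrative.

def find_defects(image, threshold):
    """Count 4-connected regions of pixels whose intensity exceeds threshold."""
    rows, cols = len(image), len(image[0])
    seen = set()
    regions = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] > threshold and (r, c) not in seen:
                regions += 1
                stack = [(r, c)]  # flood-fill this bright region
                while stack:
                    y, x = stack.pop()
                    if (y, x) in seen or not (0 <= y < rows and 0 <= x < cols):
                        continue
                    if image[y][x] <= threshold:
                        continue
                    seen.add((y, x))
                    stack += [(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)]
    return regions

layer = [
    [10, 11,  9, 10, 10],
    [10, 90, 95, 10, 10],
    [10, 92, 10, 10, 80],
    [10, 10, 10, 10, 85],
]
print(find_defects(layer, threshold=50))  # → 2 bright spots in this toy layer
```

A production version aimed at Vincent's one-second feedback target would push this to vectorized or GPU image processing, but the detect-and-count structure carries over.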
This group of computer and computational scientists is engaged in the Oak Ridge Leadership Computing Facility’s (OLCF’s) efforts to continuously move the field of supercomputing forward. They examine the inner workings of programming languages used by Oak Ridge National Laboratory’s (ORNL’s) leadership-class supercomputers, Summit and Titan, to determine how the individual components work together and how they can be customized to enable more effective, efficient simulations. Jack Wells, Oscar Hernandez, Reuben Budiardja, Graham Lopez, Vivek Sarkar, and Jisheng Zhao also look closely at parallelism—how nodes of a supercomputer interact to produce simulations—and how changes made to one component of an application influence another. The group’s dataset is connected to ORNL’s CAASCADE project, which aims to cultivate knowledge of how scientists use programming languages for supercomputing and whether the code that application developers generate meets their needs. Because their future plans include collecting data from all the programming languages used by OLCF supercomputers, Jack, Oscar, Reuben, Graham, Vivek, and Jisheng are also developing methods for gathering information in a scalable way, and they hope to apply machine learning to this research. Their dataset contains data about E3SM, a climate modeling application developed with US Department of Energy funding that can be used to simulate Earth’s atmosphere, water bodies, land and sea ice, and geology.
Jack, Oscar, Reuben, Graham, Vivek, and Jisheng are participating in the Data Challenge for the first time this year. They look forward to the opportunity to shine a light on CAASCADE and the OLCF’s supercomputing and simulations work. The researchers believe participation in the Data Challenge will be mutually beneficial to data sponsors and challengers: Challengers can apply new data visualization techniques to data questions, filling a knowledge gap among researchers, and the Data Challenge gives challengers the opportunity to think about how visualization techniques can contribute to understanding datasets and supercomputing methodologies.