SMC Data Challenge 2021: Call for Sponsors

2021 Smoky Mountains Computational Sciences and Engineering Conference (SMC2021)

Kingsport, TN, USA, Aug 24-26, 2021 (Website:


What is SMC?

The Smoky Mountains Computational Sciences and Engineering Conference (SMC 2021) is a premier forum organized by the Department of Energy’s Oak Ridge National Laboratory (ORNL) that brings together researchers in high-performance computing (HPC) and integrated instruments for science; industry and government practitioners; and developers, users and policy makers to exchange ideas about the latest developments in HPC and to share best practices for a wide range of applications. The conference has four major focus areas: theory, experiment, modeling and simulation, and data, which are geared toward accelerated node computing and integrated instruments for science.

Historically, SMC has attracted a wide range of participants from academia, national laboratories, industry, and others with a penchant for solving problems. SMC 2021 highlights the interdisciplinary needs and nature of HPC and bridges the gap between HPC research and enterprise innovation, providing an alternative to highly specialized conferences focused on individual disciplines. The nineteenth iteration of SMC (SMC 2021) will take place from August 24 to 26, 2021. The conference will be a hybrid event, held in person in Kingsport, Tennessee, at the MeadowView Conference Resort & Convention Center and online via livestream. The setting for this conference may change depending on COVID-19 travel protocols and recommendations. The proceedings from SMC 2021 will be published in the Springer Communications in Computer and Information Science (CCIS) book series.


We invite proposals from data sponsors to take part in our fifth annual Smoky Mountains Data Challenge. This year’s SMC Data Challenge (SMCDC 2021) will be held in conjunction with SMC 2021 and will feature two categories of competitive shared tasks. The first category will consist of data analytics challenges for scientific datasets, and the second will present scientific use cases that utilize edge computing. In each challenge, participants will present solutions for exciting data analytics and research questions, providing new ideas for improvements to existing methods and advances beyond them.


Important Dates

Here is a tentative timeline of milestones:

  • Challenge descriptions due: Feb 20
  • Datasets: Mar 10
  • Website goes live: Apr 1
  • Outreach/marketing: Apr 1 until registration closes
  • Registration opens: May 10
  • Registration closes: Jun 22
  • Submissions due: Jul 31
  • Selection process ends: Aug 6
  • SMC Conference: Aug 24 – Aug 26


Category 1: Data Analytics Challenges 

With the explosion of data from scientific instruments and simulations, there is a critical need for data-driven methods for scientific discovery. We are inviting proposals from data sponsors that explore a variety of research questions. Previous years’ challenges included topics such as publication mining, microscopy pixel classification, urban sciences data for understanding energy usage and improving vehicle efficiency, and data from cancer clinical trials, to name a few. Please refer to the SMC Data Challenge website for previous data challenges.

Selection Criteria

  • Data can be made publicly available and published with a DOI
  • Data is related to an ORNL research campaign and is scientific in nature
  • Tasks are motivated by the research or engineering behind the data
  • If the dataset is over 2 GB in size, it includes a smaller trial dataset no larger than a few MB for challenge participants to use to build their initial methods
  • Tasks are well defined and show a progression of difficulty from first to last – Task 1 should be doable by a student or novice data scientist, and Task 4 should entice a senior researcher or data scientist
  • Challenge has the potential to attract a wide variety of participants

How to submit?


Category 2: Edge Computing Challenges

We are excited to announce that we are collaborating with NVIDIA to tackle research challenges in one of the most exciting areas of computing: edge computing. We are inviting use case proposals for edge computing challenges. Edge computing nodes are designed to reduce bandwidth and computational complexity and to perform tasks such as real-time signal and image processing, combinatorial optimization, agent-based modeling, and big data analysis. For example, data processing tasks at the edge, such as compression, filtering, and trigger handling, can shorten time to discovery for researchers.

The sponsorship application process is similar to that of the data analytics call, but the format of the proposal may differ. A sample proposal is available in the use case on Distracted Driver Monitoring.

Selection Criteria

  • The use case is related to an ORNL research campaign or engineering development and is scientific in nature
  • Use case tasks are motivated by the research or engineering behind the data
  • The use case has the potential to attract a wide variety of participants
  • Any related data can be published with a DOI

How to submit?


Be a data sponsor

As a data sponsor, you will be asked to:

  • Identify one or more datasets on which you would like to have novel data analytics performed
  • Devise three to five challenging data analytics questions you would like to see explored. These questions should vary from easy to difficult because we will have both novice and experienced/advanced participants
  • Once accepted as a data sponsor, get a Digital Object Identifier (DOI) for your dataset
  • Evaluate the submissions once the challenge closes and select the top submissions in each participant category
  • At the conference, help choose the Data Challenge winners!

Why should you submit?

Proposals should contain the following information:

  • Names and affiliations of the sponsor members (including a designated contact person)
  • Description of the task, including its background and relevance
  • Description of the dataset, including variables, storage format, and size
  • Challenges of interest, arranged by difficulty from easy to difficult
  • Any other information such as suggestions for participants on how to get started, relevant publications, and webpages

Data sponsor roles and responsibilities

  • Work with the organizers to describe your datasets and identify relevant questions
  • Work with our data scientists to standardize research questions
  • Help the challenge participants understand your dataset
  • Select winners

Where does all my data go?

  • Your data will be published with a DOI and served from
  • All datasets must be publicly shareable
  • Anyone in the world with a search engine will have access

How do I submit?

Please submit your data sponsor proposals via EasyChair


If you have any questions regarding the challenge, please contact Pravi Devineni or Suzanne Parete-Koon at