2020: Challenge 5

Using Machine Learning to Understand Uncertainty in Subsurface Exploration

Keith Gray, Max Grossman, Anar Yusifov
BP plc

Challenge Science Domain: Geosciences

Data Set Name: Synthetic Seismic Realizations

Description of the Data Set

In the energy industry, an understanding of subsurface characteristics and structure is crucial to identifying and localizing untapped resources. At a high level, the process of taking an entirely unexplored region of the earth and generating an actionable understanding of its structure includes the following steps:

  1. Seismic data collection: Collect raw signals from the subsurface using techniques similar to sonograms used in hospitals.
  2. Seismic data pre-processing: Quality check and clean the collected raw signals.
  3. Seismic migration & velocity model construction: Use the raw signals and our understanding of the likely geology of the region to construct a 3D representation of the subsurface.
  4. Seismic interpretation: Use the constructed 3D representation to interpret where faults, layers, and other important structural features lie in the subsurface.

With each of these steps comes uncertainty from various sources of potential error: instrument error, human error, modeling error, and more. Despite this, the output of most seismic processing workflows is a single, gold-standard output image, an image which we know cannot possibly be 100% accurate!

It is crucial that future seismic processing workflows start to incorporate uncertainty when estimating the true subsurface structures. Rather than outputting a single interpretation, we should aim to emit a spectrum of possible realizations and an understanding of where uncertainty is high or low.

The dataset included in this data challenge serves as a starting point for exploring techniques to quantify uncertainty in seismic processing workflows. Here we focus on quantifying and visualizing the uncertainty in our estimates of the density of the subsurface, based on how varying those estimates impacts the output 3D volume. At a high level, the dataset consists of a set of synthetic but realistic models of subsurface density (also called velocity models), randomly generated from a single, known synthetic ground truth, together with the final 3D realizations generated using those models. These files are stored in the industry-standard SEG-Y format, and an example Jupyter notebook is provided to illustrate how to load and visualize them.
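
As a quick orientation before turning to the notebook, the snippet below is a minimal sketch of loading one realization from SEG-Y and displaying it. It assumes the open-source segyio library and a hypothetical file name; the provided notebook may use a different reader.

    import matplotlib.pyplot as plt
    import segyio

    # Hypothetical file name; substitute one of the SEG-Y files from the dataset.
    with segyio.open("realization_000.segy", ignore_geometry=True) as f:
        image = f.trace.raw[:]          # all traces as a (n_traces, n_samples) array

    # Transpose so that time/depth runs down the vertical axis.
    plt.imshow(image.T, cmap="gray", aspect="auto")
    plt.xlabel("trace")
    plt.ylabel("sample")
    plt.title("Synthetic seismic realization")
    plt.show()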

Challenge Questions

The end goal of this data challenge is to construct an uncertainty map for a given seismic survey, labeling each pixel in a final 2D seismic image with a value between 0.0 and 1.0 indicating how volatile the estimate for that pixel is.
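
One simple baseline (not prescribed by the challenge) is to treat the per-pixel spread across all available realizations as a proxy for volatility. Assuming the 2D realizations have been loaded into a single NumPy array, a minimal sketch is:

    import numpy as np

    def baseline_uncertainty_map(realizations: np.ndarray) -> np.ndarray:
        """Map a (n_realizations, height, width) stack to a (height, width) array in [0, 1]."""
        spread = realizations.std(axis=0)            # per-pixel standard deviation
        lo, hi = spread.min(), spread.max()
        return (spread - lo) / (hi - lo + 1e-12)     # rescale to [0.0, 1.0]

More sophisticated submissions could replace the raw standard deviation with a learned, data-driven estimate; the sketch simply illustrates the target output format of the end goal.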

We also welcome submissions that present intermediate work towards that end goal or answers to any of the challenge questions below. Even if you cannot complete the entire challenge, submissions that show progress and lay out ideas for how the challenge could eventually be completed will be considered.

  • Given that geophysicists generally use horizontal lines in gathers as a good indicator of velocity model accuracy, build a model (analytical, mathematical, data-driven, or otherwise) to estimate the quality of each velocity model based on its associated gathers (a simple flatness metric is sketched after this list).
  • Train a model to label each pixel with an uncertainty value between 0.0 and 1.0 indicating how uncertain any given realization of that part of the subsurface is.
  • Generate a single uncertainty map given all of the velocity models, realizations, and gathers at hand.
  • Generate some form of visualization of this uncertainty map of the subsurface.
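
For the first question above, one classical starting point is semblance: when the velocity model is accurate, events in a gather are flat, so summing across offsets reinforces the signal. The sketch below scores a single gather this way, assuming it is stored as a (n_samples, n_offsets) NumPy array; the function name and energy weighting are illustrative, not part of the challenge.

    import numpy as np

    def gather_flatness(gather: np.ndarray, eps: float = 1e-12) -> float:
        """Return a score in [0, 1]; higher means flatter (more coherent) events."""
        stacked = gather.sum(axis=1) ** 2                  # energy of the stacked trace
        unstacked = (gather ** 2).sum(axis=1) * gather.shape[1]
        semblance = stacked / (unstacked + eps)            # per-sample coherence in [0, 1]
        weights = (gather ** 2).sum(axis=1)                # weight samples by trace energy
        return float((semblance * weights).sum() / (weights.sum() + eps))

Averaging this score over all gathers associated with a velocity model gives one crude, purely analytical estimate of that model's quality, which could then feed into the uncertainty map.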

For more background information, please see the PDF included with the challenge.