
Seismic Facies Identification Challenge

[Explainer] - EDA of Seismic data by geographic axis

Here’s my EDA notebook on how the seismic data varies by geographic axis, along with some ideas for training.

dipam_chakraborty

EDA on variation by geographic axis


A peek at some of the stuff in the notebook:

How the patterns look per label:

 

 

How the facies vary by z-axis

 

 

 

 

Splitting the data for training based on the EDA results

 

 

Do share your feedback. :blush:

AIcrowd Seismic Facies Identification Challenge

https://www.aicrowd.com/challenges/seismic-facies-identification-challenge

The goal of the Seismic Facies Identification challenge is to create a machine-learning algorithm which, working from the raw 3D image, can reproduce an expert pixel-by-pixel facies identification.

Problem description: 3D image Semantic Segmentation

Setup Environment 📚

In [ ]:
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

Domain knowledge 💡

The following are the geologic descriptions of each label:

1 : Basement/Other: Basement - Low S/N; Few internal Reflections; May contain volcanics in places
2 : Slope Mudstone A: Slope to Basin Floor Mudstones; High Amplitude Upper and Lower Boundaries; Low Amplitude Continuous/Semi-Continuous Internal Reflectors
3 : Mass Transport Deposit: Mix of Chaotic Facies and Low Amplitude Parallel Reflections
4 : Slope Mudstone B: Slope to Basin Floor Mudstones and Sandstones; High Amplitude Parallel Reflectors; Low Continuity Scour Surfaces
5 : Slope Valley: High Amplitude Incised Channels/Valleys; Relatively low relief
6 : Submarine Canyon System: Erosional Base is U shaped with high local relief. Internal fill is low amplitude mix of parallel inclined surfaces and chaotic disrupted reflectors. Mostly deformed slope mudstone filled with isolated sinuous sand-filled channels near the basal surface.
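For plot legends and per-class summaries it can help to keep the label IDs and names in one place. The following is a minimal sketch; the names are copied from the list above, while `FACIES_NAMES` and `facies_name` are hypothetical helper names and not part of the challenge kit.

```python
# Label IDs and short facies names, taken from the descriptions listed above.
# FACIES_NAMES and facies_name are illustrative names, not challenge APIs.
FACIES_NAMES = {
    1: "Basement/Other",
    2: "Slope Mudstone A",
    3: "Mass Transport Deposit",
    4: "Slope Mudstone B",
    5: "Slope Valley",
    6: "Submarine Canyon System",
}


def facies_name(label_id):
    """Return the facies name for a label ID, or a placeholder if unknown."""
    return FACIES_NAMES.get(int(label_id), f"Unknown ({label_id})")


print(facies_name(3))
```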

According to the literature, the facies are identified from the shapes of the curves in the seismic readings.

📌 Remember

The curves seem to be low-level to mid-level features, so most of the prediction likely depends on them, but the broader context may also help.

Load data 💾

In [ ]:
# Downloading Data
!wget https://datasets.aicrowd.com/default/aicrowd-public-datasets/seamai-facies-challenge/v0.1/public/data_train.npz

# Download Labels
!wget https://datasets.aicrowd.com/default/aicrowd-public-datasets/seamai-facies-challenge/v0.1/public/labels_train.npz

# Download test data
!wget https://datasets.aicrowd.com/default/aicrowd-public-datasets/seamai-facies-challenge/v0.1/public/data_test_1.npz
2020-10-06 18:36:21 (12.5 MB/s) - ‘data_train.npz’ saved [1715555445/1715555445]
2020-10-06 18:36:23 (10.7 MB/s) - ‘labels_train.npz’ saved [7160425/7160425]
2020-10-06 18:36:59 (21.6 MB/s) - ‘data_test_1.npz’ saved [731382806/731382806]

In [ ]:
train_full = np.load('data_train.npz', allow_pickle=True, mmap_mode='r')["data"]
labels_full = np.load('labels_train.npz', allow_pickle=True)["labels"]
test = np.load('data_test_1.npz', allow_pickle=True)["data"]
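After loading, a quick sanity check of shapes and dtypes can catch a mismatched download early. This is only a sketch: `describe` and `check_aligned` are hypothetical helpers, and the assumption that the training data and labels share the same shape comes from the pixel-by-pixel nature of the task, not from anything verified here. It runs on small synthetic arrays; swap in `train_full`, `labels_full`, and `test` in the notebook.

```python
import numpy as np


def describe(name, arr):
    """One-line summary of an array: shape, dtype, and size in memory."""
    return f"{name}: shape={arr.shape}, dtype={arr.dtype}, size={arr.nbytes / 1e6:.1f} MB"


def check_aligned(data, labels):
    """True if every data voxel has a matching label (assumed, not verified here)."""
    return data.shape == labels.shape


# Synthetic stand-ins so the sketch runs without the downloaded files.
demo_data = np.zeros((4, 3, 2), dtype=np.float32)
demo_labels = np.ones((4, 3, 2), dtype=np.uint8)

print(describe("demo_data", demo_data))
print(describe("demo_labels", demo_labels))
print("aligned:", check_aligned(demo_data, demo_labels))
```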

A tiny peek at the data ❕

In [ ]:
fig, ax = plt.subplots(1,3, sharey=True);
fig.set_size_inches(20, 8);
fig.suptitle("2D slice of the 3D seismic data volume", fontsize=20);
yc = 100
print(np.unique(labels_full[:, :, yc]))
ax[0].imshow(train_full[:, :, yc], cmap='terrain');
ax[0].set_xlabel('X Axis: West - East', fontsize=14);
ax[0].set_ylabel('Z Axis: Top - Bottom', fontsize=14);
ax[1].imshow(labels_full[:, :, yc]);
ax[1].set_xlabel('X Axis: West - East', fontsize=14);
ax[2].imshow(train_full[:, :, yc], cmap='terrain');
ax[2].imshow(labels_full[:, :, yc], alpha=0.4, cmap='twilight');
ax[2].set_xlabel('X Axis: West - East', fontsize=14);
[1 2 3 4 5 6]
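To look at how the facies vary along the z-axis in numbers rather than pictures, one option is to compute the fraction of each label at every depth index. The sketch below assumes the axis order implied by the plot above, that is `(Z, X, Y)`, and labels in the range 1 to 6; `label_fractions_by_depth` is a hypothetical helper, demonstrated on random data rather than `labels_full`.

```python
import numpy as np


def label_fractions_by_depth(labels, num_classes=6):
    """Fraction of voxels belonging to each class at every depth index.

    Assumes `labels` has shape (Z, X, Y) with integer classes 1..num_classes.
    Returns an array of shape (Z, num_classes).
    """
    depth = labels.shape[0]
    flat = labels.reshape(depth, -1)  # one row of voxels per depth index
    counts = np.stack(
        [(flat == c).sum(axis=1) for c in range(1, num_classes + 1)], axis=1
    )
    return counts / flat.shape[1]


# Random stand-in for labels_full so the sketch runs on its own.
rng = np.random.default_rng(0)
demo_labels = rng.integers(1, 7, size=(8, 5, 4))
fractions = label_fractions_by_depth(demo_labels)
print(fractions.shape)
```

In the notebook, `plt.plot(label_fractions_by_depth(labels_full))` would draw one curve per class against depth.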

EDA on images 📷

In [ ]:
## Data histogram, min, max, mean, std
tr_ravel = train_full.ravel()
minval, maxval, mean, std = np.min(tr_ravel), np.max(tr_ravel), np.mean(tr_ravel), np.std(tr_ravel)
print('Min: %0.4f, Max: %0.4f, Mean: %0.4f, Std: %0.4f' % 
      (minval, maxval, mean, std))
hist = plt.hist(tr_ravel, bins=100);
plt.title("Histogram of data values");
Min: -5195.5234, Max: 5151.7188, Mean: 0.6766, Std: 390.3082

📌 Remember

The data is nearly zero-mean, and its range (roughly -5,196 to 5,152) is much larger than its standard deviation (about 390). It is probably a good idea to normalize the inputs.
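One possible way to normalize is to standardize with the global mean and standard deviation and then clip the outliers. This is a sketch under assumptions, not a tested recipe: the clip threshold of 3 standard deviations is a choice made here for illustration, and `standardize` is a hypothetical helper. The demo values reuse the statistics printed above.

```python
import numpy as np


def standardize(volume, mean, std, clip_sigma=3.0):
    """Scale to zero mean and unit std, then clip to +/- clip_sigma."""
    scaled = (np.asarray(volume, dtype=np.float32) - mean) / std
    return np.clip(scaled, -clip_sigma, clip_sigma)


# Small demo using the min, mean, and max reported in the histogram cell.
demo = np.array([-5195.5234, 0.6766, 390.0, 5151.7188], dtype=np.float32)
normalized = standardize(demo, mean=0.6766, std=390.3082)
print(normalized)
```

The same mean and std computed on the training volume would normally be reused for the test volume so that both share one scale.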

In [ ]:
## Let's increase the contrast on the above data view by clipping at mean ± 3*std

normclip = lambda img: np.clip(img, mean-3*std, mean+3*std)

fig, ax = plt.subplots(1,3, sharey=True);
fig.set_size_inches(20, 8);
fig.suptitle("2D slice of the 3D seismic data volume", fontsize=20);
ax[0].imshow(normclip(train_full[:, :, yc]), cmap='terrain');
ax[0].set_xlabel('X Axis: West - East', fontsize=14);
ax[0].set_ylabel('Z Axis: Top - Bottom', fontsize=14);
ax[1].imshow(labels_full[:, :, yc]);
ax[1].set_xlabel('X Axis: West - East', fontsize=14);
ax[2].imshow(normclip(train_full[:, :, yc]), cmap='terrain');
ax[2].imshow(labels_full[:, :, yc], alpha=0.4, cmap='twilight');
ax[2].set_xlabel('X Axis: West - East', fontsize=14);