2. Machine Learning Pipeline for Automatic Detection of Magnetospheric Boundaries

The work package “Machine Learning Solutions for Data Analysis and Exploitation in Planetary Science” within the Europlanet 2024 Research Infrastructure will develop machine learning (ML) powered data analysis and exploitation tools optimized for planetary science.
In this workshop, we will introduce an ML pipeline for the automated detection of magnetospheric boundaries in spacecraft in situ data around Earth. First, we will give a brief overview of the physical problem. Then, we will guide the participants through the developed ML code with the help of a sample data set and discuss problems encountered during the development of the pipeline.

Europlanet 2024 RI has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 871149.

2.1. Installation

# clone the tutorial repository
git clone https://github.com/epn-ml/Tutorial_IAP_Boundaries.git
# create and activate a virtual environment
python -m venv wsenv
source wsenv/bin/activate
# install the requirements and register the kernel
cd Tutorial_IAP_Boundaries
pip install -r requirements.txt
ipython kernel install --user --name=wsenv
# start JupyterLab
jupyter lab

Download the saved model, dataset, and labels from https://figshare.com/articles/dataset/Tutorial_IAP_Boundaries_Data/21153403

2.2. Data preparation

# first, we import the necessary packages

%load_ext autoreload
%autoreload 2

# don't print warnings
import warnings
warnings.filterwarnings('ignore')

import datetime

import numpy as np
import pandas as pds
import matplotlib.pyplot as plt

# helper modules from the tutorial repository
import preprocess as pp
import crossing as cr
# then we load the list of boundary crossings and the dataset

years = [2006, 2007, 2008]   # years to analyse

sc = 'C1'     # Cluster 1 spacecraft
width = 10    # half-width of the label window in minutes
crosslist = cr.get_crosslist('CL_BS_crossings_2002_2014.txt', sc, years)

# resampled in situ data with a datetime index
data = pds.read_csv('data_resampled.csv', index_col=0)
data.index = pds.to_datetime(data.index)

data.head()

The various features differ considerably in magnitude. Therefore, we scale our data!

# scale the data to zero mean and unit variance per feature

from sklearn.preprocessing import StandardScaler

scale = StandardScaler()
scale.fit(data)

data_scaled = pds.DataFrame(index=data.index, columns=data.columns,
                            data=scale.transform(data))

data_scaled.head()
dens tpar tperp vel_gse_x vel_gse_y vel_gse_z v_abs b_gse_x b_gse_y b_gse_z b_abs
2002-01-01 10:02:00 -0.027453 -0.001001 -0.001 -0.000979 -0.001048 0.000947 -0.001037 2.626536 2.409025 -1.839272 2.098737
2002-01-01 10:03:00 -0.027529 -0.001001 -0.001 -0.000976 -0.001043 0.000935 -0.001040 2.617199 2.377015 -1.865195 2.089613
2002-01-01 10:04:00 -0.027518 -0.001001 -0.001 -0.000976 -0.001042 0.000934 -0.001039 2.606989 2.338094 -1.896799 2.079298
2002-01-01 10:05:00 -0.027526 -0.001001 -0.001 -0.000975 -0.001044 0.000933 -0.001040 2.601780 2.302699 -1.916597 2.068956
2002-01-01 10:06:00 -0.027508 -0.001001 -0.001 -0.000977 -0.001044 0.000934 -0.001039 2.596544 2.267779 -1.934071 2.058333
# plot a few example crossings

for i in range(5):
    crosslist[i].plot_cross(data, delta=20, label=None, pred=None)
../../../_images/IAP_Pipeline_6_0.png ../../../_images/IAP_Pipeline_6_1.png ../../../_images/IAP_Pipeline_6_2.png ../../../_images/IAP_Pipeline_6_3.png ../../../_images/IAP_Pipeline_6_4.png

Let’s take a closer look at the data. We can clearly see the boundary crossing in the examples plotted. But how do we translate this into something a model can learn? Simply segmenting the time series into “crossing” and “no crossing” would give a huge data imbalance, so treating it as a classification problem does not seem to work. We also have to decide how to deliver the data to a possible model. Single points? Or time frames of multiple hours?

# load the precomputed labels
similarities = pds.read_csv('similarities_C1_width' + str(width) + '.csv', index_col=0)
similarities.index = pds.to_datetime(similarities.index)

We decided on a parameter between 0 and 1 that simultaneously indicates whether a given time frame contains a bow shock crossing and how far from the center of the frame it occurs. The size of the window was chosen to be 20 minutes, so that one window does not contain more than two crossings but still includes enough data to clearly see the crossing.
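As an illustration, a label of this kind could be built with a Gaussian kernel centred on each crossing time. The sketch below is illustrative only; the actual labels are the precomputed similarities loaded above, and gaussian_label is a hypothetical helper, not part of the tutorial code.

import numpy as np
import pandas as pds

def gaussian_label(index, crosstimes, width=10):
    # hypothetical sketch: label in [0, 1], equal to 1 at a crossing and
    # decaying as a Gaussian with temporal distance (width in minutes)
    label = np.zeros(len(index))
    for t in crosstimes:
        dt_min = (index - pds.Timestamp(t)) / pds.Timedelta(minutes=1)
        label = np.maximum(label, np.exp(-0.5 * (np.asarray(dt_min) / width) ** 2))
    return pds.Series(label, index=index)

# usage sketch:
# labels = gaussian_label(data.index, [c.crosstime for c in crosslist])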


Nevertheless, the problem of imbalanced data is not yet completely resolved. Let’s see why:

# number of labeled crossings per year
cr.crossingsperyear(crosslist, years)
2006: 46
2007: 141
2008: 138

Given that each crossing now corresponds to only 20 non-zero label values, we still have a highly imbalanced dataset, as the quick check below shows. After that, let's take a look at when our crossings actually happen.
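As a rough check, we can quantify the imbalance directly from the labels (a sketch; similarities is the DataFrame loaded above):

# fraction of one-minute samples that carry a non-zero label
nonzero_fraction = (similarities.values.ravel() > 0).mean()
print(f'{nonzero_fraction:.2%} of all samples have a non-zero label')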

# plot the label over the year 2006
plt.plot(similarities.index[similarities.index.year == 2006],
         similarities[similarities.index.year == 2006])
[<matplotlib.lines.Line2D at 0x7fdafcdc7a00>]
../../../_images/IAP_Pipeline_12_1.png

2.3. Training, validation, and test sets

There are times when the spacecraft does not cross the bow shock for quite a while, for example when it is in the nightside part of the magnetosphere or too far away in the solar wind. To simplify the problem, we will therefore only use the times in which we expect crossings to occur. Thus, the next step is to create the training set, validation set, and test set.

# randomly split the event hours into test, validation, and training sets

import window as wdw

eventhours = wdw.geteventhours(crosslist, years)
np.random.shuffle(eventhours)

testhours = eventhours[0:42]
valhours = eventhours[42:84]   # contiguous slices, so no hour is skipped
trainhours = eventhours[84:]
window = width * 2             # window length: 20 minutes

# create windows

x_val_windowed, y_val = wdw.createrandomwindows(data, similarities, window, valhours)
x_train_windowed, y_train = wdw.createrandomwindows(data, similarities, window, trainhours)
windowing done
windowing done
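To make the windowing step concrete, a simplified version of what such a helper could do is sketched below. This is an illustration under assumptions only: the repository's wdw.createrandomwindows additionally restricts windows to the selected event hours and randomises their placement.

import numpy as np

def make_windows(data, similarities, window):
    # sketch: cut the time series into consecutive fixed-length windows;
    # each window is labeled with the similarity value at its centre
    xs, ys = [], []
    for start in range(0, len(data) - window + 1, window):
        xs.append(data.values[start:start + window])
        ys.append(similarities.values[start + window // 2])
    return np.stack(xs), np.asarray(ys).ravel()

# resulting shapes match the model input: (n_windows, window, n_features)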

2.4. Our model

Now that the data preprocessing is done, we can start building a model. We use a deliberately simple architecture to avoid overfitting; it was adapted from a model for the automatic detection of interplanetary coronal mass ejections.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Conv1D, MaxPooling1D, Flatten, BatchNormalization
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau, LearningRateScheduler

model_path = 'model'

# callbacks
callbacks = []

# reduce the learning rate when the validation loss reaches a plateau
callbacks.append(
    ReduceLROnPlateau(monitor='val_loss', factor=0.5,
                      patience=25, min_delta=0.001,
                      cooldown=1, verbose=1))

# stop early when the validation loss no longer improves
callbacks.append(
    EarlyStopping(monitor='val_loss', min_delta=0.001,
                  patience=50, verbose=1))

# save the best model seen so far
callbacks.append(ModelCheckpoint(model_path, verbose=1, save_best_only=True))
input_shape = (window, data.shape[1])

from tensorflow.keras.optimizers import Adam

# seven identical causal convolution layers, followed by pooling,
# flattening, dropout, and a sigmoid output in [0, 1]
model = Sequential()
model.add(
    Conv1D(20,
           kernel_size=3,
           padding='causal',
           activation='relu',
           input_shape=input_shape))
for _ in range(6):
    model.add(
        Conv1D(20,
               kernel_size=3,
               padding='causal',
               activation='relu'))
model.add(MaxPooling1D())
model.add(MaxPooling1D())
model.add(Flatten())
model.add(Dropout(0.2))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='MeanSquaredError', optimizer=Adam(learning_rate=1e-4))

Now that the model is built, we can start training.

model.fit(x_train_windowed,
    y_train,
    epochs=600,
    batch_size=8,
    verbose=1,
    validation_data=(x_val_windowed, y_val),
    callbacks=callbacks,
    #sample_weight=weight_train,
    shuffle=True)
Epoch 1/600
643/643 [==============================] - 1s 1ms/step - loss: 0.1617 - val_loss: 0.1178

Epoch 00001: val_loss improved from inf to 0.11782, saving model to model
INFO:tensorflow:Assets written to: model/assets
Epoch 2/600
643/643 [==============================] - 1s 1ms/step - loss: 0.1073 - val_loss: 0.1086

Epoch 00002: val_loss improved from 0.11782 to 0.10864, saving model to model
INFO:tensorflow:Assets written to: model/assets
Epoch 3/600
643/643 [==============================] - 1s 1ms/step - loss: 0.1007 - val_loss: 0.1022

Epoch 00003: val_loss improved from 0.10864 to 0.10221, saving model to model
INFO:tensorflow:Assets written to: model/assets
Epoch 4/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0908 - val_loss: 0.0788

Epoch 00004: val_loss improved from 0.10221 to 0.07877, saving model to model
INFO:tensorflow:Assets written to: model/assets
Epoch 5/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0736 - val_loss: 0.0685

Epoch 00005: val_loss improved from 0.07877 to 0.06845, saving model to model
INFO:tensorflow:Assets written to: model/assets
Epoch 6/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0613 - val_loss: 0.0619

Epoch 00006: val_loss improved from 0.06845 to 0.06190, saving model to model
INFO:tensorflow:Assets written to: model/assets
Epoch 7/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0540 - val_loss: 0.0651

Epoch 00007: val_loss did not improve from 0.06190
Epoch 8/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0468 - val_loss: 0.0592

Epoch 00008: val_loss improved from 0.06190 to 0.05919, saving model to model
INFO:tensorflow:Assets written to: model/assets
Epoch 9/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0444 - val_loss: 0.0571

Epoch 00009: val_loss improved from 0.05919 to 0.05712, saving model to model
INFO:tensorflow:Assets written to: model/assets
Epoch 10/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0423 - val_loss: 0.0567

Epoch 00010: val_loss improved from 0.05712 to 0.05675, saving model to model
INFO:tensorflow:Assets written to: model/assets
Epoch 11/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0389 - val_loss: 0.0551

Epoch 00011: val_loss improved from 0.05675 to 0.05513, saving model to model
INFO:tensorflow:Assets written to: model/assets
Epoch 12/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0374 - val_loss: 0.0563

Epoch 00012: val_loss did not improve from 0.05513
Epoch 13/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0350 - val_loss: 0.0527

Epoch 00013: val_loss improved from 0.05513 to 0.05267, saving model to model
INFO:tensorflow:Assets written to: model/assets
Epoch 14/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0341 - val_loss: 0.0519

Epoch 00014: val_loss improved from 0.05267 to 0.05187, saving model to model
INFO:tensorflow:Assets written to: model/assets
Epoch 15/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0326 - val_loss: 0.0515

Epoch 00015: val_loss improved from 0.05187 to 0.05151, saving model to model
INFO:tensorflow:Assets written to: model/assets
Epoch 16/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0309 - val_loss: 0.0496

Epoch 00016: val_loss improved from 0.05151 to 0.04957, saving model to model
INFO:tensorflow:Assets written to: model/assets
Epoch 17/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0292 - val_loss: 0.0507

Epoch 00017: val_loss did not improve from 0.04957
Epoch 18/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0275 - val_loss: 0.0488

Epoch 00018: val_loss improved from 0.04957 to 0.04877, saving model to model
INFO:tensorflow:Assets written to: model/assets
Epoch 19/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0279 - val_loss: 0.0487

Epoch 00019: val_loss improved from 0.04877 to 0.04874, saving model to model
INFO:tensorflow:Assets written to: model/assets
Epoch 20/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0248 - val_loss: 0.0471

Epoch 00020: val_loss improved from 0.04874 to 0.04714, saving model to model
INFO:tensorflow:Assets written to: model/assets
Epoch 21/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0253 - val_loss: 0.0482

Epoch 00021: val_loss did not improve from 0.04714
Epoch 22/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0240 - val_loss: 0.0495

Epoch 00022: val_loss did not improve from 0.04714
Epoch 23/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0229 - val_loss: 0.0497

Epoch 00023: val_loss did not improve from 0.04714
Epoch 24/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0230 - val_loss: 0.0504

Epoch 00024: val_loss did not improve from 0.04714
Epoch 25/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0222 - val_loss: 0.0561

Epoch 00025: val_loss did not improve from 0.04714
Epoch 26/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0220 - val_loss: 0.0474

Epoch 00026: val_loss did not improve from 0.04714
Epoch 27/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0212 - val_loss: 0.0474

Epoch 00027: val_loss did not improve from 0.04714
Epoch 28/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0198 - val_loss: 0.0488

Epoch 00028: val_loss did not improve from 0.04714
Epoch 29/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0197 - val_loss: 0.0511

Epoch 00029: val_loss did not improve from 0.04714
Epoch 30/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0184 - val_loss: 0.0508

Epoch 00030: val_loss did not improve from 0.04714
Epoch 31/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0186 - val_loss: 0.0483

Epoch 00031: val_loss did not improve from 0.04714
Epoch 32/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0182 - val_loss: 0.0476

Epoch 00032: val_loss did not improve from 0.04714
Epoch 33/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0173 - val_loss: 0.0480

Epoch 00033: val_loss did not improve from 0.04714
Epoch 34/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0173 - val_loss: 0.0534

Epoch 00034: val_loss did not improve from 0.04714
Epoch 35/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0163 - val_loss: 0.0517

Epoch 00035: val_loss did not improve from 0.04714
Epoch 36/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0170 - val_loss: 0.0493

Epoch 00036: val_loss did not improve from 0.04714
Epoch 37/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0154 - val_loss: 0.0517

Epoch 00037: val_loss did not improve from 0.04714
Epoch 38/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0151 - val_loss: 0.0512

Epoch 00038: val_loss did not improve from 0.04714
Epoch 39/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0145 - val_loss: 0.0546

Epoch 00039: val_loss did not improve from 0.04714
Epoch 40/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0150 - val_loss: 0.0505

Epoch 00040: val_loss did not improve from 0.04714
Epoch 41/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0147 - val_loss: 0.0477

Epoch 00041: val_loss did not improve from 0.04714
Epoch 42/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0144 - val_loss: 0.0492

Epoch 00042: val_loss did not improve from 0.04714
Epoch 43/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0139 - val_loss: 0.0483

Epoch 00043: val_loss did not improve from 0.04714
Epoch 44/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0134 - val_loss: 0.0528

Epoch 00044: val_loss did not improve from 0.04714
Epoch 45/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0136 - val_loss: 0.0503

Epoch 00045: ReduceLROnPlateau reducing learning rate to 4.999999873689376e-05.

Epoch 00045: val_loss did not improve from 0.04714
Epoch 46/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0120 - val_loss: 0.0507

Epoch 00046: val_loss did not improve from 0.04714
Epoch 47/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0116 - val_loss: 0.0488

Epoch 00047: val_loss did not improve from 0.04714
Epoch 48/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0119 - val_loss: 0.0504

Epoch 00048: val_loss did not improve from 0.04714
Epoch 49/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0115 - val_loss: 0.0506

Epoch 00049: val_loss did not improve from 0.04714
Epoch 50/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0114 - val_loss: 0.0496

Epoch 00050: val_loss did not improve from 0.04714
Epoch 51/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0115 - val_loss: 0.0516

Epoch 00051: val_loss did not improve from 0.04714
Epoch 52/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0112 - val_loss: 0.0496

Epoch 00052: val_loss did not improve from 0.04714
Epoch 53/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0113 - val_loss: 0.0506

Epoch 00053: val_loss did not improve from 0.04714
Epoch 54/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0110 - val_loss: 0.0524

Epoch 00054: val_loss did not improve from 0.04714
Epoch 55/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0109 - val_loss: 0.0514

Epoch 00055: val_loss did not improve from 0.04714
Epoch 56/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0108 - val_loss: 0.0521

Epoch 00056: val_loss did not improve from 0.04714
Epoch 57/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0108 - val_loss: 0.0520

Epoch 00057: val_loss did not improve from 0.04714
Epoch 58/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0114 - val_loss: 0.0528

Epoch 00058: val_loss did not improve from 0.04714
Epoch 59/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0109 - val_loss: 0.0527

Epoch 00059: val_loss did not improve from 0.04714
Epoch 60/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0102 - val_loss: 0.0514

Epoch 00060: val_loss did not improve from 0.04714
Epoch 61/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0107 - val_loss: 0.0503

Epoch 00061: val_loss did not improve from 0.04714
Epoch 62/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0103 - val_loss: 0.0519

Epoch 00062: val_loss did not improve from 0.04714
Epoch 63/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0100 - val_loss: 0.0517

Epoch 00063: val_loss did not improve from 0.04714
Epoch 64/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0102 - val_loss: 0.0514

Epoch 00064: val_loss did not improve from 0.04714
Epoch 65/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0097 - val_loss: 0.0528

Epoch 00065: val_loss did not improve from 0.04714
Epoch 66/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0100 - val_loss: 0.0518

Epoch 00066: val_loss did not improve from 0.04714
Epoch 67/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0100 - val_loss: 0.0535

Epoch 00067: val_loss did not improve from 0.04714
Epoch 68/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0097 - val_loss: 0.0516

Epoch 00068: val_loss did not improve from 0.04714
Epoch 69/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0103 - val_loss: 0.0519

Epoch 00069: val_loss did not improve from 0.04714
Epoch 70/600
643/643 [==============================] - 1s 1ms/step - loss: 0.0097 - val_loss: 0.0523

Epoch 00070: ReduceLROnPlateau reducing learning rate to 2.499999936844688e-05.

Epoch 00070: val_loss did not improve from 0.04714
Epoch 00070: early stopping
<keras.callbacks.History at 0x7fdaa8456550>
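If the fit call above is assigned to a variable, e.g. history = model.fit(...), the returned Keras History object can be used to inspect the loss curves (a quick sketch, reusing matplotlib imported earlier):

# plot training and validation loss per epoch
plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('MSE loss')
plt.legend()
plt.show()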

We can also just load the best model trained so far and test it on some unseen data.

from tensorflow.keras.models import load_model

# load the best model saved by the checkpoint callback
model = load_model('model')

# create test windows and predict
x_test_windowed, test_df = wdw.createrandomtest(data, similarities, window, testhours)
y_pred = model.predict(x_test_windowed, verbose=1)
y_predSeries = pds.DataFrame(index=test_df.index, data={'pred': np.ravel(y_pred)})

# ground-truth crossings that fall into the test hours
testlist = [i for i in crosslist
            if datetime.datetime(i.crosstime.year, i.crosstime.month,
                                 i.crosstime.day, i.crosstime.hour) in testhours]
windowing done
54/54 [==============================] - 0s 758us/step
# plot predictions against the labels for the first ten test hours
for i in testhours[0:10]:
    cr.plot_results(data, similarities, y_predSeries, i)
../../../_images/IAP_Pipeline_24_0.png ../../../_images/IAP_Pipeline_24_1.png ../../../_images/IAP_Pipeline_24_2.png ../../../_images/IAP_Pipeline_24_3.png ../../../_images/IAP_Pipeline_24_4.png ../../../_images/IAP_Pipeline_24_5.png ../../../_images/IAP_Pipeline_24_6.png ../../../_images/IAP_Pipeline_24_7.png ../../../_images/IAP_Pipeline_24_8.png ../../../_images/IAP_Pipeline_24_9.png

We can already see that there are some bumps where a crossing takes place, but we also missed some! We now apply some postprocessing to turn the prediction into a catalog that can be compared with the labeled crossings.

import scipy.signal as ss
import postprocess

# detect sufficiently prominent peaks in the prediction and treat them as crossings
peaks = ss.find_peaks(y_predSeries['pred'].values, prominence=0.2)
predlist = []
for i in peaks[0]:
    predlist.append(cr.Crossing(y_predSeries.index[i]))

# compare predicted and labeled crossings within the threshold thres
TP, FN, FP = postprocess.evaluate(predlist, testlist, thres=3)
Precision is: 0.5423728813559322
Recall is: 0.5161290322580645
True Positives 32
False Negatives 30
False Positives 27
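For reference, the matching behind such an evaluation could look like the sketch below. It is an illustration only, not the repository's postprocess.evaluate, and it assumes that thres is given in minutes and that Crossing objects expose a crosstime attribute (as used above).

import datetime

def match_crossings(predlist, testlist, thres=3):
    # sketch: greedily pair each predicted crossing with the nearest
    # unmatched labeled crossing within thres minutes
    tol = datetime.timedelta(minutes=thres)
    unmatched = list(testlist)
    tp, fp = [], []
    for p in predlist:
        hits = [t for t in unmatched if abs(t.crosstime - p.crosstime) <= tol]
        if hits:
            best = min(hits, key=lambda t: abs(t.crosstime - p.crosstime))
            unmatched.remove(best)
            tp.append(p)
        else:
            fp.append(p)
    fn = unmatched   # labeled crossings never matched by a prediction
    print('Precision is:', len(tp) / max(len(tp) + len(fp), 1))
    print('Recall is:', len(tp) / max(len(tp) + len(fn), 1))
    return tp, fn, fp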

We did manage to find some of the crossings, but there is still a lot of work to do:

  • increase the amount of training data

  • use the non-resampled dataset

  • use additional features

  • tune the hyperparameters

  • experiment further with the model architecture

  • cross-validation (see the sketch below)
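For the last point, one option is event-wise cross-validation: splitting whole event hours rather than individual windows, so that windows from the same crossing never end up in both training and test folds. A minimal sketch using scikit-learn's KFold (illustrative, not part of the tutorial code):

import numpy as np
from sklearn.model_selection import KFold

eventhours = np.asarray(wdw.geteventhours(crosslist, years))

kf = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(kf.split(eventhours)):
    trainhours = eventhours[train_idx]
    testhours = eventhours[test_idx]
    # windows would then be rebuilt per fold, e.g. with wdw.createrandomwindows
    print(f'fold {fold}: {len(trainhours)} train / {len(testhours)} test hours')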