DL-Framework
The DL-Framework is an approach to building an experimental environment that can easily be reused and extended without changing core functionality. The framework is implemented using TensorFlow 2.0 in combination with tf.keras. Functionality that every model needs, such as data loading, is handled by the framework and does not have to be implemented again for each new model. The same applies to the training loop: in a segmentation task, the training loop is identical for all kinds of networks, so there is no need to implement it separately for each model.
For more flexibility, the framework is configured through YAML configuration files. This makes it possible to configure experiments without drowning in command-line arguments. Additional pre-processing methods, metrics or loss functions can easily be added or replaced just by changing a parameter in the experiment configuration.
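One common way to make components selectable by name from a configuration is a small registry. The sketch below only illustrates that idea; the names `METRICS`, `register_metric` and `build_metrics` are hypothetical and are not taken from the dlf code base, and the configuration is shown as an already parsed dictionary.

```python
# Hypothetical registry that maps configuration keys to Python callables.
METRICS = {}


def register_metric(name):
    """Decorator that makes a function available under `name`."""
    def decorator(fn):
        METRICS[name] = fn
        return fn
    return decorator


@register_metric("accuracy")
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


@register_metric("error_rate")
def error_rate(y_true, y_pred):
    return 1.0 - accuracy(y_true, y_pred)


def build_metrics(config):
    """Look up every metric named in the (already parsed) configuration."""
    return {name: METRICS[name] for name in config["training"]["metrics"]}


# Adding or removing a key here changes which metrics are evaluated,
# without touching the code that runs the evaluation.
config = {"training": {"metrics": {"accuracy": None, "error_rate": None}}}
metrics = build_metrics(config)
results = {name: fn([0, 1, 1, 0], [0, 1, 0, 0]) for name, fn in metrics.items()}
```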
If something is missing, don't hesitate to implement it! A model is missing? Check out the already implemented models and start with your own implementation. The same goes for loss functions, data loaders, metrics and pre-processing functions. While semantic segmentation and classification tasks share the same training loop, you can also implement your own loop. The best example is the CycleGAN implementation, which overrides the basic training loop.
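The relationship between a shared training loop and a model that replaces it can be sketched with plain method overriding. The class and method names below (`BaseTrainer`, `TwoNetworkTrainer`, `train_step`, `fit`) are hypothetical and exist only for illustration; the real CycleGAN implementation in the framework may be structured differently.

```python
class BaseTrainer:
    """Hypothetical base class that owns the generic training loop."""

    def train_step(self, batch):
        # Default step shared by e.g. segmentation and classification models.
        return {"loss": sum(batch) / len(batch)}

    def fit(self, batches):
        # Generic loop: run one step per batch and collect the logs.
        return [self.train_step(batch) for batch in batches]


class TwoNetworkTrainer(BaseTrainer):
    """Hypothetical GAN-style trainer that replaces the default step."""

    def train_step(self, batch):
        # A GAN updates two networks per step, so it reports two losses.
        mean = sum(batch) / len(batch)
        return {"generator_loss": mean, "discriminator_loss": 1.0 - mean}


batches = [[0.2, 0.4], [0.6, 0.8]]
default_logs = BaseTrainer().fit(batches)
gan_logs = TwoNetworkTrainer().fit(batches)
```

The loop in `fit` stays untouched; only the per-step behaviour changes in the subclass.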
Docs
Installation
Use the following command to install the required Python packages via pip.
pip install -r requirements.txt
To install the dlf module on your system, use
pip install .
Usage
Experiment Configuration
experiment:
Key | Summary |
---|---|
output_folder | Required. Path where the models/weights and logging data are saved |
model:
The model section contains the parameters required to initialize an ANN model from the framework. A list of all available models and their parameters can be found in Documentation/Models.
input_reader:
The input reader section is responsible for providing data during training, validation and test (test is not implemented at the moment). You can find all data generators and their parameters in Documentation/Data generators.
Key | Summary |
---|---|
training_reader | Optional. Provides data used during training |
validation_reader | Optional. Provides data used during validation |
training:
This section contains all training-specific parameters, for instance the number of training steps, how often the evaluation is run, callbacks, metrics and the batch size used during training.
Key | Summary |
---|---|
metrics | Optional. A list of metrics which are evaluated during training/validation/test. Documentation/Metrics |
callbacks | Optional. A list of callbacks which are executed during training/validation/test. Documentation/Callbacks |
num_steps | Required. Number of steps to train |
batch_size | Optional. Number of samples per gradient update |
eval_every_step | Optional. Run the evaluation every N steps (step mod N == 0) |
save_strategy | Optional. Defines when a model should be saved. Documentation |
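The `step mod N == 0` rule for `eval_every_step` can be illustrated with a few lines of Python. This is a minimal sketch that assumes steps are counted from 1; the function name `evaluation_steps` is hypothetical and not part of the framework.

```python
def evaluation_steps(num_steps, eval_every_step):
    """Return the training steps (counted from 1, an assumption) that trigger evaluation."""
    return [step for step in range(1, num_steps + 1) if step % eval_every_step == 0]


steps = evaluation_steps(num_steps=10, eval_every_step=3)
print(steps)  # [3, 6, 9]
```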
Example configuration
Below is a sample configuration that is used to train a CNN for a segmentation task.
In the experiment section, we define the output folder where all results are saved.
As the model, the configuration specifies vgg_encoder_decoder with its corresponding parameters.
In this case, the input is an RGB image with a resolution of 512x512 pixels.
We set up the network to distinguish between 7 classes.
Note: &num_classes is YAML anchor syntax. It defines a value that we can reuse elsewhere in the configuration through the alias *num_classes.
With model_weights, the network is initialized from the pre-trained weights at the given path.
By specifying SparseCategoricalCrossentropyIgnore as the loss function, we override the default loss of the model.
Because the vgg_encoder_decoder model uses a softmax layer as its last layer, we pass False for the from_logits argument.
As mentioned before, we reuse the value of the num_classes variable as input for the SparseCategoricalCrossentropyIgnore objective.
As the optimizer, this experiment uses Adam with a learning rate of 0.00001.
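Why from_logits has to match the last layer of the model can be shown with a small numeric sketch. It follows the usual tf.keras convention, where `from_logits=True` means the loss applies softmax itself; the helper functions are written in plain Python for illustration and are not the framework's SparseCategoricalCrossentropyIgnore implementation.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_cross_entropy(values, label, from_logits):
    """Cross-entropy for one sample with an integer class label."""
    probs = softmax(values) if from_logits else values
    return -math.log(probs[label])


logits = [2.0, 1.0, 0.1]
probs = softmax(logits)  # what a model ending in a softmax layer outputs

# Correct: the model already applied softmax, so the loss must not do it again.
loss_from_probs = sparse_cross_entropy(probs, 0, from_logits=False)
# Equivalent: raw logits together with from_logits=True.
loss_from_logits = sparse_cross_entropy(logits, 0, from_logits=True)
# Wrong: softmax is applied twice, which distorts the loss value.
loss_double_softmax = sparse_cross_entropy(probs, 0, from_logits=True)
```

The first two values agree, while the third one differs because the probabilities are squashed a second time.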
During training, the network receives its input from a tf_record_segmentation_reader.
In addition to the paths to the TFRecord and label map files, the configuration specifies an option to remap classes.
This makes it possible to change the category of pixels during training, e.g. from class 1 to class 0.
The preprocess_list contains a list of data augmentation methods that are applied before the images are fed into the network.
YAML variables are not limited to single scalars; lists and dictionaries can be reused as well.
In this case, the remap list is also used for the validation_reader.
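The effect of the remap option on a label mask can be sketched in a few lines. The mapping is the one from the sample configuration; the function `remap_labels` is hypothetical, and the behaviour for labels without an entry (they are left unchanged here) is an assumption, not something the framework documentation confirms.

```python
def remap_labels(mask, mapping):
    """Replace every label that has an entry in `mapping`; keep all others (assumption)."""
    return [[mapping.get(label, label) for label in row] for row in mask]


remap = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1}
mask = [
    [0, 1, 2],
    [3, 4, 5],
    [6, 4, 1],
]
remapped = remap_labels(mask, remap)
print(remapped)  # [[0, 0, 0], [0, 1, 1], [6, 1, 0]]
```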
For a better overview of the training, the experiment uses two metrics: SparseCategoricalCrossentropyIgnore and, in addition, SparseMeanIoU.
The SegmentationLogger callback logs these metric values to TensorBoard, along with example segmentation masks.
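For readers unfamiliar with the mean IoU metric, the following plain-Python sketch shows the general computation on flattened integer labels. It is not the framework's SparseMeanIoU implementation; in particular, skipping classes that occur neither in the labels nor in the predictions is a choice made for this example.

```python
def mean_iou(y_true, y_pred, num_classes):
    """Mean intersection-over-union across classes present in labels or predictions."""
    ious = []
    for c in range(num_classes):
        pairs = list(zip(y_true, y_pred))
        intersection = sum(1 for t, p in pairs if t == c and p == c)
        union = sum(1 for t, p in pairs if t == c or p == c)
        if union > 0:  # skip classes that never occur (assumption of this sketch)
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
# Per-class IoU: class 0 -> 1/3, class 1 -> 2/3, class 2 -> 1/2
score = mean_iou(y_true, y_pred, num_classes=3)
```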
The training runs for 50,000 steps with a batch size of 4, and the model is evaluated after every 100 steps.
The save_strategy ensures that the model is only stored when the value of validation_sparse_mean_iou improves.
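The monitor/mode pair of the save strategy can be read as "keep track of the best value seen so far and save only on improvement". The class below is a hypothetical sketch of that logic, with names chosen for illustration; the framework's own implementation may differ, for example in how ties are handled.

```python
class SaveOnImprovement:
    """Hypothetical helper that tracks the best monitored value seen so far."""

    def __init__(self, mode="max"):
        if mode not in ("max", "min"):
            raise ValueError("mode must be 'max' or 'min'")
        self.mode = mode
        self.best = None

    def should_save(self, value):
        # The first value always counts as an improvement; ties do not.
        improved = (
            self.best is None
            or (self.mode == "max" and value > self.best)
            or (self.mode == "min" and value < self.best)
        )
        if improved:
            self.best = value
        return improved


strategy = SaveOnImprovement(mode="max")
decisions = [strategy.should_save(v) for v in [0.41, 0.39, 0.47, 0.47, 0.52]]
print(decisions)  # [True, False, True, False, True]
```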
experiment:
  output_folder: /mnt/data/experiments/segmentation/SegNet_VGG_transposed
model:
  vgg_encoder_decoder:
    input_shape:
      - 512
      - 512
      - 3
    num_classes: &num_classes 7
    summary: True
    use_skip_layers: True
    model_weights: /mnt/data/experiments/segmentation/SegNet_VGG_transposed/checkpoint
    loss:
      SparseCategoricalCrossentropyIgnore:
        num_classes: *num_classes
        from_logits: False
    optimizer:
      - Adam:
          learning_rate: 0.00001
input_reader:
  training_reader:
    name: tf_record_segmentation_reader
    path: /mnt/data/datasets/wheeled_walker_100k_8fps_25switch/25k_sample_4_mask/training.tfrecord
    labelmap: &labelmap /mnt/data/datasets/wheeled_walker_100k_8fps_25switch/label_map.pbtxt
    ignore:
    remap: &mapping
      1: 0
      2: 0
      3: 0
      4: 1
      5: 1
    preprocess_list:
      h_flip:
      v_flip:
      resize:
        width: 512
        height: 512
  validation_reader:
    name: tf_record_segmentation_reader
    path: /mnt/data/datasets/omnidetector-Flat/training-mask.tfrecord
    labelmap: *labelmap
    shuffle: False
    ignore:
    remap:
    preprocess_list:
      resize:
        width: 512
        height: 512
training:
  metrics:
    SparseCategoricalCrossentropyIgnore:
      from_logits: False
      num_classes: *num_classes
    SparseMeanIoU:
      num_classes: *num_classes
  callbacks:
    SegmentationLogger:
      num_classes: *num_classes
      # num_visualizations: 200
      opacity: 0.4
  num_steps: 50000
  eval_every_step: 100
  batch_size: 4
  save_strategy:
    monitor: validation_sparse_mean_iou
    mode: max
Run an experiment
To start an experiment, simply run experiment.py with the corresponding configuration file.
python experiment.py --config config/config_dst.yml