Processing module
Main Processing
- class EthoCNN(noClasses)
Bases:
Module
The implementation of the CNN that annotates each stacked image with the corresponding behaviour. The algorithm is designed to function robustly even though the mouse has few distinctive features and the environment may change considerably.
The images used here have 11 channels corresponding to 11 frames, and are typically obtained with the help of the methods from StackFrames. This transforms movement features into image features that are more relevant to behaviour and easier to classify.
The structure is similar to the image classification network proposed by Krizhevsky et al. for ImageNet. One major difference (apart from the number of channels) is that the first convolution layer is formed by 4 asymmetrical filters that react differently to various movement patterns.
- noClasses
number of classes into which the image will be classified
- fc_size
size of the fully connected layer = 4096
- cf_s
size of the last convolution layer = 8
- cf_n
number of channels of the last convolution layer
- bn1
normalization layer for the input image N(11 channels)
- c11
first filter of the first convolution layer: C(11, 64, (11, 1), padding=(5,0), stride=2)
- c12
second filter of the first convolution layer: C(11, 64, (1, 11), padding=(0,5), stride=2)
- c13
third filter of the first convolution layer: C(11, 64, (7, 3), padding=(3,1), stride=2)
- c14
fourth filter of the first convolution layer: C(11, 64, (3, 7), padding=(1,3), stride=2)
- c2
second convolution layer: C(256, 384, 7, stride=2)
- c3
third convolution layer: C(384, 512, 5)
- c4
fourth convolution layer: C(512, self.cf_n, 3)
- dropout
dropout layer
- fc1
first fully connected layer to transform the output of the convolution layers: FC(self.cf_s * self.cf_s * self.cf_n, self.fc_size)
- fc2
second fully connected layer to obtain the final classification: FC(self.fc_size, self.noClasses)
- forward(x)
Implements the architecture of the CNN as:
bn1-[c11, c12, c13, c14]-P(3,2)-c2-P(3,2)-c3-c4-dropout-fc1-dropout-fc2
After an initial batch normalization, the 4 convolutions of the first layer are applied to the normed input. The results are stacked and then 2D max pooling is applied on the resulting tensor. Another 2D max pooling is applied after the second layer convolution. Each convolution layer is followed by a Rectified Linear Unit (ReLU). A Dropout step that randomly zeroes some of the elements of the input tensor during training is applied before each fully connected layer.
- Parameters
x (array) – set of images that will be processed in parallel
- Returns
the log_softmax value of the classification results for each input image in x
- Return type
array
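The layer parameters listed above can be checked against the documented value cf_s = 8 with a little arithmetic. The sketch below is not part of the module: it assumes a 256 x 256 input (the size produced by DataReaderAV.readFrames), no padding in the layers after the first one, and the usual floor rounding for convolution and pooling output sizes. The helper names conv_out and ethocnn_feature_size are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution or pooling window along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def ethocnn_feature_size(height=256, width=256):
    """Follow a height x width input through the documented layer chain."""
    # First layer: four asymmetric filters as (kernel, padding), all stride 2.
    first_layer = {
        "c11": ((11, 1), (5, 0)),
        "c12": ((1, 11), (0, 5)),
        "c13": ((7, 3), (3, 1)),
        "c14": ((3, 7), (1, 3)),
    }
    sizes = {
        name: (conv_out(height, k[0], 2, p[0]), conv_out(width, k[1], 2, p[1]))
        for name, (k, p) in first_layer.items()
    }
    # The four outputs can only be stacked if their spatial sizes agree.
    assert len(set(sizes.values())) == 1, sizes
    h, w = next(iter(sizes.values()))

    # P(3,2), c2, P(3,2), c3, c4 as (kernel, stride); padding assumed to be 0.
    for kernel, stride in [(3, 2), (7, 2), (3, 2), (5, 1), (3, 1)]:
        h, w = conv_out(h, kernel, stride), conv_out(w, kernel, stride)
    return h, w


print(ethocnn_feature_size())  # (8, 8)
```

Under these assumptions the four first-layer filters all produce 128 x 128 maps (4 x 64 = 256 channels once stacked, matching the input of c2), and the last convolution yields 8 x 8 maps, which agrees with cf_s.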
- class DataReaderAV(logger, videoFilePath, annFilePath=None)
Class to read video and annotation files together. It uses PyAV and Pandas libraries to read the files and provides a simplified interface.
- avContainer: av.container.Container
reference to the PyAV object that provides the functionality to read video data
- videoStream = None
reference to the PyAV object corresponding to the video stream from which the frames are read
- annData: pandas.DataFrame
memory representation of the annotation data in Pandas format
- open(streamId=0, offset=0)
Initializes the variables needed for the subsequent reading of the data.
The video file in videoFile will be accessible through avContainer. The video stream identified through streamId will be accessible in videoStream.
If an annotation file was set in annFile, the data it contains is read into the annData variable.
- readFrames(n)
Reads up to n frames and their corresponding annotations, starting from the first frame after the last one read, or from the offset if this is the first call for this object.
The frames returned are converted to single channel, and the top and bottom parts are cropped out. The resulting content is converted to a square and scaled to 256*256 pixels.
Along with the visual data, the timestamp of the frame is also read from the video file. This value represents the time that passed from the beginning of the recording, and will typically be the encoding frame, unless the camera shutter time is used at recording time.
If annotation data is available, for each frame, the corresponding annotation from annData is attached to the result. In this version of the software, we pack groom and micromovement together as mm+.
- Parameters
n (int) – maximum number of frames to be read
- Returns
read and scaled frames, together with the corresponding frame number, timestamp and, if present, annotation
- Return type
array
Helper functions
Module StackFrames contains functions to prepare the video frames for the CNN input.
- getTestTensors(x)
Computes tensors for processing/testing. Every 6th image in x is stacked together with the 5 previous and the 5 following images to form a new 11 channel image. A torch.Tensor is created from the array containing these new images.
If cuda is available, the data is also copied to cuda memory.
- Parameters
x (array) – input frames, single channel
- Returns
tensor containing the stacked images
- Return type
torch.Tensor
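The stacking rule can be illustrated without any tensor library. This is a minimal sketch, not the module's code: it assumes the first stack is centred on the first frame that has 5 predecessors (index 5) and that incomplete windows at the end are dropped. The function name stack_windows is hypothetical, and integers stand in for the single channel images.

```python
def stack_windows(frames, stride=6, half=5):
    """Group frames into overlapping windows of 2 * half + 1 items.

    A window is built around every `stride`-th frame, starting at index
    `half` (assumed), and only when all of its neighbours exist.
    """
    centers = range(half, len(frames) - half, stride)
    return [frames[c - half:c + half + 1] for c in centers]


frames = list(range(23))          # 23 placeholder frames
for window in stack_windows(frames):
    print(window[0], "...", window[-1], "centre", window[len(window) // 2])
```

In the real function each window would become one 11 channel image, and the list of windows would be converted to a torch.Tensor. The same sketch with stride=8 corresponds to the training variant described for getTensors.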
- getTensors(x, ann=None, modify=False)
Computes tensors for training. Every 8th image in x is stacked together with the 5 previous and the 5 following images to form a new 11 channel image. A torch.Tensor is created from the array containing these new images.
If modify is True, some geometric transformations will be applied to 80% of the stacked images. This helps the training process.
If annotation data is provided, a similar process occurs here too. For every 8th annotation, the 5 previous and the 5 following annotations are also considered. The dominant annotation in the set is selected, and the one in the middle is preferred when there is a tie. This way the very short/noisy annotation events are ignored. A torch.Tensor is created from the array containing these new annotations.
If cuda is available, the data is also copied to cuda memory.
- Parameters
x (array) – input frames, single channel
ann (array, optional) – annotation data, defaults to None
modify (bool, optional) – True value triggers geometric altering of the image data, defaults to False
- Returns
a tensor containing the stacked images, and a tensor containing the annotations or None if no annotation data was provided
- Return type
torch.Tensor, torch.Tensor
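The selection of the dominant annotation in an 11 frame window can be sketched as follows. This is an illustration only: the function name dominant_annotation is hypothetical, and the behaviour when the middle annotation is not among the tied ones (here: the tied label that appears first in the window) is an assumption, since the description only covers the case where the middle one takes part in the tie.

```python
from collections import Counter


def dominant_annotation(window):
    """Return the most frequent label in the window.

    On a tie, the label of the middle frame wins if it is one of the
    tied labels; otherwise the first tied label is used (assumption).
    """
    counts = Counter(window)               # keeps first-seen order
    best = max(counts.values())
    tied = [label for label, n in counts.items() if n == best]
    middle = window[len(window) // 2]
    return middle if middle in tied else tied[0]


# A single noisy "rear" frame inside a walking bout is ignored.
print(dominant_annotation(["walk"] * 5 + ["rear"] + ["walk"] * 5))  # walk
```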
Integration
- class ProcComm(port=2000)
Provides a basic interface for creating and communicating with the ProcessVideo.ProcessVideo class.
This is the entry point of the Docker instance.
- start_sockServer()
Creates a socket listening on the port given by port. When running inside a Docker instance, this port is internal, and the mapping to the local system communication port is done at creation time.
When the connection is established, the method reads the file name that is to be processed and creates the file name for the output. The full file names are relative to the Docker instance mounted folder.
It then starts the processing by calling callProc().
- callProc(videoFile, outputFile, logger)
Creates an instance of ProcessVideo.ProcessVideo and calls the ProcessVideo.ProcessVideo.process() method until the whole video is processed. It regularly sends progress feedback through the connSocket socket.
- Parameters
videoFile (str) – full path to the video file to be processed
outputFile (str) – full path to the output file
logger (Logger.Logger) – simple logger used mostly for debugging
- sendMessage(message)
Sends the size of the message and the contents through the connSocket socket.
- Parameters
message (str) – data to be sent
- Raises
RuntimeError – raised when socket send method fails
- receiveMessage()
Reads a message from the connSocket socket.
- Returns
message data that was read
- Return type
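A size-prefixed exchange of the kind described for sendMessage() and receiveMessage() could look like the sketch below. The wire format is an assumption (a 4-byte unsigned size in network byte order followed by UTF-8 text); the actual encoding used by ProcComm is not documented here. The function names are hypothetical, and a local socket pair stands in for connSocket.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # assumed header: 4-byte size, network byte order


def send_message(sock, message):
    """Send the size of the message, then its contents."""
    payload = message.encode("utf-8")
    data = HEADER.pack(len(payload)) + payload
    total = 0
    while total < len(data):
        sent = sock.send(data[total:])
        if sent == 0:
            raise RuntimeError("socket connection broken")
        total += sent


def _recv_exact(sock, size):
    """Read exactly `size` bytes, looping over partial reads."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise RuntimeError("socket connection broken")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def receive_message(sock):
    """Read the size header, then the message it announces."""
    (size,) = HEADER.unpack(_recv_exact(sock, HEADER.size))
    return _recv_exact(sock, size).decode("utf-8")


# Usage with a connected pair of local sockets.
client, server = socket.socketpair()
send_message(client, "video_01.mp4")
print(receive_message(server))
client.close()
server.close()
```

Sending the size first lets the receiver know how many bytes belong to the message, which matters because a single recv() call may return only part of what was sent.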
- class ProcessVideo(modelPath, noClasses, videoFile, outputFile, segSize=125)
Wrapper for instantiating the CNN with a specified model and using it to process a video file.
- segSize: int
maximum number of frames that will be read and processed at a time - depends on the available memory
- process(logger)
Generator function for behaviour classification.
Creates an EthoCNN.EthoCNN instance with noClasses outputs and loads the trained model present in modelPath. The resulting object will be running in evaluation mode and, if available, in the CUDA environment.
For each call, the method will process a maximum of segSize frames, store the results in the output file and return the percentage of the total frames that have been processed so far.
Each classification is done for a set of 11 frames. The process is performed with a stride of 6, as described in StackFrames.getTestTensors. The result is then attributed to the frame in the middle of the interval, the 3 frames before, and the 2 frames right after that. Therefore there are no behaviour bouts shorter than 240ms. This conforms to the behaviour definition and avoids noisy results, while speeding up the processing.
The current output contains these behaviours: drink, eat, mm+, hang, rear, rest, and walk.
- Parameters
logger (Logger.Logger) – simple logger used mostly for debugging
- Yield
the percentage of frames already processed
- Return type
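The way one classification is attributed to six consecutive frames can be sketched as below. This is an illustration under assumptions, not the module's code: window centres are taken to start at frame 5 and advance by 6 frames (every 6th image, as in getTestTensors), and frames not covered by any window are left as None. The name spread_predictions is hypothetical.

```python
def spread_predictions(labels, n_frames, stride=6, half=5, before=3, after=2):
    """Assign each window label to its middle frame, 3 before and 2 after."""
    per_frame = [None] * n_frames
    for i, label in enumerate(labels):
        center = half + i * stride                  # assumed centre position
        last = min(center + after, n_frames - 1)
        for frame in range(center - before, last + 1):
            per_frame[frame] = label
    return per_frame


print(spread_predictions(["walk", "rest"], n_frames=17))
```

Each label covers 3 + 1 + 2 = 6 frames, so consecutive windows tile the video without gaps or overlaps. Six frames correspond to the 240ms minimum bout length if the video runs at 25 frames per second (40ms per frame); that frame rate is inferred from the figures above rather than stated.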
- class Logger(logPath)
Simple logger used mostly for debugging.
- setLogFile(logPath)
Creates the logFile handle to the logPath location.
- Parameters
logPath (str) – path to where the log file will be created
Training and Testing
- class TrainModel(noClasses, log)
Class to train a new model using a set of annotated video intervals.
- optimizer: torch.optim.Optimizer
the handle to the optimizer used for updating the model’s parameters
- criterion: torch.nn.CrossEntropyLoss
the handle to the loss class used for computing the loss gradients
- trainIntervals: []
the list of TrainInterval objects that form an epoch
- stagnating: int
counts the number of iterations where the average cost is under a certain threshold
- initModel()
Initializes the model, the optimizer and the criterion.
If cuda is available, they will all be transferred to the GPU memory.
- addTrainInterval(videoFile, annFile)
Creates a new TrainInterval object and adds it to the list in trainIntervals. This will use the video found at videoFile and the annotation file found at the annFile path.
- trainEpoch()
Train an epoch.
Calls TrainInterval.TrainInterval.train() for all the items added to trainIntervals. The average cost, the target annotations used, the training accuracy score and the corresponding confusion matrix are accumulated and displayed.
- Returns
average cost
- Return type
- train(epochs)
Trains the model at model for a specified number of epochs. All intermediate models except the first are saved.
The process might stop earlier if the average cost persists under a certain threshold for more than 5 steps.
- Parameters
epochs (int) – the number of epochs that will be trained
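The early stop based on the stagnating counter can be sketched as a plain loop. This is a minimal sketch under assumptions: the threshold value is made up, the counter is reset whenever the cost rises above the threshold again (one reading of "persists"), and saving the intermediate models is left out. The name train_with_early_stop and its parameters are hypothetical; train_epoch stands for a callable that returns the average cost, as trainEpoch() does.

```python
def train_with_early_stop(epochs, train_epoch, threshold=0.01, patience=5):
    """Run up to `epochs` epochs and return how many were actually run."""
    stagnating = 0
    for epoch in range(epochs):
        cost = train_epoch()
        if cost < threshold:
            stagnating += 1       # one more step under the threshold
        else:
            stagnating = 0        # assumed: the streak must be unbroken
        if stagnating > patience:
            return epoch + 1      # stop before the requested number of epochs
    return epochs


costs = iter([0.5, 0.2] + [0.005] * 10)
print(train_with_early_stop(12, lambda: next(costs)))  # 8
```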
- saveModel(path)
Saves the current model at the specified path.
- Parameters
path (str) – path to the file where the model will be saved
- loadModel(path)
Creates a new EthoCNN.EthoCNN instance with noClasses outputs and loads into it the state dictionary from the specified path.
- Parameters
path (str) – path to the file where the model is found
- class TrainInterval(model, optimizer, criterion, videoFile, annFile, noClasses, logger)
Class to train a model with one annotated video interval.
- optimizer: torch.optim.Optimizer
the handle to the optimizer used for updating the model’s parameters
- criterion: torch.nn.CrossEntropyLoss
the handle to the loss class used for computing the loss gradients
- train()
Trains the model at model with the frames from videoFile against the target in annFile.
The process is split into steps of maximum segSize frames. The classification is done with a stride of 8, as described in StackFrames.getTensors(). For each step, the predicted annotations are computed, then the loss gradients against the target from the manual annotation. The optimizer then updates the model parameters.
- class TestModel(noClasses, modelPath, logger)
Class for testing the model found at modelPath with noClasses outputs.
- testIntervals: []
the list of TestInterval objects
- addTestInterval(videoFile, annFile)
Creates a new TestInterval.TestInterval object and adds it to the list in testIntervals. This will use the video found at videoFile and the annotation file found at the annFile path.
- test()
Calls the TestInterval.TestInterval.test() method for all the objects in testIntervals. The results are accumulated and the final accuracy and confusion matrix are displayed at the end.
Intermediate accuracy results are also displayed to follow the progress.
- class TestInterval(modelPath, noClasses, videoFile, annFile, logger)
Class for testing a saved model against a video file and the associated annotation file.
- test()
Loads the model at modelPath in a new EthoCNN.EthoCNN instance with noClasses outputs. This will be used to classify all the frames in the video found at the videoFile location. The process is split into steps of maximum segSize frames.
The classification is done with a stride of 6, as described in StackFrames.getTestTensors(). The result is then attributed to the frame in the middle of the interval, the 3 frames before, and the 2 frames right after that. The current output contains these behaviours: drink, eat, mm+, hang, rear, rest, and walk.
These results are then compared to the ones from the annotation file for the corresponding frames to compute the accuracy score and fill the confusion matrix.
- Returns
the accuracy score and the confusion matrix
- Return type
float, []
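The accuracy score and the confusion matrix can be computed from per-frame class indices as in the sketch below. It is not the module's implementation: the function name score is hypothetical, and the matrix layout (rows for the annotated class, columns for the predicted class) is an assumption.

```python
def score(predicted, target, n_classes):
    """Return (accuracy, confusion matrix) for two lists of class indices."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for p, t in zip(predicted, target):
        matrix[t][p] += 1         # assumed layout: row = target, column = predicted
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n_classes))
    accuracy = correct / total if total else 0.0
    return accuracy, matrix


accuracy, matrix = score([0, 1, 1, 2], [0, 1, 2, 2], n_classes=3)
print(accuracy)  # 0.75
for row in matrix:
    print(row)
```

Accumulating the matrices of several intervals element by element, as TestModel.test() is described to do, gives the overall accuracy as the sum of the diagonal divided by the sum of all entries.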
SelectTrainingData