Processing module

Main Processing

class EthoCNN(noClasses)

Bases: Module

The implementation of the CNN that annotates each stacked image with the corresponding behaviour. The algorithm is designed to function robustly when the mouse has few distinctive features and the environment may change substantially.

The images used here have 11 channels corresponding to 11 frames, and are typically obtained with the help of the methods from StackFrames. This transforms movement features into image features that are more relevant to behaviour and easier to classify.

The structure is similar to the one from the image classification proposed by Krizhevsky et al. in ImageNet. One major difference (apart from the number of channels) is that the first convolution layer is formed by 4 asymmetrical filters that will react differently to various movement patterns.

noClasses: numer of classes in which the image will be classified

fc_size: size of the fully connected layer = 4096

cf_s: size of the last convolution layer = 8

cf_n: number of channels of the last convolution layer

bn1: normalization layer for the input image N(11 channels)

c11: first filter of the first convolution layer: C(11, 64, (11, 1), padding=(5,0), stride=2)

c12: second filter of the first convolution layer: C(11, 64, (1, 11), padding=(0,5), stride=2)

c13: third filter of the first convolution layer: C(11, 64, (7, 3), padding=(3,1), stride=2)

c14: fourth filter of the first convolution layer: C(11, 64, (3, 7), padding=(1,3), stride=2)

c2: second convolution layer: C(256, 384, 7, stride=2)

c3: third convolution layer: C(384, 512, 5)

c4: fourth convolution layer: C(512, self.cf_n, 3)

dropout: dropout layer

fc1: first fully connected layer to transform the output of the convolution layers FC(self.cf_s * self.cf_s * self.cf_n, self.fc_size)

fc2: second fully connected layer to obtain the final classification: FC(self.fc_size, self.noClasses)

forward(x)

Implements the architecture of the CNN as:

bn1-[c11, c12, c13, c14]-P(3,2)-c2-P(3,2)-c3-c4-dropout-fc1-dropout-fc2

After an initial batch normalization, the 4 convolutions of the first layer are applied to the normed input. The results are stacked and then 2D max pooling is applied on the resulting tensor. Another 2D max pooling is applied after the second layer convolution. Each convolution layer is followed by a Rectified Linear Unit (ReLU). A Dropout step that randomly zeroes some of the elements of the input tensor during training is applied before each fully connected layer.

Parameters: x (array) – set of images that will be processed in parallel
Returns: the log_softmax value of the classification results for each input image in x
Return type: array
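
The layer parameters listed above imply the value cf_s = 8 for a 256*256 input. The sketch below checks that arithmetic with the usual output-size formula for convolution and pooling layers, floor((n + 2p - k) / s) + 1. The 256*256 input size is taken from DataReaderAV.readFrames; the helper names out_size and ethocnn_feature_size are hypothetical, and zero padding is assumed wherever the layer description gives none.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Output length along one axis of a convolution or max-pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1


def ethocnn_feature_size(n=256):
    """Follow one spatial axis through bn1-[c11..c14]-P-c2-P-c3-c4."""
    # The four first-layer filters must agree on the output size so that
    # their 64-channel outputs can be stacked into 256 channels.
    first_layer = {
        "c11": ((11, 5), (1, 0)),  # (kernel, padding) for height, then width
        "c12": ((1, 0), (11, 5)),
        "c13": ((7, 3), (3, 1)),
        "c14": ((3, 1), (7, 3)),
    }
    sizes = set()
    for (kh, ph), (kw, pw) in first_layer.values():
        sizes.add(out_size(n, kh, stride=2, padding=ph))
        sizes.add(out_size(n, kw, stride=2, padding=pw))
    assert len(sizes) == 1, "first-layer outputs cannot be stacked"
    n = sizes.pop()               # 128 for a 256-pixel input
    n = out_size(n, 3, stride=2)  # P(3,2)
    n = out_size(n, 7, stride=2)  # c2
    n = out_size(n, 3, stride=2)  # P(3,2)
    n = out_size(n, 5)            # c3
    n = out_size(n, 3)            # c4
    return n


print(ethocnn_feature_size())  # 8, matching cf_s
```

With this result, fc1 receives cf_s * cf_s * cf_n = 64 * cf_n input values per image.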

class DataReaderAV(logger, videoFilePath, annFilePath=None)

Class to read video and annotation files together. It uses PyAV and Pandas libraries to read the files and provides a simplified interface.

videoFile: str: the path to the video file being read

annFile: str: the path to the annotation file being read

logger: Logger: simple shared logger used mostly for debugging

avContainer: av.container.Container: reference to the PyAV object that provides the functionality to read video data

videoStream = None: reference to the PyAV object corresponding to the video stream from which the frames are read

annData: pandas.DataFrame: memory representation of the annotation data in Pandas format

totalFrames: int: the total number of frames in the selected video stream

open(streamId=0, offset=0)

Initializes the variables needed for the subsequent reading of the data.

The video file in videoFile will be accessible through avContainer. The video stream identified through streamId will be accessible in videoStream.

If an annotation file was set in annFile, the containing data is read into the annData variable.

Parameters

streamId (int, optional) – the index of the video stream from which the frames will be read, defaults to 0
offset (int, optional) – the frame position at which the first frame will be read, defaults to 0

Returns

True when successfully opened both files, False otherwise

Return type

bool

readFrames(n)

Reads up to n frames and their corresponding annotations starting from the first frame after the last one read, or from the offset if this is the first call for this object.

The frames returned are converted to single channel, and the top and bottom part are cropped out. The resulting content is converted to a square and scaled to 256*256 pixels.

Along with the visual data, the timestamp of the frame is also read from the video file. This value represents the time that passed from the beginning of the recording, and will typically be the encoding frame, unless the camera shutter time is used at recording time.

If annotation data is available, the corresponding annotation from annData is attached to the result for each frame. In this version of the software, groom and micromovement are packed together as mm+.

Parameters: n (int) – maximum number of frames to be read
Returns: read and scaled frames, together with the corresponding frame number, timestamp and, if present, annotation
Return type: array

Helper functions

Module StackFrames contains functions to prepare the video frames for the CNN input.

getTestTensors(x)

Computes tensors for processing/testing. Every 6th image in x is stacked together with the 5 previous and the 5 following images to form a new 11 channel image. A torch.Tensor is created from the array containing these new images.

If cuda is available, the data is also copied to cuda memory.

Parameters: x (array) – input frames, single channel
Returns: tensor containing the stacked images
Return type: torch.Tensor
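
The stacking described above can be sketched as an index computation. This is a minimal illustration, not the module's code: the function name window_indices is hypothetical, and it assumes that the first stacked image is centred on the first frame that has 5 predecessors. A stride of 6 corresponds to getTestTensors and a stride of 8 to getTensors.

```python
def window_indices(n_frames, stride, half_window=5):
    """Return, for each stacked image, the indices of the frames it combines."""
    windows = []
    # Assumption: the first centre is the first frame that has
    # `half_window` predecessors; later centres follow every `stride` frames.
    centre = half_window
    while centre + half_window < n_frames:
        windows.append(list(range(centre - half_window, centre + half_window + 1)))
        centre += stride
    return windows


windows = window_indices(n_frames=23, stride=6)
for w in windows:
    print(w[0], "...", w[-1], "centre", w[5])
```

For 23 input frames and a stride of 6 this gives three 11-frame windows centred on frames 5, 11 and 17; consecutive windows overlap by 5 frames.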

getTensors(x, ann=None, modify=False)

Computes tensors for training. Every 8th image in x is stacked together with the 5 previous and the 5 following images to form a new 11 channel image. A torch.Tensor is created from the array containing these new images.

If modify is True, some geometric transformations will be applied to 80% of the stacked images. This helps the training process.

If annotation data is provided, a similar process is applied to it. For every 8th annotation, the 5 previous and the 5 following annotations are also considered. The dominant annotation in the set is selected; when there is a tie, the one in the middle is preferred. This way very short or noisy annotation events are ignored. A torch.Tensor is created from the array containing these new annotations.

If cuda is available, the data is also copied to cuda memory.

Parameters

x (array) – input frames, single channel
ann (array, optional) – annotation data, defaults to None
modify (bool, optional) – True value triggers geometric altering of the image data, defaults to False

Returns

a tensor containing the stacked images and a tensor containing the annotations, or None in place of the latter

Return type

torch.Tensor, torch.Tensor
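
The selection of the dominant annotation can be sketched as follows. The function name dominant_annotation is hypothetical. The text only states that the middle annotation is preferred on a tie; the fallback used here when the middle annotation is not one of the tied labels (take the tied label seen first) is an assumption.

```python
from collections import Counter


def dominant_annotation(window):
    """Pick the most frequent label in an odd-length window of annotations."""
    counts = Counter(window)
    best = max(counts.values())
    tied = [label for label, count in counts.items() if count == best]
    middle = window[len(window) // 2]
    # On a tie the middle annotation wins when it is one of the tied labels.
    if middle in tied:
        return middle
    # Assumption: otherwise fall back to the tied label seen first.
    return tied[0]


# A 3-frame "rest" event inside a walking bout is ignored.
print(dominant_annotation(["walk"] * 4 + ["rest"] * 3 + ["walk"] * 4))  # walk
# "eat" and "walk" are tied at 5 each; the middle annotation is "walk".
print(dominant_annotation(["eat"] * 5 + ["walk"] * 5 + ["rest"]))  # walk
```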

Integration

class ProcComm(port=2000)

Provides a basic interface for creating and communicating with the ProcessVideo.ProcessVideo class.

This is the entry point of the Docker instance.

port: int: port on which the connection will be established, default is 2000

connSocket: socket: connection socket to send and receive data

start_sockServer()

Creates a socket listening on the port given by port. When running inside a Docker instance, this port is internal, and the mapping to the local system communication port is done at creation time.

When the connection is established, the method reads the file name that is to be processed and creates the file name for the output. The full file names are relative to the Docker instance mounted folder.

It then starts the processing by calling callProc()

callProc(videoFile, outputFile, logger)

Creates an instance of ProcessVideo.ProcessVideo and calls the ProcessVideo.ProcessVideo.process() method until the whole video is processed. It regularly sends progress feedback through the connSocket socket.

Parameters

videoFile (str) – full path to the video file to be processed
outputFile (str) – full path to the output file
logger (Logger.Logger) – simple logger used mostly for debugging

sendMessage(message)

Sends the size of the message and the contents through the connSocket socket

Parameters: message (str) – data to be sent
Raises: RuntimeError – raised when socket send method fails

receiveMessage()

Reads a message from the connSocket socket

Returns: message data that was read
Return type: str
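
A size-prefixed exchange of the kind described for sendMessage and receiveMessage can be sketched as below. The encoding of the size is not specified here, so the 4-byte network-order header is an assumption, as are the UTF-8 encoding, the helper names and the example file name. A connected socket pair stands in for connSocket.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # assumed: 4-byte unsigned size, network byte order


def send_all(sock, data):
    """Send every byte of `data`, raising RuntimeError if the socket fails."""
    sent_total = 0
    while sent_total < len(data):
        sent = sock.send(data[sent_total:])
        if sent == 0:
            raise RuntimeError("socket connection broken")
        sent_total += sent


def send_message(sock, message):
    """Send the size of the encoded message, then the message itself."""
    payload = message.encode("utf-8")
    send_all(sock, HEADER.pack(len(payload)) + payload)


def recv_exact(sock, size):
    """Read exactly `size` bytes from the socket."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise RuntimeError("socket connection broken")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def receive_message(sock):
    """Read the size header, then that many bytes, and decode them."""
    (size,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, size).decode("utf-8")


# A connected socket pair stands in for the client and the Docker instance.
left, right = socket.socketpair()
send_message(left, "video_01.mp4")  # hypothetical file name
print(receive_message(right))
left.close()
right.close()
```

Sending the size first lets the receiver know how many bytes belong to the message, since a single recv call may return only part of what was sent.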

class ProcessVideo(modelPath, noClasses, videoFile, outputFile, segSize=125)

Wrapper for instantiating the CNN with a specified model and using it to process a video file.

videoFile: str: the full path to the video that will be processed by the current instance

outputFile: str: the full path to the output file where the results will be stored

noClasses: str: number of classed used by the model (current model uses 10)

modelPath: str: full path to trained model that will be used

segSize: int: maximum number of frames that will be read and processed at a time - depends on the available memory

process(logger)

Generator function for behaviour classification.

Creates an EthoCNN.EthoCNN instance with noClasses outputs and loads the trained model present in modelPath. The resulting object will be running in evaluation mode, and, if available, on the CUDA environment.

For each call, the method will process a maximum of segSize frames, store the results in the output file and return the percentage of the total frames that have been processed so far.

Each classification is done for a set of 11 frames. The process is performed with a stride of 6, as described in StackFrames.getTestTensors. The result is then attributed to the frame in the middle of the interval, the 3 frames before it, and the 2 frames right after it. Therefore there are no behaviour bouts shorter than 240ms. This conforms to the behaviour definition and avoids noisy results, while speeding up the processing.

The current output contains these behaviours: drink, eat, mm+, hang, rear, rest, and walk

Parameters: logger (Logger.Logger) – simple logger used mostly for debugging
Yield: the percentage of frames already processed
Return type: int
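
The attribution of each classification result to 6 frames can be sketched as below, assuming the stride of 6 given in StackFrames.getTestTensors and windows centred on frames 5, 11, 17 and so on. The function name attribute_predictions is hypothetical. Six frames per result match the 240ms figure if the video runs at 25 frames per second, which is an inference; the frame rate is not stated here.

```python
def attribute_predictions(predictions, stride=6, half_window=5):
    """Map each window prediction to the frames it is attributed to."""
    labels = {}
    for k, label in enumerate(predictions):
        # Assumption: window k is centred on frame half_window + k * stride.
        centre = half_window + k * stride
        # Middle frame, the 3 frames before it and the 2 frames after it.
        for frame in range(centre - 3, centre + 3):
            labels[frame] = label
    return labels


labels = attribute_predictions(["rest", "walk", "walk"])
print(sorted(labels))        # frames 2 to 19, each labelled exactly once
print(labels[7], labels[8])  # rest walk
```

Because the number of attributed frames equals the stride, consecutive results tile the video without gaps or overlaps.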

class Logger(logPath)

Simple logger used mostly for debugging.

logFile: str: handle to the file where the logs will be writtem

setLogFile(logPath)

Creates logFile handle to the ‘logPath’ location.

Parameters: logPath (str) – path to where the log file will be created

log(message, printMess=False)

Writes the message through the file handle logFile. Every message will be preceded by the current time.

If printMess is true, the message will be also sent to the standard output.

Parameters

message (str) – message to be logged
printMess (bool, optional) – flag to print the message to the standard output, defaults to False

closeLogFile(): Closes the logFile handle.

Training and Testing

class TrainModel(noClasses, log)

Class to train a new model using a set of annotated video intervals.

optimizer: torch.optim.Optimizer: the handle to the optimizer used for updating the model’s parameters

criterion: torch.nn.CrossEntropyLoss: the handle to the loss class used for computing the loss gradients

model: EthoCNN: the model that will be trained

trainIntervals: []: the list of TrainInterval objects that form an epoch

noClasses: int: number of classes the model is initialized with

stagnating: int: counts the number of iterations where the average cost is under a certain threshold

logger: Logger: simple logger used mostly for debugging

initModel()

Initializes the model, the optimizer and the criterion.

If cuda is available, they will all be transferred to the GPU memory.

addTrainInterval(videoFile, annFile)

Creates a new TrainInterval object and adds it to the list in trainIntervals. This will use the video found at videoFile and the annotation file found at annFile path.

Parameters

videoFile (str) – path to the video file to be used for training
annFile (str) – path to the annotation file to be used as target

trainEpoch()

Train an epoch.

Calls the TrainInterval.TrainInterval.train() method for all the items added in trainIntervals. The average cost, the target annotations used, the training accuracy score and the corresponding confusion matrix are accumulated and displayed.

Returns: average cost
Return type: float

train(epochs)

Trains the model at model for a specified number of epochs. All intermediate models except the first are saved.

The process might stop earlier if the average cost persists under a certain threshold for more than 5 steps.

Parameters: epochs (int) – the number of epochs that will be trained
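
The early-stop rule can be sketched with a counter such as stagnating. This is an illustration under assumptions: the threshold value is hypothetical, the counter is assumed to reset when the cost rises above the threshold again, and train_epoch is any callable that returns the average cost of one epoch.

```python
def train_with_early_stop(train_epoch, epochs, threshold=0.01, patience=5):
    """Run up to `epochs` epochs, stopping once the cost stays low long enough."""
    stagnating = 0
    costs = []
    for _ in range(epochs):
        cost = train_epoch()
        costs.append(cost)
        # Assumption: the counter resets when the cost rises above the threshold.
        stagnating = stagnating + 1 if cost < threshold else 0
        if stagnating > patience:
            break
    return costs


# A fake epoch function that returns a decreasing cost.
fake_costs = iter([0.5, 0.1, 0.02] + [0.005] * 20)
history = train_with_early_stop(lambda: next(fake_costs), epochs=20)
print(len(history))  # 9: three epochs above the threshold, then six below it
```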

saveModel(path)

Saves the current model at the specified path.

Parameters: path (str) – path to the file where the model will be saved

loadModel(path)

Creates a new EthoCNN.EthoCNN instance with noClasses outputs and loads into it the state dictionary from the specified path.

Parameters: path (str) – path to the file where the model is found

cleanup(): Deletes the model, the optimizer and the criterion and calls system cleanup functions.

class TrainInterval(model, optimizer, criterion, videoFile, annFile, noClasses, logger)

Class to train a model with one annotated video interval.

videoFile: str: the path to the video file to be used for training

annFile: str: the path to the file containing the groundtruth for the video

model: EthoCNN: the handle to the model used for training

optimizer: torch.optim.Optimizer: the handle to the optimizer used for updating the model’s parameters

criterion: torch.nn.CrossEntropyLoss: the handle to the loss class used for computing the loss gradients

noClasses: int: number of outputs for the current model

segSize: int: the maximum number of frames that will be processed in parallel

logger: Logger: simple logger used mostly for debugging

train()

Trains the model at model with the frames from videoFile against the target in annFile

The process is split into steps of maximum segSize frames. The classification is done with a stride of 8, as described in StackFrames.getTensors(). For each step, the predicted annotations are computed, then the loss gradients against the target from the manual annotation. The optimizer then updates the model parameters.

Returns: the average training cost, the set of target annotations used, the training accuracy score and the corresponding confusion matrix
Return type: float, [], float, []

class TestModel(noClasses, modelPath, logger)

Class for testing the model found at modelPath with noClasses outputs.

testIntervals: []: the list of TestInterval objects

modelPath: str: path to the saved model that will be tested

noClasses: int: number of classes the model is initialized with

logger: Logger: simple logger used mostly for debugging

addTestInterval(videoFile, annFile)

Creates a new TestInterval.TestInterval object and adds it to the list in testIntervals. This will use the video found at videoFile and the annotation file found at annFile path.

Parameters

videoFile (str) – path to the video file to be tested
annFile (str) – path to the annotation file to be used

test()

Calls the TestInterval.TestInterval.test() method for all the objects in testIntervals. The results are accumulated and the final accuracy and confusion matrix are displayed at the end.

Intermediate accuracy results are also displayed to follow the progress.

class TestInterval(modelPath, noClasses, videoFile, annFile, logger)

Class for testing a saved model against a video file and the associated annotation file.

videoFile: str: path to the video file that will be used

annFile: str: path to the annotation file that will be used

segSize: int: the maximum number of frames that will be processed in parallel

noClasses: int: number of classes the model is initialized with

modelPath: str: path to the saved model that will be tested

logger: Logger: simple logger used mostly for debugging

test()

Loads the model at modelPath in a new EthoCNN.EthoCNN instance with noClasses outputs. This will be used to classify all the frames in the video found at videoFile location. The process is split into steps of maximum segSize frames.

The classification is done with a stride of 6, as described in StackFrames.getTestTensors(). The result is then attributed to the frame in the middle of the interval, the 3 frames before, and the 2 frames right after that. The current output contains these behaviours: drink, eat, mm+, hang, rear, rest, and walk.

These results are then compared to the ones from the annotation file for the corresponding frames to compute the accuracy score and fill the confusion matrix.

Returns: the accuracy score and the confusion matrix
Return type: float, []
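
The comparison between predicted and annotated classes can be sketched as below. The function names are hypothetical, classes are assumed to be encoded as integer indices, and the orientation of the matrix (rows for annotated classes, columns for predicted classes) is an assumption.

```python
def confusion_matrix(targets, predictions, no_classes):
    """Rows are the annotated classes, columns the predicted classes."""
    matrix = [[0] * no_classes for _ in range(no_classes)]
    for target, predicted in zip(targets, predictions):
        matrix[target][predicted] += 1
    return matrix


def accuracy(matrix):
    """Fraction of frames on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


targets = [0, 0, 1, 2, 2, 2]
predictions = [0, 1, 1, 2, 2, 0]
matrix = confusion_matrix(targets, predictions, no_classes=3)
print(matrix)  # [[1, 1, 0], [0, 1, 0], [1, 0, 2]]
print(accuracy(matrix))
```

Matrices from several intervals can be added element by element, which is one way the results of TestModel.test() could be accumulated before the final accuracy is computed.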

SelectTrainingData