Processing module

Main Processing

class EthoCNN(noClasses)

Bases: Module

The implementation of the CNN that annotates each stacked image with the corresponding behaviour. The algorithm is designed to function robustly when the mouse has few distinctive features and the environment may change substantially.

The images used here have 11 channels corresponding to 11 frames, and are typically obtained with the help of the methods from StackFrames. This transforms movement features into image features that are more relevant to behaviour and easier to classify.

The structure is similar to the image classification network proposed by Krizhevsky et al. for ImageNet. One major difference (apart from the number of channels) is that the first convolution layer is formed by 4 asymmetrical filters that react differently to various movement patterns.

noClasses

number of classes into which the image will be classified

fc_size

size of the fully connected layer = 4096

cf_s

size of the last convolution layer = 8

cf_n

number of channels of the last convolution layer

bn1

normalization layer for the input image N(11 channels)

c11

first filter of the first convolution layer: C(11, 64, (11, 1), padding=(5,0), stride=2)

c12

second filter of the first convolution layer: C(11, 64, (1, 11), padding=(0,5), stride=2)

c13

third filter of the first convolution layer: C(11, 64, (7, 3), padding=(3,1), stride=2)

c14

fourth filter of the first convolution layer: C(11, 64, (3, 7), padding=(1,3), stride=2)

c2

second convolution layer: C(256, 384, 7, stride=2)

c3

third convolution layer: C(384, 512, 5)

c4

fourth convolution layer: C(512, self.cf_n, 3)

dropout

dropout layer

fc1

first fully connected layer to transform the output of the convolution layers FC(self.cf_s * self.cf_s * self.cf_n, self.fc_size)

fc2

second fully connected layer to obtain the final classification: FC(self.fc_size, self.noClasses)

forward(x)

Implements the architecture of the CNN as:

bn1-[c11, c12, c13, c14]-P(3,2)-c2-P(3,2)-c3-c4-dropout-fc1-dropout-fc2

After an initial batch normalization, the 4 convolutions of the first layer are applied to the normed input. The results are stacked and then 2D max pooling is applied on the resulting tensor. Another 2D max pooling is applied after the second layer convolution. Each convolution layer is followed by a Rectified Linear Unit (ReLU). A Dropout step that randomly zeroes some of the elements of the input tensor during training is applied before each fully connected layer.
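The layer parameters listed above can be checked against cf_s = 8 with a little arithmetic. The sketch below is only an illustration: it assumes a 256×256 input (the size produced by DataReaderAV.readFrames), reads P(3,2) as max pooling with kernel 3 and stride 2 without padding, and uses the standard output-size formula for convolutions. The helper name conv_out is hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution or pooling along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


size = 256  # assumed input side, as produced by readFrames

# (kernel, padding) of the four first-layer filters, each given as (height, width)
first_layer = {
    "c11": ((11, 1), (5, 0)),
    "c12": ((1, 11), (0, 5)),
    "c13": ((7, 3), (3, 1)),
    "c14": ((3, 7), (1, 3)),
}
shapes = {
    name: (conv_out(size, k[0], 2, p[0]), conv_out(size, k[1], 2, p[1]))
    for name, (k, p) in first_layer.items()
}

# The four outputs can only be stacked along the channel axis
# (4 x 64 = 256 channels) if their spatial shapes agree.
assert len(set(shapes.values())) == 1

side = shapes["c11"][0]      # 128
side = conv_out(side, 3, 2)  # P(3,2): 63
side = conv_out(side, 7, 2)  # c2:     29
side = conv_out(side, 3, 2)  # P(3,2): 14
side = conv_out(side, 5)     # c3:     10
side = conv_out(side, 3)     # c4:      8
print(side)
```

Under these assumptions the last convolution produces an 8×8 map, which matches the cf_s * cf_s * cf_n input size of fc1.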

Parameters

x (array) – set of images that will be processed in parallel

Returns

the log_softmax value of the classification results for each input image in x

Return type

array
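Because forward returns log_softmax values, the predicted class is the index of the largest value, and exponentiating recovers probabilities. The following pure-Python sketch illustrates the relation for a single image; the scores are made up and the helper is not part of the module.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax of a list of raw class scores."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]


scores = [2.0, 0.5, -1.0]  # hypothetical raw outputs for 3 classes
log_probs = log_softmax(scores)

# exp() turns the log-probabilities back into probabilities that sum to 1
probs = [math.exp(v) for v in log_probs]

# The predicted class is the index of the largest log-probability
predicted = max(range(len(log_probs)), key=log_probs.__getitem__)
print(predicted, probs)
```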

class DataReaderAV(logger, videoFilePath, annFilePath=None)

Class to read video and annotation files together. It uses PyAV and Pandas libraries to read the files and provides a simplified interface.

videoFile: str

the path to the video file being read

annFile: str

the path to the annotation file being read

logger: Logger

simple shared logger used mostly for debugging

avContainer: av.container.Container

reference to the PyAV object that provides the functionality to read video data

videoStream = None

reference to the PyAV object corresponding to the video stream from which the frames are read

annData: pandas.DataFrame

memory representation of the annotation data in Pandas format

totalFrames: int

the total number of frames in the selected video stream

open(streamId=0, offset=0)

Initializes the variables needed for the subsequent reading of the data.

The video file in videoFile will be accessible through avContainer. The video stream identified through streamId will be accessible in videoStream.

If an annotation file was set in annFile, the containing data is read into the annData variable.

Parameters
  • streamId (int, optional) – the index of the video stream from which the frames will be read, defaults to 0

  • offset (int, optional) – the frame position at which the first frame will be read, defaults to 0

Returns

True when both files were opened successfully, False otherwise

Return type

bool

readFrames(n)

Reads up to n frames and their corresponding annotations starting from the first frame after the last one read, or from the offset if this is the first call for this object.

The frames returned are converted to single channel, and the top and bottom part are cropped out. The resulting content is converted to a square and scaled to 256*256 pixels.

Along with the visual data, the timestamp of the frame is also read from the video file. This value represents the time that passed from the beginning of the recording, and will typically be the encoding frame, unless the camera shutter time is used at recording time.

If annotation data is available, the corresponding annotation from annData is attached to the result for each frame. In this version of the software, groom and micromovement are packed together as mm+.

Parameters

n (int) – maximum number of frames to be read

Returns

read and scaled frames, together with the corresponding frame number, timestamp and, if present, annotation

Return type

array

Helper functions

Module StackFrames contains functions to prepare the video frames for the CNN input.

getTestTensors(x)

Computes tensors for processing/testing. Every 6th image in x is stacked together with the 5 previous and the 5 following images to form a new 11 channel image. A torch.Tensor is created from the array containing these new images.

If cuda is available, the data is also copied to cuda memory.

Parameters

x (array) – input frames, single channel

Returns

tensor containing the stacked images

Return type

torch.Tensor
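The stacking can be pictured with plain lists. This is a sketch of the windowing only, not the module's code: it assumes the first stack is centred on the first frame that has 5 predecessors, and it leaves out the conversion to torch.Tensor and the copy to CUDA memory. The function name stack_frames is hypothetical.

```python
def stack_frames(frames, stride, half_window=5):
    """Group every `stride`-th frame with its 5 neighbours on each side."""
    stacks = []
    # Assumption: the first centre is the first frame with 5 predecessors
    for center in range(half_window, len(frames) - half_window, stride):
        stacks.append(frames[center - half_window:center + half_window + 1])
    return stacks


frames = list(range(30))             # frame indices stand in for images
stacks = stack_frames(frames, stride=6)
centers = [stack[5] for stack in stacks]
print(centers)
```

With 30 frames and a stride of 6, this yields 11-frame stacks centred on frames 5, 11, 17 and 23. The same helper with stride=8 corresponds to the training variant described for getTensors.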

getTensors(x, ann=None, modify=False)

Computes tensors for training. Every 8th image in x is stacked together with the 5 previous and the 5 following images to form a new 11 channel image. A torch.Tensor is created from the array containing these new images.

If modify is True, some geometric transformations will be applied to 80% of the stacked images. This helps the training process.

If annotation data is provided, a similar process is applied to it. For every 8th annotation, the 5 previous and the 5 following annotations are also considered. The dominant annotation in the set is selected; the one in the middle is preferred when there is a tie. This way, very short or noisy annotation events are ignored. A torch.Tensor is created from the array containing these new annotations.

If cuda is available, the data is also copied to cuda memory.

Parameters
  • x (array) – input frames, single channel

  • ann (array, optional) – annotation data, defaults to None

  • modify (bool, optional) – True value triggers geometric altering of the image data, defaults to False

Returns

a tensor containing the stacked images and one containing the annotation tensor, or None

Return type

torch.Tensor, torch.Tensor
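The selection of the dominant annotation can be sketched as a majority vote over the 11 labels. The sketch below is an illustration under assumptions: the documentation only says the middle annotation wins a tie, so the fallback used when the middle label is not among the most frequent ones (the first of the tied labels in the window) is a guess, and the function name is hypothetical.

```python
from collections import Counter


def dominant_annotation(window):
    """Most frequent label in the window; the middle label wins a tie."""
    counts = Counter(window)
    top = max(counts.values())
    middle = window[len(window) // 2]
    if counts[middle] == top:
        return middle
    # Assumption: otherwise take the first label that reaches the top count
    for label in window:
        if counts[label] == top:
            return label


# A single noisy label in the middle is outvoted by its neighbours
noisy = ["walk"] * 5 + ["rear"] + ["walk"] * 5
# "walk" and "rest" both occur 5 times; the middle label is "rest"
tie = ["walk"] * 3 + ["rest"] * 5 + ["walk"] * 2 + ["eat"]

print(dominant_annotation(noisy), dominant_annotation(tie))
```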

Integration

class ProcComm(port=2000)

Provides a basic interface for creating and communicating with the ProcessVideo.ProcessVideo class.

This is the entry point of the Docker instance.

port: int

port on which the connection will be established, default is 2000

connSocket: socket

connection socket to send and receive data

start_sockServer()

Creates a socket listening on the port given in port. When running inside a Docker instance, this port is internal, and the mapping to the local system communication port is done at creation time.

When the connection is established, the method reads the file name that is to be processed and creates the file name for the output. The full file names are relative to the Docker instance mounted folder.

It then starts the processing by calling callProc()

callProc(videoFile, outputFile, logger)

Creates an instance of ProcessVideo.ProcessVideo and calls the ProcessVideo.ProcessVideo.process() method until the whole video is processed. It regularly sends progress feedback through the connSocket socket.

Parameters
  • videoFile (str) – full path to the video file to be processed

  • outputFile (str) – full path to the output file

  • logger (Logger.Logger) – simple logger used mostly for debugging
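Since process() is a generator that yields the percentage processed so far, the driving loop can be pictured as below. This is a sketch only: fake_process is a hypothetical stand-in for ProcessVideo.process(), and a plain list replaces the socket used for feedback.

```python
def fake_process(total_frames, seg_size=125):
    """Hypothetical stand-in for ProcessVideo.process(): yields progress."""
    done = 0
    while done < total_frames:
        done = min(done + seg_size, total_frames)
        yield 100 * done // total_frames


def call_proc(process, send):
    """Consume the generator and forward each progress value."""
    for percent in process:
        send(str(percent))


sent = []                        # stands in for the connSocket socket
call_proc(fake_process(300), sent.append)
print(sent)
```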

sendMessage(message)

Sends the size of the message and the contents through the connSocket socket

Parameters

message (str) – data to be sent

Raises

RuntimeError – raised when socket send method fails

receiveMessage()

Reads a message from the connSocket socket

Returns

message data that was read

Return type

str
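Sending the size before the contents lets the receiver know how many bytes to wait for. The sketch below shows one common way to do this and is not necessarily the format used by ProcComm: the 4-byte big-endian length prefix, the UTF-8 encoding and the function names are all assumptions. A local socket pair replaces the real connection so the example runs on its own.

```python
import socket
import struct


def send_message(sock, message):
    """Send a 4-byte length prefix followed by the UTF-8 payload."""
    payload = message.encode("utf-8")
    data = struct.pack("!I", len(payload)) + payload
    sent = 0
    while sent < len(data):
        n = sock.send(data[sent:])
        if n == 0:
            raise RuntimeError("socket connection broken")
        sent += n


def recv_exact(sock, size):
    """Read exactly `size` bytes, since recv may return fewer."""
    chunks = []
    remaining = size
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise RuntimeError("socket connection broken")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def receive_message(sock):
    """Read the length prefix, then the payload it announces."""
    (size,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, size).decode("utf-8")


left, right = socket.socketpair()   # local pair instead of a real client
send_message(left, "video.mp4")
received = receive_message(right)
left.close()
right.close()
print(received)
```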

class ProcessVideo(modelPath, noClasses, videoFile, outputFile, segSize=125)

Wrapper for instantiating the CNN with a specified model and using it to process a video file.

videoFile: str

the full path to the video that will be processed by the current instance

outputFile: str

the full path to the output file where the results will be stored

noClasses: int

number of classes used by the model (current model uses 10)

modelPath: str

full path to trained model that will be used

segSize: int

maximum number of frames that will be read and processed at a time - depends on the available memory

process(logger)

Generator function for behaviour classification.

Creates an EthoCNN.EthoCNN instance with noClasses outputs and loads the trained model present in modelPath. The resulting object will be running in evaluation mode and, if available, in the CUDA environment.

For each call, the method will process a maximum of segSize frames, store the results in the output file and return the percentage of the total frames that have been processed so far.

Each classification is done for a set of 11 frames. The process is performed with a stride of 6, as described in StackFrames.getTestTensors. The result is then attributed to the frame in the middle of the interval, the 3 frames before, and the 2 frames right after that. Therefore there are no behaviour bouts shorter than 240 ms. This conforms to the behaviour definition and avoids noisy results, while speeding up the processing.

The current output contains these behaviours: drink, eat, mm+, hang, rear, rest, and walk

Parameters

logger (Logger.Logger) – simple logger used mostly for debugging

Yield

the percentage of frames already processed

Return type

int
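The attribution of one classification to 6 frames can be sketched as follows. The sketch assumes stack centres 6 frames apart (as in StackFrames.getTestTensors) and a recording rate of 25 frames per second, which is what the 240 ms figure implies but is not stated explicitly. The function name and the labels are illustrative.

```python
def spread_predictions(centers, labels, total_frames):
    """Assign each label to its centre frame, 3 frames before, 2 after."""
    per_frame = [None] * total_frames
    for center, label in zip(centers, labels):
        for frame in range(center - 3, center + 3):
            if 0 <= frame < total_frames:
                per_frame[frame] = label
    return per_frame


per_frame = spread_predictions([5, 11, 17], ["rest", "walk", "walk"], 22)

frame_ms = 1000 / 25          # assumption: 25 frames per second
min_bout_ms = 6 * frame_ms    # shortest bout that can be reported
print(per_frame, min_bout_ms)
```

With centres 6 frames apart, the 6-frame spans tile the video without gaps or overlaps; only the first and last few frames, which never sit inside a full span, stay unlabelled in this sketch.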

class Logger(logPath)

Simple logger used mostly for debugging.

logFile: str

handle to the file where the logs will be written

setLogFile(logPath)

Creates logFile handle to the ‘logPath’ location.

Parameters

logPath (str) – path to where the log file will be created

log(message, printMess=False)

Writes the message through the file handle logFile. Every message will be preceded by the current time.

If printMess is true, the message will be also sent to the standard output.

Parameters
  • message (str) – message to be logged

  • printMess (bool, optional) – flag to print the message to the standard output, defaults to False

closeLogFile()

Closes the logFile handle.
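A logger with this interface can be sketched in a few lines. The class below is an illustration, not the module's implementation: the timestamp format, the append mode and the snake_case names are assumptions.

```python
import os
import tempfile
from datetime import datetime


class SimpleLogger:
    """Minimal file logger that prefixes each message with the time."""

    def __init__(self, log_path):
        self.log_file = None
        self.set_log_file(log_path)

    def set_log_file(self, log_path):
        # Assumption: existing logs are appended to rather than overwritten
        self.log_file = open(log_path, "a", encoding="utf-8")

    def log(self, message, print_mess=False):
        line = f"{datetime.now():%Y-%m-%d %H:%M:%S} {message}"
        self.log_file.write(line + "\n")
        self.log_file.flush()
        if print_mess:
            print(line)

    def close_log_file(self):
        self.log_file.close()


log_path = os.path.join(tempfile.mkdtemp(), "example.log")
logger = SimpleLogger(log_path)
logger.log("processing started")
logger.close_log_file()

with open(log_path, encoding="utf-8") as handle:
    lines = handle.read().splitlines()
```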

Training and Testing

class TrainModel(noClasses, log)

Class to train a new model using a set of annotated video intervals.

optimizer: torch.optim.Optimizer

the handle to the optimizer used for updating the model’s parameters

criterion: torch.nn.CrossEntropyLoss

the handle to the loss class used for computing the loss gradients

model: EthoCNN

the model that will be trained

trainIntervals: []

the list of TrainInterval objects that form an epoch

noClasses: int

number of classes the model is initialized with

stagnating: int

counts the number of iterations where the average cost is under a certain threshold

logger: Logger

simple logger used mostly for debugging

initModel()

Initializes the model, the optimizer and the criterion.

If cuda is available, they will all be transferred to the GPU memory.

addTrainInterval(videoFile, annFile)

Creates a new TrainInterval object and adds it to the list in trainIntervals. This will use the video found at videoFile and the annotation file found at annFile path.

Parameters
  • videoFile (str) – path to the video file to be used for training

  • annFile (str) – path to the annotation file to be used as target

trainEpoch()

Train an epoch.

Calls TrainInterval.TrainInterval.train() for all the items added to trainIntervals. The average cost, the target annotations used, the training accuracy score and the corresponding confusion matrix are accumulated and displayed.

Returns

average cost

Return type

float

train(epochs)

Trains the model in model for a specified number of epochs. All intermediate models except the first are saved.

The process might stop early if the average cost stays under a certain threshold for more than 5 steps.

Parameters

epochs (int) – the number of epochs that will be trained
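The early stop driven by the stagnating counter can be pictured with the sketch below. It is an illustration under assumptions: the threshold value is made up, and resetting the counter whenever the cost rises above the threshold is a guess based on the word "persists". The function name is hypothetical and the list of costs replaces the calls to trainEpoch().

```python
def epochs_until_stop(costs, threshold, patience=5):
    """Return how many epochs run before the stagnation rule stops training."""
    stagnating = 0
    for epoch, cost in enumerate(costs, start=1):
        if cost < threshold:
            stagnating += 1
        else:
            stagnating = 0   # assumption: the counter resets
        if stagnating > patience:
            return epoch
    return len(costs)


# Hypothetical average costs, one per epoch
costs = [0.9, 0.5, 0.04, 0.03, 0.03, 0.02, 0.02, 0.02, 0.01, 0.01]
stopped_at = epochs_until_stop(costs, threshold=0.05)
print(stopped_at)
```

Here the cost drops under the threshold at epoch 3 and stays there, so the counter exceeds 5 at epoch 8 and the last two epochs are skipped.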

saveModel(path)

Saves the current model at the specified path.

Parameters

path (str) – path to the file where the model will be saved

loadModel(path)

Creates a new EthoCNN.EthoCNN instance with noClasses outputs and loads into it the state dictionary from the specified path.

Parameters

path (str) – path to the file where the model is found

cleanup()

Deletes the model, the optimizer and the criterion and calls system cleanup functions.

class TrainInterval(model, optimizer, criterion, videoFile, annFile, noClasses, logger)

Class to train a model with one annotated video interval.

videoFile: str

the path to the video file to be used for training

annFile: str

the path to the file containing the groundtruth for the video

model: EthoCNN

the handle to the model used for training

optimizer: torch.optim.Optimizer

the handle to the optimizer used for updating the model’s parameters

criterion: torch.nn.CrossEntropyLoss

the handle to the loss class used for computing the loss gradients

noClasses: int

number of outputs for the current model

segSize: int

the maximum number of frames that will be processed in parallel

logger: Logger

simple logger used mostly for debugging

train()

Trains the model in model with the frames from videoFile against the target in annFile.

The process is split into steps of maximum segSize frames. The classification is done with a stride of 8, as described in StackFrames.getTensors(). For each step, the predicted annotations are computed, then the loss gradients against the target from the manual annotation. The optimizer then updates the model parameters.

Returns

the average training cost, the set of target annotations used, the training accuracy score and the corresponding confusion matrix

Return type

float, [], float, []

class TestModel(noClasses, modelPath, logger)

Class for testing the model found at modelPath with noClasses outputs.

testIntervals: []

the list of TestInterval objects

modelPath: str

path to the saved model that will be tested

noClasses: int

number of classes the model is initialized with

logger: Logger

simple logger used mostly for debugging

addTestInterval(videoFile, annFile)

Creates a new TestInterval.TestInterval object and adds it to the list in testIntervals. This will use the video found at videoFile and the annotation file found at annFile path.

Parameters
  • videoFile (str) – path to the video file to be tested

  • annFile (str) – path to the annotation file to be used

test()

Calls the TestInterval.TestInterval.test() method for all the objects in testIntervals. The results are accumulated and the final accuracy and confusion matrix are displayed at the end.

Intermediate accuracy results are also displayed to follow the progress.

class TestInterval(modelPath, noClasses, videoFile, annFile, logger)

Class for testing a saved model against a video file and the associated annotation file.

videoFile: str

path to the video file that will be used

annFile: str

path to the annotation file that will be used

segSize: int

the maximum number of frames that will be processed in parallel

noClasses: int

number of classes the model is initialized with

modelPath: str

path to the saved model that will be tested

logger: Logger

simple logger used mostly for debugging

test()

Loads the model at modelPath in a new EthoCNN.EthoCNN instance with noClasses outputs. This will be used to classify all the frames in the video found at the videoFile location. The process is split into steps of maximum segSize frames.

The classification is done with a stride of 6, as described in StackFrames.getTestTensors(). The result is then attributed to the frame in the middle of the interval, the 3 frames before, and the 2 frames right after that. The current output contains these behaviours: drink, eat, mm+, hang, rear, rest, and walk.

These results are then compared to the ones from the annotation file for the corresponding frames to compute the accuracy score and fill the confusion matrix.

Returns

the accuracy score and the confusion matrix

Return type

float, []
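The accuracy score and the confusion matrix can be computed from the per-frame class indices as sketched below. The layout of the matrix (rows for annotated classes, columns for predicted classes) is an assumption, as are the function name and the sample data.

```python
def evaluate(targets, predictions, no_classes):
    """Accuracy and confusion matrix (rows: target, columns: prediction)."""
    matrix = [[0] * no_classes for _ in range(no_classes)]
    for target, predicted in zip(targets, predictions):
        matrix[target][predicted] += 1
    correct = sum(matrix[i][i] for i in range(no_classes))
    accuracy = correct / len(targets) if targets else 0.0
    return accuracy, matrix


# Hypothetical class indices for 6 frames and 3 classes
targets = [0, 0, 1, 1, 2, 2]
predictions = [0, 1, 1, 1, 2, 0]
accuracy, matrix = evaluate(targets, predictions, 3)
print(accuracy, matrix)
```

Keeping the raw counts in the matrix makes it straightforward to accumulate results over several intervals, as TestModel.test() does, by adding the matrices element by element.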

SelectTrainingData