dtorch.nn
Description
This module contains the layers that can be used to construct models. Here is an example of a linear neural network:
from dtorch import nn
from dtorch.jtensors import JTensors

class MyModel(nn.Module):

    def __init__(self) -> None:
        super().__init__()
        self.linear_layer1 = nn.Linear(1, 32)
        self.linear_layer2 = nn.Linear(32, 4)

    def forward(self, x: JTensors):
        return self.linear_layer2(self.linear_layer1(x))

# data
x = JTensors([2, 3, 4, 5])

# model
model = MyModel()

# call the model
# the result is a tensor of size (4,), the output size of linear_layer2
result = model(x)
Layers
- class Module
The base module that other layers and modules are built on.
- save(path: str)
- Parameters:
path (str) – The path where the model will be saved.
Save the model to a file.
- load(path: str)
- Parameters:
path (str) – The path of the model.
Load the model from a file.
- parameters() list[Parameter]
Returns the list of parameters of every submodule contained in the model, recursively. It is useful for handing the parameters over to an optimizer.
- eval()
Sets the dropout of the module and its submodules to False and the require_grads attribute of the parameters to False. This allows a faster forward pass, since gradient material does not need to be kept.
- train()
In contrast to the eval method, sets all dropout flags and the require_grads attribute of the parameters to True.
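A minimal sketch of these utilities, reusing the MyModel class from the example above (the "model.jt" file name is only an illustrative placeholder, not a format required by dtorch):
model = MyModel()

# hand the parameters over to an optimizer, or simply inspect them
params = model.parameters()

# inference mode: dropout and gradient tracking are disabled
model.eval()
prediction = model(JTensors([2, 3, 4, 5]))

# back to training mode
model.train()

# persist the model to disk and restore it later
model.save("model.jt")   # placeholder path
model.load("model.jt")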
- class Linear(Module)
The most basic layer for applying a linear transformation to the data.
- __init__(in_features, out_features, bias=True, device=None, dtype=None)
Applies a linear transformation to the incoming data \(y=xA^T+b\)
- param int in_features:
size of each input sample
- param int out_features:
size of each output sample
- param bool bias:
If set to False, the layer will not learn an additive bias. Default: True
Example:
>> m = nn.Linear(20, 30)
>> input = dtorch.random(128, 20)
>> output = m(input)
>> print(output.size())
(128, 30)
- class Sequential(Module)
A module that, when applied, applies each of the registered modules iteratively, in order. See the sketch below.
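A minimal sketch mirroring the MyModel example above. It assumes that Sequential takes its submodules as constructor arguments; that signature is not documented here, so treat it as an assumption:
# assumed: submodules are passed directly to the constructor
model = nn.Sequential(
    nn.Linear(1, 32),
    nn.Linear(32, 4)
)
result = model(JTensors([2, 3, 4, 5]))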
- class Softmax(Module)
The equivalent of applying the functionnal.softmax method.
Mathematically: \(softmax(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K}e^{z_k}}\)
It is a distribution function that turns the result into percentages (a probability distribution), usually used for classification purposes.
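A short sketch of the idea (assuming Softmax takes no constructor arguments, which is not specified above): the output values are positive and sum to 1, so they can be read as percentages.
softmax = nn.Softmax()            # constructor arguments are an assumption
scores = JTensors([1.0, 2.0, 3.0])
probabilities = softmax(scores)   # positive values summing to 1, e.g. class probabilities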
- class Sigmoid(Module)
The equivalent of applying the functionnal.sigmoid function.
Mathematically: \(f(x) = \frac{1}{1 + e^{-x}}\)
Similar to tanh, but the result is in [0, 1] whereas the result of tanh is in [-1, 1].
It is an activation layer.
- class ReLU(Module)
Equivalent to max(x, 0) in Python.
According to the universal approximation theorem, any continuous function can be approximated by stacking linear and ReLU layers one after another (see the sketch below). In practice it cancels the backpropagation of the gradients on some neurons (those whose pre-activation is negative).
- class Tanh(Module)
Equivalent to the functionnal.tanh function.
It applies the hyperbolic tangent to the input. It is an activation layer.
- class Dropout(Module)
A layer that randomly makes a neuron fail, with a given probability, by not passing its data through to the next layer.
- __init__(p: float = 0.5)
- Parameters:
p (float) – The probability of each neuron failing.
It is useful for data augmentation and has been proven to be a key component in reducing variance.
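A minimal sketch of how Dropout interacts with the train / eval switch described above (the layer sizes and the Sequential constructor are assumptions):
model = nn.Sequential(
    nn.Linear(1, 32),
    nn.Dropout(p=0.5),   # each neuron fails with probability 0.5 while training
    nn.Linear(32, 4)
)

model.train()   # dropout is active: some activations are randomly dropped
model.eval()    # dropout is disabled: the forward pass is deterministic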
- class RNN(Module)
A layer (single- or multi-layer) used for building Recurrent Neural Networks.
- __init__(input: int, hidden: int, num_layers: int = 1, non_linearity: str = 'tanh', bias: bool = True, batch_first: bool = False, dropout: float = 0)
- Parameters:
input (int) – number of features per element.
hidden (int) – size of the context (the second output of the forward pass).
num_layers (int) – number of recurrent layers. Multiple layers cover complex functions more efficiently. Default: 1
non_linearity (str) – The activation function σ that is used. Default: ‘tanh’. Possible values: ‘tanh’, ‘sigmoid’.
batch_first (bool) – Defaults to False. If False, the input shape is (sequence_size, batch_size, input_size), else (batch_size, sequence_size, input_size).
bias (bool) – Determines whether a bias weight should be added to the computation. Default: True.
dropout (float) – Adds a dropout layer between the recurrent layers and the context, similarly to the Dropout layer above.
Mathematically, let \(h_t\) be the hidden state at time t, σ the non_linearity function, \(W_i\) the input weight and \(W_h\) the hidden weight. Then, if bias is False: \(h_t = \sigma(W_i * x_t + W_h * h_{t-1})\)
Otherwise, with \(B_i\) being the bias of the input and \(B_h\) being the bias of the hidden state: \(h_t = \sigma(W_i * x_t + B_i + W_h * h_{t-1} + B_h)\)
\(h_0\) is usually initialised to 0, whereas the other weights are initialised from the Xavier distribution.
When calling the model, it outputs a tuple (y, hidden_state).
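A minimal sketch of a forward pass with the default batch_first=False layout. The exact shapes of y and hidden_state are not documented above, so the comments only restate what is: y is the output sequence and hidden_state is the context of size hidden.
import dtorch
from dtorch import nn

rnn = nn.RNN(input=8, hidden=16, num_layers=2)

# batch_first=False, so the input is (sequence_size, batch_size, input_size)
x = dtorch.random(10, 3, 8)

# the model returns a tuple (y, hidden_state)
y, hidden_state = rnn(x)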
- class Conv1d(Module)
A convolution layer with weights that adapt through training.
Warning
This module has not been tested thoroughly yet. If any problem arises, do not hesitate to report it on the project's GitHub.
- __init__(in_channels: int, out_channels: int, kernel_size: int, stride: int | None = 1, bias: bool = True)
- Parameters:
in_channels (int) – The number of channels of the input data. Ex: an image with 3 color channels has 3 channels.
out_channels (int) – The number of channels the output data should have.
kernel_size (int) – Width of the convolution kernel (the window going through the data).
stride (Optional[int]) – Defaults to 1. Defines the step size the window takes across the data. (Not tested yet.)
bias (bool) – Defines whether a bias weight should be used.
Input shape: (batch, in_channels, width)
Output shape: (batch, out_channels, width - kernel_size), without counting the stride.
This layer is often used to help a network understand complex data such as images and audio.
- class Conv2d(Module)
Note
Will be implemented soon (within 2 weeks).