{ "cells": [ { "cell_type": "markdown", "id": "a5a94bf9", "metadata": { "papermill": { "duration": 0.005199, "end_time": "2024-11-27T10:39:55.818412", "exception": false, "start_time": "2024-11-27T10:39:55.813213", "status": "completed" }, "tags": [] }, "source": [ "# Convolution layer" ] }, { "cell_type": "markdown", "id": "4a871151", "metadata": { "papermill": { "duration": 0.017831, "end_time": "2024-11-27T10:39:55.840507", "exception": false, "start_time": "2024-11-27T10:39:55.822676", "status": "completed" }, "tags": [] }, "source": [ "Digital images have [multiple channels](https://en.wikipedia.org/wiki/Channel_(digital_image)). The **convolution layer** extends the convolution operation to handle feature maps with multiple **channels**. The output feature map similarly has channels, adding a further semantic dimension to the downstream representation. For an RGB image, a convolution layer learns three two-dimensional kernels $\\boldsymbol{\\mathsf{K}}_{lc}$, one per input channel $c$, for each output channel $l$; each kernel can be thought of as a **feature extractor**. Features across input channels are blended by the kernel:\n", "\n", "$$\n", "\\begin{aligned}\n", "{\\bar{\\boldsymbol{\\mathsf X}}}_{lij}\n", "&= {\\boldsymbol{\\mathsf u}}_{l} + \\sum_{c=0}^{{c}_\\text{in}-1} ({\\boldsymbol{\\mathsf X}}_{[c,\\,:,\\, :]} \\circledast {\\boldsymbol{\\mathsf K}}_{[l,\\,{c},\\, :,\\,:]})_{ij} \\\\\n", "&= {\\boldsymbol{\\mathsf u}}_{l} + \\sum_{c=0}^{{c}_\\text{in}-1}\\sum_{x = 0}^{{k}-1} \\sum_{y=0}^{{k}-1} {\\boldsymbol{\\mathsf X}}_{c,\\, i + x,\\, j + y} \\, {\\boldsymbol{\\mathsf K}}_{lcxy} \\\\\n", "\\end{aligned}\n", "$$\n", "\n", "for $l = 0, \\ldots, {c}_\\text{out}-1$. 
The input and output tensors $\\boldsymbol{\\mathsf{X}}$ and $\\bar{\\boldsymbol{\\mathsf{X}}}$ have the same number of dimensions and the same semantic structure (channels, height, width), which makes sense since we want to stack convolution layers as modules. The kernel $\\boldsymbol{\\mathsf{K}}$ has shape $({c}_\\text{out}, {c}_\\text{in}, {k}, {k}).$ The resulting feature maps inherit the spatial ordering of the input along the spatial dimensions. The entire operation is linear, and the convolutions for different output channels are computed independently. \n", "\n", "**Remark.** This form is called a two-dimensional convolution since the kernel scans two spatial dimensions. Similarly, one-dimensional convolutions can be used to process sequential data. In principle, we can add as many dimensions as required." ] }, { "cell_type": "markdown", "id": "31bf34a3", "metadata": { "papermill": { "duration": 0.001955, "end_time": "2024-11-27T10:39:55.843790", "exception": false, "start_time": "2024-11-27T10:39:55.841835", "status": "completed" }, "tags": [] }, "source": [ "## Input and output channels" ] }, { "cell_type": "code", "execution_count": 1, "id": "0f5c9c7a", "metadata": { "execution": { "iopub.execute_input": "2024-11-27T10:39:55.847048Z", "iopub.status.busy": "2024-11-27T10:39:55.846878Z", "iopub.status.idle": "2024-11-27T10:39:58.248255Z", "shell.execute_reply": "2024-11-27T10:39:58.247887Z" }, "papermill": { "duration": 2.41057, "end_time": "2024-11-27T10:39:58.255509", "exception": false, "start_time": "2024-11-27T10:39:55.844939", "status": "completed" }, "tags": [ "remove-input" ] }, "outputs": [ { "data": { "text/html": [ "
import random\n",
"import warnings\n",
"from pathlib import Path\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import matplotlib\n",
"from matplotlib_inline import backend_inline\n",
"\n",
"import torch\n",
"import torch.nn as nn\n",
"import torch.nn.functional as F\n",
"\n",
"DATASET_DIR = Path(\"./data/\").resolve()\n",
"DATASET_DIR.mkdir(exist_ok=True)\n",
"warnings.simplefilter(action=\"ignore\")\n",
"backend_inline.set_matplotlib_formats(\"svg\")\n",
"matplotlib.rcParams[\"image.interpolation\"] = \"nearest\"\n",
"\n",
"RANDOM_SEED = 0\n",
"random.seed(RANDOM_SEED)\n",
"np.random.seed(RANDOM_SEED)\n",
"torch.manual_seed(RANDOM_SEED)\n",
"DEVICE = (\n",
"    torch.device(\"cuda:0\") if torch.cuda.is_available()\n",
"    else torch.device(\"mps\") if torch.backends.mps.is_available()\n",
"    else torch.device(\"cpu\")\n",
")\n",
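"\n",
"# A minimal sketch (not part of the original setup): verify the kernel shape\n",
"# (c_out, c_in, k, k) and the per-output-channel bias u_l described above,\n",
"# using nn.Conv2d with assumed sizes c_in=3, c_out=8, k=5 on a 32x32 RGB input.\n",
"conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=5)\n",
"x = torch.randn(3, 32, 32)\n",
"out = conv(x)\n",
"assert conv.weight.shape == (8, 3, 5, 5)   # K has shape (c_out, c_in, k, k)\n",
"assert conv.bias.shape == (8,)             # one bias u_l per output channel\n",
"assert out.shape == (8, 28, 28)            # spatial size shrinks: 32 - 5 + 1\n",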
"