Appendix: Benchmarking
Recall that backpropagation (BP) has time and memory complexity linear in network size: this assumes each node executes in constant time and its output is cached. Moreover, computing the gradient should never be asymptotically slower than evaluating the function itself, assuming each local gradient computation also takes constant time. We test both claims empirically below.
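The linearity claim rests on how the backward pass works: nodes are processed once in reverse topological order, with one constant-time local-gradient step each, so the total cost is O(V + E) in the graph size. For reference, here is a minimal sketch of such a backward pass, assuming a micrograd-style scalar node (the actual Node and MLP used below are defined earlier in the text; NodeSketch and its attributes are illustrative names, not the real API):

class NodeSketch:
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # pushes grad to parents; O(1) per node

    def backward(self):
        # Build a reverse topological order: every node and edge is
        # visited exactly once, so the traversal is O(V + E).
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for p in v._parents:
                    build(p)
                topo.append(v)
        build(self)
        # One constant-time local-gradient step per node => linear total.
        for v in reversed(topo):
            v._backward()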
%config InlineBackend.figure_format = "svg"
import time

import matplotlib.pyplot as plt
from tqdm.notebook import tqdm

x = [Node(1.0) for _ in range(3)]  # three distinct input nodes
network_size = []
fwd_times = {}
bwd_times = {}
for i in tqdm(range(10)):
    # i + 1 hidden layers of width 200, plus one output unit
    nouts = [200] * (i + 1) + [1]
    model = MLP(n_in=3, n_outs=nouts, activation="relu")
    fwd_times[i] = []
    bwd_times[i] = []
    for j in range(5):  # five timing repeats per network size
        t0 = time.process_time()
        pred = model(x)
        t1 = time.process_time()
        fwd_times[i].append(t1 - t0)

        t0 = time.process_time()
        pred.grad = 1.0
        pred.backward()
        t1 = time.process_time()
        bwd_times[i].append(t1 - t0)

    # network size = parameters + neuron outputs + input nodes
    network_size.append(len(model.parameters()) + sum(nouts) + 3)
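As a quick check of the second claim, we can compare median backward and forward times at each size: if the gradient costs no more than a constant factor over the function, the ratio should stay roughly flat as the network grows. A small sketch using the timing dictionaries filled above (statistics is from the standard library):

import statistics

# Ratio of median backward to median forward time per network size.
# A roughly constant ratio supports the "cheap gradient" claim.
for i, size in enumerate(network_size):
    ratio = statistics.median(bwd_times[i]) / statistics.median(fwd_times[i])
    print(f"size={size:>7d}  bwd/fwd ~ {ratio:.2f}")

The scatter plot below shows all five repeats at each network size.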
for i, size in enumerate(network_size):
    for j in range(5):
        # attach legend labels only once
        if j == 0 and i == 0:
            plt.scatter(size, fwd_times[i][j], color="C0", edgecolor="black", s=15, label="forward")
            plt.scatter(size, bwd_times[i][j], color="C1", edgecolor="black", s=15, label="backward")
        else:
            plt.scatter(size, fwd_times[i][j], color="C0", edgecolor="black", s=15)
            plt.scatter(size, bwd_times[i][j], color="C1", edgecolor="black", s=15)

plt.legend()
plt.xlabel("network size (nodes)")
plt.ylabel("time (s)")
plt.ticklabel_format(axis="x", style="sci", scilimits=(3, 3))
plt.grid(linestyle="dotted")
Fig. Forward and backward pass times both grow roughly linearly with network size.
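"Roughly linear" can be quantified with a least-squares line fit of the median timings against network size; a sketch using numpy (assumed available, as it is not used elsewhere in this section):

import numpy as np

sizes = np.array(network_size)
for name, times in [("forward", fwd_times), ("backward", bwd_times)]:
    med = np.array([np.median(times[i]) for i in range(len(sizes))])
    a, b = np.polyfit(sizes, med, deg=1)  # least-squares fit: time ~ a * size + b
    r2 = 1 - np.sum((med - (a * sizes + b)) ** 2) / np.sum((med - med.mean()) ** 2)
    print(f"{name:>8s}: slope = {a:.2e} s/node, R^2 = {r2:.3f}")

An R^2 close to 1 for both passes would confirm the linear trend seen in the figure.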