Neural Network Regression Demo

This demo fits a small neural network to a noisy quadratic function $y = x^2 + \text{noise}$ using PyTorch. It walks through the essential building blocks you’ll reuse in every deep RL lab.

Key Concepts

The Training Loop

Every neural network training loop follows the same four steps:

  1. Forward pass — feed input through the network to get predictions
  2. Compute loss — measure how far predictions are from targets (here, MSE)
  3. Backward pass — call loss.backward() to compute gradients via backpropagation
  4. Optimizer step — call optimizer.step() to update weights; gradients must also be cleared with optimizer.zero_grad() before the next backward pass, otherwise they accumulate across iterations
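The four steps can be sketched in isolation on a tiny one-parameter linear model (the model and data here are illustrative, not part of the demo):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Linear(1, 1)                      # tiny model: y = w*x + b
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.tensor([[1.0], [2.0]])
y = torch.tensor([[2.0], [4.0]])                   # target: y = 2x

losses = []
for _ in range(100):
    pred = model(x)                                # 1. forward pass
    loss = F.mse_loss(pred, y)                     # 2. compute loss
    optimizer.zero_grad()                          # clear stale gradients
    loss.backward()                                # 3. backward pass
    optimizer.step()                               # 4. update weights
    losses.append(loss.item())

print(losses[0], losses[-1])                       # loss shrinks over training
```

The same skeleton reappears unchanged in the demo below; only the model, the data, and the optimizer (Adam instead of SGD) differ.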

Network Architecture

The demo uses a single hidden layer with ReLU activation:

\[\text{Input}(1) \xrightarrow{\text{Linear}} \text{Hidden}(10) \xrightarrow{\text{ReLU}} \text{Hidden}(10) \xrightarrow{\text{Linear}} \text{Output}(1)\]

This is the simplest architecture that can learn a nonlinear mapping. ReLU (Rectified Linear Unit) introduces nonlinearity — without it, stacking linear layers would still be a linear function.
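The claim that stacked linear layers stay linear can be checked directly: composing two Linear layers (with no activation in between) is exactly one affine map. This sketch builds that collapsed map by hand and compares:

```python
import torch

torch.manual_seed(0)
l1 = torch.nn.Linear(1, 10)
l2 = torch.nn.Linear(10, 1)

# Without a nonlinearity, l2(l1(x)) = (W2 W1) x + (W2 b1 + b2):
# a single affine map, no matter how many layers are stacked.
W = l2.weight @ l1.weight                 # shape (1, 1)
b = l2.weight @ l1.bias + l2.bias         # shape (1,)

x = torch.randn(5, 1)
stacked = l2(l1(x))
collapsed = x @ W.t() + b
print(torch.allclose(stacked, collapsed, atol=1e-6))  # → True
```

Inserting ReLU between the two layers breaks this collapse, which is what lets the network bend its output into a curve.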

Things to Notice

  • Train vs. test data: Only the first 80 points are used for training. Watch how the green prediction curve fits the remaining 20 points it has never seen — this is generalization.
  • Loss decreasing: The loss value drops quickly at first, then levels off. This is typical gradient descent behavior.
  • torch.no_grad(): Used during prediction for visualization. It tells PyTorch not to track gradients, saving memory and computation when we don’t need to backpropagate.
  • Device handling: device = 'cuda' if torch.cuda.is_available() else 'cpu' lets the same code run on GPU or CPU. All tensors and the model must be on the same device.
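The effect of torch.no_grad() is easy to observe on a throwaway layer (names here are illustrative):

```python
import torch

layer = torch.nn.Linear(1, 1)
x = torch.ones(1, 1)

tracked = layer(x)
print(tracked.requires_grad)        # True: autograd records this op

with torch.no_grad():
    untracked = layer(x)
print(untracked.requires_grad)      # False: no graph is built
```

Outputs computed inside the context carry no autograd graph, so they use less memory and cannot be backpropagated through, which is exactly what we want for plotting predictions.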

Try Modifying

  • Change n_hidden from 10 to 2 — can the network still fit the curve?
  • Increase it to 100 — does it overfit the training data?
  • Change the learning rate lr=0.01 to 0.1 or 0.001 — how does convergence speed change?
  • Replace F.relu with torch.sigmoid — how does the fit change?
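One convenient way to run these experiments is to make the activation a constructor argument. This is a sketch of a variant of the demo's Net class, not the version used below, which hard-codes F.relu:

```python
import torch
import torch.nn.functional as F

class Net(torch.nn.Module):
    # Same architecture as the demo, but the activation is swappable,
    # so trying torch.sigmoid or torch.tanh is a one-line change.
    def __init__(self, n_hidden=10, activation=F.relu):
        super().__init__()
        self.l1 = torch.nn.Linear(1, n_hidden)
        self.l2 = torch.nn.Linear(n_hidden, 1)
        self.activation = activation

    def forward(self, x):
        return self.l2(self.activation(self.l1(x)))

relu_net = Net(n_hidden=2)
sigmoid_net = Net(n_hidden=100, activation=torch.sigmoid)
print(relu_net(torch.zeros(1, 1)).shape)     # torch.Size([1, 1])
```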
# %% [markdown]
# # Neural Network Regression Demo
#
# A simple example of fitting a neural network to a noisy quadratic function
# using PyTorch. This demonstrates the core training loop:
# **forward pass → compute loss → backward pass → update weights**.

# %%
import torch
import torch.nn.functional as F
import platform
# Under WSL ('microsoft' appears in the kernel release string), force the
# TkAgg backend so the live plot window displays correctly.
if 'microsoft' in platform.uname().release.lower():
    import matplotlib
    matplotlib.use('TkAgg')
import matplotlib.pyplot as plt

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# %% [markdown]
# ## Generate Data
#
# We create a noisy quadratic dataset: $y = x^2 + \text{noise}$.
# The first 80 points are used for training; the rest are held out.

# %%
x = torch.linspace(-1, 1, 100).unsqueeze(1).to(device)
y = (x.pow(2) + 0.2 * torch.rand(x.size(), device=device))

train_x = x[:80]
train_y = y[:80]

plt.figure(figsize=(10, 4))
plt.scatter(x.cpu(), y.cpu(), color="orange", label="All data")
plt.scatter(train_x.cpu(), train_y.cpu(), color="red", label="Training data")
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.title('Noisy Quadratic Data')
plt.show()

# %% [markdown]
# ## Define the Network
#
# A single hidden layer with ReLU activation. This is the simplest
# architecture that can learn a nonlinear mapping.

# %%
class Net(torch.nn.Module):
    def __init__(self, n_hidden=10):
        super().__init__()
        self.l1 = torch.nn.Linear(1, n_hidden)
        self.l2 = torch.nn.Linear(n_hidden, 1)

    def forward(self, x):
        return self.l2(F.relu(self.l1(x)))


net = Net(n_hidden=10).to(device)
optimizer = torch.optim.Adam(net.parameters(), lr=0.01)

# %% [markdown]
# ## Training Loop
#
# Each iteration:
# 1. **Forward pass** — compute predictions
# 2. **Loss** — mean squared error between predictions and targets
# 3. **Backward pass** — compute gradients via backpropagation
# 4. **Optimizer step** — update weights using Adam

# %%
fig, ax = plt.subplots(figsize=(10, 6))

for step in range(200):
    # Forward pass
    pred = net(train_x)
    loss = F.mse_loss(pred, train_y)

    # Backward pass + update
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Live visualization
    ax.cla()
    ax.set_title('Regression Fit')
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.set_xlim(-1.1, 1.5)
    ax.set_ylim(-0.25, 1.25)
    ax.scatter(x.cpu(), y.cpu(), color="orange", alpha=0.5, label="All data")
    ax.scatter(train_x.cpu(), train_y.cpu(), color="red", s=15, label="Train")

    with torch.no_grad():
        fit = net(x)
    ax.plot(x.cpu(), fit.cpu(), 'g-', lw=2, label="Prediction")

    ax.text(1.0, 0.1, f'Step {step}', fontsize=14, color='red')
    ax.text(1.0, 0.0, f'Loss {loss.item():.4f}', fontsize=14, color='red')
    ax.legend(loc='upper left')
    plt.pause(0.01)

plt.show()