Drawing Lines Through Dots
The simplest "learning" task: given dots on a graph, find the line that fits them best. Linear regression, derived as a direct extension of y = mx + b from algebra class. By the end, you'll be able to:
- Read a scatter plot and sketch a reasonable best-fit line
- Translate y = mx + b into ML notation (ŷ = wx + b)
- Compute predictions by hand for a small dataset
- Use numpy and scikit-learn to fit a line in code
A line is a function with two knobs
This is the bridge from algebra class to ML. Take it slowly.
Recall the equation of a line from algebra:

y = mx + b
- m is the slope: how steep the line is. If you increase x by 1, y goes up by m.
- b is the intercept: where the line crosses the y-axis. The value of y when x = 0.
In ML, you use this exact same equation, but you re-cast its meaning. You're going to use a line to make predictions. So:
- x is the input (the thing we know about an example, like hours studied)
- y is the true output for that example (the actual exam score)
- ŷ ("y-hat") is the model's predicted output (what the line says for this x)
- The line is your model: ŷ = mx + b
The line predicts ŷ for any input x. The slope m tells you how much ŷ changes per unit increase in x. The intercept b is the prediction when x = 0.
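To make the slope's meaning concrete, here is a minimal sketch. The values m = 5 and b = 50 are hypothetical, chosen for illustration:

```python
# Hypothetical line: m = 5, b = 50.
m, b = 5, 50
x = 8

print(m * x + b)        # prediction at x = 8  -> 90
print(m * (x + 1) + b)  # one more unit of x   -> 95, up by exactly m
```

Bump x by 1 and the prediction rises by exactly m. That is the slope's entire job.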
Notation switch (gentle)
ML papers and textbooks use w instead of m (for "weight") and keep b (for "bias," same as intercept). Same equation, different letters:

ŷ = wx + b
Why the rename? Because when you have multiple input features, you'll have multiple slopes. Calling them all "m" gets confusing. Calling them weights scales naturally. You'll see this in Chapter 7. From now on, w is the slope and b is the intercept.
Why "weight" and "bias"? "Weight" because it tells you how much weight this feature should get when computing the prediction. A big weight means this feature matters a lot. A small weight means it doesn't. "Bias" is older terminology from neural networks, and it just means "the constant that shifts the prediction up or down." Don't confuse it with bias-the-fairness-concept from Chapter 3. Same word, different meaning.
Parameters are what you learn
Lock this in: the model is the equation ŷ = wx + b. The parameters are the specific numbers w and b that make this line this particular line.
When you say "I trained a model," you mean: I found good values of w and b for the data. That's literally it for linear regression. Every concept that comes after (gradient descent, neural networks, transformers) is sophisticated machinery for finding good parameters.
For linear regression in 1D, there are only 2 parameters: w and b. For a neural network, there might be billions. The principle is identical.
Computing a prediction by hand
Given specific values of w and b, computing ŷ for any x is just arithmetic.
Example: suppose w = 5 and b = 50, and x = 8 (a student studies 8 hours):

ŷ = wx + b = 5 × 8 + 50 = 40 + 50 = 90

The model predicts a score of 90.
Try a few of these on paper. Vary w and b. Different (w, b) pairs give different predictions for the same x. The whole point of training is to pick the (w, b) that gives good predictions for all the points in your dataset, not just one.
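A quick sketch of that idea: the same input x pushed through several hypothetical (w, b) pairs gives different predictions.

```python
# Same input, different parameter settings, different predictions.
x = 8
for w, b in [(5, 50), (10, 0), (2, 70)]:
    print(f"w={w}, b={b} -> prediction {w * x + b}")  # 90, 80, 86
```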
Quick exercise: for each row of (w, b, x), compute ŷ.
| w | b | x | ŷ = ? |
|---|---|---|---|
| 3 | 10 | 4 | ? |
| 0.5 | 20 | 100 | ? |
| -2 | 50 | 5 | ? |
| 1 | 0 | 7 | ? |
(Answers: 22, 70, 40, 7.)
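If you want to check your answers in code, a quick sketch that evaluates each row of the table:

```python
# Verify the exercise table: y_hat = w * x + b for each (w, b, x) row.
rows = [(3, 10, 4), (0.5, 20, 100), (-2, 50, 5), (1, 0, 7)]
for w, b, x in rows:
    print(w * x + b)  # 22, 70.0, 40, 7
```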
Linear regression in three ways
You'll compute predictions three ways, in increasing levels of abstraction.
Way 1 — pure Python
Write the formula directly.
```python
# A single prediction by hand.
w = 5   # slope
b = 50  # intercept
x = 8   # input: hours studied

y_hat = w * x + b
print("Prediction:", y_hat)  # 90
```
Way 2 — multiple predictions with numpy
Real datasets have many inputs. You use arrays.
```python
import numpy as np

w = 5
b = 50

# A whole array of inputs.
x_array = np.array([1, 2, 4, 6, 8, 10, 12])

# Numpy applies the operation to every element. This is called "vectorization."
y_hat_array = w * x_array + b
print(y_hat_array)  # [ 55  60  70  80  90 100 110]
```
What just happened: numpy's array multiplication is element-wise. Adding 50 adds 50 to each element. You computed 7 predictions in one line.
Way 3 — plot the line
```python
import numpy as np
import matplotlib.pyplot as plt

# Some made-up data.
x_data = np.array([1, 2, 4, 6, 8, 10, 12])
y_data = np.array([55, 65, 70, 75, 90, 95, 115])  # actual exam scores

# Your guessed line.
w, b = 5, 50
x_line = np.linspace(0, 13, 100)  # 100 points from 0 to 13
y_line = w * x_line + b

plt.scatter(x_data, y_data, label='actual data')
plt.plot(x_line, y_line, color='red', label=f'line: y = {w}x + {b}')
plt.xlabel('Hours studied')
plt.ylabel('Exam score')
plt.legend()
plt.show()
```
The plot shows the dots and your guessed line. Some dots are above the line, some below. Your line is approximately right, but not perfect.
Pause and look at it. Could you do better? Drag the line in your imagination: slope up a bit, intercept down a bit. The question for the next chapter is: how do you measure how good a line is? Without that measurement, "better" has no meaning.
When linear regression makes sense
Linear regression assumes the relationship between x and y is roughly a straight line. That sounds restrictive but covers a lot of real data:
- Hours studied vs. exam score
- Square footage vs. house price
- Years of experience vs. salary
- Outdoor temperature vs. ice cream sales
It does not make sense when:
- The relationship is wildly non-linear (a thrown ball's trajectory, a parabola)
- The output is a category (spam vs not spam — classification, Phase 3)
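To see the non-linear failure mode concretely, here is a quick sketch fitting a line to a perfect parabola, using numpy's `np.polyfit` (degree 1) as the line-fitting tool:

```python
import numpy as np

# A parabola: y = x^2. No straight line can follow this shape.
x = np.array([-3, -2, -1, 0, 1, 2, 3], dtype=float)
y = x ** 2

# Best-fit line (degree-1 polynomial): returns (slope, intercept).
w, b = np.polyfit(x, y, 1)
print(round(w, 2), round(b, 2))  # slope 0.0: the "best" line is flat and misses the shape entirely
```

On symmetric parabola data, the best line is perfectly horizontal: the downward half cancels the upward half, and the fit captures nothing about the curve.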
Fit a line with scikit-learn
You've been computing predictions for guessed w and b. What if you let sklearn find the best w and b for you?
```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Your data, reshaped into the form sklearn expects.
# X must be 2D: (n_examples, n_features). Even with one feature, you need 2D.
X = np.array([[1], [2], [4], [6], [8], [10], [12]])
y = np.array([55, 65, 70, 75, 90, 95, 115])

# Create the model. Train it. Print the learned w and b.
model = LinearRegression()
model.fit(X, y)
print("Learned w:", model.coef_[0])    # the slope
print("Learned b:", model.intercept_)  # the intercept

# Predict for a new student.
x_new = np.array([[7]])  # 7 hours studied
y_pred = model.predict(x_new)
print(f"Predicted score for 7 hours: {y_pred[0]:.1f}")
```
Two-line training. For this data, the slope and intercept print out as roughly w ≈ 4.9 and b ≈ 50.6.
sklearn just found the "best" line. You don't yet know how it defined "best." You don't yet know how it found it. That's Chapters 5 and 6. But you can already use this in real projects.
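sklearn's answer isn't mystical, and you can cross-check it. For a single feature, numpy's `np.polyfit` with degree 1 fits a line by the same criterion sklearn uses (you'll meet that criterion formally next chapter). A sketch:

```python
import numpy as np

# The same data as above, as flat 1D arrays.
x = np.array([1, 2, 4, 6, 8, 10, 12], dtype=float)
y = np.array([55, 65, 70, 75, 90, 95, 115], dtype=float)

# Degree-1 fit: returns (slope, intercept).
w, b = np.polyfit(x, y, 1)
print(round(w, 2), round(b, 2))  # roughly 4.9 and 50.61
print(round(w * 7 + b, 1))       # prediction for 7 hours, roughly 84.9
```

The numbers should match sklearn's `model.coef_[0]` and `model.intercept_` to within floating-point noise.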
Vocabulary
- Model: the equation that turns an input into a prediction (here, the line ŷ = wx + b).
- Parameters: the specific numbers the model learns; for this model, w and b.
- Weight (w): the slope. How much the prediction changes per unit increase in the input.
- Bias (b): the intercept. The prediction when x = 0.
- ŷ ("y-hat"): the model's predicted output, as opposed to the true output y.
- Vectorization: applying an operation to every element of an array at once.
In Colab:
- Make a small dataset by hand: 10 rows, two columns, `hours_studied` and `exam_score`. Make up plausible numbers.
- Plot it as a scatter chart.
- Fit a `LinearRegression` from sklearn.
- Print the learned w and b.
- Predict scores for hypothetical study times: 0 hours, 5 hours, 20 hours.
- Stretch: add a row of bizarre data (say, 5 hours studied with a score of 5; or 15 hours with a score of 40). Refit the model. How much did w and b change? Why?
Today you drew a line through dots. Next, we make "best fit" precise. We define what it means for one line to be better than another, and derive the most important formula in introductory ML: Mean Squared Error.