What Does “Similar” Mean for Indoor Temperature Time Series?
Introduction: why indoor temperature time series analysis matters
Indoor temperature time series are rich signals that encode a mixture of information:
- Building thermal properties (insulation, thermal mass)
- Occupant behavior
- Weather influence
- HVAC equipment power and dynamics
- Control strategies (thermostats, HVAC policies)
Because of this, temperature trajectories are widely used in:
- Energy efficiency analysis
- HVAC control optimization
- Fault detection
- Control policy comparison and selection
However, comparing temperature time series is not straightforward, and different methods can yield very different notions of “similarity”.
In this article, we try to answer the following question:
How to compare indoor temperature time series effectively?
Our answer aims at being able to say that two temperature trajectories are “similar” if they correspond to similar thermal behavior and/or control policies, not just if they are numerically close or share similar shapes.
Ultimately, we’re looking for similarity metrics that are proxies for causal equivalence.
This blog post explores several ways to compare indoor temperature time series, ranging from purely mathematical distances to model-based approaches rooted in building physics. We will highlight their strengths, weaknesses, and typical use cases, using reproducible Python snippets throughout.
Problem setup
We assume we observe indoor temperature time series collected over comparable periods (e.g. daily episodes), possibly under different HVAC control policies.
Let each trajectory be represented as:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
T = np.array([...]) # shape: (time_steps,)
Here is an example of a real indoor temperature series captured in a building, at night (when HVAC is off) and early morning (HVAC on).
From this original time series, we derive three others by:
- applying a +2h shift on the time axis.
- applying a +2h shift on the time axis and a +1°C shift on the temperature axis.
- applying a +2h shift plus Gaussian noise.
The code excerpt below shows how to quickly generate this synthetic data:
# 2h shift (sampling freq is 15min)
T_ts = np.roll(T, shift=8)
# 2h & 1°C shifts
T_ts_vs = T_ts + 1.0
# 2h + gaussian noise
T_ts_noise = T_ts + np.random.normal(0.0, 0.2, len(T_ts))
Here is how they look:
They don’t look very similar due to the shifts and noise, but they all correspond to the same thermal dynamics, just observed under different conditions. The methods we will explore should ideally capture this similarity.
In order to compute and visualize pair-wise distances, we’ll use the following helper function:
# we pin labels as they shouldn't change throughout the post
labels = [
    "temp. series (orig)",
    "temp. series (2h shift)",
    "temp. series (2h & 1°C shift)",
    "temp. series (2h shift + noise)",
]

def dist_heatmap(dist: np.ndarray, labels: list[str], method: str):
    assert dist.shape[0] == dist.shape[1], "dist matrix should be square (pair-wise distances)"
    assert dist.shape[0] == len(labels), "number of dist rows should be equal to number of labels"
    mask = np.triu(np.ones_like(dist, dtype=bool))
    sns.heatmap(dist, annot=True, xticklabels=labels, yticklabels=labels, mask=mask)
    plt.title(f"Heatmap of distances (method: {method})")
Method 1 - Euclidean Distance (Pointwise Comparison)
Theory
The simplest way to compare two time series is the Euclidean distance, also known as the L2 norm:
$d(T_1, T_2) = \sqrt{\sum_t (T_1(t) - T_2(t))^2}$
This approach assumes:
- Perfect time alignment
- Comparable sampling rates
- Differences at each timestamp matter equally
Pros and Cons
Pros
- Very fast
- Easy to interpret
- Works well for aligned, low-noise data
Cons
- Extremely sensitive to time shifts
- Penalizes small delays harshly
- Not robust to control-induced phase shifts
Example
Here is a simple code excerpt to compute pair-wise L2 distances.
# stack the four series and compute pair-wise L2 distances via broadcasting
values = np.array([T, T_ts, T_ts_vs, T_ts_noise])    # shape: (4, time_steps)
diff = values[np.newaxis, :] - values[:, np.newaxis]  # shape: (4, 4, time_steps)
dist = np.sqrt(np.sum(diff**2, axis=2))
dist_heatmap(dist=dist, labels=labels, method="Euclidean distance")
We see in the heatmap below that the Euclidean distance fails to capture the similarity between the original and shifted/noisy series.
Even if temperatures are normalized (mean subtraction or z-scoring), Euclidean distance remains highly sensitive to time misalignment. Normalization fixes amplitude issues, not temporal ones.
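As a quick illustration, here is a minimal sketch (reusing `T` and `T_ts` from above) comparing the raw and z-scored Euclidean distances between the original and time-shifted series:
# z-score both series: this removes amplitude/offset differences,
# but the samples are still compared point by point, so the 2h shift remains
def zscore(x):
    return (x - x.mean()) / x.std()

d_raw = np.linalg.norm(T - T_ts)
d_zscored = np.linalg.norm(zscore(T) - zscore(T_ts))
print(f"Euclidean distance, raw: {d_raw:.2f} | z-scored: {d_zscored:.2f}")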
Method 2 - Dynamic Time Warping (DTW)
Theory
Dynamic Time Warping (DTW) addresses temporal misalignment by non-linearly warping time to align similar patterns.
Instead of matching point-to-point, DTW finds an optimal alignment path that minimizes cumulative distance.
Theoretically, DTW solves: $DTW(T_1, T_2) = \min_{warp} \sum_{(i,j) \in warp} |T_1(i) - T_2(j)|$
Where warp is a set of index pairs defining the alignment.
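To make the optimization concrete, here is a minimal NumPy sketch of the classic dynamic-programming solution (it uses the absolute-difference cost from the formula above, so its values won’t exactly match libraries that accumulate squared differences):
def dtw_naive(x: np.ndarray, y: np.ndarray) -> float:
    """Plain O(N*M) dynamic-programming DTW with an absolute-difference local cost."""
    n, m = len(x), len(y)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # extend the cheapest of the three admissible predecessor paths
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m]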
Interpretation
DTW is particularly useful when:
- Two policies produce similar thermal dynamics
- But with shifted times
Pros and Cons
Pros
- Robust to time shifts
- Captures shape similarity
Cons
- More expensive computationally ($O(N^2)$ complexity, which can become prohibitive for large datasets)
- Can over-align unrelated patterns
- Less interpretable distances
- Sensitive to noise if not constrained
Example
# tslearn is a great package for time-series manipulation
from tslearn.metrics import dtw
from itertools import combinations

# compute pair-wise DTW distances between all series
dist = np.zeros((values.shape[0], values.shape[0]))
for i, j in combinations(range(values.shape[0]), 2):
    dist[i, j] = dist[j, i] = dtw(values[i], values[j])
dist_heatmap(dist=dist, labels=labels, method="Dynamic Time Warping")
The DTW distance heatmap below shows that DTW reduces the distances between the original and the shifted/noisy series and yields less variance overall, while still capturing some differences due to the value shift and noise.
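One common mitigation for over-alignment and noise sensitivity is to constrain how far the warping path may stray from the diagonal; the sketch below assumes tslearn’s Sakoe-Chiba band option (a band of ±8 samples, i.e. 2h at 15-minute sampling):
# pair-wise DTW distances with a constrained warping path
dist_c = np.zeros((values.shape[0], values.shape[0]))
for i, j in combinations(range(values.shape[0]), 2):
    dist_c[i, j] = dist_c[j, i] = dtw(
        values[i], values[j],
        global_constraint="sakoe_chiba", sakoe_chiba_radius=8,
    )
dist_heatmap(dist=dist_c, labels=labels, method="DTW (Sakoe-Chiba band)")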
Method 3 - Frequency-Domain Comparison with the Fast Fourier Transform (FFT)
Theory
Indoor temperature dynamics often exhibit:
- Daily cycles
- Slow thermal inertia
- Periodic control behavior
The Fast Fourier Transform (FFT) allows us to compare signals in the frequency domain rather than time.
Specifically, we compute the FFT as follows:
$C(f) = \sum_{t=0}^{N-1} T(t) \cdot e^{-2\pi i f t / N}$
Where:
- $C(f)$: complex FFT coefficient at frequency $f$
- $N$: number of time steps
We then compare the magnitude spectrum, which captures the strength of different frequency components, ignoring phase information. We’re particularly interested in low-frequency components that reflect thermal dynamics, so we can truncate the FFT to the first frequencies.
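As a concrete anchor for what “low frequencies” means here: with 15-minute sampling over a one-day window, the daily cycle falls in the very first non-DC FFT bin.
fs = 1 / (15 * 60)                    # sampling frequency in Hz (one sample every 15 min)
N = 96                                # one day of 15-min samples
freqs = np.fft.rfftfreq(N, d=1 / fs)  # non-negative frequency bins
print(freqs[1], 1 / freqs[1] / 3600)  # ~1.16e-5 Hz, i.e. a 24 h period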
Interpretation
FFT-based comparison emphasizes:
- Dominant thermal time constants
- Oscillatory behavior
- Control aggressiveness
It ignores exact timing and focuses on structural similarity.
Pros and Cons
Pros
- Robust to both time and value shifts, and noise
- Highlights periodic patterns
- Captures global dynamics
Cons
- Loses temporal localization
- Harder to interpret physically
- Sensitive to window length
Example
We take our real and synthetic indoor temperature time series and compute FFTs on them.
def compute_fft_magnitude_with_freqs(data, fs, ncoeffs=10, no_dc=True):
    N = len(data)
    # magnitude of the FFT coefficients
    c_fft = np.abs(np.fft.fft(data))
    # number of coefficients to keep; if None, keep half of the spectrum
    if ncoeffs is None:
        ncoeffs = N // 2
    # corresponding frequencies (in Hz)
    freqs = np.fft.fftfreq(N, d=1/fs)
    # keep the first (non-negative, low-frequency) bins
    c_fft = c_fft[:ncoeffs + 1]
    freqs = freqs[:ncoeffs + 1]
    # remove the DC component if requested
    if no_dc:
        c_fft = c_fft[1:]
        freqs = freqs[1:]
    return freqs, c_fft
# compute FFTs for each series; the sampling period is 15 min
freq = 1 / (15 * 60)  # sampling frequency in Hz
fft_values = np.array([compute_fft_magnitude_with_freqs(series, fs=freq)[1] for series in values])
# compute pair-wise distances in frequency domain
diff = fft_values[np.newaxis, :] - fft_values[:, np.newaxis]
dist = np.sqrt(np.sum(diff**2, axis=2))
# plot
dist_heatmap(dist=dist, labels=labels, method="FFT Magnitude Distance")
Below we show that, despite the temporal and value shifts, the magnitudes of the low-frequency FFT components are very “close” (strictly identical except for the noisy series).
The computed distances confirm this visual analysis:
Method 4 - RC Model Parameter Identification (Model-Based)
Theory
For each pair of zones we want to compare, we can identify parameters of a model and compare the parameters.
We can simplify the model to a first-order Resistance-Capacitance (RC) model, with the following governing equation:
$C \cdot \frac{dT}{dt} = Q_{hvac} - \frac{(T - T_{out})}{R}$
Where:
- $C$ = Thermal Capacitance (e.g. thermal inertia). Larger $C$ means slower temperature change.
- $R$ = Thermal Resistance (insulation quality).
- $T$ = Indoor temperature.
- $T_{out}$ = Outdoor temperature.
- $Q_{hvac}$ = Heating power input (unknown, assumed constant when ON).
This formulation assumes negligible internal and solar gains and a single well-mixed thermal zone. These assumptions don’t always hold. When the heating is OFF ($Q_{hvac}$ = 0), the temperature decays to $T_{out}$. When the heating is ON ($Q_{hvac}$ = $P_{heat}$, a constant), the temperature rises towards a higher equilibrium.
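To build intuition for this equation, here is a minimal forward-Euler simulation of the RC model with illustrative (not fitted) parameter values: the first half shows the free decay towards $T_{out}$, the second half the rise towards $T_{out} + R \cdot P_{heat}$.
# forward-Euler simulation of C * dT/dt = Q_hvac - (T - T_out) / R
R = 0.005        # K/W, illustrative thermal resistance
C = 2e6          # J/K, illustrative thermal capacitance (tau = R*C ~ 2.8 h)
T_out = 5.0      # °C, outdoor temperature (assumed constant)
P_heat = 3000.0  # W, heating power when ON
dt = 60          # s, simulation time step

T_sim = [20.0]                                   # start at 20 °C, heating OFF
for k in range(8 * 3600 // dt):                  # simulate 8 hours
    Q = P_heat if k >= 4 * 3600 // dt else 0.0   # heating turns ON after 4 hours
    dT = (Q - (T_sim[-1] - T_out) / R) / C
    T_sim.append(T_sim[-1] + dT * dt)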
Implementation Steps
- Data prep: select 2 time series for which the outdoor air temperature aligns, then segment each time series into cooling and heating periods (skipped here for brevity).
- Curve fitting: estimate the parameters of the cooling and heating equations:
  - Cooling: $T(t) = T_{out} + (T_{initial} - T_{out}) \cdot e^{\frac{-t}{\tau}}$, where $\tau = R \cdot C$ (and optionally $T_{out}$) are the fitted parameters. $T_{out}$ is assumed constant over the fitting window.
  - Heating: $T(t) = T_{inf} - (T_{inf} - T_{initial}) \cdot e^{\frac{-t}{\tau}}$, where $T_{inf} = T_{out} + R \cdot P_{heat}$ is the fitted parameter ($\tau$ is reused from the cooling fit).

  After this step, the two main fitted parameters are:
  - $\tau = R \cdot C$, which reflects the combined effect of thermal mass and insulation and should be interpreted as a global thermal time constant. A large $\tau$ can come from high thermal mass (large $C$), good insulation (large $R$), or both.
  - $G = R \cdot P_{heat}$, the product of the thermal resistance (which can be considered constant for a given building) and the heating power, referred to as $G$ hereafter (or `gain` in the code).
- Quantify the closeness of the 2 sets of parameters using:

  $\sqrt{\left(\frac{\tau_1 - \tau_2}{\max(\tau_1, \tau_2)}\right)^2 + \left(\frac{G_1 - G_2}{\max(G_1, G_2)}\right)^2}$

  This produces a scale-free distance in the range $[0, \sqrt{2}]$.
Practical considerations:
- use actual weather data for $T_{out}$ instead of fitting it: this greatly reduces the variance of the fitted $\tau$.
- instead of identifying HVAC ON and OFF segments from the indoor temperature, simply use the actuator value of the HVAC equipment when available.
- check `np.sqrt(np.diag(pcov))` from `curve_fit` to get error estimates on the fitted parameters (see the sketch below).
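For the last point, here is a self-contained sketch on a synthetic cooling segment (true $\tau$ = 10,000 s), showing how the 1-sigma error on the fitted time constant can be read off `pcov`:
from scipy.optimize import curve_fit

# synthetic cooling segment: decay from 20 °C towards 5 °C with tau = 10000 s, plus noise
t = np.arange(0, 4 * 3600, 900)  # 4 h of 15-min samples, in seconds
temp = 5.0 + 15.0 * np.exp(-t / 10000) + np.random.normal(0.0, 0.1, len(t))

popt, pcov = curve_fit(lambda t, tau: 5.0 + 15.0 * np.exp(-t / tau), t, temp, p0=(3600,))
tau_hat, tau_std = popt[0], np.sqrt(np.diag(pcov))[0]
print(f"tau = {tau_hat:.0f} s +/- {tau_std:.0f} s (1-sigma)")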
Interpretation
Instead of asking:
Are these trajectories similar?
We ask:
Do these trajectories correspond to similar thermal systems?
Pros and Cons
Pros
- Physically interpretable
- Robust to noise and shifts
- Useful as a feature for clustering and policy evaluation
Cons
- Requires additional signals (outdoor temp, actuator states)
- Model mismatch can bias results
- More complex to implement
Example
Below is a code excerpt to fit cooling and heating curves and compute pair-wise distances based on RC parameters.
from scipy.optimize import curve_fit

def cooling_model(t, tau, T_out, T0):
    # free cooling: exponential decay from T0 towards T_out
    return T_out + (T0 - T_out) * np.exp(-t / tau)

def heating_model(t, tau, T_inf, T0):
    # heating: exponential rise from T0 towards the equilibrium T_inf
    return T_inf + (T0 - T_inf) * np.exp(-t / tau)

def estimate_cooling(time, temp, T_out):
    t = (time - time[0]) * 1e-9  # ns -> s
    T0 = temp[0]
    popt, pcov = curve_fit(
        lambda t, tau: cooling_model(t, tau, T_out, T0),
        t,
        temp,
        # initial guess for tau; 3600 s is a reasonable value for building thermal inertia
        p0=(3600,),
    )
    tau = popt[0]
    return tau

def estimate_heating(time, temp, tau):
    t = (time - time[0]) * 1e-9  # ns -> s
    T0 = temp[0]
    popt, pcov = curve_fit(
        lambda t, T_inf: heating_model(t, tau, T_inf, T0),
        t,
        temp,
        p0=(temp.max(),),
    )
    T_inf = popt[0]
    return T_inf

def analyze_period(time_cool, temp_cool, time_heat, temp_heat, T_out):
    # tau from the cooling segment, then T_inf from the heating segment (tau reused)
    tau = estimate_cooling(time_cool, temp_cool, T_out)
    T_inf = estimate_heating(time_heat, temp_heat, tau)
    heating_gain = T_inf - T_out  # G = R * P_heat
    return tau, heating_gain

def thermal_distance(tau1, gain1, tau2, gain2):
    # scale-free distance in [0, sqrt(2)]
    d_tau = abs(tau1 - tau2) / max(tau1, tau2)
    d_gain = abs(gain1 - gain2) / max(gain1, gain2)
    return np.sqrt(d_tau**2 + d_gain**2)
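As a usage sketch, assuming the segmentation step (skipped above) produced cooling/heating segments for two zones, the helpers combine as follows; the `time_*`/`temp_*` arrays below are hypothetical (timestamps as int64 nanoseconds, temperatures in °C):
# hypothetical pre-segmented inputs for zones A and B
tau_a, gain_a = analyze_period(time_cool_a, temp_cool_a, time_heat_a, temp_heat_a, T_out=5.0)
tau_b, gain_b = analyze_period(time_cool_b, temp_cool_b, time_heat_b, temp_heat_b, T_out=5.0)
d = thermal_distance(tau_a, gain_a, tau_b, gain_b)
print(f"RC-parameter distance: {d:.2f} (0 = identical dynamics, sqrt(2) = maximum)")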
The plot below shows the fitted cooling and heating curves on the original temperature time series, after segmenting it into cooling and heating periods.
On the distance matrix below, we see that the RC parameter-based distance captures the similarity between the original and the shifted/noisy series, with a slightly larger distance for the noisy one. Note that, given how the distance is computed, its scale differs from the previous methods ($[0, \sqrt{2}]$ here vs larger ranges before). All the scores are below 0.3 except for the noisy one, indicating strong similarity.
Summary: When to Use What?
Below we summarize the main methods discussed, their best use cases, and limitations.
| Method | Best Use Case | Main Limitation |
|---|---|---|
| Euclidean | Aligned, low-noise signals | Time sensitivity |
| DTW | Shifted or delayed responses | Over-alignment |
| FFT | Structural, periodic similarity | Loss of timing |
| RC parameters | Physical & control similarity | Model dependence |
Here is a quick decision table based on available data:
| Method | Data required |
|---|---|
| Euclidean | Indoor temp |
| DTW | Indoor temp |
| FFT | Indoor temp |
| RC parameters | Indoor + outdoor temp + HVAC state |
Beyond similarity: toward decision-making
Comparing temperature time series is a powerful tool, but similarity alone is not enough when decisions are involved.
For example:
- Selecting the best control policy among a pool
- Based on recent episodes and temperature outcomes
- Without deploying each policy online
This naturally leads to more advanced frameworks such as:
- Off-policy evaluation
- Contextual multi-armed bandits
- Policy selection under uncertainty
Time series comparison can serve as a drift detection, feature extraction or filtering step, but robust decision-making requires statistical guarantees and counterfactual reasoning.
👉 In a follow-up post, we will explore how temperature trajectories can be embedded into policy selection pipelines, bridging building control, reinforcement learning, and bandit theory.