Forecasting Bitcoin Prices (VAR, XGBoost, Prophet) — A Practical Python Walkthrough
Bitcoin is a high-volatility asset. That volatility is exactly why forecasting is hard—and why the modeling workflow matters more than any single algorithm. This article outlines a clean, reproducible pipeline for forecasting BTC using three complementary approaches: a multivariate VAR model, a feature-based XGBoost regressor, and Prophet for decomposable trend + seasonality baselines.
Scope & assumptions
- This is forecasting, not trading advice. A model that reduces error does not automatically translate into a profitable strategy.
- We forecast prices with daily data. If you need intraday predictions, the data engineering and feature design change materially.
- We compare three modeling philosophies: multivariate time-series dynamics (VAR), supervised regression on engineered lags (XGBoost), and decomposable trend/seasonality baselines (Prophet).
Data sources
This workflow uses BTC price history and an external macro signal that can plausibly co-move with BTC under certain market regimes (e.g., a dollar index / USD strength proxy). The goal is not to claim a causal story, but to test whether a multivariate signal improves forecast stability.
- Bitcoin historical data (Open/High/Low/Close/Volume)
- USD index / macro proxy (to test multivariate relationships)
If you don’t have the macro series, you can still run the full pipeline using BTC-only (Prophet + XGBoost with lags).
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import TimeSeriesSplit
Preprocessing that actually holds up
Financial time series are typically non-stationary: trends and regime shifts dominate. You can either: (1) model non-stationarity explicitly (Prophet), or (2) transform the series to stabilize it (log + differencing) before fitting models like VAR.
Recommended transformations
- Sort by time and enforce a daily frequency.
- Forward-fill only when it’s justified (e.g., market holidays in macro series).
- Log transform for price-like series, then difference to remove trend.
- Split chronologically (no random train/test split).
# Example structure: you can adapt this to your data files.
# btc_df columns: Date, Close (and optionally Volume)
# usd_df columns: Date, Close (macro proxy)
btc_df = pd.read_csv("data/BTC-USD.csv", parse_dates=["Date"])
usd_df = pd.read_csv("data/USD.csv", parse_dates=["Date"])
btc = btc_df[["Date", "Close"]].rename(columns={"Close": "btc_close"}).sort_values("Date")
usd = usd_df[["Date", "Close"]].rename(columns={"Close": "usd_index"}).sort_values("Date")
df = (btc.merge(usd, on="Date", how="inner")
         .dropna()
         .set_index("Date")
         .asfreq("D"))
# If your macro series has gaps on weekends/holidays:
df["usd_index"] = df["usd_index"].ffill()
df.head()
# Transform prices to stabilize variance; then difference to reduce trend.
log_df = np.log(df[["btc_close", "usd_index"]])
# First difference is often enough; second difference is sometimes used, but can over-whiten.
diff_df = log_df.diff().dropna()
diff_df.head()
Model 1: VAR (Vector Autoregression)
VAR is a multivariate time-series model: each variable is explained by its own lags and the lags of the other variables. It’s a solid choice when you believe your series move together over time and you want an interpretable linear baseline.
What VAR does well
- Captures lagged cross-effects between BTC and macro signals.
- Fast to fit and easy to diagnose.
- A baseline you can trust before moving to heavier models.
Where it struggles: strong nonlinearity, structural breaks, and “shock” regimes (very common in crypto).
from statsmodels.tsa.api import VAR
# Hold out the last N days for a simple backtest
N_TEST = 14
train = diff_df.iloc[:-N_TEST]
test = diff_df.iloc[-N_TEST:]
model = VAR(train)
# Let information criteria pick a sensible lag cap
res = model.fit(maxlags=30, ic="aic")
# Forecast in transformed space (diff of log)
fc_diff_log = res.forecast(y=train.values[-res.k_ar:], steps=N_TEST)
fc_diff_log = pd.DataFrame(fc_diff_log, index=test.index, columns=train.columns)
fc_diff_log.head()
def invert_diff_log_forecast(log_history: pd.DataFrame, fc_diff_log: pd.DataFrame) -> pd.DataFrame:
    """
    Given:
        log_history: historical log series (levels), indexed by date
        fc_diff_log: forecasted first differences of the log series
    Returns:
        forecasted price levels (cumulated log diffs, anchored at the last
        observed log level, then exponentiated).
    """
    last_log = log_history.iloc[-1]
    fc_log = fc_diff_log.cumsum().add(last_log, axis="columns")
    fc_level = np.exp(fc_log)
    return fc_level
log_history_train = log_df.loc[train.index]
var_forecast_level = invert_diff_log_forecast(log_history_train, fc_diff_log)
var_forecast_level.head()
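A tiny worked example makes it easy to sanity-check the inversion by hand: starting from the last observed log level, cumulative-sum the forecasted diffs and exponentiate. The numbers below are made up for illustration.

```python
import numpy as np
import pandas as pd

idx_hist = pd.date_range("2024-01-01", periods=3, freq="D")
idx_fc = pd.date_range("2024-01-04", periods=2, freq="D")

# Last observed price is 105; forecasted log-diffs are +0.01 then -0.02.
log_history = pd.DataFrame({"btc_close": np.log([100.0, 110.0, 105.0])}, index=idx_hist)
fc_diff_log = pd.DataFrame({"btc_close": [0.01, -0.02]}, index=idx_fc)

last_log = log_history.iloc[-1]
fc_log = fc_diff_log.cumsum().add(last_log, axis="columns")
fc_level = np.exp(fc_log)

# Day 1: 105 * exp(0.01); Day 2: 105 * exp(0.01 - 0.02)
print(fc_level.round(4))
```

If the hand-computed values disagree with the function output, the usual culprit is anchoring on the wrong "last" observation (e.g., the last row of the full series instead of the training slice).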
Model 2: XGBoost on lag features
XGBoost is not a time-series model by default. It becomes one when you build supervised features that encode time: lags, rolling statistics, and cross-series lags (e.g., lagged USD index changes).
Why this helps
- Nonlinear relationships (common in crypto) are easier to approximate.
- Flexible features let you incorporate volume, volatility, and macro context.
- Strong tabular baseline with a known bias-variance profile.
import xgboost as xgb
# Use log returns for more stable modeling
ret = np.log(df[["btc_close", "usd_index"]]).diff().dropna()
def make_lag_features(frame: pd.DataFrame, lags=(1, 2, 3, 7, 14), roll_windows=(7, 14)) -> pd.DataFrame:
    out = frame.copy()
    for col in frame.columns:
        for L in lags:
            out[f"{col}_lag{L}"] = frame[col].shift(L)
        for w in roll_windows:
            # Shift by one day so rolling stats use only past observations
            # (an unshifted rolling window includes the current day and leaks).
            out[f"{col}_rollmean{w}"] = frame[col].shift(1).rolling(w).mean()
            out[f"{col}_rollstd{w}"] = frame[col].shift(1).rolling(w).std()
    return out.dropna()
feat = make_lag_features(ret)
target = feat["btc_close"]  # same-day log return, predicted from lagged features only
# Drop both contemporaneous columns: the raw usd_index is same-day information
# you would not have at prediction time, so keeping it leaks.
X = feat.drop(columns=["btc_close", "usd_index"])
y = target
# Chronological split (no leakage)
N_TEST = 60
X_train, X_test = X.iloc[:-N_TEST], X.iloc[-N_TEST:]
y_train, y_test = y.iloc[:-N_TEST], y.iloc[-N_TEST:]
model = xgb.XGBRegressor(
    objective="reg:squarederror",
    n_estimators=600,
    learning_rate=0.03,
    max_depth=4,
    subsample=0.9,
    colsample_bytree=0.9,
    reg_alpha=0.0,
    reg_lambda=1.0,
    random_state=42,
)
model.fit(X_train, y_train)
pred_ret = pd.Series(model.predict(X_test), index=y_test.index, name="pred_btc_logret")
pred_ret.head()
# Build a forecasted price series from returns
last_price = df["btc_close"].loc[pred_ret.index.min() - pd.Timedelta(days=1)]
pred_price = (np.exp(pred_ret.cumsum()) * last_price).rename("xgb_btc_price")
pred_price.head()
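A single chronological holdout is a start, but the TimeSeriesSplit imported at the top is the natural tool for walk-forward validation: each fold trains on the past and validates on a strictly later window. A minimal sketch with synthetic stand-in data (swap in the real X/y and an XGBRegressor; the naive mean predictor here just keeps the example self-contained):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(42)
X_demo = pd.DataFrame(rng.normal(size=(300, 4)), columns=[f"f{i}" for i in range(4)])
y_demo = pd.Series(rng.normal(size=300))

tscv = TimeSeriesSplit(n_splits=5)
fold_mae = []
for train_idx, val_idx in tscv.split(X_demo):
    # Every validation index is strictly after every training index.
    assert train_idx.max() < val_idx.min()
    # Fit any regressor here; a naive mean stands in for xgb.XGBRegressor.
    pred = np.full(len(val_idx), y_demo.iloc[train_idx].mean())
    fold_mae.append(mean_absolute_error(y_demo.iloc[val_idx], pred))

print([round(m, 3) for m in fold_mae])
```

Reporting the per-fold spread, not just the mean, tells you whether the model's edge is stable across regimes or concentrated in one lucky window.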
Model 3: Prophet baseline
Prophet is useful as a decomposable baseline: it models trend + seasonality + holidays (optional) and provides uncertainty intervals. In crypto, it’s rarely “the best” model in absolute error terms across all regimes, but it is a strong governance baseline: if your fancy model can’t beat Prophet out-of-sample, it’s not production-ready.
Note: older notebooks often use fbprophet; the current package is typically prophet.
from prophet import Prophet
btc_prophet = df.reset_index()[["Date", "btc_close"]].rename(columns={"Date":"ds", "btc_close":"y"})
# Optional: log-transform can help for exponential growth regimes
# btc_prophet["y"] = np.log(btc_prophet["y"])
# daily_seasonality models intra-day cycles, which is meaningless on daily bars;
# weekly and yearly components are the relevant ones here.
m = Prophet(weekly_seasonality=True, yearly_seasonality=True)
m.fit(btc_prophet)
future = m.make_future_dataframe(periods=60, freq="D")
forecast = m.predict(future)
forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail()
Evaluation: keep it honest
With time series, evaluation failures are usually process failures: leakage, bad splits, or comparing models in different spaces (returns vs prices) without translating consistently. The clean approach: evaluate on the same target and horizon using chronological splits.
Metrics that are meaningful
- MAE (robust, interpretable)
- RMSE (penalizes big misses—important in crypto)
- Directional accuracy (optional, if you care about sign of returns)
def rmse(y_true, y_pred):
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))
# Example evaluation for XGBoost price path (pred_price) vs actual btc_close
actual = df["btc_close"].loc[pred_price.index]
mae_val = float(mean_absolute_error(actual, pred_price))
rmse_val = rmse(actual, pred_price)
mae_val, rmse_val
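Directional accuracy, mentioned above, has no off-the-shelf sklearn metric. A minimal hand-rolled version, assuming price-level series as input (for return series, drop the `diff()` and compare signs directly):

```python
import numpy as np
import pandas as pd

def directional_accuracy(actual: pd.Series, pred: pd.Series) -> float:
    """Fraction of days where the forecasted and actual price moves share a sign."""
    actual_dir = np.sign(actual.diff().dropna())
    pred_dir = np.sign(pred.diff().dropna())
    aligned = actual_dir.index.intersection(pred_dir.index)
    return float((actual_dir.loc[aligned] == pred_dir.loc[aligned]).mean())

# Toy check: predictions that always move the right way score 1.0.
idx = pd.date_range("2024-01-01", periods=5, freq="D")
a = pd.Series([100, 101, 99, 102, 103], index=idx, dtype=float)
p = pd.Series([100, 102, 98, 103, 105], index=idx, dtype=float)
print(directional_accuracy(a, p))  # → 1.0
```

A model can win on MAE yet lose on direction (or vice versa), so report both when the sign of the move is what you ultimately act on.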
Conclusions and recommended next upgrades
- VAR gives a disciplined multivariate baseline—good for understanding lag relationships, weak in nonlinear shock regimes.
- XGBoost is the best “tabular” workhorse once you engineer lag/rolling features and enforce leakage-safe splits.
- Prophet is a governance baseline: if you can’t beat it out-of-sample, your pipeline isn’t ready.
Notebook
Full notebook (GitHub): crypto_forecast.ipynb
If you want a cleaner rendered view (no GitHub UI friction): View on nbviewer