📖 Case Study Playbook: Commodity Price Forecasting & Sourcing Analytics

Step 1: Problem Definition & Sourcing Alignment

Commodity price volatility is a major risk factor in manufacturing sectors where raw materials form a substantial portion of cost of goods sold (COGS). Traditional sourcing operates reactively—either buying at the prevailing spot rate or relying blindly on the previous month's averages. This approach exposes profit margins to sudden macroeconomic surges and supply contractions.

To transition from reactive to data-driven procurement, a predictive analytics project must align two distinct domains: sourcing operations and quantitative modeling.

Translating Sourcing Strategy to Analytics Objectives

A sourcing team has specific operational choices (e.g., placing spot orders versus locking monthly forward averages). The model's target is not merely to get a low error metric, but to output directional accuracy and variance estimates that allow sourcing managers to select the optimal contract option.

🎯 Alignment Framework

Business Risk: Profit margin erosion due to raw material price spikes.
Procurement Action: Lock forward contract (hedging) or purchase spot (capitalizing on downtrends).
Analytics Objective: Forecast the average commodity price for the next 30 days and predict the direction (uptrend vs. downtrend) with high statistical confidence.

Sourcing Constraints to Account For

Lead Time: Mining, shipping, and delivery schedules require buying decisions to be locked days or weeks in advance.
Minimum Order Quantities (MOQ): Sourcing managers cannot buy arbitrary amounts daily; purchases are batched, making weekly or monthly averages key.
Contract Structures: Spot pricing (highly volatile daily rate) vs. average average pricing (locked average rate over a monthly trading window).

Step 2: Data Engineering & Feature Pipeline

A forecasting model is only as good as its underlying data. Sourcing daily data across global exchanges requires a robust data engineering pipeline to handle asynchronous trading calendars, reporting lags, and feature scaling.

1. Sourcing and verifying primary data

Commodity pricing data must be gathered from official exchanges. For base metals like Copper, Aluminum, and Lead, the London Metal Exchange (LME) is the primary global benchmark. Since trading calendars differ between exchanges (e.g., LME in London, SHFE in Shanghai, COMEX in New York), cross-verification with premium institutional market databases is necessary to ensure data sanctity.

2. Handling missing values and data gaps

Financial time series contain natural gaps (weekends and national holidays). Simply dropping these rows disrupts sequential time series models like ARIMA.

Linear Interpolation: Calculates intermediate values based on the adjacent active trading days, preserving continuous sequential paths.

Y_t = Y_{t-1} + \left( \frac{Y_{t+k} - Y_{t-1}}{k+1} \right)

Forward Fill: Carries forward the last known price. Useful for exchange inventory levels (stocks) since stock levels remain unchanged during holidays.

3. Variable selection and multicollinearity

To explain metal price volatility, we collect macroeconomic indicators (USD Index, DXY), energy costs (Brent Crude), exchange rates (USD/INR), and regional exchange prices (SHFE, COMEX).

Multicollinearity (high correlation between independent variables) can inflate coefficients and destabilize regression models. We drop redundant features (e.g., keeping only one benchmark treasury yield instead of several closely correlated tenors).

4. Scaling and Normalization

Independent variables exist on vastly different scales (e.g., stock levels in millions of tons, crude oil in dollars, interest rates in percentages). Distance-based and gradient-descent algorithms (like SVM and MLP Neural Networks) will prioritize larger-scale features if not scaled. We apply Standardization:

Z = \frac{X - \mu}{\sigma}

This centers features around a mean of 0 with a standard deviation of 1, preventing numerical instability during training.

Step 3: Exploratory Data Analysis (EDA)

EDA identifies statistical characteristics, variance, and seasonal trends to guide model selection and highlight key market relationships.

1. Descriptive Statistics & Volatility Signals

Analyzing the range, mean, and standard deviation reveals the historical volatility of each target commodity.

Commodity Target	Min Price ($/MT)	Max Price ($/MT)	Average Price ($/MT)	Standard Deviation ($)	Volatility Status
LME Copper Price	$4,310.50	$10,857.00	$6,989.86	$1,585.81	Severe Volatility / Skewed
LME Aluminum Price	$1,421.50	$3,984.50	$2,042.67	$398.47	Moderate / Cyclical
LME Lead Price	$1,568.50	$2,683.00	$2,028.91	$231.25	Stable / Mean-Reverting

2. Distribution Analysis: Skewness & Kurtosis

Normal distributions have a kurtosis of 3. Commodities like Copper (4.11) and Aluminum (4.98) display high kurtosis (leptokurtic). This indicates heavy-tailed distributions with a high frequency of extreme outliers (sudden price spikes). Model validation must prepare for these extreme shifts.

3. Correlation Heatmaps & Cross-Market Indicators

Heatmaps reveal how commodities move in relation to exogenous variables:

Energy correlation: Brent Crude Oil correlates strongly with Copper and Lead (0.68), reflecting the impact of logistics and refining energy costs.
Exchange rate correlation: The USD Index (DXY) correlates negatively with global commodities, as a stronger dollar makes dollar-denominated metals more expensive in other currencies.
Arbitrage correlation: LME Aluminum has a 0.89 correlation with Shanghai Futures Exchange (SHFE) Copper, identifying a strong cross-commodity, cross-exchange market link.

4. Seasonality and Day-of-Week Evaluation

Analyzing returns by day of the week helps identify recurring short-term trends:

Monday: Historically has the highest price drops, but exhibits severe volatility (high risk/reward).
Thursday: Offers balanced, moderate savings with lower volatility risk.
Wednesday: Displays the lowest price movements (safest day to buy, but with minimal potential savings).

Step 4: Time Series & ARIMA/SARIMA Modeling

Classical time series models rely on historical lags and residuals. To model these patterns, we must understand stationarity and lag structures.

1. Stationarity Testing: Augmented Dickey-Fuller (ADF) Test

A stationary series has a constant mean, variance, and autocorrelation over time. If a series is non-stationary, models like ARIMA will yield unreliable, spurious regressions. The ADF test evaluates:

🔍 ADF Hypotheses

- Null Hypothesis (H₀): The series has a unit root (is non-stationary).
- Alternative Hypothesis (H₁): The series is stationary (no unit root).

If the test statistic is more negative than the critical value (p-value < 0.05), we reject the null hypothesis and confirm stationarity.

Differencing to Achieve Stationarity

Daily commodity prices are almost always non-stationary due to trend structures (Copper p-value: 0.695, Aluminum p-value: 0.434). We apply Differencing (d=1):

Y'_t = Y_t - Y_{t-1}

Subtracting the previous value removes trends, stabilizing the mean.

2. Reading ACF and PACF Plots

After differencing, Autocorrelation (ACF) and Partial Autocorrelation (PACF) plots help identify the lag structures for the model:

Autocorrelation Function (ACF): Measures correlation between $Y_t$ and lag $Y_{t-k}$, including indirect effects. Cutoffs in ACF help identify the Moving Average (MA) order q.
Partial Autocorrelation Function (PACF): Measures correlation between $Y_t$ and lag $Y_{t-k}$, controlling for intermediate lags. Cutoffs in PACF identify the Autoregressive (AR) order p.

3. Inside the SARIMA Model

SARIMA combines lag parameters and seasonal adjustments:

\text{SARIMA}(p,d,q)(P,D,Q)_s

p (AR order): Number of historical price lags used to predict the current price.
d (Differencing order): Number of times differencing is applied to achieve stationarity.
q (MA order): Number of historical residual forecast error lags utilized.
P, D, Q: Seasonal AR, seasonal differencing, and seasonal MA orders.
s: Seasonal period length (e.g., 12 for monthly cycles).

4. Model Selection Criteria: AIC & BIC

To select the best model orders, we balance model fit against complexity (preventing overfitting) using AIC (Akaike Information Criterion):

\text{AIC} = 2k - 2\ln(\hat{L})

where $k$ is the number of parameters and $\hat{L}$ is the maximum likelihood. The model order with the lowest AIC is selected (e.g., Copper's optimal model: `SARIMA(2,0,4)(0,0,2)[12]`, AIC: 1775.78).

Step 5: Machine Learning & Ensemble Regressors

While classical time series models focus on linear historical lag structures, machine learning regressors are designed to capture complex, non-linear relationships across multiple market factors.

1. Linear Regressors: Lasso & Ridge

Linear models are highly interpretable but prone to underfitting when pricing displays non-linear volatility. Regularization is applied to control coefficient sizes:

Lasso (L1 Regularization): Adds a penalty proportional to the absolute values of the coefficients: $\text{Penalty} = \lambda \sum_{j=1}^p |\beta_j|$ Lasso forces irrelevant coefficients to absolute zero, acting as an automatic feature selector.
Ridge (L2 Regularization): Adds a penalty proportional to the squared values of the coefficients: $\text{Penalty} = \lambda \sum_{j=1}^p \beta_j^2$ Ridge shrinks coefficients close to zero but retains all variables, helping manage multicollinearity.

2. Tree Ensembles: Random Forest & XGBoost

Decision trees recursively split data based on feature thresholds. Individual trees are highly prone to overfitting, which we mitigate using ensemble methods:

Random Forest (Bagging): Trains multiple decision trees in parallel on bootstrapped data samples and random feature subsets. The final prediction averages the outputs: $\hat{y} = \frac{1}{B} \sum_{b=1}^B T_b(x)$ This significantly reduces model variance.
XGBoost (Gradient Boosting): Trains trees sequentially. Each new tree is optimized to predict the residual errors of the combined preceding trees: $y_i^{(t)} = y_i^{(t-1)} + \eta f_t(x_i)$ XGBoost uses regularization and gradient descent to minimize errors, making it highly effective for complex commodity markets.

3. Advanced Non-Linear Models: SVM & MLP Neural Networks

Support Vector Regression (SVR): Maps input data into higher-dimensional spaces using kernel functions (like RBF) to find a hyperplane that fits data within a tolerance boundary $\epsilon$. SVR is effective for high-dimensional data, but selecting optimal parameters (C, $\gamma$) is difficult.
Multi-Layer Perceptron (MLP): A feedforward neural network that learns complex non-linear mappings through hidden layers and backpropagation. MLP is highly powerful, but prone to overfitting on noisy financial data if not constrained by early stopping.

⚠️ Technical Insight: Tuning Complexity

While Neural Networks (MLP) and SVR capture non-linear relationships, they underperform compared to tree-based ensembles (XGBoost/RF) in noisy commodity markets. Neural networks struggle with low signal-to-noise ratios and tend to memorize training noise, leading to higher out-of-sample errors (e.g. Lead MLP RMSE: 144.30 vs. XGBoost RMSE: 51.26).

Step 6: Training, Testing & Validation Methodology

To guarantee that a forecasting model performs well in production, validation must be structured to prevent data leakage and measure generalization.

1. Chronological Split vs. Random Split

In standard machine learning, datasets are shuffled and split randomly. In time-series forecasting, this is a critical error (data leakage).

Shuffling allows future data points to appear in the training set to predict past prices, creating artificially high accuracy during testing that fails in production.

We apply a Chronological Split (e.g. training on 2014-2022 data and testing strictly on 2023-2024 data), preserving temporal ordering.

2. k-Fold Time Series Cross-Validation

To evaluate model stability, standard k-fold CV cannot be used. We apply an Expanding Window Time-Series Split:

Fold 1: Train on Year 1-5, test on Year 6.
Fold 2: Train on Year 1-6, test on Year 7.
Fold 3: Train on Year 1-7, test on Year 8.

This ensures the training set only contains observations that occurred prior to the test set, measuring performance stability across different market cycles.

3. Production Simulation: Out-of-Sample Rolling Validation

In production, models generate forecasts over a forward horizon (e.g. 30 days) using rolling averages of input features.

We validate this by comparing the model's 30-day predicted averages against actual monthly prices. The table below illustrates this comparison using actual sample data:

Commodity	Validation Phase	Actual price ($/MT)	Random Forest Predicted ($/MT)	XGBoost Predicted ($/MT)	Lasso Predicted ($/MT)	Validation Insight
Copper (Cu)	First Half Avg	$9,480.10	$9,200.45	$9,195.67	$9,147.34	Tree-based ensembles capture the mid-month trend changes accurately. Lasso (linear) consistently underestimates, showing limits in capturing complex market shifts.
	Second Half Avg	$8,977.65	$9,123.29	$9,093.94	$9,054.65
	Full Month Avg	$9,228.88	$9,145.34	$9,143.67	$9,086.52
Aluminum (Al)	First Half Avg	$2,520.90	$2,420.67	$2,415.44	$2,382.38	Random Forest provides the most consistent out-of-sample forecast, matching the monthly average price with minor error.
	Second Half Avg	$2,366.50	$2,466.24	$2,442.56	$2,399.49
	Full Month Avg	$2,443.70	$2,443.62	$2,427.55	$2,375.62

Copper Validation: Actual vs. Predictions

Aluminum Validation: Actual vs. Predictions

Step 7: Evaluation Metrics & Selection Criteria

To select the final production model, performance must be measured across multiple metrics. Different metrics capture different types of prediction errors.

1. Metric Definitions & Sourcing Intuition

R-Squared (R² - Coefficient of Determination): Measures the proportion of variance in the target variable explained by the model: $R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}$ *Sourcing Intuition:* Indicates general model fit. However, a high $R^2$ can still mask large absolute errors in volatile periods.
Root Mean Squared Error (RMSE): Measures the standard deviation of residuals: $\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2}$ *Sourcing Intuition:* RMSE squares the errors before averaging, penalizing larger deviations more heavily. Useful when large pricing errors carry severe financial risks.
Mean Absolute Error (MAE): Measures the average magnitude of absolute errors: $\text{MAE} = \frac{1}{n} \sum_{i=1}^n |y_i - \hat{y}_i|$ *Sourcing Intuition:* MAE treats all errors linearly. It represents the typical dollar-value deviation of predicted prices, making it highly intuitive for procurement managers.

2. Model Selection Benchmarks (Sample Results)

Algorithm	Commodity	R² Score	RMSE ($/MT)	MAE ($/MT)	Selection Assessment
XGBoost (Tuned)	Copper	0.998	64.35	44.26	Selected. Lowest overall error metrics.
XGBoost (Tuned)	Aluminum	0.983	46.71	32.32	Selected. Superior handling of non-linear variance.
Random Forest	Lead	0.943	52.84	35.97	Selected. Outperforms XGBoost on Lead by reducing overfitting.
Random Forest	Copper	0.998	67.78	44.91	Alternative. High stability, slightly higher RMSE than XGBoost.
Lasso Regression	Copper	0.998	78.04	56.44	Underperformed. Linear model struggles with rapid mid-month shifts.
SVM (RBF Kernel)	Lead	0.825	92.32	69.24	Rejected. Struggles with volatile patterns.
MLP (Neural Network)	Copper	0.989	164.52	119.51	Rejected. High error variance; prone to overfitting noise.

R² Score Comparison Across Models

Step 8: Operational Sourcing Decision Framework

The final stage of the analytics lifecycle is deployment, converting predicted values into actionable procurement decisions.

📈 1. Sourcing Allocation: Spot vs. Monthly Average average

If the model predicts a price trend shift, procurement managers adjust allocations:

Model Predicts an Uptrend (>2% month-on-month increase): Secure inventory early. Allocate 75% of demand to forward average-pricing contracts on week one, hedging against price surges.
Model Predicts a Downtrend (>2% month-on-month decrease): Capitalize on falling prices. Shift 70% of demand to daily spot purchases, postponing buying to later weeks.

🇨🇳 2. Cross-Market Lag Indicator Arbitrage

LME prices correlate strongly with preceding moves in Asian markets due to timezone differences.

The Indicator: Shanghai Futures Exchange (SHFE) Copper prices correlate heavily (0.89) with LME Aluminum and Lead.
Operational Rule: Monitor SHFE prices at 10:00 AM IST daily. If SHFE rises significantly, buy LME spot inventory immediately to capitalize on lag-1 autocorrelation (coefficient: 0.82).

🛡️ 3. Volatility Cushion: Buffer Stock Calculations

Macroeconomic and geopolitical shocks (e.g., shipping delays or supply constraints) can cause pricing to deviate from historical models.

Operational Rule: Use the model's 300-day rolling standard deviation to determine inventory buffer sizes. During high-volatility months (typically August and September), increase physical buffer stock by 10% to mitigate supply chain disruption risks.