Model documentation

Crime forecasting model methodology

This page explains how an XGBoost-style public safety forecasting workflow can work, how this demonstration model approximates that logic, and which risks need governance before any system is treated as decision support.

Part 1

What XGBoost adds to crime forecasting

XGBoost stands for Extreme Gradient Boosting. It is a supervised machine-learning method that builds many decision trees in sequence, where each new tree tries to correct errors made by the previous ones.
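
The sequential error-correcting idea can be sketched in a few lines. This is a minimal illustration of plain gradient boosting for squared error using one-split trees (stumps), not XGBoost itself, which adds regularization and other refinements. The function names, the learning rate, and the toy data are assumptions made for this example.

```python
from statistics import mean

def fit_stump(xs, residuals):
    """Fit a one-split tree: pick the threshold that minimizes squared error."""
    thresholds = sorted(set(xs))[:-1]  # splitting at the max would leave the right side empty
    if not thresholds:
        constant = mean(residuals)
        return xs[0], constant, constant
    best = None
    for threshold in thresholds:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value, right_value = mean(left), mean(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1], best[2], best[3]

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value

def fit_boosted(xs, ys, n_trees=20, learning_rate=0.3):
    """Start from the mean, then let each stump fit the current residuals."""
    base = mean(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return {"base": base, "learning_rate": learning_rate, "stumps": stumps}

def predict_boosted(model, x):
    correction = sum(predict_stump(s, x) for s in model["stumps"])
    return model["base"] + model["learning_rate"] * correction

# Toy data (invented for illustration): hour of day vs. incident count.
hours = [0, 4, 8, 12, 16, 20]
counts = [2, 1, 1, 3, 5, 8]
model = fit_boosted(hours, counts)
```

Each stump is fitted to what the earlier stumps still get wrong, so the training error shrinks as trees are added. The learning rate controls how much of each correction is kept.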

For forecasting work, XGBoost is often used when structured data contains mixed feature types, nonlinear relationships, threshold effects, and interactions between variables such as time, weather, land use, and area characteristics.

  • It handles structured tabular data well.
  • It can capture interactions such as weekend evening + nightlife area + rain.
  • It usually performs well without feature scaling, because tree splits depend only on the ordering of feature values.
  • It can be tuned for regression tasks such as hourly incident-count prediction.

Part 2

How it fits a public safety demo

In a full machine-learning version, the target could be incident_count for one sample area at one hour. Candidate features would include:

  • area type, density, population, nightlife, student, and vulnerability markers
  • hour of day, day of week, weekend/night indicators, and seasonality
  • weather context and persistence blocks
  • historical incident patterns from past area-hour combinations
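
As a rough sketch of how one area-hour row might be assembled from the candidate features above: the field names, the `area` dictionary layout, and the 22:00 to 06:00 night window are all assumptions chosen for illustration, not settings taken from the demo.

```python
from datetime import datetime

def build_features(area, timestamp, weather):
    """Build one feature row for a given area and hour (hypothetical schema)."""
    hour = timestamp.hour
    weekday = timestamp.weekday()  # Monday = 0, Sunday = 6
    return {
        "area_type": area["area_type"],
        "population": area["population"],
        "nightlife": int(area["nightlife"]),
        "hour": hour,
        "weekday": weekday,
        "is_weekend": int(weekday >= 5),
        "is_night": int(hour >= 22 or hour < 6),  # assumed night window
        "month": timestamp.month,                 # coarse seasonality marker
        "weather": weather,
    }

# Example area record, invented for illustration.
sample_area = {"area_type": "mixed_use", "population": 12000, "nightlife": True}
row = build_features(sample_area, datetime(2024, 6, 8, 23), "rain")
```

Historical incident patterns would be joined onto rows like this as additional columns, keyed by area and hour.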

The application layer would then convert predicted incident counts into map colors, patrol-planning signals, risk bands, and short operational summaries.

Prototype model

Parameters used in this crime forecasting demo

The active map uses a deterministic, readable forecast engine inspired by an ML workflow. It blends historical area-hour behavior, sample-wide temporal patterns, weather context, and a small motion wave, so the logic stays open to inspection rather than hidden inside a black box.

Component | Current setting | What it does
Forecast horizon | 48 hours | Builds an hourly outlook for the next two days.
Area exact history weight | 0.55 | Uses the same area + weekday + hour pattern as the main anchor.
Area hour fallback weight | 0.18 | Falls back to area + hour behavior across all weekdays.
Area day fallback weight | 0.12 | Adds area + weekday behavior independent of exact hour.
Area mean weight | 0.10 | Keeps each area tied to its overall historical baseline.
Shared temporal bridge weight | 0.05 | Injects sample-wide time-of-day pressure into the area baseline.
Temporal pulse clamp | 0.70 to 1.34 | Prevents timing effects from becoming unrealistically large.
Weather factors | clear 1.04, rain 0.92, snow 0.82, storm 0.88, extremes 0.90 | Modulates hourly incident expectations by weather state.
Weather outlook logic | Mostly clear, with rain blocks lasting 3 to 6 hours | Keeps the short forecast visually stable and easy to follow.
Motion wave | 1 +/- 0.05 sinusoidal variation | Adds slight hour-to-hour movement so adjacent hours are not identical.
Series smoothing | 0.22 previous, 0.56 current, 0.22 next | Smooths the hourly forecast to avoid harsh spikes between neighboring hours.
Risk score formula | 0.68 incident intensity + 0.32 per-capita pressure | Converts forecast counts into a 1 to 10 visible risk score.
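
The five history weights sum to 1.00, so the baseline is a weighted average of its components. The sketch below shows one way the listed settings could combine. The numeric values come from the table, but how they are composed is an assumption: here the clamped pulse, weather factor, and motion wave are applied as multipliers, the wave period is set to 12 hours, and the smoothing reuses the current value at the series edges. All function names are hypothetical.

```python
import math

WEIGHTS = {"exact": 0.55, "area_hour": 0.18, "area_day": 0.12,
           "area_mean": 0.10, "shared": 0.05}
WEATHER_FACTORS = {"clear": 1.04, "rain": 0.92, "snow": 0.82,
                   "storm": 0.88, "extreme": 0.90}
PULSE_MIN, PULSE_MAX = 0.70, 1.34

def blend_baseline(components):
    """Weighted average of the five historical components."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)

def clamp_pulse(pulse):
    """Keep the timing multiplier inside the configured range."""
    return max(PULSE_MIN, min(PULSE_MAX, pulse))

def motion_wave(step, period_hours=12.0):
    """1 +/- 0.05 sinusoidal multiplier; the period is an assumption."""
    return 1.0 + 0.05 * math.sin(2 * math.pi * step / period_hours)

def hourly_forecast(components, pulse, weather, step):
    """Assumed composition: baseline times pulse, weather, and wave."""
    return (blend_baseline(components) * clamp_pulse(pulse)
            * WEATHER_FACTORS[weather] * motion_wave(step))

def smooth(series):
    """0.22 / 0.56 / 0.22 kernel; edge hours reuse their own value (assumption)."""
    smoothed = []
    for i, current in enumerate(series):
        previous = series[i - 1] if i > 0 else current
        following = series[i + 1] if i < len(series) - 1 else current
        smoothed.append(0.22 * previous + 0.56 * current + 0.22 * following)
    return smoothed
```

A 48-hour outlook would call `hourly_forecast` once per step and pass the resulting list through `smooth`.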

Operational thresholds

Risk-to-patrol strategy bands

  • Risk 8-10: visible patrol presence and fast-response positioning
  • Risk 6-7: targeted patrol pass during the higher-demand window
  • Risk 1-5: routine coverage with watchlist awareness
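
A minimal sketch of how the risk score formula and the bands above might fit together. The 0.68 and 0.32 weights and the band boundaries come from this page. The assumption that both inputs are already normalized to the range 0 to 1, and the linear mapping onto 1 to 10, are illustrative choices, as are the function names.

```python
def risk_score(incident_intensity, per_capita_pressure):
    """Blend two normalized inputs (assumed 0..1) into a 1..10 score."""
    blended = 0.68 * incident_intensity + 0.32 * per_capita_pressure
    blended = max(0.0, min(1.0, blended))  # guard against out-of-range inputs
    return round(1 + 9 * blended)          # assumed linear mapping to 1..10

def patrol_band(score):
    """Map a 1..10 risk score onto the three strategy bands."""
    if score >= 8:
        return "visible patrol presence and fast-response positioning"
    if score >= 6:
        return "targeted patrol pass during the higher-demand window"
    return "routine coverage with watchlist awareness"
```

Keeping the score and the band lookup as separate functions makes each threshold easy to audit and change on its own.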

Future upgrade path

What a production public safety model would add

  • real model training and validation on rolling historical windows
  • feature importance and SHAP-style interpretability layers
  • separate calibration of prediction intervals and rare-event handling
  • automatic refresh from newly ingested weather and incident feeds
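
Validation on rolling historical windows could look roughly like the sketch below: train on one window, test on the hours immediately after it, then slide forward. The function name and window sizes are assumptions for illustration.

```python
def rolling_splits(n_samples, train_size, test_size, step=None):
    """Yield (train_indices, test_indices) pairs that move forward in time."""
    step = step or test_size
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += step

# Example: 10 hourly samples, train on 4, test on the next 2.
splits = list(rolling_splits(10, train_size=4, test_size=2))
```

Every test window lies strictly after its training window, so the model is never scored on data older than what it was trained on.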

Next-generation concept

How the forecasting system could improve

A stronger future version would go beyond this area-level prototype and evolve into a richer forecasting service. The main upgrade path is to add more informative features, more granular spatial modeling, live API integrations, and a stronger governance layer around the map.

1. Richer feature engineering

The next model should ingest more predictive features. Incident category detail, such as whether an event is traffic-related, disturbance-related, or tied to another public-order type, would let the model distinguish between very different demand patterns.

  • incident category and subcategory
  • traffic-specific versus public-order-specific signals
  • event calendars, paydays, school cycles, and nightlife seasonality
  • lagged history, persistence features, and confidence intervals
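
Lagged history features can be sketched as simple look-backs over an hourly count series. The lags chosen here (1 hour, 24 hours, 168 hours) and the function name are assumptions. Hours without enough history get `None` rather than a guessed value.

```python
def lag_features(counts, lags=(1, 24, 168)):
    """For each hour, look up the count observed `lag` hours earlier."""
    rows = []
    for i in range(len(counts)):
        row = {f"lag_{lag}": (counts[i - lag] if i >= lag else None)
               for lag in lags}
        rows.append(row)
    return rows

# Example with an invented series of 30 hourly counts.
rows = lag_features(list(range(30)))
```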

2. Finer GIS and spatial resolution

Instead of forecasting only by broad sample area, a real implementation could use a much finer GIS grid, for example a 100 x 100 meter lattice. That would reveal hot corridors, transport routes, venue clusters, and recurring micro-locations.

  • grid-based aggregation instead of area-only aggregation
  • street-network, transport-node, and land-use context
  • clearer separation between corridor effects and broad area averages
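
Grid-based aggregation can be sketched as follows, assuming incident locations are already in a projected coordinate system measured in meters (raw latitude and longitude would need projecting first). The function names and sample points are hypothetical.

```python
import math
from collections import Counter

CELL_SIZE_M = 100.0  # grid cell edge length in meters

def grid_cell(x_m, y_m, cell_size=CELL_SIZE_M):
    """Return the (column, row) index of the cell containing a point."""
    return (math.floor(x_m / cell_size), math.floor(y_m / cell_size))

def aggregate_by_cell(incidents):
    """Count incidents per grid cell from (x_m, y_m) pairs."""
    counts = Counter()
    for x_m, y_m in incidents:
        counts[grid_cell(x_m, y_m)] += 1
    return counts

# Invented sample points in meters.
cell_counts = aggregate_by_cell([(10.0, 10.0), (90.0, 90.0), (150.0, 10.0)])
```

Using `math.floor` rather than `int` keeps points with negative coordinates in the correct cell.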

3. Live APIs and refreshed predictions

A fuller application could communicate with weather services through APIs and use live weather feeds when computing each new hourly forecast. In an operational ecosystem, it could also ingest carefully governed aggregated incident counts and refresh predictions automatically.

  • live weather API integration
  • aggregated incident feeds with privacy and governance controls
  • hourly refresh cycles for every area and forecast step

4. Interactive product layer

The interactive map is useful because it makes assumptions visible. A stronger version would keep the heatmap, hourly slider, numerical forecast, and risk-score overlay, but drive them from a validated model with uncertainty estimates and visible audit trails.

Users should be able to inspect each area hour by hour, read why a signal changed, and see when the model is uncertain enough that no action should be taken.
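
One way to express "too uncertain to act" is to compare the width of a prediction interval with the forecast itself. The sketch below assumes a validated model supplies lower and upper bounds. The relative-width threshold of 1.0, the function name, and the returned labels are placeholders, not recommended values.

```python
def action_signal(forecast, lower, upper, max_relative_width=1.0):
    """Suppress recommendations when the interval is wide relative to the forecast."""
    if forecast <= 0:
        return "no action: forecast is zero"
    relative_width = (upper - lower) / forecast
    if relative_width > max_relative_width:
        return "no action: too uncertain"
    return "eligible for review"
```

For example, a forecast of 4 incidents with bounds of 3 and 5 stays eligible for review, while the same forecast with bounds of 0 and 12 is suppressed.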

5. Scalable service concept

The same logic could be scaled from a small pilot to a larger regional service, but only if the data pipeline, monitoring, governance, and public accountability mechanisms scale with it.

A reusable plugin could embed the forecast map into a local website or internal dashboard, but the model should remain auditable rather than becoming a hidden score generator.

6. Operating model and governance

Any real deployment would need a clear ownership model, data-sharing rules, validation cadence, appeal process, and shutdown criteria. The system should define who is allowed to act on a forecast and what evidence is required before a recommendation changes patrol behavior.

Without those safeguards, a forecasting interface can convert uncertain data into confident-looking recommendations and create harmful feedback loops.