Treatment Personalisation
Treatment personalisation is the practice of assigning different interventions, experiences, or variants to different users based on their observed characteristics and learned response patterns.
Key points
- Yildirim frames contextual bandits as one-step reinforcement-learning methods for dynamic treatment personalisation: the system adjusts traffic based on which treatment works for whom [src-021].
- The distinction from a standard A/B test is adaptivity. A/B tests allocate traffic statically; multi-armed bandits adapt traffic overall; contextual bandits adapt traffic by user context [src-021].
- Running a separate bandit for every context segment can work for tiny context spaces, but it breaks down as the number of user characteristics grows and segments multiply combinatorially. Contextual bandits instead use models to share information across contexts rather than estimating every segment independently [src-021].
- Uplift modelling and contextual bandits both target personalisation, but contextual bandits personalise on the fly rather than waiting for an A/B test to finish and an uplift model to be fitted afterwards [src-021].
- Hightouch adds the marketing-operations version of the same problem: true personalisation means deciding content, channel, timing, frequency, and offer for each customer based on their current context, not just their static segment [src-023].
- The personalisation gap appears when teams keep adding smaller segments and more journey rules but still cannot adapt to individual intent, urgency, and behaviour at scale [src-023].
- Hightouch’s reinforcement-learning framing adds the feedback loop: individual customer responses teach the system which offers, send times, channels, and message types work for that person and for similar customers [src-024].
- Hightouch’s contextual-bandit example contrasts Sarah and Marcus: one customer responds to exclusive previews on Saturday mornings, while another responds to discount offers on Wednesday evenings, so the same campaign executes differently for each customer [src-026].
- The benefit is precision over approximation: contextual bandits learn individual patterns directly rather than repeatedly shrinking segments and assuming everyone inside a group behaves the same [src-026].
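The adaptive loop these points describe can be sketched as a minimal epsilon-greedy contextual bandit: a linear reward model per arm scores each treatment against the user's context, and each observed response updates the chosen arm's model, so the same campaign resolves to different treatments for different customers. This is an illustrative sketch, not the method described in the cited sources: the class name, context features, arm labels, and response rates are all invented, and a production system would use a principled exploration strategy such as Thompson sampling or LinUCB.

```python
import random

class EpsilonGreedyLinearBandit:
    """Minimal contextual bandit: one linear reward model per arm (treatment).

    The linear models generalise across contexts, so the policy does not need
    an independent estimate for every user segment.
    """

    def __init__(self, n_arms, n_features, epsilon=0.1, lr=0.05):
        self.epsilon = epsilon
        self.lr = lr
        self.weights = [[0.0] * n_features for _ in range(n_arms)]

    def predict(self, arm, context):
        # Estimated reward for showing this arm in this context.
        return sum(w * x for w, x in zip(self.weights[arm], context))

    def choose(self, context):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if random.random() < self.epsilon:
            return random.randrange(len(self.weights))
        scores = [self.predict(a, context) for a in range(len(self.weights))]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, context, reward):
        # One SGD step on squared error: each observed response nudges the
        # chosen arm's model, closing the feedback loop.
        error = reward - self.predict(arm, context)
        self.weights[arm] = [w + self.lr * error * x
                             for w, x in zip(self.weights[arm], context)]

# Hypothetical simulation (names, features, and response rates are invented):
# arm 0 = "exclusive preview", arm 1 = "discount offer"; context features are
# [bias, opens_on_weekends, discount_affinity].
random.seed(0)
bandit = EpsilonGreedyLinearBandit(n_arms=2, n_features=3)
SARAH_CTX, MARCUS_CTX = [1.0, 1.0, 0.0], [1.0, 0.0, 1.0]
for _ in range(2000):
    context = SARAH_CTX if random.random() < 0.5 else MARCUS_CTX
    arm = bandit.choose(context)
    # Simulated ground truth: previews convert the weekend profile,
    # discounts convert the discount-affine profile.
    p = 0.6 if (arm == 0) == (context is SARAH_CTX) else 0.1
    bandit.update(arm, context, 1.0 if random.random() < p else 0.0)

# The same campaign now resolves to a different treatment per customer.
best_for_sarah = max((0, 1), key=lambda a: bandit.predict(a, SARAH_CTX))
best_for_marcus = max((0, 1), key=lambda a: bandit.predict(a, MARCUS_CTX))
```

Because the two weight vectors are shared across all contexts, every response improves the estimate for every similar customer, which is the model-based sharing the segment-enumeration bullet contrasts with running one bandit per segment.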
Related entities
_(none yet)_
Related concepts
- Contextual Bandits
- Exploration-Exploitation Trade-off
- Offline Policy Evaluation
- AI Decisioning
- Personalisation Gap
- Reinforcement Learning for Marketing
- Agentic Marketing
- Customer Feature Matrix
Source references
- [src-021] Ugur Yildirim — “An Overview of Contextual Bandits” (2024-02-02)
- [src-023] Hightouch — “Under the hood of AI Decisioning, part one: Overcoming the personalization gap”
- [src-024] Hightouch — “Under the hood of AI Decisioning, part two: Reinforcement learning”
- [src-026] Hightouch — “Under the hood of AI Decisioning, part four: Contextual bandits”