Novelty Effect Detection: A Learning Guide
What You're About to Understand
After working through this guide, you'll be able to diagnose whether an A/B test result is inflated by novelty-driven behavior spikes — or whether that "declining treatment effect" you're staring at is actually a statistical illusion. You'll know which detection method to reach for in a given situation, why the most popular visualization for spotting novelty effects is dangerously misleading, and how to argue both sides of the "is novelty even real?" debate with practitioners who've spent decades in the field.
The One Idea That Unlocks Everything
The novelty effect is the sugar rush of product experimentation.
When you give a child candy, you get a burst of energy that doesn't represent their baseline activity level. If you measured their "productivity" during the sugar rush and projected it forward, you'd wildly overestimate what candy does for them long-term. The novelty effect works the same way: users encounter something new, their brain's dopamine system fires (literally — there's a dedicated neural circuit for this), they explore and engage more than they normally would, and then they settle back to whatever the change actually earns based on its real utility.
The trap is that most experiments measure during the sugar rush and call it the truth. The deeper trap is that sometimes what looks like a sugar rush wearing off is actually just a mathematical illusion created by how you drew the graph.
If you remember nothing else: the novelty effect is real behavior (not measurement noise), but it's detected incorrectly far more often than it actually occurs.
Learning Path
Step 1: The Foundation [Level 1]
Imagine you redesign the navigation menu on your app. You ship it to 50% of users as an A/B test. In week one, the treatment group clicks 40% more than control. By week three, the gap has shrunk to 8%. You ship the new nav expecting an 8% lift. Three months later, the real-world impact is... 8%. The system worked.
But here's what happened in between: that 40% initial spike wasn't the feature being great. It was returning users noticing the change and exploring it. They clicked around because the menu was different, not because it was better. That exploration phase — driven by curiosity, not utility — is the novelty effect.
The opposite exists too. Change aversion (sometimes called the primacy effect) is when users initially reject a change because it disrupts their habits. Facebook switching from chronological to algorithmic feeds initially decreased time-in-feed. Users hated it — at first. Then they adapted, and engagement climbed above the old baseline. Same phenomenon, opposite direction.
Both are "learning effects" — the transition period as users adjust to something new. The core pattern:
- Novelty effect: initial positive spike → decay → stabilization at true effect
- Change aversion: initial negative dip → recovery → stabilization at true effect
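The two trajectories can be sketched with a toy decay model. All parameters here (the +8% true lift, the +40% spike, the -10% dip, the 5-day half-life) are made up for illustration:

```python
import numpy as np

def daily_effect(day, true_effect, initial_effect, half_life=5.0):
    """Toy model: the daily treatment effect starts at an initial
    reaction and decays exponentially toward the true effect."""
    decay = 0.5 ** (day / half_life)
    return true_effect + (initial_effect - true_effect) * decay

days = np.arange(28)
# Novelty effect: +40% spike decaying toward a true +8% lift.
novelty = daily_effect(days, true_effect=0.08, initial_effect=0.40)
# Change aversion: -10% dip recovering toward the same true +8% lift.
aversion = daily_effect(days, true_effect=0.08, initial_effect=-0.10)

print(round(novelty[0], 2), round(novelty[-1], 2))    # 0.4 0.09
print(round(aversion[0], 2), round(aversion[-1], 2))  # -0.1 0.08
```

Both curves end in the same place; only the transition differs. That shared endpoint is the "stabilization at true effect" in the pattern above.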
Key Insight: The novelty effect requires contrast. Users need an established expectation ("the old way") to experience novelty ("the new way"). This is why it primarily affects returning users of products they use frequently. A first-time visitor to your site has nothing to contrast against — though they do carry expectations from similar products.
Check your understanding:
1. Your test shows a treatment effect of +25% in week 1 and +25% in week 3. Is this consistent with a novelty effect? Why or why not?
2. Why would a backend algorithm change (invisible to users) be unlikely to produce a novelty effect?
Step 2: The Mechanism [Level 2]
The novelty effect isn't a metaphor. It's a neurological event.
Your brain contains a region called the zona incerta with neurons that fire specifically in response to novel stimuli — not because the stimuli are rewarding, but because they're new. Separately, dopamine neurons fire on prediction error: the gap between what you expected and what you got. A redesigned navigation menu violates your prediction of what the app looks like, dopamine fires, and your attention is captured.
This is the exploration side of the exploration-exploitation tradeoff. In experiments where monkeys were given dopamine transporter blockers, they came to "optimistically value and over-select novel options relative to the best alternative." The brain is wired to investigate new things even at the cost of exploiting known-good options. From an evolutionary standpoint, this is adaptive — checking out a new berry bush might reveal a better food source.
With repeated exposure, the prediction error shrinks (the brain updates its model), dopamine response decreases, and attention allocation normalizes. The thing you explored now has to earn your engagement based on its actual utility. This process — habituation — is the biological clock behind the novelty effect's decay.
Worked example: Detecting novelty via new-vs-returning segmentation
You've run a test for 4 weeks. Overall treatment effect: +15%. Now you segment:
| Segment | Treatment effect |
|---|---|
| New users (joined during test) | +14% |
| Returning users (existed before test) | +18% week 1, +12% week 4 |
New users show a stable +14% throughout — they have no "old normal" to contrast against, so there's no novelty response. Returning users show a declining effect. The gap between returning users' early effect (+18%) and their late effect (+12%) is your novelty estimate. The new-user effect (+14%) is your best estimate of the "true" treatment effect.
This works because novelty requires prediction error, and prediction error requires a prior expectation. No prior experience → no prediction error → no novelty effect.
But it's not perfect. New users aren't truly naive. They carry expectations from similar products, from marketing, from culture. And they may differ from returning users in demographics and tech-savviness, confounding the comparison.
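The segmentation arithmetic above can be sketched in a few lines. The column names and counts here are made up to mirror the worked example, not a real dataset:

```python
import pandas as pd

# Hypothetical aggregates: treatment conversions per 100 control
# conversions, by segment and week (numbers mirror the table above).
df = pd.DataFrame({
    "segment":   ["new", "new", "returning", "returning"],
    "week":      [1, 4, 1, 4],
    "control":   [100, 100, 100, 100],
    "treatment": [114, 114, 118, 112],
})
df["lift"] = df["treatment"] / df["control"] - 1

ret = df[df["segment"] == "returning"].set_index("week")["lift"]
novelty_estimate = ret[1] - ret[4]                        # decaying component
true_effect = df[df["segment"] == "new"]["lift"].mean()   # stable component

print(round(novelty_estimate, 2), round(true_effect, 2))  # 0.06 0.14
```

The logic is the whole method: returning users' early-minus-late lift estimates the novelty component; new users' stable lift estimates the true effect.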
Key Insight: An open question remains about the directionality of the mechanism — whether novelty actively increases dopamine signaling or instead prevents the gradual reductions that occur with habituation. The distinction matters for modeling, but either way, the behavioral outcome (temporary engagement spike) is the same.
Check your understanding:
1. Why does the new-vs-returning user segmentation rely on the concept of prediction error?
2. A product is used once per year (like tax software). Would you expect a strong novelty effect from a UI redesign? Why?
Step 3: The Hard Parts [Level 3]
Here is where the simple "spike then decay" story breaks down in at least four important ways.
Hard Part 1: The Cumulative Chart Illusion
This might be the single most important thing in this entire guide. David Swinstead, after 20 years in A/B testing, identified that the most common "evidence" for novelty effects is an optical illusion.
When you plot cumulative treatment effects over time, you'll see a chart that starts volatile, then stabilizes. Teams interpret this as: "Look, the novelty wore off and the true effect emerged." But cumulative averages always stabilize as more data accumulates — it's the law of large numbers at work. The variance of a running mean shrinks like 1/n, so each new day's data has proportionally less impact on the curve. This happens regardless of whether the underlying daily effects are changing at all.
Swinstead demonstrated this with simulations: datasets containing no novelty effect at all produce cumulative charts that look exactly like "novelty wearing off." The fix is brutally simple: always examine daily treatment effects, not cumulative ones. If daily effects show a clear declining trend, you have a genuine novelty effect. If they're flat with steady noise throughout, you don't — the cumulative chart was lying to you.
Why do smart people keep falling for this? Confirmation bias. They learned about novelty effects, they expect to see them, and the cumulative chart appears to confirm. The human visual system is wired to find declining-then-stabilizing patterns meaningful. Overriding it requires deliberate analytical effort every time.
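A minimal simulation makes the illusion concrete (the +8% effect and noise level are arbitrary): the daily effects are flat with pure noise, yet the cumulative average "stabilizes" anyway.

```python
import numpy as np

rng = np.random.default_rng(0)

# 28 days of daily treatment effects: a FLAT +8% plus noise.
# There is no novelty effect anywhere in this data.
daily = 0.08 + rng.normal(0.0, 0.05, size=28)

# The cumulative running average -- what the misleading chart plots.
cumulative = np.cumsum(daily) / np.arange(1, 29)

# Day-to-day movement of the cumulative curve shrinks like 1/n,
# so the chart "converges" even though nothing is decaying.
early_moves = np.abs(np.diff(cumulative[:7])).mean()
late_moves = np.abs(np.diff(cumulative[-7:])).mean()
print(round(early_moves, 4), round(late_moves, 4))
```

Plotted, `cumulative` shows the classic volatile-then-stable shape while `daily` shows nothing but flat noise — exactly Swinstead's point.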
Hard Part 2: The U-Shaped Familiarization Curve
A 2022 gamification study found something the simple model doesn't predict: after the novelty spike decays (weeks 4-6), engagement rises again (weeks 6-10). This "familiarization effect" represents users who've habituated to the novelty but are now developing genuine competence and discovering real utility.
If this generalizes beyond gamification, it's transformative. The burn-in approach (discard early data) might also discard the beginning of the recovery. The "true long-term effect" might be higher than the post-novelty dip. Current correction methods could be systematically underestimating treatment effects.
Whether the U-curve generalizes is the single biggest open question in this domain.
Hard Part 3: The "Is It Even Real?" Debate
Ron Kohavi (Microsoft/Airbnb) and the Microsoft ExP team treat novelty effects as a frequent validity threat requiring systematic detection. David Swinstead argues they're "overstated, exaggerated, and scapegoated," with real evidence being "extremely rare."
The resolution is probably product-type dependent. High-frequency consumer apps (social media, search, email) — where users have strong habitual patterns — DO show novelty effects. Ecommerce, B2B, and low-frequency products likely show them rarely. The base rate across all A/B tests has never been systematically measured. No one knows the actual prevalence.
Hard Part 4: The Philosophical Problem
The standard framework assumes true effect = observed effect - novelty effect. But this requires two assumptions that deserve scrutiny:
- Stationarity: there IS a stable long-term effect to converge to. But what if behavior is inherently non-stationary?
- Separability: you CAN separate the novelty component from the real component. But the excitement of discovering a new feature has genuine value — it's part of the product experience.
Calling novelty an "effect" (implying contamination) rather than a "phase" (implying natural process) smuggles in a value judgment: only the steady state matters. But real products exist in time, and initial experiences have real value.
Check your understanding:
1. You're looking at a cumulative treatment effect chart that shows convergence after 2 weeks. What's the ONE thing you must do before concluding novelty has worn off?
2. Why might the U-shaped familiarization curve, if it generalizes, mean that current novelty correction methods are too aggressive?
The Mental Models Worth Keeping
1. Prediction Error = Novelty Fuel
The novelty effect runs on the gap between what the brain expected and what it encountered. No gap, no novelty. This is why new users, low-frequency users, and invisible backend changes are largely immune. Use it to pre-screen which tests are novelty-susceptible before you even run them.
2. Sugar Rush vs. Sustained Nutrition
Initial engagement tells you about curiosity; sustained engagement tells you about utility. When evaluating a test, always ask: "Is this metric measuring exploration or habitual use?" Click-through rate in week 1 is sugar rush data. Retention at week 4 is nutritional data.
3. The Cumulative Average Trap
Cumulative charts always converge — it's math, not signal. Any time someone shows you a cumulative metric chart as evidence of novelty, your first instinct should be: "Show me the daily data." This mental model prevents the most common false positive in novelty detection.
4. Two Competing Clocks
Novelty decays on a fast clock (days to weeks); familiarization grows on a slow clock (weeks to months). They produce a U-shaped curve when both are present. If you only measure during the fast clock's window, you'll miss the recovery. Think of it as: curiosity fades fast, competence builds slow.
5. The Contrast Requirement
Novelty effects need a "before" to contrast with an "after." The magnitude scales with: (a) how visible the change is, (b) how frequently the user interacts with the product, and (c) how strong their existing habits are. This three-factor model lets you estimate novelty risk before running a test.
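A hypothetical pre-screen based on this three-factor model. The gating and the "medium" tier are this sketch's own heuristic, not an established scoring rule:

```python
def novelty_risk(visible_change: bool, high_frequency: bool,
                 habitual_users: bool) -> str:
    """Heuristic pre-screen from the contrast requirement: novelty
    needs a perceivable change, plus frequency and habit strength."""
    if not visible_change:
        return "low"  # no perceivable contrast, no prediction error
    score = int(high_frequency) + int(habitual_users)
    return ["low", "medium", "high"][score]

print(novelty_risk(True, True, True))    # high: visible daily-use redesign
print(novelty_risk(False, True, True))   # low: invisible backend change
```

Used before a test launches, this flags which experiments deserve the daily-effects scrutiny described later.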
What Most People Get Wrong
1. "Novelty effects always create positive spikes"
- Why people believe it: The word "novelty" sounds positive — exciting, shiny, new.
- What's actually true: Change aversion creates equally real negative initial spikes. Returning users may reject a change that disrupts their workflow. The effect works in both directions.
- How to tell in the wild: If your treatment effect is worse in early days and improves over time, you may be seeing change aversion, not a failed feature.
2. "Cumulative charts showing stabilization prove novelty wore off"
- Why people believe it: The visual pattern (volatile → stable) matches the novelty narrative perfectly.
- What's actually true: Cumulative averages mathematically must stabilize. It's the 1/n effect, not evidence of anything.
- How to tell in the wild: Plot daily treatment effects. If they're flat with random noise throughout, there was no novelty effect — the cumulative chart was an illusion.
3. "Just run the test longer and the problem goes away"
- Why people believe it: If novelty decays over time, more time = less novelty. Logically sound.
- What's actually true: Longer tests introduce new confounds — seasonality, user composition shifts, competitor actions, external events. You're trading one validity threat for others.
- How to tell in the wild: If your treatment effect is shifting after week 4, ask whether anything else changed in the environment, not just whether novelty is decaying.
4. "The novelty effect is a major concern for all A/B tests"
- Why people believe it: It's taught as a universal validity threat.
- What's actually true: It primarily affects visible changes to products with habitual returning users. Backend changes, pricing tests, content variations, and low-frequency products have minimal novelty risk.
- How to tell in the wild: Ask three questions — Is the change visible? Do users interact frequently? Are they returning users? If any answer is "no," novelty risk is low.
5. "Statistical correction can remove novelty effects"
- Why people believe it: If we can model the decay, we should be able to subtract it.
- What's actually true: Analytics Toolkit explicitly states "novelty effects should NOT be corrected with statistical methods." The effect is real user behavior, not statistical noise. You can model it, but "removing" it means deciding that real user experiences don't count.
- How to tell in the wild: If someone presents a "novelty-corrected" treatment effect, ask what modeling assumptions they made about the decay function and whether those assumptions were validated.
The 5 Whys — Root Causes Worth Knowing
Chain 1: Why do A/B test results overestimate long-term impact?
Tests capture novelty spike → Users explore out of curiosity, not utility → Dopamine fires on novel stimuli → Novelty-seeking is evolutionary adaptation → Low-cost exploration could find better resources
- Level 2 deep: Evolutionary novelty-seeking is misaligned with product measurement because digital features exist in rapidly changing contexts where novelty response doesn't predict long-term utility
- Level 3 deep: The frequentist framework underlying A/B testing was designed for single-point-in-time measurements, not dynamic systems with time-varying treatment effects
Chain 2: Why do cumulative charts create the novelty illusion?
Cumulative averages converge as sample size increases → Each new day has less proportional impact (1/n) → The variance of the sample mean shrinks as σ²/n → This is math, not signal → Analysts interpret convergence as "stabilization" when it's actually "precision increasing"
- Level 2 deep: Practitioners persistently fall for this because it confirms the expected narrative — classic confirmation bias in data analysis
- Level 3 deep: Even when you know the math, the visual is compelling. The human visual system detects patterns, and a declining-then-stabilizing curve feels meaningful
Chain 3: Why is the burn-in approach fundamentally limited?
Choosing burn-in duration is arbitrary → Different user segments have different novelty durations → Discarding data reduces statistical power → The approach assumes a clean phase boundary that doesn't exist → The "stable phase" itself may be temporary
- Level 2 deep: Practitioners keep using it because simplicity beats correctness when experimentation is democratized
- Level 3 deep: The cost of a slightly wrong feature decision is small; the cost of requiring sophisticated statistics for every test is large. It's a rational organizational tradeoff
Chain 4: Why is the novelty effect especially problematic in digital products?
Digital products have habitual users who notice small changes → High-frequency usage creates strong baseline expectations → Any change contrasts with established patterns → Products are designed for habit formation → Habit-forming design is the dominant business model (attention economy)
- Level 2 deep: The attention economy amplifies novelty effects because constant feature shipping (to prevent user habituation) also contaminates measurement
- Level 3 deep: This creates a self-reinforcing loop: novelty inflates metrics → teams think more features = more value → they ship more → more novelty contamination → repeat
The Numbers That Matter
- 30% of a standard deviation — the average magnitude of Hawthorne/novelty effects in education research. That's roughly the difference between scoring at the 50th percentile and the 62nd. To put that in perspective, the most effective educational interventions produce about 40-50% of a standard deviation. A novelty effect can be as large as the real effect of a genuinely good intervention.
- 2-27% — the range of novelty bias in clinical trials (from a meta-analysis of 522 trials). Novel interventions systematically appear this much better than when the same intervention is no longer new. That's not a rounding error in medicine — it's the difference between a drug getting approved or not.
- 1.18x — how much more effective a medicine appears when it's novel vs. when it's tested against something newer. Same drug, different perception of novelty, 18% difference in apparent effectiveness.
- 75% abandonment at 4 weeks — the rate at which undergraduate students stopped wearing fitness trackers, versus roughly 30% overall (Gartner). Students with no prior tracking habit — all extrinsic motivation, no intrinsic — lose interest fastest. The novelty-to-utility transition is where most users fall off.
- 4 weeks — the typical onset of novelty decline in gamification, aligning with approximately 4 exposures to a weekly-frequency product. This is your rough minimum test duration for any change to a product used weekly.
- 8 weeks — the duration after which Hawthorne/novelty effects generally decay to negligible levels, and the threshold education researchers use for "reliable" study results. Most A/B tests run for 2 weeks. That's like measuring your marathon time by your first-mile split.
- 6-10 weeks — when the "familiarization recovery" begins in the U-curve model. If you stopped measuring at week 4 (when novelty bottoms out), you'd miss the entire rebound.
- 6% exaggeration (95% CI: 2-16%) — the amount chemotherapy treatment effects were inflated by novelty bias across 229 trials. In oncology, a 6% misestimate of drug effectiveness has life-or-death implications.
Where Smart People Disagree
Debate 1: Is the novelty effect common or rare in practice?
- Kohavi, Microsoft ExP, and Statsig argue it's a frequent validity threat needing systematic detection. Microsoft published formal detection methods (a DiD estimator) because they encounter it routinely in Edge and other products.
- Swinstead argues that after 20 years of A/B testing, finding real evidence is "extremely rare" and the concept is "scapegoated" — used to explain away inconvenient results.
- Unresolved because: no one has measured the base rate across a large set of experiments. Both sides are generalizing from their specific product domains. High-frequency consumer apps likely DO show it; low-frequency products likely don't.
Debate 2: Should you correct for novelty effects statistically?
- Analytics Toolkit says flatly: "novelty effects should NOT be corrected with statistical methods." The effect is real behavior, not measurement error. Examine the time series instead.
- Microsoft's Sadeghi et al. propose the DiD estimator specifically to estimate and remove the novelty component. Others advocate fitting decay curves and extrapolating the asymptotic effect.
- Unresolved because: it depends on the use case. For ship/no-ship decisions, qualitative understanding (time series analysis) suffices. For revenue forecasting, you may need a quantitative model. Neither side is wrong — they're answering different questions.
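As a toy sketch of the DiD idea — using the illustrative numbers from the Step 2 worked example; the actual estimator in Sadeghi et al. is considerably more involved:

```python
# Treatment lifts by segment and period; toy numbers mirroring the
# new-vs-returning worked example earlier in this guide.
lifts = {
    ("new", "early"):       0.14,
    ("new", "late"):        0.14,
    ("returning", "early"): 0.18,
    ("returning", "late"):  0.12,
}

# New users feel no novelty, so their trend anchors the counterfactual.
# DiD asks: how much more did the returning-user lift change over time
# than the new-user lift did?
did = ((lifts[("returning", "late")] - lifts[("returning", "early")])
       - (lifts[("new", "late")] - lifts[("new", "early")]))

print(round(did, 2))  # -0.06: about 6pp of the early lift was novelty
```

The sign convention is the point of contention: computing this number is easy, but subtracting it from the observed effect is exactly the "correction" Analytics Toolkit objects to.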
Debate 3: What do we do with the initial excitement?
- The standard framework treats novelty-driven engagement as contamination to be filtered out.
- The counter-argument: if a feature creates 2 weeks of genuine excitement, that excitement has real value. Products exist in time. The novelty framework implicitly says only the steady state matters — but who decided that?
- Unresolved because: it's a value judgment, not a technical question. The "right" answer depends on your product's business model and time horizon.
What You Don't Know Yet (And That's OK)
After absorbing this guide, you can detect novelty effects, avoid the most common false positives, and choose appropriate detection methods. Here's where your knowledge runs out:
- No one knows the true base rate. How often do novelty effects actually occur across all experiments? No systematic audit exists. Your estimates of novelty risk are based on heuristics, not data.
- The optimal burn-in period remains ad hoc. Conventions exist (1-2 weeks), but no principled method determines when novelty has dissipated for your specific change.
- The decay function debate is unresolved. Exponential? Power-law? U-shaped? The mathematical model you choose dramatically changes your long-term effect estimate, and no one has established which model is correct — it probably varies by context.
- Individual variation is unmeasured. Novelty-seeking is a real personality trait that varies enormously between people, but it's never been measured in A/B testing contexts. Your aggregate results blend high-novelty-seekers with low-novelty-seekers.
- Cross-experiment novelty contamination is unexplored. When companies run thousands of simultaneous experiments, the cumulative effect of many small changes may create a perpetual novelty state. The measurement implications are unknown.
- The U-curve's generalizability is unproven. The familiarization recovery has been demonstrated only in gamification. Whether it applies to product experimentation broadly is the most consequential open question in the field.
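To see why the unresolved decay-function choice matters in practice, here is a toy fit on synthetic data (all parameters invented): the same 14 days of daily lifts yield different long-run estimates depending on the decay shape you assume.

```python
import numpy as np

t = np.arange(1, 15, dtype=float)        # days 1-14 of observed lifts
obs = 0.08 + 0.30 * np.exp(-t / 3.0)     # synthetic truth: decays to +8%

def fit_asymptote(shape):
    """Least-squares fit of effect(t) = a + b * shape(t): grid-search
    the asymptote a, solving the amplitude b in closed form."""
    g = shape(t)
    best_err, best_a = np.inf, None
    for a in np.linspace(0.0, 0.2, 201):
        b = np.dot(obs - a, g) / np.dot(g, g)
        err = np.sum((obs - a - b * g) ** 2)
        if err < best_err:
            best_err, best_a = err, a
    return best_a, best_err

a_exp, err_exp = fit_asymptote(lambda x: np.exp(-x / 3.0))  # matched form
a_pow, err_pow = fit_asymptote(lambda x: x ** -1.0)         # misspecified

print(round(a_exp, 3))    # 0.08: recovers the true asymptote
print(err_exp < err_pow)  # True: the wrong decay shape fits worse
```

With the correct functional form the asymptote is recovered exactly; the misspecified power-law fit lands elsewhere — which is precisely why extrapolated "long-term effects" inherit the analyst's modeling assumption.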
Subtopics to Explore Next
1. Difference-in-Differences (DiD) Estimation
Why it's worth it: Unlocks the most statistically powerful method for separating novelty from genuine treatment effects, as formalized by Microsoft Research.
Start with: Sadeghi et al. (2021), "Novelty and Primacy: A Long-Term Estimator for Online Experiments" on arXiv (2102.12893)
Estimated depth: Medium (half day)
2. Interrupted Time Series Analysis
Why it's worth it: Gives you a general-purpose framework for detecting any change in trend or level after an intervention — applicable far beyond novelty effects.
Start with: Bernal et al. (2017) tutorial in International Journal of Epidemiology: "Interrupted time series regression for the evaluation of public health interventions"
Estimated depth: Medium (half day)
3. Bayesian Changepoint Detection
Why it's worth it: Enables automatic detection of when the novelty phase transitions to the stable phase, eliminating the need for arbitrary burn-in periods.
Start with: Adams & MacKay (2007), "Bayesian Online Changepoint Detection" — then the PMC paper on novelty detection in attentional habituation
Estimated depth: Deep (multi-day)
4. The Exploration-Exploitation Tradeoff in Recommendation Systems
Why it's worth it: Reveals how platforms like Netflix and Spotify operationalize the same novelty-vs-familiarity tension your experiments face — at algorithmic scale.
Start with: Search "multi-armed bandit recommendation systems exploration exploitation"
Estimated depth: Medium (half day)
5. Holdout Testing Design and Implementation
Why it's worth it: Holdout tests are the gold standard for post-launch novelty measurement. Understanding their design lets you build the infrastructure for ground-truth validation.
Start with: Eppo's blog post "Holdouts: Measuring Experiment Impact Accurately" + CXL's "Hold-Out Groups: Gold Standard or False Idol?"
Estimated depth: Surface (1-2 hours)
6. Heterogeneous Treatment Effects (Causal Forests)
Why it's worth it: Lets you move from "does novelty exist in this test?" to "which user segments are affected?" — a much more actionable question.
Start with: Athey & Imbens (2016), "Recursive Partitioning for Heterogeneous Causal Effects"
Estimated depth: Deep (multi-day)
7. Habituation in Neuroscience
Why it's worth it: Grounds your understanding of novelty decay in the actual biological mechanism, making your intuitions about duration and magnitude more reliable.
Start with: WashU neuroscience study on zona incerta novelty-seeking neurons + PMC paper on dopamine modulation of novelty-seeking behavior
Estimated depth: Surface (1-2 hours)
8. Time-Varying Treatment Effects in Econometrics
Why it's worth it: The novelty effect is a specific case of a broader problem — treatments whose effects change over time. Econometrics has 30+ years of methods for this.
Start with: Search "time-varying treatment effects difference in differences econometrics"
Estimated depth: Deep (multi-day)
Key Takeaways
- The most common "evidence" for novelty effects — cumulative charts stabilizing — is a mathematical certainty, not a signal. Always check daily data before concluding novelty exists.
- Novelty effects require contrast. No established user expectations → no novelty response. Pre-screen tests by asking: visible change? Habitual users? High frequency?
- The opposite of novelty is equally real. Change aversion creates negative initial dips. A treatment that looks like a failure in week 1 may be a winner by week 4.
- "Run the test longer" trades one validity threat for others. Seasonality, composition shifts, and external events all increase with duration. It's a tradeoff, not a solution.
- New-vs-returning user segmentation is the most accessible detection method and should be your default first check on any test involving visible UI changes.
- The U-shaped familiarization curve means we might be over-correcting. If users recover engagement after the novelty dip, discarding early data also discards the start of the recovery signal.
- Novelty detection itself is subject to analyst bias. Teams who expect novelty will choose methods (cumulative charts, short-period comparisons) that tend to confirm it.
- The concept of "novelty effect" is weaponized politically in both directions — to dismiss winning tests someone dislikes AND to justify shipping tests that showed only initial promise.
- Products designed for habit formation are the most susceptible. The attention economy's core business model (engagement loops, daily triggers) creates exactly the conditions where novelty effects thrive.
- The "feature factory" feedback loop is self-reinforcing. Novelty inflates metrics → teams believe more features = more value → they ship faster → more novelty contamination → metrics stay inflated.
- The brain has a dedicated novelty circuit (zona incerta) separate from the reward system. Novelty-seeking isn't "excitement about reward" — it's an independent cognitive drive. This means novelty effects exist even when the new thing isn't useful.
- No principled method exists for choosing the optimal burn-in period. The 1-2 week convention is organizational habit, not science. Treat it as a starting heuristic, not a solution.
- In clinical trials, novelty bias inflates apparent treatment effects by 2-27%. This isn't just a product experimentation problem — it affects life-or-death medical decisions.
- The ratio of extrinsic-to-intrinsic motivation at adoption predicts novelty decay magnitude. Products adopted for their own novelty (fitness trackers bought on impulse) decay harder than products adopted as tools for existing goals.
Sources Used in This Research
Primary Research
- Sadeghi et al. (2021/2022), "Novelty and Primacy: A Long-Term Estimator for Online Experiments" — Microsoft Research, arXiv & Technometrics
- Springer/IJETHE (2022), "Gamification suffers from novelty but benefits from familiarization" — 14-week longitudinal study establishing the U-curve
- JAMIA Open (2019), "Beyond novelty effect: motivation for long-term activity tracker use"
- PMC (2018), "Dopamine Modulates Novelty Seeking Behavior During Decision Making"
- WashU Neuroscience (2021), study identifying zona incerta as novelty-seeking brain region
- PMC (2013), "Novelty detection in long-term attentional habituation using Bayesian changepoint"
- Bernal et al. (2017), "Interrupted time series regression tutorial" — International Journal of Epidemiology
- PMC (2021), "Three Statistical Approaches for Assessment of Intervention Effects"
- American Statistician (2023), "Statistical Challenges in Online Controlled Experiments"
- ResearchGate (2019), "Understanding the Novelty Effect: How long should an Experimental Intervention Last?"
- Catalog of Bias, "Novelty bias" — meta-analyses of clinical trial novelty effects (522 trials)
Expert Commentary
- David Swinstead, "The Novelty Effect Myth in AB Testing" — the skeptic's case
- Statsig (two articles), "Novelty effects: Everything you need to know" / "Why features get boosts"
- Ben Staples (Medium/Geek Culture), "The Novelty Effect: An Important Factor in A/B Tests"
- Kai Huang (Medium), "The Most Common Pitfall in Product Experiments: Novelty Effect"
- Mark Eltsefon (Medium), "Don't be afraid to run into novelty effect"
- Eppo, "Holdouts: Measuring Experiment Impact Accurately"
- CXL, "Hold-Out Groups: Gold Standard or False Idol?"
- SplitBase, "5 Validity Threats That Will Make Your A/B Tests Useless"
- Invesp, "Validity Threats to Your AB Test"
- Dirk Elston / JAAD (2021), "The novelty effect" (Letter from the Editor)
Good Journalism
- Clive Thompson (Medium/Message), "The Novelty Effect" — on high-tech tools and habituation
- Gartner (2016), press release on wearable device abandonment rates
Reference
- Ron Kohavi, "Trustworthy Online Controlled Experiments" (2020) / Experiment Guide
- Analytics Toolkit, glossary entry on "Novelty Effect"
- Wikipedia, "Novelty effect"
- DataCamp, "Novelty effects detection (Python)" course material
- scikit-learn documentation, "Novelty and Outlier Detection"