当前位置:文档之家› Detection Algorithms for Biosurveillance:A tutorial

Detection Algorithms for Biosurveillance:A tutorial

Detection Algorithms for Biosurveillance: A tutorial
Greg Cooper Bill Hogan Andrew Moore Robin Sabhnani Rich Tsui Mike Wagner Weng-Keen Wong
Professor Assistant Professor Professor and co-director of Auton Lab Senior Programmer, Auton Lab Computer Science and gfc@https://www.doczj.com/doc/8e5529835.html, RODS lab, U. Pitt RODS lab, U. Pitt Computer Science, Carnegie Mellon Robotics Institute, Carnegie Mellon wrh@https://www.doczj.com/doc/8e5529835.html, awm@https://www.doczj.com/doc/8e5529835.html, sabhnani@https://www.doczj.com/doc/8e5529835.html, tsui@https://www.doczj.com/doc/8e5529835.html, mmw@https://www.doczj.com/doc/8e5529835.html, wkw@https://www.doczj.com/doc/8e5529835.html,
Research Professor and RODS lab, U. Pitt associate Director of RODS lab Professor and Director of RODS lab RODS lab, U. Pitt
Graduate Student, Auton Lab Computer Science, Carnegie Mellon
Tutorial slides by Andrew Moore
Note to other teachers and users of these slides. Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew’s tutorials: https://www.doczj.com/doc/8e5529835.html,/~awm/tutorials . Comments and corrections gratefully received.
RODS: https://www.doczj.com/doc/8e5529835.html,/rods Auton Lab: https://www.doczj.com/doc/8e5529835.html,
Biosurveillance Detection Algorithms: Slide 1
Copyright ? 2002, 2003, Andrew Moore
Many Methods!
Method Time-weighted averaging Serfling ARIMA SARIMA + External Factors Univariate HMM Kalman Filter Recursive Least Squares Support Vector Machine Neural Nets Randomization Spatial Scan Statistics Bayesian Networks Contingency Tables Scalar Outlier (SQC) Multivariate Anomalies Change-point statistics FDR Tests WSARE (Recent patterns) PANDA (Causal Model) FLUMOD (space/Time HMM) Has Pitt/CMU tried it? Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Tried but little used Yes Yes Yes Yes Yes Tried and used Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes (w/ Howard Burkom) Yes Under development Multivariate signal tracking? Spatial ?
Yes Yes Yes Yes Yes Yes Yes
Yes
Yes
Yes Yes Yes Yes
Yes Yes Yes
Details of these methods and bibliography available from “Summary of Biosurveillance-relevant statistical and data mining technologies” by Moore, Cooper, Tsui and Wagner. Downloadable (PDF format) from https://www.doczj.com/doc/8e5529835.html,/~awm/biosurv-methods.pdf
Copyright ? 2002, 2003, Andrew Moore Biosurveillance Detection Algorithms: Slide 2
1

What you’ll learn about
? Noticing events in bioevent time series ? Tracking many series at once ? Detecting geographic hotspots ? Finding emerging new patterns
Copyright ? 2002, 2003, Andrew Moore Biosurveillance Detection Algorithms: Slide 3
What you’ll learn about
? Noticing events in bioevent time series ? Tracking many series at once ? Detecting geographic hotspots ? Finding emerging new patterns
Copyright ? 2002, 2003, Andrew Moore
These are all powerful statistical methods, which means they all have to have one thing in common…
Biosurveillance Detection Algorithms: Slide 4
2

What you’ll learn about
? Noticing events in bioevent time series ? Tracking many series at once ? Detecting geographic hotspots ? Finding emerging new patterns
Copyright ? 2002, 2003, Andrew Moore
These are all powerful statistical methods, which means they all have to have one thing in common… Boring Names.
Biosurveillance Detection Algorithms: Slide 5
What you’ll learn about
? Noticing events in bioevent time series ? Tracking many series at once ? Detecting geographic hotspots ? Finding emerging new patterns
WSARE
Copyright ? 2002, 2003, Andrew Moore
These are all powerful statistical methods, which means they all have to have one thing in common… Boring Names.
Univariate Anomaly Detection Multivariate Anomaly Detection
Spatial Scan Statistics
Biosurveillance Detection Algorithms: Slide 6
3

What you’ll learn about
? Noticing events in bioevent time series ? Tracking many series at once ? Detecting geographic hotspots ? Finding emerging new patterns
WSARE
Copyright ? 2002, 2003, Andrew Moore
Univariate Anomaly Detection Multivariate Anomaly Detection
Spatial Scan Statistics
Biosurveillance Detection Algorithms: Slide 7
Univariate Time Series
Signal
Time Example Signals:
? ? ? ? ?
Copyright ? 2002, 2003, Andrew Moore
Number of ED visits today Number of ED visits this hour Number of Respiratory Cases Today School absenteeism today Nyquil Sales today
Biosurveillance Detection Algorithms: Slide 8
4

(When) is there an anomaly?
Copyright ? 2002, 2003, Andrew Moore
Biosurveillance Detection Algorithms: Slide 9
(When) is there an anomaly?
This is a time series of counts of primary-physician visits in data from Norfolk in December 2001. I added a fake outbreak, starting at a certain date. Can you guess when?
Copyright ? 2002, 2003, Andrew Moore
Biosurveillance Detection Algorithms: Slide 10
5

(When) is there an anomaly?
This is a time series of counts of primary-physician visits in data from Norfolk in December 2001. I added a fake outbreak, starting at a certain date. Can you guess when? Here (much too high for a Friday)
(Ramp attack)
Copyright ? 2002, 2003, Andrew Moore
Biosurveillance Detection Algorithms: Slide 11
An easy case
Signal
Time
Dealt with by Statistical Quality Control Record the mean and standard deviation up the the current time. Signal an alarm if we go outside 3 sigmas
Copyright ? 2002, 2003, Andrew Moore
Biosurveillance Detection Algorithms: Slide 12
6

An easy case: Control Charts
Upper Safe Range
Signal
Mean
Time
Dealt with by Statistical Quality Control Record the mean and standard deviation up the the current time. Signal an alarm if we go outside 3 sigmas
Copyright ? 2002, 2003, Andrew Moore
Biosurveillance Detection Algorithms: Slide 13
Control Charts on the Norfolk Data
Alarm Level
Copyright ? 2002, 2003, Andrew Moore
Biosurveillance Detection Algorithms: Slide 14
7

Control Charts on the Norfolk Data
Alarm Level
Copyright ? 2002, 2003, Andrew Moore
Biosurveillance Detection Algorithms: Slide 15
Looking at changes from yesterday
Copyright ? 2002, 2003, Andrew Moore
Biosurveillance Detection Algorithms: Slide 16
8

Looking at changes from yesterday
Alarm Level
Copyright ? 2002, 2003, Andrew Moore
Biosurveillance Detection Algorithms: Slide 17
Looking at changes from yesterday
Alarm Level
Copyright ? 2002, 2003, Andrew Moore
Biosurveillance Detection Algorithms: Slide 18
9

We need a happy medium:
Control Chart: Too insensitive to recent changes Change from yesterday: Too sensitive to recent changes
Copyright ? 2002, 2003, Andrew Moore
Biosurveillance Detection Algorithms: Slide 19
Moving Average
Copyright ? 2002, 2003, Andrew Moore
Biosurveillance Detection Algorithms: Slide 20
10

Moving Average
Copyright ? 2002, 2003, Andrew Moore
Biosurveillance Detection Algorithms: Slide 21
Moving Average
Copyright ? 2002, 2003, Andrew Moore
Biosurveillance Detection Algorithms: Slide 22
11

Moving Average
we an ? c w is t ho out th u B r. ab ette itative b ks nt Loo e qua b
Copyright ? 2002, 2003, Andrew Moore Biosurveillance Detection Algorithms: Slide 23
Algorithm Performance
Allowing one False Alarm per TWO weeks…
Allowing one False Alarm per SIX weeks…
Copyright ? 2002, 2003, Andrew Moore
standard control chart using yesterday Moving Average 3 Moving Average 7 Moving Average 56 hours_of_daylight hours_of_daylight is_mon hours_of_daylight is_mon ... is_tue hours_of_daylight is_mon ... is_sat CUSUM sa-mav-1 sa-mav-7 sa-mav-14 sa-regress Cough with denominator Cough with MA
ct te de c k to tta ys p a Da am r d a of te n tec tio e ac s d Fr ike t sp etec d ck to tta ys p a Da am r d a of te n tec tio e ac s d Fr ike sp
0.39 0.14 0.36 0.58 0.54 0.58 0.7 0.72 0.77 0.45 0.86 0.87 0.86 0.73 0.78 0.65
Biosurveillance Detection Algorithms: Slide 24
3.47 3.83 3.45 2.79 2.72 2.73 2.25 1.83 2.11 2.03 1.88 1.28 1.27 1.76 2.15 2.78
0.22 0.1 0.33 0.51 0.44 0.43 0.57 0.57 0.59 0.15 0.74 0.83 0.82 0.67 0.59 0.57
4.13 4.7 3.79 3.31 3.54 3.9 3.12 3.16 3.26 3.55 2.73 1.87 1.62 2.21 2.41 3.24
12

Algorithm Performance
Allowing one False Alarm per TWO weeks…
Allowing one False Alarm per SIX weeks…
Copyright ? 2002, 2003, Andrew Moore
standard control chart using yesterday Moving Average 3 Moving Average 7 Moving Average 56 hours_of_daylight hours_of_daylight is_mon hours_of_daylight is_mon ... is_tue hours_of_daylight is_mon ... is_sat CUSUM sa-mav-1 sa-mav-7 sa-mav-14 sa-regress Cough with denominator Cough with MA
ct te de c k to tta ys p a Da am r d a of te n tec tio e ac s d Fr ike t sp etec d ck to tta ys p a Da am r d a of te n tec tio e ac s d Fr ike sp
0.39 0.14 0.36 0.58 0.54 0.58 0.7 0.72 0.77 0.45 0.86 0.87 0.86 0.73 0.78 0.65
Biosurveillance Detection Algorithms: Slide 25
3.47 3.83 3.45 2.79 2.72 2.73 2.25 1.83 2.11 2.03 1.88 1.28 1.27 1.76 2.15 2.78
0.22 0.1 0.33 0.51 0.44 0.43 0.57 0.57 0.59 0.15 0.74 0.83 0.82 0.67 0.59 0.57
4.13 4.7 3.79 3.31 3.54 3.9 3.12 3.16 3.26 3.55 2.73 1.87 1.62 2.21 2.41 3.24
Algorithm Performance
Allowing one False Alarm per TWO weeks…
Allowing one False Alarm per SIX weeks…
Copyright ? 2002, 2003, Andrew Moore
standard control chart using yesterday Moving Average 3 Moving Average 7 Moving Average 56 hours_of_daylight hours_of_daylight is_mon hours_of_daylight is_mon ... is_tue hours_of_daylight is_mon ... is_sat CUSUM sa-mav-1 sa-mav-7 sa-mav-14 sa-regress Cough with denominator Cough with MA
ct te de c k to tta ys p a Da am r d a of te n tec tio e ac s d Fr ike t sp etec d ck to tta ys p a Da am r d a of te n tec tio e ac s d Fr ike sp
0.39 0.14 0.36 0.58 0.54 0.58 0.7 0.72 0.77 0.45 0.86 0.87 0.86 0.73 0.78 0.65
Biosurveillance Detection Algorithms: Slide 26
3.47 3.83 3.45 2.79 2.72 2.73 2.25 1.83 2.11 2.03 1.88 1.28 1.27 1.76 2.15 2.78
0.22 0.1 0.33 0.51 0.44 0.43 0.57 0.57 0.59 0.15 0.74 0.83 0.82 0.67 0.59 0.57
4.13 4.7 3.79 3.31 3.54 3.9 3.12 3.16 3.26 3.55 2.73 1.87 1.62 2.21 2.41 3.24
13

Seasonal Effects
Signal
Time
Fit a periodic function (e.g. sine wave) to previous data. Predict today’s signal and 3-sigma confidence intervals. Signal an alarm if we’re off. Reduces False alarms from Natural outbreaks. Different times of year deserve different thresholds.
Copyright ? 2002, 2003, Andrew Moore Biosurveillance Detection Algorithms: Slide 27
Algorithm Performance
Allowing one False Alarm per TWO weeks…
Allowing one False Alarm per SIX weeks…
Copyright ? 2002, 2003, Andrew Moore
standard control chart using yesterday Moving Average 3 Moving Average 7 Moving Average 56 hours_of_daylight hours_of_daylight is_mon hours_of_daylight is_mon ... is_tue hours_of_daylight is_mon ... is_sat CUSUM sa-mav-1 sa-mav-7 sa-mav-14 sa-regress Cough with denominator Cough with MA
ct te de c k to tta ys p a Da am r d a of te n tec tio e ac s d Fr ike t sp etec d ck to tta ys p a Da am r d a of te n tec tio e ac s d Fr ike sp
0.39 0.14 0.36 0.58 0.54 0.58 0.7 0.72 0.77 0.45 0.86 0.87 0.86 0.73 0.78 0.65
Biosurveillance Detection Algorithms: Slide 28
3.47 3.83 3.45 2.79 2.72 2.73 2.25 1.83 2.11 2.03 1.88 1.28 1.27 1.76 2.15 2.78
0.22 0.1 0.33 0.51 0.44 0.43 0.57 0.57 0.59 0.15 0.74 0.83 0.82 0.67 0.59 0.57
4.13 4.7 3.79 3.31 3.54 3.9 3.12 3.16 3.26 3.55 2.73 1.87 1.62 2.21 2.41 3.24
14

Day-of-week effects
Fit a day-of-week component E[Signal] = a + deltaday E.G: deltamon= +5.42, deltatue= +2.20, deltawed= +3.33, deltathu= +3.10, deltafri= +4.02, deltasat= -12.2, deltasun= -23.42
A simple form of ANOVA
Copyright ? 2002, 2003, Andrew Moore Biosurveillance Detection Algorithms: Slide 29
Regression using Hours-in-day & IsMonday
Copyright ? 2002, 2003, Andrew Moore
Biosurveillance Detection Algorithms: Slide 30
15

Regression using Hours-in-day & IsMonday
Copyright ? 2002, 2003, Andrew Moore
Biosurveillance Detection Algorithms: Slide 31
Algorithm Performance
Allowing one False Alarm per TWO weeks…
Allowing one False Alarm per SIX weeks…
Copyright ? 2002, 2003, Andrew Moore
standard control chart using yesterday Moving Average 3 Moving Average 7 Moving Average 56 hours_of_daylight hours_of_daylight is_mon hours_of_daylight is_mon ... is_tue hours_of_daylight is_mon ... is_sat CUSUM sa-mav-1 sa-mav-7 sa-mav-14 sa-regress Cough with denominator Cough with MA
ct te de c k to tta ys p a Da am r d a of te n tec tio e ac s d Fr ike t sp etec d ck to tta ys p a Da am r d a of te n tec tio e ac s d Fr ike sp
0.39 0.14 0.36 0.58 0.54 0.58 0.7 0.72 0.77 0.45 0.86 0.87 0.86 0.73 0.78 0.65
Biosurveillance Detection Algorithms: Slide 32
3.47 3.83 3.45 2.79 2.72 2.73 2.25 1.83 2.11 2.03 1.88 1.28 1.27 1.76 2.15 2.78
0.22 0.1 0.33 0.51 0.44 0.43 0.57 0.57 0.59 0.15 0.74 0.83 0.82 0.67 0.59 0.57
4.13 4.7 3.79 3.31 3.54 3.9 3.12 3.16 3.26 3.55 2.73 1.87 1.62 2.21 2.41 3.24
16

Regression using Mon-Tue
Copyright ? 2002, 2003, Andrew Moore
Biosurveillance Detection Algorithms: Slide 33
Algorithm Performance
Allowing one False Alarm per TWO weeks…
Allowing one False Alarm per SIX weeks…
Copyright ? 2002, 2003, Andrew Moore
standard control chart using yesterday Moving Average 3 Moving Average 7 Moving Average 56 hours_of_daylight hours_of_daylight is_mon hours_of_daylight is_mon ... is_tue hours_of_daylight is_mon ... is_sat CUSUM sa-mav-1 sa-mav-7 sa-mav-14 sa-regress Cough with denominator Cough with MA
ct te de c k to tta ys p a Da am r d a of te n tec tio e ac s d Fr ike t sp etec d ck to tta ys p a Da am r d a of te n tec tio e ac s d Fr ike sp
0.39 0.14 0.36 0.58 0.54 0.58 0.7 0.72 0.77 0.45 0.86 0.87 0.86 0.73 0.78 0.65
Biosurveillance Detection Algorithms: Slide 34
3.47 3.83 3.45 2.79 2.72 2.73 2.25 1.83 2.11 2.03 1.88 1.28 1.27 1.76 2.15 2.78
0.22 0.1 0.33 0.51 0.44 0.43 0.57 0.57 0.59 0.15 0.74 0.83 0.82 0.67 0.59 0.57
4.13 4.7 3.79 3.31 3.54 3.9 3.12 3.16 3.26 3.55 2.73 1.87 1.62 2.21 2.41 3.24
17

CUSUM
? CUmulative SUM Statistics ? Keep a running sum of “surprises”: a sum of excesses each day over the prediction ? When this sum exceeds threshold, signal alarm and reset sum
Copyright ? 2002, 2003, Andrew Moore
Biosurveillance Detection Algorithms: Slide 35
CUSUM
Copyright ? 2002, 2003, Andrew Moore
Biosurveillance Detection Algorithms: Slide 36
18

CUSUM
Copyright ? 2002, 2003, Andrew Moore
Biosurveillance Detection Algorithms: Slide 37
Algorithm Performance
Allowing one False Alarm per TWO weeks…
Allowing one False Alarm per SIX weeks…
Copyright ? 2002, 2003, Andrew Moore
standard control chart using yesterday Moving Average 3 Moving Average 7 Moving Average 56 hours_of_daylight hours_of_daylight is_mon hours_of_daylight is_mon ... is_tue hours_of_daylight is_mon ... is_sat CUSUM sa-mav-1 sa-mav-7 sa-mav-14 sa-regress Cough with denominator Cough with MA
ct te de c k to tta ys p a Da am r d a of te n tec tio e ac s d Fr ike t sp etec d ck to tta ys p a Da am r d a of te n tec tio e ac s d Fr ike sp
0.39 0.14 0.36 0.58 0.54 0.58 0.7 0.72 0.77 0.45 0.86 0.87 0.86 0.73 0.78 0.65
Biosurveillance Detection Algorithms: Slide 38
3.47 3.83 3.45 2.79 2.72 2.73 2.25 1.83 2.11 2.03 1.88 1.28 1.27 1.76 2.15 2.78
0.22 0.1 0.33 0.51 0.44 0.43 0.57 0.57 0.59 0.15 0.74 0.83 0.82 0.67 0.59 0.57
4.13 4.7 3.79 3.31 3.54 3.9 3.12 3.16 3.26 3.55 2.73 1.87 1.62 2.21 2.41 3.24
19

The Sickness/Availability Model
Copyright ? 2002, 2003, Andrew Moore
Biosurveillance Detection Algorithms: Slide 39
The Sickness/Availability Model
Copyright ? 2002, 2003, Andrew Moore
Biosurveillance Detection Algorithms: Slide 40
20

相关主题
文本预览
相关文档 最新文档