B<rrdtool> B<create> I<filename>
S<[B<--start>|B<-b> I<start time>]>
S<[B<--step>|B<-s> I<step>]>
-S<[B<DS:>I<ds-name>B<:>I<DST>B<:>I<heartbeat>B<:>I<min>B<:>I<max>]>
-S<[B<RRA:>I<CF>B<:>I<xff>B<:>I<steps>B<:>I<rows>]>
+S<[B<DS:>I<ds-name>B<:>I<DST>B<:>I<dst arguments>]>
+S<[B<RRA:>I<CF>B<:>I<cf arguments>]>
=head1 DESCRIPTION
Specifies the base interval in seconds with which data will be fed
into the B<RRD>.
-=item B<DS:>I<ds-name>B<:>I<DST>B<:>I<heartbeat>B<:>I<min>B<:>I<max>
+=item B<DS:>I<ds-name>B<:>I<DST>B<:>I<dst arguments>
A single B<RRD> can accept input from several data sources (B<DS>).
(e.g. Incoming and Outgoing traffic on a specific communication
source from an B<RRD>. A I<ds-name> must be 1 to 19 characters long in
the characters [a-zA-Z0-9_].
-I<DST> defines the Data Source Type. See the section on "How to Measure" below for further insight.
-The Datasource Type must be onw of the following:
+I<DST> defines the Data Source Type. The remaining arguments of a
+data source entry depend upon the data source type. For GAUGE, COUNTER,
+DERIVE, and ABSOLUTE the format for a data source entry is:
+
+B<DS:>I<ds-name>B<:>I<GAUGE | COUNTER | DERIVE | ABSOLUTE>B<:>I<heartbeat>B<:>I<min>B<:>I<max>
+
+For COMPUTE data sources, the format is:
+
+B<DS:>I<ds-name>B<:>I<COMPUTE>B<:>I<rpn-expression>
+
+To decide on a data source type, review the definitions that follow.
+Consult the section on "HOW TO MEASURE" for further insight.
=over 4
overflow checks. So if your counter does not reset at 32 or 64 bit you
might want to use DERIVE and combine it with a MIN value of 0.
+=over
+
+=item NOTE on COUNTER vs DERIVE
+
+by Don Baarda E<lt>don.baarda@baesystems.comE<gt>
+
+If you cannot tolerate ever mistaking the occasional counter reset for a
+legitimate counter wrap, and would prefer "Unknowns" for all legitimate
+counter wraps and resets, always use DERIVE with min=0. Otherwise, using
+COUNTER with a suitable max will return correct values for all legitimate
+counter wraps, mark some counter resets as "Unknown", but can mistake some
+counter resets for a legitimate counter wrap.
+
+For a 5 minute step and 32-bit counter, the probability of mistaking a
+counter reset for a legitimate wrap is arguably about 0.8% per 1Mbps of
+maximum bandwidth. Note that this equates to 80% for 100Mbps interfaces, so
+for high bandwidth interfaces and a 32bit counter, DERIVE with min=0 is
+probably preferable. If you are using a 64bit counter, just about any max
+setting will eliminate the possibility of mistaking a reset for a counter
+wrap.
+
+=back
+
=item B<ABSOLUTE>
is for counters which get reset upon reading. This is used for fast counters
next overflow. Another usage is for things you count like number of messages
since the last update.
+=item B<COMPUTE>
+
+is for storing the result of a formula applied to other data sources in
+the B<RRD>. This data source is not supplied a value on update, but rather
+its Primary Data Points (PDPs) are computed from the PDPs of the data sources
+according to the rpn-expression that defines the formula. Consolidation
+functions are then applied normally to the PDPs of the COMPUTE data source
+(that is the rpn-expression is only applied to generate PDPs). In database
+software, these are referred to as "virtual" or "computed" columns.
+
=back
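The difference between B<COUNTER> and B<DERIVE> with min=0 when a counter
value drops can be sketched in Python. This is a simplified illustration and
not B<RRDtool>'s own code: the function names are hypothetical, C<None>
stands for *UNKNOWN*, and the wrap correction assumes that a drop means the
counter passed 2**32 (or, failing that, 2**64) exactly once.

```python
from typing import Optional

WRAP_32 = 2 ** 32
WRAP_64 = 2 ** 64


def counter_rate(prev: int, cur: int, seconds: float,
                 max_rate: Optional[float] = None) -> Optional[float]:
    """COUNTER-style rate: a drop is treated as a counter wrap."""
    delta = cur - prev
    if delta < 0:
        # Assume a 32-bit wrap first, then fall back to a 64-bit wrap.
        delta += WRAP_32
        if delta < 0:
            delta += WRAP_64 - WRAP_32
    rate = delta / seconds
    if max_rate is not None and rate > max_rate:
        return None  # implausible rate, most likely a reset
    return rate


def derive_rate(prev: int, cur: int, seconds: float,
                min_rate: Optional[float] = 0.0) -> Optional[float]:
    """DERIVE-style rate: no wrap handling, negative rates rejected by min."""
    rate = (cur - prev) / seconds
    if min_rate is not None and rate < min_rate:
        return None  # both wraps and resets end up as unknown
    return rate
```

With these definitions, a 32-bit counter going from 4294967000 to 704 is read
by C<counter_rate> as 1000 units of traffic. A restart from 4000000000 to 500
is also read as a wrap unless the resulting rate exceeds C<max_rate>, while
C<derive_rate> returns C<None> in both cases.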
I<heartbeat> defines the maximum number of seconds that may pass
always set the min and/or max properties. This will help RRDtool in
doing a simple sanity check on the data supplied when running update.>
-=item B<RRA:>I<CF>B<:>I<xff>B<:>I<steps>B<:>I<rows>
+I<rpn-expression> defines the formula used to compute the PDPs of a COMPUTE
+data source from other data sources in the same B<RRD>. It is similar to defining
+a B<CDEF> argument for the graph command. Please refer to that manual page
+for a list and description of RPN operations supported. For
+COMPUTE data sources, the following RPN operations are not supported: PREV,
+TIME, and LTIME. In addition, in defining the RPN expression, the COMPUTE
+data source may only refer to the names of data sources listed previously
+in the create command. This is similar to the restriction that B<CDEF>s must
+refer only to B<DEF>s and B<CDEF>s previously defined in the same graph command.
+
+=item B<RRA:>I<CF>B<:>I<cf arguments>
+
The purpose of an B<RRD> is to store data in the round robin archives
-(B<RRA>). An archive consists of a number of data values from all the
-defined data-sources (B<DS>) and is defined with an B<RRA> line.
+(B<RRA>). An archive consists of a number of data values or statistics for
+each of the defined data-sources (B<DS>) and is defined with an B<RRA> line.
When data is entered into an B<RRD>, it is first fit into time slots of
the length defined with the B<-s> option becoming a I<primary data point>.
-The data is also consolidated with the consolidation function (I<CF>)
-of the archive. The following consolidation functions are defined:
-B<AVERAGE>, B<MIN>, B<MAX>, B<LAST>.
+The data is also processed with the consolidation function (I<CF>)
+of the archive. There are several consolidation functions that consolidate
+primary data points via an aggregate function: B<AVERAGE>, B<MIN>, B<MAX>, B<LAST>.
+The format of the B<RRA> line for these consolidation functions is:
+
+B<RRA:>I<AVERAGE | MIN | MAX | LAST>B<:>I<xff>B<:>I<steps>B<:>I<rows>
I<xff> The xfiles factor defines what part of a consolidation interval may
be made up from I<*UNKNOWN*> data while the consolidated value is still
regarded as known.
-I<steps> defines how many of these I<primary data points> are used to
-build a I<consolidated data point> which then goes into the archive.
+I<steps> defines how many of these I<primary data points> are used to build
+a I<consolidated data point> which then goes into the archive.
I<rows> defines how many generations of data values are kept in an B<RRA>.
=back
+=head1 Aberrant Behavior Detection with Holt-Winters Forecasting
+
+by Jake Brutlag E<lt>jakeb@corp.webtv.netE<gt>
+
+In addition to the aggregate functions, there is a set of specialized
+functions that enable B<RRDtool> to provide data smoothing (via the
+Holt-Winters forecasting algorithm), confidence bands, and the flagging of
+aberrant behavior in the data source time series:
+
+=over 4
+
+=item B<RRA:>I<HWPREDICT>B<:>I<rows>B<:>I<alpha>B<:>I<beta>B<:>I<seasonal period>B<:>I<rra num>
+
+=item B<RRA:>I<SEASONAL>B<:>I<seasonal period>B<:>I<gamma>B<:>I<rra num>
+
+=item B<RRA:>I<DEVSEASONAL>B<:>I<seasonal period>B<:>I<gamma>B<:>I<rra num>
+
+=item B<RRA:>I<DEVPREDICT>B<:>I<rows>B<:>I<rra num>
+
+=item B<RRA:>I<FAILURES>B<:>I<rows>B<:>I<threshold>B<:>I<window length>B<:>I<rra num>
+
+=back
+
+These B<RRAs> differ from the true consolidation functions in several ways.
+First, each of the B<RRAs> is updated once for every primary data point.
+Second, these B<RRAs> are interdependent. To generate real-time confidence
+bounds, a matched set of HWPREDICT, SEASONAL, DEVSEASONAL, and
+DEVPREDICT must exist. Generating smoothed values of the primary data points
+requires both a HWPREDICT B<RRA> and a SEASONAL B<RRA>. Aberrant behavior
+detection requires FAILURES, HWPREDICT, DEVSEASONAL, and SEASONAL.
+
+The actual predicted, or smoothed, values are stored in the HWPREDICT
+B<RRA>. The predicted deviations are stored in DEVPREDICT (think of a standard
+deviation which can be scaled to yield a confidence band). The FAILURES
+B<RRA> stores binary indicators. A 1 marks the indexed observation as a
+failure; that is, the number of confidence bound violations in the
+preceding window of observations met or exceeded a specified threshold. An
+example of using these B<RRAs> to graph confidence bounds and failures
+appears in L<rrdgraph>.
+
+The SEASONAL and DEVSEASONAL B<RRAs> store the seasonal coefficients for the
+Holt-Winters Forecasting algorithm and the seasonal deviations respectively.
+There is one entry per observation time point in the seasonal cycle. For
+example, if primary data points are generated every five minutes, and the
+seasonal cycle is 1 day, both SEASONAL and DEVSEASONAL will have 288 rows.
+
+To simplify creation for the novice user, in addition to supporting
+explicit creation of the HWPREDICT, SEASONAL, DEVPREDICT, DEVSEASONAL, and
+FAILURES B<RRAs>, the B<rrdtool> create command supports implicit creation
+of the other four when HWPREDICT is specified alone and the final argument
+I<rra num> is omitted.
+
+I<rows> specifies the length of the B<RRA> prior to wrap around. Remember
+that there is a one-to-one correspondence between primary data points and
+entries in these RRAs. For the HWPREDICT CF, I<rows> should be larger than
+the I<seasonal period>. If the DEVPREDICT B<RRA> is implicitly created, the
+default number of rows is the same as the HWPREDICT I<rows> argument. If the
+FAILURES B<RRA> is implicitly created, I<rows> will be set to the I<seasonal
+period> argument of the HWPREDICT B<RRA>. Of course, the B<rrdtool>
+I<resize> command is available if these defaults are not sufficient and the
+creator wishes to avoid explicit creation of the other specialized function
+B<RRAs>.
+
+I<seasonal period> specifies the number of primary data points in a seasonal
+cycle. If SEASONAL and DEVSEASONAL are implicitly created, this argument for
+those B<RRAs> is set automatically to the value specified by HWPREDICT. If
+they are explicitly created, the creator should verify that all three
+I<seasonal period> arguments agree.
+
+I<alpha> is the adaptation parameter of the intercept (or baseline)
+coefficient in the Holt-Winters Forecasting algorithm. See L<rrdtool> for a
+description of this algorithm. I<alpha> must lie between 0 and 1. A value
+closer to 1 means that more recent observations carry greater weight in
+predicting the baseline component of the forecast. A value closer to 0 means
+that past history carries greater weight in predicting the baseline
+component.
+
+I<beta> is the adaptation parameter of the slope (or linear trend) coefficient
+in the Holt-Winters Forecasting algorithm. I<beta> must lie between 0 and 1
+and plays the same role as I<alpha> with respect to the predicted linear
+trend.
+
+I<gamma> is the adaptation parameter of the seasonal coefficients in the
+Holt-Winters Forecasting algorithm (HWPREDICT) or the adaptation parameter in
+the exponential smoothing update of the seasonal deviations. It must lie
+between 0 and 1. If the SEASONAL and DEVSEASONAL B<RRAs> are created
+implicitly, they will both have the same value for I<gamma>: the value
+specified for the HWPREDICT I<alpha> argument. Note that because there is
+one seasonal coefficient (or deviation) for each time point during the
+seasonal cycle, the adaptation rate is much slower than that of the baseline. Each
+seasonal coefficient is only updated (or adapts) when the observed value
+occurs at the offset in the seasonal cycle corresponding to that
+coefficient.
+
+If SEASONAL and DEVSEASONAL B<RRAs> are created explicitly, I<gamma> need not
+be the same for both. Note that I<gamma> can also be changed via the
+B<rrdtool> I<tune> command.
+
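How I<alpha>, I<beta> and I<gamma> interact can be seen in the textbook
additive Holt-Winters update, sketched here in Python. This is an
illustration under that assumption only: the class and attribute names are
hypothetical, and B<RRDtool>'s actual implementation may differ in details
such as initialization and the handling of unknown values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HoltWinters:
    alpha: float            # adaptation rate of the baseline
    beta: float             # adaptation rate of the slope
    gamma: float            # adaptation rate of the seasonal coefficients
    seasonal: List[float]   # one coefficient per point in the seasonal cycle
    baseline: float = 0.0
    slope: float = 0.0
    t: int = 0              # index of the next observation

    def predict(self) -> float:
        """Forecast for the next observation."""
        i = self.t % len(self.seasonal)
        return self.baseline + self.slope + self.seasonal[i]

    def update(self, y: float) -> float:
        """Fold in observation y; return the forecast made before seeing it."""
        i = self.t % len(self.seasonal)
        prediction = self.predict()
        old_baseline = self.baseline
        self.baseline = (self.alpha * (y - self.seasonal[i])
                         + (1 - self.alpha) * (old_baseline + self.slope))
        self.slope = (self.beta * (self.baseline - old_baseline)
                      + (1 - self.beta) * self.slope)
        # Only the coefficient for this offset in the cycle is touched.
        self.seasonal[i] = (self.gamma * (y - self.baseline)
                            + (1 - self.gamma) * self.seasonal[i])
        self.t += 1
        return prediction
```

Each call to C<update> changes the baseline and slope, but only one of the
seasonal coefficients, which is why the seasonal component adapts so much
more slowly for the same parameter value.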
+I<rra num> provides the links between related B<RRAs>. If HWPREDICT is
+specified alone and the other B<RRAs> created implicitly, then there is no
+need to worry about this argument. If B<RRAs> are created explicitly, then
+pay careful attention to this argument. For each B<RRA> which includes this
+argument, there is a dependency between that B<RRA> and another B<RRA>. The
+I<rra num> argument is the 1-based index in the order of B<RRA> creation
+(that is, the order they appear in the I<create> command). The dependent
+B<RRA> for each B<RRA> requiring the I<rra num> argument is listed here:
+
+=over 4
+
+=item *
+
+HWPREDICT I<rra num> is the index of the SEASONAL B<RRA>.
+
+=item *
+
+SEASONAL I<rra num> is the index of the HWPREDICT B<RRA>.
+
+=item *
+
+DEVPREDICT I<rra num> is the index of the DEVSEASONAL B<RRA>.
+
+=item *
+
+DEVSEASONAL I<rra num> is the index of the HWPREDICT B<RRA>.
+
+=item *
+
+FAILURES I<rra num> is the index of the DEVSEASONAL B<RRA>.
+
+=back
+
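The dependencies listed above lend themselves to a mechanical check before
running I<create>. The following Python sketch is a hypothetical helper, not
part of B<RRDtool>; it assumes every specialized B<RRA> is given explicitly
with I<rra num> as its last field.

```python
from typing import List

# CF of an RRA -> CF of the RRA its rra num must point to.
DEPENDS_ON = {
    "HWPREDICT": "SEASONAL",
    "SEASONAL": "HWPREDICT",
    "DEVPREDICT": "DEVSEASONAL",
    "DEVSEASONAL": "HWPREDICT",
    "FAILURES": "DEVSEASONAL",
}


def check_rra_links(rras: List[str]) -> List[str]:
    """Return a description of every rra num that points at the wrong RRA."""
    cfs = [spec.split(":")[1] for spec in rras]
    problems = []
    for position, spec in enumerate(rras, start=1):  # rra num is 1-based
        cf = cfs[position - 1]
        if cf not in DEPENDS_ON:
            continue  # AVERAGE, MIN, MAX and LAST carry no rra num
        rra_num = int(spec.split(":")[-1])
        if not 1 <= rra_num <= len(rras):
            problems.append(f"RRA {position} ({cf}): index {rra_num} out of range")
        elif cfs[rra_num - 1] != DEPENDS_ON[cf]:
            problems.append(
                f"RRA {position} ({cf}): index {rra_num} is {cfs[rra_num - 1]}, "
                f"expected {DEPENDS_ON[cf]}"
            )
    return problems
```

An empty result means every I<rra num> refers to an B<RRA> of the expected
type; it does not verify the other arguments.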
+I<threshold> is the minimum number of violations (observed values outside
+the confidence bounds) within a window that constitutes a failure. If the
+FAILURES B<RRA> is implicitly created, the default value is 7.
+
+I<window length> is the number of time points in the window. Specify an
+integer greater than or equal to the threshold and less than or equal to 28.
+The time interval this window represents depends on the interval between
+primary data points. If the FAILURES B<RRA> is implicitly created, the
+default value is 9.
+
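The interplay of I<threshold> and I<window length> can be illustrated with a
short Python sketch. It is not B<RRDtool> code: the function name is
hypothetical, and it assumes the window ends at, and includes, the current
observation.

```python
from collections import deque
from typing import Iterable, List


def failure_flags(violations: Iterable[bool], threshold: int = 7,
                  window_length: int = 9) -> List[int]:
    """Return one 0/1 indicator per observation, as a FAILURES RRA would hold."""
    if not threshold <= window_length <= 28:
        raise ValueError("need threshold <= window length <= 28")
    window = deque(maxlen=window_length)  # most recent observations only
    flags = []
    for violated in violations:
        window.append(bool(violated))
        # A failure is flagged once enough violations fall inside the window.
        flags.append(1 if sum(window) >= threshold else 0)
    return flags
```

With the default values, seven violations in a row raise the flag on the
seventh observation, and the flag stays set until fewer than seven
violations remain among the last nine observations.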
=head1 The HEARTBEAT and the STEP
Here is an explanation by Don Baarda on the inner workings of rrdtool.
=item Mail Messages
-Assume you have a methode to count the number of messages transported by
+Assume you have a method to count the number of messages transported by
your mailserver in a certain amount of time, this give you data like '5
messages in the last 65 seconds'. If you look at the count of 5 like and
B<ABSOLUTE> datatype you can simply update the rrd with the number 5 and the
temperatures supplied for 100 hours (1200 * 300 seconds = 100
hours). The second RRA stores the minimum temperature recorded over
every hour (12 * 300 seconds = 1 hour), for 100 days (2400 hours). The
-third and the fourth RRA's do the same with the for the maximum and
+third and the fourth RRA's do the same for the maximum and
average temperature, respectively.
+=head1 EXAMPLE 2
+
+C<rrdtool create monitor.rrd --step 300
+DS:ifOutOctets:COUNTER:1800:0:4294967295
+RRA:AVERAGE:0.5:1:2016
+RRA:HWPREDICT:1440:0.1:0.0035:288>
+
+This example is a monitor of a router interface. The first B<RRA> tracks the
+traffic flow in octets; the second B<RRA> generates the specialized
+function B<RRAs> for aberrant behavior detection. Note that the I<rra num>
+argument of HWPREDICT is missing, so the other B<RRAs> will be implicitly
+created with default parameter values. In this example, the forecasting
+algorithm baseline adapts quickly; in fact the most recent one hour of
+observations (each at 5 minute intervals) accounts for 75% of the baseline
+prediction. The linear trend forecast adapts much more slowly. Observations
+made during the last day (at 288 observations per day) account for only
+65% of the predicted linear trend. Note: these computations rely on an
+exponential smoothing formula described in a forthcoming LISA 2000 paper.
+
+The seasonal cycle is one day (288 data points at 300 second intervals), and
+the seasonal adaptation parameter will be set to 0.1. The RRD file will store 5
+days (1440 data points) of forecasts and deviation predictions before wrap
+around. The file will store 1 day (a seasonal cycle) of 0-1 indicators in
+the FAILURES B<RRA>.
+
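The time spans quoted for this example follow directly from the step and the
row counts. A small Python check, with hypothetical helper names, makes the
arithmetic explicit:

```python
SECONDS_PER_DAY = 86400
STEP = 300  # seconds between primary data points (--step)


def span_days(rows: int, steps_per_row: int = 1, step: int = STEP) -> float:
    """Time covered by an RRA before it wraps around, in days."""
    return rows * steps_per_row * step / SECONDS_PER_DAY


# Number of primary data points in a one-day seasonal cycle.
points_per_day = SECONDS_PER_DAY // STEP
```

Here C<points_per_day> is the I<seasonal period> used above, and
C<span_days(1440)> and C<span_days(288)> give the spans of the HWPREDICT and
FAILURES B<RRAs> respectively.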
+The same RRD file and B<RRAs> are created with the following command, which explicitly
+creates all specialized function B<RRAs>.
+
+C<rrdtool create monitor.rrd --step 300
+DS:ifOutOctets:COUNTER:1800:0:4294967295
+RRA:AVERAGE:0.5:1:2016
+RRA:HWPREDICT:1440:0.1:0.0035:288:3
+RRA:SEASONAL:288:0.1:2
+RRA:DEVPREDICT:1440:5
+RRA:DEVSEASONAL:288:0.1:2
+RRA:FAILURES:288:7:9:5>
+
+Of course, explicit creation need not replicate implicit creation; a number of
+arguments could be changed.
+
+=head1 EXAMPLE 3
+
+C<rrdtool create proxy.rrd --step 300
+DS:TotalRequests:DERIVE:1800:0:U
+DS:AccumDuration:DERIVE:1800:0:U
+DS:AvgReqDuration:COMPUTE:AccumDuration,TotalRequests,0,EQ,1,TotalRequests,IF,/
+RRA:AVERAGE:0.5:1:2016>
+
+This example monitors the average duration of the requests processed by a
+web proxy during each 300 second interval.
+In this case, the proxy exposes two counters, the number of requests
+processed since boot and the total cumulative duration of all processed
+requests. Clearly these counters both have some rollover point, but using the
+DERIVE data source also handles the reset that occurs when the web proxy is
+stopped and restarted.
+
+In the B<RRD>, the first data source stores the requests per second rate
+during the interval. The second data source stores the total duration of all
+requests processed during the interval divided by 300. The COMPUTE data source
+divides each PDP of the AccumDuration by the corresponding PDP of
+TotalRequests and stores the average request duration. The remainder of the
+RPN expression handles the divide by zero case.
+
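To see how the RPN expression of the COMPUTE data source guards against
division by zero, it can be traced with a minimal stack evaluator. The
Python sketch below is an illustration only: the function name is
hypothetical, and it implements just the three operators this expression
uses (EQ, IF and /), not the full RPN set.

```python
from typing import Dict, List


def eval_rpn(expression: str, pdps: Dict[str, float]) -> float:
    """Evaluate a comma-separated RPN expression against named PDP values."""
    stack: List[float] = []
    for token in expression.split(","):
        if token == "EQ":
            b, a = stack.pop(), stack.pop()
            stack.append(1.0 if a == b else 0.0)
        elif token == "IF":
            # cond,then,otherwise,IF leaves 'then' if cond is non-zero.
            otherwise, then, cond = stack.pop(), stack.pop(), stack.pop()
            stack.append(then if cond != 0 else otherwise)
        elif token == "/":
            b, a = stack.pop(), stack.pop()
            stack.append(a / b)
        elif token in pdps:
            stack.append(pdps[token])   # data source name
        else:
            stack.append(float(token))  # numeric constant
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]


EXPR = "AccumDuration,TotalRequests,0,EQ,1,TotalRequests,IF,/"
```

When TotalRequests is 0 the divisor is replaced by 1, so the expression
yields AccumDuration unchanged instead of dividing by zero.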
=head1 AUTHOR
Tobias Oetiker E<lt>oetiker@ee.ethz.chE<gt>