X-Git-Url: https://git.octo.it/?a=blobdiff_plain;f=doc%2Frrdcreate.pod;h=49db70ec4ca86c33c424cefb90b3ab65b03fc1c6;hb=9fac51648328756dca9b77b808bfa867d441fe1a;hp=3b7fb685772c6baaa743972d96697f2ea1737403;hpb=08fb8353d857335537fba9058f9b87cdc948c2a8;p=rrdtool.git
diff --git a/doc/rrdcreate.pod b/doc/rrdcreate.pod
index 3b7fb68..49db70e 100644
--- a/doc/rrdcreate.pod
+++ b/doc/rrdcreate.pod
@@ -1,23 +1,20 @@
=head1 NAME
-rrdtool create - Set up a new Round Robin Database
-
-=for html
+rrdcreate - Set up a new Round Robin Database
=head1 SYNOPSIS
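-B<rrdtool> B<create> I<filename>
-S<[B<--start>|B<-b> I<start time>]>
-S<[B<--step>|B<-s> I<step>]>
+B<rrdtool> B<create> I<filename>
+S<[B<--start>|B<-b> I<start time>]>
+S<[B<--step>|B<-s> I<step>]>
S<[B<DS:>I<ds-name>B<:>I<DST>B<:>I<dst arguments>]>
S<[B<RRA:>I<CF>B<:>I<cf arguments>]>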
=head1 DESCRIPTION
-The create function of the RRDtool lets you set up new
-Round Robin Database (B<RRD>) files.
-The file is created at its final, full size and filled
-with I<*UNKNOWN*> data.
+The create function of RRDtool lets you set up new Round Robin
+Database (B<RRD>) files. The file is created at its final, full size
+and filled with I<*UNKNOWN*> data.
=over 8
@@ -34,7 +31,7 @@ value should be added to the B. B will not accept
any data timed before or at the time specified.
See also AT-STYLE TIME SPECIFICATION section in the
-I<rrdfetch> documentation for more ways to specify time.
+I<rrdfetch> documentation for other ways to specify time.
=item B<--step>|B<-s> I<step> (default: 300 seconds)
@@ -43,17 +40,17 @@ into the B.
=item B<DS:>I<ds-name>B<:>I<DST>B<:>I<dst arguments>
-A single B<RRD> can accept input from several data sources (B<DS>).
-(e.g. Incoming and Outgoing traffic on a specific communication
-line). With the B<DS> configuration option you must define some basic
-properties of each data source you want to use to feed the B<RRD>.
+A single B<RRD> can accept input from several data sources (B<DS>),
+for example incoming and outgoing traffic on a specific communication
+line. With the B<DS> configuration option you must define some basic
+properties of each data source you want to store in the B<RRD>.
I<ds-name> is the name you will use to reference this particular data
source from an B<RRD>. A I<ds-name> must be 1 to 19 characters long in
the characters [a-zA-Z0-9_].
I<DST> defines the Data Source Type. The remaining arguments of a
-data source entry depend upon the data source type. For GAUGE, COUNTER,
+data source entry depend on the data source type. For GAUGE, COUNTER,
DERIVE, and ABSOLUTE the format for a data source entry is:
B<DS:>I<ds-name>B<:>I<GAUGE | COUNTER | DERIVE | ABSOLUTE>B<:>I<heartbeat>B<:>I<min>B<:>I<max>
@@ -62,24 +59,26 @@ For COMPUTE data sources, the format is:
B<DS:>I<ds-name>B<:>I<COMPUTE>B<:>I<rpn-expression>
-To decide on a data source type, review the definitions that follow.
-Consult the section on "HOW TO MEASURE" for further insight.
+In order to decide which data source type to use, review the
+definitions that follow. Also consult the section on "HOW TO MEASURE"
+for further insight.
=over 4
-=item B<GAUGE>
+=item B<GAUGE>
-is for things like temperatures or number of people in a
-room or value of a RedHat share.
+is for things like temperatures or number of people in a room or the
+value of a RedHat share.
=item B<COUNTER>
-is for continuous incrementing counters like the
-ifInOctets counter in a router. The B<COUNTER> data source assumes that
-the counter never decreases, except when a counter overflows. The update
-function takes the overflow into account. The counter is stored as a
-per-second rate. When the counter overflows, RRDtool checks if the overflow happened at
-the 32bit or 64bit border and acts accordingly by adding an appropriate value to the result.
+is for continuous incrementing counters like the ifInOctets counter in
+a router. The B<COUNTER> data source assumes that the counter never
+decreases, except when a counter overflows. The update function takes
+the overflow into account. The counter is stored as a per-second
+rate. When the counter overflows, RRDtool checks if the overflow
+happened at the 32bit or 64bit border and acts accordingly by adding
+an appropriate value to the result.
=item B<DERIVE>
@@ -113,72 +112,105 @@ wrap.
=back
-=item B<ABSOLUTE>
+=item B<ABSOLUTE>
is for counters which get reset upon reading. This is used for fast counters
which tend to overflow. So instead of reading them normally you reset them
-after every read to make sure you have a maximal time available before the
+after every read to make sure you have a maximum time available before the
next overflow. Another usage is for things you count like number of messages
since the last update.
=item B<COMPUTE>
-is for storing the result of a formula applied to other data sources in
-the B<RRD>. This data source is not supplied a value on update, but rather
-its Primary Data Points (PDPs) are computed from the PDPs of the data sources
-according to the rpn-expression that defines the formula. Consolidation
-functions are then applied normally to the PDPs of the COMPUTE data source
-(that is the rpn-expression is only applied to generate PDPs). In database
-software, these are referred to as "virtual" or "computed" columns.
+is for storing the result of a formula applied to other data sources
+in the B<RRD>. This data source is not supplied a value on update, but
+rather its Primary Data Points (PDPs) are computed from the PDPs of
+the data sources according to the rpn-expression that defines the
+formula. Consolidation functions are then applied normally to the PDPs
+of the COMPUTE data source (that is the rpn-expression is only applied
+to generate PDPs). In database software, such data sets are referred
+to as "virtual" or "computed" columns.
=back
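The COUNTER overflow handling described above can be sketched in Python. This is only an illustrative sketch under stated assumptions, not RRDtool's own code: the function name C<counter_rate> and its parameters are hypothetical, and the counter is assumed to wrap at either the 32-bit or the 64-bit border.

```python
def counter_rate(prev, curr, elapsed_seconds):
    """Per-second rate of a COUNTER-style data source (hypothetical helper).

    A counter is assumed never to decrease except on overflow, so a
    negative difference is treated as a wrap at 2**32 or, failing that,
    at 2**64.
    """
    delta = curr - prev
    if delta < 0:
        # Assume the counter wrapped at the 32-bit border.
        delta += 2**32
    if delta < 0:
        # Still negative: assume a wrap at the 64-bit border instead.
        delta += 2**64 - 2**32
    return delta / elapsed_seconds


# A 32-bit counter that wrapped between two readings taken 4 seconds apart.
rate = counter_rate(prev=2**32 - 6, curr=10, elapsed_seconds=4)
```

Note that a reset of the monitored device also makes the counter decrease, which such a heuristic cannot tell apart from an overflow; the I<min> and I<max> arguments are the place to bound implausible rates.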
I<heartbeat> defines the maximum number of seconds that may pass
-between two updates of this data source before the value of the
+between two updates of this data source before the value of the
data source is assumed to be I<*UNKNOWN*>.
-I and I are optional entries defining the expected range of
-the data supplied by this data source. If I and/or I are
-defined, any value outside the defined range will be regarded as
-I<*UNKNOWN*>. If you do not know or care about min and max, set them
-to U for unknown. Note that min and max always refer to the processed values
-of the DS. For a traffic-B type DS this would be the max and min
-data-rate expected from the device.
+I<min> and I<max> define the expected range of values for data supplied by a
+data source. If I<min> and/or I<max> are defined, any value outside the defined range
+will be regarded as I<*UNKNOWN*>. If you do not know or care about min and
+max, set them to U for unknown. Note that min and max always refer to the
+processed values of the DS. For a traffic-B<COUNTER> type DS this would be
+the maximum and minimum data-rate expected from the device.
I
-I defines the formula used to compute the PDPs of a COMPUTE
-data source from other data sources in the same . It is similar to defining
-a B argument for the graph command. Please refer to that manual page
-for a list and description of RPN operations supported. For
-COMPUTE data sources, the following RPN operations are not supported: COUNT, PREV,
-TIME, and LTIME. In addition, in defining the RPN expression, the COMPUTE
-data source may only refer to the names of data source listed previously
-in the create command. This is similar to the restriction that Bs must
-refer only to Bs and Bs previously defined in the same graph command.
+I<rpn-expression> defines the formula used to compute the PDPs of a
+COMPUTE data source from other data sources in the same <RRD>. It is
+similar to defining a B<CDEF> argument for the graph command. Please
+refer to that manual page for a list and description of RPN operations
+supported. For COMPUTE data sources, the following RPN operations are
+not supported: COUNT, PREV, TIME, and LTIME. In addition, in defining
+the RPN expression, the COMPUTE data source may only refer to the
+names of data sources listed previously in the create command. This is
+similar to the restriction that B<CDEF>s must refer only to B<DEF>s
+and B<CDEF>s previously defined in the same graph command.
=item B<RRA:>I<CF>B<:>I<cf arguments>
The purpose of an B<RRD> is to store data in the round robin archives
-(B<RRA>). An archive consists of a number of data values or statistics for
+(B<RRA>). An archive consists of a number of data values or statistics for
each of the defined data-sources (B<DS>) and is defined with an B<RRA> line.
-When data is entered into an B<RRD>, it is first fit into time slots of
-the length defined with the B<-s> option becoming a I<primary data point>.
+When data is entered into an B<RRD>, it is first fit into time slots
+of the length defined with the B<-s> option, thus becoming a I<primary data point>.
+
+The data is also processed with the consolidation function (I<CF>) of
+the archive. There are several consolidation functions that
+consolidate primary data points via an aggregate function: B<AVERAGE>,
+B<MIN>, B<MAX>, B<LAST>.
+
+=over
+
+=item AVERAGE
+
+the average of the data points is stored.
+
+=item MIN
+
+the smallest of the data points is stored.
+
+=item MAX
+
+the largest of the data points is stored.
-The data is also processed with the consolidation function (I)
-of the archive. There are several consolidation functions that consolidate
-primary data points via an aggregate function: B, B, B, B.
-The format of B line for these consolidation functions is:
+=item LAST
+
+the last data point is used.
+
+=back
+
+Note that data aggregation inevitably leads to loss of precision and
+information. The trick is to pick the aggregate function such that the
+I<interesting> properties of your data are kept across the aggregation
+process.
+
+
+The format of an B<RRA> line for these
+consolidation functions is:
B<RRA:>I<AVERAGE | MIN | MAX | LAST>B<:>I<xff>B<:>I<steps>B<:>I<rows>
I<xff> The xfiles factor defines what part of a consolidation interval may
be made up from I<*UNKNOWN*> data while the consolidated value is still
-regarded as known.
+regarded as known. It is given as the ratio of allowed I<*UNKNOWN*> PDPs
+to the number of PDPs in the interval. Thus, it ranges from 0 to 1 (exclusive).
+
I<steps> defines how many of these I<primary data points> are used to build
a I<consolidated data point> which then goes into the archive.
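How the xfiles factor and the consolidation function interact can be sketched in Python. This is a simplified illustration, not RRDtool's implementation: the function C<consolidate> is hypothetical, unknown PDPs are represented by C<None>, and LAST is assumed to take the last known value in the interval.

```python
def consolidate(pdps, cf, xff):
    """Build one consolidated data point from a list of PDPs (hypothetical).

    pdps: primary data points of one consolidation interval; None = *UNKNOWN*
    cf:   "AVERAGE", "MIN", "MAX" or "LAST"
    xff:  allowed ratio of unknown PDPs to all PDPs in the interval
    """
    unknown = sum(1 for value in pdps if value is None)
    if unknown / len(pdps) > xff:
        # Too much of the interval is unknown: the result is unknown too.
        return None
    known = [value for value in pdps if value is not None]
    if not known:
        return None
    if cf == "AVERAGE":
        return sum(known) / len(known)
    if cf == "MIN":
        return min(known)
    if cf == "MAX":
        return max(known)
    if cf == "LAST":
        # Simplifying assumption: use the last known value.
        return known[-1]
    raise ValueError("unsupported consolidation function: " + cf)


# steps = 4 and xff = 0.5: one unknown PDP out of four is tolerated.
cdp = consolidate([1.0, None, 3.0, 5.0], "AVERAGE", 0.5)
```

With the same I<xff> of 0.5, an interval holding three unknown PDPs out of four would yield an unknown consolidated data point.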
@@ -189,52 +221,77 @@ I defines how many generations of data values are kept in an B.
=head1 Aberrant Behavior Detection with Holt-Winters Forecasting
-by Jake Brutlag Ejakeb@corp.webtv.netE
-
In addition to the aggregate functions, there are a set of specialized
functions that enable B<RRDtool> to provide data smoothing (via the
-Holt-Winters forecasting algorithm), confidence bands, and the flagging
-aberrant behavior in the data source time series:
+Holt-Winters forecasting algorithm), confidence bands, and the
+flagging of aberrant behavior in the data source time series:
-=over 4
+=over
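-=item B<RRA:>I<HWPREDICT>B<:>I<rows>B<:>I<alpha>B<:>I<beta>B<:>I<seasonal period>B<:>I<rra-num>
+=item *
+
+B<RRA:>I<HWPREDICT>B<:>I<rows>B<:>I<alpha>B<:>I<beta>B<:>I<seasonal period>[B<:>I<rra-num>]
+
+=item *
+
+B<RRA:>I<MHWPREDICT>B<:>I<rows>B<:>I<alpha>B<:>I<beta>B<:>I<seasonal period>[B<:>I<rra-num>]
-=item B<RRA:>I<SEASONAL>B<:>I<seasonal period>B<:>I<gamma>B<:>I<rra-num>
+=item *
+
+B<RRA:>I<SEASONAL>B<:>I<seasonal period>B<:>I<gamma>B<:>I<rra-num>
-=item B<RRA:>I<DEVSEASONAL>B<:>I<seasonal period>B<:>I<gamma>B<:>I<rra-num>
+=item *
-=item B<RRA:>I<DEVPREDICT>B<:>I<rows>B<:>I<rra-num>
+B<RRA:>I<DEVSEASONAL>B<:>I<seasonal period>B<:>I<gamma>B<:>I<rra-num>
+
+=item *
-=item B<RRA:>I<FAILURES>B<:>I<rows>B<:>I<threshold>B<:>I<window length>B<:>I<rra-num>
+B<RRA:>I<DEVPREDICT>B<:>I<rows>B<:>I<rra-num>
+
+=item *
+
+B<RRA:>I<FAILURES>B<:>I<rows>B<:>I<threshold>B<:>I<window length>B<:>I<rra-num>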
=back
These B<RRAs> differ from the true consolidation functions in several ways.
First, each of the B<RRA>s is updated once for every primary data point.
Second, these B<RRAs> are interdependent. To generate real-time confidence
-bounds, then a matched set of HWPREDICT, SEASONAL, DEVSEASONAL, and
-DEVPREDICT must exist. Generating smoothed values of the primary data points
-requires both a HWPREDICT B<RRA> and SEASONAL B<RRA>. Aberrant behavior
-detection requires FAILURES, HWPREDICT, DEVSEASONAL, and SEASONAL.
-
-The actual predicted, or smoothed, values are stored in the HWPREDICT
-B<RRA>. The predicted deviations are store in DEVPREDICT (think a standard
-deviation which can be scaled to yield a confidence band). The FAILURES
-B<RRA> stores binary indicators. A 1 marks the indexed observation a
-failure; that is, the number of confidence bounds violations in the
-preceding window of observations met or exceeded a specified threshold. An
-example of using these B<RRAs> to graph confidence bounds and failures
-appears in L<rrdgraph>.
+bounds, a matched set of SEASONAL, DEVSEASONAL, DEVPREDICT, and either
+HWPREDICT or MHWPREDICT must exist. Generating smoothed values of the primary
+data points requires a SEASONAL B<RRA> and either an HWPREDICT or MHWPREDICT
+B<RRA>. Aberrant behavior detection requires FAILURES, DEVSEASONAL, SEASONAL,
+and either HWPREDICT or MHWPREDICT.
+
+The predicted, or smoothed, values are stored in the HWPREDICT or MHWPREDICT
+B<RRA>. HWPREDICT and MHWPREDICT are actually two variations on the
+Holt-Winters method. They are interchangeable. Both attempt to decompose data
+into three components: a baseline, a trend, and a seasonal coefficient.
+HWPREDICT adds its seasonal coefficient to the baseline to form a prediction, whereas
+MHWPREDICT multiplies its seasonal coefficient by the baseline to form a
+prediction. The difference is noticeable when the baseline changes
+significantly in the course of a season; HWPREDICT will predict the seasonality
+to stay constant as the baseline changes, but MHWPREDICT will predict the
+seasonality to grow or shrink in proportion to the baseline. The proper choice
+of method depends on the thing being modeled. For simplicity, the rest of this
+discussion will refer to HWPREDICT, but MHWPREDICT may be substituted in its
+place.
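The difference between the two variations can be sketched with a little Python. This is a deliberately reduced illustration with hypothetical function names: it ignores the trend component and only shows how the seasonal coefficient is combined with the baseline.

```python
def additive_prediction(baseline, seasonal):
    """HWPREDICT-style sketch: the seasonal coefficient is added."""
    return baseline + seasonal


def multiplicative_prediction(baseline, seasonal):
    """MHWPREDICT-style sketch: the seasonal coefficient is a multiplier."""
    return baseline * seasonal


# Both sketches agree while the baseline is 100 ...
low_add = additive_prediction(100, 50)
low_mul = multiplicative_prediction(100, 1.5)

# ... but diverge once the baseline has doubled to 200: the additive
# seasonal swing stays at 50, the multiplicative one grows to 100.
high_add = additive_prediction(200, 50)
high_mul = multiplicative_prediction(200, 1.5)
```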
+
+The predicted deviations are stored in DEVPREDICT (think a standard deviation
+which can be scaled to yield a confidence band). The FAILURES B<RRA> stores
+binary indicators. A 1 marks the indexed observation as a failure; that is, the
+number of confidence bounds violations in the preceding window of observations
+met or exceeded a specified threshold. An example of using these B<RRAs> to graph
+confidence bounds and failures appears in L<rrdgraph>.
The SEASONAL and DEVSEASONAL B<RRAs> store the seasonal coefficients for the
-Holt-Winters forecasting algorithm and the seasonal deviations respectively.
+Holt-Winters forecasting algorithm and the seasonal deviations, respectively.
There is one entry per observation time point in the seasonal cycle. For
-example, if primary data points are generated every five minutes, and the
-seasonal cycle is 1 day, both SEASONAL and DEVSEASONAL with have 288 rows.
+example, if primary data points are generated every five minutes and the
+seasonal cycle is 1 day, both SEASONAL and DEVSEASONAL will have 288 rows.
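The row count in this example follows from dividing the length of the seasonal cycle by the step. A minimal Python sketch of that arithmetic, with a hypothetical helper name and the assumption that the cycle is a whole multiple of the step:

```python
def seasonal_rows(step_seconds, cycle_seconds):
    """Number of rows in a SEASONAL or DEVSEASONAL archive (hypothetical)."""
    if cycle_seconds % step_seconds != 0:
        raise ValueError("seasonal cycle must be a multiple of the step")
    return cycle_seconds // step_seconds


# One primary data point every five minutes, seasonal cycle of one day.
rows = seasonal_rows(step_seconds=300, cycle_seconds=24 * 60 * 60)
```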
In order to simplify the creation for the novice user, in addition to
-supporting explicit creation the HWPREDICT, SEASONAL, DEVPREDICT,
+supporting explicit creation of the HWPREDICT, SEASONAL, DEVPREDICT,
DEVSEASONAL, and FAILURES B<RRAs>, the B<RRDtool> create command supports
implicit creation of the other four when HWPREDICT is specified alone and
the final argument I<rra-num> is omitted.
@@ -247,7 +304,7 @@ default number of rows is the same as the HWPREDICT I argument. If the
FAILURES B is implicitly created, I will be set to the I argument of the HWPREDICT B. Of course, the B
I command is available if these defaults are not sufficient and the
-create wishes to avoid explicit creations of the other specialized function
+creator wishes to avoid explicit creation of the other specialized function
B<RRAs>.
I specifies the number of primary data points in a seasonal
@@ -257,11 +314,11 @@ they are explicitly created, the creator should verify that all three
I arguments agree.
I<alpha> is the adaption parameter of the intercept (or baseline)
-coefficient in the Holt-Winters forecasting algorithm. See L for a
+coefficient in the Holt-Winters forecasting algorithm. See L for a
description of this algorithm. I<alpha> must lie between 0 and 1. A value
closer to 1 means that more recent observations carry greater weight in
-predicting the baseline component of the forecast. A value closer to 0 mean
-that past history carries greater weight in predicted the baseline
+predicting the baseline component of the forecast. A value closer to 0 means
+that past history carries greater weight in predicting the baseline
component.
I<beta> is the adaption parameter of the slope (or linear trend) coefficient
@@ -286,25 +343,26 @@ be the same for both. Note that I can also be changed via the
B I command.
I<rra-num> provides the links between related B<RRAs>. If HWPREDICT is
-specified alone and the other B<RRAs> created implicitly, then there is no
-need to worry about this argument. If B<RRAs> are created explicitly, then
-pay careful attention to this argument. For each B<RRA> which includes this
-argument, there is a dependency between that B<RRA> and another B<RRA>. The
-I<rra-num> argument is the 1-based index in the order of B<RRA> creation
-(that is, the order they appear in the I<create> command). The dependent
-B<RRA> for each B<RRA> requiring the I<rra-num> argument is listed here:
+specified alone and the other B<RRAs> are created implicitly, then
+there is no need to worry about this argument. If B<RRAs> are created
+explicitly, then pay careful attention to this argument. For each
+B<RRA> which includes this argument, there is a dependency between
+that B<RRA> and another B<RRA>. The I<rra-num> argument is the 1-based
+index in the order of B<RRA> creation (that is, the order they appear
+in the I<create> command). The dependent B<RRA> for each B<RRA>
+requiring the I<rra-num> argument is listed here:
-=over 4
+=over
=item *
HWPREDICT I<rra-num> is the index of the SEASONAL B<RRA>.
-=item *
+=item *
SEASONAL I<rra-num> is the index of the HWPREDICT B<RRA>.
-=item *
+=item *
DEVPREDICT I<rra-num> is the index of the DEVSEASONAL B<RRA>.
@@ -312,7 +370,7 @@ DEVPREDICT I