Big bunch of improvements for the caching daemon.

diff --git a/doc/rrdcached.pod b/doc/rrdcached.pod
index ab13ea7..e762659 100644
--- a/doc/rrdcached.pod
+++ b/doc/rrdcached.pod
@@ -6,7 +6,7 @@ rrdcached - Data caching daemon for rrdtool
  
  =head1 SYNOPSIS
  
-B<rrdcached> [B<-l> I<address>] [B<-w> I<timeout>] [B<-f> I<timeout>]
+B<rrdcached> [B<-l> I<address>] [B<-w> I<timeout>] [B<-z> I<delay>] [B<-f> I<timeout>] [B<-j> I<dir>]
  
  =head1 DESCRIPTION
  
@@ -42,6 +42,13 @@ C<unix:/tmp/rrdcached.sock>, will be used.
  Data is written to disk every I<timeout> seconds. If this option is not
  specified the default interval of 300E<nbsp>seconds will be used.
  
+=item B<-z> I<delay>
+
+If specified, B<rrdcached> will delay writing each RRD file by a random number
+of seconds in the rangeE<nbsp>[0,I<delay>).  This spreads the writes out over
+time so that not too many are queued simultaneously.  This value should be no
+greater than the value specified in B<-w>.  By default, there is no delay.
+
  =item B<-f> I<timeout>
  
  Every I<timeout> seconds the entire cache is searched for old values which are
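A minimal Python sketch of the effect of -z follows. It assumes a simplified model that is not taken from the rrdcached source: a file becomes due for writing -w seconds after its first cached value, plus a per-file offset of whole seconds drawn at random from [0, delay). The helper names due_time and busiest_second are hypothetical.

```python
import random
from collections import Counter


def due_time(first_update: int, write_timeout: int, delay: int,
             rng: random.Random) -> int:
    """Second at which a file's cached values become due for writing.

    Assumed model of -w/-z (hypothetical, not the daemon's code): the file
    is due write_timeout seconds after its first cached value, plus a
    per-file random offset of whole seconds in [0, delay).
    """
    if not 0 <= delay <= write_timeout:
        # The manual advises that -z be no greater than -w.
        raise ValueError("delay must be between 0 and the -w timeout")
    offset = rng.randrange(delay) if delay > 0 else 0  # 0 <= offset < delay
    return first_update + write_timeout + offset


def busiest_second(n_files: int, write_timeout: int, delay: int,
                   seed: int = 1) -> int:
    """Largest number of files that become due in the same second."""
    rng = random.Random(seed)
    due = Counter(due_time(0, write_timeout, delay, rng)
                  for _ in range(n_files))
    return max(due.values())


if __name__ == "__main__":
    # 1000 files all receive their first update at t=0.
    print(busiest_second(1000, 300, 0))   # 1000: every file is due at t=300
    print(busiest_second(1000, 300, 60))  # due times spread over t=300..359
```

Without a delay every file lands in the same second; with -z 60 the same writes are spread across the window [300, 360).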
@@ -54,6 +61,19 @@ cases. This timeout defaults to 3600E<nbsp>seconds.
  Sets the name and location of the PID-file. If not specified, the default,
  C<I<$localststedir>/run/rrdcached.pid> will be used.
  
+=item B<-j> I<dir>
+
+Write updates to a journal in I<dir>.  In the event of a program or system
+crash, this will allow the daemon to write any updates that were pending
+at the time of the crash.
+
+On startup, the daemon will check for journal files in this directory.  If
+found, all updates therein will be read into memory before the daemon
+starts accepting new connections.
+
+The journal will be rotated with the same frequency as the flush timer
+given by B<-f>.  On clean shutdown, the journal files are removed.
+
  =item B<-b> I<dir>
  
  The daemon will change into a specific directory at startup. All files passed
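The startup replay described for -j, together with the WROTE journal entry documented in the list of valid commands, can be sketched in Python as follows. The journal format used here is an assumption: one protocol command per line, either UPDATE or WROTE. The function name replay_journal is hypothetical, and the real journal may contain more than this sketch handles.

```python
from typing import Dict, Iterable, List


def replay_journal(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Rebuild the updates that had not reached the disk yet.

    Assumed journal format (hypothetical): one protocol command per line,
    either "UPDATE <file> <value> [<value> ...]" or "WROTE <file>".
    """
    pending: Dict[str, List[str]] = {}
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        command, args = fields[0].upper(), fields[1:]
        if command == "UPDATE" and len(args) >= 2:
            # Keep the values in memory; they may not be on disk yet.
            pending.setdefault(args[0], []).extend(args[1:])
        elif command == "WROTE" and len(args) == 1:
            # Everything journaled for this file so far has been written.
            pending.pop(args[0], None)
        # Any other line is ignored in this sketch.
    return pending


if __name__ == "__main__":
    journal = [
        "UPDATE /tmp/a.rrd 1200000000:1",
        "UPDATE /tmp/b.rrd 1200000000:7",
        "WROTE /tmp/a.rrd",
        "UPDATE /tmp/a.rrd 1200000300:2",
    ]
    print(replay_journal(journal))
    # {'/tmp/b.rrd': ['1200000000:7'], '/tmp/a.rrd': ['1200000300:2']}
```

The WROTE line discards only the values journaled before it, so the later update to /tmp/a.rrd survives the replay.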
@@ -74,6 +94,40 @@ used.
  
  =back
  
+=head1 AFFECTED RRDTOOL COMMANDS
+
+The following commands can be made aware of B<rrdcached> using the command
+line argument B<--daemon> or the environment variable B<RRDCACHED_ADDRESS>:
+
+=over 4
+
+=item B<dump>
+
+=item B<fetch>
+
+=item B<flush>
+
+=item B<graph>
+
+=item B<graphv>
+
+=item B<info>
+
+=item B<last>
+
+=item B<lastupdate>
+
+=item B<update>
+
+=item B<xport>
+
+=back
+
+The B<update> command can send values to the daemon instead of writing them to
+the disk itself. All other commands can send a B<FLUSH> command (see below) to
+the daemon before accessing the files, so they work with up-to-date data even
+if the cache timeout is large.
+
  =head1 HOW IT WORKS
  
  When receiving an update, B<rrdcached> does not write to disk but looks for an
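As a rough illustration of the affected-commands section above, the following Python sketch resolves the daemon address from a --daemon value or the RRDCACHED_ADDRESS environment variable, then picks the protocol line an rrdtool command would send. The precedence of --daemon over the environment variable and the helper names are assumptions. The sketch also takes a single file name, although commands such as graph and xport may read several files.

```python
import os
from typing import Mapping, Optional, Sequence

# Commands that only read RRD files; they ask for a FLUSH first.
READ_COMMANDS = {"dump", "fetch", "flush", "graph", "graphv", "info",
                 "last", "lastupdate", "xport"}


def daemon_address(cli_daemon: Optional[str] = None,
                   env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the daemon address, or None if the daemon is not in use.

    Assumption: an explicit --daemon argument takes precedence over the
    RRDCACHED_ADDRESS environment variable.
    """
    if cli_daemon:
        return cli_daemon
    return env.get("RRDCACHED_ADDRESS") or None


def daemon_request(command: str, filename: str,
                   values: Sequence[str] = ()) -> Optional[str]:
    """Protocol line (hypothetically) sent before the command does its work."""
    if command == "update":
        if not values:
            raise ValueError("update needs at least one value")
        # Hand the values to the daemon instead of writing the file.
        return "UPDATE %s %s\n" % (filename, " ".join(values))
    if command in READ_COMMANDS:
        # Make the daemon write pending values before the file is read.
        return "FLUSH %s\n" % filename
    return None  # commands not listed above are not daemon-aware here
```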
@@ -160,7 +214,7 @@ Timed out values are inserted at the "tail".
  
  =item
  
-Explicitely flushed values are inserted at the "head".
+Explicitly flushed values are inserted at the "head".
  
  =item
  
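The head/tail rule above, combined with the FLUSH behaviour of moving an already enqueued file to the head, can be modelled with an ordered dictionary. The following Python sketch is an illustration only: the class UpdateQueue and its method names are hypothetical, and the daemon itself is written in C and does not use this code.

```python
from collections import OrderedDict
from typing import Optional


class UpdateQueue:
    """Files waiting to be written; the head is written next.

    Timed-out files join at the tail; explicitly flushed files are put
    at (or moved to) the head.
    """

    def __init__(self) -> None:
        self._files: "OrderedDict[str, None]" = OrderedDict()

    def enqueue_timed_out(self, filename: str) -> None:
        # A file that is already queued keeps its current position.
        if filename not in self._files:
            self._files[filename] = None

    def enqueue_flushed(self, filename: str) -> None:
        self._files[filename] = None
        self._files.move_to_end(filename, last=False)  # jump to the head

    def dequeue(self) -> Optional[str]:
        if not self._files:
            return None
        filename, _ = self._files.popitem(last=False)  # take from the head
        return filename

    def __len__(self) -> int:
        return len(self._files)


if __name__ == "__main__":
    q = UpdateQueue()
    q.enqueue_timed_out("a.rrd")
    q.enqueue_timed_out("b.rrd")
    q.enqueue_flushed("b.rrd")  # already queued: moved to the head
    q.enqueue_flushed("c.rrd")
    print([q.dequeue() for _ in range(len(q))])  # ['c.rrd', 'b.rrd', 'a.rrd']
```

OrderedDict.move_to_end and popitem(last=False) run in constant time, so moving a flushed file to the head does not require scanning the queue.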
@@ -192,21 +246,163 @@ files will be messed up good!
  
  You have been warned.
  
-=head1 BUGS
+=head1 PROTOCOL
+
+The daemon communicates with clients using a line-based ASCII protocol that is
+easy to read and easy to type. This makes it simple for scripts to implement the
+protocol and lets users connect to the daemon with L<telnet(1)> to test
+commands "by hand".
+
+The protocol is line based: each record consists of one or more lines. A line
+is terminated by the line feed character C<0x0A>, commonly written as C<\n>.
+In the examples below, this character will be written as C<E<lt>LFE<gt>>
+("line feed").
+
+After the connection has been established, the client is expected to send a
+"command". A command consists of the command keyword, possibly some arguments,
+and a terminating line feed character. For a list of commands, see
+L</"Valid Commands"> below.
+
+Example:
+
+  FLUSH /tmp/foo.rrd<LF>
+
+The daemon answers with a line consisting of a status code and a short status
+message, separated by one or more space characters. A negative status code
+signals an error; a status code of zero or greater signals success. If the
+status code is greater than zero, it indicates the number of lines that follow
+the status line.
+
+Examples:
+
+ 0 Success<LF>
+
+ 2 Two lines follow<LF>
+ This is the first line<LF>
+ And this is the second line<LF>
+
+=head2 Valid Commands
+
+The following commands are understood by the daemon:
  
  =over 4
  
-=item
+=item B<FLUSH> I<filename>
+
+Causes the daemon to put I<filename> at the B<head> of the update queue
+(possibly moving it there if the node is already enqueued). The answer will be
+sent B<after> the node has been dequeued.
+
+=item B<HELP> [I<command>]
+
+Returns a short usage message. If no command is given, or I<command> is
+B<HELP>, a list of commands supported by the daemon is returned. Otherwise a
+short description, possibly containing a pointer to a manual page, is returned.
+This is meant for interactive use; the format in which the commands and usage
+summaries are returned is not well defined.
  
-Tree nodes are never deleted.
+=item B<STATS>
+
+Returns a list of metrics which can be used to measure the daemon's performance
+and check its status. For a description of the values returned, see
+L</"Performance Values"> below.
+
+The format in which the values are returned is similar to many other line based
+protocols: each value is printed on a separate line, consisting of the name of
+the value, a colon, one or more spaces, and the actual value.
+
+Example:
+
+ 9 Statistics follow
+ QueueLength: 0
+ UpdatesReceived: 30
+ FlushesReceived: 2
+ UpdatesWritten: 13
+ DataSetsWritten: 390
+ TreeNodesNumber: 13
+ TreeDepth: 4
+ JournalBytes: 190
+ JournalRotate: 0
+
+=item B<UPDATE> I<filename> I<values> [I<values> ...]
+
+Adds more data to I<filename>. This is B<the> operation the daemon was
+designed for, so the mechanism is not described again here; see
+L</"HOW IT WORKS"> above for a detailed explanation.
+
+=item B<WROTE> I<filename>
+
+This command is written to the journal after a file is successfully
+written out to disk.  It is used during journal replay to determine which
+updates have already been applied.  It is I<only> valid in the journal; it
+is not accepted from the other command channels.
  
  =back
  
+=head2 Performance Values
+
+The following counters are returned by the B<STATS> command:
+
+=over 4
+
+=item B<QueueLength> I<(unsigned 64bit integer)>
+
+Number of nodes currently enqueued in the update queue.
+
+=item B<UpdatesReceived> I<(unsigned 64bit integer)>
+
+Number of UPDATE commands received.
+
+=item B<FlushesReceived> I<(unsigned 64bit integer)>
+
+Number of FLUSH commands received.
+
+=item B<UpdatesWritten> I<(unsigned 64bit integer)>
+
+Total number of updates, i.E<nbsp>e. calls to C<rrd_update_r>, since the
+daemon was started.
+
+=item B<DataSetsWritten> I<(unsigned 64bit integer)>
+
+Total number of "data sets" written to disk since the daemon was started. A
+data set is one or more values passed to the B<UPDATE> command. For example:
+C<N:123:456> is one data set with two values. The term "data set" is used to
+avoid confusion about whether individual values or groups of values are counted.
+
+=item B<TreeNodesNumber> I<(unsigned 64bit integer)>
+
+Number of nodes in the cache.
+
+=item B<TreeDepth> I<(unsigned 64bit integer)>
+
+Depth of the tree used for fast key lookup.
+
+=item B<JournalBytes> I<(unsigned 64bit integer)>
+
+Total number of bytes written to the journal since startup.
+
+=item B<JournalRotate> I<(unsigned 64bit integer)>
+
+Number of times the journal has been rotated since startup.
+
+=back
+
+=head1 BUGS
+
+No known bugs at the moment.
+
  =head1 SEE ALSO
  
  L<rrdtool(1)>, L<rrdgraph(1)>
  
-=head1 AUHOR
+=head1 AUTHOR
  
  B<rrdcached> and this manual page have been written by Florian Forster
  E<lt>octoE<nbsp>atE<nbsp>verplant.orgE<gt>.
+
+=head1 CONTRIBUTORS
+
+kevin brintnall E<lt>kbrint@rufus.netE<gt>
+
+=cut
+
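The following Python sketch shows how a script might implement the client side of the protocol described in this manual page: send one command terminated by a line feed, read the status line, then read as many further lines as a positive status code announces. parse_stats splits the "Name: value" lines returned by STATS. The function names are hypothetical, and only UNIX domain socket addresses such as unix:/tmp/rrdcached.sock (or a bare socket path) are handled.

```python
import io
import socket
from typing import Dict, List, TextIO, Tuple

Response = Tuple[int, str, List[str]]


def read_response(stream: TextIO) -> Response:
    """Read a status line and the lines it announces."""
    status = stream.readline()
    if not status.endswith("\n"):
        raise ConnectionError("connection closed before the status line")
    parts = status.split(None, 1)  # code and message, split on whitespace
    if not parts:
        raise ValueError("empty status line")
    code = int(parts[0])
    message = parts[1].strip() if len(parts) > 1 else ""
    lines: List[str] = []
    # Only a positive status code announces additional lines.
    for _ in range(max(code, 0)):
        line = stream.readline()
        if not line.endswith("\n"):
            raise ConnectionError("connection closed inside a response")
        lines.append(line[:-1])
    return code, message, lines


def parse_stats(lines: List[str]) -> Dict[str, int]:
    """Turn "Name: value" lines, as returned by STATS, into a dict."""
    stats: Dict[str, int] = {}
    for line in lines:
        name, _, value = line.partition(":")
        stats[name.strip()] = int(value)  # int() ignores surrounding spaces
    return stats


def query(address: str, command: str) -> Response:
    """Send one command to the daemon and return its parsed answer."""
    path = address[len("unix:"):] if address.startswith("unix:") else address
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(command.encode("ascii") + b"\n")
        with sock.makefile("r", encoding="ascii", newline="\n") as stream:
            return read_response(stream)


if __name__ == "__main__":
    # Parse the two-line example response from this manual page offline.
    example = io.StringIO("2 Two lines follow\n"
                          "This is the first line\n"
                          "And this is the second line\n")
    print(read_response(example))
    # (2, 'Two lines follow', ['This is the first line', 'And this is the second line'])
    # Against a running daemon (socket path assumed):
    # code, message, lines = query("unix:/tmp/rrdcached.sock", "STATS")
    # print(parse_stats(lines))
```

Because a FLUSH answer is only sent after the node has been dequeued, query() blocks until the daemon has written the file, which is what a reading command wants before it opens the RRD.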