repNodeShardNumber – The store-wide unique shard
(replication group) number that the replication node
runs within. This is the number that directly follows
the rg specifier when a replication node is displayed in
the output of a show topology command, for example rg1-rn1.
In this example, the repNodeShardNumber
would be 1.
repNodeNumber – The shard (replication group) local
unique replication node number. This is the number that
directly follows the rn specifier when a replication
node is displayed in the output of a show topology
command, for example rg1-rn1. In this example, the
repNodeNumber would be 1.
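As an illustration of how the two numbers relate to a displayed name, the following minimal sketch splits a name of the form rgX-rnY into its shard number and node number. The helper name parse_rep_node_name is hypothetical and is not part of the product. It assumes the name contains only the rg and rn specifiers, each followed by digits.

```python
import re

# Hypothetical helper (not part of the product): matches names such as "rg1-rn1".
_RN_NAME = re.compile(r"^rg(\d+)-rn(\d+)$")

def parse_rep_node_name(name):
    """Return (repNodeShardNumber, repNodeNumber) for a name like 'rg1-rn1'."""
    match = _RN_NAME.match(name)
    if match is None:
        raise ValueError(f"not a replication node name: {name!r}")
    # Group 1 follows the rg specifier, group 2 follows the rn specifier.
    return int(match.group(1)), int(match.group(2))

print(parse_rep_node_name("rg1-rn1"))  # (1, 1)
print(parse_rep_node_name("rg3-rn2"))  # (3, 2)
```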
repNodeServiceStatus – The current status of the replication node. The possible values are as follows:
starting (1) – The replication node is booting up.
waitingForDeploy (2) – The replication node is waiting to be registered with the storage node agent.
running (3) – The replication node is running.
stopping (4) – The replication node is in the process of shutting down.
stopped (5) – An intentional clean shutdown.
errorRestarting (6) – The replication node is restarting after encountering an error.
errorNoRestart (7) – Service is in an error state and will not be automatically restarted. Administrative intervention is required.
unreachable (8) – The replication node is unreachable by the admin service.
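A monitoring script might map the numeric status codes listed above to readable names. The following is a minimal sketch. The class name RepNodeServiceStatus and the function needs_admin_intervention are hypothetical, and only the code-to-name mapping is taken from the list above.

```python
from enum import IntEnum

class RepNodeServiceStatus(IntEnum):
    """Numeric codes as listed for repNodeServiceStatus (class name is illustrative)."""
    STARTING = 1
    WAITING_FOR_DEPLOY = 2
    RUNNING = 3
    STOPPING = 4
    STOPPED = 5
    ERROR_RESTARTING = 6
    ERROR_NO_RESTART = 7
    UNREACHABLE = 8

def needs_admin_intervention(code):
    """True only for errorNoRestart, the status described as requiring intervention."""
    return RepNodeServiceStatus(code) is RepNodeServiceStatus.ERROR_NO_RESTART

print(RepNodeServiceStatus(3).name)   # RUNNING
print(needs_admin_intervention(7))    # True
print(needs_admin_intervention(6))    # False
```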
The following metrics can be monitored to gauge the performance of each replication node in the cluster. Metrics are reported at two granularities:
Interval – By default, each node in the cluster will sample performance data every 60 seconds and aggregate the metrics to this interval. This interval may be changed using the admin plan change-parameters and supplying the statsInterval parameter with a new value in seconds (see http://docs.oracle.com/cd/NOSQL/html/AdminGuide/setstoreparams.html#changeparamcli).
Cumulative – Metrics that have been collected and aggregated since the node has started.
The metrics are further broken down into measurements for operations over single keys versus operations over multiple keys.
All timestamp metrics are in UTC, so convert them to the time zone where the store is deployed when local times are needed.
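The conversion from UTC can be sketched as follows. This example assumes the timestamp has already been parsed into a Python datetime without zone information. How the raw metric value is encoded is not stated here, so the parsing step is left out. The function name utc_to_local and the fixed offset of -5 hours are illustrative only.

```python
from datetime import datetime, timedelta, timezone

def utc_to_local(utc_naive, offset_hours):
    """Mark a naive timestamp as UTC, then convert it to a fixed-offset zone."""
    local_zone = timezone(timedelta(hours=offset_hours))
    return utc_naive.replace(tzinfo=timezone.utc).astimezone(local_zone)

# Assumed sample value: 12:00 UTC on 15 January 2024.
sample = datetime(2024, 1, 15, 12, 0, 0)
print(utc_to_local(sample, -5).isoformat())  # 2024-01-15T07:00:00-05:00
```

A fixed offset does not follow daylight saving changes. For a named zone, the standard library zoneinfo module (Python 3.9 and later) can supply the tzinfo object instead.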
repNodeIntervalStart – The start timestamp of when this sample of single key operation measurements was collected.
repNodeIntervalEnd – The end timestamp of when this sample of single key operation measurements was collected.
repNodeIntervalPeriod – The number of milliseconds over which the replication node collected single key operation measurements (repNodeIntervalEnd - repNodeIntervalStart).
repNodeIntervalTotalOps – Total number of single key operations (get, put, delete) processed by the replication node in the interval being measured.
repNodeIntervalThroughput – Number of single key operations (get, put, delete) per second completed during the interval being measured.
repNodeIntervalLatMin – The minimum latency sample of single key operations (get, put, delete) during the interval being measured.
repNodeIntervalLatMax – The maximum latency sample of single key operations (get, put, delete) during the interval being measured.
repNodeIntervalLatAvg – The average latency sample of single key operations (get, put, delete) during the interval being measured (returned as a string).
repNodeIntervalLatAvgInt – The average latency sample of single key operations (get, put, delete) during the interval being measured (returned as an integer).
repNodeIntervalLatAvgFrac – The fractional part of the average latency sample of single key operations (get, put, delete) during the interval being measured (returned as an integer).
repNodeIntervalPct95 – The 95th percentile of the latency sample of single key operations (get, put, delete) during the interval being measured.
repNodeIntervalPct99 – The 99th percentile of the latency sample of single key operations (get, put, delete) during the interval being measured.
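The relationship between the interval period, the total operation count, and the throughput can be sketched as simple arithmetic. This assumes that throughput is the total operation count divided by the interval length in seconds. The exact calculation and rounding used by the replication node are not stated here. The function names and the use of millisecond timestamps are hypothetical.

```python
def interval_period_ms(start_ms, end_ms):
    """Period in milliseconds, i.e. repNodeIntervalEnd - repNodeIntervalStart.

    Assumes both timestamps are expressed in milliseconds.
    """
    return end_ms - start_ms

def throughput_ops_per_sec(total_ops, period_ms):
    """Operations per second over an interval of period_ms milliseconds."""
    if period_ms <= 0:
        raise ValueError("period must be positive")
    return total_ops / (period_ms / 1000.0)

# 6,000 operations over the default 60-second interval.
period = interval_period_ms(1_000_000, 1_060_000)
print(period)                                 # 60000
print(throughput_ops_per_sec(6000, period))   # 100.0
```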
repNodeCumulativeStart – The start timestamp of when the replication node started collecting cumulative performance metrics (applies to all the cumulative metrics below).
repNodeCumulativeEnd – The end timestamp of when the replication node ended collecting cumulative performance metrics (applies to all the cumulative metrics below).
repNodeCumulativeTotalOps – The total number of single key operations that have been processed by the replication node.
repNodeCumulativeThroughput – The sustained operations per second of single key operations measured by this node since it has started.
repNodeCumulativeLatMin – The minimum latency of single key operations measured by this node since it has started.
repNodeCumulativeLatMax – The maximum latency of single key operations measured by this node since it has started.
repNodeCumulativeLatAvg – The average latency of single key operations measured by this node since it has started (returned as a string).
repNodeCumulativeLatAvgInt – The average latency of single key operations measured by this node since it has started (returned as an integer).
repNodeCumulativeLatAvgFrac – The fractional part of the cumulative average latency of single key operations (get, put, delete) measured by this node since it has started (returned as an integer).
repNodeCumulativePct95 – The 95th percentile of the latency of single key operations (get, put, delete) since it has started.
repNodeCumulativePct99 – The 99th percentile of the latency of single key operations (get, put, delete) since it has started.
repNodeMultiIntervalStart – The start timestamp of when this sample of multiple key operation measurements was collected.
repNodeMultiIntervalEnd – The end timestamp of when this sample of multiple key operation measurements was collected.
repNodeMultiIntervalPeriod – The number of milliseconds over which the replication node collected multiple key operation measurements (repNodeMultiIntervalEnd - repNodeMultiIntervalStart).
repNodeMultiIntervalTotalOps – Total number of multiple key operations (execute) processed by the replication node in the interval being measured.
repNodeMultiIntervalThroughput – Number of multiple key operations (execute) per second completed during the interval being measured.
repNodeMultiIntervalLatMin – The minimum latency sample of multiple key operations (execute) during the interval being measured.
repNodeMultiIntervalLatMax – The maximum latency sample of multiple key operations (execute) during the interval being measured.
repNodeMultiIntervalLatAvg – The average latency sample of multiple key operations (execute) during the interval being measured (returned as a string).
repNodeMultiIntervalLatAvgInt – The average latency sample of multiple key operations (execute) during the interval being measured (returned as an integer).
repNodeMultiIntervalLatAvgFrac – The fractional part of the average latency sample of multiple key operations (execute) during the interval being measured (returned as an integer).
repNodeMultiIntervalPct95 – The 95th percentile of the latency sample of multiple key operations (execute) during the interval being measured.
repNodeMultiIntervalPct99 – The 99th percentile of the latency sample of multiple key operations (execute) during the interval being measured.
repNodeMultiIntervalTotalRequests – The total number of multiple key operations (execute) during the interval being measured.
repNodeMultiCumulativeStart – The start timestamp of when the replication node started collecting cumulative multiple key performance metrics (all the below metrics that are cumulative).
repNodeMultiCumulativeEnd – The end timestamp of when the replication node ended collecting cumulative multiple key performance metrics (all the below metrics that are cumulative).
repNodeMultiCumulativeTotalOps – The total number of multiple key operations that have been processed by the replication node since it has started.
repNodeMultiCumulativeThroughput – The sustained operations per second of multiple key operations measured by this node since it has started.
repNodeMultiCumulativeTotalRequests – The total number of multiple key operations measured by this node since it has started.
repNodeCacheSize – The size in bytes of the replication node's cache of B-tree nodes. This is calculated using the DBCacheSize utility.
repNodeConfigProperties – The set of configuration name/value pairs that the replication node is currently running with.
repNodeCollectEnvStats – True or false depending on whether the replication node is currently collecting performance statistics.
repNodeStatsInterval – The interval (in seconds) that the replication node is utilizing for aggregate statistics.
repNodeMaxTrackedLatency – The maximum number of milliseconds for which latency statistics will be tracked. For example, if this parameter is set to 1000, then any operation at the replication node that exhibits a latency of 1000 milliseconds or greater is not put into the array of metric samples for subsequent reporting.
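The effect of this parameter on the sample array can be sketched as a simple filter. The function name tracked_samples is hypothetical, and the sketch only illustrates the rule described above (samples at or above the limit are excluded), not how the replication node stores its samples.

```python
def tracked_samples(latencies_ms, max_tracked_latency_ms):
    """Keep only latency samples strictly below the tracking limit."""
    return [s for s in latencies_ms if s < max_tracked_latency_ms]

# With a limit of 1000 ms, the 1000 ms and 4200 ms samples are excluded.
print(tracked_samples([3, 250, 999, 1000, 4200], 1000))  # [3, 250, 999]
```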
repNodeJavaMiscParams – The values of the -Xms, -Xmx, and -XX:ParallelGCThreads= options as encountered when the Java VM running this replication node was booted.
repNodeLoggingConfigProps – The value of the loggingConfigProps parameter as encountered when the Java VM running this replication node was booted.
repNodeHeapSize – The current value of -Xmx for this replication node.
repNodeMountPoint – Used only for KVLite.
repNodeLatencyCeiling – The upper bound (in milliseconds) at which latency samples may be gathered at this replication node before an alert is generated. For example, if this is set to 3, then any latency sample above 3 milliseconds generates an alert.
repNodeThroughputFloor – The lower bound (in operations per second) at which throughput samples may be gathered at this replication node before an alert is generated. For example, if this is set to 300,000, then any throughput calculation at this replication node that is lower than 300,000 operations per second generates an alert.
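The two alert thresholds can be illustrated with a short sketch. It assumes one latency value and one throughput value are compared against the ceiling and the floor. Which latency statistic the replication node actually compares is not stated here, and the function name check_thresholds and the alert strings are hypothetical.

```python
def check_thresholds(latency_ms, throughput, latency_ceiling, throughput_floor):
    """Return the alerts implied by the ceiling and floor for one sample."""
    alerts = []
    if latency_ms > latency_ceiling:      # above the ceiling generates an alert
        alerts.append("latency above ceiling")
    if throughput < throughput_floor:     # below the floor generates an alert
        alerts.append("throughput below floor")
    return alerts

# Ceiling of 3 ms and floor of 300,000 operations per second, as in the examples above.
print(check_thresholds(5, 250_000, 3, 300_000))
# ['latency above ceiling', 'throughput below floor']
print(check_thresholds(2, 400_000, 3, 300_000))
# []
```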