Monitoring Console

The monitoring console module collects a multitude of metrics from all nodes and accumulates the data on the DAS. The data is visualised by the monitoring console web-application.

The metrics include server metrics as well as metrics defined in user applications using MicroProfile Metrics and metrics derived from MicroProfile Health.

Please be aware that following details reflect the current state of a evolving product that are likely to change in the future.

Enabling the GUI

To enable the monitoring console GUI run the following asadmin command:

asadmin> set-monitoring-console-configuration --enabled=true

The console is now available at http://localhost:8080/monitoring-console/ . Like the admin GUI the monitoring console only runs on the DAS.

To stop the monitoring console GUI run:

asadmin> set-monitoring-console-configuration --enabled=false

Stopping the GUI does not disable the data collection. This is a background process that is controlled separately as described in the next section.

The monitoring console feature is only contained in Payara Server distribution.

Enabling Data Collection

The collection of monitoring data gets enabled and disabled in connection with the general configuration: MonitoringMonitoring ServiceEnabled

To remove only the monitoring data collection from the server without disabling the monitoring service itself delete the monitoring-console-core.jar from the /glassfish/modules folder of your installation(s) and restart the server.

Graphical User Interface

After enabling the monitoring console open the home page at /monitoring-console/ (usually on port 8080).

The console comes with a set of preconfigured pages each focussing on different aspects of the server.

The console will let you know that some metrics are only available when the monitoring configuration of certain server parts is set to HIGH.

Pages

The monitoring console is organized in fully customisable pages. A page is a collection of widgets arranged in columns. Usually pages are focussed on visualising metrics of a specific area of the server, like HTTP or request tracing. The monitoring console has preconfigured pages as a starting point that can be changed and extended by the user. New custom pages can be created.

Switching Pages

To switch page use the drop-down menu at the top centre of the page.

Page Navigation

Since version 5.2020.2, each page switch also updates the URL making the URL bookmarkable. URLs can be copied and shared with others to share particular pages with them.

Updating URLs after switching pages also means that once the page is changed using the dropdown navigation, the browser’s page history based navigation can be used to go back to already visited pages.

Changing Page Layout

Each page uses a column layout of 1-4 columns. The number of columns for a page can be changed using top right Layout menu.

Changing the column layout of a page does not change the preferred position of the widgets. If a widget is set to column 3 but a 2-column layout is used all widgets of the 3rd or 4th column get pushed to the 2nd column instead. This allows switching layouts without redoing the configuration entirely.

Changing Page Update Interval

By default the graphs on a page update every 2 seconds. To control the interval use top right Data Refresh menu or open the Settings and change the interval in the Global settings.

Global Settings

The interval can be paused and continued without changing the interval duration by using the paused checkbox next to the interval setting.

Adding Pages

To add a new page use the top right Page menu and select New…​.

This creates a new Unnamed page which can be renamed in the Page settings.

Page Settings

After naming the page add widgets.

Deleting a Page

To delete a page open top right Page menu, select Delete and confirm the browser confirmation dialogue.

Pages that are based on a preset cannot be deleted. They can only be reset to default preset configuration.

Reset a Page

When the configuration of a preconfigured page or any of the widgets on it was changed by the user the page can be reset by selecting Reset from the top right Page menu.

The reset will affect all widgets and the general page settings of the currently shown page. All changes made by the user will be overridden.

To reset all pages and remove all user created pages clear the browser’s local storage for the monitoring console page.

Page Synchronisation

Monitoring Console uses the browser’s local storage to store the page configuration (which includes the widgets placed on a page). This means by default changes to page configurations do not affect other users. To make pages available to others and to start from a common base configuration pages are explicitly synchronised with a persistent server page configuration.

The chosen role controls the behaviour and available options for the page synchronisation. It is the Administrator's role that is meant to manage the persistent server page configuration. The Administrator pushes local changes to the server to update the base configuration. Users and Guests pull the server configuration to update their local configuration with the base. The control remains with each user.

The Page Synchronisation dialogue shown when entering the console UI allows the user to pick the pages that should be updated (pulled).

Page Synchronisation

Based on the role the pages to synchronise are pre-selected.

  • Guest: by default updates all pages available remotely unless they are already up to date

  • Administrator and User: by default updates all pages available remotely unless a page has been locally changed

The newest version is marked by a green background for easier orientation.

This dialogue can also be opened from the Global settings by clicking the Update Local Pages button shown Page Sync property.

Global Page Synchronisation Settings

Administrators also find a button to Update Remote Pages with any changes applied to local changes. This is only needed for pages that are not using _auto_matic updates.

To share a page that so far is not part of the persistent remote configuration open the Page settings and look for the Sync property:

Page Synchronisation Settings

Users and Administrators can both Push a page to update the persistent remote configuration and Pull the remote configuration to update the local page. Administrators can also enable or disable the automatic page synchronisation. When enabled any local change done by an Administrator is automatically pushed to the remote configuration and can be pulled by other users.

Changing User Role

The monitoring console can be used in one of three roles:

  • Guest: A user that uses the console only on occasion to check existing metrics on existing pages ("read-only" user).

  • User: A user that uses the console to experiment with its features without affecting others.

  • Administrator: A user that uses the console to configure the configuration that is the basis for other users.

When no role is set for the session the user is asked to choose a role when opening the monitoring console.

Role Selection

The selection can be changed at any time in the Global settings for Role.

Role Setting

Page Settings

The individual page settings are explained in more detail in the context of the operations they are involved in. This table gives an additional overview.

Table 1. Page Settings
Property Default Description

Name

-

Name of the page; only editable for pages that do not originate from a preset

Page Rotation

yes

Whether or not to include the page in the page rotation

Type

Manual

Manual page (widgets added by the user) or Query page (widgets added from query matches)

Add Widgets (Manual only)

-

Selection of the series to add as a widget to the current page

Max Size (Query only)

-

Maximum number of widgets to place on the page from query matches

Query Series (Query only)

-

The series pattern to use to query for series to place on the page

Query Interval (Query only)

-

Number of seconds between repopulating the page with widgets from query matches, can be given using any time unit, e.g. 1min, 30sec, 1h

Sync (non Guest role only)

-

Push to update the shared state with the local page, Pull to update the local state with the shared state

Sync (Admin role only) auto

yes

Whether or not to push changes to the local page to the shared state automatically when using the admin role

Widgets

Widgets are the components that can be placed on a page to show a particular metric. While there are different types of widgets usually a widget shows a line graph of one or more metrics for the entire cluster.

Adding a Widget

Widgets can be added to any page. Open the side panel by clicking on Settings top right menu.

Make sure the page is of Type Manual. Pages of type Query can be changed to Manual as well which keeps them as they are for further manual customisation.

Use the Select metric(s) button of the Add Widgets property of the Page settings to open the metric selection dialog.

Page Settings

In the selection dialog use the filters to find the metric you are looking for.

Metric Selection Dialog

When the metric is found in the list of matches check it to add it to the selection. After selecting one or more metrics to show in the widget confirm the selection by pressing Ok button of the dialog.

Table 2. Selection Filters
Filter Description

Source

Filter by the the technology or area that provided the metric within the server

Namespace

Filter by the namespace of a metric; this is the tag called ns at the start of a full series key ns:x MetricName

Group

Filter by the group tag of a metric; this is the tag called @ often found in the middle of a full series key ns:x @:y MetricName

Metric

Filter by the metric name; this is last and only mandatory part of a full series key MetricName

Series

Free text search by substring occuring in the full series key

MicroProfile Application

Filter by the registry name containing a MircoProfile metric (which is equivalent with the application name)

MicroProfile Type

Filter by the metric type as provided with MicroProfile metadata

MicroProfile Property

Filter by the property of a MircoProfile metric; These are the different values available for the same metric object as individual series, for example the current, min and max counts of a counter. Note that not all actual properties of metric objects are available as metric series in the Monitoring Console.

MicroProfile Name

Free text search for the name provided with MicroProfile metadata

MicroProfile Display Name

Free text search for the display name provided with MicroProfile metadata (which defaults to the MicroProfile name if non was provided)

MicroProfile Description

Free text search for the description provided with MicroProfile metadata

Filters that cannot be used in combination with chosen options of other filters are not shown and have no effect. They reappear as soon as they can be used.

When added, widgets are automatically placed in a column so that they are evenly distributed on the page. Use the side panel widget settings to change column, span or item position of the widget or use the quick menu for the widget by clicking the cog symbol in the top right corner of the widget and select the action you want to apply to the widget.

The widget series name field allows you to enter the name manually for advanced use cases or in case a metric is not available now but known to become available at some point. See Series with Wildcards for more details.

Populate Widgets from Queries

The second form of populating a page is to not manually add widgets to it but to define a series query. Matching metrics will be used to automatically populate the page with widgets. To use this option change the page Type to Query in the Page settings.

Page Settings

At a minimum the Query Series has to be defined. This should be a series using a pattern containing wild-cards (see Series with Wildcards for more details). The matching series are ordered and placed on the page in intervals. The interval is controlled by the Query Interval property. Usually the interval is between 1 minute and 1 hour. If no interval is given the page will first update after 365 days.

The Update button can be used to force a page update before the next interval starts and ends. Changes to the Query Series, Query Interval or Max Size do not have immediate effect and can equally applied immediately using the Update button.

Depending on the Query Series and its potential to match vast numbers of metrics it can be a good idea to limit the maximum number of widgets automatically placed on the page by the population from query using the Max Size property.

Configuring a Widget

To configure a widget select it by clicking on its title.

This opens the side panel and shows the detail configuration of the widget. The side panel can be closed by either deselecting the widget, by clicking on its title again, or by clicking the Settings menu or its Hide item.

If the settings have been opened explicitly through the Settings menu an empty selection will not close the settings side panel.

The Widget settings are concerned with the widgets position within the page.

Widget Settings
Table 3. Widget Settings
Property Default Description

Display Name

(undefined)

Optional name to customise the displayed widget title. If no name is given the Data Series is shown.

Type

Time Curve

Changes the type of graph used to show the metric. This also affects which data of the metric the widget is focussed on. Possible choices: Time Curve: a line chart of current values (usually last 60 seconds), Range Indicator: show distributions of values (of last 60 seconds), Alerts: shows a table of alerts, Annotations: shows a table or list of annotations; these are key-value attributes associated with a metric series, RAG Status: Shows a red, amber, green tableau for the status of the series per instance. This should only be used with series of Data Unit of type Up/Down

Mode

(Default)

If available for the type of widget the display mode can be switched between Table and List. Table view is best fitted for annotations that share common attributes, lists are better fitted for annotations with potentially differing attributes.

Column

1

The column in the layout the widget prefers to be in.

Item

1

Within columns widgets are sorted by their item value, lowest values first. The item position is automatically updated during page layout should it be needed.

Size

1 / 1

How many width (number of columns) and height (number of rows) the widget should span. Note that this is a target size. If page number of columns is reduced the width might be reduced to match the page layout.

The Data settings are concerned with what and how the data is shown in the graph of a widget.

Widget Data Settings
Table 4. Data Settings
Property Default Description

Series

-

The key(s) of the data series shown in the graph. Use Change metric(s) button to change metrics linked to the widget. Note that only series of the same unit (same Y-axis) are meant to be shown in the same graph. Multiple series of different Y-axis are currently not supported.

Unit

Count

The unit controls how the raw number of a metric is interpreted and represented e.g. when displaying in axis labels. The unit also controls what values are accepted as input for numbers, like thresholds, that are on the same axis. Possible units are: Count, Milliseconds, Nanoseconds, Bytes, Percentage, Up/Down (0 is DOWN, anything above is UP).

Unit 1/sec

false

Whether or not to show the metric as a change per second (delta between two points in the series normalised to average delta per second).

Upscaling

(undefined)

When defined all values are multiplied by this factor. This can e.g. be used to move a decimal range between zero and one to a percentage range 0-100.

Upscaling decimal value

false

Check to mark the metric as one that is provided as a decimal value during collection (floats or doubles and alike). This will then revert the conversion applied during collection which converts decimal values to an integer number with 4 available decimal digits by multiplying them by 10,000. So this divides values by 10,000 again to move them back into their original value range.

Extra Lines

(none checked)

Check to add the all time minimum, maximum or average line to the graph

Lines

(none checked)

Options on how to display the current value line in the graph: Check Points to show individual data points on the line, check Curvy to use B├ęzier curves instead of straight line connections.

Background

Fill

When Fill is checked the area between line and axis is filled. The background transparency can be configured globally in the Colors settings.

X-Axis

Labels

Whether or not to show labels for the x-axis.

Y-Axis

(undefined)

Set a minimum or maximum number for the y-axis instead. When not set range is derived automatically from the data points.

Coloring

Instance Name

Used to select the colours the data is shown with. Instance Name will show data in the same colour that belongs to the same instance. Series Name shows the metric series in the same colour (independent of instance). This is individual to each widget. Result Set Index will show each matching series in a different colour. Instance and Series Name shows the same instance and series in the same colour. This is global. This means another widget using this colouring with the same instance and series will have the same colour as well.

Fields

(blank)

A comma or space separated list of fields to show for an annotation list or table. The given field order is reflected in the view.

Annotations

(checked)

Whether or not to show annotations for an alert table.

Configuring Widget Decorations

Decorations are visual helpers that can be added to a graph of a widget in order to make it easier to for the user to quickly understand the data displayed. In particular these are reference lines.

Widget Decoration Settings
Property Default Description

Waterline

(undefined)

Adds a simple reference line that could mark a upper or lower limit. Enter in unit of y-axis, e.g. 20ms, 5s, 2GB, 4%, 123

Threshold Reference

Off

Controls what value the Alarming Threshold and Critical Threshold are compared to. Usually this is the most recent value but in some cases the all time minimum, maximum or average are useful too.

Alarming Threshold

(undefined)

The limit for the "Alarming" state. If Critical Threshold is undefined or if it’s value is larger than this value the alarming threshold is exceeded if the reference value is larger than the threshold. Otherwise, if the Critical Threshold is lower than the Alarming Threshold, the threshold becomes 'inverse', and is exceeded when the reference value falls below the alarming value. When exceeded the status becomes "Alarming". Enter in unit of y-axis, e.g. 20ms, 5s, 2GB, 4%, 123

Critical Threshold

(undefined)

The limit of the "Critical" state. If Alarming Threshold is undefined or it’s value is smaller than this value the critical threshold is exceeded if the reference value is larger then the threshold. Otherwise, if the Critical Threshold is lower than the Alarming Threshold, the threshold becomes 'inverse', and is exceeded when the reference value falls below the critical value. When exceeded the status becomes "Critical", overrides status "Alarming". Enter in unit of y-axis, e.g. 20ms, 5s, 2GB, 4%, 123

Besides adding a reference line to the graph the critical and alarming thresholds affect the status evaluation.

Use the colour input to individually override the widget’s colour for waterline, alarming line or critical line in graphs. Press the button right next to it which shows the default colour as configured in the Colors settings to reset the individual colour to use the default setting instead. A colour set to the default changes with the default.

Configuring Widget Status Messages

The current value of a metric can automatically be categorised as a certain status. For example based on the threshold values of decorations or by encountered errors or missing data. The Status settings allows to set a custom message for a particular status.

Widget Status Settings
Property Default Description

"No Data"

(undefined)

The message to show when no data for a metric is available. For example to hint about configurations needed to make the metric available.

"Alarming"

(undefined)

The message to show when the threshold got exceeded and the status evaluates to Alarming.

"Critical"

(undefined)

The message to show when the threshold got exceeded and the status evaluates to Critical.

The messages support simple markup using *bold* and _italic_.

Configuring Widget Alerts

The values of metric series can be tracked by the server to report values that out of a healthy range. This is represented by states:

  • Red: values in this range indicate an unhealthy state

  • Amber: values in this range indicate a degraded state

  • Green: values in this range indicate a healthy state

  • White: values in this range are in no particular state (neither good or bad, maybe undecidable)

Widget Alerts Settings
Property Default Description

Show

(all checked)

Filters that allow to control the type of alerts that should be shown. Include Amber and/or Red alerts, Ongoing and/or Stopped alerts, Acknowledged and/or Unacknowledged alerts. If for any of these 3 groups of pairs both are unchecked the list will be empty. Widgets of type Time Curve automatically exclude stopped and acknowledged alerts to avoid cluttering the graph with alerts that aren’t relevant any more. Use a widget of type Alerts Table to show all alerts according to the filter at all times.

Handling Alerts

Alerts can occur when a watch is configured for a metric series. A watch describes the conditions and thresholds used to determine a state from the recent values of a metric series. It is either Red (unhealthy), Amber (degraded), Green (healthy) or White (undetermined). A Red or Amber state causes an alert that is shown on top of the graph.

Alert for a line graph

When an alert occurred for a metric shown in a graph the background of the line causing the alert is coloured in Red or Amber default colour until the alert is stopped.

Alerts are considered ongoing as long as the state is Red or Amber. If it is evaluated to Green or White the alert stops. Stopped alerts are no longer shown on top of line graphs. Another way to remove an alert from the line graph is to acknowledge the alert by checking its checkbox. To see even stopped or acknowledged alerts add a widget of type Alert Table.

Alert Table

By default an alert table shows all alerts including stopped and acknowledged alerts. Use the widget Alerts settings to exclude alerts with certain properties.

Alert table shows the overall state of each instance in the table’s legend.

Alerts settings can equally be used for line graphs to create a more restrictive setting.

Currently watches are only added programmatically as part of the server. User defined watches are not yet possible.

Removing a Widget

Any widget can be removed from any page. Click the cog symbol in the top right of the widget, select Remove and confirm the browser dialogue.

Remember that a page originating from a preset can be reset to recover a widget that was removed accidentally.

Using Advanced Data Series

A series is a name or key for a particular metric shown in a widget. Each metric has a unique key or series name. Usually these names use tags to categorise metrics into groups. For example:

ns:jvm MetricName
ns:jvm @:groupName MetricName

The patterns shown above exemplify conventions used throughout Payara monitoring console. The first grouping level usually is the ns (short for namespace), the second the @ (short for group). In the example the metrics belong to the jvm namespace and the groupName group within that namespace.

When metrics are selected using the dropdowns these details are filled in for the user by selecting items from the dropdowns. Alternatively, a user may choose to enter the series key manually which allows using a wild-card star instead of either a tag value (like jvm) or the MetricName. For example:

ns:jvm @:* Duration

This will then match all existing series that match the pattern. This feature can be used to automatically show all metrics of a certain pattern in a single widget as it is done by the request tracing page.

Global Configurations

The configuration that controls the monitoring console UI. It is stored in the browser’s local storage. For manual sharing the configuration can be exported and imported in the form of JSON files. Server integrated sharing can be done using the built in page synchronisation which is a two-way update mechanism to update either the local configuration with server configuration or the other way around which allows the sharing of changes made by one user with other users in a more convenient way.

Table 5. Global Settings
Property Default Description

Data Refresh

2s

Interval duration used to update the currently shown page with new data (poll duration). To pause any page update check enabled checkbox.

Page Rotation

(undefined duration)

Duration each page is shown during page rotation. To run a rotation enter a interval duration using a number with time unit and check the enabled checkbox. To stop rotation un-check the enabled checkbox.

Role

User

Select the used user role. See section Changing User Role for details.

Page Sync

-

Update Local Pages button opens the dialog to pull page configuration updates from server (not available to Guest role). Update Remote Pages button pushes changes to local pages to remote (server) if such a page already exist remotely (Administrator role only).

Watches

-

Just a short-cut to the watches configuration page

Export Configuration

A configuration contains all pages. All changes made to a page or widget setup are contained in this configuration. To download this configuration as JSON file select Export from the Settings top right menu.

Import Configuration

To import a previously exported JSON file select Import in the top right Settings menu and select the JSON file to import. This feature can be used to version and distribute configurations for similar installations.

Importing files will override any changes made to pages existing in both the current configuration and the imported file. Importing pages will not remove pages that are not contained in the imported file.

Page Rotation

Monitoring console is built with the intention of being used to display metrics on status monitors. To allow viewing a variety of pages and their metrics on a single screen the pages can be automatically rotated showing each page for a configured interval duration.

To enable page rotation open the Settings and check the enabled checkbox in the Global Settings Page Rotation property. This starts the rotation immediately.

Global Settings

The duration can be adjusted changing the text field of the Page Rotation to another time value. Use units to express the duration, as for example: 1min, 30sec. Changes take effect immediately.

The rotation will show each page for the configured duration. Pages can be excluded from rotation by switching to the page using the page menu. Open the Settings and uncheck the Include in Rotation checkbox in the Page settings.

Color Configurations

Newer versions of the Monitoring Console allow to configure the colours used in graphs. There is a global colour configuration affecting all graphs.

Widget Status Settings
Table 6. Colors Settings
Property Default Description

Scheme

-

The selected scheme is applied to set the colours palette for Data as well as Defaults and Opacity. This is only a convenient way to switch all of these between different settings. The individual properties can be changed afterwards as usual without affecting the scheme.

Data # (Palette)

(colours of Payara scheme)

The list of colours to use when colouring data series in graphs. This is the line colour in line charts or the bar colour in bar charts. Background will use the same colour with Opacity. Add new colours by using the + button. The - button removes the last colour of the list. If there are less colours in this palette than needed for different lines in a graph new colours are derived based on this palette. The derived colours focus on bright colours with high contrast but might not always fit well with the rest of the palette. In such case consider adding more colours to the palette.

Defaults

(default of Payara scheme)

Sets the default colours for Waterline, Alarming and Critical as well as Red, Amber, Green and White state lines, backgrounds and texts. Some of these can be individually overridden in the widget’s configuration for these lines.

Opacity

10

Background opacity for lines or bars in graphs, 0 (fully transparent) to 100 (solid).

Thickness

(3)

Line width on a scale 1-8. Each point corresponds to 0.5px with. Default for lines in a line graph is 3, so 1.5px.

User Defined Watches

A watch describes the conditions to meet to start and stop alerts in relation to a specific metric. For example to get an alert every time an HTTP request causes a server error (responds with any of the 5xx status codes) a custom watch is installed. This is very similar to a custom health check that could be linked to any metric available where the thresholds are defined as part of the watch. There are many varieties of conditions that can be formulated. Each watch has a start condition which when met starts a new alert. Optionally a stop condition can be given which when met will stop the alert. If no such stop condition is given an alert stops as soon as its start condition is no longer met. Such pairs of start and stop conditions can be defined for 3 levels:

  • Unhealthy: corresponds to a "red" alert

  • Degraded: corresponds to an "amber" alert

  • Healthy: conditions for particularly good values, these will not cause alerts but can be used to mark a healthy range in graphs.

Each watch must have at least an Unhealthy or a Degraded condition. When both are given alerts can transition between Degraded and Unhealthy as the same issue or alert. Unhealthy takes precedence over Degraded and Healthy, Degraded takes precedence over Healthy should their condition be met at the same time. Should a Degraded alert be acknowledged but it transitions to Unhealthy afterwards the acknowledgement is reset so this change is not missed.

Creating a Watch

To create custom watches go to the Watches page using the top page menu or the Global settings.

Creating a Watch

The above example shows a relatively simple watch named HTTP server errors. The name of the watch can be anything as long as it is unique. The watch should have both a Unhealthy and a Degraded level.

The field after the If holds the name of the metric series the watch is linked to, in this case the ns:http ServerCount5xx metric, as shown on the HTTP page. The in clause specifies the unit type of the given metric, here it is the Count of 5xx responses. Following is the comparison operator is selected, here >. This is followed by the threshold value, here 3 or 1, and the scope of the comparison, here in sample. As both do not use a stop condition the checkbox after until is not checked.

The below table describes possible choices for each step of a watch condition.

Table 7. Watch Condition Segments
Segment Start/Stop Description

If metric-series

Both

Give the name of the metric series to watch. As monitoring is a generic, dynamic and open system there is no definitve pre-defined list of available metrics. The metrics currently available are listed at http://localhost:8080/monitoring-console/api/series/ (assuming DAS runs on localhost).

in unit

Both

Select the unit for the given metric series. This should be the same unit given in widget configuration Data property Unit. Depending on the chosen unit the threshold can be given in values of this unit, e.g. when selecting Milliseconds the threshold can be given as 1s for 1 second.

is operator

Start

Select the comparison operator to use when comparing the actual value(s) to the threshold. The comparison is of form actual operator threshold.

is threshold

Start

The numeric threshold. The upper/lower limit. With the right unit selected the number can be given with a unit such as 5min for time units, or 4KB for bytes.

is scope

Start

Select a scopes for the actual values that are compared to the threshold. The semantics of the different scoped are described in a separate table below.

is scope-length

Start

The length defines how many of the most recent data points of the watched metric are included in the scope of the comparison. A number with time unit refers to the number of the points, a number with time unit to the duration of points to consider for the scope.

until operator

Stop

same as for start

until scope

Stop

same as for start

until scope-length

Stop

same as for start

The below table describes the semantics of the different types of comparison scopes.

Table 8. Watch Condition Scopes
Scope Description

for last

The comparison must be true for each point in the scope for the condition to be met. Similar to a all logic. The number of points included is specified afterwards. This is either a bare number referring number of points to include or a number with a time unit referring to a duration where all points within the duration from most recent point backwards are included in the scope.

for average of last

The comparison must be true for the single average value of all points in the scope for the condition to be met. The number of points included is specified afterwards as number of duration from which to consider points.

in last

The comparison must be true for at least one of the points included in the scope for the condition to be met. Similar to an any logic. The number of points included is specified afterwards as number of duration from which to consider points.

in sample

The comparison must be true for at least one of the points available for the condition to be met. For this scope type no length is given as all available points are considered. This can vary but usually consists of the last minute of data.

After composing the watch conditions press Save or Update to install the watch. If a watch of the same name already exists the existing watch is replaced. Watches that originate from server modules cannot be replaced. To change them disable the existing watch and click to edit it, this will create a duplicate with a different name that can be adjusted freely.

Editing a Watch

To edit a previously created watch click on its name in the list of watches or chose Edit from the cogs menu on the right.

Watches that originate from server modules cannot be edited. When clicking their names or when choosing Duplicate from their cogs menu a duplicate with a different name is created. The suggested name Copy of {original name} can be changed as long as it is different to existing watch names.

When changing the name of an edited watch to an existing name this overrides that watch. In other words watches are always identified by their name.

Disabling or Enabling a Watch

Go to the Watches page using the top page menu or the Global settings.

List of Watches

The status of each watch is reflected by the checkbox left of the watches name, checked for enabled, unchecked for disabled. Disabled or stopped watches names are also greyed out. To disable an enabled watch un-check the checkbox or use the Disable item from the cogs menu on the right of the watch widget. To enable a disabled watch check the checkbox or use the Enable item from the cogs menu. The change has immediate effect.

Preconfigured Pages

Core Page

The Core page gives an overview of some of the important instance metrics. Each instance in the grid is shown separately.

Core Page
  • Active Sessions: Total number of active session for the instance.

  • Thread Count: Total number of threads used by the instance.

  • Request Count: Number of HTTP request per second processed by the instance.

  • Heap Usage: Percentage of the current instance maximum heap memory already used.

  • CPU Usage: Percentage CPU usage by the instance (not the machine but the process).

  • Thread Pool Current Thread Usage: Percentage of HTTP thread pool threads currently in use by the instance.

HTTP Page

The HTTP page gives a rough overview of the state of the HTTP services of each instance.

HTTP Page
  • Connection Queue Count Open Connections: Total number of open connection in the connection queue

  • Thread Pool Current Threads Busy: Number of threads in the HTTP thread pool currently processing a HTTP request.

  • Server Count2xx: Requests per seconds and instance responding with Success.

  • Server Count3xx: Requests per seconds and instance responding with Redirection.

  • Server Count4xx: Requests per seconds and instance responding with Client Error.

  • Server Count5xx: Requests per seconds and instance responding with Server Error.

JVM Page

The JVM pages gives a general overview of the state of the JVM running Payara Server Community.

JVM Page
  • Used Heap: Total size of the used heap memory

  • Heap Size: Size of the committed (reserved) heap memory (OS level)

  • Loaded Classes: Total number of loaded classes

  • Unloaded Classes: Total number of unloaded classes

  • Live Threads: Total number of threads used by the JVM (including daemon threads)

  • Daemon Threads: Total number of daemon threads used by the JVM

Monitoring Page

A page that shows key metrics on the monitoring system itself that feeds the monitoring console with data. This is mostly useful to identify problems with the monitoring system or to cross check that unexpected metric values aren’t caused by general problems with monitoring.

Monitoring Page
  • Source Times: Shows the duration it took to collect metric data from the individual sources within the server. Durations should be well below a second. Usually in two digit millisecond range.

  • Sources: Number of sources metrics are collected from in a particular second. This varies as some sources are collected in an interval of several seconds.

  • Sources with Errors: Shows how many of the sources that were collected caused an error during collection that potentially ends the collection of that source so that some metrics usually provided by that source might no longer be collected. This can explain missing metrics.

  • Metrics Time: The total time it took to collect all sources. As collection happens once every second this should be well below 1 second. This metric is watched and alerts are created should the time exceed 600ms for multiple times in a row. This time can vary a lot depending on the server configurations as sources are enabled and disabled. It can also be influenced strongly by the deployed applications as they can contribute metrics in large numbers.

  • Watches Time: The total time it took to collect and evaluate all watches. This time is less critical but it should usually be well below 1 second. Typically in the range of two digit milliseconds.

  • Alerts: The total number of ongoing (active) alerts.

Health Checks Page

A page that gives an overview of the state of the server health checks. The individual checks need to be enabled in the server HealthCheck configuration.

It is not required to enabled the HealthCheck service in the general tab in case the checks should only be enabled for metrics based monitoring without causing health check notifier messages.
Health Checks Page
  • CPU: CPU Health Check, percentage of CPU used by the server instance during last 4 seconds

  • GC: Garbage Collection Health Check, percentage of time used to collect garbage objects during the last 4 seconds

  • Memory: Machine Memory Usage Health Check, percentage of the total available physical memory used

  • Heap: Heap Memory Usage Health Check, percent of total available heap memory used by the JVM

  • MP Health: MicroProfile Health Check, percent of instances that respond with UP status to a liveliness health endpoint request (this includes the DAS that is the only instance that is performing this check every 12 seconds)

  • Connection Pool: Connection Pool Health Check, percent of connections currently used for each JDBC connection pool

  • Alerts: A list of all health check related alerts

Thread Health Page

The Threads page gives detailed information on thread health of the cluster and lists information on stuck or hogging threads. Total number of live and daemon threads are given for reference.

Thread Health Page
  • Stuck Thread Incidents: Table listing the most recent information for threads that are considered stuck. This is the case if the time passed since a thread started a not yet finished unit of work becomes longer than the set threshold in the Stuck Threads health check configuration. This could be because it is waiting at a lock or the operation takes a considerable amount of time.

  • Hogging Thread Incidents: Table listing the most recent information for threads that are considered hogging. This is the case if the thread’s use of the CPU in percent points is higher than the threshold value configured in the Hogging Threads health check configuration for n times in a row, where n is 1 + the configured number of retries.

  • Live Threads: Total number of threads used by the JVM (including daemon threads)

  • Daemon Threads: Total number of daemon threads used by the JVM

Be aware that the number shown in the example image are not characteristic. For demonstration purposes the thresholds have been set uncharacteristically low to be able to "force" entries. Usually the threshold for stuck threads should be in region of some seconds to a few minutes. The threshold for hogging threads should be well above 90% with a few retries.

Request Tracing Page

The Request Tracing page overview page shows traces that exceeded the set threshold.

If this page does not show any data, no request has exceeded the threshold or the request tracing has not been enabled. Check ConfigurationRequest Tracing.

The Trace Duration Range widget shows all traces for the same method as a single bar. The length and position of the bar gives the range of total trace duration for all the occurrences that exceeded the threshold. The bar starts at the minimum duration observed and ends at the maximum duration observed in the last minute. The legend shows the average duration and the method traced (innermost trace span operation)

Request Tracing Page

When all time minimum and/or maximum are added in the Widget Data settings the bar starts and/or ends at these "all time" values instead of the minimum and/or maximum within the last minute.

The Trace Duration Above Threshold widget shows a graph where each point marks the duration of a "slow" request at the moment the requests occurred. Possible alerts are shown on top of the graph.

The Trace Data view adds tracing details for the "slow" requests that can be sorted by wall-time or value using the widget menu (gear icon in top right corner of the widget).

By clicking on one of the bars in the overview’s Trace Duration Range widget the details of all kept traces for this method can be viewed. This data corresponds to the traces stored in the trace-store. For each trace, each of the spans corresponds to a bar in this graph.

Request Tracing Details Page

By default the traces are sorted longest (slowest) to shortest (fastest) total duration. On the time axis the spans are shown relative to the start of the trace.

When the side panel is opened the details of each trace span can be viewed by clicking on the span bar.

The legend gives average durations for each of the spans occurring for the traced method.

The detail view does not automatically refresh. To refresh click the refresh icon next to the Sorting menu in the top right of the widget.

The spans can be sorted by wall time by selecting Sort by Wall Time or by duration by selecting Sort By Duration in the Sorting menu in the top right of the widget.

To get back to the overview close the details by clicking the x-icon on the right side of the Sorting menu.

Be aware that the graphs in the example images above show unrealistically low numbers due to the way that request tracing was configured. In a more realistic configurations the durations are more likely to be in range of few hundred milliseconds to some seconds.

SQL Page

The SQL page gives insight into slow SQL queries for all connection pools configured in Payara Server Community.

To capture slow SQL queries the JDBC Connection Pool configuration’s Slow Query Log Threshold in the Advanced tab needs to be set to a positive number.
Slow SQL Page
  • Slow SQL Queries: Shows SQL details for queries that were too slow. Value refers to the duration of the SQL query.

  • Worst SQL Execution Time: Shows SQL query activity. Each data point gives the duration of the slowest query run during last second. Zero means no SQL query was run.

  • Slow SQL Alerts: A list of alerts caused by too slow SQL queries in reference to the Slow Query Log Threshold given in the pool configuration. Single outliers cause a degraded state (amber alerts), outlier groups cause unhealthy state (red alerts).

Alerts Page

The main use of the Alerts page is to check for any server alerts. The page is split into two lists: the upper list containing all ongoing alerts, and the lower list containing past alerts. As always for alert lists both lists are sorted starting with the most recent severe alert progressing to the oldest least severe alert. Annotations associated with alerts are hidden in this view to make room for more alert entries.

Alerts Page

Past alerts can be managed by acknowledging the alerts which makes them disappear from the list.

Application Metrics Page

This page uses a query to populate the page with widgets for series that originate from the MicroProfile Metrics of the server and applications deployed on the server. As this potentially are very many series the page by default is limited to at most 32 of them. The automatic placement will chose those series that show most dynamic. By default the widget placement is updated every minute. This means the order and selection of widgets placed on the page can change once a minute based on their most recent dynamics. The defaults can be customised using the Page settings.

RAG Status Page

Like the application metrics page this page uses a query to populate the page with widgets automatically. The metric series originate from the MicroProfile Health checks of the server and any of the deployed applications.

The purpose of this page is to give a quick strong visual indication of the health status of the cluster based on MicroProfile Health checks.

Each MicroProfile Health check is represented as a widget which shows the current status of the check in red (down for a while), amber (just went down) or green (up) for each of the instances where the check is performed.

RAG Status Page

In the example these are the checks Service1 and Service2 of the health-domo-1.0-SNAPSHOT application.

In addition to the MicroProfile Health checks there are aggregate metric series for each type (Health, Liveness, Readiness) per application and for the cluster as well as an aggregate Overall combining the three types health, liveness, readiness into a single metric series per application and for the cluster.

User Defined Metrics

Payara monitoring console is build with the intention to allow user defined metrics to be visualised. So far these include the MicroProfile Metrics that might originate from a deployed application and other extensions of the server. These are included in the available metrics automatically.

Open the Settings click Select metric(s) button for Add Widgets property in the Page settings and select the MicroProfile Metrics as Source in the selection dialog.

Page Selection for MP Metrics

Eventual MP tags used become the @ group of the shown series key, the MP metric stays the metric name.

User Defined Alerts

At current stage alerts cannot be defined in the monitoring console application by the user. While the infrastructure for custom alerts is available a GUI has not been added yet. That means so far alerts are only configured for health checks and the monitoring collection.