Box command

Box plots, also called box-and-whisker plots (see Wikipedia), are a graphical way to compare distributions of data. They are non-parametric: they make no assumption about the underlying shape of the distribution.

Use the Box command to create box plots, point distributions, sideways histograms, and violin plots.

Values

Specify one column of data in the Values field.

Type

In addition to the standard Box and Whisker, there are several other types of graphs you can create using a Box command.

Whisker

By default, the Box command draws a standard box and whisker diagram, or Tukey plot. Further options in the detail view:

IQR: The box is drawn around the interquartile range (IQR), which is the span between the first and third quartiles. Outliers (points more than 1.5 times the IQR beyond the box) are drawn as filled circles, and extreme outliers (more than 3 times the IQR beyond the box) are drawn as open circles. The whiskers extend to the smallest and largest values that are not outliers. You can choose whether or not to draw the outliers.

In this graph, the region within 1.5 times the IQR of the box is shaded. Any point beyond this range is an outlier.
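The outlier rule described above can be sketched in a few lines of Python. This is only an illustration of the standard Tukey fences, not the application's own code: the function name `tukey_summary` is hypothetical, and the quartile method ("inclusive") is an assumption, since different programs compute quartiles slightly differently.

```python
import statistics


def tukey_summary(values):
    """Split values into whisker range, outliers, and extreme outliers."""
    data = sorted(values)
    # Assumption: "inclusive" quartiles; the application may use another rule.
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    inner_lo, inner_hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # outlier fences
    outer_lo, outer_hi = q1 - 3.0 * iqr, q3 + 3.0 * iqr  # extreme fences

    inside = [v for v in data if inner_lo <= v <= inner_hi]
    outliers = [
        v for v in data
        if (outer_lo <= v < inner_lo) or (inner_hi < v <= outer_hi)
    ]
    extreme = [v for v in data if v < outer_lo or v > outer_hi]
    return {
        "iqr": iqr,
        "whiskers": (inside[0], inside[-1]),  # smallest/largest non-outlier
        "outliers": outliers,                 # drawn as filled circles
        "extreme": extreme,                   # drawn as open circles
    }


result = tukey_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 40])
print(result)
```

With this sample, the value 20 falls between the two fences and 40 falls beyond the outer fence, so the whiskers stop at 1 and 9.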

Min/Max: Draw the whiskers out to the minimum and maximum values.

The same graph as above drawn with the ‘Min/Max’ option.

Percentages: Specify one or more percentages to place the whiskers.
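As a rough illustration of percentage-based whiskers, the sketch below computes whisker ends at two whole-number percentiles. The function name `percentile_whiskers` is hypothetical, and the interpolation rule ("inclusive") is an assumption; the application may interpolate differently.

```python
import statistics


def percentile_whiskers(values, low_pct, high_pct):
    """Return whisker ends at two whole-number percentiles (1 to 99)."""
    # quantiles(n=100) returns 99 cut points; index p - 1 is the p-th percentile.
    cuts = statistics.quantiles(sorted(values), n=100, method="inclusive")
    return cuts[low_pct - 1], cuts[high_pct - 1]


low, high = percentile_whiskers(list(range(101)), 5, 95)
print(low, high)
```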

Points

The Box command can draw a point cloud. You can change the Marker type and set the Point color to use a color scheme.

Here is the same example drawn with Type = ‘Points’.

Sideways Histograms

The Probability and Histogram options draw sideways histograms. You can control the Bin width.

Probability scales the height individually for each dataset. Histogram scales the height relative to the entire dataset.
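The difference between the two scalings can be sketched as follows. This is a guess at the arithmetic, not the application's documented behavior: the assumption is that Probability divides each bin count by the size of its own group, while Histogram divides by the total number of points across all groups. The names `bin_counts` and `scaled_heights` are hypothetical.

```python
import math
from collections import Counter


def bin_counts(values, bin_width):
    """Count how many values fall in each bin of the given width."""
    return dict(Counter(math.floor(v / bin_width) for v in values))


def scaled_heights(groups, bin_width, mode):
    """Scale bin counts per group ("probability") or globally ("histogram")."""
    counts = {name: bin_counts(vals, bin_width) for name, vals in groups.items()}
    grand_total = sum(len(vals) for vals in groups.values())
    heights = {}
    for name, per_bin in counts.items():
        # Assumed normalization: own group size vs. size of the whole dataset.
        denom = len(groups[name]) if mode == "probability" else grand_total
        heights[name] = {b: c / denom for b, c in per_bin.items()}
    return heights


groups = {"a": [0.2, 0.4, 1.5, 1.7], "b": [0.1, 1.2]}
print(scaled_heights(groups, 1.0, "probability"))
print(scaled_heights(groups, 1.0, "histogram"))
```

Under these assumptions, groups "a" and "b" look identical with the probability scaling, while with the histogram scaling group "a" is twice as tall because it holds twice as many points.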

Violin Plots

There are two options for violin plots, Violin PDF (scales each individually) and Violin Count (scales based on the number of points across the dataset).

When you have the same number of data points in each group, Violin PDF and Violin Count produce similar results.
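To see why equal group sizes give similar shapes, here is a minimal sketch using a hand-written Gaussian kernel density estimate. Everything here is an assumption for illustration: the kernel, the fixed bandwidth, the weighting by each group's share of the points, and the names `gaussian_kde` and `violin_widths`. The application's actual smoothing method is not documented here.

```python
import math


def gaussian_kde(values, bandwidth):
    """Return a density function that integrates to 1 over the real line."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values
        )

    return density


def violin_widths(groups, grid, bandwidth, mode):
    """Violin half-widths on a grid, scaled per group ("pdf") or by count."""
    total = sum(len(vals) for vals in groups.values())
    widths = {}
    for name, vals in groups.items():
        density = gaussian_kde(vals, bandwidth)
        # Assumed weighting: "count" scales by the group's share of all points.
        weight = 1.0 if mode == "pdf" else len(vals) / total
        widths[name] = [weight * density(x) for x in grid]
    return widths


groups = {"a": [1.0, 2.0, 3.0], "b": [2.0, 3.0, 4.0]}
grid = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
pdf = violin_widths(groups, grid, 0.5, "pdf")
count = violin_widths(groups, grid, 0.5, "count")
print(pdf["a"])
print(count["a"])
```

Because both groups hold the same number of points, every count-scaled width is the same constant fraction of the PDF-scaled width, so the violins differ only by an overall scale.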

Position

The Position indicates the location of the box on the X-axis. 

When the Position is set to the default of ‘Single value’, the number to the right indicates the numerical location on the X-axis, and each command draws one box.

When the Position is set to a numerical column, the value of the column can be used to specify multiple locations, effectively binning (i.e., grouping) the data. Thus, multiple box plots are created with one command.

Bins

When the Position is set to a numerical column, you have the option of further binning the data, using the Bins menu. For example, below the data are binned using a stride of 0.1.
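Binning by a stride amounts to grouping rows whose position falls in the same interval. The sketch below assumes bins start at zero and are labeled by an integer index; how the application aligns its bins is not stated here, and the function name `bin_by_position` is hypothetical.

```python
import math
from collections import defaultdict


def bin_by_position(positions, values, stride):
    """Group values by the stride-wide bin that contains their position."""
    bins = defaultdict(list)
    for position, value in zip(positions, values):
        # Rounding first guards against float noise such as
        # 0.3 / 0.1 == 2.9999999999999996.
        index = math.floor(round(position / stride, 9))
        bins[index].append(value)
    return dict(sorted(bins.items()))


positions = [0.01, 0.05, 0.12, 0.18, 0.3]
values = [1, 2, 3, 4, 5]
print(bin_by_position(positions, values, 0.1))
```

Each key in the result corresponds to one box; bin `k` covers positions from `k * stride` up to, but not including, `(k + 1) * stride`.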

Labels

When the Position is set to a text column, you have the option of specifying the labels on the X-axis, using the Labels menu. With the first menu, select a column that specifies which groups to include and in what order. With the second menu, optionally select a column that specifies a new label for each group.

Direction

The Direction drop-down box allows you to position the boxes on the Y-axis.

Width

You can vary the Width of the box plot.

Fill

The Fill with drop-down box has options for solid colors, patterns, gradients, and color schemes.

Summary Statistics

The Box command shows a scrollable table that lists summary statistics.

If you have a single box (one bin of data), the statistics will be in a single scrollable column.

If you have multiple boxes drawn by a command, each box is listed as a row in the table.

Directly below this table, you can specify additional percentages that you would like to compute. There is also a check box that will add a label with the numerical value of the median to the box plot.
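The one-row-per-box layout can be pictured with a small sketch. The particular statistics chosen here (count, minimum, median, maximum, mean) are an assumption for illustration and may not match the columns the application actually lists; the function name `summary_rows` is hypothetical.

```python
import statistics


def summary_rows(groups):
    """Build one row of summary statistics per box (group of values)."""
    rows = []
    for name, values in groups.items():
        rows.append({
            "box": name,
            "count": len(values),
            "min": min(values),
            "median": statistics.median(values),  # value shown by the median label
            "max": max(values),
            "mean": statistics.fmean(values),
        })
    return rows


for row in summary_rows({"a": [1, 2, 3, 4, 5], "b": [10, 20]}):
    print(row)
```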
