DataGraph Forums › Technical Support › Support Desk › Multiple significant differences
Tagged: significant difference, statistics
Hi,
I was wondering what the most elegant, easiest, and hopefully automated way is to show multiple significant differences between groups in DataGraph.
I have 7 groups of data, and I need to show the significant differences between them pairwise.
Thanks,
Kourosh
Hello Kourosh,
We uploaded a file to the examples that generates the mean and standard deviation for 7 groups of data.
The file uses expression columns to create a list of the unique combinations (pairs of groups). Here are the expressions that generate the list in two columns, A and B.
A = k-ceil((-1 + sqrt(1+8*(n-#+1)))/2)
B = A+A.count
where
k = number of unique groups (in this case 7)
n = number of unique combinations = k*(k-1)/2 (21 when k = 7)
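As a rough check on these expressions, here is a minimal Python sketch (Python only because the Tukey analysis later in the thread was done in Python). It evaluates the A expression row by row, taking `#` to be the 1-based row index. The exact behavior of `A.count` is not spelled out in the thread, so the sketch computes B directly as A plus the row's position within its run of equal A values; that is an assumption about what the B column is meant to hold. The function name `pair_indices` is hypothetical.

```python
import math


def pair_indices(k):
    """Return one (A, B) pair of 1-based group indices per row for k groups."""
    n = k * (k - 1) // 2  # number of unique combinations
    rows = []
    prev_a, pos = None, 0
    for row in range(1, n + 1):  # row plays the role of '#'
        # A, transcribed from the DataGraph expression above
        a = k - math.ceil((-1 + math.sqrt(1 + 8 * (n - row + 1))) / 2)
        # Assumption: B = A + (position of this row within its run of equal A)
        pos = pos + 1 if a == prev_a else 1
        prev_a = a
        rows.append((a, a + pos))
    return rows


print(pair_indices(4))  # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

Every pair appears exactly once with A < B, so for k = 7 the 21 rows start with (1, 2) through (1, 7) and end with (6, 7).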
Once columns A and B are populated, their values are used as row indices to pull the associated means (e.g., ColumnName(index)), so that each pair of means is in one row of the table.
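The lookup step can be sketched the same way. This is a hypothetical illustration, not the example file itself: a plain Python list stands in for the column of means, `itertools.combinations` generates the pairs instead of the A and B expressions (it yields them in the same order), and `pair_table` and the sample means are made up.

```python
from itertools import combinations


def pair_table(means):
    """One row per unique pair: (A, B, mean of A, mean of B, difference)."""
    table = []
    for a, b in combinations(range(1, len(means) + 1), 2):
        mean_a = means[a - 1]  # analogous to ColumnName(A): 1-based row lookup
        mean_b = means[b - 1]  # analogous to ColumnName(B)
        table.append((a, b, mean_a, mean_b, mean_a - mean_b))
    return table


# Hypothetical means for three groups
for row in pair_table([10.0, 12.5, 11.0]):
    print(row)
```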
Here is a visual representation of each comparison.
This could be a starting point for other statistical tests, but before going much further we wanted to get your feedback.
Thank you for your quick response and this amazing example. This is indeed very useful and a very good starting point.
Looking at this example, what I want is a bar plot showing the mean and std of each group (1 to 7), and then to show whether the differences between them are statistically significant, preferably by labeling them (similar to the attached figure).
I have already done the statistical analysis with Tukey’s HSD in Python (so I have the F and p-values, and I know which pairs of means differ significantly). I would like to find the easiest way to make a figure similar to the attached one in DataGraph.
Thanks,
Kourosh
The above figure is from this reference:
M. Aliabouzar et al., “Acoustic Droplet Vaporization in Acoustically Responsive Scaffolds: Effects of Frequency of Excitation, Volume Fraction and Threshold Determination Method,” Ultrasound Med. Biol., vol. 45, no. 12, pp. 3246–3260, 2019.
Hi,
I just wanted to follow up on this and see if you have any suggestions for achieving this most efficiently.
Thanks,
Kourosh
Hi Kourosh,
Can you share a screenshot of the output data from your Python code? I think that would help a lot in seeing how best to create these graphs in DataGraph.
Hi,
Thanks for your response. See the attached screenshot.
In the attached example, I will have a bar plot with 7 bars corresponding to t01, t02, …, t07. I want to show the following:
* The differences between t01 and each of t02, t03, …, t07 are statistically significant.
* The difference between t02 and t05 is statistically significant.
To do so, I want to use an approach similar to the one in my previous post (Greek letters above each bar).
Thanks,
Kourosh
I worked through a simplified example, using made-up data, to show how you could do this.
The made-up data has a mean for each group. A second group of columns lists each combination, with the difference between the means and a reject column for each combination. The third group, in green, shows the number of rejects for each group.
(FYI: the total column is calculated using two Pivot commands that count the rejects for each group. Their results have to be summed to get the overall total.)
Once you have the total rejects, you can use a Points command to add the labels. The Points x and y locations are the same as those of the bars. This lets you specify the label from the data, while setting the Marker to ‘No Point’ so only the label is drawn. Each Points command draws one of the labels: alpha, beta, or gamma.
Here is the expanded view of the Points command for the alpha character, which is drawn only where the total is greater than or equal to one. The label is the same for every bar that qualifies.
Notice that the align options are also set to place the label ‘Center&Above’ the point, and a variable is used to set the offset in the y direction.
For the beta labels, the settings are as follows:
Gamma would be the same, except that 2 is replaced with 3 and the label is updated.
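The label logic above amounts to one threshold per Points command. Here is a minimal Python sketch of which bars each command would label, assuming the thresholds described in the post (alpha at total ≥ 1, beta at ≥ 2, gamma at ≥ 3). The totals come from the t01..t07 pairs listed earlier in the thread (t01 against every other group, plus t02 against t05); the means and the name `points_for_label` are made up for illustration.

```python
# One threshold per Points command, following the post:
# alpha drawn where total >= 1, beta where total >= 2, gamma where total >= 3.
THRESHOLDS = {"alpha": 1, "beta": 2, "gamma": 3}


def points_for_label(xs, means, totals, threshold):
    """(x, y) anchors where one label's Points command draws its text.

    Each anchor reuses the bar's own x position and mean; bars whose total
    reject count is below the threshold get no label from this command.
    """
    return [(x, y) for x, y, t in zip(xs, means, totals) if t >= threshold]


# Totals for the t01..t07 example; the means are hypothetical.
xs = [1, 2, 3, 4, 5, 6, 7]
means = [4.0, 6.5, 5.8, 6.1, 7.2, 6.0, 5.5]
totals = [6, 2, 1, 1, 2, 1, 1]
for label, threshold in THRESHOLDS.items():
    print(label, points_for_label(xs, means, totals, threshold))
```

With this scheme the letters encode how many significant differences a bar has, not which groups those differences are with.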
For a complete data set, the label location would need to take the error bars into account. For example, use an expression to calculate mean + stderr, and use the result as the y location in the Points commands.
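The y location is simple arithmetic; a minimal Python sketch, assuming the standard error is computed as sd / sqrt(n) from each group's standard deviation and sample size. Since Kourosh's bar plot shows the standard deviation, the sketch also has a switch for placing the label at mean + sd instead. `label_y` and the numbers are hypothetical.

```python
import math


def label_y(means, sds, ns, use_stderr=True):
    """y location for each label: the top of that bar's error bar.

    With use_stderr=True the error bar is taken to be the standard error,
    sd / sqrt(n); otherwise it is taken to be the standard deviation itself.
    """
    ys = []
    for mean, sd, n in zip(means, sds, ns):
        half_height = sd / math.sqrt(n) if use_stderr else sd
        ys.append(mean + half_height)
    return ys


# Hypothetical means, standard deviations and sample sizes for two groups
print(label_y([10.0, 5.0], [2.0, 3.0], [4, 9]))  # [11.0, 6.0]
```

The y offset variable in the Points command is then applied on top of this location.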
Does this help? Is this similar to what you need to do?
P.S. Sorry we took a bit long to get this example to you!
Hi,
Thanks for the example. I will go through it. Quick question: can you also show the details of the Pivot commands, i.e., how you calculated the counts?
Thanks,
Kourosh
No problem. The idea is to use Pivot to sum the number of rejects. You need two Pivot commands that differ only in the Rows input: one uses group1 and the other group2. Two are needed because every combination is listed only once, so both columns have to be considered when counting.
Since each of these columns contains only n-1 of the n groups, you need to make sure both outputs are ordered the same way. That is why you have to use the Map rows option and select a column that lists all the groups in the same order as the mean values.
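The two-Pivot count can be mimicked in a few lines of Python to see why the row mapping matters. This is a sketch of the idea, not DataGraph's implementation: `collections.Counter` stands in for each Pivot's sum, a lookup over the full group list plays the role of Map rows, and `total_rejects` is a hypothetical name. The data is the t01..t07 example from earlier in the thread.

```python
from collections import Counter


def total_rejects(groups, group1, group2, reject):
    """Total reject count per group, mimicking the two Pivot commands.

    groups lists every group in the same order as the mean values (the
    column chosen under Map rows); group1, group2 and reject have one entry
    per unique pair.
    """
    pivot1 = Counter()  # Pivot with Rows = group1
    pivot2 = Counter()  # Pivot with Rows = group2
    for g1, g2, rejected in zip(group1, group2, reject):
        pivot1[g1] += int(rejected)
        pivot2[g2] += int(rejected)
    # Map rows: align both outputs with the full group list. A group that
    # never appears in one of the columns contributes 0 from that pivot.
    column1 = [pivot1.get(g, 0) for g in groups]
    column2 = [pivot2.get(g, 0) for g in groups]
    return [c1 + c2 for c1, c2 in zip(column1, column2)]  # the "total" column


# The t01..t07 example: t01 differs from every other group, t02 from t05.
groups = ["t01", "t02", "t03", "t04", "t05", "t06", "t07"]
pairs = [(a, b) for i, a in enumerate(groups) for b in groups[i + 1:]]
reject = [a == "t01" or (a, b) == ("t02", "t05") for a, b in pairs]
print(total_rejects(groups, [a for a, _ in pairs], [b for _, b in pairs], reject))
# [6, 2, 1, 1, 2, 1, 1]
```

Summing the two pivot outputs position by position only works because both were first aligned to the same group order; otherwise counts for different groups would be added together.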
Here you can see the two commands and the result graphically.
Once you set up the commands, use the gear menu on each Pivot command and choose Extract all Pivot columns.
Here, the extracted columns are renamed ‘Column1’ and ‘Column2’, and you can see how they are summed to get the total.
I had some time to try this on my data, and it works perfectly. Thank you so much.
I am including a small modification that I made, in case others are interested.
The only thing I added was another condition when plotting the Greek symbols. Since the goal is to compare one sample against the others, it is better to exclude that sample itself. Also, as the sample number increases, it becomes redundant to compare against earlier samples.
See the attached screenshot.
Thanks as always,
Kourosh
In the above figure, α marks a significant difference versus sample 1, and β a significant difference versus sample 2.
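The exact condition from Kourosh's screenshot is not reproduced in the thread, so here is one possible reading of the rule, sketched in Python: the letter for reference sample r goes only on later bars j > r that differ significantly from r, which skips the reference bar itself and comparisons already labeled by an earlier letter. `reference_labels` and the letter ordering are assumptions for illustration.

```python
GREEK = "αβγδεζη"  # letter for reference sample 1, 2, 3, ... (up to 8 groups)


def reference_labels(n_groups, significant_pairs):
    """Greek letters to draw above each bar, one letter per reference sample.

    significant_pairs holds 1-based (i, j) index pairs, in either order, whose
    difference is significant. The letter for reference sample r is drawn on
    bar j only when j > r: the reference bar itself is skipped, and earlier
    samples are skipped because those comparisons already carry the earlier
    sample's own letter.
    """
    significant = {frozenset(pair) for pair in significant_pairs}
    labels = [""] * n_groups
    for r in range(1, n_groups):
        for j in range(r + 1, n_groups + 1):
            if frozenset((r, j)) in significant:
                labels[j - 1] += GREEK[r - 1]
    return labels


# The t01..t07 example: sample 1 differs from 2..7, sample 2 differs from 5.
example = reference_labels(7, [(1, j) for j in range(2, 8)] + [(2, 5)])
```

For that example, bars 2 through 7 get α and bar 5 additionally gets β, so each letter now says which sample a bar differs from rather than how many.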