DataGraph Forums › General › Getting Started › Understanding DataGraph Data Management
Tagged: data
I really like DataGraph—and the DataGraph package for R that I just learned about!—but there’s a data organizational topic I always struggle with.
I work with good-sized piles of data. Dozens to even hundreds of variables (i.e., columns). In a single DataGraph file, I may have dozens of views (i.e., charts) of that data, each focused on different aspects of it. Some charts use variables A, B & C, others might use A, D & E. I have no problem making those visualizations.
My issue is, for each of those charts, the sort order often needs to be different.
Right now I move from chart to chart adjusting the sort order as I go, but that makes me feel as though I’m missing something fundamental in the DataGraph workflow.
I would expect, especially with the power of the DataGraph package for R, to be able to update my underlying data and have all my charts ready for quick export to publication.
Appreciate any thoughts/guidance.
— Robert
Hi Robert!
Do you change the sort order because you want the graphs sorted by the magnitude of a variable? I’m asking because the Pivot command can sort automatically, but it may require rethinking how you organize your data.
Thanks for the reply! Here’s a simple example of what I mean.
My data set might include 25 different performance metrics (as columns) for a series of people (as rows). I’d like to have a simple horizontal bar chart for each of the 25 metrics, and in each one I’d like the bars sorted from greatest to least performance.
If I sort the data table by the metric for the first chart, it looks right but all the others look haphazard. Right now, I have to manually switch to the next chart and re-sort the data table, print it, and then switch to the next chart.
This seems error-prone, and I felt that can’t possibly be best practice for using DataGraph. I guess I’m asking whether I’m missing a way to set a sort order for each of the graphs I create in a single DataGraph file.
That does sound tedious! My advice is to use the Pivot command instead of the Bars command.
The Pivot command will do the automatic sorting for you: Pivot command: Sorting
Here I tried to mimic the description of your data. There is a graph for each column. To create one from scratch, highlight the person column and one metric column in the data table, then click the Pivot command shortcut. Then change the sort order as shown below.
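To make “a sort order per graph” concrete outside DataGraph, here is a minimal plain-Python sketch: each metric gets its own list of (person, value) pairs sorted from greatest to least, while the shared table keeps its original row order. The table layout, the names `rows` and `sorted_view`, and the sample values are all hypothetical; this only illustrates the idea and is not how DataGraph implements the Pivot command.

```python
# Hypothetical wide table: one row per person, one column per metric (A, B).
rows = [
    {"person": 1, "A": 3.0, "B": 9.0},
    {"person": 2, "A": 7.0, "B": 1.0},
    {"person": 3, "A": 5.0, "B": 4.0},
]


def sorted_view(table, metric):
    """Return (person, value) pairs for one metric, greatest value first.

    A new list is built, so the shared table keeps its original row order
    and every metric can have its own independent sort.
    """
    pairs = [(row["person"], row[metric]) for row in table]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# One "chart" per metric, each with its own order.
views = {metric: sorted_view(rows, metric) for metric in ("A", "B")}
for metric, view in views.items():
    print(metric, view)
```

The point is that sorting happens per view, not on the table itself, so re-sorting the data table between charts is no longer needed.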
Is this similar to your case, where you have a row per person and then different metrics in each column?
Yes, this is helpful, thank you.
Generally, I stack Bars commands on top of one another, then use the Use as mask feature to highlight specific results in different colors. This might be a challenge with the Pivot command because masks seem to work differently (i.e., with Pivot, a mask determines whether a bar is included at all, whereas with Bars, a mask determines whether a bar is drawn, but its position is kept).
This may, however, be a tradeoff I have to accept in order to gain the other advantage of independent sorting.
Thanks again for your reply.
Let me try to explain why the mask works differently in the Bars and Pivot commands.
The Bars command determines the location for each bar based on the row number in the data table. That is why you can mask a column from view and the other columns remain in the same place (i.e., the row number did not change).
The Pivot command creates a virtual table of the data selected. When you apply a mask to a pivot, the mask is applied to the data before the ‘virtual table’ is created. Only unmasked data are in the virtual table, which determines the row location for graphing. You can view the ‘virtual table’ by expanding the Pivot command.
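The difference can be sketched in plain Python under a few assumptions: bars are identified by person, the Bars-style position is simply the row index, and the Pivot-style virtual table is sorted greatest-first. The function names `bars_positions` and `pivot_positions` and the data are hypothetical and only model the behavior described above; they are not DataGraph code.

```python
# Hypothetical data in table row order: (person, value).
rows = [(1, 3.0), (2, 7.0), (3, 5.0), (4, 2.0)]
# True means the row passes the mask; person 2 is masked out.
mask = [True, False, True, True]


def bars_positions(rows, mask):
    """Bars-style: position = row index in the table.

    Masked rows are skipped, but the remaining bars keep their original
    row index, so a gap is left where the masked row would have been.
    """
    return [
        (index, person)
        for index, ((person, _value), keep) in enumerate(zip(rows, mask))
        if keep
    ]


def pivot_positions(rows, mask):
    """Pivot-style: mask first, then build a 'virtual table' and position by it.

    Only unmasked rows enter the virtual table; here it is also sorted by
    value (greatest first), and positions are counted within that table.
    """
    virtual = [row for row, keep in zip(rows, mask) if keep]
    virtual.sort(key=lambda row: row[1], reverse=True)
    return [(index, person) for index, (person, _value) in enumerate(virtual)]


print(bars_positions(rows, mask))   # [(0, 1), (2, 3), (3, 4)]
print(pivot_positions(rows, mask))  # [(0, 3), (1, 1), (2, 4)]
```

In the Bars-style result, position 1 is empty; in the Pivot-style result, the masked person simply never reaches the virtual table, so the positions close up.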
To color individual bars from the pivot, extract the virtual table into the data table to use in other commands.
Click the gear menu on the Pivot command and select “Extract all pivot columns”
This will create new columns with the sorted data.
In the figure below, the extracted columns were renamed and placed in a group (Metric A). Then a Bars command was added with a mask to color the data for Person matching ‘3’.
An option to consider is to use a Bar command instead of the Bars command. The Bar command has a Color scheme option, so the color of the bars is data-driven.
Here is an example with a color scheme, where all people are set to blue except Person 3.
With a color scheme, the color is changed in one place and will update in all the graphs. Here the color for Person 1 was set to green.
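A color scheme behaves like a shared lookup table: each bar’s color is found from its person rather than from its position, so one edit to the lookup reaches every graph. Below is a minimal plain-Python sketch of that idea, assuming a person-to-color mapping with a blue default; the names `color_scheme` and `bar_colors` are hypothetical, and “orange” for Person 3 is an arbitrary placeholder since the highlight color is not stated.

```python
DEFAULT_COLOR = "blue"

# Hypothetical shared color scheme: person -> color. Anyone not listed
# falls back to DEFAULT_COLOR. "orange" is an arbitrary highlight choice.
color_scheme = {3: "orange"}

# Two hypothetical sorted views (person, value), standing in for two graphs.
views = {
    "A": [(2, 7.0), (3, 5.0), (1, 3.0)],
    "B": [(1, 9.0), (3, 4.0), (2, 1.0)],
}


def bar_colors(view, scheme, default=DEFAULT_COLOR):
    """Look up each bar's color from its person, not from its position."""
    return [scheme.get(person, default) for person, _value in view]


print({metric: bar_colors(view, color_scheme) for metric, view in views.items()})

# Changing the scheme in one place changes every graph that uses it.
color_scheme[1] = "green"
print({metric: bar_colors(view, color_scheme) for metric, view in views.items()})
```

Because the lookup is keyed by person, Person 3 stays highlighted regardless of where the sort places that bar in each graph.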
I also wanted to mention that since you have many columns in some datasets, we recommend using a flat format.
This is more of a database-type format, where you use masks to select the data to plot. If you want help setting this up, feel free to email us a dataset.
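As a rough sketch of the reshaping involved, assuming “flat format” here means one row per person-metric pair (the column names `person`, `metric`, and `value` and the helper `to_flat` are hypothetical), a wide table can be converted in plain Python, and a mask then becomes a condition on the flat rows:

```python
# Hypothetical wide table: one row per person, one column per metric.
wide = [
    {"person": 1, "A": 3.0, "B": 9.0},
    {"person": 2, "A": 7.0, "B": 1.0},
]


def to_flat(table, id_column="person"):
    """Turn each (person, metric) cell into its own row."""
    flat = []
    for row in table:
        for column, value in row.items():
            if column != id_column:
                flat.append({id_column: row[id_column], "metric": column, "value": value})
    return flat


flat = to_flat(wide)

# A "mask" is then a condition on the flat rows, e.g. metric == "B".
metric_b = [row for row in flat if row["metric"] == "B"]
print(metric_b)
```

With this layout, adding a 26th metric adds rows rather than a new column, and each graph selects its subset through a condition instead of pointing at a different column.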
This is brilliant. I can’t thank you enough for such a thorough and helpful explanation. I have this working, and tied to my R data set via the DataGraph package, this is quite a bit of impressive automation.
While I have you, and if you’d indulge me just one other question, I notice that the Bars command I was using previously offers data label placement at Top, Middle, Start and End whereas the Bar command offers Start, Middle and End.
I had been using “Top” positioning with Bars so that data labels appear at the very end of the horizontal bar. Is that not possible with Bar?