2 weeks, 5 days ago david blackwellParticipant
I plot a series of small multiple time-series charts of COVID data for each state and the District of Columbia. The data range typically begins around March 1, 2020 and extends to the most recent date of data. When I import the csv file into Datagraph, I always sort all columns increasing by the ‘date’ column, followed by a ‘natural’ sort of all columns by the ‘state’ column. From here it is a simple matter to plot the time-series charts.
Sometimes I wish to include a data median line in the charts for reference purposes. I know how to get the median number variable for data in a column by right-clicking and selecting ‘median’. But the median number produced is for all rows in the table, which includes rows of data from over a year ago. Is there a means or formula for getting the ‘median’ number for the most recent data, in my case, the last 51 rows of data (each state and the D.C.) in the table?
I imagine it has to do with using a range of row index number as a mask, or using the most recent data as a mask, but I’m unsure how to do this in a formula for a variable. And the range of index numbers would change every day as I add new rows tot he data table.
This may be a simple question, but I’m unsure how to achieve this other than exporting data back into Excel and manually calculating the value. Any help would be appreciated. Thanks in advance.2 weeks, 5 days ago david blackwellParticipant
A clarification to my question:
“I imagine it has to do with using a range of row index number as a mask, or using the most recent data as a mask…” should read:
“..or using the most recent date…”2 weeks, 3 days ago dgteamModerator
See if this simplified example helps.
Here a Mask column is used to filter the Number column only to include certain dates.
Then you can use a Number from column variable to pull the median from the column “Number Mask”. It does not have a short cut to create by right clicking (something we probably should add) but the mask column can be selected, like any other number column, for the variable.
Here you can see the median pulled from each.
Would this work for your file?2 weeks, 3 days ago david blackwellParticipant
Your masked column method works like a charm! I performed a quick trial based on your answer to test this and need to go back a clean it up a little, but it will work for multiple charts I publish. It saves me the trouble of processing the median for select days in Excel, which is not bad but is extraneous and more subject to error.
Thank you so much!
BTW – I also use an variable for the data’s most recent update day. ?I can use this variable within the masked column definition to avoid need to update the new Median variable each time I publish.
I publish upwards of 600-700 charts almost daily (including a chart for each state or district for many variables). This and other tricks the Forum has given me has enabled me to publish this data relatively quickly each day.
Here is an image of the resultant chart. The median value is represented by the dashed black line. The median numerical value is also given in the legend.
Thanks again for the quick reply…2 weeks, 3 days ago david blackwellParticipant
By the way, pardon the typos in my previous reply and I noticed the chart values in the image I sent do not quite correspond the the y-axis. This is what I meant by going back to “clean things up”.
Nevertheless, the method you provided to find a median for select days in my column works very well and is easy to implement.2 weeks, 3 days ago dgteamModerator
No worries on the typos. Happens to us all the time, though we have more latitude to fix them behind the scenes 🙂
I think you were asking if you can use a variable for the mask criteria, and the answer would be yes, definitely.
This is such a wonderful project. Incredibly useful representation of COVID data and impressive that you keep this up to date.
You must be logged in to reply to this topic.