The increasing accessibility of data is predicted to change the way we do things forever. It will touch every sector, field and industry, so understanding how to read and communicate data is imperative. The sheer volume alone forces us to recognize the importance of interpretation: the amount of data produced in the past two years alone exceeds the volume produced in the entire prior history of the human race. As such, modeling, predicting and visualizing this data is becoming an increasingly important skill set, one made all the more relevant by the elegant and sophisticated data visualization tools now available online.
However, before we begin picking out the types of charts and graphs to use in our analysis, it’s important to answer two questions: “What is my data saying?” and “What am I communicating through this chart/graph?” The beauty of data lies in its flexibility: it can be visualized in many different ways and analyzed to answer different queries. That said, the sheer size of certain data sets, such as genomic data or the output variables of a national survey, makes the analyst’s ability to focus on the message the data will communicate all the more important. The ultimate purpose of visualizing data is to help develop conclusions and comprehensible next steps by identifying what has happened, what is happening and, if possible, what will happen.
Consider a manager preparing a quarterly analysis of his or her team. We would expect to see the revenue and sales made in that quarter, and these results would be visualized as declarative graphs (i.e., graphs that state a message). However, what about questions such as where sales are projected to be next quarter, or why sales lagged in a particular month? These questions demand further exploration and data wrangling before a final conclusion can be reached. Asking questions of your data offers fresh insight and may surface information you were previously unaware of.
These principles also apply when considering the type of visualization needed to convey a message. A complex data set does not necessarily call for a complex, busy graph. Consider the following example. You are interested in analyzing how actively Canadians engage with their family doctors; the general understanding is that people in frequent contact with their doctors have a greater chance of early screening and diagnosis. You want to determine whether Canadians are seeing their doctors regularly, since this information will assist with the development of promotional material. You decide to use publicly available results from Statistics Canada’s 2014 survey, where you find the following data and produce your graph:
The graph shows that the 20-34 age group has a higher percentage of people without a family doctor than the other age groups. We also find that a higher percentage of males than females do not have a doctor, and this holds across all age groups. The graph does a good job of organizing participants into their respective age groups, and it provides solid evidence that in 2014, males in every age group were less likely than females to have a regular doctor. One question we can ask here is: “Is this pattern seen across multiple years and, if so, is it increasing or decreasing?” From this observation, we can start building a hypothesis and a starting point for our analysis.
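Once several years of data are available, one simple way to answer “is it increasing or decreasing?” is to compute the male-minus-female gap for each year and fit a least-squares line through it; the sign of the slope says whether the gap is widening or narrowing. The sketch below assumes the yearly percentages have already been copied out of the Statistics Canada tables into plain dictionaries keyed by year. The function names and the 0.05 percentage-point tolerance are hypothetical choices for illustration, and no actual survey values are included.

```python
from typing import Dict, List


def gap_by_year(male: Dict[int, float], female: Dict[int, float]) -> Dict[int, float]:
    """Percentage-point gap (male minus female) for every year present in both series."""
    common_years = sorted(set(male) & set(female))
    return {year: male[year] - female[year] for year in common_years}


def trend_slope(series: Dict[int, float]) -> float:
    """Ordinary least-squares slope of the values against the year.

    Positive means the values rise over time; negative means they fall.
    """
    if len(series) < 2:
        raise ValueError("need at least two years to estimate a trend")
    years: List[int] = sorted(series)
    values = [series[y] for y in years]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx


def describe_trend(slope: float, tolerance: float = 0.05) -> str:
    """Turn a slope (percentage points per year) into a short declarative phrase.

    The tolerance is an arbitrary, assumed threshold for "no real change".
    """
    if slope > tolerance:
        return "widening"
    if slope < -tolerance:
        return "narrowing"
    return "roughly stable"
```

A slope alone says nothing about statistical significance; it is a quick way to turn the question into a single number before deciding whether deeper analysis is warranted.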
Using the Statistics Canada database, we can pull up data sets going back 10 years:
Here we see a declarative graph focused on a single conclusion: a higher percentage of Canadian males than females do not have a regular doctor. This pattern is consistent over the past 10 years, with an average of 19.1% of males reporting no regular doctor. From this graph, we can form a concise conclusion that prompts further action, such as identifying the causes of this high percentage (e.g., accessibility, education) or investigating whether it is correlated with higher disease or mortality rates in males. Of the two graphs, which one communicates the overall conclusion more easily?
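The declarative message in the second graph rests on two small computations: a multi-year average for each sex, and a check that males exceed females in every year. The sketch below assumes the yearly percentages are stored in dictionaries keyed by survey year and uses a simple unweighted mean across years; that averaging method is an assumption, since the exact method behind the 19.1% figure is not stated. The function `summarize_gap` and its output keys are hypothetical.

```python
from statistics import mean
from typing import Dict


def summarize_gap(male: Dict[int, float], female: Dict[int, float]) -> dict:
    """Summarize male vs. female shares (in %) without a regular doctor.

    Assumes both dictionaries map survey year -> percentage. Only years
    present in both series are compared.
    """
    years = sorted(set(male) & set(female))
    if not years:
        raise ValueError("no overlapping years to compare")
    return {
        "years": (years[0], years[-1]),
        # Unweighted mean of the yearly percentages (each year counts equally).
        "male_average": round(mean(male[y] for y in years), 1),
        "female_average": round(mean(female[y] for y in years), 1),
        # True only if males exceed females in every overlapping year.
        "males_higher_every_year": all(male[y] > female[y] for y in years),
    }
```

Reducing ten years of survey output to a few fields like these is what makes a one-sentence, declarative takeaway possible for a non-specialist audience.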
Data communicated appropriately with the right tools should be relatively easy to deliver and should leave room for new questions. When developing visualizations to communicate an idea or finding, the audience you are presenting to is an important consideration. With the second graph, we can communicate the main message that males are less likely than females to have a family doctor without getting bogged down explaining the age buckets or what the underlying data set means. A busy chart may serve your data set and reflect the work being done (e.g., correlations between variables); however, it may prove difficult to communicate to an audience unfamiliar with the nuances of your work. Depending on your audience (e.g., senior executives vs. a research group), you will need to identify early on what your chart or graph will communicate. Asking questions such as “Do I expect my audience to investigate these findings thoroughly?” or “Are they interested in declarative messages that will prompt action?” will help in formulating your delivery.
Acquiring data is a necessity, and we live in an age where open databases provide a wealth of information. However, without the science and fundamental understanding of how this data can be communicated, organized and used in decision-making, we cannot expect to benefit from the data or the tools available to us. Ask questions, pose different scenarios and offer different perspectives; only then can you truly benefit from data visualization and, in turn, produce meaningful, actionable results.
Statistics Canada, 2014. https://www150.statcan.gc.ca/n1/pub/82-625-x/2015001/article/14177-eng.htm