Choosing the Right Data Visualization Tools and Charts
As a data storytelling evangelist, one of the most common questions I get from my audiences is about which data visualization tools and charts to choose for certain business scenarios.
My answer is always: it depends on the situation. Navigating that decision can feel daunting in the ever-changing landscape of visual communication.
Thankfully, data visualization gurus like Kristen Sosulski, who blend deep expertise with impressive teaching skills, are here to help. Not only is she an associate professor and Director of the Learning Science Lab at NYU Stern School of Business, but she is also the author of three books, the most recent being Data Visualization Made Simple.
As a professor, Kristen teaches MBA students and executives data visualization, programming and business analytics. As Director, she develops immersive online learning environments for business school education.
And somehow, as a leading expert, she still finds time to regularly consult, deliver seminars, and lead workshops on data viz techniques and best practices.
In this episode, Kristen shares valuable insights about data visualization software, including a breakdown of the criteria for deciding which data visualization tools and which charts work best for different business scenarios.
In This Episode on Data Visualization Tools and Charts, You’ll Learn…
- How Kristen discovered her passion for data visualization during grad school, and how a film professor taught her to leverage film techniques.
- A breakdown of the Data Visualization Software Checklist discussed in her newest book, Data Visualization Made Simple.
- A walkthrough of several different data visualization tools and what they are best used for.
- The people who inspired her during her journey.
- The critical gems of data visualization wisdom she always shares with her students.
- The excellent advice she would give herself if she could go back in time, which she describes as the wedding favor of your presentation.
Data Visualization People, Resources, & Links Mentioned
- NYU Stern School of Business
- W.R. Berkley Innovation Labs – Learning Science
- “Storytelling with Data” by Cole Nussbaumer Knaflic
- ArcGIS
- Tableau
- Power BI
- IBM Analytics
- JavaScript Libraries Database
- R Project for Statistical Computing
- Qlik
- Domo
- Google Charts
- Python
- D3
- Hadley Wickham
- Hans Rosling
- Scott Berinato
- Edward Tufte
Kristen’s Data Viz Upgrade:
- “Never give a live demo. If you have a beautiful interactive display, screen record it first and embed it in PowerPoint. Make sure it is set to start as soon as you go to the next slide, and then allow yourself to talk over it. It gives you two things. One is the promise that it’s not going to fail.
- Two is that it gives you an opportunity to practice. You’ve already recorded what you want to share with your audience and now you can prepare your story and think about how you want to lead them. You can make it look just as good as a live demo if you record it at a high quality and you can even do custom zooms, which you can’t do live.”
How to Keep Up with Kristen:
Thanks for Listening!
Thanks so much for joining me. Have some feedback you’d like to share or a question for Kristen? Leave a note in the comments below, and we’ll get back to you!
If you enjoyed this episode, please share it using the social media buttons you see at the left of the post.
If you liked what you heard, I would love it if you could leave me a rating or review in iTunes. Ratings and reviews are extremely appreciated and very important in the rankings algorithm. The more ratings, the better the chance of fellow practitioners getting to hear this helpful information!
And finally, don’t forget to subscribe to the show on iTunes to get automatic updates and never miss a show.
A very, very special thanks to Kristen for joining me this week. And as always, viz responsibly, my friends.
Do you have a burning question for Kristen about data visualization tools and charts, or her amazing “Data Visualization Made Simple” book? If so, ask away!
Lea Pica: [00:00:00] It's December, people. Lea Pica here. Today's guest is a powerhouse in the data visualization community and the next member of my Women in Analytics spotlight. Stay tuned to find out who's keeping it simple on the Present Beyond Measure Show, episode 40.
Lea Pica: [00:00:42] Hey guys, welcome to the fortieth episode of Present Beyond Measure, the only podcast at the intersection of presentation, data visualization, storytelling, and analytics. My podcast is now officially older than I am. This is the place to be if you're ready to make maximum impact and create credibility through thoughtfully presented insights. So, the year is coming to a close, and man, it was nothing short of exhilarating. My trip to Conversion Hotel in the Netherlands was both one of the most enlivening and one of the most physically and emotionally challenging journeys I've experienced as a professional speaker.
Lea Pica: [00:01:24] I'll be chronicling that journey very soon in a tell-all blog post that shares a cautionary tale of resilience. But in a nutshell: please always check the passport requirements of other countries before heading to the airport, and try to avoid flying in unexpected winter storms the next day. Enough said…
Lea Pica: [00:02:48] All right, so I am completely stoked about today's guest. She is a rare find in the data visualization world: a true blend of deep expertise, impressive teaching skills, and an amazing personality to boot. Are you ready? Let's go.
Lea Pica: [00:03:10] Hello, hello, and welcome. Today's guest is an associate professor at NYU Stern School of Business, where she teaches MBA students and executives data visualization, programming, and business analytics. Light stuff. She is the director of the Learning Science Lab for NYU Stern, where she develops immersive online learning environments for business school education. And as a leading expert on data visualization, she regularly consults, delivers seminars, and leads workshops on data viz techniques and best practices. You can find her speaking on this subject at events like Social Media Week NYC, the PLOTCON conference, and Tableau events. Her third book, Data Visualization Made Simple: Insights Into Becoming Visual, was just published, and it is now on my short list of books I recommend as an amazing start to awesome data storytelling. I'm thrilled to have her as the next guest in my Women in Analytics spotlight. Please welcome Kristen Sosulski. Hello.
Kristen Sosulski: [00:04:15] Hello. Really happy to be here Lea. Thank you.
Lea Pica: [00:04:18] Oh, it's my pleasure. You know, it's funny: several practitioners I've talked to recommended that I connect with you, and I just never got around to it, including someone I just taught in a workshop who had attended one of your courses. But somehow you managed to find me first, so I was thrilled to have you on. So, everyone wants to hear an origin story. How is it that you fell into this whole world of data communication, and what do you love about being here?
Kristen Sosulski: [00:04:53] Well, my story, I think, is pretty unique and not so straightforward. The straightforward part is that I went to business school, I was always working with data, and I was an Information Systems major. Grad school, like for most people, is really where I fell in love with data visualization. I was collaborating on a project with a film professor, and he wanted to look at how we could visualize film. This technique just knocked my socks off; I had no idea what it meant. So, like an eager grad student, I went along for the journey, and he taught me how you can look at film by stripping away the narrative and dramatic content and graphing what's happening structurally in a scene. He showed me how you can use elements of film to build suspense, and how looking at film this way gives you insights into how directors plan for suspense building.
Lea Pica: [00:06:00] Wow, I love that. A couple of practitioners, like Cole Nussbaumer Knaflic in her book Storytelling with Data, have talked about Hollywood-inspired data storytelling techniques. It can be hard to wrap your head around the idea that they apply to data communication. So I'd love to hear: what are some of those elements you learned about that you think have carried over into what you do now?
Kristen Sosulski: [00:06:26] Oh, I mean, first, the progressive disclosure of information. Being able to start a presentation with a question and slowly reveal the insight or key takeaway through your data graphic, not showing everything at once. You never view a whole film all at once. That's just one example.
Lea Pica: [00:06:53] I love that. I like to give the analogy of Game of Thrones: if they put up a black screen with a synopsis of Game of Thrones and that was it, no one would really watch it. It would be like, "Red Wedding: everyone dies." I love that.
Lea Pica: [00:07:13] What an interesting background to prepare you for this role. I would love to dive right into your book. I think it gives such a comprehensive toolset, covering everything from the tools to the charts to the slides to the delivery approach. But what I thought we would do today is focus on tools and chart types, since I haven't touched on them much on this show.
Lea Pica: [00:07:44] That was a really core aspect of your book, and I thought it was executed really well. I think it's what everyone wants to know more about, so let's hear about the tools and the charts and go from there. I'd love to get specific scenarios for when you would suggest particular tools. What are their respective advantages? Where do they excel, pun intended? Let's start with a great tool you have in there, the Data Software Evaluation Checklist. Tell me about that.
Kristen Sosulski: [00:08:20] Absolutely. There are a lot of different types of software on the market that allow you to visualize data, and I categorize them into four categories. The first is your basic productivity tools: things like Excel and PowerPoint, really just out-of-the-box productivity tools. I even put Google Charts in the productivity category: low setup time and pretty high value in terms of the data graphics. The second category is your true data visualization software packages. If you're creating sophisticated geospatial displays, you want to use something like ArcGIS, and Tableau is another great example: software specifically built for data graphics, where that's its sole purpose. Third, you have your business intelligence tools, like Power BI and Watson Analytics. These are tools used to report on and show what's happening in the present. They're there to help you build dashboards, and those dashboards tend to be really efficient data graphics that convey what's happening in the moment. Then you have category number four, which I call programming packages. This is where you're able to do all the cool stuff that's really dynamic and interactive, but with really high start-up costs: for example, any of the JavaScript libraries that allow visualizations to be deployed on the web for interactivity and also what we call animation.
Kristen Sosulski: [00:09:55] Another example is R, which I use intensely. It's a statistical programming package with a series of libraries that are excellent for visualizing data, and it works really well when you're doing any kind of data modeling or statistical analysis, because then you have a complete toolkit. I can clean my data, make it nice and shiny and formatted the right way for how I need to model it, and then also display it and communicate those insights to an audience or my data team. So those are the four categories of software, and then I can discuss how we know which one to choose for a purpose.
Lea Pica: [00:10:42] So what kind of criteria are on this evaluation checklist that you've created?
Kristen Sosulski: [00:10:48] Great. I have a checklist of seven different criteria. The first speaks to sharing. So, Lea, think about it: if I create a data graphic and you want to modify it, that's a lot harder if you don't have the software, like Tableau. You have to download it first, so it's not easy for sharing. But if we have it set up in our organization, sharing is much easier. So you want to think about that.
Kristen Sosulski: [00:11:12] Two is the output format. One of the things I talk about so much in my class, Lea, is designing data graphics for the medium. You give PowerPoints probably every, you know, every hour.
Lea Pica: [00:11:28] Right. I'm actually giving one right now.
Kristen Sosulski: [00:11:31] So I'm constantly building these decks, and the data graphics I use there are much different from the data graphics I create for the web, where I want an audience to interact with them. Now I'm not only a data graphic designer but an interface designer on top of it: I have drop-downs and mouse-overs, all things I have to think about. And then finally there's print. Sometimes, out of the box, when we create a data graphic the font size is like point eight, and when we print that out it's going to be a little pixelated and might not be the highest-quality format. Or we use colors that end up looking very similar when printed in black and white. If you're using a blue and a green, which are similar hues and densities, you'll have a chart that looks like it's one color. So those are some things to really think about in terms of the output formats for your data graphics. Then we have interoperability. Let's say I am using R to model my data. What if I don't want to visualize it in R? I want to visualize it in a different package because it has a certain aesthetic, or because it's what my organization uses.
Kristen Sosulski: [00:12:38] So you want to think about interoperability between software packages: can I take something built in Excel and edit it in a different program that we have? Number four, which maybe should be number one, is display types. Think about it: I want to create a data map. Well, I can't really do that in Excel, right? So that will, in some ways, dictate the type of software you pick. Number five is data exploration. Do I actually want to be able to run some summary statistics and learn more about my data before I start visualizing it? Are there tools that allow me to do that? Or do I already need to have a question in mind? Can I do something that's more unsupervised, for instance? So that's a question. And then a big part of my book is about incorporating visualization into one's practice. So we think about simplicity. Is a tool easy to use? Is it something I can just add on to my existing work, or is there a really high start-up cost?
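Kristen's print warning about a blue and a green collapsing into one color in black and white can be checked before printing. The sketch below is a minimal illustration, assuming the grayscale conversion behaves roughly like the common Rec. 601 luma weighting (0.299 R + 0.587 G + 0.114 B); real printer drivers may convert differently, and the two hex colors and the 20-point threshold are hypothetical choices.

```python
from typing import Tuple


def hex_to_rgb(hex_color: str) -> Tuple[int, int, int]:
    """Convert a '#rrggbb' string into an (r, g, b) tuple of 0-255 ints."""
    h = hex_color.lstrip("#")
    return int(h[0:2], 16), int(h[2:4], 16), int(h[4:6], 16)


def grayscale_level(hex_color: str) -> float:
    """Approximate the printed gray level (0-255) using Rec. 601 luma weights."""
    r, g, b = hex_to_rgb(hex_color)
    return 0.299 * r + 0.587 * g + 0.114 * b


def too_similar_in_print(color_a: str, color_b: str, min_gap: float = 20.0) -> bool:
    """Flag a pair whose gray levels differ by less than min_gap (hypothetical threshold)."""
    return abs(grayscale_level(color_a) - grayscale_level(color_b)) < min_gap


# Hypothetical mid-tone blue and green series colors
blue, green = "#1f77b4", "#2ca02c"
print(round(grayscale_level(blue)), round(grayscale_level(green)))
print(too_similar_in_print(blue, green))
```

Here the two gray levels land only about 12 points apart on a 0 to 255 scale, so the pair is flagged; giving the series clearly different lightness, not just different hues, is the usual way to keep them distinguishable in print.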
Lea Pica: [00:13:47] Wow, I love how comprehensive that approach is. What it spoke to for me is that this is about using a portfolio, or suite, of tools, because there are going to be different audiences, different mediums, different scenarios, and different visuals you need to communicate. So I would love to give listeners a sense of different charting scenarios where different tools might be valuable. I have a feeling the chart types available in Excel and PowerPoint are pretty well known; they're great for the basics like bars and lines. But when do people want to start playing around with Tableau and Qlik, and getting into BI tools like Power BI and Domo? Let's start with Tableau, maybe.
Kristen Sosulski: [00:14:41] Great. Tableau is amazing for geospatial displays, specifically geospatial displays of the US; their data maps are terrific. They use a projection called the Mercator projection that we're really familiar with; it's what Google Maps uses. So typically, if we don't need to change the projection and we like that Google Maps-type view, we can change the background and plot our geospatial data: latitude and longitude, zip codes, anything around an address. We can plot it in Tableau either as a filled map, where we fill regions like states, or as points, like the precise locations of every Starbucks and Dunkin' Donuts in lower Manhattan, for instance. Those are just some examples. When you want to do something a little more involved, like connecting points together, that takes a bit more coding, but it's definitely possible in Tableau, I'd say.
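For readers curious about what the Mercator projection actually does to latitude and longitude, here is a minimal sketch of the spherical Web Mercator formulas used by web map tiles: x = R·λ and y = R·ln(tan(π/4 + φ/2)), with R = 6,378,137 m and angles in radians. It illustrates the math only; it is not a description of how Tableau renders its maps internally.

```python
import math

EARTH_RADIUS_M = 6_378_137.0  # WGS 84 semi-major axis, as used by Web Mercator


def web_mercator(lat_deg: float, lon_deg: float):
    """Project latitude/longitude in degrees to Web Mercator x/y in meters."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = EARTH_RADIUS_M * lon
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y


# Equal 10-degree bands of latitude take up more vertical map space farther north
for lat in (0, 30, 60):
    _, y0 = web_mercator(lat, 0)
    _, y1 = web_mercator(lat + 10, 0)
    print(f"{lat}-{lat + 10} deg N: {(y1 - y0) / 1000:.0f} km tall on the map")
```

The stretching grows with latitude, which is one reason high-latitude areas such as Alaska look oversized on Mercator-based US maps.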
Lea Pica: [00:15:45] Now, you talked about Google Charts. I'll admit I haven't dabbled much in the Google suite beyond the basic word processing, Sheets, and whatnot. So what are some things people may not know that Google Charts offers?
Kristen Sosulski: [00:16:02] You know, one thing that's really terrific: if you're using Google Sheets and you want a small progress bar, like a spark that shows what percentage of a task is done from zero to one hundred, you can create a little mini graphic at the end of each task. Really, really super handy. They're called spark bars. There are also sparklines, which show at a glance how something has changed over time. They're really great for mini-dashboarding tasks where you might not want to create a whole dashboard for your organization or team, just a few little indicators for key metrics that are really important to report on. So that's what I love about Google Charts.
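In Google Sheets, these mini graphics come from the `SPARKLINE` function, which also offers a bar chart type for progress-style spark bars. To show the underlying idea outside a spreadsheet, the hypothetical helpers below render a text sparkline and a text spark bar in Python with Unicode block characters; the function names, the eight-level scale, and the 10-character bar width are illustrative assumptions.

```python
BLOCKS = "▁▂▃▄▅▆▇█"  # eight levels, lowest to highest


def sparkline(values):
    """Map each value to one of eight block characters, scaled between min and max."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return BLOCKS[0] * len(values)  # flat series: draw the lowest block everywhere
    return "".join(BLOCKS[round((v - lo) / span * (len(BLOCKS) - 1))] for v in values)


def spark_bar(percent_done, width=10):
    """Render a 0-100 percent-complete value as a fixed-width text progress bar."""
    pct = max(0.0, min(100.0, percent_done))  # clamp out-of-range input
    filled = round(pct / 100 * width)
    return "█" * filled + "░" * (width - filled)


# Hypothetical task list: name -> percent complete
tasks = {"Collect data": 100, "Clean data": 60, "Build charts": 20}
for name, pct in tasks.items():
    print(f"{name:<13} {spark_bar(pct)} {pct}%")

# Hypothetical weekly metric
print(sparkline([3, 5, 4, 8, 6, 9, 12]))
```

Scaling each sparkline to its own minimum and maximum emphasizes shape over absolute level, which suits the at-a-glance status check Kristen describes.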
Lea Pica: [00:16:47] So it might be an interesting way to spice up an otherwise extremely boring, dry table, right? It lets people visually compare metrics if they're determined to consume information in a table format. Would you say so?
Kristen Sosulski: [00:17:04] Absolutely. One thing I really encourage with my teams is: I don't want to hear a story about what happened, just show me the data. Show me what percentage of this task is complete or incomplete, so I can look at it at a glance and then ask my own questions around it. At least give me a starting point so I know the status. I think it allows for much more seamless communication.
Lea Pica: [00:17:31] Awesome. OK. And then Power BI. This is a tool that, I believe, you have to pay for; there's no free option. So how could a company benefit from going with Power BI?
Kristen Sosulski: [00:17:43] I mean, this is great for dashboarding at several levels. Say you're running a sales team and you want all your salespeople to be aware of the latest product changes and price changes, those types of things, so they can see what's happening in the present. I can see my sales progress over time, and how I did today compared to yesterday. Then, as a manager, I can see how all my salespeople did today versus yesterday. So it's a great way to aggregate what's happening, I'd say, pretty much in the present. When you want to dig back into reporting on the past, or into prediction of the future, then you're building custom interfaces for those activities.
Lea Pica: [00:18:30] I see. That's also interesting, because I had some experience working with a team dashboarding in Power BI, and we discovered a chart type I had never seen before. I don't know if they came up with it, and of course, I'm blanking on the name; it might have been called a spaghetti chart. When we talk about chart types I'm going to bring up stacked bars, but one of their chart types seemed to solve an issue I would run into with stacked bars. When you're looking at composition changing over time, the ranking of the segments might change, but the segments don't move in a normal stacked bar. What their possibly-spaghetti chart did was actually reorder the segments by rank, so you were always seeing the ranking, and the segments would change pole position, almost like in a car race. It's hard to describe. I thought that was a really cool innovation that I hadn't seen anywhere else, and I'm still noodling on whether it's a solution.
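The chart Lea describes may be Power BI's ribbon chart, which shows the highest-ranked category on top in each period, though that identification is only a guess. The core idea, re-ranking stacked segments per period instead of keeping one fixed stacking order, can be sketched as follows; the function names and revenue figures are hypothetical.

```python
def stack_order_by_rank(series):
    """For each period, return segment names sorted from largest to smallest value.

    series maps segment name -> list of values, one per period.
    """
    names = list(series)
    n_periods = len(next(iter(series.values())))
    orders = []
    for t in range(n_periods):
        # Largest value first; sorted() stays stable for ties even with reverse=True.
        orders.append(sorted(names, key=lambda name: series[name][t], reverse=True))
    return orders


def rank_changes(orders):
    """List (period_index, segment, old_rank, new_rank) whenever a segment's rank moves."""
    changes = []
    for t in range(1, len(orders)):
        for name in orders[t]:
            old, new = orders[t - 1].index(name) + 1, orders[t].index(name) + 1
            if old != new:
                changes.append((t, name, old, new))
    return changes


# Hypothetical channel revenue over three quarters
revenue = {
    "Search": [50, 45, 30],
    "Social": [20, 35, 40],
    "Email": [30, 25, 28],
}
orders = stack_order_by_rank(revenue)
for quarter, order in enumerate(orders, start=1):
    print(f"Q{quarter} top to bottom: {', '.join(order)}")
print(rank_changes(orders))
```

A conventional stacked bar would keep Search, Social, and Email in the same vertical order every quarter; the per-period ordering makes Social's climb to first place visible without reading segment lengths.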
Kristen Sosulski: [00:19:39] I mean, being able to display the most important information to users is absolutely critical. When you're building any of these dashboards, you have to know what questions the user has in mind. I think a lot of that just takes talking to people and usability testing to see which pieces of information are really the most important, and whether they can interpret the display.
Lea Pica: [00:20:06] This is so crucial. So if you had to build a dashboard in a day and could only use one tool, which one would you most likely choose? Or would that depend on the situation?
Kristen Sosulski: [00:20:24] Oh, wow. That's super hard for me. But...
Lea Pica: [00:20:27] I know, they're like all your children.
Kristen Sosulski: [00:20:29] They're like all my children. You know, if I had to build it in a day and I wanted it to look amazing, I would probably use Tableau. I think I would get the biggest bang for my buck visually if I first needed to win an audience over with a usable design.
Lea Pica: [00:20:49] OK, that's great to know. After the dashboarding tools, I also want to get into some of the toolsets that are really coming onto the scene, but that I don't think a lot of practitioners have been exposed to yet. I'm talking about R and RStudio, Python, and D3. So tell me why a practitioner would want to get into programming with R when there are so many WYSIWYG-based platforms out there.
Kristen Sosulski: [00:21:20] OK. So R is ranked as like the second or third most popular programming language for data scientists, so R, Python, and SQL are really going to be your top picks if you want to learn anything about coding. SQL, we know, is a query language for databases, so it allows us to get our data, and then using something like R allows us to use various modeling packages, like data mining and machine learning packages, to interrogate our data.
Kristen Sosulski: [00:21:54] So if we want to do any kind of sentiment analysis, for instance, on text data, there are libraries that allow us to process and model it, in addition to pretty sophisticated tools for visualizing our data. You can actually create a dashboard in R, and there are a lot of different libraries. For instance, for all my classes I write tutorials, and I write them all in R. I can produce them in a book format, I can create slides in R, I can create PDFs in R. It's a super extensible language. Everything is in LaTeX, so you get a nice print format for any content you create, and it allows for really easy sharing of projects among team members. So there are a lot of different advantages. And just like Python, there are notebooks, so you can have a visual interface for coding as well, which I think is helpful for those who might be new to it, like you mentioned.
Lea Pica: [00:22:57] OK, so that's really interesting. You mentioned R is extremely popular with data scientists. For an analyst who's not necessarily a data science professional, are you seeing daily applications of these more sophisticated programming languages?
Kristen Sosulski: [00:23:23] To answer you quickly: yes, absolutely. As we have more and more data, Excel tends not to serve our purposes. We can only see maybe a couple hundred rows on our screen at a time, it might start to flicker if we have too much data, and doing anything like transposing or reformatting our data to prepare it for visualization gets hard. Tools like R really help us do that quickly: we can write scripts that we run over and over again, which makes it handy. We can create a template for how we investigate any dataset. That's a tip I have for you later on: create templates or scripts that say, import the data, OK, let's look for missing values, let's make sure it's in the right format, and here are seven templates for the types of charts I might want to create. With just a few lines of code, I can analyze any dataset at a basic level, very, very quickly.
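Kristen describes reusable scripts that import a dataset, look for missing values, and check formats before any charting. She works in R; the sketch below shows the same template idea in Python using only the standard library. The function name `profile_csv` and the tiny inline CSV are hypothetical stand-ins for your own script and data file.

```python
import csv
import io
import statistics


def profile_csv(text):
    """First-pass template: count rows, find missing values, summarize numeric columns."""
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = list(rows[0].keys()) if rows else []
    report = {"rows": len(rows), "columns": {}}
    for col in columns:
        # DictReader fills short rows with None, so treat None like an empty cell.
        raw = [(row.get(col) or "").strip() for row in rows]
        present = [v for v in raw if v != ""]
        info = {"missing": len(raw) - len(present)}
        try:
            numbers = [float(v) for v in present]
        except ValueError:
            info["type"] = "text"  # at least one value is not a number
        else:
            info["type"] = "numeric"
            if numbers:
                info["mean"] = statistics.mean(numbers)
                info["median"] = statistics.median(numbers)
        report["columns"][col] = info
    return report


# Hypothetical dataset; in practice you might pass open(path).read() for a real file
sample = """city,burglaries,population
City A,120,60000
City B,,45000
City C,80,40000
"""

report = profile_csv(sample)
print("rows:", report["rows"])
for name, info in report["columns"].items():
    print(name, info)
```

Because the checks live in one function, the same first pass can be rerun on every new dataset, which is the repeatability Kristen is pointing to.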
Lea Pica: [00:24:28] OK, so this is very interesting. When I think of the first tool I want to use for exploration, like when I have a wall of fresh, crunchy, juicy data and I can't wait to start moving it around, I immediately think of plugging it into Tableau. So do some of these programming languages allow for exploration the same way? Or are they more for when you have a clear sense of the visual you want to create and you're not in an exploratory capacity?
Kristen Sosulski: [00:25:02] So when I think about exploring a dataset like that, that rich, juicy data, it sounds like multivariate data, right? Yeah. For multivariate data, R is amazing. It was designed for statistical analysis, so what you get immediately out of the box are all these different multivariate displays. I'm able to compare every variable to every other in something called a scatter plot matrix. OK. To do that in Tableau, I would have to create individual scatter plots mapping every variable against each other, move them to a dashboard, and then describe all the variables. It takes probably 20 minutes to create one of them. It's one line of code in R.
Lea Pica: [00:25:43] Interesting.
Kristen Sosulski: [00:25:46] A scatter plot matrix is a great way to compare continuous variables to each other, especially if you have a lot of them. Think about something like crimes: looking at the relationship between burglaries and murders in a city, normalized for population, is a common example of multivariate data that we use in teaching. We also have another chart type called a parallel coordinates chart, which allows us to easily map every single variable and its relationship to the others. It looks like a line chart, but each variable gets its own local axis, the axes are connected to each other, and you can see pretty clearly what those relationships are.
Lea Pica: [00:26:33] This is so interesting. So if a listener wanted to get started with R and really dive in, is there a resource that comes to mind as a good entry point?
Kristen Sosulski: [00:26:44] Oh, absolutely. There are tons of R courses. If you're not at NYU Stern you can't take mine, but maybe one day I'll do an online one. Look at Hadley Wickham, who has written several great books on R. He has a new one called R for Data Science; it's about a year old. It's available for free, and it's also available in print through O'Reilly. He is not only an expert coder in R but also a great teacher in terms of his writing style, so you can follow along. He has examples and questions at the end of each chapter, so I think that's a great place to start. But you also need to make sure you practice. Just reading about it isn't going to get you too far, so make sure you do some of those exercises and have some good data to work with.
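In R, the one-liner Kristen has in mind is likely base R's `pairs()` function; in Python, `pandas.plotting.scatter_matrix()` plays a similar role. As a dependency-free sketch, the code below computes the grid of pairwise Pearson correlations that a scatter plot matrix lets you eyeball, plus the per-axis min-max scaling a parallel coordinates chart typically applies so every variable shares a 0 to 1 range. The crime rates (per 100,000 residents) and function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (assumes neither list is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(data):
    """Correlation for every pair of variables: the same grid a scatter plot matrix draws."""
    names = list(data)
    return {a: {b: pearson(data[a], data[b]) for b in names} for a in names}


def min_max_scale(values):
    """Rescale one variable to 0-1 so each parallel-coordinates axis shares a common range."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


# Hypothetical crime rates per 100,000 residents for five cities
rates = {
    "burglary": [400, 520, 310, 610, 450],
    "robbery": [120, 160, 90, 200, 130],
    "assault": [250, 240, 260, 300, 230],
}
matrix = correlation_matrix(rates)
names = list(rates)
print(" " * 9 + "".join(f"{n:>10}" for n in names))
for a in names:
    print(f"{a:<9}" + "".join(f"{matrix[a][b]:>10.2f}" for b in names))

print([round(v, 2) for v in min_max_scale(rates["burglary"])])
```

The correlation grid is only a numeric summary; the scatter plot matrix itself is still worth drawing, because it shows nonlinear patterns and outliers that a single coefficient hides.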
Lea Pica: [00:27:37] Right. Awesome. OK, so there was one more platform I was really interested in. You talk a little bit about augmented reality and animation and how they're starting to influence data visualization. You mentioned something called Quantum Biz in your book, and I went to their site and my jaw kind of dropped at one of the projects they showed, where they used data to create a virtual environment of ships passing through the Panama Canal. So can you speak to what you're seeing in the future, like where the technology is really taking us?
Kristen Sosulski: [00:28:14] Yeah. I mean, the idea of that project is to visualize how much cargo is actually traversing the canal. It's such an interesting system. Think about a ship going from Shanghai to New York through the Panama Canal, and how that happens. It's time series data, to some extent: what changes over time is the ship's speed, but also how much cargo it carries, how many stops it makes along the way, and how much it costs. There are so many different variables, and it tends to look very flat to me when I try to visualize that. So 3-D, and Quantum Biz's work on it, is a step in that direction for sure: really understanding the systems of the world and seeing them closer to reality than just as a two-dimensional plane.
Lea Pica: [00:29:10] Oh man, it was really... I didn't quite understand what you were talking about until I saw it, and then I really understood how powerful it could be as a storytelling tool: not a bunch of line graphs of how much cargo progressed, or a donut chart showing the number, but an actual visual animation of the ships. I thought that was so cool; people should check it out. So we've talked about some of the platforms, and this was really valuable. I'd also love to talk about the right charts for different situations. I might skip over some of the more basic stuff; we love bars for composition and categorical comparison. But first, I want to know: which charts do you see misused most often, where you'd love to see them either used differently or not used at all?
Kristen Sosulski: [00:30:06] Sure. I won't speak to the pie chart; I'm sure a lot of guests talk about that. I actually see the line chart used poorly, like folks mapping things like age on the x-axis of a line chart. You really need real time series data, not categorical data, for that. People also misuse it by trying to show a change over a few years with a line chart when they might only have three years of data. Then you have a big problem in terms of how that line is drawn from one point to the next: it's making a lot of assumptions about what happened in between. Good point. So we want a lot more data to show variation when we're using time series, and I see that misused a lot. That's just one example. The second one is that we love data maps, and sometimes, if we have one or two locations, we tend to think, let's just create a filled map and fill in Texas and California.
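Kristen's warning about drawing a line through only three yearly points can be made concrete: a line chart implicitly interpolates between its points. In this hypothetical sketch, a steady monthly series and a volatile one share identical start-of-year values, so a three-point yearly line chart would draw them identically while the straight segments imply values that never happened.

```python
# Hypothetical monthly values: two years plus the first month of year three (25 months)
steady = [100 + 2 * m for m in range(25)]  # grows by 2 every month
spiky = [100 + 2 * m for m in range(25)]
spiky[6] += 80  # a mid-year surge...
spiky[18] -= 60  # ...and a mid-year slump that the yearly points never see


def yearly_points(monthly):
    """Keep only the first month of each year (indices 0, 12, 24)."""
    return monthly[::12]


def implied_value(points, month):
    """Value a straight line between yearly points implies for a month (linear interpolation)."""
    i = min(month // 12, len(points) - 2)
    frac = (month - 12 * i) / 12
    return points[i] + frac * (points[i + 1] - points[i])


print(yearly_points(steady), yearly_points(spiky))
print("month 6 implied:", implied_value(yearly_points(spiky), 6), "actual:", spiky[6])
print("month 18 implied:", implied_value(yearly_points(spiky), 18), "actual:", spiky[18])
```

Both series reduce to the same three points, so only the denser monthly data reveals the surge and the slump.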
Lea Pica: [00:31:12] Yeah.
Kristen Sosulski: [00:31:14] The most populous. I call that the population map problem, and it's a big one. You have a map of the US, and you tend to see similar patterns in every map of the US you look at, because our most populous states tend to have the highest values of whatever it is we're looking at, like sales, for instance. So making sure the data is normalized is super important when we show maps of the US. I also always see Alaska and Hawaii either missing or zoomed out so much that I can't see any detail in the continental US. So those are some examples.
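One common way to avoid the population map problem is to map a rate (per capita, per 100,000 residents, per store) instead of a raw total. This is a minimal sketch assuming that kind of per-capita normalization; the region names and numbers are made up so that the effect is easy to see.

```python
# Hypothetical totals for three made-up regions; the numbers are illustrative only.
sales = {"State A": 390_000, "State B": 290_000, "State C": 6_400}
population = {"State A": 39_000_000, "State B": 29_000_000, "State C": 640_000}


def per_capita(values, pop, per=100_000):
    """Normalize raw totals to a rate per `per` residents before mapping them."""
    return {region: values[region] * per / pop[region] for region in values}


rates = per_capita(sales, population)
print(rates)  # {'State A': 1000.0, 'State B': 1000.0, 'State C': 1000.0}
```

On a raw-totals map, State A would look 60 times "hotter" than State C. Once the figures are normalized, all three perform identically.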
Lea Pica: [00:31:57] OK, good point. So then which charts would you actually like to see used more, the underdogs of the race?
Kristen Sosulski: [00:32:08] That's interesting. One of the things I talk a lot about with my students is knowing the level of sophistication of your audience. Are they able to interpret something that might seem like a simple chart type to many of us, like a histogram or a box plot, but isn't so simple to them? Maybe they haven't had the statistical training, or they did but, like most people, forgot it. We forget what the horizontal line in a box plot means: it's the median, not the average. So it's about knowing that level of sophistication. When we want to show relationships, I really like to use scatter plots. I feel like they're very interpretable, if the presenter helps interpret what we're looking at and what the intersection of those two values really means. So I would like to see more sophisticated displays used to show phenomena. I see a lot of data graphics used to report on things that happened in the past, and I'd like to see more charts that show predictions or raise questions about the future. I think those tend to be much more interesting.
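To illustrate the median-versus-average point: the line inside a box plot is the median, and with skewed data it can sit far from the mean. The sketch uses Python's standard `statistics` module on hypothetical deal sizes. Note that `statistics.quantiles` defaults to the "exclusive" method, and charting tools may compute quartiles slightly differently.

```python
import statistics

# Hypothetical deal sizes (in $K) with one large outlier.
deal_sizes = [10, 12, 13, 15, 16, 18, 250]


def box_plot_summary(data):
    """Five-number summary that a box plot draws (quartiles via the 'exclusive' method)."""
    q1, median, q3 = statistics.quantiles(data, n=4)
    return {"min": min(data), "q1": q1, "median": median, "q3": q3, "max": max(data)}


summary = box_plot_summary(deal_sizes)
print(summary["median"])  # 15.0: the line drawn inside the box
print(round(statistics.mean(deal_sizes), 2))  # 47.71: the average, which the box plot does not show
```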
Lea Pica: [00:33:30] OK. Awesome. So I'd love to talk about some chart types I've had hang-ups about, and maybe you can help either debunk or reinforce them. First, stacked bar graphs. I've struggled with them in the past because I find that they communicate too much at once: volume, composition, and trend. What's your feeling on trying to show these three things at once? Is it just too much?
Kristen Sosulski: [00:34:02] You know, you want to keep the number of stacks, or subcategories, per bar to four or fewer.
Kristen Sosulski: [00:34:08] So if we have more than four, we already have a problem. And if you want to be able to compare across categories, you probably want to make it a 100 percent stacked bar rather than showing actual quantities. Then you can see the proportional change over time. You see more of a block of color: say your top segment is light gray and the bottom segment is black, you can see that light gray and that ribbon of change. Those are the things you're looking for. If you're not seeing that, you might want to avoid the stacked bar. So if there isn't any discernible trend over time across categories, you might just look at it in terms of quantities, and you wouldn't use the 100 percent version. You also want to reduce the number of categories. You don't want a stacked bar showing different demographics per state in the US; that's just way too many. So those are just a few things that come to mind.
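The two rules of thumb above can be written as a small data-prep step: convert each bar's segments to shares of that bar's total (a 100 percent stacked bar), and refuse bars with more than four segments. This is a sketch under those assumptions. The function name, the `max_segments` guard, and the channel figures are hypothetical.

```python
# Hypothetical revenue split by channel; values are illustrative only.
bars = {
    "2018": {"Online": 30, "Retail": 70},
    "2019": {"Online": 45, "Retail": 55},
}

MAX_SEGMENTS = 4  # rule of thumb from the episode: four or fewer stacks per bar


def to_percent_stack(bars, max_segments=MAX_SEGMENTS):
    """Convert each bar's segment values to percentages of that bar's total."""
    result = {}
    for label, segments in bars.items():
        if len(segments) > max_segments:
            raise ValueError(f"{label}: {len(segments)} segments; consider fewer categories")
        total = sum(segments.values())
        result[label] = {name: 100 * value / total for name, value in segments.items()}
    return result


print(to_percent_stack(bars))
# {'2018': {'Online': 30.0, 'Retail': 70.0}, '2019': {'Online': 45.0, 'Retail': 55.0}}
```

Because every bar now sums to 100, the "ribbon of change" (Online growing from 30 to 45 percent) is readable even if total revenue differs between years.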
Lea Pica: [00:35:10] OK. And then bubble charts. I know our brains struggle to judge relative circle area, and that's one reason the pie chart trips us up. So what are your thoughts on effective bubble chart usage?
Kristen Sosulski: [00:35:27] I mean, we have the example of Hans Rosling, who used it brilliantly, right? He did, because first of all, you see him presenting it. He didn't show you a static data graphic with the bubbles standing still. A bubble chart is just a scatter plot with the bubble size mapped to a different variable, and that variable might be a quantity like population or sales. The bubbles tend to overlap each other. So without animation that lets us see the bubbles change, we have the overlapping problem. And then we have the comparison problem: if things are overlapping, we can't compare them because we can't see them. But if we're showing change over time, it tends to be very effective, especially when we can point out the bubbles that are the most important. I have a section of my book on this. I don't know if you remember the Hans Rosling talk, but what color is China? Do you remember?
Lea Pica: [00:36:32] No.
Kristen Sosulski: [00:36:34] He says China's the big red bubble. I figured it was red. And you follow it through the whole presentation. You don't really need to know what the other colors mean, because he's really focused on one story, and with every chart we can tell dozens of stories. It really depends on the one you want to focus the audience's attention on. The reason the other countries in that example are plotted is that he's trying to show the entire worldview. But he has one story, just about industrialized countries versus those that are.
Kristen Sosulski: [00:37:10] And he keeps the story very simple as he explains it. But with bubble charts on their own, when they're static and the bubbles overlap, we have the problem you mentioned: judging the size of the bubbles accurately. Sometimes we can't even see exactly where they're plotted, because a bubble is taking up the screen. So those are some things to really pay attention to.
Lea Pica: [00:37:30] OK. Those are great things to look out for. We already talked about some pitfalls with maps, so I'd love to talk about stream graphs. I actually hadn't come across this name, so tell us about it.
Kristen Sosulski: [00:37:46] OK, so stream graphs are similar to stacked area charts. A stacked area chart shows some type of change over time. If we deconstruct it, a stacked area chart is just a line chart with the area below the line filled in, and then we stack them, so we can show many different variables. Let's take stock price: we can show stock price over time with a line chart. Say we have three stocks: Amazon, Facebook, and Apple. You can fill in the area below the line for each of our stocks, and then we get a stacked area chart. We can take that a step further if we want to show volume over time, or market cap over time, of these three companies, and see that variation by aligning everything to the zero axis. So some stocks appear above and some below, but it snaps the data to the zero axis at each point.
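Here is one way to get that zero-centered baseline. The sketch uses the "silhouette" offset, which is the name D3 gives its `stackOffsetSilhouette`: at each time step, the stack starts at minus half the total, so the stream is symmetric around zero. This is an assumption about how such a chart might be built, not necessarily the method behind any chart mentioned in the episode. The series names and volumes are hypothetical.

```python
def silhouette_layers(series):
    """Stack series around a zero-centered baseline, as in a stream graph.

    series: mapping of layer name -> list of non-negative values, one per time step.
    Returns: mapping of layer name -> list of (lower, upper) bounds per time step.
    """
    names = list(series)
    steps = len(series[names[0]])
    layers = {name: [] for name in names}
    for t in range(steps):
        total = sum(series[name][t] for name in names)
        lower = -total / 2  # center the whole stack on the zero axis
        for name in names:
            upper = lower + series[name][t]
            layers[name].append((lower, upper))
            lower = upper
    return layers


# Hypothetical volumes for three made-up series over two periods.
volumes = {"A": [2, 4], "B": [2, 2], "C": [2, 2]}
print(silhouette_layers(volumes)["A"])  # [(-3.0, -1.0), (-4.0, 0.0)]
```

The absolute y-values carry little meaning here. What the reader sees is the thickness of each layer and of the whole stream over time, which matches Kristen's point that the focus is on variation and volume.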
Lea Pica: [00:38:47] As the baseline?
Kristen Sosulski: [00:38:48] As a baseline. You have values above and below, and it's not important to read exact values on the y-axis. We want to see the variation and the volume over time.
Lea Pica: [00:39:00] So it sounds similar in concept to a target variance bar chart. I've used those to set a particular target for a group of categories all to zero, and then the differences in their actual performance are aligned to that zero baseline, essentially to create a relative comparative... I'm blanking on the word... relative comparative point.
Kristen Sosulski: [00:39:31] Absolutely, yes.
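The target variance chart Lea describes can be sketched in a few lines: express each category's actual as a percent deviation from its own target, so every target sits at zero and the bars grow up or down from that shared baseline. The region names, figures, and the function name `variance_to_target` are hypothetical.

```python
# Hypothetical targets and actuals per region.
targets = {"North": 200, "South": 150, "West": 100}
actuals = {"North": 230, "South": 135, "West": 100}


def variance_to_target(actuals, targets):
    """Return each category's % deviation from its own target, so every target sits at 0."""
    return {k: 100 * (actuals[k] - targets[k]) / targets[k] for k in targets}


print(variance_to_target(actuals, targets))
# {'North': 15.0, 'South': -10.0, 'West': 0.0}
```

Using percentages rather than raw differences lets regions with very different targets share one zero baseline.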
Lea Pica: [00:39:33] Cool. All right. So then, the lie factor. Yeah, I'm loving this idea. I talked a lot about this with Scott Berinato of the Harvard Business Review, in terms of integrity as a data storyteller. So what is the lie factor?
Kristen Sosulski: [00:39:50] So the lie factor is a metric invented by Edward Tufte, the grandfather of information graphics. It's so important to understand that when there's a change in the data, let's say an increase from one year to the next, we show that change precisely and proportionally in the data graphic. So if we want to show 2018 to 2019, we want to show the percentage change from one year to the next. Let's say it's a 10 percent change; then that bar should be only 10 percent bigger than the previous one. A lot of times there's an exaggeration in the data graphic compared with the data itself, and that's where the lie factor comes in. We want our lie factor to be one. There's a simple calculation for it: we take the change in the data, we measure the change in size of the graphic, whether in centimeters or inches, and then we're able to compute our lie factor.
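The calculation Kristen describes follows Tufte's definition: the lie factor is the size of the effect shown in the graphic divided by the size of the effect in the data, where "size of effect" is the relative change from the first value to the second. The sketch below assumes that definition. The revenue and bar-length numbers are hypothetical, and graphic sizes can be in any consistent unit.

```python
def size_of_effect(first, second):
    """Relative change from the first value to the second."""
    return (second - first) / first


def lie_factor(data_first, data_second, graphic_first, graphic_second):
    """Size of effect shown in the graphic divided by size of effect in the data.

    Graphic sizes can be in any consistent unit (cm, inches, pixels).
    A value of 1.0 means the graphic is proportional to the data.
    """
    return size_of_effect(graphic_first, graphic_second) / size_of_effect(data_first, data_second)


# Hypothetical: revenue grows 10% from 2018 to 2019 (100 -> 110).
print(lie_factor(100, 110, 5.0, 5.5))  # 1.0: bar length also grows 10%
print(lie_factor(100, 110, 2.0, 3.0))  # 5.0: a truncated axis makes the bar grow 50%
```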
Lea Pica: [00:40:55] So I loved the simplicity of that calculation. I'm totally sold on how it applies to bars especially, since your eye needs the entire length of the bar to properly interpret and compare them. But what I've seen is that there's more wiggle room with lines. I've often seen axes truncated to zoom in, for example when a line shows body temperature, where the range is much smaller but the change is highly significant. I've never quite been able to wrap my head around a clear explanation: is the lie factor the same for lines as it is for bars, and if not, why not?
Kristen Sosulski: [00:41:46] A lot of times with line charts we start the y-axis at a value that isn't zero, and we do that to show variation in the data. I'll go back to the stock price example. If the stock was never below $144, we wouldn't start with a zero baseline. Why? Because we want to show variation. Say we're showing a five-day moving average or something: we don't want to start at zero. We want the minimum to be a few values below the lowest value in the data set. If the lowest value is 144, you might start at 140, for instance, and that would be the min value of your y-axis, and the max value might be a few values above the highest value in your data set, say 150. Now you're able to use the entire space, the real estate of the chart, to show the variation over the five days. Otherwise, if there's only a variation of two or three percent, you might just see a straight line, right? And that's why, when we look at the paper, we see these little sparklines, and all those sparklines are built the way I just described.
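The axis rule Kristen describes for line charts and sparklines can be sketched as: take the data's min and max and snap each one outward to a round step (5 here, as in her 140-to-150 example), so the line fills the plot area. This is for lines only; bars should keep a zero baseline, as discussed above. The function name, the `step` default, the extra-step rule when a value lands exactly on a gridline, and the prices are all assumptions made for illustration.

```python
import math


def padded_axis_range(values, step=5):
    """Pick y-axis bounds just outside the data, snapped to multiples of `step`."""
    low, high = min(values), max(values)
    axis_min = math.floor(low / step) * step
    axis_max = math.ceil(high / step) * step
    # If the data lands exactly on a gridline, push the bound out one step for headroom.
    if axis_min == low:
        axis_min -= step
    if axis_max == high:
        axis_max += step
    return axis_min, axis_max


# Hypothetical five daily closing prices that never dip below $144.
closes = [144.2, 145.0, 146.1, 147.3, 146.8]
print(padded_axis_range(closes))  # (140, 150)
```

In a plotting library, you would then pass these bounds to the y-axis limits, for example Matplotlib's `Axes.set_ylim`.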
Lea Pica: [00:42:52] OK. They're truncated. OK, that's really useful. It's always been the kind of thing I've wanted a really concrete explanation for, so that's valuable. Now, you talk a lot about animation and the different ways to use it, and I would love to know: how can someone use animation to communicate data in an everyday, practical setting at work?
Kristen Sosulski: [00:43:19] Great. So, animation. There's sophisticated animation, and there's animation we can kind of fake, and we do this all the time with PowerPoint, right? You create a few slides instead of animating it. When we want to progressively disclose information, we can take a whole chart, put white space over two-thirds of it, and then remove that white space very slowly. This is the really low-tech, simple way of using animation: we're literally revealing the chart. So that's one way we can use animation. Another is if we want to show some type of trend over time. Say we have a time series on a data map of the world, and we want to show changing population over time. You show a static worldview, and the only way to add a time element to it is to animate it, or to show many different versions of it, frame by frame by frame, which shows year by year by year. Those are the only two options for showing change over time: we either use something called small multiples, or an animation, which is just small multiples stacked on the same page that we move through with a slider, frame by frame by frame. So think about having a PowerPoint with nine slides, each with the same data graphic, changing only by that year element.
Lea Pica: [00:44:56] OK, that's great. And actually, I'm a huge fan of small multiples, especially as a build, because they help you avoid line chart spaghetti. If you're trying to get a sense of, say, a number of segments with the same metric, separating them all out can be really valuable.
Kristen Sosulski: [00:45:15] Absolutely. Absolutely. There's a great one by the New York Times that shows how people perceive their neighborhoods in New York City. It's a map of New York City, and they have about twelve of them, all printed in color with a gradient from light green to dark green showing people's satisfaction with garbage pickup, noise, and all these different things, and you can see it by neighborhood. It's a great use of small multiples.
Lea Pica: [00:45:42] I'll have to put that one in the show notes for people to check out. Awesome. Well, that was kind of designed to be a fire hose, running through a lot of different points that I think people will find valuable. Now I'd love to transition to a new segment: the Women in Analytics spotlight. First, I'd really love to understand what you think were some of the keys to your success as a notable woman in this still very male-dominated field.
Kristen Sosulski: [00:46:15] Oh gosh, I don't know about keys to success. Well, one thing is that I was always kind of scrappy, so I did everything myself. I was never expecting someone to process my data for me just so I could visualize it. I always went through the entire process: if I didn't know how to model something, I would learn, or take a class on data modeling, and figure out my gaps. And I had a mentor early on who actually gave me a book about knowing what you don't know. I think that's been a real key for me in my work: recognizing where I need to improve and refine my skills, and being able to go out and do that.
Kristen Sosulski: [00:47:04] So being able to have that kind of metacognitive reflection, saying, "I really don't know this that well; how can I better communicate this?", just having that recognition has helped me. And that probably happened to me in my early 20s. I don't think I really understood what I knew until then.
Lea Pica: [00:47:24] Mm-hmm. I like that, getting very hands-on from the beginning. Yeah, I think that's super important. And what about setbacks? Did you encounter any struggles as a woman in the field of analytics? And if you didn't, I think that's great too.
Kristen Sosulski: [00:47:45] I mean, the setbacks I've had are kind of normal ones. I didn't have any tragic setbacks or anything, but there have been times when I did work and thought I should get credit for it, and didn't. So it's about knowing how to share, collaborate, and work in teams, and knowing where you deserve the credit and where it's really a team effort. My disposition is always to represent my work as part of something larger: not just "my work," but "a team helped me do this," for example. Not always getting that kind of reciprocation back has been a struggle, I think.
Lea Pica: [00:48:29] Sure. That's understandable. Well, I really believe it's important to recognize both the men and the women who support our journeys in this field. So I'd love to know if you'd like to give a shout-out to a woman and a man who really supported your professional journey.
Kristen Sosulski: [00:48:50] Oh my gosh. Absolutely. In graduate school, my dissertation advisor, Ellen Meyer, a professor of education at Teachers College, Columbia University, was so supportive of my journey and my thirst for knowledge. And she did something really special. Like I mentioned, I'm really hands-on, and my research ended up taking a very hands-on approach where I would actually go out and interview people to get to the bottom of things. Through her encouragement and tutelage, she really guided me in getting answers to my questions in a way that was very encouraging and that let me get out there and talk to people, not just work with data. And she taught me how to be very ethical in doing that: bringing those ethical practices into my writing, into my work, into how I cite sources, obsessively, in my book. She's really been a huge advocate for that. And yeah, that's awesome.
Lea Pica: [00:49:57] And what about a gentleman?
Kristen Sosulski: [00:49:59] So this is a cheesy one, but I would say my dad. He was always... I'm an only child, and an only female child, and he was really supportive. He actually paired me up with a woman at his company for an internship when I was, I think, 18. She mentored me in working in technology. I was actually a help desk technician, and he gave me a mentor so I could be this 18-year-old helping everybody, including executives, with their computers and those types of things, and it was just amazing. So he's always supported my journey and advocated for my place.
Lea Pica: [00:50:44] Oh, well, thank you. Thanks to both of you for bringing Kristen to the masses.
Lea Pica: [00:50:56] So I call the next segment The Upgrade, which is a power tip, resource, or tool for presenting data more effectively. You've already packed in a thousand tips, but is there one more quick tip you can share with us?
Kristen Sosulski: [00:51:13] Only one? OK. So this is absolutely something I mention to my students all the time: never, ever give a live demo. If you want to show an animation or a beautiful interactive display, screen record it first, embed it in PowerPoint, make sure it's set to play as soon as you go to that slide, and then allow yourself to talk over it. This gives you two things. One, you know it's not going to fail, and your screen is not going to flicker as you transition to the beautiful D3 visualization you created while everybody's waiting for you to do that.
Kristen Sosulski: [00:51:56] And secondly, it gives you an opportunity to practice. You've already recorded what you want to show your audience, and now you can prepare your story for how you want to lead your audience through it. And you can make it look just as good by recording it at very high quality. You can even do things like custom zooms. So that's my tip.
Lea Pica: [00:52:22] OK. Awesome, I think that's a great one. Whenever you're relying on technology to be there for you in the live moment, it has a way of letting you down. All right, so this is our final question. Think hard here. Imagine this very plausible scenario: you're in the front row at Wimbledon when suddenly you trip and fall into a rip in time, and it pulls you back to the moment you're about to give your very first presentation. What does present-day you say to past you? Oh, this is good.
Kristen Sosulski: [00:53:01] Good. I have to think about this. I would say that we always hear about leaving the audience with a takeaway, and I would advocate for saying it, showing it, and printing it out for them.
Kristen Sosulski: [00:53:21] Find a way for them to literally leave with the wedding favor from your presentation. The wedding favor: super important. And that means you should present in a way that teaches, and you want to maximize their attention. You want to know that your audience remembers, and you have to give them cues and help them remember. That means you have to take yourself out of just being a presenter and become a teacher at the same time, someone who has empathy and care for the audience. At the end, you really want to guarantee that they learned something or remembered something. So that's what I would say to past Kristen: step outside of myself and become part of the experience and the audience.
Lea Pica: [00:54:14] I absolutely love that. I might start asking, "What's your wedding favor?" And it reminded me of a theme in your book that really resonated with me, as a pseudo-teacher at this point: even if you're presenting something brief and it's not a workshop, imagine that you are their teacher for that time. You are teaching them something, and you're trying to teach in a way that locks it into their long-term memory so that they will act on it. I thought that was such a valuable mindset. I might be putting that on Twitter.
Kristen Sosulski: [00:54:54] That's great. That's right. No, I mean, it's so important. I think that's because I studied education for so long, and before I did, and that's why I told that story, I didn't have that kind of recognition: wow, I really want them to remember something. So I actually think about them as humans, and about how we process and retain information. So build on something they already know, or ask them a question, have them think, have them do something active. Otherwise, they'll forget it.
Lea Pica: [00:55:24] I love it. I love it, I love it, I love it. Well, unfortunately, our time has run out. I feel like this could be a four-day...
Lea Pica: [00:55:51] Well, thank you so much. This was such a fun episode, getting really hands-on with the right tools, the right charts, and the right mindsets. And I really appreciate the path you're blazing, especially for women in this field, to not only teach but teach others how to teach their audiences, even in a simple presentation. So I give you kudos for that.
Kristen Sosulski: [00:56:17] Oh, thank you so much. Thank you so much. It's been my pleasure being here.
Lea Pica: [00:56:24] What an amazing interview, right? Kristen is a really special find in this field, and I am really loving how women are leading the charge as experts in data visualization. It's just so inspiring to play even a small part in that movement.