How I Feel When I Have A Conversation

This post doesn’t offer any solutions. It just shows how I feel. If that makes you uncomfortable or angry, don’t read it.

Hi! That’s me:

image

That is me having a conversation:

image

And that is me having a conversation that is sad. Or frustrating. Or hard. Or intimate. Or everything together:

image

Often, I use technology to have these conversations: The easy ones and the hard ones.

I’m not a security and privacy expert. I’m a normal informed user, using the technology many other people use: WhatsApp for my family. Telegram and Facebook Messenger for my friends. Slack for my coworkers.

Technology, in the best case, shouldn’t make us feel like we need to care about privacy. In the best case, we feel that we can control the audience of our conversations.

image

But we’re not there yet. I can’t get over the feeling that I’m not the only one in the room with my conversation partner. There’s SOMEBODY else.

image

This SOMEBODY is hard to define. Is it the service I’m using? Facebook? WhatsApp? Slack? Is somebody listening whom I don’t even know? And are all these SOMEBODIES sharing their information with other SOMEBODIES? I feel like I can’t judge that.

image

Knowing – or rather: not knowing – about these SOMEBODIES changed my conversations. In the beginning, I minded. And I spoke like I would speak to strangers: Controlled. Playing it safe. Imagining all the possible SOMEBODIES in the room.

image

But nothing happened. There were no consequences of any of my conversations. All these SOMEBODIES didn’t seem to care. Or they cared, but didn’t blab out my conversations. Or they did blab out my conversations, but nobody acted on them. And I didn’t know: Was that because my conversations were boring? Or was it because SOMEBODY was actually a nice guy?

image

I hoped for the latter and became braver. That’s where I am now: I don’t imagine anymore that every stranger out there listens to my conversations. SOMEBODY gets more and more invisible. He’s not part of my conversations anymore – at least in my mind:

image

So I tell my conversation partner the hard things via these services. The sad things, the shocking things, the uncomfortable things: Things I wouldn’t want strangers to know.

image

I still have friends who are very much aware of the SOMEBODY in the room. They want to make sure that he doesn’t listen. I don’t mind it. But I wonder how much it helps. And again, I feel like I can’t judge it.

image

But these days, most friends seem to care less than I do. It makes me feel a little bit uncomfortable, but also braver in my own conversations: If they tell SOMEBODY about all the hard and private things and nothing happens – then maybe I’ll be fine, too?

image

Conversations are hard by themselves. Being concerned about privacy doesn’t help them. Being NOT concerned about privacy doesn’t seem to be the solution, either. I’m left confused. But not confused or afraid enough to use PGP, Signal and other services and methods used by “privacy nerds”. We should all be privacy nerds – but it still isn’t as convenient as just being slightly scared. I don’t think that we will change. But I have faith that privacy technology will change; becoming more convenient.

And I really hope that all the SOMEBODIES don’t use the information they have gathered about me until then.



Which Cities Are On Similar Latitudes?

Yesterday afternoon I worked on a small project: I got rid of these annoying things called longitudes and just showed the latitudes of cities and continent borders on a graphic. I published it on Twitter and got TONS of feedback. Some people said that the cities are an odd selection, some wanted to know why I picked Paris over Rome, and some wanted an “‘add your city to this chart’ function”. Well, I’m not a fan of that. But I did extend the original graphic a little bit, to do more justice to more cities.

flatland

I might add some other versions in the future. Maarten Lambrechts said he “Would like to see one with South flipped upwards, y-axis starting at 0”, and that seems like a great idea. And both DDNNNDLNIF and Kavya Sukumar mentioned that they would love to see a version with longitudes. I think that the latitudes are more interesting than the longitudes (because they are harder to compare on a map), but it’s worth a try.

If somebody wants to tweak these versions etc, go ahead. It’s all Creative Commons (BY-NC-SA). Here is the data I used for this graphic; pulled from geonames.org. It contains the ten biggest cities for each country.
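For anyone who does want to tweak: here is a minimal Python sketch of the comparison behind the graphic, finding cities that sit on a similar latitude. The handful of latitudes are rounded values typed in by hand for illustration, not the geonames.org data, and `similar_latitude` and its `tolerance` parameter are hypothetical names.

```python
# Rounded latitudes in degrees north, hand-typed for illustration only
# (assumption: NOT taken from the geonames.org file used for the graphic).
CITIES = {
    "Vancouver": 49.3,
    "Paris": 48.9,
    "Rome": 41.9,
    "New York": 40.7,
    "Madrid": 40.4,
    "Beijing": 39.9,
}


def similar_latitude(cities, name, tolerance=1.6):
    """Return the other cities within `tolerance` degrees of `name`, closest first."""
    target = cities[name]
    nearby = [
        (abs(lat - target), other)
        for other, lat in cities.items()
        if other != name and abs(lat - target) <= tolerance
    ]
    return [other for _, other in sorted(nearby)]


print(similar_latitude(CITIES, "Rome"))
print(similar_latitude(CITIES, "Paris"))
```

Longitudes play no role here at all, which is the whole idea of the graphic.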

Also, if you enjoyed this kind of map, check out this map by Eric Odenheimer, the maps by the Washington Post that were inspired by his work, and a map by Andy Woodruff that was inspired by THEIR work.

Edit: Devdutta Bhosale recreated the same idea in Tableau, but on an actual map. And you can compare cities on similar longitudes in his version as well. Thanks, Devdutta!

flatland


Edit2: Philipp Bock recreated the same idea with React and Redux, and you can search for cities and add all ones you want to / have lived in / have a personal relationship to. Thanks, Philipp!

flatland


…and Djam Saidmuradov built a d3 Version which shows you the capitals, the 20 biggest cities and also lets you search for cities. Thanks, Djam!

flatland



Less News, More Context

This is the transcript of a lightning talk, given on the 16th of June 2016 at the Information+ conference in Vancouver.

Goals are great. Only if we have a goal can we move towards it. And journalism has all sorts of shiny goals: It wants to be the guardian of society, a mirror of the world, create a better world and help people in their daily lives.

But journalism also likes eyeballs. And the way to attention is not always the same as the way to creating a better world. The way to attention is mostly NEWS, showing THAT something happened.

image

Why does news work so well? Because we are all drawn to novelty, like moths to the light. New equals important in our heads.

image

That’s why journalism talks so much about these dots – and blows them up. News sites tell us about these dots with different kinds of media: Text, video, photos, and, of course, information visualization. They all have their advantages and show the same dots from different perspectives. They enhance the dots and make them memorable.

image

But I see that as problematic. First, these dots don’t represent the world – something that would be needed to get to these high goals journalism has. Instead of bringing a better understanding of the world, these dots lead to unjustified emotions and beliefs in our heads. One reason for that is the fact that bad things resonate more with us than positive ones. “If it bleeds, it leads.” If an event is bad, we may perceive it as more common.

But even if we report positive news, they don’t lead to a better understanding of the world. We all blow up unlikely events in our heads and think they are more common than they actually are, like plane crashes and lottery wins. On the other hand, very likely events are underestimated in our heads.

image

We also blow up terrorism in our heads. 400 of 1000 US adults worry that someone in their family will become a victim of terrorism, according to a survey by YouGov last December. 40% of US adults! I find that incredibly sad. This irrational fear is built up through news coverage, but useless for navigating the world.

What I would like to see is even more examples of bigger pictures. Of comparing and setting things in context. That’s the actual super power of data visualization in journalism - not creating locator maps for news events.

image

We can show that one dot is actually part of a decreasing trend, like the Huffington Post did with terrorism in Europe after the Brussels attacks. Another example is Our World in Data by Max Roser, who here shows that poverty is going down dramatically when seen over two centuries. And there’s “Früher war alles schlechter”, a weekly column in the German magazine DER SPIEGEL. One of my designs for this column shows, for example, that there are far fewer plane crashes nowadays than 45 years ago.

Or we can show that a dot is actually an outlier. For example, the Los Angeles Times created a simulator that lets you experience how unlikely it really is to win in the lottery. That’s a great example of fighting against the hope that we, too, could easily be the lucky winners. That hope gets mostly fueled by the lottery industry and advertisement – but also by showing the lottery winner on newspaper covers on the next day.

image

Dots still have their place. To know about the dots makes sense for the fields we are working in or the ones we are personally invested in. And we still need news to serve as examples for the bigger picture. We want to get up and down the ladder of abstraction. But I’m arguing for more balance. I think it’s crucial in journalism to ask ourselves more and more: With which information can my audience navigate this world better? Which information will build useful beliefs?

image

Data Visualization can help with that. It’s an important tool for destroying useless beliefs and creating useful ones, and we should use it as such. Setting things in context can bring journalism closer to its actual goals. It can bring journalism closer to answering the question “In which kind of world do we live?” than answering “What’s new?” can bring us.



One Chart, Twelve Tools

image

Which tool or charting framework do you use to visualize data? Everybody I’ve met so far has personal preferences (“I got introduced to data vis with that tool!”, “My hero uses that tool and she makes the best charts!”). Often we keep working with the first not-entirely-bad tool or language that we encountered.

I think it can’t hurt to have a wider view of the options out there: To maybe discover better tools than the ones we use; but also to reassure us that the tools we use ARE really the best (so far). That’s what this post and the next post are about. I wanted to get to know as many options to visualise data as possible. To do that, I took the same dataset and visualized it with 12 different tools (this post) and 12 different charting libraries (next post).

If there are important tools I missed, or if I missed some features in a tool or a better way to get to the bubble chart, or if I’m wrong about a thing or two, or if you completely disagree with my opinion about these tools (which, I’m sure, will happen): Let me know on Twitter or via email (lisacharlotterost@gmail.com)!



The Data & the Visualization Form

To visualize data, you need data. I will use a dataset that contains the life expectancy in years, GDP per capita and population for 187 countries in 2015, provided by Gapminder. Here’s a Google Spreadsheet with the data:

image

I will try to visualize the data in the same form Gapminder does: On the x-axis, I want the GDP per capita (“income”). On the y-axis, I will put the life expectancy in years (“health”). And the size of the bubbles will represent the population of the country. Some tools call that a scatterplot, some call it a bubble chart.

image

I chose that visualization form over a simple bar chart for multiple reasons: Setting the size of the bubble will show which tools are more advanced. Also, we want the x-axis to start at 0 and be log-scaled. Let’s see which tools can handle that.
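A small aside on what “start at 0 and be log-scaled” can mean, sketched in Python. A logarithmic axis has no position for the value 0 itself, so one reading (an assumption, but it matches the R code in the next post, which plots log(income) on a range from 0 to 11) is that the axis starts where the logarithm is 0, i.e. at an income of 1. The function name `log_position` and the pixel width are hypothetical.

```python
import math


def log_position(income, log_min=0.0, log_max=11.0, width=1100.0):
    """Horizontal position of `income` on an axis that is linear in ln(income).

    log_min and log_max are the axis limits in natural-log units, so
    log_min=0 corresponds to an income of 1, not to an income of 0.
    """
    return (math.log(income) - log_min) / (log_max - log_min) * width


# An income of 1 lands on the origin; each factor of e moves 100 px to the right.
print(log_position(1))
print(log_position(math.e))
print(log_position(math.e ** 2))
```

Equal steps on such an axis correspond to equal ratios of income, which is why poor and rich countries both stay readable on the same chart.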

Please be aware that I will only use tools and programming languages that make sense for my data and not networking tools like Gephi. Also, the statements I make about the tools and languages are only true for the dataset I chose and definitely influenced by my past experience with the tool.

Some rules: I will try to reproduce that Gapminder chart as well as possible, but I won’t tweak the design more than the tool allows. E.g., in Illustrator, I will only use the chart tool, not the thousands of design options. Also, I won’t tweak the csv beforehand. The csv will stay like it is. Yihaa, let’s go!



Excel

We start with the most common software for making charts. It’s also responsible for the most complicated process to get to this scatterplot of all my tools. After finally figuring out how to assign columns to axes, I couldn’t find a way to make the bubbles feel NOT like one big black merged cheese hole something. The option “Vary color by point” (I had all my hope on it) gives all bubbles an individual (very colorful) color. Hm. That said, Excel did an excellent job with the axes.

image image



Google Sheets

Google Sheets is my favorite spreadsheet app out there (so far). On a Mac at least, it just works smoother than Excel. Also, it’s far less complex and powerful than Excel, but “good enough” for the daily stuff I do in spreadsheets (split, unique, countif, ifelse, vlookup, max, average, median, simple math and pivot tables, because I loooove pivot tables). Google Sheets CAN do bubble charts, but only with a little hack - I would have needed to change my data for it.

image image



Adobe Illustrator

Ah, Illustrator. A tool that almost every Information Designer uses for static designs, daily. Let’s be precise here: The charting options in Illustrator suck. For example: Illustrator can’t do bubble charts. Also, there is no option to set the Y-Axis to Zero (if you want that, you need to include a row in your data in which you put a 0 and that you then delete in the chart). Also, there is no option to set an axis to log-scale (if you want that, you need to do so in your data). Also, Illustrator connects all the dots (which, to be fair, can be helpful in 5% of the cases). Sigh. Edit: David Ingold showed me the way to set the origin to 0 and to disconnect the dots.

image image



RAW by DensityDesign

I’d call RAW a super-easy-to-use extension for Adobe Illustrator: You can export fancy charts as SVG or PDF and then tweak them in Illustrator. For scatterplots, RAW does have the option to change the size of the bubbles and the option to set the origin at 0 – but only for both axes at once. Also, no log scale.

gif image



Lyra

I like Lyra. I’m a fan. It’s been around for quite some time, but it’s still buggy and I wouldn’t recommend it as the only tool one should use. But more than its execution I like its concept: Lyra treats every visual element and its size, height and width as something that can be manipulated by data - by simply dragging variables and scales onto them. You can export your final graph as SVG, so it’s Illustrator-tweakable. Lyra is what Illustrator SHOULD be.

gif image



Tableau Public

What can I say - Tableau Public just works well for data like this. It’s too slow for my taste to use it for interactive graphs, and it’s a huge shame that it’s not possible to export SVGs or PDFs or anything, really, in the free version. But for exploration, it’s still one of my favorite tools.

Edit: Ben Jones explained that it’s possible to download an Illustrator-tweakable PDF of the chart after uploading it to the cloud. Needing to upload the chart is not awesome, but being able to export a PDF is pretty awesome and could have helped me a lot in the past. It turns Tableau into an Illustrator extension.

gif image



Polestar

Like Lyra, Polestar is a creation of the University of Washington Interactive Data Lab. Polestar uses Vega-Lite, which is based on Vega, which is based on D3.js. Its creators call Polestar a “lightweight Tableau-style interface for visual analysis”. And it IS really lightweight (and still very alpha, as some glitches and the missing option to resize the graphic show). But I’m all for a free, open-source, browser-run alternative to Tableau, so I really hope that the lab will continue working on it.

gif image



Quadrigram

Quadrigram is a story building tool, and one of its sub-features is to build charts. You need to connect your whole Google Drive account to Quadrigram, which seems shady. Honestly, I wouldn’t have included it here, but Alberto Cairo seems to be a fan. And because you all follow Alberto, you’ve probably asked yourself the same question: What is he talking about? Quadrigram appears to be a very simplified version of Lyra. Works for simple stuff, but is a little bit confusing and doesn’t offer log scales. But here too, you can export an SVG.

gif image



Highcharts Cloud

Highcharts is mostly known for its Javascript library, but after some time I figured out that they also have a click tool to generate charts: Highcharts Cloud. I was disappointed by the whole tool. It took me quite some time to figure out the following problems: 1) There is no way to assign variables to axes. Your data table needs to be in the right order. 2) My “income” data wasn’t read as numbers, but as strings or categories, although Highcharts showed me that the data type is “numbers”. 3) “Invert Axes” inverts the axes, but also puts the y-axis upside down. But only the y-axis. Why? Nobody knows. 4) The range of my “health” column goes from 50 to 100, but Highcharts showed me the data on a 0 to 12.5k scale. Why? Again, nobody knows. I had to set the range manually.

gif image



Easychart

Fun Fact: There is an interface for the Highcharts library which works far better for my data than the official one by Highcharts: Easychart. Here, too, it’s not possible to assign variables to axes, so I had to delete my “country” column. And there seems to be the option to set the origin to 0 – but the chart disappears when I do so. And still. I got the chart I wanted. I’m happy. And you can download the result as PDF and SVG, meaning, it’s Illustrator-tweakable.

gif image
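For tools that read columns purely by position, the data has to be rearranged before uploading. Here is a minimal Python sketch of that kind of preparation: it drops the “country” column, fixes the column order and turns every value into a number. The two sample rows are invented placeholders (not Gapminder values), the column order of the real file is an assumption, and `select_columns` is a hypothetical helper name.

```python
import csv
import io


def select_columns(csv_text, columns):
    """Keep only `columns`, in that order, and convert every value to a float."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns)
    for row in reader:
        # float() fails loudly if a value is not numeric, which is the point:
        # a tool that reads positions silently is where strings sneak in.
        writer.writerow([float(row[name]) for name in columns])
    return out.getvalue()


# Placeholder rows for illustration only, not real data.
sample = (
    "country,income,health,population\n"
    "A,1000,60.5,2000000\n"
    "B,20000,80.1,500000\n"
)

print(select_columns(sample, ["income", "health", "population"]))
```

The output has x, y and size in the first, second and third column, which is the order such position-based tools tend to expect.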



Plotly

Like Highcharts, Plotly comes with a click-tool and a Javascript library. Oh, and an R library. And a Python library. And more, and everything well explained. And unlike Highcharts Cloud and Easychart, I was able to assign variables to axes. Everything can be so simple. Not gonna lie: I’m impressed. And Plotly released a better version of their click tool - unfortunately only for Pro users.

gif image



NodeBox

I first came across NodeBox in this Visualoop article. Seemed neat! Visual programming languages like NodeBox have my sympathy. But yeah, while working on the scatterplot, it was pretty obvious that it’s not made for simple Data Visualizations. Instead of directly assigning values to x and y of the ellipse, I had to first create “points” and then pass their coordinates to the ellipse. In which format does that happen? Nobody knows. It took me some time to figure that out – so I was almost relieved to notice that it’s not possible to create axes in NodeBox. Which makes it also very hard to know WHAT you plot. Looks beautiful, though. If you want to run it for yourself, find the code on GitHub!

gif image



Datawrapper

Some of you might have noticed that I didn’t mention their beloved Datawrapper. This tool is one of the easier chart-clicky tools out there and is used by many newsrooms. But it doesn’t offer scatterplots. The explanation I was given: Datawrapper is supposed to be for presentation only, and scatterplots are a presentation format that should be handled with care.



That’s it! Let me know what you think on Twitter or via email (lisacharlotterost@gmail.com) (seriously, I love emails).

The many hours spent trying to understand all these tools were made possible by my Knight-Mozilla OpenNews fellowship at NPR. A big thank you to OpenNews, the NPR Visuals Team and the helpful comments at the GEN Data Journalism Unconference on the 10th of May in New York City.



One Chart, Twelve Charting Libraries

image

Charting Libraries. Gosh, there are so many out there. On Wikipedia and other websites, one can find a comparison of ca. 50 libraries – and these are only JavaScript libraries; not mentioning languages like Processing and libraries for Python and R. In the following blog post, I will try to get to know a few of them out of the great sea of possibilities. I want to understand their differences and how easy it is to learn them. To do so, I created the same bubble chart with twelve different frameworks. The chart and the underlying dataset I’ll use for that experiment are explained in the last post, “One Chart, Twelve Tools”.

I’m fairly new to most of these libraries. If there’s a better way to create the bubble chart than the one I used, or if I’m wrong about a thing or two, or if you completely disagree with my opinion about these libraries (which, I’m sure, will happen): Please let me know on Twitter or via email (lisacharlotterost@gmail.com), or as a pull request on GitHub.



R – native

R is the hippest statistical language around these days. Many data journalists all around the world feel an urge to learn it. Personally, it took me some time to understand the concept of data frames, but it’s totally worth it – especially when R is used with additional libraries like dplyr and ggplot2. But first, let’s look at creating plots with native R, without any libraries. It is possible; often it’s only a plot(d$income,d$health). But to create a bubble chart, we need the symbols function - FlowingData shows how.

#set working directory
setwd("Desktop")

#read csv
d = read.csv("data.csv", header=TRUE)

#plot chart, set range for x-axis between 0 and 11
symbols(log(d$income),d$health,circles=d$population,xlim = c(0,11))

image



R – ggplot2

Native R is ok, but ggplot2 is where the fun begins. Again, it took me some time to get into it – especially because there is more than one possible way to write the ggplot2 command. But I’d consider ggplot2 one of the most flexible and at the same time easy to handle libraries out there.

#import library
library(ggplot2)

#set working directory
setwd("Desktop")

#read csv
d = read.csv("data.csv", header=TRUE)

#plot chart
ggplot(d) +
  geom_point(aes(x=log(income),y=health,size=population)) +
  expand_limits(x=0)

image



R – ggvis

I heard about ggvis only a few days ago. Similar to Bokeh, it tries to make interactive what wasn’t intended to be interactive: ggvis graphics are built on Vega (a Javascript library built on D3.js). And its syntax is very similar to that of dplyr, which I as a dplyr fan appreciate. I’m not sure if I’m a fan of the needed ~ before variables, though. And I couldn’t combine a log-scale with setting the domain of the x-scale to zero.

#import library
library(ggvis)
library(dplyr)

#set working directory
setwd("Desktop")

#read csv
d = read.csv("data.csv", header=TRUE)

#plot chart
d %>%
  ggvis(~income, ~health) %>%
  layer_points(size= ~population,opacity:=0.6) %>%
  scale_numeric("x",trans = "log",expand=0)

image

More R libraries which produce JavaScript visualisations can be found on the htmlwidgets website. Juuso Parkkinen wrote a really good comparison of data vis libraries for R.



Python - matplotlib

Matplotlib is the ggplot2 for Python: it’s a library for Python that makes building charts easier than Python does. I’m totally new to Python, so I found myself stuck in understanding how to import CSVs. For…hours. The Pandas library finally solved that problem for me. Also, I was surprised that I had to tweak the bubble size by hand. But of all the Python libraries I tried, matplotlib is definitely the easiest one.

#import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

#read data
data = pd.read_csv("data.csv")

#plot chart
plt.scatter(np.log(data['income']), data['health'], s=data['population']/1000000, c='black')
plt.xlim(xmin=0) #set origin for x axis to zero
plt.show()

image
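The hand-tweaking of the bubble size can be written down as a small rule. The `s` argument of matplotlib’s scatter is the marker area in points squared, so dividing the population by a constant (as in the code above) makes the bubble area proportional to the population. A minimal sketch that derives the constant from the data instead of guessing it; `bubble_areas` and `max_area` are hypothetical names, and the value 2000 is just an assumption about what looks reasonable.

```python
def bubble_areas(populations, max_area=2000.0):
    """Scale populations so that the largest one gets an area of `max_area`.

    The result is meant to be passed as `s=` to matplotlib's scatter,
    which interprets it as marker area in points squared.
    """
    largest = max(populations)
    return [p / largest * max_area for p in populations]


# Toy populations for illustration only.
print(bubble_areas([1, 2, 4], max_area=100))
```

With pandas, the same idea is a one-liner: `data['population'] / data['population'].max() * 2000`.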



Python - Seaborn

Seaborn is a library built on top of matplotlib. It is made for more statistical visualisations than matplotlib, and seems to be great every time you want to plot a LOT of different variables. For non-statisticians, it might be overwhelming: There are two possible ways to create a scatterplot, and Seaborn defaults to drawing the regression model (aka “trendline”).

#import libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

#read data
data = pd.read_csv("data.csv")

#plot chart
#x and y are passed by keyword; newer Seaborn versions no longer accept them positionally
g = sns.regplot(x='income', y='health', data=data, color='k', fit_reg=False)
g.set_xscale('log')
plt.show()

image



Python - Bokeh

I found Bokeh really promising in the beginning – especially because it creates an HTML file and can be easily made interactive. It could be the perfect combination of a language for analysis (like ggplot2) and one for presentation (like D3.js). Personally, I’m disappointed that Bokeh uses Canvas instead of rendering SVGs. And I had some weird errors during my process.

#import libraries
import pandas as pd
from bokeh.plotting import figure, show, output_file

#read data
data = pd.read_csv("data.csv")

#plot chart
p = figure(x_axis_type="log")
p.scatter(data['income'], data['health'], radius=data['population']/100000,
          fill_color='black', fill_alpha=0.6, line_color=None)

#write as html file and open in browser
output_file("scatterplot.html")
show(p)

image

Great comparisons of more Python tools for data visualization can be found on Mode Analytics, Practical Business Python and Dataquest.



Processing

Processing is the entrance to the world of coding for many designers. The huge advantage of Processing? It is highly, highly flexible; as much or even more than D3.js - and at the same time it’s easier to understand and write. The disadvantage? It’s not made for data visualisation. The Processing coordinate system doesn’t start in the bottom left corner, but in the top left corner, so I had to invert the whole canvas. And axes are possible, but complicated. Also, the result is not made for the web. Javascript libraries like p5.js or Processing.js might solve that.

void setup() {
  size(1000, 500); // sets size of the canvas
  background(255); // sets background color (white)
  scale(1, -1); // flips the y-axis so that it points upwards
  translate(0, -height); // moves the flipped canvas back into view

  Table table = loadTable("data.csv", "header"); // loads csv

  for (TableRow row : table.rows()) { // for each row in the csv, do:

    float health = row.getFloat("health");
    float income = row.getFloat("income");
    int population = row.getInt("population");
    // map the range of the column to the available height:
    float health_m = map(health, 50, 90, 0, height);
    float income_log = log(income);
    float income_m = map(income_log, 2.7, 5.13, 0, width/4);
    float population_m = map(population, 0, 1376048943, 1, 140);

    ellipse(income_m, health_m, population_m, population_m); // draw the ellipse
  }
}

image
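Two things carry the Processing sketch above: the map() function, which re-maps a number from one range to another, and the flip of the canvas so that y grows upwards. A minimal Python sketch of both; `remap` and `flip_y` are hypothetical names, and the formula is the usual linear re-mapping, written out here as an illustration rather than taken from Processing’s source.

```python
def remap(value, start1, stop1, start2, stop2):
    """Linearly re-map `value` from [start1, stop1] to [start2, stop2]."""
    return start2 + (stop2 - start2) * ((value - start1) / (stop1 - start1))


def flip_y(y, height):
    """Convert a y measured from the bottom edge into one measured from the top."""
    return height - y


# A health value of 60 on a 50..90 scale, drawn on a canvas that is 500 px high:
y_from_bottom = remap(60, 50, 90, 0, 500)
print(y_from_bottom)               # distance from the bottom edge
print(flip_y(y_from_bottom, 500))  # what a top-left coordinate system needs
```

Flipping every y value by hand like this is an alternative to the scale(1, -1) and translate(0, -height) pair, which flips the whole canvas instead.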



D3.js

D3.js currently has no real alternative when it comes to creating highly customized, interactive data visualisations for the web. But using D3.js for a simple bubble chart is like using an orchestra to play just one tone, one instrument at a time. Sure, you used the whole orchestra. But you could have played Beethoven.

D3.js is a Javascript library with so few defaults that you need to define everything yourself. The disadvantage? Lengthy code. The advantage? It forces you to think about every single one of your settings. One example: Because in D3 I need to define all ranges and domains of scales myself, I was forced to think about the sizes of the bubbles - of all the languages in this blog post, only Processing wanted me to do the same.

<!-- mostly followed this example:
http://bl.ocks.org/weiglemc/6185069 -->

<!DOCTYPE html>
<html>
<head>
  <style>

  circle {
    fill: black;
    opacity:0.7;
  }

  </style>
  <script type="text/javascript" src="d3.v3.min.js"></script>
</head>
<body>
  <script type="text/javascript">

  // load data
  var data = d3.csv("data.csv", function(error, data) {

    // change string (from CSV) into number format
    data.forEach(function(d) {
      d.health = +d.health;
      d.income = Math.log(+d.income);
      d.population = +d.population;
      console.log(d.population, Math.sqrt(d.population))
    });

  // set scales
  var x = d3.scale.linear()
    .domain([0, d3.max(data, function(d) {return d.income;})])
    .range([0, 1000]);

  var y = d3.scale.linear()
    .domain([d3.min(data, function(d) {return d.health;}),
      d3.max(data, function(d) {return d.health; })])
    .range([500, 0]);

  var size = d3.scale.linear()
    .domain([d3.min(data, function(d) {return d.population;}),
      d3.max(data, function(d) {return d.population; })])
    .range([2, 40]);

  // append the chart to the website and set height&width
  var chart = d3.select("body")
    .append("svg:svg")
    .attr("width", 1000)
    .attr("height", 500)

  // draw the bubbles
  var g = chart.append("svg:g");
  g.selectAll("scatter-dots")

    .data(data)
    .enter().append("svg:circle")
        .attr("cx", function(d,i) {return x(d.income);})
        .attr("cy", function(d) {return y(d.health);})
        .attr("r", function(d) {return size(d.population);});
  });

  </script>
</body>
</html>

image
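Thinking about the sizes of the bubbles leads to one choice worth spelling out. The code above maps population linearly onto the radius, so a country with four times the population gets a circle with sixteen times the area. A common alternative, and explicitly not what the code above does, is a square-root scale, which keeps the area proportional to the value. A minimal Python sketch of both; `make_scale` is a hypothetical helper, not part of D3.

```python
import math


def make_scale(domain, output, transform=lambda v: v):
    """Return a function that maps `domain` onto `output`, like a D3 scale.

    `transform` is applied to the value first: the identity gives a linear
    scale, math.sqrt gives a square-root scale.
    """
    d0, d1 = (transform(v) for v in domain)
    r0, r1 = output

    def scale(value):
        return r0 + (transform(value) - d0) / (d1 - d0) * (r1 - r0)

    return scale


linear_radius = make_scale((0, 100), (0, 40))
sqrt_radius = make_scale((0, 100), (0, 40), transform=math.sqrt)

# A quarter of the maximum value:
print(linear_radius(25))  # radius is a quarter, area only a sixteenth
print(sqrt_radius(25))    # radius is a half, area a quarter
```

Which of the two is “right” depends on whether readers are expected to compare radii or areas; for bubbles, it is usually the areas.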



D3.js Templates

D3.js is complicated and waaay too flexible for 90% of all the charts that get plotted (or 99%? I’m making these numbers up). So some smart people thought: “Let’s use the power of D3.js and make it easy to plot the most common charts with it!” I call these add-ons D3.js template libraries. They are all Javascript libraries which require the D3 library. I tried the three I know of: C3.js, D4.js and NVD3.js.

When using C3.js, I first met the concept of “You have a csv that doesn’t look exactly the way we want our data? Nope nope nope, we won’t read that.” Meaning, my beloved csv got a strange side look from C3. Which then decided that it was unreadable.

Next, D4.js. I tried. I tried for almost an hour. I failed. My console in Chrome wasn’t showing any errors. I googled. Nothing. I gave up. That was the point where I learned that it’s crucial for programming languages to be documented well on the web to be usable. Edit: Mark Daggett, the creator of D4, published a way to build that chart with D4.

NVD3.js was better documented, and certainly more used than D4. NVD3.js, too, can only work with a very rigid data structure. But here, some help on the web let me read my CSV and produce a scatterplot. So half of my code was concerned with reading the data, but the other half looked like this:

...

nv.addGraph(function() {

    var chart = nv.models.scatter() //define that it's a scatterplot
        .xScale(d3.scale.log()) //log scale
        .pointRange([10, 5000]) //define bubble sizes
        .color(['black']); //set color

    d3.select('#chart') //select the div in which the chart should be plotted
        .datum(exampleData)
        .call(chart);

    //plot the chart
    return chart;
});

image



Highcharts.js

Ah, Highcharts. I’ll make it short: I failed. I read through multiple tutorials on how to import a csv, and there seem to be multiple import options. Eventually, I could import the csv, but I couldn’t translate my data into a bubble chart.

What’s the problem, you ask? It seems like you can’t assign variables to axes. I couldn’t tell Highcharts to put the “health” variables on the y-axis; the data needed to be in the right order in the csv in the first place. But if you, my fellow vis friend, go all that way and actually have the data in place - then Highcharts will be beautiful. You will get a good-looking chart with just a few lines of Javascript.

Btw, if somebody wants to help me with getting that bubble chart done in Highcharts - please reach out to me. I will be eternally grateful (terms and conditions may apply). Edit: The nice folks at Highcharts helped me to build that graph (see comments). The missing magic was an option called “seriesMapping”, which maps the columns (“0”, “1”, etc.) to the axes.

<!DOCTYPE HTML>
<html>
  <head>
    <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js" type="text/javascript"></script>
    <script src="https://code.highcharts.com/highcharts.js"></script>
    <script src="https://code.highcharts.com/modules/data.js"></script>
    <script src="https://code.highcharts.com/highcharts-more.js"></script>
  </head>
  <body>
    <div id="chart"></div>

    <script>
    var url = 'data.csv';
    $.get(url, function(csv) {

      // A hack to see through quoted text in the CSV
      csv = csv.replace(/(,)(?=(?:[^"]|"[^"]*")*$)/g, '|');

      $('#chart').highcharts({
        chart: {
          type: 'bubble'
        },

        data: {
          csv: csv,
          itemDelimiter: '|',
          seriesMapping: [{
            name: 0,
            x: 1,
            y: 2,
            z: 3
          }]
        },

        xAxis: {
          type: "logarithmic"
        },

        colors: ["#000000"]
      });
    });

    </script>
  </body>
</html>

highcharts
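
To make the "hack to see through quoted text" a bit less mysterious, here is a small sketch that runs the same regular expression on its own. It matches only the commas that sit outside double quotes and swaps them for `|`, which is why the page above sets `itemDelimiter: '|'`. The function name `swapDelimiter` and the sample string are made up for illustration.

```javascript
// Same regular expression as in the page above: match every comma that is
// followed by an even number of double quotes, i.e. a comma outside quotes.
function swapDelimiter(csv) {
    return csv.replace(/(,)(?=(?:[^"]|"[^"]*")*$)/g, '|');
}

// Made-up sample with a quoted field that itself contains a comma
var sample = 'country,income\n"Korea, Rep.",1000';
var converted = swapDelimiter(sample);

// Only the commas outside the quotes are replaced;
// the one inside "Korea, Rep." stays untouched.
console.log(converted);
```

The lookahead rescans the rest of the string for every comma, so this is probably fine for a small CSV but not something to run on very large files.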



Vega

One of the most important things that came out of the University of Washington Interactive Data Lab is their “visualisation grammar” called Vega, and its lighter sibling, Vega-Lite. Vega feels like as in-depth a charting library as D3.js, but a little less flexible, I believe. It’s definitely easier to build charts with Vega than it is with D3.js. The JSON structure (which forces you to set everything in quotes and curly brackets) is a little bit annoying, but besides that I’m positively surprised.

{
  "width": 1000,
  "height": 500,
  "data": [
    {
      "name": "data",
      "url": "data.csv",
      "format": {
        "type": "csv",
        "parse": {
          "income": "number"
        }
      }
    }
  ],
  "scales": [
    {
      "name": "xscale",
      "type": "log",
      "domain": {
        "data": "data",
        "field": ["income"]
      },
      "range": "width",
      "nice": true,
      "zero": true
    },
    {
      "name": "yscale",
      "type": "linear",
      "domain": {
        "data": "data",
        "field": ["health"]
      },
      "range": "height",
      "zero": false
    },
    {
      "name": "size",
      "type": "linear",
      "domain": {
        "data": "data",
        "field": "population"
      },
      "range": [0,700]
    }
  ],
  "axes": [
    {
      "type": "x",
      "scale": "xscale",
      "orient": "bottom"
    },
    {
      "type": "y",
      "scale": "yscale",
      "orient": "left"
    }
  ],
  "marks": [
    {
      "type": "symbol",
      "from": {
        "data": "data"
      },
      "properties": {
        "enter": {
          "x": {
            "field": "income",
            "scale": "xscale"
          },
          "y": {
            "field": "health",
            "scale": "yscale"
          },
          "size": {
            "field": "population",
            "scale": "size"
          },
          "fill": {"value": "#000"},
          "opacity": {"value": 0.6}
        }
      }
    }
  ]
}

image
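
For readers wondering what the "log" scale in the spec actually does with the income values, here is a minimal sketch of the underlying arithmetic. It is not Vega's implementation and it ignores options such as "nice"; the helper name `logScale` and the income domain of 100 to 100000 are assumptions for illustration.

```javascript
// Hypothetical helper mirroring a log scale: positions are proportional to
// the logarithm of the value, so equal ratios get equal distances.
function logScale(domainMin, domainMax, rangeMin, rangeMax) {
    var logMin = Math.log(domainMin);
    var logMax = Math.log(domainMax);
    return function (value) {
        var t = (Math.log(value) - logMin) / (logMax - logMin);
        return rangeMin + t * (rangeMax - rangeMin);
    };
}

// Assumed income domain of 100 to 100000, mapped onto the 1000px chart width
var x = logScale(100, 100000, 0, 1000);

// Each tenfold increase in income moves a point the same distance to the right
console.log(x(100), x(1000), x(10000), x(100000));
```

Since the logarithm of zero is not a finite number, a log scale cannot start at zero, so the "zero" setting on "xscale" presumably has no useful effect.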



Vega-Lite

...and here’s Vega-Lite, a visualisation grammar that is more high-level than Vega, but less complex and less flexible. Similar to Vega, it has a JSON-like structure, but it sets so many more defaults. It seems amazing, but I couldn’t figure out a way to set the height and width of the whole chart. Edit: The Vega people showed me how to set the height and the width of the chart. Doesn’t seem suuuuper intuitive, but ok. The output looks exactly the same as it does with the Vega-Lite editor Polestar.

{
  "data": {"url": "data.csv", "formatType": "csv"},
  "mark": "circle",
  "encoding": {
    "y": {
      "field": "health",
      "type": "quantitative",
      "scale": {"zero": false}
    },
    "x": {
      "field": "income",
      "type": "quantitative",
      "scale": {"type": "log"}
    },
    "size": {
      "field": "population",
      "type": "quantitative"
    },
    "color": {"value": "#000"}
  },
  "config": {"cell": {"width": 1000,"height": 500}}
}

image



If you want to try any of the code for yourself: The code for all these charting libraries can be found on GitHub. Let me know if you have questions about the code or how to run it!

The many hours spent trying to understand all these libraries were made possible by my Knight-Mozilla OpenNews fellowship at NPR. A big thank you to OpenNews, the NPR Visuals Team and the helpful comments at the GEN Data Journalism Unconference on the 10th of May in New York City.

———

Edit: After writing this blog post and publishing it on Twitter, I got some great, great responses. Everybody who took the time and replied with a hint, a link or with critique: Thank you so much! You made my knowledge greater and this blog post better. I learned about gnuplot, dimple.js and TauCharts (see comments). Jeff Clark reproduced this chart with Lichen and Austin did the same with Periscope Data – tools I’d never heard of before.

There was also a discussion, initiated by Ben Fry, about whether the term “charting library” is appropriate for all tools in this post. I’ve learned: It’s not appropriate. R and Processing are not libraries, but languages. And d3.js and processing.org are libraries, but not mainly made for charting. Guys, I’ve learned so much in the last couple of days. Thank you!