Data Sci Adventures - part 3, fishy histogram

Published on 2021-05-27 18:18

This post continues the previous article, which explored the fish data set with Python and R. In this one I build a histogram with D3.js from the data of one of the fish species.

Unfortunately, Scott Murray's book doesn't cover histograms, so I had to head to ObservableHQ for an example of the d3.bin functionality. I always found the articles there confusing, but now that I know D3.js a bit better they are easier to follow. This post is a compilation of my notes.

My first step was to split the data by fish species in Python and save each group to its own JSON file.

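import json
import os

import pandas as pd

df = pd.read_csv("Fish.csv")  # assumption: file name of the fish data set
os.makedirs("grouped", exist_ok=True)

for specie, group in df.groupby("Species"):  # assumption: species column name
    # One dict per row, keyed by column name
    result = group.to_dict(orient="records")
    with open(os.path.join("grouped", f"{specie}.json"), "w") as f:
        json.dump(result, f)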

The second step is to load the data in JavaScript:

// d3.json() returns a promise, so no callback argument is needed
d3.json("url to data").then((data) => {
  // Code will be here
});

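The most important part of building a histogram is creating the buckets. D3 provides d3.bin(), which returns a bin generator: a function that takes an array of values and returns the actual bins. Each bin is an array of the values that fell into it, with x0 and x1 properties holding its lower and upper bounds. In this example it's used as: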

const buckets = d3.bin()(data.map((item) => item["Weight"]));
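
To make the shape of the buckets more concrete, here is a rough sketch of the binning idea in plain JavaScript. It is not how d3.bin() is implemented: makeBins is a hypothetical helper, the number of bins is passed in by hand instead of being chosen automatically, and the sample weights are made up.

```javascript
// Simplified stand-in for a bin generator: equal-width bins over [lo, hi].
// Unlike d3.bin(), the number of bins is passed in explicitly.
function makeBins(values, lo, hi, count) {
  const step = (hi - lo) / count;
  // Each bin is an array of values that also carries its bounds x0 and x1.
  const bins = Array.from({ length: count }, (_, i) => {
    const bin = [];
    bin.x0 = lo + i * step;
    bin.x1 = lo + (i + 1) * step;
    return bin;
  });
  for (const v of values) {
    if (v < lo || v > hi) continue; // ignore values outside the range
    // The last bin also includes the upper bound itself.
    const i = Math.min(count - 1, Math.floor((v - lo) / step));
    bins[i].push(v);
  }
  return bins;
}

// Hypothetical weights in grams, not taken from the fish data set.
const sampleWeights = [120, 250, 260, 400, 610, 980, 1000];
const sampleBins = makeBins(sampleWeights, 0, 1000, 5);
console.log(sampleBins.map((b) => `${b.x0}-${b.x1}: ${b.length}`));
```

The length of each bin is the height of its bar, and x0 and x1 give the bar's horizontal extent, which is all the rest of the chart code relies on.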

Next comes the setup for the chart.

const width = 350,
      height = 300,
      margin = { top: 60, right: 20, bottom: 40, left: 40},
      maxBins = d3.max(buckets, d => d.length),
      max = buckets[buckets.length - 1].x1,
      min = buckets[0].x0,
      svg = d3.select("#hist")
              .append("svg")
              .attr("height", height)
              .attr("width", width),

Next come the axis settings, which continue the same const declaration:

x = d3.scaleLinear()
       .domain([min, max])
       .range([30, width - 30])
       .clamp(false),
y = d3.scaleLinear()
      .domain([0, maxBins])
      .nice() // Extends the domain so that it starts and ends on round values.
      .range([height - margin.bottom, margin.top]), // Sets the output range in pixels; it is inverted because the SVG y coordinate grows downwards.
xAxis = g => g.attr("transform", `translate(0,${height - margin.bottom})`)
              .call(d3.axisBottom(x).tickSizeOuter(0))
              .call(g => g.append("text")
                          .attr("x", (width - margin.right)/2)
                          .attr("y", 35)
                          .attr("fill", "#000")
                          .attr("text-anchor", "middle")
                          .text("Weight [g]")
                    );
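
The inverted y range is the part that confused me most, so here is a small sketch of what a linear scale does, written in plain JavaScript. linearScale is a hypothetical helper that only captures the basic interpolation behind d3.scaleLinear(), without clamp() or nice(), and the domain maximum of 10 is made up.

```javascript
// Minimal linear scale: maps [d0, d1] onto [r0, r1] by linear interpolation.
// This mirrors the idea behind d3.scaleLinear(), without clamping or nice().
function linearScale([d0, d1], [r0, r1]) {
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// Same layout numbers as in the post; the domain maximum of 10 is made up.
const chartHeight = 300;
const chartMargin = { top: 60, bottom: 40 };
const yScale = linearScale(
  [0, 10],
  [chartHeight - chartMargin.bottom, chartMargin.top]
);

console.log(yScale(0));  // 260, the baseline of the bars
console.log(yScale(10)); // 60, the top of the tallest bar
console.log(yScale(5));  // 160, halfway up
```

Because the range runs from the bottom pixel to the top pixel, larger counts map to smaller y values, which is what SVG needs to draw taller bars.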

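Now that everything is set up, it's time to put it together. The first block appends a group element to the svg element; this group holds the rectangles of the histogram. The most important call here is data(), which binds the buckets to the selection so that join() creates one rectangle per bucket. The fill color comes from binColor, a helper whose definition is not included in these notes; any function that maps a bucket's lower bound to a color works.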

The second part adds the x axis to describe the data.

svg.append("g")
   .selectAll("rect")
   .data(buckets)
   .join("rect")
   .attr("fill", (d => binColor(d.x0)))
   .attr("x", d => x(d.x0) + 1)
   .attr("width", d => Math.max(0, x(d.x1) - x(d.x0) - 1))
   .attr("y", d => y(d.length))
   .attr("height", d => y(0) - y(d.length));

svg.append("g").call(xAxis);
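
The attribute callbacks on the rectangles are easier to follow when pulled out into a plain function. The sketch below does that without D3; barGeometry, xDemo, yDemo and the sample bucket are all hypothetical and only stand in for the real scales and data.

```javascript
// Compute the rect attributes for one bucket, given x and y scale functions.
function barGeometry(bin, xScale, yScale) {
  return {
    x: xScale(bin.x0) + 1, // leave a 1px gap on the left
    width: Math.max(0, xScale(bin.x1) - xScale(bin.x0) - 1),
    y: yScale(bin.length), // top edge of the bar
    height: yScale(0) - yScale(bin.length), // distance down to the baseline
  };
}

// Hypothetical scales standing in for the D3 ones.
const xDemo = (w) => 30 + (w * 290) / 1000; // weights 0..1000 -> 30..320 px
const yDemo = (n) => 260 - (n * 200) / 10; // counts 0..10 -> 260..60 px

// Hypothetical bucket of five values between 200 g and 400 g.
const demoBin = Object.assign([210, 250, 300, 340, 390], { x0: 200, x1: 400 });
console.log(barGeometry(demoBin, xDemo, yDemo));
```

The y attribute is the top of the bar and the height reaches down to y(0), so the bars grow upwards from the axis even though SVG coordinates grow downwards.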

No good chart is complete without labels and a title.

const labels = svg.append("g")
                  .selectAll("text")
                  .data(buckets.filter(d => d.length > 0))
                  .join("text")
                  .attr("x", d => ((x(d.x0) + x(d.x1)) / 2) | 0)
                  .attr("y", d => y(d.length) - 2)
                  .style("fill", "black")
                  .style("font-size", "10px") // CSS lengths need a unit
                  .style("text-anchor", "middle");
labels.text(d => {
    // Narrow bars only have room for the bare count
    if (x(d.x1) - x(d.x0) < 50) {
        return d.length;
    }
    // Empty buckets were filtered out above, so d.length is at least 1 here
    return d.length > 1 ? `${d.length} items` : "1 item";
});

svg.append("g")
   .append("text")
   .text("Bream weight distribution")
   .style("fill", "#000")
   .attr("font-weight", "bold")
   .style("font-size", "14px") // CSS lengths need a unit
   .style("text-anchor", "end")
   .attr("x", 250)
   .attr("y", 30);

When it's all put together it looks like this:

Link to used fish data