Click Bait

Abstract: A behind-the-scenes view of the SciTools Blog

When the SciTools Blog was initially created, there were automated weekly reports on which articles were the most popular. I never saw those reports directly, but I definitely heard about the results when my “Making Graphs Interesting” article got first place. Now “which article is doing best” is a common question at lunchtime.

Unfortunately, those weekly reports aren’t available anymore. How can I know if I’m winning? 

The current visit statistics are tracked by Matomo. So, the first thing I tried was looking at the web interface for Matomo. Unfortunately, Matomo tracks all of scitools.com, not just the blog, and I couldn’t see a way to filter the results to just blog entries. But hey, I’m a computer scientist. I don’t need a webpage to generate a report for me. I’ll just download the data as a CSV and start playing with it.

Naturally, I open the data in Microsoft Excel (because I’m a fangirl of Microsoft Office). The table is 431 lines long. Not terrible for manual editing. So, I start manually removing every line that isn’t part of the blog. Here’s the first problem: entries appear more than once. There seem to be two major formats:

blog.scitools.com/Visualizing Change – SciTools Blog
Visualizing Change – SciTools Blog  

Additionally, a lot of the blog entries seem to be for keyword pages:

Ada – SciTools Blog
blog.scitools.com/Ada – SciTools Blog

I probably could still work out the statistics using Microsoft Excel with some formulas, but at this point, it’s faster for me to switch to a script. Since chances are I’ll want to run this script more than once, I might as well start it off correctly with argparse for the command line arguments. 

import argparse

parser = argparse.ArgumentParser(description='Matomo Blog Rankings')
parser.add_argument('file', type=argparse.FileType('r',encoding='utf-16'))
parser.add_argument('--label', default='Label')
parser.add_argument('--count', default='Unique Pageviews')
args = parser.parse_args()

I mainly need the file argument. For convenience, I add an argument to choose which column to count by, and for completeness, one to choose the label column. The encoding argument was a pain to find (I kept getting encoding errors), but finally, success!

I probably should use Pandas or something similar to load the CSV file, but I don’t think I have it installed so I’ll just use the quick and dirty approach: split each line by commas and index into the array:

# Drop the trailing newline so the last column name can still be matched
header = args.file.readline().rstrip('\n')
cols = header.split(',')

labelCol = cols.index(args.label)
cntCol = cols.index(args.count)

Next, I’ll search for the common “ – SciTools Blog” suffix in the label and keep only those entries:

cnts = {}

line = args.file.readline()
while line:
    cols = line.split(',')
    label = None
    if cols[labelCol].endswith(' – SciTools Blog'):
        # Strip the trailing ' – SciTools Blog' (16 characters)
        label = cols[labelCol][:-16]
    if label:
        # If it started as a full URL, keep only the final title
        if '/' in label:
            label = label.split('/')[-1]
        # Remove whitespace
        label = label.strip()
        # Store the counts
        if label in cnts:
            cnts[label] += int(cols[cntCol])
        else:
            cnts[label] = int(cols[cntCol])
    line = args.file.readline()
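
The label clean-up inside that loop can also be written as a small standalone function, which makes it easy to check against the two formats seen in the export. This is only a sketch: the name normalize_label is made up for illustration, and it assumes every blog row ends with the exact suffix “ – SciTools Blog”.

```python
SUFFIX = ' – SciTools Blog'  # en dash, as it appears in the export

def normalize_label(raw):
    """Return the bare title for a blog row, or None for any other row."""
    if not raw.endswith(SUFFIX):
        return None
    title = raw[:-len(SUFFIX)]
    # If it started as a full URL, keep only the part after the last slash
    if '/' in title:
        title = title.split('/')[-1]
    return title.strip()

# Both formats from the export collapse to the same title
print(normalize_label('blog.scitools.com/Visualizing Change – SciTools Blog'))
print(normalize_label('Visualizing Change – SciTools Blog'))
```

Using len(SUFFIX) instead of a hard-coded 16 keeps the slice correct if the suffix ever changes.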

Then I can print out the cnts dictionary and get my results. Two problems remain: limiting the results to actual blog entries (rather than keyword pages), and fixing the entries that aren’t getting found. The major problem with the ones not found is that their titles contain commas, so splitting on commas cuts the title short and shifts the count column by one. That’s something Pandas might have solved for me, but it’s still faster to do a quick hack than to remember how to use Pandas. The final script:

import argparse

parser = argparse.ArgumentParser(description='Matomo Blog Rankings')
parser.add_argument('file', type=argparse.FileType('r',encoding='utf-16'))
parser.add_argument('--label', default='Label')
parser.add_argument('--count', default='Unique Pageviews')
args = parser.parse_args()

# Drop the trailing newline so the last column name can still be matched
header = args.file.readline().rstrip('\n')
cols = header.split(',')

labelCol = cols.index(args.label)
cntCol = cols.index(args.count)

articles = {
"Debugging Understand with Understand":"Natasha Stander",
"As-Built Documentation":"Natasha Stander",
"Getting Git: Walking File History":"Natasha Stander",
"Getting Git: Blame":"Natasha Stander",
"Getting Git: Submodules":"Natasha Stander",
"Getting Git: Scanning Directories":"Natasha Stander",
"How Hard is it?":"Natasha Stander",
"Making Dependencies Interesting":"Natasha Stander",
"Minimize the Impact of Interruptions":"Heidi Esplin",
"Seeing the Unseeable":"Natasha Stander",
"Rogue Dependencies":"Kevin Groke",
"Visualizing Change":"Kevin Groke",
"De-Mystifying Dependencies":"Natasha Stander",
"API: Contents vs Value":"Natasha Stander",
"Short-term memory overload while debugging":"Ken Nelson",
"2 Uber Fast Ways to Browse Your Code":"Ken Nelson",
"Info is just a Hover Away":"Ken Nelson",
"One-Click Code Browsing":"Ken Nelson",
"Hyper-Xref in Understand":"Ken Nelson",
"Interesting Graphs about Your Code":"Kevin Groke",
"Understanding Macro Heavy Code":"Stephane Raynaud",
"Useful Script: Export All Metric Values":"Robby Bennett", #, Not Just Violations
"Help your future self… write code not “magic”":"Ken Nelson",
"Why swing just one hammer…oops IDE?":"Ken Nelson",
"Precise Searching for Smarter Changes":"Ken Nelson",
"Our Favorite Computer Science Quotations":"Ken Nelson",
"Why use Understand? (Dinner Party edition)":"Ken Nelson",
"Licensing Understand As You Grow":"Stephane Raynaud",
"Blame Can Be Good":"Natasha Stander",
"Making Graphs Interesting":"Natasha Stander",
"Useful Scripts: Scanning source code for profanity":"Jordan Colbeth",
"Making sense of GIT source code":"Ken Nelson",
"Just How Complex Is Your Project?":"Jordan Colbeth",
"Browsing Code with Tired Eyes":"Ken Nelson",
"The Science of Debugging":"Jason Haslam",
"Analyzing Makefile Based Source Code":"Ken Nelson",
"How I Cut Our Build Time in Half":"Natasha Stander",
"Understand’s most useful and smallest button":"Ken Nelson",
"3 Pre-hire Signs of a Successful Software Engineer":"Ken Nelson",
"Hey Managers… Just Say Yes.":"Ken Nelson",
"Code Browsing using Graphical Views":"Ken Nelson",
"Setting up use case specific tool and window layouts with Understand Sessions":"Ken Nelson",
"Set up Understand for Awesome Code Browsing":"Ken Nelson",
"My Tricks for Using Understand":"Natasha Stander",
"Tracking Uses of Deprecated Code Using Architectures":"Ken Nelson",
"Pro Tip: Start with Directory Structure and add some Human to it.":"Ken Nelson",
"Assessing Possible Changes in Source Code (a.k.a. My boss asked me to change this code)":"Ken Nelson",
"A visual tour of code complexity":"Ken Nelson",
"Functional Decomposition Architectures FTW":"Ken Nelson",
"Analyzing the Imperial College COVID simulation model source code":"Ken Nelson",
"Making big graphs manageable with “relationship” graphs":"Ken Nelson",
"Analyzing Boost on Linux":"Stephane Raynaud", #, a remarkable story to learn.
"Finding #pragma directives – A guide to writing your first Codecheck":"Jordan Colbeth",
"Not a software company? SciTools can help you become one.":"Stephane Raynaud",
"Lightweight but Powerful Code Reviews with Understand Annotations":"Ken Nelson",
"Lost in Translation – Finding Strings":"Natasha Stander",
"Opening Braces Should Appear on Their Own Line":"Kevin Groke"
}

titlesWithAComma = [
"Useful Script: Export All Metric Values",
"Analyzing Boost on Linux"
]

cnts = {}

line = args.file.readline()
while line:
    cols = line.split(',')
    label = None
    offset = 0
    if cols[labelCol].endswith(' – SciTools Blog'):
        # Strip the trailing ' – SciTools Blog' (16 characters)
        label = cols[labelCol][:-16]
    else:
        # The title was cut at its comma, so every later column shifts by one
        for suffix in titlesWithAComma:
            if cols[labelCol].endswith(suffix):
                label = cols[labelCol]
                offset = 1
    if label:
        # If it started as a full URL, keep only the final title
        if '/' in label:
            label = label.split('/')[-1]
        # Remove whitespace
        label = label.strip()
        # Store the counts
        if label in cnts:
            cnts[label] += int(cols[cntCol + offset])
        else:
            cnts[label] = int(cols[cntCol + offset])
    line = args.file.readline()

byAuthor = {}
print("Article Title,Count")
for k,v in sorted(articles.items(),reverse=True,key=lambda item: cnts.get(item[0],0)):
  print (k+"("+v+"),",cnts.get(k,0))
  byAuthor[v] = byAuthor.get(v,0) + cnts.get(k,0)

print("\n")
print("Author,Count")
for k,v in sorted(byAuthor.items(),reverse=True,key=lambda item: item[1]):
  print (k+",",v)
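
For comparison, Python’s standard csv module understands quoted fields, so a title containing a comma stays in one column and no comma workaround is needed. The sketch below is an alternative under assumptions, not the script that produced the results: it assumes the export quotes any label that contains a comma, and the function name count_blog_views, the sample rows, and the “Commas, Commas Everywhere” title are all invented for illustration.

```python
import csv
import io
from collections import Counter

SUFFIX = ' – SciTools Blog'  # en dash, as it appears in the export

def count_blog_views(lines, label='Label', count='Unique Pageviews'):
    """Sum the count column per blog title from an iterable of CSV lines."""
    cnts = Counter()
    # DictReader consumes the header row itself and keys each row by column name
    for row in csv.DictReader(lines):
        raw = row[label]
        if not raw.endswith(SUFFIX):
            continue  # not a blog page
        # Drop the suffix, then keep only the text after the last slash
        title = raw[:-len(SUFFIX)].split('/')[-1].strip()
        cnts[title] += int(row[count])
    return cnts

# Made-up sample rows, only to show the quoted comma being handled
sample = io.StringIO(
    'Label,Unique Pageviews\n'
    '"blog.scitools.com/Commas, Commas Everywhere – SciTools Blog",3\n'
    'Visualizing Change – SciTools Blog,5\n'
    'blog.scitools.com/Visualizing Change – SciTools Blog,2\n'
    'Pricing,9\n'
)
cnts = count_blog_views(sample)
print(cnts['Visualizing Change'])
```

With the real export, passing the UTF-16 file object in place of sample should work the same way, provided the header has not already been read and the column names match. The keys in articles would then need to be the full titles, commas included.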

And the results for November:

Article Title,Count
Licensing Understand As You Grow(Stephane Raynaud), 107
Visualizing Change(Kevin Groke), 98
Interesting Graphs about Your Code(Kevin Groke), 88
Rogue Dependencies(Kevin Groke), 78
De-Mystifying Dependencies(Natasha Stander), 68
Analyzing Makefile Based Source Code(Ken Nelson), 44
My Tricks for Using Understand(Natasha Stander), 37
Minimize the Impact of Interruptions(Heidi Esplin), 34
Making Dependencies Interesting(Natasha Stander), 29
Seeing the Unseeable(Natasha Stander), 26
Making Graphs Interesting(Natasha Stander), 24
Code Browsing using Graphical Views(Ken Nelson), 24
Hyper-Xref in Understand(Ken Nelson), 22
How Hard is it?(Natasha Stander), 21
Getting Git: Scanning Directories(Natasha Stander), 16
Getting Git: Submodules(Natasha Stander), 15
A visual tour of code complexity(Ken Nelson), 15
Useful Script: Export All Metric Values(Robby Bennett), 14
Our Favorite Computer Science Quotations(Ken Nelson), 14
Analyzing the Imperial College COVID simulation model source code(Ken Nelson), 13
2 Uber Fast Ways to Browse Your Code(Ken Nelson), 12
Getting Git: Walking File History(Natasha Stander), 9
API: Contents vs Value(Natasha Stander), 9
Why swing just one hammer…oops IDE?(Ken Nelson), 9
Making sense of GIT source code(Ken Nelson), 9
Making big graphs manageable with “relationship” graphs(Ken Nelson), 9
As-Built Documentation(Natasha Stander), 8
Short-term memory overload while debugging(Ken Nelson), 8
One-Click Code Browsing(Ken Nelson), 8
How I Cut Our Build Time in Half(Natasha Stander), 8
Functional Decomposition Architectures FTW(Ken Nelson), 8
Opening Braces Should Appear on Their Own Line(Kevin Groke), 8
Understanding Macro Heavy Code(Stephane Raynaud), 7
Tracking Uses of Deprecated Code Using Architectures(Ken Nelson), 7
Why use Understand? (Dinner Party edition)(Ken Nelson), 6
Just How Complex Is Your Project?(Jordan Colbeth), 6
Browsing Code with Tired Eyes(Ken Nelson), 6
The Science of Debugging(Jason Haslam), 6
3 Pre-hire Signs of a Successful Software Engineer(Ken Nelson), 6
Set up Understand for Awesome Code Browsing(Ken Nelson), 6
Finding #pragma directives – A guide to writing your first Codecheck(Jordan Colbeth), 6
Assessing Possible Changes in Source Code (a.k.a. My boss asked me to change this code)(Ken Nelson), 5
Getting Git: Blame(Natasha Stander), 4
Precise Searching for Smarter Changes(Ken Nelson), 4
Blame Can Be Good(Natasha Stander), 4
Useful Scripts: Scanning source code for profanity(Jordan Colbeth), 4
Pro Tip: Start with Directory Structure and add some Human to it.(Ken Nelson), 4
Analyzing Boost on Linux(Stephane Raynaud), 4
Lost in Translation – Finding Strings(Natasha Stander), 4
Info is just a Hover Away(Ken Nelson), 3
Not a software company? SciTools can help you become one.(Stephane Raynaud), 3
Understand’s most useful and smallest button(Ken Nelson), 2
Setting up use case specific tool and window layouts with Understand Sessions(Ken Nelson), 2
Lightweight but Powerful Code Reviews with Understand Annotations(Ken Nelson), 2
Debugging Understand with Understand(Natasha Stander), 0
Help your future self… write code not “magic”(Ken Nelson), 0
Hey Managers… Just Say Yes.(Ken Nelson), 0


Author,Count
Natasha Stander, 282
Kevin Groke, 272
Ken Nelson, 248
Stephane Raynaud, 121
Heidi Esplin, 34
Jordan Colbeth, 16
Robby Bennett, 14
Jason Haslam, 6

Sadly, I’m not winning anymore. Even with my supposedly click-bait titles containing the word “interesting.” The top article is Stephane’s licensing article, directly linked by the pricing page. Can’t beat money. The next three are all Kevin’s. His success strategy was writing the email announcement for 6.1 and highlighting his own articles. I might have to counter by getting my students next semester to click on my articles for extra credit. 

But, I learned from my younger sister that the way to win is to redefine the rules until you’re winning. So, tallying by author, I’m just barely beating Kevin. The strategy of writing lots of articles may be working. And, I still have the option of changing the date range and running the script again. I’m sure there’s some “week” out there where I’m winning.