Meaningful Variable Names and the Boy Scout Rule
Quick! What’s the bug fixed by this commit?
-def compare_links(a_html, b_html):
- a_soup = bs4.BeautifulSoup(a_html, "lxml")
- b_soup = bs4.BeautifulSoup(b_html, "lxml")
+def compare_links(local_html, remote_html):
+ local_soup = bs4.BeautifulSoup(local_html, "lxml")
+ remote_soup = bs4.BeautifulSoup(remote_html, "lxml")
- a_links = set(a_soup.find_all("a"))
- b_links = set(b_soup.find_all("a"))
+ local_links = set(local_soup.find_all("a"))
+ remote_links = set(remote_soup.find_all("a"))
- old_hrefs = [x.attrs["href"] for x in a_links]
+ old_hrefs = [x.attrs["href"] for x in remote_links]
- new_links = [x for x in a_links if x.attrs["href"] not in old_hrefs]
+ new_links = [x for x in local_links if x.attrs["href"] not in old_hrefs]
Not that obvious, right?
Please read on to find out how this bug could never had happened, and how this commit should never have been made.
Context #
The above is a real commit I made recently.
It’s a script I use to automatically tweet about new links, like here
The first version of the algorithm was quite simple:
- Generate a new version of the blog locally. I use hugo
for this, meaning the
links.html
page is generated from alinks.md
markdown file - Fetch the remote
links.html
page fromdmerej.info
- Compare the two html files to find new links
- Generate a new tweet for each new link
rsync
the newly generatedlinks.html
to the remote server
That’s where the compare_links
function is used:
import requests
def main():
build_blog()
local_links_path = os.path.join(BLOG_SRC_PATH, "public",
"pages", "links", "index.html")
with open(local_links_path, "r") as fp:
local_links_html = fp.read()
remote_links_html = requests.get("https://dmerej.info/links.html").text
new_links = compare_links(local_links_html, remote_links_html)
if not new_links:
return
tweet_new_links(new_links)
In the first version I was just comparing two html files, so I named
the parameters with a_
and b_
prefixes:
def compare_links(a_html, b_html):
a_soup = bs4.BeautifulSoup(a_html, "lxml")
b_soup = bs4.BeautifulSoup(b_html, "lxml")
a_links = set(a_soup.find_all("a"))
b_links = set(b_soup.find_all("a"))
new_links = a_links - b_links
return new_links
Simple enough, right?
A new feature #
I was happy with this for a while, until I made a change in the description of a link:
--- a/content/pages/links.md
+++ b/content/pages/links.md
- [First description](http://example.com/website)
+ [Second description](http://example.com/website)
This caused my script to tweet about http://example.com/website
twice!
I thought: “OK, the fix is simple: instead of intersecting the two lists (old and new), build a new list containing all the links which URLs are not in the old list of links”
So this is the patch I made:
- new_links = a_links - b_links
+ old_hrefs = [x.attrs["href"] for x in a_links]
+
+ new_links = [x for x in a_links if x.attrs["href"] not in old_hrefs]
Found the bug yet?
Meaningful variable names for the win! #
Let me help you: instead of trying to fix the bug directly, let us think about the variable names for a moment.
The name with a_
and b_
prefixes suggest that the variables can be swapped,
and it’s not obvious that a_
refers to the “local” links, i.e the new ones
that we just added to the links.md
files, and b_
refers to the “remote” links,
i.e the one we just fetched from dmerej.info
.
Let’s re-write the function to use meaningful names instead:
def compare_links(local_html, remote_html):
local_soup = bs4.BeautifulSoup(local_html, "lxml")
remote_soup = bs4.BeautifulSoup(remote_html, "lxml")
local_links = set(local_soup.find_all("a"))
remote_links = set(remote_soup.find_all("a"))
new_links = local_links - remote_links
return new_links
Then when we want to re-do the patch, we naturally write:
- new_links = a_links - b_links
+ old_hrefs = [x.attrs["href"] for x in old_ref]
+
+ new_links = [x for x in new_links if x.attrs["href"] not in old_hrefs]
+ retur new_links
I hope that by now you found the bug. Instead of using the a_links
list
to create the list of new links, we should have used the b_links
list,
but the mistake was not obvious to spot because of the bad naming.
The Boy Scout Rule #
This rule comes from an old article from our friend Uncle Bob.
Paraphrasing a little bit, it says:
Always push a cleaner code than the one you pulled.
Have I followed this rule, I would have first made a commit to fix the variable names before changing the code to introduce a new feature, and then the bug would not have happened.
Instead I was lazy, so I introduced a bug, and the commit that fixes it is not obvious to read.
(At this point it does not really matter because I’m the only one writing this code, but still …)
Lessons learned #
- Meaningful variable names do matter
- If you follow the boy scout rule, make sure to separate the commits that simply refactor the code from the commits that actually fix bugs or introduce new commits, to have a more readable history.
That’s all for today, see you soon!
Thanks for reading this far :)
I'd love to hear what you have to say, so please feel free to leave a comment below, or read the contact page for more ways to get in touch with me.
Note that to get notified when new articles are published, you can either:
- Subscribe to the RSS feed
- Follow me on Mastodon
- Follow me on dev.to (mosts of my posts are mirrored there)
- Or send me an email to subscribe to my newsletter
Cheers!