Experiences On Using Static Code Analysis Tools for Python

Static code analysis, as the name implies, is the analysis of the non-running source code of a program. This can be done manually through code reviews where an experienced developer will inspect and walk through the code to find potential programming mistakes. However, such manual process are time consuming and can be improved through automated static code analysis tool.

In Python programming languages, Pyflakes, PyChecker, and Pylint is the common static code analysis tool. This post discusses the experiences on applying these tool to Subdown, an open-sourced image scraper console tool written in Python.

To evaluate and compare different these three static code analysis tool for the Python programming language, I've pick an open-sourced project called Subdown. This program is a image downloader console for the Reddit, an online news sharing community. The site is organized into multiple SubReddits, a breakdown of the smaller communities grouped by topics or interests. The program consist of a single file Python script which crawls a targeted SubReddits for its external URLs of images and asynchronous download these images. First, download the sample script.
$ wget https://raw.githubusercontent.com/radiosilence/subdown/master/subdown.py

Pyflakes
Pyflakes is a very basic fundamental tools. It only parse the Python source files to check for any errors. However, this tool does not check for any coding style violation. The warning shown below indicates that the Subdown program imports additional unused module named ‘mimetypes’. Loading unnecessary resources will slow down program execution and utilize additional memory resources.
$ pyflakes subdown.py 
subdown.py:11: 'mimetypes' imported but unused

PyChecker
Similar to Pyflakes, PyChecker also parse and check for source files for errors, hence shares similar warning with Pyflakes. Furthermore, Pychecker also import and executing Python modules for additional validation. This result illustrated below shows that the warning indicated that the same Python module, gevent was being imported to the application in two separate ways where it should be done once.
$ pychecker subdown.py
Processing module subdown (subdown.py)...
 ImportError: No module named _winreg
       
 Warnings...
 :28: self is not first method argument
 subdown.py:11: Imported module (mimetypes) not used
 subdown.py:17: Using import and from ... import for (gevent)

However, this is false positive warning. As shown from its code below, the Subdown program import all the methods from the gevent module. However, in second line, it reimport again from gevent module but only the monkey class so it can “monkey patch” the existing behaviours to work around the limitation of the standard socket module. Monkey patching is one of the feature of dynamic typed programming languages where we can extend and modify the existing behaviours of the methods, attributes, or functions during run-time. This technique is used typically to work around the constraints of no able to modify existing libraries.
16 import gevent
17 from gevent import monkey; monkey.patch_socket()

Pylint
Pylint, the next static code analysis tool in our evaluation, is the most comprehensive with lots of additional features. See Appendix A for the details output when ran against Subdown program. Instead of just checking for Python code errors like the previous two tools, it also check the coding style violation and code smells. Code style is validated against Python’s PEP 8 style guide. Meanwhile, code smells is a piece of inefficient code, while may run correctly, still have room for improvement through code refactoring. All these are categorized into five message types as shown below:
  • (C) convention, for programming standard violation
  • (R) refactor, for bad code smell
  • (W) warning, for python specific problems
  • (E) error, for probable bugs in the code
  • (F) fatal, if an error occurred which prevented pylint from doing further processing

Comparing the sample result below with previous two tools, we get similar warning of unused import. However, there is a new warning not found which is ‘Unreachable code’.
W:198,12: Unreachable code (unreachable)
C:200, 0: Missing function docstring (missing-docstring)
W: 11, 0: Unused import mimetypes (unused-import)

Extracting out the portion of code in shown below which corresponds to the warning of ‘Unreachable code’, it shows that Line 198 will not be executed at all due to the raise statement in line 197. Upon raising an exception in Line 197, the program will halt and exit the execution. This is a good example where the static code analysis tool can help to uncover incorrect assumption made by the developer.
194         try:
195             get_subreddit(subreddit, max_count, timeout, page_timeout)
196         except Exception as e:
197             raise
198             puts(colored.red(str(e)))

While writing this post, I've found another tool called Pylama, which is a helper tool that wraps several code linters like PyFlakes, Pylint, and others. However, there is an issue integrating with Pylint. You may give it a try but YMMV.

No comments:

Post a Comment