Static Analysis: It's Simply Critical

Tags:: Type Systems; Type Safety; Static Analysis; PHP; JavaScript; Best Practice

By Jesse Donat on: Oct. 21, 2017

I am someone who spends large portions of their time working in weakly typed languages, namely PHP and JavaScript. I argue that static analysis is a must-have when working with any weakly typed language. It is nearly impossible to write reliable code without it.

Static analysis makes up for the missing type system and allows you to scale.

I recently got into an argument with another developer over static analysis. It's been an ongoing point of contention. He was angry that code which worked fine, but didn't document its otherwise undetectable return types, was failing CI. He argued that if the code works, it should pass. He argued that if the code works, it is "correct".

I put forward the exact opposite position. Code that works but doesn't statically analyze is incorrect. Code that is correct has no potential for runtime error.

My intention is not only to sell you on the idea that you should be statically analyzing your code, but to convince you that it is fundamental to responsible, reliable development.

Humans are fallible; code written by humans doubly so. By its very nature, that code is broken until proven otherwise. If it cannot be statically analyzed, it cannot be proven correct.

If code runs as expected, this only proves it correct in the limited scope it has been run within. Be that ones, tens or even hundreds of thousands of parameter combinations, it's always limited. There must therefore be zero expectation of it continuing to work.

By contrast, if code statically analyzes without error, it will run without runtime error in all cases, save a failure of the analyzer. This code is proven. This code is correct.

Code which "works" does nothing to prove that it's used as you expect, particularly elsewhere within the project. It does nothing to prove it will remain correctly used in all cases in the future.

I know what you are thinking: "But… but… my code runs correctly within my expected domain", and that's fine. That is what's expected of it. But throw exceptions for anything it doesn't know how to handle, the things that creep outside the expected domain. Doing so lets future developers, the analyzer, and the program itself know the intended domain and when something steps outside it.
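As a rough sketch of what guarding a domain can look like, consider a small helper that accepts only positive integer ids. The name parseUserId and the rule itself are assumptions made up for illustration, not part of any real codebase.

```javascript
// Hypothetical helper: the name and the accepted domain are assumptions.
function parseUserId(raw) {
    const id = Number(raw);

    // Anything outside the intended domain is rejected loudly
    // instead of being passed along to the rest of the program.
    if (!Number.isInteger(id) || id <= 0) {
        throw new RangeError("Expected a positive integer user id, got: " + String(raw));
    }

    return id;
}
```

Calling parseUserId("42") hands back the number 42, while "-1", "abc" or an empty string each raise a RangeError at the boundary, where the mistake is easiest to trace.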

No amount of fuzz testing can account for another developer doing something you never expected.

To give a trivial but near real life example, consider the following:

function getUserById( userId ) {
    if(userId > 0) {
        return new User(userId);
    }

    return null;
}

var user = getUserById( input.value );
console.log( "Hello userId: " + user.getId() );

The above code may work throughout all your testing. Every place you use it, every time you try it, everything your application throws at it - success.

This code works, and yet this code is broken. It does not handle the potential null, so it can trigger a runtime error. Code that can trigger a runtime error is incorrect.

Making this code correct would be as simple as:

var user = getUserById( input.value );

if(!(user instanceof User)) {
    throw new Error("User not found");
}

console.log( "Hello userId: " + user.getId() );

An Exception, as opposed to a runtime error, is something the code and coder anticipate. It's an easy way to show you've thought about the domain of your code. Even unhandled, it's clear the developer considered the possibility. It's easily recoverable. Throwing an exception here is correct.

Obviously you'd probably want to handle that Error in the example somewhere, but that's outside the scope of this writing.

The trivial case above is reasonably obvious and easy to spot. Let's instead imagine the function to get a user is deep in your codebase. While we are at it, let's also say that the logical User|null union we've created gets passed around several levels deep, rather than being used immediately where it's fetched. It rapidly becomes much less obvious, nigh impossible, to notice that there is an issue. A call 8 methods deep has no idea it might be receiving a null. This is exactly the kind of thing static analysis is for.
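To make that concrete, here is a minimal sketch of a possible null being forwarded through a few layers. The User class and the function names loadProfile, buildGreeting and formatGreeting are hypothetical stand-ins, and three levels stand in for the eight mentioned above.

```javascript
// Hypothetical stand-ins for the User class and lookup from the earlier example.
class User {
    constructor(id) {
        this.id = id;
    }

    getId() {
        return this.id;
    }
}

function getUserById(userId) {
    return userId > 0 ? new User(userId) : null;
}

// Each layer simply forwards the value; none of them inspects it.
function loadProfile(userId) {
    return buildGreeting(getUserById(userId));
}

function buildGreeting(user) {
    return formatGreeting(user);
}

function formatGreeting(user) {
    // Nothing at this depth hints that `user` may be null.
    return "Hello userId: " + user.getId();
}
```

Every call with a positive id succeeds, so ordinary testing is unlikely to complain. A call such as loadProfile(0) fails with a TypeError inside formatGreeting, far from the place where the null was created.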

A static analyzer knows the full scope and range of possible values. It notes every function call within the scope of a project and knows all the possible values passed between them, so it is able to pinpoint problems a human fails to notice.

All this notwithstanding, static analysis cannot prove the logic is correct. That's not what static analysis is for. Unit tests and other testing prove that. Static analysis proves that the code itself executes free of unhandled runtime errors. It is the type checker your language forgot.
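One way to give an analyzer something to check in plain JavaScript is to document types in JSDoc comments. The sketch below assumes a checker that reads them, for example the TypeScript compiler pointed at JavaScript files with strict null checks turned on. The class and function names are placeholders for illustration.

```javascript
// @ts-check

class User {
    /** @param {number} id */
    constructor(id) {
        this.id = id;
    }

    /** @returns {number} */
    getId() {
        return this.id;
    }
}

/**
 * @param {number} userId
 * @returns {User|null} null when the id is outside the valid range
 */
function getUserById(userId) {
    return userId > 0 ? new User(userId) : null;
}

/**
 * @param {number} userId
 * @returns {string}
 */
function greet(userId) {
    const user = getUserById(userId);

    // Without this check, a null-aware analyzer should flag user.getId().
    if (user === null) {
        throw new Error("User not found");
    }

    return "Hello userId: " + user.getId();
}
```

The documented User|null return type is what lets the tool insist on the null check. Delete the check and the code still runs for every valid id, but under those assumed settings it should no longer pass analysis.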

Arguably the better solution is to switch to a type safe language, one with compile time type checks and ideally union types. That's not always an option for most people, though. Consider the recent meteoric growth of JavaScript.

TypeScript is without a doubt the best option for JavaScript. I highly recommend it; consider it static analysis on steroids. It's JavaScript with the addition of static typing. There's no reason to be writing straight JavaScript anymore. TypeScript is really a wonderful all-in-one solution.

Elm is a neat alternative but a completely different paradigm. While it compiles to JavaScript, it's completely foreign and more of an all-in situation. It borrows much of its syntax from Haskell, and offers the aforementioned union types.

For PHP, on the other hand, I recommend a handful of tools. Scrutinizer is in my experience the most feature complete of the hosted CI services. I make extensive use of PHP_CodeSniffer, which is an amazing tool based around a PHP tokenizer. It's a lot of fun writing your own sniffs. I have begun experimenting with Phan; it's a little unforgiving, in a good way. And of course PhpStorm has a pretty decent analyzer built in.