Custom Static Analysis Rules Showdown: Brakeman vs. Semgrep

In application assessments you have to do the most effective work you can in the time period defined by the client to maximize the assurance you’re providing. At IncludeSec we’ve done a couple innovative things to improve the overall effectiveness of the work we do, and we’re always on the hunt for more ways to squeeze even more value into our assessments by finding more risks and finding them faster. One topic that we revisit frequently to ensure we’re doing the best we can to maximize efficiency is static analysis tooling (aka SAST.)

Recently we did a bit of a comparison example of two open source static analysis tools to automate detection of suspicious or vulnerable code patterns identified during assessments. In this post we discuss the experience of using Semgrep and Brakeman to create the same custom rule within each tool for a client’s Ruby on Rails assessment our team was assessing. We’re also very interested in trying out GitHub’s CodeQL, but unfortunately the Ruby support is still in development so that will have to wait for another time.

Semgrep is a pattern-matching tool that is semantically-aware and works with several languages (currently its Ruby support is marked as beta, so it is likely not at full maturity yet). Brakeman is a long-lived Rails-specific static-analysis tool, familiar to most who have worked with Rails security. Going in, I had no experience writing custom rules for either one.

This blog post is specifically about writing custom rules for code patterns that are particular to the project I’m assessing. First though I want to mention that both tools have some pre-built general rules for use with most Ruby/Rails projects — Brakeman has a fantastic set of built-in rules for Rails projects that has proven very useful on assessments (just make sure the developers of the project haven’t disabled rules in config/brakeman.yml, and yes we have seen developers do this to make SAST warnings go away!). Semgrep has an online registry of user-submitted rules for Ruby that is also handy (especially as examples for writing your own rules), but the current rule set for Ruby is not quite as extensive as Brakeman. In Brakeman the rules are known as “checks”, for consistency we’ll use the term “rules” for both tools, but you the reader can just keep that fact in mind.

First custom rule: Verification of authenticated functionality

I chose a simple pattern for the first rule I wanted to make, mainly to familiarize myself with the process of creating rules in both Semgrep and Brakeman. The application had controllers that handle non-API routes. These controllers enforced authentication by adding a before filter: before_action :login_required. Often in Rails projects, this line is included in a base controller class, then skipped when authentication isn’t required using skip_before_filter. This was not the case in the webapp I was looking at — the before filter was manually set in every single controller that needed authentication, which seemed error-prone as an overall architectural pattern, but alas it is the current state of the code base.

I wanted to get a list of any non-API controllers that lack this callback so I can ensure no functionality is unintentionally exposed without authentication. API routes handled authentication in a different way so consideration for them was not a priority for this first rule.

Semgrep

I went to the Semgrep website and found that Semgrep has a nice interactive tutorial, which walks you through building custom rules. I found it to be incredibly simple and powerful — after finishing the tutorial in about 20 minutes I thought I had all the knowledge I needed to make the rules I wanted. Although the site also has an online IDE for quick iteration I opted to develop locally, as the online IDE would require submitting our client’s code to a 3rd party which we obviously can’t do for security and liability reasons. The rule would eventually have to be run against the entire codebase anyways.

I encountered a few challenges when writing the rule:

  • It’s a little tricky to find things that do not match a pattern (e.g. lack of a login_required filter). You can’t just search all files for ones that don’t match, you have to have a pattern that it does search for, then exclude the cases matching your negative pattern. I was running into a bug here that made it even harder but the Semgrep team fixed it when we gave them a heads up!
  • Matching only classes derived from ApplicationController was difficult because Semgrep doesn’t currently trace base classes recursively, so any that were more than one level removed from ApplicationController would be excluded (for example, if there was a class DerivedController < ApplicationController, it wouldn’t match SecondLevelDerivedController < DerivedController.) The Semgrep team gave me a tip about using a metavariable regex to filter for classes ending in “Controller” which worked for this situation and added no false positives.

My final custom rule for Semgrep follows:

rules:
- id: controller-without-authn
  patterns:
  - pattern: |
      class $CLASS
        ...
      end
  - pattern-not: |
      class $CLASS
        ...
        before_action ..., :login_required, ...
        ...
      end
  - metavariable-regex:
      metavariable: '$CLASS'
      regex: '.*Controller'  
  message: |
  $CLASS does not use the login_required filter.
  severity: WARNING
  languages:
  - ruby

I ran the rule using the following command: semgrep --config=../../../semgrep/ | grep "does not use"

The final grep is necessary because Semgrep will print the matched patterns which, in this case, were the entire classes. There’s currently no option in Semgrep to show only a list of matching files without the actual matched patterns themselves. That made it difficult to see the list of affected controllers, so I used grep on the output to filter the patterns out. This rule resulted in 47 identified controllers. Creating this rule originally took about two hours including going through the tutorial and debugging the issues I ran into but now that the issues are fixed I expect it would take less time in future iterations.

Overall I think the rule is pretty self-explanatory — it finds all files that define a class then excludes the ones that have a login_required before filter. Check out the semgrep tutorial lessons if you’re unsure.

Brakeman

Brakeman has a wiki page which describes custom rule building, but it doesn’t have a lot of detail about what functionality is available to you. It gives examples of finding particular method calls and whether user input finds their ways into these calls. There’s no example of finding controllers.

The page didn’t give any more about what I wanted to do so I headed off to Brakeman’s folder of built-in rules in GitHub to see if there are any examples of something similar there. There is a CheckSkipBeforeFilter rule which is very similar to what I want — it checks whether the login_required callback is skipped with skip_before_filter. As mentioned, the app isn’t implemented that way, but it showed me how to iterate controllers and check before filters.

This got me most of the way there but I also needed to skip API controllers for this particular app’s architecture. After about an hour of tinkering and looking through Brakeman controller tracker code I had the following rule:

require 'brakeman/checks/base_check'

class Brakeman::CheckControllerWithoutAuthn < Brakeman::BaseCheck
  Brakeman::Checks.add self

  @description = "Checks for a controller without authN"

  def run_check
  controllers = tracker.controllers.select do |_name, controller|
      not check_filters controller
  end
  Hash[controllers].each do |name, controller|
    warn  :controller => name,
          :warning_type => "No authN",
          :warning_code => :basic_auth_password,
          :message => "No authentication for controller",
          :confidence => :high,
          :file => controller.file
  end
  end

# Check whether a non-api controller has a :login_required before_filter
  def check_filters controller
  return true if controller.parent.to_s.include? "ApiController"
  controller.before_filters.each do |filter|
      next unless call? filter
      next unless filter.first_arg.value == :login_required
      return true
  end
  return false
  end
end

Running it with brakeman --add-checks-path ../brakeman --enable ControllerWithoutAuthn -t ControllerWithoutAuthn resulted in 43 controllers without authentication — 4 fewer than Semgrep flagged.

Taking a close look at the controllers that Semgrep flagged and Brakeman did not, I realized the app is importing shared functionality from another module, which included a login_required filter. Therefore, Semgrep had 4 false positives that Brakeman did not flag. Since Semgrep works on individual files I don’t believe there’s an easy way to prevent those ones from being flagged.

Second custom rule: Verification of correct and complete authorization across functionality

The next case I wanted assurance on was vertical authorization at the API layer. ApiControllers in the webapp have a method authorization_permissions() which is called at the top of each derived class with a hash table of function_name/permission pairs. This function saves the passed hash table into an instance variable. ApiControllers have a before filter that, when any method is invoked, will look up the permission associated with the called method in the hash table and ensure that the user has the correct permission before proceeding.

Manual review was required to determine whether any methods had incorrect privileges, as analysis tools can’t understand business logic, but they can find methods entirely lacking authorization control — that was my goal for these rules.

Semgrep

Despite being seemingly a more complex scenario, this was still pretty straightforward in Semgrep:

rules:
- id: method-without-authz
  patterns:
  - pattern: |
    class $CONTROLLER < ApiController
        ...
        def $FUNCTION
          ...
        end
    ...
    end
  - pattern-not: |
    class $CONTROLLER < ApiController
        ...
        authorization_permissions ... :$FUNCTION => ...
        ...
        def $FUNCTION
          ...
        end
    ...
    end
  message: |
  Detected api controller $CONTROLLER which does not check for authorization for the $FUNCTION method
  severity: WARNING
  languages:
  - ruby

It finds all methods on ApiControllers then excludes the ones that have some authorization applied. Semgrep found seven controllers with missing authorization checks.

Brakeman

I struggled to make this one in Brakeman at first, even thinking it might not be possible. The Brakeman team kindly guided me towards Collection#options which contains all method calls invoked at the class level excluding some common ones like before_filter. The following rule grabs all ApiControllers, looks through their options for the call to authorization_permissions, then checks whether each controller method has an entry in the authorization_permissions hash.

require 'brakeman/checks/base_check'

class Brakeman::CheckApicontrollerWithoutAuthz < Brakeman::BaseCheck
  Brakeman::Checks.add self

  @description = "Checks for an ApiController without authZ"

  def run_check

  # Find all api controllers
  api_controllers = tracker.controllers.select do |_name, controller|
      is_apicontroller controller
  end

  # Go through all methods on all ApiControllers
  # and find if they have any methods that are not in the authorization matrix
  Hash[api_controllers].each do |name, controller|
    perms = controller.options[:authorization_permissions].first.first_arg.to_s

    controller.each_method do |method_name, info|
      if not perms.include? ":#{method_name})"
        warn  :controller => name,
              :warning_type => "No authZ",
              :warning_code => :basic_auth_password,
              :message => "No authorization check for #{name}##{method_name}",
              :confidence => :high,
              :file => controller.file
      end
    end
  end
  end

  def is_apicontroller controller
  # Only care about controllers derived from ApiController
  return controller.parent.to_s.include? "ApiController"
  end

end

Using this rule Brakeman found the same seven controllers with missing authorization as Semgrep.

Conclusion

So who is the winner of this showdown? For Ruby, both tools are valuable, there is no definitive winner in our comparison when we’re specificially talking about custom rules. Currently I think Semgrep edges out Brakeman a bit for writing quick and dirty custom checks on assessments, as it’s faster to get going but it does have slightly more false positives in our limited comparison testing.

Semgrep rules are fairly intuitive to write and self explanatory; Brakeman requires additional understanding by looking into its source code to understand its architecture and also there is the need to use existing rules as a guide. After creating a few Brakeman rules it gets a lot easier, but the initial learning curve was a bit higher than other SAST tools. However, Brakeman has some sophisticated features that Semgrep does not, especially the user-input tracing functionality, that weren’t really shown in these examples. If some dangerous function is identified and you need to see if any user input gets to it (source/sink flow), that is a great Brakeman use case. Also, Brakeman’s default ruleset is great and I use them on every Rails test I do.

Ultimately Semgrep and Brakeman are both great tools with quirks and particular use-cases and deserve to be in your arsenal of SAST tooling. Enormous thanks to both Clint from the Semgrep team and Justin the creator of Brakeman for providing feedback on this post!

Leave a Reply

Discover more from Include Security Research Blog

Subscribe now to keep reading and get access to the full archive.

Continue reading