Before I get started… (aka – Always Sanitize Your User Input!)

I’ve read through a lot of PHP sample code over the past decade, and one thing that is almost universally missing from the samples is data sanitization.  Many experienced developers assume that data sanitization is understood to be a requirement, and so they (correctly) exclude it from their demo code in order to save time and for the purposes of clarity.  In most of my examples on this blog, data sanitization will also be absent.  However, I think that this attitude has helped foster a culture of nonchalance in the PHP community when it comes to verifying input data and has led to some major disasters that could have easily been avoided with a little experience, attention to detail, and of course, a dash of proper QA.

With that being said, many modern PHP frameworks take care of the heavy lifting for you when it comes to data sanitization.  In CodeIgniter, the Input Class offers automatic data sanitization, XSS filtering and some other assorted security features and helpers.

For anyone who is not using a framework (and those who are, but want to understand PHP's internal input filtering capabilities), it is important to know about the data sanitization functionality available in PHP, especially the filter extension that has shipped by default since PHP 5.2, which has seriously improved the security of the system as a whole. As far as I'm concerned, the two main principles behind data sanitization are:

  1. Ensure that the data you receive is in the expected format. Make sure integers are integers, and email addresses are email addresses, etc.
  2. Ensure that dangerous or malicious data is properly formatted or escaped to avoid damage to your website, to prevent theft or fraud, and to ensure the security and privacy of your users.

Big job, right? Luckily, PHP 5 has some really nice built-in features that help us accomplish this quite effectively. The most basic filtering function is filter_var. This function is extremely easy to use, and as the name suggests, filters variables to conform to the expected type.  As an example, if you want to ensure that a variable passed to a function is an integer, you would check it using the following filter:

function findUserById($id) {
    $id = filter_var($id,FILTER_VALIDATE_INT);
}

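The code above will not only validate that the variable $id is, in fact, an integer, but it will also convert it: on success filter_var returns the value as an integer, and on failure it returns false (not 0). Because 0 is itself a valid integer, compare the result against false with === rather than relying on a loose truthiness check. It is helpful to be familiar with the various validation and sanitization filters that PHP 5 has available.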
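To make the failure case concrete, here is a minimal sketch of how the return value could be checked. The parseUserId helper is a hypothetical name of my own, not part of PHP, and returning null for bad input is just one possible convention:

```php
<?php
// Hypothetical helper (not a PHP built-in): returns the validated integer,
// or null when the input is not a valid integer.
function parseUserId($id)
{
    $validated = filter_var($id, FILTER_VALIDATE_INT);

    // Use === here: a valid ID of 0 is falsy, so a loose check such as
    // !$validated would wrongly treat it as a failure.
    return $validated === false ? null : $validated;
}

var_dump(parseUserId("42"));    // int(42)
var_dump(parseUserId("0"));     // int(0)
var_dump(parseUserId("42abc")); // NULL
var_dump(parseUserId("4.2"));   // NULL
```

Note that the string "42" comes back as the integer 42, so the caller gets a properly typed value as well as a validity check.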

The filter_var function also allows various flags to be set to help PHP to filter the data to your satisfaction. For example, the following code will ensure that your variable is an integer, but it will also set a default value of 3 that will be returned on failure, and allow for hex values in addition to decimal values:

function setCode($id) {
    // let's assume that $id = "0xFC75" (a hex string, as it would arrive from user input)
    $options = array(
        'options' => array(
            'default' => 3
        ),
        'flags' => FILTER_FLAG_ALLOW_HEX,
    );
    $id = filter_var($id, FILTER_VALIDATE_INT, $options);
}
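Here is a short sketch of how those options behave for a few inputs, assuming the value arrives as a string (as request data always does). The parseCode function name is hypothetical and only there to make the snippet easy to run:

```php
<?php
// Hypothetical wrapper around the same options array used above.
function parseCode($raw)
{
    $options = array(
        'options' => array('default' => 3),   // returned when validation fails
        'flags'   => FILTER_FLAG_ALLOW_HEX,   // also accept "0x..." notation
    );

    return filter_var($raw, FILTER_VALIDATE_INT, $options);
}

var_dump(parseCode("0xFC75"));       // int(64629)
var_dump(parseCode("42"));           // int(42)
var_dump(parseCode("not a number")); // int(3), the default

// Without the flag, the hex string is rejected.
var_dump(filter_var("0xFC75", FILTER_VALIDATE_INT)); // bool(false)
```

One thing to keep in mind with a default value: the caller can no longer tell a failed validation apart from a user who really entered 3.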

To save you the trouble of copying all of your $_POST and $_GET variables into local variables and then filtering them, PHP also offers input filtering via the filter_input function. This function is basically the same as filter_var, except that instead of passing the value itself you name the input source (such as INPUT_POST or INPUT_GET) and the variable to look up. For example, say you had a variable in your $_POST array called "firstname" that you would normally access using $_POST["firstname"]. Here is how you could filter it so that HTML tags are stripped and potentially dangerous characters like single and double quotes are HTML-encoded (note that FILTER_SANITIZE_STRING is deprecated as of PHP 8.1; on current versions, htmlspecialchars or FILTER_SANITIZE_FULL_SPECIAL_CHARS is the usual replacement):

// Assume the user has entered his first name into a field called "firstname" and submitted the form
// you would normally access it using the $_POST["firstname"] variable

$firstname = filter_input(INPUT_POST,'firstname',FILTER_SANITIZE_STRING);

Now you can access the sanitized value using the $firstname variable!

The same is true of variables being passed in from the URL:

// Imagine the URL is 
// http://www.domain.com/adduser.php?firstname=frankie

$firstname = filter_input(INPUT_GET,'firstname',FILTER_SANITIZE_STRING);

And again, the sanitized value from $_GET["firstname"] is now stored in the variable called $firstname.
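One detail worth knowing is that filter_input can give you three kinds of result: the filtered value, false when the value fails the filter, and null when the variable was not submitted at all. The sketch below illustrates one way to tell them apart; describeInput is a hypothetical helper of my own, and the "age" field is just an assumed example:

```php
<?php
// Hypothetical helper (not part of PHP): maps the three possible
// results of filter_input() to a readable status string.
function describeInput($result)
{
    if ($result === null) {
        return 'missing';  // the variable was not present in the request
    }
    if ($result === false) {
        return 'invalid';  // the variable was present but failed the filter
    }
    return 'ok';
}

// Assumed example field: ?age=... in the URL.
$age = filter_input(INPUT_GET, 'age', FILTER_VALIDATE_INT);
echo describeInput($age), "\n"; // 'missing' when no ?age=... is present
```

Checking for null and false separately lets you show a "this field is required" message in one case and a "please enter a number" message in the other.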

Finally, PHP offers an even more streamlined approach to sanitizing your input data, using the filter_input_array function. This function allows you to define how PHP should sanitize and validate all of the data from one input source, such as $_POST or $_GET, using a definition array and a single function call. Imagine you have the following HTML form:

<form action="adduser.php" method="post">
    First Name: <input type="text" name="firstname" /><br/>
    E-mail: <input type="text" name="email" /><br/>
    Age: <input type="text" name="age" /><br/>
    <input type="submit" value="Add User" />
</form>

How can you use filter_input_array to ensure that the user is at least 18 years old, has entered a valid email address, and has not slipped HTML tags or unencoded quotes into the first name? (Keep in mind that this kind of filtering does not replace proper escaping or prepared statements when the data goes into a database query.) Here is an example using filter_input_array:

$options = array(
    'firstname'   => FILTER_SANITIZE_STRING,
    'age' => array(
                 'filter'    => FILTER_VALIDATE_INT,
                 'options'   => array('min_range' => 18)
             ),
    'email' => FILTER_VALIDATE_EMAIL
);

$_CLEAN = filter_input_array(INPUT_POST, $options);

Now, you can access the validated and sanitized data in your $_CLEAN array instead of using $_POST directly, like so:

  $firstname = $_CLEAN["firstname"];
  if (!$_CLEAN["email"]) {
      // the email address was not valid, 
      // so display an error to the user
  }
  // and so on...
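If you want to try a definition array without setting up a form, filter_var_array accepts the same kind of definition but reads from an ordinary array. The sketch below is only an illustration: the $submitted data is made up, and I have swapped in FILTER_SANITIZE_SPECIAL_CHARS for the first name (my own assumption, chosen to avoid the deprecation notice that FILTER_SANITIZE_STRING raises on PHP 8.1 and later):

```php
<?php
// Definition array in the same style as the form example.
// FILTER_SANITIZE_SPECIAL_CHARS is a substitution made for this sketch.
$definition = array(
    'firstname' => FILTER_SANITIZE_SPECIAL_CHARS,
    'age'       => array(
        'filter'  => FILTER_VALIDATE_INT,
        'options' => array('min_range' => 18),
    ),
    'email'     => FILTER_VALIDATE_EMAIL,
);

// Made-up submission standing in for the real $_POST data.
$submitted = array(
    'firstname' => 'Frankie',
    'age'       => '17',                      // below min_range
    'email'     => 'frankie-at-example.com',  // no @ sign
);

$clean = filter_var_array($submitted, $definition);

// Collect the names of the fields that failed validation.
$errors = array();
foreach (array('age', 'email') as $field) {
    if ($clean[$field] === false) {
        $errors[] = $field;
    }
}

var_dump($clean['firstname']); // string(7) "Frankie"
var_dump($errors);             // contains 'age' and 'email'
```

Fields that fail validation come back as false, so a strict comparison against false is enough to build a list of errors to show the user.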

I hope you've learned a little something about sanitizing your data. This (very long) post was only meant to steer you in the right direction. The best way to learn about PHP's data filtering functionality is to read the manual and try it out for yourself!