Escaping special chars in LDAP search filters

When programming against the Json2Ldap web API (or, for that matter, against any LDAP backend), you should sanitise any user input that may go into a search filter. A typical case is authentication (login or single sign-on) applications, where an input username or email must be used to resolve a user’s distinguished name (DN) in the LDAP directory.

Consider for example the following search filter template, which is used to get a user based on finding an exact match of their uid or mail attribute.

(|(uid=%u)(mail=%u))

The %u placeholder is replaced by the user input and the resulting string is used to create the LDAP search filter. If the user enters john@example.com into the login form, the resulting search filter string becomes this:

(|(uid=john@example.com)(mail=john@example.com))

If, however, the user enters *, the filter will match every entry that has a uid or mail attribute, since an assertion value consisting of a lone asterisk is a presence test in LDAP search filters.

(|(uid=*)(mail=*))
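The problem can be reproduced with a few lines of Java. This is only a sketch of the naive substitution described above; the class and method names are made up for illustration and are not part of Json2Ldap.

```java
final class NaiveFilterDemo {

    // The template from the text; %u marks where the user input goes
    static final String TEMPLATE = "(|(uid=%u)(mail=%u))";

    // Naive substitution: the input is copied verbatim into the filter
    static String buildUnsafe(final String userInput) {
        return TEMPLATE.replace("%u", userInput);
    }

    public static void main(final String[] args) {
        // A regular login name produces the intended filter
        System.out.println(buildUnsafe("john@example.com"));

        // A lone asterisk turns both components into presence tests
        System.out.println(buildUnsafe("*"));

        // Parentheses in the input can even change the filter structure
        System.out.println(buildUnsafe("x)(objectClass=*"));
    }
}
```

The last call shows that the asterisk is not the only concern: unescaped parentheses let the input add filter components of its own.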

To prevent this from happening you may limit the range of acceptable input characters, or you may use a function that sanitises the input by escaping all special characters in the assertion value. The special search filter characters and how to escape them are specified in detail in RFC 4515 (LDAP: String Representation of Search Filters).
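The first option, limiting the acceptable characters, might look like the sketch below. The allowed character set and the length limit are assumptions chosen for illustration; a real deployment would adapt them to the usernames and email addresses it actually expects.

```java
import java.util.regex.Pattern;

final class LoginInputCheck {

    // Hypothetical allow-list: ASCII letters, digits and a few
    // characters common in usernames and email addresses. None of
    // the special filter characters * ( ) \ or NUL is included.
    private static final Pattern ALLOWED =
            Pattern.compile("[A-Za-z0-9._@+-]{1,254}");

    // Returns true only if the whole input consists of allowed chars
    static boolean isAcceptable(final String input) {
        return input != null && ALLOWED.matcher(input).matches();
    }
}
```

An allow-list is simple, but it rejects legitimate values that fall outside the chosen set, such as names with non-ASCII letters. Escaping the special characters avoids that limitation.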

Here is one such sanitising method, written in Java:

import java.nio.charset.StandardCharsets;

/**
 * Escapes any special chars (RFC 4515) in a string representing
 * a search filter assertion value.
 *
 * @param input The input string.
 *
 * @return An assertion value string ready for insertion into a
 *         search filter string.
 */
public static String sanitize(final String input) {

        StringBuilder sb = new StringBuilder();

        // Iterate by code point rather than by char, so that
        // surrogate pairs (4-byte UTF-8 chars) are encoded as a whole
        for (int i = 0; i < input.length(); ) {

                final int cp = input.codePointAt(i);
                i += Character.charCount(cp);

                if (cp == '*') {
                        // escape asterisk
                        sb.append("\\2a");
                }
                else if (cp == '(') {
                        // escape left parenthesis
                        sb.append("\\28");
                }
                else if (cp == ')') {
                        // escape right parenthesis
                        sb.append("\\29");
                }
                else if (cp == '\\') {
                        // escape backslash
                        sb.append("\\5c");
                }
                else if (cp == 0) {
                        // escape NULL char
                        sb.append("\\00");
                }
                else if (cp <= 0x7f) {
                        // regular 1-byte UTF-8 char
                        sb.append((char) cp);
                }
                else {
                        // higher-order 2, 3 and 4-byte UTF-8 chars:
                        // escape each byte of the UTF-8 encoding
                        byte[] utf8bytes = new String(Character.toChars(cp))
                                .getBytes(StandardCharsets.UTF_8);

                        for (byte b: utf8bytes)
                                sb.append(String.format("\\%02x", b & 0xff));
                }
        }

        return sb.toString();
}
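To tie the escaping back to the filter template from the beginning of the post, here is a minimal, self-contained sketch. The names are illustrative, and escapeSpecials is a compact stand-in for the method above: it escapes only the five characters that RFC 4515 requires to be escaped and passes everything else through unchanged.

```java
final class SafeFilterDemo {

    // The template from the text; %u marks where the user input goes
    static final String TEMPLATE = "(|(uid=%u)(mail=%u))";

    // Compact stand-in for sanitize(): escapes * ( ) \ and NUL only
    static String escapeSpecials(final String input) {
        StringBuilder sb = new StringBuilder();
        for (char c : input.toCharArray()) {
            switch (c) {
                case '*':  sb.append("\\2a"); break;
                case '(':  sb.append("\\28"); break;
                case ')':  sb.append("\\29"); break;
                case '\\': sb.append("\\5c"); break;
                case '\0': sb.append("\\00"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Escapes the input first, then substitutes it into the template
    static String buildFilter(final String userInput) {
        return TEMPLATE.replace("%u", escapeSpecials(userInput));
    }

    public static void main(final String[] args) {
        System.out.println(buildFilter("john@example.com"));
        System.out.println(buildFilter("*"));
    }
}
```

Note the use of String.replace rather than String.replaceAll for the substitution: replaceAll gives backslashes in the replacement string a special meaning, which would interfere with the escape sequences.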

If there is sufficient demand, I will consider adding such a sanitising method to the web API of Json2Ldap.