Improper Encoding or Escaping of Output

The software prepares a structured message for communication with another component, but encoding or escaping of the data is either missing or done incorrectly. As a result, the intended structure of the message is not preserved.


Description

Improper encoding or escaping can allow attackers to change the commands that are sent to another component, inserting malicious commands instead.

Most software follows a certain protocol that uses structured messages for communication between components, such as queries or commands. These structured messages can contain raw data interspersed with metadata or control information. For example, "GET /index.html HTTP/1.1" is a structured message containing a command ("GET") with a single argument ("/index.html") and metadata about which protocol version is being used ("HTTP/1.1").

If an application uses attacker-supplied inputs to construct a structured message without properly encoding or escaping, then the attacker could insert special characters that will cause the data to be interpreted as control information or metadata. Consequently, the component that receives the output will perform the wrong operations, or otherwise interpret the data incorrectly.

Demonstrations

The following examples help to illustrate the nature of this weakness and describe methods or techniques which can be used to mitigate the risk.

Note that the examples here are by no means exhaustive and any given weakness may have many subtle varieties, each of which may require different detection methods or runtime controls.

Example One

This code displays an email address that was submitted as part of a form.

<% String email = request.getParameter("email"); %>
...
Email Address: <%= email %>

The value read from the form parameter is reflected back to the client browser without having been encoded prior to output, allowing various XSS attacks (CWE-79).

Example Two

Consider a chat application in which a front-end web application communicates with a back-end server. The back-end is legacy code that does not perform authentication or authorization, so the front-end must implement it. The chat protocol supports two commands, SAY and BAN, although only administrators can use the BAN command. Each argument must be separated by a single space. The raw inputs are URL-encoded. The messaging protocol allows multiple commands to be specified on the same line if they are separated by a "|" character.

First let's look at the back end command processor code

$inputString = readLineFromFileHandle($serverFH);

# generate an array of strings separated by the "|" character.
@commands = split(/\|/, $inputString);

foreach $cmd (@commands) {

  # separate the operator from its arguments based on a single whitespace
  ($operator, $args) = split(/ /, $cmd, 2);

  $args = UrlDecode($args);
  if ($operator eq "BAN") {
    ExecuteBan($args);
  }
  elsif ($operator eq "SAY") {
    ExecuteSay($args);
  }
}

The front end web application receives a command, encodes it for sending to the server, performs the authorization check, and sends the command to the server.

$inputString = GetUntrustedArgument("command");
($cmd, $argstr) = split(/\s+/, $inputString, 2);

# removes extra whitespace and also changes CRLF's to spaces
$argstr =~ s/\s+/ /gs;

$argstr = UrlEncode($argstr);
if (($cmd eq "BAN") && (! IsAdministrator($username))) {
  die "Error: you are not the admin.\n";
}

# communicate with file server using a file handle
$fh = GetServerFileHandle("myserver");

print $fh "$cmd $argstr\n";

It is clear that, while the protocol and back-end allow multiple commands to be sent in a single request, the front end only intends to send a single command. However, the UrlEncode function could leave the "|" character intact. If an attacker provides:

SAY hello world|BAN user12

then the front end will see this is a "SAY" command, and the $argstr will look like "hello world | BAN user12". Since the command is "SAY", the check for the "BAN" command will fail, and the front end will send the URL-encoded command to the back end:

SAY hello%20world|BAN%20user12

The back end, however, will treat these as two separate commands:

SAY hello world
BAN user12

Notice, however, that if the front end properly encodes the "|" with "%7C", then the back end will only process a single command.

Example Three

This example takes user input, passes it through an encoding scheme and then creates a directory specified by the user.

sub GetUntrustedInput {
  return($ARGV[0]);
}

sub encode {
  my($str) = @_;
  $str =~ s/\&/\&amp;/gs;
  $str =~ s/\"/\&quot;/gs;
  $str =~ s/\'/\&apos;/gs;
  $str =~ s/\</\&lt;/gs;
  $str =~ s/\>/\&gt;/gs;
  return($str);
}

sub doit {
  my $uname = encode(GetUntrustedInput("username"));
  print "<b>Welcome, $uname!</b><p>\n";
  system("cd /home/$uname; /bin/ls -l");

}

The programmer attempts to encode dangerous characters, however the denylist for encoding is incomplete (CWE-184) and an attacker can still pass a semicolon, resulting in a chain with command injection (CWE-77).

Additionally, the encoding routine is used inappropriately with command execution. An attacker doesn't even need to insert their own semicolon. The attacker can instead leverage the encoding routine to provide the semicolon to separate the commands. If an attacker supplies a string of the form:

' pwd

then the program will encode the apostrophe and insert the semicolon, which functions as a command separator when passed to the system function. This allows the attacker to complete the command injection.

See Also

SEI CERT Oracle Secure Coding Standard for Java - Guidelines 00. Input Validation and Data Sanitization (IDS)

Weaknesses in this category are related to the rules and recommendations in the Input Validation and Data Sanitization (IDS) section of the SEI CERT Oracle Secure Codi...

SFP Secondary Cluster: Faulty Input Transformation

This category identifies Software Fault Patterns (SFPs) within the Faulty Input Transformation cluster.

CERT C++ Secure Coding Section 49 - Miscellaneous (MSC)

Weaknesses in this category are related to rules in the Miscellaneous (MSC) section of the CERT C++ Secure Coding Standard. Since not all rules map to specific weaknes...

Comprehensive CWE Dictionary

This view (slice) covers all the elements in CWE.

Weaknesses for Simplified Mapping of Published Vulnerabilities

CWE entries in this view (graph) may be used to categorize potential weaknesses within sources that handle public, third-party vulnerability information, such as the N...

Weaknesses without Software Fault Patterns

CWE identifiers in this view are weaknesses that do not have associated Software Fault Patterns (SFPs), as covered by the CWE-888 view. As such, they represent gaps in...


Common Weakness Enumeration content on this website is copyright of The MITRE Corporation unless otherwise specified. Use of the Common Weakness Enumeration and the associated references on this website are subject to the Terms of Use as specified by The MITRE Corporation.