Inappropriate Encoding for Output Context

The software uses or specifies an encoding when generating output to a downstream component, but the specified encoding is not the same as the encoding that is expected by the downstream component.


This weakness can cause the downstream component to use a decoding method that produces different data than what the software intended to send. When the wrong encoding is used, even one that is closely related to the expected encoding, the downstream component could decode the data incorrectly. This can have security consequences when the intended boundaries between control and data are inadvertently broken, because the decoded data could contain control characters or special elements that were not sent by the software. The decoded data could then be used to bypass protection mechanisms such as input validation, and to enable injection attacks.
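A minimal sketch of such a decoding mismatch, assuming PHP with the mbstring extension available. The UTF-7 scenario, the payload, and the variable names are illustrative assumptions and do not come from this entry: the sender emits text that contains no angle brackets, but a consumer that decodes the same bytes as UTF-7 sees markup.

```php
<?php
// Hypothetical payload: plain ASCII as far as the sender is concerned.
$sent = '+ADw-script+AD4-alert(1)+ADw-/script+AD4-';

// The sender's own check finds no angle brackets in the outgoing data.
$looksSafe = strpbrk($sent, '<>') === false;

// Simulate a downstream component that decodes the bytes as UTF-7
// instead of the encoding the sender had in mind.
$received = mb_convert_encoding($sent, 'UTF-8', 'UTF-7');

var_dump($looksSafe);
echo $received, PHP_EOL;
```

The check on the sender's side passes, yet the decoded data contains special elements that the sender never wrote, which is the boundary break described above.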

While output encoding is essential for ensuring that communications between components are accurate, an encoding that does not match the output context, even a closely related one, could cause the downstream component to misinterpret the output.

For example, HTML entity encoding is used for elements in the HTML body of a web page. However, a programmer might use entity encoding when generating output that is used within an attribute of an HTML tag, which could contain functional JavaScript that is not affected by the HTML encoding.
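The attribute case can be sketched in PHP as follows. This is an illustration under stated assumptions, not a definitive implementation: the greet() handler, the $name value, and the use of html_entity_decode() to approximate how a browser decodes an attribute value before running it as script are all hypothetical. The second half shows one possible context-aware alternative, which encodes for the JavaScript string first and for the HTML attribute second.

```php
<?php
// Hypothetical user-supplied value.
$name = "x'); alert(document.cookie); //";
$flags = ENT_QUOTES | ENT_SUBSTITUTE;

// Entity encoding alone, as would suit text in the HTML body.
$encoded = htmlspecialchars($name, $flags, 'UTF-8');
$tag = '<a href="#" onclick="greet(\'' . $encoded . '\')">hi</a>';

// Approximate the browser: entities in the attribute value are decoded
// before the result is handed to the JavaScript engine.
preg_match('/onclick="([^"]*)"/', $tag, $m);
$script = html_entity_decode($m[1], ENT_QUOTES, 'UTF-8');
echo $script, PHP_EOL; // the quote is back, so the string literal is closed early

// Context-aware sketch: build a JavaScript string literal first,
// then entity-encode that literal for the attribute.
$jsFlags = JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT;
$jsLiteral = json_encode($name, $jsFlags);
$safeTag = '<a href="#" onclick="greet('
    . htmlspecialchars($jsLiteral, $flags, 'UTF-8')
    . ')">hi</a>';
echo $safeTag, PHP_EOL;
```

In the first tag the entity encoding is undone before the script runs, so it provides no protection inside the JavaScript string. In the second, the value is still a single string literal after the attribute is decoded.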

While web applications have received the most attention for this problem, this weakness could potentially apply to any type of software that uses a communications stream that could support multiple encodings.


The following examples help to illustrate the nature of this weakness and describe methods or techniques which can be used to mitigate the risk.

Note that the examples here are by no means exhaustive and any given weakness may have many subtle varieties, each of which may require different detection methods or runtime controls.

Example One

This code dynamically builds an HTML page using POST data:

$username = $_POST['username'];
$picSource = $_POST['picsource'];
$picAltText = $_POST['picalttext'];

// Vulnerable as written: both attribute values are delimited by single quotes.
echo "<title>Welcome, " . htmlentities($username) . "</title>";
echo "<img src='" . htmlentities($picSource) . "' alt='" . htmlentities($picAltText) . "' />";

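The programmer attempts to avoid XSS exploits (CWE-79) by encoding the POST values so they will not be interpreted as valid HTML. However, the htmlentities() encoding is not appropriate when the data are used as HTML attribute values. With its default flags before PHP 8.1 (ENT_COMPAT), htmlentities() does not encode single quotes, so a value can close the single-quoted attribute and inject additional attributes.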

For example, an attacker can set picAltText to:

"altTextHere' onload='alert(document.cookie)"

Assuming picSource is set to pic.jpg, this will result in the generated HTML image tag:

<img src='pic.jpg' alt='altTextHere' onload='alert(document.cookie)' />

Because the encoding does not match the attribute context, the attacker can inject arbitrary JavaScript into the tag.
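One way the example could be hardened is sketched below. The helper names h() and renderImage() are hypothetical and not part of the original code. The sketch assumes that encoding both quote characters with ENT_QUOTES and quoting every attribute is sufficient for the alt text. It does not validate the URL scheme of the src value, which a real application would likely also need to do.

```php
<?php
// Hypothetical helper: encode a value for element text or for a
// quoted HTML attribute value.
function h(string $value): string
{
    // ENT_QUOTES encodes both ' and ", so the value cannot close the
    // attribute regardless of which quote character delimits it.
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Hypothetical helper: build the image tag with quoted, encoded attributes.
function renderImage(string $src, string $alt): string
{
    return '<img src="' . h($src) . '" alt="' . h($alt) . '" />';
}

// The attack string from the example above.
$attack = "altTextHere' onload='alert(document.cookie)";
echo renderImage('pic.jpg', $attack), PHP_EOL;
```

With this encoding the single quotes in the attack string are emitted as &#039; and stay inside the alt value, so no onload attribute is created.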

See Also

SEI CERT Oracle Secure Coding Standard for Java - Guidelines 04. Characters and Strings (STR)

Weaknesses in this category are related to the rules and recommendations in the Characters and Strings (STR) section of the SEI CERT Oracle Secure Coding Standard for ...

Data Neutralization Issues

Weaknesses in this category are related to the creation or neutralization of data using an incorrect format.

Comprehensive CWE Dictionary

This view (slice) covers all the elements in CWE.

Weaknesses without Software Fault Patterns

CWE identifiers in this view are weaknesses that do not have associated Software Fault Patterns (SFPs), as covered by the CWE-888 view. As such, they represent gaps in...

CWE Cross-section

This view contains a selection of weaknesses that represent the variety of weaknesses that are captured in CWE, at a level of abstraction that is likely to be useful t...

Common Weakness Enumeration content on this website is copyright of The MITRE Corporation unless otherwise specified. Use of the Common Weakness Enumeration and the associated references on this website are subject to the Terms of Use as specified by The MITRE Corporation.