Text4Shell Explained: What Went Wrong And How to Hunt For It

By Yonatan Khanashvili, Threat Hunting Expert at Team Axon

Overview

CVE-2022-42889 (aka “Text4Shell”) was discovered by GitHub Security Lab researcher Alvaro Muñoz in March 2022. The vulnerability allows Remote Code Execution (RCE) in Apache Commons Text and received a critical CVSS score of 9.8. The Commons Text library is used in a wide variety of applications, which may explain the heavy media attention the vulnerability received, perhaps unfairly.

Apache Commons Text is a commonly used library, originally released in 2017, which includes algorithms for string functionality. The library performs a process called “variable interpolation”, which evaluates strings containing placeholders and replaces the placeholders with their corresponding values. But in versions of the library dating as far back as 2018, some default lookup instances included evaluations that could result in arbitrary code execution or contact with remote servers.

Deep Dive: What Went Wrong?

The vulnerability affects the Apache Commons Text Java library, which provides different methods for working with strings, beyond what core Java offers. One of those capabilities, which is relevant to our case, is string interpolation.

In simple words, string interpolation is a technique that enables you to insert expression values into literal strings, much like format strings in C. By default (or at least until version 1.9) the method StringSubstitutor.replace within the library can process a variety of different lookup functions as part of the interpolation process.
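To make the idea concrete, here is a minimal sketch of placeholder interpolation written with the Java standard library only. It is not the Commons Text implementation; the class name SimpleInterpolator and its behavior of leaving unknown placeholders untouched are assumptions made for illustration.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy interpolator (hypothetical): replaces ${name} placeholders with map values. */
final class SimpleInterpolator {
    // Matches ${...} and captures the text between the braces.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    static String interpolate(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            // Assumption for this sketch: unknown placeholders are left as-is.
            String replacement = values.getOrDefault(name, m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "Hello, Axon!"
        System.out.println(interpolate("Hello, ${user}!", Map.of("user", "Axon")));
    }
}
```

The important difference in the real library is that a placeholder can name a lookup function rather than a plain key, so resolving it may run code instead of reading a value from a map.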

Step 1: Locating the fix

Apache Commons Text is an open-source project, which means that its entire source code, together with documentation of all changes made, is freely available. This has two important benefits:

It’s relatively easy to compare versions of the code and determine which changes were made and where.
With source code being available, there’s no need for any reverse engineering.

In order to locate the fix, we start by examining the last commit made on the Apache commons-text project. This commit included the mitigation and code fix provided with version 1.10. The commit contains many different changes, including doc updates and new test classes. The following screenshots of the diff show our investigation process. Hopefully, you can do this on your own next time!

A good place to start the investigation is the StringLookupFactory class in the StringLookupFactory.java file. Clearly, there’s been a significant change there: a couple of new methods have been added, one of which - StringLookupFactory.createDefaultStringLookups - creates a HashMap (put simply, a set of key/value pairs) with a list of different lookup function names.

Figure 1: The new lookups in StringLookupFactory

This list looks very specific. And indeed, when looking at the same list in the previous version (pre-fix), there are three additional lookup functions that are no longer present:

DefaultStringLookup.SCRIPT
DefaultStringLookup.DNS
DefaultStringLookup.URL

This observation is corroborated by the commit notes:

Figure 2: Documentation of the removed lookup functions

The logical conclusion would be that those three were somehow vulnerable and were therefore removed. But at this point this is just an educated guess, and we need to dig further in order to confirm it.

Step 2: Following the lead

Further down the execution path, the method StringLookupFactory.addDefaultStringLookups uses the putAll method to populate a HashMap (stringLookupMap) with the previously provided list of lookup functions:

Figure 3: Instantiating a new HashMap

During execution, InterpolatorStringLookup.lookup takes an input string, extracts the prefix (the lookup name that precedes the colon), and matches it against the HashMap stringLookupMap, which was built by the StringLookupFactory.addDefaultStringLookups and StringLookupFactory.createDefaultStringLookups methods shown above. In the new version, the script, DNS, and URL lookup types are no longer included in the HashMap, and are therefore no longer supported by default.

Figure 4: lookup is using the HashMap to match lookup types
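The effect of that map can be sketched with a few lines of standard-library Java. This is a hypothetical stand-in, not the library's code: the class PrefixLookupSketch, the "upper" lookup, and the choice to return null for an unregistered prefix are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for prefix-based lookup dispatch; not the library's code. */
final class PrefixLookupSketch {
    private final Map<String, Function<String, String>> lookups = new HashMap<>();

    PrefixLookupSketch(Map<String, Function<String, String>> defaults) {
        // Mirrors the idea of populating the lookup map from a list of defaults.
        lookups.putAll(defaults);
    }

    /** Resolves "prefix:key"; returns null when the prefix is not registered. */
    String lookup(String variable) {
        int sep = variable.indexOf(':');
        if (sep < 0) {
            return null;
        }
        String prefix = variable.substring(0, sep).toLowerCase(Locale.ROOT);
        String key = variable.substring(sep + 1);
        Function<String, String> handler = lookups.get(prefix);
        return handler == null ? null : handler.apply(key);
    }

    public static void main(String[] args) {
        Map<String, Function<String, String>> defaults = new HashMap<>();
        defaults.put("upper", key -> key.toUpperCase(Locale.ROOT));
        // No "script", "dns" or "url" entries: those prefixes do not resolve.
        PrefixLookupSketch sketch = new PrefixLookupSketch(defaults);
        System.out.println(sketch.lookup("upper:axon"));              // AXON
        System.out.println(sketch.lookup("script:javascript:1 + 1")); // null
    }
}
```

Removing an entry from the map is enough to make its prefix inert, which is why trimming the default list works as a mitigation without touching the substitution logic itself.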

Step 3: Finding the root cause

It is evident from the fixed code that something in the script, DNS, and URL lookup types was deemed dangerous and therefore disabled. But this is a mitigation, not a bug fix - so what’s the actual problem? In order to answer this question, we need to determine the execution flow that leads to InterpolatorStringLookup.lookup and examine it.

Starting with the method StringSubstitutor.replace, which is responsible for replacing all occurrences of variables with their matching values, we can see that it’s calling StringSubstitutor.substitute:

Figure 5: StringSubstitutor.replace calling StringSubstitutor.substitute

StringSubstitutor.substitute is responsible, as its name suggests, for substituting the provided variables. This involves parsing both the input literal strings and the string interpolation, using the StringSubstitutor.resolveVariable method, which in turn calls InterpolatorStringLookup.lookup:

Figure 6: StringSubstitutor.resolveVariable calling InterpolatorStringLookup.lookup

At this point, InterpolatorStringLookup.lookup maps the relevant lookup function provided in the input string, as shown in Figure 4.

Step 4: Bottom line (and a warning)

It is becoming evident that the potentially vulnerable method - StringSubstitutor.replace - wasn’t actually fixed. Instead, mitigation was added in front of it, to limit the input and use-cases, by modifying the relevant lookups on the StringLookupFactory class.

Hence, if the Apache Commons Text library isn’t updated to the latest version (1.10), a call to the method StringSubstitutor.replace can still accept one of the problematic lookup functions - script, DNS, or URL, the three that were excluded by the mitigating code change - and thereby allow running arbitrary code or making outbound connections on behalf of the affected server.

Summing it up:

If we provide a string lookup of type DefaultStringLookup.DNS to the StringSubstitutor.replace method as follows, we’ll be able to trigger an outbound connection on behalf of the Java process, or of any other process that uses the library within its code:

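import org.apache.commons.text.StringSubstitutor;

public class test {
    public static void main(String[] args) {
        // createInterpolator() enables the default lookups (dns included in the affected versions)
        StringSubstitutor stringSubstitutor = StringSubstitutor.createInterpolator();
        // Resolving the dns lookup triggers a DNS query for hunters.ai
        stringSubstitutor.replace("${dns:address|hunters.ai}");
    }
}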

Similarly, we can use the DefaultStringLookup.SCRIPT lookup type to actually execute code on the target system:

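import org.apache.commons.text.StringSubstitutor;

public class test {
    public static void main(String[] args) {
        // createInterpolator() enables the default lookups (script included in the affected versions)
        StringSubstitutor stringSubstitutor = StringSubstitutor.createInterpolator();
        // The script lookup hands "1 + 1" to the named script engine for evaluation
        stringSubstitutor.replace("${script:javascript:1 + 1}");
    }
}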

Comparing Text4Shell with Log4Shell?

Text4Shell came into our lives following an intense year with many published vulnerabilities, such as Spring4Shell and the well-known Log4Shell, which affected common Java libraries. Consequently, the comparison is inevitable.

When we compare one vulnerability to another, we need to consider the various parameters involved in the vulnerability exploitation, such as the prevalence of the affected component, the ease with which it can be exploited, and more.

It’s easy to see the similarities between Log4Shell and Text4Shell, but there’s still a major difference in the exploitation method: While Log4Shell only requires one string with a JNDI lookup in any of the headers inside the malicious packet, Text4Shell requires attacker-controlled input to reach a direct call to the vulnerable StringSubstitutor.replace method. That input might arrive in different flavors, such as a parameter of a GET request or an input field of a POST request. This obviously depends on how the developer designed and implemented those in the backend, but a determined attacker is only a relatively simple fuzzing action away from hitting a vulnerable spot.

Don’t get us wrong, we aren’t saying Text4Shell isn’t dangerous. In fact, it very much is. The Apache Commons Text library is highly prevalent, which creates a massive attack surface. But when performing a risk assessment, we need to take additional factors into account, namely how easy the vulnerability is to exploit in practice. While we don't think you need to freak out, we highly recommend applying the mitigation by updating the library, identifying potentially affected components, and performing threat-hunting activity on related assets.

Hunt With Team Axon

Having explained how the vulnerability works, we can move on to the threat-hunting processes that can be built around it.

Visibility: Identifying potentially affected components

Visibility is a key point when discussing an RCE vulnerability on publicly exposed components. We want to be able to quickly identify and track them, and if necessary, also set up extended monitoring on them.

When a Java process is launched, the libraries it loads typically appear on its command line, for example in the classpath argument. This makes process command-line telemetry a useful source for identifying hosts that run the library.

The query linked below provides visibility into components that use the Apache Commons Text library and might be exposed to the Internet, based on correlation with incoming network traffic. Exclusively for Hunters’ customers, the results are classified by sensitivity (based on the asset tagging feature in the Hunters platform).

While Apache Commons Text may not be in the direct dependency tree of the project, it may be used indirectly by some other dependency. Therefore, we highly recommend scanning your code for usage of the vulnerable method with code scanning tools.

The query can be found here:

https://github.com/axon-git/rapid-response/blob/a8794b9adb537ac0584b9cd86e915e43b5314092/Text4Shell%20-%20CVE-2022-42889/vulnerable_devices_query.sql
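Separately from the linked query, the command-line idea can be illustrated with a short Java sketch. It assumes the library appears under the conventional jar name commons-text-<version>.jar and applies the affected range stated in this post (1.5 through 1.9); the class name CommonsTextVersionCheck and the sample command lines are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: flags command lines that reference an affected commons-text jar. */
final class CommonsTextVersionCheck {
    // Matches names such as commons-text-1.9.jar or commons-text-1.10.0.jar.
    private static final Pattern JAR =
            Pattern.compile("commons-text-(\\d+)\\.(\\d+)(?:\\.\\d+)?\\.jar");

    /** Returns true if any referenced commons-text jar is in the 1.5 to 1.9 range. */
    static boolean isAffected(String commandLine) {
        Matcher m = JAR.matcher(commandLine);
        while (m.find()) {
            int major = Integer.parseInt(m.group(1));
            int minor = Integer.parseInt(m.group(2));
            if (major == 1 && minor >= 5 && minor <= 9) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String cmd = "java -cp /opt/app/lib/commons-text-1.9.jar:/opt/app/app.jar com.example.Main";
        System.out.println(isAffected(cmd));
    }
}
```

A check like this misses copies of the library that are bundled inside a fat or shaded jar, since those never appear by name on the command line, so it complements code scanning rather than replacing it.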

Detecting exploitation attempts via CDN/WAF logging

As shown earlier, practical exploitation of the vulnerability involves passing a relevant string lookup function in a parameter that reaches the StringSubstitutor.replace method. We would therefore expect exploitation attempts to include this string in relevant URI parameters, which are visible in your CDN or WAF logging.

Please note that the design and implementation of your application is likely to be specific to you, which means that in theory the backend implementation of StringSubstitutor.replace can include passing a string to an unrecorded header that isn’t logged by the CDN by default. You should keep that in mind when hunting for exploitation via CDN.

The following query will look for the relevant lookup string functions and a practical method of executing a process with a Java method. We decided to include all the existing lookup strings, simply in order to be able to monitor potential new and creative ways for exploiting the vulnerability. In case you want to only focus on the published methods, you can look for script, dns and url alone.

The query can be found here:

https://github.com/axon-git/rapid-response/blob/a8794b9adb537ac0584b9cd86e915e43b5314092/Text4Shell%20-%20CVE-2022-42889/exploitation_attempts_hunting_query.sql
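For readers who want to prototype the same idea outside of a query language, here is a rough Java sketch that flags a logged URI containing one of the three published lookup prefixes. It is not a translation of the linked query; the class name LookupStringDetector, the case-insensitive match, and the single pass of URL decoding are assumptions chosen for illustration.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

/** Hypothetical detector for Text4Shell-style lookup strings in a logged URI. */
final class LookupStringDetector {
    // Looks for ${script:, ${dns: or ${url: in the decoded text.
    private static final Pattern SUSPICIOUS =
            Pattern.compile("\\$\\{(script|dns|url):", Pattern.CASE_INSENSITIVE);

    static boolean looksLikeExploitAttempt(String rawUri) {
        String decoded;
        try {
            // Logs usually hold percent-encoded URIs, so decode once before matching.
            decoded = URLDecoder.decode(rawUri, StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            // Malformed percent-encoding: fall back to the raw text.
            decoded = rawUri;
        }
        return SUSPICIOUS.matcher(decoded).find();
    }

    public static void main(String[] args) {
        String uri = "/search?q=%24%7Bscript%3Ajavascript%3A1%2B1%7D";
        System.out.println(looksLikeExploitAttempt(uri));
    }
}
```

A single decoding pass will not catch double-encoded or otherwise obfuscated payloads, so treat a match as a lead for further investigation rather than as complete coverage.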

Java BOI Process Execution: Reverse Shell Execution By Java Process

The following query will detect commonly abused binaries that are executed by a java process (either on a Windows or a Unix machine) followed by an external network connection.

The query can be found here:

https://github.com/axon-git/rapid-response/blob/a8794b9adb537ac0584b9cd86e915e43b5314092/Text4Shell%20-%20CVE-2022-42889/post_exploitation_hunting_query.sql

Suggested Mitigation

The vulnerability affects Apache Commons Text versions 1.5 through 1.9. If your organization has direct dependencies on Apache Commons Text, we highly recommend upgrading to Apache Commons Text version 1.10, which isn't affected by the vulnerability. While specific patches are yet to be published by Apache, upgrading to version 1.10 is a necessary step.

There are claims online that the vulnerability isn't relevant for environments running JDK 15 or later. Those claims have been dismissed by Alvaro Muñoz, the researcher who found the vulnerability, since the vulnerability can be exploited through different JS engines with no specific dependencies. Hence, as far as we can tell, all JDK versions are affected by the vulnerability. Please refer to the Apache Commons Text site for more details.

Conclusion

In today’s blog post, we’ve dived into the recently published Text4Shell vulnerability, learned how it works by analyzing the changes in the fixed version of the library, and provided threat-hunting practices and recommendations you can start applying today.

Looking forward, the team is constantly monitoring the threat landscape, looking for exploitation attempts involving the vulnerability, and will update accordingly.

And remember: Just like we looked at the fix and located the vulnerability, so do hackers and criminals. Keep your systems patched!

See you next time!

~ Axon

To stay updated on threat hunting research, activities, and queries, follow Team Axon’s Twitter account (@team__axon).
