Authors: Sparsh Kulshrestha, Shashank Barthwal
BeVigil OSINT API public documentation

History of Data Gathering and Scanning

Over the years, large-scale scanning of the internet has enabled the security community to identify widespread vulnerabilities and mitigate them before they can be exploited. The first project in this category, the “Internet Mapping Project”, was started at Bell Labs in 1998. More recent projects include Shodan and Censys. However, despite the current shift to a mobile-first ecosystem, comprehensive datasets, vulnerability detection tools, and research and analysis remain scarce for mobile operating systems, of which Android is the most widely used.

Importance of Asset Discovery from Android Apps

At first glance, an application’s compiled code seems unreadable, with no hard-coded values to be found. However, an experienced and motivated pentester can extract such values with little effort and can even automate the process. That’s why you should never store or hard-code sensitive keys inside your app.

The problem of treating compiled mobile applications as safe storage is real. This article claims that 0.5% of mobile applications contain AWS API keys, which has resulted in the exposure of 100M+ users’ data. Many other sensitive keys should not be stored directly in code, and there are several instances of exposed keys being used to carry out cyber attacks.
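To illustrate how easily such extraction can be automated, here is a minimal sketch that scans decompiled source text for AWS access key IDs. It assumes the widely documented ID format (the "AKIA" prefix followed by 16 uppercase alphanumeric characters). The function name and sample snippet are hypothetical and are not part of BeVigil's tooling.

```python
import re

# AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
# Other secret types (tokens, passwords) would need their own patterns.
AWS_ACCESS_KEY_ID = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")


def find_aws_key_ids(source_text: str) -> list[str]:
    """Return the unique AWS access key IDs found in the text, in order."""
    found: list[str] = []
    for key_id in AWS_ACCESS_KEY_ID.findall(source_text):
        if key_id not in found:
            found.append(key_id)
    return found


# Hypothetical decompiled snippet; the key is AWS's published example value.
sample = 'String accessKey = "AKIAIOSFODNN7EXAMPLE"; String region = "us-east-1";'
print(find_aws_key_ids(sample))
```

A single pattern like this only catches one key type; a real scanner would combine many patterns and filter out false positives.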

With over 3 billion active Android devices globally and ~8 million apps across 80+ app stores to choose from, the attack surface continues to grow. This is especially concerning given that Android apps are notorious for hardcoded assets and lax backend security that can be misused to carry out large-scale attacks.

The Problem

While we often focus on web security, there has been little progress in mobile security due to:

Unavailability of tools for large-scale automated extraction and unpacking of APKs. To analyze a mobile app, it must first be manually decompiled, after which its source code must be analyzed for exposed or hardcoded assets.
Lack of data-driven techniques to extensively and continuously scan thousands of Android apps to identify vulnerabilities and exposed attack surfaces, both on the client side and the server side, to determine their criticality and mitigate the risks they pose.
Absence of collaboration on data collection, analysis, and sharing within the security community when it comes to mobile apps.
Dearth of public awareness and visibility of mobile app-related threats.

Our Innovation

Our innovation addresses these issues by:

Creating and maintaining a searchable repository of 500,000+ Android apps that have been extracted and unpacked. We are indexing 200,000+ Android apps a month, as researchers and developers across the world upload new apps daily. Users can search the source code of the apps in the repository for exposed or hardcoded assets such as API keys and secrets, hardcoded credentials, authorization tokens, cloud assets, subdomains, URLs, endpoints, parameters, IP addresses, etc.
Providing a dataset that is extremely valuable for identifying vulnerabilities, such as cloud misconfigurations and injection vulnerabilities (Log4Shell, SSRF, SQLi, etc.), that affect the poorly secured backend APIs and hosts powering the apps.
Making the datasets, search functionality, vulnerability assessments, and our research findings available to the security community via an OSINT API. This will enable researchers and developers to automate asset discovery, reconnaissance, research, and analysis of Android apps.

Data Overview

Our tooling can identify a wide range of assets, such as API keys and secrets, hardcoded credentials, authorization tokens, cloud assets, subdomains, URLs, endpoints, parameters, IP addresses, etc.
To date, we have scanned more than 500,000 apps submitted to the platform since its launch, within which we have identified about 150 million assets. These include more than 1 million subdomains, about 7 million unique URLs, about half a million cloud assets and services (S3, GCP, ELB, etc.), and more than half a million API keys or hardcoded secrets.

Chart — *Distribution of assets in our inventory*

*Distribution of top 6 secrets and keys in our inventory*

Scanning the Mobile World for Vulnerabilities

The first step of our research involved automating the extraction and unpacking of apps, using only the app store link as input. This gave us a searchable database of the source code of 500,000+ apps.
Using a regex, we identified 2 million+ domain URLs present in the source code. The subdomains of these domains were enumerated to identify all the injection points that could be tested.
The URLs were then matched to their respective package IDs in the dataset. Using the package IDs, the other endpoint parameters were identified and matched with the corresponding URLs.
The URLs, in conjunction with the corresponding parameters, were injected with payloads for common vulnerabilities such as Log4Shell RCE, SSRF, SQLi, etc.
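The URL extraction and matching steps above can be sketched roughly as follows. This is a simplified illustration, not the pipeline used in the research: the regex, the function name, and the sample package ID are assumptions, and it only recovers parameters that appear in a URL's query string.

```python
import re
from collections import defaultdict
from urllib.parse import parse_qs, urlsplit

# Simplified URL pattern: stops at whitespace, quotes, and angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def extract_endpoints(sources: dict[str, str]) -> dict[str, dict[str, set[str]]]:
    """Map package ID -> URL without query string -> parameter names seen."""
    result: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    for package_id, text in sources.items():
        for raw_url in URL_PATTERN.findall(text):
            try:
                parts = urlsplit(raw_url)
            except ValueError:
                continue  # malformed URL, e.g. a broken IPv6 literal
            if not parts.hostname:
                continue
            base = f"{parts.scheme}://{parts.netloc}{parts.path}"
            params = parse_qs(parts.query, keep_blank_values=True)
            # Updating a set with a dict adds the dict's keys (parameter names).
            result[package_id][base].update(params)
    return {pkg: dict(urls) for pkg, urls in result.items()}


# Hypothetical package ID and decompiled source text.
sources = {
    "com.example.app": (
        'String u = "https://api.example.com/v1/items?id=1&sort=asc"; '
        'String logo = "http://cdn.example.com/logo.png";'
    )
}
endpoints = extract_endpoints(sources)
print(endpoints)
```

Keying the result by package ID keeps each endpoint traceable to the app it came from, which matters when findings have to be reported to the app's owner.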

BeVigil Vulnerability Scanning Use Cases

APKs Backend Analysis of Log4Shell Vulnerability Using BeVigil

Function: Log4j is a popular logging framework used by many Java applications.

Risk: The Log4Shell vulnerability (CVE-2021-44228) allows remote code execution on affected systems, including the poorly secured backend APIs and hosts powering the apps. This means attackers can steal data, install malware, or take control of a system over the Internet.

Scope: CloudSEK scans have identified 130+ Android apps whose backend APIs are affected by CVE-2021-44228 (Log4Shell). Of the companies affected by the Log4Shell vulnerability, 70% are Indian companies.

Mitigation measures:

Update the Log4j library to version 2.17.1 or above.
Perform proper input validation on user-supplied strings.
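As a rough illustration of the first measure, the sketch below checks whether a Log4j version string meets the 2.17.1 threshold named above. The helper names are hypothetical. The check is a plain numeric comparison, so it does not account for fixes backported to older release lines.

```python
# Minimum version recommended in the mitigation measures above.
MIN_SAFE_VERSION = (2, 17, 1)


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '2.17.1' into (2, 17, 1); non-numeric suffixes are ignored."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def is_log4j_patched(version: str) -> bool:
    """True if the version is at or above the 2.17.1 threshold."""
    return parse_version(version) >= MIN_SAFE_VERSION


for candidate in ("2.14.1", "2.17.1", "2.20.0"):
    print(candidate, is_log4j_patched(candidate))
```

A check like this could run in a build pipeline against the dependency list, so that an outdated Log4j version fails the build before it ships.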

APKs Backend Analysis of SSRF Vulnerability Using BeVigil

Function: Server-side request forgery (also known as SSRF) is a web security vulnerability that allows an attacker to induce the server-side application to make HTTP requests to an arbitrary domain of the attacker’s choosing.

Risk: In a typical SSRF attack, the attacker might cause the server to make a connection to internal-only services within the organization’s infrastructure. A successful SSRF attack can often result in unauthorized actions or access to data within the organization, either in the vulnerable application itself or on other back-end systems that the application can communicate with. In some situations, the SSRF vulnerability might allow an attacker to perform arbitrary command execution.

Scope: In a recent study, CloudSEK scans identified 100+ Android apps whose backend APIs are affected by SSRF. An SSRF vulnerability can allow attackers to force the server to connect to arbitrary external systems, potentially leaking sensitive data such as authorization credentials.

Mitigation measures:

Restrict input to values that match, begin with, or contain an entry in a whitelist of permitted values.
Block input containing hostnames such as 127.0.0.1 and localhost, or sensitive URLs such as /admin.
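The whitelist approach can be sketched as follows, assuming the backend receives a full URL from the client before fetching it. The host names and the function name are hypothetical. The sketch uses the strictest form of the first measure, an exact match on the parsed hostname, because prefix or substring matching is easier to bypass.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical whitelist of hosts the backend is permitted to contact.
ALLOWED_HOSTS = {"api.example.com", "cdn.example.com"}


def is_url_allowed(url: str) -> bool:
    """Accept only http(s) URLs whose host exactly matches the whitelist."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower()
    if not host:
        return False
    try:
        ipaddress.ip_address(host)
        return False  # reject IP literals such as 127.0.0.1 outright
    except ValueError:
        pass  # not an IP literal, so fall through to the whitelist check
    return host in ALLOWED_HOSTS


for candidate in (
    "https://api.example.com/v1/items",
    "http://127.0.0.1/admin",
    "http://localhost/admin",
    "https://api.example.com@evil.test/",
):
    print(candidate, is_url_allowed(candidate))
```

Validating the parsed hostname rather than the raw string avoids tricks such as embedding a permitted name in the userinfo part of the URL. It does not cover redirects or DNS records that resolve a permitted name to an internal address, which need separate handling.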

Note: This research follows the guidelines of CloudSEK’s Responsible Disclosure Policy (see the Appendix for complete details).

Conclusion

With businesses, transactions, and interactions going online, governments and organizations are focusing on bolstering user privacy and IT security. Android maintained its position as the leading mobile operating system worldwide in June 2021, controlling the mobile OS market with a close to 73% share. So it’s time that we addressed the need for greater Android security.

OSINT API: https://bevigil.com/osint-api
API documentation: https://osint.bevigil.com/

Unraveling Assets from Android Apps at Scale