Some time back, an 'unauthorised' plugin on one of our websites introduced sophisticated Malware that caused damage which made its way into other system files. As a result of that breach we introduced additional measures to mitigate the impact of potential intrusions, a necessity given the sensitive nature of any financial asset.
Every server can be penetrated in some way, and it's only a matter of time before any operator falls victim to a malicious actor. We create daily website backups so the restore process is always easy, but tightening server security is an ongoing process. One of the more recent security measures we introduced makes use of our Abuse API, which simply bans any incoming request that is in any way associated with known, reported, or suspected illegal activity or abuse.
The Abuse (Malware) API is updated every hour with hundreds of thousands of records sourced from multiple vendors. If an incoming request matches a suspected source, that request is blocked and a '403 Forbidden' message is returned. This article details how the Abuse (Malware) API is used.
No Solution is Perfect: Malware practitioners will often scan websites for vulnerabilities, crawl websites, or brute-force login screens and other files that carry a weakness. The business of Malware is a sophisticated one, so our method (when used in isolation) isn't perfect, but it's a reasonable measure for blocking known malicious sources, and a highly effective tool when used alongside other tools.
Website Backups: We make client website backups every day, and Yabber provides a facility to create your own backup when required. The offsite backups are restored by us when necessary. This style of service will normally carry a significant fee but is provided to our clients at no additional cost. Well over 40% of all websites on the Internet are created with WordPress but, like any open-source application, this makes the software vulnerable to clever hackers. We believe that if we're going to use WordPress we need to provide a reliable and industry-leading framework for backups (in addition to providing a service where those backups are unlikely to be required).
Abuse (Malware) API
In addition to returning details on IP addresses and Network IPs that are associated with malicious activity, the Abuse and Malware API will also return details for identified web crawlers (many of which are used by the malware crowd). The API is a standard client-only RESTful API that is now integrated with almost every application we create.
API Endpoint
The API supports a number of operations, but the standard GET request is most commonly used and is accessed via the following endpoint.
api.beliefmedia.com/abuse/abuse.json?ip={ip_address}&apikey={apikey}
Access to the API is made available via your primary API Key. No limitations are applied to use, but if you expect more than 30 requests per second we'd appreciate an understanding of your application. If accessing the API from a service that is not hosted by us you will also be required to supply a k parameter, which is a mutating string valid for no more than a few seconds.
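As a rough illustration of the endpoint described above, the following sketch assembles a request URL from the documented ip, apikey, and (optionally) k parameters. The https scheme and the helper name build_abuse_url are assumptions made for the example, and how the mutating k string is generated isn't covered here, so it's simply passed in as an opaque value.

```python
from urllib.parse import urlencode

# Host and path as shown in the article; the https scheme is an assumption.
ABUSE_ENDPOINT = "https://api.beliefmedia.com/abuse/abuse.json"


def build_abuse_url(ip, apikey, k=None):
    """Build a GET URL for a single IP lookup (hypothetical helper)."""
    params = {"ip": ip, "apikey": apikey}
    if k is not None:
        # Only required when calling from a service not hosted by the provider.
        # How the short-lived string is produced is not documented here.
        params["k"] = k
    return f"{ABUSE_ENDPOINT}?{urlencode(params)}"
```

The resulting URL can be passed to any HTTP client; the helper itself makes no network request.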
This article will introduce a few of the expected API responses.
Examples
Let's consider the IP address of 45.80.158.62. The request returns JSON that unfolds into the following array (truncated for readability).
A valid request will always return a code of 200. The is_malware and is_bot keys will usually carry a value of 1 (true) or 0 (false), and it's these values that you'll generally reference for standard blocks. The network_count key will return the number of IP addresses reported within the network range (in this case, 53), and those IPs are listed in the network_ip array.
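The flags described above might drive a block decision along the following lines. This is a minimal sketch only: it assumes the decoded JSON is available as a flat dictionary, that the status is carried in a key named code, and that flags may arrive as either numbers or strings. The function name should_block and the fail-open behaviour on a non-200 code are choices made for the example, not documented behaviour.

```python
def _flag(value):
    """Treat 1 or "1" as true; anything else as false."""
    return str(value).strip() == "1"


def should_block(record, block_bots=False):
    """Decide whether to block based on a decoded lookup record (assumed flat dict)."""
    # Assumption: the status lives in a "code" key. On anything other than
    # 200 this sketch fails open (does not block) rather than guessing.
    if str(record.get("code", "")).strip() != "200":
        return False
    is_malware = _flag(record.get("is_malware", 0))
    is_bot = _flag(record.get("is_bot", 0))
    # Malware is always blocked; bots only when the caller opts in.
    return is_malware or (block_bots and is_bot)
```

Keeping the bot decision behind an opt-in argument reflects the fact that some bots are wanted, while the is_malware flag is the one that matters for standard blocks.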
URLhaus is a website that provides a regularly updated list of Malware URLs, so we include the associated malicious URLs under the urlhaus_network key. The excellent URLhaus data is only made available for non-commercial purposes so it shouldn't be used or relied upon for anything other than your own personal applications. The URLhaus data is indexed on the URLhaus IP ID, and each associated URL record is indexed on a hash, which is an md5 hash of the IP address (without any port if one was provided). Our primary data is derived via the network_ip array... although, in reality, we rely almost exclusively on the single digit is_malware flag.
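The hash described above (an md5 of the IP address with any port removed) could be reproduced roughly as follows. This is a sketch under assumptions: it handles IPv4 addresses only, hashes the dotted address as an ASCII string, and returns the hexadecimal digest. The exact encoding used by the API isn't stated, and ip_hash is a hypothetical name.

```python
import hashlib


def ip_hash(address):
    """Return the md5 hex digest of an IPv4 address, ignoring any ":port" suffix."""
    host = address.strip()
    # A single colon indicates an IPv4 "address:port" pair; keep the address only.
    if host.count(":") == 1:
        host = host.split(":", 1)[0]
    return hashlib.md5(host.encode("ascii")).hexdigest()
```

With this approach, 45.80.158.62 and 45.80.158.62:8080 produce the same 32-character key.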
In the next example we'll have a look at an IP that resolves to the Chinese-owned ByteDance (parent company of TikTok). The number of requests from ByteDance is truly staggering - sometimes more than 1000 requests every hour. Do we really need ByteDance to have access to our website? Probably not. Should we ban this bot? Probably.
In the case of bots we'll try and provide ownership details, and we'll show other IP addresses associated with the same bot service.
In the last example we'll look at an IP that resolves as a bot and is also regarded as malware or malicious.
The Impact of Bots
Bots such as Amazon, ByteDance, MJ12 (Malware) and a range of others often provide no relevance or value to the everyday operation or success of our website. The aggressive bots are commonly used to ingest information to train AI systems, or they're used for harvesting information that can be sold. The net impact is that these requests slow down your site, compromise website performance, and consume valuable resources. At the time of writing we're undecided how we'll handle many bots, but it's highly likely we'll simply ban all those that don't provide direct value. Many bots, such as Google, Bing, Yandex, and others, are permanently white-listed.
We've recorded tens of thousands of bots (with many others unidentified), and the number of requests per day to our systems is often in the millions. Compared against the same period just a few years ago we've seen a massive increase that we attribute to the training of various AI systems.
Bot API Endpoint: The API includes a large number of endpoints, such as those to query bots and associated IP addresses. The API will be detailed in full (over time) via our FAQ module.
Matrix API: We have our own bot that crawls only industry-specific websites (notably the finance industry), so we appreciate that bots are sometimes needed to build an understanding that can be freely shared with others. However, our own BeNet bot makes requests at intervals that are unlikely to impact server performance, and the data is open-sourced to the industry. We try to adhere to what we consider best practice.
Simplified API Endpoint
As noted a couple of times, it's the is_malware key that is most relevant, and you'll likely want to minimise the returned payload if you're using the API for any real-world server-level applications. The parameter of simple=1 will force the response to return either a 1 or 0 with no other information. Error responses are empty when the simple parameter is included.
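The simplified response lends itself to a very small server-level check. The sketch below is illustrative only: it interprets the body as described above (1, 0, or empty on error) and wraps a WSGI application so that flagged addresses receive a '403 Forbidden'. The names parse_simple_response and abuse_gate are hypothetical, and lookup stands in for whatever function you use to call the endpoint with simple=1 and return the raw body. Allowing the request through on an empty (error) response is a design choice made here, not documented behaviour.

```python
def parse_simple_response(body):
    """Map a simple=1 body to True ("1"), False ("0"), or None (empty/unknown)."""
    text = body.strip()
    if text == "1":
        return True
    if text == "0":
        return False
    return None  # Error responses are empty in simple mode.


def abuse_gate(app, lookup):
    """Wrap a WSGI app; lookup(ip) is a caller-supplied function returning the raw body."""
    def middleware(environ, start_response):
        ip = environ.get("REMOTE_ADDR", "")
        # Only an explicit "1" blocks; errors and "0" fall through to the app.
        if parse_simple_response(lookup(ip)) is True:
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"403 Forbidden"]
        return app(environ, start_response)
    return middleware
```

In practice you'd likely cache results per IP for a short period so that a lookup isn't made on every request.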
Xena Integration
We recently overhauled the entire statistical engine into a module called Xena. The engine provides advanced statistics at the website level and extremely comprehensive statistics at the Yabber level (the former is provided for those that don't have Yabber access). Until now, we've recorded all resolved bots in an is_bot field and we've returned that information when specifically called for in Yabber tables and API responses. The number of bot records is extremely significant. As of this week you'll start to see a drastically reduced count of bot views.
More Information
It's expected that we'll include API documentation in our FAQ module. If you're in immediate need of a solution we'd ask that you contact us for more information.