Bots are software programs that perform automated tasks over the Internet. They are used for productive tasks, but they are also frequently used for malicious activities. They can be broadly categorised as good bots and bad bots.
Good bots are used for positive purposes, such as chatbots that answer customer queries and web crawlers that index pages for search engines. A plain text file named robots.txt can be placed in the root of the site, with rules that allow or deny access to different site URLs. In this way good bots, which honour these rules, can be controlled and given access only to certain site resources.
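As a rough illustration of how such rules are read, the sketch below parses a small robots.txt with Python's standard urllib.robotparser module, which is how a well-behaved crawler might decide whether to fetch a URL. The rules, the bot name ExampleBot and the example.com URLs are assumptions made up for this example, not part of any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: every path and bot name here is illustrative only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /checkout/
Allow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("Googlebot", "https://example.com/products"),
        ("Googlebot", "https://example.com/checkout/cart"),
        ("ExampleBot", "https://example.com/products"),
    ]:
        print(agent, url, is_allowed(agent, url))
```

Note that robots.txt is advisory: it only restrains bots that choose to consult it, which is why it is useful for good bots and not for malicious ones.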
Then come the bad bots: malicious programs that perform activities in the background on the victim’s machine without the user’s knowledge. Such activities include accessing certain websites or stealing the user’s confidential information. They are also spread across the Internet to perform DDoS (Distributed Denial of Service) attacks on target websites. The following techniques can be used to deter malicious bots from accessing the resource-intensive APIs of web applications.
Canvas fingerprinting works on the HTML5 canvas element. A small image, 1 x 1 pixel in size, is drawn on the canvas element. Each device generates a different hash of this image depending on the browser, operating system and installed graphics card. This technique alone is not sufficient to uniquely identify users, because some groups of users share the same configuration and device. However, it has been observed that when a bot scans web pages it tries to access every link present on the landing page, so the time the bot takes to click a link on that page tends to be very similar from one request to the next. These two techniques, canvas fingerprint and time to click, can be combined to decide whether to grant access to the application directly or to ask the user to prove they are genuine by solving a captcha. If requests arrive from the same device (same fingerprint) with the same time-to-click interval, they fall into the bot category and must be validated with a captcha; the user is given access to the site after solving it successfully. This improves the user experience, as the captcha is shown only for suspected requests rather than to all users.
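The decision described above could be sketched on the server side roughly as follows. This is a minimal sketch, assuming the browser has already computed the canvas hash and measured the time to click, and sends both with the request. The class name ClickTimingDetector and the thresholds (min_samples, max_spread_ms, window) are hypothetical choices for illustration, not values taken from a real deployment.

```python
from collections import defaultdict, deque
from statistics import pstdev


class ClickTimingDetector:
    """Flags a fingerprint whose recent time-to-click values barely vary."""

    def __init__(self, min_samples: int = 5, max_spread_ms: float = 50.0,
                 window: int = 20) -> None:
        self.min_samples = min_samples      # assumed: samples needed before judging
        self.max_spread_ms = max_spread_ms  # assumed: spread considered "the same"
        # Keep only the most recent `window` click times per fingerprint.
        self._history = defaultdict(lambda: deque(maxlen=window))

    def needs_captcha(self, fingerprint: str, time_to_click_ms: float) -> bool:
        """Record one request and report whether it should see a captcha."""
        samples = self._history[fingerprint]
        samples.append(time_to_click_ms)
        if len(samples) < self.min_samples:
            return False  # not enough evidence yet, let the request through
        # Near-identical timings from one fingerprint suggest automation.
        return pstdev(samples) <= self.max_spread_ms


if __name__ == "__main__":
    detector = ClickTimingDetector(min_samples=3)
    for ms in (200.0, 205.0, 198.0):
        print("bot-like device:", detector.needs_captcha("hash-a", ms))
    for ms in (800.0, 3000.0, 12000.0):
        print("human-like device:", detector.needs_captcha("hash-b", ms))
```

Using the spread of recent timings rather than exact equality is a design choice made here so that small network jitter does not hide a bot. The right threshold would have to be tuned against real traffic.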
The honey trap works by placing a few hidden links on the landing page alongside the actual link. Since a bot tries every link on the page when it crawls the landing page, it gets trapped by a hidden link, which points to a 404 page. This data can be collected and used to identify the source of the request, and those sources can later be blocked from accessing the website. Genuine users see only the actual link and can access the site as usual.
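A minimal sketch of the bookkeeping behind a honey trap might look like the following. The trap paths, the HoneyTrap class and the block_after threshold are all hypothetical, and the source argument stands for whatever identifies the origin of a request (for example an IP address or a partner ID).

```python
from collections import Counter

# Hypothetical hidden-link paths; genuine users never see links to these.
TRAP_PATHS = frozenset({"/promo/hidden-offer", "/account/free-credit"})


class HoneyTrap:
    """Counts hits on hidden links and blocks sources that trigger them."""

    def __init__(self, block_after: int = 1) -> None:
        self.block_after = block_after  # assumed: trap hits tolerated before blocking
        self.hits = Counter()

    def handle(self, source: str, path: str) -> int:
        """Return the HTTP status to send for this request."""
        if self.hits[source] >= self.block_after:
            return 403  # source already identified as a bot
        if path in TRAP_PATHS:
            self.hits[source] += 1
            return 404  # hidden links lead to a 404 page
        return 200

    def report(self) -> list:
        """Sources ordered by number of trap hits, worst first."""
        return self.hits.most_common()


if __name__ == "__main__":
    trap = HoneyTrap()
    print(trap.handle("partner-a", "/products"))
    print(trap.handle("partner-b", "/promo/hidden-offer"))
    print(trap.handle("partner-b", "/products"))
    print(trap.report())
```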
Blacklist IP address:
This is not the best solution, because a sophisticated bot can change its IP address with each request. However, it helps to reduce some of the bot traffic by providing one layer of protection.
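For illustration, an IP blacklist check could be sketched with Python's standard ipaddress module as below. The blocked ranges are documentation-only example addresses, and the choice to reject malformed addresses is an assumption of this sketch.

```python
import ipaddress

# Hypothetical blacklist; these ranges are reserved for documentation.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),   # a whole range
    ipaddress.ip_network("198.51.100.7/32"),  # a single address
]


def is_blocked_ip(remote_addr: str) -> bool:
    """Return True if remote_addr falls inside any blacklisted network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Assumption: an unparseable address is treated as suspicious.
        return True
    return any(addr in network for network in BLOCKED_NETWORKS)


if __name__ == "__main__":
    for ip in ("203.0.113.42", "198.51.100.7", "198.51.100.8", "192.0.2.1"):
        print(ip, is_blocked_ip(ip))
```

Storing networks rather than single addresses lets one entry cover a whole range, which partly offsets bots that rotate addresses within the same block.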
Some malicious apps try to click certain links on the user’s device in the background, without the user knowing about it. Such requests carry an X-Requested-With header that contains the app package name. The web application can be configured to block all requests whose X-Requested-With value matches a known fraudulent app.
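The header check could be sketched as follows, assuming the request headers are available as a simple name-to-value mapping. The package names in FRAUDULENT_PACKAGES are invented for the example.

```python
from typing import Mapping

# Hypothetical package names of apps known to generate fraudulent clicks.
FRAUDULENT_PACKAGES = frozenset({
    "com.example.clickfarm",
    "com.example.adfraud",
})


def is_fraudulent_app_request(headers: Mapping[str, str]) -> bool:
    """Return True if X-Requested-With names a blacklisted app package."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive, so compare in lower case.
        if name.lower() == "x-requested-with":
            return value.strip().lower() in FRAUDULENT_PACKAGES
    return False  # header absent: nothing to match against


if __name__ == "__main__":
    print(is_fraudulent_app_request({"X-Requested-With": "com.example.clickfarm"}))
    print(is_fraudulent_app_request({"x-requested-with": "XMLHttpRequest"}))
    print(is_fraudulent_app_request({"Accept": "text/html"}))
```

Matching against a list of known bad values, rather than rejecting every request that carries the header, matters because ordinary AJAX requests commonly send X-Requested-With: XMLHttpRequest.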
Many third-party service providers maintain lists of the User-Agents that bots use. The website can be configured to block all requests coming from these blacklisted user agents. As with the IP address, bot owners can change this field with each request, so it does not provide foolproof protection.
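A User-Agent blacklist check might be sketched like this. The tokens below are a tiny illustrative list, not one taken from any provider; in practice the list would be loaded from the third-party source and refreshed regularly. Treating a missing User-Agent as suspicious is an assumption of this sketch.

```python
# Hypothetical blacklist of substrings that identify automated clients.
BLOCKED_UA_TOKENS = ("python-requests", "scrapy", "examplebadbot")


def is_blocked_user_agent(user_agent: str) -> bool:
    """Return True if the User-Agent contains any blacklisted token."""
    ua = user_agent.strip().lower()
    if not ua:
        # Assumption: browsers always send a User-Agent, so none is suspicious.
        return True
    return any(token in ua for token in BLOCKED_UA_TOKENS)


if __name__ == "__main__":
    samples = [
        "python-requests/2.31.0",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/120.0",
        "",
    ]
    for ua in samples:
        print(repr(ua), is_blocked_user_agent(ua))
```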
Different defences suit different cases. Some advertising partners try to convert the same user again and again just to increase their share. For such cases, canvas fingerprinting is the most suitable solution: it identifies requests coming from the same device several times within a time interval, marks them as bot traffic and asks the user to validate before proceeding further.
The honey trap is an ideal defence when many advertising partners are working to bring in traffic and we need a report showing which partner delivers the most bot traffic. From this report the bot source can be identified, and later requests coming from that source can be blocked. The other defences work by blacklisting, e.g. IP address, User-Agent and X-Requested-With. These values arrive with the request and can easily be changed by a bot from time to time. When using these blacklisting defences, the application owner needs to make sure the lists of fraudulent agents are up to date. Given the pace at which new fraud appears, keeping these lists current will be a challenge.
Blacklisting can therefore be used as the first line of defence to filter out the most notorious bots, after which canvas fingerprinting filters out the more advanced ones.