I recently installed some log charting software to get a view NGINX logs for the instance that servers up this site. While looking at some of the output I found some strange results so went to have a closer look at the actual logs
There were a whole bunch of entries with with a similar pattern, something similar to the following:
18.104.22.168 - - [15/Jul/2020:11:43:39 +0100] "CONNECT rate-market.ru:443 HTTP/1.1" 400 173 "-" "-"
CONNECT is a HTTP verb that is used by HTTP Proxy clients to tell the proxy server to set up a tunnel to the requested machine. This is needed because is the client is trying to connect to a HTTP server that uses SSL/TLS then it needs to be able to do the SSL/TLS handshake directly with that server to ensure it has the correct certificates.
Since my instance of NGINX is not configured to be a HTTP Proxy server it is rightly rejecting these requests as can been seen by the
400 (Bad Request) after the request string. Even though the clients making these request are all getting the 400 error they are continuing to send the same request over and over again.
I wanted to filter these bogus request out for 2 reasons
- They clog the logs up, making it harder to see real problems
- They take up resources to process
The solution is to use a tool called fail2ban, which watches log files for signs of abuse and then sets up the correct firewall rules to prevent the remote clients from being able to connect.
I already run fail2ban monitoring the SSH logs for all my machines that face the internet as there are plenty of scripts out there running trying standard username/password combinations attempting to log into any machine listening on port 22.
Fail2ban comes with a bunch of log filters to work will a applications, but the ones bundled on raspbian (RaspberryOS) at this time didn’t include one that would match this particular pattern. Writing filters is not that hard but it does require writing regular expressions which is a bit of dark art. The trick is to get a regex that only matches what we want.
The one I came up with is this:
^<HOST> - - \[.*\] \"CONNECT .* HTTP/1\.[0-1]\" 400
Let’s break that down into it’s specific parts
^this matches the start of the log line
<HOST>fail2ban has some special tags that match an IP address or Host name
- -this matches the 2 dashes in the log
\[.*\]this matches the date/timestamp in the square brackets, I could have matched the date more precisely, but we don’t need to in this case. We have to escape the
[as it has meaning in regex as we will see later.
\"CONNECTthis matches the start of the request string and limits it to just CONNECT requests
.*this matches the host:port combination that is being requested
HTTP/1\.[0-1]\"here we match both HTTP/1.0 and HTTP/1.1. We need to escape the
.and the square brackets let us specify a range of values
400and finally the 400 error code
Now we have the regex we can test it with the
fail2ban-regex command to make sure it finds what we expect.
# fail2ban-regex /var/log/nginx/access.log '^<HOST> - - \[.*\] \"CONNECT .* HTTP/1\.[0-1]\" 400' Running tests ============= Use failregex line : ^<HOST> - - \[.*\] \"CONNECT .* HTTP/1\.[0-1]\" 400 Use log file : /var/log/nginx/access.log Use encoding : UTF-8 Results ======= Failregex: 12819 total |- #) [# of hits] regular expression | 1)  ^<HOST> - - \[.*\] \"CONNECT .* HTTP/1\.[0-1]\" 400 `- Ignoreregex: 0 total Date template hits: |- [# of hits] date format |  Day(?P<_sep>[-/])MON(?P=_sep)ExYear[ :]?24hour:Minute:Second(?:\.Microseconds)?(?: Zone offset)? `- Lines: 526165 lines, 0 ignored, 12819 matched, 513346 missed [processed in 97.73 sec] Missed line(s): too many to print. Use --print-all-missed to print all 513346 lines
We can see that the regex matched 12819 lines out of 526165 which is in the right ball park. Now we have the right regex we can build the fail2ban config files.
First up is the filter file that will live in
# Fail2Ban filter to match HTTP Proxy attempts # [INCLUDES] [Definition] failregex = ^<HOST> - - \[.*\] \"CONNECT .* HTTP/1\.[0-1]\" 400 ignoreregex =
next is the jail file in
/etc/fail2ban/jail.d/nginx-connect.conf, this tells fail2ban what to do if it finds log lines that match the filter and where to find the logs.
[nginx-connect] port = http,https enabled = true logpath = %(nginx_access_log)s action = %(action_mwl)s
port entry is which ports to block in the firewall,
logpath should be obvious (it’s using a pre-configured macro for the path). And finally the
action, there are a number of pre-configured actions that range from just logging the fact that there were hits in the log all the way through to this on
action_mwl, which adds a firewall rule to block the
<HOST>, then emails me that with information about the IP address and a selection of the matching log lines.
I’ll probably reduce the action to one that just adds the firewall rule after a day or two when I have a feel for how much of this sort of traffic I’m getting. But the info about each IP address normally include a lookup of the owner and that usually has an email address to report problems. It’s not often that reporting things actually stops it happening, but it’s worth giving it ago at the start to see what happens.
Over the course of the afternoon since it was deployed the filter has blocked 148 IP addresses.
My default rules are a 12 hour ban if the same IP address triggers more than 3 times in 10 mins with a follow up of getting a 2 week ban if you trip that more than 3 times in 3 days.
Next up is to create a filter to match the
GET https://www.instagram.com requests that are also polluting the logs.