
Squid HTTP_X_FORWARDED_FOR header not respected
Closed, Declined (Public)

Description

Author: desrod

Description:
I run Apache behind Squid accelerators, which set an HTTP_X_FORWARDED_FOR header
representing the remote client's IP address (instead of Apache's REMOTE_ADDR
variable). MediaWiki's test for this is incorrect (fix below). I tried setting
$wgUseSquid = true; in LocalSettings.php, which did not solve the problem for me.

In the file includes/ProxyTools.php there is a test for REMOTE_ADDR, which will
always be set, even when running behind Squid. In my case, the IP reported for
every client is 127.0.0.1, which is useless when trying to track down abuse of
the wiki by outsiders.

I updated includes/ProxyTools.php to test for HTTP_X_FORWARDED_FOR first, then
REMOTE_ADDR. If HTTP_X_FORWARDED_FOR is found, it is used. If it isn't found,
REMOTE_ADDR is tested and used instead. The current test never reaches the
second clause, and this fixes that condition.

Below is my fixed stanza, in full. I've tested this and it works perfectly as
expected:

/* collect the originating ips */
# Forwarded client address, set by Squid or another proxy in front of us
if ( isset( $_SERVER['HTTP_X_FORWARDED_FOR'] ) ) {
        $ipchain = array( $_SERVER['HTTP_X_FORWARDED_FOR'] );

# Client connecting directly to this webserver
} else if ( isset( $_SERVER['REMOTE_ADDR'] ) ) {
        $ipchain = array( $_SERVER['REMOTE_ADDR'] );

} else {
        # Running on CLI?
        $ipchain = array( '127.0.0.1' );
}
$ip = $ipchain[0];

Version: 1.5.x
Severity: major

Details

Reference
bz2856

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 8:42 PM
bzimport set Reference to bz2856.
bzimport added a subscriber: Unknown Object (MLST).

desrod wrote:

Squid HTTP_X_FORWARDED_FOR header patch

Attached:

Can you confirm that you've set up the proxy settings for MediaWiki correctly?

We run the entire Wikimedia site behind squid proxies, and have no problems with this.

desrod wrote:

LocalSettings.php has one line that seems to be related to a proxy, and that is
$wgProxyKey, which doesn't have any comments associated with it. A quick grep of
the sources doesn't provide much info on its purpose either.

What proxy settings would I need to check? The whole proxy configuration is 100%
transparent on Apache and Squid, except for the addition of this header. Squid
runs on port 80, Apache runs on port 80. There are no ProxyPass directives in
Apache to point to Squid and vice versa.

Also, if this required specific proxy handling, this would be the only
application out of the 200+ we run here that needs to know it's running behind a
proxy. None of our other applications cares, except where they need to know the
REMOTE_ADDR of the incoming user; in that case, they look for that information
in HTTP_X_FORWARDED_FOR, not REMOTE_ADDR.

The reason for filing this bug is obvious if you look at the logic flow. You're
always checking REMOTE_ADDR first, which will ALWAYS exist. It's just that when
running behind Squid, that value will contain 127.0.0.1 (or whatever non-public,
internal IP your Squid proxy uses). The proper checking order should look for
HTTP_X_FORWARDED_FOR first, then REMOTE_ADDR. If you're not running behind Squid
or another proxy, HTTP_X_FORWARDED_FOR will not exist and REMOTE_ADDR can be
trusted. If HTTP_X_FORWARDED_FOR does exist, its value should be used. Using
REMOTE_ADDR in every case is flat-out wrong.

Another possible solution is mod_rpaf -> http://stderr.net/apache/rpaf/

But I don't want to run more modules than I have to, and certainly not for one
single application that does not behave like the other 200 or so applications.

See DefaultSettings.php for all configurable settings.

LocalSettings.php contains only whatever overrides you've put into it.
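
For reference, a minimal sketch of the sort of LocalSettings.php overrides meant
here, assuming the 1.5-era $wgUseSquid and $wgSquidServers settings documented in
DefaultSettings.php behave as described there; the address is a placeholder for
your own proxy:

# Sketch of LocalSettings.php overrides (placeholder IPs, adjust to your setup)
$wgUseSquid     = true;                  # MediaWiki sits behind a Squid accelerator
$wgSquidServers = array( '127.0.0.1' );  # proxies whose X-Forwarded-For is believed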

alex wrote:

This bug also affects Cisco caches in use by at least one large ISP in
Australia. I found myself unable to edit for 24 hours due to a block being
placed on the transparent proxy server the ISP uses. Here is their explanation:

IP-Based Authentication

Customers will often request that we bypass a particular website based on
the fact that it uses IP-based authentication for hosts. If the website
code has not been built upon web standards, they may only be checking for
the address of the remote requestor, rather than that supplied by the
X-Forwarded-For address.

Details on how to retrieve the X-Forwarded-For address are included here:

PHP: http://www.php.net/getenv

ASP: Request.ServerVariables("HTTP_X-Forwarded-For")

Example: http://www.lagado.com/proxy-test

The Cisco cache engines used by Internode will pass an X-Forwarded-For
variable to the server containing the original requestor IP address.

When the cache engines intercept a request, they will replace the
Client-IP field with their own IP address and store and supply the
original requestor's IP address in the X-Forwarded-For field.

X-Forwarded-For is an unofficial standard implemented originally by
Squid (http://www.squid-cache.org/) and adopted by many other
caching technologies since that time, including Cisco, who are the
vendor of the Cache engines we use on the Internode network.

You should also know that because you are using IP-Based authentication
for your website, you will potentially encounter issues outside of our
network, where customer requests coming from other ISPs also using
cache engines may also be supplying the cache engine's address in
the Client-IP field.

We prefer not to implement bypasses where a problem can be worked
around by other means, as the more we add, the higher the long-term
load on the cache engines becomes. It is also notable that any cache
exemption we implement is not considered permanent, and may be removed
without notice at any time.

If at all possible, it would be better in the long-term to look at
implementing code to detect the IP address from the X-Forwarded-For
header in addition to the Client-IP address field. This should resolve
any future issues you encounter with transparently cached requests from
most other networks.

Examples of using the X-Forwarded-For header can be shown below:

PHP: http://www.php.net/getenv
ASP: Using "Request.ServerVariables("HTTP_X-Forwarded-For")"

A website giving a fair bit of information about which variables
are supplied on request is at http://www.lagado.com/proxy-test.
As you can see, this website is retrieving all of the appropriate
fields, showing that the code of any website should be able to
detect the IP address of the original requestor, whether or not it is
transparently cached.

At this stage, we have not implemented the bypass to give you a
chance to review this information. Please get back to us if there is a
particular reason why this option cannot be implemented, and we will
then review the request.

I hope this helps and shows that the problem does lie at Wikipedia's end.
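
For reference, a rough sketch of the retrieval method those PHP links describe
(not MediaWiki's own code); note the header may carry a comma-separated chain of
addresses, with the original client left-most:

# Sketch only: read X-Forwarded-For via getenv(), fall back to REMOTE_ADDR
$forwardedFor = getenv( 'HTTP_X_FORWARDED_FOR' );
if ( $forwardedFor !== false && $forwardedFor !== '' ) {
        $chain    = array_map( 'trim', explode( ',', $forwardedFor ) );
        $clientIp = $chain[0];
} else {
        $clientIp = isset( $_SERVER['REMOTE_ADDR'] ) ? $_SERVER['REMOTE_ADDR'] : '127.0.0.1';
}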

We do respect X-Forwarded-For on known trusted servers (in particular
our own). Note that as client-provided data it CANNOT BE TRUSTED IN
GENERAL. An attacker can trivially create false X-Forwarded-For headers
with any arbitrary contents, which if blindly accepted would obscure
their real address (making them impossible to block) or falsely blame
some other address, perhaps with the intention of getting administrators
to block some other user.
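
To illustrate, a hedged sketch of that kind of check: believe X-Forwarded-For
only when the connection actually comes from one of your own proxies. The
$trustedProxies list below is a made-up example, not an existing MediaWiki
setting:

# Trust X-Forwarded-For only when REMOTE_ADDR is a known, trusted proxy
$trustedProxies = array( '127.0.0.1', '10.0.0.5' );   # hypothetical list

$remoteAddr = isset( $_SERVER['REMOTE_ADDR'] ) ? $_SERVER['REMOTE_ADDR'] : '127.0.0.1';
$clientIp   = $remoteAddr;                            # default: whoever connected

if ( isset( $_SERVER['HTTP_X_FORWARDED_FOR'] )
        && in_array( $remoteAddr, $trustedProxies ) ) {
        # Left-most entry is the address the trusted proxy saw for the client
        $chain    = array_map( 'trim', explode( ',', $_SERVER['HTTP_X_FORWARDED_FOR'] ) );
        $clientIp = $chain[0];
}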

alex wrote:

Whilst this works for some, I don't believe this bug should be closed: there
are large numbers of users who sit behind transparent proxies at large ISPs. The
only solution for them, if they share the proxy with a WikiVandal, is to change
ISPs, which is unsatisfactory.

I agree with the point that the X-Forwarded-For header cannot be trusted and is
easily faked. A more sophisticated solution would be to record the X-Forwarded-For
header AS WELL AS the REMOTE_ADDR seen by the server. That way the unsophisticated
vandals can be dealt with while normal users behind the same proxy keep editing.
It would also make it easier to track down vandals with the co-operation of ISPs:
you can identify that the edit came from that ISP (via REMOTE_ADDR) and you have
the vandal's session IP (or that of an open proxy they might be using) via
HTTP_X-Forwarded-For; a minimal sketch of recording both values follows the list
below. This would require changes to the IP-blocking code to allow for:

  1. Blocking a REMOTE_ADDR (currently implemented).
  2. Blocking a HTTP_X-Forwarded-For address via a REMOTE_ADDR (the patch
implements this; not currently implemented; suits unsophisticated and probably
the most common vandals, and will not affect normal users behind the same proxy).
  3. Not trusting HTTP_X-Forwarded-For via a particular REMOTE_ADDR (not currently
implemented; requires an additional table of blocked proxies; for a rampant
vandal you would still record the HTTP_X-Forwarded-For header, which will be
useful for the ISP in identifying the vandal and terminating them under their
TOS, until which no users at that ISP can edit).
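
A minimal sketch of the recording idea above, assuming a hypothetical audit log
rather than any existing MediaWiki table or schema:

# Log both the directly connecting address and whatever it forwarded,
# so blocks and ISP follow-ups can use either value (log path is an example)
$remoteAddr   = isset( $_SERVER['REMOTE_ADDR'] ) ? $_SERVER['REMOTE_ADDR'] : '';
$forwardedFor = isset( $_SERVER['HTTP_X_FORWARDED_FOR'] ) ? $_SERVER['HTTP_X_FORWARDED_FOR'] : '';

$logLine = sprintf( "remote=%s forwarded=%s\n", $remoteAddr, $forwardedFor );
error_log( $logLine, 3, '/var/log/mediawiki-xff-audit.log' );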

I realise that this is now not a simple patch as it impacts other areas of code
and the operation of blocking functions if implemented this way. It would require
the creation of an additional table of blocked proxies and possibly trusted
proxies as well to cater for the case mentioned by Brion above. I do think though
that there are large numbers of editors who would benefit from a change such as
this.

At a minimum we really need some data as to the extent of use of transparent
proxies by editors, readers (potential editors) and vandals by collecting on one
or two servers the use of the HTTP_X-Forwarded-For header. There are also variants
on HTTP_X-Forwarded-For that would need to be catered for as well.

Can we collect some data and then make a decision?

Closing this bug as WORKSFORME, since the case of the original reporter is exactly
what we do, constantly, all day every day, and it works fine.

If you wish to request some different treatment of external proxies please open a new
enhancement bug.