A couple of years ago, Google decided that it was going to insert its own image proxy between your Gmail messages and any remote images referenced therein. Before this, if an email included an image that was not sent along with the email (an external or remote image), when you read the email, your web browser saw the image reference in the message and loaded the image directly from the source. Now, however, Google rewrites the email message to refer to the image via Gmail’s image proxy, which loads and caches the image from the original location, and when you read the email, your web browser loads the image from Gmail’s image proxy rather than the original source. There are a number of benefits to this approach, but there are also drawbacks—namely, new bugs.
Periodically, I receive an email to my Gmail account from Zacks, the large investment research firm, which contains a remote image that has spaces in its filename, e.g. “motm cash sidelines image_624.png” in Sunday’s email. Not only are there spaces in the image filename, but these spaces aren’t escaped or encoded when the image URL is inserted into the HTML content of the email:
<img style="width: 624px; height: 269px; float: left; margin: 0px 8px 0px 0px;" src="https://staticx-tuner.zacks.com/images/articles/thumbnail/motm cash sidelines image_624.png" alt=""/>1
It’s certainly not best practice to include spaces in any part of a URL, nor to include such a URL in HTML with the spaces unescaped/unencoded, but as we have long since left behind the world of strict and restrictive DOS and UNIX filename conventions, it is not unexpected these days to see humane filenames and file paths on the web (and, anyway, every reasonable web browser and email client knows how to handle such cases properly).
Both Mail and Safari know how to handle image URLs with (unescaped) spaces in them, and they do so correctly. Gmail, and its image proxy, fail.
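For reference, here is a minimal Python sketch of what properly percent-encoding such an image URL could look like before it goes into the HTML. The function name `encode_url_path` is my own, purely for illustration, and the sketch assumes the incoming URL has not already been encoded (otherwise the percent signs would be encoded a second time).

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url_path(url: str) -> str:
    """Percent-encode the path of a URL, leaving the other parts untouched.

    Assumption: the path is not already percent-encoded.
    """
    parts = urlsplit(url)
    # quote() keeps "/" by default and turns each space into "%20".
    encoded_path = quote(parts.path)
    return urlunsplit(
        (parts.scheme, parts.netloc, encoded_path, parts.query, parts.fragment)
    )


raw = (
    "https://staticx-tuner.zacks.com/images/articles/thumbnail/"
    "motm cash sidelines image_624.png"
)
print(encode_url_path(raw))
```

With the spaces encoded this way, the URL contains “motm%20cash%20sidelines%20image_624.png” and nothing is left for a downstream tool to guess at.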
The end result of the Gmail image proxy’s involvement in my email is that I have to extract the (rewritten) image URL from the message, strip off the part referring to the image proxy, and then fix the proxy’s broken encoding of the spaces in the original URL in order to have an image URL I can feed back to my browser to load the image (in another tab). Then I have the privilege of switching back and forth between the email text and the image the text is discussing. It’s a pain, but manageable, for one image. If there are several images in the email, repeating the process each time becomes quite annoying.
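That manual workaround can be roughly sketched in Python. This is only an illustration: the function name `original_image_url` is hypothetical, it assumes the original URL is whatever follows the first “#” in the proxy URL, and it assumes every plus sign in the path was originally a space (which would be wrong for a filename that really contains a plus sign).

```python
from urllib.parse import quote, urlsplit, urlunsplit


def original_image_url(proxy_url: str) -> str:
    """Recover a loadable image URL from a rewritten proxy URL."""
    # Assumption: the original URL follows the first "#".
    _, separator, original = proxy_url.partition("#")
    if not separator:
        raise ValueError("no '#' found in the proxy URL")

    parts = urlsplit(original)
    # Assumption: each "+" in the path stands for a space.
    path_with_spaces = parts.path.replace("+", " ")
    # Re-encode the spaces as "%20" so the browser gets a clean URL.
    fixed_path = quote(path_with_spaces)
    return urlunsplit(
        (parts.scheme, parts.netloc, fixed_path, parts.query, parts.fragment)
    )
```

Calling it on a rewritten image URL copied from the message returns the original image location with the spaces encoded as “%20”, ready to paste into another tab.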
Back to investigating the problem—this is the URL for the “missing” image in Gmail:
https://ci4.googleusercontent.com/proxy/k0FVMVoNhpmHkkXhL6u7S4wzeMzBpLic1ugVLVVM4u-oIK79_Yb7WdjqITdHi0swAcPIGtpPGAK3B_MzoSvG32IRc2E6my-AqwWfDUPCvKezzfDRKGY-Ki9R3JORGPAhydwzYdLH_uxX7lKB2VCT93w=s0-d-e1-ft#https://staticx-tuner.zacks.com/images/articles/thumbnail/motm+cash+sidelines+image_624.png
(Google’s systems, e.g., Blogger/Blogspot, tend to use the strange, non-standard practice of escaping/encoding spaces in URLs with the plus sign, even though they percent-encode everything else.) Recall that the filename of the image on the Zacks server is “motm cash sidelines image_624.png” but that the filename shown in the URL in Gmail is “motm+cash+sidelines+image_624.png”, and then evidence of the problem becomes apparent. If the Gmail image proxy tried to request “motm+cash+sidelines+image_624.png” from the Zacks web server, of course that image is not going to be found!
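The mismatch is easy to reproduce with Python’s standard library, which offers both styles of encoding. Plus-for-space is the convention for form-encoded query strings; in a URL path, a plus sign is generally taken literally. The sketch below only illustrates the two encodings and says nothing about how Gmail’s proxy is actually implemented.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

filename = "motm cash sidelines image_624.png"

# Percent-encoding, as used in URL paths.
percent_encoded = quote(filename)
# Plus-encoding, as used for form data in query strings.
plus_encoded = quote_plus(filename)

print(percent_encoded)  # motm%20cash%20sidelines%20image_624.png
print(plus_encoded)     # motm+cash+sidelines+image_624.png

# Path-style decoding recovers the filename from the percent-encoded form...
assert unquote(percent_encoded) == filename
# ...but leaves "+" alone, so the plus-encoded name no longer matches.
assert unquote(plus_encoded) == plus_encoded
assert unquote(plus_encoded) != filename
# Only form-style decoding turns the plus signs back into spaces.
assert unquote_plus(plus_encoded) == filename
```

A server that decodes the requested path the way `unquote` does would go looking for a file literally named “motm+cash+sidelines+image_624.png”.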
Without knowing more about how the entire process works (scanning an email for external image references, fetching and storing (caching) them via the image proxy, and then rewriting the original email’s image reference to point to the image proxy instead), it’s difficult to tell exactly where the problem lies. For instance, if the part of the image proxy that fetches the external image encodes the URL using Google’s “standard” encode-spaces-with-plus-signs method and tries to fetch that, it won’t find the image. If the fetching part properly percent-escapes/encodes the URL before fetching and stores the image on the proxy server either with its original filename or as the percent-encoded version (which would be “motm%20cash%20sidelines%20image_624.png” for those keeping track), but the rewriting part uses the plus-sign encoding when rewriting the reference in the mail message, things will be broken. (Though it seems ridiculous to have one subsystem do things one way and another subsystem do the same thing another way, it’s probably not uncommon in large, complex software. I’ve seen things like that before, and they may only fail in edge or corner cases that the developer or team might not consider.) Or, if for some reason the image proxy assembles the entire URL that is later found in the Gmail message and then encodes it (using Google’s “standard” encode-spaces-with-plus-signs method) when inserting it back into the email, and only tries to fetch the image once I, or someone else getting the same Zacks newsletter, ask Gmail to load images (or have automatic image loading turned on), it’s going to fail if it doesn’t first change the plus signs back to spaces (or percent-encoded spaces).
It’s hard to say exactly where the bug might be, but make no mistake, it’s a bug; it’s Google’s bug (and it is no doubt caused in part by Google’s use of a non-standard encoding mechanism—spaces escaped with plus signs—in their web software).
1 The content of the email itself is actually sent encoded as Quoted-Printable, to protect it from such gremlins as 7-bit mail servers, but that’s not relevant to this bug and so I have decoded the Quoted-Printable here to make the HTML snippet more readily understandable. ↩︎