10.31.16

Gmail Image Proxy and Spaces in Image URLs

Posted in Software, Your Daily QA at 2:24 am by

A couple of years ago, Google decided that it was going to insert its own image proxy between your Gmail messages and any remote images referenced therein. Before this, if an email included an image that was not sent along with the email (an external or remote image), when you read the email, your web browser saw the image reference in the message and loaded the image directly from the source. Now, however, Google rewrites the email message to refer to the image via Gmail’s image proxy, which loads and caches the image from the original location, and when you read the email, your web browser loads the image from Gmail’s image proxy rather than the original source. There are a number of benefits to this approach, but there are also drawbacks—namely, new bugs.

Periodically, I receive an email to my Gmail account from Zacks, the large investment research firm, which contains a remote image that has spaces in its filename, e.g. “motm cash sidelines image_624.png” in Sunday’s email. Not only are there spaces in the image filename, but these spaces aren’t escaped or encoded when the image URL is inserted into the HTML content of the email: <img style="width: 624px; height: 269px; float: left; margin: 0px 8px 0px 0px;" src="https://staticx-tuner.zacks.com/images/articles/thumbnail/motm cash sidelines image_624.png" alt=""/>1 It’s certainly not best-practices to include spaces in any part of a URL, nor to include such URL in HTML with the spaces unescaped/unencoded, but as we have long-since left behind the world of strict and restrictive DOS and UNIX filename conventions, it is not unexpected these days to see humane filenames and file paths on the web (and, anyway, every reasonable web browser and email client knows how to handle such cases properly).

This is what Gmail, using its image proxy to load the external image, displays:
gmail-image-with-spaces

This is what Mac OS X Mail displays when viewing the same email:
mail-image-with-spaces

Finally, this is what a web browser (Safari, in this case, though any modern browser should display similarly) displays when viewing the HTML content of the same email:
html-image-with-spaces

Both Mail and Safari know how to handle image URLs with (unescaped) spaces in them, and they do so correctly. Gmail, and its image proxy, fail.

The end result of the Gmail image proxy’s involvement in my email is that I have to extract the (rewritten) image URL from the message, strip off the part referring to the image proxy, and then fix the proxy’s broken encoding of the spaces in the original URL order to have an image URL I can then feed back to my browser and have it load the image (in another tab). Then I have the privilege of switching back and forth between the email text and the image the text is discussing. It’s a pain, but manageable, for one image. If there are several images in the email, repeating the process each time becomes quite annoying.

Back to investigating the problem—this is the URL for the “missing” image in Gmail: https://ci4.googleusercontent.com/proxy/k0FVMVoNhpmHkkXhL6u7S4wzeMzBpLic1ugVLVVM4u-oIK79_Yb7WdjqITdHi0swAcPIGtpPGAK3B_MzoSvG32IRc2E6my-AqwWfDUPCvKezzfDRKGY-Ki9R3JORGPAhydwzYdLH_uxX7lKB2VCT93w=s0-d-e1-ft#https://staticx-tuner.zacks.com/images/articles/thumbnail/motm+cash+sidelines+image_624.png (Google’s systems, e.g., Blogger/Blogspot, tend to use the strange, non-standard practice of escaping/encoding spaces in URLs with the plus sign, even though they percent-encode everything else.) Recall that the filename of the image on the Zacks server is “motm cash sidelines image_624.png” but that the filename shown in the URL in Gmail is “motm+cash+sidelines+image_624.png”, and then evidence of the problem becomes apparent. If the Gmail image proxy tried to request “motm+cash+sidelines+image_624.png” from the Zacks web server, of course that image is not going to be found!

Without knowing more details about how the entire process of scanning an email for external image references, fetching and storing (caching) them via the image proxy, and the rewriting the original email’s image reference to point to the image proxy instead, it’s difficult to tell exactly where the problem lies. For instance, if the part of the image proxy that does the fetching of the external image encodes the URL using Google’s “standard” encode-spaces-with-plus-signs method and tries to fetch that, it won’t find the image. If the fetching part properly percent-escapes/encodes the URL before fetching but stores the image on the proxy server either with its original filename or as the percent-encoded version (which would be “motm%20cash%20sidelines%20image_624.png” for those keeping track), but the rewriting part uses the plus sign-encoding when rewriting the reference in the mail message, things will be broken. (Though it seems ridiculous to have one subsystem do thing one way and another subsystem do the same thing another way, it’s probably not uncommon in large, complex software—I’ve seen things like that before—and may only fail in edge or corner cases that the developer or team might not consider.) Or, if for some reason the image proxy assembles the entire URL that is later found in the Gmail message and then encodes it (using Google’s “standard” encode-spaces-with-plus-signs method) when inserting it back into the email, and then only tries to fetch the image once I, or someone else getting the same Zacks newsletter asks Gmail to load images (or has automatic image loading turned on), it’s going to fail if it doesn’t first change the plus signs back to spaces (or percent-encoded spaces).

It’s hard to say exactly where the bug might be, but make no mistake, it’s a bug; it’s Google’s bug (and it is no doubt caused in part by Google’s use of a non-standard encoding mechanism—spaces escaped with plus signs—in their web software).

        

1 The content of the email itself is actually sent encoded as Quoted-Printable, to protect it from such gremlins as 7-bit mail servers, but that’s not relevant to this bug and so I have decoded the Quoted-Printable here to make the HTML snippet more readily understandable. ↩︎

07.04.16

Your Daily QA: VoteYourPark

Posted in Software, Your Daily QA at 1:05 am by

For some time, I’ve been meaning to start a series of posts that outline software/website issues I run across on a daily basis (because I *do* run across buggy or poorly designed software/websites on a daily basis), in part in an effort to blog more often, but also to shine the light on what we as technology-using people deal with on a daily basis. It’s taken me a while—blogging is hard :P , and so is documenting and writing a good bug report—but herewith the first entry in Your Daily QA.)
——
The National Geographic Society, in partnership with the National Trust for Historic Preservation and American Express, has been running a vote to divy up $2 million in funding to help preserve or restore various sites in National Parks: #VoteYourPark.
#VoteYourPark

It’s a lovely site and a nice idea, but there are a couple of wonky implementation details that routinely drive me crazy.
#VoteYourPark sign-in formWhen you try to sign in using your email address, the sign-in form doesn’t work like a typical form. In most forms, pressing return in a text field will submit the form (or trigger the default button), and if you’ve been using the web on the desktop for a long time, you have finely-trained muscle memory that expects this to be true. Not for this form, however. Even worse, the “button” (it’s really a link camouflaged by CSS to look like a button) isn’t in the tab chain (or at least isn’t in the tab chain in the right location), so you can’t even tab to it and trigger the form via the keyboard after entering your password. Instead, every single day you have to take your hand off of the keyboard and click the “button” to log in.

The fun doesn’t end with the sign-in form, though. After you’ve voted for your parks, you have the opportunity to enter a sweepstakes for a trip to Yellowstone National Park. To do so, you have to (re)provide your email address.
#VoteYourPark sweepstakes sign-in formIn this form, though, pressing return does do something, because instead of being offered the opportunity to enter the sweepstakes, pressing return shows this:
#VoteYourPark sweepstakes sign-in form post-returnIt appears that you have lost your opportunity to enter the sweepstakes as a result of pressing return to submit the form. (If you close the message and then look closely at the area of the page that originally contained the sign-in information, there is a link that will let you enter the sweepstakes after the voting procedure is complete, but it’s fairly well hidden; I didn’t find it until muscle memory in the form submission had caused me to miss several chances to enter by triggering the above screen.)

There are other annoyances (like the progress meter that appears when loading the page and again after logging in), but they’re minor compared to breaking the standard semantics of web forms—and breaking them in different ways in two different forms on the same page. By all means, make your form fit with your website’s needs, but please don’t break basic web form semantics (and years of users’ muscle memory) in the process.