Aintaerjection

Liberalitas, Crudelitas, Insanitas

Project: New York Times Paywall

On March 28th, the New York Times website put up a paywall system. Under it, you are allowed to view 20 articles every month; after that, a nag screen is shown that you cannot dismiss unless you buy a subscription to the New York Times. Justifications aside, the articles still load. Normal viewers simply cannot read them because the nag screen sits on top. To me, this is utterly silly, so I wrote up a Greasemonkey script to bypass the whole shebang.

Technically, there is nothing lacking on the page. There is only the addition of two elements that are overlaid on the page. This was probably the easiest option for NYTimes to implement. However, it is necessarily flawed because it relies on the client-side user agent to do the blocking for them. Let’s walk through the steps.

  1. The article is loaded with the normal query strings (the bit after the ? in the URI), that is, without the gwh query parameter (gateway header?). The page includes a JavaScript file called mtr.js.
  2. The body of mtr.js runs inside a closure and checks for the gwh query parameter. If it doesn’t exist, it injects another script into the page, meter.js, from meter-svc.nytimes.com. In the request for meter.js, it sends along the page referrer and a generated callback function for JSONP style injection. My guess is this is so that they can allow special promotional deals from partner sites for referral visits.
  3. The meter.js injection returns a JSON object with the attribute hitPaywall. If this attribute is true, mtr.js forwards the page to exactly the same page but with the addition of the gwh parameter. The gwh parameter is filled with a server-generated tracking value that is grabbed from the tracking cookie (you did know that NYTimes was using tracking cookies right?). This parameter changes with every request.
  4. The mtr.js script on the forwarded page then grabs the gwh parameter, compares it with the cookie value. If they are different, then the browser is sent back to the normal query string version of the page to begin the process all over again. If they are exactly the same, mtr.js registers a closure-internal loadGateway() function to be run on document load event.
  5. loadGateway() checks that the page is served from myaccount.nytimes.com. This is where content is served if you are a registered subscriber. If it is served from anywhere but there, it injects a third JavaScript file called gwy.js.
  6. gwy.js does three things. It inserts two overlay elements called overlay (imaginative!) and gatewayCreative. The overlay element contains a gradient that blacks out the page, and gatewayCreative shows the nag box. It then sets the style of the body element to overflow:hidden. It also registers a document.onkeypress handler to cancel the events for your left, right, down, up, page down, page up, home, and end buttons. That way you cannot scroll the page at all and are stuck looking at the nag screen laid over the tantalizingly present article. Oh yeah, it also registers a handler for document.body.ontouchmove so just in case you have a gesture-enabled device, you can’t get past that paywall either.
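
The decision flow in steps 2 through 5 can be sketched roughly as below. This is a reading of the behaviour described above, not the actual mtr.js source: the function names, the action strings, and the shape of the input object are all made up for illustration.

```javascript
// Hypothetical reconstruction of the decisions in steps 2-5.
// None of these names come from mtr.js itself.

// Read the gwh query parameter; returns null when it is absent.
function readGwh(search) {
  return new URLSearchParams(search).get("gwh");
}

// Decide what the page does next, given what the script can see.
function nextGatewayAction({ gwhParam, cookieValue, hitPaywall, host }) {
  if (gwhParam === null) {
    // Steps 2-3: no gwh yet, so the meter.js response decides.
    return hitPaywall ? "redirect-with-gwh" : "show-article";
  }
  if (gwhParam !== cookieValue) {
    // Step 4: gwh does not match the cookie, start over without it.
    return "redirect-without-gwh";
  }
  // Step 5: subscribers are served from myaccount.nytimes.com.
  return host === "myaccount.nytimes.com" ? "show-article" : "load-gwy.js";
}
```

Every branch here runs in the browser, which is the whole weakness: the server has already sent the article by the time any of these checks happen.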

Since the user perspective only cares about the end result (gwy.js throwing up the overlay), defeating the mechanism at any point will work for the purposes of viewing the article. But to be safe, I bypassed it in as many places as I could.

  1. First things first, since the meat of mtr.js is run inside a closure scope, I cannot muck around with function hijacking inside it, which means I must allow it to inject meter.js. The injection mechanism is well known: it uses a <script> element to get around the same-origin restrictions that XHR is subject to. No XHR hijacking there.
  2. In fact the first place where I can hijack the entire process is at the attachment of the loadGateway() listener. The Greasemonkey script takes over window.addEventListener (sorry DOM specs), and checks that the page is not trying to register a load or keydown event. All other events are sent to the real thing so the helpful scroll-based suggestion pop-up still works.
  3. However, Greasemonkey scripts are executed on the DOMContentLoaded event, which comes after mtr.js has run, since mtr.js is loaded in the document head element. Every other hack from here on down, except the last one, deals with racing against mtr.js to set up before gwy.js creates the overlay.
  4. Since a lot of the time we are slower than mtr.js, we can take advantage of things that mtr.js sets up. The generated callback function that meter.js hits must hit after another AJAX call, so in between when mtr.js sets up the callback function and when the AJAX response comes back, we can hijack it to… do nothing. There’s only one problem. The generated callback function is generated with a name starting with a random letter followed by the POSIX time stamp. Since our script executes at a different time, and because the first letter is random, we can’t exactly tell what the function name is. However! We can loop through every object attached to the window context to find one that starts with a letter followed by a time stamp that is not too different from when we’re running and hijack that!
  5. Another thing that is already set up by the time mtr.js runs is a global object called NYTD. In it resides the attribute Hosts.jsHost, which as far as I can tell, is used for only one thing, to generate the path to gwy.js in loadGateway(). Since loadGateway() is registered to the page load event, even if we missed stealing the listener registration, we have the time in between that and when the event actually fires to hijack Hosts.jsHost and point it to a non-existent location.
  6. Finally, if all the above fails to run in time and gwy.js is loaded, the Greasemonkey script has one last brute force trick: find the overlay elements, hide them. This it tries to do, but since the timing on the AJAX call is completely unreliable, we must poll to find out when gwy.js has actually generated those overlays.
  7. That solves the visual elements, but you still can’t scroll. So another CSS rule is injected into the page for body {overflow:scroll !important}. The !important marks the rule as something that should not be overridden, so even if another script were to set body.style.overflow to something else, the rule marked !important will take precedence.
  8. Lastly, the navigation keys don’t work, not to mention the gesture-based scrolling! This is much more of a hack than all the rest, since there’s no way to tell when gwy.js has overridden the onkeypress and ontouchmove listeners. (Well, there is, but it’s less work this way.) All that happens is for 3 seconds, the browser constantly sets onkeypress and ontouchmove to null.
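
Several of the steps above can be sketched as small functions. This is a minimal illustration under stated assumptions, not the published userscript: the helper names are invented, `win` and `doc` stand in for the page's window and document, the timestamp is assumed to be in seconds with a 60-second tolerance, the bogus host is arbitrary, and "overlay" and "gatewayCreative" are assumed to be element ids.

```javascript
// Sketch only: helper names are hypothetical, not from the real script.

// Step 2: wrap addEventListener so load/keydown registrations are dropped
// and every other event type still reaches the real implementation.
function filterListeners(win, blocked = ["load", "keydown"]) {
  const real = win.addEventListener;
  win.addEventListener = function (type, ...rest) {
    if (blocked.includes(type)) return; // swallow the paywall's hooks
    return real.call(this, type, ...rest);
  };
}

// Step 4: find global functions named <letter><timestamp> whose timestamp
// is close to "now" and replace each with a no-op.
// Assumes seconds; the tolerance is a guess.
function neutralizeCallbacks(win, nowSeconds, toleranceSeconds = 60) {
  const hijacked = [];
  for (const name of Object.keys(win)) {
    const match = /^[A-Za-z](\d+)$/.exec(name);
    if (!match || typeof win[name] !== "function") continue;
    if (Math.abs(Number(match[1]) - nowSeconds) <= toleranceSeconds) {
      win[name] = function () {};
      hijacked.push(name);
    }
  }
  return hijacked;
}

// Step 5: point NYTD.Hosts.jsHost somewhere that cannot serve gwy.js.
function poisonJsHost(win, bogus = "http://localhost.invalid") {
  if (win.NYTD && win.NYTD.Hosts) win.NYTD.Hosts.jsHost = bogus;
}

// Step 6: one polling attempt; returns true once both overlays were
// found and hidden, so the caller knows when to stop polling.
function hideOverlays(doc) {
  const found = ["overlay", "gatewayCreative"]
    .map((id) => doc.getElementById(id))
    .filter(Boolean);
  for (const el of found) el.style.display = "none";
  return found.length === 2;
}
```

In a userscript these would be called with the real window and document, with hideOverlays run on a timer until it returns true. Taking `win` and `doc` as parameters is only there so the logic can be tried outside a browser.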

That is the extent of the hacks around the NY Times paywall. I have no arguments against NY Times putting up a wall to capitalize on their content; however, doing so with the client’s resources (the entire page is loaded, including all ads) then forcing the client to do the server’s dirty work is disingenuous to both the audience and their ad sponsors.

Lastly, here’s a local mirror of the Greasemonkey script, in case Userscripts.org is down.


7 Responses to “Project: New York Times Paywall”

  1. Brent says:

    Nice work.

    I took a quick look at the JS being loaded after the paywall was launched and made an educated guess that mtr.js was pretty much the core of things. I simply added that file to my AdBlock blacklist and haven’t seen the paywall since.

    I still can’t believe the NY Times spent +$25 million on such an absurdly inefficient and transparent ‘solution’ to the problem of protecting their content. As you say, sending all the important bits to the client and having it do the work is just silly, and kind of rude.

  2. Andrew says:

    Works great for me! Thanks a lot, just what I’ve been needing!

  3. fontgoddess says:

    If this is how the NYTimes shows outward technical competency, I’m glad that I don’t have to work with their internal CMS or billing system.

    In the meantime, I’ll pay for a subscription when I’ve graduated and have a job. But maybe the best way businesses can get people to give them money is to make people *want* to give them money. Coercing people to do anything is a short-sighted plan, as they are likely to respect a business as much as they are respected by it. If there is an easier, cheaper, or just more pleasant option, customers will most definitely take their business elsewhere and are far more motivated to do so the more they are prodded or shaken down.

  4. Hello just wanted to give you a quick heads up. The text in your article seem to be running off the screen in Safari. I’m not sure if this is a format issue or something to do with web browser compatibility but I thought I’d post to let you know. The design and style look great though! Hope you get the problem solved soon. Cheers

  5. JB says:

    Or you can just browse with Chrome’s incognito mode and never worry about the pay wall.

  6. Aintaer says:

    You can do the same in Firefox with Ctrl-Shift-P. But at that point you might as well start up a new browser just to read NYT’s site. This is unintrusive to your regular browsing habits.

  7. YellowShark says:

    I went a different route – Linux + Squid (forward proxy software), wrote a simple rule that blocks mtr.js & gwy.js… then told Firefox to use my new proxy server.

    However, Chrome’s incognito mode works perfect. I just felt like geeking out with Squid, and being able to maintain my regular browsing habits. :)
