an inside look into the illicit ad industry

So, you want to work in ad tech, do you? Perhaps this will be a cautionary tale…

I have worked my entire life as a contractor. This has had advantages and disadvantages. For example, I am free to set my own schedule, and undertake engagements at my own leisure, but as a result my tax situation is more complicated. Another advantage is that sometimes, you get involved in an engagement that is truly fascinating. This is the story of such an engagement. Some details have been slightly changed, and specific names are elided.

It is common for contractors in the technology industry to band together to take on engagements that a single contractor cannot reasonably handle. Our story begins with such an engagement: a friend of mine ran a bespoke IT services company, which provided system administration, free software consulting and development. His company also handled the infrastructure deployment needs of customers who did not want to build their own infrastructure. I frequently worked with my friend on various consulting engagements over the years, including this one.

One day, I was chilling in IRC, when I got a PM from my friend: he had gotten an inquiry from a possible client that needed help reverse engineering a piece of obfuscated JavaScript. I said something like “sounds like fun, send it over, and I’ll see what I come up with.” The script in question was called popunder.js and did exactly what you think it does. The customer had started a popunder ad network and needed help adapting this obfuscated popunder script to work with his system, which he had built using Revive Adserver, a fork of the last GPL version of OpenX.

I rolled my eyes and reverse engineered the script for him, allowing him to adapt it for his ad network. The adaptation was a success, and he wired me triple my quoted hourly rate. This, admittedly, made me very curious about his business, as at the time, I was not used to making that kind of money. Actually, I’m still not.

A few weeks passed, and he approached me with a proposition: he needed somebody who could reverse engineer the JavaScript programs delivered by ad networks and figure out how the scripts worked. As he was paying considerably more than my advertised hourly rate, I agreed, and got to work reverse engineering the JavaScript programs he required. It was nearly a full time job, as these programs kept evolving.

In retrospect, he probably wasn’t doing anything with the reports I wrote on each piece of JavaScript I reverse engineered, as that wasn’t the actual point of the exercise: in reality, he wanted me to become familiar with the techniques ad networks used to detect fraud, so that we could develop countermeasures. In other words, the engagement evolved into a red-team type engagement, except that we weren’t testing the ad networks for their sake, but for ours.

so-called “domain masking”: an explanation

Years ago, you might have browsed websites like The Pirate Bay and seen advertising for a popular game, or some other advertisement that you wouldn’t have expected to see on The Pirate Bay. I assure you, brands were not knowingly targeting users on TPB: they were being duped via a category of techniques called domain masking.

This is a type of scam that black-hat ad networks use to launder illicit traffic into clean traffic: they set up fake websites and, through a shell company, apply for ad tags on those websites. This gives them a clean advertising feed to serve ads from. The next step is to launder the traffic by serving those tags on otherwise empty pages of the website, so that they can be loaded inside an <iframe> tag. After that, an ad server rotates the various <iframe> tags, and that’s it.

For a long time, this type of fraud went undetected, either because the industry was not even aware that it was a thing, or because it was aware but didn’t care, as the networks kept promising brands more and more traffic that they couldn’t otherwise deliver. Either way, the clean networks eventually started to talk about cracking down on domain masking, or, as the black-hat networks call it, arbitrage or ROI. The simple <iframe> attack outlined above was quickly shut down, and thus began a back-and-forth cold war between the black-hat networks and their shell companies on one side, and legitimate networks like Google on the other.

non-human traffic detection

At first, the heuristics deployed by the ad networks were quite simple: they just started to check document.location.host and send it along to the ad server. If an ad tag was placed on an unauthorized domain, the account the ad tag belonged to would be flagged for an audit. But to make sure the URL or domain name was properly escaped for inclusion in an HTTP GET request, the tags had to call window.encodeURIComponent(). This meant that the first countermeasure we developed was something similar to:

    (function () {
      let o = window.encodeURIComponent;
      window.encodeURIComponent = function (x) {
        if (x === "thepiratebay.org") return o("cleansite.com");
        return o(x);
      };
    })();
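Seeing the naive check from the network’s side helps show why that patch works. This is a minimal sketch, not any real network’s code: the beacon endpoint https://ads.example/impression, the AUTHORIZED_DOMAINS allow-list, and the names buildReportUrl and needsAudit are all hypothetical. In a browser the tag would pass document.location.host; here the host is a parameter so the sketch also runs outside a browser.

```javascript
// Hypothetical allow-list: the domains this publisher account was approved for.
const AUTHORIZED_DOMAINS = new Set(["cleansite.com", "www.cleansite.com"]);

// Client side (hypothetical endpoint). In a browser, `host` would be
// document.location.host. The bare name encodeURIComponent is resolved on
// the global object every time this runs, so whatever currently sits in
// that global decides what actually gets reported.
function buildReportUrl(host) {
  return "https://ads.example/impression?host=" + encodeURIComponent(host);
}

// Server side: decode the reported host and flag the account for an audit
// if the tag showed up on a domain it was not approved for.
function needsAudit(reportUrl) {
  const host = new URL(reportUrl).searchParams.get("host");
  return !AUTHORIZED_DOMAINS.has(host);
}

console.log(needsAudit(buildReportUrl("cleansite.com")));    // false
console.log(needsAudit(buildReportUrl("thepiratebay.org"))); // true
```

Because buildReportUrl looks up the global encodeURIComponent at call time rather than keeping its own copy, a script that replaced the global earlier on the page controls which host the server ends up auditing.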

The encodeURIComponent patch worked for a very long time; with some networks, it lasted several years. Google defeated it by simply writing their own implementation of encodeURIComponent and protecting it behind a closure. Other networks tried to do things like:

    var isFraud = false;

    delete window.encodeURIComponent.toString;
    if (window.encodeURIComponent.toString().indexOf("native code") < 0) {
      isFraud = true;
    }
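The delete on the first line of that check removes any toString property a patch may have attached directly to the replacement function, so the call falls through to the built-in Function.prototype.toString. A slightly sturdier variant of the same heuristic captures that built-in once and calls it explicitly. This is a minimal sketch, assuming it runs before any other script on the page; looksNative is a hypothetical name, and like the original check it is still a heuristic that a determined patcher can defeat.

```javascript
// Capture the built-in once. Assumption: this runs before any other script
// has had a chance to replace Function.prototype.toString itself.
const nativeToString = Function.prototype.toString;

// Heuristic: does `fn` stringify the way built-in functions do?
function looksNative(fn) {
  if (typeof fn !== "function") return false;
  // Calling the captured method directly ignores any own `toString`
  // property attached to `fn`, which is what the `delete` above works around.
  const source = nativeToString.call(fn);
  return /\{\s*\[native code\]\s*\}\s*$/.test(source);
}

const original = encodeURIComponent;

// A replacement that tries to pass the naive check by lying about its source.
const patched = function (x) { return original(x); };
patched.toString = () => "function encodeURIComponent() { [native code] }";

console.log(patched.toString().indexOf("native code") >= 0); // true: naive check fooled
console.log(looksNative(original)); // true
console.log(looksNative(patched));  // false
```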

Checks like these led to countermeasures such as patching XMLHttpRequest itself:

    (function () {
      let x = window.XMLHttpRequest.prototype.open;
      window.XMLHttpRequest.prototype.open = function (method, url, ...rest) {
        let newurl = url;
        // code which would parse and rewrite the URL or POST payload here
        return x.call(this, method, newurl, ...rest);
      };
    })();
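Google’s fix, as described above, amounted to not trusting the page’s global encodeURIComponent at all. Below is a minimal sketch of that idea, not Google’s actual code: the encoder is created inside a closure when the tag loads, so reassigning the global afterwards cannot reach it. The names tag, safeEncode and reportUrl and the endpoint are hypothetical, and the sketch assumes TextEncoder has not been tampered with before the tag loads. Unlike the built-in, it does not throw on unpaired surrogates.

```javascript
const tag = (function () {
  // Captured once, when the tag loads; later changes to globals do not reach it.
  const utf8 = new TextEncoder();
  // Characters the built-in encodeURIComponent leaves unescaped.
  const UNRESERVED = /^[A-Za-z0-9\-_.!~*'()]$/;

  // Percent-encode the UTF-8 bytes of `str`; matches encodeURIComponent
  // for well-formed strings.
  function safeEncode(str) {
    let out = "";
    for (const byte of utf8.encode(str)) {
      const ch = String.fromCharCode(byte);
      out += UNRESERVED.test(ch)
        ? ch
        : "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
    return out;
  }

  // Only reportUrl is exposed; safeEncode stays private to the closure.
  return {
    reportUrl(host) {
      return "https://ads.example/impression?host=" + safeEncode(host);
    },
  };
})();

// Simulate a hostile patch (globalThis is window in a browser), then undo it.
const saved = globalThis.encodeURIComponent;
globalThis.encodeURIComponent = () => "cleansite.com";
console.log(tag.reportUrl("thepiratebay.org"));
// https://ads.example/impression?host=thepiratebay.org
globalThis.encodeURIComponent = saved;
```

A closure only protects the encoding step: the finished URL still passes through whatever API sends the request, which is the layer the XMLHttpRequest patch targets.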

The cycle of patching on both sides is ongoing to this day. A friend of mine on Twitter referred to this tug-of-war as “core war,” which is an apt description: every party involved is trying to patch the others out of being able to commit, or detect, subterfuge, and your browser gets slower and slower as more mitigations and countermeasures are layered on. If you’re not using an ad blocker yet, stop reading this and install one: your browser will suddenly be a lot faster.

enter thistle: a proxy between php-fpm and nginx

When it came to evading the automated non-human traffic detection deployed by ad networks, our game was impeccable: with each round of mitigation and countermeasure, we would only lose a few ad tags, if that. However, we kept getting caught by human review, because reviewers would look at the referrer header and see something like http://cleansite.com/adserver/720x.php, which, I mean, is totally suspect, right?

So, we decided what we needed to do was interdict the traffic, and respond with something else if appropriate, namely bare ad tags if a requesting client was already known to us. To do this, we wrote a proxy server which I named thistle, since the thistle flower is known to be the favorite of the most mischievous of faeries, and we were certainly up to mischief! The way it worked is that an <iframe> would enter at a specified URL, which would then tumble through several more URLs (blog articles), in order to ensure the referrer header always matched a real article on the fake website.

This was highly successful: we never got caught again by a manual audit, at least not for that reason.

the cycling of domains

Most advertising traffic is bought and sold using a protocol called OpenRTB, which allows so-called trading desks to buy and sell ad spots in real time, based on historical performance data. This meant that, to keep CPM rates up, we had to keep cycling domains in and out, using ones the trading bots hadn’t seen in a while, or ever.
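For readers who have not seen OpenRTB: each ad slot up for sale is described to buyers in a JSON bid request, and what the buyer is told about where the ad will run comes from fields such as site.domain and site.page, which gives a trading desk a domain to attach its historical performance data to. Below is a minimal sketch of such a request. The field names follow the OpenRTB 2.5 object model; the helper makeBidRequest and every value (IDs, domain, sizes, timeout) are hypothetical.

```javascript
// Hypothetical helper; field names follow the OpenRTB 2.5 object model.
function makeBidRequest({ requestId, domain, page, width, height }) {
  return {
    id: requestId,                   // unique ID for this auction
    imp: [{
      id: "1",                       // one impression (ad slot) on offer
      banner: { w: width, h: height },
    }],
    site: { domain, page },          // where the buyer is told the ad will appear
    at: 2,                           // auction type: 2 = second price plus
    tmax: 120,                       // buyers must answer within 120 ms
    cur: ["USD"],
  };
}

const req = makeBidRequest({
  requestId: "auction-0001",
  domain: "example.com",
  page: "https://example.com/blog/some-article",
  width: 728,
  height: 90,
});

// On the wire this is plain JSON, typically POSTed to each buyer's bid endpoint.
console.log(JSON.stringify(req));
```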

Cycling domains is where the operation started to break down: the fellow I was writing all this code for had the people who built his websites apply for ad tags without using an anonymizing VPN. At some point, an auditor noticed that all of these different sites had been made by people with the same IP address, even though they belonged to different shell companies, and shut the whole thing down by sharing that intelligence with the other ad networks. It was fun while it lasted, though.
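The auditor’s check can be pictured as a simple grouping problem: collect the source IP of every ad tag application and look for addresses shared by supposedly unrelated companies. This is a minimal sketch, not how any particular network does it; the record fields, the sample data and findSharedIps are hypothetical.

```javascript
// Group ad tag applications by source IP and report every address that
// was used by more than one company.
function findSharedIps(applications) {
  const companiesByIp = new Map();
  for (const { company, ip } of applications) {
    if (!companiesByIp.has(ip)) companiesByIp.set(ip, new Set());
    companiesByIp.get(ip).add(company);
  }
  const flagged = [];
  for (const [ip, companies] of companiesByIp) {
    if (companies.size > 1) flagged.push({ ip, companies: [...companies].sort() });
  }
  return flagged;
}

// Hypothetical records (documentation IP ranges, made-up company names).
const applications = [
  { company: "Acme Media LLC", site: "gardening-tips.example", ip: "203.0.113.7" },
  { company: "Blue Fern Ltd", site: "travel-notes.example", ip: "203.0.113.7" },
  { company: "Cobalt Web Inc", site: "recipes.example", ip: "198.51.100.23" },
];

console.log(JSON.stringify(findSharedIps(applications)));
// [{"ip":"203.0.113.7","companies":["Acme Media LLC","Blue Fern Ltd"]}]
```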

By the time this operation started going off the rails, the overwhelming majority of our traffic was coming from a popular torrent website. When the feds eventually shut that site down, it was basically the end of the operation, as we lost almost all of our traffic.

I figured that was coming when the FBI contacted the person I was working for, asking if we knew anything about our advertising being featured on said torrent website. What ultimately led to the website’s shutdown, however, was quite funny: its owner was greedy, so the FBI offered to buy ads from him directly. He set up a new ad spot on the torrent website and then sent bank wire instructions to the FBI agent investigating him, at which point they seized the website and shut it down.

Shortly afterward, the company went out of business, as there wasn’t enough traffic to keep the operation running anymore.

ad tech: capitalism refined to its purest?

As my friend Maia put it, “ad tech is about trying to scam the rest of ad tech as hard as possible, while trying to not get scammed too hard yourself.”

One can therefore argue that ad tech, like the crypto craze, is just capitalism refined to its purest: things of little actual value are being bought and sold at prices far in excess of what little value actually exists. And, like crypto, ad tech is responsible for substantial carbon dioxide emissions.

In short, do everyone a favor, and use a damn ad blocker. Oh, and don’t work in ad tech. I have wilder stories to tell about that engagement, but sometimes things are better left unsaid.