The New Econometrics?

First, let me get this out of the way: this is going to be an emotional post. Knowing myself, it will end up reading like a rant riddled with spelling errors (so not much of a difference there). Why, you ask? Because I care about Economics and I’m mad we were robbed of a good topic that any Econometrics student should be offered. We are still being robbed. So read this like a letter to both students and professors.

Econometrics is about data. Econometrics is about analysis: distilling information to obtain the best picture from the data of the population at large. This isn’t statistics used to unearth correlations but, like any self-respecting economist, statistics used to unearth causations; always trying to answer the why within a phenomenon. So, on one hand, I can forgive economists shying away from heavy data crunching. On the other, it is an unforgivable sin not to mention, at least in passing, the wealth of options available to students.

So what is this thing I keep raving about? Well, it is known as data mining and/or machine learning (ML). I will avoid explaining the differences between the two (mostly because the answer is a bit vague, especially for the scope of this article) 1. The field itself uses algorithms to sift through data and uncover meaningful relationships. Alright, that sounds kind of like Econometrics. And that is exactly my point. Having knowledge of the field makes you an even more complete econometrician. Remember all the linear regressions you ran in Econometrics? Well, that is the first algorithm found in an intro to ML. Basically, the entire semester I spent learning Mathematical Economics (which is like advanced Econometrics) was covered in a week. Then came logistic regression (an even more useful algorithm). Then came neural networks. Then came feed-forward neural networks and (wait for it!) backpropagating networks. OK, I will stop with the forced revision. My point still stands: there are exciting and useful algorithms that can be used to detect relationships and avoid errors.

Ignoring the hype around big data 2, think of how much data is generated every single second. Think of events that were once hard to measure or track: mobile phones, geo-location, PaaS, SaaS and the multiple ways fixed costs have become variable costs. Hal Varian puts it best,

“There is now a computer in the middle of most economic transactions. These computer-mediated transactions enable data collection and analysis, personalization and customization, continuous experimentation, and contractual innovation. Taking full advantage of the potential of these new capabilities will require increasing sophistication in knowing what to do with the data that are now available” 3

I should note that Hal is the main reason I am writing this post. He works as the chief economist at Google. He has also written one of the most intriguing papers (Big Data: New Tricks for Econometrics) 4 concerning the future of Econometrics – a must-read if you have read this far.

I pointed out exciting algorithms that might change the way we approach analysis. Some algorithms correct and anticipate their own errors! Not even joking! Remember when we had to account for bias in sampling? Well, ML has a better solution that corrects for this automatically 5. Allow me to quote Varian again –

“Our goal with prediction is typically to get good out-of-sample predictions. Most of us know from experience that it is all too easy to construct a predictor that works well in-sample, but fails miserably out-of-sample. To take a trivial example, ‘n’ linearly independent regressors will fit ‘n’ observations perfectly but will usually have poor out-of-sample performance. Machine learning specialists refer to this phenomenon as the ‘overfitting problem.’ ”

To be on point, you end up having algorithms that penalize themselves 6.
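For the curious, the “penalty” trick has a compact form. Ridge regression, one of the simplest examples, adds a term to ordinary least squares that punishes large coefficients (here λ controls how harsh the punishment is):

\min_{\beta}\; \sum_{i=1}^{n}\left(y_i - x_i^{\top}\beta\right)^2 \;+\; \lambda \sum_{j=1}^{p}\beta_j^2

Tuning λ (usually by cross-validation) is what keeps the model honest out-of-sample instead of memorizing the training data.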

It would be unfair to blame the undergrad Economics syllabus for not including ML concepts. I should note that most of these concepts are relatively new. Case in point: Varian’s paper is still a working paper (last revised a week ago as of publishing this post). ML is also mostly computer-science driven; the algorithms are not written with economic theories in mind. This should not be an excuse, however, because inter-disciplinary studies are not uncommon. There is also the lack of basic coding knowledge among most economics students. I personally believe any student taking econometrics who wants to go into the field should, at least, have basic coding skills, but that is an argument for another day.

In hindsight, this stopped being angsty rather quickly. However, I am still disappointed I missed out on exciting new topics during my earlier economic analysis lessons. Let this be a lesson to any econometrics student. There are mind-blowing projects and ventures popping up. You should not, however, think you will stop predicting wages versus education and age. That thing haunts you everywhere. Seriously, it’s everywhere!

1. [Stack Exchange has a good discussion on the differences.]

2. [I don’t think it’s even hype anymore. You know it’s mainstream when government scandals are invited to the party!]

3. [Varian, Hal. 2014. Beyond Big Data.]

4. [Varian, Hal. 2013. Big Data: New Tricks for Econometrics.]

5. [I understand that some of these methods are already applied in certain Econometrics works. Feel free to point out other interesting projects using these methods.]

6. [One of the funniest tweets from ML Hipster. You should follow him.]

Alternative Cross-Origin Resource Sharing (CORS) Techniques

My first encounter with cross-domain requests was when I was creating the UK football league stats table. My first approach was to download the data as a CSV file and serve it from my own server. However, I discovered that it would be inefficient to keep replacing the file with the most recent data after each update (matches are played at least once a week). So I looked into ways I could automatically query the data from the source itself. I tried using AJAX but ran into CORS limitations; in short, the CSV file’s source domain did not allow anyone (specifically any website) to query their data. In the end, I settled on a server-side solution (which is probably not legal). That inspired me to research1 several ways one can implement cross-domain communication in different scenarios. Below is a sample of those techniques.

Image Ping:

This works for GET requests. You won’t be able to read the response text. However, it’s a good way to make sure the target server receives a notification from the origin page. From what I’ve learnt, it is one of the ways ads can track their views. You can also track user-clicks (or general interaction) without unnecessary interruption.

var imgCORS = new Image();

//assign the same handler to both onload and onerror so we capture
//either outcome; the image ping can't read the response, only
//confirm that the server replied
imgCORS.onload = imgCORS.onerror = function(){
  console.log("Done & Dusted");

//setting src fires the GET request; the URL was left blank here,
//so point it at the endpoint you want to ping
imgCORS.src = "";

Script tags with JSON Padding (JSONP):

JSONP uses script tags to communicate with other domains. It has a leg up on the image ping due to its ability to read the response. To summarize JSONP: you create a script tag, you assign its source in a specially formatted way, and finally you use a callback to read the response. Things to keep in mind:

  • The target source has to be ready to process your request
  • The requester is at the mercy of the target source because the callback will execute whatever the target source sends back.
  • There is no well-defined way to handle errors due to the lack of script-error handling in browsers (a setTimeout can be used, but that assumes the same connection speed for every user)
function handleResponse(response){
  console.log("Your name is " + + ", and you're " + response.age + " years old.");

var script = document.createElement("script");
//the URL was left blank here; a JSONP endpoint typically takes the
//callback name as a query parameter, e.g. ?callback=handleResponse
script.src = "";
document.body.insertBefore(script, document.body.firstChild);

Comet/Long Polling:

I have to admit that I’ve never used comet, long-polling, or any other form of server push. My understanding is purely theoretical, so feel free to correct me if need be. In comet/long-polling, the server pushes data instead of AJAX repeatedly requesting it. This way you get a real-time response from the server (think sports scores or Twitter updates). With short polling, the browser hits the server at regular intervals; long-polling reverses the process, and the server holds the gates open (so to speak) until it has something to send. To summarize:

  1. Browser opens up a request
  2. Server holds the gates open until it has something to send back
  3. Browser receives the response from server and closes the request
  4. Browser immediately goes back to #1
function createStream(url, progress, finished){

  var xhr = new XMLHttpRequest(),
      received = 0;"get", url, true);
  xhr.onreadystatechange = function(){
    var result;

    if (xhr.readyState == 3){

      //get only the new data and adjust the counter
      result = xhr.responseText.substring(received); //read from the last end-point
      received += result.length;

      //call the progress callback

    } else if (xhr.readyState == 4){

      //the connection has closed; hand the full response to the finished callback


  return xhr;

var client = createStream("/streaming", function(data){
  console.log("Received: " + data);
}, function(data){

Server-Sent Events (SSE):

SSE is an API for read-only Comet requests. It supports short-polling, long-polling & HTTP streaming. That’s as far as my knowledge of it goes. Read up more about the API here.
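To give a flavor of the API, here is a minimal sketch of an SSE subscription. The "/events" endpoint is hypothetical, and everything is wrapped in a function so nothing runs until you call it in a browser:

function subscribeToEvents(url){
  var source = new EventSource(url); //opens a persistent HTTP connection

  //fired each time the server sends a "data:" payload
  source.onmessage = function(event){
    console.log("Received: " +;

  //the browser reconnects automatically after most errors
  source.onerror = function(){
    console.log("Connection dropped; the browser will retry");

  return source;

//usage (browser only): var events = subscribeToEvents("/events");

The nice part compared to hand-rolled long-polling is that reconnection and message parsing come for free.
if (typeof subscribeToEvents !== "function") { throw new Error("subscribeToEvents was not defined"); }
if (subscribeToEvents.length !== 1) { throw new Error("subscribeToEvents should take a single url argument"); }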

Web Sockets:

There’s far too much that has been written about Web Sockets already. In short, Web Sockets are a better version of comet/long-polling. You should keep in mind that Web Sockets don’t operate over standard HTTP (hence the ws:// scheme). So, how does it work exactly? I can verify that it is all magic!
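Magic aside, the browser API itself is tiny. A minimal sketch (the ws:// URL is a hypothetical endpoint, and the code is wrapped in a function so it only runs when called in a browser):

function openSocket(url){
  var socket = new WebSocket(url); //e.g. a hypothetical "ws://"

  socket.onopen = function(){
    socket.send("hello"); //data can flow both ways once the socket is open

  socket.onmessage = function(event){
    console.log("Received: " +;

  socket.onclose = function(event){
    console.log("Closed, clean: " + event.wasClean);

  return socket;

Unlike the techniques above, this gives you a full-duplex channel: the server can push and the client can send at any time over the same connection.
if (typeof openSocket !== "function") { throw new Error("openSocket was not defined"); }
if (openSocket.length !== 1) { throw new Error("openSocket should take a single url argument"); }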

1. [The majority of this information (and code) was obtained from Nicholas Zakas’ Professional JavaScript for Web Developers. The rest I tried to link to the original sources. Let me know if I missed anything!]