The New Econometrics?

First, let me get this out of the way: this is going to be an emotional post. Knowing myself, it will end up reading like a rant riddled with spelling errors (so not much of a difference there). Why, you ask? Because I care about Economics, and I am mad we were robbed of a good topic that any Econometrics student should be offered. We are still being robbed. So read this like a letter to both students and professors.

Econometrics is about data. Econometrics is about analysis: distilling information to get the best possible picture of the population at large from the data in hand. This isn’t statistics used merely to unearth correlations but, like any self-respecting economist, to unearth causation; always trying to answer the why within a phenomenon. So, on one hand, I can forgive economists’ unfortunate habit of shying away from heavy data crunching. However, it is an unforgivable sin not to mention, at least in passing, the wealth of options surrounding students.

So what is this thing I keep raving about? Well, it is known as data mining and/or machine learning (ML). I will avoid explaining the differences between the two (mostly because the answer is a bit vague, especially for the scope of this article) 1. As for the field itself, it uses algorithms to sift through data and extract meaningful relationships. Alright, that sounds kind of like Econometrics. And that is exactly my point. Having knowledge of the field makes you an even more complete econometrician. Remember all the linear regressions you ran in Econometrics? Well, that is the first algorithm you meet in an intro to ML. Basically, the entire semester I spent learning Mathematical Economics (which is like advanced Econometrics) was covered in a week. Then came logistic regression (an even more useful algorithm). Then came neural networks. Then came feed-forward neural networks and (wait for it!) backpropagation. OK, I will stop with the forced revision. My point still stands: these are exciting and useful algorithms that can be used to detect relationships and avoid errors.

Ignoring the hype around big data 2, think of how much data is generated every single second. Think of events that were once hard to measure or track: mobile phones, geo-location, PaaS, SaaS and the many ways fixed costs have become variable costs. Hal Varian puts it best,

“There is now a computer in the middle of most economic transactions. These computer-mediated transactions enable data collection and analysis, personalization and customization, continuous experimentation, and contractual innovation. Taking full advantage of the potential of these new capabilities will require increasing sophistication in knowing what to do with the data that are now available” 3

I should note that Hal is the main reason I am writing this post. He works as the chief economist at Google. He has also written one of the most intriguing papers (Big Data: New Tricks for Econometrics) 4 concerning the future of Econometrics – a must-read if you have read this far.

I pointed out exciting algorithms that might change the way we approach analysis. Some algorithms correct and anticipate their own errors! Not even joking! Remember when we had to account for bias in sampling? Well, ML has a better solution that corrects for this automatically 5. Allow me to quote Varian again –

“Our goal with prediction is typically to get good out-of-sample predictions. Most of us know from experience that it is all too easy to construct a predictor that works well in-sample, but fails miserably out-of-sample. To take a trivial example, ‘n’ linearly independent regressors will fit ‘n’ observations perfectly but will usually have poor out-of-sample performance. Machine learning specialists refer to this phenomenon as the ‘overfitting problem.’ ”

To be on point: you end up with algorithms that penalize themselves 6.
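To make that concrete, here is a minimal sketch, in JavaScript (the language used later in this post), of what “penalizing yourself” looks like: a ridge-style penalty added to an ordinary least-squares loss, so that the large coefficients that let a model fit its sample perfectly become expensive. The data and the lambda value are made up purely for illustration.

// Ordinary least-squares loss for a one-variable linear model y = b0 + b1*x,
// plus a ridge-style penalized version of the same loss.
function squaredErrorLoss(b0, b1, xs, ys) {
  var sum = 0;
  for (var i = 0; i < xs.length; i++) {
    var err = ys[i] - (b0 + b1 * xs[i]);
    sum += err * err;
  }
  return sum;
}

// The "self-penalty": lambda times the squared slope is added to the loss,
// so a fitting procedure that minimizes ridgeLoss pays a price for large,
// overfit-prone coefficients.
function ridgeLoss(b0, b1, xs, ys, lambda) {
  return squaredErrorLoss(b0, b1, xs, ys) + lambda * b1 * b1;
}

var xs = [1, 2, 3, 4, 5];            // e.g. years of education (made up)
var ys = [1.1, 1.9, 3.2, 3.9, 5.1];  // e.g. some outcome to predict (made up)

console.log(squaredErrorLoss(0, 1, xs, ys)); // plain in-sample fit
console.log(ridgeLoss(0, 1, xs, ys, 0.5));   // same fit plus the penalty on the slope

Minimizing the penalized loss rather than the plain one is what keeps a model from chasing every wiggle in the sample, which is exactly the out-of-sample worry Varian describes above.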

It would be unfair to blame the undergraduate Economics syllabus for not including ML concepts. I should note that most of these concepts are relatively new. Case in point, Varian’s paper is still a working paper (last revised a week ago as of publishing this post). ML is also mostly computer-science driven: the algorithms are not written with economic theories in mind. This should not be an excuse, however, because inter-disciplinary studies are not uncommon. There is also the lack of basic coding knowledge among most economics students. I personally believe any student taking econometrics who wants to go into the field should have at least basic coding skills, but that is an argument for another day.

In hindsight, this stopped being angsty rather quickly. However, I am still disappointed I missed out on exciting new topics during my earlier economic analysis lessons. Let this be a lesson to any econometrics student. There are mind-blowing projects and ventures popping up. You should not, however, think you will stop predicting wages versus education and age. That thing haunts you everywhere. Seriously, it’s everywhere!

1. [Stack Exchange has a good discussion on the differences.]

2. [I don’t think it’s even hype anymore. You know it’s mainstream when government scandals are invited to the party!]

3. [Varian, Hal. 2014. Beyond Big Data.]

4. [Varian, Hal. 2013. Big Data: New Tricks for Econometrics.]

5. [I understand that some of these methods are already applied in certain Econometrics works. Feel free to point out other interesting projects using these methods.]

6. [One of the funniest tweets from ML Hipster. You should follow him.]


JavaScript and Floating-Point Arithmetic

We (i.e. anyone who has played with JavaScript) have all heard something about how floating-point numbers are tricky and borderline impossible to deal with. While this is not exclusive to JS, it is worth knowing a thing or two about the limitations of working with floating-point numbers.

Let’s start with a well-known example:

var a = 0.1 + 0.2;
a === 0.3;   // false
console.log(a);   //0.30000000000000004

The usual way to deal with this is to use the toFixed() method on Number, or to convert everything into integers, perform the calculations, then convert everything back into decimals. Neither method is guaranteed to produce the correct result, especially when dealing with complex calculations involving several floating-point variables.
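A rough sketch of both workarounds, with the number of decimal places chosen arbitrarily for the example:

var a = 0.1 + 0.2;   // 0.30000000000000004

// Workaround 1: round to a fixed number of decimal places.
// toFixed() returns a string, so convert back with Number() if needed.
var rounded = Number(a.toFixed(2));
console.log(rounded);          // 0.3
console.log(rounded === 0.3);  // true

// Workaround 2: scale up to integers, do the arithmetic, scale back down.
var b = (0.1 * 10 + 0.2 * 10) / 10;
console.log(b);                // 0.3
console.log(b === 0.3);        // true

Note that 0.1 * 10 landing exactly on 1 is itself a bit of rounding luck; as said above, neither trick is guaranteed once the calculations get more involved.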

I have found the best way to understand floating-point problems is to use the decimal system most humans are so used to. Try expressing 1/3 in decimal as precisely as you can. There is literally no way to express it exactly. There are workarounds, like writing 0.333... repeating, but these only confirm our inability to express 1/3 in decimal. Something similar is happening with JavaScript and floating-point numbers.

Anyone who has taken an intro class in Calculus will be familiar with Zeno’s paradox. To summarize it, the partial sums of 1 + 1/2 + 1/4 + 1/8 + ... always approach 2 but never equal 2, because each term only halves the remaining distance to 2. That is exactly what is going on when JavaScript tries to express some floating-point numbers.

Consider these binary numbers and their decimal values:

Integers:
Binary: 1 => Decimal: 1
Binary: 10 => Decimal: 2
Binary: 1101 => Decimal: 13

Floating points:
Binary: 0.1 => Decimal: 0.5
Binary: 0.0101 => Decimal: 0.3125
Binary: 0.00011001 => Decimal: 0.09765625
Binary: 0.00011001100110011 => Decimal: 0.09999847412109375

As you can see from above, the binary value gets closer and closer to 0.1 (in decimal) but never actually equals it. It is a shortcoming of expressing certain fractions in binary, in the same way we can never fully express certain fractions (e.g. 1/3) in decimal. You can try this with pretty much any base system (try expressing 0.1 (decimal) in base 3).
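You can watch this happen from inside JavaScript itself: Number’s toString() accepts a radix, so asking for base 2 shows the repeating pattern being cut off where the available bits run out.

// The binary expansion JavaScript actually stores for decimal 0.1:
console.log((0.1).toString(2));
// prints something like 0.000110011001100110011...1101: the 0011 pattern
// repeats until the available significand bits are used up, then the value is rounded.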

To come back to our original issue (i.e. 0.1 + 0.2), calculations are first converted into binary, evaluated, then converted back into decimal. JavaScript numbers are 64-bit IEEE 754 doubles, which leaves only 53 bits of significand precision, so both 0.1 and 0.2 are already rounded approximations before the addition even happens. Truncating the expansions (to 32 bits here, purely for readability), the sum looks like:

0.00011001100110011001100110011001 //approx. of 0.1
+ 
0.00110011001100110011001100110011 //approx. of 0.2
__________________________________

0.01001100110011001100110011001100 //the truncated result in binary, to be rounded and converted back into decimal
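You can inspect the rounded result directly too; the stored sum starts with the same 0.0100110011... pattern shown above, which is why it converts back to 0.30000000000000004 rather than 0.3.

console.log((0.1 + 0.2).toString(2)); // begins 0.0100110011..., the binary form of the stored sum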

Want to try something even more fun?

for (var i = 0, x = 0; i < 10; i++) {
  x += 0.1;  // increment x by 0.1 ten times
}

console.log(x); //0.9999999999999999

PS: I should emphasize that this is not unique to JavaScript. Most languages have this issue by default. I just used JavaScript because it is the language I am most comfortable expressing the idea in.