We really are standing on the shoulders of giants. To do our job as data scientists, we have to trust that the manufacturers who made our processors, memory, and so on haven't sold us bit-flipping equipment. Atop that, we have to have faith that the programming languages we write in truly do what their syntax says they do. For example, we trust that the Python interpreter will correctly handle the commands we give it, and that in turn relies on the C underneath it being correct both in theory and in implementation.
The next layer of our faith is that the many gigabytes of data we use have been gathered, stored, and shipped without any loss of quality. Then we must have faith that the algorithms we envision to analyze our data are theoretically sound, and that our computer programs articulate them accurately. There are a lot of delicate moving parts that a data scientist has to take into consideration, and yet we get everything to work!
This week we ran into a difficult conundrum: an extended version of a dataset we had been using for months did not match the original data in many columns. This meant one of two things. Either the data we had been using all along was terribly flawed, or the new data was incorrect.
We are neither pilots nor soldiers. We won't ever have to decide what to do when our engines begin to ice up at altitude, or what to do when we're being shot at. We aren't surgeons or tightrope walkers, and won't have to decide how to stop excessive bleeding, or how to react to a life-threatening breeze. But we still work under pressure and have to make decisions that are not only bold but CORRECT in the face of seemingly insurmountable adversity. We had to stay up late at night meticulously checking the capitalization of variables, considering how the SQL queries must have been written to generate the erroneous results, and evaluating the datasets to see which columns were reasonable and which weren't. We had to make many phone calls and send emails to the right people, working tirelessly for days on end to ensure data coherency.
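For the curious, the kind of check we were doing by hand can be sketched in a few lines of Python. This is only an illustration, not the code we actually ran: the column names, the sample rows, and the helper functions `normalize`, `case_only_mismatches`, and `column_mismatch_counts` are all made up for the example, and it assumes each dataset fits in memory as a list of dictionaries that share a key column.

```python
def normalize(name):
    """Normalize a column name so 'Customer_ID ' and 'customer_id' compare equal."""
    return name.strip().lower()


def case_only_mismatches(old_columns, new_columns):
    """Return (old, new) pairs of column names that differ only by case or padding."""
    new_by_norm = {normalize(c): c for c in new_columns}
    pairs = []
    for old in old_columns:
        new = new_by_norm.get(normalize(old))
        if new is not None and new != old:
            pairs.append((old, new))
    return pairs


def column_mismatch_counts(old_rows, new_rows, key):
    """Count, per shared column, how many rows with the same key hold different values."""
    def index(rows):
        # Re-key every row by normalized column name, then index rows by the key column.
        normalized = ({normalize(k): v for k, v in row.items()} for row in rows)
        return {row[normalize(key)]: row for row in normalized}

    old_idx, new_idx = index(old_rows), index(new_rows)
    counts = {}
    # Only rows present in both datasets can be compared value by value.
    for k in old_idx.keys() & new_idx.keys():
        old_row, new_row = old_idx[k], new_idx[k]
        for col in old_row.keys() & new_row.keys():
            counts.setdefault(col, 0)
            if old_row[col] != new_row[col]:
                counts[col] += 1
    return counts


# Hypothetical sample data: the "extended" dataset has lowercase column names,
# one changed price, and one extra row.
old_data = [
    {"ID": 1, "Price": 10.0, "Region": "EU"},
    {"ID": 2, "Price": 12.5, "Region": "US"},
]
new_data = [
    {"id": 1, "price": 10.0, "region": "EU"},
    {"id": 2, "price": 99.0, "region": "US"},
    {"id": 3, "price": 7.0, "region": "EU"},
]

renamed = case_only_mismatches(list(old_data[0]), list(new_data[0]))
mismatches = column_mismatch_counts(old_data, new_data, key="ID")
print(renamed)
print(mismatches)
```

A per-column count like this tells you where to look, not which dataset is right. That part still took the phone calls.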
And our trust in technology and in ourselves wasn't misplaced. We found a few simple mistakes in the incredibly complex space spanned by hardware, software, and our minds. Everything is back on track, and correct. What a miracle.