Author: Melvyn Drag

About Melvyn Drag

Melvyn Drag works as a Data Scientist with Avlino. He holds a Master’s degree in Mathematical Science and Computational Science from Ohio State University. He is a mathematically sophisticated computer scientist skilled in OOP, functional programming, GPU computing, web development, and data visualization. His favorite algorithms are the Conjugate Gradient Method, Huffman Coding, and Horner’s Rule. He loves to work on challenging problems that require clever algorithm design and involve Python, C++, LISP, and CUDA/OpenCL coding in a Linux environment.

Random Triangles

This post is based on a BLOSSOMS lecture given by the famous mathematician Gilbert Strang, which can be found here: https://www.youtube.com/watch?v=XxHIrVTLubE. The question he posed is whether a randomly selected triangle will be acute (all angles less than 90 degrees) or obtuse (one angle greater than 90 degrees). One of the main results of the lecture was the figure drawn below, and I drew it in Matplotlib because a) the result was really interesting and b) it seemed like a pleasantly challenging plotting task. Before showing the image, here are a few facts for those who haven’t taken a math class in a while. The equation of a plane is:

  Ax + By + Cz = D

and the sum of the angles in a triangle gives:

a1 + a2 + a3 = 180

This is exactly the equation of a plane, with (A, B, C) = (1, 1, 1), (x, y, z) = (a1, a2, a3), and D = 180! The larger triangular region in the image represents the set of tuples (a1, a2, a3) of valid angles in a triangle. We get the large triangular region by plotting the section of the plane for which all the angles are non-negative. The boundary of the smaller triangle describes the subset of points where at least one angle of the triangle is 90 degrees. I have labeled three points where two of the angles are 90 degrees for you to see. If you inspect any other point on the boundary of the smaller triangle, you will see that one of the coordinates is 90. Further inspection gives an astounding result: any point inside the smaller triangle is a tuple of values all less than 90, while any point in the large triangular region outside of the smaller one contains a value greater than 90. Since the smaller triangle connects the midpoints of the sides of the larger one, it covers exactly one quarter of the area. This means that 75% of the area of the triangle corresponds to obtuse triangles, whereas a mere 25% corresponds to acute triangles! So a random triangle is usually obtuse. Please watch the video cited above for a more thorough explanation.

This result is analogous to the work we do at Avlino. We tackle our customers’ most challenging problems and provide them with answers that are intuitive and concise, and we go to great lengths to create graphical interpretations of the results that facilitate the client’s understanding.

This post on Stack Overflow, http://stackoverflow.com/questions/29188612/arrows-in-matplotlib-using-mplot3d, was very helpful in understanding how to draw the graph above. The script I wrote to make this plot can be found in the plane.py file here: https://github.com/melvyniandrag/Python-Snippets-to-Remember.
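If you want to convince yourself of the 75% figure without rederiving it, the quick sketch below is one way to do so. To be clear, this is not the plane.py script from the repository linked above, just a minimal stand-in of my own: it samples angle triples uniformly from the plane section (the model used in the lecture), counts how many correspond to obtuse triangles, and redraws a simplified version of the two nested triangles.

# Minimal sketch (not plane.py): check the 75% claim by simulation and
# redraw a simplified version of the figure. "Random triangle" here means
# an angle triple chosen uniformly from the plane section, as in the lecture.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d.art3d import Poly3DCollection

# Monte Carlo check: Dirichlet(1, 1, 1) is uniform on the simplex, so
# scaling by 180 gives uniform angle triples with a1 + a2 + a3 = 180.
rng = np.random.default_rng(0)
angles = 180.0 * rng.dirichlet(np.ones(3), size=1_000_000)
obtuse = np.mean(angles.max(axis=1) > 90.0)
print(f"fraction of obtuse triangles: {obtuse:.4f} (theory: 0.75)")

# Simplified figure: the plane section with non-negative angles, and the
# inner triangle whose boundary is the set of points with one angle of 90.
outer = [(180, 0, 0), (0, 180, 0), (0, 0, 180)]
inner = [(90, 90, 0), (90, 0, 90), (0, 90, 90)]

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.add_collection3d(Poly3DCollection([outer], facecolor="lightblue", edgecolor="k", alpha=0.5))
ax.add_collection3d(Poly3DCollection([inner], facecolor="salmon", edgecolor="k", alpha=0.7))
for p in inner:
    ax.text(*p, str(p))  # the three points where two angles are 90 degrees
ax.set_xlim(0, 180); ax.set_ylim(0, 180); ax.set_zlim(0, 180)
ax.set_xlabel("a1"); ax.set_ylabel("a2"); ax.set_zlabel("a3")
plt.show()

Running it should print a fraction very close to 0.75 and show the nested triangles; the labeled corners of the inner triangle are exactly the three points mentioned above.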

Musings of a Data Scientist

We really are standing on the shoulders of giants. For us to do our job, we have to trust that the manufacturers who made our processors, memory, and so on haven’t sold us bit-flipping equipment. On top of that, we have to have faith that the syntax of the programming languages we write in truly does what it says it does. For example, we trust that the Python interpreter will correctly handle the commands we give it, which relies on the underlying C implementation being correct in theory and in practice. The next layer of our faith is that the many gigabytes of data we use have been gathered, stored, and shipped without any loss of quality. Then we must have faith that the algorithms we envision to analyze our data are theoretically sound, and then accurately articulated by our computer programs. There are a lot of delicate moving parts that a Data Scientist has to take into consideration, but we get everything to work!

This week we came across a difficult conundrum: an extended version of a dataset we had been using for months did not match the original data in many columns. This meant one of two things. Either the data we had been using all along was terribly flawed, or the new data was incorrect.

We are neither pilots nor soldiers. We won’t ever have to decide what to do when our engines begin to ice at altitude, or what to do when we’re being shot at. We aren’t surgeons or tightrope walkers, and won’t have to decide how to stop excessive bleeding, or how to react to a life-threatening breeze. But we still work under pressure and have to make not only bold but CORRECT decisions in the face of seemingly insurmountable adversity. We had to stay up late at night meticulously checking the capitalization of variables, considering how the SQL queries must have been written to generate the erroneous results, and evaluating the datasets to see which columns were reasonable and which weren’t. We had to make many phone calls and send emails to the right people, tirelessly working for days on end to ensure data coherency. And our trust in technology and ourselves wasn’t misplaced. We found a few simple mistakes in the incredibly complex space spanned by hardware, software, and our minds. Everything is back on track, and correct. What a miracle.
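For anyone curious what that kind of check looks like in practice, here is a rough, hypothetical pandas sketch: it compares an original table with its extended version, normalizes column-name capitalization first, and reports the shared columns whose overlapping rows disagree. The file names and the record_id key are made up for illustration; this is not the actual workflow described above.

# Hypothetical sketch of the kind of consistency check described above.
# File names and the key column are illustrative only.
import pandas as pd

original = pd.read_csv("original.csv")
extended = pd.read_csv("extended.csv")

# Normalize capitalization so that case differences alone do not make
# columns look missing or mismatched.
original.columns = original.columns.str.lower()
extended.columns = extended.columns.str.lower()

missing = sorted(set(original.columns) - set(extended.columns))
if missing:
    print("columns present only in the original data:", missing)

# Line up overlapping rows on a common key and compare the shared columns.
key = "record_id"  # hypothetical key column
merged = original.merge(extended, on=key, suffixes=("_orig", "_ext"))

for col in sorted(set(original.columns) & set(extended.columns)):
    if col == key:
        continue
    left, right = merged[col + "_orig"], merged[col + "_ext"]
    disagree = (left != right) & ~(left.isna() & right.isna())
    if disagree.any():
        print(f"{col}: {disagree.sum()} of {len(merged)} rows disagree")

Normalizing the names first matters because a column that differs only in capitalization would otherwise be flagged as missing rather than compared.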