This realization has crept up on me. Even though I’ve talked about the flatness of information structures before, I’m seeing a strong pattern emerge which is, well, kind of blowing my mind. Keep in mind, I am a serious structuralist. I spent endless hours as a kid constructing vast cities out of pillows and furniture. I’ve always been drawn to the three-dimensional representation of how we think, knowing full well that I could never be satisfied with a simple hierarchy of knowledge. Later in life, when I heard about neural networks, I was very turned on. Everything could connect to anything. Everything does connect to everything.

Point clouds

The flattening of my world finally hit an interesting point as I began to work with Google Refine (soon to be relaunched as OpenRefine). I was trying to rework an Excel spreadsheet for Ying Chan’s graduate seminar on Food Security. We found this very interesting dataset which maps food issues based upon news reports. The spreadsheet was in Chinese, but I wasn’t so worried. I was probably more taken aback by the fact that I didn’t know all the names of China’s provinces in English, so what’s the big deal if I don’t know them in Chinese? Besides, it’s all just a pattern. Anyway, the data was weird. I could see it immediately. The column which listed the location from the report was a multi-valued set (comma separated), which listed the various regions in which the food issue occurred. My goal with this data was to produce a map, using Google Fusion Tables. So before I did anything, I needed to sort out this weird column.

Now, I’ve always hated working with spreadsheets. This may have something to do with my lack of financial acumen, or might be a product of my early introduction to relational tables, but I’m pretty sure it is grounded in my deep need to create a structure – a multi-dimensional structure – from the information. Simply put, I wanted this data in a database. I wanted to create at least a second normal form relational database, so that regions could be linked to incidents. But because of how I was planning on working (using Fusion Tables), I needed to keep this data all flat. I resigned myself to this substandard position and began to work with Google Refine.

This problem was rather trivial, once I could get past the non-trivial part – namely my own built-in world-view. I first had to fill all the empty fields, so that they wouldn’t be affected later, but then I simply had to split the multi-valued cells and fill down the empty cells that the split left behind. I now had a flat (and redundant, I kept muttering) structure. But it was ready to be worked with.
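For anyone who thinks better in code than in menus, those three steps can be roughly sketched in plain Python. This is only an illustration of the idea, not what Refine does internally: the function names, the "(blank)" placeholder, the column names and the two sample rows are all made up for the example, and it assumes the regions are separated by an ordinary comma.

```python
PLACEHOLDER = "(blank)"  # hypothetical marker for cells that were empty to begin with


def fill_blanks(rows, placeholder=PLACEHOLDER):
    """Mark genuinely empty cells so a later fill-down cannot overwrite them."""
    return [
        {key: (value if value.strip() else placeholder) for key, value in row.items()}
        for row in rows
    ]


def split_multi_valued(rows, column, sep=","):
    """Give each value in a multi-valued cell its own row.

    Only the first row of each group keeps the other columns;
    the extra rows are left empty, ready to be filled down.
    """
    out = []
    for row in rows:
        parts = [p.strip() for p in row[column].split(sep) if p.strip()] or [""]
        for i, part in enumerate(parts):
            new_row = dict(row) if i == 0 else {key: "" for key in row}
            new_row[column] = part
            out.append(new_row)
    return out


def fill_down(rows):
    """Copy the value from the row above into every empty cell."""
    out = []
    previous = {}
    for row in rows:
        filled = {
            key: (value if value != "" else previous.get(key, ""))
            for key, value in row.items()
        }
        out.append(filled)
        previous = filled
    return out


# Made-up sample rows; the real spreadsheet had different columns and content.
rows = [
    {"incident": "Tainted milk", "source": "", "regions": "Hebei, Beijing"},
    {"incident": "Gutter oil", "source": "News report", "regions": "Guangdong"},
]

flat = fill_down(split_multi_valued(fill_blanks(rows), "regions"))
for row in flat:
    print(row)
```

The order matters. If the blanks are not marked first, the fill-down step cannot tell a cell that was always empty from one that the split just created, and it would quietly copy the value from the row above into both.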

This whole time I was amazed at the power of Google Refine, as well as what is possible with both Drive and Fusion Tables. Because they all link together so neatly, they make manipulating and moving the data so easy. And even though I was still struggling with the idea of working with spreadsheets, I began to understand how important this simple structure is.

Although I believe all humans like structure, and some, like me, even like to form a deep understanding of the multitude of relationships that can exist, when it comes to simply doing, we love taking the straightforward path. That’s where we line up everything we know and we duplicate our data, so we can audit it and see it for what it is. We don’t want to have to juggle relationships in our minds when it comes to the work in front of us. Sure, there is a deep satisfaction in being able to handle such complex relationships, but not all of us are able to deal with that day to day.

This flattening of our world – the one I write (and think) about with respect to content – is no different. But for some reason, I’ve been hanging on to the notion that it didn’t apply to data, to more pure datasets. The reality is, it is the same.