June and the Turing Centennial: A Time to Also Consider Civil Rights

June is LGBT Pride Month. It is also the month in which ACM presents the A.M. Turing Award to individuals who have made contributions of fundamental importance to computing.

I’d like to take a moment to point out that Alan Turing, for whom the award is named, was gay and was a victim of the homophobia and fascist laws of his time (Turing lived in England).

For those who are unfamiliar, Alan Turing is perhaps the most important contributor to modern computation theory. He made many fundamental discoveries: he described what types of questions can be asked of computers; he defined, in the Turing machine, the minimal components a machine actually needs in order to compute; and he showed, with the halting problem, that there are questions no program can ever answer.

Turing was not just a computer scientist, though; he was also a mathematician. He was critical in breaking German codes during World War II, and without his contribution it is quite possible that the war would not have been won.

Turing was also gay. After the war, he was prosecuted for homosexual acts, which were illegal in Britain at the time. He was subjected to chemical castration as punishment and subsequently committed suicide. It is bitterly ironic that a person whose work was critical to preventing the world dominance of German fascism was himself a victim of what was arguably a fascist law and sentiment in the home country that he helped defend.

Alan Turing’s contributions to our modern society are immeasurable. Not only did he help defend the free world, but he was a founding member of a field that has radically transformed the way that we work, think, and live. Anyone who uses the internet or a computer for any purpose whatsoever owes him a great debt of gratitude.

Everyone should take the time this month to remember Turing, not only for what he taught us about computers and computational theory, but also for what his story teaches us about civil rights and the contributions that people who are different from ourselves make to our society.

Check it Out: Curry Functions in R

I’ve been working recently with RHadoop. This is my first time using R, and I’m constantly amazed by what a cool language it is. It has lots of nuanced features that take a while to understand. It’s especially good at representing functional programming constructs, which is good because I like functional programming ;).

Today, while hacking together a patch, I found this line of code in mapreduce.R:

csv.input.format = function(key = 1, ...) function(line) {

This got me thinking: I wonder if you can create curried functions in R. It turns out you can!

If you’re not familiar with this functional programming construct, currying turns a function of several arguments into a chain of functions that each take one argument. Supplying only the first argument gives you back a new function that remembers it and waits for the rest, which looks kind of like a “half executed” function that you can finish calling later.

Take this for example:

> divide = function(x) function(y) x/y
> # now I create a new function, divide_two, that divides two by something
> divide_two = divide(2)
> divide_two(3)
[1] 0.6666667
> divide(2)(3)
[1] 0.6666667
> three_divided = function(z) divide(z)(3)
> three_divided(2)
[1] 0.6666667

Cool, huh?

How do You Represent Concurrent Failures in Logs or Stack Traces?

One thing that logging and exception handling frameworks have not kept up with very well is stack traces arising from concurrent errors.

The usual approach to concurrency in logging is to just let all the threads or processes dump to the same log. This is fine, for the most part. While it is irritating to see messages interleaved throughout a log from such an arrangement, developers and system admins have been making sense of these types of logs for a long time.

Except that was back when our main unit of parallelism was the thread or process.

It’s only recently that we’ve started to see a proliferation of bridging models like fork/join, map/reduce, actors, etc. These models stand apart in that they give us some pretty explicit program structure: there is a very strong link between the fork that occurred in the control or parent process and the site of the failure, so we should be able to do a better job of representing that failure.

Except the stack is no longer a stack. It’s a tree or a DAG.

Most of the methods I’ve seen for this have been around the “fork/join” type paradigm, and basically involve writing out “Call failed with 5 exceptions” and then printing all the other stack traces out, one after another, in the log. That’s certainly workable but kind of messy and difficult to read.
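
For what it’s worth, here is a minimal Java sketch of that “N exceptions” style. The class and method names (ForkJoinFailures, runAll) are made up for illustration and the pool size is arbitrary; the only library feature it relies on is Throwable.addSuppressed, which lets one summary exception carry the others along with it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper for illustration; not part of any logging framework.
final class ForkJoinFailures {
    // Runs every task, waits for all of them, and returns one summary exception
    // that carries each task failure as a suppressed exception.
    // Returns null if every task succeeded.
    static RuntimeException runAll(List<Callable<Void>> tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(4); // arbitrary size
        try {
            List<Throwable> failures = new ArrayList<>();
            // invokeAll returns the futures in the same order as the tasks.
            for (Future<Void> f : pool.invokeAll(tasks)) {
                try {
                    f.get();
                } catch (ExecutionException e) {
                    failures.add(e.getCause()); // the exception thrown inside the task
                }
            }
            if (failures.isEmpty()) {
                return null;
            }
            RuntimeException summary =
                new RuntimeException("Call failed with " + failures.size() + " exceptions");
            for (Throwable t : failures) {
                summary.addSuppressed(t);
            }
            return summary;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return new RuntimeException("Interrupted while waiting for tasks", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Callable<Void>> tasks = new ArrayList<>();
        tasks.add(() -> { throw new IllegalStateException("task A failed"); });
        tasks.add(() -> null);
        tasks.add(() -> { throw new IllegalArgumentException("task C failed"); });
        RuntimeException summary = runAll(tasks);
        if (summary != null) {
            summary.printStackTrace();
        }
    }
}
```

printStackTrace lists each task failure under a “Suppressed:” heading beneath the summary, so you still get a flat run of traces one after another. It keeps the failures together, but the shape of the fork is lost.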

At the same time, people have been representing trees like this:


+----
+---+-----
    +-----

using ASCII art for a long time. That’s definitely a solution, but not ideal, because we’d also like to easily parse the shape of the call graph from the log for graphical log inspection tools. Additionally, this particular format doesn’t work well for DAGs.

You could also try something like explicitly naming the edges from nodes with out-degree greater than two, but that would again become difficult to read.

The final thought I have is providing output similar to git log (which I’ve pulled from the Git Book page):

$ git log --pretty=format:'%h : %s' --topo-order --graph
* 4a904d7 : Merge branch 'idx2'
|\
| * dfeffce : merged in bryces changes and fixed some testing issues
| |\
| | * 23f4ecf : Clarify how to get a full count out of Repo#commits
| | * 9d6d250 : Appropriate time-zone test fix from halorgium

This is perhaps the best solution, although it still lacks easy parseability. I also don’t know of any logging frameworks that are really set up to do this well at the moment (at least in my native language of Java ;).
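
One way to claw back some parseability might be to keep the indented, git-log-like layout for humans but tag every line with an explicit node id and parent id for tools. Here is a rough Java sketch of that idea. FailureNode, FailureLog, and the line format are all invented for this post and don’t come from any existing framework.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration; not taken from any logging framework.
final class FailureNode {
    final int id;
    final String message;
    final List<FailureNode> children = new ArrayList<>();

    FailureNode(int id, String message) {
        this.id = id;
        this.message = message;
    }

    // Records a child failure that was forked from this node; returns this for chaining.
    FailureNode fork(FailureNode child) {
        children.add(child);
        return this;
    }
}

final class FailureLog {
    // Renders one line per node. Indentation shows depth to a human reader,
    // while the id and parent fields let a tool rebuild the edges.
    static String render(FailureNode root) {
        StringBuilder out = new StringBuilder();
        render(root, -1, 0, out);
        return out.toString();
    }

    private static void render(FailureNode node, int parentId, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append("* id=").append(node.id)
           .append(" parent=").append(parentId < 0 ? "-" : String.valueOf(parentId))
           .append(" : ").append(node.message)
           .append('\n');
        for (FailureNode child : node.children) {
            render(child, node.id, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        FailureNode root = new FailureNode(0, "Call failed with 2 exceptions")
            .fork(new FailureNode(1, "IOException in task A"))
            .fork(new FailureNode(2, "Join failed with 1 exception")
                .fork(new FailureNode(3, "TimeoutException in task B")));
        System.out.print(render(root));
    }
}
```

Running main prints:

* id=0 parent=- : Call failed with 2 exceptions
  * id=1 parent=0 : IOException in task A
  * id=2 parent=0 : Join failed with 1 exception
    * id=3 parent=2 : TimeoutException in task B

In this sketch a node with two parents would simply be printed once under each parent, and a parser could merge the lines that share an id. That handles DAGs after a fashion, at the cost of repeating the subtree.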

How are you representing concurrent errors?

Why Hadoop Is Not a Step Backwards

Today I’m going to show how behind the times I am by commenting on an 8-month-old blog post. I recently read this blog post criticizing Hadoop Map/Reduce as being inferior to RDBMS, and offering some historical and current discussion as to why this is the case.

There’s a lot of debate in the database community that is being driven primarily by the increasing popularity of Hadoop for… just about everything. The amazing thing about this debate is that it exists at all.

Hadoop Is Not A Database

Hadoop is actually a growing collection of quasi-independent projects at the Apache Software Foundation. One of the projects is a map/reduce framework. One is a distributed file system. One is an RPC/serialization framework. One is a consensus framework. One is a database called HBase.

HBase actually has a lot in common with an RDBMS. It maintains a sorted index of records, and through clever programming a lot of RDBMS operations can be mapped to HBase/Bigtable operations. The article mentioned above only briefly discusses and criticizes HBase, and its criticism is totally orthogonal to the post’s thesis (so ignore it).

RDBMS Cannot Do Everything

There’s been a growing tendency over the last decade or so to try to shoehorn everything into an RDBMS without regard to why an RDBMS is being used in the first place. This unfortunate evolution of the RDBMS is reflected in new SQL server features allowing one to store gigabytes of data in a single field, and the hazardous effects of RDBMS overuse are seen in countless applications in the form of poor performance.

RDBMS are good at maintaining sorted, consistent representations of data that offer a balance between write and retrieval optimization.

HDFS and Map/Reduce are different. We don’t quite know all of what they’re good at yet because the two platforms are still rapidly evolving. We know, at least, that they’re good at allowing you to extract information from data where the schema is not yet known, or where the type of schema or index changes rapidly (sometimes on the whim of an analyst). They’re good at storing data that doesn’t need to be sorted or where the order is implicit in the write process (i.e., chronological). They’re good at performing operations where all you really need is more space; maintaining all the locking and data structures that are afforded by an RDBMS is sometimes simply unnecessary.

Data Trumps Intuition

Maybe a bunch of small companies are being led astray by these “open source radicals.” Maybe a single rogue internet-scale company is doing all this “NoSQL” architecture stuff because the CTO there likes reinventing the wheel.

The fact is, though, that this is not the case. There are lots of companies, several of them very prominent, that have done the math for their particular application and decided to go NoSQL.

The Bottom Line

It’s unfortunate to see blog posts trying to tear down the Hadoop ecosystem in favor of RDBMS via platitudinous assertions like “…the entire world saw the value of high-level languages and relational systems prevailed.” RDBMS have their place, and NoSQL data stores have their place as well.

Jobs On Flash (Why I Still Hate Steve, but He’s Sort of Right)

I was reading this article about Apple’s decision to not support Flash on its mobile devices, and I hate to say it, but I think ol’ Steve is right.

Don’t get me wrong; Apple has always irked me. They seem like a company caught between two worlds. They benefit tremendously from the open source ecosystem that exists around the BSD/Linux/Unix platforms, but they inevitably layer a proprietary piece of software on those “core” goods that closes the system, keeps out competitors and locks in third parties and users.

A great example of this is the Cocoa framework. If Cocoa could run on Linux, then it would be possible to run most OSX applications on Linux. It would be great for Linux (and consumers), but… maybe not so much for Apple, which relies on its closed platforms to drive its unusually high profits. There’s an existing open source implementation of Cocoa (called GNUstep), but it’s been neglected for so long that you really can’t run Apple applications on it anymore. The bottom line is that Apple uses (and, to be fair, contributes to) open source software 99% of the way, but stops short on the last 1% in order to accomplish its business goals.

Looking beyond software, Apple is also the first company to get in line to screw you on media standards (before even Microsoft!). Sure, the iPhone, iPod, and iPad all work well with the iTunes music (and app and movie and book) store, but what if I want another device (maybe Android based) that performs another technological function better? Maybe it has multithreading, maybe it has a better camera, maybe it shoots fireworks out of its butt. Who knows. Whatever the case, I see no reason that I should be prevented from putting media that I have purchased the rights to on it.

And, as if it wasn’t enough for the company to already have a draconian grip on the music market, it just spent $80m to buy out and shut down an emerging competitor, Lala, that actually sold access to music for a reasonable price, on their (and other) platforms.

The reality is, of course, that Apple only wants open standards in the corner of the internet that it doesn’t have a draconian grip on. And that’s only so that it can try to find a way to extend that grip to that untapped market. For Steve Jobs to go around pointing the proprietary platform finger at another company is, well, just plain hypocritical.

That having been said, I think Apple’s refusal to support Flash is going to benefit the internet as a whole. Adobe’s products have always been a thorn in the side of the creative internet. Flash’s prevalence has, for a long time, prevented alternative technologies from emerging and making Flash-style content readily available to lower-budget web designers. HTML 5 and JavaScript are finally bringing that functionality to the masses, and are doing it at reasonable cost.

The proliferation of open standards for rich internet applications will also force Microsoft to rethink its Silverlight technology, as well as its proprietary extensions to HTML, JavaScript, and (shudder) VBScript.

So, sorry Adobe. Maybe this is the best for everyone.

Media and SWT

I’ve recently decided I want to make a media player in Java, using SWT as a toolkit. My interest in doing so is motivated primarily by the lack of a good, extensible, media player. Anywhere.

Here are a few example issues I hope to address:

  1. Most (or all) media players don’t have a good abstraction of the notion of a media “repository” and media UI. For example, a repository might be YouTube, an iPod, or a folder on your hard drive.
  2. Most (or all) media players weren’t designed to abstract their core functionality from their UI, so you have to basically write a new player if you want to, say, use the player on a mobile device or 10′ display.
  3. There’s a lot of attention paid to supporting iPods, but not much to Android or other devices.

IMHO, SWT/Java/Eclipse RCP is a great platform for such an application, primarily because of 1. and 2. It would be really great to have a media player which is designed in the RCP style, which can easily be ported to other platforms.

The only problem with this, of course, is that SWT doesn’t really have any native support for media.

I’m considering writing a media “framework” extension for SWT. Who knows, if I get really ambitious I might even try to get it integrated with the SWT core. The first platform I hope to support is OSX.

I’ve fiddled around with GStreamer, and although it’s decent for use with Linux, I simply don’t see it as being as mature (functionality-wise) as QuickTime, the major media framework in OSX. The bottom line is that QuickTime is stable and it plays pretty much anything.

For some reason, though, the documentation for developing QuickTime applications is pretty… well, terrible. It’s all very Xcode/Objective-C/Cocoa specific. There’s a Java version of the framework which Apple seems to have (almost) totally abandoned. I’m almost wondering if it will be easier to use a free framework like GStreamer than QuickTime.

Archos 5

I got an Archos 5 MID for myself for Christmas, with the help of my credit card bonus points and a gift certificate from my mom.

This device is very cool. It runs Android and has a giant screen (5″). I plan to use it to watch movies and whatnot when I’m in the gym.

Unfortunately, getting this little thing running is harder than one might expect. I immediately found that when I connected it, I was unable to transfer anything to or from the device (more on that later). I was also prompted to update its firmware; it began “downloading” the firmware… and then never did. There should have been some sort of timeout or something, but there wasn’t. I then tried to figure out how to connect the thing to my Mac to do a manual update. After plugging and unplugging it several times, and telling it to mount, I gave up and pulled out my wife’s PC.

The PC couldn’t recognize the device either. Why, I’ll never know. I don’t know what made me open my eyes after the third or fourth time I plugged the device into the PC, but whatever the case, I noticed when I was hitting the mount button that it was attempting to mount using MTP (Media Transfer Protocol, a protocol developed by Microsoft and supported natively mainly on Windows), rather than as an MSC (USB Mass Storage Class) device. A quick run through the system settings, and I was ready to go; I plugged it into my Mac, and then Disk Utility opened up and wanted to format it.

“OK,” I thought, “surely nobody would allow you to format something you shouldn’t.” So I happily formatted it as a FAT32 drive, and the machine promptly flipped out and demanded I reboot it so it could reformat its hard drive.

Upon reboot, I was greeted by a rescue environment. Much to my relief (oddly enough), it asked for an SD card with a firmware copy on it. Great! Now I had my firmware updated, although not using a process I would recommend to anybody else.

Now the other issue: this device uses an ext2 file system. This means that the device’s MSC capability will not work AT ALL in Windows or OSX without downloading Ext2 IFS or MacFUSE and fuse-ext2 (respectively).

Do all that, and you can just plug in the device and use it. And it actually works pretty well.