Nothing ventured, nothing gained

a blog by Marc Chung

Are you a haikuist?


“What is a haiku?”

From Wikipedia:

“Haiku is a form of Japanese poetry, consisting of 17 syllables, in three metrical phrases of 5, 7, and 5 syllables respectively.”

Personally, I prefer how Stephany, an early haikuist, describes the way of the haiku:

To write a Haiku
Five then Seven, Five again
Simple, yet complex

And with that, I’d like to introduce you to a fun project that I’ve been working on for a few months. It’s called Haikuist, and it’s aimed at people who love composing haiku poetry.

“What is haikuist.com?”

Haikuist.com
Poetic micro-blogging
for haiku lovers.

Haikuist first began as a place to keep track of my random acts of haiku writing. It also stemmed from my deep frustration with the growing number of badly formed haiku being published on the internet. Counting syllables should be easier, so I did what every problem-solving software engineer would do: write code!

Syllable detection isn’t a trivial problem to solve, but what started as a 200-line Ruby¹ program eventually grew into what you see today at haikuist.com.
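To give a sense of why syllable detection is harder than it looks, here is a naive heuristic sketched in Ruby. It is an assumption for illustration, not Haikuist’s actual algorithm: it counts runs of vowels (treating “y” as a vowel) and discounts a silent final “e”. The helper names `syllables`, `line_syllables`, and `haiku?` are hypothetical.

```ruby
# Naive syllable heuristic (illustrative assumption, not Haikuist's algorithm):
# count runs of vowels, treating "y" as a vowel, and discount a silent final "e".
def syllables(word)
  w = word.downcase.gsub(/[^a-z]/, "")
  return 0 if w.empty?
  count = w.scan(/[aeiouy]+/).size
  # "write" -> 1, but keep the "e" in endings like "simple"
  count -= 1 if w.end_with?("e") && !w.end_with?("le") && count > 1
  [count, 1].max
end

def line_syllables(line)
  line.split.sum { |word| syllables(word) }
end

# Here a haiku is three lines of 5, 7, and 5 syllables.
def haiku?(text)
  lines = text.strip.lines.map(&:strip)
  lines.size == 3 && lines.map { |line| line_syllables(line) } == [5, 7, 5]
end

poem = <<~HAIKU
  To write a Haiku
  Five then Seven, Five again
  Simple, yet complex
HAIKU

p poem.lines.map { |line| line_syllables(line) } # [5, 7, 5]
p haiku?(poem)                                    # true
```

It happens to score Stephany’s haiku correctly, but English is full of exceptions: the same heuristic counts “Haikuist.com” as three syllables instead of five, which is exactly why a real implementation needs more than a regular expression.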

I’m extremely excited about Haikuist. It’s fast, simple to use, and offers fascinating insight into the minds of creative people. I am surprised at how poetic people can be.

Haikuist is a work in progress and will continue to improve over time. Sign up at haikuist.com today and invite your friends along for the ride; you may be surprised by what they say.


  1. A programming language that, appropriately, originated in Japan.

A few notes on HTML 5 for developers


This past Labour Day weekend, I redid this entire blog in HTML 5. Here are some of the notes I took while diving into HTML 5.

The first thing to keep in mind is that HTML 5 is a work in progress and will remain one until there exist two browsers with complete HTML 5 implementations. Remember this as you read the various W3C specs, wiki entries, and blog posts, including this one.

Web Developer’s Guide to HTML 5

  • The Web Developer’s Guide to HTML 5 is a good introduction to HTML 5 from an HTML author’s perspective. The biggest takeaway is that there are two syntax modes in HTML 5: regular HTML syntax and XHTML syntax.

  • After understanding the differences between HTML and XHTML, I found that going with XHTML makes more sense. Its syntax is stricter, yet simpler and more consistent, since it follows XML’s syntactic requirements. HTML mode permits all sorts of exceptions, which ends up being a PITA to remember. For example, an empty attribute like <input disabled> is valid HTML but invalid XHTML, which requires <input disabled="disabled" />. I almost never remember this. I suspect that HTML mode is supported for backwards compatibility (since shipping a browser that breaks ten trillion web pages is evil).

  • For the rest of this post “HTML” means HTML 5 in regular HTML mode, and “XHTML” means HTML 5 in XHTML mode.

  • Strictly speaking, XHTML must be served with a Content-Type of application/xhtml+xml, though in practice ignoring this doesn’t seem to cause much trouble (a document served as text/html is simply parsed as HTML). There are also legitimate reasons to bend the rule; for instance, HTML 5 allows authors to publish polyglot documents that conform to both HTML and XHTML. More on this below.

  • The other big note is the DOCTYPE, which browsers use to decide whether to render in quirks mode. In HTML 5 it is simply <!DOCTYPE html>.

  • The DOCTYPE declaration isn’t technically required for XHTML documents, since they are meant to be delivered with an XML MIME type, which instructs the browser to process them as XML, where quirks mode doesn’t apply. I set it anyway since it ensures the most standards-compliant rendering (it also gives me warm fuzzies).
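One common way to reconcile the application/xhtml+xml requirement with browsers that don’t accept it is content negotiation: send the XHTML MIME type only when the request’s Accept header advertises it, and fall back to text/html otherwise, which a polyglot document tolerates. Here is a minimal Ruby sketch; `content_type_for` is a hypothetical helper, not part of any framework, and q-values are deliberately ignored for brevity.

```ruby
# Hypothetical helper: pick a Content-Type for a polyglot HTML 5 document.
# Serve the XML MIME type only to clients that explicitly list it in their
# Accept header; everyone else gets text/html. Quality values (q=...) are
# ignored here for brevity, so an explicit q=0 refusal is not honored.
def content_type_for(accept_header)
  accepted = accept_header.to_s.split(",").map do |part|
    part.split(";").first.to_s.strip.downcase # drop parameters like ;q=0.9
  end

  if accepted.include?("application/xhtml+xml")
    "application/xhtml+xml; charset=utf-8"
  else
    "text/html; charset=utf-8"
  end
end

p content_type_for("application/xhtml+xml,text/html;q=0.9") # "application/xhtml+xml; charset=utf-8"
p content_type_for("text/html,*/*;q=0.8")                   # "text/html; charset=utf-8"
```

Falling back to text/html is the safe default here, since a polyglot document still parses correctly as HTML.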

HTML5 differences from HTML4

  • The next good read is HTML5 differences from HTML4. The most interesting thing I gleaned from this document is when its authors consider HTML 5 complete: “The HTML5 specification will not be considered finished before there are at least two complete implementations of the specification. This is a different approach than previous versions of HTML had. The goal is to ensure that the specification is implementable and usable by designers and developers once it is finished.” Pragmatic and competitive. I like this approach, as it gives future browser implementors (however unlikely they are) two reference implementations to work from.

  • To help you decide whether to adopt HTML 5, you can review the current state of the various browser implementations and the comparisons between HTML 5 layout engines. Also make sure to check out W3Schools’ browser statistics, which list browser usage and trends month by month.

  • It’s clear that HTML 5 is going to cover a lot of ground. The authors are revisiting many existing concepts (such as “the definition of URL” and “the origin concept”) and standardizing their definitions.

  • I previously mentioned that HTML 5 allows authors to construct polyglot documents. Relatedly, HTML 5 lets other markup languages like MathML and SVG be embedded directly in documents written in HTML mode, something XHTML could already do through XML namespaces.

  • The document then goes on to outline changes in elements and attributes. There are well over two dozen new elements addressing needs in document publishing (<section>, <article>, <header>, <footer>, <nav>), accessibility (<progress>, alongside a new hidden attribute), typography (<ruby>), and visualization (<canvas>). Some elements are also removed, notably <frame> and <frameset>, which we all know have brain-damaging effects on usability and accessibility.

Practical Resources

If you have any other great sources on getting started with HTML 5, make sure to share them in the comments below.

My software opinions


What follows are some strong opinions I hold about software development. Most of them aren’t original, but together they form the framework for how I think about software and its design and development in a production environment.

In no particular order:

  • Software engineering is a discipline. You don’t master the art of software engineering just because you have a passion, a hobby, a university degree, or a book titled “C++ in 365 days”. These factors may help, but it’s discipline that keeps you practicing, studying, and applying the scientific method, which ultimately makes you a better software engineer.

  • Focus on delivering value to customers. If you write it, test it, buy it, sell it, pay for it, run it, own it, or use it, you are a customer. Any single project or organization will have multiple customers. Figure out who your customers are and how best to serve their collective goals.

  • Reading software should be a pleasure. Good software reveals its intentions by following conventions and using design patterns. If software is primarily read by other software engineers, it should be written for other software engineers to read. Sometimes this takes the form of a domain-specific language.

  • Learn to build, follow, and break conventions. Conventions exist everywhere, either by accident (a recurring pattern) or intentionally (by design). A team should follow conventions, establish them when they don’t exist, and capture them when they begin to emerge. Keep in mind that even conventions can have scalability issues. Teams should recognize when an existing convention stops scaling and introduce new ones. For large bodies of software, every piece should have a place, and every place should have its piece.

  • Keep things simple by recognizing high compute costs. Never try to design everything up front and all at once. When developing new features, it’s easier to do so on top of components that are small and decoupled rather than large and complex. That’s why it’s important to start work by doing the simplest thing possible.

  • Testing is important, but not as important as strong team communication. 100% test coverage doesn’t mean a program is 100% functional; in fact, assuming it does is dangerous, since it’s entirely possible to have a busted program with 100% test coverage. Better communication will always beat better coverage.

  • Building products is a team sport. Finding the right people, motivating them, and putting them in the right positions is key to the success of a product. Don’t just hire someone for what they already know; hire them for their ability to learn. The idea that great products are built by people in silos no longer holds true. Collaboration and different perspectives make a better team and build a better product.

  • Writing testable code is the trick to writing reusable code. If you’re having a hard time isolating a piece of code for a test, you’re generally going to have a hard time reusing it too.

  • Code reusability isn’t as important as code usability. An API should be usable much like a user interface: it should be clear to its users, simple to use, consistent, and provide shortcuts for common tasks.
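To illustrate the point about testable and reusable code, here is a small hypothetical Ruby example; the `Greeter` class and its `clock:` keyword are made up for illustration. Because the clock is passed in rather than hard-wired to `Time.now`, a test can isolate the class with a fixed time, and the same class can be reused with any source of time.

```ruby
# Hypothetical example: the greeting depends on the time of day. Injecting
# the clock (instead of calling Time.now inside #greeting) makes the class
# easy to isolate in a test and reusable with any time source.
class Greeter
  def initialize(clock: -> { Time.now })
    @clock = clock
  end

  def greeting
    @clock.call.hour < 12 ? "Good morning" : "Good afternoon"
  end
end

# In a test, substitute a fixed clock rather than stubbing Time.now globally.
morning = Greeter.new(clock: -> { Time.new(2009, 9, 7, 9, 0, 0) })
puts morning.greeting # Good morning
```

The design choice is simply dependency injection: the hard-to-control dependency becomes a parameter with a sensible default, so production callers don’t have to change.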

How Closures Behave In Ruby


Closures are commonly used to abstract over an expression or a statement.

Ruby gives you not one but three ways of creating closures from a block: Proc.new, the Kernel#proc method, and the Kernel#lambda method. So what’s the difference between these three techniques?

In Ruby 1.8¹, Kernel#lambda is an alias for Kernel#proc, so they behave identically. Proc.new, on the other hand, has slightly different semantics, most visibly in how return behaves: a return inside a Proc.new block returns from the enclosing method.

stargate$ cat a.rb
def kernel_proc
  proc { return "within proc" }.call
  return "kernel_proc"
end

def proc_class
  Proc.new { return "within Proc.new" }.call
  return "proc_class"
end

def kernel_lambda
  lambda { return "within lambda" }.call
  return "kernel_lambda"
end

if $0 == __FILE__
  puts "return from " + kernel_proc
  puts "return from " + proc_class
  puts "return from " + kernel_lambda
end

With Ruby 1.8:

stargate$ ruby a.rb
return from kernel_proc
return from within Proc.new
return from kernel_lambda

JRuby 1.1.5² observes the same semantics:

stargate$ jruby a.rb
return from kernel_proc
return from within Proc.new
return from kernel_lambda

Ruby 1.9³ tells a different story.

In Ruby 1.9, Kernel#proc behaves identically to Proc.new. The reason behind this is described by David A. Black in this post to ruby-talk:

Matz agreed that it was confusing to have proc and Proc.new return different things, and said he would deprecate proc.

Eigenclass.org also confirms that proc is now a synonym for Proc.new, adding that both receive their arguments with multiple-assignment (block) semantics.

With Ruby 1.9:

proc{|a,b|}.arity               # => 2
proc{|a,b| "bacon"}.call(1)     # => "bacon"
Proc.new{|a,b|}.arity           # => 2
Proc.new{|a,b| "bacon"}.call(1) # => "bacon"

With Ruby 1.8:

proc{|a,b|}.arity               # => 2
proc{|a,b| "bits"}.call(1)      # raises ArgumentError (wrong number of arguments)
Proc.new{|a,b|}.arity           # => 2
Proc.new{|a,b| "bits"}.call(1)  # => "bits"

Unfortunately, this change means that a return inside a Kernel#proc block now returns from the enclosing method, which may lead to undesirable side effects in your code:

With Ruby 1.9:

stargate$ ruby1.9 a.rb
return from within proc
return from within Proc.new
return from kernel_lambda

With MacRuby 0.3⁴:

stargate$ macruby a.rb
return from within proc
return from within Proc.new
return from kernel_lambda

When in doubt, go with Kernel#lambda. It’s a safe bet.
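If you are on Ruby 1.9 or later, Proc#lambda? lets you check which semantics a closure has instead of guessing. The sketch below assumes a 1.9+ interpreter and restates the differences described above: proc and Proc.new produce non-lambda procs with block argument semantics, while lambda checks arity strictly and keeps return local. The method name `lambda_return` is hypothetical.

```ruby
# Proc#lambda? (Ruby 1.9 and later) reports which semantics a closure follows.
from_proc     = proc { |a, b| [a, b] }
from_proc_new = Proc.new { |a, b| [a, b] }
from_lambda   = lambda { |a, b| [a, b] }

p from_proc.lambda?      # false: proc is now a synonym for Proc.new
p from_proc_new.lambda?  # false
p from_lambda.lambda?    # true

# Procs use block (multiple-assignment) semantics: missing arguments become nil.
p from_proc.call(1)      # [1, nil]

# Lambdas check their arity strictly.
begin
  from_lambda.call(1)
rescue ArgumentError => e
  puts "lambda raised #{e.class}"
end

# A return inside a lambda only returns from the lambda, not the method.
def lambda_return
  lambda { return "within lambda" }.call
  "lambda_return"
end
puts lambda_return       # lambda_return
```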

Updated: Pending further review, I’ve omitted the bit about Proc.new violating Tennent’s Correspondence Principle.


  1. ruby 1.8.7 (2008-06-20 patchlevel 22) [i686-darwin9]

  2. jruby 1.1.5 (ruby 1.8.6 patchlevel 114) (2009-01-22 rev 6586) [i386-java]

  3. ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-darwin9]

  4. MacRuby version 0.3 (ruby 1.9.0 2008-06-03) [universal-darwin9.0]

Why We Test Software


My first unit test

I remember my first passing unit test well.

It was written for a lab assignment in an undergraduate computer science course I took in Spring 2000 at Arizona State University: CSE200, Introduction to Object-Oriented Programming, or something like that. Back then, though, it wasn’t called a unit test.

CSE200 had close to a hundred students every semester with each student responsible for writing about a dozen lab assignments.

As part of each assignment, you had to write a main function that read input from cin (or System.in) and passed it along to the rest of your program for processing. When your program was done processing, it wrote the results to cout (or System.out). To test your lab assignment, the tester (aka the QA TA) would run main, feed it input, record your actual output, and compare it against the expected output, complaining whenever there was a difference. Many factors went into your assignment’s final grade, such as documentation and your code’s overall object-orientyness, but generally, the more complaints the tester recorded, the lower your grade.
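The QA TA’s process is essentially what we would now call an output-comparison (or golden-file) test. Here is a minimal Ruby sketch of that loop, assuming a “program” is anything callable with an input stream and an output stream; `grade`, `summer`, and the test cases are hypothetical.

```ruby
require "stringio"

# Hypothetical sketch of the QA TA's grading loop: run the program on each
# input, capture what it writes, and record a complaint whenever the actual
# output differs from the expected output.
def grade(program, cases)
  cases.each_with_object([]) do |(input, expected), complaints|
    output = StringIO.new
    program.call(StringIO.new(input), output)
    actual = output.string
    next if actual == expected
    complaints << { input: input, expected: expected, actual: actual }
  end
end

# A toy "lab assignment": read integers, one per line, and print their sum.
summer = lambda do |stdin, stdout|
  stdout.puts stdin.each_line.sum { |line| Integer(line.strip) }
end

cases = [
  ["1\n2\n3\n", "6\n"],
  ["10\n-4\n",  "6\n"],
  ["",          "0\n"],
]

complaints = grade(summer, cases)
puts "#{complaints.size} complaint(s)" # 0 complaint(s)
```

The fewer entries in `complaints`, the better the grade; an “extra credit” run would simply be a larger `cases` list full of good and bad data.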

For fun, I’m told, you could ask the professor to run your assignment against the extra credit data set which basically threw all sorts of data (good and bad) at your program.

Students taking this class were forced to make a decision: either they rose to the challenges of writing code to spec, or they dropped the class before the first lab assignment was due.

All code starts out as exploratory

I’d be lying if I said that I could sit down at a computer and write a program that ran perfectly, without making a single mistake. Most engineers can’t do this, nor should they.

When writing new code, most of my time is spent exploring ideas and intentionally proving to myself that the computer is doing what I’m telling it to. Programmers like to call this hacking, as in “I have a cool idea, give me a couple of hours to hack something out.”

A programming language geared towards this kind of exploratory hacking always comes with an interactive console, often referred to as a REPL¹. Languages like Ruby, Python, JavaScript, Erlang, Groovy, and Lisp ship out of the box with a REPL, while languages like C++ and Java don’t².

The REPL serves a single purpose: to allow programmers to freely experiment with their ideas both interactively and iteratively. It’s a forgiving environment. It neither cares if you make mistakes nor requires you to write ceremonious amounts of code in order to prove an idea works. A REPL lets you write and verify that your code works, so it often inspires or serves as the early revision of your program.
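To make the acronym concrete, here is a toy Read-Eval-Print Loop in Ruby. It is a sketch, not how irb is implemented: it reads a line, evaluates it with Binding#eval, prints the result, and loops, forgiving mistakes by printing the error instead of crashing. `fresh_binding` and `toy_repl` are hypothetical names.

```ruby
require "stringio"

# A binding with no local variables of its own; values the user assigns
# persist across evaluations on this same Binding object.
def fresh_binding
  binding
end

# A toy Read-Eval-Print Loop. Real REPLs such as irb add multi-line input,
# history, and completion on top of this basic cycle.
def toy_repl(input = $stdin, output = $stdout)
  context = fresh_binding
  while (line = input.gets)                   # Read
    begin
      result = context.eval(line)             # Eval
      output.puts "=> #{result.inspect}"      # Print
    rescue StandardError, SyntaxError => e
      output.puts "#{e.class}: #{e.message}"  # forgive the mistake and keep going
    end
  end                                         # Loop
end

session = StringIO.new("x = 6 * 7\nx + 1\nundefined_name\n")
transcript = StringIO.new
toy_repl(session, transcript)
puts transcript.string
```

Because every line is evaluated against the same binding, the `x` defined on the first line is still available on the second, which is what makes iterative experimentation possible.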

Have you ever had one of those “ah hah” moments when learning something new? I’m sure you know what I’m talking about. They’re the moments that further your domain knowledge on a subject. The moments that build on each other, eventually bridging the distance between novice and master. Frequently using a REPL to learn a programming language causes “ah hah” moments all over the software part of your brain.

You can think of exploratory hacking as the first step to writing solid code.

Writing solid code

In the <insert your industry here> industry, the most important thing you need to learn is how to communicate ideas effectively. To a software engineer, this means learning how to communicate ideas with your code, which, depending on the collective professional experience of you and your team, ultimately means writing tests that communicate and enforce design.

There’s a ridiculously large body of knowledge on the topic of testing, addressing important questions such as:

  • Whether you should strive for 100% test coverage
  • Which testing framework you should use
  • Would Danica Keller code first, or test first

My personal philosophy on testing is simple:

We test software to drive communication between team members, to define and enforce the behavior of software, and to share mental assertions about what might one day be an extremely large body of code.

This means that a team sometimes needs to write a lot of tests, and sometimes it doesn’t. It also means you can get away without tests if you’re a single person working on a prototype.

At OpenRain, when new engineers join our team, they’re assigned the task of writing tests for an existing project before they start building features. Occasionally, this catches them by surprise, and the reaction isn’t always positive; some take it as a sign that we doubt their abilities. I reassure them that this is not the case:

We’re not telling you to write tests because we think you’re a bad programmer. We’re not trying to torture you. We know you’re smart. You’ve got a long and successful track record of building software, you play four instruments, you teach the Argentine Tango, and you can program in six languages. But we didn’t hire you to work for us; we hired you to work with us, and to do that effectively, you need to know how the software works. Writing tests will help you do just that. It will give you a chance to explore the code, to learn how different pieces relate to each other, and to make your own assertions about the way things work.


  1. A Read-Eval-Print-Loop is the coolest toy ever. If your programming language doesn’t ship with one, run!

  2. Yes, I know about BeanShell, but I did say “ship out of the box.”

Presentations, Talks, Etc