What makes good research?

When joining my group, one of my PhD students was reproached by one of my colleagues: Having been best in class across all theory courses, why would she "waste her talent" with a practitioner like me? Ah, the good old debate between theory and practice. As a researcher in software engineering, I will never produce work as elegant as what my colleagues in theory do, attacking the most complex problems with beautiful abstraction. But then, I know colleagues in theory who feel inferior to their more applied colleagues, because what these do actually has impact and draws interest in the real world.

Obviously, different people have different ideas on what makes good research. But what are these categories?

Six years ago, I had a conversation with the founder of our faculty, Günter Hotz. Our faculty is well-known for its consistently excellent hiring, based on ideas Hotz installed 40 years ago. So I asked him: How do you recognize good researchers? Or better: good research? In response, he presented what I hereby name the Hotz model of good research:

It is actually fairly simple. There are three criteria, which are orthogonal. (He raised his hand, spreading out thumb, index, and middle finger at right angles to each other.) You ask the following questions:

First, was it hard? When you look at the research, is this something which required years of training and perspiration, or is this something anyone on the street could have come up with?

Second, is it elegant? Is this something which solves a specific problem, or is this something that you can apply over and over again to new problems?

Third, is it useful? Is this of value only to the academic community, or is this something people outside can use to create value or make money?

If you're good on one of the axes, you are doing good research.  If you're good on two axes, we'll hire you.  And if you're good on all three axes – well, I have yet to meet somebody who excelled in all three.

On this, he looked me in the eye and said:

Actually, now that I think of it, the "usefulness" axis is a fairly recent addition to the system.

There are a number of lessons to be learned from this. First, if your work was very hard, is very elegant, or is very useful, you have every reason to be proud – and you deserve the respect of your fellow researchers, even if you don't share the same axes. In software engineering, usefulness is first and foremost, but truly hard or elegant work can still leave me highly impressed. If I get an idea of how it may be put to use, I feel struck by shock and awe.

Second, there is little chance you will be able to excel on all three axes. Usefulness calls for applicability in the real world, whose details will quickly spoil the elegance of your approach. Elegance implies simplicity, and simple is the antonym of hard – unless you claim it was really hard to get your approach simple. Making something really useful is hard, but was it a true intellectual challenge, or just a long engineering process? Difficulty, elegance, usefulness – pick any two.

Third, usefulness is still a recent criterion by academic standards, and as a new kid on the block, it may struggle to get accepted by traditional academics. But computer science has long left the academic ivory towers, and is progressing at a furious pace in the real world. As a young researcher, why limit your impact to dusty publications, or wait until the real world finally adopts your approach?

You have every right to choose the axis (or axes!) on which you plan to excel. Different people make different choices, but remember: The axes are orthogonal, and no axis dominates the others. The only important thing is to push the boundaries as far as you can. Face the hardest problems. Find the most elegant abstraction. Make it really useful. Push, push, push, with all love and respect. That's what makes good research.


August (and Christmas!) should be free of conference deadlines

The deadline for technical papers of this year's ICSE, the International Conference on Software Engineering, has been set to August 17 – a whopping 9 months before the conference takes place, and right in the middle of the busiest holiday month.  In most of Europe, schools and kindergartens are closed for holidays during that time, because it is naturally assumed that you'll be on holiday, too.  I can't even begin to speculate how such a deadline disrupts holidays and family time – and eventually harms the quality of submissions.

In the past ten years, the ICSE deadline has never been that early; except for ICSE 2011, it was always set in September.  For me, this means that half of the papers our group intended to submit for ICSE (essentially, all those which aren't complete next week) will now go to ICST instead.  Not only is the ICST deadline one month later and thus much more family-friendly, the conference is also two months before ICSE, implying a much quicker dissemination of results.

I suggest we as a software research community set up an implicit rule: No paper deadline in August.  If an August deadline is unavoidable, it could be used for abstracts (such that PC members can bid on the submissions), followed by a September deadline for the full papers.  If you agree, go and press the appropriate button.

[Update 2012-12-19: ECOOP 2013 ups the ante by extending their deadline to December 23.  Does the conference really expect researchers to drop their holiday preparations and write a paper instead?]


Are classroom lectures doomed?

I just finished recording an online course on software debugging at Udacity.  The format is superior to anything I could ever deliver in a classroom.  Universities should worry.

Udacity, a startup aiming to "democratize university-level education", offers free online courses on various subjects of higher-level education.  At this point, it is mostly computer science: artificial intelligence, programming languages, software testing – and soon: software debugging.  The Udacity format is pretty unique; what you see most of the time is my hand as it is writing, drawing, and doodling on an electronic sketchpad, while I explain and deliver the material.  Every two to three minutes, you get a quiz (which is checked automatically), and you can only proceed if you answered it correctly.

The nice thing about this course format is that it combines the best aspects of the classroom and the textbook experience.  When I was a student, I sat through many hours of lectures where math professors delivered one proof after the other.  I usually lost track after 10-15 minutes, so what I got from the lecture was a mental note to work through the textbook proof later at my desk.  I always thought that this was just my own experience, but I recently found that it was actually shared by many of my current PhD students (who happen to be among the best of their class, including math).  Simply videotaping such lectures and making them available online will not make them more attractive, and I never felt this to be a viable alternative.

With the Udacity format, this is very different: As I am demonstrating a proof, I can come up with a quiz for every step, thus making sure the student really understands what is going on.  The fast students will quickly answer the questions, while the slower students will be able to repeat the unit at will until they are ready.  In the future, one could even imagine additional explanations for students who keep on failing the quiz, or extra challenges for students who come up with quick correct answers.  The course would thus automatically adapt to the weaker as well as to the stronger students – which is way superior to having a "one-size-fits-noone" classroom lecture.

Yes, preparing an online lecture in this format is quite some work; it is like preparing a textbook and its presentation at the same time. But then, your lecture scales to an unlimited number of students.  And if the students find the online format more attractive than the classroom format, this will have far-reaching consequences for higher-level education.

One could hope that professors will have much more time for face-to-face interaction – in seminars and lab projects, for instance – and for inspiring and judging the creativity of their students.  What I see, though, is that lectures as we know them may face the same fate as the CD, as the DVD, or as the encyclopedia – namely to be replaced by well-made digital offerings.  That alone is not so much of a problem.  But think of the fate of record stores, video rentals, or soon book stores – and you may get an idea of what universities should be fearing.


Coming soon: an online course on debugging

In the past weeks, I have been preparing an online course on software debugging for Udacity, the Silicon Valley startup that aims to "democratize education".  The course will be highly interactive, in seven 60-minute units with quizzes every 2-4 minutes, and deliver a systematic approach to debugging.  During the course, I will have the participants build automated debugging tools in Python, such as
  • an interactive debugger,
  • delta debugging on inputs,
  • inferring dynamic invariants,
  • statistical debugging, or
  • mining software archives.
Preparing these tools in Python was amazingly straightforward (less than 45 minutes each – but then, I'm an expert); Python offers simple, yet effective tracing facilities that grant access to all events and states during execution.  (My only gripe is the lack of easy static analysis.)  Since systematic debugging is not frequently found in computer science curricula, I hope to cater to students as well as to professionals who are looking for additional training – and, of course, to improve the state of the practice in debugging.
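
To give a flavor of the tracing facilities mentioned above, here is a minimal sketch built on sys.settrace from the Python standard library.  It is not taken from the course material; the names trace_lines and running_total are made up for illustration.  The sketch records, for every line executed in the traced function, the function name, the line number, and a snapshot of the local variables, which is roughly the raw material an interactive debugger or an invariant detector would start from.

```python
import sys


def trace_lines(func, *args):
    """Run func(*args); return its result plus one record per executed line.

    Each record is (function name, line number, copy of the local variables)
    taken just before that line runs.
    """
    events = []

    def tracer(frame, event, arg):
        if event == "line":
            events.append(
                (frame.f_code.co_name, frame.f_lineno, dict(frame.f_locals))
            )
        return tracer  # keep receiving events for this frame

    previous = sys.gettrace()  # remember any tracer that was already active
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)
    return result, events


def running_total(numbers):
    """Hypothetical function under observation."""
    total = 0
    for n in numbers:
        total += n
    return total


if __name__ == "__main__":
    result, events = trace_lines(running_total, [1, 2, 3])
    for name, lineno, local_vars in events:
        print(name, lineno, local_vars)
```

Copying the locals on every line keeps the sketch simple, but it would be slow on long runs; a real tool would likely filter events or record only changes.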

I will be spending the next two weeks with Udacity in Palo Alto to record the units.  The format will be Udacity-like: Most of the time, you'll only see my hand writing, doodling, sketching, and developing the material on the screen while I am talking.  You can always stop and repeat as you like (or fast forward until the next automated quiz).  I am pretty excited about the format, and very much look forward to an exciting course as well.

To see what an online course at Udacity looks like, check out Wes Weimer's CS 262: Programming Languages (Building a Web Browser) and go to "Preview the Class".
To learn more about the recording process, see John Regehr's blog on recording a class at Udacity.
To learn more about my work on debugging, see my book "Why Programs Fail".


Turning iPad PDF annotations into anonymous reviews

Update: Updated script to 1.1 (2012-10-15)

As a researcher, I frequently peer review the papers of other researchers.  The standard way to do this is to print out the paper, make handwritten notes on it, and then convert it all into a written, anonymous review text that gets sent to the committee and the authors.  Recently, I have started reading papers on my iPad – it's more colorful, more interactive, and more portable.  Using GoodReader, I can annotate the papers, too.  But how do I get these annotations into a text that I can send as a review?

Let me show where the problem is.  In the paper to review, I highlight a passage:

In fact, building large applications takes minutes to hours. This is unacceptable.

and add a note "But an incremental rebuild would be much faster. Please discuss." to the highlighted passage.  (In GoodReader, you get this by tapping on the highlighted text and selecting "Open".)  In a PDF viewer, this nicely translates into a highlighted section with a note.  For my students, this is fine.  My anonymous review, however, is supposed to come in text.  I can use GoodReader to turn the annotations into text:

Highlight, 12.04.2012 09:52, Andreas Zeller:
But an incremental rebuild would be much faster. Please discuss.

The problem is obvious: it lacks context (where exactly is the problem?), and won't be helpful for the author.  So I wrote a script for my Mac with which my annotations read like this:

Page 2 "building large applications takes minutes to hours": But an incremental rebuild would be much faster. Please discuss.

This way, the author knows precisely where the issue is; and I can nicely combine markup and comments in one single step.

The script leverages the free Skim PDF viewer and converts the annotations into the form above.  Paste it into AppleScript Editor and run it while Skim shows the annotated PDF; the extracted summary will show up in a TextEdit window.  Enjoy!

-- This script supports anonymous reviews of documents (e.g., submitted scientific papers) via PDF annotations.  It takes an annotated PDF (opened in the Skim PDF viewer) and produces a list of annotations in a form that is easy to edit; this list is copied to the clipboard and also opened in TextEdit. My typical workflow with this script is as follows.
-- 1. I have the PDFs to be annotated in DropBox.
-- 2. I annotate them on my iPad using GoodReader (other programs creating standard PDF annotations should work just as well).  I use the following markups:
--     * UNDERLINE means something straightforward to fix (e.g. typos)
--     * STRIKEOUT means something to delete
--     * HIGHLIGHT means something to comment upon
-- 3. I add NOTES to these annotations (especially the highlights) to comment on text-specific issues.
-- 4. I also add non-anchored notes for general comments not related to a specific piece of text; this is where the summary and the general assessment goes.
-- 5. After editing, I again sync with DropBox and open the PDF in Skim on my Mac.  Then I run this script, and get the summary in my preferred format in a TextEdit window.
-- 6. I edit the summary where needed, and integrate it into the official review form.
-- One can easily extend this script to differentiate more kinds of notes or attributes.

-- Enjoy!
-- Andreas Zeller <zeller@cs.uni-saarland.de>, 2012-10-15

-- Revision Notes
-- 2012-04-15: Revision 1.0
-- Initial Release

-- 2012-10-15: Revision 1.1
-- Adapted for Skim 1.3.22 (extended notes)
-- New: ignore highlight notes without associated text

tell application "Skim"
    set res to ""
    -- All of this only works with Skim notes, so we convert first
    -- Use the front document if there is one; otherwise ask for a file
    try
        set documentName to name of document 1
        set closeAtExit to 0
    on error
        open (choose file with prompt "Select annotated PDF file")
        -- display dialog "Please open the annotated PDF in Skim." buttons {"Quit"}
        -- error number -128
        set closeAtExit to 1
    end try
    set documentName to name of document 1
    set documentFile to file of document 1
    if modified of document 1 then
        save document 1
    end if
    convert notes (document 1)
    set res to res & "== " & name of document 1 & "  =="
    set pageNotes to notes of document 1
    repeat with currentNote in pageNotes
        -- display dialog type of currentNote as string
        set notePage to the index of the page of currentNote
        set noteType to the type of currentNote
        set textForNote to return & "Page " & notePage
        set skipNote to 0
        -- add anchor if present
        try
            set noteSelection to the selection of currentNote as text
        on error
            set noteSelection to "(SELECTION)"
        end try
        set s to noteSelection
        if s is not "missing value" then
            -- remove newlines
            set s to my replace_chars(s, return, " ")
            set s to my replace_chars(s, ASCII character 10, " ")
            -- remove hyphens left over from end-of-line hyphenation
            set s to my replace_chars(s, "- ", "")
            -- trim text
            repeat until s does not start with " "
                set s to text 2 thru -1 of s
            end repeat
            repeat until s does not end with " "
                set s to text 1 thru -2 of s
            end repeat
            set textForNote to textForNote & " \"" & s & "\""
        end if
        set textForNote to textForNote & ":"
        -- add note type
        set t to the type of currentNote as text
        if t is "strike out note" then
            set t to " please delete."
        else if t is "underline note" then
            set t to " please fix."
        else if t is "highlight note" then
            set t to ""
            set skipNote to 1 -- ignore highlights unless with text
        else if t is "anchored note" or t is "text note" then
            set t to ""
        end if
        set textForNote to textForNote & t
        -- add note text if relevant
        if text of currentNote is not noteSelection then
            set textForNote to textForNote & " "
            if t is not "" then
                set textForNote to textForNote & "("
            end if
            set textForNote to textForNote & text of currentNote
            if extended text of currentNote as text is not "missing value" then
                set textForNote to textForNote & "  " & extended text of currentNote
            end if
            if t is not "" then
                set textForNote to textForNote & ")"
            end if
            set skipNote to 0
        end if
        if skipNote is 0 then
            set res to res & textForNote & return
        end if
    end repeat
    -- Copy result to clipboard
    set the clipboard to res
    -- Go back to original file
    revert document 1
    if closeAtExit is 1 then
        close document 1
    end if
    -- display dialog ("You can now paste this text from the clipboard into your review form:" & res)
end tell

-- Now open TextEdit with the summary
tell application "TextEdit"
    make new document
    set n to documentName & " - Annotation Summary.txt"
    tell front document
        set its text to (the clipboard)
        set its name to n
    end tell
    -- save document 1 in ((documentFile as text) & " - Annotation Summary.txt")
end tell

-- Yes, this is how to replace characters with AppleScript.  No comment.
on replace_chars(this_text, search_string, replacement_string)
    set AppleScript's text item delimiters to the search_string
    set the item_list to every text item of this_text
    set AppleScript's text item delimiters to the replacement_string
    set this_text to the item_list as string
    set AppleScript's text item delimiters to ""
    return this_text
end replace_chars
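
For readers without a Mac, the formatting rules the script applies can be sketched independently of Skim.  The following Python sketch is an illustration only: the function names clean_anchor and format_note and the note kinds passed as plain strings are assumptions, not part of Skim's or GoodReader's interfaces.  It takes one annotation (page, kind, highlighted text, comment) and produces a line in the "Page N ..." form shown earlier, skipping highlights that carry no comment.

```python
# Hypothetical helper names; this mirrors the rules of the AppleScript above,
# not any real Skim or GoodReader API.

# Markup kinds that translate into a fixed request to the author
SUFFIX = {
    "strike out": " please delete.",
    "underline": " please fix.",
}


def clean_anchor(text):
    """Join lines, undo end-of-line hyphenation, and trim spaces."""
    text = text.replace("\r", " ").replace("\n", " ")
    text = text.replace("- ", "")  # caveat: also removes a genuine "- "
    return text.strip(" ")


def format_note(page, kind, anchor=None, comment=None):
    """Return one review line, or None if the note should be skipped."""
    suffix = SUFFIX.get(kind, "")
    has_comment = bool(comment) and comment != anchor
    if kind == "highlight" and not has_comment:
        return None  # a bare highlight says nothing to the author
    line = f"Page {page}"
    if anchor:
        line += f' "{clean_anchor(anchor)}"'
    line += ":" + suffix
    if has_comment:
        if suffix:
            line += " (" + comment + ")"
        else:
            line += " " + comment
    return line


if __name__ == "__main__":
    print(format_note(
        2,
        "highlight",
        "building large applications takes minutes to hours",
        "But an incremental rebuild would be much faster. Please discuss.",
    ))
    print(format_note(3, "underline", "teh"))
```

As in the AppleScript, removing every "- " is a blunt way to rejoin hyphenated words and will also swallow a legitimate dash followed by a space, so the resulting summary still deserves a quick read before it goes into the review form.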