Andreas Zeller's Old Blog: April 2016

Ever heard of Paul Erdős? The 20th-century Hungarian mathematician is known not only for his numerous contributions to Mathematics, but also for his many collaborations, involving more than 500 co-authors. Frequently, he would just show up on their doorstep, work with them for some hours, and then get a joint paper out of that. A low Erdős number indicates academic closeness to Erdős, and is something one can brag about at academic venues. Yet, if Paul Erdős knocked on your door today and asked whether you would like to work with him, you should avoid any collaboration with him – if you work in Software Engineering, that is. Why is that?

The International Conference on Software Engineering, or ICSE for short, is the flagship conference of the field of Software Engineering. If you want to publish and present your greatest work, this is where you submit it. An ICSE submission is reviewed by three peer researchers, whose assessment eventually determines whether your work is accepted or not. Even if your work gets rejected, you at least get detailed reviews and high quality feedback.

Over the years, ICSE has observed that there were authors who apparently were way more interested in the reviews than in getting their papers accepted; authors who would submit up to ten papers, of which none got in; but the authors would at least get thirty reviews, all for free. This motivated ICSE to install a new limitation: Any single author can now appear only on up to three papers. If you have four papers ready for submission, then you are supposed to select the best three.

The ICSE program chairs argue that few authors and even fewer acceptances would be affected by this decision. But the problem with this decision is not the factual impact. It is the potential impact. What if a modern Paul Erdős knocked on your door and offered to work with you? You'd have to say no, because he would have too many co-authored submissions already. What if you could not submit to the past conference, because you were the one organizing it, and still have work that wants to get published? What if four of your students all have great results at the same time, results that should be shouted out to the world? Well, too bad: you can only submit three of them, causing depression in the fourth student who is left out. None of these is likely to happen, but the fact that it could happen is causing concerns and anxiety, and rightly so. An open petition asking ICSE to revert its new rules has gained dozens of supporters overnight. (Disclaimer: me too.)

The Software Engineering community has members who have literally devoted their lives to Software Engineering research. They have no spouses, no kids, they work day and night. The boys would be out for a skiing weekend, and the girls would be out in their summer clothes – these folks are busy on the paper that they hope will make them famous. They serve on program committees, they write reviews, they organize conferences, they help others on their PhD theses. They are amazingly productive both on their own work as well as on the work of others. And these are the men and women to whom the new ICSE rules send the message: Thank you, but no, thank you.

The problem of ICSE – and our community in general – is not so much an abundance of papers. It is the lack of reviews. It is our publications that determine our academic worth; much less so teaching; and even less so service. Great papers get you tenure and a raise, whereas great reviewing might get you a committee dinner. Rationally thinking, why should one spend time on reviews while one might just write papers that would get reviewed by others? Fortunately, the large majority of our community is still driven by the Categorical Imperative: We profit from the reviews of others, so we review their papers, too. What we don't like is members who game the system by not only submitting lots of papers, but also not participating in the review process.

Therefore, what our community needs to do is twofold. First, we need to think about reviewing processes that scale well and get high-quality reviews. The ICSE program board model is a step in the right direction; a VLDB-like journal model might be even better. Second, we should not penalize researchers for their own productivity; but instead create incentives for researchers who spend great effort on reviews and service. Rule by the carrot, not by the stick.

Such incentives for service should not be monetary (these wouldn't motivate researchers anyway); nor should they result in a different reviewing or acceptance process (this would be perceived as unfair). But how about raising the limit of submissions if you have a co-author who is also a frequent reviewer? Or allowing reviewing volunteers to apply for a one-day extension to the conference deadline? (You'd get plenty of applications on the last day :-) Or provide "fast track" journal reviewing for those authors who sport a status of "distinguished reviewer"? With such incentives, if a prolific reviewer like Paul Erdős knocks on your door, you would not boot him, but embrace him instead.

When you're organizing a big scientific conference – conventions where scientists from across the world would convene to exchange their latest and greatest –, you have to think about zillions of different things: rooms, projectors, food, coffee, budget, accommodations, badges, speakers, leaflets, dinners, just to name a few. It is impossible not to make mistakes, but it is generally possible to fix them once you know. The worst mistakes, though, are the ones you never thought could be made.

Some time ago, I registered for one of these scientific conferences. The process is simple: You enter your details, select optional packages, finally enter your credit card data, and you're done. This being a computer science conference, you would think your data is all secure in the hand of experts. As I can now tell you from experience, this assumption is wrong. Very wrong. This single registration system contained not just one security flaw, but four – all independent of each other.

My Registration Screen

Security Flaw #1: The identifiable ID, or How I would be able to access the data of every conference participant

The fun began when I got my confirmation e-mail. Apparently, I was the first person to have registered with the conference – because my participant number was one. ("Hey – look at me; I am participant number one!" I said.) In my e-mail, I also got a link with which I would be able to access my registration. Following it would immediately lead me to the above registration screen.

The link was a bit unusual, because it would not contain any other "secret" information or token other than my participant number, though. (You might assume it would encode my name, my ZIP code, or some other information tied to me only.) So I asked myself: What would happen if I change the link from "?parm1=1" to, say, "?parm1=2" – that is, participant number two? I entered the link into my browser, and immediately, I saw the registration screen of Lars Grunske, a colleague of mine in Stuttgart, Germany.

Lars Grunske's registration screen

Now having the same privileges as Lars, I would be able to read and change all data at will. The idea arose to have him buy a few extra dinner tickets at his expense, but only briefly so. Trying the same for further participants gave the same result (Hi Abhik!, ¡Hola Yasiel!).

In 2011, a similar mistake was made by UNESCO, who also used consecutive numbers for its internship applicants, and who thus leaked hundreds of thousands of applicant records on the Web. (German Article on Spiegel.de) What do you do when you discover such a problem? To protect the integrity of participant data, I dutifully reported the problem to the organizers, who immediately replied the issue would be fixed as soon as possible.

Lesson 1: When handling personal data, set it up such that access requires a secret that cannot be easily guessed.
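One way to follow Lesson 1 is to put a long random token into each registration link and check it on every access. The following is only a minimal Python sketch under stated assumptions: the in-memory store, the function names, the example.org address, and the parameter names are all made up for illustration and have nothing to do with the actual conference system.

```python
import hmac
import secrets

# Hypothetical in-memory store: participant number -> access token.
_tokens = {}

def issue_link(participant_id):
    """Create a registration link carrying an unguessable token."""
    token = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe encoded
    _tokens[participant_id] = token
    # example.org and the parameter names are placeholders.
    return f"https://example.org/registration?id={participant_id}&token={token}"

def may_access(participant_id, token):
    """Grant access only if the token matches the one issued for this ID."""
    expected = _tokens.get(participant_id)
    if expected is None:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), token.encode("utf-8"))
```

With such a link, changing the participant number from 1 to 2 gets an attacker nowhere, because the token for participant 2 would have to be guessed as well.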

Security Flaw #2: The Unsanitized input, or How I easily bypassed password checks

The next day, I got a new mail from the organizers. In addition to lots of high end security stuff (which would not protect from guessing a participant number), they now had introduced a secret word only known to the registrant, commonly known as a password.

Okay. I went to the site, and it indeed now requested that I enter my ID and password.

Revised login interstitial screen

Problem solved? Not at all. I sent the above mail to my Post-Docs Marcel Böhme and Juan Pablo "JP" Galeotti, with whom I had talked about the problem the day before. Minutes later, Marcel sent me back an intriguing message:

Incredible. Could one really attack the system this way? Ten minutes later, JP chimed in with

Ha! Indeed, this worked like a charm. Eventually, I would simply enter "2' -- " as my ID, and any string as my password – and again, I would be Lars Grunske, and would be able to alter his data at will. Likewise, anyone with the above trick could do the same to my data.

How does this work? Internally, the conference registration system uses a database that is controlled by so-called SQL commands. When I enter my ID, say, "1", and my password, say, "1234", the system selects my data from the database using a SQL command looking like this:

SELECT * FROM REGISTRATIONS WHERE ID = '1' AND PASSWORD = '1234'

Note how the number I entered as ID ("1") becomes part of the command. By entering "2' -- " as ID and "whatever" as password, we get the command

SELECT * FROM REGISTRATIONS WHERE ID = '2' -- ' AND PASSWORD = 'whatever'

In a SQL command, anything starting with two dashes "--" is treated as a comment and ignored. So the system simply fetches the data from the registrant whose ID is 2, ignoring the password. This is known as a SQL injection attack. A common stopgap is to escape or filter out all characters that would have a special meaning in SQL commands (like "'" or "--"); the more robust standard defense is to pass user input as parameters to a prepared statement, so that it is never interpreted as part of the command.
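The attack can be reproduced in a few lines. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table layout, the sample rows, and the two function names are assumptions for illustration, not the real registration system. It contrasts a login that pastes user input into the command text with one that passes the input as parameters.

```python
import sqlite3

# Toy database that mirrors the flawed setup (plaintext passwords included).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE REGISTRATIONS (ID TEXT, PASSWORD TEXT, NAME TEXT)")
conn.executemany(
    "INSERT INTO REGISTRATIONS VALUES (?, ?, ?)",
    [("1", "1234", "Participant One"), ("2", "s3cret", "Participant Two")],
)

def login_vulnerable(user_id, password):
    # User input is pasted into the command text, so it can change the command.
    query = (
        "SELECT NAME FROM REGISTRATIONS "
        f"WHERE ID = '{user_id}' AND PASSWORD = '{password}'"
    )
    return conn.execute(query).fetchone()

def login_parameterized(user_id, password):
    # Placeholders keep the input as data; it is never parsed as SQL.
    query = "SELECT NAME FROM REGISTRATIONS WHERE ID = ? AND PASSWORD = ?"
    return conn.execute(query, (user_id, password)).fetchone()

# The injected ID comments out the password check in the vulnerable version.
print(login_vulnerable("2' -- ", "whatever"))
# The parameterized version looks for an ID that literally reads "2' -- ".
print(login_parameterized("2' -- ", "whatever"))
```

The first call returns the row of participant 2 without knowing the password; the second finds nothing, because no registrant has the ID "2' -- ".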

Refining my ID to, say, "2'; DROP TABLE REGISTRATIONS; -- ", I might even have been able to delete all registration data. (I hope they do backups!) How could one set up a SQL-based system and never have heard about SQL injection? Now this was beginning to get embarrassing.

Lesson 2: When setting up a publicly accessible service, identify common attack vectors and protect against them – for Web sites: buffer overflows, SQL injection, cross-site scripting, etc.

Security Flaw #3: Plaintext Passwords, or How I would now also steal personal passwords from all participants

But the embarrassment was not over yet. Remember how the e-mail above asked users to set up their own passwords? It turned out that the passwords were actually stored and displayed in the clear, as seen on Lars' revised registration screen:

Lars Grunske's registration screen, now with password

The password listed was Lars' password; saving it would allow me to log in with his user ID and password. I could easily have skimmed all passwords of all participants and I could have logged in long after the SQL vulnerability had been fixed.

But storing passwords in the clear is a bad practice for many more reasons. It gives the administrator access to all passwords, which provides an opportunity for theft. Plus, and this is probably the worst: Many people tend to use the same passwords for different sites. Had Lars indeed changed his password as requested, and for instance chosen the same password he would use for Amazon or eBay, I would be able to log in at these sites on his behalf, and happily order stuff.

Had Lars used the same password he also uses for his mail, I could have accessed all of his passwords, everywhere – a simple click on "I have forgotten my password" links would have triggered "password reset" mails to his account, which I could easily have skimmed. I'm a nice guy, so I did none of this. (Plus, I have a cool joint research project with Lars.)

To cut a long story short: I sent another urgent report to the organizer, and hours later, the SQL vulnerability was closed. The one we had found, that is. No idea whether other vulnerabilities would be hidden somewhere in there, or how the system had been tested for security, if at all.

Lesson 3: Passwords should never be stored, displayed, or transmitted in the clear. Store hashes instead; and if a user has forgotten their password, create a new one rather than revealing the old one.
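What "store hashes" can look like in practice is sketched below in Python, using only the standard library. This is one possible approach, not what the conference system did after the fix: the function names are hypothetical, and the choice of PBKDF2 with a random salt and the iteration count are assumptions that would need tuning for a real deployment.

```python
import hashlib
import hmac
import os

# Illustrative work factor (an assumption); tune it for your hardware.
ITERATIONS = 200_000

def hash_password(password, salt=None):
    """Return (salt, digest); only these two values are stored."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest

def verify_password(password, salt, digest):
    """Recompute the digest from the attempt and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, digest)
```

Since only the salt and the digest are kept, there is no password left to display on a registration screen, and a leaked table does not directly reveal what people typed.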

Security Flaw #4: Compromised Forever, or How nobody would be able to change their passwords

Was it really the case that Lars had ignored the instructions and kept his original password? After all that had happened, I thought that maybe someone had done the same thing I had done, and thus now had access to my conference password. So I decided to change it online. However, it turned out that changing the password did not work – you would always retain the old one, which would still be happily displayed for you. The good news was that this way, nobody would have been able to reuse existing passwords – and anyway, had I really wanted my password changed, another mail to the organizers might have done the trick. At this point, I decided not to further stress the relationship between the organizers and their software developers and leave this be.

Lesson 4: Always allow your users to reset their access data if they fear it may have been compromised.
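A password change that actually takes effect can be quite small. The Python sketch below is a hypothetical illustration: the in-memory store, the function names, and the PBKDF2 parameters are assumptions. The point it shows is that a successful change replaces the stored credential, so the old password stops working at once.

```python
import hashlib
import hmac
import os

_store = {}  # hypothetical store: user ID -> (salt, digest)

def _digest(password, salt):
    # Iteration count is an illustrative assumption.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 200_000)

def set_password(user_id, password):
    salt = os.urandom(16)  # fresh salt on every change
    _store[user_id] = (salt, _digest(password, salt))

def check_password(user_id, password):
    entry = _store.get(user_id)
    if entry is None:
        return False
    salt, digest = entry
    return hmac.compare_digest(_digest(password, salt), digest)

def change_password(user_id, old_password, new_password):
    """Replace the credential; the old password stops working immediately."""
    if not check_password(user_id, old_password):
        return False
    set_password(user_id, new_password)
    return True
```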

All was well that ended well: The conference was truly magnificent, and as far as I know, nobody's data got compromised in any way. Of course, anybody could have gone through the steps described above, skimming data without ever reporting. But luckily, the first registrant (me) pointed out the issues before some fraudster could have spotted them, and of course, any of my colleagues would have done just the same. When your customers are nice people, consider yourself lucky.

Post Scriptum: The Horrible Homebrew, or Why it may be better to build on well-tested platforms

This brings me to the meta lesson to be learned here – and this tells something about process rather than product. If you set up a system from scratch, be it a conference management system, a shop, a student registration system, whatever, be aware of the many risks this entails, and be sure to have independent and thorough security testing. Using an existing, established, well-tested system instead may lower risk and overall cost, even if it may cost more upfront. When the damage is done, you wish you had decided differently – but then, it may be too late.

Final Lesson: When deciding between building and using a system, consider all risks and associated costs. If you build a new system, thoroughly test it for security. If you use an existing system, be sure it is well tested.

Andreas Zeller's Old Blog

2016-04-17

The new ICSE Erdős penalty, or why we should create incentives for frequent reviewers

2016-04-09

Four security flaws illustrated, all on one conference registration site