Lessons learned from contributing to OpenSSL
If you watch a pro golfer, he makes it look easy: you don’t see the hours spent on the driving range, and you’d be forgiven for thinking that birdies are common occurrences. In my latest adventure, I ported simdutf’s base64 algorithm into OpenSSL, improving its base64 performance by up to 4x. In total, only 1331 LOC were added. Over a period of about 6 months, that doesn’t seem like a lot.
In this article, I will go over a few lessons I’ve learned that might not be apparent from just reading the PR itself. It might serve as something of an epilogue to my undergrad degree in CS at TELUQ: how I went from self-taught basement hacker to OpenSSL contributor.
But first, a Preface
I was the code slinger for the project, so all mistakes mentioned here are my own.
Throughout my stay at TELUQ, my adviser, professor Daniel Lemire, held strong personal beliefs with respect to freedom. On X, that means mostly libertarian-leaning posts. For me, it meant that I was free to do (almost) anything I wanted, as long as I could justify it.
In this particular instance, updating OpenSSL was like building an apartment in the middle of Kowloon Walled City. The latter housed tens of thousands of souls in a space no greater than 4 or 5 football fields. The former dwarfs simdutf in LOC (350k), age, and number of contributors. In both cases, it is unlikely that anyone could hold all of it in his or her head.
Neither of us was familiar with the codebase, which made me the proverbial man on the ground. Not by virtue of rank, but merely by virtue of information. De facto, I took ownership of the project.
Lesson #0. Seek out other people as early as possible.
I am a deep introvert. I enjoy programming in isolation, and to a great extent, I believe that (some) degree of solitude is necessary for deep thought. In theory, all you need to get started is my Michelin guide to SIMD programming in 30 days.
However, I would caution against launching completely solo projects even if you have the technical skills to do so.
Mostly for these reasons:
It’s easy to go down a rabbit hole that will never get traction and will have little to no impact. The projects you choose reflect the company you keep.
I myself don’t have a dataset that requires base64 decoding at 17 Gb/s. I would not have been as challenged technically if I had kept the work solely to my personal needs, if only because I am frugal.
It’s easy to reinvent the wheel by mistake. E.g., there might be tools that the community has developed, but that are not readily surfaced by ChatGPT or googling.
In systems programming, there is a slew of useful contextual tricks that only come up in obscure places (i.e., not leetcode).
What might take someone a 30-second reply on GitHub pointing to a specific .c file might save you a day or two of sweat. You get better by accumulating a bag of such tricks over time.
It’s easy to work in isolation on something that is hard for you, when your skills are probably best put to use elsewhere. E.g., it is easy to mistake effort for output.
As the old saying goes “Anyone can come up with an encryption scheme that he can’t break.”
When you work with others, the easy pickings have typically already been made, so, much like a mountain climber following a well-worn path in the snow, you have a greater chance of reaching the summit earlier.
Lesson #1. Good writing skills are really important
One big downside to working in isolation is that you don’t get to practice communication. Formalizing your thoughts takes time and effort.
If I wanted a second opinion, I had to make decisions as to what I should present and what I should leave off the table. If I wanted the project to take another direction, I would have to distill what I knew, present it in a small enough chunk as to be digestible, and make my case.
When I wrote good summaries, it was easier to communicate my ideas; when I did a so-so job, I would have more explaining to do over Zoom (that only happened twice in total during my stay at TELUQ, but it’s worth mentioning).
I kept copious notes during the time I worked on OpenSSL. Originally, this series of blog posts was much more expansive, but after all the editing, I kept only maybe 10% of everything I’d written.
Lesson #2. Code is writing too.
This seems obvious, but code is writing too. As such, code should be as simple as possible. How simple? As simple as this line of flash fiction:
“For sale: baby shoes, never worn”
The reason being that as a programmer, you’re selling the idea that the benefits outweigh the costs of understanding your code.
The chance of having a PR rejected is never zero, and one underappreciated fact is that maintainers are busy people: through all the discussions, revisions, and back and forth, maybe 5 people were nice enough to comment and critique, and the review took a few weeks. Needless complexity means one more person who can say “No”.
In this particular case, the PR didn’t just increase speed for end users; it would also serve as a model for future use of SIMD intrinsics in the OpenSSL project. That meant, potentially, every future contribution to OpenSSL.
Using common and simple patterns also makes it a lot easier to onboard other people. I have a lot of empathy for the undergrad who will update my code to AVX2048 20 years from now.
Lesson #3. When porting, simple differences in features can seriously trip you up
There were several reasons why the project took longer than we had both planned. The first is that I underestimated the amount of work a particular feature would take. I had a choice to make: go for Base64 encoding or decoding. Originally, I went for the latter.
These are my notes from March 26th documenting two functions for base64 decoding (no need to go through them in detail; they are only to give an idea):
EVP_DecodeUpdate(EVP_ENCODE_CTX *ctx, unsigned char *out, int *outl,
const unsigned char *in, int inl):
Decodes up to inl characters from the input buffer (in). ✅
Stores the decoded output in the buffer out. ✅
The number of output bytes is stored in *outl. ✅
Caller Responsibility: Ensure that the out buffer is large enough for the output data. ✅
Processes data in chunks of up to 80 base64 characters at a time, using the ctx buffer to do so. (❌ We probably need to jettison this, as simdutf processes different chunk sizes dictated by SIMD; everything is kept in SIMD registers.)
Buffering: (❌ See above)
If the input chunk is shorter than the internal size and its length is not a multiple of 4 (including padding),
it is buffered in ctx for later processing.
The later processing seems to be done in EVP_DecodeUpdate itself.
If the final chunk length is a multiple of 4, it is decoded immediately without buffering.✅
Whitespace Handling:
Ignores any whitespace, newline, or carriage return characters. ✅
Added: an important distinction: form feed (\f) is considered whitespace in simdutf but not in OpenSSL.
Soft End-of-Input: (❌)
The hyphen (-) is treated as a soft end-of-input (for PEM compatibility). https://wiki.openssl.org/index.php/Base64.
Subsequent bytes are not buffered.
A return value of 0 indicates that the soft end-of-input has been detected.
The soft end-of-input, if present, MUST occur after a multiple of 4 valid base64 input bytes.
Note: The soft end-of-input condition is not stored in ctx; the caller must avoid further calls to EVP_DecodeUpdate() after receiving a 0 or negative return.
Error Handling: ✅
If any invalid base64 characters are encountered, or if the padding character (=) appears in the middle of the data, the function returns -1 to indicate an error.
Return Values: ⚠️
A return value of 0 or 1 indicates successful processing.
A return value of 0 additionally indicates that no more input data is expected, due to padding (=) or soft end-of-input (-).
Output Ratio: ✅
For every 4 valid base64 bytes processed (ignoring whitespace, carriage returns and line feeds), 3 bytes of binary output are produced (except at the end, if one or two padding characters are present).
SRP Alphabet support: (❌)
Not in the documentation, but there is an option in the code to explicitly use the SRP alphabet.
Caller must reject return value: (✅ We do not use ctx?)
* Note: even though EVP_DecodeUpdate attempts to detect and report end of
* content, the context doesn’t currently remember it and will accept more data
* in the next call. Therefore, the caller is responsible for checking and
* rejecting a 0 return value in the middle of content.
Side effects on error: it sets ctx->num to the number of partial bytes encoded, and *outl to the bytes decoded.
EVP_DecodeFinal()
Should be called at the end of a decoding operation.
It assumes there are no whitespaces in the input.
Differs from our own scalar function in that it does not operate on arrays of base64 characters but rather deals with what is left in the ctx.
Does not decode any additional data.
Return Values: ⚠️
Returns 1 if there is no residual data.
Returns -1 if there is residual data whose length is not a multiple of 4 (indicating improper padding).
The warning signs and ❌’s signify differences between OpenSSL’s and simdutf’s base64 decoding. In isolation, they don’t look like much. However, much like ants overtaking a tarantula, when combined with a SIMD algorithm, emergent behaviour will overtake the unwary programmer. Each of these points represents a possible point of failure: another test to write, another combination that could go wrong.
In SimdUnicode, a much simpler feature took me around 1.5 months to fix for similar reasons: granted, some of that was due to inexperience, but I would still consider it a tough endeavor today, even with an extra year or so of experience. The bullet points should have given me pause, but I decided to press on regardless.
No excuse: I thought it was the bigger prize and that it was going to be easy. While there was steady progress, my pride didn’t let me quit. But I had to admit that ploughing through that list was going to take far more time than it was worth, and so, after conferring with Daniel, I decided to cut our losses and focus on encoding instead, which does not involve error handling.
Not all was lost: the time taken to grok the codebase was directly applicable to the final PR. But being more conservative and less ambitious at the start would have been the wiser choice.
Another such feature was newline insertion, but that is a topic for another blog article.
And so concludes the last three years of my studies with TELUQ and Daniel Lemire. I really enjoyed the experience; I hope you had as much fun reading these posts as I had writing them.

