Javascript code golfer

Code golf is the art of making source code smaller, usually by abusing the language in some way. The term "golf" is used because, like the sport of golf where the aim is to complete a course in as few strokes as possible, the aim in code golf is to complete a coding task in as few characters or bytes as possible.

The technique below is frequently used in Javascript code golfing (see dwitter.net for some examples). It relies on the fact that Javascript stores strings as UTF-16 (or UCS-2), where each character occupies 16 bits, while golfed source code usually sticks to ASCII characters, which need only 8 bits. What if you could store 2 source code characters inside one UTF-16 char? Well, with a bit of hacking you actually can. Note that this only pays off where length is counted in characters, as on dwitter, rather than in bytes.
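A minimal sketch of the core idea, assuming both characters are ASCII (codes below 0x80): the first character goes into the high byte of a 16-bit code unit and the second into the low byte. The characters `h` and `i` are arbitrary examples.

```javascript
// Two ASCII characters, each with a code below 0x80 (fits in 8 bits).
const first = "h".charCodeAt(0);  // 0x68
const second = "i".charCodeAt(0); // 0x69

// Pack: the first character becomes the high byte, the second the low byte.
const packed = String.fromCharCode((first << 8) | second);
console.log(packed.length);                     // 1
console.log(packed.charCodeAt(0).toString(16)); // 6869

// Unpack: split the 16-bit code unit back into its two bytes.
const unpacked =
  String.fromCharCode(packed.charCodeAt(0) >> 8) +
  String.fromCharCode(packed.charCodeAt(0) & 0xff);
console.log(unpacked); // hi
```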




Encoding, step by step

First we make sure the string has an even number of characters, appending a space to the end if necessary. Then, for each character, we obtain its character code in hexadecimal.
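A minimal sketch of this first step, assuming ASCII-only input so that every code fits in two hex digits. The helper names `padToEven` and `charCodesHex` and the sample input `c.width=9` are hypothetical, chosen for illustration.

```javascript
// Step 1 (sketch): pad to an even length, then list each character's code in hex.
function padToEven(source) {
  return source.length % 2 === 0 ? source : source + " ";
}

function charCodesHex(source) {
  // Assumes ASCII input, so every code fits in two hex digits.
  return source
    .split("")
    .map((ch) => ch.charCodeAt(0).toString(16).padStart(2, "0"));
}

const padded = padToEven("c.width=9"); // "c.width=9 " (a space was appended)
const codes = charCodesHex(padded);
// ["63", "2e", "77", "69", "64", "74", "68", "3d", "39", "20"]
```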



Then we merge adjacent chars into pairs, each forming a single four-digit hex value: the first character's code becomes the high byte and the second one's the low byte.
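The merging step can be sketched the same way, again assuming ASCII input. `mergePairs` is a hypothetical name, and it pads the input itself so it can run on its own.

```javascript
// Step 2 (sketch): merge adjacent characters into 16-bit values, written as 4 hex digits.
// The first character of each pair becomes the high byte, the second one the low byte.
function mergePairs(source) {
  if (source.length % 2 !== 0) source += " "; // same padding as step 1
  const pairs = [];
  for (let i = 0; i < source.length; i += 2) {
    const high = source.charCodeAt(i);
    const low = source.charCodeAt(i + 1);
    pairs.push(((high << 8) | low).toString(16).padStart(4, "0"));
  }
  return pairs;
}

const pairs = mergePairs("c.width=9");
// ["632e", "7769", "6474", "683d", "3920"]
```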



The next step is prefixing each pair with `%u`, constructing a string of escape sequences in the format that unescape() understands.
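Prefixing each pair with `%u` could look like this. It is a sketch: `toEscapeSequences` is a hypothetical helper that repeats the padding and merging so it runs on its own.

```javascript
// Step 3 (sketch): turn each merged pair into a %uXXYY escape sequence.
function toEscapeSequences(source) {
  if (source.length % 2 !== 0) source += " ";
  let escaped = "";
  for (let i = 0; i < source.length; i += 2) {
    const code = (source.charCodeAt(i) << 8) | source.charCodeAt(i + 1);
    escaped += "%u" + code.toString(16).padStart(4, "0");
  }
  return escaped;
}

const escaped = toEscapeSequences("c.width=9");
// "%u632e%u7769%u6474%u683d%u3920"
```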



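Finally, we tell Javascript to unescape it, which parses each `%uXXYY` sequence as if it were a true Unicode (UTF-16) character with code 0xXXYY. The result is our golfed string, half the length of the padded input.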
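Putting the encoding steps together, a possible encoder might look like the sketch below. `golf` is a hypothetical name; it relies on the legacy global `unescape()`, which is deprecated but still available in browsers and Node.js.

```javascript
// Complete encoder (sketch): pad, build %uXXYY sequences, then unescape() them.
function golf(source) {
  if (source.length % 2 !== 0) source += " ";
  let escaped = "";
  for (let i = 0; i < source.length; i += 2) {
    const code = (source.charCodeAt(i) << 8) | source.charCodeAt(i + 1);
    escaped += "%u" + code.toString(16).padStart(4, "0");
  }
  // unescape() turns every %uXXYY into one UTF-16 character with code 0xXXYY.
  return unescape(escaped);
}

const golfed = golf("c.width=9");
console.log(golfed.length);                     // 5 (the padded input had 10 characters)
console.log(golfed.charCodeAt(0).toString(16)); // 632e
```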



And that's it. Now you just need to wrap it in some code that decodes it back into source code and hands it to eval(), so Javascript can parse and run it.
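The exact wrapper is not reproduced in this text, so this sketch assumes the one built by `wrap()` below, which mirrors the escape(), replace(), unescape() and eval() steps described in the decoding section. `golf` and `wrap` are hypothetical helpers, and the encoder is repeated so the block runs on its own.

```javascript
// Encoder (sketch): pack pairs of ASCII characters into single UTF-16 characters.
function golf(source) {
  if (source.length % 2 !== 0) source += " ";
  let escaped = "";
  for (let i = 0; i < source.length; i += 2) {
    const code = (source.charCodeAt(i) << 8) | source.charCodeAt(i + 1);
    escaped += "%u" + code.toString(16).padStart(4, "0");
  }
  return unescape(escaped);
}

// Assumed decoder wrapper. escape is used as a template tag, so it needs no parentheses.
function wrap(golfed) {
  return "eval(unescape(escape`" + golfed + "`.replace(/u(..)/g,\"$1%\")))";
}

const script = wrap(golf("6*7"));
console.log(script.length); // 50 (2 golfed characters + 48 for the wrapper)
console.log(eval(script));  // 42
```

Like the technique itself, this assumes ASCII-only input: a character whose code does not fit in one byte would corrupt its neighbour in the pair.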



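The golfed script is half as long as your (padded) input, plus the fixed cost of the decoder wrapped around it; compare that total with the length of your input. Notice that template literals are abused here: escape is called as a tag function, with the string written right after it as a template literal instead of inside parentheses and quotes. This works because a tag function receives the template's strings as an array, and escape() converts that one-element array into the string it holds. Also notice that your input has to be long enough to compensate for the extra code needed to decode the golfed string. The wrapper costs 48 characters and the encoding saves half of the input's length, so the input has to be at least 96 characters long just to break even (98 or more to actually save anything, since an odd-length input gets a padding space).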
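A quick way to check the arithmetic, assuming a 48-character wrapper of the form used in the decoding steps (escape, then replace, then unescape, then eval). `WRAPPER` and `golfedLength` are hypothetical names.

```javascript
// The assumed decoder wrapper with an empty payload: its length is the fixed overhead.
const WRAPPER = "eval(unescape(escape``.replace(/u(..)/g,\"$1%\")))";

// Length of the golfed script for an ASCII input of n characters.
function golfedLength(n) {
  return Math.ceil(n / 2) + WRAPPER.length;
}

console.log(WRAPPER.length);   // 48
console.log(golfedLength(96)); // 96: break even
console.log(golfedLength(97)); // 97: still break even (a padding space is added)
console.log(golfedLength(98)); // 97: one character saved
```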

Decoding, step by step

To understand how decoding happens, just read the wrapper from the inside out. Start by escape()ing your golfed string.
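A sketch of this first decoding step, using a hypothetical golfed string that holds the pairs `c.` and `wi`. It also shows why the tagged template works: escape receives a one-element array of strings and converts it to that string.

```javascript
// A golfed string holding the pairs "c." (0x632e) and "wi" (0x7769).
const golfed = String.fromCharCode(0x632e, 0x7769);

// escape() writes every character above 0xFF as %uXXXX, with uppercase hex digits.
const escaped = escape(golfed);
console.log(escaped); // %u632E%u7769

// As a template tag, escape gets ["a b"], which it converts to "a b" before escaping.
console.log(escape`a b` === escape("a b")); // true
```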



You will see our pairs back, as `%uXXYY` sequences. The trick now is to break them into 8-bit chars again. That's what the `replace()` part does: it turns every `%uXXYY` into `%XX%YY` in a very hacky way, with a very short regular expression.
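The replacement can be sketched as follows, assuming the regular expression is `/u(..)/g` with `"$1%"` as the replacement. That pattern is an assumption, though it is a short way to do the job: `$1` is the standard replace() placeholder for the first capture group.

```javascript
// Output of escape() for a golfed string holding the pairs "c." and "wi".
const escaped = "%u632E%u7769";

// Each match is "u" plus the next two hex digits (XX, the first character's code).
// It is replaced by XX followed by "%", so the remaining YY becomes its own %YY escape.
const split = escaped.replace(/u(..)/g, "$1%");
console.log(split); // %63%2E%77%69
```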



Then unescape() comes into play, converting each `%XX` hex value back into the character it encodes.
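Chained together, the decoding steps can be sketched as a single function. `ungolf` is a hypothetical name, and it assumes `/u(..)/g` with `"$1%"` as the replace() pattern.

```javascript
// Full decoder (sketch): escape(), split each %uXXYY into %XX%YY, then unescape().
function ungolf(golfed) {
  return unescape(escape(golfed).replace(/u(..)/g, "$1%"));
}

console.log(unescape("%63%2E%77%69"));                    // c.wi
console.log(ungolf(String.fromCharCode(0x632e, 0x7769))); // c.wi
```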



And we're done! We have our original source code back and ready to be eval()uated :-)


Written by Lucio Paiva
2018-12-01