Filtering my phone contacts with AWK

A few weeks ago I replaced my Android phone with a newer one and needed to import my old phone's contacts into it. Since I don't use cloud storage services for privacy reasons, I had to import the contacts manually, using vCard (.vcf) files.

Some years ago, when I still used Google services on my phone, the Gmail app decided to create a phone contact for every person I had emailed. This created a lot of useless contacts: people whose phone number I don't even know, only their e-mail address. Unsurprisingly, this turned out to be quite annoying. I couldn't easily get rid of these contacts, so I kept them even though I knew I wouldn't use them.

Since I was already changing my phone, it was a good occasion to finally delete all of these useless contacts. To do this, I would need to:

  • Export my old phone contacts into a .vcf file
  • Find or create a tool allowing me to programmatically delete the contacts without a phone number
  • Save the output into a new .vcf file, ready for import into my new phone

In order to write the tool to filter contacts, I would need to parse a .vcf file. Normally, I would have used Python for this. I could have relied on an external library and trusted that it didn't have any bugs or vulnerabilities. Or I could have written my own vCard parsing library, properly tested and documented. However, both options looked far too complicated for the problem I wanted to solve. There had to be a simpler solution.

A .vcf file has the following format:

BEGIN:VCARD
VERSION:2.1
N:First Name;Last name;;;
FN:Visible Name
TEL;CELL:123-456-789
END:VCARD
BEGIN:VCARD
VERSION:2.1
N:Other;Contact;;;
FN:Other contact
EMAIL;PREF:user@gmail.com
END:VCARD

As you can see, the details of each contact are delimited by the lines BEGIN:VCARD and END:VCARD. The .vcf format doesn't look complicated. It is just plain text delimited by formatted lines.

Taking into account that .vcf files were simple, and that I only wanted to filter through my contact list once, I used AWK instead of Python. The AWK language is relatively small, and you can learn it in a few hours. Its Wikipedia page looks good enough as an introduction.

I took another look at the problem I wanted to solve, and built the following AWK program:

# lines will save the lines of the contact being processed into an array.
# n represents the length of the lines array. It will increment on each
# iteration.
{
    lines[n++] = $0; # This is like append in Python
}

/^TEL;/ {
    # The contact being processed has a phone number, so I want to keep it
    has_phone_number = 1
}

/^END:VCARD/ { # I reached the end of the contact

    # If the contact had a phone number, keep it (print all the saved lines)
    if (has_phone_number)
        for (i=0; i<n; i++)
            print lines[i]

    # In the next iteration I'll use a different contact. Reset the program's
    # state.
    has_phone_number = 0
    n = 0 # This is like emptying the lines array
}

I ran the program with awk -f program.awk <unfiltered-contacts.vcf >filtered-contacts.vcf. This created a new .vcf file that only contained the contacts with a phone number. It was ready to be imported into my new cellphone.
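
To double-check the filtered file before importing it, one option is to count the cards that still lack a TEL line. The following is a minimal sketch in JavaScript (Node.js), not part of the original workflow: the helper name cardsWithoutPhone and the sample data are made up for illustration, and it assumes the simple vCard 2.1 layout shown earlier.

```javascript
// Hypothetical helper: returns the vCards in `text` that have no TEL line.
// Assumes each card is delimited by BEGIN:VCARD / END:VCARD lines.
function cardsWithoutPhone(text) {
  const missing = [];
  let current = [];
  let hasPhone = false;
  for (const line of text.split(/\r?\n/)) {
    if (line === "") continue; // skip blank lines between cards
    current.push(line);
    // The AWK program only matches "TEL;"; "TEL:" is accepted here as well.
    if (line.startsWith("TEL;") || line.startsWith("TEL:")) hasPhone = true;
    if (line.startsWith("END:VCARD")) {
      if (!hasPhone) missing.push(current.join("\n"));
      current = [];
      hasPhone = false;
    }
  }
  return missing;
}

// Sample input modeled on the two example contacts from this post.
const sample = [
  "BEGIN:VCARD", "VERSION:2.1", "FN:Visible Name", "TEL;CELL:123-456-789", "END:VCARD",
  "BEGIN:VCARD", "VERSION:2.1", "FN:Other contact", "EMAIL;PREF:user@gmail.com", "END:VCARD",
].join("\n");

console.log(cardsWithoutPhone(sample).length); // 1
```

Running the same helper on the contents of filtered-contacts.vcf should report zero cards if the filter did its job.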

With just 13 lines of code (discarding comments and blank lines), I made a program that solved my problem perfectly. I didn't overthink it by installing external libraries, creating big class hierarchies, or writing complex file parsers.

It looks like I was far more productive using a 40-year-old language than I would have been with Python, my go-to language for most problems. Because AWK is intended for handling text files and writing throwaway programs, it was the perfect fit for my problem. Maybe the code isn't very maintainable, but I don't have to care about that, since I planned to discard the program after it ran successfully. I needed a quick solution, and AWK delivered it.

I hope that with this short blogpost I explained the essence of the AWK language. It is a fundamental tool for programmers and sysadmins. You can learn the language in a few hours, and it will definitely be a productivity boost.

Here are a few useful resources I used when learning AWK:

There is also a book about the language written by its authors. I can't recommend it since I haven't read it yet, but if the resources above make you want to learn more, it will probably be a great choice.

Greetings!

Ekoparty CTF: Stegano Writeup

Ekoparty Online 2020 took place a few days ago. This time the conference was online, so I chose to take an active part in the CTF instead of attending the talks. I found many of the challenges quite demanding, among them Stegano, one of the last ones we managed to solve together with EzequielTBH.

For this challenge we were given an image in BMP format, so we figured the flag would be hidden in it. By uploading the image to Google Images and adding the keyword steganography, we found a blogpost about the steganography used by Loki-Bot. Knowing that the theme of the CTF was malware, we figured that was the way to go.

Reading the blogpost, we saw that one part described, with C code, the process that decrypts a message by XORing it with the key @y_%_M_ew@ and with the one-byte variable dwKeysize, which holds the length of this key (in this case, 10 or 0xA):

Message decryption using XOR

It also mentioned a process for extracting the message hidden inside the BMP file:

Process for extracting the hidden message from the BMP

While the XOR decryption process was fairly easy to understand, the code for extracting the message from the BMP was somewhat confusing and had quite a few constants whose origin we couldn't figure out. Writing a program that extracts the message perfectly would have taken us a long time, and the CTF was about to end. However, perfection isn't necessary to solve a challenge. All we needed was to get the flag, which has the format EKO{...}. We don't care whether the decrypted data keeps its integrity; we can afford some garbage bytes as long as the flag shows up. With this in mind, we used a simplified version of the decryption algorithm. It is far from perfect, but it served its purpose.

If we go back to the C code that reads the BMP file, we can notice a few things:

  • The byte at which decryption starts is computed from the BMP file header and the width and height of the image
  • Much of the logic in the code consists of detecting when the end of a row has been reached in order to move on to the next row, starting from the first column. The if blocks in the code take care of this.
  • In each iteration of the for loop, two bytes are read from the file (this can be seen in the highlighted code). The & 0x0F keeps only the last 4 bits of each byte, ignoring the most significant bits.
  • The last 4 bits of the second byte read are joined with the last 4 bits of the first one, so that one byte is written for every two bytes read
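
The nibble-joining step from the last two points can be shown in isolation. This is a small JavaScript sketch with arbitrarily chosen input bytes; the function name combineNibbles is made up for illustration and does not come from the original C code.

```javascript
// Combine the low nibbles of two bytes into one byte: the second byte
// supplies the high 4 bits and the first byte supplies the low 4 bits.
function combineNibbles(firstByte, secondByte) {
  return ((secondByte & 0x0f) << 4) | (firstByte & 0x0f);
}

// Arbitrary example: 0xAB has low nibble 0xB, 0xCD has low nibble 0xD.
const combined = combineNibbles(0xab, 0xcd);
console.log(combined.toString(16)); // "db"
```

The high nibbles (0xA and 0xC here) are discarded, which is why two bytes of the image yield only one byte of hidden data.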

Considering that the flag is only a few bytes long and that the BMP is 801 pixels wide, it is unlikely that the flag is spread across two different rows, so all the code related to the image boundaries can be ignored. Also, since we don't mind decrypting garbage as long as the flag is found, there is no need to compute where to start decrypting either: to simplify the code, we can decrypt everything from the beginning to the end of the image.

Knowing this, a simpler solution than the C code described in the blogpost is possible. We would only have to read the BMP file two bytes at a time, join them into a single byte using bitwise operations, and XOR the result with the key @y_%_M_ew@. This would be fairly simple, although a few things need to be kept in mind:

  • Since we read two bytes at a time, starting at an even byte is not the same as starting at an odd one. One of the two will produce incorrect data.
  • The XOR cipher depends on the position where we start. If we start decrypting from the wrong position, the key can end up "out of phase" and decrypt to garbage

As a solution to these problems, we used a dirty but effective method: try every combination until we hit the right result. We can read first from an even byte and then from an odd one, and we can try every possible rotation of the key (since it is 10 bytes long, there are 10 possible rotations).
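
To see why rotating the key makes up for starting at the wrong position, here is a self-contained JavaScript sketch. The plaintext and its placeholder flag are invented for the example, and the helper names xorWithKey and rotate are hypothetical; only the key and the 0xA constant come from the challenge.

```javascript
const KEY = Array.from("@y_%_M_ew@", (c) => c.charCodeAt(0));

// XOR each byte with the key-size constant (0xA) and the repeating key.
// XOR is its own inverse, so the same function encrypts and decrypts.
function xorWithKey(bytes, key) {
  return bytes.map((b, i) => b ^ 0x0a ^ key[i % key.length]);
}

// Rotate the key left by n positions.
function rotate(key, n) {
  return key.slice(n).concat(key.slice(0, n));
}

const toText = (bytes) => String.fromCharCode(...bytes);

// Made-up plaintext: 4 filler bytes followed by a placeholder flag.
const plain = Array.from("....EKO{example}", (c) => c.charCodeAt(0));
const cipher = xorWithKey(plain, KEY);

// Pretend we started reading 4 bytes too late.
const offset = 4;
const partial = cipher.slice(offset);

const wrong = toText(xorWithKey(partial, KEY));                 // key out of phase
const right = toText(xorWithKey(partial, rotate(KEY, offset))); // key realigned
console.log(right); // "EKO{example}"
```

Since the real starting offset is unknown, trying all 10 rotations guarantees that one of them is aligned with the data.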

Finally, we put together the following Python script, which reads the image and tries to decrypt it using the combinations described above:

with open('stegano.bmp', 'rb') as fp:
    bmp_contents = fp.read()

ORIGINAL_KEY = b'@y_%_M_ew@'

def try_decode(key, contents, out):
    # Join the low nibbles of each pair of bytes into a single byte
    encoded = []
    for i in range(0, len(contents)-1, 2):
        first_byte, second_byte = contents[i], contents[i+1]
        encoded.append(((second_byte & 0x0f) << 4) | (first_byte & 0x0f))
    # XOR with the key size (0xA) and with the repeating key
    decoded = [(c ^ 0xA ^ key[i%len(key)]) for (i, c) in enumerate(encoded)]
    out.write(bytes(decoded))

def possible_keys():
    # rotate the original key
    for i in range(len(ORIGINAL_KEY)):
        yield ORIGINAL_KEY[i:] + ORIGINAL_KEY[:i]

with open('stegano-output', 'wb') as out:
    for key in possible_keys():
        try_decode(key, bmp_contents, out)
        try_decode(key, bmp_contents[1:], out)  # start from an odd byte

Once we ran it and opened the stegano-output file with a hex editor, we found the flag, which turned out to be EKO{n0m0r3m4lw4r3_and_m04r_st3g4n00000000}:

Flag in the stegano-output file

That is how we solved this steganography challenge. Although this kind of challenge isn't usually among my favorites, I quite enjoyed this one. It isn't the classic example of an image run through steghide or similar tools; instead, it required understanding the code that a piece of malware used to hide information.

To wrap up the post, I want to stress the importance of recognizing that the goal of a challenge is to get the flag. With that in mind, it is possible to skip certain details that don't contribute much and are quite tedious, such as, in this case, the logic for deciding which bytes to use for decryption. If I hadn't kept this in mind, I might not have been able to solve the challenge in time.

I hope this post described not only the solution to the challenge, but also the process we followed to solve it. I will soon publish the solutions to some other challenge I liked.

Greetings!

Invitation to nerdear.la

On October 17, 18 and 19 of this year, Nerdear.la will take place: a conference about devops, development and nerd topics in general. This is the sixth edition of the event, and the first time it is held at Ciudad Cultural Konex. Admission is 100% free, as in previous years.

Nerdearla 2018

This year, the conference will have an excellent lineup of talks that has nothing to envy other events, local or international. On Thursday at 16:35 I will be giving a talk about reproducible environments and how to use Nix as an alternative to Dockerfiles. Right after that, GiBA will talk about a quite promising logging tool made at Facebook.

Other talks that caught my interest:

  • Making Illegal States Unrepresentable (in JavaScript): judging by the title, I guess it will talk a bit about Elm, one of my favorite programming languages, and show how some things from the language can be adapted to JavaScript.
  • Diagnosing bad TDD habits with Dr.TDD: a talk about TDD and Smalltalk, which even has a video in its description! I'm definitely not going to miss it.
  • Donald Knuth, TeX y la curva del dragón: it will talk about TeX, the predecessor of LaTeX. Although it isn't a system I like much, it has a very interesting history, especially since it is one of Donald Knuth's most important projects.
  • Standup Matemático: in case you thought that something called "nerdear.la" was short on nerdy things.
  • 1969 - 2019: 50 years of UNIX and the landing on the Moon: closing talk given by the well-known Jon «maddog» Hall. I couldn't attend the talk he gave the last time he was in Argentina, so this is my chance to do so!

There will also be an introductory Bash workshop run by the women of LinuxChix Argentina, which I have been recommending for a few days to people with little experience using the terminal on GNU/Linux. It has limited capacity, so you have to sign up first.

And for those who have to work during the days of the conference, there will also be a coworking space that, at least until last year, always had a good internet connection.

For more information about the event, you can visit its site https://nerdear.la/.

Greetings!

Appendix

I couldn't find the full description of my talk on the page, so I'm leaving it here for anyone interested:

“It works on my machine!” Surely every developer has used this phrase at some point in their career to justify the presence of a bug. Saying this implies, beyond the bug in question, that there is a not very obvious (but harmful) difference between the development and production environments.

In recent years, there have been attempts to minimize these differences through reproducible environments. Cloud computing tools, infrastructure as code and, mainly, the use of containers made this easier. However, these tools don't always guarantee that what we do is reproducible. Moreover, in many cases their complexity means they are used only in staging environments and not on developers' machines, which takes us back to “it works on my machine”.

I'm going to talk about Nix, a programming language, package manager and build tool that has reproducibility as its main idea. Nix makes it possible to build a reproducible environment without the overhead that containers or VMs bring (although it also gets along very well with them). This makes it very convenient both during development and when serving the software in production.

Bypassing a restrictive JS sandbox

While participating in a bug bounty program, I found a site with a very interesting functionality: it allowed me to filter some data based on a user-controlled expression. I could put something like book.price > 100 to make it show only the books that are more expensive than $100. Using true as the filter showed me all the books, and false didn't show anything. So I could tell whether the expression I used was evaluating to true or false.

That functionality caught my attention, so I tried passing it more complex expressions, like (1+1).toString()==="2" (evaluated to true) and (1+1).toString()===5 (evaluated to false). This is clearly JavaScript code, so I guessed that the expression was being used as an argument to a function similar to eval, inside a NodeJS server. It seemed like I was close to finding a Remote Code Execution vulnerability. However, when I used more complex expressions, I got an error saying that they were invalid. I guessed that it wasn't the eval function that parsed the expression, but some kind of sandbox system for JavaScript.

Sandbox systems used to execute untrusted code inside a restricted environment are usually hard to get right. In most cases there are ways to bypass these protections and execute code with normal privileges. This is especially true if they try to limit the usage of complex, feature-bloated languages like JavaScript. The problem had already caught my attention, so I decided to spend my time trying to break this sandbox system. I would learn about JavaScript internals, and earn some money if I found and exploited the RCE.

The first thing I did was identify what library the site was using to implement the sandbox, given that the NodeJS ecosystem is known for having tens of libraries that do the same thing, and in many cases all of them are doing it wrong. It could have been a custom sandbox library used only for the target site, but I discarded this possibility because it was really unlikely that the developers had spent their time on this kind of thing.

Finally, by analyzing the app's error messages I concluded that they were using static-eval, a not very well-known library (but written by substack, somebody well known in the NodeJS community). Even though the original purpose of the library wasn't to be used as a sandbox (I still don't understand what it was created for), its documentation suggests that use. In the case of the site I was testing, it certainly was being used as a sandbox.

Breaking static-eval

The idea of static-eval is to use the esprima library to parse the JS expression and convert it to an AST (Abstract Syntax Tree). Given this AST and an object with the variables I want to be available inside the sandbox, it tries to evaluate the expression. If it finds something strange, the function fails and my code isn't executed. At first I was a bit demotivated because of this, since I realized that the sandbox system was very restrictive with what it accepted. I wasn't even able to use a for or while statement inside my expression, so doing something that required an iterative algorithm was almost impossible. Anyway, I kept trying to find a bug in it.

I didn't find any bug at first sight, so I looked at the commits and pull requests of the static-eval GitHub project. I found that pull request #18 fixed two bugs that allowed a sandbox escape in the library, exactly what I was looking for. I also found a blog post by the pull request's author that explained these vulnerabilities in depth. I immediately tried using these techniques on the site I was testing, but unfortunately for me, it was using a newer static-eval version that had already patched these vulns. However, knowing that somebody had already been able to break this library made me more confident, so I kept looking for new ways to bypass it.

Then, I analyzed these two vulns in depth, hoping this would inspire me to find new vulnerabilities in the library.

Analysis of the first vulnerability

The first vuln used the function constructor to make a malicious function. This technique is frequently used to bypass sandboxes. For example, most of the ways to bypass the angular.js sandbox to get an XSS use payloads that end up accessing and calling the function constructor. It was also used to bypass libraries similar to static-eval, like vm2. The following expression shows the existence of the vulnerability by printing the system environment variables (this shouldn't be possible because the sandbox should block it):

"".sub.constructor("console.log(process.env)")()

In this code, "".sub is a short way to obtain a function ((function(){}) would also work). Then it accesses the constructor of that function. That constructor is a function that, when called, returns a new function whose code is the string passed as argument. This is like the eval function, but instead of executing the code immediately, it returns a function that will execute the code when called. That explains the () at the end of the payload, which calls the created function.
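
The same mechanism can be observed with a harmless function body. This short Node.js sketch uses only standard JavaScript; the variable names are arbitrary.

```javascript
// Any function's `constructor` property is the global Function constructor.
const ctor = "".sub.constructor;
console.log(ctor === Function); // true

// Calling it with a string returns a new function whose body is that string.
const made = ctor("return 1 + 1");
console.log(typeof made); // "function"

// Nothing runs until the created function is called.
console.log(made()); // 2
```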

Result of executing the previous payload

You can do more interesting things than showing the environment variables. For example, you can use the execSync function of the child_process NodeJS module to execute operating system commands and return their output. This payload will return the output of running the id command:

"".sub.constructor("console.log(global.process.mainModule.constructor._load(\"child_process\").execSync(\"id\").toString())")()

The payload is similar to the previous one, except for the created function's body. In this case, global.process.mainModule.constructor._load does the same as the require function of NodeJS. For some reason I don't know, this function isn't available under the name require inside the function constructor, so I had to use that ugly name.

Result of executing the payload that runs the id command in the system

The fix for this vulnerability consisted of blocking access to properties of objects that are functions (this is done with typeof obj == 'function'):

else if (node.type === 'MemberExpression') {
    var obj = walk(node.object);
    // do not allow access to methods on Function 
    if((obj === FAIL) || (typeof obj == 'function')){
        return FAIL;
    }

This was a very simple fix, but it worked surprisingly well. The function constructor is available, naturally, only through functions, so I can't get access to it. An object's typeof can't be modified, so anything that is a function will have 'function' as its typeof. I didn't find a way to bypass this protection, so I looked at the second vuln.
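
To illustrate the effect of that guard outside the library, here is a simplified stand-alone sketch. It is not static-eval's code: the FAIL sentinel and the getMember helper are hypothetical stand-ins that only reproduce the typeof check from the fragment above.

```javascript
// Hypothetical sentinel playing the role of the library's FAIL value.
const FAIL = {};

// Simplified member access: refuse to read properties of functions.
function getMember(obj, prop) {
  if (obj === FAIL || typeof obj == "function") {
    return FAIL;
  }
  return obj[prop];
}

// Reading a method from a string is allowed...
const sub = getMember("", "sub");
console.log(typeof sub); // "function"

// ...but reading `constructor` from that method is blocked.
console.log(getMember(sub, "constructor") === FAIL); // true

// Ordinary data access still works.
console.log(getMember({ price: 5 }, "price")); // 5
```

With this check in place, the chain "".sub.constructor stops at the second step, so the function constructor is never reached.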

Analysis of the second vulnerability

This vuln was much simpler and easier to detect than the first one: the problem was that the sandbox allowed the creation of anonymous functions, but it didn't check their body to forbid malicious code. Instead, the body of the function was being directly passed to the function constructor. The following code has the same effect as the first payload of the blog post:

(function(){console.log(process.env)})()

You can also change the body of the anonymous function so it uses execSync to show the output of executing a system command. I'll leave this as an exercise for the reader.

One possible fix for this vulnerability would be to forbid all anonymous function declarations inside static-eval expressions. However, this would block the legitimate use cases of anonymous functions (for example, using one to map over an array). Because of this, the fix would have to allow the usage of benign anonymous functions, but block the usage of malicious ones. This is done by analyzing the body of the function when it is defined, to check that it won't perform any malicious actions, like accessing the function constructor.

This fix turned out to be more complex than the first one. Also, Matt Austin (the author of the fix) said he wasn't sure it would work perfectly. So I decided to look for a bypass to this fix.

Finding a new vulnerability

One thing that caught my attention was that static-eval decided whether the function was malicious or not at definition time, and not when it was being called. So it didn't consider the value of the function arguments, because that would require making the check when the function is called instead.

My goal was still to access the function constructor, in a way that bypasses the first fix that forbids that (because I'm not able to access properties of functions). However, what would happen if I tried to access the constructor of a function parameter? Since its value isn't known at definition time, maybe this could confuse the system and make it allow that. To test my theory, I used this expression:

(function(something){return something.constructor})("".sub)

If that returned the function constructor, I would have a working bypass. Sadly for me, it wasn't the case. static-eval will block the function if it accesses a property of something with an unknown type at function definition time (in this case, the something argument).

One useful feature of static-eval, used in almost all cases, is that it lets you specify some variables you want to be available inside the static-eval expression. For example, at the beginning of the blog post I used the expression book.price > 100. In this case, the code calling static-eval passes it the value of the book variable so it can be used inside the expression.

This gave me another idea: what would happen if I make an anonymous function with an argument whose name is the same as an already defined variable? Since it can't know the value of the argument at definition time, maybe it uses the initial value of the variable. That would be very useful to me. Suppose I have a variable book and its initial value is an object. Then, the following expression:

(function(book){return book.constructor})("".sub)

would have a very satisfactory result: when the function is defined, static-eval would check if book.constructor is a valid expression. Since book is initially an object (whose typeof is object) and not a function, accessing its constructor is allowed and the function will be created. However, when I call this function, book will take the value passed as argument to the function (that is, "".sub, another function). Then it will access and return its constructor, effectively returning the function constructor.

Sadly, this didn't work either because the author of the fix considered this case. At the moment of analyzing the function's body, the value of all its arguments is set to null, overriding the initial value of the variables. This is a fragment of the code doing that:

node.params.forEach(function(key) {
    if(key.type == 'Identifier'){
      vars[key.name] = null;
    }
});

This code takes the AST node that defines the function, iterates over each of its parameters whose type is Identifier, takes its name and sets to null the attribute of vars with that name. Even if the code looks correct, it has a very common bug: it doesn't cover all possible cases. What would happen if an argument is something strange and its type isn't Identifier? Instead of doing something sane and saying "I don't know what this is, so I'll block the entire function" (like in a whitelist), it will ignore that argument and continue with the rest (like a blacklist). This means that if I make a node representing a function argument have a type different from Identifier, the value of the variable with that name won't be overwritten, so it would use the initial value. At this point I was pretty confident that I had found something important. I only needed to find how to set the key.type to something different from Identifier.

As I commented before, static-eval uses the esprima library to parse the code we give to it. According to its documentation, esprima is a parser that fully supports the ECMAScript standard. ECMAScript is something like a dialect of JavaScript with more features, that makes its syntax more comfortable to the user [1].

One feature that was added to ECMAScript is function parameter destructuring. With this feature, the following JS code is now valid:

function fullName({firstName, lastName}){
    return firstName + " " + lastName;
}
console.log(fullName({firstName: "John", lastName: "McCarthy"}))

The curly braces inside the definition of the function arguments indicate that the function doesn't take two arguments firstName and lastName. Instead, it takes just one argument that is an object that must have the firstName and lastName properties. The previous code is equivalent to the following:

function fullName(person){
    return person.firstName + " " + person.lastName;
}
console.log(fullName({firstName: "John", lastName: "McCarthy"}))

If we look at the AST generated by esprima (I did it by using this tool), we get a very satisfactory result:

Result of parsing the function using parameter destructuring

Indeed, this new syntax makes the function argument have a key.type different from Identifier, so static-eval won't use it when it overrides the variables. This way, when evaluating

(function({book}){return book.constructor})({book:"".sub})

static-eval will use the initial value of book, which is an object. Then, it allows the creation of the function. But when it is called, book will be a function, so the function constructor is now returned. I found the bypass!
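
The difference between the two kinds of parameter can be reproduced without the parser. The sketch below uses hand-written nodes in the ESTree shape (Identifier for a plain parameter, ObjectPattern for a destructured one) and replays the reset logic from the fragment quoted earlier. The resetStrict variant is a hypothetical whitelist-style alternative, not the library's actual fix, and the initial value of book is made up.

```javascript
// Hand-written parameter nodes in the ESTree shape.
// `function(book){}` has an Identifier parameter, while
// `function({book}){}` has an ObjectPattern parameter.
const identifierParam = { type: "Identifier", name: "book" };
const destructuredParam = {
  type: "ObjectPattern",
  properties: [
    {
      type: "Property",
      key: { type: "Identifier", name: "book" },
      value: { type: "Identifier", name: "book" },
    },
  ],
};

// Same logic as the quoted fragment: parameters that are not Identifier
// nodes are silently skipped.
function resetLenient(params, vars) {
  params.forEach(function (key) {
    if (key.type == "Identifier") {
      vars[key.name] = null;
    }
  });
  return vars;
}

// Hypothetical stricter variant: reject anything that is not a plain
// Identifier instead of ignoring it.
function resetStrict(params, vars) {
  params.forEach(function (key) {
    if (key.type !== "Identifier") {
      throw new Error("unsupported parameter type: " + key.type);
    }
    vars[key.name] = null;
  });
  return vars;
}

// Made-up initial variables, as the code calling the sandbox would pass them.
const initial = () => ({ book: { price: 120 } });

console.log(resetLenient([identifierParam], initial()).book);   // null
console.log(resetLenient([destructuredParam], initial()).book); // { price: 120 }
```

With the destructured parameter, book keeps its initial object value during the definition-time check, which is exactly what the bypass relies on. The strict variant would throw on the same input.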

The previous expression returns the function constructor, so I only have to call it to create a malicious function, and then call this created function:

(function({book}){return book.constructor})({book:"".sub})("console.log(global.process.mainModule.constructor._load(\"child_process\").execSync(\"id\").toString())")()

I tried evaluating this expression in a local environment with the latest version of static-eval, and I got what I was expecting:

Final working exploit

Mission accomplished! I found a bypass to the static-eval library allowing me to get code execution in the machine that uses it. The only condition required to make it work was knowing the name of a variable whose value isn't a function, and that has a constructor attribute. Strings, numbers, arrays and objects all fulfill this property, so it should be easy to meet this condition. I only needed to use this technique in the site I was testing, get a PoC of the RCE and claim my money. Pretty simple. Or maybe not?

Discovering that the exploit didn't work in my target

Unfortunately, not. After doing all this work and finding an elegant and functional bypass, I realized that it was not going to work in the site I was testing. The only condition required was to have the name of a variable whose value isn't a function, so you might think that I couldn't find one. However, the site did satisfy this condition. The reason it didn't work is even more bizarre.

To give some context, the site wasn't using static-eval directly. It was using it through the jsonpath npm library. JSONPath is a query language with the same purpose as XPath but made for JSON documents instead of XML ones. It was initially published in 2007 in this article.

After reading the JSONPath documentation, I realized that it is a very poor project, with a really vague specification about how it should work. Most of the features it implements were probably added as an afterthought, without properly considering whether adding them was worth it, or if it was just a bad idea. It's a shame that the NodeJS ecosystem is full of libraries like this one.

JSONPath has a feature called filter expressions, which allows filtering documents that match a given expression. For example, $.store.book[?(@.price < 10)].title will get the books cheaper than $10, and then get their title. In the case of the jsonpath npm library, the expression between parentheses is evaluated using static-eval. The site I was testing allowed me to specify a JSONPath expression and parsed it with that library, so the RCE there was evident.
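
For readers unfamiliar with JSONPath, the filter expression above selects roughly what the following plain JavaScript does. This sketch does not use the jsonpath library; the sample document and its titles are invented for illustration.

```javascript
// Made-up sample document in the shape the JSONPath expression expects.
const doc = {
  store: {
    book: [
      { title: "Cheap paperback", price: 8 },
      { title: "Expensive hardcover", price: 25 },
      { title: "Bargain bin find", price: 3 },
    ],
  },
};

// Plain-JavaScript equivalent of $.store.book[?(@.price < 10)].title:
// inside the filter, @ stands for the element currently being tested.
const titles = doc.store.book
  .filter((current) => current.price < 10)
  .map((current) => current.title);

console.log(titles); // [ 'Cheap paperback', 'Bargain bin find' ]
```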

If we look at the previous JSONPath expression in detail, we can see that the expression passed to static-eval is @.price < 10. According to the documentation, @ is a variable containing the document being filtered (usually it is an object). Unfortunately, the creator of JSONPath had the idea of naming this variable @. According to the ECMAScript specification, this isn't a valid variable name. So to make static-eval work, they had to do a horrible thing: patch the esprima code so that it considers @ a valid variable name.

When you create an anonymous function in static-eval, it is embedded into another function that takes the already defined variables as arguments. So if I create an anonymous function inside a JSONPath filter expression, it will create a function wrapping it that takes an argument named @. This is done by directly calling the function constructor, so it doesn't use the esprima patch mentioned before. Then, when defining the function, it'll throw an error that I won't be able to avoid. This is just a bug in the library, which makes it fail when defining functions (both benign and malicious) inside filter expressions. And because of this, my bypass technique won't work with this library.
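
The underlying failure is easy to reproduce in plain Node.js, independently of static-eval or jsonpath. In this sketch the helper name tryDefine is made up; it only reports whether the function constructor accepts a given parameter name.

```javascript
// Try to build a function with the given parameter name using the
// function constructor, and report what happened.
function tryDefine(paramName) {
  try {
    new Function(paramName, "return true");
    return "ok";
  } catch (err) {
    return err.name;
  }
}

console.log(tryDefine("book")); // "ok"
console.log(tryDefine("_"));    // "ok"
console.log(tryDefine("@"));    // "SyntaxError"
```

Because the engine itself rejects @ as a parameter name, no patch to the parser can make the wrapping function definable.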

Just because of the horrible decision of naming a variable @ in a library that is used mainly from JS, where @ isn't a valid variable name, I wasn't able to exploit the RCE in the site and obtain a 4-digit bounty. Why didn't the author name it _ (which is a valid variable name), document or joseph?! This time, I'll have to settle for having discovered a great vulnerability in the library, and having learned a lot about JavaScript.

Conclusions

Even though I wasn't able to get the bounty I was expecting, I had a really good time playing with this library. I also used the concepts I learned to bypass a different kind of restricted JS environment, this time getting a monetary reward. I hope to publish this other research soon.

I want to mention again the great previous work done by Matt Austin on static-eval. Without this material, maybe I wouldn't have found this new vulnerability.

As a general observation, when testing a system it is always tempting to replicate and isolate one of its features in a local environment we control, so we can play with it more freely. In my case, I made a Docker instance with the static-eval library to try bypassing the sandbox. My problem was that I used only this instance during the whole research, without checking that what I was doing was valid on the real site. If I had done this earlier, maybe I would have noticed this wasn't going to work and I'd have moved on to something else. The lesson learned is that you shouldn't abstract too far away from the whole system, and that you should continuously test what you find against the real system, instead of doing it just at the end of your research.

Finally, if you're auditing a site that has a similar system that evaluates user-controlled expressions inside a sandbox, I highly recommend spending a considerable amount of time playing with it. It would be strange to find a sandbox system free of vulnerabilities, especially if it executes dynamic, fully-featured programming languages like JavaScript, Python or Ruby. And when you find this kind of sandbox bypass vulnerability, it usually has a critical impact on the application that contains it.

I hope you enjoyed this post. Greetings!

Extra: Chronology of the vuln


  1. It's worth noting that this is a pretty vague and incorrect definition of what ECMAScript is. My indifference to the JavaScript ecosystem means I can't even be bothered to find a more correct definition. 

Licencia para Hackear is back!

It's been almost 4 years since I last published anything on my blog. A bit longer if we only count articles I wrote myself. Quite a few things have changed since then: on the personal side, I got a fairly demanding job and started university, so my free time was considerably reduced.

I always wanted to pick the blog back up at some point, although I didn't know when. A few months ago I realized that practically all the technical material I read was written in English. While I get along quite well with that language now, I didn't when I started out in the information security world, and I remember how hard it was to find quality information written in Spanish. Most of the Spanish-language blogs I followed also stopped publishing, or turned into mere advertisements for Telefónica's products. That's why I decided to resume the blog once and for all, perhaps with different content than before, but keeping the idea of publishing technical content, in Spanish and free of companies trying to sell a product.

The idea for this new stage is to try to make everything that gets published available in Spanish. For important posts there may also be an English translation. As in the first stage, the content will mainly be about information security and programming. In particular, I already have articles planned about vulnerabilities in NodeJS libraries, solutions to CTF challenges, and conference talks.

Something I don't have fond memories of is having to use the Wordpress editor to write articles. That's why I'm going to replace the platform I use with a static site generator. The new material I publish will be available at licenciaparahackear.github.io, not on the old blog. The old material will remain available only at licenciaparahackear.wordpress.com, at least until I find a way to properly export the posts from a Wordpress to a static site.

Stay tuned, because the first post of this new stage is coming soon, and it's coming in full force.

Greetings!