The research I’m working on involves estimating a firm’s probability of default over a variety of time horizons using the Merton Distance to Default model. The dataset contains daily financial information for more than 24,000 firms over the past 30 years. Because I am calculating the probability of default over five time horizons, applying the Merton model requires solving the Black-Scholes equation roughly 305 million times. Luckily, the model is easily parallelized because the only data it needs, aside from the risk-free rate, is firm-specific. This post shows how the Python library Pywren can leverage AWS Lambda to run hundreds of models in parallel, achieving a 270x speed-up over a quad-core i7-4770 with minimal changes to the simulation code. If you are interested in learning more about the model, see my post about implementing the model in Python.
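To give a feel for what a single one of those solves involves, here is a minimal sketch of one Merton calculation. It is not the implementation behind the results in this post. It assumes the asset volatility `sigma_v` is already known and uses the risk-free rate as the asset drift unless `mu` is supplied. The function names and the bisection solver are illustrative choices.

```python
from math import exp, log, sqrt
from statistics import NormalDist

N = NormalDist().cdf  # standard normal CDF


def equity_value(V, F, r, sigma_v, T):
    """Black-Scholes value of equity, viewed as a call on assets V with strike F."""
    d1 = (log(V / F) + (r + 0.5 * sigma_v ** 2) * T) / (sigma_v * sqrt(T))
    d2 = d1 - sigma_v * sqrt(T)
    return V * N(d1) - F * exp(-r * T) * N(d2)


def implied_asset_value(E, F, r, sigma_v, T, iterations=100):
    """Invert equity_value for V by bisection, given observed equity E > 0.

    The call value is increasing in V and bounded by
    V - F*exp(-r*T) <= call <= V, so the root lies in [E, E + F*exp(-r*T)].
    """
    lo, hi = E, E + F * exp(-r * T)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if equity_value(mid, F, r, sigma_v, T) < E:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def default_probability(E, F, r, sigma_v, T, mu=None):
    """Return (distance to default, probability of default) for horizon T."""
    mu = r if mu is None else mu  # assumption: risk-neutral drift by default
    V = implied_asset_value(E, F, r, sigma_v, T)
    dd = (log(V / F) + (mu - 0.5 * sigma_v ** 2) * T) / (sigma_v * sqrt(T))
    return dd, N(-dd)


# Example: equity worth 30, debt with face value 100, one-year horizon.
dd, p_default = default_probability(E=30.0, F=100.0, r=0.02, sigma_v=0.25, T=1.0)
```

Each firm-day-horizon combination needs its own root-find like this, which is why the total count runs into the hundreds of millions and why the work splits so cleanly by firm.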
Depending on what you need to solve, you may be able to pass information directly to Lambda via Pywren’s interface. My data was already stored in S3, so I wrote a function that takes an S3 key and then handles loading the data, running the simulation, and writing the results back to S3. The full code, including this main function, is available on my GitHub repo.
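As a rough illustration of that load, simulate, write pattern, and not the code from the repo, a key-driven worker might look like the sketch below. The names `run_key` and `s3_io`, the CSV layout, and the `results/` prefix are all hypothetical. The storage calls are injected so the same function can run against S3 through boto3 or against an in-memory dict.

```python
import csv
import io


def run_key(key, read_bytes, write_bytes, simulate, out_prefix="results/"):
    """Load one CSV by key, run `simulate` on each row, and store the output.

    read_bytes(key) -> bytes and write_bytes(key, data) are passed in, so the
    function does not care whether the data lives in S3 or in memory.
    """
    text = read_bytes(key).decode("utf-8")
    rows = list(csv.DictReader(io.StringIO(text)))
    results = [simulate(row) for row in rows]

    buf = io.StringIO()
    if results:
        writer = csv.DictWriter(buf, fieldnames=list(results[0].keys()))
        writer.writeheader()
        writer.writerows(results)

    out_key = out_prefix + key
    write_bytes(out_key, buf.getvalue().encode("utf-8"))
    return out_key


def s3_io(bucket):
    """Build read/write callables backed by S3 (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the sketch also runs without AWS access

    s3 = boto3.client("s3")

    def read_bytes(key):
        return s3.get_object(Bucket=bucket, Key=key)["Body"].read()

    def write_bytes(key, data):
        s3.put_object(Bucket=bucket, Key=key, Body=data)

    return read_bytes, write_bytes


# Local dry run with a dict standing in for the bucket (hypothetical data).
store = {"firms/ACME.csv": b"equity,debt\n50,100\n80,100\n"}
out = run_key(
    "firms/ACME.csv",
    store.__getitem__,
    store.__setitem__,
    lambda row: {"leverage": float(row["debt"]) / float(row["equity"])},
)
```

Returning only the output key keeps the value passed back from each Lambda invocation small, since the actual results already sit in storage.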
Pywren makes executing your function on Lambda incredibly easy, and the interface will be familiar to anyone who has used Python’s map function or process pools. Behind the scenes, Pywren handles serializing all the supporting functions, data, etc. that are needed to execute your code on Lambda. In practice it comes down to a few lines: create an executor, map the function over the inputs, and collect the results.
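The sketch below puts a local pool next to the Pywren equivalent to show how similar the call shape is. `pywren.default_executor`, the executor’s `map` method, and `pywren.get_all_results` are Pywren’s own calls. The wrappers `run_local` and `run_on_lambda` are hypothetical names, and the Pywren path assumes an installed and configured Pywren/AWS setup.

```python
from concurrent.futures import ThreadPoolExecutor


def run_local(func, keys, max_workers=4):
    """Local baseline using a thread pool.

    ProcessPoolExecutor exposes the same map call and is the usual choice
    for CPU-bound work; a thread pool keeps this sketch easy to run anywhere.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, keys))


def run_on_lambda(func, keys):
    """Run func once per key on AWS Lambda via Pywren."""
    import pywren  # imported lazily; requires a configured Pywren installation

    pwex = pywren.default_executor()
    futures = pwex.map(func, keys)  # one Lambda invocation per element of keys
    return pywren.get_all_results(futures)  # blocks until every job has finished


# Local dry run: the worker is just len() here, standing in for the real model.
keys = ["firms/A.csv", "firms/B.csv"]
local_results = run_local(len, keys)
```

Switching from the local pool to Lambda means swapping `run_local` for `run_on_lambda`; the worker function and the list of keys stay the same.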
Figure 1 below shows that within about 15 seconds of execution, the number of workers ramps up to a peak of ~900 and holds steady for ~16 minutes until execution finishes. The shaded gray lines show the runtime of each individual job.
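A worker-count curve like this can be derived from per-job start and end times. The sketch below assumes those timestamps are available as `(start, end)` pairs in seconds. The function names and sample data are hypothetical, and this is not the plotting code used for the figure.

```python
def concurrency_profile(intervals):
    """Return [(time, jobs_running)] at every job start or end.

    intervals is an iterable of (start, end) pairs. At equal timestamps the
    end event (-1) sorts before the start event (+1), so a job that begins
    exactly when another finishes is not counted as overlapping it.
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    events.sort()

    profile, running = [], 0
    for t, delta in events:
        running += delta
        profile.append((t, running))
    return profile


def peak_concurrency(intervals):
    """Largest number of jobs running at the same time."""
    return max((n for _, n in concurrency_profile(intervals)), default=0)


# Hypothetical job timings in seconds since the first invocation.
jobs = [(0, 10), (2, 12), (5, 8), (10, 15)]
peak = peak_concurrency(jobs)
```

Plotting the profile as a step function gives the ramp-up and plateau, and drawing each `(start, end)` pair as a horizontal segment gives the per-job runtime lines.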
Figure 2 shows that the majority of the simulation runs take less than 60 seconds to complete, well under Lambda’s time limit of 300 seconds at the time of writing.
Pywren makes it incredibly easy to leverage AWS Lambda for compute tasks, especially tasks that are inherently easy to parallelize. For more details on the performance that can be harnessed from Lambda, see the benchmarking done by Pywren’s authors, Eric Jonas et al.