Every year, Jane Street welcomes bright minds for its internship program, offering a unique opportunity to dive into the world of finance and technology. As the 2024 internship season concludes, we’re excited to showcase some of the remarkable projects our interns have undertaken. While the scope of Jane Street internships is vast and diverse, we’ve selected a few standout projects to illustrate the kind of impactful work our interns contribute.
This year, we’ll delve into projects focused on enhancing our technological infrastructure and tools:
- Arya Maheshwari’s creation of Camels, an OCaml dataframe library inspired by Polars, aimed at improving data manipulation within our systems.
- Arvin Ding’s innovative approach to binary serialization, developing a faster protocol to optimize data processing speeds, crucial for our low-latency environments.
- Alex Li’s work on a time-travel debugger for Limshare, our risk management system, significantly enhancing our debugging capabilities and system observability.
Let’s explore these projects in detail and understand how Jane Street internships provide a platform for real-world impact.
Building Camels: A Dataframe Library for OCaml
Dataframes have become an indispensable tool for data analysis and manipulation, widely used across various domains from databases to spreadsheets and specialized libraries in languages like Python and R. In the Python ecosystem, libraries such as pandas and the more modern, high-performance Polars are essential for data scientists and engineers. Polars, written in Rust, leverages Rust’s concurrency model for significant performance gains, making it attractive to users of both Rust and Python.
At Jane Street, we also leverage dataframes within our OCaml environment. We’ve developed OCaml bindings for Polars, finding them beneficial for their programming convenience and safe parallelism. However, relying on Polars has presented challenges. Rust’s build system, combined with its compilation times, can slow down our development process. Additionally, we’ve encountered limitations and bugs in Polars and its bindings that proved difficult to resolve efficiently.
Fortunately, OCaml is evolving, incorporating performance-oriented features similar to those that make Rust suitable for projects like Polars. Our work on data-race-free parallel programming in OCaml, utilizing modes and unboxed types, opens new possibilities for building high-performance systems directly within OCaml.
This context led to the inception of Camels, a pure OCaml dataframe library. We envisioned it as a solution that would integrate seamlessly with our existing OCaml applications and serve as a practical testbed for OCaml’s advanced language features. This summer, Arya Maheshwari took on the challenge of creating the foundational version of Camels. The primary objective was to establish the core structure and APIs, setting the stage for future performance optimizations.
Arya focused on designing an API that balances ease of use with the potential for significant optimization. A key design decision was to separate the syntax of a computation from its semantics. This means that defining a computation, like the running_sumproduct function below, merely constructs an expression representing the computation, rather than executing it immediately.
let running_sumproduct ~value ~weight ~ordering =
  let open Expr in
  let product = float value *. float weight in
  let sorted_product = sort_by product ~by:(float ordering) in
  cumsum sorted_product
This separation allows for a compilation phase where these expressions can be transformed into more efficient forms before execution. The execute_running_sumproduct function demonstrates this, compiling the expression into an optimized query before running it against a dataframe.
let execute_running_sumproduct df ~value ~weight ~ordering =
  Query.select (Query.view df) ~cols:[ running_sumproduct ~value ~weight ~ordering ]
  |> Dataframe.compile_exn
  |> Dataframe.execute
Another interesting design aspect of Camels is its approach to broadcasting. Broadcasting allows scalar values to be implicitly expanded into columns for operations involving both scalars and columns. While convenient, implicit broadcasting can sometimes lead to confusion. Camels opts for explicit broadcasting. For instance, adding 3 to a column requires explicitly broadcasting the scalar 3 to match the column’s dimensions:
let add3 column =
  let open Expr in
  int column + broadcast (int' 3)
To ensure type safety and catch broadcasting errors early, Arya implemented a type system for expressions. This system tracks the data type and whether an expression represents a scalar or a column. For example, omitting the broadcast in the add3 example:
let add3 column =
  let open Expr in
  int column + int' 3
results in a compile-time error, clearly indicating a type mismatch related to column length:
This expression has type (int, Length.one) t
       but an expression was expected of type (int, Length.input) t
Type Length.one is not compatible with type Length.input
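As a rough illustration of how this kind of length tracking can work, here is a minimal, self-contained sketch using a GADT with phantom type parameters. The constructor names and types here are hypothetical, not the real Camels API:

```ocaml
(* Phantom types tagging an expression's "length": a scalar (one value)
   versus a column whose length matches the input dataframe. *)
module Length = struct
  type one
  type input
end

(* ('a, 'len) expr: an expression producing values of type 'a, with
   length 'len. Add requires both operands to have the SAME length, so
   mixing a scalar with a column without Broadcast is a type error. *)
type ('a, 'len) expr =
  | Col : string -> (int, Length.input) expr     (* a named int column *)
  | Lit : int -> (int, Length.one) expr          (* an int scalar literal *)
  | Add : (int, 'len) expr * (int, 'len) expr -> (int, 'len) expr
  | Broadcast : (int, Length.one) expr -> (int, Length.input) expr

(* Type-checks only because the scalar is explicitly broadcast. *)
let add3 column = Add (Col column, Broadcast (Lit 3))
```

Writing `Add (Col column, Lit 3)` instead fails to compile, because `Length.input` and `Length.one` cannot unify, which is exactly the shape of error shown above.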
While Camels is still under development, Arya’s internship work has established a robust foundation for a powerful, native OCaml dataframe library. Future development will focus on incorporating SIMD and multicore parallelism, and on further refining query planning and expression fusion during the compilation phase.
Optimizing Binary Serialization for Speed
In the fast-paced world of trading, low-latency systems are paramount. Jane Street’s systems are designed to react swiftly to incoming market data: ingest data, perform calculations, and potentially transmit responses, all in minimal time. Beyond immediate trading actions, our systems also require robust logging of transactional information for subsequent analysis and debugging.
This logging process must be extremely efficient to avoid any performance drag on the primary trading operations. Typically, we achieve this by serializing critical data into a compact binary format. This serialized data is then handled by a separate, less time-sensitive process for further formatting and storage in our logs.
For serialization, we often employ Binprot, a highly efficient binary serialization format. We utilize code-generation syntax extensions to streamline the creation of serializer and deserializer code. However, Binprot is designed to balance speed and compactness, and isn’t specifically optimized for raw writing speed. It prioritizes minimizing the size of the serialized output, sometimes at the expense of write performance.
Arvin Ding’s internship project addressed this specific need: developing a library focused purely on maximizing serialization speed for OCaml data types, even if it meant increased output size. The core strategy was to move away from Binprot’s variable-length encoding of integers. OCaml integers are 64-bit (effectively 63-bit), typically requiring 8 bytes for representation. Binprot employs variable-length encoding to represent smaller integers using fewer bytes, as shown in this excerpt from Binprot’s integer encoding code:
let bin_write_int buf ~pos n =
  assert_pos pos;
  if n >= 0 then
    if n < 128 (* can be stored in 7 bits *) then all_bin_write_small_int buf pos n
    else if n < 16384 (* can be stored in 15 bits *) then all_bin_write_int16 buf pos n
    else if arch_sixtyfour && n >= 2147483648 then all_bin_write_int64 buf pos (Int64.of_int n)
    else all_bin_write_int32 buf pos (Int32.of_int n)
  else … ;;
This variable-length encoding is effective in reducing data size, but it introduces computational overhead for each integer serialization and deserialization. More importantly, it makes the serialized representation diverge significantly from the in-memory representation, hindering the use of highly efficient bulk copy operations like memcpy, which excels at rapidly copying contiguous blocks of memory, leveraging CPU parallelism.
By abandoning variable-length integer encoding, Arvin’s approach paved the way for utilizing memcpy for serialization. Specifically, contiguous, non-pointer fields within OCaml records could be serialized using a single memcpy operation, copying entire field blocks at once.
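To make the idea concrete, here is a rough sketch of fixed-width encoding (this is illustrative only, not Arvin’s actual library; the `order` record and function names are hypothetical). Every int occupies a fixed 8 bytes, mirroring the machine word, so runs of non-pointer fields can be written with straight-line stores, with no per-value branching:

```ocaml
(* Fixed 8-byte little-endian encoding of an OCaml int. *)
let write_int_fixed buf ~pos n =
  Bytes.set_int64_le buf pos (Int64.of_int n);
  pos + 8

let read_int_fixed buf ~pos =
  (Int64.to_int (Bytes.get_int64_le buf pos), pos + 8)

(* A hypothetical record containing only immediate (non-pointer) fields. *)
type order = { price : int; size : int; seqnum : int }

(* Serializing the fields back-to-back: unlike variable-length encoding,
   each write is a fixed-size store, so the whole block could in principle
   be copied with a single memcpy. *)
let write_order buf ~pos { price; size; seqnum } =
  let pos = write_int_fixed buf ~pos price in
  let pos = write_int_fixed buf ~pos size in
  write_int_fixed buf ~pos seqnum

let read_order buf ~pos =
  let price, pos = read_int_fixed buf ~pos in
  let size, pos = read_int_fixed buf ~pos in
  let seqnum, pos = read_int_fixed buf ~pos in
  ({ price; size; seqnum }, pos)
```

The trade-off is plain: small integers that Binprot would pack into one byte now take eight, but the writer never branches on the value, and the on-wire layout matches the in-memory layout.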
This project involved significant technical challenges. Arvin had to gain deep familiarity with low-level aspects of our codebase and the OCaml runtime. Instead of modifying the existing ppx_bin_prot syntax extension, Arvin chose to work with typerep, a first-class representation of OCaml types generated by a separate syntax extension. Serializers could then be implemented against typerep, simplifying the development process compared to direct syntactic manipulation.
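To give a flavor of the approach, here is a toy illustration of a first-class type representation (much simpler than the real typerep library, with made-up constructors): a generic serializer recurses over a value describing the type’s structure, rather than being generated syntactically for each type:

```ocaml
(* A toy first-class type representation: a GADT value describing the
   structure of a type. *)
type _ ty =
  | Int : int ty
  | Pair : 'a ty * 'b ty -> ('a * 'b) ty

(* One generic serializer that dispatches on the representation at
   runtime, instead of one generated function per type. *)
let rec serialize : type a. a ty -> a -> string =
  fun ty v ->
    match ty with
    | Int -> string_of_int v
    | Pair (ta, tb) ->
      let a, b = v in
      "(" ^ serialize ta a ^ ", " ^ serialize tb b ^ ")"
```

Working against such a representation means the serialization logic lives in ordinary, debuggable library code, at the cost of a runtime dispatch that a code-generating ppx avoids.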
The project also necessitated delving into C bindings for low-level bit manipulation and a thorough understanding of OCaml’s memory layout. Resources like the Real World OCaml documentation on runtime memory layout proved invaluable. Debugging was also a significant part of the project. Arvin encountered issues with unexpected memory allocations, notably discovering that Obj.unsafe_ith_field unexpectedly allocates when used on records containing only floats, due to their special flat memory representation. This required Arvin to implement a custom C version of unsafe_ith_field.
The final outcome was highly successful. Benchmarks consistently showed Arvin’s new serialization library outperforming Binprot. For smaller messages, performance improvements ranged from 10-20%. For larger messages with substantial non-pointer data, the speedup was dramatic, reaching up to 15 times faster. These performance gains have translated directly to our production systems. After deploying the new protocol for over a month, we observed tail latency reductions of 30-65% in real-world trading systems.
The stability and correctness of the library are also crucial. Extensive Quickcheck testing, developed by Arvin, contributed to the library’s reliability. With no observed crashes or serialization/deserialization errors, the library has proven robust and is now being considered for wider adoption across various Jane Street systems.
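The essential property such tests verify is that deserialization inverts serialization for arbitrary inputs. Here is a sketch of that roundtrip property, using plain stdlib Random in place of Quickcheck and a toy 8-byte int codec in place of the real serializers (both substitutions are ours, for a self-contained example):

```ocaml
(* Toy codec: fixed 8-byte little-endian int encoding. *)
let serialize n =
  let buf = Bytes.create 8 in
  Bytes.set_int64_le buf 0 (Int64.of_int n);
  buf

let deserialize buf = Int64.to_int (Bytes.get_int64_le buf 0)

(* Generate ints across the full range, including negatives, by
   combining three 30-bit draws. *)
let random_int () =
  (Random.bits () lsl 60) lxor (Random.bits () lsl 30) lxor Random.bits ()

(* The roundtrip property: decode (encode n) = n for all n. *)
let () =
  for _ = 1 to 10_000 do
    let n = random_int () in
    assert (deserialize (serialize n) = n)
  done
```

A property-based framework like Quickcheck adds value on top of this loop by shrinking failing inputs to minimal counterexamples, which matters when the serialized values are large nested records rather than single ints.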
Developing a Time-Travel Debugger for Limshare
Risk management is a cornerstone of Jane Street’s trading operations. Our risk controls rely on a suite of risk-checking systems that enforce a comprehensive set of risk rules, each system managing a specific aspect of our trading activities. These systems operate within allocated risk limits, representing the maximum risk exposure permitted for each trading system. These limits are critical resources that directly influence our trading capacity. Historically, risk limits were allocated statically based on pre-defined configurations.
While straightforward, static allocation has limitations in maximizing limit utilization. It requires predicting system needs in advance, which is inherently imperfect. To address this, we developed Limshare, a dynamic risk limit allocation system designed to optimize limit usage while maintaining overall risk control. Limshare is built upon Aria, a framework for distributed state machine replication. Aria ensures system consistency and fault tolerance by maintaining a global log of updates. Applying these updates sequentially reconstructs the application’s state, providing inherent replication and persistence.
The update log in Aria applications offers a valuable debugging resource. In principle, one can step through the log events to trace the system’s state evolution and pinpoint the root cause of issues. However, our existing debugging tools for Limshare were rudimentary, primarily allowing step-by-step replay of Aria messages with limited state inspection. The existing interactive debugger lacked snapshotting capabilities. It replayed messages from the beginning of the day to a target time, then allowed stepping forward, but going backward required restarting the entire replay process, making debugging time-consuming.
Alex Li’s internship project aimed to significantly enhance the Limshare debugger by incorporating Aria’s snapshotting mechanism. Snapshots are essentially state summaries at specific points in time. Without snapshots, application startup requires replaying the entire update log. Snapshots allow starting from a recent snapshot and replaying only subsequent updates, drastically reducing startup time.
Alex implemented snapshotting within the debugger, enabling it to periodically capture application state snapshots. To “time-travel” backward, the debugger now locates the most recent snapshot preceding the desired time and replays messages forward from that snapshot, efficiently reconstructing the system state at any point in time. Crucially, Alex also implemented the initial snapshotting logic within Limshare itself as part of this project.
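The seek logic can be sketched as follows (hypothetical types and names, not the real Aria API): to reach time t, restore the latest snapshot taken at or before t, then replay only the log updates between the snapshot and t:

```ocaml
(* A snapshot is a full state captured at a point in time; an update is a
   timestamped state transition from the global log. *)
type 'state snapshot = { taken_at : float; snap : 'state }
type 'state update = { at : float; apply : 'state -> 'state }

(* snapshots and log are assumed sorted by time, oldest first. Returns
   None if no snapshot precedes the target time. *)
let seek ~snapshots ~log ~target =
  (* Latest snapshot not after the target time. *)
  let base =
    List.fold_left
      (fun acc s -> if s.taken_at <= target then Some s else acc)
      None snapshots
  in
  Option.map
    (fun { taken_at; snap } ->
      (* Replay only the updates strictly after the snapshot, up to the
         target, instead of the whole day's log. *)
      List.fold_left
        (fun st u -> if u.at > taken_at && u.at <= target then u.apply st else st)
        snap log)
    base
```

Stepping backward then costs one snapshot restore plus a short replay, rather than a full replay from the start of the day.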
This enhanced debugger has become instrumental in investigating complex production incidents. Consider this example, inspired by real events, of using the debugger to understand why a specific order was unexpectedly rejected. First, we step forward in time until the rejection event is observed. Then, we step back one message to examine the system state just before the rejection decision.
> step-time 10m
(Decision (Resize_pool 2) REJECTED)
stop_condition: Breakpoint: Reached a rejected request decision.
current stream time: 2024-08-26 10:36:23.967645939

> back-messages 1
(Request.Resize_pool (pool_id 140737488355330) (downcrash $9_241_233) (upcrash $1_391))
current stream time: 2024-08-26 10:36:23.967491204
Now, we can inspect the system state, focusing on limit usages and the rejected allocation request.
> print
Checked out resources and limits
┌────────────────────┬─────────────┬──────────────┬─────────────┬──────────────┐
│ node id │ resources ↓ │ limit ↓ │ resources ↑ │ limit ↑ │
├────────────────────┼─────────────┼──────────────┼─────────────┼──────────────┤
│ kumquat │ U$6_453_178 │ U$10_000_000 │ U$34_748 │ U$10_000_000 │
└────────────────────┴─────────────┴──────────────┴─────────────┴──────────────┘
pools
┌─────────────────┬─────────────┬───────────────────────┬──────────────────────────┐
│ pool │ risk system │ request bundle │ size │
├─────────────────┼─────────────┼───────────────────────┼──────────────────────────┤
│ 140737488355330 │ nasdaq │ pts2, kumquat │ ↓ $0 | ↑ $0 │
│ 140737488355329 │ nyse │ pts1, kumquat │ ↓ $6_453_178 | ↑ $34_748 │
└─────────────────┴─────────────┴───────────────────────┴──────────────────────────┘
Undecided requests
┌───┬────────┬─────────────────┬─────────────────┬─────────────────────────┐
│ # │ Kind │ Node │ Pool │ Desired Size │
├───┼────────┼─────────────────┼─────────────────┼─────────────────────────┤
│ 1 │ Resize │ kumquat │ 140737488355330 │ ↓ $9_241_233 | ↑ $1_391 │
└───┴────────┴─────────────────┴─────────────────┴─────────────────────────┘
In this scenario, we observe that pts2’s request was rejected because pts1 already held a significant limit reservation. To investigate further, we can time-travel back another five minutes to examine the duration of this reservation.
> back-time 5m
("Enforcer lease " (has_lease_until (2024-08-26 10:31:28.713728082-04:00)))
stop_condition: Time_limit
current stream time: 2024-08-26 10:31:23.713892045

> print
Checked out resources and limits
┌────────────────────┬─────────────┬──────────────┬─────────────┬──────────────┐
│ node id │ resources ↓ │ limit ↓ │ resources ↑ │ limit ↑ │
├────────────────────┼─────────────┼──────────────┼─────────────┼──────────────┤
│ kumquat │ U$6_453_178 │ U$10_000_000 │ U$34_748 │ U$10_000_000 │
└────────────────────┴─────────────┴──────────────┴─────────────┴──────────────┘
pools
┌─────────────────┬─────────────┬───────────────────────┬──────────────────────────┐
│ pool │ risk system │ request bundle │ size │
├─────────────────┼─────────────┼───────────────────────┼──────────────────────────┤
│ 140737488355330 │ nasdaq │ pts2, kumquat │ ↓ $0 | ↑ $0 │
│ 140737488355329 │ nyse │ pts1, kumquat │ ↓ $6_453_178 | ↑ $34_748 │
└─────────────────┴─────────────┴───────────────────────┴──────────────────────────┘
This investigation revealed that the extended reservation was unwarranted, indicating a bug. This example highlights the power of effective observability tools: they illuminate previously hidden system behaviors, simplifying debugging and problem resolution. This aligns with the impact of other tools we’ve developed, such as magic-trace and memtrace.
The time-travel debugger’s concept is broadly applicable beyond Limshare. Recognizing its value, the Aria team has adopted the project and intends to make it a standard tool across the Aria ecosystem. Several other teams are already planning to integrate it into their Aria-based applications.
Join the Jane Street Internship Experience
If these projects resonate with you and you’re seeking a challenging and rewarding internship, we encourage you to apply for a Jane Street internship. Our program offers a unique opportunity to engage in meaningful, real-world projects that will significantly enhance your engineering skills, as exemplified by the projects discussed. A Jane Street internship is more than just a job; it’s a chance to contribute to impactful projects and grow as a technologist.
Author: Yaron Minsky, who joined Jane Street in 2002 and played a key role in the firm’s adoption of OCaml.