Improving calculation performance in graphs

Tips and tricks to speed up your calculations in Boulder Opal

Boulder Opal uses dataflow graphs to represent the computations or optimizations to be carried out, which provide maximum flexibility and customization. They allow, for instance, to perform control optimization in a wide variety of situations, such as using user-defined basis functions, on large quantum systems, or on systems subject to strong noise. They can also be used to simulate open quantum systems, quantum circuits, or noisy quantum systems.

If you want to learn more about graphs in Boulder Opal, you can read our introductory topic, go through our tutorial to use a graph to simulate a quantum system, or take a look at our user guides to learn about more advanced graph-based features in Boulder Opal. Our topic on graphs can give you a deeper understanding on what graphs are, why we use them, and an overview on the typical Boulder Opal graph workflow.

Dataflow graphs can lead to very efficient calculations. However, it sometimes requires programming in a different way than in other programming paradigms. Here we present a few things that you can consider while building graphs which will help you speed up their execution and get results faster.

Avoid loops and vectorize the calculations

Using loops to define your graph, particularly if they iterate over a large number of elements, can lead to inefficient graphs. If you are performing the same calculation for different sets of parameters, you should consider vectorizing it by batching the calculation to perform the calculation for the whole set of parameters at once.

You can easily create batched Hamiltonian terms by using the operations to create Pwc and Stf objects. Operations such as graph.pwc, graph.constant_pwc, graph.pwc_signal, graph.complex_pwc_signal, graph.constant_pwc_operator, graph.stf_operator, or graph.constant_stf_operator allow you to create batched time-dependent functions by passing Tensors or NumPy arrays with leading dimensions representing batches. Pwc and Stf objects have a value_shape and a batch_shape. The value_shape represents the shape of the function value (for example, a piecewise-constant operator in a qubit system would have a value_shape of (2, 2)). The batch_shape shows the number of batched elements in the function (which can be distributed among multiple dimensions). When manipulating batched Pwc or Stf objects, the broadcasting rules are applied to both the value and batch shape of the objects. Graph nodes to calculate time-evolution operators or infidelities can accept a batch of Hamiltonians, and return a Tensor with the evolution operators (or infidelity) for each element in the batch.

Our user guides contain many examples of batching being used in a wide variety of situations. In system identification, batching is a convenient way to process multiple data points at once, as shown in our user guides for transmission line characterization, or Hamiltonian parameter estimation for a small or a large amount of data. In a simulation, batching can parallelize the computation for different noise values in order to perform quasi-static scans of noise, or simultaneously perform a calculation for many time-dependent noise realizations affecting your quantum system or quantum circuit. Similarly, this parallelization can be used in stochastic optimization to apply different noise realizations at each iteration of the optimizer and generate controls that are robust against that noise.

Try eager execution

When you call qctrl.functions.calculate_graph, by default Boulder Opal compiles your graph in order to speed up its execution. This allows some operations to be computed in parallel and with low overhead, although it adds some compilation time. For most simulations of open system dynamics, large systems, or systems described by sampleable tensor function (Stf) objects, compiling the graph results in a faster computation.

For some graphs, however, the compilation can take a very long time, and ends up slowing down the total computation time. In such cases, it is worth executing the graph eagerly, meaning that the values of the graph nodes are calculated sequentially as they are encountered. For example, if you are simulating a piecewise-constant Hamiltonian defined over a large number (more than roughly 1000) of segments, using eager execution will most likely speed up the computation time.

You can execute your graph eagerly by passing execution_mode="EAGER" to qctrl.functions.calculate_graph.

qctrl.functions.calculate_graph(
    graph=graph, output_node_names=["hamiltonian", "states"], execution_mode="EAGER"
)

Calculate state evolution and use sparse matrices

If you are calculating the evolution of a quantum system (for a simulation or inside an optimization), computing the time-evolution operators can be very demanding for large systems. In the case where you are only interested in the evolution of a single initial state (rather than the full unitary generated by the Hamiltonian), performing state evolution instead can lead to an important speedup of your calculation. You can calculate the state evolution of a quantum system by using the graph.state_evolution_pwc or graph.density_matrix_evolution_pwc operations, depending whether your system is closed or open.

Moreover, if your quantum system has a large Hilbert space dimension (larger than several hundred), you should consider using a sparse representation of your system, as Hamiltonians in most high-dimensional systems contain mostly zeros. Sparse matrices allow for faster matrix multiplication and lead to a significant reduction in computation time in most state propagation calculations. The state evolution operations can take a sparse representation of your Hamiltonian, which you can create with operations such as graph.sparse_pwc_operator, graph.constant_sparse_pwc_operator, and graph.sparse_pwc_sum. These work analogously to graph.pwc_operator, graph.constant_pwc_operator, and graph.pwc_sum, but take SciPy sparse matrices instead of NumPy arrays.

To see how to use these nodes to perform state evolution in a variety of use cases, refer to our user guides. For example, you might be interested in optimizing controls on large sparse Hamiltonians or simulating large open system dynamics.

Compute large arrays inside the graph

When you pass a Graph object to a Boulder Opal function, the graph is sent to our cloud servers. Besides the graph structure, all the inputs that you've used when defining it are also uploaded, for instance the NumPy arrays representing the operators in the system.

When defining Hamiltonians for large systems, in particular if they belong to composite systems, these NumPy arrays for the Hamiltonian terms can be very large, and increase the payload associated to the graph. Creating some of those objects as Tensors inside of the graph, instead of using NumPy operations, will result in a much more lightweight graph, which will upload faster.

For example, if you are creating a Hamiltonian in a bipartite system, your terms might look like $\gamma(t) A \otimes B$, where $A$ and $B$ are $N_A \times N_A$ and $N_B \times N_B$ operators. In order to create this term inside a graph, you might do something like

product_operator = np.kron(A, B)
gamma_term = graph.pwc_operator(signal=gamma, operator=product_operator)

The input to the graph is np.kron(A, B) which is a $N_A N_B \times N_A N_B$ NumPy array.

If you use graph.kron instead of np.kron,

product_operator = graph.kron(A, B)
gamma_term = graph.pwc_operator(signal=gamma, operator=product_operator)

then product_operator is already a Tensor inside of the graph, whose inputs are two NumPy arrays (with shapes $N_A \times N_A$ and $N_B \times N_B$). The resulting gamma_term in the graph will be the same, but depending on $N_A$ and $N_B$, creating product_operator in the graph leads to a large reduction of the amount of data sent to the cloud, and speeds up the uploading of the graph. This reduction will also be particularly acute in composite system with many parts, such as many-qubit systems, as each Hamiltonian term is the result of many chained Kronecker products.

You can see examples of how this is done for multiqubit systems in a quantum circuit and in an atomic chain. Although this will not improve the performance of the optimization or calculation, it can make the payload smaller and reduce the time to upload the graph.