We launched ChemDoodle 3D v6.1 on August 28, 2020. Included was parallel processing support for the molecular modeling engine, which this article discusses in detail.
The cost of molecular modeling algorithms grows rapidly as the number of atoms in the system increases. This fact forces us to endure longer and longer runtimes to optimize our molecules as they become larger. One remedy is to increase our computer's power with a faster central processing unit (CPU). And of course, a faster CPU will speed up the performance of any program. Yet, there is a limit to how fast contemporary CPUs can run, regardless of how much effort the industry applies to raising that ceiling, and the most powerful CPUs are priced accordingly.
There is another trick. Most modern computers are now equipped with multi-core CPUs. We can split up the work of the algorithm across the multiple cores of the CPU and benefit from a much faster calculation. But unlike the free benefit of a faster CPU, correctly parallelizing computational tasks is not easy, and there are many pitfalls to implementing algorithms that safely benefit from parallelization. Fortunately, iChemLabs has already done all of the hard work for you in ChemDoodle 3D.
Did you know?
Just by using ChemDoodle 3D, you have already been taking advantage of parallel processing, as the 3D graphics are produced on the graphics processing unit (GPU, also known as the graphics card), which is custom suited to parallelizing the creation of images. Parallelization of the molecular modeling engine in ChemDoodle 3D occurs on multi-core CPUs, instead of on the GPU.
Enabling parallel processing
You can enable parallel processing for the molecular modeling engine in ChemDoodle 3D by selecting the Enable Parallel Processing option in the Force Fields section of the Functions tab of the Preferences window.
In ChemDoodle 3D, we parallelized our force field implementations. This means that the optimization of a single molecule will run in parallel and a single molecule will be optimized faster, as opposed to optimizing multiple molecules in parallel, which is much easier to do, but provides no benefit to those optimizing a single system. It may seem logical for an algorithm run in parallel on a quad-core CPU (for instance) to run 4 times faster than a sequentially run (non-parallelized) algorithm, but this is not the case. In fact, by enabling parallelization, you may actually decrease the performance of your task. There are a number of reasons for this, which we will discuss now.
- Extent of Parallelization - The algorithm we are improving may not be able to be parallelized in its entirety. Only tasks that are independent of one another may be parallelized. If we are optimizing 4 different molecules, we can expect a dramatic decrease in runtime by parallelizing the task across 4 cores, one for each molecule. The optimization of one molecule has no bearing on the optimization of the others. But other algorithms cannot be handled so cleanly. For instance, the optimization procedure itself requires a certain number of partial steps from the unoptimized structure to the optimized structure. Each successful step towards the optimized molecular structure depends on the results of the step preceding it. It does not make sense to run all of the steps concurrently, so they must be run sequentially. If only 10% of the runtime of a function is parallelizable, then parallelization can never reduce that runtime by more than 10%.
- Overhead - Parallel processing does not come for free. Algorithms running in parallel must be stateless. In order to make an algorithm stateless, you have to produce specific data structures to store data for each task. More data is stored, and more objects are instantiated. A complex fork-join algorithm is employed to split the work and combine the results, significantly increasing the runtime of your algorithm. More memory will be used. Moving data between multiple cores is an expensive operation. In order for a parallel algorithm to run faster than its sequential counterpart, the runtime saved by splitting the work must overcome the runtime introduced by the additional overhead.
- Setup Specific - Most computers are unique: a specific make, operating system, CPU, GPU, memory, hyper-threading, etc. The benefit of parallel processing will be heavily dependent on your hardware. More cores available to your CPU will increase the likelihood that parallelization will improve your runtime.
- Computation Specific - Just because a parallel algorithm improves runtime for a specific molecule with a specific force field, that does not mean another molecule or another force field will see similar results. It all depends on how well the computations can be divided and how complex the computations are. Many programmers use the NQ model to assess parallel processing efficacy. N is the number of pieces the computation can be split into and Q is the cost of computing each piece. The higher the product, the more likely parallel processing will benefit the algorithm. For instance, the optimization runtime of a water molecule, with two bond stretch contributions and one angle bend contribution, will almost certainly be negatively impacted by parallelization.
- Other factors - Software is not running in a vacuum and other applications may be using computer resources. If you are running other CPU intensive applications, then parallel processing in ChemDoodle 3D will be less effective.
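The ceiling described under Extent of Parallelization is commonly expressed as Amdahl's law. The following is a minimal sketch of that formula in Java, ignoring all overhead; the class and method names are illustrative only and are not part of ChemDoodle 3D.

```java
/** Sketch of Amdahl's law: the ideal speedup when only part of a task can run in parallel. */
final class AmdahlSketch {

    /**
     * @param parallelFraction fraction of the sequential runtime that can be parallelized (0 to 1)
     * @param cores            number of cores sharing the parallelizable part
     * @return the ideal speedup factor, assuming zero parallelization overhead
     */
    static double speedup(double parallelFraction, int cores) {
        if (parallelFraction < 0.0 || parallelFraction > 1.0 || cores < 1) {
            throw new IllegalArgumentException("fraction must be in [0, 1] and cores must be >= 1");
        }
        // The serial part always runs at full cost; only the parallel part is divided.
        double serialFraction = 1.0 - parallelFraction;
        return 1.0 / (serialFraction + parallelFraction / cores);
    }

    public static void main(String[] args) {
        for (double p : new double[] {0.10, 0.50, 0.90}) {
            System.out.printf("parallel fraction %.2f on 4 cores: %.2fx%n", p, speedup(p, 4));
        }
    }
}
```

With a parallel fraction of 0.10, the speedup stays below 1/0.9 (about 1.11x) no matter how many cores are used, which is the same statement as a runtime reduction of at most 10%.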
In ChemDoodle 3D, we have developed a powerful parallel processing system for our force fields. We micromanage the implementation with our own proprietary MapReduce algorithm, and perform our own chunking, forking and joining. Our goal is to minimize overhead. As we develop even better ways to reduce overhead, you will continue to see parallel processing performance in ChemDoodle 3D improve without any action on your part.
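To give a feel for what chunking, forking and joining mean in general, here is a minimal sketch using Java's standard fork/join framework. It is not the proprietary ChemDoodle 3D implementation: the class name, the chunk size, and the toy harmonic bond-stretch energy with its constants are all assumptions made for illustration.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/** Illustrative chunk/fork/join reduction over energy terms (hypothetical, not ChemDoodle 3D code). */
final class ChunkedEnergySum extends RecursiveTask<Double> {
    // Hypothetical tuning knob: ranges at or below this size are summed sequentially.
    private static final int CHUNK_SIZE = 1_000;

    private final double[] bondLengths; // hypothetical input: one current length per bond
    private final int from;             // inclusive start of this task's range
    private final int to;               // exclusive end of this task's range

    ChunkedEnergySum(double[] bondLengths, int from, int to) {
        this.bondLengths = bondLengths;
        this.from = from;
        this.to = to;
    }

    /** "Map" step: toy harmonic stretch energy 0.5 * k * (r - r0)^2 with made-up constants. */
    static double stretchEnergy(double r) {
        final double k = 300.0;
        final double r0 = 1.5;
        double d = r - r0;
        return 0.5 * k * d * d;
    }

    @Override
    protected Double compute() {
        if (to - from <= CHUNK_SIZE) {
            // Small enough: sum this chunk directly, reading shared data without modifying it.
            double sum = 0.0;
            for (int i = from; i < to; i++) {
                sum += stretchEnergy(bondLengths[i]);
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        ChunkedEnergySum left = new ChunkedEnergySum(bondLengths, from, mid);
        ChunkedEnergySum right = new ChunkedEnergySum(bondLengths, mid, to);
        left.fork();                       // schedule the left half on another worker
        double rightSum = right.compute(); // compute the right half on this thread
        return left.join() + rightSum;     // "reduce" step: combine the partial sums
    }

    static double totalEnergy(double[] bondLengths) {
        return ForkJoinPool.commonPool().invoke(new ChunkedEnergySum(bondLengths, 0, bondLengths.length));
    }

    public static void main(String[] args) {
        double[] lengths = new double[100_000];
        java.util.Arrays.fill(lengths, 1.6);
        System.out.println("total toy stretch energy: " + totalEnergy(lengths));
    }
}
```

Each task only reads the shared array and returns its own partial sum, so no locking is needed. The chunk size is where the overhead trade-off shows up: chunks that are too small spend more time forking and joining than computing.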
We benchmark the optimization of a few chemical structures in this section to provide a better understanding of the benefits of parallel processing and illustrate when it is appropriate to enable parallel processing. For this analysis, we optimized several chemical structures, each 21 times, discarding the first result as a warm up. Any iterations resulting in unrealistic configurations were rerun. The 20 recorded runtimes for successful optimizations were then averaged for each molecule. Only the optimization runtimes were recorded, not any file parsing, structure loading, hydrogen enforcement, etc. The optimizations were performed using an MMFF94 force field with a conjugate gradients search direction and Newton line search to convergence. Understandably, each optimization is unique because the starting configuration is random, so the range between the minimum and maximum runtimes may be large. The averages, however, were consistent across multiple runs, and this coarse analysis is sufficient for understanding the parallelized force field performance in ChemDoodle 3D. The following table collects the results.
All benchmarks were performed on a 2017 iMac running macOS 10.15.6 with a 4.2 GHz Quad-Core Intel Core i7 CPU, with 8 logical cores due to hyper-threading. Java version 11.0.2 was used to compile and run the tests. No other CPU intensive applications were active.
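The measurement procedure described above (one discarded warm-up run, reruns for unrealistic results, then an average over the recorded runs) could be sketched in Java roughly as follows. This is not the harness we used; the class name, the method signature, and the placeholder workload are hypothetical.

```java
import java.util.function.BooleanSupplier;

/** Sketch of a warm-up-then-average timing loop (hypothetical harness, for illustration only). */
final class WarmupBenchmark {

    /**
     * Runs the task once as an unrecorded warm-up, then times it until
     * {@code recordedRuns} successful runs have been collected.
     *
     * @param optimization task to time; returns true if the result was realistic.
     *                     Runs returning false are repeated and not recorded.
     * @param recordedRuns number of successful runs to average
     * @return mean runtime of the recorded runs, in milliseconds
     */
    static double averageMillis(BooleanSupplier optimization, int recordedRuns) {
        if (recordedRuns < 1) {
            throw new IllegalArgumentException("recordedRuns must be >= 1");
        }
        optimization.getAsBoolean(); // warm-up run, discarded

        long totalNanos = 0L;
        int recorded = 0;
        while (recorded < recordedRuns) {
            long start = System.nanoTime();
            boolean realistic = optimization.getAsBoolean();
            long elapsed = System.nanoTime() - start;
            if (realistic) { // only successful runs count towards the average
                totalNanos += elapsed;
                recorded++;
            }
        }
        return totalNanos / 1_000_000.0 / recordedRuns;
    }

    public static void main(String[] args) {
        // Placeholder workload standing in for a real structure optimization.
        BooleanSupplier placeholder = () -> {
            double x = 0.0;
            for (int i = 1; i <= 1_000_000; i++) {
                x += Math.sqrt(i);
            }
            return x > 0.0;
        };
        System.out.printf("mean of 20 recorded runs: %.3f ms%n", averageMillis(placeholder, 20));
    }
}
```

Only the call to the task itself sits between the two timestamps, mirroring the choice to exclude file parsing and structure loading from the recorded runtimes.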
The following graph illustrates the relationship between molecular complexity and runtime for optimizations.
For simple molecules, parallel processing will be a detriment to performance. At the equivalence point, where the two curves intersect, the optimization of a molecule will take the same time with either sequential or parallel processing. More complex molecules will see improved performance. In our testing environment, aspirin was close to the equivalence point, so molecules more complex than aspirin will benefit from parallel processing.
This equivalence point will be unique to the computer you are using ChemDoodle 3D on, and will be dependent on the force field, search direction and line search you are using and how many other applications you are running. The more complex your molecule gets, the more you will benefit from parallel processing up to the limit of dividing the work by the number of cores available. You can use this understanding to make an educated decision about whether parallel processing will benefit your work in ChemDoodle 3D.
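One way to reason about the equivalence point is a toy cost model in the spirit of the NQ model: sequential cost is N times Q, while parallel cost is that work divided across the cores plus a fixed overhead. The sketch below is an assumption-laden simplification, not a description of how ChemDoodle 3D behaves; the class, the methods, and the idea of a single constant overhead are all hypothetical.

```java
/** Toy break-even model for parallel versus sequential cost (illustrative assumptions only). */
final class BreakEvenSketch {

    /** Sequential cost: n pieces of work, each costing q (arbitrary time units). */
    static double sequentialCost(long n, double q) {
        return n * q;
    }

    /** Parallel cost: the same work divided evenly across cores, plus a fixed overhead. */
    static double parallelCost(long n, double q, int cores, double overhead) {
        return n * q / cores + overhead;
    }

    /**
     * Smallest n at which the parallel cost no longer exceeds the sequential cost,
     * obtained by solving n * q * (1 - 1/cores) >= overhead for n.
     */
    static long breakEvenN(double q, int cores, double overhead) {
        if (cores < 2 || q <= 0.0 || overhead < 0.0) {
            throw new IllegalArgumentException("need cores >= 2, q > 0 and overhead >= 0");
        }
        return (long) Math.ceil(overhead / (q * (1.0 - 1.0 / cores)));
    }

    public static void main(String[] args) {
        // Made-up numbers: unit cost per piece, 4 cores, overhead worth 300 units.
        long n = breakEvenN(1.0, 4, 300.0);
        System.out.println("break-even N: " + n);
        System.out.println("sequential: " + sequentialCost(n, 1.0));
        System.out.println("parallel:   " + parallelCost(n, 1.0, 4, 300.0));
    }
}
```

In this model, raising the overhead or lowering the per-piece cost pushes the break-even point towards larger systems, and the parallel cost can never drop below the work divided by the number of cores.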
Should parallel processing be enabled?
Why not? If your molecules are taking a long time to optimize and you desire faster runtimes, turn it on. If it helps, great! If it doesn't, simply turn it back off.
If you wish to reserve CPU cores for other tasks, and do not want ChemDoodle 3D to consume significant CPU resources, then keep it off.
If you have any questions or comments, please feel free to reach out!
Main image credits: The molecular graphics in the image were created in ChemDoodle 3D. The background circuit shapes are used with attribution: Circuit Vectors by Vecteezy