This article is the first part of a three-part series on chemical data recovery written by Kevin Theisen, President of iChemLabs:
We launched ChemDoodle 2D v11.2 on December 4, 2020. Included were powerful new tools for recovering chemical data from files, including Microsoft Office files, which this article discusses in detail.
In this artistic rendering, several archaeologists unearth fossils of chemical structures. This is a metaphor for new chemical data recovery features in ChemDoodle 2D.
The handling of chemical data is extremely important: without it, chemists would not be able to store or communicate chemical information through computers. Over the past several decades, many file formats have been proposed to represent chemical information. Several have become essential and are implemented by the vast majority of chemical software, including the MDL connection table formats (MOL, SDF, etc.), Chemical Markup Language (CML), and the CDX/CDXML formats. A fair amount of chemistry software “reads” chemical files. However, most of it understands little more than basic molecules and perhaps reactions, even though the formats define many more objects and properties for creating complex chemical data and figures. The goal of chemical file reading in ChemDoodle is to fully understand all objects and properties and to recover the graphics of chemical figures drawn in other programs in a pixel-perfect manner.
Also prevalent in the professional chemistry community is a workflow referred to as “round-trip editing” (fully supported by ChemDoodle and discussed in detail in the ROUND TRIP EDITING section of the ChemDoodle 2D user guide), which allows a user to create data in one program (such as ChemDoodle) and store it in another program (such as Microsoft Office). Unlike text or images, chemical data is not an intrinsic data type in most programs, so round-trip editing lets a chemist manage it without having to save and track multiple files from different applications. For instance, if a chemist includes a dozen reaction schemes in a word processor document, those reaction schemes can later be accessed and edited through the word processor’s application and file.
Professional chemists have come to rely on the convenience of round-trip editing, but it is fraught with risk. On several occasions in the past decade or so, Microsoft Office has dropped round-trip editing support on macOS, leading to many macOS users losing access to their chemical data. The practical lesson for anyone working with niche data such as chemical data is that your computer and software will eventually change, and there is a chance you will be locked out of your data. Some software may become obsolete, the developer may change it, you may switch to an operating system where the software is no longer compatible, or you may no longer be able to afford the products you used in the past. In all of these cases, you will no longer have access to the original chemical data you created. Years of work and effort may be lost in this way.
ChemDoodle includes very thorough tools for recovering chemical data, whether through files, via interactions with other software, or through round-trip editing. In particular, ChemDoodle expertly handles chemical data and can recover embedded chemical data in other applications, regardless of the operating system you are using. This allows you to maintain control and access to your work. This process is very complex, and so we describe the different ways of recovering chemical data as levels. There are 4 different levels of chemical data recovery. ChemDoodle fully supports all 4 levels on Windows, macOS and Linux.
- Level 0: Reading chemical files
- Level 1: Pasting directly from other chemical applications
- Level 2: Pasting embedded data from other (non-chemical) applications
- Level 3: Recovering data from Microsoft Office files
Level 0: Reading chemical files
The most basic function for recovering chemical data is to read chemical files. ChemDoodle understands over 30 widely used chemical file formats. As stated in the introduction, the goal of chemical file interpretation in ChemDoodle is to fully and completely handle file contents and to recover the graphics in a pixel-perfect manner.
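To make "reading a chemical file" concrete, here is a minimal Python sketch that reads only the atoms and bonds of an MDL MOL (V2000) connection table. It is an illustration written for this article, not ChemDoodle's implementation. The names `Atom`, `Bond`, `parse_molfile_v2000` and the sample data are hypothetical, and the sketch deliberately ignores everything beyond the basic molecule (charges, stereo flags, property lines and so on), which is exactly the kind of content a thorough reader must also handle.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    symbol: str
    x: float
    y: float
    z: float


@dataclass
class Bond:
    begin: int  # 1-based index of the first atom
    end: int    # 1-based index of the second atom
    order: int  # bond type field (1 = single, 2 = double, ...)


def parse_molfile_v2000(text):
    """Read atoms and bonds from a V2000 molfile; all other data is ignored."""
    lines = text.splitlines()
    # Lines 1-3 are the header block; line 4 is the counts line, whose
    # first two 3-character fields hold the atom and bond counts.
    counts = lines[3]
    n_atoms = int(counts[0:3])
    n_bonds = int(counts[3:6])

    atoms = []
    for line in lines[4:4 + n_atoms]:
        # Fixed-width fields: x, y, z (10 characters each), a space, symbol.
        atoms.append(Atom(
            symbol=line[31:34].strip(),
            x=float(line[0:10]),
            y=float(line[10:20]),
            z=float(line[20:30]),
        ))

    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        # Fixed-width fields: first atom, second atom, bond type.
        bonds.append(Bond(int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return atoms, bonds


# Hypothetical sample: the three heavy atoms of ethanol.
SAMPLE = "\n".join([
    "ethanol",
    "  sketch",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    2.2500    1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "  2  3  1  0",
    "M  END",
])

atoms, bonds = parse_molfile_v2000(SAMPLE)
print([a.symbol for a in atoms], [(b.begin, b.end, b.order) for b in bonds])
```

A reader like this recovers the connectivity but none of the drawing: bond styles, fonts, arrows and shapes are stored by other formats in other structures, which is why full graphical recovery is much harder than this sketch suggests.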
Data and graphics support in ChemDoodle is an immense undertaking. Yet, the ChemDoodle application does this expertly. If you are not satisfied with the way ChemDoodle opens your chemical file, simply send it to us and tell us what you are not satisfied with, and our goal will be to improve it.
Level 1: Pasting directly from other chemical applications
It may not seem sensible to need two separate chemistry applications, let alone two different chemical drawing applications, but the scope of each application is different, and there are functions ChemDoodle can perform that other chemical software cannot. For instance, you may want to use ChemDoodle to style bond strokes or generate 3D coordinates. You can copy from other chemistry applications and paste into ChemDoodle on both Windows and macOS. On Windows, ChemDoodle is compatible with ChemDoodle Collages, ChemDoodle JSON, MDLCT and MDLSK, and ChemDraw Interchange Format clipboard data. On macOS, it is compatible with ChemDoodle Collages, ChemDoodle JSON and ChemDraw Interchange Format clipboard data.
Level 2: Pasting embedded data from other (non-chemical) applications
Some applications embed chemical metadata into figures pasted into third-party applications, in a process referred to as "round-trip editing". The technical procedure for recovering this data is different on Windows and macOS. However, on both operating systems, these figures can be copied from the third-party application and pasted into ChemDoodle to recover the original data for further editing, without using or having access to the original chemical software that created the figure.
This process will be limited to the operating system the figure was created on. This is because Windows and macOS handle system metadata differently and most programs working on both operating systems (like Microsoft Office) do not convert the data for the other operating system.
As an example, if you have pasted a chemical figure into Microsoft Word on Windows, you will later be able to copy that chemical figure from Microsoft Word on Windows and paste into ChemDoodle for further editing. And if you have pasted a chemical figure into Microsoft Word on macOS, you will later be able to copy that chemical figure from Microsoft Word on macOS and paste into ChemDoodle for further editing. This works for any application correctly storing the chemical metadata, including WordPad on Windows and Pages and Mail on macOS.
If you need to get around this operating system limitation for Microsoft Office, look at the next section, Level 3.
Level 3: Recovering data from Microsoft Office files
ChemDoodle provides a unique solution for searching Microsoft Office files and extracting chemical figures from them, regardless of the application or operating system the embedding occurred with. Microsoft Word documents will be the most common file type used to store embedded chemical data. This is especially useful if you are a macOS user collaborating with a chemist on Windows (or vice versa), if you used to be a Windows user but switched to Linux, or if round-trip editing is broken at any point in the chain. To recover chemical data using this method in ChemDoodle, you do not need access to other chemistry software, Microsoft Office, or the original operating system the data was created on.
In this image, ChemDoodle on macOS recovers complex chemical figures drawn and pasted from ChemDraw into Microsoft Word on Windows.
Currently, the XML-based files from Word, PowerPoint and Excel can be handled, and ChemDoodle, ChemSketch™ and ChemDraw® embedded data can be recovered.
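For readers curious about what such a scan has to work with: the XML-based Office formats are ZIP packages, and embedded objects are typically stored as separate parts in an `embeddings` folder inside the package. The Python sketch below is not ChemDoodle's implementation. It only lists those parts, assuming the usual `word/`, `ppt/` and `xl/` folder layout, and makes no attempt to decode the chemical data inside them, which depends on the application and operating system that created the figure. The function name `list_embedded_parts` and the demo package contents are hypothetical.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Top-level package folder used by each Office application (assumed layout).
APP_BY_FOLDER = {"word": "Word", "ppt": "PowerPoint", "xl": "Excel"}


def list_embedded_parts(source):
    """Return (application, part name, size in bytes) for each embedded part.

    `source` is a path or a binary file-like object for a .docx, .pptx
    or .xlsx file.
    """
    found = []
    with zipfile.ZipFile(source) as package:
        for info in package.infolist():
            parts = PurePosixPath(info.filename).parts
            # Match entries such as word/embeddings/oleObject1.bin.
            if (len(parts) == 3
                    and parts[0] in APP_BY_FOLDER
                    and parts[1] == "embeddings"):
                found.append(
                    (APP_BY_FOLDER[parts[0]], info.filename, info.file_size)
                )
    return found


# Demo on a tiny in-memory package standing in for a real Word document.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as package:
    package.writestr("word/document.xml", "<w:document/>")
    package.writestr("word/embeddings/oleObject1.bin", b"\x00" * 16)
demo.seek(0)

demo_parts = list_embedded_parts(demo)
print(demo_parts)
```

With a real file you would pass its path instead, for example `list_embedded_parts("report.docx")`. Finding the parts is the easy step. Interpreting their contents and rebuilding the figure is where a dedicated recovery tool is needed.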
To recover chemical data from your Microsoft Office file, please follow these instructions:
- In ChemDoodle, select the File>Recover from MS Office File... menu item. A window will appear as shown here.
- Press the Select File button on the top left, and a file chooser will appear. Select the Microsoft Office document you wish to scan for chemical data and press the Open button.
- ChemDoodle will scan the file for compatible chemical data. A progress bar is animated while the scan is in progress, as the process may take a few seconds or longer. Once complete, the window displays which type of Microsoft Office file was detected and which operating system ChemDoodle believes the file was created on. Any extracted figures are listed in the Extracted Figures list. Each entry shows the number of molecules and shapes contained in the figure, followed in parentheses by the type of chemical data the figure was stored as. This is shown in the next image.
- The first extracted figure will automatically be selected and displayed in the preview on the right side of the window. Go through the Extracted Figures list and select the figure you wish to extract for further editing or any other purpose. You can zoom in on the preview by moving your mouse pointer over the preview panel.
- Press the Insert into Document button to load the extracted figure into your current document.
You can then repaste your content back into the original Microsoft Office file to replace the original figure, but keep in mind the original data will be replaced by ChemDoodle embedded data, and programs other than ChemDoodle may not be able to edit the newly embedded data.
Now you can share your Microsoft Office files, including embedded chemical data, with your colleagues on Windows, macOS or even Linux!
In summary, you can use ChemDoodle to recover chemical data created in ChemDoodle and other chemical programs. This means you will be able to recover data in files you thought were lost long ago, you can edit chemical figures from Microsoft Office files even if you do not have access to Microsoft Office, and you can finally collaborate with chemists in creating documents even if they use a different operating system. When you have ChemDoodle with you, you are in control of your chemical data.
ChemDraw is a registered trademark of PerkinElmer, Inc., Microsoft is a registered trademark of Microsoft Corporation, ChemSketch is a trademark of ACD/Labs.