Overview
The Quantum Thermochemical API provides programmatic access to a collection of quantum chemistry computational resources based on the W4-11 dataset. It enables researchers to:
- Retrieve basis sets used for high-accuracy quantum thermochemical computations.
- Submit and query research results related to molecular ground state energy calculations.
- Explore community-contributed results, including accuracy metrics and metadata.
The API is designed to facilitate reproducible computational chemistry research and allow comparison of different algorithmic approaches to molecular energy computation. Key features include:
- Basis Retrieval: Access curated computational basis sets for W4-11 benchmark calculations.
- User Submissions: Submit algorithmic results and retrieve public research submissions.
- Flexible Queries: Search by DOI, page, or submission date and aggregate results.
Endpoints
1. Fetching Basis Sets
GET /entries/:basis_name
- Description: Retrieves the specified basis set as a downloadable file
- Parameters: basis_name is the name of the basis set to download. A comprehensive list of basis sets can be found at BasisSetExchange.org; to check whether a specific basis is offered, please refer to our supported bases.
- Response: Returns a basis file formatted as follows:
  - Each line of the file is an individual JSON object.
  - The first line contains the basis name, formatted as
    {$basis: "basis_name"}
  - The rest of the file consists of JSON objects, one for each of the 152 supported species:
    {name: String, ecore: int, nelecas: int, ncas: int, h1e: tensor, h2e: tensor, cct2: tensor}
    - name: String - identifier for the molecule
    - ecore: int - number of electrons in the core
    - nelecas: int - number of active electrons
    - ncas: int - number of configuration state functions
    - h1e: Tensor - one-electron integrals (kinetic + nuclear attraction)
    - h2e: Tensor - two-electron integrals (Coulomb repulsion)
    - cct2: Tensor - two-particle contraction tensor
  - All tensor data is flattened and base64-encoded. Each tensor is returned as a struct containing an integer list giving the shape of the tensor and a base64 string of its data:
    {shape: int[], data: String}
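As a minimal sketch of how a client might read the basis file described above, the following parses line-delimited JSON and decodes each {shape, data} struct. It rests on several assumptions not stated in this document: that the actual payload uses quoted JSON keys, and that the tensor bytes are little-endian 64-bit floats in row-major order. The helper names (decode_tensor, encode_tensor, parse_basis_file) and the demo values are hypothetical.

```python
import base64
import json
import math
import struct


def decode_tensor(tensor):
    """Decode a {shape, data} struct into (shape, flat list of floats).

    Assumption: the bytes are little-endian 64-bit floats in row-major order.
    """
    raw = base64.b64decode(tensor["data"])
    shape = tuple(tensor["shape"])
    count = math.prod(shape)
    if len(raw) != 8 * count:
        raise ValueError(f"expected {8 * count} bytes for shape {shape}, got {len(raw)}")
    return shape, list(struct.unpack(f"<{count}d", raw))


def encode_tensor(shape, values):
    """Inverse of decode_tensor; used here only to build demo data."""
    raw = struct.pack(f"<{len(values)}d", *values)
    return {"shape": list(shape), "data": base64.b64encode(raw).decode("ascii")}


def parse_basis_file(text):
    """Parse the line-delimited body into (basis_name, {species_name: record})."""
    lines = [line for line in text.splitlines() if line.strip()]
    basis_name = json.loads(lines[0])["$basis"]
    species = {}
    for line in lines[1:]:
        record = json.loads(line)
        for key in ("h1e", "h2e", "cct2"):
            record[key] = decode_tensor(record[key])
        species[record["name"]] = record
    return basis_name, species


# Demo with made-up numbers (not real W4-11 data).
demo_record = {
    "name": "demo", "ecore": 2, "nelecas": 2, "ncas": 2,
    "h1e": encode_tensor([2, 2], [0.0, 1.0, 2.0, 3.0]),
    "h2e": encode_tensor([2, 2, 2, 2], [0.0] * 16),
    "cct2": encode_tensor([2, 2], [1.0] * 4),
}
demo_text = json.dumps({"$basis": "sto6g"}) + "\n" + json.dumps(demo_record) + "\n"
basis_name, species = parse_basis_file(demo_text)
print(basis_name, species["demo"]["h1e"][0])
```

The length check guards against a mismatch between the declared shape and the decoded byte count, which is the most likely symptom if the dtype assumption is wrong. With numpy available, the flat list could instead be produced by np.frombuffer and reshaped to the declared shape.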
2. User Submissions
POST /submission
- Description: Submit research results for review
- Request Body: Contains the DOI, email, and computed ground state energy data:
  {doi: String, email: String, data: {...species: float}}
  - DOI - A valid digital object identifier that can be cross-referenced through the Crossref REST API
  - Email - A valid email address, used for submission confirmation and subsequent notifications
  - Data - Map of string-float key-value pairs containing every species in the provided dataset and its computed ground state energy
- Response: JSON message returned once submission verification is complete and the confirmation email has been sent
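The request body described above could be assembled along these lines. This is a sketch under stated assumptions: the base URL is a placeholder (the real host is not given here), the JSON Content-Type header is assumed, and build_submission along with the species names and energies are hypothetical. The request is built but not sent.

```python
import json
import urllib.request

# Hypothetical base URL; the real host is not given in this document.
API_ROOT = "https://api.example.org"


def build_submission(doi, email, energies, required_species):
    """Build (but do not send) a POST /submission request.

    Raises ValueError if any required species has no energy, since the
    data map is expected to cover every species in the dataset.
    """
    missing = sorted(set(required_species) - set(energies))
    if missing:
        raise ValueError(f"missing energies for: {', '.join(missing)}")
    body = {
        "doi": doi,
        "email": email,
        "data": {name: float(value) for name, value in energies.items()},
    }
    return urllib.request.Request(
        API_ROOT + "/submission",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


# Made-up species names and energies, for illustration only.
request = build_submission(
    doi="10.1234/example-doi",
    email="researcher@example.org",
    energies={"h2": -1.1, "h2o": -76.0},
    required_species=["h2", "h2o"],
)
payload = json.loads(request.data)
print(request.get_method(), request.full_url)
# Sending would then be: urllib.request.urlopen(request)
```

Validating species coverage on the client avoids a round trip for a submission that is incomplete before it ever reaches verification.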
GET /submission/fetch?...
- Description: Multipurpose endpoint for querying public submission data
- Query Parameters: There are 2 primary types of requests:
  - SINGLE - Fetches individual submissions by their DOI identifiers.
    - Request: GET /submission/fetch?type=single&doi=x,y,z,...
    - Description: Returns a JSON list of entries whose DOIs match x, y, or z. Each entry contains the DOI and the submission info, which holds the submitted data and its accuracy. The data is a map from species to a two-element float tuple: the first element is the submitted value and the second is its error relative to the precomputed ground state energies.
    - Response:
      [{doi: String, info: {data: { ...species: [value, absolute-error] }, accuracy: mean-absolute-error}}, ...]
  - PAGE - Fetches submissions in bulk, responding with DOI, data, and accuracy information. The maximum number of submissions per page is 200, and both page and limit must be positive integers.
    - Request: GET /submission/fetch?type=page&page=x&limit=y
    - Description: Returns a struct containing the page number x, the limit of submissions per page y, the total number of submissions, the total number of pages at y entries per page, and the results for page x as a list of submissions.
    - Response:
      {page: int, limit: int, total: int, totalPages: int, results: [ same format as single query ]}
  - SORT & FILTER - Query functionality for sorting and filtering submissions by page.
    - Fields: The following fields are supported for both sorting and filtering:
      - DOI
      - Date
      - Accuracy
    - Sorting: Queries can be sorted by comma-separated fields, applied in the order provided. The direction is specified with either an :asc or :desc suffix on the respective field.
      Example: ?type=page&sort=date:desc,accuracy:desc yields pages sorted first by submission time, then by accuracy.
    - Filtering: Queries can be filtered
- Defaults:
  - An unspecified query to "/submission/fetch" is equivalent to "type=page&page=1&limit=15"
  - A partially specified query to "/submission/fetch?type=single" will always respond with a 404 and no items found
  - A partially specified query to "/submission/fetch?type=page" will default to page 1 of 15 entries
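The response shapes described for the single and page queries can be consumed roughly as follows. This sketch assumes that the accuracy field is the arithmetic mean of the per-species absolute errors and that totalPages is the ceiling of total divided by limit. Both are natural readings of the field names but are not confirmed here. The sample response is made up.

```python
import math


def mean_absolute_error(data):
    """Average the absolute-error entry of each [value, absolute-error] pair."""
    errors = [abs(pair[1]) for pair in data.values()]
    return sum(errors) / len(errors)


def expected_total_pages(total, limit):
    """Pages needed to hold `total` submissions at `limit` per page."""
    return math.ceil(total / limit)


# Made-up page response in the documented shape (not real data).
page = {
    "page": 1, "limit": 15, "total": 31, "totalPages": 3,
    "results": [
        {"doi": "10.1234/example-doi",
         "info": {"data": {"h2": [-1.10, 0.02], "h2o": [-76.00, 0.04]},
                  "accuracy": 0.03}},
    ],
}

for entry in page["results"]:
    mae = mean_absolute_error(entry["info"]["data"])
    # Compare with a tolerance, since these are floating-point values.
    consistent = math.isclose(mae, entry["info"]["accuracy"], rel_tol=1e-9)
    print(entry["doi"], consistent)

print(expected_total_pages(page["total"], page["limit"]) == page["totalPages"])
```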
Examples
Default Fetch:
Paginated Fetch:
Specific DOI Fetch:
Timeline Fetch:
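The four fetches named above can be sketched as URL builders. Everything specific here is an assumption: the base URL is a placeholder, the page and limit values and the DOIs are invented, and "timeline" is interpreted as a page query sorted by date, which this document does not define. The function fetch_url is hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical base URL; the real host is not given in this document.
API_ROOT = "https://api.example.org"


def fetch_url(query_type=None, dois=None, page=None, limit=None, sort=None):
    """Build a GET /submission/fetch URL, leaving unset fields to the server defaults."""
    params = {}
    if query_type is not None:
        params["type"] = query_type
    if dois:
        params["doi"] = ",".join(dois)
    if page is not None:
        params["page"] = page
    if limit is not None:
        if not 1 <= limit <= 200:
            raise ValueError("limit must be between 1 and 200")
        params["limit"] = limit
    if sort:
        # sort is a list of (field, direction) pairs, applied in order.
        params["sort"] = ",".join(f"{field}:{direction}" for field, direction in sort)
    # Keep ',', ':' and '/' readable so DOI lists and sort specs stay intact.
    query = urlencode(params, safe=",:/")
    return f"{API_ROOT}/submission/fetch" + (f"?{query}" if query else "")


default_fetch = fetch_url()
paginated_fetch = fetch_url("page", page=2, limit=50)
doi_fetch = fetch_url("single", dois=["10.1234/example-a", "10.1234/example-b"])
timeline_fetch = fetch_url("page", sort=[("date", "desc")])

for url in (default_fetch, paginated_fetch, doi_fetch, timeline_fetch):
    print(url)
```

Omitting parameters rather than filling in client-side defaults leaves the documented server defaults (page 1, 15 entries) as the single source of truth.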
Python Package
The W4Benchmark Python library is designed for benchmarking algorithms on the W4-11 dataset. It provides a consistent framework for controlled experiments, helping ensure results are reproducible and comparable. It supports both decorator-based workflows (for quick setup and automated iteration over all molecules) and manual workflows (for fine-grained control over computation).
Installation
Install the package from PyPI with:
pip install w4benchmark
Requirements:
- Python 3.6+
- numpy >= 2.2.4
- requests >= 2.32.3
Usage
This package provides two main decorators:
- @W4Decorators.process(...): For processing each molecule (e.g., computing energies)
- @W4Decorators.analyze(...): For analyzing the results (e.g., comparing predictions to reference data)
Each decorated function must accept exactly two parameters: the molecule name (str) and a Molecule object.
Each decorator accepts a list of runtime parameters, with a predefined set of variables set by default at decoration time:
- geominfo_url: specifies the directory of the species geometries data file
- resources_url: specifies the root of the resources directory for finding cached basis sets, geometries, and more
- api_url: the URL to query for basis set info if the specified basis is not found in the resources directory
- basis: the basis information to parse from the resources directory (defaults to "sto6g"; if set to a value not found in resources, it is queried from `api_url`)
- debug: the debug level for logging output (defaults to `logging.WARNING`)
All parameters can be queried and modified at runtime via the `W4.parameters` object. Arbitrary parameters can also be added and queried at runtime to keep track of other runtime specifics (e.g. `@W4Decorators.process(printValues=True)` will add a field `printValues` to `W4.parameters` with a value of `True`).
Example Script:
Create a script like compute.py:
import logging

from w4benchmark import W4Decorators, Molecule, W4

@W4Decorators.process(basis="sto6g", debug=logging.DEBUG)
def compute_energy(name: str, mol: Molecule):
    # Replace with real computation
    pass

@W4Decorators.analyze(basis="sto6g")
def analyze_results(name: str, mol: Molecule):
    # Replace with real analytics
    pass
Then run from the command line with either:
python compute.py --process
or:
python compute.py --analyze
Each command will iterate over every molecule in the dataset and apply the corresponding decorated function.
The GitHub repository contains more examples of library usage.
Advanced
Manual Iteration
If you want full control, you can manually run the W4 benchmark from within a __main__ block:
from w4benchmark import W4

if __name__ == '__main__':
    W4.parameters.basis = "sto6g"  # Set runtime parameters
    W4.init()  # Manual execution requires the .init() function to be called

    # Example usage
    for name, mol in W4:
        print(f"{name}: spin = {mol.spin}, charge = {mol.charge}")
This allows you to iterate through the dataset as a normal iterable. You can also index W4 by species name to select a single molecule object (e.g. W4["acetaldehyde"]).
Multiple Decorations
You can apply the same decorator to multiple functions to group computations. This is useful when an algorithm is composed of sequential steps, since each decorated function runs in order. Using multiple decorated functions can make the data flow easier to follow and visually distinct.
For example, here’s how you might calculate the root-mean-square (RMS) radius of a molecule using two sequential @process functions:
from w4benchmark import W4Decorators, Molecule
import math

centroid = {}

@W4Decorators.process()
def process_a(species: str, mol: Molecule):
    coords = [pos for _, pos in mol.geom]
    centroid[species] = tuple(sum(values) / len(coords) for values in zip(*coords))

def dist2(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

@W4Decorators.process()
def process_b(species: str, mol: Molecule):
    dist_sq = [dist2(pos, centroid[species]) for _, pos in mol.geom]
    rms_radius = math.sqrt(sum(dist_sq) / len(dist_sq))
    print(f'species: "{species}", centroid: {centroid[species]}, rms: {rms_radius}')
All functions decorated with the same decorator (@W4Decorators.process in this case) will run sequentially, enabling advanced workflows that act on shared results. In this case, the centroid dict will be completely filled before any RMS calculations take place, which can help with debugging intermediary values.
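To make the ordering concrete, here is a toy model of that execution pattern in plain Python. It is not w4benchmark's implementation: the process decorator, run_all, and TOY_DATASET below are hypothetical stand-ins that only mimic the behavior described above, where each decorated function runs over the whole dataset before the next one starts.

```python
# Toy stand-ins for the dataset and decorator; NOT w4benchmark's implementation.
TOY_DATASET = {"a": [0.0, 2.0], "b": [1.0, 3.0, 5.0]}
_registered = []


def process():
    """Return a decorator that records functions in definition order."""
    def register(func):
        _registered.append(func)
        return func
    return register


def run_all():
    """Run each registered function over the whole dataset before the next one."""
    for func in _registered:
        for name, values in TOY_DATASET.items():
            func(name, values)


means = {}
log = []


@process()
def step_one(name, values):
    means[name] = sum(values) / len(values)
    log.append(("step_one", name))


@process()
def step_two(name, values):
    # Every mean is already available here, including other species' means.
    spread = max(abs(v - means[name]) for v in values)
    log.append(("step_two", name, len(means), spread))


run_all()
print(log)
```

In the log, both step_one entries precede any step_two entry, and step_two always sees a fully populated means dict. This is the property the centroid example relies on.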
Dataset Attribution
This tool builds on the W4-11 Dataset:
Goerigk, L., & Grimme, S. (2011).
A thorough benchmark of density functional methods for general main group thermochemistry, kinetics, and noncovalent interactions.
Phys. Chem. Chem. Phys., 13, 6670–6688.
https://doi.org/10.1039/C0CP02984J
License
The library is licensed under CC BY-NC 4.0.
You may use and adapt it for non-commercial purposes with proper attribution.
See the LICENSE for details.