Generative AI Rapidly Predicts 3D Genomic Structures

A novel method predicts, in minutes rather than days, how a given DNA sequence will arrange itself in the cell nucleus.

Although every cell in your body has the same genetic code, each cell expresses only a subset of those genes. These cell-specific gene expression patterns, which ensure that a brain cell differs from a skin cell, are partly determined by the three-dimensional structure of the genetic material, which controls the accessibility of each gene.

MIT scientists have now developed a novel method that uses generative artificial intelligence to determine such 3D genomic architectures. Their method is significantly faster than current experimental methods for examining the structures since it can predict thousands of structures in a matter of minutes.

This method could make it easier for researchers to study how the 3D organization of the genome affects the gene expression patterns and functions of individual cells.

“Our goal was to try to predict the three-dimensional genome structure from the underlying DNA sequence,” says Bin Zhang. “Now that we can do that, which puts this technique on par with the cutting-edge experimental techniques, it can really open up a lot of interesting opportunities.”

Their findings were published in the journal Science Advances.

Cells are able to fit two meters of DNA into a nucleus that is only one-hundredth of a millimeter in diameter because of a complex known as chromatin, which is made up of proteins and DNA and has multiple layers of structure. Long DNA strands wind around proteins known as histones, creating a structure that resembles beads on a string.

Chemical tags called epigenetic modifications can be attached to DNA at specific sites, and these tags vary by cell type. They alter the folding of the chromatin and the accessibility of nearby genes. These differences in chromatin structure help determine which genes are expressed in different cell types, or at different times within a given cell.

Over the past 20 years, scientists have developed experimental techniques for determining chromatin structure. One widely used technique, Hi-C, works by linking together neighboring DNA strands in the cell’s nucleus. Researchers can then determine which segments lie close to one another by cutting the DNA into many small pieces and sequencing it.

This technique can be applied to single cells to identify structures within that particular cell or to large populations of cells to estimate an average structure for a chromatin segment. Hi-C and related methods are time-consuming, though, and it can take up to a week to produce data from a single cell.
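The difference between a single-cell structure and a population average can be illustrated with a small sketch. It is not a model of Hi-C itself: the structures are toy random walks, and the function names (`random_walk`, `contact_map`, `average_contact_map`) and the distance cutoff are assumptions chosen purely for illustration. The sketch shows how a yes/no contact map for one structure becomes a contact frequency map when averaged over many cells.

```python
import math
import random

def random_walk(n_beads, rng, step=1.0):
    """Toy 3D structure: a random walk with fixed step length (not real chromatin)."""
    coords = [(0.0, 0.0, 0.0)]
    for _ in range(n_beads - 1):
        # Draw a direction uniformly on the unit sphere.
        z = rng.uniform(-1.0, 1.0)
        phi = rng.uniform(0.0, 2.0 * math.pi)
        r = math.sqrt(1.0 - z * z)
        x0, y0, z0 = coords[-1]
        coords.append((x0 + step * r * math.cos(phi),
                       y0 + step * r * math.sin(phi),
                       z0 + step * z))
    return coords

def contact_map(coords, cutoff):
    """Single-cell view: 1 where two beads lie within `cutoff` of each other."""
    n = len(coords)
    return [[1 if math.dist(coords[i], coords[j]) <= cutoff else 0
             for j in range(n)] for i in range(n)]

def average_contact_map(structures, cutoff):
    """Population view: fraction of structures in which each pair is in contact."""
    n = len(structures[0])
    total = [[0.0] * n for _ in range(n)]
    for coords in structures:
        cmap = contact_map(coords, cutoff)
        for i in range(n):
            for j in range(n):
                total[i][j] += cmap[i][j]
    return [[v / len(structures) for v in row] for row in total]

rng = random.Random(0)
cells = [random_walk(20, rng) for _ in range(50)]   # 50 hypothetical cells
single = contact_map(cells[0], cutoff=1.5)           # entries are 0 or 1
population = average_contact_map(cells, cutoff=1.5)  # entries lie in [0, 1]
```

In this toy setup, neighboring beads are always in contact, while distant pairs touch in only some of the structures, so the population map holds fractions rather than ones and zeros.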

To overcome those limitations, Zhang and his students developed a model that takes advantage of recent advances in generative AI to create a fast, accurate way to predict chromatin structures in single cells. The AI model can quickly analyze DNA sequences and predict the chromatin structures that those sequences might produce in a cell.

“Deep learning is really good at pattern recognition,” Zhang says. “It allows us to analyze very long DNA segments, thousands of base pairs, and figure out what is the important information encoded in those DNA base pairs.”

The researchers’ model, ChromoGen, has two components. The first, a deep learning model trained to “read” the genome, analyzes the information encoded in the underlying DNA sequence and in chromatin accessibility data, the latter of which is widely available and specific to each cell type.

The second component is a generative AI model that predicts physically accurate chromatin conformations, having been trained on more than 11 million chromatin conformations. These training data came from experiments using Dip-C, a variant of Hi-C, on 16 cells from a line of human B lymphocytes.

When the two are integrated, the first component informs the generative model how the cell type-specific environment influences the formation of different chromatin structures, and this scheme effectively captures sequence-structure relationships. For each sequence, the researchers use their model to generate many possible structures. That is because DNA is a very disordered molecule, so a single DNA sequence can give rise to many different conformations.

“A major complicating factor of predicting the structure of the genome is that there isn’t a single solution that we’re aiming for. There’s a distribution of structures, no matter what portion of the genome you’re looking at. Predicting that very complicated, high-dimensional statistical distribution is something that is incredibly challenging to do,” says Greg Schuette.

Once trained, the model can generate predictions far faster than Hi-C or other experimental techniques.

“Whereas you might spend six months running experiments to get a few dozen structures in a given cell type, you can generate a thousand structures in a particular region with our model in 20 minutes on just one GPU,” Schuette says.
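To put the quoted figures side by side, the short calculation below converts both into structures per hour. The model figures (a thousand structures in 20 minutes) come from the quote; the experimental figures are assumptions made only for this illustration, reading “a few dozen” as 36 structures and “six months” as 182 days of round-the-clock time. Under those assumptions the ratio comes out in the hundreds of thousands, though the exact value depends entirely on the numbers plugged in.

```python
# Illustrative assumptions, not figures reported by the study:
# "a few dozen" is read as 36 structures and "six months" as 182 days.
ASSUMED_EXPERIMENT_STRUCTURES = 36
ASSUMED_EXPERIMENT_HOURS = 182 * 24

# Figures quoted in the article for the trained model on one GPU.
MODEL_STRUCTURES = 1000
MODEL_HOURS = 20 / 60

def structures_per_hour(n_structures, hours):
    """Throughput expressed as structures produced per hour."""
    return n_structures / hours

experiment_rate = structures_per_hour(ASSUMED_EXPERIMENT_STRUCTURES,
                                      ASSUMED_EXPERIMENT_HOURS)
model_rate = structures_per_hour(MODEL_STRUCTURES, MODEL_HOURS)
speedup = model_rate / experiment_rate

print(f"experiment: {experiment_rate:.4f} structures/hour (assumed inputs)")
print(f"model:      {model_rate:.0f} structures/hour")
print(f"ratio:      {speedup:.0f}x")
```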

After training the model, the researchers used it to generate structure predictions for more than 2,000 DNA sequences and compared them to the experimentally determined structures for those sequences. They found that the structures generated by the model matched or closely resembled those seen in the experimental data.
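One generic way to compare a predicted ensemble of structures with an experimental one is to average the pairwise distances in each ensemble and correlate the two maps. The article does not say which metrics the researchers used, so the sketch below is an assumption, not their procedure. Both ensembles here are toy random walks drawn from the same distribution, and the function names are hypothetical.

```python
import math
import random

def toy_conformation(n_beads, rng):
    """Toy stand-in for a chromatin structure: a unit-step 3D random walk."""
    coords = [(0.0, 0.0, 0.0)]
    for _ in range(n_beads - 1):
        z = rng.uniform(-1.0, 1.0)
        phi = rng.uniform(0.0, 2.0 * math.pi)
        r = math.sqrt(1.0 - z * z)
        x0, y0, z0 = coords[-1]
        coords.append((x0 + r * math.cos(phi), y0 + r * math.sin(phi), z0 + z))
    return coords

def mean_distance_map(ensemble):
    """Average the distance between every pair of beads over all structures."""
    n = len(ensemble[0])
    total = [[0.0] * n for _ in range(n)]
    for coords in ensemble:
        for i in range(n):
            for j in range(n):
                total[i][j] += math.dist(coords[i], coords[j])
    return [[v / len(ensemble) for v in row] for row in total]

def pearson_upper_triangle(map_a, map_b):
    """Pearson correlation over the entries above the diagonal of two maps."""
    n = len(map_a)
    xs = [map_a[i][j] for i in range(n) for j in range(i + 1, n)]
    ys = [map_b[i][j] for i in range(n) for j in range(i + 1, n)]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(1)
experimental = [toy_conformation(15, rng) for _ in range(200)]  # placeholder data
predicted = [toy_conformation(15, rng) for _ in range(200)]     # placeholder data
similarity = pearson_upper_triangle(mean_distance_map(experimental),
                                    mean_distance_map(predicted))
print(f"Pearson correlation of mean distance maps: {similarity:.3f}")
```

Comparing averaged maps, not individual structures, fits the point made in the article that each sequence corresponds to a distribution of conformations with no single correct answer.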

“We typically look at hundreds or thousands of conformations for each sequence, and that gives you a reasonable representation of the diversity of the structures that a particular region can have,” Zhang says. “If you repeat your experiment multiple times, in different cells, you will very likely end up with a very different conformation. That’s what our model is trying to predict.”

The researchers also found that the model could make accurate predictions for data from cell types other than the one it was trained on. This suggests that the model could be useful for analyzing how chromatin structures differ between cell types and how those differences affect cell function. The model could also be used to explore the different chromatin states that can exist within a single cell and how those changes affect gene expression.

Another possible application is to explore how mutations in a particular DNA sequence change the chromatin conformation, which could shed light on how such mutations may contribute to disease.

“There are a lot of interesting questions that I think we can address with this type of model,” Zhang says.

Source: MIT News

Journal Reference: Schuette, Greg, et al. “ChromoGen: Diffusion Model Predicts Single-cell Chromatin Conformations.” Science Advances, 2025, DOI: https://doi.org/10.1126/sciadv.adr8265.


Graduated from the University of Kerala with B.Sc. Botany and Biotechnology. Attained Post-Graduation in Biotechnology from the Kerala University of Fisheries and Ocean Science (KUFOS) with the third rank. Conducted various seminars and attended major Science conferences. Done 6 months of internship in ICMR – National Institute of Nutrition, Hyderabad. 5 years of tutoring experience.

Ajmal Aseem
