In This Article
- The Puzzle at the Heart of Our Solar System
- Why Measuring a Comet Was Nearly Impossible
- How Did AI Crack the Comet Size Problem?
- What the Numbers Actually Mean for Our Origins
- The Questions That Still Need Answering
Every few years, a comet blazes across our sky and astronomers scramble to measure it, only to realise the glowing smear they're photographing tells them almost nothing about the solid nucleus buried inside. Now, a team of Chinese researchers has used an artificial intelligence tool to peer through that blinding halo for the first time at scale, measuring the true nucleus size of 28 comets. What they found is reshaping estimates of how big comets really are and, with them, the story of how the solar system was born.
The Puzzle at the Heart of Our Solar System
Imagine trying to weigh a candle by looking only at the light it throws on the wall. That is roughly the problem planetary scientists have faced for decades when studying comets. A comet in the active phase — the one we can actually see — is wrapped in a vast cloud of gas and dust called a coma, which can stretch millions of kilometres and completely swamps any signal from the nucleus beneath.
This obscuration created a gaping hole in one of astronomy's most important balance sheets. Scientists have long known that comets come from two distinct reservoirs: the Oort Cloud, a vast spherical shell at the far edge of the solar system that feeds long-period comets (LPCs) on journeys of hundreds or thousands of years; and the Scattered Disk just beyond Neptune, which sends short-period comets (SPCs) whipping through the inner system every few years. The new study, published in Nature Communications, found that the ratio between these two populations is dramatically different from what any leading model predicted.
Why Measuring a Comet Was Nearly Impossible
Before this study, scientists had four main approaches to estimating comet nucleus size: direct spacecraft imaging, photometry (analysing brightness), infrared thermal modelling, and dynamical analysis (tracking non-gravitational forces from gas jets). Each one is either brutally limited in scope or plagued by the same coma-contamination problem.
Only a handful of comets have ever been visited by spacecraft — making direct imaging a statistical dead end. Photometry and infrared methods work better, but they still require painstaking subtraction of the coma's glow, and an unconstrained surface albedo (how reflective the nucleus is) can swing size estimates by a factor of two or more. Dynamical modelling sidesteps the coma but depends on activity patterns that vary wildly from comet to comet and are poorly understood. The result? For long-period comets especially, published size estimates have carried enormous, often unacknowledged, uncertainty.
How Did AI Crack the Comet Size Problem?
The key insight is elegant: a comet's water vapour output is governed almost entirely by the amount of sunlight hitting its surface, which in turn depends on how big that surface is. If you can model the physics of water sublimation accurately enough, you can work backwards from observed water production rates to derive the nucleus size — without ever needing to see through the coma.
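To make the "work backwards" idea concrete, here is a deliberately crude sketch. It assumes every joule of absorbed sunlight goes into sublimating water ice, which ignores re-radiated heat, heat conduction into the interior and any dust layer (the physics the real thermophysical model handles). The function names, the albedo of 0.04 and the `active_fraction` parameter are illustrative assumptions, not values from the study.

```python
import math

# Physical constants (approximate values).
SOLAR_CONSTANT = 1361.0          # W/m^2 of sunlight at 1 au
LATENT_HEAT_ICE = 2.83e6         # J/kg, latent heat of sublimation of water ice
WATER_MOLECULE_MASS = 2.99e-26   # kg per H2O molecule


def water_production(radius_m, r_h_au, albedo=0.04, active_fraction=1.0):
    """Water molecules per second from a toy sunlit sphere.

    Assumption: all absorbed sunlight sublimates ice (an upper limit).
    """
    # Sunlight intercepted by the nucleus cross-section, pi * R^2.
    absorbed_power = (
        (1.0 - albedo) * SOLAR_CONSTANT / r_h_au**2 * math.pi * radius_m**2
    )
    mass_loss_kg_per_s = active_fraction * absorbed_power / LATENT_HEAT_ICE
    return mass_loss_kg_per_s / WATER_MOLECULE_MASS


def radius_from_production(q_molecules_per_s, r_h_au, albedo=0.04,
                           active_fraction=1.0):
    """Invert the toy model: production scales with R^2, so R = sqrt(Q / Q_1m)."""
    q_for_unit_radius = water_production(1.0, r_h_au, albedo, active_fraction)
    return math.sqrt(q_molecules_per_s / q_for_unit_radius)


if __name__ == "__main__":
    q = water_production(1000.0, 1.5)        # a 1 km radius nucleus at 1.5 au
    print(f"Q = {q:.3e} molecules/s")
    print(f"recovered radius = {radius_from_production(q, 1.5):.1f} m")
```

Because this toy model converts sunlight to vapour with perfect efficiency, the radius it returns is best read as a lower bound. A real nucleus loses energy in other ways and needs more surface to produce the same amount of water.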
The team, led by Shunjing Zhao and Xian Shi at Shanghai Astronomical Observatory, built a tool called ThermoONet. It uses a class of AI architecture called a deep operator neural network (DeepONet) to simulate cometary thermophysics — essentially, how heat conducts through a layered nucleus and drives ice to vapour. Crucially, ThermoONet runs these simulations a million times faster than conventional finite-difference numerical models. A calculation that used to take 10,000 seconds now takes 0.01 seconds. That speed unlocks a global optimisation technique called simulated annealing, which can explore an enormous range of possible nucleus properties to find the combination that best fits the observed water curve.
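Simulated annealing itself is simple to sketch. The example below is not ThermoONet and not the team's code: `forward_model` is a toy stand-in for a fast surrogate, the two fitted parameters (`radius_km` and `slope`) are hypothetical, and the "observations" are synthetic values generated from the toy model. It only illustrates the accept-worse-moves-while-hot, settle-as-you-cool logic.

```python
import math
import random


def forward_model(radius_km, slope, r_h_au):
    """Toy surrogate (assumption): output scales as radius^2 * r_h^(-slope)."""
    return radius_km**2 * r_h_au ** (-slope)


def misfit(params, observations):
    """Sum of squared log-residuals between model and observed output."""
    radius_km, slope = params
    return sum(
        (math.log(forward_model(radius_km, slope, r_h)) - math.log(q)) ** 2
        for r_h, q in observations
    )


def simulated_annealing(observations, n_steps=20000, t_start=1.0,
                        t_end=1e-6, seed=0):
    rng = random.Random(seed)
    current = (rng.uniform(0.5, 50.0), rng.uniform(1.0, 6.0))
    current_cost = misfit(current, observations)
    best, best_cost = current, current_cost
    for step in range(n_steps):
        # Geometric cooling from t_start down towards t_end.
        temperature = t_start * (t_end / t_start) ** (step / n_steps)
        # Propose a small random move; the radius moves multiplicatively
        # so it can never become negative.
        candidate = (
            current[0] * math.exp(rng.gauss(0.0, 0.1)),
            current[1] + rng.gauss(0.0, 0.1),
        )
        cost = misfit(candidate, observations)
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature drops.
        if cost < current_cost or rng.random() < math.exp(
            (current_cost - cost) / temperature
        ):
            current, current_cost = candidate, cost
            if cost < best_cost:
                best, best_cost = candidate, cost
    return best, best_cost


# Synthetic, noise-free "observations" from known parameters.
TRUE_RADIUS_KM, TRUE_SLOPE = 5.0, 2.0
synthetic = [
    (r_h, forward_model(TRUE_RADIUS_KM, TRUE_SLOPE, r_h))
    for r_h in (1.0, 1.5, 2.0, 3.0)
]
(best_radius, best_slope), best_cost = simulated_annealing(synthetic)
print(f"radius = {best_radius:.2f} km, slope = {best_slope:.2f}, "
      f"misfit = {best_cost:.2e}")
```

The speed-up matters because an optimiser like this calls the forward model once per step. Taking the 20,000 steps used in this sketch as an example, 10,000 seconds per call would add up to about 2 × 10^8 seconds, a little over six years, while 0.01 seconds per call adds up to about 200 seconds.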
The team applied this to water production data collected by the SWAN ultraviolet spectrometer on the SOHO spacecraft, which has been monitoring comets continuously since the late 1990s — one of the most consistent long-term datasets in solar system astronomy. For each of the 28 comets, the model generated roughly 400 viable solutions with small variations, and the spread of those solutions gave a statistical uncertainty of about 17% per comet.
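Turning a cloud of solutions into a single uncertainty figure might look something like the sketch below. The article does not say exactly how the spread was defined, so the choice here (sample standard deviation divided by the mean) is an assumption, and the three input values are placeholders rather than real fits.

```python
import statistics


def summarise_solutions(radii_km):
    """Return (mean radius, relative spread) for an ensemble of fitted radii.

    Assumption: 'spread' means sample standard deviation / mean.
    """
    mean_radius = statistics.mean(radii_km)
    relative_spread = statistics.stdev(radii_km) / mean_radius
    return mean_radius, relative_spread


# Placeholder ensemble; a real run would hold a few hundred solutions.
placeholder_radii_km = [9.0, 10.0, 11.0]
mean_radius, relative_spread = summarise_solutions(placeholder_radii_km)
print(f"radius = {mean_radius:.1f} km, spread = {relative_spread:.0%}")
```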
"We find that long-period comets possess significantly larger nuclei than short-period comets with comparable absolute brightness — a result that contradicts the prevailing expectation."
Zhao, Shi et al. · Nature Communications, 2026
The validation numbers are striking. For the four spacecraft-visited comets in the dataset, where the ground truth is known from direct imaging, the AI-derived sizes achieved a mean relative error of just 20%. For Hale-Bopp, whose nucleus was estimated from a 32-astronomical-unit observation, the error was only 8%. This is competitive with, and in some cases better than, photometric and infrared methods, without any of their coma-contamination baggage.
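For readers unfamiliar with the metric, mean relative error is the average of |estimate − reference| / reference across the comparison set. A minimal sketch follows; the four pairs of numbers are made-up placeholders chosen for easy arithmetic, not the study's measurements.

```python
def mean_relative_error(estimates, references):
    """Average of |estimate - reference| / reference over paired values."""
    if not estimates or len(estimates) != len(references):
        raise ValueError("need two non-empty lists of equal length")
    return sum(
        abs(est - ref) / ref for est, ref in zip(estimates, references)
    ) / len(estimates)


# Placeholder values only (hypothetical sizes in km).
placeholder_estimates = [1.1, 1.8, 5.0, 4.0]
placeholder_references = [1.0, 2.0, 4.0, 5.0]
error = mean_relative_error(placeholder_estimates, placeholder_references)
print(f"mean relative error = {error:.1%}")
```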
What the Numbers Actually Mean for Our Origins
Here is where the story takes its most dramatic turn. For decades, the prevailing assumption has been that long-period comets are intrinsically more "active" than short-period comets — meaning that a given amount of water vapour corresponds to a smaller nucleus in an LPC than in an SPC of the same brightness. This assumption underpinned virtually every estimate of how many objects the Oort Cloud and Scattered Disk contain.
The new comet size data demolishes that assumption. Across a broad brightness range from absolute magnitude 4 to 13, LPCs consistently have larger nuclei than SPCs of equivalent brightness, not smaller ones. The largest object in the sample, Comet Hale-Bopp (C/1995 O1), came in at 68 kilometres in diameter. The largest short-period comet, Halley, measured 11.6 km. This systematic difference, once properly accounted for in the population models, produces an Oort Cloud-to-Scattered-Disk population ratio of approximately 998 — nearly three orders of magnitude. Previous theoretical models built on the Nice model of giant planet instability predicted a ratio of only 5 to 20.
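The headline figures are easy to sanity-check. The sketch below uses only numbers quoted above; the volume comparison additionally assumes the two nuclei have similar shapes, which the article does not claim.

```python
import math

OC_TO_SD_RATIO = 998.0            # population ratio reported in the study
PREDICTED_RANGE = (5.0, 20.0)     # ratio predicted by earlier models
HALE_BOPP_DIAMETER_KM = 68.0
HALLEY_DIAMETER_KM = 11.6

# How many powers of ten is the new ratio?
orders_of_magnitude = math.log10(OC_TO_SD_RATIO)

# How far above the earlier predictions does it sit?
excess_vs_low, excess_vs_high = (OC_TO_SD_RATIO / p for p in PREDICTED_RANGE)

# Size contrast between the largest long- and short-period comets.
diameter_ratio = HALE_BOPP_DIAMETER_KM / HALLEY_DIAMETER_KM
volume_ratio = diameter_ratio**3  # assumes similar shapes

print(f"orders of magnitude: {orders_of_magnitude:.2f}")
print(f"excess over predictions: {excess_vs_high:.0f}x to {excess_vs_low:.0f}x")
print(f"diameter ratio: {diameter_ratio:.1f}, volume ratio: {volume_ratio:.0f}")
```

In other words, the new ratio is roughly 50 to 200 times larger than the earlier predictions, and Hale-Bopp is nearly six times wider than Halley, or about 200 times the volume if the shapes are comparable.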
The Questions That Still Need Answering
The study is careful about what it is not. The sample of 28 comets, while the most systematically analysed collection yet, is still small by statistical standards. Long-period comets with large perihelia — those that swing well outside Earth's neighbourhood — are severely underrepresented in the SOHO/SWAN dataset because they are simply harder to observe continuously. Expanding that dataset is essential before the magnitude-to-size relationships derived here can be treated as definitive.
There is also the question of whether primordial protoplanetary disks were as smooth and uniform as simple models assume. ALMA observations of young planetary systems increasingly show rings, gaps, and asymmetries that could profoundly alter how planetesimals respond to gravitational perturbations, and therefore what the true initial conditions for the Oort Cloud and Scattered Disk populations were. Finally, the timing, geometry, and exact configuration of any proposed stellar flyby remain unconstrained. What the new data demands is an event; pinning down which event is a problem for future dynamical modellers.
The upcoming ESA Comet Interceptor mission will be the first spacecraft sent to study a dynamically new long-period comet — precisely the population whose sizes are most uncertain and most consequential for this debate. Its data, combined with an ever-growing SOHO/SWAN archive, should sharpen these estimates considerably within the decade.
- Long-period comets are genuinely large: on average larger, not smaller, than short-period comets of the same brightness, overturning a 30-year assumption that fed virtually every population model.
- The Oort Cloud is far more crowded: the revised comet nucleus size data implies the Oort Cloud holds roughly 1,000 times more objects than the Scattered Disk, a discrepancy that giant planet migration alone cannot explain.
- A stellar flyby fits the bill: simulations show a close stellar encounter in the Sun's birth cluster could have simultaneously over-populated the Oort Cloud and stripped the Scattered Disk, producing a ratio consistent with what the AI measurements now reveal.
"The Oort Cloud is nearly three orders of magnitude more populated than the Scattered Disk, implying that processes such as stellar flybys could have played an important role in shaping the outer Solar System." — Zhao, Shi, Lei, Hui & Shi, Nature Communications, 2026.
📄 Source & Citation
Primary Source: Zhao, S., Shi, X., Lei, H., Hui, M.-T. & Shi, J. (2026). Deep learning-enabled size estimation of comets indicates a more dynamic early solar system. Nature Communications. https://doi.org/10.1038/s41467-026-72646-8
Authors & Affiliations: Shunjing Zhao (Nanjing University; Shanghai Astronomical Observatory), Xian Shi (Shanghai Astronomical Observatory, CAS), Hanlun Lei (Nanjing University), Man-To Hui (Shanghai Astronomical Observatory), Jianchun Shi (Shanghai Astronomical Observatory)
Data & Code: SOHO/SWAN water production rates: NASA PDS Small Body Node. ThermoONet source code: github.com/zsjnb7/ThermoONet-Comet
Key Themes: Comet nucleus size · Deep learning thermophysics · Oort Cloud · Scattered Disk · Stellar flyby · Solar system formation
Supporting References:
[1] Brasser, R. & Morbidelli, A. (2013). Oort cloud and Scattered Disc formation during a late dynamical instability in the Solar System. Icarus, 225:40–49.
[2] Pfalzner, S., Govind, A. & Portegies Zwart, S. (2024). Trajectory of the stellar flyby that shaped the outer Solar System. Nature Astronomy, 8:1380–1386.
[3] Lu, L. et al. (2021). Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3:218–229.