
52 Infertility Artificial Intelligence Approaches at ESHRE 2022


52 Infertility Artificial Intelligence Approaches

The upcoming ESHRE 2022 annual conference is, not surprisingly, packed full of amazing new artificial intelligence and machine learning approaches for infertility. We are witnessing the birth of a new discipline in real time: Artificial Intelligence (AI) for infertility, reproduction, and specifically for assisted reproductive technologies (ART).

AI, particularly the branch called Machine Learning (ML), continues to advance in embryology and human reproduction research. More than 200 AI/ML-related abstracts were presented at the annual American Society for Reproductive Medicine (ASRM) and European Society for Human Reproduction and Embryology (ESHRE) meetings from 2019 to 2021, a sharp increase from 2018, when 16 abstracts were reported, and 2017, when just one abstract was reported at each of ASRM (AI) and ESHRE (ML). The nature of the abstracts has also shifted distinctly, from early reports of model creation to model characterization and testing, with many studies supported by “industry” and many AI models being commercialized at a rapid rate.

It seems as though AI is reshaping healthcare in general, and reproductive medicine in particular, before our eyes. AI, ML, natural language processing (NLP) and deep learning (DL) enable us to identify infertility-related healthcare problems and solutions faster and with more accuracy, using data patterns to make provider-informed clinical decisions.

Infertility Artificial Intelligence is Evolving Fast!

AI itself is a rapidly evolving field and research into unsolved problems is ongoing. Some of the top unsolved challenges within AI research and development are: the need for unsupervised learning and the desire for explainable AI; the need for massive computing power (the need for massively parallel computing may one day be met by quantum computing); the carbon footprint (estimates suggest that training a single AI model can emit as much as 284 tonnes of carbon dioxide equivalent); data privacy and security; exhibiting common sense; human-like visual perception; the lifespan of AI; the development of conscious (moral) AI; the development of artificial imagination; and identifying, withstanding and monitoring hazards (i.e. safety engineering so that AI systems deployed in high-stakes fields like healthcare can withstand adversaries, detect malicious use, and stay safe under cyber attack).

Working at the intersection of disciplines brings a unique set of approaches and tools to solve problems. Meetings, such as the world’s first Infertility Artificial Intelligence Conference, play an essential role in career development for young scientists and clinicians. As a scientific discipline grows and matures, sub-disciplines naturally emerge as knowledge expands. The emergence of specialties within a discipline brings with it new needs: standards, regulation, training, and research. As we struggle to keep communication practices current within our discipline and between disciplines, the need for a formal vehicle to facilitate deep connections is obvious. A scientific congress serves as a vehicle for the exchange of primary data, current practices, and promising leads for future research.

Download the dataset! 52 Infertility Artificial Intelligence studies reported at ESHRE 2022!

Title / Authors / Abstract

The bias is out of the bag: IVF culture dish well number influences embryo selection decision-making and implantation outcome
D. Seidman1, R. Maor1, M. Shapiro1, C.M. Howles2, M. Meseguer3, D. Gilboa1.
1AiVF Ltd., IVF Research and Development, Tel Aviv, Israel.
2ARIES Consulting, Scientific Affairs, Geneva, Switzerland.
3IVI RMA Valencia, IVF Laboratory, Valencia, Spain.
Study question:

Is there a selection bias against embryos placed in higher-numbered wells inside a multi-well IVF culture dish? Does this selection bias alone impact implantation outcomes?

Summary answer:
Top-quality embryos present in higher-numbered wells are statistically less likely to be selected for transfer, independent of any differences in quality or development between wells.

What is known already:
Substantial intra- and inter-observer variability in embryo selection, as well as differences in quality assessment and laboratory environment, have been shown to affect IVF success. Many clinics have adopted stringent guidelines to control for human errors and workflow variation. Still, the impact of errors in laboratory and medical procedures has been reported to be as high as 12%. This is particularly relevant for the IVF lab, where high workload and stress influence the rate of errors and patient outcome. This groundbreaking study emphasizes how cognitive tendencies are inherent to the embryo selection process.

Study design, size, duration:
This study used a retrospective dataset from three highly experienced fertility clinics (1 US and 2 European clinics). A total of 4,275 fresh IVF cycles were analyzed. For each treatment cycle, embryo quality grades, corresponding embryo well numbers, day 5 selection and implantation outcomes were documented. All cycles were performed using the EmbryoSlide 12-well culture dish and a time-lapse system. The three datasets were analyzed both separately and combined.

Participants/materials, setting, methods:
For each dataset, three analyses were conducted: (I) the total number of selected embryos was calculated for each corresponding well number; (II) the proportion of implanted embryos, relative to the total number of selected embryos, was quantified to calculate the “success rate” for each well number; (III) the distribution of top-quality embryos between wells was quantified and compared. Results were normalized by the total number of transferred embryos and the IVF implantation success rates reported for each clinic.

Main results and the role of chance:
A negative trend was found between well number, ranging from 1-12, and number of embryos selected for transfer. This trend was significant (p<0.05) and occurred independently in each dataset. Odds ratios (OR) for selecting embryos for transfer from wells 1-5 relative to wells 8-12 were: Clinic A: 2.16, Clinic B: 1.78, Clinic C: 2.45. Alternative hypotheses were tested: (1) top-quality embryos are clustered in lower-numbered wells during culture; (2) enhanced embryo quality and conditions are found in lower-numbered wells, which should manifest in higher rates of implantation. Results for each clinic showed a statistically even distribution of top-quality embryos between wells (within 2 standard deviations from the mean; not significant), yet the ‘success rate’ for transferred embryos increased with well number (by 12-30% between wells 1-5 and wells 8-12; OR = 1.19, 1.06, 1.08 for Clinic A, B, and C, respectively). An inverse trend existed between an embryo’s likelihood of being selected for transfer and its likelihood of implanting. We conclude that embryologists may tend to select the first acceptable embryo for transfer. Embryos from higher-numbered wells were significantly more likely to implant, since they overcame this bias when equitably evaluated and selected for transfer.
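The odds ratios quoted above compare the odds of an embryo being selected in one group of wells with the odds in another group. As a minimal sketch, assuming a simple 2x2 table of selected and non-selected embryos per well group, the calculation might look like the following. The function name and all counts are hypothetical and are not taken from the study.

```python
def odds_ratio(selected_a, not_selected_a, selected_b, not_selected_b):
    """Odds of selection in group A divided by the odds of selection in group B.

    Uses the cross-product form (a*d) / (b*c) of the 2x2 table.
    """
    denominator = not_selected_a * selected_b
    if denominator == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (selected_a * not_selected_b) / denominator


# Hypothetical counts, for illustration only:
# wells 1-5: 200 embryos selected, 300 not selected
# wells 8-12: 100 embryos selected, 300 not selected
example_or = odds_ratio(200, 300, 100, 300)
print(f"OR (wells 1-5 vs wells 8-12) = {example_or:.2f}")
```

An OR above 1 in this layout means that embryos in the first group of wells had higher odds of being selected than embryos in the second group.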

Limitations, reasons for caution:
Though our findings were significant, they need to be repeated on larger datasets with more inter-centre variation, and key embryo culture and outcome variables recorded.

Wider implications of the findings:
This study emphasizes the inherent human error that exists inside IVF clinics. Machine learning systems that reduce human bias and increase objective standardization, even if they are not inherently better than embryologists, would improve implantation rates. Future studies should be directed toward AI based technologies that can accomplish this.

Keywords:
Bias
embryo selection
Artificial Intelligence

Reducing inter-observer and intra-observer variability of embryo quality assessment using deep learning
E. Saïs1, A. Mayeur1, O. Binois1, L. Hesters1, V. Puy1, C. Fossard2, M. Filali2, J. Vandame2, M. Poulain2, N. Frydman1.
1Antoine Beclere Hospital, Reproductive Biology – Fertility Preservation – CECOS, Clamart, France.
2Foch Hospital, Department of Obstetrics and Gynecology, Suresnes, France.
Study question:

Does deep learning for embryo quality assessment reduce inter-observer variability?

Summary answer:
An AUC of 87.65% was obtained for predicting blastocyst quality with a deep-learning algorithm trained by five embryologists who had good agreement between themselves.

What is known already:
Time-Lapse (TL) allows continuous observation of embryo development in a controlled and stable environment. Recently, deep learning, in particular convolutional neural networks, has been introduced to enhance blastocyst image classification using the growing volume of TL image and video data.

Study design, size, duration:
A total of 409 embryos (5 images per embryo for a total of 2 045 images) were included in this retrospective study between 2016 and 2020.

Participants/materials, setting, methods:
A machine-learning algorithm (RetinaNet) was trained to recognize the blastocyst in the 2 045 images (2560×1928) from 409 embryos and to output 500×500 images with the blastocyst centered on the image. Five embryologists classified the blastocysts using Gardner’s grading system. Each image was associated with one final grade using a majority voting system. The dataset was split into a training and validation set (1 640 images plus data augmentation) and a testing set (405 images).
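The majority voting step described above can be sketched as follows. This is only an illustration: the abstract does not say how ties between embryologists were resolved, so the tie-break used here (the tied grade that appears first in the list) is an assumption, and the function name and example grades are hypothetical.

```python
from collections import Counter


def majority_grade(grades):
    """Return the grade given by the largest number of embryologists.

    Assumption: ties are broken in favour of the tied grade that
    appears first in the input list.
    """
    if not grades:
        raise ValueError("at least one grade is required")
    counts = Counter(grades)
    top_count = max(counts.values())
    for grade in grades:
        if counts[grade] == top_count:
            return grade


# Hypothetical Gardner grades from five embryologists for one image
votes = ["4AA", "4AB", "4AA", "3BB", "4AA"]
print(majority_grade(votes))
```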

Main results and the role of chance:
Fair agreement was found between the 5 embryologists when grading the embryo using Gardner’s grading system, with a maximum weighted kappa score of 39.60% reached.
As for the intra-observer variability, we show that for the same embryologist grading the same embryo after a 3 month “wash out” period, in 12% of the cases the embryologist changes the grade and the fate of the embryo, meaning that an embryo that was transferred/frozen during the first annotation period was discarded during the second one, or an embryo that was discarded during the first annotation period was transferred/frozen during the second one.
An Area Under the Curve (AUC) of 87.65% was obtained when testing the quality of 81 embryos (405 images) after training our algorithm on 54 038 images.
For external validation we tested the algorithm with annotations of the test set from embryologists coming from another fertility center. An AUC of 82.72% was obtained.
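The agreement figure reported above is a weighted kappa. As a rough sketch of what such a statistic measures, the function below computes a linearly weighted Cohen's kappa for one pair of raters. The abstract does not state the weighting scheme or how the five embryologists were paired, so linear weights, pairwise comparison, and the integer coding of grades are all assumptions made for illustration.

```python
def weighted_kappa(rater_a, rater_b, n_categories):
    """Linearly weighted Cohen's kappa for two raters.

    Grades must be coded as integers 0 .. n_categories - 1 in ordinal order.
    """
    n = len(rater_a)
    k = n_categories
    if n == 0 or n != len(rater_b) or k < 2:
        raise ValueError("need two equally long, non-empty rating lists and k >= 2")

    # Observed joint proportions of (grade by A, grade by B)
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n

    # Marginal proportions for each rater
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Weighted disagreement: observed, and expected under independence
    disagree_obs = 0.0
    disagree_exp = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)
            disagree_obs += weight * observed[i][j]
            disagree_exp += weight * row[i] * col[j]

    if disagree_exp == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - disagree_obs / disagree_exp


# Hypothetical ratings on a 3-level scale (0 = poor, 1 = fair, 2 = good)
print(weighted_kappa([0, 1, 2, 1, 0], [0, 1, 1, 1, 0], 3))
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance.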

Limitations, reasons for caution:
The number of images available in our training set is small compared with data sets from larger clinics, and because the algorithm was trained by embryologists, it does not suppress variability entirely. The GoogLeNet algorithm was not fine-tuned and was used as is.

Wider implications of the findings:
AI is showing its value in the field of embryology, from enhancing blastocyst quality prediction to removing inter-observer subjectivity. A possible evolution of our framework would be to predict the Gardner grade for each morphological parameter.

Keywords:
Artificial Intelligence
inter-observer variability
time-lapse
blastocyst
intra-observer variability

Annotation-free embryo score calculated by iDAScore® correlated with live birth and has no correlation with neonatal outcomes after single vitrified-warmed blastocyst transfer
S. Ueno1, J. Berntsen2, M. Ito1, T. Okimura1, K. Kato3.
1Kato Ladies Clinic, IVF Laboratory, Tokyo, Japan.
2Vitrolife A/S, Data Science, Aarhus, Denmark.
3Kato Ladies Clinic, Gynecology, Tokyo, Japan.
Study question:

Does the embryo score calculated by annotation-free embryo scoring system based on deep learning and time-lapse sequence images correlate with live birth (LB) and neonatal outcomes?

Summary answer:
Annotation-free embryo score calculated by iDAScore correlates with decreased miscarriage and increased LB and has no correlation with neonatal outcomes.

What is known already:
Embryo ranking models based on artificial intelligence (AI) and deep learning have recently been developed to rank embryos according to their potential for pregnancy. The practicability and usability of such models have been reported. A previous report suggested that iDAScore, one of the deep learning models for embryo scoring, was superior to traditional morphological assessment methods and morphokinetic embryo assessment models. However, few studies have used independent datasets to analyze the correlation between the score calculated by AI models, LB, and neonatal outcomes.

Study design, size, duration:
A total of 3,010 single vitrified-warmed blastocyst transfer (SVBT) cycles were analyzed retrospectively. The quality and scoring of embryos were assessed using iDAScore v1.0 (iDAScore, Vitrolife, Sweden). The cohort was divided into four groups based on the iDAScore according to the percentile (9.9-9.3, 9.2-8.7, 8.6-7.3, and 7.2-1.0).

Participants/materials, setting, methods:
Scores were calculated using the iDAScore software module in the Vitrolife Technology Hub (Vitrolife, Gothenburg, Sweden). The correlation between iDAScore, LB rates and total miscarriage (TM), including 1st and 2nd trimester miscarriage, was analysed using a trend test and multivariable logistic regression analysis. The correlation between the iDAScore and neonatal outcomes was analysed in the same way.

Main results and the role of chance:
LB rates decreased as the iDAScore decreased (P < 0.05), and a similar inverse trend was observed for the TM rates (P < 0.05). Additionally, multivariate logistic regression analysis showed that iDAScore significantly correlated with increased LB (adjusted odds ratio: 1.742, 95% CI: 1.601–1.904, P < 0.05) and decreased TM (adjusted odds ratio: 0.799, 95% CI: 0.706–0.905, P < 0.05). There was no significant correlation between iDAScore and neonatal outcomes, including congenital malformations, sex, gestational age, and birth weight. Multivariate logistic regression analysis, which included maternal and paternal age, maternal body mass index, parity, smoking, and the presence or absence of caesarean section as confounding factors, revealed no significant difference in any neonatal characteristics (low birth weight, small for gestation, large for gestation, preterm birth, male sex rates, and major congenital malformation).

Limitations, reasons for caution:
SVBT was performed following minimal stimulation and natural cycle in vitro fertilisation. Therefore, only a few cycles of elective blastocyst transfer were available. However, there was no bias in selecting embryos for SVBT.

Wider implications of the findings:
Objective embryo assessment using a completely automatic and annotation-free model, like iDAScore, showed a good correlation with increased LB and decreased TM. Furthermore, it did not correlate with neonatal outcomes. Therefore, iDAScore may be an optimal LB prediction model after SVBT without affecting neonatal outcomes.

Keywords:
live birth
Artificial Intelligence
Neonatal outcomes
objective assessment
Single frozen blastocyst transfer

Simplifying the complexity of time-lapse decisions with AI: CHLOE (Fairtility) can automatically annotate morphokinetics and predict blastulation (at 30hpi), pregnancy and ongoing clinical pregnancy
H.K. Yelke1, G. Ozkara2, B. Yuksel3, Y. Kumtepe Colakoglu2, M. Aygun3, A. Brualla4, I. Erlich4, C. Hickman4, S. Selimoglu3, B. Okten3, S. Kahraman3.
1İstanbul Memorial Sisli Hospital, ART and Reproductive Genetics, Istanbul, Turkey.
2Memorial Sisli Hospital, ART and Reproductive Genetics, Istanbul, Turkey.
3Istanbul Memorial Sisli Hospital, ART and Reproductive Genetics, Istanbul, Turkey.
4Fairtility, Consultant, Tel Aviv, Israel.
Study question:

What is CHLOE’s (Fairtility) efficacy of prediction of blastulation (at 30hpi), pregnancy and ongoing clinical pregnancy following single embryo transfer (SET)?

Summary answer:
CHLOE (Fairtility) algorithms are effective predictors of blastulation, ploidy, pregnancy, implantation and ongoing clinical pregnancy.

What is known already:
Time-lapse incubators have increased the amount of information available to the embryologist to help determine the fate of embryos. This has led to differences between clinics in how this information is prioritised. Moreover, inter-operator inconsistencies and the time-consuming nature of manually annotating time-lapse videos are challenges currently experienced by time-lapse users that can be relieved with Artificial Intelligence (AI) tools, such as CHLOE (Fairtility). CHLOE leverages AI-based predictors to predict blastulation and implantation, whilst providing transparency about which biological characteristics have led to that determination. There is a need to validate AI tools before their incorporation into clinical practice.

Study design, size, duration:
This was a single-centre study conducted between 2017 and 2020 at the ART and Reproductive Genetics Center of Istanbul Memorial Sisli Hospital, Turkey. This retrospective cohort analysis reviewed 6748 time-lapse videos containing 5392 cleaved embryos, 3763 blastocysts, 877 single embryo transfers (SET) with known ongoing pregnancy outcome (KOPO), 306 euploid SETs and 25 mosaic embryo SETs with KOPO. The efficacy of the CHLOE blastocyst and implantation scores in predicting clinical outcomes was quantified using the AUC metric.

Participants/materials, setting, methods:
Time-lapse videos were assessed using CHLOE (Fairtility), an AI-based tool, to quantify quantitative and qualitative morphokinetics (including automated annotations of tPNa, tPNf, t2, t3, t4, t5, t6, t7, t8, t9, tM, tSB, tB, tEB), the CHLOE implantation score and the CHLOE blastocyst score (calculated at 30hpi) relative to laboratory outcomes (ploidy results, blastulation) and clinical outcomes (biochemical, clinical and ongoing pregnancy) following overall SET. Binary logistic regression was used to calculate area under the curve (AUC) as a measure of prediction efficacy.
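For readers unfamiliar with AUC, the sketch below shows one common way to compute it directly from a score and a binary outcome: as the fraction of (positive, negative) pairs in which the positive case received the higher score. Note that this is a simpler, rank-based calculation swapped in for illustration; the study derived AUC via binary logistic regression. The function name, scores and outcomes are hypothetical.

```python
def auc_from_scores(scores, outcomes):
    """Rank-based AUC: probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count as half).

    outcomes must contain 1 for a positive outcome and 0 for a negative one.
    """
    positives = [s for s, y in zip(scores, outcomes) if y == 1]
    negatives = [s for s, y in zip(scores, outcomes) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical implantation scores and pregnancy outcomes (1 = pregnant)
scores = [0.9, 0.2, 0.8, 0.1]
outcomes = [1, 1, 0, 0]
print(auc_from_scores(scores, outcomes))
```

An AUC of 0.5 corresponds to a score with no discriminating ability, and 1.0 to a score that ranks every positive case above every negative case.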

Main results and the role of chance:
Blastulation score assessment of cleaved embryos was predictive of blastulation (AUC=0.96, baseline=70% n=5392, p<0.001). Following PGT-A, implantation score was predictive of euploids (AUC=0.61, baseline=34%, n=1456, p<0.001), but not of embryos classified as mosaics (AUC=0.5, baseline=19%, n=1456, p>0.05). Following SET, implantation score was predictive of biochemical (AUC=0.71, baseline=49%, n=866, p<0.001), clinical and ongoing pregnancy rate (AUC=0.69, baseline=37%, n=866, p<0.001). Following SET of non-PGT-A embryos, implantation score decreased with increasing patient age (p<0.001). The type of aneuploidy (such as monosomy, trisomy, segmental) did not affect implantation score or blastulation score (p>0.05). Implantation score prediction of outcome was higher for non-PGT-A transfers than overall transfers for biochemical (Non-PGTA: AUC=0.73, baseline=33%, n=535, p<0.001; OVERALL: AUC=0.71, baseline=49%, n=866, p<0.001), clinical and ongoing pregnancy (Non-PGTA: AUC=0.76, baseline=24%, n=535, p<0.001; OVERALL: AUC=0.69, baseline=37%, n=866, p<0.001), despite lower baselines.

Limitations, reasons for caution:
This is a single-centre study using retrospective data in which embryos were selected for transfer by human embryologists. Although the data are heterogeneous in terms of clinical features, the study is part of a larger framework for the responsible incorporation of AI into clinical practice through robust validation.

Wider implications of the findings:
AI-based tools have the potential to increase the consistency, efficiency and efficacy of embryo selection. The additional information on quantitative and qualitative morphokinetics that AI tools such as CHLOE provide brings transparency to the prediction, allowing for improved personalisation of care down to each individual embryo.

Keywords:
pregnancy prediction
time lapse monitoring
artificial intelligence (AI)
IVF
embryo morphokinetics

An expected benefit analysis of using an interpretable machine learning model for gonadotropin starting dose selection during ovarian stimulation
J. Tang1, M. Fanton2, P. Maeder-York1, E. Hariton3, O. Barash4, L. Weckstein5, D. Sakkas6, A. Copperman7, K. Loewke2.
1Alife Health, Product Management, San Francisco, U.S.A.
2Alife Health, Data Science, San Francisco, U.S.A.
3University of California- San Francisco, Reproductive endocrinology and infertility, San Francisco, U.S.A.
4Reproductive Science Center of the San Francisco Bay Area, Embryology, San Francisco, U.S.A.
5Reproductive Science Center of the San Francisco Bay Area, Reproductive Endocrinology and Infertility- Obstetrics and Gynecology, San Francisco, U.S.A.
6Boston IVF, Embryology, Boston, U.S.A.
7RMA of New York, Obstetrics- Gynecology and Reproductive Science, New York City, U.S.A.
Study question:

What is the expected benefit of using a machine learning model for predicting the optimal starting dose of gonadotropin during ovarian stimulation?

Summary answer:
Patients who had an optimal starting gonadotropin dose had improved outcomes and used significantly less total FSH compared to propensity matched patients who did not.

What is known already:
The relationship between the starting dose of follicle-stimulating hormone (FSH) and ovarian response is complex. In general, too little starting FSH may lead to inadequate follicle recruitment, while too much may lead to excessive response. In completed cycles, there is conflicting evidence as to whether higher doses are beneficial or detrimental to the number of oocytes retrieved. The field of assisted reproduction has begun to apply machine learning techniques to clinical decision support for ovarian stimulation, but no studies have specifically investigated optimizing starting FSH dose selection.

Study design, size, duration:
We performed a retrospective analysis of patients undergoing autologous, non-cancelled IVF cycles from 2014 – 2020 (n=18,591) at three different IVF clinics in the United States. The primary outcomes were the average number of MIIs, 2PNs, and usable blastocysts in relation to starting and total doses of FSH.

Participants/materials, setting, methods:
A K-nearest neighbor similarity model was trained on all cycles and used to identify the 100 patients most similar to a patient-of-interest using age, BMI, baseline anti-Müllerian hormone (AMH), and baseline antral follicle count (AFC). For each patient, a patient-specific dose response curve was created by fitting a constrained second order polynomial to the number of MII oocytes relative to the starting dose of FSH across all of the neighbors.
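To make the dose response idea concrete, here is a minimal sketch of fitting a second order polynomial to neighbor data and reading off a candidate optimal dose. It is not the authors' implementation: the abstract does not specify the constraint, so this sketch fits an ordinary least-squares quadratic and then treats the curve as having an optimum only if it is concave with its peak inside the observed dose range. The function names and the example data are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c via the normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]                   # sums of x^k
    t = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]  # sums of x^k * y
    matrix = [[s[4], s[3], s[2]],
              [s[3], s[2], s[1]],
              [s[2], s[1], s[0]]]
    rhs = [t[2], t[1], t[0]]
    d = det3(matrix)
    if abs(d) < 1e-12:
        raise ValueError("need at least three distinct doses")
    coeffs = []
    for col in range(3):  # Cramer's rule, one coefficient per column
        replaced = [row[:] for row in matrix]
        for r in range(3):
            replaced[r][col] = rhs[r]
        coeffs.append(det3(replaced) / d)
    return tuple(coeffs)  # (a, b, c)


def optimal_dose(doses_iu, mii_counts):
    """Return the dose (IU) at the peak of the fitted curve, or None if the
    curve has no peak inside the observed dose range (assumed rule)."""
    xs = [d / 100.0 for d in doses_iu]  # rescale for numerical stability
    a, b, _ = fit_quadratic(xs, mii_counts)
    if a >= 0:
        return None  # not concave: no maximum
    peak = -b / (2 * a)
    if not (min(xs) <= peak <= max(xs)):
        return None
    return peak * 100.0


# Hypothetical neighbor data: starting dose (IU) and MII oocytes retrieved
doses = [150, 225, 300, 375, 450]
miis = [7.75, 9.4375, 10.0, 9.4375, 7.75]
print(optimal_dose(doses, miis))
```

In this sketch a return value of None loosely corresponds to a patient whose curve shows no optimal dose, while a numeric value corresponds to a dose-responsive patient.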

Main results and the role of chance:
For each patient, the individual dose response curve was used to determine whether there was an optimal dose that maximizes the predicted number of MIIs (dose-responsive patients) or whether the curve showed no optimal dose (non-responsive patients). 30% of cycles were identified as dose-responsive, 64% were identified as non-responsive, and 6% were inconclusive and excluded from analysis. Dose-responsive patients who received an optimal starting dose had, on average, 1.5 more MIIs, 1.0 more 2PNs, and 0.5 more usable blastocysts using 10 IU less of starting FSH and 195 IU less of total FSH compared to propensity-matched patients with non-optimal doses. Non-responsive patients who received a low starting dose had, on average, 0.3 more MIIs, 0.4 more 2PNs, and 0.3 more usable blastocysts using 150 IU less of FSH and 1375 IU less of total FSH compared to propensity-matched patients with a high starting dose.

Limitations, reasons for caution:
The primary limitation is the retrospective nature of this study. Further, our calculations of starting FSH combined the contribution of pure FSH plus the FSH component of FSH/LH medication, rather than evaluating each separately. We also did not differentiate between types of protocols, as the majority were antagonist cycles.

Wider implications of the findings:
Our results suggest that a patient similarity model for selecting starting FSH can help increase MII outcomes while reducing the amount of FSH given to a patient. Future work will include continuing to increase the diversity of our dataset and performing validation studies to show improved outcomes with model use.

Keywords:
ovarian stimulation
mature oocytes
Artificial Intelligence
machine learning

Artificial intelligence system detects “goldilocks” morphokinetic zone for embryos transferred or frozen in time-lapse videos
J.A. Castilla1, N. Almunia2, A. Brualla3, R. Jiménez4, A.M. Villaquirán4, I. Har-vardi5, A. Ben-Meir5, E. Gomez2.
1CEIFER Biobanco, Sperm and egg bank, Granada, Spain.
2Next Fertility Murcia, IVF Lab, Murcia, Spain.
3Fairtility, Embryology, Tel Aviv, Israel.
4Next Fertility Murcia, Gynecology, Murcia, Spain.
5Fairtility, Clinical department, Tel Aviv, Israel.
Study question:

Are there specific morphokinetic time points which can be used to determine whether an embryo should be discarded?

Summary answer:
Morphokinetic ranges in which embryos will be discarded rather than transferred or cryopreserved can be defined using time-lapse annotations automatically generated with artificial intelligence (AI).

What is known already:
Time-lapse incubation has changed the way embryos are selected. Instead of static daily observations, continuous monitoring of embryos allows for generation of morphokinetic parameters which quantify the pace of development. However, annotations by humans have been shown to incur operator variations and are time-consuming to perform. AI can automatically annotate embryos with equivalence in accuracy to experienced embryologists. Although most embryo selection methods are designed to identify the embryo with the highest chance of becoming a healthy live birth baby, the ability to identify embryos that will not be suitable for treatment is equally important for clinical decision making.

Study design, size, duration:
This is a prospective, observational, cohort study. Time-lapse videos from 142 embryos from a private fertility clinic in Spain were automatically annotated using CHLOE (Fairtility), an AI-based software. CHLOE automatically generated the following morphokinetic parameters: tPNa, tPNf, t2, t3, t4, t5, t6, t7, t8, t9+, tM, tSB, tB, tEB.

Participants/materials, setting, methods:
Embryos analysed were from donor and own-oocyte treatments. Selected embryos were analysed using CHLOE to automatically identify morphokinetic parameters. The distribution of each morphokinetic parameter was compared between fates (data presented for transferred + frozen vs discarded as mean ± standard deviation, 2-sided t-test). Each continuous morphokinetic parameter was categorised according to the ranges where embryo utilisation was futile (<1%), optimal (maximum utilisation rate) or reduced (between optimal and futile).

Main results and the role of chance:
For every morphokinetic parameter, the difference in event time between frozen+transferred and discarded embryos was statistically significant (p<0.003). The time point in hours at which each morphokinetic event occurred is listed below as mean (SD) for frozen+transferred vs discarded embryos, followed by the p-value:

tPNa: 7.68 (2.03) vs 22.04 (27.15), p<0.0001
tPNf: 21.71 (2.86) vs 34.63 (24.11), p<0.0001
t2: 24.92 (2.71) vs 33.78 (16.17), p<0.0001
t3: 34.62 (4.03) vs 42.58 (22), p=0.0024
t4: 37.29 (4.31) vs 48.29 (20.29), p<0.0001
t5: 47.03 (6.47) vs 55.32 (22.63), p=0.0029
t6: 49.54 (5.63) vs 60.56 (22.20), p<0.0001
t7: 53.1 (7.86) vs 69.13 (24.54), p<0.0001
t8: 57.78 (9.78) vs 77.33 (25.79), p<0.0001
t9+: 69.14 (7.39) vs 81.9 (21.96), p<0.0001
tM: 83.9 (8.72) vs 96.08 (16.88), p<0.0001
tSB: 97.89 (7.55) vs 105.38 (11.38), p=0.0005
tB: 105.74 (7) vs 113.25 (15.53), p=0.0002
tEB: 110.65 (7.58) vs 120.47 (11.36), p=0.0031

When looking at the exact distribution of these embryos according to time, it became apparent that a goldilocks zone appeared whereby the proportion of embryos transferred or frozen peaked, and the number discarded was at its minimum. The converse was true when looking at the more extreme values of a particular parameter. Thus, we were able to determine the (optimal vs futile time ranges):

tPNa: 4.4-8.8 hours (optimal, where the utilization rate was at its maximum) vs <4.4 or >13.7 hours (futile, where the utilization rate was at its minimum)
tPNf: 19.1-23.2 vs <9.4, >28.9
t2: 23.-36.4 vs <19.9, >33.6
t3: 32.1-37.4 vs <24.6, >43
t4: 34-40.2 vs <29.5, >55
t5: 42.7-52 vs <33.7, >63.5
t6: 45.4-54.2 vs <36.10, >63.70
t7: 47.8-56.7 vs <42.8, >77.5
t8: 49.2-64.5 vs <44.5, >82.5
t9+: 64.1-74.2 vs <57, >90
tM: 76.6-92.6 vs <64.7, >104.2
tSB: 91.2-105 vs <81.3, >113.8
tB: 97.2-111.2 vs <92, >118.7
tEB: 103.4-116.7 vs <94.7, >122.5
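
As a minimal sketch of how such ranges could be applied to a single embryo, the hypothetical helper below labels one event time as optimal, futile or reduced, using the tPNf limits quoted above as the example. The function name and the rule that everything between the optimal and futile limits counts as "reduced" are assumptions for illustration, not part of the published method.

```python
def classify_timing(hours, optimal_range, futile_below, futile_above):
    """Label a morphokinetic event time against optimal and futile limits.

    optimal_range is a (low, high) tuple in hours; values outside the
    futile limits are 'futile', values inside the optimal range are
    'optimal', and anything in between is 'reduced' (assumed rule).
    """
    low, high = optimal_range
    if hours < futile_below or hours > futile_above:
        return "futile"
    if low <= hours <= high:
        return "optimal"
    return "reduced"


# tPNf limits from the text: optimal 19.1-23.2 h, futile <9.4 h or >28.9 h
for t in (21.0, 25.0, 30.0):
    print(t, classify_timing(t, (19.1, 23.2), 9.4, 28.9))
```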

60% of the embryos were in the futile range for at least one parameter; of these, only 1 in 3 were utilised.

Limitations, reasons for caution:
This is a single centre study. Further work will (i) test the limits across different clinics, with different geographical demographic variations, and varied clinical practices, to understand how these factors affect the limits between futile and optimal ranges of morphokinetics, and (ii) assess clinical outputs (implantation, ploidy, live birth).

Wider implications of the findings:
Identifying objective ranges for determining when an embryo is not suitable for treatment will help reduce variation between and within embryologists and clinics; it will avoid overly optimistic decisions that waste time and resources and increase patients’ emotional burden, and it will increase professional confidence when selecting embryos for discarding, transfer or freezing.

Keywords:
Artificial Intelligence
embryo selection

Artificial intelligence (AI) based triage for preimplantation genetic testing (PGT); an AI model that detects novel features in the embryo associated with ploidy.
M. Meseguer Escriva1,2, R. Maor3, L. Bori4, M. Shapiro3, A. Pellicer5, D. Seidman3, A. Mercader6, D. Gilboa7.
1Instituto Valenciano de Infertilidad, IVF Laboratory, Valencia, Spain.
2Health Research Institute la Fe, BIOMARCADORES- MEDICINA GENÓMICA- ESTADÍSTICA Y ANÁLISIS MASIVO DE DATOS EN REPRODUCCIÓN HUMANA ASISTIDA, València, Spain.
3AIVF, Aivf, Tel-aviv, Israel.
4IVI Foundation, Embryology, Valencia, Spain.
5IVI RMA Rome, Reproductive Medicine, Roma, Italy.
6IVIRMA Valencia, Genetic Laboratory, València, Spain.
7AIVF, Aivf, Tel Aviv, Israel.
Study question:

Can an AI based triage system noninvasively detect aneuploidy in preimplantation embryos in a precise and valid manner?

Summary answer:
Using a feature extraction approach to identify features in time-lapse images, an AI model was validated and found to noninvasively detect ploidy with unprecedented accuracy.

What is known already:
Invasive PGT with trophectoderm biopsy is the gold standard for evaluating the genetic integrity of an embryo prior to transfer. Even so, its utility and diagnostic accuracy are debated due to concerns about structural damage, sampling bias and viability after vitrification-warming. Though several noninvasive methods for evaluating ploidy have been developed, their main limitations lie in their accuracy. This study reports on the ongoing validation of an AI model that relies on feature extraction and thresholding techniques to distinguish between aneuploid and euploid embryos; the model is intended to be used in clinical settings for PGT triage and preferential transfer.

Study design, size, duration:
In this single-center study, we used a retrospective dataset consisting of time-lapse images from 2,502 preimplantation embryos with known ploidy status to train and validate the AI model.

Participants/materials, setting, methods:
The model utilized videos captured from a time-lapse incubator (Embryoscope) up to 144 hours post-fertilization, with chromosome analysis performed using next-generation sequencing technology as ground truth labels. The data set was divided using a 70/15/15 training-validation-test split. The AI model included convolutional neural network extracted features alongside spatial features based on several biological and clinical characteristics known to associate with ploidy, embryo behavior, and function. Performance was measured by validation and test-set accuracy.

Main results and the role of chance:
Five feature modules were included in the AI model for ploidy evaluation. All modules were analyzed separately and combined: (I) automated detection of abnormal morphokinetic patterns (t2-t8, tM, tSB, tB, tHB) differentiated between the two classes (aneuploid and euploid) to predict aneuploidy with an accuracy of 52%, p<0.05; (II) a previously validated embryo grading classification algorithm demonstrated an association of A-grade and C-grade embryos with euploidy and aneuploidy, respectively, with an accuracy of 68%, p<0.05; (III) differential cell division activity and compaction between the two classes predicted aneuploidy with an accuracy of 73%, p<0.05; (IV) AI-based classification of mitochondrial DNA content, measured as 0.5 micron irregularities in time-lapse images, predicted aneuploidy with an accuracy of 77%, p<0.05; (V) blastocoelic contractions of more than 8 microns in diameter predicted aneuploidy with 56% accuracy, p<0.05. Using our AI model, we were able to integrate all 5 features, thereby achieving an unprecedented 90% accuracy. Two features (detection of abnormal morphokinetic patterns and blastocoelic contractions) occur in a minority of embryos (in 3% and 20% of all embryos in the database, respectively). When they do occur, they independently predict aneuploidy with an accuracy of 90% and 82%, demonstrating the robustness of our multi-feature model.

Limitations, reasons for caution:
Our AI model needs to be tested on a large, multi-centric dataset to ensure standardization and ability to be replicated in different settings. Even so, given our high degree of demonstrated accuracy, we conclude that our single-center dataset was sufficient for developing the initial validation of the model reported here.

Wider implications of the findings:
The ‘explainability’ and implementation of our AI model enable more objective embryo quality assessment and improve the clinics’ ability to prioritize embryos for PGT and preferential transfer using a validated and trusted framework that dramatically reduces the chances of transferring an aneuploid embryo to our patients.

Keywords:
Artificial Intelligence
PGT-A
time-lapse
Embryo
Elucidation of blastocyst collapse and its consequences: a comprehensive artificial intelligence-powered analysis of 1943 embryos from 643 couples.
D. Cimadomo1, A. Marconetto2, F. Innocenti1, S. Trio3, V. Chiappetta1, D. Soscia1, L. Albricci1, L. Dovere1, A. Giancani1, R. Maggiulli1, I. Erlich4, A. Ben-Meir4, I. Har-Vardi4, F.M. Ubaldi1, L. Rienzi1.
1Clinica Valle Giulia, GeneraLife IVF, Roma, Italy.
2National University of Córdoba, University Institute of Reproductive Medicine, Córdoba, Argentina.
3GeneraLife Milan, GeneraLife IVF, Milan, Italy.
4Fairtility Ltd., Fairtility, Tel Aviv, Israel.
Study question:

What are the causes and consequences of blastocyst collapse?

Summary answer:
~50% of blastocysts collapsed, especially if they are aneuploid and/or morphologically-poor. Yet, no impact on the live-birth-rate (LBR) per vitrified-warmed euploid single-embryo-transfer (SET) was reported.



What is known already:
Time-lapse-microscopy (TLM) is a powerful tool to describe the peculiar dynamics of preimplantation development. Lately, artificial intelligence (AI) has also been implemented to automate and standardize such description. Here, we adopted AI to comprehensively portray blastocyst collapse, namely the phenomenon of embryo contraction with an efflux of blastocoel fluid and the detachment of the trophectoderm (TE) from the Zona Pellucida (ZP). Although the causes of this event are still undetermined, small blastocyst contractions have been reported to be beneficial for the hatching process, while a full collapse has been associated with lower competence.

Study design, size, duration:
Observational study including 1943 blastocysts from 643 couples cultured in the Embryoscope between January-2013 and December-2020. TE biopsy without day3 ZP drilling and comprehensive-chromosome-testing were performed. The Fairtility® software automatically registered: (i) time of starting-blastulation (tSB), (ii) starting and ending time of each collapse (tSC and tEC), (iii) blastocysts’ areas, (iv) shrinkage% [(area at SC - area at EC)/area at SC], (v) embryo:ZP ratio at EC (area of the collapsed embryo/area of the ZP), and (vi) time of biopsy (t-biopsy).
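The two area-based measures defined above can be written out as a short calculation. This is only a sketch of the formulas as stated in the abstract, not the Fairtility software; the function names and the example areas are hypothetical and chosen purely for illustration.

```python
def shrinkage_percent(area_sc: float, area_ec: float) -> float:
    """Shrinkage% = (area at SC - area at EC) / area at SC, as a percentage."""
    if area_sc <= 0:
        raise ValueError("area at start of collapse must be positive")
    return 100.0 * (area_sc - area_ec) / area_sc


def embryo_zp_ratio(area_embryo_ec: float, area_zp: float) -> float:
    """Embryo:ZP ratio at EC = collapsed embryo area / ZP area, as a percentage."""
    if area_zp <= 0:
        raise ValueError("ZP area must be positive")
    return 100.0 * area_embryo_ec / area_zp


# Hypothetical areas (arbitrary units), not taken from the study.
example_shrinkage = shrinkage_percent(area_sc=18000.0, area_ec=11700.0)
example_ratio = embryo_zp_ratio(area_embryo_ec=11700.0, area_zp=20000.0)
```

With these made-up inputs the embryo loses 35% of its area during the collapse and occupies 58.5% of the ZP area at the end of the collapse.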

Participants/materials, setting, methods:
Blastocyst quality was defined according to Istanbul Consensus (11, excellent; 12-21, good; 22-13-31, average; 33-23-32, poor) and with the Fairtility implantation score (IS) as well, i.e., a continuous variable from 0 to 1 generated by the KID+ software based on the TLM videos of preimplantation development. The main outcome was the LBR per euploid SET adjusted for confounders through logistic regressions. All couple and embryo features were also investigated for their association with blastocyst collapse.

Main results and the role of chance:
47.3% of the blastocysts collapsed 1- to 9-times (interval between collapses: 4-8hr), and 73% of the couples had ≥1 collapsed blastocyst (1.8±1.1, range:1-8). No couple feature, though, was associated with blastocyst collapse. The longest collapses lasted 1.5±1.1 (0.13-5.1)hr, while the largest shrinkage% and embryo:ZP ratio at EC were 35±14% (10-78%) and 81±9% (33-90%), respectively. In ~50-60% of collapses a 20-40% blastocyst volume reduction was registered, 40-60% or 20-40% in ~15-30%, 60-80% in 0-4%. In case of multiple collapses, the first three involved smaller shrinkages. Blastocysts undergoing ≥1 collapse showed similar tSB as not-collapsing blastocysts, but progressively longer tEB and t-biopsy. The earlier the first event, the more the consecutive collapses. Notably, the poorer the morphology, the higher the risk (excellent, good, average, and poor not-collapsing blastocysts were 64%,50%,44% and 37%), number (e.g.,≥4 collapses were 0.4%,2%,4% and 8%) and duration (1.2±1.0,1.4±1.0,1.6±1.1 and 1.9±1.3hr) of blastocyst collapse. Collapsing blastocysts were significantly less euploid than non-collapsing (35% vs 47%; multivariate-OR:0.75,95%CI 0.6-0.92,p<0.01); conversely, their LBR per euploid SET (39% vs 46%) and miscarriage rate per clinical pregnancy (17% vs 11%), were not significantly different (adjusted-OR:1.0,95%CI 0.69-1.48,p=0.96 and adjusted-OR:1.65,95%CI 0.79-3.42,p=0.18, respectively). All data were confirmed also by defining blastocyst quality through the Fairtility IS.

Limitations, reasons for caution:
Gestational and perinatal outcomes were not assessed. Other culture strategies and media shall be assessed for their association with blastocyst collapse. Perhaps, future studies from other groups and with a larger sample size might unveil a significant impact on the clinical outcomes.

Wider implications of the findings:
Collapse is common and delays blastocyst full-expansion. Moreover, poor morphology and aneuploidies involve a higher risk of collapse(s); however, no impact was reported on the clinical outcomes after euploid SET. AI appears to increase the throughput of the analysis, but additional data are required to research the causes of collapse.

Keywords:
automation
Artificial Intelligence
blastocyst collapse
blastocyst shrinkage
euploid blastocyst implantation
Comparing the efficacy of two commercially available AI-based embryo assessment tools in a large clinical laboratory.
D. Iacobelli1, E. Glage1, L. Burmeister2, K. Sorby1.
1Number 1 Fertility, Embryology, Melbourne, Australia.
2Number 1 Fertility, Clinician, Melbourne, Australia.
Study question:

Are artificial intelligence (AI) systems an effective tool to aid in assessment of embryo viability, that can be implemented in a busy clinical laboratory setting?

Summary answer:
LW and iDA scores both have value in predicting embryo potential; however, iDA was a significantly stronger indicator of clinical pregnancy and live birth rates.

What is known already:
Embryologist assessment of blastocysts can be variable, influenced by a range of factors including fatigue, experience, emotional bias, workload, and time pressures. An advantage of AI technologies is the consistent and objective assessment of embryos, along with the potential to observe variations not detectable by the human eye. Both AI systems have been demonstrated by their developers to have a degree of effectiveness, however there is less data to show how this translates to routine clinical use in a laboratory not involved in the development of either system.

Study design, size, duration:
Embryos were cultured in EmbryoScope+ using Vitrolife sequential media at 6% CO2 and 5% O2. All single embryo transfers of blastocyst stage embryos during 2020 (n=806) were assessed by two independent commercially available AI-based systems, Life Whisperer (LW) by Presagen and Intelligent Data Analysis (iDA) by Vitrolife. Both systems assessed embryos independently of embryologist assessment. Scores from LW and iDA were analysed against clinical pregnancy and birth outcomes.

Participants/materials, setting, methods:
LW and iDA independently assigned each embryo a score from 0-10 based on prediction of embryo viability. LW scores were calculated by analysing a static 2D image of a blastocyst, input by the embryologist. iDA scores were generated from Embryoscope+ timelapse footage after a minimum of 112 hours in culture. Clinical pregnancy was defined as presence of a gestational sac, and pregnancy loss as any clinical pregnancy that did not result in live birth.

Main results and the role of chance:
This cohort had a mean patient age of 36.9 years and had undertaken a mean of 3.2 prior cycles. The overall clinical pregnancy rate was 45.3%. Of these pregnancies, 79.2% resulted in live birth, thus the pregnancy loss rate was 20.8%. Scores at the extremes of each scale, <5 and ≥9, resulted in a statistically significant difference in clinical pregnancy rates from both iDA scores (24.0% and 59.3% respectively, p=0.0001) and LW scores (38.0% and 53.1% respectively, p=0.0031), with this difference being more pronounced in iDA assessments. When stratifying scores into five categories, a linear relationship was observed between increasing score and pregnancy rate. This relationship was consistent whether assessing by raw score, as generated by the AI program, or by allocating a designated proportion of the cohort to each category. Interestingly, pregnancy loss significantly decreased with increasing iDA scores (iDA<5: 50% and iDA≥9: 13.7%, p=0.0066). Although a similar trend was seen with LW scores, this was less pronounced and not statistically significant (LW<5: 28.6% and LW≥9: 18.3%, p=0.1124). This demonstrates that iDA score was both more representative of the chance of establishing a clinical pregnancy and of that pregnancy resulting in a live birth.

Limitations, reasons for caution:
This study was conducted at a single clinical laboratory and results may not necessarily be applicable to all settings. Variables such as differing culture conditions, the transfer of multiple embryos, or transfer of embryos prior to the blastocyst stage, may impact these findings.

Wider implications of the findings:
AI technologies may aid embryologists in the selection of embryos and reduce variability between scientists. More effective ranking of embryos may reduce the number of cycles required to achieve a pregnancy and potentially reduce pregnancy loss. This is particularly valuable to patients who have many embryos available for transfer.

Keywords:
embryo selection
Artificial Intelligence
Embryo ranking agreement between embryologists and AI algorithms
N. Zaninovic1, J. Sierra2, J. Malmsten1, Z. Rosenwaks1.
1Weill Cornell Medicine, Center for Reproductive Medicine, New York, U.S.A..
2QED Analytics- LLC, n/a, New York, U.S.A..
Study question:

What is the level of agreement between different AI algorithms and embryologists when ranking blastocysts?

Summary answer:
In general, embryologists have a stronger level of agreement with each other, whereas AI algorithms differ greatly from embryologists and from each other.

What is known already:
Previous studies comparing agreement among embryologists ranking embryos have shown moderate to high inter-and intra-agreement levels. To our knowledge, this is the first study that endeavors to evaluate the level of agreement between different AI algorithms and embryologists in regard to ranking embryo quality.

Study design, size, duration:
Study data consisted of time-lapse images of 800 embryos from 100 patients (8 embryos each). All embryos were created from fresh oocytes retrieved at a single center between 2019 and 2020 and fertilized using ICSI. They were cultured in TLM incubators (Vitrolife, Sweden) and developed for 120 hours. The cohort included at least 8 embryos that started to blastulate (sTB) before 120 hours post-fertilization (HPF–ICSI). Patients older than 38 years were excluded.

Participants/materials, setting, methods:
Five international embryologists ranked embryos using single images; three also ranked embryos using TLM videos. Eight international AI companies anonymously ranked the embryos using AI models; half used single images while the others used full videos. The Kendall tau statistic was used to determine the agreement level between the ranking methods; -1 denotes 100% disagreement and 1 denotes perfect agreement. The pair-wise agreement in selecting the top-one and top-two embryos was compared across all methods.
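To make the agreement statistic concrete, the sketch below computes Kendall's tau for two rankings of the same eight embryos. It assumes strict rankings without ties (the tau-a variant); the abstract does not say which variant or software the authors used, and the two example rankings are hypothetical.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a for two rankings of the same items (no ties assumed)."""
    if len(rank_a) != len(rank_b):
        raise ValueError("rankings must cover the same number of embryos")
    concordant = discordant = 0
    for i, j in combinations(range(len(rank_a)), 2):
        # Positive product: both raters order embryos i and j the same way.
        direction = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    n_pairs = len(rank_a) * (len(rank_a) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical ranks (1 = best) given to the same 8 embryos by two raters.
embryologist = [1, 2, 3, 4, 5, 6, 7, 8]
ai_model = [2, 1, 3, 4, 6, 5, 7, 8]
tau = kendall_tau(embryologist, ai_model)
```

Here the two raters disagree on 2 of the 28 embryo pairs, so tau is (26 - 2) / 28, roughly 0.86. Identical rankings give 1 and fully reversed rankings give -1, matching the scale described above.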

Main results and the role of chance:
The embryologists had a relatively high degree of agreement in the overall ranking of 100 cycles (average K-t=0.70), slightly lower than the inter-embryologist agreement when using a single image or video (average K-t=0.78). Overall agreement between embryologists and the AI algorithms was significantly lower (average K-t=0.53) and similar to inter-AI algorithm agreement (average K-t=0.47). Notably, two of the eight algorithms had a very low agreement with other ranking methodologies (average K-t=0.05).

The average agreement in selecting the best-quality embryo (1/8 in 100 cycles; expected agreement by random chance, 12.5%, CI95: 6-19%) was 59.5% among embryologists and 40.3% for six AI algorithms; for the two algorithms with the low overall agreement, the agreement was 11.7%.

Agreement on selecting the same top-two embryos/cycle (expected agreement by random chance, 25.0%, CI95: 17-32%) was 73.5% among embryologists and 56.0% among AI methods, excluding the two discordant algorithms, which had an average agreement of 24.4%, within the expected range of agreement from random chance.

Intra-embryologist ranking agreement (single image vs. video) was 71.7% and 77.8% for single and top-two embryos, respectively.

Analysis of average raw scores indicated cycles with low diversity of embryo quality generally resulted in lower overall agreement between the methods (embryologists and AI models).

Limitations, reasons for caution:
Given the selection process for cycles and the corresponding embryos, the ground truth cannot be ascertained as no implantation or pregnancy outcome was assessed or compared. Although this study can identify agreement between different ranking methods, it cannot determine which assessment method is correct.

Wider implications of the findings:
Our results suggest that the AI method used to assign relative embryo quality may result in a significantly different selection and, presumably, outcome. Further studies should evaluate the source of the disagreement in embryos for which the outcome is known.

Keywords:
Artificial Intelligence
embryo ranking
Uncovering the value of day 7 blastocysts using artificial intelligence on time lapse videos.
F. Innocenti1, D. Cimadomo1, D. Soscia1, V. Casciani1, S. Trio2, V. Chiappetta1, L. Albricci1, R. Maggiulli1, G. Fabozzi1, I. Erlich3, A. Ben-Meir3, I. Har-Vardi3, A. Vaiarelli1, F.M. Ubaldi1, L. Rienzi1.
1Clinica Valle Giulia, GeneraLife IVF, Rome, Italy.
2GeneraLife Milan, GeneraLife IVF, Rome, Italy.
3Fairtility Ltd., Fairtility, Tel Aviv, Israel.
Study question:

What is the clinical value of day 7 blastocysts?

Summary answer:
Ending embryo culture at 144 hours-post-insemination (hpi) would involve 7.3%- and 4.4%-relative reductions in the patients obtaining euploid blastocysts and live birth(s) (LBs), respectively.

What is known already:
Many studies showed that day 7 blastocysts are clinically valuable although less euploid and less competent than faster growing embryos. Nevertheless, a large variability exists in: (i) the definition of “day 7”; (ii) the criteria to culture embryos to day 7; (iii) the clinical setting; (iv) the local regulation; and/or (v) the culture strategies and incubators. Here, we aimed at ironing out these differences and portraying day 7 blastocysts with the lowest possible risk of bias. To this end, we have also adopted an artificial intelligence (AI)-powered software to automate developmental timing annotations and standardize embryo morphological assessment.

Study design, size, duration:
Observational study including 1966 blastocysts obtained from 681 patients cultured in a time lapse incubator between January 2013 and December 2020 at a private Italian IVF center.

Participants/materials, setting, methods:
Trophectoderm biopsy without hatching and comprehensive-chromosome-testing were performed. Blastocysts were clustered in six groups based on the time-of-biopsy every 12hr from <120hpi (control) to >168hpi. Blastocyst quality, time-of-expanding-blastocyst (tEB) and duration of expansion were annotated through AI and confirmed manually. The main outcomes were euploidy-rate and LB-rate (LBR) per transfer. Lastly, patients obtaining (euploid) blastocysts, LBs, and supernumerary blastocysts, were reported based on a hypothetical 144hpi cut-off, and all relative reductions calculated.

Main results and the role of chance:
14.6% of the blastocysts reached full expansion beyond 144hpi (5.9% between 144-156hpi, 7.9% between 156-168hpi, and 0.8% >168hpi). Slower blastocysts were of a worse quality based on the evaluation of both embryologists and AI. Both longer tEB and a longer duration of expansion concurred to day7 development, quite independently of embryo quality. The lower euploidy rate among day7 blastocysts is due to their worse morphology and more advanced oocyte age, rather than to a slower development per se. Conversely, the lower LBR was significant even after adjusting for confounders, with a first relevant decrease for blastocysts biopsied in the range 132-144hpi (N=76/208, 36.5% versus N=114/215, 53.0% in the control, multivariate-OR: 0.61, 95%CI 0.40-0.92, adjusted-p=0.02), and a second step for blastocysts biopsied in the range 156-168hpi (N=3/21, 14.3%, multivariate-OR:0.24, 95%CI 0.07-0.88, adjusted-p=0.03). Nevertheless, when the cut-off was set at 144hpi, no significant difference was reported. In this patient population, ending embryo culture at 144hpi would have caused 10.6%-, 7.3%-, 4.4%-, 13.7%-, and 5.2%-relative reductions in the number of patients obtaining blastocysts, euploid blastocysts, LBs, supernumerary blastocysts without a LB and after a LB, respectively.

Limitations, reasons for caution:
Gestational and perinatal outcomes were not assessed, and a cost-effectiveness analysis was not performed. We encourage the production of these data in other clinical settings and regulatory contexts.

Wider implications of the findings:
Day7 culture shall be supported following a careful case-by-case evaluation. Patients shall be aware of their lower competence, yet day7 blastocysts are valuable for poor-prognosis couples, couples less compliant towards other attempts in case of failures, and couples wishing for second children. AI may improve the generalizability of this evidence.

Keywords:
Day 7 blastocyst
Slow growing blastocyst
PGT-A
automation
Artificial Intelligence
Reporting on the value of Artificial Intelligence in predicting the optimal embryo for transfer: A systematic review and meta-analysis
K. Sfakianoudis1, E. Maziotis2, S. Grigoriadis2, A. Pantou1, G. Kokkini2, A. Trypidi2, I. Angeli1, T. Vaxevanoglou1, K. Pantos1, M. Simopoulou2.
1Centre for Human Reproduction- Genesis Athens Clinic, Assisted Conception Unit, Chalandri- Athens, Greece.
2National and Kapodistrian University of Athens, Physiology, Athens, Greece.
Study question:

Are Artificial Intelligence (AI) based models effective in robustly predicting in vitro fertilization (IVF) outcome by assessing embryo quality?

Summary answer:
The majority of the AI-based models could provide an accurate prediction regarding live birth, clinical pregnancy, clinical pregnancy with fetal heartbeat and embryo ploidy status.

What is known already:
Precision and consistency in embryo quality evaluation are of paramount importance regarding the outcome of an IVF cycle. Numerous embryo grading and evaluation systems, employing morphological and morphokinetical assessment, have been proposed but without reaching a consensus yet. The main limitation of the aforementioned assessment systems is that they depend on human evaluation, which may be subject to subjectivity and interobserver variation. Thus, automated prediction models may be essential to optimize objectivity and reliability of embryo grading. Artificial neural network models may process microscopy images or time-lapse videos as input to predict the embryos’ potential competency.

Study design, size, duration:
A systematic review and meta-analysis including 18 published studies. The population consists of preimplantation embryos suitable for embryo transfer in IVF/ICSI cycles following employment of an AI-based prediction model. The outcome measures are prediction of live birth, clinical pregnancy, clinical pregnancy with heartbeat and ploidy status.

Participants/materials, setting, methods:
A systematic search of the literature was performed in the databases of Pubmed/Medline, Embase, and Cochrane Central Library limited to articles published in English up to August 2021. The initial search yielded a total of 694 studies, with 97 of them being duplicates and another 579 being excluded on the grounds of not fulfilling inclusion criteria. Following full-text screening and citation mining, a total of 18 studies were identified as eligible for inclusion.

Main results and the role of chance:
Four studies reported on prediction of live birth. The sensitivity was 70.6% (95%C.I.: 38.1-90.4%) and specificity was 90.6% (95%C.I.:79.3-96.1%). The Area Under the Curve (AUC) of the Summary Receiver Operating Characteristics (SROC) curve was 0.905, while the partial AUC (pAUC) was 0.755. Employing the Bayesian approach, the total Observed:Expected ratio (O:E) was 1.12 (95%CI: 0.26–2.37; 95%PI:0.02-6.54). Ten studies reported on prediction of clinical pregnancy. The sensitivity and the specificity were 71% (95%C.I.: 58.1-81.2%) and 62.5% (95%C.I.: 47.4-75.5%) respectively. The AUC was 0.716, while pAUC was 0.693. Moreover, the total O:E ratio was 0.92 (95%CI: 0.61–1.28; 95%PI:0.13-2.43). Eight studies reported on prediction of clinical pregnancy with fetal heartbeat; the sensitivity was 75.2% (95%C.I.: 66.8-82%) and the specificity was 55.3% (95%C.I.: 41.2-68.7%). The AUC was 0.722, while the pAUC was 0.774. The O:E ratio was 0.77 (95%CI: 0.54 – 1.05; 95%PI: 0.21-1.62). Four studies reported on the ploidy status of the embryo. The sensitivity and specificity were 59.4% (95%C.I.: 45.0-73.1%) and 79.2% (95%C.I.: 70.1-86.1%) respectively. The AUC was 0.751 and the pAUC was 0.585. The total O:E ratio was 0.86 (95%CI: 0.42 – 1.27; 95%PI: 0.03-1.83).
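For readers less familiar with the metrics quoted above, the sketch below shows how sensitivity and specificity are defined for a single study's confusion matrix. It does not reproduce the pooled meta-analytic estimates, which come from a statistical model across studies; the counts are hypothetical and were chosen only so that the output matches the 71% and 62.5% figures reported for clinical pregnancy.

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)  # share of true pregnancies the model flagged
    specificity = tn / (tn + fp)  # share of non-pregnancies the model cleared
    return sensitivity, specificity


# Hypothetical counts for illustration only (not data from any included study).
sens, spec = sensitivity_specificity(tp=71, fn=29, tn=125, fp=75)
```

With these invented counts, 71 of 100 pregnancies are predicted correctly (sensitivity 0.71) and 125 of 200 non-pregnancies are predicted correctly (specificity 0.625).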

Limitations, reasons for caution:
The limited number of studies fulfilling inclusion criteria, along with the different designs applied when developing AI models which may lead to increased heterogeneity, stand as limitations. Inclusion of women regardless of their age presents as another limitation, as advanced maternal age has been associated with diminished IVF outcomes.

Wider implications of the findings:
Although our findings support that AI is a highly promising tool in the era of personalized medicine, providing precise predictions, it does not appear to considerably surpass human prediction capabilities. More studies and more collaboration between developers are of paramount importance before AI can become the gold standard.

Keywords:
Artificial Intelligence
meta-analysis
IVF
Application of artificial intelligence using big data to devise and train a machine learning model on over 63,000 human embryos to automate time-lapse embryo annotation.
A. Campbell1, R. Smith1, B. Petersen2, L. Moore3, A. Khan3, A. Barrie1.
1CARE Fertility Group, Embryology, Nottingham, United Kingdom.
2BMP Analytics, Mathematics, Aarhus, Denmark.
3BJSS, Data Science, Leeds, United Kingdom.
Study question:

Can a machine learning (ML) model, developed using a modern neural network architecture, produce annotation data that are comparable to manual time-lapse annotations and utilisable for algorithmic outcome prediction?

Summary answer:
The model automatically annotated unseen embryos with comparable results to manual methods, generating morphokinetic data to enable comparably predictive outputs from an embryo selection algorithm.

What is known already:
The application of artificial intelligence across healthcare industries, including fertility, is increasing. Several ML models are available that seek to generate or analyse embryo images and morphokinetic data, and to determine embryo viability potential. Along with photographic images, the use of time-lapse in IVF laboratories has amassed numeric data, resulting predominantly from annotated manual assessment of images over time. Embryo annotation practice is variable in quality, can be subjective and is time-consuming; commonly taking several minutes per embryo. The development of rapid, accurate automatic annotation would represent a significant time-saving as well as an increase in reproducibility and accuracy.

Study design, size, duration:
Multicentre quality assured annotation data from 63,383 time-lapse monitored embryos (EmbryoScope®), comprising over 400 million individual images, were used to train a ML model to automatically generate morphokinetic annotations. Data was derived from 8 UK clinics within a cohesive group between 2012-2021. Accuracy was assessed using 900 unseen embryos (with live birth outcome) by comparing the output of an established in-house, prospectively validated embryo selection model when the input was either ML-automated, or manual annotations.

Participants/materials, setting, methods:
Multi-focal plane images were processed on the Azure cloud (Microsoft) and resampled to 300×300 pixels. A Laplacian-based focal stacking algorithm merged frames into a single image. The model consisted of an EfficientNetB4 Convolutional Neural Network classifier to extract features and classify the stage of embryo images. A Temporal Convolutional Network interpreted a time series of image features, producing annotations from pronuclear fading through to blastocyst. A soft localisation loss function used QA data to integrate annotation subjectivities.
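The general idea behind Laplacian-based focal stacking can be sketched as follows: for every pixel, keep the value from the focal plane whose local Laplacian response is strongest, since a large response indicates sharp detail. This is a deliberately simplified, pure-Python illustration under that assumption and is not the authors' pipeline; the helper names and the tiny 3×3 planes are hypothetical, and a production version would typically smooth the focus measure first.

```python
def laplacian_abs(img, r, c):
    """Absolute 4-neighbour Laplacian at (r, c); edges reuse the border pixel."""
    rows, cols = len(img), len(img[0])
    up = img[max(r - 1, 0)][c]
    down = img[min(r + 1, rows - 1)][c]
    left = img[r][max(c - 1, 0)]
    right = img[r][min(c + 1, cols - 1)]
    return abs(up + down + left + right - 4 * img[r][c])


def focus_stack(planes):
    """Merge focal planes, keeping per pixel the plane with the sharpest response."""
    rows, cols = len(planes[0]), len(planes[0][0])
    merged = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # On ties, max() keeps the first plane in the list.
            best = max(planes, key=lambda p: laplacian_abs(p, r, c))
            merged[r][c] = best[r][c]
    return merged


# Hypothetical 3x3 grayscale planes: one featureless, one with a sharp spot.
flat_plane = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
sharp_plane = [[10, 10, 10], [10, 50, 10], [10, 10, 10]]
stacked = focus_stack([flat_plane, sharp_plane])
```

In this toy case the merged image keeps the bright central detail from the in-focus plane, which is the behaviour a focal stacking step is meant to provide before the images reach the classifier.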

Main results and the role of chance:
The ML model rapidly and automatically generated annotations. Efficacy and comparability of the ML model to automate reliable, utilisable annotations was demonstrated by comparison with manual annotation data and the ML model’s ability to auto-generate annotations which could be used to predict live birth by providing annotation data to an established, validated in house embryo selection model. Live birth-predictive capability was measured, and benchmarked against manual annotation, using the area under the receiver operating characteristic curve (AUC).

When tested on time-lapse images, collected from pronuclear fading to full blastulation, representing 900 previously unseen, transferred blastocysts where live birth outcomes were blinded, the in-house developed auto-annotation ML model resulted in an AUC of 0.686 compared with 0.661 for manual annotations, for live birth prediction.
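The AUC values compared above can be read as the probability that a randomly chosen embryo with a live birth receives a higher score than a randomly chosen embryo without one. The sketch below computes AUC directly from that definition; it is an illustration only, with hypothetical labels and scores, and is not the evaluation code used in the study.

```python
def roc_auc(labels, scores):
    """AUC as the chance a positive outscores a negative (ties count as 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes (1 = live birth) and selection-model scores.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.4, 0.6, 0.7, 0.8, 0.2]
auc = roc_auc(labels, scores)
```

In this toy example 8 of the 9 positive-negative pairs are ordered correctly, giving an AUC of about 0.89. An AUC of 0.5 corresponds to random ordering, which puts the reported 0.686 and 0.661 in context.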

Auto annotation using the developed model took only milliseconds to complete per embryo. The developed auto-annotation model, built and tested on large data, is considered suitable for productionisation with the aim of being validated and integrated into an application to support IVF laboratory practice.

Limitations, reasons for caution:
Whilst this model was trained to recognise key morphokinetic events, there are other morphokinetic variables that may be useful in the prediction of live birth and further improve embryo selection, or deselection, ability. Akin to manual interpretation, some embryos may fail to be annotated or need second opinion.

Wider implications of the findings:
There is increasing evidence supporting the application of ML to utilise big data from time-lapse imaging and fertility care generally. Whilst promising benefits to IVF clinics and patients, responsible use of data is required alongside large high-quality datasets, and rigorous validation, to ensure safe and robust applications.

Keywords:
Artificial Intelligence
machine learning
time lapse imaging
embryo selection
annotation
Artificial intelligence blastocyst ploidy distinction through morphokinetics data
M. Nicolielo Barreto1, C. Jacobs1, R.C.M. Souza2, R. Erberelli1, J.R. Alegretti1, M.B. Chehin3, E.L.A. Motta4, M.F.G. Nogueira5, J.C. Rocha5, A.R. Lorenzon6.
1Huntington Medicina Reprodutiva – Eugin Group, Embryology Department, São Paulo- SP, Brazil.
2Institute of Biosciences- São Paulo State University, Graduate Program in Pharmacology and Biotechnology-, Botucatu- SP, Brazil.
3Huntington Medicina Reprodutiva – Eugin Group, Clinical Department, São Paulo- SP, Brazil.
4Huntington Medicina Reprodutiva – Eugin Group and Federal University of São Paulo, Clinical Department and Department of Gynecology- Paulista School of Medicine, São Paulo- SP, Brazil.
5School of Sciences and Languages- São Paulo State University, Department of Biological Sciences, Assis- SP, Brazil.
6Huntington Medicina Reprodutiva – Eugin Group, R&D Department, São Paulo- SP, Brazil.
Study question:

Is it possible to estimate blastocyst ploidy with an artificial intelligence (AI) algorithm built using embryo morphokinetics data?

Summary answer:
AI was able to classify blastocysts as euploid or aneuploid with an AUC of 0.99 (training) and of 0.71 and 0.70 (blind-test), respectively.

What is known already:
Morphokinetic parameters are well associated with implantation rates, making time-lapse a useful tool to enhance embryo ranking and selection. However, regarding embryo ploidy status, clinics still rely exclusively on molecular/genetic testing. Our previous studies have indicated that euploid blastocysts are faster at time of pronuclear fading (tPNf) and time to blastulation (tB). Considering the amount of data derived from the morphokinetic annotations and the promising results that the AI approach is bringing to the field, this study aims to verify the accuracy of AI in distinguishing euploid from aneuploid blastocysts using 17 morphokinetic parameters, prospectively compared to embryo biopsy results.

Study design, size, duration:
This is a prospective cohort study including 402 embryos (cultured in a time-lapse incubator, EmbryoscopePlus, Vitrolife) from 140 patients undergoing IVF treatment with preimplantation genetic testing for aneuploidy (NGS platform) after signing an informed consent form between July 2019 and September 2021. Morphokinetic annotations were analyzed by an AI approach combining artificial neural networks (ANN) and genetic algorithms (GA) for ploidy assessment.

Participants/materials, setting, methods:
Morphokinetic parameters were manually annotated up to the time of full blastocyst expansion. Time intervals and time ratios were calculated, resulting in 17 variables for the AI analysis. Of the 402 embryo records, 252 were randomly divided into training, validation and testing sets (70%, 15% and 15%, respectively) for the ANN. The remaining 150 records were used for the blind test. The area under the receiver operating characteristic curve (AUC) was measured to assess predictive power.
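
As a quick arithmetic check, the split described above can be reproduced in a few lines. This is only a sketch: the helper name `split_counts` and its rounding rule are assumptions, since the abstract gives the percentages and the resulting set sizes but not how fractional embryos were rounded.

```python
def split_counts(n, fractions):
    """Split n items into parts sized by the given fractions (hypothetical helper)."""
    counts = [round(n * f) for f in fractions]
    # Assumed rule: any rounding remainder goes to the first (training) part.
    counts[0] += n - sum(counts)
    return counts


total_embryos = 402                      # reported cohort size
blind_test = 150                         # reported blind-test size
modelling = total_embryos - blind_test   # embryos left for building the ANN

train, validation, test = split_counts(modelling, (0.70, 0.15, 0.15))
print(modelling, train, validation, test)
```

Under that rounding assumption, the 252 remaining embryos split into 176, 38 and 38, which matches the set sizes reported in the results.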

Main results and the role of chance:
Of the 402 blastocysts biopsied, 185 were euploid and 217 aneuploid. The AI algorithm was trained with 176 embryos (AUC for both euploid and aneuploid = 0.99), tested with 38 embryos (AUC for euploid = 0.62 and aneuploid = 0.61) and validated with 38 embryos (AUC for euploid = 0.82 and aneuploid = 0.83). For the blind test, 150 embryos were used (AUC for euploid = 0.70 and aneuploid = 0.71). The blind-test database was checked only after the AI algorithm had been tested.

Limitations, reasons for caution:
This AI algorithm was built from the database of a single IVF center. The absence of an extensive dataset for the blind test does not allow us to transfer this algorithm to clinical use at this moment.

Wider implications of the findings:
The use of artificial intelligence for embryo assessment is a promising tool in IVF laboratories. In our model, using only morphokinetics, a considerable predictive power to evaluate euploid (0.71 AUC) and aneuploid (0.70 AUC) embryos was achieved, indicating a potential use as a non-invasive approach for embryo ranking and selection.

Keywords:
Artificial Intelligence
Artificial Neural Networks
Time-lapse system
embryo ploidy
non-invasive technology
End-to-end deep learning system for recognition of euploid and aneuploid embryos using time-lapse videos
E. Payá Bosch1, L. Bori1, M.Á. Valera1, A. Colomer2, V. Naranjo2, M. Meseguer3.
1IVIRMA Global, Research Laboratory, Valencia, Spain.
2Instituto de Investigación e Innovación en Bioingeniería, CVBLab, Valencia, Spain.
3IVIRMA Global, IVF Laboratory, Valencia, Spain.

Study question:

Can an Artificial Intelligence (AI) system based on a deep learning algorithm analyze time-lapse videos for ploidy status prediction?

Summary answer:
Our spatiotemporal model can distinguish aneuploid embryos from euploid embryos using time-lapse videos from 10 to 115 hours post-insemination (hpi) with an accuracy of 71.28%.

What is known already:
As maternal age advances, the chance of aneuploidy in the oocyte also increases, and there is a high chance of early termination of pregnancy. Pre-implantation genetic testing for aneuploidy (PGT-A) is a reliable tool for detecting chromosomal status. However, PGT-A is an invasive technique whose protocol requires an embryo biopsy. Continuous monitoring of embryo development has led to AI models for the prediction of ploidy based on blastocyst images or morphokinetic parameters. Previous publications showed that euploid embryos reach blastulation earlier than non-euploid embryos. This is the first attempt to predict ploidy by analyzing continuous embryo development through captured time-lapse images.

Study design, size, duration:
The present study consisted of a single-center retrospective analysis for the evaluation of ploidy status with a non-invasive method. We developed our models based on a balanced dataset of 940 videos (from 10 to 115 hpi) extracted from the EmbryoScope time-lapse system. The videos were divided into 90% for training and validation and 10% for testing. The target class for the predictive models was the PGT-A result on the blastocyst, obtained by next-generation sequencing.

Participants/materials, setting, methods:
We used an end-to-end approach to develop an automated AI system capable of extracting features from images and classifying them considering temporal dependencies. First, a convolutional neural network (CNN) extracted the most relevant features from each frame. We used a deep architecture known as ResNet50. Second, a bidirectional long short-term memory (LSTM) layer received this information and analyzed temporal dependencies, obtaining a low-dimensional feature vector that defined each video. Finally, a multilayer perceptron classified them.

Main results and the role of chance:
Euploid and aneuploid precision was 69% and 75%, respectively. Euploid and aneuploid sensitivity was 79% and 64%, respectively. Euploid and aneuploid F1 score was 73% and 69%, respectively. The global accuracy of our spatiotemporal model in differentiating between the two classes was 71.28% on this dataset. Additionally, we trained models with external information such as maternal age (38.3±3.9 versus 39.1±3.1), but the performance did not improve. Note that we did not pre-select good-quality videos, so that the results reflect more reliably how an AI model for chromosomal status analysis would behave in clinical practice.
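
The relationship between these metrics can be illustrated with a small sketch. The per-class counts below are NOT reported in the abstract: they are a hypothetical confusion matrix, chosen on the assumption that the 10% test set contained 94 videos with 47 per class, that happens to reproduce the reported percentages after rounding.

```python
def class_metrics(tp, fp, fn):
    """Precision, sensitivity (recall) and F1 for one class."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return precision, sensitivity, f1


# Hypothetical counts (assumption, not from the abstract): 47 videos per class.
euploid_correct, euploid_missed = 37, 10        # euploid videos called euploid / aneuploid
aneuploid_correct, aneuploid_missed = 30, 17    # aneuploid videos called aneuploid / euploid

# A missed video of one class is a false positive for the other class.
euploid = class_metrics(tp=euploid_correct, fp=aneuploid_missed, fn=euploid_missed)
aneuploid = class_metrics(tp=aneuploid_correct, fp=euploid_missed, fn=aneuploid_missed)

n_videos = euploid_correct + euploid_missed + aneuploid_correct + aneuploid_missed
accuracy = (euploid_correct + aneuploid_correct) / n_videos

print([round(100 * m) for m in euploid])     # precision, sensitivity, F1 in %
print([round(100 * m) for m in aneuploid])
print(round(100 * accuracy, 2))
```

With these assumed counts, the rounded values agree with the figures quoted above (69/79/73% for euploid, 75/64/69% for aneuploid, 71.28% overall), which shows that the reported metrics are mutually consistent. Other count combinations may fit equally well.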

Limitations, reasons for caution:
The main limitations of this study are the single-center retrospective approach and the reduced size of the database. Future prospective research would therefore improve model performance. However, the preliminary results showed the high potential of the methods.

Wider implications of the findings:
Our results showed the potential for automating chromosomal status evaluation. Our findings point to a possible non-invasive method and to research into new, unknown key factors for determining ploidy. Further studies with a large number of time-lapse videos could result in a potential translation to clinical use.

Keywords:
deep learning
Artificial Intelligence
embryology
aneuploidy
PGT-A
Non-invasive AI image analysis unlocks the secrets of oocyte quality and reproductive potential by assigning ‘Magenta’ scores from 2-dimensional (2-D) microscope images
J. Fjeldstad1, N. Mercuri1, J. Meriano2, A. Krivoi1, A. Campbell3, R. Smith4, K. Berrisford4, C. Drezet3, R. Casper5, D. Nayot6.
1Future Fertility, Embryology, Toronto, Canada.
2TRIO Fertility, Embryology, Toronto, Canada.
3CARE Fertility UK, Embryology, Nottingham, United Kingdom.
4CARE Fertility UK, Embryology, Sheffield, United Kingdom.
5TRIO Fertility, Reproductive Endocrinology and Infertility, Toronto, Canada.
6Future Fertility, Medical Director, Toronto, Canada.
Study question:

Can an Artificial Intelligence (AI) software tool, utilizing 2-D image analysis of mature oocytes, prospectively correlate an oocyte score to utilizable blastocyst development?

Summary answer:
Oocyte Magenta scores show a statistically significant difference in blastocyst development between the highest (7.1-10) and lowest (1.0-4.0) scored oocytes [46.1% vs 26.6%; p< 0.005].

What is known already:
Unlike sperm (WHO 2010) or embryos (Gardner blastocyst grading), there is no validated visual oocyte scoring system used in clinical practice. Embryologists have been unsuccessful in correlating oocyte morphological features to reproductive potential. A valuable oocyte scoring system should be able to correlate higher scores with improved embryological outcomes. Although not possible by the human eye, a non-invasive oocyte AI assessment tool (Magenta) has accomplished this feat in retrospective studies. This study applied the Magenta network at two IVF clinics in real-time; representing one of the few prospective AI studies in our field, and the only one focusing on oocytes.

Study design, size, duration:
This prospective, multi-center study was conducted from September to November 2021 by TRIO Fertility (Toronto, Canada) and CARE Fertility (Sheffield and Nottingham, UK), utilizing the oocyte AI image analysis tool, Magenta. Magenta was created with a convolutional neural network trained on 16,373 oocyte images and corresponding outcomes. All IVF-ICSI patients without severe male factor (testicular or epididymal sources) who consented to participate were included. Results are based on 392 images of oocytes (46 patients).

Participants/materials, setting, methods:
Non-invasive light microscope images were taken of mature oocytes post-denudation, prior to ICSI, using image capture software. Images were uploaded and analyzed by Magenta, which scored each oocyte on a scale of 1-10, and the scores remained in a folder to which the IVF clinics were blinded. De-identified patient outcomes were collected to analyze the correlation of blastocyst development with Magenta scores. Oocytes were handled as per good laboratory practice, without extended periods outside the incubator or disruption to standard protocols.

Main results and the role of chance:
Oocyte images were analyzed by Magenta to score each oocyte on a scale of 1-10. In total, 392 oocytes from 46 patients were included: 280 oocytes from 26 patients at TRIO and 112 oocytes from 20 patients at CARE. The scoring spectrum was divided into 3 tiers (1.0-4.0: 188 oocytes; 4.1-7.0: 128 oocytes; 7.1-10: 76 oocytes). A utilizable blastocyst was defined as one with a Gardner grade of 2BB or greater on Day 5, or 3BB or greater by Day 6 of embryo development, and of adequate quality for transfer, freezing or PGT-A biopsy.

The blastocyst development (positivity) rate was 26.6% (1.0-4.0), 32.0% (4.1-7.0) and 46.1% (7.1-10), with mean Magenta scores of 2.4, 5.5 and 8.2, respectively. The lowest and highest tier of Magenta scores were accordingly found to have the lowest and highest blastocyst rates, which was statistically significant (p-value < 0.005) by a Two-Proportions Z-test.
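
The comparison between the lowest and highest tiers can be sketched as a pooled two-proportion z-test. The blastocyst counts are not given in the abstract; here they are back-calculated from the reported rates and tier sizes, which is an assumption, and the function name is hypothetical. The abstract does not say whether the test was one- or two-sided, so a two-sided p-value is shown.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Tier sizes are from the abstract; blastocyst counts are reconstructed (assumption).
high_n, low_n = 76, 188
high_blasts = round(0.461 * high_n)   # 46.1% of the 7.1-10 tier
low_blasts = round(0.266 * low_n)     # 26.6% of the 1.0-4.0 tier

z, p_value = two_proportion_z_test(high_blasts, high_n, low_blasts, low_n)
print(high_blasts, low_blasts, round(z, 2), round(p_value, 4))
```

With the reconstructed counts (35 of 76 versus 50 of 188), the statistic is a little above 3 and the two-sided p-value falls below 0.005, in line with the significance level reported above.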

Overall, oocytes that developed into a utilizable blastocyst had a higher mean Magenta score (5.0) than oocytes that did not develop into a utilizable blastocyst (4.3); (p-value <0.05) by a Welch’s Two Sample t-test.

Limitations, reasons for caution:
Sample size is currently limited for this ongoing, prospective study. Therefore, additional male factor (non-surgical sperm sources) and possible poor images have not been removed from the current analysis. Furthermore, AI neural network accuracy is restricted by the amount of data it is trained on.

Wider implications of the findings:
Magenta has enabled visual oocyte assessments that will provide IVF-ICSI patients with insights into their oocyte quality; resulting in counselling benefits and the ability to make more informed, personalized decisions regarding future treatment plans. AI will inevitably improve the IVF process and prospective validation studies are critical in its evolution.

Keywords:
oocyte
Blastocyst development
Artificial Intelligence
In vitro fertilization (IVF)
Artificial intelligence algorithms reach expert-level accuracy in automated grading of blastocyst morphology assessment based on static embryo images and Gardner criteria
F. Kromp1, B. Balaban2, V. Cottin3, I. Cuevas Saiz4, P. Fancsovits5, M. Fawzy6, N. Findikli7, B. Kovacic8, D. Ljiljak9, I. Martínez Rodero10, L. Parmegiani11, O. Shebl12, R. Wagner13, M. Xie14, T. Ebner12.
1Software Competence Center Hagenberg, Data science, Hagenberg, Austria.
2American Hospital of Istanbul, In vitro fertilization lab, Istanbul, Turkey.
3Bethesda Spital Basel, Assisted Reproduction Technology Unit, Basel, Switzerland.
4Hospital General Universitario de Valencia, In vitro fertilization lab, Valencia, Spain.
5Semmelweis University School of Medicine, Division of Assisted Reproduction, Budapest, Hungary.
6IbnSina and Banon IVF Centers, In vitro fertilization lab, Sohag, Egypt.
7Bahceci Fulya IVF Centre Istanbul, In vitro fertilization lab, Istanbul, Turkey.
8University Medical Centre Maribor, Department of Reproductive Medicine and Gynecological Endocrinology, Maribor, Slovenia.
9Sestre Milosrdnice University Hospital Center, Department of Gynecology and Obstetrics, Zagreb, Croatia.
10Universitat Autònoma de Barcelona, Laboratori de Fecundació In Vitro, Barcelona, Spain.
11GynePro Medical Centers, Embryology lab, Bologna, Italy.
12Kepler University Linz, Gynecology- Obstetrics and Gynecological Endocrinology, Linz, Austria.
13Software Competence Center Hagenberg, Services and solutions, Hagenberg, Austria.
14University Hospital Zurich, Department of Reproductive Endocrinology, Zurich, Switzerland.
Study question:

Can artificial intelligence (AI) algorithms reach expert-level accuracy in blastocyst morphology assessment according to Gardner criteria?

Summary answer:
The best-performing AI algorithm (Deit) exceeded mean human-level accuracy, measured against an embryologist majority vote, for all Gardner morphological criteria.

What is known already:
Routinely, morphological grading of blastocysts is performed visually according to Gardner criteria, which suggest expansion (EXP), quality of inner cell mass (ICM), and trophectoderm (TE) as key parameters to predict treatment outcome. Consequently, blastocyst scoring is prone to inter- and intra-observer variability, which may lead to inconsistencies in selecting blastocysts for transfer. AI-based algorithms may help to improve treatment outcome predictability, as has been suggested recently. In those studies, parameters such as blastocyst quality or stage were annotated by experts from static or time-lapse-derived blastocyst images to train AI algorithms, e.g. XCeption or YOLO, and compare them to human annotators.

Study design, size, duration:
This retrospective study involves 2,270 images from 837 patients collected over a period of four years in a university IVF clinic.

Participants/materials, setting, methods:
All images were annotated by one senior embryologist and divided into a training set and a balanced test set. Subsequently, eight embryologists labeled 300 test set images such that every single image was seen by at least four embryologists. Annotators diverging from the ensemble vote by more than one standard deviation were excluded (n=2) to set the ground truth labels. Finally, three AI architectures (XCeption, Swin, Deit) were trained and evaluated on that ground truth.
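
The ground-truth procedure described above can be sketched as follows. This is one possible reading, not the authors' code: it assumes that "diverging by more than one standard deviation" means an annotator's accuracy against the first majority vote lies more than one standard deviation from the mean accuracy, and it handles a single Gardner criterion. All names and the toy labels are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def majority_vote(labels_by_annotator):
    """Most common label per image across annotators (ties: first label seen)."""
    votes = {}
    for labels in labels_by_annotator.values():
        for image_id, label in labels.items():
            votes.setdefault(image_id, []).append(label)
    return {image_id: Counter(v).most_common(1)[0][0] for image_id, v in votes.items()}


def accuracy(labels, reference):
    """Share of an annotator's labels that agree with the reference vote."""
    return sum(labels[i] == reference[i] for i in labels) / len(labels)


def consensus_ground_truth(labels_by_annotator):
    """Vote, drop outlying annotators, then vote again with the remainder."""
    first_vote = majority_vote(labels_by_annotator)
    acc = {name: accuracy(labels, first_vote)
           for name, labels in labels_by_annotator.items()}
    values = list(acc.values())
    centre, spread = mean(values), pstdev(values)
    kept = {name: labels for name, labels in labels_by_annotator.items()
            if abs(acc[name] - centre) <= spread}
    excluded = sorted(set(labels_by_annotator) - set(kept))
    return majority_vote(kept), excluded


# Toy ICM grades for four images; annotator_D disagrees with everyone else.
agreeing = {"img1": "A", "img2": "B", "img3": "A", "img4": "C"}
toy_labels = {
    "annotator_A": dict(agreeing),
    "annotator_B": dict(agreeing),
    "annotator_C": dict(agreeing),
    "annotator_D": {"img1": "C", "img2": "C", "img3": "B", "img4": "A"},
}
ground_truth, excluded = consensus_ground_truth(toy_labels)
print(excluded, ground_truth)
```

In the toy data only the dissenting annotator is removed and the final vote equals the labels of the agreeing annotators. In the study, the same step would be repeated for EXP, ICM and TE.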

Main results and the role of chance:
Out of nine annotators, the labelling accuracy of two embryologists diverged from the consensus vote by more than one standard deviation for at least one of the three Gardner criteria. The consensus vote was built from the remaining seven annotators (mean accuracy EXP 0.81, ICM 0.70, TE 0.67). The Swin architecture outperformed the mean expert accuracy for all three criteria (EXP 0.82, ICM 0.76, TE 0.68), while the Deit and the XCeption architectures outperformed the mean expert accuracy in ICM accuracy (Deit 0.72, XCeption 0.73), and performed equally or worse in EXP and TE accuracy (Deit EXP 0.77, ICM 0.73; XCeption EXP 0.77, TE 0.66). When compared to a recent study conducted on time-lapse imaging data using AI algorithms, all our models outperform the ICM accuracy and achieve comparable TE accuracy.

To minimize the role of chance in calculating the models’ prediction accuracies, the SWA-Gaussian (SWAG) algorithm was used. SWAG is a method to reflect and calibrate uncertainty representation in Bayesian deep learning. It is based on modelling a Gaussian distribution for each network weight and applying it as a posterior over all neural network weights to perform Bayesian model averaging.

Limitations, reasons for caution:
To reflect a real IVF lab scenario, embryologists of different origins and levels of experience were involved and no scoring training was offered to the participants. These facts could have potentially negatively affected the degree of consensus, although we excluded two annotators diverging from the mean labeling accuracy.

Wider implications of the findings:
In the past, AI algorithms proved to reliably differentiate between good and bad prognosis blastocysts but not necessarily between blastocysts of similar quality. Further AI-supported differentiation on the basis of expansion and cell lineages will facilitate the ranking of blastocysts and would bring automated scoring closer to clinical application.

Keywords:
Artificial Intelligence
Bayesian deep learning
blastocyst morphology
inner cell mass
trophectoderm
Machine and Deep learning models to classify Comet assay tests for sperm DNA fragmentation evaluation
L. Serrano Berenguer1, S. Hincapié Monsalve2, S. Lara Cerrillo1, C. Rosado Iglesias1, E. Vegas Lozano3, F. Reverter Comes3, C. Ventura2, A. García Peiró1.
1CIMAB Male Infertility Centre, Research & Development, Sant Quirze del Vallés, Spain.
2Universitat Oberta de Catalunya, Departamento de Estudios de Informática- Multimedia y Telecomunicación, Barcelona, Spain.
3Universitat de Barcelona, Departamento de Genética- Microbiología y Estadística. Sección Estadística, Barcelona, Spain.
Study question:

How well do Machine and Deep learning classification models perform in sperm DNA fragmentation testing?

Summary answer:
Machine and Deep learning models achieved an accuracy over 90% for sperm DNA fragmentation testing.

What is known already:
Artificial Intelligence (AI) is an up-to-date tool that could improve current diagnostics in reproductive medicine. Paternal genome integrity is essential after fertilization, and high levels of DNA damage have been associated with male infertility. Sperm DNA fragmentation (SDF) is a well-known male fertility biomarker. Among the different diagnostic tests analysing SDF, the Comet assay offers high specificity and sensitivity. Human expert analysis has been shown to be highly precise but can be influenced by external factors, sample variability or tiredness. Consequently, computer-assisted diagnostics based on AI models may help to improve Comet assay analysis.

Study design, size, duration:
From February to June 2021, alkaline and neutral Comet assay tests were performed on semen samples from a heterogeneous group of men. The Comet Assay IV™ software was used to record quantitative data from spermatozoa: 500 normal and 500 altered spermatozoa were assessed after each of the alkaline and neutral Comet assays. An image of each spermatozoon was also taken. In total, 2000 spermatozoa were analysed. Finally, a first validation step was performed using new data from 20000 spermatozoa.

Participants/materials, setting, methods:
For each spermatozoon, ten quantitative parameters and a grayscale image were obtained. A Machine learning predictive model using the Random Forest (RF) algorithm was trained with the quantitative parameters. Moreover, a Deep learning Convolutional Neural Network (CNN) algorithm was trained with cell images. Both models were trained on 67% of data and tested using the remaining 33%.

Main results and the role of chance:
Predictive models based on RF and CNN showed high performance in the automatic classification of normal and altered cells. The accuracy achieved by the RF models was 95.51% for the alkaline Comet assay and 92.64% for the neutral Comet assay. For the CNN models, the accuracy was 96.71% for the alkaline Comet assay and 93.19% for the neutral Comet assay. CNN models showed better accuracy on both assays.

Regarding the quantitative parameters considered in the RF model, the most important parameters for classification are the following ones: Mean_Grey_Level and Total_Intensity in the alkaline Comet assay, and Tail_Migration and Tail_Length in the neutral Comet assay.

Finally, 20000 spermatozoa from 100 semen samples were analysed to compare the results of the AI models with the annotations of a human expert. The Kruskal-Wallis test did not show significant differences for the alkaline and the neutral Comet assays (p>0.05 for all cases). Paired comparisons using the Mann-Whitney U test did not show statistical differences (p>0.05 for all cases). According to these results, AI models may reproduce a human analysis.

To facilitate laboratory use of the resulting models, a web application was developed to process new samples.

Limitations, reasons for caution:
Diagnostic assays based on continuous variables include a threshold value to separate normal and altered populations. The final result of some samples could be mistaken when a high number of spermatozoa present a fragmentation index near the cut-off value.

Wider implications of the findings:
Diagnosis of male infertility through the analysis of SDF can be achieved with AI predictive models. This technology might help in the standardization of SDF testing between laboratories.

Keywords:
Artificial Intelligence
Sperm DNA Fragmentation
Comet assay
machine learning
Male Infertility
A novel Artificial Intelligence Microscopy: Mojo AISA, the new way to perform semen analysis
A. Parrella1, N. Rubio Riquelme1, L.A. van Os Galdos1, I. Vilella Amorós1, M. Jiménez Gadea1, J. Aizpurua2.
1IVF Spain, IVF laboratory, Alicante, Spain.
2IVF Spain, Gynecology and IVF laboratory, Alicante, Spain.
Study question:

Can Mojo AISA, an Artificial Intelligence microscopy system, deliver accurate and reliable semen analysis results for the daily routine?

Summary answer:
Mojo AISA guarantees precise semen analysis results, improving objectivity and minimizing human error. Moreover, embryologists can save 50% of the time per procedure.

What is known already:
Semen analysis is currently performed by manual microscopy and/or computer-assisted semen analysis. Most automated sperm analyzers rely on classic image-processing algorithms that distinguish spermatozoa by size and brightness. However, it has been demonstrated that these algorithms cannot reliably discriminate spermatozoa heads from other cells of similar size, leading to incorrect results. To overcome these limits, a new Artificial Intelligence Semen Analysis system, Mojo AISA, has been developed to assess concentration and motility. Mojo AISA is based on neural network classification, a series of embedded algorithms.

Study design, size, duration:
Over the last nine months, semen parameters of 64 men were assessed simultaneously by the manual microscopy method and by Mojo AISA. The manual semen analysis was performed by two certified andrologists following WHO 5th Edition guidelines. Concentration and motility parameters were assessed and compared between the two methods. For motility, we compared the following 3 categories: Progressive (PR), Non-Progressive (NP) and combined motility (PR+NP). Samples with normal and abnormal semen parameters were included.

Participants/materials, setting, methods:
Semen samples were allowed to liquefy for at least 15 min at 37 °C. For the manual method, 10 μL of raw sample was loaded onto a Makler chamber, and for Mojo AISA, two 10 μL drops of raw sample were smeared side by side on a glass slide. Mojo AISA delivered semen analysis results in 4 minutes per sample. The statistical analysis was carried out with SPSS 14.0 statistical software.

Main results and the role of chance:
Semen analysis of 64 semen samples from 62 men (40±10 years old) was performed simultaneously with the manual method and Mojo AISA, following WHO 5th Edition (2010) guidelines. The mean and standard deviation of semen concentration were 52.7±46 × 10^6/ml with the manual method and 50.6±43.2 × 10^6/ml with Mojo AISA (P=NS). No significant difference was found when the combined motility (PR+NP) was evaluated: the mean and standard deviation were 53.5±20% with the manual method and 49.1±22.1% with Mojo AISA (P=NS). Similar results were seen for progressive motility, with a mean and standard deviation of 38.5±19% and 34.1±20%, respectively (P=NS). Finally, the assessment of non-progressive motility showed a mean and standard deviation of 12.3±12% with the manual method and 13.9±9% with Mojo AISA, with no statistically significant difference.

Limitations, reasons for caution:
The slide preparation protocol should be followed properly, since air bubbles can interfere with Mojo's semen evaluation and produce misleading results. Mojo AISA has difficulty assessing samples with extremely low concentration, and further evaluation is needed for this type of sample.

Wider implications of the findings:
These findings show that the semen analysis results of Mojo AISA and those of the manual method are comparable. Mojo AISA can guarantee more precise semen analysis results, with lower inter-laboratory variability, in 50% less time.



Keywords:
semen analysis
Artificial Intelligence
Mojo AISA
Network Classification
ICSI Semi Automation powered by AI: Gateway for Oocyte Quality Assessment real time based on biomechanics during ICSI
S. Payeli1.
1IVF Precisions Pvt Ltd, Innovations, Bangalore, India.
Study question:
Can embryo selection be based on oolemma elasticity and cytoplasm viscosity measured with semi-automated CL-ICSI technology assisted by Artificial Intelligence?

Summary answer:
The CL-ICSI syringe/technology is capable of converting the embryologist's visual and tactile impression of each ICSI procedure into a measurable form, allowing objective assessment in real time.

What is known already:
Over the last four decades, understanding and assessing oocyte quality (based on biomechanics) has become one of the needs for defining embryo formation and selection. In the early days of ICSI, embryologists classified the oocyte pool into fragile, normal or hyperelastic groups based on visual and indentation methods. Later, a variety of advanced techniques were implemented, such as ultra-fast imaging, atomic force sensing, micro tactile sensors, microfluidics and zone modulus measuring methods. These techniques were implemented in oocyte donation programs and limited to research, owing to the complexity and technical difficulty of implementing them.

Study design, size, duration:
Based on our observations, we introduce our innovative technology (CL-ICSI), a semi-automation of the ICSI procedure that is capable of measuring oolemma elasticity and cytoplasmic viscosity. We show its performance using a limited number of discarded oocytes as a proof of concept. We believe that such technology could become an asset and gateway for better understanding oocyte quality, and may contribute to embryo selection based on oocyte biomechanics/quality.

Participants/materials, setting, methods:
The current technology (CL-ICSI) is based on ICSI syringes designed to improve objectivity during the ICSI procedure between and within users. CL-ICSI technology comprises a semi-automated syringe equipped with microcontrollers that uses force-sensing and pressure-mapping techniques to convert oocyte biomechanics data into measurable readouts on a display unit. These readouts are then fed to AI algorithms that calculate an oocyte quality score as output.

Main results and the role of chance:
CL-ICSI semi-automation technology is designed to minimise embryologist subjectivity during the ICSI procedure. Existing syringes work on a manual rotary mechanism that does not provide start and stop points for the pressure applied to the oocyte for puncturing, which is generally controlled by the embryologist's skill and experience. The current technology comes with a click button designed to be loaded with potential energy proportional to the pressure/force required to puncture the oolemma for subsequent sperm deposition. With classical syringes, an opposite rotation is applied after puncturing to withdraw the negative pressure or the flow of cytoplasmic content into the needle. Because precise positioning is lacking, the system becomes subjective. For this reason, we introduced a mechanical potential energy build-up, in which oolemma puncture occurs with a button click and the negative pressure is compensated by releasing the button. This mechanism provides objectivity and requires less than half the time for oocyte intervention during ICSI compared with conventional rotary syringes. Moreover, this technology works on pneumatics and is adaptable to all existing micromanipulators. Use of such technologies may help improve objectivity during ICSI and, using AI, help in understanding oocyte quality in addition to visual morphology assessment.

Limitations, reasons for caution:
CL-ICSI technology may be considered a next-generation ICSI system. Applying this technology in clinical settings to collect data from multiple centers and different ethnicities might provide information that brings us a step closer to understanding oocyte quality and its role in embryo formation using artificial intelligence.

Wider implications of the findings:
Embryo selection is one of the key events in single embryo transfer for healthy pregnancies. Morphology and morphokinetic data on oocytes and embryos provide valuable information but are of limited use if no embryos are formed. The current technology (CL-ICSI) may provide new avenues for better understanding of embryo prediction and selection.

Keywords:
ICSI semi automation
embryo selection
Single Embryo Transfer (SET)
Oocyte quality and biomechanics
Artificial Intelligence
Blastocyst morphometry and morphology predict ploidy
B. Shapiro1, F. Garner1, L. Kaye2, M. Rasouli3, K. Verma3, C. Bedient2.
1Fertility Center of Las Vegas, Reproductive Endocrinology, Las Vegas, U.S.A..
2Fertility Center of Las Vegas, Reproductive Endocrinology, Las Vegas, U.S.A..
3University of Nevada School of Medicine, Ob/gyn, Las Vegas, U.S.A..
Study question:

Are objective measurements and subjective assessments of blastocysts predictive of aneuploidy?

Summary answer:
Aneuploidy was predicted equally well by lower trophectoderm cell count (objectively measured morphometry) or by a lower subjective trophectoderm grade.

What is known already:
Blastocyst morphological grade has been reported to be moderately predictive of embryo ploidy. However, due to its subjective nature, morphological grade may be significantly more difficult to code into artificial intelligence algorithms when compared to objective measurements.

Study design, size, duration:
This retrospective cohort study included all 1409 blastocysts that were subject to pre-implantation genetic testing (PGT) in a 30-month study period.

Participants/materials, setting, methods:
Per clinic routine, embryos were cultured to the blastocyst stage following conventional ovarian stimulation, oocyte retrieval, and intracytoplasmic sperm injection. If the patients opted for PGT, a laser was used to open the zona at the cleavage stage, trophectoderm biopsies were collected at the blastocyst stage, and biopsies were analyzed by next-generation sequencing. Inner cell mass sizes were measured by ocular micrometer and trophectoderm cells were counted in one plane of focus.

Main results and the role of chance:
The mean patient age (including oocyte donors) was 31.9±6.4 years and the mean number of collected oocytes was 23.0±12.9. In multivariable logistic regression, greater patient age was the best available predictor of blastocyst aneuploidy (P<0.0001). With age in the model, the observation of fewer trophectoderm cells (P=0.0019) was also predictive of aneuploidy. Alternatively, again with age in the model, the absence of an A-grade trophectoderm (P=0.0007) was predictive of aneuploidy. Neither the objectively measured inner cell mass size nor the subjective inner cell mass grade was a significant predictor of ploidy. Overall, objectively measured blastocyst morphometry was about as predictive of ploidy as subjective blastocyst grading, and the area under the receiver operating characteristic curve was 0.62 for each model.

Limitations, reasons for caution:
This study was retrospective, allowing the possibility of selection bias among patients and among embryos chosen for biopsy.

Wider implications of the findings:
Artificial intelligence algorithms assessing embryos might benefit from the similar performance of subjective grading and objective measurements, because it is much easier to code objective measurements into the algorithms.

Keywords:
aneuploidy
blastocyst
trophectoderm biopsy
Artificial Intelligence
embryo grading
Sensitivity analysis of an embryo grading artificial intelligence model to different focal planes
J.H. Cho1, C.D. Brumar1, P. Maeder-York1, O. Barash2, J. Malmsten3, N. Zaninovic3, D. Sakkas4, K. Miller5, M. Levy6, M.D. VerMilyea7, K. Loewke1.
1Alife Health, Alife Health, Cambridge, U.S.A.
2Reproductive Science Center, Reproductive Science Center, San Ramon, U.S.A.
3Weill Cornell Medicine, Weill Cornell Medicine, New York, U.S.A.
4Boston IVF, Boston IVF, Waltham, U.S.A.
5IVF Florida, IVF Florida, Margate, U.S.A.
6Shady Grove Fertility, Shady Grove Fertility, Rockville, U.S.A.
7Ovation Fertility, Ovation Fertility, Austin, U.S.A.
Study question:

What is the sensitivity of an embryo-grading artificial intelligence (AI) model to different focal planes and how do we obtain consistent scores across focal planes?

Summary answer:
Test-time augmentation and ensemble modeling reduce sensitivity of the AI model to different focal planes while maintaining performance.

What is known already:
When prioritizing embryos for transfer, embryologists assess the 3D morphological features under a microscope, by zooming up and down, and assign a score that reflects the embryo quality. In comparison, some AI-based embryo grading models typically take one 2D focal plane of an embryo and output a score based on that focal plane. AI models such as convolutional neural networks (CNNs) are known to be sensitive to perturbations in their input. In order to reduce sensitivity and generalization error and thus improve predictive performance, techniques such as ensemble learning and test-time augmentation can be used.

Study design, size, duration:
Historical, de-identified images of blastocyst-stage embryos were collected from 11 IVF clinics in the United States for cycles between 2015 and 2020. 5,100 blastocysts were matched to pregnancy outcomes as determined by fetal heartbeat. 2,900 blastocysts were matched to aneuploid PGT-A results and added to the negative training group to reduce selection bias. Data were split 70% for training and 30% for testing. A set of 10 embryos was used for the focal plane sensitivity analysis.

Participants/materials, setting, methods:
A single model (ResNet18), a three-model (ResNet18), and a six-model (ResNet18 and EfficientNet-b1) ensemble with and without test-time augmentation were trained to rank embryos according to their likelihood of reaching clinical pregnancy. Test-time augmentation involved taking the average scores from 4 flipped and rotated copies of the original input image. Manual grades were mapped to numeric scores for comparison. The AUC was used to evaluate the ability of the models to rank embryos.
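
The averaging described here can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the abstract does not say which four flips and rotations were used, so the particular set below is an assumption, and the two "models" are toy stand-in functions rather than trained networks.

```python
from statistics import mean

def hflip(img):
    """Mirror a 2D image (list of rows) left to right."""
    return [row[::-1] for row in img]

def vflip(img):
    """Mirror a 2D image top to bottom."""
    return img[::-1]

def rot90(img):
    """Rotate a 2D image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def tta_score(model, img):
    """Average one model's score over four transformed copies of the image.

    The choice of transforms (two flips, two rotations) is an assumption.
    """
    copies = [hflip(img), vflip(img), rot90(img), rot90(rot90(img))]
    return mean(model(c) for c in copies)

def ensemble_tta_score(models, img):
    """Average the test-time-augmented scores of several models."""
    return mean(tta_score(m, img) for m in models)

# Toy stand-ins for trained networks: each reads a single corner pixel,
# which makes them deliberately sensitive to image orientation.
def top_left_model(img):
    return img[0][0]

def top_right_model(img):
    return img[0][-1]

image = [[0.2, 0.4],
         [0.6, 0.8]]

single = tta_score(top_left_model, image)
combined = ensemble_tta_score([top_left_model, top_right_model], image)
```

Averaging over transformed copies and over models smooths out the dependence of any one score on orientation, which is the same mechanism the study relies on to reduce dependence on focal plane.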

Main results and the role of chance:
Focal plane sensitivity was calculated as the range, or difference between the maximum and minimum score, for an embryo at different focal planes. Between 12 and 100 focal plane images were available for each of the 10 embryos. On average, the focal plane range was 0.26 for the single model, 0.22 for the single model with test-time augmentation, 0.14 for a 3-model ensemble with test-time augmentation, and 0.11 for a 6-model ensemble with test-time augmentation. Test-time augmentation on the single model reduced the range by 17%; whereas ensemble modeling with test-time augmentation reduced the range by 46% for the 3-model ensemble and 60% for the 6-model ensemble. Reduction in range did not compromise performance. The AUC for the test set for all embryos was 0.73 for the single model, 0.74 for the single model with test-time augmentation, 0.75 for the three-model ensemble with test-time augmentation and 0.74 for the six-model ensemble with test-time augmentation. All models outperformed manual grading, which was estimated to have an AUC of 0.67 for all embryos.
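
The range metric and the relative reductions can be recomputed from the rounded means quoted above. The sketch below is illustrative only; the function names and the example per-plane scores are assumptions, while the four mean ranges are taken from the abstract.

```python
def focal_plane_range(scores):
    """Sensitivity of one embryo: max score minus min score across focal planes."""
    return max(scores) - min(scores)

def relative_reduction(baseline, value):
    """Fractional reduction of `value` relative to `baseline`."""
    return (baseline - value) / baseline

# Mean ranges as reported in the abstract (rounded to two decimals).
reported_mean_ranges = {
    "single": 0.26,
    "single + TTA": 0.22,
    "3-model ensemble + TTA": 0.14,
    "6-model ensemble + TTA": 0.11,
}

baseline = reported_mean_ranges["single"]
reductions = {
    name: relative_reduction(baseline, value)
    for name, value in reported_mean_ranges.items()
    if name != "single"
}

# Hypothetical scores for one embryo imaged at three focal planes.
example_range = focal_plane_range([0.41, 0.55, 0.67])
```

Recomputed from the rounded means, the reductions come out at roughly 15%, 46% and 58%. The abstract reports 17%, 46% and 60%; the small differences are presumably due to the published means being rounded.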

Limitations, reasons for caution:
Our analysis on focal plane sensitivity was limited to a small sample size of 10 embryos, so more samples will be needed to confirm our findings.

Wider implications of the findings:
Test-time augmentation and ensemble techniques can be used to reduce sensitivity while maintaining model performance. By reducing sensitivity to different focal planes, an AI model can produce one reliable score for a single embryo as is done currently in practice with manual grading.

Keywords:
Artificial Intelligence
embryo grading
image analysis
machine learning
blastocyst
Large-scale simulation of pregnancy rate improvements using an AI model for embryo ranking
J.H. Cho1, A. Ehlers1, C. Brumar1, P. Maeder-York1, O. Barash2, J. Malmsten3, Z. Nikica3, D. Sakkas4, M. Levy5, K. Miller6, M.D. VerMilyea7, K. Loewke1.
1Alife Health, Alife Health, Cambridge, U.S.A.
2Reproductive Science Center, Reproductive Science Center, San Ramon, U.S.A.
3Weill Cornell Medicine, Weill Cornell Medicine, New York, U.S.A.
4Boston IVF, Boston IVF, Waltham, U.S.A.
5Shady Grove Fertility, Shady Grove Fertility, Rockville, U.S.A.
6IVF Florida, IVF Florida, Margate, U.S.A.
7Ovation Fertility, Ovation Fertility, Austin, U.S.A.
Study question:

What is the expected improvement in pregnancy rates using an artificial intelligence (AI) model for embryo ranking compared to manual grading systems?

Summary answer:
A large-scale retrospective bootstrapped analysis shows that use of an AI model for embryo ranking can improve pregnancy rates compared to manual grading.

What is known already:
Embryo evaluation is one of the most important steps of an in vitro fertilization (IVF) procedure. Recently, artificial intelligence (AI) models have been developed to automate embryo analysis and reduce the subjectivity of manual grading. While models are often evaluated in terms of classification accuracy or area under the curve (AUC), a more relevant metric is improvement in pregnancy rates. Here we evaluate a previously developed model using a large-scale bootstrapped analysis of virtual patient pregnancy rates and compare its performance to manual grading.

Study design, size, duration:
Historical, de-identified images of transferred blastocyst-stage embryos and manual morphology grades were collected from 11 IVF clinics in the United States for cycles started between 2015-2020. Images were captured on day 5, 6, or 7 using the inverted microscope prior to biopsy or freeze. A total of 1,776 test set images from 3-fold cross validation were used for this analysis.

Participants/materials, setting, methods:
Embryos were matched by age, PGT status, and race to create 16 distinct categories. Virtual patient panels were created within each category using a random selection of 3-5 embryos. Embryos were re-used across different panels, but each individual panel was unique. Three different manual ranking systems were created incorporating the morphology grade and day of image capture. The AI and one randomly chosen manual ranking system independently selected a top embryo for each panel.

Main results and the role of chance:
On average, 105,263 unique virtual patient panels were constructed from the 1,776 embryos. Within these panels, the AI model and manual ranking system selected different top embryos from each other in 27,860 cases, or 26% of the time. The average pregnancy rate of the top-ranked embryo using manual grading was 53.1%, and the average pregnancy rate of the top-ranked embryo using the AI model was 59.4%. The average pregnancy rate improvement from using the AI model was 6.3%, with a standard deviation of 0.2% measured across 10 repetitions of the simulation with different random seeds.
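
The virtual-panel procedure can be sketched as a small simulation. This is a simplified illustration under stated assumptions, not the study's code: the embryos are entirely synthetic, matching by age, PGT status and race is omitted (a single category is used), and the field names `ai`, `manual` and `pregnant` are hypothetical.

```python
import random
from math import comb

def make_synthetic_embryos(n, seed=0):
    """Create n fake embryos; every value here is synthetic, not study data."""
    rng = random.Random(seed)
    embryos = []
    for _ in range(n):
        quality = rng.random()  # hidden chance of pregnancy
        embryos.append({
            "ai": quality + rng.gauss(0, 0.2),      # less noisy ranking score
            "manual": quality + rng.gauss(0, 0.4),  # noisier ranking score
            "pregnant": rng.random() < quality,
        })
    return embryos

def simulate_panels(embryos, n_panels, seed=0):
    """Draw unique panels of 3 to 5 embryos and compare the two top picks."""
    n = len(embryos)
    if n < 5:
        raise ValueError("need at least 5 embryos to draw panels of 3 to 5")
    if n_panels > sum(comb(n, k) for k in (3, 4, 5)):
        raise ValueError("not enough distinct panels available")
    rng = random.Random(seed)
    seen = set()
    ai_preg = manual_preg = differ = 0
    while len(seen) < n_panels:
        # Embryos may recur across panels, but each panel itself is unique.
        panel = tuple(sorted(rng.sample(range(n), rng.randint(3, 5))))
        if panel in seen:
            continue
        seen.add(panel)
        ai_top = max(panel, key=lambda i: embryos[i]["ai"])
        manual_top = max(panel, key=lambda i: embryos[i]["manual"])
        ai_preg += embryos[ai_top]["pregnant"]
        manual_preg += embryos[manual_top]["pregnant"]
        differ += ai_top != manual_top
    return {
        "ai_rate": ai_preg / n_panels,
        "manual_rate": manual_preg / n_panels,
        "disagreement_rate": differ / n_panels,
    }

embryos = make_synthetic_embryos(60, seed=1)
result = simulate_panels(embryos, n_panels=300, seed=2)
```

Repeating `simulate_panels` with different seeds and taking the mean and standard deviation of `ai_rate - manual_rate` mirrors the 10 repetitions described in the abstract.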

Limitations, reasons for caution:
The primary limitation is the retrospective nature of this study. Also, this bootstrapped panel study relied on recorded manual morphology grades at the time of embryo transfer or freeze rather than on the actual selection of the top embryo in each panel by an embryologist.

Wider implications of the findings:
Our results demonstrate the potential of using an AI model for embryo ranking in terms of improved pregnancy rates. Results from this large-scale bootstrapped retrospective analysis will help inform the design of future clinical validation studies.

Keywords:
machine learning
Artificial Intelligence
embryo grading
blastocyst
image analysis
Aneuploid embryos as a proposal for improving Artificial Intelligence performance
E. Güell Penas1, A. Vives Perelló2, M. Esquerrà Parés2, M. Mladenova Koleva2.
1CONSULTFIV & Conceptum, IVF Lab, Valls, Spain.
2Conceptum, IVF Lab, Reus, Spain.
Study question:

Could we improve the performance of Machine Learning algorithms by using aneuploid embryos instead of non-implanted embryos as the contrary reference to Live-Birth embryos?

Summary answer:
Machine Learning (ML) algorithm results improved when aneuploid embryos were taken into consideration.

What is known already:
Artificial Intelligence (AI) techniques have been focusing on Deep Learning (DL) image recognition, although several Machine Learning algorithms are also suitable for morphokinetic analysis. Traditional Live-Birth prediction labelling may have assigned a negative outcome to a certain number of potentially viable embryos that failed for reasons unrelated to the embryo, such as endometrial receptivity. Such mislabelling could lead to distorted predictions and decreased AI performance. Aneuploid diagnosis could be useful for prediction labelling, although some aneuploid embryos could still be viable as a result of mosaicism or non-concordance misdiagnosis.

Study design, size, duration:
Retrospective analysis of morphokinetic data and clinical outcomes of 343 embryos in a single IVF unit between 2014 and 2021.

Participants/materials, setting, methods:
Two datasets were prepared and used for training and testing (V-Fold Cross-Validation) by three ML algorithms: eXtreme Gradient Boosting (XGB), k-Nearest Neighbor (kNN) and Random Forest (rF). Both datasets shared 117 Live-Birth Embryos. “Dataset A” included 123 non-implanted embryos while “Dataset B” comprised 103 aneuploid embryos which were kept vitrified at the blastocyst stage. The prediction power for each dataset model was measured using the area under the curve (AUC) and its confusion matrix’s metrics.

Main results and the role of chance:
All metrics for each Machine Learning algorithm analysed were higher in the dataset including aneuploid embryos. The AUC for “Dataset A” did not reach the value of 0.6 (XGB = 0.540, kNN = 0.500, rF = 0.590) while AUC values for “Dataset B” surpassed 0.7 (XGB = 0.722, kNN = 0.718, rF = 0.740). This indicates that the Machine Learning algorithms detected different morphokinetic patterns in the two datasets. The algorithms’ poorer performance with non-implanted embryos may be due to an increased Label Noise effect, suggesting that including aneuploid embryos could be more appropriate when building predictive algorithms for embryo viability.
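
To make the AUC values easier to interpret, here is a minimal sketch of the metric itself, written as the probability that a randomly chosen positive case is scored above a randomly chosen negative one. The function and the toy labels and scores are illustrative assumptions and do not come from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: 1 for the positive class (e.g. live birth), 0 for the negative class.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: three of the four positive/negative pairs are ordered correctly.
informative = auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])

# A model that gives every embryo the same score cannot rank at all.
uninformative = auc([1, 1, 0, 0], [0.5, 0.5, 0.5, 0.5])
```

An AUC of 0.5 corresponds to ranking no better than chance, which puts the kNN result of 0.500 on “Dataset A” in context.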

Limitations, reasons for caution:
Although the sample size was larger than the minimum recommended for training Machine Learning algorithms, this study should be replicated with a higher number of embryos. Mosaic embryos should be included in further and deeper analysis.

Wider implications of the findings:
This study was the first part of a global project based on AI and time-lapse data. Implantation and Live-Birth Rate will be calculated for each predicted level. Further studies are needed to confirm if other AI techniques such as DL could also improve their performance.

Keywords:
morphokinetics
machine learning
Artificial Intelligence
mislabelling
noisy labels
Association between iDAScore v1.0, senior embryologists’ grading and euploidy in 546 blastocysts obtained during 189 PGT-A cycles
V. Casciani1, D. Cimadomo1, S. Trio2, V. Chiappetta1, F. Innocenti1, B. Iussig3, E. Alviggi4, S. Canosa5, N. Barnocchi6, R. Maggiulli1, J. Berntsen7, M.F. Kragh7, M. Larman8, F.M. Ubaldi1, L. Rienzi1.
1Clinica Valle Giulia, GeneraLife IVF, Rome, Italy.
2GeneraLife Milan, GeneraLife IVF, Milan, Italy.
3Genera Veneto, GeneraLife IVF, Marostica, Italy.
4Clinica Ruesch, GeneraLife IVF, Naples, Italy.
5Livet, GeneraLife IVF, Turin, Italy.
6Genera Umbria, GeneraLife IVF, Umbertide, Italy.
7Vitrolife, Vitrolife A/S, Aarhus, Denmark.
8Vitrolife, Vitrolife Sweden AB, Göteborg, Sweden.
Study question:

Is (intelligent data analysis) iDAScore v1.0 associated with euploidy at the blastocyst stage?

Summary answer:
iDAScore v1.0 significantly correlated with euploidy (maternal age-adjusted OR:1.3 and AUC:0.72). Euploid blastocysts were ranked highest in ca.70% of the cohorts with both diagnoses.

What is known already:
With machine learning and artificial intelligence (AI) implementation in IVF, several studies have been published mostly aimed at providing standardized and reproducible tools for gamete/embryo assessment and selection. Several of the proposed models might not be generally applicable due to their development on only a single center, small sample size and poor representation of the numerous clinical scenarios. Furthermore, the evidence has been rarely confirmed prospectively and/or in multicenter studies. Lately, the EmbryoScope+ has incorporated the iDAScore v1.0. This algorithm scores the chance of embryo implantation based on the video of blastocyst development and with no need for timing annotations.

Study design, size, duration:
Interim analysis of a prospective study. Between April-December 2021, 189 preimplantation-genetic-testing (PGT) cycles (maternal age:38.4±4yr) with ≥1 blastocyst (N=546 blastocysts, mean±SD:2.9±1.8, range:1-13) were included. We aimed at blindly analyzing the correlation between iDAScore v1.0 and (i) blastocyst quality estimated by senior embryologists, (ii) day of blastocyst full-expansion, (iii) chromosomal constitution diagnosed by NGS on a trophectoderm biopsy, (iv) the blastocyst to prioritize for transfer within cohorts with ≥2 blastocysts.

Participants/materials, setting, methods:
Undisturbed culture was conducted in the EmbryoScope+. Assisted hatching was not performed and only fully-expanded blastocysts were biopsied. Morphology was assessed by 2 senior embryologists based on Gardner criteria. Average iDAScores were reported for the following groups: (i)excellent (AA)/good (AB,BA)/average (BB,AC,CA)/poor-quality (CC,BC,CB) blastocysts, (ii)day5/6/7 blastocysts, (iii)euploid/aneuploid/complex aneuploid blastocysts. Lastly, we reported how often the highest iDAScore corresponded to the highest ranked morphology (N=143 cycles with ≥2 blastocysts) and/or euploid blastocysts (N=79 cycles with both diagnoses).

Main results and the role of chance:
In the study period, 546 blastocysts (iDAScore: 6.9±2.0, 2-9.7) were biopsied. The iDAScore was significantly different (Kruskal-Wallis<0.01) across blastocysts graded excellent (N=256,46.9%; 8.1±1.3, 2.5-9.7), good (N=97,17.7%; 6.9±1.6, 2.3-9.5), average (N=75,13.9%; 5.8±1.4, 2.9-8.7) and poor (N=118,21.5%; 4.8±1.6, 2-8.8). A significant difference (Kruskal-Wallis<0.01) was also found for the day of full-expansion (day5: N=184,33.9%, 8.8±0.8, 4.3-9.7; day6: N=324,59.1%, 6.0±1.6, 2.2-9.1; day7: N=38,6.9%, 4.6±1.6, 2-7.8). Euploid blastocysts (N=178,32.6%) had a significantly higher (Kruskal-Wallis<0.01) iDAScore (7.5±1.7, 2.4-9.6) than both simple (N=209,38.3%, 6.7±2.1, 2.1-9.7) and complex aneuploid blastocysts (N=159,29.1%, 6.3±2.0, 2-9.4). The logistic regression adjusted for maternal age highlighted a multivariate-OR 1.3, 95%CI 1.18-1.45, adjusted-p<0.01 for the association between iDAScore v1.0 and euploidy. The Receiver-Operating-Characteristic (ROC) curves outlined similar performance in predicting euploidy among the models encompassing iDAScore v1.0 adjusted for maternal age (AUC: 0.72, 95%CI 0.67-0.76, p<0.01) or blastocyst quality (defined by senior embryologists) plus day of biopsy also adjusted for maternal age (AUC: 0.73, 95%CI 0.69-0.78, p<0.01). iDAScore v1.0 and embryologists ranked the same blastocyst highest in 123 of 143 cycles with ≥2 blastocysts (86%). The highest ranked blastocyst according to iDAScore was a euploid blastocyst in 54 of the 79 cycles (68%) containing both euploid and aneuploid blastocysts.
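
A quick consistency check on the reported odds ratio can be sketched as follows. Confidence intervals from logistic regression are symmetric on the log-odds scale, so the geometric mean of the interval bounds should recover the point estimate. The function name and the z value of 1.96 are assumptions for illustration and are not taken from the abstract.

```python
import math

def log_odds_summary(ci_low, ci_high, z=1.96):
    """Recover the point estimate and log-scale standard error from a 95% CI.

    Assumes a Wald-type interval, i.e. exp(beta +/- z * SE).
    """
    log_low, log_high = math.log(ci_low), math.log(ci_high)
    point_estimate = math.exp((log_low + log_high) / 2)
    standard_error = (log_high - log_low) / (2 * z)
    return point_estimate, standard_error

# Interval reported in the abstract for iDAScore v1.0 and euploidy.
odds_ratio, se_log_or = log_odds_summary(1.18, 1.45)

# If the OR applies per one-point increase in score (an assumption; the
# abstract does not state the unit), a two-point increase multiplies the
# odds of euploidy by the square of the OR.
two_point_factor = odds_ratio ** 2
```

The recovered point estimate is about 1.31, in line with the reported multivariate OR of 1.3.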

Limitations, reasons for caution:
The main purpose of iDAScore v1.0, for which the algorithm was trained, is implantation prediction of untested blastocysts. Thus, once the sample size of this blinded prospective study is large enough, we will also examine the association between iDAScore v1.0 and the implantation of euploid blastocysts.

Wider implications of the findings:
The similar predictivity for euploidy reported between subjective senior embryologists’ grading and objective AI-powered iDAScores is promising in view of IVF automation and standardization. This is especially relevant since iDAScore v1.0 has not been trained yet to specifically predict euploidy, and its future versions could be fine-tuned accordingly.

Keywords:
iDAScore v1.0
automation
time-lapse microscopy
embryo selection
Artificial Intelligence
Successful implementation of an end-to-end artificial intelligence (AI) platform in a busy IVF clinic: A prospective observational study.
A. Papatheodorou1, D. Gilboa2, D. Seidman2,3, C. Oraiopoulou1, M. Karagianni1, M.I. Papadopoulou1, M. Tsarfati2, A. Kubany2, N. Christoforidis2, A. Chatziparasidou2.
1Embryolab Fertility Clinic, IVF Lab, Thessaloniki, Greece.
2AiVF, Data Science Group, Tel Aviv, Israel.
3Tel Aviv University, Sackler School of Medicine, Tel Aviv, Israel.
Study question:

How demanding in terms of time and resource allocation is the full integration of an AI platform to the routine operation of an IVF clinic?

Summary answer:
The rapid and effective implementation, and continuous performance, of EMA was qualitatively and quantitatively demonstrated for the first time in a real-world clinical setting.

What is known already:
The role of AI-based embryo predictive analysis tools is often hailed as one of the most important recent developments in the IVF clinic. The high precision of such systems has been reported. For instance, the AI system implemented here, EMA™ (AiVF), employs a convolutional neural network architecture, providing an area-under-the-curve (AUC) of 0.95 with 83% accuracy. However, little has been reported to date on the full implementation of these advanced systems in the active IVF lab. This study prospectively evaluated the clinical implementation and process-of-use in an IVF clinic of EMA, the first end-to-end AI-driven platform designed to algorithmically aid in evaluating embryos.

Study design, size, duration:
A prospective observational single center study. The study was performed in two phases. Phase I: EMA was integrated into standard workflow and qualitatively evaluated over the course of one month by five embryologists in a series of twice-daily qualitative checks and questionnaires. Phase II: The rate of agreement between EMA and embryologists was benchmarked to evaluate how the model aids embryologists in efficiently assessing embryos.

Participants/materials, setting, methods:
Phase I: Five senior embryologists completed electronic questionnaires to qualitatively report on the ease-of-use, functionality, and performance of EMA after using the platform as adjunctive information on 588 embryos; ICSI was performed on all treatment cycles. Phase I was completed within 2 weeks. Phase II: The rate of agreement between five senior embryologists and EMA was calculated for the accuracy in ranking embryo(s) for transfer/freeze (146 treatment cycles). Phase II was undertaken in 4 weeks.

Main results and the role of chance:
EMA was effectively incorporated into a busy IVF laboratory for routine daily use to algorithmically assess all embryos at 105 hours post-fertilization prior to vitrification. All embryos were cultured in a time-lapse incubator and successfully evaluated by both EMA and embryologists in parallel to conventional morphologic embryo evaluation. In Phase I, all five embryologists qualitatively approved of EMA’s integration and clinical utility inside their workflow and reported enhanced efficiency when EMA was used per its intended use. In Phase II we demonstrated an 86% agreement rate between embryologists and EMA. Of all embryos selected for transfer by embryologists, 100% were also identified by EMA as having the highest potential for implantation. Of all embryos selected for vitrification by embryologists, 85% were identified by EMA as top-quality (Gardner criteria: A/B grade) embryos. Among embryos that were graded C/D (Gardner criteria) by embryologists, 89% were identified by EMA as “low grade” as well. Pregnancies were shown to be highly associated with EMA’s embryo selection. The final stage of our implementation analysis of EMA is currently ongoing; the association between the algorithmic outputs of EMA and clinical implantation rates is being investigated in a prospective, double-blinded, observational cohort study and results will be presented.

Limitations, reasons for caution:
This is a single center study, based on a relatively homogeneous patient population. Nevertheless, given the impressive results reported herein, we conclude that this single case-study is sufficient for demonstrating rapid and successful implementation and process-validation of EMA for routine use by the clinic.

Wider implications of the findings:
AI-based decision support systems like EMA have the potential to increase rapid and objective standardization inside the clinic, thereby improving accurate decision making and saving time and resources without interfering with the busy workflow of an IVF setting. Routine use of EMA in IVF should be prioritized for further evaluation.

Keywords:
Artificial Intelligence
AI
IVF
Algorithmic embryo evaluation
Convolutional Neural Network
Deep ensembles-based AI as a tool to support embryo grading and clinical pregnancy prediction
H.J. Lee1,2, T. Ko3, J.H. Park4, H.M. Kim3, S. Woo5.
1Kai Health, Chief Executive Officer, Seoul, Korea- South.
2Seoul National University, Obstetrics and Gynecology, Seoul, Korea- South.
3The Catholic University of Korea, Department of Medical Informatics, Seoul, Korea- South.
4Miraewaheemang hospital, IVF clinic, Seoul, Korea- South.
5Kai Health, Artificial intelligence, Seoul, Korea- South.
Study question:

Can deep learning accurately evaluate embryo grades and predict clinical pregnancy while providing relevant clinical evidence, not just results from a black box?

Summary answer:
The sophisticated ensemble method can improve the predictive performance for embryo grades and clinical pregnancy, while providing clinically relevant evidence.

What is known already:
Previous studies have shown that AI can predict IVF outcomes by analyzing images of embryos. In many studies, AI outperformed humans because it could identify features that human eyes could not easily detect. However, clinicians have been cautious about adopting AI technology due to the black box nature of AI algorithms. In this study, we increased the predictive power of AI and provided evidence for its predictions by using deep ensembles and Grad-CAM images.

Study design, size, duration:
We performed a retrospective study of single static images of 727 Day 5 blastocysts from 270 patients who underwent single embryo transfer at a single in vitro fertilization (IVF) clinic between January 2015 and March 2021. The images were collected from standard optical light microscopes and matched with metadata such as embryo grades and pregnancy outcomes.

Participants/materials, setting, methods:
Two different models were designed: an automatic embryo grading model and a pregnancy prediction model. Embryologists labeled a day 5 embryo as a good embryo (“GEM”) if it was graded 4AA/AB or above in the Gardner system, and pregnancy was defined as the presence of a fetal heartbeat (FHB). Deep ensembles were applied by training four convolutional neural networks (CNNs), and Grad-CAM images were extracted from the last layer and reviewed by experts.

Main results and the role of chance:
Across the single CNNs tested, the highest AUROCs of the embryo grading model and the pregnancy prediction model were 0.80 and 0.67, respectively. After applying deep ensembles, the AUROCs of the two models increased to 0.84 and 0.72, respectively. When the F1-score for the positive cases was maximized by adjusting the threshold of the ensembles, the accuracy, sensitivity and specificity of the embryo grading model were 88.1%, 92.9% and 62.5%, respectively. For the pregnancy prediction model, accuracy, sensitivity and specificity were 66.3%, 77.1% and 55.6%, respectively. The accuracy of the GEM label in predicting pregnancy was 47.3% for the embryologists and 59.2% for the embryo grading AI model. It is noteworthy that the AI pregnancy prediction model outperformed the embryologists while successfully auto-grading embryos, which is strong evidence that the AI considered more features for prediction than were used for grading. It was also noted from the review of the Grad-CAM images that both AI models were focusing on the ICM, TE and hatching. Although their area of focus was the same, the pregnancy prediction model was able to make better predictions than the embryologists and the embryo grading model.
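
The two steps mentioned here, averaging the outputs of several networks and then choosing the threshold that maximizes F1, can be sketched as below. This is an illustration under stated assumptions rather than the authors' pipeline: the probabilities are made-up numbers standing in for CNN outputs, only two models are averaged instead of four, and all function names are hypothetical.

```python
def ensemble_mean(prob_lists):
    """Average per-embryo probabilities across models (simple deep-ensemble vote)."""
    return [sum(ps) / len(ps) for ps in zip(*prob_lists)]

def confusion_metrics(labels, probs, threshold):
    """Accuracy, sensitivity, specificity and F1 at a given decision threshold."""
    pairs = list(zip(labels, probs))
    tp = sum(1 for y, p in pairs if y == 1 and p >= threshold)
    fp = sum(1 for y, p in pairs if y == 0 and p >= threshold)
    fn = sum(1 for y, p in pairs if y == 1 and p < threshold)
    tn = sum(1 for y, p in pairs if y == 0 and p < threshold)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
    }

def best_f1_threshold(labels, probs):
    """Try every observed probability as a threshold and keep the best F1."""
    return max(sorted(set(probs)),
               key=lambda t: confusion_metrics(labels, probs, t)["f1"])

# Made-up outputs of two models for six embryos (1 = pregnancy, 0 = none).
labels = [1, 1, 0, 1, 0, 0]
model_a = [0.9, 0.6, 0.3, 0.5, 0.2, 0.1]
model_b = [0.7, 0.8, 0.6, 0.3, 0.3, 0.1]

ensemble = ensemble_mean([model_a, model_b])
threshold = best_f1_threshold(labels, ensemble)
metrics = confusion_metrics(labels, ensemble, threshold)
```

Maximizing F1 for the positive class tends to favour sensitivity over specificity, which matches the pattern in the reported figures (for example 92.9% sensitivity against 62.5% specificity for the grading model).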

Limitations, reasons for caution:
This study has limitations as it is a retrospective study performed on embryo images from a single IVF center. In addition, including other variables such as clinical data may enhance the models.

Wider implications of the findings:
We showed that deep learning can automatically grade embryos and more accurately predict pregnancy than embryologists. Furthermore, the embryologists confirmed the model was looking at key features like ICM, TE and hatching. Sharing such evidence with clinicians can be a necessary step for AI to be adopted for clinical practice.

Keywords:
Artificial Intelligence
embryo selection
IVF
deep learning
AI study shows the effect of patient age on embryo quality is inherent in the morphology of an embryo
M. Perugini1, S.M. Diakiw2, T.V. Nguyen3,4, D. Perugini1, Y. Galiana5, N. Rubio5, J. Aizpurua5, M.D. VerMilyea6,7, J.M.M. Hall3,8,9.
1Presagen, Life Whisperer, San Francisco, U.S.A.
2Presagen, Life Whisperer, Los Angeles, U.S.A.
3Presagen, Life Whisperer, Adelaide, Australia.
4University of Wollongong, School of Computing and Information Technology, Wollongong, Australia.
5IVF-Life, IVF-Spain, Alicante, Spain.
6Ovation Fertility, Laboratory, Austin, U.S.A.
7Texas Fertility Centre, IVF Laboratory, Austin, U.S.A.
8The University of Adelaide, School of Physical Sciences, Adelaide, Australia.
9Australian Research Council, Centre of Excellence for Nanoscale BioPhotonics, Adelaide, Australia.
Study question:

Does patient age need to be explicitly factored into AI-based embryo quality assessment, or does embryo morphology alone capture the age-related decline in embryo quality?

Summary answer:
Age-related effects on embryo quality are inherently captured in embryo morphology. AI algorithms that assess morphology correlate with expected decline in embryo quality with age.

What is known already:
Patient age strongly correlates with genetic aneuploidy in oocytes, which results in a dramatic reduction in genetic integrity and viability of embryos with patient age [1]. This negative correlation ultimately leads to poorer implantation and clinical pregnancy outcomes.

AI imaging tools assess the quality of embryos generally, using morphology alone [2]. However, it is unknown whether these morphological assessments inherently consider age-related quality factors like cytoplasmic and/or genetic competence, or whether age should be incorporated as a separate variable.

The current study aimed to assess the correlation of AI-based scores with the age-related decline in embryo quality.

Study design, size, duration:
The study used a retrospective dataset of static Day 5 blastocyst images taken using an optical light microscope with associated PGT-A or pregnancy outcomes. The dataset comprised images of 4,000 embryos sourced from 1,199 consecutive patients treated between 2011 and 2020 at five IVF clinics (USA). The study evaluated correlation of algorithms Life Whisperer Genetics and Life Whisperer Viability with patient or donor age. Data were excluded in donor cases where age was not known.

Participants/materials, setting, methods:
4,000 embryo images were used to report a linear correlation between the proportion of euploids (%) and pregnancies (%) across six age-brackets, spanning 20 to 50 years of age.

Life Whisperer Genetics AI was applied to a blind dataset of 809 images to assess likelihood of euploidy, and Life Whisperer Viability AI was applied to a dataset of 556 images to assess likelihood of pregnancy. Scores within each age-bracket were averaged and chi-squared analysis was used to assess significance.

Main results and the role of chance:

As expected, there was a significant negative correlation between the proportion of euploid embryos (%) and patient/donor age on a dataset of 4,000 images (slope of -13.2±0.2), and on a blind test set of 809 embryos (slope of -11.2±0.2). The Life Whisperer Genetics AI score was then reported on the blind test set, showing a significant negative correlation with age (-0.45±0.16 with a χ²/dof value of 0.41). The significant downward trend indicates that the AI, using morphology alone, can account for the age-related impact on genetic competence without a corresponding reduction in accuracy, and without needing additional age-related variables in its calculation. The AI was able to generalize correctly, identifying morphological signs of ploidy well, regardless of age.

Regarding cytoplasmic or metabolic competence, we report on a blind dataset of 556 images that the proportion of viable embryos(%) reduces with increasing patient age, although exhibiting a peak in proportion of viable embryos in the 25-29 year bracket. Similarly, we show that Life Whisperer Viability AI scores within each age-bracket reduce with age.

Our results suggest that both AI algorithms for genetic competence and metabolic competence in terms of viability take into account patient age based on morphology.
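
For readers unfamiliar with the slope and χ²/dof figures quoted in these results, the sketch below shows one common way to obtain them: a weighted least-squares straight-line fit across age brackets. It is an assumption that the authors used this approach, and the bracket values and uncertainties are invented for illustration.

```python
import math

def weighted_line_fit(x, y, sigma):
    """Weighted least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, slope_error, reduced_chi_squared), where
    reduced chi-squared is chi-squared divided by the degrees of freedom.
    """
    w = [1.0 / s ** 2 for s in sigma]
    s0 = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = s0 * sxx - sx ** 2
    slope = (s0 * sxy - sx * sy) / delta
    intercept = (sxx * sy - sx * sxy) / delta
    slope_error = math.sqrt(s0 / delta)
    chi2 = sum(((yi - (intercept + slope * xi)) / si) ** 2
               for xi, yi, si in zip(x, y, sigma))
    dof = len(x) - 2  # two fitted parameters
    return slope, intercept, slope_error, chi2 / dof

# Invented example: six age brackets (indexed 0 to 5) with a euploid
# proportion (%) that falls by 13 points per bracket.
brackets = [0, 1, 2, 3, 4, 5]
euploid_pct = [70.0, 57.0, 44.0, 31.0, 18.0, 5.0]
uncertainty = [2.0] * 6

slope, intercept, slope_error, reduced_chi2 = weighted_line_fit(
    brackets, euploid_pct, uncertainty
)
```

A reduced chi-squared close to or below 1 indicates that a straight line describes the bracket averages within their uncertainties, which is how a value such as 0.41 would usually be read.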

Limitations, reasons for caution:
Although age was shown to be represented in embryo morphology, adding a separate age-related variable could be considered in future studies. However, for embryo ranking and selection for a given patient, this is likely to be of value only when comparing embryos corresponding to different donor oocytes.

Wider implications of the findings:
As the age of the patient increases, the morphology of their embryos also changes, corresponding to a decrease in embryo quality. This justifies morphology-based embryo quality assessment, giving credence to generalizable AI models that perform robust assessment of embryo quality for patients of all ages and do not require calibration.

Keywords:
Artificial Intelligence
AI
PGT-A
age
Embryo
An analysis of qualitative and quantitative morphokinetic parameters automatically annotated using CHLOE (Fairtility), an AI-based tool, finds AI score predictive of blastulation and ploidy
E. Gómez1, A. Brualla-Mora2, N. Almunia1, R. Jiménez3, C. Hickman4, I. Har-vardi4, A.M. Villaquirán3.
1Next Fertility Murcia, IVF Lab, Murcia, Spain.
2Fairtility- Israel, Embryology, Tel Aviv, Israel.
3Next Fertility Murcia, Gynecology, Murcia, Spain.
4Fairtility- Israel, Clinical Department, Tel Aviv, Israel.
Study question:

What is the relationship between qualitative and quantitative morphokinetic parameters automatically annotated using CHLOE(Fairtility), an AI-based tool?

Summary answer:
CHLOE score is associated with ploidy. DUC embryos have lower blastulation, form fewer good blastocysts, have increased fragmentation, slower development, lower implantation than non-DUCs.

What is known already:
The introduction of time-lapse technologies in IVF has led to the discovery of quantitative and qualitative morphokinetic parameters which are predictive of embryo viability (ESHRE Workshop group, 2020). The challenges of annotating videos manually remain: (i) operator variation; (ii) the time required; (iii) the complexity of prioritising numerous features when determining which embryos to transfer, freeze or discard. CHLOE (Fairtility) is an AI-based tool designed to automatically capture these parameters from the time-lapse videos, removing the “black box” associated with AI and instead bringing transparency and support to the embryologist responsible for the decision, thus enhancing personalisation of care down to each individual embryo.

Study design, size, duration:
Prospective cohort analysis on time-lapse data retrospectively collected at a single private fertility clinic in Spain between 2018-2020. 693 videos were automatically annotated (without training) using the CHLOE Artificial Intelligence (AI) tool for the following quantitative features: tPNa, tPNf, t2, t3, t4, t5, t6, t7, t8, t9, tM, tSB, tB, tEB, size of ICM; and the following qualitative parameters: number of pronuclei, morphological quality of Inner Cell Mass and Trophectoderm (CHLOE Morphological scoring), identification of unusual embryo cleavages, i.e. Direct Uneven Cleavage (DUCs), amongst other features.

Participants/materials, setting, methods:
All embryos were cultured using the Embryoscope (Vitrolife) incubator. Using a range of algorithms, CHLOE generated a prediction of blastulation (at 30hpi) and implantation, which were compared to outcome (blastocysts vs non-blastocysts; euploids vs aneuploids & mosaics; mosaics vs euploids & aneuploids). Embryos identified as DUCs by CHLOE were compared with non-DUCs in terms of outcomes and in terms of endpoints generated by CHLOE (parametric continuous data assessed using a two-tailed t-test, categorical data using chi-square).

Main results and the role of chance:
Within all cleaved embryos analysed (n=693), 29% were DUCs. Compared with non-DUCs, DUC embryos were less likely to blastulate (DUC vs non-DUC: 25% vs 50%, p<0.001), had a higher proportion of embryos with severe fragmentation (26% vs 3%, p<0.001), were less likely to be suitable for biopsy (23% vs 87%, p<0.001), and had a lower blastulation prediction score (0.53 vs 0.76, p<0.001), a lower implantation prediction score (0.21 vs 0.48, p<0.001) and slower embryo development across all morphokinetic time-points assessed (p<0.001), except for t5 (NS). DUCs and non-DUCs had similar proportions of 1PN, 2PN and 3PN embryos (5%, 83%, 5% vs 7%, 84%, 3%, NS).
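
The categorical comparisons above rely on a chi-square test. The following is a minimal sketch of that test for a 2x2 table, written without external libraries. The function name is hypothetical, and the counts are illustrative values back-calculated from the rounded percentages in the abstract (29% of 693 embryos as DUCs, 25% vs 50% blastulation), not the authors' raw data. No continuity correction is applied, which may differ from the authors' analysis.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = 0.0
    for observed, row, col in ((a, row1, col1), (b, row1, col2),
                               (c, row2, col1), (d, row2, col2)):
        expected = row * col / n
        chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# ASSUMPTION: illustrative counts reconstructed from rounded percentages.
duc_blast, duc_no_blast = 50, 151            # about 25% of ~201 DUC embryos
non_duc_blast, non_duc_no_blast = 246, 246   # about 50% of ~492 non-DUC embryos

chi2, p = chi_square_2x2(duc_blast, duc_no_blast, non_duc_blast, non_duc_no_blast)
print(f"chi2 = {chi2:.1f}, p = {p:.2g}")
```

With these reconstructed counts the p-value falls well below 0.001, in line with the significance level reported for the blastulation comparison.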

Within embryos that blastulated (n=581), 25% were DUCs. DUC blastocysts were less likely to have a good quality ICM (7% vs 33%, p<0.001) or a good quality trophectoderm (9% vs 35%, p<0.05), and had a lower implantation score (0.29 vs 0.52, p<0.05) and slower embryo development across morphokinetic time-points than non-DUC blastocysts. DUC (n=38) and non-DUC (n=292) blastocysts had similar euploidy rates (50% vs 43%, NS), mosaicism rates (8% vs 11%, NS), and ratios of euploid:aneuploid:mosaic embryos (19:16:3 vs 126:133:33, NS).

One DUC embryo was transferred, leading to an ongoing clinical pregnancy.

Blastulation score was predictive of blastulation (AUC of 0.91, p<0.001). Mosaic embryos had a similar implantation score to non-mosaics (0.61 vs 0.67, NS). Euploid embryos had a higher implantation score than aneuploid blastocysts (0.71 vs 0.62, p<0.02), so implantation score was predictive of ploidy.
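
For readers unfamiliar with the AUC figure quoted above, here is a minimal sketch of how an AUC can be computed from prediction scores: it is the probability that a randomly chosen positive case (here, an embryo that blastulated) scores higher than a randomly chosen negative one. The function name and the scores are hypothetical and are not taken from the study.

```python
def auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    wins = 0.0
    for pos in scores_pos:
        for neg in scores_neg:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# ASSUMPTION: made-up blastulation scores for illustration only.
blastulated = [0.90, 0.80, 0.75, 0.60]
not_blastulated = [0.70, 0.40, 0.30, 0.20]

print(auc(blastulated, not_blastulated))  # 15 of 16 pairs are ordered correctly
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.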

Limitations, reasons for caution:
This study involved the validation of (i) a specific AI-based tool, so the findings may not generalise to other AI tools; (ii) in a single centre. Results obtained did not involve training, which is suggestive of CHLOE’s ability to generalise across clinics. The study presents a framework for responsibly incorporating AI into clinical practice.

Wider implications of the findings:
CHLOE can simplify the processing of time-lapse data to effectively, consistently, and efficiently quantify parameters that can help explain a comprehensive prediction of embryo viability. This provides a useful tool which will ultimately assist clinicians with selecting the most optimal embryos for transfer and avoid wastage from discarding viable embryos.

Keywords:
Artificial Intelligence
embryo development
‘Augmented intelligence’ to possibly shorten euploid identification time: A human-machine interaction study for euploid identification using ERICA, an Artificial Intelligence software to assist embryo ranking.
A. Chavez Badiola M.B.B.Ch.-M.D.1,2,3, A. Flores-Saiffe1, R. Valencia4, G. Mendizabal-Ruiz1, J. Villavicencio5, D. Gonzalez5, D. Griffin2, A. Drakeley6,7, J. Cohen8,9.
1IVF 2.0 ltd, Research and Development, London, United Kingdom.
2University of Kent, School of Biosciences, Canterbury, United Kingdom.
3New Hope Fertility Center, Clinical Research, Mexico City, Mexico.
4IVF 2.0 ltd, MLOps, London, United Kingdom.
5IVF 2.0 ltd, MLOps, Guadalajara, Mexico.
6Liverpool Women’s Hospital, Hewitt Centre for Reproductive Medicine, Liverpool, United Kingdom.
7University of Liverpool, Clinical, Liverpool, United Kingdom.
8IVFqc, Research and Development, New York City, U.S.A.
9IVF 2.0 ltd, Embryology, New York City, U.S.A.
Study question:
What is the mean number of transfers needed to achieve a euploid transfer selected by embryologists plus ERICA’s assistance?

Summary answer:
Augmented intelligence (ERICA plus human collaboration) outperforms both the embryologists and artificial intelligence’s individual performance alone.

What is known already:
Euploid embryos are more likely to implant successfully. Artificial intelligence (AI) could improve embryo selection over current techniques, but scepticism exists. Augmented intelligence (AuI) combines both the mathematical reproducibility of machine learning and the knowledge and experience of humans. This approach employs AI tools as an assistant, where the user shall learn to interpret the AI. A recent study suggested that embryologists assisted by AI improved the embryo selection of euploid transfers. ERICA (IVF2.0 Limited, UK) was designed to rank blastocysts according to their probability of euploidy.

Study design, size, duration:
We prospectively studied embryo selection for ERICA alone, embryologists only and when interacting (embryologists and ERICA) in 150 synthetically generated (reconstructed on real-data) embryo transfer cycles. Embryos were ranked in order, and performance was assessed by time to identify a euploid embryo within each cycle cohort correctly. Embryologists were allowed to rank a maximum of 10 cycles per day for three weeks starting in January 2022, using a mobile phone application designed for this purpose.

Participants/materials, setting, methods:
Using real-life cycle distributions of euploid/aneuploid blastocysts and the number of embryos in a cycle (according to ERICA’s database), we created 150 synthetic cycles, 30 for each age bracket (<35, 35-37, 38-40, 41-42, and >42). These were randomly populated with blastocyst images, preserving their actual ploidy status. Each synthetic cycle contained between 2 and 6 authentic embryo images, with at least one euploid and one aneuploid.
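
The cycle construction described above can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the authors' procedure: cycle size and the number of euploids are drawn uniformly here, whereas the study sampled them from real-life distributions, and the function name and image identifiers are hypothetical.

```python
import random

def build_synthetic_cycle(rng, euploid_images, aneuploid_images, min_size=2, max_size=6):
    """Return a shuffled list of (image_id, ploidy) pairs for one synthetic cycle."""
    size = rng.randint(min_size, max_size)
    # Picking 1..size-1 euploids guarantees at least one euploid and one aneuploid.
    n_euploid = rng.randint(1, size - 1)
    cycle = [(img, "euploid") for img in rng.sample(euploid_images, n_euploid)]
    cycle += [(img, "aneuploid") for img in rng.sample(aneuploid_images, size - n_euploid)]
    rng.shuffle(cycle)
    return cycle

rng = random.Random(2022)  # fixed seed so the sketch is reproducible
# ASSUMPTION: placeholder identifiers standing in for real blastocyst images.
euploid_pool = [f"eu_{i}" for i in range(20)]
aneuploid_pool = [f"an_{i}" for i in range(20)]

cycles = [build_synthetic_cycle(rng, euploid_pool, aneuploid_pool) for _ in range(30)]
print(cycles[0])
```

Each image keeps its true ploidy label inside the cycle, which is what allows a ranking to be scored afterwards.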

Main results and the role of chance:
The total database had a euploid rate of 37.4% (n= 513), and by age brackets from 1 to 5 were 45.7% (n=116), 43.8% (n=105), 35.9% (n=92), 31.2% (n=96), and 28.8% (n=104) respectively.

The mean number of cycles analysed by each participant was 113.5 (CI: 100.8-126.2). The mean time-to-euploid transfer for embryologists alone was 2.07 (CI: 2.00-2.13); for ERICA alone it was 1.86 (CI: 1.82-1.91); and for embryologists assisted by ERICA it was 1.62 (CI: 1.55-1.68). All pairwise differences between study groups were statistically significant using a paired two-tailed Student’s t-test (p<0.001).

The proportion of euploid transfer at the first try for embryologists alone was 0.40 (CI: 0.37-0.43), for ERICA alone was 0.54 (CI: 0.53-0.54), and for embryologists assisted by ERICA was 0.47 (CI: 0.44-0.50). All pairwise differences between study groups were statistically significant using a paired two-tailed Student’s t-test (p<0.01).
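
The two endpoints reported above (mean number of transfers until a euploid embryo is reached, and the proportion of cycles in which the first-ranked embryo is euploid) can be computed from ranked cycles along these lines. The function name and the example rankings are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean

def transfers_to_first_euploid(ranked_ploidy):
    """1-based position of the first euploid embryo in a ranked cycle."""
    for position, status in enumerate(ranked_ploidy, start=1):
        if status == "euploid":
            return position
    raise ValueError("cycle contains no euploid embryo")

# ASSUMPTION: four made-up cycles, each listed in the order a rater ranked the embryos.
ranked_cycles = [
    ["aneuploid", "euploid", "aneuploid"],
    ["euploid", "aneuploid"],
    ["aneuploid", "aneuploid", "euploid", "euploid"],
    ["euploid", "euploid", "aneuploid"],
]

counts = [transfers_to_first_euploid(cycle) for cycle in ranked_cycles]
mean_transfers = mean(counts)
first_try_rate = sum(1 for n in counts if n == 1) / len(counts)
print(mean_transfers, first_try_rate)
```

A lower mean indicates that euploid embryos are being ranked nearer the top of each cycle.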


Limitations, reasons for caution:
Although our findings suggest that AuI outperforms both AI and humans alone, this study needs to be replicated with a larger cohort of embryologists with different experience levels in different countries to confirm these results.

Wider implications of the findings:
Combining machine-human interaction through a well-designed process could improve embryo selection and reduce inter-operator variability amongst staff with different experience levels. It could also set a frame for adequate agency and accountability, and enhance trust and adoption.


Keywords:
AI
embryo ranking
ERICA
Augmented Intelligence
Human-Machine Interaction
A novel non-invasive tool for oocyte selection using gene expression and artificial intelligence
C.A. Link1, L. von Mengden2, M.A. De Bastiani2, M. Faller3, L. Dorneles3, R. Pedo3, L. Arruda3, R. Link1, F. Klamt2.
1ProSer Clinics, Gynecology, Porto Alegre, Brazil.
2Federal University of Rio Grande do Sul- UFRGS, Biochemistry, Porto Alegre, Brazil.
3ProSer Clinics, Embryology, Porto Alegre, Brazil.
Study question:

Is it possible to predict top quality embryos through gene expression analysis of cumulus cells and artificial intelligence before fertilization?

Summary answer:
The artificial intelligence-based tool OsteraTest is able to predict the ability of the oocyte to develop into a top quality blastocyst with 86% accuracy.

What is known already:
Proper oocyte selection is an important bottleneck for In Vitro Fertilization (IVF) success. Nowadays, oocyte selection relies mainly on morphological analyses, which are not an unbiased method and may fail to reveal the real competence status of gametes. Cumulus oophorus cells (CC) are somatic cells that surround the oocyte in the antral follicle. They are directly involved in oocyte maturation and development, and thus are a valuable non-invasive source of biological information regarding the oocyte’s health. Artificial intelligence can be used to identify key biological processes and markers of interest through machine learning methods and could thus be applied to oocyte selection.

Study design, size, duration:
This is a prospective study that included data from 80 CC samples retrieved from publicly available microarray data (GSE27377) in the algorithm construction phase and 65 CC samples, one from each oocyte of 26 patients who underwent Intracytoplasmic Sperm Injection (ICSI), in the validation phase. Samples were divided into two groups: CCs from oocytes that developed into top quality blastocysts on day 5 after ICSI and CCs from oocytes that presented arrested development.

Participants/materials, setting, methods:
Samples were analysed by real-time quantitative PCR with 25 target genes. Afterwards, gene expression levels for each gene and sample were submitted to the final algorithm, which was implemented as software, the OsteraTest, in a double-blind approach. The software indicated the development potential of each oocyte, and this ranking was compared to the embryologist’s day 5 blastocyst classification according to Gardner.

Main results and the role of chance:
The bioinformatic approach implemented resulted in the OsteraTest, composed of 8 machine learning models using a 25-gene network that altogether can predict oocyte quality, thus representing a very complex assembly. The software presented more than 86% accuracy in predicting the oocyte’s developmental capacity into a top-quality day 5 blastocyst. Top quality blastocysts present over 80% chance of resulting in a healthy pregnancy and live birth, and so this approach could be further used as a pregnancy potential predictor after a prospective study is conducted, analyzing CCs from oocytes that were subsequently fertilized, developed into blastocysts and transferred in single embryo transfers. This tool can contribute greatly to improving success rates in IVF procedures and to assessing egg quality in egg freezing procedures, providing information about the gamete’s potential even years before its use.

Limitations, reasons for caution:
A large-scale, prospective, randomized study is necessary for further validation of these findings and to confirm the validity of the OsteraTest in the clinical environment. Such a study is now being conducted in our lab.

Wider implications of the findings:
The OsteraTest proved to be a valuable non-invasive tool to predict embryo formation and oocyte capacity even before fertilization. It can enable clinics to anticipate successful treatments and provide a predictive report for oocyte freezing patients.

Keywords:
oocyte selection
embryo selection
Artificial Intelligence
machine learning
Cumulus cells
The location of fragments and degraded zones in blastocysts is associated with ploidy: moving towards explaining an AI-based morphology tool trained on euploidy outcomes.
A. Chavez-Badiola1,2, A. Flores-Saiffe Farias1, D. Sanchez3, G. Mendizabal-Ruiz1,4, R. Valencia-Murillo1, A. Drakeley5, J. Cohen6.
1IVF 2.0 Ltd, Research and development, Maghull, United Kingdom.
2University of Kent, School of Bioscience, Canterbury, United Kingdom.
3New Hope Fertility Center, Embryology, Mexico City, Mexico.
4Universidad de Guadalajara, Department of Computational Sciences, Guadalajara, Mexico.
5Hewitt Fertility Centre- Liverpool Women’s Hospital, University of Liverpool, Liverpool, United Kingdom.
6IVFqc, Research & Development, New York, U.S.A.
Study question:

Is the location of degraded areas or fragments an indication of ploidy in blastocyst images?

Summary answer:
Degradation traces observed in a blastocyst’s inner cell mass correlate with aneuploidy when confirmed by trophectoderm biopsy.

What is known already:
The interaction between humans and Artificial Intelligence (AI), known as augmented intelligence (AuI), is dependent on the AI’s ability to be self-explainable and interpretable. This is a highly desired feature of AIs in healthcare, given that blindly trusting an AI to make a decision has serious ethical considerations and potential consequences. Currently, most available AIs provide “black-box” advice that might make interaction with their human counterparts difficult. ERICA (IVF2.0 Limited, UK) was designed to rank blastocysts using euploid status as ground truth, and although it was initially a “black-box,” we describe results from an initial attempt towards making it explainable.

Study design, size, duration:
This study was designed as a proof-of-concept on retrospectively collected images. De-identified images (n = 329) with known ploidy status (euploid or aneuploid) were retrieved (November 2021) from ERICA. The images were processed from December 2021 to January 2022.

Participants/materials, setting, methods:
A senior embryologist identified visual degenerative traces from blastocyst images for areas of cell degradation and cell fragments. Ploidy status was blinded to the embryologist. Images were segmented for trophectoderm (TE), blastocoele (BC), and inner cell mass (ICM) using the automated tool of ERICA’s algorithm. The distance between the centre of each degenerative trace and the ICM was measured. The Dice Similarity Coefficient (DSC) and the proportion of degenerative traces in each zone were computed.
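
The Dice Similarity Coefficient mentioned above measures the overlap between two regions as twice the shared area divided by the sum of both areas. A minimal sketch on sets of pixel coordinates follows. The function name, the toy regions, and the convention of returning 0.0 for two empty regions are assumptions made for illustration, not details from the study.

```python
def dice_coefficient(region_a, region_b):
    """Dice Similarity Coefficient between two sets of pixel coordinates."""
    if not region_a and not region_b:
        return 0.0  # ASSUMPTION: convention chosen here for two empty regions
    overlap = len(region_a & region_b)
    return 2 * overlap / (len(region_a) + len(region_b))

# ASSUMPTION: toy 4x4 pixel regions standing in for an annotated degradation
# trace and a segmented inner cell mass (ICM).
degradation = {(r, c) for r in range(0, 4) for c in range(0, 4)}
icm = {(r, c) for r in range(2, 6) for c in range(2, 6)}

print(dice_coefficient(degradation, icm))  # 4 shared pixels out of 16 + 16
```

A value of 0 means the regions do not touch, and a value of 1 means they coincide exactly.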

Main results and the role of chance:
We identified some level of degradation in 60% of the blastocysts, particularly in BC:44%, ICM:38%, TE:26%, and ICM+BC:55%, and the presence of fragments in 103, particularly in BC:21%, ICM:10%, and TE:24%. Our database contained 52% euploid blastocyst images.

We found that when DSC between degradation and ICM is more than 10% (44/78 aneuploids) the chances of aneuploidy increase by 25% (Z=-1.76, p<0.05).

We also found a 13% increased chance of an embryo being aneuploid (92/157 aneuploidy) if the area of ICM+BC has any presence of degradation (Z=-1.14, p=0.13), and an increased risk of aneuploidy if DSC (U=12401, p=0.09), and also if the proportion of degradation was found in ICM+BC (U=12397, p=0.09).

Our data also suggest that aneuploid embryos have fragments closer to the ICM (mean=51 µm, 95% CI: 42.2-59.9) than euploids (mean=63.4 µm, 95% CI: 51.1-75.7) (U=988, p=0.19).

Mann-Whitney U test and Z-test for proportions were used accordingly, both under the hypothesis that increased degenerative traces means a higher probability of being aneuploid (one-tailed test).





Limitations, reasons for caution:
Analyzing degenerative traces using a single image from a single focal plane might be limiting. Identifying fragments and degradation might not be a replicable process inter- or intra- embryologist. More annotators are needed to reduce this bias.

Wider implications of the findings:
Correlation between aneuploidy and cell degradation was stronger in the ICM than TE, although ploidy status is obtained via TE biopsy. Our data suggest that fragments that are closer to the ICM might increase the chances of aneuploidy. A larger prospective multicentre study should be conducted to confirm these findings.

Keywords:
explainable artificial intelligence
Augmented Intelligence
cell degradation
cell fragmentation
ploidy status
The influence of artificial intelligence embryo scoring on male-female sex ratio
I. Zvereva1, K. Dmitry1.
1Fomin Clinic LLC, IVF department, Moscow, Russia C.I.S.
Study question:

Are there correlations among male-female sex ratio, human blastocyst ploidy status and artificial intelligence (AI)-based morphokinetics embryo selection?

Summary answer:
Embryo selection based on morphological evaluation by time-lapse system (TLS) with AI technology could lead to a female-biased sex ratio of resulting newborns.

What is known already:
As of now, there have been only limited attempts to evaluate how AI-based TLS embryo selection for priority transfer could affect the male-to-female sex ratio in the human population, and the results of different publications have been contradictory. However, the morphokinetic assessment was made without calculating the embryos’ KID Score (Embryos with Known Implantation Data), which significantly improves and speeds up the decision-making process.

Study design, size, duration:
This is a monocentric, retrospective study from October 2019 to December 2021 including 251 blastocysts with PGT-A results. Embryos were cultured in a time-lapse incubator (EmbryoScope, Vitrolife) up to the time of trophectoderm biopsy. All embryos were evaluated based on the KIDScore™ D5 algorithm (Vitrolife) under routine supervision by experienced embryologists. The PGT-A results were obtained using a next-generation sequencing (NGS) platform from the Medical Genomics LLC laboratory (Illumina MiSeq, Illumina).

Participants/materials, setting, methods:
Sample size was 251 embryos from 101 women (mean female age was 36.0 ± 5.6 years). All embryos were divided into four groups in accordance with their final KID score: <2.5 (n=7), 2.6-5.0 (n=33), 5.1-7.5 (n=123) and >7.5 (n=88). Embryos with sex chromosome abnormalities were also included in the analysis to assess their frequency of occurrence in embryos with low and high KID scores.

Main results and the role of chance:
As expected, the percentage of aneuploid blastocysts, as well as the rate of sex chromosome abnormalities, decreased with increasing embryo KID score. The highest male-female sex ratio among all embryos was observed for the group with KID score <2.5 (1.33), and gradually decreased to values of 0.92 and 0.74 in groups with KID score 5.1-7.5 and >7.5, respectively. At the same time, the male-female sex ratio among euploid blastocysts was highest in the group with KID score 2.6-5.0. These data contradict the results of some other studies, which reported faster development of male embryos (which should mean a higher KID score). However, the KID score was not evaluated in those studies, and thus their results cannot be directly compared to ours.

Limitations, reasons for caution:
Most patients in this study had a complicated reproductive history, with repeated failures in IVF programs, often with arrested embryo development. Also, the present investigation is retrospective. A subsequent multicenter study with a larger sample size and cross-center validation of embryologist-performed annotation is planned.

Wider implications of the findings:
The data obtained do not allow us to establish a female-sex prevalence among embryos. Nevertheless, further accumulation of knowledge about the relation between KID Embryo Score and embryo sex could be used for presumptive sex determination in special cases with sex-linked diseases, where poor embryo morphology does not permit biopsy for genetic analysis.

Keywords:
Artificial Intelligence
time-lapse technology
male-female sex ratio
KID Score
Impact of Direct Unequal Cleavage (DUC) on embryo development, blastocyst formation and ploidy – artificial intelligence (AI) analysis.
A. Florek1, R. Odia1, S. Theodorou2, M. Duran2, W. Saab2, V. Seshadri2, P. Serhal2, C. Hickman3, A. Brualla Mora3, R. Derrick3, M. Gaunt1.
1The Centre for Genetic & Reproductive Health CRGH, Embryology, London, United Kingdom.
2The Centre for Genetic & Reproductive Health CRGH, Clinical, London, United Kingdom.
3Fairtility, n/a, Tel Aviv-Yafo, Israel.
Study question:

Do DUCs significantly impact embryo development? In particular, morphokinetics, grading, and Pre-implantation Genetic Testing for Aneuploidy (PGT-A) outcome? Is this analysis corroborated by artificial intelligence?

Summary answer:
DUC embryos develop slower, have lower rates of blastulation and lower CHLOE (Fairtility) scores for blastulation and implantation. However, occasionally euploid blastocysts form from DUCs.

What is known already:
Time-lapse technology enables the identification of DUCs during embryo development. Previous research associates DUCs with poorer blastulation, implantation, and ploidy outcomes. However, DUCs are not routinely annotated in all clinics. Some algorithms deselect embryos with short second cell cycles; hence, DUCs are rarely transferred. Whether DUC embryos should be automatically discarded or deprioritised is an ongoing debate, which leads to inconsistency in clinical practices across fertility centres. AI image processing algorithms may assist embryologists in the identification of DUCs.

Study design, size, duration:
A retrospective single-centre study of normally-fertilised embryos cultured in time-lapse incubators throughout 2019–2021. We reviewed 9284 time-lapse videos using an AI image processing tool (CHLOE, Fairtility), and assessed DUC embryo outcomes (ploidy, blastulation, and blastocyst quality). Additionally, we analysed pronuclei data searching for possible causes of DUCs.

Participants/materials, setting, methods:
CHLOE (Fairtility) software analysed time-lapse videos, identifying pronuclei, DUCs, and blastulation; recording all morphokinetic time points (tPNa, tPNf, t2, t3, t4, t5, t6, t7, t8, t9, tM, tSB, tB, tEB), morphological grades for the inner cell mass (ICM) and trophectoderm, and blastocyst size at 116hpi; and assessing the likelihood of blastulation (at 30hpi) and implantation. We evaluated the statistical significance for all variables using t-tests (continuous variables) and chi-squared tests (categorical variables). We quantified the two pronuclei (2PN) detection efficacy using four metrics: accuracy, sensitivity, specificity, and informedness.

Main results and the role of chance:
Of all the embryos analysed (n=9284), 35% showed DUCs (n=3269). Blastulation was significantly higher in non-DUC versus DUC embryos (76% and 49%, p<0.0001). Of the embryos that blastulated, ICM quality (A,B,C,D: 24%,13%,19%,21% and 3%,4%,16%,47%, p<0.001) and trophectoderm quality (20%,21%,15%,23% and 2%,7%,14%,52%, p<0.0001) were significantly higher in non-DUC than in DUC embryos.

As defined, DUC embryos were significantly quicker at reaching t3 than non-DUC embryos [Mean(SD): 34(15) and 39(10), p<0.0001], with the minimum times being 4hpi and 13hpi respectively. Interestingly, there was no significant difference in achieving t5 [52(21) and 51(12), NS]. For all other morphokinetic milestones, DUC embryos were 6 hours slower than non-DUC embryos. DUC embryos had a euploidy rate of 27.2% (12/44). Only one DUC embryo was transferred, in a double embryo transfer cycle, leading to a negative outcome.

Implantation score [0.14(0.24) and 0.46(0.36), p<0.0001] and blastulation score [0.4(0.46) and 0.75(0.4), p<0.0001] were lower for DUC embryos than for non-DUC embryos.

CHLOE automatic PN assessment agreed with human annotation in 92% of cases (TP=388, TN=5, FP=29, FN=7). CHLOE blastocyst prediction at 30hpi had an AUC of 0.89. The embryologist agreed on 97% of all 483 embryos that CHLOE classified as DUC. Discrepancies arose from CHLOE misclassifying fragments as blastomeres. Further studies are warranted.
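
The four metrics named in the methods (accuracy, sensitivity, specificity and informedness) follow directly from the confusion-matrix counts reported above. The sketch below uses the standard textbook definitions and assumes that "positive" means a 2PN call. Neither the function name nor that labelling is specified in the abstract.

```python
def binary_metrics(tp, tn, fp, fn):
    """Standard metrics derived from a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "informedness": sensitivity + specificity - 1,  # Youden's J
    }

# Counts as reported in the abstract (TP=388, TN=5, FP=29, FN=7).
metrics = binary_metrics(tp=388, tn=5, fp=29, fn=7)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The accuracy of about 0.92 matches the reported 92% agreement. Because only 34 of the 429 cases are negatives under this labelling, specificity and informedness depend on very few embryos and can tell a different story from accuracy alone.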

Limitations, reasons for caution:
Differentiating between fragments and blastomeres within the 5 hours from the first division proves challenging for embryologists and, especially, AI algorithms. Hence, some embryos’ DUC status may be misclassified. Additionally, our sample sizes are limited and larger sizes are needed to corroborate our findings, especially those pertaining to ploidy status.

Wider implications of the findings:
DUC embryos are associated with poorer outcomes, and DUC status should be integrated into embryo classification frameworks. Nevertheless, some DUC embryos prove to be euploid. Hence, DUC embryos should not be excluded from culture at cleavage stage and should instead be allowed to reach blastocyst stage before their suitability for transfer/vitrification/PGT-A is assessed.

Keywords:
Direct Unequal Cleavage
morphokinetics
Pre-implantation Genetic Testing for Aneuploidy (PGT-A)
artificial intelligence (AI)
Can AI be used as a tool in the evaluation of the risk of pregnancy loss after euploid single embryo transfer?
B. Yuksel1, G. Ozkara1, H. Yelke1, B. Okten1, A. Brualla Mora2, R. Derrick2, I. Erlich2, G. Ozer1, Y. Kumtepe1, M. Aygun1, S. Kahraman1.
1Istanbul Memorial Sisli Hospital, ART and Reproductive Genetics Center, Istanbul, Turkey.
2Fairtility, Consultant, Tel Aviv, Israel.
Study question:

Can AI be used as a tool in the evaluation of the risk of pregnancy loss after euploid embryo transfer?

Summary answer:
AI-annotated tSB (time to start blastulation) and tB (time to formation of full blastocyst) were predictive of miscarriage and live birth.

What is known already:
Despite the advantage of PGT-A in preventing miscarriages, a pregnancy loss still can occur after the transfer of a chromosomally normal embryo. Unfortunately in the literature there are no clear criteria indicating which morphologically good euploid embryos may be at risk of resulting in pregnancy loss.

Study design, size, duration:
Retrospective cohort analysis of 455 euploid embryos allowing for the analysis of a range of variables for prediction of live birth or miscarriage from ICSI cycles, that were cultured in the Embryoscope (Vitrolife) at a single clinic (Istanbul Memorial Hospital, ART and Reproductive Genetics Center), and transferred between 2017-2020. This is the largest reported AI study to date predicting outcome in euploid SETs.

Participants/materials, setting, methods:
Patients were aged between 24-44 years. Each morphokinetic feature was annotated manually and by CHLOE (Fairtility), and pregnancy outcomes were evaluated.

Main results and the role of chance:
When annotated using AI, the average time (mean±standard deviation (SD)) for tSB (98±7 vs 97±7, p<0.05) and tB (106±7 vs 105±, p=0.02) were significantly longer in patients who miscarried compared to those that did not.

Implantation and blastulation scores were not significantly predictive of clinical miscarriage (0.77 ± 0.22 vs 0.74 ± 0.24, NS and 0.97±0.15 vs 0.97±0.16, NS for live birth and miscarriage, respectively).

Embryos that miscarried and embryos that led to live birth had an equal proportion of direct unequal cleavage (DUC) as assessed by CHLOE (6.8% vs 6.8%, NS). There was no significant association between the presence of DUC and the pregnancy outcome (miscarriage rate in the presence or absence of DUC was 16.1% vs 16.3%, NS).

Limitations, reasons for caution:
Retrospective data using embryos selected for transfer.

Wider implications of the findings:
AI-annotated tSB and tB can be added to the already existing range of available evaluation methods for embryo viability and functions which can predict the risk of miscarriage after a euploid embryo transfer.

Keywords:
euploid embryo
miscarriage
Artificial Intelligence
time lapse morphokinetics
A validation study for artificial intelligence (AI) compared with manual annotation, using donor eggs reveals that AI accurately predicts blastulation
J. Teruel Lopez1, C. Miret Lucio1, M. Lozano Zamora1, M. Escribá Suarez1, M. Benavent Martínez1, J. Crespo Simó2, I. Erlich3, M. Tran3, N. Bergelson3.
1Equipo Médico Crespo, IVF Laboratory, Valencia, Spain.
2Equipo Médico Crespo, Medical Director, Valencia, Spain.
3Fairtility, Clinical, Tel Aviv, Israel.
Study question:

Are the annotations produced by AI comparable to manual annotations? Does AI accurately assess fertilisation checks, and predict embryo usage and blastulation compared to embryologists?

Summary answer:
Automatic annotations by AI were consistent with manual annotations. AI implantation algorithms had strong prediction of blastulation and embryo usage.

What is known already:
Currently, embryos are manually annotated for specific morphokinetic features during embryo development. This is a labour-intensive process that depends on training and experience, leading to inter- and intra-clinic variation.

The decision to transfer, freeze or discard embryos relies heavily on these annotations. It is paramount that we develop a tool that will provide consistency and accuracy in annotation and produce scores that can facilitate decisions around embryo usage.

AI has demonstrated its potential to achieve this, but first must be validated before its integration into clinical practice. There have been no such studies demonstrating this so far.

Study design, size, duration:
Retrospective cohort study that took place between September and December 2021 at a private fertility clinic in Spain. To control for embryo variability, this study only included 179 time-lapse videos for embryos created from donor eggs. This was based on the understanding that donor eggs are more likely to produce better quality blastocysts and embryos and thus will give the most favourable conditions for annotation in a validation framework.

Participants/materials, setting, methods:
The same time-lapse cultured embryos were annotated manually and automatically by CHLOE (Fairtility), an AI-based tool. Manual and CHLOE annotations were compared to assess the strength of agreement using (i) intra-class correlation (ICC), and (ii) the proportion of corrections required at the pronuclei (PN) stage. AI accuracy in predicting blastulation at 30 hours, and blastulation before 116 hours, was also assessed using AUC as the efficacy metric. Embryo usage was compared with the AI-generated ranking of embryos.

Main results and the role of chance:
The majority of morphokinetic variables showed very strong agreement, with an ICC range of 0.81-1.00, namely tPNf, t2, t3, t5, t7, tSB, tB and tEB. Only t4 (0.5) showed moderate agreement. On average (mean ± standard deviation), AI annotated t4 later than embryologists (36±5 vs 39±10 hours). All other variables fell within the strong ICC range (0.61-0.8). There were no very weak (0-0.2) or weak (0.21-0.4) variables. PN agreement between AI and embryologists was 93%: PNs had to be corrected by an embryologist only 7% (n=179) of the time.
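
The agreement bands used above can be expressed as a small lookup, sketched below. The very weak, weak, strong and very strong cut-offs are those quoted in the abstract. The moderate band (above 0.4 up to 0.6) is inferred from the gap between them and from t4 (0.5) being described as moderate. The function name is hypothetical.

```python
def icc_band(icc):
    """Map an intra-class correlation value to the agreement label used in the abstract."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC expected in the range 0-1 for this banding")
    if icc <= 0.2:
        return "very weak"
    if icc <= 0.4:
        return "weak"
    if icc <= 0.6:
        return "moderate"  # ASSUMPTION: band inferred, not stated explicitly
    if icc <= 0.8:
        return "strong"
    return "very strong"

print(icc_band(0.5))   # the reported t4 value
print(icc_band(0.81))  # lower edge of the range reported for tPNf, t2, t3, ...
```

Banding of this kind is a reporting convention, so the labels are only as meaningful as the cut-offs chosen.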

AI predicted blastulation on day 3 with a high level of sensitivity (0.77) and specificity (0.82) (AUC: 0.84, p<0.0001). Furthermore, the blastulation score given on day 3 was a predictor of blastulation before 116 hours, with a high sensitivity (0.77) and specificity (0.80) (AUC: 0.81, p<0.0001).

Similarly, AI-generated ranking correlated with embryologist decisions to freeze, transfer or discard embryos, with an overall high sensitivity (0.88) and specificity (0.67) (AUC: 0.84, p<0.0001). A rank of 1 was seen in 14% (n=113) of embryos, all of which were frozen or transferred. Some embryos that scored a rank of 2 were discarded, but this proportion was significantly lower than for those that scored a rank of 3 or more (3% vs 32%, p=0.0004).

Limitations, reasons for caution:
This study only included embryos from donor eggs. Furthermore, this study occurred at a single site and is planned to be replicated at several clinics. Where there are discrepancies between human and AI, further studies are required to determine the ground truth.

Wider implications of the findings:
This study demonstrates an AI framework to safely introduce AI in the fertility clinic. AI will accurately annotate embryos and give reliable scores to predict good quality blastulation, and inform decisions around embryo usage determination. AI provides a time-effective, objective tool in decision-making, with the potential to optimise success.

Keywords:
Artificial Intelligence
Validation
Embryo
blastulation
morphokinetic
An artificial intelligence (AI) deselection model for top-quality blastocysts: algorithmic analysis of morphokinetic features for aneuploidy may increase implantation rates
D. Gilboa1, L. Bori2, M. Shapiro1, A. Pellicer3, R. Maor1, A. Delgado2, D. Seidman1, M. Meseguer2.
1AiVF Ltd., IVF Research and Development, Raanana, Israel.
2IVI RMA Valencia, IVF Laboratory, Valencia, Spain.
3IVI Foundation-IIS La Fe, Research and Innovation, Valencia, Spain.
Study question:

Can an AI deselection model identify distinct morphokinetic patterns in top-quality blastocyst with unknown ploidy that fail to implant?

Summary answer:
An AI based deselection model was able to predict implantation failure based on morphokinetic features previously found to associate with aneuploidy.

What is known already:
Aneuploidy is the most common explanation for implantation failure of high-quality blastocysts. Yet, high-quality blastocysts with unknown ploidy that fail to implant are often morphologically indistinguishable from blastocysts that succeed to implant. Our previously published results (ESHRE 2021) demonstrated that aneuploid blastocysts were more likely to reach development events (t2-t8) later, and that the timing between each event was statistically longer (p<0.001), when compared to euploid embryos. Given that delayed morphokinetic rates are tightly linked to ploidy, we investigated whether similar known morphokinetic features were associated with implantation failure in top-graded embryos.

Study design, size, duration:
Time-lapse sequences of 3,259 top-quality blastocysts from fresh single embryo transfer cycles with known implantation outcomes were analyzed using an AI-based algorithm. The algorithm utilized convolutional neural network extracted temporal features based on multiple morphokinetic parameters known to associate with ploidy.

Participants/materials, setting, methods:
Time-lapse sequences and morphokinetic events were algorithmically analyzed to measure the rate of mitotic division events and to compare the number of embryos in each category (implanted/non-implanted) that reached each developmental event at least one standard deviation (SD) later than the mean for implanted embryos.

Main results and the role of chance:
Results showed statistical differences in the following morphokinetic features between the two categories: t2, t3, t4, and t3-t4 (p<0.05). Implanted top-graded blastocysts were likely to reach t2, t3, and t4 after 25.23 ± 3.8, 36.06 ± 3.4, and 37.14 ± 3.6 hours (mean ± SD), respectively. The time gap between t3 and t4 was found to be 12.25 ± 5.31 hours. Given this, we followed the methodology described above to propose cutoff values (in hours) that differentiated between non-implanted and implanted top-graded blastocysts based on their morphokinetic profiles. Implantation failure was found to be associated with reaching t2 after 28.61 hours (OR=2.36, CI 0.96-5.77), t3 after 39.46 hours (OR=3.48, CI 1.62-7.47), and t4 after 40.79 hours (OR=2.23, CI 1.09-4.53). A time gap between t3 and t4 of more than 17.56 hours was also associated with implantation failure (OR=2.48, CI 1.12-5.48), indicating perturbed mitotic activity. The cutoff values proposed here were incorporated into the algorithm for optimized deselection of morphologically similar top-quality blastocysts with delayed morphokinetic profiles.
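As a rough illustration of how the reported cutoffs could be applied, the sketch below flags an embryo's delayed features. The cutoff values are the ones quoted in the abstract; the function name, the dictionary layout, and the choice to compute the t3-t4 gap as t4 minus t3 are assumptions made for this example and are not taken from the authors' algorithm.

```python
# Cutoffs in hours, as reported in the abstract. The keys and the
# function below are hypothetical names used only for illustration.
CUTOFFS_H = {"t2": 28.61, "t3": 39.46, "t4": 40.79, "t3_t4_gap": 17.56}


def delayed_features(t2, t3, t4):
    """Return the names of features whose value exceeds the reported cutoff.

    Assumption: the t3-t4 gap is taken as t4 - t3; the abstract does not
    spell out how the gap was measured.
    """
    observed = {"t2": t2, "t3": t3, "t4": t4, "t3_t4_gap": t4 - t3}
    return [name for name, value in observed.items() if value > CUTOFFS_H[name]]


# Hypothetical embryos (times in hours), not data from the study.
on_time = delayed_features(25.0, 36.0, 37.5)
late = delayed_features(30.2, 41.0, 43.0)
print(on_time, late)
```

An embryo with a non-empty list would be a candidate for deselection under this simplified reading; the abstract does not say how many delayed features the model requires.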

Limitations, reasons for caution:
This study needs to be validated on a larger, multi-centric dataset that takes into account more morphokinetic features associated with ploidy in order to increase the robustness of our algorithm.

Wider implications of the findings:
The algorithmic model proposed here demonstrates, for the first time, the utility of an AI tool to deselect top-graded blastocysts that would otherwise be selected for transfer based on conventional morphologic assessment alone.

Keywords:
Artificial Intelligence
PGT
morphokinetic
time lapse
Potential for improvement and current limitations of Artificial Intelligence (AI) for embryo selection: analysis of external validation data
I. Sfontouris1, D. Nikiforaki1, S. Liarmakopoulou1, A. Sialakouma1, A. Koutsi1, A. Polia1, M. Belmpa1, S. Theodoratos2, J. Walker2, E. Makrakis1.
1Hygeia IVF – Embryogenesis, Embryology Laboratory, Athens, Greece.
2IVF Vision Limited, IVF Vision Limited, Cambridge, United Kingdom.

Study question:

What are the prospects of improvement and the limitations of an AI system for embryo selection?

Summary answer:
The predictive performance of AI can be enhanced by including additional factors, on top of embryo images, and by assessing images with centered blastocysts.

What is known already:
We previously reported the external validation of IVFvision.ai, an AI algorithm that differentiates between Day-5 blastocysts with a positive or negative implantation outcome. IVFvision.ai had higher AUC and overall accuracy in predicting implantation compared to KIDScoreD5 and senior embryologists. Here we report a secondary analysis of external validation data, focusing on a) the improvement of the predictive ability of IVFvision.ai by incorporating data from additional sources, and b) the impact of the blastocyst image quality on the performance of IVFvision.ai.

Study design, size, duration:
This is a secondary analysis of external validation data. External validation of IVFvision.ai was performed at a University IVF Clinic using 113 anonymised Embryoscope images of single D5 blastocyst transfers with known implantation outcome.

Participants/materials, setting, methods:
The ability of IVFvision.ai and three senior embryologists to correctly classify blastocysts according to implantation outcome was compared between images in which the whole blastocyst was visible (centred blastocysts, n=62) and images in which part of the blastocyst was not visible (off-centred blastocysts, n=51). Logistic regression models were created: a) IVFvision alone, b) IVFvision+age, c) IVFvision+fertilisation_method, d) IVFvision+KIDScoreD5, e) IVFvision+age+Fertilisation_method+KIDScoreD5. The AUC of each model in predicting implantation was estimated using ROC curve analysis.

Main results and the role of chance:
The AUCs of IVFvision.ai (0.675 vs 0.432), Embryologist 1 (0.570 vs 0.390), Embryologist 2 (0.663 vs 0.448) and Embryologist 3 (0.628 vs 0.485) were higher for images with centred blastocysts than for images with off-centred blastocysts. There was a progressive increase in AUC with the addition of more factors to the predictive models: a) IVFvision alone: AUC=0.675, b) IVFvision+age: AUC=0.675, c) IVFvision+KIDScoreD5: AUC=0.721, d) IVFvision+fertilisation_method: AUC=0.740, e) IVFvision+age+Fertilisation_method+KIDScoreD5: AUC=0.768.
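For readers less familiar with the metric, the following minimal sketch shows one standard way to compute an AUC: as the probability that a randomly chosen implanted blastocyst receives a higher score than a randomly chosen non-implanted one (the Mann-Whitney formulation). The scores and labels are hypothetical and the function is illustrative; it is not the ROC software used in the study.

```python
def auc(scores, labels):
    """AUC as the share of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win,
    which matches the Mann-Whitney U formulation of the ROC area.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical model scores and implantation outcomes (1 = implanted).
scores = [0.9, 0.8, 0.4, 0.35, 0.2]
labels = [1, 0, 1, 0, 0]
print(round(auc(scores, labels), 3))
```

An AUC of 0.5 corresponds to chance-level ranking, which is why values below 0.5 for off-centred images indicate no usable discrimination.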

Limitations, reasons for caution:
The retrospective nature of the study and the small sample of the study raise the need for further prospective studies with a larger number of embryos.

Wider implications of the findings:
The highest performance of IVFvision.ai is achieved in images with centred blastocysts, suggesting that implantation cannot be predicted accurately in images with non-centred blastocysts. In addition, we provide proof of concept that training AI systems using data from different sources, in addition to embryo images, may increase overall accuracy.

Keywords:
embryo selection
Artificial Intelligence
blastocyst
implantation
AI
An assessment of agreement between automated embryo annotation, through artificial intelligence, and manual embryo annotation
A. Barrie1, R. Smith1, C. Hickman2, I. Erlich2, A. Campbell1.
1CARE Fertility Ltd, CARE Fertility Nottingham, Nottingham, United Kingdom.
2Fairtility, Fairtility, Tel Aviv, Israel.
Study question:

How strong is the agreement between embryo morphokinetic annotations performed by experienced embryologists compared to an automated embryo annotation system based on artificial intelligence (AI)?

Summary answer:
Agreement between manual and automated annotation, as determined by the intraclass correlation coefficient (ICC), was strong or very strong for all analysed morphokinetic variables.

What is known already:
Transitioning from time-lapse imaging to embryo selection for transfer, freezing or discard involves annotation; the action of converting images to numerical data. Numerical data can be used as input to selection models quantifying embryo viability. Currently, embryos are manually annotated by the embryologist which can be subjective and time-consuming. As such, clinics prioritise a manageable number of variables to annotate, leading to a range of clinic practices. There is the additional challenge of operator variation, despite the development of standardised definitions and quality assurance schemes. AI may help resolve these challenges.

Study design, size, duration:
Retrospective comparative analysis, including 2442 embryos from IVF and ICSI cycles, from four private fertility clinics belonging to the same group in the UK. All the embryos cultured in a time-lapse incubator (EmbryoScope, Vitrolife) between January 2016 and 2019 were included in the study. Manual annotations (MA) and automated annotations (AA) were compared using a two-way, mixed intraclass correlation coefficient (ICC), which produced five categories of agreement: very weak (0-0.20), weak (0.21-0.40), moderate (0.41-0.60), strong (0.61-0.80) and very strong (0.81-1.00).
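The five agreement categories can be expressed as a simple lookup. The sketch below is illustrative only: the function name is hypothetical, and assigning values that fall between the published bands (for example 0.205) to the next band up is an assumption, since the abstract lists the bands to two decimal places.

```python
def agreement_category(icc):
    """Map an ICC value to the agreement bands quoted in the abstract."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC expected in the range 0-1 for this banding")
    if icc <= 0.20:
        return "very weak"
    if icc <= 0.40:
        return "weak"
    if icc <= 0.60:
        return "moderate"
    if icc <= 0.80:
        return "strong"
    return "very strong"


# Hypothetical ICC values, not results from the study.
for value in (0.15, 0.5, 0.8, 0.81):
    print(value, agreement_category(value))
```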

Participants/materials, setting, methods:
Videos were manually annotated by experienced embryologists from pronuclei fading (tPNf) to time of expanded blastocyst (tEB) with all cell stages annotated in between (time to two-cell (t2), three-cell (t3), four-cell (t4), five-cell (t5), six-cell (t6), seven-cell (t7), eight-cell (t8), nine-cell (t9), morula (tM), start of blastulation (tSB) and full blastocyst (tB)). Blind to human annotations, and without any training, the same videos were annotated by CHLOE (Fairtility) to produce automated annotation data.

Main results and the role of chance:
Of the expected annotations, AA did not provide a result for 15.4% of the MA (3235/21008). Very strong agreement (0.81-1.00) between MA and AA was found for tPNf, t2, t3, t5, t6, tM, tSB, tB and tEB. Strong agreement (0.61-0.80) was found for t4, t7, t8 and t9+. Outliers in the AA data, defined as one standard deviation from the MA, were interrogated further for five key morphokinetic parameters: t2, t5, t8, tSB and tB. A total of 269 outliers were identified.

For t2 outliers (n=14, 6%), the average time difference was 5.97h (range: 5.50-24.44h). All embryos with a t2 outlier were classed as either poor quality (PQ) or average quality (AQ).

The t5 outliers (n=45, 19%) had an average time difference of 2.84h (range: 9.33-36.69h). 96% (n=43) of these embryos were classed as PQ (n=25, 56%) or AQ (n=18, 40%).

Outliers for t8 (n=138, 58%) were, on average, 17.53h different between MA and AA (range: 12.68-40.35h). 94% (n=130) of these embryos were classed as PQ (n=77, 56%) or AQ (n=53, 38%).

The tSB outliers (n=28, 12%) had an average time difference of 3.58h (range: 0.71-14.39h). 89% (n=25) of these embryos were classed as PQ (n=16, 57%) or AQ (n=9, 32%).

Finally, outliers associated with tB (n=44, 18%) had an average time difference of 6.39h (range: 0.02-33.67h). 95% (n=42) of these embryos were classed as PQ (n=38, 86%) or AQ (n=4, 9%).

Almost 15% (n=40) of the embryos had outliers in more than one of the five morphokinetic parameters.

Limitations, reasons for caution:
The findings for this study reflect the capabilities of a specific AI-based annotation algorithm against the practice in multiple clinics in the same group and country. The automated annotation algorithm was not trained on this dataset prior to validation, which is encouraging for generalisability.

Wider implications of the findings:
AI is ideally suited to resolve annotation challenges. This study demonstrates that where embryo quality is poor, annotation could be skewed both when performed manually and automatically. Once robustness is demonstrated, AI tools such as CHLOE may allow clinics to process clinical data efficiently, objectively and consistently.

Keywords:
Artificial Intelligence
time lapse imaging
embryo selection
Use of Artificial Intelligence to Assess the Effects of Assisted Hatching on Embryo Development and Implantation Potential
V. Jiang1, C. Bormann1, I. Souter1, I. Dimitriadis1, M.K. Kanakasabapathy2, P. Thirumalaraju2, H. Shafiee2.
1Massachusetts General Hospital, Ob/Gyn, Boston, U.S.A.
2Brigham and Women’s Hospital, Medicine, Boston, U.S.A.
Study question:

Does the use of laser-assisted hatching (AH) on cleavage stage embryos affect in vitro preimplantation embryo development or implantation potential?

Summary answer:
There is no difference in blastocyst conversion rate or implantation potential of embryos following AH at the cleavage stage for patients under age 35 years.

What is known already:
Laser-AH is the process of creating an opening within the zona pellucida of cleavage stage embryos to facilitate biopsy of trophectoderm cells for preimplantation genetic testing (PGT). Studies have shown that patients under 35 years undergoing PGT for aneuploidy (PGT-A) have reduced pregnancy rates compared to those not undergoing biopsy. This has been attributed to the additional micromanipulation events involved with PGT-A, which may decrease the viability of embryos and compromise their implantation potential. We aimed to objectively compare the impact of AH on embryo development using an artificial intelligence (AI) algorithm trained to assess embryo quality and predict developmental fate.

Study design, size, duration:
A retrospective dataset from patients under 35 years was generated from two timepoints: cleavage stage embryos immediately before AH between 60-64 hours post insemination (hpi), and blastocyst stage embryos between 110-115 hpi prior to transfer or vitrification. Time-lapse imaging was obtained using the EmbryoScope (Vitrolife). Cleavage stage embryo images were used to train a convolutional neural network (CNN) to predict and classify the development and implantation potential of cleavage and blastocyst stage embryos.

Participants/materials, setting, methods:
Time-lapse images were collected for 1444 cleavage stage embryos spanning 189 in vitro fertilization (IVF) cycles between January 2014 – December 2021 at a single academic fertility center in Boston. Embryos were categorized into two groups: Day 3 embryos with AH (D3+AH) and without AH (D3-No AH). Each patient had a single blastocyst embryo transfer with a known outcome. Two-tailed t-tests were used to compare differences, with p-value less than 0.05 set for statistical significance.

Main results and the role of chance:
The dataset included 1035 embryos with AH (D3+AH) and 409 embryos without AH (D3-No AH). There were no differences in AI-predicted blastocyst development between Day 3 embryos with AH and without AH (64.1% vs 64.1%) or AI-predicted high quality blastocyst development rate between these two groups (43.8% vs 40.8%), respectively. On Day 5 there were no differences in the AI-categorization of embryos at the blastocyst stage between embryos with or without AH (62.3% vs 62.5%) or AI-categorization of high-quality blastocyst development (45.2% vs 41.8%), respectively. AI predicted a similar implantation potential between embryos with and without AH at the cleavage stage (61.1% vs 69.9%). When stratifying to only the embryos transferred, there were no differences in the AI-predicted blastocyst development between Day 3 embryos with AH and without AH (96.0% vs 97.1%) or in the AI-predicted high quality blastocyst development rate between these two groups (72.0% vs 82.7%). AI predicted a similar implantation potential between embryos with and without AH at the cleavage stage (72.0% vs 69.0%). These results correspond with the true clinical pregnancy rate between the AH and Non-AH groups (68.0% vs 61.9%, p=0.44).

Limitations, reasons for caution:
These retrospective findings were of patients who had time-lapse imaging of cleavage stage and blastocysts available. Additionally, we focused on high prognosis patients that were eligible for single blastocyst stage embryo transfer. Clinical pregnancy rate was examined, not spontaneous abortion or live birth rates.

Wider implications of the findings:
Utilization of AI technology allows for more objective and standardized methods for examining the impact of laboratory procedures on the developmental fate of embryos. This study demonstrated the safety of utilizing laser-assisted hatching on embryo development within this study population.

Keywords:
Artificial Intelligence
Assisted Hatching
embryo development
IVF
implantation
Usability, accuracy and cost-effectiveness of “eDiagEPU”, a medical software for early pregnancies: a retrospective study
F. Blavier1, D. Grobet2, C. Duflos3, R. Rayssiguier4, N. Ranisavljevic5, M. Duport Percier6, A. Rodriguez6, C. Blockeel7, S. dos Santos Ribeiro8, G. Faron9, L. Gucciardo9, F. Fuchs6.
1UZ Brussel, Obstetric and Prenatal Medicine, Grabels, France.
2Brussels Engineering School, Lecturer Computer Science, Brussels, Belgium.
3CHU Montpellier- Univ Montpellier, Clinical Research and Epidemiology Unit, Montpellier, France.
4CHU Montpellier, Obstetric and prenatal medicine, Montpellier, France.
5CHU Montpellier, ART-PGD Department, Montpellier, France.
6CHU Montpellier, Obstetrics and Prenatal Medicine, Montpellier, France.
7UZ Brussel University Hospital, Centre for Reproductive Medicine, Brussels, Belgium.
8IVI Lisbon, IVI-RMA Lisboa, Lisbon, Portugal.
9UZ Brussel University Hospital, Obstetrics and Prenatal Medicine, Brussels, Belgium.
Study question:

Can early pregnancies be accurately and cost-effectively diagnosed and managed using a new medical computerised tool, named “eDiagEPU”?

Summary answer:
Compared to the standard clinical approach, the retrospective implementation of “eDiagEPU” in a gynaecological emergency unit was correlated with sharper diagnoses and more cost-effective managements.

What is known already:
Early pregnancy complications are responsible for a large percentage of consultations, mostly in emergency units. Moreover, updated clinical guidelines for the management of Intrauterine Pregnancies of Uncertain Viability (IPUV) have become increasingly complex and seem to be unknown to, or misunderstood by, many practitioners. Specifically, a recently published prospective multinational survey revealed limited knowledge of early pregnancy guidelines, with 69.0% of the participants reporting incorrect management of IPUV and 86.6% misinterpreting the evolution of serum human chorionic gonadotropin (hCG).

In an attempt to aid practitioners with the diagnosis and management of early pregnancies, a software, named “eDiagEPU”, was developed.

Study design, size, duration:
A total of 780 consultations, recorded between November 2018 and June 2019 in the gynaecological emergency unit of a tertiary university hospital, were retrospectively encoded in eDiagEPU. Positive hCG, ultrasonographical visualisation of a gestational sac and/or embryo, corresponding to a gestational age of 14 weeks or less, were the inclusion criteria.

Diagnoses and managements suggested by eDiagEPU are named “eDiagnoses”. The ones provided by a gynaecologist member of the emergency department staff are called “medDiagnoses”.

Participants/materials, setting, methods:
Identical eDiagnosis and medDiagnosis were considered as correct (gold standard). During follow-up examinations, if they became both identical to a previous discrepant eDiagnosis or medDiagnosis, this previous eDiagnosis/medDiagnosis was considered as correct. Persistent discrepancies were reviewed by four double-blinded experts whose majority defined the correct eDiagnosis/medDiagnosis.

The accuracies of eDiagnoses/medDiagnoses were compared using McNemar’s Chi square test, computing diagnostic values (Sensitivity, Specificity, and predictive values) and 95% Confidence Intervals (CI). Cost reduction was also analysed.

Main results and the role of chance:
Only one of the 780 registered medical records (0.1%) lacked a datum needed for processing with “eDiagEPU”. Of the 779 consultations that could be fully encoded to obtain an eDiagnosis, 675 eDiagnoses were identical to the medDiagnoses (86.6%) and 104 were discrepant (13.4%). Of these 104, 60 reached agreement during follow-up controls, with 59 medDiagnoses eventually changing to the initial eDiagnosis (98%) and only one discrepant eDiagnosis later changing to the initial medDiagnosis (2%). A further 24 remained discrepant at all subsequent checks and 20 were not re-evaluated. Of these 44 discrepancies without identical diagnoses/managements during follow-up controls, the majority of the double-blinded experts chose 38 eDiagnoses (86%) and 5 medDiagnoses (11%), including 4 twin pregnancies whose twinness was the only discrepancy. One discrepant eDiagnosis/medDiagnosis reached no majority (2%).

In total, eDiagnoses accuracy was 99.1% (675 + 59 + 38=772 eDiagnoses out of 779 final diagnoses), vs 87.4% (675 + 1 + 5=681) for medDiagnoses accuracy (p < 0.0001). Calculating all basic costs of consultations, medications, surgeries and hospitalisations induced by medDiagnoses versus eDiagnoses, “eDiagEPU” would have saved 3 623.75 € per month.
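The accuracy figures can be reproduced directly from the counts given above, and a McNemar statistic can be sketched from the discordant cases. Note the assumptions: the abstract does not report the discordant-pair counts explicitly, so they are inferred here from the follow-up and expert-review numbers (the case with no expert majority is left out), and the uncorrected form of McNemar's test is used because the abstract does not say whether a continuity correction was applied.

```python
import math

TOTAL = 779  # consultations with a final diagnosis

# Correct diagnoses: identical + changed at follow-up + chosen by experts.
e_correct = 675 + 59 + 38
med_correct = 675 + 1 + 5


def accuracy_pct(correct, total=TOTAL):
    """Accuracy as a percentage rounded to one decimal place."""
    return round(100 * correct / total, 1)


def mcnemar(b, c):
    """Uncorrected McNemar chi-square (1 df) and its p-value.

    b and c are the two discordant counts. For 1 degree of freedom the
    upper-tail chi-square probability equals erfc(sqrt(chi2 / 2)).
    """
    chi2 = (b - c) ** 2 / (b + c)
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Inferred discordant pairs (assumption, see lead-in).
b = 59 + 38  # eDiagnosis correct, medDiagnosis incorrect
c = 1 + 5    # medDiagnosis correct, eDiagnosis incorrect
chi2, p_value = mcnemar(b, c)

print(accuracy_pct(e_correct), accuracy_pct(med_correct))
print(round(chi2, 1), p_value < 0.0001)
```

Under these assumptions the result is consistent with the reported p < 0.0001, although the authors' exact computation may differ.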

Retrospectively, “eDiagEPU” was usable (99.9%), more accurate for each diagnosis except the reporting of twin pregnancies, and more cost-effective than the standard clinical approach.

Limitations, reasons for caution:
The retrospective design is a limitation, as is the quality of ultrasound interpretation. Some improvements may derive not exclusively from “eDiagEPU” but also from encoding by a rested or more experienced physician. This software cannot replace clinical and ultrasonographical skills but can improve diagnostic and therapeutic reasoning.

Wider implications of the findings:
An improved “eDiagEPU” version, considering the diagnosis and management of multiple pregnancies with their specificities (potentially multiple locations, chorioamnionicity) has been developed. Prospective evaluations will be required. Further development steps are considered, including software incorporation into ultrasound devices and integration of previously published predictive/prognostic factors (serum progesterone, corpus luteum scoring…).

Keywords:
Artificial Intelligence
ultrasound
Early pregnancy
Diagnostic and therapeutic software
An expected benefit analysis of using an interpretable machine learning model for optimizing the day of trigger during ovarian stimulation
M. Fanton1, P. Maeder-York1, E. Hariton2, O. Barash3, L. Weckstein3, D. Sakkas4, A.B. Copperman5, K. Loewke1.
1Alife Health, Alife Health, San Francisco, CA, U.S.A.
2University of California San Francisco, Department of Obstetrics, Gynecology and Reproductive Sciences, San Francisco, CA, U.S.A.
3Reproductive Science Center, IVF Laboratory, San Ramon, CA, U.S.A.
4Boston IVF, The Eugin Group, Waltham, MA, U.S.A.
5Reproductive Medicine Associates of New York, Department of Reproductive Endocrinology and Infertility, New York, NY, U.S.A.
Study question:

What is the expected benefit of using a machine learning model for predicting the optimal day of trigger during ovarian stimulation?

Summary answer:
Patients who had an optimal day of trigger had improved outcomes compared to propensity matched patients who were triggered late or early.

What is known already:
The timing of the final trigger injection in ovarian stimulation is a subjective decision that varies across clinics and providers, with limited data to support objective criteria. Many studies report that follicles too small or too large on the day of trigger are less likely to yield a mature egg, although it remains unclear on how to apply these findings towards optimizing trigger timing. Clinical decision support tools to optimize the day of trigger have recently been developed, but these existing models rely upon black-box machine learning algorithms that are unable to explain the basis for their recommendations.

Study design, size, duration:
We performed a retrospective analysis of patients undergoing autologous IVF cycles from 2014 – 2020 (n=30,278) at three different IVF clinics in the United States. Data were split into train (70%), validation (10%), and test (20%) sets. The primary outcomes were the average number of MIIs, 2PNs, and usable blastocysts.

Participants/materials, setting, methods:
Linear regression models were trained to predict MIIs retrieved if triggering today, MIIs if triggering tomorrow, and next-day E2 levels using follicle counts and estradiol levels. A trigger day recommendation algorithm evaluated each patient’s stimulation records day-by-day in order to compare MII outcomes if triggering today vs. tomorrow. If the predicted number of MIIs showed an increasing trend, the recommendation was to continue stimulation; for a decreasing trend, the recommendation was to trigger.
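The day-by-day decision rule described above can be sketched as follows. This is a simplified reading of the abstract, not the authors' code: the function names and the input layout are hypothetical, the predicted MII values are made up, and treating equal predictions as a signal to trigger is an assumption because the abstract only covers increasing and decreasing trends.

```python
def recommend(mii_if_today, mii_if_tomorrow):
    """Continue while tomorrow's predicted MII count exceeds today's.

    Assumption: equal predictions are treated as "trigger".
    """
    return "continue" if mii_if_tomorrow > mii_if_today else "trigger"


def first_trigger_day(daily_predictions):
    """Return the first stimulation day on which the rule says to trigger.

    daily_predictions is a list of (day, mii_if_today, mii_if_tomorrow)
    tuples in chronological order. Returns None if the rule never fires.
    """
    for day, today, tomorrow in daily_predictions:
        if recommend(today, tomorrow) == "trigger":
            return day
    return None


# Hypothetical model outputs for stimulation days 8 to 10.
predictions = [(8, 6.0, 7.5), (9, 7.4, 8.1), (10, 8.2, 7.9)]
print(first_trigger_day(predictions))
```

Comparing the recommended day with the actual trigger day is what allows cycles to be labelled as early, on-time or late in the results.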

Main results and the role of chance:
The linear regression model for predicting MIIs on the day of trigger had a mean absolute error (MAE) of 2.87 oocytes and an R² of 0.64, and the model for predicting next-day MIIs had an MAE of 3.02 oocytes and an R² of 0.62. Next-day E2 levels were predicted with an MAE of 274 pg/mL and an R² of 0.88. Our model coefficients indicate that follicles 14-15mm and 16-17mm in diameter were most important for predicting MIIs if triggering today, while small follicles (<=10mm) and large follicles (>19mm) were least important. For predicting next-day MIIs, follicles 11-13mm were most important, while follicles >19mm remained the least important. Possible early and late triggers were identified in 48.7% and 13.8% of cycles, respectively, by comparing the actual day of trigger to what the model recommended. After propensity score matching, patients with early triggers had on average 2.3 fewer MIIs, 1.8 fewer 2PNs, and 1.0 fewer blastocysts compared to matched patients with on-time triggers, and patients with late triggers had on average 2.7 fewer MIIs, 2.0 fewer 2PNs, and 0.7 fewer blastocysts compared to matched patients with on-time triggers.

Limitations, reasons for caution:
The primary limitation is the retrospective nature of this study. Further, we did not differentiate between different trigger medications or types of protocols. Some cycles in our dataset had incomplete or missing data, which were excluded from analysis, and could have introduced sampling bias.

Wider implications of the findings:
Our results suggest that an interpretable machine learning model can help optimize the day of trigger for increasing MII outcomes in a significant number of patients. Future work will include continuing to increase the diversity of our dataset and performing validation studies to show improved outcomes with model use.

Keywords:
ovarian stimulation
ART
machine learning
Artificial Intelligence
A combined expected benefit analysis of using two machine learning models for optimizing starting gonadotropin dose and day of trigger during ovarian stimulation
M. Fanton1, J. Tang1, P. Maeder-York1, E. Hariton2, O. Barash3, L. Weckstein3, D. Sakkas4, A. Copperman5, K. Loewke6.
1Alife Health, Alife Health, Cambridge, U.S.A.
2University of California San Francisco, Department of Obstetrics, Gynecology and Reproductive Science, San Francisco, U.S.A.
3Reproductive Science Center Bay Area, Reproductive Science Center Bay Area, San Ramon, U.S.A.
4Boston IVF – The Eugin Group, Boston IVF – The Eugin Group, Waltham, U.S.A.
5Reproductive Medicine Associates of New York, Reproductive Medicine Associates of New York, New York, U.S.A.
6Alife Health, Alife Health, California, U.S.A.
Study question:
What is the combined expected benefit of using machine learning algorithms to optimize the starting gonadotropin dose and day of trigger during ovarian stimulation?

Summary answer:
Patients who had an optimal starting dose and optimal day of trigger had significantly improved outcomes compared to propensity matched patients who did not.

What is known already:
Choosing the starting dose of follicle-stimulating hormones (FSH) and deciding when to inject the final trigger shot are two critical decisions made during an ovarian stimulation protocol. Although studies have investigated the effect of these decisions on patient outcomes, in practice, they remain subjective and can vary significantly across providers. Recently, machine learning techniques to support these decisions have been investigated, providing evidence that following model recommendations can improve outcomes. However, the combined effect of multiple clinical decision support tools on patient outcomes has not been studied.

Study design, size, duration:
We performed a retrospective analysis of patients undergoing autologous, non-cancelled IVF cycles from 2014 – 2020 (n=15,522) at three different IVF clinics in the United States. The primary outcomes were the average number of MIIs, 2PNs, and usable blastocysts in relation to total doses of FSH.

Participants/materials, setting, methods:
To select the optimal starting FSH dose, a K-nearest neighbor model identified 100 similar patients and a dose response curve was created by plotting the number of MIIs retrieved relative to starting FSH across all neighbors. To select the optimal trigger day, linear regression models used daily follicles sizes and estradiol levels to predict MIIs retrieved today versus tomorrow, and a trigger day was identified by looking at day-by-day predicted MII trends.
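A minimal sketch of the nearest-neighbour dose-response idea is given below. It makes several assumptions not stated in the abstract: the patient features (age and AMH, unscaled), the Euclidean distance, the tiny neighbour count, and the rule of choosing the dose with the highest mean MII yield are all hypothetical, as are the records themselves. The study used 100 neighbours; 5 are used here only to keep the example small.

```python
import math
from collections import defaultdict


def nearest_neighbors(patient, history, k):
    """Return the k records whose features are closest to the patient.

    history is a list of (features, starting_dose_iu, miis_retrieved).
    """
    return sorted(history, key=lambda rec: math.dist(patient, rec[0]))[:k]


def dose_response(neighbors):
    """Mean MIIs retrieved per starting dose across the neighbours."""
    by_dose = defaultdict(list)
    for _, dose, miis in neighbors:
        by_dose[dose].append(miis)
    return {dose: sum(v) / len(v) for dose, v in sorted(by_dose.items())}


def best_dose(curve):
    """Dose with the highest mean MII yield (illustrative rule only)."""
    return max(curve, key=curve.get)


# Hypothetical records: ((age, AMH), starting FSH dose in IU, MIIs).
history = [
    ((30, 3.0), 150, 10), ((31, 2.8), 150, 12), ((32, 3.1), 225, 14),
    ((30, 2.9), 225, 16), ((29, 3.2), 300, 13), ((45, 0.5), 300, 3),
]
curve = dose_response(nearest_neighbors((30, 3.0), history, k=5))
print(curve, best_dose(curve))
```

In practice features would need scaling before distances are computed, and a real dose-response curve would likely be smoothed rather than read off raw group means.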

Main results and the role of chance:
Across all cycles, 27% were given the recommended optimal starting FSH dose. 51% of patients were triggered earlier and 13% were triggered later than the recommendation. Combining both algorithms, 11% of patients were given both the optimal starting dose and optimal trigger day, while the remaining 89% of patients had cycles that did not follow both recommendations. Patients following both model recommendations had on average 3.2 more MIIs, 2.3 more 2PNs, and 1.2 more usable blastocysts, using 730 IU less of total FSH, compared to propensity-matched patients with cycles that did not match both recommendations.

Limitations, reasons for caution:
The primary limitation is the retrospective nature of this study. Clinicians did not use either decision support tool when planning patients’ ovarian stimulation protocol. Further, we did not differentiate between different protocol or medication types in our analyses, which will be the focus of future work.

Wider implications of the findings:
Our results suggest that following the combined recommendations of two clinical decision support tools can improve outcomes and reduce total FSH used in ovarian stimulation. Future work will include continuing to increase the diversity of our dataset and performing validation studies to show improved outcomes with model use.

Keywords:
ovarian stimulation
trigger
machine learning
Artificial Intelligence
Artificial intelligence (AI) to optimize ovarian stimulation for IVF: from stimulation protocols to total and mature oocyte yield.
I. Tur-Kaspa1, D.E. Fordham-2, O. Perl2, T. Tur-Kaspa1, S. Rosentraub2, D. Rosentraub2, A. Gershenfeld2, A.L. Polsky2, Y. Gold-Zamir2, D.H. Silver2, S. Benhiyoun1, D.P. Cohen1.
1Institute for Human Reproduction IHR, Medical, Chicago, Illinois, U.S.A.
2Embryonics, Research & Development, Tel Aviv, Israel.
Study question:

Can AI, based on demographic and clinical data, enhance IVF success by choosing the optimal controlled ovarian stimulation (COS) to maximize total and mature oocyte yield?

Summary answer:
Using patient demographics, routine preliminary blood tests and antral follicle count, this AI tool enables a personalized COS protocol to optimize total/mature oocyte yield (±3.5 oocytes).

What is known already:
Number of retrieved oocytes and mature oocyte yield are major factors in IVF success. When planning an IVF cycle, doctors decide the stimulation protocol based on certain demographics and basic clinical data. Yet, the treatment recommendations based on these parameters vary globally. AI, an umbrella term for multiple data-driven disciplines, has recently been employed to assist in various stages of the IVF process, particularly in embryo selection for transfer, for COS drug dosing, and for tracking follicular growth to better predict the trigger date. Can AI optimize treatment regimen choice and predict oocyte quantity and quality?

Study design, size, duration:
Training of the AI aimed to build a model capable of predicting mature oocyte yield for a given COS protocol. An initial retrospective, anonymized dataset of 1463 autologous cycles from 757 patients collected between 2017 and 2021 was reduced to 769 cycles with complete data from 461 patients and used as input for the AI model. A multilayer perceptron deep learning network was implemented 100 times to validate statistical predictions of total/mature oocyte yield.

Participants/materials, setting, methods:
Patients underwent IVF treatment at a private ART center using antagonist (77%), micro-dose Lupron flare (20%) or minimal-stimulation (3%) COS protocols. Demographic data including age, diagnoses, ethnicity, body mass index (BMI), anti-Müllerian hormone (AMH), Day-3 AFC, estradiol, and progesterone levels were used as input along with protocol type. An 80%, 10%, 10% split of the data was used for AI training, validation, and testing, respectively. Mean patient age was 36.8±4.3 years.
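As a rough illustration of the 80%/10%/10% partition described above, the sketch below shuffles cycle indices and slices them. The seed, the function name, and the choice to split by cycle rather than by patient are assumptions; the abstract does not say how cycles from the same patient were handled.

```python
import random

def split_80_10_10(items, seed=0):
    """Shuffle items reproducibly and return (train, validation, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(0.8 * len(shuffled))
    n_val = int(0.1 * len(shuffled))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder, so no cycle is dropped
    return train, val, test

# 769 cycles with complete data, identified here by index only.
train, val, test = split_80_10_10(range(769))
```

Because 461 patients contributed 769 cycles, a split grouped by patient would be the safer design if repeat cycles could otherwise leak information between the training and test sets.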

Main results and the role of chance:
The AI model predictions were generated from baseline data, representing the clinical scenario prior to the start of a COS cycle. The mean absolute error (MAE) for the number of mature oocytes retrieved per cycle was 3.5. To assess the role of chance, performance was compared to a random model that assigned each true mature oocyte yield to a random patient from the dataset, resulting in an MAE of 6.1 (p<0.001; Mann-Whitney U test). When predicting total oocyte count, MAE was 4.6 and 7.3 for the AI and random models, respectively (p<0.001). The Spearman correlation coefficient revealed a strong positive correlation between the actual number of mature oocytes retrieved and the mean predicted oocyte yield (0.748; p<0.0001). Using this model, a preliminary AI tool was developed which can predict total and mature oocyte yield based on different planned COS protocols.
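The comparison against a random model can be sketched as follows, assuming the random model amounts to shuffling the true yields across patients. The numbers are invented toy values, not study data, and the Mann-Whitney U test is left out.

```python
import random

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted counts."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def shuffled_baseline_mae(actual, seed=0):
    """MAE of a 'random model' that reassigns true yields to random patients."""
    shuffled = list(actual)
    random.Random(seed).shuffle(shuffled)
    return mean_absolute_error(actual, shuffled)

# Toy mature-oocyte counts (hypothetical).
actual = [4, 12, 7, 15, 9, 3]
predicted = [6, 10, 8, 12, 9, 5]

model_mae = mean_absolute_error(actual, predicted)
baseline_mae = shuffled_baseline_mae(actual)
```

The shuffled baseline keeps the overall distribution of yields intact, so any gap between the two MAE values reflects patient-specific signal rather than knowledge of typical yields.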

Limitations, reasons for caution:
The results were generated from a model trained using retrospective, single-center data. To further validate the model, additional testing is needed using data from different ART centers, with additional COS protocols, and analyses comparing the results of the first COS cycle to the subsequent cycles of the same patient.

Wider implications of the findings:
This oocyte yield prediction model, once validated, can be used by infertility specialists worldwide as a data-driven approach to individualized COS, with the potential to assist cycle programming and optimize lab preparations. Further training with data concerning COS drug dosages and complications, including ovarian hyperstimulation syndrome, will increase clinical application.

Keywords:
artificial intelligence (AI)
In vitro fertilization (IVF)
deep learning
ovarian stimulation
oocyte
Predicting the Number of Oocytes Retrieved from Controlled Ovarian Hyperstimulation with Machine Learning
J. Chambost1, C. Jacques1, T. Ferrand1, C. Hickman2, P. He2, A. Reigner3, T. Freour3.
1Apricity, AI team, Paris, France.
2Apricity, AI team, London, United Kingdom.
3CHU Nantes, service d’aide médicale à la procréation biologiste AMP-DPI, Nantes, France.
Study question:

Can machine learning predict the number of oocytes retrieved from controlled ovarian hyperstimulation (COH) using a third-party dataset without the need for a data transfer?

Summary answer:
Three machine learning models were successfully trained through the Substra Infrastructure to predict the number of oocytes retrieved from COH. No data transfer took place.

What is known already:
A critical stage in in-vitro fertilization cycles is that of COH. Due to large inter- and intra-individual variations in ovarian response, clinicians need to decide on suitable and cost-effective ovarian stimulation protocols for patients with a view to retrieving as many mature oocytes as possible, while also minimizing the risk of complications such as ovarian hyperstimulation syndrome. A number of previous studies have identified and built predictive models on factors that influence the number of oocytes retrieved during COH. Many of these studies are, however, limited in that they only consider a small number of variables in isolation.

Study design, size, duration:
This study was a retrospective analysis of 14,415 cycles performed at a single centre between 2009 and 2020. The analysis was carried out by an external data analysis team using the Substra framework. Substra enabled the data analysis team to send computer code to run securely on the centre’s on-premises server. Thus, a high level of data security was achieved as the data did not leave the centre at any point during the study.

Participants/materials, setting, methods:
The Light Gradient Boosting Machine algorithm was used to produce three predictive models: one that directly predicted the number of oocytes retrieved, and two that predicted which of a set of bins provided by two clinicians the number of oocytes retrieved fell into. The resulting models were evaluated on a held-out test set. In addition, the models themselves were analyzed to identify the parameters that had the biggest impact on their predictions.

Main results and the role of chance:
On average, the model that directly predicted the number of oocytes retrieved deviated from the ground truth by 3.80 oocytes. The model that predicted the first clinician’s bins deviated by 0.73 bins whereas the model for the second clinician deviated by 0.63 bins. For all models, performance was best within the first and third quartiles of the target variable, with the model underpredicting extreme values of the target variable (no oocytes and large numbers of oocytes retrieved). Nevertheless, the erroneous predictions made for these extreme cases were still within the vicinity of the true value.
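To make the "deviation in bins" figures concrete, the sketch below maps oocyte counts to ordinal bins and averages the absolute bin distance. The bin edges are hypothetical, since the clinicians' actual bins are not given in the abstract, and the predicted bins are toy values.

```python
from bisect import bisect_right

# Hypothetical bin edges (not from the study): 0, 1-4, 5-9, 10-15, 16+ oocytes.
BIN_EDGES = [1, 5, 10, 16]

def to_bin(oocytes, edges=BIN_EDGES):
    """Index of the ordinal bin that an oocyte count falls into."""
    return bisect_right(edges, oocytes)

def mean_bin_deviation(actual_counts, predicted_bins, edges=BIN_EDGES):
    """Average number of bins between the true bin and the predicted bin."""
    actual_bins = [to_bin(count, edges) for count in actual_counts]
    total = sum(abs(a - p) for a, p in zip(actual_bins, predicted_bins))
    return total / len(actual_bins)

# Toy example: true counts and a model's predicted bin indices.
actual_counts = [0, 3, 8, 12, 20]
predicted_bins = [1, 1, 2, 2, 4]
deviation = mean_bin_deviation(actual_counts, predicted_bins)
```

A deviation below 1, as reported for both clinicians' bins, means the predicted bin was on average less than one bin away from the true one.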

Overall, all three models agreed on the importance of each feature, which was estimated using SHapley Additive exPlanations (SHAP) values. The feature with the highest mean absolute SHAP value (and thus the highest importance) was serum E2 before triggering, followed by the duration of gonadotropin treatment and antral follicle count. Of the other hormonal features, baseline FSH, AMH and E2 levels were similarly important and baseline LH was least important. The treatment characteristic with the highest SHAP value was the duration of treatment, with longer periods being associated with a higher number of oocytes retrieved.
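The ranking by mean absolute SHAP value can be sketched without any SHAP library, assuming the per-sample SHAP values have already been computed. The feature names and numbers below are invented for illustration and are not the study's values.

```python
def rank_by_mean_abs_shap(shap_rows, feature_names):
    """Rank features by the mean absolute SHAP value across samples."""
    n_samples = len(shap_rows)
    importance = {
        name: sum(abs(row[j]) for row in shap_rows) / n_samples
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)

# Hypothetical feature names; one row of SHAP values per sample.
features = ["e2_pre_trigger", "stim_duration", "afc", "baseline_lh"]
shap_rows = [
    [2.0, -1.0, 0.5, 0.1],
    [-3.0, 1.5, -1.0, 0.0],
    [1.0, -0.5, 0.9, -0.2],
]

ranking = rank_by_mean_abs_shap(shap_rows, features)
ranked_names = [name for name, _ in ranking]
```

Taking the absolute value before averaging matters: a feature that pushes predictions strongly up for some patients and strongly down for others would otherwise average out to near zero.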

Limitations, reasons for caution:
The models produced in this study were trained on a cohort from a single center. They should thus not be used in clinical practice until trained and evaluated on a larger cohort more representative of the general population.

Wider implications of the findings:
The predictive models developed here may be useful in clinical practice, assisting clinicians in optimizing COH protocols for individual patients. Our work also demonstrates the promise of using the Substra framework for allowing external researchers to provide clinically relevant insights on sensitive fertility data in a fully secure, trustworthy manner.

Keywords:
Artificial Intelligence
machine learning
Oocytes
prediction
ovarian stimulation