publications
publications, conference proceedings, selected reports, citeable scientific software, ...
2026
- A Novel Model to Monitor and Measure Political and Governance Bias of News Channel's Content on YouTube: Experimental Insights and Reflections. Chopra, Saransh, Saini, Nikunj, Mishra, Deepanshu, Verma, Kritika, Kumar, Sachin, and Verma, Ajit Kumar. Under Review, 2026.
Social media platforms such as Facebook, Twitter (now X), Instagram, and YouTube have emerged as powerful ecosystems for information generation, dissemination, and consumption, engaging vast user bases across the globe. Business corporations and public institutions increasingly leverage these platforms to gauge public sentiment, influence discourse, and market products or services. Simultaneously, media organizations and journalists utilize digital platforms and web portals to propagate news, current affairs, governance initiatives, and political developments, thereby shaping public awareness and opinion. Among these platforms, YouTube has witnessed a significant user shift toward video-centric content, especially during politically sensitive periods such as elections. Despite this trend, there remains a critical research gap in quantifying the impact of YouTube-based news channels on governance narratives and ideological discourse. Notably, YouTube videos have, in several instances, triggered public outrage and mass mobilization, particularly during general and state elections in India. This study investigates the role of YouTube in ideological framing and issue-based discourse during the 2019 Indian General Elections and the 2023 Karnataka State Elections. It introduces a novel dataset comprising videos from the top 10 English news channels (ranked by TRP) over the three months leading up to each election. The research proposes a fine-tuned, text-based classification model capable of accurately and efficiently categorizing YouTube video content as pro-government, anti-government, or neutral. Furthermore, the study presents a temporal analysis of content trends across electoral cycles, highlighting shifts in thematic focus and ideological tone. The proposed model has significant implications for political parties, media strategists, and governance bodies, offering a valuable tool for monitoring public discourse, assessing policy perception, and guiding strategic communication. It also provides a framework for real-time citizen feedback on government initiatives and policy interventions.
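The classification model is described above only at a high level. As a rough, hedged illustration of what a fine-tuned three-class stance classifier looks like, the sketch below uses a Hugging Face-style transformer; the base model, label names, and input text are placeholders, not the paper's actual configuration.

```python
# Hypothetical sketch of a three-class stance classifier for video text;
# the base model and labels are illustrative, not the paper's setup.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

LABELS = ["pro-government", "anti-government", "neutral"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LABELS)
)  # would be fine-tuned on the labelled election dataset before use

def classify(text: str) -> str:
    inputs = tokenizer(text, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]

print(classify("Government launches new rural employment scheme"))
```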
2025
- A new SymPy backend for Vector: Uniting experimental and theoretical physicists. Chopra, Saransh, and Pivarski, Jim. EPJ Web Conf., 2025.
Vector is a Python library for 2D, 3D, and Lorentz vectors, especially arrays of vectors, to solve common physics problems in a NumPy-like way. Vector can currently perform numerical computations, and through this paper, we introduce a new symbolic backend that extends Vector's utility to theoretical physicists. The numerical backends of Vector enable users to create pure Python objects, NumPy arrays, and Awkward arrays of vectors. The object and Awkward backends are also implemented in Numba to leverage Just-In-Time (JIT) compiled vector calculations. The new symbolic backend, built on top of SymPy expressions, showcases Vector's ability to support a wide range of use cases and allows SymPy methods and functions to work on vector classes. Moreover, apart from a few software packages, high energy physics has maintained a strict separation between the tools used by theorists and those used by experimentalists; Vector's SymPy backend aims to bridge this gap, providing a unified computational framework for both communities.
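To make the backend split concrete, the sketch below builds one numerical object-backend vector and one symbolic SymPy-backend vector; it follows Vector's public API, though the particular values and symbols are made up.

```python
import sympy
import vector

# Numerical object backend: concrete values, physics properties on demand.
v = vector.obj(px=3.0, py=4.0, pz=0.0, E=5.0)
print(v.pt)    # 5.0, the transverse momentum
print(v.mass)  # invariant mass from (E, px, py, pz)

# Symbolic SymPy backend: the same properties as SymPy expressions.
px, py = sympy.symbols("px py", real=True)
s = vector.VectorSympy2D(x=px, y=py)
print(s.rho)   # sqrt(px**2 + py**2), a SymPy expression
```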
- Vector: JIT-compilable mathematical manipulations of ragged Lorentz vectors. Chopra, Saransh, Schreiner, Henry, Rodrigues, Eduardo, Eschle, Jonas, and Pivarski, Jim. Journal of Open Source Software, 2025.
Mathematical manipulation of vectors is a crucial component of data analysis pipelines in high energy physics, enabling physicists to transform raw data into meaningful results that can be visualized. More specifically, high energy physicists work with 2D and 3D Euclidean vectors, and with 4D Lorentz vectors that represent physical quantities such as position, momentum, and force. Given that high energy physics data are not uniform, vector manipulation frameworks or libraries are expected to work readily on non-uniform or ragged data, that is, data with variable-sized rows (or a nested data structure with variable-sized entries); the library is thus expected to perform operations on an entire ragged structure in a minimum number of passes. Furthermore, optimizing memory usage and processing time has become essential with the increasing computational demands at the Large Hadron Collider (LHC), the world's largest particle accelerator. Vector is a Python library for creating and manipulating 2D, 3D, and Lorentz vectors, especially arrays of vectors, to solve common physics problems in a NumPy-like (Harris et al., 2020) way. The library enables physicists to operate on high energy physics data in a high-level language without compromising speed. The library is already in use at the LHC and is part of frameworks, such as Coffea (Gray et al., 2023), employed by physicists across multiple high energy physics experiments.
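To make the "ragged" part concrete, the sketch below zips variable-length Awkward Arrays into momentum vectors and operates on the whole nested structure at once; the numbers are invented, but the vector.zip/register_awkward idiom is the library's documented usage.

```python
import awkward as ak
import vector

vector.register_awkward()  # give named Awkward records vector behaviors

# Two events with different numbers of muons: a ragged structure.
muons = vector.zip(
    {
        "pt": ak.Array([[42.0, 33.1], [7.8]]),
        "eta": ak.Array([[1.1, -0.8], [2.2]]),
        "phi": ak.Array([[0.3, 1.9], [-2.4]]),
        "mass": ak.Array([[0.106, 0.106], [0.106]]),
    }
)

print(muons.pt)  # ragged transverse momenta, computed in one pass

# Invariant mass of every muon pair in every event, no Python loops.
pairs = ak.combinations(muons, 2, fields=["a", "b"])
print((pairs.a + pairs.b).mass)
```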
- Design and Implementation of On-Board Autonomy for the CHESS Flight Software. Chopra, Saransh. EPFL Semester Project Report, 2025.
The space sector has recently witnessed a boom in CubeSats, a class of nano-satellites measuring 10 cm × 10 cm × 10 cm (1U) or a multiple of it. The easy accessibility and cost-effectiveness of CubeSats have enabled people from academia, industry, and other sectors to build and launch satellites as secondary payloads on a bigger launch vehicle, facilitating scientific research in space. EPFL Spacecraft Team (EST)'s flagship mission, Constellation of High-Energy Swiss Satellites (CHESS), aims to launch two 3U CubeSats on two distinct orbits (circular and elliptical) around Earth to conduct spectroscopic analysis of Earth's exospheric composition. In orbit, a CubeSat's operations are governed by its Flight Software (FS). This software is responsible for interfacing with all hardware sub-systems, managing communications with ground stations, processing telecommands and telemetry, and ensuring overall system robustness through Fault Detection, Isolation, and Recovery. The On-Board Autonomy of the FS is responsible for maintaining nominal operations in space without ground intervention, making automated decisions, and stabilizing the satellite in the event of unexpected faults. This project sets up the FS for CHESS and designs and implements the On-Board Autonomy (EventAction) for the CHESS mission. The FS implemented as part of this project will be deployed on Pathfinder 0, the first fully-integrated test satellite, planned to launch in Low Earth Orbit in 2027. Concretely, this project contributes the following to EST's CHESS mission (a sketch of the trigger-handling logic follows this list):
1. The initial infrastructure and design of the CHESS FS.
2. An implementation of EventAction as an F´ (F Prime) component.
3. A Finite State Machine governing the satellite's global operating modes and managing transitions between them.
4. A design for communication between EventAction and the different FS sub-system managers via Triggers, created by a continuous stream of state-change-worthy information.
5. A computational algorithm to process incoming triggers, make meaningful decisions, and execute appropriate responses, including transitioning to a global SAFE state or its sub-states and initiating stabilization procedures pending ground intervention.
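The sketch below conveys the decision-making shape of the trigger-driven mode machine described above. The actual EventAction is an F´ (C++) component; this Python version, with invented mode names and trigger severities, is illustrative only.

```python
# Illustrative only: the real EventAction is an F´ component in C++.
# Mode names and trigger severities here are invented for the sketch.
from enum import Enum, auto


class Mode(Enum):
    NOMINAL = auto()
    SAFE = auto()            # global SAFE state, awaiting ground intervention
    SAFE_STABILIZE = auto()  # SAFE sub-state: stabilization in progress


class EventAction:
    """Consumes a stream of triggers and drives global mode transitions."""

    def __init__(self) -> None:
        self.mode = Mode.NOMINAL

    def on_trigger(self, source: str, severity: int) -> None:
        # A trigger is any state-change-worthy report from a sub-system manager.
        if severity >= 2 and self.mode is Mode.NOMINAL:
            self.mode = Mode.SAFE  # fault detected: fall back to SAFE
        if source == "adcs" and self.mode is Mode.SAFE:
            self.mode = Mode.SAFE_STABILIZE  # attitude fault: stabilize first


ea = EventAction()
ea.on_trigger(source="eps", severity=2)   # power fault -> SAFE
ea.on_trigger(source="adcs", severity=3)  # tumbling -> SAFE_STABILIZE
print(ea.mode)  # Mode.SAFE_STABILIZE
```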
- Towards handling 10Pb/s of data through Machine Learning at CERN's Large Hadron Collider. Derme, Francesco, Fumagalli, Pietro, and Chopra, Saransh. EPFL Machine Learning for Science Project Report, 2025.
High Energy Physics (HEP) experiments, such as the Large Hadron Collider (LHC) at the European Organization for Nuclear Research (CERN), produce petabytes of data every second. Physicists are now actively integrating Machine Learning techniques into various parts of the pipeline to collect and analyze this data. Given the massive scale of these experiments, and the upcoming High Luminosity upgrade to the LHC (HL-LHC), it is becoming increasingly important to accelerate the inference of ML models beyond the supported capabilities of present-day frameworks. The System for Optimized Fast Inference code Emit (SOFIE) is a C++ library developed by CERN for fast ML inference. SOFIE parses a trained ML model into a highly-optimized C++ function, making it possible to run the inference process with minimal overhead and dependencies. CERN's Machine Learning For Experimental Physics team has recently been experimenting with adding heterogeneous computing support to SOFIE using the Alpaka library, allowing it to run inference on any device (including GPUs) while maintaining a single codebase. This paper extends SOFIE's Alpaka backend with four new kernels, and adds related tests and documentation, allowing SOFIE to support inference on GPUs for more ML models. It further benchmarks the newly added operators against PyTorch implementations to showcase the increase in performance and the readiness to be used at scale.
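For context, "parsing a trained ML model into a C++ function" with SOFIE follows the pattern of ROOT's published tutorials, sketched below; the file names are placeholders, and this snippet shows SOFIE's general usage rather than the Alpaka kernels added by this project.

```python
# Pattern from ROOT's SOFIE tutorials; file names are placeholders.
import ROOT

# Parse a trained ONNX model into SOFIE's intermediate representation.
parser = ROOT.TMVA.Experimental.SOFIE.RModelParser_ONNX()
model = parser.Parse("trained_model.onnx")

# Emit a self-contained, dependency-light C++ inference function.
model.Generate()
model.OutputGenerated("trained_model.hxx")
# The generated header compiles into any C++ pipeline; the Alpaka
# backend additionally lets the generated kernels run on GPUs.
```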
2024
- PyHEP.dev 2024 Workshop Summary Report, August 26-30 2024, Aachen, Germany. Alshehri, Azzah, Bürger, Jan, Chopra, Saransh, Eich, Niclas, Eppelt, Jonas, Erdmann, Martin, Eschle, Jonas, Fackeldey, Peter, Farkas, Maté, Feickert, Matthew, Fillinger, Tristan, Fischer, Benjamin, Gerlach, Lino Oscar, Hartmann, Nikolai, Heidelbach, Alexander, Held, Alexander, Ivanov, Marian I., Molina, Josué, Nikitenko, Yaroslav, Osborne, Ianna, Padulano, Vincenzo Eduardo, Pivarski, Jim, Praz, Cyrille, Rieger, Marcel, Rodrigues, Eduardo, Shadura, Oksana, Smieško, Juraj, Stark, Giordon Holtsberg, Steinfeld, Judith, and Warkentin, Angela. 2024.
The second PyHEP.dev workshop, part of the “Python in HEP Developers” series organized by the HEP Software Foundation (HSF), took place in Aachen, Germany, from August 26 to 30, 2024. This gathering brought together nearly 30 Python package developers, maintainers, and power users to engage in informal discussions about current trends in Python, with a primary focus on analysis tools and techniques in High Energy Physics (HEP). The workshop agenda encompassed a range of topics, such as defining the scope of HEP data analysis, exploring the Analysis Grand Challenge project, evaluating statistical models and serialization methods, assessing workflow management systems, examining histogramming practices, and investigating distributed processing tools like RDataFrame, Coffea, and Dask. Additionally, the workshop dedicated time to brainstorming the organization of future PyHEP.dev events, upholding the tradition of alternating between Europe and the United States as host locations. This document, prepared by the session conveners in the weeks following the workshop, serves as a summary of the key discussions, salient points, and conclusions that emerged.
- Predicting efficacy of antiseizure medication treatment with machine learning algorithms in North Indian population. Kaushik, Mahima, Mahajan, Siddhartha, Machahary, Nitin, Thakran, Sarita, Chopra, Saransh, Tomar, Raj Vardhan, Kushwaha, Suman S., Agarwal, Rachna, Sharma, Sangeeta, Kukreti, Ritushree, and Biswal, Bibhu. Epilepsy Research, 2024.
Purpose: This study aimed to develop a classifier using supervised machine learning to effectively assess the impact of clinical, demographic, and biochemical factors in accurately predicting antiseizure medication (ASM) treatment response in people with epilepsy (PWE).
Methods: Data were collected from 786 PWE at the Outpatient Department of Neurology, Institute of Human Behavior and Allied Sciences (IHBAS), New Delhi, India from 2005 to 2015. Patients were followed up at the 2nd, 4th, 8th, and 12th month over the span of 1 year for the drugs being administered and their dosage, the serum drug levels, the frequency of seizure control, drug efficacy, the adverse drug reactions (ADRs), and their compliance with ASMs. Several features, including demographic details, medical history, and auxiliary examinations (electroencephalogram (EEG) or Computed Tomography (CT)), were chosen to discern between patients with distinct remission outcomes. Remission outcomes were categorized into "good responder (GR)" and "poor responder (PR)" based on the number of seizures experienced by the patients over the study duration. Our dataset was utilized to train seven classical machine learning algorithms, i.e., Extreme Gradient Boost (XGB), K-Nearest Neighbor (KNN), Support Vector Classifier (SVC), Decision Tree (DT), Random Forest (RF), Naïve Bayes (NB), and Logistic Regression (LR), to construct classification models.
Results: Our research findings indicate that 1) among the seven algorithms examined, XGB and SVC demonstrated superior predictive performance for ASM treatment outcomes, with an accuracy of 0.66 each and ROC-AUC scores of 0.67 (XGB) and 0.66 (SVC) in distinguishing between PR and GR patients; 2) the most influential factors in discerning PR from GR patients are a family history of seizures (no), education (literate), and multitherapy, with Chi-square (χ2) values of 12.1539, 8.7232, and 13.620 and odds ratios (OR) of 2.2671, 0.4467, and 1.9453, respectively; and 3) our surrogate analysis revealed that the null hypothesis for both XGB and SVC was rejected at a 100% confidence level, underscoring the significance of their predictive performance. These findings underscore the robustness and reliability of XGB and SVC in our predictive modelling framework.
Significance: Utilizing XGB- and SVC-based machine learning classifiers, we successfully forecasted the likelihood of a patient's response to ASM treatment, categorizing them as either PR or GR, post-completion of standard epilepsy examinations. The classifiers' predictions were found to be statistically significant, suggesting their potential utility in improving treatment strategies, particularly in the personalized selection of ASM regimens for individual epilepsy patients.
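As a rough illustration of the modelling setup (synthetic stand-in data, not the IHBAS cohort or its features), the two best-performing algorithms could be compared as in the sketch below.

```python
# Illustrative sketch of the GR/PR classification setup; features and data
# here are synthetic stand-ins, not the study's clinical dataset.
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from xgboost import XGBClassifier

X, y = make_classification(n_samples=786, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for name, clf in [
    ("XGB", XGBClassifier(eval_metric="logloss")),
    ("SVC", SVC(probability=True)),
]:
    clf.fit(X_tr, y_tr)
    proba = clf.predict_proba(X_te)[:, 1]  # P(good responder)
    print(name, accuracy_score(y_te, clf.predict(X_te)), roc_auc_score(y_te, proba))
```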
- Computational upgrades to the high energy physics data analysis pipeline for future LHC/HL-LHC runs. Chopra, Saransh. Bachelor Thesis, 2024.
High Energy Physics experiments, such as the Large Hadron Collider at CERN, produce petabytes of data every year. Physicists require scalable and efficient scientific software to analyze and perform physics on the obtained data. The initial frameworks and scientific software developed for analyzing HEP data, such as ROOT, GEANT4, and BOOST, were written in C and C++; hence, such software had, and still has, a steep learning curve, especially for physicists with no programming background. Multiple HEP ecosystems have since emerged in languages that are comparatively easy to pick up, such as the IRIS-HEP ecosystem in Python. This thesis began as a bid to implement the remaining pieces of Automatic Differentiation (AD) in Awkward Arrays, Vector, and Coffea, but soon expanded to multiple other computational upgrades to the IRIS-HEP ecosystem. More specifically, this thesis extends the support of AD in Awkward Arrays, implements the Unified Histogram Interface for rebinning in boost-histogram, migrates Coffea's vector algebra backend to Scikit-HEP/vector, and implements a symbolic backend in Vector. The work also includes several computational upgrades specifically in Vector to serve its rapidly growing user base. Finally, this thesis includes the development of a new Python package, cuda-histogram, to support histogramming on GPUs in HEP data analysis pipelines. The work carried out in the past six months has already been integrated into the data analysis pipelines of physicists all around the globe. Furthermore, the upcoming upgrade of the Large Hadron Collider to the High-Luminosity Large Hadron Collider demands an even more fine-grained suite of software, and the work carried out during this thesis contributes to these upgrades.
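One of the thesis items, rebinning through the Unified Histogram Interface in boost-histogram, can be illustrated with the library's public slicing API; the histogram and data below are made up.

```python
import boost_histogram as bh
import numpy as np

# A 1D histogram with 10 regular bins over [0, 1), filled with toy data.
h = bh.Histogram(bh.axis.Regular(10, 0, 1))
h.fill(np.random.default_rng(0).uniform(size=1_000))

# UHI slicing: merge adjacent bins by a factor of 2 at access time.
h2 = h[:: bh.rebin(2)]
print(h.axes[0].size, "->", h2.axes[0].size)  # 10 -> 5
```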
2023
- Formalising Mathematics and Computing in Agda. Chopra, Saransh. McMasterU Visiting Research Student Report, 2023.
Type systems were originally devised to aid programmers in code development by validating variable passing, method interaction, and overall code structure to detect runtime errors at compile time. However, the Curry-Howard Isomorphism has propelled the evolution of type systems beyond their initial purpose, transforming them into tools not only for programming assistance but also for supporting mathematicians in theorem proving. This evolution has given rise to strongly typed functional programming languages, including Agda, Idris 2, Lean, and Coq, which leverage the Curry-Howard Isomorphism to act as proof assistants. Given the growing demand for strongly typed systems exhibiting this isomorphism, it is important to develop and distribute free and open-source software for researchers. This work aims to enhance Agda's standard library and prepare it for version 2.0.0. The improvements encompass refactoring the library, introducing new functions and proofs, addressing mathematical bugs, streamlining the library's dependency graph to reduce compile time, and incorporating concepts of finiteness. By undertaking these enhancements, the research contributes to the ongoing development of robust tools for both programming and formal mathematical reasoning, ultimately benefiting both the fields of computing and mathematics.
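The Curry-Howard Isomorphism mentioned above is easiest to see in a tiny example: a proposition is a type, and a proof is a program of that type. The report itself works in Agda; the equivalent one-liner below is shown in Lean purely as an illustration.

```lean
-- Curry–Howard in miniature: proving A → (B → A) amounts to writing
-- the constant function of that type (Lean 4 syntax; illustrative only).
theorem const_proof (A B : Prop) : A → (B → A) :=
  fun a _ => a
```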
- A Synergic Deep Learning Approach for Glioma Grading from Brain Tumor MRI Images. Chopra, Saransh, Sandhu, Harshvir, and Bansal, Ishan. Pre-print (somehow never got published), 2023.
Computer-aided diagnosis using deep learning approaches has made tremendous improvements in medical imaging for automatically detecting tumor area, tumor type, and grade. These advancements, however, have been limited by the facts that 1) medical images are often scarce, leading to overfitting, and 2) there is significant inter-class similarity and intra-class variation between the images. This study proposes a Synergic Deep Learning (SDL) model with AlexNet as a backbone for the automatic grading of glioma tumors. The Synergic Deep Learning architecture enables two pre-trained models to mutually learn from each other, allowing them to perform better than vanilla pre-trained models. Our study uses 417 T1-weighted sagittal tumor Magnetic Resonance Imaging (MRI) slices obtained from the REMBRANDT dataset. These 417 slices are pre-processed and augmented before they are fed into the model, which then classifies the tumor into one of three grades: oligodendroglioma, anaplastic glioma, and glioblastoma multiforme. The proposed architecture achieves an accuracy of 98.36%, showing that the model achieves excellent performance metrics even after being trained on an extremely small dataset. Finally, the proposed SDL model, trained on fewer MRI images, performs as well as or better than other models in the literature trained on larger datasets.
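As a hedged sketch of the synergic pairing idea (two backbones plus a synergic head that predicts whether a pair of images shares the same grade), the PyTorch snippet below conveys the structure; layer sizes and names are invented and do not reproduce the paper's exact architecture.

```python
# Sketch of synergic learning: two AlexNet backbones classify images while
# a synergic head checks whether an image pair shares the same grade.
# Dimensions and structure are illustrative, not the paper's exact model.
import torch
import torch.nn as nn
from torchvision.models import alexnet


class SynergicPair(nn.Module):
    def __init__(self, num_grades: int = 3) -> None:
        super().__init__()
        self.net_a = alexnet(num_classes=num_grades)
        self.net_b = alexnet(num_classes=num_grades)
        feat_dim = 256 * 6 * 6  # AlexNet feature size for 224x224 input
        self.synergic = nn.Sequential(
            nn.Linear(2 * feat_dim, 256), nn.ReLU(), nn.Linear(256, 1)
        )

    def forward(self, xa: torch.Tensor, xb: torch.Tensor):
        fa = torch.flatten(self.net_a.avgpool(self.net_a.features(xa)), 1)
        fb = torch.flatten(self.net_b.avgpool(self.net_b.features(xb)), 1)
        logits_a = self.net_a.classifier(fa)   # per-image grade predictions
        logits_b = self.net_b.classifier(fb)
        same_grade = self.synergic(torch.cat([fa, fb], dim=1))  # pair label
        return logits_a, logits_b, same_grade


model = SynergicPair()
xa = xb = torch.randn(2, 3, 224, 224)
la, lb, s = model(xa, xb)
print(la.shape, lb.shape, s.shape)  # (2, 3) (2, 3) (2, 1)
```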
2022
- liionpack: A Python package for simulating packs of batteries with PyBaMM. Tranter, Thomas G., Timms, Robert, Sulzer, Valentin, Planella, Ferran Brosa, Wiggins, Gavin M., Karra, Suryanarayana V., Agarwal, Priyanshu, Chopra, Saransh, Allu, Srikanth, Shearing, Paul R., and Brett, Dan J. Journal of Open Source Software, 2022.
Electrification of transport and other energy-intensive activities is of growing importance as it provides an underpinning method to reduce carbon emissions. With an increase in reliance on renewable sources of energy and a reduction in the use of more predictable fossil fuels in both stationary and mobile applications, energy storage will play a pivotal role, and batteries are currently the most widely adopted and versatile form. Therefore, understanding how batteries work, how they degrade, and how to optimize and manage their operation at large scales is critical to achieving emission reduction targets. The electric vehicle (EV) industry requires a considerable number of batteries even for a single vehicle, sometimes numbering in the thousands if smaller cells are used, and the dynamics and degradation of these systems, as well as of large stationary power systems, are not well understood. As efficiency gains for a single battery diminish for standard commercially available chemistries, gains made at the system level become more important and can potentially be realised more quickly than developing new chemistries. Mathematical models and simulations provide a way to address these challenging questions and can aid the engineers and designers of batteries and battery management systems in providing longer-lasting and more efficient energy storage systems.
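At the usage level, liionpack pairs a circuit netlist with a PyBaMM experiment; the sketch below follows the package's documented entry points, with an arbitrary pack size and cycling protocol.

```python
# Minimal pack simulation following liionpack's documented entry points;
# the pack dimensions and protocol here are arbitrary examples.
import liionpack as lp
import pybamm

# 16 cells in parallel, 2 in series, as a circuit netlist.
netlist = lp.setup_circuit(Np=16, Ns=2)

experiment = pybamm.Experiment(
    ["Discharge at 5 A for 30 minutes", "Rest for 15 minutes"]
)
parameter_values = pybamm.ParameterValues("Chen2020")

output = lp.solve(
    netlist=netlist,
    parameter_values=parameter_values,
    experiment=experiment,
)
print(output.keys())  # pack- and cell-level time series
```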