## Frontiers for Young Minds

## Prime Numbers–Why are They So Exciting?

Prime numbers have attracted human attention from the early days of civilization. We explain what they are, why their study excites mathematicians and amateurs alike, and on the way we open a window to the mathematician’s world.

From the beginning of human history, prime numbers aroused human curiosity. What are they? Why are the questions related to them so hard? One of the most interesting things about prime numbers is their distribution among the natural numbers. On a small scale, the appearance of prime numbers seems random, but on a large scale there appears to be a pattern, which is still not fully understood. In this short paper, we will try to follow the history of prime numbers since ancient times and use this opportunity to dive into and better understand the mathematician’s world.

## Composite Numbers and Prime Numbers

Have you ever wondered why the day is divided into exactly 24 hours, and the circle into 360 degrees? The number 24 has an interesting property: it can be divided into whole equal parts in a relatively large number of ways. For example, 24÷2 = 12, 24÷3 = 8, 24÷4 = 6, and so on (complete the rest of the options yourself!). This means that a day can be divided into two equal parts of 12 hours each, daytime and nighttime. In a factory that works non-stop in 8-hour shifts, each day is divided into exactly three shifts.

This is also the reason why the circle was divided into 360°. If the circle is divided into two, three, four, ten, twelve, or thirty equal parts, each part will contain a whole number of degrees; and there are additional ways of dividing a circle that we did not mention. In ancient times, dividing a circle into equal-sized sectors with high precision was necessary for various artistic, astronomical, and engineering purposes. With a compass and protractor as the only available instruments, division of a circle into equal sectors had great practical value (see footnote 1).
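If you would like to check the "rest of the options" for 24 and 360, here is a minimal Python sketch. The function name `divisors` is our own choice for illustration, and the comparison with 365 is borrowed from footnote 1.

```python
def divisors(n):
    """Return every positive whole number that divides n exactly."""
    return [d for d in range(1, n + 1) if n % d == 0]

# Each divisor of 24 is a number of equal whole-hour parts a day can be split into.
print(divisors(24))

# Count the ways to split a circle of 360 degrees into equal whole-degree parts,
# and compare with 365, which has very few divisors.
print(len(divisors(360)))
print(divisors(365))
```

Numbers with many divisors, like 24 and 360, are convenient for sharing things out evenly; numbers with very few divisors are the ones this article is about.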

A whole number that can be written as the product of two smaller whole numbers is called a composite number. For example, the equations 24 = 4 × 6 and 33 = 3 × 11 show that 24 and 33 are composite numbers. A whole number greater than 1 that cannot be broken down in this way is called a prime number. The numbers

2, 3, 5, 7, 11, 13, 17, 19, 23, and 29

are all prime numbers. In fact, these are the first 10 prime numbers (you can check this yourself, if you wish!).

Looking at this short list of prime numbers can already reveal a few interesting observations. First, except for the number 2, all prime numbers are odd, since an even number larger than 2 is divisible by 2, which makes it composite. So, the distance between any two prime numbers in a row (called successive prime numbers), apart from 2 and 3, is at least 2. In our list, we find successive prime numbers whose difference is exactly 2 (such as the pairs 3,5 and 17,19). There are also larger gaps between successive prime numbers, like the gap of six between 23 and 29; each of the five numbers in between, 24, 25, 26, 27, and 28, is a composite number. Another interesting observation is that in each of the first and second groups of 10 numbers (meaning between 1–10 and 11–20) there are four prime numbers, but in the third group of 10 (21–30) there are only two. What does this mean? Do prime numbers become rarer as the numbers grow? Can anyone promise us that we will be able to keep finding more and more prime numbers indefinitely?

If, at this stage, something excites you and you wish to keep investigating the list of prime numbers and the questions we raised, this means that you have a mathematician’s soul. Stop! Do not continue reading! (See footnote 2.) Grab a pencil and a piece of paper. Write all the numbers up to 100 and mark the prime numbers. Check how many pairs with a difference of two there are. Check how many prime numbers there are in each group of 10. Can you find any patterns? Or does the list of prime numbers up to 100 seem random to you?
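Once you have done the task with pencil and paper, you may want to check your answers. The following Python sketch is one possible way to do so; the names `is_prime`, `twin_pairs`, and `per_group` are our own, and the simple test used here (trying every possible divisor) is chosen for clarity, not speed.

```python
def is_prime(n):
    """Trial division: n is prime if it is at least 2 and nothing from 2 up to sqrt(n) divides it."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

primes = [n for n in range(1, 101) if is_prime(n)]

# Successive primes whose difference is exactly 2.
twin_pairs = [(p, q) for p, q in zip(primes, primes[1:]) if q - p == 2]

# How many primes fall in each group of ten: 1-10, 11-20, ..., 91-100.
per_group = [sum(1 for p in primes if start <= p < start + 10)
             for start in range(1, 101, 10)]

print(primes)
print(twin_pairs)
print(per_group)
```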

## Some History and the Concept of a Theorem

Prime numbers have occupied human attention since ancient times and were even associated with the supernatural. Even today, there are people who try to attribute mystical properties to prime numbers. The well-known astronomer and science author Carl Sagan wrote a book in 1985 called “Contact,” dealing with extraterrestrials (a human-like culture beyond Earth) trying to communicate with humans using prime numbers as signals. The idea that signals based on prime numbers could serve as a basis for communication with extraterrestrial cultures continues to ignite the imagination of many people to this day.

It is commonly assumed that serious interest in prime numbers started in the days of Pythagoras. Pythagoras was an ancient Greek mathematician. His students, the Pythagoreans, who were partly scientists and partly mystics, lived in the sixth century BC. They did not leave written evidence and what we know about them comes from stories that were passed down orally. Three hundred years later, in the third century BC, Alexandria (in modern Egypt) was the cultural capital of the Greek world. Euclid (Figure 1), who lived in Alexandria in the days of Ptolemy I, may be known to you from Euclidean geometry, which is named after him. Euclidean geometry has been taught in schools for more than 2,000 years. But Euclid was also interested in numbers. In the ninth book of his work “Elements,” in Proposition 20, there appears for the first time a mathematical proof of the theorem that there are infinitely many prime numbers.

- Figure 1 - The people behind the prime numbers.

This is a good place to say a few words about the concepts of theorem and mathematical proof. A theorem is a statement that is expressed in a mathematical language and can be said with certainty to be either valid or invalid. For example, the theorem “there are infinitely many prime numbers” claims that within the system of natural numbers (1,2,3…) the list of prime numbers is endless. To be more precise, this theorem claims that if we write a finite list of prime numbers, we will always be able to find another prime number that is not on the list. To prove this theorem, it is not enough to point out an additional prime number for a specific given list. For instance, if we point out 31 as a prime number outside the list of first 10 primes mentioned before, we will indeed show that that list did not include all prime numbers. But perhaps by adding 31 we have now found all of the prime numbers, and there are no more? What we need to do, and what Euclid did 2,300 years ago, is to present a convincing argument why, for any finite list, as long as it may be, we can find a prime number that is not included in it. In the next section, we will present Euclid’s proof, without burdening you with too much detail.

## Euclid’s Proof for the Existence of Infinitely Many Prime Numbers

To prove that there are infinitely many prime numbers, Euclid used another basic theorem that was known to him, which is the statement that “every natural number greater than 1 can be written as a product of prime numbers.” It is easy to be convinced of the truth of this last claim. If you pick a number greater than 1 that is not composite, then that number is prime itself. Otherwise, you can write the number you chose as a product of two smaller numbers. If each of the smaller numbers is prime, you have expressed your number as a product of prime numbers. If not, write the smaller composite numbers as products of still smaller numbers, and so forth. In this process, you keep replacing any of the composite numbers with products of smaller numbers. Since the numbers get smaller at every step, it is impossible to do this forever, so this process must end and all the smaller numbers you end up with can no longer be broken down, meaning they are prime numbers. As an example, let us break down the number 72 into its prime factors:

72 = 12 × 6 = 3 × 4 × 6 = 3 × 2 × 2 × 6 = 3 × 2 × 2 × 2 × 3.
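The breaking-down process can also be handed to a computer. The sketch below is one simple way to do it, under the assumption that the input is a whole number greater than 1; the function name `prime_factors` is our own. Instead of splitting into arbitrary smaller numbers, it always pulls out the smallest divisor it can find, which is guaranteed to be prime.

```python
def prime_factors(n):
    """Break a whole number n > 1 into prime factors, listed from smallest to largest."""
    factors = []
    d = 2
    while d * d <= n:
        # Divide out d as many times as it fits; any d reached this way is prime,
        # because all smaller prime factors have already been removed.
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        # Whatever is left has no divisor up to its square root, so it is prime.
        factors.append(n)
    return factors

print(prime_factors(72))  # [2, 2, 2, 3, 3]
```

The result agrees with the chain of equalities above: 72 is built from three 2s and two 3s, whichever order we split it in.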

Based on this basic fact, we can now explain Euclid’s beautiful proof for the infinitude of the set of prime numbers. We will demonstrate the idea using the list of the first 10 primes but notice that this same idea works for any finite list of prime numbers. Let us multiply all the numbers in the list and add one to the result. Let us give the name N to the number we get. (The value of N does not actually matter since the argument should be valid for any list.)

N = (2 × 3 × 5 × 7 × 11 × 13 × 17 × 19 × 23 × 29)+1.

The number N, just like any other natural number greater than 1, can be written as a product of prime numbers. Which primes are these, the prime factors of N? We do not know, because we have not calculated them, but there is one thing we know for sure: they all divide N. But the number N leaves a remainder of one when divided by any of the prime numbers on our list 2, 3, 5, 7,…, 23, 29. This is supposed to be a complete list of our primes, but none of them divides N. So, the prime factors of N are not on that list and, in particular, there must be new prime numbers beyond 29.
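Euclid's argument never needs the actual value of N, but it can be reassuring to watch it work on a computer. The following sketch assumes Python 3.8 or later (for `math.prod`); the helper names `smallest_prime_factor` and `euclid_new_prime` are our own inventions for illustration.

```python
from math import prod

def smallest_prime_factor(n):
    """Return the smallest prime that divides n (for n > 1), using trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor found, so n itself is prime

def euclid_new_prime(prime_list):
    """Given any finite list of primes, return a prime that is not on the list."""
    n = prod(prime_list) + 1
    return smallest_prime_factor(n)

first_ten = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
N = prod(first_ten) + 1

print(N)
print([N % p for p in first_ten])   # every remainder is 1
print(euclid_new_prime(first_ten))  # a prime factor of N that is not on the list
```

Note that N itself does not have to be prime. The argument only promises that its prime factors are new, and the same function can be fed any other finite list of primes.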

## The Sieve of Eratosthenes

Have you found all the prime numbers smaller than 100? Which method did you use? Did you check each number individually, to see if it is divisible by smaller numbers? If this is the way you chose, you definitely invested a lot of time. Eratosthenes (Figure 1), one of the greatest scholars of the Hellenistic period, lived a few decades after Euclid. He served as the chief librarian in the library of Alexandria, the first library in history and the biggest in the ancient world. He was interested not only in mathematics but also in astronomy, music, and geography, and was the first to calculate the earth’s circumference with an impressive precision for his time. Among other things, he designed a clever way to find all the prime numbers up to a given number. Since this method is based on the idea of sieving (sifting) the composite numbers, it is called the Sieve of Eratosthenes.

We will demonstrate the sieve of Eratosthenes on the list of numbers up to 100, which is hopefully still in front of you (Figure 2). Circle the number 2, since it is the first prime number, and then erase all its higher multiples, namely all the composite even numbers. Move on to the next non-erased number, the number 3. Since it was not erased, it is not a product of smaller numbers, and we can circle it knowing that it is prime. Again, erase all its higher multiples. Notice that some of them, such as 6, have been already deleted, while others, such as 9, will be erased now. The next non-erased number, 5, will be circled. Again, erase all its higher multiples: 10, 15, and 20 have already been deleted, but 25 and 35, for instance, should be erased now. Continue in the same manner. Until when? Try to think why after passing 10 = √100 we do not need to continue the process. All numbers smaller than 100 that were not erased are prime numbers and can be safely circled!

- Figure 2 - Sieve of Eratosthenes.
- Composite numbers are crossed out and prime numbers are circled.
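The circling and erasing described above translates almost word for word into a short program. This is a minimal sketch, not the only way to write it; the function name `sieve_of_eratosthenes` and the list name `not_erased` are our own. It uses two shortcuts hinted at in the text: it stops once the current number passes the square root of the limit, and for each prime p it starts erasing at p × p, because the smaller multiples were already erased in earlier rounds.

```python
def sieve_of_eratosthenes(limit):
    """Return all prime numbers up to and including limit."""
    if limit < 2:
        return []
    not_erased = [True] * (limit + 1)
    not_erased[0] = not_erased[1] = False  # 0 and 1 are not prime
    p = 2
    while p * p <= limit:                  # stop after passing sqrt(limit)
        if not_erased[p]:                  # p was never erased, so p is prime
            for multiple in range(p * p, limit + 1, p):
                not_erased[multiple] = False
        p += 1
    return [n for n in range(limit + 1) if not_erased[n]]

print(sieve_of_eratosthenes(100))
```

Compared with testing each number separately, the sieve never performs a division: it only walks along the list in steps of 2, 3, 5, and 7.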

## Frequency of Prime Numbers

What is the frequency of prime numbers? How many prime numbers are there, approximately, between 1,000,000 and 1,001,000 (one million and one million plus one thousand) and how many between 1,000,000,000 and 1,000,001,000 (one billion and one billion plus one thousand)? Can we estimate the number of prime numbers between one trillion (1,000,000,000,000) and one trillion plus one thousand?

Calculations reveal that prime numbers become more and more rare as numbers get larger. But is it possible to state an accurate theorem that will express exactly how rare they are? Such a theorem was first stated as a conjecture by the great mathematician Carl Friedrich Gauss in 1793, at the age of 16. The nineteenth-century mathematician Bernhard Riemann (Figure 1), who influenced the study of prime numbers in modern times more than anyone else, developed further tools needed to deal with it. But a formal proof of the theorem was given only in 1896, a century after it had been stated. Surprisingly, two independent proofs were provided the same year by the French mathematician Jacques Hadamard and the Belgian mathematician de la Vallée-Poussin (Figure 1). It is interesting to note that both men were born around the time of the death of Riemann. The theorem they proved received the name “the prime number theorem” due to its importance.

The precise formulation of the prime number theorem, and even more so the details of its proof, require advanced mathematics that we cannot discuss here. But put less precisely, the prime number theorem states that the frequency of prime numbers around x is inversely proportional to the number of digits in x. In the above example, the number of primes in a “window” of length 1,000 around one million (by which we mean the interval between one million and one million and one thousand) will be about 50% larger than the number of primes in the same “window” around one billion (the ratio is 9:6, just like the ratio between the number of zeroes in one billion and one million), and about twice as many as the number of primes in the same window around one trillion (where the ratio of the number of zeroes is 12:6). Indeed, computer calculations show that there are 75 prime numbers in the first window, 49 in the second and only 37 in the third, between one trillion and one trillion plus one thousand.
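You can repeat such a computer calculation yourself. The sketch below counts the primes in a window by sieving only that window, and prints the count next to the rough estimate length / ln(x), which is the usual way of making "inversely proportional to the number of digits" precise. The function names `small_primes` and `count_primes_in_window` are our own, and the convention that a window contains the numbers from start + 1 to start + length is an assumption we made; the article does not spell out how the endpoints are treated.

```python
from math import isqrt, log

def small_primes(limit):
    """Sieve of Eratosthenes: all primes up to limit (limit >= 1)."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, isqrt(limit) + 1):
        if sieve[p]:
            count = len(range(p * p, limit + 1, p))
            sieve[p * p :: p] = bytearray(count)  # erase the multiples of p
    return [n for n in range(limit + 1) if sieve[n]]

def count_primes_in_window(start, length=1000):
    """Count the primes n with start < n <= start + length (assumes start >= 1)."""
    low, high = start + 1, start + length
    is_prime = bytearray([1]) * length            # is_prime[i] stands for the number low + i
    for p in small_primes(isqrt(high)):
        # First multiple of p inside the window, but never p itself.
        first = max(p * p, (low + p - 1) // p * p)
        for multiple in range(first, high + 1, p):
            is_prime[multiple - low] = 0
    return sum(is_prime)

for x in (10**6, 10**9, 10**12):
    print(x, count_primes_in_window(x), round(1000 / log(x), 1))
```

Only the primes up to the square root of the largest number in the window are needed, which is why even the window near one trillion takes just a moment.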

The same information can be pictured as a graph, shown below (Figure 3). You can see how the number π(x) of primes up to x changes in the range x ≤ 100, and again for x ≤ 1,000. Notice that any time we meet a new prime along the x-axis, the graph rises by 1, so the graph takes the shape of steps (Figure 3A). On a small scale, it is hard to detect a pattern in the graph. It is quite easy to prove that we can find arbitrarily large intervals in which there are no prime numbers, meaning spans where the graph does not rise. On the other hand, a famous conjecture (see below) states that there are infinitely many twin primes, that is, pairs of primes with a difference of 2 between them, which would translate to a “step” of width 2 in the graph. On a larger scale, however, the graph looks smooth (Figure 3B). This smooth curve seen on a large scale demonstrates the prime number theorem.

- Figure 3 - Frequency of the prime numbers.
- Graphs showing π(x), the number of primes up to the number x. In panel A, x ranges from 0 to 100, and the graph is step-like. In panel B, x ranges from 0 to 1,000, so the scale is larger and the graph appears to be much smoother.
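The claim that there are arbitrarily long stretches without primes is not proved in the text, so here is one standard argument, offered as a sketch. For any n ≥ 2, write n! for the product 1 × 2 × … × n. Then n! + 2 is divisible by 2, n! + 3 is divisible by 3, and so on up to n! + n, which is divisible by n. That gives n − 1 composite numbers in a row. The function names below are our own.

```python
from math import factorial

def is_prime(n):
    """Trial division primality test."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def prime_free_run(n):
    """Return the n - 1 consecutive numbers n! + 2, ..., n! + n, none of which is prime."""
    base = factorial(n)
    return [base + k for k in range(2, n + 1)]

run = prime_free_run(7)
print(run)
print([is_prime(m) for m in run])  # all False
```

Choosing a larger n gives a longer run, although such runs usually appear much earlier among the natural numbers than this construction suggests.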

The fact that a mathematical phenomenon seems to behave randomly on one scale but shows regularity (smoothness) on a larger scale, a regularity which becomes more and more accurate as the scale grows, is not new to mathematics. Systems in probability, such as coin flipping, behave in this way. It is impossible to predict the result of a single coin flip, but over time, if the coin is unbiased, it will come up heads about half the time. What is surprising is that the prime number system is not probabilistic, but it still behaves in many ways as if it were randomly selected.

## Summary: Who Wants to be a Millionaire?

Number theory, which includes the study of prime numbers, is rich with unsolved problems, unsuccessfully tackled by the greatest minds for hundreds of years. A few of those open problems are mathematical statements that have not been proven yet, but in whose correctness we strongly believe. Such unproven theorems are called “conjectures” or “hypotheses.” We already mentioned the conjecture regarding the existence of infinitely many twin primes, pairs of prime numbers a distance of two apart. Another well-known conjecture, called Goldbach’s conjecture, states that every even number greater than 2 can be written as a sum of two prime numbers. For example: 16 = 13 + 3, 54 = 47 + 7. If you manage to prove any of them, you will win eternal fame (see footnote 3).
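Nobody has proven Goldbach's conjecture, but anyone can test it on as many even numbers as they have patience for. The sketch below lists every way to write an even number as a sum of two primes; the function name `goldbach_pairs` is our own. Keep in mind that checking examples, however many, is evidence and not a proof.

```python
def is_prime(n):
    """Trial division primality test."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def goldbach_pairs(n):
    """Return all pairs (p, q) of primes with p <= q and p + q == n."""
    return [(p, n - p) for p in range(2, n // 2 + 1)
            if is_prime(p) and is_prime(n - p)]

print(goldbach_pairs(16))  # [(3, 13), (5, 11)]
print(goldbach_pairs(54))

# Check every even number from 4 to 1,000: each has at least one such pair.
print(all(goldbach_pairs(n) for n in range(4, 1001, 2)))
```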

Arguably the most famous unsolved problem in mathematics, Riemann’s hypothesis, was proposed by the same Bernhard Riemann who was mentioned earlier. In his only research paper on prime numbers, published in 1859, Riemann stated a hypothesis that predicted how far the approximation given by the prime number theorem is from the true value of π(x), the number of primes up to x. In other words, what can be said about the “error term” in the prime number theorem, that is, the difference between the real quantity and the suggested formula? The Clay Mathematics Institute has named this problem as one of the seven problems for which it will pay a $1,000,000 prize for the solution! If you were not intrigued so far, maybe this prize will motivate you…

Why is this important? Who does it interest? Mathematicians judge their problems first and foremost by their difficulty and intrinsic beauty. Prime numbers score high in both of these criteria. However, prime numbers are also useful in a practical way. Research on prime numbers has found an important use in encryption (the science of encoding secret messages) in the past few decades. We mentioned earlier the fictional book by Carl Sagan, on an extraterrestrial culture communicating with mankind using prime numbers. But there is a much “hotter” area, not fictional at all, that uses prime numbers for both civilian and military purposes: encrypted transmissions. When we withdraw money from an ATM, we use a debit card, and the communication between us and the ATM is encrypted. Like many other codes for encryption, the one found on almost every debit card, called RSA (named after its inventors Rivest, Shamir, and Adleman), is based on the properties of prime numbers.
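To give a feeling for how RSA leans on prime numbers, here is a toy version with very small numbers. Everything specific in it is an assumption made for illustration: the primes 61 and 53, the public exponent 17, and the function names `encrypt` and `decrypt` do not come from the article. Real systems use primes that are hundreds of digits long, together with extra safeguards, so this sketch is not secure and is not how a debit card actually works. It assumes Python 3.8 or later, where `pow(e, -1, phi)` computes a modular inverse.

```python
# Two secret primes (toy-sized; chosen only for illustration).
p, q = 61, 53

n = p * q                  # public: anyone may know the product
phi = (p - 1) * (q - 1)    # secret: easy to compute only if you know p and q

e = 17                     # public exponent, must share no factor with phi
d = pow(e, -1, phi)        # secret exponent: the inverse of e modulo phi

def encrypt(message):
    """Encrypt a number smaller than n using the public key (n, e)."""
    return pow(message, e, n)

def decrypt(ciphertext):
    """Decrypt using the secret exponent d."""
    return pow(ciphertext, d, n)

secret = 65
coded = encrypt(secret)
print(coded)
print(decrypt(coded))      # recovers 65
```

The safety of the scheme rests on the observation that multiplying two large primes is easy, while recovering them from their product appears to be very hard.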

The story of prime numbers is still surrounded with mystery. So, their story is not yet over and done with…

## Glossary

Composite Number: a whole number that can be written as a product of two smaller whole numbers, for example, 24 = 3 × 8.

Prime Number (Non-Composite): a whole number greater than 1 that cannot be written as the product of two smaller whole numbers, such as 7 or 23.

Mathematical Proof: a series of logical arguments meant to prove the truth of a mathematical theorem. The proof is based on basic assumptions that were tested, or on other theorems that were previously proven.

Mathematical Theorem: a claim expressed in the language of mathematics that can definitely be said to be valid or invalid in a certain system.

Mathematical Conjecture (also called a hypothesis): a mathematical statement that is believed to be true but has not yet been proven. The “belief in validity” can result from checking special cases, computational evidence, or mathematical intuition. There are mathematical conjectures over which people still disagree.

Twin Primes: a pair of prime numbers with a difference of two, such as 5, 7 or 41, 43.

## Conflict of Interest Statement

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

## Further Readings

[1] Du Sautoy, M. 2003. The Music of the Primes. HarperCollins.

[2] Doxiadis, A. 1992. Uncle Petros and Goldbach’s Conjecture. Bloomsbury.

[3] Pomerance, C. 2004. “Prime numbers and the search for extraterrestrial intelligence,” in Mathematical Adventures for Students and Amateurs, eds D. Hayes and T. Shubin (M.A.A), 1–4.

[4] Singh, S. 1999. The Code Book. London, Fourth Estate.

## Footnotes

[1] The division of the circle into 360 appears for the first time in the writings of Greek and Egyptian astronomers, but is based on an earlier division of the hour into 60 minutes by the Babylonians. Undoubtedly, it is also related to the fact that the solar year lasts 365 days (on average), but note that 365 = 5 × 73 and since both 5 and 73 are prime, 365 admits far fewer factorizations than 360.

[2] A correct reading of a mathematical text is an “active reading,” where the reader checks what is being said, computes examples, etc. But, if you would like to skip the proposed task, you can do so, and we will return to it and discuss it later on.

[3] The twin prime conjecture has seen amazing breakthroughs in recent years by Zhang and Maynard but is nevertheless still open. Concerning the Goldbach conjecture, Helfgott proved in 2014 that every odd number larger than 5 is the sum of three primes.

## PRIMES: Research Papers

## 2023 Research Papers

## 375) Roger Fan, Nitya Mani (MIT), Multidisperse Random Sequential Adsorption and Generalizations (7 Jan 2024)

In this paper, we present a unified study of the limiting density in one-dimensional random sequential adsorption (RSA) processes where segment lengths are drawn from a given distribution. In addition to generic bounds, we are also able to characterize specific cases, including multidisperse RSA, in which we draw from a finite set of lengths, and power-law RSA, in which we draw lengths from a power-law distribution.

## 374) Vasiliy Neckrasov (Brandeis University), Eric Zhan, On Nontrivial Winning and Losing Parameters of Schmidt Games (arXiv.org, 1 Jan 2024)

In this paper we completely describe the winning and losing conditions different from the only "trivial" conditions known before. In other words, we solve the open question of finding a complete nontrivial Schmidt diagram. In addition, we give new bounds for two families of sets: one related to frequencies of digits in base-$2$ expansions, and one connected to the set of the badly approximable numbers.

## 373) Michael Yang, Rigidity and Rank of Group-Circulant Matrices (19 Dec 2023)

Given a finite group $G$, a ring $\Lambda,$ and a function $f : G \rightarrow \Lambda$, a $G$-circulant matrix of $f$ is a $|G| \times |G|$ matrix $M$ with rows and columns indexed by the elements of $G$ for which $M_{xy} = f(xy)$ for all $x, y \in G.$ We study the fundamental properties of $G$-circulants when $\Lambda$ is an algebraically closed field with characteristic coprime to $|G|$. We begin by proving new results about the matrix rigidity of $G$-circulants for nonabelian $G$, which are the first of their kind. We show that for any sequence of finite groups $G_i$ whose abelian normal subgroups have sufficiently small index, the family of $G_i$-circulants is not Valiant-rigid. Furthermore, we show that this result applies for families of groups $\{G_i\}_i$ whose representations are bounded above in degree. Next, we exhibit a formula for the rank of any $G$-circulant in terms of the decomposition of its corresponding function $f : G \rightarrow \Lambda$ into the matrix coefficients of the irreducible representations of $G.$ While this was known to Diaconis, we present a more elementary proof that avoids the full strength of Schur Orthogonality. We then apply this formula to the case of $G$-circulants for cyclic $G.$ Through this, we generalize a theorem of Chen, providing a necessary and sufficient criterion for when zero-one circulants are always nonsingular. Additionally, we answer an open problem about singular circulant digraphs posed by Lal--Reddy and give a probabilistic estimate for the regularity of zero-one singular circulant matrices. Lastly, we investigate orthogonal representations of graphs. Given a finite, simple graph $G,$ we provide a novel lower bound for the minimal dimension in which a faithful orthogonal representation for $G$ exists. Furthermore, we use our bound to determine the aforementioned minimal dimension for an infinite family of Kneser graphs up to a constant factor.

## 372) Rohan Das, Christopher Qiu, and Shiqiao Zhang, The Distribution of the Cokernels of Random Symmetric and Alternating Matrices over the Integers Modulo a Prime Power (13 Dec 2023)

Given a prime $p$ and positive integers $n$ and $k$, consider the ring $M_n(\mathbb{Z}/p^{k}\mathbb{Z})$ of $n \times n$ matrices over $\mathbb{Z}/p^{k}\mathbb{Z}$. In 1989, Friedman and Washington computed the number of matrices in $M_n(\mathbb{Z}/p^{k}\mathbb{Z})$ with a given residue modulo $p$ and a given cokernel $G$ subject to the condition $p^{k - 1} G = 0$. Cheong, Liang, and Strand generalized this result in 2023 by removing the condition $p^{k - 1} G = 0$, completing the description of the distribution of the cokernel of a random matrix uniformly selected from $M_n(\mathbb{Z}/p^{k}\mathbb{Z})$. In 2015, following the work of Friedman and Washington, Clancy, Kaplan, Leake, Payne, and Wood determined the distribution of the cokernel of a random $n \times n$ symmetric matrix over $\mathbb{Z}_p$, and Bhargava, Kane, Lenstra, Poonen, and Rains determined the distribution of the cokernel of a random $n \times n$ alternating matrix over $\mathbb{Z}_p$. In this paper, we refine these results by determining the distribution of the cokernels of random symmetric and alternating matrices over $\mathbb{Z}_p$ with a fixed residue modulo $p$.

## 371) Anton Levonian, Existence of Circle Packings on Translation Surfaces (8 Dec 2023)

A translation surface is a surface formed by identifying edges of a collection of polygons in the complex plane that are parallel and of equal length using only translations. We determined that the same circle packing can be realized on varying translation surfaces in a certain stratum. We also determined possible complexities of contacts graphs and provide a bound on this complexity in some low-genus strata. Finally, we established the possibility of certain contacts graphs’ complexities in strata with genus greater than 2.

## 370) Srinivas Arun, Further Bounds on the Helly Numbers of Product Sets (7 Dec 2023)

The Helly number $h(S)$ of a set $S\subseteq\mathbb{R}^d$ is defined as the smallest positive integer $h$, if it exists, such that the following statement is true: for any finite family of convex sets in $\mathbb{R}^d,$ if every subfamily of $h$ sets intersects, then all sets in the family intersect. We study Helly numbers of product sets of the form $A^d$ for some one-dimensional set $A.$ Inspired by Dillon's research on the Helly numbers of product sets, Ambrus, Balko, Frankl, Jung, and Naszódi recently obtained the first bounds for Helly numbers of exponential lattices in two dimensions, which are sets of the form $S=\{\alpha^n: n\in\mathbb{N}\}^2$ for some $\alpha>1.$ We develop a different, simpler method to obtain better upper bounds for exponential lattices. In addition, we generalize the lower bounds of Ambrus et al. to higher dimensions. We additionally investigate sets $A\subseteq\mathbb{Z}$ whose consecutive elements differ by at most $2$ such that $h(A^2)=\infty.$ We slightly strengthen a theorem of Dillon that such sets exist while also providing a shorter proof. We obtain Helly number bounds for certain sets defined by arithmetic congruences. Finally, we introduce a generalization of the notion of an empty polygon, and show that in one case, it is equivalent to the original definition.

## 369) Michelle Wei and Guanghao Ye (MIT), Solving Second-Order Cone Programs Deterministically in Matrix Multiplication Time (3 Dec 2023)

We propose a deterministic algorithm for solving second-order cone programs of the form \[ \min_{Ax=b,x \in \mathcal{L}_1\times \dots \times \mathcal{L}_r} c^\top x, \] which optimize a linear objective function over the set of $x\in \mathbb{R}^n$ contained in the intersection of an affine set and the product of $r$ second-order cones. Our algorithm achieves a runtime of $$\widetilde {O}((n^{\omega} + n^{2+o(1)}r^{1/6} + n^{2.5-\alpha/2 + o(1)})\log(1/\epsilon)),$$ where $\omega$ and $\alpha$ are the exponents of matrix multiplication, and $\epsilon$ is the relative accuracy. For the current values of $\omega\sim 2.37$ and $\alpha\sim 0.32$, our algorithm takes $\widetilde{O}(n^{\omega} \log(1/\epsilon))$ time. This nearly matches the runtime for solving the sub-problem $Ax=b$. To the best of our knowledge, this is the first improvement on the computational complexity of solving second-order cone programs after the seminal work of Nesterov and Nemirovski on general convex programs. For $\omega=2$, our algorithm takes $\widetilde{O}(n^{2+o(1)} r^{1/6}\log(1/\epsilon))$ time. To obtain this result, we utilize several new concepts that we believe may be of independent interest: (1) We introduce a novel reduction for splitting $\ell_p$-cones. (2) We propose a deterministic data structure to efficiently maintain the central path of interior point methods for general convex programs.

## 368) Sophia Liao, Harold Polo (University of Florida), A Goldbach theorem for Laurent polynomials with positive integer coefficients (arXiv.org, 2 Dec 2023), forthcoming in American Mathematical Monthly

We establish an analogue of the Goldbach conjecture for Laurent polynomials with positive integer coefficients.

## 367) Matvey Borodin, The Orbits of the Action of the Cactus Group on Arc Diagrams (arXiv.org, 2 Dec 2023)

The cactus group $J_n$ is the $S_n$-equivariant fundamental group of the real locus of the Deligne-Mumford moduli space of stable rational curves with marked points. This group plays the role of the braid group for the monoidal category of Kashiwara crystals attached to a simple Lie algebra. Following Frenkel, Kirillov and Varchenko, one can identify the multiplicity set in a tensor product of $\mathfrak{sl}_2$-crystals with the set of arc diagrams on a disc, thus allowing a much simpler description of the corresponding $J_n$-action. We address the problem of classifying the orbits of this cactus group action. Namely, we describe some invariants of this action and show that in some (fairly general) classes of examples there are no other invariants. Furthermore, we describe some additional relations, including the braid relation, that this action places on the generators of $J_n$.

## 366) Catherine Li and Daniel Lazarev (MIT), Spatiotemporal risk prediction for infectious disease spread and mortality (28 Nov 2023; arXiv.org, 5 Dec 2023)

With the outbreak of the COVID-19 pandemic, various studies have focused on predicting the trajectory and risk factors of the virus and its variants. Building on previous work that addressed this problem using genetic and epidemiological data, we introduce a method, Geo Score, that also incorporates geographic, socioeconomic, and demographic data to estimate infection and mortality risk by region and time. We employ gradient descent to find the optimal weights of the factors’ significance in determining risk. Such spatiotemporal risk prediction is important for informed public health decision-making so that individuals are aware of the risks of travel during an epidemic or pandemic, and, perhaps more importantly, so that policymakers know how to triage limited resources during a crisis. We apply our method to New York City COVID-19 data from 2020, predicting ZIP code-level COVID-19 risk for 2021.

## 365) Aryan Bora, Yunseo Choi (Harvard), and Lucas Tang, On the spum and sum-diameter of paths (27 Nov 2023)

In a sum graph, the vertices are labeled with distinct positive integers, and two vertices are adjacent if the sum of their labels is equal to the label of another vertex. The spum of a graph G is defined as the minimum difference between the largest and smallest labels of a sum graph that consists of G in union with a minimum number of isolated vertices. More recently, Li introduced the sum-diameter of a graph G, which modifies the definition of spum by removing the requirement that the number of isolated vertices must be minimal. In this paper, we settle conjectures by Singla, Tiwari, and Tripathi and a conjecture by Li by evaluating the spum and the sum-diameter of paths.

## 364) Artem Kalmykov (MIT), Brian Li, Intertwining operators between subregular Whittaker modules for $\mathfrak{gl}_N$ and non-standard quantizations (29 Oct 2023)

In this paper, we study intertwining operators between subregular Whittaker modules of $\mathfrak{gl}_N$ generalizing, on the one hand, the classical exchange construction of dynamical quantum groups, on the other hand, earlier results for principal W-algebras. We explicitly construct them using the generators of W-algebras introduced by Brundan-Kleshchev. We interpret the fusion on intertwining operators in terms of categorical actions and compute the semi-classical limit of the corresponding monoidal isomorphisms which turn out to depend on dynamical-like parameters.

## 363) Felix Gotti (MIT), Henrick Rabinovitz, On the ascent of atomicity to one-dimensional monoid algebras (28 Oct 2023)

A commutative cancellative monoid is atomic if every non-invertible element factors into irreducibles (also called atoms), while an integral domain is atomic if its multiplicative monoid is atomic. Back in the eighties, Gilmer posed the question of whether the fact that a torsion-free monoid $M$ and an integral domain $R$ are both atomic implies that the monoid algebra $R[M]$ of $M$ over $R$ is also atomic. In general this is not true, and the first negative answer to this question was given by Roitman in 1993: he constructed an atomic integral domain whose polynomial extension is not atomic. More recently, Coykendall and the first author constructed finite-rank torsion-free atomic monoids whose algebras over certain finite fields are not atomic. Still, the ascent of atomicity from finite-rank torsion-free monoids to their monoid algebras over fields of characteristic zero is an open problem. The main purpose of this paper is to provide a negative answer to this problem. We actually construct a rank-one torsion-free atomic monoid whose monoid algebras over any field are not atomic. To do so, we introduce and study a methodological construction inside the class of rank-one torsion-free monoids that we call lifting: it consists in embedding a given monoid into another monoid that is often more tractable from the arithmetic viewpoint.

## 362) Scott T. Chapman (SHSU), Joshua Jang, Jason Mao, Skyler Mao, Betti Graphs and Atomization of Puiseux Monoids (9 Oct 2023; arXiv.org, 30 Nov 2023)

Let $M$ be a Puiseux monoid, that is, a monoid consisting of nonnegative rationals (under addition). A nonzero element of $M$ is called an atom if its only decomposition as a sum of two elements in $M$ is the trivial decomposition (i.e., one of the summands is $0$), while a nonzero element $b \in M$ is called atomic if it can be expressed as a sum of finitely many atoms allowing repetitions: this sum of atoms is called an (additive) factorization of $b$. The monoid $M$ is called atomic if every nonzero element of $M$ is atomic. In this paper, we study factorizations in atomic Puiseux monoids through the lens of their associated Betti graphs. The Betti graph of $b \in M$ is the graph whose vertices are the factorizations of $b$ with edges between factorizations that share at least one atom. Betti graphs have been useful in the literature to understand several factorization invariants in the more general class of atomic monoids.
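The definitions of factorization and Betti graph above can be sketched for a finitely generated example. The code assumes the caller supplies the list of atoms (positive rationals); `factorizations` and `betti_graph_edges` are illustrative names, and a factorization is stored as a tuple of atom multiplicities.

```python
from fractions import Fraction
from itertools import combinations


def factorizations(b, atoms):
    """All ways to write b as a sum of the given atoms, as tuples of multiplicities."""
    b = Fraction(b)
    atoms = [Fraction(a) for a in atoms]

    def extend(remaining, index):
        # All atoms have been assigned a multiplicity: valid only if nothing is left.
        if index == len(atoms):
            return [()] if remaining == 0 else []
        results = []
        count = 0
        while count * atoms[index] <= remaining:
            for rest in extend(remaining - count * atoms[index], index + 1):
                results.append((count,) + rest)
            count += 1
        return results

    return extend(b, 0)


def betti_graph_edges(b, atoms):
    """Edges join two factorizations of b that share at least one atom."""
    facts = factorizations(b, atoms)
    return {
        (z1, z2)
        for z1, z2 in combinations(facts, 2)
        if any(m1 > 0 and m2 > 0 for m1, m2 in zip(z1, z2))
    }
```

For the monoid generated by 2 and 3, the element 6 has the factorizations 2+2+2 and 3+3, which share no atom, so its Betti graph has two vertices and no edge. The element 8 has the factorizations 2+3+3 and 2+2+2+2, which share the atom 2, so its Betti graph is a single edge.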

## 361) Hannah Fox, Agastya Goel, Sophia Liao, Arithmetic of semisubtractive semidomains (5 Oct 2023; arXiv.org, 13 Nov 2023)

A subset $S$ of an integral domain is called a semidomain if the pairs $(S,+)$ and $(S, \cdot)$ are commutative and cancellative semigroups with identities. The multiplication of $S$ extends to the group of differences $\mathcal{G}(S)$, turning $\mathcal{G}(S)$ into an integral domain. In this paper, we study the arithmetic of semisubtractive semidomains (i.e., semidomains $S$ for which either $s \in S$ or $-s \in S$ for every $s \in \mathcal{G}(S)$). Specifically, we provide necessary and sufficient conditions for a semisubtractive semidomain to satisfy the ascending chain condition on principal ideals, to be a bounded factorization semidomain, and to be a finite factorization semidomain, which are subsequent relaxations of the property of having unique factorizations. In addition, we present a characterization of half-factorial semisubtractive semidomains. Throughout the article, we present examples to provide insight into the arithmetic aspects of semisubtractive semidomains.

## 360) Andrew Lin, Henrick Rabinovitz, Qiao Zhang, The Furstenberg property in Puiseux monoids (21 Sept 2023)

Let $M$ be a commutative monoid. The monoid $M$ is called atomic if every non-invertible element of $M$ factors into atoms (i.e., irreducible elements), while $M$ is called a Furstenberg monoid if every non-invertible element of $M$ is divisible by an atom. Additive submonoids of $\mathbb{Q}$ consisting of nonnegative rationals are called Puiseux monoids, and their atomic structure has been actively studied during the past few years. The primary purpose of this paper is to investigate the property of being Furstenberg in the context of Puiseux monoids. In this direction, we consider some properties weaker than being Furstenberg, and then we connect these properties with some atomic results which have been already established for Puiseux monoids.

## 359) Akshaya Chakravarthy (PRIMES), Agustina Czenky (University of Oregon), Julia Plavnik (Indiana University Bloomington), On modular categories with Frobenius-Perron dimension congruent to 2 modulo 4 (arXiv.org, 24 Aug 2023)

We contribute to the classification of modular categories $\mathcal{C}$ with $\operatorname{FPdim}(\mathcal{C})\equiv 2 \pmod 4$. We prove that such categories have group of invertibles of even order, and that they factorize as $\mathcal C\cong \widetilde{\mathcal C} \boxtimes \operatorname{sem}$, where $\widetilde{\mathcal C}$ is an odd-dimensional modular category and $\operatorname{sem}$ is the rank 2 pointed modular category. This reduces the classification of these categories to the classification of odd-dimensional modular categories. It follows that modular categories $\mathcal C$ with $\operatorname{FPdim}(\mathcal{C})\equiv 2 \pmod 4$ of rank up to 46 are pointed. More generally, we prove that if $\mathcal C$ is a weakly integral MTC and $p$ is an odd prime dividing the order of the group of invertibles that has multiplicity one in $\operatorname{FPdim}(\mathcal C)$, then we have a factorization $\mathcal C \cong \widetilde{\mathcal C} \boxtimes \operatorname{Vec}_{\mathbb Z_p}^{\chi},$ for $\widetilde{\mathcal C}$ an MTC with dimension not divisible by $p$.

## 358) Evan Chang (PRIMES), Neel Kolhe (PRIMES), Youngtak Sohn (MIT), Upper bounds on the $2$-colorability threshold of random $d$-regular $k$-uniform hypergraphs for $k\geq 3$ (arXiv.org, 3 Aug 2023)

For a large class of random constraint satisfaction problems (CSP), deep but non-rigorous theory from statistical physics predicts the location of the sharp satisfiability transition. The works of Ding, Sly, Sun (2014, 2016) and Coja-Oghlan, Panagiotou (2014) established the satisfiability threshold for random regular $k$-NAE-SAT, random $k$-SAT, and random regular $k$-SAT for large enough $k\geq k_0$ where $k_0$ is a large non-explicit constant. Establishing the same for small values of $k\geq 3$ remains an important open problem in the study of random CSPs. In this work, we study two closely related models of random CSPs, namely the $2$-coloring on random $d$-regular $k$-uniform hypergraphs and the random $d$-regular $k$-NAE-SAT model. For every $k\geq 3$, we prove that there is an explicit $d_{\ast}(k)$ which gives a satisfiability upper bound for both of the models. Our upper bound $d_{\ast}(k)$ for $k\geq 3$ matches the prediction from statistical physics for the hypergraph $2$-coloring by Dall'Asta, Ramezanpour, Zecchina (2008), and is thus conjectured to be sharp. Moreover, $d_{\ast}(k)$ coincides with the satisfiability threshold of random regular $k$-NAE-SAT for large enough $k\geq k_0$ by Ding, Sly, Sun (2014).

## 357) Henry Jiang, Shihan Kanungo, Harry Kim, A weaker notion of the finite factorization property (arXiv.org, 18 Jul 2023)

An (additive) commutative monoid is called atomic if every given non-invertible element can be written as a sum of atoms (i.e., irreducible elements), in which case, such a sum is called a factorization of the given element. The number of atoms (counting repetitions) in the corresponding sum is called the length of the factorization. Following Geroldinger and Zhong, we say that an atomic monoid $M$ is a length-finite factorization monoid if each $b \in M$ has only finitely many factorizations of any prescribed length. An additive submonoid of $\mathbb{R}_{\ge 0}$ is called a positive monoid. Factorizations in positive monoids have been actively studied in recent years. The main purpose of this paper is to give a better understanding of the non-unique factorization phenomenon in positive monoids through the lens of the length-finite factorization property. To do so, we identify a large class of positive monoids which satisfy the length-finite factorization property. Then we compare the length-finite factorization property to the bounded and the finite factorization properties, which are two properties that have been systematically investigated for more than thirty years.

## 356) Alicia Li and Matan Yablon, Adversarial Attacks Against Online Learning Agents (1 Jul 2023)

Consider a typical streaming problem, where an agent dynamically interacts with its environment to learn an optimal behavior. Such methods are used in a variety of applications, including playing Atari games and robotic hand manipulation. We analyze an agent that learns the rewards of each path in its environment, which can be modeled as determining the edge weights of a graph. We study an agent that follows an ϵ-greedy sampling strategy because this model is widely used and has been successfully applied to many problems. However, in recent years, numerous attacks have been devised against graph learning algorithms, with some methods exploiting graph structure and node features. To ultimately create a robust graph streaming algorithm based on ϵ-annealing, we first construct, implement, and analyze worst-case attacks against random-sampling and ϵ-greedy victim models. Our adversarial strategy exploits path overlaps and stalls the victim to effectively increase the corruption budget.
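The ϵ-greedy sampling strategy mentioned above can be sketched in its simplest form. This is a generic illustration, not the victim model or attack from the paper: the rewards are fixed hypothetical numbers standing in for path rewards, and `epsilon_greedy` and its parameters are assumed names.

```python
import random


def epsilon_greedy(rewards, epsilon=0.1, steps=500, seed=0):
    """Estimate the reward of each option with an epsilon-greedy sampler.

    `rewards` holds one fixed reward per option (hypothetical values).
    """
    rng = random.Random(seed)
    estimates = [0.0] * len(rewards)
    counts = [0] * len(rewards)
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an option uniformly at random.
            choice = rng.randrange(len(rewards))
        else:
            # Exploit: pick the option with the best current estimate.
            choice = max(range(len(rewards)), key=lambda i: estimates[i])
        counts[choice] += 1
        # Incremental update of the running mean for the chosen option.
        estimates[choice] += (rewards[choice] - estimates[choice]) / counts[choice]
    return estimates, counts


estimates, counts = epsilon_greedy([0.1, 0.9, 0.5])
greedy_only = epsilon_greedy([0.1, 0.9, 0.5], epsilon=0.0)
```

With `epsilon=0.0` the agent never explores and stays on the first option it tried, which illustrates why the exploration probability matters for such an agent.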

## 355) Linus Tang, Extremal Bounds on Peripherality Measures (arXiv.org, 27 Jun 2023)

We investigate several measures of peripherality for vertices and edges in networks. We improve asymptotic bounds on the maximum value achieved by edge peripherality, edge sum peripherality, and the Trinajstić index over $n$-vertex graphs. We also prove similar results on the maxima over $n$-vertex bipartite graphs, trees, and graphs with a fixed diameter. Finally, we refute two conjectures of Furtula, the first on necessary conditions for minimizing the Trinajstić index and the second about maximizing the Trinajstić index.

## 354) David Dong, Generalized Eulerian Numbers (arXiv.org, 16 Jun 2023)

Let $A(n,m)$ denote the Eulerian numbers, which count the number of permutations on $[n]$ with exactly $m$ descents. It is well known that $A(n,m)$ also counts the number of permutations on $[n]$ with exactly $m$ excedances. In this report, we define numbers of the form $A(n,m,k)$, which count the number of permutations on $[n]$ with exactly $m$ descents and the last element $k$. We then show bijections between this definition and various other analogs for $r$-excedances and $r$-descents. We also prove a variation of Worpitzky's identity on $A(n,m,k)$ using a combinatorial argument mentioned in a paper by Spivey in 2021.
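The counting definitions above are easy to check by brute force for small $n$. The following sketch enumerates all permutations of $[n]$; the function names are illustrative, and `eulerian_last` implements the refinement $A(n,m,k)$ exactly as described in the abstract (exactly $m$ descents and last element $k$).

```python
from itertools import permutations


def descents(p):
    """Number of positions i with p[i] > p[i+1]."""
    return sum(1 for a, b in zip(p, p[1:]) if a > b)


def excedances(p):
    """Number of positions i (1-indexed) with p(i) > i."""
    return sum(1 for i, v in enumerate(p, start=1) if v > i)


def eulerian_by_descents(n, m):
    """A(n, m): permutations of [n] with exactly m descents."""
    return sum(1 for p in permutations(range(1, n + 1)) if descents(p) == m)


def eulerian_by_excedances(n, m):
    """Permutations of [n] with exactly m excedances."""
    return sum(1 for p in permutations(range(1, n + 1)) if excedances(p) == m)


def eulerian_last(n, m, k):
    """A(n, m, k): permutations of [n] with m descents whose last element is k."""
    return sum(
        1
        for p in permutations(range(1, n + 1))
        if descents(p) == m and p[-1] == k
    )
```

Summing `eulerian_last(n, m, k)` over all $k$ recovers $A(n,m)$, since every permutation has exactly one last element.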

## 353) Joseph Vulakh, Twisted homogeneous racks over the alternating groups (arXiv.org, 30 May 2023)

An important step towards the classification of finite-dimensional pointed Hopf algebras is the classification of finite-dimensional Nichols algebras arising from braided vector spaces of group type. This question is fundamentally linked with the structure of algebraic objects called racks. Of particular interest to this classification is the type D condition on racks, a sufficient condition for a rack to not be the source of a finite-dimensional Nichols algebra. In this paper, we study the type D condition in simple racks arising from the alternating groups. Expanding upon previous work in this direction, we make progress towards a general classification of twisted homogeneous racks of type D by proving that several families of twisted homogeneous racks arising from alternating groups are of type D.

## 352) Agustina Czenky, William Gvozdjak (PRIMES), Julia Plavnik, Classification of low-rank odd-dimensional modular categories (arXiv.org, 23 May 2023), published in Journal of Algebra, and also presented at BIMSA-Tsinghua Quantum Symmetry Seminar

We prove that any odd-dimensional modular category of rank at most $23$ is pointed. We also show that an odd-dimensional modular category of rank $25$ is either pointed, perfect, or equivalent to $\operatorname{Rep}(D^\omega(\mathbb Z_7\rtimes\mathbb Z_3))$. Finally, we give partial classification results for modular categories of rank up to $73$.

## 2022 Research Papers

## 351) Matvey Borodin, Ethan Liu, Justin Zhang, Results on Vanishing Polynomials and Polynomial Root Counting (arXiv.org, 24 Sept 2023), forthcoming in Proceedings of the 2023 IEEE MIT Undergraduate Research Technology Conference

We study the set of algebraic objects known as vanishing polynomials (the set of polynomials that annihilate all elements of a ring) over general commutative rings with identity. These objects are of special interest due to their close connections to both ring theory and the technical applications of polynomials, along with numerous applications to other mathematical and engineering fields. We first determine the minimum degree of monic vanishing polynomials over a specific infinite family of rings of a specific form and consider a generalization of the notion of a monic vanishing polynomial over a subring. We then present a partial classification of the ideal of vanishing polynomials over general commutative rings with identity of prime and prime square orders. Finally, we prove some results on rings that have a finite number of roots and propose a technique that can be utilized to restrict the number of roots polynomials can have over certain finite commutative rings.
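The notion of a vanishing polynomial can be illustrated on the rings $\mathbb{Z}/n\mathbb{Z}$. This choice of ring is an assumption made for the example, not the family of rings studied in the paper, and the brute-force search below is only practical for very small $n$. The names `is_vanishing` and `min_monic_vanishing_degree` are illustrative.

```python
from itertools import product


def is_vanishing(coeffs, n):
    """True if the polynomial is zero at every element of Z/nZ (n >= 2).

    coeffs[i] is the coefficient of x**i.
    """
    return all(
        sum(c * pow(x, i, n) for i, c in enumerate(coeffs)) % n == 0
        for x in range(n)
    )


def min_monic_vanishing_degree(n):
    """Smallest degree of a monic vanishing polynomial over Z/nZ, by brute force."""
    degree = 1
    while True:
        # Try every choice of the lower-order coefficients; the leading one is 1.
        for lower in product(range(n), repeat=degree):
            if is_vanishing(list(lower) + [1], n):
                return degree
        degree += 1
```

For example, $x^3 - x = (x-1)x(x+1)$ is a product of three consecutive integers and therefore vanishes on $\mathbb{Z}/6\mathbb{Z}$, while $x^2 - x$ does not (it takes the value 2 at $x = 2$).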

## 350) Daniel Kriz, Eric Shen (PRIMES), and Kevin Wu (PRIMES), Congruences between logarithms of Heegner points (26 Mar 2023)

Elliptic curves are an important class of Diophantine equations. We study certain special solutions of elliptic curves called Heegner points, which are the traces of images under modular parametrizations of complex multiplication points in the complex upper half-plane. We prove, for pairs of elliptic curves with isomorphic Galois representations, a general congruence of stabilized formal logarithms. This is done by first showing that the isomorphism of Galois representations implies a congruence of stabilized modular forms and then translating these to the congruence of formal logarithms using Honda’s theorem relating formal groups of elliptic curves to $L$-series and the modular parametrization. We use this congruence to show that examples of elliptic curves with analytic and algebraic rank 1 propagate in quadratic twist families.

## 349) Brendan Halstead, Moduli spaces of morphisms between cone stacks (22 Mar 2023)

We study morphisms between *cone stacks*, objects defined by Cavalieri, Chan, Ulirsch, and Wise as a framework for moduli problems in tropical geometry. We construct a cone stack $[\Sigma, \Gamma]$ parameterizing morphisms between fixed cone stacks $\Sigma$ and $\Gamma$. We also briefly discuss applications to logarithmic geometry.

## 348) Annie Wang, On the Hilbert Series of the Rational Cherednik Algebra in Type $A_n$ in Characteristic $p$ (28 Feb 2023)

We study the polynomial representation of the rational Cherednik algebra of type $A$ in characteristic $p=3$ with $p$ dividing $n-2$, parameter $t=0$, and generic parameter $c$. We describe all the polynomials in the maximal proper graded submodule $\ker{\mathcal{B}}$, which is the kernel of the contravariant form $\mathcal{B}$, and we use this to find the Hilbert series of the irreducible quotient for the polynomial representation. We proceed degree by degree to explicitly determine the Hilbert series and work towards proving Etingof and Rains's conjecture in the case that $p=3$, $t=0$, and $n=kp+2$.

## 347) Tanya Khovanova (MIT), Rich Wang (PRIMES), Ending States of a Special Variant of the Chip-Firing Algorithm (arXiv.org, 21 Feb 2023)

We investigate a special variant of chip-firing, in which we consider an infinite set of rooms on a number line, some of which are occupied by violinists. In a move, we take two violinists in adjacent rooms, and send one of them to the closest unoccupied room to the left and the other to the closest unoccupied room to the right. We classify the different possible final states from repeatedly performing this operation. We introduce numbers $R(N,\ell,x)$ that count labeled recursive rooted trees with $N$ vertices, $\ell$ leaves, and the smallest rooted path ending in $x$. We describe the properties of these numbers and connect them to permutations. We conjecture that these numbers describe the probabilities of ending in different final states when the moves are chosen uniformly.
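The move described above can be simulated directly. The sketch below makes two assumptions that the abstract does not spell out: the search for an unoccupied room starts just outside the chosen pair, and the two rooms of the pair are vacated by the move. The names `fire` and `is_final` are illustrative.

```python
def fire(occupied, left_room):
    """Apply one move to the adjacent pair in rooms left_room and left_room + 1.

    `occupied` is a frozenset of occupied room numbers; a new frozenset is returned.
    """
    if left_room not in occupied or left_room + 1 not in occupied:
        raise ValueError("both rooms of the pair must be occupied")
    # Closest unoccupied room to the left of the pair.
    new_left = left_room - 1
    while new_left in occupied:
        new_left -= 1
    # Closest unoccupied room to the right of the pair.
    new_right = left_room + 2
    while new_right in occupied:
        new_right += 1
    return (occupied - {left_room, left_room + 1}) | {new_left, new_right}


def is_final(occupied):
    """A state is final when no two occupied rooms are adjacent."""
    return all(room + 1 not in occupied for room in occupied)


start = frozenset({0, 1, 2})
via_left_pair = fire(fire(start, 0), 2)
via_right_pair = fire(fire(start, 1), -1)
```

Starting from three violinists in rooms 0, 1, 2, the two possible first moves lead to different final states under these assumptions, which is the kind of dependence on move order that the classification of final states addresses.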

## 346) Khalid Ajran, Juliet Bringas, Bangzheng Li, Easton Singer, Marcos Tirador (CrowdMath-2022), Factorization in Additive Monoids of Evaluation Polynomial Semirings (arXiv.org, 5 Feb 2023), published in Communications in Algebra 51:10 (2023): 4347-4362

For a positive real $α$, we can consider the additive submonoid $M$ of the real line that is generated by the nonnegative powers of $α$. When $α$ is transcendental, $M$ is a unique factorization monoid. However, when $α$ is algebraic, $M$ may not be atomic, and even when $M$ is atomic, it may contain elements having more than one factorization (i.e., decomposition as a sum of irreducibles). The main purpose of this paper is to study the phenomenon of multiple factorizations inside $M$. When $α$ is algebraic but not rational, the arithmetic of factorizations in $M$ is highly interesting and complex. In order to arrive at that conclusion, we investigate various factorization invariants of $M$, including the sets of lengths, sets of Betti elements, and catenary degrees. Our investigation gives continuity to recent studies carried out by Chapman, et al. in 2020 and by Correa-Morris and Gotti in 2022.

## 345) Benjamin Fan (PRIMES), Edward Qiao (PRIMES), Anran Jiao, Zhouzhou Gu, Wenhao Li, and Lu Lu, Deep Learning for Solving and Estimating Dynamic Macro-Finance Models (4 Feb 2023)

Deep learning has been shown to be an effective method for solving partial differential equations (PDEs) by embedding the PDE residual into the neural network loss function. In this paper, we design a methodology that utilizes deep learning to simultaneously solve and estimate canonical continuous-time general equilibrium models in financial economics, including (1) industrial dynamics of firms and (2) macroeconomic models with financial frictions. Through these applications, we illustrate the advantages of our method.

## 344) Steven Tan, Models for Somatic CAG Repeat Expansion in the Onset and Progression of Huntington's Disease (30 Jan 2023)

Huntington's Disease (HD) is an inherited neurodegenerative disease caused by alleles with 36 or more repeats of the trinucleotide sequence CAG in the huntingtin (HTT) gene. A person with HD inherits an allele with a certain CAG length (> 35) at birth, but somatic expansion within the brain is known to occur throughout their lifetime, resulting in a situation in which individual cells have longer and highly variable numbers of CAG repeats. Somatic expansion is increasingly thought to be a driver of disease onset, as age-at-onset associates with modifier alleles in DNA-repair genes that regulate somatic expansion. Thus, a better understanding of the mechanisms behind CAG repeat expansion could be crucial in revealing novel therapeutic targets. In this study, we adapted a stochastic birth-death model previously used for a different repeat-expansion disease (Myotonic Dystrophy Type 1, or DM1) to model CAG repeat expansion in HD. We made use of a new kind of biological data, in which CAG length has been measured precisely in many individual neurons of the most vulnerable type from post mortem brain samples. We found that single-process models consisting of only one length threshold and rate — models that succeeded in modeling DM1 — were unable to explain all features of repeat expansion data observed in HD patients. Effectively fitting the data required models consisting of two separate processes, suggesting that there may be two distinct biological mechanisms underlying CAG repeat expansion in HD. These processes appear to have differing rates and CAG length thresholds: one at roughly 36 CAGs — a threshold for instability — and another at 70 CAGs, which we hypothesize is a threshold for accelerated expansion. This model deepens our understanding of disease progression and can inform the design of clinical trials for new therapies that target the somatic expansion process.

## 343) Garett Brown, Linda He (PRIMES), and James Unwin, The Potential Impact of Primordial Black Holes on Exoplanet Systems (28 Jan 2023), forthcoming in Monthly Notices of the Royal Astronomical Society

The orbits of planetary systems can be deformed from their initial configurations due to close encounters with large astrophysical bodies. Candidates for close encounters include astrophysical black holes, brown dwarf stars, rogue planets, as well as hypothetical populations of primordial black holes (PBH) or dark matter microhalos. We show that potentially tens of thousands of exoplanetary systems in the Milky Way may have had close encounters with PBH significant enough to impact their planetary orbits. Furthermore, we propose that precision measurements of exoplanet orbital parameters could be used to infer or constrain the abundances of these astrophysical bodies. Specifically, focusing on PBH we numerically estimate the number of times that such objects pass through the local neighborhood of a given planetary system, and then analyze the statistical impact on the orbital parameters of such systems.

## 342) Nilay Mishra, On the Uniqueness of Certain Types of Circle Packings on Translation Surfaces (26 Jan 2023)

Consider a collection of finitely many polygons in $\mathbb C$, such that for each side of each polygon, there exists another side of some polygon in the collection (possibly the same) that is parallel and of equal length. A translation surface is the surface formed by identifying these opposite sides with one another. The $\mathcal{H}(1, 1)$ stratum consists of genus two translation surfaces with two singularities of order one. A circle packing corresponding to a graph $G$ is a configuration of disjoint disks such that each vertex of $G$ corresponds to a circle, two disks are externally tangent if and only if their vertices are connected by an edge in $G$, and $G$ is a triangulation of the surface. It is proven that for certain circle packings on $\mathcal{H}(1, 1)$ translation surfaces, there are only a finite number of ways the packing can vary without changing the contact graph, if two disks along the slit are fixed in place. These variations can be explicitly characterized using a new concept known as *splitting bigons*. Finally, the uniqueness theorem is generalized to a specific type of translation surface with arbitrary genus $g \geq 2$.

## 341) Yibo Gao (MIT) and Anthony Wang (PRIMES), Consecutive Patterns in Coxeter Groups (25 Jan 2023), published in Journal of Algebra, vol. 634 (15 November 2023): 650-666

For an arbitrary Coxeter group element $\sigma$ and a connected subset $J$ of the Coxeter diagram, the parabolic decomposition $\sigma=\sigma^J\sigma_J$ defines $\sigma_J$ as a consecutive pattern of $\sigma$, generalizing the notion of consecutive patterns in permutations. We then define the cc-Wilf-equivalence classes as an extension of the c-Wilf-equivalence classes for permutations, and identify non-trivial families of cc-Wilf-equivalent classes. Furthermore, we study the structure of the consecutive pattern poset in Coxeter groups and prove that its Möbius function is bounded by $2$ when the arguments belong to finite Coxeter groups, but can be arbitrarily large otherwise.

## 340) Eric Chen and Alex Zitzewitz, Unitary Conditions for Lamé and Heun Differential Operators (25 Jan 2023)

In this paper, we explore the connections between the so-called "accessory parameter" of the Heun Equation and the properties of its monodromy groups. In particular, we investigate which numerical values of the accessory parameter yield unitary monodromy groups (i.e., those that preserve a Hermitian inner product). To this end, we employ both analytical and computational methods, extending previous work on the Lamé Equation. In particular, for a large class of Heun Equations (generalizing the Lamé Equation), we prove a connection between unitarity and the traces of certain monodromy matrices. We exploit this theorem to create an algorithm that finds accessory parameters that yield unitary monodromy groups. Using this algorithm, we calculate and report the values of the accessory parameter that give rise to unitary monodromy groups. We also draw convergence maps, demonstrating the convergence and overall robustness of our algorithm. Finally, we derive an asymptotic formula for the desired accessory parameters which agrees with our numerical results.

## 339) Advay Goel (PRIMES) and Zoe Wellner (CMU), The Geometry and Limits of Young Partition Flow Polytopes (23 Jan 2023)

In 2017, Mészáros, Simpson, and Wellner demonstrated that certain flow polytopes resulting from Young tableaux are easily decomposed into simplices, and others have a natural relation to the well-known Tesler and CRY polytopes. Within a family of polytopes determined by a single tableau shape, they introduced the limiting polytope. The limiting polytope is a useful notion since it is easy to decompose into a product of simplices. In this work, we use geometric decomposition to further examine the limiting process within each family of polytopes. Our main results analyze the family of hooks, and we demonstrate an algorithm to get geometric decompositions.

## 338) Yihao (Michael) Huang (PRIMES), Shangdi Yu (MIT), and Julian Shun (MIT), Faster Parallel Exact Density Peaks Clustering (16 Jan 2023)

Clustering multidimensional points is a fundamental data mining task, with applications in many fields, such as astronomy, neuroscience, bioinformatics, and computer vision. The goal of clustering algorithms is to group similar objects together. Density-based clustering is a clustering approach that defines clusters as dense regions of points. It has the advantage of being able to detect clusters of arbitrary shapes, rendering it useful in many applications. In this paper, we propose fast parallel algorithms for Density Peaks Clustering (DPC), a popular variant of density-based clustering. Existing exact DPC algorithms suffer from low parallelism both in theory and in practice, which limits their application to large-scale data sets. Our most performant algorithm, which is based on priority search $k$d-trees, achieves $O(\log n \log\log n)$ span (parallel time complexity). Our algorithm is also work-efficient, achieving a work complexity matching the best existing sequential exact DPC algorithm. In addition, we present another DPC algorithm based on a Fenwick tree that makes fewer assumptions for its average-case complexity to hold. We provide optimized implementations of our algorithms and evaluate their performance via extensive experiments. On a 30-core machine with two-way hyperthreading, we find that our best algorithm achieves a 10.8–13169x speedup over the previous best parallel exact DPC algorithm. Compared to the state-of-the-art parallel approximate DPC algorithm, our best algorithm achieves a geometric mean speedup of 55.8x while being exact.

## 337) Andrey Khesin (MIT), Andrew Tung (PRIMES), and Karthik Vedula (PRIMES), New Properties of Intrinsic Information and Their Relation to Bound Secrecy (16 Jan 2023)

Two parties, Alice and Bob, seek to generate a mutually agreed upon string of bits, unknown to an eavesdropper Eve, by sampling repeatedly from a joint probability distribution. The secret-key rate has been defined as the asymptotic rate at which Alice and Bob can extract secret bits after sampling many times from the probability distribution. The secret-key rate has been bounded above by two information-theoretic quantities, first by the intrinsic information, and more strongly by the reduced intrinsic information. However, in this paper we prove that the reduced intrinsic information is 0 if and only if the intrinsic information is 0. This result implies that at least one of the following two conjectures is false: either the conjecture of the existence of bound secrecy, distributions where the intrinsic information is positive but the secret-key rate is 0, or the conjecture that the reduced intrinsic information equals the secret-key rate. Furthermore, we introduce a number of promising approaches for showing that bound secrecy does indeed exist using the idea of binarization of random variables. We improve on previous work by giving an explicit construction for a particular candidate for bound secrecy of an information-erasing binarization.

## 336) Max Xu, Gonality Sequences of Multipartite Graphs (15 Jan 2023)

In this paper, we deal with a particular sequence associated with a graph, the gonality sequence. The gonality sequence is part of the larger topic of the chip-firing game on a graph $G$. The gonality sequence of a graph measures how much the degree of a divisor on that graph needs to change in order to increase its rank. Portions of the gonality sequence are known when the input is greater than the genus. However, little work has been done to find the first terms of the gonality sequence. In this paper, we partially compute the first terms of the gonality sequence for some complete multipartite graphs. In particular, we analyze those in which all but one partite class has a single vertex, and we present some results and further conjectures.

## 335) Jiayi Dong and Anshul Rastogi, Locating regions of uncertainty in distributed systems using aggregate trace data (15 Jan 2023)

Distributed systems are central to countless applications in the modern world. These applications can have tens to thousands of interacting components, making it difficult to identify the source of performance problems. Distributed tracing is widely used to elucidate the interactions within a distributed system; however, instrumenting system codebases can be tedious, and collecting tracing data generates overhead. Optimally, minimal instrumentation is added to the regions of the codebase that explain the majority of the system's performance variation. We present a prototype application that highlights regions of performance uncertainty in a system, guiding developers to where instrumentation would most increase predictability. Using aggregate trace data, spans are ranked by uncertainty metrics, which are primarily the standard deviation and coefficient of variation of the exclusive latencies of an operation across multiple traces. We developed our prototype in Python and applied it to trace data extracted from HotROD. We evaluated our tool on four test scenarios where we injected latency into services in HotROD. Our tool highlights the service(s) with injected latency in all four test cases.
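The two uncertainty metrics named above are straightforward to compute. The following is a minimal sketch, not the authors' prototype: the operation names and latency values are hypothetical, and the choice to sort by coefficient of variation using the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def rank_by_uncertainty(latencies_by_operation):
    """Rank operations by the coefficient of variation of their exclusive latencies."""
    rows = []
    for operation, samples in latencies_by_operation.items():
        avg = mean(samples)
        sd = stdev(samples)  # sample standard deviation across traces
        cv = sd / avg if avg else float("inf")
        rows.append((operation, sd, cv))
    # Most uncertain operations first.
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical exclusive latencies in milliseconds, one value per trace.
example = {
    "frontend.dispatch": [10.0, 10.5, 9.5, 10.0],
    "driver.lookup": [20.0, 80.0, 35.0, 65.0],
    "route.compute": [50.0, 52.0, 49.0, 49.0],
}
ranking = rank_by_uncertainty(example)
```

The coefficient of variation divides the standard deviation by the mean, so an operation with small but erratic latencies can outrank a slow but steady one.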

## 334) Alicia Li and Matan Yablon, Adversarial Attacks Against Online Reinforcement Learning Agents in MDPs (15 Jan 2023)

Online Reinforcement Learning (RL) is a fast-growing branch of machine learning with increasingly important applications. Moreover, making RL algorithms robust against perturbations is essential to their utility in the real world. Adversarial RL, in which an attacker attempts to degrade an RL agent's performance by perturbing the environment, can be used to understand how to robustify RL systems. In this work, we connect an adversarial attack model to streaming algorithms: the victim samples paths based on its interactions with the environment, while the adversary corrupts this stream of data. We construct an attack algorithm in Markov Decision Processes (MDPs) for a random-sampling victim and prove its optimality, in addition to investigating an adversarial strategy against an epsilon-greedy victim with a warm start period. In the epsilon-greedy setting, we bound adversarial corruption and analyze how to exploit this highly adaptive model to improve upon warm start budget. Experimentally, we show that our algorithm outperforms baseline attacks, and we generate random MDPs to characterize how their general-case structure affects the adversary's ability to maintain its warm start corruption.

## 333) Jeffrey Chen (PRIMES) and Jesse Selover (UMass Amherst), Positivity properties of the q-hit numbers (15 Jan 2023)

We consider the problem of counting matrices over a finite field with fixed rank and support contained in a fixed set. The count of such matrices gives a $q$-analogue of the classical rook number, but it is known not to be polynomial in $q$ in general. We use inclusion-exclusion on the support of the matrices and the orbit counting method of Lewis et al. to show that the residues of these functions in low degrees are polynomial. We define a generalization of the rook and hit numbers over certain classes of graphs. This provides us a formula for residues of the $q$-rook and $q$-hit numbers in low degrees. We analyze the residues of the $q$-hit number and show that the coefficient of $q-1$ in the $q$-hit number is always non-negative.

## 332) Sacha Servan-Schreiber (MIT), Simon Beyzerov (PRIMES), Eli Yablon (PRIMES), and Hyojae Park (PRIMES), Private Access Control for Function Secret Sharing (15 Jan 2023)

Function Secret Sharing (FSS; Eurocrypt 2015) allows a dealer to share a function $f$ with two or more evaluators. Given secret shares of a function $f$, the evaluators can locally compute secret shares of $f(x)$ on an input $x$, without learning information about $f$. In this paper, we initiate the study of access control for FSS. Given the shares of $f$, the evaluators can ensure that the dealer is authorized to share the provided function. For a function family $F$ and an access control list defined over the family, the evaluators receiving the shares of $f ∈ F$ can efficiently check that the dealer knows the access key for $f$. This model enables new applications of FSS, such as: (1) anonymous authentication in a multiparty setting, (2) access control in private databases, and (3) authentication and spam prevention in anonymous communication systems. Our definitions and constructions abstract and improve the concrete efficiency of several recent systems that implement ad-hoc mechanisms for access control over FSS. The main building block behind our efficiency improvement is a discrete-logarithm zero-knowledge proof-of-knowledge over secret-shared elements, which may be of independent interest. We evaluate our constructions and show a 50–70× reduction in computational overhead compared to existing access control techniques used in anonymous communication. In other applications, such as private databases, the processing cost of introducing access control is only 1.5–3× when amortized over databases with 500,000 or more items.

## 331) Derek Liu (PRIMES) and Yuan Yao (MIT), Arrangements of Simplices in Fine Mixed Subdivisions (12 Jan 2023)

A regular simplex of side length $n$ can be subdivided into multiple polytopes, each of which is a Minkowski sum of some faces of a unit simplex. Ardila and Billey have shown that exactly $n$ of these cells must be simplices, and their positions must be in a “spread-out” arrangement. In this paper, we consider their question of whether every spread-out arrangement of simplices can be extended into such a subdivision, especially in the three-dimensional case. We prove that a specific class of these arrangements, namely those that project down to a two-dimensional spread-out arrangement, all extend to a subdivision.

## 330) George Cao (PRIMES), Kent B. Vashaw (MIT), On the decomposition of tensor products of monomial modules for finite 2-groups (arXiv.org, 11 Jan 2023)

Dave Benson conjectured in 2020 that if $G$ is a finite $2$-group and $V$ is an odd-dimensional indecomposable representation of $G$ over an algebraically closed field $\Bbbk$ of characteristic $2$, then the only odd-dimensional indecomposable summand of $V \otimes V^*$ is the trivial representation $\Bbbk$. This would imply that a tensor power of an odd-dimensional indecomposable representation of $G$ over $\Bbbk$ has a unique odd-dimensional summand. Benson has further conjectured that, given such a representation $V$, the function sending a positive integer $n$ to the dimension of the unique odd-dimensional indecomposable summand of $V^{\otimes n}$ is quasi-polynomial. We examine this conjecture for monomial modules, a class of graded representations for the group $\mathbb{Z}/{2^r}\mathbb{Z} \times \mathbb{Z}/{2^s}\mathbb{Z}$ which correspond to skew Young diagrams. We prove the tensor powers conjecture for several modules, giving some of the first nontrivial cases where this conjecture has been verified, and we give conjectural quasi-polynomials for a broad range of monomial modules based on computational evidence.

## 329) Jesse Geneson (SJSU), Ethan Zhou (PRIMES), Online Learning of Smooth Functions (arXiv.org, 4 Jan 2023)

In this paper, we study the online learning of real-valued functions where the hidden function is known to have certain smoothness properties. Specifically, for $q \ge 1$, let $\mathcal F_q$ be the class of absolutely continuous functions $f: [0,1] \to \mathbb R$ such that $\|f'\|_q \le 1$. For $q \ge 1$ and $d \in \mathbb Z^+$, let $\mathcal F_{q,d}$ be the class of functions $f: [0,1]^d \to \mathbb R$ such that any function $g: [0,1] \to \mathbb R$ formed by fixing all but one parameter of $f$ is in $\mathcal F_q$. For any class of real-valued functions $\mathcal F$ and $p>0$, let $\text{opt}_p(\mathcal F)$ be the best upper bound on the sum of $p^{\text{th}}$ powers of absolute prediction errors that a learner can guarantee in the worst case. In the single-variable setup, we find new bounds for $\text{opt}_p(\mathcal F_q)$ that are sharp up to a constant factor. We show for all $\varepsilon \in (0, 1)$ that $\text{opt}_{1+\varepsilon}(\mathcal{F}_{\infty}) = Θ(\varepsilon^{-\frac{1}{2}})$ and $\text{opt}_{1+\varepsilon}(\mathcal{F}_q) = Θ(\varepsilon^{-\frac{1}{2}})$ for all $q \ge 2$. We also show for $\varepsilon \in (0,1)$ that $\text{opt}_2(\mathcal F_{1+\varepsilon})=Θ(\varepsilon^{-1})$. In addition, we obtain new exact results by proving that $\text{opt}_p(\mathcal F_q)=1$ for $q \in (1,2)$ and $p \ge 2+\frac{1}{q-1}$. In the multi-variable setup, we establish inequalities relating $\text{opt}_p(\mathcal F_{q,d})$ to $\text{opt}_p(\mathcal F_q)$ and show that $\text{opt}_p(\mathcal F_{\infty,d})$ is infinite when $p<d$ and finite when $p>d$. We also obtain sharp bounds on learning $\mathcal F_{\infty,d}$ for $p < d$ when the number of trials is bounded.

## 328) Coleman DuPlessie and Eddie Wei, Deep Learning Transformers for Non-cyclical Kinematics (31 Dec 2022)

Machine learning is a useful tool in the field of kinematics because of its ability to easily analyze high-dimensional temporal data and recognize patterns that are often not discernible to humans. Many machine learning models have already been applied to human kinematics, yet the transformer, a model that is especially good at capturing long-distance relationships in data, has not yet been applied to this field. Because common models such as LSTMs perform much worse on non-cyclical data than on cyclical data, their usefulness in the field of kinematics is limited. We theorize that, because Transformers can better represent long-term dependencies, they will achieve superior performance on tasks in this field, where the time series data is significantly aperiodic. In this work, we have compared Transformers and similar models to an LSTM model and a heuristic benchmark on non-cyclical, 3-dimensional positional data from CMU’s Quality of Life Grand Challenge Kitchen dataset and found that vanilla Transformers are able to outperform both LSTMs and simple heuristics.

## 327) S. K. Devalapurkar (Harvard), and M. L. Misterka (PRIMES), Generalized n-Series and de Rham Complexes (31 Dec 2022)

The goal of this article is to study some basic algebraic and combinatorial properties of “generalized $n$-series” over a commutative ring $R$, which are functions $s: \mathbb{Z}_{\geq 0} \to R$ satisfying a mild condition. A special example of generalized $n$-series is given by the $q$-integers $\frac{q^n-1}{q-1} \in \mathbb{Z}[q]$. Given a generalized $n$-series $s$, one can define $s$-analogues of factorials (via $n!_s = \prod_{i=1}^n s(i)$) and binomial coefficients. We prove that Pascal's identity, the binomial identity, Lucas' theorem, and the Vandermonde identity admit $s$-analogues; each of these specializes to its appropriate $q$-analogue in the case of the $q$-integer generalized $n$-series. We also study the growth rates of generalized $n$-series defined over the integers. Finally, we define an $s$-analogue of the ($q$-)derivative, and prove $s$-analogues of the Poincaré lemma and the Cartier isomorphism for the affine line, as well as a pullback square due to Bhatt-Lurie.
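As a small illustration of the definitions in this abstract, the sketch below builds $s$-factorials and $s$-binomial coefficients from a function $s$, taking $n!_s$ to be the product $s(1) s(2) \cdots s(n)$, and checks the classical $q$-Pascal identity for the $q$-integers evaluated at $q = 2$. The helper names are assumptions, and the check covers only the $q$-integer case, not the general $s$-analogue proved in the paper.

```python
def q_integer(n, q):
    """Return 1 + q + ... + q^(n-1), which equals (q^n - 1)/(q - 1) for q != 1."""
    return sum(q**i for i in range(n))


def s_factorial(n, s):
    """Return s(1) * s(2) * ... * s(n); the empty product is 1."""
    result = 1
    for i in range(1, n + 1):
        result *= s(i)
    return result


def s_binomial(n, k, s):
    """s-binomial coefficient n!_s / (k!_s * (n-k)!_s).

    Floor division is exact for q-integers at an integer q (these are the
    Gaussian binomial coefficients); it is not claimed to be exact for an
    arbitrary function s.
    """
    return s_factorial(n, s) // (s_factorial(k, s) * s_factorial(n - k, s))


def two_integer(n):
    """The q-integer series evaluated at q = 2."""
    return q_integer(n, 2)


# Classical q-Pascal identity: [n, k]_q = [n-1, k-1]_q + q^k * [n-1, k]_q.
lhs = s_binomial(4, 2, two_integer)
rhs = s_binomial(3, 1, two_integer) + 2**2 * s_binomial(3, 2, two_integer)
print(lhs, rhs)
```

Setting $q = 1$ turns the $q$-integer into $n$ itself, so the same functions then return ordinary factorials and binomial coefficients.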

## 326) Matvey Borodin, Ethan Liu, Justin Zhang, The Ideal of Vanishing Polynomials and the Ring of Polynomial Functions (25 Dec 2022; arXiv.org, 24 Sept 2023)

Vanishing polynomials are polynomials over a ring which output $0$ for all elements in the ring. In this paper, we study the ideal of vanishing polynomials over specific types of rings, along with the closely related ring of polynomial functions. In particular, we provide several results on generating vanishing polynomials. We first analyze the ideal of vanishing polynomials over $\mathbb{Z}_n$, the ring of the integers modulo $n$. We then establish an isomorphism between the vanishing polynomials of a ring and the vanishing polynomials of the constituent rings in its decomposition. Lastly, we generalize our results to study the ideal of vanishing polynomials over arbitrary commutative rings.
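The definition of a vanishing polynomial over $\mathbb{Z}_n$ can be tested directly by evaluating the polynomial at every residue. The sketch below does this with Horner's rule; the function names and the coefficient convention (highest degree first) are assumptions for illustration, not notation from the paper.

```python
def evaluate_mod(coeffs, x, n):
    """Evaluate a polynomial at x modulo n using Horner's rule.

    coeffs lists the coefficients from highest degree to constant term.
    """
    acc = 0
    for c in coeffs:
        acc = (acc * x + c) % n
    return acc


def is_vanishing(coeffs, n):
    """Return True if the polynomial is 0 at every element of Z_n."""
    return all(evaluate_mod(coeffs, x, n) == 0 for x in range(n))


# x^3 - x = (x - 1) x (x + 1) is a product of three consecutive integers,
# so it is divisible by 6 for every integer x.
cubic = [1, 0, -1, 0]
print(is_vanishing(cubic, 6))  # vanishes over Z_6
print(is_vanishing(cubic, 4))  # fails at x = 2, since 8 - 2 = 6 is 2 mod 4
```

The example shows that a nonzero polynomial can vanish over one ring $\mathbb{Z}_n$ and not over another, which is why the ideal depends on $n$.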

## 325) Felix Gotti (MIT), Joseph Vulakh (PRIMES), On the atomic structure of torsion-free monoids (arXiv.org, 16 Dec 2022), published in Semigroup Forum 107 (2023): 402–423

Let $M$ be a cancellative and commutative (additive) monoid. The monoid $M$ is atomic if every non-invertible element can be written as a sum of irreducible elements, which are also called atoms. Also, $M$ satisfies the ascending chain condition on principal ideals (ACCP) if every increasing sequence of principal ideals (under inclusion) becomes constant from one point on. In the first part of this paper, we characterize torsion-free monoids that satisfy the ACCP as those torsion-free monoids whose submonoids are all atomic. A submonoid of the nonnegative cone of a totally ordered abelian group is often called a positive monoid. Every positive monoid is clearly torsion-free. In the second part of this paper, we study the atomic structure of certain classes of positive monoids.

## 324) Paul Gutkovich (PRIMES) and Zi Song Yeoh (MIT), Computing Truncated Metric Dimension of Trees (8 Dec 2022)

Let $G=(V,E)$ be a simple, unweighted, connected graph. Let $d(u,v)$ denote the distance between vertices $u,v$. A resolving set of $G$ is a subset $S$ of $V$ such that knowing the distance from a vertex $v$ to every vertex in $S$ uniquely identifies $v$. The metric dimension of $G$ is defined as the size of the smallest resolving set of $G$. We define the $k$-truncated resolving set and $k$-truncated metric dimension of a graph similarly, but with the notion of distance replaced with $d_k(u,v) := \min(d(u,v),k+1)$. In this paper, we demonstrate that computing the $k$-truncated metric dimension of trees is NP-Hard for general $k$. We then present a polynomial-time algorithm to compute the $k$-truncated metric dimension of trees when $k$ is a fixed constant.
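To make the definition of the $k$-truncated metric dimension concrete, here is a brute-force sketch that tries every candidate subset $S$ in increasing size and uses the truncated distance $d_k(u,v) = \min(d(u,v), k+1)$. It runs in exponential time and is not the polynomial-time tree algorithm from the paper; the function names and the adjacency-dict input format are assumptions.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Return a dict of shortest-path distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def truncated_metric_dimension(adj, k):
    """Return (size, resolving_set) for a smallest k-truncated resolving set.

    adj: dict mapping each vertex to a list of neighbours (connected graph).
    """
    vertices = sorted(adj)
    dist = {u: bfs_distances(adj, u) for u in vertices}
    for size in range(1, len(vertices) + 1):
        for candidate in combinations(vertices, size):
            # Each vertex gets a code: its truncated distances to the candidate set.
            codes = {
                tuple(min(dist[v][s], k + 1) for s in candidate)
                for v in vertices
            }
            if len(codes) == len(vertices):  # all codes distinct
                return size, candidate
    return None


# Path on five vertices: 0 - 1 - 2 - 3 - 4.
path5 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(truncated_metric_dimension(path5, 1))
print(truncated_metric_dimension(path5, 4))
```

On this path, truncating at $k = 1$ leaves only three possible distance values per landmark, so one landmark cannot separate five vertices, whereas with $k = 4$ a single endpoint suffices.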

## 323) Nitya Mani (MIT) and Edward Yu (PRIMES), Turán Problems for Mixed Graphs (arXiv.org, 23 Oct 2022)

We investigate natural Turán problems for mixed graphs, generalizations of graphs where edges can be either directed or undirected. We study a natural Turán density coefficient that measures how large a fraction of directed edges an $F$-free mixed graph can have; we establish an analogue of the Erdős-Stone-Simonovits theorem and give a variational characterization of the Turán density coefficient of any mixed graph (along with an associated extremal $F$-free family). This characterization enables us to highlight an important divergence between classical extremal numbers and the Turán density coefficient. We show that Turán density coefficients can be irrational, but are always algebraic; for every $k \in \mathbb N$, we construct a family of mixed graphs whose Turán density coefficient has algebraic degree $k$.

## 322) Alan Bu, Joseph Vulakh, and Alex Zhao, Length-Factoriality and Pure Irreducibility (arXiv.org, 13 Oct 2022), published in Communications in Algebra 51:9 (2023): 3745-3755

An atomic monoid $M$ is called length-factorial if for every non-invertible element $x \in M$, no two distinct factorizations of $x$ into irreducibles have the same length (i.e., number of irreducible factors, counting repetitions). The notion of length-factoriality was introduced by J. Coykendall and W. Smith in 2011 under the term 'other-half-factoriality': they used length-factoriality to provide a characterization of unique factorization domains. In this paper, we study length-factoriality in the more general context of commutative, cancellative monoids. In addition, we study factorization properties related to length-factoriality, namely, the PLS property (recently introduced by Chapman et al.) and bi-length-factoriality in the context of semirings.

## 321) Alan Lee, Connectedness in Friends-and-Strangers Graphs of Spiders and Complements (arXiv.org, 5 Oct 2022)

Let $X$ and $Y$ be two graphs with vertex set $[n]$. Their friends-and-strangers graph $\mathsf{FS}(X,Y)$ is a graph with vertex set $S_n$, and two permutations $σ$ and $σ'$ are adjacent if they are separated by a transposition $\{a,b\}$ such that $a$ and $b$ are adjacent in $X$ and $σ(a)$ and $σ(b)$ are adjacent in $Y$. Specific friends-and-strangers graphs such as $\mathsf{FS}(\mathsf{Path}_n,Y)$ and $\mathsf{FS}(\mathsf{Cycle}_n,Y)$ have been researched, and their connected components have been enumerated using various equivalence relations such as double-flip equivalence. A spider graph is a collection of path graphs that are all connected to a single center point. In this paper, we delve deeper into the question of when $\mathsf{FS}(X,Y)$ is connected when $X$ is a spider and $Y$ is the complement of a spider or a tadpole.

## 320) Scott T. Chapman (SHSU), Caroline Liu (PRIMES), Annabel Ma (PRIMES), Andrew Zhang (PRIMES), On the factorization invariants of arithmetical congruence monoids (arXiv.org, 3 Oct 2022)

In this paper, we study various factorization invariants of arithmetical congruence monoids. The invariants we investigate are the catenary degree, a measure of the maximum distance between any two factorizations of the same element, the length density, which describes the distribution of the factorization lengths of an element, and the omega primality, which measures how far an element is from being prime.

## 319) Colin Defant (MIT), David Dong (PRIMES), Alan Lee (PRIMES), Michelle Wei (PRIMES), Connectedness and Cycle Spaces of Friends-and-Strangers Graphs (arXiv.org, 4 Sept 2022)

If $X=(V(X),E(X))$ and $Y=(V(Y),E(Y))$ are $n$-vertex graphs, then their friends-and-strangers graph $\mathsf{FS}(X,Y)$ is the graph whose vertices are the bijections from $V(X)$ to $V(Y)$ in which two bijections $\sigma$ and $\sigma'$ are adjacent if and only if there is an edge $\{a,b\}\in E(X)$ such that $\{\sigma(a),\sigma(b)\}\in E(Y)$ and $\sigma'=\sigma\circ (a\,\,b)$, where $(a\,\,b)$ is the permutation of $V(X)$ that swaps $a$ and $b$. We prove general theorems that provide necessary and/or sufficient conditions for $\mathsf{FS}(X,Y)$ to be connected. As a corollary, we obtain a complete characterization of the graphs $Y$ such that $\mathsf{FS}(\mathsf{Dand}_{k,n},Y)$ is connected, where $\mathsf{Dand}_{k,n}$ is a dandelion graph; this substantially generalizes a theorem of the first author and Kravitz in the case $k=3$. For specific choices of $Y$, we characterize the spider graphs $X$ such that $\mathsf{FS}(X,Y)$ is connected. In a different vein, we study the cycle spaces of friends-and-strangers graphs. Naatz proved that if $X$ is a path graph, then the cycle space of $\mathsf{FS}(X,Y)$ is spanned by $4$-cycles and $6$-cycles; we show that the same statement holds when $X$ is a cycle and $Y$ has domination number at least $3$. When $X$ is a cycle and $Y$ has domination number at least $2$, our proof sheds light on how walks in $\mathsf{FS}(X,Y)$ behave under certain Coxeter moves.
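For small $n$, the friends-and-strangers graph defined above can be explored by brute force. The sketch below counts the connected components of $\mathsf{FS}(X,Y)$ by searching over all $n!$ bijections. Vertices are labelled $0, \dots, n-1$ and a bijection $\sigma$ is stored as a tuple with `sigma[a]` equal to $\sigma(a)$; these conventions and the function name are assumptions for illustration.

```python
from itertools import combinations, permutations


def fs_components(n, x_edges, y_edges):
    """Count connected components of FS(X, Y) for graphs on vertices 0..n-1."""
    y_set = {frozenset(edge) for edge in y_edges}
    seen = set()
    components = 0
    for start in permutations(range(n)):
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            sigma = stack.pop()
            for a, b in x_edges:
                # A swap across edge {a, b} of X is allowed only when the
                # images sigma(a) and sigma(b) are adjacent in Y.
                if frozenset((sigma[a], sigma[b])) in y_set:
                    swapped = list(sigma)
                    swapped[a], swapped[b] = swapped[b], swapped[a]
                    swapped = tuple(swapped)
                    if swapped not in seen:
                        seen.add(swapped)
                        stack.append(swapped)
    return components


path4 = [(0, 1), (1, 2), (2, 3)]
complete4 = list(combinations(range(4), 2))
print(fs_components(4, path4, complete4))  # adjacent swaps generate all of S_4
print(fs_components(4, path4, []))         # no swaps allowed: every vertex isolated
```

The search touches all $n!$ bijections, so it is only practical for very small $n$; the theorems in the paper are what settle connectedness in general.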

## 318) Paula Bergero, Laura P. Schaposnik, and Grace Wang (PRIMES), Correlations Between COVID-19 and Dengue (arXiv.org, 27 Jul 2022), published in Nature Scientific Reports (27 Jan 2023)

A dramatic increase in the number of outbreaks of Dengue has recently been reported, and climate change is likely to extend the geographical spread of the disease. In this context, this paper shows how a neural network approach can incorporate Dengue and COVID-19 data as well as external factors (such as social behaviour or climate variables), to develop predictive models that could improve our knowledge and provide useful tools for health policy makers. Through the use of neural networks with different social and natural parameters, in this paper we define a Correlation Model through which we show that the number of cases of COVID-19 and Dengue have very similar trends. We then illustrate the relevance of our model by extending it to a Long short-term memory model (LSTM) that incorporates both diseases, and using this to estimate Dengue infections via COVID-19 data in countries that lack sufficient Dengue data.

## 317) Zifan (Carl) Guo (PRIMES) and William S. Moses (MIT), Understanding High-Level Properties of Low-Level Programs Through Transformers (8 July 2022)

Transformer models have enabled breakthroughs in the field of natural language processing largely because unlike other models, Transformers can be trained on a large corpus of unlabeled data. One can then perform fine-tuning on the model to fit a specific task. Unlike natural language, which is somewhat tolerant of minor differences in word choices or ordering, the structured nature of programming languages means that program meaning can be completely redefined or be invalid if even one token is altered. In comparison to high-level languages, low-level languages are less expressive and more repetitive with more details from the computer microarchitecture. Whereas recent literature has examined how to effectively use Transformer models on high-level programming semantics, this project explores the effectiveness of applying Transformer models on low-level representations of programs that can shed light on better optimizing compilers. In this paper, we show that Transformer models can translate C to LLVM-IR with high accuracy, by training on a parallel corpus of functions extracted from 1 million compilable, open-sourced C programs (AnghaBench) and their corresponding LLVM-IR after compiling with Clang. Our model shows a $49.57\%$ verbatim match when evaluated on the AnghaBench dataset and a high BLEU score of 87.68. We also present another case study that analyzes x86_64 basic blocks for estimating their throughput and matches the state of the art. We show through ablation studies that a collection of preprocessing simplifications of the low-level programs especially improves the model’s ability to generate low-level programs and discuss data selection, network architecture, as well as limitations to the use of Transformers on low-level programs.

## 316) Tanisha Saxena (PRIMES) and Jun Wan (MIT), A Systematic Study on the Difference and Conversion Between Synchronous and Asynchronous Protocols (1 July 2022)

In this paper, we provide a fundamental analysis of the similarities and differences between synchronous and asynchronous distributed systems. Specifically, we define a special and normal adversary such that any protocol for a synchronous system that is resilient to the special adversary can be replicated by a protocol for an asynchronous system that is resilient to the normal adversary. Protocols for the synchronous model are less complex, as the guarantee that messages will be delivered within a bounded time makes it easy to determine the sequence of events in the system. However, this guarantee is unrealistic in the real world, where systems tend to be asynchronous and messages are not guaranteed to be delivered in a timely manner. Protocols for the asynchronous model, on the other hand, are more complex as there are many edge cases to account for. Our adversaries help to create intermediary models that allow us to replicate protocol outputs across both synchronous and asynchronous systems, allowing for simpler creation of protocols that remain functional under the asynchronous model.

## 2021 Research Papers

## 315) Felix Gotti (MIT), Bangzheng Li (PRIMES), Divisibility and a weak ascending chain condition on principal ideals (arXiv.org, 12 Dec 2022)

An integral domain $R$ is atomic if each nonzero nonunit of $R$ factors into irreducibles. In addition, an integral domain $R$ satisfies the ascending chain condition on principal ideals (ACCP) if every increasing sequence of principal ideals (under inclusion) becomes constant from one point on. Although it is not hard to verify that every integral domain satisfying ACCP is atomic, examples of atomic domains that do not satisfy ACCP are notoriously hard to construct. The first of such examples was constructed by A. Grams back in 1974. In this paper we delve into the class of atomic domains that do not satisfy ACCP. To better understand this class, we introduce the notion of weak-ACCP domains, which generalizes that of integral domains satisfying ACCP. Strongly atomic domains were introduced by D. D. Anderson, D. F. Anderson, and M. Zafrullah in 1990. It turns out that every weak-ACCP domain is strongly atomic, and so we introduce a taxonomic classification on our class of interest: ACCP implies weak-ACCP, which implies strong atomicity, which implies atomicity. We study this chain of implications, putting special emphasis on the weak-ACCP property. This allows us to provide new examples of atomic domains that do not satisfy ACCP.

## 314) Tanya Khovanova (MIT) and Atharva Pathak (PRIMES), Combinatorial Aspects of the Card Game War (arXiv.org, 28 Jan 2022)

This paper studies a single-suit version of the card game War on a finite deck of cards. There are varying methods of how players put the cards that they win back into their hands, but we primarily consider randomly putting the cards back and deterministically always putting the winning card before the losing card. The concept of a $\textit{passthrough}$ is defined, which refers to a player playing through all cards in their hand from a particular point in the game. We consider games in which the second player wins during their first passthrough. We introduce several combinatorial objects related to the game: game graphs, win-loss sequences, win-loss binary trees, and game posets. We show how these objects relate to each other. We enumerate states depending on the number of rounds and the number of passthroughs.

## 313) Luke Robitaille, Topological Entropy of Simple Braids (22 Jan 2022)

Mathematical objects called $\textit{braids}$ are formed from “strands” (like string or yarn) that intertwine. A certain collection of braids, called $\textit{simple braids}$, correspond to permutations, depending on how the strands get permuted. We can think of braids as maps from a disc with some “punctures” to itself; using this idea, we can consider the $\textit{topological entropy}$ of a braid, which can be zero or positive. What proportion of simple braids have positive topological entropy? The main theorem of this project is that, in the limit as the number of strands increases, the proportion of simple braids that have positive topological entropy approaches 1. This can be proved by showing that we can almost always find a long cycle in the permutation that will enable us to get a braid with three strands that has positive topological entropy, yielding the theorem. Topological entropy of braids can have use beyond just being interesting mathematics, such as for considering how to stir fluids.

## 312) Andrew Gu, On LU Matrices and Springer Theory (19 Jan 2022)

In this paper, we investigate and find the number of LU matrices in $GL_n(\mathbb{F}_q)$ that are similar to a regular semisimple $s$ in $GL_n(\mathbb{F}_q)$. Linking our results with M.-T. Trinh's study of certain “generalized Steinberg varieties,” we expand on his work. Trinh has established certain numerical identities coming from a $P=W$ conjecture of Cataldo-Hausel-Migliorini between affine Springer fibers and these generalized Steinberg varieties. The results of this paper provide numerical evidence of the relation between Springer fibers and LU matrices. Using a linear-algebraic approach, we find a direct relation between LU matrices and Trinh's spaces. Consequently, we derive a closed formula for a point count of LU matrices that is a constant factor from the point count of Trinh's spaces. Furthermore, we identify a common point count among these sets. From this we propose a conjecture that generalizes our results.

## 311) Zifan (Carl) Guo, The Effectiveness of Transformer Models for Analyzing Low-Level Programs (18 Jan 2022)

Recently, transformer networks have enabled breakthroughs in the field of natural language processing. This is partially due to the fact that transformer models can be first trained on a large corpus of unlabeled data prior to fine-tuning on a downstream task. Unlike natural language, which is somewhat tolerant of minor differences in word choices or ordering, the structured nature of programming languages means that program meaning can be completely redefined or be invalid if even one token is altered. In comparison to high-level languages, low-level languages are less expressive and more repetitive with more details from the computer microarchitecture. Whereas recent literature has examined how to effectively use transformer models on high-level programming semantics, this project explores the effectiveness of applying transformer models on low-level representations of programs that can shed light on better optimizing compilers. In this paper, we show that transformer models can translate C to LLVM-IR with high accuracy, by training on a parallel corpus of functions extracted from 1 million compilable, open-sourced C programs (AnghaBench) and their corresponding LLVM-IR after compiling with Clang. We also present another case study that analyzes x86_64 basic blocks for estimating their throughput. We discuss various changes in data selection, program representation, network architecture, and other modifications that influence the effectiveness of transformer models on low-level programs.

## 310) Arun S. Kannan and Zifan (Atticus) Wang (PRIMES), Representation Stability and Finite Orthogonal Groups (17 Jan 2022)

In this paper, we prove stability results about orthogonal groups over finite commutative rings where 2 is a unit. Inspired by Putman and Sam (2017), we construct a category $\mathbf{OrI}(R)$ and prove a Noetherianity theorem for the category of $\mathbf{OrI}(R)$-modules. This implies an asymptotic structure theorem for orthogonal groups. In addition, we show general homological stability theorems for orthogonal groups, with both untwisted and twisted coefficients, partially generalizing a result of Charney (1987).

## 309) Ilaria Seidel, Bounds on Generalized Symmetric Numerical Semigroups (16 Jan 2022)

Numerical semigroups are combinatorial objects that are easy to define, but have rich connections to other fields. Certain families of numerical semigroups are of particular interest because of their connections to algebraic geometry. We focus on one such family known as symmetric semigroups, and analyze the rate of growth of the number of symmetric semigroups $S(g)$ with genus $g$. Then, we partition semigroups of genus $g$ by their Frobenius number, and denote by $N(g, F)$ the number of semigroups with genus $g$ and Frobenius number $F$. We extend results from $S(g)$ to $N(g, 2g-k)$ for $k$ fixed in the range $1 \leq k \leq g$. We state a conjecture about the local behavior of the ratio $\frac{S(g+1)}{S(g)}$, depending on the residue of $g \pmod 3$. Finally, we generalize this conjecture to include $N(g, 2g-k)$ for fixed $k$.
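The quantities in this abstract can be computed by brute force for small genus. The sketch below enumerates candidate gap sets and counts all numerical semigroups of genus $g$ together with the symmetric ones, using the standard facts that the Frobenius number $F$ satisfies $F \le 2g-1$ and that a semigroup is symmetric exactly when $F = 2g-1$. The function names are assumptions, and the enumeration is exponential, so it only illustrates the definitions.

```python
from itertools import combinations


def is_gap_set(gaps):
    """Check that the non-negative integers outside `gaps` are closed under addition.

    Only sums up to max(gaps) need checking, since every larger integer
    lies in the complement anyway.
    """
    gap_set = set(gaps)
    frobenius = max(gap_set)
    elements = [x for x in range(1, frobenius + 1) if x not in gap_set]
    for i, a in enumerate(elements):
        for b in elements[i:]:
            if a + b in gap_set:
                return False
    return True


def count_semigroups(g):
    """Return (total, symmetric) counts of numerical semigroups of genus g >= 1."""
    total = 0
    symmetric = 0
    # Gaps form a g-element subset of {1, ..., 2g - 1} because F <= 2g - 1.
    for gaps in combinations(range(1, 2 * g), g):
        if is_gap_set(gaps):
            total += 1
            if max(gaps) == 2 * g - 1:  # Frobenius number 2g - 1: symmetric
                symmetric += 1
    return total, symmetric


for genus in range(1, 6):
    print(genus, count_semigroups(genus))
```

For genus 3, for example, the valid gap sets are {1,2,3}, {1,2,4}, {1,2,5}, and {1,3,5}, of which the last two have Frobenius number 5 and are symmetric.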

## 308) Kevin Cong, Square Tilings of Translation Surfaces (16 Jan 2022)

Translation surfaces are obtained by identifying opposite edges of a polygon with an even number of sides, paired together. We explore the question of tiling translation surfaces with squares, including the torus and the surface generated by the regular octagon. Given any tiling, we identify its contacts graph, a triangulation formed by assigning one vertex to each square and drawing edges between vertices corresponding to adjacent squares. In particular, we prove that under certain conditions, there is exactly one torus tiling whose contacts graph is a given torus triangulation. We then provide a method to approximately construct this tiling. We also show that the regular octagon translation surface cannot be tiled with squares. However, we give constructive tilings of translation surfaces corresponding to certain affine transformations of the octagon.

## 307) Akhil Kammila, Proposed Improvements to the Tor Handshake (15 Jan 2022)

Tor is the world’s largest anonymous communication network. It conceals its users’ identities by sending their traffic through three successive Tor relays. To establish connections between users, relays, and destinations, Tor uses a unique two-stage handshake. The first stage is a modified version of TLS 1.2 and the second stage is a fully encrypted exchange of Tor cells. The two-stage process enables both parties to authenticate while masking the differences between Tor’s handshake and standard TLS. The Tor handshake has multiple shortcomings when compared to widely-used cryptographic protocols like TLS and QUIC. It has high latency that detracts from the user experience and increased complexity that makes maintenance challenging. The first stage of the handshake also only supports TLS 1.2 despite TLS 1.3’s release in 2018. Our work presents an analysis of Tor’s handshake and proposes improvements. We find messages in the second stage of the Tor handshake that are redundant. Most notably, the responder sends a certificate that is not necessary for authentication. Removing these messages reduces the data transferred in the handshake without compromising the key exchange or authentication. Further, we find that removing backward compatibility from the Tor handshake allows for the trivial use of TLS 1.3 in the first stage. This reduces the round-trips and improves the security of the Tor handshake.

## 306) Abigail Thomas, The Implementation of Model Pruning to Optimize zk-SNARKs (15 Jan 2022)

Zero-Knowledge Succinct Non-Interactive Arguments of Knowledge (zk-SNARKs) are used to convince a verifier that a server possesses certain information without revealing the private inputs themselves. Thus, zk-SNARKs can be useful when outsourcing computations for cloud computing. The proofs returned by the server must be less computationally intensive than the given task, but the more complex the task, the more expensive the proof. We present a method that involves model pruning to decrease the complexity of the given task and thus the proof as well, to allow clients to outsource more complex programs. The proposed method harnesses the benefits of producing accurate results using a lower number of constraints, while remaining secure.

## 305) Vishnu Emani (PRIMES), Vijay Govindarajan, and David Hoganson (Boston Children's Hospital), Computational Fluid Modeling for Surgical Planning of Single Ventricle Congenital Heart Defects (15 Jan 2022)

Single ventricle defects (SVD) refer to the collection of congenital heart defects in which one chamber of the heart remains weak or underdeveloped. The most common palliative treatment for SVD physiologies involves a 3-stage surgical intervention, ending with the Fontan procedure. For patients with bilateral Superior Vena Cavae (SVC), the bilateral bidirectional Glenn (BBDG) procedure is typically employed. The primary goal of this study was to examine the effects of various physiological factors, such as vascular sizes, hepatic vein angle, curvature and position of the Fontan conduit, and the construction of a neo-innominate vein on the distribution of hepatic flow to the lungs in BBDG geometries.

## 304) Tanisha Saxena (PRIMES) and Jun Wan (MIT), A Compromise Between Synchronous and Asynchronous Systems (15 Jan 2022)

In this paper, we introduce a partially synchronous model for distributed systems such that any protocol for our model can be transformed to a corresponding protocol for the asynchronous model. Given a distributed system with $n$ users, we define a normal adversary as one that allows up to $f$ (with $f < n/2$) users to send any arbitrary message at any time, and a special adversary that can, additionally, block up to $f$ message channels for any number of users. We prove that, for any synchronous protocol that is resilient to the special adversary, there is an equivalent protocol for the asynchronous model that is resilient to the normal adversary. The special adversary helps us relax the restriction of time-bounded delivery and provides a model that is useful in analyzing if a synchronous protocol can be modified to work correctly in an asynchronous distributed system. Our model provides a basis to use synchronous protocols to function on asynchronous systems such as electronic banking and Blockchain systems distributed across the Internet.

## 303) Yavor Litchev, Signature Scheme with Access Control (15 Jan 2022)

A wide variety of digital signature schemes currently exist, from RSA to El-Gamal to Schnorr. More recently, multi-party signature schemes have been developed, including distributed signature schemes and threshold signature schemes. In particular, threshold signature schemes provide useful functionality, in that they require the number of participating parties to pass a threshold in order to generate a valid signature. However, they are limited in their complexity, as they can only model a threshold function. The proposed signature scheme (monotonic signature scheme) allows for the modeling of complex functions, so long as they are monotonic. This would allow for a much greater degree of access control, all while security and correctness are preserved.
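
The difference between a threshold function and a general monotone function can be made concrete with a small sketch. The party names and the example policy below are hypothetical, chosen only for illustration; nothing here reflects the construction of the proposed scheme itself, only the kind of access policy it is described as supporting.

```python
from itertools import combinations

PARTIES = ("A", "B", "C")  # hypothetical signer names


def all_subsets(parties):
    """Yield every subset of the parties as a frozenset."""
    for size in range(len(parties) + 1):
        for combo in combinations(parties, size):
            yield frozenset(combo)


def threshold_policy(t):
    """Authorize any group with at least t signers."""
    return lambda signers: len(signers) >= t


def example_policy(signers):
    """A hypothetical monotone policy: (A and B) or C."""
    return ("A" in signers and "B" in signers) or "C" in signers


def is_monotone(policy, parties):
    """Adding signers to an authorized group must keep it authorized."""
    groups = list(all_subsets(parties))
    return all(policy(big)
               for small in groups if policy(small)
               for big in groups if small <= big)


def is_threshold(policy, parties):
    """True if the policy equals 'at least t signers' for some t."""
    groups = list(all_subsets(parties))
    return any(all(bool(policy(g)) == threshold_policy(t)(g) for g in groups)
               for t in range(len(parties) + 2))


print(is_monotone(example_policy, PARTIES))   # monotone
print(is_threshold(example_policy, PARTIES))  # but not a threshold policy
```

The example policy is monotone, yet no threshold $t$ reproduces it: the single signer C is authorized while the single signer A is not.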

## 302) Jack Wang, Exploration of Capabilities and Limitations in View Change of the X-Fields Model (15 Jan 2022)

Generating images of the same scenes from different perspectives (whether from different points, from different angles, under varying illumination, or with other parameters) has a myriad of use cases, stretching from creating debug models to producing smooth videos. In the X-Fields model, hard-coded graphics tricks like lighting, 3D projection, and albedo are used to supplement neural networks in creating a differentiable map between the image parameters and the actual pixels, using sample images and their corresponding coordinate values. Although X-Fields performs well on datasets of images concentrated on a 2D (x, y) plane relative to alternative interpolation methods, the original model cannot support broader, practical use cases like the interpolation of images in different 3D (x, y, z) positions. In this paper, we use 3D images and coordinates generated by the 3DB framework in our dimensionally expanded X-Fields model. We find that the new model can generate promising interpolation results with relatively sparse datasets and with large view angle changes; that parameters such as learning rate, the bandwidth parameter in soft blending, and others affect interpolation quality and create trade-offs between training cost and interpolation quality; and that adding certain backgrounds (like the ocean) to reference images can pose challenges for interpolation.

## 301) Garrett Heller (PRIMES) and Chengyang Shao (MIT), Strichartz and Multi-linear Estimates for the One-dimensional Periodic Dysthe equation (11 Jan 2022)

This paper presents Strichartz estimates for the linearized 1D periodic Dysthe equation on the torus, namely an estimate of the $L^6_{x,t}(\mathbb{T}^2)$ norm of the solution in terms of the initial data, and an estimate of the $L^4_{x,t}(\mathbb{T}^2)$ norm in terms of the Bourgain space norm. The paper also presents other results such as bilinear and trilinear estimates pertaining to local well-posedness of the 1-dimensional periodic Dysthe equation in a suitable Bourgain space, and ill-posedness results in Sobolev spaces.

## 300) Neil Chowdhury, Interplay Between Loop Extrusion and Compartmentalization During Mitosis (10 Jan 2022)

During mitosis, DNA changes its physical structure from diffuse chromatin spread throughout the cell nucleus to discrete, compacted, cylindrical chromatids. This process is essential for cells to be able to transfer replicated chromosomes to the daughter nuclei. During interphase, chromatin is compartmentalized into heterochromatin and euchromatin, resulting in a visible signal in Hi-C contact maps. However, as the cell enters mitosis, this signal is disrupted, only to reappear after the cell divides. This paper explores the interphase and mitotic states by modeling DNA using polymer simulations. It is shown that loop extrusion, the mechanism underlying mitotic chromosome formation, can simultaneously be responsible for disrupting compartmentalization.

## 299) Nathan Xiong (PRIMES) and Pu Yu (MIT), The Master Field and Free Brownian Motions (10 Jan 2022)

The master field on the plane is the large $N$ limit of the Wilson loop functionals from the two-dimensional Yang–Mills holonomy process. In this paper, we redefine the master field purely through free Brownian motions, so that its definition is independent from finite $N$ Yang–Mills theory. From this aspect, we prove that the master field does not depend on the lasso basis chosen on a graph. We also give a new, elementary proof for the Makeenko–Migdal equations, which allow us to efficiently calculate the master field of any loop via a system of differential equations. While previous work in this field is mostly differential geometric in nature, our proofs all use combinatorial techniques, heavily utilizing the moment-cumulant relation from free probability.

## 298) Sushanth Sathish Kumar, The Restricted Lie Algebra Structure on the Bar Spectral Sequence of an Iterated Loop Space (8 Jan 2022)

There is a rich algebraic structure in the mod $p$ homology of the iterated loop space $H_*(\Omega^n X; \mathbb{F}_p)$. It admits a Lie bracket called the Browder bracket that is compatible with the Dyer-Lashof operations $Q_0, Q_1,\ldots, Q_{n-1}$. Furthermore, the top Dyer-Lashof operation $Q_{n-1}$ is a restriction for the Browder bracket. Ni proved that the Browder bracket on the homology $H_*(\Omega^n X)$ converges to the bracket on $H_*(\Omega^{n-1} X)$ in the bar spectral sequence, making it a spectral sequence of Poisson-Hopf algebras. Our goal is to use the bar spectral sequence to relate the restricted Lie algebra structure given by the top Dyer-Lashof operation on $H_*(\Omega^n X; \mathbb{F}_2)$ to that of $H_*(\Omega^{n-1} X; \mathbb{F}_2)$.

## 297) Nancy Jiang, Bangzheng Li, and Sophie Zhu, On the primality and elasticity of algebraic valuations of cyclic free semirings (arXiv.org, 4 Jan 2022), published in International Journal of Algebra and Computation 33:2 (2023): 197-210

A cancellative commutative monoid is atomic if every non-invertible element factors into irreducibles. Under certain mild conditions on a positive algebraic number $\alpha$, the additive monoid $M_\alpha$ of the evaluation semiring $\mathbb{N}_0[\alpha]$ is atomic. The atomic structure of both the additive and the multiplicative monoids of $\mathbb{N}_0[\alpha]$ has been the subject of several recent papers. Here we focus on the monoids $M_\alpha$, and we study their omega-primality and elasticity, aiming to better understand some fundamental questions about their atomic decompositions. We prove that when $\alpha$ is less than 1, the atoms of $M_\alpha$ are as far from being prime as they can possibly be. Then we establish some results about the elasticity of $M_\alpha$, including that when $\alpha$ is rational, the elasticity of $M_\alpha$ is full (this was previously conjectured by S. T. Chapman, F. Gotti, and M. Gotti).

## 296) Kunal Kapoor (PRIMES) and Jun Wan (MIT), Consensus under a Dynamic Synchronous Model (3 Jan 2022)

With the advance of blockchain and cryptocurrency, the need for efficient and practical consensus algorithms is growing. However, most existing works only consider protocols under the synchronous setting. It is usually assumed that there exist at least $h$ users who are always honest and online. This is impractical as honest users might alternate between online and offline states. In this paper, we adapt Byzantine Broadcast protocols to a dynamic synchronous model which features sleepy/offline users as well as information gaps. We do this by building off an approach centered around a Trust Graph, modifying key algorithms from previous works such as the post-processing algorithm to ensure correctness with the dynamic model. This allows the creation of a more fault-tolerant protocol.

## 295) Andrew Du, Quaternion-Based Analytical Inverse Dynamics for the Human Body (31 Dec 2021)

The human body provides unique challenges to study from a dynamical perspective, due to its mechanical complexity and the difficulty of obtaining measurements of internal dynamic quantities. Thus, it is essential to create models that both simplify analysis and account for important anatomical details, two goals that must be balanced in a sufficiently accurate yet manageable framework. A number of critical applications require accurate inverse dynamic models of the human body, including medical treatment and virtual simulation of human motion. A recent general technique was developed by Dumas et al. that used a quaternion screw algebra to make computation of inverse dynamic quantities more practical and more efficient. In this paper, we adapt their technique to the case of human anatomy, integrating these computational improvements within a novel framework for modeling human musculature.

## 294) Tanmay Gupta and Anshul Rastogi, Threshold-Based Inference of Dependencies in Distributed Systems (31 Dec 2021)

Many current online services rely on the interaction between different components that form a distributed system. Analyzing distributed systems is important in performance analysis (e.g., critical path analysis), debugging, and testing new features. However, the analysis of these systems can be difficult due to limited knowledge of how components work and the variety of services and applications that are usually instrumented. The Mystery Machine, introduced by Chow et al. in 2014, takes a “big data” approach, using logged events across many traces to generate and refine a causal model. We introduce Scooby Systems, our extension of The Mystery Machine’s algorithm. We introduce thresholds to increase the tolerance to violations in the formation of causal relationships. In the future, we hope to improve the scalability of Scooby Systems with a Hadoop MapReduce implementation.

## 293) Yihao (Michael) Huang and Claire Wang, Efficient Algorithms for Parallel Bi-core Decomposition (31 Dec 2021)

Graphs are used in the modeling of social networks, biological networks, user-product networks, and many other real-world relationships. Identifying dense regions within these graphs can often aid in applications including product-recommendation, spam identification, and protein-function discovery. A fundamental dense substructure discovery problem in graph theory is the k-core decomposition. However, the k-core decomposition does not directly apply to bipartite graphs, which are graphs that model the connections between two disjoint sets of entities, such as book-authorship, affiliation, and gene-disease association. Given the prevalence of bipartite graphs, solving the dense subgraph discovery problem on bipartite graphs has wide-reaching real-world impacts. In this paper, we solve the bipartite analogue of the k-core decomposition problem, which is the bi-core decomposition problem. Existing sequential bi-core decomposition algorithms are not scalable to large-scale bipartite graphs with hundreds of millions of edges. Therefore, we develop a theoretically efficient parallel bi-core decomposition algorithm. Our algorithm improves the theoretical bounds of existing algorithms, reducing the length of the computation graph’s longest dependency path, which asymptotically bounds the runtime of a parallel algorithm when there are sufficiently many processors. We prove the problem of bi-core decomposition to be P-complete. We also devise a parallel bi-core index structure to allow for fast queries of the computed cores. Finally, we provide optimized parallel implementations of our algorithms that are scalable and fast. Using 30 threads, our parallel bi-core decomposition algorithm achieves up to a 44x speedup over the best existing sequential algorithm and up to a 2.9x speedup over the best existing parallel algorithm. Our parallel query implementation is up to 22.3x faster than the existing sequential query implementation.
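
For readers unfamiliar with the k-core decomposition mentioned above, the following is a minimal sketch of the classical sequential peeling procedure on an ordinary (non-bipartite) graph. It is not the parallel bi-core algorithm of the paper; the function name and the toy graph are assumptions made for illustration, and the simple minimum search makes it quadratic rather than optimized.

```python
def core_numbers(adj):
    """Return the core number of every vertex by repeated peeling.

    adj maps each vertex to a list of its neighbours (undirected graph).
    The core number of v is the largest k such that v belongs to a
    subgraph in which every vertex has degree at least k.
    """
    degree = {v: len(neighbours) for v, neighbours in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel a vertex of minimum remaining degree.
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        # Removing v lowers the degree of its surviving neighbours.
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


# Toy graph: a triangle a-b-c with a pendant vertex d attached to a.
toy = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": ["a"],
}
print(core_numbers(toy))
```

In the toy graph the triangle vertices have core number 2 and the pendant vertex has core number 1.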

## 292) Raymond Feng, Andrew Lee, and Espen Slettnes, Results on Various Models of Mistake-Bounded Online Learning (29 Dec 2021)

We determine bounds for several variations of the mistake-bound model. The first half of our paper presents various bounds on the weak reinforcement model and the delayed, ambiguous reinforcement model. In both models, the adversary gives $r$ inputs in one round and only indicates a correct answer if all $r$ guesses are correct. The only difference between the two models is that in the delayed, ambiguous model, the learner must answer each input before receiving the next input of the round, while the learner receives all $r$ inputs at once in the modified weak reinforcement model. We also prove generalizations for multi-class functions. Then, we prove a lower and upper bound of the maximum factor gap that are tight up to a factor of $r$ between the modified weak reinforcement model and the standard model. Lastly, we also introduce several related models for learning with permutation patterns: the order model, the relative position model, and the delayed relative position model. In these models, a learner attempts to learn a permutation from a set of permutations $F$ by guessing statistics related to sub-permutations. We similarly define the notions of weak versus strong reinforcement and of delayed, ambiguous, reinforcement, and determine some sharp bounds by mimicking sorting algorithms.

## 291) Fenghuan (Linda) He, A Topological Centrality Measure for Directed Networks (24 Dec 2021; arXiv.org, 30 Jan 2022)

Given a directed network G, we are interested in studying the qualitative features of G which govern how perturbations propagate across G. Various classical centrality measures have been already developed and proven useful to capture qualitative features and behaviors for undirected networks. In this paper, we use topological data analysis (TDA) to adapt measures of centrality to capture both directedness and non-local propagating behaviors in networks. We introduce a new metric for computing centrality in directed weighted networks, namely the quasi-centrality measure. We compute these metrics on trade networks to illustrate that our measure successfully captures propagating effects in the network and can also be used to identify sources of shocks that can disrupt the topology of directed networks. Moreover, we introduce a method that gives a hierarchical representation of the topological influences of nodes in a directed network.

## 290) Joshua Guo (PRIMES) and Kevin Chang (MIT), On the Gauss-Epple homomorphism of the braid group $B_n$, and generalizations to Artin groups of crystallographic type (24 Dec 2021)

In this paper, we introduce a broad family of group homomorphisms that we name the Gauss-Epple homomorphisms. In the setting of braid groups, the Gauss-Epple invariant was originally defined by Epple based on a note of Gauss as an action of the braid group $B_n$ on the set $\{1, \dots, n\}\times\mathbb{Z}$; we prove that it is well-defined. We consider the associated group homomorphism from $B_n$ to the symmetric group $\text{Sym}(\{1, \dots, n\}\times\mathbb{Z})$. We prove that this homomorphism factors through $\mathbb{Z}^n\rtimes S_n$ (in fact, its image is an order 2 subgroup of the previous group). We also describe the kernel of the homomorphism and calculate the asymptotic probability that it contains a random braid of a given length. Furthermore, we discuss the super-Gauss-Epple homomorphism, a homomorphism which extends the generalization of the Gauss-Epple homomorphism and describe a related 1-cocycle of the symmetric group $S_n$ on the set of antisymmetric $n\times n$ matrices over the integers. We then generalize the super-Gauss-Epple homomorphism and the associated 1-cocycle to Artin groups of finite type. For future work, we suggest studying possible generalizations to complex reflection groups and computing the vector spaces of Gauss-Epple analogues.

## 289) Valeri Frumkin (MIT) and Rishabh Das (PRIMES), Thermal modulation of fluidic lenses in microgravity (22 Dec 2021)

The fluidic shaping method is an exciting new technology that allows liquids to be rapidly shaped into a wide range of optical topographies with sub-nanometer surface quality. The scale-invariance of the method makes it well suited for space-based fabrication of large fluidic optics. However, in microgravity, the resulting optical topographies are limited to constant mean curvature surfaces. Here we study how variations in surface tension result in deviations from constant mean curvature topographies, allowing one to introduce optical corrections which would not be obtainable otherwise. Under the assumption of small thermal Peclet number, we derive a differential equation governing the steady-state shape of the liquid surface under the effect of spatially varying surface tension. This equation allows us to formulate an inverse problem of finding the required surface-tension distribution for a desired correction. Lastly, we provide several examples for surface tension distributions yielding required aspheric topographies.

## 288) Yi Liang (PRIMES) and James Unwin (University of Illinois at Chicago), COVID-19 Forecasts via Stock Market Indicators (arXiv.org, 13 Dec 2021)

Reliable short-term forecasting can provide potentially lifesaving insights into logistical planning, and in particular, into the optimal allocation of resources such as hospital staff and equipment. By reinterpreting COVID-19 daily cases in terms of candlesticks, we are able to apply some of the most popular stock market technical indicators to obtain predictive power over the course of the pandemic. By providing a quantitative assessment of MACD, RSI, and candlestick analyses, we show their statistical significance in making predictions for both stock market data and WHO COVID-19 data. In particular, we show the utility of this novel approach by considering the identification of the beginnings of subsequent waves of the pandemic. Finally, our new methods are used to assess whether current health policies are impacting the growth in new COVID-19 cases.

## 287) Anuj Sakarda, Jerry Tan, and Armaan Tipirneni, On the Distance Spectra of Extended Double Stars (arXiv.org, 6 Dec 2021)

The distance matrix of a connected graph is defined as the matrix in which the entries are the pairwise distances between vertices. The distance spectrum of a graph is the set of eigenvalues of its distance matrix. A graph is said to be determined by its distance spectrum if there does not exist a non-isomorphic graph with the same spectrum. The question of which graphs are determined by their spectrum has been raised in the past, but it remains largely unresolved. In this paper, we prove that extended double stars are determined by their distance spectra.
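
The definitions above can be illustrated with a short sketch. Since two graphs share a distance spectrum exactly when their distance matrices have the same characteristic polynomial, the code compares characteristic polynomials in exact arithmetic. It uses breadth-first search for distances and the Faddeev-LeVerrier recurrence for the polynomial; the function names and the small example graphs (paths and a star, not the extended double stars of the paper) are assumptions chosen for illustration.

```python
from collections import deque
from fractions import Fraction


def distance_matrix(adj):
    """All-pairs shortest-path distances of a connected unweighted graph."""
    nodes = sorted(adj)
    index = {v: i for i, v in enumerate(nodes)}
    dist = [[0] * len(nodes) for _ in nodes]
    for source in nodes:
        seen = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            v = queue.popleft()
            for u in adj[v]:
                if u not in seen:
                    seen[u] = seen[v] + 1
                    queue.append(u)
        for target, d in seen.items():
            dist[index[source]][index[target]] = d
    return dist


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly(a):
    """Coefficients of det(x*I - a), leading coefficient first."""
    n = len(a)
    m = [[Fraction(0)] * n for _ in range(n)]
    c = Fraction(1)
    coeffs = [c]
    for k in range(1, n + 1):
        am = matmul(a, m)
        m = [[am[i][j] + (c if i == j else 0) for j in range(n)]
             for i in range(n)]
        next_am = matmul(a, m)
        c = -sum(next_am[i][i] for i in range(n)) / k
        coeffs.append(c)
    return coeffs


path3 = {0: [1], 1: [0, 2], 2: [1]}
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
star4 = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

print(char_poly(distance_matrix(path3)))  # coefficients of x^3 - 6x - 4
print(char_poly(distance_matrix(path4)) == char_poly(distance_matrix(star4)))
```

The path on four vertices and the star on four vertices have different characteristic polynomials, so their distance spectra differ.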

## 286) Daniel Xia (PRIMES) and Pei-Ken Hung (University of Minnesota), A Minkowski-type inequality in the AdS-Melvin space (arXiv.org, 19 Nov 2021)

The AdS-Melvin spacetime was introduced by Astorino and models the AdS soliton with electromagnetic charge. It is a static spacetime with a time-symmetric Cauchy hypersurface, which we refer to as the AdS-Melvin space. In this paper, we study a sharp Minkowski-type inequality for surfaces embedded in the AdS-Melvin space. We first prove the inequality for special cases in which the surface enjoys axisymmetry or is a small perturbation of a coordinate torus. We then use a weighted normal flow to show that the inequality holds for general surfaces.

## 285) Jeremy Yu (PRIMES), Lu Lu (MIT), Xuhui Meng, and George Em Karniadakis, Gradient-enhanced physics-informed neural networks for forward and inverse PDE problems (arXiv.org, 1 Nov 2021), published in Computer Methods in Applied Mechanics and Engineering, vol. 393 (1 April 2022)

Deep learning has been shown to be an effective tool in solving partial differential equations (PDEs) through physics-informed neural networks (PINNs). PINNs embed the PDE residual into the loss function of the neural network, and have been successfully employed to solve diverse forward and inverse PDE problems. However, one disadvantage of the first generation of PINNs is that they usually have limited accuracy even with many training points. Here, we propose a new method, gradient-enhanced physics-informed neural networks (gPINNs), for improving the accuracy and training efficiency of PINNs. gPINNs leverage gradient information of the PDE residual and embed the gradient into the loss function. We tested gPINNs extensively and demonstrated the effectiveness of gPINNs in both forward and inverse PDE problems. Our numerical results show that gPINN performs better than PINN with fewer training points. Furthermore, we combined gPINN with the method of residual-based adaptive refinement (RAR), a method for improving the distribution of training points adaptively during training, to further improve the performance of gPINN, especially in PDEs with solutions that have steep gradients.
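
As a rough sketch of what embedding the residual and its gradient into the loss can look like, suppose the network with parameters $\theta$ yields a PDE residual $f(x;\theta)$ on inputs $x = (x_1, \dots, x_d)$. The notation below (the weights $w$, the point sets $\mathcal{T}$, and the boundary term $\mathcal{L}_b$) is assumed for illustration and is not taken from the abstract:

$$
\mathcal{L}(\theta) = w_f \mathcal{L}_f + w_b \mathcal{L}_b + \sum_{i=1}^{d} w_{g_i} \mathcal{L}_{g_i},
\qquad
\mathcal{L}_f = \frac{1}{|\mathcal{T}_f|} \sum_{x \in \mathcal{T}_f} \left| f(x;\theta) \right|^2,
\qquad
\mathcal{L}_{g_i} = \frac{1}{|\mathcal{T}_{g_i}|} \sum_{x \in \mathcal{T}_{g_i}} \left| \frac{\partial f}{\partial x_i}(x;\theta) \right|^2 .
$$

Setting every $w_{g_i} = 0$ recovers a plain PINN loss in this notation; the gradient terms additionally penalize the derivatives of the residual, which also vanish wherever the residual is identically zero.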

## 284) Felix Gotti (MIT) and Bangzheng Li (PRIMES), Atomic semigroup rings and the ascending chain condition on principal ideals (arXiv.org, 30 Oct 2021), published in Proceedings of the American Mathematical Society 151 (2023): 2291-2302

An integral domain is called atomic if every nonzero nonunit element factors into irreducibles. On the other hand, an integral domain is said to satisfy the ascending chain condition on principal ideals (ACCP) if every ascending chain of principal ideals terminates. It was asserted by Cohn back in the sixties that every atomic domain satisfies the ACCP, but such an assertion was refuted by Grams in the seventies with an explicit construction of a neat example. Still, atomic domains without the ACCP are notoriously elusive, and just a few classes have been found since Grams' first construction. In the first part of this paper, we generalize Grams' construction to provide new classes of atomic domains without the ACCP. In the second part of this paper, we construct what seems to be the first atomic semigroup ring without the ACCP in the existing literature.

## 283) Karthik Seetharaman, William Yue, and Isaac Zhu, Patterns in the Lattice Homology of Seifert Homology Spheres (arXiv.org, 26 Oct 2021)

In this paper, we study various homology cobordism invariants for Seifert fibered integral homology 3-spheres derived from Heegaard Floer homology. Our main tool is lattice homology, a combinatorial invariant defined by Ozsváth-Szabó and Némethi. We reprove the fact that the $d$-invariants of Seifert homology spheres $\Sigma(a_1,a_2,\dots,a_n)$ and $\Sigma(a_1,a_2,\dots,a_n+a_1a_2\cdots a_{n-1})$ are the same using an explicit understanding of the behavior of the numerical semigroup minimally generated by $a_1a_2\cdots a_n/a_i$ for $i\in[1,n]$. We also study the maximal monotone subroots of the lattice homologies, another homology cobordism invariant introduced by Dai and Manolescu. We show that the maximal monotone subroots of the lattice homologies of Seifert homology spheres $\Sigma(a_1,a_2,\dots,a_n)$ and $\Sigma(a_1,a_2,\dots,a_n+2a_1a_2\cdots a_{n-1})$ are the same.

## 282) Christian Gaetz (MIT) and Ram K. Goel (PRIMES), Products of reflections in smooth Bruhat intervals (arXiv.org, 25 Oct 2021)

A permutation is called smooth if the corresponding Schubert variety is smooth. Gilboa and Lapid prove that in the symmetric group, multiplying the reflections below a smooth element $w$ in Bruhat order in a compatible order yields back the element $w$. We strengthen this result by showing that such a product in fact determines a saturated chain $e \to w$ in Bruhat order, and that this property characterizes smooth elements.

## 281) Yash Agarwal (PRIMES) and Sarah Greer (MIT), Convolutional encoder decoder network for the removal of coherent seismic noise (arXiv.org, 25 Oct 2021)

Seismologists often need to gather information about the subsurface structure of a location to determine if it is fit to be drilled for oil. However, there may be electrical noise in seismic data which is often removed by disregarding certain portions of the data with the use of a notch filter. Instead, we use a convolutional encoder decoder network to remove such noise by training the network to take the noisy shot record as input and remove the noise from the shot record as output. In this way, we retain important information about the data collected while still removing coherent noise in seismic data.

## 280) Sophia Benjamin, Arushi Mantri, and Quinn Perian, On the Wasserstein Distance Between $k$-Step Probability Measures on Finite Graphs (arXiv.org, 20 Oct 2021)

We consider random walks $X,Y$ on a finite graph $G$ with respective lazinesses $\alpha, \beta \in [0,1]$. Let $\mu_k$ and $\nu_k$ be the $k$-step transition probability measures of $X$ and $Y$. In this paper, we study the Wasserstein distance between $\mu_k$ and $\nu_k$ for general $k$. We consider the sequence formed by the Wasserstein distance at odd values of $k$ and the sequence formed by the Wasserstein distance at even values of $k$. We first establish that these sequences always converge, and then we characterize the possible values for the sequences to converge to. We further show that each of these sequences is either eventually constant or converges at an exponential rate. By analyzing the cases of different convergence values separately, we are able to partially characterize when the Wasserstein distance is constant for sufficiently large $k$.

## 279) Sheryl Hsu (PRIMES), Fidel I. Schaposnik Massolo (Université Libre de Bruxelles), and Laura P. Schaposnik (University of Illinois at Chicago), The Power of Many: A Physarum Swarm Steiner Tree Algorithm (arXiv.org, 15 Oct 2021)

We create a novel Physarum Steiner algorithm designed to solve the Euclidean Steiner tree problem. Physarum is a unicellular slime mold with the ability to form networks and fuse with other Physarum organisms. We use the simplicity and fusion of Physarum to create large swarms which independently operate to solve the Steiner problem. The Physarum Steiner tree algorithm then utilizes a swarm of Physarum organisms which gradually find terminals and fuse with each other, sharing intelligence. The algorithm is also highly capable of solving the obstacle avoidance Steiner tree problem and is a strong alternative to the current leading algorithm. The algorithm is of particular interest due to its novel approach, rectilinear properties, and ability to run on varying shapes and topological surfaces.

## 278) Alexander Tianlin Hu (PRIMES) and Andrey Boris Khesin (MIT), Improved Graph Formalism for Quantum Circuit Simulation (arXiv.org, 20 Sep 2021)

Improving the simulation of quantum circuits on classical computers is important for understanding quantum advantage and increasing development speed. In this paper, we explore a new way to express stabilizer states and further improve the speed of simulating stabilizer circuits with an existing approach. First, we discover a unique and elegant canonical form for stabilizer states based on graph states to better represent stabilizer states and show how to efficiently simplify stabilizer states to canonical form. Second, we develop an improved algorithm for graph state stabilizer simulation and establish limitations on reducing the quadratic runtime of applying controlled-Pauli $Z$ gates. We do so by creating a simpler formula for combining two Pauli-related stabilizer states into one. Third, to better understand the linear dependence of stabilizer states, we characterize all linearly dependent triplets, revealing symmetries in the inner products. Using our novel controlled-Pauli $Z$ algorithm, we improve runtime for inner product computation from $O(n^3)$ to $O(nd^2)$ where $d$ is the maximum degree of the graph.

## 277) Sophie Zhu, Factorizations in evaluation monoids of Laurent semirings (arXiv.org, 26 Aug 2021), published in Communications in Algebra 50:6 (2022): 2719-2730

For a positive real number $α$, let $\mathbb{N}_0[α,α^{-1}]$ be the semiring of all real numbers $f(α)$ for $f(x)$ lying in $\mathbb{N}_0[x,x^{-1}]$, which is the semiring of all Laurent polynomials over the set of nonnegative integers $\mathbb{N}_0$. In this paper, we study various factorization properties of the additive structure of $\mathbb{N}_0[α, α^{-1}]$. We characterize when $\mathbb{N}_0[α, α^{-1}]$ is atomic. Then we characterize when $\mathbb{N}_0[α, α^{-1}]$ satisfies the ascending chain condition on principal ideals in terms of certain well-studied factorization properties. Finally, we characterize when $\mathbb{N}_0[α, α^{-1}]$ satisfies the unique factorization property and show that, when this is not the case, $\mathbb{N}_0[α, α^{-1}]$ has infinite elasticity.

## 276) Felix Gotti (MIT) and Bangzheng Li (PRIMES), Divisibility in rings of integer-valued polynomials (arXiv.org, 25 July 2021), published in The New York Journal of Mathematics 28 (2022): 117–139

In this paper, we address various aspects of divisibility by irreducibles in rings consisting of integer-valued polynomials. An integral domain is called atomic if every nonzero nonunit factors into irreducibles. Atomic domains that do not satisfy the ascending chain condition on principal ideals (ACCP) have proved to be elusive, and not many of them have been found since the first one was constructed by A. Grams in 1974. Here we exhibit the first class of atomic rings of integer-valued polynomials without the ACCP. An integral domain is called a finite factorization domain (FFD) if it is simultaneously atomic and an idf-domain (i.e., every nonzero element is divisible by only finitely many irreducibles up to associates). We prove that a ring is an FFD if and only if its ring of integer-valued polynomials is an FFD. In addition, we show that neither being atomic nor being an idf-domain transfer, in general, from an integral domain to its ring of integer-valued polynomials. In the same class of rings of integer-valued polynomials, we consider further properties that are defined in terms of divisibility by irreducibles, including being Cohen-Kaplansky and being Furstenberg.

## 275) Beining Zhou, High-Order Sensor Array Geometries for Improved Direction of Arrival Estimation in Signal Processing (9 July 2021)

In signal processing, direction of arrival (DOA) estimation is a central problem in locating the source of a signal. It is applied extensively in wireless communication systems such as radars and the GPS, in medical imaging, in telescopes, etc. Devising a signal sensor array geometry that achieves a higher degree of freedom (DOF) has been a crucial challenge in improving the efficiency of DOA estimation. Recently, high-order cumulants have been used extensively to construct high-order sensor arrays, but the state-of-the-art high-order arrays are not optimal. This paper proposes novel sensor array geometries, the high-order embedded arrays (HOEA), for the 4th and 6th order and then extends those arrays to the 2$q$th order by layering. Compared to previous methods, the proposed HOEA significantly improves the DOF generation from $O(2^{q}N^{2q})$ to $O(17^{q/3}N^{2q})$, which increases the theoretical efficiency by $25\%$ in the 4th order, $113\%$ in the 6th, and $352\%$ in the 12th order.
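
The quoted gains for the 6th and 12th orders can be checked with a few lines of arithmetic, treating the leading constants $2^q$ and $17^{q/3}$ of the two bounds as directly comparable (an assumption made here only for this check; the helper name is hypothetical). The quoted $25\%$ for the 4th order does not follow from this ratio and presumably comes from the explicit 4th-order construction, so it is not recomputed.

```python
def dof_ratio(q: int) -> float:
    """Ratio 17^(q/3) / 2^q of the leading constants for a 2q-th order array."""
    return 17 ** (q / 3) / 2 ** q


for order in (6, 12):
    q = order // 2
    gain_percent = (dof_ratio(q) - 1) * 100
    print(f"order {order}: about {gain_percent:.1f}% more DOF")
```

This gives 112.5% for the 6th order and roughly 351.6% for the 12th, consistent with the rounded figures of 113% and 352% above.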

## 274) Benjamin Chen, Practical Anonymity Sets in a Pseudonymous Forum Setting (6 July 2021)

Pseudonymous forums are websites where users can post publicly visible content and participate in discussions under a pseudonym. Such forums are not perfectly private, as their privacy can be compromised by traffic analysis attacks. However, many methods of providing perfect privacy to such a system come with a heavy performance cost, whether in bandwidth or latency. We examine the practicality of anonymity sets, a defense against such attacks that can still provide a formal privacy guarantee with smaller performance losses, and attempt to simulate their implementation in a real-world setting using real data scraped from Reddit, a popular pseudonymous forum. We try several methods of creating these anonymity sets, finding that K-means with some dimensionality compression yields decent results; we also propose a method of defining a common traffic budget for members of a set. We find that anonymity sets are a feasible defense against such attacks in the pseudonymous forum setting.

## 273) Matthew Ding, An Analysis of Multi-hop Iterative Approximate Byzantine Consensus with Local Communication (27 June 2021)

Iterative Approximate Byzantine Consensus (IABC) is a fundamental problem of fault-tolerant distributed computing where machines seek to achieve approximate consensus to arbitrary exactness in the presence of Byzantine failures. We present a novel algorithm for this problem, named Relay-IABC, which relies on a multi-hop relayed messaging system and cryptographically secure message signatures. The use of signatures and relays allows the strict necessary network conditions of traditional IABC algorithms to be circumvented. In addition, we show evidence, through both theoretical analysis and experimental results, that Relay-IABC achieves faster convergence than traditional algorithms even under these strict network conditions.

## 272) Jason Yang (PRIMES), Jun Wan (MIT), and Hanshen Xiao (MIT), Decentralized Gradient Descent: how network structure affects convergence (26 June 2021)

We investigate decentralized gradient descent among a network of nodes where an adversary has corrupted certain nodes. We focus on the case where the utility functions of all nodes are 1-dimensional quadratics, and where each corrupted node is connected to all honest nodes.

## 271) Sheryl Hsu (PRIMES) and Laura P. Schaposnik (University of Illinois at Chicago), Cell fusion through slime mold network dynamics (arXiv.org, 21 June 2021)

Physarum Polycephalum is a unicellular slime mold that has been intensely studied due to its ability to solve mazes, find shortest paths, generate Steiner trees, share knowledge, remember past events, and its applications to unconventional computing. The CELL model is a unicellular automaton introduced in the recent work of Gunji et al. in 2008, that models Physarum's amoeboid motion, tentacle formation, maze solving, and network creation. In the present paper, we extend the CELL model by spawning multiple CELLs, allowing us to understand the interactions between multiple cells, and in particular, their mobility, merge speed, and cytoplasm mixing. We conclude the paper with some notes about applications of our work to modeling the rise of present day civilization from the early nomadic humans and the spread of trends and information around the world. Our study of the interactions of this unicellular organism should further the understanding of how Physarum Polycephalum communicates and shares information.

## 270) Linda Chen, Communication Complexity of Byzantine Broadcast (19 June 2021)

Byzantine Broadcast is a fundamental problem in distributed computing, with communication complexity being an important aspect of Byzantine Broadcast protocols. In Byzantine Broadcast, a designated leader must ensure that all honest users in a distributed system reach a consensus, even in the presence of some dishonest users. Previous works have shown an $O(n^2)$ lower bound on communication complexity, as well as protocols with $O(n^2)$ communication complexity for the honest majority scenario. In this paper, we review the previous work and provide various methods and intuition towards a possible $O(n^3)$ communication complexity lower bound for dishonest majority Byzantine Broadcast.

## 2020 Research Papers

## 269) Varun Suraj (PRIMES), Catherine Del Vecchio Fitz, Laura Kleiman, Suresh Bhavnani, Chinmay Jani, Surbhi Shah, Rana McKay, Jeremy Warner, and Gil Alterovitz, SMART COVID Navigator, a Clinical Decision Support Tool for COVID-19 Treatment: Design and Development Study, published in Journal of Medical Internet Research 24, no. 2 (18 Feb 2022)

COVID-19 caused by SARS-CoV-2 has infected 219 million individuals at the time of writing of this paper. A large volume of research findings from observational studies about disease interactions with COVID-19 is being produced almost daily, making it difficult for physicians to keep track of the latest information on COVID-19’s effect on patients with certain pre-existing conditions.

## 268) Ayshwarya Subramanian (Broad Institute), Mikhail Alperovich (PRIMES), Yiming Yang, and Bo Li, Biology-inspired data-driven quality control for scientific discovery in single-cell transcriptomics (bioRxiv.org, 28 Oct 2021)

Quality control (QC) of cells, a critical step in single-cell RNA sequencing data analysis, has largely relied on arbitrarily fixed data-agnostic thresholds on QC metrics such as gene complexity and fraction of reads mapping to mitochondrial genes. The few existing data-driven approaches perform QC at the level of samples or studies without accounting for biological variation in the commonly used QC criteria. We demonstrate that the QC metrics vary both at the tissue and cell state level across technologies, study conditions, and species. We propose data-driven QC (ddqc), an unsupervised adaptive quality control framework that performs flexible and data-driven quality control at the level of cell states while retaining critical biological insights and improved power for downstream analysis. On applying ddqc to 6,228,212 cells and 835 mouse and human samples, we retain a median of 39.7% more cells when compared to conventional data-agnostic QC filters. With ddqc, we recover biologically meaningful trends in gene complexity and ribosomal expression among cell-types enabling exploration of cell states with minimal transcriptional diversity or maximum ribosomal protein expression. Moreover, ddqc allows us to retain cell-types often lost by conventional QC such as metabolically active parenchymal cells, and specialized cells such as neutrophils or gastric chief cells. Taken together, our work proposes a revised paradigm to quality filtering best practices - iterative QC, providing a data-driven quality control framework compatible with observed biological diversity.

## 267) Robert H. Dolin, Shaileshbhai R. Gothi, Aziz Boxwala, Bret S. E. Heale, Ammar Husami, James Jones, Himanshu Khangar, Shubham Londhe, Frank Naeymi-Rad, Soujanya Rao, Barbara Rapchak, James Shalaby, Varun Suraj (PRIMES), Ning Xie, Srikar Chamala & Gil Alterovitz, vcf2fhir: a utility to convert VCF files into HL7 FHIR format for genomics-EHR integration, published in BMC Bioinformatics 22, article No. 104 (2 Mar 2021)

VCF formatted files are the lingua franca of next-generation sequencing, whereas HL7 FHIR is emerging as a standard language for electronic health record interoperability. A growing number of FHIR-based clinical genomics applications are emerging. Here, we describe an open source utility for converting variants from VCF format into HL7 FHIR format.

## 266) Quanlin Chen, The Center of the $q$-Weyl Algebra over Rings with Torsion (23 Jan 2021)

We compute the centers of the Weyl algebra, $q$-Weyl algebra, and the "first $q$-Weyl algebra" over the quotient of the ring $\mathbb{Z}/p^N \mathbb{Z}[q]$ by some polynomial $P(q)$. Through this, we generalize and "quantize" part of a result by Stewart and Vologodsky on the center of the ring of differential operators on a smooth variety over $\mathbb{Z}/p^N \mathbb{Z}$. We prove that a corresponding Witt vector structure appears for general $P(q)$ and compute the extra terms for special $P(q)$ with particular properties, answering a question by Bezrukavnikov of possible interpolation between two known results.

## 265) Tanisha Saxena and Daniel Xu, Graph Alignment-Based Protein Comparison (23 Jan 2021)

Inspired by the question of identifying mechanisms of viral infection, we are interested in the problem of comparing pairs of proteins, given by amino acid sequences and traces of their 3-dimensional structure. While it is true that the problem of predicting and comparing protein function is one of the most famous unsolved problems in computational biology, we propose a heuristic which poses it as a simple alignment problem, which - after some linear-algebraic pre-processing - is amenable to a dynamic programming solution.

## 264) Andrew Cai, Ratios of Naruse-Newton Coefficients Obtained from Descent Polynomials (arXiv.org, 20 Jan 2021)

We study Naruse-Newton coefficients, which are obtained from expanding descent polynomials in a Newton basis introduced by Jiradilok and McConville. These coefficients $C_0, C_1, \ldots$ form an integer sequence associated to each finite set of positive integers. For fixed nonnegative integers $a<b$, we examine the set $R_{a, b}$ of all ratios $\frac{C_a}{C_b}$ over finite sets of positive integers. We characterize finite sets for which $\frac{C_a}{C_b}$ is minimized and provide a construction to prove $R_{a, b}$ is unbounded above. We use this construction to obtain results on the closure of $R_{a, b}$. We also examine properties of Naruse-Newton coefficients associated with doubleton sets, such as unimodality and log-concavity. Finally, we find an explicit formula for all ratios $\frac{C_a}{C_b}$ of Naruse-Newton coefficients associated with ribbons of staircase shape.

## 263) Ishan Levy (MIT) and Justin Wu (PRIMES), The Borel Cohomology of Free Iterated Loop Spaces (16 Jan 2021; arXiv.org, 28 May 2021)

We compute the $\text{SO}(n+1)$-equivariant mod $2$ Borel cohomology of the free iterated loop space $Z^{S^n}$ when $n$ is odd and $Z$ is a product of mod $2$ Eilenberg-MacLane spaces. When $n=1$, this recovers Ottosen and Bökstedt's computation for the free loop space. The highlight of our computation is a construction of cohomology classes using an $O(n)$-equivariant evaluation map and a pushforward map. We then reinterpret our computation as giving a presentation of the zeroth derived functor of the Borel cohomology of $Z^{S^n}$ for arbitrary $Z$. We also include an appendix where we give formulas for computing the zeroth derived functor of the cohomology of mapping spaces, and study the dependence of such derived functors on the Steenrod operations.

## 262) Linda Chen, Reducing Round Complexity of Byzantine Broadcast (15 Jan 2021)

Byzantine Broadcast is an important topic in distributed systems and improving its round complexity has long been a focused challenge. Under honest majority, the state of the art for Byzantine Broadcast is 10 rounds for a static adversary and 16 rounds for an adaptive adversary. In this paper, we present a Byzantine Broadcast protocol with expected 8 rounds under a static adversary and expected 10 rounds under an adaptive adversary. We also generalize our idea to the dishonest majority setting and achieve an improvement over existing protocols.

## 261) Zarathustra Brady (MIT) and Holden Mui (PRIMES), Symmetric Operations on Domains of Size at Most 4 (15 Jan 2021)

To convert a fractional solution to an instance of a constraint satisfaction problem into a solution, a rounding scheme is needed, which can be described by a collection of symmetric operations with one of each arity. An intriguing possibility, raised in a recent paper by Carvalho and Krokhin, would imply that any clone of operations on a set $D$ which contains symmetric operations of arities $1, 2, \ldots, |D|$ contains symmetric operations of all arities in the clone. If true, then it is possible to check whether any given family of constraint satisfaction problems is solved by its linear programming relaxation. We characterize all idempotent clones containing symmetric operations of arities $1, 2, \ldots, |D|$ for all sets $D$ with size at most four and prove that each one contains symmetric operations of every arity, proving the conjecture above for $|D|{\leq}4$.

## 260) Yuxiao Wang, Asymptotics for Iterating the Lusztig-Vogan Bijection for $GL_n$ on Dominant Weights (15 Jan 2021)

In this paper, we iterate the explicit algorithm computing the Lusztig-Vogan bijection in Type $A$ ($GL_n$) on dominant weights, which was proposed by Achar and simplified by Rush. Our main result focuses on describing asymptotic behavior between the number of iterations for an input and the length of the input; we also present a recursive formula to compute the slope of the asymptote. This serves as another contribution to understanding the Lusztig-Vogan bijection from a combinatorial perspective and a first step in understanding the iterative behavior of the Lusztig-Vogan bijection in Type $A$.

## 259) Quanlin Chen, Tianze Jiang, and Yuxiao Wang, On the Generational Behavior of Gaussian Binomial Coefficients at Roots of Unity (15 Jan 2021)

The generational behavior of Gaussian binomial coefficients at roots of unity shadows the relationship between the reductive algebraic group in prime characteristic and the quantum group at roots of unity. In this paper, we study three ways of obtaining integer values from Gaussian binomial coefficients at roots of unity. We rigorously define the generations in this context and prove such behavior at prime power and two times prime power roots of unity. Moreover, we investigate and make conjectures on the vanishing, valuation, and sign behavior under the big picture of generations.
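For readers unfamiliar with the objects involved, the following is a minimal sketch of how a Gaussian binomial coefficient can be computed as a polynomial in $q$ and then evaluated at the simplest nontrivial root of unity, $q=-1$. It uses the standard recurrence $\binom{n}{k}_q = \binom{n-1}{k-1}_q + q^k \binom{n-1}{k}_q$. The function names `gaussian_binomial` and `evaluate` are illustrative, and plain evaluation at $q=-1$ is only an assumed example; it is not claimed to be one of the three ways studied in the paper.

```python
def gaussian_binomial(n, k):
    """Return the coefficients (lowest degree first) of [n choose k]_q."""
    if k < 0 or k > n:
        return [0]
    if k == 0 or k == n:
        return [1]
    first = gaussian_binomial(n - 1, k - 1)
    # Multiplying a polynomial by q^k shifts its coefficients up by k places.
    second = [0] * k + gaussian_binomial(n - 1, k)
    length = max(len(first), len(second))
    first = first + [0] * (length - len(first))
    second = second + [0] * (length - len(second))
    return [a + b for a, b in zip(first, second)]


def evaluate(coeffs, q):
    """Evaluate a polynomial given by its coefficient list at the value q."""
    return sum(c * q**i for i, c in enumerate(coeffs))


poly = gaussian_binomial(4, 2)   # 1 + q + 2q^2 + q^3 + q^4
print(poly, evaluate(poly, 1), evaluate(poly, -1))
```

At $q=1$ the polynomial reduces to the ordinary binomial coefficient $\binom{4}{2}=6$, while at $q=-1$ it takes a different integer value.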

## 258) Fiona Abney-McPeek, Serena An, and Jakin Ng, The Stembridge Equality for Skew Stable Grothendieck Polynomials and Skew Dual Stable Grothendieck Polynomials (15 Jan 2021; arXiv.org, 9 Feb 2021)

The Schur polynomials $s_{\lambda}$ are essential in understanding the representation theory of the general linear group. They also describe the cohomology ring of the Grassmannians. For $\rho = (n, n-1, \dots, 1)$ a staircase shape and $\mu \subseteq \rho$ a subpartition, the Stembridge equality states that $s_{\rho/\mu} = s_{\rho/\mu^T}$. This equality provides information about the symmetry of the cohomology ring. The stable Grothendieck polynomials $G_{\lambda}$, and the dual stable Grothendieck polynomials $g_{\lambda}$, developed by Buch, Lam, and Pylyavskyy, are variants of the Schur polynomials and describe the $K$-theory of the Grassmannians. Using the Hopf algebra structure of the ring of symmetric functions and a generalized Littlewood-Richardson rule, we prove that $G_{\rho/\mu} = G_{\rho/\mu^T}$ and $g_{\rho/\mu} = g_{\rho/\mu^T}$, the analogues of the Stembridge equality for the skew stable and skew dual stable Grothendieck polynomials.

## 257) Samuel H. Florin (PRIMES), Matthew H. Ho (PRIMES), and Zilin Jiang (MIT), On the binary adder channel with complete feedback, with an application to quantitative group testing (arXiv.org, 25 Jan 2021), published in IEEE Transactions on Information Theory 68:5 (May 2022): 2839-2856

We determine the exact value of the optimal symmetric rate point in the Dueck zero-error capacity region of the binary adder channel with complete feedback. Our motivation is a problem in quantitative group testing. Given a set of $n$ elements two of which are defective, the quantitative group testing problem asks for the identification of these two defectives through a series of tests. Each test gives the number of defectives contained in the tested subset, and the outcomes of previous tests are assumed known at the time of designing the current test. We establish that the minimum number of tests is asymptotic to $(\log_2 n) / r$, where the constant $r \approx 0.78974$ lies strictly between the lower bound $5/7 \approx 0.71428$ due to Gargano et al. and the information-theoretic upper bound $(\log_2 3) / 2 \approx 0.79248$.
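The information-theoretic upper bound quoted above follows from a short counting argument, sketched here under the abstract's setup (exactly two defectives among $n$ elements, each test returning $0$, $1$, or $2$). There are $\binom{n}{2}$ candidate pairs, and $t$ tests produce at most $3^t$ distinct outcome sequences, so any strategy that always identifies the pair must satisfy

$$
3^{t} \ge \binom{n}{2}
\quad\Longrightarrow\quad
t \ge \log_3 \binom{n}{2} = \frac{2\log_2 n - 1 + o(1)}{\log_2 3} \sim \frac{\log_2 n}{(\log_2 3)/2}.
$$

Writing the minimum number of tests as $(\log_2 n)/r$ therefore forces $r \le (\log_2 3)/2 \approx 0.79248$.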

## 256) Adithya Balachandran, Andrew Huang, and Siwen Sun, Product Expansions of $q$-Character Polynomials (15 Jan 2021)

We consider certain class functions defined simultaneously on the groups $GL_n(\mathbb{F}_q)$ for all $n$, which we also interpret as statistics on matrices. It has been previously shown that these simultaneous class functions are closed under multiplication, and we work towards computing the structure constants of this ring of functions. We derive general criteria for determining which statistics have nonzero expansion coefficients in the product of two fixed statistics. To this end, we introduce an algorithm that computes expansion coefficients in general, which we furthermore use to give closed form expansions in some cases. We conjecture that certain indecomposable statistics generate the whole ring, and indeed prove this to be the case for statistics associated with matrices consisting of up to 2 Jordan blocks. The coefficients we compute exhibit surprising stability phenomena, which in turn reflect stabilizations of joint moments as well as multiplicities in the irreducible decomposition of tensor products of representations of finite general linear groups.

## 255) Daniel Hong, Hyunwoo Lee, and Alex Wei, Optimal solutions and ranks in the max-cut SDP (15 Jan 2021)

The max-cut problem is a classical graph theory problem which is NP-complete. The best known polynomial-time approximation algorithm relies on semidefinite programming (SDP). We study the conditions under which graphs of certain classes have rank 1 solutions to the max-cut SDP. We apply these findings to look at how solutions to the max-cut SDP behave under simple combinatorial constructions. Our results determine when solutions to the max-cut SDP for cycle graphs are rank 1. We find the solutions to the max-cut SDP of the vertex sum of two graphs. We then characterize the SDP solutions upon joining two triangle graphs by an edge sum.

## 254) Sam Florin, Matthew Ho, and Rahul Thomas, Group testing for two defectives and the zero-error channel capacity (14 Jan 2021)

The issue of identifying defects in a set with as few tests as possible has many applications, including in maximum efficiency pool testing during the COVID-19 pandemic. This research aims to determine the rate of growth of the number of tests required relative to the logarithm of the size of the set. In particular, we focus on the case where there are exactly two defects in the set, which is equivalent to the problem of determining the zero-error capacity of a two-user binary adder channel with complete feedback. The channel capacity is given by a non-linear optimization problem involving entropy functions, whose optimal value remains unknown. In this paper, using the linear dependence technique, we are able to reduce the complexity of the optimization problem significantly. We also gather numerical evidence for the conjectured optimal value.

## 253) Sarah Chen, In silico prediction of retained intron-derived neoantigens in leukemia (8 Jan 2021)

Alternative splicing is critical for the regulation and diversification of gene expression. Conversely, splicing dysregulation, caused by mutations in splicing machinery or splice junctions, is a hallmark of cancer. Tumor-specific isoforms are a potential source of neoantigens, cancer-specific peptides presented by human leukocyte antigen (HLA) class I molecules and potentially recognized by T cells. For cancers such as acute myeloid leukemia (AML) with a low mutation burden but widespread splicing aberrations, splice variants, and retained introns (RIs) in particular, may broaden the number of suitable targets for immunotherapy. I developed a computational pipeline to predict AS-derived neoepitopes from tumor RNA-Seq. I first used the B721.221 B cell line as a model system, for which RNA-Seq, Ribo-Seq, and immunoproteome data from >90 HLA class I monoallelic lines were available. I performed de novo transcriptome assembly with StringTie, identifying on average 694±73 AS isoforms across 4 technical replicates. Using HLAthena, I identified 1,087 AS-derived neoepitopes predicted to bind across 4 frequent HLA alleles. Of them, 192 (18%) also displayed evidence of mRNA translation, measured as the alignment of ≥1 Ribo-Seq. To further increase prediction accuracy, I am currently analyzing the HLA I immunopeptidome to define the features of predicted AS isoforms more likely to be not only translated but also HLA presented. Finally, I applied my prediction pipeline to AML cell lines (n = 8) and primary samples (n = 7). I identified 682±113 AS isoforms in AML cell lines, similar to the 694 in B721, but the proportion of isoforms containing RIs (as opposed to alternative 5' and 3' splice sites or cassette exons) was 3.5x higher than in B721, in line with the biological relevance of RIs in particular in this disease setting.
Primary AML samples yielded 1496±294 AS isoforms, more than twofold the number in B721 or AML cell lines, thus reinforcing the significant contribution of AS to the cancer immunopeptidome. Accurate prediction of AS-derived neoantigens through this pipeline will contribute to the design of novel cancer immunotherapies.

## 252) Kenta Suzuki (PRIMES) and Michael E. Zieve (University of Michigan), Meromorphic functions with the same preimages at several finite sets (31 Dec 2020)

Let $p$ and $q$ be nonconstant meromorphic functions on $\mathbb{C}^m$. We show that if $p$ and $q$ have the same preimages as one another, counting multiplicities, at each of four nonempty pairwise disjoint subsets $S_1,\ldots,S_4$ of $ \mathbb{C}$, then $p$ and $q$ have the same preimages as one another at each of infinitely many subsets of $ \mathbb{C}$, and moreover $g(p)=g(q)$ for some nonconstant rational function $g(x)$ whose degree is bounded in terms of the sizes of the $S_i$'s. This result is new already when $m=1$, and it implies many previous results about the extent to which a meromorphic function is determined by its preimages of a few points or a few small sets.

## 251) Yavor Litchev and Abigail Thomas, Hybrid Privacy Scheme (31 Dec 2020)

Local Differential Privacy (LDP) is an approach that allows a central server to compute on data submitted by multiple users while maintaining the privacy of each user. LDP is a very efficient approach to security; however, as privacy increases, the accuracy of these computations decreases. Multi-Party Computation (MPC) is a process by which multiple parties work together to compute the output of a function without revealing their own information. MPC is highly secure and accurate for such computations, but it is very computationally expensive and slow. The proposed hybrid privacy model harnesses the benefits of both LDP and MPC to create a secure, accurate, and fast algorithm for machine learning.

## 250) Ho Tin Fan and Alvin Lu, Parallel Batch-Dynamic 3-Vertex Subgraph Maintenance (31 Dec 2020)

Counting certain subgraphs is a fundamental problem that is crucial in recognizing patterns in large graphs, such as social networks and biological interactomes. However, many real-world graphs are constantly evolving and are subject to changes over time, and previous work on efficient parallel subgraph counting algorithms either does not support dynamic modifications or does not extend to general subgraphs. This paper presents a theoretically efficient and demonstrably fast algorithm for parallel batch-dynamic 3-vertex subgraph counting, and the underlying data structure can be extended to maintaining 4-vertex subgraph counts as well. The algorithm maintains the $h$-index of the graph, or the maximum $h$ such that the graph contains $h$ vertices with degree at least $h$, and uses this to update subgraph counts through an efficient traversal of two-paths, or wedges. For a batch of size $b$, the algorithm takes $O(bh)$ expected amortized work and $O(\log(bh))$ span with high probability.
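The $h$-index defined in this abstract can be computed directly from a degree sequence. The following is a minimal static sketch of that definition only; it is not the paper's parallel batch-dynamic data structure. The adjacency-dictionary representation and the names `h_index` and `graph_h_index` are assumptions made for illustration.

```python
def h_index(degrees):
    """Largest h such that at least h of the given degrees are >= h."""
    h = 0
    # After sorting in decreasing order, the i-th largest degree must be >= i.
    for i, d in enumerate(sorted(degrees, reverse=True), start=1):
        if d >= i:
            h = i
        else:
            break
    return h


def graph_h_index(adjacency):
    """h-index of an undirected graph given as {vertex: set_of_neighbors}."""
    return h_index(len(neighbors) for neighbors in adjacency.values())


# A triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
triangle_with_tail = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(graph_h_index(triangle_with_tail))
```

Sorting makes this sketch $O(n \log n)$ per query, which is fine for illustration but is exactly the kind of recomputation a dynamic algorithm would avoid.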

## 249) Kevin Edward Zhao (PRIMES), Vladislav Lialin & Anna Rumshisky (UMass Lowell), Text Is an Image: Augmentation via Embedding Mixing (30 Dec 2020)

Data augmentation techniques are essential for computer vision, yielding significant accuracy improvements with little engineering cost. However, data augmentation for text has always been tricky. Synonym replacement techniques require a good thesaurus and domain-specific rules for synonym selection from the synset, while backtranslation techniques are computationally expensive and require a good translation model for the language of interest. In this paper, we present simple text augmentation techniques on the embeddings level, inspired by mixing-based image augmentations. These techniques are language-agnostic and require little to no hyperparameter tuning. We evaluate the augmentation techniques on IMDB and GLUE tasks, and the results show that the augmentations significantly improve the score of the RoBERTa model.

## 248) Alvin Chen (PRIMES) and Kai Huang (MIT), Alpha invariants of $K$-semistable smooth toric Fano varieties (29 Dec 2020)

Jiang conjectured that the $\alpha$-invariant for $n$-dimensional $K$-semistable smooth Fano varieties has a gap between $\frac{1}{n}$ and $\frac{1}{n+1}$, where $\frac{1}{n+1}$ can only be achieved by projective $n$-space. Assuming a weaker version of Ewald's conjecture, we prove this gap conjecture in the toric case. We also prove a necessary and sufficient classification for all possible values of the $\alpha$-invariant for $K$-semistable smooth toric Fano varieties by providing an explicit construction of the polytopes that can achieve these values. This provides an important step towards understanding the types of polytopes that correspond to particular values of the $\alpha$-invariant; in particular, we show that $K$-semistable smooth Fano polytopes are centrally symmetric if and only if they have an $\alpha$-invariant of $\frac{1}{2}$. Lastly, we examine the effects of the Picard number on the $\alpha$-invariant, classifying the $K$-semistable smooth toric Fano varieties with Picard number 1 or 2 and their $\alpha$-invariants.

## 247) Vishnu Emani (PRIMES), Klaus Schmitz-Abe and Pankaj Agrawal (Boston Children's Hospital), Statistical Ranking Model for Candidate Genes in Rare Genetic Disorders (28 Dec 2020)

Genetic mutations are responsible for a significant number of rare diseases, and so investigating the genetic basis of various rare diseases has been a crucial area of study. More specifically, studying variants in the exome, the protein coding region which makes up approximately 1% of the human genome, has been proven effective at identifying the most likely pathogenic variants. The advent of whole exome and whole genome sequencing facilitates identification of the most likely pathogenic mutations much more efficiently and on a greater scale. Next-generation sequencing has been growing rapidly in the past decade and has led to numerous successful disease-detection pipelines. The pipeline involved in this study was the Variant Explorer Pipeline (VExP), developed by our laboratory to improve diagnostic yield. In the VExP pipeline, genetic variants are filtered based on a variety of criteria, which can be divided into the categories of genotype data and phenotype data (Figure 1). After the filtering process, the most likely variants are isolated, a process which requires meticulous examination of a large number of mutations. Furthermore, determining the strength of a phenotype match presents challenges because a number of resources need to be consulted to make an informed decision. The purpose of this project was to develop an automated algorithm, using a host of parameters, to rank mutation candidates based on the two computed scores for pathogenicity.

## 246) Neil Chowdhury, Modeling the Effect of Histone Methylation on Chromosomal Organization in Colon Cancer Cells (27 Dec 2020)

Loop extrusion and compartmentalization are the two most important processes regulating the high-level organization of DNA in the cell nucleus. These processes are largely believed to be independent and competing. Chromatin consists of nucleosomes, which contain coils of DNA wrapped around histone proteins. Besides packing DNA, nucleosomes contain an "epigenetic code" - tails of histone proteins are chemically modified at certain positions to leave certain "histone marks" on the chromatin fiber. This paper explores the effect of the H3K9me3 histone modification, which typically corresponds to inactive and repressed chromatin, on genome structure. Interestingly, in H3K9me3 domains, there are much fewer topologically associating domains (TADs) than in other domains, and there is a unique compartmentalization pattern. A high-resolution polymer model simulating both loop extrusion and compartmentalization is created to explore these differences.

## 245) Daniel Xu, Modeling of Network Based Digital Contact Tracing and Testing Strategies for the COVID-19 Pandemic (26 Dec 2020; arXiv.org, 28 Dec 2020), published in Mathematical Biosciences, vol. 338 (August 2021)

With more than 1.7 million COVID-19 deaths, identifying effective measures to prevent COVID-19 is a top priority. We developed a mathematical model to simulate the COVID-19 pandemic with digital contact tracing and testing strategies. The model uses a real-world social network generated from a high-resolution contact data set of 180 students. This model incorporates infectivity variations, test sensitivities, incubation period, and asymptomatic cases. We present a method to extend the weighted temporal social network and present simulations on a network of 5000 students. The purpose of this work is to investigate optimal quarantine rules and testing strategies with digital contact tracing. The results show that the traditional strategy of quarantining direct contacts reduces infections by less than 20% without sufficient testing. Periodic testing every 2 weeks without contact tracing reduces infections by less than 3%. A variety of strategies are discussed, including testing second and third degree contacts and the pre-exposure notification system, which acts as a social radar warning users how far they are from COVID-19. The most effective strategy discussed in this work combined the pre-exposure notification system with testing second and third degree contacts. This strategy reduces infections by 18.3% when 30% of the population uses the app, 45.2% when 50% of the population uses the app, 72.1% when 70% of the population uses the app, and 86.8% when 95% of the population uses the app. When simulating the model on an extended network of 5000 students, the results are similar, with the contact tracing app reducing infections by up to 79%.

## 244) Yongyi Chen (MIT) and Tae Kyu Kim (PRIMES), On Generalized Carmichael Numbers (15 Dec 2020; arXiv.org 5 Mar 2021)

Given an integer $k$, define $C_k$ as the set of integers $n > \max(k,0)$ such that $a^{n-k+1} \equiv a \pmod{n}$ holds for all integers $a$. We establish various multiplicative properties of the elements in $C_k$ and give a sufficient condition for the infinitude of $C_k$. Moreover, we prove that there are finitely many elements in $C_k$ with one and two prime factors if and only if $k>0$ and $k$ is prime. In addition, if all but two prime factors of $n \in C_k$ are fixed, then there are finitely many elements in $C_k$, excluding certain infinite families of $n$. We also give conjectures about the growth rate of $C_k$ with numerical evidence. We explore a similar question when both $a$ and $k$ are fixed and prove that for fixed integers $a \geq 2$ and $k$, there are infinitely many integers $n$ such that $a^{n-k} \equiv 1 \pmod{n}$ if and only if $(k,a) \neq (0,2)$ by building off the work of Kiss and Phong. Finally, we discuss the multiplicative properties of positive integers $n$ such that the Carmichael function $\lambda(n)$ divides $n-k$.
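The definition of $C_k$ can be checked by brute force for small $n$. The sketch below tests membership straight from the definition, using the fact that the congruence depends only on $a \bmod n$, so it suffices to try $a = 0, 1, \ldots, n-1$. The function name `in_C_k` is hypothetical, and this direct check is only meant to make the definition concrete; it is not the method of the paper.

```python
def in_C_k(n, k):
    """Check whether n lies in C_k: n > max(k, 0) and a^(n-k+1) = a (mod n) for all a."""
    if n <= max(k, 0):
        return False
    exponent = n - k + 1  # at least 2, because n > k
    # Only the residue of a modulo n matters, so residues 0..n-1 cover all integers a.
    return all(pow(a, exponent, n) == a % n for a in range(n))


# For k = 1 the condition reads a^n = a (mod n), which holds for primes
# and for Carmichael numbers such as 561 = 3 * 11 * 17.
print(in_C_k(7, 1), in_C_k(561, 1), in_C_k(4, 1))
```

The check costs $O(n)$ modular exponentiations per candidate, so it is suitable only for small experiments of the kind that might accompany numerical evidence.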

## 243) William Qin, HOMFLY Polynomials of Pretzel Knots (11 Dec 2020; arXiv.org, 3 Jan 2021)

HOMFLY polynomials are one of the major knot invariants being actively studied. They are difficult to compute in the general case but can be far more easily expressed in certain specific cases. In this paper, we examine two particular knots, as well as one more general infinite class of knots. From our calculations, we see some apparent patterns in the polynomials for the knots $9_{35}$ and $9_{46}$, and in particular their $F$-factors. These properties are of a form that seems conducive to finding a general formula for them, which would yield a general formula for the HOMFLY polynomials of the two knots. Motivated by these observations, we demonstrate and conjecture some properties both of the $F$-factors and HOMFLY polynomials of these knots and of the more general class that contains them, namely pretzel knots with 3 odd parameters. We make the first steps toward a matrix-less general formula for the HOMFLY polynomials of these knots.

## 242) Jonathan Yin (PRIMES), Hattie Chung (Broad Institute), and Aviv Regev (Broad Institute), A multi-view generative model for molecular representation improves prediction tasks (7 Dec 2020), accepted paper for LMRL2020 (Learning Meaningful Representations of Life) workshop at NeurIPS 2020 (Thirty-fourth Conference on Neural Information Processing Systems)

Unsupervised generative models have been a popular approach to representing molecules. These models extract salient molecular features to create compact vectors that can be used for downstream prediction tasks. However, current generative models for molecules rely mostly on structural features and do not fully capture global biochemical features. Here, we propose a multi-view generative model that integrates low-level structural features with global chemical properties to create a more holistic molecular representation. In proof-of-concept analyses, compared to purely structural latent representations, multi-view latent representations improve model accuracy on various tasks when used as input to feed-forward prediction networks. For some tasks, simple models trained on multi-view representations perform comparably to more complex supervised methods. Multi-view representations are an attractive method to improve representations in an unsupervised manner, and could be useful for prediction tasks, particularly in contexts where data is limited.

## 241) Yibo Gao (MIT), Joshua Guo (PRIMES), Karthik Seetharaman (PRIMES), and Ilaria Seidel (PRIMES), The Rank-Generating Functions of Upho Posets (arXiv.org, 3 Nov 2020), published in Discrete Mathematics 345:1 (Jan 2022)

Upper homogeneous finite type (upho) posets are a large class of partially ordered sets with the property that the principal order filter at every vertex is isomorphic to the whole poset. Well-known examples include $k$-ary trees, the grid graphs, and the Stern poset. Very little is known about upho posets in general. In this paper, we construct upho posets with Schur-positive Ehrenborg quasisymmetric functions, whose rank-generating functions have rational poles and zeros. We also categorize the rank-generating functions of all planar upho posets. Finally, we prove the existence of an upho poset with uncomputable rank-generating function.

## 240) Jason Yang (PRIMES) and Jun Wan (MIT), On Updating and Querying Submatrices (arXiv.org, 25 Oct 2020)

In this paper, we study the $d$-dimensional update-query problem. We provide lower bounds on update and query running times, assuming a long-standing conjecture on min-plus matrix multiplication, as well as algorithms that are close to the lower bounds. Given a $d$-dimensional matrix, an *update* changes each element in a given submatrix from $x$ to $x\bigtriangledown v$, where $v$ is a given constant. A *query* returns the $\bigtriangleup$ of all elements in a given submatrix. We study the cases where $\bigtriangledown$ and $\bigtriangleup$ are both commutative and associative binary operators. When $d = 1$, updates and queries can be performed in $O(\log N)$ worst-case time for many $(\bigtriangledown,\bigtriangleup)$ by using a segment tree with lazy propagation. However, when $d\ge 2$, similar techniques usually cannot be generalized. We show that if min-plus matrix multiplication cannot be computed in $O(N^{3-\varepsilon})$ time for any $\varepsilon>0$ (which is widely believed to be the case), then for $(\bigtriangledown,\bigtriangleup)=(+,\min)$, either updates and queries cannot both run in $O(N^{1-\varepsilon})$ time for any constant $\varepsilon>0$, or preprocessing cannot run in polynomial time. Finally, we show a special case where lazy propagation can be generalized for $d\ge 2$ and where updates and queries can run in $O(\log^d N)$ worst-case time. We present an algorithm that meets this running time and is simpler than similar algorithms of previous works.
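The one-dimensional case mentioned in the abstract can be sketched as follows for $(\bigtriangledown,\bigtriangleup)=(+,\min)$: a standard segment tree with lazy propagation that supports range addition and range minimum in $O(\log N)$ time per operation. The class name `AddMinSegmentTree` and its interface are illustrative assumptions, not code from the paper, and the sketch assumes a non-empty input list.

```python
class AddMinSegmentTree:
    """Range-add updates and range-min queries on a non-empty list (d = 1 case)."""

    def __init__(self, values):
        self.n = len(values)
        self.min = [0] * (4 * self.n)   # minimum of each node's segment
        self.lazy = [0] * (4 * self.n)  # pending addition not yet pushed to children
        self._build(1, 0, self.n - 1, values)

    def _build(self, node, lo, hi, values):
        if lo == hi:
            self.min[node] = values[lo]
            return
        mid = (lo + hi) // 2
        self._build(2 * node, lo, mid, values)
        self._build(2 * node + 1, mid + 1, hi, values)
        self.min[node] = min(self.min[2 * node], self.min[2 * node + 1])

    def _push(self, node):
        # Move a pending addition one level down; only called on internal nodes.
        if self.lazy[node]:
            for child in (2 * node, 2 * node + 1):
                self.min[child] += self.lazy[node]
                self.lazy[child] += self.lazy[node]
            self.lazy[node] = 0

    def update(self, left, right, v, node=1, lo=0, hi=None):
        """Add v to every element with index in [left, right]."""
        if hi is None:
            hi = self.n - 1
        if right < lo or hi < left:
            return
        if left <= lo and hi <= right:
            self.min[node] += v
            self.lazy[node] += v
            return
        self._push(node)
        mid = (lo + hi) // 2
        self.update(left, right, v, 2 * node, lo, mid)
        self.update(left, right, v, 2 * node + 1, mid + 1, hi)
        self.min[node] = min(self.min[2 * node], self.min[2 * node + 1])

    def query(self, left, right, node=1, lo=0, hi=None):
        """Return the minimum over indices in [left, right]."""
        if hi is None:
            hi = self.n - 1
        if right < lo or hi < left:
            return float("inf")
        if left <= lo and hi <= right:
            return self.min[node]
        self._push(node)
        mid = (lo + hi) // 2
        return min(self.query(left, right, 2 * node, lo, mid),
                   self.query(left, right, 2 * node + 1, mid + 1, hi))
```

Lazy propagation works here because adding $v$ to every element of a segment shifts that segment's minimum by exactly $v$, so the pending update can be stored at a node without visiting its descendants.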

## 239) Vishaal Ram (PRIMES) and Laura P. Schaposnik (University of Illinois at Chicago), A modified age-structured SIR model for COVID-19 type viruses (arXiv.org, 23 Sept 2020), published in Nature Scientific Reports (2021) 11:15194

We present a modified age-structured SIR model based on known patterns of social contact and distancing measures within Washington, USA. We find that population age-distribution has a significant effect on disease spread and mortality rate, and contributes to the efficacy of age-specific contact and treatment measures. We consider the effect of relaxing restrictions across less vulnerable age-brackets, comparing results across selected groups of varying population parameters. Moreover, we analyze the mitigating effects of vaccinations and examine the effectiveness of age-targeted distributions. Lastly, we explore how our model can be applied to other states to reflect social-distancing policy based on different parameters and metrics.
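To give a feel for what an age-structured SIR model looks like, here is a generic textbook-style sketch with a contact matrix between age groups, integrated with forward Euler steps. It is not the authors' modified model: the function name, the contact matrix, and all parameter values below are hypothetical placeholders chosen only for illustration.

```python
def simulate_age_sir(S, I, R, contacts, beta, gamma, dt, steps):
    """Forward-Euler integration of a basic age-structured SIR model.

    S, I, R  : lists with the initial counts per age group
    contacts : contacts[i][j] is the contact rate of group i with group j
    beta     : transmission probability per contact (hypothetical value)
    gamma    : recovery rate (hypothetical value)
    """
    groups = range(len(S))
    N = [S[i] + I[i] + R[i] for i in groups]  # group sizes stay constant
    S, I, R = list(S), list(I), list(R)
    for _ in range(steps):
        # Force of infection on group i, summed over all groups j.
        force = [beta * sum(contacts[i][j] * I[j] / N[j] for j in groups)
                 for i in groups]
        new_infections = [force[i] * S[i] * dt for i in groups]
        new_recoveries = [gamma * I[i] * dt for i in groups]
        for i in groups:
            S[i] -= new_infections[i]
            I[i] += new_infections[i] - new_recoveries[i]
            R[i] += new_recoveries[i]
    return S, I, R


if __name__ == "__main__":
    # Two age groups with made-up numbers, purely for illustration.
    S_end, I_end, R_end = simulate_age_sir(
        S=[990.0, 995.0], I=[10.0, 5.0], R=[0.0, 0.0],
        contacts=[[3.0, 1.0], [1.0, 2.0]],
        beta=0.1, gamma=0.2, dt=0.1, steps=500,
    )
    print(S_end, I_end, R_end)
```

Changing individual entries of `contacts` is one simple way to mimic age-specific distancing measures in such a model.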

## 238) Richard Chen (PRIMES), Feng Gui (MIT), Jason Tang (PRIMES), and Nathan Xiong (PRIMES), Few distance sets in $\ell_p$ spaces and $\ell_p$ product spaces (19 Sept 2020; arXiv.org, 26 Sept 2020), published in European Journal of Combinatorics 102 (May 2022)

Kusner asked if $n+1$ points is the maximum number of points in $\mathbb{R}^n$ such that the $\ell_p$ distance between any two points is $1$. We present an improvement to the best known upper bound when $p$ is large in terms of $n$, as well as a generalization of the bound to $s$-distance sets. We also study equilateral sets in the $\ell_p$ sums of Euclidean spaces, deriving upper bounds on the size of an equilateral set for when $p=\infty$, $p$ is even, and for any $1\le p<\infty$.

## 237) Tanya Khovanova (MIT) and Sean Li (PRIMES), The Penney's Game with Group Action (arXiv.org, 13 Sept 2020), published in Annals of Combinatorics (15 Jan 2022)

We generalize word avoidance theory by equipping the alphabet $\mathcal{A}$ with a group action. We call equivalence classes of words patterns. We extend the notion of word correlation to patterns using group stabilizers. We extend known word avoidance results to patterns. We use these results to answer standard questions for the Penney's game on patterns and show non-transitivity for the game on patterns as the length of the pattern tends to infinity. We also analyze bounds on the pattern-based Conway leading number and expected wait time, and further explore the game under the cyclic and symmetric group actions.
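For background, the classical (fair coin, no group action) versions of the quantities mentioned above can be computed directly from word overlaps. The sketch below uses Conway's well-known formulas for the expected wait time and for the odds in Penney's game; the function names are hypothetical, and the pattern-based generalization studied in the paper is not implemented here.

```python
from fractions import Fraction


def correlation(x: str, y: str) -> int:
    """Sum of 2^(k-1) over all k such that the last k letters of x
    equal the first k letters of y (binary alphabet, fair coin)."""
    return sum(2 ** (k - 1)
               for k in range(1, min(len(x), len(y)) + 1)
               if x[-k:] == y[:k])


def expected_wait(word: str) -> int:
    """Expected number of fair coin flips until `word` first appears."""
    return 2 * correlation(word, word)


def odds_second_beats_first(a: str, b: str) -> Fraction:
    """Odds that word b appears before word a, by Conway's formula
    (aa - ab) : (bb - ba). Assumes distinct words of equal length."""
    return Fraction(correlation(a, a) - correlation(a, b),
                    correlation(b, b) - correlation(b, a))


if __name__ == "__main__":
    print(expected_wait("HHH"))                   # 14 flips on average
    print(odds_second_beats_first("HHH", "THH"))  # THH beats HHH with odds 7 to 1
```

The second call illustrates the familiar surprise of the game: although both words have length 3, THH appears before HHH in seven out of eight games.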

## 236) Ankit Bisain (PRIMES) and Eric J. Hanson (Brandeis University), The Bernardi Formula for Non-Transitive Deformations of the Braid Arrangement (7 Sept 2020; arXiv.org, 2 Oct 2020), published in The Electronic Journal of Combinatorics 28:4 (2021)

Bernardi has given a general formula to compute the number of regions of a deformation of the braid arrangement as a signed sum over boxed trees. We prove that the contribution to this sum of the set of boxed trees sharing an underlying rooted labeled tree is 0 or ±1 and give an algorithm for computing this value. We then restrict to arrangements which we call almost transitive and construct a sign-reversing involution which reduces Bernardi's signed sum to enumeration of a set of rooted labeled trees in this case. We conclude by explicitly enumerating the trees corresponding to the regions of certain nested Ish arrangements which we call non-negative, recovering their known counting formula.

## 235) Alejandro H. Morales (UMass Amherst) and William Shi (PRIMES), Refinements and Symmetries of the Morris identity for volumes of flow polytopes (7 Sept 2020; arXiv.org, 11 Feb 2021), published in Comptes Rendus Mathématique 359 (2021): 823-851

Flow polytopes are an important class of polytopes in combinatorics whose lattice points and volumes have interesting properties and relations. The Chan-Robbins-Yuen (CRY) polytope is a flow polytope with normalized volume equal to the product of consecutive Catalan numbers. Zeilberger proved this by evaluating the Morris constant term identity, but no combinatorial proof is known. There is a refinement of this formula that splits the largest Catalan number into Narayana numbers, which Mészáros gave an interpretation as the volume of a collection of flow polytopes. We introduce a new refinement of the Morris identity with combinatorial interpretations both in terms of lattice points and volumes of flow polytopes. Our results generalize Mészáros's construction and a recent flow polytope interpretation of the Morris identity by Corteel-Kim-Mészáros. We prove the product formula of our refinement following the strategy of the Baldoni-Vergne proof of the Morris identity. Lastly, we study a symmetry of the Morris identity bijectively using the Danilov-Karzanov-Koshevoy triangulation of flow polytopes and a bijection of Mészáros-Morales-Striker.

## 234) Vishaal Ram (PRIMES), Laura P. Schaposnik (University of Illinois at Chicago) et al., Extrapolating continuous color emotions through deep learning (2 Sept 2020), published in Physical Review Research 2:3 (September–November 2020)

By means of an experimental dataset, we use deep learning to implement an RGB (red, green, and blue) extrapolation of emotions associated to color, and do a mathematical study of the results obtained through this neural network. In particular, we see that males (type-$m$ individuals) typically associate a given emotion with darker colors, while females (type-$f$ individuals) associate it with brighter colors. A similar trend was observed with older people and associations to lighter colors. Moreover, through our classification matrix, we identify which colors have weak associations to emotions and which colors are typically confused with other colors.

## 233) Jesse Geneson, Suchir Kaustav, and Antoine Labelle (CrowdMath-2020), Extremal results for graphs of bounded metric dimension (arXiv.org, 31 Aug 2020), published in Discrete Applied Mathematics 309 (15 March 2022): 123-129

Metric dimension is a graph parameter motivated by problems in robot navigation, drug design, and image processing. In this paper, we answer several open extremal problems on metric dimension and pattern avoidance in graphs from (Geneson, Metric dimension and pattern avoidance, Discrete Appl. Math. 284, 2020, 1-7). Specifically, we construct a new family of graphs that allows us to determine the maximum possible degree of a graph of metric dimension at most $k$, the maximum possible degeneracy of a graph of metric dimension at most $k$, the maximum possible chromatic number of a graph of metric dimension at most $k$, and the maximum $n$ for which there exists a graph of metric dimension at most $k$ that contains $K_{n, n}$. We also investigate a variant of metric dimension called edge metric dimension and solve another problem from the same paper for $n$ sufficiently large by showing that the edge metric dimension of $P_n^{d}$ is $d$ for $n \geq d^{d-1}$. In addition, we use a probabilistic argument to make progress on another open problem from the same paper by showing that the maximum possible clique number of a graph of edge metric dimension at most $k$ is $2^{\Theta(k)}$. We also make progress on a problem from (N. Zubrilina, On the edge dimension of a graph, Discrete Math. 341, 2018, 2083-2088) by finding a family of new triples $(x, y, n)$ for which there exists a graph of metric dimension $x$, edge metric dimension $y$, and order $n$. In particular, we show that for each integer $k > 0$, there exist graphs $G$ with metric dimension $k$, edge metric dimension $3^k(1-o(1))$, and order $3^k(1+o(1))$.
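For readers new to the parameter: the metric dimension of a connected graph is the smallest number of landmark vertices such that every vertex is uniquely identified by its vector of distances to the landmarks. The brute-force sketch below (hypothetical function names, exponential running time, intended only for tiny connected graphs with at least two vertices) makes the definition concrete.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Shortest-path distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def metric_dimension(adj):
    """Smallest number of landmarks that give all vertices distinct distance vectors.

    adj maps each vertex to an iterable of its neighbours; the graph is
    assumed to be connected with at least two vertices.
    """
    vertices = sorted(adj)
    dist = {v: bfs_distances(adj, v) for v in vertices}
    for size in range(1, len(vertices) + 1):
        for landmarks in combinations(vertices, size):
            codes = {tuple(dist[l][v] for l in landmarks) for v in vertices}
            if len(codes) == len(vertices):
                return size
    return None


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    cycle = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
    complete = {i: [j for j in range(4) if j != i] for i in range(4)}
    print(metric_dimension(path), metric_dimension(cycle), metric_dimension(complete))
```

A path needs a single landmark (an endpoint), a cycle needs two, and the complete graph on four vertices needs three.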

## 232) William Li, Lebesgue Measure Preserving Thompson's Monoid (30 Aug 2020)

This paper defines Lebesgue measure preserving Thompson's monoid, denoted by $\mathbb{G}$, which is modeled on Thompson's group $\mathbb{F}$ except that the elements of $\mathbb{G}$ are non-invertible. Moreover, it is required that the elements of $\mathbb{G}$ preserve Lebesgue measure. Monoid $\mathbb{G}$ exhibits very different properties from Thompson's group $\mathbb{F}$. The paper studies a number of algebraic (group-theoretic) and dynamical properties of $\mathbb{G}$ including approximation, mixing, periodicity, entropy, decomposition, generators, and topological conjugacy.

## 231) Srinath Mahankali, Velocity Inversion Using the Quadratic Wasserstein Metric (24 Aug 2020; arXiv.org 26 Aug 2020)

Full-waveform inversion (FWI) is a method used to determine properties of the Earth from information on the surface. We use the squared Wasserstein distance (squared $W_2$ distance) as an objective function to invert for the velocity as a function of position in the Earth, and we discuss its convexity with respect to the velocity parameter. In one dimension, we consider constant, piecewise increasing, and linearly increasing velocity models as a function of position, and we show the convexity of the squared $W_2$ distance with respect to the velocity parameter on the interval from zero to the true value of the velocity parameter when the source function is a probability measure. Furthermore, we consider a two-dimensional model where velocity is linearly increasing as a function of depth and prove the convexity of the squared $W_2$ distance in the velocity parameter on large regions containing the true value. We discuss the convexity of the squared $W_2$ distance compared with the convexity of the squared $L^2$ norm, and we discuss the relationship between frequency and convexity of these respective distances. We also discuss multiple approaches to optimal transport for non-probability measures by first converting the wave data into probability measures.

## 230) Michael Gerovitch, Environment-aware Pedestrian Trajectory Prediction for Autonomous Driving (21 Aug 2020)

People's safety is a primary concern in autonomous driving. There exist efficient methods for identifying static obstacles. However, the prediction of future trajectories of moving elements, such as pedestrians crossing a street, is a much more challenging problem. A promising direction of research is the use of machine learning algorithms with location bias maps. Our goal was to further explore this idea by training an interchangeable location bias map, a location-specific feature that is added into the middle of a convolutional neural network. For different locations, we used different location bias maps to allow the network to learn from different setting contexts without overfitting to a specific setting. Using pre-annotated video footage of pedestrians moving around in crowded areas, we implemented a pedestrian behavior encoding scheme to generate input and output volumes for the neural network. Using this encoding scheme, we trained our neural network and interchangeable location bias map. Our research demonstrates that the network with an interchangeable location bias map can predict realistic pedestrian trajectories even when trained simultaneously in multiple settings.

## 229) Andrew Shen, Towards Proving Application Isolation for Cryptocurrency Hardware Wallets (22 Jul 2020)

We often perform security-sensitive operations in our day-to-day lives such as performing monetary transactions. To perform these operations securely, we can isolate the confirmation of such operations to separate hardware devices. However, proving that these devices operate securely is still difficult given the complexity of their kernels, yet important given the rise in popularity of cryptocurrency transaction devices. To support multiple cryptocurrencies and other functionality, these devices must be able to run multiple applications that are isolated from one another as they could be potentially maliciously acting applications. We can simplify our device by modeling it as running applications sequentially in user mode. We seek to prove that these applications cannot tamper with the kernel memory and show that the kernel protection is set up correctly. To do this, we developed a RISC-V machine emulator in Rosette, which enables us to reason about the behaviour of symbolic machine states and symbolic applications. We make progress towards verifying application isolation for launching and running applications on a simple kernel.

## 228) Andrey Boris Khesin (MIT) and Alexander Lu Zhang (PRIMES), On Quasisymmetric Functions with Two Bordering Variables (arXiv.org, 23 Jul 2020)

We extend past results on a family of formal power series $K_{n, \Lambda}$, parameterized by $n$ and $\Lambda \subseteq [n]$, that largely resemble quasisymmetric functions. This family of functions was conjectured to have the property that the product $K_{n, \Lambda}K_{m, \Omega}$ of any two functions $K_{n, \Lambda}$ and $K_{m, \Omega}$ from the family can be expressed as a linear combination of other functions from the family. In this paper, we show that this is indeed the case and that the span of the $K_{n, \Lambda}$'s forms an algebra. We also provide techniques for examining similar families of functions and a formula for the product $K_{n, \Lambda}K_{m, \Omega}$ when $n=1$.

## 227) Neel Bhalla, Constructing Workflow-centric Traces in Close to Real Time for the Hadoop File System (22 Jul 2020)

Diagnosing problems in large-scale systems using cloud-based distributed services is a challenging problem. Workflow-centric tracing captures the workflow (work done to process requests) and dependency graph of causally-related events among the components of a distributed system. However, constructing traces has historically been performed offline in batch fashion, so trace data is not immediately available to engineers for their diagnosis efforts. In this work, we present an approach based on graph abstraction and a streaming framework to construct workflow-centric traces in near real time for the Hadoop file system. This approach will provide the network operators with a real-time understanding of the distributed system behavior.

## 226) Yunseo Choi (PRIMES) and James Unwin (University of Illinois at Chicago), Racial Impact on Infections and Deaths due to COVID-19 in New York City (11 Jul 2020; arXiv.org, 9 Jul 2020), forthcoming in Harvard Technology Review

Redlining is the discriminatory practice whereby institutions avoided investment in certain neighborhoods due to their demographics. Here we explore the lasting impacts of redlining on the spread of COVID-19 in New York City (NYC). Using data available through the Home Mortgage Disclosure Act, we construct a redlining index for each NYC census tract via a multi-level logistical model. We compare this redlining index with the COVID-19 statistics for each NYC Zip Code Tabulation Area. Accurate mappings of the pandemic would aid the identification of the most vulnerable areas and permit the most effective allocation of medical resources, while reducing ethnic health disparities.

## 225) Sanath Govindarajan (PRIMES) and William S. Moses (MIT), SyFER-MLIR: Integrating Fully Homomorphic Encryption Into the MLIR Compiler Framework (3 Jul 2020)

Fully homomorphic encryption opens up the possibility of secure computation on private data. However, fully homomorphic encryption is limited by its speed and the fact that arbitrary computations must be represented by combinations of primitive operations, such as addition, multiplication, and binary gates. Integrating FHE into the MLIR compiler infrastructure allows it to be automatically optimized at many different levels and will allow any program which compiles into MLIR to be modified to be encrypted by simply passing another flag into the compiler. The process of compiling into an intermediate representation and dynamically generating the encrypted program, rather than calling functions from a library, also allows for optimizations across multiple operations, such as rewriting a DAG of operations to run faster and removing unnecessary operations.

## 224) Ethan Mendes (PRIMES) and Kyle Hogan (MIT), Defending Against Imperceptible Audio Adversarial Examples Using Proportional Additive Gaussian Noise (30 Jun 2020)

Neural networks are susceptible to adversarial examples, which are specific inputs to a network that result in a misclassification or an incorrect output. While most past work has focused on methods to generate adversarial examples to fool image classification networks, recently, similar attacks on automatic speech recognition systems have been explored. Due to the relative novelty of these audio adversarial examples, there exist few robust defenses for these attacks. We present a robust defense for inaudible or imperceptible audio adversarial examples. This approach mimics the adversarial strategy to add targeted proportional additive Gaussian noise in order to revert an adversarial example back to its original transcription. Our defense performs similarly to other defenses yet is the first randomized or probabilistic strategy. Additionally, we demonstrate the challenges that arise when applying defenses against adversarial examples for images to audio adversarial examples.

## 223) Walden Yan (PRIMES) and William S. Moses (MIT), Token pairing to improve neural program synthesis models (30 Jun 2020)

In neural program synthesis (NPS), a network is trained to output or aid in the output of code that satisfies a given program specification. In our work, we make modifications upon the simple sequence-to-sequence (Seq2Seq) LSTM model. Extending the most successful techniques from previous works, we guide a beam search with an encoder-decoder scheme augmented with attention mechanisms and a specialized syntax layer. But one of the outstanding difficulties of NPS is the implicit tree structure of programs, which makes it inherently more difficult for linearly-structured models. To address this, we experiment with a novel technique we call token pairing. Our model is trained and evaluated on AlgoLisp, a dataset of English description-to-code programming problems paired with example solutions and test cases on which to evaluate programs. We also create a new interpreter for AlgoLisp that fixes the bugs present in the builtin executor. In the end, our model achieves 99.24% accuracy at evaluation, which greatly improves on the previous state-of-the-art of 95.80% while using fewer parameters.

## 222) Zhenkun Li (MIT) and Jessica Zhang (PRIMES), Classification of tight contact structures on a solid torus (arXiv.org, 30 Jun 2020)

It is a basic question in contact geometry to classify all non-isotopic tight contact structures on a given 3-manifold. If the manifold has a boundary, we also need to specify the dividing set on the boundary. In this paper, we answer the classification question completely for the case of a solid torus by writing down a closed formula for the number of non-isotopic tight contact structures with any given dividing set on the boundary of the solid torus. Previously, only a few special cases were known due to work by Honda.

## 221) Christian Gaetz (MIT) and Katherine Tung (PRIMES), The Sperner property for $132$-avoiding intervals in the weak order (arXiv.org, 29 Jun 2020), published in Bulletin of the London Mathematical Society 53:2 (April 2021): 442-457.

A well-known result of Stanley implies that the weak order on a maximal parabolic quotient of the symmetric group $S_n$ has the Sperner property; this same property was recently established for the weak order on all of $S_n$ by Gaetz and Gao, resolving a long-open problem. In this paper we interpolate between these results by showing that the weak order on any parabolic quotient of $S_n$ (and more generally on any $132$-avoiding interval) has the Sperner property. This result is proven by exhibiting an action of $\mathfrak{sl}_2$ respecting the weak order on these intervals. As a corollary we obtain a new formula for principal specializations of Schubert polynomials. Our formula can be seen as a strong Bruhat order analogue of Macdonald's reduced word formula. This proof technique and formula generalize work of Hamaker, Pechenik, Speyer, and Weigandt and Gaetz and Gao.

## 220) Yuxuan (Jason) Chen, Real World Application of Event-based End to End Autonomous Driving (29 Jun 2020)

End-to-end autonomous driving has recently been a popular area of study for deep learning. This work studies the use of event cameras for real-world deep-learned driving tasks in comparison to traditional RGB cameras. In this work, we evaluate existing state-of-the-art event-based models on offline datasets, design a novel model that fuses the benefits from both event-based and traditional frame-based cameras, and integrate the trained models on board a full-scale vehicle. We conduct tests in a challenging track with features unseen to the model. Through our experiments and saliency visualization, we show that event-based models actually predict the existing motion of the car rather than the active control the car should take. Therefore, while event-based models excel at offline tasks such as motion estimation, our experiments reveal a fundamental challenge in applying event-based end-to-end learning to active control tasks: the models need to learn to reason about future actions with a feedback loop that impacts their future state.

## 219) Arun S. Kannan (MIT) and Honglin Zhu (PRIMES), Characters for Projective Modules in the BGG Category $\mathcal{O}$ for the Orthosymplectic Lie Superalgebra $\mathfrak{osp}(3|4)$ (arXiv.org, 11 Jun 2020), published in Journal of Algebra 569 (1 March 2021): 723-757

We determine the Verma multiplicities of standard filtrations of projective modules for integral atypical blocks in the BGG category $\mathcal{O}$ for the orthosymplectic Lie superalgebra $\mathfrak{osp}(3|4)$ by way of translation functors. We then explicitly determine the composition factor multiplicities of Verma modules using BGG reciprocity.

## 2019 Research Papers

## 218) Espen Slettnes, Minimal Embedding Dimensions of Rectangle k-Visibility Graphs, published in Journal of Graph Algorithms and Applications 25:1 (January 2021): 59-96

Bar visibility graphs were adopted in the 1980s as a model to represent traces, e.g., on circuit boards and in VLSI chip designs. Two generalizations of bar visibility graphs, rectangle visibility graphs and bar $k$-visibility graphs, were subsequently introduced. Here, we combine bar $k$- and rectangle visibility graphs to form rectangle $k$-visibility graphs (R$k$VGs), and further generalize these to higher dimensions. A graph is a $d$-dimensional R$k$VG if and only if it can be represented with vertices as disjoint axis-aligned hyperrectangles in $d$-space, such that there is an axis-parallel line of sight between two hyperrectangles that intersects at most $k$ other hyperrectangles if and only if there is an edge between the two corresponding vertices. For any graph $G$ and a fixed $k$, we prove that given enough spatial dimensions, $G$ has a rectangle $k$-visibility representation, and thus we define the minimal embedding dimension (MED) with $k$-visibility of $G$ to be the smallest $d$ such that $G$ is a $d$-dimensional R$k$VG. We study the properties of MEDs and find upper bounds on the MEDs of various types of graphs. In particular, we find that the $k$-visibility MED of the complete graph on $m$ vertices $K_m$ is at most $\lceil{m/(2(k+1))}\rceil,$ of complete $r$-partite graphs is at most $r+1,$ and of the $m^{\rm th}$ hypercube graph $Q_m$ is at most $\lceil{2m/3}\rceil$ in general, and at most $\lfloor{\sqrt{m}\,}\rceil$ for $k=0,~ m \ne 2.$

## 217) Zhengyang (Leo) Dong (PRIMES) and Gil Alterovitz (MIT), netAE: Semi-supervised dimensionality reduction of single-cell RNA sequencing to facilitate cell labeling, published in Bioinformatics (29 Jul 2020)

Single-cell RNA sequencing allows us to study cell heterogeneity at an unprecedented cell-level resolution and identify known and new cell populations. The current cell labeling pipeline uses unsupervised clustering and assigns labels to clusters by manual inspection. However, this pipeline does not utilize available gold-standard labels because there are usually too few of them to be useful to most computational methods. This paper aims to facilitate cell labeling with a semi-supervised method in an alternative pipeline, in which a few gold-standard labels are first identified and then extended to the rest of the cells computationally. We built a semi-supervised dimensionality reduction method, a network-enhanced autoencoder (netAE). Tested on three public datasets, netAE outperforms various dimensionality reduction baselines and achieves satisfactory classification accuracy even when the labeled set is very small, without disrupting the similarity structure of the original space.

## 216) Tanya Khovanova (MIT) and Kevin Wu (PRIMES), Base 3/2 and Greedily Partitioned Sequences (arXiv.org, 19 Jul 2020)

We delve into the connection between base $\frac{3}{2}$ and the greedy partition of non-negative integers into 3-free sequences. Specifically, we find a fractal structure on strings written with digits 0, 1, and 2. We use this structure to prove that the even non-negative integers written in base $\frac{3}{2}$ and then interpreted in base 3 form the Stanley cross-sequence, where the Stanley cross-sequence comprises the first terms of the infinitely many sequences that are formed by the greedy partition of non-negative integers into 3-free sequences.
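The statement about even integers can be checked numerically for small cases. The sketch below converts an integer to base $\frac{3}{2}$ with digits 0, 1, 2 (using the standard rule that the last digit is $n \bmod 3$ and the remaining digits represent $2\lfloor n/3 \rfloor$), and separately builds the greedy partition into 3-free sequences by placing each integer into the first sequence where it creates no 3-term arithmetic progression. Both function names are hypothetical, and the greedy rule is written under the assumption that this is the intended meaning of "greedy partition".

```python
def to_base_three_halves(n: int) -> str:
    """Digits (most significant first) of n in base 3/2 with digits 0, 1, 2."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 3))
        n = 2 * (n // 3)  # n = d + (3/2) * rest, so rest = 2 * (n - d) / 3
    return "".join(reversed(digits))


def greedy_first_terms(count: int) -> list:
    """First terms of the first `count` sequences in the greedy partition of
    the non-negative integers into sequences with no 3-term arithmetic progression."""
    sequences, firsts, n = [], [], 0
    while len(firsts) < count:
        for seq in sequences:
            # n would be the largest term; a progression (a, b, n) needs a = 2b - n.
            if not any((2 * b - n) in seq for b in seq):
                seq.add(n)
                break
        else:
            sequences.append({n})  # n fits nowhere, so it starts a new sequence
            firsts.append(n)
        n += 1
    return firsts


if __name__ == "__main__":
    evens_reinterpreted = [int(to_base_three_halves(2 * i), 3) for i in range(4)]
    print(evens_reinterpreted)    # even integers 0, 2, 4, 6 read in base 3
    print(greedy_first_terms(4))  # first terms of the first four greedy sequences
```

For the first four terms both computations give 0, 2, 7, 21, consistent with the result stated in the abstract.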

## 215) Dmitry Kleinbock (Brandeis University), Anurag Rao (Brandeis University), and Srinivasan Sathiamurthy (PRIMES), Critical loci of convex domains in the plane (26 Mar 2020; arXiv.org, 30 Mar 2020), published in Indagationes Mathematicae 32:3 (May 2021): 719-728.

Let $K$ be a bounded convex domain in $\mathbb{R}^2$ symmetric about the origin. The critical locus of $K$ is defined to be the (non-empty compact) set of lattices $\Lambda$ in $\mathbb{R}^2$ of smallest possible covolume such that $\Lambda \cap K= \lbrace 0\rbrace$. These are classical objects in geometry of numbers; yet all previously known examples of critical loci were either finite sets or finite unions of closed curves. In this paper we give a new construction which, in particular, furnishes examples of domains having critical locus of arbitrary Hausdorff dimension between $0$ and $1$.

## 214) P. A. Crowdmath, Propagation time for weighted zero forcing (arXiv.org, 15 May 2020)

Zero forcing is a graph coloring process that was defined as a tool for bounding the minimum rank and maximum nullity of a graph. It has also been used for studying control of quantum systems and monitoring electrical power networks. One of the problems from the 2017 AIM workshop "Zero forcing and its applications" was to explore edge-weighted probabilistic zero forcing, where edges have weights that determine the probability of a successful force if forcing is possible under the standard zero forcing coloring rule. In this paper, we investigate the expected time to complete the weighted zero forcing coloring process, known as the expected propagation time, as well as the time for the process to be completed with probability at least $\alpha$, known as the $\alpha$-confidence propagation time. We demonstrate how to find the expected and confidence propagation times of any edge-weighted graph using Markov matrices. We also determine the expected and confidence propagation times for various families of edge-weighted graphs including complete graphs, stars, paths, and cycles.

## 213) P. A. Crowdmath, Applications of the abc conjecture to powerful numbers (arXiv.org, 15 May 2020)

The abc conjecture is one of the most famous unsolved problems in number theory. The conjecture claims for each real $\epsilon > 0$ that there are only a finite number of coprime positive integer solutions to the equation $a+b = c$ with $c > \text{rad}(abc)^{1+\epsilon}$, where $\text{rad}(n)$ denotes the product of the distinct primes dividing $n$. If true, the abc conjecture would imply many other famous theorems and conjectures as corollaries. In this paper, we discuss the abc conjecture and find new applications to powerful numbers, which are integers $n$ for which $p^2 | n$ for every prime $p$ such that $p | n$. We answer several questions from an earlier paper on this topic, assuming the truth of the abc conjecture.
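The two definitions in this abstract (the radical and powerful numbers) are easy to experiment with. The following is a minimal sketch, not code from the paper; the function names `rad`, `is_powerful`, and `abc_quality` are chosen here for illustration, and trial division is used only because it is simple.

```python
from math import gcd, log

def prime_factors(n):
    """Return the set of distinct prime factors of n (simple trial division)."""
    factors = set()
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors

def rad(n):
    """Radical of n: the product of the distinct primes dividing n."""
    result = 1
    for p in prime_factors(n):
        result *= p
    return result

def is_powerful(n):
    """True if p^2 divides n for every prime p dividing n."""
    return all(n % (p * p) == 0 for p in prime_factors(n))

def abc_quality(a, b):
    """Return log(c) / log(rad(abc)) for coprime a, b with c = a + b.

    The name "quality" is an illustrative label; a value above 1 means
    c > rad(abc) for this triple.
    """
    if gcd(a, b) != 1:
        raise ValueError("a and b must be coprime")
    c = a + b
    return log(c) / log(rad(a * b * c))
```

For instance, `is_powerful(72)` holds because $72 = 2^3 \cdot 3^2$, and the triple $1 + 8 = 9$ has $\text{rad}(72) = 6 < 9$, so `abc_quality(1, 8)` exceeds 1.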

## 212) Alin Tomescu (MIT CSAIL), Robert Chen (PRIMES), Yiming Zheng (PRIMES), Ittai Abraham (VMware Research), Benny Pinkas (VMware Research and Bar Ilan University), Guy Golan Gueta (VMware Research), and Srinivas Devadas (MIT CSAIL), Towards Scalable Threshold Cryptosystems (9 Mar 2020), published in Proceedings of the 2020 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, vol. 1, pp. 1242-1258.

The resurging interest in Byzantine fault tolerant systems will demand more scalable threshold cryptosystems. Unfortunately, current systems scale poorly, requiring time quadratic in the number of participants. In this paper, we present techniques that help scale threshold signature schemes (TSS), verifiable secret sharing (VSS) and distributed key generation (DKG) protocols to hundreds of thousands of participants and beyond. First, we use efficient algorithms for evaluating polynomials at multiple points to speed up computing Lagrange coefficients when aggregating threshold signatures. As a result, we can aggregate a 130,000 out of 260,000 BLS threshold signature in just 6 seconds (down from 30 minutes). Second, we show how "authenticating" such multipoint evaluations can speed up proving polynomial evaluations, a key step in communication-efficient VSS and DKG protocols. As a result, we reduce the asymptotic (and concrete) computational complexity of VSS and DKG protocols from quadratic time to quasilinear time, at a small increase in communication complexity. For example, using our DKG protocol, we can securely generate a key for the BLS scheme above in 2.3 hours (down from 8 days). Our techniques improve performance for thresholds as small as 255 and generalize to any Lagrange-based threshold scheme, not just threshold signatures. Our work has certain limitations: we require a trusted setup, we focus on synchronous VSS and DKG protocols and we do not address the worst-case complaint overhead in DKGs. Nonetheless, we hope it will spark new interest in designing large-scale distributed systems.
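To make the quadratic cost mentioned in this abstract concrete, here is a minimal sketch of the textbook way to compute Lagrange coefficients at zero over a prime field, which takes time quadratic in the number of shares. It is the naive baseline, not the paper's fast multipoint-evaluation technique; the modulus `P` and all function names are assumptions chosen for illustration.

```python
# Illustrative modulus: the Mersenne prime 2^61 - 1 (an assumption, not from the paper).
P = 2**61 - 1

def lagrange_coeffs_at_zero(xs, p=P):
    """Naive O(t^2) Lagrange coefficients L_i(0) = prod_{j != i} (-x_j) / (x_i - x_j) mod p."""
    coeffs = []
    for i, xi in enumerate(xs):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if i != j:
                num = num * (-xj) % p
                den = den * (xi - xj) % p
        # pow(den, -1, p) is the modular inverse (Python 3.8+).
        coeffs.append(num * pow(den, -1, p) % p)
    return coeffs

def eval_poly(coeffs, x, p=P):
    """Evaluate a polynomial (coefficients listed lowest degree first) via Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % p
    return result

def reconstruct(shares, p=P):
    """Recover f(0) from (x, f(x)) pairs by Lagrange interpolation at zero."""
    xs = [x for x, _ in shares]
    lam = lagrange_coeffs_at_zero(xs, p)
    return sum(l * y for l, (_, y) in zip(lam, shares)) % p
```

With a degree-2 polynomial whose constant term is 42, any three distinct evaluation points passed to `reconstruct` recover 42. The nested loop over all pairs of points is the quadratic step that becomes the bottleneck at very large thresholds.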

## 211) Daniil Kalinov (MIT) and Lev Kruglyak (PRIMES), The Rational Cherednik Algebra of Type $A_1$ with Divided Powers (5 Mar 2020), published in New York Journal of Mathematics 27 (2021): 1328-1346

Motivated by the recent developments of the theory of Cherednik algebras in positive characteristic, we study rational Cherednik algebras with divided powers. In our research we have started with the simplest case, the rational Cherednik algebra of type $A_1$. We investigate its maximal divided power extensions over $R[c]$ and $R$ for arbitrary principal ideal domains $R$ of characteristic zero. In these cases, we prove that the maximal divided power extensions are free modules over the base rings, and construct an explicit basis in the case of $R[c]$. In addition, we provide an abstract construction of the rational Cherednik algebra of type $A_1$ over an arbitrary ring, and prove that this generalization expands the rational Cherednik algebra to include all of the divided powers.

## 210) Sebastian Jeon (PRIMES) and Tanya Khovanova (MIT), 3-Symmetric Graphs (arXiv.org, 8 Mar 2020)

An intuitive property of a random graph is that its subgraphs should also appear randomly distributed. We consider graphs whose subgraph densities exactly match their expected values. We call a graph $k$-symmetric if it has this property for all subgraphs with $k$ vertices. We discuss some properties and examples of such graphs. We construct 3-symmetric graphs and provide some statistics.

## 209) Lucy Cai, Espen Slettnes, and Jeremy Zhou, A Combinatorial Approach to Extracting Rooted Tree Statistics from the Order Quasisymmetric Function (3 Mar 2020)

The chromatic symmetric function defined by Stanley is a power series that is symmetric in an infinite number of variables and generalizes the chromatic polynomial. Shareshian and Wachs defined the chromatic quasisymmetric function, and Awan and Bernardi defined an analog of it for digraphs. Three decades ago, Stanley posed a question equivalent to "Does the chromatic symmetric function distinguish between all trees?" A similar question can be raised for rooted trees: "Does the chromatic quasisymmetric function distinguish between all rooted trees?" Hasebe and Tsujie showed algebraically the stronger statement that the order quasisymmetric function distinguishes rooted trees. Here, we aim to directly extract useful statistics about a tree given only its order quasisymmetric function. This approach emphasizes the combinatorics of trees over the algebraic properties of quasisymmetric functions. We show that a rooted-tree-statistic we name the "co-height profile profile" is extractable, and that it distinguishes rooted 2-caterpillars.

## 208) Heidi Lei, On the Hausdorff Dimension of the Visible Koch Curve (28 Feb 2020)

In geometry, a point in a set is visible from another point if the line segment connecting the two points does not contain other points in the set. We show that the Hausdorff dimension is 1 for the portion of the Koch curve that is visible from points at infinity and points in certain defined regions of the plane.

## 207) Aditya Saligrama (PRIMES) and Guillaume Leclerc (MIT), Revisiting Ensembles in an Adversarial Context: Improving Natural Accuracy (arXiv.org, 26 Feb 2020), presented at the ICLR 2020 Workshop on Towards Trustworthy ML: Rethinking Security and Privacy for ML (26 April 2020) ( slides )

A necessary characteristic for the deployment of deep learning models in real world applications is resistance to small adversarial perturbations while maintaining accuracy on non-malicious inputs. While robust training provides models that exhibit better adversarial accuracy than standard models, there is still a significant gap in natural accuracy between robust and non-robust models which we aim to bridge. We consider a number of ensemble methods designed to mitigate this performance difference. Our key insight is that models trained to withstand small attacks, when ensembled, can often withstand significantly larger attacks, and this concept can in turn be leveraged to optimize natural accuracy. We consider two schemes, one that combines predictions from several randomly initialized robust models, and the other that fuses features from robust and standard models.

## 206) William Kuszmaul (MIT) and Alek Westover (PRIMES), In-Place Parallel-Partition Algorithms using Exclusive-Read-and-Write Memory (25 Feb 2020)

We present an in-place algorithm for the parallel partition problem that has linear work and polylogarithmic span. The algorithm uses only exclusive read/write shared variables, and can be implemented using parallel-for-loops without any additional concurrency considerations (i.e., the algorithm is EREW). A key feature of the algorithm is that it exhibits provably optimal cache behavior, up to small-order factors. We also present a second in-place EREW algorithm that has linear work and span $O(\log n \cdot \log\log n)$, which is within an $O(\log\log n)$ factor of the optimal span. By using this low-span algorithm as a subroutine within the cache-friendly algorithm, we are able to obtain a single EREW algorithm that combines their theoretical guarantees: the algorithm achieves span $O(\log n \cdot \log\log n)$ and optimal cache behavior. As an immediate consequence, we also get an in-place EREW quicksort algorithm with work $O(n \log n)$ and span $O(\log^2 n \cdot \log\log n)$.

## 205) Justin Yu, On a rank game (22 Feb 2020)

We introduce a new game played by two players that generates a $(0,1)$-matrix of size $n$. The first player aims to maximize its resulting rank, while the second player aims to minimize it. We show that the first player can force almost full rank given additional power in move possibilities.

## 204) Benjamin Kang (PRIMES) and James Unwin (University of Illinois at Chicago), All-Pay Auctions as Models for Trade Wars and Military Annexation (arXiv.org, 10 Feb 2020), published in Letters in Spatial and Resource Sciences (13 May 2022)

We explore an application of all-pay auctions to model trade wars and territorial annexation. Specifically, in the model we consider the expected resource, production, and aggressive (military/tariff) power are public information, but actual resource levels are private knowledge. We consider the resource transfer at the end of such a competition which deprives the weaker country of some fraction of its original resources. In particular, we derive the quasi-equilibria strategies for two-country conflicts under different scenarios. This work is relevant for the ongoing US-China trade war, and the recent Russian capture of Crimea, as well as historical and future conflicts.

## 203) Benjamin Kang (PRIMES) and James Unwin (University of Illinois at Chicago), All-Pay Auctions with Different Forfeits (arXiv.org, 7 Feb 2020), forthcoming in the Yau Competition finalists compendium

In an auction each party bids a certain amount and the one which bids the highest is the winner. Interestingly, auctions can also be used as models for other real-world systems. In an all-pay auction all parties must pay a forfeit for bidding. In the most commonly studied all-pay auction, parties forfeit their entire bid, and this has been considered as a model for expenditure on political campaigns. Here we consider a number of alternative forfeits which might be used as models for different real-world competitions, such as preparing bids for defense or infrastructure contracts.

## 202) Victoria Zhang, Patterns and Symmetries in Spiking Neural Networks (11 Jan 2020)

Inspired by recent progress in computational neuroscience and artificial intelligence, this paper explores rich temporal patterns in networks of neurons that communicate via electric pulses known as spikes. In particular, we describe the attractors in small circuits of spiking neurons with different symmetries and connectivities. Using methods developed in the theory of dynamical systems, we extend an analytical approach to capture the phase-locked states and their stability for a general $N$-cell system. We then systematically explore attractors in reduced state spaces via Poincaré maps for both all-to-all coupled and star-like coupled networks. We identify a sequence of bifurcations when the coupling strengths vary from inhibition to excitation. Moreover, using high-precision numerical simulations, we find two novel states in star-like networks that are unobserved in all-to-all networks: the death of oscillation for inhibitory coupling and quasi-periodic behaviors for excitatory coupling. Our results elucidate the interplay between dynamical patterns and symmetries in the building blocks of real networks. Furthermore, as self-sustained oscillations with pulsatile couplings are ubiquitous, our analysis may clarify understanding of not only neural dynamics but also other pulse-coupled oscillator systems such as non-linear electric circuits, wireless sensor networks, and self-organizing chemical reactions.

## 201) Zander Hill, Upper Bound on the Distortion of Cabled Knots (8 Jan 2020)

The torus knots are a class of knots generated by ordered pairs $(p,q)$ of relatively prime integers, where the $(p,q)$-torus knot is the curve defined by a ray of slope $\frac{p}{q}$ emanating from the origin in the representation of the torus as a square with opposing sides identified. Furthermore, given a curve $K$, we can define the $(p,q)$-cabling of $K$ to be the $(p,q)$-torus knot living on an embedding of the torus which follows $K$, as opposed to the standard embedding of the torus which follows $S^1$ in $\mathbb{R}^3$. We show that for all $p$ and $q \gg p$, there exists a curve in the isotopy class of the $(p,q)$-torus knot whose supremal ratio of arc length to Euclidean distance, called the distortion of the curve, is bounded above by $\frac{7q}{\log(q)}$, and additionally show that this bound holds for the $(p,q)$-cabling of any knot. This extends a result of Studer establishing sublinear upper bounds for the distortion of the $(2,q)$-torus knots.

## 200) Oliver Hayman (PRIMES) and Ashwin Narayan (MIT), Analyzing Visualization and Dimensionality-Reduction Algorithms (9 Jan 2020)

In order to find patterns among high-dimensional data sets in scientific studies, scientists use mapping algorithms to produce representative two-dimensional or three-dimensional data sets that are easier to visualize. The most prominent of these algorithms is the t-Distributed Stochastic Neighbor Embedding algorithm (t-SNE). In this project, we create a metric for evaluating how clustered a data set is, and use it to measure how the perplexity parameter of the t-SNE algorithm affects the clustering of outputted data sets. Additionally, we propose a modification that improves how well randomness is preserved in outputted data sets. Finally, we create a separate metric to test whether a group of points contains one or multiple clusters in a data set of centered clusters.

## 199) Frank Wang, The integral shuffle algebra and the $K$-theory of the Hilbert scheme of points in $\mathbb{A}^2$ (8 Jan 2020; arXiv.org, 12 Feb 2020)

We examine the shuffle algebra defined over the ring $\mathbf{R} = \mathbb{C}[q_1^{\pm 1}, q_2^{\pm 1}]$, also called the integral shuffle algebra, which was found by Schiffmann and Vasserot to act on the equivariant $K$-theory of the Hilbert Scheme of points in the plane. We find that the modules of 2 and 3 variable elements of the shuffle algebra are finitely generated, and prove a necessary condition for an element to be in the integral shuffle algebra for arbitrarily many variables.

## 198) Tejas Gopalakrishna (PRIMES) and Yichi Zhang (MIT), Analysis of the One Line Factoring Algorithm (6 Jan 2020)

For integers that fit within $42$ bits, a competitive factoring algorithm is the so-called One Line Factoring Algorithm proposed by William B. Hart. We analyze this algorithm in special cases, in particular, for semiprimes $N = pq$, and look for optimizations. We first observe the cases in which the larger or smaller prime is returned. We then show that when $p$ and $q$ are sufficiently close, we always finish on the first iteration. An upper bound can be found for the first iteration that successfully factors an odd semiprime. Using this upper bound, we demonstrate some simplifications to the algorithm for odd semiprimes in particular. One of our observations is that we only need to iterate numbers $\{ 0,1,3,5,7 \}$ modulo $8$, as the other iterators are very rarely the first that successfully factor the semiprime. Finally, we inspect the performance of the optimized algorithm.
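For readers unfamiliar with the algorithm being analyzed, the following is a minimal sketch of Hart's One Line Factoring Algorithm as it is commonly stated: for $i = 1, 2, \dots$, take $s = \lceil \sqrt{Ni} \rceil$ and check whether $s^2 \bmod N$ is a perfect square $t^2$; if so, $\gcd(N, s - t)$ is a candidate factor. This is not the optimized variant from the paper; the function name, the `max_iter` cap, and the guard against trivial factors are assumptions added for illustration.

```python
from math import gcd, isqrt

def one_line_factor(n, max_iter=100_000):
    """Return a nontrivial factor of n, or None if none is found within max_iter steps."""
    for i in range(1, max_iter + 1):
        ni = n * i
        s = isqrt(ni)
        if s * s < ni:
            s += 1                  # s = ceil(sqrt(n * i))
        m = (s * s) % n
        t = isqrt(m)
        if t * t == m:              # m is a perfect square, so s^2 = t^2 (mod n)
            g = gcd(n, s - t)
            if 1 < g < n:           # skip trivial factors (added guard, an assumption)
                return g
    return None                     # e.g. when n is prime
```

For $N = 10403 = 101 \cdot 103$ the first iteration already succeeds: $s = 102$, $s^2 \bmod N = 1$, and $\gcd(10403, 101) = 101$. This matches the abstract's observation that close primes $p$ and $q$ are found on the first iteration.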

## 197) Sunay Joshi, On the degenerate Turán problem and its variants (3 Jan 2020)

Given a family of graphs $\mathcal{F}$, a central problem in extremal graph theory is to determine the maximum number $\text{ex}(n,\mathcal{F})$ of edges in a graph on $n$ vertices that does not contain any member of $\mathcal{F}$ as a subgraph. The degenerate Turán problem regards the asymptotic behavior of $\text{ex}(n,\mathcal{F})$ for families $\mathcal{F}$ of bipartite graphs. In this paper, we prove four new theorems regarding the extremal number and its variants. We begin by investigating several notions central to providing lower bounds on extremal numbers, including balanced rooted graphs and the Erdős–Simonovits Reduction Theorem. In addition, we present new lower bounds on the asymmetric extremal number $\text{ex}(m,n,F)$ and the lopsided asymmetric extremal number $\text{ex}^*(m,n,F)$ when $F$ is a blowup of a bipartite graph or a theta graph.

## 196) Alexander J. Ding, An Evaluation of UPC++ by Porting Shared-Memory Parallel Graph Algorithms (1 Jan 2020)

Unified Parallel C++ (UPC++), a C++ library, attempts to address the programming difficulty introduced by distributed parallel systems and still take advantage of the model's high scalability by exposing an API that represents the distributed memory as a contiguous global address space, similar to that of a shared-memory parallel system. Though previous work, including the various benchmarks by UPC++ developers, has demonstrated the library's effectiveness in simple tasks and in porting distributed-memory parallel algorithms that are often implemented in OpenMPI, an assessment of the ease and effectiveness of porting shared-memory parallel algorithms into UPC++ is still lacking. We implement a number of graph algorithms in OpenMP, a common shared-memory parallel library, and port them into UPC++ in a locality-aware, communication-averse manner to evaluate the convenience, scalability, and robustness of UPC++. Tests on both a single-node, multicore system and the NERSC supercomputer (a multi-node system), with a plethora of real and random input graphs, demonstrate a number of prerequisites for high scalability in our UPC++ implementation: large input graphs, dense input graphs, and dense operations. Similar tests on our OpenMP implementation function as control, proving the algorithms' performance in shared-memory systems. Despite the relatively straightforward and naive porting from OpenMP, we still achieve competitive performance and scalability in dense algorithms on large inputs. The porting demonstrates UPC++'s ease of use and good porting potential, especially when compared with other distributed libraries like OpenMPI. Finally, we extrapolate a distributed graph processing system on UPC++, optimized with a hybrid top-down/bottom-up approach, to simplify future distributed graph algorithm implementations.

## 195) Jason Yang (PRIMES), Martin Falk (MIT), and Sameer Abraham (MIT), The relationship between gene expression correlation and 3D genome organization (31 Dec 2019)

In some organisms such as E. coli and S. cerevisiae yeast, it is known that there is a relationship between the distance among genes and their coexpression (Pannier et al., Kruglyak and Tang). It is also known that in general there is a relationship between gene function and genome structure (Szabo et al.). One might also expect to find a relationship between gene expression and TADs, which are domains within the genome where loci inside contact each other more frequently than loci outside. However, by analyzing data from Mus musculus brain cells, we do not find a relationship between gene pair correlation of single-cell RNA-seq gene expression and gene pair distance. Furthermore, despite the body of work linking gene expression and TAD structure, we also find no difference between gene pairs within a single TAD and between two TADs in terms of the relationship between gene pair distance and correlation. Additionally, we find that gene pair correlation is not related to the biological functions of the genes. However, there is a relationship between highly negative gene pair correlation and the number of times both genes are expressed 0 times across different cells.

## 194) Sarah Chen (PRIMES), Karl Clauser, Travis Law, and Tamara Ouspenskaia (Broad Institute), Seeking Neoantigen Candidates within Retained Introns (28 Dec 2019)

Major histocompatibility complex class I (MHC I) molecules present peptides from cytosolic proteins on the surface of cells. Cytotoxic T cells can recognize the presented antigens, and infected or cancerous cells that present non-self antigens can elicit an immune response. The identification of cancer-specific peptides (neoantigens) produced by somatic mutations in tumor cells and presented by MHC I molecules enables immunotherapies such as personalized cancer vaccines and adoptive T cell transfer. The state-of-the-art approach searches for neoantigens derived from cancer-specific somatic variants and often falls short for cancers with few somatic mutations. Retained introns (RIs) resulting from splicing errors in cancer are an additional source of neoantigens. In this study, we identify RIs which are transcribed, translated, and contribute peptides to MHC I presentation. Using de novo transcriptome assembly of RNA-seq data, we identified 1799 RIs in B721.221 cells. Additionally, we detected 87 peptides from 83 RIs by liquid chromatography-tandem mass spectrometry of the MHC I immunopeptidome (LC-MS/MS). Finally, we use ribosome profiling (Ribo-seq), which provides a readout of mRNA translation, to identify RIs that are translated, a prerequisite for MHC I presentation. Previous studies have predicted thousands of RIs but have been able to validate only a handful through mass spectrometry. By distinguishing transcribed but untranslated versus translated candidates, Ribo-seq has the potential to improve RI predictions. We propose the use of a combination of RNA-seq and Ribo-seq, paired with mass spectrometry validation, to more accurately predict the contribution of RIs to the MHC I immunopeptidome, enabling the use of RI-derived neoantigens in future immunotherapies.

## 193) Kevin Edward Zhao and Vishnu Emani, The Role of Protein Occupancy in DNA Compartmentalization (23 Dec 2019)

The organization of DNA throughout the genome is a complex process to study. Analysis reveals a checker-board pattern of separation at a megabase-pair scale, called compartments, which are captured well by the largest eigenvector of the Hi-C contact matrix. The sign of the eigenvector correlates with active and repressed areas of the genome. These compartments have been characterized into two categories, called A and B compartments, which are hypothesized to be spatially separated based upon the protein occupancy in the region. This project explores the factors that govern DNA compartmentalization, including the relationship between compartments and protein occupancy. In order to analyze contacts within the genome, Hi-C data was loaded and the eigenvectors of the contact matrix were computed. Protein occupancy in murine cortical neurons and neural progenitor cells was measured via ChIP-Seq. Using this data, we calculated the influence of several proteins on the sign of the Hi-C eigenvector via regression and Support Vector Machines (SVMs). Based on our findings, we tried to develop a simple model for compartments and explored this via simulations. We developed simple simulations of compartments based on ChIP-Seq data, and compared the results to compartments identified in experimental Hi-C maps. The results demonstrate a high correlation between the eigenvectors of the simulated and experimental Hi-C maps. In conclusion, the computational methods are effective at determining the proteins which most significantly contribute to compartmentalization.

## 192) Neil Chowdhury, A method to recognize universal patterns in genome structure using Hi-C (22 Dec 2019)

The expression of genes in cells is a complicated process. Expression levels of a gene are determined not only by its local neighborhood but also by more distal regions, as is the case with enhancer-promoter interactions, which can connect regions millions of bases away. The large-scale organization of DNA within the cell nucleus plays a substantial role in gene expression and cell fate, with recent developments in biochemical assays (such as Hi-C) generating quantitative maps of the higher-order structure of DNA. The interactions captured by Hi-C have been attributed to several distinct physical processes. One of the processes is that of segregation of DNA into compartmental domains by phase separation. While the current consensus is that there are broadly two types of compartmental domains (A and B), there is some evidence for a larger number of compartmental domains. Here a methodology to determine the identity and number of such compartments is presented, and it is observed that there are four distinct compartments within the genome.

## 191) Yizhen Chen, Mobile Sensor Networks: Bounds on Capacity and Complexity of Realizability (22 Dec 2019; arXiv.org, 21 Jan 2020), submitted to Electronic Journal of Combinatorics

In a restricted combinatorial mobile sensor network (RCMSN), there are $n$ sensors that continuously receive and store information from outside. Every two sensors communicate exactly once, and at an event when two sensors communicate, they receive and store additionally all information the other has stored. C. Gu, I. Downes, O. Gnawali, and L. Guibas proposed a capacity of information diffusion in mobile sensor networks. They collected all information received by two sensors between a communication event and the previous communication events for each of them into one information packet, and considered the number of sensors a packet eventually reaches. Then they defined the capacity of an RCMSN to be the ratio of the average number of sensors the packets reach and the total number of sensors. While they have studied the expected capacity of an RCMSN (when the order of communications is random), we found the RCMSNs with maximum and minimum capacities. We also found the maximum, minimum, and expected capacities for several related mobile sensor network constructions, such as ones generated from intersections of lines, as well as complexity results concerning when a mobile sensor network can be generated in such geometric ways.

## 190) Andrew Zhang, Antimicrobial resistance prediction using deep convolutional neural networks on whole genome sequence data (19 Dec 2019)

We propose a method to determine whether a bacterial strain is resistant to an antibiotic based on its whole genome sequence data using deep machine learning – deep convolutional neural networks (DCNN). The DCNN model developed in this research is shown to achieve an average AMR prediction accuracy of 94.7%. Each prediction takes less than a second. The model is verified with Klebsiella pneumoniae resistance to tetracycline data and Acinetobacter baumannii resistance to carbapenem data from the public database PATRIC. The DCNN model is further tested with clinically collected genomic data of 149 strains of Mycobacterium tuberculosis, and achieves a prediction accuracy of 93.1% for resistance to pyrazinamide (PZA). To find genes that harbor mutations of PZA resistance, we build a Support Vector Machine (SVM) model tailored for VCF format genomic data, which has revealed two novel genes, embB and gyrA, that harbor mutations associated with PZA resistance besides the well-known pncA gene. Our DCNN and SVM Machine Learning framework, if used together with the real-time genome sequencing machines, which are now already available, could make rapid AMR predictions, allowing for critical time to ensure good patient outcomes and preventing outbreaks of deadly AMR infections. Furthermore, the developed framework identifies pertinent resistance genes, helping researchers understand the mechanisms behind resistance. Finally, this research demonstrates how deep machine learning techniques can produce high accuracy predictive models accelerating the diagnosis of AMR.

## 189) Rupert Li, Pulses of Flow-firing Processes (8 Dec 2019)

Flow-firing is a natural generalization of chip-firing, or the abelian sandpile model, to higher dimensions, operating on infinite planar graphs. The edges of the graph have flow, which is rerouted through the faces of the graph. We investigate initial flow configurations which display terminating behavior and global confluence, meaning the terminating configuration is unique. The pulse configuration over a hole, or a configuration of flow going around a face that cannot redirect flow, is known to display global confluence, and we expand this result to initial configurations that have multiple pulses, identifying which terminating configurations are possible. We also generalize the analysis of the global confluence of pulses to configurations with flow outside of the hole, especially to the configuration of a pulse with radius, and prove under what conditions this displays global confluence. We conclude with a conjecture on the global confluence of a generalization of a pulse with radius, a uniform conservative configuration, or contour.

## 188) Yibo Gao (MIT) and Rupert Li (PRIMES), Compatible Recurrent Identities of the Sandpile Group and Maximal Stable Configurations (18 Nov 2019; arXiv.org, 23 Aug 2020), published in Discrete Applied Mathematics 288 (15 Jan 2021): 123-137

In the abelian sandpile model, recurrent chip configurations are of interest as they are a natural choice of coset representatives under the quotient of the reduced Laplacian. We investigate graphs whose recurrent identities with respect to different sinks are compatible with each other. The maximal stable configuration is the simplest recurrent chip configuration, and graphs whose recurrent identities equal the maximal stable configuration are of particular interest, and are said to have the complete maximal identity property. We prove that given any graph $G$ one can attach trees to the vertices of $G$ to yield a graph with the complete maximal identity property. We conclude with several intriguing conjectures about the complete maximal identity property of various graph products.

## 187) Andrew Weinfeld, Bases for Quotients of Symmetric Polynomials (arXiv.org, 17 Nov 2019)

We create several families of bases for the symmetric polynomials. From these bases we prove that certain Schur symmetric polynomials form a basis for quotients of symmetric polynomials that generalize the cohomology and the quantum cohomology of the Grassmannian. Our work also provides an alternative proof of a result due to Grinberg.

## 186) Yuyuan Luo (PRIMES) and Laura P. Schaposnik (University of Illinois at Chicago), Minimal percolating sets for mutating infectious diseases (arXiv.org, 5 Nov 2019), published in Physical Review Research, vol. 2 (1 April 2020), featured in the Coronavirus (COVID-19) Collection from Physical Review journals by the American Physical Society

This paper is dedicated to the study of the interaction between dynamical systems and percolation models, with a view towards the study of viral infections whose viruses mutate with time. Recall that $r$-bootstrap percolation describes a deterministic process where vertices of a graph are infected once $r$ of their neighbors are infected. We generalize this by introducing $F(t)$-bootstrap percolation, a time-dependent process where the number of neighbouring vertices which need to be infected for a disease to be transmitted is determined by a percolation function $F(t)$ at each time $t$. After studying some of the basic properties of the model, we consider smallest percolating sets and construct a polynomial-time algorithm to find one smallest minimal percolating set on finite trees for certain $F(t)$-bootstrap percolation models.
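The time-dependent process described in this abstract can be simulated directly. The sketch below is an illustration only, not the paper's algorithm for finding minimal percolating sets. It assumes synchronous rounds numbered from $t = 1$, a graph given as an adjacency dictionary, and a hypothetical function name `f_bootstrap`; classical $r$-bootstrap percolation corresponds to a constant function $F(t) = r$.

```python
def f_bootstrap(adj, seeds, F, steps):
    """Simulate F(t)-bootstrap percolation for a fixed number of synchronous rounds.

    adj   -- dict mapping each vertex to an iterable of its neighbours
    seeds -- initially infected vertices
    F     -- function t -> number of infected neighbours needed at round t
    steps -- number of rounds to run (rounds are indexed 1..steps, an assumption)
    """
    infected = set(seeds)
    for t in range(1, steps + 1):
        threshold = F(t)
        # Decide all new infections from the state at the start of the round.
        newly = {
            v for v in adj
            if v not in infected
            and sum(1 for u in adj[v] if u in infected) >= threshold
        }
        infected |= newly
    return infected
```

On the path $0 - 1 - 2 - 3 - 4$ with the single seed $\{0\}$, the constant threshold $F(t) = 1$ infects every vertex within four rounds, while $F(t) = 2$ never spreads beyond the seed. A fixed round count is used instead of stopping at the first idle round because a time-dependent threshold may drop later and restart the spread.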

## 185) Christopher Zhu, Enumerating Permutations and Rim Hooks Characterized by Double Descent Sets (arXiv.org, 28 Oct 2019)

Let $dd(I;n)$ denote the number of permutations of $[n]$ with double descent set $I$. For singleton sets $I$, we present a recursive formula for $dd(I;n)$ and a method to estimate $dd(I;n)$. We also discuss the enumeration of certain classes of rim hooks. Let $\mathcal{R}_I(n)$ denote the set of all rim hooks of length $n$ with double descent set $I$, so that any tableau of one of these rim hooks corresponds to a permutation with double descent set $I$. We present a formula for the size of $\mathcal{R}_I(n)$ when $I$ is a singleton set, and we also present a formula for the size of $\mathcal{R}_I(n)$ when $I$ is the empty set. We additionally present several conjectures about the asymptotics of certain ratios of $dd(I;n)$.

## 184) Nithin Kavi, Cutting and Gluing Surfaces (arXiv.org, 25 Oct 2019)

We start with a disk with $2n$ vertices along its boundary where pairs of vertices are connected with $n$ strips with certain restrictions. This forms a *pairing*. To relate two pairings, we define an operator called a cut-and-glue operation. We show that this operation does not change an invariant of pairings known as the *signature*. Pairings with a signature of $0$ are special because they are closely related to a topological construction through cut and glue operations that have other applications in topology. We prove that all balanced pairings for a fixed $n$ are connected on a surface with any number of boundary components. As a topological application, combined with works of Li, this shows that a properly embedded surface induces a well-defined grading on the sutured monopole Floer homology defined by Kronheimer and Mrowka.

## 183) Alejandro H. Morales (UMass Amherst) and Daniel G. Zhu (PRIMES), On the Okounkov-Olshanski formula for standard tableaux of skew shapes (arXiv.org, 9 Jul 2020); published in FPSAC 2020 Proceedings of the 32nd Conference on Formal Power Series and Algebraic Combinatorics (Online) and forthcoming in Combinatorial Theory

The classical hook length formula counts the number of standard tableaux of straight shapes. In 1996, Okounkov and Olshanski found a positive formula for the number of standard Young tableaux of a skew shape. We prove various properties of this formula, including three determinantal formulas for the number of nonzero terms, an equivalence between the Okounkov-Olshanski formula and another skew tableaux formula involving Knutson-Tao puzzles, and two $q$-analogues for reverse plane partitions, which complements work by Stanley and Chen for semistandard tableaux. We also give several reformulations of the formula, including two in terms of the excited diagrams appearing in a more recent skew tableaux formula by Naruse. Lastly, for thick zigzag shapes we show that the number of nonzero terms is given by a determinant of the Genocchi numbers and improve on known upper bounds by Morales-Pak-Panova on the number of standard tableaux of these shapes.

## 182) Alin Tomescu (MIT), Vivek Bhupatiraju (PRIMES), Dimitrios Papadopoulos (Hong Kong University of Science and Technology), Charalampos Papamanthou (University of Maryland, College Park), Nikos Triandopoulos (Stevens Institute of Technology), and Srinivas Devadas (MIT), Transparency Logs via Append-Only Authenticated Dictionaries, published in CCS '19 Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, London, United Kingdom, November 11-15, 2019, pp. 1299-1316.

Transparency logs allow users to audit a potentially malicious service, paving the way towards a more accountable Internet. For example, Certificate Transparency (CT) enables domain owners to audit Certificate Authorities (CAs) and detect impersonation attacks. Yet, to achieve their full potential, transparency logs must be bandwidth-efficient when queried by users. Specifically, everyone should be able to efficiently look up log entries by their key and efficiently verify that the log remains append-only. Unfortunately, without additional trust assumptions, current transparency logs cannot provide both small-sized lookup proofs and small-sized append-only proofs. In fact, one of the proofs always requires bandwidth linear in the size of the log, making it expensive for everyone to query the log. In this paper, we address this gap with a new primitive called an append-only authenticated dictionary (AAD). Our construction is the first to achieve (poly)logarithmic size for both proof types and helps reduce bandwidth consumption in transparency logs. This comes at the cost of increased append times and high memory usage, both of which remain to be improved to make practical deployment possible.

## 181) Ezra Erives (PRIMES), Srinivasan Sathiamurthy (PRIMES), and Zarathustra Brady (MIT), Asymptotics of $d$-Dimensional Visibility (arXiv.org, 16 Sep 2019)

We consider the space $[0,n]^3$, imagined as a three dimensional, axis-aligned grid world partitioned into $n^3$ $1\times 1 \times 1$ unit cubes. Each cube is either considered to be empty, in which case a line of sight can pass through it, or obstructing, in which case no line of sight can pass through it. From a given position, some of these obstructing cubes block one's view of other obstructing cubes, leading to the following extremal problem: What is the largest number of obstructing cubes that can be simultaneously visible from the surface of an observer cube, over all possible choices of which cubes of $[0,n]^3$ are obstructing? We construct an example of a configuration in which $\Omega\big(n^\frac{8}{3}\big)$ obstructing cubes are visible, and generalize this to an example with $\Omega\big(n^{d-\frac{1}{d}}\big)$ visible obstructing hypercubes for dimension $d>3$. Using Fourier analytic techniques, we prove an $O\big(n^{d-\frac{1}{d}}\log n\big)$ upper bound in a reduced visibility setting.

## 180) Florian Naef (MIT) and Yuting Qin (PRIMES), The Elliptic Kashiwara-Vergne Lie algebra in low weights (arXiv.org, 7 Aug 2019)

In this paper, we study the elliptic Kashiwara-Vergne Lie Algebra $\mathfrak{krv}$, which is a certain Lie subalgebra of the Lie algebra of derivations of the free Lie algebra in two generators. It has a natural bigrading, such that the Lie bracket is of bidegree $(-1,-1)$. After recalling the graphical interpretation of this Lie algebra, we examine low degree elements of $\mathfrak{krv}$. More precisely, we find that $\mathfrak{krv}^{(2,j)}$ is one-dimensional for even $j$ and zero for odd $j$. We also compute $\operatorname{dim}(\mathfrak{krv})^{(3,m)} = \lfloor\frac{m-1}{2}\rfloor - \lfloor\frac{m-1}{3}\rfloor$. In particular, we show that in those degrees there are no odd elements and also confirm Enriquez' conjecture in those degrees.

## 179) Vincent Huang (PRIMES) and James Unwin (University of Illinois, Chicago), Markov Chain Models of Refugee Migration Data (arXiv.org, 19 March 2019), published in IMA Journal of Applied Mathematics (2020): 1-21

The application of Markov chains to modelling refugee crises is explored, focusing on local migration of individuals at the level of cities and days. As an explicit example we apply the Markov chain migration model developed here to UNHCR data on the Burundi refugee crisis. We compare our method to a state-of-the-art 'agent-based' model of Burundi refugee movements, and highlight that the Markov chain approaches presented here can improve the match to data while simultaneously being more algorithmically efficient.

## 178) Sean Elliott, Anti-Ramsey Type Problems (14 March 2019)

A classical theorem due to Ramsey says the following: Given a finite number of colors and a positive integer p, any edge-coloring of the complete graph $K_n$ will contain a monochromatic copy of $K_p$ as long as n is sufficiently large. A related problem is to consider colorings of $K_n$ for which every copy of $K_4$ uses at least $3$ distinct colors, and ask for the minimum number of colors that can be used to produce such a coloring. Here we present an alternate proof of the best known upper bound, which is $2^{O(\sqrt{\log n})}$. We also consider the problem of covering a regular graph with regular bipartite subgraphs. The motivation for this problem comes from the example of covering $K_n$ with complete bipartite subgraphs, which can be done with $\log_{2} (n)$ many subgraphs. Here we show that with high probability, a random $d$-regular graph with an even number of vertices can be covered with $c {\log d}$ many regular bipartite subgraphs for an absolute constant $c$.
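
The $\log_2(n)$ cover of $K_n$ mentioned above comes from a standard construction: label the vertices $0, \dots, n-1$ in binary and let the $i$-th bipartite graph separate vertices by their $i$-th bit. Two distinct labels differ in at least one bit, so every edge is covered. The sketch below illustrates that construction only (function names are hypothetical); it is not the paper's probabilistic argument for random regular graphs.

```python
from itertools import combinations
from typing import List, Set, Tuple


def bipartite_cover(n: int) -> List[Tuple[Set[int], Set[int]]]:
    """Split vertices 0..n-1 by each binary digit; one complete bipartite graph per digit."""
    bits = max(1, (n - 1).bit_length())  # equals ceil(log2(n)) for n >= 2
    return [
        (
            {v for v in range(n) if not (v >> i) & 1},
            {v for v in range(n) if (v >> i) & 1},
        )
        for i in range(bits)
    ]


def covers_all_edges(n: int, cover: List[Tuple[Set[int], Set[int]]]) -> bool:
    """Check that every edge of K_n crosses at least one of the bipartitions."""
    return all(
        any((u in left and v in right) or (u in right and v in left) for left, right in cover)
        for u, v in combinations(range(n), 2)
    )
```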

## 177) Alan Yan, Asymptotic Counting in Dynamical Systems (4 March 2019)

We consider several dynamically generated sets with certain measurable properties such as the diameters or angles. We define various counting functions on these geometric objects which quantify these properties and explore the asymptotics of these functions. We conjecture that these functions grow like power functions with exponent the dimension of the residual set. The main objects that we examine are Fatou components of the quadratic family and limit sets of Schottky groups. Finally, we provide heuristic algorithms to compute the counting functions in these examples in an attempt to confirm this conjecture.

## 176) Archer Wang, Hilbert Series of Quasiinvariant Polynomials (19 Feb 2019)

The space of quasiinvariant polynomials generalizes that of symmetric polynomials: under the action of the symmetric group, the polynomials remain invariant to a certain order. We discern the structure and symmetries of quasiinvariant polynomials by way of examining the invariance of relevant polynomial spaces under certain specific group actions. Both pure and computational methods are employed in this pursuit. Felder and Veselov, when studying quasiinvariant polynomials, made a breakthrough discovery in computing their Hilbert series in fields of characteristic 0, and since then, quasiinvariant polynomials have been extensively studied due to their applications in representation theory, algebraic geometry, and mathematical physics. We investigate the Hilbert series of quasiinvariant polynomials that are divisible by a generic homogeneous polynomial. We also continue the previous work regarding their Hilbert series in fields of prime characteristic.

## 175) Sanjit Bhat (PRIMES), Dimitris Tsipras (MIT), and Aleksander Madry (MIT), Towards Efficient Methods for Training Robust Deep Neural Networks (13 Feb 2019).

In recent years, it has been shown that neural networks are vulnerable to adversarial examples, i.e., specially crafted inputs that look visually similar to humans yet cause machine learning models to make incorrect predictions. A lot of research has been focused on training robust models--models immune to adversarial examples. One such method is Adversarial Training, in which the model continuously trains on adversarially perturbed inputs. However, since these inputs require significant computation time to create, Adversarial Training is often much slower than vanilla training. In this work, we explore two approaches to increase the efficiency of Adversarial Training. First, we study whether faster yet less accurate methods for generating adversarially perturbed inputs suffice to train a robust model. Second, we devise a method for asynchronous parallel Adversarial Training and analyze a phenomenon of independent interest that arises--staleness. Taken together, these two techniques enable comparable robustness on the MNIST dataset to prior art with a 26× reduction in training time from 4 hours to just 9 minutes.

## 174) Jesse Geneson (Iowa State University), Carl Joshua Quines (PRIMES), Espen Slettnes (PRIMES), Shen-Fu Tsai (Google), Expected capture time and throttling number for cop versus gambler (arXiv.org, 10 Feb 2019)

We bound expected capture time and throttling number for the cop versus gambler game on a connected graph with $n$ vertices, a variant of the cop versus robber game that is played in darkness, where the adversary hops between vertices using a fixed probability distribution. The paper that originally defined the cop versus gambler game focused on two versions, a known gambler whose distribution the cop knows, and an unknown gambler whose distribution is secret. We define a new version of the gambler where the cop makes a fixed number of observations before the lights go out and the game begins. We show that the strategy that gives the best possible expected capture time of $n$ for the known gambler can also be used to achieve nearly the same expected capture time against the observed gambler when the cop makes a sufficiently large number of observations. We also show that even with only a single observation, the cop is able to achieve an expected capture time of approximately $1.5n$, which is much lower than the expected capture time of the best known strategy against the unknown gambler (approximately $1.95n$).

## 173) John Kuszmaul, Verkle Trees (5 Feb 2019)

We present Verkle Trees, a bandwidth-efficient alternative to Merkle Trees. Merkle Trees are currently employed in a variety of applications in which membership proofs are sent across a network, including consensus protocols, public-key directories, cryptocurrencies such as Bitcoin, and Secure File Systems. A Merkle Tree with n leaves has $O({\log_2 n})$-sized proofs. In large trees, sending the proofs can dominate bandwidth consumption. Vector Commitments (VCs) pose a potential alternative to Merkle Trees, with constant-sized proofs. Unfortunately, VC construction time is $O(n^2)$, which is too large for many applications. We present Verkle Trees, which are constructed similarly to Merkle Trees, but using Vector Commitments rather than cryptographic hash functions. In a Merkle Tree, a parent node is the hash of its children. In a Verkle Tree, a parent node is the Vector Commitment of its children. A Verkle Tree with branching factor k achieves $O(kn)$ construction time and $O({\log_k n})$ membership proof-size. This means that the branching factor, k, offers a tradeoff between computational power and bandwidth. The bandwidth reduction is independent of the depth of the tree; it depends only on the branching factor. We find experimentally that with a branching factor of k = 1024, which provides a factor of 10 reduction in bandwidth, it takes 110.1 milliseconds on average per leaf to construct a Verkle Tree with $2^{14}$ leaves. A branching factor of k = 32, which provides a bandwidth reduction factor of 5, yields a construction time of 8.4 milliseconds on average per leaf for a tree with $2^{14}$ leaves. (The performance on a tree with $2^{14}$ leaves is representative of larger trees because the asymptotics already dominate the computation costs.) My role in this research project has been proving the time complexities of Verkle Trees, implementing Verkle Trees, and testing and benchmarking the implementation.
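
To make the $O(\log_2 n)$ proof size of the Merkle Tree baseline concrete, here is a minimal sketch of a binary Merkle Tree with membership proofs. It assumes a power-of-two number of leaves and SHA-256 as the hash; the function names are hypothetical. It does not implement Verkle Trees, which would replace the hash at each parent with a Vector Commitment.

```python
import hashlib
from typing import List


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves: List[bytes]) -> List[List[bytes]]:
    """Return all tree levels, from hashed leaves up to the root (power-of-two leaves assumed)."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        # Each parent is the hash of its two children.
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels: List[List[bytes]], index: int) -> List[bytes]:
    """Collect the sibling hash at every level below the root."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])  # sibling differs in the lowest bit
        index //= 2
    return proof


def verify(root: bytes, leaf: bytes, index: int, proof: List[bytes]) -> bool:
    """Recompute the path to the root from the leaf and the sibling hashes."""
    node = h(leaf)
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root


leaves = [bytes([i]) for i in range(16)]
levels = build_levels(leaves)
root = levels[-1][0]
proof = prove(levels, 5)
```

For 16 leaves the proof holds 4 sibling hashes, one per level, which is the $\log_2 n$ growth that a larger branching factor is meant to reduce.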

## 172) Andrew Ahn (MIT), Gopal Goel (PRIMES), and Andrew Yao (PRIMES), Derivative Asymptotics of Uniform Gelfand-Tsetlin Patterns (1 Feb 2019)

Bufetov and Gorin introduced the idea of applying differential operators which are diagonalized by the Schur functions to Schur generating functions, a generalization of probability generating functions to particle systems. This technology allowed the authors to access asymptotics of a variety of particle systems. We use this technique to analyze uniformly distributed Gelfand-Tsetlin patterns where the top row is fixed. In particular, we obtain limiting moments for the difference of empirical measure for two adjacent rows in uniformly random Gelfand-Tsetlin patterns.

## 171) William Fisher, Polynomial Wolff Axioms and Multilinear Kakeya-type Estimates for Bent Tubes in $R^n$ (31 Jan 2019)

In this paper we consider the applicability of Guth and Zahl's polynomial Wolff axioms to bent tubes. We demonstrate that Guth and Zahl's multilinear bounds hold for tubes defined by low degree algebraic curves with bounded $C^2$-norms. To show this we give an exposition of their proof in an $n$-dimensional, $k$-linear context. In considering the ability to obtain linear bounds using the multilinear bounds we utilize the strategy of Guth and Bourgain. We find that the multilinear bounds obtained from Guth and Zahl's technique break the inductive structure of this process and thus provide inferior bounds to the endpoint cases of Bennett, Carbery, and Tao's multilinear bounds. We discuss future research directions, which could eventually remedy this, that improve multilinear bounds by adding the assumption that the collection of tubes lie near a $k$-plane.

## 170) Rinni Bhansali (PRIMES) and Laura P. Schaposnik (University of Illinois at Chicago), A Trust Model in Bootstrap Percolation (21 Jan 2019; arXiv.org, 23 May 2019), published in the Proceedings of the Royal Society A, vol. 476, no. 2235 (1 March 2020)

Bootstrap percolation is a class of monotone cellular automata describing an activation process which follows certain activation rules. In particular, in the classical r-neighbor bootstrap process on a graph G, a set A of initially infected vertices spreads by infecting vertices with at least r already-infected neighbors. Motivated by the study of social networks and biological interactions through graphs, where vertices represent people and edges represent the relations amongst them, we introduce here a novel model which we name T-bootstrap percolation (T-BP). In this new model, vertices of the graph G are assigned random labels, and the set of initially infected vertices spreads by infecting (at each time step) vertices with at least a fixed number of already-infected neighbors of each label. The Trust Model for Bootstrap Percolation allows one to impose a preset level of skepticism towards a rumor, as it requires a rumor to be validated by numerous groups in order for it to spread, hence imposing a predetermined level of trust needed for the rumor to spread. By considering different random and non-random networks, we describe various properties of this new model (e.g., the critical probability of infection and the confidence threshold), and compare it to other types of bootstrap percolation from the literature, such as U-bootstrap percolation. Ultimately, we describe its implications when applied to rumor spread, fake news, and marketing strategies, along with potential future applications in modeling the spread of genetic diseases.

## 169) Stanley Wang, Connectedness of the Moduli Space of Genus 1 Planar Tropical Curves (arXiv.org, 12 Jan 2019)

Tropical geometry is a relatively recent field in mathematics created as a simplified model for certain problems in algebraic geometry. We introduce the definition of abstract and planar tropical curves as well as their properties, including combinatorial type and degree. We also talk about the moduli space, a geometric object that parameterizes all possible types of abstract or planar tropical curves subject to certain conditions. Our research focuses on the moduli spaces of planar tropical curves of genus one, arbitrary degree d and any number of marked, unbounded edges. We prove that these moduli spaces are connected.

## 168) Aayush Karan, Generating Set for Nonzero Determinant Links Under Skein Relation (arXiv.org, 6 Jan 2019), published in Topology and its Applications, vol. 265 (15 September 2019)

Traditionally introduced in terms of advanced topological constructions, many link invariants may also be defined in much simpler terms given their values on a few initial links and a recursive formula on a skein triangle. Then the crucial question to ask is how many initial values are necessary to completely determine such a link invariant. We focus on a specific class of invariants known as nonzero determinant link invariants, defined only for links which do not evaluate to zero on the link determinant. We restate our objective by considering a set $\mathcal{S}$ of links subject to the condition that if any three nonzero determinant links belong to a skein triangle, any two of these belonging to $\mathcal{S}$ implies that the third also belongs to $\mathcal{S}$. Then we aim to determine a minimal set of initial generators so that $\mathcal{S}$ is the set of all links with nonzero determinant. We show that only the unknot is required as a generator if the skein triangle is unoriented. For oriented skein triangles, we show that the unknot and Hopf link orientations form a set of generators.

## 167) Jiwon Choi, Gromov-Hausdorff Distance Between Metric Graphs (2 Jan 2019)

In this paper we study the Gromov-Hausdorff distance between two metric graphs. We compute the precise value of the Gromov-Hausdorff distance between two path graphs. Moreover, we compute the precise value of the Gromov-Hausdorff distance between a cycle graph and a tree. Given a graph X, we consider a graph Y that results from adding an edge to X without changing the number of vertices. We compute the precise value of the Gromov-Hausdorff distance between X and Y.

## 166) Kaiying Hou, Agent-based Models for Conservation Equations (31 Dec 2018)

In this research, we use agent-based models to solve conservation equations. A conservation equation is a partial differential equation that describes any conserved quantity by establishing a relationship between the density and the flux. It is used in areas such as traffic flow and fluid dynamics. Past research on numerically solving conservation equations mainly tackles the problem by establishing discrete cells in the space and approximating the densities in the cells. In this research, we use an agent-based model, in which we describe the solution through the movement of particles in the space. We propose an agent-based model for conservation equations in 1-D space. We found a change of variables that transforms the original conservation equation to the specific volume conservation equation. This transform allows us to apply results from the finite volume method to the agent-based model and find a condition for the agent-based solution to converge to the exact solution of scalar conservation equations.

## 165) Andy Xu, Approximating the Hurwitz Zeta Function (22 Dec 2018)

This project aims to implement a MATLAB function that approximates the Hurwitz zeta function $\zeta(s, a)$. This is necessary because the naive implementation fails for certain input near critical values for $s$ and for $a$. Other series representations of the Hurwitz zeta function converge rapidly but do not handle complex values of $s$ and/or $a$. We also consider existing forms for the Hurwitz zeta function, including one given by Bailey and Borwein, and evaluate their overall performance.
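
For reference, the defining series is $\zeta(s, a) = \sum_{n=0}^{\infty} (n+a)^{-s}$, valid for $\operatorname{Re}(s) > 1$ and $a$ not a non-positive integer. The sketch below truncates that series directly. It is written in Python rather than MATLAB, is not the project's implementation, and the function name and default term count are assumptions; it may or may not match what the abstract calls the naive implementation.

```python
import math


def hurwitz_zeta_naive(s: complex, a: float, terms: int = 100_000) -> complex:
    """Truncated defining series; only meaningful for Re(s) > 1 and a > 0."""
    return sum((n + a) ** (-s) for n in range(terms))


# zeta(2, 1) is the Riemann zeta value pi^2 / 6.
approx = hurwitz_zeta_naive(2, 1)
error = abs(approx - math.pi ** 2 / 6)
```

The truncation error here decays only like $1/N$ for $N$ terms at $s = 2$, and more slowly as $s$ approaches 1, which gives some intuition for why a direct summation is a poor general-purpose method.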

## 164) Allen Wang (PRIMES) and Guangyi Yue (MIT), Relationship Between Mullineux Involution and the Generalized Regularization (arXiv.org, 19 Dec 2018), published in European Journal of Combinatorics 85 (March 2020)

The Mullineux involution is an important map on $p$-regular partitions that originates from the modular representation theory of $\mathcal{S}_n$. In this paper we study the Mullineux transpose map and the generalized column regularization and prove a condition under which the two maps are exactly the same. Our results generalize the work of Bessenrodt, Olsson and Xu, and the combinatorial constructions are related to the Iwahori-Hecke algebra and the global crystal basis of the basic $U_q(\widehat{\mathfrak{sl}}_b)$-module. In the conclusion, we provide several conjectures regarding the $q$-decomposition numbers and generalizations of results due to Fayers.

## 163) Maximillian Guo, Behavior of Bar-Natan Homology under Conway Mutation (18 Dec 2018)

The Bar-Natan homology is a perturbation of the Khovanov homology of a knot. Previous work has shown that Khovanov homology remains unchanged under Conway mutation of the knot diagram. We give an exact triangle with three different resolutions of a link and prove several lemmas relating the dimensions of different Bar-Natan chain complexes and homologies. These allow us to prove that the dimension of the Bar-Natan homology $BN^k (L; \mathbb{Z}/2\mathbb{Z})$ is invariant under Conway mutation.

## 162) Nithin Kavi (PRIMES), Wendy Wu (PRIMES), and Zhenkun Li (MIT), Trunk of Satellite and Companion Knots (arXiv.org, 8 Dec 2018), published in Topology and its Applications, vol. 272 (1 March 2020)

We study the knot invariant called trunk, as defined by Ozawa, and the relation of the trunk of a satellite knot with the trunk of its companion knot. Our first result is ${\rm trunk}(K) \geq n \cdot {\rm trunk}(J)$ where ${\rm trunk}(\cdot)$ denotes the trunk of a knot, $K$ is a satellite knot with companion $J$, and $n$ is the winding number of $K$. To upgrade winding number to wrapping number, which we denote by $m$, we must include an extra factor of $\frac{1}{2}$ in our second result ${\rm trunk}(K) > \frac{1}{2} m \cdot {\rm trunk}(J)$ since $m \geq n$. We also discuss generalizations of the second result.

## 161) Merrick Cai (PRIMES) and Daniil Kalinov (MIT), The Hilbert Series of the Irreducible Quotient of the Polynomial Representation of the Rational Cherednik Algebra of Type $A_{n-1}$ in Characteristic $p$ for $p|n-1$ (arXiv.org, 12 Nov 2018)

We study the irreducible quotient $\mathcal{L}_{t,c}$ of the polynomial representation of the rational Cherednik algebra $\mathcal{H}_{t,c}(S_n,\mathfrak{h})$ of type $A_{n-1}$ over an algebraically closed field of positive characteristic $p$ where $p|n-1$. In the $t=0$ case, for all $c\ne 0$ we give a complete description of the polynomials in the maximal proper graded submodule $\ker \mathcal{B}$, the kernel of the contravariant form $\mathcal{B}$, and subsequently find the Hilbert series of the irreducible quotient $\mathcal{L}_{0,c}$. In the $t=1$ case, we give a complete description of the polynomials in $\ker \mathcal{B}$ when the characteristic $p=2$ and $c$ is transcendental over $\mathbb{F}_2$, and compute the Hilbert series of the irreducible quotient $\mathcal{L}_{1,c}$. In doing so, we prove a conjecture due to Etingof and Rains completely for $p=2$, and also for any $t=0$ and $n\equiv 1\pmod{p}$. Furthermore, for $t=1$, we prove a simple criterion to determine whether a given polynomial $f$ lies in $\ker \mathcal{B}$ for all $n=kp+r$ with $r$ and $p$ fixed.

## 160) Tanya Khovanova (MIT) and Eric Zhang (PRIMES), On 3-Inflatable Permutations (arXiv.org, 22 Sept 2018), published in The Electronic Journal of Combinatorics 28:1 (2021)

Call a permutation $k$-inflatable if it can be "blown up" into a convergent sequence of permutations by a uniform inflation construction, such that this sequence is symmetric with respect to densities of induced subpermutations of length $k$. We study properties of 3-inflatable permutations, finding a general formula for limit densities of pattern permutations in the uniform inflation of a given permutation. We also characterize and find examples of $3$-inflatable permutations of various lengths, including the shortest examples with length $17$.

## 159) Sathwik Karnik, Bounds on the Maximal Cardinality of an Acute Set in a Hypercube (7 Sept 2018)

The acute set problem asks the following question: what is the maximal cardinality of a $d$-dimensional set of points such that all angles formed between any three points are acute? In this paper, we consider an analogous problem with the condition that the acute set is a subset of a $d$-dimensional unit hypercube. We provide an explicit construction and proof to show that a lower bound for the maximum cardinality of an acute set in $\{0,1\}^d$ is $2^{2^{\lfloor \log_3 d \rfloor}}$. Using a similar construction, we improve this lower bound to $2^{d/3}$. Through a consideration of points diagonally opposite a particular point on 2-faces, we improve the upper bound to $\left(1 + \dfrac{2}{d}\right)\cdot 2^{d-2}$. We then seek to generalize these findings and a combinatorial interpretation of the problem in $\{0,1\}^d$.

## 158) Vincent Bian, Special Configurations in Anchored Rectangle Packings (arXiv.org, 6 Sept 2018)

Given a finite set S in $[0,1]^2$ including the origin, an anchored rectangle packing is a set of non-overlapping rectangles in the unit square where each rectangle has a point of S as its left-bottom corner and contains no point of S in its interior. Allen Freedman conjectured in the 1960s one can always find an anchored rectangle packing with total area at least $1/2$. We verify the conjecture for point configurations whose relative positions belong to certain classes of permutations.

## 157) Tanya Khovanova (MIT) and Wayne Zhao (PRIMES), Mathematics of a Sudo-Kurve (arXiv.org, 20 Aug 2018), published in Recreational Mathematics Magazine, no. 10 (2018): 5-27.

We investigate a type of a Sudoku variant called Sudo-Kurve, which allows bent rows and columns, and develop a new, yet equivalent, variant we call a Sudo-Cube. We examine the total number of distinct solution grids for this type with or without symmetry. We study other mathematical aspects of this puzzle along with the minimum number of clues needed and the number of ways to place individual symbols.

## 156) Vinjai Vale, A new paradigm for computer vision based on compositional representation (14 May 2018)

Deep convolutional neural networks - the state-of-the-art technique in artificial intelligence for computer vision - achieve notable success rates at simple classification tasks, but are fundamentally lacking when it comes to representation. These neural networks encode fuzzy textural patterns into vast matrices of numbers which lack the semantically structured nature of human representations (e.g. "a table is a flat horizontal surface supported by an arrangement of identical legs"). This paper takes multiple important steps towards filling in these gaps. I first propose a series of tractable milestone problems set in the abstract two-dimensional ShapeWorld, thus isolating the challenge of object compositionality. Then I demonstrate the effectiveness of a new compositional representation approach based on identifying structure among the primitive elements comprising an image and representing this structure through an augmented primitive element tree and coincidence list. My approach outperforms Google's state-of-the-art Inception-v3 Convolutional Neural Network in accuracy, speed, and structural representation in my object representation milestone tasks. Finally, I present a mathematical framework for a probabilistic programming approach that can learn highly structured generative stochastic representations of compositional objects from just a handful of examples. This work is foundational for the future of general computer vision, and its applications are wide-reaching, ranging from autonomous vehicles to intelligent robotics to augmented and virtual reality.

## 155) Andrew Gritsevskiy (PRIMES) and Maksym Korablyov (MIT), Capsule networks for low-data transfer learning (arXiv.org, 26 Apr 2018)

We propose a capsule network-based architecture for generalizing learning to new data with few examples. Using both generative and non-generative capsule networks with intermediate routing, we are able to generalize to new information over 25 times faster than a similar convolutional neural network. We train the networks on the multiMNIST dataset lacking one digit. After the networks reach their maximum accuracy, we inject 1-100 examples of the missing digit into the training set, and measure the number of batches needed to return to a comparable level of accuracy. We then discuss the improvement in low-data transfer learning that capsule networks bring, and propose future directions for capsule research.

## 2017 Research Papers

## 154) Tanya Khovanova (MIT) and Joshua Lee (PRIMES), The 5-Way Scale (8 Mar 2019), published in Recreational Mathematics Magazine 11 (2019): 5-14.

In this paper, we discuss coin-weighing problems that use a 5-way scale which has five different possible outcomes: MUCH LESS, LESS, EQUAL, MORE, and MUCH MORE. The 5-way scale provides more information than the regular 3-way scale. We study the problem of finding two fake coins from a pile of identically looking coins in a minimal number of weighings using a 5-way scale. We discuss similarities and differences between the 5-way and 3-way scale. We introduce a strategy for a 5-way scale that can find both counterfeit coins among $2^k$ coins in $k+1$ weighings, which is better than any strategy for a 3-way scale.

## 153) Grace Tian, Multi-Crossing Numbers for Knots (26 Jan 2019)

We study the projections of a knot $K$ that have only $n$-crossings. The $n$-crossing number of $K$ is the minimum number of $n$-crossings among all possible projections of $K$ with only $n$-crossings. We obtain new results on the relation between $n$-crossing number and $(2n-1)$-crossing number for every positive even integer $n$.

## 152) David Lu (PRIMES), Sanjit Bhat (PRIMES), Albert Kwon (MIT), and Srinivas Devadas (MIT), DynaFlow: An Efficient Website Fingerprinting Defense Based on Dynamically-Adjusting Flows (15 Oct 2018), published in Proceedings of the 2018 Workshop on Privacy in the Electronic Society (WPES 2018), pp. 109-113.

Website fingerprinting attacks enable a local adversary to determine which website a Tor user visits. In recent years, several researchers have proposed defenses to counter these attacks. However, these defenses have shortcomings: many do not provide formal guarantees of security, incur high latency and bandwidth overheads, and require a frequently-updated database of website traffic patterns. In this work, we introduce a new countermeasure, DynaFlow, based on dynamically-adjusting flows to protect against website fingerprinting. DynaFlow provides a similar level of security as current state-of-the-art while being over $40\%$ more efficient. At the same time, DynaFlow does not require a pre-established database and extends protection to dynamically-generated websites.

## 151) Mihir Singhal (PRIMES) and Christopher Ryba (MIT), Generalizations of Hall-Littlewood Polynomials (24 Sept 2018)

Hall-Littlewood polynomials are important functions in various fields of mathematics and quantum physics, and can be defined combinatorially using a model of path ensembles. Wheeler and Zinn-Justin applied a reflection construction to this model to obtain an expression for type BC Hall-Littlewood polynomials. Borodin applied a single-parameter deformation to the model and obtained a formula for generalized Hall-Littlewood polynomials. Borodin has asked whether a similar generalization could be applied to type BC Hall-Littlewood polynomials. We present the model incorporating Borodin's generalization. We also obtain expressions for polynomials that were previously studied by Borodin, in addition to an expression for generalized type BC Hall-Littlewood polynomials.

## 150) Gopal Goel (PRIMES) and Andrew Ahn (MIT), Discrete Derivative Asymptotics of the $\beta$-Hermite Eigenvalues (arXiv.org, 18 Sept 2018), published in Combinatorics, Probability and Computing (17 April 2019)

We consider the asymptotics of the difference between the empirical measures of the $\beta$-Hermite tridiagonal matrix and its minor. We prove that this difference has a deterministic limit and Gaussian fluctuations. Through a correspondence between measures and continual Young diagrams, this deterministic limit is identified with the Vershik-Kerov-Logan-Shepp curve. Moreover, the Gaussian fluctuations are identified with a sectional derivative of the Gaussian free field.

## 149) Franklyn Wang, Monodromy Groups of Indecomposable Rational Functions (10 Sept 2018)

The most important geometric invariant of a degree-$n$ complex rational function $f(X)$ is its monodromy group, which is a set of permutations of $n$ objects. This monodromy group determines several properties of $f(X)$. A fundamental problem is to classify all degree-$n$ rational functions which have special behavior, meaning that their monodromy group $G$ is not one of the two "typical" groups, namely $A_n$ or $S_n$. Many mathematicians have studied this problem, including Oscar Zariski, John Thompson, Robert Guralnick, and Michael Aschbacher. In this paper we bring this problem near completion by solving it when $G$ is in any of the classes of groups which previously seemed intractable. We introduce new techniques combining methods from algebraic geometry, Galois theory, group theory, representation theory, and combinatorics. The classification of rational functions with special behavior will have many consequences, including far-reaching generalizations of Mazur's theorem on uniform boundedness of rational torsion on elliptic curves and Nevanlinna's theorem on uniqueness of meromorphic functions with prescribed preimages of five points. This improved understanding of rational functions has potential significance in various fields of science and engineering where rational functions arise.

## 148) Michael Ma, New Results on Pattern-Replacement Equivalences: Generalizing a Classical Theorem and Revising a Recent Conjecture (6 Sept 2018)

In this paper we study pattern-replacement equivalence relations on the set $S_n$ of permutations of length $n$. Each equivalence relation is determined by a set of patterns, and equivalent permutations are connected by pattern-replacements in a manner similar to that of the Knuth relation. One of our main results generalizes the celebrated Erdős-Szekeres Theorem for permutation pattern-avoidance to a new result for permutation pattern-replacement. In particular, we show that under the $\left\{ 123\ldots k, k\ldots 321 \right\}$-equivalence, all permutations in $S_n$ are equivalent up to parity when $n \geq \Omega(k^2)$. Additionally, we extend the work of Kuszmaul and Zhou on an infinite family of pattern-replacement equivalences known as the rotational equivalences. Kuszmaul and Zhou proved that the rotational equivalences always yield either one or two nontrivial equivalence classes in $S_n$, and conjectured that the number of nontrivial classes depended only on the patterns involved in the rotational equivalence (rather than on $n$). We present a counterexample to their conjecture, and prove a new theorem fully classifying (for large $n$) when there is one nontrivial equivalence class and when there are two nontrivial equivalence classes. Finally, we computationally analyze the pattern-replacement equivalences given by sets of pairs of patterns of length four. We then focus on three cases, in which the number of nontrivial equivalence classes matches an OEIS sequence. For two of these we present full proofs of the enumeration and for the third we suggest a potential future method of proof.

## 147) Kyle Gatesman (PRIMES), James Unwin (University of Illinois at Chicago), Lattice Studies of Gerrymandering Strategies (arXiv.org, 8 Aug 2018), published in Political Analysis 29:2 (April 2021): 167-192

We propose three novel gerrymandering algorithms which incorporate the spatial distribution of voters with the aim of constructing gerrymandered, equal-population, connected districts. Moreover, we develop lattice models of voter distributions, based on analogies to electrostatic potentials, in order to compare different gerrymandering strategies. Due to the probabilistic population fluctuations inherent to our voter models, Monte Carlo methods can be applied to the districts constructed via our gerrymandering algorithms. Through Monte Carlo studies we quantify the effectiveness of each of our gerrymandering algorithms and we also argue that gerrymandering strategies which do not include spatial data lead to (legally prohibited) highly disconnected districts. Of the three algorithms we propose, two are based on different strategies for packing opposition voters, and the third is a new approach to algorithmic gerrymandering based on genetic algorithms, which automatically guarantees that all districts are connected. Furthermore, we use our lattice voter model to examine the effectiveness of isoperimetric quotient tests and our results provide further quantitative support for implementing compactness tests in real-world political redistricting.

## 146) William Zhang, Improved bounds on the extremal function of hypergraphs (arXiv.org, 5 Jul 2018)

A fundamental problem in pattern avoidance is describing the asymptotic behavior of the extremal function and its generalizations. We prove an equivalence between the asymptotics of the graph extremal function for a class of bipartite graphs and the asymptotics of the matrix extremal function. We use the equivalence to prove several new bounds on the extremal functions of graphs. We develop a new method to bound the extremal function of hypergraphs in terms of the extremal function of their associated multidimensional matrices, improving the bound of the extremal function of $d$-permutation hypergraphs of length $k$ from $O(n^{d-1})$ to $2^{O(k)}n^{d-1}$.

## 145) P. A. Crowdmath, The Broken Stick Project (arXiv.org, 16 May 2018)

The broken stick problem is the following classical question. You have a segment $[0,1]$. You choose two points on this segment at random. They divide the segment into three smaller segments. Show that the probability that the three segments form a triangle is $1/4$. The MIT PRIMES program, together with Art of Problem Solving, organized a high school research project where participants worked on several variations of this problem. Participants were generally high school students who posted ideas and progress to the Art of Problem Solving forums over the course of an entire year, under the supervision of PRIMES mentors. This report summarizes the findings of this CrowdMath project.
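The classical $1/4$ answer can be checked numerically. The following is a minimal Monte Carlo sketch, not taken from the report; the function names, the seed, and the trial count are illustrative assumptions. It relies on the fact that three lengths summing to 1 form a triangle exactly when each is shorter than $1/2$.

```python
import random


def forms_triangle(x, y):
    """Return True if cutting [0, 1] at x and y gives three sides of a triangle."""
    a, b = min(x, y), max(x, y)
    pieces = (a, b - a, 1.0 - b)
    # Three lengths summing to 1 satisfy the triangle inequality
    # exactly when the longest piece is shorter than 1/2.
    return max(pieces) < 0.5


def estimate_triangle_probability(trials, seed=0):
    """Estimate the probability by sampling two uniform cut points per trial."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable runs
    hits = sum(forms_triangle(rng.random(), rng.random()) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    print(estimate_triangle_probability(200_000))
```

With a few hundred thousand trials the estimate should land close to 0.25; variations of the problem can be explored by changing how the cut points are sampled.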

## 144) Aaron Kaufer, Superalgebra in characteristic 2 (arXiv.org, 3 Apr 2018)

Following the work of Siddharth Venkatesh, we study the category $\textbf{sVec}_2$. This category is a proposed candidate for the category of supervector spaces over fields of characteristic $2$ (as the ordinary notion of a supervector space does not make sense in characteristic $2$). In particular, we study commutative algebras in $\textbf{sVec}_2$, known as $d$-algebras, which are ordinary associative algebras $A$ together with a linear derivation $d:A \to A$ satisfying the twisted commutativity rule: $ab = ba + d(b)d(a)$. In this paper, we generalize many results from standard commutative algebra to the setting of $d$-algebras; most notably, we give two proofs of the statement that Artinian $d$-algebras may be decomposed as a direct product of local $d$-algebras. In addition, we show that there exist no noncommutative $d$-algebras of dimension $\leq 7$, and that up to isomorphism there exists exactly one $d$-algebra of dimension $7$. Finally, we give the notion of a Lie algebra in the category $\textbf{sVec}_2$, and we state and prove the Poincare-Birkhoff-Witt theorem for this category.

## 143) Kaiying Hou and Brian Rhee, Continuum Modelling of Traffic Systems with Autonomous Vehicles (17 Mar 2018)

Describing the behavior of automobile traffic via mathematical modeling and computer simulation has been a field of study conducted by mathematicians throughout the last century. One of the oldest models in traffic flow theory casts the problem in terms of densities and fluxes in partial differential conservation laws. In the past few years, the rise of autonomous vehicles (driven by software without human intervention) presents a new problem for classical traffic modeling. Autonomous vehicles react very differently from the traditional human-driven vehicles, resulting in modifications to the underlying partial differential equation constitutive laws. In this paper, we aim to provide insight into some new proposed constitutive laws by using continuum modelling to study traffic flows with a mix of human and autonomous vehicles. We also introduce various existing traffic flow models and present a new model for traffic flow that is based on an interaction between human drivers and autonomous vehicles where each vehicle can only measure the total density of surrounding cars, regardless of human or autonomous status. By implementing the Lax-Friedrichs scheme in Octave, we test how these different constitutive laws perform in our model and analyze the density curves that form over time steps. We also analytically derive and implement a Roe solver for a class of coupled conservation equations in which the velocities of cars are polynomial functions of the total density of surrounding cars regardless of type. We hope that our results could help civil engineers bring forth real progress in implementing efficient road systems that integrate both human-operated and unmanned vehicles.
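For readers unfamiliar with the Lax-Friedrichs scheme, the sketch below applies it to a single conservation law $\rho_t + f(\rho)_x = 0$ on a periodic road. It is written in Python rather than the Octave used in the paper, and the flux $f(\rho) = \rho(1-\rho)$, the grid, and the initial jam are illustrative assumptions, not the constitutive laws the authors study.

```python
def greenshields_flux(rho):
    # Hypothetical constitutive law: speed 1 - rho, so flux = rho * (1 - rho).
    return rho * (1.0 - rho)


def lax_friedrichs_step(rho, dt, dx, flux=greenshields_flux):
    """Advance the cell densities by one time step on a periodic road."""
    n = len(rho)
    new = []
    for i in range(n):
        left = rho[i - 1]          # index -1 wraps around: periodic boundary
        right = rho[(i + 1) % n]
        # Average of the neighbours minus a centred flux difference.
        new.append(0.5 * (left + right)
                   - dt / (2.0 * dx) * (flux(right) - flux(left)))
    return new


def simulate(rho, dt, dx, steps):
    """Apply the Lax-Friedrichs update repeatedly."""
    for _ in range(steps):
        rho = lax_friedrichs_step(rho, dt, dx)
    return rho


if __name__ == "__main__":
    cells = 100
    dx = 1.0 / cells
    # Initial condition (an assumption): a dense jam in the middle of the road.
    rho0 = [0.8 if 40 <= i < 60 else 0.2 for i in range(cells)]
    rho = simulate(rho0, dt=0.5 * dx, dx=dx, steps=50)
    print(min(rho), max(rho))
```

The time step is chosen so that `dt / dx` times the largest wave speed stays below 1, the usual stability condition for this scheme; under that condition the total number of cars is conserved and the density stays within its initial range.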

## 142) Michael Gintz, Classifying Graph Lie Algebras (14 Mar 2018)

A Lie algebra is a linear object which has a powerful homomorphism with a Lie group, an important object in differential geometry. In previous work a construction is given that builds a Lie algebra on a Dynkin diagram, a commonly studied structure in Lie theory. We expand this definition to construct a Lie algebra given any simple graph, and consider the problem of determining its structure. We begin by defining an alteration on a graph which preserves its underlying graph Lie algebra structure, and use it to simplify the general graph. We then provide a decomposition move which further simplifies the Lie algebra structure of the general graph. Finally, we combine these two moves to classify all graph Lie algebras.

## 141) Sanjit Bhat (PRIMES), David Lu (PRIMES), Albert Kwon (MIT), and Srinivas Devadas (MIT), Var-CNN: A Data-Efficient Website Fingerprinting Attack Based on Deep Learning (arXiv.org, 28 Feb 2018), published in Proceedings on Privacy Enhancing Technologies (PETS 2019) (4): 292-310.

In recent years, there have been several works that use website fingerprinting techniques to enable a local adversary to determine which website a Tor user visits. While the current state-of-the-art attack, which uses deep learning, outperforms prior art with medium to large amounts of data, it attains marginal to no accuracy improvements when both use small amounts of training data. In this work, we propose Var-CNN, a website fingerprinting attack that leverages deep learning techniques along with novel insights specific to packet sequence classification. In open-world settings with large amounts of data, Var-CNN attains over $1\%$ higher true positive rate (TPR) than state-of-the-art attacks while achieving $4\times$ lower false positive rate (FPR). Var-CNN's improvements are especially notable in low-data scenarios, where it reduces the FPR of prior art by $3.12\%$ while increasing the TPR by $13\%$. Overall, insights used to develop Var-CNN can be applied to future deep learning based attacks, and substantially reduce the amount of training data needed to perform a successful website fingerprinting attack. This shortens the time needed for data collection and lowers the likelihood of having data staleness issues.

## 140) Richard Xu, Algebraicity regarding Graphs and Tilings (27 Jan 2018)

Given a planar graph $G$, we prove that there exists a tiling of a rectangle by squares such that each square corresponds to a face of the graph and the side lengths of the squares solve an extremal problem on the graph. Furthermore, we provide a practical algorithm for calculating the side lengths. Finally, we strengthen our theorem by restricting the centers and side lengths of the squares to algebraic numbers and explore the application of our technique in proving algebraicity in packing problems.

## 139) Anlin Zhang (PRIMES) and Laura P. Schaposnik (University of Illinois at Chicago), Modelling epidemics on $d$-cliqued graphs, published in Letters in Biomathematics 5:1 (16 Jan 2018)

Since social interactions have been shown to lead to symmetric clusters, we propose here that symmetries play a key role in epidemic modelling. Mathematical models on $d$-ary tree graphs were recently shown to be particularly effective for modelling epidemics in simple networks. To account for symmetric relations, we generalize this to a new type of networks modelled on $d$-cliqued tree graphs, which are obtained by adding edges to regular $d$-trees to form $d$-cliques. This setting gives a more realistic model for epidemic outbreaks originating within a family or classroom and which could reach a population by transmission via children in schools. Specifically, we quantify how an infection starting in a clique (e.g. family) can reach other cliques through the body of the graph (e.g. public places). Moreover, we propose and study the notion of a safe zone, a subset that has a negligible probability of infection.

## 138) Dylan Pentland, Coefficients of Gaussian polynomials modulo N (arXiv.org, 30 Dec 2017)

The $q$-analogue of the binomial coefficient, known as a $q$-binomial coefficient, is typically denoted $\left[{n \atop k}\right]_q$. These polynomials are important combinatorial objects, often appearing in generating functions related to permutations and in representation theory. Stanley conjectured that the function $f_{k,R}(n) = \#\left\{i : [q^{i}] \left[{n \atop k}\right]_q \equiv R \pmod{N}\right\}$ is quasipolynomial for $N=2$. We generalize, showing that this is in fact true for any integer $N\in \mathbb{N}$ and determine a quasi-period $\pi'_N(k)$ derived from the minimal period $\pi_N(k)$ of partitions with at most $k$ parts modulo $N$.
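To make the counting function concrete, here is a small sketch that expands a $q$-binomial coefficient with the $q$-Pascal rule and tallies its coefficients modulo $N$. The function names are hypothetical, the tally only covers degrees $0$ through $k(n-k)$ (an assumption about how $f_{k,R}$ treats zero coefficients), and the plain recursion is only meant for small $n$.

```python
from collections import Counter


def gaussian_binomial(n, k):
    """Coefficient list of the q-binomial [n choose k]_q, lowest degree first."""
    if k < 0 or k > n:
        return [0]
    if k == 0 or k == n:
        return [1]
    # q-Pascal rule: [n, k]_q = [n-1, k-1]_q + q^k * [n-1, k]_q
    a = gaussian_binomial(n - 1, k - 1)
    b = [0] * k + gaussian_binomial(n - 1, k)   # multiply by q^k
    size = max(len(a), len(b))
    a = a + [0] * (size - len(a))
    b = b + [0] * (size - len(b))
    return [x + y for x, y in zip(a, b)]


def residue_counts(n, k, modulus):
    """Count how many coefficients of [n choose k]_q fall in each residue class."""
    return Counter(c % modulus for c in gaussian_binomial(n, k))


if __name__ == "__main__":
    print(gaussian_binomial(4, 2))   # coefficients of 1 + q + 2q^2 + q^3 + q^4
    print(residue_counts(4, 2, 2))
```

For $n=4$, $k=2$, $N=2$ the polynomial is $1 + q + 2q^2 + q^3 + q^4$, so four coefficients are odd and one is even.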

## 137) Andy Xu and Wendy Wu, Higher Gonalities of Erdős-Rényi Random Graphs (22 Dec 2017)

We consider the asymptotic behavior of the second and higher gonalities of an Erdős-Rényi random graph and provide upper bounds for both via the probabilistic method. Our results suggest that for sufficiently large $n$, the second gonality of an Erdős-Rényi random graph $G(n,p)$ is strictly less than and asymptotically equal to the number of vertices under a suitable restriction of the probability $p$. We also prove an asymptotic upper bound for all higher gonalities of large Erdős-Rényi random graphs that adapts and generalizes a similar result on complete graphs. We suggest another approach towards finding both upper and lower bounds for the second and higher gonalities for small $p=\frac{c}{n}$, using a special case of the Riemann-Roch Theorem, and fully determine the asymptotic behavior of arbitrary gonalities when $c\leq 1$.

## 136) Michael Ren (PRIMES) and Xiaomeng Xu (MIT), Quasi-invariants in characteristic p and twisted quasi-invariants (15 Nov 2017; arXiv.org, 31 Jul 2019)

The spaces of quasi-invariant polynomials were introduced by Feigin and Veselov, where their Hilbert series over fields of characteristic 0 were computed. In this paper, we show some partial results and make two conjectures on the Hilbert series of these spaces over fields of positive characteristic. On the other hand, Braverman, Etingof, and Finkelberg introduced the spaces of quasi-invariant polynomials twisted by a monomial. We extend some of their results to the spaces twisted by a smooth function.

## 135) David Darrow, A Novel, Near-Optimal Spectral Method for Simulating Fluids in a Cylinder (13 Nov 2017)

Simulations of fluid flow offer theoretical insight into fluid dynamics and critical applications in industry, with implications ranging from blood flow to hurricanes. However, open problems in fluid dynamics require more accurate simulations and lower computational resource costs than current algorithms provide. Accordingly, we develop in this paper a novel, computationally efficient spectral method for computing solutions of the incompressible Navier–Stokes equations, which model incompressible fluid flow, on the cylinder. The method described addresses three major limitations of current methods. First, while current methods either underresolve the cylinder's boundary or overresolve its center (effectively overemphasizing less physically interesting non-boundary regions), this new method more evenly resolves all parts of the cylinder. Secondly, current simulation times scale proportionally as $N^{7/3}$ or higher (where $N$ is the number of discretization points), while the new method requires at most $\mathcal{O}(N\log N)$ operations per time step. For large $N$, this means that calculations that required weeks can now be run in minutes. Lastly, current practical methods offer only low order (algebraic) accuracy. The new method has spectral accuracy, which often represents an improvement of the accuracy of the results by 5–10 orders of magnitude or more.

## 134) Espen Slettnes, Carl Joshua Quines, Shen-Fu Tsai, and Jesse Geneson (CrowdMath-2017), Variations of the cop and robber game on graphs (arXiv.org, 31 Oct 2017)

We prove new theoretical results about several variations of the cop and robber game on graphs. First, we consider a variation of the cop and robber game which is more symmetric called the cop and killer game. We prove for all $c < 1$ that almost all random graphs are stalemate for the cop and killer game, where each edge occurs with probability $p$ such that $\frac{1}{n^{c}} \le p \le 1-\frac{1}{n^{c}}$. We prove that a graph can be killer-win if and only if it has exactly $k\ge 3$ triangles or none at all. We prove that graphs with multiple cycles longer than triangles permit cop-win and killer-win graphs. For $\left(m,n\right)\neq\left(1,5\right)$ and $n\geq4$, we show that there are cop-win and killer-win graphs with $m$ $C_n$s. In addition, we identify game outcomes on specific graph products. Next, we find a generalized version of Dijkstra's algorithm that can be applied to find the minimal expected capture time and the minimal evasion probability for the cop and gambler game and other variations of graph pursuit. Finally, we consider a randomized version of the killer that is similar to the gambler. We use the generalization of Dijkstra's algorithm to find optimal strategies for pursuing the random killer. We prove that if $G$ is a connected graph with maximum degree $d$, then the cop can win with probability at least $\frac{\sqrt d}{1+\sqrt d}$ after learning the killer's distribution. In addition, we prove that this bound is tight only on the $\left(d+1\right)$-vertex star, where the killer takes the center with probability $\frac1{1+\sqrt d}$ and each of the other vertices with equal probabilities.

## 133) Ayush Agarwal (PRIMES) and Christian Gaetz (MIT), Differential posets and restriction in critical groups (arXiv.org, 23 Oct 2017), published in Algebraic Combinatorics, vol. 2:6 (2019): 1311-1327.

In recent work, Benkart, Klivans, and Reiner defined the critical group of a faithful representation of a finite group $G$, which is analogous to the critical group of a graph. In this paper we study maps between critical groups induced by injective group homomorphisms and in particular the map induced by restriction of the representation to a subgroup. We show that in the abelian group case the critical groups are isomorphic to the critical groups of a certain Cayley graph and that the restriction map corresponds to a graph covering map. We also show that when $G$ is an element in a differential tower of groups, critical groups of certain representations are closely related to words of up-down maps in the associated differential poset. We use this to generalize an explicit formula for the critical group of the permutation representation of the symmetric group given by the second author, and to enumerate the factors in such critical groups.

## 132) Louis Golowich (PRIMES) and Chiheon Kim (MIT), New Classes of Set-Sequential Trees (arXiv.org, 14 Oct 2017), published in Discrete Mathematics, vol. 343:3 (March 2020)

A graph is called set-sequential if its vertices can be labeled with distinct nonzero vectors in $\mathbb{F}_2^n$ such that when each edge is labeled with the sum $\pmod{2}$ of its vertices, every nonzero vector in $\mathbb{F}_2^n$ is the label for either a single vertex or a single edge. We resolve certain cases of a conjecture of Balister, Győri, and Schelp in order to show many new classes of trees to be set-sequential. We show that all caterpillars $T$ of diameter $k$ such that $k \leq 18$ or $|V(T)| \geq 2^{k-1}$ are set-sequential, where $T$ has only odd-degree vertices and $|T| = 2^{n-1}$ for some positive integer $n$. We also present a new method of recursively constructing set-sequential trees.

## 131) Zachary Steinberg, Automated Segmentation of 3D Punctate Neural Expansion Microscopy Data (30 Sept 2017)

The comprehensive study of multiple-neuron circuits, known as connectomics, has historically been hampered by the time-consuming process of obtaining data with perfect morphological reconstructions of neurons. Existing attempts to automate the reconstruction of synaptic connections have used electron microscope data to some success, but were limited due to the black-and-white nature of such data and the computational requirements of supervised learning. Now that multicolor data is available at 20nm resolution via Expansion Microscopy (ExM), creating an automated, reliable algorithm requiring minimal training that can process the future petabytes of neural tissue data in a reasonable amount of time is an open problem. Here, we outline an automated approach to segment neurons in a 20x expanded hippocampus slice expressing Brainbow fluorescent proteins. We first use a neural network as a mask to filter data, oversegment in color space to create supervoxels, and finally merge those supervoxels together to reconstruct the 3D volume for an individual neuron. The results demonstrate this approach shows promise to harness ExM data for 3D neural imaging. Our approach offers several insights that can guide future work.

## 130) Andrew Gritsevskiy, Towards Generative Drug Discovery: Metric Learning using Variational Autoencoders (30 Sept 2017)

We report a method for metric learning using an extended variational autoencoder. Our architecture, based on deep learning, provides the ability to learn a transformation-invariant metric on any set of data. Our architecture consists of a pair of encoding and decoding networks. The encoder network converts the data into differentiable latent representations, while the decoder network learns to convert these representations back into data. We then apply an additional set of losses to the encoder network, forcing it to learn codings that are independent of orientation and reflect the desired metric. Then, our architecture is able to predict the real metric for a set of data points, and can generate data points that match a set of requirements. We demonstrate our network's ability to calculate the maximum overlap area of any two shapes in one shot; we also demonstrate our network's success at matching halves of geometric shapes. We then propose the applications of our network to areas of biochemistry and medicine, especially generative drug discovery.

## 129) Kaan Dokmeci, Theorems on Field Extensions and Radical Denesting (26 Sept 2017)

The problem of radical denesting asks how to rewrite a given nested radical expression with fewer layers of radicals. This is a fairly recent problem, with applications in mathematical software that performs algebraic manipulations such as denesting given radical expressions. Current algorithms are either limited or inefficient. We tackle the problem of denesting real radical expressions without the use of Galois Theory. This uses various theorems on field extensions formed by adjoining roots of elements of the original field. These theorems are proven via the roots of unity filter and degree arguments. These theorems culminate in a general theorem on denesting and lead to a general algorithm that does not require roots of unity. We optimize this algorithm further. Also, special cases of radical expressions are covered, giving more efficient algorithms in these cases, spanning many examples of radicals. Additionally, a condition for a radical not to denest is given. The results of denesting radicals over $Q$ are extended to real extensions of $Q$ and also transcendental extensions like $Q(t)$. Finally, the case of denesting sums of radicals is explored as well.

## 2016 Research Papers

## 128) Piotr Suwara (MIT) and Albert Yue (PRIMES), An index-type invariant of knot diagrams giving bounds for unknotting framed unknots (arXiv.org, 7 Jul 2017)

We introduce a new knot diagram invariant called the Self-Crossing Index (SCI). Using SCI, we provide bounds for unknotting two families of framed unknots. For one of these families, unknotting using framed Reidemeister moves is significantly harder than unknotting using regular Reidemeister moves. We also investigate the relation between SCI and Arnold's curve invariant St, as well as the relation with Hass and Nowik's invariant, which generalizes cowrithe. In particular, the change of SCI under $\Omega$3 moves depends only on the forward/backward character of the move, similar to how the change of St or cowrithe depends only on the positive/negative quality of the move.

## 127) P.A. CrowdMath, Results on Pattern Avoidance Games (arXiv.org, 18 Apr 2017)

A zero-one matrix $A$ contains another zero-one matrix $P$ if some submatrix of $A$ can be transformed to $P$ by changing some ones to zeros. $A$ avoids $P$ if $A$ does not contain $P$. The Pattern Avoidance Game is played by two players. Starting with an all-zero matrix, two players take turns changing zeros to ones while keeping $A$ avoiding $P$. We study the strategies of this game for some patterns $P$. We also study some generalizations of this game.

## 126) P.A. CrowdMath, Algorithms for Pattern Containment in 0-1 Matrices (arXiv.org, 18 Apr 2017)

We say a zero-one matrix $A$ avoids another zero-one matrix $P$ if no submatrix of $A$ can be transformed to $P$ by changing some ones to zeros. A fundamental problem is to study the extremal function $ex(n,P)$, the maximum number of nonzero entries in an $n \times n$ zero-one matrix $A$ which avoids $P$. To calculate exact values of $ex(n,P)$ for specific values of $n$, we need containment algorithms which tell us whether a given $n \times n$ matrix $A$ contains a given pattern matrix $P$. In this paper, we present optimal algorithms to determine when an $n \times n$ matrix $A$ contains a given pattern $P$ when $P$ is a column of all ones, an identity matrix, a tuple identity matrix, an $L$-shaped pattern, or a cross pattern. These algorithms run in $\Theta(n^2)$ time, which is the lowest possible order a containment algorithm can achieve. When $P$ is a rectangular all-ones matrix, we also obtain an improved running time algorithm, albeit with a higher order.
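The containment relation itself is easy to state in code. The sketch below is a brute-force check written directly from the definition, with hypothetical function names; it tries every choice of rows and columns, so it is exponentially slower than the $\Theta(n^2)$ algorithms the paper presents for special patterns and is only meant to pin down what "contains" means.

```python
from itertools import combinations


def contains(A, P):
    """Return True if the 0-1 matrix A contains the pattern P (brute force)."""
    p_rows, p_cols = len(P), len(P[0])
    for rows in combinations(range(len(A)), p_rows):
        for cols in combinations(range(len(A[0])), p_cols):
            # Extra ones in the chosen submatrix may be changed to zeros,
            # so the submatrix only needs a one wherever P has a one.
            if all(A[r][c] >= P[i][j]
                   for i, r in enumerate(rows)
                   for j, c in enumerate(cols)):
                return True
    return False


def avoids(A, P):
    """A avoids P exactly when A does not contain P."""
    return not contains(A, P)


if __name__ == "__main__":
    identity2 = [[1, 0], [0, 1]]
    A = [[1, 0, 0],
         [0, 0, 0],
         [0, 0, 1]]
    print(contains(A, identity2))
    print(avoids([[0, 1], [1, 0]], identity2))
```

A checker like this is useful mainly as a reference against which a faster, pattern-specific algorithm can be tested on small matrices.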

## 125) Malte Möser, Kyle Soska, Ethan Heilman, Kevin Lee, Henry Heffan (PRIMES), Shashvat Srivastava (PRIMES), Kyle Hogan, Jason Hennessey, Andrew Miller, Arvind Narayanan, and Nicolas Christin, An Empirical Analysis of Traceability in the Monero Blockchain (arXiv.org, 13 Apr 2017); to appear at PETS (Privacy Enhancing Technologies Symposium) 2018; an accompanying article about this paper appeared in Wired (March 27, 2018)

Monero is a privacy-centric cryptocurrency that allows users to obscure their transactions by including chaff coins, called "mixins," along with the actual coins they spend. In this paper, we empirically evaluate two weaknesses in Monero's mixin sampling strategy. First, about 62% of transaction inputs with one or more mixins are vulnerable to "chain-reaction" analysis -- that is, the real input can be deduced by elimination. Second, Monero mixins are sampled in such a way that they can be easily distinguished from the real coins by their age distribution; in short, the real input is usually the "newest" input. We estimate that this heuristic can be used to guess the real input with 80% accuracy over all transactions with 1 or more mixins. Next, we turn to the Monero ecosystem and study the importance of mining pools and the former anonymous marketplace AlphaBay on the transaction volume. We find that after removing mining pool activity, there remains a large amount of potentially privacy-sensitive transactions that are affected by these weaknesses. We propose and evaluate two countermeasures that can improve the privacy of future transactions.

## 124) Alec Leng, Independence of the Miller-Rabin and Lucas Probable Prime Tests (30 Mar 2017)

In the modern age, public-key cryptography has become a vital component for secure online communication. To implement these cryptosystems, rapid primality testing is necessary in order to generate keys. In particular, probabilistic tests are used for their speed, despite the potential for pseudoprimes. So, we examine the commonly used Miller-Rabin and Lucas tests, showing that numbers with many nonwitnesses are usually Carmichael or Lucas-Carmichael numbers in a specific form. We then use these categorizations, through a generalization of Korselt’s criterion, to prove that there are no numbers with many nonwitnesses for both tests, affirming the two tests’ relative independence. As Carmichael and Lucas-Carmichael numbers are in general more difficult for the two tests to deal with, we next search for numbers which are both Carmichael and Lucas-Carmichael numbers, experimentally finding none less than $10^{16}$. We thus conjecture that there are no such composites and, using multivariate calculus with symmetric polynomials, begin developing techniques to prove this.
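The notion of a nonwitness can be made concrete for the Miller-Rabin half of the comparison. The sketch below is a standard textbook formulation of the strong probable prime test, not code from the paper, and the helper names are hypothetical; the Lucas test is not shown.

```python
def is_strong_probable_prime(n, a):
    """Miller-Rabin test of an odd n > 2 to base a.

    Returns True when n passes, which for composite n means that
    a is a nonwitness (n is a strong pseudoprime to base a).
    """
    # Write n - 1 = d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    x = pow(a, d, n)
    if x == 1 or x == n - 1:
        return True
    # Square repeatedly, looking for -1 modulo n.
    for _ in range(s - 1):
        x = x * x % n
        if x == n - 1:
            return True
    return False


def count_nonwitnesses(n):
    """Count the bases 1 <= a < n for which odd n > 2 passes the test."""
    return sum(is_strong_probable_prime(n, a) for a in range(1, n))


if __name__ == "__main__":
    print(is_strong_probable_prime(2047, 2))  # 2047 = 23 * 89
    print(is_strong_probable_prime(561, 2))   # 561 is a Carmichael number
    print(count_nonwitnesses(9))
```

The composite 2047 passes the test to base 2, while the Carmichael number 561 is exposed by the same base, which shows how the set of nonwitnesses depends on the number being tested.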

## 123) Ria Das, Exploring the Ant Mill: Numerical and Analytical Investigations of Mixed Memory-Reinforcement Systems (arXiv.org, 20 Mar 2017)

Under certain circumstances, a swarm of a species of trail-laying ants known as army ants can become caught in a doomed revolving motion known as the death spiral, in which each ant follows the one in front of it in a never-ending loop until they all drop dead from exhaustion. This phenomenon, as well as the ordinary motions of many ant species and certain slime molds, can be modeled using reinforced random walks and random walks with memory. In a reinforced random walk, the path taken by a moving particle is influenced by the previous paths taken by other particles. In a random walk with memory, a particle is more likely to continue along its line of motion than change its direction. Both memory and reinforcement have been studied independently in random walks with interesting results. However, real biological motion is a result of a combination of both memory and reinforcement. In this paper, we construct a continuous random walk model based on diffusion-advection partial differential equations that combine memory and reinforcement. We find an axi-symmetric, time-independent solution to the equations that resembles the death spiral. Finally, we prove numerically that the obtained steady-state solution is stable.

## 122) Andrew Gritsevskiy and Adithya Vellal, Development and Biological Analysis of a Neural Network Based Genomic Compression System (3 Mar 2017)

The advent of Next Generation Sequencing (NGS) technologies has resulted in a barrage of genomic data that is now available to the scientific community. This data contains information that is driving fields such as precision medicine and pharmacogenomics, where clinicians use a patient’s genetics in order to develop custom treatments. However, genomic data is immense in size, which makes it extremely costly to store, transport and process. A genomic compression system which takes advantage of intrinsic biological patterns can help reduce the costs associated with this data while also identifying important biological patterns. In this project, we aim to create a compression system which uses unsupervised neural networks to compress genomic data. The complete compression suite, GenComp, is compared to existing genomic data compression methods. The results are then analyzed to discover new biological features of genomic data. Testing showed that GenComp achieves at least 40 times more compression than existing variant compression solutions, while providing comparable decoding times in most applications. GenComp also provides some insight into genetic patterns, which has significant potential to aid in the fields of pharmacogenomics and precision medicine. Our results demonstrate that neural networks can be used to significantly compress genomic data while also assisting in better understanding genetic biology.

## 121) Vivek Bhupatiraju, John Kuszmaul, and Vinjai Vale, On the Viability of Distributed Consensus by Proof of Space (3 Mar 2017)

In this paper, we present our implementation of Proof of Space (PoS) and our study of its viability in distributed consensus. PoS is a new alternative to the commonly used Proof of Work, which is a protocol at the heart of distributed consensus systems such as Bitcoin. PoS resolves the two major drawbacks of Proof of Work: high energy cost and bias towards individuals with specialized hardware. In PoS, users must store large “hard-to-pebble” PTC graphs, which are recursively generated using subgraphs called superconcentrators. We implemented two types of superconcentrators to examine their differences in performance. Linear superconcentrators are about 1.8 times slower than butterfly superconcentrators, but provide a better lower bound on space consumption. Finally, we discuss our simulation of using PoS to reach consensus in a peer-to-peer network. We conclude that Proof of Space is indeed viable for distributed consensus. To the best of our knowledge, we are the first to implement linear superconcentrators and to simulate the use of PoS to reach consensus on a decentralized network.

## 120) Albert Yue, An Index-Type Invariant of Knot Diagrams and Bounds for Unknotting Framed Knots (3 Mar 2017)

We introduce a new knot diagram invariant called self-crossing index, or $\mathrm{SCI}$. We found that $\mathrm{SCI}$ changes by at most $\pm 1$ under framed Reidemeister moves, and specifically provides a lower bound for the number of 3 moves. We also found that $\mathrm{SCI}$ is additive under connected sums, and is a Vassiliev invariant of order 1. We also conduct similar calculations with Hass and Nowik's diagram invariant and cowrithe, and present a relationship between forward/backward, ascending/descending, and positive/negative 3 moves.

## 119) Valerie Zhang, Computer-Based Visualizations and Manipulations of Matching Paths (2 Mar 2017)

Given n points in the 2-D plane, a matching path is a path that starts at one of these n points and ends at a different one without going through any of the other n - 2 points. Matching paths, as well as an important operation called the Hurwitz move, come up naturally in the study of complex algebraic varieties. At the heart of the Hurwitz move is the twist operation, which “twists” one matching path along another to produce a new (third) matching path. Performing the twist operation by hand, however, is not only tedious but also prone to errors and unnecessary complications. Therefore, using computer-based methods to represent matching paths and perform the twist operation makes sense. In this project, which was coded in Java, computer-based methods are developed to perform the twist operation efficiently and accurately, providing a framework for visualizing and manipulating matching paths with computers. The computer program performs fast computations and represents matching paths as simply as possible in a simple visual interface. This program could be utilized when solving open problems in symplectic geometry: potential applications include characterizing the overtwistedness of contact manifolds, as well as better understanding braid group actions.

## 118) Harshal Sheth, Nihar Sheth, and Aashish Welling, Read-Copy Update in a Garbage Collected Environment (1 Mar 2017)

Read-copy update (RCU) is a synchronization mechanism that allows efficient parallelism when there are a high number of readers compared to writers. The primary use of RCU is in Linux, a highly popular operating system kernel. The Linux kernel is written in C, a language that is not garbage collected, and yet the functionality that RCU provides is effectively that of a “poor man’s garbage collector” (P. E. McKenney). RCU in C is also complicated to use, and this can lead to bugs. The purpose of this paper is to investigate whether RCU implemented in a garbage collected language (Go) is easier to use while delivering comparable performance to RCU in C. This is tested through the implementation and benchmarking of 4 linked lists, 2 using RCU and 2 using mutexes. One RCU linked list and one mutex linked list are implemented in each language. This paper finds that RCU in a garbage collected language is indeed significantly easier to use, has similar overall performance to, and on very high read loads, outperforms, RCU in C.

## 117) Xiangyao Yu (MIT), Siye Zhu (PRIMES), Justin Kaashoek (PRIMES), Andrew Pavlo (Carnegie Mellon University), and Srinivas Devadas (MIT), Taurus: A Parallel Transaction Recovery Method Based on Fine-Granularity Dependency Tracking (28 Feb 2017)

Logging is crucial to performance in modern multicore main-memory database management systems (DBMSs). Traditional data logging (ARIES) and command logging algorithms enforce a sequential order among log records using a global log sequence number (LSN). Log flushing and recovery after a crash are both performed in the LSN order. This serialization of transaction logging and recovery can limit the system performance at high core count. In this paper, we propose Taurus to break the LSN abstraction and enable parallel logging and recovery by tracking fine-grained dependencies among transactions. The dependency tracking lends Taurus three salient features. (1) Taurus decouples the transaction logging order from the commit order and allows transactions to be flushed to persistent storage in parallel and independently. Transactions that are persistent before commit can be discovered and ignored by the recovery algorithm using the logged dependency information. (2) Taurus can leverage multiple persistent devices for logging. (3) Taurus can leverage multiple devices and multiple worker threads for parallel recovery. Taurus improves logging and recovery parallelism for both data and command logging.

## 116) Louis Golowich (PRIMES), Chiheon Kim (MIT), and Richard Zhou (PRIMES), Maximum Size of a Family of Pairwise Graph-Different Permutations (arXiv.org, 27 Feb 2017), published in The Electronic Journal of Combinatorics 24:4 (2017)

Two permutations of the vertices of a graph $G$ are called $G$-different if there exists an index $i$ such that the $i$-th entries of the two permutations form an edge in $G$. We bound or determine the maximum size of a family of pairwise $G$-different permutations for various graphs $G$. We show that for all balanced bipartite graphs $G$ of order $n$ with minimum degree $n/2 - o(n)$, the maximum number of pairwise $G$-different permutations of the vertices of $G$ is $2^{(1-o(1))n}$. We also present examples of bipartite graphs $G$ with maximum degree $O(\log n)$ that have this property. We explore the problem of bounding the maximum size of a family of pairwise graph-different permutations when an unlimited number of disjoint vertices is added to a given graph. We determine this exact value for the graph of 2 disjoint edges, and present some asymptotic bounds relating to this value for graphs consisting of the union of $n/2$ disjoint edges.
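The definition of $G$-different permutations can be checked directly. The following is a minimal sketch, assuming vertices are hashable labels and the graph is given as a list of undirected edges; the function names are hypothetical and not taken from the paper.

```python
from itertools import combinations


def are_g_different(p, q, edges):
    """Return True if some index i has {p[i], q[i]} as an edge of G."""
    edge_set = {frozenset(e) for e in edges}
    return any(frozenset((a, b)) in edge_set for a, b in zip(p, q))


def is_pairwise_g_different(family, edges):
    """Return True if every two permutations in the family are G-different."""
    return all(are_g_different(p, q, edges)
               for p, q in combinations(family, 2))
```

For example, with the single edge `(0, 1)` on vertices 0, 1, 2, the permutations `(0, 1, 2)` and `(1, 0, 2)` are $G$-different because their first entries form that edge, while `(0, 1, 2)` and `(0, 2, 1)` are not.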

## 115) Sathwik Karnik, On the Classification and Algorithmic Analysis of Carmichael Numbers (arXiv.org, 26 Feb 2017)

In this paper, we study the properties of Carmichael numbers, false positives to several primality tests. We provide a classification for Carmichael numbers with a proportion of Fermat witnesses of less than 50%, based on if the smallest prime factor is greater than a determined lower bound. In addition, we conduct a Monte Carlo simulation as part of a probabilistic algorithm to detect if a given composite number is Carmichael. We modify this highly accurate algorithm with a deterministic primality test to create a novel, more efficient algorithm that differentiates between Carmichael numbers and prime numbers.
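For small inputs, Carmichael numbers can be recognized deterministically with Korselt's criterion: a composite $n$ is Carmichael exactly when it is squarefree and $p - 1$ divides $n - 1$ for every prime $p$ dividing $n$. The sketch below uses trial division and is not the Monte Carlo algorithm described in the paper; the function names are hypothetical.

```python
def prime_factorization(n):
    """Return {prime: exponent} for n >= 2, using trial division."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def is_carmichael(n):
    """Test Korselt's criterion: composite, squarefree, (p - 1) | (n - 1)."""
    if n < 2:
        return False
    factors = prime_factorization(n)
    is_composite = sum(factors.values()) >= 2
    is_squarefree = all(e == 1 for e in factors.values())
    return (is_composite and is_squarefree
            and all((n - 1) % (p - 1) == 0 for p in factors))
```

Running `[n for n in range(2, 3000) if is_carmichael(n)]` lists the Carmichael numbers below 3000, starting with 561.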

## 114) Felix Wang, Functional equations in Complex Analysis and Number Theory (26 Feb 2017)

We study the following questions: (1) What are all solutions to $f\circ \hat{f} = g\circ \hat{g}$ with $f,g,\hat{f},\hat{g}\in\mathbb{C}(X)$ being complex rational functions? (2) For which rational functions $f(X)$ and $g(X)$ with rational coefficients does the equation $f(a)=g(b)$ have infinitely many solutions with $a,b\in\mathbb{Q}$? We utilize various algebraic, geometric and analytic results in order to resolve both (1) and a variant of (2) in case the numerator of $f(X)-g(Y)$ is an irreducible polynomial in $\mathbb{C}[X,Y]$. Our results have applications in various mathematical fields, such as complex analysis, number theory, and dynamical systems. Our work resolves a 1973 question of Fried, and makes significant progress on a 1924 question of Ritt and a 1997 question of Lyubich and Minsky. In addition, we prove a quantitative refinement of a 2015 conjecture of Cahn, Jones and Spear.

## 113) Laura Pierson, Signatures of Stable Multiplicity Spaces in Restrictions of Representations of Symmetric Groups (25 Feb 2017)

Representation theory is a way of studying complex mathematical structures such as groups and algebras by mapping them to linear actions on vector spaces. Recently, Deligne proposed a new way to study the representation theory of finite groups by generalizing the collection of representations of a sequence of groups indexed by positive integer rank to an arbitrary complex rank, creating an abelian tensor category. In this project, we focused on the case of the symmetric groups $S_n,$ the groups of permutations of $n$ objects. Elements of the Deligne category Rep $S_t$ can be constructed by taking a stable sequence of $S_n$ representations for increasing $n$ and interpolating the associated formulas to an arbitrary complex number $t.$ In this project, we studied the case of restriction multiplicity spaces $V_{\lambda,\rho}$, counting the number of copies of an irreducible representation $V_{\rho}$ of $S_{n-k}$ in the restriction $\text{Res}_{S_{n-k}}^{S_n} V_{\lambda}$ of an irreducible representation of $S_n.$ We found formulas for norms of orthogonal basis vectors in these spaces, and ultimately for signatures (the number of basis vectors with positive norm minus the number with negative norm), an invariant that multiplies over tensor products and has important combinatorial connections.

## 112) Albert Gerovitch, Automatically Improving 3D Neuron Segmentations for Expansion Microscopy Connectomics (25 Feb 2017)

Understanding the geometry of neurons and their connections is key to comprehending brain function. This is the goal of a new optical approach to brain mapping using expansion microscopy (ExM), developed in the Boyden Lab at MIT to replace the traditional approach of electron microscopy. A challenge here is to perform image segmentation to delineate the boundaries of individual neurons. Currently, however, there is no method implemented for assessing a segmentation algorithm’s accuracy in ExM. The aim of this project is to create automated assessment of neuronal segmentation algorithms, enabling their iterative improvement. By automating the process, I aim to devise powerful segmentation algorithms that reveal the “connectome” of a neural circuit. I created software, called SEV-3D, which uses the pixel error and warping error metrics to assess 3D segmentations of single neurons. To allow better assessment beyond a simple numerical score, I visualized the results as a multilayered image. My program runs in a closed loop with a segmentation algorithm, modifying its parameters until the algorithm yields an optimal segmentation. I am further developing my application to enable evaluation of multi-cell segmentations. In the future, I aim to further implement the principles of machine learning to automatically improve the algorithms, yielding even better accuracy.

## 111) Kevin Chang, Upper Bounds for Ordered Ramsey Numbers of Small 1-Orderings (arXiv.org, 7 Feb 2017)

A $k$-ordering of a graph $G$ assigns distinct order-labels from the set $\{1,\ldots,|G|\}$ to $k$ vertices in $G$. Given a $k$-ordering $H$, the ordered Ramsey number $R_{<} (H)$ is the minimum $n$ such that every edge-2-coloring of the complete graph on the vertex set $\{1, \ldots, n\}$ contains a copy of $H$, the $i$th smallest vertex of which either has order-label $i$ in $H$ or no order-label in $H$. This paper conducts the first systematic study of ordered Ramsey numbers for $1$-orderings of small graphs. We provide upper bounds for $R_{<} (H)$ for each connected $1$-ordering $H$ on $4$ vertices. Additionally, for every $1$-ordering $H$ of the $n$-vertex path $P_n$, we prove that $R_{<} (H) \in O(n)$. Finally, we provide an upper bound for the generalized ordered Ramsey number $R_{<} (K_n, H)$ which can be applied to any $k$-ordering $H$ containing some vertex with order-label $1$.

## 110) Nikhil Marda, On Equal Point Separation by Planar Cell Decompositions (arXiv.org, 17 Jan 2017)

In this paper, we investigate the problem of separating a set $X$ of points in $\mathbb{R}^{2}$ with an arrangement of $K$ lines such that each cell contains an asymptotically equal number of points (up to a constant ratio). We consider a property of curves called the stabbing number, defined to be the maximum countable number of intersections possible between the curve and a line in the plane. We show that large subsets of $X$ lying on Jordan curves of low stabbing number are an obstacle to equal separation. We further discuss Jordan curves of minimal stabbing number containing $X$. Our results generalize recent bounds on the Erdős-Szekeres Conjecture, showing that for fixed $d$ and sufficiently large $n$, if $|X| \ge 2^{c_dn/d + o(n)}$ with $c_d = 1 + O(\frac{1}{\sqrt{d}})$, then there exists a subset of $n$ points lying on a Jordan curve with stabbing number at most $d$.

## 109) Samuel Cohen and Peter Rowley, Results of Triangles Under Discrete Curve Shortening Flow (7 Jan 2017)

In this paper, we analyze the results of triangles under discrete curve shortening flow, specifically isosceles triangles with top angles greater than $\frac{\pi}{3}$, and scalene triangles. By considering the location of the three vertices of the triangle after some small time $\epsilon$, we use the definition of the derivative to calculate a system of differential equations involving parameters that can describe the triangle. Constructing phase plane diagrams and then analyzing them, we find that the singular behavior of discrete curve shortening flow on isosceles triangles with top angles greater than $\frac{\pi}{3}$ is a point, and for scalene triangles is a line segment.

## 108) Matthew Hase-Liu (PRIMES) and Nicholas Triantafillou (MIT), Efficient Point-Counting Algorithms for Superelliptic Curves (7 Jan 2017; arXiv.org, 7 Sep 2017)

In this paper, we present efficient algorithms for computing the number of points and the order of the Jacobian group of a superelliptic curve over finite fields of prime order p. Our method employs the Hasse-Weil bounds in conjunction with the Hasse-Witt matrix for superelliptic curves, whose entries we express in terms of multinomial coefficients. We present a fast algorithm for counting points on specific trinomial superelliptic curves and a slower, more general method for all superelliptic curves. For the first case, we reduce the problem of simplifying the entries of the Hasse-Witt matrix modulo p to a problem of solving quadratic Diophantine equations. For the second case, we extend Bostan et al.'s method for hyperelliptic curves to general superelliptic curves. We believe the methods we describe are asymptotically the most efficient known point-counting algorithms for certain families of trinomial superelliptic curves.

## 107) P.A. CrowdMath, Bounds on parameters of minimally non-linear patterns (arXiv.org, 31 Dec 2016), published in the Electronic Journal of Combinatorics 25:1 (2018)

Let $ex(n, P)$ be the maximum possible number of ones in any 0-1 matrix of dimensions $n \times n$ that avoids $P$. Matrix $P$ is called minimally non-linear if $ex(n, P) = \omega(n)$ but $ex(n, P') = O(n)$ for every strict subpattern $P'$ of $P$. We prove that the ratio between the length and width of any minimally non-linear 0-1 matrix is at most $4$, and that a minimally non-linear 0-1 matrix with $k$ rows has at most $5k-3$ ones. We also obtain an upper bound on the number of minimally non-linear 0-1 matrices with $k$ rows. In addition, we prove corresponding bounds for minimally non-linear ordered graphs. The minimal non-linearity that we investigate for ordered graphs is for the extremal function $ex_{<}(n, G)$, which is the maximum possible number of edges in any ordered graph on $n$ vertices with no ordered subgraph isomorphic to $G$.
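To make pattern avoidance and $ex(n, P)$ concrete, here is a brute-force sketch under the usual convention (assumed here, since the abstract does not spell it out) that a 0-1 matrix contains $P$ if some choice of rows and columns, taken in order, has a one wherever $P$ has a one. The function names are hypothetical, and the exhaustive search over all $2^{n^2}$ matrices is feasible only for very small $n$.

```python
from itertools import combinations, product


def contains_pattern(matrix, pattern):
    """Return True if matrix contains pattern as an ordered submatrix."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    k, l = len(pattern), len(pattern[0])
    for rows in combinations(range(n_rows), k):
        for cols in combinations(range(n_cols), l):
            # Every one of the pattern must sit on a one of the matrix.
            if all(matrix[r][c] == 1
                   for i, r in enumerate(rows)
                   for j, c in enumerate(cols)
                   if pattern[i][j] == 1):
                return True
    return False


def ex(n, pattern):
    """Brute-force the maximum number of ones in an n x n matrix avoiding pattern."""
    best = 0
    for bits in product((0, 1), repeat=n * n):
        matrix = [list(bits[r * n:(r + 1) * n]) for r in range(n)]
        if not contains_pattern(matrix, pattern):
            best = max(best, sum(bits))
    return best
```

For instance, avoiding the $1 \times 2$ pattern `[[1, 1]]` forces at most one 1 per row, so `ex(3, [[1, 1]])` is 3.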

## 106) Seth Shelley-Abrahamson (MIT) and Alec Sun (PRIMES), Towards a Classification of Finite-Dimensional Representations of Rational Cherednik Algebras of Type D (arXiv.org, 15 Dec 2016)

Using a combinatorial description due to Jacon and Lecouvey of the wall crossing bijections for cyclotomic rational Cherednik algebras, we show that the irreducible representations $L_c(\lambda^\pm)$ of the rational Cherednik algebra $H_c(D_n, \mathbb{C}^n)$ of type $D$ for symmetric bipartitions $\lambda$ are infinite dimensional for all parameters $c$. In particular, all finite-dimensional irreducible representations of rational Cherednik algebras of type $D$ arise as restrictions of finite-dimensional irreducible representations of rational Cherednik algebras of type $B$.

## 105) Nicholas Guo (PRIMES) and Guangyi Yue (MIT), Counting Independent Sets in Graphs of Hyperplane Arrangements (arXiv.org, 13 Dec 2016), published in Discrete Mathematics, vol. 343:3 (March 2020)

In this paper, we count the number of independent sets of a type of graph $G(\mathcal{A},q)$ associated to some hyperplane arrangement $\mathcal{A}$, which is a generalization of the construction of graphical arrangements. We show that when the parameters of $\mathcal{A}$ satisfy certain conditions, the number of independent sets of the disjoint union $G(\mathcal{A},q_1)\cup\cdots\cup G(\mathcal{A},q_s)$ depends only on the coefficients of $\mathcal{A}$ and the total number of vertices $\sum_i q_i$ when $q_i$'s are powers of large enough prime numbers. In addition it is independent of the coefficients as long as $\mathcal{A}$ is central and the coefficients are multiplicatively independent.

## 104) Yatharth Agarwal (PRIMES), Vishnu Murale (PRIMES), Jason Hennessey (Boston University), Kyle Hogan (Boston University), and Mayank Varia (Boston University), Moving in Next Door: Network Flooding as a Side Channel in Cloud Environments (14-16 Nov 2016), published in Sara Foresti and Giuseppe Persiano, eds., Cryptology and Network Security: 15th International Conference Proceedings, CANS 2016, Milan, Italy, November 14–16, 2016, pp. 755-760.

Co-locating multiple tenants’ virtual machines (VMs) on the same host underpins public clouds’ affordability, but sharing physical hardware also exposes consumer VMs to side channel attacks from adversarial co-residents. We demonstrate passive bandwidth measurement to perform traffic analysis attacks on co-located VMs. Our attacks do not assume a privileged position in the network or require any communication between adversarial and victim VMs. Using a single feature in the observed bandwidth data, our algorithm can identify which of 3 potential YouTube videos a co-resident VM streamed with 66% accuracy. We discuss defense from both a cloud provider’s and a consumer’s perspective, showing that effective defense is difficult to achieve without costly under-utilization on the part of the cloud provider or over-utilization on the part of the consumer.

## 103) Dhruv Rohatgi, A Connection Between Vector Bundles over Smooth Projective Curves and Representations of Quivers (31 Oct 2016)

We create a partition bijection that yields a partial result on a recent conjecture by Schiffmann relating the problems of counting over a finite field (1) vector bundles over smooth projective curves, and (2) representations of quivers.

## 102) Aaron Yeiser (PRIMES) and Alex Townsend (Cornell University), A spectral element method for meshes with skinny elements (30 Oct 2016; arXiv.org, 27 Mar 2018)

When numerically solving partial differential equations (PDEs), the first step is often to discretize the geometry using a mesh and to solve a corresponding discretization of the PDE. Standard finite and spectral element methods require that the underlying mesh has no skinny elements for numerical stability. Here, we develop a novel spectral element method that is numerically stable on meshes that contain skinny elements, while also allowing for high degree polynomials on each element. Our method is particularly useful for PDEs for which anisotropic mesh elements are beneficial and we demonstrate it with a Navier-Stokes simulation. Code for our method can be found at this URL.

## 101) Tanya Khovanova (MIT) and Rafael Saavedra (PRIMES), Discreet Coin Weighings and the Sorting Strategy (arXiv.org, 23 Sep 2016)

In 2007, Alexander Shapovalov posed an old twist on the classical coin weighing problem by asking for strategies that manage to conceal the identities of specific coins while providing general information on the number of fake coins. In 2015, Diaco and Khovanova studied various cases of these "discreet strategies" and introduced the revealing factor, a measure of the information that is revealed. In this paper we discuss a natural coin weighing strategy which we call the sorting strategy: divide the coins into equal piles and sort them by weight. We study the instances when the strategy is discreet, and given an outcome of the sorting strategy, the possible number of fake coins. We prove that in many cases, the number of fake coins can be any value in an arithmetic progression whose length depends linearly on the number of coins in each pile. We also show the strategy can be discreet when the number of fake coins is any value within an arithmetic subsequence whose length also depends linearly on the number of coins in each pile. We arrive at these results by connecting our work to the classic Frobenius coin problem. In addition, we calculate the revealing factor for the sorting strategy.

## 100) Kai-Siang Ang (PRIMES) and Laura P. Schaposnik (University of Illinois at Chicago), On the geometry of regular icosahedral capsids containing disymmetrons (arXiv.org, 29 Aug 2016), published in Journal of Structural Biology (19 Jan 2017)

Icosahedral virus capsids are composed of symmetrons, organized arrangements of capsomers. There are three types of symmetrons: disymmetrons, trisymmetrons, and pentasymmetrons, which have different shapes and are centered on the icosahedral 2-fold, 3-fold and 5-fold axes of symmetry, respectively. In 2010 [Sinkovits & Baker] gave a classification of all possible ways of building an icosahedral structure solely from trisymmetrons and pentasymmetrons, which requires the triangulation number T to be odd. In the present paper we incorporate disymmetrons to obtain a geometric classification of icosahedral viruses formed by regular penta-, tri-, and disymmetrons. For every class of solutions, we further provide formulas for symmetron sizes and parity restrictions on h, k, and T numbers. We also present several methods in which invariants may be used to classify a given configuration.

## 99) Tanya Khovanova (MIT) and Shuheng Niu (PRIMES), $m$-Modular Wythoff (arXiv.org, 2 Aug 2016)

We discuss a variant of Wythoff's Game, $m$-Modular Wythoff's Game, and identify the winning and losing positions for this game.
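As background, the losing positions of the classical Wythoff's Game (not the $m$-modular variant, whose rules are not stated here) can be generated with a short sketch: the $k$-th losing position is $(a_k, a_k + k)$, where $a_k$ is the smallest non-negative integer not yet used in any earlier pair. The function name `wythoff_p_positions` is hypothetical.

```python
def wythoff_p_positions(count):
    """Return the first `count` losing positions (a, b), a <= b, of classical Wythoff."""
    positions = []
    used = set()
    a = 0
    for k in range(count):
        # a_k is the smallest non-negative integer not used so far (the mex).
        while a in used:
            a += 1
        b = a + k
        positions.append((a, b))
        used.update((a, b))
    return positions
```

The first few pairs are (0, 0), (1, 2), (3, 5), (4, 7); a player who moves to one of these positions leaves the opponent in a losing position.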

## 2015 Research Papers

## 98) Caleb Ji, Robin Park, and Angela Song, Combinatorial Games of No Strategy (20 Aug 2016)

In this paper, we study a particular class of combinatorial game motivated by previous research conducted by Professor James Propp, called Games of No Strategy, or games whose winners are predetermined. Finding the number of ways to play such games often leads to new combinatorial sequences and involves methods from analysis, number theory, and other fields. For the game Planted Brussel Sprouts, a variation on the well-known game Sprouts, we find a new proof that the number of ways to play is equal to the number of spanning trees on n vertices, and for Mozes’ Game of Numbers, a game studied for its interesting connections with other fields, we use prior work by Alon to calculate the number of ways to play the game for a certain case. Finally, in the game Binary Fusion, we show through both algebraic and combinatorial proofs that the number of ways to play generates Catalan’s triangle.

## 97) Meena Jagadeesan, The Exchange Graphs of Weakly Separated Collections (arXiv.org, 19 Aug 2016)

Weakly separated collections arise in the cluster algebra derived from the Plücker coordinates on the nonnegative Grassmannian. Oh, Postnikov, and Speyer studied weakly separated collections over a general Grassmann necklace $\mathcal{I}$ and proved the connectivity of every exchange graph. Oh and Speyer later introduced a generalization of exchange graphs that we call $\mathcal{C}$-constant graphs. They characterized these graphs in the smallest two cases. We prove an isomorphism between exchange graphs and a certain class of $\mathcal{C}$-constant graphs. We use this to extend Oh and Speyer's characterization of these graphs to the smallest four cases, and we present a conjecture on a bound on the maximal order of these graphs. In addition, we fully characterize certain classes of these graphs in the special cases of cycles and trees.

## 96) Nicholas Diaco, Counting Counterfeit Coins: A New Coin Weighing Problem (arXiv.org, 13 Jun 2016)

In 2007, a new variety of the well-known problem of identifying a counterfeit coin using a balance scale was introduced in the sixth International Kolmogorov Math Tournament. This paper offers a comprehensive overview of this new problem by presenting it in the context of the traditional coin weighing puzzle and then explaining what makes the new problem mathematically unique. Two weighing strategies described previously are used to derive lower bounds for the optimal number of admissible situations for given parameters. Additionally, a new weighing procedure is described that can be adapted to provide a solution for a broad spectrum of initial parameters by representing the number of counterfeit coins as a linear combination of positive integers. In closing, we offer a new form of the traditional counterfeit coin problem and provide a lower bound for the number of weighings necessary to solve it.

## 95) Jesse Geneson (MIT) and Meghal Gupta (PRIMES), Bounding extremal functions of forbidden 0-1 matrices using (r,s)-formations (19 Mar 2016)

First, we prove tight bounds of $n 2^{\frac{1}{(t-2)!}\alpha(n)^{t-2} \pm O(\alpha(n)^{t-3})}$ on the extremal function of the forbidden pair of ordered sequences $(1 2 3 \ldots k)^t$ and $(k \ldots 3 2 1)^t$ using bounds on a class of sequences called $(r,s)$-formations. Then, we show how an analogous method can be used to derive similar bounds on the extremal functions of forbidden pairs of 0-1 matrices consisting of horizontal concatenations of identical identity matrices and their horizontal reflections.

## 94) Varun Jain, Novel Relationships Between Circular Planar Graphs and Electrical Networks (20 Feb 2016)

Circular planar graphs are used to model electrical networks, which arise in classical physics. Associated with such a network is a network response matrix, which carries information about how the network behaves in response to certain potential differences. Circular planar graphs can be organized into equivalence classes based upon these response matrices. In each equivalence class, certain fundamental elements are called critical. Additionally, it is known that equivalent graphs are related by certain local transformations. Using wiring diagrams, we first investigate the number of Y-∆ transformations required to transform one critical graph in an equivalence class into another, proving a quartic bound in the order of the graph. Next, we consider positivity phenomena, studying how testing the signs of certain circular minors can be used to determine if a given network response matrix is associated with a particular equivalence class. In particular, we prove a conjecture by Kenyon and Wilson for some cases.

## 93) Arthur Azvolinsky, Explicit Computations of the Frozen Boundaries of Rhombus Tilings of Polygonal Domains (12 Feb 2016)

Consider a polygonal domain $\Omega$ drawn on a regular triangular lattice. A rhombus tiling of $\Omega$ is defined as a complete covering of the domain with $60^\circ$-rhombi, where each one is obtained by gluing two neighboring triangles together. We consider a uniform measure on the set of all tilings of $\Omega$. As the mesh size of the lattice approaches zero while the polygon remains fixed, a random tiling approaches a deterministic limit shape. An important phenomenon that occurs with the convergence towards a limit shape is the formation of frozen facets; that is, areas where there are asymptotically tiles of only one particular type. The sharp boundary between these ordered facet formations and the disordered region is a curve inscribed in $\Omega$. This inscribed curve is defined as the frozen boundary. The goal of this project was to understand the purely algebraic approach, elaborated on in a paper by Kenyon and Okounkov, to the problem of explicitly computing the frozen boundary. We will present our results for a number of special cases we considered.

## 92) David Amirault, Better Bounds on the Rate of Non-Witnesses of Lucas Pseudoprimes (3 Feb 2016)

Efficient primality testing is fundamental to modern cryptography for the purpose of key generation. Different primality tests may be compared using their runtimes and rates of non-witnesses. With the Lucas primality test, we analyze the frequency of Lucas pseudoprimes using MATLAB. We prove that a composite integer $n$ can be a strong Lucas pseudoprime to at most 1/6 of parameters $P, Q$ unless $n$ belongs to a short list of exception cases, thus improving the bound from the previous result of 4/15. We also explore the properties obeyed by such exceptions and how these cases may be handled by an extended version of the Lucas primality test.
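As a rough illustration of what a Lucas-type test checks, the sketch below tests the plain (not strong) Lucas condition $U_{n - (D/n)} \equiv 0 \pmod n$ with $D = P^2 - 4Q$. The function names are hypothetical, and the code is an assumption-laden sketch rather than the paper's MATLAB implementation; it assumes $n$ is odd and coprime to $Q$ and $D$.

```python
def jacobi(a, n):
    """Jacobi symbol (a/n) for odd positive n."""
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def lucas_u(k, P, Q, n):
    """U_k(P, Q) mod n, read off from the k-th power of [[P, -Q], [1, 0]]."""
    def mul(A, B):
        return [
            [(A[0][0] * B[0][0] + A[0][1] * B[1][0]) % n,
             (A[0][0] * B[0][1] + A[0][1] * B[1][1]) % n],
            [(A[1][0] * B[0][0] + A[1][1] * B[1][0]) % n,
             (A[1][0] * B[0][1] + A[1][1] * B[1][1]) % n],
        ]
    result = [[1, 0], [0, 1]]
    base = [[P % n, (-Q) % n], [1, 0]]
    while k:
        if k & 1:
            result = mul(result, base)
        base = mul(base, base)
        k >>= 1
    return result[1][0]


def is_lucas_probable_prime(n, P, Q):
    """Plain Lucas condition; n must be odd and coprime to Q and D (assumed)."""
    D = P * P - 4 * Q
    j = jacobi(D, n)
    if j == 0:
        raise ValueError("D = P^2 - 4Q must be coprime to n")
    return lucas_u(n - j, P, Q, n) == 0
```

With $P = 1$, $Q = -1$ the sequence $U_k$ is the Fibonacci sequence; the composite $323 = 17 \cdot 19$ passes this check, which is the kind of non-witness the abstract counts.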

## 91) Daniel Guo, An Infection Spreading Model on Binary Trees (26 Jan 2016)

An important and ongoing topic of research is the study of infectious diseases and the speed at which these diseases spread. Modeling the spread and growth of such diseases leads to a more precise understanding of the phenomenon and accurate predictions of spread in real life. We consider a long-range infection model on an infinite regular binary tree. Given a spreading coefficient $\alpha>1$, the time it takes for the infection to travel from one node to another node below it is exponentially distributed with specific rate functions such as $2^{-k}k^{-\alpha}$ or $\frac{1}{\alpha^k}$, where $k$ is the difference in layer number between the two nodes. We simulate and analyze the time needed for the infection to reach layer $m$ or below starting from the root node. The resulting time is recorded and graphed for different values of $\alpha$ and $m$. Finally, we prove rigorous lower and upper bounds for the infection time, both of which are approximately logarithmic with respect to $m$. The same techniques and results are valid for other regular $d$-ary trees, in which each node has exactly $d$ children where $d>2$.

## 90) Jacob Klegar, Bounded Tiling-Harmonic Functions on the Integer Lattice (25 Jan 2016)

Tiling-harmonic functions are a class of functions on square tilings that minimize a specific energy. These functions may provide a useful tool in studying square Sierpinski carpets. In this paper we show two new Maximum Modulus Principles for these functions, prove Harnack's Inequality, and give a proof that the set of tiling-harmonic functions is closed. One of these Maximum Modulus Principles is used to show that bounded infinite tiling-harmonic functions must have arbitrarily long constant lines. Additionally, we give three sufficient conditions for tiling-harmonic functions to be constant. Finally, we explore comparisons between tiling and graph-harmonic functions, especially in regards to oscillating boundary values.

## 89) Richard Yi, A Probability-Based Model of Traffic Flow (22 Jan 2016)

Describing the behavior of traffic via mathematical modeling and computer simulation has been a challenge confronted by mathematicians in various ways throughout the last century. In this project, we introduce various existing traffic flow models and present a new, probability-based model that is a hybrid of the microscopic and macroscopic views, drawing upon current ideas in traffic flow theory. We examine the correlations found in the data of our computer simulation. We hope that our results could help civil engineers implement efficient road systems that fit their needs, as well as contribute toward the design of safely operating unmanned vehicles.

## 88) Kenz Kallal, Matthew Lipman, and Felix Wang, Equal Compositions of Rational Functions (21 Jan 2016)

We study the following questions: (1) What are all solutions to $f\circ \hat{f} = g\circ \hat{g}$ in complex rational functions $f,g\in\mathbb{C}(X)$ and meromorphic functions $\hat{f}, \hat{g}$ on the complex plane? (2) For which rational functions $f(X)$ and $g(X)$ with coefficients in an algebraic number field $K$ does the equation $f(a)=g(b)$ have infinitely many solutions with $a,b\in K$? We utilize various algebraic, geometric and analytic results in order to resolve both questions in the case that the numerator of $f(X)-g(Y)$ is an irreducible polynomial in $\mathbb{C}[X,Y]$ of sufficiently large degree. Our work answers a 1973 question of Fried in all but finitely many cases, and makes significant progress towards answering a 1924 question of Ritt and a 1997 question of Lyubich and Minsky.

## 87) Dhruv Medarametla, Bounding Norms of Locally Random Matrices (21 Jan 2016)

Recently, several papers proving lower bounds for the performance of the Sum Of Squares Hierarchy on the planted clique problem have come out. A crucial part of all four papers is probabilistically bounding the norms of certain "locally random" matrices. In these matrices, the entries are not completely independent of each other, but rather depend upon a few edges of the input graph. In this paper, we study the norms of these locally random matrices. We start by bounding the norms of simple locally random matrices, whose entries depend on a bipartite graph $H$ and a random graph $G$; we then generalize this result by bounding the norms of complex locally random matrices, matrices based off of a much more general graph $H$ and a random graph $G$. For both cases, we prove almost-tight probabilistic bounds on the asymptotic behavior of the norms of these matrices.

## 86) Rachel Zhang, Statistics of Intersections of Curves on Surfaces (19 Jan 2016)

Each orientable surface with nonempty boundary can be associated with a planar model, whose edges can then be labeled with letters that read out a surface word. Then, the curve word of a free homotopy class of closed curves on a surface is the minimal sequence of edges of the planar model through which a curve in the class passes. The length of a class of curves is defined to be the number of letters in its curve word. We fix a surface and its corresponding planar model. Fix a free homotopy class of curves ω on the surface. For another class of curves c, let i(ω; c) be the minimal number of intersections of curves in ω and c. In this paper, we show that the mean of the distribution of i(ω; c), for random curve c of length n, grows proportionally with n and approaches μ(ω) ⋅ n for a constant μ(ω). We also give an algorithm to compute μ(ω) and have written a program that calculates μ(ω) for any curve ω on any surface. In addition, we prove that i(ω; c) approaches a Gaussian distribution as n → ∞ by viewing the generation of a random curve as a Markov Chain.

## 85) Cristian Gutu and Fengyao Ding, SecretRoom: An Anonymous Chat Client (16 Jan 2016)

While many people would like to be able to communicate anonymously, the few existing anonymous communication systems sacrifice anonymity for performance, or vice versa. The most popular such app is Tor, which relies on a series of relays to protect anonymity. Though proven to be efficient, Tor does not guarantee anonymity in the presence of strong adversaries like ISPs and government agencies who can conduct in-depth traffic analysis. In contrast, our messaging application, SecretRoom, implements an improved version of a secure messaging protocol called Dining Cryptographers Networks (DCNets) to guarantee true anonymity in moderately sized groups. However, unlike traditional DCNets, SecretRoom does not require direct communication between all participants and does not depend on the presence of honest clients for anonymity. By introducing an untrusted server that performs the DCNet protocol on behalf of the clients, SecretRoom manages to reduce the $O(n^2)$ communication associated with traditional DCNets to $O(n)$ for $n$ clients. Moreover, by introducing artificially intelligent clients, SecretRoom makes the anonymity set size independent of the number of “real” clients. Ultimately SecretRoom reduces the communication to $O(n)$ and allows the DCNet protocol to scale to hundreds of clients compared to a few tens of clients in traditional DCNets.
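For readers unfamiliar with DCNets, here is a minimal sketch of one round of the classical (pairwise-key) protocol, which is where the $O(n^2)$ cost comes from. It is not SecretRoom's server-based variant; the function names and the 32-bit message width are assumptions made for illustration.

```python
import secrets
from functools import reduce
from operator import xor


def dc_net_round(n_clients, sender, message, bits=32):
    """Simulate one classical DC-net round and return every client's announcement."""
    # Every pair of clients shares one secret key: n(n-1)/2 keys in total.
    keys = {
        (i, j): secrets.randbits(bits)
        for i in range(n_clients)
        for j in range(i + 1, n_clients)
    }
    announcements = []
    for client in range(n_clients):
        value = 0
        for pair, key in keys.items():
            if client in pair:
                value ^= key  # XOR in each key this client shares
        if client == sender:
            value ^= message  # only the sender mixes in the message
        announcements.append(value)
    return announcements


def combine(announcements):
    """XOR all announcements; each shared key appears twice and cancels."""
    return reduce(xor, announcements, 0)
```

Combining the announcements reveals the message, while any single announcement looks random because it is masked by keys the observer does not hold.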

## 84) Girishvar Venkat, Signatures of the Contravariant Form on Representations of the Hecke Algebra and Rational Cherednik Algebra associated to $G(r,1,n)$ (15 Jan 2016)

The Hecke algebra and rational Cherednik algebra of the group $G(r,1,n)$ are non-commutative algebras that are deformations of certain classical algebras associated to the group. These algebras have numerous applications in representation theory, number theory, algebraic geometry and integrable systems in quantum physics. Consequently, understanding their irreducible representations is important. If the deformation parameters are generic, then these irreducible representations, called Specht modules in the case of the Hecke algebra and Verma modules in the case of the Cherednik algebra, are in bijection with the irreducible representations of $G(r,1,n)$. However, while every irreducible representation of $G(r,1,n)$ is unitary, the Hermitian contravariant form on the Specht modules and Verma modules may only be non-degenerate. Thus, the signature of this form provides a great deal of information about the representations of the algebras that cannot be seen by looking at the group representations. In this paper, we compute the signature of arbitrary Specht modules of the Hecke algebra and use them to give explicit formulas of the parameter values for which these modules are unitary. We also compute asymptotic limits of existing formulas for the signature character of the polynomial representations of the Cherednik algebra which are vastly simpler than the full signature characters and show that these limits are rational functions in $t$. In addition, we show that for half of the parameter values, for each $k$, the degree $k$ portion of the polynomial representation is unitary for large enough $n$.

## 83) Mehtaab Sawhney (PRIMES) and Jonathan Weed (MIT), Further results on arc and bar k-visibility graphs (arXiv.org, 6 Jan 2016)

We consider visibility graphs involving bars and arcs in which lines of sight can pass through up to k objects. We prove a new edge bound for arc k-visibility graphs, provide maximal constructions for arc and semi-arc k-visibility graphs, and give a complete characterization of semi-arc visibility graphs. We show that the family of arc i-visibility graphs is never contained in the family of bar j-visibility graphs for any i and j, and that the family of bar i-visibility graphs is not contained in the family of bar j-visibility graphs for $i \neq j$. We also give the first thickness bounds for arc and semi-arc k-visibility graphs. Finally, we introduce a model for random semi-bar and semi-arc k-visibility graphs and analyze its properties.

## 82) Harshal Sheth and Aashish Welling, An Implementation and Analysis of a Kernel Network Stack in Go with the CSP Style (30 Dec 2015; arXiv.org, 17 Mar 2016)

Modern operating system kernels are written in lower-level languages such as C. Although the low-level functionalities of C are often useful within kernels, they also give rise to several classes of bugs. Kernels written in higher level languages avoid many of these potential problems, at the possible cost of decreased performance. This research evaluates the advantages and disadvantages of a kernel written in a higher level language. To do this, the network stack subsystem of the kernel was implemented in Go with the Communicating Sequential Processes (CSP) style. Go is a high-level programming language that supports the CSP style, which recommends splitting large tasks into several smaller ones running in independent "threads". Modules for the major networking protocols, including Ethernet, ARP, IPv4, ICMP, UDP, and TCP, were implemented. In this study, the implemented Go network stack, called GoNet, was compared to a representative network stack written in C. The GoNet code is more readable and generally performs better than that of its C stack counterparts. From this, it can be concluded that Go with CSP style is a viable alternative to C for the language of kernel implementations.

## 81) Xiangyao Yu (MIT), Hongzhe Liu (PRIMES), Ethan Zou (PRIMES), and Srini Devadas (MIT), Tardis 2.0: An Optimized Time Traveling Coherence Protocol (arXiv.org, 27 Nov 2015), published in Proceedings of the 2016 International Conference on Parallel Architectures and Compilation (PACT '16), pp. 261-274.

The scalability of cache coherence protocols is a significant challenge in multicore and other distributed shared memory systems. Traditional snoopy and directory-based coherence protocols are difficult to scale up to many-core systems because of the overhead of broadcasting and storing sharers for each cacheline. Tardis, a recently proposed coherence protocol, shows potential in solving the scalability problem, since it only requires O(logN) storage per cacheline for an N-core system and needs no broadcasting support. The original Tardis protocol, however, only supports the sequential consistency memory model. This limits its applicability in real systems since most processors today implement relaxed consistency models like Total Store Order (TSO). Tardis also incurs large network traffic overhead on some benchmarks due to an excessive number of renew messages. Furthermore, the original Tardis protocol has suboptimal performance when the program uses spinning to communicate between threads. In this paper, we address these downsides of Tardis protocol and make it significantly more practical. Specifically, we discuss the architectural, memory system and protocol changes required in order to implement TSO consistency model on Tardis, and prove that the modified protocol satisfies TSO. We also propose optimizations for better leasing policies and to handle program spinning. Evaluated on 20 benchmarks, optimized Tardis at 64 (256) cores can achieve average performance improvement of 15.8% (8.4%) compared to the baseline Tardis and 1% (3.4%) compared to the baseline directory protocol. Our optimizations also reduce the average network traffic by 4.3% (6.1%) compared to the baseline directory protocol. On this set of benchmarks, optimized Tardis improves on a fullmap directory protocol in the metrics of energy, performance and storage, while being simpler to implement.

## 80) Allison Paul, Spectral Inference of a Directed Acyclic Graph Using Pairwise Similarities (11 Nov 2015)

A gene ontology graph is a directed acyclic graph (DAG) which represents relationships among biological processes. Inferring such a graph using a gene similarity matrix is NP-hard in general. Here, we propose an approximate algorithm to solve this problem efficiently by reducing the dimensionality of the problem using spectral clustering. We show that the original problem can be simplified to the inference problem of overlapping clusters in a network. We then solve the simplified problem in two steps: first we infer clusters using a spectral clustering technique. Then, we identify possible overlaps among the inferred clusters by identifying maximal cliques over the cluster similarity graph. We illustrate the effectiveness of our method over various synthetic networks in terms of both the performance and computational complexity compared to existing methods.

## 79) Niket Gowravaram, A Variation of nil-Temperley-Lieb Algebras of type A (26 Sep 2015)

We investigate a variation on the nil-Temperley-Lieb algebras of type A. This variation is formed by removing one of the relations and, in some sense, can be considered as a type B of the algebras. We give a general description of the structure of monomials formed by generators in the algebras. We also show that the dimension of these algebras is the sequence ${2n \choose n}$, by showing that the dimension is the Catalan transform of the sequence $2^n$.
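The final identity can be checked numerically. The sketch below assumes the Catalan transform in the convention $b_n = \sum_{k=1}^{n} \frac{k}{2n-k}\binom{2n-k}{n-k} a_k$ with $b_0 = a_0$; the paper's indexing may differ, and the function name is hypothetical.

```python
from math import comb


def catalan_transform(a):
    """Catalan transform of the finite sequence a (assumed convention, b_0 = a_0)."""
    b = []
    for n in range(len(a)):
        if n == 0:
            b.append(a[0])
            continue
        # k * C(2n-k, n-k) / (2n-k) is a ballot number, so the division is exact.
        b.append(sum(
            k * comb(2 * n - k, n - k) // (2 * n - k) * a[k]
            for k in range(1, n + 1)
        ))
    return b


powers_of_two = [2 ** n for n in range(8)]
central_binomials = [comb(2 * n, n) for n in range(8)]
print(catalan_transform(powers_of_two) == central_binomials)
```

Under that convention the transform of $1, 2, 4, 8, \dots$ begins $1, 2, 6, 20, 70$, matching ${2n \choose n}$.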

## 78) Caleb Ji, Tanya Khovanova (MIT), Robin Park, and Angela Song, Chocolate Numbers (arXiv.org, 21 Sep 2015), published in Journal of Integer Sequences, vol. 19 (2016)

In this paper, we consider a game played on a rectangular $m \times n$ gridded chocolate bar. Each move, a player breaks the bar along a grid line. Each move after that consists of taking any piece of chocolate and breaking it again along existing grid lines, until just $mn$ individual squares remain. This paper enumerates the number of ways to break an $m \times n$ bar, which we call chocolate numbers, and introduces four new sequences related to these numbers. Using various techniques, we prove interesting divisibility results regarding these sequences.
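One way to count the breaking sequences is a recursion on the first break, sketched below. It assumes that two ways differ whenever the ordered sequence of breaks differs, so the break sequences of the two pieces produced by the first break can be interleaved freely; the function name is hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def chocolate(m, n):
    """Number of ordered break sequences reducing an m x n bar to unit squares."""
    if m * n == 1:
        return 1
    total = 0
    remaining = m * n - 2  # breaks still needed after the first one
    for i in range(1, m):  # first break splits the bar into i x n and (m - i) x n
        # An i x n piece needs i*n - 1 breaks; choose their slots among the rest.
        total += comb(remaining, i * n - 1) * chocolate(i, n) * chocolate(m - i, n)
    for j in range(1, n):  # first break splits the bar into m x j and m x (n - j)
        total += comb(remaining, m * j - 1) * chocolate(m, j) * chocolate(m, n - j)
    return total
```

For a $1 \times n$ bar this gives $(n-1)!$, and for $2 \times 2$ and $2 \times 3$ bars it gives 4 and 56.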

## 77) Albert Gerovitch, Andrew Gritsevskiy, and Gregory Barboy, Mobile Health Surveillance: The Development of Software Tools for Monitoring the Spread of Disease (21 Sep 2015)

Disease spread monitoring data often comes with a significant delay and low geospatial resolution. We aim to develop a software tool for data collection, which enables daily monitoring and prediction of the spread of disease in a small community. We have developed a crowdsourcing application that collects users' health statuses and locations. It allows users to update their daily status online, and, in return, provides a visual map of geospatial distribution of sick people in a community, outlining locations with increased disease incidence. Currently, due to the lack of a large user base, we substitute this information with simulated data, and demonstrate our program's capabilities on a hypothetical outbreak. In addition, we use analytical methods for predicting town-level disease spread in the future. We model the disease spread via interpersonal probabilistic interactions on an undirected social graph. The network structure is based on scale-free networks integrated with Census data. The epidemic is modeled using the Susceptible-Infected-Recovered (SIR) model and a set of parameters, including transmission rate and vaccination patterns. The developed application will provide better methods for early detection of epidemics, identify places with high concentrations of infected people, and predict localized disease spread.
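The SIR dynamics on a social graph can be sketched as a discrete-time stochastic simulation. The code below is a simplified illustration, not the authors' tool: the per-step transmission probability `beta`, the recovery probability `gamma`, and the plain adjacency-dictionary graph are all assumptions, and no Census data or scale-free generation is involved.

```python
import random


def simulate_sir(adjacency, initially_infected, beta, gamma, steps, seed=0):
    """Return a list of (S, I, R) counts, one tuple per simulated step."""
    rng = random.Random(seed)
    state = {v: "S" for v in adjacency}
    for v in initially_infected:
        state[v] = "I"
    history = []
    for _ in range(steps):
        new_state = dict(state)
        for v, s in state.items():
            if s == "S":
                infected_neighbors = sum(state[u] == "I" for u in adjacency[v])
                # Each infected neighbor transmits independently with probability beta.
                if rng.random() < 1 - (1 - beta) ** infected_neighbors:
                    new_state[v] = "I"
            elif s == "I" and rng.random() < gamma:
                new_state[v] = "R"
        state = new_state
        values = list(state.values())
        history.append((values.count("S"), values.count("I"), values.count("R")))
    return history


# Hypothetical toy network: a path 0 - 1 - 2 - 3 with one initial case.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(simulate_sir(path, [0], beta=0.5, gamma=0.2, steps=10))
```

Vaccination could be modeled in this sketch by starting some vertices in state "R".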

## 76) Niket Gowravaram and Tanya Khovanova (MIT), On the Structure of nil-Temperley-Lieb Algebras of type A (arXiv.org, 1 Sep 2015)

We investigate nil-Temperley-Lieb algebras of type A. We give a general description of the structure of monomials formed by the generators. We also show that the dimensions of these algebras are the famous Catalan numbers by providing a bijection between the monomials and Dyck paths. We show that the distribution of these monomials by degree is the same as the distribution of Dyck paths by the sum of the heights of the peaks minus the number of peaks.
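The Dyck-path side of this statement is easy to tabulate. The sketch below enumerates Dyck paths as strings of `U` and `D` steps and computes the statistic "sum of peak heights minus number of peaks"; it does not model the algebra's monomials, and the function names are hypothetical.

```python
from collections import Counter


def dyck_paths(n):
    """Yield all Dyck paths of semilength n as strings over 'U' and 'D'."""
    def extend(path, ups, downs):
        if ups == n and downs == n:
            yield path
            return
        if ups < n:
            yield from extend(path + "U", ups + 1, downs)
        if downs < ups:
            yield from extend(path + "D", ups, downs + 1)
    yield from extend("", 0, 0)


def peak_statistic(path):
    """Sum of the heights of the peaks minus the number of peaks."""
    height = total = peaks = 0
    for step, nxt in zip(path, path[1:]):
        if step == "U":
            height += 1
            if nxt == "D":  # an up-step followed by a down-step is a peak
                total += height
                peaks += 1
        else:
            height -= 1
    return total - peaks


def statistic_distribution(n):
    """Map each statistic value to the number of Dyck paths attaining it."""
    return Counter(peak_statistic(p) for p in dyck_paths(n))
```

The counts in each distribution sum to the Catalan number for that semilength, for example 5 paths for semilength 3.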

## 75) Tanya Khovanova (MIT) and Karan Sarkar, P-positions in Modular Extensions to Nim (arXiv.org, 27 Aug 2015), published in International Journal of Game Theory, vol. 46 (2017)

In this paper, we consider a modular extension to the game of Nim, which we call $m$-Modular Nim, and explore its optimal strategy. In $m$-Modular Nim, a player can either make a standard Nim move or remove a multiple of $m$ tokens in total. We develop a winning strategy for all $m$ with $2$ heaps and for odd $m$ with any number of heaps.
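Small cases can be explored by brute force. The sketch below computes the P-positions (positions from which the player to move loses) for two heaps, reading the modular move as "remove a positive multiple of $m$ tokens in total, split across the heaps in any way". That reading and the function name are assumptions, not a restatement of the paper's strategy.

```python
from functools import lru_cache


def p_positions(m, limit):
    """P-positions (a, b) with a <= b <= limit in two-heap m-Modular Nim."""
    @lru_cache(maxsize=None)
    def is_p(a, b):
        moves = set()
        for x in range(1, a + 1):  # standard Nim move on the first heap
            moves.add((a - x, b))
        for y in range(1, b + 1):  # standard Nim move on the second heap
            moves.add((a, b - y))
        for x in range(a + 1):     # modular move: x + y is a positive multiple of m
            for y in range(b + 1):
                if x + y > 0 and (x + y) % m == 0:
                    moves.add((a - x, b - y))
        # A position is a P-position when every move leads to an N-position.
        return not any(is_p(*nxt) for nxt in moves)

    return sorted(
        (a, b)
        for a in range(limit + 1)
        for b in range(a, limit + 1)
        if is_p(a, b)
    )
```

When $m$ exceeds the total number of tokens no modular move exists, so the output reduces to the familiar equal-heap P-positions of ordinary Nim.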

## 74) Nicholas Diaco and Tanya Khovanova (MIT), Weighing Coins and Keeping Secrets (arXiv.org, 20 Aug 2015), published in Mathematical Intelligencer (September 2016)

In this expository paper we discuss a relatively new counterfeit coin problem with an unusual goal: maintaining the privacy of, rather than revealing, counterfeit coins in a set of both fake and real coins. We introduce two classes of solutions to this problem --- one that respects the privacy of all the coins and one that respects the privacy of only the fake coins --- and give several results regarding each. We describe and generalize 6 unique strategies that fall into these two categories. Furthermore, we explain conditions for the existence of a solution, as well as showing proof of a solution's optimality in select cases. In order to quantify exactly how much information is revealed by a given solution, we also define the revealing factor and revealing coefficient; these two values additionally act as a means of comparing the relative effectiveness of different solutions. Most importantly, by introducing an array of new concepts, we lay the foundation for future analysis of this very interesting problem, as well as many other problems related to privacy and the transfer of information.

## 73) Luke Sciarappa, Simple commutative algebras in Deligne's categories Rep($S_t$) (arXiv.org, 24 Jun 2015)

We show that in the Deligne categories $\mathrm{Rep}(S_t)$ for $t$ a transcendental number, the only simple algebra objects are images of simple algebras in the category of representations of a symmetric group under a canonical induction functor. They come in families which interpolate the families of algebras of functions on the cosets of $H\times S_{n-k}$ in $S_n$, for a fixed subgroup $H$ of $S_k$.

## 2014 Research Papers

## 72) Geoffrey Fudenberg (Harvard), Maxim Imakaev (MIT), Carolyn Lu (PRIMES), Anton Goloborodko (MIT), Nezar Abdennur (MIT), and Leonid Mirny (MIT), Formation of Chromosomal Domains by Loop Extrusion (bioRxiv, 14 Aug 2015), published in Cell Reports 15:9 (31 May 2016): 2038–2049.

Characterizing how the three-dimensional organization of eukaryotic interphase chromosomes modulates regulatory interactions is an important contemporary challenge. Here we propose an active process underlying the formation of chromosomal domains observed in Hi-C experiments. In this process, cis-acting factors extrude progressively larger loops, but stall at domain boundaries; this dynamically forms loops of various sizes within but not between domains. We studied this mechanism using a polymer model of the chromatin fiber subject to loop extrusion dynamics. We find that systems of dynamically extruded loops can produce domains as observed in Hi-C experiments. Our results demonstrate the plausibility of the loop extrusion mechanism, and posit potential roles of cohesin complexes as a loop-extruding factor, and CTCF as an impediment to loop extrusion at domain boundaries.

## 71) Kavish Gandhi, Maximal Monochromatic Geodesics in an Antipodal Coloring of Hypercube (4 April 2015)

A geodesic in the hypercube is the shortest possible path between two vertices. Leader and Long (2013) conjectured that, in every antipodal $2$-coloring of the edges of the hypercube, there exists a monochromatic geodesic between antipodal vertices. For this and an equivalent conjecture, we prove the cases $n = 2, 3, 4, 5$. We also examine the maximum number of monochromatic geodesics of length $k$ in an antipodal $2$-coloring and find it to be $2^{n-1}(n-k+1)\binom{n-1}{k-1}(k-1)!$. In this case, we classify all colorings in which this maximum occurs. Furthermore, we explore the maximum number of antipodal geodesics in a subgraph of the hypercube with a fixed proportion of edges, providing a conjectured optimal configuration as a lower bound, which, interestingly, contains a constant proportion of geodesics with respect to $n$. Finally, we present a series of smaller results that could be of use in finding an upper bound on the maximum number of antipodal geodesics in such a subgraph of the hypercube.

## 70) Jesse Geneson (MIT) and Peter M. Tian (PRIMES), Sequences of formation width $4$ and alternation length $5$ (arXiv.org, 13 Feb 2015)

Sequence pattern avoidance is a central topic in combinatorics. A sequence $s$ contains a sequence $u$ if some subsequence of $s$ can be changed into $u$ by a one-to-one renaming of its letters. If $s$ does not contain $u$, then $s$ avoids $u$. A widely studied extremal function related to pattern avoidance is $Ex(u, n)$, the maximum length of an $n$-letter sequence that avoids $u$ and has every $r$ consecutive letters pairwise distinct, where $r$ is the number of distinct letters in $u$. We bound $Ex(u, n)$ using the formation width function, $fw(u)$, which is the minimum $s$ for which there exists $r$ such that any concatenation of $s$ permutations, each on the same $r$ letters, contains $u$. In particular, we identify every sequence $u$ such that $fw(u)=4$ and $u$ contains $ababa$. The significance of this result lies in its implication that, for every such sequence $u$, we have $Ex(u, n) = \Theta(n \alpha(n))$, where $\alpha(n)$ denotes the incredibly slow-growing inverse Ackermann function. We have thus identified the extremal function of many infinite classes of previously unidentified sequences.
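The containment relation defined above can be checked directly by brute force. This sketch tries every subsequence of $s$ of the right length and tests whether a one-to-one renaming turns it into $u$; it is exponential and meant only to make the definition concrete. The function names are hypothetical.

```python
from itertools import combinations


def is_isomorphic(t, u):
    """True if some one-to-one renaming of letters turns t into u."""
    if len(t) != len(u):
        return False
    forward, backward = {}, {}
    for x, y in zip(t, u):
        # setdefault records the first pairing and detects any later conflict.
        if forward.setdefault(x, y) != y or backward.setdefault(y, x) != x:
            return False
    return True


def contains(s, u):
    """True if some subsequence of s is isomorphic to u."""
    return any(
        is_isomorphic([s[i] for i in idx], u)
        for idx in combinations(range(len(s)), len(u))
    )


def avoids(s, u):
    return not contains(s, u)
```

For instance, `contains("cdcdc", "ababa")` holds through the renaming c to a and d to b, while `"abcabc"` avoids `"ababa"`.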

## 69) William Wu (PRIMES), Nicolaas Kaashoek (PRIMES), Matthew Weinberg (MIT), Christos Tzamos (MIT), and Costis Daskalakis (MIT), Game Theory based Peer Grading Mechanisms for MOOCs, paper for the Learning at Scale 2015 conference, March 14-18, 2015, Vancouver, BC, Canada (4 February 2015)

An efficient peer grading mechanism is proposed for grading the multitude of assignments in online courses. This novel approach is based on game theory and mechanism design. A set of assumptions and a mathematical model are ratified to simulate the dominant strategy behavior of students in a given mechanism. A benchmark function accounting for grade accuracy and workload is established to quantitatively compare effectiveness and scalability of various mechanisms. After multiple iterations of mechanisms under increasingly realistic assumptions, three are proposed: Calibration, Improved Calibration, and Deduction. The Calibration mechanism performs as predicted by game theory when tested in an online crowd-sourced experiment, but fails when students are assumed to communicate. The Improved Calibration mechanism addresses this assumption, but at the cost of more effort spent grading. The Deduction mechanism performs relatively well in the benchmark, outperforming the Calibration, Improved Calibration, traditional automated, and traditional peer grading systems. The mathematical model and benchmark open the way for future derivative works to be performed and compared.

## 68) Alexandria Yu, Towards the classification of unital 7-dimensional commutative algebras (19 Jan 2015)

An algebra is a vector space with a compatible product operation. An algebra is called commutative if the product of any two elements is independent of the order in which they are multiplied. A basic problem is to determine how many unital commutative algebras exist in a given dimension and to find all of these algebras. This classification problem has its origin in number theory and algebraic geometry. For dimension less than or equal to 6, Poonen has completely classified all unital commutative algebras up to isomorphism. For dimension greater than or equal to 7, the situation is much more complicated due to the fact that there are infinitely many algebras up to isomorphism. The purpose of this work is to develop new techniques to classify unital 7-dimensional commutative algebras up to isomorphism. An algebra is called local if there exists a unique maximal ideal $m$. Local algebras are basic building blocks for general algebras as any finite dimensional unital commutative algebra is isomorphic to a direct sum of finite dimensional unital commutative local algebras. Hence, in order to classify all finite dimensional unital commutative algebras, it suffices to classify all finite dimensional unital commutative local algebras. In this article, we classify all unital 7-dimensional commutative local algebras up to isomorphism with the exception of the special case $k_1 = 3$ and $k_2 = 3$, where, for each positive integer $i$, $m^i$ is the subalgebra generated by products of $i$ elements in the maximal ideal $m$ and $k_i$ is the dimension of the quotient algebra $m^i / m^{i+1}$. When $k_2 = 1$, we classify all finite dimensional unital commutative local algebras up to isomorphism. As a byproduct of our classification theorems, we discover several new classes of unital finite dimensional commutative algebras.

## 67) Niket Gowravaram and Uma Roy, Diagrammatic Calculus of Coxeter and Braid Groups (arXiv.org, 15 Mar 2015)

We investigate a novel diagrammatic approach to examining strict actions of a Coxeter group or a braid group on a category. This diagrammatic language, which was developed in a series of papers by Elias, Khovanov and Williamson, provides new tools and methods to attack many problems of current interest in representation theory. In our research we considered a particular problem which arises in this context. To a Coxeter group $W$ one can associate a real hyperplane arrangement, and can consider the complement of these hyperplanes in the complexification $Y_W$. The celebrated $K(\pi,1)$ conjecture states that $Y_W$ should be a classifying space for the pure braid group, and thus a natural quotient ${Y_W}/{W}$ should be a classifying space for the braid group. Salvetti provided a cell complex realization of the quotient, which we refer to as the Salvetti complex. In this paper we investigate a part of the $K(\pi,1)$ conjecture, which we call the $K(\pi,1)$ conjecturette, that states that the second homotopy group of the Salvetti complex is trivial. In this paper we present a diagrammatic proof of the $K(\pi,1)$ conjecturette for a family of braid groups as well as an analogous result for several families of Coxeter groups.

## 66) Arjun Khandelwal, Compact dot representations in permutation avoidance (3 Mar 2015)

A paper by Eriksson et al. (2001) introduced a new form of representing a permutation, referred to as the compact dot representation, with the goal of constructing a smaller superpattern. We study this representation and give bounds on its size. We also consider a variant of the problem, where limitations on the alphabet size are imposed, and obtain lower bounds. Lastly, we consider the Mobius function of the poset of permutations ordered by containment.

## 65) Suzy Lou and Max Murin, On the Strongly Regular Graph of Parameters (99, 14, 1, 2) (9 Jan 2015)

In an attempt to find a strongly regular graph of parameters (99, 14, 1, 2) or to disprove its existence, we studied its possible substructure and constructions.

## 64) Shashwat Kishore (PRIMES) and Augustus Lonergan (MIT), Signatures of Multiplicity Spaces in Tensor Products of $\mathfrak{sl}_2$ and $U_q(\mathfrak{sl}_2)$ Representations (9 Jan 2015; arXiv.org, 8 Jun 2015)

We study multiplicity space signatures in tensor products of $\mathfrak{sl}_2$ and $U_q(\mathfrak{sl}_2)$ representations and their applications. We completely classify definite multiplicity spaces for generic tensor products of $\mathfrak{sl}_2$ Verma modules. This provides a classification of a family of unitary representations of a basic quantized quiver variety, one of the first such classifications for any quantized quiver variety. We use multiplicity space signatures to provide the first real critical point lower bound for generic $\mathfrak{sl}_2$ master functions. As a corollary of this bound, we obtain a simple and asymptotically correct approximation for the number of real critical points of a generic $\mathfrak{sl}_2$ master function. We obtain a formula for multiplicity space signatures in tensor products of finite dimensional simple $U_q(\mathfrak{sl}_2)$ representations. Our formula also gives multiplicity space signatures in generic tensor products of $\mathfrak{sl}_2$ Verma modules and generic tensor products of real $U_q(\mathfrak{sl}_2)$ Verma modules. Our results have relations with knot theory, statistical mechanics, quantum physics, and geometric representation theory.

## 63) Joseph Zurier, Generalizations of the Joints Problem (9 Jan 2015)

In this paper we explore generalizations of the joints problem introduced by B. Chazelle et al.

## 62) Nathan Wolfe (PRIMES), Ethan Zou (PRIMES), Ling Ren (MIT), and Xiangyao Yu (MIT), Optimizing Path ORAM for Cloud Storage Applications (arXiv.org, 8 Jan 2015)

We live in a world where our personal data are both valuable and vulnerable to misappropriation through exploitation of security vulnerabilities in online services. For instance, Dropbox, a popular cloud storage tool, has certain security flaws that can be exploited to compromise a user's data, one of which is that a user's access pattern is unprotected. We have thus created an implementation of Path Oblivious RAM (Path ORAM) for Dropbox users to obfuscate path access information to patch this vulnerability. This implementation differs significantly from the standard usage of Path ORAM, in that we introduce several innovations, including a dynamically growing and shrinking tree architecture, multi-block fetching, block packing and the possibility for multi-client use. Our optimizations together produce about a 77% throughput increase and a 60% reduction in necessary tree size; these numbers vary with file size distribution.

## 61) Brice Huang, Monomization of Power Ideals and Generalized Parking Functions (8 Jan 2015)

A power ideal is an ideal in a polynomial ring generated by powers of homogeneous linear forms. Power ideals arise in many areas of mathematics, including the study of zonotopes, approximation theory, and fat point ideals; in particular, their applications in approximation theory are relevant to work on splines and pertinent to mathematical modeling, industrial design, and computer graphics. For this reason, understanding the structure of power ideals, especially their Hilbert series, is an important problem. Unfortunately, due to the computational complexity of power ideals, this is a difficult problem. Only a few cases of this problem have been solved; efficient ways to compute the Hilbert series of a power ideal are known only for power ideals of certain forms. In this paper, we find an efficient way to compute the Hilbert series of a class of power ideals.

## 60) Kyle Gettig, Linear Extensions of Acyclic Orientations (7 Jan 2015)

Given a graph, an acyclic orientation of the edges determines a partial ordering of the vertices. This partial ordering has a number of linear extensions, i.e. total orderings of the vertices that agree with the partial ordering. The purpose of this paper is twofold. Firstly, properties of the orientation that induces the maximum number of linear extensions are investigated. Due to similarities between the optimal orientation in simple cases and the solution to the Max-Cut Problem, the possibility of a correlation is explored, though with minimal success. Correlations are then explored between the optimal orientation of a graph G and the comparability graphs with the minimum number of edges that contain G as a subgraph, as well as to certain graphical colorings induced by the orientation. Specifically, small cases of non-comparability graphs are investigated and compared to the known results for comparability graphs. We then explore the optimal orientation for odd anti-cycles and related graphs, proving that the conjectured orientations are optimal in the odd anti-cycle case. In the second part of this paper, the above concepts are extended to random graphs, that is, graphs with probabilities associated with each edge. New definitions and theorems are introduced to create a more intuitive system that agrees with the discrete case when all probabilities are 0 or 1, though complete results for this new system would be much more difficult to prove.
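To make the notion of a linear extension concrete, the following is a minimal sketch that counts the linear extensions of the partial order induced by an acyclic orientation. It is a plain dynamic program over vertex subsets, not the method of the paper; the function name `count_linear_extensions` and the edge-list encoding (`(u, v)` meaning the edge is oriented from `u` to `v`) are assumptions made for this illustration.

```python
from functools import lru_cache


def count_linear_extensions(n, oriented_edges):
    """Count total orderings of vertices 0..n-1 in which u precedes v
    for every oriented edge (u, v). Returns 0 if the orientation has a cycle."""
    # preds[v] is a bitmask of the vertices that must appear before v.
    preds = [0] * n
    for u, v in oriented_edges:
        preds[v] |= 1 << u
    full = (1 << n) - 1

    @lru_cache(maxsize=None)
    def count(placed):
        # placed is a bitmask of the vertices already put into the ordering.
        if placed == full:
            return 1
        total = 0
        for v in range(n):
            already_placed = (placed >> v) & 1
            preds_ready = (preds[v] & ~placed) == 0
            if not already_placed and preds_ready:
                total += count(placed | (1 << v))
        return total

    return count(0)


# Path on three vertices 0 - 1 - 2 with two different acyclic orientations.
print(count_linear_extensions(3, [(0, 1), (2, 1)]))  # both edges point into vertex 1
print(count_linear_extensions(3, [(0, 1), (1, 2)]))  # edges form a directed path
```

On this small path, the orientation with both edges pointing into the middle vertex admits two linear extensions, while the directed path admits only one, which illustrates why the choice of orientation matters.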

## 59) Shyam Narayanan, Improving the Speed and Accuracy of the Miller-Rabin Primality Test (7 Jan 2015)

In this paper, we discuss the accuracy of the Miller-Rabin Primality Test and the number of nonwitnesses for a composite odd integer $n$.
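As an illustration of what a nonwitness is, here is a minimal sketch of a single Miller-Rabin round together with a brute-force listing of the bases that fail to expose a composite. This is the textbook test, not the improved version studied in the paper; the function names are assumptions made for this example.

```python
def passes_miller_rabin_round(n, a):
    """Return True if odd n > 2 passes one Miller-Rabin round with base a."""
    # Write n - 1 = 2^s * d with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    x = pow(a, d, n)
    if x == 1 or x == n - 1:
        return True
    # Square repeatedly, looking for -1 modulo n.
    for _ in range(s - 1):
        x = pow(x, 2, n)
        if x == n - 1:
            return True
    return False


def nonwitnesses(n):
    """List the bases 1 <= a < n for which odd n > 2 passes the round.
    For composite n these are the bases that fail to prove n composite."""
    return [a for a in range(1, n) if passes_miller_rabin_round(n, a)]


print(nonwitnesses(9))        # [1, 8]
print(len(nonwitnesses(25)))  # 4
```

For an odd composite $n$, the classical bound states that at most $(n-1)/4$ of the bases are nonwitnesses, which is why repeating the round with independent random bases drives the error probability down quickly.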

## 58) Peter M. Tian, Extremal Functions of Forbidden Multidimensional Matrices (7 Jan 2015)

We advance the extremal theory of matrices in two directions. The methods that we use come from combinatorics, probability, and analysis.

## 57) Eric Neyman, Cylindric Young Tableaux and their Properties (7 Jan 2015; earlier version on arXiv.org, 19 Oct 2014)

Cylindric Young tableaux are combinatorial objects that first appeared in the 1990s. A natural extension of the classical notion of a Young tableau, they have since been used several times, most notably by Gessel and Krattenthaler and by Alexander Postnikov. Despite this, relatively little is known about cylindric Young tableaux. This paper is an investigation of the properties of this object. In this paper, we extend the Robinson-Schensted-Knuth Correspondence, a well-known and very useful bijection concerning regular Young tableaux, to be a correspondence between pairs of cylindric tableaux. We use this correspondence to reach further results about cylindric tableaux. We then establish an interpretation of cylindric tableaux in terms of a game involving marble-passing. Next, we demonstrate a generic method to use results concerning cylindric tableaux in order to prove results about skew Young tableaux. We finish with a note on Knuth equivalence and its analog for cylindric tableaux.

## 56) Yilun Du, On the Algorithmic and Theoretical Exploration of Tiling-Harmonic Functions (6 Jan 2015)

In this paper, we explore a new class of harmonic functions defined on a tiling $T$, a square tiling of a region $D$ in $\mathbb{C}$. We define these functions as tiling harmonic functions. We develop an efficient algorithm for computing interior values of tiling harmonic functions and graph harmonic functions in a tiling. Using our algorithm, we find that tiling harmonic functions are not, in general, equivalent to graph harmonic functions. In addition, we prove some theoretical results on the structure of tiling harmonic functions and classify one type of tiling harmonic function.

## 55) Jessica Li, On the Modeling of Snowflake Growth Using Hexagonal Automata (2 Jan 2015; arXiv.org, 8 May 2015; published (with Laura P. Schaposnik) in Physical Review E 93:2 (Feb. 2016))

Snowflake growth is an example of crystallization, a basic phase transition in physics. Studying snowflake growth helps gain fundamental understanding of this basic process and may help produce better crystalline materials and benefit several major industries. The basic theoretical physical mechanisms governing the growth of snowflakes are not well understood: whilst current computer modeling methods can generate snowflake images that successfully capture some basic features of actual snowflakes, so far there has been no analysis of these computer models in the literature, and more importantly, certain fundamental features of snowflakes are not well understood. A key challenge of analysis is that the snowflake growth models consist of a large set of partial difference equations, and as in many chaos theory problems, rigorous study is difficult. In this paper we analyze a popular model (Reiter’s model) using a combined approach of mathematical analysis and numerical simulation. We divide a snowflake image into main branches and side branches and define two new variables (growth latency and growth direction) to characterize the growth patterns. We derive a closed form solution of the main branch growth latency using a one dimensional linear model, and compare it with the simulation results using the hexagonal automata. We discover a few interesting patterns of the growth latency and direction of side branches. On the basis of the analysis and the principle of surface free energy minimization, we propose a new geometric rule to incorporate interface control, a basic mechanism of crystallization that is not taken into account in Reiter’s original model.

## 54) Amy Chou and Justin Kaashoek, PuzzleJAR: Automated Constraint-based Generation of Puzzles of Varying Complexity (30 Sept 2014)

Engaging students in practicing a wide range of problems facilitates their learning. However, generating fresh problems that have specific characteristics, such as using a certain set of concepts or being of a given difficulty level, is a tedious task for a teacher. In this paper, we present PuzzleJAR, a system that is based on an iterative constraint-based technique for automatically generating problems. The PuzzleJAR system takes as parameters the problem definition, the complexity function, and domain-specific semantics-preserving transformations. We present an instantiation of our technique with automated generation of Sudoku and Fillomino puzzles, and we are currently extending our technique to generate Python programming problems. Since defining complexities of Sudoku and Fillomino puzzles is still an open research question, we developed our own mechanism to define complexity, using machine learning to generate a function for difficulty from puzzles with already known difficulties. Using this technique, PuzzleJAR generated over 200,000 Sudoku puzzles of different sizes (9x9, 16x16, 25x25) and over 10,000 Fillomino puzzles of sizes ranging from 2x2 to 16x16.

## 53) Tanya Khovanova, Eric Nie, and Alok Puranik, The Sierpinski Triangle and The Ulam-Warburton Automaton (arXiv.org, 25 Aug 2014), published in Math Horizons (September 2015), reprinted in The Best Writing on Mathematics 2016

This paper is about the beauty of fractals and the surprising connections between them. We will explain the pioneering role that the Sierpinski triangle plays in the Ulam-Warburton automata and show you a number of pictures along the way.

## 52) Tanya Khovanova and Joshua Xiong, Cookie Monster Plays Games (arXiv.org, 6 July 2014), published in College Mathematics Journal 46:4 (2015): 283-293

We research a combinatorial game based on the Cookie Monster problem, called the Cookie Monster game, that generalizes the games of Nim and Wythoff. We also propose several combinatorial games that are in between the Cookie Monster game and Nim. We discuss properties of P-positions of all of these games. Each section consists of two parts: the first part is a story presented from the Cookie Monster's point of view, and the second part is a more abstract discussion of the same ideas by the authors.

## 51) Tanya Khovanova and Joshua Xiong, Nim Fractals (arXiv.org, 23 May 2014), published in Journal of Integer Sequences, Vol. 17 (2014)

We enumerate P-positions in the game of Nim in two different ways. In one series of sequences we enumerate them by the maximum number of counters in a pile. In another series of sequences we enumerate them by the total number of counters. We show that the game of Nim can be viewed as a cellular automaton, where the total number of counters divided by 2 can be considered as a generation in which P-positions are born. We prove that the three-pile Nim sequence enumerated by the total number of counters is a famous toothpick sequence based on the Ulam-Warburton cellular automaton. We introduce 10 new sequences.
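The enumeration by total number of counters can be reproduced by brute force for small cases. The sketch below relies on the standard fact that a Nim position is a P-position exactly when the bitwise XOR of its pile sizes is zero; it counts ordered triples of pile sizes, and the function name is an assumption made for this example.

```python
def count_three_pile_p_positions(total):
    """Count ordered triples (a, b, c) of non-negative pile sizes with
    a + b + c == total whose Nim-sum (bitwise XOR) is zero."""
    count = 0
    for a in range(total + 1):
        for b in range(total - a + 1):
            c = total - a - b
            if a ^ b ^ c == 0:
                count += 1
    return count


# P-positions only occur for even totals, so index them by total / 2.
print([count_three_pile_p_positions(2 * g) for g in range(8)])
# [1, 3, 3, 9, 3, 9, 9, 27]
```

Each count equals 3 raised to the number of ones in the binary expansion of the generation index, since every set bit of the index can be shared by any one of the three pairs of piles.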

## 50) Noah Golowich, Resolving a Conjecture on Degree of Regularity of Linear Homogeneous Equations (arXiv.org, 13 Apr 2014), published in The Electronic Journal of Combinatorics 21:3 (2014)

A linear equation is $r$-regular, if, for every $r$-coloring of the positive integers, there exist positive integers of the same color which satisfy the equation. In 2005, Fox and Radoičić conjectured that the equation $x_1 + 2x_2 + \cdots + 2^{n-2}x_{n-1} - 2^{n-1}x_n = 0$, for any $n \geq 2$, has a degree of regularity of $n-1$, which would verify a conjecture of Rado from 1933. Rado's conjecture has since been verified with a different family of equations. In this paper, we show that Fox and Radoičić's family of equations indeed has a degree of regularity of $n-1$. We also prove a few extensions of this result.

## 2013 Research Papers

## 49) Ritesh Ragavender, Odd Dunkl Operators and NilHecke Algebras (30 May 2014)

Symmetric functions appear in many areas of mathematics and physics, including enumerative combinatorics, the representation theory of symmetric groups, statistical mechanics, and the quantum statistics of ideal gases. In the commutative (or “even”) case of these symmetric functions, Kostant and Kumar introduced a nilHecke algebra that categorifies the quantum group $U_q(\mathfrak{sl}_2)$. This categorification helps to better understand Khovanov homology, which has important applications in studying knot polynomials and gauge theory. Recently, Ellis and Khovanov initiated the program of “oddification” as an effort to create a representation theoretic understanding of a new “odd” Khovanov homology, which often yields more powerful results than regular Khovanov homology. In this paper, we contribute towards the project of oddification by studying the odd Dunkl operators of Khongsap and Wang in the setting of the odd nilHecke algebra. Specifically, we show that odd divided difference operators can be used to construct odd Dunkl operators, which we use to give a representation of $\mathfrak{sl}_2$ on the algebra of skew polynomials and evaluate the odd Dunkl Laplacian. We then investigate $q$-analogs of divided difference operators to introduce new algebras that are similar to the even and odd nilHecke algebras and act on $q$-symmetric polynomials. We describe such algebras for all previously unstudied values of $q$. We conclude by generalizing a diagrammatic method and developing the novel method of insertion in order to study $q$-symmetric polynomials from the perspective of bialgebras.

## 48) Gabriella Studt, Construction of the higher Bruhat order on the Weyl group of type B (27 May 2014)

Manin and Schechtman defined the Bruhat order on the type A Weyl group, which is closely associated to the symmetric group $S_n$, as the order of all pairs of numbers in $\{1, 2, \ldots, n\}$. They proceeded to define a series of higher orders. Each higher order is an order on the subsets of $\{1, 2, \ldots, n\}$ of size $k$, and can be computed using an inductive argument. It is also possible to define each of these higher orders explicitly, and therefore know conclusively the lexicographic orders for all $k$. It is thought that a closely related concept of lexicographic order exists for the Weyl group of type B, and that a similar method can be used to compute this series of higher orders. The applicability of this method is demonstrated in the paper, and we are able to determine and characterize the higher Bruhat order explicitly for certain $n$ and $k$. We therefore conjecture the existence of such an order for all $n > k$, as well as its accompanying properties.

## 47) Jeffrey Cai, Orbits of a fixed-point subgroup of the symplectic group on partial flag varieties of type A (24 May 2014)

In this paper we compute the orbits of the symplectic group $Sp_{2n}$ on partial flag varieties $GL_{2n}/P$ and on partial flag varieties enhanced by a vector space, $\mathbb{C}^{2n} \times GL_{2n}/P$. This extends analogous results proved by Matsuki on full flags. The general technique used in this paper is to take the orbits in the full flag case and determine which orbits remain distinct when the full flag variety $GL_{2n}/B$ is projected down to the partial flag variety $GL_{2n}/P$.

The recent discovery of a connection between abstract algebra and the classical combinatorial Robinson-Schensted (RS) correspondence has sparked research on related algebraic structures and relationships to new combinatorial bijections, such as the Robinson-Schensted-Knuth (RSK) correspondence, the "mirabolic" RSK correspondence, and the "exotic" RS correspondence. We conjecture an exotic RSK correspondence between the orbits described in this paper and semistandard bi-tableaux, which would yield an extension to the exotic RS correspondence found in a paper of Henderson and Trapa.

## 46) John Long, Evidence of Purifying Selection in Mammals (9 May 2014)

The Human Genome Project completed in 2003 gave us a reference genome for the human species. Before the project was completed, it was believed that the primary function of DNA was to code for protein. However, it was discovered that only 2% of the genome consists of regions that code for proteins. The remaining regions of the genome are either functional regions that regulate the coding regions or junk DNA regions that do nothing. The distinction between these two types of regions is not completely clear. Evidence of purifying selection, the decrease in frequency of deleterious mutations, is likely a sign that a region is functional. The goal of this project was to find evidence of purifying selection in newly acquired regions in the human genome that are hypothesized to be functional. The mean Derived Allele Frequency of the featured regions was compared to that of control regions to determine the likelihood of selection.

## 45) Ravi Jagadeesan, A new $\operatorname{Gal}(\overline{\mathbb{Q}}/\mathbb{Q})$-invariant of dessins d'enfants (arXiv.org, 30 March 2014)

We study the action of $\operatorname{Gal}(\overline{\mathbb{Q}}/\mathbb{Q})$ on the category of Belyi functions (finite, étale covers of $\mathbb{P}^1_{\overline{\mathbb{Q}}}\setminus \{0,1,\infty\}$). We describe a new combinatorial $\operatorname{Gal}(\overline{\mathbb{Q}}/\mathbb{Q})$-invariant for a certain class of Belyi functions. As a corollary, we obtain that for all $k < 2^{\sqrt{\frac{2}{3}}}$ and all positive integers $N$, there is an $n \le N$ such that the set of degree $n$ Belyi functions of a particular rational Nielsen class must split into at least $\Omega\left(k^{\sqrt{N}}\right)$ Galois orbits. In addition, we define a new version of the Grothendieck-Teichmüller group $\widehat{GT}$ into which $\operatorname{Gal}(\overline{\mathbb{Q}}/\mathbb{Q})$ embeds.

## 44) Andrey Grinshpun (MIT), Raj Raina (PRIMES), and Rik Sengupta (MIT), Minimum Degrees of Minimal Ramsey Graphs for Almost-Cliques (arXiv.org, 26 Jun 2014)

For graphs $F$ and $H$, we say $F$ is Ramsey for $H$ if every $2$-coloring of the edges of $F$ contains a monochromatic copy of $H$. The graph $F$ is Ramsey $H$-minimal if $F$ is Ramsey for $H$ and there is no proper subgraph $F'$ of $F$ so that $F'$ is Ramsey for $H$. Burr, Erdös, and Lovasz defined $s(H)$ to be the minimum degree of $F$ over all Ramsey $H$-minimal graphs $F$. Define $H_{t,d}$ to be a graph on $t+1$ vertices consisting of a complete graph on $t$ vertices and one additional vertex of degree $d$. We show that $s(H_{t,d})=d^2$ for all values $1<d\le t$; it was previously known that $s(H_{t,1})=t-1$, so it is surprising that $s(H_{t,2})=4$ is much smaller. We also make some further progress on some sparser graphs. Fox and Lin observed that $s(H)\ge 2\delta(H)-1$ for all graphs $H$, where $\delta(H)$ is the minimum degree of $H$; Szabo, Zumstein, and Zurcher investigated which graphs have this property and conjectured that all bipartite graphs $H$ without isolated vertices satisfy $s(H)=2\delta(H)-1$. Fox, Grinshpun, Liebenau, Person, and Szabo further conjectured that all triangle-free graphs without isolated vertices satisfy this property. We show that $d$-regular $3$-connected triangle-free graphs $H$, with one extra technical constraint, satisfy $s(H) = 2\delta(H)-1$; the extra constraint is that $H$ has a vertex $v$ so that if one removes $v$ and its neighborhood from $H$, the remainder is connected.

## 43) Boryana Doyle (PRIMES), Geoffrey Fudenberg (Harvard), Maxim Imakaev (MIT), and Leonid Mirny (MIT), Chromatin Loops as Allosteric Modulators of Enhancer-Promoter Interactions, published in PLoS Computational Biology (23 Oct 2014; earlier version in BioRxiv.org, 26 February 2014)

The classic model of eukaryotic gene expression requires direct spatial contact between a distal enhancer and a proximal promoter. Recent Chromosome Conformation Capture (3C) studies show that enhancers and promoters are embedded in a complex network of looping interactions. Here we use a polymer model of chromatin fiber to investigate whether, and to what extent, looping interactions between elements in the vicinity of an enhancer-promoter pair can influence their contact frequency. Our equilibrium polymer simulations show that a chromatin loop, formed by elements flanking either an enhancer or a promoter, suppresses enhancer-promoter interactions, working as an insulator. A loop formed by elements located in the region between an enhancer and a promoter, on the contrary, facilitates their interactions. We find that different mechanisms underlie insulation and facilitation; insulation occurs due to steric exclusion by the loop, and is a global effect, while facilitation occurs due to an effective shortening of the enhancer-promoter genomic distance, and is a local effect. Consistently, we find that these effects manifest quite differently for in silico 3C and microscopy. Our results show that looping interactions that do not directly involve an enhancer-promoter pair can nevertheless significantly modulate their interactions. This phenomenon is analogous to allosteric regulation in proteins, where a conformational change triggered by binding of a regulatory molecule to one site affects the state of another site.

## 42) William Kuszmaul, A New Approach to Enumerating Statistics Modulo n (arXiv.org, 16 February 2014)

We find a new approach to computing the remainder of a polynomial modulo $x^n-1$; such a computation is called modular enumeration. Given a polynomial with coefficients from a commutative $\mathbb{Q}$-algebra, our first main result constructs the remainder simply from the coefficients of residues of the polynomial modulo $\Phi_d(x)$ for each $d\mid n$. Since such residues can often be found to have nice values, this simplifies a number of modular enumeration problems; indeed in some cases, such residues are already known while the related modular enumeration problem has remained unsolved. We list six such cases which our technique makes easy to solve. Our second main result is a formula for the unique polynomial $a$ such that $a \equiv f \mod \Phi_n(x)$ and $a\equiv 0 \mod x^d-1$ for each proper divisor $d$ of $n$.

We find a formula for remainders of $q$-multinomial coefficients and for remainders of $q$-Catalan numbers modulo $q^n-1$, reducing each problem to a finite number of cases for any fixed $n$. In the former case, we solve an open problem posed by Hartke and Radcliffe. In considering $q$-Catalan numbers modulo $q^n-1$, we discover a cyclic group operation on certain lattice paths which behaves predictably with regard to major index. We also make progress on a problem in modular enumeration on subset sums posed by Kitchloo and Pachter.
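To see why the remainder modulo $x^n-1$ counts objects by a statistic modulo $n$, note that $x^n \equiv 1$, so the coefficient of $x^k$ is simply added to the coefficient of $x^{k \bmod n}$. The sketch below performs this direct folding; it is the naive computation, not the cyclotomic-residue technique of the paper, and the function name is an assumption made for this example.

```python
def remainder_mod_xn_minus_1(coeffs, n):
    """Reduce a polynomial modulo x^n - 1.

    coeffs[k] is the coefficient of x^k. Because x^n is congruent to 1,
    each coefficient is folded onto exponent k mod n."""
    folded = [0] * n
    for k, c in enumerate(coeffs):
        folded[k % n] += c
    return folded


# (1 + x)^4 = 1 + 4x + 6x^2 + 4x^3 + x^4 counts subsets of a 4-element set by size.
print(remainder_mod_xn_minus_1([1, 4, 6, 4, 1], 3))  # [5, 5, 6]
```

The output says that a 4-element set has 5 subsets of size congruent to 0 modulo 3, 5 of size congruent to 1, and 6 of size congruent to 2.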

## 41) Ajay Saini, Predictive Modeling of Opinion and Connectivity Dynamics in Social Networks (26 January 2014)

Social networks have been extensively studied in recent years with the aim of understanding how the connectivity of different societies and their subgroups influences the spread of innovations and opinions through human networks. Using data collected from real-world social networks, researchers are able to gain a better understanding of the dynamics of such networks and subsequently model the changes that occur in these networks over time. In our work, we use data from the Social Evolution dataset of the MIT Human Dynamics Lab to develop a data-driven model capable of predicting the trends and long term changes observed in a real-world social network. We demonstrate the effectiveness of the model by predicting changes in both opinion spread and connectivity that reflect the changes observed in our dataset. After validating the model, we use it to understand how different types of social networks behave over time by varying the conditions governing the change of opinions and connectivity. We conclude with a study of opinion propagation under different conditions in which we use the structure and opinion distribution of various networks to identify sets of agents capable of propagating their opinion throughout an entire network. Our results demonstrate the effectiveness of the proposed modeling approach in predicting the future state of social networks and provide further insight into the dynamics of interactions between agents in real-world social networks.

## 40) Rohil Prasad, Investigating GCD in Z[√2] (11 January 2014)

We attempt to optimize the time needed to calculate greatest common divisors in the Euclidean domain Z[√2].
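For reference, a baseline Euclidean algorithm in Z[√2] can be sketched as follows. It uses the standard fact that Z[√2] is Euclidean with respect to the absolute value of the norm $N(a + b\sqrt{2}) = a^2 - 2b^2$: rounding the exact quotient coordinate-wise leaves a remainder of strictly smaller norm. This is only the unoptimized starting point, not the method of the paper; the pair encoding `(a, b)` for $a + b\sqrt{2}$ and all function names are assumptions made for this illustration.

```python
from fractions import Fraction


def mul(x, y):
    """Multiply (a + b*sqrt2) by (c + d*sqrt2), with elements stored as pairs."""
    a, b = x
    c, d = y
    return (a * c + 2 * b * d, a * d + b * c)


def norm(x):
    """Norm of a + b*sqrt2, namely a^2 - 2*b^2 (may be negative)."""
    a, b = x
    return a * a - 2 * b * b


def divmod_sqrt2(x, y):
    """Return (q, r) with x = q*y + r and |norm(r)| < |norm(y)|."""
    n = norm(y)
    if n == 0:
        raise ZeroDivisionError("division by zero in Z[sqrt2]")
    # Exact quotient is x * conjugate(y) / norm(y); round each coordinate.
    p, q = mul(x, (y[0], -y[1]))
    quotient = (round(Fraction(p, n)), round(Fraction(q, n)))
    product = mul(quotient, y)
    remainder = (x[0] - product[0], x[1] - product[1])
    return quotient, remainder


def gcd_sqrt2(x, y):
    """Greatest common divisor in Z[sqrt2], determined up to a unit."""
    while y != (0, 0):
        _, r = divmod_sqrt2(x, y)
        x, y = y, r
    return x


a = mul((3, 1), (2, 1))  # (3 + sqrt2)(2 + sqrt2)
b = mul((3, 1), (5, 0))  # (3 + sqrt2) * 5
g = gcd_sqrt2(a, b)
print(g, abs(norm(g)))   # a unit multiple of 3 + sqrt2, which has norm 7
```

Because the unit group of Z[√2] is infinite, the returned element is only one of many associates of the greatest common divisor; comparing absolute norms is a convenient way to check the result.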

## 39) Jin-Woo Bryan Oh, Towards Generalizing Thrackles to Arbitrary Graphs (1 January 2014)

In the 1950s, John Conway came up with the notion of thrackles, graphs with embeddings in which no edge crosses itself, but every pair of distinct edges intersects each other exactly once. He conjectured that |E(G)| ≤ |V(G)| for any thrackle G, a question unsolved to this day. In this paper, we discuss some of the known properties of thrackles and contribute a few new ones.

Only a few sparse graphs can be thrackles, and so it is of interest to find an analogous notion that applies to denser graphs as well. In this paper we introduce a generalized version of thrackles called near-thrackles, and prove some of their properties. We also discuss a large number of conjectures about them which seem very obvious but nonetheless are hard to prove. In the final section, we introduce thrackleability, a number between 0 and 1 that turns out to be an accurate measure of how far away a graph is from being a thrackle.

## 38) Junho Won, Lower bounds for the Crossing Number of the Cartesian Product of a Vertex-transitive Graph with a Cycle (1 January 2014)

The minimum number of crossings for all drawings of a given graph $G$ on a plane is called its crossing number, denoted $cr(G)$. Exact crossing numbers are known only for a few families of graphs, and even the crossing number of a complete graph $K_m$ is not known for all $m$. Wenping et al. showed that $cr(K_m\Box C_n)\geqslant n\cdot cr(K_{m+2})$ for $n\geqslant 4$ and $m\geqslant 4$. We adopt their method to find a lower bound for $cr(G\Box C_n)$ where $G$ is a vertex-transitive graph of degree at least 3. We also suggest some particular vertex-transitive graphs of interest, and give two corollaries that give lower bounds for $cr(G\Box C_n)$ in terms of $n$, $cr(G)$, the number of vertices of $G$, and the degree of $G$, which improve on Wenping et al.'s result.

## 37) Ying Gao, On an Extension of Stanley Depth for Refinement-Ordered Posets (30 December 2013)

The concept of Stanley depth was originally defined for graded modules over commutative rings in 1982 by Richard P. Stanley. However, in 2009 Herzog, Vladoiu, and Zheng found a property, ndepth, of posets analogous to the Stanley depths of certain modules, which provides an important link between combinatorics and commutative algebra. Due to this link, there arises the question of what this ndepth is for certain classes of posets.

Because ndepth was only recently defined, much remains to be discovered about it. In 2009, Biro, Howard, Keller, Trotter and Young found a lower bound for the ndepth of the poset of nonempty subsets of $\{1, 2, \ldots, n\}$ ordered by inclusion. In 2010, Wang calculated the ndepth of the product of chains $n^k \setminus 0$. However, ndepth has yet to be studied in relation to many other commonly found classes of posets. We chose to research the properties of the ndepths of one such well-known class of posets: the posets which consist of non-empty partitions of sets ordered by refinement, which we denote as $G_i$.

We use combinatorial and algebraic methods to find the ndepths for small posets in $G_i$. We show that for posets of increasing size in $G_i$, ndepth is strictly non-decreasing, and furthermore we show that $\mathrm{ndepth}[G_i] \geq [8i/29]$ for all $i$. We also find that for all $i$, $\mathrm{ndepth}[G_i] \leq i$ through the proof that $\mathrm{ndepth}[G_{i+1}] \leq \mathrm{ndepth}[G_i] + 1$.

## 36) Nihal Gowravaram, Enumeration of Subclasses of (2+2)-free Partially Ordered Sets (26 December 2013)

We investigate avoidance in (2+2)-free partially ordered sets, posets that do not contain any induced subposet isomorphic to the union of two disjoint chains of length two. In particular, we are interested in enumerating the number of partially ordered sets of size N avoiding both 2+2 and some other poset α. For any α of size 3, the results are already well-known. However, out of the 15 such α of size 4, only 2 were previously known. Through the course of this paper, we explicitly enumerate 7 other such α of size 4. Also, we consider the avoidance of three posets simultaneously, 2+2 along with some pair (α,β); it turns out that this enumeration is often clean, and has sometimes surprising results. Furthermore, we turn to the question of Wilf-equivalences in (2+2)-free posets. We show such an equivalence between the Y-shaped and chain posets of size 4 via a direct bijection, and in fact, we extend this to show a Wilf-equivalence between the general chain poset and a general Y-shaped poset of the same size. In this paper, while our focus is on enumeration, we also seek to develop an understanding of the structures of the posets in the subclasses we are studying.

## 35) Yael Fregier (MIT) and Isaac Xia, Lower Central Series Ideal Quotients Over $\mathbb{F}_p$ and $\mathbb{Z}$ (17 November 2013; arXiv.org, 28 Jun 2015)

Given a graded associative algebra $A$, its lower central series is defined by $L_1 = A$ and $L_{i+1} = [L_i, A]$. We consider successive quotients $N_i(A) = M_i(A) / M_{i+1}(A)$, where $M_i(A) = AL_i(A) A$. These quotients are direct sums of graded components. Our purpose is to describe the $\mathbb{Z}$-module structure of the components; i.e., their free and torsion parts. Following computer exploration using MAGMA, two main cases are studied. The first considers $A = A_n / (f_1,\dots, f_m)$, with $A_n$ the free algebra on $n$ generators $\{x_1, \ldots, x_n\}$ over a field of characteristic $p$. The relations $f_i$ are noncommutative polynomials in $x_j^{p^{n_j}},$ for some integers $n_j$. For primes $p > 2$, we prove that $p^{\sum n_j} \mid \text{dim}(N_i(A))$. Moreover, we determine polynomials dividing the Hilbert series of each $N_i(A)$. The second concerns $A = \mathbb{Z} \langle x_1, x_2 \rangle / (x_1^m, x_2^n)$. For $i = 2,3$, the bigraded structure of $N_i(A_2)$ is completely described.

## 34) Steven Homberg, Finding Enrichments of Functional Annotations for Disease-Associated Single-Nucleotide Polymorphisms (10 November 2013)

Computational analysis of SNP-disease associations from GWAS as well as functional annotations of the genome enables the calculation of a SNP set's enrichment for a disease. These statistical enrichments can be and are calculated with a variety of statistical techniques, but there is no standard statistical method for calculating enrichments. Several entirely different tests are used by different investigators in the field. These tests can also be conducted with several variations in parameters which also lack a standard. In our investigation, we develop a computational tool for conducting various enrichment calculations and, using breast cancer-associated SNPs from a GWAS catalog as a foreground against all GWAS SNPs as a background, test the tool and analyze the relative performance of the various tests. The computational tool will soon be released to the scientific community as a part of the Bioconductor package. Our analysis shows that, for the $r^2$ threshold in LD block construction, values around 0.8-0.9 are preferable to those with more lax and more strict thresholds respectively. We find that block-matching tests yield better results than peak-shifting tests. Finally, we find that, in block-matching tests, block tallying using binary scoring, noting whether or not a block has an annotation only, yields the most meaningful results, while weighting LD $r^2$ threshold has no influence.

## 33) Kavish Gandhi, Noah Golowich, and László Miklós Lovász, Degree of Regularity of Linear Homogeneous Equations (arXiv.org, 27 Sept 2013), published in Journal of Combinatorics 5:2 (2014)

We define a linear homogeneous equation to be strongly r-regular if, when a finite number of inequalities is added to the equation, the system of the equation and inequalities is still r-regular. In this paper, we derive a constraint on the coefficients of a linear homogeneous equation that gives a sufficient condition for the equation to be strongly r-regular. In 2009, Alexeev and Tsimerman introduced a family of equations, each of which is (n-1)-regular but not n-regular, verifying a conjecture of Rado from 1933. We show that these equations are actually strongly (n-1)-regular as a corollary of our results.

## 32) Leigh Marie Braswell and Tanya Khovanova, On the Cookie Monster Problem (arXiv.org, 23 Sept 2013), published in Jennifer Beineke & Jason Rosenhouse, The Mathematics of Various Entertaining Subjects: Research in Recreational Math (Princeton University Press, 2015).

The Cookie Monster Problem supposes that the Cookie Monster wants to empty a set of jars filled with various numbers of cookies. On each of his moves, he may choose any subset of jars and take the same number of cookies from each of those jars. The Cookie Monster number of a set is the minimum number of moves the Cookie Monster must use to empty all of the jars. This number depends on the initial distribution of cookies in the jars. We discuss bounds of the Cookie Monster number and explicitly find the Cookie Monster number for jars containing cookies in the Fibonacci, Tribonacci, n-nacci, and Super-n-nacci sequences. We also construct sequences of $k$ jars such that their Cookie Monster numbers are asymptotically $rk$, where $r$ is any real number between 0 and 1 inclusive.
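For very small inputs, the Cookie Monster number can be found by exhaustive search, which is a handy way to check the definition against examples. The sketch below is a breadth-first search over sets of distinct jar sizes (jars with equal contents can always be treated together); it is exponential and only meant for illustration, and the function name is an assumption made for this example.

```python
from itertools import combinations


def cookie_monster_number(jars):
    """Minimum number of moves needed to empty all jars, by brute force.

    A move picks an amount and a subset of jars that each hold at least
    that amount, then removes the amount from every chosen jar."""
    start = frozenset(v for v in jars if v > 0)
    frontier = {start}
    seen = {start}
    moves = 0
    while frozenset() not in frontier:
        next_frontier = set()
        for state in frontier:
            for amount in range(1, max(state) + 1):
                eligible = [v for v in state if v >= amount]
                for size in range(1, len(eligible) + 1):
                    for chosen in combinations(eligible, size):
                        reduced = {v - amount for v in chosen}
                        new_state = frozenset(((state - set(chosen)) | reduced) - {0})
                        if new_state not in seen:
                            seen.add(new_state)
                            next_frontier.add(new_state)
        frontier = next_frontier
        moves += 1
    return moves


print(cookie_monster_number([1, 2, 3]))     # 2
print(cookie_monster_number([1, 2, 4]))     # 3
print(cookie_monster_number([1, 2, 3, 5]))  # 3
```

The jars {1, 2, 3} need only two moves (take 2 from the jars holding 2 and 3, then 1 from the two jars that still hold a cookie), whereas {1, 2, 4} needs three, because two moves can produce at most the three jar sizes a, b, and a + b.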

## 31) Vahid Fazel-Rezai, Equivalence Classes of Permutations Modulo Replacements Between 123 and Two-Integer Patterns (arXiv.org, 18 Sept 2013), published in The Electronic Journal of Combinatorics 21:2 (2014)

We explore a new type of replacement of patterns in permutations, suggested by James Propp, that does not preserve the length of permutations. In particular, we focus on replacements between 123 and a pattern of two integer elements. We apply these replacements in the classical sense; that is, the elements being replaced need not be adjacent in position or value. Given each replacement, the set of all permutations is partitioned into equivalence classes consisting of permutations reachable from one another through a series of bi-directional replacements. We break the eighteen replacements of interest into four categories by the structure of their classes and fully characterize all of their classes.

## 30) Jesse Geneson (MIT), Rohil Prasad (PRIMES), and Jonathan Tidor (PRIMES), Bounding sequence extremal functions with formations (arXiv.org, 17 Aug 2013), published in The Electronic Journal of Combinatorics 21:3 (2014)

An $(r, s)$-formation is a concatenation of $s$ permutations of $r$ letters. If $u$ is a sequence with $r$ distinct letters, then let $\mathit{Ex}(u, n)$ be the maximum length of any $r$-sparse sequence with $n$ distinct letters which has no subsequence isomorphic to $u$. For every sequence $u$ define $\mathit{fw}(u)$, the formation width of $u$, to be the minimum $s$ for which there exists $r$ such that there is a subsequence isomorphic to $u$ in every $(r, s)$-formation. We use $\mathit{fw}(u)$ to prove upper bounds on $\mathit{Ex}(u, n)$ for sequences $u$ such that $u$ contains an alternation with the same formation width as $u$. We generalize Nivasch's bounds on $\mathit{Ex}((ab)^{t}, n)$ by showing that $\mathit{fw}((12 \ldots l)^{t})=2t-1$ and $\mathit{Ex}((12\ldots l)^{t}, n) =n2^{\frac{1}{(t-2)!}\alpha(n)^{t-2}\pm O(\alpha(n)^{t-3})}$ for every $l \geq 2$ and $t\geq 3$, such that $\alpha(n)$ denotes the inverse Ackermann function. Upper bounds on $\mathit{Ex}((12 \ldots l)^{t} , n)$ have been used in other papers to bound the maximum number of edges in $k$-quasiplanar graphs on $n$ vertices with no pair of edges intersecting in more than $O(1)$ points. If $u$ is any sequence of the form $a v a v' a$ such that $a$ is a letter, $v$ is a nonempty sequence excluding $a$ with no repeated letters and $v'$ is obtained from $v$ by only moving the first letter of $v$ to another place in $v$, then we show that $\mathit{fw}(u)=4$ and $\mathit{Ex}(u, n) =\Theta(n\alpha(n))$. Furthermore we prove that $\mathit{fw}(abc(acb)^{t})=2t+1$ and $\mathit{Ex}(abc(acb)^{t}, n) = n2^{\frac{1}{(t-1)!}\alpha(n)^{t-1}\pm O(\alpha(n)^{t-2})}$ for every $t\geq 2$.

## 29) Jesse Geneson (MIT), Tanya Khovanova (MIT), and Jonathan Tidor (PRIMES), Convex geometric (k+2)-quasiplanar representations of semi-bar k-visibility graphs (arXiv.org, 3 Jul 2013), published in Discrete Mathematics 331 (2014)

We examine semi-bar visibility graphs in the plane and on a cylinder in which sightlines can pass through k objects. We show every semi-bar k-visibility graph has a (k+2)-quasiplanar representation in the plane with vertices drawn as points in convex position and edges drawn as segments. We also show that the graphs having cylindrical semi-bar k-visibility representations with semi-bars of different lengths are the same as the (2k+2)-degenerate graphs having edge-maximal (k+2)-quasiplanar representations in the plane with vertices drawn as points in convex position and edges drawn as segments.

## 28) Leigh Marie Braswell and Tanya Khovanova, Cookie Monster Devours Naccis (arXiv.org, 18 May 2013), published in the College Mathematics Journal 45:2 (2014)

In 2002, Cookie Monster appeared in The Inquisitive Problem Solver. The hungry monster wants to empty a set of jars filled with various numbers of cookies. On each of his moves, he may choose any subset of jars and take the same number of cookies from each of those jars. The Cookie Monster number is the minimum number of moves Cookie Monster must use to empty all of the jars. This number depends on the initial distribution of cookies in the jars. We discuss bounds of the Cookie Monster number and explicitly find the Cookie Monster number for Fibonacci, Tribonacci and other nacci sequences.

## 2012 Research Papers

## 27) William Kuszmaul and Ziling Zhou, Equivalence classes in $S_n$ for three families of pattern-replacement relations (arXiv.org, 20 April 2013)

We study a family of equivalence relations in $S_n$, the group of permutations on $n$ letters, created in a manner similar to that of the Knuth relation and the forgotten relation. For our purposes, two permutations are in the same equivalence class if one can be reached from the other through a series of pattern-replacements using patterns whose order permutations are in the same part of a predetermined partition of $S_c$. In particular, we are interested in the number of classes created in $S_n$ by each relation and in characterizing these classes. Imposing the condition that the partition of $S_c$ has one nontrivial part containing the cyclic shifts of a single permutation, we find enumerations for the number of nontrivial classes. When the permutation is the identity, we are able to compare the sizes of these classes and connect parts of the problem to Young tableaux and Catalan lattice paths. Imposing the condition that the partition has one nontrivial part containing all of the permutations in $S_c$ beginning with 1, we both enumerate and characterize the classes in $S_n$. We do the same for the partition that has two nontrivial parts, one containing all of the permutations in $S_c$ beginning with 1, and one containing all of the permutations in $S_c$ ending with 1.

## 26) William Kuszmaul, Counting permutations modulo pattern-replacement equivalences for three-letter patterns (arXiv.org, 20 April 2013), published in the Electronic Journal of Combinatorics 20:4 (2013)

We study a family of equivalence relations in $S_n$, the group of permutations on $n$ letters, created in a manner similar to that of the Knuth relation and the forgotten relation. For our purposes, two permutations are in the same equivalence class if one can be reached from the other through a series of pattern-replacements using patterns whose order permutations are in the same part of a predetermined partition of $S_c$. When the partition is of $S_3$ and has one nontrivial part of size greater than two, we provide formulas for the number of classes created in all unresolved cases. When the partition is of $S_3$ and has two nontrivial parts, each of size two (as do the Knuth and forgotten relations), we enumerate the classes for 13 of the 14 unresolved cases. In two of these cases, enumerations arise which are the same as those yielded by the Knuth and forgotten relations. The reasons for this phenomenon are still largely a mystery.

## 25) Tanya Khovanova and Ziv Scully, Efficient Calculation of Determinants of Symbolic Matrices with Many Variables (arXiv.org, 13 April 2013)

Efficient matrix determinant calculations have been studied since the 19th century. Computers expand the range of determinants that are practically calculable to include matrices with symbolic entries. However, the fastest determinant algorithms for numerical matrices are often not the fastest for symbolic matrices with many variables. We compare the performance of two algorithms, fraction-free Gaussian elimination and minor expansion, on symbolic matrices with many variables. We show that, under a simplified theoretical model, minor expansion is faster in most situations. We then propose optimizations for minor expansion and demonstrate their effectiveness with empirical data.

## 24) Michael Zanger-Tishler and Saarik Kalia, On the Winning and Losing Parameters of Schmidt's Game (8 April 2013)

First introduced by Wolfgang Schmidt, the (α, β)-game and its modifications have been shown to be a powerful tool in Diophantine approximation, metric number theory, and dynamical systems. However, natural questions about the winning-losing parameters of most sets have not been studied thoroughly even after more than 40 years. There are a few results in the literature showing that some non-trivial points and small regions are winning or losing, but complete pictures remain largely unknown. Our main goal in this paper is to provide as much detail as possible about the global pictures of winning-losing parameters for some interesting families of sets.

## 23) Sheela Devadas and Steven Sam, Representations of Cherednik algebras of G(m, r, n) in positive characteristic (arXiv.org, 3 April 2013), published in Journal of Commutative Algebra (Winter 2014): 525-559

We study lowest-weight irreducible representations of rational Cherednik algebras attached to the complex reflection groups G(m, r, n) in characteristic p. Our approach is mostly from the perspective of commutative algebra. By studying the kernel of the contravariant bilinear form on Verma modules, we obtain formulas for Hilbert series of irreducible representations in a number of cases, and present conjectures in other cases. We observe that the form of the Hilbert series of the irreducible representations and the generators of the kernel tend to be determined by the value of n modulo p, and are related to special classes of subspace arrangements. Perhaps the most novel (conjectural) discovery from the commutative algebra perspective is that the kernel can be given the structure of a "matrix regular sequence" in some instances, which we prove in some small cases.

## 22) Christina Chen and Nan Li, Apollonian Equilateral Triangles (arXiv.org, 1 March 2013)

Given an equilateral triangle with $a$ the square of its side length and a point in its plane with $b$, $c$, $d$ the squares of the distances from the point to the vertices of the triangle, it can be computed that $a$, $b$, $c$, $d$ satisfy $3(a^2 + b^2 + c^2 + d^2) = (a + b + c + d)^2$. This paper derives properties of quadruples of nonnegative integers $(a; b; c; d)$, called triangle quadruples, satisfying this equation. It is easy to verify that the operation generating $(a; b; c; a + b + c - d)$ from $(a; b; c; d)$ preserves this feature and that it and analogous ones for the other elements can be represented by four matrices. We examine in detail the triangle group, the group with these operations as generators, and completely classify the orbits of quadruples with respect to the triangle group action. We also compute the number of triangle quadruples generated after a certain number of operations and approximate the number of quadruples bounded by characteristics such as the maximal element. Finally, we prove that the triangle group is a hyperbolic Coxeter group and derive information about the elements of triangle quadruples by invoking Lie groups. We also generalize the problem to higher dimensions.
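The defining equation and the generating operation can be checked numerically. The sketch below is illustrative only; the names `is_triangle_quadruple` and `reflect` are hypothetical and do not come from the paper. Replacing the entry at position `i` by the sum of the other three minus itself is the operation the abstract describes (for `i = 3` it sends d to a + b + c - d).

```python
def is_triangle_quadruple(a, b, c, d):
    """Check 3(a^2 + b^2 + c^2 + d^2) == (a + b + c + d)^2."""
    return 3 * (a * a + b * b + c * c + d * d) == (a + b + c + d) ** 2


def reflect(quad, i):
    """Replace entry i by (sum of the other three entries) - (entry i)."""
    q = list(quad)
    q[i] = sum(q) - 2 * q[i]
    return tuple(q)


# Side length squared 3, with the point at the centre of the triangle:
# each squared distance to a vertex is 1.
start = (3, 1, 1, 1)
print(is_triangle_quadruple(*start))               # True
print(reflect(start, 3))                           # (3, 1, 1, 4)
print(is_triangle_quadruple(*reflect(start, 3)))   # True
```

The operation preserves the equation because, with the other three entries fixed, the equation is a quadratic in the remaining entry whose two roots sum to the total of the other three.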

## 21) Dhroova Aiylam, Modified Stern-Brocot sequences (arXiv.org, 29 January 2013), published in Integers: Electronic Journal of Combinatorics and Number Theory 17 (2017)

We present the classical Stern-Brocot tree and provide a new proof of the fact that every rational number between 0 and 1 appears in the tree. We then generalize the Stern-Brocot tree to allow for arbitrary choice of starting terms, and prove that in all cases the tree maintains the property that every rational number between the two starting terms appears exactly once.
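The classical construction inserts the mediant (a + c)/(b + d) between each pair of neighbouring fractions a/b and c/d. The following is a minimal sketch of that step; the function name `stern_brocot_row` is hypothetical, and fractions are kept as `(numerator, denominator)` pairs. It accepts arbitrary starting terms, but how the paper treats starting terms in the generalized setting is not described in the abstract, so this should be read only as an illustration of the mediant rule.

```python
def stern_brocot_row(left, right, depth):
    """Return the row obtained after `depth` rounds of mediant insertion.

    `left` and `right` are (numerator, denominator) pairs; each round
    inserts the mediant between every pair of adjacent entries.
    """
    row = [left, right]
    for _ in range(depth):
        nxt = [row[0]]
        for (a, b), (c, d) in zip(row, row[1:]):
            nxt.append((a + c, b + d))  # mediant of the two neighbours
            nxt.append((c, d))
        row = nxt
    return row


print(stern_brocot_row((0, 1), (1, 1), 2))
# [(0, 1), (1, 3), (1, 2), (2, 3), (1, 1)]
```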

## 20) Nihal Gowravaram and Ravi Jagadeesan, Beyond alternating permutations: Pattern avoidance in Young diagrams and tableaux (arXiv.org, 28 January 2013), published in the Electronic Journal of Combinatorics 20:4 (2013)

We investigate pattern avoidance in alternating permutations and generalizations thereof. First, we study pattern avoidance in an alternating analogue of Young diagrams. In particular, we extend Babson-West's notion of shape-Wilf equivalence to apply to alternating permutations and so generalize results of Backelin-West-Xin and Ouchterlony to alternating permutations. Second, we study pattern avoidance in the more general context of permutations with restricted ascents and descents. We consider a question of Lewis regarding permutations that are the reading words of thickened staircase Young tableaux, that is, permutations that have (k - 1) ascents followed by a descent, followed by (k - 1) ascents, et cetera. We determine the relative sizes of the sets of pattern-avoiding (k - 1)-ascent permutations in terms of the forbidden pattern. Furthermore, we give inequalities in the sizes of sets of pattern-avoiding permutations in this context that arise from further extensions of shape-equivalence type enumerations.

## 19) Rohil Prasad and Jonathan Tidor, Optimal Results in Staged Self-Assembly of Wang Tiles (22 January 2013)

The subject of self-assembly deals with the spontaneous creation of ordered systems from simple units and is most often applied in the field of nanotechnology. The self-assembly model of Winfree describes the assembly of Wang tiles, simulating assembly in real-world systems. We use an extension of this model, known as the staged self-assembly model introduced by Demaine et al. that allows for discrete steps to be implemented and permits more diverse constructions. Under this model, we resolve the problem of constructing segments, creating a method to produce them optimally. Generalizing this construction to squares gives a new flexible method for their construction. Changing a parameter of the model, we explore much simpler constructions of complex monotone shapes. Finally, we present an optimal method to build most arbitrary shapes.

## 18) Aaron Klein, On Rank Functions of Graphs (6 January 2013)

We study rank functions (also known as graph homomorphisms onto Z), ways of imposing graded poset structures on graphs. We first look at a variation on rank functions called discrete Lipschitz functions. We relate the number of Lipschitz functions of a graph G to the number of rank functions of both G and G × E. We then find generating functions that enable us to compute the number of rank or Lipschitz functions of a given graph. We look at a subset of graphs called squarely generated graphs, which are graphs whose cycle space has a basis consisting only of 4-cycles. We show that the number of rank functions of such a graph is proportional to the number of 3-colorings of the same graph, thereby connecting rank functions to the Potts model of statistical mechanics. Lastly, we look at some asymptotics of rank and Lipschitz functions for various types of graphs.

## 17) Andrew Xia, Integrated Gene Expression Probabilistic Models for Cancer Staging (1 January 2013)

The current system for classifying cancer patients' stages was introduced more than one hundred years ago. With modern advances in technology, many parts of the system have become outdated. Because the current staging system emphasizes surgical procedures that could be harmful to patients, there has been a movement to develop a new taxonomy, using molecular signatures to potentially avoid surgical testing. This project explores the issues of the current classification system and looks for a potentially better way to classify cancer patients’ stages. Computerization has made a vast amount of cancer data available online. However, a significant portion of the data is incomplete; some crucial information is missing. It is logical to attempt to develop a system for recovering missing cancer data. Successful completion of this research would save costs and increase efficiency in cancer research and treatment. Using various methods, we have shown that cancer stages cannot be simply extrapolated from incomplete data. Furthermore, a new approach using RNA sequencing data is studied. RNA sequencing can potentially become a cost-efficient way to determine a cancer patient’s stage. We have obtained promising results using RNA sequencing data in breast cancer staging.

## 16) Surya Bhupatiraju, On the Complexity of the Marginal Satisfiability Problem (18 November 2012)

The marginal satisfiability problem (MSP) asks: Given desired marginal distributions $D_S$ for every subset $S$ of $c$ variable indices from $\{1, \ldots, n\}$, does there exist a distribution $D$ over $n$-tuples of values in $\{1, \ldots, m\}$ with those $S$-marginals $D_S$? Previous authors have studied MSP in fixed dimensions, and have classified the complexity up to certain upper bounds. However, when using general dimensions, it is known that the size of distributions grows exponentially, making brute force algorithms impractical. This presents an incentive to study more general, tractable variants, which in turn may shed light on the original problem's structure. Thus, our work seeks to explore MSP and its variants for arbitrary dimension, and pinpoint its complexity more precisely. We solve MSP for $n = 2$ and completely characterize the complexity of three closely related variants of MSP. In particular, we detail novel greedy and stochastic algorithms that handle exponentially-sized data structures in polynomial time, as well as generate accurate representative samples of these structures in polynomial time. These algorithms are also unique in that they represent possible protocols in data compression for communication purposes. Finally, we posit conjectures related to more generalized MSP variants, as well as the original MSP.

## 15) Fengning Ding and Aleksander Tsymbaliuk, Representations of Infinitesimal Cherednik Algebras (arXiv.org, 17 October 2012), published in Representation Theory 17 (2013)

Infinitesimal Cherednik algebras, first introduced by Etingof, Gan, and Ginzburg (2005), are continuous analogues of rational Cherednik algebras, and in the case of $\mathfrak{gl}_n$, are deformations of universal enveloping algebras of the Lie algebras $\mathfrak{sl}_{n+1}$. Despite these connections, infinitesimal Cherednik algebras are not widely studied, and basic questions of intrinsic algebraic and representation theoretical nature remain open. In the first half of this paper, we construct the complete center of $H_\zeta(\mathfrak{gl}_n)$ for the case of $n = 2$ and give one particular generator of the center, the Casimir operator, for general $n$. We find the action of this Casimir operator on the highest weight modules to prove the formula for the Shapovalov determinant, providing a criterion for the irreducibility of Verma modules. We classify all irreducible finite dimensional representations and compute their characters. In the second half, we investigate Poisson-analogues of the infinitesimal Cherednik algebras and use them to gain insight on the center of $H_\zeta(\mathfrak{gl}_n)$. Finally, we investigate $H_\zeta(\mathfrak{sp}_{2n})$ and extend various results from the theory of $H_\zeta(\mathfrak{gl}_n)$, such as a generalization of Kostant's theorem.

## 14) Tanya Khovanova and Dai Yang, Halving Lines and Their Underlying Graphs (arXiv.org, 17 October 2012), published in Involve 11:1 (2018): 1–11

In this paper we study halving-edges graphs corresponding to a set of halving lines. Particularly, we study the vertex degrees, path, cycles and cliques of such graphs. In doing so, we study a vertex-partition of said graph called chains which are equipped with interesting properties.

## 2011 Research Papers

## 13) Carl Lian, Representations of Cherednik algebras associated to complex reflection groups in positive characteristic (arXiv.org, 1 July 2012)

We consider irreducible lowest-weight representations of Cherednik algebras associated to certain classes of complex reflection groups in characteristic p. In particular, we study maximal submodules of Verma modules associated to these algebras. Various results and conjectures are presented concerning generators of these maximal submodules, which are found by computing singular polynomials of Dunkl operators. This work represents progress toward the general problem of determining Hilbert series of irreducible lowest-weight representations of arbitrary Cherednik algebras in characteristic p.

## 12) Aaron Klein, Joel Brewster Lewis, and Alejandro Morales, Counting matrices over finite fields with support on skew Young and Rothe diagrams (arXiv.org, 26 March 2012); published in the Journal of Algebraic Combinatorics (May 2013)

We consider the problem of finding the number of matrices over a finite field with a certain rank and with support that avoids a subset of the entries. These matrices are a q-analogue of permutations with restricted positions (i.e., rook placements). For general sets of entries these numbers of matrices are not polynomials in q (Stembridge 98); however, when the set of entries is a Young diagram, the numbers, up to a power of q-1, are polynomials with nonnegative coefficients (Haglund 98). In this paper, we give a number of conditions under which these numbers are polynomials in q, or even polynomials with nonnegative integer coefficients. We extend Haglund's result to complements of skew Young diagrams, and we apply this result to the case when the set of entries is the Rothe diagram of a permutation. In particular, we give a necessary and sufficient condition on the permutation for its Rothe diagram to be the complement of a skew Young diagram up to rearrangement of rows and columns. We end by giving conjectures connecting invertible matrices whose support avoids a Rothe diagram and Poincaré polynomials of the strong Bruhat order.

## 11) Surya Bhupatiraju, Pavel Etingof, David Jordan, William Kuszmaul, and Jason Li, Lower central series of a free associative algebra over the integers and finite fields (arXiv.org, 8 March 2012), published in the Journal of Algebra (December 2012)

Consider the free algebra A_n generated over Q by n generators x_1, ..., x_n. Interesting objects attached to A = A_n are members of its lower central series, L_i = L_i(A), defined inductively by L_1 = A, L_{i+1} = [A,L_{i}], and their associated graded components B_i = B_i(A) defined as B_i=L_i/L_{i+1}. These quotients B_i, for i at least 2, as well as the reduced quotient \bar{B}_1=A/(L_2+A L_3), exhibit a rich geometric structure, as shown by Feigin and Shoikhet and later authors (Dobrovolska-Kim-Ma, Dobrovolska-Etingof, Arbesfeld-Jordan, Bapat-Jordan). We study the same problem over the integers Z and finite fields F_p. New phenomena arise, namely, torsion in B_i over Z, and jumps in dimension over F_p. We describe the torsion in the reduced quotient RB_1 and B_2 geometrically in terms of the De Rham cohomology of Z^n. As a corollary we obtain a complete description of \bar{B}_1(A_n(Z)) and \bar{B}_1(A_n(F_p)), as well as of B_2(A_n(Z[1/2])) and B_2(A_n(F_p)), p>2. We also give theoretical and experimental results for B_i with i>2, formulating a number of conjectures and questions based on them. Finally, we discuss the supercase, when some of the generators are odd (fermionic) and some are even (bosonic), and provide some theoretical results and experimental data in this case.

## 10) David Jordan and Masahiro Namiki, Determinant formulas for the reflection equation algebra (19 Feb 2012)

In this note, we report on work in progress to explicitly describe generators of the center of the reflection equation algebra associated to the quantum GL(N) R-matrix. In particular, we conjecture a formula for the quantum determinant, and for the quadratic central element, both of which involve the excedance statistic on the symmetric group. Current efforts are directed at proving these formulas, and at finding formulas for the remaining central elements.

## 9) Ziv Scully, Yan Zhang, and Tian-Yi (Damien) Jiang, Firing Patterns in the Parallel Chip-Firing Game (arXiv.org, 29 Nov 2012), published in Discrete Mathematics and Theoretical Computer Science (DMTCS) proc., Nancy, France, 2014

The parallel chip-firing game is an automaton on graphs in which vertices “fire” chips to their neighbors. This simple model, analogous to sandpiles forming and collapsing, contains much emergent complexity and has connections to different areas of mathematics including self-organized criticality and the study of the sandpile group. In this work, we study firing sequences, which describe each vertex’s interaction with its neighbors in this game. Our main contribution is a complete characterization of the periodic firing sequences that can occur in a game, which have a surprisingly simple combinatorial description. We also obtain other results about local behavior of the game after introducing the concept of motors.
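A single round of the game can be sketched as follows. The abstract does not spell out the firing rule, so this sketch assumes the usual convention: in each round, every vertex holding at least as many chips as it has neighbors fires simultaneously, sending one chip to each neighbor. The function name `parallel_step` and the adjacency-dictionary representation are hypothetical choices for illustration.

```python
def parallel_step(adj, chips):
    """Run one round of parallel chip-firing.

    adj:   dict mapping each vertex to a list of its neighbors
    chips: dict mapping each vertex to its current chip count
    Returns (new_chips, fired), where `fired` is the set of vertices
    that fired this round. Assumed rule: a vertex fires when its chip
    count is at least its degree.
    """
    # Decide who fires from the current configuration only, so that
    # all firings in a round happen simultaneously.
    fired = {v for v in adj if chips[v] >= len(adj[v])}
    new_chips = dict(chips)
    for v in fired:
        new_chips[v] -= len(adj[v])
        for w in adj[v]:
            new_chips[w] += 1
    return new_chips, fired


# Path graph 0 - 1 - 2 with a single chip on vertex 0.
adj = {0: [1], 1: [0, 2], 2: [1]}
chips, fired = parallel_step(adj, {0: 1, 1: 0, 2: 0})
print(chips, fired)  # {0: 0, 1: 1, 2: 0} {0}
```

Recording the `fired` set for each round gives, per vertex, the kind of firing sequence the abstract refers to.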

## 8) Sheela Devadas, Lowest-weight representations of Cherednik algebras in positive characteristic (29 Jan 2012)

We study lowest-weight irreducible representations of rational Cherednik algebras attached to the complex reflection groups G(m, r, n) in characteristic p, focusing specifically on the case p ≤ n, which is more complicated than the case p > n. The goal of our work is to calculate characters (and in particular Hilbert series) of these representations. By studying the kernel of the contravariant bilinear form on Verma modules, we proved formulas for Hilbert series of irreducible modules in a number of cases, and also obtained a lot of computer data which suggests a number of conjectures. Specifically, we find that the shape and form of the Hilbert series of the irreducible representations and the generators of the kernel tend to be determined by the value of n modulo p.

## 7) Christina Chen, Maximizing Volume Ratios for Shadow Covering by Tetrahedra (arXiv.org, 9 Jan 2012)

Define a body A to be able to hide behind a body B if the orthogonal projection of B contains a translation of the corresponding orthogonal projection of A in every direction. In two dimensions, it is easy to observe that there exist two objects such that one can hide behind another and have a larger area than the other. It was recently shown that similar examples exist in higher dimensions as well. However, the highest possible volume ratio for such bodies is still undetermined. We investigated two three-dimensional examples, one involving a tetrahedron and a ball and the other involving a tetrahedron and an inverted tetrahedron. We calculate the highest volume ratio known up to this date, 1.16, which is generated by our second example.

## 6) Yongyi Chen, Pavel Etingof, David Jordan, and Michael Zhang, Poisson traces in positive characteristic (arXiv.org, 29 Dec 2011)

We study Poisson traces of the structure algebra A of an affine Poisson variety X defined over a field of characteristic p. According to arXiv:0908.3868v4 , the dual space HP_0(A) to the space of Poisson traces arises as the space of coinvariants associated to a certain D-module M(X) on X. If X has finitely many symplectic leaves and the ground field has characteristic zero, then M(X) is holonomic, and thus HP_0(A) is finite dimensional. However, in characteristic p, the dimension of HP_0(A) is typically infinite. Our main results are complete computations of HP_0(A) for sufficiently large p when X is 1) a quasi-homogeneous isolated surface singularity in the three-dimensional space, 2) a quotient singularity V/G, for a symplectic vector space V by a finite subgroup G in Sp(V), and 3) a symmetric power of a symplectic vector space or a Kleinian singularity. In each case, there is a finite nonnegative grading, and we compute explicitly the Hilbert series. The proofs are based on the theory of D-modules in positive characteristic.

## 5) Saarik Kalia, The Generalizations of the Golden Ratio: Their Powers, Continued Fractions, and Convergents (23 Dec 2011)

The relationship between the golden ratio and continued fractions is well known: the convergents of the continued fraction are the ratios of consecutive Fibonacci numbers. The continued fractions for the powers of the golden ratio also exhibit an interesting relationship with the Lucas numbers. In this paper, we study the silver means and introduce the bronze means, which are generalizations of the golden ratio. We correspondingly introduce the silver and bronze Fibonacci and Lucas numbers, and we prove the relationship between the convergents of the continued fractions of the powers of the silver and bronze means and the silver and bronze Fibonacci and Lucas numbers. We further generalize this to the Lucas constants, a two-parameter generalization of the golden ratio.
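The opening statement about the golden ratio can be checked directly. The sketch below uses the standard recurrence for continued-fraction convergents; the function name `convergents` is a hypothetical choice, and the code covers only the classical fact, not the paper's silver or bronze generalizations.

```python
from fractions import Fraction


def convergents(partial_quotients):
    """Return the convergents of the continued fraction [a0; a1, a2, ...].

    Uses the standard recurrence h_k = a_k * h_{k-1} + h_{k-2},
    k_k = a_k * k_{k-1} + k_{k-2}.
    """
    h_prev, h = 1, partial_quotients[0]
    k_prev, k = 0, 1
    result = [Fraction(h, k)]
    for a in partial_quotients[1:]:
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        result.append(Fraction(h, k))
    return result


# The golden ratio is [1; 1, 1, 1, ...].
print([str(c) for c in convergents([1] * 6)])
# ['1', '2', '3/2', '5/3', '8/5', '13/8']
```

The numerators and denominators are consecutive Fibonacci numbers. The same function accepts any list of partial quotients, for instance `[2, 2, 2, 2]` for the expansion of 1 + √2.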

## 4) Caroline Ellison, The Number of Nonzero Coefficients of Powers of a Polynomial over a Finite Field (15 Nov 2011)

Coefficients of polynomials over finite fields often encode information that can be applied in various areas of science; for instance, computer science and representation theory. The purpose of this project is to investigate these coefficients over the finite field $\mathbb{F}_p$. We find four exact results for the number of nonzero coefficients in special cases of $n$ and $p$ for the polynomial $(1 + x + x^2)^n$. More importantly, we use Amdeberhan and Stanley's matrices to find what we conjecture to be an approximation for the sum of the number of nonzero coefficients of $P(x)^n$ over $\mathbb{F}_p$. We also relate the number of nonzero coefficients to the number of base $p$ digits of $n$. These results lead to questions in representation theory and combinatorics.
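The quantity under study is easy to compute for small cases. The following is a minimal sketch by direct multiplication; the function name `count_nonzero_coeffs` is hypothetical, polynomials are given as coefficient lists from the constant term upward, and this is not the matrix method mentioned in the abstract.

```python
def count_nonzero_coeffs(base, n, p):
    """Count nonzero coefficients of base(x)**n with coefficients mod p.

    `base` lists coefficients from the constant term upward, so
    1 + x + x^2 is [1, 1, 1]. Uses repeated schoolbook multiplication.
    """
    result = [1]
    for _ in range(n):
        product = [0] * (len(result) + len(base) - 1)
        for i, r in enumerate(result):
            if r:
                for j, b in enumerate(base):
                    product[i + j] = (product[i + j] + r * b) % p
        result = product
    return sum(1 for c in result if c)


# (1 + x + x^2)^3 = 1 + 3x + 6x^2 + 7x^3 + 6x^4 + 3x^5 + x^6
print(count_nonzero_coeffs([1, 1, 1], 3, 3))  # 3, since it reduces to 1 + x^3 + x^6
print(count_nonzero_coeffs([1, 1, 1], 3, 2))  # 5
```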

## 3) Xiaoyu He, On the Classification of Universal Rotor-Routers (arXiv.org, 6 Nov 2011)

The combinatorial theory of rotor-routers has connections with problems of statistical mechanics, graph theory, chaos theory, and computer science. A rotor-router network defines a deterministic walk on a digraph G in which a particle walks from a source vertex until it reaches one of several target vertices. Motivated by recent results due to Giacaglia et al., we study rotor-router networks in which all non-target vertices have the same type. A rotor type r is universal if every hitting sequence can be achieved by a homogeneous rotor-router network consisting entirely of rotors of type r. We give a conjecture that completely classifies universal rotor types. Then, this problem is simplified by a theorem we call the Reduction Theorem that allows us to consider only two-state rotors. A rotor-router network called the compressor, because it tends to shorten rotor periods, is introduced along with an associated algorithm that determines the universality of almost all rotors. New rotor classes, including boppy rotors, balanced rotors, and BURD rotors, are defined to study this algorithm rigorously. Using the compressor the universality of new rotor classes is proved, and empirical computer results are presented to support our conclusions. Prior to these results, less than 100 of the roughly 260,000 possible two-state rotor types of length up to 17 were known to be universal, while the compressor algorithm proves the universality of all but 272 of these rotor types.

## 2) Yongyi Chen and Michael Zhang, On zeroth Poisson homology in positive characteristic (30 Sept 2011)

A Poisson algebra is a commutative algebra with a Lie bracket {,} satisfying the Leibniz rule. An important invariant of a Poisson algebra A is its zeroth Poisson homology HP_0(A)=A/{A,A}. It characterizes densities on the phase space invariant under all Hamiltonian flows. Also, the dimension of HP_0(A) gives an upper bound for the number of irreducible representations of any quantization of A. We study HP_0(A) when A is the algebra of functions on an isolated quasihomogeneous surface singularity. Over C, it's known that HP_0(A) is the Jacobi ring of the singularity whose dimension is the Milnor number. We generalize this to characteristic p. In this case, HP_0(A) is a finite (although not finite dimensional) module over A^p. We give its conjectural Hilbert series for Kleinian singularities and for cones of smooth projective curves, and prove the conjecture in several cases. (The conjecture has now been proved in general in our follow-up paper with P. Etingof and D. Jordan.)

## 1) Christina Chen, Tanya Khovanova, and Daniel A. Klain, Volume bounds for shadow covering (arXiv.org, 8 Sep 2011), published in Transactions of the American Mathematical Society 366 (2014)

For n ≥ 2 a construction is given for a large family of compact convex sets K and L in n-dimensional Euclidean space such that the orthogonal projection L_u onto the subspace u^⊥ contains a translate of the corresponding projection K_u for every direction u, while the volumes of K and L satisfy V_n(K) > V_n(L). It is subsequently shown that, if the orthogonal projection L_u onto the subspace u^⊥ contains a translate of K_u for every direction u, then the set (n/(n−1))L contains a translate of K. It follows that V_n(K) ≤ (n/(n−1))^n V_n(L). In particular, we derive a universal constant bound V_n(K) ≤ 2.942 V_n(L), independent of the dimension n of the ambient space. Related results are obtained for projections onto subspaces of some fixed intermediate co-dimension. Open questions and conjectures are also posed.

With questions, contact PRIMES Program Director Slava Gerovitch at

## Articles on Prime numbers

## Exploring the mathematical universe – connections, contradictions, and kale

Joan Licata, Australian National University

## Has one of math’s greatest mysteries, the Riemann hypothesis, finally been solved?

William Ross, University of Richmond

## Why prime numbers still fascinate mathematicians, 2,300 years later

Martin H. Weissman, University of California, Santa Cruz

## Why do we need to know about prime numbers with millions of digits?

Ittay Weiss, University of Portsmouth

## What’s the point of maths research? It’s the abstract nonsense behind tomorrow’s breakthroughs

Wolfram Bentz, University of Hull

## The 22 million digit number … and the amazing maths behind primes

Steve Humble, Newcastle University

## A little number theory makes the times table a thing of beauty

Anita Ponsaing, The University of Melbourne

## Hunting for Prime Numbers: Who Cares and Why

More than 3,550 years ago, an Egyptian scribe named Ahmes wrote a papyrus in which he treated fractions with prime denominators differently from the rest. This is often cited as a sign that knowledge of these peculiar numbers, and the search for them, is almost as old as human thought, a search that has reached almost inconceivable heights in the last couple of decades. But what is the point of hunting for ever-larger prime numbers?

The definition of a prime number is so simple that it is learned in primary school: a prime is a natural number greater than 1 that can be divided exactly only by 1 and by itself. In fact, this apparent simplicity is part of its appeal, as Adrian Dudek, a mathematician from Australian National University, tells OpenMind: “I think the fascination for prime numbers comes from the fact that they are so elementary in description but yet incredibly difficult to analyse. A young child can understand what makes a number prime, yet lifetimes of mathematical research have been spent trying to solve some of the problems in the field.”

The first known person to look specifically at this subject was the Greek mathematician Euclid of Alexandria, who around 300 B.C. demonstrated for the first time that there are infinitely many prime numbers. A century later, another Greek mathematician, Eratosthenes, created a sieve method that identifies all the prime numbers in a finite list simply by crossing out multiples.
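The crossing-out procedure attributed to Eratosthenes can be sketched in a few lines of Python. This is a minimal illustration, not a historical reconstruction; the function name `sieve_of_eratosthenes` and the limit of 30 are chosen here purely as an example.

```python
def sieve_of_eratosthenes(limit):
    """Return every prime <= limit by repeatedly crossing out multiples."""
    if limit < 2:
        return []
    # is_candidate[k] stays True until k is crossed out as a multiple.
    is_candidate = [True] * (limit + 1)
    is_candidate[0] = is_candidate[1] = False
    n = 2
    while n * n <= limit:
        if is_candidate[n]:
            # Smaller multiples of n were already crossed out by smaller primes.
            for multiple in range(n * n, limit + 1, n):
                is_candidate[multiple] = False
        n += 1
    return [k for k, keep in enumerate(is_candidate) if keep]


print(sieve_of_eratosthenes(30))
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Starting the inner loop at `n * n` is a common shortcut: any smaller multiple of `n` has a smaller prime factor and has already been removed.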

## The Mersenne primes

After the Greeks, interest in prime numbers was only revived at the end of the Middle Ages. At the beginning of the 17th century, French monk Marin Mersenne defined the prime numbers that bear his name, obtained as M_p = 2^p − 1. If p is a prime number, it is possible, though not certain, that M_p is also a prime number. Already in 1588, Italian mathematician Pietro Cataldi had shown that 2^19 − 1 = 524,287 is prime, setting a record for his time. The Mersenne primes became the mathematicians’ preferred target thanks to tests such as the Lucas-Lehmer primality test, which facilitates verification. Édouard Lucas himself, a French mathematician, demonstrated in 1876 that 2^127 − 1 is a prime. This 39-digit number remains the largest prime discovered by manual calculations.
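To make the Lucas-Lehmer test concrete, here is a small Python sketch of its standard textbook form: start from 4, repeatedly square and subtract 2 modulo M_p, and check whether the result after p − 2 steps is zero. The function name `lucas_lehmer` is an illustrative choice, and the code assumes the caller passes a prime exponent p; it is far simpler than the optimized arithmetic used in real record searches.

```python
def lucas_lehmer(p):
    """Return True if M_p = 2**p - 1 is prime. Assumes p itself is prime."""
    if p == 2:
        return True  # M_2 = 3 is prime; the iteration below applies to odd p
    mersenne = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % mersenne
    return s == 0


# Prime exponents up to 31 for which M_p is itself prime.
print([p for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31) if lucas_lehmer(p)])
# [2, 3, 5, 7, 13, 17, 19, 31]
```

The exponent 11 is filtered out because 2^11 − 1 = 2047 = 23 × 89, which shows that a prime p does not guarantee a prime M_p.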

In 1951 computers began to be used to find even larger prime numbers. That year a new record was set with a 79-digit number, and the record then grew rapidly with advances in computing. In 1989 the largest known prime had 65,087 digits; ten years later, the Mersenne prime M_6972593 reached 2,098,960 digits.

The great leap from thousands to millions of digits came about mainly because of one person. In 1996, the American George Woltman, from the Massachusetts Institute of Technology, founded the Great Internet Mersenne Prime Search (GIMPS), a distributed computing project that searches for new Mersenne primes and in which any user can participate by downloading the software Prime95, created by Woltman.

Since then, all new record primes have been discovered by GIMPS users. The current record is held by the 51st known Mersenne prime. This real colossus, M_82589933, discovered on December 7, 2018 by Florida programmer Patrick Laroche, reaches the unimaginable length of 24,862,048 digits; if someone were to try to print it on paper, almost 10,000 sheets would be needed.
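The digit counts quoted for Mersenne primes can be checked without ever writing the numbers out. The sketch below uses the fact that 2^p is never a power of 10, so 2^p − 1 has exactly as many decimal digits as 2^p, namely ⌊p · log10(2)⌋ + 1. The helper name `mersenne_digit_count` is made up for this example.

```python
import math


def mersenne_digit_count(p):
    """Number of decimal digits of 2**p - 1, computed from logarithms."""
    # 2**p is never a power of 10, so subtracting 1 does not change the length.
    return math.floor(p * math.log10(2)) + 1


for exponent in (127, 6972593, 82589933):
    print(exponent, mersenne_digit_count(exponent))
# 127 39
# 6972593 2098960
# 82589933 24862048
```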

And the search goes on. As Woltman tells OpenMind, “GIMPS will continue to forge ahead over the coming years. Our progress is dependent on the number of users and advancements in hardware.” Of course, there can be a reward for hunters: GIMPS awards $3,000 for new discoveries, and both the Woltman project and its participants are eligible for the awards given by the Electronic Frontier Foundation, which currently offers a $150,000 purse to anyone finding a prime over 100 million digits long.

## The utility of gigantic primes

But leaving aside the economic incentive or the headlines, what motivates this search? For mathematicians, the importance of prime numbers is indisputable; since every other natural number greater than 1 breaks down into a product of primes, they are considered the building blocks of number theory. “If you want to understand a building, how it will react to a storm or earthquake, you must first know what it is made of,” University of Tennessee mathematician Chris Caldwell, discoverer of prime numbers and author of The Prime Pages website, which maintains a list of the 5,000 largest known primes, tells OpenMind. “I find primes beautiful because of the ubiquity, their myriad of uses, and what appears to be randomness in their distribution.” Moreover, they are at the heart of such famous mathematical problems as Goldbach’s conjecture or the Riemann hypothesis.
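The "building blocks" idea can be illustrated by breaking a number down into its prime factors. The following is a minimal trial-division sketch, adequate only for small inputs; the name `prime_factors` is an illustrative choice and nothing here reflects how large numbers are factored in practice.

```python
def prime_factors(n):
    """Return the prime factors of n (n >= 2) in non-decreasing order."""
    factors = []
    divisor = 2
    while divisor * divisor <= n:
        # Divide out each prime factor as many times as it occurs.
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    if n > 1:
        factors.append(n)  # whatever remains is itself prime
    return factors


print(prime_factors(360))   # [2, 2, 2, 3, 3, 5]
print(prime_factors(2047))  # [23, 89]
```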

However, despite the theoretical purity defended by mathematicians, the truth is that prime numbers have also brought great practical benefits to humanity, such as electronic commerce. In 1977, three researchers designed RSA cryptography (named after the initials of Rivest, Shamir and Adleman), based on the publicly known product of two large prime numbers; messages encrypted with it can be deciphered only by those who know the factors. This type of encryption, called asymmetric or public key, is used for encryption on the Internet, for example in digital signatures, and is the most important current application of large prime numbers. However, only numbers with a few hundred digits are used; it would be unthinkable to use the giant primes known today.
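A toy version of the RSA idea fits in a few lines. The sketch below uses deliberately tiny primes (61 and 53) and the exponent 17, all chosen here for illustration only; real deployments use primes hundreds of digits long plus padding schemes that are omitted. It assumes Python 3.8 or later, where `pow(e, -1, phi)` computes a modular inverse.

```python
# Toy RSA with tiny primes; illustrative values only, not secure.
p, q = 61, 53
n = p * q                  # public modulus: 3233
phi = (p - 1) * (q - 1)    # 3120, computable only if the factors are known

e = 17                     # public exponent, coprime to phi
d = pow(e, -1, phi)        # private exponent: inverse of e modulo phi

message = 65
cipher = pow(message, e, n)    # anyone can encrypt with the public pair (n, e)
recovered = pow(cipher, d, n)  # only the holder of d can decrypt

print(n, d, recovered)
# 3233 2753 65
```

The security of the scheme rests on the gap shown here: computing `d` is easy given `p` and `q`, but recovering them from `n` alone becomes infeasible when the primes are large.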

So, do these numerical giants have any specific uses? For Martin Weissman, a mathematician at the University of California at Santa Cruz, the search for gigantic primes “is interesting, but not very important,” beyond stimulating interest in mathematics. “If someone found a new algorithm that quickly determined whether a number with millions of digits was prime, that would be interesting to me,” he tells OpenMind. For Weissman, classical problems such as the Riemann hypothesis will focus interest on the field of prime numbers in the coming decades.

But even if technologies like Artificial Intelligence or quantum computers break down current barriers in computing, “it’s highly unlikely that gigantic prime numbers will ever be used in the same way that currently large primes are used,” University of Portsmouth mathematician Ittay Weiss tells OpenMind. And this is not only because of the difficulty in computing them, but also because it would not contribute anything relevant. However, Weiss speculates that these numbers could be used to test new computers or algorithms, as is currently done with millions of decimals of pi. “Perhaps huge primes can serve a similar purpose,” he suggests.

Ultimately, according to Woltman, “the point of the search is primarily just for fun. There is great joy in discovering a new, exceedingly rare, and interesting item. Mankind loves to break records like building the fastest car, exploring new territory like outer space, and challenging oneself like climbing Mt. Everest.” And, as Caldwell says, “the journey is often more important than the destination.”

## Javier Yanes


## The PrimePages : prime number research & records

What are the PrimePages?

Webster's New Collegiate Dictionary defines prime as follows.

prime \'prīm\ n [ ME , fr. MF , fem. of prin first, L primus; akin to L prior ] 1 : first in time : ORIGINAL 2 a : having no factor except itself and one <3 is a ~ number> b : having no common factor except one <12 and 25 are relatively ~> 3 a : first in rank, authority or significance : PRINCIPAL b : having the highest quality or value <~ television time>

Each of Webster's definitions applies, but the most operative is 2a: an integer greater than one is prime if its only positive divisors are itself and one. For example, 15 is not prime because it has other divisors, namely the primes 3 and 5. These pages present lists of large primes and key research about them.
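The definition translates directly into a check by trial division. The sketch below is one straightforward way to write it, suitable only for small numbers; the name `is_prime` is an illustrative choice, and `math.isqrt` assumes Python 3.8 or later.

```python
import math


def is_prime(n):
    """Return True if the integer n is greater than one and has no divisor
    other than one and itself."""
    if n < 2:
        return False
    # A composite n must have a divisor no larger than its square root.
    for divisor in range(2, math.isqrt(n) + 1):
        if n % divisor == 0:
            return False
    return True


print(is_prime(3), is_prime(15))
# True False
```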

## Largest Known Primes Database

Our central database acts as a “Guinness book” of prime number records! This list includes the 5000 largest known primes and smaller ones of selected forms updated hourly.

## Other Lists of Primes Here

The first 1,000 primes and first 50,000,000 primes. Lists of top 20 records (e.g., twin primes, Mersenne primes...). Small random primes up to 300 digits. The smallest titanics with special forms and many more.

See also the various database searches.

## Finding/proving

The theory behind how these record primes are found and proven.

We answer common questions: Is one a prime? Longest list of primes? Why?

## Prime Glossary

The Prime Glossary is a collection of definitions related to prime numbers.

## Largest Known Prime by Year

The Largest Known Prime by Year discusses how big the largest known primes have been historically.

## How many primes are there?

Over 2000 years ago Euclid proved that there are infinitely many. How Big of an Infinity?

## The Riemann Hypothesis

A short note about one of the most important conjectures in prime number theory. When (and if) it is proven, many of the bounds on prime estimates can be improved and primality proving can be simplified.

## Check a Number's Primality

A simple routine to check most small numbers for primality (and a link to a more sophisticated test).

Eve's Dad showed her a really big prime!

Cyber Security pp 315–326

## Prime Numbers: Foundation of Cryptography

- Sonal Sarnaik 17 &
- Basit Ansari 17
- Conference paper
- First Online: 28 April 2018

Part of the Advances in Intelligent Systems and Computing book series (AISC,volume 729)

Prime numbers play a very important role in cryptography. There are various types of prime numbers, each with its own properties. This paper describes in detail the importance of prime numbers in cryptography and the algorithms that generate large or strong prime numbers. It also covers algorithms that find prime factors and that test whether a given number is prime.

- Prime numbers
- Primality testing
- Prime number generation



## Author information

Authors and affiliations.

Marathwada Institute of Technology, Aurangabad, India

Sonal Sarnaik & Basit Ansari

## Corresponding author

Correspondence to Sonal Sarnaik.

## Editor information

Editors and affiliations.

Department of Computer Science, Aligarh Muslim University, Aligarh, Uttar Pradesh, India

M. U. Bokhari

National Institute of Financial Management, Faridabad, Haryana, India

Namrata Agrawal

Bharati Vidyapeeth’s College of Engineering (BVCOE), New Delhi, India

Dharmendra Saini

## Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

## About this paper

Cite this paper.

Sarnaik, S., Ansari, B. (2018). Prime Numbers: Foundation of Cryptography. In: Bokhari, M., Agrawal, N., Saini, D. (eds) Cyber Security. Advances in Intelligent Systems and Computing, vol 729. Springer, Singapore. https://doi.org/10.1007/978-981-10-8536-9_31

## Download citation

DOI : https://doi.org/10.1007/978-981-10-8536-9_31

Published : 28 April 2018

Publisher Name : Springer, Singapore

Print ISBN : 978-981-10-8535-2

Online ISBN : 978-981-10-8536-9

eBook Packages : Engineering Engineering (R0)

- Open access
- Published: 03 January 2024

## Satellite mapping reveals extensive industrial activity at sea

- Fernando Paolo ORCID: orcid.org/0000-0002-6439-2918 1 na1 ,
- David Kroodsma ORCID: orcid.org/0000-0002-1752-9141 1 na1 ,
- Jennifer Raynor ORCID: orcid.org/0000-0003-0229-9889 2 ,
- Tim Hochberg ORCID: orcid.org/0000-0003-2456-1286 1 ,
- Pete Davis 1 ,
- Jesse Cleary 3 ,
- Luca Marsaglia ORCID: orcid.org/0009-0005-4497-3271 1 ,
- Sara Orofino 4 ,
- Christian Thomas ORCID: orcid.org/0000-0001-9574-224X 5 &
- Patrick Halpin ORCID: orcid.org/0000-0001-5845-3588 3

Nature volume 625 , pages 85–91 ( 2024 ) Cite this article

- Energy infrastructure
- Environmental impact

The world’s population increasingly relies on the ocean for food, energy production and global trade 1 , 2 , 3 , yet human activities at sea are not well quantified 4 , 5 . We combine satellite imagery, vessel GPS data and deep-learning models to map industrial vessel activities and offshore energy infrastructure across the world’s coastal waters from 2017 to 2021. We find that 72–76% of the world’s industrial fishing vessels are not publicly tracked, with much of that fishing taking place around South Asia, Southeast Asia and Africa. We also find that 21–30% of transport and energy vessel activity is missing from public tracking systems. Globally, fishing decreased by 12 ± 1% at the onset of the COVID-19 pandemic in 2020 and had not recovered to pre-pandemic levels by 2021. By contrast, transport and energy vessel activities were relatively unaffected during the same period. Offshore wind is growing rapidly, with most wind turbines confined to small areas of the ocean but surpassing the number of oil structures in 2021. Our map of ocean industrialization reveals changes in some of the most extensive and economically important human activities at sea.

More than one billion people depend on the ocean for their primary source of food 1 , 2 , 3 , with 260 million employed by global marine fisheries alone 6 . About 80% of all traded goods are shipped over the ocean 7 and nearly 30% of the world’s oil is produced in offshore fields and distributed worldwide 8 . In addition to these established uses of the ocean, increases in offshore renewable energy, aquaculture and mining are rapidly emerging. All of this industrial machinery powers a 1.5–2.5 trillion dollar ‘blue economy’ 9 , 10 that is growing faster than the overall global economy 10 but is also causing rapid environmental decline. A third of fish stocks are operated beyond biologically sustainable levels 11 and an estimated 30–50% of critical marine habitats have been lost owing to human industrialization 12 , 13 , 14 .

A lack of global observational data limits understanding of where and how the blue economy is expanding and how it is affecting developing nations and coastal communities 15 , 16 , 17 . On land, maps exist for almost every road 18 , datasets are being developed for every human-made structure 19 and extractive industries such as forestry and agriculture are mapped globally at sub-kilometre scale and updated monthly 20 , 21 . In the ocean, however, many seagoing vessels do not broadcast their location or are not detected by public monitoring systems 22 , and information on the development of offshore infrastructure and other industrial activities is often held private 23 . The result is that continuing human expansion into the ocean is poorly documented.

Current approaches for mapping human activity at sea have limitations. Some vessel-tracking systems, such as the vessel monitoring system (VMS) used in fishing, are proprietary, which limit the ability to map and compare across regions 5 . For public mapping of ships, the focus has been on the automatic identification system (AIS) 24 , which broadcasts vessel coordinates to track vessel movements and support maritime safety; AIS data can also reveal vessel identities, owners and corporations, and fishing activities 5 , 25 , 26 . Not all vessels, however, are required to use AIS devices, as regulations vary by country, vessel size and activity 22 . Vessels engaged in illicit activities often turn off their AIS transponders or manipulate the locations they broadcast 27 , 28 , 29 . In recent years, for example, the largest cases of illegal fishing 28 and forced labour 30 , 31 were by fleets that mostly did not use AIS devices. Furthermore, large ‘blind spots’ along coastal waters emerge where satellite reception is poor 22 and AIS data received by terrestrial receptors can be restricted by national governments 32 . We refer to vessels that are not visible on publicly accessible AIS data as ‘not publicly tracked’. This concept is also sometimes referred to as ‘dark vessels’. Although the location of offshore fixed infrastructure should be more readily available than moving vessels, information on offshore development is often restricted for commercial or bureaucratic reasons 33 , and large-scale assessments must aggregate several disparate data sources, which are often incomplete or outdated 34 . Vessel activity and ocean infrastructure are not captured well by existing methods, but satellite imagery and deep learning can improve the monitoring of human use of the ocean.

Here we present a detailed global map of major industrial activities at sea. To detect and characterize vessels and offshore infrastructure in coastal waters around the globe, we analysed 2 petabytes of satellite imagery spanning the years 2017–2021, with our analyses covering more than 15% of the ocean (Extended Data Fig. 1) in which more than 75% of industrial activity is concentrated (Methods). We designed and trained three deep convolutional neural networks to identify objects (>97% accuracy) and estimate their lengths (R² score of 0.84); to classify offshore infrastructure into oil, wind and other objects (>98% accuracy); and to classify vessels as fishing or non-fishing (>90% accuracy). Combined, we classified more than 67 million image tiles, including dual-polarization synthetic-aperture radar (SAR) imagery from Sentinel-1 (ref. 35) and optical (red, green, blue and near-infrared (NIR)) imagery from Sentinel-2 (ref. 36). The resolution of SAR allows us to capture most objects larger than 15 m (detection rate >70% for 25-m vessels and >90% for vessels 50 m and larger; Extended Data Fig. 2). We also analysed 53 billion vessel GPS positions from the AIS and matched them to the satellite detections to determine whether a detected vessel was publicly tracked.

## Fishing and non-fishing vessels

During 2017–2021, on average, about 63,300 vessel occurrences were detected at any given moment, roughly half (42–49%) of which were fishing vessels (based on 23.1 million vessel detections; Fig. 1 ). Notably, about three-quarters (72–76%) of globally mapped industrial fishing did not appear in public monitoring systems, compared with one-quarter (21–30%) for other vessel activities.

a , b , Per square kilometre, the average number of industrial fishing vessels ( a ) and shipping, tanker, passenger and support vessels ( b ), from 5 years of satellite SAR imagery. The colour represents the percentage of detected vessels that were matched (blue, publicly tracked) and unmatched (red, not publicly tracked) to known vessel positions from AIS broadcast. c , For each continent, the total number of detected vessels and the respective fraction of publicly and not publicly tracked. The outline around the continents (light grey) shows the area of the ocean with available SAR imagery (see Extended Data Fig. 1 for the spatial distribution of images). ‘N. America’ includes Central American countries. Classification of detected objects was performed with deep learning.

Vessel activity was widespread but also highly concentrated. Dividing our study area into 0.1° cells (about 11 km), we detected a vessel at least once in 84% of the cells covered by the satellites, yet half of all vessel activity was concentrated in less than 3% of the cells. Most vessel activity (86% of fishing and 75% of non-fishing) was focused in waters less than 200 m deep (Fig. 1 ), which constitute only 7% of the ocean. Activity is also unevenly distributed by continent, with approximately 67% of all vessel activity in Asia, followed by 12% in Europe, 7% in North America, 7% in Africa, 4% in South America and 2% in Australia (Fig. 1 ).
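The kind of concentration statistic described above can be sketched as follows: bin detections into 0.1° cells, then ask what share of occupied cells is needed to account for half of all detections. This is an illustration under simplifying assumptions, not the authors' pipeline; the function names, the input format (a list of latitude and longitude pairs) and the sample points are all hypothetical.

```python
import math
from collections import Counter


def cell_index(lat, lon, cell_deg=0.1):
    """Map a position to the integer index of its grid cell.
    One degree of latitude is roughly 111 km, so a 0.1 degree cell is about 11 km."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def share_of_cells_holding_half(detections, cell_deg=0.1):
    """Fraction of occupied cells that together contain half of all detections,
    counting the busiest cells first."""
    counts = Counter(cell_index(lat, lon, cell_deg) for lat, lon in detections)
    total = sum(counts.values())
    running = 0
    for rank, (_, count) in enumerate(counts.most_common(), start=1):
        running += count
        if running * 2 >= total:
            return rank / len(counts)
    return 0.0  # no detections at all


# Hypothetical sample: eight detections in one cell, one each in two others.
sample = [(10.01, 20.01)] * 8 + [(10.25, 20.01), (10.35, 20.01)]
print(share_of_cells_holding_half(sample))
```

Note that this toy version measures the share among occupied cells only, whereas the figures in the text are expressed relative to all cells covered by the satellites.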

Our satellite mapping revealed high densities of vessel activity in large areas of the ocean that previously showed little to no vessel activity by public tracking systems (Fig. 2 ). Indonesia, South Asia, Southeast Asia and the northern and western coasts of Africa (Fig. 2 and Extended Data Figs. 3 and 4 ) all show substantial amounts of activity not publicly tracked.

Satellite SAR detections of individual vessels during 2017–2021, matched (blue) and unmatched (red) to known vessel positions from AIS broadcast, are classified as fishing or non-fishing vessels with a deep-learning model. Most fishing vessels, usually smaller than 50 m in length, concentrate close to shore and follow bathymetric features, such as the continental shelf break and seabed canyons, or regulatory and political boundaries. Extensive areas of previously unmapped fishing activity are revealed along Northern Africa and South and Southeast Asia. The absolute number of detections in each location depends on the local vessel density and the number of satellite image acquisitions, which varies by region. Depicted numbers may represent a slightly larger area than is shown. This figure shows the level of spatial detail that is possible with our mapping approach. Extended Data Figures 3 and 4 show further examples of high-resolution fishing and non-fishing patterns publicly and not publicly tracked.

By mapping vessels that fail to broadcast their location, we show far more accurately the global distribution of industrial fishing. AIS data alone, for example, wrongly suggest that Europe and Asia have comparable fishing activity, with other continents having less than one-fifth as much activity (Extended Data Table 1 ). Our global map, however, reveals that Asia dominates industrial fishing, accounting for 70% of all fishing vessel detections (Extended Data Fig. 5 ); nearly 30% of all mapped fishing vessels were concentrated in the exclusive economic zone (EEZ) of China alone. Similarly, AIS data suggest that European countries in the Mediterranean have more than ten times as many fishing hours in their EEZs as do African countries 5 , but our mapping shows that detections of fishing vessels are fairly balanced between the northern and southern parts of the Mediterranean Sea (Figs. 1 and 2 ).

Our mapping can also reveal potential hotspots of illegal fishing activity. Previous work showed substantial illicit activity in the eastern waters of North Korea 28 , but our global mapping shows that most of the undisclosed fishing actually occurred in the western part of the Korean Peninsula (Fig. 2 ). In fact, this location showed the highest density of fishing vessels in the world from 2017 to 2019, with about 40 vessels per 1,000 km 2 . This previously unmapped activity peaked each year in May, during China’s moratorium on fishing in their own waters (Extended Data Fig. 6 ), and activity abruptly fell by 85% during the COVID-19 pandemic when North Korea shut its borders. Numerous fishing vessels not publicly tracked were also detected inside many marine protected areas (MPAs). For example, two of the most iconic, biologically important and well-monitored MPAs in the world—the Galápagos Marine Reserve and the Great Barrier Reef Marine Park—showed, on average, more than 5 and 20 of these vessels per week, respectively (Extended Data Fig. 7 ).

The spatial resolution of our data, which is substantially higher than the most widely used global fishing products 37 , 38 , also reveals detailed fishing strategies at the regional scale (Fig. 2 and Extended Data Fig. 3 ). The area between Tunisia and Sicily, for example, shows a mix of both publicly and not publicly tracked fishing vessels aggregating along ocean banks and the edges of seabed canyons, a signature characteristic of bottom trawling 39 . Similarly, off the coast of Bangladesh, in which almost no vessels are publicly tracked and no public maps of fishing exist, fishing vessels follow bathymetric contours and submarine canyons that radiate from the Ganges Delta.

Unlike fishing, most non-fishing vessels (largely transport and energy-related) broadcast their locations, with just about one-quarter missing from public monitoring systems. Asia had the largest concentration (65% of all detections) of transport and energy vessels, including most of the non-broadcasting ones (Fig. 1 )—most of these vessels, however, were operating in areas with poor satellite AIS reception, so it is possible that many vessels broadcast their positions but were not trackable with global AIS tracking services. All other continents seemed to have relatively minor tracking discrepancies across transport and energy vessels, with less than 20% of these vessels not publicly trackable.

Our mapping also tracks changes in vessel activity over time (Fig. 3 ). Similar to a previous AIS-based analysis 5 , our data show yearly cycles of fishing activity, with cycles inside China driven by the Chinese New Year and their voluntary fishing moratorium, and in the rest of the world by the New Year and associated holidays. But, owing to SAR-based detection, we can provide a more accurate assessment of trends, which reveals a global decrease in fishing activity of 12 ± 1%, coinciding with the pandemic. By stark contrast, transport and energy remained stable or even slightly increased over 2017–2021. Moreover, the impact of COVID-19 on fishing activity was much greater outside China (compared with 2018 and 2019), and transport and energy grew more in China than it did in the rest of the world.

Fig. 3 | Time series of the average number of vessels over the area covered by SAR from Sentinel-1 (constructed from the average number of detections per satellite overpass at any given location; Methods) showing that COVID-19 greatly affected fishing activity, whereas transport and energy continued to grow. China alone holds nearly 30% of the global fishing fleet and about 21% of transport and energy vessels. a, Industrial fishing vessels greater than 15–20 m in length over all EEZs outside China and inside China EEZ. b, The same as a but for transport-related and energy-related vessels, mostly shipping, tankers, passenger and support. The shaded grey areas indicate the 2-year mean ± 1 s.d., highlighting the effect of the 2020 global pandemic. The numbers show the per cent change with their respective standard error. The combined change (outside + inside) is −12 ± 1% (industrial fishing) and 0 ± 1% (transport and energy); Methods. The coloured boxes highlight the annual cycles in activity related to national holidays and fishing moratoria. The y axis shows the maximum, mean and minimum values of detected vessels during 2017–2021.

## Fixed infrastructure

The number of offshore structures worldwide was around 28,000 by the end of 2021 (Fig. 4 ). Wind turbines and oil structures in notable wind-producing or oil-producing areas ( Methods ) constituted 48% and 38% of all ocean infrastructure, respectively; the remaining 14% was divided across wind turbines and oil structures outside major development areas, as well as piers, bridges, power lines, aquaculture and other human-made structures.

Fig. 4 | a, Global map of offshore development, showing oil infrastructure in major oil-producing areas, wind farms and other human-made structures (such as piers, power lines and aquaculture). Circles are proportional to the number of structures per square-degree grid cell at the end of 2021. b, One year of vessel traffic associated with offshore infrastructure in the North Sea. These vessels were all broadcasting and interacted with a detected oil or wind structure at some point during 2021 (the vessels were within 200 m of an offshore structure for at least 2 h at a speed of 0 knots). Globally, in 2021, nearly 4,140,000 h of vessel activity were associated with oil platforms and around 792,500 h with wind turbines. GER, Germany; DK, Denmark; NL, The Netherlands; NO, Norway; SE, Sweden. c, Evolution of the number of (fixed) oil and wind structures in the ocean and the leading nations in wind development. Extended Data Figure 9 shows the leading nations in oil development. Error bars define a lower bound considering only high-confidence detections of oil structures and an upper bound including detections with lower confidence (for example, potential oil structures outside oil-producing areas; Methods).

Most oil infrastructure is distributed among 13 major oil-producing areas (Fig. 4a). Excluding Lake Maracaibo in Venezuela, which is a lagoon, our mapping shows that the largest concentration of offshore oil infrastructure in the world is in the Gulf of Mexico. At the end of 2021, the USA accounted for about a quarter of global offshore oil infrastructure (>2,200 oil structures), followed by Saudi Arabia (>770) and Indonesia (>670).

Offshore wind development has been mostly confined to northern Europe (52%) and China (45%) (Fig. 4a and Extended Data Fig. 8 ); however, there has been a shift in offshore energy development. The number of offshore oil structures has increased by about 16% over the past half a decade (Fig. 4c ), with a decrease in the USA of several hundred structures offset by increases elsewhere (Extended Data Fig. 9 ). By contrast, the number of wind turbines in the ocean has more than doubled since 2017, probably surpassing the number of oil structures by the end of 2020 (Fig. 4c ). China leads the development of offshore wind, with a staggering 900% increase in turbines from 2017 to 2021 (averaging around 950 wind turbines per year), well ahead of projections by the International Energy Agency 40 . The UK and Germany lead offshore wind development in Europe, increasing by 49% and 28%, respectively, since 2017.

## Interactions between vessels and fixed infrastructure

A key question for the future is how vessel traffic may be affected by changes in oil and wind infrastructure development. Trawlers, which fish by hauling nets along the seafloor or through the water column and are the most common fishing gear globally, avoid fishing within 1 km of oil structures, most probably to avoid net entanglement (Extended Data Fig. 10a ). Other types of fishing, which are at a lower risk of entanglement, are attracted to these structures, probably because they can cause fish to aggregate 41 . Although wind turbines may also aggregate fish, they are less likely to affect industrial fishing in the same way because they are, at present, highly concentrated and, on average, far from shore, where there is less fishing activity (Extended Data Fig. 10b ). Also, oil-related vessel traffic has a much wider footprint than wind-related traffic, accounting for five times as much activity globally in 2021 (Fig. 4b and Extended Data Fig. 4 ).

Overall, our study reveals the extent of major industrial activities at sea, with fishing being by far the ocean industry with the most activity that is not public. With our freely available dataset and technology, hotspots of potentially illegal activity can now be shown 28 and industrial fishing vessels can be identified that are encroaching on artisanal fishing grounds 17 or other countries’ EEZs 27 , but at a global scale and accessible to any nation. Maps of global fishing effort can now include all vessels, not just those based on AIS tracking (which misses about three-quarters of large vessels), and with much higher resolution than just EEZs or statistical reporting areas 37 , 38 , 42 . Our data can also help to quantify the scale of greenhouse gas emissions from vessel traffic and offshore development, which may help to inform policies on reducing greenhouse gas emissions.

This picture of human activity also presents a snapshot of how industrial use of the ocean is changing. Although COVID-19 may have had a dominant role in depressing fishing activity, fishing still decreased far more than other ocean industries. This slowdown is in line with a long-term decline in the relative importance of fishing in the ocean 43 . Since the 1980s, global marine fish catch has been relatively unchanged as most fisheries are already fished to capacity 11 . As a result, global fishing effort, which has increased several fold since 1950, increased only slightly in recent years 42 . Many countries that have reformed their fisheries show an actual decline in their fishing effort 44 . The decrease highlighted in this study may reflect this longer trend and we may already have seen the peak of fishing activity in the past decade. By contrast, transport and energy vessel traffic may continue to expand, following trends in global trade and the rapid development of renewable energy infrastructure. In this scenario, changes to marine ecosystems brought by infrastructure and vessel traffic may rival fishing in impact 43 , and an accurate mapping of these activities is fundamental to understanding and managing future human activities in the ocean.

## SAR imagery

SAR imaging systems have proved to be the most consistent option for detecting vessels at sea 45 , 46 . SAR is unaffected by light levels and most weather conditions, including daylight or darkness, clouds or rain. By contrast, some other satellite sensors, such as electro-optical imagery, rely on sunlight and/or the infrared radiation emitted by objects on the ground and can therefore be confounded by cloud cover, haze, weather events and seasonal darkness at high latitudes.

We used SAR imagery from the Copernicus Sentinel-1 mission of the European Space Agency (ESA) ( https://sentinel.esa.int/web/sentinel/user-guides/sentinel-1-sar ). The images are sourced from two satellites (S1A and, formerly, S1B, which stopped operating in December 2021) that orbit 180° out of phase with each other in a polar, sun-synchronous orbit. Each satellite has a repeat cycle of 12 days, so that—together—they provide a global mapping of coastal waters around the world approximately every 6 days. The number of images per location, however, varies greatly depending on mission priorities, latitude and degree of overlap between adjacent satellite passes ( https://sentinels.copernicus.eu/web/sentinel/missions/sentinel-1/observation-scenario ). Spatial coverage also varies over time and is improved with the addition of S1B in 2016 and the acquisition of more images in later years (Extended Data Fig. 1 ). Our data consist of dual-polarization images (VH and VV) from the Interferometric Wide (IW) swath mode, with a resolution of about 20 m. We used the Ground Range Detected (GRD) Level-1 product provided by Google Earth Engine ( https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S1_GRD ), processed for thermal noise removal, radiometric calibration and terrain correction ( https://developers.google.com/earth-engine/guides/sentinel1 ). To eliminate potential noise artefacts 33 that would introduce false detections, we further processed each image by clipping a 500-m buffer off the borders. We selected all SAR scenes over the ocean from October 2016 to February 2022, comprising 753,030 images of 29,400 × 24,400 pixels each on average.

## Visible and NIR imagery

For optical imagery, we used the Copernicus Sentinel-2 (S2) mission of the ESA ( https://sentinels.copernicus.eu/web/sentinel/user-guides/sentinel-2-msi ). These twin satellites (S2A and S2B) also orbit 180° out of phase and carry a wide-swath, high-resolution, multispectral imaging system, with a combined global 5-day revisit frequency. Thirteen spectral bands are sampled by the S2 Multispectral Instrument (MSI): visible (RGB) and NIR at 10 m, red edge and SWIR at 20 m, and other atmospheric bands at 60-m spatial resolution. We used the RGB and NIR bands from the Level-1C product provided by Google Earth Engine ( https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S2 ) and we excluded images with more than 20% cloud coverage using the QA60 bitmask band with cloud mask information. We analysed all scenes that contained a detected offshore infrastructure during our observation period, comprising 2,494,370 images of 10,980 × 10,980 pixels each on average (see the ‘Infrastructure classification’ section).

## AIS data

AIS data were obtained from satellite providers ORBCOMM and Spire. In total, using Global Fishing Watch’s data pipeline 5 , we processed 53 billion AIS messages. From those data, we extracted the locations, lengths and identities of all AIS devices that operated near the SAR scenes around the time the images were taken; we did so by interpolating between AIS positions to identify where vessels probably were at the moment of the image, as described in ref. 47 . Identities of vessels in the AIS were based on methods in ref. 5 and revised in ref. 26 .

## Environmental and physical data

To classify vessels detected with SAR as fishing and non-fishing, we constructed a series of global environmental fields that were used as features in our model. Each of these rasters represents an environmental variable over the ocean at 1-km resolution. Data were obtained from the following sources: chlorophyll data from the NASA Ocean Biology Processing Group ( https://oceancolor.gsfc.nasa.gov/data/10.5067/ORBVIEW-2/SEAWIFS/L2/IOP/2018 ), sea-surface temperature and currents from the Copernicus Global Ocean Analysis and Forecast System ( https://doi.org/10.48670/moi-00016 ), distance to shore from NASA OBPG/PacIOOS ( http://www.pacioos.hawaii.edu/metadata/dist2coast_1deg_ocean.html ), distance to port from Global Fishing Watch ( https://globalfishingwatch.org/data-download/datasets/public-distance-from-port-v1 ) and bathymetry from GEBCO ( https://www.gebco.net/ ). EEZ boundaries used in our analysis and maps are from Marine Regions 48 .

## Vessel detection by SAR

Detecting vessels with SAR is based on the widely used constant false alarm rate (CFAR) algorithm 46 , 49 , 50 , a standard adaptive threshold algorithm used for anomaly detection in radar imagery. This algorithm is designed to search for pixel values that are unusually bright (the targets) compared with those in the surrounding area (the sea clutter). This method sets a threshold that depends on the statistics of the local background, sampled with a set of sliding windows. Pixel values above the threshold constitute an anomaly and are probably samples from a target. Our modified two-parameter CFAR algorithm evaluates the mean and standard deviation of backscatter values, delimited by a ‘ring’ composed of an inner window of 200 × 200 pixels and an outer window of 600 × 600 pixels. The best separation between the ocean and the targets is accomplished by the vertical–horizontal (VH) polarization band, which shows relatively low polarized returns over flat areas (ocean surface) compared with volumetric objects (vessels and infrastructure) 45 :

$$
x_{\mathrm{px}} > \mu_b + \sigma_b\, n_t
$$

in which $x_{\mathrm{px}}$ is the backscatter value of the centre pixel, $\mu_b$ and $\sigma_b$ are the mean and standard deviation of the background, respectively, and $n_t$ is a time-dependent threshold.
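The thresholding step can be illustrated with a minimal pure-Python sketch. It assumes the common two-parameter CFAR form, in which a pixel is flagged when it exceeds the background mean by `n_t` background standard deviations. The function name `cfar_detect`, the half-width parameters `inner` and `outer`, and the tiny default window sizes are illustrative assumptions, not the production implementation, which runs on Google Earth Engine with much larger windows.

```python
from statistics import mean, pstdev


def cfar_detect(image, inner=1, outer=3, n_t=5.0):
    """Flag pixels that are unusually bright relative to a surrounding ring.

    image: 2-D list of backscatter values (one band).
    inner, outer: half-widths in pixels of the inner and outer windows
                  (hypothetical parameters; the ring lies between them).
    n_t: number of background standard deviations above the mean.
    """
    rows, cols = len(image), len(image[0])
    detections = []
    for r in range(rows):
        for c in range(cols):
            background = []
            for i in range(max(0, r - outer), min(rows, r + outer + 1)):
                for j in range(max(0, c - outer), min(cols, c + outer + 1)):
                    # Keep only pixels outside the inner window (the ring).
                    if abs(i - r) > inner or abs(j - c) > inner:
                        background.append(image[i][j])
            if len(background) < 2:
                continue  # not enough samples to estimate the statistics
            mu_b = mean(background)
            sigma_b = pstdev(background)
            if image[r][c] > mu_b + n_t * sigma_b:
                detections.append((r, c))
    return detections
```

Excluding the inner window keeps the target's own bright pixels out of the background estimate, which is why the ring is used rather than a solid window.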

To maximize detection performance, we determined the sizes of the windows empirically, based on the fraction of detected vessels (broadcasting AIS) with length between 15 m and 20 m. A key feature of our two-parameter CFAR algorithm is the ability to specify different thresholds for different times. This adjustment is needed because the statistical properties of the SAR images provided by Sentinel-1 vary with time as well as by satellite (S1A and S1B). We found that both the mean and the standard deviation of the ocean pixels in the scenes changed over time, requiring different calibrations of the CFAR parameters for five different time intervals during which the statistics of the images remained relatively constant: January 2016 to October 2016 ($n_{\mathrm{S1A}}$ = 14, $n_{\mathrm{S1B}}$ = none); September 2016 to January 2017 (14, 18); January 2017 to March 2018 (14, 17); March 2018 to January 2020 (16, 19); and January 2020 to December 2021 (22, 24). The five detection thresholds were calibrated to obtain a consistent detection rate for the smaller vessels across the entire Sentinel-1 archive (60% detection of vessels 15–20 m in length). The relative simplicity of our approach allowed us to reprocess the full archive of Sentinel-1 imagery several times to empirically determine the optimal parameters for detection.

To implement our SAR detection algorithm, we used the Python API of Google Earth Engine ( https://developers.google.com/earth-engine/tutorials/community/intro-to-python-api ), a planetary-scale platform for analysing petabytes of satellite imagery and geospatial datasets. For processing, analysing and distributing our data products, our detection workflow uses Google’s cloud infrastructure for big data, including Earth Engine, Compute Engine, Cloud Storage and BigQuery.

## Vessel presence and length estimation

To estimate the length of every detected object and also to identify when our CFAR algorithm made false detections, we designed a deep convolutional neural network (ConvNet) based on the modern ResNet (Residual Networks) architecture 51 . This single-input/multi-output ConvNet takes dual-band SAR image tiles of 80 × 80 pixels as input and outputs the probability of object presence (a binary classification task) and the estimated length of the object (a regression task).

To analyse every detection, we extracted a small tile from the original SAR image that contained the detected object at the centre and that preserved both polarization bands (VH and VV). Our inference data therefore consisted of more than 62 million dual-band image tiles to classify. To construct our training and evaluation datasets, we used SAR detections that matched to AIS data with high confidence (see the ‘SAR and AIS integration’ section), including a variety of challenging scenarios such as icy locations, rocky locations, low-density and high-density vessel areas, offshore infrastructure areas, poor-quality scenes, scenes with edge artefacts and so on (Extended Data Fig. 11 ). To inspect and annotate these samples, we developed a labelling tool and used domain experts, cross-checking annotations from three independent labellers on the same samples and retaining the high-confidence annotations. Overall, our labelled data contained about 12,000 high-quality samples that we partitioned into the training (80%, for model learning and selection) and test (20%, for model evaluation) sets.

For model learning and selection, we followed a training–validation scheme that uses fivefold cross-validation ( https://scikit-learn.org/stable/modules/cross_validation.html ), in which, for each fold (a training cycle), 80% of the data is reserved for model learning and 20% for model validation, with the validation subset non-overlapping across folds. Performance metrics are then averaged across folds for model assessment and selection, and the final model evaluation is performed on the holdout test set. On the test set, our best model achieved an F1 score of 0.97 (accuracy = 97.5%) for the classification task and an R² score of 0.84 (RMSE = 21.9 m, or about 1 image pixel) for the length-estimation task.
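The fold structure described above can be sketched in a few lines of standard-library Python. This is only an illustration of fivefold splitting and score averaging. The helper names `kfold_indices` and `cross_validate`, and the `fit` and `score` callables, are hypothetical placeholders, not the training code used in the study (which points to scikit-learn for the scheme).

```python
import random


def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train, validation) index lists for k non-overlapping folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


def cross_validate(samples, fit, score, k=5):
    """Average a validation metric across k folds.

    fit(train_samples) returns a model; score(model, val_samples) returns
    a number. Both are supplied by the caller (placeholders here).
    """
    scores = []
    for train_idx, val_idx in kfold_indices(len(samples), k):
        model = fit([samples[i] for i in train_idx])
        scores.append(score(model, [samples[i] for i in val_idx]))
    return sum(scores) / len(scores)
```

With `k=5`, each cycle trains on 80% of the training partition and validates on the remaining 20%, and every sample is used for validation exactly once.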

## Infrastructure detection

To detect offshore infrastructure, we used the same two-parameter CFAR algorithm developed for vessel detection, with two fundamental modifications. First, to remove non-stationary objects, that is, most vessels, we constructed median composites from SAR images within a 6-month time window. Because stationary objects are repeated across most images, they are retained with the median operation, whereas non-stationary objects are excluded. We repeated this procedure for each month, generating a monthly time series of composite images. The temporal aggregation of images also reduces the background noise (the sea clutter) while enhancing the coherent signals from stationary objects 33 . Second, we empirically adjusted the sizes of the detection window. Because offshore infrastructure is often arranged in dense clusters, such as wind farms following a grid-like pattern, we reduced the spatial windows to avoid ‘contamination’ from neighbouring structures. It is also common to find smaller structures such as weather masts placed between some of the wind turbines. We found that an inner window of 140 × 140 pixels and outer window of 200 × 200 pixels was optimal for detecting every object in all wind farms and oil fields that we tested, including Lake Maracaibo, the North Sea and Southeast Asia, areas known for their high density of structures (Extended Data Fig. 7).
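The effect of the median operation can be shown with a small sketch. It assumes a stack of co-registered single-band images held as nested lists. The function name `median_composite` is illustrative, and the real composites are built from Sentinel-1 scenes on Google Earth Engine rather than with this code.

```python
from statistics import median


def median_composite(images):
    """Per-pixel median over a stack of equally sized 2-D images.

    A pixel that is bright in most images (a fixed structure) stays bright;
    a pixel that is bright in only a few images (a passing vessel) falls
    back to the background level.
    """
    rows, cols = len(images[0]), len(images[0][0])
    return [
        [median(img[r][c] for img in images) for c in range(cols)]
        for r in range(rows)
    ]
```

The median is preferred over the mean here because a single bright vessel pass would raise a mean composite, whereas it leaves the median unchanged as long as the vessel is present in fewer than half of the images.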

## Infrastructure classification

To classify every detected offshore structure, we used deep learning. We designed a ConvNet based on the ConvNeXt architecture 52 . A key difference from the ‘vessel presence and length estimation’ model, besides using a different architecture, is that this model is a multi-input/single-output ConvNet that takes two different multiband image tiles of 100 × 100 pixels as input, passes them through independent convolutional layers (two branches), concatenates the resulting feature maps and, with a single classification head, outputs the probabilities for the specified classes: wind infrastructure, oil infrastructure, other infrastructure and noise.

A new aspect of our deep-learning classification approach is the combination of SAR imagery from Sentinel-1 with optical imagery from Sentinel-2. From 6-month composites of dual-band SAR (VH and VV) and four-band optical (RGB and NIR) images, we extracted small tiles for every detected fixed structure, with the respective objects at the centre of the tile. Although both the SAR and optical tiles consist of 100 × 100 pixels, they come from imagery with different resolutions: the dual-band SAR tile has a spatial resolution of 20 m per pixel and the four-band optical tile is 10 m per pixel. This variable resolution not only provides information with different levels of granularity but also yields different fields of view.
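As a quick check of the fields of view implied by these resolutions, assuming square tiles of 100 pixels per side (the input size stated for the classification model):

$$
\text{SAR: } 100 \times 20\ \text{m} = 2\ \text{km per side}, \qquad \text{optical: } 100 \times 10\ \text{m} = 1\ \text{km per side}.
$$

Under this assumption, the SAR tile covers four times the area of the optical tile (4 km² versus 1 km²), whereas the optical tile samples its smaller footprint twice as finely along each axis.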

From our inference data for infrastructure classification, which consisted of nearly six million multiband images, we constructed the labelled data by integrating several sources of ground truth for ‘oil and gas’ and ‘offshore wind’: from the Bureau of Ocean Energy Management ( https://www.data.boem.gov/Main/HtmlPage.aspx?page=platformStructures ), the UK Hydrographic Office ( https://www.admiralty.co.uk/access-data/marine-data ), the California Department of Fish and Wildlife ( https://data-cdfw.opendata.arcgis.com/datasets/CDFW::oil-platforms-ospr-ds357/about ) and Geoscience Australia ( https://services.ga.gov.au/gis/rest/services/Oil_Gas_Infrastructure/MapServer ). Using a labelling approach similar to that of the vessel samples, we also inspected a large number of detections to identify samples for ‘other structures’ and ‘noise’ (rocks, small islands, sea ice, radar ambiguities and image artefacts). From all areas known to have some offshore infrastructure (Extended Data Fig. 11 ), our labelled data contained more than 47,000 samples (45% oil, 41% wind, 10% noise and 4% other) that we partitioned into the training (80%) and test (20%) sets, using the same fivefold cross-validation strategy as for vessels.

Because the same fixed objects appear in several images over time, we grouped the candidate structures for the labelled data into 0.1° spatial bins and sampled from different bins for each data partition, so that the subsets for model learning, selection and evaluation did not contain the same (or even nearby) structures at any point. We also note that, in the few cases in which optical tiles were unavailable, for example, because of seasonal darkness close to the poles, the classification was performed with SAR tiles only (optical tiles were blank). Our best model achieved on the test set a class-weighted average F1 score of 0.99 (accuracy = 98.9%) for the multiclass problem.
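A spatially grouped split of this kind can be sketched as follows. The sample format (dictionaries with `lat` and `lon` keys) and the helper names `spatial_bin` and `split_by_bin` are assumptions made for illustration. The sketch also applies the test fraction to bins rather than to samples, which is a simplification.

```python
import math
import random


def spatial_bin(lat, lon, size=0.1):
    """Return the integer (row, col) index of the size-degree bin."""
    return (math.floor(lat / size), math.floor(lon / size))


def split_by_bin(samples, test_fraction=0.2, size=0.1, seed=0):
    """Assign whole spatial bins to either the training or the test set."""
    bins = {}
    for s in samples:
        bins.setdefault(spatial_bin(s["lat"], s["lon"], size), []).append(s)
    keys = sorted(bins)
    random.Random(seed).shuffle(keys)
    n_test = max(1, round(test_fraction * len(keys)))
    test = [s for k in keys[:n_test] for s in bins[k]]
    train = [s for k in keys[n_test:] for s in bins[k]]
    return train, test
```

Because every sample in a bin goes to the same partition, repeated observations of one structure cannot appear on both sides of the split. Structures close to a bin edge can still have neighbours in an adjacent bin, so a stricter variant would also leave a buffer between training and test bins.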

## Fishing and non-fishing classification

To identify whether a detected vessel was a fishing or non-fishing boat, we also used deep learning. For this classification task, we used the same underlying ConvNeXt architecture as for infrastructure, modified to process the following two inputs: the estimated length of the vessel from SAR (a scalar quantity) and a stack of environmental rasters centred at the location of the vessel (a multiband image). This multi-input-mixed-data/single-output model passes the raster stack (11 bands) through a series of convolutional layers and combines the resulting feature maps with the vessel-length value to perform a binary classification: fishing or non-fishing.

Two key aspects of our neural-net classification approach differ greatly from conventional image-classification tasks.

First, we are classifying the environmental context in which the vessel in question operates. To do so, we constructed 11 gridded fields (rasters) with a resolution of 0.01° (approximately 1 km per pixel at the equator) and with global coverage. At every pixel, each raster contains contextual information on the following variables: (1) vessel density (based on SAR); (2) average vessel length (based on SAR); (3) bathymetry; (4) distance from port; (5) and (6) hours of non-fishing-vessel presence (from the AIS) for vessels less than 50 m and more than 50 m, respectively; (7) average surface temperature; (8) average current speed; (9) standard deviation of daily temperature; (10) standard deviation of daily current speed; and (11) average chlorophyll. For every detected vessel, we sampled 100 × 100-pixel tiles from these rasters, producing an 11-band image that we then classified with the ConvNet. Each detection is thus provided with context in an area just over 100 × 100 km. We obtained the fishing and non-fishing labels from AIS vessel identities 26 .

Second, our predictions are produced with an ensemble of two models with no overlap in spatial coverage. To avoid leakage of spatial information between the training sets of the two models, and also to maximize spatial coverage, we divided the centre of the tiles into a 1° longitude and latitude grid. We then generated two independent labelled datasets, one containing the tiles from the ‘even’ and the other from the ‘odd’ latitude and longitude grid cells. This alternating 1° (the size of the tile) strategy ensures no spatial overlap between tiles across the two sets. We trained two independent models, one for ‘even’ tiles and another for ‘odd’ tiles, with each model ‘seeing’ a fraction of the ocean that the other model does not ‘see’. The test set that we used to evaluate both models contains tiles from both ‘even’ and ‘odd’ grid cells, with a 0.5° buffer around all the test grid cells removed from all the neighbouring cells (used for training) to ensure spatial independence across all data partitions (no leakage). By averaging the predictions from these two models, we covered the full spatial extent of our detections with independent and complementary spatial information.

Our original test set contained 47% fishing and 53% non-fishing samples. We calibrated the model output scores by adjusting the ratio of fishing to non-fishing vessels in the test set to 1:1 ( https://scikit-learn.org/stable/modules/calibration.html ). We performed a sensitivity analysis to see how our results changed with different proportions of fishing and non-fishing vessels, 2:1 and 1:2. On average, about 30,000 vessels not publicly tracked were detected at any given time. The calibrated scores with two-thirds fishing vessels predicted that 77% of these vessels were fishing, whereas the calibration with only one-third fishing vessels predicted that 63% of them were fishing vessels. Thus, the total percentage (considering all detections) of fishing and non-fishing vessels not publicly tracked amounts to 72–76% and 21–30%, respectively. Analysts at Global Fishing Watch then reviewed these outputs in different regions of the world to verify their accuracy.

Our training data contained about 120,000 tiles (divided into ‘odd’ and ‘even’) that we split into 80% for model learning and 20% for model selection. Our test set for model evaluation contained 14,100 tiles from both ‘odd’ and ‘even’ grid cells (Extended Data Fig. 11). The inference data contained more than 52 million tiles (11-band images) with respective vessel lengths that we classified with the two models. On the test set, our best model ensemble achieved an F1 score of 0.91 (accuracy = 90.5%) for the classification task.

## False positives and recall

Because there is no ground-truth data on where vessels are not present, estimating the rate of false positives at the global scale of our vessel detection algorithm is challenging. Although some studies report the total number of false positives, we believe that a more meaningful metric is the ‘false positive density’ (number of false positives per unit area), which takes into account the actual scale of the study. We estimated this metric by analysing 150 million km² of imagery across all five years in regions with very low density of AIS-equipped vessels (less than 10 total hours in 2018 in a grid cell of 0.1°), in regions far from shore (>20 km) and in the waters of countries that have relatively good AIS use and reception. The number of non-broadcasting vessel detections in these regions serves as the upper limit on the density of false positives, which we estimated as 5.4 detections per 10,000 km². If all of these were false positives, it would suggest a false-positive rate of about 2% in our data. Because many of these are probably real detections, however, the actual false-positive rate is probably lower. Compared with other sources of uncertainties, such as the resolution limitation of the SAR imagery and missing some areas of the ocean (see below), false positives introduce a relatively minor error to our estimations.

To estimate recall (proportion of actual positives correctly identified), we used a method similar to that used in ref. 47 . We identified all vessels that had an AIS position very close in time to the image acquisition (<2 min) and should therefore have appeared in the SAR scene; if they were detected in the SAR image, we could match them to the respective AIS-equipped vessels and then identify the AIS-equipped vessels not detected. The recall curve suggests that we are able to detect more than 95% of all vessels greater than 50 m in length and around 80% of all vessels between 25 m and 50 m in length, with the detection rate decaying steeply for vessels smaller than 25 m (Extended Data Fig. 2 ). However, because our vessel detection relies on a CFAR algorithm with a 600-m-wide window, when vessels are close to one another (<1 km), the detection rate is lower. See the ‘Limitations of our study’ section for factors influencing detectability.
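The recall-by-length calculation can be sketched as follows, assuming a list of AIS-equipped vessels known to be in a scene, each tagged with whether it was matched to a SAR detection. The function name `recall_by_length`, the input format and the bin edges are illustrative assumptions and do not reproduce the curve in Extended Data Fig. 2.

```python
def recall_by_length(vessels, edges):
    """Fraction of vessels detected within each length bin.

    vessels: iterable of (length_m, detected) pairs, where detected is True
             if the AIS-equipped vessel was matched to a SAR detection.
    edges: ascending bin edges in metres; bin i covers [edges[i], edges[i+1]).
    Returns one recall value per bin, or None for an empty bin.
    """
    counts = [[0, 0] for _ in range(len(edges) - 1)]  # [detected, total]
    for length, detected in vessels:
        for i in range(len(edges) - 1):
            if edges[i] <= length < edges[i + 1]:
                counts[i][1] += 1
                counts[i][0] += int(detected)
                break
    return [d / t if t else None for d, t in counts]
```

For example, calling `recall_by_length(vessels, [15, 25, 50, float("inf")])` returns one recall value for each of the 15–25 m, 25–50 m and >50 m classes.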

## SAR and AIS integration

Matching SAR detections to the GPS coordinates of vessels (from AIS records) is challenging because the timestamps of the SAR images and AIS records do not coincide, and a single AIS message can potentially match to several vessels appearing in the image, and vice versa. To determine the likelihood that a vessel broadcasting AIS signals corresponded to a specific SAR detection, we followed the matching approach outlined in ref. 47 , with a few improvements. This method draws on probability rasters of where a vessel probably is minutes before and after an AIS position was recorded. These rasters were developed from one year of global AIS data, including roughly 10 billion vessel positions, and computed for six different vessel classes, considering six different speeds and 36 time intervals, leading to 1,296 rasters. This probability raster approach could be seen as a utilization distribution 53 (for each vessel class, speed and time interval) in which the space is relative to the position of the individual.

As described in ref. 47 , we combined the before and after probability rasters to obtain the probability distribution of the probable location of each vessel. We then calculated the value of this probability distribution at each SAR detection that a given vessel could match to. This value was then adjusted to account for: (1) the likelihood a vessel was detected and (2) a factor to account for whether the length of the vessel (from Global Fishing Watch’s AIS database) is in agreement with the length estimated from the SAR image. The resulting value provides a score for each potential AIS to SAR match, calculated as

$$
\mathrm{score} = p \times L_{\mathrm{match}} \times L_{\mathrm{detect}}
$$

in which $p$ is the value of the probability distribution at the location of the detection (following ref. 47 ), $L_{\mathrm{match}}$ is a factor that adjusts this score based on length and $L_{\mathrm{detect}}$ is the likelihood of detecting the vessel, defined as

$$
L_{\mathrm{detect}} = R \times L_{\mathrm{inside}}
$$

in which $R$ is the recall as a function of vessel size and distance to the nearest vessel with an AIS device (Extended Data Fig. 2) and $L_{\mathrm{inside}}$ is the probability that the vessel was in the scene at the moment of the image, obtained by calculating the fraction of a vessel’s probability distribution that is within the given SAR scene 47 . Drawing on 2.8 million detections of high-confidence matches (AIS to SAR matches that were unlikely to match to other detections and for which the AIS-equipped vessel had a position within 2 min of the image), we developed a lookup table with the fractional difference between AIS known length and SAR estimated length, discretized in 0.1 difference intervals. Multiplying by this value ($L_{\mathrm{match}}$) makes it very unlikely for a small vessel to match to a large detection, or vice versa.

A matrix of scores of potential matches between SAR and AIS is then computed, and matches are assigned (by selecting the best option available at the moment) and removed in an iterative procedure. This method performs substantially better than conventional approaches, such as interpolation based on speed and course 47. A key challenge is deciding on the best score threshold to accept or reject a match, because a threshold that is too high or too low would, respectively, increase or decrease the likelihood that a given SAR detection is labelled as a vessel not publicly tracked. To determine the optimal score, we estimated the total number of vessels with AIS devices that should have appeared in the scenes globally by summing $R(\text{length}, \text{spacing})\,L_{\text{match}}$ for all scenes. This value suggests that, globally, 17 million vessels with AIS devices should have been detected in the SAR images. As such, we selected the threshold that provided 17 million matches from the actual detections, that is, 7.4 × 10⁻⁶.
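The iterative assignment and the threshold choice can be sketched as follows. This is an illustrative reading of the procedure, not the published matching code: the dictionary layout, the function names and the toy scores are all assumptions.

```python
def greedy_match(scores, threshold):
    """Assign AIS vessels to SAR detections, best score first.

    scores: dict mapping (ais_id, detection_id) -> score.
    Each vessel and each detection is used at most once; pairs scoring
    below the threshold are rejected.
    """
    matches = []
    used_ais, used_det = set(), set()
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    for (ais_id, det_id), s in ranked:
        if s < threshold:
            break  # all remaining pairs score lower still
        if ais_id in used_ais or det_id in used_det:
            continue  # one side was already taken by a better match
        used_ais.add(ais_id)
        used_det.add(det_id)
        matches.append((ais_id, det_id, s))
    return matches


def threshold_for_expected(scores, expected_matches):
    """Return the score of the last match kept when exactly the expected
    number of matches is accepted."""
    ranked = greedy_match(scores, threshold=0.0)
    k = min(int(round(expected_matches)), len(ranked))
    return ranked[k - 1][2] if k > 0 else float("inf")


# Toy score matrix (hypothetical values).
scores = {
    ("a", "d1"): 0.9, ("a", "d2"): 0.4,
    ("b", "d1"): 0.8, ("b", "d2"): 0.3,
    ("c", "d3"): 0.05,
}
threshold = threshold_for_expected(scores, expected_matches=2)
accepted = greedy_match(scores, threshold)
```

Sorting once and skipping pairs whose vessel or detection is already taken gives the same result as repeatedly picking the best remaining pair and removing it.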

We refer to ref. 47 for the full description of the raster-based matching algorithm, and the matching code can be found at https://github.com/GlobalFishingWatch/paper-longline-ais-sar-matching .

## Data filtering

Delineating shorelines is difficult because current global datasets do not capture the complexities of all shorelines around the world 54 , 55 . Furthermore, the shoreline is a dynamic feature that constantly changes with time. To avoid false detections introduced by inaccurately defined shorelines, we filtered out a 1-km buffer from a global shoreline that we compiled using several sources ( https://www.ngdc.noaa.gov/mgg/shorelines , https://www.naturalearthdata.com/downloads/10m-physical-vectors/10m-minor-islands , https://data.unep-wcmc.org/datasets/1 , https://doi.org/10.1080/1755876X.2018.1529714 , https://osmdata.openstreetmap.de/data/land-polygons.html , https://www.arcgis.com/home/item.html?id=ac80670eb213440ea5899bbf92a04998 ). We used this synthetic shoreline to determine the valid area for detection within each SAR image.

We filtered out areas with a notable concentration of sea ice, which could introduce false detections because ice is a strong radar reflector, often showing up in SAR images with a similar signature to that of vessels and infrastructure. We used a time-variable sea-ice-extent mask from the Multisensor Analyzed Sea Ice Extent – Northern Hemisphere (MASIE-NH), Version 1 ( https://nsidc.org/data/g02186/versions/1#qt-data_set_tabs ), supplemented with predefined bounding boxes over lower-latitude areas known to have substantial seasonal sea ice, such as the Hudson Bay in Canada, the Sea of Okhotsk north of Japan, the Arctic Ocean, the Bering Sea, selected areas near Greenland, the northern Baltic Sea and South Georgia Islands. No imagery in the mode we processed was available for Antarctic waters.

We also removed repeated objects across several images (that is, fixed structures) from the vessel-detection dataset so as to exclude them from all calculations about vessel activity. This process also removed vessels anchored for a long period of time, so our dataset is more representative of moving vessels than stationary ones.

Another potential source of noise is reflections from moving vehicles on bridges or roads close to shore. Although bridges can be removed from the data through fixed infrastructure analysis, a vehicle moving perpendicular to the satellite path will appear offset. Vehicles visible in SAR can appear more than a kilometre away from the road when moving faster than 100 km per hour on a highway, sometimes appearing in the water. For matching AIS to SAR, we account for this movement in the matching code 47 . Drawing on the global gROADSv1 dataset of roads, we identified every highway and primary road within 3 km of the ocean (including bridges) and then calculated for each image where vehicles would appear if they were travelling 135 km per hour on a highway or 100 km per hour on a primary road. These offsetting positions were turned into polygons that excluded detections within this distance, which eliminated about 1% of detections globally.
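To give a feel for the size of this displacement, the sketch below uses the standard SAR relation for a moving target (along-track shift ≈ slant range × radial speed / platform speed). This relation is not stated in the text, and the slant range, platform speed and incidence angle are rough assumed values for a Sentinel-1-like geometry, so the numbers are indicative only.

```python
import math


def azimuth_offset_m(speed_kmh, slant_range_m=850_000.0, v_sat=7500.0,
                     incidence_deg=35.0):
    """Apparent along-track shift of a vehicle driving across the satellite track.

    All default geometry values are assumptions chosen for illustration.
    """
    ground_speed = speed_kmh / 3.6  # km per hour -> m/s
    # Only the velocity component along the radar line of sight matters.
    radial_speed = ground_speed * math.sin(math.radians(incidence_deg))
    return slant_range_m * radial_speed / v_sat


highway = azimuth_offset_m(135.0)  # speed assumed in the text for highways
primary = azimuth_offset_m(100.0)  # speed assumed in the text for primary roads
```

Under these assumptions both offsets fall between 1 and 3 km, in line with the statement that vehicles can appear more than a kilometre from the road.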

A minor source of false positives is ‘radar ambiguities’ or ‘ghosts’, which are an aliasing effect caused by the periodic sampling (radar echoes) of the target to form an image. For Sentinel-1, these ghosts are most commonly caused by bright objects and appear offset a few kilometres in the azimuth direction (parallel to the satellite ground track) from the source object. These ambiguities appear separated from their source by an azimuth angle 56 ψ = λ PRF/(2V), in which λ is the SAR wavelength, V is the satellite velocity and PRF is the SAR pulse repetition frequency, which, in the case of Sentinel-1, ranges from 1 to 3 kHz and is constant across each sub-swath of the image 35. Thus, we expect the offsets to also be constant across each sub-swath.
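As a rough numerical check of this relation, the sketch below evaluates the azimuth angle for a C-band radar. The centre frequency is that of the Sentinel-1 C-SAR instrument (5.405 GHz); the platform speed and the example PRF are assumed round values, not parameters quoted in the text.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s
CENTRE_FREQ_HZ = 5.405e9        # Sentinel-1 C-band centre frequency
V_SAT = 7500.0                  # assumed platform speed, m/s


def ambiguity_angle_deg(prf_hz, wavelength_m=SPEED_OF_LIGHT / CENTRE_FREQ_HZ,
                        v_sat=V_SAT):
    """Azimuth angle between a bright source and its first ambiguity, in degrees."""
    return math.degrees(wavelength_m * prf_hz / (2.0 * v_sat))


# Hypothetical PRF inside the 1 to 3 kHz range mentioned above.
psi = ambiguity_angle_deg(1700.0)
```

With these inputs the angle comes out at a few tenths of a degree, and it scales linearly with the PRF, which is why each sub-swath has its own constant offset.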

To locate potential ambiguities, we calculated the off-nadir angle 35 θ i for every detection i and then identified all detections j within 200 m of the azimuth line through each detection as candidate ambiguities. We then calculated the difference in azimuth angles ψ ij for these candidates. To find which of these detections were potential ambiguities, we binned the calculated off-nadir angles ( θ i ) in intervals of 0.1° (approximately 200 m) and built a histogram for each interval by counting the number of detections at different azimuthal offset angles ψ , binning ψ at 0.001°. For each interval θ i , we identified the angle ψ for which there was the maximum number of detections, limiting ourselves to cases in which the number of detections was at least two standard deviations above the background level. As expected, ambiguities appeared at a consistent ψ within each of the three sub-swaths of the IW mode images. For θ < 32.41°, ambiguities occurred at ψ = 0.363° ± 0.004°. For 32.41° < θ < 36.87°, ambiguities occurred at ψ = 0.308° ± 0.004°. And for θ > 36.85°, ambiguities occurred at ψ = 0.359° ± 0.004°.
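The histogram search can be sketched in a few lines. This is a simplified illustration: the function name and input layout are assumptions, and the background level is estimated only from the occupied ψ bins other than the peak, which is one of several reasonable choices.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def find_ambiguity_offsets(candidates, theta_bin=0.1, psi_bin=0.001, n_sigma=2.0):
    """Find the dominant azimuth offset angle in each off-nadir-angle interval.

    candidates: iterable of (theta_deg, psi_deg) for candidate pairs.
    Returns {theta_interval_index: psi_peak_deg} for intervals whose peak
    count exceeds the background by n_sigma standard deviations.
    """
    hist = defaultdict(Counter)
    for theta, psi in candidates:
        hist[int(theta // theta_bin)][int(round(psi / psi_bin))] += 1

    peaks = {}
    for t_idx, counts in hist.items():
        if len(counts) < 2:
            continue  # no background to compare against
        p_idx, peak = counts.most_common(1)[0]
        background = [n for k, n in counts.items() if k != p_idx]
        if peak > mean(background) + n_sigma * pstdev(background):
            peaks[t_idx] = p_idx * psi_bin
    return peaks


# Toy data: 20 candidate pairs share one offset, 5 are scattered.
candidates = [(30.05, 0.363)] * 20 + [
    (30.05, p) for p in (0.100, 0.150, 0.200, 0.250, 0.420)
]
peaks = find_ambiguity_offsets(candidates)
```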

We then flagged all pairs of detections that lay along a line parallel to the satellite ground track and had an angle ψ within the expected values for their respective sub-swath. The smaller (dimmer) object in the pair was then selected as a potential ambiguity. We identified about 120,000 outliers out of 23.1 million detections (0.5%), which we excluded from our analysis.

Ambiguities can also arise from objects on shore. Because, generally, only objects larger than 100 m produce ambiguities in our data, and few objects larger than 100 m on shore regularly move, these ambiguities probably show up in the same location in images at different times. All stationary objects were removed from our analysis of vessels. The analysis of infrastructure also removed these false detections because, in addition to SAR, it draws on Sentinel-2 optical imagery, which is free from these ambiguities.

We defined spatial polygons for the major offshore oil-producing areas and wind-farm regions (Fig. 4a ) and we prescribed a higher confidence to the classification of oil and wind infrastructure falling inside these areas and a lower confidence elsewhere. Overall, we identified 14 oil polygons (Alaska, California, Gulf of Mexico, South America, West Africa, Mediterranean Sea, Persian Gulf, Europe, Russia, India, Southeast Asia, East Asia, Australia, Lake Maracaibo) and two wind polygons (Northern Europe, South and East China seas). We defined these polygons through a combination of: (1) global oil regions datasets ( https://doi.org/10.18141/1502839 , https://www.prio.org/publications/3685 ); (2) AIS-equipped vessel activity around infrastructure; and (3) visual inspection of satellite imagery. We then used a DBSCAN 57 clustering approach to identify detections over time (within a 50-m radius) that were probably the same structure but their coordinates differed slightly and assigned them the most common predicted label of the cluster. We also filled in gaps for fixed structures that were missing in one time step but detected in the previous and following time steps and dropped detections appearing in a single time step.
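The post-clustering clean-up can be illustrated as below. The clustering itself is not shown; the sketch assumes each cluster has already been reduced to a list of predicted labels and a per-time-step presence flag. The function names and the reading of "a single time step" as a structure seen only once overall are assumptions.

```python
from collections import Counter


def majority_label(labels):
    """Most common predicted label within one cluster of detections."""
    return Counter(labels).most_common(1)[0][0]


def clean_presence(series):
    """Fill one-step gaps, then drop structures seen in only one time step.

    series: list of booleans, one per time step, for a single structure.
    """
    filled = list(series)
    for t in range(1, len(series) - 1):
        # Missing now but detected in the previous and following time steps.
        if not series[t] and series[t - 1] and series[t + 1]:
            filled[t] = True
    if sum(filled) <= 1:
        return [False] * len(filled)
    return filled


label = majority_label(["oil", "oil", "wind"])
gap_filled = clean_presence([True, False, True, True, False, False])
dropped = clean_presence([False, False, True, False, False])
```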

## Vessel activity estimation

To convert individual detections of vessel instances to average vessel activity, we first calculated the total number of detections per pixel on a spatial grid of 1/200° resolution (about 550 m) and then normalized each pixel by the number of satellite overpasses (number of SAR acquisitions per location). To construct a daily time series of average activity, we performed this procedure with a rolling window of 24 days (two times the repeat cycle of Sentinel-1), aggregating the detections over the window and assigning the value to the centre date. We restricted the temporal analysis to only those pixels that had at least 70 of the 24-day periods (out of 77 possible), which included 95% of the total vessel activity in our study area. For individual pixels with no overpass for 24 days, we linearly interpolated the respective time series at the pixel location. Overall, only 0.7% of the activity in our time series is from interpolated values. This approach provides the average number of vessels present in each location at any given time regardless of spatial differences in frequency and number of SAR acquisitions.
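For a single pixel, the normalization and gap filling could look like the sketch below. The function names and the list-based data layout are assumptions, and the window is handled naively at the edges of the series.

```python
def average_activity(detections_by_day, overpasses_by_day, window=24):
    """Detections per overpass in a rolling window, assigned to the centre day.

    Returns None for days whose window contains no overpass.
    """
    n = len(detections_by_day)
    half = window // 2
    out = []
    for centre in range(n):
        lo, hi = max(0, centre - half), min(n, centre + half)
        det = sum(detections_by_day[lo:hi])
        ovp = sum(overpasses_by_day[lo:hi])
        out.append(det / ovp if ovp else None)
    return out


def interpolate_gaps(values):
    """Linearly interpolate None entries lying between two known values."""
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            w = (i - a) / (b - a)
            out[i] = values[a] * (1 - w) + values[b] * w
    return out


# Toy example with a 4-day window instead of 24 days.
activity = average_activity([3, 0, 0, 1], [1, 0, 0, 1], window=4)
filled = interpolate_gaps([1.0, None, None, 4.0])
```

Dividing by the number of overpasses, rather than by the number of days, is what makes pixels with different revisit frequencies comparable.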

## Temporal change estimation

We computed the global and EEZ mean time series of daily average number of vessels and monthly median number of infrastructure. We aggregated the gridded and normalized data over the area sampled by Sentinel-1 during 2017–2021, when the spatial coverage of Sentinel-1 was fairly consistent (Extended Data Fig. 1). From these time series, we then computed yearly means with respective standard deviations. Although absolute values may be sensitive to the spatial coverage, such as buffering out 1 km from shore, the trends and relative changes are robust as (a) they are calculated over a fixed area over the observation period and (b) this area contains well over three-quarters of all industrial activity at sea (corroborated by AIS). We estimated the per cent change in vessel activity owing to the pandemic (difference between means; Fig. 3) and respective standard error by bootstrapping 58 the residuals with respect to the average seasonal cycle, obtaining for industrial fishing: −14 ± 2% (outside China), −8 ± 3% (inside China), −12 ± 1% (globally); and for transport and energy: −1 ± 1% (outside China), +4 ± 1% (inside China), 0 ± 1% (globally). We note that, for visualization purposes, we smoothed the time series of vessels and offshore infrastructure with a rolling median.
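The idea of attaching a bootstrap standard error to a per cent change can be sketched as follows. This is a deliberately simplified version: it resamples already deseasonalized values independently, whereas ref. 58 describes a stationary (block-based) bootstrap. The function names, the toy data and the fixed seed are assumptions.

```python
import random
from statistics import mean, stdev


def percent_change(before, after):
    """Per cent difference between the means of two periods."""
    return 100.0 * (mean(after) - mean(before)) / mean(before)


def bootstrap_percent_change(before, after, n_boot=1000, seed=0):
    """Return (per cent change, bootstrap standard error).

    Simple resampling with replacement; temporal autocorrelation is ignored.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        b = rng.choices(before, k=len(before))
        a = rng.choices(after, k=len(after))
        estimates.append(percent_change(b, a))
    return percent_change(before, after), stdev(estimates)


# Toy deseasonalized activity values (hypothetical).
pre_pandemic = [100.0, 102.0, 98.0, 101.0, 99.0]
pandemic = [88.0, 90.0, 86.0, 89.0, 87.0]
change, std_err = bootstrap_percent_change(pre_pandemic, pandemic)
```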

## Limitations of our study

Sentinel-1 does not sample most of the open ocean. As our study shows, however, most of the industrial activity is close to shore. Also, farther from shore, more fishing vessels use AIS (60–90%) 59 , far more than the average for all fishing vessels (about 25%). Thus, for most of the world, our analysis complemented with AIS data will capture most of the human activity in the global ocean.

We do not classify objects within 1 km of shore, because of ambiguous coastlines and rocks. Nor do we classify objects in much of the Arctic and Antarctic, in which sea ice can create too many false positives; in both regions, however, vessel traffic is either very low (Antarctic) or in countries that have a high adoption of the AIS (northern European or northern North American countries). The bulk of industrial activities occurs several kilometres from shore, such as fishing along the continental shelf break, ocean transport over shipping lanes and offshore development in medium-to-large oil rigs and wind farms. Also, much of the vessel activity within 1 km of shore is by smaller boats, such as pleasure crafts.

Vessel detection by SAR imagery is limited primarily by the resolution of the images (about 20 m in the case of the Sentinel-1 IW GRD product). As a result, we miss most vessels less than 15 m in length, although an object smaller than a pixel can still be seen if it is a strong reflector, such as a vessel made of metal rather than wood or fibreglass. Especially for smaller vessels (<25 m), detection also depends on wind speed and the state of the ocean 60 , as a rougher sea surface will produce higher backscatter, making it difficult to separate a small target from the sea clutter. Conversely, the higher the radar incidence angle, the higher the probability of detection 60 , as less backscatter from the background will be received by the antenna. The vessel orientation relative to the satellite antenna also matters, as a vessel perpendicular to the radar line of sight will have a larger backscatter cross-section, increasing the probability of being detected.

Our estimates of vessel length are limited by the quality of the ground-truth data. Although we selected only high-confidence AIS to SAR matches to construct our training data, we found that some AIS records contained an incorrectly reported length. These errors, however, resulted in only a small fraction of imprecise training labels, and deep-learning models can accommodate some noise in the training data 61 .

Our fishing classification may be less accurate in certain regions. In areas of high traffic from pleasure crafts and other service boats, such as near cities in wealthy countries and in the fjords of Norway and Iceland, some of these smaller craft might be misclassified as fishing vessels. Conversely, some misclassification of fishing vessels as non-fishing vessels is expected in areas in which all activity is not publicly tracked, such as Southeast Asia. More important, however, many industrial fishing vessels are between 10 and 20 m in length, and the recall of our model falls off quickly within these lengths. As a result, the total number of industrial fishing vessels is probably substantially higher than what we detect. Because our model uses vessel length from SAR, it may be possible to use methods similar to those in ref. 47 to estimate the number of missing vessels. Future work can address this challenge.

Overall, our study probably underestimates the concentration of fishing in Asian waters and Chinese fisheries, in which we see areas of vessel activity being ‘cut off’ by the edge of the Sentinel-1 footprint. And because we miss very small vessels (for example, most artisanal fishing) that are less likely to carry AIS devices, the true global extent of activity not publicly tracked is probably higher than the estimate presented here. Algorithmic improvements can capture the first kilometre from shore and the inclusion of more SAR satellites in the coming years (two more ESA Sentinel-1 satellites and NASA’s NISAR mission) will allow us to apply this method more broadly to build on this map and capture all activity at sea.

## Data availability

All vessel and infrastructure data are freely available through the Global Fishing Watch data portal at https://globalfishingwatch.org/datasets-and-code . All data to reproduce this study can be downloaded from https://doi.org/10.6084/m9.figshare.24309475 (statistical analysis and figures) and https://doi.org/10.6084/m9.figshare.24309469 (model training and evaluation).

## Code availability

All code developed in this study for SAR detection, deep-learning models and analysis is open source and freely available at https://github.com/GlobalFishingWatch/paper-industrial-activity .

## References

1. Costello, C. et al. The future of food from the sea. Nature 588, 95–100 (2020).
2. Golden, J. S. et al. Making sure the blue economy is green. Nat. Ecol. Evol. 1, 0017 (2017).
3. Jouffray, J. B., Blasiak, R., Norström, A. V., Österblom, H. & Nyström, M. The blue acceleration: the trajectory of human expansion into the ocean. One Earth 2, 43–54 (2020).
4. Ryabinin, V. et al. The UN decade of ocean science for sustainable development. Front. Mar. Sci. 6, 470 (2019).
5. Kroodsma, D. A. et al. Tracking the global footprint of fisheries. Science 359, 904–908 (2018).
6. Teh, L. C. L. & Sumaila, U. R. Contribution of marine fisheries to worldwide employment. Fish Fish. 14, 77–88 (2013).
7. United Nations Conference on Trade and Development (UNCTAD). Review of maritime transport 2019. https://unctad.org/system/files/official-document/rmt2019_en.pdf (2019).
8. US Energy Information Administration. Today in energy. eia.gov, https://www.eia.gov/todayinenergy/detail.php?id=28492 (2016).
9. Hoegh-Guldberg, O. et al. Reviving the ocean economy: the case for action - 2015. https://wwfint.awsassets.panda.org/downloads/revivingoceaneconomy_summary_high_res.pdf (2015).
10. Organisation for Economic Co-operation and Development (OECD). The Ocean Economy in 2030, https://doi.org/10.1787/9789264251724-en (2016).
11. Food and Agriculture Organization of the United Nations (FAO). The State of World Fisheries and Aquaculture 2022, https://doi.org/10.4060/cc0461en (2022).
12. Lotze, H. K. et al. Depletion, degradation, and recovery potential of estuaries and coastal seas. Science 312, 1806–1809 (2006).
13. Waycott, M. et al. Accelerating loss of seagrasses across the globe threatens coastal ecosystems. Proc. Natl Acad. Sci. USA 106, 12377–12381 (2009).
14. Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES). Summary for Policymakers of the Global Assessment Report on Biodiversity and Ecosystem Services, https://zenodo.org/record/3553579 (2019).
15. Winther, J. G. et al. Integrated ocean management for a sustainable ocean economy. Nat. Ecol. Evol. 4, 1451–1458 (2020).
16. Bennett, N. J., Govan, H. & Satterfield, T. Ocean grabbing. Mar. Policy 57, 61–68 (2015).
17. Belhabib, D., Sumaila, U. R. & Le Billon, P. The fisheries of Africa: exploitation, policy, and maritime security trends. Mar. Policy 101, 80–92 (2019).
18. Center for International Earth Science Information Network (CIESIN), Columbia University, and Information Technology Outreach Services (ITOS), University of Georgia. Global roads open access data set (gROADS), v1 (1980–2010). https://doi.org/10.7927/H4VD6WCT (2013).
19. Google LLC. Places Library, Google Maps Platform. https://developers.google.com/maps/documentation/javascript/places.
20. Hoang, N. T. & Kanemoto, K. Mapping the deforestation footprint of nations reveals growing threat to tropical forests. Nat. Ecol. Evol. 5, 845–853 (2021).
21. Waldner, F. et al. A unified cropland layer at 250 m for global agriculture monitoring. Data 1, 3 (2016).
22. Taconet, M., Kroodsma, D. & Fernandes, J. A. Global Atlas of AIS-based Fishing Activity: Challenges and Opportunities, http://www.fao.org/3/ca7012en/CA7012EN.pdf (2019).
23. Virdin, J. et al. The Ocean 100: transnational corporations in the ocean economy. Sci. Adv. 7, eabc8041 (2021).
24. March, D., Metcalfe, K., Tintoré, J. & Godley, B. J. Tracking the global reduction of marine traffic during the COVID-19 pandemic. Nat. Commun. 12, 2415 (2021).
25. Carmine, G. et al. Who is the high seas fishing industry? One Earth 3, 730–738 (2020).
26. Park, J. et al. Tracking elusive and shifting identities of the global fishing fleet. Sci. Adv. 9, eabp8200 (2023).
27. Welch, H. et al. Hot spots of unseen fishing vessels. Sci. Adv. 8, eabq2109 (2022).
28. Park, J. et al. Illuminating dark fishing fleets in North Korea. Sci. Adv. 6, eabb1197 (2020).
29. Center for Advanced Defense Studies (C4ADS). Above us only stars: exposing GPS spoofing in Russia and Syria. https://c4ads.org/reports/above-us-only-stars (2019).
30. McDonald, G. G. et al. Satellites can reveal global extent of forced labor in the world’s fishing fleet. Proc. Natl Acad. Sci. USA 118, e2016238117 (2021).
31. Joo, R. et al. Towards a responsible machine learning approach to identify forced labor in fisheries. Preprint at https://arxiv.org/abs/2302.10987 (2023).
32. Jonathan, S. & Baptista, E. Off the grid: Chinese data law adds to global shipping disruption. Reuters https://www.reuters.com/world/china/off-grid-chinese-data-law-adds-global-shipping-disruption-2021-11-17/ (2021).
33. Wong, B. A., Thomas, C. & Halpin, P. Automating offshore infrastructure extractions using synthetic aperture radar & Google Earth Engine. Remote Sens. Environ. 233, 111412 (2019).
34. Gourvenec, S., Sturt, F., Reid, E. & Trigos, F. Global assessment of historical, current and forecast ocean energy infrastructure: Implications for marine space planning, sustainable design and end-of-engineered-life management. Renew. Sustain. Energy Rev. 154, 111794 (2022).
35. Torres, R., Snoeij, P., Davidson, M., Bibby, D. & Lokas, S. in Proc. 2012 IEEE International Geoscience and Remote Sensing Symposium 1703–1706 (IEEE, 2012).
36. Spoto, F. et al. in Proc. 2012 IEEE International Geoscience and Remote Sensing Symposium 1707–1710 (IEEE, 2012).
37. Food and Agriculture Organization of the United Nations (FAO). Global Capture Production Quantity (1950–2021), https://www.fao.org/fishery/statistics-query/en/capture/capture_quantity (2021).
38. Pauly, D., Zeller, D. & Palomares, M. L. D. (eds) Sea Around Us Concepts, Design and Data, www.seaaroundus.org (2020).
39. Fiorentino, F. et al. Synthesis of Information on Some Demersal Crustaceans Relevant for Fisheries in the South Central Mediterranean Sea, http://www.faomedsudmed.org/pdf/publications/TD32.pdf (2013).
40. International Energy Agency (IEA). Offshore Wind Outlook 2019, https://www.iea.org/reports/offshore-wind-outlook-2019 (2019).
41. Claisse, J. T. et al. Oil platforms off California are among the most productive marine fish habitats globally. Proc. Natl Acad. Sci. USA 111, 15462–15467 (2014).
42. Rousseau, Y., Watson, R. A., Blanchard, J. L. & Fulton, E. A. Evolution of global marine fishing fleets and the response of fished resources. Proc. Natl Acad. Sci. USA 116, 12238–12243 (2019).
43. McCauley, D. J. et al. Marine defaunation: animal loss in the global ocean. Science 347, 1255641 (2015).
44. Hilborn, R. et al. Effective fisheries management instrumental in improving fish stock status. Proc. Natl Acad. Sci. USA 117, 2218–2224 (2020).
45. Crisp, D. J. The State-of-the-art in Ship Detection in Synthetic Aperture Radar Imagery, https://apps.dtic.mil/sti/citations/ADA426096 (2004).
46. El-Darymli, K., McGuire, P., Power, D. & Moloney, C. Target detection in synthetic aperture radar imagery: a state-of-the-art survey. J. Appl. Remote Sens. 7, 071598 (2018).
47. Kroodsma, D. A. et al. Revealing the global longline fleet with satellite radar. Sci. Rep. 12, 21004 (2022).
48. Flanders Marine Institute. Marine Regions. www.marineregions.org (2023).
49. Pappas, O., Achim, A. & Bull, D. Superpixel-level CFAR detectors for ship detection in SAR imagery. IEEE Geosci. Remote Sens. Lett. 15, 1397–1401 (2018).
50. Leng, X., Ji, K., Yang, K. & Zou, H. A bilateral CFAR algorithm for ship detection in SAR images. IEEE Geosci. Remote Sens. Lett. 12, 1536–1540 (2015).
51. He, K., Zhang, X., Ren, S. & Sun, J. in Proc. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 770–778 (Computer Vision Foundation, 2016).
52. Liu, Z. et al. in Proc. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 11966–11976 (Computer Vision Foundation, 2022).
53. Keating, K. A. & Cherry, S. Modeling utilization distributions in space and time. Ecology 90, 1971–1980 (2009).
54. Lawrence, P. J. et al. Artificial shorelines lack natural structural complexity across scales. Proc. R. Soc. B Biol. Sci. 288, 20210329 (2021).
55. Crowell, M., Leatherman, S. P. & Buckle, M. K. Historical shoreline change: error analysis and mapping accuracy. J. Coast. Res. 7, 839–852 (1991).
56. Choi, J. H. & Won, J. S. Efficient SAR azimuth ambiguity reduction in coastal waters using a simple rotation matrix: the case study of the northern coast of Jeju Island. Remote Sens. 13, 4865 (2021).
57. Ester, M., Kriegel, H.-P., Sander, J. & Xu, X. in Proc. Second International Conference on Knowledge Discovery and Data Mining 226–231 (ACM, 1996).
58. Politis, D. N. & Romano, J. P. The stationary bootstrap. J. Am. Stat. Assoc. 89, 1303–1313 (1994).
59. Sala, E. et al. The economics of fishing the high seas. Sci. Adv. 4, eaat2504 (2018).
60. Tings, B., Bentes, C., Velotto, D. & Voinov, S. Modelling ship detectability depending on TerraSAR-X-derived metocean parameters. CEAS Space J. 11, 81–94 (2019).
61. Krause, J. et al. in Computer Vision – ECCV 2016. ECCV 2016. Lecture Notes in Computer Science, vol. 9907 (eds Leibe, B. et al.) https://doi.org/10.1007/978-3-319-46487-9_19 (Springer, 2016).

## Acknowledgements

This work was funded by Bloomberg Philanthropies, National Geographic Pristine Seas and Oceankind. We thank D. Kroodsma for reviewing the manuscript. We thank the European Space Agency (ESA) for making the radar and optical imagery freely available. Google provided in kind computing resources and technical support. All maps were generated using Python ( https://www.python.org ) with the open-source visualization libraries PySeas ( https://github.com/GlobalFishingWatch/pyseas ), Matplotlib ( https://matplotlib.org ) and Cartopy ( https://scitools.org.uk/cartopy ).

## Author information

These authors contributed equally: Fernando Paolo, David Kroodsma

## Authors and Affiliations

Global Fishing Watch, Washington, DC, USA

Fernando Paolo, David Kroodsma, Tim Hochberg, Pete Davis & Luca Marsaglia

Forest and Wildlife Ecology Department, University of Wisconsin–Madison, Madison, WI, USA

Jennifer Raynor

Marine Geospatial Ecology Lab, Nicholas School of the Environment, Duke University, Durham, NC, USA

Jesse Cleary & Patrick Halpin

Bren School of Environmental Science and Management, University of California, Santa Barbara, Santa Barbara, CA, USA

Sara Orofino

SkyTruth, Shepherdstown, WV, USA

Christian Thomas

## Contributions

F.P. led the writing, with input from D.K. and J.R. and suggestions from all authors. D.K., F.P., P.H. and C.T. conceived the study. D.K. oversaw the project and secured most of the funding. F.P. built the detector, with contributions from T.H., C.T., D.K. and P.H. F.P. and T.H. built the deep-learning models. F.P. and D.K. performed the main analyses, supported by P.D. F.P., P.D., C.T. and J.C. reviewed the offshore infrastructure. P.D., S.O., L.M. and D.K. reviewed the vessel detections and fishing classifications. F.P., D.K., P.D. and L.M. performed the data labelling. F.P. made most of the figures, with D.K. and P.D. contributing further figures. T.H. and D.K. developed the SAR to AIS matching. All authors discussed the results.

## Corresponding author

Correspondence to Fernando Paolo .

## Ethics declarations

Competing interests.

The authors declare no competing interests.

## Peer review

Peer review information.

Nature thanks Konstantin Klemmer, Bjoern Tings and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

## Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Extended data figures and tables

## Extended Data Fig. 1 The Sentinel-1 SAR imagery (IW GRD product) covers most coastal waters but does not sample most of the open ocean.

a , The extent and frequency of SAR acquisitions is determined by the mission priorities. b , The area of the ocean imaged every day by the Sentinel-1 GRD product (using a 12-day rolling average) depended on whether one satellite was imaging the ocean (S1A, October 2014 to present) or two (S1A and S1B, September 2016 to December 2021). S1B stopped operating on 23 December 2021.

## Extended Data Fig. 2 The Sentinel-1 detection model is able to detect most industrial vessels.

The recall curve (fraction of actual positives correctly detected) for our Sentinel-1 detection model as a function of vessel length shows that vessels spaced far apart (>1 km distance, constituting 79% of all vessel detections) have higher recall than all vessels combined. For vessels smaller than 25 m, detection performance decays steeply with vessel size.

## Extended Data Fig. 3 Fishing vessel activity at sea is shown with an unprecedented level of detail by satellite mapping and deep learning.

Fishing vessels tend to aggregate along bathymetric features. Each dot represents a detected vessel during 2017–2021. The colours represent detections matched (blue, publicly tracked) and unmatched (red, not publicly tracked) to known vessel positions from the AIS. The number of detections in each location depends on the local density of vessels, as well as the number of SAR acquisitions.

## Extended Data Fig. 4 Transport and energy vessel activity at sea is shown with an unprecedented level of detail by satellite mapping and deep learning.

Transport and energy vessels usually follow major routes (for example, shipping lanes). Each dot represents a detected vessel during 2017–2021. The colours represent detections matched (blue, publicly tracked) and unmatched (red, not publicly tracked) to known vessel positions from the AIS. The number of detections in each location depends on the local density of vessels, as well as the number of SAR acquisitions.

## Extended Data Fig. 5 Leading nations with most fishing and non-fishing vessel activity.

The bars represent the average number of detections per satellite overpass at any location in the EEZ during 2017–2021. Percentages are the fraction of detections unmatched to known vessel locations from the AIS (activity missing from public monitoring systems).

## Extended Data Fig. 6 In the western North Korean EEZ, peaks of fishing vessel activity coincide with Chinese moratoria on industrial fishing.

Fishing activity in western North Korea waters increases coinciding with the Chinese fishing moratoria (vertical stripes). There is a substantial decrease in overall vessel activity during the COVID-19 pandemic (2020–2021), when North Korea shut its borders.

## Extended Data Fig. 7 Satellite imagery-based detection allows monitoring at local scale.

a , b , From 2017 to 2021, there were substantial numbers of vessels not publicly tracked (red) within the boundaries of two of the most iconic, biologically important and well-monitored MPAs in the world: the Galápagos Marine Reserve and south of the Great Barrier Reef Marine Park. c , d , Two areas of intense marine infrastructure development are the oil infrastructure in Lake Maracaibo in Venezuela and offshore wind farms north of Shanghai, China.

## Extended Data Fig. 8 Leading countries with most offshore oil and wind infrastructure.

Bars represent the median value of monthly counts of offshore structures for each EEZ in 2021. ‘Probable’ refers to detected infrastructure with lower confidence but still within the EEZ of the respective country.

## Extended Data Fig. 9 Offshore oil development during 2017–2021 in the top 20 oil nations.

Time series represent the median monthly counts of detected oil structures inside each country’s EEZ annually. Note the different ranges in the y axes.

## Extended Data Fig. 10 Number of vessels and structures as a function of distance from shore and from infrastructure.

a , Trawler vessel activity is relatively low close to oil infrastructure, but other types of fishing show increased activity there. b , The number of vessels and oil platforms decrease rapidly far from shore, but the number of wind structures stays relatively constant within tens of kilometres from the coast.

## Extended Data Fig. 11 The labelled data used for training the deep-learning models sample all regions of the ocean.

Spatial distribution of the training and holdout data used to train and evaluate the ‘vessel presence and length estimation’ model, ‘fishing and non-fishing classification’ model and ‘offshore infrastructure classification’ model. The holdout data are random subsamples with the same spatial distribution as the training data without any overlap in time or space (no data leakage between training and test sets). See respective classification sections for a description of the sampling strategies and characteristics of each dataset.

## Supplementary information

Peer review file, rights and permissions.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

Reprints and permissions

## About this article

Cite this article.

Paolo, F., Kroodsma, D., Raynor, J. et al. Satellite mapping reveals extensive industrial activity at sea. Nature 625, 85–91 (2024). https://doi.org/10.1038/s41586-023-06825-8


Received: 31 March 2023

Accepted: 02 November 2023

Published: 03 January 2024

Issue Date: 04 January 2024

DOI: https://doi.org/10.1038/s41586-023-06825-8



Teens and social media fact sheet.


Explore the patterns and trends of U.S. teens’ experiences on different online platforms below.

## Which online platforms are most common among teens

YouTube tops the list among teens, with roughly nine-in-ten saying they use the platform. TikTok, Snapchat and Instagram also remain popular – more than half of teens report using each of these sites.

Note: Figures from 2015 depicted above were collected from 2014 to 2015. Those who did not give an answer are not shown.

Source: Surveys of U.S. teens conducted 2014-2023.

## How use of online platforms by teens differs across demographic groups

Usage of the major online platforms can vary among teens by factors such as age, gender, and race and ethnicity. 1

% of U.S. teens ages 13 to 17 who say they ever use the following apps or sites

- YouTube
- TikTok
- Snapchat
- Instagram
- Facebook
- Discord
- WhatsApp
- Twitter (X)
- Twitch
- Reddit
- BeReal

## How often teens visit online platforms

Many teens are on social media daily – if not constantly – but daily use varies by platform. About seven-in-ten U.S. teens say they visit YouTube every day – including 16% who do so almost constantly. TikTok follows with 58% who say they visit it daily, while far fewer report daily use of Facebook.

Note: Figures may not add up to NET values due to rounding. Those who did not give an answer are not shown.

Source: Survey of U.S. teens conducted Sept. 26-Oct. 23, 2023.

## Which teens constantly visit online platforms

Differences emerge among teens, including by gender and race and ethnicity, on whether they say they are on these platforms almost constantly.

% of U.S. teens ages 13 to 17 who say they visit or use the following apps or sites almost constantly

## Find out more

This fact sheet was compiled by Research Analyst Michelle Faverio, with help from Research Assistant Olivia Sidoti, Digital Producer Sara Atske, Associate Information Graphics Designer Kaitlyn Radde and Temporary Researcher Eugenie Park.

Read the methodology and topline.

Pew Research Center is a subsidiary of The Pew Charitable Trusts, its primary funder. This report was created to better understand teens’ use of digital devices, social media and other online platforms.

Follow these links for more in-depth analysis of teens and technology:

- Teens, Social Media and Technology 2023
- Connection, Creativity and Drama: Teen Life on Social Media in 2022
- Social media policies for minors: What U.S. adults and teens think
- Use of ChatGPT for schoolwork among U.S. teens

Find more reports and blog posts related to internet and technology.

- There were not enough Asian American teen respondents in the sample to be broken out into a separate analysis. As always, their responses are incorporated into the general teen population figures throughout. ↩

## About Pew Research Center

Pew Research Center is a nonpartisan fact tank that informs the public about the issues, attitudes and trends shaping the world. It conducts public opinion polling, demographic research, media content analysis and other empirical social science research. Pew Research Center does not take policy positions. It is a subsidiary of The Pew Charitable Trusts.

## Scientists find about a quarter million invisible nanoplastic particles in a liter of bottled water

The average liter of bottled water has nearly a quarter million invisible pieces of ever so tiny nanoplastics, detected and categorized for the first time by a microscope using dual lasers. (Jan. 8) (AP Video: Mary Conlon)

FILE - Tourists fill plastic bottles with water from a public fountain at the Sforzesco Castle, in Milan, Italy, June 25, 2022. A new study found the average liter of bottled water has nearly a quarter million invisible pieces of nanoplastics, microscopic plastic pieces, detected and categorized for the first time by a microscope. (AP Photo/Luca Bruno, File)


Naixin Qian, a Columbia physical chemist, demonstrates the glass filtration apparatus used to test water samples for nanoplastics, microscopic plastic pieces, in New York on Monday, Jan. 8, 2024. The average liter of bottled water has nearly a quarter million invisible pieces of nanoplastics, detected and categorized for the first time by a microscope. (AP Photo/Mary Conlon)

The inside of an optical box reveals the components that organize the light from laser beams to identify nanoplastics, microscopic plastic pieces, in New York on Monday, Jan. 8, 2024. A new study found the average liter of bottled water has nearly a quarter million invisible pieces of nanoplastics, detected and categorized for the first time by a microscope. (AP Photo/Mary Conlon)

Naixin Qian, a Columbia physical chemist, demonstrates the placement of a sample for nanoplastics, microscopic plastic pieces, in New York on Monday, Jan. 8, 2024. A new study found the average liter of bottled water has nearly a quarter million invisible pieces of nanoplastics, detected and categorized for the first time by a microscope. (AP Photo/Mary Conlon)

Naixin Qian, a Columbia physical chemist, zooms in on an image generated from a microscope scan, with nanoplastics, microscopic plastic pieces, appearing as bright red dots in New York on Monday, Jan. 8, 2024. A new study found the average liter of bottled water has nearly a quarter million invisible pieces of nanoplastics, detected and categorized for the first time by a microscope. (AP Photo/Mary Conlon)

The average liter of bottled water has nearly a quarter million invisible pieces of ever so tiny nanoplastics, detected and categorized for the first time by a microscope using dual lasers.

Scientists long figured there were lots of these microscopic plastic pieces, but until researchers at Columbia and Rutgers universities did their calculations they never knew how many or what kind. Looking at five samples each of three common bottled water brands, researchers found particle levels ranged from 110,000 to 400,000 per liter, averaging at around 240,000 according to a study in Monday’s Proceedings of the National Academy of Sciences.

These particles are less than a micron in size. A micron, also called a micrometer because it is a millionth of a meter, is tiny: there are 25,400 microns in an inch, and a human hair is about 83 microns wide.

Previous studies have looked at slightly bigger microplastics that range from the visible 5 millimeters, less than a quarter of an inch, to one micron. About 10 to 100 times more nanoplastics than microplastics were discovered in bottled water, the study found.


Much of the plastic seems to be coming from the bottle itself and the reverse osmosis membrane filter used to keep out other contaminants, said study lead author Naixin Qian, a Columbia physical chemist. She wouldn’t reveal the three brands because researchers want more samples before they single out a brand and want to study more brands. Still, she said they were common and bought at a WalMart.

Researchers still can’t answer the big question: Are those nanoplastic pieces harmful to health?

“That’s currently under review. We don’t know if it’s dangerous or how dangerous,” said study co-author Phoebe Stapleton, a toxicologist at Rutgers. “We do know that they are getting into the tissues (of mammals, including people) … and the current research is looking at what they’re doing in the cells.”

The International Bottled Water Association said in a statement: “There currently is both a lack of standardized (measuring) methods and no scientific consensus on the potential health impacts of nano- and microplastic particles. Therefore, media reports about these particles in drinking water do nothing more than unnecessarily scare consumers.”

The American Chemistry Council, which represents plastics manufacturers, declined to immediately comment.

The world “is drowning under the weight of plastic pollution, with more than 430 million tonnes of plastic produced annually,” and microplastics are found in the world’s oceans, food and drinking water, with some of them coming from clothing and cigarette filters, according to the United Nations Environment Programme. Efforts for a global plastics treaty continue after talks bogged down in November.


All four co-authors interviewed said they were cutting back on their bottled water use after they conducted the study.

Wei Min, the Columbia physical chemist who pioneered the dual laser microscope technology, said he has reduced his bottled water use by half. Stapleton said she now relies more on filtered water at home in New Jersey.

But study co-author Beizhan Yan, a Columbia environmental chemist who increased his tap water usage, pointed out that filters themselves can be a problem by introducing plastics.

“There’s just no win,” Stapleton said.

Outside experts, who praised the study, agreed that there’s a general unease about perils of fine plastics particles, but it’s too early to say for sure.

“The danger of the plastics themselves is still an unanswered question. For me, the additives are the most concerning,” said Duke University professor of medicine and comparative oncology group director Jason Somarelli, who wasn’t part of the research. “We and others have shown that these nanoplastics can be internalized into cells and we know that nanoplastics carry all kinds of chemical additives that could cause cell stress, DNA damage and change metabolism or cell function.”

Somarelli said his own not yet published work has found more than 100 “known cancer-causing chemicals in these plastics.”

What’s disturbing, said University of Toronto evolutionary biologist Zoie Diana, is that “small particles can appear in different organs and may cross membranes that they aren’t meant to cross, such as the blood-brain barrier.”

Diana, who was not part of the study, said the new tool researchers used makes this an exciting development in the study of plastics in the environment and body.

About 15 years ago, Min invented dual laser microscope technology that identifies specific compounds by their chemical properties and how they resonate when exposed to the lasers. Yan and Qian talked to him about using that technique to find and identify plastics that had been too small for researchers using established methods.

Kara Lavender Law, an oceanographer at the Sea Education Association, said “the work can be an important advance in the detection of nanoplastics” but she said she’d like to see other analytical chemists replicate the technique and results.

Denise Hardesty, an Australian government oceanographer who studies plastic waste, said context is needed. The total weight of the nanoplastic found is “roughly equivalent to the weight of a single penny in the volume of two Olympic-sized swimming pools.”

Hardesty is less concerned than others about nanoplastics in bottled water, noting that “I’m privileged to live in a place where I have access to ‘clean’ tap water and I don’t have to buy drinking water in single use containers.”

Yan said he is starting to study other municipal water supplies in Boston, St. Louis, Los Angeles and elsewhere to see how much plastic is in their tap water. Previous studies looking for microplastics and some early tests indicate there may be less nanoplastic in tap water than in bottled water.

Even with unknowns about human health, Yan said he does have one recommendation for people who are worried: Use reusable bottles instead of single-use plastics.

Read more of AP’s climate coverage at http://www.apnews.com/climate-and-environment

Follow Seth Borenstein on X, formerly known as Twitter, at @borenbears

Associated Press climate and environmental coverage receives support from several private foundations. See more about AP’s climate initiative here. The AP is solely responsible for all content.

## More and more jobs can be done from anywhere. What does that mean for workers?

By 2030, the number of global digital jobs that can be performed remotely from anywhere is expected to rise by roughly 25% to around 92 million. Image: Kristin Wilson/Unsplash

Victoria Masterson


License and Republishing

World Economic Forum articles may be republished in accordance with the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License, and in accordance with our Terms of Use.

The views expressed in this article are those of the author alone and not the World Economic Forum.


## Here’s what you’re really swallowing when you drink bottled water

A new study finds that ‘nanoplastics’ are even more common than microplastics in bottled water.

People are swallowing hundreds of thousands of microscopic pieces of plastic each time they drink a liter of bottled water, scientists have shown — a revelation that could have profound implications for human health.

A new paper released Monday in the Proceedings of the National Academy of Sciences found about 240,000 particles in the average liter of bottled water, most of which were “nanoplastics” — particles measuring less than one micrometer (less than one-seventieth the width of a human hair).

For the past several years, scientists have been looking for “microplastics,” or pieces of plastic that range from one micrometer to half a centimeter in length, and found them almost everywhere. The tiny shards of plastic have been uncovered in the deepest depths of the ocean, in the frigid recesses of Antarctic sea ice and in the human placenta. They spill out of laundry machines and hide in soils and wildlife. Microplastics are also in the food we eat and the water we drink: In 2018, scientists discovered that a single bottle of water contained, on average, 325 pieces of microplastics.

But researchers at Columbia University have now identified the extent to which nanoplastics also pose a threat.

“Whatever microplastic is doing to human health, I will say nanoplastics are going to be more dangerous,” said Wei Min, a chemistry professor at Columbia and one of the authors of the new paper.

Scientists have also found microplastics in tap water, but in smaller amounts.

Sherri Mason, a professor and director of sustainability at Penn State Behrend in Erie, Pa., says plastic materials are a bit like skin — they slough off pieces into water or food or whatever substance they are touching.

“We know at this point that our skin is constantly shedding,” she said. “And this is what these plastic items are doing — they’re just constantly shedding.”

The typical methods for finding microplastics can’t be easily applied to finding even smaller particles, but Min co-invented a method that involves aiming two lasers at a sample and observing the resonance of different molecules. Using machine learning, the group was able to identify seven types of plastic molecules in a sample of three types of bottled water.

“There are some other techniques that have identified nanoplastics before,” said Naixin Qian, a PhD student in chemistry at Columbia and the first author of the new paper. “But before our study, people didn’t have a precise number of how many.”

“It’s really groundbreaking,” said Mason, who was not involved in the research but was one of the first researchers to identify plastics in bottled water. The new study, she says, shows how extensive nanoplastics are and provides a starting point to assess their health effects.

“Normal humans looking at a sample of water — if there’s visible plastic in it, they’ll be turned off,” she said. “But they don’t realize that it’s actually the invisible plastics present that are the biggest concern.”

The new study found pieces of PET (polyethylene terephthalate), which is what most plastic water bottles are made of, and polyamide, a type of plastic that is present in water filters. The researchers hypothesized that this means plastic is getting into the water both from the bottle and from the filtration process.

Researchers don’t yet know how dangerous tiny plastics are for human health. In a large review published in 2019, the World Health Organization said there wasn’t enough firm evidence linking microplastics in water to human health, but described an urgent need for further research.

In theory, nanoplastics are small enough to make it into a person’s blood, liver and brain. And nanoplastics are likely to appear in much larger quantities than microplastics — in the new research, 90 percent of the plastic particles found in the sample were nanoplastics, and only 10 percent were larger microplastics.

Jill Culora, a spokeswoman for the International Bottled Water Association, said in an email that there “is both a lack of standardized methods and no scientific consensus on the potential health impacts of nano- and microplastic particles. Therefore, media reports about these particles in drinking water do nothing more than unnecessarily scare consumers.”

Finding a connection between microplastics and health problems in humans is complicated — there are thousands of types of plastics, and over 10,000 chemicals used to manufacture them. But at a certain point, Mason said, policymakers and the public need to prepare for the possibility that the tiny plastics in the air we breathe, the water we drink and the clothes we wear have serious and dangerous effects.

“You still have a lot of people that, because of marketing, are convinced that bottled water is better,” Mason said. “But this is what you’re drinking in addition to that H2O.”


## Washington Post Newsroom Is Rattled by Buyouts

By Charlotte Klein

In late December, word of who’d taken a buyout at The Washington Post began to trickle out. Reporters found themselves especially alarmed by the hard cost-cutting hit taken by one particular department: news research, a unit that assists investigations by, among other things, tracking down subjects, finding court records, verifying claims, and scouring documents. The department’s three most senior researchers, Magda Jean-Louis and Pulitzer Prize winners Alice Crites and Jennifer Jenkins, had all accepted buyouts, among the 240 that the company offered employees across departments amid financial struggles. That left news research with only two people: supervisor Monika Mathur and researcher Razzan Nakhlawi.

A group of Post journalists were so concerned about the gutting of the department that they expressed that sentiment in writing last week to executive editor Sally Buzbee and Will Lewis, the paper’s new publisher and CEO. The buyouts have “left us at a real disadvantage both in experience and sheer numbers when compared with our competitors,” the letter read, according to a copy reviewed by Vanity Fair. “We are eager to start off 2024 with a renewed sense of purpose and feel it will put us at a considerable disadvantage if our news research department is in such a diminished state.” The letter, whose signatories included Post stars such as Josh Dawsey, Ashley Parker, Jacqueline Alemany, Beth Reinhard, and Sarah Ellison, urged management to both bring Crites and Jean-Louis back in some capacity and provide more “permanent support” for those remaining in the department.

The objective, according to one Post staffer, was to convey to Lewis and Buzbee that while news research may be a small department, “it actually may be the most important one we have at the paper in some ways.” Researchers “have access to all these databases and tools that we don’t have. So either you have to give us the tools to do it or hire more people.” Buzbee, I’m told, responded to the letter, saying that they were working on it and trying to bring someone on. “The solution so far is not really acceptable,” the staffer said, noting that reallocating someone from another team “doesn’t replace two multiple-time Pulitzer-[winning] researchers who can find anything in the world.”

Stress over the research buyouts speaks to broader anxiety inside the Post, which heads into this election year with less manpower and lingering uncertainty around both business and editorial strategy. “In general, going into this year with 10% of the company just shaved off—it’s sort of like you wake up January 2 and think, Okay, shit, here we go,” as a second staffer put it. Names impacted in the buyout action flooded out in the final weeks of 2023, a staggering list that included longtime editors and writers with a wealth of experience and institutional knowledge, such as Opinion columnist Greg Sargent , national correspondent Scott Wilson, media reporter Paul Farhi, senior editor Marc Fisher, and investigative editor Jeff Leen. The Post has also begun the year with news that its chief revenue officer, Alex MacCallum, is departing after less than six months on the job. She’s reportedly in talks to return to CNN, where she formerly served as a top digital executive—and where Mark Thompson, her former Times boss, is now running the show. Lewis, meanwhile, apparently wants to be a presence in the newsroom in ways that predecessor Fred Ryan seemed to deliberately avoid. I’m told that he has sent several reporters personal notes about their stories, and was seen walking around the newsroom last week.

A variety of newsroom concerns—from the immediate impact of the buyouts, to MacCallum’s exit, to the health of the business—were raised last week during a National desk meeting held by Buzbee and managing editor Matea Gold, which more than 100 employees attended. (Buzbee, I’m told, is kicking off the New Year by holding such meetings with various teams, such as local and international.) The state of the research team was on the minds of several staffers, who pointed out that Crites had been the go-to for reporters on everything from school shootings to legal briefings to finding the cell phone numbers of people who very much did not want to be found—so much so that she was often given a co-byline on pieces. Alemany called the research team the linchpin of any ambitious endeavor at the Post and described how Crites had handed her the keys to some of her biggest scoops. Reinhard mentioned that the Post never replaced Pulitzer-winning researcher Julie Tate when she decamped for The New York Times in 2021, and noted that the paper was now without Crites, who’d been holding up the department for a while. “To be honest, it wasn’t really on my radar,” a third staffer conceded to me—not until hearing “these reporters saying all their best stories were done with researchers.”

Also during the session, Parker said she’d been stunned to learn in a prior meeting that the Post had lost about 60 people of color in the past two years, a stat she’d heard from deputy managing editor Monica Norton, who has been keeping her own unofficial list. (The number of journalists of color who have been hired in the last two years exceeds the number of journalists of color who have left in that same period, according to a source familiar with the matter.) Other reporters in the meeting also expressed concerns that the buyouts would make the paper less diverse, according to two staffers. Buzbee said that the Post had conducted copious amounts of testing to understand how the buyouts would impact its diversity, according to the source with knowledge of the situation. She also said the Post did not yet have numbers to share on how the buyouts had impacted diversity at the paper.

“The Post has a long history of holding power to account and we adhere to that legacy every day, including in times of transition,” Buzbee said in a statement to Vanity Fair. “Right now, we’re committed to fulfilling that mission and to building a newsroom of the future.”

Scaling back staff while heading into a pivotal presidential election year seems like an especially ill-timed move given the Post’s traditional strengths in national politics and policy. Senior editors at the Post have been banking on heightened interest in the election to juice readership amid slowed traffic and subscriptions. At one point in the meeting, according to two staffers, investigative reporter Carol Leonnig said that over the years she’d been told that the National team was doing great work and that issues on the business side would be taken care of, only for the problems to persist.

In November, I reported how staffers were seeking clarity about the Post’s future, with the central question being, as one staffer put it, “What do we want to be?” The question remains, and was at the heart of the National meeting. Congressional reporter Paul Kane got the room’s attention when he questioned the paper’s editorial strategy by reading the top headlines on the Post homepage off his phone: a hodgepodge about everything from national security to how to stop worrying about FOMO. No offense, he said, per two staffers, but there was great journalism being buried on the homepage. Veteran political reporter Dan Balz also chimed in to ask about the paper’s sensibility and character: what message was the Post trying to send about what it stands for?

Buzbee talked about the need to feature lighter stories amid news fatigue, and how the Post needed to get smarter on SEO—at which some reporters rolled their eyes, considering they’d been discussing SEO internally for years. Buzbee noted Lewis’s dedication to hard news, expressing her excitement about his years spent working on the issue of journalism in the social media era. Some attendees I spoke to didn’t find Buzbee’s responses particularly satisfying, however; a fourth staffer felt that the top editor didn’t “answer the fundamental question of who we are.”

This story has been updated.

## Charlotte Klein

Media reporter.

2, 3, 5, 7, 11, 13, 17, 19, 23, and 29 are all prime numbers. In fact, these are the first 10 prime numbers (you can check this yourself, if you wish!). Looking at this short list of prime numbers can already reveal a few interesting observations.
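The check suggested above can be sketched with simple trial division. This is a minimal illustration rather than an efficient method, and the names `is_prime` and `first_primes` are hypothetical helpers introduced here, not part of the original text.

```python
def is_prime(n: int) -> bool:
    """Return True if n is prime, testing divisors up to the square root of n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False  # found a divisor other than 1 and n
        d += 1
    return True


def first_primes(count: int) -> list[int]:
    """Collect the first `count` primes by testing 2, 3, 4, ... in order."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if is_prime(candidate):
            primes.append(candidate)
        candidate += 1
    return primes


print(first_primes(10))
```

Calling `first_primes(10)` should reproduce the list given above.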

In this paper, twenty different types of prime numbers have been covered and Python programs to generate them are given, with the Python library. Asymmetric algorithm has been used for key...

The paper studies a number of algebraic (group-theoretic) and dynamical properties of $\mathbb{G}$ including approximation, mixing, periodicity, entropy, decomposition, generators, and topological conjugacy.

Prime Numbers: A Computational Perspective, Second Edition. Richard Crandall, Center for Advanced Computation, 3754 SE Knight Street, Portland, OR 97202, USA. Carl Pomerance, Department of Mathematics, Dartmouth College, Hanover, NH 03755-3551, USA.

This paper deals with the development of prime and composite numbers and their modern applications to the mathematical and physical sciences. It covers the distribution of prime numbers, prime number theorems, Euler's and Riemann's zeta functions and their remarkable link with prime numbers, and the celebrated unsolved Riemann Hypothesis (RH). Special attention is given to the discovery of the ...

Prime Numbers On Prime number varieties and their applications Authors: Y. Gayathri Narayana Yegnanarayanan Venkataraman Kalasalingam Academy of Research and Education Abstract Prime...

negative integers and so on. A prime number is simply a natural number greater than 1 that is divisible only by 1 and itself. All other natural numbers greater than 1 are known as composite numbers. The first ten prime numbers can be easily listed: 2, 3, 5, 7, 11, 13, 17, 19, 23 and 29; interestingly, two is the only even prime number. In the literature, there are many

Counting the primes up to a given natural number, and describing the asymptotic behavior of that counting function, has been studied by famous mathematicians such as Gauss, Legendre, Dirichlet, and Euler. The prime number theorem states that this counting function grows asymptotically like the number divided by its natural logarithm. In this paper, we take ...
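The asymptotic statement can be explored numerically. The sketch below counts primes with a sieve of Eratosthenes and compares the count with x / ln x. The function name `prime_count` and the sample values of x are assumptions chosen for illustration, not taken from the paper.

```python
import math


def prime_count(limit: int) -> int:
    """Count the primes <= limit with a sieve of Eratosthenes."""
    if limit < 2:
        return 0
    sieve = bytearray([1]) * (limit + 1)  # sieve[n] == 1 means "n may be prime"
    sieve[0] = sieve[1] = 0
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            # Cross out every multiple of p starting at p*p.
            sieve[p * p::p] = bytearray(len(range(p * p, limit + 1, p)))
    return sum(sieve)


for x in (10**3, 10**4, 10**5, 10**6):
    exact = prime_count(x)
    estimate = x / math.log(x)
    # The ratio exact / estimate should drift slowly toward 1 as x grows.
    print(x, exact, round(estimate), round(exact / estimate, 3))
```

The ratio approaches 1 only slowly, which is consistent with the theorem being a statement about the limit rather than about any fixed x.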

Prime numbers are beautiful, mysterious, and beguiling mathematical objects. The mathematician Bernhard Riemann made a celebrated conjecture about primes in 1859, the so-called Riemann hypothesis, which remains one of the most important unsolved problems in mathematics. Through the deep insights of the authors, this book introduces primes and explains the Riemann hypothesis.

Diophantine equations and inequalities with prime numbers. January 2018. This paper presents a brief survey of the basic diophantine equations and inequalities with prime numbers, solved by ...

A prime number (or a prime) is a natural number greater than 1 that has no positive divisors other than 1 and itself. A natural number greater than 1 that is not a prime number is called a composite number. For example, 5 is prime, as only 1 and 5 divide it, whereas 6 is composite, since it has the divisors 2 and 3 in addition to 1 and 6. The fundamental theorem of arithmetic establishes the ...

January 12, 2018 Why do we need to know about prime numbers with millions of digits? Ittay Weiss, University of Portsmouth Prime numbers are a mathematical mystery. September 19, 2016...

In this paper, a new formula for $\pi^{(2)}(N)$ is formulated. This function counts the semiprimes not exceeding a given number N. A semiprime is a natural number that is the product of exactly two prime numbers, which may be equal to each other. Semiprimes are also a special case of almost primes.
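To make the definition concrete, here is a brute-force count of semiprimes up to N. It is not the formula proposed in the paper, only a direct check of the definition. The helper names `prime_factor_count` and `count_semiprimes` are hypothetical.

```python
def prime_factor_count(n: int) -> int:
    """Number of prime factors of n, counted with multiplicity."""
    count = 0
    d = 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            count += 1
        d += 1
    if n > 1:
        count += 1  # whatever remains is a single prime factor
    return count


def count_semiprimes(limit: int) -> int:
    """Count n <= limit with exactly two prime factors (equal factors allowed)."""
    return sum(1 for n in range(2, limit + 1) if prime_factor_count(n) == 2)


print(count_semiprimes(10))   # the semiprimes up to 10 are 4, 6, 9 and 10
print(count_semiprimes(100))
```

Because squares of primes such as 4 and 9 have two equal prime factors, they are counted as semiprimes, matching the definition above.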

This real colossus, M82589933, discovered on December 7, 2018 by Florida programmer Patrick Laroche, reaches the unimaginable length of 24,862,048 digits; if someone were to try to print it on paper, almost 10,000 sheets would be needed. And the search goes on. As Woltman tells OpenMind, "GIMPS will continue to forge ahead over the coming years.
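The digit count quoted above can be checked without writing the number out. This sketch assumes that M82589933 denotes the Mersenne number 2^82589933 − 1 and that double-precision logarithms are accurate enough for an exponent of this size. The function name `mersenne_digit_count` is hypothetical.

```python
import math


def mersenne_digit_count(p: int) -> int:
    """Number of decimal digits of 2**p - 1.

    2**p is never a power of 10, so 2**p - 1 has the same number of digits
    as 2**p, which is floor(p * log10(2)) + 1.
    """
    return math.floor(p * math.log10(2)) + 1


print(mersenne_digit_count(82589933))
```

For small exponents the result can be compared with `len(str(2**p - 1))` as a sanity check.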

The concept of prime number and the strategies used in explaining prime numbers South African Journal of Education DOI: CC BY 4.0 Authors: Nejla Gürefe Mersin University Gülfem Sarpkaya Aktaş...

... 257 is a prime number. In Table 1 is given a list of all primes less than 260 [7, 8]. In general, ℤn has exactly n elements: ℤ/nℤ = {0, 1, …, n − 1}. ... Significant role of the specific...

1 Introduction. In number theory, prime numbers are defined as those numbers that have exactly two factors, i.e., one and the number itself. Prime numbers have been studied for thousands of years. The first recorded work dates to around 300 BC, when Euclid published several results about prime numbers, i.e.

Abstract: The twin prime problem mainly concerns the structure of twin primes and whether there are infinitely many twin prime pairs. In this paper, by constructing a special ... computer symbolic operation and so on, also has research interests in number theory, and has published over 70 papers and 2 books. ...

The research followed the PRISMA 2020 guideline for reporting systematic reviews and was conducted using three established databases: Web of Science, IEEE Xplore and Scopus. This methodical review strives to provide a systematic, extensive overview of the progress in sieving for primes since the inception of the field - unfortunately, no ...

1987. 1,217. PDF. This article briefly outlines the development of the theory of prime numbers; then an application to the problem of security during data transmission, that is, cryptography, is described. On the one hand, the study of numbers - and especially of prime numbers - has fascinated mathematicians since ancient times; on the other ...

Abstract. Prime numbers play a very important role in cryptography. There are various types of prime numbers, each with its own properties. This paper gives a detailed description of the importance of prime numbers in cryptography and of the algorithms that generate large/strong prime numbers.

The world's population increasingly relies on the ocean for food, energy production and global trade 1,2,3, yet human activities at sea are not well quantified 4,5. We combine satellite imagery ...

YouTube tops the list among teens, with roughly nine-in-ten saying they use the platform. TikTok, Snapchat and Instagram also remain popular - more than half of teens report using each of these sites. [Chart: % of U.S. teens ages 13 to 17 who say they ever use the following apps or sites, for 2015, 2022 and 2023.]

Updated 1:11 PM PST, January 8, 2024. The average liter of bottled water has nearly a quarter million invisible pieces of ever so tiny nanoplastics, detected and categorized for the first time by a microscope using dual lasers. Scientists long figured there were lots of these microscopic plastic pieces, but until researchers at Columbia and ...

The bottom line is that, by 2030, the number of these global digital jobs that can be performed remotely from anywhere is expected to rise by roughly 25% to around 92 million. Higher-income roles will predominate as technology development drives digital jobs of the future with high and mid-level wages, the report finds.

A new paper released Monday in the Proceedings of the National Academy of Sciences found about 240,000 particles in the average liter of bottled water, most of which were "nanoplastics ...

Newsroom Is Rattled by Buyouts. Top reporters have urged executive editor Sally Buzbee and publisher Will Lewis to address the gutted research department, a casualty of the paper's roughly 10% ...