Transcript

Transcript prepared by Bob Therriault, Igor Kim and Sanjay Cherian
Show Notes

00:00:00 [Tali Beynon]

It's rarely the case that you need an order, a linear order, on the rows or columns. You do need to distinguish them to match them up when you do, say, a matrix multiplication. But like, knowing that this is row 3 and this is row 1 versus this is row x and this is row z, there's no need for that. Like, that's just a false baggage that we've brought along just because we happen to write them down in a particularly ordered way.

00:00:25 [Music]

00:00:34 [Conor Hoekstra]

Welcome to another episode of ArrayCast. I'm your host, Conor. And today with us, we have four panelists plus a special guest who we will get to introducing in a moment. But first, we'll go around and do brief introductions. We'll start with Bob, then go to Stephen, then go to Marshall, and then to Rich.

00:00:47 [Bob Therriault]

I'm Bob Therriault. I am a J enthusiast and have had a person accompanying me over the last couple of days on Vancouver Island.

00:00:55 [Stephen Taylor]

I'm Stephen Taylor, that person, and I'm an APL and q enthusiast.

00:01:01 [Marshall Lochbaum]

I'm Marshall Lochbaum. I'm where I normally am, but I started out as a J programmer. I worked at Dyalog for a while, and now I make my own language called BQN.

00:01:12 [Rich Park]

I'm Rich Park. I'm an APL programmer. I also do media and training at Dyalog Limited.

00:01:18 [CH]

And as mentioned before, my name is Conor. I'm a polyglot programmer, research scientist, and a huge fan of all array languages. And super excited to talk to our guest today. But before we get to introducing him, we're going to go around and we've got a few announcements. I think we've got three from Rich, one from Bob, and then one from me.

So we'll go to Rich first.

00:01:36 [RP]

Save the date, or book it in your calendar: the APL Bay Area User Group, APLBUG, is having a meeting on Monday, the 11th of December [01]. And Morten Kromberg, CTO of Dyalog Limited, will be there. He's going to talk about Link, which is Dyalog's effort to get source code in text files, which doesn't sound exciting unless you know the history of APL, I guess. And then also, we had Dyalog '23 a couple of weeks ago in Elsinore, Denmark. That was really fantastic to be at. Lots of different use cases, and new APLers people had never seen or met before. It was really exciting and fun to be a part of. Now, we're just starting to release recordings, videos from that user meeting. And hopefully, if all goes well, we'll be done processing and uploading those by the end of the year. But they are coming out bit by bit. And then lastly, for people who are learning APL, it's a great resource: you can try out problems, phase one problems usually solved in one line of APL, from the annual APL problem solving competition. And there is a website, problems.tryapl.org, where you can enter your solutions and get immediate validation on whether you got those correct. And we've just put up the problems from the 2023 round. So you can go ahead and try those out now.

00:02:56 [BT]

And I'll take it up from there. We've had a lot of response in the last couple of episodes. We've been talking about tacit, and we were inviting people who might be developing their own languages or their own approaches to get in touch with us. And several have. And they may have walked into something, because in the future, they may end up as guests on our episodes. But in particular, there was one called Empirical, and Ex Machina was another one that's being developed. We'll put links on the website if you're interested in checking them out. And maybe we should check back with those people, see whether they're comfortable with exposing what they're doing to the general population. So when I say they'll be in our show notes, they may be; if they're not, it's just because we may have them on a future show. If you listen to this later, we'll put them back in. But they may be open to having people take a look at what they're doing. Very neat stuff. So we're always happy to have people get in touch with us at contact@arraycast.com, and we really enjoy connecting with people who are doing some really interesting new things, have new ideas, and are willing to take the plunge and open them up to us.

00:04:05 [CH]

I think I checked out both of those briefly. And whether or not they want people going and checking them out, I guess, is a different question. But the availability of it, it is online. It's not like private repos and stuff. So one would assume that if they're putting it online, they're okay with people stumbling into it. And with that, we'll go to the last announcement, which is that in one week's time from the recording today, but only three or four days from when this will be released, depending on when you listen to it: if you happen to be in the greater Toronto area at York University on November 14, between 4pm and 6pm at the Science and Engineering Library, there's going to be an exhibit called the MCM/70 at 50, which I actually don't know a ton about. But I know that there was a book written at one point, and they have basically a kind of mini museum that talks about and displays these microcomputers. And there's a lot of APL history that ties in to this computer. So we will find, after we record this, a link to that book if people are interested in checking it out. Unfortunately, you have to be in Toronto. I don't think there's going to be a virtual component of this. And I think also the same location that this is being held at was also where there was, I think, a 50-year APL sort of anniversary or party back in, like, 2014 or something like that. I did not know about the array languages at that time, and I was still in university. So unfortunately, I don't even think I lived in Toronto at the time, so I wouldn't have been able to attend. But anyways, links to that party, exhibit, whatever they're calling it, and everything else we mentioned will be in the show notes. And with that out of the way, it is my pleasure to introduce Tali Beynon. And he is with us today to talk about his rainbow array programming. It is a... well, actually, I will let him describe it. But it's very, very interesting. I believe you gave (I'm not sure if there are multiple talks) but the one I watched was the one that's linked at the top of the article. It was a talk presented, I believe, at the Hypermatrix workshop, which I think was roughly about six months ago, half a year ago. It's a very, very interesting talk in terms of what it's introducing. I'm not sure how novel the ideas are, because there are a couple of different papers that I'm sure you're going to refer to. So it seems like a somewhat novel idea, and there's been some discussion about how it hasn't been super explored in the traditional sort of Iversonian array languages. But with that sort of mini introduction out of the way, I'll throw it over to you, Tali. If you want, you know, take us back to whenever you want to, you know, when you discovered computers, and give us sort of an abridged summary of how you got to where you are today. And then we'll start chatting about rainbow array programming.

00:06:43 [TB]

Awesome. Thank you so much, Conor, for that introduction. [02] And yeah, it's a pleasure to be joining you guys today. So yeah, looking back, I've been sort of tinkering with computers since forever. I've always been drawn to them. I think I started coding when I was like nine or 10. Mostly it was sort of just games in QBasic. I think my brother brought back one of those big books that had, you know, listings where you had to type in, you know, 10 PRINT "HELLO", 20, and so on; that kind of thing. And then from there, just sort of adapting those to do novel things, I fell in love with programming. And a big moment for me was, I remember, I was reading Richard Dawkins' great book, The Blind Watchmaker. And he has this whole kind of side thing that he introduces in the book, which is a program called the Biomorph Program, which is a kind of illustration of the power of artificial selection. So you have this little program that uses parameters to produce these little diagrams, these little pictures. And by sort of breeding them, you can make them look like a huge diversity of different things. It's just a toy, but it's a great educational tool for how evolution works, basically. And I was so inspired by that, I kind of copied it, did my own version with some different tweaks. And I had an aunt who lived in Oxford, and she delivered it to him [Dawkins] on a floppy disk, because it was in the very early days of the internet. I don't think I knew how to zip something up and send it by email. And he replied, which, like, maybe gave me a little confidence boost early on in my explorations. But yeah, so I've been kind of playing around and teaching myself stuff since forever. I taught myself high school, dropped out, mainly to code. I just kind of wasn't as interested in other things.

00:08:35 [CH]

Wait, wait, wait, you dropped. So wait, there's homeschooling, but you dropped out of high school and then just taught yourself? Is that allowed?

00:08:43 [TB]

Well, things are a little bit of a wild west in South Africa. I think the truant officers aren't quite as dedicated as they might be in other more developed countries. But yeah, so I suppose I did have to pass some basic metrics, you know, do or write an exam every year or whatever. But mostly it was me just resisting all attempts of my mother, who's a professional educator, to teach me, and just teaching myself the things that I cared about, which was computers. But I did eventually go to university. So I studied math and physics. I think I was inspired by Feynman. [03] He was a big hero of mine. And his books were really inspirational. And I loved learning stuff that was outside my comfort zone. But I also experienced a little bit of a disconnect from the things I was interested in. Computers weren't really a big part of how math and physics were taught at the university I went to; it sort of didn't really connect. So I remember discovering Python in undergraduate, and neural networks, and being very excited about this. I thought it was a real testament to the power of Python that with list comprehensions you could write a neural network forward pass of an MLP in like one line. I thought that was really cool. But no one else in that stream was maybe vibing on those ideas in the way that I wanted. So after doing my undergraduate, I joined industry. I didn't want to go further down the math road. I thought it wasn't relevant, which turned out to be a big, not a mistake, but that was short-sighted in a way, because I've since rediscovered how applicable a lot of abstract mathematics is to the things that I'm really interested in. But maybe we can get to that later. And within about six months, I got hired by Wolfram Research. So they advertised a summer school where you could just go and do a fun project. And Stephen Wolfram would come and check out the project and help you design the project and then evaluate it at the end after a few weeks. And based on that, the company was excited to hire me. And I was sort of blown away. It was a dream come true to come work for a company that made a product that I really respected. I had discovered Mathematica and played around with it. And there's nothing quite like Mathematica. It's very unusual in a lot of different ways and lets you do some really powerful, interesting things. So yeah, that was maybe in 2009, 2010, I joined, and worked on a whole bunch of different things. Maybe we can hit the things that are relevant to what we're talking about today and array programming. I think one of the big moments for me, both because it was a really fun project that I really enjoyed, but also because it taught me a lot about the power and importance of multidimensional arrays, was this project to analyze Facebook data that our users had donated to us. Because we had made a little tool that they could use to do their own Facebook personal analytics, like plot their social network and show all kinds of histograms and things. And the trouble with analyzing that data was there were so many demographic variables: age, sex, relationship status, geographic location. And then there were the interdependencies of those variables with the same quantities that your friends might possess. So if you want to look at, say, what is the correlation between your age and your friend's age, like as a joint distribution, it's quite gnarly.
Because you can also then condition that on all these other variables that might make a difference, right? It might vary with all kinds of things. So really, you're limited by the kinds of questions you can ask, and how quickly you can answer those questions. And so I ran into this problem. Like, I tried to load it into a SQL database and write queries, but it was just such a painful experience to do that. It's not really designed for that. And perhaps if I'd had more time, and if I hadn't been as in love with Mathematica, I would have explored R and the tidyverse and those kinds of things to do this kind of work, especially the visualization side. But it made sense to sort of dogfood it in Mathematica and build these multi-dimensional histograms, essentially, and the plumbing that would let you project them and aggregate them into visualizations really quickly. So that you could come up with a question that you wanted to ask, and explore, and immediately get results and feedback from that. And I think the core thing there that I realized early on, in trying to make that work and build a workflow for that, is that you couldn't really do that if you just had numbered axes. If you had to remember that axis one was the age bin for the main person, and then axis two was the age bin for one of their friends, and then the value of the array was the count across your whole data set of people falling into those bins, like relationships that fell into those. That's too hard. You really needed to use named axes to be able to express these complicated things and keep track of what you're doing, and for it to be intelligible after the fact. So that's something I just did. I didn't know that that was a special thing to do, but I've since realized that this named axis kind of phenomenon, or at least the motivation to use named axes more broadly in dealing with multidimensional arrays, is everywhere. It wasn't specific to that problem. And that project was fun. It paid off. A lot of people were interested in that. Maybe you can put that in the show notes or something if people want to take a look at it. But what it led to was Stephen [Wolfram] pointing out that we didn't really have a response to Python's data frames. Dealing with this sort of functionality was finicky in Mathematica; you had to do everything by hand. So what would that look like? And that was another, I suppose, year or so of work. We needed to add features to the language. We didn't really have hash maps or dictionaries as a core data structure. And that was something that you absolutely need to build records. You need to be able to do that to represent tables where the columns are named, for example. So alongside that, I was designing what the carrier of tabular data would be that would allow people to do all the things that they could already do in Mathematica, visualizations and so on, but make the workflow as smooth as possible: to pipe the things that you wanted to where they needed to go to produce a rich visualization, and to do aggregations and things. And that was a really educational experience, because in some sense it didn't work. I mean, the project was finished and there was a sort of idea. I don't know how much your listeners know about Mathematica, but maybe I can just sketch the basic philosophy a little bit. Essentially, to a first approximation, it's a functional programming language.
It's actually a term rewriting system, which is a more exotic thing that maybe I can talk about later. But it's a functional programming language. And it's also a network, a patchwork, of different DSLs for different purposes. They're not really partitioned into different libraries, because that would make it harder for them to interact with each other and integrate well. So there's a DSL for, like, graphics, sound, algebra, geographic data, units and quantities, regions, statistical distributions; more and more just keep getting dropped in every year or two as a new release comes out. Anyway, the job was to build a data-frame-like concept in Mathematica. Let's design a DSL that's for querying structured data sets. The easiest thing to do would be to just focus on tabular data. And to some extent, that's the simplification that other frameworks have used. So you can have nested data, but it's largely a tabular kind of worldview. You're prioritizing the first two axes and giving them special focus. And that's great, but it also didn't really match how people typically work with Mathematica. And that's because Mathematica is so much about trees of data, like LISPs are. All the DSLs are designed around that philosophy that you build a tree; for example, a graphics visualization is a sort of tree structure, just like a DOM or SVG. And so it's very natural to want to reach deep inside that tree and do an aggregation or transformation. Another core point is that at different levels, or axes, I suppose, is how you'd call them in other frameworks, you can alternate between listy types of axis and what we call associations in Mathematica, like records or structs. Because you can alternate between those two, it's hard to really think about that as a table. In fact, once you go down one of the fields of a record, you might have non-homogenous data beyond that point. So depending on which field you poke into, you might have a list or you might have a scalar. So it's just not an array in the traditional sense we think about it. And treating that seriously and trying to build a DSL to work with that, you kind of can't take that shortcut of seeing everything as two-dimensional. You've got to say, well, everything's n-dimensional. In fact, n might not even be defined, because it depends on which parts of a record you go down, how much more structure there is. But what can we do anyway? How do we build a DSL to navigate that? And I did the best I could. And I think it's fine, and people use it. But I think what I learned from that is that the complexity that comes from building the little combinatorial gadgets that you have to plug into each other, to express all the different kinds of burrowing operations that you want, to dig into a highly nested data structure and do transformations or aggregations or fish out information, do projections, that sort of gadget building occupies quite a lot of your time. And it's quite hard to debug and reason about and statically type-check, which I attempted to do. I had a whole algebraic data type system to represent what the shape of the data was, even if it was a very complicated shape. Trying to reason about what a given gadget would do to this nested algebraic data type is tricky. And it's tricky to explain to the user when it's not what they thought it was.
And again, this is an example where named axes, where instead of ordering your axes (first axis, second axis, third axis, or just rows and columns) you drop the idea that axes are ordered at all and just focus on exactly the level that you want to target for a given operation, are a more natural and lucid way of doing those kinds of transformations that are very typical. Sorry, I'm just blitzing through stuff here. I realized this was meant to be an introduction, but it's turned into a whole essay.
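
A minimal sketch of the named-axis histogram workflow Tali describes, using the third-party Python library xarray (which has named axes; it is not what he actually used, and the axis names and labels are illustrative):

    import numpy as np
    import xarray as xr

    # A toy joint-count histogram over named, labeled axes.
    rng = np.random.default_rng(0)
    counts = xr.DataArray(
        rng.integers(0, 100, size=(3, 3, 2)),
        dims=("age", "friend_age", "relationship"),
        coords={
            "age": ["18-25", "26-40", "41+"],
            "friend_age": ["18-25", "26-40", "41+"],
            "relationship": ["single", "partnered"],
        },
    )

    # Project: the joint age distribution, ignoring relationship status.
    joint_age = counts.sum(dim="relationship")

    # Condition: select by label rather than by remembered axis position.
    partnered = counts.sel(relationship="partnered")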

00:19:50 [CH]

No, no, this is fantastic. I'm just digesting and this history is, yeah, our listeners are on the edge of their seats right now. Don't worry, you have not lost it. Well, you might have lost a couple people, but I'm hanging on and this is fantastic.

00:20:05 [TB]

So maybe I should also just say something in defence of that thing I built that I said was not a total success. [04] There are some nice features, if you do have complicated nested data sets; I'll give an example to make it a bit more concrete. This is from our documentation, right? And when I say "our", obviously I don't work at Wolfram anymore. Maybe I'll get to that later, but I suppose because it's my baby, I think of it as ours, right? Let's say you want to describe the planets in the solar system. So you can clearly see that as a table: you have a row for each planet, and the columns might be things like the name of the planet, the radius, the mass, that kind of thing. That's great. But now you also want to incorporate the fact that planets have moons. So you're going to add another column for moons, but the value of moons is an entire sub-table, and it shares the schema across instances. So each moon will have similar properties, you know, radius, mass, whatever. And that'll repeat. So, you know, Earth will have one, Jupiter will have however many, but they're all going to have those fields in common. And so you can use an algebraic data type to represent this whole super-table that's got sub-tables in it. And then once you have that, you can do nice things. By transforming that moons column by taking the length, for example, you've just got a simple table that has a count in that column, so you just see how many moons each planet has. Or you could aggregate: you could say, well, I want to take the average mass of every moon. And with a suitable gadget, you can walk the table and produce another table that's flattened off the sub-tables in that way. So that would be an example of why nested tables are really interesting. And you can do a lot of that. And one of the things that's nice about the framework that ended up shipping is that there was a planner that would look at the complex algebraic data type that expressed this whole table, and it would figure out how much screen space to devote to the different portions of it. So for each planet, you might want to show like four moons, and you would lay them out. You've only got a two-dimensional table you can produce in the end, but you can stack four moons in the single row of each planet. So you've got a choice at each turn, at each level of nesting: you can say, okay, I could lay this out horizontally, vertically, or as a little inline list. And let me explore all the different options up to some computational budget and try to estimate how good they are according to some weird metrics of, like, aesthetics. And I'm sure other software does this kind of thing. But I think that's a really nice approach. As soon as you have a fully algebraic view of complex data, you can do these interesting things, like global planning for how to make it easy to display it in a reasonable way. Or in theory, you could even start to suggest analyses and transformations based on the shape of the data. You could be like: you could aggregate by the following things.
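
The planets-and-moons example as plain Python data, with the two transformations Tali mentions (counting the moons, and aggregating over the sub-table); the field names and figures are just illustrative:

    # Each row is a record; the "moons" column is an entire sub-table.
    planets = [
        {"name": "Earth", "mass": 5.97e24,
         "moons": [{"name": "Moon", "mass": 7.35e22}]},
        {"name": "Mars", "mass": 6.42e23,
         "moons": [{"name": "Phobos", "mass": 1.07e16},
                   {"name": "Deimos", "mass": 1.48e15}]},
    ]

    # Transform the nested column to its length: a moon count per planet.
    counts = [{"name": p["name"], "moons": len(p["moons"])} for p in planets]

    # Or flatten off the sub-table: the average moon mass per planet.
    mean_mass = [
        {"name": p["name"],
         "mean_moon_mass": sum(m["mass"] for m in p["moons"]) / len(p["moons"])}
        for p in planets
    ]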

00:23:33 [ML]

If you do this, you get some interesting problems with displaying nested arrays in APL, where you don't have this framework at all. You have an array, but whatever's inside each element could be anything, anywhere. And I mean, generally, we don't try to do this aggregation; we just make a display for the whole array and truncate it or something like that. And I've looked a little bit at trying to, you know, squash the subarrays, you know, put ellipses in places. And yeah, if you don't have any global view of the structure, that's very difficult. What I've tried to do is kind of pass down information about how much space you've got left. And then the subarray decides, you know, am I going to try to use that space, or just give up and represent the entire array as an ellipsis? And so on. So yeah, it sounds like if you have a schema, then you get to work on the type instead of working on the array, basically, maybe.

00:24:15 [TB]

Yes. And you can do a bit of both. So a strategy that I remember using is that if I had some ragged arrays, or if I had, technically, an existential length variable, which is like: I just knew there was some number of moons for each planet, I didn't know how many, and they weren't all the same number. It's hard to know: can I afford to show all of them for every row, or will that blow out my table budget, right? So there was a little thing you could do: you could sample. Like, okay, I don't know what to do here, but I can get a distribution of the lengths of moons, which will give me a good sense of where I can best spend that budget. And in the kind of extreme case, you could see this as a true global optimization problem, where you do some kind of, you know, nonlinear optimization thing, and you've got some metrics and scores and things. And it's hard; I didn't get that far, I just did some truncated search. I do think that kind of thing is probably the future of... I mean, people could be doing this already in commercial products or open source projects, I'm not sure. But it's an obvious thing to do. And I think, with enough time and investment, that will become the standard for how we navigate complex datasets.

00:25:30 [BT]

When you take that information using a nested table to be able to start to represent it, you encounter these kind of problems, do you find there becomes a pattern to the way you think about the information as you go in? So in other words, before you create all these nested tables, after doing a few of these, does that start to shape the way you look at the information going into it?

00:25:51 [TB]

Do you mean like, do you start to perceive certain patterns in how to approach that kind of, like, maybe there's a hint of parametric polymorphism or something that like, you could just abstract a lot of the work that you're doing?

00:26:06 [BT]

Exactly. To me, it's a lot of mathematics is you're looking for patterns within numbers and within functions and things like that. And when you start to find an approach to something, you start to find new patterns that you didn't see before, because this approach opens you up to other things. You designed this kind of a tool, did you find that it affected the way you looked at things?

00:26:28 [TB]

Totally. And I think maybe that leads to a natural thing that I was going to say anyway, which is that I was aware that there was something really interesting going on with, for example, algebraic data types. And I also keyed in on this a little bit: you had this traditional idea of rank, you know, matrix, rank two, whatever. But then as soon as you have records that can have values that aren't the same type, rank is undefined, right? Depending on which path you take to dig into the dataset, your path can be longer or shorter, and they can have different dimensions as you go along. And also, for things like raggedness, the values of those dimensions can depend on earlier choices you made about where to go, where to dig. I mean, the most classic example is, if you just have a list of different-length lists, then you know that at level two you're going to get lists, but you can't say what exact length they'll be. You know that there is a length, but it depends. And you could store those lengths somewhere else. And that actually becomes really relevant to stuff we'll talk about. And so I could sense there was something really interesting going on here. And I think one of the tensions in commercial programming work is that there's always stuff to do, and you've got to ship, and you've got to do the best you can. And if there's a bit of a rabbit hole forming, at some point you've got to close it off before it sucks you down. And what I think I would have learned, as I'm sure a lot of your listeners will be kind of shouting to themselves, it's like: oh, that's dependent typing. [05] It's right there, right? It's a whole interesting story, a theoretical computer science story, that's fascinating. And in the end, that's one of the reasons I decided to switch away from more engineering-focused work and to spend time thinking and exploring. You see lots of interesting things happening; you're aware that there's a kind of schema behind them, or that there's some deeper way of thinking about it; you can't put your finger on it. And it might take a lot of hard work, sort of reading existing bodies of thought and different ways of thinking about things. And that's time that you often don't have if you're doing commercial work. So that is actually what ended up happening for me. The work was really interesting, but at some point I kind of wanted to step back and think, which is ultimately why I left Wolfram Research.

00:28:56 [CH]

So before we move on to the next chapter, which it seems is going to be this, you know, time you've taken to start thinking about that. It sounds like two projects: you initially started talking about how, you know, you started playing around with SQL and trying to do stuff there, and that was painful. And then that led you to what I think you said was a year of work. Did that end up having a name, or was that just research? And then you went on to say that you ended up doing this DSL that was sort of a data frame flavored thing, but tailored to work with Mathematica, which, you know, has a different model than, you know, pandas or the data frames that exist in R. So I'm assuming the second one has a name. Does the first one, though, as well?

00:29:39 [TB]

The DSL and the data frame thing are the same sort of project. Ultimately that was called Dataset, one word. It shipped in Mathematica; it's just part of the language. And yeah, I've got a page on my website where I describe a little bit some of the challenges that went into that and some of the ideas that were there. And yeah, maybe we can leave that as an episode note.

00:30:01 [CH]

Yeah; link [will be] in the show notes. And I'm sure folks will be interested. I'll definitely take a look afterwards. And I will say that while you were telling that part of the story, of the SQL that then led you to try things out in Mathematica, in my head I was like: "this sounds extremely similar to just DataFrame". Which is, I think, after NumPy, the second most popular library in Python. Which probably makes it the second most popular library in the world [chuckles]. Maybe there's some JavaScript stuff out there. But when you ultimately ended up saying that this is kind of the DataFrame thing, I was like: "okay, so I wasn't wrong with what I was thinking in my head there". Because it is a super popular framework. I mean, there's pandas, and then I think there's like six replicants of pandas, including one that Nvidia makes, that are all just trying to take that model and copy it onto different hardware. I think Dask has a version of it. It is almost like a paradigm unto itself, like how common it has become for people to do data analytics with that kind of framework.

00:31:01 [TB]

Yeah, I agree. And it's clear it's had an amazing impact. And I think it's, as you say, the standard for how people work with that kind of data now. Yeah.

00:31:11 [CH]

All right. So take us now to the next chapter of your voyage into idea exploration land.

00:31:18 [TB]

Sure. There's actually one more stop (the end is in sight) and that's machine learning. So the next big project was: how do we bring machine learning to Mathematica? And that culminated in the neural network functionality of Mathematica. There's another DSL, and its job is to describe neural networks. You can build a neural network architecture in the abstract; no weights. Then you can train it on datasets, or you can import ones (there's a repository that's maintained online that has all kinds of pre-trained ones that are publicly available or whatever). A lot of the same themes then came up there. I'll give an example of one of the kinds of challenges that happens with neural network frameworks, like representing neural networks. In the easy case, let's take a multi-layer perceptron that's operating on a fixed-size vector. Maybe it's predicting, I don't know, house prices from a whole bunch of different variables. What's straightforward about that is that all of the arrays are fixed size, and maybe you have 5 scalar inputs, so you can collect them into a vector and you can make a prediction; that's going to be another scalar. Typically when you train, you apply it to thousands of examples. So as complicated as it gets is that you have to break your whole dataset up into little batches that are some fixed size for efficiency. And you go through these batches in turn, and back-propagation gives you weight updates, and yeah, that's how a neural network trains. When you're done, you can apply that neural network with weights to one example or many examples. That's the easy case. Where it gets a bit harder is (a good example) sequences. Let's say you're doing sequence prediction. These days, that's all transformer based, but even back in the day we had RNNs and LSTMs. All that they do is process a bunch of vectors representing some sequence. They could be like, you've turned the characters of a string into vectors somehow. The RNN or the LSTM will observe each of them in turn. When it gets to the end, it's ready to make a prediction for what the next character will be. The easy thing to do would be to train this on lots and lots of fixed-length strings that are all the same length. So again, we have a batch: [the] first axis would be batch dimension; the second axis would be sequence position, like maybe one through 30 or whatever. And then maybe the third axis is the feature dimension; maybe you represent each character with a 10 dimensional vector. That's the input to your neural network. That's fine. But maybe your strings aren't all the same length. You can chop up a large corpus of text into strings that are all the same length, but there may be honest-to-god examples that aren't, and it would be wrong to chop them up into all the same length. You'd lose information if you did that, or you'd be solving a different problem from the one you really wanted to solve. That's challenging for a number of reasons. One is that if you want to do training efficiently, you have to batch data together into multiple examples, but if they're not all the same length, then you've got a ragged matrix, for example. But it has to be a full matrix to even make it to the neural network, to run in the background. So you've got to pad all the sequences that are too short in a given batch up to the full sequence length.
Then you've got to make sure that all your computations downstream will ignore these irrelevant values by the time it comes to compute things like loss, which is the error signal that helps the neural network learn. So that's one challenge. That's an example of dependent typing staring you in the face; the type of the inner vectors (the length part of that type) depends on which vector you're looking at in the batch. People have solved this problem: usually they kind of hard-code a solution for that. Either they don't solve it because it's tricky, or they hard-code a solution to that specific case because it's so finicky to do all that tracking. It could get even worse: you could get images of different widths and heights that you want to do some kind of convolutional neural network on. Now you're doing that same kind of task in two dimensions simultaneously. You've got [a] doubly dependent [chuckles] type for the full batch of examples of images.
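
A minimal sketch of the padding-and-masking dance described here, in plain NumPy (framework-agnostic; the variable names and toy "loss" are illustrative):

    import numpy as np

    # Three "sequences" of different lengths: a ragged batch.
    batch = [np.array([1., 2., 3.]), np.array([4., 5.]), np.array([6.])]
    max_len = max(len(seq) for seq in batch)

    # Pad every sequence up to the longest in the batch, tracking validity.
    padded = np.zeros((len(batch), max_len))
    mask = np.zeros((len(batch), max_len), dtype=bool)
    for i, seq in enumerate(batch):
        padded[i, :len(seq)] = seq
        mask[i, :len(seq)] = True

    # Downstream computations must ignore the padding, e.g. a per-example
    # mean "loss" that only counts real positions.
    errors = (padded - 1.0) ** 2              # stand-in for a real loss
    loss = (errors * mask).sum(axis=1) / mask.sum(axis=1)
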

00:36:14 [TB]

Another thing that can get really complicated, with how shapes of tensors flow through a neural network program, is if you're doing things like... [sentence left incomplete]. I'm sure you've heard of pooling: one of the steps in convolutional neural networks. [06] It's kind of like downscaling. You've got an image; it's successively transformed to extract local features, and then you downscale each step so that by the end, what was a 256x256 image [becomes] like 16x16 or whatever. Because the theory is that at each successive level, you're paying attention to larger and larger features: first eyes, then faces, and then entire bodies; what have you. The way that a pooling operation transforms the input size into an output size is quite complicated, because it's sort of like a downscaling thing. It's going to involve dividing the input size by some number, but then you might have to round up, actually, because it might not be an exact number of pixels. What's tricky about that is if you want to combine that with some of the other things I talked about, like this whole business of batching together differently sized examples, you've got to express these constraints between the input and output shapes. And they're not simple equalities. It's not like this input shape must match this output shape. It might actually be that this output shape is "ceiling(n/3)", where n is the input shape. And to know whether a complicated neural network architecture that does a whole bunch of these things in different places, and then maybe matches them up at the end, is well typed, you end up having to solve systems of equations. They're not even linear equations, because you've got things like "ceiling()" in there that mess things up. So this is a weird example (in the long run, probably not the world's most important feature), but it's a fun example of how crazy things can get with dependent typing, or with rich typing, I suppose you could say, with shape inference. You could end up having to do things like solve systems of equations to figure out if a neural network is even well typed, or if it's going to hit errors at runtime that you want to warn a user about, or prevent them from even expressing something that doesn't make sense. Building that framework exposed me to a lot of these intricacies again; you have these shape variables that are floating around to efficiently do bucketing and batching when your inputs are different sizes. You've got to do this multi-step compilation procedure where you're reasoning about the shapes at compile time and at runtime, spawning different copies of a program, instantiating them just in time to be specialized for a particular size. And it gets into the nitty-gritty of how do you even think about array shapes in the first place. But that's maybe a little bit in the weeds. I think the simpler take-home for me from working on the machine learning side of things (framework side of things) is the gymnastics that you go through in a complicated neural network, whether it's a convolutional net or something potentially very tricky like a transformer (like if you have to build a transformer yourself and all these steps to making it work, right?)... [sentence left incomplete]. The way that you have to make sure that dimensions and shapes are matched up correctly, and that dimensions mean what you think they mean, is extremely involved and convoluted.
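
A tiny worked version of the "ceiling(n/3)" shape constraint mentioned here (the stride choice is illustrative):

    import math

    def pooled_length(n: int, stride: int = 3) -> int:
        # Output length when downscaling by `stride`, rounding up for the
        # partial window at the edge: the ceiling(n/3) case in the text.
        return math.ceil(n / stride)

    for n in (254, 255, 256):
        print(n, "->", pooled_length(n))   # 254 -> 85, 255 -> 85, 256 -> 86

Checking that two branches of a network can be matched up at the end then means solving equations over expressions like these, which is why the constraints stop being linear.
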
It's pretty typical to see long descriptions of neural network architectures that are riddled with comments that are just there to annotate a given operation with what the expected input shape is and what the expected output shape is, and what all the axes mean: "first axis is batch dimension; second axis is feature dimension; third axis is sequence". Because, like: oh my gosh, to use a particular primitive that a neural network library has made available to you, you've got to have things in the right order for it to go fast, right? You've got to temporarily swap things around to shoehorn them into the right shape for some later step. And then later on, you've got to transpose them back. Then if it comes time to naturally make a slightly more complicated version of a neural network where (I don't know) you're doing sort of adversarial training, or you are comparing two versions of a neural network, you might want to add another axis. The consequences of that for all of your shape gymnastics, your whole choreography that you've built up, are so daunting that many times you're prevented from even trying, right? And it felt like that was wrong, like something was off about that. I never got a chance to implement this, because it would have had too much of a spillover into other parts of Mathematica for it to really be a sellable feature; it would be too big of an overhaul. It would require redesigning parts of the language. But what I really wanted to do was to name axes, so that there wouldn't be: first axis, second axis, third axis. There would just be: the x-axis of an image, the y-axis of an image, the color-channel axis of an image. The consequence of that would be that you never had to do broadcasting, because it would be obvious the way that you would broadcast two tensors together to add them or divide them or whatever: it would just be [to] match up the names that are in common between them and broadcast all the ones that aren't in common. That's the only thing you can meaningfully do, and it's what you meant. So that sort of inkling was at the back of my mind, and it was sort of too late to try to do something that ambitious. As I mentioned, it would naturally cascade into a lot of other parts of Mathematica. Yeah, go ahead.
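
What "broadcasting by name" buys you, sketched with xarray, an existing Python library with named axes (the axis names here are illustrative, and this is not the Mathematica design Tali wanted to build):

    import numpy as np
    import xarray as xr

    image = xr.DataArray(np.ones((4, 4, 3)), dims=("x", "y", "channel"))
    gains = xr.DataArray(np.array([1.0, 0.5, 0.25]), dims=("channel",))

    # No transposing or reshaping: the shared "channel" axis is matched
    # up by name, and "x" and "y" broadcast automatically.
    scaled = image * gains        # dims ("x", "y", "channel")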

00:41:52 [BT]

Well, to me, you're telling a really great story because you're bringing it back around to [that] you have to ship [Tali agrees]. And that's what holds you back from doing the further investigation that you can see out there. Because in some cases, you get to certain areas and people just go: "I'll write a bespoke function that takes care of it; done". But as soon as they do that, they're closing down that door of going further and finding out more about what's going on. And you're saying: "well, I know I have to ship [chuckles]; that's part of what they're paying me for and it's shutting down my investigation of all these things that I see out there". Is that kind of accurate, what's happening?

00:42:29 [TB]

I think so, and I suspect that in a broader sense, that's true of all the things we're currently doing. We've collectively invested in frameworks and codebases, in teaching and training material, like tutorials, that have hard-coded particular assumptions about how we should think about arrays, how we should deal with arrays. And you could say it's too late to change that; you'd go backwards for so long to go forwards that it'll never happen. I'm not so pessimistic [chuckles]. People have begun implementing named axis libraries in the context of other frameworks; it's already there. I think it's possible in Dex and in JAX and PyTorch and so on. [07] I should maybe touch base a little bit with the story. So my personal trajectory was, right about this time, I kind of wanted to do something about this problem. I didn't really know how. I wanted to at least articulate it. And unsurprisingly, many people have noticed this and have brilliantly articulated it. Maybe I can reference one of those; that can go in the show notes. Stephen, yeah?

00:43:43 [ST]

Did you ever come across an application where you also wanted to name not just the axes, but indices as well? The reason I ask is I'm remembering an application I worked on some years ago, where I was trying to rewrite an evaluation engine for an actuarial application, and something which had originally been written as an absolute blizzard of vector operations I re-analyzed into an eight-dimensional matrix. So I labeled all the axes as global variables, so my code was (as you say) no longer talking about numeric indices, but named ones. But in the application I was working on, most of the indices had got semantics to them. So I think my axes were of two kinds: one divided a sum, so you might have some sum of money which was divided between pre-98 and post-88, with or without a guarantee, and so forth. And you could sum across the first kind of axis, but not across the second. So I wound up writing a simple DSL in which you could say which axis you wanted to sum across and basically collapse. Part of getting the application code readable was using globals to label the indices as well. And I'm wondering if that's something which you ever encountered or found yourself thinking about.
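
Stephen's trick, sketched in NumPy; the axis names and shape here are invented for illustration, not taken from his application:

    import numpy as np

    ERA, GUARANTEE, FUND = 0, 1, 2     # axes labeled by globals, not magic numbers
    amounts = np.zeros((2, 2, 5))      # era x guarantee x fund, say

    # An axis that divides a sum may legitimately be summed away...
    total_over_eras = amounts.sum(axis=ERA)

    # ...while a purely categorical axis is only for selection.
    with_guarantee = amounts.take(1, axis=GUARANTEE)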

00:45:32 [TB]

Thanks, that's a great observation. Absolutely. So the Facebook data science project is a really good example of this, because quite a lot of the observational variables there were categorical, like people's reported gender or location, for example. In practice, you want to store it in a dense array: you're storing counts; a giant histogram. But conceptually, the values that the first, second, third, fourth index along a given axis represent are "unmarried", "in a relationship", "don't say", or "Johannesburg, New York, whatever". Especially when you want to do visualizations or even aggregations... [sentence left incomplete]. A great example is something like age. Let's take age. Index one might represent a bin: one to five years old. If you want to later on do some sort of merging operation where that bin's too fine (you want to coarsen it to get better statistics), it would really help if the aggregation step would know about the intervals and would feed them through to the result of that partial aggregation, because you're not throwing away the whole axis; you're just coarsening it. So absolutely, I think the more metadata we have about the meaning of not just each axis, but what the indices within the axis represent, the better. And in fact, I propose that when we talk about matrices, for example... I think matrices are a perfect example of this. The way we think about matrices in linear algebra, you always number them. You think about them as numbered, but it's rarely the case that you need an order, a linear order, on the rows or columns. You do need to distinguish them, to match them up when you do, say, a matrix multiplication. But knowing that this is row three and this is row one, versus this is row X and this is row Z, there's no need for that. That's just false baggage that we've brought along just because we happen to write them down in a particularly ordered way. Sorry, go ahead.
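
The bin-coarsening point in miniature: if the axis carries its interval labels, a partial aggregation can merge the counts and the intervals together. A dict-based Python sketch (the bins and counts are made up):

    # interval -> count, e.g. ages binned five years at a time
    counts = {(1, 5): 10, (6, 10): 7, (11, 15): 4, (16, 20): 9}

    def coarsen(hist):
        # Merge consecutive bins pairwise, carrying the merged interval along.
        bins = sorted(hist)
        return {(a[0], b[1]): hist[a] + hist[b]
                for a, b in zip(bins[::2], bins[1::2])}

    print(coarsen(counts))   # {(1, 10): 17, (11, 20): 13}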

00:47:42 [RP]

So I've got two things. One is, maybe I'm just being slow and I'm just catching up, but obviously the naming of axes doesn't help with your ragged data issues, where the text vectors that you're comparing have got different lengths. But are those now just being stored instead of... [sentence left incomplete]. Now, I come from APL world, where you've got these matrices of characters, and that's what I was envisioning when you were describing padding vectors. These seem like unrelated issues, I guess, or you said that it has been solved? Are you still having to fiddle with solving lots of complicated equations at the end of the day, regardless of this? Is that something you expect to abstract away or hide from users of any system like this?

00:48:29 [TB]

It's a really interesting question, or observation. So ragged arrays: this is a quintessential example of dependent typing. The type of a list of ragged lists, if you want to call it that, is matrix-like, but the second axis does not have a homogenous size. That's the background. What's interesting is, if you stay within the ordered-axis realm, then you do have this nice property that you can insist (and it sort of makes sense to insist) that as you walk through the axes in order from left to right, the dependencies can only go backwards. So you can have a ragged matrix of size <five, ?> where "?" depends on whether you're in row 1, 2, 3, 4, or 5. But what does it mean to have a ragged matrix of size <?, five>, right? Because the way we typically think about it is: first you choose the row, then you choose the column, and then you've got the cell there and you can look inside. But what are you going to pick for the question mark? You don't know which column you're in yet. You haven't got to the column slot yet. So ordered axes do have this nice property that they at least give us a way to canonicalise. To say: if you're going to have complicated shape dependencies, where the shape depends on choices you made earlier as you were digging into a compound data type, at least the order allows you to stage that dependency so that you're never faced with an impossible question.
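
Tali's <five, ?> example, written as plain Python data to make the dependency direction concrete (the lengths are arbitrary):

    lengths = [3, 1, 4, 1, 5]                   # the "?" as a function of the row
    ragged = [list(range(n)) for n in lengths]  # row i has lengths[i] cells

    row = ragged[2]   # choose the row first; only now is its length (4) known
    # A <?, five> matrix has no such reading: the column count would have
    # to be known before choosing the row that determines it.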

00:50:15 [RP]

This is funny, because it's really digging into [it], because you are rethinking concepts of arrays as they are thought of in array-language world, where there are these multi-dimensional collections of data arranged along orthogonal axes that have all got the same length. But then separately, we tend to think of these ragged nested structures: the arrays inside arrays. Largely for historical reasons, right? You had the multi-dimensional homogeneous arrays of yore, and then in the 80s and 90s, we got to put whatever we want inside those elements, which is what you were sort of talking about with the data set thing [Tali agrees]. And so as I'm speaking now [chuckles], I'm literally just catching up to all the concepts you're talking about. I'm not a computer scientist in that way, so I don't really live in abstract data type world; I just use arrays to do stuff. So that's why I'm asking these questions and thinking in this way. But I'm also thinking about: what's the difference between this and just a dictionary or collection of named values? Except you happen to have (and maybe we're jumping ahead at this point) defined some operations between the different members, right, so that you can aggregate across one member and broadcast the rest, for example.

00:51:42 [TB]

I mean, it's super... I don't know the right way of thinking about it. You've hit on something that is very interesting: how to combine these two separate ideas, named axes on the one hand, and embracing dependent typing as a way of allowing for not only raggedness, but also structs deep inside a nested data type.

00:52:06 [RP]

Well, do K and q do much of this? Because I know that they have dictionaries as a fundamental type, and perhaps in some way they try to make it work nicely with the rest of the array primitives and stuff that they do.

00:52:20 [ML]

I was thinking pretty much just that. I mean, a K dictionary is just a list, right? K only ever does one axis at a time, so it doesn't have these orthogonal multidimensional arrays. But all it is, is a list of values together with a list of keys. That solves the problem of having to name the array indices, because instead of using numbers for the indices, you can use, for example, symbols, which I think is how a lot of the K stuff does it. And also, I said K doesn't have these orthogonal arrays, but it does have one thing (or many K's have one thing) which is called a table. The concept of it is that you take a list of dictionaries that all have the same set of keys, but then you transpose that. Then you've got basically a two-dimensional thing where one of the dimensions is a set of keys. They're basically column names for a database table ([that] is the concept). And then you have one numbered axis. That works very well for the domains that K is designed for, but obviously it's a specialization of having full multidimensional arrays where the domain for an axis could either be numeric or symbols or whatever.
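
Marshall's description, transliterated into Python data structures for readers who don't know K (the column names and values are invented):

    # A K-style dictionary: a list of keys alongside a list of values.
    keys = ["name", "radius", "mass"]
    vals = ["Earth", 6371.0, 5.97e24]
    d = dict(zip(keys, vals))

    # A K-style table: a list of same-keyed dictionaries, transposed into
    # a dict of equal-length columns, plus one numbered (row) axis.
    records = [
        {"name": "Earth", "moons": 1},
        {"name": "Mars", "moons": 2},
    ]
    table = {k: [r[k] for r in records] for k in records[0]}
    # {'name': ['Earth', 'Mars'], 'moons': [1, 2]}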

00:53:44 [ST]

Yeah. Let me say something about this. When I was working on that application I mentioned earlier, working with an eight-dimensional array, I felt that after many, many years as an APL programmer, I'd finally reached the big time. Because when I first learned APL and saw the generality of the notation (that you can have as many axes as you want), I sort of expected that in my future as an application programmer, I'd be working with (what was the term I called it?) "noble arrays". [08] Arrays with more than three dimensions. There'd be lots of this in my future, and I needed to get fluent with ways of dealing with arrays of high rank. When I wrote about what I'd been doing with an eight-dimensional array, I got a lot of feedback. It was like: "wow, more than three, huh?" And I think I remember a conversation with Arthur Whitney, which was basically like: "yeah, we can do many, many dimensions, but in the real world, people rarely move beyond two".

00:55:02 [ML]

I think we've pretty much settled on using the term high rank to mean more than one [everyone laughs].

00:55:09 [ST]

Yeah. In fact, I remember Arthur saying to me once, you know: "I'm a really good vector programmer, but I think future generations may throw up people who could do matrix programming in a way that I could barely imagine". Yeah, I was writing about this recently in q. So you can think of vectors as a special case of dictionary in which the index is fixed as the iota. But in the more general form, in a dictionary, you can specify what the indexes are going to be. If you make the values at each index a vector of the same length, or for that matter, dictionaries, you can carry that semantic indexing down as many levels as you want to. I haven't seen any of this done in practice, but the possibilities intrigue me.
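
Stephen's nesting idea in miniature, as Python dicts (the meters and hours are invented): once the index carries meaning, the semantic indexing goes down as many levels as you like:

    # meter -> (hour -> reading): two levels of semantic indexing
    readings = {
        "meter_a": {9: 120.5, 10: 121.0, 11: 121.7},
        "meter_b": {9: 98.2, 10: 98.9, 11: 99.4},
    }
    print(readings["meter_b"][10])   # 98.9, indexed by meaning, not position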

00:56:21 [TB]

Maybe I can jump on. That's really interesting, Stephen. I should mention that, interestingly, that's formally how JavaScript thinks about ordinary lists (not the actual implementations): they are just special cases of records where the keys happen to be consecutive integers. Of course, that's never how it's implemented in the just-in-time compilers, but that's the crazy semantics that [it] depicts, which turn out to be, in a sense, not so crazy, right? That is a reasonable way of looking at it.

00:56:53 [ML]

Well, except for where they use strings for everything, right?

00:56:56 [TB]

Yeah, or at least pretend that strings can be numbers when you add them to actual numbers [everyone laughs].

00:57:03 [ST]

Yeah, that's beautifully twisted because, of course, you can use anything as a key in a K or Q dictionary, but almost always you wind up using a symbol, which is, in fact, just an enumeration. It goes round and round [chuckles].

00:57:18 [TB]

I think it touches on something which I've become pretty convinced of the utility of (that's a convoluted sentence). This kind of perspective of what is a data structure: how do we have a uniform way of thinking about different kinds of data structures, whether it's lists or tuples or multidimensional arrays or records/hash-maps/whatever. There's maybe two observations you could make which are orthogonal. This is the way I like to look at it (this is actually not a fundamental way of splitting that, but it's a start). One is, are the keys (whatever "keys" means), are they consecutive natural numbers or integers or positive integers or whatever, that end somewhere, like <1..n>? Or are they some more complex type, strings or what have you. Then the other axis is, are all the values that come out of your sub-elements, are they all the same type or not? That gives you a little 2x2 grid.

00:58:45 [TB]

If you take the different cases, the top left corner would be: it's integer-indexed and everything's a different type. That's a tuple, because you can have a tuple of int, bool, whatever, in any typed programming language that's worth its salt. Then maybe in the top right (oh gosh, I'm probably not going to get these quadrants right), but in one of the other corners [chuckles], you've got integer-indexed, but all the elements are the same type. Well, that's [a] list/vector/whatever. We're familiar with that.

00:59:12 [TB]

Then in the other corner, it's keyed by something and all the values are the same type. That would be like a C++ hash map. In a dynamically typed language, it's not really a thing (you can put whatever you like, anywhere you like), but typically what we think of is a hash map or associative array.

00:59:32 [TB]

Then in the last corner, the values are all different and they're keyed off some fixed set of things that we know in advance, just like <1..n>: we've got, I don't know, particular symbols that we've chosen. That's a record, or a named tuple in Python, or a struct, or whatever; there are all sorts of different names for that idea. That little grid gives you most of the container-style data types in languages that people use. You can go a bit more exotic, but that's a helpful way to think about these things. But you can also just see them all ... [sentence left incomplete]. Go ahead.
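
[Ed.: One way to lay out Tali's four quadrants in Python; the names here are illustrative, not from the episode.]

    from dataclasses import dataclass

    # Integer-indexed, mixed value types: a tuple.
    row: tuple[int, bool, str] = (3, True, "x")

    # Integer-indexed, one value type: a list/vector.
    xs: list[float] = [1.0, 2.0, 3.0]

    # Arbitrary keys, one value type: a hash map / associative array.
    age: dict[str, int] = {"ann": 34, "bob": 41}

    # Fixed key set known in advance, mixed value types: a record/struct.
    @dataclass
    class Point:
        label: str
        x: float
        y: float

    p = Point("origin", 0.0, 0.0)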

01:00:15 [BT]

Just before you go on, I want to observe that there was a meta moment where you lost track of your grid. Essentially, what you converted to at that point was named indices, because you continued on with the explanation of what you were talking about, and people filled in what that would look like. I was like, I can't believe this is happening. It's amazing.

01:00:39 [ST]

I wondered if you were conflating implementation and concept there, and if it might not be even simpler than you suggest. With an ordered list, what we think of as a vector, where you index things by position, there is no value associated with the index; basically, it's one damn thing after another. All you've got is sequence and order. So if, for example, you were taking readings from a dial, you've got consecutive readings, and the only thing the index tells you is the order in which you took the readings. If you've got timestamps for those readings, then that's two values: here's the value of the meter, and here's the timestamp at the point at which we took it. If you were taking regular readings of that same dial, say hourly readings, then the hours are your indexes, and the values are just atoms. So what I'm suggesting here is that maybe we have only two ways of indexing an array: one is the natural numbers, and the other is some kind of category that we've defined, where "the third" always means something.
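
[Ed.: A minimal Python illustration of Stephen's dial example; the numbers are invented. In the first case only the order means anything; in the second, the key itself carries the meaning.]

    # Readings where only the order matters: position is the whole story.
    readings = [12.1, 12.4, 12.9, 13.0]      # reading 0, reading 1, ...

    # The same dial read on a schedule: the hour *is* the index.
    hourly = {9: 12.1, 10: 12.4, 11: 12.9, 12: 13.0}
    print(hourly[11])   # "the 11 o'clock reading" -- the key means something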

01:02:00 [TB]

Yes, no, I totally agree. And I think that becomes clearer (or maybe less clear, it depends how you look at it) when you adopt the computer science point of view that the natural numbers are just succ, succ, succ of zero, right? Any given number is. . . it's a linked list with nothing in it; all that matters is the length of the linked list. And that's really what you're indexing by: you're just walking this chain until you get to the end, and as you go along, you may be walking the tuple too, and then picking out what you've got when you get to the end. So I totally agree with that. Can I also just make an observation, Bob: your observation was really beautiful. What I got confused by was that I wasn't sticking to my own advocacy of: don't number tables, give names, not only to the axes.
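
[Ed.: A toy Python sketch of the successor view Tali mentions: a natural number as a chain of succs of zero, and indexing as walking a list while peeling the index. Entirely our own illustration.]

    zero = None
    def succ(n): return (n,)     # one more wrapper per successor

    def index(lst, n):
        # Walk the list one step per succ peeled off the index.
        while n is not None:
            lst = lst[1]
            n = n[0]
        return lst[0]

    # the list 'a' -> 'b' -> 'c' as nested (head, tail) pairs
    lst = ("a", ("b", ("c", None)))
    print(index(lst, succ(succ(zero))))   # -> 'c'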

01:02:52 [RP]

You tried to put them in corners of an imaginary grid.

01:02:57 [ML]

Well, I had them flipped horizontally, personally, to start out with.

01:02:59 [RP]

The top left is this and then the top right.

01:03:00 [ML]

Well, I mean, it's extreme dynamic typing, where the type system that you put inside each square depends on the particular index that you use.

01:03:20 [TB]

Yes, yes.

01:03:21 [ML]

So that's a very circular dependency, isn't it?

01:03:23 [TB]

Yeah, that would have been. . .

01:03:24 [ML]

Really, you should have announced each quadrant using the system of that quadrant.

01:03:30 [CH]

It was an outer product, like, or a Cartesian product of two things. So it made sense.

01:03:35 [BT]

That would be traditional array thinking, though. By naming them, by giving them semantic meaning, when you go around them and lose track, it doesn't matter; the meaning is still clear. That's what's creating the difference.

01:03:48 [ML]

Well, I mean, so you've got an array with two elements, and the zero element is numbered indices, and the element whose key is named indices is named indices.

01:04:00 [BT]

Yeah, I don't get that one. You just stepped off my. . .

01:04:03 [ML]

So to figure out which index system you're using, you have to first select an element. And then once you know what the element is, that's the index system you. . .

01:04:14 [TB]

Oh, so it's dependent typing all over again.

01:04:17 [ML]

Well, it's where you're. . . not only does the type depend on the value, but the type system depends on the value. [TB: Ah, I see.]

01:04:24 [CH]

It made sense to me. He had heterogeneity and homogeneity, and then keys and indices, and he just matched them all up. And I think everybody understood. But you were in the. . . you know, we had three people waiting to say things, and you were going to say there's an alternative way or something like that. I'm not sure if you were able to. . . Or were you still going to say something, Marshall?

01:04:43 [ML]

Yeah, I did want to. . . My original comment was going to be that I think there's sort of an extra axis that you kind of picked and chose from, which is whether the keys in question are dynamic or static. So you said, for example, in one of the quadrants there's a record type. The way to characterize a record type is, of course, that the keys are qualitative, they're not numbered, and the values may have different types; but also the keys are statically defined in the program. Not necessarily globally static, but at least declared in one place for the same type. And it's interesting that you kind of chose static versus dynamic in different positions.

01:05:29 [TB]

Well, can I jump on that point? Not in a bad way; that's a very interesting point, and there's a beautiful way of tying this up, I think. You're right, I kind of swept something under the rug when I said a statically defined set of keys. What did I really mean by that? I just meant it's like an enum, a finite enum. I don't care if you want to think of them as symbols, values of any kind; just a finite enum. You can enumerate all the different things that could go there, and those are the different record names. They could even be integers, it doesn't matter. Here's the key thing: what goes wrong if you allowed it to be, whatever, any integer you want, not statically defined? Well, the problem would be that, in theory, you could have an infinite amount of information present; you wouldn't be able to know what was there in practice, right? So really, if you want the record to be compactly represented in memory, where you know ahead of time that the value for this key is going to go here, and so on, that's when you need it to be finite. That's the only thing that's really relevant. If you can enumerate them, then you've got a way of looking things up.
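
[Ed.: A small Python sketch of the point about finite key sets; the Channel enum is our invention. Because the keys are enumerable, each key can be assigned a fixed slot, so nothing about the keys needs to be stored per value.]

    from enum import Enum

    # A finite, statically known key set: the enum plays the role of the
    # record's field names, whatever the keys "are".
    class Channel(Enum):
        RED = 0
        GREEN = 1
        BLUE = 2

    # A compact fixed layout: slot i holds the value for key i.
    pixel = [0.9, 0.4, 0.1]              # one slot per Channel member
    print(pixel[Channel.BLUE.value])     # lookup is just an offset -> 0.1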

01:06:44 [ML]

Yeah, so it has to be finite. But the dynamic system is not that you just give up and say, well, I don't know what the set of keys is, it could be anything, but that the set of keys is stored with the particular value. So that's how dictionaries work in dynamic languages. If you had that, you would put it in the same quadrant, but I wouldn't describe it as a record type, because it doesn't have this fixed set of fields.

01:07:08 [TB]

I suppose, maybe I could just ask, why would you have to store the keys at all?

01:07:15 [ML]

Well, if they're dynamic. For example, if you've got a global set of keys (you can express this in terms of values too), but some things use only a subset of the keys, then at different points in the program you could select all sorts of different subsets. So you have to store something with the value to indicate which set of keys it's using now, if you can't derive it statically.

01:07:39 [TB]

Sure, I agree. There is one final step of collapsing: we can collapse that two-by-two grid into just a vector, a final aggregation. And that aggregation is that the homogeneity/heterogeneity thing is really the question: is it a dependent type or not? Does the type of the value depend on the key? Forget about whether the keys are integers or symbols or whatever. If it depends, then it's heterogeneous; if it doesn't, then it's homogeneous. That's it. In a sense, that's the only real distinction, the distinction that has the majority of the implications for implementation, if you have a good compiler. That's my claim.
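
[Ed.: Tali's collapsed distinction, sketched with Python type annotations (our examples). Homogeneous containers have one value type regardless of key; heterogeneous ones make the value type depend on the key, a dependent type in miniature.]

    from typing import TypedDict

    # Homogeneous: the value type does NOT depend on the key.
    temps: dict[str, float] = {"mon": 3.5, "tue": 4.1}
    xs: list[int] = [1, 2, 3]

    # Heterogeneous: the value type DOES depend on the key.
    pair: tuple[int, str] = (3, "three")   # type at index 0 != type at index 1

    class Config(TypedDict):               # ...and likewise for string keys
        name: str
        retries: int

    cfg: Config = {"name": "demo", "retries": 3}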

01:08:36 [RP]

So I've got one and a half things. The half thing was about the implementation of these in terms of memory, because, again, coming from the land of orthogonal axes: usually this works best on homogeneous arrays; otherwise you have to have pointers to things, right? APL arrays: you've got the shape, and then you've got the data all laid out flat, and you get these weird quirks, like doing some operation along the trailing axis (going back to ordered axes) can be faster because you're grabbing the memory in big old chunks instead of having to compute the strides. Strides are another thing that come from having these fixed-length, orthogonal dimensions, right? But in this world of arrays, or these objects with array-like semantics attached, that we're discussing here, with the named dimensions, but also your dependently typed dimensions, where the length depends on where you are and what axis you're on and stuff: does that type of performance consideration just go out the window? Are you going to have to do all kinds of clever stuff, or just have to really know ahead of time: oh, okay, thank God, we've got a homogeneous array at this point, because now I can actually do the stuff I'm used to? Otherwise, how does performance work in this scenario? Because it sounds like a mess to me.

01:10:06 [TB]

I think that's a really interesting question. I'm not a language implementer, other than, I suppose, the DSLs that I've implemented in Mathematica, so I imagine I'm going to be attracted to exploring this in the future. I have some thoughts. My hope, and maybe this is a well-known thing in the small community of people who've already done this kind of stuff, is that there's a fairly straightforward calculus for what choices you're forced to make to get as good performance as you can for any given case in a dependently typed setup. Probably Simon Peyton Jones has written some beautiful paper that describes exactly what you must do; I've just got to find that paper, right? But for now, here's where my thinking is at. Let's take one or two examples.

One example: you've got a matrix whose complete shape you know at compile time. You don't need to store dimensions for it at all; that's the best possible case. You've just got a pointer to the first entry. That corresponds to not needing any existential or universal types: the shape is hard-coded.

Then let's say you've got a matrix whose number of rows and number of columns you only know at runtime. You've got existential type variables there: all you know at compile time is that for any given instance of this type, there exists a number of rows and a number of columns. So you're going to have to create a separate data structure that stores those values at runtime for the dynamic shape. In a sense, this is what all dynamically typed programming languages do: they do that for everything. Every aspect of a value's type is stored in a little adjacent tag, right?

Then you've got a more complex case, the ragged case. How would you phrase that? Well, you could say: first of all, there exists a number of rows; then, for each row, there exists a column length. It's a slightly more exotic form of the original shape because it's got a little dependency in it; it's one level up in complexity. That requires you to store not only the number of rows, but also another vector holding the column length for each row. And you can imagine more and more exotic versions of this, where you introduce more and more dependencies.

My claim would be that if you just sit down and think about it, there's a straightforward way to see, from a given shape of dependency in an array's dimensions, exactly which auxiliary data structures you're forced to lay out, storing exactly what you need to know at runtime. It might be nothing, or it might be everything. But with this disjunction we have now between statically typed languages and dynamically typed languages, we don't really see that there's a continuum here that's completely flexible: you can pick any point on the continuum, going exactly as far as you need for a given amount of type dependency. Maybe that's a well-known thing, I don't know. But it just feels like it's true, at least for the examples that I've thought of. I'm not sure.
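
[Ed.: A rough Python sketch of the three points on the continuum Tali describes; the class names are ours, and this is an illustration of the idea, not his implementation.]

    # 1. Shape fully known ahead of time: store nothing but the data.
    STATIC_SHAPE = (3, 4)              # baked into the program
    static_data = [0.0] * (3 * 4)      # effectively just a pointer to entry 0

    # 2. Shape known only at run time: store the dims alongside the data
    #    (the existential "there exist nrows, ncols" made concrete).
    class DynMatrix:
        def __init__(self, nrows, ncols):
            self.shape = (nrows, ncols)         # runtime tag
            self.data = [0.0] * (nrows * ncols)

    # 3. Ragged: one more dependency, so one more auxiliary structure --
    #    a per-row length vector ("for each row there exists a column length").
    class Ragged:
        def __init__(self, row_lengths):
            self.row_lengths = list(row_lengths)     # e.g. [2, 5, 3]
            self.data = [0.0] * sum(row_lengths)     # rows packed end to end

        def index(self, i, j):
            return self.data[sum(self.row_lengths[:i]) + j]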

01:13:32 [BT]

I'm wondering whether the performant part of it isn't so much an implementation issue. There will always be what I think of as the hot rod array languages, like the k's that Arthur writes; they're built like AA-fuel dragsters. They are fast, they have a purpose. They may not be as general as some other languages are, but what they do, they do very quickly, and there's a focus to what they're doing. They're perfect examples of purpose-driven implementation. But in this case, the ability to think more semantically about the structure, the advantage you might get there. . . and you know much more than I do about machine learning and these kinds of things. I wonder whether the semantic part might be able to open up some of the information that a transformation or something like that is going through, and allow us to understand, and maybe control a bit better, that whole complex process that right now is just a black box to us, because it's going all over the place and the numbers are dancing and we end up with the result. Is it possible that if we can introduce the information at the point of transformation, it allows us to get a window on what's happening in there? And really quickly, the example I think of: when I was working in video production, we would edit, but not based on timecode, because timecode didn't exist then; it was just a series of pulses. The example I give to people is that it's like standing beside a railway and counting boxcars. Then somebody distracts you and you look away, and you come back to edit. Where are you? Which boxcar? How many cars went by? I don't know. All I'm doing is going one, two, three; I'm just counting pulses, not recording the numbers. When you jump to timecode, you're adding that information. It's like writing the number on the side of each boxcar as it goes by. You can look away, look back: oh no, we're on five. It's really simple. You're introducing that semantic information at the point that you need it. If you take that out, you might be very quick, but you do lose the information. And that might be useful information going into very complicated things like machine learning.

01:15:50 [TB]

I totally agree. There's maybe a great analogy from computer graphics and game programming: there's this thing called, I think, RenderDoc, and it lets you instrument a game. You didn't have to make the game; you just boot up Quake 3 or whatever, attach RenderDoc to it, and it'll show you exactly what every buffer is doing: what its contents are, what draw calls were made to OpenGL or DirectX or whatever, what they did, what all the textures are, exactly what's happening. And it feels like the need for that in array programming, data science and so on is so obvious. It's so clear that not only do we want to see the shapes of things, but we want to be able to poke around in the innards of array programs as they execute, when they fail, or statically at compile time to the extent we can do that. The value of that is, I think, tremendous, especially if you could do it in the context of particular data. Maybe you get a NaN coming out the end, and it's like, well, where did that really start to go wrong? You can't even printf, so you need to rewind time, or you need to put in little instruments, little watches. There's this thing DTrace, on Unix or Linux, where you can hook in and say: every time this happens, I want to know about it. And it doesn't have a performance impact, because it's smart about how it changes the process to do that. So there are all those exciting ideas, and I think the more information you have in the type system to help that process, the better luck you will have at creating a rich diagnostic and debugging experience. So I suppose, in the context of your point, I'm advocating for much more sophisticated type systems for array programming. Let me also apologize: I think I maybe should have tried to get to the rainbow array thing a little earlier, in case, you know, people ...

01:17:40 [CH]

We're cutting that apology out; we do not accept it. I mean, we've had, I think, several conversations like this with folks. The one that comes to mind is Stevan Apter, [09] but I could list off four or five names. These are clearly folks that think very deeply about these topics, and when we have them on, letting them just walk through their intellectual journey over years. . . it's storytelling, but in the form of: why I thought about things, and I hit this problem, and then I solved it. And it's clear that, at least for me personally, and I guarantee if not all the listeners then most of them, hearing that journey. . . I've said this before on different episodes: this is better than a Marvel movie. And that analogy has actually gone downhill, because Marvel movies have gone downhill. But yeah, this is more enjoyable to me than Endgame or whatever the apex of that stuff was. Anyways, back to you. Apology not accepted.

01:18:40 [TB]

Okay, I'll do my best to add a little bit of sizzle and incentive for part two, if there's a part two, and describe a little of what the rainbow array thing is. It's a two-part blog post. The first part tries to recapitulate where, I suppose, we are now with array programming. Not covering everything that people do, and certainly not in reference to any particular framework, but just: what are we doing when we work with multidimensional arrays? I've left out some things, and I apologize; I'm very happy for people to give me feedback about it. But I try to take a very structural, mathematical perspective: what do array programs do?

The argument I make, which is not an observation unique to me (a lot of people have made it), is that arrays are functions. They're just a particular kind of function that you can memoize: they have a finite domain, so you can enumerate the domain and just have a lookup table of exactly what the output of the function is for each possible input. I would argue that's the best way of thinking about ordinary rectangular multidimensional arrays. It's a little more general than how, in practice, people think about arrays, because in practice the indices are always consecutive integers. That's more limited than what I described: you can have any types in the input, as long as the domain is finite. But okay, we limit ourselves to integers, and largely we do that for historical reasons. We knew that the integers meant something else, but we didn't want to keep track of that bookkeeping; we didn't want the programming language to have to deal with it, back in the days of C or whatever. But really, those indices often do mean something else. They may mean a color channel: the 0, 1, and 2 in an RGB array, a width times height times color-channel array, where in the final axis 0, 1, and 2 mean red, green, and blue. Or maybe they mean blue, green, and red, and you don't know which, because that metadata is not part of the type system. It's rare that a raw numeric index really is just what it is; sometimes that's the case, but rarely.

So for me, that's the right way of thinking about arrays: they're just functions from a finite domain to some other type, the values, often integers, reals, what have you. And that function is indexing. Looking up the value of a cell is what applying the function form of the array does; that's all it's doing. And to close the loop a little with some of the discussion about ways of seeing other data structures as functions: they're all functions too. It's just a question of what the domains are, what properties the function has, and whether it's dependently typed or not. But anyway, arrays are functions, and all kinds of cool, mind-bending perspectives come out of that. One is that transpose is argument reordering of the function. That's all transpose is, right?
Maybe that's not too surprising, but if you see a matrix as a function from row and column, i and j, to a value, m[i,j], then swapping the order of the arguments of the function is obviously transposing the matrix. That's it. A slightly less obvious thing is that broadcasting is just having spurious arguments for your function that don't do anything. If I have a function f with two arguments, and I write a function g with three arguments, and all g does is call f with the first two arguments, so the third argument plays no role: that's a broadcast. I broadcast f, which was a two-array, a matrix, into a three-array. So there are very unusual perspectives that come from this. I found broadcasting much easier to understand when you see it from that point of view: in complex cases, it's much easier to just see how the outputs depend on the inputs, put variable names there, and track them back.

So that's what the first blog post does: it tries to set the foundation, to convince you, the reader, that arrays-as-functions is a really nice and clean way of thinking about array programming. It also introduces a diagrammatic notation: little things I call array circuits. They show how arrays are little boxes that transform the inputs (which index, or cell, of the array you're looking up) into the value, the contents of the array at that cell. You draw these little boxes, and then array programming is just building bigger boxes out of little boxes, plugging these boxes together in a circuit, such that the entire circuit is a giant box, right? That's the output of the program: the whole frame contains all this internal circuitry that's piping indices around into other arrays, looking things up, combining them somehow, and so on. It's a very visual way of. . . I'm a very visual person, and it helps me understand better what a lot of familiar operations from array programming look like when you actually draw them as circuitry. I would recommend people look at that, because I learned things from doing that blog post, even though it's not getting into anything particularly novel; the diagrammatic notation was interesting, and enumerating all the different things we do is nice to see. It's not about any particular framework; it's about the abstract ideas.
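
[Ed.: The arrays-as-functions view in a few lines of Python; the array contents are invented. Transpose is argument reordering, and broadcasting is an extra argument that plays no role.]

    # An array is a memoized function with a finite domain.
    m = {(i, j): 10 * i + j for i in range(2) for j in range(3)}
    f = lambda i, j: m[(i, j)]       # "applying" the array is indexing

    # Transpose = reorder the function's arguments. That's all it is.
    f_T = lambda j, i: f(i, j)
    assert f_T(2, 1) == f(1, 2)

    # Broadcast = add a spurious argument that plays no role.
    g = lambda i, j, k: f(i, j)      # f broadcast into a rank-3 array
    assert g(1, 2, 0) == g(1, 2, 99) == f(1, 2)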
Then the second post is more speculative, and it's not finished, so please don't judge it too harshly; I know there are some mistakes there. The second post says: okay, let's switch it up. The first post was about arrays as functions with positional arguments: a matrix is a function with a first argument and a second argument, where the first argument represents the row index, the second argument the column index, and the result is the entry. But that's not the only kind of function we have. Even in Python, we've got functions with keyword arguments, right? Named arguments. Position is just a convention; instead, you explicitly name the arguments, and it doesn't matter what order you pass the values for those arguments, as long as you keep the names right. So in that blog post I explain what happens when you switch from the positional-argument convention for arrays to the keyword-argument, or named-argument, convention: you get named axes. All the rest flows through. And I update the circuitry notation, the diagrammatic notation for array programs, so that instead of using the left-to-right order of the wires, which is how you understand which axis you're talking about, we use color. It's just super natural: you don't have to hunt and find where a symbol is in a diagram; the colors just pop out at you. And what's cool about that is that it makes it really vivid why broadcasting is sort of gone, why it's a solved problem. There's only one way to broadcast; you can't get it wrong if you use this named-axis convention, this new paradigm for array programming, because the colors just have to snap onto each other. That's how it works. And I give some examples that show it being applied to dot products, matrix multiplication, that kind of thing. So it's not really a full story; it's, I think, more of a vivid illustration of why named axes are a cool idea, and the future, personally. It'd be exciting to see how array languages can incorporate that insight. Maybe we have to design new ones, or maybe we can add those features into existing ones in a natural way, I'm not sure. I should maybe mention one thing, which is. . .
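
[Ed.: And the keyword-argument version of the same sketch, our illustration of the named-axis idea just described. Once axes are passed by name, argument order cannot matter, and an extra axis can only attach one way, so broadcasting has a single possible meaning.]

    m = {(i, j): 10 * i + j for i in range(2) for j in range(3)}
    a = lambda *, row, col: m[(row, col)]       # keyword-only "axes"

    assert a(row=1, col=2) == a(col=2, row=1)   # order can't matter any more

    # The extra "batch" axis matches by name, never by position:
    b = lambda *, row, col, batch: a(row=row, col=col)
    assert b(batch=7, row=1, col=2) == a(row=1, col=2)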

01:26:52 [CH]

No, that's cool. Sorry, we've got part two where you can mention it. And what we'll do is put links to both of the articles right at the top of the show notes. And we might also put. . . I mean, this hasn't been mentioned yet, but there was some internal correspondence, and Marshall has had similar ideas and has a gist, I believe, in the form of an article. I believe it's called Iridescence, [10] right, Marshall? And we can. . .

01:27:24 [ML]

Yeah. And actually, I didn't realize how much you were following the function way of thinking when you were developing it. That's also a lot of what I did to develop the ideas in Iridescence, and not only that, but even looking at Python's keyword arguments and functions. So it's interesting how the same paths lead you to the same things.

01:27:44 [TB]

That's awesome. I think it's staring us right in the face. Oh yes, I do actually remember: I really have to say this. A lot of people have made this observation; named tensor arguments are not my thing. I may have thought of it independently from some other people, but a lot of people, Marshall, including you and Adam Paszke and Sasha Rush and the Xarray people (I've linked some of those in the blog post), many people have. This idea is good enough that a lot of people are rediscovering it and discovering it. So I just want to not appear to be taking credit for anything there, right?

01:28:27 [CH]

It's one of these, what do they say, ideas whose time has come? It's happened many times in history that multiple people have the same thoughts. So we will make sure to link not just those articles but also Marshall's. And if the listener is interested, they'll have the opportunity to email us their questions, and then, along with all of my burning questions, which we will get to in part two, the listener can ask questions directly of Tali and potentially Marshall. We'll make sure to leave show notes as well for Xarray and JAX; there are a bunch of links that we've sent around internally, so we'll include those too. But I will throw it over to Bob: if you want to email us either your questions for part two, or just comments that you have about this episode, people can reach us at...

01:29:18 [BT]

Contact@ArrayCast.com. [11] You can email us your comments or your questions and observations. And if we really truly have gone off into the weeds and you haven't followed any of this, maybe listen to it again, he said, with a nasty laugh. But a real quick personal observation: a lot of times, when people come up with ideas that have come up before. . . I spend a lot of time mountaineering, and when you get to the same idea somebody else has, well, you've climbed the same mountain. Good for you. You may not have been the first one, but you got up the same mountain they did. If they were the first ones, well, so what? Everybody who got to the top of that mountain got to the top of the mountain. But the big question is: when you got to the top, what did you look for? What did you see? And what did you bring back down? Quite often a number of people go to the top of the mountain, but then one goes up and says: hey, look over there. And now we've got a whole new land to explore. That's the way I'm thinking about a lot of the things you're talking about: you're not the first one to the top of the mountain, but for whatever reason (and again, I'm limited in what I know), people haven't said: what's that land over there? And maybe there's nothing over there, but the point is, that's what explorers do, and this is why it's exciting. I'm very energized by this episode. It's a lot of fun. And with that, I hand it back to Conor.

01:30:45 [CH]

Yeah, we'll finish this by saying: thank you so much, Tali, for coming on. This has been an absolute blast, as you can tell from the fact that we've gone way over time, but that is becoming more and more the norm these days. One of these days, we're going to come in at under 60 minutes, and we're going to be like: wow, what happened?

01:31:05 [BT]

I'll just point out that on the episode that I hosted, I think we came in at an hour and one minute.

01:31:09 [CH]

All right, well, that's clearly my fault then. I apologize, not to the listener, but to my fellow panelists: we always say we're going to be out of here at a certain time, and I never follow through on that. But yeah, thank you so much for taking the time, and hopefully we'll be able to get you on again in the next couple of weeks, so that our listeners aren't left waiting too long to hear part two. This has been an absolute blast, so thanks for taking the time.

01:31:32 [TB]

Thank you. I look forward to it.

01:31:34 [CH]

Awesome. And with that, we will say, happy array programming.

01:31:44 [All]

Happy array programming!

01:31:46 [Music]