Transcript

Transcript prepared by

Adám Brudzewsky, Bob Therriault, and Sanjay Cherian

Show Notes

00:00:00 [Sven-Bodo Scholz]

I think one of the big achievements of array programming, for me the big motivating thing, is as soon as you have felt this thinking in terms of shapes, things become much clearer and you can express things much more concisely.

00:00:12 [Music]

00:00:23 [Conor Hoekstra]

Welcome to episode 107 of ArrayCast. I'm your host Conor, and today with us we have a special first-time guest. Very excited to get to talking with him in a couple minutes, but first we're going to go around and do brief introductions. We'll start with Bob, then go to Stephen, and finish with Marshall.

00:00:36 [Bob Therriault]

I'm Bob Therriault. I am a J enthusiast, and I'm going to have my mind expanded today, I think.

00:00:43 [Stephen Taylor]

I'm Stephen Taylor. I'm an APL enthusiast, and I think my mind might get a little discombobulated.

00:00:52 [Marshall Lochbaum]

I'm Marshall Lochbaum. I'm very worried now, but anyway, I started array programming in J. I worked at Dyalog for a while, and now I develop BQN.

00:01:01 [CH]

And as mentioned before, my name is Conor, host of ArrayCast, massive fan of all the array languages, and excited to talk about one that has only been mentioned a few times. I know it's been mentioned on the podcast, but I don't think we have dived deep into it at all. And we will get to that right after this introduction, because we have zero announcements today. So we're going to go straight into introducing our guest, who is Sven-Bodo Scholz, [01] a professor at Radboud University. Hopefully I'm pronouncing that correctly. I did watch a YouTube video of an English student there, and it looks like a very beautiful campus. But he's a professor in the Department of Software Science of the Institute of Computing and Information Sciences. But for us, he is the individual behind SAC, that is Single Assignment C, which, like I mentioned, is a, if you are an astute listener with a very good memory, you will know this programming language as a functional array, kind of C-flavored programming language. I think we've mentioned it on a handful of episodes. I won't be able to recall which ones. But Bodo has been on my list of folks that I've been wanting to get on this podcast for many months now. And we're extremely excited to have you here, Bodo. So maybe we'll start off by just throwing it over to you. You can take us back as far as you want to when you started getting into computers or programming languages. And along the way, I assume at some point, you discovered the array language paradigm. And then take us all the way to today, 2025, and the research that you're doing with Single Assignment C and everything related to that. So take us back as far as you want.

00:02:34 [SBS]

Yeah, well, thank you very much for having me in the first place. It's a big honor to speak on this platform and share a little bit of my journey here. Yeah, where should I start? I could potentially start essentially when my brother started building one of the first computers at home, where I thought, "Oh, this is cool stuff." So I have a little bit of a hardware taster there. And then it was clear I have to study computer science. And in computer science, the first thing I was exposed to was lambda calculus. And I thought, "Oh, this is a cool thing." And I had already self-taught the typical stuff that you would know, a little bit of BASIC, a little bit of assembly, that kind of thing. And then I realized, "Oh, look, most of the stuff you can actually describe very nicely." And it was also clear you have all this wonderful concurrency that is built into the semantics. You can evaluate in any order and the result is never going to change. Oh, no more bugs. That sounds cool. And I kind of got hooked into this story. And I thought, "Oh, yeah, we have to have parallel performance. Really, this is what we have to do." And then I started doing this in the context of my diploma thesis, which is kind of a master thesis at the time in Germany. And I was super proud. I had wonderful speed ups on Quicksort, almost linear speed ups up to 16 cores, which at the time is early, what was it? Late 80s. It was amazing that we had a machine like that. And it was a super expensive shared-memory SPARCstation. And I was proud like nothing else, until one of the PhD students in the department came along and said, "Oh, wow, that's good speed ups. That's amazing. What's the absolute runtime?" "Oh, yeah, it's like 10,000 elements or something." And then they said, "Oh, yeah. And how much time do you need?" "Ah, 10 seconds." And they said, "Well, you are aware that if I use a sequential C version, it just takes a fraction of a second, right?" And that was, for me, it was facing the brutal truth. It's like, "Okay, the way we do functional programming sucks. It is great, but we don't get the parallel performance, right? How do we get there? It doesn't even make sense." Well, okay, it became clear the baseline has to be fast. And how do you do a fast baseline in the functional setting? And if you look at functional, the typical functional languages, then you have lots of frills, which are great. You have lazy evaluation, garbage collection, algebraic data types, all these super fancy things, advanced type systems. And when you try to implement them, you figure out, "Ooh, you have to start boxing data. You need garbage collectors. Ooh, you can't predict when what is going to be evaluated. Debugging is a problem. Higher order functions. You have to build closures. Oh, my God. This is way too costly. We can't go there." So this is what kicked off my PhD studies, ultimately. And then I came in touch with Lenore Mullin, her work on the Psi Calculus, which was trying to formalize, essentially, APL. It was called Mathematics of Arrays. And I was reading her PhD thesis, and I thought, "Ooh, this is kind of cool stuff." And also, it gives you the opportunity that you can do real parallelism, data parallelism. What can possibly be difficult about data parallelism? And this is one of the things that kind of haunts me to this day. Very often, if I tell people what I do, they say, "Oh, yeah, data parallelism is simple." Everybody knows how to do a map. What can possibly be inefficient about that?
It turns out it is much harder than one might believe it to be. And so this was kind of the birth of the idea of SAC. So the idea was to say, "Okay, I want to have a functional language, but it should be easy for the people that do data parallelism. So maybe something like C is a good idea." So that gave me the first hurdle, because people say, "Ah, C and functional programming? You've got to be nuts, right? You've got to be out of your mind, right? How can C be functional?" And my claim, which I hold up to this day, is, "Yeah, C is a perfectly functional language, as long as you don't use pointers," which is kind of a cruel statement. But essentially, SAC is just that. It is C, but it doesn't have any pointers. It doesn't have any notion of memory. It just has arrays.

00:07:30 [SBS]

Every data structure is a potentially n-dimensional array, and you can define C functions that take arbitrarily shaped arrays, and that includes arbitrary ranks, as parameters. So that was the basic idea, and we have stuck to that until today. But then the question is, which combinators should we have? And then I discovered the wealth of the world of APL, but I also came across a few different people. So at the time, that was in the '90s, early '90s, I was meeting people like Bob Bernecky. I was meeting David Liebtag. I was meeting the guy from APL 2000, New York, Brown, I think.

00:08:12 [ML]

Jim Brown?

00:08:12 [SBS]

Yes. So I was meeting all these folks, and they told me, "Oh, yeah, you have to have a rank operator, or you have to do this, or you have to do that." They all were absolutely convinced that there's a certain set of combinators that you have to support. And I thought, "Hmm, yeah, they're all convinced of one particular set, but what set should this be?" And also, if I ever want to optimize this, well, then I have to look at, I have to pattern match for phrases. Oh my god, and I have to teach every phrase to my optimizing compiler, because remember, my driving force was performance, right? So essentially, I think the only kind of, no, one of the two ideas that are in my PhD thesis is to say, "Okay, I don't want to have any fixed operator, but I want to have a language where I can define any operator the way I want it." And if I define it by a single, albeit a little bit complex language construct, then all I have to do is, when I have a combination of these, I have to be able to pull them together. I have to be able to fuse them into a single operation. Because one of the key things is, if you materialize intermediate arrays, your performance dives. So I have to be able to fuse operations, and I don't want to teach my compiler, I'm way too, you could say lazy, to implement this. I don't want to teach my compiler all the phrases that I need to recognize. I just want to have essentially a combination of one and the same language construct. And ideally, the transitive hull of these can be expressed by this operation as well. And that was the key of the SAC journey, where we started, and where we could show that you can write something in very much APL style and still get C-like performance out of it, just in the sequential case even. And at the same time, because it is a side-effect-free language, you also get all the benefits of the concurrency that is semantically guaranteed, and that is really nice. So for example, if you think about C and you have OpenMP, then you can make annotations and tell the compiler, "Yeah, it's all good." But I tend to teach these things to my students, and the students, when they get it wrong, they are really happy to have fantastic run times, but unfortunately, the results are often wrong, because they have a race condition, some annotations are wrong, and there you go. So that's why I kind of stuck to this. I had a couple of journeys into different ventures. I looked at stream processing at some point in time, but I always came back to SAC because I'm still intrigued by the design, kind of the simplicity in the design, and the ability to express arbitrary things, to define your own combinators, to combine the combinators without being punished in terms of performance, because there is no guarantee that any intermediate array that you specify is ever going to materialize in memory. So I've become an increasing believer in this, and then the question was, how far can you push the performance, and how far do you have to change the code if you target a different platform? So, originally, we started out doing shared memory multiprocessor systems, and this has been done by Clemens Grelck in the '90s, fantastic work, very good performance. Then in the 2000s, Jing Guo built a GPU backend for us, and then we found out, ooh, some of the optimizations we typically do are not good if we want to do GPU stuff. So actually, our compiler, you tell the compiler what platform you want to generate it for, and then different things happen under the hood.
And the good news is the source code does not change at all. You can use the same source code for sequential, multi-threaded, GPU, cluster, multi-GPU, [02] whatever you want, with not always black belt performance, I would say, but in many cases very close to black belt performance if you do this smartly. And now, one of the things that I'm particularly enthusiastic about, maybe that's the right wording I should choose here, is a discovery that I should have made 20 years ago, but I essentially made last year, and that is a good answer for the question, why do you need rank polymorphic programming facilities? Because traditionally, I said, "Ah, yeah, rank polymorphism is great. You can define these combinators. They are applicable to arrays of any shape, dimensionality, no limits." But then people come back, "Yeah, but all my applications have at most four or five dimensions. So what is it good for?" And I then said, "Yeah, but if you need one more, you can get it automatically." But that was a lame answer, and I had to give it to them. I was pretending that, yeah, I still thought this is a big leap forward. Now, fast forward to last year or the year before.

00:13:42 [SBS]

I had this experience with one of my PhD students, and he came from numerical computing and completely outside the computer science realm, per se, but he comes from mathematics. And he says, "Ah, yeah, let's do matrix multiply." Okay, he did a naive matrix multiply, as you do in array languages, it's a two-liner. Very nice, very close to the mathematical specification, but slow. "No, sure it is slow because everybody knows, yeah, it's a cache problem. So you have to do tiling." And then he came into my office, he said, "Oh, Bodo, this tiling really, forgive my French, sucks in SAC, right? Because you don't really want to write loops and the language is not really made for this." And, "Meh." So, you know, I'm kind of stupidly proud a little bit about the language design. So I thought, okay, I have to give him a good answer on this one. So, hey, conceptually, if you tile, isn't that just, you know, kind of creating a higher dimensional array? So instead of having a two-dimensional matrix or two-dimensional matrices that you multiply with each other, actually, it is a four-dimensional matrix multiplied with another four-dimensional matrix. And then what you do is you have a matrix multiply on the outermost two axes. And instead of multiplying the elements, you do a matrix multiply on the elements, which are two-dimensional arrays. Clear, easy to understand in an array programming setting. Ah, so this is now a recursive function. It is a rank recursive function. You want to have like four or five levels of tiling, no problem. Make it a 10-dimensional matrix. Two 10-dimensional matrices with the matrix multiply. And, you know, he went away and, you know, said, "Oh, yeah, it's great. I get the right results. Oh, yeah, that's good." And then I was afraid what was to come because I thought, "Oh, my God, you know, rank recursive functions. If I think about what kind of code and, oh, no, I'm not sure I want to see that." Monday morning comes, he comes into my office, he says, "Oh, no, oh, no, this is great. It works, but it is so slow." And I thought, "Yeah." I expected that he would say something, "Yeah, I get like two gigaflops because, yeah, well, this is roughly what you get on the naive one." And he says, "Oh, I only get 60% peak. That is like 400 gigaflops on my laptop." I said, "What? Yeah, you know, I think it should be more." I really like this attitude, right? And sure as hell, you know, he figured out what it was. And, you know, there is a problem of the C compiler not being able to deal with the innermost loop. So he created some inline assembly and, you know, we replaced one explicit loop by an explicit definition for a particular size in the code, and then it all works and we get like MKL-like performance. But the specification is actually, you might forgive me for saying so, it's really sexy because it is, you know, I have one specification for n-dimensional tiling. And if you look at the packages that you have in your standard, you know, MKL or OpenBLAS or whatever, they have thousands of lines of code. And why do they need this? Well, because they have, you know, various different blocking scenarios for various different architectures. And depending on the architecture, they use a different blocking. But this all sits in the code. If you have rank polymorphism, as we do in SAC, then it can sit in the data. 
And the compiler, if you tell the compiler, you know, what's the shape that you throw at it, the compiler can go clunkity clunkity clunkity clunk and generate your tiling. And to make matters even better, if you trust your code generator, then actually it's guaranteed to compute the right thing. So one of the things that we have done is we actually have encoded this in Agda and proved that our rank polymorphic version is guaranteed to compute the same thing as the naive one, which essentially means, you know, you can tile any which way you like, and it's guaranteed to do the right thing, assuming that the code generator is working, and you can get excellent performance. So that's what you need rank polymorphism for. Because rank polymorphism allows you to kind of encode in the shape the way you want the data to be traversed. And that is the key thing if you want to get performance. That is the key thing for locality that you can steer. And if you have a fixed layout that your compiler produces, as we do, we do row major, then you can also influence what happens with the parallelism: where the compiler actually decides to generate parallelism and where it doesn't.
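(For readers who want to see the tiling-as-reshaping idea concretely, here is a minimal NumPy sketch; it is not SAC code, and the 8-by-8 matrices and tile size of 4 are arbitrary choices for the illustration.)

# Illustration (NumPy, not SAC): tiling a matrix multiply is just a
# higher-dimensional matrix multiply whose "elements" are blocks.
import numpy as np

n, t = 8, 4                      # matrix size and tile size (arbitrary)
A = np.random.rand(n, n)
B = np.random.rand(n, n)

# Reshape the 2-D operands into 4-D arrays of t-by-t tiles: index order [bi, bj, i, j].
A4 = A.reshape(n // t, t, n // t, t).transpose(0, 2, 1, 3)
B4 = B.reshape(n // t, t, n // t, t).transpose(0, 2, 1, 3)

# Matrix multiply on the outer two axes, where "multiplying elements" means
# multiplying t-by-t blocks: C4[bi, bj] = sum_bk A4[bi, bk] @ B4[bk, bj]
C4 = np.einsum('ikab,kjbc->ijac', A4, B4)

# Undo the tiling and compare with the naive product.
C = C4.transpose(0, 2, 1, 3).reshape(n, n)
print(np.allclose(C, A @ B))     # True

The point is the one made above: the blocking lives entirely in the shape of the data, and the block-level product is just another matrix multiply whose elements happen to be arrays.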

00:19:15 [SBS]

So we also applied this for scan, which is also a very cool solution because you can have a rank polymorphic version of scan, which if you choose the argument to be a vector is purely sequential. But if you choose the argument to be a higher dimensional thing, you get a parallel version. And in fact, if you choose the shape to be a power of two, so to speak, so a two by two by two by two by two by two by two thing, then Blelloch's algorithm falls out for free. You just have it. And you can choose this. And potentially a compiler could choose this. But in essence, you can choose this and play around with this. So that really got me into this. I have a better answer now, right? If somebody is asking me what you need rank polymorphism for, I tell them it allows me to actually express essentially an entire class of algorithms that compute the same thing, but depending on the shape, have a different memory or concurrency behavior. And that is super powerful because I can reason about these programs. I can run mathematical proofs about what they do, whether they're correct, whether they're equivalent, all these things. Now, after we have done this, obviously the next painful thing that arose was we have a, how should I say, our type system was always driven by partial evaluation. So we said, okay, we have to talk about types, right? Because if you want to generate code, you better know something about types. And we want to figure out whenever we are dealing with scalars rather than just arbitrary arrays, because then we don't have to box them. We don't have to dynamically allocate them. And actually we try to get everything into static shapes as much as we possibly can. But our type system does not allow you to express full dependency on shapes. But if you write rank polymorphic or rank recursive algorithms like the ones that I'm talking about, this becomes really painful because it becomes very difficult to understand whether you're dealing correctly with your shapes. Because thinking recursively about shapes is a bit funny. And if you get this wrong, it is rather painful as well, because you get some runtime error and say, oh yeah, but you have an out of bounds selection here because we couldn't statically infer it because you didn't provide enough information about your input. Yeah. So the other step that has happened that has just gone into the version of SAC 2.0 is that we have now provided type patterns. And the type patterns allow you to specify relationships that you expect on arguments of your functions. So if you have a naive matrix multiplier, you just say, okay, it has to be a double m-by-n and a double n-by-k. But we allow arbitrary mixes there. So you can say, if you have a take and you say you have a scalar integer parameter int n, then you can use this n, the parameter name, within a shape pattern as well. So you can describe all kinds of relationships. And in particular, you can also do this for these rank polymorphic things. And we try to partially evaluate these things. And if we can't partially evaluate it, you could decide whether you want to have a runtime check or not. So while you're developing, most likely you want to have these runtime checks. But if you are confident about certain functions, you may just suppress the runtime checks in those functions and get potentially better optimizations for this thing. 
And yeah, that's why I'm very excited now about this, because I think this is also a good opportunity to really make a difference when it comes to, say, implementing deep learning networks, because you can specify what the shape requirements are. And that means you can also more easily debug your programs. Because if you assume that you specify all your domain constraints on every function, you will get the outermost failure. So if you call matmul with incompatible arguments, the compiler will say, "Oh, you called matmul and the type patterns don't match here," rather than something deep down in the call graph saying, "Oh, this is out of bounds now," or something like that. So yeah, it has actually shifted my perspective a little bit.
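(Again as an editorial sketch rather than SAC code: a minimal NumPy version of the rank-recursive scan idea described above. The base case is an ordinary sequential scan; every additional axis exposes another level of independent row scans that a compiler could run in parallel, and the shape of the argument, for example a power-of-two shape like 2-by-2-by-2-by-2, decides how the work decomposes.)

# Illustration (NumPy, not SAC): a rank-recursive inclusive +-scan.
import numpy as np

def scan(a):
    if a.ndim == 1:
        return np.cumsum(a)                 # base case: sequential scan
    inner = np.cumsum(a, axis=-1)           # independent scans of every innermost vector
    totals = inner[..., -1]                 # the total of each innermost vector
    offsets = scan(totals) - totals         # exclusive scan of the totals, one rank lower
    return inner + offsets[..., None]       # shift every vector by what came before it

x = np.arange(1, 17)                        # 16 elements
flat = np.cumsum(x)                         # the purely sequential version
tiled = scan(x.reshape(2, 2, 2, 2)).reshape(16)   # same values, power-of-two shape
print(np.array_equal(flat, tiled))          # True

Reshaping the same 16 values as (4, 4) or (2, 8) changes how the work decomposes without changing the result, which is exactly the point about encoding a whole class of equivalent algorithms in the shape.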

00:24:04 [SBS]

Yeah, and I'm excited about many, many new developments. Also, the arrival of these billions of new accelerators where we can generate code for heterogeneous systems. We just started playing with one of these Intel systems where you have two different kinds of cores in there. Then you need to have a code generator to really make good use of that. And there are many more things I'd be happy to talk about, but I think I really need to give you guys an opportunity to engage here. But there's a plethora of exciting things going on. We are creating new backends for new architectures. We are improving the debugging experience. This type system thing is getting increasingly exciting as well. And we have created a, I'll just call it a Jupyter notebook plug-in, [03] so we can pretend to be a REPL, although we aren't. We just run the compiler in the background and things like that. So I know professors like to talk too much, so I will stop here for now.

00:25:14 [CH]

It's all right. It makes our job very easy. I was just queuing up the questions and I'll do my best not to ask them all at once. And maybe I'll treat it as a stack, so we'll do last in, first out. You mentioned SAC 2.0. When I was looking at the docs just before recording this, I think on this website it said that it was, or at least the docs there showed it for 1.4. So is there an upcoming 2.0 release that is breaking news that we're letting our listeners know about?

00:25:47 [SBS]

Well, if you look at the open source compiler, the compiler already claims to be 2.0. But it is not yet stable enough that we are happy to really shout out about it. I hope that we will have a release party somewhere in autumn. We have most of the features in, but this type stuff is really new. And we also have support for structures now, so you can also have arbitrary nesting of arrays of structures that contain arrays of other structures and substructures. And that has created some new, has unleashed or unearthed, I should say, some new bugs that we have not been aware of. And we're trying to fix those things. So yes, it kind of exists, but we have not really widely advertised it yet. We are actually not very good at advertising ourselves.

00:26:55 [CH]

That's all right. I'll keep my eye out for updates in the fall time space, and maybe we can announce it on a later episode when it's fully ready to be launched.

00:27:06 [SBS]

Very much appreciated.

00:27:07 [CH]

Yeah, absolutely. Bob, you had your hand ready to go.

00:27:10 [BT]

Yeah, Bodo, you mentioned Lenore Mullin and her mathematics of arrays. How closely does SAC follow that? I had a chance to talk to Lenore, oh, probably a couple of years ago. Actually tried to get her on the podcast, but at that point, to be quite honest, I think when she talked to me, she realized I wasn't sophisticated enough really to deal with this stuff. I think we've probably, well, I don't think that's a bad read. I think she probably had a good read on that. But at that point, she sort of declined. I'm not sure she thought we were as serious as we maybe should have been taken at that point. But I think we've got into that zone by talking to people like you, Bob Bernecky, a number of other people that are really in very deep into the same kind of design. But are you still following quite closely to her mathematics of arrays? Because I know that's something that's really, I would guess it's axiomatic to her. It's an axiomatic system of arrays.

00:28:09 [SBS]

Yeah, it's kind of a longish story and not exactly an easy story. So I was very much inspired by her work. I think the psi-function is a very nice formalism. And so in a way, you could say that our key language construct, which still under the hood is what we call the with loop. Although these days I try to avoid that terminology. We are now more about tensor comprehensions, just because we have a new front end. And I think it's much more readable. It's also more expressive and things. But ultimately, internally, this is all mapped to these with loops. And they are very similar to her psi function, at least in intent. The difficulty that I always felt is that whenever I talked to Lee Mullin about these things, she very often immediately fell back to her mathematical notations. And I said, look, I think we need to have a representation. Because if you want to have a compiler, you need to have a representation of what you do. And maybe, maybe something like the with loop is a reasonable representation for this. And I had the impression that she considered my wish to have a representation and talk about a representation as being too mundane, maybe, or not really relevant. And that's where we had a little bit of a problem. Um, and so I kind of see what she's doing. And I think this is still exciting, and it is good work. But I felt a little bit that it would have been nice if this were more closely related to an implementation effort. So I would say there is a close relation, but while being very similar in spirit, the implementation is very different.

00:30:14 [BT]

Yeah, I know, as I said, from talking to her, I think she's a very strong person with very strong opinions about how things are done. I think that's what got her to the point of understanding this stuff and explaining it this way. But I can imagine if you were trying to put something into practice, you'd probably find pretty quickly that it does, it diverges a bit from what her vision is.

00:30:38 [SBS]

Yeah, that is kind of, that's kind of the situation as I read it.

00:30:43 [BT]

Yes, I'll give another try to get her on because she is fascinating and brilliant. But as I said, the first time she only talked to me, she didn't talk to anybody else. So she was probably influenced by my naivety. I don't know. The other thing you were mentioning, you were talking about the rank polymorphism. That's really interesting, because actually, when you're trying to talk to a person who's programming and you say, well, rank polymorphism is used for this. As you said, it was kind of weak sauce when you first talked to them because it's like, well, I wouldn't do it that way. That's a reason. But it's not that they would use it. It's that your compiler uses it. Yes. And that's the step that they don't even see normally. We had Henry Rich on the last episode talking about all the steps that an interpreter does that you wouldn't see when you're actually trying to program. But it's that level. By having rank polymorphism, it allows your compiler now to do things that the programmer doesn't even actually have to be aware of.

00:31:46 [SBS]

Yeah, I think it's also, you know, one of the big things or achievements of array programming, for me the big motivating thing, is as soon as you have felt this thinking in terms of shapes, things become much clearer and you can express things much more concisely. And so I think it feels almost like a barrier that you have to push through. And once you've seen it, you can't unsee it anymore. And it becomes very natural. You think, yeah, sure, and this is exactly like the tiling thing. Once I had understood that, you know, yeah, sure, tiling is just higher dimensional arrays. That's all. And then all of a sudden, the specification becomes like a six-liner, and you have arbitrary tiling and it's all there and you can prove that it's correct. I think that is so, so sexy that, you know, you can't do it any other way. And if you have ever written tiled code by hand, I think, you know, being exposed to this kind of style will give you goosebumps. It's just amazing.

00:32:57 [BT]

It's like finding a seam in something. Suddenly it all just drops away.

00:33:01 [SBS]

Yeah, it is. It is a cool way. I think it's a bit like, you could argue, if you look at object oriented programming, it's a similar thing. You know, once you start thinking about state as being encapsulated and representing a state for a thing, then you have objects and classes. Yeah, it makes sense for a certain kind of thing. And the same holds if you want to massage massive amounts of data. It's a good idea to think in terms of shapes. And it also matches the scientific computation world very well. If you just look at the physics papers, all the mathematics papers, how they are written, you can rewrite this in an array language, you know, one to one. And I think this is really, really powerful. And I hope that this entire buzz about, you know, machine learning will get more and more people to experience this joy.

00:33:57 [CH]

Go ahead, Marshall.

00:33:58 [ML]

First, I will say I did read the scan paper. I didn't see the matrix multiplication, but on scan, I mean, I thought that was a really good presentation of the idea. And I even referenced it in my own notes on implementing scan.

00:34:10 [SBS]

Oh, thank you.

00:34:11 [ML]

The big question I have about this approach generally is when I'm writing like tiling code or things that generally work in chunks, it usually feels like the work is split into one part of figuring out how to chunk everything and about 10 parts of figuring out what to do with the last chunk, which may not be a full chunk. So my question is, you tend to describe these things in terms of, well, you just split the shape into factors and then it all works. What do you do if the shape is not evenly divisible?

00:34:45 [SBS]

Yeah, there is some more research to be done, if you allow me to answer that cheekily. Yeah, that clearly is a problem. And one of the things that you could think of is that you say, okay, I just pad my array, and I can have a padding operator that then brings it to a power of two. And then the question is, how much will our standard optimizations already give you? So most likely the embedding into the zero-padded array will come at almost no cost. Yeah, that is an open problem. And obviously, in an academic publication, I like to sweep that a little bit under the carpet, but that is something that needs looking into.

00:35:34 [ML]

Yeah. I mean, I guess the ideal from my perspective is in SIMD programming, [04] at least like the last dimension you have to pad, you have to work in full vectors because there's no such thing as a partial vector operation. Like you can mask it, but that's just a full vector operation that you don't entirely use. The ideal would be maybe to pad in each dimension. So there is waste, but there's very small waste, but that means you still have to deal with, it's fundamentally not homogeneous because like if you have two dimensions, the last row of your matrix is not as long as the other rows. But the idea would be that by kind of doing that recursively, you get, I mean, I'm not saying that's actually a solution that works with your language or necessarily or compiler, but that's kind of a discussion point on how to approach it.

00:36:24 [SBS]

No, absolutely. I mean, these are two different things, in a way, that you have mentioned. So this padding problem indeed is a problem and requires a good solution. I would have some ad hoc ideas how to approach it, but it requires elaboration and finding out what needs to be done in order to get efficiency there. There's also the problem of figuring out what kind of tiling you really want. Currently, we were just kind of stealing this from MKL and Co. to just say, okay, they use this tiling, we use the same tiling, and we just do it in data and that's it. That's one thing. When it comes to vectorization, that's actually another can of worms. And I had a PhD student in Edinburgh work on this. So the key idea there was to actually transform the shape of a given matrix, and with it the entire algorithm, so that you get a vectorizable vector in the last axis. And the theoretical work was all done. There exists a partial implementation, but it is unfortunately not in our main compiler branch, because there were some missing bits and pieces and then he ran out of time and I couldn't find someone to pick up on this. But that is also a very exciting field of work where you can say, okay, really, I don't want this in a column here, I want to split up this column into two axes. And you can have an array transformation calculus, which then allows you to transform the entire computation accordingly. And that is a very exciting piece of work. Again, where helping hands would be needed to make this happen. Ideally, I would wish that we could derive these kinds of rank polymorphic, rank recursive algorithms from the naive one. That you say, okay, I just use my matrix multiply, and then the compiler goes, "No, this looks like I should be changing this into a rank polymorphic one and then I should reshape this." Because it always boils down to the same thing. Ultimately, you have a reduction and you want to make sure in which order you perform the reduction. And you build on the fact that we pretend that our floating point numbers are proper fields, that all the laws hold. I mean, yeah.

00:39:04 [ML]

Yeah, I've been writing some reduction code in BQN and we do not assume that. And so, yeah, there's a lot of avenues that are closed to us.

00:39:12 [SBS]

Yeah, it is. And the PhD student who did the matmul thing, Thomas, is very much looking into numerical stability. And the more you start looking at it, the more fearful you get. It's actually almost like thinking, yeah, it's amazing that we get anything sensible out of this. Because it's very difficult to predict. It's very difficult to predict which order you should do the summation in, unless you know something about the distribution of your data. But you typically don't.

00:39:43 [ML]

On the one hand, yes. But on the other hand, the sequential ordering is one of the worst.

00:39:48 [SBS]

Yes. Well, but then, you know, which ordering is admissible and which one isn't. And if you say, okay, I assume that the programmer knows it all.

00:39:57 [ML]

And you're pushing a lot of work onto the programmer.

00:40:01 [SBS]

Yes. And this is just shifting the blame, which I think is unreasonable to expect from the average programmer. I mean, if you have a mathematician that is thinking about the stability of the algorithm, but even then, if you have a mathematician that thinks about the stability of an algorithm, then they will try to make it robust, so that actually the order really doesn't matter.

00:40:24 [ML]

Well, and they'll know that this is work that they absolutely don't want to do, because it's very taxing and just really complicated.

00:40:31 [CH]

One of the things I was hoping to get you to talk about more, Bodo, is you mentioned back in the 90s, when you were talking to a bunch of folks, everyone had their own set of combinators or primitives that they said, oh, you've got to, this is the set. And everyone had a different set. And then, correct me if I'm wrong, but it sounded like you were talking about this simple, even though it takes a little bit to learn, language construct, which I think was the with loop slash tensor comprehension. And so I'm hoping, yeah, you could maybe explain for our listeners a little bit about that. And also, as I was looking up the with loop in the docs, I noticed that the keyword, or identifier, fold was right there next to it in the identifiers. And I thought, when I looked into Single Assignment C, I think it was two, three years ago, I was trying to see if there was a way to write a generic reduction where you could just pass a lambda or some binary operation. And at the time I couldn't find one; you have all the sum, prod, min, max, similar to MATLAB and friends. But now when I see this fold, I'm like, did I miss something or did something get added? So two questions at once, I guess.

00:41:40 [SBS]

Yes, you indeed missed something. You can pass an arbitrary binary function there. And then, so all the definitions for sum and prod, they are in our standard library. So essentially what we really have is a map construct and a fold construct, or reduce. And you can fold them into one. So this map construct is the with loop. And the with loop allows you to specify index ranges over which you want to map something. And it allows you to actually specify multiple results that you want to compute at the same time. And mainly that's it. And then you can change the mechanism to perform a reduction. So you can have index ranges, can have multiple index ranges, and say, okay, whatever values I get over these index ranges, fold them all into a single value. And then you can have this fold. And so we call this an operator. So we have a fold operator, we have a genarray operator and a modarray operator, and the latter two, genarray and modarray, are typically just a map operation of sorts. Or as of recently, I'd rather speak about an IMAP, meaning it's an index mapping. So you specify an index range that never materializes in memory. It's just kind of a generator specification. And then you specify what it is you want to do with the values that you compute. And you can either put them into an array or you can fold them using a reduction operation. And that's it. And it turns out, and that's a funny thing, if you define an array A with this operation and you define an array B by using affine selections into A, you can always fold this into a new form of with loop. It's transitively closed. And that gives it the power that, as long as you have affine selections only, you can boil it all together into a single with loop. And that means you get rid of all intermediate arrays that way. And that's where some of the power comes from. So if you, for example, do a matrix multiply, have the result in C, and then you perform something like a stupid increment on all the elements, then the increment will happen while you compute the matrix multiply, because the two will be fused together. And I think there is the opportunity for array languages in general to even outperform highly tuned libraries. If you have libraries, then typically the library calls are like walls of stone. You call matrix multiply, yeah, boy, it's going to be super speedy. But you have to materialize the full result in memory, and you will get it in memory, whether you like it or not, whether you need it or not. Even if later on you only do a take of 10 by 15 over a 10,000 by 10,000 matrix, yeah, sorry, the 10,000 by 10,000 matrix is going to be materialized in memory. And we don't have to do that. We don't have to do that. And so I think there's a lot of opportunity, because we can do kind of arbitrary folding. Now, again, this is a bit of an oversell, you could say, because, well, I smuggled in the affine selection. So if you don't have affine selections, then things get a little bit more iffy, meaning very often we cannot do the fold. And it's not a panacea. It does not always give you the best performance. But for many cases, we have shown it gives you such good performance that unless you really need a black belt programmer that gets the last 10%, you can prototype things very quickly and get very reasonable performance.
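(A hand-written NumPy illustration of the fusion effect described here, not SAC output: the rewrite that a fusing compiler could perform automatically is done manually, and a 1000-by-1000 size stands in for the 10,000-by-10,000 example.)

# Illustration (NumPy, not SAC): why not materializing intermediates pays off.
# If only a 10-by-15 corner of a matrix product is ever used, a fusing compiler
# can restrict the computation to the rows and columns that corner needs,
# instead of building and storing the full n-by-n result.
import numpy as np

n = 1000
A = np.random.rand(n, n)
B = np.random.rand(n, n)

full  = (A @ B)[:10, :15]         # materializes the whole n-by-n product first
fused = A[:10, :] @ B[:, :15]     # computes only what the "take" actually needs
print(np.allclose(full, fused))   # True -- same values, a fraction of the work

The values agree, but the second version only ever touches the 10 rows of A and the 15 columns of B that the take requires.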

00:45:59 [ML]

It is kind of funny, because the comparison you're drawing between the libraries and the SaC compiler is actually kind of the opposite of the comparison that I often draw between C and APL, which is that the advantage of C is the same as you described, where you can fuse all your operations together. The advantage of APL is that you have these handwritten kernels for things like, you know, if somebody does a plus scan or, you know, some sort of transpose, you'll have a handwritten kernel that's designed for that operation specifically. So, of course, the direction I always present this in is, you know, C is always presented as like the fastest thing, but APL can actually be faster than C. And you're going in the other direction of this. And I totally agree with you. This fuse everything model can also be faster than the library model. So, it's kind of interesting that it's the default assumption in a lot of places that, oh, you just can't beat fusing compilation. And in other places, the default is, oh, you just can't beat these libraries.

00:47:03 [SBS]

I beg to differ a little here. I think we may have to look at it from a little bit closer perspective. Yeah, sure. If all you want to do is a certain operation or a certain set of operations, you can tune them to a level that is outrageous. And you get the super performance; this is what I typically refer to as a black belt programmer. If you speak to an HPC person, you know, they don't buy into any of what I'm saying. They say, I have to be in control of the assembly that is being generated; you know, if I can't do this, I'm not interested. Yeah, yeah, sure. The question is, you know, what happens if I don't have a black belt programmer that I can or want to pay for this, and how far do I get? And yes, if your applications always only use, like, you know, I don't know, the 40 or 50 combinators that you may have in your favorite array language, then maybe you can get very good performance, and individual operators most likely will beat anything that we can generate by miles, at least a solid percentage. I agree. And yes, that is true. But the question is, how many combinations of phrases, I think you call this in Dyalog, do you actually recognize? And what happens if somebody is using a phrase that is not recognized by the compiler? Because I know, for example, I was talking... Then you're in trouble. Yeah, you're in trouble. And I have seen this at work. I mean, you know, I know Bob Bernecky quite well, and I spent quite some time with him. And, you know, he did a lot of consultancy in APL. And one of the things that he did, he says, oh, look what they have written here. No, if you write it like this, the phrases will be understood. And then, you know, wow. And this is great. But if you don't have your Bob Bernecky on your belt, then it's a bit harder. And I think this is the same here. And another thing that is, I think, important is that you don't have to rewrite things if you go from one architecture to a new one.

00:49:19 [ML]

Now, that's an advantage of both systems.

00:49:22 [SBS]

Well, yeah, that's an advantage of APL in the same way as it is.

00:49:26 [ML]

Yeah. So, I mean, if I can give a little more organization to it, you're a black belt programmer who's working on your specific problem is not actually either of the things I'm comparing here. I mean, that's, I agree. You know, if you have a programmer that knows the specific hardware, that knows the problem, that's able to tune it, that's just the fastest that you can humanly do. So, the two different possible compromises are to have, you know, a compiler like C or SaC, which uses very general mechanisms, but might not always be able to compile them that well, versus the library or APL model, where you have specific operations that are definitely optimized well, but may not be the ones that you want. Neither of these is the ideal, and you get different trade-offs and different properties with both of them.

00:50:15 [SBS]

Yeah, I completely agree with you. Absolutely right. I subscribe to that. What I have difficulties with is if you say, well, a compiler like SaC or C. You know, when I started out, my idea--

00:50:29 [ML]

Well, yeah, I don't, these are in the same category, but they're not the same thing.

00:50:32 [SBS]

Yes, and it is, you would be surprised. I mean, I was surprised very much. When I started out, I thought, ah, yes, hey, boy, I'm using C, a subset of C, so maybe I can make it a preprocessor, right? I just, you know, strip away the stuff and then give it to C. Then I found out that the C compiler couldn't do much with it. [05] Many obvious optimizations were just not possible. I thought, wow, why isn't the C compiler doing this? Then I got into a European project, which amongst other things had a C compiler company from Amsterdam, and they were called Ace, I think, in the consortium. I was talking to these guys. I said, you know, why doesn't your compiler do this? His answer was, oh, we can't. Why can't you? Well, because we cannot guarantee that this is actually semantically sound, because we don't have a guarantee that, you know, there is no side effect. We don't have a guarantee that someone relies on this pointer to exist. It is really amazing. If you look at what the C compiler should be doing and you would expect it to do, it very often does not do, for good reasons, because, you know, people rely on the semantics of C, whatever that exactly is. One of the beauties of using something like SaC is that we cut away the semantic ambivalence. So in SaC, I can give you, you know, I can give you like two pages of a semantics description, and that's it. That's what the compiler is supposed to do. There is no such thing as, yeah, if you do this, then it's undefined behavior. If you do that, then only this is guaranteed, and this may or may not happen. And there's a good reason that, I think, the C99 standard was already like 300 pages. And that is a problem that makes compiler life hard. I think the approach that C takes, and which is very successful, don't get me wrong, right, is, you know, we give you a super cool macro assembler, and, you know, it maps all down to the hardware very efficiently, and you deal with anything that comes on top of it. Whereas the SaC approach is, okay, we give you a clean language that is completely decoupled from the hardware. And we try to create a relation between the two by generating C code that hopefully does what we have promised you in the semantics. But having this on the high level, we can do optimizations that other compilers can only dream about. So if you give me an expression, like any function F applied to A, X, B, C, and you have the same, syntactically the same thing a little bit further down, I know it's going to compute the same thing. I can throw one computation away. Common sub-expression elimination, et voila. And that is something that the C compiler can never do. And so the constraints are very different. So our pretence to be C is maybe misleading sometimes. People think, oh yeah, they just hand it over to C. It turns out, you know, all the optimizations that you typically have in a C compiler, we had to implement as well. Because we can apply them way more radically. That was the first painful lesson that we had to learn very early on. Who would think that I have to implement something like variable propagation, common sub-expression elimination, loop unrolling? The C compiler should be able to do that as well, right? It turns out, no.
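(A small Python sketch of the common sub-expression elimination point, purely illustrative; the function f and its call counter are made up for the example.)

# Illustration (Python, not C or SAC): why a compiler cannot merge two
# syntactically identical calls unless it knows they are side-effect free.
calls = 0

def f(a, x):            # hypothetical function with a hidden side effect
    global calls
    calls += 1          # observable effect: counts how often it runs
    return a * x

y = f(3, 4) + f(3, 4)   # two syntactically identical calls
print(y, calls)         # 24 2 -- eliminating one call would change 'calls'

# In a side-effect-free language the compiler may rewrite this as
#   t = f(3, 4); y = t + t
# and is guaranteed to preserve the meaning; with possible effects it is not.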

00:54:30 [CH]

Is that the default backend then? If you don't specify the GPU or OpenMP, does it generate C code and then compiles that? That's the default?

00:54:40 [SBS]

Yeah. It's the only way. So we only create C code. All our optimizations happen on a very high level, almost on source level of SAC. So if you download the compiler, install the compiler or run the compiler, you can break it after all the phases and you can kind of read the SAC code that the compiler has in between. And only out of, I don't know, the 60 phases that we have in our compilation, only the last four or five really go down to C level. And we also create CUDA code.

00:55:15 [ML]

Well, because like, why would you ever introduce pointers to something [chuckles]? If you have a representation without them, you're just immediately giving up so much ... [sentence left incomplete].

00:55:23 [SBS]

Well, at a certain point you have to, right?

00:55:27 [ML]

Oh yeah. Yeah. Eventually. I mean, because the CPU works with pointers. But yeah, delay it as long as possible.

00:55:34 [SBS]

Yes. Absolutely. Yeah. We stay functional as long as possible. And all the misery starts when our memory phase kicks in and we say: "oh, now we introduced the notion of memory." And "oh, now we have sharing and pointers and reference counting." And now we cannot do anything to the code anymore, really. Nothing really meaningful.

00:55:56 [CH]

This is an odd question, but was there any consideration of alternative names that were more indicative of the functional array ... [sentence left incomplete]? Because "Single assignment C", the first time I heard of it, I was like: "that's an array language?" Because it sounds like a kind of "C minus minus" (and obviously that's less well-known than C++) but "Single assignment C" is in the group of modified C. So I'm just curious because I know other languages like Futhark that are kind of Haskell-esque (I think they've got a hedgehog for their logo). For those that don't know, the logo for "Single assignment C" is a C with a lambda up in the corner. So the logo is very indicative that there's something functional going on, because there's a lambda there. But anyways, curious if there were alternatives that didn't make the cut.

00:56:48 [SBS]

No, not really. Yeah, it was a kind of, like, C, because you can tune the performance and it stinks performance, kind of. And it needed to be short. I was also very inspired by Sisal at the time. I don't know whether you guys are aware. It was a single assignment and streaming array language. It was a project from LLNL, Lawrence Livermore. That was also, in a way, very similar to SAC, but not rank polymorphic at all. They didn't have any of the array stuff. This just came in because I was a spoiled kid, right? I come from functional programming, so I went: "nice, it's beautiful abstractions that I can just put together." And so obviously, dealing with more down-to-earth fixed dimensional arrays was like: "yeah, no, easy." But they also had this data parallel combinator construct. I have forgotten what you call it; kind of a for-each construct that they had. So yeah, I thought this is a kind of a cool name. Then I thought: "okay, single assignment is indicating that it is functional, but it's also not completely." If it sounds like C, maybe they're inclined to try it, and if it would have been something more esoteric, or let alone Haskell-ish, then people might think: "oh my God, I need to learn about monads and I don't want to learn about monads at all." Yeah, I don't know. It definitely has not paid off in terms of having 10 million users, but yeah, that's what it is. What I still like is this stupid insight that most likely is clear to everyone but me: that C is actually a rather functional language. Really, the amazing thing is you throw away pointers, the notion of memory; you throw it away and it's totally functional. And if you start teaching, for example, smaller kids; you teach them programming. How do you start teaching programming? Oh yeah, you write functions. And then you pass on scalar values. Sure. And then it behaves like math. It's easy to understand. And it's all clear. And as soon as you start talking about: "oh, but now we have to allocate memory," the whole world breaks down. And that's what it is. So yeah, I don't know. If you have a better idea, we can rename. I'm open to anything.

00:59:40 [CH]

[Laughs] I do not. Maybe the ChatGPTs do, but we'll put a pin in that topic. The other question I was going to ask: you mentioned that you were working on functional languages and then you basically had the performance insight from a student mentioning C. And then it sounded like you just stumbled across Lenore Mullin's Mathematics of Arrays work. Up until then, had you been exploring the array languages? Had you heard of APL/J? Or was it just this Mathematics of Arrays? You found yourself reading it, and then that combined with C, you were off in a new direction at that point?

01:00:24 [SBS]

Yeah, it was a gradual thing to happen. So I looked at it and said: "okay, this kind of typical task parallelism [06] that the functional guys were doing at the time, I thought, is way too hard". If the sequential performance is not up to speed, I don't need to talk about speed-ups at all. It's useless. So I have to have a sequential thing that is fast enough. And then the question is, it has to be not task parallelism, because task parallelism is really kind of bad, because it means that you unfold the parallelism only gradually (exponentially) but the organizational structure is horrible. And then you have to fold the parallelism back together. So I thought: "okay, we have to do something like data parallelism." And then I heard of SISAL. They have good performance, and they look rather functional, actually. Hmmm, that is interesting, but it's not really that beautiful, and is also being done already. Then I think my PhD supervisor actually visited a colleague in Berkeley. This colleague was, I think, on the committee of Lee Mullin. That's how her PhD thesis landed on my desk. And he said: "oh, Bodo, I saw this stuff; I thought you are interested in data parallelism and you're not so happy with the beauty of SISAL; maybe this is more beautiful; have a look." And then I got into this, and I started reading up on this. I thought: "oh, this is actually quite interesting; this is very beautiful." And from there, I got pointers to APL. And then I started realizing: "oh, there's a relationship between this." And then I thought: "ooh, if I can do this with-loop thing, maybe I can speed up APL." So in '98 or something, I wrote, like, "Accelerating APL programs with compilation to SAC" or something like that. And that's how I got in touch with Bob. And Bob kind of inundated me with, how should I say, the dogma of APL. Which was good. I was being showered with this over and over again. I thought: "okay, okay, okay." I started getting it. But it took me another, I don't know, a couple of years to really appreciate this approach and understand this thinking in terms of shapes and traversing ranks and these kinds of things. That was very helpful. But no, it did not come through my educational stream or something. It was more or less coincidental through the back door.

01:03:09 [CH]

I was going to say, that's probably one of the most unique stories of the path to array languages and APL. It is just basically having a PhD advisor that was on a trip that bumped into similar work and then brought it back. I would think of that as a little bit backwards. And then later on, you discovered APL.

01:03:29 [SBS]

Yeah, but you see, it was the late '80s. So the Internet was in its infancy. So you would typically have to go to a library to find things, or you have to go to a conference to get hold of the proceedings and things like that. Today it's a different story. You hear something; you just Google it, and then you have TryAPL in your browser. And there you go! But in those days, that was different. So it was interesting.

01:04:02 [CH]

Yeah, I totally wasn't thinking about that fact.

01:04:05 [SBS]

Sorry, I'm a bit of an old bloke [laughter].

01:04:10 [CH]

When we talk to folks, a lot of them say they had a course that was taught in school or something on APL. But yeah, I guess I have to keep in mind that there is a pre-Internet era. [For] most of my life you could just Google stuff. But yeah, if you don't have a course in university, how does one come across APL? You basically have to bump into someone that knows about it. Otherwise, it's not like you're going to read about APL in the newspaper.

01:04:41 [SBS]

I think actually APL is more like something that had grown in the US and was more widely spread there and had made it to the UK. But [in] continental Europe, I'm not so sure. At the university that I was studying [at], there was no one who knew anything about APL. So after I had started talking about this, people started asking me, which I found funny [chuckles]. But well, today I could help out. But even here, there was nothing in the curriculum at the time. Even now, we don't teach APL here. So I think in particular, continental Europe, I don't think it's done so much. I'm not so sure about the UK, whether that is taught very often. I've been working in two UK institutions and both didn't teach any APL. So I don't know. I think it's more of an American thing, maybe.

01:05:39 [BT]

And I guess the other area that I've seen people come through from doing these interviews and the episodes we've done is that they go to work someplace that uses APL, and then they're mentored with somebody else that they sit beside, and that's how they pick it up. And that's not a way to build out your user base really quickly [laughs]. It's one-on-one for a while, and then it builds deep, but it doesn't build broad as much. I think that's changing, but I think a lot of the history of people comes through that way.

01:06:10 [SBS]

Yeah, I think that happens with many languages. I mean, that's the other thing: there are so many languages out there. And so I think it's actually very exciting to see that there [are] multiple different approaches to array languages, like, for example, Futhark or Accelerate. The Haskell guys [are] here from Utrecht as well. And yeah, that's exciting. So we actually submitted a paper that compares Futhark to Accelerate to APL (Dyalog) and SAC. We looked at a couple of applications and looked at the performance that we would get on shared-memory multi-core systems and GPU systems, from the same sources. And we also had DaCe on board, which is also quite an interesting development.

01:07:00 [CH]

Sorry, was that a different language? DaCe?

01:07:03 [SBS]

DaCe, yeah. It's an annotation language developed by, I think, the group of Torsten Hoefler at ETH Zurich. And they come from the high-performance computing thing. And the idea is really just, you annotate your code and then the compiler can parallelize it for you. But it's also based around this idea of shapes and dealing with shapes. But most of these languages are actually not rank polymorphic. Too bad.

01:07:33 [CH]

And is this paper already published or you said you just submitted it?

01:07:35 [SBS]

Well, we submitted it a couple of months ago.

01:07:38 [ML]

The preprint is out, though.

01:07:40 [CH]

Okay. The preprint is out, then we will link it in the show notes. It does sound like an interesting read.

01:07:46 [SBS]

That's true. We put it on arXiv, actually. I was just wondering: "what preprint are you talking about?" But then: "yeah, we put it on arXiv." [laughs]

01:07:55 [ML]

Didn't hallucinate it.

01:07:56 [SBS]

No.

01:07:57 [CH]

Stephen, it looked like you unmuted your mic, so I'm not sure. I could be hallucinating. But if you did, feel free to ask your question.

01:08:05 [ST]

You are not an AI and you are not hallucinating, no. I'm picking up on the discussion of languages a moment ago. On your website, you say that SAC is predominantly suited for application areas such as numerically intensive applications and signal processing. And it's that kind of high-performance ballpark that we've been talking about today. But I wonder, do you yourself use it for anything else? Should I think about making it my go-to language for general hacking?

01:08:38 [SBS]

Absolutely, you should [laughs]. No, obviously not! So as I tried to explain in the beginning, the motivation was to say: "okay, I SACrifice many things that are dear to me, like higher order functions, complicated polymorphic type systems, and things like that, for the sake of being able to get performance." Because performance was kind of a key selling point, in my view, or the point that interests me most; let's put it that way. So, we have a couple of restrictions in the language. Like [not having] higher order functions. Yeah, it's painful, but okay, you can live with it. Polymorphism, yeah, it's not nice [to not have it], but okay, you can live with it. And one of the things that is not so nice is that our arrays have to be homogeneously nested. So, you can nest arrays, but we implicitly consider them higher dimensional. So, if you have a 10x10 array of 10x10 matrices, then it is implicitly a four-dimensional one of shape 10x10x10x10. But this only works if all the elements of the 10x10 outer shell have the same shape. And that is kind of painful. Because if you want to, for example, do some text processing, then you have a bit of a problem. Unless you start padding your values, which I think is a little bit kind of: "yeah, it could be done, but it's not exactly my beauty ideal." And so, the generic form of nesting that, for example, APL offers [isn't supported], and that is a major constraint. Likewise, we do have structs now. So, you can have collections of non-homogeneous items, which, by the way, we compile away completely, so in the backend they don't even exist. But this is only possible if you do not allow them to be recursive. You cannot do something like a linked list. So, I think it is a bit painful. Another thing that we do not really support very nicely is streams. But that is something that I hope to work on over the summer when I'm doing a sabbatical in Japan. And hopefully, we will look into infinite arrays. That's something that I spent some work on a couple of years ago, talking about transfinite arrays. And then, you could have an array that has the shape omega, omega. From a specification point of view, this is super beautiful. And theoretically, you have all the nice properties that you want to have. But implementing it is a bit of a challenge. And implementing it efficiently is an even bigger challenge. And so, we will start looking into this over the summer. I think this is some work that had been ... [sentence left incomplete]. Bob pointed me to a paper by one of the early pioneers. I forgot his name.
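
To make the homogeneous-nesting restriction concrete, here is a minimal sketch in current SAC syntax; it is illustrative only and not code from the episode, and the use of the standard StdIO and Array modules is an assumption about the usual setup.

    use StdIO: all;
    use Array: all;

    int main()
    {
        /* A 10x10 "array of 10x10 matrices": SAC stores this as a single
           homogeneous four-dimensional array of shape [10,10,10,10].     */
        blocks = with {
                   ( . <= iv <= . ) : 1;
                 } : genarray( [10,10,10,10], 0 );

        print( dim( blocks ) );    /* 4, not 2          */
        print( shape( blocks ) );  /* [10,10,10,10]     */

        /* There is no way to give the element at [0,0] of the outer 10x10
           shell a 5x5 shape instead: every inner matrix must have exactly
           the same shape, which is the restriction described above.      */
        return 0;
    }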

01:12:00 [ML]

That was Alan Perlis, maybe?

01:12:02 [SBS]

Who?

01:12:03 [ML]

Alan Perlis [07]

01:12:03 [SBS]

No.

01:12:07 [ML]

I don't think he ever did infinity in more than one direction but he wrote one paper about infinite dimensional arrays.

01:12:15 [SBS]

Yeah, I'm quite enthusiastic about that. In particular, what happens if you cannot decide how long they are, because then the entire memory management [and] the kind of code that you need to generate is very different. And it's the kind of code that you would like to generate, for example, if you run things on an FPGA. Because an FPGA gets its performance by streaming the data through it. So, that means if you have a normal SAC program as it is now, you need to transform it into a streaming variant. So, you have to understand the relationship between finite-size arrays and infinite arrays a bit better. And I think on the theoretical level we have understood this, but on the implementation level it's harder; way harder. So, it's not a go-to language for everything, I guess. I mean, look at the APEX compiler of Bob and you see you can write a compiler in APL if you want to. Or Aaron's GPU code generator; the same story, and it's beautiful, but it is also a little bit unconventional, because the data types that you use are different. Yeah, many of my colleagues also ask, you know: "oh, when are you going to re-implement the compiler in SAC itself?" And, yeah, I don't think any time soon. Now, if somebody else wants to pick up on that, that's great [chuckles]. And I would be happy to share any insights on where not to go. But having more dynamic data types would most likely be desirable. But if I wanted to do that, I would have to spend the rest of my career on that. And I'm actually a bit more selfish, because I'm interested in high-performance code generation for parallel architectures. So, this is what I will focus on. If there's a young colleague who wants to go for this, I'm more than happy.

01:14:18 [BT]

I was going to say, that's one of the advantages of being the explorer: you get to choose where you explore.

01:14:22 [SBS]

To some degree. You also have to find your funding, right? And so, you have to be able to identify where in the spectrum you are and how Big Tech can possibly benefit from what you're doing. And that is not always easy, in particular if you don't do mainstream stuff. That also would be great to do, you know. I'm always trying to find a company that is (how should I say?) bold enough to use our technology for real. Because then we could potentially get a little bit of money to stabilize things more and to do more development, which is not exactly immediately academically publishable. Because the entire maintenance work is mainly done by a couple of colleagues who burn the midnight oil every now and then. And, yeah, we have fun doing that. That is true. But it would be nice if sometimes we did not have to worry about the next round.

01:15:28 [BT]

It's a different level of implementation challenge.

01:15:33 [SBS]

Yeah, some of the code is, you know, open source. If you want to see some very dark corners in the universe, you know, you can dig through our code base. I promise you, there are also interesting easter eggs. Comments like: "yeah, I think this should be done, but I don't have the time to do it just now." [chuckles]. Or like one of my students who came in the other day and said: "oh, Bodo, I looked at git blame and the code was written before I was born!" [laughter all around]

01:16:10 [ML]

Must have been some good code to last so long.

01:16:11 [SBS]

Well, no, it's just that the code kind of does it. And, you know, don't change a running system.

01:16:20 [CH]

I was going to say, speaking of corporations that may or may not want to use SAC, where should we be sending listeners [and] potentially companies that are looking to use SAC? Is it the main landing page? I can't remember.

01:16:38 [SBS]

SAC-home.org. Yes, definitely. I know it would be nice to freshen up the web page. Or contact me directly if there's anything that they are missing, because our main effort really goes into the development and the advancement of the system, not necessarily into advertising it or selling it. I know this is, from a business perspective, a stupid move, but I think most of the contributors are just too curious. They are more curious than interested in making a lot of money with it, because it's a very difficult business. And once you start getting money for this, you also have other obligations. Things become a little bit more serious in certain aspects. I kind of would be intrigued to try that out for a while, if there's an opportunity, but it would have to be an opportunity that does not require me to make an upfront investment of a couple of years with a couple of PhD students, or ex-PhD students, and then put all the eggs into that basket. So it would have to be someone who comes along and says: "okay, I'm interested in this technology; as-is, I can use it as a starting point, but I need some security in terms of stability, and I'm happy to give you some money for that." That would be a starting point from which it would be great to develop this further and make this happen. But without that, I think it is not really feasible. And people are always kind of concerned, and then they say: "oh, yeah, but will you exist in 10 years' time?" Yeah, that's dangerous and difficult and I don't know. So you have to find this one customer that is really keen to use it. And then, well, you know, we have a BSD-like license, so it's not that there's anything hidden from you. I think the risk is reasonable, but it has to give them an added advantage. And so, in a way, I think what could become maybe a kind of unique selling point is if our type patterns are really going to be everywhere, and this is what we are currently pushing for, so that you can actually provide all kinds of guarantees about your domains and the shapes of your arguments. You say: "okay, these two arguments have to be in this relation, and the values have to be in this relation". And you can plug in any function [and] say: "oh, this is a precondition for this, and this is a postcondition for this". And that, I think, would help people that, for example, currently go to NumPy or something like that. [08] It would give them a bit more of a guarantee and more confidence in the correctness of their code. As soon as you start playing with crazy shapes and higher-dimensional shapes, like 10-dimensional arrays, you really want to specify what you expect. And you want to see when you break this kind of envisioned condition: the conformity that you expect between the arguments when you call it, or a certain value being in a certain range. And if you call it wrongly, you want to hear about it and say: "oh, so this function actually requires the second argument to be an odd number, and I have given it an even number; oh, maybe that's not a good idea". And then I think it will greatly improve the trust that you can have in these applications. And it's also interesting research in this context. So one of my ex-PhD students is now a lecturer in Southampton, and he's very much interested in specifying these things in Agda, which is a dependently typed language in which you can prove all kinds of things.
And so he also has a code generator that generates SAC from his Agda specifications. So you can write an Agda program where you have _all_ the dependently typed security in the world, and then just crank out SAC code that, hopefully, gives you the performance you are after. But then you have static guarantees of the properties, which I think would take it to the next level. I also saw recently that there's a paper from St. Andrews, the Idris group, which is also into dependent types, and they try to go a little bit in that direction as well. I think with this hype around machine learning there are quite a few people that actually may want to have this. And so maybe there's an opportunity; we shall see. We shall see.
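
To make the type-pattern idea a little more concrete, here is a rough sketch in the spirit of SAC's type patterns; the exact surface syntax, and the use of the shape variables n, m and k inside the function body, are assumptions for illustration rather than anything quoted from the episode.

    /* The shape variables n, m and k are bound by the patterns in the
       signature: the compiler either proves statically that the inner
       extents of a and b agree and that the result has shape [n,m], or it
       inserts the corresponding runtime checks.                          */
    double[n,m] matmul( double[n,k] a, double[k,m] b )
    {
        return with {
                 ( [0,0] <= iv < [n,m] ) :
                   with {
                     ( [0] <= jv < [k] ) : a[[iv[0], jv[0]]] * b[[jv[0], iv[1]]];
                   } : fold( +, 0.0 );
               } : genarray( [n,m], 0.0 );
    }

Value-level preconditions of the kind mentioned above (an argument that has to be odd, or two values that must stand in a particular relation) belong to the same constraint story; the sketch only shows the shape side.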

01:22:07 [CH]

And I just realized too that not only is the standard library open source on GitHub, but the compiler is also open source on GitLab. So if folks are listening to this and either want to go look around or want to contribute (because obviously I know there's a bunch of PhD students doing work on this), if folks out in the wild are interested in adding some backend or making some contribution, is it open to contributions, even if you're not doing a PhD?

01:22:35 [SBS]

Absolutely. Absolutely. Yeah. Yeah. Now we don't host it on the public GitLab server, because we are kind of a little bit afraid that at a certain point GitLab may say: "okay, if you want to continue using our service, you will have to pay". And so we run a GitLab instance on a service where we pay a couple of Euros every month. So you can subscribe to gitlab.SAC-home.org, and then let me know, because I have to approve this thing. There are so many spam attempts that I typically tell everybody: if you really want this, drop me an email, and then I give you access. You can just fork it off and create your own branch. You can create a merge request. You can raise issues; anything. And we even try to react to issues and fix bugs if they come along, or we also have some discussions. I think the access is open anyway. So you can actually look at the sources straight away and play around with them and scavenge for our Easter eggs.

01:23:50 [CH]

Awesome. Well, we will make sure to leave links to the SAC-home.org website [and] the GitLab. Yeah, I had poked around and I didn't realize that the compiler was just on a different site at gitlab.SAC-home.org. And also to the SACpace GitHub page, which has the standard library and demos and stuff like that.

01:24:11 [SBS]

Yes. And that's just bog standard. The standard library and all that is just on the standard GitHub thing, because we do all the CI/CD stuff on our own GitLab server, because that's our main concern. We typically have like 10 [to] 15 merge requests open, and we run pipelines that run for several hours on, like, 10 different platforms. And if we had to pay for that on the GitHub [or] GitLab servers, then we would very quickly run out of money. So that's why we say: "no, no, no, no, no, we run this on our own service". That's fine. But yeah, the standard library is just on GitHub.

01:25:02 [CH]

Well, I think we will say, thank you so much for spending your time with us. This has been a blast. It's been awesome to learn. I mean, I've always been SAC curious. As I mentioned, two or three years ago I dove into it a little bit, and I think I wrote, I don't know, maybe 100 or 200 lines of SAC code, just many examples. And yeah, it's super awesome to have you on the podcast and learn more about it. We will say that if you have questions for Bodo or for us, or if you've got comments, feel free to reach us at ...

01:25:31 [BT]

contact@arraycast.com. [09] And I started this off by saying I was worried about having my mind expanded, and I certainly have. That's what stretching your mind feels like, I guess [laughs]. There are times of confusion; there are times of recognition. I really enjoyed this. This was really interesting to look at, especially the rank polymorphism. I think that's a great example of how it can be used, and it makes a real strong case for it. And I imagine there may be languages that don't have it that are now wondering about that decision, because it does seem to me to be a very strong thing you can do, when you can break a problem up like that recursively. Anyway, thanks so much as well for coming on. And that's it: contact@arraycast.com if you want to get in touch with us.

01:26:18 [SBS]

Yeah, thank you so much for having me. It's been an absolute pleasure, and I could keep chatting to you guys forever. And the same holds for me: anyone who wants to contact me, feel free to do so. I can easily be found either on the SAC homepage (www.SAC-home.org) or just directly; Google my name and you'll find me. Drop me an email. I'm always happy to have this kind of chat, or any other chat. And I would be keen to have new collaborators contributing to the project. Sorry for the plug. I couldn't resist!

01:26:55 [CH]

[Laughs] That's what we're here for. That's what we're here for. Thanks. Yeah, once again, this has been awesome. And with that, we'll say, Happy Array Programming.

01:27:05 [Everyone]

Happy Array Programming!

01:27:06 [Music]