Transcript

Transcript prepared by Bob Therriault, Adám Brudzewsky, Sanjay Cherian and Igor Kim.
[ ] reference numbers refer to Show Notes

00:00:00 [Adám Brudzewsky]

You should make Bob say happy array programming to add it to the stack.

00:00:03 [Bob Therriault]

Oh, that's true. Can you say happy array programming, Bob?

00:00:07 [Robert Bernecky]

Can I say happy array programming?

00:00:08 [AB]

Yes he can.

[music]

00:00:21 [Conor Hoekstra]

Welcome to another episode of ArrayCast. I'm your host, Conor, and today with me we have three of our panelists and a special guest who we will get to introducing in a couple of minutes. And I think first we're going to go around and do brief introductions. So we'll start with Bob, we'll then go to Marshall, and then we'll go to Adám.

00:00:39 [BT]

I'm Bob Therriault, and I am a J enthusiast and still working on that J Wiki, which is an immense beast of a thing.

00:00:47 [AB]

I'm Adám Brudzewsky and I do APL all the time.

00:00:50 [ML]

I'm Marshall Lochbaum. I've been a J programmer and a Dyalog developer and now I work on BQN.

00:00:55 [CH]

And as mentioned before, my name's Conor. I am a polyglot programmer/research scientist/array language enthusiast and love having these discussions about array languages. So I think before we get to introducing today's guest, we've got four announcements all from Adám, so I'll throw it over to him and then we'll get into our conversation today.

00:01:16 [AB]

Okay, so the first thing is, [01] it's been a long time since we had an APL Show episode, but Richard and I finally got our act together and recorded one, and it's a special one 'cause it's the first one where we have a guest. So head over to APL.show to listen to that. And then there is an upcoming conference, Array 2023. And if you've been listening to this podcast for a while, you might remember somebody named Rodrigo, who has been here occasionally. He, together with Aaron Hsu, wrote a paper called "U-net CNN in APL: Exploring Zero-Framework, Zero-Library Machine Learning". That's been published and you can go and have a look at that. And finally, June the 19th is going to be a little bit of an interesting APL day, because on opposite sides of the globe there are meetups. But don't worry, they don't clash with each other, because of time zones. Time zones are awesome. Hey, so let's see if I get this right. First, there is the Tokyo APL/J/K meetup, which is at 10:30 UTC. And then there is the Northern California APL ACM meetup, where there will be a presentation by one of my colleagues on data input/output, and that is at 17:00 UTC. So by the time the sun goes all the way over there, they will be doing APL too.

00:03:00 [CH]

With all of that out of the way, we will introduce today's special guest, who is Robert Bernecky, [02] who I think more commonly goes by Bob Bernecky, but to disambiguate, we will refer to him as Robert today. And most recently, I think, Robert, you've been working at Snake Island Research, which is a research company, and I know personally that one of the things you've been working on over the years is the APEX compiler, which I'm sure we're gonna talk about today. But going back all the way to, I think, the '70s, maybe even the '60s, you worked at I.P. Sharp Associates, which some of our past guests have worked for as well. And I think you actually worked at IPSA until they were acquired by Reuters, if I'm not mistaken. But that's all I'll say. I'll throw it over to you to introduce yourself and fill in the gaps of anything I've missed, and then we can sort of go from there, 'cause I'm sure there's 10 or 20 different things we can ask you about and talk about today.

00:03:58 [RB]

Well, Robert Bernecky. I am commonly known as Bob, but so is my cousin, who's another Bob Bernecky, and he works in higher-math radar stuff, so he's got technical papers out as well. So that's why I went back to my real name: it makes life easier for some. But as I said, once people get to know me, it's usually just Bernecky, just like my sister in the theater. And the Murphy Brown character is also Bernecky, because Diane English worked with both my sister and me in the theater, and then she stole my name, or our name. But that's okay. I learned programming in high school. I was at a tech high school in Buffalo; it's a technical college-prep type high school. And at the time, this was in, let's say, let's consider 1962, the National Science Foundation gave grants to four high schools across the US to put a computer in the high school. And these were seriously giant brain machines, IBM 1620s, with a massive 20,000 characters of memory. And no, you don't need a hard drive, and you don't need tapes. And the output is either punch cards or a flailing-arm typewriter. And that's how it all started for me. After that, I got a job at Roswell Park Cancer Hospital in Buffalo, and the first thing that I did, with a colleague of mine by the name of Steve Dilley, was to produce a PERT implementation for the hospital people, because they were building a new building, a new research building, a cell and virus building, and PERT had just been invented by people working on Polaris submarines. And so we built one of those, an implementation of that, to help control and monitor the construction of that building. So that was my first program. I was a junior in high school at the time. From there, I went out to Caltech. And at Caltech, they had an introductory programming project for people, and I said, "Well, look, I already know all this stuff, let's do something interesting." So I ended up writing an N-body model of the solar system. Good physics, good stuff for Caltech, and it was a lot of fun. I came back to Buffalo because I didn't have any money, and worked at various jobs there doing computer operating system things. I moved to Toronto and got a horrible job that you can learn about: if you Google "Bernecky Zoo Story" videos, [03] there's one on the Dyalog site, a talk I gave on my early days at I.P. Sharp, where I covered that. When I got to I.P. Sharp, I was hired to do, ostensibly... well, nobody at Sharp ever had job titles; very few people had job titles. There was Ian Sharp and there was everybody else. A very flat management model. I got hired to work on the IBS COBOL compiler that was just trying to get out the door. And in order to create models of things like data conversions, things like, oh, I've got to change these Roman numerals to floating point, cuneiform, or some other COBOL data types, I learned APL using the Sharp APL system, and I was having trouble: this being my first APL program ever, it was running really slow. And so I asked Roger Moore, who I was working for at the time. Roger's one of the people who created APL\360 for IBM. Roger was a very bright man, extremely creative, and I had the fortune to work for him, or rather to work with him; nobody exactly worked for him, he was just aware that we were doing projects together. And at any rate, so I said, "Roger, I got troubles here."
So he took a look at my program for a couple of minutes, and it was always stuck under this dyadic iota thing, the index-of primitive, and he said, "Well, maybe it's doing this stupid, you know, n-squared algorithm." And I said, "Well, yeah, that's what I thought. Now what do we do?" And so he dragged me off to look at the source code, and we looked at it and sure enough that's what it was doing. And he said, "Okay, fix it." And so at that point, I became an APL implementor. And that led to the first technical paper that I ever wrote on high-performance computation, which I gave at the APL Congress in Copenhagen in 1973, and it's been downhill since then. I worked for Sharp for 19 years with, you know, brilliant, brilliant people. I mean, Ian was an incredible man and I can't say enough good things about him. Roger, as I said, was worlds above. But then Ken Iverson came to work for us. And so we worked together for many years on design and implementation issues, mostly design, with Ken. And it went on that way until Reuters came along and discovered how to destroy a large, successful company really quickly.

00:09:51 [AB]

You said you were working on design together with Ken. That meant the language, the core language design?

00:09:57 [RB]

Yeah, we got a number of papers out there, and other stuff has gotten lost over the ages. A lot of ideas that we discarded. And that's part of the adventure: it's like, "Look at this." "Yeah, but it's ugly. We can't do that." Very good.

00:10:13 [ML]

Well, and correct me if I'm wrong, but you invented replicate during this time, [04] right? As an extension to compress. So, yeah, there's an important bit of APL history there.

00:10:25 [CH]

That was literally in the lightning talk that I gave two days ago at a conference called Lambda Days. I showed, like, C++ code and Haskell code for filtering odd numbers, and then I was showing BQN as the third language, but I mean, the replicate from BQN is taken directly from APL. And the point of that lightning talk was like, Haskell's way more beautiful than C++, but then look at the difference in the shape of the solution in an array language, because we don't have a filter function that takes a predicate; we have a binary function that takes a mask and the sequence, which is just, like, food for thought. Anyway, so look at that. How many years or decades ago was that, that it's influencing a talk that I just gave two days ago in 2023?

00:11:21 [AB]

But generally that's compress you're using. Replicate is much rarer to use.

00:11:27 [CH]

Oh yeah, I guess that's true. But I refer to that function as replicate because compress is just a special case of replicate, yeah.

00:11:37 [ML]

Yeah, well, compress is a special case. It's just that nobody realized this was a natural way to look at it for quite a while. And I think that's one of the huge strengths of Booleans in APL actually being zero and one: once you say Booleans are numbers... like, I don't think this would have happened if compress had taken true and false. Once you take zero and one, you start thinking, well, what about other numbers. So it's really cool that APL allows this sort of extension, even if it is not an easy thing to come up with.

00:12:10 [AB]

We had the same thing going on with partitioned enclose at Dyalog. For many, many years, it had been taking a Boolean left argument, with a one indicating where we start a new segment in the data we're partitioning, and then we extended that to say, no, it's not a true/false, whether or not we start a partition here; it's how many segments do we start here.
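
A minimal sketch in C of the extension being described here; it is illustrative only (the function and names are invented, and it glosses over details of Dyalog's actual definition, such as a left argument that is one longer than the right). A count of 0 continues the current partition, a count of k starts k partitions at that position, the first k-1 of them empty, and data before the first start is dropped:

    #include <stdio.h>

    /* counts[i]: how many partitions start at position i of data[]. */
    static void partitioned_enclose(const int *counts, const char *data, int n)
    {
        int open = 0;                        /* are we inside a partition yet?  */
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < counts[i]; k++) {
                if (open) printf(")");       /* close the previous partition    */
                printf("(");                 /* open a new (possibly empty) one */
                open = 1;
            }
            if (open) printf("%c", data[i]); /* data before the first start is dropped */
        }
        if (open) printf(")");
        printf("\n");
    }

    int main(void)
    {
        int boolean_starts[] = {1, 0, 1, 0, 0};  /* the classic Boolean form     */
        int counted_starts[] = {1, 0, 2, 0, 0};  /* extension: two starts at 'c' */
        partitioned_enclose(boolean_starts, "abcde", 5);  /* prints (ab)(cde)    */
        partitioned_enclose(counted_starts, "abcde", 5);  /* prints (ab)()(cde)  */
        return 0;
    }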

00:12:35 [RB]

Yes, you can make those extensions. And one of the benefits of APL is that it does give one the opportunity to think about such extensions. I'm not convinced in general that the flexibility, we'll call it that, of these extensions is a good thing; just because we can doesn't mean we should.

00:12:59 [CH]

So in other words, you're not sure it's a net positive at the end of the day?

00:13:04 [RB]

I'm talking about the partitioned enclose thing.

00:13:07 [ML]

The principle I use, and I know this is not widely accepted, is not that if you can, you should, but that if you can do it that way and you can't possibly do it any other way, you probably should. That's roughly the grounds on which I'm convinced about, of course, replicate, but also the partitioned enclose thing.

00:13:30 [RB]

Yeah, I think I'd put the word elegant in there somewhere. I mean, you can always do it some other way.

00:13:37 [ML]

I think elegance just falls out. I've never seen something that was, that I remember, that was, you know, the only possible extension that wasn't also elegant in its own way.

00:13:47 [AB]

Oh, but today's replicate, I'm not sure if Robert came up with this or not, but it's not just extended to positive numbers, but even to negative numbers, which replace the corresponding element with that many (the absolute value of the given left argument element) prototypical elements instead. So this may be complicated; let's give an example of this. Let's say we have the data A, B, and C, just the letters A, B, and C, and we do a one, two, three replicate on that. Then we get one A, two Bs, and three Cs. Now, there's an extension to replicate, which is that if you type 1, negative 2, 3, replicate ABC, then instead of getting two Bs, you get two prototypical Bs, which would be spaces. So you get A, space, space, C, C, C.

00:14:49 [CH]

What does prototypical mean here?

00:14:50 [AB]

Meaning it's the type of that. So numbers become zero, characters become spaces, and nested things become built up of those.
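
A minimal C sketch of the replicate semantics in Adám's example, illustrative only (names invented): a count of k repeats the element k times, a count of 0 drops it (the compress special case), and a negative count inserts that many prototypical fill elements, here a space for character data:

    #include <stdio.h>
    #include <stdlib.h>

    static void replicate(const int *counts, const char *data, int n, char fill)
    {
        for (int i = 0; i < n; i++) {
            int  k = abs(counts[i]);
            char c = counts[i] < 0 ? fill : data[i];  /* negative: use the prototype */
            for (int j = 0; j < k; j++) putchar(c);
        }
        putchar('\n');
    }

    int main(void)
    {
        int compress_mask[] = {1, 0, 1};
        int counts[]        = {1, 2, 3};
        int with_negative[] = {1, -2, 3};
        replicate(compress_mask, "ABC", 3, ' ');  /* AC      (compress)          */
        replicate(counts,        "ABC", 3, ' ');  /* ABBCCC                      */
        replicate(with_negative, "ABC", 3, ' ');  /* A  CCC  (two fill elements) */
        return 0;
    }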

00:15:02 [ML]

I think the arrays prototype is used instead of the type of that particular element. But as you can see already, there are two different ways to do it, which in my mind is not good.

00:15:13 [AB]

No, but that's why I'm questioning it.

00:15:15 [ML]

And I don't think this extension was made at Sharp. I think it was an APL2 or maybe a NARS thing.

00:15:22 [RB]

I think we should go back and look at the kind of problem that sort of primitive likes to solve. If you go back and look, there's a book that was written, I think around 1962, by an Alberta farm boy, called "A Programming Language", and in there Ken Iverson describes [05] what was basically, I think, a conjunction, because you had a Boolean mask and then two arguments, and basically where the mask was zero it would select corresponding elements from one of the arguments, and where it was one, from the other.

00:16:08 [AB]

That's mesh, right.

00:16:11 [RB]

Oh, there's mesh and mask in there. I think mask is the one I'm describing here, but basically there's two of them. The one requires that all three arguments be the same shape.
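
A minimal C sketch, illustrative only, of the "mask" operation as Robert describes it from "A Programming Language": all three arrays have the same shape, and each result element is selected from one argument where the mask is zero and from the other where it is one:

    #include <stdio.h>

    int main(void)
    {
        char a[]    = "XXXX";
        char b[]    = "yyyy";
        int  mask[] = {0, 1, 1, 0};
        char r[5]   = {0};
        for (int i = 0; i < 4; i++)
            r[i] = mask[i] ? b[i] : a[i];  /* 0 selects from a, 1 from b */
        printf("%s\n", r);                 /* prints XyyX                */
        return 0;
    }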

00:16:21 [ML]

Yeah, that should be mask.

00:16:22 [RB]

Yeah, I'm pretty sure that's mask. And then mesh requires that one argument have as many elements as the number of zeros in the mask, and the other argument match the number of ones. And they're both quite powerful and very simple, and I would say they probably do 90% of what you need. So I ended up at Sharp for 19 years before Reuters and I parted, or Sharp and I parted company, thanks to Reuters deciding they didn't like, well, they didn't like much of anything. But that's enough; that's a tale for another day. One of the things that drove me nuts at Sharp, working mostly on the design of interpreters, was that we were always looking at primitives. There was very little work going on about how we could make this language compete with hand-coded language X, scalar languages. And so very little work on compilation technologies or other technologies that would give us high performance at the application level. Now, Roger Moore and Larry Breed, bless his soul, [06] were both Grace Murray Hopper award winners for their work on APL\360. When I first started at Sharp, Larry came up, we were in Toronto, and I was watching over their shoulders. What they were doing was this: we were running 48K workspaces at Sharp, and that was roomy, that was up from the 32 or 36K that was the default. And they said, well, let's take scalar functions, because they're simple, and write a JIT compiler for them, so it would see A plus B. The interpreter had to fit into this shoehorned, tiny little bit of memory on the machines, 256K, 512K. I mean, we ran a 100-user timesharing system on a 384K machine, and that includes the operating system, the APL interpreter, plus all the user stuff. So space was tight. But what Roger and Larry Breed did was to basically take the interesting parts of an inner loop for scalar functions and compile into the workspace itself a little bit of code that would actually do the implementation of that. So instead of having a function call to fetch a left argument element, a function call to fetch a right argument element, a function call to actually do the A plus B, then a function call to store the result element, and then a function call to do loop closure, they put that all inline and got about a factor of five speedup on reasonable-size arrays, let's say, you know, half a dozen elements or more. And at the end of execution of that A plus B, they would discard that code, because there was no more room in the lifeboat to hang on to that stuff. So that was the first instance that I'm aware of of JIT compilation, at least in the APL world.
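
A rough C sketch of the contrast Robert is describing; it is invented for illustration and is not Sharp APL source. The first loop mimics the interpreter's structure, where each fetch, the add itself, and the store go through separate (in reality indirect) calls, while the second is what the little JIT-compiled fragment for A plus B amounts to once everything is inlined:

    #include <stdio.h>
    #include <stddef.h>

    static double A[4] = {1, 2, 3, 4}, B[4] = {10, 20, 30, 40}, R[4];

    static double fetch_left(size_t i)          { return A[i]; }
    static double fetch_right(size_t i)         { return B[i]; }
    static double apply_add(double x, double y) { return x + y; }
    static void   store(size_t i, double v)     { R[i] = v; }

    /* interpreter style: a call per element fetch, per add, per store */
    static void interp_scalar_loop(size_t n)
    {
        for (size_t i = 0; i < n; i++)
            store(i, apply_add(fetch_left(i), fetch_right(i)));
    }

    /* JIT style: the same A + B with everything inlined in one tight loop */
    static void jit_add(const double *a, const double *b, double *r, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            r[i] = a[i] + b[i];
    }

    int main(void)
    {
        interp_scalar_loop(4);
        jit_add(A, B, R, 4);
        printf("%g %g %g %g\n", R[0], R[1], R[2], R[3]);
        return 0;
    }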

00:20:13 [CH]

I'm trying to think, like, to date, what are the different compilation initiatives? Because I think this has come up on a previous episode. Like, I know Co-dfns, obviously; we mentioned Aaron Hsu [07] earlier in the announcements. He has his GPU parallel compiler. When we talked to Troels Henriksen, he talked about the history of what at one point I think was called APL TAIL.

00:20:39 [ML]

APL tail, typed array intermediate language.

00:20:43 [CH]

Yeah, exactly. And that initially was, like, funded work to try and accelerate Dyalog APL, but it ended up kind of just evolving into a parallel array language called Futhark, which was initially intended to be an IR, but it ended up being the Copenhagen ... [sentence left incomplete]. Yeah, exactly. And then I guess I'm not sure if Single Assignment C counts, 'cause I know that sort of bridges into your work with Apex, I think.

00:21:16 [RB]

A little bit of history there. I said the scalar function compiler was the first evidence that I saw of JIT work, or a JIT compiler, specifically for APL. That technology, I used it, and so did everybody else working in the group. Any time we rewrote or redesigned an APL primitive of any sort, we tended to include appropriate JIT code in there, just because it made it go faster. And particularly once the rank conjunction was in place, that kind of JITting stuff is where you got your factors of 500 or 1000 speedup. So, you know, it was definitely worth it. But we're still talking about this within a single primitive, you know, foo rank 0 1 of omega. If you had two of those, or three of them in a row, in a scalar language you would just naturally inline this stuff, whereas that's unavailable in an interpreter, because in the dark ages we didn't have room to store this stuff. Even in the last mainframe version of Sharp APL, there were architectural limits on workspace size of 16 meg. So it just made things difficult. But let me talk a bit about the APL compiler. Look at my master's thesis, which is at snakeisland.com/ms/pdf. [08]

00:22:53 [CH]

And we will link this for listeners in the description.

00:22:56 [RB]

And one of the things I discuss in there is other work on APL compilers. In terms of what was happening with compilers early on, Clark Wiedmann at UMass had a compiler. Our competition, Scientific Time Sharing, aka STSC, aka APL2000, aka Manugistics, developed a compiler as well for their mainframe product.

00:23:32 [AB]

My father told me, because I grew up with APL Plus, so that's their APL, that even the PC version of it, or at least APL Plus 2, had some compilation features under the hood that you couldn't directly observe, at least. There might have been a way to ask it about it, but it would detect loops and compile them on the second or third run or something like that: see that you're running a tight loop, and it would compile just that piece.

00:24:05 [RB]

I believe that APL Plus 2 would do that.

00:24:07 [AB]

OK.

00:24:07 [RB]

I don't think the Sharp APL-based one did, which was APL Plus back before they moved to APL2; architecturally it was difficult, so I'm not aware of anything they did. They did have a way to compile code and call it directly once they were running under a VM, which I think may have been VS APL or APL2. But these were all glued-on things. Anyway, Clark Wiedmann at UMass had a compiler. Mike Jenkins has worked on several at Queen's. Jim Weigang did a lot of work on the STSC compiler. Timothy Budd had one of the early APL compilers. I think Lenore Mullin did some work on stuff similar to that. Most of her major contributions, I believe, were in figuring out how to do function composition for arrays, and her doctoral dissertation covers that fairly well, I think.

00:25:11 [BT]

And that's mathematics of arrays, isn't it?

00:25:13 [RB]

Yeah, mathematics of arrays, PhD thesis, Syracuse, '88, Lenore M. Restifo Mullin. And Wai-Mee Ching has an APL/370 compiler that dates from the late '80s, early '90s. Tim Budd, an APL compiler for the UNIX timesharing system, in APL Quote Quad, March 1983. And I think those are the main things, and that sort of covers, I think, the first go-round on those. Then, when I left I.P. Sharp and set up Snake Island Research, I decided it was time to go back and look at how to compile. And somewhere along the line, I'd been going to supercomputing conferences, and I met a couple of people who were working on a project called SISAL. [09] Steven Fitzgerald, John Feo, and a few other people, mostly from Lawrence Livermore, and somebody from U Colorado Boulder. And SISAL stole a lot of ideas from APL, and it made some very interesting contributions to functional array language design and implementation. For example, they had functional control structures. So loops were functional, and conditional expressions were functional. And this turns out to both make comprehension of code much simpler, and it also turns out to make compilation of that code much simpler. Similarly, about the same time, there were some people at IBM. Oh, Ron Cytron, there we go. POPL, the ACM Symposium on Principles of Programming Languages. Cytron, Jeanne Ferrante, Barry Rosen, Mark Wegman, and F. Kenneth Zadeck, "An Efficient Method of Computing Static Single Assignment Form". That's January 1989. Now, static single assignment is a brilliant idea. It basically takes everything that you've wanted to do. It's, okay, let's compile some APL, and then you see something like i gets 23, and then two lines later i gets A[5], and then two lines further down it says i gets quote ABCD. Now, under a traditional compiler this goes, oh man, and so you end up having to compute live ranges, and, oh, what's the type of this variable name at this point in time. What static single assignment does is dead simple. It basically renames everything so that any variable appears on the left side of an assignment only once. And if all you've got is a basic block of code, you know, that has no funny control stuff going on, no loops or conditionals or anything, that's trivial to do. It does get tricky when you introduce loops and things like that, but it makes compilation of efficient code trivial. And if you look at compilation papers of the day that deal with things like, oh, the live ranges, you know, when does this variable become alive, when does it become dead, and so on, it's horrible. With static single assignment, if your compiler operates on static single assignment form and you start off by converting the program you want to compile into static single assignment form, everything gets dead easy. For example, value errors can all be detected statically, period, end of story. Better than that, you know exactly when a variable becomes dead and can be deallocated. And those are just two of the simplest aspects of it. So, you know, it's good stuff. And IBM Research, and Cytron and company in particular, are to be credited for brilliant insights. This one just makes life much easier.
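
A tiny C sketch of the renaming idea, using invented names rather than anything from the Cytron et al. paper: the three assignments to i that Robert describes become three distinct single-assignment names, so the type and live range of each one is obvious at compile time:

    #include <stdio.h>

    int main(void)
    {
        double a[6] = {0, 1, 2, 3, 4, 5.5};

        /* The APL-ish source assigns one name three unrelated values:
         *     i <- 23
         *     i <- A[5]
         *     i <- 'abcd'
         * After SSA renaming, each definition gets its own name, assigned
         * exactly once, with a fixed type and an obvious live range.      */
        int         i1 = 23;
        double      i2 = a[5];
        const char *i3 = "abcd";

        printf("%d %g %s\n", i1, i2, i3);
        return 0;
    }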

00:29:08 [ML]

And to give some context to that, I mean this is now used in, I know it's used in GCC and Clang, I figure it's used in pretty much every modern compiler now, so it's widely accepted technology these days.

00:29:19 [BB]

It's used in every modern compiler written since around 1990, and even in old compilers written around 1990.

00:29:24 [BT]

And the price that you're paying for that is just the extra space for creating these new variables, you know, essentially not replacing space but just creating new space for those variables.

00:29:35 [BB]

Well, in fact, you may be creating ... [sentence left incomplete]. I mean, this is all going on at compile time. Static single assignment is strictly a compile-time deal. As I said, you know exactly the range of this variable's existence. You know when it comes into existence, and you know when there are no more references to it. And so, knowing it's dead, any advantage that you might have gotten from playing games with, you know, namespace, etc., is a non-issue, because it's already solved and you don't have to think about it. So it's up to whatever memory manager you're using to figure out how to do that. The memory manager may say, "Oh, looky, I know we deallocated this thing, but I'm just going to reuse this space." That's a runtime issue that has nothing to do with the problem of the source code.

00:30:23 [ML]

And generally when you're working with, like, if your compiler is optimizing or whatever, and it's working with this SSA code, the variable names are not represented as strings; they're represented just as numbers. And what you can do also is just number your variables in the order they appear. So when you define each variable, you don't even have to give its number, you just say, make a new variable, and then everything else refers to them in the order that they're given. So it's pretty compact if you use it as an intermediate representation ... [sentence left incomplete].
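
A small C sketch of the compact representation Marshall is describing, with invented names: each SSA value is just its position in the instruction list, and operands refer to earlier values by number, so no identifier strings are stored at all:

    #include <stdio.h>

    typedef enum { OP_CONST, OP_ADD, OP_MUL } opcode;
    typedef struct { opcode op; int lhs, rhs; } ssa_instr;  /* operands are value numbers */

    int main(void)
    {
        /* 2 * (1 + 3):  v0=3  v1=1  v2=v1+v0  v3=2  v4=v3*v2 */
        const ssa_instr prog[] = {
            {OP_CONST, 3, 0}, {OP_CONST, 1, 0}, {OP_ADD, 1, 0},
            {OP_CONST, 2, 0}, {OP_MUL, 3, 2},
        };
        int v[5];
        for (int i = 0; i < 5; i++)
            switch (prog[i].op) {
            case OP_CONST: v[i] = prog[i].lhs;                     break;
            case OP_ADD:   v[i] = v[prog[i].lhs] + v[prog[i].rhs]; break;
            case OP_MUL:   v[i] = v[prog[i].lhs] * v[prog[i].rhs]; break;
            }
        printf("%d\n", v[4]);  /* prints 8 */
        return 0;
    }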

00:30:54 [BB]

Yeah, nobody stores the identifiers as identifiers in any mode. You'll have them somewhere else if you need them at runtime; there's always a symbol table. I was talking about SISAL and I got sidetracked, sorry. Now, SISAL is an interesting acronym. It stands for Streams and Iterations in a Single Assignment Language, and in the semantics of SISAL, the source code is single assignment. So they realized the benefits of single assignment, and like with SaC, which is Single Assignment C, both languages have single assignment in their name, but neither one of them actually requires that your code, the source code, be written as single assignment. So it's a frill. You know, it's compiler technology that has nothing to do with how you actually write programs. And it shouldn't. You know, that's the kind of stuff that should be buried, just like memory management should be buried, and allocation and deallocation should be buried, and inter-process communication should be buried. None of these things should appear in source code. All they do is cause problems and bugs. Anyway, so I'd run into these guys. I was at some supercomputing conference and I ran into Feo and company, and they were showing off some of the stuff they could do with SISAL, and I said, this is interesting. At that point I was partway through the design and implementation of an APL compiler, but there were two things that happened. One was the static single assignment stuff, which let me get rid of all kinds of bad things in the code in the compiler. And the other was that SISAL was getting very good performance out of a lot of relatively simple optimizations, notably loop fusion. If you want a good example of this, there's a signal processing benchmark in my master's thesis; it's called LogD, or variants on LogD. And what those do is take a signal, digitized microphone data, do a first difference on it to compute the deltas, scale it, clamp high and low values, and do some magic there. And it's really simple, but in APL it runs about as fast as you'd expect when you say, oh okay, well, I'm going to take all these things and do the first difference, and then I'm going to clamp the high and low values and scale them somehow. And if you look at the code that SISAL was generating, it basically took this collection of first difference and then all these scalar functions and fused them together. What that showed, and I think you can see it even in the compiler results in my master's thesis, was that every time you're able to fuse two adjacent primitives, or even non-adjacent primitives, you effectively double the speed compared to an interpreter. Because, and again back to section 2.1, section 2.1 is "Why are APL interpreters slow?", and then Figure 2.1 gives a distribution of APL interpreter CPU time. Now that's for one selected Sharp APL application, not a benchmark, called Blend, which is used for borrowing and lending of securities. That was extremely well designed and extremely well implemented by the APL programmers who wrote it. And as a result, it spends nearly half of its time in the execute phase. But then syntax analysis is 16%, conformability checks, you know, are these two arrays the same shape, is 22%, and memory management is 13%. And I think that's about as good as it tends to get, even in today's interpreters.
Clark Wiedmann did some work at UMass, and he was looking at 13 different benchmarks. And he shows basically similar things. In his examples, syntax analysis is typically around 50% of CPU time. Conformability checks are up close, say 15 to 20%. Memory management is just over 20%. And execute occasionally hops over 20%, but it's usually down around 10%. We did similar things at Sharp, and the results are always the same. So let's say you do all you can to improve the speed of a single primitive. The best you're going to do is maybe a factor of two across the interpreter. Because you can't get rid of, and I'm not talking JIT now, I'm just talking a straight naive compiler, you're not going to get rid of syntax analysis, you're not going to get rid of conformability checks. And it's only rarely that you can do anything about memory management, due to things like saying, "Oh, this is a temp and I can do this next operation in place." But if you combine two primitives, all of a sudden, effectively what you've done is eliminate all those overhead operations, syntax analysis, conformability checks, and memory management, for whatever primitive you're able to eliminate by embedding its elemental or cellular computations in the other primitive. And so you put three primitives together, and you've just made that application three times as fast, and so on. So loop fusion in all its glory is really good stuff. And Sven-Bodo Scholz [10] and Clemens Grelck and a bunch of other notables have a research compiler called SaC, Single Assignment C, and it basically is a functional subset of C, functional in the sense that you can pass array arguments by value to a function and you can get back a bunch of array results by value. And the language is purely functional; you don't have to write it in a functional style, but the first thing it does, of course, is convert it to static single assignment form and preserve that through the compiler. And the most magic thing about SaC is something that I think was created by Bodo and Clemens Grelck, I'm not sure where the brainstorm arose, but they have something called with-loop folding. And with-loop folding is an optimization that is, sort of, think of it as loop fusion on steroids. The with-loop is a SaC construct that lets you describe the creation of an array piecewise. So, for example, if you write A plus B in SaC, look at that, it's A plus B. But you can also do it element-wise, and it'll generate essentially the same code as the primitive would generate. The place where it starts to be magic is something like, think of, oh, I'm going to run a filter on image data, and I'm going to, let's say, eliminate the dust motes in this image. And what you do is you take three-by-three subsets of it and compare each element to its neighbors, and set it to the average of its neighbors or something, and the dust tends to disappear. But what tends to happen on that kind of thing is you have to treat the edges of the image differently than the middle part. You know, the middle part is like, "Oh look, I can just take this and shift this way, this way, this way, this way." But what do you do with the edges, where instead of having, say, nine elements to look at, you've only got six or four?
What SaC does, the SaC with-loop, lets you describe the treatment of the edges, or different pieces, sub-arrays, of the resulting array differently, but they're all subject to the same with-loop folding optimization. And it's pretty spectacular. Take something like a 3D relaxation that you might use for heat distribution in physics. If you write that in naive APL, you probably end up with something like 30 pieces of code that deal with, "Okay, well, here I'm going to do the main part, and then I'm going to extract this part and glue that in." You know, as you're gluing in all these corners and edges and things, it just doesn't look very pleasant. And with Bodo's stuff, if you translated that into SaC, such as what Apex does, it would take those 20-some pieces of code and glue them all together, and you'd end up with one with-loop. Now, it would have all these pieces inside it, but they're glued together in a functional, fully data-parallel manner, and there's only one array allocated to do all this stuff. So it's not like having to go and say, "Okay, well, I'm going to extract the left-hand column from this array and allocate space for it, and then this other one, and then I'm going to allocate the final result and glue these pieces in." It all gets built once, and so you end up with one allocation instead of twenty-some.
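
A much-simplified C sketch of the with-loop idea, invented for illustration and not SaC syntax: the edges and the interior of the result are described as separate pieces, but they all fill a single result array in one pass, with one allocation, instead of computing border pieces separately and gluing them in:

    #include <stdio.h>
    #include <stdlib.h>

    /* 1-D smoothing: interior cells average three neighbours, edge cells two. */
    static double *smooth(const double *x, size_t n)   /* requires n >= 2 */
    {
        double *r = malloc(n * sizeof *r);             /* the one allocation */
        r[0] = (x[0] + x[1]) / 2.0;                    /* left edge          */
        for (size_t i = 1; i + 1 < n; i++)
            r[i] = (x[i-1] + x[i] + x[i+1]) / 3.0;     /* interior           */
        r[n-1] = (x[n-2] + x[n-1]) / 2.0;              /* right edge         */
        return r;
    }

    int main(void)
    {
        double x[] = {0, 10, 0, 10, 0};
        double *r = smooth(x, 5);
        for (int i = 0; i < 5; i++) printf("%g ", r[i]);
        printf("\n");
        free(r);
        return 0;
    }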

00:40:56 [BT]

So that with-loop sort of ends up as a bit of an abstraction. Basically it provides a layer that does all that; you don't have to be concerned with how it does it so much.

00:41:06 [RB]

That's correct. And the magic about that, or one bit of magic about it, is that SaC started off looking suspiciously like APL. And it had primitives in it to do, "Oh, I'm going to do a rotate. I'm going to do take and drop." And then somewhere along the line, I think it was Bodo, but you'd have to talk to him and Clemens and maybe Stephan Herhut about who thought of it, somebody looked at it, and of course what would happen is, they had with-loop folding at this point, but it was, you know, the shiny new kid on the block, and so you'd write this code, you'd take some relatively honking benchmark of some sort of application, and it would just optimize the piss out of it. It would be wonderful. But what you would see is it would go zoom, and then it would say, oh, look, a rotate. And then it would call this library routine to do a rotate, and then [mumbles] execute more highly optimized code. And then it would say, oh, here's a take and a drop. And so it starts to sound suspiciously like an APL interpreter, doesn't it, just without the optimizations. And so what they tried is, they said, okay, well, we've got these take, drop, rotate, all these primitives in the language. Let's just pretend they don't exist. And we'll create a standard library that includes the primitive definitions, you know, take, drop, rotate, all those things, in SaC code using with-loops. And they're not pre-compiled. They're just like a standard library, they're include files, and so all of a sudden you take this code, and its take, drop, rotate, all of those are effectively in source code again, and they're exposed to the compiler, and so it's able to apply optimizations across the board. And there's, you know, no stuff about copying your array off to some library function that probably doesn't work anymore. And the results are impressive.

00:43:27 [BT]

So you've got, at that point, your primitives expressed in Single Assignment C, and then when you do your optimization, it can take that Single Assignment C and optimize and just clump it all together.

00:43:39 [RB]

Yeah. And this is where the tyranny of the implementer is finally destroyed. You know, for decades now, APL programmers have been at the mercy of the language designer and implementer who says, "It's going to work this way. If you don't like it, too bad." I think Adám's comments about the definition of the function, sorry, enclose, partitioned enclose, is a good example of this. It's like, here's the flavor of function we're going to give you, and that's all you're going to get; if you don't like it, too bad, you'll have to do it by hand. Whereas with Single Assignment C, if you don't like it, well, you write your own version: you pick up their version of the standard library, that element of the standard library, and you write your own. I wouldn't recommend giving it the same name, because we all know that way lies madness, but you create your own tiny standard library and you use that instead, and you will get performance identical to the best performing thing, again because it's all exposed to the optimizer. Good stuff. I recommend you take a good look at Single Assignment C. It's got some good stuff in it.

00:44:56 [CH]

So this is… I mean, I've been just absorbing everything you've been saying, and this is all pretty awesome, because, I mean, Marshall will know, I've made this argument many times on this podcast before: I'm surprised that this is not implemented in more of the popular array languages. Like, for instance, I think if you do some simple expression like two times one plus some array, you necessarily end up, if I'm not mistaken, inside of Dyalog, J, and BQN, with a copy, or like an allocation, each time you do those two scalar operations of two times and one plus. And Adám, Marshall, Bob, you can all correct me if I'm incorrect about that. And actually tying into this, I believe it was Oleg who I was talking to at KXCon, who was on the KX core team. Oleg is just an individual that works for KX on the q executable. [11] And he told me to talk to Pierre, who's another person that works on the KX core team. And Oleg said it wasn't his sort of area. I'm not even sure if any work has been done, but Oleg says that Pierre likes the idea of having multiple types in an array language, because right now, in q, it's the same thing: if you do, like, a two times one plus a multidimensional array, you get a copy or an allocation each time you do each of those scalar operations. But if you had some secondary array-like type that wasn't exactly an array, but was some kind of stream, then you'd know you technically didn't need to materialize a new array each time, and you could fuse things together, which is commonly known as stream fusion in other languages and libraries, and you could get these kinds of performance increases. Anyways, ramble, ramble, ramble, monologue over. My question is: so it sounds like SISAL (Streams and Iterations in a Single Assignment Language), Single Assignment C (SaC), and Apex are three technologies that implement this. How come, or maybe there's no answer and it's only opinion after this, but how come this idea hasn't been more widely adopted in, like, the sort of more popular array languages, if you will? Is there, like, a reason for it, or is it just that it hasn't caught on?
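
A small C sketch, invented for illustration, of the two behaviours Conor is contrasting for two times one plus an array: the interpreter-style version materializes a temporary for the one plus before doing the two times, while the fused version does both scalar operations in a single pass with no temporary:

    #include <stdio.h>
    #include <stdlib.h>

    static void interpreter_style(const double *a, double *r, size_t n)
    {
        double *t = malloc(n * sizeof *t);             /* temp for 1 + a */
        for (size_t i = 0; i < n; i++) t[i] = 1 + a[i];
        for (size_t i = 0; i < n; i++) r[i] = 2 * t[i];
        free(t);
    }

    static void fused(const double *a, double *r, size_t n)
    {
        for (size_t i = 0; i < n; i++)                 /* one pass, no temp */
            r[i] = 2 * (1 + a[i]);
    }

    int main(void)
    {
        double a[] = {1, 2, 3}, r1[3], r2[3];
        interpreter_style(a, r1, 3);
        fused(a, r2, 3);
        printf("%g %g %g / %g %g %g\n", r1[0], r1[1], r1[2], r2[0], r2[1], r2[2]);
        return 0;
    }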

00:47:39 [RB]

It's all about fashion. It's just not fashionable. You know, like in the last year, AI has become fashionable. It hasn't changed a whole lot in that time. There have been some, you know, point-your-finger benchmarks and things like that. And I took a course from Geoff Hinton, [12] who is one of our more recent Turing Award winners, at U of T, on neural networks. That would have been the '90s, because I was working on my master's thesis then. So that stuff, the fundamental ideas, you know, were there then. Certainly a lot of it is Geoff Hinton's work. But the idea is the same. And so here we are, and it only took about 30 years to get here. So, you know, these overnight sensations are overnight sensations if you wait enough nights.

00:48:37 [CH]

I mean, in the, like, AI summer's defense, if you will, at the time that the AI papers were being published, the compute necessary to actually, you know, run neural networks and convolutional neural networks and stuff didn't really exist. Then in the late '90s and early 2000s, the compute was being developed by companies like Nvidia, who I happen to work for (not back then; I was in elementary school back then). So that did contribute a little bit, whereas I'm not sure if there's a reason here, or it's just that this wasn't the way things were done back in the day when array languages were first being implemented.

00:49:32 [ML]

Well, you can't just run the APL code directly through one of these compilers, right? I guess APL TAIL is the closest to doing that, but they've all got restrictions, including Co-dfns. And that's an ahead-of-time... well, Co-dfns is a fairly different model, but I mean, I think a lot of the problem is that you have to write things like maybe type declarations, maybe shape declarations, and so on, for the compiler to handle it, and in my view the performance of the language is not important enough for most people to justify doing that. They say, "Well, I'll use the easiest APL, the one that works most nicely, and I won't worry about performance that other approaches might have."

00:50:23 [RB]

First of all, back to one of Bob's, sorry, Conor's comments. You said two times one plus some rank-3 thing, and the implication I got, if I understand correctly what you're talking about with KX, is that that appears in the source code, and if so, whatever magic you need, it doesn't belong there.

00:50:48 [CH]

No, sorry, I didn't mean to imply anything about, like, KX's implementation. I just think k, sorry, q, fits into the same group as J, Dyalog APL, and BQN, in that, like, two allocations happen in the background, whereas you could technically have a compiler, or even, like, a sophisticated enough interpreter, that could fuse those by just recognizing that you have basically two scalar operations together.

00:51:18 [RB]

You can go back to APL\360 (which predates my joining Sharp). [13] With something like your "two times one plus" a rank-3 thingy, that would all ... [sentence left incomplete]. Assuming that rank-3 thing was of appropriate type, like integer or better (if it was Boolean, we're in trouble). But if it was integer and had enough room to store the result, it would just do it in place, unless you'd given the rank-3 thing a name, in which case it couldn't do it in place. But once you do the one plus, there's an unnamed temp, and then the two times is going to do it in place. So, you know, existing APL interpreters are fairly good at reusing temps when possible. But all these things (the idea of saying, "oh, here's a temp; I'll do it in place instead of allocating space for the result", and so on), that takes time, not in the inner loop, but at the interpreter-level loop, when it's going to say: "oh, here I'm going to execute this plus primitive. Oh, is it working on a temp? Yes/no." That decision, in any naive interpreter, has to be made on every single plus operation you do. So a lot of these things that people say, "oh, we can just do this, we'll just do this", have the effect of slowing down every single operation in every single application you run. It's not as simple as just saying that. Any APL interpreter I've had my feet in (APL\360 didn't do this), I made sure used reference counting throughout. And that means that things like "can I do this in place?" become simple, because you don't have to say, "oh, is this a named doo-dah?". You don't care whether it's named or not; if there's only one reference to it, you can do it in place, because that means it's not named. But what you end up having to do is reference count maintenance. So when you call a primitive, you have to increment the reference counts on its left and right arguments, and when you allocate the result, you've got to set its reference count, and when you depart the function, you've got to decrement the reference counts on what were the arguments to those things, and when any of those reference counts go to zero, you can deallocate that. I think Jay Foad did some very nice work on his compiler for Dyalog APL. What he's done is, he doesn't eliminate any of those things, but what he did do is eliminate the syntax analysis overhead. In this "2 times 1 plus rank-3 thing", you would find in the generated code calls to a generic plus routine and a generic times, so there's not gonna be loop fusion happening there. But back to the reference count thing, one of the guys who worked on the site project did some very nice work on implementing reference counts. What he did is: here's a basic block, that is, a piece of code that doesn't have any branches in or out of it, or other control structure stuff. And he would look at the code and would say, "oh, increment the reference counts on this; decrement the reference counts on this". And then if you saw something like "increment the reference count, decrement the reference count, increment ..." within a basic block, you know that nobody's going to be entering or leaving this block, so you can statically look at it and optimize the reference counts. You look at the net difference in reference counts between the time you enter the block and the time you leave it, and you know this thing disappears (or this reference count goes to 0).
[Or] these ones are the same because it's something where we incremented, decremented, incremented decremented and so on, and he just eliminates those reference counts entirely, and so that speeds things up. It's not going to make much difference in the interpreter, but even in Jay's compiler it could help if it was able to inline primitives. I have some ideas for doing that, that I'm going to mention. Well, they're pretty obvious ideas.
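
A minimal C sketch (all names invented, nothing from any real interpreter) of the in-place point Robert makes: with reference counting, a primitive can update its argument in place exactly when the count is 1, because that means it is an unnamed temp; otherwise it must allocate and copy:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct { int refs; size_t n; double *data; } array;

    static array *mk(const double *src, size_t n)
    {
        array *a = malloc(sizeof *a);
        a->refs = 1;
        a->n = n;
        a->data = malloc(n * sizeof *a->data);
        memcpy(a->data, src, n * sizeof *a->data);
        return a;
    }

    /* "two times": reuse the argument in place iff nothing else refers to it */
    static array *times2(array *a)
    {
        array *r = (a->refs == 1) ? a : mk(a->data, a->n);
        for (size_t i = 0; i < r->n; i++) r->data[i] *= 2;
        return r;
    }

    int main(void)
    {
        double src[] = {1, 2, 3};
        array *temp  = mk(src, 3);   /* refs == 1: updated in place      */
        array *named = mk(src, 3);
        named->refs  = 2;            /* pretend a name also points at it */
        printf("%d %d\n", times2(temp) == temp, times2(named) == named);  /* 1 0 */
        return 0;
    }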

00:56:23 [ML]

Yeah, I mean, the tough thing with that is that the way it's written in a typical APL interpreter like Dyalog is that you have this one big function that does plus or does sort or whatever, and it's got all the reference work built in, so you have to somehow split out all of the different functions to do that. And you want to keep the put-together functions operating at the same speed too, so that gets kind of tough.

00:56:50 [RB]

Yeah, I would agree with that. So what you do is really simple. You go back and refactor the interpreter. In doing so, I expect you would find one advantage of refactoring the interpreter is that a fair number of very long-standing bugs would emerge, because, ok, here's this thing that does all scalar functions or something, or does some integer scalar function maybe, [or] does the conformance checking. And so you rip out the conformance checks and make those macros or function calls (it really doesn't matter what it looks like under the covers). What you do is you make those table-driven, so you can get commonality and a standard way of doing it across the interpreter, rather than: "oh, it's Tuesday; I'm implementing transpose; I'll do my conformance checks this way". That way lies madness. Similarly, with the reference counting, the way to do that is to rip it out and make it table-driven.

00:58:05 [ML]

But I mean, this is gonna introduce some overhead, right? When you're looking up ... [sentence left incomplete]?

00:58:08 [RB]

[Emphatically] No! no! That's the whole point; [it] is you get rid of overhead.

00:58:12 [ML]

Well, how? Wait. Why is it cheaper to look into a table to see what you're doing with reference counts than to just write ... [sentence left incomplete]

00:58:19 [RB]

Because the table may generate macros.

00:58:21 [ML]

Well, we have macros.

00:58:23 [RB]

The point is to make these common things you must do at each primitive ... you pull those out and you make them the same across the interpreter. I say the same, [but] I mean the same semantics. What it looks like underneath doesn't matter.

00:58:42 [ML]

Yes, there are utility functions for this that are usually gonna get inlined.

00:58:45 [RB]

They may not be functions. It doesn't matter. The point is you want to expose them to simple optimizations within the interpreter. So for Jay Foad's example, because we're compiling stuff already, we're just looking at things a little closer. By separating out the cell computations (the elemental computations that produce cells as the results), we can deal with those separately from the issue of reference count maintenance, conformability checks, and so on. And so, in this 2 times 1 plus rank-3 example, those conformability checks would be lifted, but that doesn't matter. The key is that (what's going to happen inside is that) the cellular code that implements those primitive computations of plus and times can be inlined in a JIT compiler. They can be inlined, and then you basically have about 90% of the benefit of a full all-singing, all-dancing compiler, but with the ability to run it in a JIT environment. But it does require that somebody sit down and take their lumps, by looking at every single primitive and mapping them into a common form.

01:00:09 [BT]

And that consistency is what gives you the power to then optimize, because you're dealing with the same situation with each different primitive.

01:00:17 [RB]

That's great.

01:00:19 [ML]

Well, I'm sorry but Dyalog APL is not a very consistent language [chuckles].

01:00:24 [BT]

No, I think that's what Robert is saying: that it isn't. It would be improved if it was more consistent.

01:00:31 [ML]

Yes, it it absolutely would. Everyone at Dyalog agrees with this, I believe. Nonetheless, various applications that are transacting billions of dollars and running people's medical information would break and that would be pretty bad. So it will remain inconsistent.

01:00:51 [BT]

Yeah, that actually makes me wonder about some of the reasons that there hasn't been a lot of work with compilation as you end up with legacy systems that people are relying on that are essentially interpreted. Maybe with some JIT and things sprinkled through it.

01:01:06 [ML]

Yeah. Well in APL and J, not having a context free grammar ... [sentence left incomplete]. Like J's compiler is a bytecode compiler. [14] A bytecode compiler on APL is very tough. One on K or BQN (where we have a context free grammar) we just take for granted. That's always what we do: whenever we see a program, immediately compile it to bytecode. We don't have to worry about syntax.

01:01:32 [RB]

I see Jay Foad's compiler effectively does that for all practical purposes.

01:01:36 [ML]

Yeah, yeah, that's the point. But it's much harder when you're working with a syntax that fundamentally ... [sentence left incomplete]. Like, if you have "a b c" written, the meaning of that depends on what the type of "a" is when you get to it at this particular time, what the type of "b" is, and what the type of "c" is, and so on.

01:01:55 [RB]

And that's why I was talking about static single assignment and basic blocks. And that's why I think Jay's stuff already handles all that, right? I believe it handles it. If it doesn't, it should.

01:02:15 [AB]

But I think it just protests if it cannot figure out what you mean or what things will always mean [and] it says it can't compile it.

01:02:22 [RB]

Which is fine.

01:02:25 [ML]

If you've got this split-up form where you're doing conformability checks and reference counts outside of the functions ... I mean, that's great if you're JIT compiling everything, but if you're not JIT compiling, how do you get the primitives to run at the same speed? Do you just always JIT compile, or what?

01:02:42 [RB]

No, you could. I mean, people have done it. This is a little gedanken experiment here. Let's look at one primitive and say: "ok, here's how this primitive does its reference counting". And we like the approach it's taken. So I write an increment-reference-count macro, and I write a decrement-reference-count macro. And the semantics, since it's a macro, we can change anytime we like. And then you put that in the interpreter, and with any luck, it will generate the same code as it does now, and then all you do is you look at every other primitive on the system and see if they use the same pattern of increment or decrement reference counts. This is, you know, not fun. It's just, there's a lot of shoveling going on.
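
A tiny C sketch of the factoring Robert is proposing (invented names, not Dyalog source): the reference-count bookkeeping every primitive does is pulled into one increment macro and one decrement macro, so the interpreter keeps generating the code it generates today, while a JIT compiler can recognize the same pattern and emit or elide it itself:

    #include <stdlib.h>

    typedef struct { int refs; size_t n; double *data; } array;

    #define INCREF(a)  ((a)->refs++)
    #define DECREF(a)  do { if (--(a)->refs == 0) { free((a)->data); free(a); } } while (0)

    /* Every primitive then follows the same pattern around its cell code. */
    static array *some_primitive(array *left, array *right)
    {
        INCREF(left);
        INCREF(right);
        array *result = NULL;     /* ... conformance checks, cellular code ... */
        DECREF(left);
        DECREF(right);
        return result;
    }

    int main(void)
    {
        array *a = malloc(sizeof *a);
        a->refs = 1; a->n = 0; a->data = NULL;
        INCREF(a);                /* count goes to 2   */
        DECREF(a);                /* back to 1         */
        DECREF(a);                /* 0: array is freed */
        (void)some_primitive;     /* sketch only       */
        return 0;
    }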

01:03:49 [ML]

Yeah. So basically you've got two copies of every primitive now.

01:03:52 [RB]

No.

01:03:53 [ML]

One that's inlined

01:03:55 [RB]

No, you got a macro.

01:03:58 [ML]

The macro doesn't exist in the compiled interpreter. [chuckles]

01:04:02 [RB]

What?

01:04:03 [ML]

What does the object code look like?

01:04:05 [RB]

The point is, you can refer to, you know ... [sentence left incomplete]. If it's going to be JIT, you're going to take that reference count stuff and probably build a table. Those are table pointers. For the existing interpreter functions, they will use that same macro, but it will generate code within the interpreter to do it. The other one is being used only by the JIT compiler. If we do this for the conformance checks and the cellular computations and so on: if you do it within the interpreter, it's going to generate one set of code, and if you do it right, the JIT compiler will generate identical code. Now all you have to do is apply that to every other function.

01:04:58 [ML]

This is a JIT compiler that's making object code for the primitive. It doesn't use the stuff that you've written in C?

01:05:05 [RB]

That's right.

01:05:06 [ML]

What's it supposed to do?

01:05:07 [RB]

What do you mean? The whole interpreter is written in C, so I don't understand your point.

01:05:11 [ML]

Like, with the JIT compiler, does it use code from the functions that you have written? Or would it say: "Here's reverse of sort; I have some high-level description of reverse and of sort, and from those I will build machine code", I guess?

01:05:29 [RB]

You get to make some implementation decisions. I don't think it matters. The point is you want to be able to find ... [sentence left incomplete]. There's two goals here. One is to establish a formalization for all primitives (or at least the vast majority) that can be used both by the interpreter (across the interpreter) and by JITted code. There's several things you want to do, but one of the main ones is the ability to do loop fusion.

01:06:07 [RB]

Yeah

01:06:14 [RB]

... so that you can grab pieces of cellular computations and do compositions on those within the JIT compiler itself.

01:06:21 [ML]

I mean, I think that's pretty tough to do with the forms that Dyalog has because it's got all sorts of ... [sentence left incomplete].

01:06:26 [RB]

That's why I'm talking refactoring.

01:06:28 [ML]

Well, there's refactor and then there's rewrite [chuckles].

01:06:31 [RB]

I'm trying to describe a way of refactoring this that preserves the existing interpreter behavior where possible. Where it's not possible, the question you have to start asking yourself is: why? Why is this one different from that one? And there may be a good reason for it. Often, as I say, it may have been, maybe this guy was off on holidays and just came back and he wasn't quite alert yet.

01:07:03 [ML]

Alright, so let me describe some more of the issues. Like, I don't agree that reference counting and conformability checks even get into, like, the issues at all.

01:07:13 [RB]

Let's put that aside because I think they're critical issues for an interpreted language.

01:07:20 [ML]

Well, so one thing that Dyalog does is if you're taking a compress call ("A compress B" [which in APL symbolically is "A/B"]) what it will do is check the sum of "A" total and it will decide based on that whether it should use a branching sparse compress method, or a branchless method that's actually vectorized. That's not something you can fuse because you need to know the entire sum. You can't fuse and get the same behavior at least. Because you have to know the whole sum to decide which of these to do.
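
A rough C sketch, invented for illustration and not Dyalog's code, of the dispatch Marshall describes: compress totals the Boolean mask first, and that total both sizes the result and picks between a branchy sparse loop and a branchless dense loop, which is why this primitive resists fusion (you need the whole sum before choosing a method or allocating):

    #include <stdio.h>
    #include <stdlib.h>

    static double *compress(const unsigned char *mask, const double *x,
                            size_t n, size_t *out_n)
    {
        size_t total = 0;
        for (size_t i = 0; i < n; i++) total += mask[i];   /* whole sum first */
        double *r = malloc((total + 1) * sizeof *r);       /* +1 slack for the
                                                              unconditional store */
        size_t j = 0;
        if (total * 8 < n) {                  /* very sparse: branchy skip loop */
            for (size_t i = 0; i < n; i++)
                if (mask[i]) r[j++] = x[i];
        } else {                              /* dense: branchless store        */
            for (size_t i = 0; i < n; i++) {
                r[j] = x[i];
                j += mask[i];
            }
        }
        *out_n = total;
        return r;
    }

    int main(void)
    {
        unsigned char m[] = {1, 0, 1, 1, 0};
        double x[] = {10, 20, 30, 40, 50};
        size_t rn;
        double *r = compress(m, x, 5, &rn);
        for (size_t i = 0; i < rn; i++) printf("%g ", r[i]);  /* 10 30 40 */
        printf("\n");
        free(r);
        return 0;
    }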

01:07:59 [RB]

So what you're saying is, you make a conscious decision ... [sentence left incomplete]. Now, for compress and replicate you always have to know the sum of the left argument. Or maybe the sum of the absolute value of the left argument, in the extended replicate case that was described earlier. Whether you do that or not is a separate issue from cellular execution. Look, you can clearly say: "OK, this guy is out of scope; it's too complicated. We don't know how to do it in the JIT compiler today". Today the Jay Foad JIT compiler works that way. It just says: "I don't know how to do any functions", and so it always calls; just blindly calls. But maybe there are other functions that are better behaved. The whole idea is: do what you can, when you can.

01:08:59 [ML]

Well, but that seems like a problem. If you don't implement the same functionality, then you've got situations where in some cases the JIT compiler is faster because it's compiling, and in other cases the interpreter is faster because it has access to other methods that you might not get in the compiled case. So I mean, if you have some application that really depends on the sparse replicate, then it could actually slow down when you compile it. And how are you supposed to figure out which one to use?

01:09:30 [RB]

Slow down the compiler, or slow down the interpreter?

01:09:32 [ML]

The compiler would be slower in this particular case, you know, as an example.

01:09:37 [RB]

As I said, if you know that's the case, then maybe you don't want to compile it.

01:09:42 [ML]

But if you don't know that's the case [chuckles] and you do compile it and then your code ends up slower.

01:09:46 [RB]

Then don't do that! This is really simple. It's like if you've got a thing where, geez, the interpreter can do a better job here. Well, maybe you just don't even think about compiling this.

01:10:01 [ML]

What if that's like 75% of the interpreter? I mean, that's one small example. There are a whole lot of things that do ... [sentence left incomplete].

01:10:08 [RB]

Then you learn something. And this is why I said you refactor this stuff: by looking at the problem and separating these things and making it uniform, we solve these problems, and you can then identify which things you really don't want to compile. I mean, for example, there is rarely (I won't say always, but rarely) any benefit in inlining or compiling or JIT'ting matrix inverse, in my humble estimation.

01:10:41 [ML]

I mean, even something like sort, that's not that slow. There's hardly any overhead in getting into the sort relative to actually doing the sort.

01:10:54 [RB]

And what's your point?

01:10:55 [ML]

I was agreeing with you.

01:10:57 [RB]

OK, thank you.

01:11:00 [CH]

[laughs] Well, as much as I feel like we could continue this for another hour... This is probably what happens when we bring on someone with implementation experience and we have panelists with implementation experience: the conversations could just go on for 8 hours. We've blown past the hour mark and, according to my recording device, we're past the hour-and-a-half mark as well. I'm not actually sure when we officially started recording. But yeah, we'll probably have to start to wind this down. We'll make sure to put links to all of the things mentioned: your master's thesis, SISAL, Single Assignment C. And I think probably the most important question to finish on, Robert, is: if folks are interested in taking the Apex compiler for a spin, is that something that people can do if they go to some subsite of Snake Island Research? Or is it currently still in an unreleased phase?

01:12:03 [RB]

I'm still looking for the tire pump.

01:12:06 [CH]

The tire pump, which is?

01:12:09 [RB]

If you want to take it for a spin, you need a tire pump to inflate it!

01:12:11 [CH]

[laughs] Oh! I was like, do people need to come and bicycle to a certain place? I completely missed the analogy there.

01:12:22 [RB]

Apex is on GitLab somewhere. I think it's under the MIT license, but don't! The compiler was written in the early 90s and it had to run on a Sharp APL PC with its 108-200K workspace. No, we didn't have ... what do you mean, hard drive? There's a lot of stuff that, you know ... [sentence left incomplete]. Talk about refactoring. One of the things I'm going to look at a little bit is Aaron's Co-dfns compiler, and try to produce a "Co-dfns for earthlings" [chuckles] view of that.

01:13:09 [ML]

Do beware that Co-dfns will change from under your feet with regularity [chuckles].

01:13:14 [RB]

I'm aware of that. That's what the man said [others chuckle]: The Mythical Man-Month.

01:13:21 [CH]

He's already rewritten it twice [chuckles].

01:13:24 [RB]

Plan to build a prototype; you will anyway. I'll let people know when I get Apex back to the point where it's running reasonably well. Part of my problems today are with SaC's treatment of specializations and other things, and others are all my own doing. So when I get those settled I'll rattle people's cage. But two other things I want to mention: if you look at CUDA and if you look at PyTorch [15] and the various AI languages that are out there, they are really unpleasant. They're primitive. If you look at, like, a page of CUDA function calls, it's as bad as a Windows template library or something. Most of these functions are implementable as one or two characters of APL. Or if you look at ... there's a paper on the ACM site (https://dl.acm.org/doi/10.1145/3315454.3329960): "Convolutional Neural Networks in APL" by Artjoms Šinkarovs, Robert Bernecky and Sven-Bodo Scholz. That is in the ARRAY 2019 workshop proceedings. On the second page of that (except for the bad break), there is an implementation of the native APL convolutional neural network model. That is about 10 lines of code, I'm not sure. That's on page 69, mostly on page 70. And when I get that to the point where it compiles, I'll ship that. I'll be happy with the current state of Apex.

01:15:39 [AB]

This is in Dyalog APL, right? It is not Sharp APL?

01:15:41 [RB]

It is Dyalog APL. We wrote this for the 2019 conference.

01:15:46 [BT]

And we'll certainly include all the show links and everything, so if people want to go back and look at these papers, there'll be links to go to and they can refer to the page numbers you've mentioned.

01:16:00 [RB]

OK, and the other thing is IBM Research at TJ Watson in New York; one of their quantum computing toys (well, expensive toys you can get at) ... [sentence left incomplete]. There's something called Qiskit, [16] I think; it's worthy of a look, because if you look at the way they're programming quantum computers, it has the same stupidity as CUDA and PyTorch and stuff. It's very crude: let's do it all from the scalar world, even though they're talking arrays. They're built too shallow; it's all wrong. And I think one of the nice things about Apex is the back end; it's fairly well isolated, so the back end is currently a separate namespace. And it is possible to target new languages fairly quickly. I had a kid working for me last summer (bright kid) named Holden Hoover. He wrote a Julia backend for Apex. And I think he's 14 years old now, so he's got a good start. So writing a tensor backend that feeds into TensorFlow ... or actually we can do that already, but a backend that would generate appropriate calls to an AI machine might work.

01:17:53 [BT]

Thank you, Robert. I think we need to wrap up at this point. And I'd just like to mention one thing: Rodrigo's name came up earlier in the conversation. He used to do all the transcriptions. Now Sanjay and Igor are the ones that do the transcriptions, so I'd like to give a shoutout to Sanjay and Igor for doing them. In this particular episode the transcriptions may be a little late; we're recording later than we normally would. They'll be there, just maybe a bit late. And if you want to get in touch with us, that's contact@arraycast.com.

01:18:27 [CH]

Yeah, obviously thanks to both of them. It's a huge amount of work, and it's my fault that the transcriptions for this recording are going to be a bit late, because of schedule differences. But yeah, once again, thank you so much to Robert for coming on. We might have to have you back in the future so that you and Marshall can continue to play ping pong. But with that, we will say happy array programming.

01:18:49 [everyone]

Happy Array Programming!

[music]