Transcript

Transcript prepared by Bob Therriault, Adám Brudzewsky, Sanjay Cherian and Igor Kim.
[ ] reference numbers refer to Show Notes

00:00:00 [Conor Hoekstra] 

What do you call an end-to-end algorithm in the context of pre-scan scan and post scan, where we already have an end-to-end algorithm? Adám has his hand up, apparently.

00:00:07 [Adám Brudzewsky] 

It's obvious. OmniScan.

00:00:12 [CH] 

OmniScan? What does Omni mean?

00:00:14 [CH] 

Pre and post.

00:00:16 [Marshall Lochbaum] 

So we have this in BQN and K and we just call it scan.

00:00:29 [CH] 

Welcome to another episode of ArrayCast. My name is Conor, and today with us we've got four panelists plus a returning guest from just a couple of weeks, or episodes, ago. Before we get into introducing him and talking about our topic today, we'll go around and do brief introductions. We'll start with Stephen, then go to Bob, then go to Adám, and then go to Marshall.

00:00:47 [ST] 

Hi, I'm Stephen Taylor. I'm an APL and Q programmer. 

00:00:51 [Bob Therriault] 

I'm Bob Therriault. I'm a J enthusiast and I've been playing with Fold.

00:00:55 [AB] 

And I'm Adám Brudzewsky. I'm an APL programmer.

00:00:58 [ML] 

I'm Marshall Lochbaum. I've been a J programmer, worked on Dyalog APL at Dyalog, and now work on BQN.

00:01:05 [CH] 

And as mentioned before, my name is Conor. I'm a polyglot programmer with a huge passion and enthusiasm for all array languages, and super excited to talk about our topic today. But before we get to that, I believe Bob has one announcement that he's going to share with us.

00:01:20 [BT] 

I do, and it's one of these ones that's certainly sad, although you celebrate a person's life, I think, when they pass away; you don't try and dwell on the fact that you've lost them. But Lenore Mullin, who is one of the people who developed the mathematics of arrays, a very interesting person, and I hope at one point to get her on the show, had mentioned Luther J. Woodrum [01] to me. And last week she sent me an email and said that Luther J. Woodrum had passed away at the age of 83 on March 8, 2023. Well, who is Luther J. Woodrum, you might ask? Well, he was a guy who worked at IBM and he contributed immensely to the field of computer programming. He secured patents, both for himself and IBM, on the radix sorting method. He made key contributions to APL and innovations to the architecture of IBM's System/370. As a personal note, Luther knew how to entertain a crowd while singing and dancing at karaoke. He inhaled books, all types, and was a lover of knowledge. He was also an aficionado of science fiction, both books and shows, from Star Trek to The Witcher. And condolences to his whole family. I'm sure it's a very sad time for them and our hearts go out to them. And I hope they can come to enjoy the memories of Luther as a life, I think, very well lived and a very interesting person. And unfortunately we didn't get a chance to put him on the podcast, and I think that's on me, so I apologize for that. It would have been nice to have actually had him on as a guest and talk, but we've lost that opportunity.

00:02:54 [CH] 

Yeah, on a personal note, I have a family funeral that I'll be going to in about a month, and they're calling it a celebration of life, because these things are always bittersweet. It's both sort of emotional when someone passes, but you're trying to celebrate their life and all the ways they've touched their loved ones and friends around them. So yeah, thoughts go out to their family and friends. With that, we will transition to introducing our runaway guest in number of appearances on this show. I think second place is a distant second now, because this, I believe, is the fourth appearance by Henry Rich. [02] We've recapped a couple different times when we've had him on: episode 6, our first guest ever, then episode 18, then I don't even have the number because the APL Wiki page for ArrayCast isn't up to date, and the last episode listed there was Leslie Goldsmith's, so I'm assuming it's episode 49 or 50, I don't even know. But this will be the fourth time appearing; I guess the last time was two episodes ago. If you've been listening since the beginning, you know Henry is our resident J expert and does a ton of work on the J source code base, which is where J comes from. And today we're going to be talking about, I think it was kind of a topic that I brought up, although it was initially put on my radar by Henry at one point when we had you on and you mentioned fold, which is this sort of... I actually don't know what kind of primitive to call it, because you have to do a little install, you know, 'github:jsource/dev_fold', in order to get it working on your local instance of J, or even if you're on the J Playground. So we'll leave a link in the show notes to a little code snippet [03] if you happen to be in front of a computer and you want to play around with us while we're talking about this topic. So yeah, we're gonna be talking about all the different flavors of fold today. So I'll stop talking and maybe just throw it over to Henry. And you can, I believe you were the one that implemented it, give us a sort of high-level overview, and then we can hop into a discussion of sort of what you can do with this and why it's useful.

00:04:52 [Henry Rich] 

Yes, I implemented it and, what's worse, I designed it. It gives me a great appreciation for the brilliance of Ken Iverson and those who've gone before in language design. Even something as simple as this is easy to mess up. I don't think you have to do anything special to use fold in an ordinary J application. It comes pretty much bundled. The J Playground... yeah, the J Playground. Maybe the reason you have to load something is because the primitive is implemented in J. So when you use it, it actually calls a J script to do the work. That's so that it can be fixed if there are bugs. As it gets taken up, it'll eventually be migrated into native C code. But it should be part of the standard install now.

Fold is a conjunction, which in J means it takes two verbs as its arguments. So you'd say "u Fold v", and that combination defines a verb, which is J-speak for a program, that operates according to the rules of fold. The idea with these conjunctions, this is just the latest in a long line of things that APL has added to do useful stuff. What these conjunctions do is attempt to encapsulate something useful about programming into a small space, in this case just one word, like capital F dot. This has a long history in the APL languages. If you go back to the 1962 book, which is where I picked up APL, there were verbs like plus and minus, but then you could derive a new verb with plus slash, which does a reduction, because Ken recognized that that was a useful idea. And you could go even farther: you could go plus slash backslash, which would give you a running sum, which is sort of like a reduction except you keep the running total; not just the beginning number and the final total, but all the totals in between.
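
For anyone following along in a J session, a quick sketch of the reduction and running-sum derivations Henry mentions (indented lines are input, the data is made up):

   +/ 1 2 3 4       NB. insert + between the items: a reduction
10
   +/\ 1 2 3 4      NB. the same reduction over each prefix: a running sum
1 3 6 10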

00:07:19 [ML] 

I think I gotta back up there, because that's in J, but in APL it is just plus backslash.

00:07:24 [HR] 

No, no, you're right. You're right. It was just a backslash. Yeah, Ken figured out later that it really should be two separate operators, and he made it that way. But the point is he recognized this pattern of running sum and made it a primitive of the language. Another thing along that line is the axis concept, where you can apply an operator along a chosen axis. These are attempts, and successful attempts, to discern important programming features and to give them notation. I would say another example of genius is the grade primitive. To recognize that grade, as opposed to just sort, is the true primitive was real genius, and I think that does go all the way back to the beginning. As an aside, I was thinking, what other programming languages have this approach to making programming easier? Are there any? The languages I know best are the block-oriented languages like C, which I know quite well, and C++, which I know somewhat. They're in a different league. C is not a programming language really; it's a language for telling the computer what to do.
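
A small J illustration of grade versus sort (made-up data): grade gives the sorting permutation, and sorting is just that permutation applied back to the data.

   /: 30 10 20               NB. grade up: the permutation that would sort the list
1 2 0
   (/: 30 10 20) { 30 10 20  NB. sorting is grade followed by indexing
10 20 30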

00:09:18 [ML] 

Yeah, I think there is kind of a widespread belief, even in the functional languages, [04] that to break something down you never break it apart into two things that are kind of equal in size, but you always break it into components and start looking at smaller aspects of the input. So, I mean, even in a functional language, if you ask someone to simplify fold, they would say, well, what you actually want is to look at a linked list, and you have the operation that takes the head of the list and the tail of the list, and everything else. But that's not, in my view... yeah, that's not actually cleaner. It's just that the component is not really simpler. It's smaller.

00:10:07 [HR] 

Yes, exactly. C lets you work very well with individual atoms, and everything else is up to you. This is not a programming language. This is a very good machine language. I think of what C does for me that really changes the way I think about programming, and I'd say while and for are very useful constructs, and structures are a very good way of keeping data in mind. And that's all I can come up with.

00:10:46 [CH] 

I guess the question is here, like, so earlier you asked what other programming languages make things easier. So like, in my head I was like, well, what does the word "easier" mean? And then when defining what is and isn't a programming language, I guess my question is like, what is a programming language here? Because in the broader sense of, I think, the way that the industry uses it, like C is definitely a programming language, there's no debating that, but I think in the sense that you're using the word programming language you're tying slightly different semantics to the discussion that's happening at the moment.

00:11:18 [AB] 

It helps you think, right? 

00:11:20 [HR] 

Yeah, well, it helps you think and it helps you implement your algorithm. The primitives recognized by the APL family are, it turns out, ones with which you can describe your algorithm in an executable notation. I guess what I'm saying is that maybe we shouldn't be saying APL, maybe we should be saying TPL, the programming language. The others are computer languages. TPL is the way you express algorithms rather than just how to push numbers around. Or if you want, we can generalize it. Maybe TPL would stand for true programming languages.

00:12:00 [CH] 

I'm hearing Toronto Public Library every single time you say that.

00:12:05 [ML] 

OK, I do believe there's a tensor programming language already. 

00:12:09 [HR] 

Ah, quite right.

00:12:10 [AB] 

But wasn't it Iverson who, when they asked him what they should call the notation, said it should just be the notation, not Iverson's notation? [05] Isn't it like how you don't call it God's grass, just grass?

00:12:21 [HR] 

Yeah, right. 

00:12:22 [AB] 

And same thing here. But I guess it should be familiar by now to Conor's listeners that adjacent differences is a bad name for pairwise.

00:12:34 [CH] 

Well, I don't like the word pairwise either. 

00:12:36 [AB] 

Or whatever you want to call it, n-wise, maybe windows. But something that is... You get a tool in the hand that looks like, or is named like, something that helps you to use it the right way. It's almost like adjacent differences is a hammer that says screwdriver on its side, right? Or maybe it's not really the right analogy. It's a general-purpose screwdriver that has all the different bits, and then it says Phillips screwdriver on the side, as if it could only do one thing. But you can actually do any adjacent anything with it. It doesn't have to be differences. And so here too, you're saying, well, there's not much being provided in C or even C++ that helps you to think about the problems and actually get down to solving the tasks you have to solve?
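
In J terms, the "any adjacent anything" point shows up with the infix adverb: the window length is a left argument and the verb applied to each window is up to you (a small sketch, made-up data):

   2 -~/\ 1 3 6 10     NB. adjacent differences: reversed minus over each length-2 window
2 3 4
   2 +/\ 1 3 6 10      NB. same windows, adjacent sums instead
4 9 16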

00:13:29 [ML] 

Yeah, I think that's maybe the essence: yes, the tools that C provides are simpler. I mean, what it gives you for working with arrays is basically take the element at this location and put the element at this location. And yes, those are simpler than a primitive like grade, but actually using a primitive like grade, once you come to understand it, ends up being a lot simpler. And you don't have all these complications of, you know, well, what if I put this value here, but then there's someone else who has an array of which this array is only a subpart; do they see the change that I made? And all that sort of stuff, which is, I mean, that's very much computer thinking instead of algorithms thinking. But I think even outside of the languages like C, which are intended to be computer oriented, even in mathematical stuff like Haskell - now Haskell [06] is not that bad - but you still see a sort of inclination to say, "Well, what this really is is the smallest parts that make it up. This is really something that's looking at each element of a list individually and doing recursion or whatever to traverse the list," instead of saying, well, I have this entity, this primitive that could be implemented in terms of small parts in various ways, but it feels like I've actually discovered something that's real in its own right, that has an existence outside of how I might define it.

00:15:06 [AB] 

I definitely don't think of the lower parts of the primitives when I use them. They're just my vocabulary.

00:15:13 [HR] 

Yes, that's the point. It's somewhat analogous to matrices in mathematics. If you use matrices in life, you stop thinking about them as being numbers, and you stop thinking of row element times column element, and you deal with the whole matrix. With the TPLs, we deal with the whole problem. I mean, we've talked about C, which like you say, is at a very low level. But I would say even object programming, [07] which is at a very high level, is useful for programming at a project management level: it allows me to break a big problem into smaller parts. But it doesn't really do all that much to help me implement a part. Templates in C++, which I have just a passing knowledge of, seem to be similar to the constructs of APL and J, but immensely more cumbersome to use.

00:16:22 [AB] 

So we've got these; in APL we call them operators, in J we call them conjunctions. They're part of this set of higher-order functions that we have in the array programming languages: fold or reduce or insert or whatever you want to call it, and scan.

00:16:42 [HR] 

Well, yeah, there's a whole list of them. In the beginning there was reduce and scan. And I don't know in what order these were discovered later on, but the power conjunction [08] allows you to... It's u power n, where n is a number. It could be v if you need to compute the number, but u power n says apply the verb u n times. Very elegant, but extremely important, because first it gives you... You can think of it as an if statement: if n is zero, you've applied the verb no times. So that's the false branch of the if. If n is one, you apply the verb one time. In other words, you did it; the if was true. And if n is infinite, that means apply the verb forever. Well, what does that mean? It can only make sense to say apply the verb until the result doesn't change. So think of that as converge.
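
A minimal J sketch of those three cases of the power conjunction (the cos name is only defined here for the example):

   >: ^: 3 ] 5      NB. increment applied three times
8
   >: ^: 0 ] 5      NB. applied zero times: the argument comes back unchanged
5
   cos =: 2 & o.    NB. circle function 2 is cosine
   cos ^: _ ] 1     NB. apply until the result stops changing (converge)
0.739085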

00:17:47 [AB] 

Unless you have side effects that you want to achieve, like a game loop or something like that. You actually want to keep going even though nothing changes.

00:17:54 [HR] 

Right. And you probably would just do that with a loop. I guess it goes without saying that in all these, we're thinking mostly in terms of functional programming. We're trying to come up with a language that's largely functional. Side effects are up to you. But anyway, the power conjunction is an example of another one of these programming patterns that was discovered.

00:18:21 [ML] 

And that was, I think, pretty late; as far as I know, it was introduced by NARS.

00:18:25 [ML] 

I don't know if it was Jim Brown that came up with it or someone else, but NARS had the star dieresis notation that's used in APL, and pretty much the current definition.

00:18:37 [HR] 

OK, cool. The rank concept itself... Well, there are several patterns that involve applying a verb on selected portions of an argument. There's rank, wherein the portions are simply the cells of the argument, in a very symmetric way. Then there's the cut pattern, which applies the verb to contiguous portions of the array where the starting points are indicated by another argument. And then there's the key pattern, which applies the verb to non-contiguous subsets of the array that are extracted according to another argument. And of all those, key to me is a stroke of genius on the order of grade. I would not have realized how valuable key is, but it just comes up a lot. Whoever figured that one out really did come across a primitive of programming. And then we have recursion, another primitive, which is given a symbol. Anyway, if you take all these different combinations together, they do most of what you need to do in writing a program. Most of the iterative things where you want to apply a verb more than once, perhaps on the entire array, perhaps on items, yadda yadda, that almost covers the field, but there is one type of repeated computation that it doesn't handle very well, and that's what fold is aimed at. And this computation is when you want to apply each item of an array, in turn, to some global state. So I have an array and I want to take the first item and do something with it; then the second item, use that to do something to the global state; and continue on through all the items. It was noted fairly early that, in J at least, and I guess this would be true in APL2, a scan does something rather like that. In a scan, let's say you're just adding up the numbers 1, 2, 3, and you're doing a reverse scan. You start with 3, and then you add 2, which gives 5, and then you add 1, which gives 6. And in that case, the state that I'm talking about is the total. And you can use scan to produce that total and reveal all the intermediate state at all points along the way. That's what the scan is really doing. So in some cases, you can use that to do the thing that I was talking about, which is repeated application of the items to the state. But there are some problems with that. The first is that because of the right-to-left orientation of J's operator execution, it only really works simply if you go right to left. If you go left to right, you have to add the first two items. Well, additions... take subtraction as an example. If I look at 1 minus 2 minus 3 in APL or J, what is that? Now that's a test for the viewers. What is 1 minus 2 minus 3? It sounds like it ought to be minus 4, right? But it's not, because the execution is right to left. 2 minus 3 is minus 1, and 1 minus minus 1 is 2. So you can't simply go left to right and get the correct answer, which means that, as defined, a forward scan just wouldn't answer the mail at all. It would not be repeated application of the state. It would be repeated application from the ground up. OK, you can get around that. You can say, OK, we'll just do reverse scans. And if we want to go front to back, reverse the argument and go right to left and then reverse the result. Okay, we can live with that. The second problem is the initial value. There's no initial state in scan. In scan, the initial state is the identity element [09] of whatever the operator is. So if you're doing a running sum, the identity element for addition is zero, and you say, "Okay, we're going to start this sum, and it's as if we started by adding the three to the zero, and then the two and the one." But suppose you have a different initial value. Suppose your initial state is not empty. Well, now you have a problem, because to make this fit into the mold of a scan, you would need to make your state the last element of the vector. And that means it serves two masters: it's both an input to the computation and an output of the computation, which means that in general you can't use anything other than a box. So you end up, in general, having to box all the items of your array and then take the initial value and box that.
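
Two of those points can be tried directly in J (a small sketch):

   1 - 2 - 3      NB. right to left: 2-3 is _1, and 1 - _1 is 2, not _4
2
   -/\ 1 2 3      NB. minus scan: each prefix is reduced right to left
1 _1 2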

00:25:08 [AB] 

You don't have that issue in APL though, because you can concatenate any values.

00:25:14 [HR] 

Well, that's good. 

00:25:15 [ML] 

Well, I mean, it's still... it's just nested instead of boxed.

00:25:19 [AB] 

Yeah, but you can always just concatenate the values that you want. Let's say, something I do occasionally is I want to iterate over all the X's. So my X's are the range of all the dimensions, so just numbers. And then there's some actual array that's multidimensional, probably with as many dimensions as the leading elements there. I can concatenate those numbers, the X's, with the enclosure of the entire array. That's not allowed in J, because that would be a list of numbers concatenated with a box. You have to now box everything. Every element has to be a box.

00:25:59 [ML] 

I mean, it's still ugly that you're forming this wildly inhomogeneous array.

00:26:03 [AB] 

For sure, it's just slightly easier to use. You don't have to mangle the actual elements in order to make it work. 

00:26:13 [HR] 

Yes, in that case, the ability to have that... that would solve this issue. Like Marshall said, it is...

00:26:28 [AB] 

And it's potentially very costly; internally it is represented as boxes, they just don't present like that.

00:26:34 [HR] 

Oh, OK. In that case, yes. All right. So you've got all the bad parts, except that you don't quite see it as much.

00:26:42 [ML] 

So specifically, what happens is that... I mean, it's fine having that one element at the end. But the problem is, in order to represent it as a homogeneous array, all those other numbers, which were stored, you know, packed in an array, now each get expanded into their own array that can be used as an element. 

00:26:59 [HR] 

Okay, yeah, all right. That's what we have to do in J. Sounds like APL does it automatically. Anyway, so this is a problem: the initial value is not present in a scan. You have to make it up, you have to put it there, you have to figure out some way to represent it. I think at this point, maybe we should have a concrete example. Let's say that my x argument is a list, an N-by-two list of numbers representing moves in a chess game, where the first number is the square moved from and the second number is the square moved to. And what my program wants to do is play through the game and do something, I don't know what, but clearly what it needs at each step in the game is some information: the board and whose move it is. Okay, so that's 64 numbers, an eight-by-eight array representing the board, and a value representing whose move it is. So that's the initial condition, which starts out with the board in the initial setup. And the operation of the verb is gonna be: for each move, make the appropriate modification to the board and change whose move it is. Well, here's the question, or the question for me, because it poses a problem: what is the intermediate state between applications of the verb? And it seems clear that it's just like the input; it's just a modified board. Suppose the only thing I want to know when it's all over is, after each move, how many pieces were on the board. So in that case, I don't care about the board. All I care about is a number. The result doesn't need to contain the board. It just needs one number. But I have to pass the board from move to move.

00:29:12 [BT] 

Without the board, you don't have a number. So you do need to carry the board until you get to the end and you have a number, right?

00:29:17 [HR] 

The board has to be carried through as state, but it doesn't need to be reported as a result. And this is a deficiency of scan: the final result is the entire accumulated state after each move. So what I'm gonna have at the end... it may be that I decide that my intermediate state will be the board, whose move it is, and then what I really want, which is how many pieces there are. So my intermediate state will have three boxes. The result of the scan is gonna be an N-by-three array of boxes: all that intermediate state, all N boards, all N copies of whose move it was, and then the N numbers that I actually care about, represented as boxes. Now, you can probably stand to have 100 boards lying around, but if this intermediate state is 100 megabytes, it might turn out to be a problem to have to keep a copy of that intermediate state for each iteration, for each item of the argument, particularly if you don't care about it at the very end. So this is a problem with scan: it produces way too much, it preserves way too much intermediate state. It is compelled to report the intermediate state at each step as part of the result. Very wasteful of space.

00:31:08 [BT] 

From stuff that I was doing last night, I was playing around with fold quite a bit. But it seems to me that in order to preserve the board, you do need to have a global that is the board, that carries over between the states. So there is...

00:31:19 [HR] 

It's not a global, no. You could do it with a global, but you don't need a global.

00:31:22 [BT] 

OK, tell me how you do it without a global then. 

00:31:24 [HR] 

Okay, the initial value... Let me get into defining fold first. I will come back to that. Okay, so of the two irrefutable arguments for fold, the first one is too much state with the scan. The second is the scan runs to completion. Many algorithms need an early exit. Somebody's asked me to find the first five numbers less than a million that have some property. Well, you know, the APL-type way to do that is to calculate it for all million values and then pick the first five. That can be a problem if the execution gets expensive as time goes on. And part of what we want to be able to do with the scan is terminate it when some terminating condition has been met. So go until you know you've found your answer and then quit. This I think is probably the more important reason, because some algorithms are just too long to even consider running on the entire array. You know they're gonna terminate sometime, but when? And the when part is what spurs you to write an explicit loop to break the thing up into small pieces or what-all. OK, so that's what we're trying...

00:33:03 [AB] 

Sorry to interrupt, but what's interesting you mentioned here is finding the first five numbers under a million that have some property; maybe more natural would be to ask: find me the first five numbers, with no upper bound given. In which case even starting off with a million numbers and stopping short doesn't work.

00:33:23 [HR] 

Yeah, quite right. Quite right. Yes, there is a possibility that you can't even begin to start evaluating them all, because you don't know how many there are. Okay, so this is what fold is trying to solve, those two problems: we want to be able to terminate early, and we want to be able to avoid the excessive-state problem. So allow the initial value to be specified separately, instead of having it bolted on to the argument to scan. So fold does this. The syntax: it's a conjunction. So you say u Fold v, and it is very much like scan, except that it's usually used as a dyad. So the verb u Fold v, put that in parentheses, takes an x argument on the left and a y argument on the right. The x argument is the initial value. The y argument is the items to be iterated over. And what happens at each step is, v is applied between the previous value on the right, which to begin with is x, and the next item from y on the left. So to begin with, it would execute: last of y, v, x - x being the initial value. That has a result, which is fed into the next iteration. The next iteration takes y's last-minus-one item, v, whatever that result was from the first application, just like scan from here on. And it applies that all the way through to the beginning of the array and produces all those intermediate states, which are passed one to the next. How do you solve the too-much-memory problem? That's where the u comes in. After each execution of the verb v, the entire result of v is passed to the next iteration, but only a part of it is preserved to be reported as the final result. That's what the u verb does. u is applied to the result of v to control what gets saved. So in the example we had where all I cared about was the number of pieces on the board, the u verb would extract the contents of the last box, which is just the number of pieces on the board, and that's all that shows up in the final result. So for the final result of N steps: it produced N intermediate boards, but the result is only N atoms, N numbers, the part I asked for. And that is fold in a nutshell. For early termination, there's an auxiliary verb called "terminate", which can be executed at any point. It's spelled Z colon (Z:). [10] Z: looks at its argument, and if the argument's non-zero it terminates. There are various ways you can terminate. You can terminate one iteration, you can terminate the whole thing, you can terminate keeping the current result. There are options. But the point is you can get out early.
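
To make that concrete, here is a rough explicit-J model of a fold that keeps intermediate results, written only to illustrate the semantics Henry describes. Henry describes the reverse direction (starting from the last item of y); this sketch goes front to back. The name foldFwdModel is invented for the example, and the real primitives are the F. family loaded per the show notes.

foldFwdModel =: 2 : 0
:
NB. x is the initial state, y holds the items to iterate over
state =. x
r =. i. 0
for_item. y do.
  state =. item v state   NB. v combines the next item (left) with the state (right)
  r =. r , u state        NB. u picks out what is kept for the final result
end.
r
)
   10 (] foldFwdModel +) 1 2 3 4   NB. keep every running total, starting from 10
11 13 16 20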

00:37:14 [AB]

But it's kind of magic, right? You couldn't define this Z colon by yourself.

00:37:19 [HR]

Well, it's not magic. It takes the result. So you calculate a predicate that says either quit or not quit. And if the predicate is 1, Z: terminates; if the predicate is zero, Z: doesn't terminate.

00:37:33 [AB]

But let's say I didn't have Z: as a primitive. How would I define it in normal J?

00:37:38 [HR]

Oh yeah, you can't. You wouldn't be able to do that.

00:37:40 [AB]

That's it. That's what I mean by it's magic. It's doing things that you can't do in the language. Like Cap, right? It's also a primitive that you cannot define yourself, doing things that step outside of itself. It affects things that called it, rather than just affecting its own argument.

00:37:57 [HR]

No, I don't disagree about Cap. Cap does two different things: when it's executed as a verb, it simply produces a domain error; and it's used to create a capped fork at parsing time, in the execution of a fork. Agreed it's a little bit anomalous, but not like, say, Z:. You could do Z: by yourself, but you'd have to get involved with adverses. You know, you'd have to throw an error and catch it in various places. You could use that to terminate the iteration, but it would be really cumbersome to use.

00:38:43 [AB]

Or I could set a global stop value that the fold picks up and stops, I guess.

00:38:48 [HR]

Yeah, something like that. But Z: is well defined and it doesn't use any global state. In other words, you can have folds within folds, and the Z: inside an internal fold will only affect that fold. You know, it doesn't take you all the way back to an outer fold, for example.

00:39:07 [AB]

Well, there's another one. The self-reference is also like that, right? It also affects the closest scope.

00:39:18 [HR]

Yes, you're quite right. How you define that... I've spent hours trying to figure out how to write that in words. It's very, very hard to say. But yeah, that's what it does. I know it when I see it.

00:39:32 [AB]

I know personally all of these features, while they're useful, make me nervous. I don't like having special primitives that syntactically are like normal primitives, but then they actually do something.

00:39:46 [HR]

No, no. Well, recursion is just a normal primitive. I mean, it stands in place of writing the name of what it refers to.

00:39:59 [AB]

But there might not be a name.

00:40:00 [HR]

That's right. So you don't have to. You don't have a name, but it still works. It still does recursion. You can still dissect it. You know you can dissect and go through the different recursion layers and see what was happening.

00:40:15 [BT]

If it's anonymous, it's the largest verb that it's part of, isn't that how it works?

00:40:19 [AB]

It's the largest one, or the smallest... I can never really figure that out, but that's not our subject for now. Oh, when we have dfns in APL, there's never any doubt to me about what the self-reference refers to. It's the closest braces you can get to, as soon as you get to the braces.

00:40:37 [ML]

Yeah, yeah it's purely lexical.

00:40:39 [AB]

Whereas in... I never really understood how that works in J, but it's the same thing here with the fold; I guess it's the closest fold that it's in, or something.

00:40:50 [HR]

Yeah, yeah. When Z: does something, it terminates an iteration, or the entire iteration, of the fold that it's executing in.

00:41:01 [ML]

So does it also jump out of a function? Like, if you have an explicit function that calls Z: in the middle, when that gets called, what happens? Does it just jump straight to the fold?

00:41:15 [HR]

I don't remember. I'd have to read the documentation.

00:41:19 [AB]

And that's also the kind of thing that I wanted to ask as well. So let's say I don't want to just stop or not stop directly inside my fold; I'm going to call some other function that's going to find out whether or not I should stop. That function happens to use a fold. Now I'm in trouble.

00:41:40 [HR]

No, you're not.

00:41:42 [AB]

It has to give a value back that's then caught outside.

00:41:47 [HR]

Well, that's what you asked it to do, right? Oh, I do remember now. Z: does the equivalent of a throw. [11]

00:41:55 [ML]

OK, right.

00:41:56 [HR]

So it will terminate. It's just as if you did a throw, and the fold itself has the catch. So when Z: terminates the current fold, yes, it will break out of every explicit definition on the way. It just goes right up to where the fold's catch is.

00:42:18 [ML]

But yeah, and it's not that different from throwing an error, which of course nearly every primitive does in certain circumstances.

00:42:26 [HR]

Quite right.

00:42:28 [AB]

I see. So you could actually implement this Z:, as you call it, as a special error value or error type that's unique to the fold that it's within.

00:42:38 [HR]

Yes, that's how it's implemented. It's done as a throw of a particular code that fold knows to pick up.

00:42:47 [AB]

That's interesting. I could use that technique in APL, because that's fairly simple to do.

00:42:50 [HR]

It works well. You shouldn't be afraid of recursion, you know. The rule is that when you encounter the recursion, the thing that's parsed... the recursion is usually part of something big, and it's when that big thing gets executed that's what gets recurred on. It may be a part of the sentence, it may be just a simple fork or part of an agenda, but it's well defined what gets executed. It's just hard to put it into words. Anyway, to finish up on fold: there are four main variants of fold. They all do roughly what I described, but one choice the programmer gets to make is in which order to apply the items. So there's a forward fold, where you apply the items from the front to the back, and a reverse fold, where you apply from the last toward the front. The other decision is: do you want to keep the intermediate state? In other words, is it like a scan or is it like a reduce? If you don't keep the intermediate state, you just get the final result. If you do keep the intermediate state, you get all the results of the u verb. And then, while I was at it, I put in a couple of variants that do not operate by item; they just execute the verbs repeatedly on the arguments, and the only advantage of this is you can use Z: to exit when you think you're done.

00:44:50 [ML]

So that can be an improvement on the normal while idiom in J, which, you know, you write as power condition power infinity. [12]
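
The "power condition power infinity" while idiom Marshall mentions looks like this in J (a small sketch; the doubling example is made up):

   +: ^: (< & 100) ^: _ ] 3    NB. keep doubling while the result is under 100
192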

00:45:02 [HR]

Right.

00:45:03 [ML]

Because the problem with that is that. The power infinity stops when it sees the same value twice, but if you're doing that with a stateful computation, then the result value might not tell you whether you're actually done.

00:45:17 [HR]

Right.

00:45:18 [BT]

And that's closer to what Adám was talking about in terms of the game. If there's no change, you don't want it to stop, so you would have your Z: exit on another condition.

00:45:29 [ML]

Yeah, and sometimes you want to stop when you're still changing too, so.

00:45:33 [BT]

A question I had about Z: is the other thing it does. Specifically, when you're talking about, you know... I think of them as almost infinite processing, the folds that are just going to keep chunking until you send the Z: to kick them out. You establish, you know, your maximum number of iterations, and Z: can take, I think it's _3 Z: and then your max number of iterations. So you assign a value. But I guess as the fold's going, and this is something that to me does feel a little bit magic, it's actually counting each iteration as it goes through the fold, right? You don't have to do anything else for that.

00:46:16 [HR]

It counts the number of iterations, the number of times the v verb is executed.

00:46:20 [BT]

So that's not a count you have to set up and decrement or anything. You just establish what that number of iterations will be, and then as the fold runs, it just keeps track of that and counts itself down.

00:46:30 [HR]

Right. Yeah, that just has a kludgy feel to it, doesn't it? Deciding this stuff is really hard, you know. I am very impressed by the guys who did this.

00:46:43 [BT]

It feels a bit kludgy. I'm not sure how else you would do it, and it's very necessary. You know, it's a safety; it's a guardrail for you.

00:46:51 [HR]

Yeah, that's what I came up with too. Well, I'm glad to hear that you've done some folds, 'cause I think that the number of people who have used fold, even with that, can still be counted on the fingers of one boxing glove.

00:47:08 [BT]

And so who else is using it, I guess, is now my question.

00:47:12 [HR]

Yeah, you. Well, if you find that guy.

00:47:17 [BT]

Well, actually, to be fair, I think Devon's used it a few times and I think Raul's used it a few times, but you're right, not a lot of people are using it.

00:47:23 [ML]

Yeah, all right. So it's an unusual boxing glove here. It's got a split down the middle or something.

00:47:30 [BT]

It's the Mickey Mouse club.

00:47:31 [HR]

It's been out there for a couple of releases and has hardly ever been used.

00:47:38 [AB]

I think you mentioned that the performance isn't especially good, because it's all written in loopy J under the surface, right?

00:47:48 [HR]

I don't think that's the problem. If that were the problem, somebody would have used it and said, guys, I like this but the performance sucks; and if somebody did that, I would have done it in C in the next beta. So it will not be hard to make it fast. I don't think fast is the problem. I mean, I think the problem is that it just takes a very long time for new language features to get into the heads of the users. And then they have to find the right application for them. And in this case, when there was a slightly inferior way of doing it before, they'll stick with the inferior way anyway, because they know it. I think it's just: you know, what's the need for it? And I could make a pretty good case that there's no need for it. Instead of having a fold primitive, I'll just have a loop.

00:48:46 [AB]

Well, you've got six fold primitives, right?

00:48:49 [HR]

OK, but I only need one at a time. Whichever one I was going to use, I'll have a loop instead, and instead of having state that I pass from one iteration to the next, I'll have a variable, maybe a global variable, and I'll have another variable that accumulates the result. And that gives me everything that fold does. Obviously I can terminate whenever I want to; I'll just break out of the loop accordingly. So that might also account for it. It's not that the users are slothful or thoughtless. It's just that the primitive's not needed.

00:49:38 [BT]

Well, one of the things that occurred to me when I was using it is, I feel in some ways the same way about how reduce and scan got separated in J, as opposed to APL where there's a single glyph.

00:49:52 [AB]

I mean the reduce and the...

00:49:53 [BT]

You need the forward and backward slash together to get effectively what is a one-glyph scan in APL, right? Yeah. So you've broken it into two parts. When you break it into two parts, it gives you a little power. You can actually put things between those two adverbs and do something in between.

00:50:13 [HR]

You can, but fold isn't doing that.

00:50:15 [BT]

No, no, I understand that. But I'm saying, as an example to me, that breaking apart allows you to get in there and do something. When I was playing with fold, fold gets you into the same area, but in a different way of looping, and I think one of the things that's difficult is suddenly you're dropping through something you used to do just by, oh, I'll throw that in there and it'll do it. But now you've got the power to go in and do other things, and then you really have to keep track of what you're doing. And I think that makes it a bit more challenging for people who are used to just putting in a reverse scan and going, you know, OK, we've done that, we're good. But now you could do other things in between. That's, I found, where it gets a bit complicated.

00:50:57 [HR]

And anyway, so back to why do we bother with fold. Well, I think to have a more complete functional subset. If you really think functional programming is important, and you know, opinions vary on that, but if you think functional programming is important, then you need a better looping primitive, for the reasons that we mentioned, and for those people fold should be a useful tool. For others, well, there are a lot of things that you can do in J with loops that you can do better with primitives. This is another example of that, and whether it's worth learning will be up to each user, I guess.

00:51:48 [AB]

I'm certainly a bit envious of having all the possibilities there, and I've spent some time thinking about it. Well, you know, the functions derived from the backslash in APL, being the scans, they're not defined to be dyadic, so we could throw in some left argument there that would extend the domain to do all of this somehow. How would that be designed?

00:52:11 [ST]

Oh, you know, you're talking Q.

00:52:14 [HR]

Good, good and good for you.

00:52:14 [AB]

Yeah, exactly. I want Stephen to come in; we should invite him as a guest to speak about forward and backward slash in k and q, no pun intended. I believe they're the same, aren't they? Oh, Marshall.

00:52:28 [ML]

Uh, the slashes in k versus q?

00:52:30 [AB]

Yeah. Yeah, no Q & K.

00:52:30 [ML]

Not forward and backward slash.

00:52:34 [AB]

Not k and q; they're not the same. But I believe that k and q define forward and backward slash exactly the same. [13]

00:52:42 [ML]

If you have the right version of k.

00:52:43 [AB]

OK. Yes. But I see these fold things in J as kind of trying to patch up for... it wasn't as well thought out as it could have been, and certainly in APL as well. And I guess BQN does the job better, but it still doesn't have the full vocabulary.

00:53:06 [ML]

I don't know about that.

00:53:09 [AB]

But it's the way that it is in k that really fascinates me: the amount of functionality that's packed into those, in a really neat and symmetric fashion.

00:53:21 [ST]

OK. We could do a session on that. That would be a lot of fun.

00:53:24 [CH]

I feel like at one point... I mean, I lost it in my bookmarks, because I lost my operating system, and in turn I lost everything on that operating system, but at some point I had... maybe it was even in the footnotes of the fold page. We'll see. Is it? No. But there was, I can't remember if it was like k7-to-J folds or Shakti-to-J folds, but there was some small table somewhere. Maybe one of our listeners is listening to this being like, oh yeah, that was mine, I put that together: the translations between all the different flavors of J folds and the primitives in some k version or Shakti or something like that. So I'll go and find that after the fact and we can put it in the show notes.

00:54:11 [ST]

Oh, by the way, I have the word on versions of k recently from Arthur: there are no versions. OK? There is only the one language. The movie reference for this is Highlander, so perhaps we could talk about incarnations.

00:54:29 [CH]

There's only the one version. Well, if it's from Arthur, [14] that's, you know, that's the Word of God.

00:54:35 [AB]

Wait, hold on. I thought I'd noticed him writing that they're separate languages; that what we call k versions are actually separate languages.

00:54:45 [ML]

I remember, I think he said this at the J Conference a while ago: I keep making languages, and I keep naming them k. That would not have been recorded or anything, but that's my memory of what he said.

00:54:57 [ST]

Well, it seems consistent. It seems consistent because we know that he begins each time by deleting everything he's done before.

00:55:09 [CH]

All right, here's a few notes. This has been great; this is better than I thought it was going to be. At one point I just kicked back and was listening. It was amazing, because the goal of this was for me to learn a bunch of stuff, and I learned so much more than I expected to. My first comment, and this is just a short comment, was that your taxonomizing of rank, cut and key into one group, I think it's brilliant. Never thought about it that way before. I also didn't realize that... I always, in my head, referred to those cut functions as intervals, because I guess I never looked at the box that they're in, and it says cut at the top. But there's, you know, a two-by-three and half of them are self-intervals or intervals, and so I was like, I don't know how to refer to these, so in my head I just called them intervals; I didn't realize they were called cut. But I think that's probably one of the most interesting of the things that I know about J, at least, to qualify that statement. Cut's probably one of the most interesting areas, especially coming from APL. Because APL has a couple of different... I don't even remember what they're called... the partition. Yeah, partition and partitioned enclose. And BQN doesn't even have those; it has group, which can be combined with a bunch of different things.

00:56:23 [ML]

Just one.

00:56:25 [CH]

Yeah, but J has, like, you know, six of them, and I remember talking to my boss about this, or maybe he wasn't my boss at the time, Michael, about how I thought J's version of these interval functions are really... like they managed to haphazardly design a five-argument function in J by, you know, allowing some of the arguments to be, like, underscores and values, and the first argument or last argument of the array that it's reading in. Anyways, very interesting for the listener that is J-curious: go check out cut. Maybe we'll do a whole episode on cut later. But my meta point, being the original point, is that putting them into a single group is something I had never thought about before, and it makes a lot of sense.

00:57:08 [HR]

Well, can I say, it wasn't just you. I think the fact that rank is really just another one of these partitioning verbs escaped everybody's notice until just two or three years ago. And the reason I say that is that all of the partitioning verbs have a variant where the verb can be a gerund, [15] and therefore is in effect a list of verbs. And the individual verbs are applied cyclically. So if I have three partitions and three verbs, I can apply the first verb to the first partition, the second verb to the second partition, the third verb to the third partition.

00:57:57 [CH]

Are you serious? Is this for all of them or just rank?

00:58:00 [HR]

It's for all of the partitioning, so you can do it for key, all the cuts, suffix, prefix, reduce. Yeah, all these partitioning things, with the sole exception of rank. I think somehow even in Ken and Roger's mind, rank just wasn't conceived as what it is, which is the most basic way to partition the argument. So the cyclic gerunds were only added to rank a couple of releases ago, [16] when we realized, hey, wait a minute, this should apply to rank also. So yeah, it doesn't jump out at you that rank and key and cut are all really part of one thing.

00:58:58 [CH]

I think for me too, one of the things that sort of makes it feel like it's in a different category is that, and I don't actually know this for sure, I just assume that when you are doing a sort of reduction-like operation combined with the cut primitive, and I assume maybe key as well, you're not actually materializing the nested vector or whatever that you would get. So for instance, if you're trying to figure out, and I've solved this problem in a number of languages, the maximum number of ones in a row: you can do a cut on the ones or zeros or whatever and then do a sum on each of those. In a language or a library that you're using in a different language, you might have to materialize a list of lists and then do the summations on those lists, but hopefully in J they don't materialize that behind the scenes, because you don't need to.
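
For the maximum-run-of-ones example, one J sketch using cut (made-up data): append a 0 so the last item is the fret, sum each 0-terminated run, then take the maximum.

   b =: 1 1 0 1 1 1 0 1
   +/;._2 b , 0        NB. cut on the trailing 0 fret and sum each run
2 3 1
   >./ +/;._2 b , 0    NB. longest run of ones
3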

00:59:51 [HR]

No, we... well, we do generally; I mean, maybe not for the particular case you had. But in general, if I'm gonna apply a verb to... well, yeah, a verb in J needs to have an argument that is an array. So if the argument comes from scattered rows of an array, I have to collect those into one array before I can apply the verb to it. But that's for key. For cut, when they're already contiguous, I don't have to materialize.
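
A tiny sketch of the key pattern in J, where the items sharing a key are gathered from non-contiguous positions before the verb is applied (made-up data):

   1 2 1 2 +//. 10 20 30 40    NB. sum the items of y grouped by the keys in x
40 60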

01:00:27 [CH]

Right. That makes sense. That makes sense; key you have to, because...

01:00:30 [HR]

We just operate it in place.

01:00:32 [CH]

Yeah. So we'll stick to cut then, because that is the example that we're talking about here. It's like, you know, one, then 0 1 0 0, back and forth. And I actually kind of refer to a lot of these as sliding-reduce-reduce, where you're doing some sliding operation. So I think the very first time I came across this, it was this problem that I showed in a C++Now 2019 [17] talk, where it was: given a string of ones and zeros, determine if there is a string of ones that is greater than seven in length. And one of the ways you can do this is just look at seven elements, or characters in that string, at a time, check if they're all ones, and if any of those seven-length strings are all ones, you just return true. And in, I think, Haskell, the way that's going to work... actually, I don't know about Haskell, but in many languages you'll have to materialize those windows of seven, and then do the check, and then do the reduction at the end of the day. Whereas in some language that has a primitive that is doing all of this at once, you can ignore all that stuff.
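
That check is a one-liner with J's infix adverb, with no windows materialized by hand (a sketch; the input vector is made up):

   s =: 0 1 1 1 1 1 1 1 0 1
   +./ 7 *./\ s      NB. AND over each length-7 window, then OR: any run of 7 ones?
1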

01:01:34 [HR]

That's true, yeah.

01:01:35 [CH]

Anyways, this is all just... I think cut is super fascinating in J, and I've never heard anyone sort of put them in those groups. Right, I was in the middle of making the point that rank already has your array materialized. Whereas when you're doing some operation that you pass to cut, as like a higher-order function, that seems kind of different, because you're not materializing your nested list or list of lists, versus if you're applying some rank-one operation on a matrix. Those feel like different categories, because you never actually materialize your list of lists or nested list. But that's more just the way it was in my head. If you take a step back, it is kind of the same thing: rank, key and cut, they're all applying some operation on some subset of the overall array that you're working on. Marshall?

01:02:29 [ML]

Yeah. Well, I have kind of a different view on this, which is that when I was doing BQN, [18] I decided that this class of functions, these things that apply functions to some parts of an array that's sliced up however you want, they shouldn't be operators or modifiers. They should actually be just functions that take all those parts. They give you all those parts, and then you apply the function with "each" on those. So you have, you know, one modifier, each, which does the same thing, and then you combine it with all these different functions. So an example of why that's useful, and I'm not going to claim that it's automatically better, but one place where it's useful is that, for example, instead of this gerund functionality having to be built into the language, if you wanted to apply a bunch of different functions in different orders, you could make a list of functions and use an apply function to apply each function to each cell, and then you can build the list of functions however you want to. But one primitive that is a modifier is rank, so I also didn't consider it to be part of that class, but now that I'm thinking about it, I could have. So what I could have done instead was have a function that just splits out the cells of your argument. Actually, a lot like APL's split.

01:03:56 [AB]

As boxes you mean? As boxes? Or enclosures or whatever you want to call them.

01:04:02 [ML]

So right now, you've got "enclose" which turns the whole argument into one Rank 0 array, and the way you currently do this is you call enclose with "rank" to split the cells together. There could instead be a function that takes like a left argument (that's the rank) and it splits those. I think that would be less convenient in a lot of cases, but it would also be ... [sentence left incomplete]. The function is simpler than the rank function because rank has to have these three different operands, and this thing would only ever have one argument, and you apply it to both the arguments that you want to split up if you're doing that.

01:04:38 [HR]

Yeah

01:04:38 [AB]

But you wouldn't get the pairing facility: pairing up ranks from one array with another array.

01:04:46 [ML]

Well, you would leave that to "each" which already has it.

01:04:49 [AB]

So that's the old method right? Before Dyalog added [the] rank operator from J to the language and other ... [sentence left incomplete].

01:04:59 [ML]

Yeah, you would have had to. Now Dyalog has the issue that the split function [19] is really weird. You have to call enclose where the axes are all of the trailing axes that you want?

01:05:09 [AB]

Sure. That's split being odd, but not even nested APLs like APL2 have a split at all. You have to use enclose with axis. There's no other way. And so the old method of doing it was exactly that: you have enclose with axis (OK, it's an anomalous syntax). It uses the axis operator on the left argument, but it could have had a left argument instead and then use [a] different symbol. But I also understand why it makes sense to see it as an extension, right? You have enclose: encloses the entire array. You have enclose with axis: encloses some sub arrays along those axis. And then, you can use "each".

01:05:50 [ML]

So it's interesting you get these two levels [chuckles] where first, if you don't even realize that rank exists, you implement it using a function. And then if you do, maybe you implement it as an operator. But then, you could also have this further level, which as I said, I don't know if it's an improvement or not, where you split it into being a function and just that one "each" operator. But I mean the function method is definitely not always better and we talked about the ways where windows is kind of weird on the last episode actually, so it's a choice. I think it makes the language a little simpler, but it's got its drawbacks too.

01:06:27 [CH]

Definitely food for thinking.

01:06:31 [HR]

Did you have more? You said like you were starting a list of topics, kind of.

01:06:34 [CH]

Oh yeah, that was number one. I mean, I said it was short, but then you responded saying that it's actually something that many people have missed. So hopefully, if the listeners are totally lost, maybe come back and listen to this episode in a year or so, because this stuff is ... [sentence left incomplete]. My mind was just buzzing the whole time, listening to the conversation that was happening here and I'm just like: yeah, man! I'm the listener of this podcast that just gets to hear it 4 days in advance of everyone else [others chuckle].

01:07:04 [AB]

You get to ask questions too.

01:07:06 [HR]

I should point out: you said it was hard to find "cut". For those listeners who are looking for cut in the J vocabulary, it's semicolon-dot.

01:07:16 [CH]

Semicolon-dot, yeah. And definitely, if you go to NuVoc [20], I know where to find it on the page; I've just always missed the title of that box being cut. The second thing though ([my] mind was buzzing about this the whole time): I came up with a whole new algorithm, or name for a flavor of an algorithm. We have scans, and typically the two flavors of scans that I'm aware of that already have names, back from like the 60s, are prescan and scan. And the easiest way to think about these, in my opinion, is that a scan takes a sequence of N elements and gives you back a sequence of N elements. So if you're familiar with C++ or Haskell (or APL or J), these are the scans you get as primitives, and in C++, this is our partial sum algorithm. In Haskell, this is the scanl1 algorithm. And the key property of this is, it doesn't take an initial value; it uses the first element of that sequence as sort of the first value of the output sequence, and then goes on to apply whatever binary operation (plus, multiply, etc.). Then there's a prescan algorithm, [for] which the distinguishing difference from the scan algorithm is that it takes a sequence of N elements and gives you back N + 1 elements. And it also takes an initial value. So in Haskell, whereas the scan algorithm was scanl1, the prescan algorithm is scanl. This one takes an extra argument as the initial value, and whatever that initial value is will be the starting value of your output sequence. It gets a little bit confusing in C++ because we technically have a prescan algorithm called exclusive-scan, but it actually doesn't give you back N + 1 elements; it gets rid of the last element, so it's actually N ... [sentence left incomplete].
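
[A minimal C++ sketch of the two scan flavors described above, for readers following along; illustrative only, the variable names are made up:]

    #include <cstdio>
    #include <numeric>
    #include <vector>

    int main() {
        std::vector<int> v{1, 2, 3, 4, 5};

        // "scan": N elements in, N out, no initial value
        // (std::partial_sum in C++, scanl1 in Haskell)
        std::vector<int> scan(v.size());
        std::partial_sum(v.begin(), v.end(), scan.begin());   // 1 3 6 10 15

        // "prescan": N elements in, N + 1 out, takes an initial value
        // (scanl in Haskell); the initial value is the first output
        std::vector<int> pre{0};
        for (int x : v) pre.push_back(pre.back() + x);         // 0 1 3 6 10 15

        for (int x : scan) std::printf("%d ", x);
        std::printf("\n");
        for (int x : pre)  std::printf("%d ", x);
        std::printf("\n");
    }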

01:09:05 [AB]

The last one, not the first one.

01:09:07 [CH]

Yeah, it's N + 1 - 1 [laugh]

01:09:10 [ML]

So if it got rid of the first one, that would be the first one we talked about, which is the inclusive-scan.

01:09:16 [CH]

Yeah

01:09:16 [AB]

Wait, hold on. So maybe I'm not sure I'm following then. I would have expected it to give back N elements and not include the initial value. Firstly you know what the initial value is already. It should be easy for you to add that. Secondly, there's a good chance that the initial value is a very different type than all the derived values later.

01:09:37 [CH]

So wait. Your confusion and assumptions are valid, but exclusive-scan in my opinion is just broken: it's supposed to be the prescan algorithm. So if you have [an] iota sequence from one to five and you do an exclusive-scan with zero [meaning 0 is the initial value] on that one-to-five sequence, it'll give 0, 1, 3, et cetera, and then just kill off the last value. So I consider this a broken algorithm that shouldn't even be in the mix. I probably shouldn't have mentioned it because I've confused people.
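
[The example Conor describes, as a hedged C++17 sketch; illustrative only:]

    #include <numeric>
    #include <vector>

    int main() {
        std::vector<int> v{1, 2, 3, 4, 5};
        std::vector<int> out(v.size());
        // Initial value 0; the output has N elements, not N + 1
        std::exclusive_scan(v.begin(), v.end(), out.begin(), 0);
        // out is 0 1 3 6 10: the full prescan 0 1 3 6 10 15
        // with its last value dropped, as described above
    }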

01:10:07 [ML]

Well, no, actually I think it's possibly more coherent than the inclusive-scan. What you really want in a lot of cases is this exclusive-scan plus the final result from reduction. So, I don't have a good example in my head, but the idea is that for each element, first you're going to return what you've got. Then you apply the function to that and the element and you keep going, and so at each step you're putting what you've got initially in the spot, and that's good. So one reason why I might say that's more coherent is that if you do, for example, a plus scan [+\] on an array of ones, the result (if you do it inclusively) starts with one. Which is at the wrong index, because the index is 0 and I don't want any disagreement on that! [21] But if you do the exclusive scan, it instead ... [sentence left incomplete]. I mean, in an APL you might infer an identity element of 0 to start with, but then the result starts at 0, and so the results correspond to the indexes. Another way you might say it is that the number of arguments you processed at each index is equal to that index. But then, like you say, there's the problem with throwing out the last value. So what you really want is that scan, and then the last value.
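
[Concretely, for the example Marshall gives: an inclusive plus-scan of 1 1 1 1 is 1 2 3 4, so the value at 0-based index i is i + 1, one off from the index; the exclusive scan with initial value 0 is 0 1 2 3, where each result equals its own index, i.e. the number of arguments processed so far.]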

01:11:37 [CH]

Yeah, I mean, just do a -1 drop if you don't want that last one. But anyways, we'll just try and get through the flavors, so there's scan and prescan; those already exist. Then, listening to Henry, he starts to describe the fold. And then I realize that this does not correspond (at least when you don't specify the X argument and you're just using the function created by your fold as a unary operation or monadic operation) ... this does not correspond to either a prescan or a scan, because you're given a sequence of N elements and you're returned N-1 elements. And this is in the monadic case, because the dyadic case is different as well. In the monadic case, we have prescan for N to N+1, scan for N to N, and then something (that doesn't have a name) from N to N-1. So immediately, I'm like: what do you call [it], in the progression of prescan, scan ... Probably postscan, and I was like: wow! That's amazing, postscan is the name of this new third flavor. But then he [Henry] continues to go on to explain the dyadic version, where you specify an initial value. And this is, if you're keeping up, an N to N algorithm, but where you also specify an initial value. Whereas the original scan algorithm that went from N to N was just using the first element as the initial value. So what do you call (and I don't actually have a good name for this) ... what do you call an N to N algorithm in the context of prescan, scan and postscan, where we already have an N to N algorithm? Adám has his hand up, apparently.
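
[A rough sketch of the N-to-N-minus-1 "postscan" shape Conor is naming here; "postscan" is his coinage, not a standard algorithm, and this C++ is illustrative only:]

    #include <cstddef>
    #include <functional>
    #include <vector>

    // No initial value: the first element seeds the fold, and only
    // combined results are emitted, so N inputs give N - 1 outputs.
    template <class T, class F>
    std::vector<T> postscan(const std::vector<T>& v, F f) {
        std::vector<T> out;
        if (v.empty()) return out;
        T acc = v.front();
        for (std::size_t i = 1; i < v.size(); ++i) {
            acc = f(acc, v[i]);
            out.push_back(acc);
        }
        return out;
    }
    // postscan<int>({1, 2, 3, 4, 5}, std::plus<int>{}) gives {3, 6, 10, 15}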

01:13:06 [AB]

It is obvious! "omniscan"

01:13:10 [CH]

Omniscan? What does "omni" mean?

01:13:11 [AB]

Pre and post [laughs]

01:13:14 [ML]

So we have this in BQN and K and we just call it "scan".

01:13:18 [CH]

You have this in what languages?

01:13:19 [ML]

In both BQN and K.

01:13:22 [CH]

That's right, because [in] BQN you changed the dyadic versions to take the initial elements, and that's why we have windows and not the N-wise reduce. Which I haven't made up my mind if I like that or not. Except for the cases where I do need an initial value and I'm going straight to BQN, because [chuckles] I know your dyadic overload is what I want here. And you said K also does this?

01:13:43 [ML]

I think K works the same way, where you can take an array and an initial value.

01:13:47 [CH]

Well, we can't call this flavor scan because I mean, in general they're all called scans [chuckles]. In my mind, does "omni" mean both post and pre? I thought omni just meant omniscient, you know? Got up in the clouds and see everything.

01:14:00 [HR]

Omni means all.

01:14:02 [AB]

Yeah, but you have things like minus, [which] is a prefix operator, and factorial is a postfix operator. And then you have things like absolute value, which is an omnifix [operator]. It's both on the left and on the right of the value that you're taking the absolute value of, in regular mathematics (not talking about APL, of course).

01:14:26 [CH]

I mean, I have to work through this, but what's interesting too is prescan is clearly the ... (maybe it's not clear) but prescan is definitely the fundamental scan, in that you can do everything with that scan. [This is] because one of the restrictions of the original scan that I explained (the N to N one that uses the first element as the initial value of the output sequence) is that you are restricted to the property that the type of your output sequence has to be the same as the type of your input sequence. Maybe [it's different] if you allowed heterogeneous arrays as the output sequence. But in general, in typed languages, once you've determined the type of the first element, that's going to be the type for the rest of the elements. That is very, very restrictive in many cases. So the way to get around that is to use a prescan, where you can determine whatever type you want (it can be completely different). When you get into sequential versus parallel, this changes the way you implement this, because a lot of the time, if you've got two different types for your accumulator value and the value of your sequence, you need to make sure that you're applying things as a left scan, from left to right. Because if you start to do things non-deterministically, you're going to end up with type mismatches and stuff. But I'm getting into the deep end or the weeds here; the question really is still: what do we call this? An initscan is the best thing I came up with in my head, but I don't like that at all. I kind of want something that starts with "p", but maybe [chuckles] we should just call it "scan" like Marshall said.
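
[A small hedged C++ illustration of the point about types (the example data is made up): a prescan can accumulate into a type that differs from the element type, which a scan seeded by the first element cannot express directly.]

    #include <cstddef>
    #include <string>
    #include <vector>

    int main() {
        std::vector<std::string> words{"fold", "scan", "cut"};
        std::vector<std::size_t> lengths{0};        // initial value: 0 characters seen
        for (const auto& w : words)
            lengths.push_back(lengths.back() + w.size());
        // lengths is 0 4 8 11: size_t outputs from std::string inputs
    }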

01:16:06 [ML]

Well, I mean, I do think you can add an initial value to any of these kinds of scans. Like if you have a prescan, you can also take an initial value, and then the initial value is the first element and the next result is your next element and so on.

01:16:21 [CH]

Well, I mean, the main property of a prescan is that it takes an initial value.

01:16:2: [ML]

Oh but in APL you have the identity element. You could also force it to take the identity element.

01:16:31 [AB]

But you might not have that for the particular function, right? It might be a custom function where it's not determinable.

01:16:37 [ML]

Well, yeah. I mean, but it's not any worse than reduction, where you might have an empty array and no identity.

01:16:42 [CH]

This was actually one of the biggest ... [sentence left incomplete]. Was it Sean Parent? Someone in the C++ community that I remember speaking to, and [I] value his or her opinion. I can't remember exactly who. This was probably Sean Parent, but I could be wrong about that. [He] pointed out that's actually one of the problems with APL: a lot of these primitives rely on a type having an identity value, but there's no way to go past that. What if you're defining your own type? And then how do you customize these primitives so that they know what the identity value [is]? There's no way to do that in APL. You're kind of just stuck.

01:17:17 [AB]

That's just not true. Dyalog doesn't implement that, but NARS2000 does implement that, and you can specify what the identity element is for your custom function. And you _can_ specify what the inverse is for your custom function and so on. So, it can be done.

01:17:36 [CH]

How many of the APLs up until NARS, though, did?

01:17:40 [AB]

I don't think anybody did.

01:17:44 [CH]

So if your response is like it's possible, but it doesn't exist, that's not really like ... [sentence left incomplete].

01:17:49 [AB]

No, it's been specified how this can work. If I remember right, it was specified before NARS2000 implemented it. It's just that none of the big APLs bothered to, because it's really rare that you actually need this.

01:18:02 [CH]

Yeah, I mean, that's kind of like: I really want zip (which is a function that exists in many languages; C++ doesn't have it, but C++23 is getting it). And if someone was talking to me like: "I just can't deal with the fact that C++ doesn't have zip", and my response is: "don't worry, it's here in C++23", and they're working at a company that's upgrading from C++11 to 14 (which means that they'll get it in like 15 years), it's true, but it doesn't really help that person.

01:18:28 [AB]

It's not like it can't fit into the language

01:18:30 [CH]

Yeah, so I guess I take back [that being] one of the problems with APL. It could exist. It just doesn't in most of the more popularly used APL versions in history.

01:18:43 [BT]

And then maybe a step beyond C++ is, when you're talking about implementations, it's been defined. It's just maybe not implemented yet. It's not like it's going to break something in order to put it [in].

01:18:54 [CH]

Yeah. And technically, there are many libraries that have zip, but package management is not an easy thing [chuckles] and there are many companies that don't like you having third party dependencies. Anyways, this is a whole meta discussion. We don't have a name for scan or the 4th scan, but still, postscan is quite nice, and I guess my final comment ... [sentence left incomplete]. (I just realized, though, we're way over time, but hey, welcome to arraycast! [laughs]). It sounds like one of the motivations for this fold primitive would disappear if ... [sentence left incomplete]. I'm not sure actually if J has this (this is just top of mind because I've been doing this test on APL and I'm planning to do it on BQN and J as well), but over the last week I was profiling [a] very simple iota-map-map-reduce sequence. So like iota of 10,000; then like a 1-plus, then a 2-times and then a reduction. And technically, if you go above 10,000 elements, you might need to do some modulus so you don't overflow or have to promote to a floating point type. But I was hoping that the final expression, the one that included the reduction, would be the fastest of all of them, because a sophisticated enough interpreter would recognize that with the generator primitive iota, followed by two mapping operations, one-plus and two-times ... if those are your expressions, you have to materialize that vector at the end of the day. Whereas with the reduction, you can technically just boil that whole thing down into a fold, where there's O(1) memory. You don't need O(N) memory in the size of the input. But at least in the Dyalog APL code that I tried, the first version took X time, the second version took 2X, the third version took 3X and the final version took like 3.3X or something like that. And it sounds like the motivation behind this fold thing is (especially, like you said, if you've got some massive amount of state) that you could actually spell some of this stuff, in the simplest cases (not in the case where you need like 3 different pieces of state), as just a map or a transform on your initial sequence and then some kind of reduction at the end of the day. But it sounds like using this fold primitive will avoid materializing a vector, or a scanned sort of result value, of all these unnecessary pieces of state. So when Adám said it sounds like there have actually been some performance things, in my mind I was thinking: for a certain category of problem, this fold will definitely be faster than doing it naively without the fold primitive. But maybe I'm wrong about that. And then the second thought I was having is that if this kind of fusion technique was built into the language, it might be the same thing as what fold is doing anyway. I'm not sure if any of that made sense.
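
[A hedged C++ sketch of the pipeline Conor describes; the size and names are made up. The naive version materializes the intermediate arrays, while the fused version collapses the whole thing into one O(1)-memory fold:]

    #include <algorithm>
    #include <cstddef>
    #include <cstdint>
    #include <numeric>
    #include <vector>

    int main() {
        const std::size_t n = 10000;

        // Naive: materialize iota, then the mapped array, then reduce
        std::vector<std::int64_t> v(n);
        std::iota(v.begin(), v.end(), 1);                          // 1 2 3 ... n
        std::vector<std::int64_t> mapped(n);
        std::transform(v.begin(), v.end(), mapped.begin(),
                       [](std::int64_t x) { return 2 * (x + 1); }); // 2-times after 1-plus
        std::int64_t slow = std::accumulate(mapped.begin(), mapped.end(),
                                            std::int64_t{0});

        // Fused: one pass, no intermediate arrays at all
        std::int64_t fast = 0;
        for (std::int64_t i = 1; i <= static_cast<std::int64_t>(n); ++i)
            fast += 2 * (i + 1);

        return slow == fast ? 0 : 1;
    }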

01:21:49 [HR]

It did make sense, but I don't know that J is ever going to do that sort of analysis of the primitive string.

01:22:03 [AB]

I think there was a version of K that had arithmetic progression vectors built in, so that if you did the equivalent of iota 10, it would return that and not actually evaluate it. It's lazy about it, and that means if you multiply by two, it will just remember that, say, it's two times that, and then if you add one, it will remember that it's one plus that. And then when you sum it (I'm not sure, actually; it would probably materialize at that point to the full thing), it's fairly easy for the summation to recognize: "oh, I've got myself an arithmetic progression; actually, there's a direct closed formula for the sum". But what's the point of optimizing that in such languages? We could be clever, right? We could say plus-slash-iota, and optimize that with that formula children learn in school. But seriously, nobody's going to do plus-slash-iota in a real application. If you actually need [it to be] that fast, you just write the formula.
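
[The schoolbook formula Adám alludes to: the sum of 1 through n is n×(n+1)÷2, so +/⍳10000 could in principle be answered as 10000×10001÷2 = 50005000 without ever materializing the vector.]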

01:23:08 [CH]

It's a trivial example, but there's a reason that there's like a whole sector (industry) built around MapReduce, right? Which is technically what you're doing all the time in APL.

01:23:17 [ML]

So the fusion is a different operation. I mean, the reason that MapReduce is an entire application is that optimizing just MapReduce alone is pretty hard [chuckles]. So in the APL family, we pretty much just don't even go that far. We just optimize the individual primitives, which is hard enough, and which is something weird that no language has really gotten to the bottom of (there's all sorts of stuff that just hasn't been done). I mean, the thing is, you can't do it like a C compiler would do it and work on it one number at a time, and say: "Alright, I'm going to take this number, multiply by two and add it to the result", because that's really slow. You're not using the CPU's vector instructions like a well implemented APL primitive would. Now, C compilers do have auto-vectorization, and that's, I guess, kind of in the area of what you'd be pointing at. But auto-vectorization doesn't apply very often. I mean, you have to write your code in a very particular way to get it to work, and even then the generated code is often pretty bad.

01:24:40 [CH]

Is the vectorization going to outweigh the memory allocation of the intermediate results though? Like I have no idea.

01:24:51 [ML]

I mean, vectorization gives you an automatic four times speedup, at least.

01:24:54 [HR]

Yeah, you would have to be able to bunch together several primitives in a row and figure out some way to operate on them in registers before you have a chance of beating the vectorized primitives. And I just don't think it's worthwhile. And the other thing is, once you start operating with named functions (you know, if it's not "plus" [but] some user function), you can't optimize over it, because the user is allowed to redefine functions. That's a project for the next generation as far as I'm concerned.

01:25:29 [CH]

All of the APLs, Dyalog APL, BQN, and J ... do they all have these kinds of vectorized primitives behind the scenes to some extent?

01:25:36 [HR]

To a great extent.

01:25:39 [CH]

Yeah, that is actually true. When I did those, whatever, 10 different live streams a couple years ago [chuckles], I spent I think 3 or 4 hours ripping through all these AVX, you know, intrinsics ... It took a while, but I was like: "wow, there's a lot of work that's been done here".

01:25:53 [BT]

Now, you say you can't optimize it with regard to time, but would you be able to optimize it for space? Because you could do it in place, since you're just applying a function within the fold.

01:26:05 [HR]

J already does that.

01:26:07 [ML]

Well, I mean you can write your code in such a way that you just move all the extra functionality that you want in the map into the reduction. So you say, I have my initial value and I want to apply the function to the initial value and then "1 plus", "2 times" whatever the new value is. And then you get good space usage. But then J probably can't vectorize anything. [22]
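
[A hedged C++ sketch of what Marshall is describing, illustrative only: the "1 plus", "2 times" work is done inside the reduction step, so no mapped array is ever materialized.]

    #include <numeric>
    #include <vector>

    int main() {
        std::vector<int> v{1, 2, 3, 4, 5};
        long total = std::accumulate(v.begin(), v.end(), 0L,
            [](long acc, int x) { return acc + 2 * (x + 1); });
        // total is 40; the map has been folded into the reduction
        return total == 40 ? 0 : 1;
    }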

01:26:32 [BT]

No, but the same thing would apply to fold as you're talking about with reduce. Fold works the same way. If you do all your operations in that in-between step, it would still work the same way.

01:26:43 [HR]

And in J, the intermediate state will almost always be rewritable; it will be inplaceable. So the chessboard we were talking about, well, probably not. But if something wasn't boxed (if you had an unboxed array), it could be modified in place over and over again, and never reallocated.

01:27:10 [CH]

We know what I'm doing this afternoon: I'm going to go test this stuff, because if it is inplaceable, then instead of it being like X, 2X, 3X, 3.3X or whatever it is, I would expect it to be X, 1.1X, 1.2X and then 1.4X, because the main work in that operation is the reallocation, I would expect. Although things are being vectorized. I don't know.

01:27:40 [ML]

But you have to do an initial allocation too. Every step does an allocation.

01:27:43 [CH]

If it's doing it in place that's the whole point, right?

01:27:46 [HR]

Yes, if you do this in J and you make it inplaceable, well, it'll be 1 allocation and then add one in-place and multiply by two in-place. That's a significant speedup.

01:28:03 [ML]

Well, I'm not sure about that. I mean, it's not the allocation that's expensive; it's using memory that's not already in the cache.

01:28:10 [HR]

Exactly

01:28:11 [ML]

And if you're timing with a repeated operation (if the memory manager is good and you're reusing memory) once you finish the operation, you get rid of the old memory and the allocation is probably gonna happen in the exact same place.

01:28:23 [HR]

You might be able to get by with double the needed cache usage. Yes, that's possible if you're careful to discard things early. This was a big part of what I did in the last release of J, to make sure these arguments are discarded as quickly as possible.

01:28:46 [ML]

Yeah

01:28:47 [CH]

So I've lost you. Say what you said again, Marshall? That it's not actually the allocation that matters.

01:28:53 [ML]

Yeah, I mean allocation is just, you do some work and you say: "all right, I'm gonna use this bit of memory" and you never touch the memory. So I mean it's a fixed cost. It's gonna have nothing on operating on a 10,000 element array. But then the problem is that you actually have to use that memory, and if it's not already in the CPU cache, then you have to drag it out of RAM into the cache, and that's pretty slow relative to these fast array operations. If you can get the allocator to use memory that's already in the cache, even if it spills out of L1 or L2, L3 is a whole lot faster than RAM. So if you can use something that's vaguely like something you've seen before then that's kind of all you need. To have it perfectly (each operation taking the same amount of time), you would have to allocate into the lowest level that it will fit in. But anyway, if you're throwing out everything at the end of each step and then restarting the step again, it should be possible for the allocator to give you that same memory pretty much.

01:30:06 [HR]

It depends very much on how big the operands are, particularly with respect to L2 cache.

01:30:14 [ML]

Well, I mean, if they're too big, they just don't fit in cache and then what you have to do is chunk the operation into parts that will fit in the cache. And that's something that the array languages don't do for you either.

01:30:26 [HR]

Well, that's on our work list. That will be important for very large ... I mean, for operands that are much bigger than L2 cache.

01:30:39 [ML]

Yeah, we've been looking a bit at that too.

01:30:47 [HR]

Yeah, I think that will turn out to be important, but my measurements are that you can have what I would call a 2 address addition where you do A+B and store the result into A. Or three address addition which is A+B and store the result into C. And I find mostly that the two address operation is something on the order of 30% faster and part of it's because even though you haven't touched C [the aforementioned variable], just writing into C is gonna cause it to be cast out ... [sentence left incomplete]. Something has to make room for C in the L2 cache, and so if I read from A and B and write to A, then no cast outs from the cache. Everything operates in place. When I write into C, there are going to be cast outs and that is a hidden tax on the cache bandwidth available in L2 [cache]. L2 is pretty much saturated with reading and writing for the arithmetic operations. And if you add on any requirement to cast out cache lines, that shows up as a generalized slow down. It's really hard to put your finger on where it came from because it's just something that cache decided it had to do along the way.
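
[A hedged C++ sketch of Henry's two-address versus three-address addition; the roughly 30% figure is his measurement, and these loops are only meant to show the two access patterns:]

    #include <cstddef>

    // Two-address: A <- A + B. The result reuses A's cache lines,
    // so nothing new has to be cast out of L2 to make room.
    void add2(double* a, const double* b, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) a[i] += b[i];
    }

    // Three-address: C <- A + B. Writing C forces cast-outs from the
    // cache to make room for C's lines, a hidden tax on L2 bandwidth.
    void add3(double* c, const double* a, const double* b, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) c[i] = a[i] + b[i];
    }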

01:32:08 [ML]

And so this is even if you use the same function just based on what pointers you pass in?

01:32:13 [HR]

Yes, right. So there's an advantage to operating in place on arguments, in that there's several releases of J ... [sentence left incomplete]. J does a pretty good job now of reusing operands when it can, at the peril of making things not immutable anymore, and there are an awful lot of bugs that can come up because of that. It's been around a while.

01:32:37 [CH]

Are there any tools? I know that Valgrind [23] is a tool used in C++. It's great in terms of the info it gives you, but it runs ridiculously slow because it has to make a bunch of changes, but I'm not sure if you can use that kind of thing with like a different executable. Like you know, J or something like that. Are there other ways to measure this stuff in any of APL, J, or BQN?

01:33:00 [ML]

So there's a Linux tool called "perf" that you can run on any executable, and it runs at pretty much the same speed. I don't know exactly what it does, but it collects samples at short intervals and it can tell you, like, the number of cache misses and so on that happened.
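
[For example, something like "perf stat -e cache-misses,cache-references ./your-program" prints cache-miss counts for a whole run, and "perf record" followed by "perf report" attributes samples to individual functions; the exact event names available vary by CPU.]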

01:33:21 [HR]

It does timer interrupts and tells you where it is and has samples of performance counters in different places.

01:33:27 [ML]

Which is like a few percent performance difference. It's hard to even notice.

01:33:31 [HR]

Yeah, but it does a very good job. And if you've got one lousy instruction that's taking time, that will show up after enough samples. Oh, that tool is called Valgrind.

01:33:43 [CH]

Am I mispronouncing it ... "valgrend"?

01:33:45 [HR]

Yes, that comes from the name of some mythological Norse ... dude, I think. [chuckles] You think it means grind on values, but it's not that at all!

01:33:58 [CH]

[laughs] Alright, we're way past our hour mark so we should probably wind down here.

01:34:04 [ML]

We're past the "past the hour mark"! [everyone laughs]

01:34:07 [CH]

Is there any last question or comments? Anyone?

01:34:11 [BT]

Well, my observation is people should go in and try fold if they're in J. The more people that do, the more likely fold will become quicker because it will be converted into C.

01:34:21 [ML]

And the weirder this boxing glove gets. Split into two fingers.

01:34:24 [HR]

Well, we'll upgrade to a new glove!

01:34:30 [BT]

And also, you learn a lot about the assumptions you're making about some of the ways you're replacing loops, because it does break it open and give you new power, which means that you have to look at these things quite closely, as Conor's various types of post and pre and omni [scans] show. When you're used to doing it a certain way, it's a simple tool. And then when it opens up, now you have to start thinking about how you're using it, but you get the power of that. So it's worth looking at for sure.

01:34:58 [AB]

And K's forward and backward slashes, and q's. Definitely look at those, too. They're really fascinating.

01:35:05 [CH]

Well, don't look at them yet. We'll have an episode on it and then you can look at it.

01:35:09 [AB]

Right.

01:35:12 [CH]

[laughs] All right. Thank you so much for coming on, Henry. This has been, like I said, even better than I was hoping for. So much food for thought, and I learned a ton. And definitely, we'll be playing around a bit more with this stuff. Maybe even doing some Linux profiling stuff, because yeah, this stuff is super exciting. And I think we'll throw it to Bob once again ... if you want to give us feedback on this episode, or questions for Henry, or whatever it is, you can reach us at ...

01:35:39 [BT]

contact@arraycast.com. [24] It remains the same; it's a constant in your life. If you want to get in touch with us, contact@arraycast.com. Thank you again to Igor and Sanjay for the work they do on transcribing. We were a little late two weeks ago getting the transcription out; I think we were like 10 hours late. Those things happen, but we got it up as soon as we could. Hopefully this one will arrive with transcription intact. That is the plan.

01:36:08 [CH]

Awesome. Once again, Henry, thanks for coming on and spending all your valuable time with us. This has been awesome.

01:36:13 [HR]

Think about TPLs

01:36:15 [CH]

And with that, we will say happy array programming!

[everyone]

Happy array programming!

[closing theme music]