alt-theme?

haskell and its computational model

1. introduction

Haskell is a functional programing language, not just any but a pure functional programing language. This is its main selling point, but not what most people know haskell for. What haskell is known for is a reputation for being difficult and weird, doing things completely differently and inventing words made to awaken a primal fear in peoples minds, like "Monad" and "Functor" or "Applicative".

If you know me you know that haskell is one of my favorite languages and often times my go to, I picked it up in september for my functional programing course and have been very pleased with what it offers, so why would I ever choose this language. Could it be that I enjoy the mental masturbation that comes with being able to feel superior when saying "I code in haskell BTW", or maybe I don't care about building actuall things and instead just like staring at my new program that does a whole lotta nothing. Well no if I did I would probably not call that out. I believe the main reason I choose haskell has to do with how it represents computation.

2. An Example that will hopefully make sense after everything

Say we need to do something like, read in a list of files and just print out their contents, in most languages this is a rather easy task, so lets write an example real quickly

char** files = ["f1.txt", "f2.txt", "f3.txt"];
size_t file_count = 3;
for (size_t i = 0; i < file_count; i++) {
    char buffer[100] = {0};
    FILE *fptr = fopen(files[i]);
    fgets(buffer, 100, fptr);
    printf("%s", buffer);
}

Now the reason why I wrote some code, and specifically C code was to demonstrate the architecture we work in, namely the Von Neumann architecture. We allocate memory, write to that memory, print it out and then free the memory (indirectly via the callstack). The Von Neumann architecture has a lot of advantages as it is way closer to reality then other computational architectures. It's rather easy to geuss what is happening (due in part because we are working in C) and you can clearly tell what is going on since we are familiar with it. Now back to haskell how would something like this look in haskell?

main :: IO ()
main = do
  let a = ["f1.txt", "f2.txt", "f3.txt"]
  contents <- mapM readFile a
  mapM_ putStrLn contents

Our implementation ends up being 3 lines long (not counting the function declaration and type signature). Ofcourse size doesn't tell us much since if it did we would write languages that express everything in single symbols. But the question is ofcourse, what is happening?

2.1. a little explanation for (normal) non-haskell programmers

As you probably guessed due to my excesive yapping about the architecture we assume while writing the C implementation, haskell does not follow the Von Neumann model, so thinking about what is being put into memory, what isn't and what has been allocated won't help us much, so how would we explain this?

Well lets first look at line 2 (since I feel like line one is easy enough to understand as making a list of file names) in line 2 we first want to look at readFile, readFile is a function which is being passed as an argument to mapM, in a c style we would write it as mapM(readfile, a). readFile takes in a String (a path to a file) and returns another String (the contents),,, sort of. See that little IO () above our actual implementation, that means "something stored in an IO container that returns nothing" the reason we have IO containers instead of just directly returning a string is because we actually can't do IO in pure functional programing, instead IO functions as a sort of window into the real world, and just like an actually window we need to open it to get whatever is behind it. readFile returns a String inside of an IO container, that arrow (contents <-) opens the IO window and takes the value out. but ofcourse you might be wondering what "mapM" does, well lets imagine we use map instead, since readFile return IO String, we would have a list of IO String, meaning that we can't extract it, since instead of having an IO window to a list of string we have a list of IO windows to String. mapM will first use map, and then turn the List of IO String into an IO List of String. We bassicly grab all the smaller IO containers and place them into one bigger container. So the arrow can now unpack that into contents. We now have list of String, so we did part 1, namely reading all of the files in, so now for part 2, namely printing all the lines out.

To explain this you need to know that haskell is "lazy", meaning that it won't run something untill it actually needs it. The reason why I am telling you this now is because it does actually play a role in how our next line work, so lets think about what we do. we use mapM_, which is like mapM but it just doesn't return anything, then we map our list of contents with,, putStrLn? Well, the reason why this might sound confusing to you, is because how do we map a print command, print doesn't return anything so how would u map a list to print. Well this is where Lazyness comes in, what we just created isn't an action, atleast not yet, an IO container is not just a value. It's an action as a value, we can just extract the return value from the action. To explain it better, look at what we just did, we took our list of contents, and mapped it to an action, so an IO action that prints the content. This action is a value, one that is not executed, so now we have a list of actions that return nothing, so how do we execute it? Well for reason I will explain in a bit more detail soon, just writing an IO container and then not assigning it to a variable in a do block will execute it.

So that is a very VERY simplified idea of what is going on.

3. explaining a bit more

So, what ACTUALLY happened? Well for this I have to use some vulgar language that might scare some people, namely Monads. For this explanation all you really need to know is that a monad is a container of a value, it supports 2 functions, return which takes in a value and wraps it in a container, and bind which takes a function that returns a monad, and then uses that function on the monads contained value, here's some nonsensical psuedo example to hopefully clear it up

f() = function that takes in type a, and returns a monad of type b
[a] = monad containing type a

return(a) = [a]
bind [a] f = f(a) = [b]

The reason you need to know this is because IO is a monad. Haskell is purely functional, but the real world isn't, so in order to interact with the real world, we say that the IO monad encapsulates the real world and optionally a value, which might sounds weird to try and understand, but bassicly what it means is that, if actions happen in the IO monad, they can affect the real world, and will return values based on what the real world is.

3.1. do notation

the do notaion is actually, a chain of binds, lets rewrite it as such in psuedo code

a = ["f1.txt", "f2.txt", "f3.txt"]

bind (bind (IO, (mapM readFile a))), (lambda contents : mapM_ putStrLn contents))

IO bound with mapM readFile returns a [String], which will then be stored in our IO monad, which the mapM_ puStrLn then extracts and uses in it's own action.

3.2. connecting everything

So what is actually happening is we describe the action of reading as a function that takins in the real world (through the IO monad) and a filepath, and then returns the new world that might have changed, and the contents of the file. Then we use the print function, which takes in a string to print and the real world, and return the real word after printing and nothing.

4. ok but the original point, why this?

Well I just did a lot of explaining, alot of explaining about how haskell does this. Now this might be very difficult to understand if your not used to it, but lets take a birds eye view of everything. In C what did we need to think about, well first we had to make a sized list, we need to then allocate that, afterwards we loop over the list of file paths, then we initialize a buffer we can write to, then we open the file returning a file pointer, then we read the file and write to the buffer, and lastly we then print it off. In haskell what we needed to do was, take a list of filepaths, map them to their contents, and then map them to the action which prints them.

What I'm trying to show is that, while haskell feels difficult, once u get used to the way it models computation you start realising that it's a lot simpler then the typical Von Neumann architecture, ofcourse most modern languages try to hide it away but once u get any issue you will often need to think about the architecture, for example in java and C# understanding what data holds which reference to which object is always important, in haskell things like that are things u never really think about.

5. conclusion

I hope that I didn't come across as frantic since it's kind of hard to explain something so different to the core like haskell compared to other languages like C. What I really wanted to explain is that, while haskell can be seen as a very weird language, once u understand the way haskell expresses computation you start realising that the language is, well honestly very simple and straightforward, and it is mostly due to the simplified architecture. This is also the reason why I personally prefer haskell, complicated projects don't feel like they start growing to a point where I can't keep up with everything as quickly and I can use the same mind set in bassicly every situation since even when working with libraries u just follow the logic of the underlying architecture, since a simpler architecture means that abstractions are closer to the actual language core.

Now I'm not saying that the Von Neumann architecture is bad, I still write projects without haskell, like for example with C or rust, and I can definitely see why some would prefer Von Neumann architecture, after all the only way to write speed optimised haskell isn't by understanding a computer but by understanding how the compiler works under the hood. But what I do hope you got from this post is that it is worth seeing what can and can't be done by simplifying an underlying structure, using something like haskell has made me realise how straightforward some problems can be and it heavily affects how I write software, especially as a growing computer science student.

But yeah that's all I had to say, cya!