BioHacker News | What makes code hard to read: Visual patterns of complexity (2023)

▲What makes code hard to read: Visual patterns of complexity (2023) (seeinglogic.com)

414 points by homarp 112 days ago | 48 comments

▲feoren 112 days ago

> Chaining together map/reduce/filter and other functional programming constructs (lambdas, iterators, comprehensions) may be concise, but long/multiple chains hurt readability

This is not at all implied by anything else in the article. This feels like a common "I'm unfamiliar with it so it's bad" gripe that the author just sneaked in. Once you become a little familiar with it, it's usually far easier to both read and write than any of the alternatives. I challenge anyone to come up with a more readable example of this:

    var authorsOfLongBooks = books
        .filter(book => book.pageCount > 1000)
        .map(book => book.author)
        .distinct()

By almost any complexity metric, including his, this code is going to beat the snot out of any other way of doing this. Please, learn just the basics of functional programming. You don't need to be able to explain what a Monad is (I barely can). But you should be familiar enough that you stop randomly badmouthing map and filter like you have some sort of anti-functional-programming Tourette's syndrome.
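JavaScript arrays have `filter` and `map` but no `distinct()` (Kotlin collections and C#'s LINQ do have a distinct step). Here is a runnable sketch of the chain above, assuming plain objects with `author` and `pageCount` fields and made-up data. The missing `distinct()` is emulated with a `Set`:

```javascript
// Made-up sample data; only the fields used by the chain matter.
const books = [
  { title: "Book 1", author: "Ana", pageCount: 1200 },
  { title: "Book 2", author: "Ana", pageCount: 1050 },
  { title: "Book 3", author: "Ben", pageCount: 320 },
  { title: "Book 4", author: "Cy", pageCount: 1400 },
];

// Arrays have no distinct() in JavaScript, so a Set drops the duplicates.
// A Set iterates in insertion order, so each author keeps its first position.
const authorsOfLongBooks = [
  ...new Set(
    books
      .filter(book => book.pageCount > 1000)
      .map(book => book.author)
  ),
];
// authorsOfLongBooks → ["Ana", "Cy"]
```

Wrapping the chain in a `Set` breaks the strictly left-to-right reading. That is arguably a point in favor of libraries that offer `distinct()` as a chainable step.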

▲seeinglogic 112 days ago

This comment seems unnecessarily mean-spirited... perhaps I just feel that way because I'm the person on the other end of it!

I agree the code you have there is very readable, but it's not really an example of what that sentence you quoted is referencing... However I didn't spell out exactly what I meant, so please allow me to clarify.

For me, roughly 5 calls in a chain is where things begin to become harder to read, which is the length of the example I used.

For the meaning of "multiple", I intended that to mean if there are nested chains or if the type being operated on changes, that can slow down the rate of reading for me.

Functional programming constructs can be very elegant, but it's possible to go overboard :)

▲atoav 111 days ago

To me the functional style is much easier to parse as well. Maybe the lesson is that familiarity can be highly subjective.

For example, most of the time I prefer a well-chosen one-liner list comprehension in Python over a loop with temporary variables and nested if statements. People who write a list comprehension usually don't give it side effects, so once I understand that block of code, I know it stands on its own.

The same is true for the builder style code. I just need to know what each step does and I know what comes out in the end. I even know that the object that was set up might become relevant later.

With the traditional imperative style that introduces intermediate variables, I might infer that those are just temporary. But I can't be sure until I read on, so I have to keep those variables in my head. That leaves me with many more possible ways the code that follows could play out. The intermediate variables have the benefit of clarifying steps, but you can get that with a builder pattern too if the interface is well chosen (or if you add comments).

This is why, in an imperative style, variables that are never used should be marked. One convention is an underscore prefix like _temporaryvalue, and Rust even builds this into the compiler, which warns about unused variables unless their names start with an underscore. But guess what: to a person unfamiliar with that convention this increases mental complexity ("What is that weird name?"), when it should actually reduce it ("I don't have to keep that variable in my head, as it won't matter later").

In the end, many things boil down to familiarity. For example, in electronics many people prefer to write 4.7 kΩ as 4k7, because it prevents you from overlooking the decimal point and making an off-by-an-order-of-magnitude error. As you can imagine, this was particularly important in the golden age of the photocopier. However, show that to a beginner and they will wonder what it is supposed to mean. Familiarity is subjective, and every expert was once a beginner coming from a different world where different things were familiar.
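The 4k7 convention is usually called RKM code and is standardized in IEC 60062. It is simple enough to parse mechanically, because the letter both marks the decimal point and sets the multiplier. Below is a minimal sketch that handles only the letters R, k, M and G. The standard defines more letters than these, and `parseRkm` is a hypothetical helper name.

```javascript
// Power of ten that each letter stands for (only a subset of the letters in use).
const EXPONENTS = { R: 0, k: 3, M: 6, G: 9 };

// Parse resistor codes such as "4k7", "470R", "R47" or "1M0" into ohms.
function parseRkm(code) {
  const match = /^(\d*)([RkMG])(\d*)$/.exec(code);
  if (match === null || (match[1] === "" && match[3] === "")) {
    throw new Error(`not an RKM code: ${code}`);
  }
  const [, whole, letter, fraction] = match;
  // The letter marks the decimal point, so "4k7" becomes the string "4.7e3".
  // Parsing that string directly yields the double closest to the written value.
  return Number(`${whole || "0"}.${fraction || "0"}e${EXPONENTS[letter]}`);
}

// parseRkm("4k7") → 4700, parseRkm("R47") → 0.47, parseRkm("1M0") → 1000000
```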

Something being familiar to a beginner (or someone who learned a particular way of doing X) is valuable, but it is not necessarily an objective measure of how well suited that representation is for a particular task.

▲jghn 111 days ago

> Maybe the lesson is that familiarity can be highly subjective.

Over the years I've come to firmly believe that readability is highly subjective. And familiarity is a key contributor to that, but not the only one. There are other factors that I've found highly correlate with various personality traits and other preferences. In other words, people shouldn't make claims that one pattern is objectively more readable than another. Ever.

I've reached the point where anyone who claims some style is "more readable" without adding "to me" I just start to tune out. There's very little objective truth to be had here.

What should one do? If on a team and you're the outlier, suck it up and conform. If you're on a team and someone else is the outlier? Try to convince them to suck it up and conform. If you're on a new team? Work empathetically with your teammates to understand what the happy medium style should be.

▲atoav 110 days ago

Pragmatic but good advice that I would recommend anybody follow in their daily practice.

However, I'd defend the notion that at the bad end of things you can have (objectively?) hard-to-read code. By "hard to read" I don't mean "nobody can figure it out with time". I mean that figuring it out takes 99% of programmers longer than the equivalent in another language or style. As you rightly point out, this is a statistical observation and not a universal law of nature. So it matters a lot which cohort of people you look at, and recommendations stemming from such observations should be taken with a grain of salt.

Yet underneath all of that, isn't there such a thing as a truly, objectively hard-to-read style? Brainfuck, for example, is an objectively hard-to-read language; they even put that into the name. Does that mean there is not a single person who can read it fluently? Probably not, but that doesn't invalidate the point. A double black diamond ski run is objectively harder to ski precisely because fewer people are able to ski it.

If you see programming as working with language and symbols to achieve some behavior, then some patterns clearly match what most people already know (what is familiar to them) better than others. If you ask non-programmers or non-mathematicians to describe some action in any way they like, they will probably use their everyday language. So code that looks somewhat like how a regular person would write it down is surely on the "very familiar" end of things. Now, I did argue that maximum familiarity to regular people is not in itself a desirable goal. We need to make a trade-off between familiarity and suitability for expressing program structures and operations. The latter is not how regular people think at all, so using them as an absolute guide isn't a good idea. However, figuring out how to write things so that they are both expressive in terms of program behavior and easy to reason about is a desirable goal. There just isn't a single right way of doing that, and sometimes good enough is just that.

▲jghn 110 days ago

For sure. I think we're channeling similar sentiments. Obviously there is *some* amount of objectivity here. No one is going to look at a million lines of Brainfuck code and say "this is easy to grok". It's just that these objective truths make up a tiny fraction of the crap people debate when these topics come up.

And yes, maximal familiarity is a huge aspect. Many years ago I got into an argument with a colleague about FP patterns. He contended that they were objectively harder to reason about and cited how hard it was for new hires to ramp up. I contended that this was untrue, and that those new hires simply came from a world where they rarely encountered FP patterns. For practical purposes the end result is the same, I'll admit. But to your point: accepting that the missing ingredient is familiarity, rather than the patterns being objectively bad, changes how one might approach the problem.

▲Mawr 107 days ago

Funny, as years pass by I slide more and more towards the opposite. For most aspects of readability people discuss, there are objectively correct choices, with some leeway for the specific circumstances.

Code that interleaves high and low level concerns, that has a lot of variables used all over the place, is deeply nested, etc. vs code that is modular, keeps the nesting shallow, splits up the functionality so that concerns are clearly separated and variables have the smallest scopes possible.

One of these styles is better than the other and it's not subjective in the least, so long as you're a human. We all have roughly the same limitations in terms of e.g. memory capacity so ways of programming that reduce the need to keep a lot of stuff in memory at a time will be objectively better.

> I've reached the point where anyone who claims some style is "more readable" without adding "to me" I just start to tune out. There's very little objective truth to be had here.

Adding qualifiers all over the place is just a defensive writing style, best fitting online forums like this one where people will nitpick everything apart. However, it's not reasonable to pay so much attention to someone's particular word choices.

Objective truth is very easy to find actually. The problem you're solving is objective and the desired outcome is too. It's just a matter of analysing the problem space to find the correct design. The feeling of subjectivity largely just comes from the high complexity of the problem space.

> If on a team and you're the outlier, suck it up and conform.

It's way more complicated than you make it sound. It's entirely possible the entire team is wrong.

A general strategy when joining a new team is to follow what the team does exactly and not attempt to make changes until you learn why the team does things the way they do (Chesterton's Fence). Once you understand, you can start suggesting improvements.

▲cstrahan 109 days ago

> In other words, people shouldn't make claims that one pattern is objectively more readable than another. Ever.

Maybe you haven't seen the sort of code I've seen.

Consider this specimen, for example:

  // Call a function.
  // Function call syntax isn't very visible (only a couple parens tacked onto an identifier);
  // this is a major win for readability.
  function callFunction(func, ...args) {
    return func(...args);
  }
  
  // Perform math.
  // Math symbols are arcane and confusing, so this makes it much more clear.
  // I'm trying to program here, not transmute lead into gold!
  function doMath(left, op, right) {
    const OPS = {
      "less than or equal to": (l, r) => l <= r,
      "minus": (l, r) => l - r,
    };
    // TODO: add support for other operators.
  
    return callFunction(OPS[op], left, right);
  }
  
  const MEMO_TABLE = {}
  
  function fibonacci(n) {
    if (callFunction(doMath, n, "less than or equal to", 1)) {
      const result = n;
      // Memoization makes this fibonacci implementation go brrrrrrrrr!
      MEMO_TABLE[n] = result;
      return result;
    } else {
      const result = callFunction(fibonacci, callFunction(doMath, n, "minus", 1)) + callFunction(fibonacci, callFunction(doMath, n, "minus", 2));
      // Memoization makes this fibonacci implementation go brrrrrrrrr!
      MEMO_TABLE[n] = result;
      return result;
    }
  }

Yes, it was an intentional choice to never actually make use of MEMO_TABLE -- this (reproduced from memory) is a submission from an applicant many, many years ago (the actual code was much worse, it was somehow 3 to 4 times as many lines of code).

Compared to:

  function fibonacci(n) {
    if (n <= 1) {
      return n;
    } else {
      return fibonacci(n - 1) + fibonacci(n - 2);
    }
  }
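The specimen above writes to `MEMO_TABLE` but never reads from it, so it gets none of the benefit. For contrast, here is a sketch of the clean version with memoization actually applied. The `memo` name is ours, not from the original submission.

```javascript
// Cache of already-computed results, keyed by n.
const memo = new Map();

function fibonacci(n) {
  if (n <= 1) {
    return n;
  }
  // The cache is consulted before recursing, so each n is computed once.
  if (memo.has(n)) {
    return memo.get(n);
  }
  const result = fibonacci(n - 1) + fibonacci(n - 2);
  memo.set(n, result);
  return result;
}

// fibonacci(50) → 12586269025
```

Without the lookup, the plain recursive version makes tens of billions of calls for `fibonacci(50)`. With it, the work grows linearly in `n`.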

> In other words, people shouldn't make claims that one pattern is objectively more readable than another. Ever.

Taken literally, I can't argue with this. But if you amend that to "people shouldn't make claims that one pattern is objectively more readable than another to competent individuals" (which I suspect is intended to be implied by the original quote), then I'd have to disagree. The choice between functional vs procedural may mostly be subjective, but a convoluted pattern is objectively worse than a non-convoluted one (when the reader is competent).

▲p2edwards 111 days ago

Agreed —

seeinglogic's article made me think of a 3rd option:

1. Sorta long functional chain where the type changes partway through
2. Use temp variables
3. (New option) Use comments

(Here's funcA from seeinglogic's article, but I added 3 comments)

    function funcC(graph) {
      return (
        // target node
        graph.nodes(`node[name = ${name}]`)
          // neighbor nodes
          .connected()
          .nodes()
          // visible names
          .not('.hidden')
          .data('name')
      );
    }

Compare to funcB which uses temp variables:

    function funcB(graph) {
      const targetNode = graph.nodes(`node[name = ${name}]`);
      const neighborNodes = targetNode.connected().nodes();
      const visibleNames = neighborNodes.not('.hidden').data('name');

      return visibleNames;
    }

For me the commented version is easier to read and audit, and it also feels safer for some reason, but I'm not sure how subjective that is.

▲daotoad 111 days ago

Funny, I see the need to add comments as an indicator that it's time to introduce chunking of some sort--arguably it already has been.

In this case, I'd lean towards intermediate variables, but sometimes I'll use functions or methods to group things.

I prefer function/method/variable names over comments because they are actual first-class parts of the code. In my experience, people are a bit more likely to update them when they stop being true. YMMV.

▲feoren 112 days ago

The dig on chains of map/reduce/filter was listed as a "Halstead Complexity Takeaway", and seemed to come out of the blue, unjustified by any of the points made about Halstead complexity. In fact in your later funcA vs. funcB example, funcB would seem to have higher Halstead complexity due to its additional variables (depending on whether they count as additional "operands" or not). In general, long chains of functions seem like they'd have lower Halstead complexity.

The "anti-functional Tourette's" comment was partly a response to how completely random and unjustified it seemed in that part of the article, and also that this feels like a very common gut reaction to functional programming from people who aren't really willing to give it a try. I'm not only arguing directly against you here, but that attitude at large.

Your funcA vs. funcB example doesn't strike me as "functional" at all. No functions are even passed as arguments. That "fluent" style of long chains has been around in OO languages for a while, independent of functional programming (e.g. see d3.js*, which is definitely not the oldest). Sure, breaking long "fluent" chains up with intermediate variables can sometimes help readability. I just don't really get how any of this is the fault of functional programming.

I think part of the reason funcB seems so much more readable is that neither function's name explains what it's trying to do, so you go from 0 useful names to 3. If the function was called "getNamesOfVisibleNeighbors" it'd already close the readability gap a lot. Of course if it were called that, it'd be more clear that it might be just trying to do too much at once.

I view the "fluent" style as essentially embedding a DSL inside the host language. How readable it is depends a lot on how clear the DSL itself is. Your examples benefit from additional explanation partly because the DSL just seems rather inscrutable and idiosyncratic. Is it really clear what ".data()" is supposed to do? Sure, you can learn it, but you're learning an idiosyncrasy of that one library, not an agreed-upon language. And why do we need ".nodes()" after ".connected()"? What else can be connected to a node in a graph other than other nodes? Why do you need to repeat the word "node" in a string inside "graph.nodes()"? Why does a function with the plural "nodes" get assigned to a singular variable? As an example of how confusing this DSL is, you've claimed to find "visibleNames", but it looks to me like you've actually found the names of visible neighborNodes. It's not the names that are not(.hidden), it's the nodes, right? Consider this:

    function getVisibleNeighborNames(graph) {
        return graph
            .nodeByName(name)
            .connectedNodes()
            .filter(node => !node.isHidden)
            .map(node => node.name)
    }

Note how much clearer ".filter(node => !node.isHidden)" is than ".not('.hidden')", and ".map(node => node.name)" versus ".data('name')". It's much harder to get confused about whether it's the node or the name that's hidden, etc.
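To make the chain above actually run, here is a sketch with a tiny hypothetical `Graph`/`GraphNode` pair. `nodeByName`, `connectedNodes` and `isHidden` are invented to match the comment and are not taken from any real graph library. `name` is passed in as a parameter, since the original references it without defining it.

```javascript
// Hypothetical graph types: method and field names mirror the comment,
// not any real graph library.
class GraphNode {
  constructor(graph, name, isHidden) {
    this.graph = graph;
    this.name = name;
    this.isHidden = isHidden;
  }

  // Nodes on the other end of every edge that touches this node.
  connectedNodes() {
    return this.graph.edges
      .filter(([a, b]) => a === this.name || b === this.name)
      .map(([a, b]) => (a === this.name ? b : a))
      .map(otherName => this.graph.nodeByName(otherName));
  }
}

class Graph {
  // nodes: [{ name, isHidden }]; edges: undirected [nameA, nameB] pairs.
  constructor(nodes, edges) {
    this.byName = new Map(
      nodes.map(({ name, isHidden = false }) => [name, new GraphNode(this, name, isHidden)])
    );
    this.edges = edges;
  }

  nodeByName(name) {
    const node = this.byName.get(name);
    if (node === undefined) {
      throw new Error(`no node named ${name}`);
    }
    return node;
  }
}

// The chain from the comment, with `name` passed in explicitly.
function getVisibleNeighborNames(graph, name) {
  return graph
    .nodeByName(name)
    .connectedNodes()
    .filter(node => !node.isHidden)
    .map(node => node.name);
}

const graph = new Graph(
  [{ name: "a" }, { name: "b" }, { name: "c", isHidden: true }, { name: "d" }],
  [["a", "b"], ["c", "a"], ["b", "d"]]
);

// getVisibleNeighborNames(graph, "a") → ["b"]
```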

Getting the DSL right is really hard, which only increases the benefit of using things like "map" and "filter" which everyone immediately understands, and which have no extrinsic complexity at all.

You could argue that it's somehow "invalid" to change the DSL, but my point is that if you're using the wrong tool for the job to begin with, then any further discussion of readability is in some sense moot. If you're doing a lot of logic on graphs, you should be dealing with a graph representation, not CSS classes and HTML attributes. Then the long chains are not an issue at all, because they read like a DSL in the actual domain you're working in.

*Sidenote: I hate d3's standard style, for some of the same reasons you mention, but mainly because "fluent" chains should never be mutating their operand.

▲climb_stealth 112 days ago

Just want to add that I both agree with you and parent. Your examples are readable and I see no issues there.

It might be a language thing as well. In Python, people often take list comprehensions too far, and they become an undecipherable mess of nested iterators, casts and lists.

There are always exceptions :)

▲disgruntledphd2 111 days ago

I read the OP as disliking the Python style, which I agree with.

However, the GP's example of map/filter/distinct I find lovely.

Mind you, I learned to program in a functional language, so I probably have a different perspective from most.

▲vitus 111 days ago

> mainly because "fluent" chains should never be mutating their operand.

I see this quite often with builders, actually, and I don't mind it so much there.

    FooBuilder()
      .setBar(bar)
      .setBaz(baz)
      .setQux(qux)
      .build()
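Here is a sketch of the builder shape described above, using hypothetical `Foo`/`FooBuilder` classes. Each setter mutates the builder and returns `this`, and only `build()` produces the real object.

```javascript
// Hypothetical names taken from the comment: Foo, FooBuilder, bar/baz/qux.
class Foo {
  constructor(bar, baz, qux) {
    this.bar = bar;
    this.baz = baz;
    this.qux = qux;
    Object.freeze(this); // the finished object can't be changed afterwards
  }
}

class FooBuilder {
  constructor() {
    this.bar = undefined;
    this.baz = undefined;
    this.qux = undefined;
  }

  // Each setter mutates the builder and returns it, which is what allows chaining.
  setBar(bar) {
    this.bar = bar;
    return this;
  }

  setBaz(baz) {
    this.baz = baz;
    return this;
  }

  setQux(qux) {
    this.qux = qux;
    return this;
  }

  build() {
    return new Foo(this.bar, this.baz, this.qux);
  }
}

const foo = new FooBuilder().setBar(1).setBaz("two").setQux(true).build();
```

The mutation stays confined to a short-lived builder, and the product is frozen. That may be why this kind of mutating chain bothers people less than a general-purpose fluent API that mutates its operand.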

▲wegfawefgawefg 111 days ago

every time i see this i would just prefer a variadic function lol

▲vitus 111 days ago

If your language supports named arguments, sure, but that's not always a given.

▲wegfawefgawefg 108 days ago

i don't buy it. this pattern was common in rust and then fell out of favor.

in c or go, if you don't like variadics, just pass in a params struct.

it's just a fad.

▲vitus 107 days ago

> in c or go if you dont like variadics just pass in a params struct.

Which, again, is fine if your language supports named args for your params struct.

C++ didn't have this until C++20, despite C having it for decades prior.

Java still doesn't have this.

If your language doesn't have named args or designated initializers or whatever it calls them, then what, perform your function calls with argument comments of the form f(/*foo=*/1, /*bar=*/2, /*baz=*/true)? That's error-prone even in the best-case scenario.
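JavaScript is in the same boat as Java here, with neither named arguments nor designated initializers. A common workaround is the params-struct approach using object destructuring, sketched below with a hypothetical `makeFoo`. Unlike real named arguments, a misspelled option name is silently ignored. TypeScript's excess-property check would catch that for object literals.

```javascript
// Hypothetical function: the destructured parameter plays the role of a
// params struct, and each property name acts as a named argument.
function makeFoo({ bar, baz = 0, qux = false } = {}) {
  if (bar === undefined) {
    throw new Error("makeFoo: bar is required");
  }
  return { bar, baz, qux };
}

// Call sites label every value, may pass them in any order,
// and may omit the ones that have defaults.
const first = makeFoo({ bar: 1, qux: true });
const second = makeFoo({ qux: true, bar: 1 });
// first and second are both { bar: 1, baz: 0, qux: true }
```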

▲desumeku 112 days ago

  o_node := graph.GetNodeByName(name)
  var ret []string
  for _, node := range o_node.connectedNodes() {
    if !node.isHidden {
      ret = append(ret, node.name)
    }
  }
  return ret

▲stouset 111 days ago

There is just no way that reasonable people consider this to be clearer. One certainly might be more familiar with this approach, but it is less clear by a long shot.

You've added a temp variable for the result, manual appending to that temp variable (which introduces a performance regression from having to periodically grow the array), loop variables, unused variables, multiple layers of nesting, and conditional logic. And the logic itself is no longer conceptually linear.

▲desumeku 111 days ago

Everything you said is true for both of our programs, the only difference is whether or not it's hidden behind function calls you can't see and don't have access to.

You don't really think that functional languages aren't appending things, using temp vars, and using conditional logic behind the scenes, do you? What do you think ".filter(node => !node.isHidden)" does? It's nothing but a for loop and a conditional by another name and wrapped in an awkward, unwieldy package.

>which introduces a performance regression from having to periodically grow the array

This is simply ridiculous, do you just believe that the magic of Lisp/FP allows it to pluck the target variables out of the sky in perfectly-sized packages with zero allocation or overhead?

▲stouset 111 days ago

> the only difference is whether or not it's hidden behind function calls you can't see and don't have access to.

You "can't see and don't have access to" `if`, `range`, or `append` but somehow you don't find this a problem at all. I wonder why not?

> You don't really think that functional languages aren't appending things, using temp vars, and using conditional logic behind the scenes, do you?

By this metric all languages that compile down to machine instructions are equivalent. After all, it winds up in the same registers and a bunch of CMP, MOV, JMP, and so on.

`.distinct()` could sort the result and look for consecutive entries, it could build up a set internally, it could use a hashmap, or any one of a million other approaches. It can even probe the size of the array to pick the performance-optimal approach. I don't have to care.

> [".filter(node => !node.isHidden)" is] nothing but a for loop and a conditional by another name and wrapped in an awkward, unwieldy package.

This is honestly an absurd take. I truly have no other words for it. map, filter, and friends are quite literally some of the clearest and most ergonomic abstractions ever devised.

▲DonHopkins 111 days ago

Speaking of filters and clear ergonomic abstractions, if you like programming languages with keyword pairs like if/fi, for/rof, while/elihw, goto/otog, you will LOVE the cabkwards covabulary of cepstral quefrency alanysis, invented in 1963 by B. P. Bogert, M. J. Healy, and J. W. Tukey:

cepstrum: inverse spectrum

lifter: inverse filter

saphe: inverse phase

quefrency alanysis: inverse frequency analysis

gisnal orpcessing: inverse signal processing

https://en.wikipedia.org/wiki/Cepstrum

▲desumeku 111 days ago

> it could build up a set internally, it could use a hashmap, or any one of a million other approaches. It can even probe the size of the array to pick the performance-optimal approach. I don't have to care.

Well, this is probably why functional programming doesn't see a lot of real use in production environments. Usually, you actually do have to care. Talk about noticing a performance regression because I was simply appending to an array. You have no idea what performance regressions are happening in ANY line of FP code, and on top of that, most FP languages are dead-set on "immutability" which simply means creating copies of objects wherever you possibly can... (instead of thinking about when it makes sense and how to be performant about it)

▲alpaca128 111 days ago

Your assumption that filter/map/reduce is necessarily slower than a carefully handcrafted loop is wrong, though. Rust supports these features as well and the performance is equivalent.

Also countless real-world production environments run on Python, Ruby, JS etc, all of which are significantly slower than a compiled FP program using filter & map.

> FP languages are dead-set on "immutability" which simply means creating copies of objects

Incorrect. The compiler can make it mutable for better performance, and that gives you the best of both worlds: immutability where a fallible human is involved, and mutability where it matters.

▲desumeku 111 days ago

> The compiler can make it mutable for better performance

Well, we already know that no pure FP language can match the performance of a dirty, normal imperative language, except for Common Lisp (and I'd be happy to hear an explanation of how it manages to be so much faster than the rest; maybe it's the for loops?). And another comment here already mentioned that those "significantly slower" scripting languages have a healthy dose of FP constructs, which are normally considered anti-patterns, for good reason. The only language that competes on speed is Rust, which just so happens to let you have fast FP abstractions so long as you manually manage every piece of memory and its lifetime. You end up constantly negotiating with the compiler, which gives up any of the convenience benefits you actually get from FP.

▲maleldil 111 days ago

You complain about not having control of how the `filter` works under the hood but are happy to give the language control over memory management. How much abstraction is too much abstraction? Where do you draw the line?

▲whilenot-dev 111 days ago

I consider an abstraction to be a good abstraction if I don't need to care about its internal workings all the time. That applies whether it's some build step in the CI or data structures ranging from simple strings to advanced Cuckoo filters and beyond. Even Python uses a Bloom filter internally for some string operations, AFAIK. Correctness and maintainability trumps performance most of the time, and map, filter and immutability are building blocks for achieving just that. Those constructs won't prevent you from looking deeper and doing something different if you really have to (and know what you're doing). They just acknowledge that you probably won't need to.

▲desumeku 111 days ago

> Correctness and maintainability trumps performance

Great, please make all the software even slower than it already is. I am overjoyed to have to purchase several new laptops a decade because they become e-waste purely due to the degradation of software performance. It is beyond ridiculous that to FP programmers daring to mutate variables or fine-tune a for loop is an exceptional scenario that you don't do "unless you have to" and which requires "knowing what you're doing". Do you know what you're doing? How can you be a software engineer and think that for loops are too difficult of a construct, and that you need something higher-level and more abstract to feel safe? It's insane. Utterly insane. Perhaps even the root of all evil, Code Golf manifested as religion.

▲stouset 110 days ago

> Great, please make all the software even slower than it already is.

It is more or less trivial to emit essentially optimal autovectorized and inlined machine code from functional iterators.

Rust does this, for example.

▲kace91 111 days ago

>Well, this is probably why functional programming doesn't see a lot of real use in production environments

The usual map/filter/reduce is everywhere in production: Python, Java, JS, Ruby, C#...

You could even argue that the lack of generics hurt Go's popularity for a while precisely because of that use case.

▲porridgeraisin 111 days ago

> by this metric

False equivalence. You're saying that the statement "both for.. append and .map() executing the _same steps_ in the _same order_ are the same" is equivalent to saying that "two statements being composed of cmp,jmp, etc (in totally different ways) are the same" That is a dishonest argument.

> Distinct could sort and look for consecutive, it could use a hashmap

People love happy theories like this, but find me one major implementation that does this. For example, here is the distinct() implementation in the most abstraction-happy language that I know - C#

https://github.com/dotnet/runtime/blob/main/src%2Flibraries%...

It unconditionally uses a hashset regardless of input.

Edit: found an example which does different things depending on input

https://github.com/php/php-src/blob/master/ext/standard/arra...

This does the usual hashset-based approach only if the array has strings. Otherwise, it gets the comparator and does a sort | uniq. So you get a needless O(n log n) sort, with no way to distinct() by, say, an `id` field in your objects. Very ergonomic...

On the other hand...

  seen := map[string]bool{}
  uniq := []string{}
  for _, s := range my_strings {
      if !seen[s] {
          uniq = append(uniq, s)
          seen[s] = true
      }
  }

Let us say you want to refactor and store a struct instead of just a string. The code would change to...

  seen_ids := map[string]bool{}
  uniq := []MyObject{}
  for _, o := range my_objects {
      if !seen_ids[o.id] {
          uniq = append(uniq, o)
          seen_ids[o.id] = true
      }
  }

Visually, it is basically the same, with clear visual messaging of what has changed. And as a bonus, it isn't incurring a random performance degradation.

Edit 2: An SO question on how to do array_unique for objects in PHP. Some very ergonomic choices there... https://stackoverflow.com/questions/2426557/array-unique-for...
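The two Go loops above differ only in what gets put into the `seen` map. Below is a sketch of the same "first occurrence wins" approach as one JavaScript helper parameterized by a key function. `uniqueBy` is a hypothetical name, not a built-in.

```javascript
// Keep the first item for each key; by default the key is the item itself.
function uniqueBy(items, keyFn = item => item) {
  const seen = new Set();
  const result = [];
  for (const item of items) {
    const key = keyFn(item);
    if (!seen.has(key)) {
      seen.add(key);
      result.push(item);
    }
  }
  return result;
}

const strings = ["a", "b", "a", "c", "b"];
const objects = [
  { id: "1", label: "first" },
  { id: "2", label: "second" },
  { id: "1", label: "duplicate" },
];

// uniqueBy(strings)            → ["a", "b", "c"]
// uniqueBy(objects, o => o.id) → the "first" and "second" objects
```

JavaScript's `Set` compares objects by identity. That is why the key function returns the `id` string rather than the object itself.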

▲stouset 109 days ago

I'm not sure reaching for PHP to dunk on functional languages is the win you think it is?

> Visually, it is basically the same, with clear visual messaging of what has changed.

In order to do this you had to make edits to all but a single line of logic. Literally only one line in your implementation didn't change.

Compare, with Ruby:

    # strings
    uniq = strings.uniq

    # structs, if they are considered equal based on the
    # field in question (strings are just structs and they
    # implement equality, so of course this is the same)
    uniq = structs.uniq

    # structs, if you want to unique by some mechanism
    # other than natural equality
    uniq = structs.uniq(&:id)

With Rust:

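    // `unique` and `unique_by` come from the itertools crate:
    // use itertools::Itertools;

    // strings
    let uniq: Vec<_> = strings.iter().unique().collect();

    // structs, if they are considered equal based on the
    // field in question (strings are just structs and they
    // implement equality and hashing, so of course this is the same)
    let uniq: Vec<_> = structs.iter().unique().collect();

    // structs, if you want to unique by some mechanism
    // other than natural equality
    let uniq: Vec<_> = structs.iter().unique_by(|s| s.id).collect();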

You cannot tell me with a straight face that your version is clearer. You'll note that both languages have essentially the exact same code, which is to say: nearly none at all.

▲porridgeraisin 109 days ago

> I'm not sure reaching for PHP

My initial comment was a response to your comment

> Distinct could sort and look for consecutive, it could use a hashmap. It can even probe the size of the array to pick the performance-optimal approach. I don't have to care.

which says that you just use an abstraction and it transparently "does the right thing". The C# example showed that abstractions actually don't do that, and instead simply provide a lowest-common-denominator implementation. The Rust examples you used also use a hashmap unconditionally. The PHP example showed that when an abstraction does attempt this, it ends up even worse: when you move from strings to structs, you get a slowdown.

In practice, different strategies are always implemented as different functions (see Python's `more_itertools` `unique_justseen` and `unique_everseen`). At that point it's no longer an abstraction (which by definition serves multiple purposes). It just becomes a matter of whether the set of disparate helper functions is written by you, the standard library, or a third party. In Rust, you would do `vec.sort()`, `vec.dedup()` for one strategy, and call `v.into_iter().unique().collect()` for the other strategy. That is not an "abstraction" [which achieves what you claimed they do].
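The two strategies are indeed different functions with different results. Below is a sketch of the sort-and-dedup one in JavaScript. The function names are hypothetical, modeled on `unique_justseen` and on Rust's `sort()` + `dedup()`. A hash-set approach instead keeps the first-seen order in O(n) expected time.

```javascript
// Drop an item only when it equals the item immediately before it
// (the idea behind "unique_justseen" and Rust's Vec::dedup).
function uniqueJustSeen(items) {
  return items.filter((item, i) => i === 0 || item !== items[i - 1]);
}

// Sorting first puts equal items next to each other, so adjacent dedup then
// removes every duplicate: O(n log n), and the original order is lost.
// Default sort() compares elements as strings, so this sketch targets strings.
function sortThenDedup(items) {
  return uniqueJustSeen([...items].sort());
}

const letters = ["b", "a", "b", "b", "c", "a"];
// uniqueJustSeen(letters) → ["b", "a", "b", "c", "a"]
// sortThenDedup(letters)  → ["a", "b", "c"]
```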

▲sfn42 111 days ago

> It unconditionally uses a hashset regardless of input.

This follows what I like to call macro optimization. I don't care if a hash set is slightly slower for small inputs. Small inputs are negligibly fast anyway. The hash set solution just works for almost any dataset. Either it's too small to give a crap, midsized and hash set is best, or huge and hash set is still pretty much best.

That's why we default to the best time/space complexity. Once in a blue moon someone might have an obscure use case where micro-optimizing for small datasets makes sense somehow; for the vast majority of use cases, this solution is best.

▲AdieuToLogic 111 days ago

> Everything you said is true for both of our programs, the only difference is whether or not it's hidden behind function calls you can't see and don't have access to.

This is a key difference between imperative programming and other paradigms.

> You don't really think that functional languages aren't appending things, using temp vars, and using conditional logic behind the scenes, do you?

A key concept in a FP approach is Referential Transparency[0]. Here, this concept is relevant in that however FP constructs do what they do "under the hood" is immaterial to collaborators. All that matters is if, for some function/method `f(x)`, it is given the same value for `x`, `f(x)` will produce the same result without observable side effects.
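A small Python sketch of the distinction, assuming books are plain dicts with a hypothetical `pages` field. Both functions return the same list, but only the first is referentially transparent:

```python
audit_log = []


def long_books(books, min_pages=1000):
    # Referentially transparent: the result depends only on the arguments,
    # and calling it has no other observable effect.
    return [b for b in books if b["pages"] > min_pages]


def long_books_logged(books, min_pages=1000):
    # Same return value, but it also mutates module-level state, so two
    # identical calls leave the program in different states.
    audit_log.append(len(books))
    return long_books(books, min_pages)


shelf = [{"title": "X", "pages": 1200}, {"title": "Y", "pages": 300}]
print(long_books(shelf) == long_books(shelf))  # True
long_books_logged(shelf)
long_books_logged(shelf)
print(audit_log)  # [2, 2]
```

A caller can replace `long_books(shelf)` with its value anywhere; doing the same with `long_books_logged(shelf)` would change `audit_log`.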

> What do you think ".filter(node => !node.isHidden)" does?

Depending on the language, it applies a predicate to the value(s) in a container, which could be a "traditional" collection (List, Set, etc.), an optional type (cardinality of 0 or 1), a future, an I/O operation, a ...

> It's nothing but a for loop and a conditional by another name and wrapped in an awkward, unwieldy package.

There is something to be said for the value of using appropriate abstractions. If not, then we would still be writing COBOL.

0 - https://en.wikipedia.org/wiki/Referential_transparency

▲JadeNB 111 days ago

> You don't really think that functional languages aren't appending things, using temp vars, and using conditional logic behind the scenes, do you? What do you think ".filter(node => !node.isHidden)" does? It's nothing but a for loop and a conditional by another name and wrapped in an awkward, unwieldy package.

But the whole point of higher-level languages is that you don't have to think about what's going on behind the scenes, and can focus on expressing intent while worrying less about implementation. Just because a HLL is eventually compiled into assembler, and so the assembler expresses everything the HLL did, doesn't mean the HLL and assembler are equally readable.

(And I think that your parent's point is that "awkward, unwieldy package" is a judgment call, rather than an objective evaluation, based, probably, on familiarity and experience; it certainly doesn't look awkward or unwieldy to me, though I disagree with some of the other aesthetic judgments made by your parent.)

▲porridgeraisin 111 days ago

> performance regression

What? Golang append()s also periodically grow the slice.

> Conditional logic

it's just a single if, really, the same thing is there in your filter()

> Multiple layers of nesting

2... You're talking it up like it's a pyramid of hell. For what it's worth, I've seen way way more nesting in usual FP-style code, especially with formatting tools doing

  func(
     args
  )

For longer elements of the function chain.

▲stouset 110 days ago

> What? Golang append()s also periodically grow the slice.

If you already know the size of the result (there are no filtering operations), the functional approach can trivially allocate the resulting array to already have the correct capacity. This happens with zero user intervention.

IIRC the Rust optimizer basically emits more or less optimal machine code (including SIMD) for most forms of iteration.

▲porridgeraisin 110 days ago

We are talking about making an array with unique elements here. You cannot know the correct capacity for that without overallocating.

If overallocating is indeed OK for your use case, then you can do so yourself:

  uniq := make([]MyObject, 0, len(my_objects))

▲stouset 110 days ago

I was talking more in the general case.

Yes, you can always just write more and more and more code to fix these things. Or you could just… not write more code and still get optimal performance.

▲porridgeraisin 110 days ago

> just... Not write more code

But the abstractions don't really adapt themselves to be performant in each use case the way you imagine. I gave an example of C# distinct() above. Sure, they can. But do they? No. They only save anything at all for trivial use cases, where the imperative code is also obvious to parse.

▲DonHopkins 111 days ago

o_god!

▲ninetyninenine 112 days ago

>For me, roughly 5 calls in a chain is where things begin to become harder to read, which is the length of the example I used.

This isn't just about readability. Chaining or FP is structurally more sound. It is the more proper way to code from an architectural and structural pattern perspective.

     given an array of numbers

   1. I want to add 5 to all numbers
   2. I want to convert to string
   3. I want to concat hello
   4. I want to create a reduced comma-separated string
   5. I want to capitalize all letters in the string.

This is what a for loop would look like:

   // assume x is the array
   acc = ""

   for (var i = 0; i < x.length; i++) {
       value = x[i] + 5
       stringValue = str(value).concat("hello")
       if (i > 0) acc += ","
       acc += stringValue
   }

   for (var i = 0; i < acc.length; i++) {
       acc[i] = capitalLetter(acc[i])
   }

FP:

    addFive(x) = [i + 5 for i in x]
    toString(x) = [str(i) for i in x]
    concatHello(x) = [i + "hello" for i in x]
    reduceStrings(x) = reduce((acc, i) => acc + "," + i, x)
    capitalize(x) = ([capitalLetter(i) for i in x]).toString()

You have 5 steps. With FP, all 5 steps are reusable. With the procedural loop, they are not.
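A runnable Python version of the five steps, as a sketch. The function names follow the pseudocode above, `.upper()` stands in for the hypothetical `capitalLetter`, and the single-loop version is included only to check that both produce the same string:

```python
from functools import reduce


def add_five(xs):
    return [x + 5 for x in xs]


def to_strings(xs):
    return [str(x) for x in xs]


def concat_hello(xs):
    return [s + "hello" for s in xs]


def join_with_commas(xs):
    # reduce without an initial value seeds the accumulator with the first
    # element, so there is no leading or trailing comma (xs must be non-empty).
    return reduce(lambda acc, s: acc + "," + s, xs)


def capitalize(s):
    return s.upper()


def pipeline(xs):
    # Each step is an independent, reusable function composed in order.
    return capitalize(join_with_commas(concat_hello(to_strings(add_five(xs)))))


def loop_version(xs):
    # The same computation fused into one loop.
    parts = []
    for x in xs:
        parts.append(str(x + 5) + "hello")
    return ",".join(parts).upper()


print(pipeline([1, 2, 3]))  # 6HELLO,7HELLO,8HELLO
```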

Mind you, I know you're thinking about chaining. Chaining is equivalent to inlining multiple operations together. So for example in that case

     x.map(...).map(...).map(...).reduce(...).map(...)

     //can be made into
     addFive(x) = x.map(...)
     toString(x)= x.map(...)
     ...

By nature, functional code is modular, so such syntax can easily be extracted into modules, each given a name. The procedural code cannot do this: it is structurally unsound and tightly coupled.

It's not about going overboard here. The FP simply needs to be formatted to be readable, but it is the MORE proper way to code to make your code modular, general, and decoupled.

▲harrison_clarke 112 days ago

you have this backwards: reusing code couples the code. copy+paste uncouples code

if you have two functions, they're not coupled. you change one, the other stays as-is

if you refactor it so that they both call a third function, they're now coupled. you can't change the part they have in common without either changing both, or uncoupling them by duplicating the code

(you often want that coupling, if it lines up with the semantics)

▲ninetyninenine 111 days ago

I meant code within the snippet is tightly coupled. You can't just cut the for loop in half and reuse half the logic.

▲jltsiren 112 days ago

Your example is a conceptually simple filter on a single list of items. But once the chain grows too long, the conditions become too complex, and there are too many lists/variables involved, it becomes impossible to understand everything at once.

In a procedural loop, you can assign an intermediate result to a variable. By giving it a name, you can forget the processing you have done so far and focus on the next steps.

▲stouset 112 days ago

You don't ever need to "understand everything at once". You can read each stanza linearly. The for loop style is the approach where everything often needs to be understood all at once since the logic is interspersed throughout the entire body.

▲__mharrison__ 112 days ago

This. I teach this with Pandas (and Polars) all the time. You don't really care about the intermediate values. You build up the chain operation by operation (validating that it works). At the end you have a recipe for processing the data.

Most professional Pandas users realize that working with chains makes their lives much easier.

By the way, debugging chains isn't hard. I have a chapter in my book that shows you how to do it.

▲jltsiren 112 days ago

In the example above, you first have a list of books. Then you filter it down to books with >1000 pages. Then you map it to authors of books with >1000 pages. Then you collapse it to distinct authors of books with >1000 pages. Every step in the chain adds further complexity to the description of the things you have, until it exceeds the capacity of your working memory. Then you can no longer reason about it.

The standard approach to complexity like that is to invent useful concepts and give them descriptive names. Then you can reason about the concepts themselves, without having to consider the steps you used to reach them.

▲lelandbatey 112 days ago

Folks who are familiar with chaining don't think about it in the way that you've presented. If you're familiar, it's more like:

Filter to the books with >1000 pages

Then their authors.

Finally, distinguish those authors.

If you're familiar, you don't mentally represent each link in the chain as the totality of everything that came before it _plus_ whatever operation you're doing now. You consider each link in the chain in isolation, as its inputs are the prior link and its outputs will be used in the next link. Giving a name to each one of those links in the chain isn't always necessary, and depending on how trivial the operations are, can really hurt readability.

I think it's very much a personal preference.

▲jltsiren 112 days ago

The problem with that is that it's all implicit. If the steps are sufficiently complex and if you don't already know what the code is doing, you don't always have a clear mental image of what the intermediate state after each step is supposed to represent. And with a chained syntax like that, you don't have an option to give the intermediate state an explicit name. A name that could help the reader understand what is going on.

You don't have to give a name to every intermediate state, just like you don't have to comment every single line of code. But sometimes the names and comments do improve readability.

▲solid_fuel 112 days ago

That's a question of code organization. One method I find helpful is writing a long chain and then breaking it up into clear functions. E.g.

    var longBooks       = books.filter(book => book.pageCount > 1000)
    var authors         = longBooks.map(book => book.author)
    var distinctAuthors = authors.distinct()

could become (in a different language)

    books
    |> Books.filter_by_length(min: 1000)
    |> Authors.from_books()
    |> Enum.uniq()

and now each step is named and reusable. This example isn't the best, but it can be quite helpful when you have large map() and filter() logic blocks.
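The same idea as a Python sketch. Python has no `|>` operator, so `pipe` here is a hypothetical helper that threads a value through functions left to right; the `Book` type, the step names, and the sample data are made up for illustration:

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Book:
    title: str
    author: str
    page_count: int


def filter_by_length(min_pages):
    # Returns a step function, so the threshold is named at the call site.
    return lambda books: [b for b in books if b.page_count > min_pages]


def authors_from_books(books):
    return [b.author for b in books]


def distinct(items):
    # dict keys keep insertion order, so this preserves first-seen order.
    return list(dict.fromkeys(items))


def pipe(value, *steps):
    # Apply each step to the result of the previous one.
    return reduce(lambda acc, step: step(acc), steps, value)


books = [
    Book("Long One", "Ann", 1200),
    Book("Short", "Bo", 250),
    Book("Long Two", "Ann", 1100),
    Book("Epic", "Cy", 1500),
]

authors = pipe(books, filter_by_length(1000), authors_from_books, distinct)
print(authors)  # ['Ann', 'Cy']
```

Each step can be tested on its own with a hand-built list, which is the reuse benefit described above.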

▲yxhuvud 112 days ago

I have a guideline where I tend to put a name on a result if and only if it changes the type of the data compared to the previous step. It works well.

▲stouset 112 days ago

So just do that then in the cases where you think it improves clarity? It's not like you can't assign names in the functional style if you need to.

▲bluGill 112 days ago

The problem is the "you" in question is not always able to. When "you" write code it makes sense and so you don't need to assign many names. The you in six months will want more names, and in 6 years that will be different again (how many depends - if this code is changed often then you know it much better than if it has been stable). The worst case will be after you "get hit by a bus" and the "you" in question is some poor person who has never seen this code before.

▲stouset 112 days ago

Unlike the procedural approach, every step in a functional chain is wholly isolated and independent from the others. It is strictly easier to split this style of code up into two halves and name them than it is to disentangle procedural equivalents.

I have quite literally zero times in my ~25 year career had to deal with some sort of completely inscrutable chain of functional calls on iterators. Zero. I am entirely convinced that the people arguing against this style have never actually worked in a project where people used this style. It's okay! The first time I saw these things I, too, was terribly confused and skeptical.

▲bluGill 111 days ago

I will admit to not having written any significant functional code. However the poster child for functional programming always seems to be small programs (xmonad is the largest one I can think of, and the procedural counterparts are not that big either. Of course there is a lot of code out there that nobody can talk about). Thus I have to conclude the question of how that style scales to really large programs remains open.

That said, you didn't address my comment at all. It might be easier, but that doesn't mean it is easy to figure out what that long chain is really doing - all too often the algorithm names don't tell you what you are really trying to accomplish in my experience.

▲diatone 111 days ago

Ahh, it gets really interesting when you read code that does have named variables… and they’re misleading.

A strength of functional idioms is that they expose the structure of the code in a way that a name, even a well-chosen name, can only hope to achieve. Often, succinctly and comprehensively. At that point you stop caring so much about variable names. They’re still there, but you need them less.

▲stouset 112 days ago

You have literally just described the set of objects asked for: the unique authors of the books with more than 1,000 pages. I don't understand how you expect to get any simpler than that. The functional style isn't even requiring you to describe how to accomplish it, it almost verbatim simply describes the answer you're trying to get.

If your entire objection is that you might want intermediate-named variables… you can just do that?

    var longBooks       = books.filter(book => book.pageCount > 1000)
    var authors         = longBooks.map(book => book.author)
    var distinctAuthors = authors.distinct()

For short chains (95%+ of cases), this is far more mental overhead. For the remaining cases, you can just name the parts? I'm just completely failing to see your problem here.

▲jltsiren 112 days ago

The problem is that it's easy to overdo it. When you are writing the code, you already know what it's supposed to do, and adding a few more things to the chain is convenient and attractive. But when you are reading unfamiliar code, you often wish that the author was more explicit with their code. Not just with what the code is actually doing, but what it's trying to do and what are the key waypoints to get there.

With procedural code, it's widely accepted that you should not do too many things in a single statement. But in functional code, the entire chain is a single statement. There are no natural breakpoints where the reader could expect to find justifications for the code.

▲syklemil 111 days ago

> But in functional code, the entire chain is a single statement. There are no natural breakpoints where the reader could expect to find justifications for the code.

How are we deciding what's "functional code", here? Because functional languages also provide means like `let` and `where` bindings to break up statements. The example might in pseudo-Haskell be broken up like

    distinctAuthors = distinct authors
      where
        authors = map (\book -> book.author) longBooks
        longBooks = filter (\book -> book.pageCount > 1000) books

IMO the code here is also simple enough that I don't see it needing much in the way of comments, but it is also possible and common to intersperse comments in the dot style, e.g.

    distinctAuthors = books // TODO: Where does this collection come from anyway?
        // books are officially considered long if they're over 1000 pages, c.f. the Council of Chalcedon (451)
        .filter(book => book.pageCount > 1000)
        // All books have exactly one author for some reason. Why? Shouldn't this be a flatmap or something?
        .map(book => book.author)
        // We obviously actually want a set[author] here, rather than a pruned list[author],
        // but in this imaginary DinkyLang we'd have to implement that as map[author, null]
        // and that's just too annoying to deal with
        .distinct()

▲bonoboTP 111 days ago

If you're used to it, then it doesn't read like a single statement, even though technically it is. You put each call of the chain on its own line and it feels like reading the lines of regular imperative code. Except better because I can be sure that each line strictly only uses the result of the previous line, not two or three lines before so the logic flows nicely linearly.

▲whstl 112 days ago

> But in functional code, the entire chain is a single statement

Not necessarily. You can use intermediate variables when necessary.

▲stouset 112 days ago

> The problem is that it's easy to overdo it.

Welcome to all features of every programming language?

Sacrificing readability, optimization, and simplicity for the 95% case because some un-principled developers might overdo it in the 5% case (when the cost of fixing it is trivially just inserting variable assignments) is… not a good trade-off.

▲jltsiren 112 days ago

5% is common enough that you'll encounter it almost every time you read code. And fixing it is not easy, because you first need to understand the code before you can add useful variable names.

Besides, programming language evolution is mostly driven by the fact that everyone is lazy and unprincipled at least occasionally. If you need to be disciplined to avoid footguns, you'll trigger them sooner or later.

▲stouset 112 days ago

The cost of this "footgun" is basically zero. Every step in a functional pipeline is isolated and wholly independent. If you want to split such a pipeline in two, doing so is trivial.

▲vendiddy 110 days ago

It's not clear what point you are trying to make because so far you are describing problems common to all programming languages.

▲whstl 112 days ago

5% is also low enough that you can just use another technique for the exceptions.

▲galaxyLogic 111 days ago

Your example shows that it is possible to give NAMES to the intermediate results of a long chain.

Giving names to things makes it easier to understand the intention of the programmer.

And that also allows you to create a TREE of dataflow-code not just a CHAIN. For instance 'longBooks' could be used as the starting point of multiple different chains.

It gets complicated at some point but I think other approaches result in code that is even harder to understand.

▲__mharrison__ 112 days ago

It's also harder to write and debug with the intermediate steps.

▲nomel 112 days ago

How so? The states of the intermediate steps are logically and easily exposed in a debugger. You can also easily set conditional breakpoints relative to the intermediate states.

I know that intermediate states are generally easier to comprehend, because I never have to explain them in code reviews. To avoid having to explain chains to others, I end up having to add descriptive comments to the intermediate steps, far exceeding the number of characters the descriptive intermediate variables would take. That's why I avoid them, or break them up: time spent in code reviews has proven to me that people have trouble with chains.

▲__mharrison__ 112 days ago

Build up and debug the chain as you work in an environment like Jupyter. No need to create variables. Just run the code and verify that the current step works. Then, proceed to the next. Then, put the chain in a function. If you want to be nice, put a .loc as the first step to explicitly list all of the input columns. Drop another .loc as the last step to validate the output columns. (This also serves as a test and documentation to future you about what needs to come in and out.) Create a simple unit test with a sample of the data if you desire.

I've found that the constraint of thinking in chains forces me to think of the recipe that I need for my data. Of course, not everything can be done in a stepwise manner (.pipe helps with that), but often, this constraint forces you to think about what you are doing.

Every good Pandas user I know uses it this way. I've taught hundreds more. Generally, it feels weird at first (kind of like whitespace in Python), but after a day, you get used to it.

Do you store intermediate results of SQL?

▲nomel 111 days ago

> Build up and debug the chain as you work in an environment like Jupyter.

> Just run the code and verify that the current step works. Then, proceed to the next.

Yes, it's not debuggable/"viewable" without cut/paste/commenting out lines, once it's constructed.

▲__mharrison__ 111 days ago

If it is breakpoints you are concerned about, you can set a breakpoint on a method in the chain and inspect `self`.

▲nomel 111 days ago

Most languages don't expose the internals of map to set a breakpoint, so you're left with individual entities. But yes, there are tricks you can use to make it work, although most require more complex conditional/sequential breakpoints. In your method breakpoint example, you would need to set a chained breakpoint, as in "don't break until this other breakpoint above the chain has been hit", otherwise the breakpoint in the method won't be "spatially" relevant to the code you're debugging.
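One common workaround is a pass-through "tap" step that can be dropped between links of a chain. Rust's `Iterator::inspect` and Java's `Stream.peek` are built-in versions of this; the Python `tap` below is a hypothetical helper written for illustration:

```python
def tap(items, label, sink=print):
    # Yield each item unchanged after reporting it to `sink`. A breakpoint on
    # the sink(...) line stops once per item, at exactly this point in the chain.
    for item in items:
        sink(f"{label}: {item!r}")
        yield item


pages = [1200, 300, 1500]
log = []

long_pages = tap((p for p in pages if p > 1000), "after filter", log.append)
hundreds = [p // 100 for p in long_pages]

print(hundreds)  # [12, 15]
print(log)       # ['after filter: 1200', 'after filter: 1500']
```

Because the step returns its input unchanged, it can be added or removed without touching the rest of the chain.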

▲jayd16 111 days ago

Each predicate is a separate scope. How is the complexity additive? If you really have to you can simply be just as specific in your predicate naming as you would in a for loop.

    var authorsOfLongBooks = books
        .filter(book => book.pageCount > 1000)
        .map(longBooks => longBooks.author)
        .distinct()

▲ffsm8 112 days ago

That's only true for casual reviewing and writing.

When you're actually analyzing a bug, or need to add a new feature to the code... Then you'll have to keep the whole thing in your mind. No way around it.

It gets extra annoying when people have complex maps, reduces, flat maps all chained after the next, and each step moved into a named function.

Have fun constantly jumping around trying to rationalize why something happens with such code...

It looks good on first glance, but it inevitably becomes a dumpster fire as soon as you need to actually interact with the code.

▲reubenmorais 112 days ago

In a practical example you'd create a named intermediate type which becomes a new base for reasoning. Once you convinced yourself that the first part of the chain responsible for creating that type (or a collection of it) is correct, you can forget it and free up working memory to move on to the next part. The pure nature of the steps also makes them trivially testable as you can just call them individually with easy to construct values.

▲kccqzy 112 days ago

If you assign an intermediate result to a variable in a procedural loop, you can also assign intermediate results of parts of this chain to variables.

▲titzer 112 days ago

SELECT DISTINCT author FROM books WHERE pageCount > 1000;

▲101011 111 days ago

In fairness, if this was in a relational data store, the same code as above would probably look more like...

SELECT DISTINCT authors.some_field FROM books JOIN authors ON books.author_id = authors.author_id WHERE books.pageCount > 1000

And if you wanted to grab the entire authors record (like the code does) you'd probably need some more complexity in there:

SELECT * FROM authors WHERE author_id IN ( SELECT DISTINCT authors.author_id FROM books JOIN authors ON books.author_id = authors.author_id WHERE books.pageCount > 1000 )

▲titzer 111 days ago

The last one is better as:

SELECT * FROM authors WHERE author_id IN (SELECT author_id FROM books WHERE pageCount > 1000);

But I think you're missing the point. The functional/procedural style of writing is sequentialized and potentially slow. It's not transactional, doesn't handle partial failure, isn't parallelizable (without heavy lifting from the language--maybe LINQ can do this? but definitely not in Java).

With SQL, you push the entire query down into the database engine and expose it to the query optimizer. And SQL is actually supported by many, many systems. And it's what people have been writing for 40+ years.
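To check that the subquery form and the earlier JOIN form agree, here is a sketch using Python's built-in `sqlite3` with an in-memory database. The schema follows the table and column names used in this thread, but the rows are invented; `ORDER BY` is added because SQL does not otherwise guarantee row order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (author_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (title TEXT, author_id INTEGER, pageCount INTEGER);
    INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bo'), (3, 'Cy');
    INSERT INTO books VALUES
        ('Long One', 1, 1200), ('Long Two', 1, 1100),
        ('Short', 2, 250), ('Epic', 3, 1500);
""")

# Subquery form: no JOIN and no DISTINCT needed.
subquery_rows = conn.execute(
    "SELECT * FROM authors WHERE author_id IN "
    "(SELECT author_id FROM books WHERE pageCount > 1000) "
    "ORDER BY author_id"
).fetchall()

# JOIN form: DISTINCT removes the duplicate row for an author with two long books.
join_rows = conn.execute(
    "SELECT DISTINCT authors.author_id, authors.name FROM books "
    "JOIN authors ON books.author_id = authors.author_id "
    "WHERE books.pageCount > 1000 ORDER BY authors.author_id"
).fetchall()

print(subquery_rows)  # [(1, 'Ann'), (3, 'Cy')]
```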

▲101011 111 days ago

agreed on the revised SQL!

But I don't think I missed the point: the original text talks about measuring complexity as a function of operators, operands, and nested code. The true one-to-one mapping is more complex than the original comment I replied to.

▲YesBox 112 days ago

Scrolled to find the SQL. Such an elegant, powerful language. Really happy I chose SQLite for my game/project.

▲pif 111 days ago

Just add strong types, with verification at something-like-compile-time, and you'll have a sane language.

▲beryilma 112 days ago

This is 5 times more readable than the FP example above for the same computation. The FP example uses the variable book(s) five times, where using it once was sufficient for SQL. Perhaps FP languages could have learned something from SQL...

▲xigoi 111 days ago

Now modify the SQL to support books having multiple authors. (In the FP example, you would just change map to flatMap.)
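A Python sketch of the map-versus-flatMap difference, assuming each book now carries a hypothetical `authors` list. Python has no `flatMap` method; a nested comprehension plays that role:

```python
books = [
    {"title": "Solo", "authors": ["Ann"], "page_count": 1200},
    {"title": "Duet", "authors": ["Bo", "Ann"], "page_count": 1500},
    {"title": "Short", "authors": ["Cy"], "page_count": 200},
]

long_books = [b for b in books if b["page_count"] > 1000]

# map: one entry per book, each entry a list of authors
nested = [b["authors"] for b in long_books]

# flatMap: one entry per author, flattened across books
flat = [a for b in long_books for a in b["authors"]]

distinct_authors = list(dict.fromkeys(flat))  # keeps first-seen order
print(distinct_authors)  # ['Ann', 'Bo']
```

The rest of the pipeline is unchanged; only the mapping step switches from one-to-one to one-to-many.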

▲odyssey7 111 days ago

Notably this example is declarative, the original is functional, and neither is imperative.

▲__mharrison__ 112 days ago

Folks don't seem to have a problem when SQL does it. Only when code like Pandas does it...

▲mont_tag 111 days ago

Hi Matt! I've observed this phenomenon as well.

When the SQL and Pandas examples are isomorphic except for shallow syntactic differences, the root cause of the complaint must either be:

* that the judgment was emotional rather than substantive
* or that the syntactic differences (dots and parens) actually matter

▲__mharrison__ 111 days ago

Folks really like their intermediate dataframes... for "debugging".

▲RedNifre 111 days ago

While I think your example is fine, I think the complaint was more about very long chains. Personally, I like to break them up to give intermediate results names, kinda like using variable names as comments:

  var longBooks = books.filter(book => book.pageCount > 1000)

  var authorsOfLongBooks = longBooks.map(book => book.author).distinct()

▲matejn 112 days ago

I like the SQL solutions people posted. But what about this one in Prolog?

  ?- setof(Author, Book^Pages^(book_author(Book, Author), book_pages(Book, Pages), Pages > 1000), Authors).

Depending on the structure of the Prolog database, it could be shorter:

  ?- setof(Author, Pages^(book(_, Author, Pages), Pages > 1000), Authors).

▲dsego 112 days ago

That's not a long chain. It doesn't even have a reduce, try nesting a few reducers and see how you like it.

▲aaronbrethorst 112 days ago

What is "long"?

▲brabel 111 days ago

Are you really confused when people don't think 3 operations constitutes "long"? I would guess anyone with half a brain would agree 3 operations is not long, maybe 5 or 6 and you will have many people agreeing, and above that most.

▲aquariusDue 111 days ago

Here's an abomination of my own design in Rust for example:

   for (index, node) in nodes
       .expect("Error: No blocks for the Body")
       .children()
       .expect("Error: blocks node has no children")
       .nodes()
       .iter()
       .enumerate()
   {
       let block = Block::new(node, index);

       self.blocks.push(block);
   }

▲maleldil 111 days ago

Why the for loop instead of mapping the enumerate iterator into Block::new and collecting into a Vec (`collect::<Vec<_>>()`)?
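The suggested refactor, sketched in Python rather than Rust. `Block` here is a hypothetical stand-in for the Rust type above; in Rust the mapped form would be roughly `.enumerate().map(|(index, node)| Block::new(node, index))` fed to `collect::<Vec<_>>()`, or to `self.blocks.extend(...)` when appending to an existing Vec:

```python
from dataclasses import dataclass


@dataclass
class Block:
    # Hypothetical stand-in for the Rust `Block` in the parent comment.
    node: str
    index: int


nodes = ["header", "para", "footer"]

# Loop-and-push style, as in the Rust snippet
blocks_loop = []
for index, node in enumerate(nodes):
    blocks_loop.append(Block(node, index))

# Map-and-collect style: the whole list is built in one expression
blocks_mapped = [Block(node, index) for index, node in enumerate(nodes)]

print(blocks_loop == blocks_mapped)  # True
```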

▲aquariusDue 110 days ago

I lack a good answer other than it didn't occur to me, now I have some code to refactor. Thanks!

▲maleldil 110 days ago

FWIW, I've had to write similar code when there's some complex "folding" operation. I use map/filter a lot, but fold/reduce/accumulate always seemed harder to understand than a for loop. I also prefer nested loops rather than nested iterators.

▲dsego 111 days ago

I agree with the sentiment but I don't think the mocking insult was necessary as per the HN guidelines https://news.ycombinator.com/newsguidelines.html

▲MathMonkeyMan 111 days ago

> I challenge anyone [...]

    select distinct author from book where pageCount > 1000;

▲elliottkember 111 days ago

Good example actually. You started with a books array, and changed the type to authors half-way through.

To know the return type of the chain, I have to read through it and get to the end of each line.

A longBooks array, and map(longBooks, ‘author’) wouldn’t be much longer, but would involve more distinct and meaningful phrases.

I used to love doing chains! I used lodash all the time for things like this. It’s fun to write code this way. But now I see that it’s just a one-liner with line breaks.

▲frankharrison 110 days ago

One core reason chaining can be bad is robustness; another longevity/maintenance.

Specifically around type safety, that is, knowing that the chained type is what you expect, and communicating that expectation to the person reading the code without them needing to know the wider context of either the chained API or the function the chain resides in. In the context of this article, that means more complexity, and therefore less readability.

I feel this is important because I have worked on many legacy code bases where bugs were found where chains were not behaving as expected, normally after attrition in some other part of the code base, and then you have to become a detective to work out the original intent.

For readability chains are bad, because they can lie about their intent, especially if there’s various semantics that can be swapped. But, like any industry or code base, if their use is consistent, and the api mature/stable, they can be powerful and fast, if.

▲throwaway894345 111 days ago

FWIW, I'm plenty familiar with functional programming and iterator chains, and I still think for loops often beat them--not only from a "visual noise" perspective, but because complex iterator chains are harder to read than equivalent for loops (particularly when you have to deal with errors-as-values and short circuiting or other patterns) and for simple tasks iterator chains might be marginally simpler but the absolute complexity of the task is so low that a for loop is fine.

> But you should be familiar enough that you stop randomly badmouthing map and filter like you have some sort of anti-functional-programming Tourette's syndrome.

I've been moderated for saying much tamer, FYI.

▲usrusr 111 days ago

Fully agree on the easier to read part. But despite all "code is read more often than written" arguments, I see a lot of merit in the functional way. There are a hundred ways to write the loop slightly wrong, or with intended behavior slightly different from the regular. The functional variant: not so much. A small variation from the standard loop in functional style is visible. Very visible. Unintended ones simply won't happen, and the intended ones are an eyesore. An eyesore impossible to miss reading, whereas a subtle variation of the imperative loop is exactly that, subtle. Easy to miss. Readability advantage functional.

In my book, keeping simple things simple and the not simple things not simple beats simplicity everywhere. This is actually what I consider a big drawback of functional style: often the not-simple parts are way too condensed, almost indistinguishable from trivialities. But in the loop scenario it's often the reverse.

My happy place, when writing, would be an environment that has enough AST-level understanding to transform between both styles.

(other advantages of functional style: skills and habits transfer to both async and parallel, the imperative loop: not so much)
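As a rough Python illustration of that transfer, assuming a hypothetical `page_count` function standing in for slower per-item work: the same `map` shape moves to a thread pool by swapping the built-in `map` for `concurrent.futures.ThreadPoolExecutor.map`, which also yields results in input order. An imperative loop that appends to a shared list would need restructuring instead.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data.
books = [
    {"title": "A", "pages": 1200},
    {"title": "B", "pages": 300},
    {"title": "C", "pages": 1500},
]


def page_count(book: dict) -> int:
    # Stand-in for per-item work that might be slow (e.g. an I/O call).
    return book["pages"]


# Sequential: the built-in map.
sequential = list(map(page_count, books))

# Parallel: the same shape, with the executor's map; results keep input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(page_count, books))

print(sequential == parallel)  # True
```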

▲eitland 111 days ago

This is easy to read, but in reality I have found things to typically be a bit less straightforward.

Three things typically happen:

1. People who like these chains really like them. I've seen multiple "one-liner expressions" composed of several statements ANDed or ORed together, needing one or two line breaks.

2. When it breaks (and it does in the real world), debugging is a mess. At least the last time I had to deal with it, there was no good way to put breakpoints in there. Maybe things have changed recently, but typically one had to rewrite it as a classic loop and then debug it.

3. It trips up a lot of otherwise good programmers who haven't seen it before.
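On the breakpoint problem: Java's streams have `peek` for this, and in Python a small pass-through generator does the same job. The `tap` helper below is hypothetical (not a standard-library function); it hands each item to a callback, where you can `print` or set a breakpoint, without changing what flows through the chain.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def tap(items: Iterable[T], fn: Callable[[T], None]) -> Iterator[T]:
    """Yield every item unchanged, calling fn on it first."""
    for item in items:
        fn(item)  # a convenient place for print() or a breakpoint
        yield item


# Hypothetical sample data.
books = [
    {"title": "A", "author": "Ann", "pages": 1200},
    {"title": "B", "author": "Bob", "pages": 300},
    {"title": "C", "author": "Ann", "pages": 1500},
]

seen = []  # records what reached the tap; use print instead when debugging
long_books = filter(lambda b: b["pages"] > 1000, books)
authors = set(map(lambda b: b["author"], tap(long_books, seen.append)))

print([b["title"] for b in seen])  # ['A', 'C']
print(authors)                     # {'Ann'}
```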

▲agent327 112 days ago

More readable? How about this:

    SELECT DISTINCT authors FROM books WHERE page_count > 1000;

▲namaria 111 days ago

Yeah, but how do you get that data written to a structured database so you get to do that?
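For the in-memory case, one low-effort answer in Python is the standard-library `sqlite3` module with a throwaway `:memory:` database. A minimal sketch, assuming a hypothetical `books` table whose author column is named `author` (the rows are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE books (title TEXT, author TEXT, page_count INTEGER)")
conn.executemany(
    "INSERT INTO books (title, author, page_count) VALUES (?, ?, ?)",
    [("A", "Ann", 1200), ("B", "Bob", 300), ("C", "Ann", 1500), ("D", "Cy", 1001)],
)

# ORDER BY only makes the output deterministic; DISTINCT does the de-duplication.
rows = conn.execute(
    "SELECT DISTINCT author FROM books WHERE page_count > 1000 ORDER BY author"
).fetchall()
authors = [author for (author,) in rows]
print(authors)  # ['Ann', 'Cy']
conn.close()
```

Whether loading the data is worth it depends on how often you query it; for a one-off filter the chain or loop is obviously less ceremony.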

▲Archelaos 111 days ago

I think an explicit type would make it even easier to grok:

  ISet<Author> authorsOfLongBooks =
    books
    .filter(book => book.pageCount > 1000)
    .map(book => book.author)
    .distinct()
    .toHashset()

Or whatever the equivalent for ISet<Author> is in the respective language. Or IReadonlySet<Author> if the set should be immutable.

▲williamcotton 111 days ago

And in a real functional language like F#...

  let authorsOfLongBooks = 
    books
    |> Seq.filter (fun book -> book.pageCount > 1000)
    |> Seq.map (fun book -> book.author)
    |> Seq.distinct

...you can set breakpoints anywhere in the pipeline!

▲hinkley 111 days ago

I wouldn’t call 3 long, which means you’ve picked the softball counterexample. If you were trying to play devil’s advocate, choose a longer legitimate one and show how a loop or other construct would make it better.

Three dots is just a random Tuesday.

▲wegfawefgawefg 111 days ago

in the most popular languages this way of programming is hurt by the syntax.

this could be something more like

  distinct
    filter books .pageCount >1000
    .author

i think fp looks pretty terrible in js, rust, python, etc

▲porridgeraisin 111 days ago

When the filter/mapper becomes slightly more involved as it basically always is in real life code, the regular imperative approach is much nicer.

▲Spivak 112 days ago

    def find_authors_of_long_books(books):
        authors_of_long_books = set()

        for book in books:
            if len(book.pages) > 1000:
                authors_of_long_books.add(book.author)

        return authors_of_long_books

You are told explicitly at the beginning what the type of the result will be, you see that it's a single pass over books and that we're matching based on page count. There are no intermediate results to think about and no function call overhead.

When you read it out loud, it's also natural, clear, and in the right order: "for each book, if the book has more than 1000 pages, add its author to the set."

▲tremon 112 days ago

> also it's natural, clear, and in the right order

That isn't natural to anyone who is not intimately familiar with procedural programming. The language-natural phrasing would be "which of these books have more than a thousand pages? Can you give me their authors?", which maps much more closely to the parent's LINQ query than to your code.

▲bdangubic 112 days ago

> That isn't natural to anyone who is not intimately familiar with procedural programming.

This is not about "procedural programming"; this is exactly how this works mentally. For kicks, I just asked my 11-year-old kid to write down the names of all the books behind her desk (20-ish of them) and give me the names of the authors of books that are 200 pages or more. She "procedurally":

1. took a book

2. flipped to last page to see page count

3. wrote the name of the author if the page count was 200 or more

The procedural approach is natural, it is clear, and it is in the right order.

▲skydhash 112 days ago

That's how you do the job, not the mental representation of the solution. I strongly believe that if you ask her to describe the task, she would go:

1. (Take the books)->(that have 200 pages or more)->(and mark down the name of the authors)->(only once)

▲bdangubic 112 days ago

I respectfully disagree. And I think one of the core reasons SWEs struggle with the functional style of programming is that it is neither intuitive nor how the average Joe’s brain works.

▲whstl 112 days ago

I haven't really encountered software engineers who really struggle with functional style in almost 20 years of seeing it in mainstream languages. It's just another tool that one has to learn.

Even the people arguing against functional style are able to understand it.

Strangely, this argument is quite similar to arguments I encounter when someone wants to rid the codebase of all SQL and replace it with ORM calls.

▲bdangubic 112 days ago

> Strangely, this argument is quite similar to arguments I encounter when someone wants to rid the codebase of all SQL and replace it with ORM calls.

We must be in completely different worlds, because I have yet (close to 30 years of hacking now) to see or hear of someone trying to introduce an ORM on a project that did not start with one. The opposite, though, is a constant: “how do we get rid of the ORM?” :)

> I haven't really encountered software engineers who really struggle with functional style in almost 20 years. It's just another tool that one has to learn.

I vividly recall when Java 8 came out (not the greatest example, but perhaps not too bad either) having to explain the concept of flatMap (wut is that fucking thing?) over and over, or even zipping two collections. Even to this day I see a whole lot of devs (across several teams I work with) doing “procedural” handling of collections in for loops, etc.
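For readers who hit the same wall, here is what `flatMap` and zipping amount to, sketched in Python rather than Java (the sample data is hypothetical): `flatMap` maps each element to a collection and flattens the result by one level, and `zip` pairs two collections up element by element.

```python
from itertools import chain

# Hypothetical sample data: each author has a (possibly empty) list of books.
authors = [
    {"name": "Ann", "books": ["A1", "A2"]},
    {"name": "Bob", "books": []},
    {"name": "Cy", "books": ["C1"]},
]

# flatMap: map each author to their list of books, then flatten one level.
all_books = list(chain.from_iterable(a["books"] for a in authors))
print(all_books)  # ['A1', 'A2', 'C1']

# The same thing as a nested comprehension (outer loop first).
assert all_books == [book for a in authors for book in a["books"]]

# zip: pair up two collections element by element (stops at the shorter one).
names = [a["name"] for a in authors]
book_counts = [len(a["books"]) for a in authors]
print(list(zip(names, book_counts)))  # [('Ann', 2), ('Bob', 0), ('Cy', 1)]
```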

▲whstl 112 days ago

I'm more talking about projects that do start with an ORM, but have judicious (and correct) usage of inline SQL for certain parts. It's not uncommon to see developers spending weeks refactoring into an ORM-mess.

The argument is always that "junior developers won't know SQL".

But yeah I've also seen the opposite happening once. People going gung-ho on deleting all ORM code "because there's so much SQL already, why do we need an ORM then".

And then the argument is that "everyone knows SQL, the ORM is niche".

I guess it's a phase that all devs go through in the middle of their careers. They see a hammer and a screwdriver in a toolbox, and feel the need for throwing one away because "who needs more than one tool"...

▲tvier 112 days ago

You are describing how to execute the procedure, while the gp is describing what the result should be. Both are valuable, but they're very different.

My personal take is that "how to execute" is more useful for lower-level and finer-grained control, while "what the results should be" is better for wrangling complex logic.

▲dambi0 111 days ago

Your daughter may have implemented it procedurally but your description of the task was functional.

▲syklemil 112 days ago

fwiw, once Python's introduced there's the third option on the table, comprehensions, which will also be suggested by linters to avoid lambdas:

    authors_of_long_books: set[Author] = {book.author for book in books if book.page_count > 1000}

These are somewhat contentious as they can get overly complex, but for this case it should be small & clear enough for any Python programmer.

▲itsmeknt 112 days ago

I tried scaling up the original into an intentionally convoluted, nonsensical problem to see what a more complicated solution would look like with each approach. Do these look right? And which seems the most readable?

  # Functional approach
  
  var favoriteFoodsOfFurryPetsOfFamousAuthorsOfLongChineseBooksAboutHistory = books
    .filter(book => 
       book.pageCount > 100 and 
       book.language == "Chinese" and 
       book.subject == "History" and
       book.author.mentions > 10_000
    )
    .flatMap(book => book.author.pets)
    .filter(pet => pet.is_furry)
    .map(pet => pet.favoriteFood)
    .distinct()

  # Procedural approach
  
  var favoriteFoodsOfFurryPetsOfFamousAuthorsOfLongChineseBooksAboutHistory = set()
  for book in books:
    if (book.pageCount > 100 and
        book.language == "Chinese" and
        book.subject == "History" and
        book.author.mentions > 10_000):
      for pet in book.author.pets:
        if pet.is_furry:
          favoriteFoodsOfFurryPetsOfFamousAuthorsOfLongChineseBooksAboutHistory.add(pet.favoriteFood)

  # Comprehension approach
  
  var favoriteFoodsOfFurryPetsOfFamousAuthorsOfLongChineseBooksAboutHistory = {
    pet.favoriteFood
    for book in books
    if book.pageCount > 100 and
       book.language == "Chinese" and
       book.subject == "History" and
       book.author.mentions > 10_000
    for pet in book.author.pets
    if pet.is_furry
  }

FWIW, for more complex problems, I think the second one is the most readable.
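To check that the three shapes really compute the same thing, here is a runnable Python sketch. The `Book`/`Author`/`Pet` dataclasses, the snake_case field names, and the sample data are all hypothetical; `itertools.chain.from_iterable` plays the role of `flatMap`, and `set(...)` plays the role of `distinct()`.

```python
from dataclasses import dataclass
from itertools import chain
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Pet:
    favorite_food: str
    is_furry: bool


@dataclass(frozen=True)
class Author:
    mentions: int
    pets: Tuple[Pet, ...] = ()


@dataclass(frozen=True)
class Book:
    page_count: int
    language: str
    subject: str
    author: Author


def qualifies(book: Book) -> bool:
    return (book.page_count > 100
            and book.language == "Chinese"
            and book.subject == "History"
            and book.author.mentions > 10_000)


def functional(books: Iterable[Book]) -> Set[str]:
    pets = chain.from_iterable(b.author.pets for b in filter(qualifies, books))  # flatMap
    furry = filter(lambda p: p.is_furry, pets)
    return set(map(lambda p: p.favorite_food, furry))  # distinct


def procedural(books: Iterable[Book]) -> Set[str]:
    foods = set()
    for book in books:
        if qualifies(book):
            for pet in book.author.pets:
                if pet.is_furry:
                    foods.add(pet.favorite_food)
    return foods


def comprehension(books: Iterable[Book]) -> Set[str]:
    return {
        pet.favorite_food
        for book in books
        if qualifies(book)
        for pet in book.author.pets
        if pet.is_furry
    }


famous = Author(20_000, (Pet("fish", True), Pet("seeds", False), Pet("fish", True)))
books = [
    Book(500, "Chinese", "History", famous),
    Book(300, "Chinese", "History", Author(15_000, (Pet("meat", True),))),
    Book(500, "Chinese", "History", Author(5, (Pet("carrots", True),))),   # author not famous
    Book(50, "Chinese", "History", Author(50_000, (Pet("bugs", True),))),  # book too short
]

assert functional(books) == procedural(books) == comprehension(books)
print(sorted(comprehension(books)))  # ['fish', 'meat']
```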

▲syklemil 112 days ago

I'm more partial to the first one because it keeps a linear flow downwards, and a uniform structure. The second one kind of drifts off, and reshuffling parts of it is going to be … annoying. IME the dot style lends itself much better to restructuring.

Depending on language you might also have some `.flat_map` option available to drop the `.reduce`.

▲itsmeknt 112 days ago

True! Good point on the restructuring, I haven't thought about it in that way.

I think I like the second approach because the loop behavior seems clearest, which helps me analyze the time complexity or when I want to skim the code quickly.

A syntax like something below would be perfect for me if it existed:

  var favoriteFoodsOfFurryPetsOfFamousAuthorsOfLongChineseBooksAboutHistory = books[i].author.pets[j].favoriteFood.distinct()
    where i = pagecount > 100,
              language == "Chinese",
              subject == "History",
              author.mentions > 10_000
    where j = is_furry == True

▲dahauns 110 days ago

Hm, LINQ query syntax form is kinda going in that direction

  (from book in books
   where book.pagecount > 100 
        && book.language == "Chinese"
        && book.subject == "History"
        && book.author.mentions > 10_000
   from pet in book.author.pets
   where pet.is_furry == true
   select pet.favoriteFood)
  .Distinct()

But it also demonstrates the...erm, chronic "halfassedness" of LINQ's query syntax form with distinct() not available there and having to fall back to method syntax form anyway...

▲syklemil 111 days ago

You would likely approach it in any style with some helper functions once whatever's in the parentheses or ifs starts feeling big. E.g. in the dot style you could

  fn bookFilter(book: Book) -> bool {
   return book.pageCount > 100 and 
     book.language == "Chinese" and 
     book.subject == "History" and
     book.author.mentions > 10_000
  }
  
  var favoriteFoodsOfFurryPetsOfFamousAuthorsOfLongChineseBooksAboutHistory = books
    .filter(bookFilter)
    .flatMap(book => book.author.pets)
    .filter(pet => pet.is_furry)
    .map(pet => pet.favoriteFood)
    .distinct()

▲tsss 112 days ago

Your FP example is needlessly complicated. No one who does FP regularly would write it like that.

  var favoriteFoodsOfFurryPetsOfFamousAuthorsOfLongChineseBooksAboutHistory = books
    .filter(book => 
       book.pageCount > 100 and 
       book.language == "Chinese" and 
       book.subject == "History" and
       book.author.mentions > 10_000
    )
    .flatMap(book => book.author.pets)
    .filter(pet => pet.is_furry)
    .map(pet => pet.favoriteFood)
    .distinct()

Or in Scala:

  val favoriteFoodsOfFurryPetsOfFamousAuthorsOfLongChineseBooksAboutHistory = (for {
    book <- books if
      book.pageCount > 100 &&
      book.language == "Chinese" &&
      book.subject == "History" &&
      book.author.mentions > 10_000
    pet <- book.author.pets if
      pet.is_furry
  } yield pet.favoriteFood).distinct

Though, most Scala programmers would prefer higher-order functions over for-comprehensions for this.

▲Chris_Newton 112 days ago

I didn’t see the original, but the FP example here looks fairly idiomatic to me.

An alternative, which in FP-friendly languages would have almost identical performance, would be to make the shift in objects more explicit:

    var favoriteFoodsOfFurryPetsOfFamousAuthorsOfLongChineseBooksAboutHistory =
      books
        .filter(book => isLongChineseBookAboutHistory(book))
        .map(book => book.author)
        .filter(author => isFamous(author))
        .flatMap(author => author.pets)
        .filter(pet => pet.isFurry)
        .map(pet => pet.favouriteFood)
        .distinct()

I slightly prefer this style with such a long pipeline, because to me it’s now built from standard patterns with relatively simple and semantically meaningful descriptions of what fills their holes. Obviously there’s some subjective judgement involved with anything like this; for example, if the concept of an author being famous was a recurring one then I’d probably want it defined in one place like an `isFamous` function, but if this were the only place in the code that needed to make that decision, I might inline the comparison.

▲itsmeknt 112 days ago

Thanks! I have updated my post to use your code. It is indeed much nicer. And yes, I don't write much FP.

I just improved the comprehension code as well using the same idea as your code, eliminating an entire list!

▲davidw 112 days ago

Without syntax highlighting, "book.author for book in books if book.page_count > 1000" requires a lot more effort to parse because white space like newlines is not being used to separate things out.

▲nicwolff 112 days ago

    authors_of_long_books: set[Author] = {
        book.author 
        for book in books 
        if book.page_count > 1000
    }

▲syklemil 112 days ago

You've had some answers already, but I also think this is a good argument for syntax highlighting. With tools like tree-sitter it's pretty easy these days to get high quality syntax highlighting, which allows us humans to receive more information in parallel. A lot of the information we pick up in our daily lives is carried through color, and being colorblind is generally seen as a disability (albeit often a mild one which can be undetected for decades).

Syntax highlighting in print is more limited because of technological and economic constraints, which might leave just bold, italics and underlines on the table, while dropping color. On screens and especially in our editors where we see the most code, a lack of color is often a self-imposed limitation.

▲davidw 112 days ago

That's not the point though. If you need the syntax highlighting to quickly make out the structure, perhaps the visual layout is not as good as it could be.

▲syklemil 112 days ago

I consider syntax highlighting to be a part of the _visual_ structure. Visibility is more than just whitespace and placement!

▲xen0 112 days ago

Set comprehensions are normal in mathematics and, barring very long complex ones, I find them the easiest to parse because they are so natural.

They're just a tad more verbose in Python than mathematics because it uses words like 'for' and 'in' instead of symbols.

▲d0mine 112 days ago

Set comprehensions are more idiomatic here (explicit syntax), though filter/map are not that bad either:

    {*map(_.author, filter(_.page_count > 1000, books))}

It uses the `lambdas` package.

▲mrkeen 111 days ago

Awesome, giving it a quick scan,

  authors_of_long_books = set()

Now I know that authors_of_long_books is the empty set. Do I need to bother reading the rest?

▲tvier 112 days ago

Much of both sides of this argument is opinion, but regarding this comment:

> ... no function call overhead.

This code has more function calls. O(n) vs 3 for the original

▲khaledh 112 days ago

That's not true. The lambdas used in the functional version are each called once for every item in the list.

▲stouset 112 days ago

No sane optimizer is going to emit the functional code as a gajillion function calls.

▲tvier 112 days ago

Yeah, if you treat it as JavaScript vs. Python, they're likely correct (I'm not that familiar with JS). The article and original comment were about functional vs. imperative, though, so I assumed half-decent runtimes for both.

▲Spivak 112 days ago

It's not? How could that possibly work when the lambda could throw and it could throw on the nth invocation and your stack trace has to be correct?

If I run this in the JS console I get two anonymous stack frames. The first being the console itself.

    [1, 2, 3].filter(x => [][0]())

▲khaledh 112 days ago

True, but now you're relying on a specific implementation and optimization of the compiler, unless the language semantics explicitly say that lambdas will be inlined.

▲stouset 112 days ago

This is true of literally anything and everything your compiler emits. In practice the functional style is much easier to optimize to a far greater degree than the imperative style.

▲tvier 112 days ago

This is why you shouldn't get into arguments about performance on the internet without highly specified execution environments.

I'm going to take my own advice and go back to work :)

▲feoren 112 days ago

> You are told explicitly at the beginning what the type of the result will be

I would argue that's a downside: you have to pick the appropriate data structure beforehand here, whereas .distinct() picks the data structure for you. If, in the future, someone comes up with a better way of producing a distinct set of things, the functional code gets that for free, but this code is locked into a particular way of doing things. Also, .distinct() tells you explicitly what you want, whereas the intention of set() is not as immediately obvious.
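One way to picture that trade-off in Python: a hypothetical `distinct` generator (not a built-in) that yields each item the first time it is seen. It states the intent, stays lazy, and preserves order, and the caller decides whether the result ends up in a `set`, a `list`, or is consumed on the fly.

```python
from itertools import count, islice
from typing import Hashable, Iterable, Iterator, Set, TypeVar

H = TypeVar("H", bound=Hashable)


def distinct(items: Iterable[H]) -> Iterator[H]:
    """Lazily yield each item the first time it appears, preserving order."""
    seen: Set[H] = set()
    for item in items:
        if item not in seen:
            seen.add(item)
            yield item


authors = ["Ann", "Bob", "Ann", "Cy", "Bob"]
print(list(distinct(authors)))  # ['Ann', 'Bob', 'Cy']

# Because it is lazy, it also works on an infinite stream, as long as you
# only ask for as many distinct items as actually exist (here, 3).
print(list(islice(distinct(n % 3 for n in count()), 3)))  # [0, 1, 2]
```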

> There are no intermediate results to think about

I could argue that there aren't really intermediate results in my example either, depending on how you think about it. Are there intermediate results in the SQL query "SELECT DISTINCT Author FROM Books WHERE Books.PageCount > 1000"? Because that's very similar to how I mentally model the functional chain.

There are also intermediate results, or at least intermediate state, in your code: at any point in the loop, your set is in an intermediate state. It's not a big deal there either though: I'd argue you don't really think about that state either.

> and no function call overhead

That's entirely a language-specific thing, and volatile: new versions of a language may change how any of this stuff is implemented under the hood. It could be that "for ... in" happens to be a relatively expensive construct in some languages. You're probably right that the imperative code is slightly faster in most languages today, and if it has been shown via performance analysis that this particular code is a bottleneck, it makes sense to sacrifice readability in favor of performance. But it is a sacrifice in readability, and the current debate is over which is more readable in the first place.

> a single pass over books

Another detail that may or may not be true, and probably doesn't matter. The overhead of different forms of loops is just not what's determining the performance of almost any modern application. Also, my example could be a single pass if those methods were implemented in a lazy, "query builder" form instead of an immediately-evaluated form.

In fact, whether this query should be immediately evaluated is not necessarily this function's decision. It's nice to be able to write code that doesn't care about that. My example works the same for a wide variety of things that "books" could be, and the strategy to get the answer can be different depending on what it is. It's possible the result of this code is exactly the SQL I mentioned earlier, rather than an in-memory set. There are lots of benefits to saying what you want, instead of specifying exactly how you want it.
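A small Python sketch of the lazy, single-pass point, with hypothetical sample data: generator expressions build the pipeline without running it, and each book is read exactly once, flowing through the filter and the map before the next one is read. The `source` wrapper exists only to log reads.

```python
from typing import Dict, Iterable, Iterator, List

log: List[str] = []


def source(books: Iterable[Dict]) -> Iterator[Dict]:
    """Yield books one at a time, recording each read."""
    for book in books:
        log.append(f"read {book['title']}")
        yield book


books = [
    {"title": "A", "author": "Ann", "pages": 1200},
    {"title": "B", "author": "Bob", "pages": 300},
    {"title": "C", "author": "Cy", "pages": 1500},
]

long_books = (b for b in source(books) if b["pages"] > 1000)  # nothing has run yet
authors = (b["author"] for b in long_books)                   # still nothing
print(log)            # []

print(next(authors))  # Ann
print(log)            # ['read A']  (only as far as needed)

print(list(authors))  # ['Cy']
print(log)            # ['read A', 'read B', 'read C']  (one pass, no intermediate list)
```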

▲megous 112 days ago

Set is a well defined container for unique values. It's much clearer what it is than some non-existent .distinct() function with no definition and unclear return value.

Procedural code in JS doesn't say how you want something done any more closely than the functional style variant. for-of is far more generic than .map/.filter() since .map() only works on Array shaped objects, and for-of works on all iterables, even generators, async generators, etc. In any case you're not saying how the iteration will happen with for-of, you're just saying that you want it. Implementation of Set is also the choice of a language runtime. You're just stating what type of container you want.

Sometimes functional style may be more readable, sometimes procedural style may.

▲yongjik 112 days ago

Maybe it's because I'm not familiar with such style, but I don't like how the code hides operational details. That is, if `books` contains one billion books, and the final result should contain about a hundred authors, how much extra memory does this use for intermediate results?

▲mrkeen 112 days ago

The best way to kick the tyres on this kind of question is to plug in something literally infinite. That way if you arrive at an answer you're probably doing something right with regard to space and time usage.

For example, use all the prime numbers as an expression in your chain.

    import Data.Function
    import Data.Numbers.Primes

    main = do

        let result :: [Int] = primes
                            & filter (startingWithDigit '5')
                            & asPairs
                            & map pairSum
                            & drop 100000
                            & take 10

        print result

    asPairs xs = zip xs (tail xs)
    pairSum (a, b) = a + b
    startingWithDigit d x = d == head (show x)

> [100960734,100960764,100960792,100960800,100960812]

> 3 MiB total memory in use (0 MB lost due to fragmentation)
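A Python translation of the same idea, built from lazy `itertools` pieces. It is a sketch with smaller numbers (drop 100, take 5) because the naive trial-division `primes` generator here is far slower than the Haskell `primes` package; `as_pairs` mirrors the `zip xs (tail xs)` definition above.

```python
from itertools import count, islice, takewhile, tee
from typing import Callable, Iterable, Iterator, Tuple


def primes() -> Iterator[int]:
    """Infinite stream of primes by trial division against earlier primes."""
    found = []
    for n in count(2):
        if all(n % p for p in takewhile(lambda p: p * p <= n, found)):
            found.append(n)
            yield n


def as_pairs(xs: Iterable[int]) -> Iterator[Tuple[int, int]]:
    """Lazy equivalent of Haskell's zip xs (tail xs)."""
    a, b = tee(xs)
    next(b, None)
    return zip(a, b)


def starting_with_digit(d: str) -> Callable[[int], bool]:
    return lambda x: str(x)[0] == d


result = list(islice(
    map(sum, as_pairs(filter(starting_with_digit("5"), primes()))),
    100, 105,  # drop 100, take 5
))
print(result)
```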

▲yxhuvud 112 days ago

This is a valid concern I also reacted a little bit on. One thing to note though is that it is often possible to tell such chains to be lazy and only collect the end result at the end without ever generating any intermediary arrays.

That requires the author to actually have an idea of how big the numbers are, but that is very often the case regardless of how you write your code.

▲patrick451 111 days ago

Sometimes I write code like this. Then I delete it and replace it with a for loop, because a loop is just easier to understand.

This functional style is what I call write only code, because the only person who can understand it is the one who wrote it. Pandas loves this kind of method chaining, and it's one of the chief reasons pandas code is hard to read.

▲throwA29B 111 days ago

Chaining calls is an anti-pattern. Not only is it a needless duplication of the ye olde imperative statement sequence, it also makes debugging, modifying ("oh, I need to call some function in the middle of the chain, ugh"), and understanding harder, for the superficial benefit of looking "cool".

It actively hurts maintainability, please stop using it.

▲recursivedoubts 112 days ago

There is a (large, I believe) aspect of good code that is fundamentally qualitative & almost literary. This annoys a lot of computer programmers (and academics) who are inclined to the mathematical mindset and want quantitative answers instead.

I love Dostoyevsky and Wodehouse; both wrote very well, but also very differently. While I don't think coding is quite that open a playing field, I have worked on good code bases that feel very different qualitatively. It often takes me a while to "get" the style of a code base, just as a new author may take a while for me to get.

▲louthy 112 days ago

I 100% agree with this. One of the best compliments I ever got (regarding programming) was from one of my principal engineers who said something along the lines of "your code reads like a story". He meant he could open a code file I had written, read from top to bottom and follow the 'narrative' in an easy way, because of how I'd ordered functions, but also how I created declarative implementations that would 'talk' to the reader.

I follow the pure functional programming paradigm which I think lends itself to this more narrative style. The functions are self contained in that their dependencies/inputs are the arguments provided or other pure functions, and the outputs are entirely in the return type.

This makes it incredibly easy to walk a reader through the complexity step-by-step (whereas other paradigms might have other complexities, like hidden state, for example). So, ironically, the most mathematically precise programming paradigm is also the best for the more narrative style (IMHO of course!)

▲DidYaWipe 111 days ago

I had a similar experience. I was a lead on a project where the client sent a functional expert to literally (at times) watch over my shoulder as I worked. He got very frustrated after a few weeks, seeing little in the way of code being laid down. He even complained to the project manager. That's because this was a complicated manufacturing system, and I was absorbing the necessary rules for it and designing it... a process that involved mostly sitting and thinking.

When I decided on the final design and basically barfed all the code out in a matter of days, I walked this guy (a non-programmer) through the code. He then wrote my manager a letter declaring it to be the "most beautiful code he had ever seen." I still have the Post-It she left in my cube telling me that.

I have little tolerance for untidy code, and also overly-clever syntax that wastes the reader's time trying to unravel it.

And now we have languages building more inconsistent and obscure syntax in as special-case options, wasting more time. Specifically, I'm thinking about Swift, where if the last parameter in a function call is a closure, it's a "trailing" closure and you can just ignore the function signature and plop the whole closure right there AFTER the closing parenthesis. Why? https://www.hackingwithswift.com/sixty/6/5/trailing-closure-...

This is just one example, and yeah... you can get used to it. But in this example, the language has undermined PARENTHESES, a notation that is almost universally understood to enclose things. When something that basic is out the window, you're dealing with language designers who lack an appreciation for human communication.

▲qwertygnu 112 days ago

> The functions are self contained in that their dependencies/inputs are the arguments provided or other pure functions and the outputs are entirely in the return type.

Is this just a fancy way of saying static functions?

▲louthy 112 days ago

Nope, pure functions are referentially transparent. The key idea is that you can replace the function invocation with a value and it shouldn’t change the program.

A regular static function could refer to a file, a database, or it could change some global memory, etc. So, replacing the static function (that causes side-effects) with a pure value wouldn’t result in the same program.

Side-effects are usually declaratively represented by something like an IO monad. Which in reality is just a lambda with the side-effecting behaviour in the body of the lambda.

So, to make a pure IO function you don’t actually perform the IO in the function; you return a data type (the lambda) that represents the IO to perform. This maintains the purity of the function and ‘passes the buck’ to the caller. In the case of Haskell, that goes all the way up to its Main function and into its runtime, making the language itself pure, even if the runtime isn’t.
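A minimal Python sketch of that idea (not Haskell's `IO` type and not language-ext's API; the function names are hypothetical): `greeting` is referentially transparent, and `greet` performs no I/O when called, returning a zero-argument lambda that describes the I/O and leaving it to the caller to run.

```python
from typing import Callable


def greeting(name: str) -> str:
    """Pure: the result depends only on the argument, and nothing else happens."""
    return f"Hello, {name}!"


def greet(name: str) -> Callable[[], None]:
    """Performs no I/O itself: it returns a description of the I/O
    (a zero-argument lambda) and leaves running it to the caller."""
    return lambda: print(greeting(name))


# Referential transparency: this call can be replaced by its value.
assert greeting("world") == "Hello, world!"

action = greet("world")  # nothing is printed yet
action()                 # the side effect happens here, at the program's edge
```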

This isn't just a Haskell thing though. I'll write code this way in C# (and have built a large pure-FP framework for C# to facilitate this approach [1]).

Here's an example of the more 'narrative style' [2] of C# using pure-FP. It reads from top-to-bottom, keeping the related functions near each other and walking the reader through the functionality. There's also a massive removal of the usual clutter you see in C#/Java programs, getting down to the essence of the logic. It won't be to everybody's taste (as it's not idiomatic at all), but it demonstrates the idea.

This style works well for regular program logic and less well for things like APIs where there's not always a narrative you can tell.

[1] https://github.com/louthy/language-ext

[2] https://github.com/louthy/language-ext/blob/main/Samples/Car...

▲syklemil 112 days ago

> Nope, pure functions are referentially transparent. The key idea is that you can replace the function invocation with a value and it shouldn’t change the program.

[Edit: This is wrong: And idempotent.] Generally you can expect that you can call them as many times as you like and get the exact same result. It _feels_ very safe.

> This isn't just a Haskell thing though. I'll write code this way in C# (and have built a large pure-FP framework for C# to facilitate this approach [1]).

I think that habit from Haskell is also what allowed me to pick up Rust pretty easily. You don't run afoul of the borrowchecker much if you don't expect to mutate a lot of stuff, and especially at a distance.

▲chowells 112 days ago

That's not what idempotent means. Idempotent means forall x, f(x)=f(f(x)). Most pure functions are not idempotent. Heck, f(f(x)) doesn't even type-check for most f. The typical name given to always getting the same results is just "pure". It doesn't depend on any implicit state anywhere.
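The distinction in a few lines of Python (function names are hypothetical): purity means same output for same input with no side effects; idempotence means f(f(x)) = f(x), or, for effectful operations, that doing it twice leaves the same state as doing it once. Neither property implies the other.

```python
def increment(x: int) -> int:
    return x + 1  # pure, but not idempotent


def clamp_to_100(x: int) -> int:
    return min(x, 100)  # pure and idempotent


settings = {}


def enable_debug() -> None:
    settings["debug"] = True  # impure (mutates shared state), yet idempotent in effect


assert increment(increment(1)) == 3 != increment(1)
assert clamp_to_100(clamp_to_100(250)) == clamp_to_100(250) == 100

enable_debug()
enable_debug()  # a second call leaves the state unchanged
assert settings == {"debug": True}
```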

▲syklemil 112 days ago

Right you are. I wish I had an excuse for my mistake, but I don't.

▲hinkley 111 days ago

There’s a difference between simplifying a concept and stating it plainly.

I use this analogy a lot. Code can be like a novel, a short story, or a poem. A short story has to get to the point pretty quickly. A poem has to be even more so, but it relies either on shared context or extensive unpacking to be understood. It’s beautiful but not functional.

And there are a bunch of us short story writers who just want to get to the fucking point with a little bit of artistic flair, surrounded by a bunch of loud novel and mystery writers arguing with the loudest poets over which is right when they are both wrong. And then there’s that asshole over there writing haikus all the fucking time and expecting the rest of us to be impressed. The poets are rightfully intimidated but nobody else wants to deal with his bullshit.

▲h4ny 111 days ago

I see where you are coming from but that's unnecessarily hostile.

> There’s a difference between simplifying a concept and stating it plainly.

You are right, but they are not mutually exclusive.

The analogy you used with novel, short story, and poem/haiku also doesn't demonstrate your point: it's not like you can compress any novel into a short story, let alone a poem. If you're into games, try equating AAA-quality 3D games to novel, high-resolution 2D games to short stories, and pixel art games to haikus: it doesn't make sense and it's ridiculous.

I respect that you are passionate about the medium you choose, but what you claimed about novels and poems is, to borrow your own words, "both wrong" at best. Don't generalize your personal experience to everyone else; there are kind, hardworking people out there writing novels and poems who love short stories just as much. Maybe what you need to do is find those people instead of spewing your unwarranted anger at them.

▲hinkley 111 days ago

If you think people only get upset about things for their own self interest, then I wonder what you think about social justice.

You have a Texas Sharpshooter Fallacy in your logic. A novelist is successful if they reach an audience. Once they find it, if they stick with it they will be successful. If they’re lucky then they might switch up genres without alienating their existing readers. But not everyone gets away with that.

A software developer has one audience and they don’t get to choose it. You and I write for our coworkers. If they don’t like it we have three choices. We can leave, we can change, or we can gaslight our coworkers that our code is just fine and they are the problem.

It’s the latter I’ve seen too much of, and even if you’re not a victim of it you’re allowed to be incensed for those who are. In fact you’re obligated to do so.

▲h4ny 111 days ago

> If you think people only get upset about things for their own self interest, then I wonder what you think about social justice.

That's a gross mischaracterization of what I said. If anything, I'd be really concerned if you think what you're doing is akin to "social justice", and being hostile to others is justified in the name of "social justice".

> You have a Texas Sharpshooter Fallacy in your logic.

That doesn't even make any sense.

> A novelist is successful if they reach an audience. Once they find it, if they stick with it they will be successful. If they’re lucky then they might switch up genres without alienating their existing readers. But not everyone gets away with that.

Sure, there are some people who do that and not everyone "gets away with" not sticking with what made them "successful" (what is "successful" in this context anyway and how do you measure it? Wealth? Fame? Cultural impact?).

What you said isn't wrong, but there are also plenty of counterexamples to it. So I'm not sure what you're trying to say. That looks more like a Texas Sharpshooter Fallacy.

Nobody is forcing successful novelists with an audience to continue and stick with things. Nobody is forcing unsuccessful novelists to keep going either. You make it sound like they don't have a choice, so somehow everyone has to recognize and exercise "social justice" by speaking out for them.

> A software developer has one audience and they don’t get to choose it. You and I write for our coworkers. If they don’t like it we have three choices. We can leave, we can change, or we can gaslight our coworkers that our code is just fine and they are the problem.

> It’s the latter I’ve seen too much of, and even if you’re not a victim of it you’re allowed to be incensed for those who are. In fact you’re obligated to do so.

I don't disagree with any of that, and I have worked with my fair share of the gaslighting kind of software engineers you pointed out -- much too often and for much too long. They are usually very senior engineers with authority who end up literally destroying teams.

However, none of that justifies the hostility in your initial comment, and that's the only point I'm trying to make. I have worked with people who are capable of writing well-structured code for others that is delightful to read and maintain. If you haven't, then I hope you will some day.

Anyway, at this point I'm thinking that you should just "get to the fucking point with a little bit of artistic flair", and you are probably thinking the same in reverse. Let's just leave it there.

▲hinkley 111 days ago

>> You have a Texas Sharpshooter Fallacy in your logic.

> That doesn't even make any sense.

Well that explains the long response but that’s the gist right there.

Sharpshooter fires bullets at a barn and then paints a target on the spot with the most holes. That’s how creative writing usually works, unless you’re a paid columnist and even then it’s partially true.

I think you’re mistaking swearing for hostility. Not everyone has veins popping out of their foreheads when they call a bullshit situation bullshit.

▲zwnow 112 days ago

I consider code bad if it takes more than 5 seconds to read and understand the high-level goal of a function.

Doesn't matter how it looks. If it's not possible to understand what a function accomplishes within a reasonable amount of time (without requiring hours upon hours of development experience), it's simply bad.

▲jacobr1 112 days ago

There is a call-stack depth problem here that is specific to codebases, though. For someone familiar with the conventions, key data abstractions (not just the data model but the conventions for how models are structured and relate) and key code abstractions, a well-formed function is easy to understand. But someone relatively new to the codebase will need to spend a lot of time switching between levels to know what can be assumed about the state or control flow of the system when that function/subroutine is running. Better codebases avoid side effects, but even with good separation there, non-trivial changes require strong reasoning about where to make changes in the system to avoid introducing side effects, rather than just passing extra state around all over the place.

So I'd take "good architecture" with OK-or-better readability over excellent readability with "poor architecture" any day, where architecture in this context means the broader code structure of the whole project.

▲zwnow 112 days ago

But who talked about bad architecture? Good, readable code doesn't rule out good architecture. Surely some things are complicated, but even then, a dev should be able to quickly see what's going on with minimal expertise in a codebase.

▲swatcoder 112 days ago

They're suggesting that readability and "5-second accessibility" are essentially contextual and build on a conceptual language that might be specific to a tool, team, or project.

The novel function that might take "5 seconds to read" for the 20 people contributing to a mature project with a good architecture might nonetheless take 10 minutes for a new hire to decipher because they don't know the local vocabulary (architecture, idioms) and need to trace and interpret things elsewhere in the project and its libraries.

Meanwhile, writing implementations in a way that tries to avoid a local vocabulary altogether might make naive reads easier, but it makes the code less readable to experienced team members because it's not as concise as it could be.

Your general advice to "make things easily readable" is good advice, but like with writing compelling prose or making accessible graphic design, you need to consider your audience and your choices might look different than the ones somebody else might make.

▲hinkley 111 days ago

Go is a board game with simple rules but combinatorial consequences to those rules. Minutes to learn and a lifetime to master.

This is where “simple recursive data structures” can be simple to read but difficult to track and comprehend. An architecture that descends through distinct layers at least has landmarks that the recursive one does not have. If you are not at the root or the leaf you don’t really know where you are, lost in the middle.

▲jacobr1 112 days ago

The broader problem is "cognitive load to understand the code I'm looking at." A variety of factors contribute to that. This is a very limited example.

  function isReadyToDoThing(Foo foo) {
    return foo.ready
  }

  function processStuff(Foo foo) {
    if (isReadyToDoThing(foo)) {
      res = workflowA(foo)
      res2 = workflowB(foo)
      return res && res2
    }
  }

This might be dumb if isReadyToDoThing is trivial and could easily be inlined. Alternatively, it could be a good way to self-document, or to annotate a preferred approach (imagine several similarly named methods). Regardless, if you don't know the code, you'll want to go look at the method, especially if it is in a different file.

But also consider:

  function isReadyToDoThing(Foo foo) {
    return foo.attribute1 && foo.attribute2 && ! otherThing(foo)
  }

This or more complex logic might be encapsulated, in which case it is probably good to separate it.

Making these kinds of tradeoffs involves thinking about the overall system design, not just the way you structure the code within a given function.

▲maleldil 111 days ago

If `isReadyToDoThing` is only used in one place, I'd argue that it's better to inline it with an appropriately-named variable so I don't have to "go to definition" to understand what it's doing.

I think people get too caught up in "small functions" and lose the readability of code locality.
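A minimal sketch of that suggestion in JavaScript, reusing the field names from the `isReadyToDoThing` example above. `workflowA`, `workflowB`, and `otherThing` are hypothetical stubs, and unlike the original snippet this version returns `false` explicitly when the check fails:

```javascript
// Hypothetical stand-ins for the workflows in the example above.
function workflowA(foo) {
  return foo.attribute1;
}

function workflowB(foo) {
  return foo.attribute2;
}

// Hypothetical stand-in for the extra check in isReadyToDoThing.
function otherThing(foo) {
  return foo.blocked === true;
}

function processStuff(foo) {
  // The single-use predicate is inlined and named locally, so the reader
  // sees its definition without a "go to definition" jump.
  const isReadyToDoThing = foo.attribute1 && foo.attribute2 && !otherThing(foo);
  if (!isReadyToDoThing) {
    return false;
  }
  const res = workflowA(foo);
  const res2 = workflowB(foo);
  return res && res2;
}
```

If the same predicate later turns up in a second place, promoting it back to a named function is a one-line move.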

▲ikrenji 112 days ago

the purpose of the function should be clear from its name. if it's too complex to convey this information, it should have a docstring that clearly explains what it does. it's not rocket science

▲apelapan 112 days ago

In the general case I fully agree. It is ideal that the name of anything clearly indicates whatever is important about that thing.

But who is the viewer of that name?

How much context can they be assumed to have? The name of the class? The name of the module? The nature and terminology of the business that the function serves? The nature and terminology of the related subsystems, infrastructure and libraries?

There is a context-dependent local optimum for how to name something. There are conflicts of interest and trade-offs have to be made.

▲skydhash 112 days ago

> How much context can they be assumed to have? The name of the class? The name of the module? The nature and terminology of the business that the function serves? The nature and terminology of the related subsystems, infrastructure and libraries?

All of this.

▲horsawlarway 112 days ago

Names are jargon. They are themselves their own form of complexity, and they require a similar timeline to become acquainted with.

Further - the more of them you need (call depth) the worse this problem becomes.

▲jajko 112 days ago

Yeah, not disagreeing with what you write, but the parent is talking about a different type of complexity, which your description/approach doesn't magically fix. I'd call it spread of complexity.

▲fasbiner 112 days ago

Sounds like you are content to limit yourself to problems that do not contain more irreducible complexity or require more developer context than what fits within five seconds of comprehension.

That's a good rule for straightforward CRUD apps and single-purpose backend systems, but as a universal declaration, "it is simply bad" is an ex cathedra metaphysical claim from someone who has mistaken their home village for the entirety of the universe.

▲hinkley 111 days ago

> is an ex cathedra metaphysical claim

I have a cargo ship-sized suspicion that your code is difficult to read for reasons other than intrinsic complexity.

You’ve found a way to explain it to yourself and excuse it to others, but you won’t always be the smartest person in the room.

Also that’s not what was said.

> more than 5 seconds to read and understand the high level goal of a function

Understanding what something is for is not understanding how it accomplishes it.

▲fasbiner 110 days ago

If you find compact language above your level of proficiency confusing, you can literally ask an LLM to explain it for you to trade efficiency for accessibility.

You sound unhappy and seem to be lashing out, and your username can only be read as an allusion to a mentally ill would-be assassin. Given those, you can maybe begin to understand why your opinions are not credible as a contribution even in relation to other anonymous people.

A small part of your comment is salvageable, though:

> Understanding what something is for is not understanding how it accomplishes it

I can think of at least one area I know something about where the 5 second rule fails - sometimes when working on a shader and optimizations for it, it takes more than 5 seconds for the person who wrote the code to describe what it's for at a high level.

If even the person who wrote the code can't meet that arbitrary constraint, other people looking at the code for the first time have no chance.

▲ajross 112 days ago

> I consider code bad if it takes more than 5 seconds to read and understand the high level goal of a function.

That's something that's possible only for fairly trivial logic, though. Real code needs to be built on an internal "language" reflecting its invariants and data model and that's not something you can see with a microscope.

IMHO obsessive attention to microscope qualities (endless style nitpicking in code review, demands to run clang-format or whatever the tool du jour is on all submissions, style guides that limit function length, etc...) hurts and doesn't help. Good code, as the grandparent points out, is a heuristic quality and not one well-defined by rules like yours.

▲necrotic_comp 112 days ago

I agree with this up to a point - having consistent code style with some sort of formatter (gofmt, black, clang-format) goes a long way to reducing complexity of understanding because it unifies visual style.

I suggest that a codebase should read like a newspaper. While there is room for op-eds in the paper, it's not all op-eds, everything else should read as a single voice.

▲ajross 112 days ago

> While there is room for op-eds in the paper,

My experience is that projects which value code formatters (and similar rulemaking) tend strongly not to have room for "op-eds", FWIW. And conversely the code bases I've seen with the best/cleanest/most-clearly-expressive code tend strongly to be the ones with fewer rules.

I think the one exception there is in some open source contexts (Linux is the archetype) which receive a fire hose of submissions of questionable quality and maintainership. There, you can use adherence to arbitrary rules as a proxy measurement[1] for attention and effort on the part of the submitter. And that I have no problem with.

But the actual value of code formatters is IMHO extremely low in practice, and the cost isn't "high", but is non-trivial.

[1] The "No Brown M&M's" trick.

▲zwnow 112 days ago

I even wrote that it doesn't matter how it looks?

I meant the goal of your function needs to be grasped within a reasonable amount of time. This works for every codebase.

▲ajross 112 days ago

> I meant the goal of your function needs to be grasped within a reasonable amount of time. This works for every codebase.

It really doesn't though. Here's a function of mine. It's maybe 40 lines of logic, so medium-scale. It's part of an intrusive red/black tree implementation for Zephyr. I'm fairly proud of how it turned out, and think that this code is awfully readable given its constraints.

No human being is going to understand fix_extra_red() without having already read and understood the rest of the file, and coming to it with an understanding of the underlying algorithm. Certainly I can't. I can't even get started on maintaining this code that I originally wrote within a five-minute time frame; it's an hour at least every time, just to remind myself how it works:

https://github.com/zephyrproject-rtos/zephyr/blob/main/lib/u...

Now maybe this is "bad code", and "good code" could exist for this problem that still meets your requirements. But... if so that's an awfully celestial definition if it's so hard to find.

▲sunrunner 112 days ago

This is exactly the kind of example I have in my head for code that constitutes a high level of information density. Adding abstraction and 'literate' constructs to try and make things readable is ultimately deferring the fact that understanding the code here is fundamentally about understanding a specific implementation of an algorithm, and to understand _that_ ultimately needs the reader to build their own clear mental model of both the algorithm itself and how it's been translated into a specific form.

Maybe it's a defeatist attitude, but I feel like sometimes the problem is the problem, and pushing abstractions only works to defer the requirements to understand it. Sometimes you can defer it enough to do useful work, other times you just need to understand the thing.

▲whstl 111 days ago

> Maybe it's a defeatist attitude, but I feel like sometimes the problem is the problem, and pushing abstractions only works to defer the requirements to understand it

That's also my impression and experience.

And sometimes there is no problem at all, but abstractions are still pushed too far and then a problem arises in the form of non-essential complexity.

▲hinkley 111 days ago

Code that looks like it has a bug in it but doesn’t will draw the eye over, and over, and over again when fishing for how regressions or bugs got into the code. This is the real cost of code smells. At some point it’s cheaper for me to clean up your mess than to keep walking past it every day. But I’m going to hate you a little bit every time I do.

▲callc 112 days ago

Generally agree.

Consider reading kernel or driver code. These areas have a huge amount of prerequisite knowledge that - I argue - makes it OK to violate the “understand at a glance” rule of thumb.

▲andybp85 112 days ago

for the codebase i work on, i made a rule that "functions do what the name says and nothing else". this way if the function does too much, hopefully you feel dumb typing it and realize you should break it up.

▲quinnirill 112 days ago

Then what does the function that calls the split functions get called? foo_and_bar_and_qoo? And if they’re called only under some conditions?

▲hinkley 111 days ago

I find an odd overlap between people who get incredulous about function decomposition and people who think cracking open a thesaurus as an architectural exercise is stupid.

I have no idea what that’s about, but I think it has something to do with “white-knuckling”.

People name things and then miss boundary conditions that matter and would have been implied by finding a more accurate synonym. And also supplementary features that the better name suggests.

▲jbeninger 112 days ago

I have definitely been guilty of naming functions foo_thenBarSometimes. I wince whenever I write them, but I've never really regretted seeing them later on, even after years. So sometimes it really is a perfectly good name. Sometimes there are two related functions that are often called together and don't have a succinct label for the combined operation.

▲brulard 112 days ago

It likely has some higher-level meaning other than just do foo, bar, qoo.

For example, if you are calling functions "openDishwasher", "loadDishwasher", "closeDishwasher", "startDishwasher", your function should be called "washDishes". It's not always that straightforward, but I believe that in 95% of cases it's not difficult to put a name on it. For the remaining 5% you need to get creative, or maybe you realize that you didn't group the function calls well enough to have an atomic meaning.
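Sketched in JavaScript, the dishwasher example might look roughly like this. The step functions and the `log` array are hypothetical stand-ins so the call order can be observed:

```javascript
// Hypothetical dishwasher that records each step it performs.
function makeDishwasher() {
  return { log: [] };
}

function openDishwasher(dw) { dw.log.push("open"); }
function loadDishwasher(dw, dishes) { dw.log.push(`load ${dishes.length}`); }
function closeDishwasher(dw) { dw.log.push("close"); }
function startDishwasher(dw) { dw.log.push("start"); }

// The caller gets one name for the whole operation instead of
// something like openAndLoadAndCloseAndStartDishwasher.
function washDishes(dw, dishes) {
  openDishwasher(dw);
  loadDishwasher(dw, dishes);
  closeDishwasher(dw);
  startDishwasher(dw);
}
```

Calling `washDishes(dw, ["plate", "cup"])` leaves `dw.log` as `["open", "load 2", "close", "start"]`.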

▲quinnirill 112 days ago

Yeah, I agree in spirit but I think the answer is more ”it depends” than something where you should feel bad or something if you deviate from it. If washDishes also sends a bunch of metrics/diagnostics or updates a database of your favorite dish washing programs somewhere inside it, that’s probably fine. Otherwise you push the path of least resistance to just be vague instead, then you get a codebase full of functions with names like handle or process.

▲skydhash 112 days ago

Some function can be named generally but the name is specific when concatenated with the module name and/or the package name. `load` is generic, but `app.config.load` is more descriptive.
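A rough JavaScript illustration, using nested object literals as a stand-in for real modules/packages; `app`, `config`, `fixtures`, and the input formats are all hypothetical:

```javascript
// Each `load` is generic on its own; the path it is reached through
// says what is being loaded.
const app = {
  config: {
    // Parses a JSON config string into an object (assumed format).
    load(text) {
      return JSON.parse(text);
    },
  },
  fixtures: {
    // Splits a newline-separated list of fixture names, dropping blank lines (assumed format).
    load(text) {
      return text.split("\n").filter((line) => line.length > 0);
    },
  },
};
```

At the call site, `app.config.load(...)` and `app.fixtures.load(...)` read unambiguously even though both methods share the short name.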

▲hinkley 111 days ago

OrderAuthorsByNameAndCalculateResidualsAndSendPaperCheckWithThankYouCard()

I could see how that might come up in a retrospective.

▲sunrunner 112 days ago

Does this apply to all domains and all 'kinds' of code?

I feel like there's a fundamental difference in the information density between code that, for example, defines some kind of data structure (introducing a new 'shape' of data into an application) versus code that implements a known algorithm that might appear short in line length but carries a lot of information and therefore complexity.

▲zwnow 112 days ago

If the algorithm is well known, it's all good as long as the function name for it is somewhat understandable. I have to work with 200 line functions at work and it's a complete, excuse the language, shitshow.

▲sunrunner 112 days ago

> as long as the function name for it is somewhat understandable

But does using a function, essentially a box with known inputs and outputs, constitute actually understanding the function? What happens if you need to debug or understand the implementation of it? Now the original name has gone and you're looking at a larger number of differently-named things that hopefully communicate their intent well. But if you need to understand _those_, and so on...

▲zwnow 112 days ago

My original comment never was about understanding the implementation details. It was about understanding the high level goal of the function.

▲jayd16 112 days ago

So it should take 5 minutes whether it's your language of choice or the assembly it compiles to? Or does it matter how it looks in _that_ case?

▲bad_haircut72 112 days ago

Do you think you would understand every function in the Doom codebase in under 5 seconds? Is this bad programming then?

▲hinkley 111 days ago

Making code faster without making it more difficult to read is an art so black that some people insist it doesn’t exist. Doom is about being fast.

Doom famously has a function in it so obscure that nobody remembers how they even came up with it.

▲zwnow 111 days ago

You did not understand my original comment... Doom might be good code, but its maintainability is horrendous by modern standards.

▲intrasight 112 days ago

> Doesn't matter how it looks.

That's the mindset that the author is trying to counter.

▲jimmaswell 112 days ago

This is what xmldoc/jsdoc/etc. are for. If it's not 100% obvious from the name, put a summary of the function's assumptions, side effects, output, and possibly an example in the doc comment. If you do this right, the next programmer will never have to read your source at all (or even navigate to your file! They'll hover over a method call or find it in the dot-autocomplete and see a little tooltip with this documentation in it, and know all they need to know). It's an incredible thing when it works. It's a little bit more effort, but I don't accept the FUD around "comments become out of date immediately because the code will change" etc. - that should be part of code review.

https://learn.microsoft.com/en-us/dotnet/csharp/language-ref...
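For the JavaScript side, a sketch of what such a doc comment might look like with standard JSDoc tags (`@param`, `@returns`, `@example`). The function itself is a hypothetical example; the point is the summary of assumptions, side effects, and output that an editor can surface on hover:

```javascript
/**
 * Returns the authors who wrote at least one book longer than `minPages`.
 *
 * Assumptions: `books` is an array of `{ author, pageCount }` objects.
 * Side effects: none; the input array is not modified.
 *
 * @param {{ author: string, pageCount: number }[]} books - Books to scan.
 * @param {number} [minPages=1000] - Exclusive page-count threshold.
 * @returns {string[]} Distinct author names, in first-seen order.
 *
 * @example
 * authorsOfLongBooks([{ author: "A", pageCount: 1200 }]); // ["A"]
 */
function authorsOfLongBooks(books, minPages = 1000) {
  const authors = books
    .filter((book) => book.pageCount > minPages)
    .map((book) => book.author);
  // A Set keeps insertion order, so duplicates drop out in first-seen order.
  return [...new Set(authors)];
}
```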

▲hinkley 111 days ago

Always, always check in about whether it would be simpler to fix the function than to write an extended apology for it working the way it does.

While Five Whys works very well for disaster prevention, I find 3 often suffice for fixing rather than explaining an architectural wart. Often we used to need this to work this way because something else had to work a particular way, but as the product grew that is no longer true, or desirable. So you might be able to fix it or put a fix on the backlog.

▲freetonik 112 days ago

>This annoys a lot of computer programmers (and academics) who are inclined to the mathematical mindset and want quantitative answers instead.

I find many syntactical patterns that are considered elegant to be the opposite, and not as clear as mathematics, actually. For example, the ternary operator mentioned in the article, `return n % 2 === 0 ? 'Even' : 'Odd';`, feels very backwards to my human brain. It seems better suited to a compiler processing a syntax tree than to a human reader. A human mathematician would do something like this:

         ⎧  'Even' n mod 2 = 0
  f(n) = ⎨
         ⎩  'Odd'  n mod 2 ≠ 0

Which is super clear.

▲pc86 112 days ago

Well of course if you have the freedom to write a mathematical expression you're going to be able to present it in a way that is clearer than if you have to type monospace characters into a text editor.

I'm not sure it's realistic to expect to be able to type a mathematical expression using ascii more clearly than you can write it by hand (or implement using special unicode characters).

▲cdirkx 112 days ago

Quite some years back I worked with JetBrains MPS which used a "projectional editor" instead of a text editor. It was pretty neat to be able to enter "code" as mathematical expressions, or even state machine tables or flow diagrams with actual nodes instead of a text representation.

Sadly not much has happened in that space since then, but it was cool to think about what our tools of the future might look like. (of course ignoring all the practical reasons why we're probably still using regular text files in 100 years)

▲Timwi 110 days ago

I understand that you might find the mathematical notation clearer but I think it's presumptuous of you to speak on behalf of all humans, or even all human mathematicians. I'm a mathematics graduate and I find the conditional operator more readable in a program because it corresponds to what the program actually does (it checks the condition first); but I also recognize that the two notations have exactly the same information content and only differ superficially in syntax, making it entirely a matter of familiarity.

▲gwbas1c 112 days ago

This is why code reviews are so critical: They help keep a consistent style while onboarding new team members, and they help a team keep its style (reasonably) consistent.

(Also, see my comment about .editorconfig: https://news.ycombinator.com/item?id=43333011. It helps reduce discussions about style minutia in pull requests.)

▲WillAdams 112 days ago

Arguably, this is why Literate Programming (see my comment elsethread) didn't take off.

▲User23 112 days ago

Mathematicians have recognized the importance of elegance for millennia.

▲mrkeen 112 days ago

The article's good, but misses my most mentally-fatiguing issue when reading code: mutability.

It is such a gift to be able to "lock in" a variable's meaning exactly once while reading a given method, and to hold it constant while reasoning about the rest of the method.

Your understanding of the method should monotonically increase from 0% to 100%, without needing to mentally "restart" the method because you messed up what the loop body did to an accumulator on a particular iteration.

This is the real reason why GOTOs are harmful: I don't have a hard time moving my mind's instruction-pointer around a method; I have a hard time knowing the state of mutable variables when GOTOs are in play.
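A small JavaScript sketch of the contrast, using a hypothetical price/discount calculation. Both functions compute the same value; the difference is whether any name changes meaning partway through (the accumulator still exists inside `reduce`, but it is confined to a one-line callback):

```javascript
// Mutable version: `total` means "running sum" inside the loop and
// "discounted total" after it, so the reader must track which is which.
function totalMutable(prices, discount) {
  let total = 0;
  for (const price of prices) {
    total += price;
  }
  total = total - total * discount;
  return total;
}

// Immutable version: each name is bound exactly once, so its meaning can be
// locked in on first read.
function totalImmutable(prices, discount) {
  const subtotal = prices.reduce((sum, price) => sum + price, 0);
  const savings = subtotal * discount;
  return subtotal - savings;
}
```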

▲cle 112 days ago

Disagree. There's an abstract "information space" that the code is modeling, and you have to move around your mind's instruction pointer in that space. This can be helped or hindered by both mutable and immutable vars--it depends on how cleanly the code itself maps into that space. This can be a problem w/ both mutable and immutable vars. There's a slight tactical advantage to immutable vars b/c you don't have to worry about the value changing or it changing in a way that's misleading, but IME it's small and not worth adopting a "always use immutability" rule-of-thumb. Sometimes mutability makes it way easier to map into that "information space" cleanly.

▲klabb3 112 days ago

> This is the real reason why GOTOs are harmful: I don't have a hard time moving my mind's instruction-pointer around a method; I have a hard time knowing the state of mutable variables when GOTOs are in play.

Well, total complexity is not only about moving the instruction pointer given a known starting point. Look at it from the callee’s pov instead of the call site. If someone can jump to a line, you can’t backtrack and see what happened before, because it could have come from anywhere. I.e., you need global program analysis instead of local.

If mutability were the true source of goto complexity then if-statements and for loops have the same issue. While I agree mutability and state directly causes complexity, I think goto was in a completely different (and harmful) category.

▲stared 112 days ago

My pet peeve:

    function getOddness4(n: number) {
      if (n % 2 === 0) {
        return "Even";
      }
      return "Odd";
    }

While it is shorter, I vastly prefer this one:

    function getOddness2(n: number) {
      if (n % 2 === 0) {
        return "Even";
      } else {
        return "Odd";
      }
    }

Reason: getOddness4 gives some sense of asymmetry, whereas "Even" and "Odd" are symmetric choices. getOddness2 is in that respect straightforward.

▲bogomog 112 days ago

  function getOddness(n: number) {
    return (n % 2 === 0)
      ? "Even"
      : "Odd";
  }

Lowest boilerplate makes it the most readable. If working in a language with the ternary operator it ought to be easily recognized!

▲ajuc 112 days ago

I love code golf as much as anyone, not sure it's worth it on such small methods tho. Any of the propositions would be fine. Anyway:

    def oddness(n):
      return ["Even", "Odd"][n % 2]

BTW this trick with replacing if-then-else with a lookup is sometimes very useful. Especially if there's many ifs.
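A JavaScript sketch of the lookup trick, using a few HTTP status codes as an illustrative table. It also shows one porting caveat: JavaScript's `%` keeps the sign of the dividend, so a literal translation of the Python one-liner returns `undefined` for negative odd numbers unless the index is normalized:

```javascript
// Many branches on the same value...
function httpStatusTextIf(code) {
  if (code === 200) return "OK";
  if (code === 404) return "Not Found";
  if (code === 500) return "Internal Server Error";
  return "Unknown";
}

// ...become a data table plus a single lookup. A Map avoids the
// inherited-key surprises a plain object literal can have.
const STATUS_TEXT = new Map([
  [200, "OK"],
  [404, "Not Found"],
  [500, "Internal Server Error"],
]);

function httpStatusTextLookup(code) {
  return STATUS_TEXT.get(code) ?? "Unknown";
}

// Port of the Python one-liner. In JavaScript, -3 % 2 === -1, so the
// index is passed through Math.abs to stay within [0, 1].
const oddness = (n) => ["Even", "Odd"][Math.abs(n % 2)];
```

Adding a case becomes a one-line table change rather than another branch.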

▲pasc1878 111 days ago

Or you write in APL

▲stared 112 days ago

This is, IMHO, the idiomatic way to do so.

▲erikerikson 111 days ago

Since the article was in JavaScript:

  const getOddness = (n) => (
    n % 2
      ? 'Odd'
      : 'Even'
  )

Even less visual noise

▲lioeters 111 days ago

    const getOddness = n => n % 2 ? 'Odd' : 'Even'


▲culopatin 112 days ago

While this is simple and all, the English words if/else don’t require the reader to know the ?: convention. Depending on the reader's background, they could think of set notation, where it could mean “all the evens such that odd is true”, which makes no sense. It's also very close to key:value set notation. If/else leave no doubt for the majority of readers. It’s more inclusive, if you will.

▲bogomog 112 days ago

That's why I gave the caveat that if using a language with the ternary operator, one should know that operator. Python tried using English words for a ternary, but I think that's awkward from a readability perspective. A limited set of symbolic syntax improves readability over using words in my opinion, there's less text to scan.

▲erikerikson 111 days ago

Learning the basic operators of the language seems like table stakes.

Choosing one of world's spoken languages over others seems to be the opposite of inclusive.

▲elliottkember 111 days ago

Putting “return” on a different line from the actual value you’re returning?

▲Twisol 111 days ago

If there are two ways to say something, then people will find ways to make their choice of method into speech as well.

To me and my style of coding, there's a difference of intent between the two. A ternary connotes a mere computation, something that should have no side-effects. A conditional connotes a procedure; the arms of the conditional might be expected to have side-effects. (And the case of `if (_) return` or similar are pure control flow guards; they neither compute a value nor perform a procedure as such.)

It's not just about where the symbols go on the screen.
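One way to read that convention, sketched in JavaScript with hypothetical functions: the ternary yields a value with no side effects, the `if`/`else` arms each perform an action, and the leading `if (...) return` is a bare guard:

```javascript
// Ternary: a pure computation that yields a value.
function shippingCost(weightKg) {
  return weightKg > 10 ? 25 : 10;
}

// Conditional statement: each arm performs a side effect (appending to a
// hypothetical outbox array) rather than computing a value.
function notify(user, outbox) {
  // Guard: pure control flow, neither a computation nor a procedure.
  if (!user) return;

  if (user.prefersEmail) {
    outbox.push(`email:${user.name}`);
  } else {
    outbox.push(`sms:${user.name}`);
  }
}
```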

▲__oh_es 112 days ago

I feel guard clauses/early returns end up shifting developer focus onto narrowing the function's operation, rather than writing a happy path first and handling other conditions as an afterthought.

IME, elses also tend to lead to further nesting and to evaluating edge cases or variables beyond the immediate scope of the happy path (carrying all that context!).

▲CatAtHeart 112 days ago

I personally prefer the former, as you can visually see the return one level of indentation below the function name. It shows a guaranteed result, barring any early exits. Something about having the return embedded lower just seems off to me.

▲cess11 112 days ago

I would add a blank line to push 'return "Odd";' from the if, and also add brackets around the if-body if the language allows.

There are situations where I allow else, they tend to have side effects, but usually I refactor until I get rid of it because it'll come out clearer than it was. Commonly something rather convoluted turns into a sequence of guards where execution can bail ordered based on importance or execution cost. It isolates the actual function/method logic from the exit conditions.
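A sketch of that shape in JavaScript. `handleUpload` and the `scanForMalware` callback are hypothetical; the guards are ordered so the cheapest checks bail out first and the expensive scan only runs when it has to:

```javascript
// Guards ordered cheapest-first; the core logic sits unindented at the bottom.
function handleUpload(file, quotaRemaining, scanForMalware) {
  if (!file) return { ok: false, reason: "no file" };                 // cheapest check
  if (file.size > quotaRemaining) return { ok: false, reason: "quota" };
  if (scanForMalware(file)) return { ok: false, reason: "malware" };  // most expensive check

  // Actual logic, isolated from the exit conditions.
  return { ok: true, stored: file.name };
}
```

Reordering the guards changes only cost, not the set of rejected inputs, which is what makes this layout easy to tune.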

▲BlackFly 111 days ago

The asymmetry becomes apparent if the code gets refactored to continuation/callback style or to mutating a more complex data structure: the first method will fall through and execute the second instruction set. Return is a special operator in this sense, in that it breaks control flow, and the ordinary control flow of the first method does not capture the exhaustiveness of the two cases.

In idiomatic Rust, return isn't used except for exceptional cases that break the control flow of the method, and the second example is more commonly seen without return statements at all. Idiomatic Python also typically exits early at the beginning with a return on invalid parameters or state, with a tail-position return being the usual actual return value. Because of these conventional practices, breaking up the exhaustive if-else control structure makes the indented return appear exceptional (like an invalidity). If you follow these conventions, then the return statement naturally begins to appear redundant except in the break-control-flow cases, and the choice of the Rust convention begins to make sense: in all languages return is a statement equivalent to break.
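A minimal TypeScript sketch of the fall-through described above, assuming the article's Odd/Even example and a mechanical refactor that replaces each `return x` with a call to a callback `cb(x)`. The function names are hypothetical:

```typescript
// Early-return style: correct only because `return` stops execution.
function oddnessEarly(n: number): string {
  if (n % 2 !== 0) {
    return "Odd";
  }
  return "Even";
}

// Mechanical refactor to callback style: each `return x` became `cb(x)`.
// With no `return` to break control flow, odd inputs fall through and
// the callback fires twice.
function oddnessCallbackBuggy(n: number, cb: (s: string) => void): void {
  if (n % 2 !== 0) {
    cb("Odd");
  }
  cb("Even");
}

// The exhaustive if/else survives the same refactor unchanged.
function oddnessCallbackIfElse(n: number, cb: (s: string) => void): void {
  if (n % 2 !== 0) {
    cb("Odd");
  } else {
    cb("Even");
  }
}

const seen: string[] = [];
oddnessCallbackBuggy(3, (s) => seen.push(s));
// seen is now ["Odd", "Even"]
```

The early-return version is only correct because `return` halts execution; once that break in control flow is gone, it is the exhaustive if/else that keeps the two cases mutually exclusive.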

▲brulard 112 days ago

Nice example of how subjective this is. I immediately thought the first one without "else" is clearly the winner.

This is the problem with formatting rules. A codebase needs to have consistent style, even though that might mean nobody is fully happy with it.

I, for example, cannot stand semicolons in JavaScript. They are just visual clutter that is completely redundant, and yet some people really want them there.

▲makeitdouble 111 days ago

If it's this short, the ternary operator would be the absolute best option IMHO.

If any of the clauses are much longer, the first option reads a lot better if it can be a guard clause that returns very quickly.

If neither option is short, I'd argue they should be pushed away into scoped and named blocks (e.g. a function), and we're back to either a ternary operation or a guard-like clause.
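A sketch of how the two suggestions combine, assuming the article's odd/even function; the non-integer guard is an illustrative addition, not part of the original example:

```typescript
function getOddness(n: number): string {
  // Guard clause: bail out early on input the main logic doesn't handle.
  if (!Number.isInteger(n)) {
    return "Not an integer";
  }
  // Both arms are short, so a single ternary expresses the computation.
  return n % 2 === 0 ? "Even" : "Odd";
}
```

Using `=== 0` rather than `=== 1` keeps negative odd numbers correct, since `-3 % 2` is `-1` in JavaScript.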

▲zoogeny 112 days ago

90% of the time I prefer the first. I am allergic to indentation and I hate anything remotely like:

    function foo(a) {
        if (a) {
            return doThing()
        } else {
            return Error();
        }
    }

I like all of my assertion and predicate guards nicely at the top of a function:

    function foo(a) {
        if (!a) {
            return Error()
        } 

        return doThing()
    }

And for that reason, I would probably go for getOddness4 even though I see your point.

▲stared 111 days ago

In case of handling exceptions (and similar stuff), I also prefer avoiding unnecessary nesting.

For two (or more) equally valid branches, I prefer keeping the same nesting.

▲CharlieDigital 112 days ago

Maybe it's just me, but TypeScript makes code hard to read.

It's fine if the data model is kept somewhat "atomic" and devs are diligent about actually declaring and documenting types (on my own projects, I'm super diligent about this).

But once types start deriving from types using utility functions and then devs slack and fall back to type inference (because they skip an explicit type), it really starts to unravel because it's very hard to trace fields back to their origin in a _deep_ stack (like 4-5 levels of type indirection; some inferred, some explicit, some derived, some fields get aliased...).

    type Dog = {
      breed: string
      size: "lg" | "md" | "sm"
      // ...
    }

    type DogBreedAndSize = Pick<Dog, "breed" | "size">

    function checkDogs(dogs: Dog[]) : DogBreedAndSize[] {
      // Keep only the picked fields
      return dogs.map(d => ({ breed: d.breed, size: d.size }))
    }

    const checkedDoggos = checkDogs([])

Versus:

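    function checkDogs(dogs: Dog[]) {
      // Return type is inferred as { breed: string; size: "lg" | "md" | "sm" }[]
      return dogs.map(d => ({ breed: d.breed, size: d.size }))
    }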

Very subtle, but for large data models with deep call stacks, the latter is completely unusable and absolutely maddening.

▲bluefirebrand 112 days ago

I agree that functions should probably specify their output type, MOSTLY to enforce that all paths that return from that function must adhere to that type

I've seen plenty of regressions where someone added a new condition to a function and then returned a slightly different type than other branches did, and it broke things

However, I don't think there is much value in putting types on variable declarations

In your example,

`const checkedDoggos = checkDogs([])` is good. Just let checkedDoggos inherit the type from the function

I have a codebase I'm working on where the linter enforces

`const checkedDoggos: DogBreedAndSize[] = checkDogs([])`

It is very silly and doesn't add much value imo
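A sketch combining both points, reusing the `Dog`/`DogBreedAndSize` types from the example above (the `name` field and the `checkDog` helper are hypothetical): annotate the function's return type, and let the variable infer its type from the function.

```typescript
type Dog = {
  breed: string;
  size: "lg" | "md" | "sm";
  name: string; // hypothetical extra field that the Pick drops
};

type DogBreedAndSize = Pick<Dog, "breed" | "size">;

// The explicit return type checks every return path against DogBreedAndSize.
// Without it, adding a branch like `return { breed: "unknown" }` (no `size`)
// would quietly change the inferred return type instead of failing to compile.
function checkDog(d: Dog): DogBreedAndSize {
  if (d.breed.trim() === "") {
    return { breed: "unknown", size: d.size };
  }
  return { breed: d.breed, size: d.size };
}

function checkDogs(dogs: Dog[]): DogBreedAndSize[] {
  return dogs.map(checkDog);
}

// No annotation on the variable: it inherits DogBreedAndSize[] from checkDogs.
const checkedDoggos = checkDogs([
  { breed: "corgi", size: "sm", name: "Rex" },
  { breed: " ", size: "lg", name: "Max" },
]);
```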

▲CharlieDigital 112 days ago

I want it on the other side (on the function return) so that it's consistently displayed in type hints and intellisense so I don't have to navigate the code backwards 3-4 layers to find the root type (do you see what I'm saying?)

    function checkDogs(dogs: Dog[]) : DogBreedAndSize[] {
      return dogs.map(d => ({ breed: d.breed, size: d.size }))
    }

^^^ That's where it's important to not skip the type def because then I can see the root type in the editor hints and I don't need to dig into the call stack (I know the end result is the same whether it's on the assignment side or the declaration side, but it feels like ensuring it's always on the declaration side is where the value is)

▲dkdbejwi383 112 days ago

I'd prefer to have some type information over nothing if the choice were between TypeScript with some inferred return types, versus JavaScript where you're never really sure and constantly have to walk back up/down the stack and keep it in your mind.

▲CharlieDigital 112 days ago

I'd say on backend, my preference is something like C#: statically typed, but with enough type flexibility to be interesting (tuples, anonymous types, inferred types, etc.)

▲Sateeshm 111 days ago

My policy is to type only when the typescript compiler yells at me.

▲userbinator 111 days ago

> Smaller functions with fewer variables are generally easier to read

I hate how a lot of focus on "readability" is on micro-readability, which then tends to encourage highly fragmented code under the highly misguided assumption that micro-readability is more important than macro-readability. The dogma-culting around this then breeds plenty of programmers who can't see the forest for the trees and end up creating grossly inefficient code and/or have difficulty with debugging.

APL-family languages are at the other extreme, although I suspect the actual optimum is somewhere in the middle and highly dependent on the individual.

▲DimmieMan 111 days ago

There's certainly a middle ground, especially when multiple files are involved. After 3-4 go-to-definitions in something I'm not familiar with, I'm struggling; now, that's a me problem, but I can't imagine most people are miles ahead of me.

.NET culture, especially with "clean architecture", is shocking for this: you go to modify a feature or troubleshoot, and things are spread across 4 layers and 15 files, some of which are > 60% keywords.

I don’t have an answer for where the cutoff is, but I'll generally take 1 longer function that's otherwise neat, follows the other recommendations outlined, and that I can read sequentially, over scrolling up and down every 5 lines because it's so fragmented. The same can be said for types/classes too: that 4-value enum used only for this DTO does not need to be in another file!

▲neonsunset 111 days ago

What’s tragic is this is completely self-inflicted and you could argue is done against the kind of code structure that is more idiomatic to C#. Luckily, you don’t have to do it if you are not already working with this type of codebase.

▲jorams 112 days ago

This is an interesting article, but also rather unsatisfying. It very quickly jumps to conclusions and goes right back to opinion. I agree with several of those opinions, but opinion was explicitly not the point of the article.

> Prefer to not use language-specific operators or syntactic sugars, since additional constructs are a tax on the reader.

I don't think this follows from the metric. If a function contains three distinct operators, a language-specific operator that replaces all three of them in one go would reduce the "effort" of the function. It's highly scenario-specific.
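To make this concrete, here is a sketch of the standard Halstead formulas (vocabulary n = n1 + n2, length N = N1 + N2, volume V = N·log2(n), difficulty D = (n1/2)·(N2/n2), effort E = D·V) applied to two equivalent expressions. The operator/operand counts are hand counts for illustration only; tools differ in what they count as an operator.

```typescript
// Distinct (n1, n2) and total (N1, N2) operators and operands.
type HalsteadCounts = { n1: number; n2: number; N1: number; N2: number };

function halsteadEffort({ n1, n2, N1, N2 }: HalsteadCounts): number {
  const vocabulary = n1 + n2;
  const length = N1 + N2;
  const volume = length * Math.log2(vocabulary);
  const difficulty = (n1 / 2) * (N2 / n2);
  return difficulty * volume;
}

// `a !== null && a !== undefined ? a : b`
// operators: !== (x2), &&, ?:            -> n1 = 3, N1 = 4
// operands:  a (x3), null, undefined, b  -> n2 = 4, N2 = 6
const verbose: HalsteadCounts = { n1: 3, n2: 4, N1: 4, N2: 6 };

// `a ?? b`
// operators: ??    -> n1 = 1, N1 = 1
// operands:  a, b  -> n2 = 2, N2 = 2
const sugared: HalsteadCounts = { n1: 1, n2: 2, N1: 1, N2: 2 };
```

With these counts the verbose form comes out at roughly 63 and the `??` form at roughly 2.4. The metric only sees the counts, though; it says nothing about whether a given reader already knows `??`.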

> Chaining together map/reduce/filter and other functional programming constructs (lambdas, iterators, comprehensions) may be concise, but long/multiple chains hurt readability

I don't think this follows either. One effect of these constructs when used right is that they replace other operators and reduce the "volume". Again this can go both ways.

> ...case in point, these code snippets aren’t actually equivalent!

That's a very language-specific diagnosis, and arguably points at hard-to-read language design in JS. The snippet otherwise doesn't look like JS, but I'm not aware of another language for which this would apply. Indeed it is also commonly known as a "null-safe operator", because most languages don't have separate "null" and "undefined".
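The article's snippets aren't reproduced here, but the JS-specific trap can be sketched: `?.` short-circuits on both `null` and `undefined`, so a hand-written check against `null` alone is not equivalent. The `Box` type and function names are illustrative:

```typescript
type Box = { y: number };
type MaybeBox = Box | null | undefined;

// Optional chaining short-circuits on BOTH null and undefined.
function viaOptionalChaining(x: MaybeBox): number | undefined {
  return x?.y;
}

// Loose equality (`== null`) also matches both null and undefined,
// so this one is equivalent to `x?.y`.
function viaLooseCheck(x: MaybeBox): number | undefined {
  return x == null ? undefined : x.y;
}

// Strict equality only matches null. The cast stands in for plain JS,
// where nothing flags this; at runtime it throws a TypeError for undefined.
function viaStrictNullCheck(x: MaybeBox): number | undefined {
  return x === null ? undefined : (x as Box).y;
}
```

In a language with a single null value, the strict and loose checks would collapse into the same thing, which is why "null-safe operator" is the more common name elsewhere.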

> variable shadowing is terrible

> long liveness durations force the reader to do keep more possible variables and variables in their head.

These can arguably be contradictory, and that is why I am a huge fan of variable shadowing in some contexts: By shadowing a variable you remove the previous instance from scope, rather than keeping both available.

▲shortrounddev2 112 days ago

There's a cool plugin for vscode called Highlight[1] that lets you set custom regexes to apply different colors to your code. I think a common use of this is to make //TODO comments yellow, but I use it to de-emphasize logs, which add a lot of visual noise because I put them EVERYWHERE. The library I maintain uses logs that look like:

    this.logger?.info('Some logs here');

So I apply 0.4 opacity to it so that it kind of fades into the background. It's still visible, but at a glance, the actual business logic code pops out at you. This is my configuration for anyone who wants to modify it:

    //In vscode settings.json:
    "highlight.regexes": {
        "((?:this\\.)?(?:_)?logger(?:\\?)?.(debug|error|info|warn)[^\\)]*\\)\\;)": {
            "regexFlags": "gmi",
            "decorations": [{
                "opacity": "0.4"
            }]
        }
    },

---

[1] https://marketplace.visualstudio.com/items?itemName=fabiospa...
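A quick way to sanity-check the pattern outside VS Code is to run the same regex in TypeScript. This sketch only exercises the regex, not the extension. The string below is copied from the settings.json entry (JSON and TypeScript string literals both need the doubled backslashes). Only the `i` flag is kept, because the pattern uses no `^`/`$` and `g` would make repeated `exec` calls stateful:

```typescript
// Same pattern string as in settings.json.
const dimmedLog = new RegExp(
  "((?:this\\.)?(?:_)?logger(?:\\?)?.(debug|error|info|warn)[^\\)]*\\)\\;)",
  "i",
);

// Returns the text the decoration would cover, or null if nothing matches.
function matchedLog(line: string): string | null {
  const m = dimmedLog.exec(line);
  return m ? m[0] : null;
}

console.log(matchedLog("    this.logger?.info('Some logs here');"));
// "this.logger?.info('Some logs here');"
```

Two quirks of the pattern as written: calls whose arguments contain a `)` are not matched, because `[^\)]*` stops at the first closing parenthesis, and the unescaped `.` before the method name accepts any character in that position.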

▲gwbas1c 112 days ago

> For long function chains or callbacks that stack up, breaking up the chain into smaller groups and using a well-named variable

> Is the second one marginally less efficient?

> Yes.

No, both versions are just as efficient:

In both versions, the same objects are allocated, stored on the heap, and garbage collected. The difference in efficiency comes down to the compiler.

For the second version, the compiler should observe that each variable is only used immediately after it's declared, and thus treat those objects as "out-of-scope" as if it's a chained function call.
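A small TypeScript sketch of the two forms, reusing the books example from the top of the thread (`distinct()` is replaced with a `Set`, since JS arrays have no `distinct`). Both versions build the same intermediate arrays and the same `Set`; the named version only adds local bindings to them:

```typescript
type Book = { title: string; author: string; pageCount: number };

// Chained: filter and map each allocate a new array, then a Set dedupes.
function authorsOfLongBooksChained(books: Book[]): string[] {
  return Array.from(new Set(books.filter((b) => b.pageCount > 1000).map((b) => b.author)));
}

// Split into named intermediates: the same allocations, just with names
// that a reader (or a debugger) can refer to.
function authorsOfLongBooksNamed(books: Book[]): string[] {
  const longBooks = books.filter((b) => b.pageCount > 1000);
  const authors = longBooks.map((b) => b.author);
  const uniqueAuthors = new Set(authors);
  return Array.from(uniqueAuthors);
}
```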

▲superjan 112 days ago

I agree. After compiling it is even quite likely that the compiler does not care you gave a name to a return value (assuming you let it infer the variable type). What you will see in practice is that the intermediate is explicitly “materialized” (e.g. into a list), because the author wanted to inspect it in the debugger. That will have some cost, mostly in the form of avoidable allocations.

▲jt2190 112 days ago

Kudos to seeinglogic for trying to quantify what “readability” is. We need a lot more of this. (I feel like the most common definition of readability in use today is “readable to me“.)

I have a half-baked thought that we could find the actual dimensions of readability if we gave a test to a very large group of people and asked them to pick a sentence that describes what the code does. Each question would be timed. The questions that most people answered correctly in the shortest average time would provide us with examples of “real-world readable” code, and more importantly, might help us identify some truly not-readable practices.

I predict we’d see respondents start to cluster around various things, like “how long have they been programming?“, “do they understand programming paradigm X?“, etc. Perhaps the results would shift over time, as various things came into and out of fashion.

▲alpinisme 112 days ago

Yes one of the core challenges here is that we learn to read code. So what you learn to read and write shapes what you find readable. And lots of factors shape what you learn to read and write, including what you are trying to do, who you’re doing it with, what besides coding you knew how to do ahead of time, what other languages you know, etc. One stark possibility is that a lot of “readability” concerns after the low hanging fruit is gone (like don’t name variables with arbitrary irrelevant or misleading names) are really just about consensus building: maybe there are no right answers that transcend the particular group of programmers you are trying to work with.

▲bluGill 112 days ago

As an example, for my first decade of programming I worked on code where the coding style banned the ?: operator, so I didn't use it and found such code hard to read the few times I encountered it. Then I got a new job where one of the programmers really liked that operator, and so I was forced to learn how to read it. Now such code is more readable than the if statements to me, when used in the way we use it on this project.

▲James_K 112 days ago

I actually don't see any value in it. Code readability is similar to language readability in that it is mostly a concern for people who don't know a language and can be addressed by spending time with it. The real issue of programming is code complexity which you cannot determine from metrics about individual pieces of code. The problem exists in the relationships between functions rather than the implementation decisions in the bodies of those functions.

▲usrbinenv 112 days ago

That's very one-dimensional. It's usually easy to tell what the code does, but what's hard is to modify or add functionality to it. And this is because of various levels of abstractions that hide how things are interconnected.

▲usrbinenv 112 days ago

Another thing that comes to mind is the level at which one is familiar with a particular style of code organization & a set of abstractions. For example, Rails devs really have absolutely no problem getting up to speed with any given Rails app, but people who don't practice Ruby/Rails as their primary language/framework often complain how complicated and magical it is.

▲esafak 112 days ago

Poor abstractions. Good abstractions make it easier to change things by decomposing the code into cohesive pieces with low coupling, so that you can swap them out without having to think about the surrounding pieces beyond their interfaces. A good interface is small and logical.

▲bluGill 112 days ago

Good abstractions make it easy to change the things that will change. Abstractions always make some changes harder and some easier, but good ones make hard the things you wouldn't change anyway.

▲James_K 112 days ago

In my view, code complexity is best expressed in the size of its syntax tree, with maybe an additional term for the number of unique nodes. The real mistake here is the assumption that local reductions in complexity make a meaningful difference to overall complexity. Small local decreases in complexity may guide you towards the local minimum of complexity, but will never substantially change the complexity of the code-base overall. All measurements of code complexity are essentially as good as asking "how much code do you have".
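A toy sketch of that measure, assuming a minimal hand-rolled AST type (not any real parser's output) and an arbitrary weight of 1 for the unique-node term:

```typescript
// A minimal, hypothetical AST node: a kind label plus child nodes.
type AstNode = { kind: string; children: AstNode[] };

// Total number of nodes in the tree.
function treeSize(node: AstNode): number {
  return 1 + node.children.reduce((sum, child) => sum + treeSize(child), 0);
}

// Distinct node kinds appearing anywhere in the tree.
function uniqueKinds(node: AstNode, seen: Set<string> = new Set<string>()): Set<string> {
  seen.add(node.kind);
  for (const child of node.children) uniqueKinds(child, seen);
  return seen;
}

// Tree size plus a weighted term for node variety.
function complexity(node: AstNode, uniqueWeight = 1): number {
  return treeSize(node) + uniqueWeight * uniqueKinds(node).size;
}

const leaf = (kind: string): AstNode => ({ kind, children: [] });

// `a + b * c`: 5 nodes of 3 distinct kinds (Add, Mul, Ident).
const expr: AstNode = {
  kind: "Add",
  children: [leaf("Ident"), { kind: "Mul", children: [leaf("Ident"), leaf("Ident")] }],
};
```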

▲bluGill 112 days ago

That is ultimately my problem with the article. It isn't a bad investigation, but it cannot stand alone. I never review one function in isolation. It always needs to be in the context of what calls it (and often what it calls).

▲Izkata 112 days ago

I think the only one I disagree with here is the function chains example. I may agree with a different example, but with this one I find the chained version without variables much easier to understand because I'm traversing the graph visually in my head, while the variables are additional state I have to keep track of in my head.

----

Really I was hoping this would be about actual visual patterns and not syntax. It's my major issue with how strict code linting/autoformatting is nowadays.

For example, the "black" formatter for python requires this:

    drawer.ellipse((10, 10, 30, 30), fill=(256, 256, 0))
    drawer.ellipse((370, 10, 390, 30), fill=(256, 256, 0))
    drawer.arc((20, 0, 380, 180), 15, 165, fill=(256, 256, 0), width=5)

The first argument is (x1, y1, x2, y2) of a bounding box and black wants to align x1 for "ellipse" with y1 of "arc". Do people really find that more readable than this?

    drawer.ellipse( (10,  10,  30,  30), fill=(256, 256, 0) )
    drawer.ellipse( (370, 10, 390,  30), fill=(256, 256, 0) )
    drawer.arc(     (20,   0, 380, 180), 15, 165, fill=(256, 256, 0), width=5 )

Or perhaps something more common in Python: kwargs in function calls. No spaces around the `=` is the standard Python style:

    Foobar.objects.create(
        name=form.name,
        owner=user,
        location=form.location,
        source=form.source,
        created=time.now,
    )

Instead for me it's much easier to read this, since I don't have to scan for the "=" and mentally parse each line, it's just a table:

    Foobar.objects.create(
        name        = form.name,
        owner       = user,
        location    = form.location,
        source      = form.source,
        created     = time.now,
    )

But no linters/formatters accept such spacing by default. I think flake8 (linter) could be configured to ignore whitespace like this, but black (formatter) can't.

▲jraph 112 days ago

There are two issues with such alignment that may make the "less readable" version a bit better despite being, indeed, arguably less readable:

- a modification might lead you to realign several lines, making diffs noisier (you can ignore whitespace when displaying diffs, but the commits still hold these lines, which makes navigating the history less straightforward)

- we all have our own preferences wrt alignment, and your way of aligning or what you decide to align might be different from someone else, and this can lead to friction

Worse is probably better here: as much as I like aligned stuff, I think black is right in this case :-)

▲12_throw_away 112 days ago

Vertical alignment really makes me want to create a devtool stack that operates at more of an AST level and less of a "monospaced text + extra bells and whistles" level.

Aligning similar expressions for ease of reading seems like exactly the sort of thing an editor should display for us without requiring some arbitrary number of spaces to be stored in a text file ...

▲Izkata 112 days ago

AST level would have to automatically figure out what parts should be aligned, an alternative is to keep saving it in text but tweak the meaning of the "tab" character, so the developer still has control over what gets aligned: https://nick-gravgaard.com/elastic-tabstops/

Not great since viewing it in something that doesn't understand elastic tabstops would just be a mess, but it solves one of the issues the other response brings up, and I think some sort of user control like that is going to remain necessary either way.
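A simplified sketch of the elastic tabstops idea: each tab terminates a cell, and a column is padded to its widest cell. This is a deliberate simplification: the real scheme aligns a column only across a contiguous block of lines that all contain that cell, while this version aligns each column across every line it is given. The `gap` parameter is an assumption:

```typescript
// Pads tab-terminated cells so each column lines up; text after the last
// tab is not a cell and is left as-is.
function alignCells(lines: string[], gap = 1): string[] {
  const rows = lines.map((line) => line.split("\t"));
  const widths: number[] = [];
  for (const row of rows) {
    row.slice(0, -1).forEach((cell, i) => {
      widths[i] = Math.max(widths[i] ?? 0, cell.length);
    });
  }
  return rows.map((row) =>
    row.map((cell, i) => (i < row.length - 1 ? cell.padEnd(widths[i] + gap) : cell)).join(""),
  );
}

// The developer decides what aligns by where the tabs go.
const aligned = alignCells(["name\t= form.name", "location\t= form.location"]);
// ["name     = form.name", "location = form.location"]
```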

▲mont_tag 111 days ago

When vertical alignment helps readability, you have to use `# fmt: off` / `# fmt: on`. Black simply doesn't have enough information to know when assignment alignment (or comment alignment) was done on purpose.

▲dchristian 112 days ago

This talks about the "what" of the code, but you have to also convey the "why".

If you have a well understood problem space and a team that is up to speed on it, then the "why" is well established and the code is the "what".

However, there are often cases where the code is capturing a new area that isn't fully understood. You need to interleave an education of the "why" of the code.

I was once asked to clean up for release 10k lines of someone else's robotics kinematics library code. There weren't any comments, readmes, reference programs, or tests. It was just impenetrable, with no starting point, no way to test correctness, and no definition of terms. I talked to the programmer and he was completely proud of what he had done. The variable names were supposed to tell the story. To me it was a 10k piece puzzle with no picture! I punted that project and never worked with that programmer again.

▲throwaway2037 111 days ago

This image shows six different ways to write a simple function, getOddness(): https://seeinglogic.com/posts/visual-readability-patterns/6-...

Personally, my normal style is getOddness2(). I try to never have an expression in my return statement: I only return a literal, local variable, or class data member. Why do I choose getOddness2()? It is so easy to debug. When I write code, I am mostly thinking about what will be difficult to debug: control flow and local variables.

I would like to hear about other people's style and why they choose it.

Related: Do Google's code style guidelines (perhaps the most famous such guidelines on the Interwebs) have anything to say about which version of getOddness() is best/recommended?
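The article's image isn't reproduced here, so this is only a sketch of the style described (compute into a local, then return only that local), not necessarily identical to the article's getOddness2():

```typescript
function getOddness(n: number): string {
  // Compute into a named local first, so a breakpoint on the return line
  // can inspect `result` before it leaves the function.
  let result: string;
  if (n % 2 === 0) {
    result = "Even";
  } else {
    result = "Odd";
  }
  return result;
}
```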

▲bluGill 112 days ago

Towards the end he had an example of splitting the chain "graph.nodes(`node[name = ${name}]`).connected().nodes().not('.hidden').data('name');" by adding variables between some of the calls, and claimed it was marginally less efficient. This is sometimes true, but if it ever is, you need to talk to your tool vendors about a better optimizer. If you are working in a language without an optimizer, then the marginal difference from that optimization applied by hand will be far smaller than the performance improvements you will get by rewriting in a language with an optimizer. Either way, readability trumps performance: either the performance is the same, or, if performance mattered, you would have chosen a different language in the first place.

▲evnp 112 days ago

I found this example troubling because once all the line noise is added,

    first-op second-op third-op
    fourth-op fifth-op sixth-op

feels so much more impenetrable than

- first-op
- second-op
- third-op
- fourth-op
- fifth-op
- sixth-op

The point of functional styles isn't purely brevity (as implied by the commentary around this example), it also puts a focus on the clear sequence of operations and helps reduce "operators and operands" beneficially as discussed early in the post. In general I found the post oddly dismissive of these styles, instead of weighing tradeoffs as I would hope.

▲syklemil 112 days ago

IME what we want is generally for the code to be close to the left margin and flow predictably downwards. The example with intermediate values has a lot more value to me for complex instantiations, where we can avoid nesting like it's json or yaml by using some helper variables. That problem is fundamentally the same as with deeply nested if/while/try/etc: It gets hard to visually tell what's in which scope. (Rainbow indent guides help, but they're still mitigation for a situation that can be eliminated.)

But completely linear dot chains? They're fine.

▲evnp 112 days ago

Agree with you completely. I'm not against intermediate variables (though I tend to appreciate the way comments separate context from code, more than my hardline "code should be self-documenting" colleagues do). But I don't think they should come at the cost of textual clarity.

I think you could look at this through a "dimensional" lens. I'm ok with linear dot chains (or even better, pipe chains) – you read operations top to bottom. I'm also ok with single line chains where they fit, especially when contained neatly in a single-line function – in this case operations read left to right. But the second form in this example forces you to read operations top-to-bottom AND left-to-right at once, creating a 2-dimensional "wall of noise" effect for me. I'd expect the issues compound as ops are added, instead of increasing linearly with chain syntax. All very subjective and familiarity-dependent of course.

▲superjan 112 days ago

I do both. It depends on whether I can think of a concise variable name that faithfully describes the intermediate result. If you need more than 20-ish characters to describe it, then it is better to leave it chained.

▲isleyaardvark 112 days ago

I’d add that I follow that approach because the optimizers are more likely to optimize readable code than weird hacks.

▲BorgHunter 112 days ago

I think things like Halstead complexity or cyclomatic complexity are more heuristic than law. To read code, the most important thing to me is the abstractions that are built, and how effectively they bury irrelevant complexities and convey important concepts.

As an example, I recently refactored some Java code that was calling a service that returned a list of Things, but it was paged: you might have to make multiple calls to the service to get all the Things back. The original code used a while loop to build a list, and later in the same function did some business logic on the Things. My refactoring actually made things more complex: I created a class implementing Spliterator that iterated through each page and, when it was exhausted, called the service again to get the next one. The upside was that this allowed me to simply put the Things in a Stream<Thing> and, crucially, buried the paged nature of the request one level deeper. My reasoning is that separating an implementation detail (the request being paged) from the business logic makes the code easier to read, even if static code analysis would rate the code as slightly more complex. Also, the code that handles the pages is fairly robust and probably doesn't need to be the focus of developer attention very often, if ever, while the code that handles the business logic is much more likely to need changes from time to time.

As programmers, we have to deal with a very long chain of abstractions, from CPU instructions to network calls to high-falutin' language concepts all the way up to whatever business logic we're trying to implement. Along the way, we build our own abstractions. We have to take care that the abstractions we build benefit our future selves. Complexity measures can help measure this, but we have to remember that these measures are subordinate to the actual goal of code, which is communicating a complex series of rules and instructions to two very different audiences: The compiler/interpreter/VM/whatever, and our fellow programmers (often, our future selves who have forgotten half of the details about this code). We have to build high-quality abstractions to meet the needs of those two audiences, but static code analysis is only part of the puzzle.
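The refactor above used a Java `Spliterator` feeding a `Stream<Thing>`; a rough TypeScript analogue of the same idea is a generator that walks the pages and yields items one at a time. The `Page` shape, the `nextToken` field and the `fetchPage` callback are hypothetical stand-ins for the paged service:

```typescript
// Hypothetical shape of one response from a paged service.
type Page<T> = { items: T[]; nextToken?: string };

// Walks every page and yields items one at a time, so callers never see
// the paging.
function* allItems<T>(fetchPage: (token?: string) => Page<T>): Generator<T> {
  let token: string | undefined = undefined;
  do {
    const page = fetchPage(token);
    yield* page.items;
    token = page.nextToken;
  } while (token !== undefined);
}

// Business logic only sees a flat sequence.
function totalOf(values: Iterable<number>): number {
  let total = 0;
  for (const v of values) total += v;
  return total;
}

// A fake two-page service for illustration.
const fakePages: Record<string, Page<number>> = {
  first: { items: [1, 2], nextToken: "second" },
  second: { items: [3] },
};
const fetchFake = (token?: string): Page<number> => fakePages[token ?? "first"];

console.log(totalOf(allItems(fetchFake))); // 6
```

As in the Java version, the paging logic lives in one small place that rarely needs attention, while `totalOf` stays focused on the business rule.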

▲WillAdams 112 days ago

That is exactly what is discussed in:

https://www.goodreads.com/book/show/39996759-a-philosophy-of...

▲dsego 112 days ago

Man, this is the third reference to this book I am seeing this week, I need to order this book.

▲WillAdams 112 days ago

Gave away a copy to a developer in Brazil, and ordered the Kindle version, and need to order another print copy.

Can't recommend it highly enough --- I found it transformative --- read through it one chapter at a time, then re-worked the code of my current project:

https://github.com/WillAdams/gcodepreview

then went on to the next chapter --- about the halfway point I had the code cleaned up so well that the changes became quite minor.

▲zesterer 112 days ago

I've never understood the hate for variable shadowing. Maybe it's because I mostly use Rust, but I've always found it a useful boon for readability. You often want to extract/parse/wrap/package some value within the middle of a function in a manner that changes its type/form but not its semantic purpose. Shadowing the old value's variable name is brilliant: it communicates that there's a step-change in the responsibilities of the function, demarcating layers from one-another and preventing accidental use of the old value.

▲syklemil 112 days ago

> I've never understood the hate for variable shadowing. Maybe it's because I mostly use Rust,

That's likely a good chunk of it. My impression is it's more acceptable in languages where you have a very correctness-focused compiler, and `rustc` is that both with types and liveness/ownership. In a language where it's less clear when you copy values or hand out mutable references, or where implicit conversions occur on type mismatches, it's gonna be a different experience.

I think this article is best read as js/ts-specific advice, e.g. the split between null and undefined also isn't something you have to worry about in most other languages, and the semantics of various `?` and `?.` operators can vary a lot.

▲alextingle 112 days ago

If you like to actually read the code, then being able to search for a variable name really helps comprehension. Shadowing makes that harder, by introducing multiple distinct objects with the same name.

Now you need something like an IDE to easily follow the lifetime of an object. Introducing a heavyweight dependency like that, as a prerequisite for simply following the code easily, is... a poor choice.

▲stopthe 112 days ago

I always shrugged off the concept of code metrics (from LoCs to coverage) as a distraction from getting actual things done. But since doing more code-review I started to lack a framework to properly explain why a particular piece of code smells. I sympathize with the way the author cautiously approaches any quantitative metrics and talks of them more like heuristics. I agree that both Halstead Complexity and Cognitive Complexity are useless as absolute values. But they can be brought up in a conversation about a potential refactoring for readability.

What I didn't find is a mention of a context when reading a particular function. For example, while programming in Scala I was burnt more than once by one particular anti-pattern.

Suppose you have a collection of items which have some numerical property and you want a simple sum of that numbers. Think of shopping cart items with VAT tax on them, or portfolio positions each with a PnL number. Scala with monads and type inference makes it easy and subjectively elegant to write e.g.

    val totalVAT = items.map(_.vat).sum

But if `items` were a `Set[]` and some of the items happened to have the same tax on them, you would get a Set of numbers and a wrong sum in the end.

You could append to the list of such things until the OutOfMemoryError. But it's such a beautiful and powerful language. Sigh.
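TypeScript's `Set` has no `map`, so there the same trap only appears if you collect mapped values back into a `Set` yourself, but the effect is identical. A sketch with hypothetical items and integer VAT amounts (in cents, to keep the arithmetic exact). In Scala itself, converting first, e.g. `items.toList.map(_.vat).sum`, keeps the duplicates:

```typescript
type Item = { name: string; vatCents: number };

// Three distinct items, two of which happen to carry the same VAT.
const items = new Set<Item>([
  { name: "pen", vatCents: 20 },
  { name: "pad", vatCents: 20 },
  { name: "ink", vatCents: 50 },
]);

function sum(values: Iterable<number>): number {
  let total = 0;
  for (const v of values) total += v;
  return total;
}

// Wrong: mapping into a Set (what Scala's Set#map does) collapses the two 20s.
const totalViaSet = sum(new Set(Array.from(items, (i) => i.vatCents))); // 70

// Right: map into an array, which keeps duplicate values.
const totalViaArray = sum(Array.from(items, (i) => i.vatCents)); // 90
```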

▲mrkeen 110 days ago

> But it's such a beautiful and powerful language. Sigh.

Don't give up. Half of the time when a good language has problems, it just means that the bad languages don't have those problems yet.

You don't need global type-inference and monads to run into your problem. Dynamic languages exist, and even the static ones usually have some kind of 'var x =' local type-inference. And collections like Set probably have a map function.

▲Pannoniae 112 days ago

This is probably a bit of a cheap dismissal.... but I think this article misses the forest for the trees. They set a standard which they manage to meet - but do not really introspect on whether the standard is appropriate or not.

All of these metrics (except variable liveness) are on a method/function level. Guess what this encourages? Splitting everything into three-line methods which makes the codebase a massive pile of lasagna full of global or shared state.

If a method is long to read from top to bottom, the answer isn't always splitting it into 5 smaller ones, sometimes life just has inherent complexity.

▲LandR 112 days ago

> If a method is long to read from top to bottom, the answer isn't always splitting it into 5 smaller ones, sometimes life just has inherent complexity.

Yes! This.

I find it much easier to parse a long function where I can scroll down and just read it top to bottom than a function which calls out to lots of other functions, where I'm jumping around the code base, back and forth.

Just reading the long function top to bottom, where I can very easily just scroll up a bit is so much easier to keep in my head.

Even worse is when you go to definition on the method and you get 5 options, and you have to figure out which one would actually get called given the current path through.

▲mpweiher 112 days ago

I think the issue is whether the functions that are split out are actually useful abstractions.

If they are, you should not have to jump around the code-base, you should be able to just read the invocation and know what it does, without leaving the source function.

As an example, you probably don't whip out your kernel source code when you encounter a call to write(). At least not usually. You just know what it does and can keep going.

You probably also don't look at the generated assembly code, and maybe look up the instruction reference for your favorite microprocessor when you encounter an arithmetic operator. You just assume that you know what it does, even if that may not be 100% correct in every case.

Those are good, useful abstractions.

That's what we need to strive for when we create code.

▲dijksterhuis 112 days ago

> [than] having a function which calls out to lots of other functions and I'm jumping around the code base, back and forward.

i agree with longer functions and less jumping around, but there's also some nuance i find. I sometimes find converting a complicated multi-line condition into something like the below is much easier for me to read, so long as the function is named in a clear way and the function definition is just above the big function it gets called by (never in a different file!)

    def is_something_valid(something, foo, bar):
        return something > 0 and something != foo and something > bar

    if is_something_valid(something, foo, bar):

it's like a convenience reading shortcut so i don't have to parse the logic in my head if i don't want to. also makes the condition nice and easy to test.
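
Because the named predicate is an ordinary function, each clause of the condition can be checked in isolation. A short sketch reusing the comment's `is_something_valid`; the sample values are made up purely for illustration:

```python
def is_something_valid(something, foo, bar):
    # Valid when positive, different from foo, and strictly greater than bar.
    return something > 0 and something != foo and something > bar


# One passing case, then one failing case per clause.
assert is_something_valid(10, foo=3, bar=5)       # all clauses hold
assert not is_something_valid(0, foo=3, bar=-1)   # fails "> 0"
assert not is_something_valid(7, foo=7, bar=5)    # fails "!= foo"
assert not is_something_valid(4, foo=3, bar=5)    # fails "> bar"
```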

then again, can also do it the grug-brained way

    gt_zero = something > 0
    ne_foo = something != foo
    gt_bar = something > bar

    if gt_zero and ne_foo and gt_bar:

▲jabiko 112 days ago

I think you are making a good point but if this function is only used in one place I would personally prefer to just use a variable:

    something_is_valid = something > 0 and something != foo and something > bar

    if something_is_valid:
        # do stuff

That way you can document the intention of the condition using the variable name while at the same time keeping the locality of the check.

▲jodrellblank 112 days ago

> can also do it the grug-brained way

This way reads like:

    x = 1 // set variable x equal to 1

in that gt_zero echoes what the > operator does and says nothing about intent. Comparing, e.g.

    gt_zero = space > 0     // there is some space I guess?

    space_for_logfile = space > 0   // oh, logfiles need space > 20 there's the mistake.

▲dijksterhuis 111 days ago

https://grugbrain.dev/#grug-on-expression-complexity

i skipped off the `space` in `space_gt_zero` because i was on my phone and couldn’t be bothered to type it out all the way each time.

don’t read too much into it. it was just laziness while bringing up an existing concept.

▲robert_dipaolo 112 days ago

I mostly agree, but for short one-liners and where there will be no reuse elsewhere, instead of a function I prefer:

  something_is_valid = something > 0 and something != foo and something > bar
  if something_is_valid:
    # ....

It achieves the same thing without needing to scroll.

▲_dark_matter_ 112 days ago

100%. Worst is when the called function is in a separate file, and the most upsetting is when it's the _only_ function in the file. I really wish IDEs or tools like Sourcegraph could handle this better.

▲rudasn 112 days ago

What are you talking about?! That's my favourite part when reading code! :P

▲jghn 112 days ago

For longer functions vs bouncing between smaller functions my experience has been that this is one of those things where people are one way or the other. And they almost never change their preference. If your coworkers are all the same as you, that's great. If they're not, prepare for battle.

▲michaelcampbell 112 days ago

> the answer isn't always splitting it into 5 smaller ones

To someone who just read a book about it, it is. I've heard this called "rabbit hole" programming; it's function after function after function, with no apparent reason for them other than the line count. It's maddening.

▲LandR 112 days ago

I think the whole Uncle Bob clean code movement has a lot to answer for.

▲bluGill 112 days ago

While the Uncle Bob clean code movement has a lot to answer for, it is far, far better than much of what came before. I've had to work with 60,000-line functions where #ifdefs worked across brace boundaries:

    ...
    #ifndef foo
    break;
    case SomeCondition:
       doSomething();
    #endif
       moreCode();
       break;

I'll take the worst Uncle Bob can throw at me over that mess.

▲dsego 112 days ago

https://github.com/johnousterhout/aposd-vs-clean-code

▲hobs 112 days ago

I would use the term irreducible complexity - you can move it around but you can't get rid of it, and spreading it all over your code smoothly and evenly makes it 10x harder to change or reason about.

▲cardanome 112 days ago

While I think that there is no harm in longer functions as long as the code is linear, I think the problem is that people abstract the wrong way.

The main issue I see is people writing functions that call functions that call functions. No. Don't. Have one main function that is in control and that calls all the sub-functions. Sub-functions do not call functions on their same level, ever. Yes, those sub-function could have their own helper function and so on but the flow always needs to be clearly one-directional. Don't go across and create a spaghetti mess. Trees not graphs.

Keep your code structure as flat as possible.

> massive pile of lasagna full of global or shared state.

Yeah, the skill is to avoid shared state as much as possible. Handling state centrally and opting for pure functions as much as possible.
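
A rough Python sketch of that shape, with every function name hypothetical: one top-level function owns the intermediate state and calls each step in order, while the steps are pure and never call one another, so the call graph stays a tree rather than a web.

```python
def parse_order(raw: str) -> dict:
    # Pure: turns "sku,qty" into a dict; calls no sibling step.
    sku, qty = raw.split(",")
    return {"sku": sku.strip(), "qty": int(qty)}


def price_order(order: dict, prices: dict) -> float:
    # Pure: looks up the unit price and multiplies by quantity.
    return prices[order["sku"]] * order["qty"]


def format_receipt(order: dict, total: float) -> str:
    # Pure: builds the output string only.
    return f"{order['qty']} x {order['sku']} = {total:.2f}"


def process_order(raw: str, prices: dict) -> str:
    # The single "main" function: it holds the state between steps
    # and is the only place that wires the sub-functions together.
    order = parse_order(raw)
    total = price_order(order, prices)
    return format_receipt(order, total)


print(process_order("apple, 3", {"apple": 0.5}))  # 3 x apple = 1.50
```

Reading `process_order` alone gives the whole flow; each helper can be read (and tested) without knowing who else calls it.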

▲alan_n 111 days ago

I think about code readability a lot. I agree with most of this except the names part and shorthand constructs. Especially for the latter, it's unfair to make a comparison with the wrong operator. The example in general is a bit weird. I think a better one is `if (myObj == null) myObj = someOtherObj`. It's a very common pattern for which JS now has a nullish coalescing assignment you can use instead (`myObj ??= someOtherObj`), and there's also `||=`. These sacrifice some readability upfront due to them being "uncommon" operators, and the code might be harder to read for someone who's never seen them, but it is so much easier to read once you get used to it. There is, of course, always a trade-off. Using a lot of `?.` to access properties can be a code smell, but it is also sometimes very helpful, as the alternative would expand into a lot of code.

Regarding names, I think the suggestions are a bit in conflict, as often to avoid variable shadowing you have to do stuff like name things node, _node, node2. I try to have distinct names, but I'd rather accept the shadowing in those cases where it's hard. As for i and j, I don't like them, but they're such strong conventions that it's hard to avoid them. I always try to use them only once, and assign the variable I need: `item = obj[i][j]` if possible.

▲slevis 112 days ago

Do people really really agree with "Shorthand constructs that combine statements decreases difficulty"? The author even identifies a problem with the example from the original guide.

▲Jensson 112 days ago

Everyone agrees that well-made shorthand constructs decrease difficulty, since every programmer uses them every day. Things like function calls, while loops etc. are all shorthands for different kinds of jump statements combined with register manipulation. Even assembly uses some of those, and I don't think anyone seriously codes in machine code.

▲marcosdumay 112 days ago

No, almost everybody disagrees with it as a general statement.

Some people disagree to the point where they want languages to have only a handful of different constructs. But most people will start to disagree past some level of language complexity.

▲fauigerzigerk 112 days ago

>The author even identifies a problem with the example from the original guide

He does, but I'm not sure he's right. The code snippet appears to be in C# or Dart and neither has undefined.

▲mannykannot 112 days ago

I would like to add something to the point made here:

"For long function chains or callbacks that stack up, breaking up the chain into smaller groups and using a well-named variable or helper function can go a long way in reducing the cognitive load for readers. [my emphasis]

  // which is easier and faster to read?
  function funcA(graph) {
    return graph.nodes(`node[name = ${name}]`)
      .connected()
      .nodes()
      .not('.hidden')
      .data('name');
  }

  // or:
  function funcB(graph) {
    const targetNode = graph.nodes(`node[name = ${name}]`);
    const neighborNodes = targetNode.connected().nodes();
    const visibleNames = neighborNodes.not('.hidden').data('name');

    return visibleNames;
  }

The names of the functions being called are rather generic, which is appropriate and unavoidable, given that the functions they compute are themselves rather generic. By assigning their returned values to consts, we are giving ourselves the opportunity to label these computations with a hint to their specific purpose in this particular code.

In general, I'm not a fan of the notion that code can always be self-documenting, but here is a case where one version is capable of being more self-documenting than the other.

▲crazygringo 112 days ago

I'm almost always a fan of more rather than less commenting and self-documentation...

...but in this example I find the first to be far faster and easier to read. The "labeled" versions don't add any information that isn't obvious from the function names in the first.

If you were giving business logic names rather than generic names (e.g. "msgRecipient", "recipientFriends", "visibleFriends"), then I could see more value. But even then, I would find the following the easiest:

    function funcA(graph) {
        return (graph
          .nodes(`node[name = ${name}]`)  // message recipient
          .connected()  // recipient friends
          .nodes()
          .not('.hidden')  // inside current scroll area
          .data('name')
          );
    }

This keeps the code itself simple and obvious, prevents a ton of repetition of variable names, and allows for longer descriptions than you'd want in a variable name.

▲ledauphin 112 days ago

_thank you_. this is the comment i came here desperately hoping somebody had already made.

It's not that names are bad - it's that when you use intermediate variables, my brain has to check whether any of the variables are used more than once - i.e., is the flow here completely linear, or is there a hidden branching structure?

the chain of methods approach makes it _completely_ clear that there is no 'tree' in the code.

If you want names (and that's a fine thing to want!) then _either_ comments or defining separate _functions_ (e.g. `function messageRecipient`, `function friends`, `function visibleToScroll`) is the way to go. Although with many languages that don't have a built-in pipe operator, it becomes harder to express the nice linear flow in a top-to-bottom arrangement if you take the function route. A good reason for languages to keep converging toward syntax for pipes!

For my money, you only define those functions if you want to reuse them later - additional indirection is not usually helpful - so comments would be my choice if there were no other uses of these selectors.

▲mannykannot 112 days ago

I agree that commenting appropriately is desirable (I do more than I used to). I also like the idea of const being the default, and of syntax highlighting that clearly distinguishes mutables.

▲mannykannot 112 days ago

You are right, using comments is even more effective in going from the generic to the specific - but then, there's a vocal minority who insist that comments are not only unnecessary, but a clear indication that you are doing it wrong. I must admit that I doubt many of them would endorse the use of auxiliary consts in the manner of the original example, either.

▲satisfice 112 days ago

Interesting how important mere opinion seems to be, because the author doesn't seem to mention two issues that matter a great deal to me:

- Alignment of braces and brackets, instead of an opening brace at the end of one line and the closing brace at the beginning of a subsequent line.
- Everything I need to see is within an eyespan, instead of having to jump to several different files to trace code.

▲WillAdams 112 days ago

Why not apply a programming methodology which allows one to leverage a rich set of tools which were created for making text more readable and visually pleasing?

http://literateprogramming.com/

For a book-length discussion of this see:

https://www.goodreads.com/book/show/39996759-a-philosophy-of...

previously discussed here at: https://news.ycombinator.com/item?id=27686818 (and other times in a more passing mention --- "Clean Code" adherents should see: https://news.ycombinator.com/item?id=43166362 )

That said, I would be _very_ interested in an editor/display tool which would show the "liveness" (lifespan?) of a variable.

▲Chris_Newton 112 days ago

I find Literate Programming interesting partly because it’s almost the opposite of the much-advocated “many small functions” style. You could literally be writing a book that explains your program, and the code becomes almost secondary material to illustrate the main text rather than the main asset itself.

I did once write a moderately substantial application as a literate Haskell program. I found that the pros and cons of the style were quite different to a lot of more popular/conventional coding styles.

More recently, I see an interesting parallel between a literate program written by a developer and the output log of a developer working with one of the code generator AIs. In both cases, the style can work well to convey what the code is doing and why, like good code comments but scaled up to longer explanations of longer parts of the code.

In both cases, I also question how well the approach would continue to scale up to much larger code bases with more developers concurrently working on them. The problems you need to solve writing “good code” at the scale of hundreds or maybe a few thousand lines are often different to the problems you need to solve coordinating multiple development teams working on applications built from many thousands or millions of lines of code. Good solutions to the first set of problems are necessary at any scale but probably insufficient to solve the second set of problems at larger scales on their own.

▲WillAdams 112 days ago

The Axiom computer algebra folks seem to manage well --- I'm pretty sure that's the largest publicly available literate program which is available for inspection.

I've been working to maintain a list of Literate Programs which have been published (as well as books about the process):

https://www.goodreads.com/review/list/21394355-william-adams...

I'd be glad of any I missed, or other links to literate programs.

The list of projects so tagged on Github may be of interest:

https://github.com/topics/literate-programming

▲mbarbar_ 112 days ago

Another not on the list is Scheme 9 from Empty Space. I can't speak to its quality though as I've never looked at the resulting book, just perused the stripped source a little a while back.

https://www.t3x.org/s9fes/

▲WillAdams 112 days ago

Thanks! I've added that to the list.

I'd be grateful for any other such texts.

▲donatj 112 days ago

Your link for literate programming just gets me a Parallels H-Sphere error

▲WillAdams 112 days ago

Try searching for "literate programming" --- it should be the top link.

▲austin-cheney 111 days ago

I have seen this subject come up as the most important factor of human behavior in every one of my employments. Counting both my corporate and military time, that is nearly 50 years of employment at more than 20 different organizations.

First things first, complex is a fancy word that means many. That's it, so don't over think it. Everything else is bias and opinions. If you feel conflicted about that see the fantastic Rich Hickey talk: Simple Made Easy https://www.youtube.com/watch?v=SxdOUGdseq4

Secondly, how this actually manifests is that people are eager to solve some problem in a way they feel comfortable, whether by process or automation, and then nobody wants to maintain anything, especially if it was written by somebody else. This is a critical failure.

The best solution I have found to solve for this is speed, not code style or anything else. Uncertainty increases proportionally to the duration between testable iterations. For example if it takes 30 seconds to try out an idea and reset back to the prior state people feel more certain about the environment in which they work because risk is low and learning is fast. If, on the other hand, it takes 90 minutes to try out an idea people will be exceedingly cautious about what they try and propose because everything is fragile and expensive.

Knowing that speed is the solution is not convincing in a world where most software developers cannot measure anything. Instead it must be mandated that software has only one purpose: automation, so automate the shit out of it. When that becomes the thesis of all work speed naturally self-amplifies and people split into two camps. The first camp is people that can actually write original software and the second camp is people who will kill you with internal processes, such as complexity checks and code style rules, because they cannot automate their way out of it.

▲davidw 112 days ago

This is interesting. Something I have long wondered about Lisp code is how having glyphs that are angled (parentheses), without much indentation in many cases, might be difficult to read just because of the visual aspects of it.

     (((
       ((
         (

Takes some staring at to figure out what's what.

▲skydhash 112 days ago

Lisp code is an AST. Once that's internalized, the parenthesis fade in the background. Mentally, instead of editing code, you're just arranging the branches. So, when reading, you can usually ignore whole sections as they will evaluate to a single value (side effects are possible, but strongly discouraged)

▲davidw 112 days ago

Having 4 spaces as indentation helps people tease out where the branches even are in languages like C or Python or whatever, rather than the 1 or 2 that you see with a lot of Lisp. And those angled parens make lining things up vertically a teeny bit more difficult.

▲kazinator 111 days ago

Two space indentation is fairly common in C code bases.

GNU projects use a kind of hybrid indentation where child statements are indented by two spaces, but if they are compound statements they indent their interior by another two spaces:

  if (proprietary(program))
    roll_on_floor_twitching(stallman);
  else
    {
      calm_down(stallman);
      make_indent_weirdly(stallman, everyone);
    }

Google’s style guides for various "C likes" also recommend two space tabs.

"Use only spaces, and indent 2 spaces at a time." [Google C++ Style Guide, https://google.github.io/styleguide/cppguide.html]

If you've been staring at Linux kernel code for weeks, with its 8-space tabs, and then edit two-space-tab code, it will take getting used to at first. The indentation will seem small. Soon, it will expand in your mind's eye and look large.

▲davidw 111 days ago

Just because it's not uncommon doesn't mean it is not wrong, though!

And it's still easier to see in C code than something with a bunch of angly parenthesis.

▲skydhash 112 days ago

The one issue with procedural is all the temporary variables and the fact that the variables themselves are intertwined with function calls. With Lisp, the whole branch is self-sufficient. It's a different reading method. Just like reading Prolog requires a different strategy.

▲kazinator 111 days ago

> side effects are ... strongly discouraged

This is simply a false generalization about a whole family that in its broadest interpretation includes completely different languages, most of them multi-paradigm.

▲khaledh 112 days ago

This, and the prefix nature of operators. That's the primary reason every time I try to give Lisp a chance, I get turned off by the maze of parens that I have to unravel in my head, especially for long, nested calls.

For Lispers, good for them on knowing how to wire their brains to read this effortlessly. For the rest of us, there's a reason why Python's syntax is so easy to read for most people.

▲mtreis86 111 days ago

Yeah it helps if it is properly formatted and intended to be easy to read. Style takes effort. A lot of us let emacs handle the formatting - auto indent and something like paredit to move lists around. Once you get a feel for how the tools move things it is a bit easier to predict, but even then it takes someone putting in effort to make it maximally readable.

▲gwbas1c 112 days ago

> To bring closure to the story at the beginning of this post, the codebase that was breaking my brain had several anti-patterns, specifically related to items 1, 2, 3, 6, and 8 in the list above.

FYI: If you get into this situation in C#, the .editorconfig file and the "dotnet format" command are a godsend.

I inherited a very large and complicated C# codebase with a lot of "novice" code and inconsistent style. I spent about 3 weeks adding rules to .editorconfig, running "dotnet format" and then manually cleaning up what it couldn't clean up. Finally, I added a GitHub Action to enforce "dotnet format" in all pull requests.

As a result: 1: The code was significantly more readable. 2: It trapped mistakes from everyone, including myself.

There are a few areas where we have to disable a rule via #pragma; but these are the exception, not the norm.

▲maleldil 111 days ago

This is a good idea, and I also enforce something similar in my projects. I credit Go for popularising gofmt, including strong defaults and little customisation. My main complaint is that gofmt doesn't break lines.

▲m463 111 days ago

I enjoy the visual uncomplexification of python indenting.

After switching back and forth with languages with "extra" syntax, it seems visually and cognitively cleaner.

that said, there were some things about perl that I liked cognitively, like being able to say "unless" instead of "if not"

▲farceSpherule 112 days ago

I was an engineering manager back in the day (Java). People would get lost in the sauce with bracket placement, number of tabs, etc., etc. In order to avoid religious battles, I instituted a code formatter/beautifier. It would format all code upon commit.

Problem solved.

Although another conversation, people did not want to document their code. So I took the carrot / stick approach. I had to approve all commits and if code did not have javadoc, I did not approve the commit. If your commit was not on time, then that impacted your performance which, in turn, impacted your pay. People bitched at first but whatever. At this particular place, we were trying to get bought. Having documentation and other IP made us more valuable. It forced devs to put actual thought into how to manage their time.

▲deadbabe 112 days ago

You really should try to pack an unbroken thought as a single line of code as much as possible. That’s the idea behind chaining multiple functions together on one line instead of spreading it out over several lines. Eyes go horizontally more naturally than up and down, it fits our vision’s natural aspect ratio. And stop making deep nestings.

Making a single function call per line assigning output to a variable each time is really just for noobs who don’t have great code comprehension skills and appreciate the pause to have a chance to think. If the variable’s purpose for existence is just to get passed on to a next function immediately, it shouldn’t exist at all. Learn to lay pipe.

▲__mharrison__ 112 days ago

I agree with you on piping but writing each method on its own line makes the code very approachable (also easier to work with).

Consider this code (from a course I'm teaching this week):

    (df
      .pipe(lambda df_: print(df_.columns) or df_)
      .groupby('activity_id', observed=True)
      [non_agg_cols]
      .apply(lambda g: g.assign(distance=calculate_distance_np(g)), include_groups=True)
      .pipe(fix_index)
      .pipe(lambda df_: print('DONE!') or df_)
    )

vs:

    (df.pipe(lambda df_: print(df_.columns) or df_).groupby('activity_id', observed=True) [non_agg_cols].apply(lambda g: g.assign(distance=calculate_distance_np(g)), include_groups=True).pipe(fix_index).pipe(lambda df_: print('DONE!') or df_))

▲deadbabe 112 days ago

I don’t really think the first example is easier to read, it’s more of an illusion. A skilled reader should be able to carry the context as they read along. It is only because we occasionally work with inexperienced coders that the list style is necessary. Consider:

“The quick brown fox jumped over the lazy ass dog”

The quick brown fox

jumped over

the lazy ass dog

The second example helps a reader understand the subjects and action but it is wholly unnecessary for people who know how to read.

▲__mharrison__ 112 days ago

It is certainly easier to work with the former. If you need to comment out a line, it is painless.

▲throwanem 110 days ago

You wouldn't happen by any chance to be hiring in the US?

▲zombot 111 days ago

> Let me be upfront: there is no commonly-used and accepted metric for code readability.

Indeed, I haven't found an autoformatter yet whose default settings I find "readable". The best of them is still clang-format, because it has so many parameters I can tune. By now I have a .clang-format config file that almost comes close to producing the formatting I do manually.

I also find functional patterns way more readable than their procedural counterparts.

▲deepsun 112 days ago

> there’s a better chance that the programmer forgets to properly handle all of the possibilities

Author totally forgot about IDEs. Yes, I know some coders frown upon IDEs and even color coding (like Rob Pike), but any modern code editor will shout loudly about an unhandled null-pointer check.

Also depends on the language, e.g. it's less reliable in Javascript and Python, but in statically typed languages it's pretty obvious at no additional cognitive load.

▲TrianguloY 111 days ago

As someone with relatively bad memory, in general fewer variables = easier to understand. The more I need to remember, the more time I'll spend. IDEs help by providing inline hints and documentation, but the main issue is that a variable _can_ be used later, so you need to remember it. No variable (like on a chained/piped construct) means that the value is generated and immediately used.

▲seeinglogic 112 days ago

Author here.

Thank you for all the thoughtful comments and great stuff I didn't think of (also has been a hot minute since I wrote the article).

I appreciate the discussion!

▲superjan 112 days ago

This is, by a large margin, the best article about code complexity I have read in a looong time. Thanks.

▲fs_software 112 days ago

This came up at work the other day re: client-side code readability.

In one camp, folks were in favor of component extraction/isolation with self-documenting tests wherever possible.

In the other camp, folks argued that drilling into a multi-file component tree makes the system harder to understand. They favor "locality of behavior".

Our consensus - there needs to be a balance.

▲rob74 112 days ago

Those were exactly my thoughts while reading this article: if your codebase (over)uses inheritance, interfaces, traits, facades, dependency injection etc. etc. to thinly spread any given functionality over several files, no amount of formatting or nice naming is going to save you...

▲moi2388 112 days ago

I find the main problem is not wanting to split up functions.

I greatly prefer small helper functions, so that a more complicated one becomes more readable.

Even declaring a little local variable which explains in English what the condition you’re going to test is supposed to do is greatly appreciated.

▲snitzr 112 days ago

Shoutout to the pipe operator in R. The code equivalent of "and then." It helps to unnest functions and put each action on one line. I know R is more for stats and data, but I just think it's neat.

▲stared 112 days ago

Yes, dplyr pipes are wonderful.

Also, for the same reason, I find JavaScript's array methods cleaner than Python's list comprehensions - as in the former it is possible to chain maps and filters.

Also, now there is a new pipe syntax in SQL, that adds a lot to readability.

▲flobosg 111 days ago

I just wanted to point out that, apart from dplyr/magrittr, R introduced a native pipe operator (|>) a few years ago.

▲memhole 112 days ago

For anyone interested in this as design, it’s called method chaining.

▲closed 112 days ago

I think piping and method chaining are a little bit different.

Piping generally chains functions, by passing the result of one call into the next (eg result is first argument to the next).

Method chaining, like in Python, can't do this via syntax. Methods live on an object. Pipes work on any function, not just an object's methods (which can only chain to other object methods, not any function whose eg first argument can take that object).

For example, if you access Polars.DataFrame.style it returns a great_tables.GT object. But in a piping world, we wouldn't have had to add a style property that just calls GT() on the data. With a pipe, people would just be able to pipe their DataFrame to GT().

▲memhole 112 days ago

Good to know. I assumed it was all done via objects or things like objects.

So is piping more functional programming?

▲closed 112 days ago

I think it's often a syntax convenience. For example, Polars and Pandas both have DataFrame.pipe(...) methods, that create the same effect. But it's a bit cumbersome to write.

Here's a comparison:

* Method chaining: `df.pipe(f1, a=1, b=2).pipe(f2, c=1)`

* Pipe syntax: `df |> f1(a=1, b=2) |> f2(c=1)`
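
Python has no `|>` operator, but the distinction above can be sketched with a small helper. The `pipe` function below is hypothetical (it is not a standard-library function): it threads a value through any plain functions, passing each result as the first argument to the next, so nothing has to be defined as a method on the value's class.

```python
from functools import reduce


def pipe(value, *steps):
    # Each step is either a function f, called as f(acc),
    # or a (f, kwargs) tuple, called as f(acc, **kwargs).
    def apply(acc, step):
        if isinstance(step, tuple):
            func, kwargs = step
            return func(acc, **kwargs)
        return step(acc)

    return reduce(apply, steps, value)


# Plain functions that know nothing about each other or about pipe().
def scale(xs, factor=1):
    return [x * factor for x in xs]


def keep_above(xs, threshold=0):
    return [x for x in xs if x > threshold]


result = pipe(
    [1, 2, 3, 4],
    (scale, {"factor": 10}),
    (keep_above, {"threshold": 15}),
    sum,  # builtins work too, since any callable can be a step
)
print(result)  # 90
```

This roughly mirrors the `data |> scale(factor=10) |> keep_above(threshold=15) |> sum` shape; pandas' `DataFrame.pipe(func, **kwargs)` gives the same effect one step at a time, but only on DataFrames.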

▲memhole 112 days ago

Ok, that’s helpful. Thanks!

▲__mharrison__ 112 days ago

I wrote a book dedicated to writing Pandas code in this style, Effective Pandas 2.

I've seen many who complain about this style of coding, but once they try it, they are sold. I love reading reviews about how adopting this made their code easier to write, debug, read, and collaborate.

▲begueradj 112 days ago

The notion of "readable code" involves 2 parties:

a) The skill level of the person who produces the code

b) The skill level of the person who reads the code.

Most of the time, we tend to blame the a) person.

▲mont_tag 111 days ago

This is true, but it is more nuanced than that. It also has to do with familiarity with common patterns and willingness to accept that not everyone has the same style you do.

▲morning-coffee 112 days ago

> These metrics definitely are debatable (they were made in the 70’s…)

What is it about a decade that makes contributions produced thereabout "debatable"?

▲AnimalMuppet 112 days ago

In 50 years, we've learned some things. Not all "good advice for programming" from the 1970s is still actually good advice.

▲russelg 112 days ago

I believe their point is less about the specific decade, but rather they were made over 50 years ago.

▲Duanemclemore 111 days ago

It would be interesting to see how APL and other array languages stack up in this kind of analysis. I'm guessing they would do well...

▲odyssey7 111 days ago

When are they planning on updating the Simple Sabotage Field Manual PDF?

▲mwkaufma 112 days ago

All of the leading code samples look _fine_. This whole article is the kind of superficial bikeshedding feedback one gets from code-reviewers who are just phoning it in and don't care about the substance of the problem or solution.

▲ezoe 112 days ago

Any nesting or indirection adds to the mental burden of understanding code: nested conditional statements, macros, functions, classes. Yet we need them, or we end up with lots of duplicated code, which is just as mentally taxing.

Recently on social media, someone made the extreme claim that they prohibit nested `if` statements at work.
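One common way to work without nested `if`s is guard clauses (early returns). This is a general technique, not necessarily what the person above does. Here is a minimal sketch using a hypothetical `shipping_cost` rule, showing the nested version and the flattened version side by side:

```python
def shipping_cost_nested(order: dict) -> float:
    # Nested style: each condition pushes the real work one level deeper.
    if order.get("items"):
        if order.get("address"):
            if order.get("express"):
                return 20.0
            else:
                return 5.0
        else:
            raise ValueError("missing address")
    else:
        return 0.0


def shipping_cost_flat(order: dict) -> float:
    # Guard clauses: handle the edge cases first and return early,
    # so no if statement sits inside another.
    if not order.get("items"):
        return 0.0
    if not order.get("address"):
        raise ValueError("missing address")
    if order.get("express"):
        return 20.0
    return 5.0
```

Both functions behave identically. The flat version trades depth for a sequence of independent checks, which is exactly the kind of indirection-versus-nesting trade-off described above.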

The "shorter-lived variables" argument doesn't always hold. Some of the most horrible code I've read used very short-lived variables:

val_2 = f(val), val_3 = g(val), ...

It's Erlang, where what looks like a variable isn't really a variable, just a name bound to a term. Since it can't be reassigned, you end up with numbered names.

▲0xbadcafebee 112 days ago

I just wanted to point out that "Cognitive Complexity" [1] was not invented by SonarSource; it is an academic principle created in the 1950s and has more to do with psychology than computer science. Computer science has over-simplified the term to mean "hey there's a lot of stuff to remember this is hard".

Psychology tends to have a wider scope of thought and research put into it [2] [3]. For example, one way it's used is not to measure how complex something is, but how capable one particular person is at understanding complex things, compared with someone else [4]. This can affect everything from leadership decisions [5] to belief in climate change [6].

I point this out because all too often Engineers hyper-focus on technical details and forget to step back and consider a wider array of factors and impacts - which, ironically, is what cognitive complexity is all about. It's the ability of a person to think about more things in a deeper way. Basically, cognitive complexity is a way to talk about not just things, but people.

We also have a tendency as Engineers to try to treat everyone and everything as a blob. We have to design our language in X way, because all people supposedly work in the same way, or think the same way. Or we have to manage our code in a certain way, because all the team members are assumed to work better that way (usually in whatever way is either easier or simpler).

One thing I wish people would take away from this is that not only is cognitive complexity actually useful (it describes how language is able to work at all), but some people are better at it than others. So "avoiding cognitive complexity" is, in many ways, a bad thing. It's like avoiding using language to convey ideas. Language and communication are hard, but you're reading this right now, aren't you? Would you rather have a pictogram?

[1] https://en.wikipedia.org/wiki/Cognitive_complexity

[2] https://www.jstor.org/stable/2785779

[3] https://pubmed.ncbi.nlm.nih.gov/11014712/

[4] https://testing123.education.mn.gov/cs/groups/educ/documents...

[5] https://deepblue.lib.umich.edu/handle/2027.42/128994

[6] https://www.sciencedirect.com/science/article/abs/pii/S02724...

▲jedisct1 112 days ago

Rust.