- Ignore the SSA conversion algorithms that don’t use dominance frontiers. Also ignore the dominator tree generation algorithms that aren’t Lengauer-Tarjan. Reason: you’ll want a dominator tree (i.e. immediate dominators) anyway for a bunch of SSA optimizations, so you’ll want Lengauer-Tarjan. And if you have that, then using dominance frontiers to generate SSA is just easier (there's a sketch of this after the list).
- CPS isn’t really anything like SSA. Don’t believe that hype. Any claim in PL of two languages or forms being equivalent is not falsifiable because any two Turing complete languages are going to have some transformation between them, so really these papers are just saying “hey look I invented a translation that we all knew was going to be possible and I like my translation for aesthetic reasons”. Working with CPS is nothing like working with SSA.
- The best way I know of representing Phi in your SSA IR is a Pizlo special. I haven’t written it up except by implementing it (JavaScriptCore uses Pizlo SSA). Here’s the idea. Phi takes no arguments (no block arguments and no value arguments). Each Phi has a shadow variable (in addition to the implicit SSA variable it assigns to). Phi just loads the value out of its shadow variable and assigns it to its implicit variable. Then I have a second instruction called Upsilon that takes two args: an input variable and a Phi. It loads the input variable’s implicit value (just as any SSA use would) and stores it to the Phi’s shadow variable. The result of using this IR is that you have zero coupling between your CFG and the SSA graph. It makes CFG transforms much much easier to write. It also means much less special casing of Phi, in practice.
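To illustrate the dominance-frontier point in the first bullet: once the frontiers are computed, phi placement is a short worklist loop. Here's a minimal C++ sketch, run once per variable; Block, dominanceFrontier, and insertPhi are placeholders of my own, not any particular compiler's API.

    #include <deque>
    #include <functional>
    #include <set>
    #include <vector>

    struct Block;  // placeholder CFG block type

    // Iterated dominance frontier phi placement for a single variable.
    // defBlocks are the blocks that assign the variable.
    void placePhis(const std::vector<Block*>& defBlocks,
                   const std::function<std::set<Block*>(Block*)>& dominanceFrontier,
                   const std::function<void(Block*)>& insertPhi)
    {
        std::set<Block*> hasPhi;
        std::set<Block*> enqueued(defBlocks.begin(), defBlocks.end());
        std::deque<Block*> worklist(defBlocks.begin(), defBlocks.end());

        while (!worklist.empty()) {
            Block* b = worklist.front();
            worklist.pop_front();
            for (Block* d : dominanceFrontier(b)) {
                if (!hasPhi.insert(d).second)
                    continue;               // this block already got a phi
                insertPhi(d);
                // The new phi is itself a definition, so d's frontier may
                // need phis as well.
                if (enqueued.insert(d).second)
                    worklist.push_back(d);
            }
        }
    }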
My preferred form is basic blocks with arguments, where the final jump passes parameters to its destination blocks.
That form has a one-to-one equivalence with SSA, and I find it much easier to reason about. Every SSA algorithm I have looked at so far works just fine on the representation. Plus there is none of the annoying transformation into and out of SSA.
If you omit the mucking about with return closures in CPS, what is left is basic blocks with arguments.
To see basic blocks with arguments as a form of SSA, treat every basic block argument as a phi function, and the parameters that callers feed into that argument as the sources of that phi.
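For a concrete picture (this notation is invented for the sketch): a diamond-shaped CFG in block-argument form, where the join block's argument is the phi and the values passed at each jump are its sources.

    entry:
        a = GetArgument(0)
        b = GetArgument(1)
        Branch(a, then, else)
    then:
        x1 = Add(b, 1)
        Jump(join(x1))
    else:
        x2 = Mul(b, 2)
        Jump(join(x2))
    join(x):
        r = Add(x, 42)
        Return(r)

Read as SSA, join(x) is x = phi(x1 from then, x2 from else); the phi's sources live at the jumps rather than in the phi itself.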
Just from your description I think your Pizlo-special SSA is actually a form of basic blocks with arguments. What I don't see from your description is why Upsilon and Phi aren't combined into a single notion (what I would call an incoming block parameter).
I don't think basic blocks with arguments is CPS, because CPS, in practice, involves return closures.
How does Upsilon know which values from other basic blocks it could receive? That information is critical for transformations like constant propagation and register allocation.
Upsilon uses an SSA value and assigns to a Phi. The use of an SSA value is just a normal SSA use.
The assignment to the Phi is special: it follows the Static Single Use rule.
In practice, you'll want to either have all Phis know about the Upsilons that refer to them, or you'll want to have a handy analysis that tells you about the Upsilons that refer to Phis. This part is no different from how your SSA form will either have user tracking (like LLVM IR) or an analysis that tells you the users.
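As a rough sketch of that bookkeeping in C++ (the names and layout are my own, not JavaScriptCore's actual data structures): each Phi can simply carry the list of Upsilons that store to its shadow variable, the same way an LLVM-style IR keeps use lists.

    #include <vector>

    // Minimal node sketch: SSA "names" are really just pointers to nodes.
    struct Node {
        std::vector<Node*> users;    // nodes that read this node's implicit variable
        // opcode, type, operands, ...
    };

    struct Phi;

    struct Upsilon : Node {
        Node* value = nullptr;       // the SSA value being stored (a normal SSA use)
        Phi*  phi   = nullptr;       // the Phi whose shadow variable receives it
    };

    struct Phi : Node {
        // Keep this list up to date as you mutate the IR, or recompute it with
        // an analysis pass, exactly as you would for def-use chains.
        std::vector<Upsilon*> upsilons;
    };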
This was my idea too; I also think this is better.
If this is the case, this is an approach I've taken in the past to unify how LLVM-based taint tracking instrumentation of "normal" `alloca`s and phi nodes works, e.g.: https://github.com/lifting-bits/vmill/blob/master/tools/Tain...
Allocas have no requirement that there's a static single use. They are quite hard to analyze.
Isn't this similar to block arguments? These shadow variables would have been block parameters, just unordered and maybe not explicitly collected in a set.
Let's first define a syntax for SSA and some terminology. Here's an example SSA node:
A = Add(B, C)
In reality, this will be a single object in your in-memory representation, and the names are really addresses of those objects. So, this node has an "implicit variable" called A; it's the variable that is implicitly assigned to when you execute the node. If you then do:
X = Sub(A, 1)
Then "A" is just a pointer to the Add node, and we're using the implicit variable "A". Here's an example function:
int foo(int a, int b)
{
    int x;
    if (a)
        x = b + 1;
    else
        x = b * 2;
    return x + 42;
}
Here's an SSA program with a Phi in Pizlo form:

root:
    A = GetArgument(0)
    B = GetArgument(1)
    Branch(A, then, else)
then:
    X1 = Add(B, 1)
    Upsilon(X1, ^X)
    Jump(return)
else:
    X2 = Mul(B, 2)
    Upsilon(X2, ^X)
    Jump(return)
return:
    X = Phi()
    R = Add(X, 42)
    Return(R)
In Pizlo form:
- Every SSA node has an implicit variable, as mentioned above.
- Every Phi node has a shadow variable in addition to the implicit variable.
Let's say that given a Phi like "X = Phi()", the implicit variable is called "X", and the shadow variable is called "^X".
Therefore, the semantics of an upsilon like "Upsilon(X1, ^X)" is just "set ^X = X1". And the semantics of a Phi like "X = Phi()" is just "set X = ^X".
In other words, you can think of Upsilon as being a side effect (a store to a shadow variable). And you can think of Phi as being a side effect (a load from a shadow variable). You can model them that way in your effect analysis to block reordering Upsilons and Phis.
But also, the shadow variables of Phis in Pizlo form are "Static Single Use" (SSU) variables. This falls out naturally from the fact that the only syntax for loading a shadow variable is the Phi itself. So you can think of Pizlo form as "SSA-SSU form".
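In executable terms, here's a toy model of those semantics (the types are made up purely for illustration): Upsilon is a store to a per-Phi cell, and Phi is the single load from it.

    #include <cstdint>
    #include <unordered_map>

    struct PhiNode {};                                // stands in for "X = Phi()"
    using Value = int64_t;

    // One shadow variable per Phi.
    std::unordered_map<const PhiNode*, Value> shadow;

    // Upsilon(X1, ^X): store an already-computed SSA value into ^X.
    void upsilon(Value x1, const PhiNode* x) { shadow[x] = x1; }

    // X = Phi(): load ^X. The Phi is the only reader of ^X, hence SSU.
    Value phi(const PhiNode* x) { return shadow.at(x); }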
The main benefit of this form is that basic blocks - and all CFG data structures - have zero knowledge about SSA. There are no basic block arguments. There's no requirement that Phis appear at the tops of blocks. In fact, this is a valid program in Pizlo form (albeit suboptimal):
    M = Stuff(...)
    Upsilon(M, ^N)
    N = Phi()
    MoreStuff(N)
Here, there's a Phi in the middle of a basic block, and there's an Upsilon just before it. That's fine. This is important, because it means that you can do CFG transforms that blow away control flow edges without worrying about fixing your Phis.

In any Pizlo-form compiler, you'll want to have a Phi simplification pass, which you can implement either by running Cytron or by running any other SSA converter. The simplest approach is to fixpoint the rule that if you have a Phi whose Upsilons only use the Phi itself or exactly one other value, then you replace the Phi with that other value.
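A sketch of that rule in C++ (the node types here are stand-ins of my own, not JavaScriptCore's): a Phi is trivial when its Upsilons store only the Phi itself or exactly one other value.

    #include <vector>

    struct Node {};                       // any SSA node (stand-in)
    struct Phi;
    struct Upsilon : Node {
        Node* value = nullptr;            // the value this Upsilon stores
        Phi*  phi   = nullptr;            // the Phi whose shadow it stores to
    };
    struct Phi : Node {
        std::vector<Upsilon*> upsilons;   // Upsilons that store to this Phi's shadow
    };

    // Returns the single value V such that every Upsilon of this Phi stores
    // either the Phi itself or V; returns nullptr if there's no such V.
    Node* trivialReplacement(Phi* phi)
    {
        Node* candidate = nullptr;
        for (Upsilon* u : phi->upsilons) {
            Node* v = u->value;
            if (v == phi || v == candidate)
                continue;                 // self-reference, or a repeat of the candidate
            if (candidate)
                return nullptr;           // two distinct incoming values: not trivial
            candidate = v;
        }
        return candidate;
    }

    // The simplification pass runs this to a fixpoint: replace all uses of each
    // trivial Phi with its replacement, delete the Phi and its Upsilons, and
    // repeat, since each removal can make other Phis trivial.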
Worth calling out that this is the form used by two of JavaScriptCore's IRs:
- DFG-SSA (used for high level optimizations in the FTL JIT)
- B3 (used for low level optimizations in the FTL JIT)
B3 is somewhat well documented: https://www.webkit.org/docs/b3/intermediate-representation.h...
Though with CPS you can do fun stuff like copy-and-patch compilation and the fancified musttail interpreter thing (which I'm not sure even has a proper name). Admittedly the two are so similar that the second one could (probably) be converted into the first one without too much trouble. From what I understand, IANAL &etc.
SSA makes auditing code for dataflow issues much easier - both for people and programs. If you learn how to transform your code to SSA, you might find yourself choosing to write code this way so that you know, exactly, where that input to whatever sensitive string you are building came from. You can more easily trace the lifetime of that input and see what transformations it has been through already.
I feel like I am the only person in the world talking about SSA outside of compilers! It is useful for humans as well, and we all want to write readable and secure code.
A paper that was suggested to me (not mentioned in the linked blog post) is:
"Practical Improvements to the Construction and Destruction of Static Single Assignment Form", by Preston Briggs, Keith D. Cooper, Timothy J. Harvey, I. Taylor Simpson all at Rice University SOFTWARE—PRACTICE AND EXPERIENCE, VOL. 28(8), 1–28 (July 1998)
Note also if you’re dealing with weird control flow you might get oddities in translating out of SSA form: https://www.tjhsst.edu/~rlatimer/papers/sreedharTranslatingO.... Sreedhar’s algorithm for translating out of SSA form is relatively easy to implement.
This is probably useless to you unless you know Common Lisp, but I put together a couple of toy implementations of these algorithms years ago to help myself learn them. There’s a couple of nits the papers gloss over that are noted in comments. You’ll need to compute liveness and dominance frontiers. Examples for those are in the project too. https://github.com/rayiner/portfolio/blob/master/ssa_analysi...
It does make sense, because it can produce reasonably efficient machine code directly from an AST, so it is doing a lot of the work that a naive SSA lowering algorithm leaves for later stages.
[0] leave this as an exercise for the reader, we aren't on speaking terms lately.
SSA (IBM, 1988) introduced the "Computed COMEFROM".