Transpiler, a Meaningless Word (2023)

people.csail.mit.edu

125 points by jumploops 9 days ago

jasode 3 days ago

Whenever someone argues the uselessness or redundancy of a particular word, a helpful framework to understand their perspective is "Lumpers vs Splitters" : https://en.wikipedia.org/wiki/Lumpers_and_splitters

An extreme caricature example of a "lumper" would just use the word "computer" to label all Turing Complete devices with logic gates. In that mindset, having a bunch of different words like "mainframe", "pc", "smartphone", "game console", "FPGA", etc are all redundant because they're all "computers" which makes the various other words pointless.

On the other hand, the Splitters focus on the differences and I previously commented why "transpiler" keeps being used even though it's "redundant" for the Lumpers : https://news.ycombinator.com/item?id=28602355

We're all Lumpers vs Splitters to different degrees for different topics. A casual music listener who thinks of orchestral music as background sounds for the elevator would be "lump" both Mozart and Bach together as "classical music". But an enthusiast would get irritated and argue "Bach is not classical music, it's Baroque music. Mozart is classical music."

The latest example of this I saw was someone complaining about the word "embedding" used in LLMs. They were asking ... if an embedding is a vector, why didn't they just re-use the word "vector"?!? Why is there an extra different word?!? Lumpers-vs-splitters.

Izkata 2 days ago

"Compiler" encompassing "transpiler" I think is wrong anyway. There's a third term that doesn't seem to get nearly as much pushback, that didn't come up in your link, has yet to be mentioned here, and isn't in the article, but adds context for these two: decompiler.
Compiling is high-level to low-level (source code to runnable, you rarely look at the output).
Decompiling is low-level to high-level (runnable to source code, you do it to get and use the output).
Transpiling is between two languages of roughly the same level (source code to source code, you do it to get and use the output).
Certainly there's some wishy-washy-ness due to how languages relate to each other, but none of these terms really acts like a superset of the others.
- cestith 2 days ago
  
  I like your definitions, but all three of these could be called subsets of compilers.
  - ablob 2 days ago
    
    By the definitions given they can not, as no function subsumes another. By whatever you define as "compiler" maybe, but I see no point in this kind of interaction that essentially boils down to subsumtion to an entity you refuse to describe any further.
    Is there a merit to this? Can whatever you call compiler do more? Is it all three of the things mentioned combined? Who knows - as is stands I only know that you disagree with the definitions given/proposed.
    
    cestith 2 days ago
    
    I think they are fine definitions. I think a transpiler, a term rewriter, an assembler, a stand-alone optimizer, and even some pretty printers are subclasses of compilers.
    I define a compiler as something that takes an input in a language, does transformations, and produces a transformed output in a language. All of them do that, and they are more specific terms for types of compilers.
  - coldtea a day ago
    
    So what? A car, a bike, and a truck can all be called subsets of vehicles, but we still have (and need) different words for each type.
  - jrm4 2 days ago
    
    Except that they do what useful words do; provide (more) useful information.
    
    cestith 2 days ago
    
    Fair. I don’t believe I said they were useless terms for differentiation of types of compilers, though. I just said they can all be thought of as a class as different types of compilers.
cyco130 3 days ago

It's all about context, isn't it? "Humans vs. animals" is an important distinction to make in some contexts and useless in others. Insisting on the fact that humans are also animals if we're talking about, say, "language in humans vs. animals" is unproductive. It just makes discussions harder by forcing everyone to add "_non-human_ animals" to every mention. But if we're talking about, say, cellular biology, it's unproductive to force everyone to write "human and animal cells" instead of just "animal cells".
Similarly, distinguishing between transpilers and compilers might be important in some contexts and useless in others. Transpilers are source-to-source compilers, a subset of compilers. Whether it matters depends on the context.
mlyle 2 days ago

I think the argument here is not really where one should draw the line and whether transpiler should be a different word...
I think the argument centers on how transpilers are often justified as being something quite different in difficulty than writing a whole compiler -- and in practice, nearly the whole set of problems of writing a compiler show up.
So, it's more like, don't use the distinction to lie to yourself.
thfuran 3 days ago

>An extreme caricature example of a "lumper" would just use the word "computer" to label all Turing Complete devices with logic gates.
"Computer"? You mean object, right?
- ObscureScience 2 days ago
  
  You mean point of local maximum in the mass field?
mjburgess 3 days ago

I'm not convinced your L/S dichotomy applies. The concern there is that the natural world (or some objective target domain) has natural joints, and the job of the scientist (, philosopher, et al.) is to uncover those joints. You want to keep 'hair splitting' until the finest bones of reality are clear, then grouping hairs up into lumps, so their joints and connections are clear. The debate is whether the present categorisation objectively under/over-generates , and whether there is a factor of the matter. If it over-includes, then real structure is missing.
In the case of embeddings vs. vectors, classical vs., baroque, transpiler vs., compiler -- i think the apparent 'lumper' is just a person ignorant of classification scheme offered, or at least, ignorant of what property it purports to capture.
In each case there is a real objective distinction beneath the broader category that one offers in reply, and that settles the matter. There is no debate: a transpiler is a specific kind of compiler; an embedding vector is a specific kinds of vector; and so on.
There is nothing at stake here as far as whether the categorisation is tracking objective structure. There is only ignorance on the part of the lumper: the ignorant will, of course, always adopt more general categories ("thing" in the most zero-knowledge case).
A real splitter/lumper debate would be something like: how do we classify all possible programs which have programs as their input and output? Then a brainstorm which does not include present joint-carving terms, eg., transformers = whole class, transformer-sourcers = whole class on source code, ...
- jasode 3 days ago
  
  > i think the apparent 'lumper' is just a person ignorant of classification scheme offered, or at least, ignorant of what property it purports to capture.
  >In each case there is a real objective distinction
  No, Lumper-vs-Splitter doesn't simply boil down to plain ignorance. The L/S debate in the most sophisticated sense involves participants who actually know the proposed classifications but _chooses_ to discount them.
  Here's another old example of a "transpiler" disagreement subthread where all 4 commenters actually know the distinctions of what that word is trying to capture but 3-out-of-4 still think that extra word is unnecessary: https://news.ycombinator.com/item?id=15160415
  Lumping-vs-Splitting is more about emphasis vs de-emphasis via the UI of language. I.e. "I do actually see the extra distinctions you're making but I don't elevate that difference to require a separate word/category."
  The _choice_ by different users of language to encode the difference into another distinct word is subjective not objective.
  Another example could be the term "social media". There's the seemingly weekly thread where somebody proclaims, "I quit all social media" and then there's the reply of "Do you consider HN to be social media?". Both the "yes" and "no" sides already know and can enumerate how Facebook works differently than HN so "ignorance of differences" of each website is not the root of the L/S. It's subjective for the particular person to lump in HN with "social media" because the differences don't matter. Likewise, it's subjective for another person to split HN as separate from social media because the differences do matter.
  - tshaddox 2 days ago
    
    > Here's another old example of a "transpiler" disagreement subthread where all 4 commenters actually know the distinctions of what that word is trying to capture but 3-out-of-4 still think that extra word is unnecessary
    Ha. I see this same thing play out often where someone is arguing that “X is confusing” for some X, and their argument consists of explaining all relevant concepts accurately and clearly, thus demonstrating that they are not confused.
  - mjburgess 3 days ago
    
    I agree there can be such debates; that's kinda my point.
    I'm just saying, often there is no real debate it's just one side is ignorant of the distinctions being made.
    Any debate in which one side makes distinctions and the other is ignorant of them will be an apparent L vs. S case -- to show "it's a real one" requires showing that answering the apparent L's question doesnt "settle the matter".
    In the vast majority of such debates you can just say, eg., "transpilers are compilers that maintain the language level across input/output langs; and sometimes that useful to note -- eg., that typescript has a js target." -- if such a response answers the question, then it was a genuine question, not a debate position.
    I think in the cases you list most people offering L-apparent questions are asking a sincerely learning question: why (because I don't know) are you making such a distinction? That might be delivered with some frustration at their misperception of "wasted cognitive effort" in such distinction-making -- but it isnt a technical position on the quality of one's classification scheme
    
    pessimizer 2 days ago
    
    > it's just one side is ignorant of the distinctions being made.
    > No, Lumper-vs-Splitter doesn't simply boil down to plain ignorance.
    If I can boil it down to my own interpretation: when this argument occurs, both sides usually know exactly what each other are talking about, but one side is demanding that the distinction being drawn should not be important, while the other side is saying that it is important to them.
    To me, it's "Lumpers" demanding that everyone share their value system, and "Splitters" saying that if you remove this terminology, you will make it more difficult to talk about the things that I want to talk about. My judgement about it all is that "Lumpers" are usually intentionally trying to make it more difficult to talk about things that they don't like or want to suppress, but pretending that they aren't as a rhetorical deceit.
    All terminology that makes a useful distinction is helpful. Any distinction that people use is useful. "Lumpers" are demanding that people not find a particular distinction useful.
    Your "apparent L's" are almost always feigning misunderstanding. It's the "why do you care?" argument, which is almost always coming from somebody who really, really cares and has had this same pretend argument with everybody who uses the word they don't like.
    
    mjburgess 2 days ago
    
    I mean, I agree. I think most L's are either engaged in a rhetorical performance of the kind you describe, or theyire averse to cognitive effort, or ignorant in the literal sense.
    There are a small number of highly technical cases where an L vs S debate makes sense, biological categorisation being one of them. But mostly, it's an illusion of disagreement.
    Of course, the pathological-S case is a person inviting distinctions which are contextually inappropriate ("this isnt just an embedding vector, it's a 1580-dim! EV!"). So there can be S-type pathologies, but i think those are rarer and mostly people roll their eyes rather than mistake it as an actual "position".
- seg_lol 3 days ago
  
  > I'm not convinced your L/S dichotomy applies.
  Proceeds to urm actually split.
  - datadrivenangel 3 days ago
    
    All ontologies are false. But some are useful.
    
    mjburgess 2 days ago
    
    All ontologies people claim to be ontologies are false in toto
    All "ontologies" are false.
    There is, to disquote, one ontology which is true -- and the game is to find it. The reason getting close to that one is useful, the explanation of utility, is its singular truth
    
    cestith 2 days ago
    
    To be a lumper for a second, all models are flawed. But some are useful.
- ethmarks 3 days ago
  
  Ahh, so you're a meta-splitter.
  https://xkcd.com/2518/
kragen 3 days ago

> An extreme caricature example of a "lumper" would just use the word "computer" to label all Turing Complete devices with logic gates.
I don't think that's a caricature at all; I've often seen people argue that it should include things like Vannevar Bush's differential analyzer, basically because historically it did, even though such devices are neither Turing-complete nor contain logic gates.
- mjburgess 3 days ago
  
  'computer' is an ambiguous word. In a mathematical sense a computational process is just any which can be described as a function from the naturals to naturals. Ie., any discrete function. This includes a vast array of processes.
  A programmable computer is a physical device which has input states which can be deterministicaly set, and reliably produce output states.
  A digital computer is one whose state transition is discrete. An analogue computer has continuous state transition -- but still, necessarily, discrete states (by def of computer).
  An electronic digital programmable computer is an electric computer whose voltage transitions count as states discretely (ie., 0/1 V cutoffs, etc.); its programmable because we can set those states causally and deterministically; and its output state arises causally and deterministically from its input state.
  In any given context these 'hidden adjectives' will be inlined. The 'inlining' of these adjectives causes an apparent gatekeepery Lumpy/Splitter debate -- but it isnt a real one. Its just ignorance about the objective structure of the domain, and so a mistaken understanding about what adjectives/properties are being inlined.
  - lesam 3 days ago
    
    In fact ‘computer’ used to be a job description: a person who computes.
    
    kragen 3 days ago
    
    Yes, definitely. And "nice" used to mean "insignificant". But they don't have those meanings now.
  - kragen 3 days ago
    
    Most functions from the naturals to naturals are uncomputable, which I would think calls into question your first definition.
    It's unfortunate that "computer" is the word we ended up with for these things.
    
    mjburgess 3 days ago
    
    Ah well, that's true -- so we can be more specific: discrete, discrete computable, and so on.
    But to the overall point, this kind of reply is exactly why I don't think this is a case of L vs. S -- your reply just forces a concession to my definition, because I am just wrong about the property I was purporting to capture.
    With all the right joint-carving properties to hand, there is a very clear matrix and hierarchy of definitions:
    abstract mathematical hierarchy vs., physical hierarchy
    With the physical serving as implementations of partial elements of the mathematical.
    
    kragen 3 days ago
    
    Word definitions are arbitrary social constructs, so they can't really be correct or incorrect, just popular or unpopular. Your suggested definitions do not reflect current popular usage of the word "computer" anywhere I'm familiar with, which is roughly "Turing-complete digital device that isn't a cellphone, tablet, video game console, or pocket calculator". This is a definition with major ontological problems, including things such as automotive engine control units, UNIVAC 1, the Cray-1, a Commodore PET, and my laptop, which have nothing in common that they don't also share with my cellphone or an Xbox. Nevertheless, that seems to be the common usage.
    
    lo_zamoyski 3 days ago
    
    > Word definitions are arbitrary social constructs, so they can't really be correct or incorrect, just popular or unpopular.
    If you mean that classifications are a matter of convention and utility, then that can be the case, but it isn’t always and can’t be entirely. Classifications of utility presuppose objective features and thus the possibility of classification. How else could something be said to be useful?
    Where paradigmatic artifacts are concerned, we are dealing with classifications that join human use with objective features. A computer understood as a physical device used for the purpose of computing presupposes a human use of that physical thing “computer-wise”, that is to say objectively, no physical device per se is a computer, because nothing inherent in the thing is computing (what Searle called “observer relative“). But the physical machine is objectively something which is to say ultimately a collection of physical elements of certain kinds operating on one another in a manner that affords a computational use.
    We may compare paradigmatic artifacts with natural kinds, which do have an objective identity. For instance, human beings may be classified according to an ontological genus and an ontological specific difference such as “rational animal“.
    Now, we may dispute certain definitions, but the point is that if reality is intelligible–something presupposed by science and by our discussion here at the risk of otherwise falling into incoherence–that means concepts reflect reality, and since concepts are general, we already have the basis for classification.
    
    kragen 3 days ago
    
    No, I don't mean that classifications are a matter of convention and utility, just word definitions. I think that some classifications can be better or worse, precisely because concepts can reflect reality well or poorly. That's why I said that the currently popular definition of "computer" has ontological problems.
    I'm not sure that your definition helps capture what people mean by "computer" or helps us approach a more ontologically coherent definition either. If, by words like "computing" and "computation", you mean things like "what computers do", it's almost entirely circular, except for your introduction of observer-relativity. (Which is an interesting question of its own—perhaps the turbulence at the base of Niagara Falls this morning could be correctly interpreted as finding a proof of the Riemann Hypothesis, if we knew what features to pay attention to.)
    But, if you mean things like "numerical calculation", most of the time that people are using computers, they are not using them for numerical calculation or anything similar; they are using them to store, retrieve, transmit, and search data, and if anything the programmers think of as numerical is happening at all, it's entirely subordinate to that higher purpose, things like array indexing. (Which is again observer-relative—you can think of array indexing as integer arithmetic mod 2⁶⁴, but you can also model it purely in terms of propositional logic.)
    And I think that's one of the biggest pitfalls in the "computer" terminology: it puts the focus on relatively minor applications like accounting, 3-D rendering, and LLM inference, rather than on either the machine's Protean or universal nature or the purposes to which it is normally put. (This is a separate pitfall from random and arbitrary exclusions like cellphones and game consoles.)
    
    lo_zamoyski 2 days ago
    
    > That's why I said that the currently popular definition of "computer" has ontological problems.
    Indeed. To elaborate a bit more on this...
    Whether a definition is good or bad is at least partly determined by its purpose. Good as what kind of definition?
    If the purpose is theoretical, then the common notion of "computer" suffers from epistemic inadequacy. (I'm not sure the common notion rises above mere association and family resemblance to the rank of "definition".)
    If the purpose is practical, then under prevailing conditions, what people mean by "computer" in common speech is usually adequate: "this particular form factor of machine used for this extrinsic purpose". Most people would call desktop PCs "computers", but they wouldn't call their mobile phones computers, even though ontologically and even operationally, there is no essential difference. From the perspective of immediate utility as given, there is a difference.
    I don't see the relevance of "social construction" here, though. Sure, people could agree on a definition of computer, and that definition may be theoretically correct or merely practically useful or perhaps neither, but this sounds like a distraction.
    > I'm not sure that your definition helps capture what people mean by "computer" or helps us approach a more ontologically coherent definition either.
    In common speech? No. But the common meaning is not scientific (in the broad sense of that term, which includes ontology) and inadequate for ontological definition, because it isn't a theoretical term. So while common speech can be a good starting point for analysis, it is often inadequate for theoretical purposes. Common meanings must be examined, clarified, and refined. Technical terminology exists for a reason.
    > If, by words like "computing" and "computation", you mean things like "what computers do", it's almost entirely circular
    I don't see how. Computation is something human beings do and have been doing forever. It preexists machines. All machines do is mechanize the formalizable part of the process, but the computer is never party to the semantic meaning of the observing human being. It merely stands in a relation of correspondence with human formalism, the same way five beads on an abacus or the squiggle "5" on a piece of people denote the number 5. The same is true of representations that denote something other than numbers (a denotation that is, btw, entirely conventional).
    Machines do not possess intrinsic purpose. The parts are accidentally arranged in a manner that merely gives the ensemble certain affordances that can be parlayed into furthering various desired human ends. This may be difficult for many today to see, because science has - for practical purposes or for philosophical reasons - projected a mechanistic conceptual framework onto reality that recasts things like organisms in mechanistic terms. But while this can be practically useful, theoretically, this mechanistic mangling of reality has severe ontological problems.
AlienRobot 3 days ago

That's very interesting!
Splitters make more sense to me since different things should be categorized differently.
However, I believe a major problem in modern computing is when the splitter becomes an "abstraction-splitter."
For example, take the mouse. The mouse is used to control the mouse cursor, and that's very easy to understand. But we also have other devices that can control the mouse cursor, such as the stylus and touchscreen devices.
A lumper would just say that all these types of devices are "mouses" since they behave the same way mouses do, while a splitter would come up with some stupid term like "pointing devices" and then further split it into "precise pointing devices" and "coarse pointing devices" ensuring that nobody has absolutely no idea what they are talking about.
As modern hardware and software keeps getting built on piles and piles of abstractions, I feel this problem keeps getting worse.
- ethmarks 3 days ago
  
  Doesn't it make sense to use words that mean what you're using them to mean?
  By your logic I could use the term "apple" to describe apples, oranges, limes, and all other fruit because they all behave in much the same ways that apples do. But that's silly because there are differences between apples and oranges [citation needed]. If you want to describe both apples and oranges, the word for that is "fruit", not "apple".
  Using a touchscreen is less precise than using a mouse. If the user is using a touchscreen, buttons need to be bigger to accommodate for the user's lack of input precision. So doesn't it make sense to distinguish between mice and touchscreens? If all you care about is "thing that acts like a mouse", the word for that is "pointing device", not "mouse".
  - AlienRobot 2 days ago
    
    The point is that it's simpler to understand what something is by analogy (a touchscreen is a mouse) than by abstraction (a mouse is a pointing device; a touchscreen is also a pointing device), since you need a third, abstracting concept to do the latter.
bondarchuk 2 days ago

Whenever someone argues the uselessness or redundancy of a particular word we just have to remember that the word exists because at least two parties have found it useful to communicate something between them.
- thfuran 2 days ago
  
  But they may have done so before the meaning shifted or before other, more useful words were coined.
zem 2 days ago

in addition to that, some people just seem to have an extreme aversion to neologisms. I remember being surprised by that when ajax (the web technology) first came out and there was a huge "why does this thing which is just <I honestly forget what it was 'just'> need its own name?" faction.
downboots 2 days ago

and a combination lock is a permutation lock
antonvs 2 days ago

> But an enthusiast would get irritated and argue "Bach is not classical music, it's Baroque music. Mozart is classical music."
Baroque music is a kind of classical music, though.

oersted 3 days ago

I don't understand what the issue is: a transpiler is a compiler that outputs in a language that human programmers use.

It's good to be aware of that from an engineering standpoint, because the host language will have significantly different limitations, interoperability and ecosystem, compared to regular binary or some VM byte-code.

Also, I believe that they are meaningfully different in terms of compiler architecture. Outputting an assembly-like is quite different from generating an AST of a high-level programming language. Yes of course it's fuzzy because some compilers use intermediate representations that in some cases are fairly high-level, but still they are not meant for human use and there are many practical differences.

It's a clearly delineated concept, why not have a word for it.

kragen 3 days ago

GCC outputs textual GNU assembly language, in which I have written, for example, a web server, a Tetris game, a Forth interpreter, and an interpreter for an object-oriented language with pattern-matching. Perhaps you are under the illusion that I am not a human programmer because this is some kind of superhuman feat, but to me it seems easier than programming in high-level languages. It just takes longer. I think that's a common experience.
Historically speaking, almost all video games and operating systems were written in assembly languages similar to this until the 80s.
- oersted 3 days ago
  
  Of course I'm aware of this, I've written some assembly too, most definitions are leaky. And if GNU assembly had wide adoption among programmers right now and an ecosystem around it, then some people might also call GCC a transpiler (in that specific mode, which is not the default), if they care about the fact that it outputs in a language that they may read or write by hand comfortably.
  They also called C a high-level language at that time. There was also more emphasis on the distinction between assemblers and compilers. Indeed, they may have used the word compiler more in the sense we use transpiler now, I'm sure people were also saying that it was just a fancy assembler. Terminology shifts.
  - kragen 2 days ago
    
    I think what happened was that, when writing in assembly language was a common thing to do, programmers had a clearer idea of what a compiler did, so they knew better than to say "transpiler".
- crazygringo 3 days ago
  
  https://news.ycombinator.com/item?id=45912557
  - kragen 3 days ago
    
    Thank you for the link; I've responded comprehensively at https://news.ycombinator.com/item?id=45914592.
- NuclearPM 3 days ago
  
  You’re being \__
krapp 3 days ago

The issue is confused because of Javascript and the trend to consider Javascript "bytecode for the web" because it is primarily "compiled" from other languages, rather than being considered a language in its own right.
I've gotten into arguments with people who refuse to accept that there is any difference worth considering between javascript and bytecode or assembly. From that perspective, the difference between a "transpiler" and a "compiler" is just aesthetics.
- oersted 3 days ago
  
  I do think you are right, the concept is not common outside of the JS ecosystem to be fair. Indeed, it probably wouldn't make much sense to transpile in the first place, if it wasn't for these browser limitations. People would just make fully new languages, and it is starting to happen with WebAssembly.
  And the ecosystem of JVM and BEAM hosted languages does make the concept even murkier.

jchw 3 days ago

Transpilers are compilers that translate from one programming language to the other. I am not 100% sure where these "lies" come from, but it's literally in the name, it's clearly a portmanteau of translating compiler... Where exactly are people thinking the "-piler" suffix comes from?

Yes, I know. You could argue that a C compiler is a transpiler, because assembly language is generally considered a programming language. If this is you, you have discovered that there are sometimes concepts that are not easy to rigorously define but are easy for people to understand. This is not a rare phenomenon. For me, the difference is that a transpiler is intending to target a programming language that will be later compiled by another compiler, and not just an assembler. But, it is ultimately true that this definition is still likely not 100% rigorous, nor is it likely going to have 100% consensus. Yet, people somehow know a transpiler when they see one. The word will continue to be used because it ultimately serves a useful purpose in communication.

s20n 3 days ago

One distinction is that compilers generally translate from a higher-level language to a lower-level language whereas Transpilers target two languages which are very close in the abstraction level. For example a program that translated x86 assembly to RISC-V assembly would be considered a transpiler.
- kragen 3 days ago
  
  The article we are discussing has "Transpilers Target the Same Level of Abstraction" as "Lie #3", and it clearly explains why that is not true of the programs most commonly described as "transpilers". (Also, I've never heard anyone call a cross-assembler a "transpiler".)
  - MrJohz 2 days ago
    
    I don't really agree with their argument, though. Pretty much all the features that Babel deals with are syntax sugar, in the sense that if they didn't exist, you could largely emulate them at runtime by writing a bit more code or using a library. The sugar adds a layer of abstraction, but it's a very thin layer, enough that most JavaScript developers could compile (or transpile) the sugar away in their head.
    On the other hand, C to Assembly is not such a thin layer of abstraction. Even the parts that seem relatively simple can change massively as soon as an optimisation pass is involved. There is a very clear difference in abstraction layer going on here.
    I'll give you that these definitions are fuzzy. Nim uses a source-to-source compiler, and the difference in abstraction between Nim and C certainly feels a lot smaller than the difference between C and Assembly. But the C that Nim generates is, as I understand it, very low-level, and behaves a lot closer to assembly, so maybe in practice the difference in abstraction is greater than it initially seems? I don't think there's a lot of value in trying to make a hard-and-fast set of rules here.
    However, it's clear that there is a certain subset of compilers that aim to do source-to-source desugaring transformations, and that this subset of compilers have certain similarities and requirements that mean it makes sense to group them together in some way. And to do that, we have the term "transpiler".
  - jchw 2 days ago
    
    Abstraction layers are close to the truth, but I think it's just slightly off. It comes down to the fact that transpilers are considered source-to-source compilers, but one man's intermediate code is another man's source code. If you are logically considering neither the input and the output to be "source code", then you might not consider it to be a transpiler for the same reasons that an assembler is rarely called a compiler, even though assemblers can have compiler-like features: consider LLVM IR, for example. This is why a cross-assembler is not often referred to as a transpiler. Of course, terminology is often tricky: the term "recompiler" is often used for this sort of thing, even though neither the input nor the output is generally considered "source code", probably because they are designed to essentially construct a result as similar as possible to if you were able to recompile the source code for another target. This seems to contrast fairly well with "decompiler", as a recompiler may perform similar reconstructive analysis to a decompiler, but ultimately outputs more object code. Not that I am an authority on anything here, but I think these terms ultimately do make sense and reconcile with each-other.
    When people say "Same Level of Abstraction", I think what they are expressing is that they believe both of the programming languages for the input and output are of a similar level of expressiveness, though it isn't always exact, and the example of compiling down constructs like async/await shows how this isn't always cut-and-dry. It doesn't imply that source-to-source translations, though, are necessarily trivial, either: A transpiler that tries to compile Go code to Python would have to deal with non-trivial transformations even though Python is arguably a higher level of abstraction and expressiveness, not lower. The issue isn't necessarily the abstraction level or expressiveness, it's just an impedance mismatch between the source language and the destination language. It also doesn't mean that the resulting code is readable or not readable, only that the code isn't considered low level enough to be bytecode or "object code". You can easily see how there is some subjectivity here, but usually things fall far away enough from the gray area that there isn't much of a need to worry about this. If you can decompile Java bytecode and .NET IL back to nearly full-fidelity source code, does that call into question whether they're "compilers" or the bytecode is really object code? I think in those cases it gets close and more specific factors start to play into the semantics. To me this is nothing unusual with terminology and semantics, they often get a lot more detailed as you zoom in, which becomes necessary when you get close to boundaries. And that makes it easier to just apply a tautological definition in some cases: like for Java and .NET, we can say their bytecode is object code because that's what they're considered to be already, because that's what the developers consider them to be. Not as satisfying, but a useful shortcut: if we are already willing to accept this in other contexts, there's not necessarily a good reason to question it now.
    And to go full circle, most compilers are not considered transpilers, IMO, because their output is considered to be object code or intermediate code rather than source code. And again, the distinction is not exact, because the intermediate code is also turing complete, also has a human readable representation, and people can and do write code in assembly. But brainfuck is also turing complete, and that doesn't mean that brainfuck and C are similarly expressive.
kragen 3 days ago

On the contrary: it reifies people's prejudices and prevents them from seeing reality, often in the service of intentional deception, which for my purposes is the opposite of a useful purpose in communication.
There's currently a fad in my country for selling "micellar water" for personal skin cleansing, touted as an innovation. But "micelles" are just the structure that any surfactant forms in water, such as soap, dish detergent, or shampoo, once a certain critical concentration is reached, so "micellar water" is just water with detergent in it. People believe they are buying a new product because it's named with words that they don't know, but they are being intentionally deceived.
Similarly, health food stores are selling "collagen supplements" for US$300 per kilogram to prevent your skin from aging. These generally consist of collagen hydrolysate. The more common name for collagen hydrolysate is "gelatin". Food-grade gelatin sells for US$15 per kilogram. (There is some evidence that it works, but it's far from overwhelming, but what I'm focusing on here is the terminology.) People believe they are buying a special new health supplement because they don't know what gelatin is, but they are being intentionally deceived.
You might argue, "People somehow know micellar water when they see it," or, "People somehow know collagen supplements when they see them," but in fact they don't; they are merely repeating what it says on the jar because they don't know any better. They are imagining a distinction that doesn't exist in the real world, and that delusion makes them vulnerable to deception.
Precisely the same is true of "transpilers". The term is commonly used to mislead people into believing that a certain piece of software is not a compiler, so that knowledge about compilers does not apply to it.
- jchw 3 days ago
  
  > The term is commonly used to mislead people into believing that a certain piece of software is not a compiler, so that knowledge about compilers does not apply to it.
  Why would people use a word that has the word "compiler" in it to try to trick people into thinking something is not a compiler? I'm filing this into "issues not caused by the thing that is being complained about".
  - kragen 2 days ago
    
    Apparently nobody has ever said to you, "No, it's not a compiler, it's a transpiler," which makes you a luckier person than I am. People know less than you think.
    
    jchw 2 days ago
    
    I don't even understand why someone would say that. What's the point in asserting that something isn't a compiler? Not that I doubt that this really happens, but I don't know what saying something "isn't a compiler" is meant to prove. Is it meant to downplay the complexity of a transpiler?
    Obviously I believe transpilers are compilers. A cursory Google search shows that the word transpiler is equated to "source-to-source compiler" right away. If it truly wasn't a compiler, didn't have a true frontend and really did a trivial syntax-to-syntax translation, surely it would only be a translator, right? That is my assumption.
    But all that put aside for a moment, I do stand by one thing; that's still not really an issue I blame on the existence of the word transpiler. If anything, it feels like it is in spite of the word transpiler, which itself heavily hints at the truth...

f38 3 days ago

> Compilers already do things that “transpilers” are supposed to do. And they do it better because they are built on the foundation of language semantics instead of syntactic manipulation.

So you do know the difference.

SAI_Peregrinus 2 days ago

A compiler takes in one language and outputs some other language. E.g. C to LLVM IR or LLVM IR to x86_64 assembly.

An assembler is a type of compiler that takes in an assembly language and outputs machine code.

A transpiler is a type of compiler that takes in a language commonly used by humans to directly write programs and outputs another language commonly used by humans to directly write programs. E.g. c2rust is a C to unsafe Rust compiler, and since both are human-used languages it's a transpiler. Assembly language isn't commonly written by humans though it used to be, so arguably compilers to assembly language are no longer transpilers even though they used to be.

The existence of a transpiler implies a cispiler, a compiler that takes in code in one language and outputs code in that same language. Autoformatters are cispilers.

mbo 2 days ago

Partial evaluators would also be considered cispilers.

goranmoomin 3 days ago

IMO: Transpilers are compilers, but not all compilers are transpilers.

In my book, transpilers are compilers that consume a programming language and target human-readable code, to be consumed by another compiler or interpreter (either by itself, or to be integrated in other projects).

i.e. the TypeScript compiler is a transpiler from TS to JS, the Nim compiler is a transpiler from Nim to C, and so on.

I guess if you really want to be pedantic, one can argue (with the above definition) that `clang -S` might be seen as a transpiler from C to ASM, but at that point, do words mean anything to you?

ssrc 3 days ago

For me, the "human-readable" part is key. It's not just that the output is e.g. javascript, but that it is more or less human-readable with about the same organization as the original code.
If you implement SKI combinators, or three-address instructions, as functions in javascript, and that's the output of your compiler, I would not call that a transpiler.
- gampleman 3 days ago
  Exactly. For a web dev oriented example, I would call coffeescript a transpiler, since it would transform
  # some comment myFun = -> alert 'Hello CoffeeScript!'
  into
  // some comment var myFun; myFun = function() { return alert('Hello CoffeeScript!'); };
  clearly intending the output code to be quite readable (even preserving comments).
  Whereas Elm is a compiler since it transforms
  module Main exposing (main) import Html main = Html.text "Hello Elm!"
  into
  (function(scope){ 'use strict'; function F(arity, fun, wrapper) { wrapper.a = arity; wrapper.f = fun; return wrapper; } // about 4000 lines ommitted var $author$project$Main$main = $elm$html$Html$text('Hello Elm!'); _Platform_export({'Main':{'init':_VirtualDom_init($author$project$Main$main)(0)(0)}});}(this));
  Clearly not intended for (easy) human consumption.
- ethmarks 3 days ago
  
  Would it still count as a transpiler if it minifies the code at the end?
  For example, most SCSS workflows I've worked with converert SCSS source code into minified CSS, which is pretty difficult for a human to read. But I think that SCSS => CSS still counts as transpiling.
  - aleph_minus_one 3 days ago
    
    > Would it still count as a transpiler if it minifies the code at the end?
    I would say "yes, but the minimization is an additional step that is not actually a direct part of the transpiling process." :-)
    So, a program that does this would not a transpiler by itself, but a program that
    - executes a pipeline of which the transpiling is the most important step,
    - can also be used as a transpiler by making transpiling the only step in the executed pipeline.
    
    ethmarks 2 days ago
    
    What if the minification is inseparable from the transpiler? Like what if it converts the SCSS into some weird graph representation, applies the transpilation features (variables, mixins, etc) on that graph representation, then converts the graph representation into minified CSS? At no point in the process was it ever human-readable CSS. I don't know enough about the internals of transpilers to know if they actually do anything like this, but one could imagine a hypothetical program that does.
    And furthermore, what if you run Prettier on the minified output, turning it into readable CSS? The pipeline as a whole would input SCSS and output formatted CSS and therefore would be considered a transpiler, but the subprogram that does all of the SCSS heavy lifting would input SCSS and output minified SCSS, making it not a transpiler.
    P.S. I love your username
    
    aleph_minus_one 2 days ago
    
    What you describe is in my opinion a corner case. The following is just my personal opinion on this topic; it is very easy to argue for a different viewpoint:
    I personally think that the central point whether it is a transpiler or not is whether the generated output is in the "spirit" in which the output language was conceived to be written by a human programmer.
    So, if the outputted CSS code is in a rather similar "spirit" to how a human programmer would write it (though having possibly lots of traces of being auto-generated), it is a transpiler.
    For example, if a transpiler generates hundreds of rules for CSS classes, but humans would solve the problem very differently using CSS code, it is rather not a transpiler, but some program that uses CSS as an output format for the reason that this is the output format that has to be used for technical reasons.
    This of course encompasses the case of minified CSS code: hardly any programmer would write minified CSS code in a text editor.
    Similarly, I would argue that a "transpiler" that generates highly non-idiomatic C code (i.e. it is "insanely obvious" that the output is not C code in the sense how the C language is "intended" to be used) is not a transpiler, but rather a compiler that uses C as some kind of high-level assembler code for output.
    In this sense I would indeed say that some "transpiler" that generates highly non-idiomatic JavaScript code is in my opinion rather a compiler that uses JavaScript as an output format because it is necessary because this is necessary to run the code in the browser. I am of course aware that many programmers do have a different opinion here.
    So, I would say a strong rule of thumb to decide transpiler or not transpiler is: if there was a choice to use a different output language than the destination language - would the transpiler use the latter one instead? So, to answer your question
    > And furthermore, what if you run Prettier on the minified output, turning it into readable CSS? The pipeline as a whole would input SCSS and output formatted CSS and therefore would be considered a transpiler, but the subprogram that does all of the SCSS heavy lifting would input SCSS and output minified SCSS, making it not a transpiler.
    If the goal is clearly to generate idiomatic CSS code that can be well understood by a human programmer, by my stance it clearly is a transpiler. If you, on the hand, create such an example just to find a corner case for "transpiler or not transpiler", I would say it is not.
shiomiru 3 days ago

I can usually read JS generated by TS, but calling the C Nim outputs "human-readable" is very generous considering it flattens most structured control flow to goto. (It's hard to do it otherwise, Nim has to deal with exceptions and destructors but C has neither.)
Classifying Nim as a transpiler also results in weird cases like NLVM[1] which most would consider a compiler even though it is a back-end on the same "level" as Nim's C generator.
[1]: https://github.com/arnetheduck/nlvm
- monocasa 2 days ago
  
  I mean, original Dartmouth BASIC only had if and goto, and was definitely designed as a human readable language.
skopje 2 days ago

I can read 6502 machine code raw hex. Now what. ;-)
writebetterc 3 days ago

I'd probably say that "transpiler" is not a very useful word with that definition.
- cygx 3 days ago
  
  Why is it useless? 'Compiler' denotes the general category, within which exist various sub-categories:
  For example, a 'native compiler' outputs machine code for the host system, a 'cross compiler' outputs machine code for a different system, a 'bytecode compiler' outputs a custom binary format (e.g. VM instructions), and a 'transpiler' outputs source code. These distinctions are meaningful.
- goranmoomin 3 days ago
  
  I can’t see why — I do think that the word does convey some sort of useful meaning with the above definition.

tuveson 2 days ago

I like the cover of the book Crafting Interpreters: https://craftinginterpreters.com/image/header.png

It's basically a flowchart showing all of the different things that we mean when we say compiler/interpreter/transpiler, and which bits they have in common.

Funny, but it has two paths for transpiler - the kind that parses and outputs source from an AST, and the asm.js kind, that actually just uses a high-level language as an assembly-ish target.

hn92726819 3 days ago

> We can make it a bit more terse using list comprehensions:

    import functools as ft
    def fact(n):
        lst = range(1, n)
        return ft.reduce(lambda acc, x: acc*x, lst)

Amusing that there's not a list comprehension in sight.

torginus 3 days ago

I think the distinction is meaningful - for example many compilers used to have C backends (GCC for example did) - so you code went through almost the entire compiler pipeline - from frontend to IR to backend where the backend did almost everything a compiler does, it only skipped target machine specific stuff like register allocation (possibly even that was done), arch specific optimizations and assembly generation.

A transpiler to me focuses on having to change or understand the code as little as possible - perhaps it can operate on the syntax level without having to understand scopes, variable types, the workings of the language. It does AST->AST transforms (or something even less sophisticated, like string manipulation).

In my mind, you could have a C++ to C transpiler (which removes C++ constructs and turns them into C ones, although C++ is impossible to compile without a rich understanding of the code), and you could have a C++ to C compiler, which would be a fully featured compiler, architected in the way I described in the start of the post, and these would be two entirely different pieces of software.

So I'd say the term is meaningful, even if not strictly well defined.

kragen 3 days ago

The link to Lindsey Kuper's post https://decomposition.al/blog/2017/07/30/what-do-people-mean... is great!

I think the note about generators may be a good definition for when one language is "more powerful" than another; at least it's a good heuristic:

> The input and output languages have the syntax of JavaScript but the fact that compiling one feature [generators] requires a whole program transformation gives away the fact that these are not the same language. If we’re to get beyond the vagaries of syntax and actually talk about what the expressive power of languages is, we need to talk about semantics.

If a given program change is local in language X but global in language Y, that is a way in which language X has more expressive power.

This is kind of fuzzy because you can virtually always avoid this by implementing an interpreter, or its moral equivalent, for language X in language Y, and writing your system in that DSL (embedded or otherwise), rather than directly in language Y. Then, that anything that would be a local change in language X is still a local change. But this sort of requires knowing ahead of time that you're going to want to make that kind of change.

Sadly https://people.csail.mit.edu/files/pubs/stopify-pldi18.pdf is 403. But possibly https://people.csail.mit.edu/rachit/files/pubs/stopify-pldi1... is the right link.

JoBrad 3 days ago

Thanks for the last link! At first read, the regeneration code is nuts: using a switch to assign a value, then comparing hard coded values. I only used generator functions in TS after they were supported in JS, so I’m going to step through that, just to understand it more.
- kragen 3 days ago
  
  Yeah, I mean, you either kind of have to do something like protothreads, or break apart the function into fragments at the yield points, converting it to explicit continuation-passing style.

coldtea a day ago

I find it a useful world to point to the distinction of converting code between two equally high level programming languages, vs between a higher level language to a low level representation (assembly, C, java bytecode, llvm ir, etc) target.

"Compiler already covers that"? Yeah, and animal already covers cat, shall we drop the term cat too?

vzaliva 2 days ago

In the academic programming languages research community, the term "transpiler" is barely used.

For example, Google Scholar search for "transpiler" yields just 3200 results, compared to ~1.4M for "compiler".

Bengalilol 3 days ago

Meaningless word + list of "lies"... Nice read anyways.

"BabelJS is arguably one of the first “transpilers” that was developed so that people could experiment with JavaScript’s new language features that did not yet have browser implementations"

Just my two cents. Haxe was created long time ago, and BabelJS is arguably not one of the first "transpilers" people can play with.

[1] https://en.wikipedia.org/wiki/Haxe

[2] https://haxe.org

merlinthegreen 3 days ago

I don't really understand the reasoning in the article. Nobody argues that orange is a meaningless word just because it's not wrong to call an orange a fruit.

Sure, a transpiler is a specialized form of compiler. However that doesn't mean it's not much clearer to describe a transpiler using the more specific name. As such recommending someone replace "compiler" with "transpiler" (when appropriate) does not mean using compiler is wrong. It simply means that, outside of some very niche-interest poetry, using transpiler is better!

StopDisinfo910 3 days ago

I think it's pretty clear to anyone with experience in the field that the notion of compilers, interpreters, transpilers are porous and even more so when you had the concept of VM ("Let's interpret a compiled artefact") and JIT ("I put a compiler in your interpreter. Don't worry, it all runs on the same VM in the end.")

These things live on a continuum. Still, I think the different worlds are useful. They put forward different concepts and ideas. It helps framing things.

adsharma 3 days ago

We need a new word for a universal transpiler. Something that can transpile to 7 or more languages.

Poly-transpiler? It will also trigger more people.

jrm4 2 days ago

There are a whole lot of meaningless, or worse, misleading words in computing, and this isn't one of them. What it lacks in technical precision, it makes up for with that little bit of utility. It doesn't much confuse things.

Off the top, lets compare that to "serverless."

paulsutter 3 days ago

Language interoperability is a material question. Outputting Javascript, Python, C++ vs assembler/machine code have very different implications for calls to/from other languages

Is JIT also meaningless?

But ultimately if you don’t want to use a word, don’t use it. Not wanting to hear a word says more about the listener than the speaker

Findecanor 3 days ago

I am not fond of the word either, but only because the use has often been used as a diminutive.

When used, it has often been implied that a compiler that outputs to a human-readable programming language wouldn't be a "real compiler".

carsonj 2 days ago

I always understood transpilers to be defined by what they do, not how they work. Whether it's implemented as an AST transformation or something more complex would be irrelevant.

bitwize 2 days ago

It's like "sideload". It's a buzzword for something else, something more general, that applies in certain conditions.

swatson741 2 days ago

I think the term transpiler is ok. It’s not pedagogical or anything but most engineering jargon is like that, and this defiantly isn’t the worst one I’ve seen.

fulafel 3 days ago

It would be good if we had a term that didn't confuse linking with translation. In English compiling means joining together many parts, after all.

JonChesterfield 3 days ago

That one is historically interesting.
I suspect the first compilers were named that because they were making compilations of assembly routines, probably slightly modified/specialised to the rest of the routines.
Compilers still do that. Some of the input is your source, but there's also "the compiler runtime" which is essentially a lot of extra routines that get spliced in, and probably "the language runtime" which gets similar treatment.
So compilers are still joining together parts, we've just mostly forgotten what crt or udiv.s are.
Linking and loading are more dubious names, but also they refer to specialised compilers that don't need to exist and probably shouldn't any more, so that may resolve itself over time.
- Joker_vD 3 days ago
  
  The first compilers were called "translators". The first linker/loader (kinda, A-0 was a... strange tool, by modern standards) was actually called "compiler", precisely because of the generic meaning of the word "compile".

theanonymousone 3 days ago

Still far better than "Serverless".

msla 2 days ago

My punt: Compilers and transpilers are both translators, and the only meaningful difference is whether the output is meant to be easily edited by a human, which is a spectrum more than a hard dividing line. The p2c Pascal to C translator [1] is pretty clearly a transpiler in that the C it outputs is pretty readable, the Stalin Scheme to C translator [2] is more clearly a compiler in that its output, even though it's C, is not human-readable unless you're a very dedicated type of person.

[1] https://github.com/FranklinChen/p2c

[2] https://en.wikipedia.org/wiki/Stalin_%28Scheme_implementatio...

So, where does BabelJS sit? Somewhere in between, depending on what language features you used in the input code. Obviously generators require heavy transformations, but other features don't.

wseqyrku 2 days ago

Category theory is on fire in this thread

zk108 3 days ago

"Programming languages are not just syntax; they have semantics too. Pretending that you can get away with just manipulating the former is delusional and results in bad tools."

So eloquently put, what starts off as just simple syntactic conversion usually snowballs into semantics very quickly.

nurettin 3 days ago

Today's meaningless word: Cloud

Rochus 9 days ago

"Transpiler" is no less well-defined a term than "compiler".

nrinaudo 3 days ago

The definition of compiler i learned was “takes some code, translate it to semantically equivalent code in a different language (which might be machine language, bytecode…)”. This is also used in PLaI, a respected learning resource: https://www.plai.org/
I think this is a pretty acceptable definition, and yes, it does make the term transpiler a little useless.
- gmac 3 days ago
  
  What I would add to your definition, to make a distinction from the common usage of compilation, is that the target language is on an approximately equivalent level of abstraction to the source. So, for example, Rust -> machine code is not transpilation, but Rust -> C++ is.
  I think this is how the word is commonly understood, and it’s not useless (even if there’s no absolute standard of when it does or does not apply).
  Edit: sorry, realise I should have read the article before commenting. The article calls out my definition as one of their ‘lies’. I guess I just disagree with the article. Words can be useful even without a 100% watertight definition. They’re for communication as well as classification.
  - kryptiskt 3 days ago
    
    One of the problems is that you might not use the target language at the equivalent level of abstraction. For example, C is a popular target language, but the C emitted may be very unidiomatic and nothing like human consumable code, it's not all that unusual that a language compiles all code to one big C function where the function calls in the language are jumps, which is a way to get around the limitations of the C calling conventions and stack.
    The same thing applies to compilation to Javascript, the resulting code may use a tiny subset of the language.
    I don't like the word transpiler, because there is nothing useful about the distinction (unless you count people using it to denigrate compilers that doesn't target traditional machine code).
    I could see the case of using it as a name when the transformation is reversible, like you could probably turn Javascript back into Coffeescript.
  - writebetterc 3 days ago
    
    What value does the word have? When I'm writing a compiler, it doesn't matter whether I target C or asm, or Javascript, as my output language. I'll still write it the same way.
    
    gmac 3 days ago
    
    OK, but words are not only for compiler-writers. As someone who encounters your compiler, if it targets an output language at a similar level as the input language it will give me a headstart in understanding what it does if I see it referred to as a transpiler rather than simply a compiler.
    Overall, I find this discussion very odd. It seems like a kind of deletionism for the dictionary. I mean, what's the use of the word 'crimson'? Anything that's crimson is also just 'red'. Why keep 'large' when we have 'big'? You could delete a large percentage of English words by following this line of thinking.
    
    gmac 3 days ago
    
    It gives you a better idea what a thing does?
    
    writebetterc 3 days ago
    
    To me, it doesn't. If someone says "tsc is a transpiler", it gives me nothing actionable. If you do say "it transpiles to JS", then I've got something, but that could just be "compiles to JS". It doesn't really tell me how the thing is constructed either.

s20n 3 days ago

> Lie #3: Transpilers Target the Same Level of Abstraction

> This is pretty much the same as (2). The input and output languages have the syntax of JavaScript but the fact that compiling one feature requires a whole program transformation gives away the fact that these are not the same language

It is not really the same as (2), you can't cherry pick the example of Babel and generalise it to every transpiler ever. There are several transpilers which transpile from one high-level language to another high-level language such as kotlin to swift. i.e; targeting the same level of abstraction.

Wonder what this person would say about macro expansions in scheme, maybe that should also be considered a compiler as per their definition.

kragen 3 days ago

BabelJS is the central example of "transpilers"; if BabelJS lacks some purported defining attribute of "transpilers", that definition is unsalvageable, even if there are other programs commonly called "transpilers" that do have that attribute.