How to write an interpreter

How to write an interpreter

How can I write a quick and dirty interpreter? [duplicate]

I have an interview where one of the areas I was told I might brush up on is «dynamic programming languages». So I figured I might spend this weekend writing one to bring as sample code. 🙂

Of course, given the time constraints, I plan on writing something very basic and preferably using a language and/or toolset that will make it extremely easy to do. Most of my experience is in Python, but I’m willing to spend a little time learning something new if it will make the task easier (and won’t take too long). Does anyone have any advice for me in terms of tools or languages that will make this easier?

7 Answers 7

Trending sort

Trending sort is based off of the default sorting method — by highest score — but it boosts votes that have happened recently, helping to surface more up-to-date answers.

It falls back to sorting by highest score if no posts are trending.

Switch to Trending sort

Simple interpreter written in OCaml

My example interpreter described in full here.

Types of expressions and values

Lexing and parsing

Use Camlp4 for parsing:

Define the lexer:

Define the parser:

Evaluation

Unbox an int or bool:

Evaluate an expression to give a value in the context of some bound vars :

Worked example

Define an example program as a string:

Lex and parse the string into an AST:

Evaluate the AST:

How to write an interpreter. Смотреть фото How to write an interpreter. Смотреть картинку How to write an interpreter. Картинка про How to write an interpreter. Фото How to write an interpreter

You’ll probably want to check out Lex and Yacc for lexing and parsing, and their python implementations

I used spark to write a pretty full featured DSL to express complicated conditionals for an old project of mine in 2 or 3 days (including unit tests).

It should be quite trivial in Python since spark (and other such modules) will give you tools you need to write a lexical and syntax analyser. You can implement a simple symbol table easily using a python dictionary and can either translate it into Python and eval or move it to some lower level language to run.

If you are quite versed in Python (== dynamic) then I think you should do well in your interview unless they ask the difference between interpreted and dynamic languages.

I’d recommend using Haskell with parsing combinators. To bone up on parsing combinators, don’t use the Wikipedia article; it’s very theoretical and will likely confuse you. Instead use the paper by Graham Hutton, which is excellent.

Interpreters and compilers are the «killer app» for the ML/Haskell family of languages, and I think you’ll be amazed at how quickly you can build something interesting.

For getting started I recommend you read Phil Wadler’s paper The Essence of Functional Programming, which contains a number of sample interpreters organized using monads. I think the example interpreters are well organized and easy to follow, although the explanation of monads in that paper may make your head hurt.

There’s also a very nice blog entry that goes through a single example in much more detail; it describes a Lisp interpreter written in Haskell. That writeup also contains a few comparisons between Haskell and Java, which may give you a feel for why many compiler writers prefer a functional language over an OO language for writing compilers and interpreters.

lotabout/write-a-C-interpreter

Use Git or checkout with SVN using the web URL.

Work fast with our official CLI. Learn more.

Launching GitHub Desktop

If nothing happens, download GitHub Desktop and try again.

Launching GitHub Desktop

If nothing happens, download GitHub Desktop and try again.

Launching Xcode

If nothing happens, download Xcode and try again.

Launching Visual Studio Code

Your codespace will open once ready.

There was a problem preparing your codespace, please try again.

Latest commit

Git stats

Files

Failed to load latest commit information.

README.md

C interpreter that interprets itself.

How to Run the Code

File xc.c is the original one and xc-tutor.c is the one that I make for the tutorial step by step.

This project is inspired by c4 and is largely based on it.

However, I rewrote them all to make it more understandable and help myself to understand it.

Despite the complexity we saw in books about compiler design, writing one is not that hard. You don’t need that much theory though they will help for better understanding the logic behind the code.

There is also a chinese version in my blog.

The original code is licenced with GPL2, so this code will use the same licence.

About

Write a simple interpreter of C. Inspired by c4 and largely based on it.

Introduction

Fairy tales are more than true: not because they tell us that dragons exist, but because they tell us that dragons can be beaten.

G.K. Chesterton by way of Neil Gaiman, Coraline

I’m really excited we’re going on this journey together. This is a book on implementing interpreters for programming languages. It’s also a book on how to design a language worth implementing. It’s the book I wish I’d had when I first started getting into languages, and it’s the book I’ve been writing in my head for nearly a decade.

To my friends and family, sorry I’ve been so absentminded!

In these pages, we will walk step-by-step through two complete interpreters for a full-featured language. I assume this is your first foray into languages, so I’ll cover each concept and line of code you need to build a complete, usable, fast language implementation.

In order to cram two full implementations inside one book without it turning into a doorstop, this text is lighter on theory than others. As we build each piece of the system, I will introduce the history and concepts behind it. I’ll try to get you familiar with the lingo so that if you ever find yourself at a cocktail party full of PL (programming language) researchers, you’ll fit in.

Strangely enough, a situation I have found myself in multiple times. You wouldn’t believe how much some of them can drink.

But we’re mostly going to spend our brain juice getting the language up and running. This is not to say theory isn’t important. Being able to reason precisely and formally about syntax and semantics is a vital skill when working on a language. But, personally, I learn best by doing. It’s hard for me to wade through paragraphs full of abstract concepts and really absorb them. But if I’ve coded something, run it, and debugged it, then I get it.

Static type systems in particular require rigorous formal reasoning. Hacking on a type system has the same feel as proving a theorem in mathematics.

It turns out this is no coincidence. In the early half of last century, Haskell Curry and William Alvin Howard showed that they are two sides of the same coin: the Curry-Howard isomorphism.

That’s my goal for you. I want you to come away with a solid intuition of how a real language lives and breathes. My hope is that when you read other, more theoretical books later, the concepts there will firmly stick in your mind, adhered to this tangible substrate.

Every introduction to every compiler book seems to have this section. I don’t know what it is about programming languages that causes such existential doubt. I don’t think ornithology books worry about justifying their existence. They assume the reader loves birds and start teaching.

But programming languages are a little different. I suppose it is true that the odds of any of us creating a broadly successful, general-purpose programming language are slim. The designers of the world’s widely used languages could fit in a Volkswagen bus, even without putting the pop-top camper up. If joining that elite group was the only reason to learn languages, it would be hard to justify. Fortunately, it isn’t.

For every successful general-purpose language, there are a thousand successful niche ones. We used to call them “little languages”, but inflation in the jargon economy led to the name “domain-specific languages”. These are pidgins tailor-built to a specific task. Think application scripting languages, template engines, markup formats, and configuration files.

How to write an interpreter. Смотреть фото How to write an interpreter. Смотреть картинку How to write an interpreter. Картинка про How to write an interpreter. Фото How to write an interpreter

A random selection of some little languages you might run into.

Almost every large software project needs a handful of these. When you can, it’s good to reuse an existing one instead of rolling your own. Once you factor in documentation, debuggers, editor support, syntax highlighting, and all of the other trappings, doing it yourself becomes a tall order.

But there’s still a good chance you’ll find yourself needing to whip up a parser or other tool when there isn’t an existing library that fits your needs. Even when you are reusing some existing implementation, you’ll inevitably end up needing to debug and maintain it and poke around in its guts.

Long distance runners sometimes train with weights strapped to their ankles or at high altitudes where the atmosphere is thin. When they later unburden themselves, the new relative ease of light limbs and oxygen-rich air enables them to run farther and faster.

Implementing a language is a real test of programming skill. The code is complex and performance critical. You must master recursion, dynamic arrays, trees, graphs, and hash tables. You probably use hash tables at least in your day-to-day programming, but do you really understand them? Well, after we’ve crafted our own from scratch, I guarantee you will.

While I intend to show you that an interpreter isn’t as daunting as you might believe, implementing one well is still a challenge. Rise to it, and you’ll come away a stronger programmer, and smarter about how you use data structures and algorithms in your day job.

This last reason is hard for me to admit, because it’s so close to my heart. Ever since I learned to program as a kid, I felt there was something magical about languages. When I first tapped out BASIC programs one key at a time I couldn’t conceive how BASIC itself was made.

Later, the mixture of awe and terror on my college friends’ faces when talking about their compilers class was enough to convince me language hackers were a different breed of human — some sort of wizards granted privileged access to arcane arts.

And its practitioners don’t hesitate to play up this image. Two of the seminal texts on programming languages feature a dragon and a wizard on their covers.

When I did finally start cobbling together my own little interpreters, I quickly learned that, of course, there is no magic at all. It’s just code, and the people who hack on languages are just people.

There are a few techniques you don’t often encounter outside of languages, and some parts are a little difficult. But not more difficult than other obstacles you’ve overcome. My hope is that if you’ve felt intimidated by languages and this book helps you overcome that fear, maybe I’ll leave you just a tiny bit braver than you were before.

And, who knows, maybe you will make the next great language. Someone has to.

This book is broken into three parts. You’re reading the first one now. It’s a couple of chapters to get you oriented, teach you some of the lingo that language hackers use, and introduce you to Lox, the language we’ll be implementing.

Each of the other two parts builds one complete Lox interpreter. Within those parts, each chapter is structured the same way. The chapter takes a single language feature, teaches you the concepts behind it, and walks you through an implementation.

It took a good bit of trial and error on my part, but I managed to carve up the two interpreters into chapter-sized chunks that build on the previous chapters but require nothing from later ones. From the very first chapter, you’ll have a working program you can run and play with. With each passing chapter, it grows increasingly full-featured until you eventually have a complete language.

Aside from copious, scintillating English prose, chapters have a few other delightful facets:

We’re about crafting interpreters, so this book contains real code. Every single line of code needed is included, and each snippet tells you where to insert it in your ever-growing implementation.

Yacc is a tool that takes in a grammar file and produces a source file for a compiler, so it’s sort of like a “compiler” that outputs a compiler, which is where we get the term “compiler-compiler”.

Yacc wasn’t the first of its ilk, which is why it’s named “Yacc” — Yet Another Compiler-Compiler. A later similar tool is Bison, named as a pun on the pronunciation of Yacc like “yak”.

How to write an interpreter. Смотреть фото How to write an interpreter. Смотреть картинку How to write an interpreter. Картинка про How to write an interpreter. Фото How to write an interpreter

If you find all of these little self-references and puns charming and fun, you’ll fit right in here. If not, well, maybe the language nerd sense of humor is an acquired taste.

We will abstain from using them here. I want to ensure there are no dark corners where magic and confusion can hide, so we’ll write everything by hand. As you’ll see, it’s not as bad as it sounds, and it means you really will understand each line of code and how both interpreters work.

A book has different constraints from the “real world” and so the coding style here might not always reflect the best way to write maintainable production software. If I seem a little cavalier about, say, omitting private or declaring a global variable, understand I do so to keep the code easier on your eyes. The pages here aren’t as wide as your IDE and every character counts.

Also, the code doesn’t have many comments. That’s because each handful of lines is surrounded by several paragraphs of honest-to-God prose explaining it. When you write a book to accompany your program, you are welcome to omit comments too. Otherwise, you should probably use // a little more than I do.

While the book contains every line of code and teaches what each means, it does not describe the machinery needed to compile and run the interpreter. I assume you can slap together a makefile or a project in your IDE of choice in order to get the code to run. Those kinds of instructions get out of date quickly, and I want this book to age like XO brandy, not backyard hooch.

Since the book contains literally every line of code needed for the implementations, the snippets are quite precise. Also, because I try to keep the program in a runnable state even when major features are missing, sometimes we add temporary code that gets replaced in later snippets.

A snippet with all the bells and whistles looks like this:

In the center, you have the new code to add. It may have a few faded out lines above or below to show where it goes in the existing surrounding code. There is also a little blurb telling you in which file and where to place the snippet. If that blurb says “replace _ lines”, there is some existing code between the faded lines that you need to remove and replace with the new snippet.

Asides contain biographical sketches, historical background, references to related topics, and suggestions of other areas to explore. There’s nothing that you need to know in them to understand later parts of the book, so you can skip them if you want. I won’t judge you, but I might be a little sad.

Well, some asides do, at least. Most of them are just dumb jokes and amateurish drawings.

Each chapter ends with a few exercises. Unlike textbook problem sets, which tend to review material you already covered, these are to help you learn more than what’s in the chapter. They force you to step off the guided path and explore on your own. They will make you research other languages, figure out how to implement features, or otherwise get you out of your comfort zone.

Vanquish the challenges and you’ll come away with a broader understanding and possibly a few bumps and scrapes. Or skip them if you want to stay inside the comfy confines of the tour bus. It’s your book.

A word of warning: the challenges often ask you to make changes to the interpreter you’re building. You’ll want to implement those in a copy of your code. The later chapters assume your interpreter is in a pristine (“unchallenged”?) state.

I know a lot of language hackers whose careers are based on this. You slide a language spec under their door, wait a few months, and code and benchmark results come out.

Hopefully your new language doesn’t hardcode assumptions about the width of a punched card into its grammar.

All of that stuff profoundly affects the success of your new language. I want your language to succeed, so in some chapters I end with a “design note”, a little essay on some corner of the human aspect of programming languages. I’m no expert on this — I don’t know if anyone really is — so take these with a large pinch of salt. That should make them tastier food for thought, which is my main aim.

The book uses Java and C, but readers have ported the code to many other languages. If the languages I picked aren’t your bag, take a look at those.

Java is a great language for this. It’s high level enough that we don’t get overwhelmed by fiddly implementation details, but it’s still pretty explicit. Unlike in scripting languages, there tends to be less complex machinery hiding under the hood, and you’ve got static types to see what data structures you’re working with.

I also chose Java specifically because it is an object-oriented language. That paradigm swept the programming world in the ’90s and is now the dominant way of thinking for millions of programmers. Odds are good you’re already used to organizing code into classes and methods, so we’ll keep you in that comfort zone.

A compiler reads files in one language, translates them, and outputs files in another language. You can implement a compiler in any language, including the same language it compiles, a process called self-hosting.

You can’t compile your compiler using itself yet, but if you have another compiler for your language written in some other language, you use that one to compile your compiler once. Now you can use the compiled version of your own compiler to compile future versions of itself, and you can discard the original one compiled from the other compiler. This is called bootstrapping, from the image of pulling yourself up by your own bootstraps.

How to write an interpreter. Смотреть фото How to write an interpreter. Смотреть картинку How to write an interpreter. Картинка про How to write an interpreter. Фото How to write an interpreter

And, finally, Java is hugely popular. That means there’s a good chance you already know it, so there’s less for you to learn to get going in the book. If you aren’t that familiar with Java, don’t freak out. I try to stick to a fairly minimal subset of it. I use the diamond operator from Java 7 to make things a little more terse, but that’s about it as far as “advanced” features go. If you know another object-oriented language, like C# or C++, you can muddle through.

By the end of part II, we’ll have a simple, readable implementation. It’s not very fast, but it’s correct. However, we are only able to accomplish that by building on the Java virtual machine’s own runtime facilities. We want to learn how Java itself implements those things.

So in the next part, we start all over again, but this time in C. C is the perfect language for understanding how an implementation really works, all the way down to the bytes in memory and the code flowing through the CPU.

A big reason that we’re using C is so I can show you things C is particularly good at, but that does mean you’ll need to be pretty comfortable with it. You don’t have to be the reincarnation of Dennis Ritchie, but you shouldn’t be spooked by pointers either.

If you aren’t there yet, pick up an introductory book on C and chew through it, then come back here when you’re done. In return, you’ll come away from this book an even stronger C programmer. That’s useful given how many language implementations are written in C: Lua, CPython, and Ruby’s MRI, to name a few.

I pronounce the name like “sea-locks”, but you can say it “clocks” or even “cloch”, where you pronounce the “x” like the Greeks do if it makes you happy.

Our Java implementation was focused on being correct. Now that we have that down, we’ll turn to also being fast. Our C interpreter will contain a compiler that translates Lox to an efficient bytecode representation (don’t worry, I’ll get into what that means soon), which it then executes. This is the same technique used by implementations of Lua, Python, Ruby, PHP, and many other successful languages.

Did you think this was just an interpreter book? It’s a compiler book as well. Two for the price of one!

We’ll even try our hand at benchmarking and optimization. By the end, we’ll have a robust, accurate, fast interpreter for our language, able to keep up with other professional caliber implementations out there. Not bad for one book and a few thousand lines of code.

Challenges

There are at least six domain-specific languages used in the little system I cobbled together to write and publish this book. What are they?

Get a “Hello, world!” program written and running in Java. Set up whatever makefiles or IDE projects you need to get it working. If you have a debugger, get comfortable with it and step through your program as it runs.

Do the same thing for C. To get some practice with pointers, define a doubly linked list of heap-allocated strings. Write functions to insert, find, and delete items from it. Test them.

Design Note: What’s in a Name?

One of the hardest challenges in writing this book was coming up with a name for the language it implements. I went through pages of candidates before I found one that worked. As you’ll discover on the first day you start building your own language, naming is deviously hard. A good name satisfies a few criteria:

It isn’t in use. You can run into all sorts of trouble, legal and social, if you inadvertently step on someone else’s name.

It’s easy to pronounce. If things go well, hordes of people will be saying and writing your language’s name. Anything longer than a couple of syllables or a handful of letters will annoy them to no end.

It’s distinct enough to search for. People will Google your language’s name to learn about it, so you want a word that’s rare enough that most results point to your docs. Though, with the amount of AI search engines are packing today, that’s less of an issue. Still, you won’t be doing your users any favors if you name your language “for”.

It doesn’t have negative connotations across a number of cultures. This is hard to be on guard for, but it’s worth considering. The designer of Nimrod ended up renaming his language to “Nim” because too many people remember that Bugs Bunny used “Nimrod” as an insult. (Bugs was using it ironically.)

If your potential name makes it through that gauntlet, keep it. Don’t get hung up on trying to find an appellation that captures the quintessence of your language. If the names of the world’s other successful languages teach us anything, it’s that the name doesn’t matter much. All you need is a reasonably unique token.

How would I go about writing an interpreter in C? [closed]

Want to improve this question? Update the question so it focuses on one problem only by editing this post.

I’d love some references, or tips, possibly an e-book or two. I’m not looking to write a compiler, just looking for a tutorial I could follow along and modify as I go. Thank you for being understanding!

BTW: It must be C.

Any more replies would be appreciated.

3 Answers 3

Trending sort

Trending sort is based off of the default sorting method — by highest score — but it boosts votes that have happened recently, helping to surface more up-to-date answers.

It falls back to sorting by highest score if no posts are trending.

Switch to Trending sort

A great way to get started writing an interpreter is to write a simple machine simulator. Here’s a simple language you can write an interpreter for:

The language has a stack and 6 instructions:

push # push a number on to the stack

pop # pop off the first number on the stack

add # pop off the top 2 items on the stack and push their sum on to the stack. (remember you can add negative numbers, so you have subtraction covered too). You can also get multiplication my creating a loop using some of the other instructions with this one.

print # print the value at the top of the stack

dup # push a copy of what’s at the top of the stack back onto the stack.

Once you’ve written a program that can take these instructions and execute them, you’ve essentially created a very simple stack based virtual machine. Since this is a very low level language, you won’t need to understand what an AST is, how to parse a grammar into an AST, and translate it to machine code, etc. That’s too complicated for a tutorial project. Start with this, and once you’ve created this little VM, you can start thinking about how you can translate some common constructs into this machine. e.g. you might want to think about how you might translate a C if/else statement or while loop into this language.

From the comments below, it sounds like you need a bit more experience with C before you can tackle this task.

What I would suggest is to first learn about the following topics:

In short, C has a much higher learning curve compared to python, just because it’s a lower level language and you have to manage your own memory, so it’s good to learn a few basic things about C first before trying to write an interpreter, even if you already know how to write one in python.

The only difference between an interpreter and a compiler is that instead of generating code from the AST, you execute it in a VM instead. Once you understand this, almost any compiler book, even the Red Dragon Book (first edition, not second!), is enough.

I see this is a bit of a late reply, however since this thread showed up at second place in the result list when I did a search for writing an interpreter and no one have mentioned anything very concrete I will provide the following example:

Disclaimer: This is just some simple code I wrote in a hurry in order to have a foundation for the explanation below and are therefore not perfect, but it compiles and runs, and seems to give the expected answers.

Read the following C-code from bottom to top:

This is an simple recursive descend parser for basic mathematical expressions supporting one letter variables. Running it and typing some statements yields the following results:

You can alter the get(), peek() and number() to read from a file or list of code lines. Also you should make a function to read identifiers (basically words). Then you expand the statement() function to be able to alter which line it runs next in order to do branching. Last you add the branch operations you want to the statement function, like

Obviously you can decide the syntax to be as you like. You need to think about ways of define functions and how to handle arguments/parameter variables, local variables and global variables. If preferable arrays and data structures. References∕pointers. Input/output? In order to handle recursive function calls you probably need to use a stack.

In my opinion this would be easier to do all this with C++ and STL. Where for example one std::map could be used to hold local variables, and another map could be used for globals.

If someone wants to learn to write an interpreter, they should try making the most basic simple and practical working interpreter.

How to write an interpreter?

I have decided to write a small interpreter as my next project, in Ruby. What knowledge/skills will I need to have to be successful?
I haven’t decided on the language to interpret yet, but I am looking for something that is not a toy language, but would be relatively easy to write an interpreter for. Thanks in advance.

How to write an interpreter. Смотреть фото How to write an interpreter. Смотреть картинку How to write an interpreter. Картинка про How to write an interpreter. Фото How to write an interpreter

10 Answers 10

Trending sort

Trending sort is based off of the default sorting method — by highest score — but it boosts votes that have happened recently, helping to surface more up-to-date answers.

It falls back to sorting by highest score if no posts are trending.

Switch to Trending sort

You will have to learn at least:

An excellent introduction to some of these topics can be found in the introductory text Structure and Interpretation of Computer Programs. The language used in that book is Scheme, which is a robust, well-specified language that is ideally suited for your first interpreter implementation. Highly recommended.

I haven’t decided on the language to interpret yet, but I am looking for something that is not a toy language, but would be relatively easy to write an interpreter for. Thanks in advance.

Try some dialect of Lisp like Scheme or Clojure. (Now there’s an idea: Clojure-in-Ruby, which integrates with Ruby as well as Clojure does with Java.)

With Lisp, there is no need to bother with idiosyncracies of syntax, as Lisp’s syntax is much closer to the abstract syntax tree.

How to write an interpreter. Смотреть фото How to write an interpreter. Смотреть картинку How to write an interpreter. Картинка про How to write an interpreter. Фото How to write an interpreter

This SICP chapter shows how to write a Lisp interpreter in Lisp (a metacircular evaluator). In my opinion this is the best place to start. Then you can move on to Lisp in Small Pieces to learn how to write advanced interpreters and compilers for Lisp. The advantage of implementing a language like Lisp (in Lisp itself!) is that you get the lexical analyzer, parser, AST, data/program representation and REPL for free. You can concentrate on the task of getting your great language working!

There is Tree top project wich can be helpful for you http://treetop.rubyforge.org/

You can checkout Ruby Draft Specification http://ruby-std.netlab.jp/

I had a similar idea a couple of days ago. LISP is by far the easiest to implement because the syntax is so simple, and the data structures that the language manipulates are the same structures that the code is written in. Hence you need only a minimal implementation, and can define the rest in terms of itself.

However, if you are trying to learn about parsing, you may want to do a more complex language with Abstract Syntax Trees, etc.

If you want to check out my (literally two days old) Java implementation of lisp, check out mylisp.googlecode.com. I’m still working on it but it is incredible how short a time it took to get the existing stuff working.

How to write an interpreter. Смотреть фото How to write an interpreter. Смотреть картинку How to write an interpreter. Картинка про How to write an interpreter. Фото How to write an interpreter

It’s not sooo hard. here’s a LISP interpreter in ruby and the source is so small you are supposed to copy/paste it. but are you gonna learn LISP now? hehe.

If you’re just doing this for fun, make up your own, simple language and just try it. My recommendation would be something like a really simple classic BASIC (no visual basic or object oriented stuff). With line numbers, GOTO, INPUT and PRINT and that’s it. You get to do the basics, and you get a better understanding of how things work.

The knowledge you’ll need?

And for that last one you’ll also need a way to keep around variables. Usually you’d just implement a «stack», one huge block of data where you can mark off an area at the end.

Источники информации:

Добавить комментарий

Ваш адрес email не будет опубликован. Обязательные поля помечены *