Commit fe195b4d authored by Anya Helene Bagge 🦆

set up asm

parent 9ab7f56f
# [ANTLR4](https://www.antlr.org/) Example Project
# [ASM](https://asm.ow2.io/) Example Project
### Links
* [ANTLR4 Documentation](https://github.com/antlr/antlr4/blob/master/doc/index.md)
* [ANTLR4 Getting Started](https://github.com/antlr/antlr4/blob/master/doc/getting-started.md)
* [Java parsing with ANTLR](https://www.baeldung.com/java-antlr) tutorial from [Baeldung](https://www.baeldung.com/)
# Examples
## Hello
The grammar `src/main/antlr4/inf225/grammars/Hello.g4` defines a tiny *Hello, World!* language:
```antlr4
// Define a grammar called Hello
grammar Hello;
hello : 'hello' ID '!'; // non-terminal hello: match keyword hello followed by an identifier and '!'
ID : [a-z]+ ; // terminal ID: match lower-case identifiers
WS : [ \t\r\n]+ -> skip ; // terminal WS: skip spaces, tabs, newlines
```
A *Hello* ‘program’ is defined by the `hello` production. It consists of the string `hello`, followed by an identifier and a `!`. E.g., valid (and boring) hello texts would be `"hello world!"`, `"hello\nworld!"` or `" hello you !"`. Invalid examples would be `"hello world"` (missing `!`), `"Hello World!"` (uppercase `Hello`) or `"hello!"` (missing identifier). Note that ANTLR4 by default makes literal terminals (e.g., `'hello'` above) *reserved keywords*, so `"hello hello!"` is not valid (`hello` will never match `ID`).
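The valid and invalid examples above can be made concrete with a hand-written recognizer for the same language in plain Java. This is only a sketch of what the generated lexer and parser check. `HelloRecognizer` is a hypothetical class, not ANTLR output. It is also slightly stricter than the generated parser: it requires the input to end right after the `!`. By contrast, `parser.hello()` generated from this grammar does not check for trailing input, because the rule does not end with `EOF`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the Hello language (not ANTLR-generated code).
final class HelloRecognizer {

    static boolean isLower(char c) {
        return c >= 'a' && c <= 'z';
    }

    // Splits the input into token types; returns null on a character no lexer rule matches.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                i++; // WS : [ \t\r\n]+ -> skip
            } else if (isLower(c)) {
                int start = i;
                while (i < input.length() && isLower(input.charAt(i))) {
                    i++;
                }
                // on an equal-length match the literal 'hello' wins over ID, so "hello" is reserved
                tokens.add(input.substring(start, i).equals("hello") ? "'hello'" : "ID");
            } else if (c == '!') {
                tokens.add("'!'");
                i++;
            } else {
                return null; // e.g. 'H': ANTLR would report a token recognition error, the sketch gives up
            }
        }
        tokens.add("EOF");
        return tokens;
    }

    // True if the input is exactly: 'hello' ID '!' followed by end of input.
    static boolean accepts(String input) {
        List<String> tokens = tokenize(input);
        return tokens != null && tokens.equals(List.of("'hello'", "ID", "'!'", "EOF"));
    }

    public static void main(String[] args) {
        String[] examples = {"hello world!", "hello\nworld!", " hello you !",
                "hello world", "Hello World!", "hello!", "hello hello!"};
        for (String s : examples) {
            System.out.printf("%-16s %s%n", s.replace("\n", "\\n"), accepts(s) ? "valid" : "invalid");
        }
    }
}
```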
When the project is built (during the `generate-sources` phase), ANTLR4 will generate several Java classes implementing a parser for the *Hello* language:
* `HelloLexer.java` – the *lexer*, which splits an input string/stream into a stream of *tokens* or words (`'hello'`, `'!'`, `ID` or `WS` for this grammar; whitespace (`WS`) is skipped)
* `HelloParser.java` – the *parser*, which recognizes the sentence structure of the input text, and (optionally) builds a *parse tree*
* `HelloListener.java` – a *listener* interface, used together with a parse tree walker to perform an action for each node in the parse tree
* `HelloBaseListener.java` – a *listener* class, with default do-nothing methods for each type of parse tree node
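The listener classes follow a common pattern. An interface declares one callback per kind of node, and a base class implements every callback as a no-op, so you only override what you need. Below is a minimal, self-contained sketch of that pattern together with a depth-first walker. The `Mini*` names are hypothetical stand-ins for `HelloListener`, `HelloBaseListener` and ANTLR's `ParseTreeWalker`; the real ones have more callbacks, such as an `enter`/`exit` pair for each non-terminal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mini parse tree: a rule node has a name and children, a terminal has text.
final class MiniNode {
    final String rule; // null for terminal nodes
    final String text; // null for rule nodes
    final List<MiniNode> children = new ArrayList<>();

    private MiniNode(String rule, String text) {
        this.rule = rule;
        this.text = text;
    }

    static MiniNode rule(String name, MiniNode... kids) {
        MiniNode node = new MiniNode(name, null);
        node.children.addAll(List.of(kids));
        return node;
    }

    static MiniNode terminal(String text) {
        return new MiniNode(null, text);
    }
}

// Plays the role of HelloListener: one callback per kind of event.
interface MiniListener {
    void enterRule(MiniNode node);
    void exitRule(MiniNode node);
    void visitTerminal(MiniNode node);
}

// Plays the role of HelloBaseListener: do-nothing defaults, subclasses override only what they need.
class MiniBaseListener implements MiniListener {
    @Override public void enterRule(MiniNode node) { }
    @Override public void exitRule(MiniNode node) { }
    @Override public void visitTerminal(MiniNode node) { }
}

final class MiniWalker {
    // Depth-first: enter a rule node, walk its children left to right, then exit it.
    static void walk(MiniListener listener, MiniNode node) {
        if (node.rule == null) {
            listener.visitTerminal(node);
            return;
        }
        listener.enterRule(node);
        for (MiniNode child : node.children) {
            walk(listener, child);
        }
        listener.exitRule(node);
    }

    public static void main(String[] args) {
        // the tree for "hello world!" under the rule hello : 'hello' ID '!'
        MiniNode tree = MiniNode.rule("hello",
                MiniNode.terminal("hello"), MiniNode.terminal("world"), MiniNode.terminal("!"));
        walk(new MiniBaseListener() {
            @Override
            public void visitTerminal(MiniNode node) {
                System.out.println("'" + node.text + "'");
            }
        }, tree);
    }
}
```

The anonymous subclass in `main` overrides only `visitTerminal`. The `ParseTreeWalker` examples in this README use the same pattern.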
We can use the parser like this (see example in `src/main/java/inf225/examples/HelloExample.java`):
* First, set up the input and the lexer; this will give us a stream of tokens (words):
```java
String input = "hello world!";
// a lexer that splits the input string into tokens
HelloLexer lexer = new HelloLexer(CharStreams.fromString(input));
// a stream of tokens to feed to the parser
CommonTokenStream tokens = new CommonTokenStream(lexer);
```
* Next, make a `HelloParser` that reads the tokens:
```java
HelloParser parser = new HelloParser(tokens);
```
* Finally, we can get the *context* for the non-terminal we're interested in – the parser will then try to match the input to the production rule for the non-terminal (`hello: 'hello' ID '!'` in our case – i.e., we expect to find a `hello` token, an identifier and an exclamation mark):
```java
// the method name here matches the name of the non-terminal in the grammar (hello)
HelloContext tree = parser.hello();
```
### Tokens
You can easily examine the token stream by asking for a list of tokens. First, make sure that all the input has been processed by calling `fill()`. The parser normally reads tokens one by one (possibly looking ahead a few tokens), so the lexer produces tokens on demand – `fill()` makes it finish the job.
```java
// process all the input
tokens.fill();
// look at each token
for (Token t : tokens.getTokens()) {
    System.out.printf("%-10s (%s)%n", t.getText(), HelloLexer.VOCABULARY.getDisplayName(t.getType()));
}
```
Given the input `hello world!`, the output should look like this:
```
hello      ('hello')
world      (ID)
!          ('!')
<EOF>      (EOF)
```
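The on-demand behaviour described above can be modelled with a small buffered stream that pulls tokens from the lexer only when asked. This is a simplified sketch, not ANTLR's `CommonTokenStream`, which also handles token channels and `EOF`. `LazyTokenStream` is hypothetical, and the "lexer" is just an `Iterator<String>`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of a buffered token stream that runs the lexer lazily.
final class LazyTokenStream {
    private final Iterator<String> lexer;
    private final List<String> buffer = new ArrayList<>();

    LazyTokenStream(Iterator<String> lexer) {
        this.lexer = lexer;
    }

    // Returns token i, running the lexer just far enough to produce it (null past the end).
    String get(int i) {
        while (buffer.size() <= i && lexer.hasNext()) {
            buffer.add(lexer.next());
        }
        return i < buffer.size() ? buffer.get(i) : null;
    }

    // Runs the lexer to the end of the input, in the spirit of CommonTokenStream.fill().
    void fill() {
        while (lexer.hasNext()) {
            buffer.add(lexer.next());
        }
    }

    // Only the tokens produced so far.
    List<String> getTokens() {
        return List.copyOf(buffer);
    }

    public static void main(String[] args) {
        LazyTokenStream tokens = new LazyTokenStream(List.of("hello", "world", "!", "<EOF>").iterator());
        System.out.println(tokens.get(0) + " " + tokens.getTokens()); // prints "hello [hello]"
        tokens.fill();
        System.out.println(tokens.getTokens()); // prints "[hello, world, !, <EOF>]"
    }
}
```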
Each `Token` contains information about its type, its source (e.g., filename), its start/end offsets, and the line number and column of its first character:
```java
for (Token t : tokens.getTokens()) {
    System.out.printf("%-10s #%d, offset=%2d–%2d, line=%d, column=%2d, source=%s%n", t.getText(), t.getTokenIndex(),
            t.getStartIndex(), t.getStopIndex(), t.getLine(), t.getCharPositionInLine(), t.getTokenSource().getSourceName());
}
```
E.g.:
```
hello      #0, offset= 0– 4, line=1, column= 0, source=<unknown>
world      #1, offset= 6–10, line=1, column= 6, source=<unknown>
!          #2, offset=11–11, line=1, column=11, source=<unknown>
<EOF>      #3, offset=12–11, line=1, column=12, source=<unknown>
```
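The offsets, line and column values above can be reproduced with a few counters. The sketch below is a hypothetical class `TokenPositions`. Like the *Hello* grammar, it lexes lower-case words; as a simplification, any other non-whitespace character becomes a one-character token. It keeps inclusive start/stop offsets, a 1-based line and a 0-based column. It also gives the `EOF` token a stop offset one less than its start, matching the table.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the position bookkeeping a lexer does (not ANTLR's implementation).
final class TokenPositions {

    static final class Tok {
        final String text;
        final int start, stop;  // inclusive character offsets
        final int line, column; // 1-based line, 0-based column of the first character

        Tok(String text, int start, int stop, int line, int column) {
            this.text = text;
            this.start = start;
            this.stop = stop;
            this.line = line;
            this.column = column;
        }
    }

    static boolean isWordChar(char c) {
        return c >= 'a' && c <= 'z';
    }

    static List<Tok> lex(String input) {
        List<Tok> tokens = new ArrayList<>();
        int line = 1;
        int lineStart = 0; // offset of the first character on the current line
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '\n') { // newline: skipped, but starts a new line
                i++;
                line++;
                lineStart = i;
            } else if (c == ' ' || c == '\t' || c == '\r') {
                i++; // other whitespace: skipped
            } else {
                int start = i;
                if (isWordChar(c)) {
                    while (i < input.length() && isWordChar(input.charAt(i))) {
                        i++;
                    }
                } else {
                    i++; // simplification: any other character is a one-character token
                }
                tokens.add(new Tok(input.substring(start, i), start, i - 1, line, start - lineStart));
            }
        }
        // EOF starts just past the last character and covers nothing, hence stop = start - 1
        tokens.add(new Tok("<EOF>", i, i - 1, line, i - lineStart));
        return tokens;
    }

    public static void main(String[] args) {
        for (Tok t : lex("hello world!")) {
            System.out.printf("%-10s offset=%2d-%2d, line=%d, column=%2d%n",
                    t.text, t.start, t.stop, t.line, t.column);
        }
    }
}
```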
The token stream itself gives you enough information to do very simple syntax highlighting; e.g., adding colours for keywords, string literals and so on.
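For instance, a minimal highlighter needs only each token's type and its start/stop offsets. The skipped whitespace between tokens is copied from the original input. In this sketch, `Highlighter` and the CSS class names are made up for illustration. Token types are given as the display names printed earlier (`'hello'`, `ID`, `'!'`, `EOF`).

```java
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: token-based syntax highlighting to HTML (class names are made up).
final class Highlighter {

    static final class Tok {
        final String type;     // display name, e.g. "'hello'", "ID", "EOF"
        final int start, stop; // inclusive character offsets, as in ANTLR tokens

        Tok(String type, int start, int stop) {
            this.type = type;
            this.start = start;
            this.stop = stop;
        }
    }

    // Wraps each token in a <span>; text between tokens (skipped whitespace) is copied unchanged.
    static String toHtml(String input, List<Tok> tokens) {
        StringBuilder html = new StringBuilder();
        int pos = 0;
        for (Tok t : tokens) {
            if (t.start > t.stop) {
                continue; // EOF covers no characters
            }
            html.append(escape(input.substring(pos, t.start)));
            // literal tokens ('hello', '!') share one style; other types get a class named after them
            String cssClass = t.type.startsWith("'") ? "literal" : t.type.toLowerCase(Locale.ROOT);
            html.append("<span class=\"").append(cssClass).append("\">")
                .append(escape(input.substring(t.start, t.stop + 1)))
                .append("</span>");
            pos = t.stop + 1;
        }
        html.append(escape(input.substring(pos)));
        return html.toString();
    }

    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        List<Tok> tokens = List.of(new Tok("'hello'", 0, 4), new Tok("ID", 6, 10),
                new Tok("'!'", 11, 11), new Tok("EOF", 12, 11));
        System.out.println(toHtml("hello world!", tokens));
    }
}
```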
### Parse trees
To see the parse result, we can use a `ParseTreeWalker` to visit all the nodes in the parse tree, giving it a listener that will be called for each node:
```java
ParseTreeWalker walker = new ParseTreeWalker();
walker.walk(new HelloBaseListener() {
    @Override
    public void visitTerminal(TerminalNode node) {
        System.out.println("'" + node + "'");
    }
    // you can also override visitErrorNode, enterEveryRule/exitEveryRule,
    // or the enter/exit methods for a specific non-terminal (e.g., enterHello)
}, tree);
```
The output should look like this (for input `hello world!`):
```
'hello'
'world'
'!'
```
A more interesting walker would pick out who we're saying hello to:
```java
new ParseTreeWalker().walk(new HelloBaseListener() {
    @Override
    public void enterHello(HelloContext ctx) {
        System.out.print("Saying hello to '" + ctx.getChild(1) + "'!");
    }
}, tree);
```
This gives the output `Saying hello to 'world'!`.
## Expressions
For a more interesting example, have a look at `Expr.g4` and `ExprExample.java`, which define a very simple language for prefix expressions with a single operator (`+`). The tree walker evaluates the expressions using a stack: literal numbers are pushed onto the stack, and the plus operator pops two numbers, adds them and pushes the result. Try improving it by adding more operators!
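The same stack idea can be shown without ANTLR. The sketch below evaluates prefix expressions directly from a list of tokens. The recursion plays the role of the tree walker: the operands are handled first, and on "exit" a number is pushed while an operator pops two values and pushes the result. `PrefixEval` is hypothetical and is not the code in `ExprExample.java`. `-` and `*` are added as examples of extra operators.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical sketch of stack-based evaluation of prefix expressions such as "+ 1 * 2 3".
final class PrefixEval {

    static int eval(String expr) {
        Deque<String> tokens = new ArrayDeque<>(Arrays.asList(expr.trim().split("\\s+")));
        Deque<Integer> stack = new ArrayDeque<>();
        walk(tokens, stack);
        if (!tokens.isEmpty()) {
            throw new IllegalArgumentException("unexpected trailing input: " + tokens);
        }
        return stack.pop();
    }

    // Reads one expression, mimicking a tree walker's exit events: operands first, then the operator.
    private static void walk(Deque<String> tokens, Deque<Integer> stack) {
        String tok = tokens.pop(); // throws NoSuchElementException if an operand is missing
        switch (tok) {
            case "+":
            case "-":
            case "*":
                walk(tokens, stack); // left operand
                walk(tokens, stack); // right operand
                int right = stack.pop(); // pushed last, so popped first
                int left = stack.pop();
                stack.push(tok.equals("+") ? left + right : tok.equals("-") ? left - right : left * right);
                break;
            default:
                stack.push(Integer.parseInt(tok)); // a literal number
        }
    }

    public static void main(String[] args) {
        System.out.println(eval("+ 1 * 2 3")); // prints 7
    }
}
```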
# Maven Setup
This project comes with a working Maven `pom.xml` file. You should be able to import it into Eclipse using *File → Import → Maven → Existing Maven Projects* (or *Check out Maven Projects from SCM* to do Git cloning as well). You can also build the project from the command line with `mvn package`.
Pay attention to these folders:
* `src/main/java` – Java source files go here (as usual for Maven)
* `src/main/antlr4` – ANTLR4 grammar files (`*.g4`) go here; use sub-folders to place the generated parser in a specific Java package
* `src/test/java` – JUnit tests
* `target/generated-sources/antlr4` – ANTLR4 will place Java source code here (this happens automatically during compilation or if you run `mvn generate-sources`)
* `target/classes` – compiled Java class files
* `target/*.jar` – your compiled project, packaged in a JAR file
#### POM snippets
If you're setting up / adding ANTLR4 to your own project, you can cut and paste these lines into your `pom.xml` file.
* You should make sure that both the parser generator and the runtime use the same version, so define the version number in `<properties>…</properties>`:
```xml
<antlr4.version>4.8-1</antlr4.version>
```
* The ANTLR4 runtime is needed to run the compiled parser; add it in the `<dependencies>…</dependencies>` section:
```xml
<!-- https://mvnrepository.com/artifact/org.antlr/antlr4-runtime -->
<dependency>
    <groupId>org.antlr</groupId>
    <artifactId>antlr4-runtime</artifactId>
    <version>${antlr4.version}</version>
</dependency>
```
* The ANTLR4 Maven plugin includes the ANTLR4 tool, and is needed to generate the parser during compilation; add it to `<build><plugins>…</plugins></build>`:
```xml
<plugin>
    <groupId>org.antlr</groupId>
    <artifactId>antlr4-maven-plugin</artifactId>
    <version>${antlr4.version}</version>
    <executions>
        <execution>
            <goals>
                <goal>antlr4</goal>
            </goals>
        </execution>
    </executions>
</plugin>
```
@@ -6,12 +6,12 @@
<modelVersion>4.0.0</modelVersion>
<groupId>org.nuthatchery</groupId>
<artifactId>antlr-example</artifactId>
<artifactId>jvm-asm-example</artifactId>
<version>1.0-SNAPSHOT</version>
<name>INF225 ANTLR Example</name>
<description>A simple ANTLR v4 example.</description>
<url>https://retting.ii.uib.no/inf225.h20/antlr-example</url>
<url>https://retting.ii.uib.no/inf225.h20/jvm-asm-example</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
@@ -27,12 +27,43 @@
<artifactId>antlr4-runtime</artifactId>
<version>${antlr4.version}</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.commons/commons-text -->
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-text</artifactId>
<version>1.9</version>
</dependency>
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter</artifactId>
<version>5.5.2</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.ow2.asm</groupId>
<artifactId>asm</artifactId>
<version>9.0</version>
</dependency>
<dependency>
<groupId>org.ow2.asm</groupId>
<artifactId>asm-commons</artifactId>
<version>9.0</version>
</dependency>
<dependency>
<groupId>org.ow2.asm</groupId>
<artifactId>asm-analysis</artifactId>
<version>9.0</version>
</dependency>
<dependency>
<groupId>org.ow2.asm</groupId>
<artifactId>asm-tree</artifactId>
<version>9.0</version>
</dependency>
<dependency>
<groupId>org.ow2.asm</groupId>
<artifactId>asm-util</artifactId>
<version>9.0</version>
</dependency>
</dependencies>
<build>
@@ -41,6 +72,9 @@
<groupId>org.antlr</groupId>
<artifactId>antlr4-maven-plugin</artifactId>
<version>${antlr4.version}</version>
<configuration>
<visitor>true</visitor>
</configuration>
<executions>
<execution>
<goals>
......
package inf225.examples;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PrintWriter;
import java.io.Serializable;
import java.lang.invoke.SerializedLambda;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.function.Function;
import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.ClassWriter;
import org.objectweb.asm.FieldVisitor;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;
import org.objectweb.asm.Type;
import org.objectweb.asm.commons.InstructionAdapter;
import org.objectweb.asm.commons.LocalVariablesSorter;
import org.objectweb.asm.util.Textifier;
import org.objectweb.asm.util.TraceClassVisitor;
import org.objectweb.asm.util.TraceMethodVisitor;
import inf225.examples.asm.AnalyzerVisitor;
import inf225.examples.asm.SimpleInstructionVisitor;
import inf225.examples.asm.SimpleMethodVisitor;
import inf225.examples.asm.ToExpr;
import org.objectweb.asm.commons.AnalyzerAdapter;
public class CodeGenerator {
    public static void main(String[] args) throws Exception {
        makeClass();
    }
    public static void makeClass() {
        // make class writer, set it to compute stack map frames and maximum stack size /
        // number of local variables automatically (COMPUTE_FRAMES implies COMPUTE_MAXS)
        ClassWriter writer = new ClassWriter(ClassWriter.COMPUTE_FRAMES);
        // the things we 'visit' are added to the class – first we visit the class header:
        writer.visit(Opcodes.V13, Opcodes.ACC_PUBLIC | Opcodes.ACC_SUPER, "Foo", null, "java/lang/Object", null);
        // add a static field with a constant value
        FieldVisitor fieldVisitor = writer.visitField(Opcodes.ACC_PRIVATE | Opcodes.ACC_STATIC, "PI", "D", null, Double.valueOf(Math.PI));
        fieldVisitor.visitEnd();
        // add an instance field
        fieldVisitor = writer.visitField(Opcodes.ACC_PRIVATE, "x", "I", null, null);
        fieldVisitor.visitEnd();
        // add a method – we add instructions to it by visiting the method body
        // (these are more or less the same calls that would be made to our visitor if we were
        // reading the class instead of writing it)
        InstructionAdapter methodVisitor = new InstructionAdapter(writer.visitMethod(Opcodes.ACC_PUBLIC, "f", "(I)I", null, null));
        methodVisitor.visitCode(); // start instructions
        methodVisitor.load(0, Type.getObjectType("Foo")); // get this
        methodVisitor.getfield("Foo", "x", "I"); // get this.x
        methodVisitor.load(1, Type.INT_TYPE); // get first parameter
        methodVisitor.add(Type.INT_TYPE); // compute this.x + parameter
        methodVisitor.areturn(Type.INT_TYPE); // return result (emits IRETURN for an int)
        methodVisitor.visitMaxs(0, 0); // ASM will compute stack size for us
        methodVisitor.visitEnd(); // end of method
        // another method: public static void main(String[] args) { System.out.println(42); }
        methodVisitor = new InstructionAdapter(writer.visitMethod(Opcodes.ACC_PUBLIC | Opcodes.ACC_STATIC, "main", "([Ljava/lang/String;)V", null, null));
        methodVisitor.visitCode();
        methodVisitor.getstatic("java/lang/System", "out", "Ljava/io/PrintStream;");
        methodVisitor.iconst(42);
        methodVisitor.invokevirtual("java/io/PrintStream", "println", "(I)V", false);
        methodVisitor.areturn(Type.VOID_TYPE); // emits a plain RETURN for void
        methodVisitor.visitMaxs(0, 0);
        methodVisitor.visitEnd();
        writer.visitEnd();
        byte[] code = writer.toByteArray();
        try (FileOutputStream file = new FileOutputStream("Foo.class")) {
            file.write(code);
        } catch (IOException e) {
            // report the problem, but still print the generated code below
            e.printStackTrace();
        }
        // read back the code and print it
        CodePrinter.printByteCode(new ClassReader(code));
    }
}
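String arguments such as `"(I)I"`, `"D"` and `"([Ljava/lang/String;)V"` are JVM type descriptors. The sketch below (not part of the project) shows that the standard `java.lang.invoke.MethodType.toMethodDescriptorString()` and `Class.descriptorString()` (Java 12+) produce the same strings from Java types. ASM's own `Type` class has similar helpers, such as `Type.getDescriptor(Class)`.

```java
import java.lang.invoke.MethodType;

// Sketch: deriving the JVM descriptors used in CodeGenerator with the standard library.
final class Descriptors {

    // Method descriptor: parameter descriptors in parentheses, followed by the return type's descriptor.
    static String methodDescriptor(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        System.out.println(methodDescriptor(int.class, int.class));        // (I)I for int f(int)
        System.out.println(methodDescriptor(void.class, String[].class));  // ([Ljava/lang/String;)V for main
        System.out.println(methodDescriptor(void.class, int.class));       // (I)V for PrintStream.println(int)
        System.out.println(double.class.descriptorString());               // D, the field type of PI
        System.out.println(java.io.PrintStream.class.descriptorString());  // Ljava/io/PrintStream; for System.out
    }
}
```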
package inf225.examples;
import java.io.IOException;
import java.io.InputStream;
import java.io.PrintWriter;
import java.io.Serializable;
import java.util.function.Function;
import org.objectweb.asm.ClassReader;
import org.objectweb.asm.util.Textifier;
import org.objectweb.asm.util.TraceClassVisitor;
public class CodePrinter {
    interface SerializableFunction<T, U> extends Function<T, U>, Serializable {
    }
    public static void main(String[] args) throws Exception {
        printByteCode(Lambdas.class);
    }
    /**
     * Print bytecode for the given class
     *
     * @param clazz the (already loaded) class whose bytecode should be printed
     */
    public static void printByteCode(Class<?> clazz) {
        // Java has the bytecode for loaded classes in memory – but not in the original bytecode
        // form, so to read the code with ASM we need to load it again from file.
        // Use an absolute resource path built from the binary name, so nested classes
        // (e.g. Outer$Inner) are found too; getSimpleName() would give just "Inner".
        String classFileName = "/" + clazz.getName().replace('.', '/') + ".class";
        System.out.println("Loading class: " + classFileName);
        try (InputStream in = clazz.getResourceAsStream(classFileName)) {
            // alternatively, there's also new ClassReader(clazz.getName())
            printByteCode(new ClassReader(in));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    /**
     * Print the bytecode of a class
     *
     * @param classReader a reader for the class file whose bytecode should be printed
     */
    public static void printByteCode(ClassReader classReader) {
        System.out.println("Loaded " + classReader.getClassName());
        // Textifier takes care of printing the instructions
        Textifier textifier = new Textifier();
        // TraceClassVisitor visits the class and calls the textifier
        classReader.accept(new TraceClassVisitor(null, textifier, new PrintWriter(System.out)), //
                ClassReader.SKIP_DEBUG | ClassReader.SKIP_FRAMES);
    }
}
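`CodePrinter` hands raw class-file bytes to ASM's `ClassReader`. To show what those bytes start with, the sketch below decodes the fixed header by hand: the magic number `0xCAFEBABE`, followed by the minor and major version. A class written with `Opcodes.V13`, like `Foo` in `CodeGenerator`, should report major version 57. `ClassHeader` is a hypothetical helper, not part of ASM.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch: decoding the fixed 8-byte header of a class file with the standard library only.
final class ClassHeader {
    final int minor;
    final int major;

    private ClassHeader(int minor, int major) {
        this.minor = minor;
        this.major = major;
    }

    // Layout: u4 magic (0xCAFEBABE), u2 minor_version, u2 major_version, all big-endian.
    static ClassHeader read(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            int magic = in.readInt();
            if (magic != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file, magic = " + Integer.toHexString(magic));
            }
            int minor = in.readUnsignedShort();
            int major = in.readUnsignedShort();
            return new ClassHeader(minor, major);
        } catch (IOException e) { // e.g. fewer than 8 bytes of input
            throw new UncheckedIOException(e);
        }
    }

    // Java SE release for a class file major version, valid for major >= 49 (Java 5): 52 -> 8, 57 -> 13.
    static int javaRelease(int major) {
        return major - 44;
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 57};
        ClassHeader h = read(bytes);
        System.out.println("major=" + h.major + " (Java " + javaRelease(h.major) + "), minor=" + h.minor);
    }
}
```

To check the generated file instead, pass `java.nio.file.Files.readAllBytes(java.nio.file.Path.of("Foo.class"))` to `read`. This assumes `Foo.class` is in the working directory.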
package inf225.examples;
public class DrawTree {
}
package inf225.examples;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedList;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.IntStream;
import org.antlr.v4.runtime.ParserRuleContext;
import org.antlr.v4.runtime.RuleContext;
import org.antlr.v4.runtime.Token;
import org.antlr.v4.runtime.TokenStream;
import org.antlr.v4.runtime.misc.Interval;
import org.antlr.v4.runtime.tree.ErrorNode;
import org.antlr.v4.runtime.tree.ParseTree;
import org.antlr.v4.runtime.tree.ParseTreeVisitor;
import org.antlr.v4.runtime.tree.ParseTreeWalker;
import org.antlr.v4.runtime.tree.RuleNode;
import org.antlr.v4.runtime.tree.TerminalNode;
import org.apache.commons.text.StringEscapeUtils;
import inf225.grammars.ExprBaseListener;
import inf225.grammars.ExprLexer;
import inf225.grammars.ExprListener;
import inf225.grammars.ExprParser;
import inf225.grammars.ExprParser.ExprContext;
import inf225.grammars.ExprParser.IdExprContext;
import inf225.grammars.ExprParser.NumExprContext;
import inf225.grammars.ExprParser.PlusExprContext;
import inf225.grammars.ExprParser.MultExprContext;
import inf225.grammars.ExprParser.ProgramContext;
import inf225.grammars.ExprVisitor;
/**
* Simple expression language
*/
public class ExprExample {
    public static void main(String[] args) throws URISyntaxException {
        URI uri = new URI(null, null, "foo/bar", null);
        int x = 2, y = x * 2;
        // our input
        String input = "2** 2 + 3 * 4 + 4 * 7*2";
        // a lexer that splits the input string into tokens
        ExprLexer lexer = new ExprLexer(CharStreams.fromString(input, uri.getPath()));
        // a stream of tokens to feed to the parser
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        // the parser that recovers the tree structure from the token stream
        ExprParser parser = new ExprParser(tokens);
        // the parse tree of the non-terminal we're interested in
        ProgramContext tree = parser.program();
        System.out.println(tree.accept(new ExprVisitor<Integer>() {
            @Override
            public Integer visit(ParseTree tree) {
                // TODO Auto-generated method stub
                return 0;
            }
            @Override
            public Integer visitChildren(RuleNode node) {
                // TODO Auto-generated method stub
                return 0;
            }
            @Override
            public Integer visitTerminal(TerminalNode node) {
                // TODO Auto-generated method stub
                return 0;
            }
            @Override
            public Integer visitErrorNode(ErrorNode node) {
                System.out.println("error: " + node);
                return 0;
            }
            @Override
            public Integer visitProgram(ProgramContext ctx) {
                return ctx.expr().accept(this);
            }
            @Override
            public Integer visitMultExpr(MultExprContext ctx) {
                System.out.println("multexpr: " + ctx.getText());
                int val = 1;
                for (ExprContext child : ctx.expr()) {
                    val *= child.accept(this); // multiply the values of the operands
                }
                return val;
            }
            @Override
            public Integer visitNumExpr(NumExprContext ctx) {
                System.out.println("numexpr: " + ctx.getText());
                return Integer.parseInt(ctx.getText());
            }
            @Override
            public Integer visitPlusExpr(PlusExprContext ctx) {
                System.out.println("plusexpr: " + ctx.getText());
                int val = 0;
                for (ExprContext child : ctx.expr()) {
                    val += child.accept(this); // add the values of the operands
                }
                return val;
            }
            @Override
            public Integer visitIdExpr(IdExprContext ctx) {
                // TODO Auto-generated method stub
                return 0;
            }
        }));
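        int maxDepth = tree.accept(new ParseTreeVisitor<Integer>() {
            @Override
            public Integer visitChildren(RuleNode node) {
                int depth = 0;
                for (int i = 0; i < node.getChildCount(); i++) {
                    depth = Math.max(depth, node.getChild(i).accept(this));
                }
                // remember the subtree depth for colouring; note that the default RuleContext.setAltNumber()
                // does nothing, so this only has an effect if the grammar sets
                // contextSuperClass=org.antlr.v4.runtime.RuleContextWithAltNum (else getAltNumber() returns 0)
                node.getRuleContext().setAltNumber(depth);
                return depth + 1;
            }
            @Override
            public Integer visit(ParseTree tree) {
                return 1;
            }
            @Override
            public Integer visitTerminal(TerminalNode node) {
                return 1;
            }
            @Override
            public Integer visitErrorNode(ErrorNode node) {
                return 1;
            }
        });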
        // compare strings with equals(), not ==
        String sourceName = TokenStream.UNKNOWN_SOURCE_NAME.equals(tokens.getSourceName()) ? ""
                : URLEncoder.encode(tokens.getSourceName(), StandardCharsets.UTF_8);
        System.out.println(tree.accept(new ParseTreeVisitor<String>() {
            String indent = "";
            private String makeSourceRef(ParseTree tree) {
                Interval interval = tree.getSourceInterval();
                int tokStart = interval.a, tokEnd = interval.b;
                if (tokStart <= tokEnd) {
                    int charStart = tokens.get(tokStart).getStartIndex();
                    int charEnd = tokens.get(tokEnd).getStopIndex();
                    String range = "";
                    if (charStart > charEnd)
                        return "";
                    else if (charStart == charEnd)
                        range += charStart;
                    else if (charStart < charEnd)
                        range += charStart + "-" + charEnd;
                    return String.format("href=\"%s\"", uri.resolve("#" + range).toString());
                }
                return "";
            }
            @Override
            public String visit(ParseTree tree) {
                System.out.println("visit: " + tree);
                return null;
            }
            @Override
            public String visitChildren(RuleNode node) {
                RuleContext ctx = node.getRuleContext();
                String localIndent = " ".repeat(ctx.depth());
                String body = "";
                for (int i = 0; i < node.getChildCount(); i++) {
                    // reset before each child: visiting a nested rule overwrites the shared indent,
                    // which would otherwise misplace the terminals that follow it
                    indent = localIndent;
                    body += node.getChild(i).accept(this);
                }
                String rule = ctx.getClass().getSimpleName().replaceFirst("Context$", ""); // ExprParser.ruleNames[ctx.getRuleIndex()];
                return String.format(
                        "%s<node %s symbol=\"%s\" style=\"background-color:hsl(%d,70%%,90%%,50%%)\">\n%s%s</node>\n", //
                        localIndent, //
                        makeSourceRef(node), //
                        rule, //
                        ctx.getAltNumber() * 360 / maxDepth, body, localIndent);
            }
            @Override
            public String visitTerminal(TerminalNode node) {
                String symbolName = ExprLexer.VOCABULARY.getDisplayName(node.getSymbol().getType());
                String text = node.getText();
                return String.format(
                        "%s <leaf %s symbol=\"%s\" style=\"background-color:hsl(0,70%%,90%%,50%%)\">%s</leaf>\n", //
                        indent, //
                        makeSourceRef(node), //
                        StringEscapeUtils.escapeHtml4(symbolName+"\"'"), StringEscapeUtils.escapeHtml4(text));