2013-05-15 45 views
19

Tôi có một ngôn ngữ đơn giản mà tôi đang cố gắng viết trình biên dịch (có nó là bài tập về nhà) để biên dịch một ngôn ngữ đơn giản mà tôi sẽ mô tả nếu cần thiết cho mã vm java.Trình biên dịch Python cho ngôn ngữ đơn giản với thuật toán mã vm java

Nó hiện đang hoạt động khá tốt Tôi đã chỉ nhấn một vết sưng với logic AND và OR.

Mỗi công việc được xử lý trong một điều kiện if/while, nhưng nếu tôi cố gắng và xâu chuỗi mọi thứ sai, hãy sửa tôi nếu tôi sai nhưng tôi tin rằng AND có quyền ưu tiên, nhưng tôi tự hỏi có cách hợp lý hay không sắp xếp chúng? Tôi nghĩ là những gì tôi đang cố gắng để hỏi, đầu ra mã vm java chỉ có các câu lệnh so sánh và nhảy sau cái khác (có vẻ sai). Tôi nhận ra nó khá trừu tượng vì vậy có lẽ những gì tôi sau là một mã giả/thuật toán cho cách cấu trúc xích AND và OR.

EDIT: Hiện tại chỉ xử lý mọi kết hợp AND và OR là AND. So sánh các yếu tố/thuật ngữ/biểu thức kết nối (so với booleanfactor vv) Tôi tin rằng AND đã ưu tiên? Chỉ là một ý nghĩ.

Xin lỗi nếu điều này không được hiểu rõ:/

Vì vậy, tôi bị ốm bao gồm thông tin liên quan vừa xảy ra.

biên dịch

import re 
import sys 

# Restrictions: 
# Integer constants must be short. 
# Stack size must not exceed 1024. 
# Integer is the only type. 
# Logical operators cannot be nested. 

class Scanner: 
    '''The interface comprises the methods lookahead and consume. 
     Other methods should not be called from outside of this class.''' 

    def __init__(self, input_file): 
     '''Reads the whole input_file to input_string.''' 
     # source code of the program to be compiled 
     self.input_string = input_file.read() 
     # index where the unprocessed part of input_string starts 
     self.current_char_index = 0 
     # a pair (most recently read token, matched substring of input_string) 
     self.current_token = self.get_token() 

    def skip_white_space(self): 
     '''Consumes all characters in input_string up to the next 
      non-white-space character.''' 
     if (self.current_char_index >= len(self.input_string) - 1): 
      # bad fix for it over-running the end of the file 
      return 

     while self.input_string[self.current_char_index].isspace(): 
      self.current_char_index += 1 


     return 

    def get_token(self): 
     '''Returns the next token and the part of input_string it matched. 
      Returns None if there is no next token. 
      The characters up to the end of the token are consumed.''' 

     self.skip_white_space() 
     # find the longest prefix of input_string that matches a token 
     token, longest = None, '' 
     for (t, r) in Token.token_regexp: 
      match = re.match(r, self.input_string[self.current_char_index:]) 
      if match and match.end() > len(longest): 
       token, longest = t, match.group() 
     # consume the token by moving the index to the end of the matched part 
     self.current_char_index += len(longest) 
     return (token, longest) 

    def lookahead(self): 
     '''Returns the next token without consuming it. 
      Returns None if there is no next token.''' 
     return self.current_token[0] 

    def consume(self, *tokens): 
     '''Returns the next token and consumes it, if it is in tokens. 
      Raises an exception otherwise. 
      If the token is a number or an identifier, its value is returned.''' 
     if self.current_token[0] not in tokens: 
      print('Token ' + self.current_token[0] + ' isn\'t in the tokens: ') 
      for token in tokens: 
       print(token) 
      raise Exception('Token is not in tokens this shouldn\'t happen much') 

     if self.current_token[0] == 'ID': 
      symbol_table.location(self.current_token[1]) 
      value = self.current_token[1] 


     elif (self.current_token[0] == 'NUM'): 
      value = self.current_token[1] 

     else: 
      value = self.current_token[0] 

     self.current_token = self.get_token() 

     return value 



class Token: 
    DO = 'DO'; 
    ELSE = 'ELSE'; 
    END = 'END'; 
    IF = 'IF'; 
    THEN = 'THEN'; 
    WHILE = 'WHILE'; 
    SEM = 'SEM'; 
    BEC = 'BEC'; 
    LESS = 'LESS'; 
    EQ = 'EQ'; 
    GRTR = 'GRTR'; 
    LEQ = 'LEQ'; 
    NEQ = 'NEQ'; 
    GEQ = 'GEQ'; 
    ADD = 'ADD'; 
    SUB = 'SUB'; 
    MUL = 'MUL'; 
    DIV = 'DIV'; 
    LPAR = 'LPAR'; 
    RPAR = 'RPAR'; 
    NUM = 'NUM'; 
    ID = 'ID'; 
    READ = 'READ'; 
    WRITE = 'WRITE'; 
    OR = 'OR'; 
    AND = 'AND'; 
    NOT = 'NOT'; 


    # The following list gives the regular expression to match a token. 
    # The order in the list matters for mimicking Flex behaviour. 
    # Longer matches are preferred over shorter ones. 
    # For same-length matches, the first in the list is preferred. 
    token_regexp = [ 
     (DO, 'do'), 
     (ELSE, 'else'), 
     (END, 'end'), 
     (IF, 'if'), 
     (THEN, 'then'), 
     (WHILE, 'while'), 
     (READ, 'read'), 
     (WRITE, 'write'), 
     (OR, 'or'), 
     (AND, 'and'), 
     (NOT, 'not'),   
     (SEM, ';'), 
     (BEC, ':='), 
     (LESS, '<'), 
     (EQ, '='), 
     (NEQ, '!='), 
     (GRTR, '>'), 
     (LEQ, '<='), 
     (GEQ, '>='), 
     (ADD, '[+]'), # + is special in regular expressions 
     (SUB, '-'), 
     (MUL, '[*]'), 
     (DIV, '/'), 
     (LPAR, '[(]'), # (is special in regular expressions 
     (RPAR, '[)]'), #) is special in regular expressions 
     (ID, '[a-z]+'), 
     (NUM, '[0-9]+'), 
    ] 

class Symbol_Table: 
    '''A symbol table maps identifiers to locations.''' 
    def __init__(self): 
     self.symbol_table = {} 
    def size(self): 
     '''Returns the number of entries in the symbol table.''' 
     return len(self.symbol_table) 
    def location(self, identifier): 
     '''Returns the location of an identifier. If the identifier is not in 
      the symbol table, it is entered with a new location. Locations are 
      numbered sequentially starting with 0.''' 
     if identifier in self.symbol_table: 
      return self.symbol_table[identifier] 
     index = len(self.symbol_table) 
     self.symbol_table[identifier] = index 
     return index 

class Label: 
    def __init__(self): 
     self.current_label = 0 
    def next(self): 
     '''Returns a new, unique label.''' 
     self.current_label += 1 
     return 'l' + str(self.current_label) 

def indent(s, level): 
    return ' '*level + s + '\n' 

# Each of the following classes is a kind of node in the abstract syntax tree. 
# indented(level) returns a string that shows the tree levels by indentation. 
# code() returns a string with JVM bytecode implementing the tree fragment. 
# true_code/false_code(label) jumps to label if the condition is/is not true. 
# Execution of the generated code leaves the value of expressions on the stack. 

class Program_AST: 
    def __init__(self, program): 
     self.program = program 
    def __repr__(self): 
     return repr(self.program) 
    def indented(self, level): 
     return self.program.indented(level) 
    def code(self): 
     program = self.program.code() 
     local = symbol_table.size() 
     java_scanner = symbol_table.location('Java Scanner') 
     return '.class public Program\n' + \ 
       '.super java/lang/Object\n' + \ 
       '.method public <init>()V\n' + \ 
       'aload_0\n' + \ 
       'invokenonvirtual java/lang/Object/<init>()V\n' + \ 
       'return\n' + \ 
       '.end method\n' + \ 
       '.method public static main([Ljava/lang/String;)V\n' + \ 
       '.limit locals ' + str(local) + '\n' + \ 
       '.limit stack 1024\n' + \ 
       'new java/util/Scanner\n' + \ 
       'dup\n' + \ 
       'getstatic java/lang/System.in Ljava/io/InputStream;\n' + \ 
       'invokespecial java/util/Scanner.<init>(Ljava/io/InputStream;)V\n' + \ 
       'astore ' + str(java_scanner) + '\n' + \ 
       program + \ 
       'return\n' + \ 
       '.end method\n' 

class Statements_AST: 
    def __init__(self, statements): 
     self.statements = statements 
    def __repr__(self): 
     result = repr(self.statements[0]) 
     for st in self.statements[1:]: 
      result += '; ' + repr(st) 
     return result 
    def indented(self, level): 
     result = indent('Statement(s)', level) 
     for st in self.statements: 
      result += st.indented(level+1) 
     return result 
    def code(self): 
     result = '' 
     for st in self.statements: 
      result += st.code() 
     return result 

class If_AST: 
    def __init__(self, boolean_expression, then): 
     self.boolean_expression = boolean_expression 
     self.then = then 
    def __repr__(self): 
     return 'if ' + repr(self.boolean_expression) + ' then ' + \ 
         repr(self.then) + ' end' 
    def indented(self, level): 
     return indent('If-Then', level) + \ 
       self.boolean_expression.indented(level+1) + \ 
       self.then.indented(level+1) 
    def code(self): 
     l1 = label_generator.next() 
     return self.boolean_expression.code(l1) + \ 
       self.then.code() + \ 
       l1 + ':\n' 

class If_Else_AST: 
    def __init__(self, boolean_expression, then, _else): 
     self.boolean_expression = boolean_expression; 
     self.then = then; 
     self._else = _else; 
    def __repr__(self): 
     return 'if ' + repr(self.boolean_expression) + ' then ' + \ 
         repr(self.then) + ' else ' + \ 
         repr(self._else) + ' end' 
    def indented(self, level): 
     return indent('If-Then-Else', level) + \ 
       self.boolean_expression.indented(level+1) + \ 
       self.then.indented(level+1) + \ 
       indent('Else', level+1) + \ 
       self._else.indented(level+1) 
    def code(self): 
     l1 = label_generator.next() 
     l2 = label_generator.next() 
     return self.boolean_expression.code(l1) + \ 
       self.then.code() + \ 
       'goto ' + l2 + '\n' + \ 
       l1 + ':\n' + \ 
       self._else.code() + \ 
       l2 + ':\n' 


class While_AST: 
    def __init__(self, boolean_term, body): 
     self.boolean_term = boolean_term 
     self.body = body 
    def __repr__(self): 
     return 'while ' + repr(self.boolean_term) + ' do ' + \ 
          repr(self.body) + ' end' 
    def indented(self, level): 
     return indent('While-Do', level) + \ 
       self.boolean_term.indented(level+1) + \ 
       self.body.indented(level+2) 
    def code(self): 
     l1 = label_generator.next() 
     l2 = label_generator.next() 
     return l1 + ':\n' + \ 
       self.boolean_term.code(l2) + \ 
       self.body.code() + \ 
       'goto ' + l1 + '\n' + \ 
       l2 + ':\n' 

class Assign_AST: 
    def __init__(self, identifier, expression): 
     self.identifier = identifier 
     self.expression = expression 
    def __repr__(self): 
     return repr(self.identifier) + ':=' + repr(self.expression) 
    def indented(self, level): 
     return indent('Assign', level) + \ 
       self.identifier.indented(level+1) + \ 
       self.expression.indented(level+1) 
    def code(self): 
     loc = symbol_table.location(self.identifier.identifier) 
     return self.expression.code() + \ 
       'istore ' + str(loc) + '\n' 

class Write_AST: 
    def __init__(self, expression): 
     self.expression = expression 
    def __repr__(self): 
     return 'write ' + repr(self.expression) 
    def indented(self, level): 
     return indent('Write', level) + self.expression.indented(level+1) 
    def code(self): 
     return 'getstatic java/lang/System/out Ljava/io/PrintStream;\n' + \ 
       self.expression.code() + \ 
       'invokestatic java/lang/String/valueOf(I)Ljava/lang/String;\n' + \ 
       'invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V\n' 

class Read_AST: 
    def __init__(self, identifier): 
     self.identifier = identifier 
    def __repr__(self): 
     return 'read ' + repr(self.identifier) 
    def indented(self, level): 
     return indent('Read', level) + self.identifier.indented(level+1) 
    def code(self): 
     java_scanner = symbol_table.location('Java Scanner') 
     loc = symbol_table.location(self.identifier.identifier) 
     return 'aload ' + str(java_scanner) + '\n' + \ 
       'invokevirtual java/util/Scanner.nextInt()I\n' + \ 
       'istore ' + str(loc) + '\n' 

class Comparison_AST: 
    def __init__(self, left, op, right): 
     self.left = left 
     self.op = op 
     self.right = right 
    def __repr__(self): 
     op = { Token.LESS:'<', Token.EQ:'=', Token.GRTR:'>', 
       Token.LEQ:'<=', Token.NEQ:'!=', Token.GEQ:'>=' } 
     return repr(self.left) + op[self.op] + repr(self.right) 
    def indented(self, level): 
     return indent(self.op, level) + \ 
       self.left.indented(level+1) + \ 
       self.right.indented(level+1) 
    def true_code(self, label): 
     op = { Token.LESS:'if_icmplt', Token.EQ:'if_icmpeq', 
       Token.GRTR:'if_icmpgt', Token.LEQ:'if_icmple', 
       Token.NEQ:'if_icmpne', Token.GEQ:'if_icmpge' } 
     return self.left.code() + \ 
       self.right.code() + \ 
       op[self.op] + ' ' + label + '\n' 
    def false_code(self, label): 
     # Negate each comparison because of jump to "false" label. 
     op = { Token.LESS:'if_icmpge', Token.EQ:'if_icmpne', 
       Token.GRTR:'if_icmple', Token.LEQ:'if_icmpgt', 
       Token.NEQ:'if_icmpeq', Token.GEQ:'if_icmplt' } 
     return self.left.code() + \ 
       self.right.code() + \ 
       op[self.op] + ' ' + label + '\n' 

class Expression_AST: 
    def __init__(self, left, op, right): 
     self.left = left 
     self.op = op 
     self.right = right 
    def __repr__(self): 
     op = { Token.ADD:'+', Token.SUB:'-', Token.MUL:'*', Token.DIV:'/' } 
     return '(' + repr(self.left) + op[self.op] + repr(self.right) + ')' 
    def indented(self, level): 
     return indent(self.op, level) + \ 
       self.left.indented(level+1) + \ 
       self.right.indented(level+1) 
    def code(self): 
     op = { Token.ADD:'iadd', Token.SUB:'isub', 
       Token.MUL:'imul', Token.DIV:'idiv' } 
     return self.left.code() + \ 
       self.right.code() + \ 
       op[self.op] + '\n' 

class Number_AST: 
    def __init__(self, number): 
     self.number = number 
    def __repr__(self): 
     return self.number 
    def indented(self, level): 
     return indent(self.number, level) 
    def code(self): # works only for short numbers 
     return 'sipush ' + self.number + '\n' 

class Identifier_AST: 
    def __init__(self, identifier): 
     self.identifier = identifier 
    def __repr__(self): 
     return self.identifier 
    def indented(self, level): 
     return indent(self.identifier, level) 
    def code(self): 
     loc = symbol_table.location(self.identifier) 
     return 'iload ' + str(loc) + '\n' 

class BooleanFactor_AST: 
    def __init__(self, condition, logic): 
     self.condition = condition 
     self.logic = logic 
    def __repr__(self): 
     if self.logic == False: 
      return 'NOT ' + repr(self.condition) 
     else: 
      return repr(self.condition) 
    def indented(self, level): 
     if self.logic == False: 
      return indent('NOT ', level) + self.condition.indented(level + 1) 
     else: 
      return self.condition.indented(level) 
    def false_code(self, label): 
     if self.logic == True: 
      return self.condition.false_code(label) 
     else: 
      return self.condition.true_code(label) 
     return 
    def true_code(self, label): 
     if self.logic == True: 
      return self.condition.true_code(label) 
     else: 
      return self.condition.false_code(label) 


class BooleanTerm_AST: 
    def __init__(self, terms): 
     self.terms = terms 
    def __repr__(self): 
     result = repr(self.terms[0]) 
     for term in self.terms[1:]: 
      result = result + ' AND ' + repr(term) 
     return result 
    def indented(self, level): 
     result = self.terms[0].indented(level) 
     for term in self.terms[1:]: 
      result = result + indent('AND', level) 
      result = result + term.indented(level) 
     return result 
    def code(self, label): 
     result = '' 
     for term in self.terms: 
      result = result + term.false_code(label) 
     return result 

class BooleanExpression_AST: 
    def __init__(self, expressions): 
     self.expressions = expressions 
    def __repr__(self): 
     result = repr(self.expressions[0]) 
     for expression in self.expressions[1:]: 
      result = result + ' OR ' + repr(expression) 
     return result 
    def indented(self, level): 
     result = self.expressions[0].indented(level) 
     indentation = 0 
     for expression in self.expressions[1:]: 
      indentation += 1 
      result = result + indent('OR', level + indentation) 
      result = result + expression.indented(level + indentation) 
     return result 
    def code(self, label): 
     result = '' 
     for expression in self.expressions: 
      result = result + expression.code(label) 
     return result 


# The following methods comprise the recursive-descent parser. 

def program(): 
    sts = statements() 
    return Program_AST(sts) 

def statements(): 
    result = [statement()] 
    while scanner.lookahead() == Token.SEM: 
     scanner.consume(Token.SEM) 
     st = statement() 
     result.append(st) 
    return Statements_AST(result) 

def statement(): 
    if scanner.lookahead() == Token.IF: 
     return if_statement() 
    elif scanner.lookahead() == Token.WHILE: 
     return while_statement() 
    elif scanner.lookahead() == Token.ID: 
     return assignment() 
    elif scanner.lookahead() == Token.READ: 
     return read(); 
    elif scanner.lookahead() == Token.WRITE: 
     return write(); 
    else: # error 
     return scanner.consume(Token.IF, Token.WHILE, Token.ID) 

def if_statement(): 
    scanner.consume(Token.IF) 
    condition = boolean_expression() 
    scanner.consume(Token.THEN) 
    then = statements() 
    if scanner.lookahead() == Token.END: 
     scanner.consume(Token.END) 
     return If_AST(condition, then) 
    else: 
     scanner.consume(Token.ELSE) 
     _else = statements() 
     scanner.consume(Token.END) 
     return If_Else_AST(condition, then, _else) 

def while_statement(): 
    scanner.consume(Token.WHILE) 
    condition = boolean_expression() 
    scanner.consume(Token.DO) 
    body = statements() 
    scanner.consume(Token.END) 
    return While_AST(condition, body) 

def assignment(): 
    ident = identifier() 
    scanner.consume(Token.BEC) 
    expr = expression() 
    return Assign_AST(ident, expr) 

def read(): 
    scanner.consume(Token.READ) 
    variable = identifier() 
    return Read_AST(variable) 


def write(): 
    scanner.consume(Token.WRITE) 
    expr = expression() 
    return Write_AST(expr) 

def comparison(): 
    left = expression() 
    op = scanner.consume(Token.LESS, Token.EQ, Token.GRTR, 
         Token.LEQ, Token.NEQ, Token.GEQ) 
    right = expression() 
    return Comparison_AST(left, op, right) 

def expression(): 
    result = term() 
    while scanner.lookahead() in [Token.ADD, Token.SUB]: 
     op = scanner.consume(Token.ADD, Token.SUB) 
     tree = term() 
     result = Expression_AST(result, op, tree) 
    return result 

def term(): 
    result = factor() 
    while scanner.lookahead() in [Token.MUL, Token.DIV]: 
     op = scanner.consume(Token.MUL, Token.DIV) 
     tree = factor() 
     result = Expression_AST(result, op, tree) 
    return result 

def factor(): 
    if scanner.lookahead() == Token.LPAR: 
     scanner.consume(Token.LPAR) 
     result = expression() 
     scanner.consume(Token.RPAR) 
     return result 
    elif scanner.lookahead() == Token.NUM: 
     value = scanner.consume(Token.NUM) 
     return Number_AST(value) 
    elif scanner.lookahead() == Token.ID: 
     return identifier() 
    else: # error 
     return scanner.consume(Token.LPAR, Token.NUM, Token.ID) 

def identifier(): 
    value = scanner.consume(Token.ID) 
    return Identifier_AST(value) 

def boolean_factor(): 
    if scanner.lookahead() == Token.NOT: 
     scanner.consume(Token.NOT) 
     logic = False 
    else: 
     logic = True 
    result = comparison() 
    return BooleanFactor_AST(result, logic) 

def boolean_term(): 
    result = [boolean_factor()] 
    while scanner.lookahead() in [Token.AND]: 
     scanner.consume(scanner.lookahead()) 
     temp = boolean_factor() 
     result.append(temp) 

    return BooleanTerm_AST(result) 

def boolean_expression(): 
    result = [boolean_term()] 
    while scanner.lookahead() in [Token.OR]: 
     scanner.consume(scanner.lookahead()) 
     temp = boolean_term() 
     result.append(temp) 

    return BooleanExpression_AST(result) 

# Initialise scanner, symbol table and label generator. 

#scanner = Scanner(open('test.txt')) 
scanner = Scanner(sys.stdin) 
symbol_table = Symbol_Table() 
symbol_table.location('Java Scanner') # fix a location for the Java Scanner 
label_generator = Label() 

# Uncomment the following to test the scanner without the parser. 
# This shows a list of all tokens in the input. 
# 
#token = scanner.lookahead() 

#while token != None: 
# print(token) 
# scanner.consume(token) 
# token = scanner.lookahead() 

#exit() 

# Call the parser. 

ast = program() 
assert scanner.lookahead() == None 

# Uncomment the following to test the parser without the code generator. 
# The first line gives back the program by calling __repr__ of the AST classes. 
# The second line shows the syntax tree with levels indicated by indentation. 
# 
#print(ast) 
#print(ast.indented(0)) 
#exit() 

# Call the code generator. 

# This translates the abstract syntax tree to JVM bytecode. 
# It can be assembled to a class file by Jasmin: http://jasmin.sourceforge.net/ 

print(ast.code()) 

tập tin thử nghiệm dơi

python compiler.py <test.txt> Program.j 
java -Xmx100m -jar jasmin.jar Program.j 
java -Xmx100m Program <testInput.txt> test_output.txt 

và ngôn ngữ (BNF)

Program = Statements 
Statements = Statement (; Statement) 
Statement = If | While | Assignment 
If = if Comparison then Statements end 
While = while Comparison do Statements end 
Assignment = identifier := Expression 
Comparison = Expression Relation Expression 
Relation = = | != | < | <= | > | >= 
Expression = Term ((+ | -) Term) 
Term = Factor ((* | /) Factor) 
Factor = (Expression) | number | identifier 
BooleanExpression = BooleanTerm (or BooleanTerm)* 
BooleanTerm = BooleanFactor (and BooleanFactor)* 
BooleanFactor = not BooleanFactor | Comparison 

Tôi nghĩ rằng thats tất cả những gì có liên quan, cổ vũ nếu bạn mất một đi vào giúp tôi trên số này

+0

eep, j trong mô tả ngôn ngữ có nghĩa là bitwise hoặc (|) – Zeno15

+1

Bạn có thể chỉnh sửa câu hỏi của mình (nhấp vào nút "chỉnh sửa" màu xám ở cuối bài đăng) nếu bạn muốn khắc phục điều đó. –

+0

cổ vũ cho rằng – Zeno15

Trả lời

1

nếu bạn muốn có một phương pháp để chuỗi OR và AND'syou có thể sử dụng tài sản này:

p v q === ¬p^¬q 

là tương đương, bạn có thể xử lý tất cả trong VÀ tạo. ví dụ.

p v q^r v s === ¬p^¬q^¬r^¬s 

Vì vậy, đánh giá biểu thức trong VÀ tạo rất đơn giản với một thuật toán.

Tôi đoán biểu thức không có dấu ngoặc đơn, theo cách khác bạn cần ưu tiên các ký hiệu nhóm(), [], {}.

Các vấn đề liên quan