Project #1

CSC 459 - Project #1 - Relational Algebra Interpreter

Theme ...

Build an application program that lets a user identify and open tables and perform Relational Algebra Operations on them.

You have two choices for this project:

Build a real interpreter that can evaluate Relational Algebra Expressions.
Design a user interface that helps the user formulate, interactively, what he/she wants.

If you follow the first choice, the assumption is that the user will develop a command file and your interpreter will go through it one command line at a time and perform each task. Each command line must be checked for syntax and semantics before getting executed.

If choice two is followed, then the user needs to be guided to achieve the same functionality. The advantage of the second option is in controlling what is legal for the user to ask for at any point in time (i.e. you really shouldn't have to deal with syntax or semantic errors).

Part#1 -- The FrontEnd for RAI

No actual RA operations take place.

If you choose to build a command file interpreter: in this part you only find errors in the syntax or the semantics of each line and report the status of the line. When a line contains an error, display a message that appropriately describes the error. When a TYPE statement is successful, simply report its success. When an OPEN statement is successful, identify the relation name as an open relation and display its type. When an expression is correct, report the type of the resulting relation (key-list and attribute-list).

If you build an interactive interface for the user: what you complete up to this point must make clear how your application will work and that the user will, in fact, have the same level of power of expression as he/she would with the command file interpreter.

The language for the command file

The BNF syntax below defines the language that must be used in the command file to express the tables and the queries on them.


char         ::= 'a' | .. | 'z' | '0' | .. | '9' | ' '
const-str    ::= char,rest-of-c
rest-of-c    ::= const-str | null
alpha-num    ::= 'a' | .. | 'z' | '0' | .. | '9'
string       ::= alpha-num,rest-of-s
rest-of-s    ::= string | null
attr-name    ::= string
rel-name     ::= string
type-name    ::= string
attr-lst     ::= attr-name,rest-of-attr
rest-of-attr ::= ',' attr-lst | null
key-lst      ::= attr-lst

type-def     ::= TYPE type-name '[' key-lst ']' attr-lst
open-rel     ::= OPEN rel-name type-name

infix-op     ::= UNION | INTERSECT | MINUS | TIMES | JOIN
operand      ::= rel-name | '(' expr ')'
infix-expr   ::= operand infix-op operand
comp-op      ::= '<' | '>' | '='
condition    ::= attr-name comp-op '"' const-str '"'
selection    ::= operand WHERE condition
projection   ::= operand '[' attr-lst ']'
expr         ::= selection | projection | infix-expr

definition   ::= type-def | open-rel | expr
definitions  ::= definition '',rest-of-d
rest-of-d    ::= definitions | ''

command-file ::= definitions
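
As a starting point for the command file interpreter, the sketch below splits one command line into tokens according to the grammar above. It is only an illustration, not a required design: the names KEYWORDS, TOKEN_RE and tokenize are made up for this example. It relies on the fact that the grammar allows only lower-case letters and digits in names, so an upper-case word can only be a keyword, and it assumes that blanks between tokens are insignificant.

```python
import re

# The upper-case words that appear in the grammar.
KEYWORDS = {"TYPE", "OPEN", "UNION", "INTERSECT", "MINUS", "TIMES", "JOIN", "WHERE"}

# One alternative per token class.
TOKEN_RE = re.compile(r'''
      "(?P<const>[a-z0-9 ]+)"     # const-str inside double quotes
    | (?P<word>[A-Z]+)            # candidate keyword
    | (?P<name>[a-z0-9]+)         # attr-name, rel-name or type-name
    | (?P<punct>[\[\](),<>=])     # brackets, parentheses, comma, comp-op
    ''', re.VERBOSE)


def tokenize(line):
    """Return a list of (kind, text) pairs, or raise ValueError on bad input."""
    tokens = []
    pos = 0
    while pos < len(line):
        if line[pos].isspace():          # assumption: blanks only separate tokens
            pos += 1
            continue
        match = TOKEN_RE.match(line, pos)
        if match is None:
            raise ValueError(f"unexpected character {line[pos]!r} at column {pos + 1}")
        kind = match.lastgroup
        text = match.group(kind)
        if kind == "word":
            if text not in KEYWORDS:
                raise ValueError(f"unknown keyword {text!r}")
            kind = "keyword"
        tokens.append((kind, text))
        pos = match.end()
    return tokens
```

A parser can then check the token sequence against type-def, open-rel, or expr; a ValueError raised here is already a syntax error that can be reported for the line.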

Each line/definition in the command file is read and echoed; it will then need to be validated and a reasonable message displayed as mentioned earlier.

When a TYPE definition is syntactically and semantically correct, display the word "SUCCESSFUL"; otherwise, display a meaningful message stating what is wrong with it.
When an OPEN definition is syntactically and semantically correct, display a message signaling that the relation/table is open, showing its name and type. Again, you need a meaningful message when there is an error.
With valid expressions, display the definition and the type of the resulting relation (i.e. display its key list and attribute list). Again, you need a meaningful message when there is an error.

Types, Relations, attributes, and keys

In this part, relations don't have any data in them; only their name and type (attribute-list, key-list) are known.

Relations are either permanent, created via OPEN, or temporary, resulting from an algebraic operation. Temporary relations go away once the processing of an expression is completed.

A relation type identifies a list of attributes and the subset of those attributes that constitutes its key. A key list is a minimal subset of the relation's attributes whose values make each row of data unique.

We will consider attribute names as unique in our system. No relation type may have two attributes with the same name. This is true for both permanent and temporary relations.

Two relations obviously have the same type if they have the same type name. You must not create a new type, implicitly or explicitly, if it has the same attribute-list and key-list, irrespective of their order, as an existing type. For example, the attribute list snum,sname,address should be treated as the same as sname,snum,address when comparing two types.

The key-list is not mutually exclusive from the attribute-list in a type definition. In fact, a relation type can only be correct if all attributes listed in its key-list exist in its attribute list.
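
One possible way to satisfy these rules is to store the attribute list and key list as sets, so that two types compare equal regardless of the order in which their attributes were written. The sketch below assumes that representation; RelType, make_type and find_existing are illustrative names, not part of the assignment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelType:
    """A relation type; both fields are frozensets, so order never matters."""
    attrs: frozenset
    key: frozenset


def make_type(key_list, attr_list):
    """Validate a key list and an attribute list and build a RelType."""
    if not key_list:
        raise ValueError("key list is empty")
    if len(set(attr_list)) != len(attr_list):
        raise ValueError("attribute list repeats a name")
    missing = [name for name in key_list if name not in attr_list]
    if missing:
        raise ValueError("key attribute(s) not in attribute list: " + ", ".join(missing))
    return RelType(frozenset(attr_list), frozenset(key_list))


def find_existing(types, candidate):
    """Return the name of a known type equal to candidate, or None.

    `types` is a dict that maps type names to RelType values.
    """
    for name, known in types.items():
        if known == candidate:
            return name
    return None
```

With this representation, the types built from snum,sname,address and sname,snum,address (with the same key) compare equal, and find_existing can be consulted before any new type, explicit or implicit, is recorded.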

Definitions

As mentioned in the previous section, in part 1, relations/tables don't actually have any rows of data; only their name, type, and how they have been derived are known. Relations and their types may be defined explicitly via the TYPE and OPEN instructions. It is also possible for relations and relation types to be defined implicitly. Such relations and types are temporary: they only exist during the evaluation of part or all of one algebraic expression.

Here, I'll describe definitions better via some examples:

The following instruction allows the user to define a relation type called ptype. As you can see, the name of the type, the list of attributes, and the key field(s) are defined via the TYPE instruction. This type is remembered until the command file is completely processed.

TYPE ptype [pnum] pnum,pname,paddress,weight,size

Any of the following has a syntax or semantic error:
- TYPE ptype pnum,pname,paddress,weight,size
- TYPE ptype [] pnum,pname,paddress,weight,size
- TYPE ptype [xyz] pnum,pname,paddress,weight,size
- TYPE ptype [pnum] pnum,,paddress,weight,size
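
The checks behind these examples can be sketched as follows. This is a minimal illustration, not a required design: check_type_def, split_names and the wording of the messages are invented for the example, and it assumes that blanks around commas and brackets are tolerated.

```python
import re

# TYPE type-name '[' key-lst ']' attr-lst
TYPE_RE = re.compile(r"TYPE\s+([a-z0-9]+)\s*\[([^\]]*)\]\s*(\S.*)")
NAME_RE = re.compile(r"[a-z0-9]+")


def split_names(text, what):
    """Split a comma-separated list and reject empty or malformed names."""
    names = [part.strip() for part in text.split(",")]
    for name in names:
        if not NAME_RE.fullmatch(name):
            raise ValueError(f"{what} contains an empty or invalid name")
    return names


def check_type_def(line):
    """Return "SUCCESSFUL" or a message that describes the first error found."""
    match = TYPE_RE.fullmatch(line.strip())
    if match is None:
        return "ERROR: expected TYPE type-name '[' key-lst ']' attr-lst"
    _type_name, key_text, attr_text = match.groups()
    try:
        keys = split_names(key_text, "key list")
        attrs = split_names(attr_text, "attribute list")
    except ValueError as err:
        return f"ERROR: {err}"
    if len(set(attrs)) != len(attrs):
        return "ERROR: attribute list repeats a name"
    missing = [name for name in keys if name not in attrs]
    if missing:
        return "ERROR: key attribute not in attribute list: " + ", ".join(missing)
    return "SUCCESSFUL"
```

The correct ptype definition yields SUCCESSFUL, while each of the four lines listed above yields a message beginning with ERROR. A full interpreter would also record the type and compare it against the types that already exist.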

The following instruction must occur after the TYPE instruction and allows the user to open the file p1, which is of type ptype. In this part of the project, no file is actually opened, but the name of the relation and its type are remembered until the user's command file is completely processed.

OPEN p1 ptype

Any of the following has a syntax or semantic error:
- OPEN p1
- OPEN p1 xtype--where the type xtype does not exist
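
A sketch of how OPEN might be checked in part 1 is shown below. The function name check_open, the two dictionaries, and the output format are assumptions made for this example. Rejecting a relation name that is already open is also an assumption; the assignment does not say how that case should be handled.

```python
def check_open(line, types, relations):
    """Validate one OPEN line and remember the relation if it is correct.

    `types` maps a type name to a (key_list, attr_list) pair.
    `relations` maps an open relation name to its type name.
    """
    parts = line.split()
    if len(parts) != 3 or parts[0] != "OPEN":
        return "ERROR: expected OPEN rel-name type-name"
    _, rel_name, type_name = parts
    if type_name not in types:
        return f"ERROR: type {type_name} does not exist"
    if rel_name in relations:            # assumption: reopening is an error
        return f"ERROR: relation {rel_name} is already open"
    relations[rel_name] = type_name      # no file is opened in part 1
    key_list, attr_list = types[type_name]
    return f"OPEN {rel_name}: type {type_name} [{','.join(key_list)}] {','.join(attr_list)}"
```

Checking that the two names consist only of lower-case letters and digits is left out to keep the sketch short.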

The following instruction causes a union between relations p1 and p2 to take place; assume that they are both of type ptype:

p1 UNION p2

What is significant here is that an implicit relation is created to hold the result of the union. This result is only kept temporarily. "p1 UNION p2" has the same type as p1 and p2, so no new type is needed.

Any of the following has a syntax or semantic error:
- p1 UNION
- p1 UNION x1--where x1 is either undefined or of a different type than p1

The following instruction performs a TIMES operation on relations x1 and p1; these relations are expected to be of different types with no shared columns.

x1 TIMES p1

Any of the following has a syntax or semantic error:
- p1 TIMES p2--assuming p1 and p2 are of the same type
- p1 TIMES y1--where y1 shares one or more attributes with p1

As with UNION, a valid TIMES such as x1 TIMES p1 creates an implicit relation to hold its result. In this case, however, it is very likely that a new type is also created, since the resulting relation will have all of the columns from x1 and all of the columns from p1. The relation created is temporary, as is its type if that type does not already exist.

The following algebraic expression will be used to demonstrate how long an implicit relation or type must be kept around:

(((p1 UNION p2) JOIN p1) TIMES x1) TIMES l1

"p1 UNION p2" is the first subexpression that must be evaluated. The implicit relation created only needs to be kept around until the JOIN operation is completed. Similarly, the result of JOIN only needs to be kept around until the first TIMES operation is completed. As far as the types are concerned, "p1 UNION p2" does not require a new type to be created. In the case of the JOIN, it is very likely that the type of the result is not already a permanent one. A temporary type needs to be created, and it is discarded after the TIMES operation with x1.

Your program stops processing a command line once an error is found in it, and a reasonable error message is needed at that point. The following rules describe the behavior of definitions that are algebraic expressions; the users of your interpreter must adhere to them:

UNION, INTERSECT, and MINUS require operands that are of the same type, and produce a relation that is of the same type as their operands.
The operands of JOIN may share zero or more attributes. The attribute list of the result is the union of the attributes of the types of the operands. The same is true for the key list of the result. The type of the result may already exist; otherwise a new type is created.
The operands of TIMES must not share any attributes. This restriction only exists in our implementation, and it is there to avoid ambiguity. Repeated attribute names in an attribute list are, to say the least, difficult to deal with. The attribute list for the result of TIMES is the union of the attributes of the types of the operands. The same is true for the key list of the result. The type of the result may already exist; otherwise a new type is created.
Selection (WHERE) produces a result that is of the same type as its operand.
Projection ([...]) may produce a relation that is of a new type. The attributes that are listed inside the brackets are projected. The determination of the key list is a bit more complicated. If all attributes of the key list of the operand have been projected, then the key list of the result of the projection is the same as the key list of its operand; otherwise, all of the projected attributes constitute the key list for the result.
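
The rules above can be sketched as a few functions that compute the type of a result from the types of its operands. This is an illustration under stated assumptions: RelType, rel_type, infix_result, selection_result and projection_result are invented names, types are stored as sets so that attribute order does not matter, and the checks that a projected or selected attribute actually belongs to the operand are an added assumption rather than a rule stated above.

```python
from collections import namedtuple

# Both fields are frozensets, so two types compare equal regardless of order.
RelType = namedtuple("RelType", ["attrs", "key"])


def rel_type(key_list, attr_list):
    return RelType(frozenset(attr_list), frozenset(key_list))


def infix_result(op, left, right):
    """Type of `left op right`, or ValueError if the operands are not legal."""
    if op in ("UNION", "INTERSECT", "MINUS"):
        if left != right:
            raise ValueError(f"{op} requires operands of the same type")
        return left
    if op == "TIMES":
        shared = left.attrs & right.attrs
        if shared:
            raise ValueError("TIMES operands share attributes: " + ", ".join(sorted(shared)))
    elif op != "JOIN":
        raise ValueError(f"unknown operator {op}")
    # JOIN and TIMES: union of the attribute lists and union of the key lists.
    return RelType(left.attrs | right.attrs, left.key | right.key)


def selection_result(operand, attr_name):
    """Type of `operand WHERE attr_name ...`; selection keeps the operand type."""
    if attr_name not in operand.attrs:   # assumed semantic check
        raise ValueError(f"attribute {attr_name} is not in the operand")
    return operand


def projection_result(operand, projected):
    """Type of `operand [projected]`."""
    projected = frozenset(projected)
    unknown = projected - operand.attrs  # assumed semantic check
    if unknown:
        raise ValueError("attributes not in the operand: " + ", ".join(sorted(unknown)))
    # Keep the operand's key only if every key attribute was projected.
    key = operand.key if operand.key <= projected else projected
    return RelType(projected, key)
```

For example, projecting pnum,pname from ptype keeps the key [pnum], while projecting pname,weight makes both projected attributes the key. Whether a returned type is new can be decided by comparing it with the types already known.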

Relational Algebra Interpreter

For this part of project 1 you must actually perform the RA operations. Each OPEN instruction will cause your program to open a file. The name of each table opened matches the name of a file in your project directory which holds the data for that table.
Your interpreter should read the data from the file associated with a table opened into an internal table; performing RA operations will not require any file access.
Shortcuts for the user in your GUI are allowed; for example, the first line of a file being opened may provide the attributes and keys for the table. This way, the user simply asks to open a table, and you get the attribute and key lists from the file instead of having the user type them.
Values in each row should be organized according to the attribute list of the table. For instance, if the attribute list is a,b,c, each row of the input file contains three values: the first for a, the second for b, and the third for c. All values are considered to be alphanumeric.
Be sure to ask in class if you still have questions about how we perform algebraic operations on tables.
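
As one illustration of an operation on internal tables, the sketch below performs a JOIN on the attributes that two tables share. It assumes each table is held as a list of rows and each row as a dictionary from attribute name to value; that representation and the name natural_join are choices made for this example only.

```python
def natural_join(left_rows, right_rows):
    """Join two tables (lists of dicts) on every attribute name they share."""
    if not left_rows or not right_rows:
        return []
    shared = sorted(set(left_rows[0]) & set(right_rows[0]))
    # Index the right table by its values for the shared attributes.
    index = {}
    for row in right_rows:
        index.setdefault(tuple(row[a] for a in shared), []).append(row)
    result = []
    for row in left_rows:
        for match in index.get(tuple(row[a] for a in shared), []):
            merged = {**row, **match}
            if merged not in result:     # a relation holds no duplicate rows
                result.append(merged)
    return result
```

When the operands share no attributes, every row matches every other row, so the result is the same as TIMES. An empty operand gives an empty result here, which is why a full implementation would carry the result type alongside the rows rather than infer it from them.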
If you have developed a command file interpreter, you can have it process the commands in the basic command file and the more complex command file. These two files open p1, p2, s1, pr, and sp, which you may wish to download for part 2 even if you are doing the GUI. The data files are formatted to contain exactly 10 characters for each attribute. Since strings are the only kind of value we have, some values are right-adjusted within their designated space where appropriate.
You may assume that each attribute value has up to 10 characters. Each file will contain a row of data for each instance of the table associated with it. If it helps your interface, don't let tables have more than 10 columns.
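
Reading such a file can be sketched as below. The sketch assumes one row per line, no separators between the 10-character fields, and blanks as padding; FIELD_WIDTH, parse_row and load_table are names made up for the example, and load_table takes any iterable of lines (an open file works) so that it can be tried without a data file.

```python
FIELD_WIDTH = 10  # each attribute value occupies exactly 10 characters


def parse_row(line, attr_list):
    """Cut one fixed-width line into a dict that maps attribute name to value."""
    line = line.rstrip("\n")
    expected = FIELD_WIDTH * len(attr_list)
    if len(line) > expected:
        raise ValueError(f"row has {len(line)} characters, expected at most {expected}")
    line = line.ljust(expected)          # tolerate trimmed trailing blanks
    values = []
    for i in range(len(attr_list)):
        field = line[i * FIELD_WIDTH:(i + 1) * FIELD_WIDTH]
        values.append(field.strip())     # assumption: padding is not part of the value
    return dict(zip(attr_list, values))


def load_table(lines, attr_list):
    """Build the internal table from an iterable of lines, skipping blank lines."""
    return [parse_row(line, attr_list) for line in lines if line.strip()]
```

Stripping the padding is a design choice. Keeping the padded 10-character strings instead would let plain string comparison with < and > order right-adjusted numbers correctly, because a blank sorts before any digit.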