Homework #7 Sample Solution
Note: The book uses the greater-than and less-than symbols to show production groupings, but since I am authoring this page in HTML, such symbols are particularly difficult and tedious to render. So instead, I will be using square braces (that's '[' and ']') to show production groupings. Sorry for any confusion this might cause. Also, remember that when reading production rules, if you see a symbol in quotes, such as '{' or 'ShapeInstance', that means that it should appear verbatim in the file. If, however, you see a symbol not in quotes, such as { or ShapeInstance, it represents part of the definition of the production rule. To help avoid confusion, I will italicize production rule syntax, and bold verbatim strings.

12.1a: Defining two file types
To discover what file types I'll need in the system, I should ask myself: "What data needs to be persistent?" That is, what should my program remember between the time when it is quit at the end of a session and the time that it is started up to begin the next session? One set of persistent data is clearly the diagram itself; users will want to save their work, possibly even transport it around to different computers. The other set of persistent data is a little less obvious: we are asked to maintain these libraries of shapes, and presumably several diagrams might use a single library. If that happens, then it seems that a library shouldn't live inside the same file as the diagram, because then we would either have (a) some diagram files that rely on others to provide the library that they need in order to be rendered, or (b) each diagram has its own copy of a library, thus creating the possibility of redundant, inconsistently maintained copies of a single library. My guess, then, is that libraries should become their own file type.

12.1b: Assigning the data into the files
Data Element: File
StandardShape-ShapeInstance links: Diagram file. In order to represent this link, information about both the StandardShape object and the ShapeInstance object must be recorded in one place. If that place were the library file, though, we would have to record ShapeInstance information with the library, which would create a dependency from libraries to diagrams. This is bad because diagram files will probably be relatively transient; the user will be changing them frequently, possibly even deleting them. If, in order to load a library file, we require that all of the diagram files that used it be present and locatable on the current machine in their original state, we would not have a very flexible system.
Diagram objects: Diagram file
GraphicElement objects: Diagram file
ShapeInstance objects: Diagram file
Text objects: Diagram file
Library objects: Library file
StandardShape objects: Library file
Diagram-GraphicElement links: Diagram file

12.1c: Implementing identity across multiple files
There are several potential data items that can be considered "identifying" (that is, similar to the notion of a primary key in RDBMS land.) First, there are the explicit attributes: libraries have names (libraryName), StandardShapes have names also (shapeName). But there is also a hidden kind of identity in this system: file names can serve as identifying entities. So:
Object class
Diagram Each diagram will be placed in its own file. Thus two diagrams are distinct if they reside in separate files.
GraphicElement All of the graphic elements linked to a particular Diagram will reside in the same file as that Diagram. In addition, the GraphicElements will be ordered in the file, and their order will identify their place in the diagram. Since GraphicElements never are linked with more than one Diagram object, we need not worry about identifying a GraphicElement object across files.
ShapeInstance Since ShapeInstances are GraphicElements, they are identified in the same way.
Text Since Texts are GraphicElements, they are identified in the same way.
Library Libraries are kept in a separate file from diagrams. All Library objects reside in the same file, and are identified by their libraryName attribute.
StandardShape StandardShape objects reside in the library file with the Library objects, and are uniquely identified by their shapeName. Since a given shapeName can only identify at most one StandardShape object, we need not worry about distinct StandardShapes with the same name.
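The identity scheme above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the names StandardShape, Libraries, and resolve_shape are hypothetical, and I key each shape by the pair (libraryName, shapeName), which is how the ShapeInstance production in 12.1f,h refers to it.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class StandardShape:
    """Hypothetical in-memory form of one StandardShape from the library file."""
    shape_name: str
    bitmap: bytes


# Hypothetical in-memory form of the library file:
# libraryName -> {shapeName -> StandardShape}
Libraries = Dict[str, Dict[str, StandardShape]]


def resolve_shape(libraries: Libraries, library_name: str, shape_name: str) -> StandardShape:
    """Follow the two names stored with a ShapeInstance to its StandardShape."""
    try:
        return libraries[library_name][shape_name]
    except KeyError:
        raise LookupError(
            f"no StandardShape {shape_name!r} in library {library_name!r}"
        ) from None
```

GraphicElements need no such lookup: under the scheme above, their position in the diagram file's list is their identity.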

12.1d: Locking mechanisms
The application we are designing does not mention any way to change the composition of the library file, so it can be assumed that, once generated, the library file's contents are read-only, so no locking will be necessary. For the diagram files, either of two approaches may be used:

12.1e: Implementing domains
There are several domains in this application's persistent data model, listed in the book in diagram E12.3. Following are some production rules that might be used for writing them to an ASCII file. In many cases I am making assumptions about the structure of the domains that are not really implied by the problem statement; in real life, you would ask your client about these design decisions rather than making assumptions. Also, I realize that you are not experts at reading grammar productions and regular expression notations, so I've included a small natural-language description of each production rule in the Meaning column.

I noticed that the book uses the atoms "INTEGER" and "STRING" when defining domains; I think this is cheating, so I have defined the real ASCII implementations here. If I were to do it the book way, I would have used productions like "number:INTEGER;" and "name:STRING;" instead of the more explicit versions below.

coordinate : '[' number ',' number ']' This means that whenever a coordinate needs to be written, it will consist of two numbers separated by a comma. For example, to represent the coordinates 400 by 600, one would write [400,600] to the ASCII file (In order to figure out how to represent the numbers 400 and 600 in the file, we would have recursively used the number production rule below.)
number : {'1'-'9'} {'0'-'9'}* This means that a number is represented in the file as one non-zero numeric digit, followed by any number of numeric digits. (The * after the second group of digits indicates that it may be repeated zero or more times, just like in UML.) Note that, as written, this rule can express neither 0 nor a negative number, yet the sample files in 12.1i and 12.1j use both (for example, the coordinate [-2,0]); a complete rule would also admit a lone '0' and an optional leading '-'.
name : {'a'-'z'} {'a'-'z'|'0'-'9'|'_'}* This means that a name is represented by one alphabetic character, followed by zero or more characters which are either alphabetic, numeric, or the underscore ('_') character. Again, the * symbol indicates "repeated zero or more times". Like numbers, names are not enclosed in square braces because they are "atomic" to the file format, and should be recognizable without the braces.
binaryString : '[' {'\\'| '\['| '\]'| {? except '[', ']', and '\'}}*']' This production is a bit complicated because a binaryString might contain ANY character at all, including square braces. Unfortunately, our parser will expect square braces to mean the end of a sentence, so we will have to do something special to store binaryStrings that contain square braces. The solution is to define a reserved character (in our case, the backslash) and then demand that if a backslash or a square brace appears in the binaryString, it must first be preceded by a backslash. This way, the parser will know when reading the file back in that when it encounters a lone square brace, that indicates the beginning or end of a sentence, but when it encounters "\[", it should treat the pair of characters as a single square brace. This is a very old trick developed by UNIX hackers many moons ago, so don't feel bad if you didn't think of this.
color : '[' {'0'-'9'} {'0'-'9'} ','
{'0'-'9'} {'0'-'9'} ','
{'0'-'9'} {'0'-'9'} ']'
This rule says that a color is represented by a square-bracketed set of three two-digit numbers, separated by commas. These 3 numbers are the red, green, and blue percentage intensities of the color that is represented. The parser will be able to distinguish colors from coordinates because coordinates have only two comma-separated numbers, while colors have three. Note that if we had 3D coordinates, we would need to add something to the color rule to help the parser distinguish between colors and 3D coordinates.
pattern : { '\SOLID' | '\STRIPED' | '\HOLLOW' } This rule says that a pattern is one of the strings "SOLID", "STRIPED", or "HOLLOW", preceded by a backslash. The backslash is there so that the parser can distinguish these pattern names from name domains defined above. This works because we defined names so that they could not contain backslashes; so when the parser sees the backslash, it knows that what's coming next is going to be a pattern type. The actual pattern types I just made up; in real life you would ask the client how patterns are specified rather than making an assumption like this.
longString : binaryString This rule says to treat text strings just the same as binary strings are treated. I've done this because I don't want to place any restrictions on what text the user can insert into their diagram; I haven't restricted the definition to alphanumeric characters because some fonts like WingDings store icons in the upper 128 codes of their character set, so users should be able to include those in their drawing.
boolean : { '\T' | '\F' } This rule says that a boolean value is represented by either a T or an F, preceded by a backslash. Again, the backslash is there so that the parser can distinguish a boolean "T" value from the name "T".
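The backslash trick in the binaryString rule is easier to trust once you see it round-trip. Here is a minimal Python sketch, assuming the string is handled as text; the function names encode_binary_string and decode_binary_string are my own, not anything from the book.

```python
from typing import Tuple

# The three characters that the binaryString rule requires to be escaped.
ESCAPED = {"\\", "[", "]"}


def encode_binary_string(raw: str) -> str:
    """Wrap raw in square braces, putting a backslash before '\\', '[' and ']'."""
    body = "".join("\\" + ch if ch in ESCAPED else ch for ch in raw)
    return "[" + body + "]"


def decode_binary_string(text: str, start: int = 0) -> Tuple[str, int]:
    """Decode the binaryString beginning at text[start].

    Returns the raw contents and the index just past the closing brace.
    """
    if start >= len(text) or text[start] != "[":
        raise ValueError("binaryString must start with '['")
    out = []
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            # A backslash must be followed by one of the three escaped characters.
            if i + 1 >= len(text) or text[i + 1] not in ESCAPED:
                raise ValueError(f"bad escape at offset {i}")
            out.append(text[i + 1])
            i += 2
        elif ch == "]":
            return "".join(out), i + 1  # a lone ']' ends the string
        elif ch == "[":
            raise ValueError(f"unescaped '[' at offset {i}")
        else:
            out.append(ch)
            i += 1
    raise ValueError("unterminated binaryString")
```

Since longString is defined as binaryString, the same pair of functions would serve for text bodies as well.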

12.1f,h: Implementing the file productions
I'm going to do f and h together, because I don't think there is a clean way to separate the file definition out into classes. So, the following sequence of productions implements the entirety of both file types. At times I will use productions on the right hand side of these rules that refer to the domain rules defined in 12.1e, so stay sharp. Also, if you are careful about designing your grammar, you can actually create non-ambiguous sets of production rules that don't use braces very often (or at all) by requiring the parser to do a lot of inference. But to avoid confusion, I'm using brackets pretty liberally here:

DiagramFile : VersionTag 'Diag-Type' Diagram A Diagram file will begin with a versioning tag, a small identifier to indicate that the file's purpose is to house a diagram, and then it will contain a single Diagram object, defined below.
LibraryFile : VersionTag 'Lib-Type' Library* A Library file will begin with the same versioning tag, an identifier indicating that it is a library file, and then a number of Libraries, defined below. Remember, the * indicates "zero or more", so a particular file might contain a million libraries, or none.
VersionTag : 'OOAD-Homework-7-Editor-v1.0' A versioning tag. This is a fast way to let the parser know for sure whether or not the file it has been asked to read was even created by the correct program.
Library : '[' name StandardShape* ']' The name of the library, and then zero or more StandardShapes, all enclosed in brackets. This embedding of shapes within a library is how we implement the Library-StandardShape link.
StandardShape : '[' name binaryString ']' Each shape has a name that is used to identify it from the library, and a bitmap, which is rendered in the file as a binaryString. I would argue that you don't need brackets surrounding the StandardShape construct, even though I have put them here. Can you see why?
Diagram : GraphicElement* Since all a diagram really is anyway is a big list of GraphicElements, that is all we have defined Diagram to be. As with libraries, embedding the GraphicElements within Diagram's representation allows us to implement the Diagram-GraphicElement link.
GraphicElement : {ShapeInstance | Text} Even though GraphicElement has attributes, I have chosen to push them out to the subclasses, because when this file is actually parsed, it will be a big help to the parser if it doesn't have to look forward very far to discover which object it should create to accept the values being read from the file. Thus, GraphicElement has no real content at all, except to be a placeholder for either Text or a ShapeInstance.
Text : '[Text:' coordinate number longString name boolean boolean number color ']' Text just consists of all of Text's attributes, right in a row, with its superclass attributes at the beginning. The "Text" tag appears at the beginning so that the parser can discover early and easily that this is a Text object rather than a ShapeInstance. Remember that in order for it to have gotten to this point, the parser must be trying to read a GraphicElement; but a GraphicElement can be either of the two classes, so we need to give it a hint so that it knows which rule to process.
ShapeInstance : '[Shape:' coordinate number name name number number color pattern color number pattern ']' Again, this definition just consists of the class's attributes all in a row. The first two attributes are the superclass's attributes. The next two names at the beginning of the ShapeInstance represent the name of the library and the name of the StandardShape that this instance is based on. This implements the StandardShape-ShapeInstance link.
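To show how the '[Text:' and '[Shape:' hints let a parser pick a rule without looking far ahead, here is a rough Python sketch of a reader for the diagram file. It rests on assumptions of mine: tokens are separated by whitespace, numbers may be 0 or negative (the sample file needs both), and the Reader class, the function names, and the dictionary keys are all hypothetical.

```python
import re

VERSION_TAG = "OOAD-Homework-7-Editor-v1.0"
# Assumption: wider than the number rule, so that 0 and -2 are accepted.
NUMBER = re.compile(r"-?\d+")
NAME = re.compile(r"[a-z][a-z0-9_]*")
COORD = re.compile(r"\[(-?\d+),(-?\d+)\]")
COLOR = re.compile(r"\[(\d\d),(\d\d),(\d\d)\]")
PATTERN = re.compile(r"\\(SOLID|STRIPED|HOLLOW)")
BOOLEAN = re.compile(r"\\([TF])")
LONG_STRING = re.compile(r"\[((?:\\[\\\[\]]|[^\\\[\]])*)\]")


class Reader:
    """Cursor over the file text; tokens are assumed to be whitespace-separated."""

    def __init__(self, text):
        self.text, self.pos = text, 0

    def skip_ws(self):
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1

    def at_end(self):
        self.skip_ws()
        return self.pos >= len(self.text)

    def peek(self, literal):
        self.skip_ws()
        return self.text.startswith(literal, self.pos)

    def literal(self, literal):
        if not self.peek(literal):
            raise ValueError(f"expected {literal!r} at offset {self.pos}")
        self.pos += len(literal)

    def take(self, regex):
        self.skip_ws()
        match = regex.match(self.text, self.pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {self.pos}")
        self.pos = match.end()
        return match

    def number(self):
        return int(self.take(NUMBER).group())

    def ints(self, regex):
        return tuple(int(g) for g in self.take(regex).groups())


def parse_text(r):
    e = {"kind": "Text"}
    r.literal("[Text:")
    e["centerPoint"] = r.ints(COORD)
    e["orientation"] = r.number()
    raw = r.take(LONG_STRING).group(1)
    e["textBody"] = re.sub(r"\\([\\\[\]])", r"\1", raw)  # undo the escapes
    e["font"] = r.take(NAME).group()
    e["bold"] = r.take(BOOLEAN).group(1) == "T"
    e["italic"] = r.take(BOOLEAN).group(1) == "T"
    e["fontSize"] = r.number()
    e["color"] = r.ints(COLOR)
    r.literal("]")
    return e


def parse_shape(r):
    e = {"kind": "ShapeInstance"}
    r.literal("[Shape:")
    e["centerPoint"] = r.ints(COORD)
    e["orientation"] = r.number()
    e["libraryName"] = r.take(NAME).group()
    e["shapeName"] = r.take(NAME).group()
    e["xScale"], e["yScale"] = r.number(), r.number()
    e["fillColor"] = r.ints(COLOR)
    e["fillPattern"] = r.take(PATTERN).group(1)
    e["outlineColor"] = r.ints(COLOR)
    e["outlineThickness"] = r.number()
    e["outlinePattern"] = r.take(PATTERN).group(1)
    r.literal("]")
    return e


def parse_diagram_file(text):
    """DiagramFile : VersionTag 'Diag-Type' Diagram, with Diagram : GraphicElement*."""
    r = Reader(text)
    r.literal(VERSION_TAG)
    r.literal("Diag-Type")
    elements = []
    while not r.at_end():
        # The leading tag is the hint that selects the production.
        elements.append(parse_text(r) if r.peek("[Text:") else parse_shape(r))
    return elements
```

The returned list preserves file order, which is what identifies each GraphicElement within its diagram.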

12.1g: Unordered attributes version
Here are the changes I would make to create an unordered attributes version. Everything would be the same except for the Text and ShapeInstance productions, redefined below (along with some additional supporting productions that I didn't need before.) Also, I am growing impatient with this, so the Meaning column is hereby deleted. An unordered representation would be useful if most attributes actually were never set by the user, but almost always had default values. One would save a large amount of space if, for example, a user created a thousand shapes but didn't change them at all from their default settings other than to move them.

Text : '[Text:' {centerPoint | orientation | textBody | font | bold | italic | fontSize | colorAttribute}* ']'
ShapeInstance : '[Shape:' {centerPoint | orientation | libraryName | shapeName | xScale | yScale | fillColor | fillPattern | outlineColor | outlineThickness | outlinePattern}* ']'
centerPoint : 'centerPoint:' coordinate
orientation : 'orientation:' number
textBody : 'textBody:' longString
font : 'font:' name
bold : 'bold:' boolean
italic : 'italic:' boolean
fontSize : 'fontSize:' number
colorAttribute : 'colorAttribute:' color
libraryName : 'LibName:' name
shapeName : 'ShapeName:' name
xScale : 'xScale:' number
yScale : 'yScale:' number
fillColor : 'fillColor:' color
fillPattern : 'fillPattern:' pattern
outlineColor : 'outlineColor:' color
outlineThickness : 'outlineThickness:' number
outlinePattern : 'outlinePattern:' pattern
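The space savings come from filling in defaults for whatever the file leaves out. Below is a rough Python sketch of reading one unordered ShapeInstance record. Everything specific in it is an assumption of mine: the default values are made up, I treat the library and shape names as required, I let a repeated attribute overwrite the earlier value, and the tag strings follow the sample file in 12.1j.

```python
import re


def _ints(match):
    return tuple(int(g) for g in match.groups())


# Domain name -> (regex, converter). Numbers are assumed to allow 0 and '-'.
DOMAINS = {
    "coordinate": (re.compile(r"\[(-?\d+),(-?\d+)\]"), _ints),
    "number": (re.compile(r"(-?\d+)"), lambda m: int(m.group(1))),
    "name": (re.compile(r"([a-z][a-z0-9_]*)"), lambda m: m.group(1)),
    "color": (re.compile(r"\[(\d\d),(\d\d),(\d\d)\]"), _ints),
    "pattern": (re.compile(r"\\(SOLID|STRIPED|HOLLOW)"), lambda m: m.group(1)),
}

# Tag as written in the file -> (attribute name, domain).
SHAPE_TAGS = {
    "centerPoint:": ("centerPoint", "coordinate"),
    "orientation:": ("orientation", "number"),
    "LibName:": ("libraryName", "name"),
    "ShapeName:": ("shapeName", "name"),
    "xScale:": ("xScale", "number"),
    "yScale:": ("yScale", "number"),
    "fillColor:": ("fillColor", "color"),
    "fillPattern:": ("fillPattern", "pattern"),
    "outlineColor:": ("outlineColor", "color"),
    "outlineThickness:": ("outlineThickness", "number"),
    "outlinePattern:": ("outlinePattern", "pattern"),
}

# Hypothetical defaults; in real life you would ask the client for these.
SHAPE_DEFAULTS = {
    "centerPoint": (0, 0), "orientation": 0, "xScale": 1, "yScale": 1,
    "fillColor": (0, 0, 0), "fillPattern": "HOLLOW",
    "outlineColor": (0, 0, 0), "outlineThickness": 1, "outlinePattern": "SOLID",
}

TAG = re.compile(r"[A-Za-z]+:")
WS = re.compile(r"\s*")


def parse_unordered_shape(record):
    """Read one '[Shape: ...]' record whose attributes may come in any order."""
    record = record.strip()
    if not (record.startswith("[Shape:") and record.endswith("]")):
        raise ValueError("not a ShapeInstance record")
    body = record[len("[Shape:"):-1]
    shape = dict(SHAPE_DEFAULTS)
    pos = WS.match(body, 0).end()
    while pos < len(body):
        tag = TAG.match(body, pos)
        if tag is None or tag.group() not in SHAPE_TAGS:
            raise ValueError(f"unknown attribute at offset {pos}")
        attribute, domain = SHAPE_TAGS[tag.group()]
        regex, convert = DOMAINS[domain]
        value = regex.match(body, WS.match(body, tag.end()).end())
        if value is None:
            raise ValueError(f"bad {domain} value for {attribute}")
        shape[attribute] = convert(value)  # a repeated attribute overwrites
        pos = WS.match(body, value.end()).end()
    for required in ("libraryName", "shapeName"):
        if required not in shape:
            raise ValueError(f"missing {required}")
    return shape
```

With this approach, a shape that was only moved costs three attributes in the file, for example `[Shape: LibName: aircraft ShapeName: plane centerPoint: [5,7]]`.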

12.1i: A sample file
If you were masochistic enough to try this, here's the solution based on my rules defined in 12.1f,h. (The XXXXXXBITMAPXXXXXX symbol indicates that a bitmap would be here, but I'm not going to try to hand-generate the bits for these images.) Interestingly, there is a conflict in the given information, which is that the "green" circle has no fill color and a black outline. Needless to say, something about it should be green or else it's not a terribly green circle. I assumed that the fill color was green:

Libraries File:
OOAD-Homework-7-Editor-v1.0 Lib-Type

[shapes [circle [XXXXXXBITMAPXXXXXX]]]

[aircraft [plane [XXXXXXBITMAPXXXXXX]]]

The Diagram File:
OOAD-Homework-7-Editor-v1.0 Diag-Type

[Shape: [-2,0] 0 shapes circle 1 2 [00,99,00] \SOLID [00,00,00] 1 \SOLID]
[Shape: [2,0] 0 shapes square 2 2 [99,00,00] \SOLID [00,00,00] 1 \SOLID]
[Text: [-2,0] 0 [ellipse] helvetica \T \F 12 [00,00,00]]
[Text: [2,0] 0 [square] helvetica \T \F 12 [00,00,00]]

12.1j: A sample file using unordered attributes

Libraries File:
OOAD-Homework-7-Editor-v1.0 Lib-Type

[shapes [circle [XXXXXXBITMAPXXXXXX]]]

[aircraft [plane [XXXXXXBITMAPXXXXXX]]]

The Diagram File:
OOAD-Homework-7-Editor-v1.0 Diag-Type

[Shape: centerPoint: [-2,0] orientation: 0 LibName: shapes ShapeName:
circle xScale: 1 yScale: 2 fillColor: [00,99,00] fillPattern: \SOLID
outlineColor: [00,00,00] outlineThickness: 1 outlinePattern: \SOLID]

[Shape: centerPoint: [2,0] orientation: 0 LibName: shapes ShapeName:
square xScale: 2 yScale: 2 fillColor: [99,00,00] fillPattern: \SOLID
outlineColor: [00,00,00] outlineThickness: 1 outlinePattern: \SOLID]

[Text: centerPoint: [-2,0] orientation: 0 textBody: [ellipse] font:
helvetica bold: \T italic: \F fontSize: 12 colorAttribute: [00,00,00]]

[Text: centerPoint: [2,0] orientation: 0 textBody: [square] font:
helvetica bold: \T italic: \F fontSize: 12 colorAttribute: [00,00,00]]