============================================================
BOOMS
B* Object Oriented Music Structures
Eli Barzilay, Mira Balaban, Michael Elhadad
============================================================
*** Introduction part

Motivation --- Computer Music, but the result applies to many kinds of `creative' editing.

Goal: Provide a better Computer-Music Environment --- a music editor that will support the composer's intention in the form of structures with identities.

A perfect tool that can capture human intentions this way is something that is very well known in CS --- programming languages. The basic capability that we get by programming is abstracting over common structure (e.g., flat HTML web pages vs programs that generate them).
============================================================
Disclaimers:
* I will talk about lambda calculus from a basic level, but we didn't invent a new calculus.
* I will talk about editing too, another thing we did not invent.
* Some people view `programming' as sort of a technical [inferior] activity (theoreticians, managers, artists). I take programming as a thing to be proud of...
* I am a student at Cornell, but this is actually work I did at Ben-Gurion University in Israel, with Mira Balaban and Michael Elhadad.
* Finally, I am not a musician, I just enjoy music.
============================================================
*** Propaganda part

My opinion is that the most fundamental concept of CS is capturing intentions --- using sharing and abstractions. In other words --- we can refer back to things by their identity; for example, a musician in a studio can play back the same recording using different settings instead of playing it again (and even in that case, there is the knowledge that this is the same piece...). Also, we can generalize common patterns into reusable concepts.
============================================================
Some high-level examples from different areas:

Natural language is in its essence an abstraction we use to communicate ideas. It uses both generalizations and references. --- examples...

There is structure all around us -- we constantly learn about things, we keep referring to stuff we remember.

Music is inherently structured, otherwise we'd hear just random sound waves. Score notations have instructions to ``play the last piece again'', or ``use the same tempo as in the previous piece''.
============================================================
Mention here Common Music, which is just adding a music layer on top of Lisp. They just started to develop a GUI, but the main mode of work was by just writing code (e.g., they're now on sourceforge, but they don't have screen shots).
  http://sourceforge.net/projects/commonmusic

A much younger example of this approach is Haskore, implemented on top of Haskell --- being purely functional is more convenient for representing musical data, but it has the same fundamental problem:
  http://www.haskell.org/haskore/
============================================================
The problem: programming is something that takes years to do properly, even for people who work as programmers. (Example: the cut-and-paste programming methodology that is so common when you encounter a big and complex system (anyone ever tried MFC?) is not a good example of programming...)

Naturally, a much worse problem for non-programmers. A good example is ``scripting languages'': an excellent tool for programmers (Emacs) but when you try to get it through to the `masses' (WordBasic), you end up with most people at exactly the same place.

The basic fact is --- Computers won: there is hardly an office without a computer. Scripting is still limited. The challenge is giving such people the power of programming without making them programmers.
This is extremely needed in Music, as seen when musicians with no CS background learn how to use lazy infinite streams in the Common Music course (used to automatically generate background notes)...
============================================================
*** Value abstraction part

Avoid variables.

One attempt at solving this --- ``Concrete abstractions'' --- was suggested at Grame (Lyon, France).
  http://www.grame.fr/Research/
The basic idea: start from the domain people know and care about, and keep yourself in it.
============================================================
A quick view over Yann's presentation, all the way to the Y combinator at the end.
============================================================
GCalc quick interactive example.
============================================================
So what are the problems so far:
1. Some technicalities like coming up with colored notes to avoid name problems.
2. Extending the basic calculus with domain specific semantics is confusing, and seems ambiguous.
3. The end result is not much simpler: this is an inherent problem, for example, understanding how the Y combinator works is difficult no matter what costume you put it in.
4. Specifically, it is inconvenient when you handle non-trivial examples: the GCalc demo shows this --- complicated functions are hard to construct and the result is pretty much a black box.
============================================================
Elody: a system that implements these principles.
  http://www.grame.fr/Elody/
Specifically, go over abstractions in Elody:
  http://www.grame.fr/Elody/abstraction.html
With pictures, just to show that this is possible...

When you abstract over more than one note, then you get the expected abstraction -- you simply form a lambda expression and at application time you just do a textual search and replace (which even allows using an abstraction as the value you abstract over...).
For a single note, however, the result is a transformation that depends on the difference in some attributes between the two notes (pitch, velocity, duration, etc).
============================================================
The problems are still there...
* Again, these functions are as complex as using variables, defeating your original motivation. The delta from being a non-programmer to using these abstractions is not much smaller than the delta to using a programming language.
* The problem of domain specific extensions is clear now: they cannot unify the way abstractions are generated in the case of a single note and a compound music piece. Moreover, there are cases where it is unclear what the resulting abstraction is, for example, in \do.do-re-do, is the middle `re' a shifted `do' or a constant? This happens in the GCalc world too (example).
============================================================
But still, the idea looks appealing --- there is something about it that makes it attractive. The basic property that Elody is trying to achieve is programming without being a programmer. The attempt is to have abstractions without all the fuss of knowing some syntax, control structures etc. The only thing that users should know is the semantics of what they work with.
============================================================
Under this understanding, we can see that they attempt to `generalize' abstraction: it is an operation that gets two objects, and produces a function object that knows how to take the first one to the second one. This works nicely for what they have now, and it shows what they would have in an ideal world: general function-discovering code. For example, applying \123.1232123 on 5678 would result in 5678765678...

But this is, of course, impossible: the function is the representation of the intent behind the abstraction. What will applying \2.4 on 3 yield --- 5 or 6? (We have expressions to represent structure, and variables to represent identities.)
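
The \2.4 question can be made concrete with a small sketch. This is only an illustration of the ambiguity, not anything Elody or GCalc does: the candidate table and the helper names (`CANDIDATES`, `consistent`, `apply_all`) are hypothetical.

```python
# Hypothetical candidate transformations; neither comes from Elody or GCalc.
CANDIDATES = {
    "add 2": lambda x: x + 2,
    "double": lambda x: x * 2,
}


def consistent(before, after):
    """Names of all candidates that take the `before` example to `after`."""
    return [name for name, f in CANDIDATES.items() if f(before) == after]


def apply_all(before, after, argument):
    """Apply every candidate consistent with the example to a new argument."""
    return {name: CANDIDATES[name](argument)
            for name in consistent(before, after)}


# Both candidates agree on the single example 2 -> 4 ...
print(consistent(2, 4))    # ['add 2', 'double']
# ... but disagree as soon as they are applied to 3.
print(apply_all(2, 4, 3))  # {'add 2': 5, 'double': 6}
```

A single before/after pair does not pin down the function; the intent has to come from somewhere else.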
\do.do-re-do is ambiguous exactly because of this... It just happens that in the case of any two given notes there is one transformation which is `obvious' (but that's not always true, given different musical contexts).
============================================================
Assume for a short while that this is possible: there is still a problem. We can only specify transformations by supplying two concrete examples of `before' and `after' objects, instead of concentrating on the process itself, which is what you intend to express. This is useful in cases where you have two such examples already and you just want to get the transformation instead of specifying it yourself. But the fact is that if you use a computer to edit music, then you would generate the `after' piece from the `before' piece using the computer, only to expect the computer to rediscover what you just did!

But this is a good hint about what can be done in this case: if the computer is used to transform music objects, then just use the operations performed as this transformation. For example, it will allow detecting the meaning behind `do-re-do': either we pasted a `re' in the middle, or we shifted an existing `do'; in any case, we will know what to do in case we abstract that `do' away (or, in the case we abstract a completely different `do').
============================================================
*** BOOMS hierarchy part (a bit disconnected from the above)

Creative activity is usually done in an `experimental mode': it is not a well planned activity. Still, the result is a piece of music which is a structured object.
============================================================
Existing editors can be roughly placed on a range varying from pure WYSIWYG editors that let you edit an extensional object, to editors that keep an intensional representation.
For example, simple graphic editors allow you to edit a bitmap, and simple text editors edit raw character streams --- there is almost no structure at all. WYSIWYG is usually preferred, since it lets people use the objects they're interested in instead of going through `obscure' specifications.

However, the fact is that we do live in a structured world... Therefore, editors usually move from a flat representation to a more structured one. One example is bitmap graphic editors that use layers. Another example is word processors that try to capture more and more text and formatting properties (migrating sometimes results in a mess of WYSIWYG with hacked up structure).
============================================================
The structure level of an editor is not necessarily related to its support of abstractions: we can place editors on a 2d graph. For example, Elody is pretty flat (since applying functions yields immediate flat results). On the other hand, vector graphic editors allow arbitrary structuring but no abstractions.
============================================================
We further distinguish between abstractions with and without structure as:
* Value abstractions (as in GCalc, similar to the simple lambda calculus).
* Structure abstractions, which are generated by referring to structural components of objects (implying an intensional world).
============================================================
The question is whether it is possible to have an editor that has both kinds of abstractions. BOOMS was initially started as a project that attempts to do this, while at the same time keeping the straightforwardness of a WYSIWYG GUI, and the other goodies.

A side goal was to design a clean system that is robust and easily extended, specifically, separate general editing functionality from domain specific knowledge. This resulted in a general system that we could use in three different toy worlds.
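
The distinction between value and structure abstractions drawn above can be sketched in a few lines. This is a toy model under stated assumptions, not the BOOMS or Elody implementation: the `Seq` class, the helper names, and the use of an index path to address a structural component are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Seq:
    """A structured object: an ordered list of parts (notes or nested Seqs)."""
    parts: list


def flatten(obj):
    """The extensional (flat) form: just the notes, structure discarded."""
    if isinstance(obj, Seq):
        return [note for part in obj.parts for note in flatten(part)]
    return [obj]


def value_abstraction(flat, value):
    """Abstract over a value: every note equal to `value` becomes the parameter."""
    return lambda arg: [arg if note == value else note for note in flat]


def structure_abstraction(obj, path):
    """Abstract over one structural component, addressed by a path of indices."""
    def rebuild(node, rest, arg):
        if not rest:
            return arg
        parts = list(node.parts)
        parts[rest[0]] = rebuild(parts[rest[0]], rest[1:], arg)
        return Seq(parts)
    return lambda arg: rebuild(obj, list(path), arg)


piece = Seq(["do", "re", "do"])
by_value = value_abstraction(flatten(piece), "do")
by_structure = structure_abstraction(piece, [0])

print(by_value("mi"))               # ['mi', 're', 'mi']
print(flatten(by_structure("mi")))  # ['mi', 're', 'do']
```

The value abstraction cannot tell the two `do' notes apart, since it only sees equal values; the structure abstraction refers to one specific component, which is only possible when the structure is kept.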
============================================================
One important feature for editing is identities: expressing the fact that two components are actually the same object rather than just happening to be observably equal. This is usually ignored, but one example from the commercial editing world is linked OLE objects.

Structure editors can support this, but extensional editors cannot since they use the ``flat form'' of the real object. Abstractions can play an additional role to achieve this, similar to the way a `let' expression can be written as a function application.

For example --- as we said above, variables provide sharing:
* `4' is a simple flat value.
* `2 * 2' is structured, but we don't know about any identities.
* Finally, we can distinguish between `let x = 2 in x * x' and `let x = 2 and y = 2 in x * y'.
[Note that this distinction has nothing to do with algebra, it results from the meaning of the written expression.]
============================================================
Here comes a BOOMS demo of hierarchical editing.
============================================================
Example for expressing identity with structures:
  figures: 4 -- a cube with some parts meant to be identical,
           6 -- two representations, the second is the right one.
Structure abstractions: parts
  figure: 7
Structure abstractions: application
  figure: 9
Some restrictions on the way we can abstract:
  figure: 8
============================================================
*** BOOMS double view part

It is clear now that structure editors using structure abstractions are more expressive than extensional editors using value abstractions. But there is one important advantage for extensional editing: concentrating only on the structure throws us (a little) back to programmer-world, while WYSIWYG editing puts the focus on what you really know, discarding external tools that provide notational information.
============================================================
For example, going back to the BOOMS example above, it is clear what the cube is --- you just look at it, but the BOOMS hierarchy is something you have to `parse' to know what it stands for.
  figure 10
Indeed, this is the way Elody works: you get to define abstractions without leaving the `real' world.

The conclusion: we want the best of both these worlds.
============================================================
Note that the constructor nodes of BOOMS can correspond to editing operations on an extensional editor. This makes the connection with the above description of Elody: let the users see a WYSIWYG environment with familiar editing operations.
============================================================
While users work, keep track of the operations they make to construct a hierarchy of editing operations, where nodes are objects that occur during the editing process. Copy-paste operations can construct references to the same object (using parts of different objects can be dealt with as in any structure editor: they need to be `exploded' first).
============================================================
Now, if the hierarchical abstraction is simultaneously accessible, users can switch to inspect the created hierarchy, and use it to create hierarchical abstractions. In short, we want an editor capable of a double-view.
  figures 11, 12 here
============================================================
In such an environment, our claim is that useful structures that will allow creating useful abstractions will pop up naturally, following the editing process. The hierarchy itself can be used too: if an object is pasted two additional times, we can modify it by `plugging' in different sub-parts and all instances will be modified accordingly. Obviously, structure abstractions are available.
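
The pasting scenario above can be sketched as follows. This is a minimal illustration under assumptions, not the actual BOOMS data model: the `Node` class, the `note` and `seq` constructors, and the `flat` method are hypothetical names.

```python
class Node:
    """One object in the editing hierarchy: a constructor plus child nodes."""

    def __init__(self, op, children=(), value=None):
        self.op = op                    # "note" or "seq" in this sketch
        self.children = list(children)  # sub-objects, possibly shared
        self.value = value              # note name, used only by "note" nodes

    def flat(self):
        """Extensional view: the flat note list a WYSIWYG editor would show."""
        if self.op == "note":
            return [self.value]
        return [n for child in self.children for n in child.flat()]


def note(name):
    return Node("note", value=name)


def seq(*children):
    return Node("seq", children)


motif = seq(note("do"), note("re"))
# Pasting the motif two additional times records references to the same
# node in the hierarchy, not copies of its notes.
piece = seq(motif, note("mi"), motif, motif)
print(piece.flat())  # ['do', 're', 'mi', 'do', 're', 'do', 're']

# Plugging a different sub-part into the shared motif changes all instances.
motif.children[1] = note("fa")
print(piece.flat())  # ['do', 'fa', 'mi', 'do', 'fa', 'do', 'fa']
```

An extensional editor would have stored only the first flat list, so the second edit would have to be repeated by hand at each pasted copy.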
============================================================
Note that there is no problem making this live peacefully with Elody's value abstractions --- all we need is to add an editor operation for (value) abstraction, and it becomes part of the editing hierarchy too. This is the reason BOOMS has `normal' abstraction and application nodes.

In fact, as said above, it does make sense to have value abstractions in cases where simple music pieces are entered through an external device rather than being constructed note-by-note. (Solves the problem of either constructing music note by note, or inputting small music blocks, losing their structure.) The fact that these are small pieces might make such function guessing more feasible (since the functions cannot be too complex). It might even be possible to detect for an input music block a previous one that is a simple transformation of it and add that link to the hierarchy.
============================================================
The way users interact with such a system is expected to move gradually:
* from a WYSIWYG-oriented mode,
* to understanding and using the hierarchy on occasions,
* to finally using it in its full power to create structure abstractions, while keeping the extensional convenience.
============================================================
Such a hybrid system has all the benefits of both editing modes, while keeping away from irrelevant language formalism.

Maybe it will happen some day...