Tuesday 23 November 2004

A lightweight 100% Java RDBMS


Introduction



IBM have a 100% pure Java relational database management system which  has been called at various stages in its history SQL/J, Cloudscape and Derby. IBM are now eagerly pushing the system to open source developers under the 'Cloudscape' label. I downloaded it to evaluate for use with PRES and other Jacquard applications.

License



I'm used to using (and creating) things which are open source. IBM claims Cloudscape is now 'open source', but if so it's some bizarre new definition of open source which is opaque to me. If you download Cloudscape from IBM you in fact have to click through (and it comes with) a software license file which looks as intimidating and onerous as any conventional software license. In fact what is going on here is that IBM have given a snapshot of the Cloudscape codebase to the Apache foundation, from which you may download it here. The Apache license is much more straightforward and less onerous than the IBM one.  The version of Cloudbase you can get from IBM appears to be based on the Apache version, but if you download from Apache you don't get the nice installer. To avoid confusion, I shall refer to the RDBMS throughout this review as 'Cloudscape'. I did not, this morning, find any significant difference in use between the IBM ('Cloudscape') and the Apache ('Derby') versions of the system.

First impressions



You can download Cloudscape from IBM in three different packages: a Linux installer which is huge and includes IBM's Java 1.4.2 for Linux; a Windows version which is similarly huge and includes IBM's Java 1.4.2 for Windows; and a 100% pure Java installer, which is sensibly small (9Mb) and sensibly assumes you wouldn't be interested if you didn't already have a JVM. This was the version I tried.

The pure Java installer (InstallShield) worked very nicely on Linux, offering sensible defaults. By contrast to so many open source projects, it looked very polished. Similarly, the PDF documentation looked very polished, very IBM. However - and this is a common gripe - the page numbering in the PDF was off, because the topmatter of the paper document uses a different numbering schema to the body and this different schema is not reflected in the PDF. So, for example, page 132 in the PDF maps onto page 120 of the document, which makes consulting the index or table of contents pretty frustrating. Hey, IBM, this is a small point but very easy to get right. Also of course you can't search the PDFs. What on earth is the point of distributing documentation in a digital format if it can't be searched? And a final gripe on documentation; the documentation index page has a link to online documentation, which I followed in the hope it would lead to searchable documentation. Unfortunately that was '404 not found'. And that, IBM, is simply incompetent.

Fortunately the documentation is available online at Apache: http://incubator.apache.org/derby/manuals/.

Following the instructions in the documentation, I then tried to start the Cloudscape executive, a program called 'ij'. The startup scripts had been automatically created and set up for me with the paths I had chosen for the installation.

But they didn't work.

Well, OK, that needs a bit of amplification. AIX, IBM's own UNIX, uses as its default shell the Korn shell, ksh. Debian Linux, which I use, uses as its default shell the Bourne Again shell, bash. Generally the syntax used by the two shells is so similar that that isn't a problem, but when I tried to invoke the ij script I got a class not found exception:

-[simon]-> /opt/ibm/Cloudscape_10.0/frameworks/NetworkServer/bin/ij.ksh
java.lang.ClassNotFoundException: com.ibm.db2.jcc.DB2Driver

Bizarrely, when I manually executed each of the commands in the scripts in turn, the ij executive started without problem. Clearly there is something in the scripts that bash does not like, but I haven't yet investigated what.

Features



Because of problems with the documentation discussed above, I can't be very definite about missing features; the features I sought may be present but I simply failed to find them in the documentation.

Users, groups and roles



Cloudscape clearly has the concept of a 'user', since it's possible to request the value of the current user; however you don't seem to be able to grant privileges to users, nor to revoke them:

ij> create user simon with password 'xyzzy';
ERROR 42X01: Syntax error: Encountered "user" at line 1, column 8.
ij> grant select on foo to app;
ERROR 42X01: Syntax error: Encountered "grant" at line 1, column 1.

You can pass in a username token and a password in the database URL. User validation is not performed by cloudscape, but cloudscape can be configured to co-operate with external validators. In practice, all using a different username appears to do is to select a different default schema.

The system appears to have no concept of a group or role.

Views



Cloudscape has views but not, it appears, view ordering:

ij> create view froboz as select ban from foo;
0 rows inserted/updated/deleted
ij> select * from froboz;
BAN
------------
froboz

ij> drop view froboz;
0 rows inserted/updated/deleted
ij> create view froboz as select ban from foo order by ban;
ERROR 42X01: Syntax error: Encountered "order" at line 1, column 43.
ij> select ban from foo order by ban;
BAN
------------
froboz

1 row selected

Constraints and Integrity



Cloudscape appears to have a remarkably full constraint syntax. I haven't verified that the constraints actually work. Provided they do, we can work with these data constraints

ij> alter table word
        add constraint word_head foreign key (head)
        references word
        on delete set null;
0 rows inserted/updated/deleted

Datatypes



There appears to be no BOOLEAN data type or equivalent, but we can work round this using CHAR(1) and the values 't' and 'f'; there is no MEMO or TEXT datatype, but there is a CLOB. There is a full set of DATE, TIME and TIMESTAMP datatypes; date format is 'yyyy-mm-dd'.

Conclusions



Cloudscape's big weakness from my point of view was security. There appears no way of setting different access permissions for different users. This means that all security must be in the application layer. Generally Jacquard applications are not built that way; instead, they're built on a database layer security model. Of course, security isn't always critical, and for many users of a PRES system, for example, HTTP authentication of the admin directory would be sufficient.

On the positive side, the system is very easy to install, reasonably easy to set up, and consumes relatively little in the way of machine resources.

The IBM version (Cloudscape) offered no benefit over the Apache version (Derby). Although Cloudscape comes with a slick and polished installer, what it installed did not actually work out-of-the-box; the documentation was in an inconvenient format which was hard to work with and the license terms were onerous. By contrast the Apache version (Derby) was a smaller download, in practice just as easy to set up and get running, and the Apache documentation although apparently based on the same source was constructed in HTML and much easier to use.

There appeared to be little functional difference between the two versions.

No comments:

Creative Commons Licence
The fool on the hill by Simon Brooke is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License