Customising the DSpace front page

A new installation of DSpace and the XMLUI interface has a default front page containing information about DSpace. It is intended that this be replaced with a description of the actual repository. It is necessary to customise the front page news item and the default static text throughout the repository.

DSpace front page

The front page is configured as an XML file. The XML is similar to XHTML in some respects, but not all HTML tags behave as they would in a web page. The data in this file is compiled into the DSpace web application during the build process. Most web browsers will attempt to render even badly formed XHTML; however, DSpace does not treat the file as XHTML but parses it as XML, which requires it to be well-formed. If the XML is invalid, for example because a closing </p> tag is missing, the parser fails silently and none of the content is displayed.

Although some XHTML markup works as expected, links cannot be written as anchor (<a>) tags. To include a link, use the following markup instead:

<xref target="http://myuni.edu/">My University</xref>

The front page XML file is stored at:

[DSpace build directory]/config/news-xmlui.xml

After editing, it also needs to be copied back to the source directory so that future builds include the change:

[DSpace source directory]/dspace/config
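Because an invalid news file fails silently, it can help to check that it is well-formed before copying it back. The following is a minimal sketch rather than part of DSpace: the function names are hypothetical, the two arguments stand for the [DSpace build directory] and [DSpace source directory] placeholders used above, and it assumes either `xmllint` (from libxml2) or `python3` is installed to do the parsing.

```bash
# Hypothetical helper (not part of DSpace).
# Return 0 if the file parses as XML, non-zero otherwise.
is_well_formed() {
  if command -v xmllint >/dev/null 2>&1; then
    xmllint --noout "$1" 2>/dev/null
  else
    # Fallback: Python's standard-library XML parser.
    python3 -c 'import sys, xml.dom.minidom; xml.dom.minidom.parse(sys.argv[1])' "$1" 2>/dev/null
  fi
}

# Copy news-xmlui.xml back to the source tree only if it is well-formed.
# $1 = [DSpace build directory], $2 = [DSpace source directory]
publish_news() {
  local news="$1/config/news-xmlui.xml"
  if ! is_well_formed "$news"; then
    echo "Not well-formed, not copied: $news" >&2
    return 1
  fi
  cp "$news" "$2/dspace/config/"
}
```

For example, `publish_news /dspace ~/dspace-1.7.2-src-release` (both paths are illustrative). A malformed file is reported and left where it is, so the source tree never receives a broken copy.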

Repository front page

Front page news item and message text configuration

DSpace default text

The DSpace application interface is built using a default list of text fragments stored in a resource file as constants. Our design uses the repository name ELOGeo in place of the generic "DSpace repository", so these text fragments had to be changed.

The messages file is an XML file found at:

[DSpace build directory]/webapps/xmlui/i18n/messages.xml

DSpace will need to be rebuilt for changes to take effect, so the edited file must also be copied back to the source location at: [DSpace source directory]/dspace-xmlui/dspace-xmlui-webapp/src/main/webapp/i18n.
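To find the strings that still mention DSpace, a rough sketch like the one below lists the keys of matching messages. It assumes each `<message key="...">text</message>` element sits on a single line, as is typical in messages.xml; the function name is hypothetical, and it only reports keys, leaving the actual wording changes to be made by hand.

```bash
# Hypothetical helper: print the key of every <message> whose text contains
# a term (default "DSpace"), as a checklist of strings to rebrand.
# Assumes one <message key="...">text</message> element per line.
list_branded_messages() {
  local file=$1 term=${2:-DSpace}
  grep -E "<message key=\"[^\"]+\">[^<]*${term}" "$file" |
    sed -E 's/.*<message key="([^"]+)">.*/\1/'
}
```

`list_branded_messages [DSpace build directory]/webapps/xmlui/i18n/messages.xml` prints one key per line.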

As with the Repository theme and asset files, we copied the messages.xml file from the source instance of DSpace.

One of the recurrent findings with this project is that we know what we want to change and we know how to change it – using images, CSS and XSL. The problem is finding where in DSpace we need to change files. Because DSpace sometimes needs to be rebuilt, we must also remember to copy back changes to the source code and the locations in the source are sometimes not the same as in the web application output.
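One way to take some of the guesswork out of this is to look for every copy of a file across the three locations and see which ones differ. The sketch below is a hypothetical helper, not a DSpace tool: it takes a file name and a list of directories, treats the first copy found as the reference, and uses `cmp` to mark each other copy as the same or different.

```bash
# Hypothetical helper: list every copy of a file under the given directories.
# The first copy found is the reference; later copies are compared with cmp.
# $1 = file name; remaining arguments = directories to search, in order.
find_copies() {
  local name=$1
  shift
  find "$@" -type f -name "$name" | {
    first=""
    while IFS= read -r f; do
      if [ -z "$first" ]; then
        first=$f
        echo "reference: $f"
      elif cmp -s "$first" "$f"; then
        echo "same:      $f"
      else
        echo "DIFFERENT: $f"
      fi
    done
  }
}
```

For example, `find_copies style.css [DSpace source directory] [DSpace build directory] [Tomcat directory]/webapps/xmlui`. A DIFFERENT line is a hint that an edit has not yet been copied back.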

Updating DSpace from another instance

Understanding where DSpace content is stored

DSpace repository content consists of a database and files known as assets. The database acts as a fast index and the asset object data is stored as files in their native format but without an extension.

In addition to importing the SQL dump obtained from the University of Nottingham into the database, we need to copy over the corresponding assets. If there is a delay between taking the SQL dump and copying the assets, the two may not be synchronised, so it may be worth taking the source site down to prevent editing while the content is exported. The asset files contain item content but also other information such as licence agreements. Although we took the assets at the same time as the SQL dump, when cross-checking we found that the number of asset folders was greater than the number of items in the SQL dump.
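Since licence agreements and other files are stored as bitstreams alongside item content, a fairer cross-check compares asset files with bitstream records rather than with items. The sketch below assumes the DSpace 1.x layout, in which each asset file is named after the `internal_id` of its row in the `bitstream` table; the query in the comment and the function name are assumptions to check against your own schema. It prints asset files that no live bitstream row refers to.

```bash
# Hypothetical helper: report asset files that no database record refers to.
# $1 = assetstore directory
# $2 = text file with one internal_id per line, assumed to come from the
#      DSpace 1.x bitstream table, for example:
#      psql -At -d dspacelm -c "SELECT internal_id FROM bitstream WHERE deleted = false" > ids.txt
orphan_assets() {
  local store=$1 ids=$2 tmp_files tmp_ids
  tmp_files=$(mktemp)
  tmp_ids=$(mktemp)
  # Asset files are named by internal_id, so keep just the file name.
  find "$store" -type f | sed 's#.*/##' | sort -u > "$tmp_files"
  sort -u "$ids" > "$tmp_ids"
  # Lines only in the first list: files on disk with no matching record.
  comm -23 "$tmp_files" "$tmp_ids"
  rm -f "$tmp_files" "$tmp_ids"
}
```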

DSpace Installation Key Locations

There are three important locations in our DSpace installation:

  1. The DSpace source directory;
  2. The DSpace build directory; and
  3. The Tomcat directory.

1. DSpace Source directory

The location of the downloaded DSpace source code. After unpacking the full source download, this folder is named after the DSpace version number, e.g. dspace-1.7.2-src-release/. Most changes to the DSpace configuration require DSpace to be rebuilt.

Any changes to the working web application therefore must also be made in this location.

2. DSpace Build directory

The DSpace build directory is populated when DSpace is built or rebuilt from the source code. The DSpace web application is output to the location specified by the dspace.dir property in [DSpace source directory]/dspace/config/dspace.cfg. This can be thought of as the compiled working application. The asset files with item content are stored here.

3. Tomcat directory

Named after the version of Apache Tomcat e.g. apache-tomcat-7.0.11. Tomcat is the Web Server that provides a Web interface to DSpace. In our implementation, the Web Application in the build folder is copied into the Tomcat webapps directory. (Copy the folder [DSpace build directory]/webapps/xmlui to [Tomcat directory]/webapps/). The XMLUI Web application interacts with the DSpace application in the build directory.

Tomcat can be configured to point directly at [DSpace build directory]/webapps. This avoids keeping a separate copy of webapps, but it does require changing the default Tomcat configuration.

Key Files

Asset Files

The ELOGeo asset files go under [DSpace build directory]/assetstore/. Inside are a number of numerically named directories containing the repository content. This content is not deleted when DSpace is rebuilt. It is neither necessary nor a good idea to copy the files back to the [DSpace source directory], because the files in the build directory are the current live files.

Theme Files

Several themes are available for DSpace. The ELOGeo repository uses a variation on the Mirage theme. A theme determines the look and feel of the DSpace interface, using XSL, JS and CSS files to control the layout and styling of the HTML markup. If any changes are made to the live web application, it is important that they be copied back to the source directory so they will be incorporated in future builds. The theme files will be present in three locations:

  1. Source code: [DSpace source directory]/dspace-xmlui/dspace-xmlui-webapp/src/main/webapp/themes
  2. Application: [DSpace build directory]/webapps/xmlui/themes
  3. Tomcat copy of application: [Tomcat directory]/webapps/xmlui/themes
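Copying edits back by hand is easy to forget, so a small helper can push a theme from the live Tomcat copy to the other two locations. This is a hypothetical sketch that assumes the three paths listed above. Note that `cp -R` only adds and overwrites files, so anything deleted from the live theme must also be deleted from the other copies by hand.

```bash
# Hypothetical helper: copy a theme edited in the live Tomcat copy back to
# the build and source directories so the next rebuild keeps the changes.
# $1 = theme folder name (e.g. elogeo), $2 = [Tomcat directory],
# $3 = [DSpace build directory], $4 = [DSpace source directory]
copy_theme_back() {
  local theme=$1
  local live="$2/webapps/xmlui/themes/$theme"
  local dest
  for dest in "$3/webapps/xmlui/themes" \
              "$4/dspace-xmlui/dspace-xmlui-webapp/src/main/webapp/themes"; do
    mkdir -p "$dest"
    # Copies into $dest/$theme, overwriting files that already exist there.
    cp -R "$live" "$dest/"
  done
}
```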

To use a specific theme, in our case “elogeo”, the following configuration file needs to be amended:

[DSpace source directory]/dspace/config/xmlui.xconf

The required theme needs to be the last entry in the <themes> element:

<theme name="[Theme name]" regex=".*" path="[theme directory name]/" />
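Because the required theme must be the last entry, a quick check of xmlui.xconf can confirm which theme is in that position. This sketch is hypothetical and assumes each `<theme .../>` element is written on one line; commented-out rules starting with `<!--` are skipped.

```bash
# Hypothetical check: print the name of the last active <theme> rule in
# xmlui.xconf, which should be the catch-all theme (regex=".*").
# Lines starting with "<!--" and the <themes> element itself are ignored.
last_theme() {
  grep -E '^[[:space:]]*<theme[[:space:]]' "$1" | tail -n 1 |
    sed -E 's/.*name="([^"]*)".*/\1/'
}
```

For example, `last_theme [DSpace source directory]/dspace/config/xmlui.xconf` should print elogeo.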

DSpace will need to be rebuilt for the changes to take effect. It is possible to edit the theme in use without rebuilding DSpace. This will be outlined in a later post on editing themes.

This work has enabled us to gain an understanding of how and where DSpace stores content and how the layout of its interface can be customised.

This demonstrates that there is a relatively easy way of copying one instance to another using the database dump, asset files and some configuration changes.

The relationship between the three file locations is not obvious at first. It is crucial to understand which files need to be copied back to the source code so that they are incorporated into future builds, and which files, such as the asset files, should not be copied back.

Configuring Apache Tomcat and Building DSpace

Apache Tomcat – Web Access

Web access to DSpace is provided by Apache Tomcat via the XMLUI web application (also known as Manakin), one of the user interfaces provided with DSpace. XMLUI is built on the Apache Cocoon framework. To configure this we need to set up the Tomcat host and connector as follows:

The configuration file for Apache Tomcat is located in the Tomcat directory, which is named after the version number (in our case apache-tomcat-7.0.11).

In our case the domain is elo-geo.net, but on a local installation this could be localhost.

[Tomcat Directory]/conf/server.xml

<Host name="[domain]"  appBase="webapps"…

We have pointed the server to the location of DSpace Web application (XMLUI) which we have copied to [Tomcat Directory]/webapps. This could equally point directly at our DSpace build directory like this:

<Host name="[domain]"  appBase="[dspace build directory]/webapps"…

Our hosting environment is shared within our organisation, which means that port numbers can be in use by other applications. The default port for Apache Tomcat is 8080. There are a number of Apache Tomcat instances running in this environment and port 8080 is often in use. In practice this means Tomcat does not always start up properly, because the operating system cannot assign the specified port. We changed the port to 8089 for this installation of DSpace to avoid clashes.

The port number is defined in the server.xml configuration file as a connector. We amended the entry for the default 8080 port:

<Connector port="8089" protocol="HTTP/1.1"
               connectionTimeout="20000"
               redirectPort="8443" />
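The same change can be scripted when setting up another instance. This sketch (hypothetical function name) edits only `port="8080"` on lines containing `<Connector`, so the shutdown port on the `<Server>` element and the AJP connector are left alone, and it keeps the original file as server.xml.bak. It writes via a backup copy rather than `sed -i`, whose syntax differs between GNU and BSD sed.

```bash
# Hypothetical helper: move the HTTP connector in server.xml off port 8080.
# $1 = path to server.xml, $2 = new port number.
# The original is kept as server.xml.bak.
set_http_port() {
  local conf=$1 new=$2
  cp "$conf" "$conf.bak"
  # Only rewrite port="8080" on lines that open a <Connector element.
  sed -E "/<Connector/ s/port=\"8080\"/port=\"$new\"/" "$conf.bak" > "$conf"
}
```

For example, `set_http_port [Tomcat directory]/conf/server.xml 8089`, then restart Tomcat.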

If you suspect that a port is already in use, you can list the sockets using it on UNIX with the following command:

netstat -an | grep [port number]
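A plain `grep 8089` also matches lines for ports such as 18089, and established connections as well as listening sockets. The filter below is a hypothetical helper that keeps only sockets in the LISTEN state whose local address ends in exactly the given port. It assumes the usual six-column `netstat -an` layout and accepts both the Linux `0.0.0.0:8089` and BSD `*.8089` address styles.

```bash
# Hypothetical filter for `netstat -an` output: print LISTEN sockets whose
# local address (column 4) ends in ":PORT" or ".PORT".
# Exit status is 0 if something is listening on the port, 1 otherwise.
listening_on() {
  awk -v p="$1" '$6 == "LISTEN" && $4 ~ ("[.:]" p "$") { print; found = 1 }
                 END { exit !found }'
}
```

For example, `netstat -an | listening_on 8089 && echo "8089 is taken"`.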

Building and Rebuilding DSpace

Building DSpace involves a Maven step followed by an Ant step; the XMLUI web application is then copied to the Tomcat directory and Tomcat is restarted.

#Stop Tomcat before rebuilding
[Tomcat directory]/bin/shutdown.sh

#Step 1: Maven build
cd [DSpace source directory]/dspace
mvn package

#Step 2: Ant update of the DSpace build directory
cd target/dspace-1.7.2-build.dir/
ant -Dconfig=config/dspace.cfg update

#Step 3: copy the XMLUI web application to Tomcat
cp -R [DSpace build directory]/webapps/xmlui [Tomcat directory]/webapps

#Step 4: start Tomcat again
[Tomcat directory]/bin/startup.sh

#Check that Tomcat is listening on its port
netstat -an | grep 8089

The build process timestamps and retains some directories as backups.
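Over repeated rebuilds these backups accumulate. The sketch below lists the ones that could be cleaned up, keeping the newest of each kind. It assumes backups are named `<folder>.bak-<timestamp>` with a sortable timestamp such as `20120915-103000`; check the names in your own build directory before relying on it. The function name is hypothetical, and it only prints paths, leaving deletion to you.

```bash
# Hypothetical helper: list timestamped backup folders in a directory,
# keeping the newest $2 (default 1) of each kind (webapps, lib, ...).
# Assumes names like webapps.bak-20120915-103000. Prints only; deletes nothing.
old_backups() {
  local dir=$1 keep=${2:-1}
  find "$dir" -maxdepth 1 -type d -name '*.bak-*' | sort -r |
    awk -v keep="$keep" '{ kind = $0; sub(/\.bak-.*/, "", kind)
                          if (++seen[kind] > keep) print }'
}
```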

The rebuilding process is clearly described in the DSpace documentation. Assuming a clean installation and that the default Apache Tomcat port 8080 is free, it works without problems. On shared hardware with other users, Java applications and even other Tomcat instances, starting DSpace can be problematic. The port clash comes from several Tomcat instances sharing the machine rather than from DSpace itself, but it is nevertheless a common problem.

Jorum and ELOGeo: a geospatial community window onto Jorum

Introduction

The ELOGeo OER Rapid Innovation project (AKA Breaking Down Barriers: Building a GeoKnowledge Community with Open Educational Resources) was proposed by the Landmap team at Mimas shortly after Ben Ryan and I started working at Jorum, and, crucially, just as Jorum was embarking on a big programme of enhancements to make a great leap forward before the end of 2012. This blog post will review how ELOGeo has been a crucial part of Jorum’s emerging plans to provide a JISC-funded shared service, and the journey the Jorum team went on with the Landmap team to make the ELOGeo vision for a repository happen with Jorum as the back end. I should note that, at this moment, due to a number of factors, the ELOGeo repository is not quite ready to launch to the community: more on that further into this blog post. And watch this blog for future announcements re timeline and roll-out for ELOGeo.

Back to the start: early plans

When Gail Millin-Chalabi and Kamie Kitmitto of Landmap approached us with their draft project proposal, we were excited by the possibilities for trialling something that is very important to Jorum: the idea of providing community-specific interfaces, or windows, onto Jorum, tailored for the needs of that community. The two main areas we had been thinking about for this were institutional OER repositories, and subject communities. At the time, we were also in negotiation with the FE community in Scotland, represented by Scotland’s Colleges, about providing them with an OER repository as they moved from a closed model to completely open content. We are now fully engaged in a project to deliver this repository, known as Re:Source, in mid-November; it is our first foray into providing tailored services for a fee, as we move toward a business model of co-funding between JISC and external organisations. Showing that we can offer value to the UK HE and FE community is crucial to us as JISC moves forward into its new structure. And offering added value that communities and organisations will be willing to pay for is part of the vision.

Where we were in early 2012: Jorum and DSpace

As we developed the project proposal with Gail and Kamie, Ben had only just got his hands on Jorum’s DSpace platform, and neither of us was particularly familiar with the technical possibilities. At that point Jorum had been delivered on a very old version of DSpace for a relatively short period of time, with some modifications developed to customise DSpace as a repository for educational resources. None of the developers who had made this happen were in the team when we arrived, so there was some catching up to do. One thing we gathered was that DSpace has READ and WRITE APIs – perfect, you’d have thought, for building a customised front end with Jorum’s main database behind it. Not so, as it turned out.

There were many problems with the old version of DSpace, with the modifications, and with the APIs. In the meantime, we stepped into the middle of a project to deliver an entirely new front end web interface, built on Ruby on Rails, and intended to avoid the DSpace interface altogether and give folk a much more friendly and usable experience of Jorum. This was meant to be built on the READ API and possibly SWORD for ingest, and a good deal of work had already been achieved – but there were things that needed ironing out. Not to mention the requirement that was asserting itself very rapidly: for users and other stakeholders to be able to access stats and other data about the OERs in Jorum and their use. The old Jorum DSpace used an open source stats package that was limited in terms of what we needed to achieve.

New DSpace, new Jorum

With the support of our Jorum Steering Group and some very timely enhancements funding from JISC, we embarked on our Summer of Enhancements 2012. Knowing that we would soon need to deliver really good repository services for Scotland’s Colleges and ELOGeo, and that we needed much better ingest, discoverability, APIs and stats, we set out to take the necessary step of porting DSpace 1.5.2, with the mods needed for OER support, to the most recent version of DSpace – DSpace 1.8.2. We had the funding, but not yet the full team in place, so we packaged up the work and brought in some DSpace expertise from Cottage Labs, Enovation and @Mire to assist Ben. We have monitored the risk associated with the upgrade and kept our programme manager informed throughout; and have not compromised the service to our users at any stage. We know that the resulting service will be a superior offering, and well worth the effort involved.

We have a lot to deliver on this year, and, despite the relatively small size of ELOGeo, this project is absolutely central to our business case to our community:

Jorum can help you meet your subject community’s need to share, promote and access relevant OERs – we can save you the costs of supporting the software and providing the information management expertise in-house – we can give your academics access to the wealth of OERs shared by others via Jorum, while foregrounding your own resources, vocabularies, priorities.

How do we provide these community windows?

When we signed up for ELOGeo, we didn’t yet have a clear set of technical options or services to offer. The really clear use case to us, and to Scotland’s Colleges, Landmap, and other communities and organisations we talk to, is:

* We want a community view onto Jorum that allows access to both the community’s specific resources, and, when required by the user, access to all of Jorum’s resources.

* And we want people using Jorum’s own interface (including searching and browsing, using our APIs and feeds, and harvesting or aggregating from Jorum) to have access to Jorum’s core collection and the OERs provided by the communities that have their own interfaces.

At the start we discussed a few ideas, in varying shades of “lightweightedness”, for providing tailored windows onto content held within Jorum.

1) Lens onto Jorum: This is the simplest concept: putting something specific onto the end of Jorum’s URL and getting a page with the relevant content, e.g.: http://www.jorum.ac.uk/history giving a view on resources tagged or classified ‘history’, or http://www.jorum.ac.uk/universityofstrathclyde showing all Jorum resources submitted by people working at Strathclyde.

2) DSpace Community within Jorum: One of the features of moving to a recent version of DSpace is the ability to set up different Communities, within which you can have Collections. Actually, the old DSpace allowed Communities and Collections too; they just weren’t as configurable. With the new version of DSpace we can customise the DSpace interface for each Community, both in terms of theme and text and in terms of browse and search features and other aspects of the user experience.

3) Customised front end built on an API: This assumes a good working set of APIs and standard interfaces (e.g. SWORD) to allow search and deposit to be presented through a website completely designed from scratch. In theory, the ideal; in practice, a lot more resource intensive.

4) Two DSpace instances sharing content: Still not sure how viable this is, but the ELOGeo content was originally held on another DSpace instance at Nottingham University, so we thought about just having two separate instances, and making sure users could access content from both by regular mutual aggregation of the database contents.

Best laid plans: a change of technical direction mid-project

As can be seen in this blog’s introductory post, we originally went with option 3) Customised front end built on an API:

ELOGeo OER diagram

Building a GeoKnowledge Community at Mimas

Boy, were we wrong! Rest assured, thanks to the Jorum Paradata Enhancement Project with Cottage Labs, Jorum will, when we roll out the results of our Summer of Enhancements, have an excellent API for READ and for stats, and we will be providing some usable ingest interfaces, and we will be eating our own API dogfood in providing our own new front end, but at the start of the ELOGeo project, we quickly determined that we would move forward with implementing ELOGeo as a Community within Jorum, with its own URL, visual identity and interface.

As required, ELOGeo users will be able to:

(a) Deposit content through this interface into the ELOGeo Community collections – but this content will be available to anyone accessing Jorum content also;

(b) Browse and search, using their own community-specific vocabularies etc., the ELOGeo-specific content held in their Community collections; and

(c) Expand any search or browse to include all relevant Jorum content in their search results.

At the time of writing we don’t have a screenshot to show you of the ELOGeo interface, but to give you an idea of what a DSpace Community interface built onto Jorum might look like, here is a screenshot of the current Beta of Scotland’s Colleges’ Re:Source repository, which, as you can see, has sub-communities within this Community in Jorum. NB: this will be a lot more developed with faceted browse vocabularies and so on by the time they launch in November.

Re:Source Beta: screenshot example of customised DSpace Communities interface

But it’s the end of the project: why haven’t we delivered?

Unfortunately, Landmap and hence the Breaking Down Barriers project lost their key developer (and the knowledge she had gained) at the start of the project; this led to overall delay and a project extension. Jorum received word of being funded to July 2013 in June and was able to begin the recruitment process for our own development team – but this was only partially successful and took us right to nearly the end of the project, with our first new developer not starting until mid-September.

Add to this the massive complexity of Jorum’s own in-house development task: pulling together the port to DSpace 1.8.2 with the requirements of Scotland’s Colleges, a few small-scale projects (of which ELOGeo was one), completing our own new front end, and developing and integrating a new stats dashboard, underlying search technology, and APIs, all managed by one Technical Manager. Our new Repository Application Developer started in mid-September, and we still have no Web Application Developer.

Technical lessons learned

To launch ELOGeo, we need to roll Jorum out on DSpace 1.8.2 in order to make the whole ELOGeo window-on-Jorum concept work. Technical complexities (I won’t say “unforeseen” because there are always technical complexities, you just can’t say in advance how onerous they will be) included:

  • working out how best to get the ELOGeo DSpace content out of its old repository into Jorum (see Landmap developer Bharti Gupta’s excellent technical post on this);
  • difficulties in porting the DSpace modifications needed to support the particular requirements of sharing OERs;
  • with some banality: difficulties in implementing different visual design themes into DSpace at the Community level. As you can see from the picture above, it is definitely possible to do, but it breaks stuff that then needs to be followed up and fixed.

We hope to follow up with a more detailed post on, in particular, the technical barriers and solutions to delivering an OER sharing service built on DSpace Communities. Keep an eye out for that one. NB: Landmap developer Will Standring has done some excellent posts on his technical achievements – really useful for anyone working with DSpace.

Meanwhile, I can say that I am thrilled, as Jorum Service Manager, that we have one method trialled and true for providing community and organisational windows onto content within Jorum – certainly the process will be a lot slicker going forward, and the wider HE and FE community will benefit from an increased wealth of open content for education. And we are happy to be openly sharing what we have learned so that others implementing educational content services using DSpace (or indeed, any community doing something similar) can move with ease through some of the barriers we encountered.

We welcome comments, questions and feedback, and would like to encourage you to get in touch with Jorum if you need further information on any of the above.

Database migration

Following my last post, we investigated the import/export mechanism and concluded that importing the database dump from Nottingham University into the Postgres database of our test DSpace 1.7.2 instance would be the easier route.

After much discussion and effort we obtained the database dump in SQL format, around 2 GB in size. The following commands were executed to load the data, and the DSpace database user was then changed to elogeo in the <user directory>/elogeodspace/config/dspace.cfg file. Apache and Tomcat were stopped and restarted.

-bash-3.2$ createuser -U zzelogeo -d -P
Enter name of role to add: elogeo
Enter password for new role:
Enter it again:
Shall the new role be a superuser? (y/n) y
-bash-3.2$ createuser -U zzelogeo -d -P postgres
Enter password for new role:
Enter it again:
Shall the new role be a superuser? (y/n) y
-bash-3.2$ createdb -E UTF8 -T template0  dspacelm

-bash-3.2$ psql dspacelm < <user directory>/db_dump/elogeodump.sql
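For a repeat run, the same load can be done non-interactively. This is a sketch with a hypothetical function name: `-O` makes the DSpace database user the owner of the new database, and `-v ON_ERROR_STOP=1` makes `psql` stop at the first error instead of carrying on through a 2 GB dump, which is its default behaviour. Setting `DRY_RUN=1` prints the commands instead of running them.

```bash
# Hypothetical wrapper: with DRY_RUN set, print each command instead of running it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Create a UTF-8 database owned by the DSpace user and load a SQL dump into it,
# stopping at the first SQL error.
# $1 = database name, $2 = owning role, $3 = path to the SQL dump
load_dump() {
  run createdb -E UTF8 -T template0 -O "$2" "$1" &&
    run psql -v ON_ERROR_STOP=1 -d "$1" -f "$3"
}
```

For example, `load_dump dspacelm elogeo <user directory>/db_dump/elogeodump.sql`.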

We then noticed a memory issue, so we ran the following command and stopped and restarted Tomcat, but it did not help much:

-bash-3.2$ JAVA_OPTS=-Xmx2048m
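One possible reason the setting had little effect: a shell variable assigned without `export`, as above, is not passed to programs started later, so Tomcat's startup script would not see it. Tomcat's documented convention is to put such options in `bin/setenv.sh`, which `catalina.sh` reads if it exists, and to use `CATALINA_OPTS` for options that only the running server needs. The sketch below (hypothetical function name) writes that file, keeping any existing one as `setenv.sh.bak`; the heap size is passed in as an argument.

```bash
# Hypothetical helper: make a heap setting persistent for Tomcat by writing
# [Tomcat directory]/bin/setenv.sh, which catalina.sh sources at start-up.
# $1 = [Tomcat directory], $2 = heap size (e.g. 2048m)
set_tomcat_heap() {
  local tomcat=$1 heap=$2
  local f="$tomcat/bin/setenv.sh"
  if [ -f "$f" ]; then cp "$f" "$f.bak"; fi
  # $CATALINA_OPTS stays literal so any options set elsewhere are preserved.
  printf 'export CATALINA_OPTS="$CATALINA_OPTS -Xmx%s"\n' "$heap" > "$f"
}
```

For example, `set_tomcat_heap [Tomcat directory] 2048m`, then restart Tomcat.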

This is a very common Java problem of heap space and we should be able to sort this out very soon. We will keep you posted on further developments.

Data plans

Amir from Nottingham University has shared some documentation on importing and exporting items in the DSpace repository.

Our preliminary plan is to test a bulk import at the Mimas end first, and then get the items from Nottingham. We need to understand this export/import mechanism well, and it is not well documented.
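For the trial bulk import, DSpace 1.x provides command-line export and import of items in the Simple Archive Format via `[dspace]/bin/dspace export` and `[dspace]/bin/dspace import`. The wrappers below are a hedged sketch: the function names are hypothetical, the options follow the DSpace 1.x import/export documentation as we understand it and should be checked against your version, and `--test` asks the importer to simulate the import without writing anything. With `DRY_RUN=1` they only print the commands.

```bash
# Hypothetical wrapper: with DRY_RUN set, print each command instead of running it.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

# Source site: export a collection to a directory of Simple Archive Format items.
# $1 = [dspace] directory, $2 = collection handle or ID, $3 = destination directory
export_collection() {
  run "$1/bin/dspace" export --type=COLLECTION --id="$2" --dest="$3" --number=1
}

# Target site: trial-import the exported items (--test simulates, writes nothing).
# $1 = [dspace] directory, $2 = e-person email, $3 = collection handle,
# $4 = source directory, $5 = mapfile path
import_items() {
  run "$1/bin/dspace" import --add --test --eperson="$2" --collection="$3" \
    --source="$4" --mapfile="$5"
}
```

Dropping `--test` performs the real import; the mapfile it writes records which item directory received which handle, which helps if an import has to be reversed.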

Once we have a database dump, we will try to set up a mechanism to keep in sync with any data updates and uploads, so that the Nottingham and Mimas DSpace sites mirror each other.

Watch this space for more about the data migration.