Updating DSpace from another instance

Understanding where DSpace content is stored

DSpace repository content consists of a database and files known as assets. The database acts as a fast index and the asset object data is stored as files in their native format but without an extension.

In addition to importing the SQL dump obtained from the University of Nottingham into the database, we need to copy over the corresponding assets. If the SQL dump and the assets are taken at different times, they may not be synchronised. It may be worth taking the source site down to prevent editing whilst exporting the content. The asset files contain item content but also other information such as licence agreements. Although we took the assets at the same time as the SQL dump, when cross-checking we found that the number of asset folders was greater than the number of items in the SQL dump.
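
A cross-check of this kind can be scripted. The following is only a minimal sketch, not the procedure we used: the function names are our own invention for illustration, and the database count has to be obtained separately and passed in.

```shell
#!/bin/sh
# Count the regular files below an assetstore directory.
# $1 = assetstore path (hypothetical example: /path/to/dspace/assetstore)
count_asset_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Compare the file count with a row count obtained separately from the database.
# $1 = assetstore path, $2 = number of rows reported by the database
compare_counts() {
    files=$(count_asset_files "$1")
    if [ "$files" -eq "$2" ]; then
        echo "match: $files"
    else
        echo "mismatch: $files files, $2 database rows"
    fi
}
```

The database figure could come from a query such as psql -At -c "SELECT count(*) FROM bitstream" run against the imported database. The bitstream table name is our assumption about the DSpace schema and should be checked against the actual dump. A mismatch is not necessarily an error, since the assets include more than item content.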

DSpace Installation Key Locations

There are three important locations in our DSpace installation:

  1. The source release directory
  2. The DSpace build directory
  3. The Tomcat directory

1. DSpace Source directory

The location of the downloaded DSpace source code. After unpacking the full source download this folder will be named after the DSpace version number e.g. dspace-1.7.2-src-release/. Most changes to the DSpace configuration require that it is rebuilt.

Any changes to the working web application therefore must also be made in this location.

2. DSpace Build directory

The DSpace build directory is populated when DSpace is built or rebuilt from the source code. The DSpace Web application is output to the location specified in [DSpace source directory]/dspace/config/dspace.cfg. This can be thought of as being the compiled working application. The asset files with item content are stored here.

3. Tomcat directory

Named after the version of Apache Tomcat e.g. apache-tomcat-7.0.11. Tomcat is the Web Server that provides a Web interface to DSpace. In our implementation, the Web Application in the build folder is copied into the Tomcat webapps directory. (Copy the folder [DSpace build directory]/webapps/xmlui to [Tomcat directory]/webapps/). The XMLUI Web application interacts with the DSpace application in the build directory.

Tomcat can be configured to point directly at [DSpace Build Folder]/webapps. This would seem simpler than having the copy of webapps – but it does require changing the default Tomcat configuration.

Key Files

Asset Files

The ELOGeo asset files go under [DSpace build directory]/assetstore/. Inside will be a number of numeric directories containing the repository content. When DSpace is rebuilt this content will not be deleted. It is neither necessary nor a good idea to copy the files back to the [DSpace Source Directory], because the files in the Build Directory are the current live files.

Theme Files

Several themes are available for DSpace. The ELOGeo repository uses a variation on the Mirage theme. This determines the look and feel of the DSpace interface using XSL, JS and CSS files to control the layout and styling of the HTML markup. If any changes are made to the live Web application, it is important that they be copied back to the Source directory so they will be incorporated in future builds. The theme files will be present in three locations:

  1. Source Code: [DSpace source release directory]/dspace-xmlui/dspace-xmlui-webapp/src/main/webapp/themes
  2. Application: [DSpace build directory]/webapps/xmlui/themes
  3. Tomcat copy of application: [Tomcat directory]/webapps/xmlui/themes
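
Copying an edited theme from the live application back into the source tree could look something like the sketch below. The function name and its arguments are hypothetical; the two themes directories are whichever of the locations listed above apply.

```shell
#!/bin/sh
# Copy one theme directory from the live application back into the source tree.
# $1 = themes directory of the live application
# $2 = themes directory in the source tree
# $3 = theme directory name
copy_theme_back() {
    live="$1/$3"
    if [ ! -d "$live" ]; then
        echo "theme not found: $live" >&2
        return 1
    fi
    mkdir -p "$2/$3"
    # "/." copies the contents of the live theme, including dotfiles,
    # into the destination directory rather than nesting a second copy inside it.
    cp -R "$live/." "$2/$3/"
}
```

For example: copy_theme_back "[Tomcat directory]/webapps/xmlui/themes" "[DSpace source release directory]/dspace-xmlui/dspace-xmlui-webapp/src/main/webapp/themes" elogeo. Note that this overwrites files of the same name in the source tree, so it is worth keeping a backup first.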

To use a specific theme, in our case “elogeo”, the configuration file needs to be amended:

[DSpace source directory]/dspace/config/xmlui.xconf

The required theme needs to be the last entry in the <themes> element.

<theme name="[Theme name]" regex=".*" path="[theme directory name]/" />

DSpace will need to be rebuilt for the changes to take effect. It is possible to edit the theme in use without rebuilding DSpace. This will be outlined in a later post on editing themes.

This work has enabled us to gain an understanding of how and where DSpace stores content and how the layout of its interface can be customised.

This demonstrates that there is a relatively easy way of copying one instance to another using the database dump, asset files and some configuration changes.

The relationship between the three file locations isn’t obvious at first. It is crucial to understand which files need to be copied back to the source code so that they are incorporated into future builds, and which files should not be copied back.

Configuring Apache Tomcat and Building DSpace

Apache Tomcat – Web Access

Web access to DSpace is provided by Apache Tomcat via the XMLUI web application (also known as Manakin). This interface is one of those provided with DSpace. It is based on the Apache Cocoon architecture. To configure this we need to set up the Tomcat host and connector as follows:

The configuration file for Apache Tomcat is located in the Tomcat directory, which is named after the version number, in our case apache-tomcat-7.0.11.

In our case the domain is elo-geo.net but on a local installation this could be localhost.

[Tomcat Directory]/conf/server.xml

<Host name="[domain]"  appBase="webapps"…

We have pointed the server to the location of the DSpace Web application (XMLUI), which we have copied to [Tomcat Directory]/webapps. This could equally point directly at our DSpace build directory like this:

<Host name="[domain]"  appBase="[dspace build directory]/webapps"…

Our hosting environment is shared within our organisation, which means that port numbers can already be in use by other applications. The default port for Apache Tomcat is 8080. Several Apache Tomcat instances run in this environment, so port 8080 is often in use. In practice, this means Tomcat will not always start up properly because it cannot bind to the specified port. We have changed the port to 8089 for this installation of DSpace to avoid clashes.

The port number is defined in the server.xml configuration file as a connector. We amended the entry for the default 8080 port:

<Connector port="8089" protocol="HTTP/1.1"
               connectionTimeout="20000"
               redirectPort="8443" />

If there is a suspicion that a port is in use, the traffic can be viewed on UNIX with the following command.

netstat -an | grep [port number]
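
A plain grep also matches longer numbers that contain the port (8089 would match 18089, for instance). As a rough sketch, the check can be tightened with a small helper that reads netstat output and looks only at listening sockets. The function name port_listening is our own, and the pattern assumes the usual netstat layouts in which the local address ends in ":port" or ".port".

```shell
#!/bin/sh
# Succeeds if netstat-style output on stdin shows a listener on port $1.
# The port must appear at the end of an address field, after ":" or ".",
# so that 8089 does not also match 18089 or 80890.
port_listening() {
    awk -v port="$1" '
        /LISTEN/ {
            for (i = 1; i <= NF; i++) {
                if ($i ~ ("[.:]" port "$")) { found = 1 }
            }
        }
        END { exit found ? 0 : 1 }
    '
}
```

It could be used as: netstat -an | port_listening 8089 && echo "port 8089 is already taken"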

Building and Rebuilding DSpace

Building DSpace involves a Maven process followed by an Ant process, then copying the web application to the Tomcat directory and starting Tomcat again.

#Stop Tomcat
apache-tomcat-7.0.11/bin/shutdown.sh

#Build 1: Maven process
cd [DSpace source directory]/dspace
mvn package

#Build 2: Ant process
cd target/dspace-1.7.2-build.dir/
ant -Dconfig=config/dspace.cfg update

#Back out to root directory
cd ../../../../

#Copy to Tomcat
cp -R [DSpace build directory]/webapps/xmlui [Tomcat directory]/webapps

#Start Tomcat again (it was stopped before the build)
[Tomcat directory]/bin/startup.sh

#Check port usage
netstat -an | grep 8089

The build process will time stamp and retain some directories as a backup.

The rebuilding process is clearly described in the DSpace documentation. Assuming a clean installation and that the default Apache Tomcat port of 8080 is available, this will work without problems. On hardware shared with other users, other Java applications and even other instances of Tomcat, there can be problems starting DSpace. Since the port number issue stems from several users running Tomcat, it is not really a DSpace issue, but it is nevertheless a common problem.

Database migration

Following my last post, we investigated the import/export mechanism and discovered that the database dump from the University of Nottingham would be easier to import into the Postgres database of the test installation of DSpace 1.7.2.

After much discussion and effort we managed to get the database dump in SQL format, which is around 2 GB in size. The following commands were executed to load the data, and the dspace user was then updated to elogeo in the <user directory>/elogeodspace/config/dspace.cfg file. Apache and Tomcat were stopped and restarted.

-bash-3.2$ createuser -U zzelogeo -d -P
Enter name of role to add: elogeo
Enter password for new role:
Enter it again:
Shall the new role be a superuser? (y/n) y
-bash-3.2$ createuser -U zzelogeo -d -P postgres
Enter password for new role:
Enter it again:
Shall the new role be a superuser? (y/n) y
-bash-3.2$ createdb -E UTF8 -T template0  dspacelm

-bash-3.2$ psql dspacelm < <user directory>/db_dump/elogeodump.sql

We noticed a memory issue, so we ran the following command and then stopped and restarted Tomcat, but it did not help much.

-bash-3.2$ JAVA_OPTS=-Xmx2048m
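
One possible reason the setting had little effect is that a variable assigned at the prompt without export is not passed on to programs started afterwards, unless it was already exported in that shell. We have not confirmed that this was the cause here. As a sketch of an alternative, Tomcat's catalina.sh reads a bin/setenv.sh file at startup if one exists, so the option can be placed there. The function name write_setenv is hypothetical, and the heap size is simply the value from the command above.

```shell
#!/bin/sh
# Write a bin/setenv.sh under the given Tomcat directory.
# $1 = Tomcat directory, $2 = maximum heap size (for example 2048m)
write_setenv() {
    mkdir -p "$1/bin"
    cat > "$1/bin/setenv.sh" <<EOF
#!/bin/sh
# Read by catalina.sh at startup; exported so that the JVM process sees it.
export JAVA_OPTS="\$JAVA_OPTS -Xmx$2"
EOF
    chmod +x "$1/bin/setenv.sh"
}
```

For example: write_setenv "[Tomcat directory]" 2048m, followed by a Tomcat restart. This overwrites any existing setenv.sh, so check for one first. Whether 2048m is an appropriate size depends on the memory available on the host.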

Running out of heap space is a very common Java problem and we should be able to sort this out very soon. We will keep you posted on further developments.