This is part two of a series on Alfresco Content Services Production Deployment. Part one of this series reviewed the methods that Hyland makes available for installing Alfresco. This blog considers the deployment of a single Alfresco Content Services (ACS) instance, with required platform components, to a single production host using Docker Compose. A future article will address the use of Docker Compose to deploy several instances to multiple hosts in a cluster.
Much of the information covered here applies to bespoke containers; however, the specific focus is on using the containers provided by Hyland/Alfresco, so bespoke containers are otherwise out of scope for this article.
The information in this article is applicable to the Starter Edition or Business Edition of Alfresco Content Services. It is also a starting point for running Enterprise Edition. Anyone interested in running Community Edition in production will find this helpful. However, it has not been validated against community releases.
Remember that it is not recommended to run production Alfresco Content Services on Windows hosts or on ARM-based Linux, because the images provided by Hyland/Alfresco are based on Intel Linux. In theory, it should be possible to rebuild the containers on Windows/ARM base images; however, doing so is not recommended for production use until it has been fully tested.
Before we dive into the specifics of building a docker-compose.yaml file for Alfresco Content Services, let's consider a compose file for a generic Java service:
```yaml
version: "2.2"

services:
  java-server:
    image: example/java-service:12.3.4
    # 1 - Limit to 4 of the host's CPU cores
    cpus: 4.0
    # 2 - Limit how much host memory the container can use
    mem_limit: 4g
    # 3 - If the service crashes or we reboot, restart!
    restart: always
    environment:
      APP_PORT: 8080
      # 4 - Limit JVM to a % of the container RAM
      JAVA_OPTS: "-server -XX:MinRAMPercentage=80 -XX:MaxRAMPercentage=80"
    ports:
      # 5 - Bind to port on loopback to control access
      - "127.0.0.1:8080:8080"
    volumes:
      # 6 - Mount persistent storage into container
      - ${PROJ_DIR}/java-server/data:/container/data:z
      # 6 - Mount a configuration file into container
      - ${PROJ_DIR}/config/server.yml:/container/config/server.yml:ro
    networks:
      # 7 - Use a named network to control the subnet
      - project_bridge

networks:
  project_bridge:
    ipam:
      driver: default
      config:
        - subnet: 192.168.0.0/24
```
This specifies the older compose file format version 2.2, which is fully compatible with the most recent versions of Docker Compose. Some of the resource limiting directives used in more recent versions of the compose file format require that you use Docker Swarm as your deployment orchestrator. By using the older file format, the limits will apply when we deploy using Docker without Swarm orchestration.
TIP: There is a --compatibility flag for docker-compose up that allows limits specified in the newer 3.x file format to be applied without using Docker Swarm. However, it is unclear which limits are supported, and multiple references warn against using this flag in production. The 2.x file format is the best choice when not using Docker Swarm to orchestrate your deployment.
This example demonstrates a number of techniques that can be used when building a compose file for running Alfresco Content Services in production. Considering the resource limits in this example, imagine it is run on a machine with many cores and a lot of RAM. Below are some things to consider:
- Limiting the container to four of the host’s available CPU cores protects against the container using cores needed for other services.
- Limiting the memory available to the container to 4g stops a memory hungry container from taking resources away from other containers.
- Setting a service restart policy to “always” makes sure that Docker will restart the associated container if it crashes. When Docker is set up as a service that starts on boot, this setting also means the container will be started automatically when the server is rebooted.
- Specifying the -XX:MinRAMPercentage=80 and -XX:MaxRAMPercentage=80 Java options limits the JVM to using 80% of the container's memory (i.e., about 3.2G), leaving a little memory for demands outside the JVM memory space.
- Exporting a port by specifying “127.0.0.1:8080:8080” makes the service available via the Docker host’s loopback interface without exposing the port to other hosts on the network. Use something like HAProxy, NGINX, Apache httpd, or even a local iptables firewall to expose the service on the network in a controlled way.
- Utilizing volumes provides the container with persistent data storage and with customized configuration, here by adding or replacing the server.yml file.
- Using a named network assigns containers to a specific subnet. This allows a proxy/firewall running directly on the host to control traffic from/to the container.
Assigning a specific IP address to each container would enable more fine-grained control. This makes it possible to forgo exporting container ports to the loopback interface and means the host’s proxy/firewall rules can be set up to work directly with each container. This is a more advanced/complex technique that will not be necessary for most deployments and hence is outside of the scope of this article.
Resources
The memory and CPU limits specified in the ACS Docker Compose files provided by Hyland/Alfresco are intended for a development environment. Additionally, these files specify a value for MinRAMPercentage that differs from MaxRAMPercentage. In general, it is a best practice for production Java servers to set the minimum memory equal to the maximum memory so that all allocation can happen at startup.
The resource needs for each deployment vary dramatically depending on the use of the system. A small departmental deployment might start with 16 cores and 32G of RAM. In this case, start by setting up the compose file with resource limits described below. Monitor the system during development, testing, and UAT. Then, adjust the machine sizing and resource allocations as necessary for the production rollout. After release, continue to monitor these resources in production in case further adjustment is needed.
Here is a sample resource allocation for a small departmental production deployment:
| Container | CPU Cores | Container Memory | JVM Memory |
| --- | --- | --- | --- |
| Content Services | 4 * | 8G | 80% |
| Search Services | 4 | 8G | 35% ** |
| Share | † | 512M | 80% |
| Digital Workspace | † | 512M | † |
| ActiveMQ | † | 1G | † |
| Transform Router | † | 512M | 80% |
| Transform Core AIO | 4 | 8G | 80% |
| Shared File Store | † | 512M | 80% |
* When running on a host that has more cores than the Alfresco license allows, it is important to specify an appropriate CPU limit to stay in compliance with the license agreement.
** Solr uses JVM memory for caches, and operating system memory, via memory-mapped IO, for accessing the index. In an ideal world, the OS memory would hold the full index and the JVM memory would hold all of the caches. Getting the memory allocation and balance between JVM and OS memory right is an art. Start by allocating only 35% of the container memory to the JVM, leaving the remaining 65% for mapping the index into OS memory. Because of inconsistencies in how the various containers are built, the memory allocation for this container is specified slightly differently than for the others: use the environment variable SOLR_JAVA_MEM="-XX:MinRAMPercentage=35 -XX:MaxRAMPercentage=35" rather than JAVA_OPTS.
† Some containers do not need specific limits. Most of the time these containers need minimal resources and specifying precise fractions can be a burden. Not specifying limits allows the host operating system to control the resource allocation and balance.
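To make this concrete, here is a sketch of how the Content Services row above translates into compose directives (the image name matches the Hyland/Alfresco enterprise registry, but the tag is illustrative; use the version you have licensed and tested):

```yaml
  alfresco:
    # Tag is illustrative
    image: quay.io/alfresco/alfresco-content-repository:7.1.0.1
    cpus: 4.0       # stay within the licensed core count
    mem_limit: 8g   # container memory from the table
    environment:
      # min equals max so the JVM claims its 80% share at startup
      JAVA_OPTS: "-server -XX:MinRAMPercentage=80 -XX:MaxRAMPercentage=80"
```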
Volumes
Volumes are mounted into containers for two main reasons: persistence and configuration/extension.
When an image is run, an ephemeral copy-on-write filesystem is allocated by Docker for use by the container. Volumes allow data stored in certain locations in the container to persist when the container terminates. This allows important data to be available when restarting the container, or starting a new version of the container. For example, volumes are used to persist database files for a database container. Without volumes, any data written to the database is lost when the container is restarted.
NOTE: When configuring a clustered environment, some persistent storage must also be readable and writable by multiple containers on multiple hosts. Watch for a subsequent part of this series for more detail.
The second reason to mount volumes into containers is to introduce or replace configuration files or scripts in the container when it is started.
For example, a Tomcat container might need different configurations in the server.xml in different instances of the container created from the same image. For deployments with several environments (e.g. production, pre-production, development), volumes may be used to set environment specific configuration.
If the image being used does not expose the ability to make necessary configurations via environment variables, there are several techniques that can help:
- Extend the container to hardcode new values.
- Add support for new environment variables via a new or extended startup script.
- Use the provided image, but mount over configuration or scripts contained in the image with something more specific to the use case (see the sketch below).
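As a sketch of the last technique, a volume entry can shadow a single file inside the image; here an environment-specific Tomcat server.xml (the path matches the Share row in the table below) replaces the stock copy:

```yaml
  share:
    volumes:
      # Mount over the stock server.xml with an environment-specific copy
      - ${PROJ_DIR}/config/share/server.xml:/usr/local/tomcat/conf/server.xml:ro
```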
There are many places where the above techniques can adjust the Alfresco configuration for an optimized production environment. See the table below for some of the most common and useful mount points.
Remember there are also times to extend the vendor provided container for Content Services to install customizations, add appropriate database drivers, install custom certificates, etc.
| Container | Mount | Notes |
| --- | --- | --- |
| Content Services | /usr/local/tomcat/alf_data | When clustering, the storage mounted here must be cross mountable between each of the hosts running Content Services containers. |
| Content Services | /usr/local/tomcat/webapps/alfresco/WEB-INF/classes/alfresco/extension/license/<your license file>.lic | It can be mounted even though the license folder doesn't exist by default in the image. |
| Content Services | /usr/local/tomcat/lib/{database-jdbc-driver}.jar | When not using Postgres, install the correct JDBC driver to support the database that is being used. |
| Content Services | /usr/local/tomcat/webapps/alfresco/WEB-INF/lib/ | Mount JARs that will be used by Content Services here. |
| Content Services | /usr/local/tomcat/amps/{individual}.amp files, or collect all the default AMPs into a directory, add your custom AMPs, and mount the whole directory over /usr/local/tomcat/amps | Ideally, customizations are packaged as JARs. If they have to be packaged as AMPs, prefer creating a new image that extends the one provided by Hyland/Alfresco. In the event you must mount AMPs into the container, inject a startup script that does roughly this: java -jar /usr/local/tomcat/alfresco-mmt/alfresco-mmt*.jar install /usr/local/tomcat/amps /usr/local/tomcat/webapps/alfresco -directory -nobackup -force && catalina.sh run -security and replace the default CMD in the alfresco compose service with a call to the script (see the sketch after this table). This will affect startup time, as the apply-AMPs process runs each time the container is started. |
| Content Services | /usr/local/tomcat/shared/classes/alfresco/extension/custom-log4j.properties | Extend the log4j configuration. Increase log levels for troubleshooting; decrease logging to control log size. |
| Content Services | /usr/local/tomcat/shared/classes/alfresco-global.properties | Generally, it helps to use the JAVA_OPTS environment variable to pass in global properties. Replacing the entire file may be easier when setting a large number of properties. |
| Search Services ** | /opt/alfresco-search-services/data | Mount this on high performance storage, and do not run antivirus or other similar tools against that storage, as they could end up locking index files while they are in use. |
| Search Services ** | /opt/alfresco-search-services/solr/bin/search_config_setup.sh | The alfresco and archive cores are not in the Search Services image. They are produced the first time the container starts by a script that copies template configuration into core-specific subdirectories. Mounting replacement configuration files for this bootstrap process is challenging due to how the images are built; mounting an edited version of this startup script may be an option. |
| Share | /usr/local/tomcat/conf/server.xml | When proxying HTTPS into this container, add and configure the org.apache.catalina.valves.RemoteIpValve valve in this file. If the edits to this file are stable between environments, prefer extending the image provided by Hyland. |
| Share | /usr/local/tomcat/shared/classes/alfresco/web-extension/share-config-custom.xml | Many Share configurations can be made through this file. It is typical to extend the Share image to perform these configurations. |
| Share | /usr/local/tomcat/shared/classes/alfresco/web-extension/custom-slingshot-application-context.xml | Edit this configuration to support Share clustering. |
| ActiveMQ | /opt/activemq/data | When clustering, the storage mounted here must be cross mountable between each of the hosts running ActiveMQ containers. |
| Shared File Store | /tmp/Alfresco | Note that many sources append /sfs to the mount path provided here. Those are wrong; this is the correct mount point for Shared File Store as recently as ACS 7.1.0.1. When clustering, the storage mounted here must be cross mountable between each of the hosts running Shared File Store containers. |
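The startup script referenced in the AMP row might look roughly like this sketch; the MMT command is the one quoted in the table, while the script name and mount location are illustrative:

```bash
#!/bin/sh
# apply-amps-and-start.sh - install mounted AMPs, then hand off to Tomcat.
# Runs on every container start, which lengthens startup time.
java -jar /usr/local/tomcat/alfresco-mmt/alfresco-mmt*.jar install \
  /usr/local/tomcat/amps /usr/local/tomcat/webapps/alfresco \
  -directory -nobackup -force \
  && exec catalina.sh run -security
```

In the compose file, mount the script into the alfresco service and replace the default CMD with something like: command: ["/bin/sh", "/scripts/apply-amps-and-start.sh"].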
NOTE: There are two items to be aware of with Search Services:
First, in recent releases of Search Services, possibly starting with 2.0.2, the ability to specify the backup location for indexes via a REST call was removed for security reasons. This means a backup location configured in the admin console will not be honored. Unfortunately, the default backup location is /opt/alfresco-search-services, which is not a good place to mount a volume. You can arrange for the backups to be placed into a dedicated backup directory, where a volume can be mounted, by adding two lines right before the last (bash -c "$@") line in search_config_setup.sh in the Search Services container:
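The exact lines are not reproduced here; as a hedged sketch of the shape of the change, they create a dedicated directory and point the core templates at it before the cores are bootstrapped (the solr.backup.dir property name and the template path are assumptions, so verify them against your Search Services version):

```bash
# Sketch only - property name and template path are assumptions
mkdir -p /opt/alfresco-search-services/solrbackup
echo "solr.backup.dir=/opt/alfresco-search-services/solrbackup" \
  >> /opt/alfresco-search-services/solrhome/templates/rerank/conf/solrcore.properties
```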
With the above code in the configuration setup script, you will be able to mount a volume to /opt/alfresco-search-services/solrbackup. Solr backups will then be placed into your volume.

Second, the Search Services images have a feature called the suggester that is enabled by default. This supports autocompletion on the search detail screen in Share. It is a very heavy component that adds a small amount of value to the system. In almost every case, this feature should be disabled by passing -Dsolr.suggester.enabled=false in the JAVA_OPTS for the Content Services container, and by adding the following right before the last (bash -c "$@") line in search_config_setup.sh in the Search Services container:
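Again as a hedged sketch (the template path is an assumption; the property is the same solr.suggester.enabled flag mentioned above):

```bash
# Sketch only - create new cores with the suggester turned off
echo "solr.suggester.enabled=false" \
  >> /opt/alfresco-search-services/solrhome/templates/rerank/conf/solrcore.properties
```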
The best practice is for a dedicated database engineering team to provide the database. As such, we do not recommend running the database as a container managed by the application team in production.
User namespace remapping
By default, Docker does not perform user namespace remapping. For production deployments, it is recommended to enable this security feature to protect against attackers escaping container isolation and against inadvertently granting access to mounted data.
When a container has a process running as root (UID 0), the process actually runs as root on the host. If it is able to break out of the container jail created by Docker, the process has root on the host.
When configuring the remapping to add, say, 1000000 to the container UID on the host, a container with a process running as root will run as UID 1000000 on the host. Assuming there is no such UID in your environment, there is much less concern about that process breaking out of jail.
When mounting external storage into a container, it is important that you make sure the UID and GID on the directories and files are appropriately readable and writable by the user running the container process.
When mounting a volume into a container without user namespace remapping, the UID and GID in the container and on the storage being mounted should be the same. For example, when the container process runs as UID 1000, you must configure the mounted storage to be readable/writable by UID 1000. In the event there is a user with this UID in the organization, that user effectively has permission to make changes to the storage.
If you configure user namespace remapping as described above, you would configure the mounted storage to be readable/writeable by UID 1001000 instead of UID 1000. With this change, you don’t take the risk of someone in the organization unintentionally gaining permissions to the storage as long as the UID 1001000 remains unused by a real user.
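A minimal sketch of enabling remapping with Docker's built-in default user (merge the setting into any existing daemon.json rather than overwriting it):

```bash
# Enable user namespace remapping using the built-in "dockremap" user
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "userns-remap": "default"
}
EOF
sudo systemctl restart docker

# The second field of the dockremap entry is the UID offset applied to
# containers, e.g. dockremap:1000000:65536
grep dockremap /etc/subuid
```

Note that the actual offset comes from /etc/subuid, so the 1000000 used in the discussion above is illustrative.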
It is very unlikely that a process will break out of the Docker container jail, or that a user with the same UID as the container will be able to mount the storage and make undesirable changes. However, it is a security incident if either of these things happens. It’s better to address this possibility when planning the production deployments using Docker Compose by enabling user namespace remapping from the start.
Ingress/Proxy/Firewall
It can be convenient to run an ingress service as a container when an environment has a single ACS instance deployed. The compose files provided by Hyland take this approach.
NOTE: Some deployments run into issues with CSRF and proxied HTTPS traffic when setting up this type of ingress. Unfortunately, there is no single solution for these issues; expect a little extra effort to track them down and configure around them.
For clustered environments, these capabilities might be handled by a combination of dedicated network devices and/or some software running on the Docker hosts. The configuration of these components tends to vary wildly depending on the networking and security infrastructure and requirements for each deployment.
When implementing single sign-on (SSO), it is typical to perform this integration in Apache httpd and forward authenticated requests on to the application containers. Similarly, we often use Apache httpd to terminate SSL. SSL termination can also be handled by HAProxy, NGINX, or even by a dedicated network device such as a load balancer, proxy, router, or firewall.
When using Apache httpd, consider using mod_jk to route traffic to web containers via AJP rather than proxying HTTP/S traffic directly.
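A hedged workers.properties sketch, assuming the repository container's AJP connector (port 8009) is published on the Docker host's loopback as in the earlier port examples, and that any AJP secret required by recent Tomcat versions has been configured:

```
# workers.properties - define an AJP worker pointing at the container
worker.list=alfresco
worker.alfresco.type=ajp13
worker.alfresco.host=127.0.0.1
worker.alfresco.port=8009
```

In the httpd configuration, directives such as JkMount /alfresco/* alfresco and JkMount /share/* alfresco then send those paths to the worker.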
NOTE: Be aware that using the Alfresco Identity Service and/or the SAML Module may add some complexity to how ingress is set up.
Even when using Apache httpd or NGINX for HTTP/S ingress, HAProxy may be needed to handle routing of non-HTTP/S traffic such as FTP, CIFS, etc.
Organizations that have modest ingress needs, without the complexity of SSL or SSO, may choose to run an NGINX container that accepts HTTP/S traffic and forwards to the correct container based on path. In this case, exposing ports from the ingress container is all that is needed. The remaining containers don’t need to expose ports.
TIP: For administration purposes, use SSH tunneling when you need to access container ports that are not exposed beyond the Docker host. You might do this to access the Solr or ActiveMQ consoles that are running in containers on a remote host from your desktop.
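For example, to reach a Solr admin console bound to the remote host's loopback (host name is illustrative):

```bash
# Forward local port 8983 to Solr on the remote Docker host's loopback,
# then browse http://localhost:8983/solr from your desktop
ssh -N -L 8983:127.0.0.1:8983 admin@acs-host
```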
Hyland provides some good documentation on using Apache httpd with SSL and mod_jk for ACS.
License and customization
There are situations when it may be preferable to extend a provided container to inject customizations, configurations, database JDBC JAR files, or even customized scripts. There are trade-offs between extending the provided images and using volumes to amend the default images. Largely, the decision comes down to complexity.
When extending images, it is highly recommended to build them once and share them via a private Docker registry, or share tar exports of the images via some secure storage system. It is a best practice for a single image to be validated in each progressive environment (i.e., development, testing, staging, and production), rather than using different images per environment.
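For the tar-export approach, a sketch using standard Docker commands (image name and tag are illustrative):

```bash
# Export the validated image once on the build host...
docker save -o acs-custom-7.1.0.1.tar acs-custom:7.1.0.1
# ...transfer the tar over a secure channel, then load it on each target host
docker load -i acs-custom-7.1.0.1.tar
```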
It often makes sense to create new Docker images that extend the ones provided by Hyland when deploying customizations packaged as AMPs. This allows the AMPs to be applied to the Repository image and the Share image ahead of time. The containers can then start the applications right away, without going through the apply-amps process each time they are started.
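A Dockerfile sketch of this approach (the base image tag and the local amps directory are illustrative):

```dockerfile
# Extend the provided repository image and bake the AMPs in at build time,
# so containers skip the apply-amps step on startup
FROM quay.io/alfresco/alfresco-content-repository:7.1.0.1
COPY amps/*.amp /usr/local/tomcat/amps/
RUN java -jar /usr/local/tomcat/alfresco-mmt/alfresco-mmt*.jar install \
      /usr/local/tomcat/amps /usr/local/tomcat/webapps/alfresco \
      -directory -nobackup -force
```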
As an alternative to copying your AMPs into the containers and applying them as part of building extended images, you could copy in WAR files built with the SDK that already have the AMPs applied. The images provided by Hyland contain exploded WAR directories rather than compressed WAR files, so Tomcat doesn't have to explode the WARs each time the containers are started. When loading new compressed WARs, it is important to delete the old exploded WAR directories, and it is highly recommended to explode the new WARs to save time during Tomcat startup.
It often makes sense to extend the Search Services image in order to apply configuration that is common across all of your environments.
External database
As outlined above, it is highly recommended that a dedicated database team provisions and manages the database for Hyland/Alfresco products. It is important for the team deploying Alfresco Content Services to work with the database team to provide a supported database version. Additionally, the following items should be coordinated:
- Provide sufficient system resources (i.e., memory, CPU, storage) to support the ACS use cases.
- Have database administrators implement product-specific architectures involving mirroring, clustering, or failover to address performance and/or availability business needs.
- Coordinate database backups with content store backups.
- Use a database that can handle enough concurrent connections to support the volume of sessions from ACS (see the fragment after this list).
- Gather statistics and perform other database maintenance tasks periodically, ideally during expected low usage times for ACS.
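For the connection-pool item above, a compose fragment sketch (values are illustrative; db.pool.max sizes the repository's connection pool, and the database's connection limit must comfortably exceed it):

```yaml
  alfresco:
    environment:
      # Pool size is illustrative; coordinate the value with the database team
      JAVA_OPTS: "-Ddb.url=jdbc:postgresql://db.example.com:5432/alfresco -Ddb.pool.max=100"
```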
ABOUT THE AUTHOR
Bindu has 25+ years of consulting experience with enterprise system integration. As the content lead, he provides technical and architectural reviews and guidance. Bindu supports project teams and runs the Alfresco support practice, overseeing issues across multiple customers. Additionally, he’s active in the Alfresco community, including being a member of the Order of the Bees, where he is a contributor to support tools projects. He’s the creator of the Alfresco Yeoman generator. Bindu is a tea connoisseur, and interested in hobby robotics and automation.