Zia Consulting specializes in managing large Tungsten Automation Ephesoft Transact clusters that process high volumes of documents while meeting demanding service level agreements (SLAs). Through this experience, we have identified key strategies for optimizing cluster performance, scalability, and efficiency.
To fully leverage these optimizations, it is crucial to understand the foundational architecture of an Ephesoft Transact cluster. This cluster consists of interconnected nodes that share a file location (i.e., Shared Folders) and a database. The file location stores documents awaiting processing and persistent state files during processing. The database serves as the central repository for workflow management and configuration settings.
In smaller implementations, a single server may suffice. In larger implementations, however, nodes can be designated as either processing (automation) nodes or user interface (UI) nodes. Processing nodes handle all automated steps, while UI nodes are used by system administrators and batch operators to process exceptions. This separation of functionality keeps UI response times consistent, even under heavy processing loads.
Understanding architectural components is essential for maximizing cluster performance and scalability. Key optimizations to consider include:
- Optimizing network communication to the Shared Folders and the database.
- Increasing resources on the database if the CPU is frequently maxed out.
- Increasing the IOPS on the Shared Folders, either by provisioning a faster disk or, on cloud services that expose these settings, by raising the configured IOPS and I/O throughput.
- Ensuring fast CPUs on the Processing Servers.
- Disabling hyperthreading on the Processing Servers, as it has been found to be suboptimal for Transact batch processing.
- Pinning resources for the servers if they are running in a virtualized environment.
- Profiling any customizations to the Ephesoft workflow (e.g., web service calls, extensive custom processing logic) to ensure they do not introduce bottlenecks.
Splitting the cluster into UI and processing nodes allows each tier to be scaled independently. Based on the number of manual users working simultaneously, the UI tier can be scaled by adding additional servers or additional cores. The UI servers should utilize a load balancer or proxy server so users can access the UI through a shared URL. This setup also provides redundancy in case a UI server becomes unavailable. Similarly, processing nodes can be scaled to handle the required volume peaks and meet SLA and HA requirements.
A cluster is ultimately limited by its shared resources, namely the database and the Shared Folders. Once traffic to these resources exceeds a certain threshold, which typically happens at around three processing servers, the cluster becomes bottlenecked. To work within this limit, you can scale the cluster vertically by adding more cores to the processing servers. If more processing power is still needed, you may consider implementing a cluster of clusters. This concept is outside of the normal supported platform and requires some custom elements.
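As a rough illustration of the sizing reasoning above, the sketch below estimates how many processing servers a peak volume needs and how many clusters that implies when each cluster is capped at about three processing servers. The function name, the pages-per-hour inputs, and the sample numbers are hypothetical placeholders, not measured Transact figures; substitute throughput you have measured in your own environment.

```python
import math


def plan_clusters(peak_pages_per_hour, pages_per_hour_per_server,
                  max_servers_per_cluster=3):
    """Return (servers_needed, clusters_needed) for a given peak volume.

    max_servers_per_cluster defaults to 3, the approximate point at which
    the shared database and Shared Folders tend to become the bottleneck.
    """
    if pages_per_hour_per_server <= 0:
        raise ValueError("per-server throughput must be positive")
    if max_servers_per_cluster <= 0:
        raise ValueError("max_servers_per_cluster must be positive")

    # Round up: a fractional server still has to be provisioned.
    servers = max(1, math.ceil(peak_pages_per_hour / pages_per_hour_per_server))
    clusters = math.ceil(servers / max_servers_per_cluster)
    return servers, clusters


# Placeholder inputs for illustration only.
servers, clusters = plan_clusters(peak_pages_per_hour=50_000,
                                  pages_per_hour_per_server=6_000)
print(servers, clusters)  # prints: 9 3
```

If the result is a single cluster, vertical scaling of the existing processing servers is usually the simpler path; a result above one is the signal to start evaluating a cluster of clusters.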
Because a cluster of clusters is not a product feature, it requires custom orchestration, prioritization, and reporting functionality. The orchestration and prioritization engine monitors the available capacity on each cluster and distributes batches, documents, and images to the clusters that can take more work. It also includes logic for changing prioritization based on workload. Reporting provides information on orchestration decisions and on the batches waiting to be processed. Combined with the reporting database of each individual cluster, this lets administrators retrieve SLA and auditing information.
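To make the orchestration idea more concrete, here is a minimal sketch of a dispatcher that holds incoming batches in a priority queue and hands each one to the cluster with the most free capacity. It is an illustration under stated assumptions, not the implementation we deploy: the `Cluster` and `Orchestrator` classes, the notion of a fixed per-cluster capacity, and the convention that a lower priority number is more urgent are all hypothetical, and the step that actually delivers a batch to a cluster is left out.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    capacity: int      # assumed max number of batches in flight at once
    active: int = 0    # batches currently assigned to this cluster

    @property
    def free(self):
        return self.capacity - self.active


class Orchestrator:
    def __init__(self, clusters):
        self.clusters = list(clusters)
        self._queue = []               # min-heap of (priority, seq, batch_id)
        self._seq = itertools.count()  # keeps FIFO order within one priority

    def submit(self, batch_id, priority):
        """Queue a batch; a lower priority number is dispatched first."""
        heapq.heappush(self._queue, (priority, next(self._seq), batch_id))

    def pending(self):
        return len(self._queue)

    def dispatch(self):
        """Assign queued batches until every cluster is full.

        Returns a list of (batch_id, cluster_name) assignments.
        """
        assignments = []
        while self._queue:
            target = max(self.clusters, key=lambda c: c.free)
            if target.free <= 0:
                break  # no capacity anywhere; leave the rest queued
            _, _, batch_id = heapq.heappop(self._queue)
            target.active += 1
            assignments.append((batch_id, target.name))
        return assignments
```

A real engine would also decrement `active` when a cluster reports a batch as finished and would record each assignment for the reporting layer; both are omitted here to keep the sketch short.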
Another crucial requirement for a cluster of clusters is the ability to view all batches requiring operator review or validation. We recommend implementing a Universal Batch List that lists all batch instances across all clusters. This allows operators to work through batches and navigate to the appropriate cluster UI server for a particular batch while maintaining visibility of all batches, import dates, priority, and custom fields.
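The core of a Universal Batch List is a merge: collect the batch rows from every cluster, keep the ones waiting on an operator, and sort them so the most urgent work appears first alongside a link to the right cluster UI. The sketch below shows that merge only. The `BatchRow` fields, the status labels, the priority convention, and the example hostnames are assumptions for illustration; how the rows are retrieved from each cluster is deliberately not shown.

```python
from dataclasses import dataclass
from datetime import datetime

# Assumed labels for batches that are waiting on an operator.
OPERATOR_STATUSES = {"READY_FOR_REVIEW", "READY_FOR_VALIDATION"}


@dataclass(frozen=True)
class BatchRow:
    cluster: str        # name of the cluster that owns the batch
    batch_id: str
    status: str
    priority: int       # assumption: lower number = more urgent
    imported: datetime  # import date of the batch


def universal_batch_list(rows, ui_urls):
    """Merge rows from all clusters into one operator work list.

    rows    -- iterable of BatchRow gathered from every cluster
    ui_urls -- dict mapping cluster name to that cluster's UI base URL
    Returns a list of (BatchRow, ui_url), most urgent and oldest first.
    """
    merged = [(row, ui_urls[row.cluster])
              for row in rows
              if row.status in OPERATOR_STATUSES]
    merged.sort(key=lambda pair: (pair[0].priority, pair[0].imported))
    return merged


if __name__ == "__main__":
    # Hypothetical sample data.
    urls = {"A": "https://cluster-a.example.com",
            "B": "https://cluster-b.example.com"}
    sample = [
        BatchRow("A", "BI1", "READY_FOR_REVIEW", 2, datetime(2024, 1, 2)),
        BatchRow("B", "BI2", "READY_FOR_VALIDATION", 1, datetime(2024, 1, 3)),
    ]
    for row, url in universal_batch_list(sample, urls):
        print(row.batch_id, row.status, url)
```

Custom fields can be carried along by adding attributes to the row type; the sort key is the natural place to encode any site-specific ordering rules.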
This design enables you to scale Ephesoft Transact well beyond the limits of a single cluster to meet demanding volume and SLA requirements. Additional logic can be incorporated into queuing and prioritization to fulfill specific requirements. For instance, in a cloud environment you could add logic that, when the clusters are underutilized, disables processing servers and routes new batches to a single cluster to reduce server costs. If you have any questions or would like to explore this further, please feel free to reach out, and we can arrange a call to discuss it in more detail.
ABOUT THE AUTHOR
Pat Myers, EVP and Co-Founder
Pat has 20+ years of software architecture and engineering experience across many industries. He is an Enterprise Content Management and Capture expert and an authority on Robotic Process Automation. Pat’s experience ranges from application development and consulting to sales and business development. He is an author of two editions of Intelligent Document Capture with Ephesoft, and co-developed the original certification training. Over the years, he has taught Ephesoft certification courses internationally. In 2021, he was inducted into the Inc. 5000 Master’s Group. Outside of work, Pat enjoys family time, snowboarding, biking, camping, and making great memories with friends.