
Ensuring a Healthy Solr Index with Solr FIX

Maintaining a healthy Solr index is critical for ensuring your users can easily find the documents they need. When transactions fail to index or search inconsistencies occur, Solr troubleshooting tools step in—chief among them is Solr FIX, a powerful repair utility.

Solr FIX isn’t just for emergencies; it’s a proactive solution for reindexing unindexed transactions, cleaning up duplicates, and ensuring your search index accurately reflects the state of your database. If you’ve ever seen the dreaded “unindexed transactions” error in the Solr logs or experienced search anomalies, now is the time to add Solr FIX to your toolkit.

This guide explains how to use Solr FIX effectively: its syntax, its parameters, best practices, and common pitfalls to avoid.

Understanding the Solr FIX Action in Alfresco

The Solr FIX action is a troubleshooting tool in Alfresco. It is designed to address unindexed or failed transactions. As outlined in Alfresco’s documentation, Solr FIX is used to:

“Repair an unindexed or failed transaction (as identified by the REPORT option in the Unindexed Solr Transactions section) … The FIX parameter compares the database with the index and identifies any missing or duplicate transactions. It then updates the index by either adding or removing transactions.”

Notably, Solr FIX does nothing without proper arguments, so it is essential to understand them. Additionally, the term “Unindexed” can be confusing in Alfresco, as it describes two different situations:

  1. Normal Unindexed Nodes: Some nodes, such as thumbnails, are deliberately not indexed by Solr. These are expected to show up under the “Index unindexed count” in Solr reports. This blog does not address such cases, as they represent normal behavior.
  2. Unindexed Transactions and Nodes: These transactions fail to index due to an error and appear under the “Index error count.” In this article, “unindexed” specifically refers to transactions and their associated nodes that could not be indexed due to a transient error.

Syntax of the Solr FIX Call

The following syntax outlines the structure of a Solr FIX API call and its parameters:

http://your_solr_host:port/solr/admin/cores?
action=FIX
&core=alfresco|archive (Mandatory)
&maxScheduledTransactions=500 (Optional. Default is 500)
&fromTxCommitTime=startEpochTimeMs (Optional)
&toTxCommitTime=endEpochTimeMs (Optional)
&dryRun=true|false (Optional. Default is true)

Setting the Correct Solr FIX Arguments

As mentioned earlier, setting the arguments correctly is crucial. If they are incorrect, you will either encounter an error or Solr will take no action.

For example, suppose some uploaded documents are missing from Alfresco search results. After investigating, we confirm the issue occurred because the Transform Service was down. We estimate that approximately 1,000 documents uploaded in November 2024 are still not appearing in search queries. In this case, the Solr FIX request would be structured as follows:

http://your_solr_host:port/solr/admin/cores?
action=FIX
&core=alfresco
&maxScheduledTransactions=1200
&fromTxCommitTime=1730419200000
&toTxCommitTime=1733011200000
&dryRun=false

A closer look at the arguments used:

  • http://your_solr_host:port. Use the proper protocol (http or https), your Solr host, and the port (typically 8983).
  • core=alfresco. We normally want to reindex documents in the main repository. The other core, archive, holds the documents in the Alfresco trashcan.
  • maxScheduledTransactions=1200. Solr inspects transactions until it reaches this limit. We use 1200 in case there are more documents than expected.
  • fromTxCommitTime=1730419200000. Using an online date-to-epoch converter, we set the range to start on November 1, 2024 at 0:00 hours GMT.
  • toTxCommitTime=1733011200000. This is the end of the period we want to scan: December 1, 2024 at 0:00 hours GMT.
  • dryRun=false. If you want Solr to actually fix those nodes, you must set this argument to false. When the Solr FIX tool runs in dryRun=true mode (the default), it reports problematic transactions but does not fix them.

If you don’t include fromTxCommitTime and toTxCommitTime, Solr FIX will scan the whole index until it reaches the number of transactions set by maxScheduledTransactions (500 by default).
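The example request above can also be assembled in code, which helps avoid typos in the query string. The following is a minimal Python sketch, not an official client: the function name build_fix_url and the host solr.example.com are hypothetical, and only the parameters described in this article are used.

```python
from urllib.parse import urlencode


def build_fix_url(base_url, core, from_ms=None, to_ms=None,
                  max_transactions=500, dry_run=True):
    """Build a Solr FIX URL from the parameters described in this article.

    base_url is a placeholder such as "http://solr.example.com:8983".
    """
    if core not in ("alfresco", "archive"):
        raise ValueError("core must be 'alfresco' or 'archive'")
    params = {
        "action": "FIX",
        "core": core,
        "maxScheduledTransactions": max_transactions,
    }
    # The date range is optional; add each bound only when it is given.
    if from_ms is not None:
        params["fromTxCommitTime"] = from_ms
    if to_ms is not None:
        params["toTxCommitTime"] = to_ms
    # Use the lowercase true/false form shown in the article's examples.
    params["dryRun"] = "true" if dry_run else "false"
    return f"{base_url}/solr/admin/cores?{urlencode(params)}"


# Hypothetical host; the values match the November 2024 example.
url = build_fix_url("http://solr.example.com:8983", "alfresco",
                    from_ms=1730419200000, to_ms=1733011200000,
                    max_transactions=1200, dry_run=False)
print(url)
```

Keeping dry_run=True as the function default mirrors the behavior of FIX itself, so a fix only happens when it is requested explicitly.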

Understanding Epoch Time in Solr FIX

Epoch time is used to specify a date range in Solr FIX. Let’s clarify what it means and why it matters. Epoch time represents a specific date and time as the number of milliseconds elapsed since January 1, 1970 (the Unix epoch). This format is widely used in Linux, Java, and other programming environments because it simplifies date calculations and comparisons.

When using fromTxCommitTime and toTxCommitTime, you define the starting and ending date/time for the transactions you want to fix in the Solr index. Both arguments must be provided in milliseconds. For example, to specify January 1, 2024 at 0:00 hours GMT, you would use 1704067200000.
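If you prefer not to rely on an online converter, the same values can be computed locally. This is a small sketch using Python’s standard library; the helper name to_epoch_ms is made up for illustration, and it assumes the date/time is given in GMT (UTC).

```python
from datetime import datetime, timezone


def to_epoch_ms(year, month, day, hour=0, minute=0, second=0):
    """Return the epoch time in milliseconds for a UTC (GMT) date/time."""
    moment = datetime(year, month, day, hour, minute, second,
                      tzinfo=timezone.utc)
    # timestamp() returns seconds since the Unix epoch; FIX needs milliseconds.
    return int(moment.timestamp() * 1000)


print(to_epoch_ms(2024, 1, 1))   # 1704067200000
print(to_epoch_ms(2024, 11, 1))  # 1730419200000
print(to_epoch_ms(2024, 12, 1))  # 1733011200000
```

The last two values are the fromTxCommitTime and toTxCommitTime used in the November 2024 example.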

Summary of the Solr FIX arguments

Alfresco’s documentation doesn’t detail all the arguments for the FIX action. Here is a summary of the parameters you should know.

| Argument | Value(s) | Notes |
|---|---|---|
| action | FIX | Required. |
| core | alfresco or archive | Required. This parameter must be set to either alfresco or archive. It’s usually set to alfresco. |
| maxScheduledTransactions | 500 | Optional. FIX will stop searching for unindexed items upon hitting this limit. The default is 500. |
| fromTxCommitTime and toTxCommitTime | Epoch time (ms) | Optional. Reindex transactions within a date/time range from fromTxCommitTime to toTxCommitTime. |
| dryRun | true or false | Optional. Determines if changes are applied. The default is true, which means no transactions will be fixed. If false, Solr will try to fix any problematic transactions listed on the results. |

What to Expect After Hitting Enter

This is what happens when you enter the Solr FIX action in your browser:

  1. Initial Retrieval
    • Solr retrieves the list of transactions and their status. For example, it identifies transactions in Solr but not in the Alfresco database.
  2. Report the Results
    • The results are shown in XML format in your browser (see “Interpreting Solr FIX Responses” below). This process may take some time, so be patient and avoid refreshing the page if you don’t see a response right away.
    • After your browser displays the results, it’s important to note that Solr has not yet performed any fixes. At this stage, it’s merely reporting the list of transactions and ACL change sets to be fixed.
    • If you didn’t set the argument dryRun=false, no further processing will occur, and no nodes or transactions will be fixed.
  3. Queuing for Processing
    • The list of transactions is then queued for resolution according to Solr’s regular tracking schedule. The processing time depends on the number of transactions, but it runs quietly in the background with minimal disruption to users.
  4. Monitoring Progress
    • FIX provides no built-in progress tracker, so it may look like nothing is happening after you get the output. Since it can take some time to locate and resolve the unindexed transactions, you can use a few indirect options to monitor progress. We discuss them later in this article.

Interpreting Solr FIX Responses

A Healthy Index

For a healthy index, the FIX action will return a response similar to the one below. While this output indicates that the repository is in good condition, it’s not uncommon for Alfresco repositories to contain documents that cannot be indexed. This may happen due to factors such as excessive file size, encryption, corruption, or unsupported features that Alfresco cannot process.

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1314</int>
  </lst>
  <lst name="action">
    <lst name="alfresco">
      <lst name="txToReindex">
        <lst name="txInIndexNotInDb"/>
        <lst name="duplicatedTxInIndex"/>
        <lst name="missingTxInIndex"/>
      </lst>
      <lst name="aclChangeSetToReindex">
        <lst name="aclTxInIndexNotInDb"/>
        <lst name="duplicatedAclTxInIndex"/>
        <lst name="missingAclTxInIndex"/>
      </lst>
    </lst>
    <bool name="dryRun">false</bool>
    <int name="maxScheduledTransactions">500</int>
    <str name="status">scheduled</str>
  </lst>
</response>

Errors in the Index

The following output shows errors in the index. Several transactions need to be corrected. Notice the transactions listed under <lst name="missingTxInIndex">:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">38555</int>
  </lst>
  <lst name="action">
    <lst name="alfresco">
      <lst name="txToReindex">
        <lst name="txInIndexNotInDb"/>
        <lst name="duplicatedTxInIndex"/>
        <lst name="missingTxInIndex">
          <int name="81723277">1</int>
          <int name="81723278">1</int>
          <int name="81723282">1</int>
          <int name="81723283">1</int>
        </lst>
      </lst>
      <lst name="aclChangeSetToReindex">
        <lst name="aclTxInIndexNotInDb"/>
        <lst name="duplicatedAclTxInIndex"/>
        <lst name="missingAclTxInIndex">
          <int name="1908">4</int>
        </lst>
      </lst>
    </lst>
    <bool name="dryRun">false</bool>
    <long name="fromTxCommitTime">1666483200000</long>
    <long name="toTxCommitTime">1668634005000</long>
    <int name="maxScheduledTransactions">2000</int>
    <str name="status">scheduled</str>
  </lst>
</response>

Remember that it’s okay to have a few unindexed transactions. However, more research is needed if most of the transactions in a given date range are missing from the index.

Note! Pay attention to the status “scheduled”, which should appear close to the end of the response:

    <str name="status">scheduled</str>

If it is missing, it means Solr did not accept the request to fix the transactions and did not schedule it to run. It is also possible that the parameters were not set correctly or that dryRun was not set to false.
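Reading a long XML response by eye is error-prone, so the checks described above can be sketched in a few lines of Python. This is an illustration only: the function name summarize_fix_response is hypothetical, the sample is a shortened version of the response shown earlier, and the parsing assumes the response layout matches those samples.

```python
import xml.etree.ElementTree as ET


def summarize_fix_response(xml_text, core="alfresco"):
    """Count the entries in each bucket of a FIX response and check its status."""
    root = ET.fromstring(xml_text)
    action = root.find("lst[@name='action']")
    if action is None:
        raise ValueError("no 'action' section in the response")
    counts = {}
    core_node = action.find(f"lst[@name='{core}']")
    if core_node is not None:
        # Two groups are expected: txToReindex and aclChangeSetToReindex.
        for group in core_node.findall("lst"):
            for bucket in group.findall("lst"):
                counts[bucket.get("name")] = len(bucket.findall("int"))
    # The request was only accepted if the status "scheduled" is present.
    status = action.find("str[@name='status']")
    scheduled = status is not None and status.text == "scheduled"
    return counts, scheduled


# Shortened sample based on the "Errors in the Index" output.
SAMPLE = """<response>
  <lst name="action">
    <lst name="alfresco">
      <lst name="txToReindex">
        <lst name="txInIndexNotInDb"/>
        <lst name="duplicatedTxInIndex"/>
        <lst name="missingTxInIndex">
          <int name="81723277">1</int>
          <int name="81723278">1</int>
        </lst>
      </lst>
      <lst name="aclChangeSetToReindex">
        <lst name="missingAclTxInIndex">
          <int name="1908">4</int>
        </lst>
      </lst>
    </lst>
    <bool name="dryRun">false</bool>
    <str name="status">scheduled</str>
  </lst>
</response>"""

counts, scheduled = summarize_fix_response(SAMPLE)
print(counts)
print("scheduled:", scheduled)
```

For this sample, the counts show two entries under missingTxInIndex and one under missingAclTxInIndex, and scheduled is True.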

Monitoring Progress

As mentioned before, FIX does not provide a built-in progress tracker. It may take some time to locate and resolve the unindexed transactions, so you may want to try the following options to monitor its progress:

  • Sometimes, the transactions to be processed appear as pending transactions on the Search section of the Alfresco administrative console. If that’s the case, you can refresh the page to monitor progress.

     

  • Monitor the CPU usage of the Solr process to observe its activity level. If Solr is significantly more active than usual, it’s probably still fixing transactions.
  • Set the argument dryRun=true (or just remove it) and run the FIX request again. Refresh the browser to monitor any changes in the number of reported transactions. For the transactions that can be fixed, their number will go down in the output as progress is made.
  • Use the Solr action REPORT to monitor the “Index error count”.
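The dry-run option in the list above lends itself to a small polling loop. The sketch below is one possible approach rather than a supported tool: fetch_dry_run is a hypothetical callable that you supply, expected to send the FIX request with dryRun=true and return the XML text, and the counting assumes the response layout shown in this article.

```python
import time
import xml.etree.ElementTree as ET


def count_reported(xml_text):
    """Count every reported entry under the 'action' section of a FIX response."""
    root = ET.fromstring(xml_text)
    action = root.find("lst[@name='action']")
    if action is None:
        return 0
    # iter("int") walks all nested buckets; skip the maxScheduledTransactions
    # setting, which is also an <int> but is not a reported transaction.
    return sum(1 for node in action.iter("int")
               if node.get("name") != "maxScheduledTransactions")


def watch_fix_progress(fetch_dry_run, checks=5, interval_seconds=60,
                       sleep=time.sleep):
    """Poll a dry-run FIX request and return the reported count at each check.

    Stops early once nothing is reported. Entries that can never be fixed
    keep the count above zero, so the number of checks is capped.
    """
    counts = []
    for attempt in range(checks):
        counts.append(count_reported(fetch_dry_run()))
        if counts[-1] == 0:
            break
        if attempt < checks - 1:
            sleep(interval_seconds)
    return counts
```

A falling sequence such as 40, 12, 3 suggests FIX is still working; a sequence that stops changing above zero points to transactions that cannot be fixed.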

 

Best Practices for Using Solr FIX

Use fromTxCommitTime and toTxCommitTime

For larger repositories, it’s essential to narrow down the transactions to be fixed by specifying a date and time range. This helps improve efficiency and prevents unnecessary processing.
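One way to narrow the range further is to split a long period into shorter windows and run FIX once per window. The following Python sketch shows the arithmetic only; split_range is a hypothetical helper, and because this article does not cover whether Solr treats the range bounds as inclusive, adjacent windows here simply share a boundary value.

```python
WEEK_MS = 7 * 24 * 60 * 60 * 1000  # one week in milliseconds


def split_range(from_ms, to_ms, window_ms):
    """Yield (start, end) pairs that cover from_ms..to_ms in consecutive windows."""
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    start = from_ms
    while start < to_ms:
        # The last window is shortened so it never runs past to_ms.
        end = min(start + window_ms, to_ms)
        yield start, end
        start = end


# November 2024 (GMT) split into week-long windows.
for start, end in split_range(1730419200000, 1733011200000, WEEK_MS):
    print(f"fromTxCommitTime={start}&toTxCommitTime={end}")
```

Each printed pair can be used as the date range of a separate FIX request.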

Testing with Dry Runs

The dryRun=true setting is enabled by default, meaning no changes will be made. While this is useful for testing, it’s important to remember to switch to dryRun=false when ready to fix the transactions.

Common Pitfalls to Avoid in Solr FIX

 

Not Specifying the Core

Always include core=alfresco or core=archive in your API call to ensure Solr FIX applies to the correct index.

Forgetting to Set dryRun to False

If dryRun=false is not explicitly set, the request will only simulate the fix without making any changes.

Not Resolving the Root Cause Before Running FIX

Solr FIX simply attempts to reindex the document. If the underlying issue that prevented indexing—such as a system error, missing dependencies, or an unavailable service—has not been resolved, Solr FIX will not be able to reindex the transactions.
Key takeaway: FIX is not a magic solution; some documents may have permanent issues that can’t be resolved and will not be fixed.

Exceeding the Transaction Limit

Solr FIX stops once it reaches maxScheduledTransactions (default: 500). If you expect more problematic transactions than that, raise the limit, as the earlier example does with 1200.

Epoch Time Format Reminder

When specifying a date/time period, ensure that the epoch time has 13 digits, as FIX requires it in milliseconds. This rule holds for any date between September 9, 2001 and November 20, 2286, so it covers every transaction you are likely to fix.
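A quick guard in a script can catch the most common mistake, which is passing seconds instead of milliseconds. This is a minimal sketch; the function name looks_like_epoch_ms is made up for illustration, and it checks only the digit count described above.

```python
def looks_like_epoch_ms(value):
    """Return True if value is written as exactly 13 decimal digits."""
    text = str(value)
    return text.isascii() and text.isdigit() and len(text) == 13


print(looks_like_epoch_ms(1730419200000))  # True: milliseconds
print(looks_like_epoch_ms(1730419200))     # False: seconds (10 digits)
```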

In Conclusion

Solr FIX is a powerful tool for addressing indexing issues in Alfresco. This guide provided details on how to use Solr FIX effectively. As a leading implementor of Alfresco, Zia Consulting is here to provide more information or assistance in utilizing this guide to fix your Alfresco index. Reach out to us today! We would be happy to assist you.

 
