As the number of simultaneous Build Forge jobs you are running increases, you may find the job run order appears to be random.
This is because Build Forge does not use a first-in-first-out (FIFO) algorithm to decide which job to run. Rather, queued jobs call back to the console every few seconds, looking for an open slot in the run queue. The first job to find an open slot “wins” and is the next to run. At the risk of understating the case, this is not always optimal.
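The effect of polling can be illustrated with a small simulation. This is a toy model only, not Build Forge code: the function name, the 5-second poll interval, the 12-second run time, and the random first-poll offset are all assumptions chosen for illustration.

```python
import random

def polling_run_order(job_ids, slots, poll_interval=5.0, run_time=12.0, seed=0):
    """Toy model: each queued job polls for a free slot on its own timer.

    All parameters are hypothetical; real poll timing is not specified here.
    """
    if slots < 1:
        raise ValueError("slots must be at least 1")
    rng = random.Random(seed)
    # Each job's first poll lands at a random offset within one poll interval.
    next_poll = {job: rng.uniform(0, poll_interval) for job in job_ids}
    running = []  # finish times of jobs currently holding a slot
    order = []    # the order in which jobs actually start
    while next_poll:
        # The job whose poll fires next gets to look for an open slot.
        job = min(next_poll, key=next_poll.get)
        now = next_poll[job]
        running = [t for t in running if t > now]  # free slots of finished jobs
        if len(running) < slots:
            running.append(now + run_time)
            order.append(job)
            del next_poll[job]
        else:
            # No slot open: try again one poll interval later.
            next_poll[job] = now + poll_interval
    return order

submitted = ["A", "B", "C", "D", "E", "F"]
print("submitted:", submitted)
print("started:  ", polling_run_order(submitted, slots=2))
```

In this model the start order depends on whose poll happens to land just after a slot opens, not on submission order, which is the behavior described above.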
Console Processes and Job Run Order
The Max console jobs setting can usually be increased quite a bit, because the run queue takes very few resources. The default of 3 is ridiculously conservative. Most systems can easily handle 10 times that number.
It also should be noted that the max console jobs setting is somewhat of a misnomer. What really is being counted is the number of running steps. Therefore, choices such as step threading may result in the run queue filling more than expected. This does not mean steps should not be threaded. Threading will reduce run time, and therefore slots will reopen more quickly. It is merely something to consider.
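As a rough illustration of the arithmetic, the sketch below counts slots per running step rather than per job. The job names, the step counts, and the limit of 6 are made-up values, and the function is hypothetical rather than part of any Build Forge API.

```python
def console_slots_in_use(running_jobs):
    """Count run-queue slots, one per running step (not one per job)."""
    return sum(job["running_steps"] for job in running_jobs)

# Hypothetical workload: three jobs, one of which has four threaded steps in flight.
jobs = [
    {"name": "nightly", "running_steps": 1},
    {"name": "release", "running_steps": 4},
    {"name": "docs", "running_steps": 1},
]
max_console_jobs = 6  # assumed setting, for illustration only

used = console_slots_in_use(jobs)
print(f"{len(jobs)} jobs occupy {used} of {max_console_jobs} slots")
```

Here three jobs are enough to fill a queue sized for six, so a fourth job would wait even though the job count looks well under the limit.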
Other Causes of Random Job Run Order
Adjusting the size of the job queue will not solve all problems with random job run order, however.
If several jobs are waiting for a server and exceed the max jobs setting of the server, they also will wait before resuming. If more than one is waiting, the order in which they start also will be random. In addition, they also will hold a slot in the run queue, thereby possibly causing waits for other jobs. The same situation develops if more than one job is waiting for the same semaphore. A job in this state is taking up a slot in the run queue, but is not processing.
After increasing the console limit, the next place to look is at your servers. Are the limits on simultaneous jobs there too low?
You want to be more conservative here, as the server chosen by a selector is where the work is actually done. Be even more cautious if you have multiple Build Forge servers on the same physical machine. The true run limit, in this case, is the combined limit of all Build Forge servers that point to the physical server. There are few situations where there should be more than one Build Forge server per physical machine. Aside from increasing the run limits on your servers, another approach is to use collectors and selectors to allow the job a choice of physical servers whenever possible. If one is full, the selector will simply choose another.
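The combined limit described above is a simple sum per physical machine. The following sketch shows the calculation; the server names, host names, and limits are invented for the example, and the dictionary layout is an assumption rather than a Build Forge export format.

```python
from collections import defaultdict

def combined_limits(servers):
    """Sum the max jobs limits of all server definitions that share a physical host."""
    totals = defaultdict(int)
    for server in servers:
        totals[server["host"]] += server["max_jobs"]
    return dict(totals)

# Hypothetical server definitions: two of them point at the same machine.
servers = [
    {"name": "linux-build-a", "host": "buildbox01", "max_jobs": 4},
    {"name": "linux-build-b", "host": "buildbox01", "max_jobs": 4},
    {"name": "win-build", "host": "buildbox02", "max_jobs": 3},
]

for host, limit in combined_limits(servers).items():
    print(f"{host}: up to {limit} simultaneous jobs")
```

In this example buildbox01 can end up running eight jobs at once, even though each server definition on its own allows only four.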
Finally, consider semaphores and project limits. A job waiting for a semaphore will consume a space in the run queue while waiting for its chance to continue. The same job limited by the project limit will not. It will wait for the project limit to allow it to enter the run queue, meanwhile remaining in a waiting state. This leaves the console slot available for other jobs. The caveat here is that if more than one job from the same project is waiting for the project limit to allow it to run, you will still encounter random run order on that specific project.
Semaphores can be useful for blocking a small portion of a job, but if the majority of the job needs to be single-threaded, you may be better off throttling it at the project level.
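The difference in run-queue cost between the two throttles can be sketched as follows. This is a simplified model that assumes the whole job is serialized and that nothing else competes for slots; the function and its parameters are hypothetical.

```python
def console_slots_held(queued_jobs, console_limit, throttle, project_limit=1):
    """Estimate run-queue slots held by one project's jobs under a given throttle."""
    if throttle == "semaphore":
        # Every job enters the run queue; all but one then block on the semaphore
        # while still holding their slots.
        return min(queued_jobs, console_limit)
    if throttle == "project_limit":
        # Jobs over the project limit wait outside the run queue.
        return min(queued_jobs, project_limit, console_limit)
    raise ValueError(f"unknown throttle: {throttle}")

# Eight jobs of the same project, with an assumed console limit of 30.
print("semaphore:    ", console_slots_held(8, 30, "semaphore"))
print("project limit:", console_slots_held(8, 30, "project_limit"))
```

Under these assumptions the semaphore approach ties up eight slots to do one job's worth of work, while the project limit ties up one.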
No one set of solutions is going to be optimal for all situations. You will need to examine the work done in your installation, and use some experimentation to determine the best mix of throttles to use. Limiting project execution at the most appropriate level can nearly always resolve any problems with random job run order.