Apple’s Leopard Server appears to have a major bug that is causing havoc for those users unlucky enough to see it occur on a daily basis.
Basically, DirectoryService is crashing on users of Leopard Server. This in itself isn’t a big problem, since Leopard Server restarts DirectoryService if it fails. The problem is with AppleFileServer. AppleFileServer seems to lose the ability to authenticate users with the new instance of DirectoryService. This means Time Machine backups stop working, and users can’t mount the server using AFP.
Currently, as of 10.5.2, this has still not been fixed. Apple have apparently given users suggestions such as sending a HUP signal to AppleFileServer at regular intervals to get things back on track, but in my limited testing this doesn’t work: it leaves the user able to mount only certain shares.
The only reliable solution I’ve found is to restart AFP. Obviously we don’t want to do that all the time; it should only be done when DirectoryService crashes. To do this, I’ve built the following launchd daemon. Basically, it works as follows:
The daemon watches the /Library/Logs/CrashReporter directory and waits until a crash occurs. When one does, it runs a script that checks whether the crash was a DirectoryService crash, moves those crash files into a sub-directory, and restarts AFP.
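The script itself isn’t reproduced in this post. As a rough idea only, a handler along the following lines would do what is described above. This is a minimal sketch, not the contents of the actual restartafpondscrash.sh: the function names, the overridable CRASH_DIR, ARCHIVE_DIR, RESTART_CMD and LOG_CMD variables, and the log message are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of a DirectoryService crash handler.
# NOT the actual restartafpondscrash.sh; all names here are illustrative.

# Overridable so the logic can be tried out away from a live server.
CRASH_DIR="${CRASH_DIR:-/Library/Logs/CrashReporter}"
ARCHIVE_DIR="${ARCHIVE_DIR:-$CRASH_DIR/DirectoryService}"
RESTART_CMD="${RESTART_CMD:-restart_afp}"
LOG_CMD="${LOG_CMD:-logger -t restartafp}"

# Full AFP restart, as described in the post (disconnects current users).
restart_afp() {
    serveradmin stop afp
    serveradmin start afp
}

handle_ds_crash() {
    found=0
    for f in "$CRASH_DIR"/DirectoryService*.crash; do
        # If the glob matched nothing, $f is the literal pattern; skip it.
        [ -f "$f" ] || continue
        # Move the report aside so it is not counted again on the next run.
        mv "$f" "$ARCHIVE_DIR"/ && found=1
    done

    # Restart only when at least one DirectoryService report was found.
    if [ "$found" -eq 1 ]; then
        $LOG_CMD "DirectoryService crash detected; restarting AFP"
        $RESTART_CMD
    fi
}

handle_ds_crash
```

Because the reports are moved within the watched directory, a watcher would likely fire a second time; on that run the loop finds nothing left to move, so no second restart happens.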
Not a great solution, as if someone is connected at the time, they get booted and have to remount. I’m experimenting with other fixes people have listed that don’t require AFP to be restarted, but so far I’ve found they don’t seem to work consistently.
You can download the daemon here.
Once downloaded, uncompress the file (double click in Finder). Safari may unzip it automatically for you.
From a Terminal, cd to the unzipped folder (if downloaded from Safari, by default it will be in ~/Downloads).
cd ~/Downloads/restartafp
Now, type the following. Note that you need to be logged in as an administrator, and you will be asked for the administrator password in order to do the first operation.
sudo mkdir \
/Library/Logs/CrashReporter/DirectoryService
sudo cp com.curmi.restartafp.plist \
/Library/LaunchDaemons/
sudo cp restartafpondscrash.sh \
/usr/local/bin/
sudo chmod a+x \
/usr/local/bin/restartafpondscrash.sh
sudo launchctl load \
/Library/LaunchDaemons/com.curmi.restartafp.plist
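The contents of the plist being loaded aren’t shown in this post. For readers curious how launchd can watch a directory, here is a sketch of what such a job definition might look like, written out by a small shell snippet. The com.example.restartafp label and the PLIST_OUT variable are made up for this illustration, and the real com.curmi.restartafp.plist may well differ; Label, ProgramArguments and WatchPaths are standard launchd keys.

```shell
#!/bin/sh
# Illustrative only: writes a sample launchd job definition to a scratch file.
# The label and output path are hypothetical, not taken from the download.
PLIST_OUT="${PLIST_OUT:-${TMPDIR:-/tmp}/com.example.restartafp.plist}"

cat > "$PLIST_OUT" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <!-- Unique name for the job -->
    <key>Label</key>
    <string>com.example.restartafp</string>
    <!-- Script to run whenever the job is triggered -->
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/restartafpondscrash.sh</string>
    </array>
    <!-- Trigger the job when anything in this directory changes -->
    <key>WatchPaths</key>
    <array>
        <string>/Library/Logs/CrashReporter</string>
    </array>
</dict>
</plist>
EOF
```

On a Mac, the generated file can be syntax-checked with plutil -lint before it goes anywhere near /Library/LaunchDaemons.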
To test that this is working, you can do the following, then check /var/log/system.log to see if it mentions the restart.
sudo touch \
/Library/Logs/CrashReporter/DirectoryService_trigger.crash
If for some reason you want to uninstall, run the following commands from the Terminal.
sudo launchctl unload \
/Library/LaunchDaemons/com.curmi.restartafp.plist
sudo rm \
/Library/LaunchDaemons/com.curmi.restartafp.plist
sudo rm /usr/local/bin/restartafpondscrash.sh
sudo mv \
/Library/Logs/CrashReporter/DirectoryService/* \
/Library/Logs/CrashReporter
sudo rmdir \
/Library/Logs/CrashReporter/DirectoryService
I hope this is useful to others out there. I’ve filed this bug with Apple (radar bug report number 5836741), and I hope Apple fixes it soon.
The fix I’m currently trialling was listed here, and suggests that rather than doing:
serveradmin stop afp
serveradmin start afp
in the script, we do:
serveradmin settings afp:authenticationMode = "standard"
serveradmin settings afp:authenticationMode = "standard_and_kerberos"
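If you want to try this variant, one way to slot it into a handler script is to wrap the two calls in a small function. This is only a sketch: the reset_afp_auth name and the overridable SERVERADMIN variable are invented for this example, and the two serveradmin lines are the ones quoted above.

```shell
#!/bin/sh
# Hypothetical wrapper around the authenticationMode toggle quoted above.
# SERVERADMIN is overridable so the calls can be previewed without running them.
SERVERADMIN="${SERVERADMIN:-serveradmin}"

reset_afp_auth() {
    # Switch to "standard", then back to "standard_and_kerberos";
    # the second call only runs if the first one succeeded.
    $SERVERADMIN settings afp:authenticationMode = "standard" &&
    $SERVERADMIN settings afp:authenticationMode = "standard_and_kerberos"
}
```

Setting SERVERADMIN=echo before calling reset_afp_auth prints the two commands instead of executing them, which is a harmless way to check the wiring.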
I’ve tested this and it didn’t seem to work, but I’ll try it again just in case. I’m sure DirectoryService will crash on us sometime tomorrow to confirm if the fix works or not.
Howdy, I’m going to test this too. I’ve had the same problem, and I don’t want to just cron an AFP restart in the wee hours because that’s when Time Machine backups tend to run for my clients. Let us all know what’s up with your workaround.
-Sky
Hey Sky. I tested the afp:authenticationMode workaround. The first day I did this, it worked fine, however my machine locked up. I’m not sure of the cause, and since I hadn’t applied the latest firmware, I applied it and restarted. It has been going now for 2 days with the new workaround, and seems to be working well.
One thing I do notice though, is with the new workaround, DirectoryService crashes more often. In some cases once an hour, 6-7 times a day. We really desperately need a fix for this from Apple.
Thanks! This seems to have lessened the screaming here. Here it seems the number of DirectoryService crashes has gone down; sometimes I go the whole day without seeing one. Watched pot never boils? Dead cat in a box? Who knows.
Just a follow-up…it’s been a couple of days now with the AFP restart script in place. We were going down at least a few times per day. Ironically, since putting in this temporary fix, the DirectoryService crashes have disappeared. The only AFP restarts have been caused by: sudo touch /Library/Logs/CrashReporter/DirectoryService_trigger.crash
Go figure…
The authenticationMode workaround does appear to be working for us. We’ve had no problems all week, even though logs show DirectoryService crashing at least 6 times a day for us.
I’d love to know what picciano’s secret is though as we still see the crashes as I mentioned.
Hi curmi,
Thanks so much for the workaround as I’ve been getting the same AFP disconnects on two Leopard servers.
I ran the Terminal prompts as directed, but just created several new empty folders in ~/Downloads (a+x, chmod, etc.). Library > LaunchDaemons is still empty.
Do I need to manually place the com.curmi.restartafp.plist file somewhere?
Note this is on a 30 min. old virgin install with no attempted AFP connections or crashes.
Here’s what Terminal said after entering the admin password:
mkdir: /Library/Logs/CrashReporter: No such file or directory
mkdir: com.curmi.restartafp.plist: File exists
mkdir: /Library/LaunchDaemons: No such file or directory
mkdir: cp: File exists
mkdir: restartafpondscrash.sh: File exists
mkdir: /usr/local/bin: No such file or directory
mkdir: /usr/local/bin: No such file or directory
mkdir: /Library/LaunchDaemons: No such file or directory
domain:restartafp servername$
Macintosah, you seem to have entered the commands incorrectly.
You have to type them exactly as I wrote them, one line at a time. The \ can be removed at the end of lines if you join the next line to the first.
The first command would then be:
sudo mkdir /Library/Logs/CrashReporter/DirectoryService
All on one line, then you press return.
Then you do the next. And so on.
Thanks, Curmi!
[Noob exits, pursued by a bear.]
I am new to leopard server and I reinstalled it 3 times thinking this issue was my fault. This server is useless with this type of issue. This needs to be fixed or I demand a refund :)
Hey James, make sure Apple know your dissatisfaction, otherwise we’ll never get them to fix this issue.
Love the blog. It’s now added to my RSS reader! What is the best way to complain to Apple? Should I call support and let them have it, or should I submit something on Apple’s site?
Thanks James. You can either call support, or actually log a bug in bugreporter (http://bugreport.apple.com/). The second method is good because it seems they look at them eventually, and occasionally give you feedback. However, I’ve logged many bugs over the years, and they ignore a lot that they consider trivial. :-)
I think this issue might also be messing up my VPN access, because it seems to hang on authentication a lot and then fail to connect.
But with a fresh reboot the VPN works without any issues.
Hi Curmi,
So sorry to keep spamming your comments with my Terminal noobiness, but I’m still getting errors trying to install your fix. I’m pretty sure I’m typing everything correctly, but I’m getting a “No such file or directory” response on “restartafpondscrash.sh”.
The one Leopard Server box I’ve tried this fix on is still getting AFP crashes, so I must be doing something wrong. Here is the terminal session:
Leopard-Server:~ admin$ cd ~/Downloads/restartafp
Leopard-Server:restartafp admin$ sudo mkdir \
> /Library/Logs/CrashReporter/DirectoryService
Password:
Leopard-Server:restartafp admin$ sudo cp com.curmi.restartafp.plist \
> /Library/LaunchDaemons
Leopard-Server:restartafp admin$ sudo cp restartafpondscrash.sh \
> sudo cp restartafpondscrash.sh \
> /usr/local/bin
usage: cp [-R [-H | -L | -P]] [-fi | -n] [-pvX] source_file target_file
cp [-R [-H | -L | -P]] [-fi | -n] [-pvX] source_file … target_directory
Leopard-Server:restartafp admin$ sudo chmod a+x \
> /usr/local/bin/restartafpondscrash.sh
chmod: /usr/local/bin/restartafpondscrash.sh: No such file or directory
Leopard-Server:restartafp admin$ sudo launchctl load \
> /Library/LaunchDaemons/com.curmi.restartafp.plist
Leopard-Server:restartafp admin$
Macintosah, copy and paste my commands one word at a time. You are copying and pasting whole slabs, and somehow that picks up an extra character after the \ character.
Go through the commands carefully. Do one word at a time.
Curmi,
We’re cross posting now at the Apple support forum thread, (I’m TheProfessor there) so I promise this is the last time I’ll post to your blog comments. I’m just really desperate to have a solution to this. My client’s Leopard server is crashing hourly and telling them OS X Server doesn’t work right just isn’t an option.
I think I found the problem. I retyped everything by hand with absolutely no errors, but I still get an error at the chmod prompt:
ServerName:restartafp server$ sudo chmod a+x /usr/local/bin/restartafpondscrash.sh
chmod: /usr/local/bin/restartafpondscrash.sh: Not a directory
Turns out my bin folder at /usr/local/bin shows as a Unix executable, not as a folder. How do I get it to be a directory?
Contact me in email and we’ll resolve it outside the blog.
Hi
Thanks for the blog. We call this the period 2 bug, as it arises when 3 or 4 classes log on at period 2. Most have logged on and the stragglers cannot log on. We tell everyone to leave the keyboards alone and restart the server. This fixes the log-on problem, and the existing users can continue working without losing their connection to the server, which is great. Your fix, however, seems to kill all the students’ connections to the server; they lose all their work and have to log back in. I did this accidentally by following your instructions to the letter: the last one killed all users on the server.
What is it about the restart of the server that is different from restarting AFP in your script?
Could your script be changed to automatically restart the server, so that maybe the users would not notice a thing?
Any thoughts?
Apple must fix this!!!!
A bit past the point where the issue was raised, but Macintosah’s comment raises an issue that the basic instructions seem to ignore. A straight Leopard Server setup has a /usr/local directory, but /usr/local/bin does not exist until you create it. The install instructions should have the line:
sudo mkdir /usr/local/bin
before the line that copies the restart script into /usr/local/bin (or else the script will actually be named /usr/local/bin and not /usr/local/bin/restartafpondscrash.sh), with a short comment that an error message of “/usr/local/bin: File exists” after this command is run can be ignored.
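A variation on this suggestion is mkdir -p, which creates the directory if it is missing and succeeds quietly if it already exists, so there is no error message to ignore. The sketch below wraps the copy step that way; install_script is a made-up helper name for illustration and is not part of the download.

```shell
#!/bin/sh
# Hypothetical helper: copy a script into a directory, creating the
# directory first so cp cannot end up writing a plain file in its place.
install_script() {
    src="$1"
    dest_dir="$2"
    # -p: create missing parent directories; no error if it already exists.
    mkdir -p "$dest_dir" || return 1
    # Trailing slash makes cp fail rather than create a file named "$dest_dir".
    cp "$src" "$dest_dir"/ || return 1
    # Make the installed copy executable for everyone.
    chmod a+x "$dest_dir/$(basename "$src")"
}
```

On the server itself the same three steps would be run with sudo, starting with sudo mkdir -p /usr/local/bin.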
Apple has released 10.5.3 for the Client and Server.
Apple’s support page says:
“Addresses an issue that could cause the AppleFileServer process to stop accepting connections while consuming most of the available CPU time on the server.
Addresses an issue that could cause the Apple File Service refuse new connections after DirectoryService becomes unresponsive, and improves stability of DirectoryService.”
This should resolve the issue.
Has this been fixed? I am experiencing this issue on 10.5.6 server which is connected to a directory system: windows 2003 server AD environment/domain.
thx.
Yes it seems to be fixed for us.