Hey Again TDN -
Lots of conversation around server performance and folks' recommended solutions for server and client perf. First, just to point this out up front: we've been live for less than two days. Not saying that to be dismissive of anyone's concerns, but part of the process is implementing adjustments/fixes and then evaluating whether those fixes have the impact we're hoping for. There's only so much 'fix evaluation' we can do in < 48 hours.
First, let's tackle server performance:
- By the standard metrics, TDN server performance is pretty strong (could be better, but it isn't tanking constantly). Generally, server health is measured by tickrate (ticks/second), which is basically a check for how responsive the server is. An "OK" tickrate is above 25 tps; a "Bad" tickrate is below 20. TDN's average tickrate is 66.4 in the snapshot below, which covers the last 3 hours as of this writing, meaning we are performing well above the margins for tickrate.
- Now, tickrate is not the be-all and end-all. There are other factors, like code and pathfinding, that require a lot of processing to complete. Most of our current server-side latency is related to scripts or pathfinding, both of which are elements we can improve upon. In some areas we may need to reduce NPC activity, and some code we may need to rein back a bit. These updates will come with time.
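For anyone who wants to see how those thresholds turn into an actual check, here is a minimal sketch in Python. It is not our monitoring code. The function names and the "marginal" label for the 20-25 band are assumptions made up for illustration, since the thresholds above only define "OK" and "Bad".

```python
from statistics import mean

OK_TPS = 25.0   # "OK" tickrate is above this (threshold from the post)
BAD_TPS = 20.0  # "Bad" tickrate is below this (threshold from the post)


def classify_tickrate(tps):
    """Label a tickrate reading using the two thresholds above."""
    if tps > OK_TPS:
        return "ok"
    if tps < BAD_TPS:
        return "bad"
    # Assumption: the 20-25 band has no official name, so call it "marginal".
    return "marginal"


def summarize(samples):
    """Average a window of tickrate samples and label the average.

    `samples` must be non-empty; statistics.mean raises on an empty list.
    """
    average = mean(samples)
    return average, classify_tickrate(average)
```

Feeding in a window of hypothetical readings, for example `summarize([60.0, 70.0])`, gives back the average along with its label. Keep in mind that an average can hide short dips, which is part of why tickrate alone doesn't tell the whole story.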
Second, let's hit the server cap:
- "Why can't the server cap be increased?"
- There are two big elements here I want to explain (but do know that a server cap increase will happen at some point):
- First - Increasing the cap will drive performance down and also make it more difficult to ascertain where perf problems are originating from. NWN as a game and a server application can only handle so many players in a single zone. Add in our custom systems and custom assets, and you have a situation unlike any PW has ever seen. So there really isn't a way to equate our solution to any other PW out there. We can certainly take lessons and recommendations from those servers, but not all of them will apply.
- Ask most other PW admins and most would consider us a bit crazy for how we're approaching building TDN. We like crazy and we like the idea of pushing the boundaries of what NWN can do, but with that comes challenges when it comes to solving for performance.
- Second - Each person logging in has a unique set of experiences and situations that they run into. Sometimes things go swimmingly, sometimes they don't. We need to be mindful of our team's bandwidth and ability to jump in and provide support in random situations. Currently, we have 3 primary individuals who do that: Danuvis, Khaine and Aschent_. We were feeling the pressure of an 80-person cap prior to our crash around 5am PDT on the 19th. We essentially have to balance providing support to people when things go awry, implementing an actual long-term fix, and making other adjustments to the server as outlined in our patch notes. We do expect to bring on more staff; however, we need to approach that process sensibly and not in 'desperation mode', because that would result in poor decisions in terms of DM/Staff selection.
- "Adding a Second Server would solve your first problem".
- Although that might be true, we need to be cautious not to rush into a potential solution.
- A second server might fill up as fast as our current one, meaning we'd be fielding close to 160 unique accounts all logged in at once. At that point the same solution has to be duplicated again, and the next answer becomes 3 servers.
- The current server we're paying for is over $100/month USD. We would need to purchase duplicate specs and take on a total of around $250/month USD just in server costs. Could we swing that with donations? Possibly, but that also means the staff (in this case Aschent_) is on the hook for that monthly cost in the event donations don't hit the expected mark.
- As anyone who has dealt with server infra before will attest, downgrading a server is more difficult than scaling one up.
- Third - We would need to cut the server down the middle and ensure everything can portal between both servers at the right points. This is dev time we'd need to allocate to this process and we'd easily be spending several weeks building this out and trialing the process.
- Lastly on this point, and in my mind most importantly, we can't expect the current staff to be bouncing between two servers to run TDN at this point. Although we'd hope folks would understand, we are also responsible for ensuring that the experience of both servers is consistent and adequate to the best of our ability. It would be near-impossible to do this without strapping the DM team to a hamster wheel and forcing them to DM all day.
So, to get to the crux of the concern here, which really is: "I really want to play on your server and reliably play with people I am adventuring with, consistently."
We completely understand this, so let me lay out our plan/strategy here:
- Priority #1 is ensuring that our current experience in-game is as performant as possible. We need time to ensure that is the case. This also includes the annoying "STACK OVERFLOW" messages you see from time to time and other oddities.
- Next is to ensure that, as unique player needs/issues/problems die down, we can empower our community and DMs to drive narrative forward in a healthy manner.
- We want DMs "DMing", not just playing customer support because Jimmy got stuck behind some walls.
- As #2 happens, we can begin increasing our cap steadily. We are willing to push the total max clients for NWN; however, we need to avoid getting too crazy with it at this point.
Third, Server crashes:
Server crashes suck. Plain and simple. The current belief is that a server log was filling up too rapidly and basically eating all of the server's disk space, to the tune of numerous gigs. Unfortunately, we couldn't get 100% confirmation because our automation kicked in for a reset and removed that log. For now, we've removed that extra piece of logging and we'll monitor for crashes. If another takes place, we'll keep digging until we track the issue down to its root cause.
We have added some preventative measures that take place as you play which will export and save your character. In the event of a crash, the rollback experienced should be lessened.
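For the technically curious, here is a rough Python sketch of the kind of guard that can catch a runaway log before it eats the disk, plus a helper that keeps the end of a log before a reset wipes it. This is not our actual automation. The function names, paths, and size limits are all hypothetical and would need tuning for a real server.

```python
import os
import shutil


def log_needs_attention(log_path, max_log_bytes, min_free_bytes):
    """Return True if the log is over its size budget or the disk is low.

    A missing log file counts as size 0. Both limits are caller-supplied
    assumptions, not values taken from the TDN server.
    """
    log_size = os.path.getsize(log_path) if os.path.exists(log_path) else 0
    log_dir = os.path.dirname(os.path.abspath(log_path))
    free_bytes = shutil.disk_usage(log_dir).free
    return log_size > max_log_bytes or free_bytes < min_free_bytes


def save_log_tail(log_path, dest_path, tail_bytes=64 * 1024):
    """Copy only the last `tail_bytes` of the log to `dest_path`.

    Saving just the tail keeps some evidence for root-causing without
    copying a multi-gigabyte file onto an already full disk.
    Returns the number of bytes written.
    """
    size = os.path.getsize(log_path)
    with open(log_path, "rb") as src:
        src.seek(max(size - tail_bytes, 0))
        tail = src.read()
    with open(dest_path, "wb") as dst:
        dst.write(tail)
    return len(tail)
```

The idea would be to run the check on a timer and call `save_log_tail` right before any automated reset, so the next crash leaves something behind to confirm (or rule out) the disk space theory.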
Lastly - Client Lag & Models:
Probably the most back-and-forth experience so far has been people loading into a zone and their FPS tanking because of the asset quality. We are really trailblazing in this territory, so the best recommendation we can offer (if you're experiencing client lag) is:
- Avoid massive zones like Tethir Thoroughfare that host large playercounts.
- Consider adjusting your settings and playing with the 'experimental' settings to see if you notice an improvement.
Hopefully this post helps spread some helpful info (or serves as a useful bit of context to link in chat). Concerns about perf and about wanting to roleplay in-game but being blocked are totally valid. We believe we have a viable strategy to get through this, so please bear with us. TDN is absolutely a marathon, not a sprint, but we'll get there together if you can stick it out with us.
- TDN Team