In our support boards we aim to provide support for the B3 core in its current state. Older releases of B3 are NOT supported. Check our front page for the latest version. You may post feature requests in our General Discussion board. Modifications and Hacks of the core B3 code are NOT supported.
Before you ask for support: [ Read the Support Instructions ] - More info: [ Full Support Disclaimer ]

You are here: Big Brother Bot ForumSupport ForumsGeneral Usage Support (Moderator: MordyT)B3 forgetting teams
Pages: [1] 2 3   Go Down
  Print  
Author Topic: B3 forgetting teams  (Read 1438 times) Bookmark and Share
Jr. Member
**
OS: Linux
Type: Owner dedicated server(s)
Gameservers: UrT
Posts: 24
Offline Offline
« on: September 24, 2011, 03:49:29 PM »

Since upgrading to 1.7.1 (from 1.6.0) I've had issues with a team-balancing plugin I'm developing where b3 loses sync with what teams it thinks bots are on.  At first I expected I was doing something wrong, as I've had to maintain a lot of state information separately from b3.  I put a lot of work into non-invasively enforcing reliable knowledge of the true client state, but I knew my code was going through unstable changes.

Now my code is stable, and I'm still seeing problems.  On a hunch, I began testing b3 without my plugin installed at all, and the xlrstats page started reporting everyone as spectating.  It was only my plugin's persistent attempts to enforce team sync by re-applying team assignments and disallowing spectating bots that ever caused anyone to appear on a team at all.  To be clear, I'm never trying to assign teams to b3's base client objects, and no players are actually spectating.  The problem might only apply to bots, but I can't be sure of that.  There are usually one or two humans around, and they appear to be affected less.  But I can't be sure they're never being falsely reported as spectating...I haven't the time to just sit and watch random player movements alongside the console with every new revision.

I can't provide any useful log output, as evidence of the problem only appears by seeing the xlrstats page's currently playing list, or by using output generated by my plugin.  Client info debug output--raw and parsed by OnClientUserinfo()--are showing up in logs that show players/bots on teams as they are.  But the first time my plugin sees a glimpse of b3's client object for that player, it already has client._team == b3.TEAM_SPEC.  The same goes for xlrstats' output.  More than that, after I assign it a team using direct console commands (i.e. forceteam cid blue), and even if I do manually set client._team, it still eventually reverts to client._team == b3.TEAM_SPEC.  The clients are not actually getting moved to spectating, nor should they be...b3 only thinks they are.

xlrstats is the only plugin I've not disabled during testing, as without either it or my plugin I cannot actually view b3's team assignments.  Besides that, I've searched the source code of all plugins to ensure they are not assigning anything to *.team or *._team, either directly or using setattr(), and they are not.

This crusade began when I had a bug that caused my separate client state info to drift, and I wanted to compare against what b3 thinks to detect problems.  I think I have my issue fixed and could just ignore b3's info now, but it does still bug me that xlrstats reports a mess of spectating bots, and complicates development.  Always having to switch in and out of the game to verify even basic team assignment data is a huge pain, and because my bug was a rare race condition, I'd like to still have means of detecting it should it recur.

UPDATE:  When I have my plugin installed but observing only, it actually observes client objects as always "unknown" (not "spectating").  For now I'm ignoring the bogus client data, but I wish there were some way I could reliably catch any future sync errors without having to spam speculative corrections.  Is there a proper way to periodically request real team info updates through the parser?  Some documentation that does more than list method and property declarations would be a godsend.  It would also be nice not to have to randomly reassign everyone every time I restart b3, but try to query client team assignment first instead.

Here's a log entry for user info changed...it doesn't appear that the parser outputs any debug info from parsing it though:

Code:
110924 22:35:42 CONSOLE '1:09 ClientUserinfoChanged: 5 n\\Mantis\\t\\2\\r\\2\\tl\\0\\f0\\\\f1\\\\f2\\\\a0\\0\\a1\\0\\a2\\0'

I do notice that in iourt41.py, OnClientuserinfochanged assumes team ids match between sytems.
« Last Edit: September 24, 2011, 07:36:17 PM by HonoredMule » Logged

[ www.xlrstats.com ]
Project Lead
*
OS: Linux
Type: Owner dedicated server(s)
Gameservers: CoD, CoD2, CoD5, UrT
Posts: 2022
Offline Offline
WWW
Support Specialty: B3-Core, CoD/UrT/WoP/ETPro parsers, Plugin development
« Reply #1 on: September 25, 2011, 02:24:22 AM »

In order to start a proper investigation on this issue we'd need some additional info first.

- What UrT binaries are you using, and is that what you mean with systems in your last line by 'OnClientuserinfochanged assumes team ids match between sytems'
- Do you use private slots on the server
- Do the problems also occur to real clients, or bots only
- Does it also occur when in the xlrstats.xml hide_bots and exclude_bots are set to False
- Can you provide (a link to) a b3.log file in verbose2 mode (loglevel set to 1)
- Perhaps also the part of UrT log that covers the same timeperiod

We'd have to see log data proving that the UrT parser is missing things or handling loglines wrong in order to identify the problem.
Logged

Senior Dev.
*
OS: Linux
Type: Home user
Posts: 3483
Offline Offline
WWW
Support Specialty: B3-Core, UrT/SmG/BFBC2 parsers, Plugin development
« Reply #2 on: September 25, 2011, 03:19:09 AM »

Quote
Is there a proper way to periodically request real team info updates through the parser?

method A - use the /rcon players command
which returns the team with each player score :
Code:
Map: ut4_casa
Players: 1
Scores: R:0 B:0
0: Test BLUE k:0 d:0 ping:0 192.168.0.2:27961
NOTE : this is the what the CorrectPlayersTeam method of the iourt41.py parser use to fix teams.


method B - query cvar g_blueteamlist and g_redteamlist
Code:
g_blueteamlist
"g_blueTeamList" is:"A^7" default:"^7"
g_redteamlist
"g_redTeamList" is:"^7", the default
NOTE : this is used by the poweradminurt plugin to tell if teams are balanced. Also the value of those cvar are capital letters which represents the slot number of connected players.
A -> slot 0
B -> slot 1
C -> slot 2
etc (won't work for server allowing more than 26 players)

I'm not sure are reliable are those results when called in between rounds.


A way for you to debug the issue could be to run B3 from Eclipse pydev and set a breakpoint in the Client._set_team method (which is called whenever a client.team property is set).
Logged

Jr. Member
**
OS: Linux
Type: Owner dedicated server(s)
Gameservers: UrT
Posts: 24
Offline Offline
« Reply #3 on: September 26, 2011, 10:06:49 PM »

I expect I'll have a chance to look at this again and gather more data in a day or two.  In the mean while:
 - It does affect humans as well as bots.
 - I'm using a security-bugfixed ioquake3 build, and when I do post the logs, they will show properly formed query responses coming from UrT.
 - I do not use private slots.
 - "Does it also occur when in the xlrstats.xml hide_bots and exclude_bots are set to False"  Haven't tested this yet, but the faulty values are directly in the client objects, and occur even when xlrstats isn't an included plugin.
 - 'OnClientuserinfochanged assumes team ids match between sytems' is referring to how that method handles incoming data in iourt.py.  It assumes B3 and UrT are recognizing team blue, for example, as the same integer value.
Logged
Senior Dev.
*
OS: Linux
Type: Home user
Posts: 3483
Offline Offline
WWW
Support Specialty: B3-Core, UrT/SmG/BFBC2 parsers, Plugin development
« Reply #4 on: September 26, 2011, 10:24:43 PM »

I'm surprised you are the first o e to report this issue. So it is either related to B3 v1.7.1 or the custom iourtded build you are using.
Please keep us up to date and don't hesitate to ask questions. With BF3 beta starting I do not have time to investigate this myself, but I'll do anything to help
Logged

Beta Testers
*
OS: Linux
Type: Owner dedicated server(s)
Gameservers: UrT
Posts: 244
Offline Offline
« Reply #5 on: September 27, 2011, 03:43:25 AM »

Usually modified server binaries don't affect logs (unless they're heavily modified, but this mods are not supported by Frozen Sands and usually not public).

When starting the b3 during the game you can always see that players are desynchronized (like 20 spectators on xlrstats), and it takes time to resync. Few times I had the situation, that !teams command from poweradminurt either didn't work (but teams were not balanced) or that during the game one person was moved to opposite team and back to previous team instantly by auto !teams (getting killed in the process).

After running the b3, I run !rebuild command to resync player info to the xlrstats. Honestly I don't know if it really works or if it's just coincidence, but after refreshing status.xml the players are shown correctly.
Logged

Jr. Member
**
OS: Linux
Type: Owner dedicated server(s)
Gameservers: UrT
Posts: 24
Offline Offline
« Reply #6 on: September 28, 2011, 06:42:16 AM »

I just did a quick test with !rebuild, and it had no effect on results on my server.  I'm too bogged down with other projects right now, but I will get around to posting logs eventually.

I've seen lots of xslrstats pages reporting a surprising number of spectators before...I wouldn't be surprised if it's just something no one notices, since the xlrstats team listing is the only place the data is ever made visible.  I wouldn't be shocked if it was something I was causing myself either--I just can't see how.
Logged
Jr. Member
**
OS: Linux
Type: Owner dedicated server(s)
Gameservers: UrT
Posts: 24
Offline Offline
« Reply #7 on: October 03, 2011, 12:39:28 PM »

http://games.honoredsoft.com/logs.zip

This is with log level at VERBOS2 and all plugins running in full configuration.  Now that the server I'm replacing has gone down, I don't want to be screwing around with minimal configs.  In the logged setup, xlrteams is periodically enforcing its team knowledge in the form of team assignments, but not editing the client record directly or paying any attention to the B3-believed team except as a means of preventing team assignment spamming when they actually agree.

The thing I notice most is that under this setup, as a human player everything is often fine.  But when it is not, it never gets corrected.  (xlrteams periodically notifies me of the discrepancy endlessly, even after assigning me to a different team or after I move myself--an act which xlrteams does acknowledge).  When I had xlrteams manually intervening to correct B3's team assignments, this was still the case (i.e. a matching record would revert to not matching without any actual team changes occurring, and the reverted value would be unknown).

Here are some explanations of the log entries produced by xlrteams:
Quote
111003 16:51:45   VERBOSE   'XlrteamsPlugin: getPlayer returning Bot Chicken[*red] (BOT1) for Client Chicken[unknown] (BOT1)'
getPlayer is a method which accepts a client record and returns the matching xlrplayer record.  It logs "returning (Bot|Player) playername[playerteam] (guid) for Client playername[playerteam] (guid)" where the first set of values are the ones remembered by xlrteams and the second set are from the client record. (The * in [*red] was added to break bbcode.)
Quote
111003 16:51:58   VERBOSE   "XlrteamsPlugin: Player Boa (4) didn't exist and had to be created"
An attempt to do as above failed.  "playername (cid)" comes from the client record.
Quote
111003 16:51:58   DEBUG   u'Creating new client: Cheetah[unknown] @0.0.0.0 (BOT4) (-1) with xklrskill=1250'
A new xlrplayer record was created for the provided client record.  all values except the last come from the client record, and are "playername[teamname] @ip_address (guid) (team_id)"

I don't have any short-term plans to investigate further myself, as the intervention by xlrteams keeps the xlrstats team display mostly right most of the time anyway, and all that's left is the occasional case where someone gets perodically spammed faulty messages about possibly having to correct his team assignment (team changes to the same team do not cause a respawn and so aren't very disruptive).
« Last Edit: October 03, 2011, 01:02:42 PM by HonoredMule » Logged
Jr. Member
**
OS: Linux
Type: Owner dedicated server(s)
Gameservers: UrT
Posts: 24
Offline Offline
« Reply #8 on: October 05, 2011, 11:22:57 AM »

I'm now running with production configuration (error logs only) but am seeing something that stands out a little.  I don't know if this is mostly happening during map switches, but I suspect it can't all be ("unknown weapon id" for example).

Code:
609 111005 16:07:04 ERROR   '**** Event queue was full (15)'
610 111005 16:07:06 ERROR   '**** Event queue was full (15)'
611 111005 16:07:08 ERROR   '**** Event queue was full (15)'
612 111005 16:07:10 ERROR   '**** Event queue was full (15)'
613 111005 16:07:14 ERROR   '**** Event sat in queue too long: Client Authenticated 7'
614 111005 16:07:14 ERROR   '**** Event sat in queue too long: Client Team Change 7'
615 111005 16:07:14 ERROR   '**** Event sat in queue too long: Client Name Change 7'
616 111005 16:07:14 ERROR   '**** Event sat in queue too long: Client Team Change 7'
617 111005 16:07:14 ERROR   '**** Event sat in queue too long: Client Connect 5'
618 111005 16:07:14 ERROR   '**** Event sat in queue too long: Client Authenticated 5'
619 111005 16:07:14 ERROR   '**** Event sat in queue too long: Client Kill 5'
620 111005 16:07:14 ERROR   '**** Event sat in queue too long: Client Damage 5'
621 111005 16:07:14 ERROR   '**** Event sat in queue too long: Client Connect 2'
622 111005 16:07:14 ERROR   '**** Event sat in queue too long: Client Authenticated 2'
623 111005 16:07:14 ERROR   '**** Event sat in queue too long: Client Team Change 2'
624 111005 16:07:14 ERROR   '**** Event sat in queue too long: Client Team Change 2'
625 111005 16:07:14 ERROR   '**** Event sat in queue too long: Client Damage 2'
626 111005 16:07:14 ERROR   '**** Event sat in queue too long: Client Damage 2'
627 111005 16:07:14 ERROR   '**** Event sat in queue too long: Client Damage 2'
628 111005 16:07:43 ERROR   '**** Event queue was full (15)'
629 111005 16:09:13 WARNING 'unknown weapon id on Hit line: 21'
630 111005 16:09:13 WARNING '_getDamagePoints(None, 2) cannot find value : None'
631 111005 16:13:59 ERROR   '**** Event queue was full (15)'
632 111005 16:16:27 WARNING "RCON: ERROR reading: Exception('No readable socket',)"
633 111005 16:16:28 WARNING "RCON: ERROR reading: Exception('No readable socket',)"
634 111005 16:16:28 ERROR   "RCON: too much tries. Abording ('set bot_minplayers 0')"
635 111005 16:16:30 WARNING "RCON: ERROR reading: Exception('No readable socket',)"
636 111005 16:16:31 WARNING "RCON: ERROR reading: Exception('No readable socket',)"
637 111005 16:16:31 ERROR   "RCON: too much tries. Abording ('g_redteamlist')"
Logged
Moderator
*
OS: Windows
Type: Gameserver Rental Co.
Gameservers: 2x CoD4, 1x BF3
Posts: 2627
Offline Offline
Owner of Host4B3.com - Over 70 bots hosted!
WWW
« Reply #9 on: October 05, 2011, 11:33:40 AM »

This might be of interest to you to fix the even queue issues:
Link
Logged

Need B3 Bot hosting? Check out Host4B3.com
Check Twitter.com/Host4B3 for updates if the site it down.

Help will be given to those with a b3.log

System: Python 2.7.1 - B3 Source Code - Locally hosted MySQL & Apache - Win 2k3
Jr. Member
**
OS: Linux
Type: Owner dedicated server(s)
Gameservers: UrT
Posts: 24
Offline Offline
« Reply #10 on: February 23, 2012, 06:47:08 PM »

I found the time to take a look at some issues myself, and found one obvious problem with the rcon implementation.  During map change the server doesn't respond to anything, and important events/commands get discarded.  I modified game.py to add a 30 second timeout to queuing EVT_GAME_MAP_CHANGE (should also do the same for round start and warmup, but figuring out where those is queued is a little roundabout, and those events aren't so important to me since I've long ago managed to work around their unreliable receipt).  And the big change that has fixed my biggest problem (B3 forgets that certain players exist/don't exist)  came from this modification to clients.py:

Code:
@@ -113,6 +113,10 @@ class ClientVar(object):
 
 #-----------------------------------------------------------------------------------------------------------------------
 class Client(object):
+   
+    # configuration
+    event_timeout = 20 # ensure important events get discarded last during congestion
+   
     # fields in storage
     guid = ''
     pbid = ''
@@ -415,7 +419,7 @@ class Client(object):
         if self._team != team:
             self._team = team
             if self.console:
-                self.console.queueEvent(b3.events.Event(b3.events.EVT_CLIENT_TEAM_CHANGE, self.team, self))
+                self.console.queueEvent(b3.events.Event(b3.events.EVT_CLIENT_TEAM_CHANGE, self.team, self), self.event_timeout)
 
     def _get_team(self):
         return self._team
@@ -446,7 +450,7 @@ class Client(object):
         self._exactName = name + '^7'
             
         if self.console and self.authed:
-            self.console.queueEvent(b3.events.Event(b3.events.EVT_CLIENT_NAME_CHANGE, self.name, self))
+            self.console.queueEvent(b3.events.Event(b3.events.EVT_CLIENT_NAME_CHANGE, self.name, self), self.event_timeout)
 
     def _get_name(self):
         return self._name
@@ -691,7 +695,7 @@ class Client(object):
             return False
         else:
             if console:
-                self.console.queueEvent(b3.events.Event(b3.events.EVT_CLIENT_UPDATE, data=self, client=self))
+                self.console.queueEvent(b3.events.Event(b3.events.EVT_CLIENT_UPDATE, data=self, client=self), self.event_timeout)
             return self.console.storage.setClient(self)
 
     def auth(self):
@@ -737,7 +741,7 @@ class Client(object):
 
             self.console.queueEvent(b3.events.Event(b3.events.EVT_CLIENT_AUTH,
                 self,
-                self))
+                self), self.event_timeout)
 
             self.authorizing = False
 
@@ -895,6 +899,11 @@ class Group(Struct):
 
 #-----------------------------------------------------------------------------------------------------------------------
 class Clients(dict):
+   
+    # configuration
+    event_timeout = 30 # ensure important events get discarded last during congestion
+   
+   
     _nameIndex    = None
     _guidIndex    = None
     _exactNameIndex = None
@@ -1122,7 +1131,7 @@ class Clients(dict):
         if self.has_key(cid):
             self[cid] = None
             del self[cid]
-            self.console.queueEvent(b3.events.Event(b3.events.EVT_CLIENT_DISCONNECT, cid, client=client))
+            self.console.queueEvent(b3.events.Event(b3.events.EVT_CLIENT_DISCONNECT, cid, client=client), self.event_timeout)
 
         self.resetIndex()
 
@@ -1141,7 +1150,7 @@ class Clients(dict):
 
         self.console.queueEvent(b3.events.Event(b3.events.EVT_CLIENT_CONNECT,
             client,
-            client))
+            client), self.event_timeout)
     
         if client.guid:
             client.auth()

Added to that I used the undocumented "event_queue_size" config in b3.xml, and now the important events seem to be arriving much more reliably, even when the map is changing.

Most importantly, B3 no longer thinks everyone has left between maps, which fixes the spamming of extra bots, xlrstats being constantly shut down if I require n human players as well as poweradmin's player count-based map rotation switcher.  Once I found out about the expiry parameter on various commands, I added them to my important rcon commands as well, and now everything appears to be much more reliable.

The proper solution to all this would be to implement priority tagging for the event queue, so that expiry can be automatically be set on all events centrally, accepting the parameter as an override.  Further, the event queue and rcon ought to automatically pause themselves during map switches, rather than spamming retries.  Also, I would like to see some documentation regarding sv_minrate.  According to the wiki, it should be set high to make B3 run smoothly, but doesn't explain how or why.  I hoped  that doing so would make rcon commands to the server (particularly messages for clients) flow much faster, but it appears to have had no effect on that.

The original problem I reported (B3 forgetting teams) still exists, and as far as I can tell no attempt is ever being made to actually find out what the teams are , except during parser.pluginsStarted.  How would I go about triggering iourt41.py->getPlayerTeams() from a plugin to ensure sync is maintained?

The horribly delayed client messaging is now (I think) the biggest outstanding issue with my server setup (since my plugin fixes team assignments by brute force anyway).  In fact it appears to have been made much worse by something I did, even though I can identify no connection (I'm not even setting expire on these messages, and nothing else is happening between them).  I'm now searching for some variable that might control the rate at which they're processed, since the issue is not cpu, network/db, or disk (all of which are minimally utilized).  I can remember a time when they'd flow nearly instantaneously (a lifetime ago), but now, when I call debugging commands that return a few lines of console text, one line disappears before the next ever arrives.  When 20 clients are to be informed of their team switch, the 20th client gets his message 2 minutes after being switched.

EDIT: I moved my log files to tmpfs (/dev/shm) and now messages appear faster (can see 2 lines at a time now), but this is still way slower than it should be, and still slower than it was before I increased the queue size.

EDIT2: Found comments in parser.py indicating some config options got moved around on me, and these changes are not reflected in the distribution.xml.  Now I'm inspecting parser.py to see what values are actually read and where.  Maybe I'll get lucky and find some more undocumented config variables.  ...no luck.  All that remains to be a possible factor is that I'm using the external IP to connect the RCON.  That can't be helped because the server defaults to listening on and reporting the LAN ip.  But that has always been the case, and shouldn't be a problem...it's still practically local, and once worked well like this.
« Last Edit: February 23, 2012, 10:23:20 PM by HonoredMule » Logged
Jr. Member
**
OS: Linux
Type: Owner dedicated server(s)
Gameservers: UrT
Posts: 24
Offline Offline
« Reply #11 on: February 23, 2012, 11:27:57 PM »

My console delay issues are fixed.  I had previously tried playing with the "delay" and "lines_per_second"  config variables, even setting them to what I though were extreme values.  There didn't appear to be any change.  Now recently I came across run() in parser.py, and seeing how lines_per_second was actually being used prompted me to go a little further.  I set delay from 0.01 to 0.001 and lines_per_second from 10 thousand to 10 trillion.  Lo and behold, rcon messages flow faster than they ever did before.  Of course b3_run went from normally idle to a steady 3% cpu usage, but at least I had a bloody result, and I understood why.  Moving delay back to 0.1 and lines_per_second to 10 million, I still get the same performance (with the exception that !help outputs messages at a more reasonable pace) and b3_run is back to mostly-idle (top shows 0% usage more often than not, 1-2% the rest of the time).  I still can't tell why writing to the console is affected by this work loop that handles reading the logfile, and that seems like something that ought to be documented more clearly.  I didn't expect the whole server's operation to be throttled by something that appeared related only to log processing.

It seems possible that this will make the edits to clients.py unnecessary--with the config change everything has started working as it should, including prompt recognition of actual team assignments--but I'm not going back.  Enforcing higher reliability of more important events is just common sense. Wink  Besides which, I'm not very reassured that B3 will settle down and preserve details during map changes, unless the slowdown was somehow preventing an rcon-pause that I can't find in the code but does exist.  Plus the team assignment issue suggests that having any delay at all is allowing excessive out-of-order processing, for team changes to be recognized now but not before.
« Last Edit: February 24, 2012, 12:07:20 AM by HonoredMule » Logged
Senior Dev.
*
OS: Linux
Type: Home user
Posts: 3483
Offline Offline
WWW
Support Specialty: B3-Core, UrT/SmG/BFBC2 parsers, Plugin development
« Reply #12 on: February 24, 2012, 01:28:59 PM »

Let me try to explain the role the event queue has in B3 by this schema



The important point on this schema is that in B3 there are 2 main threads which communicate together throught B3 events.

The 1st one is responsible for :
  • reading a game log line
  • give sense to it (might need querying the game server throught rcon)
  • update internal game and player states
  • create a B3 event which is queued

The 2nd one :
  • dequeue a B3 event
  • transmit the event to the 1st interested plugin
  • that plugin might make rcon queries
  • once the 1st plugin is done, transmit the event to the 2nd interested plugin
  • ... and so on until there are no more plugin listening for that type of event

Each thread runs independently from the other one. So if one thread is blocked for a while, the other continues to do its job.

Now let study the main bottlenecks
  • rcon queries : due to their nature (UDP) and server limitation, the rate at which rcon queries are sent is limited. With the iourt41 parser, each rcon query can take up to 2.5s before giving up.
  • slow plugins : a plugin treating an event can block the event dispatcher thread if it is slow to finish whatever it has to do. (could be because of rcon queries, sleeping for a while, etc)
  • slow game log parsing : to save on CPU and I/O the rate at which the parser thread reads lines of the game log can be limited.
  • slow game log line parsing : a type of game log line might requires that the parser thread do multiple rcon queries to get what it needs in order to update the game and players' states and/of create a complete B3 event.

How A slow event dispatcher thread can impact the parser thread
The event queue cannot hold a infinite number of events. If events are queued by the parser thread quicker than the event dispatcher thread can dequeue them, then eventually the queue will fill up.
Once full the queue won't accept any new event. When trying to queue an event the parser thread will block at most 2 seconds before giving up if during those 2 seconds the queue did not accept its event. When giving up, you get a error message in the B3 log. This means that during that time the parser thread did not read the next line of the game log file.
If this situation lasts, the parser thread might have difficulties to keep up with the rate at which the game server writes new game log lines.

How to find out the bottlenecks
1st, set up the log level down to 8 ; this will make B3 collect how much time the different plugins take to treat events. Statistics will be dump to the B3 log file every minute. This allows you to figure out what type of event are slow to treat and what plugin might be the bottleneck.

How to tweak B3
If you find out that the event queue gets full for short period of time but on a regular manner, then it means that your game server might produce burst of lines in the game log which make the parser thread to produce lots of B3 events quickly for a short period. This happens if you run a large game server with lots of players. the more players the more events ;this also happens on servers that are setup to log each bullet hit. The solution is then to increase the event queue size.

If you find out that the event queue gets full for long period of time, this probably mean that the event dispatcher thread is unable to keep up with the rate at with your game server produce game log lines. This is most likely because of a slow plugin or because of too many plugins. You then can either disable the slow plugin or improve its code (by having the plugin run its slow task in an other thread for example)

If you find out that the bottleneck is the rcon queries, then you can try to reduce the default rcon_timeout value which is by defaut 0.8s.
Logged

Jr. Member
**
OS: Linux
Type: Owner dedicated server(s)
Gameservers: UrT
Posts: 24
Offline Offline
« Reply #13 on: February 24, 2012, 10:19:42 PM »

Thanks for the explanation.  I do get all this though, and it runs pretty much as I figured.  The part that still doesn't make sense is the delay to rcon commands executed within an event handler.  As you say, the log reading thread is independent from the event dispatching thread.  Nevertheless, the problem I had, which even affected consecutive rcon commands run inside a plugin's event handler was completely resolved by speeding up the log reading thread (or rather stopping it from spending so much time idle).  In fact with the parser thread sped up, the system ran fine even limiting the event queue length to 5.  At no point does waiting for RCON queries to resolve appear to have had any part in the equation.

In both cases, CPU, network, database, and disk IO usage were extremely minimal, even on a very modest machine (2GHz dual core).  When the parser thread is made to sleep 90% of the time, the event dispatcher thread is spending huge amounts of time blocked from running, even in the middle of running a plugin function.  Removing every single plugin created some speedup, but still left performance extremely sluggish.  And during this whole debugging session, there were typically only 6 players in the game, mostly bots.  Is there a chance you have some unintended threading interdependency/race conditions interacting with RCON or perhaps the event queue?  I have a strong suspicion these issues would not be reproducible on a single core (non-hyperthreaded) machine.

I really don't understand why there's a call to thread.sleep() in every iteration of parser->run()->process_line in the first place.  That amount of forced task switching is hugely expensive and inefficient, especially on Linux where the CPU timeslices are rather large compared to Windows.  And if in fact some plugin were doing a lot of work on an event, it would make no difference since the time being lost isn't by doing work but by waiting despite still having work to do--and it's not the plugins doing that unless they are blocking on something that shouldn't block.

I would have expected a rate limiter to allow operation for X milliseconds (looping as many times as possible during that span) before forcing a sleep() call while there's still known work to be done.  Then once reached, the outer loop should sleep for the remainder of its "delay" timeslice, which would make performance consistent across a wide range of workloads.  All that sleeping which isn't even for blocking IO makes the CPU spend more time switching tasks, not running them, and doesn't do other processes like the game server itself any favors.  The OS is the proper authority on process priority, which can be told to lower B3 priority if it does in fact manage to steal the CPU time it uses from the game server.  As it's currently implemented, the config variables control not the amount of CPU time B3 is allowed to spend, but rather the maximum amount of work B3 is allowed to do even if CPU throughput were infinite and task switching free.

But again, none of that explains why any of this affects the execution of code handling already-dispatched events anyway.  That has to be caused by some form of unforeseen resource contention within B3.  Remember, my B3's event dispatcher is running faster not by doing less work or working less often, but by its parser thread spending more time trying to work while there's work to be done, and less time waiting for nothing in particular.

My conclusion without digging deeper (as I really haven't the time to do, unfortunately) is that the parser thread is going to sleep while leaving the RCON and possibly event queue as well in an unusable, thread-locked state.  This in turn would force the dispatcher thread to block on those resources, limiting the rate at which it can use RCON to the rate at which the parser processes lines (or loops and finds none).  And if that isn't the case, I really haven't a clue what could be going on.   Embarrassed
Logged
Senior Dev.
*
OS: Linux
Type: Home user
Posts: 3483
Offline Offline
WWW
Support Specialty: B3-Core, UrT/SmG/BFBC2 parsers, Plugin development
« Reply #14 on: February 25, 2012, 05:43:45 AM »

Quote
The part that still doesn't make sense is the delay to rcon commands executed within an event handler.
On Quake3 engine, rcon commands are sent over UDP and their eventual responses are also received over UDP. B3 has to ensure we received the response from a 1st rcon command before sending a 2nd one ; or we will be unable to tell for sure which response matches which query.
Do do so, the B3 Rcon class uses a lock to ensure only one rcon command is sent at once. the lock is only released when the response for the rcon command is received, or timeout.
As such, you can spam rcon commands that send back a response at a higher rate than, rcon commands that do not get response back.
Note that this affects all rcon commands whatever thread they are originating from. So spamming rcon commands in the parser thread might affect how fast plugins can use the rcon facility.


Quote
the problem I had, which even affected consecutive rcon commands run inside a plugin's event handler was completely resolved by speeding up the log reading thread (or rather stopping it from spending so much time idle).
removing the time.sleep(self.delay2) from parser.py  ?

Quote
In both cases, CPU, network, database, and disk IO usage were extremely minimal, even on a very modest machine (2GHz dual core).
I totally agree with you when reading a game log from local disk. If I remember correctly, the limitations are mostly because of the ability for B3 to read game log over the network with FTP/HTTP/HTTPS. See 4253b0b3

Quote
Is there a chance you have some unintended threading interdependency/race conditions interacting with RCON or perhaps the event queue?
there is a chance, but if this is the case it is still to discover.

Quote
I have a strong suspicion these issues would not be reproducible on a single core (non-hyperthreaded) machine
I don't believe in this explanation as Python is not able to do real multi-threading as you would have in other languages. Search "python GIL" for the reason why.

Quote
How would I go about triggering iourt41.py->getPlayerTeams() from a plugin to ensure sync is maintained?
from a plugin just call self.console.getPlayerTeams()

To conclude, could you confirm that your issues were originating from the value of lines_per_second ? If so, I would be in favour of removing the time.sleep(self.delay2) from parser.py when lines_per_second is not set in the b3 config file.
« Last Edit: February 26, 2012, 02:20:49 AM by Courgette » Logged

Tags: urt 
Pages: [1] 2 3   Go Up
  Print  
 
Jump to:  


Rate this page +1 at Google Search


SimplePortal 2.3.1 © 2008-2009, SimplePortal