We have one particular client that is experiencing a large number of partial session disconnects. By this I mean that the screens go black, sometimes flash a bit, and then come back up. The session appears to be in the same state that it was in prior to the "blackout" - as the client has termed it. It does not matter whether the user is sitting idle or typing away in Word, Outlook, or some business application. The "blackouts" seem totally random.
The configuration looks like this:
- VMware View 4.6 on the DVMs.
- DVMs are Windows XP.
- All Hosts are ESXi 4.1
- All Hosts are in a secure data center
- All End Points are in a remote office
- Remote office is connected to data center via 20 Mbps fixed / 100 Mbps burstable MetroE line.
- All endpoints are PCoIP thin clients.
- Tried Firmware 3.2, 3.3, 3.3.1, and now 3.4
- There are about 20 users - most (all?) have experienced the issue.
Note that we have many other clients using the exact same end points connected to the exact same infrastructure across AT&T MetroE lines that are terminated by Cisco 28xx or 29xx routers. We do not have this issue anywhere except with this one particular client.
Here is what we've done so far:
- We let MTR run for over a week. We sent 16,607,039 packets and received 16,606,717 of them, so we dropped 322 packets. This equates to a packet loss of 0.0019% - well within operational guidelines.
- During the same MTR run, we observed an average latency of just 4ms.
- Verified zero errors on the router interfaces connecting to MetroE on either end.
- Verified zero packet drops on the router interfaces on either end.
- PCoIP log files are showing mostly 0.00% packet loss.
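For anyone who wants to double-check the MTR arithmetic above, here is a quick sketch in Python. The variable names are mine; the packet counts are the ones from our MTR run.

```python
# Packet counts from the week-long MTR run described above.
sent = 16_607_039
received = 16_606_717

# Packets that never came back.
lost = sent - received

# Loss expressed as a percentage of packets sent.
loss_pct = lost / sent * 100

print(f"lost={lost} loss={loss_pct:.4f}%")
```

This gives 322 lost packets and a loss of roughly 0.0019%, matching the figures quoted.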
This is a snippet from a log file from a DVM that is experiencing blackouts. Note that things appear to be chugging along just fine, and then there is an entry that says "MGMT_IMG :Imaging Timer expiry." This is followed by "MGMT_IMG :Resetting encoder (reset_type=1)" and then "IMG_FRONTEND :close_displays: reset SVGADevTap." I'm not a PCoIP expert, but things like "Resetting encoder" and "close_displays" sure seem bad. Surprisingly, there are no noticeable bandwidth issues or packet loss logged prior to the blackout.
Here is the complete log snippet:
06/09/2011, 10:42:44.631> LVL:2 RC: 0 MGMT_PCOIP_DATA :Tx thread info: bw limit = 2500, plateau = 2500.0, avg tx = 14.1, avg rx = 7.8 (KBytes/s)
06/09/2011, 10:42:44.631> LVL:1 RC: 0 VGMAC :Stat frms: R=000000/000000/331763 T=009795/247648/098768 (A/I/O) Loss=0.00%/0.00% (R/T)
06/09/2011, 10:42:45.740> LVL:2 RC: 0 MGMT_PCOIP_DATA :Tx thread info: round trip time (ms) = 4, variance = 1, rto = 105
06/09/2011, 10:43:11.334> LVL:2 RC: 0 MGMT_PCOIP_DATA :Tx thread info: round trip time (ms) = 4, variance = 1, rto = 105
06/09/2011, 10:43:12.599> LVL:2 RC: 0 MGMT_IMG :log: cur_s 0 max_s 30 tbl 0 bwc 0.00 bwt 17.58 fps 0.27 fl_ps 1.57
06/09/2011, 10:43:12.599> LVL:2 RC: 0 MGMT_IMG :log: chg pix: 792768, chg pix not motion: 792768
06/09/2011, 10:43:12.599> LVL:2 RC: 0 MGMT_IMG :log: delta bits encoded: 356216, delta build bits encoded: 55864.
06/09/2011, 10:43:12.599> LVL:2 RC: 0 MGMT_IMG :log: enc bits/pixel - 0.45, enc bits/sec - 11869.54, enc MPix/sec - 0.03, decode rate est (MBit/sec) - 0.00
06/09/2011, 10:43:33.302> LVL:2 RC:-500 MGMT_IMG :Imaging Timer expiry.
06/09/2011, 10:43:33.302> LVL:2 RC: 0 MGMT_IMG :Resetting encoder (reset_type=1)
06/09/2011, 10:43:34.302> LVL:0 RC: 0 IMG_FRONTEND :close_displays: reset SVGADevTap.
06/09/2011, 10:43:34.302> LVL:2 RC: 0 MGMT_IMG :CODEC: State change from CODEC_RUNNING to CODEC_DISABLED
06/09/2011, 10:43:34.302> LVL:0 RC: 0 IMG_FRONTEND :Calling open display in Tera1 mode.
06/09/2011, 10:43:34.302> LVL:0 RC: 0 IMG_FRONTEND :configure_displays: 2 display(s) initially reported!
06/09/2011, 10:43:34.318> LVL:0 RC: 0 IMG_FRONTEND :configure_display[0]--* id: 5 mon_id: 0 pos: (0,0) w: 1280 h: 960
06/09/2011, 10:43:34.318> LVL:0 RC: 0 IMG_FRONTEND :configure_display[0]--bpp: 32 --pitch: 5120 --map size: 2416 --fb size: 4915200
06/09/2011, 10:43:34.318> LVL:0 RC: 0 IMG_FRONTEND :configure_displays[0]--rot: 0 --pre-rot pitch: 5120 --frontend motion is enabled
06/09/2011, 10:43:34.318> LVL:0 RC: 0 IMG_FRONTEND :configure_display[1]--* id: 6 mon_id: 1 pos: (1280,0) w: 1280 h: 960
06/09/2011, 10:43:34.318> LVL:0 RC: 0 IMG_FRONTEND :configure_display[1]--bpp: 32 --pitch: 5120 --map size: 2416 --fb size: 4915200
06/09/2011, 10:43:34.318> LVL:0 RC: 0 IMG_FRONTEND :configure_displays[1]--rot: 0 --pre-rot pitch: 5120 --frontend motion is enabled
06/09/2011, 10:43:34.349> LVL:0 RC: 0 MGMT_IMG :Image Engine detected display #0 (1280x960) with offset (0x0)
06/09/2011, 10:43:34.349> LVL:0 RC: 0 MGMT_IMG :Image Engine detected display #1 (1280x960) with offset (1280x0)
06/09/2011, 10:43:34.349> LVL:2 RC: 0 MGMT_IMG :CODEC: State change from CODEC_DISABLED to CODEC_DMT_EXCHANGE
06/09/2011, 10:43:34.709> LVL:2 RC: 0 MGMT_IMG :CODEC: Processing MGMT_IMG_APDU_TYPE_DMT_ACK. [pri = 0]
06/09/2011, 10:43:34.709> LVL:2 RC: 0 IPC :Allocated 284 slice ref descriptors
06/09/2011, 10:43:34.709> LVL:0 RC: 0 IPC :cSW_HOST_IPC: New sub-session ID is 3
06/09/2011, 10:43:34.709> LVL:2 RC: 0 MGMT_IMG :CODEC: State change from CODEC_DMT_EXCHANGE to CODEC_RUNNING
06/09/2011, 10:43:34.724> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=0 refvld=0 seq=0
06/09/2011, 10:43:34.724> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=1 refvld=0 seq=0
06/09/2011, 10:43:34.740> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=2 refvld=0 seq=0
06/09/2011, 10:43:34.740> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=3 refvld=0 seq=0
06/09/2011, 10:43:34.756> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 0 seq 0 (ref)
06/09/2011, 10:43:34.756> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 1 seq 0 (ref)
06/09/2011, 10:43:34.756> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=4 refvld=0 seq=0
06/09/2011, 10:43:34.771> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 2 seq 0 (ref)
06/09/2011, 10:43:34.771> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 3 seq 0 (ref)
06/09/2011, 10:43:34.771> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=5 refvld=0 seq=0
06/09/2011, 10:43:34.771> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=6 refvld=0 seq=0
06/09/2011, 10:43:34.771> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=7 refvld=0 seq=0
06/09/2011, 10:43:34.771> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=8 refvld=0 seq=0
06/09/2011, 10:43:34.787> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 4 seq 0 (ref)
06/09/2011, 10:43:34.802> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 5 seq 0 (ref)
06/09/2011, 10:43:34.802> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=9 refvld=0 seq=0
06/09/2011, 10:43:34.802> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 6 seq 0 (ref)
06/09/2011, 10:43:34.802> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=10 refvld=0 seq=0
06/09/2011, 10:43:34.802> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=11 refvld=0 seq=0
06/09/2011, 10:43:34.802> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=12 refvld=0 seq=0
06/09/2011, 10:43:34.802> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 7 seq 0 (ref)
06/09/2011, 10:43:34.802> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 8 seq 0 (ref)
06/09/2011, 10:43:34.802> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=13 refvld=0 seq=0
06/09/2011, 10:43:34.802> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=14 refvld=0 seq=0
06/09/2011, 10:43:34.818> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 9 seq 0 (ref)
06/09/2011, 10:43:34.834> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=0 refvld=0 seq=0
06/09/2011, 10:43:34.849> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 10 seq 0 (ref)
06/09/2011, 10:43:34.849> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 11 seq 0 (ref)
06/09/2011, 10:43:34.849> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=1 refvld=0 seq=0
06/09/2011, 10:43:34.849> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=2 refvld=0 seq=0
06/09/2011, 10:43:34.849> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=3 refvld=0 seq=0
06/09/2011, 10:43:34.849> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=4 refvld=0 seq=0
06/09/2011, 10:43:34.849> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=5 refvld=0 seq=0
06/09/2011, 10:43:34.849> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=6 refvld=0 seq=0
06/09/2011, 10:43:34.849> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=7 refvld=0 seq=0
06/09/2011, 10:43:34.849> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=8 refvld=0 seq=0
06/09/2011, 10:43:34.865> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 12 seq 0 (ref)
06/09/2011, 10:43:34.865> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 13 seq 0 (ref)
06/09/2011, 10:43:34.865> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 14 seq 0 (ref)
06/09/2011, 10:43:34.865> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=9 refvld=0 seq=0
06/09/2011, 10:43:34.865> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=10 refvld=0 seq=0
06/09/2011, 10:43:34.881> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 0 seq 0 (ref)
06/09/2011, 10:43:34.881> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 1 seq 0 (ref)
06/09/2011, 10:43:34.881> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=11 refvld=0 seq=0
06/09/2011, 10:43:34.881> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 2 seq 0 (ref)
06/09/2011, 10:43:34.881> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 3 seq 0 (ref)
06/09/2011, 10:43:34.881> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 4 seq 0 (ref)
06/09/2011, 10:43:34.881> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 5 seq 0 (ref)
06/09/2011, 10:43:34.881> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=12 refvld=0 seq=0
06/09/2011, 10:43:34.881> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 6 seq 0 (ref)
06/09/2011, 10:43:34.881> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=13 refvld=0 seq=0
06/09/2011, 10:43:34.881> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 7 seq 0 (ref)
06/09/2011, 10:43:34.881> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 8 seq 0 (ref)
06/09/2011, 10:43:34.881> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: IPLP: recode from scratch: fsp=14 refvld=0 seq=0
06/09/2011, 10:43:34.896> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 9 seq 0 (ref)
06/09/2011, 10:43:34.896> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 10 seq 0 (ref)
06/09/2011, 10:43:34.896> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 11 seq 0 (ref)
06/09/2011, 10:43:34.896> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 12 seq 0 (ref)
06/09/2011, 10:43:34.896> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 13 seq 0 (ref)
06/09/2011, 10:43:34.959> LVL:2 RC: 0 MGMT_IMG :SW_HOST_IPC: Encoder clearing recode for fsp 14 seq 0 (ref)
06/09/2011, 10:43:36.865> LVL:2 RC: 0 MGMT_PCOIP_DATA :Tx thread info: round trip time (ms) = 4, variance = 1, rto = 105
06/09/2011, 10:43:42.599> LVL:2 RC: 0 MGMT_IMG :log: cur_s 1 max_s 30 tbl 0 bwc 1.01 bwt 17.58 fps 1.43 fl_ps 3.33
06/09/2011, 10:43:42.599> LVL:2 RC: 0 MGMT_IMG :log: chg pix: 5854592, chg pix not motion: 5854592
06/09/2011, 10:43:42.599> LVL:2 RC: 0 MGMT_IMG :log: delta bits encoded: 8660416, delta build bits encoded: 1015832.
06/09/2011, 10:43:42.599> LVL:2 RC: 0 MGMT_IMG :log: enc bits/pixel - 1.48, enc bits/sec - 288531.76, enc MPix/sec - 0.20, decode rate est (MBit/sec) - 0.00
06/09/2011, 10:43:47.943> LVL:2 RC: 0 MGMT_PCOIP_DATA :Tx thread info: bw limit = 2500, plateau = 2500.0, avg tx = 110.5, avg rx = 4.8 (KBytes/s)
06/09/2011, 10:43:47.943> LVL:1 RC: 0 VGMAC :Stat frms: R=000000/000000/334601 T=009795/249542/099574 (A/I/O) Loss=0.00%/0.00% (R/T)
So once the displays resync, there is a slew of recodes (to be expected), and then things go back to moving along just fine, with 0% packet loss and normal responsiveness reported by the user.
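To line these events up against the times users report blackouts (and against router or ESXi logs), something like the following Python sketch could pull every "Imaging Timer expiry" entry out of a PCoIP log along with how long the log had been quiet beforehand. This is only a rough helper under some assumptions: the function names are made up for illustration, the timestamp layout is inferred from the snippet above, and I am assuming the date is month/day/year (that does not affect the gap calculation within a single day). I cannot say whether the quiet period before an expiry means anything; the script just makes it easy to see.

```python
import re
from datetime import datetime

# Timestamp layout inferred from the snippet: "06/09/2011, 10:43:33.302>"
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{4}), (\d{2}:\d{2}:\d{2}\.\d{3})>")


def parse_ts(line):
    """Return the line's timestamp, or None if it does not start with one."""
    m = LINE_RE.match(line)
    if not m:
        return None
    # Assumption: month/day/year ordering.
    return datetime.strptime(f"{m.group(1)} {m.group(2)}", "%m/%d/%Y %H:%M:%S.%f")


def find_expiries(lines, marker="Imaging Timer expiry"):
    """Return (timestamp, seconds since previous log line) per marker line."""
    results = []
    prev = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # skip anything that is not a timestamped log line
        if marker in line and prev is not None:
            results.append((ts, (ts - prev).total_seconds()))
        prev = ts
    return results


# Three lines copied from the snippet above.
sample = [
    "06/09/2011, 10:43:12.599> LVL:2 RC: 0 MGMT_IMG :log: enc bits/pixel - 0.45, "
    "enc bits/sec - 11869.54, enc MPix/sec - 0.03, decode rate est (MBit/sec) - 0.00",
    "06/09/2011, 10:43:33.302> LVL:2 RC:-500 MGMT_IMG :Imaging Timer expiry.",
    "06/09/2011, 10:43:34.302> LVL:2 RC: 0 MGMT_IMG :Resetting encoder (reset_type=1)",
]

expiries = find_expiries(sample)
for ts, gap in expiries:
    print(f"{ts:%H:%M:%S} expiry, {gap:.3f}s after the previous log entry")
```

On the three sample lines it reports one expiry at 10:43:33, about 20.7 seconds after the previous entry. To run it over a whole log, pass the open file object to find_expiries in place of sample.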
VMware tech support seems to be at a complete loss.
Does ANYONE have any ideas?