Wednesday, May 25, 2011

Network Performance Tuning

Network Performance Issue
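For optimal performance, a network must meet the following three conditions.
1.       Data must be transmitted accurately.
2.       The network must provide enough bandwidth to meet its users' demands. If bandwidth is insufficient, transfers between two points take a very long time.
3.       Each system on the network must be fast enough to handle the network traffic.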
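bandwidth
Strictly, the difference between the highest and lowest signal frequencies usable on a network. More commonly it refers to the maximum transmission rate available on a communication link, that is, its capacity to carry information, and it is measured in bps (bits per second).
A modem rate of 28.8 Kbps means that 28,800 bits can be transmitted per second. Rates of about 14.4 to 28.8 Kbps are adequate for sending and receiving text; for multimedia such as music or video, a faster line such as ISDN is preferable. In theory a telephone line can carry tens of Mbps, but equipment such as telephone exchange switches limits the bandwidth to 64 Kbps.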
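The arithmetic behind these rates can be sketched in a few lines of Python. This is only an illustration: the 1 MB file size and the function name `transfer_seconds` are assumptions, and protocol overhead is ignored.

```python
def transfer_seconds(size_bytes: int, link_bps: float) -> float:
    """Ideal transfer time: payload bits divided by the link rate (no overhead)."""
    return size_bytes * 8 / link_bps


if __name__ == "__main__":
    one_mb = 1_000_000  # hypothetical 1 MB file, chosen only for illustration
    for label, bps in [("28.8 Kbps modem", 28_800), ("64 Kbps line", 64_000)]:
        print(f"{label}: {transfer_seconds(one_mb, bps):.1f} s")
```

Even at 64 Kbps the hypothetical file needs about two minutes, which is why the text recommends a faster line for multimedia.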


DATA Corruption on the network
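-         "netstat -i" is a quick tool for checking for network problems.
    -     It reports counters such as the number of input and output packets since the system was booted.
    -     Input errors and output errors should each stay at or below 0.025% of the packets handled; when collisions approach 10% of output packets, the network is overloaded.
Linux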
Kernel Interface table
Iface     MTU Met   RX-OK RX-ERR RX-DRP RX-OVR   TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0       1500   0  349202      0      0      0   84382      0      0      0 BMRU
lo        16436   0   16513      0      0      0   16513      0      0      0 LRU


SUN
Name  Mtu  Net/Dest      Address        Ipkts  Ierrs Opkts  Oerrs Collis Queue
lo0   8232 loopback      localhost      261676 0     261676 0     0      0    
hme0  1500 tmaxs1        tmaxs1         2059927 0     34231257 0     0      0


AIX
Name  Mtu   Network     Address            Ipkts Ierrs    Opkts Oerrs  Coll
en0   1500  link#2      0.6.29.dc.b2.16   775439936     0 116707019     0     0
en0   1500  192.168.1   tmaxi2            775439936     0 116707019     0     0
lo0   16896 link#1                        -721764089     0 -730012441     0     0
lo0   16896 127         localhost         -721764089     0 -730012441     0     0
lo0   16896 ::1                           -721764089     0 -730012441     0     0


HP
Name      Mtu  Network         Address         Ipkts   Ierrs Opkts   Oerrs Coll
lan0      1500 192.168.0.0     tmaxh2          85874293 0     71213981 0     0  
lo0       4136 loopback        localhost       14306406 0     14306409 0     0
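
The thresholds above (0.025% for errors, about 10% for collisions) can be applied to the counters of one interface row. The following is a minimal sketch, assuming the error rates are taken relative to input and output packets respectively and the collision rate relative to output packets; the function names are hypothetical.

```python
def pct(part: int, total: int) -> float:
    """Percentage of part in total; 0.0 when total is zero."""
    return 100.0 * part / total if total else 0.0


def check_interface(ipkts: int, ierrs: int, opkts: int, oerrs: int, collis: int) -> list[str]:
    """Return warnings for one "netstat -i" row, using the thresholds from the text."""
    warnings = []
    if pct(ierrs, ipkts) > 0.025:
        warnings.append(f"input error rate {pct(ierrs, ipkts):.3f}% exceeds 0.025%")
    if pct(oerrs, opkts) > 0.025:
        warnings.append(f"output error rate {pct(oerrs, opkts):.3f}% exceeds 0.025%")
    if pct(collis, opkts) >= 10.0:
        warnings.append(f"collision rate {pct(collis, opkts):.1f}% suggests overload")
    return warnings


if __name__ == "__main__":
    # hme0 row from the SUN sample: Ipkts, Ierrs, Opkts, Oerrs, Collis
    print(check_interface(2059927, 0, 34231257, 0, 0))
```

All the sample interfaces shown here report zero errors and zero collisions, so the check returns no warnings for them.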


-         To find the source of errors seen at a gateway, use "netstat -s", which reports the amount of data transferred and the number of errors for each protocol: ip, icmp, tcp, and udp.
SUN
[inter999:/sapora/user/inter999]netstat -s
UDP
       udpInDatagrams      =  6475     udpInErrors         =     0
       udpOutDatagrams     =33554694
TCP     tcpRtoAlgorithm     =     4     tcpRtoMin           =   400
       tcpRtoMax           = 60000     tcpMaxConn          =    -1
       tcpActiveOpens      = 37791     tcpPassiveOpens     = 11709
       tcpAttemptFails     = 24940     tcpEstabResets      =    94
       tcpCurrEstab        =    51     tcpOutSegs          =723168
       tcpOutDataSegs      =477019     tcpOutDataBytes     =178839753
       tcpRetransSegs      =   831     tcpRetransBytes     = 61751
       tcpOutAck           =246148     tcpOutAckDelayed    = 47625
       tcpOutUrg           =     1     tcpOutWinUpdate     =    27
       tcpOutWinProbe      =     0     tcpOutControl       =104467
       tcpOutRsts          = 29816     tcpOutFastRetrans   =     5
       tcpInSegs           =1351793
       tcpInAckSegs        =421535     tcpInAckBytes       =178860343
       tcpInDupAck         = 25863     tcpInAckUnsent      =     0
       tcpInInorderSegs    =1080491    tcpInInorderBytes   =1038297986
       tcpInUnorderSegs    =   950     tcpInUnorderBytes   = 67992
       tcpInDupSegs        =   257     tcpInDupBytes       =107879
       tcpInPartDupSegs    =     4     tcpInPartDupBytes   =  1296
       tcpInPastWinSegs    =     0     tcpInPastWinBytes   =     0
       tcpInWinProbe       =     0     tcpInWinUpdate      =     0
       tcpInClosed         =     0     tcpRttNoUpdate      =    66
       tcpRttUpdate        =397997     tcpTimRetrans       =   941
       tcpTimRetransDrop   =    66     tcpTimKeepalive     =   382
       tcpTimKeepaliveProbe=    47     tcpTimKeepaliveDrop =     0
       tcpListenDrop       =     0     tcpListenDropQ0     =     0
       tcpHalfOpenDrop     =     0     tcpOutSackRetrans   =    42
IP      ipForwarding        =     2     ipDefaultTTL        =   255
       ipInReceives        =1503680    ipInHdrErrors       =     0
       ipInAddrErrors      =     0     ipInCksumErrs       =     0
       ipForwDatagrams     =     0     ipForwProhibits     =     0
       ipInUnknownProtos   =     0     ipInDiscards        =     0
       ipInDelivers        =1401339    ipOutRequests       =34234131
       ipOutDiscards       =     0     ipOutNoRoutes       =     0
       ipReasmTimeout      =    60     ipReasmReqds        =     0
       ipReasmOKs          =     0     ipReasmFails        =     0
       ipReasmDuplicates   =     0     ipReasmPartDups     =     0
       ipFragOKs           =     0     ipFragFails         =     0
       ipFragCreates       =     0     ipRoutingDiscards   =     0
       tcpInErrs           =     0     udpNoPorts          =385690
       udpInCksumErrs      =     0     udpInOverflows      =     0
       rawipInOverflows    =     0
ICMP    icmpInMsgs          =  5648     icmpInErrors        =     0
       icmpInCksumErrs     =     2     icmpInUnknowns      =     0
       icmpInDestUnreachs  =   263     icmpInTimeExcds     =     0
       icmpInParmProbs     =     0     icmpInSrcQuenchs    =     5
       icmpInRedirects     =     0     icmpInBadRedirects  =     0
       icmpInEchos         =   176     icmpInEchoReps      =  5202
       icmpInTimestamps    =     0     icmpInTimestampReps =     0
       icmpInAddrMasks     =     0     icmpInAddrMaskReps  =     0
       icmpInFragNeeded    =     0     icmpOutMsgs         =   178
       icmpOutDrops        =     0     icmpOutErrors       =     0
       icmpOutDestUnreachs =     2     icmpOutTimeExcds    =     0
       icmpOutParmProbs    =     0     icmpOutSrcQuenchs   =     0
       icmpOutRedirects    =     0     icmpOutEchos        =     0
       icmpOutEchoReps     =   176     icmpOutTimestamps   =     0
       icmpOutTimestampReps=     0     icmpOutAddrMasks    =     0
       icmpOutAddrMaskReps =     0     icmpOutFragNeeded   =     0
       icmpInOverflows     =     0
IGMP:
         0 messages received
         0 messages received with too few bytes
         0 messages received with bad checksum
         0 membership queries received
         0 membership queries received with invalid field(s)
         0 membership reports received
         0 membership reports received with invalid field(s)
         0 membership reports received for groups to which we belong
         0 membership reports sent


HP
$ netstat -s
tcp:
       61780317 packets sent
               40026627 data packets (2022562296 bytes)
               33851 data packets (20402833 bytes) retransmitted
               21750704 ack-only packets (5140731 delayed)
               0 URG only packets
               28 window probe packets
               51 window update packets
               10729813 control packets
       85413720 packets received
               30922858 acks (for 2025048627 bytes)
               61302 duplicate acks
               0 acks for unsent data
               49505640 packets (1931008279 bytes) received in-sequence
               64 completely duplicate packets (93112 bytes)
               349 packets with some dup, data (483560 bytes duped)
               10553 out of order packets (12988300 bytes)
               4 packets (1037213427 bytes) of data after window
               195 window probes
               4458436 window update packets
               3219 packets received after close
               1 segment discarded for bad checksum
               0 bad TCP segments dropped due to state change
       723322 connection requests
       4471901 connection accepts
       5195223 connections established (including accepts)
       5872161 connections closed (including 677123 drops)
       673863 embryonic connections dropped
       26359442 segments updated rtt (of 26359442 attempts)
       46875 retransmit timeouts
               4317 connections dropped by rexmit timeout
       28 persist timeouts
       15476 keepalive timeouts
               15309 keepalive probes sent
               47 connections dropped by keepalive
       0 connect requests dropped due to full queue
       977309 connect requests dropped due to no listener
udp:
       0 incomplete headers
       0 bad checksums
       0 socket overflows
ip:
       85224681 total packets received
       0 bad IP headers
       0 fragments received
       0 fragments dropped (dup or out of space)
       0 fragments dropped after timeout
       0 packets forwarded
       0 packets not forwardable
icmp:
       6517276 calls to generate an ICMP error message
       5665 ICMP messages dropped
       Output histogram:
        echo reply: 415
        destination unreachable: 6511197
        source quench: 0
        routing redirect: 0
        echo: 0
        time exceeded: 0
        parameter problem: 0
        time stamp: 0
        time stamp reply: 0
        address mask request: 0
        address mask reply: 0
       0 bad ICMP messages
       Input histogram:
        echo reply: 1762
        destination unreachable: 6511314
        source quench: 7
        routing redirect: 0
        echo: 415
        time exceeded: 12
        parameter problem: 0
        time stamp request: 0
        time stamp reply: 0
        address mask request: 0
        address mask reply: 0
       415 responses sent
igmp:
       0 messages received
       0 messages received with too few bytes
       0 messages received with bad checksum
       0 membership queries received
       0 membership queries received with incorrect fields(s)
       0 membership reports received
       0 membership reports received with incorrect field(s)
       0 membership reports received for groups to which this host belongs
       0 membership reports sent
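
Counters in the Solaris-style "name = value" layout can be pulled into a dictionary and turned into ratios. The sketch below is an assumption-laden illustration: it handles only the SUN layout (not the HP sentence layout), the sample is a three-line excerpt of the output above, and the retransmission percentage is just one example of a ratio one might derive; the text itself gives no threshold for it.

```python
import re

# Three lines copied from the SUN "netstat -s" sample above.
SAMPLE = """\
TCP     tcpRtoAlgorithm     =     4     tcpRtoMin           =   400
        tcpOutDataSegs      =477019     tcpOutDataBytes     =178839753
        tcpRetransSegs      =   831     tcpRetransBytes     = 61751
"""


def parse_counters(text: str) -> dict[str, int]:
    """Collect every "name = value" pair found in the text."""
    return {name: int(value) for name, value in re.findall(r"(\w+)\s*=\s*(-?\d+)", text)}


def retransmit_percent(counters: dict[str, int]) -> float:
    """Retransmitted TCP segments as a percentage of data segments sent."""
    sent = counters.get("tcpOutDataSegs", 0)
    return 100.0 * counters.get("tcpRetransSegs", 0) / sent if sent else 0.0


if __name__ == "__main__":
    counters = parse_counters(SAMPLE)
    print(f"TCP retransmissions: {retransmit_percent(counters):.3f}% of data segments")
```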


Gathering Network Integrity data from NFS (Network File System)
-         "nfsstat -c" reports the client-side NFS statistics of a system.
    -     The retrans field is the number of RPC calls that this host, acting as a client, had to retransmit. Retransmissions occur while reading or writing NFS files; if they exceed 5% of the total number of client NFS calls, there is a serious problem.
    -     Compare the badxid field with the retrans field: if the two are roughly equal, the NFS server is having trouble keeping up with client requests.
SUN
[inter999:/sapora/user/inter999]nfsstat -c
Client rpc:
Connection oriented:
calls       badcalls    badxids     timeouts    newcreds    badverfs   
1331        0           0           0           0           0          
timers      cantconn    nomem       interrupts
0           0           0           0          
Connectionless:
calls       badcalls    retrans     badxids     timeouts    newcreds   
7           1           0           0           0           0          
badverfs    timers      nomem       cantsend   
0           4           0           0          
Client nfs:
calls       badcalls    clgets      cltoomany  
7           1           7           0          
Version 2: (6 calls)
null        getattr     setattr     root        lookup      readlink   
0 0%        5 83%       0 0%        0 0%        0 0%        0 0%       
read        wrcache     write       create      remove      rename     
0 0%        0 0%        0 0%        0 0%        0 0%        0 0%       
link        symlink     mkdir       rmdir       readdir     statfs     
0 0%        0 0%        0 0%        0 0%        0 0%        1 16%      
Version 3: (0 calls)
null        getattr     setattr     lookup      access      readlink   
0 0%        0 0%        0 0%        0 0%        0 0%        0 0%       
read        write       create      mkdir       symlink     mknod       
0 0%        0 0%        0 0%        0 0%        0 0%        0 0%       
remove      rmdir       rename      link        readdir     readdirplus
0 0%        0 0%        0 0%        0 0%        0 0%        0 0%       
fsstat      fsinfo      pathconf    commit     
0 0%        0 0%        0 0%        0 0%       
Client nfs_acl:
Version 2: (1 calls)
null        getacl      setacl      getattr     access     
0 0%        0 0%        0 0%        1 100%      0 0%       
Version 3: (0 calls)
null        getacl      setacl     
0 0%        0 0%        0 0%


HP
$ nfsstat -c
Client rpc:
Connection oriented:
calls                   badcalls                badxids                
0                       0                       0                      
timeouts                newcreds                badverfs               
0                       0                       0                      
timers                  cantconn                nomem                  
0                       0                       0                      
interrupts             
0                      
Connectionless oriented:
calls                   badcalls                retrans                
2883                    0                       0                      
badxids                 timeouts                waits                  
0                       0                       0                      
newcreds                badverfs                timers                 
0                       0                       21                     
toobig                  nomem                   cantsend               
0                       0                       0                      
bufulocks              
0                      
Client nfs:
calls                   badcalls                clgets                 
2883                    0                       2883                   
cltoomany              
0                      
Version 2: (2883 calls)
null                    getattr                 setattr                
0 0%                    2864 99%                0 0%                   
root                    lookup                  readlink               
0 0%                    3 0%                    0 0%                   
read                    wrcache                 write                  
0 0%                    0 0%                    0 0%                   
create                  remove                  rename                 
0 0%                    0 0%                    0 0%                   
link                    symlink                 mkdir                  
0 0%                    0 0%                    0 0%                   
rmdir                   readdir                 statfs                 
0 0%                    12 0%                   4 0%                   
Version 3: (0 calls)
null                    getattr                 setattr                
0 0%                    0 0%                    0 0%                   
lookup                  access                  readlink               
0 0%                    0 0%                    0 0%                   
read                    write                   create                 
0 0%                    0 0%                    0 0%                    
mkdir                   symlink                 mknod                  
0 0%                    0 0%                    0 0%                   
remove                  rmdir                   rename                 
0 0%                    0 0%                    0 0%                   
link                    readdir                 readdir+               
0 0%                    0 0%                    0 0%                   
fsstat                  fsinfo                  pathconf               
0 0%                    0 0%                    0 0%                   
commit                 
0 0%
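
The two rules of thumb for "nfsstat -c" (retrans above 5% of calls, and badxid roughly equal to retrans) can be written as a small check. This is a sketch under stated assumptions: the function name is hypothetical, and "roughly equal" is interpreted here as within 20% of retrans, a tolerance chosen only for illustration.

```python
def assess_nfs_client(calls: int, retrans: int, badxid: int, tolerance: float = 0.2) -> list[str]:
    """Apply the retrans and badxid rules of thumb to client RPC counters."""
    findings = []
    if calls and retrans / calls > 0.05:
        findings.append("retrans exceeds 5% of calls")
    # "Roughly equal" is assumed to mean within `tolerance` of retrans.
    if retrans > 0 and abs(badxid - retrans) <= tolerance * retrans:
        findings.append("badxid is roughly equal to retrans: the server is slow to respond")
    return findings


if __name__ == "__main__":
    # Connectionless counters from the HP sample: calls=2883, retrans=0, badxids=0
    print(assess_nfs_client(2883, 0, 0))
    # Hypothetical unhealthy client
    print(assess_nfs_client(1000, 80, 75))
```

Both samples above show zero retransmissions, so neither rule fires for them.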


Network and CPU Load
When the CPU is heavily loaded, network performance drops. The spray utility can be used to check whether a system's CPU keeps up with incoming packets.
[inter999:/sapora/user/inter999]spray localhost
sending 1162 packets of length 86 to localhost ...
       164 packets (14.114%) dropped by localhost
       66 packets/sec, 5706 bytes/sec


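The key figure is the number of dropped packets. A drop rate of 5% or less is not a problem, but a high rate means packets are being generated faster than the receiving host can accept them: the host is not fast enough to respond to the network, which indicates a heavily loaded CPU.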
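
The drop percentage can be extracted from the spray output and compared with the 5% guideline. The sketch below assumes the output format of the sample above; the function name is hypothetical, and output that does not contain a "(N%) dropped" figure yields None rather than a guess.

```python
import re
from typing import Optional

# spray output copied from the sample above.
SAMPLE = """\
sending 1162 packets of length 86 to localhost ...
       164 packets (14.114%) dropped by localhost
       66 packets/sec, 5706 bytes/sec
"""


def dropped_percent(text: str) -> Optional[float]:
    """Return the percentage reported before "dropped", or None if absent."""
    match = re.search(r"\(([\d.]+)%\)\s+dropped", text)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    rate = dropped_percent(SAMPLE)
    if rate is not None and rate > 5.0:
        print(f"{rate}% dropped: the receiving host is likely CPU-bound")
```

In the sample run, 164 of 1162 packets were dropped (14.114%), well above the 5% guideline.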
Reducing the NFS Workload
To reduce the workload on the NFS server, edit /etc/fstab on the client systems to increase the read and write buffer sizes. For example, if the page size on both systems is 4096 bytes:
server:/remfs/dataspace /space nfs rw,hard,wsize=4096,rsize=4096 0 0
The system page size can be checked with the "pagesize" command. rsize and wsize apply only to remote filesystems and must not be used on local filesystems.
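
As an illustration, the fstab entry can be generated from a page size. This is a minimal sketch: the function name is hypothetical, `mmap.PAGESIZE` reports only the local machine's page size, and the server's page size still has to be checked separately.

```python
import mmap


def nfs_fstab_entry(remote: str, mount_point: str, page_size: int) -> str:
    """Build an fstab line whose rsize and wsize both equal the given page size."""
    options = f"rw,hard,wsize={page_size},rsize={page_size}"
    return f"{remote} {mount_point} nfs {options} 0 0"


if __name__ == "__main__":
    # Reproduces the entry from the text for a 4096-byte page size.
    print(nfs_fstab_entry("server:/remfs/dataspace", "/space", 4096))
    # Same entry using the local page size reported by Python.
    print(nfs_fstab_entry("server:/remfs/dataspace", "/space", mmap.PAGESIZE))
```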
Timeout
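If an NFS client does not receive a response to an NFS request within a given time, a timeout occurs. This means the NFS server is so heavily loaded that it cannot process NFS requests quickly enough.
In that case, timeouts can be avoided by increasing the timeout period in /etc/fstab.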
server:/mf /mf nfs noquota,hard,bg,intr,timeo=15 0 0
(This sets the timeout period to 1.5 seconds; timeo is given in tenths of a second.)
The "nfsstat -c" command shows the number of timeouts; if they amount to more than 5% of the number of calls, there is a problem.
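
The conversion from the timeo option to seconds can be sketched as follows. The function name is hypothetical; the only fact relied on is that timeo is expressed in tenths of a second.

```python
from typing import Optional


def timeo_seconds(mount_options: str) -> Optional[float]:
    """Return the timeo value of an fstab option string in seconds, or None if unset."""
    for option in mount_options.split(","):
        key, _, value = option.partition("=")
        if key == "timeo" and value.isdigit():
            return int(value) / 10  # timeo is in tenths of a second
    return None


if __name__ == "__main__":
    print(timeo_seconds("noquota,hard,bg,intr,timeo=15"))
```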
