Network
Inter-domain Routing
- 2 types of routing algos
| Comparison | Link State algo | Distance Vector algo | 
|---|---|---|
| Router knowledge | All know full network topology & link cost info | Only know connected neighbors & link costs (to all nodes) | 
| Algo type | Global / Centralized e.g. Djikstra | Decentralized, iteratively by exchanging info with neighbours. Only determines next hop | 
| Algo impl | Open Shortest Path First (OSPF) | Routing Info Prtcl (RIP) | 
Link State Routing Limitations
- High bandwidth: - Topology info is flooded (to other routers)
 
- Sensitive info released by nodes
- High processing overhead: Everything computed locally by node
- Unit representing distance is not the same for all nodes
Distance Vector
- Send distance metric per destination
- Adv- Hide details of topology
- Only next hop determined per node
 
- Disadv- Inconsistent units representing distance
- Slow convergence due to counting-to-infinity- Counting to infinity: (14:00) A - B - C, if link BC is cut, B will still think A can reach C and increment path cost, A will still think B can reach and increment path cost etc
 
 
Path Vector = Extension of distance vector
- Send entire path for each destination d
- Adv:- Avoid count-to-infinity problem
- Detect and Avoid loops
 
- In terms of ASes, average of only 3 hops needed (flattening of the internet as they seek to shorten the path for customers)
Border Gateway Protocol (BGP)
- Inter-domain routing protocol (Slide 13)- Allows subnet to advertise to rest of Internet
- Allows ASes to determine “good” routes to other networks
 
- Main goals:- 1) Fulfil agreements with other ISPs- Define who provide transit for what (based on relationship)- Customer-Provider Relationship- AS defined as Provider: Provide transit service to customer
- AS defined as Customer: Pay provider for internet routing- Types:- "multi-homing" if customers has multiple providers. dual-homed if has 2.
- Nontransit AS: Provider never flows traffic through customer.- May not need BGP: If no need to route traffic, can just use provider's static IP, no need IDR (Slide 36)
 
- Selective Transit: Allows some AS to flow traffic through, others deny. - Unless defined, customers never route traffic through themselves
 
 
 
- Types:
 
- AS Peer to peer Relationship- AS defined as Peering
- 2 (big) ASes agree to transit between their customers- Only between the 2 ASes; relationship is not transitive
 
- Usually don't pay each other
- Usually confidential
- Traffic Exchange Ratio should be roughly balanced
- Pros:- Reduce costs
- Improve end-to-end performance
- May be the only way
 
- Cons:- No profit
- Peers are competition
- Peering requires periodic renegotiation
 
 
- Sibling to Sibling Relationship- AS-AS belong to same company
- Share everything
 
 
- Customer-Provider Relationship
 
- Define who provide transit for what (based on relationship)
- 2) Minimize costs
- 3) Ensure good performance for customers
 
- 1) Fulfil agreements with other ISPs
- Tiers:- Tier 1 AS / ISP: Top of the customer-provider hierarchy- only have peers (no upstream)
- Don't have to pay anyone
- P2P with other T1s to form a full-mesh- Around 10-12 ASes (AT&T etc)
 
- Lower layer: National / regional scope
- Stub ASes: Customers
- List at CAIDA AS RANK
 
 
- Tier 1 AS / ISP: Top of the customer-provider hierarchy
- BGP Implementation: 2 BGP routers (between ASes) exchange messages- Depends on policy of AS (Slide 57)- Import policy: Don't record prefixes that you don't want your AS to help route- Rank customer routes over peer routes
 
- Export policy: Don't advert routes you don't want neighbour ASes to use- Providers advertises with everyone it services
- e.g. if AS is Customer in Customer-Provider relationship:- Customer don't want route traffic through itself- Won't announce route to its peers
 
 
- Customer don't want route traffic through itself
- e.g. P2P:- Don't want peers to route stuff through them to another peer:- AS exports only customer routes to a peer
- AS exports a peer's routes only to its customers
 
 
- Don't want peers to route stuff through them to another peer:
 
- Manipulate route preferences: Artificially make routes look longer / shorter to make external ASes prefer certain routes
 
- Import policy: Don't record prefixes that you don't want your AS to help route
- Application layer, TCP port 179
- Advertise paths to different destination network prefixes
- Procedure: Slide 14
- 1) BGP routers (2 AS, 1 in each AS) start session on TCP port 179, by sending OPEN BGP msgs and ACK by KEEPALIVE.- 1.1) BGP Router also learns of prefix-port mappings from BGP route adverts from iBGPs
 
- 2) Exchange all active routes in their routing tables by sending UPDATE BGP msgs
- 3) Exchange incremental updates using UPDATE, and correct errors using NOTIFICATION.- UPDATE message format: Slide 19- Marker (16b), Length (2b), Type (1b), Withdrawn Routes Length (2b), Withdrawn Routes (variable), Path Attribute Length (2b), Path Attributes (variable), Network Layer Reachability Information (NLRI) (variable)
- Can withdraw many routes in 1 message
- NLRI: Reachable IP prefixes (IP subnet mask) using route defined by Path Attribute- e.g. Prefix: 138.16.64/22
 
- A route consists of NLRI (prefixes) + path (defined by path attr)
- Thus, can only advertise 1 route per message
- Defining Path Attributes: Slide 21- well-known mandatory: must be included if defining new route, and must be forwarded (S23)- ORIGIN: Origin of prefix
- AS-PATH: Sequence of ASes in route (e.g. AS 79, AS 11)
- NEXT-HOP: IP Addr of BGP router in next-hop AS
 
- well-known discretionary
- optional transitive
- optional non-transitive: only to adjacent AS
 
- well-known mandatory: must be included if defining new route, and must be forwarded (S23)
 
- No timeout on routes: Invalid routes are removed with UPDATE message
 
- UPDATE message format: Slide 19
- 3.1) If multiple routes to same prefix: Tie-break BGP routes by:- 1) Shortest AS-PATH
- 2) Hot Potato Routing: Get it out of AS ASAP (S30)- Use OSPF to choose closest NEXT-HOP AS router
 
 
- 3.2) Update RIB (Slide 56) and forwarding table (Slide 31) using inbound UPDATEs- Build a database: Routing Information Base (RIBs)- Adj-RIBs-In: unprocessed inbound UPDATEs- Between Adj-RIB-In and Loc-RIB: Route selection. Tie break by (Slide 65):- 1) Largest LOCAL_PREF: 4-byte unit (default 100) for BGP to indicate route preference. This is not forwarded.
- 2) Smallest AS_PATH
- 3) Smallest ORIGIN number
- 4) Smallest MULTI_EXIT_DISC (optional non-transitive). In use if peer AS has many BGP entry-points.
- 5) Routes from eBGP are preferred over iBGP
- 6) Smallest interior cost based on NEXT_HOP
- 7) COMMUNITY: for influencing neighbor's neighbors, used to group destinations
 
 
- Between Adj-RIB-In and Loc-RIB: Route selection. Tie break by (Slide 65):
- Loc-RIB: selected best routes.
- Adj-RIBs-Out: selected routes for advertisement to peers - propagate inbound UPDATE as outbound UPDATE to peers
 
 
- Adj-RIBs-In: unprocessed inbound UPDATEs
- Maintain forwarding table of prefix-port entries (e.g. 138.16.64/22, port 4)
 
- Build a database: Routing Information Base (RIBs)
- 4) Repeat 3 forever. Close connection with NOTIFICATION BGP msg.- Connection down: Invalidate all routes through disconnected peer
 
 
- Depends on policy of AS (Slide 57)
- BGP/IGP model used by ISPs (Slide 15)
- Types of BGP: eBGP and iBGP- exterior BGP peering (eBGP): Between ASes Slide 16 - exchange reachability info between ASes
- Directly connected
- ASes advertise their network prefix
- No expiration timer for routes
- All routes through peer become invalid if it goes down
 
- interior BGP peering (iBGP): Within AS: Slide 17- Propagate reachability info to other border routers within AS
- Don't have to be directly connected
- MUST be (logically) fully meshed- Each iBGP router pass on prefixes they learn from the other AS
- Does not pass on prefixes learnt from other iBGP speakers- Info not repeated, reduce overhead, scalable
 
 
 
- Interior Gateway Protocol (IGP): The normal network routing
 
- exterior BGP peering (eBGP): Between ASes Slide 16 
BGP Prefix Hijacking (S69)
- 2 ASes share the same prefix
- Some traffic lost to fake prefix holder:- Blackhole: data traffic discarded
- Spoofing / Impersonation: Data traffic inspected and redirected
 
BGP Subprefix Hijacking (S70)
- Exploit of Longer prefix matching
- 1 AS with shorter (more general) prefix hijacked by longer (more specific) prefix- e.g. 12.34.0.0/16 hijacked by 12.3.158.0/24- All traffic routed to 12.3.158.0/24
 
- Can visualize with BGPlay
- Can prevent with anomaly detection, checking prefix ownership
 
- e.g. 12.34.0.0/16 hijacked by 12.3.158.0/24
BGP in practice
- Preference (attr)
- Filtering (inbound/outbound filter)
- Tagging (COMMUNITY attr)
- Applications- relations- influence process
- control route export
 
- traffic engineering- inbound
- outbound
- remote control
 
 
- relations
Peer 2 Peer
- Client/server = asymmetric- Extension: iteratively/recursively delegate other servers to do task - e.g. like DNS, a tree structure
 
 
- Extension: iteratively/recursively delegate other servers to do task 
- Pure P2P- No central entity
- All entities directly communicate
- No structure; flat architecture
- Unreliable; how to stay connected or lookup?
 
- Napster: Central index server- People register with this server
- Central server knows all peers and files in network
- Search peers by keyword- Delegation; delegate downloading to P2P
 
- Pros- Single server: consistent view of network
- Fast and efficient searching
- Guarantee correct search results
 
- Cons- Single point of failure
- Large computation to handle queires
- Downloading from a single peer only
- Unreliable content
- Vulnerable to attacks
- Copyright issues
 
 
- Gnutella: Peers are switches, flood queries- Only peers- People register by connecting to another active peer- Switch topology- Queries are flooded in the network
- Once joined, they learn about others and learn topology
 
- Download directly from peer
 
- Switch topology
 
- People register by connecting to another active peer
- Pros- Fully distributed
- Open protocol
- Robust against node failures
- Robust to DOS
 
- Cons- Inefficient queries flooding
- Poor network management: Nodes need to keep probing
 
 
- Only peers
- KaZaA: Peers only, but some peers delegated as local server- 2 types of nodes- Ordinary Node (ON): peer
- Supernode (SN): peer with more resources & responsibilities- Promoted from ON if it has enough resources (bandwidth & uptime)
- Promotion is consensual with user
- Avg lifetime of 2.5 hours
- Don't cache info from disconnected ON
 
 
- SN - ON : One - Many relationship- ON only connected to one SN (and nothing else) which acts as their gateway
 
- SN acts as local server for all connected ON
- SN do not form a complete mesh
 
- 2 types of nodes
- Skype: Similar to KaZaA with priorietary stuff- Similar infrastructure to KaZaA
- P2P with proprietary application-layer encryption
- ON and SN infrastrcuture; distributed SNs help map usernames to IP addresses
- Problem of NAT- NAT prevents peers outside of network to directly connect
- SNs are used to keep track of relay nodes
- Relay nodes used to handle NATs
 
 
- BitTorrent: Seed split data into chunks, tracker help leechers download from each other S23- A network (swarm) to distribute a file- 1 main server (seed) has the original copy broken into 256KB chunks
- Seed starts tracker server- tracker keeps track of seeds and peers in the network (swarm)
 
- Seed creates torrent-file (metadata on chunks with checksums) and hosts it on a webserver somewhere
- Client obtains torrent-file
- Client contacts tracker and connects to peers
- Client downloads/exchange data with peers- Download rarest chunk first from neighbors
- Uploading chunks to neighbors: Tit-for-tat- Send to top 4 neighbors currently sending her chunks at the highest rate, re-eval/10s
- Send to random peer every 30s to optimistically unchoke top 4
 
 
 
- Pro- Can send ".torrent" link which always refer to same file
 
- Con- Hard to identify and find particular files
 
 
- A network (swarm) to distribute a file
P2P lookup services
- Searching VS Addressing
- How network is constructed- Unstructured: cannot use addressing- But peers can join anyone and objects can be stored anywhere
 
- Structured- Structured to define object locations
- Allow deterministic routing & addressing
- e.g. Key-value pairs, hashtables
- Massive index: Create a distributed hash table- Every node handles multiple buckets (change as nodes join and leave)
- Nodes communicate with each other to maintain table
 
 
 
- Unstructured: cannot use addressing
- How objectives are placed
- How efficient objects can be found
- Addressing- Uniquely ID objects and maintain indexing structure
 
- Searching- Need to make objects searchable
 
- Distributed Hashtable- Chord S38- You have nodes and you have objects, SHA-1 Hash both of them- Node ID: Hash Node's IP addr
- Obj ID: Hash Object's name
 
- Store Node IDs in circular linked list S39- Each node in array keeps track of predecessor and successor in circular array (clockwise)
- To store an object: Hash the object to get its object ID, then step clock-wise from ID until you find a node (i.e. immediate successor). That node stores the object
- Circular linked list: O(n) time to for a node to access an object in another node- Finger table: every node has shortcut links to non-neighbor nodes S43- At most m shortcuts
- Interval: ith finger at least  apart - 1st:1, 2nd:2, 3rd:4, 4th:8...
- Defined as [start, end)
- Exponential until it loops back O(log(n))
 
- Start: starting node of the interval (assuming all node IDs are occupied)
- Node: The actual node used. If there is no node with that node ID, finger points to its immediate successor; however, the interval definition is not affected; this variable tracks that- i.e. if [4,6) but shifted to 5, it's still defined as [4,6) and next interval is still 4 spaces
 
- When a node leaves: S50- All nodes periodically ping successors and keep track of 2nd successor
- Predecessor will detect leave event and change its immediate sucessor
 
- Efficiency- Search: O(log(n))
- Responsibility N nodes and K objects: O(K/N) objects per node
- Node joining / leaving: O(logN x logN) messages to:- re-establish routing and finger tables
- initialize finger tables for new node
 
 
 
 
- Finger table: every node has shortcut links to non-neighbor nodes S43
 
 
- You have nodes and you have objects, SHA-1 Hash both of them
- CAN S54- Think of Chord as a 1 dimensional donut (torus)- 1 dimensional donut: predecessor and successor
 
- CAN: d-dimensional donut (torus)
- Doesn't use finger tables, can just use neighbors (since it's d-dimensional)- the stuff in the slides is 2D; neighbors can go up down left right.- When a new node joins:- Cut area of responsibility by cutting vertically before cutting horizontally
 
- Object IDs are coordinates
 
- When a new node joins:
- Slide 64: state refers to information maintained per node
- Avg path length: since we know predecessor and successor, we can cut the circular path in half and take the shorter path
 
- the stuff in the slides is 2D; neighbors can go up down left right.
 
- Think of Chord as a 1 dimensional donut (torus)
 
- Chord S38
CDN
- Traffic has to go through internet (middle-mile) which can be congested- Distance between user and server negatively affects latency, throughput
- Solutions: connect to a machine closer to client- Big data center CDNs
- Highly distributed CDNs
- P2P