An O(nh) Algorithm for Dual-Server Coordinated En-Route Caching in Tree Networks - PDF

Please download to get full document.

View again

of 6
All materials on our website are shared by users. If you have any questions about copyright issues, please report us to resolve them. We are always happy to assist you.
Information Report
Category:

Bills

Published:

Views: 2 | Pages: 6

Extension: PDF | Download: 0

Share
Related documents
Description
An O(nh) Algorithm for Dual-Server Coordinated En-Route Caching in Tree Networks Shihong Xu 1 and Hong Shen 2 1 School of Information Science Japan Advanced Institute of Science and Technology 1-1 Asahidai,
Transcript
An O(nh) Algorithm for Dual-Server Coordinated En-Route Caching in Tree Networks Shihong Xu 1 and Hong Shen 2 1 School of Information Science Japan Advanced Institute of Science and Technology 1-1 Asahidai, Nomi, Ishikawa Japan 2 School of Computer Science The University of Adelaide SA 5005, Australia Abstract Dual-server coordinated en-route caching is important because of its basic features as multi-server en-route caching. In this paper, multi-server coordinated en-route caching is formulated as an optimization problem of minimizing total access cost, including transmission cost for all access demands and caching cost of all caches. We first discuss an algorithm for single-server en-route caching in tree networks and then show that this is a special case of another algorithm for dual-server en-route caching in tree networks whose time complexity is O(nh). 1. Introduction A dual-server network is a special type of multi-server network [21] where there are two servers in operation and a user can access files from any server they have access rights to. Dual-server networks can increase efficiency of transmission, reduce latency and provide redundancy. Dual-server en-route caching is important because of its basic features as multi-server en-route caching. However, traditional optimization methods for en-route caching are mainly designed for networks with a single server and can t be directly applied to networks with more than one server. In this paper, we first formulate multi-server coordinated en-route caching as an optimization problem of minimizing total access cost, including transmission cost and caching cost, and then propose an algorithm for dual-server coordinated en-route caching in tree networks. As the result, we integrated two technologies, dual-server caching and enroute caching, to provide a better caching solution for network environments. The rest of the paper is organized as follows. In section 2, related work on caching is discussed. In section 3, a formulation of the problem is proposed. Dynamic programming based solutions are presented in section 4. Section 5 is the conclusion. 2. Related Work As an important technology to enhance content delivery and alleviate server load [5], [4], caching has attracted much attention. Significant effort has been made on the topic of optimizing cache performance [2], [22], [18], [3], cooperation among several caches [11], [15], [8], [7], and cache hierarchies [20], [17]. Recent studies have focused on the benefits of cooperative caching for distributed systems and large-scale systems [2], [12], [20]. In [23], web caching problem in a hierarchy of collaborating proxy servers is studied and a collaboration protocol is proposed to reduce duplicate caching between a proxy and its parent or higher level proxies in the hierarchy when making a caching decision. In [9], the authors examined three practical cooperative placement algorithms for large-scale distributed caches and showed that cooperative object placement could significantly improve performance compared to local replacement algorithms, particularly when the sizes of individual caches are small compared to those of the objects. Recent advances on web caching [6] have enabled the development of a new caching architecture, called en-route web caching [1], [16], [10], [13]. The problem studied in [19] considered the coordinated en-route caching problem for linear topology, deciding the optimal locations for placing copies of an object among the en-route caches. This scheme, which optimizes the placement of objects on the path from the client to the server, has been shown to perform significantly better than other schemes that either object placement or replacement in individual caches only are considered. Paper [14] studied optimal methods for unconstrained and constrained en-route web caching in tree networks, which extended the scope of optimization to a tree rooted at the server and got better performance compared with [19]. At the same time, [14] solved constrained en-route web caching problem, including the constraints of caching exactly K-copies and at most K-copies. The multi-server cache location problem in linear arrays has already been studied in [10]. However, our work focuses on dual-server coordinated en-route caching in tree networks where object placement and replacement polices are carried out in an coordinated fashion. We formulate multi-server coordinated en-route caching as an optimization problem of minimizing total transmission cost for all demands and caching cost of all caches. Then we present two algorithms, one for single-server en-route caching in tree networks and the other for dual-server en-route caching in tree networks. 3. Problem Formulation First of all, we present a formulation of multi-server enroute caching in general networks which adopts the method in [10] to express the transmission cost of access demands. We consider a network comprising n clients and m servers, where a client can issue requests to each of the servers. At each client there is a cache supporting coordinated en-route caching [19]. If an object O is cached at a client, requests for the object arriving at the client can be satisfied by the cache; otherwise, the requests are forwarded until they are satisfied by other caches or the server. The sets of clients and servers are denoted respectively by C = {c i, 1 i n} and S = {s i, 1 i m}. Because cache space is limited (we assume it is always fully occupied), we need to remove some objects already cached before caching a new object at a client and this leads to miss penalty when these removed objects are requested later. We denote miss penalty of these objects by p c, c C and call it caching cost of object O at client c. A demand is a request for object O from a client to a server and we denote by f c,s the frequency of a demand from c to s. We use d(c, s), c C, s S to denote the sum of distances between c and s and call d(c, s) f c,s transmission cost of a demand from c to s. We note that the formulation of en-route caching problem in [19] and [14] are described in a form of caching cost and caching gain at individual nodes. However, we formulate the problem in a form of total cost of the entire network, including transmission cost on all links and caching cost at all caches. Suppose A is the set of clients that cache the target object and v A is the nearest node that caches the object on the path from c to s, then the transmission cost of a demand from c to s is the product of the frequency and the distance from the client to node v, which can be denoted by min d(c, v) f c,s, (1) v path[c,s] (A S) where path[c, s] is the set of nodes on the path between c and s. The total access cost in the network is the sum of transmission cost of all demands and caching cost at all caches defined as (2). Our objective is to place copies in a subset A, A C so that the total access cost(2) is minimized. min d(c, v) f c,s + p c (2) v path[c,s] (A S) c C,s S c A We define the problem of multi-server coordinated en-route caching as follows: A newtwork can be represented by an undirected graph (V, E), where V = {c 1, c 2,...c n, s 1, s 2,...s m } is the set of nodes comprising n clients and and m servers. Frequencies for all demands are denoted by set F = {f c,s } : C S R, caching cost at all clients are denoted by P = {p c } : C R. Our objective is to place copies in a subset A C, so that the total access cost is minimized, which can be formulated as follows G(F, A ) = min G(F, A) = A min { min d(c, v) f c,s + p v }, (3) A v path[c,s] (A S) c C,s S v A where A is a feasible solution and A is an optimal solution. If only one server is contained in the network, the model is for single-server problem which has already been solved by [19] and [14] for linear array and tree networks respectively. What we want to solve is the dual-server problem, that is, two servers are contained in the network. However, we can t get a solution to dual-server problem by simply composing the solutions of single-server problems one by one. This is because the solutions of single-server problems may contradict with each other. It s possible that optimal locations to place the copies in dual-server context may differ from the optimal locations in single-server context. 4. Dynamic Programming Solutions Dual-server coordinated en-route caching is important because it s the basis of multi-server coordinated en-route caching. However, in the paper we first discuss two algorithms, one for dual-server en-route caching in linear array and the other for single-server en-route caching in tree networks before we concentrate on dual-server en-route caching in tree networks. 4.1. Solution to Single-Server En-Route Caching in Tree Networks Before focusing on the dual-server problem in tree networks we solve the single-server problem first. This problem has been solved by [14] with a algorithm whose time complexity is O(n 2 ) for an n-node tree. In this paper we present another algorithm whose time complexity is O(nh), where h is the height of the tree. We use a bottom-up dynamic programming approach enlightened by [10] to get a solution of a tree from the solutions of it s children. Let T w denote the subtree rooted at node w, C(w) denotes children set of node w, G(w, l) means the total access cost to satisfy the requests issued by the clients in T w optimally, where l is the distance from node w to the nearest cache on the path to the server. The optimal solution corresponding to G(w, l) is denoted by A(w, l). Figure 1. definition of G(w, l) When we consider the relation between G(w, l) for node w and that for it s children, we have the following theorem. Theorem 1 For a tree T r, we have G(F, A ) = w i C(r) G(w i, 1), where A is an optimal solution to Equation (2). For inner node w and distance l, we have G(w, l) = min{g without (w, l), G caching (w, l)}, (4) wi C(w)A(w i, l + 1) (G A(w, l) = without (w, l) G caching (w, l)) wi C(w)A(w i, 1) {w} (G without (w, l) G caching (w, l)). where G without (w, l) = w i C(w) G(w i, l + 1) + l f w is the minimum access cost in the case that the object is not cached at node w and G caching (w, l) = w i C(w) G(w i, 1) + p w is the minimum access cost in the case that the object is cached at node w. (5) Proof. First, all subtrees of the root node are independent of each other and the distance from the root node to it s children is 1, so it s easy to find out that G(F, A ) = w i C(r) G(w i, 1). Then for inner node w: (1) If w / A, because G without (w, l) is an optimal solution, we have G without (w, l) w i C(w) G(w i, l+1)+ l f w. On the other hand G(w i, l + 1), w i C(w) are also optimal solutions, so we have w i C(w) G(w i, l + 1) G without (w, l) l f w. From two inequations, we have G without (w, l) = w i C(w) G(w i, l + 1) + l f w and A(w, l) = wi C(w)A(w i, l + 1). (2) If w A, in a similar way we have G caching (w, l) = w i C(w) G(w i, 1) + p w and A(w, l) = wi C(w)A(w i, 1) {w}. From (1)(2), we know Theorem 1 is correct. Based on Theorem 1, we have dynamic programming algorithm as follows which runs in bottom-up order. Algorithm 1: EW C1 Step 1. Initialization: A(w, l) = φ; G(w, l) = l f w, for all w and l, 1 l h; Step 2. Running the algorithm in post-order traversal for all non-leaf w, 1 l h w do if w G(w i C(w) i, l + 1) + l f w w G(w i C(w) i, 1) + p w then G(w, l) = w G(w i C(w) i, l + 1) + l f w A(w, l) = wi C(r) A(w i, l + 1) else G(w, l) = w G(w i C(w) i, 1) + p w A(w, l) = wi C(r) A(w i, 1) {w} endif endfor. Complexity Time What we should compute is all G(w, l), w C, 0 l h w and time complexity of the algorithm is O(nh). In the worst case h = n and time complexity is O(n 2 ) which is same as the method in [14]. In the best case h = 1, where each node is a leaf node except the root node r, time complexity is O(n) Solution to Dual-Server En-Route Caching in Tree Networks The method used in previous section can be extended easily to solve multi-server case in O(nh m ) time complexity through computing all combination of G(w, l 1, l 2,...l m ), where m is the number of servers. However, in this section, we only focus on dual-server network and present an algorithm whose time complexity is O(nh). If only one server is contained in the network, the algorithm equals to EW C2; if the tree degenerates to a line, the algorithm is equivalent to EW C1. Now, let us think about the location of a server in a tree network. It s certain that if a server is located in a tree, it must be in a leaf node or root node. (If there is a server located at an inner node, the tree can be decomposed so that the inner node becomes a leaf node or the root node.) For such a tree with two servers contained, we adjust the shape of tree as follows so that some property can be achieved. (1) We adjust the location of one server so that the node becomes the root of the tree. We call the node root-server node. (2) For the other server, if it s not the most left node, we swap the node with his brother (the most left one). We swap the upper nodes (parent, grandfather...) of the server in the same way until the server becomes the left-most node in the tree. We call the server node leaf-server node. In process(1), the height of the tree changes. Suppose h is the height of the tree before adjusting and h after adjusting, then we have 0 h 2h. In both process (1) and (2), the connectivity never changes and the network adjusted is an isomorphic structure to the original network. We can get the original network from the latter in the reversed process of (1) and (2). We call the tree adjusted as a dual-server tree. Now, we consider the solution of dual-server en-route caching in a weighted tree network from listing the difference between the single server case and two-server case as follows: (1) In the dual-server case, the root server has only one child (otherwise the network can be further decomposed) and the leaf server is located at the most left node in tree. (2) In the dual-server case, a client can issue demands to two servers. Here, we denote root server by s and the leaf server by s. f w is the frequency of demand from node w to server s, f w is the frequency of demand from node w to server s. G(w, l) means the optimal access cost to satisfy the requests issued by the clients in T w optimally, where l is the distance from node w to the nearest cache on the path to root server s and the corresponding optimal solution is A(w, l). Let L(w, l) denote the distance from w to the nearest client caching the object in the path to leaf server s as Figure() shows. Based on the analysis above, we have the following theorem. Theorem 2 For a dual-server tree T r defined before, in the case that the object is cached at node w, we have L(w, l) = 0; otherwise in the case that the object is not cached at node w, we have L(w, l) = Figure 2. Definition of L(w, l) Figure 3. An Example L(w, l + 1) + 1 (w path[s, s ]) L(w, l D(w) + 1) D(w) (w / path[s, s ] and l D(w)) l(w / path[s, s ] and l D(w)), where D(w) is the distance between w and the nearest node w in path[s, s ](w = w when w path[s, s ]). w is the child of w in path[s, s ]. Proof. If node w is on the path[s, s ], the nearest client caching the object from node w to leaf server s must be on the path[s, s ] too; otherwise it can be on path[s, w ] or path[w, w], where w = path[s, s ] path[w, w] is the first node from w to s. (1) if node w is on path[s, s ] and w is the child of w in path[s, s ], we have L(w, l) = L(w, l + 1) + 1. (2) if node w is not on path[s, s ] and the nearest client caching the object is in the path[s, w ], then L(w, l) = (6) L(w, l D(w)) + D(w). Because w is in path[s, s ], we have L(w, l D(w)) = L(w, l D(w) + 1) + 1 according to (1), where w is the child of w in path[s, s ]. At last, we have L(w, l) = L(w, l D(w) + 1) D(w). (3) if node w is not on the path[s, s ] and the nearest client caching the object is on path[w, w], it s easy to see that L(w, l) = l Now that we have already known the distances from w to s and s, we can give a theorem for the optimal solution as follows. Theorem 3 For a dual-server tree T r defined before, we have G(F, A ) = G(w, 1), where A is an optimal solution to Equation (2), w is the child of root node r. For node w and distance l, we have G(w, l) = min{g without (w, l), G caching (w, l)} (7) wi C(w)A(w i, l + 1) (G A(w, l) = without (w, l) G caching (w, l)) wi C(w)A(w i, 1) {w} (G without (w, l) G caching (w, l)) (8) where G without (w, l) = w i C(w) G(w i, l + 1) + f w l + f w L(w, l) is the optimal access cost in the case that the object is not cached at node w and G caching (w, l) = w i C(w) G(w i, 1) + p w is the optimal access cost in the case that the object is cached at node w. Proof. Similar to Theorem(1) and is left out because of space limitations. Based on Theorem 3, we present a dynamic programming algorithm which runs in post-order(left-right-root) traversal. Algorithm 2: EW C2 Step 1. Initialization: Adjust the tree so that s is the most left node, s is the root node and the height of tree is h. A(w, l) = φ; G(s, l) = 0, for each distance l, 1 l h; L(s, l) = 0, for each distance l, 1 l h ; Step 2. Running the algorithm in post-order traversal for all node w and distance l, 1 l h w do if w path[s, s ] then L(w, l) = L(w, l + 1) + 1 else if w / path[s, s ] and l D(w) then L(w, l) = L(w, l D(w) + 1) D(w) else if w / path(s, s ) and l D(w) then L(w, l) = l endif; M = w i C(w) G(w i, l + 1) + L(w, l)f w + lf w ; N = w i C(w) G(w i, 1) + p w ; if M N then G(w, l) = M A(w, l) = wi C(w)A(w i, l + 1) else L(w, l) = 0 G(w, l) = N A(w, l) = wi C(w)A(w i, 1){w} endif; enddo. Time Complexity Computing G(w, l), where w C, 1 l h, results in the time complexity of the algorithm to be O(nh). In the case that only one server is contained in the tree, we suppose L(w, l) = f w = 0 then this algorithm equals to EWC2. In the case that the tree degenerates to a linear array, we suppose l = i and L(w, l) = r l i then this algorithm becomes EWC1. Now, we give an example for dual-server coordinated enroute caching to show the process how algorithm EWC3 works. Consider a tree network consisting of five client w i, 1 i 5 and two servers s 0, s n+1 as Figure?? shows. All demands from clients to servers are {f 1 = f 2 = f 3 = f 4 = f 5 = f 6 = 1, f 1 = f 2 = f 3 = f 4 = f 5 = f 6 = 2}, caching cost at each client p 1 = p 2 = p 3 = p 4 = p 5 = p 6 = 4. Running the algorithm EWC3, we can get an optimal solution G(F, A ) = 17, A = {w 2, c 3 }. The process is as follows: Table 1. D (w, l) l D (5,l) D (4,l) D (3,l) D (2,l) D (1,l) / / /4 2 Table 2. G(w, l) l G(5,l) G(4,l) G(3,l) G(2,l) G(1,l) * 7* * 3* * / / Conclusions Deploying multiple servers and en-route caching is an important technology to improve the scalability of networks. However, previous work on en-route caching is mostly devoted in single-server networking environment and can t be applied to networks with more than one server. In this paper we formulate multi-server coordinated enroue caching as an optimization problem of minimizing total access cost and present several algorithms for different cases. Our algorithm for dual-server en-route caching in tree networks can also be used in dual-server linear array and single-server tree networks. Our method can be easily extended to the multi-servers case, though the time complexity will then rise to exponential. However, we have reduced the complexity for the dualserver case after some local optimization. It s a challenging task for us to optimize the multi-server case and reduce it s complexity to polynomial. Another challenging task for our future research is to present an approximate method to solve the problem with multiple servers efficiently and the problem in arbitrary topologies. The techniques of applying dynamic programming shown in the paper can serve as useful tools for deriving such solutions in the ge
Recommended
View more...
We Need Your Support
Thank you for visiting our website and your interest in our free products and services. We are nonprofit website to share and download documents. To the running of this website, we need your help to support us.

Thanks to everyone for your continued support.

No, Thanks