Topological Sort : DFS, BFS and DAG

Algorithms and Data Structures: TheAlgorist.com

System Design: www.System.Design

Low Level Design: LowLevelDesign.io

Frontend Engineering: FrontendEngineering.io
জয় শ্রী রাম
🕉
Related Chapter: Cycle Detection in A Graph
In this chapter we will talk about Topological Sorting of a Directed Acyclic Graph (DAG). But before that let us first refresh our memory about some of the important characteristics of Depth First Search (DFS) and Breadth First Search (BFS) :
 DFS and BFS are two graph search techniques.
 Both DFS and BFS find all nodes findable, and nothing more.
 Both DFS and BFS search all findable nodes in linear time, i.e, O(E + V), where E = number of edges, V = number of vertices.
Time Complexity of DFS / BFS to search all vertices = O(E + V)
Reason: O(1) for all nodes, O(1) for all edges,
because in both the cases, DFS and BFS,
we are going to traverse each edge only once
and also each vertex only once
since you don’t visit an already visited node.
DFS will only store as much memory on the stack as is required for the longest path from root to
the leaf nodes. In other words, its space usage is O(h) where h is the height of the tree.
BFS on the other hand will queue every node at a fixed depth before visiting the next depth.
If, for example, you have a complete binary tree, the number of leaves will be n/2 = O(n) where
n is the number of nodes in the tree. Each of those n/2 nodes will be queued up at some point.
Many trees you will come across are moderately balanced. Even randomly built binary search trees
have expected logarithmic height.
In these cases h = O(log n) < O(n) and DFS will therefore use less memory.
So DFS is more memory efficient than BFS.
Depth first search is more memory efficient than breadth first search also because you can backtrack sooner.
It is also easier to implement if you use the call stack but this relies on the longest path not overflowing the stack.
It’s also worth mentioning that BFS is often very easy to understand, by DFS is, in general, more efficient most of the time. But it totally depends on the kind of problem we are trying to solve.
Topological Sort is the most important operation on directed acyclic graphs or DAGs. It orders the vertices on a line such that all directed edges go from left to right. Such an ordering cannot exist if the graph is not a DAG and contains one or more directed cycle(s), because there is no way we can keep going on the right on a line and still return to a vertex already visited (we are talking about Back Edges here).
If any of you thinking why I did not mention crossedges along with backedges,
then reason for that is, BFS is used only in Undirected Graphs to determine whether
there is a cycle or loop in it by examining if there is a crossedge.
Showing that there is a crossedge while doing a BFS on Directed graph does not prove that the Directed Graph has a cycle.
If your graph is directed then you have to not just remember if you have visited a node or not, but also how you got there. Otherwise you might think you have found a cycle but in reality all you have is two separate paths A>B but that doesn’t mean there is a path B>A. For example,
If you do BFS starting from 0, it will detect as cycle is present but actually there is no cycle. With a depth first search you can mark nodes as visited as you descend and unmark them as you backtrack.
If your graph is directed then you have to not just remember if you have visited a node or not, but also how you got there. Otherwise you might think you have found a cycle but in reality all you have is two separate paths A>B but that doesn’t mean there is a path B>A. For example,
If you do BFS starting from 0, it will detect as cycle is present but actually there is no cycle. With a depth first search you can mark nodes as visited as you descend and unmark them as you backtrack.
Every DAG has at least one Topological Ordering.
Motivation for the Topological Sort:
Topological Sort is useful in scheduling tasks where precedence ordering matters, i.e, one task needs to be done before starting another. Topological sort can sequence tasks while respecting all sequence constraints without any conflict.A real world scenario:
In most academic programs there are prerequisite courses for taking a specific course. The prerequisite course(s) needs to be completed before taking the course. Topological sorting can give you the sequence in which different courses can be taken by students so that they can complete the prerequisite courses before taking a course.When Does A Directed Graph Have A Topological Ordering ?
There is a very clear necessary condition for a directed graph to have a topological sorting, and that is it better be acyclic. So the graph needs to be a DAG (Directed Acyclic Graph). Put in a different way, one of the ways to prove that a graph is a DAG is to show that it has a topological ordering.
Graph G has a directed cycle => G has no Topological Ordering
It is because if a graph G has a directed cycle that means it would have a backedge, which means when trying to form a topological ordering from at least one of the edges we will have a edge back to one of the earlier edges, or even to the starting edge, in some cases, depending on the graph.It has been seen that having a directed cycle is the only obstruction for having a topological sorting. If you don’t have a directed cycle in a graph G then then you are guaranteed to have a topological ordering for the graph G.
It has been seen that having a directed cycle is the only obstruction for having a topological sort. If you don’t have a directed cycle in a graph G then then you are guaranteed to have a topological ordering for the graph G.
DFS is a slick and highly efficient way to compute topological sort of a graph in linear time: O(E + V).
Sink Vertex:
While trying to implement topological sort using DFS it is important to know what sink vertices are. Sink vertex is the vertex which has no outbound edges. A DAG has to have at least one sink vertex. If we don’t have a sink vertex then that means that every edge will have at least one outgoing vertex and since we have finite vertices in our graph then by pigeon hole principle that means that, we will be visiting one of the already visited vertices, which implies we would have a directed cycle. In fact you must convince yourself that one of the sink vertices must be the end vertex of the topological sort. Sink vertices are the only candidates for the final/last vertex position in the topological ordering.
Corollary: A directed graph with no sink vertex must have cycle.
DFS Implementation for Topological Sort:
While doing a DFS we will search for the sink vertices, and as we get a sink vertex we will disconnect that vertex from the graph and will then backtrack and search for the next sink vertex along the search path, until all the vertices in the graph are visited. This way we will get all the vertices in totally reverse order of that of what a topological ordering should be. So we would need to reverse the order to get the topological sort.A bit more explanation: We can view a graph as a dependency graph where an edge (u > v) represent v is dependent on U. So u must be executed before v, which means in topological sort resultset u must come before v. You can extrapolate this concept and say all the nodes in a subtree is dependent on the root of that subtree. And this is true for all subtrees and all nodes which has one or more subtrees. So, if you do a DFS and include a node in topological sort resultset only after all the nodes in its subtree(s) have been visited then we will get the resultset exactly in the opposite order of the topological sort order. This is exactly what we have implemented below:
Login to Access Content
Significance of Node State of Visiting and Visited:
VISITING: The current Node is being processed, i.e, DFS for this Node has started, but not yet finished,
which means all descendants
in the DFS tree of this vertex are not processed yet. Which also implies,
this vertex is in the function call stack.
While doing DFS, if a Node whose state is Visiting is encountered, then the inbound edge to the Node with state Visiting is back edge and hence there is a cycle.
VISITED : The current Node and all of its descendants are processed.
While doing DFS, if a Node whose state is Visiting is encountered, then the inbound edge to the Node with state Visiting is back edge and hence there is a cycle.
VISITED : The current Node and all of its descendants are processed.
Topological Sort by BFS:
Topological Sort can also be implemented by Breadth First Search as well. In DFS implementation of Topological Sort we focused on sink vertices, i.e, vertices with zero outgoing edges, and then at last had to reverse the order in which we got the sink vertices (which we did by using a stack, which is a Last In First Out data structure). In BFS implementation of the Topological sort we do the opposite: We look for for edges with no inbound edges. And consequently in BFS implementation we don’t have to reverse the order in which we get the vertices, since we get the vertices in order of the topological ordering. We use FirstInFirstOut data structure Queue in this implementation. We just search for the vertices with zero inbound edges and put them in the queue till we have processed all the vertices of the graph. Polling vertices from the queue one by one give the topological sort of the graph.Login to Access Content
Applications:
Course Schedule
There are a total of n courses you have to take, labeled from 0 to n  1. Some courses may have prerequisites, for example to take course 0 you have to first take course 1, which is expressed as a pair: [0,1] Given the total number of courses and a list of prerequisite pairs, is it possible for you to finish all courses? For example: 2, [[1,0]] There are a total of 2 courses to take. To take course 1 you should have finished course 0. So it is possible. 2, [[1,0],[0,1]] There are a total of 2 courses to take. To take course 1 you should have finished course 0, and to take course 0 you should also have finished course 1. So it is impossible.Solution:
We can represent the input prerequisites as a graph. Let's represent the graph by a list of edges.Login to Access Content
Course Schedule  Find Order
There are a total of n courses you have to take labelled from 0 to n  1.
Some courses may have prerequisites, for example, if prerequisites[i] = [ai, bi] this means you must take the course bi before the course ai.
Given the total number of courses numCourses and a list of the prerequisite pairs, return the ordering of courses you should take to finish all courses.
If there are many valid answers, return any of them. If it is impossible to finish all courses, return an empty array.
Example 1:
Input: numCourses = 2, prerequisites = [[1,0]]
Output: [0,1]
Explanation: There are a total of 2 courses to take. To take course 1 you should have finished course 0. So the correct course order is [0,1].
Example 2:
Input: numCourses = 4, prerequisites = [[1,0],[2,0],[3,1],[3,2]]
Output: [0,2,1,3]
Explanation: There are a total of 4 courses to take. To take course 3 you should have finished both courses 1 and 2. Both courses 1 and 2 should be taken after you finished course 0.
So one correct course order is [0,1,2,3]. Another correct ordering is [0,2,1,3].
Example 3:
Input: numCourses = 1, prerequisites = []
Output: [0]
Solution:
This problem is all about printing the courses in their topological sort.Login to Access Content
Must Reads:
 Topological Sort w/ Vertex Relaxation Approach
 Finding Longest Paths Using Topological Sort
 Course Scheduling