SCC problem with kosaraju’s algorithm
Problem extracted from coursera.org:
Download the text file here. Zipped version here. (Right click and save link as)
The file contains the edges of a directed graph. Vertices are labeled as positive integers from 1 to 875714. Every row indicates an edge, the vertex label in first column is the tail and the vertex label in second column is the head (recall the graph is directed, and the edges are directed from the first column vertex to the second column vertex). So for example, the 11th row looks liks : “2 47646”. This just means that the vertex with label 2 has an outgoing edge to the vertex with label 47646
Your task is to code up the algorithm from the video lectures for computing strongly connected components (SCCs), and to run this algorithm on the given graph.
Output Format: You should output the sizes of the 5 largest SCCs in the given graph, in decreasing order of sizes, separated by commas (avoid any spaces). So if your algorithm computes the sizes of the five largest SCCs to be 500, 400, 300, 200 and 100, then your answer should be “500,400,300,200,100”. If your algorithm finds less than 5 SCCs, then write 0 for the remaining terms. Thus, if your algorithm computes only 3 SCCs whose sizes are 400, 300, and 100, then your answer should be “400,300,100,0,0”.
The idea of Kosaraju’s algorithm is like this:
1. Compute the finishing time of all the vertices by doing a DFS on the reversed graph;
2. Replace the vertex index with its finishing time to get a new graph, DFS the new graph to compute each vertex’s leader vertex( in a strongly connected component(SCC) if exists any);
3. Do statistical job on the leader vertices’ index. If multiple vertices have same leader vertex, they are in the same SCC.
Here is a quick Python implementation on the problem without any optimization work:
import sys sys.setrecursionlimit(300000) source='SCC.txt' N=875714 def getG(): G={} Grev={} for i in range(1,N+1): G[i]=[] Grev[i]=[] fin=open(source) for line in fin: v1=int(line.split()[0]) v2=int(line.split()[1]) G[v1].append(v2) Grev[v2].append(v1) fin.close() return G, Grev #globals visited={} finish={} leader={} def init(): for i in range(1,N+1): visited[i]=0 finish[i]=0 leader[i]=0 def dfs(G, i): global t visited[i]=1 leader[i]=s for j in G[i]: if(visited[j]==0): dfs(G,j) t=t+1 finish[i]=t def dfs_loop(G): global t global s t=0 #number of nodes processed so far s=0 #current source vertex i=N while(i>0): if(visited[i]==0): s=i dfs(G,i) i=i-1 if __name__ == "__main__": init() g, grev=getG() dfs_loop(grev) #THE FIRST LOOP ON REVERSE GRAPH # construct new graph newGraph={} for i in range(1,N+1): temp=[] for x in g[i]: temp.append(finish[x]) newGraph[finish[i]]=temp init() dfs_loop(newGraph) #THE SECOND LOOP # statistics lst= sorted(leader.values()) stat=[] pre=0 for i in range(0,N-1): if lst[i]!=lst[i+1]: stat.append(i+1-pre) pre=i+1 stat.append(N-pre) L= sorted(stat) L.reverse() print(L[0:5])
Here is the C/C++ code directly translated from the Python. (Unlike the Python code above, I did the statistical job externally.)
#include <iostream> #include <fstream> #include <cstdio> #include <vector> #define N 875714 using namespace std; struct node{ bool visited; int leader; int finish; vector<int> linkedVertices; vector<int> rLinkedVertices; }; struct node G[N+1]; struct node newG[N+1]; int t=0; int s=0; void dfs(node G[], int i, bool reverse){ G[i].visited=true; G[i].leader=s; vector<int> next; if (reverse) next= G[i].rLinkedVertices; else next= G[i].linkedVertices; for(int j=0; j<next.size(); j++){ if(!G[next[j]].visited) { dfs(G, next[j], reverse); } } t++; G[i].finish=t; } void dfs_loop(node G[], bool reverse){ t=0; s=0; for(int i=N;i>0;--i){ if (!G[i].visited){ s=i; dfs(G,i,reverse); } } } int main(){ for(int i=1;i<=N;++i){ G[i].visited=false; } FILE* fp=freopen("SCC.txt","r",stdin); int head, tail; while (scanf("%d %d", &head, &tail) > 0) { G[head].linkedVertices.push_back(tail); G[tail].rLinkedVertices.push_back(head); } fclose(fp); dfs_loop(G, true);//FIRST LOOP for(int i=1;i<=N;++i){ newG[i].visited=false; newG[i].finish=0; newG[i].leader=0; vector<int> temp; for(int j=0; j< G[i].linkedVertices.size(); j++){ temp.push_back(G[G[i].linkedVertices[j]].finish); } newG[G[i].finish].linkedVertices=temp; } dfs_loop(newG,false);//SECOND LOOP ofstream ofs("stat.txt", ofstream::out); for (int k=1;k<=N;k++) ofs<< newG[k].leader<<endl; ofs.close(); return 0; }
i can see the result
sorry , i can not see output and result .
It definitely works for me. Make sure you have big enough stack to run the code.
stat.txt , what is it for ?
it is completely empty.
it’s the output of all the leader nodes indexes. you do statistic on the file and get the final result, or you just sort the leader indexes in place and get the right result(like what I did in the python code).
Are you sure?
import sys
sys.setrecursionlimit(300000)
source=’stat.txt’
N=875714
leader=[]
def init():
fin=open(source, “r”)
for line in fin:
leader.append(line)
fin.close()
if __name__ == “__main__”:
init()
lst= sorted(leader)
stat=[]
pre=0
for i in range(0,N-1):
if lst[i]!=lst[i+1]:
stat.append(i+1-pre)
pre=i+1
stat.append(N-pre)
L= sorted(stat)
L.reverse()
print(L[0:5])
I didn’t test your code. Seems like you are doing statistic job on the output file you got with the C++ code? Keep in mind: you should change the code accordingly in order to make it work. I won’t explain how, but it should be super super easy.
434821,968,459,313,211
got the above but it is wrong.. i have submitted it got 0/5
i am not so much familiar with c++ , i might be wrong , but i could not .
i just need the answer .
could you tell me what was you answer ?
fatal error: no input files
If you got this kind of error, it’s a “segmentation fault”, means your fault 😦
Change my code accordingly when you use it.
I have to congratulate you because your code in C++ es quite neat and optimize. It’s not a straightforward solution since the graph doesn’t need to use two list to representate edges and vertices. I get the segmentation fault in your python code using sys.setrecursionlimit() doesn’t change anything
i ran ur python code, wat i got as an output is just a blank line!!
Typical YOUR FAULT. Increase the stack size.
and how to that i am new to python
and how to do that pls tell m new to python. tell me what change i have to make in ur program??
i increased it using sys.setrecursionlimit(n) but to no avail
sys.setrecursionlimit(n), as the name implies, increases the maximum recurssion numbers, but it does NOT increase the stack size. Did you see the link I posted in the comment above? Take a look.
how to overcome stack overflow? plz help fast!
thnx
Both versions works perfectly! Thanks!
plz tell us the answer 🙂
whats the answer
what if we don’t know the actual node count ( in this case N) beforehand?
You could run the part of
first. Of course leaving the lines above (using N for the range) out.
Then you can check the length with len(G) and set it to a variable.
len(G) takes more time. If you want to apply the algo to arbitrary length of data, you should use len(G).
Sent from my iPhone
Great job! Thanks.
Your way to make newGraph is pretty smart.
how to increase stack size?
Read codehiker’s post in the comments on the “ulimit” command, it is really pretty easy then.
How much time it take? It’s look like it take for a week)
I use my own code but it similar to yours)
PS why don’t you use for visited set()? You don’t need dict becouse you don’t need value – you only need to know if you visited this vortex or not.
c++ version a few seconds, python longer.
Sent from my iPhone
but how long? I tried to use your version but it looks like that it isn’t faster as mine
Well, I didn’t say it’s an optimal solution, so you are welcome to come with a better solution 🙂
For python version I don’t quite remember, like 70 seconds or so.
Also, the reason I use dictionary is that I know dict is implemented using hash-table, so querying would be faster. I don’t use set quite often.
This was a home task in the coursera’s algorithms course. You’d better not published it or at least published it on the coursera’s forum.
It works.
Very nice, both c++ and python versions work with some configuration changes for my system.
is there any way to increase the stack size of python 3.3 in windows8…i get a error while using ulimit command
What do finish and leader do?
Can you explain your code in C++ with some comments or add an article at the end?
Thanks! Big Help
the c++ program is creating stat.txt but not providing final output as asked
so stat.txt would be just “lst” as in python code then we do statistical process on it
For C++:
Change the Stuff Starting From //SECOND LOOP
dfs_loop(newG,false); //SECOND LOOP
vectorscc(N+1,0);
vectornum;
for (int k=1;k= 0 && k < 5 ; i–){
printf("%d",scc[i]);
if(i != 1 && k != 4) printf(",");
k++;
}
return 0;
}
Also, remember to add #include for the sorting routine. Program returns correct solution in 10 seconds 🙂
Hi there, I know that the post is old, but let’s see if anyone can help.
I get a segmentation fault and then python quits unexpectedly.
I believe is due to the stack size. So is it all simply solved by adding
unlimited -s below the sys.setrecursionlimit(300000) line?
done, it was
>limit stacksize unlimited
for a csh shell
answer is:
434821, 968, 459, 313, 211
how did u get that?(in c++)
IndentationError: unexpected indent , stat=[]
her latest blog https://1xslots-africa.site