Introduction to curl
curl is free and open-source software for transferring data to and from a server.
It is a simple and robust command-line tool.
It supports many protocols, including HTTP, and it is quite easy to use.
For example, you can get an HTML page just by passing the URL of the web page to the curl command.
curl example.com
You will see the source of the web page. Note that curl does not parse or render HTML as a browser does.
When a URL has no scheme, curl assumes HTTP. If you want to fetch a page over HTTPS, you have to include the https:// scheme in the URL.
curl https://www.facebook.com
You can send a GET request with curl. Quote the URL so that the shell does not try to interpret the ? character.
curl "http://www.example.com/login?name=ryan"
You can send a POST request with curl by using the --data option.
curl --data "name=ryan" http://www.example.com/login
The rest of this article assumes you have this basic knowledge of curl.
HTTP Redirects
HTTP, the HyperText Transfer Protocol, is an application-level protocol for data communication on the World Wide Web.
It is the most widely used protocol for transferring data on the internet. You saw above how you can get a web page over HTTP with the help of curl.
HTTP has many features, and redirects are one of the fundamental ones. HTTP represents redirects with 3xx status codes.
There are several types of redirects, including 301, 302, 303, 307, and 308.
Redirects work exactly as the name suggests.
Sometimes, when you send a request to a server asking for a web page, you get an instruction from the server instead of the requested page.
This instruction tells you where to look for the requested page.
Let’s check this example.
curl google.com
When you run curl against google.com, you will get the following response. Note that you are requesting http://google.com here, without the www prefix.
Output
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>301 Moved</title>
</head>
<body>
<h1>301 Moved</h1> The document has moved
<a href="http://www.google.com/">here</a>
</body>
</html>
The server tells you that the document has moved, and it sends you the address of the new location, http://www.google.com/. The status code for this redirect is 301, which means the move is permanent. In contrast, 302 means the move is temporary.
So, to get the actual page, you have to run curl again against the specified address.
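If you want to see the redirect itself rather than the HTML body, you can ask curl to print the status code and the redirect target. Here is a minimal sketch. The function name show_redirect is made up for this article, and it relies on curl's --write-out variables http_code and redirect_url.

```shell
# show_redirect is a hypothetical helper name, not something curl provides.
# It prints the status code and the address the server redirects to,
# without following the redirect.
show_redirect() {
  # -s: hide the progress meter
  # -o /dev/null: throw away the response body
  # -w: print selected details about the transfer
  curl -s -o /dev/null -w '%{http_code} %{redirect_url}\n' "$1"
}

# Example call (needs network access):
# show_redirect http://google.com
```

For the google.com request above, I would expect this to report the 301 status together with the new address, but the exact result depends on what the server returns at the time.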
Why don’t you see such behavior from a browser?
Because browsers do HTTP redirection for you automatically.
curl, by default, does not follow redirects. But with an extra argument, you can instruct curl to follow them.
The option has a short form and a long form. You can use -L or --location.
curl -L google.com
Now you will get the huge page source of google.com, which I don’t want to put here.
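If you only care about where curl ended up, and not about the page source, you can discard the body and print a summary instead. This is a small sketch, assuming the --write-out variables num_redirects and url_effective. The function name final_destination is my own invention.

```shell
# final_destination is a hypothetical helper name.
# It follows redirects with -L, throws away the body, and prints
# the number of redirects followed and the last URL that was fetched.
final_destination() {
  curl -s -L -o /dev/null -w '%{num_redirects} %{url_effective}\n' "$1"
}

# Example call (needs network access):
# final_destination google.com
```

This is handy in scripts where you want to record the final address without storing the whole page.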
Temporary and Permanent Redirects
Not all redirects are the same. There are several things that we have to consider. We already know that some redirects are permanent while others are temporary.
This means that HTTP expects the user agent (a browser or curl) to remember the new URI if the redirect is permanent and, from the next request onwards, to go directly to that URI.
If it is a temporary redirect, the user agent should not remember the new URI and should keep using the original URI in every subsequent request.
Browsers have this capability, but curl does not have it. Hence there is no difference between temporary and permanent redirects for curl.
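Since curl does not act on this difference itself, a script that wants to, for example to update a stored URL after a permanent move, has to check the status code on its own. The sketch below shows one way to do that. The function name redirect_kind is hypothetical, and grouping 301 with 308 and 302 with 307 is my reading of the status code definitions.

```shell
# redirect_kind is a hypothetical helper name.
# It classifies an HTTP status code passed as the first argument.
redirect_kind() {
  case "$1" in
    301|308) echo "permanent" ;;        # safe to remember the new address
    302|307) echo "temporary" ;;        # keep using the original address
    3[0-9][0-9]) echo "other 3xx" ;;    # for example 303 or 304
    *) echo "not a redirect" ;;
  esac
}

# Example calls:
# redirect_kind 301
# redirect_kind 302
```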
Number of Redirects
Sometimes, redirects are chained. The first address redirects to a second address, which redirects to a third one, and so on.
When redirects are enabled, curl follows up to 50 of them by default. The limit was set to avoid getting caught in endless loops.
Let’s say A redirects to B and B redirects to A.
This will cause an endless loop.
Such a situation can occur either by mistake or through someone’s malicious intent. If you want to change the number of redirects to be followed, you can do so with the --max-redirs option.
curl -L --max-redirs 10 example.com
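When the limit is reached, curl stops and exits with code 47, which stands for too many redirects. A script can check for that code to tell a redirect loop apart from other failures. Here is a minimal sketch. The function name fetch_with_limit and the wording of the message are my own.

```shell
# fetch_with_limit is a hypothetical helper name.
# $1: URL to fetch, $2: maximum number of redirects to follow
fetch_with_limit() {
  curl -s -L --max-redirs "$2" -o /dev/null "$1"
  rc=$?
  # curl exits with 47 when it hits the redirect limit.
  if [ "$rc" -eq 47 ]; then
    echo "gave up: more than $2 redirects" >&2
  fi
  return "$rc"
}

# Example call (needs network access):
# fetch_with_limit example.com 10
```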
Redirection methods: GET and POST
You can follow redirects with a GET request like below.
curl -L "http://www.example.com/login?name=ryan"
You can follow redirects with a POST request like below.
curl -L --data "name=ryan" http://www.example.com/login
There are a few things to know about HTTP redirect methods.
If we consider 301 and 302 redirects, curl sends the follow-up request as a GET for both of them. That means, even if the original request was a POST, the redirected request will be a GET.
On the other hand, 307 and 308 keep the original method for the redirected request.
curl follows these conventions without any problem. But there are a lot of web services on the internet that use 301 and 302 redirects, yet want both the original and the redirected request to be a POST.
You can do that with curl using the --post301 and --post302 options.
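Put together, a POST that stays a POST across 301 and 302 redirects could look like the sketch below. The function name post_keep_method is hypothetical, and the form data and URL in the example call are the same placeholders used earlier in this article.

```shell
# post_keep_method is a hypothetical helper name.
# $1: form data to send, $2: URL
post_keep_method() {
  # -L follows redirects. --post301 and --post302 tell curl not to
  # switch to GET when the server answers with 301 or 302.
  curl -s -L --post301 --post302 --data "$1" "$2"
}

# Example call (needs network access):
# post_keep_method "name=ryan" http://www.example.com/login
```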
Redirect to a different host
Redirection can occur within the same server or between different servers. For example, a website can be moved to a new hosting provider.
curl behaves differently when it is redirected to another host. It limits what data it sends, and by default it sends credentials only to the original host.
So, if you want to provide credentials like usernames and passwords, and you fully trust the servers you are redirected to, you should tell curl so with the --location-trusted option.
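As an illustration, the sketch below combines --location-trusted with the --user option, which is how curl takes a username and password. The function name fetch_trusting_redirects and the values in the example call are placeholders I made up.

```shell
# fetch_trusting_redirects is a hypothetical helper name.
# $1: credentials in user:password form, $2: URL
fetch_trusting_redirects() {
  # --location-trusted follows redirects like -L, but also sends the
  # credentials to every host that curl is redirected to.
  curl -s --location-trusted --user "$1" "$2"
}

# Example call with placeholder values (needs network access):
# fetch_trusting_redirects "ryan:secret" http://www.example.com/login
```

Use this only when you trust every host in the redirect chain, because each of them will receive the password.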
Conclusion
curl provides many options for interacting with servers and transferring data. You can combine many of those options with the redirect option as well. Also, note that curl only follows redirects that the server sends at the HTTP level. You will find it very difficult to handle client-side redirects, such as HTML redirects done with a meta refresh tag, because curl never parses the HTML source. Let me know if you have any questions about following redirects with curl.