Using curl and wget commands to download pages from web sites

How-To | Sep 20, 2023 | 5 mins | Linux

The curl and the wget commands make it easy to download content from web sites.


One of the most versatile tools for collecting data from a server is curl. The “url” portion of the name aptly suggests that the command is built to locate data through the URL (uniform resource locator) that you provide. And it doesn’t just communicate with web servers; it supports a wide variety of protocols, including HTTP, HTTPS, FTP, FTPS, SCP and SFTP. The wget command, though similar in some ways to curl, primarily supports the HTTP and FTP protocols.
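To see which protocols your own build of curl supports, you can run curl --version; the exact list varies by version and build options, so the output below is only illustrative:

$ curl --version
curl 8.0.1 (x86_64-pc-linux-gnu) libcurl/8.0.1 OpenSSL/3.0.8
Protocols: dict file ftp ftps gopher gophers http https imap imaps mqtt pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp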

Using the curl command

You might use the curl command to:

  • Download files from the internet
  • Run tests to ensure that the remote server is doing what is expected (a quick status check is shown after this list)
  • Do some debugging on various problems
  • Log errors for later analysis
  • Back up important files from the server
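
For example, a quick test that a server is responding might fetch only the HTTP status code. The -w option prints the requested value once the transfer completes; the URL below is just a placeholder:

$ curl -s -o /dev/null -w "%{http_code}\n" https://example.com/
200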

Probably the most obvious thing to do with the curl command is to download a page from a web site for review on the command line. To do this, just enter “curl” followed by the URL of the web site like this (the content below is truncated):

$ curl https://www.networkworld.com/category/linux/
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0  124k    0     0    0     0      0      0 --:--:--  0:00:06 --:--:--     0

You’ll see some timing data plus the content. To save the content to a file, redirect the output to a file using a command like this:

$ curl https://www.networkworld.com/category/linux/ > linux.html
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  124k  100  124k    0     0  23339      0  0:00:05  0:00:05 --:--:-- 30035

The downloaded file can then be viewed on your system using cat or more to see the HTML content, or opened in a browser to view the rendered web page.
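
Instead of redirecting output, you can have curl write the file itself with its -o option; adding -s suppresses the progress meter shown above (the file name here is arbitrary):

$ curl -s -o linux.html https://www.networkworld.com/category/linux/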

In the command below, a single html file is grabbed.

$ curl https://www.networkworld.com/video/series/8559/2-minute-linux-tips > linux_tips.html
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 79873  100 79873    0     0  56780      0  0:00:01  0:00:01 --:--:-- 56808

Runs of identical adjacent lines (including sequences of blank lines) can be reduced to a single line with the uniq command. Note that the output must go to a different file; redirecting back to the input file would truncate it before uniq could read it:

$ uniq linux_tips.html > linux_tips_clean.html
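
If you want to squeeze only the blank lines and leave other repeated lines untouched, the cat command's -s (squeeze-blank) option does that as well:

$ cat -s linux_tips.html > linux_tips_clean.html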

More information on using curl is available in this previous post of mine: The Joy of curl

You can also get some quick help on options for using curl with the curl --help command:

$ curl --help
Usage: curl [options...] <url>
 -d, --data <data>           HTTP POST data
 -f, --fail                  Fail fast with no output on HTTP errors
 -h, --help <category>       Get help for commands
 -i, --include               Include protocol response headers in the output
 -o, --output <file>         Write to file instead of stdout
 -O, --remote-name           Write output to a file named as the remote file
 -s, --silent                Silent mode
 -T, --upload-file <file>    Transfer local FILE to destination
 -u, --user <user:password>  Server user and password
 -A, --user-agent <name>     Send User-Agent <name> to server
 -v, --verbose               Make the operation more talkative
 -V, --version               Show version number and quit

This is not the full help, this menu is stripped into categories.
Use "--help category" to get an overview of all categories.
For all options use the manual or "--help all".
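
Putting a few of these options together, a sketch like the following (the file name and user-agent string are arbitrary) downloads a page silently while still reporting errors, identifies itself with a custom user agent, and writes the result to a file:

$ curl -sS -A "Mozilla/5.0" -o page.html https://www.networkworld.com/category/linux/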

Using wget

The wget command makes it easy to download a web site recursively. While the site used in the command below is a single-page web site, it provides a quick example of how this command works.

$ wget -r http://example.com/
--2023-09-19 13:07:12--  http://example.com/
Resolving example.com (example.com)... 93.184.216.34, 2606:2800:220:1:248:1893:25c8:1946
Connecting to example.com (example.com)|93.184.216.34|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1256 (1.2K) [text/html]
Saving to: ‘example.com/index.html’

example.com/index.html        100%[=================================================>]   1.23K  --.-KB/s    in 0s

2023-09-19 13:07:12 (56.1 MB/s) - ‘example.com/index.html’ saved [1256/1256]

FINISHED --2023-09-19 13:07:12--
Total wall clock time: 0.1s
Downloaded: 1 files, 1.2K in 0s (56.1 MB/s)

The downloaded content is saved in a directory named after the site (example.com), in this case containing a single file.

$ ls example.com
index.html
$ head example.com/index.html
<!doctype html>
<html>
<head>
    <title>Example Domain</title>

    <meta charset="utf-8" />
    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <style type="text/css">
    body {
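
For a larger, multi-page site, you would typically limit the recursion depth and pace your requests. A sketch using standard wget options (-l sets the depth, --wait pauses between requests, -k rewrites links for local viewing, -p grabs page requisites such as images and stylesheets; the URL is a placeholder):

$ wget -r -l 2 --wait=1 -k -p https://example.com/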

If you were to run the command below (no recursion) multiple times, generations of the file would build up, since wget adds numeric suffixes rather than overwriting.

$ wget http://example.com/
$ ls -l index.html*
-rw-r--r--. 1 shs shs 1256 Oct 17  2019 index.html
-rw-r--r--. 1 shs shs 1256 Oct 17  2019 index.html.1
-rw-r--r--. 1 shs shs 1256 Oct 17  2019 index.html.2
-rw-r--r--. 1 shs shs 1256 Oct 17  2019 index.html.3
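
To avoid the numbered copies, wget's -N (--timestamping) option downloads the file again only when the remote copy is newer than the local one:

$ wget -N http://example.com/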

The no-parent option

The no-parent option ensures that the command never ascends to the parent directory when retrieving content recursively, so that only files at or below the starting point in the hierarchy are downloaded.

$ wget --no-parent -r https://uushenandoah.org/how-to-become-a-member/
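
The option also has a short form, -np, and pairs naturally with a depth limit:

$ wget -np -r -l 1 https://uushenandoah.org/how-to-become-a-member/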

Wrap-up

Both curl and wget are extremely useful commands for downloading and troubleshooting web content. Check out the man pages for information on the many options available.


Sandra Henry-Stocker has been administering Unix systems for more than 30 years. She describes herself as "USL" (Unix as a second language) but remembers enough English to write books and buy groceries. She lives in the mountains in Virginia where, when not working with or writing about Unix, she's chasing the bears away from her bird feeders.

The opinions expressed in this blog are those of Sandra Henry-Stocker and do not necessarily represent those of IDG Communications, Inc., its parent, subsidiary or affiliated companies.