Getting data from a REST API
A REST API is one of the most common means of accessing large amounts of data and is how many applications interact over a network. REST stands for Representational State Transfer and API stands for Application Programming Interface, so a REST API is an interface you can use to interact with another application. For most data science applications that will mean retrieving data from some external app, but you can also send data back if you need to.
For this example we are going to access the One API to Rule them All, a cool and straightforward Lord of the Rings API for getting facts about both the books and the movies. A REST API sets up several endpoints which you use to interact with the system. Take a look at the documentation to see what endpoints are available for the One API here: The One API documentation. Several of them don't require an access token, while some of them do. We will interact with examples of each so you can see how they work.
Some very popular APIs will develop libraries for interacting with them, but we will stick with a general approach which uses the requests library (you can look at the documentation for the requests library here: https://requests.readthedocs.io/en/master/). Let's start by trying to get information on the Lord of the Rings books. To do that we are going to start by importing the requests library. We then need to decide how to access the book information. The documentation page says that all of the requests need to start with the URL https://the-one-api.dev/v2 and then we append the endpoint that we care about to the end of that. So let's put that base URL into our BASE_URL variable and /book into our BOOK_ENDPOINT variable.
import requests
BASE_URL = 'https://the-one-api.dev/v2'
BOOK_ENDPOINT = '/book'
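Plain string concatenation works here because the base URL has no trailing slash and the endpoint has a leading one. As a minimal sketch, a small helper (build_url is a hypothetical name, not part of requests or The One API) can make the join tolerant of either form:

```python
def build_url(base_url, endpoint):
    """Join a base URL and an endpoint with exactly one slash between them."""
    return base_url.rstrip('/') + '/' + endpoint.lstrip('/')


BASE_URL = 'https://the-one-api.dev/v2'

# Works whether or not the endpoint starts with a slash.
book_url = build_url(BASE_URL, '/book')
print(book_url)  # https://the-one-api.dev/v2/book
```

Note that urllib.parse.urljoin is not a drop-in replacement here: it treats '/book' as an absolute path and would drop the /v2 part of the base URL.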
We are trying to get data from the endpoint, so we are going to use the get function from the requests library. We then print the result.
response = requests.get(BASE_URL + BOOK_ENDPOINT)
print(response)

<Response [200]>
So we were able to send out the request, but all we got back was <Response [200]>? Where is all the book information? What we printed above is the response object, which only displays the HTTP status code, and the 200 tells us that what we did worked out OK. If you want to check out what all the different status codes mean, check out the Wikipedia page on the topic here: https://en.wikipedia.org/wiki/List_of_HTTP_status_codes.
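If you would rather not look codes up by hand, Python's standard library already knows the standard ones. The following is a rough sketch (describe_status is a hypothetical helper, not something requests provides) that turns a numeric code into its standard phrase and broad category:

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short human-readable description of an HTTP status code."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Codes the standard library does not define.
        phrase = 'Unknown'
    categories = {
        1: 'informational',
        2: 'success',
        3: 'redirection',
        4: 'client error',
        5: 'server error',
    }
    category = categories.get(code // 100, 'non-standard')
    return f'{code} {phrase} ({category})'


print(describe_status(200))  # 200 OK (success)
print(describe_status(401))  # 401 Unauthorized (client error)
```

With a real response, the integer to pass in is response.status_code. The requests library also offers response.raise_for_status(), which raises an exception for 4xx and 5xx codes.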
So we know that the request went through alright, but what happened to the data? We can get the actual data out in a handful of different ways, but I find the easiest one to work with is the JSON output. To get that we just need to run the following command.
print(response.json())

{'docs': [{'_id': '5cf5805fb53e011a64671582', 'name': 'The Fellowship Of The Ring'}, {'_id': '5cf58077b53e011a64671583', 'name': 'The Two Towers'}, {'_id': '5cf58080b53e011a64671584', 'name': 'The Return Of The King'}], 'total': 3, 'limit': 1000, 'offset': 0, 'page': 1, 'pages': 1}
However, that isn't the prettiest to read, so let's clean that up a bit with Python's built-in json module.
import json

json_object = response.json()
json_formatted = json.dumps(json_object, indent=4)
print(json_formatted)

{
    "docs": [
        {
            "_id": "5cf5805fb53e011a64671582",
            "name": "The Fellowship Of The Ring"
        },
        {
            "_id": "5cf58077b53e011a64671583",
            "name": "The Two Towers"
        },
        {
            "_id": "5cf58080b53e011a64671584",
            "name": "The Return Of The King"
        }
    ],
    "total": 3,
    "limit": 1000,
    "offset": 0,
    "page": 1,
    "pages": 1
}
Now that we know we have the data in there that we want, we need to get it out. The json() method has already parsed the JSON into a Python dictionary, so we can simply index into it to retrieve the data that we want. So let's just grab the docs data from the JSON payload and loop through it, printing all the book names.
LoTR_book_data = json_object['docs']
for book in LoTR_book_data:
    print(book['name'])

The Fellowship Of The Ring
The Two Towers
The Return Of The King
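Indexing with json_object['docs'] raises a KeyError if the key is missing, which can happen when the API returns an error payload instead of data. As a small sketch of a more forgiving version, here is a hypothetical extract_names helper run against a hard-coded copy of the payload shown above, so it needs no network access:

```python
# A trimmed copy of the book payload printed earlier.
sample_payload = {
    'docs': [
        {'_id': '5cf5805fb53e011a64671582', 'name': 'The Fellowship Of The Ring'},
        {'_id': '5cf58077b53e011a64671583', 'name': 'The Two Towers'},
        {'_id': '5cf58080b53e011a64671584', 'name': 'The Return Of The King'},
    ],
    'total': 3,
}


def extract_names(payload):
    """Collect the 'name' of every document, skipping entries without one."""
    return [doc['name'] for doc in payload.get('docs', []) if 'name' in doc]


print(extract_names(sample_payload))
print(extract_names({}))  # an empty list rather than a KeyError
```

Using payload.get('docs', []) means a payload without a docs key simply produces an empty list.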
Now let's try to print out all the movie data using the same approach. We first need to grab the movie endpoint, and then we can simply repeat what we did before.
MOVIE_ENDPOINT = '/movie'
response = requests.get(BASE_URL + MOVIE_ENDPOINT)
print(response)

<Response [401]>
Now our response is a 401 though, so what is the issue? If we look up a 401 error in the Wikipedia page we linked above we see that a 401 error means that we aren't authorized. If we look at The One API's documentation we can see why: the movie endpoint requires us to have a token. To get an API access token you need to go to the following site to sign up: https://the-one-api.dev/sign-up. Once you create an account and log in you should get redirected to a page to get an access token. Save it in THE_ONE_API_ACCESS_TOKEN below.
One thing to note though is that these tokens should be kept private, and never committed to a repo. The way that I am going to accomplish this is through environment variables stored in a .env file and accessed using the next code block. This section is a bit beyond the scope of this tutorial, but if you want to learn how to do it yourself, please check out the overview on the Python dotenv PyPI page which you can find here: https://pypi.org/project/python-dotenv/. However, if you don't want to go through all of that, you just need to put your token in quotes after THE_ONE_API_ACCESS_TOKEN = in the code below, replacing os.getenv("THE_ONE_API_ACCESS_TOKEN").
import os
from dotenv import load_dotenv

load_dotenv()
THE_ONE_API_ACCESS_TOKEN = os.getenv("THE_ONE_API_ACCESS_TOKEN")
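One catch with os.getenv is that it quietly returns None when the variable is not set, and the problem only surfaces later when the token is used. A minimal sketch of failing early with a clearer message might look like this (get_token is a hypothetical helper, and the dictionaries passed in stand in for the real environment so the example runs anywhere):

```python
import os


def get_token(env=None, name='THE_ONE_API_ACCESS_TOKEN'):
    """Read the token from the environment, or raise a clear error if absent."""
    env = os.environ if env is None else env
    token = (env.get(name) or '').strip()
    if not token:
        raise RuntimeError(f'{name} is not set; add it to your .env file or environment')
    return token


# Demonstration with stand-in environments instead of the real one.
print(get_token({'THE_ONE_API_ACCESS_TOKEN': ' abc123 '}))  # abc123

try:
    get_token({})
except RuntimeError as error:
    print(error)
```

In the tutorial's flow you would call get_token() with no arguments after load_dotenv() has run.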
Now we need to use this token in our request to the API. If we look at The One API's documentation we see that our token needs to be provided using the format Authorization: Bearer your-api-key-123. We will do that by passing it as a dictionary to the headers argument of our get request. Now if we print out the response we get a 200 again, meaning that we are good. If you continue to get a 401 error, make sure that you are following the format detailed above (note that there is a space between Bearer and your API token).
response = requests.get(BASE_URL + MOVIE_ENDPOINT, headers={'Authorization': 'Bearer ' + THE_ONE_API_ACCESS_TOKEN})
print(response)

<Response [200]>
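Since the space after Bearer is easy to get wrong, it can help to build the header in one place. This is only a sketch, and bearer_headers is a hypothetical helper name rather than anything provided by requests:

```python
def bearer_headers(token):
    """Build the Authorization header in the 'Bearer <token>' format."""
    if not token:
        raise ValueError('an API token is required')
    # Stray whitespace around a copied token is a common cause of 401 errors.
    return {'Authorization': 'Bearer ' + token.strip()}


headers = bearer_headers('your-api-key-123')
print(headers)  # {'Authorization': 'Bearer your-api-key-123'}
```

The resulting dictionary can be passed straight to the request as headers=headers.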
Now we just need to print out all of our information again, but we don't know how it's stored yet. So let's print out the JSON of the response again to make sure it's the same as before, and if not, to see what we need to do.
json_object = response.json()
json_formatted = json.dumps(json_object, indent=4)
print(json_formatted)

{
    "docs": [
        {
            "_id": "5cd95395de30eff6ebccde56",
            "name": "The Lord of the Rings Series",
            "runtimeInMinutes": 558,
            "budgetInMillions": 281,
            "boxOfficeRevenueInMillions": 2917,
            "academyAwardNominations": 30,
            "academyAwardWins": 17,
            "rottenTomatesScore": 94
        },
        {
            "_id": "5cd95395de30eff6ebccde57",
            "name": "The Hobbit Series",
            "runtimeInMinutes": 462,
            "budgetInMillions": 675,
            "boxOfficeRevenueInMillions": 2932,
            "academyAwardNominations": 7,
            "academyAwardWins": 1,
            "rottenTomatesScore": 66.33333333
        },
        {
            "_id": "5cd95395de30eff6ebccde58",
            "name": "The Unexpected Journey",
            "runtimeInMinutes": 169,
            "budgetInMillions": 200,
            "boxOfficeRevenueInMillions": 1021,
            "academyAwardNominations": 3,
            "academyAwardWins": 1,
            "rottenTomatesScore": 64
        },
        {
            "_id": "5cd95395de30eff6ebccde59",
            "name": "The Desolation of Smaug",
            "runtimeInMinutes": 161,
            "budgetInMillions": 217,
            "boxOfficeRevenueInMillions": 958.4,
            "academyAwardNominations": 3,
            "academyAwardWins": 0,
            "rottenTomatesScore": 75
        },
        {
            "_id": "5cd95395de30eff6ebccde5a",
            "name": "The Battle of the Five Armies",
            "runtimeInMinutes": 144,
            "budgetInMillions": 250,
            "boxOfficeRevenueInMillions": 956,
            "academyAwardNominations": 1,
            "academyAwardWins": 0,
            "rottenTomatesScore": 60
        },
        {
            "_id": "5cd95395de30eff6ebccde5b",
            "name": "The Two Towers ",
            "runtimeInMinutes": 179,
            "budgetInMillions": 94,
            "boxOfficeRevenueInMillions": 926,
            "academyAwardNominations": 6,
            "academyAwardWins": 2,
            "rottenTomatesScore": 96
        },
        {
            "_id": "5cd95395de30eff6ebccde5c",
            "name": "The Fellowship of the Ring",
            "runtimeInMinutes": 178,
            "budgetInMillions": 93,
            "boxOfficeRevenueInMillions": 871.5,
            "academyAwardNominations": 13,
            "academyAwardWins": 4,
            "rottenTomatesScore": 91
        },
        {
            "_id": "5cd95395de30eff6ebccde5d",
            "name": "The Return of the King",
            "runtimeInMinutes": 201,
            "budgetInMillions": 94,
            "boxOfficeRevenueInMillions": 1120,
            "academyAwardNominations": 11,
            "academyAwardWins": 11,
            "rottenTomatesScore": 95
        }
    ],
    "total": 8,
    "limit": 1000,
    "offset": 0,
    "page": 1,
    "pages": 1
}
It looks like all of our data is still stored in the same format, but now it has a ton of extra information. Let's just pull out the names again for now, where we can use the same code as before.
LoTR_movie_data = json_object['docs']
for movie in LoTR_movie_data:
    print(movie['name'])

The Lord of the Rings Series
The Hobbit Series
The Unexpected Journey
The Desolation of Smaug
The Battle of the Five Armies
The Two Towers
The Fellowship of the Ring
The Return of the King
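The extra fields are reachable the same way as name. As one possible sketch, the hypothetical top_movies helper below ranks movies by any numeric field. It runs on a hard-coded subset of the payload above, and the field names (including the API's own spelling rottenTomatesScore) are copied from that output:

```python
# Three entries copied from the movie payload printed above.
sample_movies = [
    {'name': 'The Two Towers ', 'rottenTomatesScore': 96, 'academyAwardWins': 2},
    {'name': 'The Fellowship of the Ring', 'rottenTomatesScore': 91, 'academyAwardWins': 4},
    {'name': 'The Return of the King', 'rottenTomatesScore': 95, 'academyAwardWins': 11},
]


def top_movies(movies, field, n=3):
    """Return (name, value) pairs for the n movies with the highest field value."""
    with_field = [movie for movie in movies if field in movie]
    ranked = sorted(with_field, key=lambda movie: movie[field], reverse=True)
    # Some names in the API carry trailing whitespace, so strip them.
    return [(movie['name'].strip(), movie[field]) for movie in ranked[:n]]


print(top_movies(sample_movies, 'rottenTomatesScore', n=2))
print(top_movies(sample_movies, 'academyAwardWins', n=1))
```

With the live data you would pass LoTR_movie_data in place of sample_movies.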
Using the above method you can now get a whole bunch of information out of the API, so try to play around with it a bit to see what is in there.
In summary, we can use the requests library to access data from a REST API. If there is no authentication mechanism it is pretty straightforward; however, if there is an authentication mechanism (like a token) it takes a little more work. Below are the requests you need using both the unauthenticated API and the authenticated API. Note however that there can be a lot of variability in how to access APIs that require authentication. So take a look at the requests library docs and the API's docs if you run into issues.
Unauthenticated
import json
import requests

BASE_URL = 'https://the-one-api.dev/v2'
BOOK_ENDPOINT = '/book'

response = requests.get(BASE_URL + BOOK_ENDPOINT)
print(response)

json_object = response.json()
json_formatted = json.dumps(json_object, indent=4)
print(json_formatted)

<Response [200]>
{
    "docs": [
        {
            "_id": "5cf5805fb53e011a64671582",
            "name": "The Fellowship Of The Ring"
        },
        {
            "_id": "5cf58077b53e011a64671583",
            "name": "The Two Towers"
        },
        {
            "_id": "5cf58080b53e011a64671584",
            "name": "The Return Of The King"
        }
    ],
    "total": 3,
    "limit": 1000,
    "offset": 0,
    "page": 1,
    "pages": 1
}
Authenticated
import json
import requests
import os
from dotenv import load_dotenv

load_dotenv()

BASE_URL = 'https://the-one-api.dev/v2'
MOVIE_ENDPOINT = '/movie'

THE_ONE_API_ACCESS_TOKEN = os.getenv("THE_ONE_API_ACCESS_TOKEN")

response = requests.get(BASE_URL + MOVIE_ENDPOINT, headers={'Authorization': 'Bearer ' + THE_ONE_API_ACCESS_TOKEN})
print(response)

json_object = response.json()
json_formatted = json.dumps(json_object, indent=4)
print(json_formatted)

<Response [200]>
{
    "docs": [
        {
            "_id": "5cd95395de30eff6ebccde56",
            "name": "The Lord of the Rings Series",
            "runtimeInMinutes": 558,
            "budgetInMillions": 281,
            "boxOfficeRevenueInMillions": 2917,
            "academyAwardNominations": 30,
            "academyAwardWins": 17,
            "rottenTomatesScore": 94
        },
        {
            "_id": "5cd95395de30eff6ebccde57",
            "name": "The Hobbit Series",
            "runtimeInMinutes": 462,
            "budgetInMillions": 675,
            "boxOfficeRevenueInMillions": 2932,
            "academyAwardNominations": 7,
            "academyAwardWins": 1,
            "rottenTomatesScore": 66.33333333
        },
        {
            "_id": "5cd95395de30eff6ebccde58",
            "name": "The Unexpected Journey",
            "runtimeInMinutes": 169,
            "budgetInMillions": 200,
            "boxOfficeRevenueInMillions": 1021,
            "academyAwardNominations": 3,
            "academyAwardWins": 1,
            "rottenTomatesScore": 64
        },
        {
            "_id": "5cd95395de30eff6ebccde59",
            "name": "The Desolation of Smaug",
            "runtimeInMinutes": 161,
            "budgetInMillions": 217,
            "boxOfficeRevenueInMillions": 958.4,
            "academyAwardNominations": 3,
            "academyAwardWins": 0,
            "rottenTomatesScore": 75
        },
        {
            "_id": "5cd95395de30eff6ebccde5a",
            "name": "The Battle of the Five Armies",
            "runtimeInMinutes": 144,
            "budgetInMillions": 250,
            "boxOfficeRevenueInMillions": 956,
            "academyAwardNominations": 1,
            "academyAwardWins": 0,
            "rottenTomatesScore": 60
        },
        {
            "_id": "5cd95395de30eff6ebccde5b",
            "name": "The Two Towers ",
            "runtimeInMinutes": 179,
            "budgetInMillions": 94,
            "boxOfficeRevenueInMillions": 926,
            "academyAwardNominations": 6,
            "academyAwardWins": 2,
            "rottenTomatesScore": 96
        },
        {
            "_id": "5cd95395de30eff6ebccde5c",
            "name": "The Fellowship of the Ring",
            "runtimeInMinutes": 178,
            "budgetInMillions": 93,
            "boxOfficeRevenueInMillions": 871.5,
            "academyAwardNominations": 13,
            "academyAwardWins": 4,
            "rottenTomatesScore": 91
        },
        {
            "_id": "5cd95395de30eff6ebccde5d",
            "name": "The Return of the King",
            "runtimeInMinutes": 201,
            "budgetInMillions": 94,
            "boxOfficeRevenueInMillions": 1120,
            "academyAwardNominations": 11,
            "academyAwardWins": 11,
            "rottenTomatesScore": 95
        }
    ],
    "total": 8,
    "limit": 1000,
    "offset": 0,
    "page": 1,
    "pages": 1
}
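The two recipes above differ only in whether an Authorization header is sent, so they can be folded into one function. The sketch below is one way to do that, not the only one: fetch_docs is a hypothetical helper, and it takes the HTTP function as an argument so it can be exercised here with a stand-in (FakeResponse and fake_get are illustrative stubs) instead of a live call. In real use you would pass requests.get.

```python
def fetch_docs(http_get, endpoint, token=None, base_url='https://the-one-api.dev/v2'):
    """Fetch an endpoint and return its 'docs' list, adding a token if given."""
    headers = {'Authorization': 'Bearer ' + token} if token else {}
    response = http_get(base_url + endpoint, headers=headers)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
    return response.json()['docs']


class FakeResponse:
    """Stand-in exposing the two response methods fetch_docs relies on."""

    def __init__(self, payload):
        self._payload = payload

    def raise_for_status(self):
        pass

    def json(self):
        return self._payload


calls = []


def fake_get(url, headers=None):
    """Record the call and return a canned payload instead of using the network."""
    calls.append((url, headers))
    return FakeResponse({'docs': [{'name': 'The Two Towers'}]})


print(fetch_docs(fake_get, '/book'))
print(fetch_docs(fake_get, '/movie', token='your-api-key-123'))
print(calls)
```

Swapping fake_get for requests.get gives the same behavior as the two recipes, for example fetch_docs(requests.get, '/movie', token=THE_ONE_API_ACCESS_TOKEN).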