Where to eat? Where to go? Looking for an answer with a couple of recommendations and a Python script
Perhaps some of humanity’s best inventions are cool cafés, bars and restaurants: creative spaces where it is a pleasure to spend time alone, with friends or colleagues, or simply to work. People always want to spend time in a place with a nice atmosphere, delicious food or aromatic coffee. Since we truly love great places ourselves, we figured out how to find new ones that suit your tastes automatically.
What did we create?
So, suppose you are in a foreign city. You have a few recommendations from friends or just a few favourite cafés, but you want to explore the city and find new ones.
Our algorithm can quickly multiply this list several times over, supplementing it with five to ten additional recommendations of the same quality. Sounds great, right?
How did we do it?
We still haven’t figured out how to be magicians, so we resorted to an easier method: writing a Python script.
We start, as always, with preparation. The essential gear in our script is instagrapi, an unofficial Instagram API client.
from instagrapi import Client
import time
import pandas as pd
Then, we connect to the API to start processing the data.
cl = Client()
cl.login("username", "password")
Our task is implemented by two small scripts. The first collects geotags from Instagram and finds the people who have mentioned these geotags in their photos. We know we are not the only ones who like these places, so we assume these people share our tastes and values. We then collect all the recent geotags from their profiles, and this gives us a list of recommendations. The second script looks up the exact addresses of these places on GoogleMaps, so we can decide whether they are in an area we want to visit.
Collection and processing of data
Let’s begin! To get a list of recommendations, we need three suitable starting examples, for instance, coffee shops in Cyprus: Uluwatu, Bike & Bean and Rich Coffee Roasters. Our script accepts the geopoint identifiers of these places as input.
How to extract them?
1. Follow the link to the profile of the place, for example:
https://www.instagram.com/cultivoscoffeepaphos/
https://www.instagram.com/paulscoffeeroasters/
https://www.instagram.com/uluwatu_specialty_coffee/
2. Analyse the geotags in Instagram posts (we assume that the official profile contains the correct geotags).
3. Find a link to a geotag, for example:
https://www.instagram.com/explore/locations/677451402419656/uluwatu-specialty-coffee/
4. The numbers in the links are the geopoint identifiers (a small helper that extracts them is sketched right after this list):
115384716530057, 2053075441613042, 677451402419656
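If you would rather not copy the numbers by hand, a couple of lines can pull the identifier out of the link. This is a minimal sketch; location_pk_from_url is our own hypothetical helper, not part of instagrapi:

import re

def location_pk_from_url(url):
    # the numeric identifier sits right after /locations/ in the link
    match = re.search(r"/locations/(\d+)", url)
    if match is None:
        raise ValueError(f"no location ID found in {url}")
    return int(match.group(1))

print(location_pk_from_url(
    "https://www.instagram.com/explore/locations/677451402419656/uluwatu-specialty-coffee/"
))  # prints 677451402419656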
Once we have the geotags of the initial recommendations, we need to find the users who have visited these places, in other words, the people who have tagged them in their photos. To do this, we take the last 150 posts tagged with each café or restaurant’s geopoint.
Of course, these should be real users, not business accounts: business profiles often contain paid ad integrations, and we are not interested in those.
pk_place_ids = [115384716530057, 2053075441613042, 677451402419656]

print('Getting users started')
# find users who tagged these places
users = []
for place_id in pk_place_ids:
    # get the 150 most recent publications with the tag
    medias = cl.location_medias_recent(place_id, amount=150)
    for m in medias:
        user_id = m.dict()['user']['pk']
        if user_id not in users:
            users.append(user_id)
count_users = len(users)
print(f'Getting {count_users} users finished')

print('Getting not business users started')
# check whether users have business accounts, to eliminate them
users_not_business = []
for u in users:
    try:
        u_info = cl.user_info(u).dict()
        if not u_info['is_business']:
            users_not_business.append(u)
    except Exception:
        # skip profiles that cannot be fetched (private, deleted, rate limited)
        continue
count_nb_users = len(users_not_business)
print(f'Getting {count_nb_users} not business users finished')
We found people who tagged these places on their profiles. Now we collect all the places tagged in these profiles and display them in a list, sorted by the number of mentions.
print('Getting location started')
# get locations from the profiles of people who share our tastes
locations = {}
for u in users_not_business:
    # the script is quite slow, so we only analyse the last 100 posts of each user
    # posts are received in 20 portions (5 posts in each)
    end_cursor = None
    for page in range(20):
        u_medias, end_cursor = cl.user_medias_paginated(u, 5, end_cursor=end_cursor)
        for m in u_medias:
            # exceptions are processed individually
            try:
                # delay to reduce the request rate on Instagram
                time.sleep(1)
                # by the post ID we get the post data
                # (it contains the name of the place, but no coordinates)
                info = cl.media_info(m.dict()['pk']).dict()
                if info.get('location'):
                    loc_key = info['location']['pk']
                    # if we see the place for the first time, look up its coordinates
                    if loc_key not in locations:
                        # take the latest post with this geotag to get the coordinates
                        loc_data = cl.location_medias_recent(loc_key, amount=1)[0].dict()
                        lng = ''
                        lat = ''
                        if loc_data.get('location'):
                            lng = loc_data['location']['lng']
                            lat = loc_data['location']['lat']
                        locations[loc_key] = [info['location']['name'], 1, lng, lat]
                    else:
                        locations[loc_key][1] += 1
                # save the intermediate result to a csv file after every post,
                # so a crash does not lose the data collected so far
                df = pd.DataFrame({
                    'id': list(locations),
                    'name': [v[0] for v in locations.values()],
                    'visits': [v[1] for v in locations.values()],
                    'lng': [v[2] for v in locations.values()],
                    'lat': [v[3] for v in locations.values()],
                })
                df.sort_values('visits', ascending=False).to_csv('places.csv', index=False)
            except Exception:
                # skip posts that cannot be fetched
                continue

count_locations = len(locations)
print(f'Getting {count_locations} locations finished')
The second script enriches the new recommendations with data from GoogleMaps: the exact address, website and category of each place.
# get categories from the directory of organisations of the GoogleMaps service
import googlemaps

# the API key may be found in the Google API console
api_key = "---"
maps_api = googlemaps.Client(key=api_key)

addrs = []
urls = []
cats = []
# the output of the first script, after the manual clean-up described below
df = pd.read_csv("places_cyprus.csv")
for i, row in df.iterrows():
    # delay to stay within the request rate limits
    time.sleep(1)
    lng = row['lng']
    lat = row['lat']
    name = row['name']
    print(name)
    # find an object by name, located near a geopoint
    # search parameters: https://developers.google.com/places/web-service/search#FindPlaceRequests
    place_search = maps_api.find_place(
        name,  # the name of the object
        "textquery",
        fields=["formatted_address", "place_id"],
        location_bias=f"point:{lat},{lng}",  # the geopoint near which to search
        language="en-US"
    )
    addr = ''
    url = ''
    cat = ''
    # wrapped in exception handling: the search may return no candidates
    try:
        # for the first candidate, request additional details (website and category);
        # note that the request field is "type", while the response key is "types"
        place_lookup = maps_api.place(
            place_search["candidates"][0]["place_id"], fields=["website", "type"])
        addr = place_search["candidates"][0].get("formatted_address", '')
        url = place_lookup["result"].get("website", '')
        cat = place_lookup["result"].get("types", '')
    except Exception:
        pass
    addrs.append(addr)
    urls.append(url)
    cats.append(cat)

df['address'] = addrs
df['url'] = urls
df['cat'] = cats
df.to_csv('places_cyprus_1.csv', index=False)
Results
Finding delicious coffee in Cyprus
We tested the script on our three favourite cafés in Cyprus. The output is a ranked list of new coffee spots, with the most frequently visited places at the top.
Script limitations
- The data collection script is not fast enough (about 100 geopoints per hour).
- Geopoints can have multiple duplicates: there are geopoints with the same name but different coordinates. To be honest, geotag data is quite chaotic, so a lot of manual work remains when processing the results (a rough clean-up sketch follows this list).
- The database of organisations is very large, but most objects have not one but up to three categories filled in: a café can have bar as its first category, café as its second and restaurant as its third. In general, a lot has to be checked manually.
- We did not succeed in combining the two scripts, so for now the workflow has two stages:
– The first stage retrieves the data from Instagram (name, visit frequency and coordinates); unnecessary objects and duplicates are then dropped semi-automatically.
– The second stage gets categories, addresses and websites from GoogleMaps; the categories are then checked manually to choose the most accurate one.
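To give an idea of the semi-automatic clean-up in the first stage, here is a rough pandas sketch (our own post-processing idea, not part of the scripts above). It merges geopoints that share the same name, sums their visit counts and keeps the coordinates of the most-visited duplicate:

import pandas as pd

df = pd.read_csv('places.csv')

# normalise names so that 'Uluwatu ' and 'uluwatu' count as one place
df['name_norm'] = df['name'].str.strip().str.lower()
# after sorting by visits, 'first' picks the most-visited duplicate in each group
merged = (
    df.sort_values('visits', ascending=False)
      .groupby('name_norm', as_index=False)
      .agg({'name': 'first', 'visits': 'sum', 'lng': 'first', 'lat': 'first'})
      .sort_values('visits', ascending=False)
      .drop(columns='name_norm')
)
merged.to_csv('places_deduped.csv', index=False)

Geopoints with the same name but genuinely different coordinates still need a manual look, which is exactly the manual work mentioned above.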
Interested? Learn more!
Valiotti Analytics is a consulting company specializing in data analytics and building modern data stacks. We typically start our projects with a company audit and then focus on building the data stack, setting up analytics engineering processes, creating effective dashboards and reporting, or providing advanced analytics services. A team of professionals is ready to bring vast experience and skills to your project.
If you’re looking for help building out your analytics with a modern data stack, contact us!