Taxi

Helpers

import os
import pandas as pd
import requests
from dotenv import load_dotenv

load_dotenv()
API_KEY = os.environ["API_KEY"]
URL_QUERY = "https://%s.crucible.dreadnode.io/score"
URL_FLAG = "https://crucible.dreadnode.io/api/challenges/%s/submit-flag"
CHALLENGE = "taxi"

def query(data):
    # POST an input to the challenge's scoring endpoint and return its JSON reply.
    response = requests.post(
        URL_QUERY % CHALLENGE,
        headers={ "X-API-Key": API_KEY },
        json={ "data": data }
    )
    return response.json()

def submit(flag):
    # Submit a candidate flag; returns the "correct" field, or False on an HTTP error.
    response = requests.post(
        URL_FLAG % CHALLENGE,
        headers={ "X-API-Key": API_KEY },
        json={ "challenge": CHALLENGE, "flag": flag }
    )
    return False if response.status_code != 200 else response.json().get("correct")

Solution

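Examining the columns of the Parquet file, we can see that the distribution of values in the pickup_location column stands out: twenty locations each appear between 215 and 298 times, while six appear exactly once. Two of those six, 0mega Mall and 0pera House, are spelled with a zero in place of the letter O.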

df = pd.read_parquet("./data/taxi.parquet")
df["pickup_location"].value_counts()

pickup_location
Financial District    298
Industrial Park       286
Train Station         274
Beach Front           272
University            272
Shopping Mall         259
Business District     256
Airport               252
Historic Center       250
Convention Center     250
Entertainment Zone    245
Sports Complex        243
Downtown              242
Theater District      237
Hotel Zone            234
Restaurant Row        232
Arts District         231
Residential Area      225
Tech Hub              221
Hospital              215
Grand Central           1
Railway Station         1
Library                 1
North Station           1
0mega Mall              1
0pera House             1
Name: count, dtype: int64
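Rather than hardcoding nsmallest(6), the outliers can also be picked out by keeping only the labels that occur exactly once, which does not require knowing their number in advance. This is a minimal sketch on a small hypothetical DataFrame standing in for taxi.parquet; only the pickup_location column name comes from the real data.

```python
import pandas as pd

# Hypothetical toy data: two common locations and two that appear only once.
df = pd.DataFrame({
    "pickup_location": ["Airport", "Airport", "Downtown", "Library",
                        "Downtown", "0mega Mall", "Airport"],
})

counts = df["pickup_location"].value_counts()

# Keep only the labels whose count is exactly 1.
singletons = counts[counts == 1].index

# Labels starting with a digit ("0mega", "0pera") hint at tampered rows.
lookalikes = [s for s in singletons if s[0].isdigit()]

print(sorted(singletons))
print(lookalikes)
```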

Looking at the rows for those six pickup locations, we can see that they all share the same signal_north and signal_south values: 85.0 and 15.0, respectively.

col_pickup = "pickup_location"

df[df[col_pickup].isin(df[col_pickup].value_counts().nsmallest(6).index.values)]

	ride_id	pickup_time	pickup_location	dropoff_location	driver_id	passenger_count	fare_amount	tip_amount	payment_type	rating	ride_duration_minutes	dropoff_time	signal_north	signal_south
0	1	2024-01-01 00:00:00	Library	Tech Hub	63	3	17.790	12.52	Cash	1	80	2024-01-01 01:20:00	85.0	15.0
600	601	2024-01-03 02:00:00	0mega Mall	Restaurant Row	58	3	99.030	19.16	Cash	4	19	2024-01-03 02:19:00	85.0	15.0
1200	1201	2024-01-05 04:00:00	North Station	Theater District	49	4	73.275	14.76	Cash	5	68	2024-01-05 04:30:00	85.0	15.0
1800	1801	2024-01-07 06:00:00	Grand Central	Entertainment Zone	43	4	56.350	13.61	Credit Card	5	105	2024-01-07 07:45:00	85.0	15.0
2400	2401	2024-01-09 08:00:00	Railway Station	Tech Hub	51	2	52.860	9.15	Cash	5	5	2024-01-09 08:05:00	85.0	15.0
3000	3001	2024-01-11 10:00:00	0pera House	Tech Hub	24	4	57.460	19.95	Mobile Payment	4	80	2024-01-11 11:20:00	85.0	15.0
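The observation that these rows share one signal pair can also be checked in code: if nunique() is 1 for both signal columns within the outlier rows, they all carry the same values. This sketch uses hypothetical toy rows (the Airport row and its signal values are made up) and assumes the same column names as the real file.

```python
import pandas as pd

# Hypothetical toy rows: two "marker" rows share a signal pair, one ordinary row does not.
df = pd.DataFrame({
    "pickup_location": ["Library", "Airport", "0mega Mall"],
    "signal_north": [85.0, 42.3, 85.0],
    "signal_south": [15.0, 61.7, 15.0],
})

signal_cols = ["signal_north", "signal_south"]
marker_rows = df[df["pickup_location"].isin(["Library", "0mega Mall"])]

# nunique() == 1 in every column means all marker rows agree on that column.
shared = bool((marker_rows[signal_cols].nunique() == 1).all())

# The shared (north, south) pair, read from the first marker row as plain floats.
pair = tuple(float(v) for v in marker_rows[signal_cols].iloc[0])

print(shared, pair)
```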

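If we filter for all rows with this signal pair, we get eight rows: the six singleton locations plus two more rides. Joining the first letters of their pickup_location values, in row order, gives the message L0NGR0AD; sending that to the scoring endpoint returns the actual flag, which we then submit.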

col_snorth = "signal_north"
col_ssouth = "signal_south"

flag = "".join(map(
    lambda s: s[0],
    df[(df[col_snorth] == 85.0) & (df[col_ssouth] == 15.0)][col_pickup].values
))

print(flag)
print("Flag accepted:", submit(query(flag)["flag"]))

L0NGR0AD
Flag accepted: True
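The filter-and-join step can also be written with pandas' vectorized .str[0] instead of map with a lambda. decode_first_letters is a hypothetical helper, and the toy DataFrame below is illustrative data, not rows from taxi.parquet.

```python
import pandas as pd

def decode_first_letters(df, north=85.0, south=15.0, col="pickup_location"):
    """Join the first character of `col` for rows matching the signal pair.

    Hypothetical helper; the defaults mirror the values found above.
    Rows are read in the DataFrame's existing order.
    """
    mask = (df["signal_north"] == north) & (df["signal_south"] == south)
    # .str[0] takes the first character of every string in the selected column.
    return "".join(df.loc[mask, col].str[0])

toy = pd.DataFrame({
    "pickup_location": ["Library", "Airport", "0pera House", "Downtown"],
    "signal_north": [85.0, 10.0, 85.0, 85.0],
    "signal_south": [15.0, 90.0, 15.0, 15.0],
})

print(decode_first_letters(toy))
```

Like the original code, this relies on the file's row order; sorting by a column such as ride_id before decoding would be one way to make that ordering explicit.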