Got it 👍 You want a realistic end-to-end project that covers:

  1. Fetch a messy dataset from a free public API (JSON) or a downloadable CSV/Excel file.
  2. Clean the data → focus on string cleaning & manipulation (interview-useful).
  3. Store / serve the cleaned data.
  4. Build a dashboard (Flask / FastAPI recommended) → interactive tables/graphs.
  5. Production-grade → modular, structured, easy to run locally.

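🔹 Suggested Project: “World Population & Country Data Dashboard”

Data source: the REST Countries API (https://restcountries.com), which is free, needs no API key, and returns nested JSON per country (names, region, capitals as a list, population, area).
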
🔹 Project Workflow

Step 1: Fetch Data

  • Use requests to call the API.
  • Flatten the JSON response into a pandas DataFrame (pd.json_normalize).
  • Alternatively, download a CSV file and load it with pd.read_csv.
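A minimal sketch of Step 1, assuming the REST Countries v3.1 endpoint and assuming its /all route expects an explicit `fields` filter (so only the columns the dashboard needs are requested). The names `fetch_countries`, `records_to_frame`, and `csv_text_to_frame` are hypothetical; the network call is kept separate from parsing so the parsing can be tested offline.

```python
import io

import pandas as pd
import requests

# Assumption: REST Countries v3.1, where /all wants a `fields` filter.
API_URL = "https://restcountries.com/v3.1/all"
FIELDS = "name,region,capital,population,area,continents"


def fetch_countries(url: str = API_URL, timeout: float = 10.0) -> list:
    """Call the API and return the raw JSON list (requires network access)."""
    res = requests.get(url, params={"fields": FIELDS}, timeout=timeout)
    res.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body
    return res.json()


def records_to_frame(records: list) -> pd.DataFrame:
    """Flatten nested JSON such as {"name": {"common": ...}} into dotted columns."""
    return pd.json_normalize(records)


def csv_text_to_frame(text: str) -> pd.DataFrame:
    """Alternative path: parse CSV text that was downloaded from the web."""
    return pd.read_csv(io.StringIO(text))
```

Usage: `records_to_frame(fetch_countries())` yields columns such as `name.common`, `region`, and `capital` (still a list per row), ready for cleaning.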

Step 2: Data Cleaning (String Manipulation)

  • Standardize country names (strip spaces, title case).
  • Handle missing values (fillna, dropna).
  • Extract numeric parts from messy fields (e.g., population, area).
  • Split / join list fields (e.g., capitals, which the API returns as lists).
  • Create derived columns (continent short codes, name lengths).
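The cleaning steps above could be collected into one pure function. This is a sketch, assuming the flattened columns `name.common`, `region`, `population`, `capital`, and `continents`; the function name `clean_countries` and the `CONTINENT_CODES` mapping are hypothetical. The population regex shows how to pull a number out of messy text like "331,000,000 people".

```python
import pandas as pd

# Hypothetical lookup used for the "continent short codes" derived column.
CONTINENT_CODES = {
    "Africa": "AF", "Antarctica": "AN", "Asia": "AS", "Europe": "EU",
    "North America": "NA", "Oceania": "OC", "South America": "SA",
}


def clean_countries(df: pd.DataFrame) -> pd.DataFrame:
    out = pd.DataFrame(index=df.index)
    # 1. Standardize names: trim, collapse inner whitespace, title-case.
    out["name"] = (
        df["name.common"].astype("string").str.strip()
        .str.replace(r"\s+", " ", regex=True).str.title()
    )
    # 2. Missing values: label unknown regions instead of dropping the row.
    out["region"] = df["region"].fillna("Unknown").str.strip()
    # 3. Extract the numeric part: drop thousands separators, keep the first number.
    digits = (
        df["population"].astype(str).str.replace(",", "", regex=False)
        .str.extract(r"(\d+(?:\.\d+)?)", expand=False)
    )
    out["population"] = pd.to_numeric(digits, errors="coerce")
    # 4. Join list-valued fields into display strings.
    out["capital"] = df["capital"].apply(
        lambda c: ", ".join(c) if isinstance(c, list) and c else "Unknown"
    )
    # 5. Derived columns: continent short code and name length.
    out["continent_code"] = df["continents"].apply(
        lambda c: CONTINENT_CODES.get(c[0], "??") if isinstance(c, list) and c else "??"
    )
    out["name_length"] = out["name"].str.len()
    # Rows without a usable name cannot be searched or displayed.
    return out.dropna(subset=["name"])
```

One interview-worthy gotcha: `str.title()` also capitalizes small words and letters after apostrophes ("Bosnia And Herzegovina", "Côte D'Ivoire"), so a real project might keep the API's spelling and only trim whitespace.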

Step 3: Store Clean Data

  • Save the cleaned data to SQLite (a single file with no database server, so it runs easily on a laptop).
  • Or keep as cleaned CSV/Parquet.
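For Step 3, pandas can write straight to SQLite through the standard-library `sqlite3` module, so SQLAlchemy is optional at this size. A sketch with hypothetical helper names (`save_countries`, `load_countries`, `region_totals`); the table name is interpolated into SQL, so it must be a trusted constant, never user input.

```python
import sqlite3

import pandas as pd


def save_countries(df: pd.DataFrame, conn: sqlite3.Connection, table: str = "countries") -> int:
    """Replace the table with the cleaned frame and return the row count."""
    df.to_sql(table, conn, if_exists="replace", index=False)
    conn.commit()
    return len(df)


def load_countries(conn: sqlite3.Connection, table: str = "countries") -> pd.DataFrame:
    return pd.read_sql_query(f'SELECT * FROM "{table}"', conn)


def region_totals(conn: sqlite3.Connection, table: str = "countries") -> pd.DataFrame:
    """Let SQLite do the aggregation that the summary/chart pages need."""
    return pd.read_sql_query(
        f'SELECT region, SUM(population) AS total_population FROM "{table}" '
        "GROUP BY region ORDER BY total_population DESC",
        conn,
    )
```

Usage: open one file-backed connection, e.g. `conn = sqlite3.connect("countries.db")`, call `save_countries(clean_df, conn)` after cleaning, and read from it in the Flask routes.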

Step 4: Build Dashboard

  • Use Flask (simpler; used in the sample below) or FastAPI (recommended), with Jinja2 templates and Bootstrap for styling.
  • Pages:
    • Home → Summary stats (population, area).
    • Search → Query countries.
    • Charts → Plot population by continent (using Plotly/Matplotlib).
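One way to build the Charts page without a server-side plotting dependency: aggregate with pandas and hand a Plotly.js trace to the browser as JSON. This swaps in Plotly.js loaded from a CDN (the pinned version in `PLOTLY_CDN` is an assumption) instead of the `plotly` Python package; `population_by_region` and `chart_html` are hypothetical names that assume `region` and `population` columns.

```python
import json

import pandas as pd

PLOTLY_CDN = "https://cdn.plot.ly/plotly-2.35.2.min.js"  # assumed pinned version


def population_by_region(df: pd.DataFrame) -> dict:
    """Sum population per region and return a Plotly.js bar trace."""
    totals = (
        df.groupby("region", as_index=False)["population"].sum()
        .sort_values("population", ascending=False)
    )
    return {
        "type": "bar",
        "x": totals["region"].tolist(),
        "y": [int(v) for v in totals["population"]],  # plain ints are JSON-safe
    }


def chart_html(df: pd.DataFrame, div_id: str = "region-chart") -> str:
    """HTML snippet that draws the chart client-side."""
    trace = population_by_region(df)
    layout = {"title": {"text": "Population by region"}}
    return (
        f'<script src="{PLOTLY_CDN}"></script>\n'
        f'<div id="{div_id}"></div>\n'
        f"<script>Plotly.newPlot({json.dumps(div_id)}, "
        f"[{json.dumps(trace)}], {json.dumps(layout)});</script>"
    )
```

In a Flask route, pass `chart=chart_html(df)` to a template and render it with `{{ chart | safe }}`. If you prefer the Python package from requirements.txt, `plotly.express.bar(...)` followed by `fig.to_html(full_html=False)` produces a comparable snippet.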

Step 5: Productionize

Modular structure:

string_project/
├── app.py            # Flask app
├── data_fetch.py     # API/CSV fetcher
├── data_clean.py     # String cleaning functions
├── models.py         # SQLite DB helper
├── static/           # CSS/JS
├── templates/        # Jinja2 HTML templates
└── requirements.txt

requirements.txt:
flask
pandas
requests
plotly
sqlalchemy
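With this layout, `models.py` could also act as a cache so that page loads read from SQLite instead of calling the API each time. A sketch under stated assumptions: a 24-hour freshness window, a `meta` table holding the last fetch time, and a caller-supplied `refresh` function that fetches and cleans. All names are hypothetical; `now` is injectable so the expiry logic can be tested without waiting.

```python
import sqlite3
import time
from typing import Callable

import pandas as pd

MAX_AGE_SECONDS = 24 * 60 * 60  # assumption: country data changes rarely


def get_countries(
    conn: sqlite3.Connection,
    refresh: Callable[[], pd.DataFrame],
    max_age: float = MAX_AGE_SECONDS,
    now: Callable[[], float] = time.time,
) -> pd.DataFrame:
    """Return cached rows if fresh, otherwise call refresh() and store the result."""
    conn.execute("CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value REAL)")
    row = conn.execute("SELECT value FROM meta WHERE key = 'fetched_at'").fetchone()
    if row is not None and now() - row[0] < max_age:
        return pd.read_sql_query("SELECT * FROM countries", conn)

    df = refresh()  # e.g. fetch + clean; only runs when the cache is missing or stale
    df.to_sql("countries", conn, if_exists="replace", index=False)
    conn.execute(
        "INSERT OR REPLACE INTO meta (key, value) VALUES ('fetched_at', ?)", (now(),)
    )
    conn.commit()
    return df
```

The design choice is to keep "when to refetch" in one place: routes just call `get_countries(...)`, and the API is only contacted when the stored copy is older than `max_age`.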


🔹 Sample End-to-End Script (minimal but extendable)

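# app.py
from functools import lru_cache

import pandas as pd
import requests
from flask import Flask, render_template, request

app = Flask(__name__)

# REST Countries v3.1: the /all endpoint expects a `fields` filter
DATA_URL = "https://restcountries.com/v3.1/all"
FIELDS = "name,region,capital,population,area"


def join_list(value):
    """Turn list fields such as ['Washington, D.C.'] into plain strings."""
    if isinstance(value, list) and value:
        return ", ".join(value)
    return "Unknown"


@lru_cache(maxsize=1)  # fetch once per process instead of on every request
def fetch_and_clean():
    # Step 1: Fetch (timeout + status check so API failures surface clearly)
    res = requests.get(DATA_URL, params={"fields": FIELDS}, timeout=10)
    res.raise_for_status()
    countries = res.json()

    # Step 2: Flatten nested JSON ({"name": {"common": ...}}) into columns
    df = pd.json_normalize(countries)

    # Step 3: String cleaning
    # Note: .title() also capitalizes "and" (e.g. "Bosnia And Herzegovina")
    df["name"] = df["name.common"].str.strip().str.title()
    df["region"] = df["region"].fillna("Unknown").str.upper()
    df["capital"] = df["capital"].apply(join_list)  # lists -> "A, B"; missing -> "Unknown"

    # Derived column
    df["name_length"] = df["name"].str.len()

    return df[["name", "region", "capital", "population", "area", "name_length"]]


@app.route("/")
def home():
    df = fetch_and_clean()
    summary = {
        "total_countries": len(df),
        "total_population": f"{int(df['population'].sum()):,}",  # e.g. 7,900,000,000
        "largest_country": df.loc[df["area"].idxmax(), "name"],
    }
    return render_template("home.html", summary=summary)


@app.route("/countries")
def countries():
    query = request.args.get("q", "").strip()
    df = fetch_and_clean()
    if query:
        # regex=False: match user input literally, so "(" cannot raise re.error
        df = df[df["name"].str.contains(query, case=False, regex=False)]
    # to_html escapes cell text by default, which is what makes `| safe` acceptable
    table = df.to_html(classes="table table-striped", index=False)
    return render_template("countries.html", tables=table, query=query)


if __name__ == "__main__":
    app.run(debug=True)  # debug mode is for local development only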
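The search logic can also live in a pure function, which makes it testable without starting Flask and adds simple paging for the table. A sketch with a hypothetical `search_countries` helper; `column` defaults to `name`, so pass `column="name.common"` if you keep the original column name. `regex=False` treats the query literally, and `na=False` treats missing names as non-matches.

```python
import pandas as pd


def search_countries(
    df: pd.DataFrame, query: str, page: int = 1, per_page: int = 25, column: str = "name"
) -> tuple:
    """Return (rows for the requested page, total number of matches)."""
    query = query.strip()
    if query:
        # Literal, case-insensitive substring match; None/NaN names never match.
        mask = df[column].str.contains(query, case=False, regex=False, na=False)
        df = df[mask]
    total = len(df)
    page = max(page, 1)  # guard against page=0 or negative values from the URL
    start = (page - 1) * per_page
    return df.iloc[start:start + per_page], total
```

Usage inside the route: `rows, total = search_countries(df, request.args.get("q", ""), page=request.args.get("page", 1, type=int))`, then render `rows.to_html(...)` and use `total` to build next/previous links.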

🔹 Example Templates (Jinja2)

templates/home.html

<!DOCTYPE html>
<html>
<head>
  <title>Country Dashboard</title>
  <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css">
</head>
<body class="container mt-4">
  <h1>🌍 Country Dashboard</h1>
  <p>Total Countries: {{ summary.total_countries }}</p>
  <p>Total Population: {{ summary.total_population }}</p>
  <p>Largest Country (Area): {{ summary.largest_country }}</p>
  <a href="/countries">Browse Countries</a>
</body>
</html>

templates/countries.html

<!DOCTYPE html>
<html>
<head>
  <title>Countries</title>
  <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css">
</head>
<body class="container mt-4">
  <h2>Countries</h2>
  <form method="get">
    <input type="text" name="q" value="{{ query }}" placeholder="Search by name">
    <button type="submit" class="btn btn-primary btn-sm">Search</button>
  </form>
  <div class="mt-3">
    {{ tables | safe }}
  </div>
</body>
</html>