Fully automated bluesky commenting system on mkdocs

Background

This post is a continuation of the previous post and originally outlined the steps taken to achieve the following flow. However, a recent announcement ruled out Bluesky search as a means of automation, so I had to modify the flow:

The modified flow that I applied is shown below. I will update this post in due course to document the code changes, but you can also see the code here.

Once the steps in the previous post are completed, the following additional steps ensure that a new post triggers the creation of a Bluesky post, which in turn enables Bluesky comments on that post on the site.

Create environment secrets

Open Site Repo.

Visit your site repo on github.
Click "Settings".
Click "Secrets and variables".
Click "Actions".
Click the "Secrets" tab.
Click "New repository secret".
Add Password Secret.

In the Name field, enter BSKY_APP_PWD; in the Value field, enter the application password created on the Bluesky site, then click Save.
Add Handle Secret.

In the Name field, enter BSKY_HANDLE; in the Value field, enter the email address used to register the Bluesky account that will be used for the commenting system.

Python Script

Assumptions

The Python script assumes that the social plugin of Material for MkDocs is enabled. It relies on the social cards created by that plugin. If the plugin is not enabled, enable it in mkdocs.yml as per the Material for MkDocs documentation.
It also assumes that the front matter explicitly specifies the slug for the post.
The Python script does not check whether bsky: true is present in the metadata; it assumes that every new blog post warrants the creation of a Bluesky post.
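If you would rather opt in per post than post for every new file, a check along these lines could be added before posting. This is only a sketch under assumptions: `should_post` is a hypothetical helper that is not part of post_deploy.py, and it assumes the front matter has already been parsed into a dict (as `yaml.safe_load` returns) with an optional `bsky` key.

```python
def should_post(frontmatter):
    """Return True only when the front matter explicitly sets bsky: true.

    Hypothetical helper, not part of post_deploy.py. `frontmatter` is
    assumed to be the dict parsed from the post's YAML front matter.
    """
    if not isinstance(frontmatter, dict):
        # Empty or malformed front matter parses to None or a non-dict value.
        return False
    # YAML parses a bare `true` as the Python bool True; anything else
    # (missing key, False, the string "true") is treated as opt-out here.
    return frontmatter.get("bsky") is True


# Example: only the first post would be announced on Bluesky.
print(should_post({"slug": "my-post", "bsky": True}))
print(should_post({"slug": "another-post"}))
```

Treating a missing key as opt-out is a design choice of this sketch; the script in this post does the opposite and posts for every new file.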

In the root of the site repository create a new file named post_deploy.py and paste the following code:

./post_deploy.py
# post_deploy.py
import re
import os
import yaml
import datetime
from atproto import Client, IdResolver, models #(1)
import requests

def get_yaml_frontmatter(path,access_token,at_client,image_directory,site_url):
    # Regex to match YAML front matter
    yaml_regex = re.compile(r'^(---\n.*?\n---\n)', re.DOTALL)

    # Check if the path is a directory or a file
    if os.path.isdir(path):
        # If it's a directory, process all .md files
        for filename in os.listdir(path):
            if filename.endswith('.md'):
                file_path = os.path.join(path, filename)
                process_file_yaml(file_path, yaml_regex,access_token,at_client,image_directory,site_url)
    elif os.path.isfile(path) and path.endswith('.md'):
        # If it's a single .md file, process it
        process_file_yaml(path, yaml_regex,access_token,at_client,image_directory,site_url)
    else:
        print("Provided path is neither a valid directory nor a .md file.")

def process_file_yaml(file_path, yaml_regex,access_token,at_client,image_directory,site_url):
    with open(file_path, 'r', encoding='utf-8') as file:
        content = file.read()
    description_value = ""
    url = ""
    title_value = ""

    # Find YAML front matter
    match = yaml_regex.search(content)
    if match:
        frontmatter = match.group(1)
        # Parse the existing YAML front matter
        frontmatter_content = frontmatter.split('---')[1].strip()
        frontmatter_dict = yaml.safe_load(frontmatter_content)
        for key, value in frontmatter_dict.items():
            if key == 'date':
                created_date = value['created']
            if key == 'slug':
                slug_value = value
            if key == 'title':
                title_value = value
            if key == 'description':
                description_value = value

        yyyy = created_date.year
        mm = f"{created_date.month:02}"
        dd = f"{created_date.day:02}"

        url = f"{site_url}/{yyyy}/{mm}/{dd}/{slug_value}.html"
        image_path = f"{image_directory}/{file_path.split('/')[-1].split('.')[0]}.png"

        ####################################################################
        #### skip posting if created date is more than 5 days old###########
        ####################################################################
        created_date_str = f"{created_date}"
        # Convert the created_date string to a datetime object
        created_date = datetime.datetime.fromisoformat(created_date_str)
        # Get the current date
        current_date = datetime.datetime.now()
        # Calculate the difference in days
        difference = (current_date - created_date).days
        if difference <= 5: #(2)
            print(f"created_date: {created_date} and slug_value: {slug_value}")
            print(f"url: {site_url}/{yyyy}/{mm}/{dd}/{slug_value}.html")
            print(f"img_path: {image_directory}/{file_path.split('/')[-1].split('.')[0]}.png")
            #####################################################################################
            ################### skip posting if url is already posted on bluesky#################
            #####################################################################################

            search_params = models.app.bsky.feed.search_posts.Params(
                    q= url,
                    author=at_client.me.did,
                    limit=1,
                    sort='oldest'
                )

            response = at_client.app.bsky.feed.search_posts(params=search_params)
            if response.posts:
                print("BSKY POST ALREADY EXISTS, NO ACTION NEEDED")
            else:
                # Open the image file in binary mode
                with open(image_path, 'rb') as img_file:
                    # Read the content of the image file
                    img_data = img_file.read()

                blob_resp = requests.post(
                    "https://bsky.social/xrpc/com.atproto.repo.uploadBlob",
                    headers={
                        "Content-Type": "image/png",
                        "Authorization": "Bearer " + access_token,
                    },
                    data=img_data,
                )
                blob_resp.raise_for_status()
                card = {
                "uri": url,
                "title": title_value,
                "description": description_value,
                "thumb": blob_resp.json()["blob"]
                }

                embed_post = {
                "$type": "app.bsky.embed.external",
                "external": card,
                }

                text = 'Check out the latest post on my blog.'
                post_with_link_card_from_website = at_client.send_post(text=text, embed=embed_post)
                print(post_with_link_card_from_website.uri)
    else:
        print(f"No YAML front matter found in: {file_path}")

def main():
    BLUESKY_HANDLE = os.environ.get('BSKY_HANDLE') #(3)
    BLUESKY_APP_PASSWORD = os.environ.get('BSKY_APP_PWD') #(4)
    # Make sure the environment variables are set
    if not BLUESKY_HANDLE or not BLUESKY_APP_PASSWORD:
        raise ValueError("Environment variables BSKY_HANDLE and BSKY_APP_PWD must be set.")
    else:
        at_client = Client()
        at_client.login(BLUESKY_HANDLE, BLUESKY_APP_PASSWORD)
        resp = requests.post(
            "https://bsky.social/xrpc/com.atproto.server.createSession",
            json={"identifier": BLUESKY_HANDLE, "password": BLUESKY_APP_PASSWORD},
        )
        resp.raise_for_status()
        session = resp.json()
        access_token = session["accessJwt"]
        path = 'docs/posts' #(5)
        image_directory = os.path.join(os.environ['GITHUB_WORKSPACE'], 'site','assets','images','social','posts')
        site_url = os.environ['SITE_URL']
        get_yaml_frontmatter(path,access_token, at_client,image_directory,site_url)

if __name__ == "__main__":
    main()

(1) Make sure the atproto package is included in requirements.txt.
(2) This can be changed to any other integer. It ensures that only posts created in the last 5 days are checked; if changed to 10, posts from the last 10 days are checked, and so on.
(3) Ensure the spelling of BSKY_HANDLE here matches the environment secret created in the previous step.
(4) Ensure the spelling of BSKY_APP_PWD here matches the environment secret created in the previous step.
(5) Make sure the path reflects the location of the .md files where commenting is to be enabled.
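To make the age window easier to reason about, the check can be pulled into a small function. This is a minimal sketch, not the code from the script: `is_recent`, `max_age_days` and `today` are names introduced here for illustration, and it compares whole calendar dates instead of converting through an ISO string.

```python
import datetime


def is_recent(created, max_age_days=5, today=None):
    """Return True if `created` is at most `max_age_days` days before `today`.

    Hypothetical helper mirroring the `difference <= 5` check in the script.
    `created` may be a date or a datetime, since YAML can yield either.
    """
    # datetime is a subclass of date, so test for datetime first.
    if isinstance(created, datetime.datetime):
        created = created.date()
    if today is None:
        today = datetime.date.today()
    # Future-dated posts give a negative difference and count as recent,
    # which matches the behaviour of the original comparison.
    return (today - created).days <= max_age_days


# Example with a fixed "today" so the result does not depend on the clock.
today = datetime.date(2025, 1, 10)
print(is_recent(datetime.date(2025, 1, 5), today=today))
print(is_recent(datetime.date(2025, 1, 4), today=today))
```

Passing `today` explicitly is only there to make the function easy to test; in the deploy job it can be left out.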

The script does the following:

Runs through all .md files in docs/posts.
Calls the get_yaml_frontmatter function, which checks whether the path is a file or a directory.
As it is a directory, it cycles through all files with the .md extension.
It passes each file to the process_file_yaml function, which extracts the YAML front matter and checks whether the post was created within the last 5 days. If so, it builds the post URL from the created date and the slug and checks whether a Bluesky post already exists for that URL.
If no Bluesky post exists for the URL, it creates one; otherwise it skips the file. Control then returns to get_yaml_frontmatter, and the cycle continues until all .md files have been checked.
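The URL and social card path that the script derives for each post can be sketched in isolation as below. The helper names `build_post_url` and `social_card_path` are made up for this example and do not exist in post_deploy.py. Two details are assumptions on my part: stripping a trailing slash from the site URL, and using the file name up to its last dot (the script itself splits on the first dot).

```python
import datetime
import os


def build_post_url(site_url, created, slug):
    """Build the post URL as {site_url}/{yyyy}/{mm}/{dd}/{slug}.html."""
    # Stripping a trailing slash is an addition here, to avoid "//" in the URL.
    base = site_url.rstrip("/")
    return f"{base}/{created.year}/{created.month:02}/{created.day:02}/{slug}.html"


def social_card_path(image_directory, file_path):
    """Return the expected social card PNG path for a Markdown file."""
    # File name without its final extension, e.g. "hello-world.md" -> "hello-world".
    stem = os.path.splitext(os.path.basename(file_path))[0]
    return os.path.join(image_directory, f"{stem}.png")


# Example values only; example.com and the paths are placeholders.
print(build_post_url("https://example.com/", datetime.date(2025, 1, 7), "my-post"))
print(social_card_path("site/assets/images/social/posts", "docs/posts/hello-world.md"))
```

If the printed URL does not match the address of the published post, the duplicate check and the link card will both point at the wrong page, so it is worth comparing once against a real post.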

Update requirements.txt

For the Python script to work, the following packages must be included in requirements.txt:

PyYAML
atproto

GitHub Action

Finally, add the following at the end of the GitHub Action that builds the MkDocs site. You can check how it's done on my repo here.

      - name: Run Post-Deployment Script
        run: python post_deploy.py
        env:
          BSKY_HANDLE: ${{ secrets.BSKY_HANDLE }}
          BSKY_APP_PWD: ${{ secrets.BSKY_APP_PWD }}
          GITHUB_WORKSPACE: ${{ github.workspace }}
          SITE_URL: ${{ vars.SITE_URL }}  

This passes the environment secrets and variables to the script before calling it to be run.
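As far as I know, a secret or variable that is not defined in the repo reaches the step as an empty string, so it can help to check all four values up front and fail with a clear message. The sketch below is one way to do that; `missing_env` and `REQUIRED_VARS` are hypothetical names, not part of the script above.

```python
import os

# The four names the workflow step above passes to the script.
REQUIRED_VARS = ("BSKY_HANDLE", "BSKY_APP_PWD", "GITHUB_WORKSPACE", "SITE_URL")


def missing_env(required=REQUIRED_VARS, environ=None):
    """Return the names in `required` that are unset or empty."""
    if environ is None:
        environ = os.environ
    # environ.get returns None for unset names; "" is treated the same way.
    return [name for name in required if not environ.get(name)]


# Example with an explicit mapping instead of the real environment.
example = {"BSKY_HANDLE": "me.example.com", "SITE_URL": ""}
print(missing_env(environ=example))
```

In the script, a call such as `missing_env()` at the top of `main()` could raise an error listing the missing names before any login is attempted.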