MIME can be frustrating...

So, a couple of days ago I wrote about my new sync script, but there was a bug which only reared its ugly head when I tested manually, after I had already considered my masterpiece done.

When I navigated to the blog index page, the browser prompted me to download the file instead of displaying it. Based on prior experience, I immediately suspected the MIME type to be the problem, but unfortunately I had to focus on actual work for the rest of the day, so I could only solve the puzzle today.

So, to confirm my suspicions I turned to the command line:

curl -v https://www.nicc777.com/blog/index.html
*   Trying 2600:9000:21c7:5400:10:7fa4:d780:93a1:443...
...boring SSL handshake stuff omitted...
> GET /blog/index.html HTTP/2
> Host: www.nicc777.com
> user-agent: curl/7.68.0
> accept: */*
> 
* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
< HTTP/2 200 
< content-type: application/octet-stream
< content-length: 10911
< date: Sun, 13 Mar 2022 08:26:44 GMT
< last-modified: Sun, 13 Mar 2022 08:21:20 GMT
< etag: "cec4d7f2bebc8951909da2459db1ea34"
< x-amz-version-id: vlD1LsJMEJJP9xM91J3vQZbOSTRaPFq4
< accept-ranges: bytes
< server: AmazonS3
< vary: Origin
< x-cache: Miss from cloudfront
< via: 1.1 5e95d2e6aebe43cabd9dcdad89ad0a42.cloudfront.net (CloudFront)
< x-amz-cf-pop: AMS54-C1
< x-amz-cf-id: YBF5HdHCxwT-SBD24ttbn4_D1QWfVd50iY4D-Ai0_cZOo6VkW4Ltqg==
< 

As you can clearly see, the line content-type: application/octet-stream shows an incorrect MIME type for an HTML page - it should be text/html, which is exactly why the browser offered to download the file rather than render it.

Let's look at the original Python code (some lines have been omitted for brevity):

import os
import logging
import traceback

import boto3

logger = logging.getLogger(__name__)

AWS_REGION = os.getenv('AWS_REGION', 'eu-central-1')

def get_aws_client(boto3_library, service: str='s3', region: str=AWS_REGION):
    return boto3_library.client(service_name=service, region_name=region)

def get_aws_resource(boto3_library, service: str='s3', region: str=AWS_REGION):
    return boto3_library.resource(service_name=service, region_name=region)

def upload_local_file(
    bucket_name: str,
    local_file_path: str,
    target_key: str,
    client=get_aws_resource(boto3_library=boto3, service='s3'),
    remove_local_file_after_upload: bool=False
)->bool:
    try:
        # upload_file() transfers the file as-is - no content type is set here
        client.meta.client.upload_file(
            Filename=local_file_path,
            Bucket=bucket_name,
            Key=target_key
        )
        logger.info('Uploaded local file "{}" to s3://{}/{}'.format(local_file_path, bucket_name, target_key))
    except Exception:
        logger.info('Unable to upload "{}" to "{}" - enable debug to see full stacktrace'.format(local_file_path, bucket_name))
        logger.debug('EXCEPTION: {}'.format(traceback.format_exc()))
        return False
    # Lines omitted.....

My incorrect assumption was that the AWS Boto3 library would automatically set the correct MIME type. It does not - upload_file() simply transfers the bytes, and the object ends up with a generic binary content type unless you specify one.
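As an aside, upload_file() can in fact set the MIME type - it accepts an ExtraArgs dictionary for object metadata. A minimal sketch of that alternative fix (the bucket name is just a placeholder; this is not the route I ended up taking):

import boto3

s3 = boto3.client('s3')

# ExtraArgs passes extra object metadata to S3, including the Content-Type header
s3.upload_file(
    Filename='index.html',
    Bucket='my-bucket',            # placeholder bucket name
    Key='blog/index.html',
    ExtraArgs={'ContentType': 'text/html'}
)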

But first I needed a way to figure out the correct MIME type. I turned to the trusty Mozilla Developer resource on the topic and converted their table to a text file, which I then turned into a constant in the updated script.

However, it also turned out that one file extension was missing: .html. That problem was easily solved by just adding the one item manually.
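For reference, the resulting constant is just a dictionary mapping file extensions to MIME types. The full table comes straight from the MDN page; a short illustrative excerpt (my sketch of the structure) looks like this:

MIME_TYPES = {
    '.css': 'text/css',
    '.gif': 'image/gif',
    '.htm': 'text/html',
    '.html': 'text/html',    # the entry that had to be added manually
    '.jpg': 'image/jpeg',
    '.js': 'text/javascript',
    '.json': 'application/json',
    '.png': 'image/png',
    '.txt': 'text/plain',
}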

Now all I needed to do was derive the MIME type based purely on the file extension (best guess, basically) and then update the upload function to set it.

The first function to look at is the MIME type guessing bit:

def derive_mimetype(filename: str)->str:
    # Fall back to a generic binary type if no extension matches
    mime_type = 'application/octet-stream'
    try:
        for file_ext, target_mime_type in MIME_TYPES.items():
            if filename.lower().endswith(file_ext.lower()):
                mime_type = target_mime_type
        logger.info('Mimetype for file "{}" set to "{}"'.format(filename, mime_type))
    except Exception:
        logger.debug('EXCEPTION: {}'.format(traceback.format_exc()))
    return mime_type

It is probably not the most efficient algorithm, but it gets the job done (speed is not that important here). A more optimized approach would be to extract the file extension and look it up directly as a key in the MIME_TYPES constant.
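A rough sketch of that lookup-based version, assuming the MIME_TYPES keys carry the leading dot (the function name is mine; Python's standard library also ships a mimetypes module with a guess_type() function that does much the same):

import os

def derive_mimetype_fast(filename: str)->str:
    # os.path.splitext('index.html') -> ('index', '.html')
    _, file_ext = os.path.splitext(filename.lower())
    # A single dictionary lookup instead of scanning every known extension
    return MIME_TYPES.get(file_ext, 'application/octet-stream')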

Anyway, the code worked, so let's move along to the updated upload function:

def upload_local_file(
    bucket_name: str,
    local_file_path: str,
    target_key: str,
    client=get_aws_client(boto3_library=boto3, service='s3'),
    remove_local_file_after_upload: bool=False
)->bool:
    try:
        data = b''
        # The S3 API expects bytes, so read the file in binary mode
        with open(local_file_path, 'rb') as f:
            data = f.read()
        mime_type = derive_mimetype(filename=local_file_path)
        response = client.put_object(
            Body=data,
            Bucket=bucket_name,
            ContentType=mime_type,
            Key=target_key,
        )
        logger.debug('response={}'.format(response))
        logger.info('Uploaded local file "{}" to s3://{}/{}'.format(local_file_path, bucket_name, target_key))
    except Exception:
        logger.info('Unable to upload "{}" to "{}" - enable debug to see full stacktrace'.format(local_file_path, bucket_name))
        logger.debug('EXCEPTION: {}'.format(traceback.format_exc()))
        return False
    # Lines omitted.....

The first thing you should notice is that I now use the Boto3 Client instead of the Resource. This gives me direct access to the put_object method, where the MIME type can be set via the ContentType argument.

One other important thing to note is that the S3 API expects the data as bytes, and therefore we read our files (including text files) in binary mode - but since we set the correct MIME type, the content will still be served as a text file in the end.
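Once the fixed script has re-uploaded everything, the same check from the beginning of this post confirms the fix - the content-type header should now read text/html:

curl -sI https://www.nicc777.com/blog/index.html | grep -i content-type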

Well, I hope this explains something to those interested in the subject!

Happy coding!