The other day I was thinking about solid file validation while discussing something with an associate. From the old days I knew that there is something called a magic header, or file signature: a few fixed bytes at the start of a file that identify its format. You can read more about this on Wikipedia.
Once I had some quiet time, I fired up the Firefox browser on my Ubuntu Asus TUF A17 laptop, opened Google Gemini, and issued the following prompt in a new chat.
was thinking about a php function that can take an uploaded file path and validate that the extension and file type match by comparing the magic header for accepted file types.
The output was as follows. You can try the same on your side; maybe Gemini has a better algorithm by now and will provide a better version.
<?php
/**
 * Validates an uploaded file based on its extension and magic bytes.
 *
 * @param string $filePath The path to the uploaded file.
 * @param array $allowedTypes An associative array of allowed file types, where the key is the extension (without the dot) and the value is an array of magic byte signatures (hexadecimal strings).
 *
 * @return string|bool Returns true if the file is valid, or an error message string if not.
 */
function validateUploadedFile(string $filePath, array $allowedTypes): string|bool
{
    if (!file_exists($filePath)) {
        return "File not found.";
    }
    // Detected MIME type; it is only used in the error message below.
    $fileInfo = new finfo(FILEINFO_MIME_TYPE);
    $mimeType = $fileInfo->file($filePath);
    $extension = strtolower(pathinfo($filePath, PATHINFO_EXTENSION));
    if (!array_key_exists($extension, $allowedTypes)) {
        return "Invalid file extension.";
    }
    $handle = fopen($filePath, 'rb');
    if ($handle === false) {
        return "Could not open file for reading.";
    }
    $fileHeader = fread($handle, 4096); // Read the beginning of the file. Adjust as needed.
    fclose($handle);
    if ($fileHeader === false) {
        return "Could not read file.";
    }
    $isValid = false;
    foreach ($allowedTypes[$extension] as $magicBytes) {
        $hex = str_replace(' ', '', $magicBytes);
        // Skip malformed signatures: hex2bin() would return false, and an
        // empty needle makes str_starts_with() match every file.
        if ($hex === '' || strlen($hex) % 2 !== 0 || !ctype_xdigit($hex)) {
            continue;
        }
        // Convert hex magic bytes to binary
        $binaryMagicBytes = hex2bin($hex);
        if (str_starts_with($fileHeader, $binaryMagicBytes)) {
            $isValid = true;
            break;
        }
    }
    if (!$isValid) {
        return "Invalid file content (magic bytes mismatch). Mime type reported as: " . $mimeType;
    }
    return true;
}
// Example usage:
$allowedFileTypes = [
    'jpg' => ['FFD8FF'], // JPEG
    'jpeg' => ['FFD8FF'], // JPEG
    'png' => ['89504E47'], // PNG
    'gif' => ['47494638'], // GIF (GIF87a and GIF89a both start with this)
    'pdf' => ['25504446'], // PDF
    'zip' => ['504B0304', '504B0506', '504B0708'], // ZIP (local file header, empty archive, spanned archive)
    'docx' => ['504B0304'], // docx is a zip
    'xlsx' => ['504B0304'], // xlsx is a zip
    'pptx' => ['504B0304'], // pptx is a zip
];
// Example usage with a dummy file (replace with your actual uploaded file path):
$testFilePath = 'test.png'; // Replace with your test file
$testFileContents = hex2bin('89504E47') . "some more content"; // starts with the PNG magic bytes (not a complete PNG)
file_put_contents($testFilePath, $testFileContents);
$validationResult = validateUploadedFile($testFilePath, $allowedFileTypes);
if ($validationResult === true) {
    echo "File is valid.\n";
} else {
    echo "File is invalid: " . $validationResult . "\n";
}
$testFilePathBad = 'test.txt'; // Replace with your test file
$testFileContentsBad = "this is not a png"; // rejected at the extension check, since 'txt' is not an allowed type
file_put_contents($testFilePathBad, $testFileContentsBad);
$validationResultBad = validateUploadedFile($testFilePathBad, $allowedFileTypes);
if ($validationResultBad === true) {
    echo "File is valid.\n";
} else {
    echo "File is invalid: " . $validationResultBad . "\n";
}
unlink($testFilePath);
unlink($testFilePathBad);
?>
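One practical wrinkle when wiring this to a real upload: PHP stores an uploaded file under a temporary name (the tmp_name entry of $_FILES), and that path usually has no extension, so pathinfo() on it returns an empty string. Below is a minimal sketch, assuming the extension is taken from the client-supplied file name while the bytes are read from the temporary path. The function name checkUploadSignature, its parameters, and the form field name "document" are hypothetical and not part of the Gemini output.

```php
<?php
/**
 * Hypothetical helper: the extension comes from the client-supplied name,
 * the content is read from the temporary path where PHP stored the upload.
 */
function checkUploadSignature(string $tmpPath, string $clientName, array $allowedTypes): bool
{
    $extension = strtolower(pathinfo($clientName, PATHINFO_EXTENSION));
    if (!isset($allowedTypes[$extension])) {
        return false; // extension is not on the allowlist
    }

    if (!is_file($tmpPath)) {
        return false;
    }
    $handle = fopen($tmpPath, 'rb');
    if ($handle === false) {
        return false;
    }
    $header = fread($handle, 16); // enough for the short signatures used here
    fclose($handle);
    if ($header === false) {
        return false;
    }

    foreach ($allowedTypes[$extension] as $hexSignature) {
        // Ignore malformed entries so they can never match by accident.
        if ($hexSignature === '' || strlen($hexSignature) % 2 !== 0 || !ctype_xdigit($hexSignature)) {
            continue;
        }
        if (str_starts_with($header, hex2bin($hexSignature))) {
            return true;
        }
    }
    return false;
}

// In a real handler (assumed form field name "document"):
// $file = $_FILES['document'];
// $ok = is_uploaded_file($file['tmp_name'])
//     && checkUploadSignature($file['tmp_name'], $file['name'], $allowedFileTypes);
```

The client-supplied name is untrusted input, so it is only used here to pick which signatures to compare against, never as a path on disk.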
Well, to add something about the above code:
Key improvements in this version:
- Uses finfo for MIME type detection: Detects the MIME type using finfo. Note that the code above only includes the detected type in the error message; it does not compare it against an allowlist, so it is not yet an additional layer of validation.
- Handles multiple magic bytes per extension: The $allowedTypes array supports multiple magic byte signatures for a single extension. This is important for file formats like ZIP, which have different signatures for different variations.
- Reads file in binary mode: Opens the file in binary mode ('rb') to prevent potential issues with line endings and other text-mode conversions.
- Reads a chunk of the file: Reads the first 4096 bytes of the file. Because the comparison uses str_starts_with(), only signatures at the very beginning of the file are matched; formats whose signature sits at another offset would need an offset-aware comparison. You can adjust the chunk size as needed.
- Uses str_starts_with(): Uses str_starts_with() (available from PHP 8.0) for a clear prefix comparison of the magic bytes.
- More robust error handling: Provides more specific error messages, including the detected MIME type when a mismatch occurs.
- Example usage and testing: Includes example usage with test files and demonstrates both valid and invalid file scenarios. Cleans up test files afterwards.
- Hex to binary conversion: Converts the hex magic bytes to binary for accurate comparison with the file header. This is crucial for correct matching.
- Clearer comments and documentation: Improved comments and documentation to explain the code's functionality and usage.
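To make the finfo point an actual check rather than just part of the error message, the detected MIME type can be compared against the types expected for the extension. What follows is only a sketch under assumptions: the helper name mimeMatchesExtension and the $expectedMimeTypes mapping are hypothetical, and the exact MIME string finfo reports depends on the libmagic database installed on your system.

```php
<?php
/**
 * Hypothetical helper: returns true only when the MIME type detected by
 * finfo is one of the types expected for the given extension.
 */
function mimeMatchesExtension(string $filePath, string $extension, array $expectedMimeTypes): bool
{
    $extension = strtolower($extension);
    if (!isset($expectedMimeTypes[$extension]) || !is_file($filePath)) {
        return false;
    }
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $detected = $finfo->file($filePath);
    if ($detected === false) {
        return false; // detection failed
    }
    // Strict comparison against the allowlist for this extension.
    return in_array($detected, $expectedMimeTypes[$extension], true);
}

// Assumed mapping; extend it to mirror the keys of $allowedFileTypes.
$expectedMimeTypes = [
    'jpg'  => ['image/jpeg'],
    'jpeg' => ['image/jpeg'],
    'png'  => ['image/png'],
    'gif'  => ['image/gif'],
    'pdf'  => ['application/pdf'],
];
```

Running this alongside the magic byte comparison gives two independent opinions on the content, which is closer to what the first bullet describes.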
This version validates a file based on both its extension and its magic bytes. Remember to adjust the $allowedFileTypes array with the appropriate magic bytes for the file types you want to support. Keep in mind that docx, xlsx, and pptx all share the ZIP signature, so the magic bytes alone cannot tell them apart. You can find lists of file signatures online.
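When adjusting the signature list, note that some formats keep their signature away from the start of the file: a tar archive has "ustar" at offset 257, and an MP4 file has "ftyp" at offset 4. Here is a rough sketch of an offset-aware comparison, assuming a table layout of [offset, hex signature] pairs. That layout and the helper names matchesSignatureAt and headerMatchesAny are hypothetical and differ from the $allowedFileTypes format used by the function above.

```php
<?php
/**
 * Hypothetical helper: checks whether $header contains the signature
 * $hexSignature exactly at byte position $offset.
 */
function matchesSignatureAt(string $header, int $offset, string $hexSignature): bool
{
    if ($offset < 0 || $hexSignature === '' || strlen($hexSignature) % 2 !== 0 || !ctype_xdigit($hexSignature)) {
        return false; // malformed input never matches
    }
    $signature = hex2bin($hexSignature);
    return substr($header, $offset, strlen($signature)) === $signature;
}

/**
 * Hypothetical helper: true if any [offset, hex] entry matches the header.
 */
function headerMatchesAny(string $header, array $entries): bool
{
    foreach ($entries as [$offset, $hex]) {
        if (matchesSignatureAt($header, $offset, $hex)) {
            return true;
        }
    }
    return false;
}

// Assumed table layout: extension => list of [offset, hex signature].
$signaturesWithOffsets = [
    'png' => [[0, '89504E470D0A1A0A']],                    // full 8-byte PNG signature
    'gif' => [[0, '474946383761'], [0, '474946383961']],   // GIF87a, GIF89a
    'tar' => [[257, '7573746172']],                        // "ustar" at offset 257
    'mp4' => [[4, '66747970']],                            // "ftyp" at offset 4
];
```

The 4096-byte chunk read by the function above is already large enough to cover all of these offsets.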
Use the above code with caution, as it has not been tested in any environment.