Sunday, April 28, 2024
Asked Friday, September 3, 2021, 12:10:46 / answers: 1 / hits: 3595

How can I tell ppt, xls and doc files apart on Linux, regardless of their extensions? I tried 'file', but from the looks of it, all MS Office files are reported as the same file type. I have the same problem with docx, xlsx and pptx files, since they are all essentially zip archives containing a bunch of XML.



I also tried a Python script that imports the magic module, but no luck.



I'm trying to identify the real type of a file for sandbox analysis. I need it in order to run the sample in the sandbox VM, because the Windows VM opens everything based on its extension.



Let's say my sample file is named try.exe, but in reality it's a doc file. My script renames it to try.exe.doc, which works fine for doc files. But since Linux identifies all MS Office files as plain DOC files, there is no way to pick out ppt or xls files. As a result, the sandbox won't analyze those samples correctly.



 Answers
5

You can use the mimetype command. Example:



mimetype example.ppt
example.ppt: application/vnd.ms-powerpoint


and



mimetype example.doc 
example.doc: application/vnd.ms-word


However, unlike file -i, this MIME-type determination is based on looking up the file extension (.ppt, .doc, etc.) in the shared MIME-info database, so changing the extension changes the reported MIME type too.
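The extension dependence is easy to reproduce. As a rough illustration, the sketch below uses Python's standard mimetypes module. That module is only a stand-in here, not the mimetype command itself, but like an extension lookup it never opens the file. The helper name guess_by_extension is made up for this example.

```python
import mimetypes


def guess_by_extension(filename):
    """Return the MIME type implied by the file name alone (None if unknown)."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type


# The same hypothetical sample under different names: only the suffix matters,
# and the files do not even need to exist on disk.
for name in ("try.doc", "try.ppt", "try.exe"):
    print(name, "->", guess_by_extension(name))
```

Because the result follows the name, a doc file that arrives as try.exe would be reported as whatever .exe maps to on your system.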



With the extension altered, the only way to identify a file is to look at its file signature, or magic number, which is the same for all legacy Microsoft Office documents (D0 CF 11 E0, which reads like "DOCFILE0"). So any legacy MS Office file will be detected as the same MIME type.
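The signature alone cannot separate the formats, but looking a little further into the container may give a usable hint. The following is a minimal sketch, not a definitive detector, and it rests on two assumptions. First, legacy files keep their main stream name ("WordDocument", "Workbook", "PowerPoint Document") as UTF-16LE text in the container directory, so a raw byte search can find it. Second, the zip-based formats can be told apart by their main part (word/document.xml, xl/workbook.xml, ppt/presentation.xml). The function name guess_office_extension is hypothetical, and documents with embedded objects of another Office type can fool the byte search.

```python
import sys
import zipfile

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy doc/xls/ppt container
ZIP_MAGIC = b"PK\x03\x04"                       # docx/xlsx/pptx are zip archives

# Heuristic markers: main stream names as they appear in the OLE2 directory.
OLE2_MARKERS = [
    ("WordDocument", "doc"),
    ("PowerPoint Document", "ppt"),
    ("Workbook", "xls"),
]

# Main part of each zip-based format (macro-enabled variants share these).
OOXML_PARTS = [
    ("word/document.xml", "docx"),
    ("xl/workbook.xml", "xlsx"),
    ("ppt/presentation.xml", "pptx"),
]


def guess_office_extension(path):
    """Return a likely Office extension for the file, or None if undecided."""
    with open(path, "rb") as fh:
        data = fh.read()  # whole file in memory; fine for a sketch

    if data.startswith(OLE2_MAGIC):
        # Same signature for all three, so search for the stream names.
        for stream_name, ext in OLE2_MARKERS:
            if stream_name.encode("utf-16-le") in data:
                return ext
        return None

    if data.startswith(ZIP_MAGIC):
        try:
            with zipfile.ZipFile(path) as zf:
                names = set(zf.namelist())
        except zipfile.BadZipFile:
            return None
        for part, ext in OOXML_PARTS:
            if part in names:
                return ext

    return None


if __name__ == "__main__":
    for sample in sys.argv[1:]:
        print(sample, "->", guess_office_extension(sample))
```

A renaming script could append the returned extension (try.exe becomes try.exe.xls, for example) and fall back to the current behaviour when the result is None. A real OLE2 parser that walks the directory entries would be more robust than the byte search used here.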


[#37112] Friday, September 3, 2021
impisaso
