[Discussion] Crop PDF pages along the spine direction

    Posted on 2018-12-15 22:58:03
    Last edited by GL_n on 2018-12-16 02:24
    Sometimes we need to split (crop) the pages of a two-up PDF along the spine, i.e. the middle line, so that each double page becomes two separate pages: for example, two A4 book pages scanned onto one A3 sheet, or two A5 book pages laid out on one A4 sheet. Tools for cutting such double pages do exist. However, the cropping procedure is not automatic, and repeating it by hand for every page is tedious.
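As a quick check of the A3/A4 example, here is a minimal sketch of the arithmetic. It assumes the page box is expressed in the default PDF unit of 1/72 inch (a "point"); the helper name `mm_to_points` is made up for this illustration and is not part of any library.

```python
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72  # default PDF user-space unit is 1/72 inch


def mm_to_points(mm):
    """Convert millimetres to PDF points (hypothetical helper for this example)."""
    return mm * POINTS_PER_INCH / MM_PER_INCH


# A3 in landscape orientation is 420 mm x 297 mm; A4 in portrait is 210 mm x 297 mm.
a3_landscape = (mm_to_points(420), mm_to_points(297))
a4_portrait = (mm_to_points(210), mm_to_points(297))

# Cutting the A3 sheet at half its width gives two A4-sized halves.
half_width = a3_landscape[0] / 2
print(round(half_width, 2), round(a4_portrait[0], 2))
```

Both printed values should agree, which is why a cut at half the page width recovers the original book pages.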

    How can the repeated cropping be done automatically? A short Python program is a good fit for this job. The following code is one working example: it splits the double pages of every PDF in the current directory, whatever the page format is.

    # coding=utf-8
    # Uses the PyPDF2 1.x names (PdfFileReader, PdfFileWriter, getPage, ...).
    from copy import copy
    from os import listdir
    import math

    from PyPDF2 import PdfFileWriter, PdfFileReader


    def op(pdfInputFileName):
        """Split every double page of one PDF into two pages and save the result."""
        pdfFileObj = open(pdfInputFileName, 'rb')
        pdfReader = PdfFileReader(pdfFileObj)
        pdfWriter = PdfFileWriter()

        for i in range(pdfReader.getNumPages()):
            p = pdfReader.getPage(i)
            # q is a second copy of the page with its own mediaBox,
            # so that the two halves can be cropped independently.
            q = copy(p)
            q.mediaBox = copy(p.mediaBox)

            x_1, x_2 = p.mediaBox.lowerLeft
            x_3, x_4 = p.mediaBox.upperRight

            x_1, x_2 = math.floor(x_1), math.floor(x_2)
            x_3, x_4 = math.floor(x_3), math.floor(x_4)
            # Half of the width and height; this is the middle line
            # only when lowerLeft is (0, 0).
            x_5, x_6 = math.floor(x_3/2), math.floor(x_4/2)

            # If your scanned pages are displayed normally in Adobe Acrobat,
            # this "if" statement can be deleted.
            if x_3 < x_4:
                p = p.rotateClockwise(90)
                q = q.rotateClockwise(90)

            # Each half extends 5% past the middle line (factors 105/100 and
            # 95/100), so the two halves overlap slightly at the spine.
            if x_3 > x_4:  # For editable pages (wider than tall)
                # Vertical cut at x = x_5
                p.mediaBox.lowerLeft = (x_1, x_2)  # Left part of the two-page rectangle
                p.mediaBox.upperRight = (x_5 * 105/100, x_4)

                q.mediaBox.lowerLeft = (x_5 * 95/100, x_2)
                q.mediaBox.upperRight = (x_3, x_4)  # Right part of the two-page rectangle
            else:  # For image pages (the ones rotated above)
                # Cut at y = x_6 in the unrotated page coordinates
                p.mediaBox.lowerLeft = (x_1, x_2)
                p.mediaBox.upperRight = (x_3, x_6 * 105/100)  # First half

                q.mediaBox.lowerLeft = (x_1, x_6 * 95/100)
                q.mediaBox.upperRight = (x_3, x_4)  # Second half

            pdfWriter.addPage(p)
            pdfWriter.addPage(q)

        pdfOutputFileName = pdfInputFileName[:-4] + '-cut_myself_revised.pdf'
        pdfOutputFile = open(pdfOutputFileName, 'wb')
        pdfWriter.write(pdfOutputFile)
        # Close the input file only after the output has been written.
        pdfFileObj.close()
        pdfOutputFile.close()


    # Crop all PDFs (both editable pages and image pages) in the current directory.
    for pdfInputFileName in listdir('.'):
        if pdfInputFileName.lower().endswith('.pdf'):
            op(pdfInputFileName)
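The box arithmetic in the script can be tried out without any PDF library. The following is only a sketch of that step, not the author's code: the function name `split_box` and the `overlap` parameter are assumptions made for this illustration, and unlike the script it measures the middle line from `lower_left`, so it also covers boxes that do not start at (0, 0).

```python
import math


def split_box(lower_left, upper_right, overlap=0.05):
    """Return two half boxes, each as ((x0, y0), (x1, y1)).

    Wide boxes are cut at the middle x, tall (or square) boxes at the
    middle y. Each half extends past the middle line by `overlap` times
    the half size, mirroring the 105/100 and 95/100 factors in the script.
    """
    x0, y0 = (math.floor(v) for v in lower_left)
    x1, y1 = (math.floor(v) for v in upper_right)
    width, height = x1 - x0, y1 - y0

    if width > height:
        mid = x0 + width / 2
        margin = (width / 2) * overlap
        first = ((x0, y0), (mid + margin, y1))
        second = ((mid - margin, y0), (x1, y1))
    else:
        mid = y0 + height / 2
        margin = (height / 2) * overlap
        first = ((x0, y0), (x1, mid + margin))
        second = ((x0, mid - margin), (x1, y1))
    return first, second


if __name__ == "__main__":
    # A landscape page of roughly A3 size, in points.
    left, right = split_box((0, 0), (1190, 842))
    print(left)
    print(right)
```

For a box starting at (0, 0) with an even width this gives the same corners as the script; the returned tuples could then be assigned to `mediaBox.lowerLeft` and `mediaBox.upperRight` of the two page copies.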


    Posted on 2020-2-3 15:05:30
    Thanks for your great work.