Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - SameFileError because of overwriting location

I have created a script that looks for words in the PDFs in one folder and, if it finds a match, moves the PDF to another folder.

from pathlib import Path
import PyPDF2
import re
import os
import shutil

pattern = input("Enter string pattern to search: ")

basepath = Path(r'\hrdinhal\Data\Desktop\Analize\Search engine')

src = basepath / 'Folder 1'
dst = basepath / 'Folder 2'


for file_name in os.scandir(src):
    file = PyPDF2.PdfFileReader(str(src / file_name), 'rb')
    numPages = file.getNumPages()

    for i in range(0, numPages):
        pageObj = file.getPage(i)
        text = pageObj.extractText()
        
        for match in re.findall(pattern, text, re.IGNORECASE):
            shutil.copyfile(str(src / file_name), str(dst / file_name))

When I run it, I get this error:

SameFileError: '\hrdinhal\Data\Desktop\Analize\Search engine\Folder 1\Daily Production Summary 1.pdf' and '\hrdinhal\Data\Desktop\Analize\Search engine\Folder 1\Daily Production Summary 1.pdf' are the same file

For some reason it takes dst and replaces it with src. Why, and how do I fix it?

dst
Out[99]: WindowsPath('/hrdinhal/Data/Desktop/Analize/Search engine/Folder 2')
file_name
Out[100]: <DirEntry 'Daily Production Summary 1.pdf'>
dst/file_name
Out[101]: WindowsPath('/hrdinhal/Data/Desktop/Analize/Search engine/Folder 1/Daily Production Summary 1.pdf')

It changes Folder 2 to Folder 1!

question from:https://stackoverflow.com/questions/65920604/samefileerror-because-of-overwriting-location


1 Reply


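file_name is an os.DirEntry, and it carries the full path to the file, Folder 1 included. When pathlib joins a path with a rooted path, the right-hand side wins, so that full path replaces src and dst in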

src / file_name 
dst / file_name

Use only the file name, file_name.name:

src / file_name.name
dst / file_name.name

BTW: a DirEntry gives you both forms.

Full path:

print( file_name.path )

File name only:

print( file_name.name )
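
The join behaviour can be reproduced without any PDFs. The following is a minimal sketch, assuming throwaway temporary directories in place of the real folders and a placeholder file named report.pdf (both are illustrative, not from the question):

```python
import os
import tempfile
from pathlib import Path

# Throwaway directories standing in for 'Folder 1' and 'Folder 2'.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "Folder 1"
    dst = Path(tmp) / "Folder 2"
    src.mkdir()
    dst.mkdir()
    (src / "report.pdf").write_bytes(b"")  # empty placeholder file

    with os.scandir(src) as entries:
        entry = next(entries)

    wrong = dst / entry        # entry.path is absolute, so dst is discarded
    right = dst / entry.name   # bare file name, so dst is kept

    print(wrong.parent.name)   # Folder 1
    print(right.parent.name)   # Folder 2
```

Passing the DirEntry itself works because it is path-like, but its path already contains the scanned directory, which is why the join ends up back in Folder 1.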

BTW: You copy the same file after every match, but you only need to copy it once.

Either set a variable found and copy after the for i loop:

from pathlib import Path
import PyPDF2
import re
import os
import shutil

pattern = input("Enter string pattern to search: ")

basepath = Path(r'\hrdinhal\Data\Desktop\Analize\Search engine')

src = basepath / 'Folder 1'
dst = basepath / 'Folder 2'

#print('[DEBUG] (before for file_name) src:', src)

for file_name in os.scandir(src):

    file = PyPDF2.PdfFileReader(str(src / file_name.name), 'rb')
    numPages = file.getNumPages()

    found = False

    # ---

    #print('[DEBUG] (before for i) src:', src)
    
    for i in range(0, numPages):
        pageObj = file.getPage(i)
        text = pageObj.extractText()

        #print('[DEBUG] (before if re) src:', src)

        if re.findall(pattern, text, re.IGNORECASE):
            found = True
            
    # ----

    #print('[DEBUG] (before for found) src:', src)
    
    if found:
        #print('[DEBUG] (before copy) src:', src)
        shutil.copyfile(str(src / file_name.name), str(dst / file_name.name))
        
    

or use break to leave the for i loop after the first copy:

from pathlib import Path
import PyPDF2
import re
import os
import shutil

pattern = input("Enter string pattern to search: ")

basepath = Path(r'\hrdinhal\Data\Desktop\Analize\Search engine')

src = basepath / 'Folder 1'
dst = basepath / 'Folder 2'

#print('[DEBUG] (before for file_name) src:', src)

for file_name in os.scandir(src):

    #print('[DEBUG] (before pyPDF2) file_name:', file_name)

    file = PyPDF2.PdfFileReader(str(src / file_name.name), 'rb')
    numPages = file.getNumPages()

    # ---

    #print('[DEBUG] (before for i) src:', src)
    
    for i in range(0, numPages):
        pageObj = file.getPage(i)
        text = pageObj.extractText()

        #print('[DEBUG] (before if re) src:', src)

        if re.findall(pattern, text, re.IGNORECASE):
            #print('[DEBUG] (before copy) src:', src)
            shutil.copyfile(str(src / file_name.name), str(dst / file_name.name))
            break # there is no need to check rest of PDF
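
The same "copy once per file" idea can also be written without a flag or a break. This is only a sketch under some assumptions: copy_matching and its page_texts parameter are hypothetical names, and page_texts stands for any callable that yields the text of each page of a file (with PyPDF2 it could wrap the getPage / extractText calls used above). It also skips entries that are not PDF files, which the original loop does not do.

```python
import os
import re
import shutil
from pathlib import Path
from typing import Callable, Iterable


def copy_matching(
    src: Path,
    dst: Path,
    pattern: str,
    page_texts: Callable[[Path], Iterable[str]],  # hypothetical text extractor
) -> list:
    """Copy each PDF in src whose text matches pattern to dst, once per file."""
    regex = re.compile(pattern, re.IGNORECASE)
    copied = []
    with os.scandir(src) as entries:
        for entry in entries:
            # Skip subdirectories and files that are not PDFs.
            if not entry.is_file() or not entry.name.lower().endswith(".pdf"):
                continue
            # any() stops consuming pages at the first match.
            if any(regex.search(text) for text in page_texts(Path(entry.path))):
                # Join dst with the bare name, not with the DirEntry itself.
                shutil.copyfile(entry.path, dst / entry.name)
                copied.append(entry.name)
    return sorted(copied)
```

Keeping the extraction behind a callable means the matching and copying logic can be checked with plain text files before any real PDFs are involved.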
