Practice Question 9
In this question, we will learn how to read a file, identify duplicate words, and overwrite the file so that it contains only unique words. This logic is useful for data cleaning and optimization.
The Logic (How it works)
Before writing the code, let us first understand the logic behind the solution.
To remove duplicates while maintaining the original order (or just getting unique values), we follow these steps:
- Read the Data: We open content.txt and read everything into a list of words using .split().
- The Unique Filter: We create an empty list (let’s call it unique_words).
- The Membership Check: We loop through every word. If the word is not already in our unique_words list, we add (append) it.
- The Overwrite: We open the same file again, but this time in write mode ("w"). This clears the old content.
- Join & Save: We join our list of unique words back into a single string and write it into the file.
Let’s code it, following the logic above:
def removeDuplicateWords():
    # Step 1: Read the words from the file
    f = open("content.txt", "r")
    data = f.read()
    wordlist = data.split()
    f.close()

    unique_words = []
    # Step 2: Filter out duplicates
    for word in wordlist:
        if word not in unique_words:
            unique_words.append(word)

    # Step 3: Write the unique words back to the same file
    f = open("content.txt", "w")
    # Join the list into a string separated by spaces
    f.write(" ".join(unique_words))
    f.close()

# To call the function:
removeDuplicateWords()
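If you would like to try the same logic without touching your real content.txt, here is a minimal sketch of a variant. It assumes two things that are not part of the original solution: the function takes the file path as a parameter (the name remove_duplicate_words and the path argument are made up for this illustration), and it uses with blocks so Python closes the file automatically.

```python
import os
import tempfile


def remove_duplicate_words(path):
    """Rewrite the file at `path` so each word appears once, in first-seen order."""
    # Read first. The with block closes the file before we reopen it for writing.
    with open(path, "r") as f:
        wordlist = f.read().split()

    # Same membership check as in the solution above
    unique_words = []
    for word in wordlist:
        if word not in unique_words:
            unique_words.append(word)

    # "w" mode clears the old content, then we write the joined words
    with open(path, "w") as f:
        f.write(" ".join(unique_words))
    return unique_words


# Demo on a throwaway file in a temporary folder (hypothetical sample data)
demo_path = os.path.join(tempfile.mkdtemp(), "content.txt")
with open(demo_path, "w") as f:
    f.write("red blue red green blue")

remove_duplicate_words(demo_path)

with open(demo_path, "r") as f:
    result_text = f.read()
print(result_text)  # red blue green
```

The steps are unchanged; only the way the file is opened and closed differs. In an exam, write whichever style your teacher has taught you.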
Important Notes for Students
This question has a few “traps” that examiners watch for:
- The "w" Mode: When you open a file in "w" mode, Python instantly deletes the previous content. That is why we must read the data and close the file first before opening it again to write.
- Order Matters: Using a list and the not in check (as shown above) keeps the words in the order they first appeared. If you don’t care about order, you could use set(wordlist), but most board exams prefer the list method.
- The .join() Method: This is a clean way to turn a list like ['red', 'blue'] into a string "red blue". It is shorter than writing the words one by one in a loop, and it places a space only between words, never after the last one.
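To see the difference in ordering for yourself, here is a small sketch on a made-up word list. It also shows dict.fromkeys(), which is not part of the solution above but is a standard way to keep first-seen order in Python 3.7 and later, because dictionary keys remember the order in which they were inserted.

```python
wordlist = "red blue red green blue".split()

# List + "not in" check: keeps the order in which words first appeared
unique_list = []
for word in wordlist:
    if word not in unique_list:
        unique_list.append(word)

# dict.fromkeys: also keeps first-seen order (Python 3.7+)
unique_ordered = list(dict.fromkeys(wordlist))

# set: the words are unique, but the order is not guaranteed
unique_set = set(wordlist)

print(unique_list)         # ['red', 'blue', 'green']
print(unique_ordered)      # ['red', 'blue', 'green']
print(sorted(unique_set))  # ['blue', 'green', 'red']
```

The set is printed through sorted() here only so that the output is predictable; printing the set directly may show the words in a different order each time you run the program.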
Teacher’s Secret Tip for Exams:
If the examiner says the file contains words like Apple and apple, you should convert each word to lowercase before checking. Otherwise, Python will treat them as different words.
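A minimal sketch of this idea, on a made-up word list, could look like the following. The names seen and first_spellings are only for illustration. The first loop stores the lowercase form of each word; the second keeps the spelling that appeared first, in case the examiner wants the original capitalisation preserved.

```python
wordlist = "Apple apple Banana APPLE banana cherry".split()

# Variant 1: store the lowercase form of every word
unique_words = []
for word in wordlist:
    lowered = word.lower()
    if lowered not in unique_words:
        unique_words.append(lowered)

# Variant 2: compare in lowercase, but keep the first spelling seen
seen = []
first_spellings = []
for word in wordlist:
    key = word.lower()
    if key not in seen:
        seen.append(key)
        first_spellings.append(word)

print(unique_words)     # ['apple', 'banana', 'cherry']
print(first_spellings)  # ['Apple', 'Banana', 'cherry']
```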
If the examiner wants the result in alphabetical order, you can sort the unique_words list before writing it back to the file.
If the examiner asks you to count how many duplicates were removed, simply subtract the length of the unique list from the original list: removed = len(wordlist) - len(unique_words).
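The last two tips can be sketched together on a made-up word list. This assumes the examiner wants plain ascending order; note that sort() compares strings character by character, so capitalised words would come before lowercase ones unless you convert the case first.

```python
wordlist = "pear apple pear fig apple pear".split()

# Build the unique list exactly as in the main solution
unique_words = []
for word in wordlist:
    if word not in unique_words:
        unique_words.append(word)

# Alphabetical order: sort the list in place before writing it back
unique_words.sort()

# Count of duplicates removed: original length minus unique length
removed = len(wordlist) - len(unique_words)

print(unique_words)  # ['apple', 'fig', 'pear']
print(removed)       # 3
```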