```python
for page in pages:
    # OCR the Jawi text on this page
    jawi_text = pytesseract.image_to_string(page, lang='jawi')
    # Convert to Rumi
    rumi_text = jawi_to_rumi(jawi_text)
    print(rumi_text)
```

Note: You need the Jawi language data file for Tesseract (jawi.traineddata).

5. Accuracy & Limitations

| Factor | Impact on accuracy |
|--------|--------------------|
| Handwritten Jawi | Very low accuracy |
| Old printing / diacritics (harakat) | Moderate |
| Modern printed Jawi | High (90%+ with good OCR) |
| Loanwords from Arabic | May require manual override |

Best results: printed Jawi books or PDFs, such as DBP publications or textbooks.

6. Alternative: Manual Conversion Rule Set

If OCR fails, use these core Jawi → Rumi rules:
| Jawi letter | Rumi | Example |
|-------------|------|---------|
| ا | a; often left unwritten in closed or final syllables | اباب → abab |
| ب | b | باب → bab |
| ت | t | تيلي → teli |
| ج | j | جالن → jalan |
| د | d | داتڠ → datang |
| ر | r | رومه → rumah |
| س | s | ساي → saya |
| ك | k | كاكي → kaki |
| ل | l | ليم → lima |
| م | m | ماس → mas |
| ن | n | ناسي → nasi |
| و | w or u | واجب → wajib, بولو → bulu |
| ه | h | هيدوڠ → hidung |
| ي | y or i | ياءيت → iaitu, ببيري → bebiri |

| Use case | Best method |
|----------|-------------|
| Single page / few pages | eJawi OCR + online converter |
| Whole PDF book | Python script with Tesseract + jawi-rumi |
| Official / publication use | DBP's manual transliteration guide |
| Handwritten Jawi | Skip OCR; type manually |
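
The letter rules in the table above can be sketched as a naive, letter-by-letter mapping. This is only an illustration: `naive_jawi_to_rumi`, the `CONSONANTS` dictionary, and the heuristic for choosing between w/u and y/i are assumptions made for this example, not part of any library or of DBP's guide. It does not restore vowels that Jawi leaves unwritten.

```python
# Hypothetical helper: a naive letter-by-letter reading of the rule table.
# It does NOT restore vowels that Jawi spelling leaves unwritten.

CONSONANTS = {
    "ب": "b", "ت": "t", "ج": "j", "د": "d", "ر": "r", "س": "s",
    "ك": "k", "ک": "k", "ل": "l", "م": "m", "ن": "n", "ه": "h",
    "ڠ": "ng",
}


def naive_jawi_to_rumi(word: str) -> str:
    """Map each Jawi letter to Rumi using only the core rules."""
    out = []
    for i, ch in enumerate(word):
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch == "ا":
            out.append("a")
        elif ch == "و":
            # Assumed heuristic: consonant w at word start or before alif,
            # otherwise vowel u.
            out.append("w" if i == 0 or nxt == "ا" else "u")
        elif ch == "ي":
            # Assumed heuristic: consonant y at word start or before alif,
            # otherwise vowel i.
            out.append("y" if i == 0 or nxt == "ا" else "i")
        elif ch == "ء":
            continue  # hamzah is skipped in this simplified sketch
        else:
            # Letters outside the table pass through unchanged.
            out.append(CONSONANTS.get(ch, ch))
    return "".join(out)


if __name__ == "__main__":
    for w in ["باب", "كاكي", "ناسي", "بولو", "ماس"]:
        print(w, "->", naive_jawi_to_rumi(w))
```

Words whose Rumi form contains vowels that are not written in Jawi come out incomplete with this approach (for example, جالن yields `jaln` rather than `jalan`), so a real converter would also need a word list or dictionary lookup on top of the letter rules.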