RESEARCH 12 April 2026 · 10 min read · By Anjali Wickremasinghe

Teaching SerendAI to read a Sri Lankan pharmacy sign at 9pm.

Sign translation looks easy until the sign is in Sinhala, the photo is blurred, the light is dying, and a visitor in Nuwara Eliya wants to know whether the shop is selling paracetamol or ayurvedic cough syrup. This is the story of how SerendAI went from useless on real travel photos to something a visitor can actually trust — and the preprocessing trick that finally moved the numbers.

[Figure 1: bar chart of end-to-end sign accuracy on SerendSigns-1k (higher is better). Cloud OCR 0.43, Paddle 0.55, VLM direct 0.68, + dewarp 0.79, + rerank 0.86.]
Figure 1 — Our evaluation across five photo-to-translation configurations on a 1,000-image benchmark.

Every applied ML team eventually runs into the gap between the benchmarks their models were trained on and the photos the world actually hands them. Our gap announced itself in week three of the SerendAI beta, in the form of a polite message from a visitor in Nuwara Eliya. It went, approximately: "I photographed a pharmacy sign to check if they were still open. SerendAI said it was a tailor's shop. I am now walking around in the rain looking for medicine. Please help."

The sign was not for a tailor's shop. It was for a pharmacy, written in a warm curling Sinhala script that SerendAI had, with admirable confidence, read as something else entirely. The image was dark and shot slightly from below, the letters were printed on a pale blue hoarding that glared under a streetlight, and about a third of one character was obscured by a tree. On a photo like that, the off-the-shelf OCR service we were using had, understandably, failed.

This post is the long version of how we got from there to a system that handles real Sri Lankan travel photos well enough that we can honestly put our product name in front of them. End-to-end sign accuracy moved from 43% to 86% on a benchmark we built ourselves, and along the way we learned several uncomfortable things about the state of multilingual OCR in the wild.

What SerendAI has to do, exactly

SerendAI is the AI travel companion that lives inside oneceylon.space. It answers the kind of questions a visitor does not want to wait on a community for — what is the weather at Adam's Peak, when does the train to Ella leave, how much should this tuk-tuk fare really be, and — the subject of this post — what does this sign say.

Photo translation, specifically, is the pipeline that takes a traveller's phone camera photo and produces a readable answer in their own language. It has four steps: detect where the text is in the image, read it, figure out what language it is in, and translate the whole thing. Get any one of them wrong and every later step compounds the error. Off-the-shelf tools are built for the clean, front-lit, right-angle photography of major global languages. That is not what a visitor's camera roll looks like at 9pm in Nuwara Eliya.
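To make the compounding concrete, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that the four stages fail independently and that the per-stage success rates are the made-up numbers shown. None of these figures are measurements from SerendAI.

```python
from math import prod

# Hypothetical per-stage success rates, for illustration only.
STAGES = {
    "detect": 0.95,
    "read": 0.85,
    "identify_language": 0.98,
    "translate": 0.90,
}


def end_to_end_success(stage_rates: dict[str, float]) -> float:
    """Chance that every stage succeeds, assuming independent failures."""
    return prod(stage_rates.values())


if __name__ == "__main__":
    print(f"{end_to_end_success(STAGES):.2f}")
```

With these invented rates, four stages that each look respectable on their own combine to roughly 0.71 end to end, which is why a weak reading stage drags the whole answer down.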

Why our baseline failed

We started where everyone starts: a well-known commercial OCR service with reported support for Sinhala and Tamil. It was not bad on a flat, well-lit scan. It was, however, quite bad at the thing our users actually do, which is photograph a sign on a wall at night, from across a wet road, one-handed, mid-conversation.

Running a proper evaluation took longer than the modelling work itself. There is no public benchmark for travel-photo text recognition in Sri Lanka. So we built one. We collected 1,000 photographs of real signs from a cross-section of locations — Colombo shopfronts, Ella village stalls, Kandy temple boards, Jaffna restaurants, rural transit stops — with human translations from bilingual annotators in Colombo. We call it SerendSigns-1k internally, and we plan to release a public subset of it later this year.
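As a minimal sketch of what scoring against a benchmark like this could look like: the post does not say how a translation is judged correct, so the example assumes each image simply carries a yes/no judgement against the human translation, plus a hypothetical `condition` label (such as "daylight" or "after_dark") for slicing the results. The `Result` type and its fields are illustrative names, not the actual SerendSigns-1k schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Result:
    image_id: str
    condition: str   # hypothetical slice label, e.g. "daylight" or "after_dark"
    correct: bool    # assumed: a yes/no judgement against the human translation


def accuracy_by_condition(results: list[Result]) -> dict[str, float]:
    """Fraction of correct end-to-end answers, overall and per condition."""
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for r in results:
        for key in ("overall", r.condition):
            totals[key] += 1
            hits[key] += int(r.correct)
    return {key: hits[key] / totals[key] for key in totals}
```

Reporting per-condition numbers alongside the overall score is what makes it possible to see whether a change helps the hard photos or only the easy ones.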

Three distinct failure modes showed up once we had numbers to look at:

The OCR was trained on a world in which each photograph is a clean document. Our photographs are not documents. They are the reality of being a traveller holding up a phone in a hurry.

What we tried, in order

Attempt 1 — A better multilingual OCR

We swapped the baseline for PaddleOCR with its multilingual detection and recognition models, which have surprisingly decent Sinhala support for a tool that is not widely advertised for it. Accuracy rose from 0.43 to 0.55. A real, honest improvement — but it did not close the gap, and it did not touch the low-light or perspective problems. Clean signs improved more than messy ones.

The lesson we took from this: most of the gains from a better OCR engine come from better models on clean input. If your input is dirty, a better engine alone will not rescue you.

Attempt 2 — Direct vision-language models, and a cautionary tale

The obvious next move in 2026 is to skip the OCR step entirely and ask a multimodal vision-language model to read the sign and translate it in one go. We tried two. Accuracy jumped to 0.68, which is a real improvement, and the output was often phrased more naturally because the model could disambiguate using visual context — the shape of a mortar-and-pestle icon, the green cross of a pharmacy, the red curve of a Coca-Cola sign.

But VLMs have a specific, well-known, and slightly embarrassing failure mode: when they cannot read the text, they do not say so. They confabulate. A blurred Sinhala shopfront would come back as a plausible-sounding but entirely invented translation, delivered with exactly the same confidence as a correct one. For a traveller about to walk into a pharmacy and ask for aspirin, a confidently wrong answer is strictly worse than an honest don't-know.

We parked the VLM-only approach and went looking for something that would know when it was unsure.

Attempt 3 — Preprocessing: dewarp, denoise, enhance (the thing that worked)

The change that moved the needle was not a new model. It was a preprocessing pipeline that ran before any reading model saw the image. Three stages, each small and cheap:

# Image, Crop, LUMINANCE_THRESHOLD and the helper functions are not shown in this post.
def prepare_travel_photo(img: Image) -> list[Crop]:
    """Turn one raw travel photo into flat, readable text crops."""
    # 1. detect all text regions using a small detector
    regions = detect_text_regions(img)

    # 2. for each region, dewarp it into a flat rectangle
    flat = [dewarp_perspective(r) for r in regions]

    # 3. in low light (judged on the whole photo, not per crop),
    #    run a lightweight enhancement pass over every crop
    if estimate_luminance(img) < LUMINANCE_THRESHOLD:
        flat = [enhance_lowlight(r) for r in flat]

    return flat  # feed these to the reader, not the original photo

Most of the lift came from dewarping. Travel photos are taken from angles, not straight on, and text-recognition models hate anything that is not a clean rectangle. A small Python function that fits a quadrilateral to each detected text region and projects it into a flat crop recovered most of the recognition loss on angled signs.
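For readers who have not written a dewarp before, the projection step can be sketched as below. This is not the function from our pipeline: it is a dependency-free illustration that assumes the four corners of the text region are already known (fitting the quadrilateral is not shown), that they are given in top-left, top-right, bottom-right, bottom-left order, and that the image is a grayscale list of rows. Nearest-neighbour sampling is also a simplification. In practice a library routine such as OpenCV's `getPerspectiveTransform` plus `warpPerspective` does the same job.

```python
Point = tuple[float, float]
Gray = list[list[int]]  # rows of 0-255 pixel values


def solve_linear(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate quadrilateral")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src: list[Point], dst: list[Point]) -> list[float]:
    """Eight coefficients of the projective map sending each src[i] to dst[i]."""
    rows: list[list[float]] = []
    rhs: list[float] = []
    for (u, v), (x, y) in zip(src, dst):
        rows.append([u, v, 1, 0, 0, 0, -u * x, -v * x])
        rhs.append(x)
        rows.append([0, 0, 0, u, v, 1, -u * y, -v * y])
        rhs.append(y)
    return solve_linear(rows, rhs)


def apply_h(h: list[float], u: float, v: float) -> Point:
    """Map one point through the homography."""
    w = h[6] * u + h[7] * v + 1.0
    return ((h[0] * u + h[1] * v + h[2]) / w,
            (h[3] * u + h[4] * v + h[5]) / w)


def dewarp(img: Gray, quad: list[Point], out_w: int, out_h: int) -> Gray:
    """Flatten the region bounded by quad (TL, TR, BR, BL) into out_w x out_h."""
    rect = [(0, 0), (out_w - 1, 0), (out_w - 1, out_h - 1), (0, out_h - 1)]
    h = homography(rect, quad)  # maps each output pixel back into the photo
    out: Gray = []
    for v in range(out_h):
        row = []
        for u in range(out_w):
            x, y = apply_h(h, u, v)
            xi = min(max(round(x), 0), len(img[0]) - 1)
            yi = min(max(round(y), 0), len(img) - 1)
            row.append(img[yi][xi])
        out.append(row)
    return out
```

Mapping from the output rectangle back into the photo, rather than the other way round, means every output pixel gets exactly one value and the flat crop has no holes.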

The low-light enhancement was the second gift. A simple camera-noise-aware contrast boost, applied only when the global luminance is low, pulled evening photos much closer to the accuracy of daytime ones. It is not clever. It is not interesting. But it pushed end-to-end accuracy from 0.68 to 0.79, and the gains were concentrated on exactly the photos where the baseline had been worst — the after-dark ones, the angled ones, the ones a real traveller would actually take.
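The gating idea, enhance only when the whole photo is dark, can be sketched as follows. The function names echo the snippet above, but the bodies are guesses for illustration. The threshold value is hypothetical, the luminance estimate uses the standard Rec. 709 luma weights, and a plain gamma curve stands in for the camera-noise-aware boost described here, which it does not reproduce.

```python
Pixel = tuple[int, int, int]   # (R, G, B), each 0-255
LUMINANCE_THRESHOLD = 0.25     # hypothetical cut-off on a 0-1 scale


def estimate_luminance(pixels: list[Pixel]) -> float:
    """Mean Rec. 709 luma over the image, scaled to 0-1."""
    total = sum(0.2126 * r + 0.7152 * g + 0.0722 * b for r, g, b in pixels)
    return total / (255.0 * len(pixels))


def enhance_lowlight(pixels: list[Pixel], gamma: float = 0.5) -> list[Pixel]:
    """Lift shadows with a gamma curve (gamma < 1 brightens dark values)."""
    lut = [round(255 * (v / 255) ** gamma) for v in range(256)]
    return [(lut[r], lut[g], lut[b]) for r, g, b in pixels]


def maybe_enhance(pixels: list[Pixel]) -> list[Pixel]:
    """Apply the enhancement only when the photo as a whole is dark."""
    if estimate_luminance(pixels) < LUMINANCE_THRESHOLD:
        return enhance_lowlight(pixels)
    return pixels
```

Skipping the pass on well-lit photos matters as much as running it on dark ones, since brightening an already bright sign can wash out the strokes the reader depends on.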

A principle we keep relearning: when the model is strong but the input is wrong, fix the input.

Attempt 4 — Script-aware reranking with abstention

The last step was the least surprising, and also the one that solved the hallucination problem. For each dewarped text region, we now run two readers in parallel — a specialist OCR that outputs a confidence score per character, and a VLM that outputs a full translation. A small reranker, trained on our benchmark, decides which to trust, and when to abstain.

"Abstain" is the critical word. If neither reader is confident, SerendAI now says so. The traveller sees something like: "I'm not sure what this sign says — try getting a bit closer, or turn on your flash." This is not as satisfying as a confident wrong answer, but it is dramatically more useful.
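The shape of that decision can be sketched with a hand-written rule. This is a stand-in, not the trained reranker: the `Readings` type, the `translate` callable, both thresholds, and the choice to judge the OCR by its weakest character are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Readings:
    ocr_text: str
    ocr_char_conf: list[float]   # per-character confidence from the specialist OCR
    vlm_translation: str         # full translation proposed by the VLM


ABSTAIN = ("I'm not sure what this sign says. "
           "Try getting a bit closer, or turn on your flash.")


def answer(r: Readings,
           translate: Callable[[str], str],
           trust_ocr_at: float = 0.90,    # hypothetical threshold
           legible_at: float = 0.60) -> str:  # hypothetical threshold
    """Pick the OCR path, the VLM path, or abstain."""
    # One badly read character can change a word, so use the weakest score.
    weakest = min(r.ocr_char_conf, default=0.0)
    if weakest >= trust_ocr_at:
        return translate(r.ocr_text)       # text is clearly legible
    if weakest >= legible_at:
        return r.vlm_translation           # legible enough to back the VLM
    return ABSTAIN                         # neither reader is trustworthy
```

The key property is the last branch: when the text is not legible, the VLM's fluent output is never shown, because its fluency says nothing about whether it actually read the sign.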

Accuracy climbed from 0.79 to 0.86, and — more importantly — our rate of confident hallucinations dropped from 14% to under 2%. The tailor-shop-in-Nuwara-Eliya case, when we replayed it against the new pipeline, produced an honest "not sure, please try again" rather than a wrong answer. Which, for the visitor looking for paracetamol in the rain, is what good software looks like.

Configuration                           | Notes                        | Accuracy
Commercial OCR baseline                 | Off-the-shelf cloud API      | 0.43
PaddleOCR multilingual                  | Better engine, same input    | 0.55
Direct VLM translation                  | Hallucinates when uncertain  | 0.68
+ dewarp + low-light enhance            | Preprocessing pipeline       | 0.79
+ script-aware rerank with abstention   | Production configuration     | 0.86

Accuracy: +100% (0.43 → 0.86)
Low-light: ×2.7 (after-sunset gain)
Hallucinations: −86% (14% → <2%)
p95 latency: 1.4s (acceptable on 4G)
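
For anyone checking the headline percentages, they are relative changes. A small sketch of the arithmetic follows. Note that the hallucination rate is reported only as "under 2%", so using exactly 2% gives a lower bound on the reduction.

```python
def relative_change(before: float, after: float) -> float:
    """Change relative to the starting value, e.g. 1.0 means +100%."""
    return (after - before) / before


accuracy_gain = relative_change(0.43, 0.86)      # 1.0, i.e. +100%
hallucination_cut = relative_change(0.14, 0.02)  # about -0.857, i.e. -86%
```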

Eight things I would tell myself in January

In the order they occurred to us, with minimal editing:

What is next

Two directions. First, we are working on an on-device version of the whole pipeline. A traveller on a spotty 3G signal in the hill country should not have to wait on our servers to know what a sign says, and mobile silicon in 2026 is quietly capable of running a dewarping pass and a small reader locally. We will write about that when it works.

Second, we are opening a six-month paid ML research internship on exactly this problem space. The intern will work directly on SerendAI, co-own part of the SerendSigns benchmark extension, and publish their own piece here under their own name. If the evaluation and vision-language questions in this post are the kind of thing you want to spend half a year inside, we would like to hear from you.

Anjali Wickremasinghe
Research Lead for SerendAI at OneCeylon. Writes about vision-language models, evaluation, and the peculiar joys of building ML for travellers.