Daily life of a digital nomad: electronic documents and OCR

paperless-ngx

I spent most of today setting up paperless-ngx as my electronic document archive

I started from a docker-compose.yml generated by ChatGPT

This "artificial idiot" made several mistakes in it, and it took me more than an hour to find the causes and get everything running normally

One of the errors came from the multi-language OCR setting
I am also not very satisfied with the OCR recognition quality, so I am looking for ways to tune it or for an alternative
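
I did not note down exactly which part of the multi-language setting was wrong. One pitfall worth checking, as far as I understand the paperless-ngx Docker setup, is that two similarly named variables expect different formats: PAPERLESS_OCR_LANGUAGE joins Tesseract codes with "+", while PAPERLESS_OCR_LANGUAGES lists the extra language packs to install, separated by spaces and written with hyphens instead of underscores. The helper below is a hypothetical sketch (the function name is mine) that derives both values from one list. Check the formats against the documentation for your version before relying on it.

```python
def paperless_ocr_env(codes):
    """Return the two OCR-related environment values for a list of Tesseract codes.

    Assumption: PAPERLESS_OCR_LANGUAGE takes values like "eng+chi_sim", while
    PAPERLESS_OCR_LANGUAGES takes package-style names like "eng chi-sim".
    """
    if not codes:
        raise ValueError("at least one language code is required")
    return {
        # Languages used when recognizing a document, joined with "+".
        "PAPERLESS_OCR_LANGUAGE": "+".join(codes),
        # Language packs to install: space separated, "_" replaced by "-".
        "PAPERLESS_OCR_LANGUAGES": " ".join(c.replace("_", "-") for c in codes),
    }
```

The returned pairs can be copied into the environment section of docker-compose.yml.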

OCR

I originally planned to modify the paperless-ngx source code and add an API that calls another OCR engine to fill in the text content

But that approach is too complicated, and right now I do not want to spend the time solving the problem at the source-code level

After some searching, I found that PaddleOCR meets my needs and has high recognition accuracy. I also found a Windows desktop client (an .exe) for it, which can be used directly

I originally planned to deploy it on my ARM-based Orange Pi server, but for now recognizing documents directly in the desktop client is easier

I still have to run the recognition manually and copy the result into paperless-ngx, but as long as the recognized text is accurate, one extra manual step does not matter
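
If the copy-and-paste step ever gets tedious, it could probably be scripted. Below is a minimal sketch, not something I am running: it assumes Umi-OCR's optional local HTTP service is enabled at its default address, that paperless-ngx accepts a PATCH on a document's `content` field with token authentication, and that the server address, document ID and token shown are placeholders. The function names are my own.

```python
import base64
import json
import urllib.request

UMI_OCR_URL = "http://127.0.0.1:1224/api/ocr"  # assumed Umi-OCR HTTP endpoint
PAPERLESS_URL = "http://localhost:8000"        # hypothetical paperless-ngx address


def build_ocr_request(image_bytes: bytes) -> urllib.request.Request:
    """Build a POST that sends one image to Umi-OCR as base64 JSON."""
    payload = {"base64": base64.b64encode(image_bytes).decode("ascii")}
    return urllib.request.Request(
        UMI_OCR_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(ocr_result: dict) -> str:
    """Join recognized lines; assumes code 100 means success."""
    if ocr_result.get("code") != 100:
        return ""
    data = ocr_result.get("data", "")
    if isinstance(data, str):
        return data
    return "\n".join(item["text"] for item in data)


def build_content_update(doc_id: int, text: str, token: str) -> urllib.request.Request:
    """Build a PATCH that overwrites the stored text of one document."""
    return urllib.request.Request(
        f"{PAPERLESS_URL}/api/documents/{doc_id}/",
        data=json.dumps({"content": text}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token {token}",
        },
        method="PATCH",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def ocr_and_store(image_path: str, doc_id: int, token: str) -> None:
    """Recognize one image with Umi-OCR and store the text in paperless-ngx."""
    with open(image_path, "rb") as handle:
        result = send(build_ocr_request(handle.read()))
    send(build_content_update(doc_id, extract_text(result), token))
```

Building the requests separately from sending them makes it easy to inspect what would go over the wire before touching real documents.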

Project address

https://github.com/hiroi-sora/Umi-OCR/releases

Most importantly, it is free, works offline, and is open source

It is a very good fit for people who cannot program or who have low-spec machines

Blog message notification

I set up ntfy myself to push notifications to my Android phone

https://ntfy.sh

It is also open source and free, and can be self-hosted

It does not rely on Google services (GMS-free)

Notifications can be triggered through its HTTP API, which makes it easy to call from a blog page with JavaScript fetch
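
To show how small the API call is, here is the same kind of request that fetch would send from the page, sketched in Python. Publishing to ntfy is an HTTP POST to the topic URL with the message as the body. The topic name, title and message text are placeholders I made up, and the server address would be your own instance if you self-host.

```python
import urllib.request

NTFY_SERVER = "https://ntfy.sh"  # or the address of a self-hosted instance


def build_visit_notification(topic: str, page_url: str) -> urllib.request.Request:
    """Build the POST that publishes a "new visitor" message to an ntfy topic."""
    return urllib.request.Request(
        f"{NTFY_SERVER}/{topic}",
        data=f"New visitor on {page_url}".encode("utf-8"),
        # Optional ntfy headers: a notification title and an emoji tag.
        headers={"Title": "Blog visit", "Tags": "eyes"},
        method="POST",
    )


def publish(request: urllib.request.Request) -> int:
    """Send the notification and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Anyone who knows the topic name can publish to it, so a hard-to-guess topic (or access control on a self-hosted server) is worth considering when the call lives in public page code.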

I have set up message notifications in my blog. When real visitors visit my blog, my phone will receive a notification.

One day has passed. Except for my own visit in Incognito mode, I received no notifications.

Surprisingly, because my code is still incomplete, I did receive a notification triggered by a Google crawler.

It was sent by Integral Ad Science (IAS), which crawls pages for advertising content analysis, brand safety checks, or page content collection
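
The missing piece is some kind of crawler check before the notification is sent. A minimal sketch of the idea, assuming the visitor's User-Agent string is available at the point where the notification is triggered. The marker list is illustrative rather than complete, and the function name is my own.

```python
BOT_MARKERS = ("bot", "crawler", "spider", "headless")  # illustrative, not exhaustive


def is_probable_bot(user_agent: str) -> bool:
    """Guess whether a request comes from a crawler, based on its User-Agent."""
    if not user_agent:
        # Real browsers always send a User-Agent, so treat a missing one as a bot.
        return True
    lowered = user_agent.lower()
    return any(marker in lowered for marker in BOT_MARKERS)
```

A check like this only catches crawlers that identify themselves. Crawlers that present an ordinary browser User-Agent will still get through, so it reduces the noise but does not remove it.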

HomeLab Summary

Here is what runs on my homelab server, an Orange Pi 3B (8 GB memory, 512 GB storage)

I use showdoc (documentation) every day: the draft of every blog post lives there, along with my other notes

I use vaultwarden (password management) frequently, together with the Bitwarden Password Manager extension for Google Chrome

Recently I have often used the xmind mind-map plugin in nextcloud, and the onlyoffice plugin for spreadsheets (with a self-hosted onlyoffice alongside it)

Of the 5 blogs in my blog group, nomad is currently the main one and the others are on hold (all run on wordpress), with umami for traffic statistics

paperless-ngx, deployed just today, serves as my electronic archive. So far I have tested it with receipts and it works fine. Later I plan to record documents from insurance, banks, restaurants, convenience stores, and so on

In total there are 7 open source applications on the server (wordpress is deployed and running as 5 instances)

That is, 7 application types and 10 application instances

Those are the systems I use day to day

In addition, aaPanel is used as the server panel

FRP exposes the services to the public internet (with PM2 keeping it running as a background process)

A Vultr cloud server provides the public IP (so that traffic can be forwarded to the local machine)

In other words, a single Orange Pi development board costing about $60 is enough to run all of the services above

Not only is the software free to use, but the data also stays in your own hands. Judging by the system status on the dashboard, there is still room for a few more lightweight applications. If you turn off the 4 extra wordpress blog instances, you can add other open source applications in their place, for example 3 more application types

I have also deployed minio, memos, wzi, planka, drawio, and a few other services, but they are switched off and only started when needed

I also have other Orange Pi and Raspberry Pi boards and a small x86 machine of unknown brand, with some other services deployed on them

But currently none of this makes money; it is only for my own use. If you are not comfortable with the technology, I recommend not bothering, because it takes a lot of time
