Malerhände

This page is about differentiating the illustrators of the Wenceslas Bible.

Features

Face detection

  • deepface
    • dlib
    • mtcnn
    • mediapipe
    • opencv
    • retinaface
    • ssd
    • yolov8
    • yunet
  • insightface (also reimplemented by the deepface retinaface backend, but that reimplementation performs worse)

Only insightface reports the face landmarks properly. It is also the best-performing detector, so we use it.

This topic is described in paper_humanities_wenzelfacedetection.

Face comparison features

  • VGG-Face
  • Facenet
  • Facenet512
  • OpenFace
  • DeepID
  • ArcFace
  • Dlib
  • SFace
  • GhostFaceNet

Other comparison features

  • ccv
  • lbp
  • lpips
  • hu moments

Data

Faces are detected with insightface (tiles with a tile size of 3000, scale factor 1, and rotations of 0° and ±45°), and the face landmarks are stored for alignment. Tags come from the ground truth where it can be matched with detected faces.
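The tiling scheme can be sketched as follows. This is only an illustration of the parameters named above: the function names are hypothetical, and non-overlapping tiles are an assumption, since the text does not say whether tiles overlap.

```python
from itertools import product


def tile_boxes(width, height, tile=3000):
    """Yield (x, y, w, h) boxes that cover the image with non-overlapping tiles.

    Non-overlapping tiles are an assumption; edge tiles are simply smaller.
    """
    for y, x in product(range(0, height, tile), range(0, width, tile)):
        yield x, y, min(tile, width - x), min(tile, height - y)


def detection_passes(width, height, tile=3000, angles=(0, 45, -45)):
    """Pair every tile with every rotation angle (in degrees)."""
    return [(box, angle)
            for box in tile_boxes(width, height, tile)
            for angle in angles]


# Hypothetical 7000 x 5000 pixel scan: 3 x 2 tiles, each tried at 3 rotations.
passes = detection_passes(7000, 5000)
print(len(passes))
```

Each pass would then be handed to the detector, with detections mapped back to source-image coordinates.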

Transform

From this we derive two basic extractions: normal (no suffix) and transformed (suffix t).

normal transformed
nc-112_istr45-00000029-008_96x112+7+0_Wenzel-FR.jpg nct-112_istr45-00000029-008_75x87+20+6_Wenzel-FR.jpg

Transforms are certainly helpful for face comparison, and should also help non-face features that are not rotation invariant. The drawback is that the face detection algorithm must provide face landmarks with which to align the faces. Not all detectors do, which rules some of them out, although insightface, the current best from the prior paper, does provide landmarks.
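As a rough illustration of why landmarks are needed, the sketch below levels a face using only the two eye landmarks. It is not the transform used for the t extractions, whose details are not given here; the helper names and the example coordinates are made up.

```python
import math


def eye_roll_angle(left_eye, right_eye):
    """Angle in degrees of the eye line relative to the horizontal.

    'left' and 'right' refer to positions in the image, with y pointing down.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))


def rotate_point(point, center, degrees):
    """Rotate a point about a center by the given angle."""
    theta = math.radians(degrees)
    x, y = point[0] - center[0], point[1] - center[1]
    return (center[0] + x * math.cos(theta) - y * math.sin(theta),
            center[1] + x * math.sin(theta) + y * math.cos(theta))


# Made-up eye positions of a tilted face.
left_eye, right_eye = (30.0, 40.0), (70.0, 80.0)
angle = eye_roll_angle(left_eye, right_eye)
center = ((left_eye[0] + right_eye[0]) / 2, (left_eye[1] + right_eye[1]) / 2)

# Rotating by the negative angle about the eye midpoint levels the eyes.
new_left = rotate_point(left_eye, center, -angle)
new_right = rotate_point(right_eye, center, -angle)
```

The same rotation applied to the image crop would yield an upright face, which is what makes rotation-sensitive features comparable across faces.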

Context

Context is the additional image information around the face that is included in the extracted image: either 50 pixels of context on each side from the source image (c50) or no additional context (nc); see above for nc examples.

c50 c50t
c50-112_istr45-00000029-008_58x67+26+22_Wenzel-FR.jpg c50t-112_istr45-00000029-008_45x52+34+26_Wenzel-FR.jpg

Context is used for face comparison, as face algorithms often assume a certain face-to-image ratio, which images without context might exceed. For texture features the extra context is likely detrimental, as it will change even if the person does not.
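A minimal sketch of the two context variants, assuming the face is given as an x, y, width, height box in source-image pixels and that context is clamped at the image border (the clamping and the function name are assumptions):

```python
def crop_box(x, y, w, h, img_w, img_h, context=0):
    """Expand a face box by `context` pixels per side, clamped to the image.

    Returns (left, top, width, height) of the region to extract.
    """
    left = max(0, x - context)
    top = max(0, y - context)
    right = min(img_w, x + w + context)
    bottom = min(img_h, y + h + context)
    return left, top, right - left, bottom - top


# Hypothetical face box in a 4000 x 3000 pixel source image.
nc_box = crop_box(100, 200, 96, 112, 4000, 3000, context=0)
c50_box = crop_box(100, 200, 96, 112, 4000, 3000, context=50)
```

With context the face occupies a smaller share of the extracted image, which is the ratio effect described above.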

First Evaluation

The basic idea is this: if a method cannot differentiate between two figures (Wenzel vs. Bademagd), then it is useless for differentiating between potential painters of a single figure (Wenzel).

To clarify what is expected, here is a quick experiment using real faces (as we know face recognition works on them). There are two classes: one is composed of 6 faces of a single person (Yasser Arafat), and one is composed of two persons with 3 images each (Zico and Zoran Djindjic). Comparisons between the two classes are true imposter comparisons. The single-person class (ya) contains only genuine comparisons, and the two-person class (zz) contains genuine as well as imposter comparisons.

fiw-baseline-VGG-Face.jpg

These are the expected outcomes for these classes. The single-person class should be well separated from the imposter distribution, and the mixed class should span the range. This is a bit of an 'optimal' scenario, but what should be visible is a distribution that extends outside the imposter distribution.

The Bademagd vs. Wenzel comparison is certainly a proper imposter distribution. The Bademagd class on its own might well be one too, as it is not intended to be the likeness of a single person. The Wenzel class should be genuine if it is by a single artist, as the artist endeavours to depict a single (idealized) person. In the case of multiple painters, the version of Wenceslas might well differ between painters and we would get a mixed class, which should nonetheless show a distribution that extends outside the imposter range.
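The composition of the three comparison sets follows directly from the class definitions and can be counted with a short sketch (the identity labels and the helper name are only illustrative):

```python
from itertools import combinations, product

# Identity labels of the images in each class, as described above.
ya = ["arafat"] * 6
zz = ["zico"] * 3 + ["djindjic"] * 3


def split_pairs(pairs):
    """Count genuine (same identity) and imposter (different identity) pairs."""
    pairs = list(pairs)
    genuine = sum(a == b for a, b in pairs)
    return {"genuine": genuine, "imposter": len(pairs) - genuine}


within_ya = split_pairs(combinations(ya, 2))  # comparisons inside class ya
within_zz = split_pairs(combinations(zz, 2))  # comparisons inside class zz
between = split_pairs(product(ya, zz))        # comparisons across the classes

print(within_ya, within_zz, between)
```

The mixed class is therefore dominated by imposter pairs (9 of 15), which is why it is expected to span the range rather than sit apart from the imposter distribution.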

When it comes to the use of non-face features we can have a quick look at lpips (the best texture feature from this test) on the real images test.

fiw-baseline-lpips.jpg

This distribution is a lot less clean, but in a way it reflects reality better, as the imposter and genuine distributions overlap. The basic distributions explained above are nevertheless still very much there. This also shows that these texture features can differentiate faces.

Non-face features

First let's look at the non-face recognition feature comparison. The expectation is that the transform may help, and that context does not help.

Each plot shows Bademagd on top (blue), then the imposter comparisons (Bademagd vs. Wenzel, in orange), and then Wenzel at the bottom (green).

Method c50 c50t nc nct
ccv c50-112-ccv.jpg c50t-112-ccv.jpg nc-112-ccv.jpg nct-112-ccv.jpg
lbp c50-112-lbp.jpg c50t-112-lbp.jpg nc-112-lbp.jpg nct-112-lbp.jpg
lpips c50-112-lpips.jpg c50t-112-lpips.jpg nc-112-lpips.jpg nct-112-lpips.jpg
hu c50-112-hu.jpg c50t-112-hu.jpg nc-112-hu.jpg nct-112-hu.jpg

Overall, all comparisons are very similar in that the genuine and imposter distributions are basically the same. In other words, these methods are poor at differentiating the figures in the Wenceslas Bible. The use of transforms or context changes the distributions slightly, but the aforementioned statement still holds true.
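The plots are judged by eye here. If a number is wanted for how far two score distributions are apart, one common choice is the decidability index d'. This is not part of the evaluation above, and the scores below are made-up toy values, not results.

```python
import math
from statistics import mean, variance


def d_prime(genuine, imposter):
    """Decidability index: distance of the means in units of pooled std dev."""
    pooled = math.sqrt((variance(genuine) + variance(imposter)) / 2)
    return abs(mean(genuine) - mean(imposter)) / pooled


# Toy distance scores (NOT measured data): a well-separated case ...
separated = d_prime([0.2, 0.3, 0.4], [0.6, 0.7, 0.8])
# ... and a case where both sets of scores are identical.
overlapping = d_prime([0.4, 0.5, 0.6], [0.4, 0.5, 0.6])
```

A value near zero corresponds to the situation described above, where the genuine and imposter distributions are basically the same.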

Face features

This is from older data, which is basically nc and nct. The new extraction runs very slowly, for reasons not yet known.

Method nc nct
arcface tight-nc-112-arcface.jpg tight-nct-112-arcface.jpg
dlib tight-nc-112-dlib.jpg tight-nct-112-dlib.jpg
ghostfacenet tight-nc-112-ghostfacenet.jpg tight-nct-112-ghostfacenet.jpg
vgg-face tight-nc-112-vggface.jpg tight-nct-112-vggface.jpg