AI helps add 10k more photos to OldNYC

(danvk.org)

142 points | by evakhoury 3 days ago

53 comments

TrackerFF 2 days ago
As long as they're not GenAI-altered photos, I'm cool with these things.
I'm a pretty avid member of various history groups, and one thing that has absolutely driven me nuts for the past couple of years is how many people use AI for upscaling and colorization of photos, not knowing or noticing how the models fundamentally alter the photos. Zoom in on the photo a couple of times, and it is nightmare fuel.
A week ago some other members and I spent a couple of hours trying to find a building from the early 1900s, because someone had uploaded a photo and asked about the building. We sifted through old maps, newspapers, etc., but couldn't find anything. It turns out the photo had been upscaled via AI, which in turn had added some buildings here and there.
But, yeah, for stuff like OP posted it could work out nicely.
- mikeyouse 2 days ago
  Likewise. There’s this older woman who is trying to add some historical color to our local beach town FB group by using some terrible AI tool to colorize pictures from the early 1900s. She doesn’t accept any feedback that it’s problematic to share what are essentially fake pics in that way. They often just randomly remove people, or add new ones. Buildings are changed, cars are remodeled; it’s crazy how different the before/after are. The comments are usually split as well, but I absolutely loathe how AI is used there. She means well, but the tools are so bad for this and so poorly explained.
  One random example of a before/after: https://imgur.com/a/WIAYLHm
  - Morromist 2 days ago
    I was looking for photos of NYC in the 1990s a few weeks ago. I eventually found some, but my search was greatly obstructed by AI photos of NYC in the 1990s.
    The experience made me certain that AI is going to do much more harm than good to the business of archiving historical photos.
    As for the lady who is distorting photos to colorize them, I don't even understand why you would want to do that. There are other ways!
    - ok123456 2 days ago
      Maybe she just thinks it's cool? It's hardly the worst use of AI on Facebook.
      - Morromist 2 days ago
        Yeah, you're right. That's why she's doing it. But it's a weird idea: I like this historical photo, so I'm going to distort it in order to add color, which makes it not a historical photo anymore. I guess to her the distortion is so minimal it loses nothing, but to me it loses everything.
        It's like saying "I love Da Vinci's art, so I'm going to draw a moustache on everyone in The Last Supper," which you probably wouldn't do if you really loved Da Vinci's art.
        tux1968 2 days ago
        There are some pretty obvious distortions when you look closely at the difference between the historical and AI-corrupted images. But I have to admit, the colorized one has a nice vibe to it; if you don't look too closely, it gives a really nice feel for what the moment was actually like, more than the accurate black-and-white one.
        Which is to say, I think it comes down to what you value most in historical photos: a forensic record of truth, or a general idea of what it was like to live at the time compared to today.
        butlike 15 hours ago
        The photo is oversaturated and psychedelic. It seriously looks like the world does on a dose of drugs. I much prefer the black-and-white one. They're both unreal in their "same same, but different" ways.
        SoftTalker a day ago
        No no, those are color photographs. The world was black and white back then.
        torben-friis 2 days ago
        I'm firmly against uncontrolled AI use. But as long as the edits are strongly labeled, I have to say I enjoy the effect.
        Maybe it's because I'm too young and I've never had B&W content around, but the edited picture allows me to feel the photograph as real, as a place I could have walked around, which I can't really do with the original. I find that effect more valuable than a specific roof being deformed or whatever.
        mikeyouse 21 hours ago
        The effect bugs me personally mainly because the cars are implausible colors, there are a ton of small changes to e.g. the windows on the campers etc. But even more annoyingly, most of her posts are just the color photos without even the source pic. She clearly enjoys it, and many people in the comments do too, but I just have this existential dread that those will be slurped up in the next AI push and treated as historical truth in the future.
        rexpop 2 days ago
        > If you really loved Da Vinci's art.
        Meh, so what if I only love Da Vinci's art to the degree that it's amusing to adulterate with mustaches?
        butlike 15 hours ago
        Then you pass both the original and the mustache'd photo across the table while boisterously announcing "look how absurd it is to love something so wholly and completely!" to the room instead of to the person the photographs were passed to!
        Morromist 2 days ago
        Huh. I didn't consider that.
  - tux1968 2 days ago
    It would be nice if every upsampled image (done with AI or otherwise) contained a copy of the source image in its metadata.
  - flir 2 days ago
    You could always one-up her by animating them.... maybe add Godzilla in the distance occasionally.
    (Provenance is so important. The infinitely-recopied local history photos were never a great source anyway).
  - tux1968 2 days ago
    In the same way, the fact that so many current cameras (mostly phones) do automatic post-processing of images, up to and including AI, is going to lessen their future archeological value.
    - z3c0 2 days ago
      I'm reminded of Samsung's "AI moon" debacle and how divided people were over it. At the end of the day, any photos with so many unknown variables wouldn't suffice for scientific purposes.
      - userbinator a day ago
        Nor should they be admissible as evidence in court.
  - renewiltord a day ago
    Okay that before/after is fantastic. Really shows how normal the past is. No wonder she keeps doing it. It must be pretty good for her to be able to remember those moments. I love it!
- raffraffraff a day ago
  Yep, these models are all trash. They happily invent wrong detail. If you never knew anyone in the photograph, then knock yourself out, let it invent faces that didn't exist. But if you're doing anything with family photographs, just stop. Unless you can tune a model on your own family photographs, you can't magically add "correct" detail to a blurred, pixelated, grainy, or unfocused photo. You can add colour pretty reliably, though.
- arctic-true 2 days ago
  Do you have any recommendations for colorization tools? I agree that all of the popular image models subtly tweak faces; it is very uncanny when working with pictures of people I knew before they passed. In a pre-GPT age, there were some good but not great colorization tools, and as far as I can tell you can’t get better-than-2020 performance unless you’re willing to get your expression adjusted or your eyebrows redone.
  - red75prime a day ago
    > there were some good but not great colorization tools
    I've seen the-grass-is-green-the-clothes-are-beige tools. Was there anything better than that?
crazygringo 2 days ago
It really says something about the current state of affairs that after reading the headline, my first thought was oh god no, the photos are probably all hallucinated...
But it's actually really cool how they used AI to better determine the locations of the photos. I love this!
- NoSalt 2 days ago
  Same ... sort of. I thought it was going to extol the virtues of Vibe Coding. I am quite happy to be "disappointed".
thadt 2 days ago
AI has been super useful for processing historical data. Last month I interviewed a volunteer from the diary archive in Germany, and they're using supervised AI for diary transcription. Going from (old) personal handwriting to text is a lot of work, even for experienced transcribers. Being able to automate the first pass of that has been a huge boon to their processing pipeline.
- butlike 14 hours ago
  Can you please explain to me how using AI as a "first pass" (in any context) doesn't simply make the second pass lazier?
  If my name is associated with the first pass, and I get it wrong, there's a gravity to that since my name's attached. If I use an AI for the first pass, get it wrong, and my name's still attached... my name takes a hit, BUT my guilt and desire to improve are absolved a little bit by the existence of the AI tool taking on the first pass. After all, it wasn't me who got it wholly and completely wrong, it was the AI. Next time I'll be more careful, right? Rinse and repeat.
- superxpro12 2 days ago
  Can you go a bit deeper on this?
  If the risk of mistranslation is high, I fail to comprehend how letting AI "take a swing at it" does not reduce the translation quality.
  How do they ensure no drop in translation quality?
  - thadt 2 days ago
    They're doing transcription, not translation: turning someone's pages of scrawled script into typewritten text. They have around 20 people nationwide who are able to do this. Most of them are older volunteers who aren't all that interested in computer assistance, but about a third of them have started leveraging the newer AI tools, and it has accelerated their throughput significantly.
    Having a 'best guess' at the lettering is really handy; in some cases the writing is really rather difficult to make out at all. Even being able to run something as simple as frequency analysis on stroke patterns would be a massive benefit.
    At this point they're becoming throughput-bound on the scanning process. Diaries are digitized since the archive is in one place and their transcription experts are spread out over the country.
    - shagie a day ago
      As a profession (and under time constraints)... Tom Scott: How the US Postal Service reads terrible handwriting - https://youtu.be/XxCha4Kez9c
      Part of the story is the OCR that handles hand-lettered addresses.
      I also chuckled at the cursive letter recognition sheet on the side of the cube.
- jmyeet 2 days ago
  I hadn't considered or read about this problem before but it makes sense.
  It reminds me of the cuneiform problem. Between 500,000 and 1 million tablets have been collected. This is one of the earliest preserved writing systems. Even so, fewer than 10% of these tablets have been translated. I was surprised to learn this but it makes sense. There are several problems:
  1. Scribes used a lot of shorthand;
  2. Cuneiform itself changed over time;
  3. Writers would use multiple languages (e.g., Sumerian, Akkadian), even on the same tablet. There are relatively few people fluent in these languages, particularly in several of them at once;
  4. To some extent the tablets are 3D such that a 2D photo might not be sufficient to translate because you might need to physically turn the tablet to accurately see the marks; and
  5. In some cases the tablets are incomplete or broken, so you may not be able to figure out how things fit together.
  I wonder if AI can help make inroads into this 90%. I really wonder what is waiting to be unearthed.
  - pc86 2 days ago
    Lots of 4,000 year old complaints about copper, I would imagine.
brrrrrm 2 days ago
I checked 3 spots I'm familiar with, and 1 is wrong:
https://www.oldnyc.org/#707133f-a is supposed to be here: https://www.oldnyc.org/#702487f-a
also, if folks are interested in these old depictions of NYC, check out https://1940s.nyc/ as well!
- danvk 2 days ago
  (Author here) IIUC you're saying that 707133f-a should be at 5th Ave & 9th Street, not 5th Ave & Union Street? Can you say more about why? The text on the back of the first image says "Union St. Station, 5th Ave," which is how it winds up there. On the other hand, the NYPL page[1] titles the image "Union St. - 18th St."
  (I briefly got excited that there might be a street sign _in_ the photo, but if you zoom way in it says "DENTIST")
  +1 to 1940s.nyc. Very different photos: those were taken for tax assessment, while the ones on OldNYC were taken to document the city as it changed. The photographer had an arrangement where he'd get tips from demolition crews and go shoot buildings before they were gone forever.
  [1]: https://digitalcollections.nypl.org/items/5a5e06a0-c539-012f...
  - xmm a day ago
    I'm pretty sure both are correct at Union St and 5th Ave. The Manhattan Savings Bank building (left edge in both photos) is still there, and fairly distinct.
  - brrrrrm a day ago
    You're right, this is actually correctly placed! I was confusing the orientation. I live right around there and recognize the M&T bank in the photo on the left, so it can't be down by 9th.
- gdulli 2 days ago
  An elephant in the room is that if you have too much data to process without AI, you have too many results to check for correctness when they come out of the AI.
  This has been true since before LLMs, but now so many more people and use cases are enabled so much more easily. People are undisciplined and quick to take short term gains and handwave the correctness.
  - crote 2 days ago
    It is less of a problem if the output is explicitly marked as AI-generated and unverified, so people can treat it as a rough first draft. But mix AI output with well-vetted human-reviewed data, and you've basically made your entire data set worthless.
- lapetitejort 2 days ago
  Some other historical street view websites:
  https://yesterdays.maprva.org/#11.5/37.5438/-77.4392 (specific to Richmond, Virginia; however, deployments for other areas are in the works)
  https://pastvu.com/
  - jfil 16 hours ago
    I'm a big fan of Pastvu: go to the "gallery" view, choose one of the "-stan" former Soviet republics, set the date filter to 1986-1996, and enjoy nostalgia from a parallel world.
ComputerGuru 2 days ago
I have mixed feelings about this. It's absolutely phenomenal that such a treasure trove was unlocked thanks to AI, but presenting the AI results as "definitive" (even with an "edit" or "report" feature that's applied equally to human-located and AI-located results) isn't really a win. The old dataset might have been incomplete, but where locations were determined, they were the result of a (probably neural/autistic/ocd) human contributor who had some measure of true confidence in the results. AI contributions are great, but imho they should never be allowed to freely mix with and dilute human contributions: the resulting dataset is permanently polluted.
Ideally they'd always carry an "AI-generated" flag (in the db and in the frontend) until manually reviewed (or never) by a human. If anything, this is actually in AI proponents' favor, as it would let you periodically regenerate or cross-validate (a subset of) the AI contributions some years down the line when newer and better models are released!
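The provenance flag suggested here could look roughly like the following. This is a minimal sketch, assuming one record per located photo; the `Location` type, its fields (including `model` and `reviewed`), and the helper functions are hypothetical and not part of OldNYC.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import List, Optional


class Source(Enum):
    HUMAN = "human"
    AI = "ai"


@dataclass(frozen=True)
class Location:
    """One geocoded photo, with its provenance stored alongside the coordinates."""
    photo_id: str
    lat: float
    lng: float
    source: Source
    model: Optional[str] = None  # hypothetical: which model produced an AI location
    reviewed: bool = False       # becomes True once a human has checked the location

    @property
    def show_ai_flag(self) -> bool:
        # Unreviewed AI output stays flagged, both in the db and in the frontend.
        return self.source is Source.AI and not self.reviewed


def mark_reviewed(loc: Location) -> Location:
    """Return a copy of `loc` marked as human-reviewed (records are immutable)."""
    return replace(loc, reviewed=True)


def regeneration_candidates(locs: List[Location], model: str) -> List[Location]:
    """Unreviewed AI locations from `model`, e.g. to re-run or cross-check with a newer model."""
    return [loc for loc in locs if loc.show_ai_flag and loc.model == model]
```

Keeping `reviewed` separate from `source` means a human-confirmed AI result loses its flag but stays distinguishable from a location a person found from scratch.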
- danvk 2 days ago
  (author here) Just to be clear, none of the photos were ever human-located. The system this replaced was, roughly, regular expression + Google Maps geocoding API. The only photos located by hand were the ~200 I used for my test set: https://github.com/danvk/oldnyc/blob/master/data/geocode/out...
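The replaced approach (regular expression plus geocoding) can be sketched roughly as follows. This is a minimal illustration, not OldNYC's actual code: the pattern, the function names, and the query format are assumptions, and the call to a real geocoder (such as the Google Maps Geocoding API) is left out.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: an ordinal ("5th", "18th") or a capitalized word ("Union"),
# followed by a street-type suffix. Real catalog text is far messier than this.
STREET_RE = re.compile(
    r"\b((?:\d+(?:st|nd|rd|th)|[A-Z][a-z]+)\s+(?:St|Street|Ave|Avenue|Pl|Place))\b\.?"
)


def extract_cross_streets(text: str) -> Optional[Tuple[str, str]]:
    """Return the first two street names found in `text`, or None if fewer than two."""
    streets = STREET_RE.findall(text)
    if len(streets) < 2:
        return None
    return streets[0], streets[1]


def geocoder_query(text: str, borough: str) -> Optional[str]:
    """Build an intersection query string to hand to a geocoding service.

    Only the geocoder's input is prepared here; the network call is omitted.
    """
    pair = extract_cross_streets(text)
    if pair is None:
        return None
    return f"{pair[0]} & {pair[1]}, {borough}, NY"
```

For example, `geocoder_query("Union St. Station, 5th Ave", "Brooklyn")` returns `"Union St & 5th Ave, Brooklyn, NY"`, while the NYPL title "Union St. - 18th St." yields a different pair. That is one way two catalog fields for the same photo can point a pipeline like this to different places.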
  - ComputerGuru 2 days ago
    Thanks for the clarification!
joshuamcginnis 2 days ago
As someone with a massive collection of antique postmarked postcards (probably the largest in the world for a particular city), this is very helpful and encouraging for getting my collection online.
jassyr a day ago
I love this! Thanks for sharing. My job involves reviewing old maps and documents, and I have a special place in my heart for easily accessible archives.
So much cool stuff is freely available at libraries but in practice no one visits them anymore.
AIorNot 2 days ago
It's funny that the author posted a very cool use of AI to help filter, organize, and OCR hard-to-read text about a large photoset, and built a great way to visualize his ongoing project with a lot of innovation and cool output.
But the majority of the comments (including the top comment) on this thread are about how bad AI images are and how bad AI is in general, how it is altering history, etc., when the author didn't even do any of that in his post.
It shows the mindset of the community these days more so than the technology.
BlueRock-Jake 2 days ago
This is pretty sweet. Funny seeing all the dots circling around New York and then abruptly stopping at Jersey City.
dorolow 2 days ago
Very cool! I am surprised at the use of 4o, but I guess it was pretty good at OCR for its time
mskogly 2 days ago
Super cool project, nice work.
CrzyLngPwd 2 days ago
If the images are "edited by AI," then they are not really edited. They are prompted by the source image, but a new image is generated.
I haven't seen an "AI edited" image that hasn't changed important details, and so the result is just yet more slop.
- leviathant 2 days ago
  Please respond to what's linked, rather than make assumptions based on the headline.